[clang][bytecode] Add a missing fallthrough() call (#194537)
When the local variable is enabled but we don't emit any dtor
instructions, we need to fall through to the `EndLabel`.
[mlir][tosa] Make tosa-attach-target optional in addTosaToLinalgPasses (#193467)
addTosaToLinalgPasses unconditionally scheduled tosa-attach-target, thus
adding a `tosa.target_env` attribute to the module. Callers therefore
had no way to opt out of this attribute. The attribute is consumed if
validationOptions is enabled, which is optional. Therefore, if the
caller disables validationOptions, the `tosa.target_env` attribute
still exists after TosaToLinalg, so consumers that don't load the TOSA
dialect can't even parse the resulting module.
This PR makes sure that we schedule tosa-attach-target only when the
caller actually needs it, i.e. when validationOptions or
attachTargetOptions is set. The default values stay inside the
`!attachTargetOptions` branch so callers that set validationOptions
still get a target env to validate against, preserving existing
behavior.
Also add a `validation` pipeline option (default `true`) to the
registered `tosa-to-linalg-pipeline`, so that it can be invoked without
scheduling `tosa-attach-target` or `tosa-validate`. A lit test is
also added.
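The scheduling rule above can be modeled as a small decision function. This is a sketch, not the MLIR code: `AttachTargetOptions`, `ValidationOptions`, `schedulePasses`, and the `name` field are hypothetical stand-ins, and the pass ordering is illustrative only.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real option structs; only whether each one
// is set matters for the scheduling decision modeled here.
struct AttachTargetOptions {
  std::string name = "default";  // placeholder for the real target fields
};
struct ValidationOptions {};

// Returns the names of the passes that would be scheduled. The real pipeline
// runs many more passes; "tosa-to-linalg" stands in for all of them.
std::vector<std::string>
schedulePasses(const std::optional<AttachTargetOptions> &attachTargetOptions,
               const std::optional<ValidationOptions> &validationOptions) {
  std::vector<std::string> passes;
  if (validationOptions || attachTargetOptions) {
    // Defaults are filled in only when no explicit target was given, so a
    // validating caller still gets a target env to validate against.
    AttachTargetOptions target =
        attachTargetOptions ? *attachTargetOptions : AttachTargetOptions{};
    passes.push_back("tosa-attach-target(" + target.name + ")");
  }
  if (validationOptions)
    passes.push_back("tosa-validate");
  passes.push_back("tosa-to-linalg");
  return passes;
}
```

With neither option set, no `tosa.target_env` attribute is produced at all, which is the opt-out the change is after.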
[clang-tidy] Fix UB in SuspiciousIncludeCheck when IgnoredRegex is not set (#194521)
When the "IgnoredRegex" option is not set, `IgnoredRegexString` is
default-constructed, i.e. initialized with a null data pointer. This is
passed to `llvm_regcomp` as the pattern argument, causing nullptr+0 UB
in regcomp.c (caught by UBSan). Fix by initializing
`IgnoredRegexString` with an empty string literal instead.
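The difference between the two initializations can be seen with `std::string_view`, used here as an assumed analog of `llvm::StringRef`: both default-construct with a null data pointer and size 0. The function names below are hypothetical.

```cpp
#include <cassert>
#include <string_view>

// Default construction: data() is nullptr and size() is 0. Handing this
// pointer to C code that computes `pattern + length` performs nullptr + 0,
// which is undefined behavior in C.
std::string_view defaultPattern() { return std::string_view(); }

// Initializing from "" yields an equally empty view, but data() points at a
// real NUL byte, so pointer arithmetic on it is well defined.
std::string_view emptyLiteralPattern() { return std::string_view(""); }
```

Both views compare equal and are empty; only the pointer differs, which is why the bug only surfaced under UBSan.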
[CodeGen] Make AsmPrinter::MAI a reference. NFC (#194538)
AsmPrinter::MAI is non-null. This was made more explicit by
PR #194523, which changed TargetMachine::getMCAsmInfo to return a reference
as part of the recent MCAsmInfo/MCTargetOptions refactoring.
Convert the member from const MCAsmInfo * to const MCAsmInfo & and
update all consumers.
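The pattern is a pointer member whose non-null invariant becomes part of the type. A minimal sketch with hypothetical, trimmed-down stand-ins (`AsmInfo`, `Printer`, `commentString`) for the real classes:

```cpp
#include <cassert>

// Hypothetical stand-in for MCAsmInfo.
struct AsmInfo {
  const char *commentString = "#";
};

// Hypothetical stand-in for AsmPrinter.
class Printer {
  // Before: `const AsmInfo *MAI;`, with consumers writing MAI->commentString.
  // After: a reference states in the type that MAI is never null.
  const AsmInfo &MAI;

public:
  explicit Printer(const AsmInfo &Info) : MAI(Info) {}
  const char *comment() const { return MAI.commentString; }
};
```

One trade-off of a reference member is that the class loses copy assignment, which is acceptable for long-lived objects like printers that are not reassigned.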
Fix UNSUPPORTED added to test in #194502. (#194541)
The change in #194502 attempted to mark the test as UNSUPPORTED for
AArch64, but it didn't work because it wasn't specified correctly. This
fixes it.
[RFC][AMDGPU] Add AMDGPU_SUMMARY bitcode block for ThinLTO
With AMDGPU object linking, device functions are compiled separately from the
kernels that call them. Without whole-program visibility, the compiler must be
conservative about occupancy for every device function, leading to suboptimal
resource usage. However, GPU kernels typically carry explicit occupancy control
attributes that constrain the launch environment. ThinLTO is the natural place
to propagate these kernel attributes to callees: the combined module summary
index contains a cross-TU call graph, allowing occupancy information to be
propagated top-down from kernels to all reachable device functions. The backend
can then generate better code with the propagated constraints, achieving
whole-program awareness without the compile-time overhead of full LTO.
This patch introduces a dedicated AMDGPU_SUMMARY bitcode block that serializes
per-function summary data alongside the standard module summary. The block is
scoped to AMDGPU so that non-AMDGPU targets are completely unaffected. A
follow-up patch will add the ThinLTO propagation logic that reads these
summaries and applies conservative attribute bounds to device functions
reachable from multiple kernels.
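The top-down propagation that the follow-up patch is described as doing can be sketched over a plain call graph. Everything here is an assumption for illustration: `Range` models an occupancy-related bound such as a flat work-group size range, and functions reachable from several kernels get the hull of those kernels' ranges, one way to obtain a bound valid for every caller.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical [lo, hi] bound attached to each kernel.
struct Range {
  unsigned lo;
  unsigned hi;
};

// Caller name -> callee names (a stand-in for the combined summary index).
using CallGraph = std::map<std::string, std::vector<std::string>>;

// For each kernel, walk everything reachable from it. A function reached from
// several kernels is widened to cover all of their ranges.
std::map<std::string, Range>
propagateFromKernels(const CallGraph &callees,
                     const std::map<std::string, Range> &kernels) {
  std::map<std::string, Range> bounds;
  for (const auto &[kernel, range] : kernels) {
    std::set<std::string> visited;
    std::vector<std::string> worklist{kernel};
    while (!worklist.empty()) {
      std::string fn = worklist.back();
      worklist.pop_back();
      if (!visited.insert(fn).second)
        continue;  // already reached from this kernel
      auto [it, inserted] = bounds.try_emplace(fn, range);
      if (!inserted) {
        // Also reached from another kernel: widen to cover both.
        it->second.lo = std::min(it->second.lo, range.lo);
        it->second.hi = std::max(it->second.hi, range.hi);
      }
      auto edges = callees.find(fn);
      if (edges != callees.end())
        for (const std::string &callee : edges->second)
          worklist.push_back(callee);
    }
  }
  return bounds;
}
```

A device function reached from only one kernel inherits that kernel's exact bound, which is where the backend gains over the fully conservative default.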
[DataLayout] Add null pointer value infrastructure
Add support for specifying the null pointer bit representation per address space
in DataLayout via new pointer spec flags:
- 'z': null pointer is all-zeros
- 'o': null pointer is all-ones
When neither flag is present, the address space inherits the default set by the
new 'N<null-value>' top-level specifier ('Nz' or 'No'). If that is also absent,
the null pointer value is zero.
No target DataLayout strings are updated in this change. This is pure
infrastructure for a future ConstantPointerNull semantic change to support
targets with non-zero null pointers (e.g. AMDGPU).
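The lookup order described above (per-address-space flag, then the `N` default, then zero) can be modeled as follows. `NullPointerSpec` and its members are hypothetical; this does not parse real DataLayout strings.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

enum class NullValue { AllZeros, AllOnes };

// Hypothetical model of the resolution rules, not the DataLayout API.
struct NullPointerSpec {
  std::optional<NullValue> moduleDefault;         // from 'Nz' / 'No'
  std::map<unsigned, NullValue> perAddressSpace;  // from 'z' / 'o' flags

  NullValue resolve(unsigned addrSpace) const {
    auto it = perAddressSpace.find(addrSpace);
    if (it != perAddressSpace.end())
      return it->second;                             // explicit flag wins
    return moduleDefault.value_or(NullValue::AllZeros);  // else default, else 0
  }

  // Null pointer bit pattern for a pointer of the given width (1..64 bits).
  uint64_t bits(unsigned addrSpace, unsigned widthInBits) const {
    assert(widthInBits >= 1 && widthInBits <= 64);
    if (resolve(addrSpace) == NullValue::AllZeros)
      return 0;
    return widthInBits == 64 ? ~uint64_t(0)
                             : (uint64_t(1) << widthInBits) - 1;
  }
};
```

For example, with `No` as the module default and `z` on address space 3, a 32-bit pointer in address space 5 has null `0xFFFFFFFF` while address space 3 keeps null `0`.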
[CIR][AMDGPU] Add lowering for amdgcn div fmas builtins (#194334)
Upstreaming ClangIR PR: https://github.com/llvm/clangir/pull/2051
This PR adds support for lowering the `__builtin_amdgcn_div_fmas*` AMDGPU
builtins to ClangIR.
It follows the existing Clang-to-LLVM IR lowering in
clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp.
[MLIR][OpenMP] Post-translate declare-target USM indirection in OpenMPIRBuilder
When lowering OpenMP to LLVM IR for the target device, record pairs of the
`declare target` device global and the OMPIRBuilder "ref" pointer global
(used for unified shared memory) via `OpenMPIRBuilder`. During
`OpenMPIRBuilder::finalize`, run a post-pass that rewrites remaining uses of the
original global to load from the ref global and adjust the pointer (shared
path for `ConstantExpr` addrspace/bitcast chains and for direct
instruction uses).
This follows what is done by clang for similar cases:
https://reviews.llvm.org/D63108.
Co-authored-by: Composer
Co-authored-by: Gemini Pro
[Docs] Update indents for SandboxIR and RemoveDIsDebugInfo (#194528)
This distinguishes the doc title from the headers.
Fixes navigation indents for Furo theme update (see
https://github.com/llvm/llvm-project/pull/184440).
[Flang][OpenMP] Clear close on descriptor members for box parents in USM (#194287)
Extend the MapInfoFinalization walk introduced in #185330 so
parent/member close consistency is enforced whenever
unified_shared_memory is in effect, not only when the parent map's
variable is a fir.RecordType. Allocatable (box) roots expand to member
maps the same way as derived-type instances; getDescriptorMapType may
add OMP_MAP_CLOSE to implicit descriptor members while the parent map
does not set close, which led to bad device behavior under
-fopenmp-force-usm with multiple mapped allocatables.
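The consistency rule can be sketched on a simplified map representation. `MapEntry` and `enforceCloseConsistency` are hypothetical (the real walk operates on MLIR map ops), and `OMP_MAP_CLOSE` is assumed to mirror the offloading runtime's close bit, `0x400`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed to mirror the OpenMP offloading close map-type bit.
constexpr uint64_t OMP_MAP_CLOSE = 0x400;

// Hypothetical, simplified map entry: a parent (e.g. an allocatable's box)
// with member maps for its descriptor pieces.
struct MapEntry {
  uint64_t mapType = 0;
  std::vector<MapEntry> members;
};

// Under unified_shared_memory, members must agree with their parent: if the
// parent does not request close, strip close from its direct members.
void enforceCloseConsistency(MapEntry &parent, bool unifiedSharedMemory) {
  if (!unifiedSharedMemory || (parent.mapType & OMP_MAP_CLOSE))
    return;
  for (MapEntry &member : parent.members)
    member.mapType &= ~OMP_MAP_CLOSE;
}
```

Only the close bit is touched; every other map-type bit on the members is preserved.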
PR stack:
- https://github.com/llvm/llvm-project/pull/194287 (this one)
- https://github.com/llvm/llvm-project/pull/194291
Co-authored-by: Composer (Cursor) <ai at cursor.com>
[MC] Make MCContext::getAsmInfo return a reference. NFC (#194523)
The MAI member is non-null. #194280 made this clearer by making the
MCContext constructor take MCAsmInfo by reference. Convert getAsmInfo to
return const MCAsmInfo & and the member to a reference.
[AArch64][SelectionDAG] Generate subs+csel for usub.sat (#193203)
Fixes https://github.com/llvm/llvm-project/issues/191488
As this is a regression from
https://github.com/llvm/llvm-project/pull/170076, add a check to avoid the
generic lowering of usub.sat to X - zext(X != 0) on AArch64 by
making the constraint of this transformation stricter via an extra
isOperationLegalOrCustom guard on USUBO_CARRY. All other backends will
still receive generic lowering as implemented in the original patch.
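Both lowerings compute the same value; they differ in the instructions produced. The sketch below checks them on `i8` with hypothetical helper names. Note that the quoted `X - zext(X != 0)` form matches `usub.sat(X, 1)`, while the subs+csel shape handles any subtrahend by selecting zero on borrow.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics of llvm.usub.sat on i8: clamp at zero instead of wrapping.
uint8_t usubSatRef(uint8_t x, uint8_t y) { return x >= y ? uint8_t(x - y) : 0; }

// Shape of the subs+csel sequence: a wrapping subtract that also yields the
// borrow, followed by a select of zero when a borrow occurred.
uint8_t usubSatSubsCsel(uint8_t x, uint8_t y) {
  uint8_t diff = uint8_t(x - y);  // subs: wrapping difference ...
  bool borrow = x < y;            // ... and the borrow it sets
  return borrow ? 0 : diff;       // csel on that flag
}

// The generic form, for a subtrahend of 1: usub.sat(X, 1) == X - zext(X != 0).
uint8_t usubSatByOne(uint8_t x) { return uint8_t(x - uint8_t(x != 0)); }
```

An exhaustive check over all 8-bit inputs confirms both shapes agree with the reference, so the choice between them is purely about codegen quality on the target.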
[InstCombine] Reland #165975: Fix #163110: Support peeling off matching shifts from icmp operands via canEvaluateShifted (#190918)
This relanding of #165975 fixes the bug that caused the bootstrap-asan
buildbot failure
(https://lab.llvm.org/buildbot/#/builders/52/builds/16329).
## Original optimization
Consider a pattern like: `icmp (shl nsw/nuw X, L), (add nsw/nuw (shl
nsw/nuw Y, L), K)`
When K is a multiple of 2^L, this can be simplified to: `icmp X, (add
nsw/nuw Y, K >> L)`
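The `nuw` form of this fold can be brute-forced on `i8` as a sanity check. The function `checkNuwFold` is a hypothetical test harness; it covers only the unsigned case (`ult` and `eq`), not the `nsw`/signed variants. Inputs that would violate a `nuw` flag are skipped because the original IR would produce poison there.

```cpp
#include <cassert>

// Checks, for all i8 values:
//   icmp (shl nuw X, L), (add nuw (shl nuw Y, L), K)
//   ==> icmp X, (add nuw Y, K >> L)        whenever K is a multiple of 2^L.
bool checkNuwFold() {
  for (unsigned L = 0; L < 8; ++L)
    for (unsigned X = 0; X < 256; ++X)
      for (unsigned Y = 0; Y < 256; ++Y)
        for (unsigned K = 0; K < 256; K += (1u << L)) {
          unsigned shlX = X << L, shlY = Y << L, sum = shlY + K;
          if (shlX > 255 || shlY > 255 || sum > 255)
            continue;  // a nuw flag would be violated: source is poison
          unsigned rhs = Y + (K >> L);
          if (rhs > 255)
            return false;  // the new add must not overflow either
          if ((shlX < sum) != (X < rhs))
            return false;  // ult must be preserved
          if ((shlX == sum) != (X == rhs))
            return false;  // eq must be preserved
        }
  return true;
}
```

The check succeeds because, without wrapping, `(Y << L) + K == (Y + (K >> L)) << L`, and a left shift that drops no bits preserves unsigned order.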
This patch extends `canEvaluateShifted` to support `Instruction::Add`
and refactors its signature to accept a `ShiftSemantics` enum (`Lossy` /
`Unsigned` / `Signed`) instead of a bare opcode. This allows the
function to enforce losslessness requirements according to the overflow
flags (nsw/nuw) of the operands. The logic is wired into
[14 lines not shown]
[clang][bytecode] Rework APValue visiting (#194408)
First, we can't just ignore the LValuePath of an lvalue APValue. Add
code to handle that and a test case exercising the newly added code.
We also didn't look at APValue bases when initializing from an APValue.
[SystemZ] Enable -fpatchable-function-entry=M,N (#178191)
This PR enables the option `-fpatchable-function-entry` for SystemZ. It
utilizes existing common code and just adds the emission of nops after
the function label in the backend.
SystemZ provides multiple nop options of varying length, making the
semantics of this option somewhat ambiguous. In order to align with what
`gcc` does with that same option, we're choosing `nopr` as the
canonical nop for this purpose.
For testing, this adapts an existing test file from AArch64.
(cherry picked from commit 355898a6ce901bf9285a428888068e008b5557e9)
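The nop layout for `-fpatchable-function-entry=M,N` follows the gcc convention: `M` nops in total, `N` of them before the function label and `M - N` after it, which is the part this backend change emits. The helper names below are hypothetical; the byte count assumes `nopr` is the 2-byte RR-format encoding.

```cpp
#include <cassert>
#include <utility>

// For -fpatchable-function-entry=M,N, returns {nopsBeforeLabel, nopsAfterLabel}.
std::pair<unsigned, unsigned> patchableEntryLayout(unsigned M, unsigned N) {
  assert(N <= M && "N must not exceed the total nop count M");
  return {N, M - N};
}

// Assumes each 'nopr' occupies 2 bytes.
constexpr unsigned kNoprBytes = 2;

// Size of the patchable area emitted after the function label.
unsigned patchBytesAfterLabel(unsigned M, unsigned N) {
  return patchableEntryLayout(M, N).second * kNoprBytes;
}
```

Fixing the nop choice to a single instruction makes "M nops" correspond to a predictable number of bytes, which is what patching tools rely on.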
[X86] lowerV4F32Shuffle - don't use INSERTPS if SHUFPS will suffice (#186468)
If we have 2 or more undef/undemanded elements, INSERTPS replaces
those with explicit zeroed elements, which can cause infinite loops later
on in shuffle combining depending on whether we demand those elements or
not.
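A hypothetical sketch of the check described above: detect whether a two-input `v4f32` mask fits SHUFPS (lanes 0-1 from one operand, lanes 2-3 from the other) and prefer it over INSERTPS when at least two lanes are undef. The function names and the exact policy are illustrations, not the X86 lowering code.

```cpp
#include <array>
#include <cassert>

// Mask entries: 0-3 select from V1, 4-7 from V2, -1 is undef/undemanded.
using Mask4 = std::array<int, 4>;

static bool fromV1OrUndef(int m) { return m < 4; }  // -1..3
static bool fromV2OrUndef(int m) { return m < 0 || m >= 4; }

// SHUFPS(A, B) fills lanes 0-1 from A and lanes 2-3 from B, so a mask fits if
// its low half comes from one input and its high half from the other.
bool matchesShufps(const Mask4 &mask) {
  bool v1ThenV2 = fromV1OrUndef(mask[0]) && fromV1OrUndef(mask[1]) &&
                  fromV2OrUndef(mask[2]) && fromV2OrUndef(mask[3]);
  bool v2ThenV1 = fromV2OrUndef(mask[0]) && fromV2OrUndef(mask[1]) &&
                  fromV1OrUndef(mask[2]) && fromV1OrUndef(mask[3]);
  return v1ThenV2 || v2ThenV1;
}

int countUndef(const Mask4 &mask) {
  int n = 0;
  for (int m : mask)
    n += m < 0;
  return n;
}

// With 2+ undef lanes, INSERTPS would pin them to zero; SHUFPS leaves them free.
bool preferShufpsOverInsertps(const Mask4 &mask) {
  return countUndef(mask) >= 2 && matchesShufps(mask);
}
```

Leaving undef lanes unconstrained avoids creating demanded zeros that a later combine might try to fold away again.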
I'll try to improve the (minor) v2f32 regressions in a follow-up, but I
need to fix the infinite loop first.
Fixes #186403
[C++20] [Modules] Add VisiblePromoted module ownership kind (#189903)
This patch adds a new ModuleOwnershipKind::VisiblePromoted to handle
declarations that are not visible to the current TU but are promoted to
be visible to avoid re-parsing.
Originally we set the visibility to Visible directly in such cases. But
https://github.com/llvm/llvm-project/issues/188853 shows such decls may
be excluded later if we import #include and then import. So we have to
introduce a new visibility to express the intention that the visibility
of the decl is intentionally promoted.
Close https://github.com/llvm/llvm-project/issues/188853
(cherry picked from commit c97e08e331736ae8c7d17bf1f24954570f564ad0)