[libclc] Rename declaration .inc files to *_decl.inc (#186340)
These .inc files in the header directory have the same names as .inc
files in the implementation directory. Rename them to avoid the name conflict
and to keep the wrong file from being used in the implementation. This fixes a
bitcode change when switching from `#include <>` to `#include ""`.
[mlir][gpu] Fix null-deref crash in gpu-kernel-outlining for unresolved symbols (#186273)
In `GpuKernelOutliningPass::createKernelModule`, the symbol-copying
worklist iterates over all symbol uses inside the outlined kernel and
looks each leaf reference up in the parent symbol table. If the symbol
refers to a name inside a nested module (e.g. `@some_module::@func`),
the leaf reference `@func` is not directly present in the parent table,
so `SymbolTable::lookup` returns nullptr. Calling `->clone()` on that
null pointer causes a segfault.
Add a null check: if the symbol is not found in the parent table (it may
live in a nested gpu.module that is already handled separately), skip
it.
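This is a minimal sketch of the guarded copy step. It assumes a `std::map` stands in for the parent `SymbolTable` and flattens the worklist into a plain loop. `SymbolOp` and `copyReferencedSymbols` are hypothetical names, not MLIR API. The only point is that a failed lookup is skipped instead of being dereferenced.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a symbol-defining op that can be cloned.
struct SymbolOp {
  std::string Name;
  std::unique_ptr<SymbolOp> clone() const {
    return std::make_unique<SymbolOp>(*this);
  }
};

// Copies every referenced symbol found in ParentTable into the kernel module.
// Leaf references that do not resolve in the parent table (e.g. the `@func`
// of `@some_module::@func`) are skipped instead of being dereferenced.
std::vector<std::unique_ptr<SymbolOp>>
copyReferencedSymbols(const std::map<std::string, SymbolOp> &ParentTable,
                      const std::vector<std::string> &LeafRefs) {
  std::vector<std::unique_ptr<SymbolOp>> KernelModule;
  for (const std::string &Ref : LeafRefs) {
    auto It = ParentTable.find(Ref);
    const SymbolOp *Symbol = It == ParentTable.end() ? nullptr : &It->second;
    if (!Symbol)
      continue; // Not in the parent table: may live in a nested gpu.module.
    KernelModule.push_back(Symbol->clone());
  }
  return KernelModule;
}
```

Suppose the leaf references are `{"helper", "func"}` and only `helper` is in the parent table. In that case one symbol is copied, and `func` is skipped rather than causing a crash.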
Fixes #185357
Assisted-by: Claude Code
[MLIR][Bufferization] Fix out-of-bounds access in setInPlaceOpOperand (#186280)
When annotating operations with bufferization markers during analysis,
setInPlaceOpOperand reads the existing __inplace_operands_attr__ and
then sets one entry. If the attribute was provided by the user with
fewer entries than the op has operands (e.g. a return with two tensor
operands but only one entry in the annotation), the function would crash
with an out-of-bounds vector access.
Fix by resizing the vector to the actual operand count before setting
the entry when the existing annotation is too short.
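This is a minimal sketch of the fix. It models the annotation as a `std::vector<std::string>` of per-operand entries and assumes missing entries are padded with `"none"`. The padding value is an assumption; the commit only says the vector is resized. `setInPlaceEntry` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the __inplace_operands_attr__ entries:
// "true", "false", or "none" (padding value assumed for this sketch).
void setInPlaceEntry(std::vector<std::string> &InPlaceAttr,
                     std::size_t NumOperands, std::size_t OperandIdx,
                     bool InPlace) {
  assert(OperandIdx < NumOperands && "operand index out of range");
  // A user-provided annotation may be shorter than the operand list; grow it
  // first so the indexed write below stays in bounds.
  if (InPlaceAttr.size() < NumOperands)
    InPlaceAttr.resize(NumOperands, "none");
  InPlaceAttr[OperandIdx] = InPlace ? "true" : "false";
}
```

The reported case is a return with two tensor operands but a one-entry annotation. Here, calling `setInPlaceEntry(Attr, 2, 1, false)` on `{"true"}` yields `{"true", "false"}` instead of writing past the end.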
Fixes #128316
Assisted-by: Claude Code
[Flang] Apply nusw nuw flags on array_coor gep's (#184573)
When generating the LLVM IR, since #110060, `nsw` is applied to
operations when lowering the subscripts. Until now this was only
applied to arithmetic, not to the related getelementptrs.
The original Discourse thread noted that NSW helped with vectorisation
later on in the process. Changes to the BasicAA pipeline have led to
vectorisation no longer being applied where wrapping cannot be
guaranteed for array_coor instructions. Applying the `nusw nuw` flags
to the GEPs enables vectorisation in the middle end. The supporting
arithmetic instructions will also be marked `nuw` to ensure InstCombine
does not remove these flags when transforming instructions.
Some care is needed with the `sub` operations generated in this
process. There are cases, such as when an array is shifted, where
unsigned wrapping may occur due to negative values.
To protect against this, if an array is shifted, `nuw` won't be applied
to the `sub` operations.
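This sketch shows why a shifted array is the problematic case. It assumes the subscript computation is `index - lbound` on 64-bit integers; the exact IR Flang emits is not shown here. With the default lower bound of 1 and in-bounds indices, both operands are non-negative and the subtraction never borrows. With a negative lower bound, such as `a(-5:5)`, the same in-bounds subtraction can borrow when viewed as unsigned, so a `sub nuw` would produce poison. `subWrapsUnsigned` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if Index - LowerBound, computed on the
// raw 64-bit patterns, borrows as an *unsigned* subtraction, i.e. if a
// `sub nuw` of these operands would be poison.
bool subWrapsUnsigned(std::int64_t Index, std::int64_t LowerBound) {
  return static_cast<std::uint64_t>(Index) <
         static_cast<std::uint64_t>(LowerBound);
}
```

Consider accessing `a(1)` in `a(-5:5)`. Mathematically this gives `1 - (-5) = 6`, but `subWrapsUnsigned(1, -5)` is true because the unsigned view of `-5` is `2^64 - 5`. An unshifted access like `subWrapsUnsigned(3, 1)` is false.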
[9 lines not shown]
Revert "[flang][OpenMP] Implement nest depth calculation in LoopSequence" (#186364)
Reverts llvm/llvm-project#185298
(It broke a bunch of big apps, including 535.weather)
[VPlan] Reuse mask of immediate dominator in VPlanPredicator (#185595)
Previously, VPlanPredicator only reused the mask of the loop header when
a block post-dominates the header. This patch generalizes the
optimization to reuse the mask of immediate dominator when a block
post-dominates its immediate dominator.
This eliminates more redundant mask computations, simplifies the generated
code, and improves EVL tail folding.
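This is a toy model of the two reuse rules. It represents masks as plain strings and hard-codes the dominator and post-dominator facts for a hand-written CFG. `computeMasks`, the block names, and the conditions `c1`/`c2` are all hypothetical and do not reflect VPlan's classes. The inner join `TJ` post-dominates its immediate dominator `T` but not the header `H`, so only the generalized rule lets it reuse `T`'s mask.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical loop body (names and conditions made up for illustration):
//   H --c1--> T,  H --!c1--> F
//   T --c2--> T1, T --!c2--> T2
//   T1 --> TJ,    T2 --> TJ       (inner join)
//   TJ --> J,     F --> J         (outer join)
struct Edge {
  std::string From, To, Cond; // Empty Cond means an unconditional edge.
};

std::map<std::string, std::string> computeMasks(bool ReuseIDomMask) {
  const std::vector<std::string> RPO = {"H", "T", "F", "T1", "T2", "TJ", "J"};
  const std::vector<Edge> Edges = {
      {"H", "T", "c1"},   {"H", "F", "!c1"}, {"T", "T1", "c2"},
      {"T", "T2", "!c2"}, {"T1", "TJ", ""},  {"T2", "TJ", ""},
      {"TJ", "J", ""},    {"F", "J", ""}};
  const std::map<std::string, std::string> IDom = {
      {"T", "H"},  {"F", "H"},  {"T1", "T"},
      {"T2", "T"}, {"TJ", "T"}, {"J", "H"}};
  // (A, B) means A post-dominates B.
  const std::set<std::pair<std::string, std::string>> PostDom = {
      {"J", "H"},  {"J", "T"},  {"J", "F"},   {"J", "T1"},  {"J", "T2"},
      {"J", "TJ"}, {"TJ", "T"}, {"TJ", "T1"}, {"TJ", "T2"}};

  std::map<std::string, std::string> Mask = {{"H", "hdr"}};
  for (const std::string &B : RPO) {
    if (B == "H")
      continue;
    const std::string &Dom = IDom.at(B);
    if (PostDom.count({B, "H"})) {
      Mask[B] = Mask["H"]; // Old rule: B post-dominates the header.
    } else if (ReuseIDomMask && PostDom.count({B, Dom})) {
      Mask[B] = Mask[Dom]; // New rule: B post-dominates its immediate dominator.
    } else {
      // Otherwise OR together the masks of all incoming edges.
      std::string Or;
      for (const Edge &E : Edges) {
        if (E.To != B)
          continue;
        std::string EdgeMask =
            E.Cond.empty() ? Mask.at(E.From)
                           : "(" + Mask.at(E.From) + " & " + E.Cond + ")";
        Or += (Or.empty() ? "" : " | ") + EdgeMask;
      }
      Mask[B] = Or;
    }
  }
  return Mask;
}
```

With the old rule, `TJ` gets `((hdr & c1) & c2) | ((hdr & c1) & !c2)`. With the new rule it simply reuses `(hdr & c1)`. The outer join `J` post-dominates the header, so both rules give it `hdr`.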
Based on #173265
Fix #173260
[IR][Core][NFC] Drop some BranchInst uses (#186352)
Now that CondBrInst and UncondBrInst are explicit subclasses, use them
instead.
HotColdSplitting also tried to inspect prof metadata on
unconditional branches; fix this.
Also introduce C API cast functions and deprecate LLVMIsConditional in
favor of LLVMIsACondBrInst.
This patch covers all LLVM uses outside of Transforms, Analysis,
CodeGen/Target, SandboxIR, Frontend/OpenMP, tools, examples.
[InstCombine] Fix crash in `foldReversedIntrinsicOperands` for struct-return intrinsics (#186339)
Fixes #186334
Similar to #176556, add the missing result type check in
`foldReversedIntrinsicOperands()`. This prevents `CreateVectorReverse()`
from being applied to struct-returning intrinsics.
[AArch64] Fold NEON splats into users by using SVE immediates (#165559)
This patch adds patterns that attempt to fold NEON constant splats into
users by promoting the users to use SVE, when the splat immediate is a
legal SVE immediate operand.
This is done as ISEL patterns to avoid folding to SVE too early, which
can disrupt other patterns/combines.
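As an illustration of the "legal SVE immediate" condition, assume the user is an integer add. SVE's `ADD (immediate)` accepts an unsigned 8-bit value, optionally shifted left by 8 for halfword and wider elements. The helper below is a hypothetical sketch of that one check. The patterns in the patch may cover other operations and immediate forms.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: can a splat of Imm (already truncated to the element
// width EltBits) be encoded as the immediate of an SVE ADD (immediate)?
// Encoding assumed here: #imm8, optionally LSL #8 for 16/32/64-bit elements.
bool isLegalSVEAddImm(std::uint64_t Imm, unsigned EltBits) {
  assert((EltBits == 8 || EltBits == 16 || EltBits == 32 || EltBits == 64) &&
         "unexpected element width");
  if (Imm <= 0xFF)
    return true; // #imm8, LSL #0
  if (EltBits == 8)
    return false; // The shifted form is not available for byte elements.
  return (Imm & 0xFF) == 0 && (Imm >> 8) <= 0xFF; // #imm8, LSL #8
}
```

For example, a 16-bit splat of `0x1200` is encodable (`#0x12, LSL #8`), while `0x1201` is not. In this sketch, only the former would be a candidate for folding.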
[AMDGPU][Doc] GFX12.5 Barrier Execution Model
- Document GFX12.5-specific intrinsics.
- Rename signal -> arrive, leave -> drop to match C++ terminology.
- Update execution model to support GFX12.5 semantics (e.g. threads can arrive without waiting)
- Various clean-ups & wording updates on the model.
- Added "mutually exclusive" barrier objects.
- Added barrier-phase-with + related constraints.
- Document that barriers can exist at cluster scope too.
- Update GFX12 target semantics/code sequences to include GFX12.5.
The model is no longer marked as incomplete; it is now just experimental.
There are more updates planned in the future to support more features and
address some known shortcomings of the model. For example, many relations currently
encode too much semantic information, which means the model doesn't build
when barriers aren't used correctly. I'd like the model to eventually represent
broken executions as well, just like a memory model can.
[SCEV] Introduce SCEVUse wrapper type (NFC)
Add SCEVUse as a PointerIntPair wrapper around const SCEV * to prepare
for storing additional per-use information.
This commit contains the mechanical changes: adding an initial SCEVUse
wrapper and updating all relevant interfaces to take SCEVUse. Note that
currently the integer part is never set, and all SCEVUses are
considered canonical.
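This is a minimal sketch of the pointer/int packing idea. It assumes `SCEV` objects are at least 8-byte aligned, so the low bits of a pointer are free. It also assumes two spare bits; the real bit count is not stated in the commit. `SCEVUseSketch`, `isCanonical`, and the stand-in `SCEV` struct are hypothetical. The actual type is described as a `PointerIntPair` wrapper.

```cpp
#include <cassert>
#include <cstdint>

struct alignas(8) SCEV { int Id; }; // Hypothetical stand-in for llvm::SCEV.

// Hypothetical stand-in for a PointerIntPair<const SCEV *, 2> wrapper: packs
// a 2-bit integer into the low bits of an 8-byte-aligned pointer.
class SCEVUseSketch {
  static constexpr std::uintptr_t IntMask = 0x3;
  std::uintptr_t Value = 0;

public:
  SCEVUseSketch(const SCEV *S, unsigned Int = 0) {
    auto P = reinterpret_cast<std::uintptr_t>(S);
    assert((P & IntMask) == 0 && "pointer not sufficiently aligned");
    assert(Int <= IntMask && "integer does not fit in the spare bits");
    Value = P | Int;
  }
  const SCEV *getPointer() const {
    return reinterpret_cast<const SCEV *>(Value & ~IntMask);
  }
  unsigned getInt() const { return static_cast<unsigned>(Value & IntMask); }
  // While the integer part is never set, every use is canonical.
  bool isCanonical() const { return getInt() == 0; }
  // Implicit conversion lets call sites expecting a raw pointer keep working.
  operator const SCEV *() const { return getPointer(); }
};
```

The implicit conversion back to `const SCEV *` is one way existing call sites could keep compiling during a mechanical migration. Whether the real type does exactly this is an assumption.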
[MLIR][OpenACC] Fix crash in verifyDeviceTypeCountMatch when deviceTypes is null (#186279)
When an acc.parallel op has async operands (via operandSegmentSizes) but
no corresponding asyncOperandsDeviceType attribute, the verifier called
verifyDeviceTypeCountMatch with a null ArrayAttr. The function then
dereferenced the null pointer via deviceTypes.getValue(), causing a
segfault instead of a diagnostic.
Fix by guarding the getValue() call with a null check. When deviceTypes
is absent but operands are present, the mismatch is now reported as a
proper verifier error.
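This is a minimal sketch of the guarded check. `std::nullopt` plays the role of the null `ArrayAttr`. `verifyDeviceTypeCount` and its error strings are hypothetical and are not the verifier's actual messages. The point is that the missing attribute is tested before its contents are read, and that operands without device types become a reported error.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in: returns an error message, or an empty string if the
// device-type list matches the operand count.
std::string verifyDeviceTypeCount(
    std::size_t NumOperands,
    const std::optional<std::vector<std::string>> &DeviceTypes) {
  if (!DeviceTypes) {
    // Guard before touching the attribute's contents.
    if (NumOperands == 0)
      return "";
    return "operands present but device_type attribute is missing";
  }
  if (DeviceTypes->size() != NumOperands)
    return "device_type count does not match operand count";
  return "";
}
```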
Fixes #107027
Assisted-by: Claude Code
[LowerMemIntrinsics][AMDGPU] Optimize memset.pattern lowering (#185901)
This patch changes the lowering of the [experimental.memset.pattern intrinsic](https://llvm.org/docs/LangRef.html#llvm-experimental-memset-pattern-intrinsic)
to match the optimized memset and memcpy lowering when possible. (The tl;dr of
memset.pattern is that it is like memset, except that you can use it to set
values that are wider than a single byte.)
The memset.pattern lowering now queries `TTI::getMemcpyLoopLoweringType` for a
preferred memory access type. If the size of that type is a multiple of the set
value's type, and if both types have consistent store and alloc sizes (since
memset.pattern behaves in a way that is not well suited to access widening
if store and alloc size differ), the memset.pattern is lowered into two loops:
a main loop that stores a sufficiently wide vector splat of the SetValue with
the preferred memory access type and a residual loop that covers the remaining
set values individually.
In contrast to the memset lowering, this patch doesn't include a specialized
lowering for residual loops with known constant lengths. Loops that are
statically known to be unreachable will not be emitted.
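The two-loop shape described above can be sketched at source level, outside the compiler. `PreferredBytes` is a hypothetical parameter standing in for the size of the type `TTI::getMemcpyLoopLoweringType` would return; nothing is queried here. The store-size versus alloc-size check is not modeled, since C++ `sizeof` gives a single size per type. `memsetPatternSketch` is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <utility>
#include <vector>

// Fills Count elements of Dst with SetValue.
// Returns {main-loop iterations, residual-loop iterations}.
template <typename T>
std::pair<std::size_t, std::size_t>
memsetPatternSketch(T *Dst, T SetValue, std::size_t Count,
                    std::size_t PreferredBytes) {
  static_assert(std::is_trivially_copyable_v<T>, "need a plain value type");
  // Fallback: the wide type is not a multiple of the element size, so set
  // every element individually.
  if (PreferredBytes == 0 || PreferredBytes % sizeof(T) != 0) {
    for (std::size_t I = 0; I < Count; ++I)
      Dst[I] = SetValue;
    return {0, Count};
  }
  const std::size_t PerChunk = PreferredBytes / sizeof(T);
  // Build the "vector splat" of SetValue once.
  std::vector<unsigned char> Splat(PreferredBytes);
  for (std::size_t I = 0; I < PerChunk; ++I)
    std::memcpy(Splat.data() + I * sizeof(T), &SetValue, sizeof(T));

  // Main loop: one PreferredBytes-wide store per iteration.
  const std::size_t MainIters = Count / PerChunk;
  auto *Bytes = reinterpret_cast<unsigned char *>(Dst);
  for (std::size_t I = 0; I < MainIters; ++I)
    std::memcpy(Bytes + I * PreferredBytes, Splat.data(), PreferredBytes);

  // Residual loop: the remaining elements, one at a time.
  for (std::size_t I = MainIters * PerChunk; I < Count; ++I)
    Dst[I] = SetValue;
  return {MainIters, Count - MainIters * PerChunk};
}
```

Take 37 `uint16_t` elements with a 16-byte preferred type. The sketch runs 4 main-loop iterations of 8 elements each, then 5 residual stores. If the preferred size is not a multiple of the element size, it falls back to element-wise stores only.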
[6 lines not shown]
[C++20] [Modules] [Reduced BMI] Try not to write merged lookup table (#186337)
Update:
Close https://github.com/llvm/llvm-project/issues/184957
The root cause of the problem is that a reduced BMI may not emit everything in
the lookup table. If a reduced BMI **partially** emits some decls, the
generator may skip emitting the entry for a name because an entry for that
name is already there. See MultiOnDiskHashTableGenerator::insert and
MultiOnDiskHashTableGenerator::emit for details. So we won't emit the
lookup table if we're generating a reduced BMI.