[mlir][gpu] Fix null-deref crash in gpu-kernel-outlining for unresolved symbols (#186273)
In `GpuKernelOutliningPass::createKernelModule`, the symbol-copying
worklist iterates over all symbol uses inside the outlined kernel and
looks each leaf reference up in the parent symbol table. If the symbol
refers to a name inside a nested module (e.g. `@some_module::@func`),
the leaf reference `@func` is not directly present in the parent table,
so `SymbolTable::lookup` returns nullptr. Calling `->clone()` on that
null pointer causes a segfault.
Add a null check: if the symbol is not found in the parent table (it may
live in a nested gpu.module that is already handled separately), skip
it.
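
A minimal sketch of the guarded lookup described above, written against plain standard-library types rather than the MLIR API. `Op`, `Table`, and `copyReferencedSymbols` are hypothetical stand-ins introduced for illustration; they are not the names used in the pass.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an operation that defines a symbol.
struct Op {
  std::string name;
  std::unique_ptr<Op> clone() const { return std::make_unique<Op>(*this); }
};

// Hypothetical stand-in for a symbol table: lookup returns nullptr when the
// leaf name is not directly present in this table.
struct Table {
  std::unordered_map<std::string, Op> ops;
  const Op *lookup(const std::string &leaf) const {
    auto it = ops.find(leaf);
    return it == ops.end() ? nullptr : &it->second;
  }
};

// Clone every referenced symbol that the parent table can resolve.
// Unresolved leaves (e.g. names that live in a nested module) are skipped
// instead of being dereferenced.
std::vector<std::unique_ptr<Op>>
copyReferencedSymbols(const Table &parent,
                      const std::vector<std::string> &leafRefs) {
  std::vector<std::unique_ptr<Op>> copied;
  for (const std::string &leaf : leafRefs) {
    const Op *def = parent.lookup(leaf);
    if (!def)
      continue; // the null check: nothing to clone for this reference
    copied.push_back(def->clone());
  }
  return copied;
}
```

With a parent table that only contains `helper`, passing the references `helper` and `func` copies one symbol and silently skips the other.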
Fixes #185357
Assisted-by: Claude Code
[MLIR][Bufferization] Fix out-of-bounds access in setInPlaceOpOperand (#186280)
When annotating operations with bufferization markers during analysis,
setInPlaceOpOperand reads the existing __inplace_operands_attr__ and
then sets one entry. If the attribute was provided by the user with
fewer entries than the op has operands (e.g. a return with two tensor
operands but only one entry in the annotation), the function would crash
with an out-of-bounds vector access.
Fix by resizing the vector to the actual operand count before setting
the entry when the existing annotation is too short.
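
The resize-before-write pattern can be sketched with a plain `std::vector`. This is an illustration under stated assumptions, not the patch itself: `markInPlace` is a hypothetical helper, and the `"none"` padding value is an assumption about what a missing entry should default to.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: update one entry of a per-operand annotation that may
// have been supplied with fewer entries than the op has operands.
std::vector<std::string> markInPlace(std::vector<std::string> annotation,
                                     std::size_t numOperands,
                                     std::size_t operandIdx, bool inPlace) {
  assert(operandIdx < numOperands && "operand index out of range");
  // Grow a too-short annotation to the real operand count first, so the
  // indexed write below is always in bounds. The padding value is assumed.
  if (annotation.size() < numOperands)
    annotation.resize(numOperands, "none");
  annotation[operandIdx] = inPlace ? "true" : "false";
  return annotation;
}
```

For a two-operand op annotated with a single entry, writing to operand 1 now extends the annotation to two entries instead of indexing past the end.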
Fixes #128316
Assisted-by: Claude Code
[Flang] Apply nusw nuw flags on array_coor gep's (#184573)
When generating the LLVM IR, since #110060, `nsw` is applied to
operations when lowering the subscripts. This was, up until now, only
applied to arithmetic, and not the related getelementptrs.
The original Discourse thread noted that NSW helped with vectorisation
later on in the process. Changes to the BasicAA pipeline have led to
vectorisation no longer being applied where wrapping cannot be
guaranteed for array_coor instructions. Applying the `nusw nuw` flags
to the GEPs enables vectorisation in the middle end. Supporting
arithmetic instructions will also be marked `nuw` to ensure instcombine
does not remove these flags when transforming instructions.
The `sub` operations generated in this process need some consideration.
There are cases, such as when an array is shifted, where unsigned
wrapping may occur because negative values are used. To protect against
this, if an array is shifted, `nuw` won't be applied to the `sub`
operations.
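
To illustrate why negative values matter here: a `sub` wraps in the unsigned sense whenever the left operand, read as unsigned, is smaller than the right operand. The sketch below checks that condition. `subWrapsUnsigned` is a hypothetical helper, and the sample values are illustrative rather than taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when a - b wraps if both operands are reinterpreted as
// unsigned 64-bit values, which is the case a `nuw` flag on `sub` rules out.
bool subWrapsUnsigned(std::int64_t a, std::int64_t b) {
  return static_cast<std::uint64_t>(a) < static_cast<std::uint64_t>(b);
}
```

For example, `subWrapsUnsigned(5, 1)` is false, while `subWrapsUnsigned(1, -2)` is true: the signed result 3 is fine, but -2 reinterpreted as unsigned is far larger than 1, so marking that `sub` as `nuw` would be incorrect.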
[9 lines not shown]
Revert "[flang][OpenMP] Implement nest depth calculation in LoopSequence" (#186364)
Reverts llvm/llvm-project#185298
(It broke a bunch of big apps, including 535.weather)
[VPlan] Reuse mask of immediate dominator in VPlanPredicator (#185595)
Previously, VPlanPredicator only reused the mask of the loop header when
a block post-dominates the header. This patch generalizes the
optimization to reuse the mask of immediate dominator when a block
post-dominates its immediate dominator.
This removes more redundant mask computations, simplifies the generated
code, and improves EVL tail folding.
Based on #173265
Fix #173260
[IR][Core][NFC] Drop some BranchInst uses (#186352)
Now that CondBrInst and UncondBrInst are explicit subclasses, use them
instead.
HotColdSplitting was also trying to inspect prof metadata on
unconditional branches; fix this.
Also introduce C API cast functions and deprecate LLVMIsConditional in
favor of LLVMIsACondBrInst.
This patch covers all LLVM uses outside of Transforms, Analysis,
CodeGen/Target, SandboxIR, Frontend/OpenMP, tools, examples.
[InstCombine] Fix crash in `foldReversedIntrinsicOperands` for struct-return intrinsics (#186339)
Fixes #186334
Similar to #176556, add the missing result type check in
`foldReversedIntrinsicOperands()`. This prevents `CreateVectorReverse()`
from being applied to struct-returning intrinsics.
*/*: bump PORTREVISION for gtk40 upgrade
The gtk40 port and friends had a binary incompatible upgrade. Bump
PORTREVISION of their consumers to force a rebuild and reinstallation.
PR: 292076
[AArch64] Fold NEON splats into users by using SVE immediates (#165559)
This patch adds patterns that attempt to fold NEON constant splats into
users by promoting the users to use SVE, when the splat immediate is a
legal SVE immediate operand.
This is done as ISEL patterns to avoid folding to SVE too early, which
can disrupt other patterns/combines.
[AMDGPU][Doc] GFX12.5 Barrier Execution Model
- Document GFX12.5-specific intrinsics.
- Rename signal -> arrive, leave -> drop to match C++ terminology.
- Update execution model to support GFX12.5 semantics (e.g. threads can arrive w/o waiting)
- Various clean-ups & wording updates on the model.
- Added "mutually exclusive" barrier objects.
- Added barrier-phase-with + related constraints.
- Document that barriers can exist at cluster scope too.
- Update GFX12 target semantics/code sequences to include GFX12.5.
The model is no longer marked as incomplete; it is now just experimental.
More updates are planned to support more features and to address some known
shortcomings of the model. For example, many relations currently
encode too much semantic information, which means the model doesn't build
when barriers aren't used correctly. I'd like the model to eventually represent
broken executions as well, just like a memory model can.