[MLIR][Presburger] Make getSubMatrix exclusive on the right end (#190911)
Currently `getSubMatrix(fromRow, toRow, fromCol, toCol)` forms a
submatrix with both ends inclusive. In this way, it's impossible to form
an empty submatrix, as the assertions in the function prevent cases
where `toRow < fromRow`. However, this functionality is necessary for
Barvinok procedures (e.g. we might want to inspect the submatrix for
parameters, which will be empty if there are none).
This PR changes it to be inclusive on the left end and exclusive on the
right end, making it the same as canonical C++ ranges.
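A minimal sketch of the half-open convention, using a hypothetical flat row-major `Matrix` rather than the actual Presburger class:

```cpp
#include <cassert>
#include <vector>

// Illustrative stand-in for the Presburger matrix; only the half-open
// submatrix semantics matter here, not the real API.
struct Matrix {
  unsigned rows, cols;
  std::vector<int> data; // row-major

  int at(unsigned r, unsigned c) const { return data[r * cols + c]; }

  // [fromRow, toRow) x [fromCol, toCol): toRow == fromRow now yields an
  // empty submatrix instead of tripping an assertion.
  Matrix getSubMatrix(unsigned fromRow, unsigned toRow,
                      unsigned fromCol, unsigned toCol) const {
    assert(fromRow <= toRow && toRow <= rows);
    assert(fromCol <= toCol && toCol <= cols);
    Matrix sub{toRow - fromRow, toCol - fromCol, {}};
    for (unsigned r = fromRow; r < toRow; ++r)
      for (unsigned c = fromCol; c < toCol; ++c)
        sub.data.push_back(at(r, c));
    return sub;
  }
};
```

With this convention, asking for the parameter columns of a system that has no parameters simply returns a 0-column submatrix.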
[clang][AArch64][nfc] Remove redundant truncation for FP16 reduction builtins (#195825)
The following non-overloaded NEON builtins already return the expected
result
type, so CodeGen does not need to truncate their results:
* BI__builtin_neon_vmaxv_f16
* BI__builtin_neon_vmaxvq_f16
* BI__builtin_neon_vminv_f16
* BI__builtin_neon_vminvq_f16
* BI__builtin_neon_vmaxnmv_f16
* BI__builtin_neon_vmaxnmvq_f16
* BI__builtin_neon_vminnmv_f16
* BI__builtin_neon_vminnmvq_f16
Remove the redundant truncation from AArch64 CodeGen.
[RISC-V] Add support for cheriot ABI in DataLayout (#190806)
CHERIoT uses the same DataLayout setup as RISC-V Y base, but does not share instruction encodings with it.
[MLIR] Parallel loop fusion extended to interchanged loops. (#191245)
This patch extends fusion of two parallel loops to the case where the
second parallel loop comprises two interchanged loops over the same
iteration space.
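In scalar terms, the newly handled shape looks roughly like the following plain-C++ sketch (names are illustrative, standing in for `scf.parallel`); fusion is legal here because each point of the second nest only reads the value produced at the same point of the first:

```cpp
#include <array>

constexpr int NI = 2, NJ = 3;
using Grid = std::array<std::array<int, NJ>, NI>;

// Two nests over the same NI x NJ space; the second is the interchange
// (j outer, i inner) of the first.
void unfused(Grid &a, Grid &b) {
  for (int i = 0; i < NI; ++i)
    for (int j = 0; j < NJ; ++j)
      a[i][j] = i * 10 + j;
  for (int j = 0; j < NJ; ++j)   // interchanged order, same space
    for (int i = 0; i < NI; ++i)
      b[i][j] = a[i][j] + 1;
}

// After fusion: one nest, each point does both statements.
void fused(Grid &a, Grid &b) {
  for (int i = 0; i < NI; ++i)
    for (int j = 0; j < NJ; ++j) {
      a[i][j] = i * 10 + j;
      b[i][j] = a[i][j] + 1;     // reads only its own a[i][j]
    }
}
```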
[SCEV] Introduce loop-uniform SCEV classification. (#194304)
This patch extends `ScalarEvolution::LoopDisposition` with a new
`LoopUniform` state to describe SCEVs that are invariant across all
iterations of a given loop, but may still depend on inner-loop induction
variables.
Unlike `LoopInvariant`, which requires the value to be fully invariant
with respect to the loop, `LoopUniform` captures expressions that do not
depend on the loop’s own induction variables, yet may vary in nested
loops. This distinction is useful for analyses and optimizations that
reason about per-iteration stability at a specific loop level.
Example:
```
for (i)
  for (j)
    dep(j);    // uniform w.r.t. i
    dep(i, j); // not uniform w.r.t. i
[4 lines not shown]
[mlir][spirv] Lower math.ctlz to OpenCL.std clz for Kernel targets (#195470)
Lower `math.ctlz` to `spirv.CL.Clz` for targets with Kernel capability.
Shader targets keep the existing GLSL-based fallback implemented via
`spirv.GL.FindUMsb`.
Previously, `math.ctlz` was lowered through the GLSL path using
`spirv.GL.FindUMsb` plus additional SPIR-V ops. That worked for Shader
targets, but failed legalization for OpenCL/Kernel targets where Shader
capability is not supported.
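For reference, the Shader-path expansion derives ctlz from the index of the most significant set bit, which is what `spirv.GL.FindUMsb` provides. A scalar sketch of that relationship (helper names are illustrative, not the actual lowering code):

```cpp
#include <cstdint>

// Index of the most significant set bit; -1 for zero input, matching the
// FindUMsb convention.
int findUMsb(uint32_t x) {
  int msb = -1;
  while (x) {
    x >>= 1;
    ++msb;
  }
  return msb;
}

// ctlz in terms of findUMsb: for a 32-bit value, the leading-zero count is
// 31 - msb, with the zero case (all 32 bits leading zeros) handled separately.
int ctlzViaFindUMsb(uint32_t x) {
  int msb = findUMsb(x);
  return msb < 0 ? 32 : 31 - msb;
}
```

On Kernel targets, `spirv.CL.Clz` computes this in a single instruction, avoiding the extra ops.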
[clang][bytecode] Fix sized builtin operator delete handling (#195741)
**Problem:**
A crash occurs for sized/aligned delete operations triggered through
std::allocator under the new constant evaluator.
`interp__builtin_operator_delete` currently consumes the top of the
interpreter stack as a `Pointer`.
This is correct for unsized delete:
```cpp
__builtin_operator_delete(p);
```
but not for sized/aligned delete reached through
`std::allocator<T>::deallocate`:
[64 lines not shown]
[AMDGPU][Doc] Move barrier documentation to a separate document (#194569)
Create a new "AMDGPU Execution Synchronization" document.
For now, it just documents barriers and their execution model.
Hopefully, over time, we can improve it to document the
programming model of most common methods of synchronizing execution
of threads (e.g. using memory/spinlock).
I kept the documentation mostly as-is, but made some minor changes
to help it flow a bit better as a standalone document. For example,
the fact that barriers work at a wavefront granularity has been moved
to the section about `s_barrier` specifically.
I also moved the note about barrier objects existing within a scope
into the main documentation. As a result, the "target-specific properties"
section has been eliminated.
[fir] Lower to llvm int constants with appropriately typed int attrs (#195861)
When we lower fir operations to llvm int constants, we used to always
generate `llvm.mlir.constant`s with an i64 integer attribute regardless
of the width of the constant type. This made some llvm dialect-level
folding hit assertions in some cases.
Fix this by generating the appropriately typed integer attributes
matching the constant type.
[ELF,test] Cover --why-live mark() paths in MarkLive (#196007)
Add cases that exercise the non-parallel mark() loop reached only when
TrackWhyLive is true: cNamedSections.lookup in resolveReloc
(__libc_atexit via __start_/__stop_), the nextInSectionGroup fallthrough, and the
.eh_frame personality CIE relocation processed by scanEhFrameSection.
MarkLive.cpp coverage on check-lld-elf goes 90.88% -> 92.18% regions,
84.15% -> 86.04% branches.
[MLIR][XeGPU] Clean up the temporary layout usage in XeGPU test (#195739)
This PR cleans up the XeGPU test to remove the temporary layout usage.
All distribution and unrolling tests no longer use the temporary layout
from the operation and TensorDescriptor, since the recovery process does
not honor the temporary layout and depends only on the anchor layout.
It also refactors the layout function implementation by removing
recursive loops in getDistributeLayoutAttr(), and fixes two issues
surfaced by the test cleanup: adding layout recovery support for the
Extract/Insert ops and the tensor descriptor type.
[LoopFusion] Document LoopFusion Pass (#192926)
The LoopFusion pass, currently disabled by default, lacks documentation. This patch is the first attempt to document the flow and current limitations.
Assisted-by: Claude Opus 4.6
[LiveDebugValues] Avoid SmallSet for dead registers (#195841)
transferRegisterDef builds a list of dead registers and removes open ranges for
debug locations that use those registers. This list used a SmallSet, so each
insert also does uniquing in the hot per-instruction path. This showed up under
SmallSet<Register, 32>::insertImpl on profiles of sqlite on aarch64-O0-g.
Using a SmallVector instead and uniquing in collectIDsForRegs improves
compile-time.
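The change follows a common pattern: append into a vector in the hot loop with no per-insert uniquing, then sort and deduplicate once before use. A standalone sketch with standard containers (names are illustrative, not the LiveDebugValues code):

```cpp
#include <algorithm>
#include <vector>

// Collect possibly-duplicated register IDs cheaply, then deduplicate once.
// A set would pay hashing/uniquing cost on every insert in the hot path;
// here each append is O(1) and uniquing happens a single time at the end.
std::vector<unsigned> collectUnique(const std::vector<unsigned> &deadRegs) {
  std::vector<unsigned> regs;
  regs.reserve(deadRegs.size());
  for (unsigned r : deadRegs)
    regs.push_back(r);            // plain append, no uniquing here
  std::sort(regs.begin(), regs.end());
  regs.erase(std::unique(regs.begin(), regs.end()), regs.end());
  return regs;
}
```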
CTMark geomean:
- stage1-O0-g: -0.35%
- stage1-aarch64-O0-g: -0.72%
- stage2-O0-g: -0.27%
https://llvm-compile-time-tracker.com/compare.php?from=c9d713aa48a714d20b8502d06b9feb24829e6f22&to=6c0d4aafb9e325259c88577d148ac13c643ea993&stat=instructions%3Au
Assisted-by: codex
[RegAlloc] consider urgent evict in evictInterference (#192631)
This assertion causes a crash in programs with high register pressure
when inline assembly is used.
```
assert((ExtraInfo->getCascade(Intf->reg()) < Cascade ||
        VirtReg.isSpillable() < Intf->isSpillable()) &&
       "Cannot decrease cascade number, illegal eviction");
```
The check should account for the case where an urgent eviction may
result in the cascade number being less than
`ExtraInfo->getCascade(Intf->reg())`.
---------
Co-authored-by: Matt Arsenault <arsenm2 at gmail.com>
[CIR][NFC] Upstream mem2reg.cir from incubator (#194517)
Upstream `mem2reg.cir` from incubator.
Check that stack slots are promoted away after CFG flattening.
Partially addresses #156747.