LLVM/project 79f35a2mlir/include/mlir/Analysis/Presburger Matrix.h, mlir/lib/Analysis/Presburger Barvinok.cpp Matrix.cpp

[MLIR][Presburger] Make getSubMatrix exclusive on the right end (#190911)

Currently `getSubMatrix(fromRow, toRow, fromCol, toCol)` forms a
submatrix with both ends inclusive. As a result, it is impossible to form
an empty submatrix, because the assertions in the function reject cases
where `toRow < fromRow`. However, empty submatrices are necessary for
Barvinok procedures (e.g. we might want to inspect the submatrix for
parameters, which will be empty if there are none).

This PR changes it to be inclusive on the left end and exclusive on the
right end, making it the same as canonical C++ ranges.
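
The effect of the half-open convention can be sketched on a toy matrix type. This is a minimal illustration only: `Mat` and `subMatrix` are hypothetical names chosen for the example and are not the Presburger `Matrix` API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy row-major matrix. `Mat` and `subMatrix` are hypothetical names,
// not the MLIR Presburger API.
using Mat = std::vector<std::vector<int>>;

// Returns rows [fromRow, toRow) and columns [fromCol, toCol).
// Equal bounds are legal and describe an empty range.
Mat subMatrix(const Mat &m, std::size_t fromRow, std::size_t toRow,
              std::size_t fromCol, std::size_t toCol) {
  assert(fromRow <= toRow && toRow <= m.size() && "bad row range");
  Mat out;
  for (std::size_t r = fromRow; r < toRow; ++r) {
    assert(fromCol <= toCol && toCol <= m[r].size() && "bad column range");
    out.emplace_back(m[r].begin() + fromCol, m[r].begin() + toCol);
  }
  return out;
}
```

With this convention, `subMatrix(m, 0, 2, 3, 3)` on a 2x3 matrix yields two rows with no columns, which an inclusive-inclusive interface cannot express.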
DeltaFile
+7-8mlir/lib/Analysis/Presburger/Barvinok.cpp
+5-5mlir/lib/Analysis/Presburger/Matrix.cpp
+2-1mlir/include/mlir/Analysis/Presburger/Matrix.h
+14-143 files

LLVM/project 1c70435clang/lib/CodeGen/TargetBuiltins ARM.cpp

[clang][AArch64][nfc] Remove redundant truncation for FP16 reduction builtins (#195825)

The following non-overloaded NEON builtins already return the expected
result type, so CodeGen does not need to truncate their results:

  * BI__builtin_neon_vmaxv_f16
  * BI__builtin_neon_vmaxvq_f16
  * BI__builtin_neon_vminv_f16
  * BI__builtin_neon_vminvq_f16
  * BI__builtin_neon_vmaxnmv_f16
  * BI__builtin_neon_vmaxnmvq_f16
  * BI__builtin_neon_vminnmv_f16
  * BI__builtin_neon_vminnmvq_f16

Remove the redundant truncation from AArch64 CodeGen.
DeltaFile
+8-15clang/lib/CodeGen/TargetBuiltins/ARM.cpp
+8-151 files

LLVM/project 8d61b16llvm/lib/TargetParser TargetDataLayout.cpp, llvm/unittests/TargetParser TripleTest.cpp

[RISC-V] Add support for cheriot ABI in DataLayout (#190806)

CHERIoT uses the same DataLayout setup as RISC-V Y base, but does not share instruction encodings with it.
DeltaFile
+8-6llvm/lib/TargetParser/TargetDataLayout.cpp
+13-0llvm/unittests/TargetParser/TripleTest.cpp
+21-62 files

LLVM/project 1e52b49llvm/test/CodeGen/RISCV reserved-reg-errors.ll, llvm/test/CodeGen/RISCV/rvv fixed-vectors-mask-logic.ll

[RISCV] Fix duplicate RUN lines in tests (#182272)
DeltaFile
+1-1llvm/test/CodeGen/RISCV/reserved-reg-errors.ll
+0-2llvm/test/CodeGen/RISCV/rvv/fixed-vectors-mask-logic.ll
+1-32 files

LLVM/project b9bcdabmlir/lib/Dialect/SCF/Transforms ParallelLoopFusion.cpp, mlir/test/Dialect/SCF parallel-loop-fusion.mlir

[MLIR] Parallel loop fusion extended to interchanged loops. (#191245)

This patch extends fusion of two parallel loops to the case where the
second parallel loop consists of two interchanged loops with the same
iteration space.
DeltaFile
+48-2mlir/lib/Dialect/SCF/Transforms/ParallelLoopFusion.cpp
+26-0mlir/test/Dialect/SCF/parallel-loop-fusion.mlir
+74-22 files

LLVM/project a257e2allvm/include/llvm/Analysis ScalarEvolution.h, llvm/lib/Analysis ScalarEvolution.cpp

[SCEV] Introduce loop-uniform SCEV classification. (#194304)

This patch extends `ScalarEvolution::LoopDisposition` with a new
`LoopUniform` state to describe SCEVs that are invariant across all
iterations of a given loop, but may still depend on inner-loop induction
variables.

Unlike `LoopInvariant`, which requires the value to be fully invariant
with respect to the loop, LoopUniform captures expressions that do not
depend on the loop’s own induction variables, yet may vary in nested
loops. This distinction is useful for analyses and optimizations that
reason about per-iteration stability at a specific loop level.
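
The distinction can be sketched with a toy model. It assumes a single perfect loop nest in which loops are identified by nesting depth and an expression is described only by the induction variables it reads. The `Disposition` enum and `classify` function are hypothetical and much simpler than `ScalarEvolution::LoopDisposition`.

```cpp
#include <cassert>
#include <set>

// Toy model: loops are numbered by nesting depth (0 = outermost) and an
// expression is summarized by the depths of the induction variables it
// reads. Hypothetical types, not the ScalarEvolution API.
enum class Disposition { Invariant, Uniform, Variant };

// Classifies an expression with respect to the loop at `loopDepth`.
Disposition classify(const std::set<int> &ivDepthsUsed, int loopDepth) {
  assert(loopDepth >= 0 && "loop depth must be non-negative");
  bool usesOwnIV = false;
  bool usesInnerIV = false;
  for (int d : ivDepthsUsed) {
    if (d == loopDepth)
      usesOwnIV = true;    // reads this loop's own induction variable
    else if (d > loopDepth)
      usesInnerIV = true;  // reads an induction variable of a nested loop
  }
  if (usesOwnIV)
    return Disposition::Variant;
  if (usesInnerIV)
    return Disposition::Uniform;
  return Disposition::Invariant;
}
```

In this model, an expression reading only the inner variable is `Uniform` with respect to the outer loop, one reading both variables is not, and one reading neither stays `Invariant`.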

Example:
```
for (i)
  for (j)
    dep(j);       // uniform w.r.t. i
    dep(i, j);    // not uniform w.r.t. i

    [4 lines not shown]
DeltaFile
+12-12llvm/test/Analysis/ScalarEvolution/max-expr-cache.ll
+24-0llvm/include/llvm/Analysis/ScalarEvolution.h
+20-2llvm/lib/Analysis/ScalarEvolution.cpp
+9-9llvm/test/Analysis/ScalarEvolution/incorrect-exit-count.ll
+7-7llvm/test/Analysis/ScalarEvolution/different-loops-recs.ll
+6-6llvm/test/Analysis/ScalarEvolution/exit-count-select-safe.ll
+78-3610 files not shown
+103-6116 files

LLVM/project 7e1f6b6llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.rsq.f16.ll llvm.amdgcn.rsq.ll

AMDGPU/GlobalISel: Implement RegBankLegalizeRules for amdgcn_rsq and amdgcn_rsq_clamp. (#187672)
DeltaFile
+167-91llvm/test/CodeGen/AMDGPU/llvm.amdgcn.rsq.f16.ll
+149-2llvm/test/CodeGen/AMDGPU/llvm.amdgcn.rsq.ll
+80-2llvm/test/CodeGen/AMDGPU/llvm.amdgcn.rsq.clamp.ll
+8-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+404-954 files

LLVM/project 1fb93e8llvm/lib/Transforms/Vectorize VPlanTransforms.cpp, llvm/test/Transforms/LoopVectorize vplan-widen-select-instruction.ll

[VPlan] Deep-traverse in narrowToSingleScalars (#194680)

vputils::isSingleScalar already returns false for Replicate in
replicate regions.
DeltaFile
+3-5llvm/test/Transforms/LoopVectorize/vplan-widen-select-instruction.ll
+1-5llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+4-102 files

LLVM/project 8079feaclang/lib/CIR/CodeGen CIRGenCUDANV.cpp, clang/test/CIR/CodeGenCUDA device-stub.cu kernel-call.cu

Add HIP LLVM lowering and proper HIP host kernel handle emission.
DeltaFile
+39-3clang/test/CIR/CodeGenCUDA/device-stub.cu
+2-2clang/test/CIR/CodeGenCUDA/kernel-call.cu
+2-2clang/lib/CIR/CodeGen/CIRGenCUDANV.cpp
+1-1clang/test/CIR/CodeGenHIP/simple.cpp
+44-84 files

LLVM/project f86c2f5llvm/lib/Target/AMDGPU AMDGPUInstructionSelector.cpp, llvm/test/CodeGen/AMDGPU vector-reduce-fadd.ll vector-reduce-fmul.ll

[AMDGPU][GlobalISel] Fold G_TRUNC(G_LSHR(x, 16)) into hi16 subregister copy in True16 mode
DeltaFile
+52-82llvm/test/CodeGen/AMDGPU/vector-reduce-fadd.ll
+52-82llvm/test/CodeGen/AMDGPU/vector-reduce-fmul.ll
+55-47llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+12-36llvm/test/CodeGen/AMDGPU/flat-saddr-store.ll
+21-19llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll
+21-19llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
+213-2853 files not shown
+220-2999 files

LLVM/project 915a1e5llvm/test/CodeGen/SystemZ memcpy-03.ll memmove-01.ll

[SystemZ] Remove superfluous args in tests. (#196022)

The third %val argument only makes sense for memset, so remove it from
the memcpy/memmove tests.
DeltaFile
+69-69llvm/test/CodeGen/SystemZ/memcpy-03.ll
+69-69llvm/test/CodeGen/SystemZ/memmove-01.ll
+138-1382 files

LLVM/project 480f144llvm/lib/Transforms/Vectorize VPlanRecipes.cpp

[VPlan] Use map_to_vector to create ParamTys vector (NFC). (#195931)

Use map_to_vector to slightly simplify code.
DeltaFile
+5-6llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+5-61 files

LLVM/project f9c0cf5mlir/lib/Conversion/MathToSPIRV MathToSPIRV.cpp, mlir/test/Conversion/MathToSPIRV math-to-opencl-spirv.mlir

[mlir][spirv] Lower math.ctlz to OpenCL.std clz for Kernel targets (#195470)

Lower `math.ctlz` to `spirv.CL.Clz` for targets with Kernel capability.
Shader targets keep the existing GLSL-based fallback implemented via
`spirv.GL.FindUMsb`.

Previously, `math.ctlz` was lowered through the GLSL path using
`spirv.GL.FindUMsb` plus additional SPIR-V ops. That worked for Shader
targets, but failed legalization for OpenCL/Kernel targets where Shader
capability is not supported.
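
The relationship between a leading-zero count and the most-significant-bit index can be sketched for 32-bit values as follows. This illustrates the arithmetic only, under the assumption that the fallback subtracts the MSB index from 31 and special-cases zero. `findUMsb` and `ctlzViaMsb` are hypothetical helper names, not code from the conversion pattern.

```cpp
#include <cassert>
#include <cstdint>

// Index of the most significant set bit, or -1 when x == 0
// (the result GLSL.std.450 FindUMsb defines for a zero input).
int findUMsb(std::uint32_t x) {
  int idx = -1;
  while (x != 0) {
    x >>= 1;
    ++idx;
  }
  return idx;
}

// Count of leading zeros rebuilt from the MSB index.
int ctlzViaMsb(std::uint32_t x) {
  int msb = findUMsb(x);
  assert(msb >= -1 && msb <= 31 && "MSB index out of range");
  return x == 0 ? 32 : 31 - msb;
}
```

A target that provides a native count-leading-zeros instruction, as the OpenCL extended instruction set does, needs none of this extra arithmetic.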
DeltaFile
+42-33mlir/lib/Conversion/MathToSPIRV/MathToSPIRV.cpp
+9-0mlir/test/Conversion/MathToSPIRV/math-to-opencl-spirv.mlir
+51-332 files

LLVM/project 7234297clang/lib/AST/ByteCode InterpBuiltin.cpp, clang/test/AST/ByteCode new-delete.cpp

[clang][bytecode] Fix sized builtin operator delete handling (#195741)

**Problem:**

The new constant evaluator crashes on sized/aligned delete operations
triggered through std::allocator.

`interp__builtin_operator_delete` currently consumes the top of the
interpreter stack as a `Pointer`.

This is correct for unsized delete:

```cpp
__builtin_operator_delete(p);
```

but not for sized/aligned delete reached through
`std::allocator<T>::deallocate`:


    [64 lines not shown]
DeltaFile
+10-0clang/test/AST/ByteCode/new-delete.cpp
+8-0clang/lib/AST/ByteCode/InterpBuiltin.cpp
+18-02 files

LLVM/project fa74542llvm/docs AMDGPUExecutionSynchronization.rst AMDGPUUsage.rst

[AMDGPU][Doc] Move barrier documentation to a separate document (#194569)

Create a new "AMDGPU Execution Synchronization" document.
For now, it just documents barriers and their execution model.
Hopefully, over time, we can improve it to document the
programming model of the most common methods of synchronizing execution
of threads (e.g. using memory/spinlock).

I kept the documentation mostly as-is, but I made some minor changes
to make it flow a bit better as a standalone document. For example,
the fact that barriers work at a wavefront granularity has been moved
to the section about `s_barrier` specifically.
I also moved the note about barrier objects existing within a scope
in the main documentation. As a result, the "target-specific properties"
section has been eliminated.
DeltaFile
+430-0llvm/docs/AMDGPUExecutionSynchronization.rst
+4-411llvm/docs/AMDGPUUsage.rst
+4-0llvm/docs/UserGuides.rst
+438-4113 files

LLVM/project d027cacflang/lib/Optimizer/Support Utils.cpp, flang/test/Fir logical-convert.fir convert-to-llvm-openmp-and-fir.fir

[fir] Lower to llvm int constants with appropriately typed int attrs (#195861)

When we lower fir operations to llvm int constants, we used to always
generate `llvm.mlir.constant`s with an i64 integer attribute regardless
of the width of the constant type. This made some llvm dialect level
folding hit assertions in some cases.

Fix this by generating the appropriately typed integer attributes
matching the constant type.
DeltaFile
+46-46flang/test/Fir/logical-convert.fir
+5-5flang/test/Fir/convert-to-llvm-openmp-and-fir.fir
+2-2flang/test/Fir/convert-to-llvm.fir
+1-1flang/test/Fir/tbaa.fir
+1-1flang/test/Fir/global-initialization.fir
+1-1flang/lib/Optimizer/Support/Utils.cpp
+56-566 files

LLVM/project 8513771clang/include/clang/Analysis/Analyses/LifetimeSafety Origins.h, clang/lib/Analysis/LifetimeSafety Origins.cpp FactsGenerator.cpp

[LifetimeSafety] Track per-field origins for record types
DeltaFile
+237-4clang/test/Sema/warn-lifetime-safety.cpp
+82-8clang/lib/Analysis/LifetimeSafety/Origins.cpp
+59-24clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
+47-13clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+21-12clang/lib/Analysis/LifetimeSafety/LiveOrigins.cpp
+4-6clang/test/Sema/warn-lifetime-safety-dangling-field.cpp
+450-672 files not shown
+455-678 files

LLVM/project 1b6cc52clang/lib/CIR/CodeGen CIRGenBuiltinAMDGPU.cpp, clang/test/CIR/CodeGenHIP builtins-amdgcn.hip

[CIR][AMDGPU] Add lowering for amdgcn ds swizzle builtin.
DeltaFile
+10-1clang/lib/CIR/CodeGen/CIRGenBuiltinAMDGPU.cpp
+8-0clang/test/CIR/CodeGenHIP/builtins-amdgcn.hip
+18-12 files

LLVM/project b197418lld/test/ELF why-live.test

[ELF,test] Cover --why-live mark() paths in MarkLive (#196007)

Add cases that exercise the non-parallel mark() loop reached only when
TrackWhyLive is true: cNamedSections.lookup in resolveReloc
(__libc_atexit via __start_/__stop_), the nextInSectionGroup fallthrough,
and the .eh_frame personality CIE relocation processed by
scanEhFrameSection.

MarkLive.cpp coverage on check-lld-elf goes 90.88% -> 92.18% regions,
84.15% -> 86.04% branches.
DeltaFile
+59-0lld/test/ELF/why-live.test
+59-01 files

LLVM/project 6ab5d93clang/lib/CIR/Dialect/Transforms LoweringPrepare.cpp

Simplify `RegisterFunction` call on target divergence
DeltaFile
+8-12clang/lib/CIR/Dialect/Transforms/LoweringPrepare.cpp
+8-121 files

LLVM/project b2dd248clang/lib/CIR/Dialect/Transforms LoweringPrepare.cpp

fix fmt
DeltaFile
+4-5clang/lib/CIR/Dialect/Transforms/LoweringPrepare.cpp
+4-51 files

LLVM/project 4248db2clang/lib/CIR/Dialect/Transforms LoweringPrepare.cpp, clang/test/CIR/CodeGenCUDA device-stub.cu

[CIR][HIP] Handle HIP module constructor and destructor emission
DeltaFile
+147-5clang/lib/CIR/Dialect/Transforms/LoweringPrepare.cpp
+121-0clang/test/CIR/CodeGenCUDA/device-stub.cu
+268-52 files

LLVM/project 95aecf0clang/lib/CIR/CodeGen CIRGenModule.cpp, clang/test/CIR/Lowering fmv-features.cir

use interleaveComma and features.reserve
DeltaFile
+35-0clang/test/CIR/Lowering/fmv-features.cir
+6-3clang/lib/CIR/CodeGen/CIRGenModule.cpp
+41-32 files

LLVM/project 3053a3cmlir/lib/Dialect/XeGPU/Transforms XeGPULayoutImpl.cpp, mlir/test/Dialect/XeGPU xegpu-wg-to-sg.mlir sg-to-wi-experimental-unit.mlir

[MLIR][XeGPU] Clean up the temporary layout usage in XeGPU test (#195739)

This PR cleans up the XeGPU test to remove the temporary layout usage.
All distribution and unrolling tests no longer use the temporary layout
from the operation and TensorDescriptor, since the recovery process won't
honor the temporary layout and only depends on the anchor layout.
It also refactors the layout function implementation by removing
recursive loops in getDistributeLayoutAttr(), and fixes two issues
surfaced by the test cleanup: adding layout recovery support for
Extract/Insert op and tensor descriptor type.
DeltaFile
+314-171mlir/test/Dialect/XeGPU/xegpu-wg-to-sg.mlir
+106-217mlir/test/Dialect/XeGPU/sg-to-wi-experimental-unit.mlir
+165-42mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutImpl.cpp
+84-95mlir/test/Dialect/XeGPU/sg-to-wi-experimental.mlir
+102-58mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-rr.mlir
+75-85mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-elemwise.mlir
+846-6688 files not shown
+935-73014 files

LLVM/project cb15e67llvm/docs LoopFusion.rst Passes.rst

[LoopFusion] Document LoopFusion Pass (#192926)

The LoopFusion pass, currently disabled by default, lacks documentation. This patch is the first attempt to document the flow and current limitations.

Assisted by : Claude Opus 4.6
DeltaFile
+442-0llvm/docs/LoopFusion.rst
+7-0llvm/docs/Passes.rst
+449-02 files

LLVM/project 9b0d277llvm/lib/CodeGen/LiveDebugValues VarLocBasedImpl.cpp

[LiveDebugValues] Avoid SmallSet for dead registers (#195841)

transferRegisterDef builds a list of dead registers and removes open ranges for
debug locations that use those registers. This list used a SmallSet, so each
insert also does uniquing in the hot per-instruction path. This showed up under
SmallSet<Register, 32>::insertImpl on profiles of sqlite on aarch64-O0-g.

Using a SmallVector instead and uniquing in collectIDsForRegs improves
compile-time.
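
The general pattern, appending on the hot path and removing duplicates once before the list is consumed, can be sketched as below. `std::vector` stands in for `SmallVector`, and sort-plus-unique is only one possible way to deduplicate. The commit message does not say how `collectIDsForRegs` does it, so the helper names and the strategy here are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hot path: append without checking for duplicates.
void recordDeadReg(std::vector<unsigned> &deadRegs, unsigned reg) {
  deadRegs.push_back(reg);
}

// Cold path: drop duplicates once, just before the list is consumed.
void uniqueRegs(std::vector<unsigned> &deadRegs) {
  std::sort(deadRegs.begin(), deadRegs.end());
  deadRegs.erase(std::unique(deadRegs.begin(), deadRegs.end()),
                 deadRegs.end());
  assert(std::adjacent_find(deadRegs.begin(), deadRegs.end()) ==
             deadRegs.end() &&
         "duplicates remain");
}
```

The trade-off is that each insertion becomes a plain append, while the cost of uniquing is paid once per batch instead of once per element.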

CTMark geomean:
- stage1-O0-g: -0.35%
- stage1-aarch64-O0-g: -0.72%
- stage2-O0-g: -0.27%

https://llvm-compile-time-tracker.com/compare.php?from=c9d713aa48a714d20b8502d06b9feb24829e6f22&to=6c0d4aafb9e325259c88577d148ac13c643ea993&stat=instructions%3Au

Assisted-by: codex
DeltaFile
+8-9llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
+8-91 files

LLVM/project d97c568clang/test/Analysis/Scalable/ssaf-analyzer analyzer.test, clang/test/Analysis/Scalable/ssaf-analyzer/Inputs lu.json

Revert "[clang][ssaf] Add `clang-ssaf-analyzer` (#188881)" (#195993)

This reverts commit 51d2a66d52a95beeb31de81dd819c603062a5770 introduced by PR https://github.com/llvm/llvm-project/pull/188881 because of an HWSan failure.
DeltaFile
+0-141clang/test/Analysis/Scalable/ssaf-analyzer/analyzer.test
+0-134clang/tools/clang-ssaf-analyzer/SSAFAnalyzer.cpp
+0-126clang/test/Analysis/Scalable/ssaf-analyzer/Inputs/lu.json
+0-90clang/test/Analysis/Scalable/ssaf-analyzer/Outputs/all.json
+0-81clang/test/Analysis/Scalable/ssaf-analyzer/Outputs/both.json
+0-70clang/test/Analysis/Scalable/ssaf-analyzer/Outputs/pairs.json
+0-64215 files not shown
+0-93121 files

LLVM/project dfb8d68llvm/include/llvm/CodeGen RegAllocEvictionAdvisor.h, llvm/lib/CodeGen RegAllocEvictionAdvisor.cpp RegAllocGreedy.cpp

[RegAlloc] consider urgent evict in evictInterference (#192631)

This assertion causes a crash in programs with high register pressure
when inline assembly is used.

```
    assert((ExtraInfo->getCascade(Intf->reg()) < Cascade ||
            VirtReg.isSpillable() < Intf->isSpillable()) &&
           "Cannot decrease cascade number, illegal eviction");
```

It should account for the case where an urgent eviction may result in
the cascade being less than `ExtraInfo->getCascade(Intf->reg())`.

---------

Co-authored-by: Matt Arsenault <arsenm2 at gmail.com>
DeltaFile
+17-12llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp
+17-0llvm/test/CodeGen/RISCV/regalloc-greedy-urgent-evict.ll
+4-0llvm/include/llvm/CodeGen/RegAllocEvictionAdvisor.h
+2-0llvm/lib/CodeGen/RegAllocGreedy.cpp
+40-124 files

LLVM/project d6a1064clang/test/CIR/Transforms mem2reg.cir

[CIR][NFC] Upstream mem2reg.cir from incubator (#194517)

Upstream `mem2reg.cir` from incubator.

Check that stack slots are promoted away after CFG flattening.

Partially addresses #156747.
DeltaFile
+23-0clang/test/CIR/Transforms/mem2reg.cir
+23-01 files

LLVM/project 5f72b7cllvm/docs AMDGPUUsage.rst, llvm/docs/AMDGPU DeveloperGuideline.rst

[NFC][AMDGPU][Doc] Add developer guideline

This guideline covers topics on top of the existing LLVM guidelines.
DeltaFile
+442-0llvm/docs/AMDGPU/DeveloperGuideline.rst
+1-0llvm/docs/AMDGPUUsage.rst
+443-02 files