LLVM/project 73bcfb6mlir/lib/Dialect/Affine/Transforms AffineLoopInvariantCodeMotion.cpp, mlir/test/Dialect/Affine affine-loop-invariant-code-motion.mlir

[mlir][Affine] Fix LICM incorrectly hoisting stores from zero-trip-count loops (#189165)

The affine-loop-invariant-code-motion pass was hoisting side-effectful
operations (e.g. affine.store) out of loops whose trip count is
statically known to be zero. This caused stores to execute
unconditionally even though the loop body should never run, producing
incorrect results.

The fix skips hoisting of non-memory-effect-free ops when
getConstantTripCount returns 0. Pure/side-effect-free ops are still
eligible for hoisting because they cannot change observable program
state.
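
A minimal sketch of why the guard matters, written as a toy Python model rather than the pass's actual C++ code. The names `run_loop` and `may_hoist` are hypothetical, and `may_hoist` models only the zero-trip guard described above; the other legality checks the real pass performs are omitted.

```python
from typing import Dict, Optional


def run_loop(lower: int, upper: int, memory: Dict[str, int],
             hoist_store: bool) -> Dict[str, int]:
    """Model `for i in [lower, upper): store 7 -> A`, with or without hoisting."""
    mem = dict(memory)  # work on a copy so callers can compare before/after
    if hoist_store:
        mem["A"] = 7  # store moved above the loop: executes unconditionally
        for _ in range(lower, upper):
            pass
    else:
        for _ in range(lower, upper):
            mem["A"] = 7  # store stays in the body: executes once per iteration
    return mem


def may_hoist(trip_count: Optional[int], memory_effect_free: bool) -> bool:
    """Zero-trip guard only: with a known trip count of 0, hoist only pure ops.

    `trip_count` is None when the trip count is not statically known.
    """
    if trip_count == 0 and not memory_effect_free:
        return False
    return True
```

In this model, `run_loop(0, 0, {"A": 0}, hoist_store=True)` changes `A` even though the body never runs, which is the miscompile the commit describes; `may_hoist(0, False)` rejects that hoist while `may_hoist(0, True)` still allows pure ops to move.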

Fixes #128273

Assisted-by: Claude Code
DeltaFile
+74-2mlir/test/Dialect/Affine/affine-loop-invariant-code-motion.mlir
+12-1mlir/lib/Dialect/Affine/Transforms/AffineLoopInvariantCodeMotion.cpp
+86-32 files

LLVM/project e96ec28llvm/test/CodeGen/AMDGPU amdgpu-sw-lower-lds-multi-static-dynamic-indirect-access-asan.ll amdgpu-sw-lower-lds-static-dynamic-indirect-access-asan.ll

[AMDGPU] Use ASan callback functions instead of inline checks in SW lower LDS pass
DeltaFile
+30-157llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-multi-static-dynamic-indirect-access-asan.ll
+8-119llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-dynamic-indirect-access-asan.ll
+6-117llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-dynamic-indirect-access-asan.ll
+3-118llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-lds-test-atomicrmw-asan.ll
+7-98llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-indirect-access-asan.ll
+4-89llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-dynamic-lds-test-asan.ll
+58-6987 files not shown
+113-96913 files

LLVM/project 9e516f5llvm/include/llvm/CodeGen MachinePipeliner.h, llvm/lib/CodeGen MachinePipeliner.cpp

[MachinePipeliner] Remove isLoopCarriedDep and use DDG (#174394)

This patch completely removes `isLoopCarriedDep`, which was used
previously to identify loop-carried dependencies in the DAG. Now that we
have the DDG representation, this special handling is no longer
necessary. Simply replacing its usage with the DDG causes several tests
to fail, since cycle detection takes some of the validation-only edges
in the DDG into account. To address this, this patch introduces extra
edges in the DDG, which are used only for cycle detection and not for
other parts of the pass (e.g., scheduling). The extra edges are
determined to preserve the existing behavior of the pass as closely as
possible, which makes the predicates for adding them somewhat complex.

Split off from #135148, and the final patch in the series for #135148
DeltaFile
+0-335llvm/test/CodeGen/AArch64/sms-instruction-scheduled-at-correct-cycle.mir
+42-50llvm/lib/CodeGen/MachinePipeliner.cpp
+15-9llvm/include/llvm/CodeGen/MachinePipeliner.h
+57-3943 files

LLVM/project 746439dllvm/include/llvm/IR DiagnosticInfo.h, llvm/lib/Target/AMDGPU AMDGPUAsmPrinter.cpp

DiagnosticInfo: Fix missing LLVM_LIFETIME_BOUND on Twine arguments

Fix use after free errors in DiagnosticInfoResourceLimit uses.
DeltaFile
+16-22llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+2-1llvm/include/llvm/IR/DiagnosticInfo.h
+18-232 files

LLVM/project 94e3fe7offload/test/libc rpc_callback.cpp

Merge branch 'main' into users/kasuga-fj/pipeliner-remove-isloopcarrieddep
DeltaFile
+1-1offload/test/libc/rpc_callback.cpp
+1-11 files

LLVM/project 3bc3e30llvm/test/CodeGen/AArch64 sms-instruction-scheduled-at-correct-cycle.mir

remove useless test
DeltaFile
+0-338llvm/test/CodeGen/AArch64/sms-instruction-scheduled-at-correct-cycle.mir
+0-3381 files

LLVM/project a2d3783offload/test/libc rpc_callback.cpp

[offload][libc] Adapt test to changes in #190239 (#190330)
DeltaFile
+1-1offload/test/libc/rpc_callback.cpp
+1-11 files

LLVM/project 87439fdllvm/include/llvm/CodeGen MachinePipeliner.h, llvm/lib/CodeGen MachinePipeliner.cpp

[MachinePipeliner] Remove isLoopCarriedDep and use DDG
DeltaFile
+42-50llvm/lib/CodeGen/MachinePipeliner.cpp
+15-9llvm/include/llvm/CodeGen/MachinePipeliner.h
+3-0llvm/test/CodeGen/AArch64/sms-instruction-scheduled-at-correct-cycle.mir
+60-593 files

LLVM/project fa8a303llvm/lib/Target/AArch64 AArch64TargetTransformInfo.cpp, llvm/test/Transforms/InstCombine/AArch64 neon-fcvtz-roundtrip.ll

[AArch64] Fold fcvtzu/fcvtzs(uitofp/sitofp(x)) roundtrip

stack-info: PR: https://github.com/llvm/llvm-project/pull/190328, branch: users/SavchenkoValeriy/feat/instcombine/fcvtzu_fcvtzs_roundtrip/stack/2
DeltaFile
+210-0llvm/test/Transforms/InstCombine/AArch64/neon-fcvtz-roundtrip.ll
+47-0llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+257-02 files

LLVM/project fb89973llvm/include/llvm/Transforms/InstCombine InstCombiner.h, llvm/lib/Transforms/InstCombine InstCombineCasts.cpp

[InstCombine][NFC] Expose isKnownExactCastIntToFP as a public method

stack-info: PR: https://github.com/llvm/llvm-project/pull/190327, branch: users/SavchenkoValeriy/feat/instcombine/fcvtzu_fcvtzs_roundtrip/stack/1
DeltaFile
+6-9llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
+4-0llvm/include/llvm/Transforms/InstCombine/InstCombiner.h
+10-92 files

LLVM/project bd6a0ebclang/include/clang/CIR/Dialect/IR CIROps.td, clang/lib/CIR/Lowering/DirectToLLVM LowerToLLVM.cpp

[CIR] Auto-generate matchAndRewrite for one-to-one CIR-to-LLVM lowerings

When a CIR op specifies a non-empty `llvmOp` field, the lowering
emitter now generates the `matchAndRewrite` body that converts the
result type and forwards all operands to the corresponding LLVM op.
This removes 27 boilerplate lowering patterns from LowerToLLVM.cpp.

Ops needing custom logic (FMaxNumOp/FMinNumOp for FastmathFlags::nsz)
override `llvmOp = ""` to retain hand-written implementations.

Also fixes llvmOp names (TruncOp -> FTruncOp, FloorOp -> FFloorOp)
and adds a diagnostic rejecting conflicting llvmOp + custom constructor.
DeltaFile
+0-255clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+30-5clang/utils/TableGen/CIRLoweringEmitter.cpp
+6-2clang/include/clang/CIR/Dialect/IR/CIROps.td
+36-2623 files

LLVM/project ff86be2mlir/lib/Dialect/MemRef/Transforms FlattenMemRefs.cpp, mlir/test/Dialect/MemRef flatten_memref.mlir

[MLIR][MemRef] Fix AllocOp/AllocaOp flattening domination violation (#188980)

The generic MemRefRewritePattern handles AllocOp/AllocaOp by calling
getFlattenMemrefAndOffset with the op's own result as the source memref.
This inserts ExtractStridedMetadataOp and ReinterpretCastOp that consume
op.result before the alloc op itself in the block. After
replaceOpWithNewOp, op.result is RAUW'd to the new ReinterpretCastOp
result, leaving those earlier ops with forward references — a domination
violation caught by MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS.

Replace the AllocOp/AllocaOp cases in MemRefRewritePattern with a
dedicated AllocLikeFlattenPattern that never touches op.result until the
final replaceOpWithNewOp:
- sizes come from op.getMixedSizes() (operands, not the result)
- strides come from getStridesAndOffset on the MemRefType
- the flat allocation size is computed via
getLinearizedMemRefOffsetAndSize plus the static base offset so the
buffer covers [0, offset+extent)
- castAllocResult is simplified to take the pre-computed sizes and

    [10 lines not shown]
DeltaFile
+97-34mlir/lib/Dialect/MemRef/Transforms/FlattenMemRefs.cpp
+73-0mlir/test/Dialect/MemRef/flatten_memref.mlir
+170-342 files

LLVM/project 7c1d91cbolt/runtime instr.cpp

[BOLT] Move extern "C" out of unnamed namespace (#190282)

GCC 15 changes how it interprets extern "C" in unnamed namespaces and
gives the variable internal linkage.
DeltaFile
+2-2bolt/runtime/instr.cpp
+2-21 files

LLVM/project d725513mlir/lib/Dialect/Affine/Analysis Utils.cpp, mlir/test/Dialect/SCF foreach-thread-canonicalization.mlir

[MLIR][Affine] Fix null operands in simplifyConstrainedMinMaxOp (#189246)

`mlir::affine::simplifyConstrainedMinMaxOp` called
`canonicalizeMapAndOperands` with `newOperands` that could contain null
`Value()`s. These nulls came from
`unpackOptionalValues(constraints.getMaybeValues(), newOperands)` where
internal constraint variables added by `appendDimVar` (for `dimOp`,
`dimOpBound`, and `resultDimStart*`) have no associated SSA values.

Passing null Values to `canonicalizeMapAndOperands` risks undefined
behavior:
- `seenDims.find(null_value)` in the DenseMap causes all null operands
to collide at the same key, producing incorrect dim remapping.
- Any null operand that remains referenced in the result map would
propagate as a null Value into `AffineValueMap`, crashing callers that
try to use those operands to create ops.

Fix: Before calling `canonicalizeMapAndOperands`, filter null operands
from `newOperands` by replacing their dim/symbol positions in `newMap`

    [6 lines not shown]
DeltaFile
+52-1mlir/test/Dialect/SCF/foreach-thread-canonicalization.mlir
+41-0mlir/lib/Dialect/Affine/Analysis/Utils.cpp
+93-12 files

LLVM/project a7bf249mlir/lib/Interfaces/Utils InferIntRangeCommon.cpp, mlir/test/Dialect/Affine int-range-interface.mlir

[mlir][IntRangeAnalysis] Fix assertion in inferAffineExpr for mod with range crossing modulus boundary (#188842)

The "small range with constant divisor" optimization in
`inferAffineExpr` for `AffineExprKind::Mod` assumed that if the dividend
range span (`lhsMax - lhsMin`) is less than the divisor, then the mod
results form a contiguous range. This is not always true, as the range
can straddle a modulus boundary.

For example, `[14, 17] mod 8`:
- Span is 3 < 8, so the old condition passed
- But `14%8=6` and `17%8=1` (wraps at 16)
- `umin=6, umax=1` → assertion `umin.ule(umax)` fails

The fix adds a same-quotient check (`lhsMin/rhs == lhsMax/rhs`) to
ensure both endpoints fall within the same modular period. When they
don't, we fall back to the conservative `[0, divisor-1]` range.
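
The same-quotient check can be sketched in a few lines of Python. This is an illustrative model on plain non-negative integers, not the `APInt`-based code in `InferIntRangeCommon.cpp`; the function name `infer_mod_range` is hypothetical.

```python
from typing import Tuple


def infer_mod_range(lhs_min: int, lhs_max: int, divisor: int) -> Tuple[int, int]:
    """Inclusive range covering x % divisor for every x in [lhs_min, lhs_max]."""
    assert 0 <= lhs_min <= lhs_max and divisor > 0
    # Old condition: the dividend span is smaller than the divisor.
    span_fits = (lhs_max - lhs_min) < divisor
    # Added condition: both endpoints lie in the same modular period.
    same_period = (lhs_min // divisor) == (lhs_max // divisor)
    if span_fits and same_period:
        return (lhs_min % divisor, lhs_max % divisor)
    # Conservative fallback when the range straddles a multiple of the divisor.
    return (0, divisor - 1)
```

For the example above, `infer_mod_range(14, 17, 8)` takes the fallback and returns `(0, 7)`, because `14 // 8 == 1` but `17 // 8 == 2`. A range inside one period, such as `infer_mod_range(9, 12, 8)`, still gets the tight result `(1, 4)`.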

Assisted-by: Cursor (Claude)

Signed-off-by: Yu-Zhewen <zhewenyu at amd.com>
DeltaFile
+12-0mlir/test/Dialect/Affine/int-range-interface.mlir
+7-5mlir/lib/Interfaces/Utils/InferIntRangeCommon.cpp
+19-52 files

LLVM/project c80443cclang/include/clang/StaticAnalyzer/Core/PathSensitive CoreEngine.h ExprEngine.h, clang/lib/StaticAnalyzer/Core ExprEngine.cpp CoreEngine.cpp

[NFC][analyzer] Eliminate SwitchNodeBuilder (#188096)

This commit removes the class `SwitchNodeBuilder` because it just
obscured the logic of switch handling by hiding some parts of it in
another source file.
DeltaFile
+31-11clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+0-23clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
+0-18clang/include/clang/StaticAnalyzer/Core/PathSensitive/CoreEngine.h
+11-1clang/test/Analysis/switch-basics.c
+0-1clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
+42-545 files

LLVM/project 6286d74offload/plugins-nextgen/amdgpu/src rtl.cpp

offload: Parse Triple using triple for amdgcn-amd-amdhsa

Avoid hardcoding the exact triple.
DeltaFile
+12-4offload/plugins-nextgen/amdgpu/src/rtl.cpp
+12-41 files

LLVM/project e46c5a8llvm/test/CodeGen/AArch64 arm64-stur.ll

[AArch64] Regenerate arm64-stur.ll. NFC (#190317)
DeltaFile
+43-24llvm/test/CodeGen/AArch64/arm64-stur.ll
+43-241 files

LLVM/project f91124alldb/include/lldb/Core ModuleList.h Module.h, lldb/source/Commands CommandObjectTarget.cpp

[lldb][Module] Only call LoadScriptingResourceInTarget via ModuleList (#190136)

This patch is motivated by
https://github.com/llvm/llvm-project/pull/189943, where we would like to
print the "these module scripts weren't loaded" warning for *all*
modules batched together. I.e., we want to print the warning *after* all
the script loading attempts, not from within each attempt.

To do so we want to hoist the `ReportWarning` calls in
`Module::LoadScriptingResourceInTarget` out into the callsites. But if
we do that, the callers have to remember to print the warnings. To avoid
this, we redirect all callsites to use
`ModuleList::LoadScriptingResourceInTarget`, which will be responsible
for printing the warnings.

To avoid future accidental uses of
`Module::LoadScriptingResourceInTarget` I moved the API into
`ModuleList` and made it `private`.
DeltaFile
+0-87lldb/source/Core/Module.cpp
+84-1lldb/source/Core/ModuleList.cpp
+5-14lldb/source/Target/Target.cpp
+4-10lldb/source/Commands/CommandObjectTarget.cpp
+5-0lldb/include/lldb/Core/ModuleList.h
+0-2lldb/include/lldb/Core/Module.h
+98-1146 files

LLVM/project 8db1f64mlir/include/mlir/Reducer Tester.h, mlir/lib/Reducer OptReductionPass.cpp Tester.cpp

[mlir][reducer] Remove the restriction that OptReductionPass must be a ModuleOp (#189038)

This PR aims to make the pass more generic by removing the ModuleOp
restriction. This PR reimplements the logic using a standalone
PassManager. Additionally, the isInteresting method has been updated to
accept Operation* for better flexibility. Finally, a dedicated test
directory has been added to improve the organization of OptReductionPass
tests.
DeltaFile
+15-18mlir/lib/Reducer/OptReductionPass.cpp
+17-0mlir/test/mlir-reduce/opt-reduction/dce-test.mlir
+0-17mlir/test/mlir-reduce/dce-test.mlir
+16-0mlir/test/mlir-reduce/opt-reduction/cse-test.mlir
+3-3mlir/lib/Reducer/Tester.cpp
+1-1mlir/include/mlir/Reducer/Tester.h
+52-392 files not shown
+53-428 files

LLVM/project 5347990clang/include/clang/CIR MissingFeatures.h, clang/include/clang/CIR/Dialect/IR CIROps.td

[CIR] Remove OpenCLKernel enum and update the ordering to match llvm::CallingConv
DeltaFile
+8-8clang/include/clang/CIR/Dialect/IR/CIROps.td
+0-4clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+0-1clang/include/clang/CIR/MissingFeatures.h
+8-133 files

LLVM/project df48719llvm/lib/Target/AMDGPU AMDGPULowerKernelArguments.cpp, llvm/test/CodeGen/AMDGPU lower-kernel-arguments-noalias-call-no-ptr-args.ll

[AMDGPU] Add !noalias metadata to mem-accessing calls w/o pointer args (#188949)

addAliasScopeMetadata in AMDGPULowerKernelArguments skips instructions
with empty PtrArgs, including memory-accessing calls that have no
pointer arguments (e.g. builtins like threadIdx()). Because these calls
never receive !noalias metadata, ScopedNoAliasAA cannot prove they don't
alias noalias kernel arguments. MemorySSA then conservatively reports
them as clobbers, which prevents AMDGPUAnnotateUniformValues from
marking loads as noclobber, blocking scalarization (s_load) and forcing
expensive vector loads (global_load) instead.

Fix by adding all noalias kernel argument scopes to !noalias metadata
for memory-accessing instructions with no pointer arguments. Since such
instructions cannot access memory through any kernel pointer argument,
all noalias scopes are safe to apply.

This fixes a performance regression in rocFFT introduced by bd9668df0f00
("[AMDGPU] Propagate alias information in AMDGPULowerKernelArguments").

Assisted-by: Claude Opus
DeltaFile
+133-0llvm/test/CodeGen/AMDGPU/lower-kernel-arguments-noalias-call-no-ptr-args.ll
+51-39llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp
+184-392 files

LLVM/project e09d1e3llvm/lib/Transforms/Vectorize VPlanTransforms.cpp

[VPlan] Use not_equal_to to improve code (NFC) (#190262)
DeltaFile
+1-1llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+1-11 files

LLVM/project a1b303eclang/lib/CIR/Dialect/Transforms LoweringPrepare.cpp

unreachable on RDC compilation
DeltaFile
+2-1clang/lib/CIR/Dialect/Transforms/LoweringPrepare.cpp
+2-11 files

LLVM/project 24bcb78clang/lib/CIR/Dialect/Transforms LoweringPrepare.cpp

fix undefined void ty
DeltaFile
+1-0clang/lib/CIR/Dialect/Transforms/LoweringPrepare.cpp
+1-01 files

LLVM/project f3e629bclang/lib/CIR/Dialect/Transforms LoweringPrepare.cpp, clang/test/CIR/CodeGenCUDA device-stub.cu

[CIR][CUDA] Handle CUDA module constructor and destructor emission.
DeltaFile
+122-2clang/lib/CIR/Dialect/Transforms/LoweringPrepare.cpp
+41-0clang/test/CIR/CodeGenCUDA/device-stub.cu
+163-22 files

LLVM/project a52a504clang-tools-extra/clang-doc Representation.h

[clang-doc] Prepare Info types for Arena allocation (#190046)

To allocate Info structures directly in an Arena, they cannot have
members with nontrivial destructors, or we will leak memory. Before we
migrate them, we can replace growable vector types with intrusive lists.

This introduces some slight overhead as these types now have new pointer
members for use in ilists in later patches.

| Metric | Baseline | Prev | This | Culm% | Seq% |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Time | 920.5s | 1005.7s | 1010.5s | +9.8% | +0.5% |
| Memory | 86.0G | 42.1G | 42.9G | -50.2% | +1.8% |

| Benchmark | Baseline | Prev | This | Culm% | Seq% |
| :--- | :--- | :--- | :--- | :--- | :--- |
| BM_BitcodeReader_Scale/10 | 67.9us | 68.6us | 69.2us | +1.9% | +0.9% |
| BM_BitcodeReader_Scale/10000 | 70.5ms | 21.3ms | 21.9ms | -68.9% | +2.8% |

    [32 lines not shown]
DeltaFile
+9-8clang-tools-extra/clang-doc/Representation.h
+9-81 files

LLVM/project bc40a11clang/include/clang/Basic DiagnosticSemaKinds.td, clang/test/CodeGenCUDA amdgpu-atomic-ops.cu atomic-ops.cu

warning default ignored, new group -Whip-deprecated-builtins
DeltaFile
+3-8clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu
+2-2clang/test/SemaHIP/atomic-deprecated.hip
+2-2clang/test/CodeGenHIP/atomic-deprecated-fixit.hip
+3-0clang/include/clang/Basic/DiagnosticSemaKinds.td
+1-1clang/test/CodeGenCUDA/atomic-ops.cu
+1-1clang/test/SemaCUDA/atomic-ops.cu
+12-143 files not shown
+15-169 files

LLVM/project f919a8bclang/include/clang/CIR/Dialect/IR CIROps.td

[CIR] Add calling convention values to CIR_CallingConv
DeltaFile
+8-3clang/include/clang/CIR/Dialect/IR/CIROps.td
+8-31 files

LLVM/project bb53a86clang/include/clang/CIR/Dialect/IR CIROps.td, clang/lib/CIR/CodeGen CIRGenModule.cpp

[CIR] Add calling_conv attribute to FuncOp with lowering support
DeltaFile
+38-0clang/test/CIR/IR/calling-conv.cir
+34-0clang/test/CIR/Lowering/calling-conv.cir
+23-5clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+24-1clang/lib/CIR/Dialect/IR/CIRDialect.cpp
+16-1clang/include/clang/CIR/Dialect/IR/CIROps.td
+5-6clang/lib/CIR/CodeGen/CIRGenModule.cpp
+140-133 files not shown
+143-219 files