LLVM/project c236ef5 llvm/include/llvm InitializePasses.h, llvm/include/llvm/CodeGen CFIFixup.h

[CodeGen][NewPM] Port cfi-fixup to new pass manager (#203692)

Standard work for `cfi-fixup`.
Delta  File
+14 -4  llvm/lib/CodeGen/CFIFixup.cpp
+9 -2  llvm/include/llvm/CodeGen/CFIFixup.h
+1 -1  llvm/include/llvm/InitializePasses.h
+1 -1  llvm/include/llvm/Passes/MachinePassRegistry.def
+1 -1  llvm/lib/CodeGen/CodeGen.cpp
+1 -1  llvm/lib/CodeGen/TargetPassConfig.cpp
+27 -10  5 files not shown
+32 -11  11 files

LLVM/project ad611b6 llvm/lib/Target/X86 X86ISelLoweringCall.cpp, llvm/test/CodeGen/X86 abi-isel.ll

[X86] Do not hold GOT base for indirect call or absolute address (#203192)

Fixes:
https://github.com/llvm/llvm-project/pull/202370#discussion_r3384983368

Assisted-by: Claude Sonnet 4.6
Delta  File
+32 -40  llvm/test/CodeGen/X86/abi-isel.ll
+6 -5  llvm/lib/Target/X86/X86ISelLoweringCall.cpp
+38 -45  2 files

LLVM/project 05efe1b clang/test/Sema warn-lifetime-safety.cpp, clang/test/Sema/LifetimeSafety safety.cpp

rebase

Created using spr 1.3.7
Delta  File
+3,204 -3,450  llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-7.ll
+1,905 -2,037  llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-6.ll
+3,716 -0  clang/test/Sema/LifetimeSafety/safety.cpp
+0 -3,653  clang/test/Sema/warn-lifetime-safety.cpp
+2,760 -227  llvm/test/CodeGen/AMDGPU/fcanonicalize.ll
+1,813 -654  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sched.group.barrier.ll
+13,398 -10,021  1,541 files not shown
+92,209 -41,253  1,547 files

LLVM/project 568122a clang/include/clang/Options Options.td, clang/lib/Driver/ToolChains AMDGPU.cpp AMDGPU.h

clang/AMDGPU: Split out target ID flags in TranslateArgs. (#203750)

Change how xnack and sramecc are processed. Introduce
-mxnack/-mno-xnack and -msramecc/-mno-sramecc flags.
When the target is first parsed in TranslateArgs, synthesize
the appropriate flag for the toolchain. This avoids
special case feature string fixups in getAMDGPUTargetFeatures,
and also avoids an extra parse of the target ID.

In the future this will also simplify tracking these ABI
modifiers in a module flag.

As a side effect, these flags can be used to override the
no-specifier case. They do not fully replace the target ID
syntax, as there's no way to represent compiling both modes
for the same subtarget.

I didn't bother trying to forward these flags on the main command
line without being specified to the offload device, but I suppose

    [2 lines not shown]
Delta  File
+149 -0  clang/test/Driver/amdgpu-xnack-sramecc-flags.c
+24 -27  clang/lib/Driver/ToolChains/AMDGPU.cpp
+9 -4  clang/test/Driver/hip-target-id.hip
+6 -4  clang/lib/Driver/ToolChains/AMDGPU.h
+3 -2  clang/lib/Driver/ToolChains/HIPAMD.cpp
+4 -0  clang/include/clang/Options/Options.td
+195 -37  5 files not shown
+203 -44  11 files
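
The flag synthesis described in this commit message can be pictured with a small standalone sketch. It assumes the usual target ID spelling (for example `gfx90a:xnack+:sramecc-`) and uses a hypothetical helper, `splitTargetID`, that is not part of the clang driver; only the flag names are taken from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: the bare processor plus the synthesized flags.
struct SplitTargetID {
  std::string Processor;
  std::vector<std::string> Flags;
};

// Illustrative only: turns "gfx90a:xnack+:sramecc-" into the processor
// "gfx90a" and the flags {"-mxnack", "-mno-sramecc"}. Unknown or malformed
// modifiers are skipped here; a real driver would diagnose them.
SplitTargetID splitTargetID(const std::string &TargetID) {
  assert(!TargetID.empty() && "expected a non-empty target ID");
  SplitTargetID Result;
  std::size_t Pos = TargetID.find(':');
  Result.Processor = TargetID.substr(0, Pos);
  while (Pos != std::string::npos) {
    std::size_t Next = TargetID.find(':', Pos + 1);
    std::string Feature =
        Next == std::string::npos ? TargetID.substr(Pos + 1)
                                  : TargetID.substr(Pos + 1, Next - Pos - 1);
    Pos = Next;
    if (Feature.size() < 2)
      continue;
    char Sign = Feature[Feature.size() - 1];
    std::string Name = Feature.substr(0, Feature.size() - 1);
    bool Known = Name == "xnack" || Name == "sramecc";
    if (!Known || (Sign != '+' && Sign != '-'))
      continue;
    // "+" maps to -m<name>, "-" maps to -mno-<name>.
    Result.Flags.push_back((Sign == '+' ? "-m" : "-mno-") + Name);
  }
  return Result;
}
```

A target ID without modifiers, such as `gfx906`, yields no flags in this sketch, which corresponds to the no-specifier case mentioned in the message.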

LLVM/project 08940b5 clang-tools-extra/clang-tidy/bugprone NotNullTerminatedResultCheck.cpp, clang-tools-extra/clang-tidy/modernize LoopConvertUtils.cpp

[clang-tidy][NFC] Apply const-correctness to code (#203823)
Delta  File
+12 -10  clang-tools-extra/clang-tidy/readability/ElseAfterReturnCheck.cpp
+10 -10  clang-tools-extra/clang-tidy/bugprone/NotNullTerminatedResultCheck.cpp
+6 -5  clang-tools-extra/clang-tidy/readability/EnumInitialValueCheck.cpp
+6 -4  clang-tools-extra/clang-tidy/modernize/LoopConvertUtils.cpp
+4 -4  clang-tools-extra/clang-tidy/readability/ImplicitBoolConversionCheck.cpp
+4 -4  clang-tools-extra/clang-tidy/performance/UnnecessaryCopyInitializationCheck.cpp
+42 -37  20 files not shown
+78 -69  26 files

LLVM/project 6f916fe clang/docs/tools dump_ast_matchers.py

[ASTMatchers][Docs] print ignoring message only when class was not documented before (#203783)
Delta  File
+4 -1  clang/docs/tools/dump_ast_matchers.py
+4 -1  1 file

LLVM/project c775d6e libcxx/include print

[libc++] Make the body of println(FILE*) dependent on the template parameter to avoid template instantiation (#200996)

Make the function parameter of the `std::print` call inside the
`std::println` overload taking `FILE*` dependent on the template
parameter to avoid eager instantiation.
Delta  File
+2 -2  libcxx/include/print
+2 -2  1 file
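
As a rough illustration of the idea in this commit message, the sketch below shows one common way to make a call argument depend on a template parameter. The names `printLike` and `printlnLike` are hypothetical stand-ins, and the actual libc++ change may use a different mechanism.

```cpp
#include <cassert>
#include <cstdio>
#include <type_traits>

// Stand-in for a heavyweight function template such as std::print.
template <class... Args>
int printLike(std::FILE *Stream, const char *Text, Args...) {
  return std::fputs(Text, Stream);
}

// The defaulted parameter exists only to make the body dependent.
template <class Dummy = void>
int printlnLike(std::FILE *Stream) {
  assert(Stream && "stream must not be null");
  // StreamT is std::FILE* whenever Dummy is void, but because it is spelled in
  // terms of Dummy the call below is type-dependent, so it is only resolved
  // when printlnLike itself is instantiated.
  using StreamT =
      typename std::conditional<std::is_void<Dummy>::value, std::FILE *,
                                Dummy>::type;
  return printLike(static_cast<StreamT>(Stream), "\n");
}
```

Calling `printlnLike(stdout)` behaves like a plain newline write; the only difference from a non-dependent body is when the inner call gets resolved.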

LLVM/project bab217b mlir/lib/Bindings/Python IRCore.cpp, mlir/test/python context_shutdown.py

[mlir][python] Fix segfault at interpreter shutdown with entered contexts

The thread-local context stack (`PyThreadContextEntry::getStack()`)
holds `nb::object` references to Python Context, Location, and
InsertionPoint objects. When a Context is entered via `__enter__` but
never exited before the interpreter shuts down, these references
cause a segfault during process teardown.

The crash sequence:
1. User calls `ctx.__enter__()`, pushing a frame onto the
   `static thread_local vector<PyThreadContextEntry>`.
2. The script ends; CPython runs `Py_FinalizeEx()` which tears down
   the interpreter (clears modules, destroys remaining objects).
3. `main()` returns.
4. The C runtime destroys static/thread_local storage. On the main
   thread, thread_local variables have the same destruction timing
   as static storage — they are destroyed *after* main() returns.
5. The vector destructor runs, and each `PyThreadContextEntry`'s
   `nb::object` members call `Py_DECREF` — but the interpreter is

    [8 lines not shown]
Delta  File
+26 -0  mlir/test/python/context_shutdown.py
+9 -0  mlir/lib/Bindings/Python/IRCore.cpp
+35 -0  2 files
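
The crash sequence in this commit message can be modelled without Python at all. The sketch below is a simulation under stated assumptions: `Ref` stands in for `nb::object`, a boolean stands in for the interpreter, and the final `clear()` stands in for the thread_local destructor that runs after `main()` returns. Clearing the stack before finalization is shown as one possible mitigation; the elided part of the message may describe a different fix.

```cpp
#include <cassert>
#include <vector>

// Simulated interpreter state; stands in for the CPython runtime.
static bool InterpreterAlive = true;
static int LateReleases = 0;

// Simulated owning reference; stands in for nb::object.
struct Ref {
  bool Owns;
  Ref() : Owns(true) {}
  Ref(Ref &&Other) noexcept : Owns(Other.Owns) { Other.Owns = false; }
  ~Ref() {
    // A real Py_DECREF at this point would touch a finalized interpreter.
    if (Owns && !InterpreterAlive)
      ++LateReleases;
  }
};

// Mirrors the shape of the thread-local context stack.
std::vector<Ref> &getStack() {
  static thread_local std::vector<Ref> Stack;
  return Stack;
}

// Returns how many references were released after "finalization".
int simulateShutdown(bool ClearBeforeFinalize) {
  assert(getStack().empty() && "each simulation starts with an empty stack");
  InterpreterAlive = true;
  LateReleases = 0;
  getStack().emplace_back(); // ctx.__enter__() without a matching __exit__
  if (ClearBeforeFinalize)
    getStack().clear();      // release while the interpreter is still alive
  InterpreterAlive = false;  // models Py_FinalizeEx()
  getStack().clear();        // models the thread_local destructor
  InterpreterAlive = true;   // reset so the demo's own teardown is benign
  return LateReleases;
}
```

In this model `simulateShutdown(false)` reports one late release, which corresponds to the `Py_DECREF` that hits a dead interpreter, while `simulateShutdown(true)` reports none.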

LLVM/project b113f5b utils/bazel/llvm-project-overlay/flang/tools/flang-driver BUILD.bazel

[Bazel] Fixes 625facd (#203814)

This fixes 625facd4375f6bfa5de501d0559bd262062e2dc3.

Co-authored-by: Google Bazel Bot <google-bazel-bot at google.com>
Delta  File
+1 -0  utils/bazel/llvm-project-overlay/flang/tools/flang-driver/BUILD.bazel
+1 -0  1 file

LLVM/project 80ae495 clang/lib/Sema SemaExprCXX.cpp

fixup! formatting
Delta  File
+1 -1  clang/lib/Sema/SemaExprCXX.cpp
+1 -1  1 file

LLVM/project 06b52a0 clang/test/CIR/CodeGenHIP builtins-amdgcn-extended-image.hip

[CIR][AMDGPU] Adds missing test cases
Delta  File
+28 -4  clang/test/CIR/CodeGenHIP/builtins-amdgcn-extended-image.hip
+28 -4  1 file

LLVM/project c9cdee0 clang/lib/CIR/CodeGen CIRGenBuiltinAMDGPU.cpp, clang/test/CIR/CodeGenHIP builtins-amdgcn-extended-image.hip

[CIR][AMDGPU] Adds lowering for amdgcn extended image sample/gather4 builtins
Delta  File
+350 -0  clang/test/CIR/CodeGenHIP/builtins-amdgcn-extended-image.hip
+50 -12  clang/lib/CIR/CodeGen/CIRGenBuiltinAMDGPU.cpp
+400 -12  2 files

LLVM/project 3f55f00 mlir/lib/Bindings/Python IRCore.cpp, mlir/test/python context_shutdown.py

[mlir][python] Fix segfault at interpreter shutdown with entered contexts

The thread-local context stack (`PyThreadContextEntry::getStack()`)
holds `nb::object` references to Python Context, Location, and
InsertionPoint objects. When a Context is entered via `__enter__` but
never exited before the interpreter shuts down, these references
cause a segfault during process teardown.

The crash sequence:
1. User calls `ctx.__enter__()`, pushing a frame onto the
   `static thread_local vector<PyThreadContextEntry>`.
2. The script ends; CPython runs `Py_FinalizeEx()` which tears down
   the interpreter (clears modules, destroys remaining objects).
3. `main()` returns.
4. The C runtime destroys static/thread_local storage. On the main
   thread, thread_local variables have the same destruction timing
   as static storage — they are destroyed *after* main() returns.
5. The vector destructor runs, and each `PyThreadContextEntry`'s
   `nb::object` members call `Py_DECREF` — but the interpreter is

    [8 lines not shown]
Delta  File
+27 -0  mlir/test/python/context_shutdown.py
+11 -0  mlir/lib/Bindings/Python/IRCore.cpp
+38 -0  2 files

LLVM/project 05c8f9b clang/lib/AST/ByteCode Interp.cpp, clang/test/AST/ByteCode codegen-cxx2a.cpp

[clang][bytecode] Override constant context state in CallVar (#203747)

We do this for regular calls, so do it for variable calls as well. Also
remove two comments that no longer have any meaning.
Delta  File
+26 -0  clang/test/AST/ByteCode/codegen-cxx2a.cpp
+1 -6  clang/lib/AST/ByteCode/Interp.cpp
+27 -6  2 files

LLVM/project c730204 clang/lib/Sema SemaExprCXX.cpp, clang/test/CXX/drs cwg5xx.cpp

[Clang] Implement CWG 2282
Delta  File
+41 -26  clang/lib/Sema/SemaExprCXX.cpp
+8 -5  clang/test/CXX/expr/expr.unary/expr.new/p14.cpp
+7 -6  clang/test/SemaCXX/new-delete.cpp
+6 -0  clang/test/SemaCXX/std-align-val-t-in-operator-new.cpp
+3 -2  clang/test/CXX/drs/cwg5xx.cpp
+65 -39  5 files

LLVM/project 8714a01 llvm/lib/Target/AArch64 AArch64TargetTransformInfo.h, llvm/test/CodeGen/AArch64 sve-fixed-length-masked-64-128bit-loads.ll sve-fixed-length-masked-64-128bit-stores.ll

[AArch64] Use sve for 64 wide masked load/store (#203480)
Delta  File
+136 -0  llvm/test/CodeGen/AArch64/sve-fixed-length-masked-64-128bit-loads.ll
+118 -0  llvm/test/CodeGen/AArch64/sve-fixed-length-masked-64-128bit-stores.ll
+0 -85  llvm/test/CodeGen/AArch64/sve-fixed-length-masked-128bit-loads.ll
+0 -69  llvm/test/CodeGen/AArch64/sve-fixed-length-masked-128bit-stores.ll
+5 -4  llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+259 -158  5 files

LLVM/project f211ea1 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp

[VPlan] Don't calculate VPDT in non-side-effect early exit loops. NFC (#203476)

Follow up from
https://github.com/llvm/llvm-project/pull/203233/changes/BASE..6c916678df65787d524558919cee233bf2329aa8#r3398107262
Delta  File
+6 -5  llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+6 -5  1 file

LLVM/project 8106136 llvm/lib/Support Z3Solver.cpp, llvm/lib/Target/Hexagon/MCTargetDesc HexagonMCTargetDesc.cpp

[llvm] Replace unordered_map<std::string, T> with StringMap (#203815)

Prefer StringMap to the slow unordered_map per ProgrammersManual.
Delta  File
+10 -10  llvm/tools/llvm-exegesis/lib/SubprocessMemory.cpp
+7 -8  llvm/lib/Support/Z3Solver.cpp
+7 -7  llvm/tools/llvm-exegesis/lib/SubprocessMemory.h
+6 -6  llvm/tools/llvm-profgen/ProfiledBinary.cpp
+4 -5  llvm/utils/TableGen/DFAPacketizerEmitter.cpp
+4 -5  llvm/lib/Target/Hexagon/MCTargetDesc/HexagonMCTargetDesc.cpp
+38 -41  9 files not shown
+57 -63  15 files

LLVM/project 625facd flang/test/Driver compiler-options.f90, flang/tools/flang-driver driver.cpp

[Flang] Store only options in FLANG_COMPILER_OPTIONS_STRING (#201278)

Previously, FLANG_COMPILER_OPTIONS_STRING stored every argument passed
to the flang driver, including input file names. The GNU extension
compiler_options() is documented to return only the options, not the
input files. Including the input files also caused the string to exceed
ARG_MAX on large builds, producing:

    posix_spawn failed: Argument list too long

Use the driver's parsed InputArgList to filter out OPT_INPUT arguments,
preserving all options and their values (e.g. `-I /path`, `-o file`).

Fixes: https://github.com/llvm/llvm-project/issues/170651
Delta  File
+17 -9  flang/tools/flang-driver/driver.cpp
+1 -1  flang/test/Driver/compiler-options.f90
+18 -10  2 files
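
The filtering described in this commit message can be sketched as follows. This is a simplified stand-in, not the flang driver code: `optionsOnly` is a hypothetical helper, and the set of options that take a separate value is a made-up subset, whereas the real driver uses its parsed `InputArgList` and the `OPT_INPUT` classification.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Joins only the options (and their separate values) with single spaces,
// dropping positional inputs. The value-taking set below is illustrative.
std::string optionsOnly(const std::vector<std::string> &Args) {
  static const std::set<std::string> TakesSeparateValue = {"-I", "-o"};
  std::string Result;
  auto Append = [&Result](const std::string &Piece) {
    if (!Result.empty())
      Result += ' ';
    Result += Piece;
  };
  for (std::size_t I = 0; I < Args.size(); ++I) {
    const std::string &Arg = Args[I];
    if (Arg.empty() || Arg[0] != '-')
      continue; // positional input file: not an option, so skip it
    Append(Arg);
    if (TakesSeparateValue.count(Arg) != 0 && I + 1 < Args.size())
      Append(Args[++I]); // keep the option's value, e.g. "/path" or "file"
  }
  return Result;
}
```

With the arguments `-O2 -I /path main.f90 -o file util.f90`, this sketch keeps `-O2 -I /path -o file` and drops both source files, so the stored string no longer grows with the number of inputs.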

LLVM/project ee5856d mlir/lib/Conversion/IndexToSPIRV IndexToSPIRV.cpp, mlir/test/Conversion/IndexToSPIRV index-to-spirv.mlir

[mlir][SPIR-V] Add OpenCL lowering path for index.min/max (#203493)

GLSL min/max ops were emitted even for Kernel targets, where these ops
are illegal.
Delta  File
+26 -4  mlir/test/Conversion/IndexToSPIRV/index-to-spirv.mlir
+18 -11  mlir/lib/Conversion/IndexToSPIRV/IndexToSPIRV.cpp
+44 -15  2 files

LLVM/project 838f7f7 mlir/lib/Dialect/SPIRV/IR SPIRVOps.cpp MemoryOps.cpp, mlir/test/Dialect/SPIRV/IR structure-ops.mlir

[mlir][SPIR-V] Enforce physical storage buffer pointer decorations on GlobalVariable (#203600)

Apply to GlobalVariable the rule that was already enforced for
spirv.Variable (when the SPV_KHR_physical_storage_buffer extension is in
use): exactly one of AliasedPointer/RestrictPointer is required.
Delta  File
+50 -0  mlir/test/Dialect/SPIRV/IR/structure-ops.mlir
+45 -0  mlir/lib/Dialect/SPIRV/IR/SPIRVOps.cpp
+3 -28  mlir/lib/Dialect/SPIRV/IR/MemoryOps.cpp
+7 -0  mlir/lib/Dialect/SPIRV/IR/SPIRVOpUtils.h
+105 -28  4 files

LLVM/project e413f6e llvm/lib/Target/LoongArch LoongArchISelLowering.cpp, llvm/test/CodeGen/LoongArch crc.ll

[LoongArch] Propagate demanded bits for CRC[C].W.{B,H}.W (#203201)

CRC byte and halfword instructions only use the low 8 or 16 bits of
their data operand. Propagate these demanded-bit requirements through
SimplifyDemandedBitsForTargetNode() so redundant masking operations can
be removed during DAG combining.
Delta  File
+21 -4  llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+0 -4  llvm/test/CodeGen/LoongArch/crc.ll
+21 -8  2 files
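
To see why the masking becomes redundant, consider a software model of a byte-wide CRC step. The function below is a hypothetical model using the reflected CRC-32 polynomial; it is not claimed to be bit-exact with the LoongArch instructions. The point is only that bits 8 to 31 of the data operand never influence the result.

```cpp
#include <cassert>
#include <cstdint>

// Model of a CRC step that consumes one byte of Data.
std::uint32_t crcByteStep(std::uint32_t Crc, std::uint32_t Data) {
  Crc ^= Data & 0xFFu; // only the low 8 bits are demanded
  for (int Bit = 0; Bit < 8; ++Bit) {
    std::uint32_t Mask = 0u - (Crc & 1u); // all ones if the low bit is set
    Crc = (Crc >> 1) ^ (0xEDB88320u & Mask);
  }
  return Crc;
}
```

Because `crcByteStep(Crc, X)` and `crcByteStep(Crc, X & 0xFF)` always agree, a caller-side `& 0xFF` in front of such an operation carries no information, which is the kind of mask a demanded-bits query lets the combiner drop.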

LLVM/project 569c21a mlir/lib/Dialect/SPIRV/IR ControlFlowOps.cpp, mlir/test/Dialect/SPIRV/IR control-flow-ops.mlir

[mlir][SPIR-V] Fix empty optional deref in spirv.Switch verifier (#203561)
Delta  File
+16 -0  mlir/test/Dialect/SPIRV/IR/control-flow-ops.mlir
+8 -6  mlir/lib/Dialect/SPIRV/IR/ControlFlowOps.cpp
+24 -6  2 files

LLVM/project 9f2035d llvm/lib/Transforms/Vectorize VPlanConstruction.cpp LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize/VPlan tail-folding.ll

[VPlan] Move tail folding logic out of addMiddleCheck. NFC (#203475)

We simplify the TripCount == VectorTrip count condition with tail
folding, but we can just do that in foldTailByMasking and keep the
logic in one place instead.
Delta  File
+12 -7  llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+4 -0  llvm/test/Transforms/LoopVectorize/VPlan/tail-folding.ll
+1 -2  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+1 -1  llvm/unittests/Transforms/Vectorize/VPlanTestBase.h
+1 -1  llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+19 -11  5 files
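
The reason the `TripCount == VectorTripCount` check simplifies under tail folding can be shown with plain arithmetic. The sketch below is a simplified model with hypothetical helper names, not the VPlan code, and it ignores overflow and cases where a scalar epilogue is required for other reasons.

```cpp
#include <cassert>
#include <cstdint>

// Iterations covered by the vector loop for a given Step (VF * UF).
// Without tail folding the count is rounded down to a multiple of Step;
// with tail folding it is rounded up and the extra lanes are masked off.
std::uint64_t vectorTripCount(std::uint64_t TripCount, std::uint64_t Step,
                              bool FoldTail) {
  assert(Step != 0 && "step must be positive");
  if (FoldTail)
    TripCount += Step - 1;
  return TripCount - TripCount % Step;
}

// Iterations left over for a scalar remainder loop.
std::uint64_t scalarRemainder(std::uint64_t TripCount, std::uint64_t Step,
                              bool FoldTail) {
  if (FoldTail)
    return 0; // the masked vector loop already covers the tail
  return TripCount - vectorTripCount(TripCount, Step, false);
}
```

For a trip count of 10 and a step of 4, the unfolded loop covers 8 iterations and leaves 2 for the remainder, while the folded loop runs 12 masked lanes and leaves none, so the middle check has nothing left to decide.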

LLVM/project b455aa6 llvm/lib/Target/LoongArch LoongArchISelLowering.cpp, llvm/test/CodeGen/LoongArch crc.ll

[LoongArch] Propagate demanded bits for CRC[C].W.{B,H}.W

CRC byte and halfword instructions only use the low 8 or 16 bits of
their data operand. Propagate these demanded-bit requirements through
SimplifyDemandedBitsForTargetNode() so redundant masking operations can
be removed during DAG combining.
Delta  File
+21 -4  llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+0 -4  llvm/test/CodeGen/LoongArch/crc.ll
+21 -8  2 files

LLVM/project 31f998d llvm/test/CodeGen/LoongArch crc.ll

[LoongArch][NFC] Add demanded bits tests for CRC[C].W.{B,H}.W (#203200)
Delta  File
+50 -0  llvm/test/CodeGen/LoongArch/crc.ll
+50 -0  1 file

LLVM/project e79e056 mlir/lib/Transforms/Utils CSE.cpp

[mlir][CSE] Remove the opsToErase container and immediately delete dead ops. (#203702)

This PR removes the `opsToErase` container and immediately erases dead
operations. Since dead ops are deleted on the fly, the value in the
`MemEffectsCache` map now correctly tracks the previous operation of
`toOp`. This change improves the storage efficiency of CSE. Furthermore,
it is part of https://github.com/llvm/llvm-project/pull/180556 and
substantially simplifies the implementation.
Delta  File
+23 -23  mlir/lib/Transforms/Utils/CSE.cpp
+23 -23  1 file
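
The general pattern behind this change, erasing dead items during the walk instead of collecting them in a side container, looks roughly like the sketch below. `Op` and `eraseDeadOps` are hypothetical stand-ins on a `std::list`; the MLIR implementation works on real operations and is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical stand-in for an operation with a precomputed deadness bit.
struct Op {
  std::string Name;
  bool Dead;
};

// Erases dead ops on the fly and returns how many were removed. No separate
// "to erase" container is needed because list::erase hands back the iterator
// to the next element, so the walk stays valid.
std::size_t eraseDeadOps(std::list<Op> &Ops) {
  std::size_t Erased = 0;
  for (std::list<Op>::iterator It = Ops.begin(); It != Ops.end();) {
    if (It->Dead) {
      It = Ops.erase(It);
      ++Erased;
    } else {
      ++It;
    }
  }
  return Erased;
}
```

One consequence of erasing immediately is that the neighbours of a live op are always other live ops, which is the property the message relies on for its cache of previous operations.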

LLVM/project 303400f libclc/clc/lib/generic/math clc_sincos_helpers_fp64.inc

[libclc] Use __CLC_SCALAR instead of nonexistent __CLC_SCALAR1 for sin (#203807)

Summary:
This seems to be a typo? Every other case is guarded by `__CLC_SCALAR`
but this case had a `1` after it. Removing this improved performance on
sin/cos/tan to match the ROCm version.
Delta  File
+1 -1  libclc/clc/lib/generic/math/clc_sincos_helpers_fp64.inc
+1 -1  1 file

LLVM/project 8b6551e libclc/clc/lib/generic/math clc_acos.inc clc_atan.inc

[libclc] Use FMA for the pi reconstruction in acos / atan (#203804)

Summary:
This should recombine the split constant for this case. The performance
impact should be negligible for such large math functions: we get an
extra add, but in exchange the results should improve by 1 ULP.

This was primarily done to match what AMD's math libraries do; with this
change we are byte-for-byte identical in output.
Delta  File
+4 -4  libclc/clc/lib/generic/math/clc_acos.inc
+1 -1  libclc/clc/lib/generic/math/clc_atan.inc
+5 -5  2 files
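
A generic illustration of rebuilding pi from a split constant with an FMA is sketched below. The expression shape `pi - 2*r` and the helper name `piMinusTwice` are assumptions chosen for illustration; the actual libclc expressions are not shown in this digest. The two constants are the double nearest to pi and the remaining low-order part.

```cpp
#include <cassert>
#include <cmath>

// Computes pi - 2*R with pi split into a high and a low part.
// The FMA forms PiHi - 2*R with a single rounding, and the extra add
// then folds in the low-order bits of pi.
double piMinusTwice(double R) {
  const double PiHi = 3.141592653589793;       // double nearest to pi
  const double PiLo = 1.2246467991473532e-16;  // pi - PiHi, to double precision
  assert(std::isfinite(R) && "expected a finite input");
  return std::fma(-2.0, R, PiHi) + PiLo;
}
```

The extra add corresponds to the cost mentioned in the message, and the low part is what can shift the last bit of the result.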

LLVM/project bf0ccb4 llvm/lib/Passes PassBuilderPipelines.cpp, llvm/test/Other new-pm-lto-defaults.ll

[Passes] Invoke CGSCCOptimizerLateEP callbacks in LTO pipeline (#203262)

The CGSCCOptimizerLateEP extension point was not being invoked in the
LTO pipeline. Right now only AMDGPU registers any passes during this
callback, but it was a real source of delta between the LTO and default
pipelines when targeting AMDGPU. This doesn't seem to be an intentional
omission given that it is instantiated in thinLTO as well. Just add it.
Delta  File
+4 -0  llvm/test/Other/new-pm-lto-defaults.ll
+1 -0  llvm/lib/Passes/PassBuilderPipelines.cpp
+5 -0  2 files