LLVM/project d8a7467 clang/lib/CodeGen/Targets NVPTX.cpp, clang/test/CodeGen scoped-atomic-ops.c

[NVPTX] Support __scoped_atomic_* operations in NVPTX (#184737)

Summary:
All the infrastructure for this is already in place; it just hadn't been
turned on.
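
For readers unfamiliar with the builtins, here is a minimal sketch of what a scoped atomic call looks like at the source level. The helper name `fetch_add_device_scope` and the `HAVE_SCOPED_ATOMICS` macro are made up for illustration; the fallback branch is only there so the snippet also builds with compilers that lack the Clang builtins, and it is not part of this change.

```cpp
#include <cassert>

// Detect the Clang scoped-atomic builtins; HAVE_SCOPED_ATOMICS is a
// hypothetical helper macro used only in this sketch.
#if defined(__has_builtin) && defined(__MEMORY_SCOPE_DEVICE)
#  if __has_builtin(__scoped_atomic_fetch_add)
#    define HAVE_SCOPED_ATOMICS 1
#  endif
#endif

// Atomically adds `delta` to `*counter` and returns the previous value.
inline int fetch_add_device_scope(int *counter, int delta) {
#ifdef HAVE_SCOPED_ATOMICS
  // Scoped form: the trailing argument names the memory scope.
  return __scoped_atomic_fetch_add(counter, delta, __ATOMIC_RELAXED,
                                   __MEMORY_SCOPE_DEVICE);
#else
  // Unscoped GNU-style builtin, used when the scoped form is unavailable.
  return __atomic_fetch_add(counter, delta, __ATOMIC_RELAXED);
#endif
}
```

Calling `fetch_add_device_scope(&c, 3)` on a counter holding 5 returns 5 and leaves 8 in the counter, regardless of which branch was compiled.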
Delta        File
+1,419 -130  clang/test/CodeGen/scoped-atomic-ops.c
+33 -0       clang/lib/CodeGen/Targets/NVPTX.cpp
+1,452 -130  2 files

LLVM/project 71608f4 libcxx/utils/ci/images libcxx_next_runners.txt

[libc++] Update the libcxx-next-runners image (#185871)
Delta  File
+1 -1  libcxx/utils/ci/images/libcxx_next_runners.txt
+1 -1  1 files

LLVM/project 5368b81 mlir/include/mlir/Dialect/Arith/IR ArithOps.td, mlir/lib/Dialect/Arith/IR ArithCanonicalization.td ArithOps.cpp

[MLIR][Arith] Add canonicalization rules for int-to-float of integer extension (#185386)

Three patterns are valid but were missing:

1. `sitofp(extsi(x)) → sitofp(x)`: extsi preserves the sign and value,
so it represents the same signed integer as x.

2. `uitofp(extui(x)) → uitofp(x)`: same reasoning as above, but for
unsigned extension.

3. `sitofp(extui(x)) → uitofp(x)`: extui zero-extends, so the extended
value is always non-negative. For non-negative integers, sitofp and
uitofp produce the same result, so the left-hand expression can be
replaced by `uitofp(extui(x))`. Rule 2 above then simplifies this
further to `uitofp(x)`.

All three rewrites have been verified with Alive2.
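
As an informal sanity check (not a substitute for the Alive2 verification), the three identities can be brute-forced for one width pair. The sketch below assumes an i8 source, an i32 extended type, and an f32 result; the function name `check_int_to_float_of_ext` is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks the three rewrites for i8 -> i32 extension followed
// by conversion to f32. Returns true if every 8-bit pattern agrees.
inline bool check_int_to_float_of_ext() {
  for (int bits = 0; bits < 256; ++bits) {
    const auto u = static_cast<std::uint8_t>(bits); // x read as unsigned i8
    const auto s = static_cast<std::int8_t>(u);     // same bits read as signed i8

    // 1. sitofp(extsi(x)) == sitofp(x)
    const auto extsi = static_cast<std::int32_t>(s);
    if (static_cast<float>(extsi) != static_cast<float>(s))
      return false;

    // 2. uitofp(extui(x)) == uitofp(x)
    const auto extui = static_cast<std::uint32_t>(u);
    if (static_cast<float>(extui) != static_cast<float>(u))
      return false;

    // 3. sitofp(extui(x)) == uitofp(x): the zero-extended value is
    //    non-negative, so reading it as signed does not change it.
    const auto extuiAsSigned = static_cast<std::int32_t>(extui);
    if (static_cast<float>(extuiAsSigned) != static_cast<float>(u))
      return false;
  }
  return true;
}
```

The check only covers a single source/destination width, which is why it is a sanity check rather than a proof.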
Delta    File
+36 -1   mlir/test/Dialect/Arith/canonicalize.mlir
+23 -0   mlir/lib/Dialect/Arith/IR/ArithCanonicalization.td
+7 -4    mlir/test/Dialect/Arith/emulate-wide-int-canonicalization.mlir
+10 -0   mlir/lib/Dialect/Arith/IR/ArithOps.cpp
+2 -0    mlir/include/mlir/Dialect/Arith/IR/ArithOps.td
+78 -5   5 files

LLVM/project acd52a2 flang/include/flang/Utils OpenMP.h, flang/lib/Lower/OpenMP Utils.cpp Utils.h

[flang][OpenMP][DoConcurrent] Emit declare mapper for records (#179936)

Extends `do concurrent` device support by emitting compiler-generated
declare mapper ops for live-ins whose types are record types and have
allocatable members.
Delta      File
+9 -107    flang/lib/Lower/OpenMP/Utils.cpp
+104 -0    flang/lib/Utils/OpenMP.cpp
+33 -1     flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+28 -0     flang/test/Transforms/DoConcurrent/implicit_mapper.f90
+8 -4      flang/lib/Lower/OpenMP/Utils.h
+9 -0      flang/include/flang/Utils/OpenMP.h
+191 -112  2 files not shown
+202 -115  8 files

LLVM/project 3b8cd6c mlir/lib/Dialect/Affine/Transforms SuperVectorize.cpp, mlir/test/Dialect/Affine/SuperVectorize vectorize_1d.mlir

[mlir][affine] Fix crash in affine-super-vectorize for index constants inside loops (#184614)

When an arith.constant of index type is defined inside the loop body
being vectorized, vectorizeConstant creates a vector<Nxindex> constant
and registers it as the vector replacement. However,
getScalarValueReplacementsFor (used by vectorizeAffineStore to compute
indices for vector.transfer_write) looks only in the scalar replacement
map. With no scalar replacement registered for the index constant, it
falls back to the original scalar value, which is erased when the scalar
loop is cleaned up. This results in an "operation destroyed but still has
uses" crash.

Fix: when vectorizeConstant processes an index-typed constant, also
create a new scalar constant in the vector loop body and register it as
the scalar replacement. This ensures that memory operation index
computation can find a live value in the vectorized IR.
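
The failure mode and the fix can be illustrated with a toy model of the two replacement maps. This is only a sketch: `ToyVectorizationState`, `scalarFor`, `indexUsedByStore`, and the value names are hypothetical stand-ins, not the pass's real data structures.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Toy stand-in for the vectorizer's bookkeeping; all names are hypothetical.
struct ToyVectorizationState {
  std::unordered_map<std::string, std::string> vectorReplacement;
  std::unordered_map<std::string, std::string> scalarReplacement;

  // Mirrors the lookup described above: consult only the scalar map and
  // fall back to the original value when nothing is registered.
  std::string scalarFor(const std::string &original) const {
    auto it = scalarReplacement.find(original);
    return it == scalarReplacement.end() ? original : it->second;
  }
};

// Returns the value a store index would use for the index constant "%c0".
// `withFix` selects whether a fresh scalar constant is also registered.
inline std::string indexUsedByStore(bool withFix) {
  ToyVectorizationState state;
  state.vectorReplacement["%c0"] = "%c0_vec"; // always registered
  if (withFix)
    state.scalarReplacement["%c0"] = "%c0_new"; // the added registration
  return state.scalarFor("%c0");
}
```

Without the extra registration the lookup hands back `%c0`, the value that is about to be erased; with it, the lookup returns the new constant that lives in the vector loop body.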

Fixes #122213

Assisted-by: Claude Code
Delta   File
+22 -0  mlir/test/Dialect/Affine/SuperVectorize/vectorize_1d.mlir
+13 -0  mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp
+35 -0  2 files

LLVM/project 4fd826d llvm/include/llvm/IR Instructions.h Instruction.def, llvm/lib/IR Instructions.cpp

[IR] Split Br into UncondBr and CondBr (#184027)

BranchInst currently represents both unconditional and conditional
branches. However, these are quite different operations that are often
handled separately. Therefore, split them into separate opcodes and
classes to allow distinguishing these operations in the type system.
This also slightly improves compile-time performance.
Delta      File
+207 -51   llvm/include/llvm/IR/Instructions.h
+112 -111  llvm/test/tools/llvm-ir2vec/entities.ll
+89 -89    llvm/test/Transforms/IRNormalizer/regression-infinite-loop.ll
+81 -80    llvm/include/llvm/IR/Instruction.def
+38 -38    llvm/test/tools/llvm-ir2vec/triplets.ll
+36 -31    llvm/lib/IR/Instructions.cpp
+563 -400  65 files not shown
+838 -655  71 files

LLVM/project c7aaaea libcxx/test/benchmarks adjacent_view_begin.bench.cpp filesystem.bench.cpp, libcxx/test/benchmarks/algorithms swap_ranges.bench.cpp

[libc++] Rename a few benchmarks to allow identifying what's being benchmarked from the name (#185747)
Delta      File
+76 -76    libcxx/test/benchmarks/libcxxabi/dynamic_cast.bench.cpp
+23 -23    libcxx/test/benchmarks/adjacent_view_begin.bench.cpp
+21 -14    libcxx/test/benchmarks/filesystem.bench.cpp
+13 -6     libcxx/test/benchmarks/algorithms/swap_ranges.bench.cpp
+8 -8      libcxx/test/benchmarks/format/formatter_int.bench.cpp
+141 -127  5 files

LLVM/project d8f71b1 llvm/lib/Target/SPIRV SPIRVEmitIntrinsics.cpp SPIRVInstructionSelector.cpp, llvm/test/CodeGen/SPIRV/extensions/SPV_INTEL_masked_gather_scatter masked-gather-scatter.ll vector-of-pointers-ptrtoint.ll

[SPIR-V] Add support for SPV_INTEL_masked_gather_scatter extension (#185418)

Fixes the first bullet in #184638 

Corresponding patch to add support for vector operands in
OpConvertPtrToU/OpConvertUToPtr operations in spirv-val:
https://github.com/KhronosGroup/SPIRV-Tools/pull/6575

SPIR-V extension reference used:
https://github.com/KhronosGroup/SPIRV-Registry/blob/278044a51fee280bfc91322cdb55b51357db5cb8/extensions/INTEL/SPV_INTEL_masked_gather_scatter.asciidoc
Delta     File
+103 -0   llvm/test/CodeGen/SPIRV/extensions/SPV_INTEL_masked_gather_scatter/masked-gather-scatter.ll
+94 -0    llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp
+80 -0    llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
+24 -10   llvm/lib/Target/SPIRV/SPIRVGlobalRegistry.cpp
+33 -0    llvm/test/CodeGen/SPIRV/extensions/SPV_INTEL_masked_gather_scatter/vector-of-pointers-ptrtoint.ll
+19 -2    llvm/lib/Target/SPIRV/SPIRVLegalizerInfo.cpp
+353 -12  10 files not shown
+428 -12  16 files

LLVM/project 1d5ba1a llvm/utils/gn/secondary/clang/lib/Analysis BUILD.gn

[gn] port 6bc779506107d
Delta  File
+2 -0  llvm/utils/gn/secondary/clang/lib/Analysis/BUILD.gn
+2 -0  1 files

LLVM/project f6cafcb llvm/utils/gn/secondary/clang-tools-extra/clang-doc BUILD.gn, llvm/utils/gn/secondary/clang/lib/Analysis/Scalable BUILD.gn

[gn] port 65cb738ff41995 more (clang UnifiedSymbolResolution)
Delta   File
+11 -0  llvm/utils/gn/secondary/clang/lib/UnifiedSymbolResolution/BUILD.gn
+1 -1   llvm/utils/gn/secondary/clang/lib/Analysis/Scalable/BUILD.gn
+1 -1   llvm/utils/gn/secondary/clang/lib/CrossTU/BUILD.gn
+1 -1   llvm/utils/gn/secondary/clang/lib/Tooling/Refactoring/BUILD.gn
+1 -1   llvm/utils/gn/secondary/clang/lib/ExtractAPI/BUILD.gn
+1 -1   llvm/utils/gn/secondary/clang-tools-extra/clang-doc/BUILD.gn
+16 -5  5 files not shown
+21 -5  11 files

LLVM/project ce0488f llvm/include/llvm/Analysis Delinearization.h, llvm/lib/Analysis Delinearization.cpp

[Delinearization] Fix comment in Delinearization.cpp/h (#182596)
Delta  File
+1 -3  llvm/include/llvm/Analysis/Delinearization.h
+1 -3  llvm/lib/Analysis/Delinearization.cpp
+2 -6  2 files

LLVM/project dc93e6e libclc/clc/include/clc/math gentype.inc, libclc/clc/lib/generic/conversion clc_convert_float.inc

libclc: Add gentype infinity macro (#185864)
Delta  File
+4 -5  libclc/clc/lib/generic/conversion/clc_convert_float.inc
+1 -0  libclc/clc/include/clc/math/gentype.inc
+5 -5  2 files

LLVM/project cb3fbe9 llvm/lib/Target/AMDGPU AMDGPU.td GCNSubtarget.cpp, llvm/test/CodeGen/AMDGPU function-alignment.ll s_code_end.ll

[AMDGPU] Set preferred function alignment based on icache geometry (#183064)

Non-entry functions were unconditionally aligned to 4 bytes with no
architecture-specific preferred alignment, and setAlignment() was used
instead of ensureAlignment(), overwriting any explicit IR attributes.

Add instruction cache line size and fetch alignment data to GCNSubtarget
for each generation (GFX9: 64B/32B, GFX10: 64B/4B, GFX11+: 128B/4B). Use
this to call setPrefFunctionAlignment() in SITargetLowering, aligning
non-entry functions to the cache line size by default. Change
setAlignment to ensureAlignment in AMDGPUAsmPrinter so explicit IR align
attributes are respected.

Empirical thread trace analysis on gfx942, gfx1030, gfx1100, and gfx1200
showed that only GFX9 exhibits measurable fetch stalls when functions
cross the 32-byte fetch window boundary. GFX10+ showed no alignment
sensitivity. A hidden option -amdgpu-align-functions-for-fetch-only is
provided to use the fetch granularity instead of cache line size.
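
The alignment policy described above can be sketched as a small standalone model. The byte values come from the commit message; the type and function names (`ICacheGeometry`, `Generation`, `geometryFor`, `preferredFunctionAlignment`, and this free-standing `ensureAlignment`) are hypothetical and do not mirror the actual GCNSubtarget interface.

```cpp
#include <algorithm>
#include <cassert>

// Instruction cache geometry in bytes, per the figures quoted above.
struct ICacheGeometry {
  unsigned cacheLineBytes;
  unsigned fetchAlignBytes;
};

enum class Generation { GFX9, GFX10, GFX11Plus };

inline ICacheGeometry geometryFor(Generation gen) {
  switch (gen) {
  case Generation::GFX9:
    return {64, 32};
  case Generation::GFX10:
    return {64, 4};
  case Generation::GFX11Plus:
    return {128, 4};
  }
  return {4, 4}; // Not reached for the enumerators above.
}

// Preferred alignment: the cache line size by default, or the fetch
// granularity when the fetch-only behaviour is requested.
inline unsigned preferredFunctionAlignment(Generation gen, bool fetchOnly) {
  const ICacheGeometry g = geometryFor(gen);
  return fetchOnly ? g.fetchAlignBytes : g.cacheLineBytes;
}

// "Ensure" semantics: raise the alignment if needed, never lower it, so a
// larger explicit alignment survives.
inline unsigned ensureAlignment(unsigned current, unsigned requested) {
  return std::max(current, requested);
}
```

With this model, a function carrying an explicit 256-byte alignment keeps it after `ensureAlignment(256, 4)`, whereas a plain overwrite would have dropped it to 4.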

Assisted-by: Claude Opus
Delta     File
+116 -0   llvm/test/CodeGen/AMDGPU/function-alignment.ll
+22 -7    llvm/lib/Target/AMDGPU/AMDGPU.td
+9 -0     llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+4 -4     llvm/test/CodeGen/AMDGPU/s_code_end.ll
+6 -0     llvm/lib/Target/AMDGPU/GCNSubtarget.h
+3 -3     llvm/test/CodeGen/AMDGPU/hsa-func.ll
+160 -14  3 files not shown
+166 -17  9 files

LLVM/project e08fd82 llvm/lib/Target/X86 X86ISelLowering.cpp X86SelectionDAGInfo.cpp

[X86] LowerINTRINSIC_W_CHAIN - ensure the X86ISD::CMPCCXADD X86CondCode is an i8 target constant (#185856)

Fixes verification failure in X86SelectionDAGInfo::verifyTargetNode (#185649)
Delta  File
+5 -4  llvm/lib/Target/X86/X86ISelLowering.cpp
+0 -2  llvm/lib/Target/X86/X86SelectionDAGInfo.cpp
+5 -6  2 files

LLVM/project 3669d2e clang/docs ReleaseNotes.rst, clang/lib/Sema SemaTemplateInstantiate.cpp

[Clang] Fix ICE in constraint normalization when substituting concept template parameters (#184406)

23341c3d139b889e8c46867f8d704ab3c22b51f8 introduced
`SubstituteConceptsInConstraintExpression` to substitute non-dependent
concept template arguments into a concept's constraint expression during
normalization, as part of the P2841R7 implementation
([temp.constr.normal]/1.4).

The `ConstraintExprTransformer` added in that commit overrides
`TransformTemplateArgument` to only transform concept-related arguments
and preserve all others. However, `TransformUnresolvedLookupExpr` called
`Sema::SubstExpr`, which creates a separate `TemplateInstantiator` that
performs full substitution, bypassing the selective override entirely.

This caused all template parameters in the constraint expression to be
substituted using the concept's MLTAL. For example, given:

```cpp
template <class A, template <typename...> concept C>
```

    [22 lines not shown]
Delta   File
+34 -5  clang/lib/Sema/SemaTemplateInstantiate.cpp
+36 -0  clang/test/SemaCXX/cxx2c-template-template-param.cpp
+1 -0   clang/docs/ReleaseNotes.rst
+71 -5  3 files

LLVM/project 1930089 libcxx/utils/ci/docker docker-compose.yml

[libc++] Update the docker base image version (#185863)
Delta  File
+2 -2  libcxx/utils/ci/docker/docker-compose.yml
+2 -2  1 files

LLVM/project b7aeb7f clang/lib/Basic/Targets AMDGPU.cpp, clang/test/Driver amdgpu-macros.cl

clang/AMDGPU: Ensure more macros are defined for dummy target (#185837)

FP_FAST_FMA should be unconditionally true.
Delta    File
+15 -15  clang/lib/Basic/Targets/AMDGPU.cpp
+8 -8    clang/test/Preprocessor/predefined-arch-macros.c
+10 -0   clang/test/Driver/amdgpu-macros.cl
+33 -23  3 files

LLVM/project c7bd306 llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer consecutive-access.ll reduced-gathered-vectorized.ll

Revert "[SLP] Loop aware cost model/tree building"

This reverts commit 8963edb534e28d548d8381675bb18af1770c3041 to fix
miscompilations/compile time regressions, reported in https://github.com/llvm/llvm-project/pull/150450#issuecomment-4037224288, https://github.com/llvm/llvm-project/pull/150450#issuecomment-4037481719 and https://github.com/llvm/llvm-project/pull/150450#issuecomment-4038134121
Delta      File
+13 -183   llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+92 -69    llvm/test/Transforms/SLPVectorizer/RISCV/buildvector-all-external-scalars.ll
+45 -34    llvm/test/Transforms/SLPVectorizer/AArch64/getelementptr.ll
+52 -24    llvm/test/Transforms/SLPVectorizer/consecutive-access.ll
+20 -49    llvm/test/Transforms/SLPVectorizer/reduced-gathered-vectorized.ll
+35 -32    llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll
+257 -391  22 files not shown
+432 -529  28 files

LLVM/project 06b5110 llvm/utils/gn/secondary/clang-tools-extra/clang-doc BUILD.gn

[gn build] Port b80248a0ea35
Delta  File
+1 -0  llvm/utils/gn/secondary/clang-tools-extra/clang-doc/BUILD.gn
+1 -0  1 files

LLVM/project 86f1a77 llvm/utils/gn/secondary/clang/lib/CodeGen BUILD.gn

[gn build] Port af7c352fa38d
Delta  File
+1 -0  llvm/utils/gn/secondary/clang/lib/CodeGen/BUILD.gn
+1 -0  1 files

LLVM/project 6699e93 llvm/utils/gn/secondary/clang-tools-extra/clang-tidy/modernize BUILD.gn

[gn build] Port 9dece6d7a1d1
Delta  File
+1 -0  llvm/utils/gn/secondary/clang-tools-extra/clang-tidy/modernize/BUILD.gn
+1 -0  1 files

LLVM/project add0870 llvm/utils/gn/secondary/clang/lib/Index BUILD.gn

[gn build] Port 65cb738ff419
Delta  File
+0 -1  llvm/utils/gn/secondary/clang/lib/Index/BUILD.gn
+0 -1  1 files

LLVM/project 3d73d7a llvm/utils/gn/secondary/clang/lib/Analysis/Scalable BUILD.gn

[gn build] Port 5cafc12f06ea
Delta  File
+1 -0  llvm/utils/gn/secondary/clang/lib/Analysis/Scalable/BUILD.gn
+1 -0  1 files

LLVM/project addde02 llvm/utils/gn/secondary/clang-tools-extra/clang-tidy/hicpp BUILD.gn

[gn build] Port 61a58cfa5c9a
Delta  File
+0 -1  llvm/utils/gn/secondary/clang-tools-extra/clang-tidy/hicpp/BUILD.gn
+0 -1  1 files

LLVM/project 6c81bfd llvm/utils/gn/secondary/clang-tools-extra/clang-tidy/hicpp BUILD.gn, llvm/utils/gn/secondary/clang-tools-extra/clang-tidy/portability BUILD.gn

[gn build] Port 63db92d6d25b
Delta  File
+0 -1  llvm/utils/gn/secondary/clang-tools-extra/clang-tidy/hicpp/BUILD.gn
+1 -0  llvm/utils/gn/secondary/clang-tools-extra/clang-tidy/portability/BUILD.gn
+1 -1  2 files

LLVM/project 0d067db llvm/utils/gn/secondary/clang-tools-extra/clang-tidy/bugprone BUILD.gn, llvm/utils/gn/secondary/clang-tools-extra/clang-tidy/hicpp BUILD.gn

[gn build] Port 49c714ecd731
Delta  File
+1 -0  llvm/utils/gn/secondary/clang-tools-extra/clang-tidy/bugprone/BUILD.gn
+0 -1  llvm/utils/gn/secondary/clang-tools-extra/clang-tidy/hicpp/BUILD.gn
+1 -1  2 files

LLVM/project d9077cb llvm/utils/gn/secondary/libcxx/include BUILD.gn

[gn build] Port 1729480d243f
Delta  File
+1 -0  llvm/utils/gn/secondary/libcxx/include/BUILD.gn
+1 -0  1 files

LLVM/project 1423dc6 clang/include/clang/Basic BuiltinsAMDGPU.td, clang/test/CodeGenHIP builtins-amdgcn-gfx12-f16-w64.hip builtins-amdgcn-gfx12-f16-w32.hip

Revert "[Clang][AMDGPU] Change __fp16 to _Float16 in builtin definitions (#18…"

This reverts commit a17289b76ae31efdd5b6ce0ed8da04b1b9185a33.
Delta     File
+0 -96    clang/test/CodeGenHIP/builtins-amdgcn-gfx12-f16-w64.hip
+0 -96    clang/test/CodeGenHIP/builtins-amdgcn-gfx12-f16-w32.hip
+0 -88    clang/test/CodeGenHIP/builtins-amdgcn-f16-misc.hip
+0 -70    clang/test/CodeGenHIP/builtins-amdgcn-gfx1250-f16-misc.hip
+0 -27    clang/test/CodeGenHIP/builtins-amdgcn-gfx950-f16.hip
+13 -13   clang/include/clang/Basic/BuiltinsAMDGPU.td
+13 -390  8 files not shown
+22 -399  14 files

LLVM/project 0b179de llvm/include/llvm/ExecutionEngine/Orc WaitingOnGraph.h, llvm/unittests/ExecutionEngine/Orc WaitingOnGraphTest.cpp

[ORC] Update comment to reflect change b883091badd. NFCI.

And while I'm updating comments: fix some typos in WaitingOnGraphTest.cpp.
Delta  File
+2 -2  llvm/unittests/ExecutionEngine/Orc/WaitingOnGraphTest.cpp
+1 -1  llvm/include/llvm/ExecutionEngine/Orc/WaitingOnGraph.h
+3 -3  2 files

LLVM/project 9ba92ff llvm/lib/Target/AArch64 AArch64ISelLowering.cpp AArch64SVEInstrInfo.td

[LLVM][CodeGen][SVE] Refactor isel of 128-bit constant splats. (#185652)

Rather than lowering constant splats that only SVE supports to scalable
vectors, this patch keeps using fixed-length vectors and adds isel
patterns to select the necessary SVE instructions.

Doing this means we can extend coverage to include SVE operations that
take an immediate operand without needing to convert more of the DAG to
scalable vectors, which can potentially prevent larger NEON patterns
from matching.
Delta  File
+3 -4  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+3 -0  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+3 -0  llvm/lib/Target/AArch64/SVEInstrFormats.td
+9 -4  3 files