LLVM/project 682ae8bmlir/lib/Dialect/X86/Transforms VectorContractToAMXDotProduct.cpp, mlir/test/Dialect/X86/AMX vector-contract-to-tiled-dp.mlir

[mlir][x86] Lower packed type vector.contract to AMX dot-product (online-packing) (#188192)

A transform pass that lowers a flat-layout `vector.contract` operation to
(a) `amx.tile_mulf` for BF16, or (b) `amx.tile_muli` for Int8 packed types,
via `online` packing.

TODO: A follow-up patch is planned to refactor this pass and retire the
`convert-vector-to-amx` pass.
DeltaFile
+875-148mlir/lib/Dialect/X86/Transforms/VectorContractToAMXDotProduct.cpp
+480-20mlir/test/Dialect/X86/AMX/vector-contract-to-tiled-dp.mlir
+1,355-1682 files

LLVM/project 8048e36clang/lib/CIR/CodeGen CIRGenModule.cpp, clang/test/CIR/CodeGen attr-retain.c attr-used.c

add gv section attribute
DeltaFile
+1-5clang/lib/CIR/CodeGen/CIRGenModule.cpp
+2-2clang/test/CIR/CodeGen/attr-retain.c
+1-1clang/test/CIR/CodeGen/attr-used.c
+1-1clang/test/CIR/CodeGen/keep-persistent-storage-variables.cpp
+1-1clang/test/CIR/CodeGen/keep-static-consts.cpp
+6-105 files

LLVM/project 4a1d1c2clang/test/CIR/CodeGenHIP hip-cuid.hip

fix hip test
DeltaFile
+2-3clang/test/CIR/CodeGenHIP/hip-cuid.hip
+2-31 files

LLVM/project 2b71043clang/test/CIR/CodeGen keep-persistent-storage-variables.cpp keep-static-consts.cpp

add tests persistent-storage-variables and keep-static-consts
DeltaFile
+20-0clang/test/CIR/CodeGen/keep-persistent-storage-variables.cpp
+11-0clang/test/CIR/CodeGen/keep-static-consts.cpp
+31-02 files

LLVM/project 957215cclang/lib/CIR/CodeGen CIRGenModule.cpp CIRGenModule.h, clang/test/CIR/CodeGen attr-retain.c attr-used.c

use CIRGlobalValueInterface
DeltaFile
+30-29clang/lib/CIR/CodeGen/CIRGenModule.cpp
+18-0clang/test/CIR/CodeGen/attr-retain.c
+7-7clang/lib/CIR/CodeGen/CIRGenModule.h
+14-0clang/test/CIR/CodeGen/attr-used.c
+69-364 files

LLVM/project fbcdc95clang/lib/CIR/CodeGen CIRGenModule.cpp CIRGenModule.h, clang/test/CIR/CodeGenHIP hip-cuid.hip

[CIR] Add addLLVMUsed and addLLVMCompilerUsed methods to CIRGenModule
DeltaFile
+100-2clang/lib/CIR/CodeGen/CIRGenModule.cpp
+27-0clang/test/CIR/CodeGenHIP/hip-cuid.hip
+19-0clang/lib/CIR/CodeGen/CIRGenModule.h
+146-23 files

LLVM/project f647f0cllvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/RISCV revec-strided-load.ll

[SLP] Fix handling of strided loads during re-vectorization (#191294)

Fixes #191292
DeltaFile
+8-2llvm/test/Transforms/SLPVectorizer/RISCV/revec-strided-load.ll
+4-3llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+12-52 files

LLVM/project 874702ellvm/lib/Target/AMDGPU AMDGPUSwLowerLDS.cpp

use getFirstNonPHIOrDbgOrAlloca
DeltaFile
+1-3llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
+1-31 files

LLVM/project def143aclang/lib/AST DeclTemplate.cpp, clang/test/SemaTemplate GH188759.cpp

[clang] fix getReplacedTemplateParameter for function template specializations (#189559)

(cherry picked from commit 2b439327026d45bf53e59159c8e40fccf87930b6)
DeltaFile
+13-0clang/test/SemaTemplate/GH188759.cpp
+6-4clang/lib/AST/DeclTemplate.cpp
+19-42 files

LLVM/project a98b9dallvm/lib/Target/AMDGPU AMDGPUSwLowerLDS.cpp, llvm/test/CodeGen/AMDGPU amdgpu-sw-lower-lds-static-alloca-placement.ll

splice and then move straggler allocas
DeltaFile
+40-74llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-alloca-placement.ll
+9-6llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
+49-802 files

LLVM/project b0a403allvm/lib/Target/AMDGPU AMDGPUSwLowerLDS.cpp, llvm/test/CodeGen/AMDGPU amdgpu-sw-lower-lds-static-alloca-placement.ll

[AMDGPU][ASAN] Move allocas to entry block in amdgpu-sw-lower-lds pass
DeltaFile
+95-0llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-alloca-placement.ll
+13-1llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
+108-12 files

LLVM/project c755c08llvm/lib/CodeGen/SelectionDAG TargetLowering.cpp, llvm/test/CodeGen/RISCV split-udiv-by-constant.ll split-urem-by-constant.ll

[TargetLowering] Support larger divisors in expandDIVREMByConstant. (#191119)

Instead of bailing out if the original divisor exceeds HBitWidth,
allow divisors that fit in HBitWidth after removing trailing zeros.

PartialRem now needs a low and high part. Shifting RemL left
now needs to handle shifting into RemH.
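
The arithmetic behind removing trailing zeros can be sketched as follows. This is a minimal Python model of the identity only, not the SelectionDAG code; the function names are hypothetical and arbitrary-precision integers stand in for the split low/high halves.

```python
def split_trailing_zeros(divisor: int) -> tuple[int, int]:
    """Return (odd_part, k) such that divisor == odd_part << k."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    k = (divisor & -divisor).bit_length() - 1  # number of trailing zero bits
    return divisor >> k, k


def udivrem_by_constant(x: int, divisor: int) -> tuple[int, int]:
    """Unsigned quotient and remainder of x by divisor, via its odd part."""
    odd, k = split_trailing_zeros(divisor)
    shifted = x >> k                # drop the low k bits of the dividend
    low_bits = x & ((1 << k) - 1)   # those bits pass straight into the remainder
    quotient = shifted // odd
    partial_rem = shifted % odd     # always smaller than the odd part
    # Shifting the partial remainder back up can grow it by k bits.
    remainder = (partial_rem << k) | low_bits
    return quotient, remainder
```

For instance, with an assumed half width of 64 bits, the divisor `3 << 63` does not fit in 64 bits but its odd part `3` does. The partial remainder can be `2`, and `2 << 63` needs a 65th bit, which is consistent with the message's note that the shift now has to carry into a high part.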

Assisted-by: Claude Sonnet 4.5
DeltaFile
+287-2llvm/test/CodeGen/RISCV/split-udiv-by-constant.ll
+210-2llvm/test/CodeGen/RISCV/split-urem-by-constant.ll
+70-24llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+23-5llvm/test/CodeGen/X86/i128-udiv.ll
+590-334 files

LLVM/project 215f35ellvm/lib/Target/AArch64 AArch64ExpandPseudoInsts.cpp

[AArch64] Skip non-pseudo instructions in AArch64ExpandPseudoInsts (#191395)

AArch64::getSVEPseudoMap calls are visible in compile-time profiles even on
non-SVE targets. I think CodeGenMapTable could be improved: it currently
emits a constexpr array sorted by opcode and a hand-rolled binary search
over that array. However, the AArch64ExpandPseudoInsts pass is missing a
simple check for pseudo instructions before expanding. Adding that check
avoids the compile-time cost.

https://llvm-compile-time-tracker.com/compare.php?from=0d42811ea4658b3e86a3801b3bc848324f8540f8&to=9e2434de84577ca1c5e6de8fe8d75c6b8e282b3f&stat=instructions%3Au
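
The idea of testing a cheap flag before consulting a sorted table can be sketched as below. This is an illustrative Python model only; the table contents, the `Instr` type, and the function names are hypothetical and do not mirror the LLVM data structures.

```python
from bisect import bisect_left
from dataclasses import dataclass

# Hypothetical (opcode, mapped opcode) table, sorted by opcode.
PSEUDO_MAP = [(10, 100), (20, 200), (30, 300)]
_KEYS = [opcode for opcode, _ in PSEUDO_MAP]


@dataclass
class Instr:
    opcode: int
    is_pseudo: bool


def lookup(opcode: int):
    """Binary search over the sorted table; returns None when absent."""
    i = bisect_left(_KEYS, opcode)
    if i < len(_KEYS) and _KEYS[i] == opcode:
        return PSEUDO_MAP[i][1]
    return None


def expand(instrs):
    """Return (resulting opcodes, number of table lookups performed)."""
    out, lookups = [], 0
    for mi in instrs:
        if not mi.is_pseudo:   # cheap flag test first: no table search needed
            out.append(mi.opcode)
            continue
        lookups += 1
        mapped = lookup(mi.opcode)
        out.append(mapped if mapped is not None else mi.opcode)
    return out, lookups
```

In this model, a stream that is mostly non-pseudo instructions performs only as many binary searches as it has pseudo instructions.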
DeltaFile
+2-1llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
+2-11 files

LLVM/project 8e64b13llvm/lib/Target/AMDGPU FLATInstructions.td, llvm/test/CodeGen/AMDGPU llvm.amdgcn.av.global.load.b128.ll llvm.amdgcn.av.global.store.b128.ll

Address review comments

- Revert a lot of mnemonic renames caused by a brute-force sed.
- Add -filetype=null to unsupported test RUN lines
- Regenerate CHECK lines in codegen tests

Assisted-By: Claude Opus 4.6 (1M context)
DeltaFile
+696-696llvm/test/CodeGen/AMDGPU/llvm.amdgcn.av.global.load.b128.ll
+96-96llvm/test/CodeGen/AMDGPU/llvm.amdgcn.av.global.store.b128.ll
+48-48llvm/test/CodeGen/AMDGPU/amdgcn-av-scopes.ll
+7-7llvm/lib/Target/AMDGPU/FLATInstructions.td
+6-6llvm/test/CodeGen/AMDGPU/unsupported-av-global-store.ll
+6-6llvm/test/CodeGen/AMDGPU/unsupported-av-global-load.ll
+859-8591 files not shown
+863-8637 files

LLVM/project 5864733llvm/include/llvm/CodeGen/GlobalISel GIMatchTableExecutorImpl.h GIMatchTableExecutor.h, llvm/utils/TableGen/Common/GlobalISel GlobalISelMatchTable.cpp GlobalISelMatchTable.h

Skip type check for metadata operands in addTypeCheckPredicate

Metadata is trivially always metadata, so we don't actually need the
predicate introduced in #191389.
DeltaFile
+4-15llvm/utils/TableGen/Common/GlobalISel/GlobalISelMatchTable.cpp
+0-18llvm/utils/TableGen/Common/GlobalISel/GlobalISelMatchTable.h
+0-9llvm/include/llvm/CodeGen/GlobalISel/GIMatchTableExecutorImpl.h
+0-6llvm/include/llvm/CodeGen/GlobalISel/GIMatchTableExecutor.h
+4-484 files

LLVM/project 121f5a9.ci compute_projects_test.py compute_projects.py, libclc CMakeLists.txt README.md

[libclc] Enable LLVM_RUNTIME_TARGETS in build system (#189892)

The libclc target is now passed in via LLVM_RUNTIME_TARGETS.

The old configure flow based on `-DLLVM_ENABLE_RUNTIMES=libclc` is
deprecated because libclc no longer has a default target.
`-DLLVM_ENABLE_RUNTIMES=libclc -DLLVM_RUNTIME_TARGETS="<target-triple>"`
still works but is considered legacy.

The new standard build requires each target to be selected explicitly on
the CMake command line through the runtimes target-specific cache entry
and LLVM_RUNTIME_TARGETS.
For example:
-DRUNTIMES_amdgcn-amd-amdhsa-llvm_LLVM_ENABLE_RUNTIMES=libclc
-DLLVM_RUNTIME_TARGETS="amdgcn-amd-amdhsa-llvm"
-DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=libclc
-DLLVM_RUNTIME_TARGETS="nvptx64-nvidia-cuda"
-DRUNTIMES_clspv--_LLVM_ENABLE_RUNTIMES=libclc

    [17 lines not shown]
DeltaFile
+156-162libclc/CMakeLists.txt
+66-15libclc/README.md
+13-10.ci/compute_projects_test.py
+7-14libclc/test/CMakeLists.txt
+14-3.ci/compute_projects.py
+10-2.ci/monolithic-windows.sh
+266-2062 files not shown
+287-2098 files

LLVM/project 507d823llvm/lib/Transforms/Scalar LoopStrengthReduce.cpp, llvm/test/CodeGen/X86/AMX amx-across-func.ll

[LSR] Use TTI to check if zero-start IV is free in getSetupCost (#190587)

This avoids a downstream regression where LSR prefers {-1,+1}.
When a constant-zero start doesn't require preheader initialization
(as queried via TTI::getIntImmCost), consider it free in getSetupCost.

Three test changes are improvements: amx-across-func.ll,
2011-11-29-postincphi.ll and pr62660-normalization-failure.ll.
Other test changes are neutral.
DeltaFile
+16-8llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
+6-8llvm/test/Transforms/LoopStrengthReduce/X86/2011-11-29-postincphi.ll
+6-7llvm/test/Transforms/LoopStrengthReduce/X86/pr62660-normalization-failure.ll
+4-6llvm/test/CodeGen/X86/AMX/amx-across-func.ll
+4-4llvm/test/Transforms/LoopStrengthReduce/X86/postinc-iv-used-by-urem-and-udiv.ll
+4-4llvm/test/Transforms/LoopStrengthReduce/duplicated-phis.ll
+40-371 files not shown
+41-387 files

LLVM/project 7b94b9alibclc/clc/lib/generic/workitem clc_get_sub_group_size.cl

[libclc] Refine generic __clc_get_sub_group_size with fast full sub-group path (#188895)

Add a fast path for the common case where the total work-group size is
a multiple of the max sub-group size.

The fallback path is ported from amdgpu/workitem/clc_get_sub_group_size.cl.

The compiler can generate predicated instructions for the fallback path to
avoid branches.
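
The two paths described above can be modeled roughly as follows. This is a Python sketch under assumptions, not the OpenCL C source: the function and parameter names are hypothetical, and the fallback assumes that only the last sub-group of a work-group can be partial.

```python
from math import prod


def sub_group_size(local_size, max_sub_group_size, sub_group_id):
    """Model of the size of the sub-group with the given id."""
    total = prod(local_size)  # total work-group size across x, y, z
    # Fast path: every sub-group is full when the total divides evenly.
    if total % max_sub_group_size == 0:
        return max_sub_group_size
    # Fallback: all sub-groups except the last are full.
    num_sub_groups = -(-total // max_sub_group_size)  # ceiling division
    if sub_group_id != num_sub_groups - 1:
        return max_sub_group_size
    return total - max_sub_group_size * (num_sub_groups - 1)
```

With a work-group of 8x5x1 = 40 work-items and a max sub-group size of 32, this model gives 32 for sub-group 0 and 8 for sub-group 1.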
DeltaFile
+12-12libclc/clc/lib/generic/workitem/clc_get_sub_group_size.cl
+12-121 files

LLVM/project 00328f1flang/lib/Optimizer/Transforms MIFOpConversion.cpp

[flang][NFC] Fix typo in comment for multi-image environment (#191722)
DeltaFile
+1-1flang/lib/Optimizer/Transforms/MIFOpConversion.cpp
+1-11 files

LLVM/project 2c28158llvm/lib/IR Intrinsics.cpp, llvm/utils/TableGen/Basic IntrinsicEmitter.cpp

[LLVM][Intrinsics] Eliminate range check for IIT table in `DecodeIITType` (#190260)

`DecodeIITType` does a range check each time the next entry from the IIT
encoding table is read. This is required to handle IIT encodings that
are in-lined into the `IIT_Table` entries, since the `IITEntries` array
in `getIntrinsicInfoTableEntries` is terminated after the last non-zero
nibble is seen in the inlined encoding (but that may not be the actual
end). Change this code so that the `IITEntries` array for the inlined
case instead points to the full `IITValues` array payload plus an
`IIT_Done` terminator, so that such entries look exactly as they would if
they were encoded in the long encoding table. Then remove the range check
in `DecodeIITType` to streamline that code a bit.

Additionally, change some uses of 0 (in loop conditions and the
default-constructed terminator in the IIT long encoding table) to
explicitly use `IIT_Done`, which makes the code clearer.

Also use `consume_front()` in a few places instead of `front()` followed
by `slice(1)`.
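
The terminator-based scheme can be illustrated with a small Python model. Everything here apart from the role of `IIT_Done` as a terminator is an assumption made for illustration: the codes `ARG` and `I32`, the 8-nibble word size, and the function names are hypothetical and do not match the real IIT encoding.

```python
IIT_DONE = 0   # terminator value
ARG = 1        # hypothetical code that consumes one operand entry
I32 = 2        # hypothetical code with no operand
NIBBLES = 8    # assumed number of 4-bit slots in an inlined table word


def unpack_inlined(word: int) -> list[int]:
    """Expand every nibble of the word, then append an explicit terminator."""
    entries = [(word >> (4 * i)) & 0xF for i in range(NIBBLES)]
    entries.append(IIT_DONE)
    return entries


def decode(entries: list[int]) -> list[tuple]:
    """Decode until IIT_DONE, with no per-read range check."""
    decoded, pos = [], 0
    while entries[pos] != IIT_DONE:
        code = entries[pos]
        pos += 1
        if code == ARG:
            # The operand may legitimately be 0, so the array cannot be
            # cut off after the last non-zero nibble.
            decoded.append(("arg", entries[pos]))
            pos += 1
        elif code == I32:
            decoded.append(("i32",))
        else:
            raise ValueError(f"unknown code {code}")
    return decoded
```

In this model the word `0x1` encodes `ARG` with operand 0. Truncating after the last non-zero nibble would leave the operand read past the end of the array, whereas the full payload plus terminator decodes without any bounds test.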
DeltaFile
+27-26llvm/lib/IR/Intrinsics.cpp
+7-1llvm/utils/TableGen/Basic/IntrinsicEmitter.cpp
+34-272 files

LLVM/project e62acf4llvm/include/llvm-c Core.h, llvm/include/llvm/IR IRBuilder.h

[NFC][LLVM] Rename IRBuilder/LLVM C API params for overload types (#191674)

Rename IRBuilder and LLVM C API function params for overload types to
use names that better reflect their meaning.
DeltaFile
+20-17llvm/lib/IR/Core.cpp
+12-12llvm/include/llvm-c/Core.h
+5-4llvm/include/llvm/IR/IRBuilder.h
+2-2llvm/lib/IR/IRBuilder.cpp
+39-354 files

LLVM/project d946ac3llvm/lib/Transforms/Utils CallGraphUpdater.cpp, llvm/test/Transforms/Inline inline-history-dead-function.ll

[CallGraphUpdater] Replace dead function in metadata with null instead of poison (#191729)

Assisted-by: claude-4.6-opus
DeltaFile
+29-0llvm/test/Transforms/Inline/inline-history-dead-function.ll
+6-1llvm/lib/Transforms/Utils/CallGraphUpdater.cpp
+35-12 files

LLVM/project 21d9778llvm/test/CodeGen/AMDGPU memory-legalizer-private-singlethread.ll memory-legalizer-private-wavefront.ll

Rebase

Created using spr 1.3.7
DeltaFile
+8,544-1,366llvm/test/CodeGen/AMDGPU/memory-legalizer-private-singlethread.ll
+8,544-1,366llvm/test/CodeGen/AMDGPU/memory-legalizer-private-wavefront.ll
+8,544-1,366llvm/test/CodeGen/AMDGPU/memory-legalizer-private-workgroup.ll
+8,449-1,355llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll
+8,449-1,355llvm/test/CodeGen/AMDGPU/memory-legalizer-private-cluster.ll
+8,069-1,315llvm/test/CodeGen/AMDGPU/memory-legalizer-private-system.ll
+50,599-8,1237,128 files not shown
+569,814-214,9437,134 files

LLVM/project c0fbdb2clang/lib/AST/ByteCode EvaluationResult.cpp

[clang][bytecode] Stop using QualTypes when checking evaluation results (#191732)

They might not match the descriptor contents exactly, so just look at
the descriptors.
DeltaFile
+25-34clang/lib/AST/ByteCode/EvaluationResult.cpp
+25-341 files

LLVM/project c4c31f8llvm/lib/Transforms/Vectorize SLPVectorizer.cpp

Fix formatting

Created using spr 1.3.7
DeltaFile
+1-2llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+1-21 files

LLVM/project bef1453clang/lib/Headers/hlsl hlsl_alias_intrinsics.h, llvm/test/CodeGen/AArch64 itofp-bf16.ll

Rebase, address comments

Created using spr 1.3.7
DeltaFile
+0-4,851llvm/test/tools/llvm-mca/RISCV/SiFiveX390/vector-fp.s
+2,832-1,746llvm/test/CodeGen/AArch64/itofp-bf16.ll
+4,526-0llvm/test/tools/llvm-mca/RISCV/SiFiveX390/rvv/arithmetic.test
+3,583-866llvm/test/CodeGen/RISCV/fpclamptosat.ll
+4-3,871clang/lib/Headers/hlsl/hlsl_alias_intrinsics.h
+3,706-0llvm/test/tools/llvm-mca/RISCV/SiFiveX390/rvv/fp.test
+14,651-11,3343,790 files not shown
+188,733-91,1953,796 files

LLVM/project 717ba7cllvm/lib/Transforms/Vectorize VPlanRecipes.cpp VPlanConstruction.cpp

[VPlan] Handle calls in VPInstruction:opcodeMayReadOrWriteFromMemory. (#190681)

Retrieve the called function and check its memory attributes, to
determine if a VPInstruction calling a function reads or writes memory.

Use it to strengthen the assert in areAllLoadsDereferenceable.

PR: https://github.com/llvm/llvm-project/pull/190681
DeltaFile
+25-8llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+4-2llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+29-102 files

LLVM/project a98cb95llvm/tools/llvm-profgen PerfReader.cpp PerfReader.h

reduce changes

Created using spr 1.3.4
DeltaFile
+10-10llvm/tools/llvm-profgen/PerfReader.cpp
+3-0llvm/tools/llvm-profgen/PerfReader.h
+13-102 files

LLVM/project 029e5b0clang/lib/Format WhitespaceManager.cpp ContinuationIndenter.h, clang/unittests/Format AlignmentTest.cpp

[clang-format] treat continuation as indent for aligned lines (#191217)

This allows the lines we want to align to inherit the tabbed indent from
the lines we break. Thus, in AlignWithSpaces mode, aligned lines do not
get a smaller indent than the lines they are aligned to.
DeltaFile
+38-19clang/lib/Format/WhitespaceManager.cpp
+34-0clang/unittests/Format/AlignmentTest.cpp
+16-17clang/lib/Format/ContinuationIndenter.h
+15-14clang/lib/Format/ContinuationIndenter.cpp
+19-0clang/lib/Format/FormatToken.h
+8-7clang/lib/Format/BreakableToken.cpp
+130-572 files not shown
+136-628 files

LLVM/project 5b1b0efclang/tools/diagtool ShowEnabledWarnings.cpp

[Clang][diagtool] Fix memory leak in ShowEnabledWarnings (#191711)

Fix a 136-byte memory leak introduced in commit 6dc059ac3c7c. Before
that commit, the TextDiagnosticBuffer was passed to the DiagnosticsEngine
constructor, which took ownership and managed its lifetime. After the
refactoring, the buffer is no longer passed to DiagnosticsEngine, so
it becomes an orphaned allocation that is never freed. Changed to use
std::unique_ptr for automatic cleanup.
DeltaFile
+2-1clang/tools/diagtool/ShowEnabledWarnings.cpp
+2-11 files