LLVM/project 7f8faefllvm/test/Transforms/PhaseOrdering/X86 horizontal-reduce-add.ll horizontal-reduce-fadd.ll

[PhaseOrdering][X86] Copied codegen add/fadd reduction pattern tests to ensure middle-end is creating reduction intrinsics (#206101)

AVX512 is missing an llvm.vector.reduce.add.v16i32 call; will investigate
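
For readers unfamiliar with the pattern being tested, here is a minimal sketch of a shuffle-and-add reduction ladder next to the plain sum that a reduction intrinsic expresses. The function names are hypothetical and plain C++ vectors stand in for IR vectors; unsigned arithmetic is used so that wraparound matches an integer `add`. The floating-point case is deliberately left out, since reassociating `fadd` is not generally value-preserving.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: repeatedly adds the upper half of the vector onto the
// lower half, halving the active width each step (the "ladder" shape).
std::uint32_t reduceAddLadder(std::vector<std::uint32_t> V) {
  const std::size_t N = V.size();
  assert(N != 0 && (N & (N - 1)) == 0 && "lane count must be a power of two");
  for (std::size_t Half = N / 2; Half >= 1; Half /= 2)
    for (std::size_t I = 0; I < Half; ++I)
      V[I] += V[I + Half];
  return V[0];
}

// Hypothetical helper: the same result written as a single linear sum.
std::uint32_t reduceAddLinear(const std::vector<std::uint32_t> &V) {
  return std::accumulate(V.begin(), V.end(), std::uint32_t{0});
}
```

Both helpers return the same value for any power-of-two lane count, which is the equivalence a middle-end pass relies on when it folds the ladder into one reduction call.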
DeltaFile
+125-0llvm/test/Transforms/PhaseOrdering/X86/horizontal-reduce-add.ll
+98-0llvm/test/Transforms/PhaseOrdering/X86/horizontal-reduce-fadd.ll
+223-02 files

LLVM/project bcc3939llvm/lib/Transforms/Scalar GVNSink.cpp

[llvm][GVNSink] Avoid non-deterministic iteration order over NeededPHIs (#205952)

The iteration order of DenseSet is not guaranteed, which affects the
output of code generated with GVNSink enabled. This can cause code to be
emitted in differing order, affect section ordering, and in some cases
was reported to result in larger binaries due to increased padding between
sections.

This patch addresses this by using SetVector, which has a deterministic
iteration order.
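
As a rough illustration of why an insertion-ordered set is deterministic, here is a minimal sketch using only the standard library. `InsertionOrderedSet` is a hypothetical stand-in that mirrors the idea behind `llvm::SetVector` (a set for membership plus a vector for order); it is not LLVM's implementation and not the code from this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in: a hash set answers membership queries, and a vector
// records the order of first insertion.
template <typename T>
class InsertionOrderedSet {
public:
  // Returns true if the value was newly inserted.
  bool insert(const T &Value) {
    if (!Seen.insert(Value).second)
      return false;
    Order.push_back(Value);
    return true;
  }

  bool contains(const T &Value) const { return Seen.count(Value) != 0; }
  std::size_t size() const { return Order.size(); }

  const T &operator[](std::size_t I) const {
    assert(I < Order.size() && "index out of range");
    return Order[I];
  }

  // Iteration walks the vector, so the order depends only on the sequence of
  // insert() calls, never on hash values or pointer addresses.
  typename std::vector<T>::const_iterator begin() const { return Order.begin(); }
  typename std::vector<T>::const_iterator end() const { return Order.end(); }

private:
  std::unordered_set<T> Seen;
  std::vector<T> Order;
};

// Collects the iteration order into a vector for inspection.
std::vector<int> iterationOrder(const InsertionOrderedSet<int> &S) {
  return std::vector<int>(S.begin(), S.end());
}
```

Inserting 30, 10, 30, 20 yields the iteration order 30, 10, 20 on every run, whereas a plain hash set keyed on pointers may iterate differently from one process to the next.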
DeltaFile
+3-4llvm/lib/Transforms/Scalar/GVNSink.cpp
+3-41 files

LLVM/project 5cc938bllvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll, llvm/test/CodeGen/RISCV clmul.ll

Rebase

Created using spr 1.3.7
DeltaFile
+25,784-36,416llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+12,227-23,140llvm/test/CodeGen/RISCV/rvv/clmul-sdnode.ll
+12,991-3,310llvm/test/MC/AMDGPU/gfx13_asm_vop3_dpp16.s
+11,856-3,719llvm/test/MC/AMDGPU/gfx12_asm_vop3_dpp16.s
+4,004-11,142llvm/test/CodeGen/RISCV/clmul.ll
+6,940-6,782llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+73,802-84,5098,083 files not shown
+467,177-361,3538,089 files

LLVM/project 6a0ef50clang/lib/CodeGen CGStmtOpenMP.cpp

[clang][OpenMP][NFC] Assert fused distribute loop invariant

See
https://github.com/llvm/llvm-project/pull/201670#discussion_r3463060131
DeltaFile
+8-3clang/lib/CodeGen/CGStmtOpenMP.cpp
+8-31 files

LLVM/project fdf30aeflang/lib/Optimizer/Builder FIRBuilder.cpp, flang/lib/Optimizer/HLFIR/Transforms InlineHLFIRAssign.cpp

Revert "[Flang]Add support for inlining hlfir.assign operation where both LHS and RHS are slices of the same array" (#206103)

Reverts llvm/llvm-project#204532 due to regressions in numerous Fujitsu
tests and several important apps
DeltaFile
+0-178flang/test/HLFIR/inline-hlfir-assign-self-copy.fir
+0-141flang/test/HLFIR/inline-hlfir-assign-self-copy-runtime-stride.fir
+0-137flang/lib/Optimizer/Builder/FIRBuilder.cpp
+0-132flang/test/HLFIR/inline-hlfir-assign-pointer-overlap.fir
+0-98flang/test/HLFIR/inline-hlfir-assign-scalar-index.fir
+12-48flang/lib/Optimizer/HLFIR/Transforms/InlineHLFIRAssign.cpp
+12-7342 files not shown
+13-7458 files

LLVM/project 0c0633bllvm/include/llvm/ADT Enum.h

[spr] initial version

Created using spr 1.3.8-wip
DeltaFile
+2-1llvm/include/llvm/ADT/Enum.h
+2-11 files

LLVM/project 7dbdc34llvm/cmake/modules CoverageReport.cmake

Fix command-line argument for prepare-code-coverage-artifact.py (#200982)
DeltaFile
+1-1llvm/cmake/modules/CoverageReport.cmake
+1-11 files

LLVM/project 543ee60flang/lib/Semantics check-omp-structure.cpp check-omp-structure.h

Remove unused locator check
DeltaFile
+0-11flang/lib/Semantics/check-omp-structure.cpp
+0-1flang/lib/Semantics/check-omp-structure.h
+0-122 files

LLVM/project 649fbc6llvm/include/llvm/Object ELFObjectFile.h

LLVM_ABI

Created using spr 1.3.8-wip
DeltaFile
+1-1llvm/include/llvm/Object/ELFObjectFile.h
+1-11 files

LLVM/project 96ef44cllvm/lib/Target/AMDGPU/Utils AMDGPUPALMetadata.cpp

[AMDGPU][NFC] Use compact enum table for PALMetadata (#206085)

Instead of storing a pointer+value pair, use the new enum tables to store
the same information more compactly and without dynamic relocations.
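
The general technique can be sketched as follows, assuming a layout where all names sit in one character blob and each entry stores an integer offset instead of a pointer. The names, values, and helper functions below are hypothetical and do not reflect the actual enum table API or PAL metadata keys. Because the entries hold no addresses, a table like this needs no load-time relocation in position-independent code.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// All names live in one array, separated by NUL bytes (hypothetical names).
constexpr char NameBlob[] = "ALPHA\0BETA\0GAMMA";
static_assert(sizeof(NameBlob) == 17, "5 + 1 + 4 + 1 + 5 characters plus the final NUL");

struct Entry {
  std::uint16_t NameOffset; // offset into NameBlob, not a pointer
  std::uint32_t Value;      // hypothetical enum value
};

constexpr Entry Table[] = {
    {0, 0x10},  // "ALPHA"
    {6, 0x20},  // "BETA"
    {11, 0x30}, // "GAMMA"
};

// Resolves an offset to the NUL-terminated name stored in the blob.
const char *nameAt(std::uint16_t Offset) {
  assert(Offset < sizeof(NameBlob) && "offset must point inside the blob");
  return NameBlob + Offset;
}

// Returns the name for a value, or an empty view if the value is unknown.
std::string_view nameOf(std::uint32_t Value) {
  for (const Entry &E : Table)
    if (E.Value == Value)
      return std::string_view(nameAt(E.NameOffset));
  return {};
}

// Looks up a value by name; returns false if the name is unknown.
bool valueOf(std::string_view Name, std::uint32_t &Out) {
  for (const Entry &E : Table) {
    if (Name == std::string_view(nameAt(E.NameOffset))) {
      Out = E.Value;
      return true;
    }
  }
  return false;
}
```

Each entry here takes 8 bytes, compared with 16 bytes for a pointer plus a 32-bit value on a typical 64-bit target.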
DeltaFile
+359-363llvm/lib/Target/AMDGPU/Utils/AMDGPUPALMetadata.cpp
+359-3631 files

LLVM/project 5c577a1flang/lib/Optimizer/Builder FIRBuilder.cpp, flang/lib/Optimizer/HLFIR/Transforms InlineHLFIRAssign.cpp

Revert "[Flang]Add support for inlining hlfir.assign operation where both LHS…"

This reverts commit 8eae99152a23bd70c6f8bbf8f59a4518902eb73f.
DeltaFile
+0-178flang/test/HLFIR/inline-hlfir-assign-self-copy.fir
+0-141flang/test/HLFIR/inline-hlfir-assign-self-copy-runtime-stride.fir
+0-137flang/lib/Optimizer/Builder/FIRBuilder.cpp
+0-132flang/test/HLFIR/inline-hlfir-assign-pointer-overlap.fir
+0-98flang/test/HLFIR/inline-hlfir-assign-scalar-index.fir
+12-48flang/lib/Optimizer/HLFIR/Transforms/InlineHLFIRAssign.cpp
+12-7342 files not shown
+13-7458 files

LLVM/project 00a6186llvm/lib/Target/AMDGPU AMDGPUSwLowerLDS.cpp AMDGPUAsmPrinter.cpp, llvm/unittests/CodeGen AMDGPUMetadataTest.cpp

AMDGPU: Prefer getting the triple from the module over the TargetMachine (#206055)
DeltaFile
+10-18llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
+12-9llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+2-4llvm/lib/Target/AMDGPU/AMDGPU.h
+3-3llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+1-1llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+1-0llvm/unittests/CodeGen/AMDGPUMetadataTest.cpp
+29-356 files

LLVM/project 9ff2237llvm/runtimes CMakeLists.txt

scope to tests
DeltaFile
+10-1llvm/runtimes/CMakeLists.txt
+10-11 files

LLVM/project 60908dcllvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 reduction-copyable-reused-scalars.ll

[SLP] Fix reused-scalar reduction counters for copyable root nodes

The horizontal reduction reuse-counter scale is built in
getRootNodeScalars() order and applied positionally to the emitted
reduction vector. For a root node with copyable elements the scalar
order is reordered while the emitted lanes still follow the reduced
values (candidates) order, so the repeat count was applied to the wrong
lane, producing a wrong reduction result.
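
The class of bug described above can be illustrated with a small sketch. It is not the SLPVectorizer code or the actual fix; `Lane`, `reduceAddPositional`, and `reduceAddKeyed` are hypothetical names. The point is that a repeat count applied by position is only correct when the counts and the lanes were built in the same order, while a count looked up by scalar identity is order-independent.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one emitted lane: which scalar it holds and its value.
struct Lane {
  std::string Name;
  int Value;
};

// Positional scaling: Counts[I] is assumed to belong to Lanes[I].
int reduceAddPositional(const std::vector<Lane> &Lanes,
                        const std::vector<int> &Counts) {
  assert(Lanes.size() == Counts.size() && "one count per lane");
  int Sum = 0;
  for (std::size_t I = 0; I < Lanes.size(); ++I)
    Sum += Lanes[I].Value * Counts[I];
  return Sum;
}

// Keyed scaling: the count is looked up by the scalar's identity.
int reduceAddKeyed(const std::vector<Lane> &Lanes,
                   const std::map<std::string, int> &CountOf) {
  int Sum = 0;
  for (const Lane &L : Lanes)
    Sum += L.Value * CountOf.at(L.Name);
  return Sum;
}
```

With lanes `a = 1` (used three times) and `b = 10` (used once), the expected sum is 13. Counts in lane order, `{3, 1}`, give 13; counts in a reordered scalar order, `{1, 3}`, give 31.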

Fixes #205614

Pull Request: https://github.com/llvm/llvm-project/pull/206102
DeltaFile
+151-0llvm/test/Transforms/SLPVectorizer/X86/reduction-copyable-reused-scalars.ll
+26-8llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+177-82 files

LLVM/project fe5f49fcross-project-tests/dtlto cache-serialization.test, llvm/include/llvm/DTLTO DTLTO.h

[DTLTO] Do not serialize inputs that hit in the ThinLTO cache (#204104)

To handle bitcode inputs that are not in individual files on disk, such
as members of non-thin archives, DTLTO serializes those inputs to
temporary individual bitcode files.

This patch changes LLVM to serialize only uncached input modules and any
modules they import from.
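
The selection rule can be sketched as below, assuming each module records whether it hit the cache and which modules it imports from directly. `ModuleInfo` and `modulesToSerialize` are hypothetical names, not the DTLTO API, and the sketch ignores the separate question of whether an input already exists as an individual file on disk.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-module record.
struct ModuleInfo {
  bool CacheHit;                    // true if the ThinLTO cache has a result
  std::vector<std::string> Imports; // modules this one imports from
};

// Returns the modules that still need to be written out: every uncached
// module, plus each module that an uncached module imports from.
std::set<std::string>
modulesToSerialize(const std::map<std::string, ModuleInfo> &Modules) {
  std::set<std::string> Result;
  for (const auto &[Name, Info] : Modules) {
    if (Info.CacheHit)
      continue; // cached result will be reused, so no backend compile is needed
    Result.insert(Name);
    for (const std::string &Imported : Info.Imports) {
      assert(Modules.count(Imported) != 0 && "import must name a known module");
      Result.insert(Imported);
    }
  }
  return Result;
}
```

A cached module that is imported only by other cached modules is never selected, which is where the saving comes from when most of the link hits the cache.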

For a link of Clang 22 (debug build with sanitizers and
instrumentation), I performed measurements with and without this patch
for an optimized toolchain (PGO non-LTO, based on recent main commit
c264e07c2f3d9f25a2526e69926daea3a68be74b). The measurements were run on:
- Windows 11 Pro build 26200, AMD Family 25 at approximately 4.5 GHz,
  16 cores / 32 threads, and 64 GB RAM.
- Ubuntu 24.04.3 LTS, Ryzen 9 5950X with 32 threads, and 62 GiB RAM.

There was no difference in serialization time when the cache was
disabled.

    [4 lines not shown]
DeltaFile
+91-0cross-project-tests/dtlto/cache-serialization.test
+19-12llvm/include/llvm/DTLTO/DTLTO.h
+7-4llvm/lib/DTLTO/DTLTOInputFiles.cpp
+6-0llvm/lib/DTLTO/DTLTO.cpp
+123-164 files

LLVM/project d445267llvm/test/Transforms/LoopVectorize/AArch64 select-index.ll gather-cost.ll, llvm/test/Transforms/PhaseOrdering/AArch64 hoist-runtime-checks.ll

[AArch64] Increase the max interleave factor to 4 for loops with reductions (#205612)

The default max interleave factor is 2. Increasing it to 4 universally
can spend extra code size on something that does not always
increase performance (especially if the loop gets over-unrolled). Small
reduction loops often benefit from extra interleaving due to the
multiple independent streams that can execute in parallel. This patch
increases the max interleave factor to 4 for such loops, limited to
cases where the VF is <= 4 to limit the impact on already highly
vectorized loops.
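
Read literally, the rule in this message amounts to something like the sketch below. The function name, signature, and the way a loop is classified as having reductions are assumptions for illustration; the real change lives in the AArch64 cost model and may apply further conditions.

```cpp
#include <cassert>

// Hypothetical helper: picks the maximum interleave factor from the
// vectorization factor (VF) and whether the loop contains reductions.
unsigned maxInterleaveFactor(unsigned VF, bool HasReductions) {
  assert(VF >= 1 && "vectorization factor must be at least 1");
  constexpr unsigned DefaultMax = 2;   // default stated in the commit message
  constexpr unsigned ReductionMax = 4; // raised limit for reduction loops
  if (HasReductions && VF <= 4)
    return ReductionMax;
  return DefaultMax;
}
```

Under these assumptions a reduction loop at VF 4 may interleave up to 4 times, while the same loop at VF 8, or any loop without reductions, stays at 2.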
DeltaFile
+357-90llvm/test/Transforms/LoopVectorize/AArch64/select-index.ll
+317-36llvm/test/Transforms/LoopVectorize/AArch64/gather-cost.ll
+247-55llvm/test/Transforms/LoopVectorize/AArch64/interleave-with-gaps.ll
+138-48llvm/test/Transforms/LoopVectorize/AArch64/clamped-load.ll
+114-39llvm/test/Transforms/PhaseOrdering/AArch64/hoist-runtime-checks.ll
+116-35llvm/test/Transforms/LoopVectorize/AArch64/reduction-recurrence-costs-sve.ll
+1,289-30337 files not shown
+1,776-45143 files

LLVM/project 49f061cllvm/utils/gn/secondary/libcxx/src BUILD.gn

[gn] port 0eefb2682bf8c (C++26 for libc++) (#206100)

Unlike in 8a7846fe86f95e82c (the C++23 bump), we apparently only
bump the standard for libc++, but not for libc++abi.
DeltaFile
+1-1llvm/utils/gn/secondary/libcxx/src/BUILD.gn
+1-11 files

LLVM/project 7ee6526llvm/lib/BinaryFormat DXContainer.cpp

clang-format

Created using spr 1.3.8-wip
DeltaFile
+3-4llvm/lib/BinaryFormat/DXContainer.cpp
+3-41 files

LLVM/project 94b5b37llvm/utils/gn/secondary/compiler-rt/lib/builtins BUILD.gn

[gn] "port" affc89f7cb64 (#206098)
DeltaFile
+2-0llvm/utils/gn/secondary/compiler-rt/lib/builtins/BUILD.gn
+2-01 files

LLVM/project ddb1d0cllvm/test/CodeGen/AMDGPU float-sopc-vopc.ll

[AMDGPU][NFC] Add explicit -global-isel flag to test (#203533)

Follow up to #200414
[comment](https://github.com/llvm/llvm-project/pull/200414#discussion_r3403426555)
to add explicit `-global-isel` flag to mixed tests.
DeltaFile
+4-4llvm/test/CodeGen/AMDGPU/float-sopc-vopc.ll
+4-41 files

LLVM/project 255916elldb/source/Plugins/SymbolFile/NativePDB PdbFPOProgramToDWARFExpression.cpp

add namespace prefix in lldb

Created using spr 1.3.8-wip
DeltaFile
+1-1lldb/source/Plugins/SymbolFile/NativePDB/PdbFPOProgramToDWARFExpression.cpp
+1-11 files

LLVM/project 8249d08mlir/include/mlir/Dialect/OpenACC/Transforms Passes.td, mlir/lib/Dialect/OpenACC/Transforms ACCBindRoutine.cpp

[mlir][acc] Rewrite acc routine bind calls inside gpu.func (#204220)

Run `acc-bind-routine` on `FunctionOpInterface` and rewrite calls to
bound symbols in offload regions and `gpu.func`. For string bind names,
declare private functions in the enclosing `gpu.module` symbol table
when the call is inside device code.
DeltaFile
+56-47mlir/lib/Dialect/OpenACC/Transforms/ACCBindRoutine.cpp
+25-1mlir/test/Dialect/OpenACC/acc-bind-routine.mlir
+2-1mlir/include/mlir/Dialect/OpenACC/Transforms/Passes.td
+83-493 files

LLVM/project 384cecdcross-project-tests/debuginfo-tests/dexter/dex/evaluation ExpectRewriter.py, cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/rewriting rewrite_expects.cpp rewrite_multiple_scripts.cpp

Reapply "[Dexter] Add ability to rewrite scripts to fill-in unknown values" (#206034)

Reverts llvm/llvm-project#205657

The original commit was causing pre-merge CI to fail for AArch64, as one
of the tests expects stepping behaviour that is not seen on
AArch64 targets. The test suite containing the failing test is meant to
be configured not to run for AArch64, but the unsupported label was not
being applied, due to an error in the unsupported check. This patch
fixes the unsupported check in scripts/lit.local.cfg, which should
prevent further errors.
DeltaFile
+212-0cross-project-tests/debuginfo-tests/dexter/dex/evaluation/ExpectRewriter.py
+130-0cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/rewriting/Inputs/rewrite_expect_list_expected.cpp
+54-0cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/rewriting/rewrite_expects.cpp
+53-0cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/rewriting/Inputs/rewrite_expects_expected.cpp
+48-0cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/rewriting/rewrite_multiple_scripts.cpp
+48-0cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/rewriting/Inputs/rewrite_multiple_scripts_expected.cpp
+545-09 files not shown
+691-815 files

LLVM/project f7035c8bolt/lib/Core BinaryContext.cpp, bolt/lib/Passes BinaryPasses.cpp CacheMetrics.cpp

add missing includes

Created using spr 1.3.8-wip
DeltaFile
+1-0bolt/lib/Core/BinaryContext.cpp
+1-0bolt/lib/Passes/BinaryPasses.cpp
+1-0bolt/lib/Passes/CacheMetrics.cpp
+1-0bolt/lib/Passes/ProfileQualityStats.cpp
+1-0bolt/lib/Passes/RetpolineInsertion.cpp
+1-0llvm/tools/llvm-dwarfutil/DebugInfoLinker.cpp
+6-01 files not shown
+7-07 files

LLVM/project 1b35230llvm/lib/Target/X86/MCTargetDesc X86InstComments.cpp

[spr] initial version

Created using spr 1.3.8-wip
DeltaFile
+260-258llvm/lib/Target/X86/MCTargetDesc/X86InstComments.cpp
+260-2581 files

LLVM/project e03fb67llvm/lib/Target/AMDGPU AMDGPUSwLowerLDS.cpp AMDGPUAsmPrinter.cpp, llvm/unittests/CodeGen AMDGPUMetadataTest.cpp

AMDGPU: Prefer getting the triple from the module over the TargetMachine
DeltaFile
+10-18llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
+12-9llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+2-4llvm/lib/Target/AMDGPU/AMDGPU.h
+3-3llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+1-1llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+1-0llvm/unittests/CodeGen/AMDGPUMetadataTest.cpp
+29-356 files

LLVM/project e3e7ec4clang/test/DebugInfo/Generic bitfield-0-struct.c, clang/test/SemaSYCL sycl-cconv-win.cpp

AMDGPU: Fix typos in test triple OS components (#206065)

Co-Authored-By: Claude <noreply at anthropic.com>
DeltaFile
+1-1llvm/test/CodeGen/AMDGPU/GlobalISel/shufflevector.ll
+1-1clang/test/DebugInfo/Generic/bitfield-0-struct.c
+1-1clang/test/SemaSYCL/sycl-cconv-win.cpp
+3-33 files

LLVM/project bf4663fllvm/lib/TargetParser X86TargetParser.cpp

[spr] initial version

Created using spr 1.3.8-wip
DeltaFile
+180-184llvm/lib/TargetParser/X86TargetParser.cpp
+180-1841 files

LLVM/project 0aaec67llvm/test/CodeGen/AMDGPU vgpr-large-tuple-alloc-error.ll unpack-non-coissue-insts-post-ra-scheduler.mir

AMDGPU: Use -mtriple= instead of the space-separated form for llc run lines (#206067)

-mtriple=amdgcn is by far the dominant form over space separation.
Convert these to simplify future bulk test updates.
DeltaFile
+4-4llvm/test/CodeGen/AMDGPU/vgpr-large-tuple-alloc-error.ll
+3-3llvm/test/CodeGen/AMDGPU/unpack-non-coissue-insts-post-ra-scheduler.mir
+2-2llvm/test/CodeGen/AMDGPU/phi-elimination-assertion.mir
+2-2llvm/test/CodeGen/AMDGPU/phi-elimination-end-cf.mir
+2-2llvm/test/CodeGen/AMDGPU/inline-calls.ll
+2-2llvm/test/CodeGen/AMDGPU/amdgpu-mul24-knownbits.ll
+15-1514 files not shown
+30-3020 files

LLVM/project 4182083llvm/lib/Target/AMDGPU/Utils AMDGPUPALMetadata.cpp

[spr] initial version

Created using spr 1.3.8-wip
DeltaFile
+359-363llvm/lib/Target/AMDGPU/Utils/AMDGPUPALMetadata.cpp
+359-3631 files