LLVM/project 43f8208 llvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

Merge branch 'users/ziqingluo/PR-172429193-2-split-2' into users/ziqingluo/PR-172429193-2-split-3

 Conflicts:
        clang/lib/ScalableStaticAnalysisFramework/Analyses/SSAFAnalysesCommon.cpp
        clang/lib/ScalableStaticAnalysisFramework/Analyses/SSAFAnalysesCommon.h
Delta File
+275,101 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+57,682 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/double-nested-loops-complex-cfg.mir
+41,844 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills2.mir
+40,613 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills1.mir
+37,209 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills3.mir
+597,128 -0 3,040 files not shown
+986,235 -65,925 3,046 files

LLVM/project 4b9cac9 llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.sub.ll llvm.amdgcn.reduce.add.ll

[AMDGPU] Unmark wave reduce intrinsics for constant folding

The `add`, `sub`, and `xor` wave reduction intrinsics cannot
be constant folded, as the results of `add` and `sub` must be multiplied
by the number of active lanes, and `xor` depends on the parity
of the number of active lanes.
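
The dependence on the active lane count can be illustrated with a small host-side simulation. This is only a sketch: `Lane`, `uniformWave`, and the `reduce*` helpers are hypothetical names invented for illustration, not LLVM or AMDGPU APIs, and the wave is modeled as a plain vector.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of a wave: every lane holds the same uniform value,
// and only some lanes are active. This is not an LLVM API.
struct Lane {
  uint32_t Value;
  bool Active;
};

// Builds a wave of NumLanes lanes in which the first NumActive are active.
std::vector<Lane> uniformWave(uint32_t Value, unsigned NumLanes,
                              unsigned NumActive) {
  std::vector<Lane> Wave;
  for (unsigned I = 0; I < NumLanes; ++I)
    Wave.push_back({Value, I < NumActive});
  return Wave;
}

// Sum over active lanes: for a uniform input this is Value * NumActive.
uint32_t reduceAdd(const std::vector<Lane> &Wave) {
  uint32_t Acc = 0;
  for (const Lane &L : Wave)
    if (L.Active)
      Acc += L.Value;
  return Acc;
}

// XOR over active lanes: Value for an odd count, 0 for an even count.
uint32_t reduceXor(const std::vector<Lane> &Wave) {
  uint32_t Acc = 0;
  for (const Lane &L : Wave)
    if (L.Active)
      Acc ^= L.Value;
  return Acc;
}

// Max over active lanes, shown for contrast: with at least one active
// lane, the lane count does not affect the result.
uint32_t reduceMax(const std::vector<Lane> &Wave) {
  uint32_t Acc = 0;
  for (const Lane &L : Wave)
    if (L.Active && L.Value > Acc)
      Acc = L.Value;
  return Acc;
}
```

With a uniform input of 5, three active lanes give an `add` result of 15 and an `xor` result of 5, while two active lanes give 10 and 0. The constant input alone therefore does not determine the result.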
Delta File
+352 -172 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll
+337 -168 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll
+327 -156 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.xor.ll
+229 -132 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.max.ll
+228 -132 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.min.ll
+228 -132 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.or.ll
+1,701 -892 4 files not shown
+2,383 -1,294 10 files

LLVM/project e983c47 llvm/lib/Analysis ConstantFolding.cpp

[AMDGPU] Unmark wave reduce intrinsics for constant folding

The `add`, `sub`, and `xor` wave reduction intrinsics cannot
be constant folded, as the results of `add` and `sub` must be multiplied
by the number of active lanes, and `xor` depends on the parity
of the number of active lanes.
Delta File
+0 -6 llvm/lib/Analysis/ConstantFolding.cpp
+0 -6 1 files

LLVM/project e9d24b0 clang/lib/ScalableStaticAnalysisFramework/Analyses SSAFAnalysesCommon.cpp

fix build issue
Delta File
+2 -0 clang/lib/ScalableStaticAnalysisFramework/Analyses/SSAFAnalysesCommon.cpp
+2 -0 1 files

LLVM/project 07f7b03 llvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

Merge remote-tracking branch 'origin/users/ziqingluo/PR-172429193-2-split-1' into users/ziqingluo/PR-172429193-2-split-2

 Conflicts:
        clang/include/clang/ScalableStaticAnalysisFramework/Analyses/EntityPointerLevel/EntityPointerLevelFormat.h
Delta File
+275,101 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+57,682 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/double-nested-loops-complex-cfg.mir
+41,844 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills2.mir
+40,613 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills1.mir
+37,209 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills3.mir
+597,128 -0 3,038 files not shown
+986,216 -65,909 3,044 files

LLVM/project 4976de9 llvm/test/CodeGen/AArch64 sve-fixed-length-masked-expandloads.ll sve-streaming-mode-fixed-length-masked-expandload.ll, llvm/test/CodeGen/RISCV/rvv vector-interleave.ll

remove opt

Created using spr 1.3.7
Delta File
+26,606 -0 llvm/test/CodeGen/AArch64/sve-fixed-length-masked-expandloads.ll
+4,078 -0 llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-expandload.ll
+1,604 -1,567 llvm/test/CodeGen/AArch64/clmul-scalable.ll
+731 -1,359 llvm/test/Transforms/LoopVectorize/find-last-iv-sinkable-expr.ll
+1,957 -0 llvm/test/CodeGen/AArch64/sve-fixed-length-masked-rem.ll
+1,950 -0 llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll
+36,926 -2,926 2,036 files not shown
+103,072 -36,401 2,042 files

LLVM/project 3fb1444 llvm/test/CodeGen/AArch64 sve-fixed-length-masked-expandloads.ll sve-streaming-mode-fixed-length-masked-expandload.ll, llvm/test/CodeGen/RISCV/rvv vector-interleave.ll

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
Delta File
+26,606 -0 llvm/test/CodeGen/AArch64/sve-fixed-length-masked-expandloads.ll
+4,078 -0 llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-expandload.ll
+1,604 -1,567 llvm/test/CodeGen/AArch64/clmul-scalable.ll
+731 -1,359 llvm/test/Transforms/LoopVectorize/find-last-iv-sinkable-expr.ll
+1,957 -0 llvm/test/CodeGen/AArch64/sve-fixed-length-masked-rem.ll
+1,950 -0 llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll
+36,926 -2,926 2,036 files not shown
+103,050 -36,380 2,042 files

LLVM/project 0fb1f2d llvm/test/CodeGen/AArch64 sve-fixed-length-masked-expandloads.ll sve-streaming-mode-fixed-length-masked-expandload.ll, llvm/test/CodeGen/RISCV/rvv vector-interleave.ll

remove opt

Created using spr 1.3.7
Delta File
+26,606 -0 llvm/test/CodeGen/AArch64/sve-fixed-length-masked-expandloads.ll
+4,078 -0 llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-expandload.ll
+1,604 -1,567 llvm/test/CodeGen/AArch64/clmul-scalable.ll
+731 -1,359 llvm/test/Transforms/LoopVectorize/find-last-iv-sinkable-expr.ll
+1,957 -0 llvm/test/CodeGen/AArch64/sve-fixed-length-masked-rem.ll
+1,950 -0 llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll
+36,926 -2,926 2,036 files not shown
+103,050 -36,380 2,042 files

LLVM/project a85d39d llvm/test/CodeGen/AArch64 sve-fixed-length-masked-expandloads.ll sve-streaming-mode-fixed-length-masked-expandload.ll, llvm/test/CodeGen/RISCV/rvv vector-interleave.ll

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
Delta File
+26,606 -0 llvm/test/CodeGen/AArch64/sve-fixed-length-masked-expandloads.ll
+4,078 -0 llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-expandload.ll
+1,604 -1,567 llvm/test/CodeGen/AArch64/clmul-scalable.ll
+731 -1,359 llvm/test/Transforms/LoopVectorize/find-last-iv-sinkable-expr.ll
+1,957 -0 llvm/test/CodeGen/AArch64/sve-fixed-length-masked-rem.ll
+1,950 -0 llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll
+36,926 -2,926 2,031 files not shown
+102,874 -36,265 2,037 files

LLVM/project a69a290 clang/lib/ScalableStaticAnalysisFramework/Analyses/EntityPointerLevel EntityPointerLevel.cpp

a class should own a std::function instead of an llvm::function_ref
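
The ownership difference can be sketched as follows. `NonOwningIntFn` is a hypothetical, simplified stand-in for a non-owning callable reference (it is not `llvm::function_ref` itself), and `Visitor` and `makeVisitor` are invented for illustration.

```cpp
#include <functional>
#include <utility>

// Hypothetical, simplified stand-in for a non-owning callable reference.
// It stores only a pointer to the callable, so it must not outlive it.
class NonOwningIntFn {
  int (*Callback)(void *, int) = nullptr;
  void *Callable = nullptr;

public:
  template <typename F>
  NonOwningIntFn(F &Fn)
      : Callback([](void *C, int X) { return (*static_cast<F *>(C))(X); }),
        Callable(&Fn) {}

  int operator()(int X) const { return Callback(Callable, X); }
};

// A class that keeps a callback beyond the constructor call stores a
// std::function, which holds its own copy of the callable and captures.
class Visitor {
  std::function<int(int)> Transform;

public:
  explicit Visitor(std::function<int(int)> Fn) : Transform(std::move(Fn)) {}
  int apply(int X) const { return Transform(X); }
};

// The lambda is a temporary that dies at the end of the return statement.
// Because Visitor owns a copy, the returned object is still safe to use.
Visitor makeVisitor(int Offset) {
  return Visitor([Offset](int X) { return X + Offset; });
}
```

A non-owning reference remains a good fit for a parameter that is only invoked during the call. Storing one as a member would leave it pointing at a destroyed temporary in a case like `makeVisitor`.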
Delta File
+3 -3 clang/lib/ScalableStaticAnalysisFramework/Analyses/EntityPointerLevel/EntityPointerLevel.cpp
+3 -3 1 files

LLVM/project 2851ad5 llvm/lib/Target/AMDGPU SIRegisterInfo.cpp

Fix comments
Delta File
+4 -6 llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+4 -6 1 files

LLVM/project cd5634a clang/lib/ScalableStaticAnalysisFramework/Analyses SSAFAnalysesCommon.cpp SSAFAnalysesCommon.h

Move templates into clang::ssaf
Delta File
+17 -0 clang/lib/ScalableStaticAnalysisFramework/Analyses/SSAFAnalysesCommon.cpp
+4 -5 clang/lib/ScalableStaticAnalysisFramework/Analyses/SSAFAnalysesCommon.h
+1 -0 clang/lib/ScalableStaticAnalysisFramework/Analyses/CMakeLists.txt
+22 -5 3 files

LLVM/project 83f8eee lld/ELF Driver.cpp Config.h

[ELF] Parallelize input file loading (#191690)

During `createFiles`, `addFile()` records a `LoadJob` for each
non-script input (archive, relocatable, DSO, bitcode, binary) with a
state-machine snapshot (`inWholeArchive`, `inLib`, `asNeeded`,
`withLOption`, `groupId`) and expands them on worker threads in
`loadFiles()`. Linker scripts are still processed inline since their
`INPUT()` and `GROUP()` commands recursively call `addFile()`.

Outside `createFiles()`, `loadFiles()` is called with a single job and
drained immediately (`deferLoad` is false). Two cases:
- `addDependentLibrary()`: `.deplibs` sections trigger `addFile()`
  during the serial `doParseFiles()` loop.
- `--just-symbols`: pushes files directly, bypassing
  `addFile`/`LoadJob`.

Thread-safety:
- A mutex serializes `BitcodeFile` / fatLTO constructors that call
  `ctx.saver` / `ctx.uniqueSaver`. Zero contention on pure ELF links.

    [23 lines not shown]
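
The record-then-expand idea described above can be sketched roughly as follows. This is not lld's code: the `LoadJob` fields are reduced to two flags, `LoadedFile`, `recordJobs`, and `expand` are hypothetical, and plain `std::thread` workers stand in for whatever parallel primitive the patch actually uses.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical snapshot of the option state in effect when a file was
// seen on the command line (a reduced version of the fields listed above).
struct LoadJob {
  std::string Path;
  bool InWholeArchive = false;
  bool AsNeeded = false;
};

struct LoadedFile {
  std::string Path;
  bool ExtractAllMembers = false;
  bool AsNeeded = false;
};

// Serial phase: walk the arguments once and record one job per input,
// capturing the state-machine flags at that point.
std::vector<LoadJob> recordJobs(const std::vector<std::string> &Args) {
  std::vector<LoadJob> Jobs;
  bool InWholeArchive = false, AsNeeded = false;
  for (const std::string &Arg : Args) {
    if (Arg == "--whole-archive")
      InWholeArchive = true;
    else if (Arg == "--no-whole-archive")
      InWholeArchive = false;
    else if (Arg == "--as-needed")
      AsNeeded = true;
    else if (Arg == "--no-as-needed")
      AsNeeded = false;
    else
      Jobs.push_back({Arg, InWholeArchive, AsNeeded});
  }
  return Jobs;
}

// Stand-in for the expensive per-file work (reading, parsing headers).
LoadedFile expand(const LoadJob &Job) {
  return {Job.Path, Job.InWholeArchive, Job.AsNeeded};
}

// Parallel phase: each worker handles a strided slice of the jobs and
// writes only to its own result slots, so command-line order is kept.
std::vector<LoadedFile> loadFiles(const std::vector<LoadJob> &Jobs,
                                  unsigned NumThreads = 4) {
  std::vector<LoadedFile> Results(Jobs.size());
  std::vector<std::thread> Workers;
  for (unsigned T = 0; T < NumThreads; ++T)
    Workers.emplace_back([&, T] {
      for (std::size_t I = T; I < Jobs.size(); I += NumThreads)
        Results[I] = expand(Jobs[I]);
    });
  for (std::thread &W : Workers)
    W.join();
  return Results;
}
```

Because each job carries its own snapshot, a worker never needs to consult the driver's mutable option state, and indexing results by job position keeps the output deterministic regardless of scheduling.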
Delta File
+155 -103 lld/ELF/Driver.cpp
+20 -2 lld/ELF/Config.h
+1 -1 lld/ELF/InputFiles.h
+1 -1 lld/ELF/InputFiles.cpp
+177 -107 4 files

LLVM/project b2d7d89 llvm/lib/Target/AMDGPU AMDGPURewriteAGPRCopyMFMA.cpp, llvm/test/CodeGen/AMDGPU rewrite-vgpr-mfma-to-agpr-spill-multi-store-mir.mir rewrite-vgpr-mfma-to-agpr-spill-multi-store.ll

Revert "[AMDGPU] Fixed verifier crash because of multiple live range components." (#193135)

Reverts llvm/llvm-project#190719

The Buildbot has detected a new failure on builder
sanitizer-aarch64-linux-bootstrap-hwasan while building llvm.
Delta File
+0 -459 llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-spill-multi-store-mir.mir
+0 -148 llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-spill-multi-store.ll
+0 -19 llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
+0 -626 3 files

LLVM/project 4712ca8 mlir/lib/Dialect/XeGPU/Transforms XeGPULayoutImpl.cpp, mlir/lib/Dialect/XeGPU/Utils XeGPUUtils.cpp

[MLIR][XeGPU] Recover temporary layout from Anchor Layout (#191947)

This PR refactors the recoverTemporaryLayout() method so that the
temporary layout is recovered from the anchor layout, not from any
user-specified temporary layout.
Delta File
+257 -49 mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutImpl.cpp
+182 -1 mlir/test/Dialect/XeGPU/sg-to-wi-experimental-unit.mlir
+150 -0 mlir/test/Dialect/XeGPU/xegpu-recover-layout.mlir
+33 -0 mlir/test/lib/Dialect/XeGPU/TestXeGPUTransforms.cpp
+11 -19 mlir/test/Dialect/XeGPU/sg-to-wi-experimental.mlir
+19 -5 mlir/lib/Dialect/XeGPU/Utils/XeGPUUtils.cpp
+652 -74 8 files not shown
+683 -121 14 files

LLVM/project ba3ceb8 clang-tools-extra/clangd ProjectModules.cpp ProjectModules.h, clang-tools-extra/clangd/unittests PrerequisiteModulesTest.cpp

[NFC] [clangd] [C++20] [Modules] Introduce ProjectModules::getModuleNameState interface (#193133)

A hole in the current design is that we assumed no module name is
duplicated across module interfaces in the same project.

This is not technically true. ISO disallows duplicated module names in
a linked program, but a project can contain multiple programs, which is
fine as long as they are not linked together. In practice, it is also
fine if the symbols are masked and these module interface units do not
appear in the same context of a single translation unit.

I am trying to improve this. This patch adds some NFC pieces to
reduce the size of follow-up patches.

AI assisted.
Delta File
+32 -14 clang-tools-extra/clangd/ProjectModules.cpp
+9 -0 clang-tools-extra/clangd/ProjectModules.h
+4 -0 clang-tools-extra/clangd/unittests/PrerequisiteModulesTest.cpp
+45 -14 3 files

LLVM/project ae8979d llvm/lib/Target/AMDGPU AMDGPURewriteAGPRCopyMFMA.cpp, llvm/test/CodeGen/AMDGPU rewrite-vgpr-mfma-to-agpr-spill-multi-store-mir.mir rewrite-vgpr-mfma-to-agpr-spill-multi-store.ll

Revert "[AMDGPU] Fixed verifier crash because of multiple live range componen…"

This reverts commit b39dfca39fa794d66580238fb382477e34fbd093.
Delta File
+0 -459 llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-spill-multi-store-mir.mir
+0 -148 llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-spill-multi-store.ll
+0 -19 llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
+0 -626 3 files

LLVM/project b0b3e50 llvm/lib/Target/NVPTX NVPTXInstrInfo.cpp NVPTXInstrInfo.td, llvm/test/CodeGen/NVPTX machine-cse-predicate-inversion-rollback.mir machine-cse-predicate-inversion-multiple-users.ll

update rollback logic and add test exercising it
Delta File
+66 -0 llvm/test/CodeGen/NVPTX/machine-cse-predicate-inversion-rollback.mir
+17 -19 llvm/lib/Target/NVPTX/NVPTXInstrInfo.cpp
+9 -7 llvm/test/CodeGen/NVPTX/machine-cse-predicate-inversion-multiple-users.ll
+1 -1 llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+93 -27 4 files

LLVM/project b77ccdf llvm/test/CodeGen/NVPTX machine-cse-predicate-inversion-float16.ll machine-cse-predicate-inversion-bfloat16.ll

[NVPTX] Add commutativity to SETP instructions to enable MachineCSE of inverted predicates

Inverted predicates can be used freely in PTX. If we can invert a
predicate and CSE the generating instruction we can save calculating
the inverse.

Teach the NVPTX commuteInstructionImpl that SETP instructions can be
inverted to allow CSEing with a previous SETP that matches the inverted
form. This also inverts the branch users of the predicate to maintain
correctness.

Currently, the SETP inversion is only allowed if all users are branches.
Future work can extend this to sel and not instructions.
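
The reuse of an inverted predicate can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the NVPTX implementation: `Setp`, `Branch`, and `reuseInvertedPredicate` are hypothetical, only integer comparisons are modeled (floating-point inversion has to account for NaN ordering), and the single-branch user mirrors the "all users are branches" restriction.

```cpp
#include <cstddef>
#include <vector>

enum class CmpOp { EQ, NE, LT, GE, GT, LE };

// Logical inverse of an integer comparison.
CmpOp invert(CmpOp Op) {
  switch (Op) {
  case CmpOp::EQ: return CmpOp::NE;
  case CmpOp::NE: return CmpOp::EQ;
  case CmpOp::LT: return CmpOp::GE;
  case CmpOp::GE: return CmpOp::LT;
  case CmpOp::GT: return CmpOp::LE;
  case CmpOp::LE: return CmpOp::GT;
  }
  return Op; // unreachable for valid enumerators
}

// Hypothetical compare instruction; operands are virtual register ids.
struct Setp {
  CmpOp Op;
  int LHS;
  int RHS;
};

// Hypothetical branch user: Pred indexes the defining Setp, and Negated
// says whether the branch tests the inverse of that predicate.
struct Branch {
  std::size_t Pred;
  bool Negated;
};

// If an earlier Setp computes the inverse of Defs[Candidate] on the same
// operands, point the branch at it and flip the branch sense. The
// candidate compare then has no users and could be deleted.
bool reuseInvertedPredicate(const std::vector<Setp> &Defs,
                            std::size_t Candidate, Branch &User) {
  const Setp &C = Defs[Candidate];
  for (std::size_t I = 0; I < Candidate; ++I) {
    const Setp &E = Defs[I];
    if (E.Op == invert(C.Op) && E.LHS == C.LHS && E.RHS == C.RHS) {
      User.Pred = I;
      User.Negated = !User.Negated;
      return true;
    }
  }
  return false;
}
```

For example, a branch on `LT r1, r2` that follows an earlier `GE r1, r2` can instead branch on the negation of the earlier predicate, which makes the second comparison redundant.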

Made-with: Cursor
Delta File
+695 -0 llvm/test/CodeGen/NVPTX/machine-cse-predicate-inversion-float16.ll
+695 -0 llvm/test/CodeGen/NVPTX/machine-cse-predicate-inversion-bfloat16.ll
+679 -0 llvm/test/CodeGen/NVPTX/machine-cse-predicate-inversion-float64.ll
+663 -0 llvm/test/CodeGen/NVPTX/machine-cse-predicate-inversion-float32.ll
+437 -0 llvm/test/CodeGen/NVPTX/machine-cse-predicate-inversion-int16.ll
+437 -0 llvm/test/CodeGen/NVPTX/machine-cse-predicate-inversion-int64.ll
+3,606 -0 13 files not shown
+5,908 -4 19 files

LLVM/project e4456ba llvm/lib/Target/NVPTX NVPTXInstrInfo.cpp

clang-format
Delta File
+1 -4 llvm/lib/Target/NVPTX/NVPTXInstrInfo.cpp
+1 -4 1 files

LLVM/project 1bfcddc clang-tools-extra/clang-tidy add_new_check.py, clang-tools-extra/docs/clang-tidy/checks list.rst

[clang-tidy][NFC] Fix list.rst and improve alias detection of `add_new_check.py` (#192228)

Follow-up to https://github.com/llvm/llvm-project/pull/192224.

This commit does two things:

- Replace the original alias detection based on `:http-equiv` (we may
remove these completely in the future) with a method of directly
matching the documentation section.
- Update the list.rst

---------

Co-authored-by: Victor Chernyakin <chernyakin.victor.j at outlook.com>
Delta File
+120 -109 clang-tools-extra/clang-tidy/add_new_check.py
+2 -2 clang-tools-extra/docs/clang-tidy/checks/list.rst
+122 -111 2 files

LLVM/project 23f2a0a llvm/lib/Target/NVPTX NVPTXInstrInfo.cpp NVPTXInstrInfo.td, llvm/lib/Target/NVPTX/MCTargetDesc NVPTXInstPrinter.cpp

Move predicate inversion to a flag
Delta File
+16 -18 llvm/lib/Target/NVPTX/NVPTXInstrInfo.cpp
+8 -4 llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+4 -4 llvm/test/CodeGen/NVPTX/branch-fold.mir
+4 -4 llvm/test/CodeGen/NVPTX/machinelicm-no-preheader.mir
+7 -0 llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.cpp
+2 -2 llvm/test/CodeGen/NVPTX/switch-loop-header.mir
+41 -32 2 files not shown
+44 -33 8 files

LLVM/project 4acbf99 clang-tools-extra/clangd ProjectModules.cpp ScanningProjectModules.cpp, clang-tools-extra/clangd/unittests PrerequisiteModulesTest.cpp

[NFC] [clangd] [C++20] [Modules] Rename and move scanningProjectModules (#193128)

I am going to add more stuff to ProjectModules and the current structure
and the file name scanningProjectModules may be confusing.

This NFC patch changes that.
Delta File
+241 -0 clang-tools-extra/clangd/ProjectModules.cpp
+0 -240 clang-tools-extra/clangd/ScanningProjectModules.cpp
+0 -26 clang-tools-extra/clangd/ScanningProjectModules.h
+5 -0 clang-tools-extra/clangd/ProjectModules.h
+3 -2 clang-tools-extra/clangd/unittests/PrerequisiteModulesTest.cpp
+1 -2 clang-tools-extra/clangd/GlobalCompilationDatabase.cpp
+250 -270 1 files not shown
+251 -271 7 files

LLVM/project 423d105 llvm/test/CodeGen/RISCV/rvv vitofp-sdnode.ll vfptoi-sdnode.ll, llvm/test/Transforms/LoopVectorize/ARM mve-interleaved-cost.ll

Merge branch 'main' into users/efric/rocdl-dot
Delta File
+878 -428 llvm/test/CodeGen/RISCV/rvv/vitofp-sdnode.ll
+862 -426 llvm/test/CodeGen/RISCV/rvv/vfptoi-sdnode.ll
+428 -526 llvm/test/Transforms/LoopVectorize/ARM/mve-interleaved-cost.ll
+668 -4 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfwadd.ll
+619 -4 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfwsub.ll
+578 -4 llvm/test/CodeGen/RISCV/rvv/vfwsub-sdnode.ll
+4,033 -1,392 372 files not shown
+13,739 -5,831 378 files

LLVM/project 18f9423 mlir/include/mlir/Dialect/LLVMIR ROCDLOps.td

nits

Signed-off-by: Eric Feng <Eric.Feng at amd.com>
Delta File
+1 -2 mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+1 -2 1 files

LLVM/project 9a5e96b mlir/include/mlir/Dialect/LLVMIR ROCDLOps.td, mlir/test/Dialect/LLVMIR rocdl.mlir

implement rocdl support for dot intrinsics

Signed-off-by: Eric Feng <Eric.Feng at amd.com>
Delta File
+112 -0 mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+101 -0 mlir/test/Dialect/LLVMIR/rocdl.mlir
+94 -0 mlir/test/Target/LLVMIR/rocdl.mlir
+307 -0 3 files

LLVM/project b906cf3 llvm/lib/CodeGen MachineBlockHashInfo.cpp

[CodeGen] Add constexpr and static_assert to fold64To16 (#192864)

This ensures the stability of the folding logic.
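
A compile-time check of this kind might look like the sketch below. The folding scheme shown (XOR of the four 16-bit lanes) is an assumption made for illustration, and `fold64To16Sketch` is a hypothetical name; the actual body of `fold64To16` is not shown in this message.

```cpp
#include <cstdint>

// Assumed folding scheme for illustration only: XOR the four 16-bit
// lanes of the 64-bit input into a single 16-bit value.
constexpr uint16_t fold64To16Sketch(uint64_t Value) {
  uint16_t Res = static_cast<uint16_t>(Value);
  Res ^= static_cast<uint16_t>(Value >> 16);
  Res ^= static_cast<uint16_t>(Value >> 32);
  Res ^= static_cast<uint16_t>(Value >> 48);
  return Res;
}

// Because the function is constexpr, known input/output pairs can be
// pinned at compile time. Any change to the folding logic that alters
// these results fails the build instead of silently changing hashes.
static_assert(fold64To16Sketch(0x0001000200040008ULL) == 0x000F,
              "lanes 1, 2, 4, 8 must fold to 0xF");
static_assert(fold64To16Sketch(0xFFFFFFFFFFFFFFFFULL) == 0x0000,
              "four identical lanes must cancel out");
static_assert(fold64To16Sketch(0x1234ULL) == 0x1234,
              "a value that fits in 16 bits must be unchanged");
```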
Delta File
+4 -1 llvm/lib/CodeGen/MachineBlockHashInfo.cpp
+4 -1 1 files

LLVM/project b39dfca llvm/lib/Target/AMDGPU AMDGPURewriteAGPRCopyMFMA.cpp, llvm/test/CodeGen/AMDGPU rewrite-vgpr-mfma-to-agpr-spill-multi-store-mir.mir rewrite-vgpr-mfma-to-agpr-spill-multi-store.ll

[AMDGPU] Fixed verifier crash because of multiple live range components. (#190719)

In the Rewrite AGPR-Copy-MFMA pass, after replacing spill instructions, the
replacement register may have multiple live range components when the
spill slot was stored to more than once. The verifier crashes with a bad
machine code error. This patch fixes the problem by splitting a live
range but assigning the same physical register in this scenario. A new
test has been added that verifies the absence of this verifier error.

Assisted-by: Claude Opus
Delta File
+459 -0 llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-spill-multi-store-mir.mir
+148 -0 llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-spill-multi-store.ll
+19 -0 llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
+626 -0 3 files

LLVM/project 9991423 bolt/lib/Rewrite RewriteInstance.cpp

[BOLT] Fix stream position before appendPadding in writeEHFrameHeader

When writeEHFrameHeader needs to allocate new space for .eh_frame_hdr
(because the old section is too small), it calls appendPadding to align
NextAvailableAddress. appendPadding writes zero bytes at the current
stream position, but after the section write loop in rewriteFile the
stream is positioned at the end of the last section written in
BinarySection::operator< order — not at the file offset corresponding
to NextAvailableAddress.

In the common case (single loadObject call) the write order matches file
offset order, so the stream happens to be in the right place. But when
a runtime library adds sections via additional loadObject calls, the
operator< iteration order (code-before-data) can diverge from file
offset order: a runtime library code section may have a higher file
offset than a runtime library data section that comes after it in the
write loop. The stream then ends at a lower offset than expected, and
appendPadding's zeros overwrite the beginning of the code section.

Fix by seeking to the correct file offset before calling appendPadding.
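
The failure mode can be reproduced in miniature with a string stream. This is a sketch only: the 12-byte image, the section contents, and the `appendZeros` and `writeImage` helpers are hypothetical and merely mimic the write order described above.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Writes Count zero bytes at the stream's current position, like a
// padding helper that assumes the stream is already where it should be.
void appendZeros(std::ostream &OS, std::size_t Count) {
  for (std::size_t I = 0; I < Count; ++I)
    OS.put('\0');
}

// Hypothetical 12-byte image: data at offset 0, code at offset 4, and
// free space starting at offset 8 (the "next available" position).
std::string writeImage(bool SeekBeforePadding) {
  std::ostringstream OS(std::string(12, '.'));

  // Sections are written code-before-data, which here is NOT file
  // offset order: the code section sits at the higher offset.
  OS.seekp(4);
  OS << "CCCC";
  OS.seekp(0);
  OS << "DDDD"; // stream position is now 4, not 8

  const std::streamoff NextAvailable = 8;
  if (SeekBeforePadding)
    OS.seekp(NextAvailable); // the fix: position the stream explicitly
  appendZeros(OS, 4);
  return OS.str();
}
```

Without the seek, the four zero bytes land at offsets 4 to 7 and wipe out the code section. With the seek, they land at offsets 8 to 11 and both sections survive.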
Delta File
+1 -0 bolt/lib/Rewrite/RewriteInstance.cpp
+1 -0 1 files

LLVM/project 021672f llvm/test/Transforms/LowerTypeTests x86-jumptable.ll aarch64-jumptable.ll

[test][LowerTypeTests] Re-generate jump table tests with --check-globals (#192734)

Debug information will be updated in
https://github.com/llvm/llvm-project/pull/192736,
so we want to track the difference.
Delta File
+74 -9 llvm/test/Transforms/LowerTypeTests/x86-jumptable.ll
+20 -8 llvm/test/Transforms/LowerTypeTests/aarch64-jumptable.ll
+94 -17 2 files