LLVM/project aa3489f llvm/lib/Target/AMDGPU AMDGPULanePackedABI.cpp AMDGPULanePackedABI.h, llvm/test/CodeGen/AMDGPU inreg-vgpr-lane-packing.ll

[AMDGPU] Pack overflow inreg args into VGPR lanes

When inreg function arguments overflow the available SGPRs, pack multiple values
into lanes of a single VGPR using writelane/readlane instead of consuming one
VGPR per overflow argument.

The feature is behind a flag (default off) and currently only supports the
SelectionDAG path.

Known issue: if the register allocator does not coalesce the COPY between the
writelane chain and the physical call argument register, the resulting v_mov_b32
is EXEC-dependent and will not transfer inactive lanes. This is correct when
EXEC is all-ones (the common case at call sites) but would be incorrect inside
divergent control flow.
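The packing scheme can be pictured with a plain C++ model of one VGPR. This sketch assumes a wave64 register and is not code from AMDGPULanePackedABI.cpp. In it, `writeLane` and `readLane` stand in for `v_writelane_b32` and `v_readlane_b32`, which ignore EXEC, and `movB32` stands in for an EXEC-masked `v_mov_b32`. All type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one VGPR: one 32-bit value per lane.
// Assumption: wave64 (a wave32 target would have 32 lanes).
constexpr unsigned kWaveSize = 64;
using VGPR = std::array<uint32_t, kWaveSize>;

// Stand-in for v_writelane_b32: writes one lane regardless of EXEC.
inline void writeLane(VGPR &V, unsigned Lane, uint32_t Val) {
  assert(Lane < kWaveSize && "lane out of range");
  V[Lane] = Val;
}

// Stand-in for v_readlane_b32: reads one lane regardless of EXEC.
inline uint32_t readLane(const VGPR &V, unsigned Lane) {
  assert(Lane < kWaveSize && "lane out of range");
  return V[Lane];
}

// Caller side: overflow inreg argument I goes to lane I of a single VGPR.
// Assumption: at most kWaveSize overflow values, so one VGPR suffices.
inline VGPR packOverflowArgs(const std::vector<uint32_t> &Args) {
  assert(Args.size() <= kWaveSize && "needs more than one VGPR");
  VGPR V{};
  for (unsigned I = 0; I < Args.size(); ++I)
    writeLane(V, I, Args[I]);
  return V;
}

// Callee side: argument I is recovered with a single readlane.
inline uint32_t unpackOverflowArg(const VGPR &V, unsigned I) {
  return readLane(V, I);
}

// Stand-in for an EXEC-masked v_mov_b32: lanes whose EXEC bit is clear are
// not written, so Dst keeps whatever those lanes held before.
inline void movB32(VGPR &Dst, const VGPR &Src, uint64_t Exec) {
  for (unsigned L = 0; L < kWaveSize; ++L)
    if (Exec & (uint64_t{1} << L))
      Dst[L] = Src[L];
}
```

With an all-ones EXEC, copying the packed register transfers every lane. With a partial EXEC, as inside divergent control flow, lanes whose EXEC bit is clear keep stale contents. That is the known issue described above.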
DeltaFile
+282 -0  llvm/test/CodeGen/AMDGPU/inreg-vgpr-lane-packing.ll
+152 -0  llvm/lib/Target/AMDGPU/AMDGPULanePackedABI.cpp
+54 -0  llvm/lib/Target/AMDGPU/AMDGPULanePackedABI.h
+51 -3  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+1 -0  llvm/lib/Target/AMDGPU/CMakeLists.txt
+540 -3  5 files

LLVM/project 03b70cb llvm/lib/Transforms/InstCombine InstCombineCompares.cpp, llvm/test/Transforms/InstCombine icmp-vector-bitwise-reductions.ll

[InstCombine] Restrict foldICmpOfVectorReduce to one-use (#182833)

Follow up on 279b3dbe ([InstCombine] Fold icmp (vreduce_(or|and) %x),
(0|-1), #182684) to fix a regression by restricting the fold to one-use.

Regression: https://godbolt.org/z/f38b169MM
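The fold rests on a simple identity. An OR reduction is zero exactly when every element is zero, which is exactly when the whole vector, reinterpreted as one integer, is zero. The dual holds for AND and all-ones. The C++ sketch below checks that identity on a `<4 x i8>`-shaped array. The bitcast form of the rewrite is an assumption about what 279b3dbe produces, and all names are hypothetical. If the reduction has other users it cannot be deleted, so the rewrite adds a bitcast and a compare instead of removing work. That is presumably why the follow-up restricts the fold to one use.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of a <4 x i8> value.
using V4xI8 = std::array<uint8_t, 4>;

// Models llvm.vector.reduce.or on <4 x i8>.
inline uint8_t reduceOr(const V4xI8 &X) {
  uint8_t R = 0;
  for (uint8_t E : X)
    R |= E;
  return R;
}

// Models llvm.vector.reduce.and on <4 x i8>.
inline uint8_t reduceAnd(const V4xI8 &X) {
  uint8_t R = 0xFF;
  for (uint8_t E : X)
    R &= E;
  return R;
}

// Models `bitcast <4 x i8> %x to i32` (byte order does not matter here).
inline uint32_t bitcastToI32(const V4xI8 &X) {
  uint32_t R;
  std::memcpy(&R, X.data(), sizeof(R));
  return R;
}

// icmp eq (reduce.or %x), 0   <=>  icmp eq (bitcast %x), 0
inline bool foldedOrIsZero(const V4xI8 &X) { return bitcastToI32(X) == 0; }

// icmp eq (reduce.and %x), -1  <=>  icmp eq (bitcast %x), -1
inline bool foldedAndIsAllOnes(const V4xI8 &X) {
  return bitcastToI32(X) == UINT32_MAX;
}
```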
DeltaFile
+30 -0  llvm/test/Transforms/InstCombine/icmp-vector-bitwise-reductions.ll
+4 -2  llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+34 -2  2 files

LLVM/project 3b9c27d libc/shared/math bf16addl.h, libc/src/__support/math bf16addl.h CMakeLists.txt

[libc][math] Refactor bf16addl implementation to header-only in src/__support/math folder. (#182561)

Resolves https://github.com/llvm/llvm-project/issues/181019
Part of https://github.com/llvm/llvm-project/issues/147386
DeltaFile
+26 -0  libc/src/__support/math/bf16addl.h
+23 -0  libc/shared/math/bf16addl.h
+15 -0  utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+11 -1  libc/src/__support/math/CMakeLists.txt
+2 -5  libc/src/math/generic/bf16addl.cpp
+1 -5  libc/src/math/generic/CMakeLists.txt
+78 -11  3 files not shown
+82 -12  9 files

LLVM/project 094a68b llvm/lib/Target/AMDGPU/MCTargetDesc AMDGPUInstPrinter.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.cpp

Use AMDGPU:: for generation check
DeltaFile
+1 -1  llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
+1 -1  llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+2 -2  2 files

LLVM/project bf97ff0 llvm/cmake/modules HandleLLVMOptions.cmake

[CMake] Disable PCH when LLVM_ENABLE_MODULES is set (#182914)

DeltaFile
+5 -0  llvm/cmake/modules/HandleLLVMOptions.cmake
+5 -0  1 file

LLVM/project b19a0e0 libc/shared/math getpayloadf16.h getpayloadf128.h, libc/src/__support/math CMakeLists.txt getpayloadf128.h

[libc][math] Refactor getpayload family functions to header-only (#181824)

Refactors the payload_functions math family to be header-only.

Part of: https://github.com/llvm/llvm-project/issues/181823

Target Functions:
  - getpayload
  - getpayloadbf16
  - getpayloadf
  - getpayloadf128
  - getpayloadf16
DeltaFile
+87 -5  utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+57 -0  libc/src/__support/math/CMakeLists.txt
+32 -0  libc/src/__support/math/getpayloadf128.h
+32 -0  libc/src/__support/math/getpayloadf16.h
+28 -0  libc/shared/math/getpayloadf16.h
+28 -0  libc/shared/math/getpayloadf128.h
+264 -5  18 files not shown
+502 -42  24 files

LLVM/project d304980 clang/test/CodeGenHLSL matrix-member-zero-based-accessor-scalar-store.hlsl matrix-member-one-based-accessor-scalar-store.hlsl, llvm/test/CodeGen/AArch64 machine-outliner-bundle-debuginfo.mir

fix

Created using spr 1.3.7
DeltaFile
+0 -185  llvm/test/Transforms/InstCombine/AMDGPU/tensor-load-store-lds.ll
+113 -0  llvm/test/CodeGen/AArch64/machine-outliner-bundle-debuginfo.mir
+31 -79  clang/test/CodeGenHLSL/matrix-member-zero-based-accessor-scalar-store.hlsl
+31 -79  clang/test/CodeGenHLSL/matrix-member-one-based-accessor-scalar-store.hlsl
+49 -33  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.tensor.load.store.ll
+32 -48  clang/test/CodeGenHLSL/matrix-member-one-based-accessor-scalar-load.hlsl
+256 -424  36 files not shown
+779 -786  42 files

LLVM/project 3236cda clang/test/CodeGenHLSL matrix-member-one-based-accessor-scalar-store.hlsl matrix-member-zero-based-accessor-scalar-store.hlsl, llvm/test/CodeGen/AArch64 machine-outliner-bundle-debuginfo.mir

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
DeltaFile
+0 -185  llvm/test/Transforms/InstCombine/AMDGPU/tensor-load-store-lds.ll
+113 -0  llvm/test/CodeGen/AArch64/machine-outliner-bundle-debuginfo.mir
+31 -79  clang/test/CodeGenHLSL/matrix-member-one-based-accessor-scalar-store.hlsl
+31 -79  clang/test/CodeGenHLSL/matrix-member-zero-based-accessor-scalar-store.hlsl
+49 -33  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.tensor.load.store.ll
+32 -48  clang/test/CodeGenHLSL/matrix-member-one-based-accessor-scalar-load.hlsl
+256 -424  35 files not shown
+778 -785  41 files

LLVM/project 4ffa619 llvm/lib/SandboxIR Region.cpp, llvm/unittests/SandboxIR RegionTest.cpp

Revert "[SandboxIR][Region] Replace exit() with reportFatalUsageError() (#182134)"

This reverts commit 055b1efc1fe34106a8dc00a667708d5619077206.
DeltaFile
+1 -1  llvm/lib/SandboxIR/Region.cpp
+0 -2  llvm/unittests/SandboxIR/RegionTest.cpp
+1 -3  2 files

LLVM/project 5bccf34 libc/shared/math bf16fmaf128.h, libc/src/__support/math bf16fmaf128.h CMakeLists.txt

[libc][math] Refactor `bf16fmaf128` to header-only (#182009)

Close #181627
DeltaFile
+32 -0  libc/src/__support/math/bf16fmaf128.h
+28 -0  libc/shared/math/bf16fmaf128.h
+16 -0  utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+10 -0  libc/src/__support/math/CMakeLists.txt
+2 -5  libc/src/math/generic/bf16fmaf128.cpp
+1 -5  libc/src/math/generic/CMakeLists.txt
+89 -10  3 files not shown
+94 -10  9 files

LLVM/project 8eea160 clang/test/Sema attr-overflow-behavior-format-strings.c attr-overflow-behavior-templates.cpp

[Clang][NFC] Don't redefine __trap macro in tests for PowerPC (#182898)

These `OverflowBehaviorType` tests were failing due to PowerPC already
defining a `__trap` macro.

We can just remove the `__wrap` and `__trap` macros, as they are unused in these tests.

Signed-off-by: Justin Stitt <justinstitt at google.com>
DeltaFile
+0 -3  clang/test/Sema/attr-overflow-behavior-format-strings.c
+0 -3  clang/test/Sema/attr-overflow-behavior-templates.cpp
+0 -3  clang/test/Sema/attr-overflow-behavior.c
+0 -3  clang/test/Sema/attr-overflow-behavior.cpp
+0 -12  4 files

LLVM/project 334353c llvm/lib/Transforms/Utils Local.cpp, llvm/test/Transforms/InstCombine invalid-alloca-poison-size.ll

[InstCombine] Document transformation that leads to invalid IR.
DeltaFile
+30 -0  llvm/test/Transforms/InstCombine/invalid-alloca-poison-size.ll
+4 -0  llvm/lib/Transforms/Utils/Local.cpp
+34 -0  2 files

LLVM/project b8a4bf2 llvm/lib/Target/RISCV RISCVInstrInfoXAndes.td

[RISCV] Remove extra ReadSFBALU from SFBNDS_BFO. (#182900)

There are only 4 register operands, so there should only be 4 Reads.
DeltaFile
+1 -2  llvm/lib/Target/RISCV/RISCVInstrInfoXAndes.td
+1 -2  1 file

LLVM/project 546c526 llvm/include/llvm/Support GenericDomTreeConstruction.h, llvm/unittests/Analysis DomTreeUpdaterTest.cpp

domtree
DeltaFile
+220 -0  llvm/unittests/Analysis/DomTreeUpdaterTest.cpp
+5 -2  llvm/include/llvm/Support/GenericDomTreeConstruction.h
+225 -2  2 files

LLVM/project ba29460 clang/lib/CodeGen CGObjCMac.cpp, clang/test/CodeGenObjC block-layout-section.m

Move the ObjC blocks layout bitmap to the cstring section (#182398)

This is a follow-up to https://github.com/llvm/llvm-project/pull/174705

This fixes one additional place in the ObjC codegen logic so that the
ObjC blocks layout is generated in the regular cstring section.
DeltaFile
+24 -0  clang/test/CodeGenObjC/block-layout-section.m
+1 -1  clang/lib/CodeGen/CGObjCMac.cpp
+25 -1  2 files

LLVM/project e6f3033 clang/docs CMakeLists.txt index.rst, clang/include/clang/Basic BuiltinsAMDGPUDocs.td BuiltinsAMDGPU.td

[Clang][AMDGPU][Docs] Add builtin documentation for AMDGPU builtins (#181574)

Use the documentation generation infrastructure to document the AMDGPU
builtins.

This PR starts with the ABI / Special Register builtins. Documentation
for the remaining builtin categories will be added incrementally in
follow-up patches.
DeltaFile
+268 -0  clang/include/clang/Basic/BuiltinsAMDGPUDocs.td
+100 -27  clang/include/clang/Basic/BuiltinsAMDGPU.td
+1 -0  clang/docs/CMakeLists.txt
+1 -0  clang/docs/index.rst
+370 -27  4 files

LLVM/project 77b31b9 llvm/test/CodeGen/X86 combine-or.ll

[X86] Add test showing failure to fold or(buildvector(),buildvector()) pattern into a common buildvector() (#182906)

DeltaFile
+40 -0  llvm/test/CodeGen/X86/combine-or.ll
+40 -0  1 file

LLVM/project 79ea498 llvm/lib/Target/AVR AVRISelLowering.cpp, llvm/test/CodeGen/AVR cmp.ll

[AVR] Fix SETUGT during 128b -> 64b lowering (#182690)

Closes https://github.com/llvm/llvm-project/issues/181504.
DeltaFile
+10 -8  llvm/lib/Target/AVR/AVRISelLowering.cpp
+18 -0  llvm/test/CodeGen/AVR/cmp.ll
+28 -8  2 files

LLVM/project 9a91c50 llvm/include/llvm/CodeGen SelectionDAG.h, llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp

[DAG] isKnownNeverZero - add DemandedElts argument (#182679)

The following changes were made to isKnownNeverZero:
- Added BUILD_VECTOR and SPLAT_VECTOR cases.
- Added DemandedElts support for the SELECT/VSELECT cases.
- Added tests for constants and SELECT/VSELECT.

Closes #181656
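A toy model of the demanded-elements reasoning follows. `Node` is a hypothetical, heavily simplified stand-in for a vector-typed DAG value and does not mirror the real `SelectionDAG::isKnownNeverZero` API. Bit `I` of `DemandedElts` marks lane `I` as relevant. A BUILD_VECTOR only has to prove its demanded lanes non-zero, a splat reduces to its scalar, and a SELECT/VSELECT needs both value operands non-zero on the demanded lanes, because either one may be chosen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical, heavily simplified stand-in for a vector-typed DAG value.
struct Node {
  enum Kind { BuildVector, SplatVector, Select, VSelect } K;
  // BuildVector: per-lane constants; std::nullopt means "not known".
  std::vector<std::optional<uint64_t>> Elts;
  // SplatVector: the splatted scalar, if known.
  std::optional<uint64_t> Scalar;
  // Select/VSelect: the two value operands (the condition is not modelled).
  std::shared_ptr<Node> T, F;
};

// Bit I of DemandedElts set means lane I matters to the user of N.
inline bool isKnownNeverZero(const Node &N, uint64_t DemandedElts) {
  if (DemandedElts == 0)
    return false; // conservative answer for an empty demanded set
  switch (N.K) {
  case Node::BuildVector:
    assert(N.Elts.size() <= 64 && "mask only covers 64 lanes");
    // Only demanded lanes have to be known non-zero constants.
    for (std::size_t I = 0; I < N.Elts.size(); ++I)
      if (((DemandedElts >> I) & 1) && (!N.Elts[I] || *N.Elts[I] == 0))
        return false;
    return true;
  case Node::SplatVector:
    // Every lane holds the same scalar.
    return N.Scalar && *N.Scalar != 0;
  case Node::Select:
  case Node::VSelect:
    // Either operand may be chosen for a lane, so both must be non-zero
    // on every demanded lane.
    return N.T && N.F && isKnownNeverZero(*N.T, DemandedElts) &&
           isKnownNeverZero(*N.F, DemandedElts);
  }
  return false;
}
```

Ignoring undemanded lanes is what lets a vector with a zero or unknown element still be proven non-zero for a narrower use.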
DeltaFile
+84 -0  llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp
+41 -4  llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+6 -0  llvm/include/llvm/CodeGen/SelectionDAG.h
+131 -4  3 files

LLVM/project 6705802 utils/bazel/llvm-project-overlay/lldb/source/Plugins BUILD.bazel plugin_config.bzl

[lldb][bazel] Add HighlighterDefault, rename ClangHighlighter targets (#182693)

Rename `PluginClangHighlighter` to `PluginHighlighterClang` for
consistency with the directory-based naming convention, add the new
`PluginHighlighterDefault` library, and register both `HighlighterClang`
and `HighlighterDefault` in `DEFAULT_PLUGINS`.
DeltaFile
+17 -6  utils/bazel/llvm-project-overlay/lldb/source/Plugins/BUILD.bazel
+2 -0  utils/bazel/llvm-project-overlay/lldb/source/Plugins/plugin_config.bzl
+19 -6  2 files

LLVM/project 9926ea9 llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU setcc-select-hi32mask.ll carryout-selection.ll

[AMDGPU][ISel] Reduce 64-bit `setcc` to upper 32 bits if lower 32 bits are known (#181238)

Truncate 64-bit integral `setcc`s to their upper 32-bit operands if
enough information is known about their lower 32-bit operands, subsuming
the special cases handled in #177662.

Alive2 verification for analogous IR transformations:
[xdATxK](https://alive2.llvm.org/ce/z/xdATxK)
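One of the simplest cases can be checked directly. If the low 32 bits of both operands are known to be equal, a 64-bit equality or ordered comparison is decided by the high halves alone. The C++ helpers below are hypothetical and only illustrate that identity. The patch's actual conditions on the low halves may be broader than this single case.

```cpp
#include <cassert>
#include <cstdint>

inline uint32_t hi32(uint64_t V) { return static_cast<uint32_t>(V >> 32); }
inline uint32_t lo32(uint64_t V) { return static_cast<uint32_t>(V); }

// Each helper assumes lo32(A) == lo32(B) is already known (for example, both
// low halves are the same constant). Under that assumption the 64-bit result
// is decided by the high halves alone.
inline bool eq64ViaHi(uint64_t A, uint64_t B) {
  assert(lo32(A) == lo32(B) && "low halves must be known equal");
  return hi32(A) == hi32(B);
}

inline bool ult64ViaHi(uint64_t A, uint64_t B) {
  assert(lo32(A) == lo32(B) && "low halves must be known equal");
  return hi32(A) < hi32(B);
}

// Signed variant: the sign bit lives in the high half, so compare the high
// halves as int32_t (two's-complement wrap, guaranteed since C++20).
inline bool slt64ViaHi(int64_t A, int64_t B) {
  uint64_t UA = static_cast<uint64_t>(A), UB = static_cast<uint64_t>(B);
  assert(lo32(UA) == lo32(UB) && "low halves must be known equal");
  return static_cast<int32_t>(hi32(UA)) < static_cast<int32_t>(hi32(UB));
}
```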
DeltaFile
+900 -0  llvm/test/CodeGen/AMDGPU/setcc-select-hi32mask.ll
+213 -217  llvm/test/CodeGen/AMDGPU/carryout-selection.ll
+58 -59  llvm/test/CodeGen/AMDGPU/wave32.ll
+87 -17  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+32 -53  llvm/test/CodeGen/AMDGPU/srem.ll
+24 -24  llvm/test/CodeGen/AMDGPU/extract-subvector.ll
+1,314 -370  3 files not shown
+1,343 -411  9 files

LLVM/project 2503ba6 clang/unittests/Analysis/Scalable EntityLinkerTest.cpp, lldb/source/Core PluginManager.cpp

Merge branch 'main' into users/shiltian/remove-unused-dump-code-feature
DeltaFile
+634 -0  clang/unittests/Analysis/Scalable/EntityLinkerTest.cpp
+405 -7  llvm/test/CodeGen/AMDGPU/call-args-inreg.ll
+403 -0  llvm/test/Transforms/SLPVectorizer/X86/copyable_reorder.ll
+164 -124  llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll
+272 -0  llvm/test/Transforms/LoopVectorize/hoist-predicated-loads-with-predicated-stores.ll
+154 -84  lldb/source/Core/PluginManager.cpp
+2,032 -215  141 files not shown
+5,463 -1,090  147 files

LLVM/project 36445f7 llvm/lib/Target/AMDGPU AMDGPUCallLowering.cpp, llvm/test/CodeGen/AMDGPU call-args-inreg.ll cc-inreg-sgpr0-3-mismatch.ll

[AMDGPU] Fix caller/callee mismatch in SGPR assignment for inreg args (#182754)

On the callee side, `LowerFormalArguments` marks SGPR0-3 as allocated in
`CCState` before running the CC analysis. On the caller side,
`LowerCall` (and GlobalISel's `lowerCall`/`lowerTailCall`) added the
scratch resource to `RegsToPass` without marking it in `CCState`. This
caused `CC_AMDGPU_Func` to treat SGPR0-3 as available on the caller
side, assigning user inreg args there, while the callee skipped them.
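The mismatch can be reproduced with a toy first-fit allocator. For illustration only, this sketch assumes a 32-entry SGPR pool in which inreg arguments take the lowest free register. `ToyCCState` and `assignInregArgs` are hypothetical and do not model `CCState` or `CC_AMDGPU_Func` in detail. Reserving SGPR0-3 on one side but not the other shifts every argument.

```cpp
#include <bitset>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for the "which SGPRs are taken" part of CCState.
// Assumption for illustration: 32 SGPRs, first-fit assignment.
struct ToyCCState {
  std::bitset<32> Allocated;

  std::optional<unsigned> allocateSGPR() {
    for (unsigned R = 0; R < Allocated.size(); ++R)
      if (!Allocated[R]) {
        Allocated.set(R);
        return R;
      }
    return std::nullopt;
  }
};

// Assigns NumInregArgs user inreg arguments to SGPRs. When ReserveSGPR0To3 is
// set, SGPR0-3 (the scratch resource) are marked allocated first, as the
// callee side does before running the CC analysis.
inline std::vector<unsigned> assignInregArgs(unsigned NumInregArgs,
                                             bool ReserveSGPR0To3) {
  ToyCCState CC;
  if (ReserveSGPR0To3)
    for (unsigned R = 0; R < 4; ++R)
      CC.Allocated.set(R);
  std::vector<unsigned> Regs;
  for (unsigned I = 0; I < NumInregArgs; ++I)
    if (std::optional<unsigned> R = CC.allocateSGPR())
      Regs.push_back(*R);
  return Regs;
}
```

In the toy, a caller that skips the reservation places two arguments in SGPR0-1 while the callee reads them from SGPR4-5. Marking SGPR0-3 on both sides makes the assignments agree.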
DeltaFile
+405 -7  llvm/test/CodeGen/AMDGPU/call-args-inreg.ll
+168 -39  llvm/test/CodeGen/AMDGPU/cc-inreg-sgpr0-3-mismatch.ll
+84 -2  llvm/test/CodeGen/AMDGPU/call-args-inreg-no-sgpr-for-csrspill-xfail.ll
+41 -41  llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call.ll
+8 -8  llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll
+12 -0  llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
+718 -97  5 files not shown
+730 -113  11 files

LLVM/project eaecf95 clang/docs CMakeLists.txt index.rst, clang/include/clang/Basic BuiltinsAMDGPUDocs.td BuiltinsAMDGPU.td

[Clang][AMDGPU][Docs] Add builtin documentation for AMDGPU builtins

Use the documentation generation infrastructure to document the AMDGPU builtins.
This PR starts with the ABI / Special Register builtins. Documentation for the
remaining builtin categories will be added incrementally in follow-up patches.
DeltaFile
+268 -0  clang/include/clang/Basic/BuiltinsAMDGPUDocs.td
+100 -27  clang/include/clang/Basic/BuiltinsAMDGPU.td
+1 -0  clang/docs/CMakeLists.txt
+1 -0  clang/docs/index.rst
+370 -27  4 files

LLVM/project 606e97d llvm/lib/Transforms/Vectorize LoopVectorize.cpp VPlan.h, llvm/test/Transforms/LoopVectorize vplan-based-stride-mv.ll

[VPlan] Start implementing VPlan-based stride multiversioning

This commit only implements the run-time guard without actually
optimizing the vector loop. That would come in a separate PR to ease
review.
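Stride multiversioning guards a loop with a run-time check that a symbolic stride equals 1. In the unit-stride version, accesses can then be treated as contiguous. The source-level C++ sketch below shows only that shape. The function names are hypothetical, and it does not model how VPlan materializes the guard. Per the message above, this commit adds only the guard, so the unit-stride path is not yet optimized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original loop: strided access with a stride only known at run time.
inline void scaleStrided(float *A, std::ptrdiff_t N, std::ptrdiff_t Stride,
                         float K) {
  for (std::ptrdiff_t I = 0; I < N; ++I)
    A[I * Stride] *= K;
}

// Multiversioned loop: a run-time guard picks a unit-stride copy in which
// consecutive iterations touch consecutive elements (the form a vectorizer
// can widen into contiguous loads/stores); otherwise the original loop runs.
inline void scaleStridedMV(float *A, std::ptrdiff_t N, std::ptrdiff_t Stride,
                           float K) {
  if (Stride == 1) {
    for (std::ptrdiff_t I = 0; I < N; ++I)
      A[I] *= K;
    return;
  }
  scaleStrided(A, N, Stride, K);
}
```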
DeltaFile
+249 -66  llvm/test/Transforms/LoopVectorize/vplan-based-stride-mv.ll
+148 -70  llvm/test/Transforms/LoopVectorize/VPlan/vplan-based-stride-mv.ll
+117 -0  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+43 -0  llvm/lib/Transforms/Vectorize/VPlan.h
+14 -3  llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+7 -0  llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+578 -139  4 files not shown
+596 -141  10 files

LLVM/project de6cade clang/test/CIR/CodeGenHLSL matrix-element-expr-load.hlsl

[CIR] Fix HLSL test that crashes (#182894)

This was caused by #182609, which changed the way the AST stores
these expressions and makes us hit an NYI in a way that doesn't recover
nicely. In the future, we could probably represent a 'no op' instead of
an empty op in the IR for these cases, but there isn't much use for it,
since it always follows an NYI.

This patch changes the test to use float instead of float1 as suggested
in review, which avoids the problematic conversion.
DeltaFile
+1 -1  clang/test/CIR/CodeGenHLSL/matrix-element-expr-load.hlsl
+1 -1  1 file

LLVM/project 812c6f8 mlir/include/mlir/Dialect/OpenACC OpenACCParMapping.h

[mlir][acc] Add parallelism mapping policy interface (#182890)

Add a header that defines the interface for mapping OpenACC parallelism
levels (gang, worker, vector) to target-specific parallel dimension
attributes. Alongside this, `DefaultACCToGPUMappingPolicy` is introduced
as an initial implementation of ACC parallelism to GPU mapping.
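A policy interface of this kind is typically an abstract class that lowering code queries, with a default GPU implementation behind it. The sketch below is hypothetical. The class names, the `GPUDim` enum, and the default assignment (gang to block x, worker to thread y, vector to thread x) are assumptions for illustration, not the contents of OpenACCParMapping.h.

```cpp
#include <cassert>

// Hypothetical names throughout; not the API in OpenACCParMapping.h.
enum class ACCParLevel { Gang, Worker, Vector };
enum class GPUDim { BlockX, ThreadX, ThreadY };

// Interface: lowering code asks the policy where each ACC level goes, so a
// target can plug in its own mapping without touching the lowering.
class ACCParMappingPolicySketch {
public:
  virtual ~ACCParMappingPolicySketch() = default;
  virtual GPUDim map(ACCParLevel Level) const = 0;
};

// Assumed default: gang -> block x, worker -> thread y, vector -> thread x.
class DefaultGPUMappingSketch : public ACCParMappingPolicySketch {
public:
  GPUDim map(ACCParLevel Level) const override {
    switch (Level) {
    case ACCParLevel::Gang:
      return GPUDim::BlockX;
    case ACCParLevel::Worker:
      return GPUDim::ThreadY;
    case ACCParLevel::Vector:
      return GPUDim::ThreadX;
    }
    assert(false && "unknown ACC parallelism level");
    return GPUDim::ThreadX;
  }
};

// A lowering step written against the interface only.
inline GPUDim dimForLoop(const ACCParMappingPolicySketch &Policy,
                         ACCParLevel Level) {
  return Policy.map(Level);
}
```

Keeping the lowering step dependent only on the abstract interface is what would let a different target substitute its own policy.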
DeltaFile
+164 -0  mlir/include/mlir/Dialect/OpenACC/OpenACCParMapping.h
+164 -0  1 file

LLVM/project dd6e7b8 llvm/test/Transforms/LoopVectorize hoist-predicated-loads-with-predicated-stores.ll

[LV] Add corner-case tests for licm of predicated memops (#182828)

DeltaFile
+272 -0  llvm/test/Transforms/LoopVectorize/hoist-predicated-loads-with-predicated-stores.ll
+272 -0  1 file

LLVM/project 1356a51 llvm/test/CodeGen/AMDGPU call-args-inreg.ll cc-inreg-sgpr0-3-mismatch.ll, llvm/test/CodeGen/AMDGPU/GlobalISel irtranslator-call.ll

[AMDGPU] Fix caller/callee mismatch in SGPR assignment for inreg args

On the callee side, `LowerFormalArguments` marks SGPR0-3 as allocated in
`CCState` before running the CC analysis. On the caller side, `LowerCall` (and
GlobalISel's `lowerCall`/`lowerTailCall`) added the scratch resource to
`RegsToPass` without marking it in `CCState`. This caused `CC_AMDGPU_Func` to
treat SGPR0-3 as available on the caller side, assigning user inreg args there,
while the callee skipped them.
DeltaFile
+405 -7  llvm/test/CodeGen/AMDGPU/call-args-inreg.ll
+168 -39  llvm/test/CodeGen/AMDGPU/cc-inreg-sgpr0-3-mismatch.ll
+84 -2  llvm/test/CodeGen/AMDGPU/call-args-inreg-no-sgpr-for-csrspill-xfail.ll
+41 -41  llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call.ll
+8 -8  llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll
+4 -8  llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.ll
+710 -105  5 files not shown
+730 -113  11 files

LLVM/project 055b1ef llvm/lib/SandboxIR Region.cpp, llvm/unittests/SandboxIR RegionTest.cpp

[SandboxIR][Region] Replace exit() with reportFatalUsageError() (#182134)

`Region::createRegionsFromMD()` parses the IR and the corresponding
metadata and forms one or more Regions. If an instruction is tagged as
being part of the "auxiliary" vector of the region, then a check
enforces that it should also be part of a region, i.e., it should have
both `!sandboxaux` and `!sandboxvec` metadata, not just `!sandboxaux`.

The check used to `exit(1)` after printing an error, but it's better to
abort using LLVM's error handling functions. Since the user can write
the IR by hand I think it makes sense to report this as a usage error
with `reportFatalUsageError()`, and not as an internal error.
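The check itself is small. The sketch below models it with hypothetical types. Instead of printing and calling `exit(1)`, it returns the diagnostic, leaving the caller to decide how to fail; in this patch that decision is `reportFatalUsageError()`. `ToyInst` and `checkAuxImpliesRegion` are illustrative names, not SandboxIR API.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction and the two metadata kinds.
struct ToyInst {
  std::string Name;
  bool HasSandboxVec; // tagged with !sandboxvec (belongs to a region)
  bool HasSandboxAux; // tagged with !sandboxaux (in the region's aux vector)
};

// Returns a diagnostic for the first instruction that is auxiliary but not
// part of any region, or std::nullopt if the input is well formed. Returning
// the message (rather than calling exit()) lets the caller choose how to fail.
inline std::optional<std::string>
checkAuxImpliesRegion(const std::vector<ToyInst> &Insts) {
  for (const ToyInst &I : Insts)
    if (I.HasSandboxAux && !I.HasSandboxVec)
      return "instruction '" + I.Name +
             "' has !sandboxaux but no !sandboxvec";
  return std::nullopt;
}
```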
DeltaFile
+1 -1  llvm/lib/SandboxIR/Region.cpp
+2 -0  llvm/unittests/SandboxIR/RegionTest.cpp
+3 -1  2 files