LLVM/project 58f2c18 llvm/lib/Transforms/InstCombine InstCombineShifts.cpp, llvm/test/Transforms/InstCombine shift-sub.ll

[InstCombine] Fold shift of a constant into a reverse shift (#192982)

    C1 << (C2 - X) -> (C1 << C2) >> X
    C1 << (C2 ^ X) -> (C1 << C2) >> X (if equivalent to the above)
    C1 >> (C2 - X) -> (C1 >> C2) << X (right shift modes match)
    C1 >> (C2 ^ X) -> (C1 >> C2) << X (if equivalent to the above)

Proof: https://alive2.llvm.org/ce/z/q-4soi
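A minimal brute-force sketch of the first fold on 8-bit values. It assumes two preconditions that are not spelled out above: `C1 << C2` loses no bits, and `X <= C2` so the original shift amount stays in range. The exact conditions the patch checks are in the linked Alive2 proof. The helper names are hypothetical.

```python
BITS = 8
MASK = (1 << BITS) - 1


def shl(value, amount):
    """Left shift truncated to BITS bits; the amount must be in range."""
    assert 0 <= amount < BITS
    return (value << amount) & MASK


def lshr(value, amount):
    """Logical right shift; the amount must be in range."""
    assert 0 <= amount < BITS
    return value >> amount


def check_shl_of_sub():
    """Check C1 << (C2 - X) == (C1 << C2) >> X and count the cases tried."""
    checked = 0
    for c2 in range(BITS):
        for c1 in range(MASK + 1):
            # Assumed precondition: C1 << C2 must not drop any set bits.
            if (c1 << c2) > MASK:
                continue
            # X > C2 would make the original shift amount out of range.
            for x in range(c2 + 1):
                assert shl(c1, c2 - x) == lshr(shl(c1, c2), x)
                checked += 1
    return checked


def xor_matches_sub(c2, x):
    """True when C2 ^ X computes the same value as C2 - X."""
    return (c2 ^ x) == (c2 - x)
```

The xor forms reduce to the sub forms only when `C2 ^ X` equals `C2 - X`, for example when `C2` is a low-bit mask such as 7 and `X` does not exceed it; `xor_matches_sub(5, 2)` is a case where the two differ.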
Delta  File
+513 -0  llvm/test/Transforms/InstCombine/shift-sub.ll
+66 -31  llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
+579 -31  2 files

LLVM/project 9b78781 clang/lib/CIR/CodeGen CIRGenBuiltinAMDGPU.cpp, clang/test/CIR/CodeGenHIP builtins-amdgcn.hip

[CIR][AMDGPU] Add lowering for amdgcn_div_scale builtins (#192931)

Upstreaming clangIR PR: https://github.com/llvm/clangir/pull/2050

This PR adds support for lowering the __builtin_amdgcn_div_scale* AMDGPU
builtins to ClangIR.
The lowering follows the reference clang -> LLVM IR lowering in
clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp.
Delta  File
+49 -0  clang/test/CIR/CodeGenHIP/builtins-amdgcn.hip
+27 -4  clang/lib/CIR/CodeGen/CIRGenBuiltinAMDGPU.cpp
+76 -4  2 files

LLVM/project 30c5cfd llvm/lib/CodeGen ExpandVectorPredication.cpp, llvm/lib/Target/RISCV RISCVISelLowering.cpp RISCVTargetTransformInfo.h

[RISCV] Remove codegen for vp_is_fpclass (#193222)

Part of the work to remove trivial VP intrinsics from the RISC-V
backend, see
https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999

This splits off vp_is_fpclass from #179622.
Delta  File
+51 -58  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfclass-vp.ll
+23 -39  llvm/test/CodeGen/RISCV/rvv/vfclass-vp.ll
+1 -15  llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+3 -1  llvm/lib/CodeGen/ExpandVectorPredication.cpp
+0 -1  llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+78 -114  5 files

LLVM/project 48fc5a0 flang/lib/Lower/OpenMP Utils.cpp OpenMP.cpp, flang/test/Lower/OpenMP metadirective-device-isa.f90 metadirective-implementation.f90

[flang][OpenMP] Support lowering of metadirective (part 1)

This patch implements the following metadirective features:
- implementation={vendor(...)}
- device={kind(...), isa(...), arch(...)}
- user={condition(<constant-expr>)}
- construct={parallel, target, teams}
- default, nothing, and otherwise clause

Dynamic user conditions and loop-associated variants are deferred
to follow-up patches.

This patch is part of the feature work for #188820.

Assisted with copilot and GPT-5.4
Delta  File
+204 -0  flang/lib/Lower/OpenMP/Utils.cpp
+180 -1  flang/lib/Lower/OpenMP/OpenMP.cpp
+162 -0  flang/test/Lower/OpenMP/metadirective-device-isa.f90
+121 -0  flang/test/Lower/OpenMP/metadirective-implementation.f90
+33 -0  flang/test/Lower/OpenMP/metadirective-static.f90
+30 -0  flang/test/Lower/OpenMP/metadirective-construct.f90
+730 -1  6 files not shown
+786 -19  12 files

LLVM/project fda7c9f llvm/unittests/Frontend OpenMPIRBuilderTest.cpp

Try to fix unit tests
Delta  File
+33 -0  llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
+33 -0  1 file

LLVM/project bee932f llvm/lib/Target/NVPTX NVPTXISelLowering.cpp NVPTXAsmPrinter.cpp, llvm/test/CodeGen/NVPTX unknown-intrinsic.ll

[NVPTX] Improve error diagnostic when handling unknown intrinsics (#191194)

Following up on #146726, it may be desirable to gracefully fail the
compilation in the presence of unknown NVVM intrinsics, which
cannot be lowered by the NVPTX backend, rather than silently
emitting invalid PTX.
Delta  File
+15 -10  llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+12 -0  llvm/test/CodeGen/NVPTX/unknown-intrinsic.ll
+9 -0  llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
+36 -10  3 files

LLVM/project 4ab33dc llvm/test/CodeGen/RISCV/rvv vselect-vp.ll fixed-vectors-vmacc-vp.ll

[RISCV] Remove codegen for vp_select (#194199)

Part of the work to remove trivial VP intrinsics from the RISC-V
backend, see
https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999

This splits off vp.select from #179622
Delta  File
+94 -199  llvm/test/CodeGen/RISCV/rvv/vselect-vp.ll
+123 -162  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vmacc-vp.ll
+123 -162  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vnmsac-vp.ll
+96 -129  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vselect-vp.ll
+52 -52  llvm/test/CodeGen/RISCV/rvv/vmacc-vp.ll
+52 -52  llvm/test/CodeGen/RISCV/rvv/vnmsac-vp.ll
+540 -756  13 files not shown
+872 -1,099  19 files

LLVM/project 4d2d6a0 llvm/lib/Target/LoongArch LoongArchISelLowering.cpp, llvm/test/CodeGen/LoongArch vector-fp-imm.ll

[LoongArch] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector (#164943)

On 64-bit targets the generic legalizer will use an i64 load and a
scalar_to_vector for us. But on 32-bit targets, i64 isn't legal, and the
generic legalizer will end up emitting two 32-bit loads. This patch uses
f64 to avoid both the split and the redundant int->fp conversion.
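A minimal sketch of why one 64-bit floating-point load can stand in for two 32-bit loads: the eight bytes of a v2f32 value are the same bits either way. This only models the memory layout with Python's `struct` module; the function names are hypothetical and nothing here reflects the actual SelectionDAG code.

```python
import struct


def load_v2f32_as_two_f32(memory, offset):
    """Model the split path: two separate 32-bit float loads."""
    (lo,) = struct.unpack_from("<f", memory, offset)
    (hi,) = struct.unpack_from("<f", memory, offset + 4)
    return struct.pack("<2f", lo, hi)


def load_v2f32_as_one_f64(memory, offset):
    """Model the patched path: a single 64-bit load of the same bytes."""
    (wide,) = struct.unpack_from("<d", memory, offset)
    # Reinterpreting the 64-bit lane yields the two original f32 lanes.
    return struct.pack("<d", wide)


# Two f32 elements stored back to back, little-endian.
memory = struct.pack("<2f", 1.5, -2.0)
```

Both functions return the same eight bytes for this input, which is the property the scalar_to_vector of the f64 relies on.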
Delta  File
+26 -0  llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+8 -18  llvm/test/CodeGen/LoongArch/lsx/ir-instruction/fpext.ll
+1 -2  llvm/test/CodeGen/LoongArch/vector-fp-imm.ll
+35 -20  3 files

LLVM/project be68b10 llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h, llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

[MLIR][OpenMP] Post-translate declare-target USM indirection in OpenMPIRBuilder

When lowering OpenMP to LLVM IR for the target device, record pairs of the
`declare target` device global and the OMPIRBuilder "ref" pointer global
(used for unified shared memory) via `OpenMPIRBuilder`. During the
`OpenMPIRBuilder::finalize` pass, run a postpass that rewrites remaining uses of the
original global to load from the ref global and adjust the pointer (shared
path for `ConstantExpr` addrspace/bitcast chains and for direct
instruction uses).

This follows what is done by clang for similar cases:
https://reviews.llvm.org/D63108.

Co-authored-by: Composer
Co-authored-by: Gemini Pro
Delta  File
+68 -0  llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+39 -0  offload/test/offloading/fortran/declare-target-usm-ref-ptr.f90
+24 -0  mlir/test/Target/LLVMIR/omptarget-declare-target-usm-ref-ptr.mlir
+20 -0  llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+11 -3  mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+162 -3  5 files

LLVM/project 68e6968 flang/lib/Optimizer/OpenMP MapInfoFinalization.cpp, flang/test/Transforms omp-map-info-finalization-usm.fir

[Flang][OpenMP] Clear close on descriptor members for box parents in USM

Extend the MapInfoFinalization walk introduced in #185330 so
parent/member close consistency is enforced whenever
unified_shared_memory is in effect, not only when the parent map's
variable is a fir.RecordType. Allocatable (box) roots expand to member
maps the same way as derived-type instances; getDescriptorMapType may
add OMP_MAP_CLOSE to implicit descriptor members while the parent map
does not set close, which led to bad device behavior under
-fopenmp-force-usm with multiple mapped allocatables.
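A toy model of the consistency rule described above, assuming it amounts to clearing the close bit on member maps whenever the parent map does not set it and unified shared memory is in effect. The `MapInfo` class, the `MAP_CLOSE` value, and the function name are hypothetical stand-ins, not the flang data structures.

```python
from dataclasses import dataclass, field

MAP_TO = 0x1        # hypothetical flag value, for illustration only
MAP_CLOSE = 0x400   # hypothetical flag value, for illustration only


@dataclass
class MapInfo:
    name: str
    flags: int
    members: list = field(default_factory=list)


def enforce_close_consistency(parent, usm_enabled):
    """Clear MAP_CLOSE on members when the parent map does not set it."""
    if not usm_enabled or parent.flags & MAP_CLOSE:
        return
    for member in parent.members:
        member.flags &= ~MAP_CLOSE


# An allocatable (box) parent without close, whose implicit descriptor
# member picked up the close bit.
descriptor = MapInfo("descriptor", MAP_TO | MAP_CLOSE)
box_parent = MapInfo("allocatable", MAP_TO, [descriptor])
enforce_close_consistency(box_parent, usm_enabled=True)
```

In this model the rule applies to any parent with members, which mirrors the commit's point that the check should not depend on the parent being a record type.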

Co-authored-by: Composer (Cursor) <ai at cursor.com>
Delta  File
+49 -0  offload/test/offloading/fortran/usm-box-parent-descriptor-close.f90
+12 -12  flang/test/Transforms/omp-map-info-finalization-usm.fir
+6 -12  flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
+67 -24  3 files

LLVM/project 1f9c611 llvm/test/Transforms/LoopFusion triple_loop_nest_inner_guard.ll double_loop_nest_inner_guard.ll

[LoopFusion][NFC] UTC gen some tests (#193755)

Some variables needed renaming because UTC normalizes IR value names.
Also removes the dead variables `%M` and `%N` from
`double_loop_nest_inner_guard.ll`.
Delta  File
+68 -51  llvm/test/Transforms/LoopFusion/triple_loop_nest_inner_guard.ll
+50 -37  llvm/test/Transforms/LoopFusion/double_loop_nest_inner_guard.ll
+118 -88  2 files

LLVM/project f115551 llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h, llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

[MLIR][OpenMP] Post-translate declare-target USM indirection in OpenMPIRBuilder

When lowering OpenMP to LLVM IR for the target device, record pairs of the
`declare target` device global and the OMPIRBuilder "ref" pointer global
(used for unified shared memory) via `OpenMPIRBuilder`. During the
`OpenMPIRBuilder::finalize` pass, run a postpass that rewrites remaining uses of the
original global to load from the ref global and adjust the pointer (shared
path for `ConstantExpr` addrspace/bitcast chains and for direct
instruction uses).

This follows what is done by clang for similar cases:
https://reviews.llvm.org/D63108.

Co-authored-by: Composer
Co-authored-by: Gemini Pro
Delta  File
+68 -0  llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+39 -0  offload/test/offloading/fortran/declare-target-usm-ref-ptr.f90
+24 -0  mlir/test/Target/LLVMIR/omptarget-declare-target-usm-ref-ptr.mlir
+20 -0  llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+11 -3  mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+162 -3  5 files

LLVM/project 504930b llvm/test/CodeGen/X86 machine-block-hash.mir

[X86] Remove update_mir_test_checks.py NOTE (#194278)

The test checks printer output, not MIR.
The NOTE was probably copy-pasted from another test in #193107.
Delta  File
+0 -1  llvm/test/CodeGen/X86/machine-block-hash.mir
+0 -1  1 file

LLVM/project 5c77411 flang/lib/Optimizer/OpenMP MapInfoFinalization.cpp, flang/test/Transforms omp-map-info-finalization-usm.fir

[Flang][OpenMP] Clear close on descriptor members for box parents in USM

Extend the MapInfoFinalization walk introduced in #185330 so
parent/member close consistency is enforced whenever
unified_shared_memory is in effect, not only when the parent map's
variable is a fir.RecordType. Allocatable (box) roots expand to member
maps the same way as derived-type instances; getDescriptorMapType may
add OMP_MAP_CLOSE to implicit descriptor members while the parent map
does not set close, which led to bad device behavior under
-fopenmp-force-usm with multiple mapped allocatables.

Co-authored-by: Composer (Cursor) <ai at cursor.com>
Delta  File
+49 -0  offload/test/offloading/fortran/usm-box-parent-descriptor-close.f90
+12 -12  flang/test/Transforms/omp-map-info-finalization-usm.fir
+6 -12  flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
+67 -24  3 files

LLVM/project 2a09db4 llvm/lib/Target/AMDGPU SIWholeQuadMode.cpp, llvm/test/CodeGen/AMDGPU wqm-propagate-for-execz-side-effect.mir

AMDGPU: Back-propagate wqm for sources of side-effect instruction (#193395)

readfirstlane returns an undefined value if exec is zero. To handle the
case where only helper lanes execute the parent block, we let
readfirstlane execute under wqm. But this is not enough. If the parent
block is also executed by non-helper lanes, we also need to make sure
the sources of readfirstlane are calculated under wqm. Otherwise, if the
instruction that generates a source of readfirstlane is executed in
exact mode, the value contains garbage data in the helper lanes, and
that garbage may be returned by the readfirstlane running under wqm.

To fix this issue, we need to enforce the back-propagation of wqm for
instructions like readfirstlane. Previously this was only done if the
instruction was possibly in the middle of a wqm region (by checking
OutNeeds).
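A toy four-lane model of the problem, assuming lane 0 is a helper lane that is inactive in exact mode and active under wqm. The mask layout, the `GARBAGE` marker, and the function names are hypothetical; this is an illustration of the hazard, not the SIWholeQuadMode algorithm.

```python
GARBAGE = -1  # stands in for whatever an unwritten lane happens to hold

EXACT = [False, True, True, True]  # helper lane 0 is disabled
WQM = [True, True, True, True]     # helper lane 0 is enabled


def run_under(exec_mask, registers, compute):
    """Write compute(lane) in active lanes; inactive lanes keep old contents."""
    result = []
    for lane, (active, old) in enumerate(zip(exec_mask, registers)):
        result.append(compute(lane) if active else old)
    return result


def read_first_lane(values, exec_mask):
    """Return the value held by the first active lane."""
    for lane, active in enumerate(exec_mask):
        if active:
            return values[lane]
    raise ValueError("exec is zero: the result would be undefined")


unwritten = [GARBAGE] * 4

# Source computed in exact mode, then read under wqm: lane 0 was never written.
source_exact = run_under(EXACT, unwritten, lambda lane: 42)
bad = read_first_lane(source_exact, WQM)

# Source computed under wqm as well: lane 0 holds a real value.
source_wqm = run_under(WQM, unwritten, lambda lane: 42)
good = read_first_lane(source_wqm, WQM)
```

Here `bad` picks up the unwritten helper-lane value while `good` does not, which is why the sources need to run under wqm too.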
Delta  File
+35 -4  llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
+1 -1  llvm/test/CodeGen/AMDGPU/wqm-propagate-for-execz-side-effect.mir
+36 -5  2 files

LLVM/project 4bf5bcb llvm/unittests/ADT StableHashingTest.cpp CMakeLists.txt

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
Delta  File
+44 -0  llvm/unittests/ADT/StableHashingTest.cpp
+1 -0  llvm/unittests/ADT/CMakeLists.txt
+45 -0  2 files

LLVM/project 75f6489 llvm/test/CodeGen/RISCV/rvv fixed-vectors-vmacc-vp.ll fixed-vectors-vnmsac-vp.ll

rebase

Created using spr 1.3.7
Delta  File
+438 -234  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vmacc-vp.ll
+438 -234  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vnmsac-vp.ll
+241 -326  llvm/test/CodeGen/RISCV/rvv/vadd-vp.ll
+201 -265  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vadd-vp.ll
+175 -179  llvm/test/CodeGen/RISCV/rvv/vmul-vp.ll
+141 -166  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vmul-vp.ll
+1,634 -1,404  28 files not shown
+2,855 -2,137  34 files

LLVM/project 9cbf724 llvm/lib/TargetParser TargetDataLayout.cpp

clang-format

Created using spr 1.3.8-beta.1
Delta  File
+2 -1  llvm/lib/TargetParser/TargetDataLayout.cpp
+2 -1  1 file

LLVM/project 04031a9 llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.512bit.ll, llvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

only look at ABI

Created using spr 1.3.8-beta.1
Delta  File
+160,853 -171,875  llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+275,101 -0  llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679 -0  llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+54,567 -55,132  llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+92,827 -0  llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+31,320 -33,737  llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+759,347 -260,744  29,181 files not shown
+4,174,883 -1,618,109  29,187 files

LLVM/project 2d9efd2 llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.512bit.ll, llvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.8-beta.1

[skip ci]
Delta  File
+160,853 -171,875  llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+275,101 -0  llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679 -0  llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+54,567 -55,132  llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+92,827 -0  llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+31,320 -33,737  llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+759,347 -260,744  29,180 files not shown
+4,174,851 -1,618,080  29,186 files

LLVM/project 41236fb llvm/lib/Transforms/Scalar GVN.cpp, llvm/test/Transforms/GVN tbaa.ll

[GVN] Propagate isMemorySSAEnabled() into ValueTable (#193938)

`GVNPass::runImpl()` calls `VN.setMemorySSA(MSSA)` with a single
argument. The second parameter of `ValueTable::setMemorySSA()`,
`MSSAEnabled`, defaults to `false`, so `ValueTable::IsMSSAEnabled`
remains false even when the pass is configured with
`-enable-gvn-memoryssa=1` or `-passes='gvn<memoryssa>'`.

The MemorySSA-backed value-numbering paths in
`ValueTable::lookupOrAddCall()` and `ValueTable::computeLoadStoreVN()`
are gated on `IsMSSAEnabled`, making them unreachable from runImpl() on
main today.

This patch forwards isMemorySSAEnabled() as the second argument to
setMemorySSA(), so selecting the MemorySSA backend actually enables
MemorySSA-aware value numbering.
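A toy model of the defaulted-parameter pitfall described above. The Python class only mirrors the shape of `ValueTable::setMemorySSA()`; the method bodies, the `forward_flag` switch, and the returned labels are hypothetical and exist purely to show the before/after behavior.

```python
class ValueTable:
    """Stand-in for GVN's value table; not the real implementation."""

    def __init__(self):
        self.mssa = None
        self.is_mssa_enabled = False

    def set_memory_ssa(self, mssa, mssa_enabled=False):
        # The second parameter defaults to False, as in the commit message.
        self.mssa = mssa
        self.is_mssa_enabled = mssa_enabled

    def number_load(self):
        # Paths gated on the flag are unreachable while it stays False.
        return "mssa-aware" if self.is_mssa_enabled else "fallback"


def run_impl(value_table, mssa, pass_mssa_enabled, forward_flag):
    """forward_flag=False models the old call, True models the patched call."""
    if forward_flag:
        value_table.set_memory_ssa(mssa, pass_mssa_enabled)
    else:
        value_table.set_memory_ssa(mssa)
    return value_table.number_load()
```

With `forward_flag=False` the pass-level setting is silently dropped, so the table reports the fallback path even when MemorySSA was requested.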
Delta  File
+36 -90  llvm/test/Transforms/GVN/tbaa.ll
+4 -1  llvm/lib/Transforms/Scalar/GVN.cpp
+40 -91  2 files

LLVM/project 87e285c . pyproject.toml

[𝘀𝗽𝗿] initial version

Created using spr 1.3.8-beta.1
Delta  File
+10 -0  pyproject.toml
+10 -0  1 file

LLVM/project ab6582b clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/overloaded vfncvtbf16.c, llvm/test/CodeGen/RISCV/rvv setcc-int-vp.ll

update switch

Created using spr 1.3.8-beta.1
Delta  File
+3,230 -456  llvm/test/CodeGen/WebAssembly/strided-int-mac.ll
+704 -882  llvm/test/CodeGen/RISCV/rvv/setcc-int-vp.ll
+980 -230  mlir/test/Dialect/XeGPU/xegpu-wg-to-sg.mlir
+1,024 -0  llvm/test/Transforms/LoopUnroll/debug-and-remarks.ll
+0 -987  mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-unify-ops.mlir
+472 -472  clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/overloaded/vfncvtbf16.c
+6,410 -3,027  759 files not shown
+31,136 -15,914  765 files

LLVM/project 161e56b clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/overloaded vfncvtbf16.c, llvm/test/CodeGen/RISCV/rvv setcc-int-vp.ll fixed-vectors-setcc-int-vp.ll

Merge branch 'main' into users/ylzsx/v2f32-load-legalize
Delta  File
+3,230 -456  llvm/test/CodeGen/WebAssembly/strided-int-mac.ll
+704 -882  llvm/test/CodeGen/RISCV/rvv/setcc-int-vp.ll
+980 -230  mlir/test/Dialect/XeGPU/xegpu-wg-to-sg.mlir
+0 -987  mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-unify-ops.mlir
+472 -472  clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/overloaded/vfncvtbf16.c
+345 -558  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-setcc-int-vp.ll
+5,731 -3,585  576 files not shown
+25,323 -14,291  582 files

LLVM/project db57208 llvm/test/CodeGen/X86 machine-block-hash.mir

[X86] Mark machine-block-hash.mir as XFAIL on big-endian hosts (#194279)

The test introduced in #193107 assumes `stable_hash_combine` is stable,
but that turns out not to be true.
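A small sketch of one common way a hash can differ between little-endian and big-endian hosts: hashing the in-memory bytes of integers rather than their values. Whether this is the mechanism behind the `stable_hash_combine` difference is an assumption; the commit only states that the result is not stable. `hash_words` is a hypothetical helper built on `hashlib`.

```python
import hashlib


def hash_words(words, byteorder):
    """Hash 64-bit integers via their byte representation in the given order."""
    data = b"".join(word.to_bytes(8, byteorder) for word in words)
    return hashlib.sha256(data).hexdigest()


words = [1, 2, 3]
little_digest = hash_words(words, "little")
big_digest = hash_words(words, "big")
```

The two digests differ even though the logical input is identical, so a test that hard-codes one of them would only pass on hosts with the matching byte order.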
Delta  File
+3 -0  llvm/test/CodeGen/X86/machine-block-hash.mir
+3 -0  1 file

LLVM/project e042f67 llvm/lib/Target/LoongArch LoongArchInstrInfo.cpp LoongArchInstrInfo.h, llvm/test/CodeGen/LoongArch stackslot.mir

[LoongArch] Override `isLoadFromStackSlot/isStoreToStackSlot` to expose more optimizations (#164561)
Delta  File
+245 -0  llvm/test/CodeGen/LoongArch/stackslot.mir
+76 -0  llvm/lib/Target/LoongArch/LoongArchInstrInfo.cpp
+9 -0  llvm/lib/Target/LoongArch/LoongArchInstrInfo.h
+330 -0  3 files

LLVM/project 0d704c3 llvm/test/CodeGen/X86 machine-block-hash.mir

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
Delta  File
+3 -0  llvm/test/CodeGen/X86/machine-block-hash.mir
+3 -0  1 file

LLVM/project a881a30 llvm/lib/Target/AMDGPU SIWholeQuadMode.cpp

Apply suggestion from @ruiling
Delta  File
+0 -1  llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
+0 -1  1 file

LLVM/project 37ac1ef llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.512bit.ll, llvm/test/CodeGen/RISCV/rvv fixed-vectors-fp-setcc.ll fixed-vectors-setcc-fp-vp.ll

Merge branch 'main' into users/ruiling/wqm-prop-sideeffect
Delta  File
+4,811 -4,818  llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+326 -4,626  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-setcc.ll
+1,872 -1,883  llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+3,230 -456  llvm/test/CodeGen/WebAssembly/strided-int-mac.ll
+565 -2,727  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-setcc-fp-vp.ll
+1,117 -1,613  llvm/test/CodeGen/RISCV/rvv/setcc-fp-vp.ll
+11,921 -16,123  3,545 files not shown
+154,376 -91,246  3,551 files

LLVM/project ef09def llvm/test/CodeGen/AMDGPU wqm-propagate-for-execz-side-effect.mir

[test][AMDGPU] Precommit test for Back-propagate wqm for sources of side-effect instruction (#193394)
Delta  File
+238 -0  llvm/test/CodeGen/AMDGPU/wqm-propagate-for-execz-side-effect.mir
+238 -0  1 file