LLVM/project d98fd41 llvm/lib/CodeGen ExpandIRInsts.cpp, llvm/test/Transforms/ExpandIRInsts/X86 expand-large-fp-convert-fpto-sat-vector.ll

[ExpandIRInsts] Support llvm.fpto{u,s}i.sat (#199174)

Previously, running ExpandIRInsts on a program which needs to expand a
vector fptoui.sat would hit llvm_unreachable, because the `scalarize`
function didn't handle this intrinsic.

This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
Delta  File
+320 -0  llvm/test/Transforms/ExpandIRInsts/X86/expand-large-fp-convert-fpto-sat-vector.ll
+6 -1  llvm/lib/CodeGen/ExpandIRInsts.cpp
+326 -1  2 files
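
For readers unfamiliar with the intrinsics named in d98fd41, here is a minimal scalar sketch of the saturating conversions, assuming 32-bit results and `double` inputs. The function names are hypothetical, and the per-lane loop only illustrates what scalarizing a vector conversion means; it is not the code in ExpandIRInsts.cpp.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Scalar model of an unsigned saturating conversion: NaN maps to 0,
// out-of-range inputs clamp to the bounds, the rest truncate toward zero.
uint32_t fptoui_sat_u32(double x) {
  if (std::isnan(x)) return 0;
  if (x <= 0.0) return 0;
  if (x >= 4294967295.0) return std::numeric_limits<uint32_t>::max();
  return static_cast<uint32_t>(x);
}

// Signed counterpart with the same NaN and clamping rules.
int32_t fptosi_sat_i32(double x) {
  if (std::isnan(x)) return 0;
  if (x <= -2147483648.0) return std::numeric_limits<int32_t>::min();
  if (x >= 2147483647.0) return std::numeric_limits<int32_t>::max();
  return static_cast<int32_t>(x);
}

// Hypothetical "scalarized" vector conversion: apply the scalar
// operation to each lane independently.
std::vector<uint32_t> fptoui_sat_lanes(const std::vector<double>& lanes) {
  std::vector<uint32_t> out;
  out.reserve(lanes.size());
  for (double lane : lanes) out.push_back(fptoui_sat_u32(lane));
  return out;
}
```

Calling `fptoui_sat_lanes` on a vector holding a negative value, an in-range value, and a huge value clamps the first and last lanes and truncates the middle one.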

LLVM/project 753008d flang/lib/Lower/OpenMP OpenMP.cpp, flang/test/Lower/OpenMP target-inreduction.f90

[flang][OpenMP] Lower target in_reduction for host fallback

Teach Flang lowering and MLIR OpenMP translation to carry
in_reduction through omp.target for the host-fallback path.

The translation looks up task reduction-private storage with
__kmpc_task_reduction_get_th_data and binds the target region's
in_reduction block argument to that private pointer, so uses inside the
region do not keep referring to the original variable.

The patch also preserves in_reduction operands in the TargetOp builder
path and ensures target in_reduction list items are mapped into the
target region when needed.

The device/offload-entry path remains diagnosed as not yet implemented.
Delta  File
+90 -1  mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+83 -3  mlir/test/Target/LLVMIR/openmp-todo.mlir
+64 -6  flang/lib/Lower/OpenMP/OpenMP.cpp
+60 -0  mlir/test/Dialect/OpenMP/invalid.mlir
+50 -0  mlir/test/Target/LLVMIR/openmp-target-in-reduction.mlir
+28 -0  flang/test/Lower/OpenMP/target-inreduction.f90
+375 -10  3 files not shown
+412 -30  9 files

LLVM/project 981acb7 llvm/lib/Target/RISCV RISCVInstrInfoY.td

indentation

Created using spr 1.3.8-beta.1
Delta  File
+3 -3  llvm/lib/Target/RISCV/RISCVInstrInfoY.td
+3 -3  1 files

LLVM/project 6d46223 llvm/lib/Target/RISCV RISCVInstrInfoY.td, llvm/test/MC/RISCV/rvy rvy-basic.s

fix copy-paste error

Created using spr 1.3.8-beta.1
Delta  File
+7 -0  llvm/test/MC/RISCV/rvy/rvy-basic.s
+1 -1  llvm/lib/Target/RISCV/RISCVInstrInfoY.td
+8 -1  2 files

LLVM/project 7b993d2 llvm/lib/Transforms/InstCombine InstCombineCalls.cpp, llvm/test/Transforms/InstCombine ldexp.ll

[InstCombine] Use sadd.sat for chained ldexp fold (#199274)

The fold ldexp(ldexp(x, a), b) -> ldexp(x, a + b) did not account for
`a + b` overflowing. Use a saturating add instead.

This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
Delta  File
+67 -11  llvm/test/Transforms/InstCombine/ldexp.ll
+42 -18  llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+109 -29  2 files
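
To make the overflow described in 7b993d2 concrete, the sketch below compares the chained call against a fold that adds the exponents with wraparound and one that saturates. It assumes a 32-bit `int` exponent and uses hypothetical helper names. It illustrates only the overflow, not the fold's other preconditions or the InstCombine implementation.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Two's-complement wrapping add, done in unsigned arithmetic to avoid
// signed-overflow undefined behaviour.
int32_t wrapping_add(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) +
                              static_cast<uint32_t>(b));
}

// Signed saturating add: clamp the exact 64-bit sum to the int32_t range.
int32_t saturating_add(int32_t a, int32_t b) {
  int64_t wide = static_cast<int64_t>(a) + static_cast<int64_t>(b);
  if (wide > std::numeric_limits<int32_t>::max())
    return std::numeric_limits<int32_t>::max();
  if (wide < std::numeric_limits<int32_t>::min())
    return std::numeric_limits<int32_t>::min();
  return static_cast<int32_t>(wide);
}

// The original, unfolded expression.
double chained(double x, int32_t a, int32_t b) {
  return std::ldexp(std::ldexp(x, a), b);
}

// Folded form with a wrapping exponent sum (the buggy variant).
double folded_wrapping(double x, int32_t a, int32_t b) {
  return std::ldexp(x, wrapping_add(a, b));
}

// Folded form with a saturating exponent sum.
double folded_saturating(double x, int32_t a, int32_t b) {
  return std::ldexp(x, saturating_add(a, b));
}
```

With `x = 1.0` and both exponents equal to `INT32_MAX`, the chained call overflows to infinity, the wrapping fold computes `ldexp(1.0, -2)`, and the saturating fold again yields infinity.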

LLVM/project 860e4b8 llvm/lib/Target/X86 X86AvoidStoreForwardingBlocks.cpp, llvm/test/CodeGen/X86 avoid-sfb.ll

[X86][AvoidStoreForwardingBlocks] Skip volatile/atomic accesses. (#199698)

The pass splits an XMM/YMM load+store pair into smaller copies when a
preceding narrower store would block store-to-load forwarding into the
load, but it didn't check the MachineMemOperand's isVolatile/isAtomic
bits.

This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
Delta  File
+163 -0  llvm/test/CodeGen/X86/avoid-sfb.ll
+7 -1  llvm/lib/Target/X86/X86AvoidStoreForwardingBlocks.cpp
+170 -1  2 files

LLVM/project e1e52c9 llvm/lib/Support Win64EH.cpp, llvm/test/tools/llvm-objdump/COFF win64-unwindv3-multi-epilog.yaml

[win][x64] Updated `llvm-objdump` and `llvm-readobj` to be able to dump Windows x64 Unwind v3 information. (#199120)

Public docs:
<https://learn.microsoft.com/en-us/cpp/build/x64-unwind-information-v3?view=msvc-170>

The change adds Windows x64 unwind v3 info decoding and printing support
in LLVM, including new data structures, enums, and decoding functions to
handle the different WOD opcodes and epilog descriptors. It also updates
the dumping utilities (llvm-readobj and llvm-objdump) to correctly
interpret v3 unwind info.
Delta  File
+364 -0  llvm/lib/Support/Win64EH.cpp
+287 -5  llvm/tools/llvm-objdump/COFFDump.cpp
+233 -3  llvm/tools/llvm-readobj/Win64EHDumper.cpp
+191 -0  llvm/test/tools/llvm-readobj/COFF/unwind-x86_64-v3-multi-epilog.yaml
+173 -0  llvm/test/tools/llvm-readobj/COFF/unwind-x86_64-v3-all-wods.yaml
+164 -0  llvm/test/tools/llvm-objdump/COFF/win64-unwindv3-multi-epilog.yaml
+1,412 -8  38 files not shown
+5,528 -9  44 files

LLVM/project 2713d94 llvm/docs SandboxIR.md, llvm/include/llvm/SandboxIR Tracker.h Context.h

Reapply "[SandboxIR][Tracker] Implement accept(/*AcceptAll*/) and revert(/*RevertAll*/)" (#199776) (#199805)

This reverts commit a7aceff0b1e552cbc2306e575e9ac649853fda8e.
Delta  File
+55 -0  llvm/unittests/SandboxIR/TrackerTest.cpp
+23 -8  llvm/lib/SandboxIR/Tracker.cpp
+18 -6  llvm/include/llvm/SandboxIR/Tracker.h
+2 -2  llvm/include/llvm/SandboxIR/Context.h
+2 -1  llvm/docs/SandboxIR.md
+100 -17  5 files

LLVM/project db9b595 llvm/include/llvm/DebugInfo/CodeView CodeViewRegisters.def, llvm/lib/Target/X86/MCTargetDesc X86MCTargetDesc.cpp

[X86][APX] Add CodeView register IDs and mapping for APX EGPR (#199586)

Resolves #187924

Refer to
https://devblogs.microsoft.com/cppblog/msvc-version-1451-available/
Delta  File
+64 -0  llvm/include/llvm/DebugInfo/CodeView/CodeViewRegisters.def
+64 -0  llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
+37 -0  llvm/test/DebugInfo/COFF/apx-egpr.ll
+165 -0  3 files

LLVM/project ea3cb0f llvm/lib/Target/AMDGPU SIFrameLowering.cpp

Fix for noassert buildbot break in #183153 (#199781)

Change-Id: I285adf09ac2df239d0ab05459f7388b6970247ad
Delta  File
+1 -2  llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+1 -2  1 files

LLVM/project 3768e13 llvm/test/CodeGen/X86 vector-shuffle-combining-avx512vbmi2.ll

[X86] Add test coverage for #145276 (#200004)
Delta  File
+16 -0  llvm/test/CodeGen/X86/vector-shuffle-combining-avx512vbmi2.ll
+16 -0  1 files

LLVM/project 95f08b1 llvm/lib/Target/RISCV RISCVInstrInfoP.td

[RISCV][P-ext] Make the direction argument for RVPPairShift* classes required. NFC (#199799)

It's part of the encoding. I don't think we should have a preference for
one of the bit values being the default.
Delta  File
+54 -52  llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+54 -52  1 files

LLVM/project 38555db llvm/lib/Target/RISCV RISCVISelDAGToDAG.cpp RISCVInstrInfoP.td

[RISCV][P-ext] Replace some custom isel code with tablegen patterns. NFC (#199881)
Delta  File
+0 -51  llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+17 -1  llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+17 -52  2 files

LLVM/project 358f5e7 llvm/lib/Target/RISCV RISCVInstrInfoP.td

[RISCV][P-ext] Add missing let Inst{31} = 0b0 to RVPPairShift_rr. (#199885)

This bit was accidentally left unset. I think this means the bit was
treated as a don't-care, so the disassembler could decode some invalid
encodings as these instructions. I didn't check the opcode map closely
enough to confirm this.
Delta  File
+1 -0  llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+1 -0  1 files
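
The don't-care concern in 358f5e7 can be illustrated with a generic mask-and-value decoder, sketched below. The struct, function, and every bit pattern here are hypothetical and chosen only for illustration; they are not the RVPPairShift_rr encoding or LLVM's generated decoder tables.

```cpp
#include <cstdint>

// A decoder pattern: an instruction word matches when its bits under
// `mask` equal `value`. Bits outside the mask are don't-cares.
struct EncodingPattern {
  uint32_t mask;
  uint32_t value;
};

bool matches(const EncodingPattern& pattern, uint32_t insn) {
  return (insn & pattern.mask) == pattern.value;
}

// Hypothetical pattern that forgets to constrain bit 31.
constexpr EncodingPattern kLoosePattern{0x0000007Fu, 0x0000001Bu};

// The same hypothetical pattern with bit 31 required to be zero.
constexpr EncodingPattern kStrictPattern{0x8000007Fu, 0x0000001Bu};

// A word whose low bits match but whose bit 31 is set.
constexpr uint32_t kWordWithBit31Set = 0x8000001Bu;
```

Under these assumptions, `matches(kLoosePattern, kWordWithBit31Set)` accepts the word, while `matches(kStrictPattern, kWordWithBit31Set)` rejects it.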

LLVM/project a3acd80 mlir/include/mlir/Dialect/XeGPU/Transforms XeGPULayoutImpl.h, mlir/lib/Dialect/XeGPU/Transforms XeGPUPropagateLayout.cpp XeGPULayoutImpl.cpp

[MLIR][XeGPU] Propagate layout onto loop-carried iter_arg entry edges (#198862)

as per title
Delta  File
+14 -0  mlir/include/mlir/Dialect/XeGPU/Transforms/XeGPULayoutImpl.h
+9 -4  mlir/lib/Dialect/XeGPU/Transforms/XeGPUPropagateLayout.cpp
+8 -4  mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutImpl.cpp
+2 -2  mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir
+2 -2  mlir/test/Dialect/XeGPU/propagate-layout.mlir
+35 -12  5 files

LLVM/project c3fc506 llvm/lib/Target/AMDGPU AMDGPUTargetTransformInfo.cpp

[AMDGPU] Remove explicit PartialThreshold setting in loop unrolling (#198901)

Remove UP.PartialThreshold = UP.Threshold / 4 from AMDGPU TTI, restoring
the default PartialThreshold of 150.

This was introduced in #194924 to limit code-size growth from runtime
unrolling, but PartialThreshold also gates compile-time partial
unrolling of constant-trip-count loops. This change restores
PartialThreshold to its default value for both compile-time partial
unrolling and runtime partial unrolling.

Benchmarked across CK, llama.cpp, and xpu-perf — no performance impact
from restoring the default.

Fixes #196372, replaces #196818.

Assisted-by: Claude Code
Delta  File
+1 -2  llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+1 -2  1 files

LLVM/project 22ced91 flang/lib/Lower/OpenMP OpenMP.cpp, flang/test/Lower/OpenMP target-inreduction.f90 target-inreduction-unused.f90

[flang][OpenMP] Lower target in_reduction for host fallback

Teach Flang lowering and MLIR OpenMP translation to carry
in_reduction through omp.target for the host-fallback path.

The translation looks up task reduction-private storage with
__kmpc_task_reduction_get_th_data and binds the target region's
in_reduction block argument to that private pointer, so uses inside the
region do not keep referring to the original variable.

The patch also preserves in_reduction operands in the TargetOp builder
path and ensures target in_reduction list items are mapped into the
target region when needed.

The device/offload-entry path remains diagnosed as not yet implemented.
Delta  File
+90 -1  mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+83 -3  mlir/test/Target/LLVMIR/openmp-todo.mlir
+59 -4  flang/lib/Lower/OpenMP/OpenMP.cpp
+50 -0  mlir/test/Target/LLVMIR/openmp-target-in-reduction.mlir
+28 -0  flang/test/Lower/OpenMP/target-inreduction.f90
+27 -0  flang/test/Lower/OpenMP/target-inreduction-unused.f90
+337 -8  2 files not shown
+342 -28  8 files

LLVM/project 97b9c24 libcxx/include/__functional function.h, libcxx/test/libcxx/utilities/function.objects block.func.compile.pass.cpp

Add missing annotations for Apple platforms (#198864)

These seem to have been missed in #193045.
Delta  File
+31 -0  libcxx/test/libcxx/utilities/function.objects/block.func.compile.pass.cpp
+4 -0  libcxx/include/__functional/function.h
+35 -0  2 files

LLVM/project dd2ce3d llvm/cmake/modules AddLLVM.cmake

[LLVM] Add per-target runtime directory to rpath (#199755)

Summary:
This simply adds the LLVM_DEFAULT_TARGET_TRIPLE to the LLVM build's
rpath if present. This keeps things hermetic for the library (offload)
that depends on it.

This is required because `llvm-gpu-loader` calls `DynamicLibrary` on
the Offload runtime. However, in a shared library build the actual call
is made from libLLVMSupport.so, which does not have this RPath, so
`dlopen` is resolved through a library that does not know how to find
the runtime. The only options to fix this are to use `dlopen` directly
in the loader, or add the rpath to the LLVM binaries.

I think this makes sense for LLVM, because the target-specific directory
can contain LLVM related libraries.
Delta  File
+4 -0  llvm/cmake/modules/AddLLVM.cmake
+4 -0  1 files

LLVM/project b65a71d libc/test/src/__support/threads/linux raw_mutex_test.cpp, utils/bazel/llvm-project-overlay/libc BUILD.bazel

[libc][bazel] Add rules for __support/threads tests. (#199871)

* Add Bazel BUILD rules for three `__support/threads` unit tests.
* Fix/expand BUILD rules for the support libraries they depend on
(clock_gettime and vdso) that were previously incorrectly missing `.cpp`
files with implementations.
* Minor fix to use `internal::exit` in `raw_mutex_test` to avoid adding
a dependency on `exit` entrypoint, which doesn't yet exist in Bazel.

Assisted by: Gemini
Delta  File
+52 -0  utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+49 -0  utils/bazel/llvm-project-overlay/libc/test/src/__support/threads/BUILD.bazel
+2 -2  libc/test/src/__support/threads/linux/raw_mutex_test.cpp
+103 -2  3 files

LLVM/project 468951b llvm/lib/Target/AMDGPU SIFrameLowering.cpp

Fix for noassert buildbot break in #183153

Change-Id: I285adf09ac2df239d0ab05459f7388b6970247ad
Delta  File
+1 -2  llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+1 -2  1 files

LLVM/project 06a6cbb llvm/lib/Target/AMDGPU SIFrameLowering.cpp

[AMDGPU] Fix SuperReg to MCRegister conversion (#199993)

This is a fix for "[AMDGPU] Implement CFI for non-kernel functions
(#183153)" f78a233ac89dc0f9f0f26dfe051874013ae6e242 to use
"SuperReg.asMCReg()" instead of "MCRegister(SuperReg)"; the latter leads
to an "ambiguous call" error when using the MSVC compiler.
Delta  File
+2 -2  llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+2 -2  1 files

LLVM/project e5ad7f7 flang/lib/Semantics resolve-directives.cpp, flang/test/Semantics/OpenMP declare-simd-uniform.f90

[flang][OpenMP] Remove ompFlagsRequireMark from symbol resolution (#198591)

The `ompFlagsRequireMark` set was there to make sure that we put the
flags from it on symbols even when no new symbols needed to be created.

Instead of doing that, we can just put the flag on the symbol every
time. There is no harm in having these flags; they are just extra
information.
Delta  File
+2 -12  flang/lib/Semantics/resolve-directives.cpp
+1 -1  flang/test/Semantics/OpenMP/declare-simd-uniform.f90
+3 -13  2 files

LLVM/project 976469c flang/lib/Lower/OpenMP OpenMP.cpp, flang/test/Lower/OpenMP target-inreduction.f90 target-inreduction-unused.f90

[flang][OpenMP] Support in_reduction on target

Teach Flang lowering and MLIR OpenMP translation to carry
in_reduction through omp.target.

The translation looks up the task reduction-private storage with
__kmpc_task_reduction_get_th_data and binds the target region's
in_reduction block argument to that private pointer, so uses inside the
region do not keep referring to the original variable.

The patch also preserves in_reduction operands in the TargetOp builder
path and makes sure target in_reduction list items are mapped into the
target region when needed.
Delta  File
+90 -1  mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+83 -3  mlir/test/Target/LLVMIR/openmp-todo.mlir
+59 -4  flang/lib/Lower/OpenMP/OpenMP.cpp
+50 -0  mlir/test/Target/LLVMIR/openmp-target-in-reduction.mlir
+28 -0  flang/test/Lower/OpenMP/target-inreduction.f90
+27 -0  flang/test/Lower/OpenMP/target-inreduction-unused.f90
+337 -8  2 files not shown
+342 -28  8 files

LLVM/project bf8dae1 llvm/lib/Transforms/InstCombine InstCombineCasts.cpp, llvm/test/Transforms/InstCombine trunc-abs-intrinsics.ll

[InstCombine] Narrow llvm.abs through trunc. (#199643)

Update EvaluateInDifferentType / canEvaluateTruncated to narrow abs
intrinsics when the operand has at least OrigBitWidth - BitWidth + 1
sign bits. The transform always emits the narrow abs with
IsIntMinPoison=false, as the narrowed value may be INT_MIN in the narrow
type, while not in the original width.

Alive2 Proof with weaker precondition (top and truncated sign bits must
match): https://alive2.llvm.org/ce/z/AMQRmi

End-to-end C pixel math example: https://clang.godbolt.org/z/Ma8bsTGTY

PR: https://github.com/llvm/llvm-project/pull/199643
Delta  File
+174 -0  llvm/test/Transforms/InstCombine/trunc-abs-intrinsics.ll
+15 -0  llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
+189 -0  2 files
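
The sign-bit precondition in bf8dae1 can be checked by brute force for one concrete width pair. The sketch below assumes a 32-bit to 8-bit truncation, where "at least OrigBitWidth - BitWidth + 1 sign bits" means the input lies in [-128, 127]. The helper names are hypothetical and this is not the InstCombine code.

```cpp
#include <cstdint>

// 8-bit abs with wrapping semantics (the IsIntMinPoison=false case):
// -128 maps to itself instead of being poison.
int8_t abs8_wrapping(int8_t v) {
  uint8_t bits = static_cast<uint8_t>(v);
  return static_cast<int8_t>(v < 0 ? static_cast<uint8_t>(0u - bits) : bits);
}

// Original form: take abs at 32 bits, then truncate to 8 bits.
// Assumes x != INT32_MIN so the wide abs is well defined.
int8_t trunc_of_wide_abs(int32_t x) {
  uint32_t wide = x < 0 ? 0u - static_cast<uint32_t>(x)
                        : static_cast<uint32_t>(x);
  return static_cast<int8_t>(static_cast<uint8_t>(wide));
}

// Narrowed form: truncate to 8 bits first, then take the wrapping abs.
int8_t narrow_abs_of_trunc(int32_t x) {
  uint8_t low = static_cast<uint8_t>(static_cast<uint32_t>(x));
  return abs8_wrapping(static_cast<int8_t>(low));
}

// Exhaustively compare both forms over every 32-bit value that has at
// least 25 sign bits, i.e. every value in [-128, 127].
bool forms_agree_with_25_sign_bits() {
  for (int32_t x = -128; x <= 127; ++x)
    if (trunc_of_wide_abs(x) != narrow_abs_of_trunc(x)) return false;
  return true;
}
```

The case `x = -128` shows why the narrow abs cannot be marked poison on INT_MIN: the wide abs is 128, which truncates to the 8-bit pattern for -128. An input such as 200, which has only 24 sign bits, makes the two forms disagree.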

LLVM/project 3e8c217 llvm/lib/Target/AMDGPU SIInstrInfo.td, llvm/test/CodeGen/AMDGPU fshl-scalar-shift-zero.ll

[AMDGPU] Fix ShiftAmt32Imm to use unsigned comparison (#199052)

ShiftAmt32Imm used a signed 'Imm < 32' predicate, which incorrectly
matched negative immediates such as -1. The scalar fshr fast path:
    
      def : GCNPat<(UniformTernaryFrag<fshr> i32:$src0, i32:$src1,
                                             (i32 ShiftAmt32Imm:$src2)),
        (i32 (EXTRACT_SUBREG (S_LSHR_B64 ..., $src2), sub0))>;
    
When fshl(scalar, X, Z) is lowered via expandFunnelShift for any
constant Z in [0, 31], the generic code converts it to fshr(..., ~Z) or
fshr(..., -Z), producing a negative shift amount. Because all such
values satisfy Imm < 32 in a signed comparison, ShiftAmt32Imm matched
and the pattern passed the negative immediate directly to S_LSHR_B64
without the S_AND_B32 masking. S_LSHR_B64 then shifted by the wrong
amount, producing an incorrect result.
    
Fix by changing the predicate to an unsigned comparison so that only
values in [0, 31] match, and negative values fall through to the general

    [8 lines not shown]
Delta  File
+139 -0  llvm/test/CodeGen/AMDGPU/fshl-scalar-shift-zero.ll
+1 -1  llvm/lib/Target/AMDGPU/SIInstrInfo.td
+140 -1  2 files
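
The predicate bug in 3e8c217 comes down to signed versus unsigned comparison of the immediate. A minimal sketch, assuming the immediate is held as a 64-bit signed integer; the function names are hypothetical stand-ins for the TableGen predicate and are not copied from SIInstrInfo.td.

```cpp
#include <cstdint>

// Signed check: every negative immediate is "less than 32" and matches.
bool shift_amt_matches_signed(int64_t imm) {
  return imm < 32;
}

// Unsigned check: negative immediates become huge values and are
// rejected, so only [0, 31] matches.
bool shift_amt_matches_unsigned(int64_t imm) {
  return static_cast<uint64_t>(imm) < 32;
}

// What masking to the 32-bit funnel-shift width would produce: the
// shift amount taken modulo 32.
uint32_t masked_shift_amount(int64_t imm) {
  return static_cast<uint32_t>(imm) & 31u;
}
```

For an immediate of -1 the signed check accepts and the unsigned check rejects, while masking gives 31, the amount a 32-bit funnel shift is meant to use.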

LLVM/project dcf50fe llvm/lib/Target/SystemZ SystemZInstrInfo.cpp, llvm/test/CodeGen/SystemZ foldmem-regalloc.mir foldmemop-global.mir

[SystemZ] Don't fold memops after SSA if tied regs don't match. (#197475)

When foldMemoryOperandImpl() is called during register allocation,
folding into a reg/mem opcode mustn't be done if the tied def and use
operands do not end up referencing the same register.

Fixes #197414
Delta  File
+123 -0  llvm/test/CodeGen/SystemZ/foldmem-regalloc.mir
+4 -2  llvm/test/CodeGen/SystemZ/foldmemop-global.mir
+2 -0  llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
+129 -2  3 files

LLVM/project 6cb00b6 llvm/lib/Target/Hexagon HexagonISelLoweringHVX.cpp, llvm/test/CodeGen/Hexagon isel-hvx-pred-bitcast-order.ll inst_masked_store_bug1.ll

[Hexagon] Fix up vector predicate before compressing it for bitcast (#199283)

In a v64i1 vector predicate, each i1 is represented by 2 bits of the
predicate register. The predicate register needs to be fixed up before
we compress it.

Signed-off-by: Alexey Karyakin <akaryaki at qti.qualcomm.com>
Co-authored-by: Ikhlas Ajbar <iajbar at quicinc.com>
Delta  File
+47 -2  llvm/test/CodeGen/Hexagon/isel-hvx-pred-bitcast-order.ll
+21 -0  llvm/lib/Target/Hexagon/HexagonISelLoweringHVX.cpp
+1 -1  llvm/test/CodeGen/Hexagon/inst_masked_store_bug1.ll
+69 -3  3 files

LLVM/project c0a302d llvm/lib/Target/AMDGPU SIMemoryLegalizer.cpp

[AMDGPU] Refactor insertRelease into insertWriteback + insertWait (NFC) (#199486)

A release consists of two actions: write back the current cache, and
wait for "relevant" outstanding operations to complete. With the new
memory model, it is possible to disable the cache write-back using
"non-av". This patch cleanly separates the existing implementation so
that the write-backs can be selectively applied after checking for
non-av semantics.

Part of a stack:

- #199486
- #199621
- #199489 
- #199622

Assisted-By: Claude Opus 4.6

---------

Co-authored-by: Pierre van Houtryve <pierre.vanhoutryve at amd.com>
Delta  File
+123 -137  llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+123 -137  1 files

LLVM/project 6170eeb flang/lib/Lower/OpenMP ClauseProcessor.cpp, flang/test/Lower/OpenMP copyprivate6.f90

[flang][OpenMP] Fix copyprivate crash with unlimited polymorphic pointer (#199768)

Lowering a copyprivate clause whose list item is an unlimited
polymorphic pointer (class(*), pointer) crashed in TypeInfo::typeScan.
The scan descends through the fir.class box and the fir.ptr, reaching a
`none` element type, which the terminal assertion did not allow.

Fixes #198770
Delta  File
+26 -0  flang/test/Lower/OpenMP/copyprivate6.f90
+5 -1  flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+31 -1  2 files