LLVM/project 0689a59 clang/include/clang/Basic DiagnosticParseKinds.td, clang/lib/Parse ParseOpenMP.cpp

[NFC] [OpenMP] Fix typo and add initializer to modifier. (#174784)

Fixed typo and added initialization of modifier.
Delta File
+2 -0 clang/lib/Parse/ParseOpenMP.cpp
+1 -1 clang/include/clang/Basic/DiagnosticParseKinds.td
+1 -1 clang/test/OpenMP/need_device_ptr_kind_messages.cpp
+4 -2 3 files

LLVM/project c8674f6 llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 combine-fma-concat.ll

[X86] combineConcatVectorOps - IsConcatFree - detect splats that comes from a common load/broadcastload (#174986)

This allows us to handle freely concatenable cases after a broadcast load has
become shared by uses of different vector widths, by peeking through
bitcast/extract_subvector nodes.
Delta File
+25 -16 llvm/test/CodeGen/X86/combine-fma-concat.ll
+7 -1 llvm/lib/Target/X86/X86ISelLowering.cpp
+32 -17 2 files

LLVM/project da560b6 llvm/lib/Target/RISCV/MCTargetDesc RISCVInstPrinter.cpp RISCVInstPrinter.h, llvm/test/MC/RISCV hex-imm-macho.s

[RISC-V][Mach-O] Print immediate operands in hexadecimal format. (#174505)

This is done for logical operations and auipc/lui.

Patch based on code written by Tim Northover.
Delta File
+17 -1 llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp
+16 -0 llvm/test/MC/RISCV/hex-imm-macho.s
+2 -0 llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.h
+35 -1 3 files
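
The formatting choice described in the RISC-V commit above (hexadecimal instead of decimal immediates) can be sketched roughly as follows. This is only an illustration: `formatImm` and its `useHex` flag are hypothetical names and are not taken from `RISCVInstPrinter`, and the sign handling for negative values is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not the RISCVInstPrinter API): format an immediate
// either as decimal or as 0x-prefixed hexadecimal.
std::string formatImm(int64_t imm, bool useHex) {
  char buf[32];
  if (!useHex) {
    std::snprintf(buf, sizeof(buf), "%lld", static_cast<long long>(imm));
  } else if (imm < 0) {
    // Negate in unsigned arithmetic so INT64_MIN does not overflow.
    unsigned long long mag = 0ULL - static_cast<unsigned long long>(imm);
    std::snprintf(buf, sizeof(buf), "-0x%llx", mag);
  } else {
    std::snprintf(buf, sizeof(buf), "0x%llx",
                  static_cast<unsigned long long>(imm));
  }
  return std::string(buf);
}
```

With this sketch, `formatImm(255, true)` yields `0xff`, while `formatImm(255, false)` yields `255`.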

LLVM/project f5c39a6 llvm/lib/Target/SPIRV SPIRVInstructionSelector.cpp, llvm/test/CodeGen/SPIRV/transcoding ConvertPtrInGlobalInit.ll

[SPIRV] Additional fixes for const init via `UtoPtr` (#172584)

#166494 added support for using `inttoptr` in global initialisation, and
lowering it into `OpSpecConstantOp OpConvertUToPtr`. Unfortunately, it
missed a slightly more subtle case and exposed an existing issue around the
`COPY` pseudo-op. This patch ensures that we look through a `COPY` when
figuring out whether an `OpConvertUToPtr` is actually operating on a
global. We also correctly handle the case where a `G_PTR_ADD` is used by
an `OpSpecConstantOp` in the context of global initialisation, which
would otherwise lead to broken SPIR-V wherein the latter would reference
a non-constant Op.

---------

Co-authored-by: Marcos Maronas <marcos.maronas at intel.com>
Delta File
+4 -3 llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
+2 -1 llvm/test/CodeGen/SPIRV/transcoding/ConvertPtrInGlobalInit.ll
+6 -4 2 files

LLVM/project 5ca6327 llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll ran-out-of-sgprs-allocation-failure.mir

[InlineSpiller][AMDGPU] Implement subreg reload during RA spill

Currently, when a virtual register is partially used, the
entire tuple is restored from the spilled location, even if
only a subset of its sub-registers is needed. This patch
introduces support for partial reloads by analyzing actual
register usage and restoring only the required sub-registers.
This improvement enhances register allocation efficiency,
particularly for cases involving tuple virtual registers.
For AMDGPU, this change brings considerable improvements
in workloads that involve matrix operations, large vectors,
and complex control flows.
Delta File
+3,429 -4,107 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+81 -102 llvm/test/CodeGen/AMDGPU/ran-out-of-sgprs-allocation-failure.mir
+35 -56 llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
+91 -0 llvm/test/CodeGen/AMDGPU/skip-partial-reload-for-16bit-regaccess.mir
+40 -40 llvm/test/CodeGen/AMDGPU/ra-inserted-scalar-instructions.mir
+26 -52 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+3,702 -4,357 67 files not shown
+4,016 -4,571 73 files
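
The partial-reload idea from the InlineSpiller commit above can be pictured with a toy model: given which sub-registers of a spilled tuple are actually used, restore only those from their stack slots. The sketch below is an assumption-laden illustration, not the InlineSpiller implementation; `SubRegReload`, `planPartialReload`, the bitmask encoding and the 4-byte sub-register size are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one sub-register reload: which sub-register
// of the tuple, and where it lives relative to the start of the spill slot.
struct SubRegReload {
  unsigned index;
  unsigned byteOffset;
};

// Toy planner (not LLVM code): bit i of usedMask says sub-register i is read
// by the instruction that needs the reload. Only those lanes are restored.
std::vector<SubRegReload> planPartialReload(uint32_t usedMask,
                                            unsigned numSubRegs,
                                            unsigned subRegBytes = 4) {
  std::vector<SubRegReload> plan;
  for (unsigned i = 0; i < numSubRegs && i < 32; ++i) {
    if (usedMask & (1u << i))
      plan.push_back({i, i * subRegBytes});
  }
  return plan;
}
```

For a four-element tuple where only sub-registers 1 and 2 are used (`usedMask == 0x6`), the plan contains two reloads at byte offsets 4 and 8 instead of restoring all four lanes.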

LLVM/project 5bae7a3 llvm/test/CodeGen/AMDGPU regpressure-mitigation-with-subreg-reload.mir

[AMDGPU] Test precommit for subreg reload

This test currently fails due to insufficient
registers during allocation. Once the subreg
reload is implemented, it will begin to pass
as the partial reload helps mitigate register
pressure.
Delta File
+37 -0 llvm/test/CodeGen/AMDGPU/regpressure-mitigation-with-subreg-reload.mir
+37 -0 1 files

LLVM/project 64d13e3 llvm/lib/Target/AMDGPU SIRegisterInfo.cpp SIRegisterInfo.h

[AMDGPU] Put back ProperlyAlighedRC helper functions

Put back the functions that were recently deleted
because they were found to be unused. They are needed for
implementing subreg reload during RA.
Delta File
+22 -0 llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+5 -0 llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+27 -0 2 files

LLVM/project b86e3c5 llvm/include/llvm/CodeGen LiveRangeEdit.h, llvm/lib/CodeGen LiveRangeEdit.cpp

[CodeGen] Enhance createFrom for sub-reg aware cloning

Instead of just cloning the virtual register, this
function now creates a new virtual register derived
from a subregister class of the original value.
Delta File
+9 -1 llvm/lib/CodeGen/LiveRangeEdit.cpp
+5 -2 llvm/include/llvm/CodeGen/LiveRangeEdit.h
+14 -3 2 files

LLVM/project 81b204d llvm/include/llvm/CodeGen TargetRegisterInfo.h, llvm/lib/CodeGen TargetRegisterInfo.cpp

[AMDGPU] Make AMDGPURewriteAGPRCopyMFMA aware of subreg reload

The AMDGPURewriteAGPRCopyMFMA pass is currently not subreg-aware.
In particular, the logic that optimizes spills into COPY
instructions assumes full register reloads. This becomes
problematic when the reload instruction partially restores
a tuple register. This patch introduces the necessary changes
to make this pass subreg-aware, for a future patch that
implements subreg reload during RA.
Delta File
+41 -1 llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
+10 -0 llvm/lib/CodeGen/TargetRegisterInfo.cpp
+3 -0 llvm/include/llvm/CodeGen/TargetRegisterInfo.h
+54 -1 3 files

LLVM/project da3d4c3 llvm/test/CodeGen/AMDGPU ran-out-of-sgprs-allocation-failure.mir ra-inserted-scalar-instructions.mir

[AMDGPU] Introduce Offset field in SGPR spill Pseudos

Currently, SGPR spill pseudo-instructions lack
an offset field to represent non-zero stack offsets.
This patch introduces an additional offset field to
SGPR spill pseudo-instructions and updates all
relevant passes that handle spill lowering to support
this new field. This field is essential for a future
patch that implements subreg reload of tuple registers
from their stack location during RA.
Delta File
+26 -26 llvm/test/CodeGen/AMDGPU/ran-out-of-sgprs-allocation-failure.mir
+22 -22 llvm/test/CodeGen/AMDGPU/ra-inserted-scalar-instructions.mir
+16 -16 llvm/test/CodeGen/AMDGPU/remat-sop.mir
+14 -14 llvm/test/CodeGen/AMDGPU/remat-smrd.mir
+9 -9 llvm/test/CodeGen/AMDGPU/sgpr-spill.mir
+8 -8 llvm/test/CodeGen/AMDGPU/sgpr-spill-wrong-stack-id.mir
+95 -95 35 files not shown
+164 -162 41 files

LLVM/project 089bb8e llvm/lib/Target/AMDGPU SIRegisterInfo.cpp SIRegisterInfo.h

[AMDGPU] Make getNumSubRegsForSpillOp externally available (NFC).
Delta File
+3 -3 llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+2 -0 llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+5 -3 2 files

LLVM/project 9fb45c5 llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 extractelements-subnodes-same-index.ll insert-subvector.ll

[SLP]Do not generate extractelement subnodes with the same indeces

The compiler should not generate subvectors with the same extractelement
instructions; doing so may cause a crash and lead to inefficient
vectorization.

Fixes #174773
Delta File
+113 -0 llvm/test/Transforms/SLPVectorizer/X86/extractelements-subnodes-same-index.ll
+3 -1 llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+1 -3 llvm/test/Transforms/SLPVectorizer/X86/insert-subvector.ll
+117 -4 3 files

LLVM/project 218b3a5 clang/lib/Analysis/LifetimeSafety LifetimeAnnotations.cpp, clang/test/Sema warn-lifetime-analysis-nocfg.cpp

only-for-owners
Delta File
+3 -1 clang/test/Sema/warn-lifetime-analysis-nocfg.cpp
+2 -1 clang/lib/Analysis/LifetimeSafety/LifetimeAnnotations.cpp
+5 -2 2 files

LLVM/project 79fd11c clang/lib/Headers avx512vlbwintrin.h

[Headers][X86] __builtin_ia32_pmovwb128_mask is not constexpr (#174985)

Appears to be a copy+paste typo - most of the x86 masked truncation intrinsics still can't be made constexpr at this time.

Fixes #166814
Delta File
+2 -2 clang/lib/Headers/avx512vlbwintrin.h
+2 -2 1 files

LLVM/project 9973e38 llvm/include/llvm/CodeGen SDPatternMatch.h, llvm/unittests/CodeGen SelectionDAGPatternMatchTest.cpp

[SDPatternMatch] Add m_FAbs matcher (#174975)

Adds a pattern matcher for floating-point absolute value (ISD::FABS),
following the same pattern as m_Abs for integer absolute value.

Fixes #174751
Delta File
+6 -0 llvm/unittests/CodeGen/SelectionDAGPatternMatchTest.cpp
+4 -0 llvm/include/llvm/CodeGen/SDPatternMatch.h
+10 -0 2 files
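
To illustrate what "following the same pattern as m_Abs" might look like in spirit, here is a self-contained toy model of a unary-opcode matcher. Nothing below is LLVM's API: `Node`, `Opcode`, `BindValue`, `UnaryMatch`, `matchAbs` and `matchFAbs` are hypothetical stand-ins, chosen only to show how an integer and a floating-point matcher can share one template and differ in the opcode they test.

```cpp
#include <cassert>

// Toy stand-ins for DAG concepts; these are not LLVM types.
enum class Opcode { Abs, FAbs, Leaf };

struct Node {
  Opcode opcode;
  const Node *operand; // nullptr for leaves
};

// Captures the matched operand into a caller-provided pointer.
struct BindValue {
  const Node *&slot;
  bool match(const Node *n) const {
    slot = n;
    return n != nullptr;
  }
};

// Matches a node with the given opcode whose operand matches `sub`.
template <typename Sub> struct UnaryMatch {
  Opcode opcode;
  Sub sub;
  bool match(const Node *n) const {
    return n && n->opcode == opcode && sub.match(n->operand);
  }
};

// The two helpers differ only in the opcode they check.
template <typename Sub> UnaryMatch<Sub> matchAbs(Sub sub) {
  return {Opcode::Abs, sub};
}
template <typename Sub> UnaryMatch<Sub> matchFAbs(Sub sub) {
  return {Opcode::FAbs, sub};
}
```

In this toy, `matchFAbs(BindValue{x}).match(&n)` succeeds only when `n` is a floating-point absolute-value node, and on success `x` points at its operand.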

LLVM/project 012097d compiler-rt/lib/builtins/aarch64 sme-abi.S

[compiler-rt][AArch64] Exit early from __arm_za_disable. (#174942)

Because `__arm_za_disable` is a private-ZA function, it's only ever
entered with ZA state `off` or `dormant`. If the state is `off` then we
can safely return and there is no need to call `__arm_tpidr2_save` or to
explicitly set PSTATE.ZA or TPIDR2_EL0 to zero.
Delta File
+7 -0 compiler-rt/lib/builtins/aarch64/sme-abi.S
+7 -0 1 files
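
The control flow described in the `__arm_za_disable` commit above can be modelled in a few lines. The real routine is AArch64 assembly in `sme-abi.S`; the sketch below is only a toy model in which `ToyCpu`, `ZAState` and `toyZaDisable` are hypothetical names, and a counter stands in for the call to `__arm_tpidr2_save`.

```cpp
#include <cassert>

// The two ZA states a private-ZA function can be entered with,
// per the commit message.
enum class ZAState { Off, Dormant };

// Toy machine state (hypothetical, for illustration only).
struct ToyCpu {
  ZAState za;
  int tpidr2Saves; // counts calls to the stand-in for __arm_tpidr2_save
};

void toyZaDisable(ToyCpu &cpu) {
  // Early exit: with ZA already off there is nothing to save or clear.
  if (cpu.za == ZAState::Off)
    return;
  ++cpu.tpidr2Saves;     // stand-in for calling __arm_tpidr2_save
  cpu.za = ZAState::Off; // stand-in for zeroing PSTATE.ZA and TPIDR2_EL0
}
```

In this model, a call with the state `Off` leaves the counter untouched, while a call with `Dormant` performs one save and ends in the `Off` state.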

LLVM/project 21dc73f libcxx/include any

[libc++][NFC] Update <any> to a more modern code style (#174619)

This patch refactors `enable_if`s inside `<any>` to use the `..., int> =
0` variant that we try to use throughout the code base and inlines some
of the functions into the class body to avoid duplicating the
`enable_if`s.
Delta File
+44 -62 libcxx/include/any
+44 -62 1 files
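
For readers unfamiliar with the `..., int> = 0` form of `enable_if` mentioned in the libc++ commit above, here is a minimal sketch using the standard `std::enable_if` rather than libc++'s internal helpers. The function names `classify` and `classifyOld` are made up for illustration and do not appear in `<any>`.

```cpp
#include <cassert>
#include <type_traits>

// Older style: the constraint is a defaulted *type* template parameter.
// Two overloads that differ only in this default would be redeclarations
// of the same template, so this form does not support overloading.
template <class T,
          class = typename std::enable_if<std::is_integral<T>::value>::type>
int classifyOld(T) {
  return 1;
}

// `..., int> = 0` style: the constraint is the *type of a non-type*
// template parameter, so overloads with different constraints are
// distinct templates and can coexist.
template <class T,
          typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
int classify(T) {
  return 1;
}

template <class T, typename std::enable_if<std::is_floating_point<T>::value,
                                           int>::type = 0>
int classify(T) {
  return 2;
}
```

Here `classify(3)` selects the integral overload and `classify(2.5)` the floating-point one; the same pair could not be written with the defaulted-type form.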

LLVM/project a4f1798 clang/lib/Parse ParseDeclCXX.cpp, clang/test/Parser cxx2c-trivially-relocatable.cpp

[Clang] expunge `trivially_relocate_if_eligible` (#174344)

In Kona, WG21 decided to revert trivial relocation (P2786).

Keep the notion of relocatability
(used in the wild and likely to come back),
but remove the keyword, which is no longer conforming.
Delta File
+0 -148 clang/test/SemaCXX/ptrauth-type-traits.cpp
+0 -123 clang/test/SemaCXX/cxx2c-trivially-relocatable.cpp
+5 -46 clang/lib/Parse/ParseDeclCXX.cpp
+0 -43 clang/test/SemaCXX/trivially-relocatable-ptrauth.cpp
+0 -31 clang/test/Parser/cxx2c-trivially-relocatable.cpp
+0 -24 clang/test/SemaCXX/ptrauth-triviality.cpp
+5 -415 10 files not shown
+13 -479 16 files

LLVM/project 5c324b5 llvm/lib/Target/AMDGPU AMDGPUPromoteAlloca.cpp, llvm/test/CodeGen/AMDGPU promote-alloca-use-after-erase.ll

use `Value *` instead of useless `WeakVH`
Delta File
+1 -1 llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+2 -0 llvm/test/CodeGen/AMDGPU/promote-alloca-use-after-erase.ll
+3 -1 2 files

LLVM/project cc1bb84 mlir/lib/Target/LLVMIR/Dialect/OpenMP OpenMPToLLVMIRTranslation.cpp

[mlir][OpenMP] Fix sanitizer error in buildTaskLikeBodyGenCallback  (#174983)

This is a fix for the asan bot after
https://github.com/llvm/llvm-project/pull/174386

Failing bot: https://lab.llvm.org/buildbot/#/builders/24/builds/16371

This commit undoes a simplification I thought reduced copied+pasted
code. I will merge it like this now to unblock the bot, and then work
separately on a different way to share code between both callbacks.
Delta File
+172 -101 mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+172 -101 1 files

LLVM/project 1677b3e llvm/lib/Target/AMDGPU AMDGPUPromoteAlloca.cpp, llvm/test/CodeGen/AMDGPU promote-alloca-use-after-erase.ll

fix comments
Delta File
+5 -5 llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+2 -2 llvm/test/CodeGen/AMDGPU/promote-alloca-use-after-erase.ll
+7 -7 2 files

LLVM/project 5b1a032 llvm/lib/Target/AMDGPU AMDGPUPromoteAlloca.cpp, llvm/test/CodeGen/AMDGPU promote-alloca-use-after-erase.ll

[AMDGPU] Fix a potential use-after-erase in `AMDGPUPromoteAlloca` pass

In some cases, the placeholder itself can be used as the value for its corresponding block in `SSAUpdater`, and later used as an incoming value in another block in `GetValueInMiddleOfBlock`. If we erase it too early, this can lead to a use-after-erase. The tricky part is that it may not trigger any error right away, but can cause weird and completely unrelated issues later in the pipeline.
Delta File
+34 -0 llvm/test/CodeGen/AMDGPU/promote-alloca-use-after-erase.ll
+11 -2 llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+45 -2 2 files

LLVM/project db26ce5 llvm/test/CodeGen/PowerPC vector-lrint.ll vector-llrint.ll

[PowerPC] Change `half` to use soft promotion rather than `PromoteFloat` (#152632)

On PowerPC targets, `half` uses the default legalization of promoting to
an `f32`. However, this has some fundamental issues related to the inability
to round-trip. Resolve this by switching to the soft legalization, which
passes `f16` as an `i16`.

The PowerPC ABI Specification does not define a `_Float16` type, so the
calling convention changes are acceptable.

Fixes the PowerPC part of
https://github.com/llvm/llvm-project/issues/97975
Fixes the PowerPC part of
https://github.com/llvm/llvm-project/issues/97981
Delta File
+957 -1,766 llvm/test/CodeGen/PowerPC/vector-lrint.ll
+957 -1,766 llvm/test/CodeGen/PowerPC/vector-llrint.ll
+275 -590 llvm/test/CodeGen/PowerPC/half.ll
+71 -80 llvm/test/CodeGen/PowerPC/llvm.frexp.ll
+30 -75 llvm/test/CodeGen/PowerPC/pr48519.ll
+49 -34 llvm/test/CodeGen/PowerPC/llvm.modf.ll
+2,339 -4,311 8 files not shown
+2,364 -4,397 14 files

LLVM/project 5f590ed llvm/lib/Target/SystemZ SystemZAsmPrinter.cpp

[SystemZ][z/OS] Improve use of formatv (#174503)

Using a `raw_svector_ostream` object is not necessary, because this is
hidden in the conversion function. In addition, there is no need to
reason about zero termination of the string. Declaring the ASCII and
EBCDIC versions of the string variables at the same time makes sure that
both strings are allocated with the same size.
Delta File
+9 -15 llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp
+9 -15 1 files

LLVM/project fa53d92 flang/lib/Semantics expression.cpp, flang/test/Semantics bug127425.f90

[flang] Check for errors when analyzing array constructors (#173092)

Errors in array constructor values result in the array having
fewer elements than it should, which can cause other errors that
will confuse the user. Avoid this by not returning an expression
on errors.

Fixes #127425
Delta File
+10 -0 flang/test/Semantics/bug127425.f90
+4 -0 flang/lib/Semantics/expression.cpp
+14 -0 2 files

LLVM/project f084590 llvm/include/llvm/IR IntrinsicsAMDGPU.td, llvm/lib/Target/AMDGPU AMDGPUInstructionSelector.cpp SOPInstructions.td

[AMDGPU] Add intrinsic exposing s_alloc_vgpr

Make it possible to use `s_alloc_vgpr` at the IR level. This is a huge
footgun and use for anything other than compiler internal purposes is
heavily discouraged. The calling code must make sure that it does not
allocate fewer VGPRs than necessary - the intrinsic is NOT a request to
the backend to limit the number of VGPRs it uses (in essence it's not so
different from what we do with the dynamic VGPR flags of the
`amdgcn.cs.chain` intrinsic, it just makes it possible to use this
functionality in other scenarios).
Delta File
+63 -0 llvm/test/CodeGen/AMDGPU/intrinsic-amdgcn-s-alloc-vgpr.ll
+16 -0 llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+11 -0 llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+9 -0 llvm/test/Analysis/UniformityAnalysis/AMDGPU/always_uniform.ll
+4 -2 llvm/lib/Target/AMDGPU/SOPInstructions.td
+4 -0 llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+107 -2 1 files not shown
+108 -2 7 files

LLVM/project d7ebd9c llvm/include/llvm/IR IntrinsicsAMDGPU.td

Address review comments
Delta File
+4 -5 llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+4 -5 1 files

LLVM/project 7c316c7 llvm/lib/Target/AMDGPU AMDGPUInstructionSelector.cpp

Silence warning
Delta File
+2 -2 llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+2 -2 1 files

LLVM/project 9e78d8a utils/bazel/llvm-project-overlay/mlir BUILD.bazel

Revert "[BAZEL] Move FuncTransformsPassIncGen to CAPIIR header dep (#174982)"

This reverts commit 46d0862773ac3ac07fd1a8abe76db623b26d7d45.

This previously landed a couple commits ago and now duplicates the dep,
breaking the bazel build.
Delta File
+0 -1 utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+0 -1 1 files

LLVM/project 46d0862 utils/bazel/llvm-project-overlay/mlir BUILD.bazel

[BAZEL] Move FuncTransformsPassIncGen to CAPIIR header dep (#174982)

Delta File
+1 -0 utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+1 -0 1 files