LLVM/project 4d335cb llvm/test/tools/llvm-mca/AArch64/Neoverse V3AE-rcpc-immo-instructions.s N2-rcpc-immo-instructions.s

[AArch64] Fix scheduling info for Armv8.4-a LDAPUR* instructions (#171637)

They were using the wrong scheduler resource. They're also missing from
the optimisation guides, but WriteLD should at least be closer.
DeltaFile
+19-19 llvm/test/tools/llvm-mca/AArch64/Neoverse/V3AE-rcpc-immo-instructions.s
+19-19 llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-rcpc-immo-instructions.s
+19-19 llvm/test/tools/llvm-mca/AArch64/Neoverse/N3-rcpc-immo-instructions.s
+19-19 llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-rcpc-immo-instructions.s
+19-19 llvm/test/tools/llvm-mca/AArch64/Neoverse/V2-rcpc-immo-instructions.s
+19-19 llvm/test/tools/llvm-mca/AArch64/Neoverse/V3-rcpc-immo-instructions.s
+114-114 1 file not shown
+115-115 7 files

LLVM/project c9c432f mlir/lib/Target/LLVMIR/Dialect/OpenMP OpenMPToLLVMIRTranslation.cpp

review comments, Tom
DeltaFile
+1-4 mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+1-4 1 file

LLVM/project 5fbb19e mlir/lib/Target/LLVMIR/Dialect/OpenMP OpenMPToLLVMIRTranslation.cpp, mlir/test/Target/LLVMIR openmp-private-allloca-hoisting.mlir

[OpenMP][MLIR] Hoist static `alloca`s emitted by private `init` regions to the allocation IP of the construct

Having more than one descriptor (allocatable or array) on the same `private` clause currently triggers a runtime crash on GPUs.

For SPMD kernels, the issue happens because the initialization logic includes:
* Allocating a number of temporary structs (these are emitted by flang when `fir` is lowered to `mlir.llvm`).
* A conditional branch that determines whether we allocate storage for the private descriptor and initialize its array bounds from the original descriptor, or initialize the private descriptor to null.

Because of these two things, the temporary allocations needed for descriptors beyond the first are preceded by branching, which causes the observed runtime crash.

This PR solves the issue by hoisting these static `alloca` instructions to the suitable alloca insertion point (IP) of the parent construct.
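
As a rough illustration of the hoisting idea, here is a minimal LLVM-API sketch, assuming a hypothetical `AllocaIP` instruction that marks the construct's allocation point (this is not the patch's actual code):

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Move constant-sized allocas out of a conditionally executed init block
// to a dominating allocation point, so no temporary allocation is
// preceded by control flow.
static void hoistStaticAllocas(BasicBlock &InitBlock, Instruction *AllocaIP) {
  for (Instruction &I : make_early_inc_range(InitBlock)) {
    auto *AI = dyn_cast<AllocaInst>(&I);
    // Only allocas with a constant size are safe to hoist; dynamically
    // sized allocas must stay next to the computation of their size.
    if (AI && isa<ConstantInt>(AI->getArraySize()))
      AI->moveBefore(AllocaIP);
  }
}
```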
DeltaFile
+79-0 mlir/test/Target/LLVMIR/openmp-private-allloca-hoisting.mlir
+67-8 mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+146-8 2 files

LLVM/project 6573f62 llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 atomic-fp.ll

[X86] LowerATOMIC_STORE - on 32-bit targets see if i64 values were originally legal f64 values that we can store directly. (#171602)

Based on feedback from #171478
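
For context, a user-level pattern that hits this path (my own reproducer sketch, not from the PR): on a 32-bit x86 target an atomic `double` store becomes an i64 atomic store in IR, and the patch lets the backend see that the i64 payload was originally an f64 it can store directly.

```cpp
#include <atomic>

// Built for a 32-bit x86 target, the seq_cst store below is legalized as
// an i64 atomic store; when the i64 value is just a bitcast f64, it can
// now be stored with a single 8-byte FP store instead of round-tripping
// through integer registers.
void store_double(std::atomic<double> &A, double V) {
  A.store(V, std::memory_order_seq_cst);
}
```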
DeltaFile
+207-467 llvm/test/CodeGen/X86/atomic-fp.ll
+8-1 llvm/lib/Target/X86/X86ISelLowering.cpp
+215-468 2 files

LLVM/project 59b13d6 libcxx/include/__algorithm find_end.h

[libc++] Add `__find_end` optimizations back (#171374)

This essentially reverts #100685 and fixes the bidirectional and
random-access specializations so that they are actually used.

```
Benchmark                                                                old             new    Difference    % Difference
------------------------------------------------------------  --------------  --------------  ------------  --------------
rng::find_end(deque<int>)_(match_near_end)/1000                       366.91           47.63       -319.28         -87.02%
rng::find_end(deque<int>)_(match_near_end)/1024                      3273.31           35.42      -3237.89         -98.92%
rng::find_end(deque<int>)_(match_near_end)/8192                    171608.41          285.04    -171323.38         -99.83%
rng::find_end(deque<int>)_(near_matches)/1000                       31808.40        19214.35     -12594.05         -39.59%
rng::find_end(deque<int>)_(near_matches)/1024                       37428.72        20773.87     -16654.85         -44.50%
rng::find_end(deque<int>)_(near_matches)/8192                     1719468.34      1213967.45    -505500.89         -29.40%
rng::find_end(deque<int>)_(process_all)/1000                          275.81          336.29         60.49          21.93%
rng::find_end(deque<int>)_(process_all)/1024                          258.88          320.36         61.47          23.74%
rng::find_end(deque<int>)_(process_all)/1048576                    277117.41       327640.37      50522.96          18.23%
rng::find_end(deque<int>)_(process_all)/8192                         2166.36         2533.52        367.16          16.95%
rng::find_end(deque<int>)_(same_length)/1000                         1280.06          362.53       -917.53         -71.68%

```

    [246 lines not shown]
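
For reference, a small usage example (mine, not from the patch); the bidirectional specialization searches from the back, which is why the `match_near_end` cases above improve so dramatically:

```cpp
#include <algorithm>
#include <deque>
#include <iostream>
#include <iterator>

int main() {
  std::deque<int> haystack{1, 2, 3, 1, 2, 3, 1, 2};
  std::deque<int> needle{1, 2, 3};
  // find_end returns the *last* occurrence of the needle. Scanning from
  // the back means a match near the end is found almost immediately.
  auto sub = std::ranges::find_end(haystack, needle);
  std::cout << std::distance(haystack.begin(), sub.begin()) << '\n'; // 3
}
```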
DeltaFile
+105-0 libcxx/include/__algorithm/find_end.h
+105-0 1 file

LLVM/project 4a92060 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Fix build
DeltaFile
+24-12 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+24-12 1 file

LLVM/project b67f8f0 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU lds-dma-waits.ll

(reland) [AMDGPU][SIInsertWaitCnts] Use RegUnits-based tracking (#162077)

Fixed a crash in Blender due to some weird control flow.
The issue was in the "merge" function, which was only looking at the
keys of the "Other" VMem/SGPR maps; it needs to look at the keys of both
maps and merge them.

Original commit message below
----

The pass was already "reinventing" the concept just to deal with 16 bit
registers. Clean up the entire tracking logic to only use register
units.

There are no test changes because functionality didn't change, except:
- We can now track more LDS DMA IDs if we need it (up to `1 << 16`)
- The debug prints also changed a bit because we now talk in terms of
register units.


    [9 lines not shown]
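
The bug class is easy to state: iterating only over `Other`'s keys silently keeps stale state for keys present only in `*this`. A generic sketch of a key-union merge (illustrative score type, not the pass's actual data structures):

```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"
#include <algorithm>

using ScoreMap = llvm::DenseMap<unsigned, unsigned>;

// Merge Other into This, visiting the UNION of both key sets. Looking
// only at Other's keys (the original bug) misses entries that exist
// solely in This.
static void mergeMaps(ScoreMap &This, const ScoreMap &Other) {
  llvm::SmallVector<unsigned, 16> Keys;
  for (const auto &KV : This)
    Keys.push_back(KV.first);
  for (const auto &KV : Other)
    if (!This.contains(KV.first))
      Keys.push_back(KV.first);
  for (unsigned K : Keys)
    // lookup() returns a value-initialized score for a missing key, so
    // keys present on only one side still merge conservatively.
    This[K] = std::max(This.lookup(K), Other.lookup(K));
}
```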
DeltaFile
+311-282 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+4-4 llvm/test/CodeGen/AMDGPU/lds-dma-waits.ll
+315-286 2 files

LLVM/project db06ebb llvm/lib/Target/AArch64 AArch64PerfectShuffle.h AArch64ISelLowering.cpp

[AArch64][NFC] Add isTRNMask improvements to isZIPMask (#171532)

Some [ideas for
improvement](https://github.com/llvm/llvm-project/pull/169858#pullrequestreview-3525357470)
came up during review of recent changes to `isTRNMask`.
This PR also applies them to `isZIPMask`, which is implemented almost
identically.
DeltaFile
+18-17 llvm/lib/Target/AArch64/AArch64PerfectShuffle.h
+12-12 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+30-29 2 files

LLVM/project 2c1decb libcxx/include __split_buffer deque, libcxx/include/__vector vector.h

[libc++] Don't instantiate __split_buffer with an allocator reference (#171651)

Allocators should be extremely cheap, if not free, to copy. Furthermore,
we have requirements on allocator types that copies must compare equal,
and that move and copy must be the same.

Hence, taking an allocator by reference should not provide benefits
beyond making a copy of it. However, taking the allocator by reference
leads to complexity in __split_buffer, which can be removed if we stop
using that pattern.
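
Schematically, the change looks like this (simplified, not libc++'s actual declarations): the buffer holds the allocator by value rather than being instantiated with `Alloc&`.

```cpp
#include <memory>

// Before (schematic): __split_buffer<T, Alloc&> kept a reference,
// forcing callers to keep the referenced allocator alive and the class
// to cope with reference semantics.
//
// After (schematic): a by-value copy is fine because the allocator
// requirements guarantee copies compare equal and copy/move behave the
// same.
template <class T, class Alloc = std::allocator<T>>
struct split_buffer {
  Alloc alloc_; // held by value
  explicit split_buffer(const Alloc &a) : alloc_(a) {}
};
```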
DeltaFile
+20-25 libcxx/include/__split_buffer
+17-17 libcxx/include/__vector/vector.h
+8-8 libcxx/include/deque
+45-50 3 files

LLVM/project 15df9e7 llvm/lib/Target/AMDGPU SIInstructions.td SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU insert_vector_dynelt.ll extract_vector_dynelt.ll

[AMDGPU][SDAG] Add missing cases for SI_INDIRECT_SRC/DST (#170323)

Before this patch, `insertelement`/`extractelement` with dynamic indices
would fail to select at `-O0` for vectors of 32-bit elements with 3, 5,
6, or 7 elements, sizes which did not map to a `SI_INDIRECT_SRC/DST`
pattern.

Other "weird" sizes bigger than 8 (like 13) are already handled
properly.

To solve this issue, we add the missing patterns for the problematic
sizes.

Solves SWDEV-568862
DeltaFile
+5,963-0 llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll
+3,310-0 llvm/test/CodeGen/AMDGPU/extract_vector_dynelt.ll
+16-0 llvm/lib/Target/AMDGPU/SIInstructions.td
+8-0 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+9,297-0 4 files

LLVM/project f57abf5 llvm/lib/Target/SPIRV SPIRVRegularizer.cpp SPIRVBuiltins.cpp, llvm/test/CodeGen/SPIRV/transcoding OpExtInst_vector_promotion.ll OpExtInst_vector_promotion_bug.ll

[SPIRV] Promote scalar arguments to vector for `OpExtInst` in `generateExtInst` instead of `SPIRVRegularizer` (#170155)

This patch consists of two parts:
* A first part that removes the scalar-to-vector promotion for built-ins
in the `SPIRVRegularizer`;
* and a second part that implements the promotion for built-ins from
scalar to vector in `generateExtInst`.

The implementation in `SPIRVRegularizer` had several issues:
* It rolled its own built-in pattern matching that was extremely
permissive:
  * the compiler would crash if the built-in had a definition;
  * the compiler would crash if the built-in had no arguments.
* The compiler would crash if there were more than two function
definitions in the module.
* It would be better if this were implemented as a module pass, where we
iterate over the users of the function instead of scanning the whole
module for callers.


    [13 lines not shown]
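
Conceptually, the promotion is a splat: when a built-in such as `max` mixes vector and scalar operands, the scalar is widened to the vector's element count. A hedged IRBuilder sketch (my illustration, not the actual `generateExtInst` code):

```cpp
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Widen a scalar argument so it matches a vector argument, e.g.
// max(<4 x float> %v, float %s) -> max(%v, splat %s).
static Value *promoteScalarArg(IRBuilder<> &B, Value *Arg, Type *VecTy) {
  auto *VT = dyn_cast<FixedVectorType>(VecTy);
  if (!VT || Arg->getType()->isVectorTy())
    return Arg; // nothing to promote
  return B.CreateVectorSplat(VT->getNumElements(), Arg);
}
```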
DeltaFile
+179-0 llvm/test/CodeGen/SPIRV/transcoding/OpExtInst_vector_promotion.ll
+3-99 llvm/lib/Target/SPIRV/SPIRVRegularizer.cpp
+60-2 llvm/lib/Target/SPIRV/SPIRVBuiltins.cpp
+21-0 llvm/test/CodeGen/SPIRV/transcoding/OpExtInst_vector_promotion_bug.ll
+0-16 llvm/test/CodeGen/SPIRV/transcoding/OpMin.ll
+263-117 5 files

LLVM/project 794551d llvm/lib/Target/RISCV RISCVInstrInfoP.td RISCVISelLowering.cpp, llvm/test/CodeGen/RISCV rvp-ext-rv32.ll rvp-ext-rv64.ll

[RISCV][llvm] Support PSRA, PSRAI, PSRL, PSRLI codegen for P extension (#171460)

DeltaFile
+294-0 llvm/test/CodeGen/RISCV/rvp-ext-rv32.ll
+152-0 llvm/test/CodeGen/RISCV/rvp-ext-rv64.ll
+47-6 llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+14-15 llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+507-21 4 files

LLVM/project 6ad0c7c llvm/lib/Target/RISCV RISCVInstrInfoZvfbf.td RISCVInstrInfoVVLPatterns.td

[NFC][RISCV] Unify all zvfbfa vl patterns and sd node patterns (#171072)

This patch tries to move all vl patterns and sd node patterns to
RISCVInstrInfoVVLPatterns.td and RISCVInstrInfoVSDPatterns.td
respectively. It removes the redefinition of pattern classes for zvfbfa
and makes them easier to maintain and change.

Note: this does not include intrinsic patterns; if we want to also unify
intrinsic patterns, we need to also move the zvfbfa pseudo instruction
definitions to RISCVInstrInfoVPseudos.td.
DeltaFile
+0-223 llvm/lib/Target/RISCV/RISCVInstrInfoZvfbf.td
+146-32 llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td
+74-32 llvm/lib/Target/RISCV/RISCVInstrInfoVSDPatterns.td
+21-23 llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
+3-0 llvm/lib/Target/RISCV/RISCVInstrInfoV.td
+1-0 llvm/lib/Target/RISCV/RISCVFeatures.td
+245-310 6 files

LLVM/project b7c0452 clang/lib/CodeGen CodeGenModule.cpp, llvm/test/CodeGen/PowerPC aix-cc-abi.ll aix-cc-abi-mir.ll

[PowerPC][AIX] Specify correct ABI alignment for double (#144673)

Add `f64:32:64` to the data layout for AIX, to indicate that doubles
have a 32-bit ABI alignment and 64-bit preferred alignment.

Clang was already taking this into account, but it was not reflected in
LLVM's data layout.

A notable effect of this change is that `double` loads/stores with 4
byte alignment are no longer considered "unaligned" and avoid the
corresponding unaligned access legalization. I assume that this is
correct/desired for AIX. (The codegen previously already relied on this
in some places related to the call ABI simply by dint of assuming
certain stack locations were 8 byte aligned, even though they were only
actually 4 byte aligned.)

Fixes https://github.com/llvm/llvm-project/issues/133599.
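
To make the `f64:32:64` spec concrete, here is a small standalone check against the DataLayout API (my sketch; only the double entry of the AIX layout string is shown):

```cpp
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Type.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

int main() {
  LLVMContext Ctx;
  // f64:32:64 -- ABI alignment 32 bits, preferred alignment 64 bits.
  DataLayout DL("f64:32:64");
  Type *F64 = Type::getDoubleTy(Ctx);
  outs() << "ABI align:  " << DL.getABITypeAlign(F64).value() << "\n"; // 4
  outs() << "pref align: " << DL.getPrefTypeAlign(F64).value() << "\n"; // 8
}
```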
DeltaFile
+81-93 llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
+70-82 llvm/test/CodeGen/PowerPC/aix-cc-abi-mir.ll
+24-37 llvm/test/CodeGen/PowerPC/aix32-cc-abi-vaarg-mir.ll
+22-35 llvm/test/CodeGen/PowerPC/aix32-cc-abi-vaarg.ll
+5-9 clang/lib/CodeGen/CodeGenModule.cpp
+11-1 llvm/unittests/Bitcode/DataLayoutUpgradeTest.cpp
+213-257 6 files not shown
+235-266 12 files

LLVM/project c43c604 mlir/python CMakeLists.txt

[mlir][Python] create MLIRPythonSupport
DeltaFile
+52-13 mlir/python/CMakeLists.txt
+52-13 1 file

LLVM/project aa31efc libclc/opencl/lib/clspv/shared vstore_half.cl

[libclc] use clc functions in clspv/shared/vstore_half.cl (#171770)

DeltaFile
+18-12 libclc/opencl/lib/clspv/shared/vstore_half.cl
+18-12 1 file

LLVM/project 2ce17ba llvm/lib/Analysis CmpInstAnalysis.cpp, llvm/lib/Transforms/InstCombine InstCombineAndOrXor.cpp InstCombineCompares.cpp

[InstCombine][CmpInstAnalysis] Use consistent spelling and function names. NFC. (#171645)

Both `decomposeBitTestICmp` and `decomposeBitTest` have a parameter
called `lookThroughTrunc`. This was spelled in full (i.e. `lookThroughTrunc`)
in the header. However, in the implementation, it's written as `lookThruTrunc`.

I opted to convert all instances of `lookThruTrunc` to
`lookThroughTrunc` to reduce surprise while reading the code and for
consistency.

---

The other change in this PR is the renaming of the wrapper around
`decomposeBitTest()`. Even though it was a wrapper around
`CmpInstAnalysis.h`'s `decomposeBitTest`, the function was called
`decomposeBitTestICmp`. This is quite confusing because such a function
_also_ exists in `CmpInstAnalysis.h`, but it is _not_ the one actually
being used in `InstCombineAndOrXor.cpp`.
DeltaFile
+6-6 llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+4-4 llvm/lib/Analysis/CmpInstAnalysis.cpp
+1-1 llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+11-11 3 files

LLVM/project 39a723e mlir/lib/Dialect/Linalg/Transforms Specialize.cpp, mlir/lib/Dialect/Linalg/Utils Utils.cpp

[Linalg] Add *Conv2D* matchers (#168362)

-- This commit is the fourth in the series of adding matchers
for linalg.*conv*/*pool*. Refer:
https://github.com/llvm/llvm-project/pull/163724
-- In this commit all variants of Conv2D convolution ops have been
   added.
-- It also refactors the way these matchers work to make adding more
matchers concise.

Signed-off-by: Abhishek Varma <abhvarma at amd.com>

---------

Signed-off-by: Abhishek Varma <abhvarma at amd.com>
Signed-off-by: hanhanW <hanhan0912 at gmail.com>
Co-authored-by: hanhanW <hanhan0912 at gmail.com>
DeltaFile
+558-11 mlir/lib/Dialect/Linalg/Utils/Utils.cpp
+218-4 mlir/test/Dialect/Linalg/convolution/roundtrip-convolution.mlir
+15-0 mlir/lib/Dialect/Linalg/Transforms/Specialize.cpp
+791-15 3 files

LLVM/project 6a25e45 llvm/lib/Analysis ConstantFolding.cpp, llvm/test/Transforms/InstSimplify ptrtoaddr.ll

[ConstantFolding] Support ptrtoaddr in ConstantFoldCompareInstOperands (#162653)

This folds `icmp (ptrtoaddr x), (ptrtoaddr y)` to `icmp x, y`, matching
the existing ptrtoint fold. Restrict both folds to only the case where
the result type matches the address type.

I think that all folds this can do in practice end up actually being
valid for ptrtoint to a type larger than the address size as well, but I
don't really see a way to justify this generically without making
assumptions about what kind of folding the recursive calls may do.

This is based on the icmp semantics specified in
https://github.com/llvm/llvm-project/pull/163936.
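
Schematically, the fold's guard looks like this (a hedged sketch, not the actual ConstantFolding code; I approximate the address width with the DataLayout's index width):

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// icmp pred (ptrtoaddr X), (ptrtoaddr Y) -> icmp pred X, Y, but only
// when the integer result type is exactly as wide as the address, so
// the cast neither truncates nor extends the compared values.
static bool canFoldPtrToAddrCmp(const ConstantExpr *L, const ConstantExpr *R,
                                const DataLayout &DL) {
  if (L->getOpcode() != Instruction::PtrToAddr ||
      R->getOpcode() != Instruction::PtrToAddr)
    return false;
  Type *PtrTy = L->getOperand(0)->getType();
  return DL.getIndexTypeSizeInBits(PtrTy) ==
         L->getType()->getIntegerBitWidth();
}
```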
DeltaFile
+82-0 llvm/test/Transforms/InstSimplify/ptrtoaddr.ll
+12-10 llvm/lib/Analysis/ConstantFolding.cpp
+94-10 2 files

LLVM/project c9648d7 llvm/lib/IR Verifier.cpp, llvm/test/Assembler ptrtoaddr-invalid-constexpr.ll

[Verifier] Make sure all constexprs in instructions are visited (#171643)

Previously this only happened for constants of some types, which missed
invalid `ptrtoaddr` constant expressions.
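
The general shape of the fix is a worklist walk over every `Constant` reachable from an instruction's operands, so constant expressions of all types get visited (a minimal sketch with a hypothetical `Visit` callback, not the Verifier's actual code):

```cpp
#include "llvm/ADT/STLFunctionalExtras.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

static void visitAllConstantExprs(Instruction &I,
                                  function_ref<void(ConstantExpr &)> Visit) {
  SmallVector<Constant *, 8> Worklist;
  SmallPtrSet<Constant *, 8> Seen;
  for (Value *Op : I.operands())
    if (auto *C = dyn_cast<Constant>(Op))
      if (Seen.insert(C).second)
        Worklist.push_back(C);
  while (!Worklist.empty()) {
    Constant *C = Worklist.pop_back_val();
    if (auto *CE = dyn_cast<ConstantExpr>(C))
      Visit(*CE); // e.g. check ptrtoaddr source/result types here
    for (Value *Op : C->operands())
      if (auto *OpC = dyn_cast<Constant>(Op))
        if (Seen.insert(OpC).second)
          Worklist.push_back(OpC);
  }
}
```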
DeltaFile
+17-3 llvm/test/Assembler/ptrtoaddr-invalid-constexpr.ll
+5-8 llvm/lib/IR/Verifier.cpp
+22-11 2 files

LLVM/project 4882029 llvm/lib/Analysis ValueTracking.cpp, llvm/test/Transforms/InstCombine mul.ll

[ValueTracking] Enhance overflow computation for unsigned mul (#171568)

Changed the range computation in computeOverflowForUnsignedMul to use
computeConstantRange as well.

This expands the set of patterns where InstCombine manages to narrow a
mul whose values come from zext. For example, if a value comes from a
div operation, known bits alone don't give the narrowest possible range
for that value.
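
The underlying check is plain range arithmetic: if the operands' ranges multiply without wrapping, the `mul` can never overflow. A sketch with `ConstantRange` (illustrative, not the exact ValueTracking change):

```cpp
#include "llvm/IR/ConstantRange.h"
using namespace llvm;

// A range such as [0, 86) for "(zext i8 %a to i32) / 3" is tighter than
// anything known bits can express, which is what computeConstantRange
// adds over the previous known-bits-only computation.
static bool mulNeverOverflowsUnsigned(const ConstantRange &X,
                                      const ConstantRange &Y) {
  return X.unsignedMulMayOverflow(Y) ==
         ConstantRange::OverflowResult::NeverOverflows;
}
```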

---------

Co-authored-by: Adar Dagan <adar.dagan at mobileye.com>
DeltaFile
+28-0 llvm/test/Transforms/InstCombine/mul.ll
+5-5 llvm/lib/Analysis/ValueTracking.cpp
+33-5 2 files

LLVM/project cb4b6ad clang/include/clang/CIR/Dialect/IR CIROps.td, clang/lib/CIR/Dialect/IR CIRDialect.cpp

[CIR] Add the ability to detect if SwitchOp covers all the cases (#171246)

DeltaFile
+35-13 clang/test/CIR/CodeGen/switch.cpp
+0-38 clang/lib/CIR/Dialect/IR/CIRDialect.cpp
+31-1 clang/test/CIR/IR/switch.cir
+13-4 clang/include/clang/CIR/Dialect/IR/CIROps.td
+5-5 clang/test/CIR/Transforms/switch-fold.cir
+3-3 clang/test/CIR/CodeGen/atomic.c
+87-64 2 files not shown
+90-65 8 files

LLVM/project 45e7dab mlir/lib/Target/LLVMIR/Dialect/OpenMP OpenMPToLLVMIRTranslation.cpp

Mark mlir->llvmir translation for num_threads with dims as NYI
DeltaFile
+14-1 mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+14-1 1 file

LLVM/project 3d4264e mlir/include/mlir/Dialect/OpenMP OpenMPClauses.td, mlir/lib/Conversion/SCFToOpenMP SCFToOpenMP.cpp

[OpenMP][MLIR] Add num_threads clause with dims modifier support
DeltaFile
+72-7 mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+47-3 mlir/include/mlir/Dialect/OpenMP/OpenMPClauses.td
+32-1 mlir/test/Dialect/OpenMP/invalid.mlir
+10-5 mlir/test/Dialect/OpenMP/ops.mlir
+2-0 mlir/lib/Conversion/SCFToOpenMP/SCFToOpenMP.cpp
+163-16 5 files

LLVM/project 4f9d5a8 llvm/lib/Target/RISCV RISCVLoadStoreOptimizer.cpp, llvm/test/CodeGen/RISCV xqcilsm-lwmi-swmi.mir

[RISCV] Generate Xqcilsm LWMI/SWMI load/store multiple instructions (#171079)

This patch adds support for generating the Xqcilsm load/store multiple
instructions as a part of the RISCVLoadStoreOptimizer pass. For now we
only combine two load/store instructions into a load/store multiple.
Support for converting more loads/stores will be added in follow-up
patches. These instructions only apply to 32-bit loads/stores with an
alignment of 4 bytes.
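
As a rough sketch of the pairing condition (a hypothetical helper, not the pass's actual code; the operand layout below, value/base/immediate, is my assumption): two word stores are SWMI candidates when they share a base register and touch contiguous 4-byte-aligned offsets.

```cpp
#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// Assumed operand layout: op0 = value, op1 = base register, op2 = imm
// offset (as for RISC-V SW). Two 32-bit stores can fold into one SWMI
// when the base matches and the offsets are adjacent aligned words.
static bool isContiguousWordPair(const MachineInstr &First,
                                 const MachineInstr &Second) {
  if (First.getOperand(1).getReg() != Second.getOperand(1).getReg())
    return false;
  int64_t Off1 = First.getOperand(2).getImm();
  int64_t Off2 = Second.getOperand(2).getImm();
  return Off1 % 4 == 0 && Off2 == Off1 + 4;
}
```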
DeltaFile
+315-0 llvm/test/CodeGen/RISCV/xqcilsm-lwmi-swmi.mir
+112-10 llvm/lib/Target/RISCV/RISCVLoadStoreOptimizer.cpp
+427-10 2 files

LLVM/project 426cedc llvm/lib/Target/LoongArch LoongArchInstrInfo.td LoongArchInstrFormats.td, llvm/lib/Target/LoongArch/Disassembler LoongArchDisassembler.cpp

[LoongArch] Add support for the ud macro instruction (#171583)

This patch adds support for the `ud ui5` macro instruction. The `ui5`
operand must be in the range `0-31`. The macro expands to:

`amswap.w $rd, $r1, $rj`

where `ui5` specifies the register number used for `$rd` in the expanded
instruction, and `$rd` is the same as `$rj`.

Relevant binutils patch:

https://sourceware.org/pipermail/binutils/2025-December/146042.html
DeltaFile
+23-0 llvm/lib/Target/LoongArch/Disassembler/LoongArchDisassembler.cpp
+10-7 llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
+13-1 llvm/test/MC/LoongArch/Basic/Integer/misc.s
+13-0 llvm/lib/Target/LoongArch/LoongArchInstrFormats.td
+1-1 llvm/test/CodeGen/LoongArch/trap.ll
+60-9 5 files

LLVM/project 71bfdd1 clang/lib/CodeGen CGStmt.cpp, clang/test/CodeGen defer-ts.c defer-ts-nested-cleanups.c

[Clang] Add support for the C `_Defer` TS (#162848)

This implements WG14 N3734 (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3734.pdf),
aka `_Defer`; it is currently only supported in C if `-fdefer-ts` is passed.
DeltaFile
+652-0 clang/test/CodeGen/defer-ts.c
+179-0 clang/test/CodeGen/defer-ts-nested-cleanups.c
+172-0 clang/test/Sema/defer-ts.c
+85-0 clang/lib/CodeGen/CGStmt.cpp
+58-0 clang/test/Parser/defer-ts.c
+52-0 clang/test/Sema/defer-ts-sjlj.c
+1,198-0 40 files not shown
+1,667-8 46 files

LLVM/project 3b04094 llvm/lib/Target/RISCV RISCVISelLowering.cpp RISCVISelDAGToDAG.cpp

[RISCV] Add Xsfmm vlte and vste intrinsics to getTgtMemIntrinsics. (#171747)

Replace dyn_cast with cast. The dyn_cast can never fail now. Previously
it never succeeded.
DeltaFile
+54-0 llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+2-2 llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+56-2 2 files

LLVM/project e795b8b llvm/lib/Target/RISCV RISCVInstrInfoA.td

[RISCV] Use GPR instead of ixlenimm for sextshamt in PseudoMaskedAMOMinMax. NFC (#171736)

This operand is always a register.
DeltaFile
+2-2 llvm/lib/Target/RISCV/RISCVInstrInfoA.td
+2-2 1 file

LLVM/project a19badb clang-tools-extra/docs/clang-tidy/checks/abseil unchecked-statusor-access.rst

doc

Created using spr 1.3.7
DeltaFile
+3-2 clang-tools-extra/docs/clang-tidy/checks/abseil/unchecked-statusor-access.rst
+3-2 1 file