LLVM/project d0caa41llvm/include/llvm/Target/GlobalISel Combine.td, llvm/test/CodeGen/AArch64 neon-bitwise-instructions.ll neg-selects.ll

[GISel] import pattern `(A-(B-C)) to A+(C-B)` (#181676)

This PR imports the rewrite pattern `(A-(B-C))` to `A+(C-B)` from
SelectionDAG to GlobalISel.
The rewrite should only trigger when `B-C` has a single use.
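
As an illustration only (not the Combine.td rule itself), the identity behind the rewrite can be checked with wrapping unsigned arithmetic. The function names below are hypothetical. The single-use restriction plausibly exists because, if `B-C` had other users, it would have to stay alive next to the new `C-B`, adding an instruction instead of replacing one.

```cpp
#include <cstdint>

// Original form: A - (B - C), using wrapping (modulo 2^32) arithmetic.
// Hypothetical helper name, for illustration only.
inline std::uint32_t subOfSub(std::uint32_t a, std::uint32_t b,
                              std::uint32_t c) {
  return a - (b - c);
}

// Rewritten form: A + (C - B). Equal to the original for all inputs,
// because -(B - C) == (C - B) in modular arithmetic.
inline std::uint32_t addOfSwappedSub(std::uint32_t a, std::uint32_t b,
                                     std::uint32_t c) {
  return a + (c - b);
}
```

Both helpers agree even when the inner subtraction wraps, for example with `a = 0`, `b = 0xFFFFFFFF`, `c = 1`.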
DeltaFile
+83-0llvm/test/CodeGen/AArch64/GlobalISel/combine-integer.mir
+10-0llvm/include/llvm/Target/GlobalISel/Combine.td
+2-2llvm/test/CodeGen/AArch64/neon-bitwise-instructions.ll
+1-3llvm/test/CodeGen/AArch64/neg-selects.ll
+96-54 files

LLVM/project 9050794llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/PhaseOrdering/X86 scalarization.ll scalarization-inseltpoison.ll

[SLP]Improve reductions for copyables/split nodes

The original support for copyables causes a regression in x264 on
RISC-V. This patch improves detection of the copyable candidates by
checking profitability more precisely, and adds an extra check for
whether a split-node reduction is profitable.

Fixes #184313

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/185697
DeltaFile
+79-139llvm/test/Transforms/SLPVectorizer/RISCV/strided-loads-based-reduction.ll
+58-28llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+27-27llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll
+16-20llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll
+8-9llvm/test/Transforms/PhaseOrdering/X86/scalarization.ll
+8-9llvm/test/Transforms/PhaseOrdering/X86/scalarization-inseltpoison.ll
+196-2322 files not shown
+202-2398 files

LLVM/project 593683fmlir/include/mlir/Dialect/OpenACC OpenACCUtilsLoop.h, mlir/lib/Dialect/OpenACC/Utils OpenACCUtilsLoop.cpp OpenACCUtilsCG.cpp

[OpenACC][NFC] Generalize wrapMultiBlockRegionWithSCFExecuteRegion (#187359)

Simplify `wrapMultiBlockRegionWithSCFExecuteRegion` by replacing the
`bool convertFuncReturn` parameter with a generic `getNumSuccessors() ==
0` check. Terminators with no successors are by definition region exit
points, so they can be identified automatically without requiring
callers to specify types. This enables downstream dialects (e.g., CUF
with fir::FirEndOp) to reuse the utility without modifying it.

```
// Before:
wrapMultiBlockRegionWithSCFExecuteRegion(region, mapping, loc, rewriter, /*convertFuncReturn=*/true);

// After:
wrapMultiBlockRegionWithSCFExecuteRegion(region, mapping, loc, rewriter);
```
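
A minimal sketch of the "no successors means region exit" idea, using a mock type rather than the real MLIR `Operation` class. `MockTerminator`, `isRegionExit`, and `countRegionExits` are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a block terminator; not the MLIR Operation class.
struct MockTerminator {
  std::size_t numSuccessors; // number of successor blocks it can branch to
};

// A terminator with no successor blocks cannot branch within the region,
// so it is treated as a region exit point.
inline bool isRegionExit(const MockTerminator &t) {
  return t.numSuccessors == 0;
}

// Count the exit points among a region's terminators without knowing
// which concrete operation each terminator is.
inline std::size_t countRegionExits(const std::vector<MockTerminator> &ts) {
  std::size_t exits = 0;
  for (const MockTerminator &t : ts)
    if (isRegionExit(t))
      ++exits;
  return exits;
}
```

The point of the design is that the check depends only on a structural property, so a caller does not have to list return-like operation types up front.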
DeltaFile
+7-10mlir/include/mlir/Dialect/OpenACC/OpenACCUtilsLoop.h
+4-4mlir/unittests/Dialect/OpenACC/OpenACCUtilsLoopTest.cpp
+2-5mlir/lib/Dialect/OpenACC/Utils/OpenACCUtilsLoop.cpp
+1-1mlir/lib/Dialect/OpenACC/Utils/OpenACCUtilsCG.cpp
+14-204 files

LLVM/project e2c9ddellvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.s.ttracedata.ll

AMDGPU/GlobalISel: RegBankLegalize rules for s_ttracedata (#187342)
DeltaFile
+3-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+1-1llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.ttracedata.ll
+4-12 files

LLVM/project c9f6ad8libcxx/docs index.rst

[libc++][docs][NFC] Update Open XL supported version to 17.1.4 (#176112)

Open XL 17.1.4 based on LLVM21 was released:
https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.4?topic=whats-new

Co-authored-by: Hristo Hristov <zingam at outlook.com>
DeltaFile
+1-1libcxx/docs/index.rst
+1-11 files

LLVM/project a693970llvm/test/Transforms/LICM store-hoisting.ll

[LICM] Regenerate test checks (NFC)
DeltaFile
+320-87llvm/test/Transforms/LICM/store-hoisting.ll
+320-871 files

LLVM/project b7776ccclang/lib/CIR/CodeGen CIRGenExprCXX.cpp, clang/test/CIR/CodeGen new.cpp

[CIR] Add support for array new with ctor init (#187418)

This adds support for array new initialization that requires calling
constructors.

This diverges a bit from the classic codegen implementation in a few
ways. First, we use the cir.array_ctor operation to represent all the
constructor calls that weren't part of an explicit initializer list.
This gets lowered to a loop during the LoweringPrepare pass. Second,
because CIR uses more explicit types, we have to insert a bitcast of the
array pointer to an explicit array type. Third, when an initializer list
is provided and we are calling constructors for the "filler" portion of
the list, we attempt to get the array size as a constant and create a
"tail array" to initialize that is sized to the number of elements
remaining.
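
For reference, the source-level construct in question looks roughly like the sketch below. It only illustrates the C++ semantics (explicit initializers followed by default-constructed "filler" elements) and says nothing about the CIR that is generated. `Tracked` and `elementAfterArrayNew` are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical element type whose constructors leave a visible mark.
struct Tracked {
  int value;
  Tracked() : value(-1) {}              // runs for the "filler" elements
  explicit Tracked(int v) : value(v) {} // runs for the explicit initializers
};

// Allocates four elements with two explicit initializers; the remaining
// two elements are constructed with the default constructor.
// Assumes i < 4.
inline int elementAfterArrayNew(std::size_t i) {
  Tracked *arr = new Tracked[4]{Tracked(10), Tracked(20)};
  int result = arr[i].value;
  delete[] arr;
  return result;
}
```

Elements 0 and 1 carry the explicit values, while elements 2 and 3 form the tail that is filled by constructor calls.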
DeltaFile
+191-1clang/test/CIR/CodeGen/new.cpp
+46-2clang/lib/CIR/CodeGen/CIRGenExprCXX.cpp
+237-32 files

LLVM/project d18a784compiler-rt/lib/profile InstrProfilingPlatformGPU.c InstrProfiling.h, llvm/lib/Transforms/Instrumentation InstrProfiling.cpp

[compiler-rt] Define GPU specific handling of profiling functions (#185763)

Summary:
The changes in https://www.github.com/llvm/llvm-project/pull/185552
allowed us to start building the standard `libclang_rt.profile.a` for
GPU targets.
This PR expands this by adding an optimized GPU routine for counter
increment and removing the special-case handling of these functions in
the OpenMP runtime.

The vast majority of these functions are boilerplate, but we should be able
to do more interesting things with this in the future, like value or
memory profiling.
DeltaFile
+42-0compiler-rt/lib/profile/InstrProfilingPlatformGPU.c
+0-21openmp/device/include/Profiling.h
+0-18openmp/device/src/Profiling.cpp
+13-2llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
+12-1offload/test/lit.cfg
+10-0compiler-rt/lib/profile/InstrProfiling.h
+77-424 files not shown
+78-5010 files

LLVM/project 923cc2dllvm/lib/Target/AMDGPU AMDGPUSplitModule.cpp, llvm/test/tools/llvm-split/AMDGPU kernels-dependencies.ll

[AMDGPU] Fix alias handling in module splitting functionality (#187295)

Summary:
The module splitting used for `-flto-partitions=8` support (which is
passed by default) did not correctly handle aliases. We mainly need to
do two things: keep the aliases in the partition they are used in and
externalize them. Internalize linkage needs to be handled conservatively.

This is needed because these aliases show up in PGO contexts.

---------

Co-authored-by: Shilei Tian <i at tianshilei.me>
DeltaFile
+17-6llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp
+9-2llvm/test/tools/llvm-split/AMDGPU/kernels-dependencies.ll
+26-82 files

LLVM/project d8a83a1llvm/test/CodeGen/SPIRV/pointers global-ptrtoint.ll nested-struct-opaque-pointers.ll, llvm/test/CodeGen/SPIRV/transcoding ConvertPtrInGlobalInit.ll

[NFC][SPIR-V] Disable tests failed after spirv-val update (#187028)

Issues:
- https://github.com/llvm/llvm-project/issues/186344
- https://github.com/llvm/llvm-project/issues/186756
DeltaFile
+4-2llvm/test/CodeGen/SPIRV/pointers/global-ptrtoint.ll
+3-1llvm/test/CodeGen/SPIRV/pointers/nested-struct-opaque-pointers.ll
+3-1llvm/test/CodeGen/SPIRV/pointers/PtrCast-in-OpSpecConstantOp.ll
+3-1llvm/test/CodeGen/SPIRV/transcoding/ConvertPtrInGlobalInit.ll
+3-1llvm/test/CodeGen/SPIRV/pointers/struct-opaque-pointers.ll
+16-65 files

LLVM/project d049eefllvm/include/llvm/Target TargetSelectionDAG.td, llvm/lib/Target/X86 X86InstrCompiler.td X86InstrFragments.td

[DAG] Use value tracking to detect or_disjoint patterns and add a add_like pattern matcher (#187478)

Extend the generic or_disjoint pattern to call haveNoCommonBitsSet. This
allows us to remove the similar x86 or_is_add pattern, use or_disjoint
directly, and merge some add/or_is_add matching patterns into an
add_like wrapper pattern.
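
The arithmetic fact behind treating a disjoint `or` as an `add` can be sketched on concrete values. The real haveNoCommonBitsSet query reasons about known bits of DAG nodes rather than concrete integers, so the helpers below (hypothetical names) are only an analogy.

```cpp
#include <cstdint>

// True when the two values share no set bit, i.e. the OR is "disjoint".
inline bool noCommonBits(std::uint32_t a, std::uint32_t b) {
  return (a & b) == 0;
}

// With disjoint operands no bit position generates a carry, so OR and
// ADD produce the same result. With overlapping bits they can differ.
inline bool orBehavesLikeAdd(std::uint32_t a, std::uint32_t b) {
  return (a | b) == a + b;
}
```

For example, `0xF0` and `0x0F` are disjoint and `0xF0 | 0x0F == 0xF0 + 0x0F`, whereas `3` and `1` overlap and `3 | 1` differs from `3 + 1`.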
DeltaFile
+13-18llvm/lib/Target/X86/X86InstrCompiler.td
+16-13llvm/test/CodeGen/X86/fold-masked-merge.ll
+0-10llvm/lib/Target/X86/X86InstrFragments.td
+6-1llvm/include/llvm/Target/TargetSelectionDAG.td
+35-424 files

LLVM/project 4199bb1llvm/lib/Target/AMDGPU AMDGPULowerVGPREncoding.cpp

[AMDGPU] Simplify loop in AMDGPULowerVGPREncoding::handleCoissue. NFC. (#187511)
DeltaFile
+1-4llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
+1-41 files

LLVM/project c5c0b83mlir/include/mlir/Dialect/MemRef/Transforms Passes.td Transforms.h, mlir/lib/Dialect/MemRef/Transforms ElideReinterpretCast.cpp CMakeLists.txt

[mlir][memref] Rewrite scalar `memref.copy` through reinterpret_cast into load/store (#186118)

This change adds a rewrite that simplifies `memref.copy` operations whose
destination is a scalar view produced by `memref.reinterpret_cast`.

The pattern matches cases where a reinterpret cast creates a scalar view
(`sizes = [1, ..., 1]`) into a memref that has a single non-unit dimension. In
this situation the view refers to exactly one element in the base buffer, so
the accessed address depends only on the base pointer and the offset.

The stride information of the view does not affect the accessed element,
because the only valid index into the view is `[0, ..., 0]`.

Therefore the copy can be rewritten into a direct load from the source and a
store into the base memref using the offset from the reinterpret cast.

This makes the `memref.reinterpret_cast` redundant for the copy and simplifies
the IR.
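
A rough model of why the strides drop out, assuming a 1-D base buffer and a view whose sizes are all 1. This is plain C++ rather than MLIR, and `linearIndex`, `copyThroughScalarView`, and `storeAtOffset` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Linear position of view[idx] in the base buffer:
// offset + sum(idx[i] * strides[i]).
inline std::size_t linearIndex(std::size_t offset,
                               const std::vector<std::size_t> &strides,
                               const std::vector<std::size_t> &idx) {
  std::size_t pos = offset;
  for (std::size_t i = 0; i < strides.size(); ++i)
    pos += idx[i] * strides[i];
  return pos;
}

// Copy `value` into the single element of a scalar view (all sizes are 1),
// addressing it through the view's offset and strides.
inline std::vector<float>
copyThroughScalarView(float value, std::vector<float> base, std::size_t offset,
                      const std::vector<std::size_t> &strides) {
  // The only valid index into a scalar view is all zeros.
  const std::vector<std::size_t> zeros(strides.size(), 0);
  base[linearIndex(offset, strides, zeros)] = value;
  return base;
}

// Rewritten form: store straight into the base buffer at the offset.
inline std::vector<float> storeAtOffset(float value, std::vector<float> base,
                                        std::size_t offset) {
  base[offset] = value;
  return base;
}
```

Because every index is zero, each stride is multiplied by zero and the two functions write the same element for any choice of strides.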


    [53 lines not shown]
DeltaFile
+225-0mlir/lib/Dialect/MemRef/Transforms/ElideReinterpretCast.cpp
+222-0mlir/test/Dialect/MemRef/elide-reinterpret-cast.mlir
+10-0mlir/include/mlir/Dialect/MemRef/Transforms/Passes.td
+4-0mlir/include/mlir/Dialect/MemRef/Transforms/Transforms.h
+1-0mlir/lib/Dialect/MemRef/Transforms/CMakeLists.txt
+462-05 files

LLVM/project c63ce62llvm/test/CodeGen/AMDGPU si-lower-i1-copies.mir

[NFC][AMDGPU] New test for untested case in SILowerI1Copies (#186127)

[This line](https://github.com/ambergorzynski/llvm-project/blob/main/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp#L646)
is untested by the existing LLVM test suite (checked using code coverage
and by inserting an `abort`).

We propose a new test that exercises this case. The test is demonstrated
by adding an abort to show that it is the only test that fails (the
abort is removed before merging).
DeltaFile
+112-1llvm/test/CodeGen/AMDGPU/si-lower-i1-copies.mir
+112-11 files

LLVM/project 2754e35mlir/lib/Conversion/MemRefToEmitC MemRefToEmitC.cpp, mlir/test/Conversion/MemRefToEmitC memref-to-emitc-alloc-load-store.mlir

[mlir][EmitC] Support pointer-based memrefs in load/store lowering (#186828)

## Problem  
  
In the MemRef → EmitC conversion, `memref.load` and `memref.store`
assume that the converted memref operand is an `emitc.array`, as defined
by the type conversion in `populateMemRefToEmitCTypeConversion`.
  
However, `memref.alloc` is lowered to a `malloc` call returning
`emitc.ptr`. When such values are used by `memref.load` or
`memref.store`, the conversion framework inserts a bridging
`builtin.unrealized_conversion_cast` from `emitc.ptr` to `emitc.array`.
  
These casts have no EmitC representation and therefore remain in the IR
after conversion, preventing valid C/C++ emission.

## Solution  
  
Extend the `memref.load` and `memref.store` conversions to handle

    [74 lines not shown]
DeltaFile
+71-10mlir/lib/Conversion/MemRefToEmitC/MemRefToEmitC.cpp
+74-0mlir/test/Conversion/MemRefToEmitC/memref-to-emitc-alloc-load-store.mlir
+145-102 files

LLVM/project 201d354llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp

[AMDGPU] Clean up `LowerFP_TO_INT_SAT` in AMDGPUTargetLowering (#187486)

This addresses the rest of post-commit comments from #174726.
DeltaFile
+6-13llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+6-131 files

LLVM/project 023d42dllvm/lib/Target/AArch64 AArch64SystemOperands.td

fixup! Change tablegen as suggested
DeltaFile
+40-43llvm/lib/Target/AArch64/AArch64SystemOperands.td
+40-431 files

LLVM/project 001dd5bllvm/include/llvm/Analysis TargetTransformInfo.h, llvm/lib/Analysis TargetTransformInfo.cpp

Distinguish between extends
DeltaFile
+13-11llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+16-0llvm/lib/Analysis/TargetTransformInfo.cpp
+3-0llvm/include/llvm/Analysis/TargetTransformInfo.h
+32-113 files

LLVM/project f75db42llvm/lib/Target/AArch64 AArch64TargetTransformInfo.cpp, llvm/test/Transforms/LoopVectorize/AArch64 partial-reduce-no-dotprod.ll partial-reduce-fdot-product.ll

Improvements to cost-model

The chosen costs are more precise because the cost model now makes better use of the target features to determine whether something can be expanded.
The costs in sdot-i16-i32 are now more accurate, and loops that previously did not vectorise now yield equivalent or better codegen.
DeltaFile
+62-42llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+9-9llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-no-dotprod.ll
+8-8llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-fdot-product.ll
+3-3llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-add-sdot-i16-i32.ll
+2-2llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub-sdot.ll
+84-645 files

LLVM/project e1aef9elibcxx/src/include apple_availability.h

[libc++] Fix missing availability check for visionOS in apple_availability.h (#187015)

Without this, we were assuming that __ulock was unavailable on visionOS
and falling back to the manual implementation, when in reality we can
always rely on the existence of ulock.

Fixes #186467
DeltaFile
+4-0libcxx/src/include/apple_availability.h
+4-01 files

LLVM/project 70bb9e2llvm/include/llvm/ADT GenericCycleImpl.h GenericCycleInfo.h

[CycleInfo] Index using block numbers instead of pointers (#187500)

Replace the DenseMap from block pointer to cycle with a vector indexed
by block number, which makes the lookup more efficient.
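
A simplified sketch of the data-structure change, assuming blocks carry small dense numbers. `BlockCycleTable` is a hypothetical class that stores integer cycle ids; it does not mirror the actual GenericCycleInfo members.

```cpp
#include <cstddef>
#include <vector>

// Maps a block number to a cycle id using a flat vector instead of a
// hash map keyed by block pointer. -1 means "block is in no cycle".
class BlockCycleTable {
  std::vector<int> cycleOfBlock; // index: block number

public:
  // Record that the block with this number belongs to the given cycle.
  BlockCycleTable &set(std::size_t blockNumber, int cycleId) {
    if (blockNumber >= cycleOfBlock.size())
      cycleOfBlock.resize(blockNumber + 1, -1);
    cycleOfBlock[blockNumber] = cycleId;
    return *this;
  }

  // Lookup is a bounds check plus one indexed load, with no hashing.
  int lookup(std::size_t blockNumber) const {
    return blockNumber < cycleOfBlock.size() ? cycleOfBlock[blockNumber] : -1;
  }
};
```

The trade-off in this sketch is that the vector is sized by the largest block number seen, which is cheap when numbers are dense.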
DeltaFile
+32-7llvm/include/llvm/ADT/GenericCycleImpl.h
+6-2llvm/include/llvm/ADT/GenericCycleInfo.h
+38-92 files

LLVM/project ca917a6llvm/lib/Analysis DependenceAnalysis.cpp, llvm/test/Analysis/DependenceAnalysis rdiv-large-btc.ll

[DA] Add precondition `0 <=s UB` to function `inferAffineDomain`
DeltaFile
+23-12llvm/lib/Analysis/DependenceAnalysis.cpp
+2-2llvm/test/Analysis/DependenceAnalysis/rdiv-large-btc.ll
+25-142 files

LLVM/project 3bb83cdllvm/lib/Analysis DependenceAnalysis.cpp, llvm/test/Analysis/DependenceAnalysis strong-siv-addrec-wrap.ll exact-siv-addrec-wrap.ll

[DA] Remove calls to the GCD MIV test from `testSIV`
DeltaFile
+9-19llvm/test/Analysis/DependenceAnalysis/strong-siv-addrec-wrap.ll
+9-19llvm/test/Analysis/DependenceAnalysis/exact-siv-addrec-wrap.ll
+9-16llvm/test/Analysis/DependenceAnalysis/infer_affine_domain_ovlf.ll
+12-12llvm/test/Analysis/DependenceAnalysis/run-specific-dependence-test.ll
+4-8llvm/lib/Analysis/DependenceAnalysis.cpp
+2-2llvm/test/Analysis/DependenceAnalysis/exact-siv-overflow.ll
+45-762 files not shown
+47-788 files

LLVM/project 5ae5f9dllvm/lib/Analysis DependenceAnalysis.cpp, llvm/test/Analysis/DependenceAnalysis exact-siv-addrec-wrap.ll non-monotonic.ll

[DA] Check nsw flags for addrecs in the Exact SIV test (#186387)

This patch adds a check at the beginning of the Exact SIV test to ensure
that both addrecs have nsw flags. If either of them lacks the flag, the
analysis bails out. This check is necessary because the subsequent
steps of the Exact SIV test assume that the addrecs don't wrap.
DeltaFile
+4-0llvm/lib/Analysis/DependenceAnalysis.cpp
+1-1llvm/test/Analysis/DependenceAnalysis/exact-siv-addrec-wrap.ll
+1-1llvm/test/Analysis/DependenceAnalysis/non-monotonic.ll
+1-1llvm/test/Analysis/DependenceAnalysis/symbolic-rdiv-addrec-wrap.ll
+7-34 files

LLVM/project bc2a8eflldb/source/Plugins/SymbolFile/NativePDB SymbolFileNativePDB.cpp CompileUnitIndex.cpp

[lldb][NativePDB] Remove cantFail uses (1 out of ?) (#187158)

This is a follow-up to
https://github.com/swiftlang/llvm-project/pull/12317#discussion_r2850297229

Per that discussion, since deserializers *can* fail on a corrupt PDB,
it's preferable to handle the error instead of crashing.

This specific change is limited to "easy" changes (read: I have high
confidence in their correctness). The ideal end state is funneling all
errors to a few central places in `SymbolFileNativePDB`.
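
The general shape of the change can be sketched without LLVM's `Expected`/`cantFail` machinery: a reader that reports truncated input to its caller instead of aborting. Everything below (`ReadResult`, `readU32LE`, `recordKindOr`) is a hypothetical stand-in and not code from `SymbolFileNativePDB`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: `ok` is false when the input was too short.
struct ReadResult {
  bool ok;
  std::uint32_t value;
};

// Reads a little-endian 32-bit value, reporting failure on truncated
// (for example, corrupt) input rather than crashing.
inline ReadResult readU32LE(const std::vector<std::uint8_t> &bytes,
                            std::size_t offset) {
  if (offset > bytes.size() || bytes.size() - offset < 4)
    return ReadResult{false, 0};
  std::uint32_t v = 0;
  for (std::size_t i = 0; i < 4; ++i)
    v |= static_cast<std::uint32_t>(bytes[offset + i]) << (8 * i);
  return ReadResult{true, v};
}

// A caller that handles the error path by falling back to a default.
inline std::uint32_t recordKindOr(const std::vector<std::uint8_t> &bytes,
                                  std::uint32_t fallback) {
  const ReadResult r = readU32LE(bytes, 0);
  return r.ok ? r.value : fallback;
}
```

In this sketch a two-byte buffer yields the fallback value, where an assert-on-failure style would have terminated the process.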
DeltaFile
+250-73lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
+63-24lldb/source/Plugins/SymbolFile/NativePDB/CompileUnitIndex.cpp
+22-15lldb/source/Plugins/SymbolFile/NativePDB/DWARFLocationExpression.cpp
+3-1lldb/source/Plugins/SymbolFile/NativePDB/CompileUnitIndex.h
+2-1lldb/source/Plugins/SymbolFile/NativePDB/DWARFLocationExpression.h
+340-1145 files

LLVM/project 141a987llvm/lib/Target/AArch64 AArch64SystemOperands.td, llvm/test/MC/AArch64 armv9a-tlbip.s

[AArch64][llvm] Separate TLBI-only feature gating from TLBIP aliases

Refactor the TLBI system operand definitions so that TLBI and TLBIP
records are emitted through separate helper multiclasses, whilst keeping
the table layout readable.

The feature-scoped wrappers now apply FeatureTLB_RMI, FeatureRME, and
FeatureTLBIW only to TLBI records (previously they were also incorrectly
applied to TLBIP instructions), while TLBIP aliases remain gated only
by FeatureD128, including their nXS forms.

Update testcases accordingly.
DeltaFile
+103-75llvm/lib/Target/AArch64/AArch64SystemOperands.td
+8-9llvm/test/MC/AArch64/armv9a-tlbip.s
+111-842 files

LLVM/project 651b1c1llvm/lib/Target/AArch64 AArch64SystemOperands.td

[AArch64][llvm] Rewrite the TLBI multiclass to be much clearer (NFC)

The `tlbi` multiclass is really doing four jobs at once: base TLBI,
synthesized nXS, optional TLBIP, and synthesized TLBIP nXS. Also,
`needsreg` and `optreg` are really just a 3-state operand policy in
disguise. Likewise, the PLBI multiclass has this same issue.

Change `needsreg` and `optreg` into a combined fake enum, so it's
clearer whether the instruction takes no register operand, a required
register operand or an optional register operand.
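
A C++ analogy for the TableGen change, offered as a sketch only: two independent booleans allow four combinations, while a single three-valued policy admits exactly the three meaningful states. `RegOperandPolicy` and `acceptsOperandCount` are hypothetical names and do not appear in AArch64SystemOperands.td.

```cpp
// One policy value replaces a pair of bool flags, so a meaningless flag
// combination can no longer be written down.
enum class RegOperandPolicy { None, Required, Optional };

// Returns whether an instruction with this policy may be written with
// `numRegs` register operands.
inline bool acceptsOperandCount(RegOperandPolicy policy, unsigned numRegs) {
  switch (policy) {
  case RegOperandPolicy::None:
    return numRegs == 0;
  case RegOperandPolicy::Required:
    return numRegs == 1;
  case RegOperandPolicy::Optional:
    return numRegs <= 1;
  }
  return false; // unreachable for valid enum values
}
```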

This improves on my original change 66e8270e8.
DeltaFile
+127-121llvm/lib/Target/AArch64/AArch64SystemOperands.td
+127-1211 files

LLVM/project 989ea0emlir/include/mlir/Dialect/XeGPU/Utils XeGPUUtils.h, mlir/lib/Dialect/XeGPU/Transforms XeGPUWgToSgDistribute.cpp XeGPUPeepHoleOptimizer.cpp

[MLIR][XeGPU] Lowering 2-Dimensional Reductions of N-D Tensors into Chained 1-D Reductions (#186034)

This PR relaxes the 2d reduction lowering in the peephole optimization
pass to allow the source tensor to have an n-d shape.
It also fixes a minor bug in accumulator lowering in the current
implementation.
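
The underlying idea, that reducing over two dimensions at once matches two chained single-dimension reductions, can be sketched on a small 3-D integer tensor. This is a plain C++ model with hypothetical names, not the XeGPU lowering. It uses integer addition, where reordering is exact; floating-point sums may differ by rounding.

```cpp
#include <cstddef>
#include <vector>

using Tensor2 = std::vector<std::vector<int>>;
using Tensor3 = std::vector<Tensor2>;

// 1-D reduction over the last dimension of a 2-D tensor: [M][N] -> [M].
inline std::vector<int> reduceRows(const Tensor2 &t) {
  std::vector<int> out;
  for (const std::vector<int> &row : t) {
    int sum = 0;
    for (int v : row)
      sum += v;
    out.push_back(sum);
  }
  return out;
}

// 1-D reduction over the last dimension of a 3-D tensor: [B][M][N] -> [B][M].
inline Tensor2 reduceLastDim3(const Tensor3 &t) {
  Tensor2 out;
  for (const Tensor2 &slice : t)
    out.push_back(reduceRows(slice));
  return out;
}

// Chained form: reduce N first, then M, giving one value per batch entry.
inline std::vector<int> reduceChained(const Tensor3 &t) {
  return reduceRows(reduceLastDim3(t));
}

// Direct form: sum over both trailing dimensions in one pass.
inline std::vector<int> reduceDirect(const Tensor3 &t) {
  std::vector<int> out;
  for (const Tensor2 &slice : t) {
    int sum = 0;
    for (const std::vector<int> &row : slice)
      for (int v : row)
        sum += v;
    out.push_back(sum);
  }
  return out;
}
```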
DeltaFile
+4-84mlir/lib/Dialect/XeGPU/Transforms/XeGPUWgToSgDistribute.cpp
+77-0mlir/lib/Dialect/XeGPU/Utils/XeGPUUtils.cpp
+54-7mlir/test/Dialect/XeGPU/peephole-optimize.mlir
+15-28mlir/lib/Dialect/XeGPU/Transforms/XeGPUPeepHoleOptimizer.cpp
+8-0mlir/include/mlir/Dialect/XeGPU/Utils/XeGPUUtils.h
+158-1195 files

LLVM/project 8ca7a33llvm/test/Analysis/ScalarEvolution backedge-taken-count-guard-info-with-multiple-predecessors.ll

[SCEV] Generate test checks (NFC)
DeltaFile
+21-2llvm/test/Analysis/ScalarEvolution/backedge-taken-count-guard-info-with-multiple-predecessors.ll
+21-21 files

LLVM/project cf92512llvm/lib/IR Verifier.cpp, llvm/test/Bitcode DIModule-fortran-external-module.ll DIImportedEntity_elements.ll

[DebugInfo] Add Verifier check for local imports in CU's imports field (#187118)

Since https://reviews.llvm.org/D144004, DwarfDebug asserts if
function-local imported entities are present in the imports field of
DICompileUnit.
This patch adds a Verifier check to detect such invalid IR earlier.

This patch also fixes incorrect occurrences of imported entities in
DICompileUnit's imports field in
llvm/test/Bitcode/DIImportedEntity_elements.ll and
llvm/test/Bitcode/DIModule-fortran-external-module.ll.

This change is extracted from https://reviews.llvm.org/D144008.
DeltaFile
+14-0llvm/test/Verifier/local-import-in-cu.ll
+5-1llvm/lib/IR/Verifier.cpp
+2-3llvm/test/Bitcode/DIModule-fortran-external-module.ll
+2-2llvm/test/Bitcode/DIImportedEntity_elements.ll
+23-64 files