[GISel] import pattern `(A-(B-C)) to A+(C-B)` (#181676)
This PR imports the rewrite pattern `(A-(B-C)) to A+(C-B)` from
SelectionDAG to GlobalISel.
The rewrite should only trigger when `B-C` is used once.
[SLP]Improve reductions for copyables/split nodes
The original support for copyables led to an x264 regression on RISC-V.
This patch improves detection of copyable candidates through more
precise profitability checking, and adds an extra check that split-node
reduction is profitable.
Fixes #184313
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/185697
[OpenACC][NFC] Generalize wrapMultiBlockRegionWithSCFExecuteRegion (#187359)
Simplify `wrapMultiBlockRegionWithSCFExecuteRegion` by replacing the
`bool convertFuncReturn` parameter with a generic `getNumSuccessors() ==
0` check. Terminators with no successors are by definition region exit
points, so they can be identified automatically without requiring
callers to specify terminator types. This enables downstream dialects (e.g., CUF
with fir::FirEndOp) to reuse the utility without modifying it.
```
// Before:
wrapMultiBlockRegionWithSCFExecuteRegion(region, mapping, loc, rewriter, /*convertFuncReturn=*/true);
// After:
wrapMultiBlockRegionWithSCFExecuteRegion(region, mapping, loc, rewriter);
```
[CIR] Add support for array new with ctor init (#187418)
This adds support for array new initialization that requires calling
constructors.
This diverges a bit from the classic codegen implementation in a few
ways. First, we use the cir.array_ctor operation to represent all the
constructor calls that weren't part of an explicit initializer list.
This gets lowered to a loop during the LoweringPrepare pass. Second,
because CIR uses more explicit types, we have to insert a bitcast of the
array pointer to an explicit array type. Third, when an initializer list
is provided and we are calling constructors for the "filler" portion of
the list, we attempt to get the array size as a constant and create a
"tail array" to initialize that is sized to the number of elements
remaining.
[compiler-rt] Define GPU specific handling of profiling functions (#185763)
Summary:
The changes in https://www.github.com/llvm/llvm-project/pull/185552
allowed us to
start building the standard `libclang_rt.profile.a` for GPU targets.
This PR expands this by adding an optimized GPU routine for counter
increment and removing the special-case handling of these functions in
the OpenMP runtime.
The vast majority of these functions are boilerplate, but we should be able
to do more interesting things with this in the future, like value or
memory profiling.
[AMDGPU] Fix alias handling in module splitting functionality (#187295)
Summary:
The module splitting used for `-flto-partitions=8` support (which is
passed by default) did not correctly handle aliases. We mainly need to
do two things: keep the aliases in the partition they are used in, and
externalize them. Internal linkage needs to be handled conservatively.
This is needed because these aliases show up in PGO contexts.
---------
Co-authored-by: Shilei Tian <i at tianshilei.me>
[DAG] Use value tracking to detect or_disjoint patterns and add a add_like pattern matcher (#187478)
Extend the generic or_disjoint pattern to call haveNoCommonBitsSet. This
allows us to remove the similar x86 or_is_add pattern, use or_disjoint
directly, and merge some add/or_is_add matching patterns into an
add_like wrapper pattern instead.
[mlir][memref] Rewrite scalar `memref.copy` through reinterpret_cast into load/store (#186118)
This change adds a rewrite that simplifies `memref.copy` operations whose
destination is a scalar view produced by `memref.reinterpret_cast`.
The pattern matches cases where a reinterpret cast creates a scalar view
(`sizes = [1, ..., 1]`) into a memref that has a single non-unit dimension. In
this situation the view refers to exactly one element in the base buffer, so
the accessed address depends only on the base pointer and the offset.
The stride information of the view does not affect the accessed element,
because the only valid index into the view is `[0, ..., 0]`.
Therefore the copy can be rewritten into a direct load from the source and a
store into the base memref using the offset from the reinterpret cast.
This makes the `memref.reinterpret_cast` redundant for the copy and simplifies
the IR.
[53 lines not shown]
[NFC][AMDGPU] New test for untested case in SILowerI1Copies (#186127)
[This
line](https://github.com/ambergorzynski/llvm-project/blob/main/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp#L646)
is untested by the existing LLVM test suite (checked using code coverage
and by inserting an `abort`).
We propose a new test that exercises this case. We demonstrated that it
does by temporarily adding an abort and confirming it is the only test
that fails (the abort is removed before merging).
[mlir][EmitC] Support pointer-based memrefs in load/store lowering (#186828)
## Problem
In the MemRef → EmitC conversion, `memref.load` and `memref.store`
assume that the converted memref operand is an `emitc.array`, as defined
by the type conversion in `populateMemRefToEmitCTypeConversion`.
However, `memref.alloc` is lowered to a `malloc` call returning
`emitc.ptr`. When such values are used by `memref.load` or
`memref.store`, the conversion framework inserts a bridging
`builtin.unrealized_conversion_cast` from `emitc.ptr` to `emitc.array`.
These casts have no EmitC representation and therefore remain in the IR
after conversion, preventing valid C/C++ emission.
## Solution
Extend the `memref.load` and `memref.store` conversions to handle
[74 lines not shown]
Improvements to cost-model
The chosen costs are more precise, as the model now makes better use of
target features to determine whether something can be expanded.
The costs in sdot-i16-i32 are now more accurate, and the loops that
didn't vectorise before now result in equivalent or better codegen.
[libc++] Fix missing availability check for visionOS in apple_availability.h (#187015)
Without this, we were assuming that __ulock was unavailable on visionOS
and falling back to the manual implementation, when in reality we can
always rely on the existence of ulock.
Fixes #186467
[CycleInfo] Index using block numbers instead of pointers (#187500)
Replace the DenseMap from block pointer to cycle with a vector indexed
by block number, which makes the lookup more efficient.
[DA] Check nsw flags for addrecs in the Exact SIV test (#186387)
This patch adds a check at the beginning of the Exact SIV test to ensure
that the addrecs have nsw flags. If either lacks the flag, the analysis
bails out. This check is necessary because the subsequent steps of the
Exact SIV test assume that the addrecs don't wrap.
[lldb][NativePDB] Remove cantFail uses (1 out of ?) (#187158)
This is a follow-up to
https://github.com/swiftlang/llvm-project/pull/12317#discussion_r2850297229
Per that discussion, given that deserializers *can* fail given a corrupt
PDB, it's preferable to handle the error instead of crashing.
This specific change is limited to "easy" changes (read: I have high
confidence in their correctness). The ideal end state is funneling all
errors to a few central places in `SymbolFileNativePDB`.
[AArch64][llvm] Separate TLBI-only feature gating from TLBIP aliases
Refactor the TLBI system operand definitions so that TLBI and TLBIP
records are emitted through separate helper multiclasses, whilst keeping
the table layout readable.
The feature-scoped wrappers now apply FeatureTLB_RMI, FeatureRME, and
FeatureTLBIW only to TLBI records (these were previously also incorrectly
applied to TLBIP instructions), while TLBIP aliases remain gated only
by FeatureD128, including their nXS forms.
Update testcases accordingly.
[AArch64][llvm] Rewrite the TLBI multiclass to be much clearer (NFC)
The `tlbi` multiclass is really doing four jobs at once: base TLBI,
synthesized nXS, optional TLBIP, and synthesized TLBIP nXS. Also,
`needsreg` and `optreg` are really just a 3-state operand policy in
disguise. Likewise, the PLBI multiclass has this same issue.
Change `needsreg` and `optreg` into a combined fake enum, so it's
clearer whether the instruction takes no register operand, a required
register operand or an optional register operand.
This improves on my original change 66e8270e8.
[MLIR][XeGPU] Lowering 2-Dimensional Reductions of N-D Tensors into Chained 1-D Reductions (#186034)
This PR relaxes the 2-D reduction lowering in the peephole optimization
pass to allow the source tensor to have an n-D shape.
It also fixes a minor bug in accumulator lowering in the current
implementation.
[DebugInfo] Add Verifier check for local imports in CU's imports field (#187118)
Since https://reviews.llvm.org/D144004, DwarfDebug asserts if
function-local imported entities are present in the imports field of
DICompileUnit.
This patch adds a Verifier check to detect such invalid IR earlier.
Incorrect occurrences of imported entities in DICompileUnit's imports
field in llvm/test/Bitcode/DIImportedEntity_elements.ll and
llvm/test/Bitcode/DIModule-fortran-external-module.ll are fixed.
This change is extracted from https://reviews.llvm.org/D144008.