[AArch64] Add basic NPM support for LoadStoreOptimizer. (#184090)
This adds what I can tell are the basics needed for NPM (new pass manager)
support in LLVM, and ports the AArch64LoadStoreOpt pass to the NPM.
[mlir] Install '.pdll' files along with the header files (#183855)
The CMake install configuration was not installing
`include/mlir/Transforms/DialectConversion.pdll`, which is required
by the installed PDLL compiler tools for interacting with the dialect
conversion infrastructure.
[Thumb2] Use BXAUT instruction if available (#183056)
Generate a `bxaut r12, lr, sp` instruction rather than the
`aut r12, lr, sp` / `bx lr` sequence.
The `bxaut` instruction is available for Thumb2 code targeting the
armv8.1m-main architecture when PACBTI is enabled.
This change introduces a new pseudo instruction ARM::t2BXAUT_RET which
is similar to the existing pseudo instruction ARM::tBX_RET.
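The selection condition described above can be sketched as a small predicate. This is a toy under stated assumptions: the `Subtarget` struct, its three flags, and `authenticatedReturn` are hypothetical, it assumes return-address authentication is already requested for the function, and the ARM backend itself models this with the `ARM::t2BXAUT_RET` pseudo rather than string lists.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical subtarget description; the ARM backend uses its own
// subtarget queries instead of these plain flags.
struct Subtarget {
  bool IsThumb2;
  bool HasV8_1MMainline;
  bool HasPACBTI;
};

// Returns the return sequence for a function whose return address is
// authenticated, choosing the fused form when the target supports it.
std::vector<std::string> authenticatedReturn(const Subtarget &ST) {
  if (ST.IsThumb2 && ST.HasV8_1MMainline && ST.HasPACBTI)
    return {"bxaut r12, lr, sp"};      // authenticate and return in one step
  return {"aut r12, lr, sp", "bx lr"}; // authenticate, then return
}
```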
---------
Co-authored-by: Copilot <175728472+Copilot at users.noreply.github.com>
[mlir][shard, mpi] Adding Shard/MPI reduce_scatter and simplification (#184189)
- introduces a simplify pass, which finds such patterns and replaces them
with the equivalent `reduce-scatter`
- promotes the test-pass `test-shard-optimizations` to a proper pass and adds
folding of allgather+allslice into reduce_scatter
- sanitizes the `shard.reduce_scatter` op
- adds a new `mpi.reduce_scatter_block` op
- lowers `shard.reduce_scatter` to MPI
- lowers `mpi.reduce_scatter_block` to LLVM
---------
Co-authored-by: Copilot <175728472+Copilot at users.noreply.github.com>
[CIR] Fix unreachable block generation in EH flattening (#184268)
The previous EH CFG flattening implementation would sometimes create
dispatch handlers in unreachable blocks. This seemed OK until I started
implementing the code to lower the flattened CIR to an ABI-specific form,
and the handlers in those unreachable blocks weren't getting updated.
This change fixes the flattening code to avoid generating unreachable
blocks.
[ELF] Add TargetInfo::initTargetSpecificSections hook (#184292)
This hook lets us move target-specific synthetic section creation from
createSyntheticSections into per-target initTargetSpecificSections
overrides. This reduces target-specific code in the shared
SyntheticSections.cpp. The subsequent commits (split from
https://github.com/llvm/llvm-project/pull/184057) will move these
target-specific classes to Arch/ files.
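The hook pattern can be sketched as a virtual method with a no-op default that shared code calls after creating the generic sections. This is a simplified illustration, not lld's actual code: `Ctx`, `ExampleTarget`, the string-based section list, and the placeholder section names are all assumptions, and only the names `TargetInfo`, `createSyntheticSections`, and `initTargetSpecificSections` come from the commit message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the linker context that owns synthetic sections.
struct Ctx {
  std::vector<std::string> SyntheticSections;
};

// Simplified TargetInfo: targets override the hook to add their own sections.
struct TargetInfo {
  virtual ~TargetInfo() = default;
  // Default: the target needs no extra synthetic sections.
  virtual void initTargetSpecificSections(Ctx &) {}
};

// Hypothetical target that needs one extra synthetic section.
struct ExampleTarget : TargetInfo {
  void initTargetSpecificSections(Ctx &C) override {
    C.SyntheticSections.push_back(".example.target");
  }
};

// Shared code creates the generic sections (placeholder names here), then
// delegates to the target hook instead of switching on the machine type.
void createSyntheticSections(Ctx &C, TargetInfo &Target) {
  C.SyntheticSections.push_back(".got");
  C.SyntheticSections.push_back(".plt");
  Target.initTargetSpecificSections(C);
}
```

With this shape, adding a target-specific section touches only that target's override, which is the kind of decoupling the follow-up commits rely on when moving classes into the `Arch/` files.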
[CI][SPIRV][NFC] Remove unnecessary mkdir from workflow (#184353)
The `CMake` command does the `mkdir` automatically.
Pointed out in https://github.com/llvm/llvm-project/pull/184174
Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
[libc] Various GPU allocator tweaks and optimizations (#184368)
Summary:
Some low-hanging fruit tweaks. Mostly preventing redundant loads and
unnecessary widening. Some fixes as well, like nullptr handling,
incorrect rounding, and oversized bitfields.
[Clang] Generate ptr and float atomics without integer casts (#183853)
Summary:
LLVM IR should support these for all cases except for compare-exchange.
Currently the code goes through an integer indirection for these cases.
This PR changes the behavior to emit atomics directly on the target
memory type.
Reapply "[SPIRV][NFCI] Use unordered data structures for SPIR-V extensions" (#184162)
Reapply https://github.com/llvm/llvm-project/pull/183567 with minor
changes.
The problem causing the revert was that we couldn't use the enum in `DenseMap`
directly because of some `TableGen` limitations, so I made the map use the
underlying type via casts, but that caused some UB. I have since
[fixed](https://github.com/llvm/llvm-project/pull/183769) the `TableGen`
limitation, so the enum now works directly.
Fix `assignValueToReg` function's argument (#184354)
Because of [PR#178198](https://github.com/llvm/llvm-project/pull/178198)
the arguments of `assignValueToReg` changed.
This PR updates the experimental M68k target to match.
[Clang] Fix clang crash for fopenmp statement(for) inside lambda function (#146772)
C++ range-for statements introduce implicit variables such as `__range`,
`__begin`, and `__end`. When such a loop appears inside an OpenMP
loop-based directive (e.g. `#pragma omp for`) within a lambda, these
implicit variables were not emitted before OpenMP privatization logic
ran.
OMPLoopScope assumes that loop-related variables are already present in
LocalDeclMap and temporarily overrides their addresses. Since the
range-for implicit variables had not yet been emitted, they were treated
as newly introduced entries and later erased during restore(), leading
to missing mappings and a crash during codegen.
Fix this by emitting the range-for implicit variables before OpenMP
privatization (setVarAddr/apply), ensuring that existing mappings are
correctly overridden and restored.
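The failure mode can be modeled with a toy save/override/restore scope over a declaration map. This is a sketch under stated assumptions: `DeclMap` (names mapped to integer "addresses") and the `OverrideScope` class are hypothetical stand-ins, not Clang's `LocalDeclMap` or `OMPLoopScope`. It shows why an entry that was absent before the override disappears on restore, while an entry emitted beforehand gets its original value back.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy declaration map: variable name -> "address" (an arbitrary integer).
using DeclMap = std::map<std::string, int>;

// Hypothetical scope that temporarily overrides addresses and later restores
// them: pre-existing entries get their old value back, while entries that did
// not exist before the override are erased.
class OverrideScope {
  DeclMap &Map;
  std::vector<std::pair<std::string, int>> Saved; // entries present before
  std::vector<std::string> Introduced;            // entries absent before

public:
  explicit OverrideScope(DeclMap &M) : Map(M) {}

  void setVarAddr(const std::string &Name, int Addr) {
    auto It = Map.find(Name);
    if (It != Map.end())
      Saved.emplace_back(Name, It->second);
    else
      Introduced.push_back(Name);
    Map[Name] = Addr;
  }

  void restore() {
    for (const auto &[Name, Addr] : Saved)
      Map[Name] = Addr;
    for (const std::string &Name : Introduced)
      Map.erase(Name); // the mapping is lost even if codegen still needs it
  }
};
```

Emitting `__range` into the map before calling `setVarAddr` turns it into a "pre-existing" entry, so `restore()` puts the original mapping back instead of erasing it.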
This fixes #146335
[AMDGPU] Generate more swaps (#184164)
Generate more swaps from:
```
mov T, X
...
mov X, Y
...
mov Y, T
```
by being more careful about what use/defs of X, Y, T are allowed in
intervening code and allowing flexibility where the swap is inserted.
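The basic rewrite can be illustrated on a toy instruction list. This sketch is a simplification, not the AMDGPU implementation. Registers are strings, and the `Inst` struct and `formSwap` helper are hypothetical. It also applies a deliberately conservative legality rule: no intervening instruction may read or write X or Y, or redefine T. The patch itself allows more use/def patterns and more flexibility in where the swap goes. Here the swap replaces `mov Y, T`, and `mov T, X` is kept because T may still be live.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy instruction: "mov dst, src", "swap dst, src", or an opaque "other"
// instruction described only by the registers it defines and uses.
struct Inst {
  std::string Op;                   // "mov", "swap" or "other"
  std::string Dst, Src;             // operands of mov/swap
  std::set<std::string> Defs, Uses; // registers of an "other" instruction
};

static bool defines(const Inst &I, const std::string &R) {
  if (I.Op == "other")
    return I.Defs.count(R) != 0;
  if (I.Op == "swap")
    return I.Dst == R || I.Src == R;
  return I.Dst == R;
}

static bool touches(const Inst &I, const std::string &R) {
  if (I.Op == "other")
    return I.Defs.count(R) != 0 || I.Uses.count(R) != 0;
  return I.Dst == R || I.Src == R;
}

// First index after From whose instruction satisfies P, or Code.size().
template <typename Pred>
static std::size_t findNext(const std::vector<Inst> &Code, std::size_t From,
                            Pred P) {
  for (std::size_t N = From + 1; N < Code.size(); ++N)
    if (P(Code[N]))
      return N;
  return Code.size();
}

// Rewrites "mov T, X ... mov X, Y ... mov Y, T" into "mov T, X ... swap X, Y".
bool formSwap(std::vector<Inst> &Code) {
  for (std::size_t I = 0; I < Code.size(); ++I) {
    if (Code[I].Op != "mov" || Code[I].Dst == Code[I].Src)
      continue;
    const std::string T = Code[I].Dst, X = Code[I].Src;
    // The next instruction touching X or clobbering T must be "mov X, Y".
    std::size_t J = findNext(Code, I, [&](const Inst &N) {
      return touches(N, X) || defines(N, T);
    });
    if (J == Code.size() || Code[J].Op != "mov" || Code[J].Dst != X)
      continue;
    const std::string Y = Code[J].Src;
    if (Y == X || Y == T)
      continue;
    bool YTouched = false; // Y must be untouched between the first two movs
    for (std::size_t N = I + 1; N < J; ++N)
      YTouched = YTouched || touches(Code[N], Y);
    if (YTouched)
      continue;
    // The next instruction touching X or Y, or clobbering T, must be "mov Y, T".
    std::size_t K = findNext(Code, J, [&](const Inst &N) {
      return touches(N, X) || touches(N, Y) || defines(N, T);
    });
    if (K == Code.size() || Code[K].Op != "mov" || Code[K].Dst != Y ||
        Code[K].Src != T)
      continue;
    Code[K] = Inst{"swap", X, Y, {}, {}}; // exchanges X and Y
    Code.erase(Code.begin() + J);         // "mov X, Y" is now redundant
    return true;
  }
  return false;
}
```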
---------
Signed-off-by: John Lu <John.Lu at amd.com>
[SPIR-V][HIP] Disable SPV_KHR_untyped_pointers (#183530)
Support for SPV_KHR_untyped_pointers in the SPIR-V to LLVM translator is
incomplete and has a few known issues, so we should not rely on this
extension for SPIR-V generation.
[AArch64] Fix type mismatch in bitconvert + vec_extract patterns (#183549)
This patch fixes a mismatch in element width during isel of bitconvert +
vec_extract nodes. This resolves the issue reported on
[this](https://github.com/llvm/llvm-project/pull/172837) PR.
[X86] Add i256 shift / funnel shift coverage to match i512 tests (#184346)
shift-i256.ll - added x86-64/x86-64-v2/x86-64-v3/x86-64-v4 coverage and retained the x86 test coverage
[SPIRV] Don't emit service function basic block names (#184206)
Right now if a module has a service function we always emit `OpName
entry` for the service function's basic block.
The actual service function isn't emitted and no other instruction uses
the basic block `OpName` instruction, so don't emit it.
Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
[VPlan] Preserve IsSingleScalar for sunken predicated stores. (#184329)
The predicated stores may be single scalar (e.g. for VF = 1). We should
preserve IsSingleScalar. As all stores access the same address,
IsSingleScalar must match across all stores in the group.
This fixes an assertion when interleaving-only with sunken stores.
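The invariant can be sketched as follows, assuming that sinking replaces a group of predicated stores to one address with a single unpredicated store. The `StoreRecipe` struct and `sinkStoreGroup` helper are hypothetical and much simpler than VPlan's recipe classes. The point is that `IsSingleScalar` is carried over from the group instead of being dropped, and the group is checked for agreement on it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a VPlan store recipe.
struct StoreRecipe {
  std::string Address;
  bool IsSingleScalar;
  bool IsPredicated;
};

// Replaces a group of predicated stores to the same address with one
// unpredicated store, preserving IsSingleScalar instead of resetting it.
StoreRecipe sinkStoreGroup(const std::vector<StoreRecipe> &Group) {
  assert(!Group.empty() && "empty store group");
  const StoreRecipe &First = Group.front();
  for (const StoreRecipe &S : Group) {
    assert(S.IsPredicated && "only predicated stores are sunk");
    assert(S.Address == First.Address && "stores must share an address");
    // Same address, so every store in the group must agree on the flag.
    assert(S.IsSingleScalar == First.IsSingleScalar);
  }
  return StoreRecipe{First.Address, First.IsSingleScalar, false};
}
```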
Fixes https://github.com/llvm/llvm-project/issues/184317
PR: https://github.com/llvm/llvm-project/pull/184329
[CodeGen] Move rollback capabilities outside of the rematerializer
The rematerializer implements support for rolling back
rematerializations by modifying MIs that should normally be deleted in
an attempt to make them "transparent" to other analyses. This involves:
1. setting their opcode to DBG_VALUE and
2. setting their read register operands to the sentinel register.
This approach has several drawbacks.
1. It forces the rematerializer to support tracking these "dead MIs".
2. It is not actually clear whether this mechanism will interact well
with all other analyses. This is an issue since the intent of the
rematerializer is to be usable in as many contexts as possible.
3. In practice, it has shown itself to be relatively error-prone.
This commit removes rollback support from the rematerializer and moves
those capabilities to a rematerializer listener that can be instantiated
[5 lines not shown]
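The remaining details of how the listener is instantiated are not shown above, so what follows is only a toy sketch of the general idea. The rematerializer reports each rematerialization to an optional listener. A separate rollback listener records enough state to undo those rematerializations later, so the rematerializer itself never keeps "dead" instructions around. `RematListener`, `RollbackListener`, `Rematerializer`, and the string-based block are hypothetical names and types, not the real CodeGen classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical listener interface notified by the rematerializer.
struct RematListener {
  virtual ~RematListener() = default;
  virtual void onRematerialize(const std::string &Orig, std::size_t Pos,
                               const std::string &Clone) = 0;
};

// Toy rematerializer over a single block of instruction names. It performs
// rematerializations but keeps no state for undoing them.
struct Rematerializer {
  std::vector<std::string> Block;
  RematListener *Listener = nullptr;

  // Deletes Orig and places a rematerialized copy at the end of the block.
  void rematerialize(const std::string &Orig) {
    auto It = std::find(Block.begin(), Block.end(), Orig);
    assert(It != Block.end() && "unknown instruction");
    std::size_t Pos = static_cast<std::size_t>(It - Block.begin());
    Block.erase(It);
    std::string Clone = Orig + ".remat";
    Block.push_back(Clone);
    if (Listener)
      Listener->onRematerialize(Orig, Pos, Clone);
  }
};

// Optional listener that records each rematerialization so it can be undone.
struct RollbackListener : RematListener {
  struct Entry {
    std::string Orig;
    std::size_t Pos;
    std::string Clone;
  };
  std::vector<Entry> Log;

  void onRematerialize(const std::string &Orig, std::size_t Pos,
                       const std::string &Clone) override {
    Log.push_back({Orig, Pos, Clone});
  }

  // Undoes rematerializations in reverse order, restoring original positions.
  void rollback(Rematerializer &R) {
    for (auto It = Log.rbegin(); It != Log.rend(); ++It) {
      auto C = std::find(R.Block.begin(), R.Block.end(), It->Clone);
      if (C != R.Block.end())
        R.Block.erase(C);
      R.Block.insert(R.Block.begin() + It->Pos, It->Orig);
    }
    Log.clear();
  }
};
```

Undoing in reverse order matters: each entry's recorded position is only valid relative to the block state at the time of that rematerialization.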
[CodeGen] Allow rematerializer to rematerialize at the end of a block
This makes the rematerializer able to rematerialize MIs at the end of a
basic block. We achieve this by tracking the parent basic block of every
region inside the rematerializer and adding an explicit target region to
some of the class's methods. The latter removes the requirement that we
track the MI of every region (`Rematerializer::MIRegion`) after the
analysis phase; the class member is therefore deleted.
This new ability will be used shortly to improve the design of the
rollback mechanism.
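A toy sketch of why an explicit target region helps, assuming regions are half-open index ranges into their parent block. `Block`, `Region`, and `rematerializeAtRegionEnd` are hypothetical names. Because the region carries its parent block and an end position, insertion works even when no instruction follows the region, so no "insert before this MI" anchor is required. The sketch does not update other regions in the same block, which a real implementation would have to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy basic block: an ordered list of instruction names.
struct Block {
  std::vector<std::string> Insts;
};

// A region is a half-open range [Begin, End) of its parent block. End may
// equal Insts.size(), meaning the region extends to the end of the block.
struct Region {
  Block *Parent;
  std::size_t Begin, End;
};

// Inserts a rematerialized instruction at the end of the target region. The
// insertion point is a position in the parent block, so it is valid even
// when no existing instruction follows it.
void rematerializeAtRegionEnd(Region &R, const std::string &Inst) {
  assert(R.Begin <= R.End && R.End <= R.Parent->Insts.size());
  R.Parent->Insts.insert(R.Parent->Insts.begin() + R.End, Inst);
  ++R.End; // the new instruction belongs to the target region
}
```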