[MLIR][WASM] Introduce the RaiseWasmMLIRPass to convert WasmSSA MLIR to core dialects (#164562)
This follows https://github.com/llvm/llvm-project/pull/154674 and is
part of the work discussed in
https://discourse.llvm.org/t/rfc-mlir-dialect-for-webassembly/86758.
This PR introduces the RaiseWasmMLIRPass, which lowers WasmSSA MLIR
to core MLIR dialects (namely arith, math, cf, and memref).
This is the first of a series of 2 or 3 PRs introducing the lowering.
As a first step, it adds support for function calls, local and global
variables, and arithmetic operations. As explained in the RFC, most
WasmSSA operations were designed to stay close to the semantics of
other dialects, which makes the conversion straightforward.
---------
Signed-off-by: Ferdinand Lemaire <flemairen6 at gmail.com>
Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire at woven-planet.global>
Co-authored-by: Ferdinand Lemaire <flemairen6 at gmail.com>
Revert "[OpenMP][offload] Cross-team reductions with variable number of teams" (#204914)
Reverts llvm/llvm-project#195102 due to a debug info issue that was
missed in review and revealed by https://lab.llvm.org/buildbot/#/builders/67/builds/7022
[OpenMP][offload] Cross-team reductions with variable number of teams (#195102)
This is a part of a series of patches that rework OpenMP cross-team
reductions.
This patch changes the cross-team reduction runtime so that it no longer
processes a large number of teams in chunks. Instead, we allocate a
suitably sized global buffer for the team values and let all teams run
at once. The last team to finish uses a strided loop to reduce the
team values from the global buffer.
We also use `mapping::getNumberOfThreadsInBlock()` instead of
`omp_get_num_threads()` because the reduction of the team values runs
outside of the parallel region device code, which would make
`omp_get_num_threads()` always return 1. For Generic-SPMD mode, we also
want to use all available threads, which means that we need to copy the
reduction data from LDS (where it lives in that mode by default) to
scratch in codegen before calling the cross-team reduction.
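The buffer-plus-last-team scheme above can be modeled sequentially on the host. This is a minimal sketch under stated assumptions, not the DeviceRTL code: `CrossTeamReducer`, `storeTeamValue`, and `reduceAll` are hypothetical names, the "threads" of the last team run one after another, and the real runtime uses its own device-side synchronization.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model: each team stores its partial value in a
// global buffer sized for all teams; the last team to finish reduces it.
struct CrossTeamReducer {
  std::vector<long> TeamValues;      // one slot per team (the global buffer)
  std::atomic<unsigned> Finished{0}; // number of teams that stored a value

  explicit CrossTeamReducer(unsigned NumTeams) : TeamValues(NumTeams, 0) {}

  // Called once per team. Returns true only for the last team to finish,
  // which is responsible for the final reduction.
  bool storeTeamValue(unsigned TeamId, long Value) {
    TeamValues[TeamId] = Value;
    unsigned Prev = Finished.fetch_add(1, std::memory_order_acq_rel);
    return Prev + 1 == TeamValues.size();
  }

  // Strided reduction by the last team with NumThreads threads: thread T
  // accumulates slots T, T + NumThreads, T + 2 * NumThreads, ...; the
  // per-thread partials are then combined (serially here, for simplicity).
  long reduceAll(unsigned NumThreads) const {
    std::vector<long> Partial(NumThreads, 0);
    for (unsigned T = 0; T < NumThreads; ++T)
      for (std::size_t I = T; I < TeamValues.size(); I += NumThreads)
        Partial[T] += TeamValues[I];
    long Sum = 0;
    for (long P : Partial)
      Sum += P;
    return Sum;
  }
};
```

Because the loop strides by the thread count, any number of teams is covered in one pass, including fewer teams than threads (the extra threads contribute zero).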
[48 lines not shown]
[DirectX] Handle llvm.dx.resource.getbasepointer intrinsic in DXILResourceAccess pass (#204732)
The `llvm.dx.resource.getbasepointer` intrinsic is emitted for
`ConstantBuffer<T>` element access and needs to be translated to
`llvm.dx.resource.load.cbufferrow` calls in the `DXILResourceAccess`
pass. The handling is identical to that of `llvm.dx.resource.getpointer`
with an offset of 0.
Fixes #204234
[LifetimeSafety] Allow configuring lifetimebound fix-it spelling (#204045)
When suggesting `[[clang::lifetimebound]]` fix-its, allow users to
provide a project-specific macro spelling with
`-lifetime-safety-lifetimebound-macro=...`.
If no spelling is configured, use a visible macro whose replacement
tokens spell the attribute, preferring the most recently defined
matching macro, and fall back to `[[clang::lifetimebound]]` or
`__attribute((lifetimebound))` otherwise.
Closes https://github.com/llvm/llvm-project/issues/200232
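The spelling selection order described above can be sketched as follows. This is a hypothetical standalone model, not Clang's implementation: `VisibleMacro`, `chooseLifetimeboundSpelling`, and the `HasCXX11Attributes` flag are assumptions, macro replacements are compared as plain strings rather than token sequences, and the condition that picks between the two fallback spellings is assumed, since the message does not state it.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of a macro visible at the fix-it location.
struct VisibleMacro {
  std::string Name;
  std::string Replacement;  // replacement text (real code compares tokens)
  unsigned DefinitionOrder; // larger value = defined more recently
};

// Assumption: string comparison stands in for token-level matching.
static bool spellsLifetimebound(const std::string &Replacement) {
  return Replacement == "[[clang::lifetimebound]]" ||
         Replacement == "__attribute__((lifetimebound))" ||
         Replacement == "__attribute((lifetimebound))";
}

std::string chooseLifetimeboundSpelling(
    const std::optional<std::string> &ConfiguredMacro,
    const std::vector<VisibleMacro> &Macros, bool HasCXX11Attributes) {
  // 1. A spelling passed via -lifetime-safety-lifetimebound-macro=... wins.
  if (ConfiguredMacro)
    return *ConfiguredMacro;

  // 2. Otherwise prefer the most recently defined matching macro.
  const VisibleMacro *Best = nullptr;
  for (const VisibleMacro &M : Macros)
    if (spellsLifetimebound(M.Replacement) &&
        (!Best || M.DefinitionOrder > Best->DefinitionOrder))
      Best = &M;
  if (Best)
    return Best->Name;

  // 3. Fall back to spelling the attribute directly (selection condition
  //    is an assumption of this sketch).
  return HasCXX11Attributes ? "[[clang::lifetimebound]]"
                            : "__attribute((lifetimebound))";
}
```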
[BOLT][AArch64] Align tentative layout bases using per-section alignment (#204262)
Move the `AssignSections` pass before `AlignerPass` so that it can record
the maximum code alignment per output section, then align the tentative
hot/cold section bases using the recorded alignment. This makes the
tentative layout better match the layout that is actually emitted.
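The record-then-align step can be illustrated with a small model. This is a sketch under assumptions, not BOLT code: `TentativeLayout`, `recordFunctionAlignment`, and `alignBase` are hypothetical names, and alignments are assumed to be powers of two.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical model: while sections are assigned, remember the largest code
// alignment per output section; later, round each tentative base up to it.
class TentativeLayout {
  std::map<std::string, uint64_t> MaxAlign; // output section -> max alignment

public:
  void recordFunctionAlignment(const std::string &Section, uint64_t Align) {
    uint64_t &Cur = MaxAlign[Section]; // starts at 0 for a new section
    Cur = std::max<uint64_t>(Cur, Align);
  }

  uint64_t alignBase(const std::string &Section, uint64_t Base) const {
    auto It = MaxAlign.find(Section);
    uint64_t Align = It == MaxAlign.end() ? 1 : It->second;
    // Round Base up to the next multiple of Align (a power of two).
    return (Base + Align - 1) & ~(Align - 1);
  }
};
```

Recording the maximum before layout matters because the first function in the section may not be the most strictly aligned one.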
[Clang][UBSan] Use EmitCheckedLValue for C++ trivial operator= operands (#203737)
Following up on https://github.com/llvm/llvm-project/pull/190739, use
`EmitCheckedLValue` for the operands of a trivial `operator=`:
* for the LHS (`lhs->` is not handled yet), and
* for the RHS, also when the operator is invoked with function call syntax.
[Support] Add a parser for cl::opt<ElementCount> (#203969)
This adds command-line option parsing support for ElementCount.
This allows the following syntax:
```
--my-option=4 ; Maps to ElementCount::getFixed(4)
--my-option="vscale x 8" ; Maps to ElementCount::getScalable(8)
```
This is intended to unify fixed/scalable option handling in the loop
vectorizer. Currently, options such as `EpilogueVectorizationForceVF`
are defined as `cl::opt<unsigned>`, which does not allow specifying
scalable VFs.
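The accepted syntax can be modeled without LLVM to show how the two forms map to a fixed or scalable count. This is a hypothetical sketch, not the actual `cl::parser<ElementCount>`: `ElementCountValue` and `parseElementCount` are stand-ins, and exactly one space on each side of `x` is assumed, since the real parser's whitespace and error handling are not described here.

```cpp
#include <cassert>
#include <cctype>
#include <limits>
#include <optional>
#include <string>

// Hypothetical stand-in for llvm::ElementCount: a minimum element count and
// whether it is multiplied by vscale.
struct ElementCountValue {
  unsigned MinVal;
  bool Scalable;
  bool operator==(const ElementCountValue &O) const {
    return MinVal == O.MinVal && Scalable == O.Scalable;
  }
};

// Parses a non-empty unsigned decimal integer occupying all of S.
static std::optional<unsigned> parseUnsigned(const std::string &S) {
  if (S.empty())
    return std::nullopt;
  unsigned long long V = 0;
  for (char C : S) {
    if (!std::isdigit(static_cast<unsigned char>(C)))
      return std::nullopt;
    V = V * 10 + (C - '0');
    if (V > std::numeric_limits<unsigned>::max())
      return std::nullopt;
  }
  return static_cast<unsigned>(V);
}

// Accepts "N" (fixed) and "vscale x N" (scalable).
std::optional<ElementCountValue> parseElementCount(const std::string &Arg) {
  const std::string Prefix = "vscale x ";
  if (Arg.compare(0, Prefix.size(), Prefix) == 0) {
    if (auto N = parseUnsigned(Arg.substr(Prefix.size())))
      return ElementCountValue{*N, true};
    return std::nullopt;
  }
  if (auto N = parseUnsigned(Arg))
    return ElementCountValue{*N, false};
  return std::nullopt;
}
```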
Assisted-by: Codex
[AMDGPU] Use explicit carry nodes for i64 wide integer lowering (#204694)
This PR switches widened i64 add/sub lowering to use explicit UADDO/USUBO
carry nodes instead of glue-based carry chains.
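The arithmetic behind splitting an i64 add/sub into two 32-bit halves with an explicit carry value can be shown in plain C++. This is an illustration of the data flow only, not the AMDGPU SelectionDAG code: `uaddo`, `usubo`, `add64`, and `sub64` are hypothetical helpers, and the exact node kind used for the carry-consuming high half is not stated in the message.

```cpp
#include <cassert>
#include <cstdint>

// A 32-bit result plus an explicit carry (or borrow) bit, analogous in spirit
// to the value pair produced by a UADDO/USUBO node.
struct U32WithCarry {
  uint32_t Value;
  bool Carry;
};

static U32WithCarry uaddo(uint32_t A, uint32_t B, bool CarryIn = false) {
  uint64_t Wide = uint64_t(A) + B + (CarryIn ? 1 : 0);
  return {static_cast<uint32_t>(Wide), (Wide >> 32) != 0};
}

static U32WithCarry usubo(uint32_t A, uint32_t B, bool BorrowIn = false) {
  uint64_t Sub = uint64_t(B) + (BorrowIn ? 1 : 0);
  return {static_cast<uint32_t>(A - Sub), uint64_t(A) < Sub};
}

// The low half produces the carry as an ordinary value; the high half
// consumes it explicitly instead of through a glued dependency.
uint64_t add64(uint64_t X, uint64_t Y) {
  U32WithCarry Lo = uaddo(uint32_t(X), uint32_t(Y));
  U32WithCarry Hi = uaddo(uint32_t(X >> 32), uint32_t(Y >> 32), Lo.Carry);
  return (uint64_t(Hi.Value) << 32) | Lo.Value;
}

uint64_t sub64(uint64_t X, uint64_t Y) {
  U32WithCarry Lo = usubo(uint32_t(X), uint32_t(Y));
  U32WithCarry Hi = usubo(uint32_t(X >> 32), uint32_t(Y >> 32), Lo.Carry);
  return (uint64_t(Hi.Value) << 32) | Lo.Value;
}
```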
[SPIR-V] Lower undef nested in a constant aggregate (#204377)
A constant aggregate whose element is itself an aggregate `undef` was
never lowered to a placeholder. The raw aggregate operand reached
IRTranslator on the llvm.spv.const.composite call and aborted with
"unable to translate instruction".
A similar issue was found and fixed during SPV_KHR_poison_freeze
implementation. So instead of reinventing the wheel, this patch unifies
the lowering of undef with that of poison.
Addresses the following observation:
https://github.com/llvm/llvm-project/pull/198037#discussion_r3304013315
[LV] Unify header phi fixup and remove fixNonInductionPHIs (NFC). (#204886)
Unify the execute logic for VPPhi and VPWidenPHIRecipe into a shared
executePhiRecipe helper that handles both scalar and vector phis. For
header phis, only the preheader incoming value is added during execute;
the backedge is fixed up later by VPlan::execute().
This allows generalizing the VPlan::execute() fixup loop to handle all
loop headers (not just the first), removing the VPWidenPHIRecipe skip,
and eliminating fixNonInductionPHIs entirely.
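The two-phase handling of header phis (preheader value at creation, backedge value added afterwards for every loop header) can be illustrated with a toy IR. Everything here is hypothetical (`Phi`, `LoopHeader`, `createHeaderPhi`, `fixupHeaderPhis`) and only mirrors the idea; it is not VPlan's API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy IR: a phi holds (incoming value, predecessor block) pairs.
struct Phi {
  std::string Name;
  std::vector<std::pair<int, std::string>> Incoming;
};

struct LoopHeader {
  std::vector<Phi> Phis;
  std::vector<int> BackedgeValues; // known only after the body is generated
};

// Phase 1, during recipe execution: only the preheader value is added.
Phi createHeaderPhi(const std::string &Name, int PreheaderValue) {
  return Phi{Name, {{PreheaderValue, "preheader"}}};
}

// Phase 2, after all blocks exist: add the backedge value to the phis of
// every loop header, not just the first one.
void fixupHeaderPhis(std::vector<LoopHeader> &Headers) {
  for (LoopHeader &H : Headers)
    for (std::size_t I = 0; I < H.Phis.size(); ++I)
      H.Phis[I].Incoming.push_back({H.BackedgeValues[I], "latch"});
}
```

Deferring the backedge operand is what lets a single fixup loop serve all phis, because the backedge value may be produced by a recipe that has not executed yet when the phi is created.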
[Verifier] Verify AMX tile-register index operands are in range
AMX has 8 physical tile registers (TMM0-TMM7), so the tile-index operands
of the AMX intrinsics must be in [0, 8): operand 0 for the tile
load/store/zero intrinsics, operands 0-2 for the tdp* family.
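The range rule can be expressed as a small check. This is a sketch of the condition only, not the Verifier code: `isValidTileIndex` and `verifyTileOperands` are hypothetical names, and the indices are assumed to be compared as unsigned values, so a negative index also fails.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// AMX has 8 physical tile registers, TMM0-TMM7.
constexpr uint64_t NumAMXTiles = 8;

// A tile-index operand is valid iff it lies in [0, 8).
constexpr bool isValidTileIndex(uint64_t Idx) { return Idx < NumAMXTiles; }

// Pass operand 0 for load/store/zero intrinsics, operands 0-2 for tdp*.
bool verifyTileOperands(std::initializer_list<uint64_t> TileOperands) {
  for (uint64_t Idx : TileOperands)
    if (!isValidTileIndex(Idx))
      return false;
  return true;
}
```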