[GlobalISel][LLT] Introduce FPInfo for LLT (Enable bfloat, ppc128float and others in GlobalISel) (#155107)
Added extra information to LLT to disambiguate FP types of the same width
during GlobalISel. Original idea by @tgymnich.
Main differences from https://github.com/llvm/llvm-project/pull/122503
are:
* Do not deprecate LLT::scalar
* Allow targets to enable/disable IR translation with extended LLT via
`TargetOption::EnableGlobalISelExtendedLLT` (disabled by default)
* `IRTranslator` uses `TargetLoweringInfo` for appropriate `LLT`
generation.
* For this reason, added a flag in `GlobalISelMatchTable` to allow switching
between legacy and new extended LLT names
* Revert using stubs like `LLT::float32` for float types as they are
real now. Added `TODO` for such cases.
Also, MIRParser can now parse the new type identifiers.
[3 lines not shown]
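To illustrate why bit width alone is ambiguous for FP types, here is a minimal
standalone sketch. It is not the real `llvm::LLT` API: `ToyType`, `FPKind`,
`makeScalar` and `makeFloat` are hypothetical names, and the `Extended`
parameter only mimics the spirit of the opt-in target option described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for illustration only; this is NOT llvm::LLT.
enum class FPKind : std::uint8_t { None, IEEE, BFloat, PPCDoubleDouble };

struct ToyType {
  unsigned Bits; // storage size in bits
  FPKind Kind;   // floating-point semantics, or None for a plain scalar
};

inline bool operator==(const ToyType &A, const ToyType &B) {
  return A.Bits == B.Bits && A.Kind == B.Kind;
}

// Plain scalar: carries a size but no floating-point semantics.
inline ToyType makeScalar(unsigned Bits) {
  assert(Bits > 0 && "a scalar needs a non-zero width");
  return ToyType{Bits, FPKind::None};
}

// With Extended == false (assumed here to model the legacy default), the FP
// kind is dropped and the type degrades to a plain scalar of the same width.
inline ToyType makeFloat(unsigned Bits, FPKind Kind, bool Extended) {
  return Extended ? ToyType{Bits, Kind} : makeScalar(Bits);
}
```

In this model, a 16-bit IEEE half and a 16-bit bfloat compare equal in legacy
mode but stay distinct in extended mode, which is the distinction instruction
selection needs.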
[CodeGen] Allow rematerializer to rematerialize at the end of a block
This makes the rematerializer able to rematerialize MIs at the end of a
basic block. We achieve this by tracking the parent basic block of every
region inside the rematerializer and adding an explicit target region to
some of the class's methods. The latter removes the requirement that we
track the MI of every region (`Rematerializer::MIRegion`) after the
analysis phase; the class member is therefore deleted.
This new ability will be used shortly to improve the design of the
rollback mechanism.
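A toy model of the idea, under stated assumptions: regions are non-overlapping
index ranges, and all names (`ToyBlock`, `ToyRegion`, `rematAtRegionEnd`) are
hypothetical and do not correspond to the real `Rematerializer` interface. It
only shows why recording the parent block makes end-of-block insertion
possible when no following instruction exists to anchor on.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model; not the real Rematerializer data structures.
struct ToyBlock {
  std::vector<std::string> Insts;
};

// A region is the half-open index range [Begin, End) inside block `Parent`.
// Regions of one block are assumed not to overlap.
struct ToyRegion {
  std::size_t Parent;
  std::size_t Begin;
  std::size_t End;
};

// Inserts Inst at the end of region Target. Because the region records its
// parent block, this also works when End equals the block size, i.e. when
// there is no following instruction to use as an insertion anchor.
inline void rematAtRegionEnd(std::vector<ToyBlock> &Blocks,
                             std::vector<ToyRegion> &Regions,
                             std::size_t Target, const std::string &Inst) {
  assert(Target < Regions.size() && "unknown target region");
  const std::size_t Parent = Regions[Target].Parent;
  const std::size_t Pos = Regions[Target].End;
  std::vector<std::string> &Insts = Blocks[Parent].Insts;
  assert(Pos <= Insts.size() && "region extends past its block");
  Insts.insert(Insts.begin() + static_cast<std::ptrdiff_t>(Pos), Inst);

  // Shift every later region of the same block by one slot.
  for (std::size_t I = 0; I < Regions.size(); ++I) {
    ToyRegion &R = Regions[I];
    if (I == Target || R.Parent != Parent || R.Begin < Pos)
      continue;
    ++R.Begin;
    ++R.End;
  }
  // The target region grows to include the new instruction.
  ++Regions[Target].End;
}
```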
[MLIR][OpenMP] Unify device shared memory logic
This patch creates a utils library for the OpenMP dialect with functions
used by MLIR to LLVM IR translation as well as the stack-to-shared pass
to determine which allocations must use local stack memory or device
shared memory.
[MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks
This patch refines checks to decide whether to use device shared memory or
regular stack allocations. In particular, it adds support for parallel regions
residing on standalone target device functions.
The changes are:
- Shared memory is introduced for `omp.target` implicit allocations, such as
those related to privatization and mapping, as long as they are shared across
threads in a nested parallel region.
- Standalone target device functions are interpreted as being part of a Generic
kernel, since the fact that they are present in the module after filtering
means they must be reachable from a target region.
- Prevent allocations whose only shared uses inside of an `omp.parallel` region
are as part of a `private` clause from being moved to device shared memory.
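The three rules above can be read as a single predicate. The sketch below is
a simplification, not the actual implementation: `AllocInfo`, its fields and
`useDeviceSharedMemory` are hypothetical, and it assumes that only
Generic-mode code needs device shared memory for these allocations.

```cpp
#include <cassert>

// Hypothetical summary of one allocation; names are illustrative only.
enum class ExecMode { Generic, SPMD };

struct AllocInfo {
  bool InStandaloneDeviceFunction; // function is not nested in a target region
  ExecMode EnclosingKernelMode;    // ignored for standalone device functions
  bool SharedInNestedParallel;     // used across threads of a nested parallel
  bool OnlySharedViaPrivateClause; // every such use is a `private` operand
};

inline bool useDeviceSharedMemory(const AllocInfo &A) {
  // Standalone device functions are treated as part of a Generic kernel.
  const ExecMode Mode = A.InStandaloneDeviceFunction ? ExecMode::Generic
                                                     : A.EnclosingKernelMode;
  // Assumption of this sketch: only Generic-mode code needs shared memory.
  if (Mode != ExecMode::Generic)
    return false;
  // Allocations that no nested parallel region shares stay on the stack.
  if (!A.SharedInNestedParallel)
    return false;
  // Uses that are only `private` clause operands do not force shared memory.
  return !A.OnlySharedViaPrivateClause;
}
```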
[Flang][OpenMP] Add pass to replace allocas with device shared memory
This patch introduces a new Flang OpenMP MLIR pass, only run for target device
modules, that identifies `fir.alloca` operations that should use device shared
memory and replaces them with pairs of `omp.alloc_shared_mem` and
`omp.free_shared_mem` operations.
This works in conjunction with the MLIR to LLVM IR translation pass's handling of
privatization, mapping and reductions in the OpenMP dialect to properly select
the right memory space for allocations based on where they are made and where
they are used.
This pass, in particular, handles explicit stack allocations in MLIR, whereas
the aforementioned translation pass takes care of implicit ones represented by
entry block arguments.
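As a rough picture of the rewrite, the sketch below models a straight-line
body as a list of named ops. It is not MLIR code: `ToyOp` and
`replaceAllocas` are hypothetical, the `NeedsShared` flag stands in for the
pass's analysis, and placing the frees in reverse order before the final op
is a choice made for this example rather than a description of the pass.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical op record; not an MLIR data structure.
struct ToyOp {
  std::string Name;  // operation name, e.g. "fir.alloca"
  std::string Value; // name of the produced or consumed value
  bool NeedsShared;  // stand-in for the pass's analysis result
};

// Rewrites a straight-line body whose last op is assumed to be a terminator.
// Each flagged alloca becomes an alloc_shared_mem, and a matching
// free_shared_mem is placed just before the terminator.
inline std::vector<ToyOp> replaceAllocas(const std::vector<ToyOp> &Body) {
  assert(!Body.empty() && "body must end with a terminator");
  std::vector<ToyOp> Out;
  std::vector<std::string> ToFree;
  for (std::size_t I = 0; I + 1 < Body.size(); ++I) {
    ToyOp Op = Body[I];
    if (Op.Name == "fir.alloca" && Op.NeedsShared) {
      Op.Name = "omp.alloc_shared_mem";
      ToFree.push_back(Op.Value);
    }
    Out.push_back(Op);
  }
  // Free in reverse allocation order (a choice made for this sketch).
  for (auto It = ToFree.rbegin(); It != ToFree.rend(); ++It)
    Out.push_back(ToyOp{"omp.free_shared_mem", *It, false});
  Out.push_back(Body.back());
  return Out;
}
```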
[Flang][MLIR][OpenMP] Add explicit shared memory (de-)allocation ops
This patch introduces the `omp.alloc_shared_mem` and `omp.free_shared_mem`
operations to represent explicit allocations and deallocations of shared memory
across threads in a team, mirroring the existing `omp.target_allocmem` and
`omp.target_freemem`.
The `omp.alloc_shared_mem` op goes through the same Flang-specific
transformations as `omp.target_allocmem`, so that the size of the buffer can be
properly calculated when translating to LLVM IR.
The corresponding runtime functions produced for these new operations are
`__kmpc_alloc_shared` and `__kmpc_free_shared`, which previously could only be
created for implicit allocations (e.g. privatized and reduction variables).
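The pairing requirement can be sketched on the host with a small RAII guard.
This is an illustration only: `toyAllocShared` and `toyFreeShared` are
hypothetical stand-ins backed by `malloc`/`free`, and the assumption that the
free call takes the byte count in addition to the pointer should be checked
against the device runtime before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Host-side stand-ins for illustration; real device code would call the
// OpenMP device runtime instead of malloc/free.
static std::size_t LiveSharedBytes = 0;

inline void *toyAllocShared(std::size_t Bytes) {
  LiveSharedBytes += Bytes;
  return std::malloc(Bytes);
}

// Assumed shape: the free call receives the size as well as the pointer.
inline void toyFreeShared(void *Ptr, std::size_t Bytes) {
  LiveSharedBytes -= Bytes;
  std::free(Ptr);
}

// Ties each allocation to exactly one deallocation of the same size.
class SharedBuffer {
public:
  explicit SharedBuffer(std::size_t Size)
      : Ptr(toyAllocShared(Size)), Size(Size) {}
  ~SharedBuffer() { toyFreeShared(Ptr, Size); }
  SharedBuffer(const SharedBuffer &) = delete;
  SharedBuffer &operator=(const SharedBuffer &) = delete;
  void *data() const { return Ptr; }

private:
  void *Ptr;
  std::size_t Size;
};
```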
[MLIR][OpenMP] Refactor omp.target_allocmem to allow reuse, NFC
This patch moves tablegen definitions that could be used for all kinds of heap
allocations out of `omp.target_allocmem` and into a new
`OpenMP_HeapAllocClause` that can be reused.
Descriptions are updated to follow the format of most other operations and the
custom verifier for `omp.target_allocmem` is removed as it only made a
redundant check on its result type.
[OMPIRBuilder] Add support for explicit deallocation points
In this patch, some OMPIRBuilder codegen functions and callbacks are updated to
work with arrays of deallocation insertion points. The purpose of this is to
enable the replacement of `alloca`s with other types of allocations that
require explicit deallocations in a way that makes it possible for
`CodeExtractor` instances created during OMPIRBuilder finalization to also use
them.
The OpenMP to LLVM IR MLIR translation pass is updated to properly store and
forward deallocation points together with their matching allocation point to
the OMPIRBuilder.
Currently, only the `DeviceSharedMemCodeExtractor` uses this feature to get the
`CodeExtractor` to use device shared memory for intermediate allocations when
outlining a parallel region inside of a Generic kernel (a code path that is, for
now, only used by Flang via MLIR). However, long term this might also be
useful to refactor finalization of variables with destructors, potentially
reducing the use of callbacks and simplifying privatization and reductions.
[5 lines not shown]
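To show why an array of deallocation points is needed rather than a single
one, here is a toy model of a region with several exits. All names
(`ToyFunction`, `InsertPoint`, `AllocScope`, `emitSharedAlloc`) are
hypothetical and unrelated to the actual OMPIRBuilder types; an insertion
point is simplified to "append to this block".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR: a function is a list of blocks, each a list of
// instruction strings.
struct ToyFunction {
  std::vector<std::vector<std::string>> Blocks;
};

// Simplified insertion point: append at the end of block `Block`.
struct InsertPoint {
  std::size_t Block;
};

// One allocation point paired with every point where the memory must be
// released, e.g. one per exit of the outlined region.
struct AllocScope {
  InsertPoint Alloc;
  std::vector<InsertPoint> Deallocs;
};

// Emits the allocation once and a matching free at each deallocation point.
inline void emitSharedAlloc(ToyFunction &F, const AllocScope &S,
                            const std::string &Var) {
  assert(S.Alloc.Block < F.Blocks.size() && "bad allocation point");
  F.Blocks[S.Alloc.Block].push_back("alloc " + Var);
  for (const InsertPoint &IP : S.Deallocs) {
    assert(IP.Block < F.Blocks.size() && "bad deallocation point");
    F.Blocks[IP.Block].push_back("free " + Var);
  }
}
```

A plain stack `alloca` needs no such bookkeeping, which is why a single
allocation insertion point used to be enough.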