[Flang] Move builtin .mod generation into runtimes (#137828)
Move building the .mod files from openmp/flang to openmp/flang-rt using
a shared mechanism. Motivations to do so are:
1. Most modules are target-dependent and need to be re-compiled for each
target separately, which is something the LLVM_ENABLE_RUNTIMES system
already does. The prime example is `iso_c_binding.mod`, which encodes
the target's ABI. Most other modules contain `#ifdef`-enclosed code as well.
2. CMake has support for Fortran that we should use. Among other things,
it automatically determines module dependencies so there is no need to
hardcode them in the CMakeLists.txt.
3. It allows using Fortran itself to implement Flang-RT. Currently, only
`iso_fortran_env_impl.f90` emits object files that are needed by Fortran
applications (#89403). The workaround of #95388 could be reverted.
[33 lines not shown]
[AMDGPU][gfx1250] Add wait_xcnt before any access that cannot be repeated (#168852)
The xcnt wait is actually required before any memory access that can
only be done once, so atomic stores and volatile accesses are affected.
This patch also ensures buffer instructions are handled.
[AArch64][SME] Handle zeroing ZA and ZT0 in functions with ZT0 state (#166361)
In the MachineSMEABIPass, if we have a function with ZT0 state, then
there are some additional cases where we need to zero ZA and ZT0.
If the function has a private ZA interface (i.e., new ZT0, and new ZA if
present), then ZT0/ZA must be zeroed when committing the incoming ZA
save.
If the function has a shared ZA interface (e.g., new ZA and shared ZT0),
then ZA must be zeroed on function entry (without committing a ZA save).
The logic in the ABI pass has been reworked to use an "ENTRY" state to
handle this (rather than the more specific "CALLER_DORMANT" state).
[BOLT][BTI] Add MCPlusBuilder::addBTItoBBStart
This function contains most of the logic for BTI:
- it takes the BasicBlock and the instruction used to jump to it,
- then checks whether the first non-pseudo instruction is a sufficient
landing pad for that jump or call,
- and, if not, generates the correct BTI instruction.
Also introduce the isBTIVariantCoveringCall helper to simplify the logic.
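A purely hypothetical sketch of that flow, with stand-in types rather than BOLT's real MCPlus API:
```cpp
#include <vector>

// Illustrative stand-ins for BOLT's instruction/basic-block types
// (not the real API).
struct Inst {
  bool IsPseudo = false;
  bool IsBTICoveringJump = false; // would be isBTIVariantCoveringCall()
};
struct BasicBlock { std::vector<Inst> Insts; };

// Hypothetical createBTI: build the BTI variant required by `Jump`.
static Inst createBTIFor(const Inst & /*Jump*/) {
  return Inst{/*IsPseudo=*/false, /*IsBTICoveringJump=*/true};
}

// Mirrors the steps described above: find the first non-pseudo
// instruction; if it is not a sufficient landing pad, prepend one.
static void addBTItoBBStart(BasicBlock &BB, const Inst &Jump) {
  for (const Inst &I : BB.Insts) {
    if (I.IsPseudo)
      continue;
    if (I.IsBTICoveringJump)
      return; // already a sufficient landing pad
    break;    // first real instruction is not a landing pad
  }
  BB.Insts.insert(BB.Insts.begin(), createBTIFor(Jump));
}
```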
[BOLT][BTI] Add MCPlusBuilder::createBTI (#167305)
- creates a BTI j|c landing pad MCInst.
- adds a getBTIHintNum utility in AArch64/Utils to make sure BOLT
generates BTI immediates the same way as LLVM.
- adds MCPlusBuilder unit tests to cover the new function.
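For context, BTI is an alias of the AArch64 `HINT` instruction, so the utility boils down to a small mapping. A standalone sketch (illustrative, not the in-tree code) of what a getBTIHintNum-style helper computes:
```cpp
// bti = hint #32, bti c = hint #34, bti j = hint #36, bti jc = hint #38
// (the immediate is the HINT CRm:op2 encoding).
enum class BTIVariant { Plain, C, J, JC };

constexpr unsigned getBTIHintNum(BTIVariant V) {
  switch (V) {
  case BTIVariant::Plain: return 32;
  case BTIVariant::C:     return 34;
  case BTIVariant::J:     return 36;
  case BTIVariant::JC:    return 38;
  }
  return 32; // unreachable; all enumerators handled above
}

static_assert(getBTIHintNum(BTIVariant::C) == 34, "bti c is hint #34");
```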
[AArch64] Assert `expandMOVImm` prioritizes optimal single MOVZ/N (#169341)
The expansion of a move-immediate in `expandMOVImm` follows the priority
of the `MOV` alias, and the selection there properly prefers expansions
in performance-optimality order. This change adds a simple assert that
`expandMOVImmSimple` expands a single optimal MOVZ/MOVN.
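The invariant behind the assert can be stated without any LLVM machinery: an immediate fits a single MOVZ when at most one of its 16-bit chunks is non-zero, and a single MOVN when at most one chunk is not all-ones. A minimal sketch (illustrative, not the in-tree assert):
```cpp
#include <cassert>
#include <cstdint>

// True if Imm can be materialized by one MOVZ: at most one non-zero
// 16-bit chunk (zero chunks are implicit).
static bool fitsSingleMOVZ(uint64_t Imm) {
  int NonZero = 0;
  for (int Shift = 0; Shift < 64; Shift += 16)
    NonZero += ((Imm >> Shift) & 0xFFFF) != 0;
  return NonZero <= 1;
}

// True if Imm can be materialized by one MOVN: at most one chunk that
// is not all-ones (MOVN writes the inverted chunk; others become 1s).
static bool fitsSingleMOVN(uint64_t Imm) {
  int NotOnes = 0;
  for (int Shift = 0; Shift < 64; Shift += 16)
    NotOnes += ((Imm >> Shift) & 0xFFFF) != 0xFFFF;
  return NotOnes <= 1;
}

int main() {
  assert(fitsSingleMOVZ(0x0000123400000000)); // movz xd, #0x1234, lsl #32
  assert(fitsSingleMOVN(0xFFFFFFFF5678FFFF)); // movn xd, #0xa987, lsl #16
}
```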
[NVVM] Move pretty-print functions from NVVMIntrinsicUtils.h to cpp file (#168997)
This patch moves the print functions from `NVVMIntrinsicUtils.h` to
`NVVMIntrinsicUtils.cpp`, a new file in the `llvm/lib/IR` directory.
Signed-off-by: Dharuni R Acharya <dharunira at nvidia.com>
[mlir][LLVMIR] Handle anonymous TBAA roots during metadata emission (#169167)
This commit enhances MLIR's TBAA export with support for anonymous TBAA roots. Import support for these has existed for a while, but the export was missing.
Fixes: #160721
[ARM] Introduce intrinsics for MVE add/sub/mul under strict-fp. (#169156)
As far as I understand, the MVE fp vadd/vsub/vmul instructions set
exception flags in the same way as scalar fadd/fsub/fmul, but do not
honor flush-to-zero (for f32 they always flush; for f16 they follow the
FPSCR flags) and always use the default rounding mode.
This means that we cannot convert the vadd_f32/vsub_f32/vmul_f32
intrinsics to llvm.constrained.fadd/fsub/fmul and then vadd/vsub/vmul
without changing the expected behaviour under strict-fp. This patch
introduces a set of intrinsics that we can use instead, going from
vadd_f32 -> llvm.arm.mve.vadd -> MVE_VADD.
The current implementation assumes that the standard variant of a
strictfp alternative will be an IRBuilder; this can be changed to take
an IRBuilder or IRInt.
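In user-code terms, the affected pattern looks roughly like the following (a hypothetical example, assuming `arm_mve.h` and an MVE-capable target such as `-mcpu=cortex-m55 -mfloat-abi=hard`):
```cpp
#include <arm_mve.h>

// Under strict-fp, this should now lower via llvm.arm.mve.vadd instead
// of llvm.experimental.constrained.fadd, since the MVE add ignores the
// selected rounding mode and has its own flush-to-zero behaviour.
float32x4_t add4(float32x4_t a, float32x4_t b) {
  return vaddq_f32(a, b);
}
```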
[AArch64] Add patterns for add(x, trunc(shift)) (#168927)
This can be lowered to a 64-bit add where we only use the bottom 32 bits
of the result. It is conceptually the same as
https://alive2.llvm.org/ce/z/Xfz3Rf, but with the sext replaced by an
anyext.
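In C terms the motivating pattern is roughly the following (a hypothetical example; the asm in the comment is one possible lowering, not taken from the patch's tests):
```cpp
#include <cstdint>

// Only the low 32 bits of the sum are consumed, so this can lower to a
// single 64-bit ADD with a shifted-register operand, e.g.
//   add x0, x1, x0, lsr #32   // result then read as w0
// rather than a separate shift followed by a 32-bit add.
uint32_t addHighHalf(uint64_t x, uint32_t y) {
  return y + static_cast<uint32_t>(x >> 32);
}
```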
[OpenMP][flang] Add initial support for by-ref reductions on the GPU
Adds initial support for GPU by-ref reductions. In particular, this diff
adds support for reductions on scalar allocatables where the reductions
happen in loops nested inside `target` regions. For example:
```fortran
integer :: i
real, allocatable :: scalar_alloc
allocate(scalar_alloc)
scalar_alloc = 0
!$omp target map(tofrom: scalar_alloc)
!$omp parallel do reduction(+: scalar_alloc)
do i = 1, 1000000
  scalar_alloc = scalar_alloc + 1
end do
!$omp end target
[12 lines not shown]
[Flang] - Enhance testing for strictly-nested teams in target regions. (#168437)
This patch enhances the semantics tests that check that teams
directives are strictly nested inside target directives.
Fixes https://github.com/llvm/llvm-project/issues/153173
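For reference, the rule under test, shown here as C++ OpenMP for brevity (the in-tree tests are Fortran): a teams construct nested in a target region must be strictly nested, i.e. the target region may contain nothing besides the teams construct.
```cpp
// Valid: teams is strictly nested inside target.
void ok() {
  #pragma omp target
  #pragma omp teams
  { }
}

#if 0 // Invalid: should be rejected by the semantic check.
void bad() {
  #pragma omp target
  {
    int x = 0;        // intervening statement before teams
    #pragma omp teams
    { }
  }
}
#endif
```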