[VPlan] Move tail folding out of VPlanPredicator. NFC (#176143)
Currently the logic for introducing a header mask and predicating the
vector loop region is done inside introduceMasksAndLinearize.
This splits the tail folding part out into an individual VPlan transform
so that VPlanPredicator.cpp doesn't need to worry about tail folding,
which seemed to be a temporary measure according to a comment in
VPlanTransforms.h.
To perform tail folding independently, this splits the "body" of the
vector loop region between the phis in the header and the branch + iv
increment in the latch:
Before:
```
+-------------------------------------------+
|%iv = ... |
[39 lines not shown]
[CI] Enable LTO linker plugin tests (#184076)
We've recently had two instances of test failures for the LTO linker
plugin being introduced. Build and test the LTO linker plugin in
pre-merge CI to avoid this.
[SystemZ] Mark fminimumnum/fmaximumnum as legal (#184595)
In M=4 mode, the behavior matches IEEE 754-2019 minimumNumber, except
that if both operands are sNaN, the result will be sNaN rather than
qNaN. However, this is explicitly allowed for LLVM's minimumnum
intrinsic, as canonicalization can be omitted for non-constrained FP.
As such, mark fminimumnum/fmaximumnum as legal, and lower them the same
way as fminnum/fmaxnum. In the future, we may wish to switch those to
use M=0, to match IEEE 754-2008 maxNum/minNum.
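For reference, the IEEE 754-2019 minimumNumber behaviour mentioned above can be sketched as follows. This is a minimal illustration, not the SystemZ lowering: the function name `minimum_number` is hypothetical, and Python floats cannot distinguish sNaN from qNaN, so the sNaN/qNaN difference discussed in the commit is not modelled.

```python
import math


def minimum_number(x: float, y: float) -> float:
    """Sketch of IEEE 754-2019 minimumNumber on Python floats (illustrative only)."""
    # A NaN operand is treated as missing data: return the other operand.
    # If both are NaN, the result is NaN (quiet vs. signaling is not modelled).
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    # -0.0 is ordered below +0.0, so compare the signs explicitly.
    if x == 0.0 and y == 0.0:
        return x if math.copysign(1.0, x) < 0 else y
    return x if x < y else y


print(minimum_number(float("nan"), 1.0))
print(minimum_number(0.0, -0.0))
```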
[MLIR][NVVM] Unify and move to a single tcgen05_mma_kind attr for all tcgen05.mma Ops (#184433)
This change unifies the use of the `tcgen05_mma_kind` attribute for
tcgen05.mma Ops in MLIR.
Before this change there were two block scale attributes used for
tcgen05.mma Ops. One was `MMABlockScaleKindAttr` with `mxf8f6f4`, `mxf4`
and `mxf4nvf4` values used for `tcgen05.mma.block_scale` and
`tcgen05.mma.sp.block_scale`. Another one was `Tcgen05MMAKindAttr` with
`f16`, `tf32`, `f8f6f4` and `i8` values used for `tcgen05.mma`,
`tcgen05.mma.sp`, `tcgen05.mma.ws` and `tcgen05.mma.ws.sp`.
`Tcgen05MMAKindAttr` has been extended with values from
`MMABlockScaleKindAttr`. Now there is only the `tcgen05_mma_kind`
attribute for all `tcgen05.mma` Ops in MLIR.
Backward compatibility is not supported. Existing tests and scripts
should be updated to use `tcgen05_mma_kind` attribute instead of
`block_scale_kind` for all tcgen05.mma MLIR Ops.
[mlir][MemRef] Add position-based matching heuristics for rank-reduction with dynamic strides (#184334)
When the source has multiple unit dimensions, stride-based
disambiguation can be wrong with dynamic strides. Add
position-based matching: for each result dimension in order, pick the
leftmost unmatched source dimension with the same size; unmatched source
dims are dropped.
Example: subview from memref<1x8x1x3> to memref<1x8x3>. Both dim 0 and
dim 2 have size 1. Stride-based logic cannot tell them apart when
strides are dynamic. Position-based matching correctly drops dim 2
(the middle unit dim) instead of dim 0.
When we have non-trivial static strides, we make use of the stride-based
logic, else we fall back to position-based logic as introduced by this
patch.
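The position-based matching described above can be sketched as follows. This is a simplified model on plain shape lists, not the MemRef implementation; the function name `dropped_dims_by_position` is hypothetical, and returning `None` when a result dimension has no match is an assumption of this sketch.

```python
from typing import List, Optional


def dropped_dims_by_position(source_shape: List[int],
                             result_shape: List[int]) -> Optional[List[int]]:
    """Return the source dims that are dropped, or None if matching fails."""
    matched = set()
    for size in result_shape:
        # Pick the leftmost source dim that is still unmatched and has this size.
        for idx, src_size in enumerate(source_shape):
            if idx not in matched and src_size == size:
                matched.add(idx)
                break
        else:
            # No candidate left for this result dim (assumed failure mode).
            return None
    # Every source dim that was never matched is dropped.
    return [i for i in range(len(source_shape)) if i not in matched]


# memref<1x8x1x3> -> memref<1x8x3>: the middle unit dim (dim 2) is dropped.
print(dropped_dims_by_position([1, 8, 1, 3], [1, 8, 3]))
```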
INPUT :-
```
[22 lines not shown]
[clangd][NFC] Add RefKind::Call into RefKind::All and insertion operator (#184677)
Without this patch:
- RefKind output doesn't show RefKind::Call bit.
- RefKind::Call isn't included in RefKind::All.
I don't think these changes require additional tests, as the problems
above mainly appear during testing/debugging (e.g. if a comparison of
two RefKinds fails in a test, `Call` isn't shown in the output even if
this bit is set).
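The two problems can be illustrated with a small bitmask model. This is a hypothetical Python analogue, not the clangd code: the bit values and the `describe` helper are assumptions chosen for illustration.

```python
from enum import IntFlag


class RefKind(IntFlag):
    # Bit values are assumed for illustration only.
    Unknown = 0
    Declaration = 1 << 0
    Definition = 1 << 1
    Reference = 1 << 2
    Spelled = 1 << 3
    Call = 1 << 4
    # The mask of all kinds has to include Call as well.
    All = Declaration | Definition | Reference | Spelled | Call


_BITS = (RefKind.Declaration, RefKind.Definition, RefKind.Reference,
         RefKind.Spelled, RefKind.Call)


def describe(kind: RefKind) -> str:
    """Print every set bit, including Call, so test failures are readable."""
    names = [bit.name for bit in _BITS if kind & bit]
    return ", ".join(names) if names else "Unknown"


print(describe(RefKind.Reference | RefKind.Call))
```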
[flang-rt] Handle NAMELIST logical comments without preceding space (#183202)
If a comment appears immediately after a logical value in a NAMELIST
file, the flang runtime returns IostatGenericError. No error occurs when
a space precedes the exclamation point. Add code to handle a comment
while parsing logical values.
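The intended behaviour can be sketched as below for an input field such as `.true.!comment`. This is a simplified model, not the flang-rt parser: the function name `parse_namelist_logical` is hypothetical, and treating the first `!` in the field as the start of a comment is an assumption of this sketch.

```python
def parse_namelist_logical(field: str) -> bool:
    """Parse a logical value that may be followed directly by a '!' comment."""
    # Drop the comment, whether or not a space precedes the exclamation point.
    value = field.split("!", 1)[0].strip()
    # An optional leading period is allowed, as in .true. or .false.
    if value.startswith("."):
        value = value[1:]
    first = value[:1].upper()
    if first == "T":
        return True
    if first == "F":
        return False
    raise ValueError(f"not a logical value: {field!r}")


print(parse_namelist_logical(".true.!comment"))
print(parse_namelist_logical("F ! comment"))
```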
Co-authored-by: John Otken john.otken at hpe.com
[RISCV] Remove RISCVVectorPeephole::tryToReduceVL (#184297)
Now that RISCVVLOptimizer has been extended to handle the remaining
cases tryToReduceVL handles, we can remove tryToReduceVL to keep all the
reduction logic in one place.
Intended to be NFC but it looks like in
test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector-shuffle.ll we were
previously reducing the vl of a volatile load in
insert_subvector_dag_loop, which RISCVVLOptimizer knows to avoid.
On llvm-test-suite and SPEC CPU 2017 -march=rva23u64 -O3 there are no
changes with this patch.
[OpenMP][MLIR] Modify OpenMP Dialect lowering to support attach mapping
This PR adjusts the LLVM-IR lowering to support the new attach map type that the runtime
uses to link data and pointer together. This replaces the older
OMP_MAP_PTR_AND_OBJ map type in most cases and allows slightly more complicated ref_ptr/ptee
and attach semantics.
[Flang][OpenMP][Offload] Modify MapInfoFinalization to handle attach mapping and 6.1's ref_* and attach map keywords
This PR is one of four required to implement the attach mapping semantics in Flang, alongside the
ref_ptr/ref_ptee/ref_ptr_ptee map modifiers and the attach(always/never/auto) modifiers.
This PR contains the MapInfoFinalization changes required to support these features. It mainly deals with
applying the correct attach map type and manipulating the descriptor type's maps for base address
and descriptor, so that when we specify ref_ptr/ref_ptee we emit one of the two maps and when we
specify ref_ptr_ptee we emit our usual default maps. In all cases we add the "glue" of a new
attach map, except in cases where a user has provided attach never. In cases where we are
provided an always, we apply the always map type to our attach maps.
It's important to note the runtime has a toggle for the auto map behaviour, which will flip the
attach behaviour to the newer semantics or the older semantics for backwards compatibility (outside
the purview of this PR but good to mention).
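The map selection described above can be sketched roughly as follows. All names here (`RefModifier`, `AttachModifier`, `plan_maps`, and the map labels) are hypothetical, and the pairing of ref_ptr with the descriptor map and ref_ptee with the base address map is an assumption of this sketch rather than something the commit message states.

```python
from enum import Enum
from typing import List


class RefModifier(Enum):
    REF_PTR = "ref_ptr"
    REF_PTEE = "ref_ptee"
    REF_PTR_PTEE = "ref_ptr_ptee"


class AttachModifier(Enum):
    AUTO = "auto"
    ALWAYS = "always"
    NEVER = "never"


def plan_maps(ref: RefModifier, attach: AttachModifier) -> List[str]:
    """Return labels for the maps that would be emitted (illustrative only)."""
    maps = []
    # Assumption: ref_ptr keeps the descriptor map, ref_ptee the base address map.
    if ref in (RefModifier.REF_PTR, RefModifier.REF_PTR_PTEE):
        maps.append("descriptor")
    if ref in (RefModifier.REF_PTEE, RefModifier.REF_PTR_PTEE):
        maps.append("base_address")
    # The attach "glue" map is added unless the user asked for attach(never).
    if attach is not AttachModifier.NEVER:
        maps.append("attach|always" if attach is AttachModifier.ALWAYS else "attach")
    return maps


print(plan_maps(RefModifier.REF_PTR, AttachModifier.AUTO))
print(plan_maps(RefModifier.REF_PTR_PTEE, AttachModifier.NEVER))
```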
Modify semantic check for affinity clause
- Add CheckLastPartRef
- Add CheckArraySection
- Add a comment explaining why the substring check is still needed even
when CheckArraySection is called
[flang-rt] Fixes EXECUTE_COMMAND_LINE() status management and double buffering (#184285)
EXECUTE_COMMAND_LINE() without CMDSTAT terminated the program at runtime
if the command returned a non-zero status code. For example,
EXECUTE_COMMAND_LINE('false') on Linux would cause "fatal Fortran
runtime error... : Command line execution failed with exit code: 1."
This is too strict: EXECUTE_COMMAND_LINE() successfully called 'false',
and 'false' simply returned a non-zero status code. ifx and
gfortran don't initiate termination in such a case. Changed the
EXECUTE_COMMAND_LINE() implementation to behave in a similar fashion.
Also, testing revealed that when the output of a program that
uses EXECUTE_COMMAND_LINE(... WAIT=.false.) is piped to a file, the
resulting file has duplicated output lines. This was because fork()
also duplicates the parent's buffered output into the child.
Added a flush of all units and C stdio before calling fork().
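Both fixes can be illustrated with a small POSIX-only sketch. This is not the flang-rt implementation: the function `execute_command_line` below is a hypothetical Python analogue, and returning -1 for a child that did not exit normally is an assumption of this sketch.

```python
import os
import sys


def execute_command_line(command: str, wait: bool = True) -> int:
    """Run `command` via the shell.

    Returns the exit status when wait is True, otherwise the child's pid.
    A non-zero exit status is reported to the caller, not treated as fatal.
    """
    # Flush buffered output first; otherwise the child inherits a copy of the
    # unflushed buffer and the same lines can be written twice.
    sys.stdout.flush()
    sys.stderr.flush()
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp("sh", ["sh", "-c", command])
        finally:
            # Only reached if exec failed; never return into the parent's code.
            os._exit(127)
    if not wait:
        return pid
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


print(execute_command_line("false"))
```

Here running `false` simply yields its exit status instead of aborting the caller, which mirrors the behaviour the commit describes for ifx and gfortran.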
[Flang][MLIR][OpenMP] Add distinct var_ptr_ptr_type to omp.map.info operations & remove ref_ptr_ptee
This is a precursor patch to the attach and ref_ptr/ptee mapping that I intend to upstream
over the next few weeks. The attach maps require both the type of the descriptor and
the pointed-to data to calculate the appropriate offload/base pointers and size. In
the base case of ref_ptr_ptee all of this information can be gathered from the pointer
and pointee maps, but in cases where we have only one (i.e. ref_ptr/ref_ptee) we will
be missing one of the key elements required to create a corresponding attach map.
So this PR adds the ability to ferry around the type of both var_ptr and
var_ptr_ptr as opposed to just var_ptr, so that we can emit attach maps as separate
map.info operations that carry all the prerequisite information for lowering to LLVM-IR.
Beyond that, it seems reasonable to have var_ptr_ptr mirror var_ptr in all aspects for
consistency.
It also removes ref_ptr_ptee, instead opting to use the setting of both ref_ptr and
ref_ptee to mean ref_ptr_ptee.
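The encoding change can be pictured with a small flag sketch. The names `RefMod` and `effective_modifier` are hypothetical, and the "default" result for the case where neither bit is set is an assumption; the commit message does not describe that case.

```python
from enum import Flag


class RefMod(Flag):
    NONE = 0
    REF_PTR = 1
    REF_PTEE = 2
    # No separate REF_PTR_PTEE member: both bits set together carry that meaning.


def effective_modifier(mods: RefMod) -> str:
    """Name the modifier implied by the set bits (illustrative only)."""
    both = RefMod.REF_PTR | RefMod.REF_PTEE
    if (mods & both) == both:
        return "ref_ptr_ptee"
    if mods & RefMod.REF_PTR:
        return "ref_ptr"
    if mods & RefMod.REF_PTEE:
        return "ref_ptee"
    return "default"  # assumed label for "no ref modifier given"


print(effective_modifier(RefMod.REF_PTR | RefMod.REF_PTEE))
```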