[X86] Make ISD::ROTL/ROTR vector rotates legal on XOP+AVX512 targets (#184587)
Similar to what we did for funnel shifts on #166949 - set vector rotates
as legal on XOP (128-bit ROTL) and AVX512 (vXi32/vXi64 ROTL/ROTR)
targets, and custom fold to X86ISD::VROTLI/VROTRI as a later fixup.
128/256-bit vector widening to 512-bit instructions is already fully
supported and tested on AVX512F-only targets.
First part of #184002
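
For readers unfamiliar with the node semantics, here is a minimal sketch of what a per-lane rotate computes. It models only the arithmetic of ISD::ROTL/ROTR on fixed-width lanes, not the X86 lowering; the helper names (`rotl`, `rotr`, `vector_rotl`) are illustrative and do not appear in the patch.

```python
def rotl(x: int, amount: int, bits: int = 32) -> int:
    """Rotate the low `bits` bits of x left by `amount` (taken modulo bits)."""
    mask = (1 << bits) - 1
    amount %= bits
    x &= mask
    # Bits shifted out on the left re-enter on the right.
    return ((x << amount) | (x >> (bits - amount))) & mask


def rotr(x: int, amount: int, bits: int = 32) -> int:
    """Rotate right, expressed as a left rotate by the complementary amount."""
    return rotl(x, -amount % bits, bits)


def vector_rotl(lanes, amounts, bits: int = 32):
    """Apply rotl independently to each lane, as a vector rotate would."""
    return [rotl(x, a, bits) for x, a in zip(lanes, amounts)]
```

The identity `rotr(x, c) == rotl(x, bits - c)` is why a target with only one rotate direction can still cover both nodes.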
[AArch64] Refine reduction VT selection in CTPOP -> VECREDUCE combine (#183025)
Use the same VT as the SETcc source, or fall back to using the VT of the
unextended operand of the CTPOP if the element size of the SETcc is too
small to fit the negative popcount.
[AArch64] Fix SVE cost model for various math intrinsics (#184358)
The implementation of getIntrinsicInstrCost in BasicTTIImpl
assumes that for some intrinsics if we're using custom
lowering for the equivalent DAG node that the cost needs to
be 2, instead of 1 for legal ops. However, even though we
use custom lowering for these scalable vector operations
when SVE is available, we still end up generating the same
efficient codegen as fixed-width. This patch deals with a
few obvious intrinsics that we know get lowered to something
sensible and return the same cost as NEON, i.e. 1.
[llvm-objdump] Default --symbolize-operands for BPF (#184043)
BPF users expect to see basic block labels (e.g. <L0>, <L1>) in
disassembly output
(https://github.com/llvm/llvm-project/pull/95103#issuecomment-3771234810).
Default --symbolize-operands to on for BPF targets when neither
--symbolize-operands nor --no-symbolize-operands is explicitly
specified.
Add --no-symbolize-operands to allow users to opt out.
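
The defaulting rule described above amounts to a tri-state option. The following is a rough sketch of that decision, not the llvm-objdump implementation; the function name and the set of architecture strings are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical architecture names used only for this sketch.
BPF_ARCHES = {"bpfel", "bpfeb"}


def symbolize_operands_enabled(arch: str, explicit: Optional[bool]) -> bool:
    """Decide whether operands are symbolized.

    explicit is True for --symbolize-operands, False for
    --no-symbolize-operands, and None when neither flag was given.
    """
    if explicit is not None:
        # An explicit flag always wins over the per-target default.
        return explicit
    # No flag given: default to on for BPF, off elsewhere.
    return arch in BPF_ARCHES
```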
[SDAGBuilder] Fix incorrect fcmp+select to minnum/maxnum transform (#184590)
minnum/maxnum don't have the correct sNaN semantics, so we must convert to
minimumnum/maximumnum instead.
To avoid an NVPTX regression, make it handle fmaximumnum in one
TableGen pattern.
This is intended as a targeted fix for the miscompile, as the complete
removal of this transform (#93575) appears to be blocked.
Fixes https://github.com/llvm/llvm-project/issues/176624.
[BOLT] Retain certain local symbols (#184074)
BOLT currently strips all STT_NOTYPE STB_LOCAL zero-sized symbols
that fall inside function bodies. Certain such symbols are named
labels (loop markers and subroutine entry points) or local function
symbols in hand-written assembly. We now keep them in the local symbol
table of BOLT-processed binaries for better symbolication.
[AArch64] Enable and regenerate clmul-fixed.ll. NFC (#184628)
The v2i64 tests are now fixed. The disabled ones in clmul-scalable.ll
require i128 vectors which are generally not supported.
[VPlan] Move tail folding out of VPlanPredicator. NFC (#176143)
Currently the logic for introducing a header mask and predicating the
vector loop region is done inside introduceMasksAndLinearize.
This splits the tail folding part out into an individual VPlan transform
so that VPlanPredicator.cpp doesn't need to worry about tail folding,
which seemed to be a temporary measure according to a comment in
VPlanTransforms.h.
To perform tail folding independently, this splits the "body" of the
vector loop region between the phis in the header and the branch + iv
increment in the latch:
Before:
```
+-------------------------------------------+
|%iv = ... |
[39 lines not shown]
[CI] Enable LTO linker plugin tests (#184076)
We've recently had two instances of test failures for the LTO linker
plugin being introduced. Build and test the LTO linker plugin in
pre-merge CI to avoid this.
[SystemZ] Mark fminimumnum/fmaximumnum as legal (#184595)
In M=4 mode, the behavior matches IEEE 754-2019 minimumNumber, except
that if both operands are sNaN, the result will be sNaN rather than
qNaN. However, this is explicitly allowed for LLVM's minimumnum
intrinsic, as canonicalization can be omitted for non-constrained FP.
As such, mark fminimumnum/fmaximumnum as legal, and lower them the same
way as fminnum/fmaxnum. In the future, we may wish to switch those to
use M=0 instead, to match IEEE 754-2008 maxNum/minNum instead.
[MLIR][NVVM] Unify and move to a single tcgen05_mma_kind attr for all tcgen05.mma Ops (#184433)
This change unifies the use of the `tcgen05_mma_kind` attribute for
tcgen05.mma Ops in MLIR.
Before this change there were two block scale attributes used for
tcgen05.mma Ops. One was `MMABlockScaleKindAttr` with `mxf8f6f4`, `mxf4`
and `fxf4nvf4` values used for `tcgen05.mma.block_scale` and
`tcgen05.mma.sp.block_scale`. Another one was `Tcgen05MMAKindAttr` with
`f16`, `tf32`, `f8f6f4` and `i8` values used for `tcgen05.mma`,
`tcgen05.mma.sp`, `tcgen05.mma.ws` and `tcgen05.mma.ws.sp`.
`Tcgen05MMAKindAttr` has been extended with values from
`MMABlockScaleKindAttr`. There is now a single `tcgen05_mma_kind`
attribute for all `tcgen05.mma` Ops in MLIR.
Backward compatibility is not supported. Existing tests and scripts
should be updated to use `tcgen05_mma_kind` attribute instead of
`block_scale_kind` for all tcgen05.mma MLIR Ops.
[mlir][MemRef] Add position-based matching heuristics for rank-reduction with dynamic strides (#184334)
When the source has multiple unit dimensions,
stride-based disambiguation can be wrong with dynamic strides. Add
position-based matching: for each result dimension in order, pick the
leftmost unmatched source dimension with the same size; unmatched source
dims are dropped.
Example: subview from memref<1x8x1x3> to memref<1x8x3>. Both dim 0 and
dim 2 have size 1. Stride-based logic cannot distinguish when strides
are dynamic. Position-based matching correctly drops dim 2 (middle unit
dim) instead of dim 0.
When we have non-trivial static strides, we make use of the stride-based
logic, else we fall back to position-based logic as introduced by this
patch.
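
The position-based rule can be sketched on plain shape lists as follows. This is an illustrative model, not the MemRef dialect code: the function name is hypothetical, sizes are assumed static, and the scan is assumed to move only left to right so that the kept dimensions stay in order.

```python
def match_dims_by_position(source_shape, result_shape):
    """Return (kept, dropped) source dimension indices.

    For each result dimension in order, take the leftmost not-yet-matched
    source dimension of the same size. Source dimensions that are never
    matched are the ones dropped by the rank reduction.
    """
    kept = []
    cursor = 0  # assumption: matches never go back to the left
    for size in result_shape:
        while cursor < len(source_shape) and source_shape[cursor] != size:
            cursor += 1
        if cursor == len(source_shape):
            raise ValueError("result shape is not a rank reduction of source")
        kept.append(cursor)
        cursor += 1
    dropped = [i for i in range(len(source_shape)) if i not in kept]
    return kept, dropped


# The example from this message: 1x8x1x3 -> 1x8x3 drops the middle unit dim.
kept, dropped = match_dims_by_position([1, 8, 1, 3], [1, 8, 3])
print(kept, dropped)
```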
INPUT :-
```
[22 lines not shown]
[clangd][NFC] Add RefKind::Call into RefKind::All and insertion operator (#184677)
Without this patch:
- RefKind output doesn't show RefKind::Call bit.
- RefKind::Call isn't included in RefKind::All.
I don't think these changes require additional tests, as the problems
above mainly appear during testing/debugging (e.g. if a comparison of
two RefKinds fails in a test, `Call` isn't shown in the output even
if this bit is set).
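
To illustrate the two symptoms, here is a small model of a bitmask kind and its printer. The bit positions and printed names are assumptions chosen for the sketch and are not copied from clangd.

```python
# Assumed bit layout, for illustration only.
DECLARATION = 1 << 0
DEFINITION = 1 << 1
REFERENCE = 1 << 2
SPELLED = 1 << 3
CALL = 1 << 4

ALL_BEFORE = DECLARATION | DEFINITION | REFERENCE | SPELLED
ALL_AFTER = ALL_BEFORE | CALL

NAMES_BEFORE = [(DECLARATION, "Decl"), (DEFINITION, "Def"),
                (REFERENCE, "Ref"), (SPELLED, "Spelled")]
NAMES_AFTER = NAMES_BEFORE + [(CALL, "Call")]


def format_ref_kind(kind: int, names) -> str:
    """Print the names of all set bits that the name table knows about."""
    parts = [name for bit, name in names if kind & bit]
    return ", ".join(parts) if parts else "Unknown"


# With the old table the Call bit is silently omitted from the output.
print(format_ref_kind(REFERENCE | CALL, NAMES_BEFORE))
print(format_ref_kind(REFERENCE | CALL, NAMES_AFTER))
```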
[flang-rt] Handle NAMELIST logical comments without preceding space (#183202)
If a comment appears immediately after a logical value in a NAMELIST
file, the flang runtime returns IostatGenericError. No error occurs when
a space precedes the exclamation point. Add code to handle a comment
while parsing logical values.
Co-authored-by: John Otken john.otken at hpe.com
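
A simplified sketch of the parsing behavior described above: a logical value token should end at a `!` just as it does at a blank or comma. This is not the flang-rt code; the function name and the delimiter set are assumptions for illustration.

```python
# Assumed set of characters that end a logical value token.
DELIMITERS = " \t,/;!\n"


def parse_logical(text: str):
    """Parse a Fortran-style logical (T, F, .true., .false., ...).

    Returns (value, rest), where rest begins at the first delimiter,
    which may be the '!' that starts a comment.
    """
    i = 0
    if i < len(text) and text[i] == ".":
        i += 1  # optional leading period
    if i >= len(text) or text[i].upper() not in ("T", "F"):
        raise ValueError("not a logical value")
    value = text[i].upper() == "T"
    i += 1
    # Skip the remainder of the token; '!' must terminate it too.
    while i < len(text) and text[i] not in DELIMITERS:
        i += 1
    return value, text[i:]


print(parse_logical(".true.! trailing comment"))
```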