[SLP] Vectorize struct-returning intrinsics
Allow SLP to combine calls that return a literal struct (llvm.sincos,
llvm.*.with.overflow, llvm.frexp, ...) across lanes into a single call
returning a struct of vectors. {T, T, ...} is widened to
{<VF x T>, ...} via VectorTypeUtils, and extractvalue +
extractelement are emitted for external uses.
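The widening described above can be modeled outside of LLVM. The following is a minimal C++ sketch, not the SLP implementation: `std::array` stands in for a vector type, and the names `SinCos`, `SinCosVec`, `sincosWide` and `extractForExternalUse` are hypothetical and exist only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Scalar form: each call returns a literal struct {T, T}.
struct SinCos {
  float Sin;
  float Cos;
};

inline SinCos sincosScalar(float X) { return {std::sin(X), std::cos(X)}; }

// Widened form: one call returns a struct of vectors {<VF x T>, <VF x T>}.
// std::array<float, VF> is a stand-in for <VF x float> (assumption).
template <std::size_t VF> struct SinCosVec {
  std::array<float, VF> Sin;
  std::array<float, VF> Cos;
};

template <std::size_t VF>
SinCosVec<VF> sincosWide(const std::array<float, VF> &X) {
  SinCosVec<VF> R{};
  for (std::size_t Lane = 0; Lane < VF; ++Lane) {
    R.Sin[Lane] = std::sin(X[Lane]);
    R.Cos[Lane] = std::cos(X[Lane]);
  }
  return R;
}

// A scalar user outside the vectorized tree needs one field of one lane:
// first pick the struct field (extractvalue), then the lane (extractelement).
template <std::size_t VF>
float extractForExternalUse(const SinCosVec<VF> &R, unsigned Field,
                            std::size_t Lane) {
  const std::array<float, VF> &Vec = (Field == 0) ? R.Sin : R.Cos;
  return Vec[Lane];
}
```

Each lane of the widened result should match the corresponding scalar call, which is the property the extractvalue + extractelement pair relies on.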
Original Pull Request: https://github.com/llvm/llvm-project/pull/195521
Reviewers: hiraditya, RKSimon, bababuck
Pull Request: https://github.com/llvm/llvm-project/pull/196756
[AArch64][llvm] Remove support for FEAT_MPAMv2_VID
`FEAT_MPAMv2_VID` instructions and system registers, introduced in
change d30f18d2c, are being removed because they have been removed
from the latest Arm ARM. This does not preclude them returning in
some form in the future.
Other system registers introduced with `FEAT_MPAMv2` are unaffected,
and these continue to be ungated, but since `+mpamv2` gating is now
empty, I'm removing this superfluous gating code.
Cherry-picked-from: a48159df9
[ARM][MVE] Constant fold PREDICATE_CAST of 0 and 0xffff (#197832)
This allows us to fold away the vselect when we know that the condition
is all true or all false.
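As a rough illustration of why the fold is safe, here is a minimal C++ sketch of a byte-wise select controlled by a 16-bit predicate mask. It is a simplified model under stated assumptions, not the ARM backend code: `Vec16`, `selectBytes` and `selectBytesFolded` are hypothetical names, and one predicate bit per byte lane is assumed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Stand-in for a 128-bit vector viewed as 16 byte lanes (assumption).
using Vec16 = std::array<std::uint8_t, 16>;

// Generic select: bit I of Mask chooses between T[I] and F[I].
inline Vec16 selectBytes(std::uint16_t Mask, const Vec16 &T, const Vec16 &F) {
  Vec16 R{};
  for (unsigned I = 0; I < 16; ++I)
    R[I] = ((Mask >> I) & 1u) ? T[I] : F[I];
  return R;
}

// Folded select: a constant all-true (0xffff) or all-false (0) mask makes the
// select redundant, so one operand can be returned directly.
inline Vec16 selectBytesFolded(std::uint16_t Mask, const Vec16 &T,
                               const Vec16 &F) {
  if (Mask == 0xffff)
    return T;
  if (Mask == 0)
    return F;
  return selectBytes(Mask, T, F);
}
```

For the two constant masks the folded form returns the same value as the generic select while doing no per-lane work.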
[OpenMP] Fix missing install-openmp component (#197603)
Summary:
This pattern is consistent throughout all the runtimes and is what the
top-level `install-openmp-<triple>` corresponds to. It should be
provided and used.
[AArch64] Delete llvm/test/CodeGen/AArch64/fptoi-256.ll (NFC) (#197896)
llvm/test/CodeGen/AArch64/fcvt-i256.ll has since been added with the
same and broader coverage.
[libc] Disable GCC 12 waccess passes to fix ICE in environ_internal (#197916)
The waccess pass in GCC 12 consistently crashes with a segmentation fault when
analyzing the memory allocations in environ_internal.cpp. This change
disables the relevant tree-waccess passes for this specific file,
avoiding the ICE without requiring intrusive code refactoring.
Assisted-by: Automated tooling, human reviewed.
[LLD] [MinGW] Implement --{push,pop}-state (#197748)
Implement `--push-state` and `--pop-state` for the MinGW lld driver.
Those options were already implemented by GNU ld for MinGW:
```
--push-state Push state of flags governing input file handling
--pop-state Pop state of flags governing input file handling
```
This will align the MinGW frontend's options closer with those of the
ELF frontend and fix issues due to e.g. CMake misdetecting
`--push-state`/`--pop-state` support by accidentally querying the ELF
driver.
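The push/pop semantics can be sketched as a simple stack of saved flag sets. This is an illustrative C++ model only, not the LLD MinGW driver code: the `InputFileState` fields and the `StateStack` class are hypothetical, and the exact set of flags that gets saved is an assumption.

```cpp
#include <cassert>
#include <vector>

// Flags governing input file handling. The fields chosen here are
// illustrative assumptions, not the driver's actual state.
struct InputFileState {
  bool WholeArchive = false;
  bool AsNeeded = false;
  bool Static = false;
};

class StateStack {
public:
  InputFileState Current;

  // --push-state: remember the current flags.
  void pushState() { Saved.push_back(Current); }

  // --pop-state: restore the most recently pushed flags.
  // Returns false when there is no matching --push-state.
  bool popState() {
    if (Saved.empty())
      return false;
    Current = Saved.back();
    Saved.pop_back();
    return true;
  }

private:
  std::vector<InputFileState> Saved;
};
```

A sequence such as `--push-state --whole-archive libfoo.a --pop-state` would then leave the flags exactly as they were before the push.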
Fixes #131007.
[VPlan] Move call widening decision to VPlan. (NFCI) (#195518)
This patch adds a new makeCallWideningDecisions transform which converts
Call VPInstructions to
VPWidenCallRecipe/VPWidenIntrinsicRecipe/VPReplicateRecipe depending on
their costs.
To compute the costs, static helpers are introduced to re-use the
existing VPlan cost model logic:
* VPWidenIntrinsicRecipe::computeCallCost
* VPReplicateRecipe::computeCallCost
The legacy cost-model logic is still retained; we assert that the
decisions match to make sure we do not miss any edge cases. The legacy
logic will be removed in a follow-up.
PR: https://github.com/llvm/llvm-project/pull/195518
[VPlan] Compute the cost for scalar cmp outside the vector region (#197146)
Currently we don't compute the cost of any scalar compares. Change this
to only avoid computing the cost if it's inside the vector region, as
compares that are used in the loop exit condition are handled by the
legacy cost model and this is the simplest way to avoid double-counting
those instructions.
This mainly affects the compare in the middle block, and accounting for
the cost of that can change the required minimum trip count.
[AtomicExpand] Add bitcasts when expanding store atomic vector
AtomicExpand fails for aligned `store atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
ptrtoint + bitcast so that the call can be lowered, mirroring the
load-side handling from #148900.
[LLVM][CodeGen][SME] Improve regalloc hinting for multi-vector instructions. (#197711)
When an instruction uses one of the results of a multi-vector
instruction it will typically be a subreg. For it to be considered a
suitable reuse candidate we must convert the subreg to its underlying
physical register.
[llvm-ar] fixing corruptions in documentation (#197783)
A follow-up to:
https://github.com/llvm/llvm-project/pull/196541#issuecomment-4442635635
My fix for :option:`N` is based on the description of `option:: b`:
```
[...]
found, the files are placed at the end of the ``archive``. *relpos* cannot
be consumed without either :option:`a`, :option:`b` or :option:`i`. This
modifier is identical to the :option:`i` modifier.
```
CC: @MaskRay @jh7370
[Matrix] Create inbounds GEPs for matrix load/stores. (#197710)
LowerMatrixIntrinsics creates multiple loads/stores + GEPs for larger
matrix load/stores. Those GEPs compute offsets into the memory accessed
by the larger loads/stores, so those GEPs must be inbounds, otherwise
the larger load would access memory out-of-bounds.
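The reasoning can be checked with a small C++ model of the offsets involved. This is a sketch under assumptions, not the LowerMatrixIntrinsics code: it assumes a column-major layout where each column is one narrower access and `Stride >= Rows`, and the names `Slice`, `accessExtent`, `columnSlices` and `allSlicesInBounds` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open element range [Begin, End) relative to the base pointer.
struct Slice {
  std::size_t Begin;
  std::size_t End;
};

// Number of elements spanned by the whole matrix access
// (assumes column-major layout with Stride >= Rows).
inline std::size_t accessExtent(std::size_t Rows, std::size_t Cols,
                                std::size_t Stride) {
  return Cols == 0 ? 0 : (Cols - 1) * Stride + Rows;
}

// One narrower access per column, starting at offset Col * Stride.
inline std::vector<Slice> columnSlices(std::size_t Rows, std::size_t Cols,
                                       std::size_t Stride) {
  std::vector<Slice> Slices;
  for (std::size_t Col = 0; Col < Cols; ++Col)
    Slices.push_back({Col * Stride, Col * Stride + Rows});
  return Slices;
}

// Every per-column access stays inside the memory covered by the larger one.
inline bool allSlicesInBounds(std::size_t Rows, std::size_t Cols,
                              std::size_t Stride) {
  const std::size_t Extent = accessExtent(Rows, Cols, Stride);
  for (const Slice &S : columnSlices(Rows, Cols, Stride))
    if (S.End > Extent)
      return false;
  return true;
}
```

Under these assumptions every per-column offset falls within the extent of the original access, which is the property that makes marking the GEPs inbounds sound.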
PR: https://github.com/llvm/llvm-project/pull/197710
[LV] Introduce -force-target-supports-gather-scatter-ops testing option (#196947)
This introduces a new force-target-supports-gather-scatter-ops CLI
option for testing. It can be used to show that the lack of
gather/scatter support prevents if-conversion.
[X86] Avoid repeated select masks in avx512 tests (#197886)
Don't reuse the selection masks in unit tests just for expediency;
#197799 will attempt to fold these into single selects.
Also remove an ancient test_vbroadcast test that hasn't actually done
anything since we started using mask vpternlog for mask expansion (and
the test now folds away anyhow).
[mlir][tosa] Use traits to check output type aligns with input type (#193961)
Reduces code duplication and ensures the output shape aligns with the
input shape.
[Syntax] Append EOF token to truncated expanded token stream when the parser halts prematurely (#196861)
Fixes #196244.
This PR addresses cases where this assertion is triggered in
`TokenCollector::Builder::build()`:
https://github.com/llvm/llvm-project/blob/dff356d47cfc4413f78c858dd8339cb1c9fca255/clang/lib/Tooling/Syntax/Tokens.cpp#L715
`TokenCollector` collects the expanded token stream by registering a
token watcher callback in the preprocessor. Normally, the preprocessor
calls the callback for every token up to and including the `tok::eof`
token. However, when the parser hits a hard limit such as exceeding the
maximum function scope depth (this is the case covered by #196244) or
exceeding the bracket depth limit, it bails out via
`Parser::cutOffParsing()`. `cutOffParsing` forces the current token to
`eof`, but the token watcher callback is never called for it. The result
is a truncated token stream.
Fix by checking if `ExpandedTokens` is missing the final `tok::eof`. If
[4 lines not shown]