[ARM][MVE] Combine extract(bitcast(buildvec(extract))) (#196263)
Some of our BUILD_VECTOR lowering on ARM tries to use fp lanes
efficiently, which can leave us with
extract(bitcast(BUILD_VECTOR(extract(bitcast(a)), ..))). We can simplify
that to extract(a).
This helps when node order changes.
[SLP] Vectorize struct-returning intrinsics
Allow SLP to combine, across lanes, calls that return a literal struct
(llvm.sincos, llvm.*.with.overflow, llvm.frexp, ...) into a single
call returning a struct of vectors. The return type is widened from
{T, T, ...} to {<VF x T>, ...} via VectorTypeUtils, and external uses
are served by emitting extractvalue + extractelement.
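As a rough illustration of the type widening described above, the following
C++ sketch models `{float, float}` and `{<VF x float>, <VF x float>}` with
plain structs. All names (`SinCos`, `SinCosVec`, `vectorSinCos`,
`externalCosUse`) are hypothetical and are not part of SLP or
VectorTypeUtils; the loop merely stands in for a single vector call.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Models the scalar literal struct {float, float} returned per lane.
struct SinCos {
  float Sin;
  float Cos;
};

// Models the widened struct of vectors {<VF x float>, <VF x float>}.
template <std::size_t VF> struct SinCosVec {
  std::array<float, VF> Sin;
  std::array<float, VF> Cos;
};

// One scalar call, as seen before vectorization.
inline SinCos scalarSinCos(float X) { return {std::sin(X), std::cos(X)}; }

// Stand-in for the single widened call covering all VF lanes.
template <std::size_t VF>
SinCosVec<VF> vectorSinCos(const std::array<float, VF> &X) {
  SinCosVec<VF> R{};
  for (std::size_t Lane = 0; Lane < VF; ++Lane) {
    R.Sin[Lane] = std::sin(X[Lane]);
    R.Cos[Lane] = std::cos(X[Lane]);
  }
  return R;
}

// An external scalar use of the cosine in one lane: pick the struct field
// (extractvalue), then the lane (extractelement).
template <std::size_t VF>
float externalCosUse(const SinCosVec<VF> &V, std::size_t Lane) {
  assert(Lane < VF && "lane out of range");
  return V.Cos[Lane];
}
```

The point of the sketch is only the shape change: one struct of scalars per
lane becomes one struct whose fields are vectors, and a scalar user outside
the vectorized tree indexes first by field, then by lane.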
Original Pull Request: https://github.com/llvm/llvm-project/pull/195521
Reviewers: hiraditya, RKSimon, bababuck
Pull Request: https://github.com/llvm/llvm-project/pull/196756
[AArch64][llvm] Remove support for FEAT_MPAMv2_VID
The `FEAT_MPAMv2_VID` instructions and system registers introduced in
change d30f18d2c are removed, because they have been removed from the
latest Arm ARM. This doesn't preclude them returning in some form in
future.
Other system registers introduced with `FEAT_MPAMv2` are unaffected and
remain ungated. Since the `+mpamv2` gating is now empty, this change
also removes the superfluous gating code.
Cherry-picked-from: a48159df9
[ARM][MVE] Constant fold PREDICATE_CAST of 0 and 0xffff (#197832)
This allows us to fold away the vselect when we know that the condition
is all true or all false.
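A minimal sketch of the idea, not the actual DAG combine: it assumes the
predicate is modelled as a 16-bit mask constant, and
`foldSelectOnConstantPredicate` is a hypothetical helper name used only for
illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Assumption: a predicate is modelled as a 16-bit mask constant.
// Returns the operand a select folds to when the mask is a known
// all-true (0xffff) or all-false (0) constant, and nothing otherwise.
template <typename T>
std::optional<T> foldSelectOnConstantPredicate(std::uint16_t Mask,
                                               const T &TrueVal,
                                               const T &FalseVal) {
  if (Mask == 0xffff)
    return TrueVal; // every lane takes the true operand
  if (Mask == 0)
    return FalseVal; // every lane takes the false operand
  return std::nullopt; // mixed lanes: the select must stay
}
```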
[OpenMP] Fix missing install-openmp component (#197603)
Summary:
This pattern is consistent throughout all the runtimes and is what the
top-level `install-openmp-<triple>` corresponds to. It should be
provided and used.
[AArch64] Delete llvm/test/CodeGen/AArch64/fptoi-256.ll (NFC) (#197896)
llvm/test/CodeGen/AArch64/fcvt-i256.ll has since been added with the
same and broader coverage.
[libc] Disable GCC 12 waccess passes to fix ICE in environ_internal (#197916)
The waccess pass in GCC 12 consistently crashes with a segmentation fault
when analyzing the memory allocations in environ_internal.cpp. This change
disables the relevant tree-waccess passes for this specific file,
avoiding the ICE without requiring intrusive code refactoring.
Assisted-by: Automated tooling, human reviewed.
[LLD] [MinGW] Implement --{push,pop}-state (#197748)
Implement `--push-state` and `--pop-state` for the MinGW lld driver.
GNU ld already implements these options for MinGW:
```
--push-state Push state of flags governing input file handling
--pop-state Pop state of flags governing input file handling
```
This will align the MinGW frontend's options closer with those of the
ELF frontend and fix issues due to e.g. CMake misdetecting
`--push-state`/`--pop-state` support by accidentally querying the ELF
driver.
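The save/restore semantics can be sketched as a simple stack. This is an
illustration only, not the lld implementation: `InputFlags` and
`InputStateStack` are hypothetical names, and the choice of `-Bstatic` /
`-Bdynamic` and `--whole-archive` / `--no-whole-archive` as the saved flags
is an assumption made for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical subset of the flags that govern input file handling.
struct InputFlags {
  bool Static = false;       // set by -Bstatic, cleared by -Bdynamic
  bool WholeArchive = false; // set by --whole-archive
};

class InputStateStack {
  InputFlags Current;
  std::vector<InputFlags> Saved;

public:
  // Applies one argument. Returns false for an unrecognized argument or a
  // --pop-state with no matching --push-state.
  bool apply(const std::string &Arg) {
    if (Arg == "--push-state") {
      Saved.push_back(Current);
    } else if (Arg == "--pop-state") {
      if (Saved.empty())
        return false;
      Current = Saved.back();
      Saved.pop_back();
    } else if (Arg == "-Bstatic") {
      Current.Static = true;
    } else if (Arg == "-Bdynamic") {
      Current.Static = false;
    } else if (Arg == "--whole-archive") {
      Current.WholeArchive = true;
    } else if (Arg == "--no-whole-archive") {
      Current.WholeArchive = false;
    } else {
      return false;
    }
    return true;
  }

  const InputFlags &current() const { return Current; }
};
```

Flags changed between a push and its matching pop are discarded on the pop,
which is what lets a build system wrap one library in temporary settings
without disturbing the rest of the command line.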
Fixes #131007.
[VPlan] Move call widening decision to VPlan. (NFCI) (#195518)
This patch adds a new makeCallWideningDecisions transform which converts
Call VPInstructions to
VPWidenCallRecipe/VPWidenIntrinsicRecipe/VPReplicateRecipe depending on
their costs.
To compute the costs, static helpers are introduced to re-use the
existing VPlan cost model logic:
* VPWidenIntrinsicRecipe::computeCallCost
* VPReplicateRecipe::computeCallCost
The legacy cost-model logic is still retained for now; we assert that
its decisions match the new ones to make sure we do not miss any edge
cases. The legacy logic will be removed in a follow-up.
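A cost-based choice of this kind can be sketched as below. This is not the
code from the patch: `CallLowering`, `CallCosts` and `chooseCallLowering`
are hypothetical names, the three enumerators only stand in for
VPWidenCallRecipe, VPWidenIntrinsicRecipe and VPReplicateRecipe, and the
tie-breaking (an alternative must be strictly cheaper to win) is an
assumption.

```cpp
#include <cassert>
#include <optional>

// Stand-ins for the three recipe kinds a Call VPInstruction can become.
enum class CallLowering { WidenCall, WidenIntrinsic, Replicate };

// Hypothetical cost inputs; an empty optional means the option is
// unavailable for this call.
struct CallCosts {
  std::optional<unsigned> VectorCall;
  std::optional<unsigned> VectorIntrinsic;
  unsigned Replicate = 0;
};

// Picks the cheapest available lowering, starting from replication, which
// is assumed to always be possible.
inline CallLowering chooseCallLowering(const CallCosts &C) {
  CallLowering Best = CallLowering::Replicate;
  unsigned BestCost = C.Replicate;
  if (C.VectorCall && *C.VectorCall < BestCost) {
    Best = CallLowering::WidenCall;
    BestCost = *C.VectorCall;
  }
  if (C.VectorIntrinsic && *C.VectorIntrinsic < BestCost) {
    Best = CallLowering::WidenIntrinsic;
    BestCost = *C.VectorIntrinsic;
  }
  return Best;
}
```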
PR: https://github.com/llvm/llvm-project/pull/195518
[VPlan] Compute the cost for scalar cmp outside the vector region (#197146)
Currently we don't compute the cost of any scalar compares. Change this
to skip the cost only when the compare is inside the vector region:
compares that are used in the loop exit condition are handled by the
legacy cost model, and this is the simplest way to avoid double-counting
those instructions.
This mainly affects the compare in the middle block, and accounting for
the cost of that can change the required minimum trip count.
[AtomicExpand] Add bitcasts when expanding store atomic vector
AtomicExpand fails for aligned `store atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
ptrtoint + bitcast so that the call can be lowered, mirroring the
load-side handling from #148900.