[compiler-rt][ARM] Optimized FP double <-> single conversion (#179926)
This commit provides assembly versions of the conversions both ways
between double and float.
[SLP] Improve InsertElement scalarization cost modeling
When costing InsertElement tree entries, pass getScalarizationOverhead the
per-lane insert operands via AdjustedVL, set ForPoisonSrc from whether the
base vector is entirely undef, and supply a VectorInstrContext hint derived
from the demanded insert instructions. Move the scalarization cost adjustment
to after InMask is computed so ForPoisonSrc reflects the actual base vector
state.
Reviewers: bababuck, RKSimon, hiraditya
Pull Request: https://github.com/llvm/llvm-project/pull/199514
[VPlan] Construct VPlan1 once, share across buildVPlans calls. (#197276)
Extract the VF-independent VPlan1 setup pipeline (header phis,
simplification, early-exit handling, middle check, loop regions, tail
folding, mask introduction) into a new helper tryToBuildVPlan1().
Construct the initial VPlan1 once and pass it to the repeated
buildVPlans calls.
Note that this means we need to move collectInLoopReductions up. We may
now also construct VPlan1 on code paths where we previously did not,
because we had already failed UserVF validation/selection. I think that
should be fine, as this makes the overall code simpler and the UserVF
code paths are for testing.
PR: https://github.com/llvm/llvm-project/pull/197276
[clangd] Prefer .hpp files over .h with header source switch (#198152)
Previously, the "Switch Between Source/Header" action picked `.h` over
`.hpp` when both files existed next to a `.cpp` file, because `.h` is
listed first in the header-extension list.
This patch reorders `HeaderExtensions` and `SourceExtensions` so the
`C++`-flavored extensions come before `.h` and `.c`. The `C++` flavor is
preferred because (at least in my opinion) more people use `clangd` for
`C++` than for `C` with `.hpp` headers, so switching from `.cpp` should
go to `.hpp`, not `.h`.
This introduces an edge case: switching from `.c` will now go to `.hpp`
instead of `.h`. I think this situation is rarer than having a `.cpp`
next to both `.hpp` and `.h`, since `.h` headers can be used as an
`extern "C"` wrapper of a C++ library.
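The effect of the reordering can be sketched as a first-match lookup over a priority list. This is only an illustration, not the clangd implementation: `HeaderExtensionsSketch`, `pickHeader`, and the use of a `std::set` in place of the file system are all hypothetical, and the exact contents and order of the real `HeaderExtensions` list are not reproduced here.

```cpp
#include <set>
#include <string>

// Hypothetical priority list: C++-flavored header extensions before ".h".
// The real HeaderExtensions list in clangd may contain different entries.
static const char *const HeaderExtensionsSketch[] = {".hpp", ".hh", ".hxx",
                                                     ".h"};

// Returns the first existing header for a source stem such as "foo",
// probing candidates in priority order. ExistingFiles stands in for the
// file system. Returns an empty string when no candidate exists.
std::string pickHeader(const std::string &Stem,
                       const std::set<std::string> &ExistingFiles) {
  for (const char *Ext : HeaderExtensionsSketch) {
    std::string Candidate = Stem + Ext;
    if (ExistingFiles.count(Candidate))
      return Candidate;
  }
  return std::string();
}
```

With both `foo.h` and `foo.hpp` present, the sketch picks `foo.hpp`; with only `foo.h` present, it still falls back to `foo.h`.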
[LV] Handle loop.dependence.mask in verifyLastActiveLaneRecipe() (#199897)
This verification can be called after the alias-mask has been expanded,
so it needs to recognize loop.dependence.mask intrinsics.
[MLIR][AMDGPU] Add permlane16.var and permlanex16.var intrinsic ops (#199501)
## Summary
Add ROCDL and AMDGPU dialect support for the GFX12+ variable-selector
permlane intrinsics (`v_permlane16_var_b32` / `v_permlanex16_var_b32`).
Unlike the existing fixed-selector `permlane16`/`permlanex16` ops where
source-lane indices come from SGPR immediates, the "var" variants take
per-lane source-lane indices from a VGPR, enabling arbitrary per-lane
intra-row and cross-row permutations within a wave32 subgroup.
### ROCDL dialect
- `ROCDL_Permlane16VarOp` → `llvm.amdgcn.permlane16.var`
- `ROCDL_PermlaneX16VarOp` → `llvm.amdgcn.permlanex16.var`
- Both take `(old, src0, src1, fi, boundControl)` with `fi` and
`boundControl` as immediate i1 attrs
### AMDGPU dialect
[11 lines not shown]
[SelectionDAGBuilder] Replace asserts inside LLVM_DEBUG (#199748)
These asserts were inside an LLVM_DEBUG macro, meaning they were very
rarely, if ever, tested. The second assert ("LowerFormalArguments emitted
a value with the wrong type!") would fire in a number of tests, so it has
been removed. The other was replaced with an all_of assert.
Noticed when looking at #198107 / #199412.
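As a rough illustration of the pattern, a per-element check can be folded into a single assert with `std::all_of`, so that it runs in every assertions-enabled build rather than only when debug output is requested. The commit message does not state the actual predicate; the non-null check, `allNonNull`, and `checkEmittedValues` below are assumptions made for this sketch, and it uses `std::all_of` rather than LLVM's range-based `all_of` to stay self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical predicate: true when no entry is a null pointer.
bool allNonNull(const std::vector<const int *> &Values) {
  return std::all_of(Values.begin(), Values.end(),
                     [](const int *V) { return V != nullptr; });
}

// Hypothetical caller: the check sits directly in an assert instead of
// being wrapped in a debug-only block, so it is exercised whenever
// assertions are enabled.
void checkEmittedValues(const std::vector<const int *> &Values) {
  assert(allNonNull(Values) && "emitted a null value");
  (void)Values; // Avoid an unused-parameter warning when NDEBUG is set.
}
```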
[AArch64][TTI][EarlyCSE] Add support for ld1xN and st1xN intrinsics (#198765)
Handle ld1x2, ld1x3, ld1x4, st1x2, st1x3, st1x4 in:
- AArch64TTIImpl::getTgtMemIntrinsic
- AArch64TTIImpl::getOrCreateResultFromMemIntrinsic
This enables EarlyCSE to optimize these NEON load/store intrinsics.
To test the changes, a new testcase (intrinsics-1xN.ll) derived from
llvm/test/Transforms/EarlyCSE/AArch64/intrinsics.ll is added.
[libc++] Remove workarounds for __{add,remove}_pointer on AppleClang (#199821)
We've updated the supported AppleClang version, so we can drop those
workarounds now.
This also removes `__is_referenceable_v`, since it's no longer used.
Revert "[RISCV][CodeGen] Use vzip.vv for e64 interleave shuffles with Zvzip" (#199899)
Reverts llvm/llvm-project#199512
LLVM Buildbot has detected a build error for this PR.
[libc++][NFC] Remove lit annotations for older AppleClang versions (#199817)
We don't support anything older than apple-clang-21, so we can remove
those annotations.
AMDGPU/GlobalISel: Move executeInWaterfallLoop call from lower (#199701)
WFI is an argument to applyMappingSrc and lower, so move the
executeInWaterfallLoop call to after these two return.
Also set the insert point inside executeInWaterfallLoop so that
callers no longer need to set it before calling.
[clang] Add builtin to clear padding bytes (prework for P0528R3) (#75371)
Add builtin to clear padding bytes. This is the pre-work to implement
`std::atomic::compare_exchange_[weak/strong]` that ignores padding bits.
PR draft here: https://github.com/llvm/llvm-project/pull/76180
This PR picks up a patch from 3 years ago:
https://reviews.llvm.org/D87974
That patch no longer works because things have changed quite a lot, so
I've made some changes on top of it.
It handles:
- struct
- builtin types with padding (like `long double` and types with
`__attribute__((ext_vector_type(N)))`)
- _Complex long double
- constant array
[7 lines not shown]