[AArch64] Optimize vector fmul(sitofp/uitofp, 1/2^N) -> scvtf/ucvtf (#141480)
When a vector integer-to-float conversion is followed by a multiply with a
reciprocal power-of-two constant, we can fold both operations into a single
SCVTF or UCVTF instruction with a fixed-point shift operand.
For example, `fmul(sitofp(v2i32 x), <0.5, 0.5>)` becomes `scvtf.2s v0, v0, #1`.
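
The fold is sound because multiplying by an exact power of two does not introduce a second rounding. The following is a minimal Python sketch of that argument for a single f32 lane, not the LLVM implementation. The helper names (`to_f32`, `sitofp_then_fmul`, `scvtf_fixed`) are hypothetical, and the range of `n` is chosen only for illustration.

```python
import struct


def to_f32(value: float) -> float:
    """Round a Python float (a double) to the nearest IEEE-754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def sitofp_then_fmul(x: int, n: int) -> float:
    """Model fmul(sitofp(x), 1/2^n) for one f32 lane."""
    converted = to_f32(float(x))          # sitofp: the only rounding step
    return to_f32(converted * 2.0 ** -n)  # scaling by 2^-n is exact here


def scvtf_fixed(x: int, n: int) -> float:
    """Model a fixed-point convert: x is read as having n fractional bits."""
    return to_f32(x / 2 ** n)


# Compare both paths on a few signed 32-bit inputs, including values that
# are not exactly representable in f32.
samples = [0, 1, -1, 3, 16777217, 123456789, 2**31 - 1, -(2**31)]
all_equal = all(
    sitofp_then_fmul(x, n) == scvtf_fixed(x, n)
    for x in samples
    for n in range(1, 33)
)
print(all_equal)
```

Both paths round exactly once, so they agree lane by lane. This is the property that lets the two-instruction sequence collapse into one convert with a shift operand.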
This is a reworked version with several improvements over the original
submission:
- Rewrite the C++ operand matcher to share implementation with the existing
`SelectCVTFixedPointVec` (MOVIshift, FMOV, and DUP handling with correct
truncation for f16)
- Add `uitofp`/`ucvtf` patterns via a `CVTFRecipPat` multiclass
- Add full GlobalISel support (`GIComplexOperandMatcher` + renderer)
Supported vector types: `v2f32`, `v4f32`, `v2f64`, `v4f16`, `v8f16`.
Fixes #94909
[SLP] Reuse diamond-matched gather across asymmetric reorder/reuse
processBuildVector's perfect-diamond match used Entries.front()->isSame(
E->Scalars) only, missing matches where E carries the reorder/reuse and
the entry is canonical. Two TreeEntries with the same effective scalar
layout but different raw orderings then build independent gathers; one
emits a fill-in shufflevector for reused lanes while the other leaves
poison there.
Fixes #194191.
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/194247
[flang][OpenMP] Rename "declare constructs" to directives, NFC (#194240)
Only executable directives are constructs in OpenMP, so, for example,
"declare mapper" is not a construct.
Apply
find flang/ \( -name '*.cpp' -o -name '*.h' -o -name '*.f90' \) -exec sed \
-i -E -e 's/OpenMP(Declare[A-Za-z]*)Construct\b/Omp\1Directive/g' {} \;
plus local formatting updates as needed.
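
To see what the substitution does to individual identifiers, here is a small Python sketch that applies the same regular expression. Python's `re` stands in for `sed -E` here, and the sample identifiers are used only for illustration.

```python
import re

# Same pattern and replacement as the sed command above.
PATTERN = re.compile(r"OpenMP(Declare[A-Za-z]*)Construct\b")


def rename(text: str) -> str:
    """Rewrite OpenMPDeclare...Construct names to OmpDeclare...Directive."""
    return PATTERN.sub(r"Omp\1Directive", text)


for name in (
    "OpenMPDeclareMapperConstruct",   # renamed
    "OpenMPExecutableConstruct",      # not a "Declare" name: left alone
    "OpenMPDeclareMapperConstructs",  # no word boundary after "Construct": left alone
):
    print(name, "->", rename(name))
```

The `\b` keeps the rewrite from touching longer identifiers that merely start with a matching name.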
[VPlan] Verify and handle FOR legality during header phi creation (NFC). (#191298)
Move the logic to validate FOR users and introduce the split directly to
header phi creation. It makes sense to introduce the header phi and the
splice together.
It also means sinking only needs to be done once, instead of once for each
VPlan.
Depends on https://github.com/llvm/llvm-project/pull/190681.
PR: https://github.com/llvm/llvm-project/pull/191298
[LoongArch] Add support for vector add/sub on vNi128 types
Legalize ADD/SUB for v1i128 and v2i128 and extend LSX/LASX instruction
selection patterns to support the Q element size. Update register classes
to include vNi128 types and add codegen tests to verify lowering to
VADD.Q/XVADD.Q and VSUB.Q/XVSUB.Q.
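
As a reference for what the new patterns are expected to compute, here is a minimal Python model of lane-wise 128-bit add and subtract with wraparound. The function names are hypothetical, and the sketch models only the arithmetic, not the instruction selection.

```python
MASK128 = (1 << 128) - 1  # all-ones value for one i128 lane


def vadd_q(a: list[int], b: list[int]) -> list[int]:
    """Lane-wise add on i128 elements, wrapping modulo 2^128."""
    return [(x + y) & MASK128 for x, y in zip(a, b)]


def vsub_q(a: list[int], b: list[int]) -> list[int]:
    """Lane-wise subtract on i128 elements, wrapping modulo 2^128."""
    return [(x - y) & MASK128 for x, y in zip(a, b)]


# A v2i128 example: the first lane overflows and wraps to zero.
print(vadd_q([MASK128, 1], [1, 2]))
```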
Revert "[ARM] Fold SELECT (AND(X,1) == 0), C1, C2 -> XOR(C1,AND(NEG(AND(X,1)),XOR(C1,C2)) in Thumb1 (#185898)" (#194230)
This reverts commit 1823355d06b854854701a8ba430aa1f6be9994f4 due to
performance regressions in benchmarks.
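
For context, the reverted fold rests on a bitwise identity. The Python sketch below checks only that the two forms produce the same value on 32-bit inputs. It says nothing about the performance regressions that motivated the revert, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def select_form(x: int, c1: int, c2: int) -> int:
    """SELECT (AND(X,1) == 0), C1, C2."""
    return c1 if (x & 1) == 0 else c2


def xor_form(x: int, c1: int, c2: int) -> int:
    """XOR(C1, AND(NEG(AND(X,1)), XOR(C1,C2))) on 32-bit values."""
    bit = x & 1
    mask = (-bit) & MASK32  # 0 when the low bit is clear, all-ones when set
    return c1 ^ (mask & (c1 ^ c2))


# Spot-check the identity on a few inputs.
cases = [(0, 5, 9), (1, 5, 9), (2, 0, MASK32), (7, 0x1234, 0xABCD)]
print(all(select_form(*c) == xor_form(*c) for c in cases))
```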
[SLP] Fix spill-cost cache lookup and predecessor scan
A cached intra-block scan that stopped at a call or budget limit only
proves the sub-range below the stop point is call-free; do not reuse
the cached bit for queries whose First lies above it. Also switch the
cross-block predecessor scan to "exists a call-free backward path"
semantics, skip blocks strictly dominated by Root, and memoize only
the (Root, OpParent) key. Fixes a false-positive spill cost that was
blocking profitable vectorization.
Reviewers: RKSimon, hiraditya, bababuck
Pull Request: https://github.com/llvm/llvm-project/pull/192709
[NVPTX] Scalarize `contract FMUL v2f32` to enable FMA fusion (#192815)
SM100+ legalizes `FMUL v2f32`, blocking the scalar FADD->FMA combiner.
Scalarize it when `contract` (or `allowFMA()`) is set and every lane
feeds a single `contract` FADD.
[clang][bytecode] Add new IntegralType for function addresses (#194206)
We used to use just `::Address` for functions, which later caused
problems because we cast the pointer to `ValueDecl*` and passed it to
`Program::getOrCreateGlobal()`, which of course doesn't work.
[clang][bytecode] Fix some problems with ptr-to-int casts (#193988)
1) When doing integral casts on a pointer-casted-to-integral, check the
bitwidth we're casting _to_, not the one we're casting _from_.
2) When the pointer we're casting to an integral is a dummy pointer,
don't forget to check the bitwidth.
[MC] Add MCTargetOptions to MCAsmInfo constructor. NFC (#194200)
Since #180464 the canonical MCTargetOptions pointer is stored in
MCAsmInfo, but it is bound after construction via `setTargetOptions`
called from TargetRegistry::createMCAsmInfo.
Direct constructions in unit tests can leave the pointer null, leading
to a runtime assert failure. Add MCTargetOptions to every MCAsmInfo
subclass constructor, store it as a reference in MCAsmInfo, and remove
`setTargetOptions()`.
[libcxx] Include python3-yaml and rsync in container (#194182)
rsync is needed for installing the kernel headers for the libc build.
The yaml python package is needed for libc's hdrgen. This means we no
longer have to install these utilities at runtime.
They should be small enough relative to the existing container image
size to not really have an impact in that regard.
[mlir][arith] Fold subi(a, subi(a, b)) to b (#194134)
Add a folder for `arith.subi` that simplifies `subi(a, subi(a, b))` to
`b` using the algebraic identity `a - (a - b) = b`.
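
The identity also holds under the wraparound semantics of fixed-width integers, which is what makes the fold safe without overflow checks. Here is a minimal Python sketch, with `subi` as a hypothetical model of wrapping subtraction rather than the MLIR folder itself.

```python
def subi(a: int, b: int, width: int = 32) -> int:
    """Model fixed-width integer subtraction with wraparound."""
    return (a - b) % (1 << width)


# a - (a - b) == b even when the inner subtraction wraps.
a, b = 3, 0xFFFFFFF0
inner = subi(a, b)        # wraps around modulo 2^32
print(subi(a, inner) == b)
```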
[MLIR][XeGPU] Remove offsets from create_nd_tdesc & remove update_nd_offset, move offsets to load/store/prefetch ops (#193330)
This PR removes the optional offsets/const_offsets operands on
xegpu.create_nd_tdesc and instead mandates offsets directly on the
consuming load, store, and prefetch ops. It also deprecates the
update_nd_offset op.
[libclc] Only check the triple architecture for libclc (#194149)
Summary:
Previously, `nvptx64--` would reject `nvptx64-unknown-unknown`. There were
two options: either normalize all the triples in CMake, or just check the
architecture. I went with the latter because it makes it easier for
people to pass different values.
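
The difference between comparing whole triples and comparing only the architecture can be sketched as follows. This is a Python illustration with hypothetical helper names. The actual check lives in libclc's CMake logic and may differ in detail.

```python
def triple_arch(triple: str) -> str:
    """Return the architecture component, i.e. the text before the first '-'."""
    return triple.split("-", 1)[0]


def same_arch(lhs: str, rhs: str) -> bool:
    """Compare two target triples by architecture only."""
    return triple_arch(lhs) == triple_arch(rhs)


# Full-string comparison treats these spellings as different targets,
# while the architecture-only check accepts both.
print("nvptx64--" == "nvptx64-unknown-unknown")
print(same_arch("nvptx64--", "nvptx64-unknown-unknown"))
```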