[SLP] Prefer outer binary op when opcode groups tie in buildInstructionsState
When VL contains instructions of different opcodes with equal counts,
the tie-breaking in buildInstructionsState could replace an outer
operation (e.g., fadd) with an inner one (e.g., fmul) that appears as
its direct operand, depending on SmallMapVector iteration order. Add a
check: if the current MainOp is a BinaryOperator with a direct operand
matching the challenger partition's opcode in the same block, keep
MainOp instead of switching to the inner operation.
Partially fixes #43353
Reviewers: bababuck, hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/198194
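The tie-break rule in the entry above can be modeled with a small sketch. This is a toy model, not the code in buildInstructionsState: `Inst`, `keep_main_on_tie` and `pick_main_op` are hypothetical names, and the assumption that the challenger wins a tie when the guard does not apply only stands in for the order-dependent behavior the commit describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    # Hypothetical stand-in for an LLVM instruction.
    opcode: str
    block: str
    operands: tuple = ()      # direct operands, as Inst objects
    is_binary: bool = True    # models isa<BinaryOperator>


def keep_main_on_tie(main, challenger_opcode):
    """True if `main` is a binary op with a direct operand of the
    challenger's opcode in the same block (the guard from the commit)."""
    if not main.is_binary:
        return False
    return any(
        op.opcode == challenger_opcode and op.block == main.block
        for op in main.operands
    )


def pick_main_op(main, challenger, main_count, challenger_count):
    """Choose the main opcode between two partitions of VL."""
    if challenger_count > main_count:
        return challenger
    if challenger_count == main_count and not keep_main_on_tie(
        main, challenger.opcode
    ):
        # Assumption: without the guard, a tie may switch to the challenger.
        return challenger
    return main
```

With an `fadd` whose direct operand is an `fmul` in the same block, a 2-vs-2 tie keeps the `fadd`; an unrelated challenger opcode can still replace it.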
[SLP] Support ordered fadd reduction via reduction intrinsics
Add matchOrderedReduction() to recognize linearized ordered fadd chains
(both LHS- and RHS-associated) and tryToReduceOrdered() to vectorize
them using ordered reduction intrinsics (llvm.vector.reduce.fadd).
Previously, the SLP vectorizer could only vectorize ordered reductions
by keeping the original scalar chain and emitting extractelement
instructions. The new path replaces the scalar chain with a vector
ordered reduction intrinsic (where profitable), which allows the backend to lower it
more efficiently.
Reviewers: hiraditya, RKSimon, bababuck
Pull Request: https://github.com/llvm/llvm-project/pull/189451
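As a reminder of why an ordered fadd chain cannot simply be reassociated, the sketch below compares a strict left-to-right reduction (the sequential semantics of llvm.vector.reduce.fadd without the reassoc flag) against a pairwise tree reduction. The helper names and the input values are illustrative assumptions, not part of the patch.

```python
from functools import reduce


def ordered_fadd(start, values):
    # Strict left-to-right accumulation: ((start + v0) + v1) + ...
    return reduce(lambda acc, v: acc + v, values, start)


def pairwise_fadd(values):
    # Reassociated tree reduction; only valid when reassociation is allowed.
    # Expects a non-empty sequence.
    vals = list(values)
    while len(vals) > 1:
        if len(vals) % 2:
            vals.append(0.0)  # pad odd lengths with the additive identity
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0]


# Values chosen so that rounding makes the evaluation order visible.
chain = [1e16, 1.0, -1e16, 1.0]
ordered = ordered_fadd(0.0, chain)
tree = pairwise_fadd(chain)
```

On IEEE-754 doubles the two results differ for this input, which is why the ordered form has to be preserved when fast-math reassociation is not permitted.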
[Flang][OpenMP][NFC] Track Objects for BlockArgs (#197442)
When lowering a BlockArg in OpenMP, only the symbol is currently tracked.
This can cause issues later on, because information about the
expression may be lost. For example, an ArrayElement will be
represented by its symbol, which in this case is the full array. This is
not ideal, as it's just the ArrayElement that is intended to be represented.
Now, the object is tracked instead of the symbol. For cases where the
symbol is required, an appropriate API is available to retrieve this
information. This change opens the ability to better handle lowering of
expressions such as array elements.
Assisted-by: Codex
[AtomicExpand] Add bitcasts when expanding store atomic vector
AtomicExpand fails for aligned `store atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
ptrtoint + bitcast so that the call can be lowered, mirroring the
load-side handling from #148900.
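The bitcast idea can be illustrated outside of LLVM: the same bits are reinterpreted as one integer of the vector's total width, so an integer-typed call can carry them. This is only a model of the reinterpretation, assuming little-endian layout with element 0 in the low bits; the function names are hypothetical and nothing here reproduces the AtomicExpand code.

```python
import struct


def bitcast_vector_to_int(values, fmt="f"):
    """Reinterpret a vector of floats as one integer; returns (bits, width)."""
    raw = struct.pack(f"<{len(values)}{fmt}", *values)
    return int.from_bytes(raw, "little"), len(raw) * 8


def bitcast_int_to_vector(bits, count, fmt="f"):
    """Inverse of bitcast_vector_to_int for `count` elements."""
    size = struct.calcsize(f"<{count}{fmt}")
    return list(struct.unpack(f"<{count}{fmt}", bits.to_bytes(size, "little")))


# A <2 x float> value becomes a single 64-bit integer and back again.
bits, width = bitcast_vector_to_int([1.0, -2.0])
roundtrip = bitcast_int_to_vector(bits, 2)
```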
[X86] Cast atomic vectors in IR to support floats
Extend the X86 `alignedstore` PatFrag to also match `atomic_store`
with vector-size alignment, so existing MOVAPS/MOVAPD/MOVDQA-family
aligned-store patterns cover 128-bit aligned vector atomic stores on
SSE/AVX/AVX-512 without per-type duplicates. `<4 x float>`,
`<2 x double>`, `<2 x i64>`, `<4 x i32>`, `<8 x half>`, `<8 x bfloat>`
all codegen to a single `movaps`/`movapd` on AVX+ via this.
Adds v8f16/v8bf16 bitconvert variants to the widen-path
`atomic_store_32` / `atomic_store_64` patterns so `<2 x half>`,
`<2 x bfloat>`, `<4 x half>`, `<4 x bfloat>` atomic stores reaching
the PR4 widen path also collapse to a single instruction on AVX+
targets.
Vectors whose `getTypeAction` is split rather than widen still rely
on PR6's `SplitVecOp_ATOMIC_STORE`; that path bitcasts the vector
to a scalar integer and issues an integer `atomic_store_N`, picked
up by the pre-existing scalar atomic-store patterns. The two
[4 lines not shown]
[SelectionDAG] Split vector types for atomic store
Vector types that aren't widened are split so that a single ATOMIC_STORE
is issued for the entire vector at once. This enables SelectionDAG to
translate vectors with element types bfloat and half.
[lldb] Fix no compile unit crash. (#195853)
This crash happens in lldb-dap when hovering over instruction
addresses in a frame that does not have debug information.
[TableGen] Fix getting weights of register classes (#198328)
The first member can be an artificial register, so we have to find a
non-artificial one to query its weight.
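A minimal sketch of the lookup described above, assuming a register class is just an ordered list of members with an `artificial` flag. `Reg` and `class_weight` are hypothetical names, and returning `None` when every member is artificial is an assumption, not something the commit states.

```python
from dataclasses import dataclass


@dataclass
class Reg:
    # Hypothetical stand-in for a TableGen register record.
    name: str
    weight: int
    artificial: bool = False


def class_weight(members):
    """Weight of the first non-artificial member, or None if there is none."""
    for reg in members:
        if not reg.artificial:
            return reg.weight
    return None  # assumption: no usable member to query
```

Querying `members[0]` directly would return the artificial register's weight whenever it happens to come first.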
[libc] Add regex_macros dependency to regex header (#198453)
Added the regex_macros dependency to the regex header target.
regex-macros.h was not being installed when regex entrypoints were
enabled.
Assisted-by: Automated tooling, human reviewed.
[lldb] Fix wrong buffer size when fetching Objective-C classes (#197389)
LLDB calls objc_getRealizedClassList_trylock to fetch the list of
realized Objective-C classes.
Jim spotted that we currently pass the buffer length in *bytes*, when
this API actually takes the buffer length in number of elements. This
lets the Objective-C runtime write more memory than we allocated
for it, which can crash the function-calling expression and
leave the Objective-C runtime mutex locked.
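The arithmetic behind the bug, as a sketch. It assumes 8-byte class pointers and hypothetical helper names; the overrun figure is an upper bound that is only reached if the runtime actually has that many realized classes to write.

```python
POINTER_SIZE = 8  # assumption: 64-bit target, one pointer per class entry


def allocation_size(num_classes):
    """Bytes allocated for a buffer of `num_classes` class pointers."""
    return num_classes * POINTER_SIZE


def capacity_in_elements(buffer_bytes):
    """The length the API expects: number of elements, not bytes."""
    return buffer_bytes // POINTER_SIZE


def max_overrun_bytes(buffer_bytes, claimed_len):
    """Bytes the callee may write past the end if it trusts `claimed_len`
    as an element count."""
    return max(0, claimed_len * POINTER_SIZE - buffer_bytes)


buf = allocation_size(100)                                       # 800 bytes
wrong = max_overrun_bytes(buf, buf)                              # bytes passed as count
right = max_overrun_bytes(buf, capacity_in_elements(buf))        # element count passed
```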
[AArch64][SVE] Use truncating stores whenever possible (#196029)
For fixed-length SVE and fixed-length vectors x/y, fold
```
store(concat_vector(truncate(x), truncate(y)))
  --> store(truncate(x))
      store(truncate(y))
```
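The fold above preserves the bytes written to memory, which the following model checks. It assumes little-endian element layout and uses hypothetical helpers; it says nothing about how the AArch64 backend implements the combine.

```python
def trunc(vec, bits):
    """Truncate every element to its low `bits` bits."""
    mask = (1 << bits) - 1
    return [v & mask for v in vec]


def store(mem, offset, vec, bits):
    """Store elements of `bits` width contiguously at `offset`."""
    width = bits // 8
    for i, v in enumerate(vec):
        start = offset + i * width
        mem[start:start + width] = v.to_bytes(width, "little")


def store_concat_of_truncs(mem, x, y, bits):
    # Original form: truncate both halves, concatenate, store once.
    store(mem, 0, trunc(x, bits) + trunc(y, bits), bits)


def two_truncating_stores(mem, x, y, bits):
    # Folded form: one truncating store per half, the second at an offset.
    store(mem, 0, trunc(x, bits), bits)
    store(mem, len(x) * bits // 8, trunc(y, bits), bits)


x, y = [0x1234, 0xABCD], [0x00FF, 0x7F01]
a, b = bytearray(4), bytearray(4)
store_concat_of_truncs(a, x, y, 8)
two_truncating_stores(b, x, y, 8)
```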
[Flang][Driver] Add per-target search path for modules (#196558)
Adds the version- and target-specific path
../lib/clang/<version>/finclude/flang/<target>
to the intrinsic module search path in addition to
../finclude/flang
with the former taking precedence if a module file should exist in both.
The version/target-specific path is added by the driver by passing
`-fintrinsic-modules-path` to the `-fc1` invocation. This is consistent
with gfortran and the usual pattern that the driver resolves paths into
the resource path, not the frontend.
This PR adds nothing into that directory, which will be done in #171515.
Extracted out of #171515 as requested by
[4 lines not shown]
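The precedence rule for the two intrinsic module directories in the Flang driver entry above can be sketched as a first-hit search. Everything concrete here is an assumption for illustration: that the relative paths are resolved against the driver's binary directory, the version string "22", the target triple, and the helper names.

```python
from pathlib import Path


def intrinsic_module_dirs(bin_dir, version, target):
    """Search order: version/target-specific directory first, then generic."""
    base = Path(bin_dir)
    return [
        base / ".." / "lib" / "clang" / version / "finclude" / "flang" / target,
        base / ".." / "finclude" / "flang",
    ]


def find_module(name, search_dirs, exists=Path.is_file):
    """Return the first existing candidate, so earlier directories win."""
    for directory in search_dirs:
        candidate = directory / name
        if exists(candidate):
            return candidate
    return None


# Hypothetical install location, version and target.
dirs = intrinsic_module_dirs("/opt/llvm/bin", "22", "x86_64-unknown-linux-gnu")
```

The `exists` parameter is there only so the lookup can be exercised without touching the file system.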
[MIPS][GlobalISel] Remove dependency on legal ruleset (#197379)
This fills in always legal rules, to remove the dependency on the legacy
ruleset. This is not guaranteed to be all the rules, just the ones that
appear in tests.
[MLIR][NVGPU] Use NVVM enums in NVGPU dialect (#195812)
Updates the `nvgpu.rcp` Op to use the NVVM `FPRoundingModeAttr`
attribute instead of redefining the attribute in the NVGPU dialect.
[LoopPeel] Peel last iteration to enable load widening
In loops that contain multiple consecutive small loads (e.g., three i8
loads covering 3 bytes), peeling the last iteration makes it safe to read
beyond the accessed region, enabling the use of a wider load (e.g., i32)
for the other N-1 iterations.
Patterns such as:
```
%a = load i8, ptr %p
%b = load i8, ptr %p+1
%c = load i8, ptr %p+2
...
%p.next = getelementptr i8, ptr %p, 3
```
can be transformed to:
```
%wide = load i32, ptr %p ; Read 4 bytes
[9 lines not shown]
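The effect of peeling in the LoopPeel entry above can be modeled on a byte buffer. This sketch assumes a 3-byte stride, little-endian extraction, and a loop body that just sums the three bytes (the real body is elided in the commit message); the function names are hypothetical.

```python
def sum_triples_scalar(data):
    """Reference loop: three i8 loads per iteration."""
    total = 0
    for p in range(0, len(data), 3):
        total += data[p] + data[p + 1] + data[p + 2]
    return total


def sum_triples_peeled(data):
    """First N-1 iterations use one 4-byte read; the last one is peeled.
    Assumes len(data) is a multiple of 3."""
    n = len(data) // 3
    total = 0
    for i in range(n - 1):
        p = 3 * i
        # The 4th byte belongs to the next iteration, so the read stays in bounds.
        wide = int.from_bytes(data[p:p + 4], "little")
        total += (wide & 0xFF) + ((wide >> 8) & 0xFF) + ((wide >> 16) & 0xFF)
    if n:
        # Peeled last iteration: narrow loads only, nothing is read past the end.
        p = 3 * (n - 1)
        total += data[p] + data[p + 1] + data[p + 2]
    return total


data = bytes(range(1, 13))  # four 3-byte groups
```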
[VPlan] Expand simple SCEVs directly to VPInstructions. (#189455)
Add initial simple SCEV expansion directly to VPInstructions. To start
with, just support expanding SCEV expressions for the vector step (VF *
UF). This requires expanding VScale, constants and multiply expressions.
This enables CSE for some redundant vscale calls as a first step,
and also enables expanding SCEV expressions in blocks other than the
header in follow-ups. For example, this could be useful to avoid some
code movement with https://github.com/llvm/llvm-project/pull/189372.
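A toy model of the idea: expanding a small expression tree (vscale, constants, multiplies) into a flat list of instructions, with a cache so that repeated subexpressions such as vscale are emitted once. The tuple encoding, the textual "recipes", and the function names are assumptions for illustration and do not mirror the VPlan data structures.

```python
def expand(expr, recipes, cache):
    """Expand `expr` into `recipes`, returning the index of its value.
    `expr` is ("vscale",), ("const", n) or ("mul", lhs, rhs)."""
    if expr in cache:
        return cache[expr]  # CSE: reuse the already-emitted value
    kind = expr[0]
    if kind == "vscale":
        text = "vscale"
    elif kind == "const":
        text = f"const {expr[1]}"
    elif kind == "mul":
        lhs = expand(expr[1], recipes, cache)
        rhs = expand(expr[2], recipes, cache)
        text = f"mul %{lhs}, %{rhs}"
    else:
        raise ValueError(f"unsupported expression kind: {kind}")
    index = len(recipes)
    recipes.append(text)
    cache[expr] = index
    return index


def vector_step(min_vf, uf):
    """VF * UF for a scalable VF, written as (vscale * min_vf) * uf."""
    return ("mul", ("mul", ("vscale",), ("const", min_vf)), ("const", uf))


recipes, cache = [], {}
step = expand(vector_step(4, 2), recipes, cache)
vf = expand(("mul", ("vscale",), ("const", 4)), recipes, cache)  # cache hit
```

The second `expand` call emits nothing new, which is the CSE effect on redundant vscale computations that the commit mentions.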