[SLP] Prefer outer binary op when opcode groups tie in buildInstructionsState
When VL contains instructions of different opcodes with equal counts,
the tie-breaking in buildInstructionsState could replace an outer
operation (e.g., fadd) with an inner one (e.g., fmul) that appears as
its direct operand, depending on SmallMapVector iteration order. Add a
check: if the current MainOp is a BinaryOperator with a direct operand
matching the challenger partition's opcode in the same block, keep
MainOp instead of switching to the inner operation.
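
The check described above can be sketched in isolation. This is a minimal model, not the SLP vectorizer code: the `Inst` struct and `keepMainOpOnTie` are hypothetical names introduced for illustration, and a null operand stands in for any operand that is not an instruction.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR instruction; not the LLVM Instruction class.
struct Inst {
  std::string Opcode;                 // e.g. "fadd", "fmul"
  int Block;                          // id of the enclosing basic block
  bool IsBinaryOp;                    // true for two-operand arithmetic ops
  std::vector<const Inst *> Operands; // nullptr models a non-instruction operand
};

// On a tie between opcode groups, keep the current main op if it is a binary
// operator and one of its direct operands, in the same block, has the
// challenger partition's opcode.
bool keepMainOpOnTie(const Inst &MainOp, const std::string &ChallengerOpcode) {
  assert(!ChallengerOpcode.empty() && "challenger opcode must be set");
  if (!MainOp.IsBinaryOp)
    return false;
  for (const Inst *Op : MainOp.Operands) {
    if (Op && Op->Block == MainOp.Block && Op->Opcode == ChallengerOpcode)
      return true;
  }
  return false;
}
```

With an `fadd` whose operand is an `fmul` in the same block, the sketch keeps the `fadd` as the main op; if the `fmul` lives in another block, the tie is left to the existing logic.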
Partially fixes #43353
Reviewers: bababuck, hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/198194
evdev: create sysctl entries before cdev to close userspace race
evdev_register_common() was creating the character device before
registering the kern.evdev.input.N.* sysctl entries. The moment
the cdev appears, devd fires a CREATE event for /dev/input/eventN,
and userspace libraries (e.g. libudev-devd) immediately call
sysctlbyname("kern.evdev.input.N.name") to enumerate device
capabilities. With the old ordering, that call could arrive before
the sysctl tree was populated, causing it to fail. The result was
that the device was not recognised by the input stack, leaving
keyboards and other HID devices non-functional after plug-in or
resume from suspend.
Fix the race by calling evdev_sysctl_create() before
evdev_cdev_create(). On cdev failure, free the already-registered
sysctl context with sysctl_ctx_free() to avoid leaking it.
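
The ordering and the cleanup path can be modelled outside the kernel. This is a rough sketch under stated assumptions: `Device`, `sysctlCreate`, `sysctlFree`, `cdevCreate` and `registerDevice` are hypothetical stand-ins, not the FreeBSD functions, and the `CdevFails` flag only simulates a cdev creation error.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a device being registered; not kernel code.
struct Device {
  bool SysctlReady = false;
  bool CdevVisible = false;
  std::vector<std::string> Log; // records the order of operations
};

void sysctlCreate(Device &D) {
  D.SysctlReady = true;
  D.Log.push_back("sysctl_create");
}

void sysctlFree(Device &D) {
  D.SysctlReady = false;
  D.Log.push_back("sysctl_free");
}

// Returns 0 on success, non-zero on failure (CdevFails simulates an error).
int cdevCreate(Device &D, bool CdevFails) {
  if (CdevFails)
    return 1;
  // Userspace may query the sysctl tree as soon as the node is visible.
  assert(D.SysctlReady && "sysctl tree must exist before the cdev appears");
  D.CdevVisible = true;
  D.Log.push_back("cdev_create");
  return 0;
}

// Sysctl entries first, then the cdev; undo the sysctl step on cdev failure.
int registerDevice(Device &D, bool CdevFails) {
  sysctlCreate(D);
  int Ret = cdevCreate(D, CdevFails);
  if (Ret != 0)
    sysctlFree(D);
  return Ret;
}
```

In this model the assertion inside `cdevCreate` would fire under the old ordering, which is the window the commit closes.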
[SLP] Support ordered fadd reduction via reduction intrinsics
Add matchOrderedReduction() to recognize linearized ordered fadd chains
(both LHS- and RHS-associated) and tryToReduceOrdered() to vectorize
them using ordered reduction intrinsics (llvm.vector.reduce.fadd).
Previously, the SLP vectorizer could only vectorize ordered reductions
by keeping the original scalar chain and emitting extractelement
instructions. The new path replaces the scalar chain with a vector
ordered reduction intrinsic (where profitable), which allows the
backend to lower it more efficiently.
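
To illustrate why the reduction has to stay ordered, here is a small scalar model in plain C++, not LLVM code. `orderedFAdd` follows the sequential semantics of llvm.vector.reduce.fadd without reassociation, and `pairwiseFAdd4` is a hypothetical reassociated sum used only for comparison. The sketch assumes IEEE-754 single precision evaluated without excess precision or fast-math.

```cpp
#include <cassert>
#include <vector>

// Sequential (ordered) sum: ((Start + V[0]) + V[1]) + ...
float orderedFAdd(float Start, const std::vector<float> &V) {
  float Acc = Start;
  for (float X : V)
    Acc = Acc + X;
  return Acc;
}

// A reassociated sum of exactly four values: (V[0] + V[1]) + (V[2] + V[3]).
float pairwiseFAdd4(const std::vector<float> &V) {
  assert(V.size() == 4 && "this helper only handles four elements");
  return (V[0] + V[1]) + (V[2] + V[3]);
}
```

For the input {1e8f, 1.0f, -1e8f, 1.0f} the two functions disagree under those assumptions, because adding 1.0f to 1e8f is lost to rounding. An ordered reduction must therefore reproduce the result of the original scalar chain rather than regroup the additions.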
Reviewers: hiraditya, RKSimon, bababuck
Pull Request: https://github.com/llvm/llvm-project/pull/189451
[Flang][OpenMP][NFC] Track Objects for BlockArgs (#197442)
When lowering a BlockArg in OpenMP, currently only the symbol is tracked.
This can cause issues later, as information relating to an expression
may be lost. For example, an ArrayElement will be represented by its
symbol, which in this case is the full array. This is not ideal, as it
is just the ArrayElement that is intended to be represented.
Now, the object is tracked instead of the symbol. For cases where the
symbol is required, an appropriate API is available to retrieve it.
This change opens up the ability to better handle lowering of
expressions such as array elements.
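
The difference between tracking a symbol and tracking an object can be sketched with toy types. These are hypothetical illustrations, not the Flang lowering classes: `Symbol`, `Object`, `symbol()` and `isArrayElement()` are invented names, and integer subscripts stand in for the designator expression.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical symbol: just a name.
struct Symbol {
  std::string Name;
};

// Hypothetical object: the symbol plus the detail that the symbol alone loses.
struct Object {
  const Symbol *Sym;
  std::vector<int> Subscripts; // empty means the whole variable

  // Accessor for callers that only need the symbol.
  const Symbol &symbol() const {
    assert(Sym && "object must refer to a symbol");
    return *Sym;
  }

  bool isArrayElement() const { return !Subscripts.empty(); }
};
```

In this model, an object for a whole array and an object for one of its elements share the same symbol, but only the object can tell the two apart.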
Assisted-by: Codex
emulators/fbsd-duckstation: Fix build on 16-CURRENT
Shipped fmt does not build with clang 21 and consteval enabled:
/wrkdirs/usr/ports/emulators/fbsd-duckstation/work/duckstation-0.1-6937/dep/fmt/src/os.cc:172:35: error: call to consteval function 'fmt::basic_format_string<char, const char *>::basic_format_string<FMT_COMPILE_STRING, 0>' is not a constant expression
172 | FMT_THROW(system_error(errno, FMT_STRING("cannot open file {}"),
| ^
/wrkdirs/usr/ports/emulators/fbsd-duckstation/work/duckstation-0.1-6937/dep/fmt/include/fmt/format.h:1905:23: note: expanded from macro 'FMT_STRING'
1905 | #define FMT_STRING(s) FMT_STRING_IMPL(s, fmt::detail::compile_string, )
| ^
[...]
so disable consteval to work around the problem.
[AtomicExpand] Add bitcasts when expanding store atomic vector
AtomicExpand fails for aligned `store atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
ptrtoint + bitcast so that the call can be lowered, mirroring the
load-side handling from #148900.
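
The idea of reinterpreting a vector as a same-sized integer so that an integer atomic store can carry it can be modelled in plain C++. This is only an analogy for the IR-level bitcast, not the AtomicExpand code: `Float2`, `toBits`, `fromBits` and `atomicStoreFloat2` are hypothetical names, and `std::atomic<std::uint64_t>` stands in for the sized integer store.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical two-lane float vector, 8 bytes in total.
struct Float2 {
  float Lane[2];
};

static_assert(sizeof(Float2) == sizeof(std::uint64_t), "sizes must match");

// Reinterpret the vector's bytes as one 64-bit integer.
std::uint64_t toBits(const Float2 &V) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &V, sizeof(Bits));
  return Bits;
}

// Reinterpret a 64-bit integer as the vector again.
Float2 fromBits(std::uint64_t Bits) {
  Float2 V;
  std::memcpy(&V, &Bits, sizeof(V));
  return V;
}

// Store the whole vector with a single integer atomic store.
void atomicStoreFloat2(std::atomic<std::uint64_t> &Slot, const Float2 &V) {
  Slot.store(toBits(V), std::memory_order_seq_cst);
}
```

The bit pattern is unchanged by the round trip, so a reader that loads the integer and converts it back observes both lanes from the same store.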
[X86] Cast atomic vectors in IR to support floats
Extend the X86 `alignedstore` PatFrag to also match `atomic_store`
with vector-size alignment, so existing MOVAPS/MOVAPD/MOVDQA-family
aligned-store patterns cover 128-bit aligned vector atomic stores on
SSE/AVX/AVX-512 without per-type duplicates. `<4 x float>`,
`<2 x double>`, `<2 x i64>`, `<4 x i32>`, `<8 x half>`, `<8 x bfloat>`
all codegen to a single `movaps`/`movapd` on AVX+ via this.
Adds v8f16/v8bf16 bitconvert variants to the widen-path
`atomic_store_32` / `atomic_store_64` patterns so `<2 x half>`,
`<2 x bfloat>`, `<4 x half>`, `<4 x bfloat>` atomic stores reaching
the PR4 widen path also collapse to a single instruction on AVX+
targets.
Vectors whose `getTypeAction` is split rather than widen still rely
on PR6's `SplitVecOp_ATOMIC_STORE`; that path bitcasts the vector
to a scalar integer and issues an integer `atomic_store_N`, picked
up by the pre-existing scalar atomic-store patterns. The two
[4 lines not shown]
Make qwx(4) send the PMF good-bye deauth frame when hopping out of RUN state.
In addition to sending the PMF good-bye deauth frame from qwx_stop() we
must also send it when leaving RUN state for other reasons.
This applies only while IFF_RUNNING is still set, since otherwise qwx_stop()
has already sent the deauth frame, and only if the AP did not just send
a deauth frame to us. The latter also covers the background-scan/roaming case
where a deauth frame is sent via ieee80211_node_tx_stopped() and net80211
is faking our old AP's deauth event.
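
The conditions above reduce to a small predicate. This is a sketch of the decision only, not the qwx(4) driver code: the `State` enum, `shouldSendDeauth` and its boolean parameters are hypothetical names chosen for illustration.

```cpp
#include <cassert>

// Hypothetical subset of 802.11 state machine states.
enum class State { Init, Scan, Auth, Assoc, Run };

// Send the good-bye deauth frame only when leaving RUN, while the interface
// is still marked running, and when the AP did not just deauthenticate us.
bool shouldSendDeauth(State Old, State New, bool IfRunning, bool ApSentDeauth) {
  const bool LeavingRun = Old == State::Run && New != State::Run;
  return LeavingRun && IfRunning && !ApSentDeauth;
}
```

The `IfRunning` check avoids sending a second frame after the stop path has already sent one, and the `ApSentDeauth` check covers both a real deauth from the AP and the faked one during roaming.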
[SelectionDAG] Split vector types for atomic store
Vector types that aren't widened are split so that a single ATOMIC_STORE
is issued for the entire vector at once. This enables SelectionDAG to
translate vectors with bfloat and half element types.
[lldb] Fix no compile unit crash. (#195853)
This crash happens in lldb-dap when hovering over instruction
addresses in a frame that does not have debug information.