[MLIR][XeGPU] Promote memref.alloca to SLM in convert-vector-to-xegpu (#197978)
Run a small pre-pass at the start of convert-vector-to-xegpu that
rewrites every memref.alloca to address space 3, so allocations coming
out of bufferization carry the SLM attribute by the time the conversion
patterns run.
[VPlan] Intersect IR flags across interleave members when narrowing. (#201682)
Update narrowInterleaveGroupOp to intersect the IR flags of all wide
members, so the narrowed operation carries only the flags common to
every combined member.
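The intersection rule can be illustrated with a minimal sketch. This is
not VPlan code: flags are modeled as plain string sets, and
intersect_flags and the flag names used here are hypothetical stand-ins
chosen for illustration.

```python
from functools import reduce


def intersect_flags(member_flags):
    """Return only the flags present on every member.

    member_flags is an iterable of flag collections, one per member.
    An empty input yields an empty set (nothing can be assumed).
    """
    sets = [frozenset(flags) for flags in member_flags]
    if not sets:
        return frozenset()
    return reduce(frozenset.intersection, sets)


# Three hypothetical wide members with differing flags.
members = [
    {"nuw", "nsw"},
    {"nsw"},
    {"nuw", "nsw", "exact"},
]

# Only "nsw" holds for every member, so only it survives.
common = intersect_flags(members)
```

Taking the first member's flags instead would keep "nuw" here, even
though the second member does not guarantee it.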
[MemProf] Fix incorrect VP metadata update during ICP promotion (#201658)
Track unpromoted candidates explicitly when performing ICP during
MemProf context disambiguation. Previously, the code assumed that the
first N candidates were always the ones promoted, which led to incorrect
metadata on the fallback indirect call if a candidate was skipped (e.g.
due to missing definition or being illegal to promote).
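A small sketch of why the "first N" assumption breaks. It models
candidates as (target, count) pairs; the function names and the
can_promote predicate are hypothetical and do not mirror the LLVM
implementation.

```python
def promote_candidates(candidates, can_promote):
    """Split candidates into promoted and unpromoted, preserving order."""
    promoted, unpromoted = [], []
    for target, count in candidates:
        if can_promote(target):
            promoted.append((target, count))
        else:
            unpromoted.append((target, count))
    return promoted, unpromoted


def remaining_first_n(candidates, num_promoted):
    """Old assumption: the first num_promoted candidates were promoted."""
    return candidates[num_promoted:]


candidates = [("foo", 100), ("bar", 60), ("baz", 30)]

# "foo" is skipped (say, no definition available); the other two promote.
promoted, unpromoted = promote_candidates(candidates, lambda t: t != "foo")

# The old slicing drops "foo" and keeps "baz", which was already promoted.
buggy_remaining = remaining_first_n(candidates, len(promoted))
```

With explicit tracking, the fallback call's metadata is built from
unpromoted, which here contains only "foo".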
[lldb][docs] Convert top-level RST docs to Markdown (NFC) (#201674)
Convert the two remaining top-level RST docs, index and
python_api_enums, to MyST Markdown. This is the final batch of an
incremental RST -> Markdown migration.
After this change, the only RST sources left under lldb/docs/ are
man/lldb.rst and man/lldb-server.rst, which conf.py intentionally keeps
as RST so the man-page builder can run without myst_parser installed
(this reduces dependencies for some LLVM distributions).
Verified by building the docs on origin/main and on this branch with
identical sphinx flags and diffing both the warnings and the rendered
HTML. After file extensions and line numbers are normalized, the warning
sets match exactly. index.html is byte-identical; python_api_enums.html
differs in a single line where CommonMark collapses two spaces after a
period to one.
The diff also surfaced two semantic regressions in the conversion, fixed
[10 lines not shown]
[libc++] Suppress deprecation warning around wstring_convert::to_bytes (#201633)
The deprecation warning for wstring_convert::to_bytes fires from inside
the libc++ header, so users can't suppress it with their own diagnostic
pragmas around the call site. Wrap the definition with
_LIBCPP_SUPPRESS_DEPRECATED_PUSH/POP, mirroring what's already done for
the destructor and from_bytes just above.
Add a regression test under test/libcxx.
rdar://173319468
Assisted-by: Claude
[libc++] Fix constraint recursion in std::expected's operator== (#201455)
The C++26 constraint added to operator==(const expected& x, const T2& v)
by P3379R0 evaluates *x == v as part of constraint satisfaction. When
ADL on a comparison reaches this hidden friend through a type whose
associated namespaces include std::expected -- for example std::pair<T,
std::expected<U, V>> -- the constraint check ends up considering the
same overload again with the original type as T2, producing a
"satisfaction of constraint depends on itself" error.
Parameterize the expected operand with an extra template parameter
constrained to be the same type as the enclosing expected's value type.
This is observationally equivalent but makes template argument deduction
fail for non-expected operands before the constraint is evaluated, so
the recursion never starts.
Fixes #160431
rdar://178226313
Assisted-by: Claude
[SCEV] Batch common-factor folding in getAddExpr (#184258)
The existing pairwise common-factor fold in getAddExpr handles two
patterns:
`W + X + (X * Y * Z) --> W + (X * ((Y*Z)+1))`
`X + (A*B*C) + (A*D*E) --> X + (A*(B*C+D*E))`
Both fold exactly two terms sharing a common factor, then re-enter
getAddExpr() with the partially-simplified Ops. When n terms share a
common factor X, this requires n-1 re-entries through the full
getAddExpr normalization pipeline.
Replace this with a single-pass scan that collects all terms sharing the
common factor and folds them in one shot:
`A1*X + A2*X + ... + An*X --> X * (A1 + A2 + ... + An)`
This reduces the number of top-level re-entries into getAddExpr() for
this fold from n-1 to 1, improving compile time for expressions with
many terms sharing a common factor.
[2 lines not shown]
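The single-pass fold can be sketched on a toy representation. This is
an illustration only, not SCEV code: each addend is a tuple of factor
names standing for a product, and fold_common_factor is a hypothetical
helper.

```python
def fold_common_factor(terms, factor):
    """Collect every term containing `factor` in one scan.

    Returns (rest, folded). `rest` holds the terms without the factor.
    `folded` is (factor, cofactors), meaning factor * (sum of cofactors),
    or None when fewer than two terms share the factor.
    """
    rest, cofactors = [], []
    for term in terms:
        if factor in term:
            remaining = list(term)
            remaining.remove(factor)  # drop one occurrence of the factor
            # A bare `factor` term contributes the constant 1.
            cofactors.append(tuple(remaining) or ("1",))
        else:
            rest.append(term)
    if len(cofactors) < 2:
        return list(terms), None
    return rest, (factor, cofactors)


# W + X + X*Y*Z + A*X
terms = [("W",), ("X",), ("X", "Y", "Z"), ("A", "X")]

# One scan yields W + X * (1 + Y*Z + A).
rest, folded = fold_common_factor(terms, "X")
```

The pairwise approach would fold two of the three X terms, rebuild the
sum, and scan again; the batch form visits each term once.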
Stop writing cluster_mode in scst.conf
Let middleware be the sole writer of cluster_mode via direct sysfs
writes (iscsi.scst.path_write_if_needed and callers). When scst.conf
also drove cluster_mode, pyscstadmin's apply reconciled the runtime
back to whatever value was captured at render time, undoing any
cluster_mode=1 just set by standby_fix_cluster_mode and destroying
per-extent DLM lockspaces on every cycle.
set_standby_lun_to_cluster_mode is repurposed as a predicate for
queueing the fix job. set_active_lun_to_cluster_mode and the state
it depended on (active_extents, cluster_mode_targets, dlm_ready) are
removed.
Must land together with the matching truenas_pyscstadmin change.
zpool/zfs: accept --help and -? after a subcommand
Print the short usage instead of "invalid option".
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Christos Longros <chris.longros at gmail.com>
Closes #18541
[MLIR][LLVMIR] Add support for intrinsics with metadata arguments (#200308)
This updates the LLVM dialect to properly handle intrinsics with
metadata arguments.
The primary goal of this change is to support the constrained FP
intrinsics, but support for other intrinsics with metadata arguments
came along with the change.
I have not yet added the RoundingModeOpInterface and
FPExceptionBehaviorOpInterface to CallIntrinsicOp. I intend to do that
as a follow-up change if this direction is accepted. I have also not yet
removed existing specialized operations that explicitly handle a subset
of the constrained intrinsics.
Assisted-by: Cursor / claude-opus-4.7
Skip redundant cluster_mode writes to avoid scst_mutex contention
Add iscsi.scst.path_write_if_needed: read the attribute first and
write only if the first line differs. Route the three cluster_mode
setters (set_device_cluster_mode, set_devices_cluster_mode,
set_all_cluster_mode) through it.
The kernel's vdev_sysfs_cluster_mode_store path goes through
scst_alloc_sysfs_work, scst_sysfs_queue_wait_work and
scst_suspend_activity (which quiesces in-flight commands), and
acquires the global scst_mutex, all BEFORE the same-value short-circuit.
So even no-op writes contend on a global mutex and can serialize
behind a long-running cluster_mode operation. cluster_mode_show
is lock-free, so the pre-read is essentially free.
Comparison is first-line-only because show emits a trailing
SCST_SYSFS_KEY_MARK line when cluster_mode is set.
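The read-before-write idea can be sketched as follows. This is a
minimal illustration, not the middleware's code: the real signature of
iscsi.scst.path_write_if_needed is not shown here, so
write_if_needed_sketch is a hypothetical stand-in, a temporary file
plays the role of the sysfs attribute, and "MARKER" is a placeholder
for the trailing key-mark line.

```python
import tempfile
from pathlib import Path


def write_if_needed_sketch(path, value):
    """Write `value` only if the attribute's first line differs.

    Returns True if a write was issued, False if it was skipped.
    """
    attr = Path(path)
    lines = attr.read_text().splitlines()
    current = lines[0].strip() if lines else ""
    if current == str(value).strip():
        return False  # same value: skip the write entirely
    attr.write_text(f"{value}\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    attr = Path(tmp) / "cluster_mode"
    # Simulate a set attribute followed by a trailing marker line.
    attr.write_text("1\nMARKER\n")

    skipped = write_if_needed_sketch(attr, "1")  # first line already "1"
    written = write_if_needed_sketch(attr, "0")  # differs, so write
    final = attr.read_text()
```

Comparing the whole content instead of the first line would treat
"1\nMARKER" as different from "1" and issue the redundant write.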