[Offload] Fix ordering with RPC teardown and global destructors (#205594)
Summary:
There's a bit of a chicken and egg problem for the RPC server if we want
to do something creative with the device's image for things like DWARF
dumping. The problem was that destructors can make RPC calls, but the
RPC server also needed the images to be valid. Simple fix is to just
split the destructor calling out and do it first so we can deinitialize
RPC with valid device images.
[SBVec] Implement topDown/botUp vectorizers in unison
This patch introduces the `top-down-vec` pass to the Sandbox Vectorizer,
adding the ability to traverse use-def chains top-down to discover and
collect vectorization opportunities. Furthermore, this patch unifies
the two vectorizers into a single implementation to minimize code
duplication.
[lldb-dap][test] Re-enable test_by_name_waitFor on Windows (#205570)
`test_by_name_waitFor` passes with `LLDB_USE_LLDB_SERVER=1`.
`test_by_partial_name_waitFor` hangs on exit. Skip it for now.
rdar://180515488
[RISCV][XCV] Relax long `cv.beqimm`/`cv.bneimm` branches (#205096)
`cv.beqimm` and `cv.bneimm` encode their target as a 13-bit signed
PC-relative offset (+/-4094 bytes). Branches beyond that range were
silently truncated by MC fixup application, producing wrong code with no
diagnostic. Add `PseudoLongCV_BEQIMM`/`PseudoLongCV_BNEIMM` and the
MC-layer relaxation flow (inverted short branch + JAL trampoline),
mirroring the standard B-type and Qualcomm Xqcibi vendor branches.
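As a rough illustration of the range check behind this relaxation, the sketch below models a 13-bit signed, 2-byte-aligned PC-relative offset. The function names are hypothetical and not taken from the LLVM sources; the real decision is made inside the MC layer.

```python
# Hypothetical model of the short-branch range check; this is not LLVM code.
OFFSET_BITS = 13  # signed PC-relative offset width stated in the commit message


def fits_short_branch(offset: int) -> bool:
    """Return True if a byte offset is encodable as a 13-bit signed, even offset."""
    lo = -(1 << (OFFSET_BITS - 1))      # -4096
    hi = (1 << (OFFSET_BITS - 1)) - 2   # +4094, since offsets are 2-byte aligned
    return offset % 2 == 0 and lo <= offset <= hi


def select_form(offset: int) -> str:
    """Pick the short branch, or the inverted-branch-plus-JAL long form."""
    return "short" if fits_short_branch(offset) else "inverted-branch+jal"
```

Under this model, `select_form(4094)` keeps the short encoding, while `select_form(4096)` falls back to the long form instead of being truncated.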
**Tests:** `xcvbi-branch-relax.ll` (uses `-filetype=obj | llvm-objdump`,
since MC-layer relaxation is only observable on object emission, not on
textual asm).
Split out of #204879 at review request (one fix per PR).
Part of a CORE-V (XCV) series; see RFC:
https://discourse.llvm.org/t/rfc-core-v-xcv-support-for-cv32e40p-clang-builtins-xcvsimd-intrinsics-and-generic-auto-selection/91111
[clang-doc] Try to make testing more uniform
Today clang-doc has tests for its various backends that use the same
input files, and mix the checks for each format. This leads to very
large test files that are quite hard to update or maintain. Thus far
we've assumed that this is better than updating several files, but as we
leverage mustache and JSON more and more to test feature completeness,
much of the output complexity is now limited to each backend and its
mustache templates. To make this simpler to maintain, we can lean into
common test Inputs, keeping the annotated source separate from the test
checks, and split the checks out into their own directory hierarchy.
This patch is mostly mechanical rewriting of code. This was done with
the assistance of an LLM, but was checked by me, and verified with
instrumentation based coverage that we did not lose any line coverage.
[llvm-objdump] Add --substitute-path and --source-dir for --source (#201096)
When the code object was compiled on a different machine that does not
have the same directory structure, or the source code has been moved,
the disassembler emits a warning stating that the files embedded in the
code object were not found on disk.
This patch introduces command-line options for llvm-objdump that
provide alternate directory locations for finding the source files on
disk. These options are inspired by the GDB commands _set directory_ and
_set substitute-path_.
--substitute-path: Takes two strings, _from_ and _to_, and replaces
_from_ with _to_ at the start of the directory part of the source file
name, then uses that result instead of the original file name to look up
the sources. A rule applies only if _from_ ends at a directory
separator.
--source-dir: Adds directories to the source search path. Directories
are searched in the following order: original recorded path, source-dir +
relative recorded path or absolute path without root, source-dir +
basename.
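The lookup rules described above can be sketched as follows. This is a minimal model under stated assumptions, not the llvm-objdump implementation: the function names are hypothetical, paths are treated as POSIX-style, and the ordering across multiple source directories (all "relative path" candidates before all "basename" candidates) is a guess, since the commit message does not specify it.

```python
import posixpath


def substitute_path(path, src, dst):
    """Replace a leading `src` directory prefix with `dst` (hypothetical helper).

    The rule only fires when `src` ends at a directory boundary, so
    "/build" matches "/build/src" but not "/builder".
    """
    directory, name = posixpath.split(path)
    src = src.rstrip("/")
    if directory == src or directory.startswith(src + "/"):
        directory = dst.rstrip("/") + directory[len(src):]
        return posixpath.join(directory, name)
    return path


def candidate_paths(recorded, source_dirs):
    """List lookup candidates in the order given in the commit message."""
    candidates = [recorded]  # 1. original recorded path
    for d in source_dirs:    # 2. source-dir + recorded path without its root
        candidates.append(posixpath.join(d, recorded.lstrip("/")))
    for d in source_dirs:    # 3. source-dir + basename
        candidates.append(posixpath.join(d, posixpath.basename(recorded)))
    return candidates
```

For example, `substitute_path("/build/src/a.c", "/build", "/home/me")` yields `/home/me/src/a.c`, while a recorded path under `/builder` is left unchanged.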
[clang-doc] Test more language constructs
We're missing several different language constructs in our tests. This
patch simply adds the basic tests and captures the output without trying
to fix or adjust any behavior, and can be considered a sort of precommit
test for future fixes to the various documentation components.
[Dexter] Add ability to rewrite scripts to fill-in unknown values (#202799)
This patch adds a feature to Dexter that allows scripts to be passed to
Dexter with missing expected values (`null` values in YAML), which
Dexter will attempt to "fill-in" with expected values that match the
debugger's actual output. The result is written to a file with the same
name as the original test file, in the directory given by
--results-directory if one is present; all content outside of the Dexter
script itself is preserved exactly as-is.
NB: Each test in this patch has a corresponding "expected" file, which
is almost identical (including the `RUN` lines), and exists to be
`diff`'d against the output of Dexter's script generation.
[Clang] Rebuild lambda captures in default member initializers
Fixes https://github.com/llvm/llvm-project/issues/196469
Since the CWG1815 implementation, `InitListChecker` rebuilds a default
member initializer at its point of use in aggregate initialization. The
rebuild uses the `EnsureImmediateInvocationInDefaultArgs` tree
transform, where `TransformCXXBindTemporaryExpr` strips
`CXXBindTemporaryExpr` nodes, relying on the subexpression's rebuild to
re-create the temporary binding: every `Rebuild*` path funnels through
`Sema::MaybeBindToTemporary`, which also re-registers the cleanup in
the current evaluation context.
However, the transform overrides `TransformLambdaExpr` to return the
closure unchanged because the lambda body is not a subexpression. That
skips the `MaybeBindToTemporary` call that `BuildLambdaExpr` ends with.
The rebuilt initializer then lacks both the `CXXBindTemporaryExpr`
around the closure and the `ExprWithCleanups` marker, so CodeGen never
emits the closure's destructor and init-captured members leak.
[11 lines not shown]
[mlir][vector] reject negative strides for `vector.load`/`vector.store` (#204611)
This PR follows up #204309 and #204309.
It simply rejects negative strides for vector.load/vector.store :D
AI Disclaimer: I used AI for the tests.
---------
Signed-off-by: Federico Bruzzone <federico.bruzzone.i at gmail.com>
Co-authored-by: Andrzej Warzyński <andrzej.warzynski at gmail.com>
[AArch64] Recombine SETCCCARRY for legalized unsigned compares (#204504)
Type legalization can turn wide unsigned compares into SETCCCARRY nodes
fed by USUBO carry results, hiding the original high/low compare shape
from the existing CCMP conjunction/disjunction lowering.
Add an AArch64 DAG combine for SETCCCARRY that recognizes these
legalized wide-compare patterns and rebuilds them as SETCC plus AND/OR,
exposing them to the existing CCMP lowering.
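To make the "high/low compare shape" concrete, the sketch below checks the arithmetic identity involved: a wide unsigned less-than computed through a borrow chain gives the same answer as a pair of half-width compares joined with AND/OR. It models 128-bit values as two 64-bit halves in plain Python; the helper names are hypothetical, and the exact DAG patterns matched by the patch are not shown here.

```python
MASK64 = (1 << 64) - 1


def split(x):
    """Split a 128-bit unsigned value into (low, high) 64-bit halves."""
    return x & MASK64, (x >> 64) & MASK64


def ult_via_borrow(a, b):
    """a <u b as the borrow out of a two-limb subtraction (legalized shape)."""
    a_lo, a_hi = split(a)
    b_lo, b_hi = split(b)
    borrow_lo = a_lo < b_lo  # borrow produced by the low-half subtraction
    # The high-half subtraction consumes that borrow; a negative result
    # means the whole subtraction borrowed, i.e. a <u b.
    return a_hi - b_hi - int(borrow_lo) < 0


def ult_via_compares(a, b):
    """a <u b rebuilt as half-width compares combined with AND/OR."""
    a_lo, a_hi = split(a)
    b_lo, b_hi = split(b)
    return a_hi < b_hi or (a_hi == b_hi and a_lo < b_lo)
```

Both functions agree with Python's native `a < b` for any pair of values below `2**128`, which is the property that lets the second form stand in for the first.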
This is separated from https://github.com/llvm/llvm-project/pull/181822.
[LifetimeSafety] Cache lifetimebound macro lookup (#205250)
Cache lifetimebound macro spelling lookup used by fix-it suggestions.
Current cache strategy:
- During cache build, collect macro names that have ever been defined as
a lifetimebound attribute spelling.
- During lookup, only visit those cached macro names, find the active
definition at the fix-it location, and re-check that the active
definition still has lifetimebound spelling.
- If multiple matching macros are active at the fix-it location, use the
most recently defined one.
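The strategy above can be sketched with a toy model. Everything here is hypothetical: `MacroDef`, the integer locations, and the substring test for a lifetimebound spelling are stand-ins for Clang's preprocessor data, and `#undef` is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacroDef:
    name: str
    loc: int   # hypothetical source position of the #define
    body: str


def is_lifetimebound(body):
    """Simplified spelling check; the real check inspects attribute tokens."""
    return "lifetimebound" in body


def build_cache(defs):
    """Collect names that were ever defined with a lifetimebound spelling."""
    return {d.name for d in defs if is_lifetimebound(d.body)}


def active_def(defs, name, loc):
    """Return the latest definition of `name` that precedes `loc`, if any."""
    earlier = [d for d in defs if d.name == name and d.loc < loc]
    return max(earlier, key=lambda d: d.loc, default=None)


def lookup(defs, cache, loc):
    """Visit only cached names, re-check the active definition, pick the newest."""
    matches = []
    for name in cache:
        d = active_def(defs, name, loc)
        if d is not None and is_lifetimebound(d.body):
            matches.append(d)
    best = max(matches, key=lambda d: d.loc, default=None)
    return best.name if best else None
```

The re-check in `lookup` matters because a cached name may have been redefined without the attribute by the time the fix-it location is reached.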
Performance:
| Case | Before 359bfe6 | 359bfe6| After Cached |
|-------------------------------------------------------------|----------------:|---------------:|-------------:|
[10 lines not shown]
AMDGPU/GlobalISel: Implement G_GET/SET_ROUNDING (#205265)
Implement G_GET/SET_ROUNDING for the llvm.get.rounding and
llvm.set.rounding intrinsics.
The lowering is ported from the existing SelectionDAG handling, keeping
the structure close to the SDAG implementation.
Assisted by: Claude Opus 4.8
[llubi] Always print out error message (#205573)
When `--verbose` is not specified, the error message and UB reason are
omitted. However, this information is still useful for test oracles. For
example, the fuzzer may skip the seed when it runs out of time.
BTW, the stack trace is always dumped. Not sure if it is intended or
not.
[AArch64][ISel] Enable profile-aware branch condition merging (#201486)
AArch64 previously inherited the default {-1, -1, -1} for
`getJumpConditionMergingParams`, causing
`shouldKeepJumpConditionsTogether` in SelectionDAGBuilder to always
return false. This meant compound branch conditions (br (and/or cond1,
cond2)) were always split into separate basic blocks at the DAG level,
and profile data from BranchProbabilityInfo was never consulted for the
merge/split decision.
Override `getJumpConditionMergingParams` in AArch64TargetLowering with
tunable cl::opt parameters matching the X86 structure. Since `CCMP` is
part of the base AArch64 ISA, the `CCMP` bias is applied
unconditionally. Default values: `BaseCost=2, CcmpBias=6 (effective
threshold 8), LikelyBias=0, UnlikelyBias=-1`.
This enables three improvements:
1. Profile-guided merge/split decisions using BranchProbabilityInfo
2. Smarter compare ordering at the DAG level (e.g., placing large
[13 lines not shown]
[clang][bytecode] Work around virtual bases being present in APValues (#205553)
This happens in code called via `evaluateDestruction()`, where we
consume an `APValue` created by the current interpreter. APValues don't
have a notion of virtual bases right now, so the virtual bases simply
appear as regular ones.
[mlir][tosa] Handle function declarations in tosa input shape pass (#205359)
Fixes https://github.com/llvm/llvm-project/issues/205063.
The `tosa-experimental-input-shape` pass currently does not handle
function declarations correctly. The pass may run on declarations, but
the current implementation assumes that every function has a body and
unconditionally accesses the entry block and the last block when
updating argument and result types.
This patch checks whether the function has a body before accessing body
blocks. For declarations, the pass updates the function signature input
types and preserves the original result types, since there is no return
operation from which result types can be inferred.
A regression test is added for the declaration case.
[X86] matchPMADDWD - add support for larger source types (#205391)
Handle cases where the source vector type came from a vXi32 type wider
than 2 x the original vXi16 type.
The matcher only looks at the lower elements, so it doesn't matter if
we're extracting from a wider vector.
Fixes a number of SSE/AVX512 targets that failed to legalize to
recoverable vector widths.