[X86] Update (p)haddsub undef element tests to match the output IR from the middle-end (#207244)
Use the vectorised output from the PhaseOrdering/X86 hadd.ll tests -
I've added test coverage for multiple SSE/AVX levels for cases where the
middle-end output is different for any level.
This exposes a number of regressions that have been there for some time
but that we'd missed because we'd assumed the backend would still be
receiving non-vectorised IR; there have been plenty of changes to SLP,
InstCombine and VectorCombine since then - end2end tests would have been
very useful here :(
Looking at fixes next before finally removing the (dead) scalar hadd
matching code for #143000
[lldb][test] Modernize and expand data-formatter-stl/generic/vbool (#206955)
This fixes several issues with this test:
* We use modern test utils for setting up the process.
* We get rid of the state-reset code which is no longer necessary these
days.
* Expand the test to also cover an empty and sub-word-size vector of
bool.
assisted-by: claude
[lldb][test] Truncate unexpectedly long test outputs (#206967)
A bug in LLDB could make our tests produce giant ValueObjects with
millions of children. The same goes for most commands that print out
test data. While our test system can handle this amount of output, the
resulting log output will most likely break the storage capacity of our
build bots.
This patch adds truncation to the various expect* methods to avoid
spamming the output in the (unlikely) case this happens.
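A minimal sketch of the kind of truncation described above, not the actual
patch: the helper name `truncate_output`, the `max_chars` default and the
marker text are all hypothetical and chosen only for illustration.

```python
def truncate_output(text, max_chars=1000):
    """Return text unchanged if it is short enough; otherwise keep the
    first max_chars characters and append a note saying how much was cut.

    Hypothetical helper: the name, default limit and marker format are
    assumptions, not the LLDB test suite's API.
    """
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return f"{text[:max_chars]}\n[... truncated {omitted} characters ...]"


# A runaway dump of children collapses to a bounded log entry.
huge = "child\n" * 1_000_000
logged = truncate_output(huge, max_chars=60)
```

Keeping the head of the output rather than the tail is a design choice in
this sketch: the first lines are usually enough to see which command or
ValueObject went wrong.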
See also #206444
[DebugInfo] Avoid std::function in DWARF verifier internals (#202866)
This changes `OutputCategoryAggregator`'s synchronous callback
parameters from `std::function` to `function_ref`, avoiding type-erased
callback construction at 75 `DWARFVerifier` diagnostic sites.
On an arm64 Release build, standalone llvm-dwarfdump decreases by
133,680 bytes raw and 17,040 bytes stripped, `DWARFVerifier.cpp.o`
decreases by 174,464 bytes, and linked fixups decrease by 546.
Work towards #202616
AI tool disclosure: Co-authored with OpenAI Codex.
[llvm-exegesis] Add raw PMU encoding in TargetPfmCounters tablegen (#201228)
Adds optional EventSelect and UMask fields to PfmCounter in
TargetPfmCounters.td. EventSelect defaults to -1 (no raw encoding).
When set, ExegesisEmitter outputs raw hex values instead of a libpfm
symbolic name, allowing per-CPU .td entries to bypass
pfm_get_os_event_encoding for counters that are undocumented or
unsupported in libpfm.
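As a rough illustration of the default described above, the following sketch
picks between a symbolic name and raw values. The function name
`emit_counter_field`, the output format and the example event name are
hypothetical; only the -1 sentinel for "no raw encoding" comes from this
patch.

```python
NO_RAW_ENCODING = -1  # EventSelect default: fall back to the libpfm name


def emit_counter_field(symbolic_name, event_select=NO_RAW_ENCODING, umask=0):
    """Return the text emitted for one counter (illustrative format only)."""
    if event_select == NO_RAW_ENCODING:
        # No raw encoding requested: emit the quoted symbolic name.
        return f'"{symbolic_name}"'
    # Raw encoding requested: emit EventSelect and UMask as hex values.
    return f"0x{event_select:02x}, 0x{umask:02x}"


symbolic = emit_counter_field("EXAMPLE_EVENT")
raw = emit_counter_field("EXAMPLE_EVENT", event_select=0x3C, umask=0x00)
```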
Extends PfmCountersInfo with CycleCounterEventSelect, CycleCounterUMask,
UopsCounterEventSelect, and UopsCounterUMask fields. The PerfHelper wiring
will follow in a subsequent patch.
Towards #187930
[mlir][linalg/scf/transform] scalable tiling and fusion for pack/unpack ops (#204007)
# Inner tile alignment hints for scalable `linalg.pack`/`linalg.unpack`
tiling and fusion
## Overview
Tiling and fusing `linalg.pack`/`linalg.unpack` produces a clean result
only when the tiling implementation can tell how a loop tile size
relates to the op's inner tile size. When both are statically known this
is decided by comparing the constants. But with **scalable** (and, more
generally, dynamic) sizes, e.g. a loop tile of `8 * vscale` against an
inner tile of `8 * vscale`, that relationship is symbolic and cannot
cleanly be recovered from the IR, so the implementation conservatively
falls back to a dynamic, over-allocated tile. See #150185 for more
details.
This PR adds an optional **inner tile alignment hint**: a per-dimension
caller assertion about that relationship, threaded from the transform-op
[169 lines not shown]
[CI][flang][OpenMP] Build OpenMP runtime mod files for flang tests (#206517)
Some flang openmp lit tests require mod files (a bit like C header
files, except they are compiler generated) from the openmp runtime. As
the openmp runtime is not currently built in this configuration, these
71 flang tests get skipped and a warning is emitted.
Here I enable openmp as a dependency for flang but add
-DLIBOMP_FORTRAN_MODULES_ONLY=ON so that only the required mod files are
built and not the whole of the openmp runtime.
This only affects Linux bots: Windows and macOS explicitly exclude
openmp so it will still not be enabled there.
Assisted-by: Codex
[Clang][tests][NFC] Split __counted_by attribute testcases into two (#207144)
Splitting the testcase file makes it easier to review the generated
code. The only changes are cosmetic:
- Renaming functions and structs to be more descriptive, and
- Removing a duplicate test.
[lldb][test] Add a function to spawn lldb-server platforms (#205083)
I will be doing this in a future test and we already have a few copies of this code in various tests.
[AArch64] Combine undef UZP and NVCAST away. (#204623)
These are used to lower insert_subvec nodes quite early in SDAG. After
DAG combines run, it's possible that the inputs to these AArch64 nodes
become UNDEF.
[SDAG][AArch64] Fold extract from pext to use status flags (#206443)
This folds extracting the first bit from the first segment of a
predicate-as-counter to use the "first active" status. E.g.:
```
%pn:aarch64svcount, %flags:FlagsVT = WHILELO_PRED_COUNTER(a, b, VLx4)
%first_pred:nxv4i1 = pext(%pn, 0)
%more:i1 = extractelement(%first_pred, 0)
```
->
```
%pn:aarch64svcount, %flags:FlagsVT = WHILELO_PRED_COUNTER(a, b, VLx4)
%more = CSET(%flags, FIRST_ACTIVE)
```
Assisted-by: Codex (adding test variations)
[clang][bytecode][NFC] Report error if HasGroup is set without types (#207334)
Setting `HasGroup = 1` in tablegen without the types being non-empty
causes problems later, so diagnose it.
[AArch64][FastISel] Update arm64-fast-isel-conversion.ll check lines (NFC) (#207159)
Before fixing relevant bugs and extending the existing tests,
auto-generate CHECKs.
Note that some of the existing CHECKs actually check for buggy isel.
Those will be fixed separately, after adding more tests in a separate
PR. This PR just runs `update_llc_test_checks.py`.
[AArch64] Fix reversed values in big-endian 128-bit atomics (#205760)
When AArch64TargetLowering expanded a load-linked or a store-conditional
during the atomic-expand pass, it unconditionally assumed that the
64-bit value stored first in memory was the low-order half of the
128-bit value, instead of checking the SubtargetInfo's endianness. The
same was true of the code that expands CMP_SWAP_128 pseudoinstructions.
So in each case, if you compiled 128-bit atomic code big-endian, you'd
get back a 128-bit integer with the top and bottom half swapped.
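The half-swap can be modelled in a few lines of Python. This is a sketch of
the arithmetic only, not of the lowering code; `split_memory` and
`combine_halves` are hypothetical names introduced for illustration.

```python
MASK64 = (1 << 64) - 1


def split_memory(raw16, byteorder):
    """Read the two 64-bit words of a 16-byte value in memory order."""
    first = int.from_bytes(raw16[:8], byteorder)
    second = int.from_bytes(raw16[8:], byteorder)
    return first, second


def combine_halves(first_in_memory, second_in_memory, big_endian):
    """Rebuild the 128-bit value from its two 64-bit words.

    Little-endian: the word stored first is the low-order half.
    Big-endian: the word stored first is the high-order half.
    """
    if big_endian:
        hi, lo = first_in_memory, second_in_memory
    else:
        lo, hi = first_in_memory, second_in_memory
    return (hi << 64) | lo


value = 0x00112233445566778899AABBCCDDEEFF
first, second = split_memory(value.to_bytes(16, "big"), "big")

correct = combine_halves(first, second, big_endian=True)
# Applying the little-endian assumption to big-endian memory swaps the halves.
buggy = combine_halves(first, second, big_endian=False)
```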
This was found by compiler-rt's existing tests when we ran them for a
big-endian AArch64 target in Arm Toolchain.
Most of the test changes here are `update_llc_test_checks` churn: there
were already many tests of AArch64 atomics in big-endian mode, and
apparently they all simply had the reversed registers in their expected
output.
The one new test, `aarch64_i128_endianness.ll`, directly demonstrates
[4 lines not shown]
[Dexter] Switch to using script-mode by default (#204369)
This patch changes the default mode of Dexter from heuristic-mode to
script-mode. The --use-script argument is replaced with --use-heuristic,
some comments/docs/error messages are updated accordingly, and tests
have their flags switched accordingly.
[libc][ARM] Defend banked SP setup against register allocator (#206757)
The startup code for bare-metal AArch32 A/R shifts the CPU through all
the different modes which have their own copies of SP, updating all the
stack pointers to the same value. But it does it using C intrinsics,
leaving the register allocation to the compiler – so it's possible that
the register allocator happens to use one of the _other_ banked
registers, such as LR.
For example, when I built this code today, it happened that LR was used
to hold one of the constants written into CPSR_c to change mode. That
constant was written into the SVC mode LR before any mode changes, but
the MSR instruction that tried to use it was run in a different mode, so
it copied from _that_ mode's LR, which contained uninitialised nonsense
in place of the desired constant, triggering a boot-time crash.
I think it's safer to use a single asm statement for the whole job,
guaranteeing which registers it uses.
[IR] Explicitly specify target feature for module asm (#204548)
Support specifying additional properties on module-level inline
assembly. In particular, the target features and target CPU can now be
specified as follows:
module asm(target_features: "+foo", target_cpu: "bar")
"asm line 1"
"asm line 2"
There may be multiple module inline assembly blocks with different
properties.
This is intended to fix the long-standing issue where in LTO scenarios
we don't know what target features to use for parsing the module-level
inline assembly. Now they can be faithfully preserved, even when merging
inline assembly from different modules with different features.
If target_features and target_cpu are empty, we fall back to the old
[4 lines not shown]
[LifetimeSafety] Track unary plus on a pointer (#207243)
Unary plus on a pointer is the identity (+p == p), so the result carries
the operand's loans -- but UO_Plus fell through VisitUnaryOperator's
default and left the result origin empty, dropping the borrow (e.g. p =
+&local was a silently-missed use-after-scope). Handle it by flowing the
operand's rvalue origins.
Assisted-by: Claude Opus 4.8
Co-authored-by: Gabor Horvath <gaborh at apple.com>