[CodeGen] Add listener support to the rematerializer (NFC) (#184338)
This change adds support for attaching listeners to the target-independent
rematerializer. Listeners can observe certain rematerialization-related
events and implement additional functionality on top of what the
rematerializer already does.
This is NFC and has no users yet, but the plan is for listeners to take
over secondary or optional functionality that is currently integrated
into the rematerializer itself. Two examples are:
1. rollback support (currently optional), and
2. region tracking (currently mandatory, but not fundamentally necessary
to the rematerializer).
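As a rough illustration of the listener idea, here is a minimal observer-pattern sketch. Every name in it (`RematListener`, `onRematerialized`, `onErased`, `RollbackListener`) is hypothetical and not the actual rematerializer interface; registers are plain integers. It shows the core component notifying listeners after doing its work, and a rollback-style listener built on top.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch: names and events are illustrative, not the actual
// LLVM rematerializer interface. Registers are plain integers here.
using RegID = unsigned;

// Listener interface with no-op defaults, so a listener overrides only the
// events it cares about.
class RematListener {
public:
  virtual ~RematListener() = default;
  virtual void onRematerialized(RegID /*Original*/, RegID /*NewReg*/) {}
  virtual void onErased(RegID /*Reg*/) {}
};

class Rematerializer {
  std::vector<RematListener *> Listeners;
  RegID NextReg = 100; // fresh register numbers for rematerialized values

public:
  void addListener(RematListener &L) {
    assert(std::find(Listeners.begin(), Listeners.end(), &L) ==
               Listeners.end() &&
           "listener registered twice");
    Listeners.push_back(&L);
  }

  // Core work stays here; listeners are only notified afterwards.
  RegID rematerialize(RegID Original) {
    RegID NewReg = NextReg++;
    for (RematListener *L : Listeners)
      L->onRematerialized(Original, NewReg);
    return NewReg;
  }

  void erase(RegID Reg) {
    for (RematListener *L : Listeners)
      L->onErased(Reg);
  }
};

// Secondary functionality built on top: remember what was created so a
// caller could later undo it (a stand-in for rollback support).
class RollbackListener : public RematListener {
public:
  std::vector<RegID> Created;
  void onRematerialized(RegID, RegID NewReg) override {
    Created.push_back(NewReg);
  }
  void onErased(RegID Reg) override {
    Created.erase(std::remove(Created.begin(), Created.end(), Reg),
                  Created.end());
  }
};
```

Keeping the default hooks as no-ops means the core component pays nothing for features no listener asks for.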
[MCA] Make `ResourceSizeMask` const (#189453)
This patch marks the already effectively constant `ResourceSizeMask` as
`const`. It adds a helper `computeResourceSizeMask()` to initialize it
in the member initializer list.
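The pattern is a `const` data member whose value comes from a static helper called in the member initializer list. A minimal sketch, assuming a hypothetical class and a placeholder mask definition (one bit per resource with more than one unit); the real llvm-mca computation is different, and only the initialization pattern is the point here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model. The mask computed below is a placeholder
// (one bit per resource with more than one unit), not llvm-mca's definition.
class ResourceManagerSketch {
  const uint64_t ResourceSizeMask;

  // Pure helper: computes the value once so the member can be const.
  static uint64_t
  computeResourceSizeMask(const std::vector<unsigned> &NumUnits) {
    assert(NumUnits.size() <= 64 && "mask has one bit per resource");
    uint64_t Mask = 0;
    for (std::size_t I = 0; I < NumUnits.size(); ++I)
      if (NumUnits[I] > 1)
        Mask |= uint64_t(1) << I;
    return Mask;
  }

public:
  explicit ResourceManagerSketch(const std::vector<unsigned> &NumUnits)
      : ResourceSizeMask(computeResourceSizeMask(NumUnits)) {}

  uint64_t getResourceSizeMask() const { return ResourceSizeMask; }
};
```

Making the helper `static` documents that it cannot read other, possibly not-yet-initialized members.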
Add a target triple to clang/test/Modules/pr189415.cppm (#189937)
Not all targets support thread_local, so in some environments the test
would fail with:
tools/clang/test/Modules/Output/pr189415.cppm.tmp/counter.cppm:6:1:
error: thread-local storage is not supported for the current target
Follow-up to #189796
[RISCV] Remove codegen for VP float rounding intrinsics (#189896)
Part of the work to remove trivial VP intrinsics from the RISC-V
backend, see
https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999
This splits off seven intrinsics from #179622.
We now generate vfcvt.rtz for llvm.vp.roundtozero. Since
llvm.vp.roundtozero is the VP counterpart of llvm.trunc, it should have
been using the llvm.trunc codegen all along; this had been missed.
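Rounding toward zero can be done by converting to an integer with round-toward-zero and converting back, which is roughly the shape of a vfcvt.rtz-based lowering. A scalar C++ sketch of that idea, not the RISC-V lowering code (the function name is hypothetical): it also handles the cases the conversion cannot, namely magnitudes already integral, NaN/Inf, and the sign of zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Scalar model of round-toward-zero through an integer conversion.
// Illustration only; hypothetical name, not the RISC-V backend code.
float roundTowardZeroViaConvert(float X) {
  // Floats with magnitude >= 2^23 are already integral, and NaN/Inf must
  // pass through unchanged; the negated comparison is also true for NaN.
  if (!(std::fabs(X) < 8388608.0f)) // 2^23
    return X;
  // C++ float -> int conversion truncates toward zero, like an rtz convert.
  int32_t I = static_cast<int32_t>(X);
  // Restore the sign so that e.g. -0.5f yields -0.0f, as llvm.trunc does.
  return std::copysign(static_cast<float>(I), X);
}
```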
[clang][bytecode] Disable tail calls on sparc (#189887)
Tail calls appear to cause problems on SPARC as well:
https://lab.llvm.org/buildbot/#/builders/114/builds/252
Interp.cpp:2572:21: error: cannot tail-call: target is not able to
optimize the call into a sibling call
2572 | MUSTTAIL return Fn(S, PC);
| ~~^~~~~~~
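The error comes from a guaranteed tail call (`MUSTTAIL`) in the interpreter's dispatch. A minimal sketch, assuming a macro that wraps Clang's `[[clang::musttail]]` attribute, of how such a macro can degrade to an ordinary call on SPARC via the predefined `__sparc__` macro. The handlers, `State`, and `Program` are hypothetical and not the Interp.cpp code.

```cpp
#include <cassert>

// Hypothetical macro: request a guaranteed tail call only where it is known
// to work. SPARC (and compilers without the attribute) fall back to a plain
// call, which is still correct but may use stack per dispatch.
#if defined(__clang__) && !defined(__sparc__) && defined(__has_cpp_attribute)
#if __has_cpp_attribute(clang::musttail)
#define MUSTTAIL [[clang::musttail]]
#endif
#endif
#ifndef MUSTTAIL
#define MUSTTAIL
#endif

struct State {
  int Acc = 0;
};
using OpFn = int (*)(State &S, unsigned PC);

int opHalt(State &S, unsigned PC);
int opAdd(State &S, unsigned PC);

// Tiny "bytecode": each handler tail-calls the next one instead of
// returning to a dispatch loop.
static const OpFn Program[] = {opAdd, opAdd, opAdd, opHalt};

int opHalt(State &S, unsigned) { return S.Acc; }

int opAdd(State &S, unsigned PC) {
  S.Acc += static_cast<int>(PC) + 1;
  OpFn Fn = Program[PC + 1];
  // Caller and callee have identical signatures, as musttail requires.
  MUSTTAIL return Fn(S, PC + 1);
}

int run(State &S) { return Program[0](S, 0); }
```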
[AArch64][GlobalISel] Select SQDMLSLv1i64_indexed when vector_extract is present
Like SQDMLALv1i64_indexed, selecting this instruction reduces the number of instructions generated by one, as it performs both the vector extract and the sqdmlsl in a single instruction.
This only works when the vector being extracted from is a v4i32, not a v2i32, due to issues GlobalISel currently has selecting intrinsics that use v2i32.
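For reference, the lane-indexed scalar form `SQDMLSL Dd, Sn, Vm.S[lane]` computes a saturating `Dd - 2 * Sn * Vm[lane]`, reading the lane directly. A reference model in plain C++ (not LLVM code; the helper names are hypothetical), with Vm modelled as a v4i32 to match the case this patch handles:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>

// Reference model (not LLVM code) of SQDMLSL Dd, Sn, Vm.S[Lane].
using V4I32 = std::array<int32_t, 4>;

constexpr int64_t I64Max = std::numeric_limits<int64_t>::max();
constexpr int64_t I64Min = std::numeric_limits<int64_t>::min();

int64_t satAdd64(int64_t A, int64_t B) {
  if (B > 0 && A > I64Max - B) return I64Max;
  if (B < 0 && A < I64Min - B) return I64Min;
  return A + B;
}

int64_t satSub64(int64_t A, int64_t B) {
  if (B < 0 && A > I64Max + B) return I64Max;
  if (B > 0 && A < I64Min + B) return I64Min;
  return A - B;
}

// 2 * A * B, saturated: only INT32_MIN * INT32_MIN overflows after doubling.
int64_t satDoublingMul(int32_t A, int32_t B) {
  int64_t P = static_cast<int64_t>(A) * B; // |P| <= 2^62, no overflow
  return satAdd64(P, P);
}

// The lane read is part of the instruction, so no separate extract is needed.
int64_t sqdmlslLane(int64_t Acc, int32_t N, const V4I32 &M, unsigned Lane) {
  assert(Lane < M.size() && "lane out of range for v4i32");
  return satSub64(Acc, satDoublingMul(N, M[Lane]));
}
```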
[AArch64][GlobalISel] Add test for v4i32 vector extract sqdmlal/sqdmlsl
1. The tests only cover the v4i32 versions of the intrinsic, as v2i32 currently doesn't work.
2. GlobalISel currently generates poor code in the sqdmlsl case. To fix this, the SQDMLALv1i64_indexed pattern needs to be copied over for sqdmlsl.
[MLIR][Affine] Fix dead store elimination for vector stores with different types (#189248)
affine-scalrep's findUnusedStore incorrectly classified an
affine.vector_store as dead when a subsequent store wrote to the same
base index but with a smaller vector type. A vector<1xi64> store at
[0,0] does not fully overwrite a vector<5xi64> store at [0,0], so the
first store must be preserved.
The loadCSE function in the same file already had the correct
type-equality check for loads; this patch adds the analogous check for
stores in findUnusedStore.
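The fix boils down to one extra condition in the "is the earlier store overwritten?" check. A hypothetical, simplified model of that condition (plain structs and strings, not the MLIR API or the actual findUnusedStore code):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model; the real code works on affine ops and
// compares memrefs, access maps and MLIR types.
struct StoreInfo {
  std::string MemRef;        // base memref being written
  std::vector<long> Indices; // constant base index into the memref
  std::string ValueType;     // e.g. "vector<5xi64>"
};

// An earlier store is dead only if a later store writes the same memref at
// the same base index *and* with the same value type. A narrower vector at
// the same index leaves part of the earlier store visible.
bool laterStoreKillsEarlier(const StoreInfo &Earlier, const StoreInfo &Later) {
  return Earlier.MemRef == Later.MemRef &&
         Earlier.Indices == Later.Indices &&
         Earlier.ValueType == Later.ValueType; // analogue of the added check
}
```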
Fixes #113687
Assisted-by: Claude Code
[AArch64][GlobalISel] Select lane index sqdmlal when vector_extract of v4i32 present
SQDMLALv1i64_indexed takes a vector lane index as its final operand, so it does not need a separate instruction to extract the element.
This only works when the vector being extracted from is a v4i32. Extracting from a v2i32 currently doesn't work, and I'm unsure why.
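To see why the indexed form saves an instruction, compare a two-step model (extract the lane, then multiply-accumulate) with a one-step model where the lane is an operand. Both compute a saturating `Acc + 2 * N * V[lane]`. A reference sketch in plain C++ (hypothetical names, not LLVM code), assuming a v4i32 source vector and the GCC/Clang `__builtin_add_overflow` builtin:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Reference model (not LLVM code). Assumption: the source vector is v4i32.
using V4I32 = std::array<int32_t, 4>;

// Saturating 64-bit add using the GCC/Clang overflow builtin.
int64_t satAdd64(int64_t A, int64_t B) {
  int64_t R;
  if (__builtin_add_overflow(A, B, &R))
    return B > 0 ? INT64_MAX : INT64_MIN;
  return R;
}

// Saturating doubling multiply of two i32 values into an i64.
int64_t satDoublingMul(int32_t A, int32_t B) {
  int64_t P = static_cast<int64_t>(A) * B; // |P| <= 2^62, cannot overflow
  return satAdd64(P, P);
}

// Two-step form: extract the lane into a scalar, then multiply-accumulate.
int64_t extractThenSqdmlal(int64_t Acc, int32_t N, const V4I32 &V,
                           unsigned Lane) {
  int32_t Elt = V.at(Lane); // models the separate extract instruction
  return satAdd64(Acc, satDoublingMul(N, Elt));
}

// One-step form: the lane index is an operand of the instruction itself,
// as with SQDMLALv1i64_indexed, so no extract is needed.
int64_t sqdmlalIndexed(int64_t Acc, int32_t N, const V4I32 &V,
                       unsigned Lane) {
  assert(Lane < 4 && "v4i32 has lanes 0-3");
  return satAdd64(Acc, satDoublingMul(N, V[Lane]));
}
```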
[AArch64][llvm] Gate some `tlbip` insns with either +tlbid or +d128 (#178913)
Change the gating of `tlbip` instructions (`sysp` aliases) whose operation
contains `*E1IS*`, `*E1OS*`, `*E2IS*` or `*E2OS*` so that they are available
with either `+tlbid` or `+d128`. This follows the 2025 Armv9.7-A MemSys
specification, which says:
```
All TLBIP *E1IS*, TLBIP *E1OS*, TLBIP *E2IS* and TLBIP *E2OS*
instructions that are currently dependent on FEAT_D128 are updated
to be dependent on FEAT_D128 or FEAT_TLBID
```
See also change #178912 where the gating of `+d128` for `sysp` was
removed.
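The availability rule can be modelled as a small predicate. This is a hypothetical C++ sketch (the real gating is expressed as TableGen predicates), assuming that all other `tlbip` operations still require `+d128` alone; the operation names in the usage are illustrative strings only.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical model of the gating rule; not LLVM code.
struct Features {
  bool D128 = false;
  bool TLBID = false;
};

// True if the TLBIP operation name contains one of the E1IS, E1OS, E2IS or
// E2OS patterns named in the specification text.
bool isSharedDomainOp(std::string_view Op) {
  static constexpr std::string_view Patterns[] = {"E1IS", "E1OS", "E2IS",
                                                  "E2OS"};
  for (std::string_view Pat : Patterns)
    if (Op.find(Pat) != std::string_view::npos)
      return true;
  return false;
}

bool isTlbipAvailable(std::string_view Op, const Features &F) {
  if (isSharedDomainOp(Op))
    return F.D128 || F.TLBID; // relaxed by the spec update
  // Assumption: every other TLBIP operation still requires FEAT_D128.
  return F.D128;
}
```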
pass target triple to `check_assembler_flag` (#188521)
Target-specific flags (notably `-mimplicit-it=always` for ARM) are not
recognized by the clang assembler unless the target is specified. This
PR passes the value of `CMAKE_C_COMPILER_TARGET` to the assembler so
that target-specific flags are recognized.
## Previous behaviour
When configuring builtins for an ARMv7 target:
```
-- Builtin supported architectures: armv7
-- Checking for assembler flag -mimplicit-it=always
-- Checking for assembler flag -mimplicit-it=always - Not accepted
-- Checking for assembler flag -Wa,-mimplicit-it=always
-- Checking for assembler flag -Wa,-mimplicit-it=always - Not accepted
CMake Warning at CMakeLists.txt:462 (message):
Don't know how to set the -mimplicit-it=always flag in this assembler; not
[18 lines not shown]
[AMDGPU] Disable generic DAG combines at -O0 to preserve debuggability. (#176304)
Disable generic DAG combines for AMDGPU at -O0 via
disableGenericCombines() to preserve instructions that users may want to
set breakpoints on during debugging.
Assisted-by: Cursor / Claude Opus 4.6
[IR] Fix C API after getTerminator() change (#189922)
The C API function LLVMGetBasicBlockTerminator should return NULL when
the basic block is not well-formed.
[AArch64][llvm] Encode `stshh` as a `HINT` alias (NFC)
Implement `stshh` as a `HINT` alias instead of a dedicated system opcode.
The Arm ARM places `stshh` in the `HINT` encoding space, but it is
currently implemented as a separate class.
Make it an alias of `HINT` and restrict the `PHint` definition to 7 bits.
Also update the `stshh` pseudo expansion for the intrinsic to emit
`HINT #0x30 | policy`.
No test changes.
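Because `HINT` carries a 7-bit immediate, the expansion only has to place `0x30 | policy` into that field. A sketch of the encoding arithmetic, assuming the standard A64 `HINT` layout (base `0xD503201F`, immediate split across CRm and op2, i.e. shifted left by 5) and assuming the policy value only occupies low bits; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// HINT #Imm7: the 7-bit immediate occupies CRm (bits 11:8) and op2 (bits
// 7:5) of the base encoding 0xD503201F. Hypothetical helper name.
uint32_t encodeHint(unsigned Imm7) {
  assert(Imm7 < 128 && "HINT immediate is 7 bits");
  return 0xD503201Fu | (static_cast<uint32_t>(Imm7) << 5);
}

// stshh expressed as HINT #(0x30 | Policy). Assumption: Policy only uses
// low bits, so the combined immediate still fits in 7 bits.
uint32_t encodeStshh(unsigned Policy) {
  assert((0x30u | Policy) < 128 && "immediate must stay within 7 bits");
  return encodeHint(0x30u | Policy);
}
```

For example, `encodeHint(0)` is the `NOP` encoding `0xD503201F`, and `encodeStshh(0)` places `0x30` in the immediate field, giving `0xD503261F`.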