[RISCV] Remove codegen for VP float rounding intrinsics (#189896)
Part of the work to remove trivial VP intrinsics from the RISC-V
backend, see
https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999
This splits off seven intrinsics from #179622.
We now generate vfcvt.rtz for llvm.vp.roundtozero. It looks like we
should have been using the codegen for llvm.trunc for it, but we somehow
missed that.
[clang][bytecode] Disable tail calls on sparc (#189887)
Looks like this causes problems there as well:
https://lab.llvm.org/buildbot/#/builders/114/builds/252
Interp.cpp:2572:21: error: cannot tail-call: target is not able to
optimize the call into a sibling call
2572 | MUSTTAIL return Fn(S, PC);
| ~~^~~~~~~
[AArch64][GlobalISel] Select SQDMLSLv1i64_indexed when vector_extract present
Like SQDMLALv1i64_indexed, selecting this intrinsic reduces the number of instructions generated by 1, as it performs both the vector extract and the sqdmlsl in one instruction.
This only works when the vector to extract from is v4i32, not v2i32. This is due to some issues GlobalISel has selecting intrinsics using v2i32.
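As a rough illustration of why the indexed form saves an instruction, the arithmetic of the scalar i64 sqdmlal/sqdmlsl with a lane operand can be modelled as below. This is only a sketch of the arithmetic, not of the GlobalISel selection code; the function names `sat64` and `sqdml_indexed` are hypothetical, and the source vector is assumed to be a v4i32 as in the commit.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def sat64(x):
    """Clamp an arbitrary-precision integer to the signed 64-bit range."""
    return max(INT64_MIN, min(INT64_MAX, x))


def sqdml_indexed(acc, a, vec, lane, subtract=False):
    """Model acc +/- sat(2 * a * vec[lane]) with a final saturation.

    The lane is read directly from the vector, so no separate
    extract step is needed (hypothetical helper, for illustration).
    """
    assert len(vec) == 4, "sketch assumes a v4i32 source vector"
    assert INT32_MIN <= a <= INT32_MAX
    assert all(INT32_MIN <= e <= INT32_MAX for e in vec)
    product = sat64(2 * a * vec[lane])  # saturating doubling multiply
    result = acc - product if subtract else acc + product
    return sat64(result)  # saturating accumulate


# sqdmlal: 10 + 2*3*3; sqdmlsl: 10 - 2*3*3
print(sqdml_indexed(10, 3, [1, 2, 3, 4], lane=2))
print(sqdml_indexed(10, 3, [1, 2, 3, 4], lane=2, subtract=True))
```

The only difference between the two instructions in this model is the sign applied to the saturated product, which is why the sqdmlal pattern can be mirrored for sqdmlsl.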
[AArch64][GlobalISel] Add test for v4i32 vector extract sqdmlal/sqdmlsl
1. Tests only test v4i32 versions of the intrinsic, as v2i32 currently doesn't work.
2. GlobalISel currently generates poor code in the sqdmlsl case. To fix this, the SQDMLALv1i64_indexed pattern needs to be copied over for sqdmlsl.
[MLIR][Affine] Fix dead store elimination for vector stores with different types (#189248)
affine-scalrep's findUnusedStore incorrectly classified an
affine.vector_store as dead when a subsequent store wrote to the same
base index but with a smaller vector type. A vector<1xi64> store at
[0,0] does not fully overwrite a vector<5xi64> store at [0,0], so the
first store must be preserved.
The loadCSE function in the same file already had the correct
type-equality check for loads; this patch adds the analogous check for
stores in findUnusedStore.
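The condition described above can be sketched as follows. This is a minimal model, not the MLIR implementation: `VectorStore` and `fully_overwrites` are hypothetical names, and the sketch mirrors the type-equality check from the description rather than reasoning about partial overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorStore:
    """Hypothetical stand-in for an affine.vector_store op."""
    memref: str
    indices: tuple
    num_elements: int
    elem_type: str


def fully_overwrites(later, earlier):
    """Return True only if `later` writes the same location with the same type.

    Matching the base index alone is not enough: a narrower vector
    leaves part of the earlier store's data live.
    """
    same_location = (later.memref == earlier.memref
                     and later.indices == earlier.indices)
    same_type = (later.num_elements == earlier.num_elements
                 and later.elem_type == earlier.elem_type)
    return same_location and same_type


wide = VectorStore("A", (0, 0), 5, "i64")    # vector<5xi64> at [0,0]
narrow = VectorStore("A", (0, 0), 1, "i64")  # vector<1xi64> at [0,0]
print(fully_overwrites(narrow, wide))  # the wide store must be kept
```

Requiring equal types is conservative: it also rejects a later, wider store that would in fact cover the earlier one, in exchange for a simple check.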
Fixes #113687
Assisted-by: Claude Code
[AArch64][GlobalISel] Select lane index sqdmlal when vector_extract of v4i32 present
SQDMLALv1i64_indexed takes in an index of a vector as its final operand, meaning it doesn't need to extract the element in a separate instruction.
This only works when the vector to extract from is a v4i32. Currently, extracting from a v2i32 doesn't work, and I'm unsure why.
[AArch64][llvm] Gate some `tlbip` insns with either +tlbid or +d128 (#178913)
Change the gating of `tlbip` instructions (`sysp` aliases) containing
`*E1IS*`, `*E1OS*`, `*E2IS*` or `*E2OS*` to be used with `+tlbid` or
`+d128`. This is because the 2025 Armv9.7-A MemSys specification says:
```
All TLBIP *E1IS*, TLBIP *E1OS*, TLBIP *E2IS* and TLBIP *E2OS*
instructions that are currently dependent on FEAT_D128 are updated
to be dependent on FEAT_D128 or FEAT_TLBID
```
See also change #178912 where the gating of `+d128` for `sysp` was
removed.
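The relaxed gating can be pictured with a small predicate. This is a sketch only: the helper names are hypothetical, features are modelled as plain strings, and the real check lives in the AArch64 TableGen predicates rather than in code like this.

```python
RELAXED_MARKERS = ("E1IS", "E1OS", "E2IS", "E2OS")


def is_relaxed_tlbip(op_name):
    """True if the TLBIP operation name contains one of the listed markers."""
    name = op_name.upper()
    return any(marker in name for marker in RELAXED_MARKERS)


def relaxed_tlbip_enabled(features):
    """The relaxed group is available with either +d128 or +tlbid."""
    return "d128" in features or "tlbid" in features


print(is_relaxed_tlbip("VAE1IS"), relaxed_tlbip_enabled({"tlbid"}))
print(is_relaxed_tlbip("VAE1"))  # no IS/OS marker, so not in the relaxed group
```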
pass target triple to `check_assembler_flag` (#188521)
Target-specific flags (notably `-mimplicit-it=always` for ARM) are not
recognized by the clang assembler unless the target is specified. This
PR passes the value of `CMAKE_C_COMPILER_TARGET` to the assembler so
that target-specific flags are recognized.
## Previous behaviour
When configuring builtins for an ARMv7 target:
```
-- Builtin supported architectures: armv7
-- Checking for assembler flag -mimplicit-it=always
-- Checking for assembler flag -mimplicit-it=always - Not accepted
-- Checking for assembler flag -Wa,-mimplicit-it=always
-- Checking for assembler flag -Wa,-mimplicit-it=always - Not accepted
CMake Warning at CMakeLists.txt:462 (message):
Don't know how to set the -mimplicit-it=always flag in this assembler; not
[18 lines not shown]
[AMDGPU] Disable generic DAG combines at -O0 to preserve debuggability. (#176304)
Disable generic DAG combines for AMDGPU at -O0 via
disableGenericCombines() to preserve instructions that users may want to
set breakpoints on during debugging.
Assisted-by: Cursor / Claude Opus 4.6
[IR] Fix C API after getTerminator() change (#189922)
The C API function LLVMGetBasicBlockTerminator should return NULL when
the basic block is not well-formed.
[AArch64][llvm] Encode `stshh` as a `HINT` alias (NFC)
Implement `stshh` as a `HINT` alias instead of a dedicated system opcode.
The Arm ARM says that `stshh` is in the `HINT` encoding space, but it is
currently written as a separate class.
Change this to be an alias of `HINT` and the `PHint` definition to only
use 7 bits. Also update the `stshh` pseudo expansion for the intrinsic
to emit `HINT #0x30 | policy`.
No test changes.
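The `HINT #0x30 | policy` expansion mentioned above amounts to the following arithmetic. This is a sketch under assumptions: the policy encodings (`keep` = 0, `strm` = 1) and the helper name `stshh_hint_imm` are not taken from the commit and should be checked against the Arm ARM.

```python
STSHH_HINT_BASE = 0x30

# Assumed policy encodings, for illustration only.
POLICY = {"keep": 0, "strm": 1}


def stshh_hint_imm(policy):
    """Compute the HINT immediate for a given stshh policy name."""
    imm = STSHH_HINT_BASE | POLICY[policy]
    # The HINT immediate is a 7-bit field.
    assert 0 <= imm < 128
    return imm


for name in POLICY:
    print(f"stshh {name} -> hint #{stshh_hint_imm(name):#x}")
```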
[AArch64][llvm] Gate some `tlbip` insns with +tlbid or +d128
Change the gating of `tlbip` instructions containing `*E1IS*`, `*E1OS*`,
`*E2IS*` or `*E2OS*` to be used with `+tlbid` or `+d128`. This is because
the 2025 Armv9.7-A MemSys specification says:
```
All TLBIP *E1IS*, TLBIP *E1OS*, TLBIP *E2IS* and TLBIP *E2OS* instructions
that are currently dependent on FEAT_D128 are updated to be dependent
on FEAT_D128 or FEAT_TLBID
```