LLVM/project effcd18 llvm/test/CodeGen/RISCV/rvv roundtozero-vp.ll roundeven-vp.ll

[RISCV] Remove codegen for VP float rounding intrinsics (#189896)

Part of the work to remove trivial VP intrinsics from the RISC-V
backend, see
https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999

This splits off seven intrinsics from #179622.

We now generate vfcvt.rtz for llvm.vp.roundtozero. It looks like we
should have been using the codegen for llvm.trunc for it, but we somehow
missed that.
Delta File
+474 -1,054 llvm/test/CodeGen/RISCV/rvv/roundtozero-vp.ll
+431 -799 llvm/test/CodeGen/RISCV/rvv/roundeven-vp.ll
+431 -799 llvm/test/CodeGen/RISCV/rvv/round-vp.ll
+431 -799 llvm/test/CodeGen/RISCV/rvv/floor-vp.ll
+406 -774 llvm/test/CodeGen/RISCV/rvv/nearbyint-vp.ll
+376 -744 llvm/test/CodeGen/RISCV/rvv/rint-vp.ll
+2,549 -4,969 11 files not shown
+4,375 -8,740 17 files

LLVM/project f2685fa clang/lib/AST/ByteCode Interp.cpp

[clang][bytecode] Disable tail calls on sparc (#189887)

Looks like this causes problems there as well:
https://lab.llvm.org/buildbot/#/builders/114/builds/252

Interp.cpp:2572:21: error: cannot tail-call: target is not able to
optimize the call into a sibling call
 2572 |   MUSTTAIL return Fn(S, PC);
      |                   ~~^~~~~~~
Delta File
+1 -1 clang/lib/AST/ByteCode/Interp.cpp
+1 -1 1 files

LLVM/project 7a33b1d llvm/lib/Target/AArch64 AArch64InstrInfo.td

[AArch64][GlobalISel] Move new SQDMULLi32 pattern to join the others
Delta File
+4 -4 llvm/lib/Target/AArch64/AArch64InstrInfo.td
+4 -4 1 files

LLVM/project 853cbc2 llvm/lib/IR Metadata.cpp

second attempt of perf regression fix...

Created using spr 1.3.8-wip
Delta File
+2 -0 llvm/lib/IR/Metadata.cpp
+2 -0 1 files

LLVM/project a5b9abc llvm/lib/Target/AArch64 AArch64InstrInfo.td, llvm/test/CodeGen/AArch64 arm64-vmul.ll

[AArch64][GlobalISel] Select SQDMLSLv1i64_indexed when vector_extract present

Like SQDMLALv1i64_indexed, selecting this instruction reduces the number of instructions generated by 1, as it performs both the vector extract and the sqdmlsl in one instruction.

This only works when the vector to extract from is v4i32, not v2i32. This is due to some issues GlobalISel has selecting intrinsics using v2i32.
Delta File
+7 -16 llvm/test/CodeGen/AArch64/arm64-vmul.ll
+6 -0 llvm/lib/Target/AArch64/AArch64InstrInfo.td
+13 -16 2 files

LLVM/project a3ebf37 llvm/lib/Target/AMDGPU GCNVOPDUtils.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.fdot2.ll llvm.amdgcn.fdot2.f32.bf16.ll

AMDGPU: Fix generation for dot2 VOPD with sgpr inputs

There was no check for sgpr in src1 operand.
Delta File
+204 -0 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll
+108 -0 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll
+6 -4 llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+318 -4 3 files

LLVM/project d482215 llvm/test/CodeGen/AArch64 arm64-vmul.ll

[AArch64][GlobalISel] Add test for v4i32 vector extract sqdmlal/sqdmlsl

1. The tests only cover the v4i32 versions of the intrinsic, as v2i32 currently doesn't work.
2. GlobalISel currently generates poor code in the sqdmlsl case. To fix this, the SQDMLALv1i64_indexed pattern needs to be copied over for sqdmlsl.
Delta File
+42 -6 llvm/test/CodeGen/AArch64/arm64-vmul.ll
+42 -6 1 files

LLVM/project f6ffdbc mlir/lib/Dialect/Affine/Utils Utils.cpp, mlir/test/Dialect/Affine scalrep.mlir

[MLIR][Affine] Fix dead store elimination for vector stores with different types (#189248)

affine-scalrep's findUnusedStore incorrectly classified an
affine.vector_store as dead when a subsequent store wrote to the same
base index but with a smaller vector type. A vector<1xi64> store at
[0,0] does not fully overwrite a vector<5xi64> store at [0,0], so the
first store must be preserved.

The loadCSE function in the same file already had the correct
type-equality check for loads; this patch adds the analogous check for
stores in findUnusedStore.
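
The rule described above can be sketched with a toy model: an earlier store is treated as dead only if a later store to the same location writes a value of the same type. The `Store` record and `is_dead_store` helper are hypothetical Python stand-ins, not the MLIR API, and the other conditions the real `findUnusedStore` has to check (for example intervening reads) are left out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Store:
    """Hypothetical stand-in for an affine.store / affine.vector_store op."""
    memref: str
    indices: Tuple[int, ...]
    value_type: str  # e.g. "vector<5xi64>"


def is_dead_store(earlier: Store, later: Store) -> bool:
    """Return True if `later` fully overwrites `earlier` in this toy model."""
    same_location = (earlier.memref == later.memref
                     and earlier.indices == later.indices)
    # The analogous check to loadCSE: a store of a different (for example
    # narrower) vector type does not cover every element written earlier.
    same_type = earlier.value_type == later.value_type
    return same_location and same_type


wide = Store("%m", (0, 0), "vector<5xi64>")
narrow = Store("%m", (0, 0), "vector<1xi64>")
print(is_dead_store(wide, narrow))  # False: the wide store must be kept
print(is_dead_store(wide, wide))    # True: same location and same type
```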

Fixes #113687

Assisted-by: Claude Code
Delta File
+37 -0 mlir/test/Dialect/Affine/scalrep.mlir
+13 -2 mlir/lib/Dialect/Affine/Utils/Utils.cpp
+50 -2 2 files

LLVM/project 67a4c90 libsycl/src/detail program_manager.cpp program_manager.hpp

fix comments

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>
Delta File
+9 -9 libsycl/src/detail/program_manager.cpp
+5 -0 libsycl/src/detail/program_manager.hpp
+14 -9 2 files

LLVM/project 16a3e0a llvm/lib/Target/AArch64 AArch64InstrInfo.td

[AArch64][GlobalISel] Select lane index sqdmlal when vector_extract of v4i32 present

SQDMLALv1i64_indexed takes in an index of a vector as its final operand, meaning it doesn't need to extract the element in a separate instruction.

This only works when the vector to extract from is a v4i32. Currently, extracting from a v2i32 doesn't work, and I'm unsure why.
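
As a rough illustration of what the lane-indexed form computes, the sketch below models a scalar signed saturating doubling multiply-accumulate long in Python, with the lane read folded into the same operation. The function names are hypothetical, the inputs are assumed to already be valid i32 values, and the model follows the general architectural description of SQDMLAL (saturate 2*a*b to 64 bits, then saturate the accumulation) rather than anything in the patch itself.

```python
INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1


def sat64(x: int) -> int:
    """Clamp an arbitrary-precision integer to the signed 64-bit range."""
    return max(INT64_MIN, min(INT64_MAX, x))


def sqdmlal_indexed(acc: int, a: int, vec: list, lane: int) -> int:
    """Toy model: sat64(acc + sat64(2 * a * vec[lane]))."""
    # The lane is read directly from the vector operand, so no separate
    # extract step is modelled.
    product = sat64(2 * a * vec[lane])
    return sat64(acc + product)


v4i32 = [1, -2, 3, 4]
print(sqdmlal_indexed(10, 5, v4i32, 2))  # 40, i.e. 10 + 2*5*3

# 2 * (-2**31) * (-2**31) is 2**63, which saturates to INT64_MAX.
print(sqdmlal_indexed(0, -(1 << 31), [-(1 << 31)] * 4, 0) == INT64_MAX)  # True
```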
Delta File
+6 -0 llvm/lib/Target/AArch64/AArch64InstrInfo.td
+6 -0 1 files

LLVM/project b655050 llvm/lib/Target/AArch64 AArch64InstrInfo.td, llvm/lib/Target/AArch64/GISel AArch64RegisterBankInfo.cpp

[AArch64][GlobalISel] Add patterns for scalar sqdmlal/sqdmlsl
Delta File
+86 -44 llvm/test/CodeGen/AArch64/arm64-vmul.ll
+12 -0 llvm/lib/Target/AArch64/AArch64InstrInfo.td
+1 -2 llvm/test/CodeGen/AArch64/arm64-int-neon.ll
+1 -0 llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
+100 -46 4 files

LLVM/project 5a3abf9 llvm/include/llvm/ADT Uniformity.h, llvm/lib/Analysis UniformityAnalysis.cpp TargetTransformInfo.cpp

[NFC] Rename InstructionUniformity to ValueUniformity
Delta File
+25 -26 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+7 -7 llvm/include/llvm/ADT/Uniformity.h
+6 -7 llvm/lib/Analysis/UniformityAnalysis.cpp
+5 -6 llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+5 -5 llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
+4 -4 llvm/lib/Analysis/TargetTransformInfo.cpp
+52 -55 7 files not shown
+68 -75 13 files

LLVM/project d6cd159 llvm/lib/Target/AArch64 AArch64SystemOperands.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

[AArch64][llvm] Gate some `tlbip` insns with either +tlbid or +d128 (#178913)

Change the gating of `tlbip` instructions (`sysp` aliases) containing
`*E1IS*`, `*E1OS*`, `*E2IS*` or `*E2OS*` to be used with `+tlbid` or
`+d128`. This is because the 2025 Armv9.7-A MemSys specification says:

```
  All TLBIP *E1IS*, TLBIP *E1OS*, TLBIP *E2IS* and TLBIP *E2OS*
  instructions that are currently dependent on FEAT_D128 are updated
  to be dependent on FEAT_D128 or FEAT_TLBID
```

See also change #178912 where the gating of `+d128` for `sysp` was
removed.
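
The gating rule amounts to an either/or feature test. A minimal Python sketch, assuming a hypothetical `tlbip_allowed` helper and plain string feature names (neither is part of LLVM); the operand match is taken from the `*E1IS*`, `*E1OS*`, `*E2IS*` and `*E2OS*` patterns above, and leaving every other operand gated on `d128` alone is an assumption of this sketch.

```python
# Substrings taken from the *E1IS*, *E1OS*, *E2IS*, *E2OS* patterns.
GATED_MARKERS = ("E1IS", "E1OS", "E2IS", "E2OS")


def tlbip_allowed(operand: str, features: set) -> bool:
    """Toy model of the feature gate for a `tlbip <operand>` instruction."""
    name = operand.upper()
    if any(marker in name for marker in GATED_MARKERS):
        # Either feature is sufficient for these operands.
        return "d128" in features or "tlbid" in features
    # Assumption: operands outside the patterns still need d128.
    return "d128" in features


print(tlbip_allowed("VAE1IS", {"tlbid"}))  # True
print(tlbip_allowed("VAE1IS", {"d128"}))   # True
print(tlbip_allowed("VAE1", {"tlbid"}))    # False under the stated assumption
```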
Delta File
+498 -366 llvm/test/MC/AArch64/armv9a-tlbip.s
+17 -14 llvm/lib/Target/AArch64/AArch64SystemOperands.td
+17 -2 llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+2 -4 llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+534 -386 4 files

LLVM/project eed5592 llvm/lib/IR Instruction.cpp Metadata.cpp

fix perf regression

Created using spr 1.3.8-wip
Delta File
+1 -2 llvm/lib/IR/Instruction.cpp
+2 -0 llvm/lib/IR/Metadata.cpp
+3 -2 2 files

LLVM/project 97562e7 compiler-rt/cmake/Modules CheckAssemblerFlag.cmake

pass target triple to `check_assembler_flag` (#188521)

Target-specific flags (notably `-mimplicit-it=always` for ARM) are not
recognized by the clang assembler unless the target is specified. This
PR passes the value of `CMAKE_C_COMPILER_TARGET` to the assembler so
that target-specific flags are recognized.

## Previous behaviour

When configuring builtins for an ARMv7 target:

```
-- Builtin supported architectures: armv7
-- Checking for assembler flag -mimplicit-it=always
-- Checking for assembler flag -mimplicit-it=always - Not accepted
-- Checking for assembler flag -Wa,-mimplicit-it=always
-- Checking for assembler flag -Wa,-mimplicit-it=always - Not accepted
CMake Warning at CMakeLists.txt:462 (message):
  Don't know how to set the -mimplicit-it=always flag in this assembler; not

    [18 lines not shown]
Delta File
+1 -0 compiler-rt/cmake/Modules/CheckAssemblerFlag.cmake
+1 -0 1 files

LLVM/project 6bf794a llvm/test/CodeGen/AMDGPU memory-legalizer-private-singlethread.ll memory-legalizer-private-workgroup.ll

[AMDGPU] Disable generic DAG combines at -O0 to preserve debuggability. (#176304)

Disable generic DAG combines for AMDGPU at -O0 via
disableGenericCombines() to preserve instructions that users may want to
set breakpoints on during debugging.

Assisted-by: Cursor / Claude Opus 4.6
Delta File
+8,544 -1,366 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-singlethread.ll
+8,544 -1,366 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-workgroup.ll
+8,544 -1,366 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-wavefront.ll
+8,449 -1,355 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-cluster.ll
+8,449 -1,355 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll
+8,069 -1,315 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-system.ll
+50,599 -8,123 71 files not shown
+191,551 -26,122 77 files

LLVM/project c5363f2 llvm/lib/IR Core.cpp, llvm/tools/llvm-c-test echo.cpp

[IR] Fix C API after getTerminator() change (#189922)

The C API function LLVMGetBasicBlockTerminator should return NULL when
the basic block is not well-formed.
Delta File
+5 -0 llvm/tools/llvm-c-test/echo.cpp
+1 -1 llvm/lib/IR/Core.cpp
+6 -1 2 files

LLVM/project 25eb4f4 llvm/lib/Target/AArch64 AArch64InstrFormats.td AArch64SystemOperands.td, llvm/lib/Target/AArch64/Utils AArch64BaseInfo.h

[AArch64][llvm] Encode `stshh` as a `HINT` alias (NFC)

Implement `stshh` as a `HINT` alias instead of a dedicated system opcode.
The Arm ARM says that `stshh` is in the `HINT` encoding space, but it is
currently written as a separate class.

Change this to be an alias of `HINT` and the `PHint` definition to only
use 7 bits. Also update the `stshh` pseudo expansion for the intrinsic
to emit `HINT #0x30 | policy`.
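
To make the encoding relationship concrete, the sketch below computes the 32-bit instruction word for `HINT #imm` and then for `HINT #(0x30 | policy)`. The base word `0xD503201F` with the 7-bit immediate in bits [11:5] is the standard A64 `HINT` encoding; the policy values (`keep` = 0, `strm` = 1) and the helper names are assumptions for illustration and are not taken from the patch.

```python
HINT_BASE = 0xD503201F  # A64 encoding of HINT #0 (NOP)

# Assumed policy numbering, for illustration only.
STSHH_POLICIES = {"keep": 0, "strm": 1}


def encode_hint(imm7: int) -> int:
    """Place a 7-bit immediate into bits [11:5] of the HINT encoding."""
    if not 0 <= imm7 < 128:
        raise ValueError("HINT immediate must fit in 7 bits")
    return HINT_BASE | (imm7 << 5)


def encode_stshh(policy: str) -> int:
    """Model stshh as an alias of HINT #(0x30 | policy)."""
    return encode_hint(0x30 | STSHH_POLICIES[policy])


print(hex(encode_hint(0)))        # 0xd503201f
print(hex(encode_stshh("keep")))  # 0xd503261f
```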

No test changes.
Delta File
+6 -10 llvm/lib/Target/AArch64/AArch64InstrFormats.td
+5 -10 llvm/lib/Target/AArch64/AArch64SystemOperands.td
+2 -2 llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
+1 -1 llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+1 -1 llvm/lib/Target/AArch64/AArch64InstrInfo.td
+15 -24 5 files

LLVM/project 71cc594 llvm/lib/Analysis DependenceAnalysis.cpp

[DA] add debug log in findGCD (#189537)
Delta File
+7 -4 llvm/lib/Analysis/DependenceAnalysis.cpp
+7 -4 1 files

LLVM/project c431f1c llvm/test/MC/AArch64 armv9a-tlbip.s

fixup! Adjust tlbip.s test
Delta File
+114 -58 llvm/test/MC/AArch64/armv9a-tlbip.s
+114 -58 1 files

LLVM/project 649be54 llvm/test/MC/AArch64 armv9a-tlbip.s

fixup! Optimise RUN lines in armv9a-tlbip.s
Delta File
+58 -114 llvm/test/MC/AArch64/armv9a-tlbip.s
+58 -114 1 files

LLVM/project 0c22ec6 llvm/lib/Target/AArch64 AArch64SystemOperands.td, llvm/test/MC/AArch64 armv9a-tlbip.s

fixup! More PR cleanups following comments
Delta File
+7 -11 llvm/lib/Target/AArch64/AArch64SystemOperands.td
+0 -4 llvm/test/MC/AArch64/armv9a-tlbip.s
+7 -15 2 files

LLVM/project 032f5e8 llvm/lib/Target/AArch64 AArch64SystemOperands.td, llvm/lib/Target/AArch64/Utils AArch64BaseInfo.h

fixup! Fix commits after rebase to main
Delta File
+19 -29 llvm/lib/Target/AArch64/AArch64SystemOperands.td
+5 -6 llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+6 -0 llvm/test/MC/AArch64/armv9a-tlbip.s
+30 -35 3 files

LLVM/project d9f0a6a llvm/lib/Target/AArch64 AArch64SystemOperands.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

fixup! More optimisations
Delta File
+10 -11 llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+7 -6 llvm/lib/Target/AArch64/AArch64SystemOperands.td
+1 -8 llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+18 -25 3 files

LLVM/project 1e4d967 llvm/test/MC/AArch64 tlbip-tlbid-or-d128.s armv9a-tlbip.s

fixup! Fix using Marian's suggestion
Delta File
+0 -259 llvm/test/MC/AArch64/tlbip-tlbid-or-d128.s
+160 -0 llvm/test/MC/AArch64/armv9a-tlbip.s
+160 -259 2 files

LLVM/project 3fbe4d9 llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp, llvm/lib/Target/AArch64/Utils AArch64BaseInfo.h

[AArch64][llvm] Gate some `tlbip` insns with +tlbid or +d128

Change the gating of `tlbip` instructions containing `*E1IS*`, `*E1OS*`,
`*E2IS*` or `*E2OS*` to be used with `+tlbid` or `+d128`. This is because
the 2025 Armv9.7-A MemSys specification says:

```
  All TLBIP *E1IS*, TLBIP *E1OS*, TLBIP *E2IS* and TLBIP *E2OS* instructions
  that are currently dependent on FEAT_D128 are updated to be dependent
  on FEAT_D128 or FEAT_TLBID
```
Delta File
+259 -0 llvm/test/MC/AArch64/tlbip-tlbid-or-d128.s
+66 -66 llvm/test/MC/AArch64/armv9a-tlbip.s
+15 -5 llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+20 -0 llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+360 -71 4 files

LLVM/project bef7c32 llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

fixup! Simplify logic after suggestions from Marian
Delta File
+13 -10 llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+13 -10 1 files

LLVM/project bda386a llvm/lib/Target/AArch64 AArch64SystemOperands.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

fixup! Don't use ExtraRequires. Instead, set a boolean in TLBITableBase
Delta File
+27 -22 llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+17 -1 llvm/lib/Target/AArch64/AArch64SystemOperands.td
+7 -7 llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+51 -30 3 files

LLVM/project 227039f llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp, llvm/lib/Target/AArch64/Utils AArch64BaseInfo.h

fixup! More simplification
Delta File
+413 -443 llvm/test/MC/AArch64/armv9a-tlbip.s
+1 -15 llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+7 -9 llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+421 -467 3 files

LLVM/project 138f3a2 llvm/lib/IR Core.cpp, llvm/tools/llvm-c-test echo.cpp

[spr] initial version

Created using spr 1.3.8-wip
Delta File
+5 -0 llvm/tools/llvm-c-test/echo.cpp
+1 -1 llvm/lib/IR/Core.cpp
+6 -1 2 files