[LV] Store DataLayout on VPTypeAnalysis (NFC) (#197231)
Using `R->getParent()->getPlan()->getDataLayout()` limits
`inferScalarType` to recipes within blocks that have been attached to a
plan.
(Hit while rebasing a PR.)
[lldb] Step over non-lldb breakpoints (#190622)
Note: this is a second attempt at 304c680 / #174348, hopefully fixing
the post-commit Mac testing failures. The main differences from the
previous commit are:
* Fixing the incorrect masks in ArchitectureArm.cpp
* Declining to step in StopInfoMachException if the PC and exception
exc_sub_code don't match - implies fixup already applied
* Change to reflect explicit Address constructor - I assume this is
correct, essentially explicitly making a temporary Address object of the
pc address in SkipOverTrapInstruction
* Removing the debugserver code to step over the trap instruction as it
interacts badly with this change (without the check mentioned
previously).
---
Several languages support some sort of "breakpoint" function, which adds
ISA-specific instructions to generate an interrupt at runtime. However,
[31 lines not shown]
[AggressiveInstCombine] POPCNT generation for bit-count pattern (#177109)
The proposal is to enhance LLVM by teaching it to recognize the pattern
and replace it with the hardware POPCNT instruction.
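For illustration only: the commit message does not say which bit-count pattern is matched, so the sketch below assumes the classic SWAR (parallel bit-summing) idiom for 32-bit values. The function names are hypothetical and not taken from the patch. A loop-based reference is included to check the idiom against.

```cpp
#include <cassert>
#include <cstdint>

// Assumed example of a "bit-count pattern": the classic SWAR population count.
// A pattern matcher could replace this whole sequence with a single popcount.
constexpr std::uint32_t swarPopcount32(std::uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);                 // 2-bit partial sums
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); // 4-bit partial sums
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 // 8-bit partial sums
  return (x * 0x01010101u) >> 24;                   // add the four bytes
}

// Straightforward reference: clear the lowest set bit until none remain.
constexpr std::uint32_t loopPopcount32(std::uint32_t x) {
  std::uint32_t n = 0;
  while (x != 0) {
    x &= x - 1;
    ++n;
  }
  return n;
}

static_assert(swarPopcount32(0xFFFFFFFFu) == 32, "all bits set");
static_assert(swarPopcount32(0u) == 0, "no bits set");
```

Both functions compute the same value; the SWAR form is the kind of fixed instruction sequence that is practical to recognize in IR.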
---------
Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal at amd.com>
Co-authored-by: Craig Topper <craig.topper at sifive.com>
[AMDGPU] Align GlobalISel with SelectionDAG for f16 to i1/i8 saturated conversions (#188019)
GlobalISel now also saturates `f16` to `i1` and `i8` conversions at `i16`
where available. As a side effect, the two uniform test cases `f16_i1`
and `f16_i8` now use VALU instructions instead of SALU instructions.
This is potentially sub-optimal, but it is consistent with SelectionDAG
and has already been highlighted as future work in #187711.
[AMDGPU] AMDGPULibCalls: Set new intrinsic calling convention to C (#197364)
In #197151 libclc/test/math/fabs.cl,
tryReplaceLibcallWithSimpleIntrinsic replaces `call fastcc float
@_Z4fabsf` with `call fastcc float @llvm.fabs.f32`, but intrinsic calls
must use `CallingConv::C`.
[RISCV][MC] add experimental `Zvvfmm` MC support (#196486)
This PR adds experimental MC layer support for the RISC-V `Zvvfmm`
extension from the Integrated Matrix Extension, based on the
[riscv-isa-release-fa55752-2026-05-04 spec
release](https://github.com/riscv/integrated-matrix-extension/releases/tag/riscv-isa-release-fa55752-2026-05-04).
It is a follow-up to the `Zvvmm` support in #193956.
This PR:
- Renames `RISCVInstrInfoZvvmm.td` to `RISCVInstrInfoZvvm.td` so `Zvvmm`
and `Zvvfmm` share the same IME instruction file, in line with the spec.
All future instructions from the `Zvvm` family will be placed here too.
- Adds a new `VScaleReg` asm operand to support the `v0.scale` assembly
syntax.
- Adds assembler support for floating-point matrix instructions:
`vfmmacc.vv`, `vfwmmacc.vv`, `vfqmmacc.vv`, `vf8wmmacc.vv`
- Adds integer-input floating-point accumulate scaled instructions:
`vfwimmacc.vv`, `vfqimmacc.vv`, `vf8wimmacc.vv`
[3 lines not shown]
[OpenMP][offload] Inline target reductions (#196061)
Significantly reduces register usage and removes register spilling, for
example in `offload/test/offloading/multiple-reductions.cpp`. Provides a
speedup of up to 5-10x for many reductions in such larger setups.
Based on https://github.com/llvm/llvm-project/pull/195940.
See also the discussion in
https://github.com/llvm/llvm-project/pull/195102.
[Support][Cache] Make `pruneCache` return an `Expected` (#191367)
When `sys::fs::disk_space` failed during a call to `pruneCache`,
it would report a `fatal_error`. However, a failure to prune doesn't
mean the caller should fail catastrophically.
Downstream, we use LLVM's cache in the OpenCL runtime. A failure to
prune the cache can be safely ignored without stopping the user's
application.
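A minimal sketch of the caller-side pattern this enables, written without LLVM headers so it stays self-contained. `PruneResult`, `pruneCacheSketch`, and `pruneBestEffort` are hypothetical stand-ins, not the real API: the variant only imitates the value-or-error shape of `Expected`, and the boolean parameters simulate the disk-space query and the pruning decision.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-in for an Expected-style result: a value or an error message.
using PruneResult = std::variant<bool, std::string>;

// Hypothetical stand-in for the pruning routine. `diskSpaceFails` simulates the
// disk-space query failing; `overBudget` simulates the cache needing pruning.
inline PruneResult pruneCacheSketch(bool diskSpaceFails, bool overBudget) {
  if (diskSpaceFails)
    return std::string("disk space query failed"); // reported, not fatal
  return overBudget; // true if something was pruned
}

// Caller that treats a pruning failure as non-fatal: it records the message
// (if asked to) and carries on as though nothing was pruned.
inline bool pruneBestEffort(bool diskSpaceFails, bool overBudget,
                            std::string *warning = nullptr) {
  PruneResult result = pruneCacheSketch(diskSpaceFails, overBudget);
  if (const auto *err = std::get_if<std::string>(&result)) {
    if (warning)
      *warning = *err;
    return false;
  }
  return std::get<bool>(result);
}
```

The point of the design is that the decision to abort moves from the library to the caller, which can log the error and continue.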
[AArch64] Don't use GISel for optnone functions if not feasible. (#196343)
A function like the one below should still result in an SME prologue to
set up ZA.
```
void bar() __arm_inout("za");
__attribute__((optnone)) __arm_new("za")
void foo() {
bar();
}
```
https://godbolt.org/z/aEcoKea4b
This worked in LLVM 22, but got broken by #174746.
[AMDGPU][InstCombine] Optimize constant shuffle patterns (#192246)
Detect llvm.amdgcn.wave.shuffle intrinsics where the lane index is a
constant function of the lane ID and replace them with hardware-specific
intrinsics.
[LangRef] Clarify pointer capture spec (#194647)
This clarifies the semantics of "pointer capture" in two respects:
* For provenance capture, specify this in terms of accesses based on the
pointer being UB after the function returns, rather than whether or not
the pointer gets stored etc. The distinction does not matter for
inference, but is commonly required for frontend-generated captures
annotations (and the `!captures` metadata doesn't really make sense
otherwise). This gives provenance (non-)capture unambiguous operational
semantics.
* For address capture, specify that the observable behavior of the
function can't differ based on the address. This is to accommodate
things like loop vectorization runtime checks, which introduce pointer
comparisons on `captures(none)` pointers in a way that is harmless and
needs to be allowed. The semantics here are non-operational. If anyone
has ideas on how to formalize this, they would be very welcome.
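To make the address-capture point concrete, here is a hand-written C++ analogue of a vectorizer-style runtime overlap check. It is a sketch under assumptions, not compiler output: `rangesOverlap` and `addOne` are hypothetical names, and the "fast" path is ordinary scalar code standing in for a vectorized loop. The check compares the two addresses, but both branches produce the same result, so the observable behavior does not depend on the addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Address comparison of the kind a runtime memory check performs.
// Addresses are compared as integers to avoid relational comparison of
// pointers into different objects.
inline bool rangesOverlap(const int *a, const int *b, std::size_t n) {
  const auto ai = reinterpret_cast<std::uintptr_t>(a);
  const auto bi = reinterpret_cast<std::uintptr_t>(b);
  const std::uintptr_t bytes = n * sizeof(int);
  return ai < bi + bytes && bi < ai + bytes;
}

// dst[i] = src[i] + 1, with a guarded "fast" path that processes 4 elements
// per iteration when the ranges are disjoint, and a plain loop otherwise.
inline void addOne(int *dst, const int *src, std::size_t n) {
  std::size_t i = 0;
  if (!rangesOverlap(dst, src, n)) {
    for (; i + 4 <= n; i += 4)
      for (std::size_t j = 0; j < 4; ++j)
        dst[i + j] = src[i + j] + 1;
  }
  // Remainder of the fast path, or the whole range on the fallback path.
  for (; i < n; ++i)
    dst[i] = src[i] + 1;
}
```

Whichever branch is taken, the caller sees the same stores, which is why such a comparison is harmless on a `captures(none)` pointer.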
[AA] Consider read-only provenance capture for synchronization effects (#197157)
If only read-only provenance is captured, this means that another thread
may only read the object, not write to it. As such, we can also model
synchronizing operations as only reading the location (and thus allow
reordering of reads, but not writes, across the synchronization).
[X86] Manage atomic store of fp -> int promotion in DAG
When lowering atomic <1 x T> vector types with floats, selection can fail since
this pattern is unsupported. To support this, floats can be cast to
an integer type of the same size.
[SelectionDAG] Scalarize <1 x T> vector types for atomic store
`store atomic <1 x T>` is not valid. This change legalizes
vector types of atomic store via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.
[X86] Add atomic vector store tests for unaligned >1 sizes.
Unaligned atomic vector stores with size >1 are lowered to calls.
Adding their tests separately here.