[libclc] Move external-funcs.test to static file and use IR checks for .cl tests (#197151)
This PR supersedes #87989.
Moving external-funcs.test to a static file simplifies
test/CMakeLists.txt. Static files follow the standard LLVM lit pattern and
enable fine-grained checks for missing symbols in specific libraries.
.cl test files use %target, %cpu and %check_prefix, which are replaced
with specific values during `ninja check-libclc` or `llvm-lit
build/runtimes/runtimes-${triple}-llvm-bins/libclc/test`. This allows
checking outputs of multiple triples in the same test file.
Add script libclc/test/update_libclc_tests.py, which wraps
utils/update_cc_test_checks.py to update CHECK lines in libclc .cl tests
for a given arch. Example usage:
`libclc/test/update_libclc_tests.py amdgpu`
Assisted-by: Claude Sonnet 4.6
[3 lines not shown]
[LLVM] Add FastMathFlags operand to simplifySelectInst. (#197138)
This removes the potentially bogus use of SimplifyQuery.CxtI, whose
FastMathFlags are not necessarily relevant to the simplification.
[RegisterCoalescer] Fix crash coalescing COPY from erasable IMPLICIT_DEF (#196895)
When a CR_Erase value's source is an erasable IMPLICIT_DEF, discard the
endpoint from pruneValue instead of adding it to EndPoints, and mark any
full-register DstReg uses with no live coverage as undef in
updateRegDefsUses.
Fixes: https://github.com/llvm/llvm-project/issues/195587.
[Clang] Evaluate concepts in their declaration context. (#197215)
Concepts appearing in a constraint expression of a class member had
access to both `this` and the private members of the class.
This change fixes that by setting the context to that of the concept
before evaluating its constraint expression.
This is done after we have substituted the template argument.
Code in `Sema::isThisOutsideMemberFunctionBody` that no longer seems
useful is removed as it was interfering with this change.
This is not an implementation of CWG2589 - at least not a complete one,
as we still check access when doing substitution in the parameter
mapping.
Fixes #115838
Fixes #194803
[lldb][AArch64][Linux] Use member initialisers (#197122)
Member initialise a bunch of things in the register context instead of
setting them all in the constructor with memsets.
The only things I've left are related to hardware breakpoints, and need
changes to non-AArch64 classes so I'll try that separately.
I have not changed the validity bools because those will be removed by
#197113.
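As a rough illustration of the pattern only (the struct and member names below are hypothetical and do not mirror the actual LLDB register context), default member initialisers let each field state its own zero value instead of relying on memsets in the constructor body:

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical register storage, old style: zeroed by hand in the constructor.
struct RegsWithMemset {
  std::uint64_t gpr[4];
  std::uint32_t fpsr;
  bool is_valid;

  RegsWithMemset() {
    std::memset(gpr, 0, sizeof(gpr));
    fpsr = 0;
    is_valid = false;
  }
};

// Same hypothetical storage, new style: every member carries its default,
// so no constructor body is needed and a new member cannot be forgotten.
struct RegsWithInitialisers {
  std::array<std::uint64_t, 4> gpr{}; // value-initialised to all zeros
  std::uint32_t fpsr = 0;
  bool is_valid = false;
};
```

Both structs start out in the same zeroed state; the second form keeps the default next to the declaration it applies to.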
[LV] Handle FSub Partial Reductions (#197134)
Reland #191186 after fixing up test failures
Introduces a new RecurKind value 'FSub' in order to handle partial
reductions of floating point values.
This is done by following the existing method for integer partial
reductions, doing a positive accumulation followed by a final
subtraction in the middle block.
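The idea of "positive accumulation followed by a final subtraction" can be sketched in scalar C++ as below. This is a simplified model, not the vectorizer's output: the function names and the lane count of 4 are assumptions made for illustration, and the per-lane partial sums merely stand in for vector lanes.

```cpp
#include <cstddef>
#include <vector>

// Direct form: every iteration subtracts from the running value.
float reduce_fsub_direct(float init, const std::vector<float> &xs) {
  float acc = init;
  for (float x : xs)
    acc -= x;
  return acc;
}

// Restructured form: accumulate positively into partial sums (one per
// hypothetical lane), combine them, then subtract once after the loop.
float reduce_fsub_partial(float init, const std::vector<float> &xs) {
  constexpr std::size_t Lanes = 4; // hypothetical lane count
  float partial[Lanes] = {0.0f, 0.0f, 0.0f, 0.0f};
  for (std::size_t i = 0; i < xs.size(); ++i)
    partial[i % Lanes] += xs[i];

  float sum = 0.0f;
  for (float p : partial)
    sum += p;
  return init - sum; // the single final subtraction
}
```

The two forms reassociate the floating-point operations, so in general they can round differently; they agree exactly only when every intermediate sum is exactly representable.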
[AArch64][ISel] Use SVE for fixed width vector reductions (#195806)
Enable custom lowering for v2i64 [s|u][min|max] reductions for SVE. This
allows fixed-width SVE to use predicated reductions such as smaxv where
NEON has no native equivalent.
Remove the fixed-length vector preference to allow more SVE reduction
operations to be selected when appropriate.
[LV][NFC] Make ARM/mve-selectandorcost.ll test more robust (#197388)
The test currently has an fdiv in the loop, which leads to vector costs
that are almost identical to the scalar cost. This makes the test
fragile and future cost model work for VPDerivedIVRecipe will cause the
test to favour the scalar version. I've changed the fdiv to fmul to make
the test more robust.
[lldb][windows] remove mandatory ordering of the lldb-python.h header (#197298)
`PosixApi.h` typedef'd `pid_t` as `uint32_t`, while Python's
`pyconfig.h` on Windows typedefs it as `int`. C++ forbids redeclaring a
typedef with a different type, so the two headers cannot coexist. The
`NO_PID_T` macro in `lldb-python.h` suppressed LLDB's typedef, but only
if `lldb-python.h` got included before `PosixApi.h`.
`pid_t` on Windows was originally defined in d87fc157d2b7. At that time,
there was no Python support for LLDB on Windows and `uint32_t` matches
the `DWORD` type used by the Win32 API for process IDs.
This patch matches the Python type in `PosixApi.h`, removing the need
for the include ordering.
This is a follow up to https://github.com/llvm/llvm-project/pull/197048.
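The language rule behind this fix can be shown with a small stand-alone sketch. The name `demo_pid_t` is hypothetical and stands in for `pid_t`; the two typedefs model the declarations coming from two different headers.

```cpp
#include <cstdint>
#include <type_traits>

// Stand-in for the declaration from the first header (assumed to be int).
typedef int demo_pid_t;

// Repeating a typedef with the SAME type is valid C++, so a second header
// may declare it again and the include order stops mattering.
typedef int demo_pid_t;

// A redeclaration with a different type would be ill-formed:
// typedef std::uint32_t demo_pid_t; // error: conflicting typedef

static_assert(std::is_same<demo_pid_t, int>::value,
              "both declarations agree on int");
```

Uncommenting the `std::uint32_t` line reproduces the kind of conflict that previously forced a fixed include order.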
[LV] Store DataLayout on VPTypeAnalysis (NFC) (#197231)
Using `R->getParent()->getPlan()->getDataLayout()` limits
`inferScalarType` to recipes within blocks that have been attached to a
plan.
(Hit while re-basing a PR)
[lldb] Step over non-lldb breakpoints (#190622)
Note: this is a second attempt at 304c680 / #174348, hopefully fixing
the post-commit Mac testing failures. The main differences from the
previous commit are:
* Fixing the incorrect masks in ArchitectureArm.cpp
* Declining to step in StopInfoMachException if the PC and exception
exc_sub_code don't match - implies fixup already applied
* Change to reflect explicit Address constructor - I assume this is
correct, essentially explicitly making a temporary Address object of the
pc address in SkipOverTrapInstruction
* Removing the debugserver code to step over the trap instruction as it
interacts badly with this change (without the check mentioned
previously).
---
Several languages support some sort of "breakpoint" function, which adds
ISA-specific instructions to generate an interrupt at runtime. However,
[31 lines not shown]
[AggressiveInstCombine] POPCNT generation for bit-count pattern (#177109)
Teach LLVM to recognize the bit-count pattern and replace it with the
hardware POPCNT instruction.
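The message does not spell out which pattern is matched. As an assumption for illustration, one widely used branch-free bit-count idiom looks like the sketch below; whether this is the exact form the patch recognizes is not confirmed here. The naive loop is included only as a reference to compare against.

```cpp
#include <cstdint>

// Classic parallel (SWAR) bit count for a 32-bit value.
std::uint32_t popcount_swar(std::uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);                 // 2-bit partial counts
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); // 4-bit partial counts
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 // 8-bit partial counts
  return (x * 0x01010101u) >> 24;                   // sum the four bytes
}

// Straightforward reference: test one bit per iteration.
std::uint32_t popcount_naive(std::uint32_t x) {
  std::uint32_t n = 0;
  for (; x != 0; x >>= 1)
    n += x & 1u;
  return n;
}
```

On a target with a population-count instruction, a sequence like `popcount_swar` is a natural candidate for being collapsed into a single operation.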
---------
Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal at amd.com>
Co-authored-by: Craig Topper <craig.topper at sifive.com>
[AMDGPU] Align GlobalISel with SelectionDAG for f16 to i1/i8 saturated conversions (#188019)
GlobalISel now also saturates `f16` to `i1` and `i8` conversions at `i16`
where available. As a side effect, this also causes the two uniform test
cases: `f16_i1` and `f16_i8` to use VALU instructions, instead of SALU
instructions. This is potentially sub-optimal but it makes it consistent
with ISel and has been already highlighted as future work in #187711.
[AMDGPU] AMDGPULibCalls: Set new intrinsic calling convention to C (#197364)
In #197151 libclc/test/math/fabs.cl,
tryReplaceLibcallWithSimpleIntrinsic replaces `call fastcc float
@_Z4fabsf` with `call fastcc float @llvm.fabs.f32`. But intrinsic calls
must use CallingConv::C.
[RISCV][MC] add experimental `Zvvfmm` MC support (#196486)
This PR adds experimental MC layer support for the RISC-V `Zvvfmm`
extension from the Integrated Matrix Extension, based on the
[riscv-isa-release-fa55752-2026-05-04 spec
release](https://github.com/riscv/integrated-matrix-extension/releases/tag/riscv-isa-release-fa55752-2026-05-04).
This is a follow-up to `Zvvmm` in #193956.
This PR:
- Renames `RISCVInstrInfoZvvmm.td` to `RISCVInstrInfoZvvm.td` so `Zvvmm`
and `Zvvfmm` share the same IME instruction file, in line with the spec.
All future instructions from the `Zvvm` family will be placed here
too.
- Adds a new `VScaleReg` asm operand to support the `v0.scale` assembly
syntax.
- Adds assembler support for floating-point matrix instructions:
`vfmmacc.vv`, `vfwmmacc.vv`, `vfqmmacc.vv`, `vf8wmmacc.vv`
- Adds integer-input floating-point accumulate scaled instructions:
`vfwimmacc.vv`, `vfqimmacc.vv`, `vf8wimmacc.vv`
[3 lines not shown]
[OpenMP][offload] Inline target reductions (#196061)
Significantly reduces register usage and removes register spilling in
`offload/test/offloading/multiple-reductions.cpp`, for example. Provides
speedups of up to 5-10x for many reductions in such larger setups.
Based on https://github.com/llvm/llvm-project/pull/195940.
See also the discussion in
https://github.com/llvm/llvm-project/pull/195102.
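For readers unfamiliar with the construct, a loop with several target reductions might look like the sketch below. It is not taken from `multiple-reductions.cpp`; the function and struct names are hypothetical. When compiled without OpenMP support the pragma is ignored and the loop simply runs sequentially on the host.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type holding two reduction results.
struct Sums {
  double sum;
  double sum_sq;
};

// Two reductions computed in one offloaded loop.
Sums multiple_reductions(const std::vector<double> &v) {
  const double *data = v.data();
  const std::size_t n = v.size();
  double sum = 0.0;
  double sum_sq = 0.0;

#pragma omp target teams distribute parallel for \
    reduction(+ : sum, sum_sq) map(to : data[0:n])
  for (std::size_t i = 0; i < n; ++i) {
    sum += data[i];
    sum_sq += data[i] * data[i];
  }
  return {sum, sum_sq};
}
```

Each reduction variable needs its own combining code on the device, which is why loops with many reductions are sensitive to how that code is generated.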