LLVM/project f6007cf libclc/test update_libclc_tests.py lit.cfg.py, libclc/test/integer sub_sat.cl

[libclc] Move external-funcs.test to static file and use IR checks for .cl tests (#197151)

This PR supersedes #87989.

Moving external-funcs.test to a static file simplifies
test/CMakeLists.txt. Static files follow the standard LLVM lit pattern and
enable fine-grained checks for missing symbols in specific libraries.

.cl test files use %target, %cpu and %check_prefix, which are replaced
with specific values during `ninja check-libclc` or `llvm-lit
build/runtimes/runtimes-${triple}-llvm-bins/libclc/test`. This allows
checking outputs of multiple triples in the same test file.

Add script libclc/test/update_libclc_tests.py, which wraps
utils/update_cc_test_checks.py to update CHECK lines in libclc .cl tests
for a given arch. Example usage:
`libclc/test/update_libclc_tests.py amdgpu`

Assisted-by: Claude Sonnet 4.6

    [3 lines not shown]
Delta File
+170 -0 libclc/test/update_libclc_tests.py
+156 -0 libclc/test/math/cos.cl
+46 -5 libclc/test/lit.cfg.py
+16 -32 libclc/test/CMakeLists.txt
+45 -0 libclc/test/math/rsqrt.cl
+44 -0 libclc/test/integer/sub_sat.cl
+477 -37 20 files not shown
+590 -159 26 files

LLVM/project e89f7b5 llvm/include/llvm/Analysis InstSimplifyFolder.h, llvm/include/llvm/IR ConstantFolder.h IRBuilderFolder.h

[LLVM] Add FastMathFlags operand to simplifySelectInst. (#197138)

This removes the potentially bogus use of SimplifyQuery.CxtI, whose
FastMathFlags are not necessarily relevant to the simplification.
Delta File
+16 -9 llvm/lib/Analysis/InstructionSimplify.cpp
+6 -2 llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+3 -2 llvm/include/llvm/Analysis/InstSimplifyFolder.h
+2 -1 llvm/include/llvm/IR/ConstantFolder.h
+2 -1 llvm/include/llvm/IR/IRBuilderFolder.h
+2 -1 llvm/include/llvm/IR/NoFolder.h
+31 -16 4 files not shown
+36 -20 10 files

LLVM/project 9fb099e flang/include/flang/Lower OpenMP.h, flang/lib/Lower/OpenMP OpenMP.cpp

NFC code changes
Delta File
+29 -29 flang/lib/Lower/OpenMP/OpenMP.cpp
+1 -1 flang/include/flang/Lower/OpenMP.h
+30 -30 2 files

LLVM/project adfe869 llvm/lib/CodeGen RegisterCoalescer.cpp, llvm/test/CodeGen/X86 coalescer-copy-from-erasable-implicit-def.ll coalescer-copy-from-erasable-implicit-def.mir

[RegisterCoalescer] Fix crash coalescing COPY from erasable IMPLICIT_DEF (#196895)

When a CR_Erase value's source is an erasable IMPLICIT_DEF, discard the
endpoint from pruneValue instead of adding it to EndPoints, and mark any
full-register DstReg uses with no live coverage as undef in
updateRegDefsUses.

Fixes: https://github.com/llvm/llvm-project/issues/195587.
Delta File
+85 -0 llvm/test/CodeGen/X86/coalescer-copy-from-erasable-implicit-def.ll
+29 -0 llvm/test/CodeGen/X86/coalescer-copy-from-erasable-implicit-def.mir
+21 -8 llvm/lib/CodeGen/RegisterCoalescer.cpp
+135 -8 3 files

LLVM/project 8de30fc mlir/docs Tokens.md

Update mlir/docs/Tokens.md

Co-authored-by: Mehdi Amini <joker.eph at gmail.com>
Delta File
+2 -0 mlir/docs/Tokens.md
+2 -0 1 files

LLVM/project ac3c588 clang/docs ReleaseNotes.rst, clang/lib/Sema SemaConcept.cpp SemaExprCXX.cpp

[Clang] Evaluate concepts in their declaration context. (#197215)

Concepts appearing in a constraint expression of a class member had
access to both `this` and the private members of the class.

This change fixes that by setting the context to that of the concept
before evaluation of its constraint expression.

This is done after we have substituted the template argument.

Code in `Sema::isThisOutsideMemberFunctionBody` that no longer seems
useful is removed as it was interfering with this change.

This is not an implementation of CWG2589 - at least not a complete one,
as we still check access when doing substitution in the parameter
mapping.

Fixes #115838
Fixes #194803
Delta File
+68 -1 clang/test/SemaTemplate/concepts.cpp
+13 -0 clang/lib/Sema/SemaConcept.cpp
+0 -5 clang/lib/Sema/SemaExprCXX.cpp
+2 -0 clang/docs/ReleaseNotes.rst
+83 -6 4 files

LLVM/project f27b3a7 lldb/source/Plugins/Process/Linux NativeRegisterContextLinux_arm64.h NativeRegisterContextLinux_arm64.cpp

[lldb][AArch64][Linux] Use member initialisers (#197122)

Member initialise a bunch of things in the register context instead of
setting them all in the constructor with memsets.

The only things I've left are related to hardware breakpoints, and need
changes to non-AArch64 classes so I'll try that separately.

I have not changed the validity bools because those will be removed by
#197113.
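
As a rough illustration of the pattern described above (not the actual lldb code), default member initialisers can take over the zeroing that a constructor body would otherwise do with `memset`. `FakeRegs`, `ContextWithMemset`, `ContextWithInitialisers` and `isZero` are hypothetical names invented for this sketch.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical register block; it only stands in for the real lldb structures.
struct FakeRegs {
  std::uint64_t x[4];
  std::uint32_t flags;
};

// Before: zeroing is done by hand in the constructor body.
class ContextWithMemset {
public:
  ContextWithMemset() { std::memset(&m_regs, 0, sizeof(m_regs)); }
  const FakeRegs &regs() const { return m_regs; }

private:
  FakeRegs m_regs;
};

// After: the default member initialiser zero-initialises every field, so
// no constructor has to repeat the work.
class ContextWithInitialisers {
public:
  const FakeRegs &regs() const { return m_regs; }

private:
  FakeRegs m_regs = {};
};

// Field-wise check, so struct padding does not affect the comparison.
inline bool isZero(const FakeRegs &r) {
  for (std::uint64_t v : r.x)
    if (v != 0)
      return false;
  return r.flags == 0;
}
```

Both classes start with every field zeroed; the second keeps that guarantee next to the member declaration, so a newly added constructor cannot forget it.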
Delta File
+24 -31 lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.h
+0 -14 lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp
+24 -45 2 files

LLVM/project c0bbc06 llvm/lib/Analysis IVDescriptors.cpp, llvm/lib/Transforms/Vectorize VPlanTransforms.cpp LoopVectorize.cpp

[LV] Handle FSub Partial Reductions (#197134)

Reland #191186 after fixing up test failures.

Introduces a new RecurKind value 'FSub' in order to handle partial
reductions of floating point values.

This is done by following the existing method for integer partial
reductions, doing a positive accumulation followed by a final
subtraction in the middle block.
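
The shape of that transformation can be sketched in scalar C++. This is only an illustration of "accumulate positively, subtract once at the end"; the function names and the lane count of 4 are assumptions, and it does not reflect how the vectorizer actually emits code. Regrouping floating-point additions can change rounding in general, so the sketch assumes reassociation is acceptable.

```cpp
#include <cstddef>
#include <vector>

// Scalar reference: subtract each element from the running value.
inline float fsubReduceScalar(float start, const std::vector<float> &v) {
  float acc = start;
  for (float x : v)
    acc -= x;
  return acc;
}

// Sketch of the transformed shape: add into several partial sums, then
// perform one subtraction after the loop.
inline float fsubReducePartial(float start, const std::vector<float> &v) {
  constexpr std::size_t Lanes = 4; // hypothetical lane count
  float partial[Lanes] = {0.0f, 0.0f, 0.0f, 0.0f};
  std::size_t i = 0;
  for (; i + Lanes <= v.size(); i += Lanes)
    for (std::size_t l = 0; l < Lanes; ++l)
      partial[l] += v[i + l]; // positive accumulation
  float sum = (partial[0] + partial[1]) + (partial[2] + partial[3]);
  for (; i < v.size(); ++i)
    sum += v[i]; // scalar tail for leftover elements
  return start - sum; // the single final subtraction
}
```

For inputs whose sums are exactly representable, such as small integers, both functions return the same value.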
Delta File
+318 -0 llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub-epilogue-vec.ll
+141 -0 llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub.ll
+40 -0 llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-fsub-chained.ll
+31 -5 llvm/lib/Analysis/IVDescriptors.cpp
+19 -7 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+18 -6 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+567 -18 5 files not shown
+595 -22 11 files

LLVM/project dc681a7 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 sve-fixed-length-log-reduce.ll v2i64-min-max.ll

[AArch64][ISel] Use SVE for fixed width vector reductions (#195806)

Enable custom lowering for v2i64 [s|u][min|max] reductions for SVE. This
allows fixed-width SVE to use predicated reductions such as smaxv where
NEON has no native equivalent.

Remove the fixed-length vector preference to allow more SVE reduction
operations to be selected when appropriate.
Delta File
+26 -26 llvm/test/CodeGen/AArch64/sve-fixed-length-log-reduce.ll
+41 -1 llvm/test/CodeGen/AArch64/v2i64-min-max.ll
+9 -10 llvm/test/CodeGen/AArch64/vector-extract-last-active.ll
+6 -9 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+82 -46 4 files

LLVM/project 941f8a9 llvm/test/Transforms/LoopVectorize/ARM mve-selectandorcost.ll

[LV][NFC] Make ARM/mve-selectandorcost.ll test more robust (#197388)

The test currently has an fdiv in the loop, which leads to vector costs
that are almost identical to the scalar cost. This makes the test
fragile and future cost model work for VPDerivedIVRecipe will cause the
test to favour the scalar version. I've changed the fdiv to fmul to make
the test more robust.
Delta File
+3 -3 llvm/test/Transforms/LoopVectorize/ARM/mve-selectandorcost.ll
+3 -3 1 files

LLVM/project 9bae451 llvm/test/CodeGen/X86 vector-reduce-ctpop.ll, llvm/test/MC/AMDGPU gfx13_asm_vop3.s gfx13_asm_vop3-fake16.s

Merge upstream/main into users/mariusz-sikora-at-amd/gfx13/add-vbuffer
Delta File
+8,195 -0 llvm/test/MC/AMDGPU/gfx13_asm_vop3.s
+8,182 -0 llvm/test/MC/AMDGPU/gfx13_asm_vop3-fake16.s
+6,862 -0 llvm/test/tools/llvm-mca/AArch64/Cortex/C1Nano-sve-instructions.s
+4,686 -918 llvm/test/CodeGen/X86/vector-reduce-ctpop.ll
+5,587 -0 llvm/test/MC/AMDGPU/gfx13_asm_vop3_dpp16.s
+5,574 -0 llvm/test/MC/AMDGPU/gfx13_asm_vop3_dpp16-fake16.s
+39,086 -918 1,525 files not shown
+117,335 -34,719 1,531 files

LLVM/project 2384d28 lldb/include/lldb/Host/windows PosixApi.h, lldb/source/Plugins/ScriptInterpreter/Python lldb-python.h

[lldb][windows] remove mandatory ordering of the lldb-python.h header (#197298)

`PosixApi.h` typedef'd `pid_t` as `uint32_t`, while Python's
`pyconfig.h` on Windows typedef's it as `int`. C++ forbids redeclaring a
typedef with a different type, so the two headers cannot coexist. The
`NO_PID_T` macro in `lldb-python.h` suppressed LLDB's typedef, but only
if `lldb-python.h` got included before `PosixApi.h`.

`pid_t` on Windows was originally defined in d87fc157d2b7. At that time,
there was no Python support for LLDB on Windows, and `uint32_t` matched
the `DWORD` type used by the Win32 API for process IDs.

This patch matches the Python type in `PosixApi.h`, removing the need
for the include ordering.

This is a follow up to https://github.com/llvm/llvm-project/pull/197048.
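
The language rule behind the conflict can be shown in a few lines. This is a minimal sketch and not code from either header; `demo_pid_t` and `pidTypesAgree` are hypothetical names chosen so that nothing clashes with a real `pid_t`.

```cpp
#include <cstdint>
#include <type_traits>

// Stand-ins for the two headers described above.
typedef int demo_pid_t; // the type Python's pyconfig.h is described as using
typedef int demo_pid_t; // fine: a typedef may be repeated with the same type

// Uncommenting the next line is a compile error, because the name would be
// redeclared with a different type:
// typedef std::uint32_t demo_pid_t;

static_assert(!std::is_same<std::uint32_t, int>::value,
              "uint32_t and int are distinct types");

// True when the shared name resolves to the type both headers agree on.
inline bool pidTypesAgree() {
  return std::is_same<demo_pid_t, int>::value;
}
```

Once both declarations name the same type, they can appear in either order, which is why matching the Python type removes the include-ordering requirement.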
Delta File
+6 -13 lldb/include/lldb/Host/windows/PosixApi.h
+1 -7 lldb/source/Plugins/ScriptInterpreter/Python/Interfaces/ScriptedProcessPythonInterface.cpp
+0 -7 lldb/source/Plugins/ScriptInterpreter/Python/lldb-python.h
+0 -3 lldb/source/Plugins/ScriptInterpreter/Python/Interfaces/ScriptedFrameProviderPythonInterface.cpp
+0 -3 lldb/source/Plugins/ScriptInterpreter/Python/Interfaces/ScriptedFramePythonInterface.cpp
+0 -3 lldb/source/Plugins/ScriptInterpreter/Python/Interfaces/ScriptedHookPythonInterface.cpp
+7 -36 11 files not shown
+7 -69 17 files

LLVM/project 49190a3 llvm/test/CodeGen/AMDGPU llvm.amdgcn.permlane.ll llvm.amdgcn.permlane.gfx1250.ll

Add gfx1310 CodeGen tests for permlane.* instructions
Delta File
+3,435 -0 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll
+2,953 -188 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.gfx1250.ll
+6,388 -188 2 files

LLVM/project 332fde6 llvm/lib/Transforms/Vectorize VPlanAnalysis.h VPlanAnalysis.cpp

[LV] Store DataLayout on VPTypeAnalysis (NFC) (#197231)

Using `R->getParent()->getPlan()->getDataLayout()` limits
`inferScalarType` to recipes within blocks that have been attached to a
plan.

(Hit while re-basing a PR)
Delta File
+3 -1 llvm/lib/Transforms/Vectorize/VPlanAnalysis.h
+1 -1 llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+4 -2 2 files

LLVM/project 1da4f4e lldb/include/lldb/Target Platform.h, lldb/source/Plugins/Architecture/Arm ArchitectureArm.cpp

[lldb] Step over non-lldb breakpoints (#190622)

Note: this is a second attempt at 304c680 / #174348, hopefully fixing
the post-commit Mac testing failures. The main differences from the
previous commit are:
* Fixing the incorrect masks in ArchitectureArm.cpp
* Declining to step in StopInfoMachException if the PC and exception
exc_sub_code don't match - implies fixup already applied
* Change to reflect explicit Address constructor - I assume this is
correct, essentially explicitly making a temporary Address object of the
pc address in SkipOverTrapInstruction
* Removing the debugserver code to step over the trap instruction as it
interacts badly with this change (without the check mentioned
previously).

---

Several languages support some sort of "breakpoint" function, which adds
ISA-specific instructions to generate an interrupt at runtime. However,

    [31 lines not shown]
Delta File
+83 -61 lldb/source/Target/Platform.cpp
+76 -0 lldb/test/API/functionalities/builtin-debugtrap/TestBuiltinDebugTrap.py
+0 -71 lldb/test/API/macosx/builtin-debugtrap/TestBuiltinDebugTrap.py
+43 -0 lldb/source/Target/StopInfo.cpp
+30 -0 lldb/source/Plugins/Architecture/Arm/ArchitectureArm.cpp
+29 -0 lldb/include/lldb/Target/Platform.h
+261 -132 14 files not shown
+344 -169 20 files

LLVM/project 6a107d2 llvm/lib/Transforms/AggressiveInstCombine AggressiveInstCombine.cpp, llvm/test/Transforms/AggressiveInstCombine popcount.ll

[AggressiveInstCombine] POPCNT generation for bit-count pattern (#177109)

The proposal is to enhance LLVM by teaching it to recognize the pattern
and replace it with the hardware POPCNT instruction.

---------

Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal at amd.com>
Co-authored-by: Craig Topper <craig.topper at sifive.com>
Delta File
+1,077 -0 llvm/test/Transforms/AggressiveInstCombine/popcount.ll
+136 -10 llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
+1,213 -10 2 files

LLVM/project add71ae llvm/include/llvm/IR Function.h InstructionListener.h, llvm/lib/IR BasicBlock.cpp Function.cpp

review
Delta File
+3 -16 llvm/include/llvm/IR/Function.h
+16 -0 llvm/unittests/IR/InstructionListenerTest.cpp
+8 -7 llvm/lib/IR/BasicBlock.cpp
+2 -12 llvm/lib/IR/Function.cpp
+7 -4 llvm/include/llvm/IR/InstructionListener.h
+8 -2 llvm/lib/IR/Instruction.cpp
+44 -41 1 files not shown
+48 -42 7 files

LLVM/project 8d4a0fb llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU fptosi-sat-vector.ll fptoui-sat-vector.ll

[AMDGPU] Align GlobalISel with SelectionDAG for f16 to i1/i8 saturated conversions (#188019)

GlobalISel now also saturates `f16` to `i1` and `i8` conversions at `i16`
where available. As a side effect, the two uniform test cases, `f16_i1`
and `f16_i8`, now use VALU instructions instead of SALU instructions.
This is potentially sub-optimal, but it is consistent with ISel and has
already been highlighted as future work in #187711.
Delta File
+113 -193 llvm/test/CodeGen/AMDGPU/fptosi-sat-vector.ll
+110 -162 llvm/test/CodeGen/AMDGPU/fptoui-sat-vector.ll
+10 -25 llvm/test/CodeGen/AMDGPU/fptosi-sat-scalar.ll
+8 -20 llvm/test/CodeGen/AMDGPU/fptoui-sat-scalar.ll
+4 -1 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+245 -401 5 files

LLVM/project ad33917 flang/include/flang/Lower AbstractConverter.h OpenMP.h, flang/lib/Lower/OpenMP OpenMP.cpp

[flang][OpenMP] Replace OMPGroupprivateDeviceTypeInfo struct with a using alias
Delta File
+2 -6 flang/include/flang/Lower/AbstractConverter.h
+3 -5 flang/include/flang/Lower/OpenMP.h
+2 -3 flang/lib/Lower/OpenMP/OpenMP.cpp
+7 -14 3 files

LLVM/project 05413e1 flang/include/flang/Lower OpenMP.h AbstractConverter.h, flang/lib/Lower Bridge.cpp

support device_type groupprivate lowering
Delta File
+22 -3 flang/lib/Lower/OpenMP/OpenMP.cpp
+22 -0 flang/test/Lower/OpenMP/groupprivate.f90
+11 -0 flang/include/flang/Lower/OpenMP.h
+11 -0 flang/lib/Lower/Bridge.cpp
+11 -0 flang/include/flang/Lower/AbstractConverter.h
+77 -3 5 files

LLVM/project 1639e35 flang/include/flang/Lower OpenMP.h, flang/lib/Lower/OpenMP OpenMP.cpp

[flang][mlir] Add flang to mlir lowering for groupprivate
Delta File
+186 -0 flang/test/Lower/OpenMP/groupprivate.f90
+124 -2 flang/lib/Lower/OpenMP/OpenMP.cpp
+0 -9 flang/test/Lower/OpenMP/Todo/groupprivate.f90
+1 -0 flang/include/flang/Lower/OpenMP.h
+311 -11 4 files

LLVM/project 54294fe flang/lib/Lower/OpenMP ClauseProcessor.cpp ClauseProcessor.h, llvm/include/llvm/Frontend/OpenMP ConstructDecompositionT.h

NFC code changes
Delta File
+68 -68 flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+18 -18 llvm/include/llvm/Frontend/OpenMP/ConstructDecompositionT.h
+3 -3 flang/lib/Lower/OpenMP/ClauseProcessor.h
+4 -2 flang/lib/Lower/OpenMP/OpenMP.cpp
+93 -91 4 files

LLVM/project 53050a9 llvm/lib/Target/AMDGPU VOP3Instructions.td SIInstrInfo.td

Split true16HelperReg32FromSrc16 into two OutPatFrags
Delta File
+18 -12 llvm/lib/Target/AMDGPU/VOP3Instructions.td
+5 -7 llvm/lib/Target/AMDGPU/SIInstrInfo.td
+23 -19 2 files

LLVM/project aefb53a llvm/lib/Target/AMDGPU AMDGPULibCalls.cpp, llvm/test/CodeGen/AMDGPU amdgpu-simplify-libcall-fabs.ll

[AMDGPU] AMDGPULibCalls: Set new intrinsic calling convention to C (#197364)

In libclc/test/math/fabs.cl from #197151,
tryReplaceLibcallWithSimpleIntrinsic replaces `call fastcc float
@_Z4fabsf` with `call fastcc float @llvm.fabs.f32`. But intrinsic calls
must use CallingConv::C.
Delta File
+12 -2 llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-fabs.ll
+1 -0 llvm/lib/Target/AMDGPU/AMDGPULibCalls.cpp
+13 -2 2 files

LLVM/project 852291b clang-tools-extra/clang-tidy/hicpp HICPPTidyModule.cpp, clang-tools-extra/docs ReleaseNotes.rst

[clang-tidy] Remove hicpp module [4/N] (#197354)

This commit removes the remaining checks of the `hicpp` module.

Part of https://github.com/llvm/llvm-project/issues/183462
Delta File
+1 -24 clang-tools-extra/clang-tidy/hicpp/HICPPTidyModule.cpp
+14 -0 clang-tools-extra/docs/ReleaseNotes.rst
+0 -11 clang-tools-extra/docs/clang-tidy/checks/hicpp/uppercase-literal-suffix.rst
+0 -9 clang-tools-extra/docs/clang-tidy/checks/hicpp/vararg.rst
+0 -8 clang-tools-extra/docs/clang-tidy/checks/hicpp/use-nullptr.rst
+0 -8 clang-tools-extra/docs/clang-tidy/checks/hicpp/use-equals-delete.rst
+15 -60 8 files not shown
+17 -103 14 files

LLVM/project f835827 llvm/lib/Transforms/InstCombine InstCombineMulDivRem.cpp, llvm/test/Transforms/InstCombine powi.ll

[InstCombine] Fix incorrect `foldPowiReassoc` on signed overflow (#197172)

Reproducer: 

```
#include <math.h>
#include <stdio.h>

__attribute__((noinline))
double f(double x) {
    return __builtin_powi(x, 1073741824) * __builtin_powi(x, 1073741824);
}

int main(void) {
    double r = f(2.0);
    printf("%f\n", r);
    return r == 0.0; // 0 = correct, 1 = miscompile
}
```

https://llvm.godbolt.org/z/sjK1EsGhx
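
The arithmetic behind the reproducer: combining the two `powi` calls into one requires adding the exponents, and 1073741824 + 1073741824 = 2^31 does not fit in a signed 32-bit integer. The helper below is a hypothetical sketch of an overflow-checked exponent add; `addExponents` is an invented name, and the guard actually used in `InstCombineMulDivRem.cpp` is not shown in this log.

```cpp
#include <cstdint>
#include <limits>

// Computes a + b in 64 bits and reports whether the result still fits in a
// signed 32-bit exponent. `out` is only written when it does.
inline bool addExponents(std::int32_t a, std::int32_t b, std::int32_t &out) {
  const std::int64_t wide = static_cast<std::int64_t>(a) + b;
  if (wide < std::numeric_limits<std::int32_t>::min() ||
      wide > std::numeric_limits<std::int32_t>::max())
    return false; // the combined exponent would overflow; do not fold
  out = static_cast<std::int32_t>(wide);
  return true;
}
```

With the reproducer's exponents the helper returns `false`, which marks the case where merging the two calls into a single `powi` would be unsafe.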
Delta File
+66 -10 llvm/test/Transforms/InstCombine/powi.ll
+2 -2 llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
+68 -12 2 files

LLVM/project 6baac3f llvm/lib/Transforms/InstCombine InstCombineAddSub.cpp, llvm/test/Transforms/InstCombine sub.ll

[InstCombine] Drop `(X + Z) - (Y + Z) --> (X - Y)` fold (#197373)

The pattern below does the same thing and does it better
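
For readers unfamiliar with the identity named in the title, here is a small check that `(X + Z) - (Y + Z)` equals `X - Y` under wrapping 32-bit arithmetic. The function names are made up for illustration, and the sketch says nothing about which InstCombine pattern now covers the case.

```cpp
#include <cstdint>

// Left-hand side of the fold, written out with wrapping unsigned arithmetic.
inline std::uint32_t subOfAdds(std::uint32_t x, std::uint32_t y,
                               std::uint32_t z) {
  return (x + z) - (y + z);
}

// Right-hand side of the fold.
inline std::uint32_t plainSub(std::uint32_t x, std::uint32_t y) {
  return x - y;
}
```

Because unsigned arithmetic is modular, the `Z` terms cancel even when the intermediate additions wrap around.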
Delta File
+33 -0 llvm/test/Transforms/InstCombine/sub.ll
+2 -9 llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
+35 -9 2 files

LLVM/project 6608431 llvm/lib/Target/RISCV RISCVInstrInfoZvvm.td, llvm/lib/Target/RISCV/AsmParser RISCVAsmParser.cpp

[RISCV][MC] add experimental `Zvvfmm` MC support (#196486)

This PR adds experimental MC layer support for the RISC-V `Zvvfmm`
extension from the Integrated Matrix Extension, based on the
[riscv-isa-release-fa55752-2026-05-04 spec
release](https://github.com/riscv/integrated-matrix-extension/releases/tag/riscv-isa-release-fa55752-2026-05-04).
It is a follow-up to `Zvvmm` in #193956.

This PR:
- Renames `RISCVInstrInfoZvvmm.td` to `RISCVInstrInfoZvvm.td` so `Zvvmm`
and `Zvvfmm` share the same IME instruction file according to the spec.
All future instructions from the `Zvvm` family will be placed here
too.
- Adds a new `VScaleReg` asm operand to support the `v0.scale` assembly
syntax.
- Adds assembler support for floating-point matrix instructions:
`vfmmacc.vv`, `vfwmmacc.vv`, `vfqmmacc.vv`, `vf8wmmacc.vv`
- Adds integer-input floating-point accumulate scaled instructions:
`vfwimmacc.vv`, `vfqimmacc.vv`, `vf8wimmacc.vv`

    [3 lines not shown]
Delta File
+95 -0 llvm/test/MC/RISCV/rvv/zvvfmm-invalid.s
+80 -4 llvm/lib/Target/RISCV/RISCVInstrInfoZvvm.td
+59 -0 llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
+57 -0 llvm/test/MC/RISCV/rvv/zvvfmm.s
+11 -0 llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp
+9 -0 llvm/test/MC/RISCV/rvv/zvvfmm-invalid-encoding.s
+311 -4 7 files not shown
+330 -5 13 files

LLVM/project f59aca9 openmp/device/src Reduction.cpp

[OpenMP][offload] Inline target reductions (#196061)

Significantly reduces register usage and removes register spilling in
`offload/test/offloading/multiple-reductions.cpp`, for example. Provides
a speedup of up to 5-10x for many reductions in such larger setups.

Based on https://github.com/llvm/llvm-project/pull/195940.
See also the discussion in
https://github.com/llvm/llvm-project/pull/195102.
Delta File
+11 -9 openmp/device/src/Reduction.cpp
+11 -9 1 files

LLVM/project eb899cf clang-tools-extra/clang-tidy/readability NonConstParameterCheck.cpp, clang-tools-extra/docs ReleaseNotes.rst

[clang-tidy] Fix false positives for dependent initializers (#186953)

Fixes #177354.

Handle dependent initializers in `readability-non-const-parameter` more
conservatively to avoid false positives in generic lambdas.

This fixes cases like:
- `T x(*p)`
- `DependentCtor<T> s(p)`
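
To see why a dependent initializer calls for caution, consider the sketch below. `Incrementer`, `Tag` and `bumpThrough` are hypothetical names, and the code only illustrates the general hazard; it is not taken from the check's test file. Whether `p` could point to const data depends on which constructor `T x(*p)` selects, and that is unknown until the lambda is instantiated.

```cpp
// Hypothetical type whose constructor needs a mutable reference.
struct Incrementer {
  explicit Incrementer(int &r) { ++r; }
};

// Small tag type used to pass a type into a C++14 generic lambda.
template <typename T> struct Tag {
  using type = T;
};

inline int bumpThrough(int value) {
  auto init = [](auto tag, int *p) {
    using T = typename decltype(tag)::type;
    T x(*p); // dependent initializer: the constructor depends on T
    (void)x;
  };
  init(Tag<int>{}, &value);         // T = int: *p is only read
  init(Tag<Incrementer>{}, &value); // T = Incrementer: *p is modified
  return value;
}
```

With `T = int` the pointee is only read, but with `T = Incrementer` it is written through, so suggesting `const int *p` from the lambda body alone would break the second instantiation.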
Delta File
+45 -0 clang-tools-extra/test/clang-tidy/checkers/readability/non-const-parameter.cpp
+23 -6 clang-tools-extra/clang-tidy/readability/NonConstParameterCheck.cpp
+1 -1 clang-tools-extra/docs/ReleaseNotes.rst
+69 -7 3 files