LLVM/project 39c6ed3clang/lib/CIR/CodeGen CIRGenBuiltinAArch64.cpp CIRGenBuilder.h, clang/test/CodeGen/AArch64 neon-intrinsics.c

[CIR][AArch64] add vshr_* builtins (#186693)

Part of https://github.com/llvm/llvm-project/issues/185382

- Moved lowering logic from clangir incubator to upstream
- Added tests, partially reusing tests from
[neon-intrinsics.c](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGen/AArch64/neon-intrinsics.c)
and
[neon.c](https://github.com/llvm/clangir/blob/main/clang/test/CIR/CodeGen/AArch64/neon.c)
- Made sure that all intrinsics from [Neon
ACLE](https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#vector-shift-right)
are implemented and tested
DeltaFile
+236-0clang/test/CodeGen/AArch64/neon/intrinsics.c
+0-213clang/test/CodeGen/AArch64/neon-intrinsics.c
+50-2clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+8-0clang/lib/CIR/CodeGen/CIRGenBuilder.h
+294-2154 files

LLVM/project 59b3a7dllvm/test/TableGen/GlobalISelEmitter MatchTableOptimizerSameOperand-invalid.td

Fix test
DeltaFile
+12-9llvm/test/TableGen/GlobalISelEmitter/MatchTableOptimizerSameOperand-invalid.td
+12-91 files

LLVM/project f65341cllvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU frem.ll setcc-f64-hi32mask.ll

[AMDGPU][ISel] Reduce `f64` compare to integer compare of upper half (#188356)

Truncate `f64` `setcc`s to upper 32-bit operands where possible.
These transformations are analogous to those in #181238, but for ordered
and unordered fp comparisons.

Fixes #187996.

Alive2 verification of transformations:

- For `eq` / `ne`: [ZRciR6](https://alive2.llvm.org/ce/z/ZRciR6)
- For `lt` / `ge`: [RDGnqr](https://alive2.llvm.org/ce/z/RDGnqr)
- For `le` / `gt`: [v0jlD5](https://alive2.llvm.org/ce/z/v0jlD5)
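
For intuition only, here is one case where an `f64` comparison can be decided from the upper 32 bits alone. It assumes the low 32 bits of the value are known to be zero; the patterns the patch actually handles are the ones in the Alive2 links above and may differ. The helper names `fromHi32`, `hi32` and `isZeroViaHi32` are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Build a double whose upper 32 bits are Hi and whose lower 32 bits are zero.
inline double fromHi32(uint32_t Hi) {
  uint64_t Bits = static_cast<uint64_t>(Hi) << 32;
  double D;
  std::memcpy(&D, &Bits, sizeof(D));
  return D;
}

// Extract the upper 32 bits of a double's IEEE-754 encoding.
inline uint32_t hi32(double D) {
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  return static_cast<uint32_t>(Bits >> 32);
}

// Assumption: the low 32 bits of D are zero. Then D compares equal to 0.0
// exactly when everything except the sign bit of the upper half is zero,
// which covers both +0.0 and -0.0 and rejects NaN and infinity.
inline bool isZeroViaHi32(double D) {
  return (hi32(D) & 0x7fffffffu) == 0;
}
```

With that precondition, `D == 0.0` and `isZeroViaHi32(D)` agree for zeros, denormals, normal values, infinities and NaNs.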
DeltaFile
+561-502llvm/test/CodeGen/AMDGPU/frem.ll
+916-0llvm/test/CodeGen/AMDGPU/setcc-f64-hi32mask.ll
+196-0llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+44-37llvm/test/CodeGen/AMDGPU/llvm.frexp.ll
+16-20llvm/test/CodeGen/AMDGPU/rsq.f64.ll
+10-10llvm/test/CodeGen/AMDGPU/fsqrt.f64.ll
+1,743-5696 files

LLVM/project ab6394dlldb/source/Plugins/SymbolLocator/SymStore SymbolLocatorSymStore.cpp, lldb/test/API/symstore TestSymStore.py

[lldb] Bring Debuginfod's StreamedHTTPResponseHandler to SymbolLocatorSymStore (#187687)

SymbolLocatorSymStore used a simple local implementation of
HTTPResponseHandler so far. That was fine for basic usage, but it would
cause issues down the line. This patch hoists the
StreamedHTTPResponseHandler class from libDebuginfod to SupportHTTP and
integrates it in SymbolLocatorSymStore. PDB file downloads will now be
buffered on disk, which is necessary since they can be huge.

We take the opportunity to stop logging 404 responses (file not found on
the server) and to print warnings for all other erroneous HTTP responses.
This was more complicated before, because the old response handler
created the underlying file in any case. The new one does so only once
the first content packet comes in.
DeltaFile
+63-70lldb/source/Plugins/SymbolLocator/SymStore/SymbolLocatorSymStore.cpp
+49-0llvm/include/llvm/Support/HTTP/StreamedHTTPResponseHandler.h
+1-47llvm/lib/Debuginfod/Debuginfod.cpp
+34-0llvm/lib/Support/HTTP/StreamedHTTPResponseHandler.cpp
+18-0lldb/test/API/symstore/TestSymStore.py
+1-0llvm/lib/Support/HTTP/CMakeLists.txt
+166-1176 files

LLVM/project 9e428b7llvm/docs ProgrammersManual.rst, llvm/include/llvm/Support Error.h

[LLVM][Support] add nonNull function helper (#188718)

We often see a pattern like:
```
T *ptr = doSomething();
assert(ptr && "doSomething() shouldn't return nullptr");
```

We also have functions like `cantFail`, but those work with
Expected types.
This commit adds a `nonNull` function, which can be used inline. In
practice, one could use:

```
T *ptr = cast<T>(functionReturningT());
```

But it conveys the meaning that `functionReturningT` might return a
subtype/supertype that we actually cast.

    [7 lines not shown]
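
A minimal sketch of what such an inline helper could look like. The name `nonNullSketch`, its signature and its assertion message are assumptions for illustration; the helper actually added to `llvm/Support/Error.h` may differ. `Widget` and `getWidget` are hypothetical stand-ins.

```cpp
#include <cassert>

// Hypothetical sketch: assert that the pointer is non-null and pass it
// through, so the check can be written inline at the call site.
template <typename T> T *nonNullSketch(T *Ptr) {
  assert(Ptr && "unexpected null pointer");
  return Ptr;
}

struct Widget {
  int Id;
};

// Stand-in for a function that is documented never to return nullptr.
inline Widget *getWidget() {
  static Widget W{42};
  return &W;
}

// The null check and the use fit in a single expression.
inline int widgetId() { return nonNullSketch(getWidget())->Id; }
```

Unlike `cast<T>`, a helper of this shape says nothing about type conversion; it only documents and checks the non-null expectation.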
DeltaFile
+38-0llvm/unittests/Support/ErrorTest.cpp
+23-0llvm/include/llvm/Support/Error.h
+9-0llvm/docs/ProgrammersManual.rst
+70-03 files

LLVM/project e6b0f95llvm/test/TableGen/GlobalISelCombinerEmitter match-table-hoisting.td

Add test desc
DeltaFile
+5-0llvm/test/TableGen/GlobalISelCombinerEmitter/match-table-hoisting.td
+5-01 files

LLVM/project c7c340bllvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp, llvm/test/CodeGen/AMDGPU amdgpu-simplify-demanded-bits-for-target-node.ll amdgpu-simplify-demanded-bits-readfirstlane.ll

[AMDGPU][CodeGen] Implement SimplifyDemandedBitsForTargetNode for readlane, wwm and set.inactive intrinsics. (#190830)

Propagate demanded bits through the readlane, wwm and set.inactive
intrinsics in AMDGPUISelLowering's SimplifyDemandedBitsForTargetNode.

This allows upstream zero/sign extensions to be eliminated when only a
subset of bits is used after these intrinsics.

Partially addresses https://github.com/llvm/llvm-project/issues/128390.
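
The idea can be illustrated with a toy model, assuming a lane read simply forwards the bits of the selected lane. `Wave`, `readLaneSketch` and `sextLow8` are hypothetical names and this is not the DAG combine itself: if only the low 8 bits of the result are demanded, a sign extension applied before the lane read cannot change the outcome.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Wave = std::array<uint32_t, 4>; // toy 4-lane wave

// Toy lane read: returns the selected lane's value with its bits unchanged.
inline uint32_t readLaneSketch(const Wave &W, unsigned Lane) {
  return W[Lane % W.size()];
}

// Sign-extend the low 8 bits of every lane to 32 bits.
inline Wave sextLow8(const Wave &W) {
  Wave R{};
  for (std::size_t I = 0; I < W.size(); ++I)
    R[I] = static_cast<uint32_t>(
        static_cast<int32_t>(static_cast<int8_t>(W[I] & 0xffu)));
  return R;
}

// Only the low 8 bits are demanded after the lane read...
inline uint32_t low8WithExt(const Wave &W, unsigned Lane) {
  return readLaneSketch(sextLow8(W), Lane) & 0xffu;
}

// ...so the extension before it can be dropped.
inline uint32_t low8WithoutExt(const Wave &W, unsigned Lane) {
  return readLaneSketch(W, Lane) & 0xffu;
}
```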
DeltaFile
+266-0llvm/test/CodeGen/AMDGPU/amdgpu-simplify-demanded-bits-for-target-node.ll
+0-60llvm/test/CodeGen/AMDGPU/amdgpu-simplify-demanded-bits-readfirstlane.ll
+22-28llvm/test/CodeGen/AMDGPU/fix-wwm-vgpr-copy.ll
+1-9llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-wwm.ll
+4-1llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+293-985 files

LLVM/project ff6097bllvm/test/TableGen/GlobalISelCombinerEmitter match-table-hoisting.td, llvm/utils/TableGen/Common/GlobalISel GlobalISelMatchTable.h GlobalISelMatchTable.cpp

[GlobalISel] Prevent hoisting of CheckIsSameOperand from creating invalid match tables

Fixes #188513

This patch adds logic to ask PredicateMatchers whether they'd like to be hoisted out of a specific Matcher or not.
SameOperandMatcher can use it to check if it's being hoisted out of the RuleMatcher that defines the operand it relies on.

Assisted-By: Claude Opus 4.6
Context of Use: Claude was only used to add LLVM-style RTTI to the matcher class (repetitive work). Claude-generated code was reviewed and cleaned up before committing.
DeltaFile
+92-0llvm/test/TableGen/GlobalISelCombinerEmitter/match-table-hoisting.td
+38-1llvm/utils/TableGen/Common/GlobalISel/GlobalISelMatchTable.h
+13-17llvm/utils/TableGen/Common/GlobalISel/GlobalISelMatchTable.cpp
+143-183 files

LLVM/project 6cb2006clang/lib/CIR/CodeGen CIRGenBuiltinAArch64.cpp, clang/test/CodeGen/AArch64 neon-intrinsics.c

[clang][CIR] Add lowering for vcvt_n_ and vcvtq_n_ conversion intrinsics

This PR adds lowering for the conversion intrinsics with an immediate
argument (identified by `_n_` in the intrinsic name), excluding FP16
variants.

It also moves the corresponding tests from:
  * clang/test/CodeGen/AArch64/neon-intrinsics.c

to:
  * clang/test/CodeGen/AArch64/neon/intrinsics.c

The lowering follows the existing implementation in
CodeGen/TargetBuiltins/ARM.cpp and adds the `getFloatNeonType` helper
to support it. The remaining changes are code motion and refactoring.

Reference:
[1] https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#conversions
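
For readers unfamiliar with the `_n_` conversions: the immediate gives the number of fractional bits of a fixed-point value. A scalar model of a single lane of a fixed-point to float conversion could look like the sketch below. `fixedToFloatSketch` is a hypothetical name and this is not the CIR lowering.

```cpp
#include <cmath>
#include <cstdint>

// Interpret Fixed as a signed fixed-point number with FracBits fractional
// bits and convert it to float, i.e. compute Fixed / 2^FracBits.
inline float fixedToFloatSketch(int32_t Fixed, int FracBits) {
  return std::ldexp(static_cast<float>(Fixed), -FracBits);
}
```

For example, 256 with 8 fractional bits represents 1.0, and -3 with 1 fractional bit represents -1.5.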
DeltaFile
+197-147clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+222-0clang/test/CodeGen/AArch64/neon/intrinsics.c
+0-201clang/test/CodeGen/AArch64/neon-intrinsics.c
+419-3483 files

LLVM/project 4f5a59ellvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.xor.ll llvm.amdgcn.reduce.or.ll

[AMDGPU] DPP wave reduction for long types - 3

Supported Ops: `and`, `or`, `xor`
DeltaFile
+984-132llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.xor.ll
+960-108llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.or.ll
+960-108llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.and.ll
+12-1llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2,916-3494 files

LLVM/project 5815273llvm/lib/Target/AMDGPU SIISelLowering.cpp

Review comments:
use input wave instruction for checks
DeltaFile
+7-7llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+7-71 files

LLVM/project 4f15657llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.sub.ll llvm.amdgcn.reduce.add.ll

[AMDGPU] DPP wave reduction for long types - 2

Supported Ops: `add`, `sub`
DeltaFile
+1,113-146llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll
+1,079-142llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll
+72-20llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2,264-3083 files

LLVM/project 8c116f8llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.fsub.ll llvm.amdgcn.reduce.fadd.ll

[AMDGPU] DPP wave reduction for double types - 2

Supported Ops: `fadd` and `fsub`
DeltaFile
+1,030-130llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fsub.ll
+1,008-130llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fadd.ll
+12-10llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2,050-2703 files

LLVM/project 1682c6cllvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.fmax.ll llvm.amdgcn.reduce.fmin.ll

[AMDGPU] DPP wave reduction for double types - 1

Supported Ops: `fmin` and `fmax`
DeltaFile
+1,112-234llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fmax.ll
+1,112-234llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fmin.ll
+27-13llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2,251-4813 files

LLVM/project 51d8076llvm/lib/Target/AMDGPU SIISelLowering.cpp

Refactor lambda to a helper function
DeltaFile
+8-6llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+8-61 files

LLVM/project 903cfcfllvm/include/llvm/IR IntrinsicsNVVM.td, llvm/test/CodeGen/NVPTX dead-shfl.ll

[NVVM] Update properties for non-sync variants of the SHFL intrinsics (#189615)

Non-sync SHFL variants (shfl without .sync) are pure functions of their SSA operands and the active thread mask. Assign IntrReadMem, IntrInaccessibleMemOnly and IntrWillReturn so that:
- Reading the implicit mask state is modeled for correct ordering with other convergent operations
- Truly dead non-sync shfl code can still be DCE'd

Sync SHFL variants keep IntrInaccessibleMemOnly (no IntrReadMem, no IntrWillReturn) to model synchronization side effects and prevent unsafe DCE/reordering.
DeltaFile
+19-0llvm/test/CodeGen/NVPTX/dead-shfl.ll
+13-4llvm/include/llvm/IR/IntrinsicsNVVM.td
+32-42 files

LLVM/project daf7a8fllvm/lib/Target/AMDGPU AMDGPURewriteAGPRCopyMFMA.cpp SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU inline-asm-vgpr-range-unsupported-width.ll

AMDGPU coverity fixes (#182013)

Coverity fixes
DeltaFile
+12-0llvm/test/CodeGen/AMDGPU/inline-asm-vgpr-range-unsupported-width.ll
+2-1llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
+1-1llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+1-0llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
+16-24 files

LLVM/project 5616ad7llvm/lib/Analysis ConstantFolding.cpp, llvm/lib/Target/NVPTX NVPTXTargetTransformInfo.cpp

[NVPTX] Lower nvvm.fmax to maximumnum not maxnum (#189976)

Converting nvvm.{fmin/fmax} into llvm.{min/max}num is slightly
incorrect, as {min/max}(a, sNaN) should produce "a" according to the PTX
spec, but LLVM's {min/max}num intrinsics may return either NaN or "a".

Use the {min/max}imumnum intrinsics instead for correct sNaN behaviour.

Also tidy up NVVM FMin/FMax constant-folding using these tighter
definitions of how the NVVM intrinsics map to {min/max}imum and
{min/max}imumnum.
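
As a rough scalar model of the semantics being targeted, the sketch below returns the numeric operand whenever exactly one input is NaN, whether quiet or signaling. `maximumNumSketch` is a hypothetical name; this is not the NVPTX lowering or the constant folder.

```cpp
#include <cmath>
#include <limits>

// Scalar model of a maximumnum-style operation:
//  - both operands NaN: the result is a quiet NaN
//  - exactly one operand NaN: the result is the other operand
//  - otherwise: the larger operand, with +0.0 treated as larger than -0.0
inline double maximumNumSketch(double A, double B) {
  if (std::isnan(A) && std::isnan(B))
    return std::numeric_limits<double>::quiet_NaN();
  if (std::isnan(A))
    return B;
  if (std::isnan(B))
    return A;
  if (A == B) // also reached for the +0.0 / -0.0 pair
    return std::signbit(A) ? B : A;
  return A > B ? A : B;
}
```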
DeltaFile
+14-14llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
+12-12llvm/test/CodeGen/NVPTX/math-intrins-sm80-ptx70-instcombine.ll
+5-13llvm/lib/Analysis/ConstantFolding.cpp
+6-6llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
+37-454 files

LLVM/project ada5383lldb/test/API/linux/aarch64/sme_only_registers TestSMEOnlyRegisters.py main.c

[lldb][AArch64][Linux] Add tests for SME only core files (#189985)

Part of #138717.

This did not require any changes to core file handling, since a static
snapshot of an SME only system looks pretty much the same as one from
the same state on a system with SVE and SME.

For this reason, we're only testing 2 combinations. In total these
include streaming and non-streaming, ZA on and off, and 2 different
vector lengths. I think this is enough to prove that the existing code
is working.
DeltaFile
+25-0lldb/test/API/linux/aarch64/sme_only_registers/TestSMEOnlyRegisters.py
+8-3lldb/test/API/linux/aarch64/sme_only_registers/main.c
+0-0lldb/test/API/linux/aarch64/sme_only_registers/core_simd_on_32
+0-0lldb/test/API/linux/aarch64/sme_only_registers/core_streaming_off_64
+33-34 files

LLVM/project 8798c1dllvm/lib/Target/AMDGPU SIISelLowering.cpp

Review comments:
use input wave instruction for checks
DeltaFile
+7-7llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+7-71 files

LLVM/project e4a7113llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.fsub.ll llvm.amdgcn.reduce.fadd.ll

[AMDGPU] DPP wave reduction for double types - 2

Supported Ops: `fadd` and `fsub`
DeltaFile
+1,030-130llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fsub.ll
+1,008-130llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fadd.ll
+12-10llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2,050-2703 files

LLVM/project ebdd7e9llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.sub.ll llvm.amdgcn.reduce.add.ll

[AMDGPU] DPP wave reduction for long types - 2

Supported Ops: `add`, `sub`
DeltaFile
+1,113-146llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll
+1,079-142llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll
+72-20llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2,264-3083 files

LLVM/project c5a7fc9llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.min.ll llvm.amdgcn.reduce.max.ll

[AMDGPU] DPP wave reduction for long types - 1

Supported Ops: `min`, `max`, `umin`, `umax`
DeltaFile
+1,084-108llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.min.ll
+1,084-108llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.max.ll
+1,044-108llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umax.ll
+1,044-108llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umin.ll
+185-42llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+4,441-4745 files

LLVM/project 368e697llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.fmax.ll llvm.amdgcn.reduce.fmin.ll

[AMDGPU] DPP wave reduction for double types - 1

Supported Ops: `fmin` and `fmax`
DeltaFile
+1,112-234llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fmax.ll
+1,112-234llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fmin.ll
+27-13llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2,251-4813 files

LLVM/project 219874dllvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.xor.ll llvm.amdgcn.reduce.and.ll

[AMDGPU] DPP wave reduction for long types - 3

Supported Ops: `and`, `or`, `xor`
DeltaFile
+984-132llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.xor.ll
+960-108llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.and.ll
+960-108llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.or.ll
+12-1llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2,916-3494 files

LLVM/project b9e1e0ellvm/lib/Target/AMDGPU SIISelLowering.cpp

Refactor lambda to a helper function
DeltaFile
+26-22llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+26-221 files

LLVM/project 2cdf60aclang/lib/Basic/Targets AArch64.cpp AArch64.h, clang/lib/Sema SemaARM.cpp

[AArch64][clang] Use tablegen rather than hard-coded feature dependencies

Refactor AArch64 frontend feature handling so extension relationships come
from the TargetParser extension graph instead of hand-written dependency
code in C++. This makes `llvm::AArch64::ExtensionSet` the source of
truth for dependency expansion while still keeping the short `Has...` names
used in the frontend code.

This removes a large amount of duplicated implication logic from
`handleTargetFeatures` and related feature queries. The frontend now
rebuilds its extension state from TableGen-derived data and then derives
its cached feature state from that, rather than maintaining parallel
dependency rules in C++.

I also preserved several pieces of historical frontend behaviour that are
not represented directly in the extension graph. Explicit disables such as
`no-sme` still win after implied-feature expansion, direct `+fullfp16` and
`+jscvt` still restore the expected NEON-facing state, and SME-family
features no longer incorrectly appear to enable AdvSIMD/NEON.

    [4 lines not shown]
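
The general shape of "expand implied features from a graph, then let explicit disables win" can be sketched as follows. This is an illustration under assumptions, not the `llvm::AArch64::ExtensionSet` API: `expandFeaturesSketch` is a hypothetical helper and the graph edges are supplied by the caller.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using FeatureSet = std::set<std::string>;
// Maps a feature to the features it directly implies.
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Compute the transitive closure of Enabled over Deps, then remove every
// feature that was explicitly disabled.
inline FeatureSet expandFeaturesSketch(const DepGraph &Deps,
                                       const FeatureSet &Enabled,
                                       const FeatureSet &Disabled) {
  FeatureSet Result;
  std::vector<std::string> Worklist(Enabled.begin(), Enabled.end());
  while (!Worklist.empty()) {
    std::string F = Worklist.back();
    Worklist.pop_back();
    if (!Result.insert(F).second)
      continue; // already visited
    auto It = Deps.find(F);
    if (It != Deps.end())
      for (const std::string &Implied : It->second)
        Worklist.push_back(Implied);
  }
  for (const std::string &F : Disabled)
    Result.erase(F); // explicit disables are applied after expansion
  return Result;
}
```

Keeping the edges in one data structure is what lets a single expansion routine replace per-feature implication code.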
DeltaFile
+234-279clang/lib/Basic/Targets/AArch64.cpp
+9-71clang/lib/Basic/Targets/AArch64.h
+69-0clang/lib/Basic/Targets/AArch64TargetInfoFeatures.inc
+22-0clang/test/Preprocessor/aarch64-target-features.c
+3-3clang/lib/Sema/SemaARM.cpp
+5-0clang/test/Sema/aarch64-sme-func-attrs-without-target-feature.cpp
+342-3531 files not shown
+343-3547 files

LLVM/project db61d1corc-rt/include/orc-rt QueueingTaskDispatcher.h, orc-rt/lib/executor QueueingTaskDispatcher.cpp

[orc-rt] Refactor QueueingTaskDispatcher to use an external TaskQueue. (#190920)

QueueingTaskDispatcher now takes a TaskQueue by reference rather than
maintaining an internal queue. This lets API clients retain direct
access to the queue after transferring dispatcher ownership to the
Session.

TaskQueue operations (takeFirstIn, takeLastIn) are blocking: callers
wait until a task arrives or the queue is shut down. This enables a
simple client idiom:

```
  QueueingTaskDispatcher::TaskQueue TQ;
  Session S(std::make_unique<QueueingTaskDispatcher>(TQ), ...);
  S.attach(<controller access>);

  while (auto T = TQ.takeFirstIn())
    T->run();
```
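
To make the blocking behaviour concrete, here is a self-contained sketch of a queue with a blocking `takeFirstIn`. `BlockingTaskQueueSketch` and its members are hypothetical and do not reproduce the orc-rt `TaskQueue` interface. In this sketch, tasks still queued at shutdown are drained before `takeFirstIn` starts returning an empty result; the real class may behave differently.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <utility>

class BlockingTaskQueueSketch {
public:
  using Task = std::function<void()>;

  // Enqueue a task. Returns false if the queue has already been shut down.
  bool add(Task T) {
    {
      std::lock_guard<std::mutex> Lock(M);
      if (Shutdown)
        return false;
      Tasks.push_back(std::move(T));
    }
    CV.notify_one();
    return true;
  }

  // Stop accepting tasks and wake up every blocked consumer.
  void shutdown() {
    {
      std::lock_guard<std::mutex> Lock(M);
      Shutdown = true;
    }
    CV.notify_all();
  }

  // Block until a task is available or the queue is shut down. Returns the
  // oldest task, or std::nullopt once the queue is shut down and empty.
  std::optional<Task> takeFirstIn() {
    std::unique_lock<std::mutex> Lock(M);
    CV.wait(Lock, [this] { return !Tasks.empty() || Shutdown; });
    if (Tasks.empty())
      return std::nullopt;
    Task T = std::move(Tasks.front());
    Tasks.pop_front();
    return T;
  }

private:
  std::mutex M;
  std::condition_variable CV;
  std::deque<Task> Tasks;
  bool Shutdown = false;
};
```

A consumer loop of the form `while (auto T = Q.takeFirstIn()) (*T)();` then ends on its own once the queue has been shut down and emptied.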
DeltaFile
+148-205orc-rt/unittests/QueueingTaskDispatcherTest.cpp
+33-17orc-rt/include/orc-rt/QueueingTaskDispatcher.h
+25-16orc-rt/lib/executor/QueueingTaskDispatcher.cpp
+206-2383 files

LLVM/project 66eae42lldb/source/Expression DWARFExpression.cpp, lldb/source/Plugins/SymbolFile/DWARF SymbolFileWasm.cpp DWARFUnit.cpp

[lldb] Use llvm::DWARFExpression::iterator in DWARFExpression::Evaluate (#190556)

Co-authored-by: Jonas Devlieghere <jonas at devlieghere.com>
DeltaFile
+115-88lldb/source/Expression/DWARFExpression.cpp
+6-6lldb/source/Plugins/SymbolFile/DWARF/SymbolFileWasm.cpp
+4-5lldb/unittests/Expression/DWARFExpressionTest.cpp
+2-1lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
+2-1lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.h
+2-1lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
+131-1026 files not shown
+139-10612 files

LLVM/project 197051bllvm/lib/Target/AMDGPU SIISelLowering.cpp

Review comments:
use input wave instruction for checks
DeltaFile
+7-7llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+7-71 files