LLVM/project a257e16 llvm/test/TableGen ArtificialSubregs.td ArtificialRegs.td

[TableGen] Use CHECK-LABEL in artificial registers tests. NFC. (#185846)
Delta File
+28-28 llvm/test/TableGen/ArtificialSubregs.td
+2-2 llvm/test/TableGen/ArtificialRegs.td
+30-30 2 files

LLVM/project 9aba26b libclc/clc/lib/amdgpu CMakeLists.txt, libclc/clc/lib/amdgpu/math clc_frexp.cl

libclc: Use frexp builtins to implement frexp for amdgpu (#185637)

This should really be the default implementation.
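
For readers unfamiliar with the operation, frexp splits a floating-point value into a fraction whose magnitude lies in [0.5, 1) and a power-of-two exponent. The sketch below uses Python's `math.frexp` as a stand-in to show that contract; it is not the libclc code, and the helper name `check_frexp` is hypothetical.

```python
import math
from typing import Tuple


def check_frexp(x: float) -> Tuple[float, int]:
    """Return (fraction, exponent) for x and verify the frexp contract.

    `check_frexp` is a hypothetical helper for illustration only.
    """
    fraction, exponent = math.frexp(x)
    # Reconstruction is exact for finite inputs: x == fraction * 2**exponent.
    if math.isfinite(x):
        assert math.ldexp(fraction, exponent) == x
    # For finite non-zero inputs the fraction's magnitude lies in [0.5, 1).
    if math.isfinite(x) and x != 0.0:
        assert 0.5 <= abs(fraction) < 1.0
    return fraction, exponent


print(check_frexp(8.0))   # 8.0 == 0.5 * 2**4
print(check_frexp(-3.0))  # -3.0 == -0.75 * 2**2
```

Zero is the one finite input with no normalized fraction; `math.frexp(0.0)` returns `(0.0, 0)`.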
Delta File
+46-0 libclc/clc/lib/generic/math/clc_frexp_builtin.inc
+43-0 libclc/clc/lib/amdgpu/math/clc_frexp.cl
+1-0 libclc/clc/lib/amdgpu/CMakeLists.txt
+90-0 3 files

LLVM/project f69cb5c llvm/lib/Target/AMDGPU SIISelLowering.cpp

Review comments and code cleanup.
Delta File
+18-27 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+18-27 1 files

LLVM/project 74f5dd5 llvm/lib/Target/RISCV RISCVInterleavedAccess.cpp, llvm/test/CodeGen/RISCV/rvv fixed-vectors-interleaved-access.ll

[RISCV] Lower one active interleaved load to normal segmented load (#185602)

There’s an optimization for deinterleave loads in
`RISCVTargetLowering::PerformDAGCombine`.

When only one field is active, we can generate a normal segmented load and
let DAGCombine optimize it into vlse.
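
The equivalence this relies on can be illustrated with a small model: a segmented (deinterleaving) load of factor N splits memory into N fields, and when only one field is used, that field equals a strided load starting at the field's offset. The Python sketch below models memory as a list; `segmented_load` and `strided_load` are hypothetical helpers, not LLVM or RISC-V APIs.

```python
def segmented_load(memory, factor, num_elements):
    """Model a deinterleaving load: return `factor` fields of `num_elements` each."""
    return [
        [memory[i * factor + field] for i in range(num_elements)]
        for field in range(factor)
    ]


def strided_load(memory, start, stride, num_elements):
    """Model a strided load: every `stride`-th element beginning at `start`."""
    return [memory[start + i * stride] for i in range(num_elements)]


memory = list(range(100, 112))  # 12 elements: 3 fields x 4 elements
fields = segmented_load(memory, factor=3, num_elements=4)

# Using only field 1 of the segmented load matches a stride-3 load from offset 1.
assert fields[1] == strided_load(memory, start=1, stride=3, num_elements=4)
```

The model works on element indices; in hardware terms the stride would be expressed in bytes.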
Delta File
+11-0 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll
+3-4 llvm/lib/Target/RISCV/RISCVInterleavedAccess.cpp
+14-4 2 files

LLVM/project 380ac9e clang/lib/Basic/Targets NVPTX.h, clang/test/CodeGenCUDA builtin-count-zeros-nvptx.cu

[NVPTX][clang] Ensure CLZ(0) is defined on NVPTX (#185630)

CUDA semantics specify that clz(0) = bitwidth, so clang should emit clz
/ ctz intrinsics for NVPTX with zero-is-poison = false.
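
As a reference for the semantics described, the following Python sketch models count-leading-zeros and count-trailing-zeros with a defined result for zero (the bit width), which is what zero-is-poison = false means for the LLVM intrinsics. The function names and the default 32-bit width are illustrative assumptions, not the clang implementation.

```python
def clz(value: int, bitwidth: int = 32) -> int:
    """Count leading zeros of an unsigned `bitwidth`-bit value; clz(0) == bitwidth."""
    if not 0 <= value < (1 << bitwidth):
        raise ValueError("value does not fit in the given bit width")
    # int.bit_length() is 0 for zero, so zero naturally yields `bitwidth`.
    return bitwidth - value.bit_length()


def ctz(value: int, bitwidth: int = 32) -> int:
    """Count trailing zeros of an unsigned `bitwidth`-bit value; ctz(0) == bitwidth."""
    if not 0 <= value < (1 << bitwidth):
        raise ValueError("value does not fit in the given bit width")
    if value == 0:
        return bitwidth
    # value & -value isolates the lowest set bit.
    return (value & -value).bit_length() - 1


print(clz(0), clz(1), clz(0x80000000))
print(ctz(0), ctz(8))
```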
Delta File
+12-0 clang/test/CodeGenCUDA/builtin-count-zeros-nvptx.cu
+2-0 clang/lib/Basic/Targets/NVPTX.h
+1-1 clang/test/Headers/gpuintrin.c
+15-1 3 files

LLVM/project 20ebc62 llvm/lib/Target/AMDGPU SIISelLowering.cpp SIInstrInfo.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.sub.ll llvm.amdgcn.reduce.add.ll

Overload `getVALUOp` to accept Opcodes as well.
Delta File
+26-26 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll
+26-26 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll
+2-26 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+12-8 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+1-0 llvm/lib/Target/AMDGPU/SIInstrInfo.h
+67-86 5 files

LLVM/project 6b1cc22 llvm/lib/Target/AArch64 AArch64MacroFusion.cpp AArch64Processors.td, llvm/test/CodeGen/AArch64 misched-fusion-fcsel.ll

Revert "[AArch64] Adding FeatureFuseFCmpFCSel (#184881)"

This reverts commit 081cd56503bcdccde43a187053d7d5e32c50085e.
Delta File
+0-31 llvm/lib/Target/AArch64/AArch64MacroFusion.cpp
+0-27 llvm/test/CodeGen/AArch64/misched-fusion-fcsel.ll
+0-4 llvm/lib/Target/AArch64/AArch64Processors.td
+0-4 llvm/lib/Target/AArch64/AArch64Features.td
+1-2 llvm/lib/Target/AArch64/AArch64Subtarget.h
+1-68 5 files

LLVM/project 081cd56 llvm/lib/Target/AArch64 AArch64MacroFusion.cpp AArch64Features.td, llvm/test/CodeGen/AArch64 misched-fusion-fcsel.ll

[AArch64] Adding FeatureFuseFCmpFCSel (#184881)

This adds a new AArch64 feature, FeatureFuseFCmpFCSel, for fusing FP
compare and FP select instructions, and enables it on recent Apple CPUs.
Instruction scheduling makes such pairs adjacent.
Delta File
+31-0 llvm/lib/Target/AArch64/AArch64MacroFusion.cpp
+27-0 llvm/test/CodeGen/AArch64/misched-fusion-fcsel.ll
+4-0 llvm/lib/Target/AArch64/AArch64Features.td
+4-0 llvm/lib/Target/AArch64/AArch64Processors.td
+2-1 llvm/lib/Target/AArch64/AArch64Subtarget.h
+68-1 5 files

LLVM/project 18e447b lldb/source/Plugins/Platform/MacOSX PlatformDarwin.cpp

[lldb][PlatformDarwin][NFC] Factor sanitization of Python module names into helper function (#185627)

I'm planning on re-using this logic for another API. This patch creates
a `SanitizedScriptingModuleName` that encapsulates the logic that checks
whether a file name would fail to be loaded by a `ScriptInterpreter`. I
called it something more generic despite it being `Python` specific at
the moment, in case the FIXME is eventually going to be addressed.

We have existing unit-tests that check this logic, so I'm relying on
that test coverage to give us confidence that this still works as
expected.
Delta File
+60-25 lldb/source/Plugins/Platform/MacOSX/PlatformDarwin.cpp
+60-25 1 files

LLVM/project 032b2ff clang/lib/CIR/CodeGen TargetInfo.cpp CIRGenModule.cpp, clang/test/CIR/CodeGenHIP simple.cpp

[CIR][AMDGPU] Add AMDGPU target support to CIR CodeGen
Delta File
+89-0 clang/test/CIR/CodeGenHIP/simple.cpp
+20-0 clang/lib/CIR/CodeGen/TargetInfo.cpp
+4-0 clang/lib/CIR/CodeGen/CIRGenModule.cpp
+3-0 clang/lib/CIR/CodeGen/TargetInfo.h
+116-0 4 files

LLVM/project f7c79f4 llvm/lib/Target/AArch64 AArch64InstrGISel.td, llvm/lib/Target/AArch64/GISel AArch64RegisterBankInfo.cpp AArch64LegalizerInfo.cpp

[AArch64][GlobalISel] Add G_SQDMULL node

Previously, GISel was failing to lower the sqdmulls.scalar intrinsic. This is just a variation of sqdmull, but on two 32-bit S registers.
To fix this, create a G_SQDMULL node, and lower sqdmulls.scalar to that. This node is linked to the SD patterns for sqdmull, which allow this version of the intrinsic to lower.
Delta File
+99-62 llvm/test/CodeGen/AArch64/arm64-vmul.ll
+1-7 llvm/test/CodeGen/AArch64/arm64-int-neon.ll
+7-0 llvm/lib/Target/AArch64/AArch64InstrGISel.td
+2-0 llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
+2-0 llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+111-69 5 files

LLVM/project d4d9b5a llvm/test/CodeGen/AMDGPU dynamic_stackalloc.ll llvm.amdgcn.reduce.sub.ll

[AMDGPU] DPP implementations for Wave Reduction

Adding DPP reduction support for i32 types.
Supported Ops: `umin`, `min`, `umax`, `max`,
`add`, `sub`, `and`, `or`, `xor`.
Delta File
+2,113-1,374 llvm/test/CodeGen/AMDGPU/dynamic_stackalloc.ll
+1,096-146 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll
+1,047-142 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll
+986-132 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.xor.ll
+894-108 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.max.ll
+894-108 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.min.ll
+7,030-2,010 8 files not shown
+11,325-2,878 14 files

LLVM/project bf21500 llvm/test/CodeGen/SystemZ misched-prera-cmp-elim.mir misched-prera-latencies.mir

[SystemZ] Add missing REQUIRES: asserts
Delta File
+1-0 llvm/test/CodeGen/SystemZ/misched-prera-cmp-elim.mir
+1-0 llvm/test/CodeGen/SystemZ/misched-prera-latencies.mir
+2-0 2 files

LLVM/project 5adbf03 clang/lib/Basic/Targets AMDGPU.cpp, clang/test/Driver amdgpu-macros.cl

clang/AMDGPU: Ensure more macros are defined for dummy target

FP_FAST_FMA should be unconditionally true.
Delta File
+15-15 clang/lib/Basic/Targets/AMDGPU.cpp
+8-8 clang/test/Preprocessor/predefined-arch-macros.c
+10-0 clang/test/Driver/amdgpu-macros.cl
+33-23 3 files

LLVM/project d780072 libclc/clc/include/clc/math clc_div_fast.h clc_recip_fast.h, libclc/clc/lib/generic/math clc_sqrt_fast.cl clc_recip_fast.cl

libclc: Add fast version utility functions for div, sqrt and reciprocal (#185823)
Delta File
+19-0 libclc/clc/include/clc/math/clc_div_fast.h
+19-0 libclc/clc/include/clc/math/clc_recip_fast.h
+19-0 libclc/clc/include/clc/math/clc_sqrt_fast.h
+15-0 libclc/clc/lib/generic/math/clc_sqrt_fast.cl
+14-0 libclc/clc/lib/generic/math/clc_recip_fast.cl
+13-0 libclc/clc/lib/generic/math/clc_div_fast.cl
+99-0 3 files not shown
+128-1 9 files

LLVM/project cf38704 llvm/lib/Analysis DependenceAnalysis.cpp

[DA] Fix debug log format (#185804)

The current debug log:
```
SrcSubscripts: {4611686018427387904,+,-1}<nsw><%loop.header>
DstSubscripts: {4611686018427387904,+,-1}<nsw><%loop.header>    delinearized
    common nesting levels = 1
    maximum nesting levels = 1
    SameSD nesting levels = 0
```
A newline is missing before "delinearized".

Signed-off-by: Ruoyu Qiu <cabbaken at outlook.com>
Delta File
+1-0 llvm/lib/Analysis/DependenceAnalysis.cpp
+1-0 1 files

LLVM/project 9a69fdc llvm/lib/Target/LoongArch LoongArchISelLowering.cpp, llvm/test/CodeGen/LoongArch/lasx vxi1-masks.ll

[LoongArch] Try to avoid casts around logical vector ops on lasx (#163523)

On LASX the types v4i1/v8i1/v16i1 may be legalized to v4i32/v8i16/v16i8,
which are LSX-sized. In most cases we actually compare or select
LASX-sized registers, and mixing the two types creates horrible code.
Delta File
+81-522 llvm/test/CodeGen/LoongArch/lasx/vxi1-masks.ll
+127-0 llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+208-522 2 files

LLVM/project 749953b libclc/clc/include/clc/math clc_sqrt_fast.h clc_div_fast.h, libclc/clc/lib/generic/math clc_sqrt_fast.cl clc_recip_fast.cl

libclc: Add fast version utility functions for div, sqrt and reciprocal

These are subtly different from the native versions, and should have
tighter requirements. They should handle the special cases correctly,
unlike the native functions from the standard.
Delta File
+19-0 libclc/clc/include/clc/math/clc_sqrt_fast.h
+19-0 libclc/clc/include/clc/math/clc_div_fast.h
+19-0 libclc/clc/include/clc/math/clc_recip_fast.h
+15-0 libclc/clc/lib/generic/math/clc_sqrt_fast.cl
+14-0 libclc/clc/lib/generic/math/clc_recip_fast.cl
+13-0 libclc/clc/lib/generic/math/clc_div_fast.cl
+99-0 3 files not shown
+128-1 9 files

LLVM/project 52ea06f llvm/test/CodeGen/SPIRV/transcoding store-atomic.ll load-atomic.ll

[SPIRV] Add tests documenting incorrect lowering of load/store atomic (#185628)

This patch only adds the tests documenting the broken behavior, but does
not fix them.
Delta File
+121-0 llvm/test/CodeGen/SPIRV/transcoding/store-atomic.ll
+111-0 llvm/test/CodeGen/SPIRV/transcoding/load-atomic.ll
+232-0 2 files

LLVM/project c035183 llvm/lib/Target/AMDGPU SIRegisterInfo.cpp

Further simplify loop logic.
Delta File
+3-4 llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+3-4 1 files

LLVM/project c71bf5c llvm/lib/Target/AMDGPU SIRegisterInfo.cpp

Simplify spilling loop.

Plus some code cleanup, enable clang formatting.
Delta File
+13-24 llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+13-24 1 files

LLVM/project b22c6b4 libcxx/test/libcxx/containers/strings/basic.string asan_vector_integration.pass.cpp asan_deque_integration.pass.cpp, libcxx/test/libcxx/strings/basic.string asan_vector_integration.pass.cpp asan_deque_integration.pass.cpp

[libc++][NFC] Merge libc++ specific test directories for std::string (#185717)

String tests do not belong under `containers/`.
Delta File
+182-0 libcxx/test/libcxx/strings/basic.string/asan_vector_integration.pass.cpp
+0-182 libcxx/test/libcxx/containers/strings/basic.string/asan_vector_integration.pass.cpp
+182-0 libcxx/test/libcxx/strings/basic.string/asan_deque_integration.pass.cpp
+0-182 libcxx/test/libcxx/containers/strings/basic.string/asan_deque_integration.pass.cpp
+0-106 libcxx/test/libcxx/containers/strings/basic.string/abi.compile.pass.cpp
+106-0 libcxx/test/libcxx/strings/basic.string/abi.compile.pass.cpp
+470-470 6 files not shown
+684-684 12 files

LLVM/project 502727b libunwind/src libunwind.cpp, libunwind/test cfi_violating_handler.pass.cpp

[libunwind][PAC] Defang ptrauth's PC in valid CFI range abort

It turns out making the CFI check a release mode abort causes many,
if not the majority, of JITs to fail during unwinding as they do not
set up CFI sections for their generated code. As a result any JITs
that do nominally support unwinding (and catching) through their JIT
or assembly frames trip this abort.

rdar://170862047
Delta File
+91-0 libunwind/test/cfi_violating_handler.pass.cpp
+11-17 libunwind/src/libunwind.cpp
+102-17 2 files

LLVM/project 5e9e039 libunwind/src libunwind.cpp, libunwind/test cfi_violating_handler.pass.cpp

[libunwind][PAC] Defang ptrauth's PC in valid CFI range abort

It turns out making the CFI check a release mode abort causes many,
if not the majority, of JITs to fail during unwinding as they do not
set up CFI sections for their generated code. As a result any JITs
that do nominally support unwinding (and catching) through their JIT
or assembly frames trip this abort.

rdar://170862047
Delta File
+101-0 libunwind/test/cfi_violating_handler.pass.cpp
+11-17 libunwind/src/libunwind.cpp
+112-17 2 files

LLVM/project 8a9c6a3 llvm/lib/Analysis DependenceAnalysis.cpp

[DA] refactor bounds inference in exactSIVtest and exactRDIVtest (NFC) (#185719)

Replaces the `SmallVector`-based approach for computing the min/max of
affine domain bounds with a `GetMaxOrMin` lambda returning `std::optional`
for better readability.
Previously, the code allocated a `SmallVector` to collect valid bounds
and relied on `smax(front(), back())` to handle the single-element case,
which was easy to misread.
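
The shape of such a helper can be sketched as follows. This is a Python illustration of the idea only: the function name, signature, and use of `None` in place of `std::optional` are assumptions, not the code in the patch.

```python
from typing import Optional


def get_max_or_min(a: Optional[int], b: Optional[int], is_max: bool) -> Optional[int]:
    """Combine two optional bounds (hypothetical helper for illustration).

    A missing bound (None) is ignored; if both are missing the result is
    None, so callers handle "no bound known" explicitly instead of relying
    on a container that happens to hold one element.
    """
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b) if is_max else min(a, b)


print(get_max_or_min(3, 7, is_max=True))        # both bounds known
print(get_max_or_min(3, None, is_max=True))     # single known bound passes through
print(get_max_or_min(None, None, is_max=False)) # no bound known
```

Returning an optional makes the zero-, one-, and two-bound cases visible at the call site.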

---------

Signed-off-by: Ruoyu Qiu <cabbaken at outlook.com>
Delta File
+27-23 llvm/lib/Analysis/DependenceAnalysis.cpp
+27-23 1 files

LLVM/project ca29695 llvm/lib/Analysis DependenceAnalysis.cpp

[DA] Remove absolute value calculations in the Weak Zero SIV tests
Delta File
+7-7 llvm/lib/Analysis/DependenceAnalysis.cpp
+7-7 1 files

LLVM/project 446d5d4 llvm/test/Analysis/DependenceAnalysis weak-zero-siv-addrec-wrap.ll

[DA] Update tests for the Weak Zero SIV tests (NFC)
Delta File
+112-0 llvm/test/Analysis/DependenceAnalysis/weak-zero-siv-addrec-wrap.ll
+112-0 1 files

LLVM/project c239032 llvm/include/llvm/Analysis DependenceAnalysis.h, llvm/lib/Analysis DependenceAnalysis.cpp

[DA] Consolidate the core logic of the Weak Zero SIV tests (NFCI)
Delta File
+80-124 llvm/lib/Analysis/DependenceAnalysis.cpp
+5-0 llvm/include/llvm/Analysis/DependenceAnalysis.h
+85-124 2 files

LLVM/project 4860287 llvm/include/llvm/Analysis DependenceAnalysis.h, llvm/lib/Analysis DependenceAnalysis.cpp

[DA] Extract reversing dependence logic (NFCI)
Delta File
+10-7 llvm/lib/Analysis/DependenceAnalysis.cpp
+6-0 llvm/include/llvm/Analysis/DependenceAnalysis.h
+16-7 2 files

LLVM/project 5cfe50d llvm/lib/Analysis DependenceAnalysis.cpp, llvm/test/Analysis/DependenceAnalysis weak-zero-siv-addrec-wrap.ll

[DA] Add nsw check for addrecs in the Weak Zero SIV tests
Delta File
+31-16 llvm/test/Analysis/DependenceAnalysis/weak-zero-siv-addrec-wrap.ll
+3-0 llvm/lib/Analysis/DependenceAnalysis.cpp
+34-16 2 files