LLVM/project 3ac9b77 llvm/include/llvm/Analysis LazyValueInfo.h, llvm/lib/Analysis LazyValueInfo.cpp

[LVI] Remove unused DL member (NFC)

This is never used (the data layout is taken later from the
module instead) and not even initialized in the legacy PM code path.
Delta  File
+3 -5  llvm/include/llvm/Analysis/LazyValueInfo.h
+1 -1  llvm/lib/Analysis/LazyValueInfo.cpp
+4 -6  2 files

LLVM/project 2ecd8e2 llvm/include/llvm/Analysis LazyValueInfo.h, llvm/lib/Analysis LazyValueInfo.cpp

[LVI] Store function in LVI wrapper class

We know the function we're working on at construction, so there is
no need for code that fetches the module in every place that
fetches the Impl object.

I'm storing the function instead of the module to be able to get
the block number epoch in a future change.
Delta  File
+18 -27  llvm/lib/Analysis/LazyValueInfo.cpp
+5 -4  llvm/include/llvm/Analysis/LazyValueInfo.h
+23 -31  2 files

LLVM/project ad93032 llvm/include/llvm/ADT GenericUniformityImpl.h, llvm/lib/Analysis UniformityAnalysis.cpp

review: address suggestions
Delta  File
+16 -39  llvm/include/llvm/ADT/GenericUniformityImpl.h
+15 -0  llvm/lib/Analysis/UniformityAnalysis.cpp
+31 -39  2 files

LLVM/project 04d42e2 llvm/include/llvm/ADT GenericUniformityImpl.h GenericUniformityInfo.h

review: address suggestion on hasDivergence flag
Delta  File
+26 -16  llvm/include/llvm/ADT/GenericUniformityImpl.h
+0 -3  llvm/include/llvm/ADT/GenericUniformityInfo.h
+26 -19  2 files

LLVM/project 525180c llvm/include/llvm/ADT GenericUniformityImpl.h GenericSSAContext.h, llvm/lib/Analysis UniformityAnalysis.cpp

review: add comment in isNeverDivergent and separate VH callback for other follow-up
Delta  File
+0 -83  llvm/unittests/Target/AMDGPU/UniformityAnalysisCallbackVHTest.cpp
+0 -46  llvm/lib/Analysis/UniformityAnalysis.cpp
+4 -14  llvm/include/llvm/ADT/GenericUniformityImpl.h
+5 -0  llvm/include/llvm/ADT/GenericSSAContext.h
+0 -4  llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
+0 -1  llvm/unittests/Target/AMDGPU/CMakeLists.txt
+9 -148  1 files not shown
+9 -149  7 files

LLVM/project d27b6c3 llvm/include/llvm/ADT GenericUniformityImpl.h GenericUniformityInfo.h, llvm/lib/Analysis UniformityAnalysis.cpp

review: change design to track uniform values
Delta  File
+46 -50  llvm/lib/Analysis/UniformityAnalysis.cpp
+11 -29  llvm/include/llvm/ADT/GenericUniformityImpl.h
+17 -21  llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
+1 -1  llvm/include/llvm/ADT/GenericUniformityInfo.h
+75 -101  4 files

LLVM/project b549b33 clang/lib/CIR/CodeGen CIRGenBuiltinRISCV.cpp CIRGenBuiltin.cpp

[CIR][RISCV][NFC] Add CIRGenBuiltinRISCV file to support RISCV builtins codegen (#186050)

This PR adds the CIRGenBuiltinRISCV.cpp file for RISCV-specific builtin
codegen support.
It lists all builtins except the vector builtins, which need tablegen,
and marks them as "NYI".
Delta  File
+112 -0  clang/lib/CIR/CodeGen/CIRGenBuiltinRISCV.cpp
+3 -2  clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
+3 -0  clang/lib/CIR/CodeGen/CIRGenFunction.h
+1 -0  clang/lib/CIR/CodeGen/CMakeLists.txt
+119 -2  4 files

LLVM/project a9a3a6f llvm/include/llvm/ADT GenericUniformityImpl.h, llvm/lib/Analysis UniformityAnalysis.cpp

review: separate isDivergent internal and external use and clear divergentValues after finalizing uniformValues
Delta  File
+41 -16  llvm/include/llvm/ADT/GenericUniformityImpl.h
+10 -5  llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
+2 -0  llvm/lib/Analysis/UniformityAnalysis.cpp
+53 -21  3 files

LLVM/project 17c45b4 llvm/include/llvm/ADT GenericUniformityImpl.h, llvm/lib/Analysis UniformityAnalysis.cpp

use CallbackVH for deletion/RAUW
Delta  File
+149 -0  llvm/unittests/Target/AMDGPU/UniformityAnalysisCallbackVHTest.cpp
+45 -0  llvm/lib/Analysis/UniformityAnalysis.cpp
+12 -1  llvm/include/llvm/ADT/GenericUniformityImpl.h
+1 -0  llvm/unittests/Target/AMDGPU/CMakeLists.txt
+207 -1  4 files

LLVM/project df08126 llvm/include/llvm/ADT GenericUniformityImpl.h, llvm/lib/Analysis UniformityAnalysis.cpp

review: address suggestions
Delta  File
+2 -68  llvm/unittests/Target/AMDGPU/UniformityAnalysisCallbackVHTest.cpp
+5 -7  llvm/lib/Analysis/UniformityAnalysis.cpp
+3 -5  llvm/include/llvm/ADT/GenericUniformityImpl.h
+10 -80  3 files

LLVM/project b085d47 llvm/include/llvm/ADT GenericUniformityImpl.h

review: remove isNeverDivergent check for internal query
Delta  File
+0 -2  llvm/include/llvm/ADT/GenericUniformityImpl.h
+0 -2  1 files

LLVM/project 329a0ad llvm/include/llvm/ADT GenericUniformityImpl.h GenericUniformityInfo.h

Add API to query unknown uniformity
Delta  File
+16 -7  llvm/include/llvm/ADT/GenericUniformityImpl.h
+4 -0  llvm/include/llvm/ADT/GenericUniformityInfo.h
+20 -7  2 files

LLVM/project 245ecff llvm/include/llvm/ADT GenericUniformityImpl.h GenericSSAContext.h, llvm/lib/CodeGen MachineSSAContext.cpp

review: remove ir header
Delta  File
+5 -9  llvm/include/llvm/ADT/GenericUniformityImpl.h
+4 -0  llvm/lib/IR/SSAContext.cpp
+2 -0  llvm/lib/CodeGen/MachineSSAContext.cpp
+1 -0  llvm/include/llvm/ADT/GenericSSAContext.h
+12 -9  4 files

LLVM/project b965d58 llvm/include/llvm/ADT GenericUniformityImpl.h

clean-up
Delta  File
+2 -6  llvm/include/llvm/ADT/GenericUniformityImpl.h
+2 -6  1 files

LLVM/project 80ea81d llvm/test/CodeGen/AMDGPU/GlobalISel divergence-divergent-i1-used-outside-loop.mir divergence-structurizer.mir

update failing test checks
Delta  File
+132 -171  llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.mir
+101 -127  llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-structurizer.mir
+58 -85  llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-temporal-divergent-i1.mir
+17 -27  llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.ll
+15 -21  llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.mir
+2 -5  llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-temporal-divergent-i1.ll
+325 -436  1 files not shown
+327 -441  7 files

LLVM/project 31db715 llvm/include/llvm/ADT GenericUniformityImpl.h GenericUniformityInfo.h

review: keep unknown values divergent
Delta  File
+7 -12  llvm/include/llvm/ADT/GenericUniformityImpl.h
+0 -4  llvm/include/llvm/ADT/GenericUniformityInfo.h
+7 -16  2 files

LLVM/project a2eff1b llvm/lib/CodeGen MachineUniformityAnalysis.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel divergence-divergent-i1-used-outside-loop.mir divergence-structurizer.mir

track uniform value for machine uniformity
Delta  File
+171 -132  llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.mir
+127 -101  llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-structurizer.mir
+85 -58  llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-temporal-divergent-i1.mir
+27 -17  llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.ll
+21 -15  llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.mir
+12 -7  llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
+443 -330  2 files not shown
+453 -334  8 files

LLVM/project a58d6a1 llvm/include/llvm/ADT GenericUniformityImpl.h GenericUniformityInfo.h, llvm/lib/Analysis UniformityAnalysis.cpp

track uniform values at SSA level
Delta  File
+23 -2  llvm/include/llvm/ADT/GenericUniformityImpl.h
+17 -0  llvm/lib/Analysis/UniformityAnalysis.cpp
+12 -0  llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
+1 -0  llvm/include/llvm/ADT/GenericUniformityInfo.h
+53 -2  4 files

LLVM/project ea3eb01 llvm/tools/llvm-symbolizer llvm-symbolizer.cpp

[NFC][llvm-symbolizer] Replace makeStringError helper with createStringError (#188428)

The local `makeStringError` helper in `llvm-symbolizer.cpp` is
equivalent to `createStringError` from `llvm/Support/Error.h`. Remove it
and use `createStringError` directly at all call sites.
Delta  File
+10 -14  llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
+10 -14  1 files

LLVM/project 3059a1e llvm/lib/Analysis DependenceAnalysis.cpp

[DA] Hoist division check for early exit in weakCrossingSIVtest (NFC)

This patch moves the check that `Coeff` divides `Delta` earlier in the
function to enable an early exit, which may improve performance.

Signed-off-by: Ruoyu Qiu <cabbaken at outlook.com>
Delta  File
+21 -21  llvm/lib/Analysis/DependenceAnalysis.cpp
+21 -21  1 files

LLVM/project 47f6a19 llvm/lib/Target/AMDGPU GCNVOPDUtils.cpp VOP3PInstructions.td, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.cpp

AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3

For V_DOT2_F32_F16 and V_DOT2_F32_BF16, add their VOPDName and mark
them with usesCustomInserter, which is used to add pre-RA register
allocation hints that prefer assigning dst and src2 to the same physical
register. When the hint is satisfied, canMapVOP3PToVOPD recognises the
instruction as eligible for VOPD pairing by checking that it is VOP2-like:
dst==src2, no source modifiers, no clamp, and src1 is a register.
Mark both instructions as commutable to allow a literal in src1 to be
moved to src0, since VOPD only permits a literal in src0.
Delta  File
+258 -592  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll
+75 -93  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll
+32 -1  llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+8 -5  llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+8 -0  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+6 -0  llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+387 -691  1 files not shown
+389 -693  7 files
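
The "VOP2-like" condition listed in the commit message above can be read as a simple predicate. The following is only an illustrative sketch: the `Dot2Sketch` struct and `looksVOP2Like` function are hypothetical names invented here, and the real check in canMapVOP3PToVOPD operates on machine instructions and may test more than this.

```cpp
// Hypothetical, simplified view of the operands of a dot2 instruction.
// This is not an LLVM data structure; it only mirrors the four conditions
// named in the commit message.
struct Dot2Sketch {
  unsigned DstReg;      // physical register assigned to dst
  unsigned Src2Reg;     // physical register assigned to src2
  bool Src1IsReg;       // true if src1 is a register (not a literal)
  bool HasSrcModifiers; // true if any source modifier is set
  bool HasClamp;        // true if the clamp bit is set
};

// Returns true when all four conditions from the commit message hold:
// dst==src2, no source modifiers, no clamp, and src1 is a register.
inline bool looksVOP2Like(const Dot2Sketch &MI) {
  return MI.DstReg == MI.Src2Reg && !MI.HasSrcModifiers && !MI.HasClamp &&
         MI.Src1IsReg;
}
```

Under this reading, an instruction whose register-allocation hint was not satisfied (dst and src2 in different registers) fails the predicate and stays unpaired.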

LLVM/project 952dc12 libc/src/__support/OSUtil/darwin/aarch64 syscall.h, libc/src/__support/OSUtil/linux/aarch64 syscall.h

Force to inline syscall_impl on all platforms (#186849)

Currently, with only LIBC_INLINE, we merely hint to the compiler that it
should inline the function, which in practice does not always happen.

Since we added `[[gnu::always_inline]]` on linux/x86_64, it makes sense
to do so consistently on all platforms and to add a comment explaining
why we need it.
Delta  File
+15 -10  libc/src/__support/OSUtil/darwin/aarch64/syscall.h
+15 -10  libc/src/__support/OSUtil/linux/aarch64/syscall.h
+15 -10  libc/src/__support/OSUtil/linux/arm/syscall.h
+15 -10  libc/src/__support/OSUtil/linux/riscv/syscall.h
+14 -10  libc/src/__support/OSUtil/linux/i386/syscall.h
+5 -0  libc/src/__support/OSUtil/linux/x86_64/syscall.h
+79 -50  6 files
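
A minimal sketch of the difference between a plain inline hint and a forced inline, assuming a GCC- or Clang-compatible compiler. `SKETCH_INLINE` is a hypothetical stand-in for the project's LIBC_INLINE macro, and both functions are invented for illustration; they do not perform a system call.

```cpp
// Hypothetical stand-in for LIBC_INLINE. Per the commit message, the real
// macro only acts as an inlining hint.
#define SKETCH_INLINE inline

// Hint only: the compiler may still emit an out-of-line call, for example
// in unoptimized builds.
SKETCH_INLINE long addHintOnly(long A, long B) { return A + B; }

// With [[gnu::always_inline]] the compiler is asked to inline the body at
// every direct call site, independent of the optimization level.
[[gnu::always_inline]] SKETCH_INLINE long addForced(long A, long B) {
  return A + B;
}
```

Both functions compute the same result; only the inlining guarantee differs.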

LLVM/project 72d0a3d llvm/lib/Analysis DependenceAnalysis.cpp

[DA] Optimize parity check in weakCrossingSIVtest (NFC)

This patch simplifies the logic used to determine if the `Distance`
is divisible by 2. Previously, this was done by allocating an APInt
and performing a signed remainder (`srem`) operation.

Since `Distance` is an APInt, we can more efficiently check if it
is odd by directly inspecting the least significant bit (`Distance[0]`).
This avoids an expensive division operation and APInt allocation
while making the code more concise.

Signed-off-by: Ruoyu Qiu <cabbaken at outlook.com>
Delta  File
+1 -4  llvm/lib/Analysis/DependenceAnalysis.cpp
+1 -4  1 files
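
A minimal sketch of the equivalence the parity-check commit above relies on. It uses plain `int64_t` in place of `APInt` (an assumption made so the example compiles without LLVM headers), and both function names are hypothetical.

```cpp
#include <cstdint>

// Remainder-based parity test, analogous to the previous `srem` approach.
constexpr bool isOddViaRemainder(int64_t Distance) {
  return Distance % 2 != 0;
}

// Bit-based parity test, analogous to inspecting `Distance[0]`: in two's
// complement the least significant bit is set exactly for odd values,
// including negative ones.
constexpr bool isOddViaLowBit(int64_t Distance) {
  return (static_cast<uint64_t>(Distance) & 1u) != 0;
}

// Compile-time spot checks that the two tests agree.
static_assert(isOddViaRemainder(7) && isOddViaLowBit(7), "7 is odd");
static_assert(isOddViaRemainder(-3) && isOddViaLowBit(-3), "-3 is odd");
static_assert(!isOddViaRemainder(-8) && !isOddViaLowBit(-8), "-8 is even");
static_assert(!isOddViaRemainder(0) && !isOddViaLowBit(0), "0 is even");
```

The bit test needs no division and no temporary value, which is the saving the commit message describes for the `APInt` case.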

LLVM/project 1dbf7c7 offload/plugins-nextgen/level_zero/include L0Trace.h L0Kernel.h, offload/plugins-nextgen/level_zero/src L0Device.cpp L0Kernel.cpp

[OFFLOAD] Improve resource management of the plugin (#187597)

This PR improves event management of the plugin by fixing potential
resource leaks and preventing a potential deadlock.
Delta  File
+34 -22  offload/plugins-nextgen/level_zero/src/L0Device.cpp
+18 -7  offload/plugins-nextgen/level_zero/src/L0Kernel.cpp
+11 -0  offload/plugins-nextgen/level_zero/include/L0Trace.h
+3 -1  offload/plugins-nextgen/level_zero/include/L0Kernel.h
+66 -30  4 files

LLVM/project ebe2454 libclc/clc/include/clc/math clc_atan_helpers_decl.inc, libclc/clc/lib/generic/math clc_atanpi.inc clc_atan_helpers.inc

libclc: Update atanpi (#188424)

This was originally ported from rocm device libs in
03dc366e79cd01afe0bbfad2a7ede3087d6c9356. Merge in more
recent changes.
Delta  File
+30 -144  libclc/clc/lib/generic/math/clc_atanpi.inc
+40 -0  libclc/clc/lib/generic/math/clc_atan_helpers.inc
+4 -7  libclc/clc/lib/generic/math/clc_atanpi.cl
+3 -0  libclc/clc/include/clc/math/clc_atan_helpers_decl.inc
+77 -151  4 files

LLVM/project 7c933f0 mlir/include/mlir/Dialect/Arith/IR ArithOps.td, mlir/lib/Conversion/ArithToLLVM ArithToLLVM.cpp

[mlir][arith] Add `arith.convertf` op (#188041)

There are multiple FP types with the same bitwidth. Neither `extf` nor
`truncf` can be used in that case. Add a new `arith.convertf` op that
can be used in such cases. The op is modeled after `arith.truncf`. Also
add a lowering to LLVM.

Discussion:
https://discourse.llvm.org/t/arith-fptofp-vs-arith-extf-arith-truncf/90276

Assisted-by: claude-4.6-opus-high
Delta  File
+88 -0  mlir/test/Dialect/Arith/invalid.mlir
+62 -0  mlir/lib/Conversion/ArithToLLVM/ArithToLLVM.cpp
+49 -0  mlir/lib/Dialect/Arith/IR/ArithOps.cpp
+36 -0  mlir/test/Dialect/Arith/ops.mlir
+34 -0  mlir/test/Conversion/ArithToLLVM/arith-to-llvm.mlir
+34 -0  mlir/include/mlir/Dialect/Arith/IR/ArithOps.td
+303 -0  2 files not shown
+329 -0  8 files

LLVM/project 0e49be0 libclc/clc/include/clc/math math.h

libclc: Force assuming fast float fma for AMDGPU (#188245)

Currently the build uses the default dummy target, which assumes
FMA is slow. Force this to assume fast fma, which is the case on
any remotely new hardware. In the future if we want better support
for older targets, there should be a separate build of the math
functions for the slow fma case.
Delta  File
+3 -1  libclc/clc/include/clc/math/math.h
+3 -1  1 files

LLVM/project c43f6a0 llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 vec-strict-cmp-128.ll

[X86] Fix widening for strict_fmin/fmax (#188286)

I believe that widening these with undef is not correct, because the
undef values might be picked as sNaN and then trap.
Delta  File
+34 -4  llvm/test/CodeGen/X86/vec-strict-cmp-128.ll
+5 -3  llvm/lib/Target/X86/X86ISelLowering.cpp
+39 -7  2 files

LLVM/project 8b1f7f0 llvm/lib/Analysis DependenceAnalysis.cpp, llvm/test/Analysis/DependenceAnalysis weak-crossing-siv-coeff-may-zero.ll WeakCrossingSIV.ll

[DA] Fix the Weak Crossing SIV test when Coeff and Delta are zero (#188203)

The Weak Crossing SIV test concluded that there is a dependency only in the
`=`-direction when `Delta` is zero. This is incorrect, because the
coefficients of the addrecs might be zero, in which case the dependency
should have all directions. This patch adds a non-zero check for the
coefficient to address the issue.
Delta  File
+3 -3  llvm/test/Analysis/DependenceAnalysis/weak-crossing-siv-coeff-may-zero.ll
+1 -1  llvm/lib/Analysis/DependenceAnalysis.cpp
+1 -1  llvm/test/Analysis/DependenceAnalysis/WeakCrossingSIV.ll
+5 -5  3 files
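
The reasoning in the commit message above can be checked by brute force on a small iteration space. This sketch assumes the usual weak-crossing formulation, in which the source subscript is `c1 + Coeff*i`, the destination subscript is `c2 - Coeff*j`, and a dependence therefore requires `Coeff * (i + j) == Delta` with `Delta = c2 - c1`. The helper `crossingDirections` is hypothetical and is not LLVM code.

```cpp
#include <cstdint>
#include <set>

// Hypothetical helper: enumerate all iteration pairs (I, J) in
// [0, N) x [0, N) that satisfy Coeff * (I + J) == Delta and collect the
// direction of each solution: '<' if I < J, '=' if I == J, '>' if I > J.
std::set<char> crossingDirections(int64_t Coeff, int64_t Delta, int64_t N) {
  std::set<char> Dirs;
  for (int64_t I = 0; I < N; ++I)
    for (int64_t J = 0; J < N; ++J)
      if (Coeff * (I + J) == Delta)
        Dirs.insert(I < J ? '<' : (I == J ? '=' : '>'));
  return Dirs;
}
```

With a non-zero coefficient and `Delta == 0`, the only solution is `I == J == 0`, so only the `=` direction appears. With `Coeff == 0` and `Delta == 0`, every pair is a solution, so all three directions appear, which is the case the added non-zero check guards against.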

LLVM/project 4e7274d lldb/source/Target Thread.cpp

[lldb] Print correct thread plan in logging code of Thread::ShouldReportRun (#188198)

This code accesses the completed thread plan (even if it is a private one).
However, the logging code does not pass `skip_private=false` and instead
accesses only the public completed thread plan. If there is no
public thread plan, the logging code could also crash.

This is just some minor refactoring that ensures we use the same thread
plan in the logging code.
Delta  File
+4 -3  lldb/source/Target/Thread.cpp
+4 -3  1 files