LLVM/project 7c29ed5 libc/src/__support/wctype/conversion/utils slice.h CMakeLists.txt

[libc][wctype] Upstream custom slice implementation from PtrHash-cc prototype to LLVM libc
Delta File
+112 -0 libc/src/__support/wctype/conversion/utils/slice.h
+11 -0 libc/src/__support/wctype/conversion/utils/CMakeLists.txt
+123 -0 2 files

LLVM/project db737bc clang/include/clang/Basic CodeGenOptions.def, clang/include/clang/Options Options.td

review: use function attr instead of cl::opt flag
Delta File
+17 -15 llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+3 -11 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+7 -0 clang/include/clang/Options/Options.td
+4 -0 clang/include/clang/Basic/CodeGenOptions.def
+2 -0 clang/lib/CodeGen/Targets/AMDGPU.cpp
+33 -26 5 files

LLVM/project 88b77d5 llvm/lib/Target/SPIRV SPIRVLegalizerInfo.cpp SPIRVPostLegalizer.cpp, llvm/test/CodeGen/SPIRV/legalization vector-arithmetic-6.ll load-store-global.ll

[SPIRV] Support non-constant indices for vector insert/extract (#172514)

This patch updates the legalization of spv_insertelt and spv_extractelt
to handle non-constant (dynamic) indices. When a dynamic index is
encountered, the vector is spilled to the stack, and the element is
accessed via OpAccessChain (lowered from spv_gep).

This patch also adds custom legalization for G_STORE to scalarize vector
stores and refines the legalization rules for G_LOAD, G_STORE, and
G_BUILD_VECTOR.

Fixes https://github.com/llvm/llvm-project/issues/170534
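A rough model of the dynamic-index path, assuming a flat, byte-addressed stack slot: the vector is spilled with one scalar store per element (mirroring the scalarized G_STORE), and the element is then read or written through a computed address. All names here (`scalarized_store`, `extract_dynamic`, `insert_dynamic`, the 4-byte element size, the slot address) are hypothetical. The actual patch works on GlobalISel generic instructions, and SPIR-V's OpAccessChain takes element indices rather than byte offsets.

```python
from typing import Dict, List

# Hypothetical flat memory model: byte address -> scalar element value.
Memory = Dict[int, int]


def scalarized_store(mem: Memory, base: int, vec: List[int], elem_size: int) -> None:
    """Store a vector as one scalar store per element (cf. scalarizing G_STORE)."""
    for i, value in enumerate(vec):
        mem[base + i * elem_size] = value


def scalarized_load(mem: Memory, base: int, count: int, elem_size: int) -> List[int]:
    """Reload a vector from consecutive element slots."""
    return [mem[base + i * elem_size] for i in range(count)]


def extract_dynamic(vec: List[int], index: int, elem_size: int = 4, slot: int = 0x100) -> int:
    """Model of extractelement with a runtime index: spill, then load one element."""
    # LLVM IR yields poison for an out-of-range index; this model just rejects it.
    if not 0 <= index < len(vec):
        raise IndexError("index out of range")
    mem: Memory = {}
    scalarized_store(mem, slot, vec, elem_size)  # spill the vector to a stack slot
    return mem[slot + index * elem_size]  # address computation + scalar load


def insert_dynamic(vec: List[int], value: int, index: int,
                   elem_size: int = 4, slot: int = 0x100) -> List[int]:
    """Model of insertelement with a runtime index: spill, overwrite one slot, reload."""
    if not 0 <= index < len(vec):
        raise IndexError("index out of range")
    mem: Memory = {}
    scalarized_store(mem, slot, vec, elem_size)
    mem[slot + index * elem_size] = value  # scalar store to the computed address
    return scalarized_load(mem, slot, len(vec), elem_size)
```

For example, `extract_dynamic([10, 20, 30, 40], 2)` reads the third slot and returns 30. The round trip through memory is what makes a runtime index workable when the element cannot be selected statically.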
Delta File
+234 -48 llvm/lib/Target/SPIRV/SPIRVLegalizerInfo.cpp
+45 -54 llvm/test/CodeGen/SPIRV/legalization/vector-arithmetic-6.ll
+32 -64 llvm/test/CodeGen/SPIRV/legalization/load-store-global.ll
+66 -0 llvm/test/CodeGen/SPIRV/legalization/spv-extractelt-legalization.ll
+18 -30 llvm/test/CodeGen/SPIRV/llvm-intrinsics/matrix-transpose.ll
+40 -3 llvm/lib/Target/SPIRV/SPIRVPostLegalizer.cpp
+435 -199 1 file not shown
+443 -210 7 files

LLVM/project c65c6ae llvm/test/CodeGen/AArch64 arm64-cvtf-simd-itofp.ll

fixup! [AArch64][llvm] Add codegen for simd fpcvt intrinsics

Use @llvm.experimental.constrained.* to test strict nodes
Delta File
+12 -12 llvm/test/CodeGen/AArch64/arm64-cvtf-simd-itofp.ll
+12 -12 1 file

LLVM/project 758ded1 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.h AMDGPUBaseInfo.cpp

review: move hardwareLimit inside AMDGPUBaseInfo
Delta File
+36 -46 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+18 -0 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+17 -0 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+71 -46 3 files

LLVM/project e61a150 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Update SIInsertWaitcnts.cpp

Co-authored-by: Pierre van Houtryve <pierre.vanhoutryve at amd.com>
Delta File
+1 -1 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+1 -1 1 file

LLVM/project 52a1048 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

fix: resolve issue after rebase
Delta File
+0 -15 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+0 -15 1 file

LLVM/project fa1ea58 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Update SIInsertWaitcnts.cpp

Co-authored-by: Pierre van Houtryve <pierre.vanhoutryve at amd.com>
Delta File
+1 -1 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+1 -1 1 file

LLVM/project 8aa920a llvm/test/CodeGen/AMDGPU expand-waitcnt-profiling.ll

add more tests
Delta File
+225 -0 llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+225 -0 1 file

LLVM/project 92958dc llvm/test/CodeGen/AMDGPU expand-waitcnt-profiling.ll

add RUN lines for different GPU generations and counter types
Delta File
+567 -203 llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+567 -203 1 file

LLVM/project 7f1d01c llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU expand-waitcnt-profiling.ll

skip expanding out-of-order events
Delta File
+143 -20 llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+42 -12 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+185 -32 2 files

LLVM/project 734a367 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Address reviewer feedback: fix getWaitCountMax and reduce code duplication

- Fix getWaitCountMax() to use correct bitmasks based on architecture:
  - Pre-GFX12: Use getVmcntBitMask/getLgkmcntBitMask for LOAD_CNT/DS_CNT
  - GFX12+: Use getLoadcntBitMask/getDscntBitMask for LOAD_CNT/DS_CNT
- Refactor repetitive if-blocks for LOAD_CNT, DS_CNT, EXP_CNT into
  a single loop using getCounterRef helper function
- Fix X_CNT to return proper getXcntBitMask(IV) instead of 0
Delta File
+18 -32 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+18 -32 1 file

LLVM/project c7c5259 lldb/source/Plugins/Language/CPlusPlus MsvcStlSpan.cpp CPlusPlusLanguage.cpp, lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/span TestDataFormatterStdSpan.py

[LLDB] Add MSVC STL span formatter (#173053)

`std::span` didn't have a formatter for MSVC's STL yet. The type is
quite useful in C++20, so this PR adds a formatter for it.

Since the formatter is new, I made it work with both DWARF and PDB from
the start.
Delta File
+134 -0 lldb/source/Plugins/Language/CPlusPlus/MsvcStlSpan.cpp
+24 -5 lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/span/TestDataFormatterStdSpan.py
+17 -9 lldb/source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp
+6 -0 lldb/source/Plugins/Language/CPlusPlus/MsvcStl.h
+1 -0 lldb/source/Plugins/Language/CPlusPlus/CMakeLists.txt
+182 -14 5 files

LLVM/project 03e0c1e llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU expand-waitcnt-profiling.ll

[AMDGPU] Add -amdgpu-expand-waitcnt-profiling option for PC-sampling profiling
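The title does not say what "expanding" a wait means. One reading, assumed here and not confirmed by the log, is that a single wait such as `vmcnt(0)` with several loads outstanding is split into a descending series of waits, so that a PC sample taken during a stall lands on the wait for one specific event. The later commit "skip expanding out-of-order events" fits this reading: a count threshold only identifies a particular event when the counter completes in order. `expand_wait` is a hypothetical sketch of that idea, not the pass's code.

```python
from typing import List


def expand_wait(outstanding: int, target: int, in_order: bool) -> List[int]:
    """Hypothetical: wait thresholds to emit in place of a single wait(target).

    `outstanding` is the number of events pending on the counter; `target`
    is how many the original wait allows to remain pending.
    """
    if not in_order or outstanding <= target + 1:
        # Out-of-order counter, or nothing to split: keep the single wait.
        return [target]
    # In-order counter: release one event at a time, oldest first.
    return list(range(outstanding - 1, target - 1, -1))
```

Under this model, three outstanding loads and a `vmcnt(0)` wait become waits at 2, 1, and 0, while an out-of-order counter keeps its original single wait.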
Delta File
+230 -0 llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+172 -22 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+402 -22 2 files

LLVM/project 82ba5f6 llvm/lib/Target/SPIRV SPIRVRegularizer.cpp, llvm/test/CodeGen/SPIRV icmp-i1.ll

[SPIRV] Lower i1 comparisons to logical operations in regularizer pass.

UGT, UGE, ULT, ULE, SGT, SGE, SLT, SLE predicates for i1 types are now
lowered to equivalent logical operations (AND, OR, NOT) to ensure
valid SPIR-V, since SPIR-V boolean types only support logical operations.
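The commit names the predicates but not the exact rewrites. The identities below are derived here for illustration, using LLVM's convention that an i1 `true` reads as 1 when unsigned and as -1 when signed, so each signed predicate behaves like the opposite-direction unsigned one. The pass may emit an equivalent but differently shaped expression. `ARITHMETIC`, `LOGICAL`, and `check_all` are hypothetical names; the check enumerates all four input pairs.

```python
from itertools import product
from typing import Callable, Dict

Pred = Callable[[bool, bool], bool]


def u(b: bool) -> int:
    """Unsigned reading of an i1: true is 1."""
    return 1 if b else 0


def s(b: bool) -> int:
    """Signed reading of an i1: true is -1 in two's complement."""
    return -1 if b else 0


# Reference semantics: compare the integer readings of the two operands.
ARITHMETIC: Dict[str, Pred] = {
    "ugt": lambda a, b: u(a) > u(b),
    "uge": lambda a, b: u(a) >= u(b),
    "ult": lambda a, b: u(a) < u(b),
    "ule": lambda a, b: u(a) <= u(b),
    "sgt": lambda a, b: s(a) > s(b),
    "sge": lambda a, b: s(a) >= s(b),
    "slt": lambda a, b: s(a) < s(b),
    "sle": lambda a, b: s(a) <= s(b),
}

# Equivalents built only from AND, OR, NOT.
LOGICAL: Dict[str, Pred] = {
    "ugt": lambda a, b: a and not b,
    "uge": lambda a, b: a or not b,
    "ult": lambda a, b: (not a) and b,
    "ule": lambda a, b: (not a) or b,
    # Signed forms mirror the unsigned ones because true < false when signed.
    "sgt": lambda a, b: (not a) and b,
    "sge": lambda a, b: (not a) or b,
    "slt": lambda a, b: a and not b,
    "sle": lambda a, b: a or not b,
}


def check_all() -> bool:
    """Exhaustively compare both tables over every i1 input pair."""
    return all(
        LOGICAL[p](a, b) == ARITHMETIC[p](a, b)
        for p in ARITHMETIC
        for a, b in product([False, True], repeat=2)
    )
```

Because i1 has only two values, exhaustive enumeration is a complete proof of each identity.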
Delta File
+132 -0 llvm/test/CodeGen/SPIRV/icmp-i1.ll
+73 -0 llvm/lib/Target/SPIRV/SPIRVRegularizer.cpp
+205 -0 2 files

LLVM/project 2e7c3da llvm/lib/Target/SPIRV SPIRVRegularizer.cpp

[review] Take name.
Delta File
+1 -1 llvm/lib/Target/SPIRV/SPIRVRegularizer.cpp
+1 -1 1 file

LLVM/project 632d077 llvm/lib/Target/SPIRV SPIRVRegularizer.cpp

[review] Use IRBuilder and inline call.
Delta File
+7 -21 llvm/lib/Target/SPIRV/SPIRVRegularizer.cpp
+7 -21 1 file

LLVM/project a2ae83b llvm/lib/Target/SPIRV SPIRVRegularizer.cpp

[review] Take name.
Delta File
+1 -1 llvm/lib/Target/SPIRV/SPIRVRegularizer.cpp
+1 -1 1 file

LLVM/project 757f57c llvm/lib/Target/SPIRV SPIRVRegularizer.cpp

[review] Use IRBuilder and inline call.
Delta File
+7 -21 llvm/lib/Target/SPIRV/SPIRVRegularizer.cpp
+7 -21 1 file

LLVM/project e5623b1 llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp, llvm/test/CodeGen/AMDGPU div_v2i128.ll fptoi.i128.ll

Revert "SelectionDAG: Do not propagate divergence through glue (#174766)"

This reverts commit 47a0d0e42832558f999b149b22cfd48c46ef2a57.

Reverted due to test failures in LLVM_ENABLE_EXPENSIVE_CHECKS builds.
Delta File
+328 -336 llvm/test/CodeGen/AMDGPU/div_v2i128.ll
+256 -262 llvm/test/CodeGen/AMDGPU/fptoi.i128.ll
+200 -204 llvm/test/CodeGen/AMDGPU/rem_i128.ll
+163 -167 llvm/test/CodeGen/AMDGPU/div_i128.ll
+22 -5 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+3 -4 llvm/test/CodeGen/AMDGPU/dag-divergence.ll
+972 -978 6 files

LLVM/project 9315747 libclc CMakeLists.txt

[libclc] Initial support for cross-compiling OpenCL libraries (#174022)

Summary:
The other GPU-enabled libraries (openmp, flang-rt, compiler-rt, libc,
libcxx, libcxx-abi) all support builds through a runtime cross-build. In
these builds we use a separate CMake build that cross-compiles to a
single target.

This patch provides basic support for this with the `libclc` libraries.
Changes include adding support for the more standard GPU compute triples
(amdgcn-amd-amdhsa, nvptx64-nvidia-cuda) and building only one target in
this mode.

Some things left to do:

This patch does not change the compiler invocations. This build method
would allow us to use standard CMake routines, but this patch keeps the
change minimal.

The prebuild support is questionable and doesn't fit into this scheme

    [3 lines not shown]
Delta File
+10 -2 libclc/CMakeLists.txt
+10 -2 1 file

LLVM/project d45d8cb llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp, offload/test/offloading/fortran default-mapper-nested-derived-type.f90

[OpenMP][OMPIRBuilder] Attach `Attribute::OptimizeNone` to user-defined (and default) mappers

This disables optimizations for user-defined mappers since, in some cases
(see `default-mapper-nested-derived-type.f90`), some optimizations and
instrumentations cause runtime crashes on the host. In particular, the
following:
- the `X86DAGToDAGISel` pass and
- `OptNoneInstrumentation` (TODO: zoom in further on passes that use this
instrumentation object to find out the exact pass(es) causing the crash).

I couldn't find a way to fine-tune this for only the problematic passes
yet. I am very reluctant to do this; please let me know if there are
better solutions.
Delta File
+34 -0 offload/test/offloading/fortran/default-mapper-nested-derived-type.f90
+9 -0 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+43 -0 2 files

LLVM/project dd79244 cross-project-tests CMakeLists.txt, cross-project-tests/debuginfo-tests/llvm-prettyprinters/lldb CMakeLists.txt

[cross-project-tests][formatters] Move LLDB test setup into its own CMakeLists.txt

Once we start adding more tests, having this in a separate CMakeLists.txt is more maintainable.
Delta File
+9 -0 cross-project-tests/debuginfo-tests/llvm-prettyprinters/lldb/CMakeLists.txt
+2 -6 cross-project-tests/CMakeLists.txt
+11 -6 2 files

LLVM/project 5603cd1 llvm/lib/Target/AMDGPU SIDefines.h

[NFCI][AMDGPU] Update Mode register mask for gfx1250

SPG says two bits for each operand.
Delta File
+5 -5 llvm/lib/Target/AMDGPU/SIDefines.h
+5 -5 1 file

LLVM/project 47a0d0e llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp, llvm/test/CodeGen/AMDGPU div_v2i128.ll fptoi.i128.ll

SelectionDAG: Do not propagate divergence through glue (#174766)

Glue does not carry any value (in the LLVM IR Value sense) that could be
considered uniform or divergent.
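A toy model of the rule, using hypothetical `Node`/`Operand` types rather than the real `SDNode` API: a node is divergent if it is itself a source of divergence or if any of its value operands is divergent, and with `skip_glue=True` glue edges are ignored, as this commit proposes. Real SelectionDAG divergence is tracked incrementally on nodes; this recursive version only illustrates which edges count. The log also records the revert of this change (e5623b1).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    source_of_divergence: bool = False  # e.g. reads a per-lane value
    operands: List["Operand"] = field(default_factory=list)


@dataclass
class Operand:
    node: Node
    is_glue: bool = False  # glue ties nodes together but carries no IR value


def is_divergent(n: Node, skip_glue: bool = True) -> bool:
    """Divergence of `n`, optionally ignoring glue operands."""
    if n.source_of_divergence:
        return True
    return any(
        is_divergent(op.node, skip_glue)
        for op in n.operands
        if not (skip_glue and op.is_glue)
    )
```

With glue skipped, a node whose only divergent input arrives through glue stays uniform; with glue counted, it would be marked divergent.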
Delta File
+336 -328 llvm/test/CodeGen/AMDGPU/div_v2i128.ll
+262 -256 llvm/test/CodeGen/AMDGPU/fptoi.i128.ll
+204 -200 llvm/test/CodeGen/AMDGPU/rem_i128.ll
+167 -163 llvm/test/CodeGen/AMDGPU/div_i128.ll
+5 -22 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+4 -3 llvm/test/CodeGen/AMDGPU/dag-divergence.ll
+978 -972 6 files

LLVM/project 1719aa4 llvm/lib/Target/AMDGPU AMDGPULowerVGPREncoding.cpp

resolve review comments
Delta File
+10 -7 llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
+10 -7 1 file

LLVM/project 82c1f94 mlir/include/mlir/Transforms Passes.td Passes.h, mlir/lib/Transforms RemoveDeadValues.cpp

[mlir][Transforms] `remove-dead-values`: Rely on canonicalizer for region simplification (#173505)

This commit simplifies the `remove-dead-values` pass and fixes a bug in
the handling of `RegionBranchOpInterface` ops. The pass used to produce
invalid IR ("null value found") for the newly added test case.

`remove-dead-values` is a pass for additional IR simplification that
cannot be performed by the canonicalizer pass. Based on a liveness
analysis, it erases dead values / IR. (The liveness analysis is a
dataflow analysis that has more information about the IR than a
canonicalization pattern, which can see only "local" information.)

Region-based ops are difficult. The liveness analysis may determine that
an SSA value is dead. However, that does not mean that the value can
actually be removed. Doing so may violate an region data flow (as
modeled by the `RegionBranchOpInterface`). As an example, consider the
case where a region branch terminator may dispatch to one of two region
successor with the same forwarded values. A successor input (block
argument) can be erased only if it is dead on both successors.

    [11 lines not shown]
Delta File
+143 -304 mlir/lib/Transforms/RemoveDeadValues.cpp
+110 -45 mlir/test/Transforms/remove-dead-values.mlir
+10 -0 mlir/include/mlir/Transforms/Passes.td
+1 -0 mlir/include/mlir/Transforms/Passes.h
+264 -349 4 files

LLVM/project 4fdbe05 clang/lib/AST/ByteCode Compiler.cpp, clang/test/AST/ByteCode complex.cpp

[clang][bytecode] Fix some imag/real corner cases (#174764)

Fix real/imag when taking a primitive parameter _and_ being discarded,
and fix the case where their subexpression can't be classified.

Fixes https://github.com/llvm/llvm-project/issues/174668
Delta File
+26 -0 clang/test/AST/ByteCode/complex.cpp
+7 -3 clang/lib/AST/ByteCode/Compiler.cpp
+33 -3 2 files

LLVM/project 412e86a llvm/test/CodeGen/AMDGPU carryout-selection.ll llvm.amdgcn.wmma.gfx1250.w32.ll

[AMDGPU] Handle `s_setreg_imm32_b32` targeting `MODE` register

On certain hardware, this instruction clobbers VGPR MSB `bits[12:19]`, so we need to restore the current mode.
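The bit arithmetic involved, as a sketch that treats the register value as a plain integer. It assumes `bits[12:19]` denotes the inclusive range 12 through 19 (eight bits), giving the mask `0xFF << 12 == 0xFF000`; restoring then means clearing those bits in the value left by the write and OR-ing the saved ones back in. The helper names are hypothetical, and the log does not show how the compiler actually emits the restore.

```python
def bit_range_mask(lo: int, hi: int) -> int:
    """Mask with bits lo..hi (inclusive) set."""
    if not 0 <= lo <= hi:
        raise ValueError("expected 0 <= lo <= hi")
    return ((1 << (hi - lo + 1)) - 1) << lo


# Assumed field position, read from the commit message's bits[12:19].
VGPR_MSB_MASK = bit_range_mask(12, 19)


def restore_bits(clobbered: int, saved: int, mask: int) -> int:
    """Keep `clobbered` outside `mask`; put back `saved` inside it."""
    return (clobbered & ~mask) | (saved & mask)
```

For instance, if the write leaves `0x3C123` and the saved value had `0x5A` in bits 12 through 19, the restored value is `0x5A123`.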
Delta File
+713 -2 llvm/test/CodeGen/AMDGPU/carryout-selection.ll
+246 -0 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wmma.gfx1250.w32.ll
+220 -0 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wmma.imod.gfx1250.w32.ll
+212 -0 llvm/test/CodeGen/AMDGPU/flat-saddr-load.ll
+174 -0 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wmma.imm.gfx1250.w32.ll
+166 -0 llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll
+1,731 -2 146 files not shown
+6,212 -3 152 files

LLVM/project d206fb1 mlir/include/mlir/Transforms Passes.td Passes.h, mlir/lib/Transforms RemoveDeadValues.cpp

tmp commit

simple test working

draft: do not erase IR, just replace uses
Delta File
+143 -304 mlir/lib/Transforms/RemoveDeadValues.cpp
+110 -45 mlir/test/Transforms/remove-dead-values.mlir
+10 -0 mlir/include/mlir/Transforms/Passes.td
+1 -0 mlir/include/mlir/Transforms/Passes.h
+264 -349 4 files