LLVM/project 9e86c0dmlir/lib/Dialect/Linalg/IR LinalgOps.cpp

[MLIR] Apply clang-tidy fixes for readability-container-size-empty in LinalgOps.cpp (NFC)
DeltaFile
+1-1mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
+1-11 files

LLVM/project c6a79a5mlir/lib/Target/LLVMIR/Dialect/LLVMIR LLVMToLLVMIRTranslation.cpp

[MLIR] Apply clang-tidy fixes for readability-identifier-naming in LLVMToLLVMIRTranslation.cpp (NFC)
DeltaFile
+5-5mlir/lib/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.cpp
+5-51 files

LLVM/project 3da82afmlir/lib/Dialect/SparseTensor/Transforms SparseBufferRewriting.cpp

[MLIR] Apply clang-tidy fixes for bugprone-argument-comment in SparseBufferRewriting.cpp (NFC)
DeltaFile
+4-4mlir/lib/Dialect/SparseTensor/Transforms/SparseBufferRewriting.cpp
+4-41 files

LLVM/project 848f8bellvm/include/llvm/IR IntrinsicsAMDGPU.td, llvm/lib/Target/AMDGPU SIISelLowering.cpp SIInstructions.td

[AMDGPU] Add wave reduce intrinsics for float types - 2

Supported Ops: `fadd`, `fsub`
DeltaFile
+1,001-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll
+1,001-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll
+44-3llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2-0llvm/lib/Target/AMDGPU/SIInstructions.td
+1-1llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+2-0llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+2,051-46 files

LLVM/project a2b4c0fclang/include/clang/Basic BuiltinsX86.td, clang/lib/AST ExprConstant.cpp

[X86][Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - allow AVX512 mask predicate intrinsics to be used in constexpr (#165054)

Enables constexpr evaluation for the following AVX512 intrinsics:
```
_mm_movepi8_mask _mm256_movepi8_mask _mm512_movepi8_mask
_mm_movepi16_mask _mm256_movepi16_mask _mm512_movepi16_mask
_mm_movepi32_mask _mm256_movepi32_mask _mm512_movepi32_mask
_mm_movepi64_mask _mm256_movepi64_mask _mm512_movepi64_mask
```
Part of #162072
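
As a rough illustration of what constant evaluation has to compute for these intrinsics, here is a scalar sketch of the documented semantics: bit i of the result mask is the most significant (sign) bit of element i. The names `signBitsToMask` and `kSample` are hypothetical and do not come from the patch; this is a model under that assumption, not the Clang implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a movepi*_mask-style operation (hypothetical helper):
// bit i of the result is the sign bit of signed element i.
template <typename T, std::size_t N>
constexpr std::uint64_t signBitsToMask(const std::array<T, N> &v) {
  static_assert(N <= 64, "mask must fit in 64 bits");
  std::uint64_t mask = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (v[i] < 0) // a negative value has its most significant bit set
      mask |= std::uint64_t{1} << i;
  return mask;
}

// Evaluated entirely at compile time: lanes 0 and 3 are negative.
constexpr std::array<std::int8_t, 4> kSample = {-1, 0, 5, -128};
static_assert(signBitsToMask(kSample) == 0b1001, "lanes 0 and 3 are negative");
```

The `static_assert` is the point of the sketch: once the operation is usable in a constant expression, results like this can be checked at compile time.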
DeltaFile
+36-0clang/lib/AST/ByteCode/InterpBuiltin.cpp
+22-11clang/include/clang/Basic/BuiltinsX86.td
+31-0clang/test/CodeGen/X86/avx512bw-builtins.c
+31-0clang/lib/AST/ExprConstant.cpp
+8-12clang/lib/Headers/avx512vlbwintrin.h
+8-12clang/lib/Headers/avx512vldqintrin.h
+136-355 files not shown
+174-4711 files

LLVM/project 02db2dellvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 vscale-and-sve-cnt-demandedbits.ll sve-vector-compress.ll

[AArch64][SVE] Implement demanded bits for @llvm.aarch64.sve.cntp (#168714)

This allows DemandedBits to see that the SVE CNTP intrinsic will only
ever produce small positive integers. The maximum value you could get
here is 256, which is CNTP on an nxv16i1 on a machine with a 2048-bit
vector size (the maximum for SVE).

Using this, various redundant operations (zexts, sexts, ands, ors, etc.)
can be eliminated.
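
The arithmetic behind the 256 bound can be sketched as follows. This is an illustrative calculation only; `maxPredicateCount`, `possibleBits` and `isAndRedundant` are hypothetical helpers, not functions from AArch64ISelLowering.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Largest SVE vector length in bits, as stated in the commit message.
constexpr std::uint64_t kMaxSVEVectorBits = 2048;

// Upper bound on a predicate count: one predicate lane per element, so the
// bound is the number of elements that fit in the largest vector.
constexpr std::uint64_t maxPredicateCount(std::uint64_t elementBits) {
  return kMaxSVEVectorBits / elementBits;
}

// Number of low bits needed to represent `value`.
constexpr unsigned bitsNeeded(std::uint64_t value) {
  unsigned bits = 0;
  while (value != 0) {
    ++bits;
    value >>= 1;
  }
  return bits;
}

// Mask of every bit that could possibly be set in the count.
constexpr std::uint64_t possibleBits(std::uint64_t elementBits) {
  return (std::uint64_t{1} << bitsNeeded(maxPredicateCount(elementBits))) - 1;
}

// An `and` with `mask` cannot change the count if it keeps every possible bit.
constexpr bool isAndRedundant(std::uint64_t mask, std::uint64_t elementBits) {
  return (mask & possibleBits(elementBits)) == possibleBits(elementBits);
}

static_assert(maxPredicateCount(8) == 256, "nxv16i1: one lane per byte");
static_assert(bitsNeeded(256) == 9, "256 needs bits 0..8");
```

Under this model, `isAndRedundant(0xFFFF, 8)` holds, while a mask of `0xFF` is not redundant because a count of exactly 256 sets bit 8.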
DeltaFile
+40-21llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+56-0llvm/test/CodeGen/AArch64/vscale-and-sve-cnt-demandedbits.ll
+5-6llvm/test/CodeGen/AArch64/sve-vector-compress.ll
+101-273 files

LLVM/project 0a88e96mlir/lib/Dialect/LLVMIR/Transforms DIScopeForLLVMFuncOp.cpp, mlir/test/Dialect/LLVMIR add-debuginfo-func-scope.mlir

[MLIR][LLVM] Extend DIScopeForLLVMFuncOp to handle cross-file operatio… (#167844)

The current `DIScopeForLLVMFuncOp` pass handles debug information for
inlined code by processing `CallSiteLoc` attributes. However, some
compilation scenarios compose code from multiple source files directly
into a single function without generating `CallSiteLoc`.

**Scenario:**
```python
# a.py
def kernel_a(tensor):
    print("a: {}", tensor)  # a.py:3
    jit_func_b(tensor)           # Calls b.py code

# b.py
def func_b(tensor):
    print("b: {}", tensor)  # b.py:7
```


    [18 lines not shown]
DeltaFile
+44-1mlir/lib/Dialect/LLVMIR/Transforms/DIScopeForLLVMFuncOp.cpp
+19-0mlir/test/Dialect/LLVMIR/add-debuginfo-func-scope.mlir
+63-12 files

LLVM/project aacf145llvm/include/llvm/IR IntrinsicsAMDGPU.td, llvm/lib/Target/AMDGPU SIISelLowering.cpp SIInstructions.td

Review comments: remove the `.float` suffix and overload.
DeltaFile
+130-96llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll
+129-77llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll
+11-6llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2-2llvm/lib/Target/AMDGPU/SIInstructions.td
+1-1llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+2-0llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+275-1826 files

LLVM/project 386826dllvm/lib/Target/AMDGPU SIMemoryLegalizer.cpp GCNSubtarget.h, llvm/test/CodeGen/AMDGPU spillv16.ll integer-mad-patterns.ll

[AMDGPU][gfx1250] Also add a wait on xcnt before volatile accesses
DeltaFile
+14-0llvm/test/CodeGen/AMDGPU/spillv16.ll
+9-3llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+6-0llvm/test/CodeGen/AMDGPU/integer-mad-patterns.ll
+4-2llvm/lib/Target/AMDGPU/GCNSubtarget.h
+4-0llvm/test/CodeGen/AMDGPU/memory-legalizer-private-volatile.ll
+4-0llvm/test/CodeGen/AMDGPU/memory-legalizer-global-volatile.ll
+41-510 files not shown
+56-516 files

LLVM/project 53dfdf7clang/include/clang/Basic BuiltinsX86.td

[X86] BuiltinsX86.td - attempt to pack the builtins for each SSE level close together. NFC. (#168844)

Avoid some repeated feature blocks - we should have a single place in
each file that we can find most builtins for a particular ISA level.

Also, avoid some of the 80col wrapping that just makes it harder to find
anything at all.

There's a lot more we can do - but I don't want to completely refactor
this while we still have so much work to do for #30794
DeltaFile
+27-64clang/include/clang/Basic/BuiltinsX86.td
+27-641 files

LLVM/project 95d788cmlir/include/mlir/Pass Pass.h, mlir/lib/Pass Pass.cpp

Revert "[mlir][Pass] Fix crash when applying a pass to an optional interface" (#168847)

Reverts llvm/llvm-project#168499
DeltaFile
+0-10mlir/test/Pass/invalid-unsupported-operation.mlir
+0-8mlir/include/mlir/Pass/Pass.h
+3-3mlir/lib/Pass/Pass.cpp
+1-1mlir/test/Dialect/Transform/test-pass-application.mlir
+1-1mlir/test/Pass/pipeline-invalid.mlir
+5-235 files

LLVM/project 3396b46llvm/lib/Transforms/Vectorize LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize vplan-printing-reductions.ll

[LV] Allow partial reductions with an extended bin op (#165536)

A pattern of the form reduce.add(ext(mul)) is valid for a partial
reduction as long as the mul and its operands fulfill the requirements
of a normal partial reduction. The mul's extend operands will be
optimised to the wider extend, and we already have oneUse checks in
place to make sure the mul and operands can be modified safely.

1. -> https://github.com/llvm/llvm-project/pull/165536
2. https://github.com/llvm/llvm-project/pull/165543
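
A scalar sketch of why widening the mul's extends can be value-preserving, using i8 inputs multiplied in i16 and then extended to i32. The function names are hypothetical, and the exact legality checks in LoopVectorize.cpp are not shown here; the sketch only covers this one type combination, where the i8 x i8 product always fits in i16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// reduce.add(ext(mul)) form: extend to i16, multiply in i16, extend to i32.
std::int32_t extOfMul(std::int8_t a, std::int8_t b) {
  std::int16_t wideA = a, wideB = b;
  // The largest magnitude is (-128) * (-128) = 16384, which fits in i16.
  std::int16_t product = static_cast<std::int16_t>(wideA * wideB);
  return static_cast<std::int32_t>(product);
}

// Form after widening the extends: extend straight to i32, multiply in i32.
std::int32_t mulOfWiderExt(std::int8_t a, std::int8_t b) {
  return static_cast<std::int32_t>(a) * static_cast<std::int32_t>(b);
}

// Exhaustively compare both forms over every pair of i8 inputs.
bool formsAgreeForAllInputs() {
  for (int a = -128; a <= 127; ++a)
    for (int b = -128; b <= 127; ++b)
      if (extOfMul(static_cast<std::int8_t>(a), static_cast<std::int8_t>(b)) !=
          mulOfWiderExt(static_cast<std::int8_t>(a), static_cast<std::int8_t>(b)))
        return false;
  return true;
}

// The scalar reduction loop that contains the reduce.add(ext(mul)) pattern.
std::int32_t dotProduct(const std::int8_t *a, const std::int8_t *b,
                        std::size_t n) {
  std::int32_t sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += extOfMul(a[i], b[i]);
  return sum;
}
```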
DeltaFile
+509-0llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-neon.ll
+90-0llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
+85-0llvm/test/Transforms/LoopVectorize/AArch64/vplan-printing.ll
+0-80llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-constant-ops.ll
+18-3llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+702-835 files

LLVM/project 2cf550allvm/lib/CodeGen/AsmPrinter DwarfDebug.cpp, llvm/test/DebugInfo/MIR/X86 debug-loc-0.mir

[DebugInfo] Force early line-zero calls to have meaningful locations (#156850)

In functions that have been seriously deformed during optimisation,
there can be call instructions with line-zero immediately after frame
setup (see C reproducer in the test added). Our previous algorithms for
prologue_end ignored these, meaning someone entering a function at
prologue_end would break-in after a function call had completed. Prefer
instead to place prologue_end and the function scope-line on the line
zero call: this isn't false (it's the first meaningful instruction of the
function) and is approximately true. Given a less than ideal function,
this is an OK solution.
DeltaFile
+132-0llvm/test/DebugInfo/X86/no-prologue-end-after-line0-calls.mir
+26-0llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+1-1llvm/test/DebugInfo/MIR/X86/debug-loc-0.mir
+159-13 files

LLVM/project 74cebcellvm/include/llvm/IR IntrinsicsAMDGPU.td, llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPURegisterBankInfo.cpp

Revert "[AMDGPU] Add wave reduce intrinsics for float types - 2 (#161… (#168845)

…815)"

This reverts commit dcab4cb49bfb0aa17df3d3fabe582696100e0d35.
DeltaFile
+0-1,001llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll
+0-1,001llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll
+3-39llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+1-1llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+0-2llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+0-2llvm/lib/Target/AMDGPU/SIInstructions.td
+4-2,0466 files

LLVM/project e3e0d8cllvm/include/llvm/IR IntrinsicsAMDGPU.td, llvm/lib/Target/AMDGPU SIISelLowering.cpp SIInstructions.td

Revert "[AMDGPU] Add wave reduce intrinsics for float types - 2 (#161815)"

This reverts commit dcab4cb49bfb0aa17df3d3fabe582696100e0d35.
DeltaFile
+0-1,001llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll
+0-1,001llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll
+3-39llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+0-2llvm/lib/Target/AMDGPU/SIInstructions.td
+0-2llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+1-1llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+4-2,0466 files

LLVM/project f7e07e5clang/include/clang/Basic BuiltinsAMDGPU.def, clang/lib/CodeGen/TargetBuiltins AMDGPU.cpp

Review comments: remove the float overload.
DeltaFile
+48-48clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+12-8clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+4-4clang/include/clang/Basic/BuiltinsAMDGPU.def
+64-603 files

LLVM/project f051477clang/include/clang/Basic BuiltinsAMDGPU.def, clang/lib/CodeGen/TargetBuiltins AMDGPU.cpp

[AMDGPU] Add builtins for wave reduction intrinsics
DeltaFile
+84-0clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+8-0clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+4-0clang/include/clang/Basic/BuiltinsAMDGPU.def
+96-03 files

LLVM/project 54f69camlir/include/mlir/Pass Pass.h, mlir/lib/Pass Pass.cpp

[mlir][Pass] Fix crash when applying a pass to an optional interface (#168499)

Interfaces can be optional: whether an op implements an interface or not
can depend on the state of the operation.

```
  // An optional code block for adding additional "classof" logic. This can
  // be used to better enable "optional" interfaces, where an entity only
  // implements the interface if some dynamic characteristic holds.
  // `$_attr`/`$_op`/`$_type` may be used to refer to an instance of the
  // interface instance being checked.
  code extraClassOf = "";
```

The current `Pass::canScheduleOn(RegisteredOperationName)` is
insufficient. This commit adds an additional overload to inspect
`Operation *`.

This commit fixes a crash when scheduling an `InterfacePass` for an
optional interface on an operation that does not actually implement the
interface.
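
The gap between a name-based and an instance-based check can be shown with a toy model. Everything below (`ToyOp`, `ToyBufferInterface`, `canScheduleOnName`, `canScheduleOnOp`) is hypothetical and deliberately does not use the MLIR API; it only mirrors the idea that an optional interface depends on the state of the operation.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an operation: a registered name plus dynamic state.
struct ToyOp {
  std::string name;
  bool hasResultBuffer; // dynamic characteristic of this particular instance
};

// Toy "optional interface": the op kind may implement it, but an instance
// only does so when its state allows it (the role played by extraClassOf).
struct ToyBufferInterface {
  static bool mayImplement(const std::string &opName) {
    return opName == "toy.alloc";
  }
  static bool classof(const ToyOp &op) {
    return mayImplement(op.name) && op.hasResultBuffer;
  }
};

// Name-based check: can only answer "might this kind of op implement it?".
bool canScheduleOnName(const std::string &opName) {
  return ToyBufferInterface::mayImplement(opName);
}

// Instance-based check: answers "does this op implement it right now?".
bool canScheduleOnOp(const ToyOp &op) {
  return ToyBufferInterface::classof(op);
}
```

In this model a `toy.alloc` instance without a result buffer passes the name-based check but fails the instance-based one, which is the kind of mismatch the commit message describes.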
DeltaFile
+10-0mlir/test/Pass/invalid-unsupported-operation.mlir
+8-0mlir/include/mlir/Pass/Pass.h
+3-3mlir/lib/Pass/Pass.cpp
+1-1mlir/test/Pass/pipeline-invalid.mlir
+1-1mlir/test/Dialect/Transform/test-pass-application.mlir
+23-55 files

LLVM/project 9870ef1clang/include/clang/Basic BuiltinsAMDGPU.def, clang/lib/CodeGen/TargetBuiltins AMDGPU.cpp

Review comments: remove the float overload.
DeltaFile
+48-48clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+12-8clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+4-4clang/include/clang/Basic/BuiltinsAMDGPU.def
+64-603 files

LLVM/project 95b31cfclang/include/clang/Basic BuiltinsAMDGPU.def, clang/lib/CodeGen/TargetBuiltins AMDGPU.cpp

[AMDGPU] Add builtins for wave reduction intrinsics
DeltaFile
+84-0clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+8-0clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+4-0clang/include/clang/Basic/BuiltinsAMDGPU.def
+96-03 files

LLVM/project 131cf7dclang/lib/CodeGen CGExpr.cpp, clang/test/CodeGenCXX alloc-token.cpp

[AllocToken] Enable alloc token instrumentation for size-returning functions (#168840)

Consider a newly added "malloc_span" attribute in the allocation token
instrumentation to ensure that allocation functions with the
"malloc_span" attribute are processed similarly to other memory
allocation functions.

Update the tests to demonstrate applicability to __size_returning_new.
DeltaFile
+8-9clang/test/CodeGenCXX/alloc-token.cpp
+1-0clang/lib/CodeGen/CGExpr.cpp
+9-92 files

LLVM/project dc343d2flang/test/Lower/Intrinsics modulo.f90, flang/test/Lower/OpenMP/Todo omp-clause-indirect.f90 omp-declarative-allocate.f90

[NFC][flang] Replace use of flang -fc1 with %flang_fc1 in few test case (#168830)

Replace uses of `flang -fc1` with `%flang_fc1` in a few test cases.
DeltaFile
+2-2flang/test/Semantics/indirect02.f90
+1-1flang/test/Lower/Intrinsics/modulo.f90
+1-1flang/test/Lower/OpenMP/Todo/omp-clause-indirect.f90
+1-1flang/test/Lower/OpenMP/Todo/omp-declarative-allocate.f90
+1-1flang/test/Lower/OpenMP/Todo/omp-declare-reduction-initsub.f90
+1-1flang/test/Lower/OpenMP/Todo/omp-declare-reduction.f90
+7-78 files not shown
+15-1514 files

LLVM/project 2ae7caaclang/include/clang/Basic BuiltinsAMDGPU.def, clang/lib/CodeGen/TargetBuiltins AMDGPU.cpp

Review comments: remove the float overload.
DeltaFile
+48-48clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+12-8clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+4-4clang/include/clang/Basic/BuiltinsAMDGPU.def
+64-603 files

LLVM/project f6ed1d8clang/include/clang/Basic BuiltinsAMDGPU.def, clang/lib/CodeGen/TargetBuiltins AMDGPU.cpp

[AMDGPU] Add builtins for wave reduction intrinsics
DeltaFile
+84-0clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+8-0clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+4-0clang/include/clang/Basic/BuiltinsAMDGPU.def
+96-03 files

LLVM/project bdf598fllvm/lib/Target/ARC ARCISelLowering.cpp, llvm/lib/Target/CSKY CSKYISelLowering.cpp

CodeGen: Add missing subtarget to TargetLoweringBase constructor for ARC, CSKY and M68K (#168811)

Those were missing in https://github.com/llvm/llvm-project/pull/168620.
DeltaFile
+1-1llvm/lib/Target/CSKY/CSKYISelLowering.cpp
+1-1llvm/lib/Target/ARC/ARCISelLowering.cpp
+1-1llvm/lib/Target/M68k/M68kISelLowering.cpp
+3-33 files

LLVM/project 07a31adllvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 merge-consecutive-loads-128.ll merge-consecutive-loads-256.ll

[X86] EltsFromConsecutiveLoads - recognise reverse load patterns. (#168706)

See if we can create a vector load from the src elements in reverse and
then shuffle these back into place.

SLP will (usually) catch this in the middle-end, but there are a few
BUILD_VECTOR scalarizations etc. that appear during DAG legalization.

I did start looking at a more general permute fold, but I haven't found
any good test examples for this yet - happy to take another look if
somebody has examples.
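
The equivalence being exploited can be sketched on plain arrays: building a vector from scalars read in descending address order gives the same result as one contiguous load followed by a reversing shuffle. The helper names are hypothetical and this is a scalar model, not the DAG combine itself.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// BUILD_VECTOR-style form: element i comes from a scalar load of p[N-1-i].
template <std::size_t N>
std::array<int, N> buildFromReversedScalars(const int *p) {
  std::array<int, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = p[N - 1 - i];
  return result;
}

// Folded form: one contiguous "vector load" of p[0..N-1], then a shuffle
// that reverses the lanes to put them back into place.
template <std::size_t N>
std::array<int, N> loadThenReverse(const int *p) {
  std::array<int, N> loaded{};
  std::copy(p, p + N, loaded.begin());        // single wide load
  std::reverse(loaded.begin(), loaded.end()); // reverse shuffle
  return loaded;
}
```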
DeltaFile
+48-216llvm/test/CodeGen/X86/merge-consecutive-loads-128.ll
+75-154llvm/test/CodeGen/X86/merge-consecutive-loads-256.ll
+14-113llvm/test/CodeGen/X86/merge-consecutive-loads-512.ll
+8-12llvm/test/CodeGen/X86/bitcnt-big-integer.ll
+15-2llvm/lib/Target/X86/X86ISelLowering.cpp
+2-3llvm/test/CodeGen/X86/build-vector-256.ll
+162-5001 files not shown
+163-5027 files

LLVM/project 1a0a6b8clang/include/clang/Basic BuiltinsAMDGPU.def, clang/lib/CodeGen/TargetBuiltins AMDGPU.cpp

[AMDGPU] Add builtins for wave reduction intrinsics
DeltaFile
+84-0clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+8-0clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+4-0clang/include/clang/Basic/BuiltinsAMDGPU.def
+96-03 files

LLVM/project c5cf1b2clang/include/clang/Basic BuiltinsAMDGPU.def, clang/lib/CodeGen/TargetBuiltins AMDGPU.cpp

Review comments: remove the float overload.
DeltaFile
+48-48clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+12-8clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+4-4clang/include/clang/Basic/BuiltinsAMDGPU.def
+64-603 files

LLVM/project e44646bllvm/lib/Target/WebAssembly WebAssemblyISelLowering.cpp, llvm/test/CodeGen/WebAssembly simd-arith.ll simd-vecreduce-bool.ll

[WebAssembly] Lower ANY_EXTEND_VECTOR_INREG (#167529)

Treat it in the same manner as zero_extend_vector_inreg and generate an
extend_low_u if possible. This is to try to prevent expensive shuffles
from being generated instead. computeKnownBitsForTargetNode has also
been updated to specify known zeros on extend_low_u.
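
A scalar model of the i8x16 to i16x8 case shows why an unsigned low extend is a valid lowering and which bits become known zero. `extendLowU` and `kKnownZeroBits` are hypothetical names for illustration; an any-extend leaves the upper bits unspecified, so zero-filling them is one permitted choice.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of an unsigned low extend from i8x16 to i16x8: the low eight
// input lanes are zero-extended and the high eight input lanes are ignored.
std::array<std::uint16_t, 8> extendLowU(const std::array<std::uint8_t, 16> &v) {
  std::array<std::uint16_t, 8> out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = v[i]; // zero extension: upper 8 bits of each lane are 0
  return out;
}

// Bits of every output lane that are guaranteed to be zero in this model.
constexpr std::uint16_t kKnownZeroBits = 0xFF00;
```

Because the upper byte of each lane is known to be zero, a later mask such as `lane & 0x00FF` would not change the value.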
DeltaFile
+14-22llvm/test/CodeGen/WebAssembly/simd-arith.ll
+26-1llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
+2-2llvm/test/CodeGen/WebAssembly/simd-vecreduce-bool.ll
+42-253 files

LLVM/project dcab4cbllvm/include/llvm/IR IntrinsicsAMDGPU.td, llvm/lib/Target/AMDGPU SIISelLowering.cpp SIInstructions.td

[AMDGPU] Add wave reduce intrinsics for float types - 2 (#161815)

Supported Ops: `fadd`, `fsub`
DeltaFile
+1,001-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll
+1,001-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll
+39-3llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2-0llvm/lib/Target/AMDGPU/SIInstructions.td
+2-0llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+1-1llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+2,046-46 files