LLVM/project 95656f0 llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass-shufflevector.ll

InstCombine: Rudimentary support of shufflevector in SimplifyDemandedFPClass

This should look more like the computeKnownFPClass handling, with knowledge
of demanded vector elements.
Delta      File
+269 -0    llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-shufflevector.ll
+11 -0     llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+280 -0    2 files

LLVM/project 7050ca2 llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass-insertelement.ll

InstCombine: Basic insertelement support for SimplifyDemandedFPClass

Eventually this should pull up the known elements logic from
computeKnownFPClass.
Delta      File
+187 -0    llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-insertelement.ll
+10 -0     llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+197 -0    2 files

LLVM/project 984582e llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass.ll

InstCombine: Fix defining undef constant vector elts in SimplifyDemandedFPClass

Fold constants of known single class to the original constant instead of
a new constant. This avoids overdefining vector elements that were originally
undefined with the splat constant.
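
A toy sketch of the difference described above, assuming a hypothetical model in which a constant vector lane is either a value or undef (`Lane`, `ConstVec`, `foldToSplat`, `foldToOriginal`, and `countUndef` are illustrative names, not LLVM APIs; requires C++17):

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model: std::nullopt stands in for an undef lane.
using Lane = std::optional<double>;
using ConstVec = std::vector<Lane>;

// Folding to a fresh splat defines every lane, including formerly undef ones.
inline ConstVec foldToSplat(const ConstVec &orig, double v) {
  return ConstVec(orig.size(), Lane(v));
}

// Folding to the original constant leaves undef lanes untouched.
inline ConstVec foldToOriginal(const ConstVec &orig) { return orig; }

// Count the lanes that are still undef.
inline std::size_t countUndef(const ConstVec &c) {
  std::size_t n = 0;
  for (const Lane &l : c)
    if (!l)
      ++n;
  return n;
}

inline void demoUndefLanes() {
  ConstVec c = {Lane(0.0), std::nullopt, Lane(0.0)};
  assert(countUndef(foldToSplat(c, 0.0)) == 0);  // undef lane got defined
  assert(countUndef(foldToOriginal(c)) == 1);    // undef lane preserved
}
```

In this model the splat loses the freedom the undef lane gave later passes, which is the overdefinition the commit message refers to.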
Delta      File
+29 -0     llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+12 -2     llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+41 -2     2 files

LLVM/project 775251a llvm/include/llvm/CodeGen SelectionDAGISel.h, llvm/lib/CodeGen/SelectionDAG SelectionDAGISel.cpp

[SelectionDAG] Remove OPC_EmitStringInteger from isel. (#173936)

Instead emit this as an OPC_EmitInteger, but print the string
when the value is known to be 0..63 (when we don't need a VBR).
Also print the string into a comment when comments are not omitted
so it isn't lost when a VBR is needed.
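
To illustrate why 0..63 is the range that avoids a multi-byte VBR, here is a minimal sketch assuming a sign-rotated value stored in 7-bit groups with a continuation bit. That encoding is an assumption chosen to be consistent with the range quoted above; `signRotate`, `encodeVBR`, and `encodeInteger` are hypothetical names, not the TableGen emitter's functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed scheme: move the sign into bit 0 so small magnitudes stay small.
// INT64_MIN is not handled meaningfully by this sketch.
inline uint64_t signRotate(int64_t v) {
  if (v >= 0)
    return static_cast<uint64_t>(v) << 1;
  return ((0 - static_cast<uint64_t>(v)) << 1) | 1;
}

// 7 payload bits per byte; bit 7 set means "more bytes follow".
inline std::vector<uint8_t> encodeVBR(uint64_t v) {
  std::vector<uint8_t> out;
  while (v >= 128) {
    out.push_back(static_cast<uint8_t>((v & 127) | 128));
    v >>= 7;
  }
  out.push_back(static_cast<uint8_t>(v));
  return out;
}

inline std::vector<uint8_t> encodeInteger(int64_t v) {
  return encodeVBR(signRotate(v));
}

inline void demoVBR() {
  assert(encodeInteger(63).size() == 1); // 63 << 1 == 126 fits in one byte
  assert(encodeInteger(64).size() == 2); // 64 << 1 == 128 needs a second byte
}
```

Under this assumption a single-byte entry can be printed symbolically, while a multi-byte entry can only carry the name in a comment.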
Delta      File
+7 -22     llvm/utils/TableGen/DAGISelMatcherGen.cpp
+17 -11    llvm/utils/TableGen/DAGISelMatcherEmitter.cpp
+4 -13     llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
+8 -4      llvm/utils/TableGen/Common/DAGISelMatcher.h
+2 -2      llvm/test/TableGen/dag-isel-regclass-emit-enum.td
+0 -3      llvm/include/llvm/CodeGen/SelectionDAGISel.h
+38 -55    1 files not shown
+39 -56    7 files

LLVM/project 2541b18 llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 xor-with-zero-and-incompat.ll

[SLP]Mark and incompatible for 'xor %a, 0' operations

Xor with 0 is incompatible with `and`, which results in all zeros instead
of %a

https://alive2.llvm.org/ce/z/oEVETS

Fixes #174041
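
The arithmetic behind the incompatibility can be checked directly. This is a minimal sketch on 32-bit integers; `xorWith` and `andWith` are illustrative helpers, not SLPVectorizer code:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t xorWith(uint32_t a, uint32_t c) { return a ^ c; }
constexpr uint32_t andWith(uint32_t a, uint32_t c) { return a & c; }

// xor with 0 is the identity on %a ...
static_assert(xorWith(0x1234u, 0u) == 0x1234u, "xor %a, 0 == %a");
// ... but and with the same constant 0 yields all zeros.
static_assert(andWith(0x1234u, 0u) == 0u, "and %a, 0 == 0");
// The identity constant for and is all ones, not 0.
static_assert(andWith(0x1234u, ~0u) == 0x1234u, "and %a, -1 == %a");

inline void demoXorAnd(uint32_t a) {
  assert(xorWith(a, 0u) == a);
  assert(andWith(a, 0u) == 0u);
}
```

So an `xor %a, 0` cannot be rewritten as an `and` that keeps the constant 0 as its operand.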
Delta      File
+1 -1      llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+1 -1      llvm/test/Transforms/SLPVectorizer/X86/xor-with-zero-and-incompat.ll
+2 -2      2 files

LLVM/project 2c32613 mlir/include/mlir/Dialect/SparseTensor/IR SparseTensorOps.td, mlir/test/Dialect/SparseTensor sparse_out.mlir sparse_kernels.mlir

fix some tests
Delta      File
+17 -17    mlir/test/Dialect/SparseTensor/sparse_out.mlir
+8 -8      mlir/test/Dialect/SparseTensor/sparse_kernels.mlir
+4 -4      mlir/test/Transforms/remove-dead-values.mlir
+4 -3      mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td
+3 -3      mlir/test/Dialect/Vector/vector-warp-distribute.mlir
+36 -35    5 files

LLVM/project a53dbe2 mlir/lib/Dialect/MemRef/IR MemRefOps.cpp, mlir/test/Dialect/MemRef canonicalize.mlir

[mlir] Fold memref.cast static-to-dynamic to memref.expand_shape (#170037)

memref.expand_shape didn't have a folder for memref.cast operands. Added
a canonicalization pattern that folds a static-to-dynamic memref.cast
into memref.expand_shape.

Example:

```mlir
  %0 = memref.cast %arg0 : memref<8x4xf32> to memref<?x4xf32>
  %c0 = arith.constant 0 : index
  %dim0 = memref.dim %0, %c0 : memref<?x4xf32>
  %1 = memref.expand_shape %0 [[0, 1], [2]] output_shape [%dim0, 1, 4]  : memref<?x4xf32> into memref<?x1x4xf32>
```

is converted to:

```mlir
  %expand_shape = memref.expand_shape %arg0 [[0, 1], [2]] output_shape [8, 1, 4] : memref<8x4xf32> into memref<8x1x4xf32>
```

    [2 lines not shown]
Delta      File
+138 -0    mlir/test/Dialect/MemRef/canonicalize.mlir
+67 -1     mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
+205 -1    2 files

LLVM/project a980baa llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass-shufflevector.ll

InstCombine: Rudimentary support of shufflevector in SimplifyDemandedFPClass

This should look more like the computeKnownFPClass handling, with knowledge
of demanded vector elements.
Delta      File
+269 -0    llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-shufflevector.ll
+11 -0     llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+280 -0    2 files

LLVM/project e70906c mlir/include/mlir/Dialect/SparseTensor/IR SparseTensorOps.td, mlir/test/Dialect/Vector vector-warp-distribute.mlir

fix some tests
Delta      File
+4 -4      mlir/test/Transforms/remove-dead-values.mlir
+4 -3      mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td
+3 -3      mlir/test/Dialect/Vector/vector-warp-distribute.mlir
+11 -10    3 files

LLVM/project 447cade mlir/include/mlir/Dialect/SparseTensor/IR SparseTensorOps.td, mlir/test/Dialect/Vector vector-warp-distribute.mlir

fix some tests
Delta      File
+4 -3      mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td
+3 -3      mlir/test/Dialect/Vector/vector-warp-distribute.mlir
+7 -6      2 files

LLVM/project fd1d31d llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass-insertelement.ll

InstCombine: Basic insertelement support for SimplifyDemandedFPClass

Eventually this should pull up the known elements logic from
computeKnownFPClass.
Delta      File
+187 -0    llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-insertelement.ll
+11 -0     llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+198 -0    2 files

LLVM/project 3e93d9a llvm/test/tools/llvm-mca/AArch64 mcpu-help.test, llvm/tools/llvm-mca llvm-mca.cpp

Revert -mcpu fix (#174093)

Reverts #173399 and #174004.

#173399 moved MemoryBuffer::getFileOrSTDIN below the -mcpu validation to
fix the `-mcpu=help` flag, but on cross builds the first CPU is
rejected before the "file not found" diagnostic is printed, which broke
lit tests. #174004 introduced a host CPU fallback to fix the
cross-compilation issue, but this still fails on NVPTX builders.

This can be revisited when a fix is found that works with the NVPTX
builders.
Delta      File
+15 -46    llvm/tools/llvm-mca/llvm-mca.cpp
+0 -11     llvm/test/tools/llvm-mca/AArch64/mcpu-help.test
+15 -57    2 files

LLVM/project 3dbfd13 llvm/test/Transforms/SLPVectorizer/X86 xor-with-zero-and-incompat.ll

[SLP][NFC]Add a test with the incorrect xor to and transformation
Delta      File
+19 -0     llvm/test/Transforms/SLPVectorizer/X86/xor-with-zero-and-incompat.ll
+19 -0     1 files

LLVM/project bc128f3 clang/test/Headers __clang_hip_math.hip, llvm/lib/IR Instructions.cpp

drop reapplying ir change

This reverts commit e4a0e0a13593d1cc0f79900c5e61a1848a1a0ee8.
Delta      File
+88 -81    llvm/test/Transforms/DFAJumpThreading/dfa-unfold-select.ll
+22 -22    clang/test/Headers/__clang_hip_math.hip
+24 -17    llvm/lib/IR/Instructions.cpp
+18 -18    llvm/test/Transforms/LoopVectorize/single_early_exit_live_outs.ll
+17 -15    llvm/test/Transforms/DFAJumpThreading/dfa-jump-threading-transform.ll
+12 -12    llvm/test/Transforms/SimplifyCFG/UnreachableEliminate.ll
+181 -165  67 files not shown
+343 -316  73 files

LLVM/project a3fec0e llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass.ll

InstCombine: Fix defining undef constant vector elts in SimplifyDemandedFPClass

Fold constants of known single class to the original constant instead of
a new constant. This avoids overdefining vector elements that were originally
undefined with the splat constant.
Delta      File
+29 -0     llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+12 -2     llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+41 -2     2 files

LLVM/project 96df108 llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass-extractelement.ll

InstCombine: Handle extractelement in SimplifyDemandedFPClass (#174081)

A lot of boilerplate changes are necessary to do proper elementwise
tracking, as SimplifyDemandedBits does.
Delta      File
+120 -0    llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-extractelement.ll
+6 -0      llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+126 -0    2 files

LLVM/project ff2d758 llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass-exp.ll

InstCombine: Preserve flags when simplifying exp (#174078)

Delta      File
+21 -0     llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-exp.ll
+12 -7     llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+33 -7     2 files

LLVM/project 8031481 mlir/lib/Dialect/Arith/Transforms IntRangeOptimizations.cpp, mlir/test/Dialect/Arith int-range-narrowing.mlir

[mlir][int-range] `IntRangeNarrowingPass` was missing `SparseConstantPropagation` analysis (#174088)

This was causing it to skip nested scf ops in some cases (see the `scf.for`
test). Use the convenience function `loadBaselineAnalyses`.
Delta      File
+30 -0     mlir/test/Dialect/Arith/int-range-narrowing.mlir
+3 -3      mlir/lib/Dialect/Arith/Transforms/IntRangeOptimizations.cpp
+33 -3     2 files

LLVM/project 22f2ae1 llvm/lib/Target/AMDGPU AMDGPUTargetMachine.cpp, llvm/test/CodeGen/AMDGPU llc-pipeline-npm.ll

[AMDGPU][NPM] Complete fast regalloc pipeline
Delta      File
+38 -0     llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+1 -1      llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+39 -1     2 files

LLVM/project da0a853 mlir/include/mlir/Interfaces ControlFlowInterfaces.td ControlFlowInterfaces.h, mlir/lib/Dialect/SCF/IR SCF.cpp

[mlir][draft] Consolidate patterns into RegionBranchOpInterface patterns
Delta      File
+195 -727  mlir/lib/Dialect/SCF/IR/SCF.cpp
+39 -0     mlir/lib/Interfaces/ControlFlowInterfaces.cpp
+5 -7      mlir/test/Dialect/SCF/canonicalize.mlir
+9 -0      mlir/include/mlir/Interfaces/ControlFlowInterfaces.td
+2 -0      mlir/include/mlir/Interfaces/ControlFlowInterfaces.h
+250 -734  5 files

LLVM/project 28a5690 clang/lib/Headers amxavx512intrin.h, clang/test/CodeGen/X86 amxavx512-builtins.c

[X86][AMX-AVX512] Add *i intrinsics for immediate variants (#173545)

The immediate variants use the low 6 bits as the row index, while the
register variants use the low 16 bits. We cannot select the immediate
variants using the same intrinsic, so add new intrinsics for them.
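
A minimal sketch of the width difference described above, showing only the masking. `rowIndexImm` and `rowIndexReg` are hypothetical helpers, not the intrinsics added by this commit, and how the hardware treats out-of-range rows is not modeled:

```cpp
#include <cassert>
#include <cstdint>

// Immediate form: only the low 6 bits of the operand select the row.
constexpr uint32_t rowIndexImm(uint32_t operand) { return operand & 0x3Fu; }

// Register form: the low 16 bits of the operand select the row.
constexpr uint32_t rowIndexReg(uint32_t operand) { return operand & 0xFFFFu; }

// The same operand value can name different rows in the two forms.
static_assert(rowIndexImm(70u) == 6u, "70 & 0x3F == 6");
static_assert(rowIndexReg(70u) == 70u, "70 & 0xFFFF == 70");

inline void demoRowIndex() {
  assert(rowIndexImm(0x1234u) == 0x34u);
  assert(rowIndexReg(0x1234u) == 0x1234u);
}
```

Because the two forms interpret the same operand differently, a single intrinsic cannot safely lower to either encoding.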
Delta      File
+214 -0    clang/lib/Headers/amxavx512intrin.h
+36 -0     clang/test/CodeGen/X86/amxavx512-builtins.c
+9 -9      llvm/lib/Target/X86/X86InstrAMX.td
+18 -0     llvm/include/llvm/IR/IntrinsicsX86.td
+16 -0     llvm/lib/Target/X86/X86InstrOperands.td
+15 -0     llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
+308 -9    6 files not shown
+353 -15   12 files

LLVM/project 5c2a811 clang/docs ReleaseNotes.rst, clang/lib/Sema SemaDecl.cpp

[clang] Preserve the initializer when variable declaration deduction fails (#173546)

Fix https://github.com/clangd/clangd/issues/2572.
Delta      File
+8 -1      clang/lib/Sema/SemaDecl.cpp
+8 -0      clang/test/AST/ast-dump-recovery.cpp
+1 -0      clang/docs/ReleaseNotes.rst
+17 -1     3 files

LLVM/project d660ef7 llvm/lib/DTLTO CMakeLists.txt

[DTLTO] Add missing link dependencies on BinaryFormat and Object
Delta      File
+2 -0      llvm/lib/DTLTO/CMakeLists.txt
+2 -0      1 files

LLVM/project 2d60f87 llvm/lib/Transforms/Vectorize LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize/X86 multi-exit-cost.ll

[VPlan] Only use legacy cost for instructions only used by exit conds. (#174029)

Currently we need to precompute costs for exit conditions to match the
legacy cost, because they will be replaced by a compare against the
canonical IV (or others, like active-lane-mask or EVL-based) and the
original compare will be removed.

This is not true for instructions with users other than the exit
condition. Those will remain, and we can just use the VPlan-based cost
model to get more accurate results.

This improves results in some cases, like
@test_value_in_exit_compare_chain_used_outside because the IV increment
user outside the loop is replaced by computing the final value outside
the loop.

It also fixes a crash introduced by f196b1d66ff (#146525).

PR: https://github.com/llvm/llvm-project/pull/174029
Delta      File
+99 -7     llvm/test/Transforms/LoopVectorize/X86/multi-exit-cost.ll
+2 -3      llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+101 -10   2 files

LLVM/project d05c07d llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass.ll

InstCombine: Compute fp class when simplifying with multiple uses (#174086)

Delta      File
+19 -2     llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+3 -1      llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+22 -3     2 files

LLVM/project b646dda llvm/lib/CodeGen LiveIntervals.cpp

[CodeGen][NPM] dump slot index info with -debug while running LiveIntervals
Delta      File
+4 -2      llvm/lib/CodeGen/LiveIntervals.cpp
+4 -2      1 files

LLVM/project 4fcf759 llvm/lib/Target/AMDGPU AMDGPUTargetMachine.cpp, llvm/test/CodeGen/AMDGPU llc-pipeline-npm.ll

[AMDGPU][NPM] Enable "AMDGPURewriteAGPRCopyMFMAPass"
Delta      File
+2 -2      llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+2 -0      llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+4 -2      2 files

LLVM/project 5f23c1b llvm/include/llvm/Passes CodeGenPassBuilder.h, llvm/test/CodeGen/AMDGPU llc-pipeline-npm.ll

[CodeGen][NPM] Add "PhysicalRegisterUsageAnalysis" once
Delta      File
+3 -3      llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+1 -4      llvm/include/llvm/Passes/CodeGenPassBuilder.h
+4 -7      2 files

LLVM/project c3db7f3 llvm/lib/Target/AMDGPU AMDGPUTargetMachine.cpp

[AMDGPU][NPM] Obey "enable-amdgpu-aa" option
Delta      File
+2 -1      llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+2 -1      1 files

LLVM/project 001bd7d llvm/lib/CodeGen BranchFolding.cpp BranchRelaxation.cpp, llvm/lib/Target/AMDGPU SIPreEmitPeephole.cpp

[CodeGen][NPM] Update dominator tree and post dominator tree consistently
Delta      File
+11 -2     llvm/lib/Target/AMDGPU/SIPreEmitPeephole.cpp
+11 -1     llvm/lib/CodeGen/BranchFolding.cpp
+10 -1     llvm/lib/CodeGen/BranchRelaxation.cpp
+7 -4      llvm/lib/CodeGen/MachineBlockPlacement.cpp
+39 -8     4 files