LLVM/project 351ae0cmlir/cmake/modules AddMLIR.cmake, mlir/lib/ExecutionEngine CMakeLists.txt

[MLIR][CMake] Fix runtime libraries with PCH (#182850)

Some MLIR libraries are intended to be dlopen-ed, but currently all MLIR
libraries link against LLVMSupport. After the recent PCH introduction,
this causes these libraries to implicitly use the LLVMSupport PCH, which
results in the definition of llvm::*ABIBreakingChecks, which results in
an ODR violation when loaded with dlopen.

Conceptually, libraries that are designed to be dlopen-ed should not
simply link against LLVM libraries in non-dylib builds for this reason.
(This apparently was a problem before with mlir_apfloat_wrappers.)

To fix builds, remove LLVMSupport from runtime libraries that don't need
it and, as a workaround, disable PCH for libraries that are in a weird
state (use LLVMSupport but happen to not export symbols currently).
DeltaFile
+55-3mlir/lib/ExecutionEngine/CMakeLists.txt
+7-3mlir/cmake/modules/AddMLIR.cmake
+2-0mlir/lib/ExecutionEngine/SparseTensor/CMakeLists.txt
+64-63 files

LLVM/project 51260bfmlir/test/Analysis/DataFlow test-liveness-analysis.mlir, mlir/test/lib/Analysis/DataFlow TestLivenessAnalysis.cpp

[mlir][Analysis] Print all blocks in `-test-liveness-analysis`
DeltaFile
+26-3mlir/test/Analysis/DataFlow/test-liveness-analysis.mlir
+10-7mlir/test/lib/Analysis/DataFlow/TestLivenessAnalysis.cpp
+36-102 files

LLVM/project 5aa1c38llvm/lib/Target/AMDGPU SIInstructions.td

[NFCI] Make all SI_KILL* convergent (#183100)

Add convergent property to SI_KILL*TERMINATOR. Now all SI_KILL* are
convergent. SI_KILL*TERMINATOR were already terminators so they could
not be sunk by machine-sink. Thus, this is probably an NFC.

Signed-off-by: John Lu <John.Lu at amd.com>
DeltaFile
+7-9llvm/lib/Target/AMDGPU/SIInstructions.td
+7-91 files

LLVM/project 8bf0b36llvm/lib/Transforms/InstCombine InstCombineLoadStoreAlloca.cpp, llvm/lib/Transforms/Utils Local.cpp

[InstCombine] Replace alloca with undef size with poison instead of null

When an alloca instruction has an undef (or poison) array size, InstCombine
was previously replacing all uses of the alloca with a null pointer. This
caused invalid IR when the alloca was used by @llvm.lifetime intrinsics.

According to the @llvm.lifetime intrinsic specification, the pointer
argument must be either:
  - A pointer to an alloca instruction, or
  - A poison value

Since null is neither an alloca pointer nor poison, the previous
transformation violated the intrinsic's requirements and produced
invalid IR.

Fix by replacing the alloca with a poison value instead of null, which
satisfies the @llvm.lifetime requirements and produces valid IR.
DeltaFile
+34-0llvm/test/Transforms/InstCombine/alloca-poison-size.ll
+0-30llvm/test/Transforms/InstCombine/invalid-alloca-poison-size.ll
+0-4llvm/lib/Transforms/Utils/Local.cpp
+1-1llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+35-354 files

LLVM/project dd37cd9llvm/lib/Analysis ValueTracking.cpp, llvm/test/Transforms/Attributor/AMDGPU nofpclass-amdgcn-trig-preop.ll

AMDGPU: llvm.amdgcn.trig.preop cannot return negative values

This returns a positive value less than 1.
DeltaFile
+2-2llvm/test/Transforms/Attributor/AMDGPU/nofpclass-amdgcn-trig-preop.ll
+2-1llvm/lib/Analysis/ValueTracking.cpp
+4-32 files

LLVM/project bf4705cllvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlan.h, llvm/test/Transforms/LoopVectorize predicated-single-exit.ll early_exit_legality.ll

[VPlan] Supported conditionally executed single early exits. (#182395)

Add support for a single early exit that is executed conditionally. This
requires making sure that the mask from any non-exiting control flow is
combined with the early exit condition.

To do so, introduce a MaskedCond VPInstruction, which is inserted as
user of the early-exit condition, at the point of the early-exit branch.
The VPInstruction will get masked automatically if needed by the
predicator, ensuring that we properly account for it when checking
whether the early exit has been taken.
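
The effect of masking the early-exit condition can be illustrated with a
small lane-wise model. This is only a sketch, not the VPlan implementation:
the function name `early_exit_taken` and the list-of-booleans representation
of vector lanes are assumptions made for illustration.

```python
def early_exit_taken(exit_cond, block_mask=None):
    """Return True if any active lane requests the early exit.

    exit_cond  -- per-lane early-exit condition
    block_mask -- per-lane mask of the block holding the exit branch,
                  or None when the block executes unconditionally
    """
    if block_mask is None:
        masked = list(exit_cond)
    else:
        # Lanes that never reach the exit branch must not trigger the exit.
        masked = [m and c for m, c in zip(block_mask, exit_cond)]
    return any(masked)
```

In this model, a lane whose exit condition is true but whose block mask is
false does not cause the exit to be taken, which is the situation the masked
condition is meant to account for.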

Note that this does not allow for instructions that require predication
after the early exit. This requires additional work in progress:
https://github.com/llvm/llvm-project/pull/172454

As an alternative to MaskedCond, we could also predicate before handling
early exiting blocks: https://github.com/llvm/llvm-project/pull/181830

PR: https://github.com/llvm/llvm-project/pull/182395
DeltaFile
+100-57llvm/test/Transforms/LoopVectorize/predicated-single-exit.ll
+1-37llvm/test/Transforms/LoopVectorize/early_exit_legality.ll
+26-3llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+5-1llvm/lib/Transforms/Vectorize/VPlan.h
+5-0llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+3-1llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+140-994 files not shown
+146-10110 files

LLVM/project ab360b1llvm/lib/Analysis TargetTransformInfo.cpp, llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp

[LLVM][TTI] Remove the isVScaleKnownToBeAPowerOfTwo hook. (#183292)

After https://github.com/llvm/llvm-project/pull/183080 this is no longer
a configurable property.

NOTE: No test changes expected beyond
llvm/test/Transforms/LoopVectorize/scalable-predication.ll which has
been removed because it only existed to verify the now unsupported
functionality.
DeltaFile
+0-114llvm/test/Transforms/LoopVectorize/scalable-predication.ll
+3-24llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+0-12llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+3-5llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+0-4llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+0-4llvm/lib/Analysis/TargetTransformInfo.cpp
+6-1637 files not shown
+6-17713 files

LLVM/project b7c056aclang-tools-extra/clang-tidy/modernize UseEqualsDeleteCheck.cpp, clang-tools-extra/docs ReleaseNotes.rst

[clang-tidy] Fix erroneous warning to make deleted function public (#182577)

This PR fixes #54276 and fixes #135249 by only matching private deleted
functions with a public overload or special member functions.
DeltaFile
+66-21clang-tools-extra/test/clang-tidy/checkers/modernize/use-equals-delete.cpp
+20-1clang-tools-extra/clang-tidy/modernize/UseEqualsDeleteCheck.cpp
+5-0clang-tools-extra/docs/ReleaseNotes.rst
+91-223 files

LLVM/project 4b25264clang/lib/CIR/CodeGen CIRGenBuiltinAArch64.cpp, clang/test/CIR/CodeGenBuiltins/AArch64 acle_sve_dup.c

[CIR][AArch64] Add lowering + tests for predicated SVE svdup_lane builtins

This PR adds CIR lowering + tests for SVE `svdup_lane` builtins on
AArch64. The corresponding ACLE intrinsics are documented at:
https://developer.arm.com/architectures/instruction-sets/intrinsics
DeltaFile
+157-0clang/test/CIR/CodeGenBuiltins/AArch64/acle_sve_dup.c
+20-3clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+177-32 files

LLVM/project 4c8ad96llvm/include/llvm/IR IntrinsicsSPIRV.td, llvm/lib/Target/SPIRV SPIRVInstructionSelector.cpp SPIRVModuleAnalysis.cpp

[SPIRV] Implement Gather and GatherCmp intrinsics (#182578)

This commit implements the intrinsics needed to represent the texture
Gather* instructions in HLSL.

Assisted-by: Gemini
DeltaFile
+118-0llvm/test/CodeGen/SPIRV/hlsl-resources/Gather.ll
+92-0llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
+18-0llvm/test/CodeGen/SPIRV/hlsl-resources/Gather-errors-1.ll
+18-0llvm/test/CodeGen/SPIRV/hlsl-resources/Gather-errors-2.ll
+12-0llvm/include/llvm/IR/IntrinsicsSPIRV.td
+7-1llvm/lib/Target/SPIRV/SPIRVModuleAnalysis.cpp
+265-16 files

LLVM/project 6a28a66clang/lib/CIR/CodeGen CIRGenAtomic.cpp, clang/test/CIR/CodeGen atomic.c

[CIR] Implement compare exchange with dynamic failure ordering (#183110)

In #156253, we implemented the rest of this feature, with compile time
constant failure ordering. This patch follows the incubator's direction
(with a little cleanup based on other cleanup that we do) to replace
this situation with a 'switch'.
DeltaFile
+260-4clang/test/CIR/CodeGen/atomic.c
+47-3clang/lib/CIR/CodeGen/CIRGenAtomic.cpp
+307-72 files

LLVM/project bfff4f6clang/lib/CodeGen/TargetBuiltins ARM.cpp, clang/lib/Sema SemaARM.cpp

fixup! Move code to `AArch64ExpandPseudoInsts` and `getTgtMemIntrinsic`

Move code to `AArch64ExpandPseudoInsts` and `getTgtMemIntrinsic`
and use tablegen pattern for intrinsic, plus other small review changes.
DeltaFile
+47-75llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+69-32llvm/test/CodeGen/AArch64/pcdphint-atomic-store.ll
+42-47clang/lib/Sema/SemaARM.cpp
+21-12llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
+10-12llvm/lib/Target/AArch64/AArch64InstrInfo.td
+17-5clang/lib/CodeGen/TargetBuiltins/ARM.cpp
+206-1835 files not shown
+220-19211 files

LLVM/project 3f66484mlir/include/mlir-c ExtensibleDialect.h, mlir/lib/Bindings/Python IRAttributes.cpp IRTypes.cpp

[MLIR][Python] Fix typeid support for DynamicType and DynamicAttr (#183076)

Previously, we were using the static `typeid` of `DynamicType` for
checks, which is incorrect. We should instead check against the `typeid`
of `DynamicTypeDefinition` (which is a subclass of `SelfOwningTypeID`),
and register it via `register_type_caster` so that Python-defined types
can use `maybe_downcast`. (The attribute part is the same.)
DeltaFile
+42-26mlir/lib/Bindings/Python/IRAttributes.cpp
+40-23mlir/lib/Bindings/Python/IRTypes.cpp
+32-0mlir/test/python/dialects/irdl.py
+8-6mlir/include/mlir-c/ExtensibleDialect.h
+10-0mlir/lib/CAPI/IR/ExtensibleDialect.cpp
+4-0mlir/python/mlir/dialects/ext.py
+136-553 files not shown
+139-599 files

LLVM/project c7d2031llvm/lib/Transforms/InstCombine InstCombineLoadStoreAlloca.cpp, llvm/lib/Transforms/Utils Local.cpp

[InstCombine] Replace alloca with poison size using poison instead of null

When an alloca instruction has an undef (poison) array size, InstCombine
was previously replacing all uses of the alloca with a null pointer. This
caused invalid IR when the alloca was used by @llvm.lifetime intrinsics.

According to the @llvm.lifetime intrinsic specification, the pointer
argument must be either:
  - A pointer to an alloca instruction, or
  - A poison value

Since null is neither an alloca pointer nor poison, the previous
transformation violated the intrinsic's requirements and produced
invalid IR.

Fix by replacing the alloca with a poison value instead of null, which
satisfies the @llvm.lifetime requirements and produces valid IR.
DeltaFile
+34-0llvm/test/Transforms/InstCombine/alloca-poison-size.ll
+0-30llvm/test/Transforms/InstCombine/invalid-alloca-poison-size.ll
+0-4llvm/lib/Transforms/Utils/Local.cpp
+1-1llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+35-354 files

LLVM/project 3e1545bllvm/include/llvm/Support ModRef.h, llvm/lib/Analysis AliasAnalysis.cpp

[LLVM] Refine MemoryEffect handling for target-specific intrinsics (#155590)

This patch improves memory alias analysis between calls if they change
inaccessible or target memory locations. The result is
computed by comparing each location's ModRefInfo between the calls.
DeltaFile
+119-0llvm/test/Transforms/EarlyCSE/target-memory.ll
+29-0llvm/lib/Analysis/AliasAnalysis.cpp
+14-2llvm/include/llvm/Support/ModRef.h
+162-23 files

LLVM/project c1b2477llvm/lib/Analysis LoopAccessAnalysis.cpp

[LAA] NFC: Rename mulSCEVOverflow to mulSCEVNoOverflow (#183096)

The function returns nullptr when the multiplication WOULD overflow,
matching the semantics of its sibling addSCEVNoOverflow. The old name
reads as if the function multiplies with overflow, which is the opposite
of what it does.
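
The naming convention can be illustrated with a plain-integer analogue. This
is a sketch only: the real helper operates on SCEV expressions, and the
function name `mul_no_overflow`, the 64-bit default width, and the use of
signed overflow are assumptions, not taken from the patch.

```python
def mul_no_overflow(a, b, bits=64):
    """Return a * b if it fits in a signed `bits`-bit integer, else None.

    The "NoOverflow" suffix signals that a non-None result is known not to
    have overflowed; None plays the role of nullptr in the C++ helper.
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    product = a * b
    return product if lo <= product <= hi else None
```

A caller would check for `None` before using the result, in the same way
callers of the renamed function check for nullptr.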
DeltaFile
+4-4llvm/lib/Analysis/LoopAccessAnalysis.cpp
+4-41 files

LLVM/project 87d9dadllvm/lib/Target/AArch64 AArch64ISelDAGToDAG.cpp AArch64InstrInfo.td, llvm/test/CodeGen/AArch64 arm64-vcvt.ll

[NFC] Address unresolved comments on #172837 (#183284)

DeltaFile
+5-5llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+0-1llvm/lib/Target/AArch64/AArch64InstrInfo.td
+0-1llvm/test/CodeGen/AArch64/arm64-vcvt.ll
+5-73 files

LLVM/project 1675efellvm/lib/Transforms/InstCombine InstCombineLoadStoreAlloca.cpp, llvm/lib/Transforms/Utils Local.cpp

[InstCombine] Replace alloca with poison size using poison instead of null

When an alloca instruction has an undef (poison) array size, InstCombine
was previously replacing all uses of the alloca with a null pointer. This
caused invalid IR when the alloca was used by @llvm.lifetime intrinsics.

According to the @llvm.lifetime intrinsic specification, the pointer
argument must be either:
  - A pointer to an alloca instruction, or
  - A poison value

Since null is neither an alloca pointer nor poison, the previous
transformation violated the intrinsic's requirements and produced
invalid IR.

Fix by replacing the alloca with a poison value instead of null, which
satisfies the @llvm.lifetime requirements and produces valid IR.
DeltaFile
+33-0llvm/test/Transforms/InstCombine/alloca-poision-size.ll
+33-0llvm/test/Transforms/InstCombine/alloca-poison-size.ll
+0-30llvm/test/Transforms/InstCombine/invalid-alloca-poison-size.ll
+0-4llvm/lib/Transforms/Utils/Local.cpp
+1-1llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+67-355 files

LLVM/project cf4b75fllvm/include/llvm/MC MCAsmBaseStreamer.h

Fix formatting
DeltaFile
+1-0llvm/include/llvm/MC/MCAsmBaseStreamer.h
+1-01 files

LLVM/project 76be8aellvm/include/llvm/MC MCAsmBaseStreamer.h, llvm/lib/Target/SystemZ/MCTargetDesc SystemZHLASMAsmStreamer.cpp SystemZHLASMAsmStreamer.h

Update based on review
DeltaFile
+0-6llvm/lib/Target/SystemZ/MCTargetDesc/SystemZHLASMAsmStreamer.cpp
+1-1llvm/include/llvm/MC/MCAsmBaseStreamer.h
+0-2llvm/lib/Target/SystemZ/MCTargetDesc/SystemZHLASMAsmStreamer.h
+1-93 files

LLVM/project ee34eb6mlir/test/Dialect/LLVMIR nvvm-mma-sp-ordered.mlir, mlir/test/Target/LLVMIR/nvvm mma-sparse-blockscale.mlir mma-blockscale.mlir

[MLIR][NVVM] Fix kFactor for fp8/fp6/fp4 types in MmaSpOp verifier. Improve mma tests. (#183133)

Fix an incorrect kFactor value for e4m3/e5m2, e3m2/e2m3, e2m1 types in
MmaSpOp::verify(). The kFactor for these types was set to 32 but should
be 16.

kFactor is used to compute the expected number of operand A/B register
fragments. With kFactor=32 (wrong) and the only allowed shape m16n8k64,
the fragment count was incorrect. With kFactor=16 (correct), it matches
the PTX ISA definition for mma.sp with fp8/fp6/fp4 A/B operands.

PTX ISA reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-sparse-mma

Also improve existing MLIR dialect tests for nvvm.mma.sp.sync and add
new mlir-translate tests covering mma, mma.sp, and blockscale variants.
DeltaFile
+842-0mlir/test/Target/LLVMIR/nvvm/mma-sparse-blockscale.mlir
+697-0mlir/test/Target/LLVMIR/nvvm/mma-blockscale.mlir
+380-0mlir/test/Target/LLVMIR/nvvm/mma-sp-ordered.mlir
+325-0mlir/test/Target/LLVMIR/nvvm/mma-sp.mlir
+191-0mlir/test/Target/LLVMIR/nvvm/mma-sp-kind.mlir
+65-81mlir/test/Dialect/LLVMIR/nvvm-mma-sp-ordered.mlir
+2,500-813 files not shown
+2,636-2229 files

LLVM/project 9e84bd4llvm/include/llvm/ADT GenericUniformityImpl.h, llvm/lib/Analysis UniformityAnalysis.cpp

use CallbackVH for deletion/RAUW
DeltaFile
+45-0llvm/lib/Analysis/UniformityAnalysis.cpp
+12-1llvm/include/llvm/ADT/GenericUniformityImpl.h
+57-12 files

LLVM/project 717a9abllvm/lib/Analysis InstructionSimplify.cpp, llvm/test/Transforms/InstSimplify structured-gep.ll

[InstSimplify] Add support for llvm.structured.gep (#182874)

Similar to GEP, the SGEP instruction with no indices can be simplified
by directly using the base pointer.
DeltaFile
+91-0llvm/test/Transforms/InstSimplify/structured-gep.ll
+2-0llvm/lib/Analysis/InstructionSimplify.cpp
+93-02 files

LLVM/project 90144c2llvm/lib/Target/WebAssembly WebAssemblyInstrSIMD.td, llvm/test/CodeGen/WebAssembly simd-extadd.ll

[WebAssembly] optimize ext + shuffle + add into addext (#182849)

cc https://github.com/llvm/llvm-project/issues/179143

This adds a second pattern: we already recognize "shuffle + extend +
add" as `addext`, this adds another pattern for "extend + shuffle +
add", which can come up when programs are optimized.
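
Why the two orderings compute the same value can be checked with a small
lane-wise model. This is a sketch under assumptions: it takes `addext` to
mean a pairwise extending add with sign extension, models lanes as lists of
unsigned bit patterns, and uses hypothetical function names throughout.

```python
def sext(v, bits):
    """Interpret the low `bits` bits of v as a two's-complement value."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def wrap(v, bits):
    """Wrap a value to an unsigned `bits`-bit lane."""
    return v & ((1 << bits) - 1)


def extadd_pairwise(v, bits=8):
    """Reference: result lane i is sext(v[2i]) + sext(v[2i+1])."""
    return [wrap(sext(v[i], bits) + sext(v[i + 1], bits), 2 * bits)
            for i in range(0, len(v), 2)]


def shuffle_extend_add(v, bits=8):
    """Already-recognized order: shuffle, then extend, then add."""
    evens, odds = v[0::2], v[1::2]
    return [wrap(sext(a, bits) + sext(b, bits), 2 * bits)
            for a, b in zip(evens, odds)]


def extend_shuffle_add(v, bits=8):
    """Newly recognized order: extend, then shuffle, then add."""
    wide = [sext(x, bits) for x in v]
    evens, odds = wide[0::2], wide[1::2]
    return [wrap(a + b, 2 * bits) for a, b in zip(evens, odds)]
```

Because extension acts on each lane independently, it commutes with the
even/odd shuffle, so both orderings agree with the pairwise reference.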
DeltaFile
+52-3llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td
+32-0llvm/test/CodeGen/WebAssembly/simd-extadd.ll
+84-32 files

LLVM/project 9d55f14llvm/lib/Target/SPIRV SPIRVCommandLine.cpp, llvm/test/CodeGen/SPIRV/extensions/SPV_KHR_float_controls2 disabled-on-amd.ll

[SPIRV][AMD] Reenable `SPV_KHR_float_controls2` for AMD flavored SPIRV (#182873)

`SPV_KHR_float_controls2` is enabled in the translator after
https://github.com/khronosgroup/spirv-llvm-translator/pull/3475.

This extension was disabled since we were not able to translate it back.

This patch reverts #169659.
DeltaFile
+0-23llvm/test/CodeGen/SPIRV/extensions/SPV_KHR_float_controls2/disabled-on-amd.ll
+0-6llvm/lib/Target/SPIRV/SPIRVCommandLine.cpp
+0-292 files

LLVM/project c48f60ellvm/lib/CodeGen ExpandIRInsts.cpp

Unify expandPow2Division/expandPow2Remainder into expandPow2DivRem.

Merge the two functions into one to share the common signed-path logic (freeze, bias, ashr) and reduce code duplication, as suggested.
DeltaFile
+66-85llvm/lib/CodeGen/ExpandIRInsts.cpp
+66-851 files

LLVM/project 0000728llvm/lib/CodeGen ExpandIRInsts.cpp

Fixed comments as requested.
DeltaFile
+2-1llvm/lib/CodeGen/ExpandIRInsts.cpp
+2-11 files

LLVM/project 92ef987llvm/lib/CodeGen ExpandIRInsts.cpp, llvm/test/CodeGen/AMDGPU div_v2i128.ll div_i128.ll

[CodeGen] Expand power-of-2 div/rem at IR level in ExpandIRInsts.

Previously, power-of-2 div/rem operations wider than
MaxLegalDivRemBitWidth were excluded from IR expansion and left for
backend peephole optimizations. Some backends can fail to process such
instructions when DAGCombiner is switched off.

Now ExpandIRInsts expands them into shift/mask sequences:
- udiv X, 2^C  ->  lshr X, C
- urem X, 2^C  ->  and X, (2^C - 1)
- sdiv X, 2^C  ->  bias adjustment + ashr X, C
- srem X, 2^C  ->  X - (((X + Bias) >> C) << C)

Special cases handled:
- Division/remainder by 1 or -1 (identity, negation, or zero)
- Exact division (sdiv exact skips bias, produces ashr exact)
- Negative power-of-2 divisors (result is negated)
- INT_MIN divisor (correct via countr_zero on bit pattern)
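
The four basic identities above can be checked exhaustively at a small bit
width. This is a sketch, not the code from ExpandIRInsts: the function names
are hypothetical, the branch-free bias computation is an assumption about
one common way to form the bias, and the special cases (divisors of 1 or -1,
exact division, negative and INT_MIN divisors) are not modeled.

```python
def sdiv_trunc(x, d):
    """Reference signed division that rounds toward zero."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q


def bias_for(x, c, bits):
    """Return 2^c - 1 for negative x and 0 otherwise, without branching."""
    sign = x >> (bits - 1)  # arithmetic shift: -1 or 0 for in-range x
    return (sign & ((1 << bits) - 1)) >> (bits - c)


def udiv_pow2(x, c):
    return x >> c  # lshr X, C (x is non-negative here)


def urem_pow2(x, c):
    return x & ((1 << c) - 1)  # and X, (2^C - 1)


def sdiv_pow2(x, c, bits):
    return (x + bias_for(x, c, bits)) >> c  # bias adjustment + ashr


def srem_pow2(x, c, bits):
    return x - (((x + bias_for(x, c, bits)) >> c) << c)


def check_all(bits=8):
    """Compare every expansion against plain division for all inputs."""
    for c in range(bits - 1):  # positive power-of-2 divisors only
        d = 1 << c
        for x in range(1 << bits):
            assert udiv_pow2(x, c) == x // d
            assert urem_pow2(x, c) == x % d
        for x in range(-(1 << (bits - 1)), 1 << (bits - 1)):
            q = sdiv_trunc(x, d)
            assert sdiv_pow2(x, c, bits) == q
            assert srem_pow2(x, c, bits) == x - q * d
    return True
```

The bias is what makes the arithmetic shift round toward zero for negative
dividends; without it, `ashr` alone would round toward negative infinity.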
DeltaFile
+69-1,283llvm/test/CodeGen/AMDGPU/div_v2i128.ll
+148-0llvm/test/Transforms/ExpandIRInsts/X86/sdiv129.ll
+55-93llvm/test/CodeGen/X86/div_i129_v_pow2k.ll
+125-9llvm/lib/CodeGen/ExpandIRInsts.cpp
+115-0llvm/test/Transforms/ExpandIRInsts/X86/srem129.ll
+20-49llvm/test/CodeGen/AMDGPU/div_i128.ll
+532-1,4344 files not shown
+645-1,46410 files

LLVM/project 4fb17f4llvm/test/CodeGen/AMDGPU div_i128.ll, llvm/test/Transforms/ExpandIRInsts/X86 divrem-pow2.ll sdiv129.ll

Addressed review comments:

- Added proofs for power-of-2 div/rem expansion in ExpandIRInsts at
  https://alive2.llvm.org/ce/z/Y-iWm-
- Tests updated as requested.

Also added CreateFreeze() where needed.
DeltaFile
+255-0llvm/test/Transforms/ExpandIRInsts/X86/divrem-pow2.ll
+0-148llvm/test/Transforms/ExpandIRInsts/X86/sdiv129.ll
+0-115llvm/test/Transforms/ExpandIRInsts/X86/srem129.ll
+0-51llvm/test/Transforms/ExpandIRInsts/X86/udiv129.ll
+24-11llvm/test/CodeGen/AMDGPU/div_i128.ll
+0-25llvm/test/Transforms/ExpandIRInsts/X86/urem129.ll
+279-3503 files not shown
+291-3539 files

LLVM/project 634e75fllvm/lib/CodeGen ExpandIRInsts.cpp

Fixed comments as requested.
DeltaFile
+2-1llvm/lib/CodeGen/ExpandIRInsts.cpp
+2-11 files