LLVM/project 5d7975fmlir/test/python/dialects transform_interpreter.py

Update transform_interpreter.py
DeltaFile
+1-0mlir/test/python/dialects/transform_interpreter.py
+1-01 files

LLVM/project df63a02mlir/test/python/dialects transform_interpreter.py

Update transform_interpreter.py
DeltaFile
+8-1mlir/test/python/dialects/transform_interpreter.py
+8-11 files

LLVM/project 728c4b5llvm/test/CodeGen/AMDGPU fmul-to-ldexp.ll llvm.log10.ll

[AMDGPU] si-peephole-sdwa: Handle V_PACK_B32_F16_e64 (WIP)

Change si-peephole-sdwa to eliminate V_PACK_B32_F16_e64 instructions
by changing the second operand to write to the upper word of the
destination directly.
DeltaFile
+126-140llvm/test/CodeGen/AMDGPU/fmul-to-ldexp.ll
+138-98llvm/test/CodeGen/AMDGPU/llvm.log10.ll
+138-98llvm/test/CodeGen/AMDGPU/llvm.log.ll
+92-104llvm/test/CodeGen/AMDGPU/fpow.ll
+68-127llvm/test/CodeGen/AMDGPU/llvm.log2.ll
+74-118llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll
+636-68529 files not shown
+1,251-1,34835 files
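
As a rough illustration of the packing that the si-peephole-sdwa entry above targets, the Python sketch below models a pack of two f16 values as placing the first operand in the low 16 bits and the second operand in the high 16 bits of a 32-bit result. It then compares that with writing the second value into the upper word of the destination directly. The helper names are hypothetical and the model ignores everything else the SDWA form does; it is a sketch of the bit layout, not the pass's implementation.

```python
import struct


def f16_bits(x: float) -> int:
    """Return the IEEE half-precision bit pattern of x as an integer."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def pack_b32_f16(lo: float, hi: float) -> int:
    """Model of a pack: first operand in bits 0..15, second in bits 16..31."""
    return f16_bits(lo) | (f16_bits(hi) << 16)


def write_upper_word(dst: int, hi: float) -> int:
    """Model of writing hi into the upper word of dst, keeping the lower word."""
    return (dst & 0xFFFF) | (f16_bits(hi) << 16)


# Both routes produce the same 32-bit value, which is why the separate
# pack step can be dropped in this simplified model.
packed = pack_b32_f16(1.0, 2.0)
direct = write_upper_word(f16_bits(1.0), 2.0)
assert packed == direct
print(hex(packed))
```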

LLVM/project 74a9e06llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.cos.f16.ll llvm.sin.f16.ll

[AMDGPU] Enable ISD::{FSIN,FCOS} custom lowering to work on v2f16

Currently ISD::FSIN and ISD::FCOS of type MVT::v2f16 are legalized by
first expanding and then using a custom lowering on the resulting f16
instructions. This ordering prevents using packed math variants of the
instructions introduced by the legalization (e.g. the multiplication),
if available, and makes it difficult to eliminate the packing of the
results by using SDWA form; previous attempts to deal with the latter
situation in the si-peephole-sdwa pass were unwieldy since it was
necessary to reconstruct the association between the source and target
vectors.

Change the legalization action for ISD::FSIN and ISD::FCOS of type
MVT::v2f16 to Custom and change the custom intrinsic lowering to deal
with the v2f16 for the intrinsics introduced in this way.
DeltaFile
+27-38llvm/test/CodeGen/AMDGPU/llvm.cos.f16.ll
+27-38llvm/test/CodeGen/AMDGPU/llvm.sin.f16.ll
+34-3llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+88-793 files

LLVM/project 516fcd8llvm/lib/Target/AMDGPU SIISelLowering.cpp

[AMDGPU] SIISelLowering: Use intrinsics in LowerTrig

This allows further legalization actions to be applied to the
resulting nodes, which is a preparatory step for extending the
custom lowering to vector types.
DeltaFile
+12-9llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+12-91 files

LLVM/project e90dd53clang/include/clang/Analysis/Analyses/LifetimeSafety FactsGenerator.h Loans.h, clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp

std_move false positive
DeltaFile
+23-0clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+18-0clang/test/Sema/warn-lifetime-safety.cpp
+5-0clang/include/clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h
+2-0clang/include/clang/Analysis/Analyses/LifetimeSafety/Loans.h
+48-04 files

LLVM/project 9a2d3abmlir/lib/Dialect/XeGPU/IR XeGPUDialect.cpp, mlir/lib/Dialect/XeGPU/Transforms XeGPUWgToSgDistribute.cpp

[MLIR][XeGPU] Add support for cross-subgroup reduction from wg to sg (#170936)

This PR adds support for cross-sg reduction whilst distributing from
workgroup to subgroup. It has the following limitations:
1. Cannot reduce to a scalar.
2. For cross-sg reduction, only 1:1 decomposition (each sg is assigned only
one tile of the original WG tile) is supported for now. For example, for
a WG tile of size 256x128, sg_layout = [8, 4] with sg_data = [16, 16] is
not supported.
DeltaFile
+353-34mlir/lib/Dialect/XeGPU/Transforms/XeGPUWgToSgDistribute.cpp
+183-0mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-unify-ops.mlir
+0-19mlir/test/Dialect/XeGPU/invalid.mlir
+6-11mlir/lib/Dialect/XeGPU/IR/XeGPUDialect.cpp
+3-1mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-unify-ops-rr.mlir
+545-655 files
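
The 1:1 decomposition limit in the XeGPU entry above can be checked with simple arithmetic. The sketch below is one reading of the commit message, not code from MLIR: it assumes a layout is 1:1 when sg_layout[i] * sg_data[i] equals the WG tile size in every dimension, and it assumes the tile size is evenly divisible otherwise. Both function names are hypothetical.

```python
def tiles_per_subgroup(wg_tile, sg_layout, sg_data):
    """Count how many sg_data-sized tiles each subgroup would own."""
    count = 1
    for tile, num_sg, data in zip(wg_tile, sg_layout, sg_data):
        covered = num_sg * data  # extent covered by one pass over all subgroups
        count *= tile // covered
    return count


def is_one_to_one(wg_tile, sg_layout, sg_data):
    """True when every subgroup is assigned exactly one tile."""
    return tiles_per_subgroup(wg_tile, sg_layout, sg_data) == 1


# The example from the commit message: 8 * 16 = 128 covers only half of 256,
# and 4 * 16 = 64 covers only half of 128, so each subgroup gets 4 tiles.
print(tiles_per_subgroup([256, 128], [8, 4], [16, 16]))
print(is_one_to_one([256, 128], [8, 4], [16, 16]))

# An assumed 1:1 variant for comparison: 8 * 32 = 256 and 4 * 32 = 128.
print(is_one_to_one([256, 128], [8, 4], [32, 32]))
```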

LLVM/project 1056e32llvm/test/Transforms/LoopVectorize multiple-early-exits.ll unsupported_early_exit.ll

[LV] Precommit additional early-exit tests from #174864.

Pre-commit tests from https://github.com/llvm/llvm-project/pull/174864.
DeltaFile
+150-88llvm/test/Transforms/LoopVectorize/multiple-early-exits.ll
+222-0llvm/test/Transforms/LoopVectorize/unsupported_early_exit.ll
+39-0llvm/test/Transforms/LoopVectorize/early_exit_legality.ll
+33-0llvm/test/Transforms/LoopVectorize/uncountable-early-exit-vplan.ll
+444-884 files

LLVM/project b86c84cflang/lib/Lower Bridge.cpp

[flang] Handle unused variable (NFC) (#176274)

DeltaFile
+1-2flang/lib/Lower/Bridge.cpp
+1-21 files

LLVM/project 554d6ae.github/workflows release-tasks.yml

[github] Fix release parameter to uncomment download links step (#176386)

I thought I could remove validate-tag from the "needs" because
release-binaries also "needs" validate-tag. It turns out that we get the
release version from an output of validate-tag, and if it isn't in the
"needs" section we get an empty string when substitution happens,
leading to this error:

./llvm/utils/release/./github-upload-release.py --token "$GITHUB_TOKEN"
--release uncomment_download_links
github-upload-release.py: error: the following arguments are required:
command

Put back validate-tag.

Fixes 822a45f4b4909289f84d119f1e5891b486d74f5e.
DeltaFile
+1-0.github/workflows/release-tasks.yml
+1-01 files
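
The failure described in the release-tasks.yml entry above can be reproduced in isolation. The sketch below uses a stand-in argparse parser whose shape is inferred from the error output (a `--token` option, a `--release` option, and a required positional `command`); it is an assumption, not the real github-upload-release.py parser, and the version string `22.1.0` is only a placeholder. When the release version expands to nothing, `--release` consumes the command name as its value and the positional is reported as missing.

```python
import argparse


def build_parser():
    """Stand-in parser; the real script's options may differ."""
    parser = argparse.ArgumentParser(prog="upload-release-sketch")
    parser.add_argument("--token")
    parser.add_argument("--release")
    parser.add_argument("command")
    return parser


def command_line(release):
    """Mimic an unquoted substitution: an empty value contributes no word."""
    argv = ["--token", "dummy", "--release"]
    if release:
        argv.append(release)
    argv.append("uncomment_download_links")
    return argv


def parse_or_none(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None


# With a version present, the command is parsed as intended.
ok = parse_or_none(command_line("22.1.0"))
print(ok.release, ok.command)

# With an empty version, argparse reports the missing positional and exits.
print(parse_or_none(command_line("")))
```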

LLVM/project 052fb00clang/include/clang/Options Options.td, clang/test/Driver cl-options.c

[clang] Expose -fmodules-disable-diagnostic-validation as clang-cl option (#176285)

DeltaFile
+1-1clang/include/clang/Options/Options.td
+1-0clang/test/Driver/cl-options.c
+2-12 files

LLVM/project f97f53emlir/include/mlir/Dialect/Tosa/IR TosaOps.td, mlir/lib/Dialect/Tosa/IR TosaOps.cpp

[mlir][tosa] Add support for CONV2D_BLOCK_SCALED operator (#172294)

This commit adds support for an MXFP CONV2D operation,
CONV2D_BLOCK_SCALED, added to the specification in
https://github.com/arm/tosa-specification/commit/408a5e53f5a7357adef7121ba3cc88e2225d4231.

This includes:
- Operator definition
- Addition of the EXT_MXFP_CONV extension
- Verification logic for the operator
- Output shape inference for the operator
- Validation checks to ensure compliance with the TOSA specification.
DeltaFile
+315-79mlir/lib/Dialect/Tosa/IR/TosaOps.cpp
+132-0mlir/test/Dialect/Tosa/verifier.mlir
+71-0mlir/test/Dialect/Tosa/level_check.mlir
+49-2mlir/lib/Dialect/Tosa/Transforms/TosaValidation.cpp
+48-0mlir/test/Dialect/Tosa/tosa-infer-shapes.mlir
+37-0mlir/include/mlir/Dialect/Tosa/IR/TosaOps.td
+652-8111 files not shown
+766-9317 files

LLVM/project 6856ddallvm/include/llvm/Analysis DependenceAnalysis.h, llvm/lib/Analysis DependenceAnalysis.cpp

[DA] Apply monotonicity check for Strong SIV
DeltaFile
+83-55llvm/lib/Analysis/DependenceAnalysis.cpp
+27-35llvm/test/Analysis/DependenceAnalysis/SymbolicSIV.ll
+18-18llvm/test/Analysis/DependenceAnalysis/SymbolicRDIV.ll
+13-19llvm/test/Analysis/DependenceAnalysis/StrongSIV.ll
+9-13llvm/test/Analysis/DependenceAnalysis/WeakCrossingSIV.ll
+14-4llvm/include/llvm/Analysis/DependenceAnalysis.h
+164-14412 files not shown
+197-20318 files

LLVM/project d358d7bllvm/include/llvm/Analysis DependenceAnalysis.h, llvm/lib/Analysis DependenceAnalysis.cpp

[DA] Move some monotonicity declarations to header file (NFC)
DeltaFile
+35-111llvm/lib/Analysis/DependenceAnalysis.cpp
+80-0llvm/include/llvm/Analysis/DependenceAnalysis.h
+115-1112 files

LLVM/project 9587892llvm/include/llvm/TargetParser Triple.h, llvm/lib/TargetParser Triple.cpp

[Triple] Add "chipstar" OS components (#170655)

This new component lets the Clang driver select the HIPSPV toolchain.
DeltaFile
+5-0llvm/unittests/TargetParser/TripleTest.cpp
+3-0llvm/lib/TargetParser/Triple.cpp
+2-1llvm/include/llvm/TargetParser/Triple.h
+10-13 files

LLVM/project 4afcc4bclang/include/clang/Options Options.td, clang/lib/Driver/ToolChains Clang.cpp

Add `-Xoffload-compiler` option (#170467)

... to forward input to clang-linker-wrapper's device compiler
invocation.

(a separate patch as requested in #168043)
DeltaFile
+10-5clang/lib/Driver/ToolChains/Clang.cpp
+6-0clang/test/Driver/openmp-offload-gpu.c
+4-0clang/include/clang/Options/Options.td
+20-53 files

LLVM/project 83ffe1emlir/test/python/dialects transform_interpreter.py

[MLIR][Python] add builtin module transform test
DeltaFile
+19-0mlir/test/python/dialects/transform_interpreter.py
+19-01 files

LLVM/project 5f697b3llvm/lib/Target/AArch64 AArch64ISelLowering.cpp

AArch64: Use getLibcallImplCallingConv more consistently (#176377)

This was querying the calling conv from the Libcall instead of
the LibcallImpl.
DeltaFile
+2-1llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+2-11 files

LLVM/project 4c0f295llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeHelper.cpp AMDGPUGlobalISelUtils.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel fpext.ll unmerge-sgpr-s16.ll

AMDGPU/GlobalISel: Regbanklegalize rules for G_UNMERGE_VALUES

Move G_UNMERGE_VALUES handling to AMDGPURegBankLegalizeRules.cpp.
Fix sgpr S16 unmerge by lowering it with a shift and using S32.
Previously, sgpr S16 unmerge was selected using the _lo16 and _hi16 subreg
indexes, which are exclusive to vgpr register classes.
For the remaining cases we do a trivial mapping that assigns the same
reg bank to all operands, vgpr or sgpr.
DeltaFile
+47-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+13-27llvm/test/CodeGen/AMDGPU/GlobalISel/fpext.ll
+36-0llvm/test/CodeGen/AMDGPU/GlobalISel/unmerge-sgpr-s16.ll
+18-0llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
+10-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+6-1llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.h
+130-282 files not shown
+137-318 files
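
The shift-based lowering mentioned in the G_UNMERGE_VALUES entry above comes down to ordinary integer arithmetic on a 32-bit value. A minimal sketch, with a hypothetical function name, of splitting a 32-bit value into two 16-bit halves with a mask and a shift rather than sub-register access. It illustrates the arithmetic only and is not the code in AMDGPURegBankLegalizeHelper.cpp.

```python
def unmerge_s32_to_s16(value):
    """Split a 32-bit value into (low, high) 16-bit halves."""
    value &= 0xFFFFFFFF   # treat the input as an unsigned 32-bit value
    lo = value & 0xFFFF   # low half: mask off the upper 16 bits
    hi = value >> 16      # high half: logical shift right by 16
    return lo, hi


lo, hi = unmerge_s32_to_s16(0x12345678)
print(hex(lo), hex(hi))
```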

LLVM/project bd17d55llvm/lib/Target/AArch64 AArch64SelectionDAGInfo.cpp

AArch64: Avoid getLibcallName when emitting special mem libcalls (#176376)

Get the symbol through the RTLIB::LibcallImpl enum.
DeltaFile
+6-3llvm/lib/Target/AArch64/AArch64SelectionDAGInfo.cpp
+6-31 files

LLVM/project 832b091llvm/test/TableGen directive2.td directive1.td, llvm/utils/TableGen/Basic DirectiveEmitter.cpp

Restore comment with a period at the end
DeltaFile
+2-0llvm/test/TableGen/directive2.td
+2-0llvm/utils/TableGen/Basic/DirectiveEmitter.cpp
+2-0llvm/test/TableGen/directive1.td
+6-03 files

LLVM/project 129ccf9llvm/include/llvm/Transforms/Utils LowerMemIntrinsics.h, llvm/lib/Transforms/Utils LowerMemIntrinsics.cpp

Add an overload of `expandMemSetAsLoop` that takes an optional TTI pointer

This avoids breaking the API for out-of-tree tools like the
SPIRV-LLVM-Translator.
DeltaFile
+24-12llvm/lib/Transforms/Utils/LowerMemIntrinsics.cpp
+7-0llvm/include/llvm/Transforms/Utils/LowerMemIntrinsics.h
+31-122 files

LLVM/project 14349bcllvm/lib/Target/AVR AVRISelLowering.cpp

AVR: Avoid getLibcallName (#176375)

Create the symbol through RTLIB::LibcallImpl
DeltaFile
+8-4llvm/lib/Target/AVR/AVRISelLowering.cpp
+8-41 files

LLVM/project 47689d2llvm/utils/release github-upload-release.py

[llvm][utils][release] Remove mention of sub-project source archives (#176348)

These are no longer provided as of llvm 22:
https://discourse.llvm.org/t/llvm-22-1-0-rc1-released/89479

> Please note: since the last release the subproject tarballs have been
> removed and are no longer provided. See RFC: Do "something" with the
> subproject tarballs in the release page for more details.

There are now only llvm-project and llvm-test-suite archives.
DeltaFile
+1-1llvm/utils/release/github-upload-release.py
+1-11 files

LLVM/project 9e6b658llvm/lib/Target/Hexagon HexagonSelectionDAGInfo.cpp

Hexagon: Avoid using getLibcallName for special memcpy (#176374)

Create the symbol through the RTLIB::LibcallImpl enum.
DeltaFile
+7-4llvm/lib/Target/Hexagon/HexagonSelectionDAGInfo.cpp
+7-41 files

LLVM/project 907b6c6llvm/test/CodeGen/AMDGPU fmul-to-ldexp.ll llvm.log10.ll

[AMDGPU] si-peephole-sdwa: Handle V_PACK_B32_F16_e64 (WIP)

Change si-peephole-sdwa to eliminate V_PACK_B32_F16_e64 instructions
by changing the second operand to write to the upper word of the
destination directly.
DeltaFile
+126-140llvm/test/CodeGen/AMDGPU/fmul-to-ldexp.ll
+138-98llvm/test/CodeGen/AMDGPU/llvm.log10.ll
+138-98llvm/test/CodeGen/AMDGPU/llvm.log.ll
+92-104llvm/test/CodeGen/AMDGPU/fpow.ll
+68-127llvm/test/CodeGen/AMDGPU/llvm.log2.ll
+74-118llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll
+636-68529 files not shown
+1,251-1,34835 files

LLVM/project a536850llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.cos.f16.ll llvm.sin.f16.ll

[AMDGPU] Enable ISD::{FSIN,FCOS} custom lowering to work on v2f16

Currently ISD::FSIN and ISD::FCOS of type MVT::v2f16 are legalized by
first expanding and then using a custom lowering on the resulting f16
instructions. This ordering prevents using packed math variants of the
instructions introduced by the legalization (e.g. the multiplication),
if available, and makes it difficult to eliminate the packing of the
results by using SDWA form; previous attempts to deal with the latter
situation in the si-peephole-sdwa pass were unwieldy since it was
necessary to reconstruct the association between the source and target
vectors.

Change the legalization action for ISD::FSIN and ISD::FCOS of type
MVT::v2f16 to Custom and change the custom intrinsic lowering to deal
with the v2f16 for the intrinsics introduced in this way.
DeltaFile
+27-38llvm/test/CodeGen/AMDGPU/llvm.cos.f16.ll
+27-38llvm/test/CodeGen/AMDGPU/llvm.sin.f16.ll
+34-3llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+88-793 files

LLVM/project 2eb709bllvm/lib/Target/AMDGPU AMDGPULowerVGPREncoding.cpp, llvm/test/CodeGen/AMDGPU whole-wave-functions.ll vgpr-set-msb-coissue.mir

[AMDGPU] Fix typo in `LowerVGPREncoding` to allow it to hoist past `waitcnt` instructions (#176355)

Fixes a typo which prevented `set_vgpr_msb` from being hoisted past
`waitcnt` instructions.
DeltaFile
+3-3llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
+2-2llvm/test/CodeGen/AMDGPU/whole-wave-functions.ll
+1-1llvm/test/CodeGen/AMDGPU/vgpr-set-msb-coissue.mir
+6-63 files

LLVM/project 135744cllvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass.ll

InstCombine: Consider nsz when simplifying fabs/fneg uses (#176156)

Later this trick should also be applied in the single use
case.
DeltaFile
+11-4llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+3-3llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+14-72 files

LLVM/project 9d43694llvm/lib/CodeGen AtomicExpandPass.cpp, llvm/test/Transforms/AtomicExpand/AMDGPU expand-atomic-f64-system.ll expand-atomic-f32-agent.ll

AtomicExpand: Use LibcallLoweringInfo analysis
DeltaFile
+36-8llvm/lib/CodeGen/AtomicExpandPass.cpp
+8-8llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-f64-system.ll
+8-8llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-f32-agent.ll
+8-8llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-f32-system.ll
+8-8llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-f64-agent.ll
+8-8llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-i32-agent.ll
+76-4855 files not shown
+220-18561 files