LLVM/project 89503bd llvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp GCNSchedStrategy.cpp, llvm/test/CodeGen/AMDGPU coexec-sched-effective-stall.mir

[AMDGPU] Add structural stall heuristic to scheduling strategies (#169617)

Implements a structural stall heuristic that considers both resource
hazards and latency constraints when selecting instructions. In coexec,
this changes the pending queue from a binary “not ready to issue”
distinction into part of a unified candidate comparison. Pending
instructions still identify structural stalls in the current cycle, but
they are now evaluated directly against available instructions by stall
cost, making the heuristics both more intuitive and more expressive.

- Add getStructuralStallCycles() to GCNSchedStrategy that computes the
  number of cycles an instruction must wait due to:
  - Resource conflicts on unbuffered resources (from the SchedModel)
  - Sequence-dependent hazards (from GCNHazardRecognizer)

- Add getHazardWaitStates() to GCNHazardRecognizer that returns the
  number of wait states until all hazards for an instruction are resolved,
  providing cycle-accurate hazard information for scheduling heuristics.
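
The idea of comparing candidates by stall cost can be illustrated with a toy model. This is only a sketch and not the code from the commit: `ToyInstr`, `ToyMachineState`, `resourceStallCycles`, `structuralStallCycles` and `pickLowestStall` are hypothetical names, and combining the two stall sources with a maximum is an assumption made here for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for illustration only; these are not
// the LLVM types touched by the commit.
struct ToyInstr {
  unsigned Resource;         // Index of the unbuffered resource it needs.
  unsigned HazardWaitStates; // Wait states until its hazards are resolved.
};

struct ToyMachineState {
  unsigned CurrCycle;
  std::vector<unsigned> ResourceFreeAt; // Cycle at which each resource frees up.
};

// Cycles the instruction must wait for its resource (0 if already free).
unsigned resourceStallCycles(const ToyMachineState &S, const ToyInstr &I) {
  unsigned FreeAt = S.ResourceFreeAt.at(I.Resource);
  return FreeAt > S.CurrCycle ? FreeAt - S.CurrCycle : 0;
}

// Assumption: both stall sources elapse concurrently, so the combined stall
// is taken as their maximum rather than their sum.
unsigned structuralStallCycles(const ToyMachineState &S, const ToyInstr &I) {
  return std::max(resourceStallCycles(S, I), I.HazardWaitStates);
}

// Unified comparison: return the index of the candidate with the smallest
// stall. Ties keep the earlier candidate. Cands must be non-empty.
std::size_t pickLowestStall(const ToyMachineState &S,
                            const std::vector<ToyInstr> &Cands) {
  assert(!Cands.empty() && "need at least one candidate");
  std::size_t Best = 0;
  for (std::size_t Idx = 1; Idx < Cands.size(); ++Idx)
    if (structuralStallCycles(S, Cands[Idx]) <
        structuralStallCycles(S, Cands[Best]))
      Best = Idx;
  return Best;
}
```

In this model a pending instruction is simply a candidate with a non-zero stall, so it is ranked against available instructions by the same measure instead of being excluded outright.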
Delta    File
+41 -3   llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+38 -0   llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+3 -4    llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+6 -0    llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
+4 -0    llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+4 -0    llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+96 -7   1 files not shown
+98 -7   7 files

LLVM/project 412aaeb clang/include/clang/CIR/Dialect/IR CIROps.td, clang/test/CIR/Transforms bit.cir

[CIR] Add Involution trait to BitReverseOp and ByteSwapOp (#187862)

bitreverse(bitreverse(x)) == x and byte_swap(byte_swap(x)) == x, so both
operations are mathematical involutions.

This adds the MLIR Involution trait to these CIR operations. The trait
encodes this property and automatically folds away the outer application
when an op's input is produced by the same op type.
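
The involution property and the resulting fold can be illustrated outside of MLIR. The following is a minimal sketch, not the CIR or MLIR implementation: `byteSwap32`, `bitReverse32`, `Kind`, `Node` and `foldInvolution` are hypothetical names chosen for this example, and the fold is modelled on a toy expression node rather than on real operations.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the byte order of a 32-bit value.
constexpr std::uint32_t byteSwap32(std::uint32_t X) {
  return ((X & 0x000000FFu) << 24) | ((X & 0x0000FF00u) << 8) |
         ((X >> 8) & 0x0000FF00u) | (X >> 24);
}

// Reverse the bit order of a 32-bit value.
constexpr std::uint32_t bitReverse32(std::uint32_t X) {
  std::uint32_t R = 0;
  for (int I = 0; I < 32; ++I) {
    R = (R << 1) | (X & 1u);
    X >>= 1;
  }
  return R;
}

// Applying either function twice yields the original value.
static_assert(byteSwap32(byteSwap32(0x12345678u)) == 0x12345678u,
              "byte swap is an involution");
static_assert(bitReverse32(bitReverse32(0x12345678u)) == 0x12345678u,
              "bit reverse is an involution");

// Toy expression node (hypothetical, for illustration only).
enum class Kind { Input, BitReverse, ByteSwap };
struct Node {
  Kind K;
  const Node *Operand; // nullptr for Kind::Input.
};

// If N is op(op(x)) with the same involutive op applied twice, return x;
// otherwise return N unchanged.
const Node *foldInvolution(const Node *N) {
  assert(N && "node must not be null");
  if (N->K != Kind::Input && N->Operand && N->Operand->K == N->K)
    return N->Operand->Operand;
  return N;
}
```

The fold only fires when the same kind of op is applied twice in a row; a bit reverse of a byte swap is left untouched.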
Delta    File
+20 -0   clang/test/CIR/Transforms/bit.cir
+4 -0    clang/include/clang/CIR/Dialect/IR/CIROps.td
+24 -0   2 files

LLVM/project f545b56 llvm/lib/Transforms/Scalar JumpThreading.cpp, llvm/test/Transforms/JumpThreading update-bpi-bfi-unfold-select.ll

[JT] `tryToUnfoldSelectInCurrBB` should update BFI & BPI if present
Delta    File
+58 -0   llvm/test/Transforms/JumpThreading/update-bpi-bfi-unfold-select.ll
+32 -0   llvm/lib/Transforms/Scalar/JumpThreading.cpp
+90 -0   2 files

LLVM/project c9b9079 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp

Don't use RPOT, per review feedback
Delta    File
+3 -11   llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+3 -11   1 files

LLVM/project 0bb1ca9 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp

Reduction store needs to be processed on scalar VPlan
Delta    File
+6 -5    llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+6 -5    1 files

LLVM/project 69e0839 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPRecipeBuilder.h

Avoid exposing `RecipeBuilder.getVPBuilder()`
Delta    File
+1 -2    llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+0 -2    llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+1 -0    llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+2 -4    3 files

LLVM/project 359cdd7 llvm/lib/Transforms/Vectorize LoopVectorize.cpp

Drop leftover comment
Delta    File
+0 -1    llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+0 -1    1 files

LLVM/project 1e5beb8 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp LoopVectorize.cpp

Move another `Legal` use to `VPRecipeBuilder::replaceWithFinalIfReductionStore`
Delta     File
+10 -21   llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+21 -1    llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+8 -0     llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+1 -3     llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+40 -25   4 files

LLVM/project 1278cd1 llvm/lib/Transforms/Vectorize VPRecipeBuilder.h LoopVectorize.cpp

Fold one `Legal` use into `tryToWidenHistogram` renamed to `widenIfHistogram`
Delta     File
+6 -6     llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+10 -2    llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+3 -5     llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+19 -13   3 files

LLVM/project 33a3682 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp

Braces for outer `if`
Delta    File
+2 -1    llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+2 -1    1 files

LLVM/project b744712 llvm/lib/Transforms/Vectorize LoopVectorize.cpp VPlanTransforms.cpp

Move to VPlanTransforms, have to pass Legal explicitly
Delta     File
+1 -78    llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+76 -0    llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+3 -1     llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+80 -79   3 files

LLVM/project 1dc4df1 llvm/lib/Transforms/Vectorize LoopVectorize.cpp

Don't make unnecessary captures
Delta    File
+1 -1    llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+1 -1    1 files

LLVM/project a736983 llvm/lib/Transforms/Vectorize LoopVectorize.cpp VPRecipeBuilder.h, llvm/test/Transforms/LoopVectorize/AArch64 predication_costs.ll

[NFCI][VPlan] Split initial mem-widening into a separate transformation

Preparation change before implementing stride-multiversioning as a
VPlan-based transformation. Might help
https://github.com/llvm/llvm-project/pull/147297/ as well.
Delta      File
+92 -30    llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+14 -12    llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+3 -2      llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
+4 -0      llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+1 -0      llvm/test/Transforms/LoopVectorize/VPlan/vplan-print-after-all.ll
+114 -44   5 files

LLVM/project fdc6203 mlir/include/mlir/Dialect/XeGPU/Transforms Transforms.h, mlir/lib/Dialect/XeGPU/Transforms XeGPUSubgroupDistribute.cpp XeGPUSgToWiDistributeExperimental.cpp

fix xegpu
Delta     File
+2 -17    mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp
+2 -16    mlir/lib/Dialect/XeGPU/Transforms/XeGPUSgToWiDistributeExperimental.cpp
+15 -0    mlir/include/mlir/Dialect/XeGPU/Transforms/Transforms.h
+19 -33   3 files

LLVM/project e672e88 llvm/lib/Target/SPIRV SPIRVBuiltins.cpp SPIRVBuiltins.td, llvm/test/CodeGen/SPIRV/transcoding BuildNDRange.ll BuildNDRange_2.ll

[SPIRV] Fix OpBuildNDRange (#186153)

- Fix buildNDRange according to the OpenCL and SPIRV specs.
- Fix the TableGen SPIRV builtins for the ndrange_* functions: contrary
  to the OpenCL spec, the actual call has an additional first argument
  (the structure return), so the minimum and maximum argument counts are
  changed accordingly.
- Update the test, add checks, and combine it with BuildNDRange_2.
Delta       File
+126 -50    llvm/lib/Target/SPIRV/SPIRVBuiltins.cpp
+119 -11    llvm/test/CodeGen/SPIRV/transcoding/BuildNDRange.ll
+0 -83      llvm/test/CodeGen/SPIRV/transcoding/BuildNDRange_2.ll
+41 -0      llvm/test/CodeGen/SPIRV/transcoding/BuildNDRange_no_sret.ll
+3 -3       llvm/lib/Target/SPIRV/SPIRVBuiltins.td
+289 -147   5 files

LLVM/project d2e7041 llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel llvm.amdgcn.live.mask.ll regbankselect-amdgcn.live.mask.mir

AMDGPU/GlobalISel: RegBankLegalize rules for live_mask (#187833)
Delta    File
+19 -0   llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.live.mask.ll
+1 -2    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.live.mask.mir
+2 -0    llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+22 -2   3 files

LLVM/project dc4073f utils/bazel/llvm-project-overlay/mlir BUILD.bazel

[Bazel] Fixes 239ca11 (#188083)

This fixes 239ca11a55b40ce12b21bc47e45cb4065d1cc3d4.
Delta    File
+3 -0    utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+3 -0    1 files

LLVM/project 9ba03ab flang/lib/Optimizer/HLFIR/Transforms CMakeLists.txt, flang/lib/Optimizer/Transforms CMakeLists.txt

fix build errors
Delta    File
+11 -0   flang/lib/Optimizer/Transforms/CMakeLists.txt
+10 -0   flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt
+21 -0   2 files

LLVM/project 8330903 . benchmark_build.sh

measure memory too
Delta     File
+44 -20   benchmark_build.sh
+44 -20   1 files

LLVM/project 1fbc7ed . benchmark_build.sh

use time
Delta     File
+29 -10   benchmark_build.sh
+29 -10   1 files

LLVM/project 87076dd clang/lib/ScalableStaticAnalysisFramework/Core CMakeLists.txt

fix AnalysisRegistry
Delta    File
+5 -0    clang/lib/ScalableStaticAnalysisFramework/Core/CMakeLists.txt
+5 -0    1 files

LLVM/project 136a7f8 . benchmark_build.sh

normalize script
Delta     File
+10 -81   benchmark_build.sh
+10 -81   1 files

LLVM/project 8d5ca18 . benchmark_build.sh

measure memory too
Delta     File
+13 -39   benchmark_build.sh
+13 -39   1 files

LLVM/project f51539a clang-tools-extra/clangd/fuzzer CMakeLists.txt

LLVMFuzzerTestOneInput
Delta    File
+3 -0    clang-tools-extra/clangd/fuzzer/CMakeLists.txt
+3 -0    1 files

LLVM/project e024afe llvm/lib/Support CMakeLists.txt

fix support on windows
Delta    File
+9 -0    llvm/lib/Support/CMakeLists.txt
+9 -0    1 files

LLVM/project 25de750 .ci monolithic-linux.sh monolithic-windows.sh

patch monolithic
Delta     File
+46 -44   .ci/monolithic-linux.sh
+14 -12   .ci/monolithic-windows.sh
+60 -56   2 files

LLVM/project 0ec630b clang/lib/CIR/CodeGen CMakeLists.txt, flang/lib/Evaluate CMakeLists.txt

fix debug for all targets
Delta      File
+19 -0     clang/lib/CIR/CodeGen/CMakeLists.txt
+1 -15     mlir/lib/Dialect/Affine/Transforms/DecomposeAffineOps.cpp
+14 -0     flang/lib/Evaluate/CMakeLists.txt
+8 -0      mlir/lib/Dialect/XeGPU/IR/CMakeLists.txt
+7 -0      flang/lib/Optimizer/CodeGen/CMakeLists.txt
+6 -0      mlir/examples/toy/Ch3/CMakeLists.txt
+55 -15    11 files not shown
+100 -15   17 files

LLVM/project 416f0c0 . benchmark_build.sh

Change iteration count from 10 to 2
Delta    File
+1 -1    benchmark_build.sh
+1 -1    1 files

LLVM/project bcf06f1 llvm/lib/CodeGen CMakeLists.txt, llvm/lib/Target/RISCV CMakeLists.txt

add comments
Delta     File
+6 -20    llvm/lib/Transforms/Vectorize/CMakeLists.txt
+6 -0     llvm/lib/CodeGen/CMakeLists.txt
+3 -0     llvm/lib/Target/WebAssembly/CMakeLists.txt
+3 -0     llvm/lib/Target/X86/CMakeLists.txt
+3 -0     llvm/lib/Target/RISCV/CMakeLists.txt
+3 -0     mlir/test/lib/Dialect/Affine/CMakeLists.txt
+24 -20   50 files not shown
+82 -20   56 files

LLVM/project 4f0d852 . benchmark_build.sh

add benchmark script
Delta    File
+99 -0   benchmark_build.sh
+99 -0   1 files