[AMDGPU] Add structural stall heuristic to scheduling strategies (#169617)
Implements a structural stall heuristic that considers both resource
hazards and latency constraints when selecting instructions. In coexec,
this changes the pending queue from a binary “not ready to issue”
distinction into part of a unified candidate comparison. Pending
instructions still identify structural stalls in the current cycle, but
they are now evaluated directly against available instructions by stall
cost, making the heuristics both more intuitive and more expressive.
- Add getStructuralStallCycles() to GCNSchedStrategy that computes the
  number of cycles an instruction must wait due to:
  - Resource conflicts on unbuffered resources (from the SchedModel)
  - Sequence-dependent hazards (from GCNHazardRecognizer)
- Add getHazardWaitStates() to GCNHazardRecognizer that returns the
  number of wait states until all hazards for an instruction are
  resolved, providing cycle-accurate hazard information for scheduling
  heuristics.
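As a rough illustration of the idea (not the code from this patch), the stall cost of a candidate can be modelled as the longer of its two waits, and pending and available candidates can then be ranked by that one number. The `Candidate` struct, `structuralStallCycles`, `stallsLess`, and the use of a max to combine the two sources are all assumptions made for this sketch; they do not mirror the real GCNSchedStrategy or GCNHazardRecognizer interfaces.

```cpp
#include <algorithm>

// Hypothetical stand-in for a scheduling candidate; not an LLVM type.
struct Candidate {
  unsigned ResourceStallCycles; // cycles until a needed unbuffered resource is free
  unsigned HazardWaitStates;    // wait states until all hazards are resolved
  bool IsPending;               // true if the candidate sits in the pending queue
};

// Assumption: both waits elapse concurrently, so the candidate can issue
// once the longer of the two has passed.
constexpr unsigned structuralStallCycles(const Candidate &C) {
  return std::max(C.ResourceStallCycles, C.HazardWaitStates);
}

// Unified comparison: returns true if A stalls strictly less than B,
// regardless of which queue either candidate came from.
constexpr bool stallsLess(const Candidate &A, const Candidate &B) {
  return structuralStallCycles(A) < structuralStallCycles(B);
}

// A pending instruction with a 1-cycle stall beats an available one that
// would still wait 4 cycles on a hazard.
static_assert(stallsLess(Candidate{1, 0, true}, Candidate{0, 4, false}),
              "lower stall cost should win");
```

In this model the pending flag no longer decides the comparison on its own; it only records where the candidate came from.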
[CIR] Add Involution trait to BitReverseOp and ByteSwapOp (#187862)
bitreverse(bitreverse(x)) == x and byte_swap(byte_swap(x)) == x are
mathematical involutions.
This adds the MLIR Involution trait to these CIR operations. The trait
encodes this property and automatically folds away the outer application
when an op's input is produced by the same op type.
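The involution property that the fold relies on can be checked with a small standalone sketch. The helpers `byteSwap32` and `bitReverse32` are hypothetical 32-bit implementations written for this example; they are not the CIR ops or any LLVM API.

```cpp
#include <cstdint>

// Reverse the order of the four bytes in a 32-bit value.
constexpr std::uint32_t byteSwap32(std::uint32_t X) {
  return (X >> 24) | ((X >> 8) & 0x0000FF00u) |
         ((X << 8) & 0x00FF0000u) | (X << 24);
}

// Reverse all 32 bits: swap adjacent bits, then bit pairs, then nibbles,
// and finally reverse the byte order.
constexpr std::uint32_t bitReverse32(std::uint32_t X) {
  X = ((X >> 1) & 0x55555555u) | ((X & 0x55555555u) << 1);
  X = ((X >> 2) & 0x33333333u) | ((X & 0x33333333u) << 2);
  X = ((X >> 4) & 0x0F0F0F0Fu) | ((X & 0x0F0F0F0Fu) << 4);
  return byteSwap32(X);
}

// Applying either function twice returns the original value, which is
// why the outer application can be folded away.
static_assert(byteSwap32(byteSwap32(0x11223344u)) == 0x11223344u,
              "byte swap is an involution");
static_assert(bitReverse32(bitReverse32(0xDEADBEEFu)) == 0xDEADBEEFu,
              "bit reverse is an involution");
```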
[NFCI][VPlan] Split initial mem-widening into a separate transformation
Preparation change before implementing stride-multiversioning as a
VPlan-based transformation. Might help
https://github.com/llvm/llvm-project/pull/147297/ as well.
[SPIRV] Fix OpBuildNDRange (#186153)
- Fix buildNDRange according to OpenCL and SPIRV specs.
- Fix tablegen SPIRV builtins for ndrange_* functions: contrary to the
OpenCL spec, the real call has an additional first argument (the
structure return), so the min and max argument counts are changed
accordingly.
- Update the test, add checks, and combine it with BuildNDRange_2.