[Flang-rt] Remove COMPILE_ONLY from flang-rt CMake file. (#169534)
COMPILE_ONLY was introduced in CMake 3.27.0. We cannot use this feature
because LLVM's minimum supported CMake version is 3.20.0.
[AMDGPU] Add structural stall heuristic to scheduling strategies
Implements a structural stall heuristic that considers both resource
hazards and latency constraints when selecting instructions from the
pending queue.
- Add getStructuralStallCycles() to GCNSchedStrategy that computes the
  number of cycles an instruction must wait due to:
  - Resource conflicts on unbuffered resources (from the SchedModel)
  - Sequence-dependent hazards (from GCNHazardRecognizer)
- Add getHazardWaitStates() to GCNHazardRecognizer that returns the number
  of wait states until all hazards for an instruction are resolved,
  providing cycle-accurate hazard information for scheduling heuristics.
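
The idea behind the heuristic can be sketched outside of LLVM as a small
Python model. This is an illustration only, not the code in the patch: the
Instr class, the resource names, and the helper functions are hypothetical,
and two assumptions are made that the commit message does not state: that
the resource stall and the hazard wait are combined by taking the larger of
the two, and that one wait state is counted as one cycle.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical stand-in for a scheduling unit; not an LLVM type."""
    name: str
    # Names of the unbuffered resources this instruction occupies.
    resources: tuple = ()
    # Wait states still required by sequence-dependent hazards.
    hazard_wait_states: int = 0


def resource_stall_cycles(instr, current_cycle, next_free_cycle):
    """Cycles until every unbuffered resource used by instr is free."""
    stall = 0
    for res in instr.resources:
        free_at = next_free_cycle.get(res, 0)
        stall = max(stall, free_at - current_cycle)
    return stall


def structural_stall_cycles(instr, current_cycle, next_free_cycle):
    """Assumed combination: wait for the slower of the two constraints."""
    return max(
        resource_stall_cycles(instr, current_cycle, next_free_cycle),
        instr.hazard_wait_states,
    )


def pick_from_pending(pending, current_cycle, next_free_cycle):
    """Pick the pending instruction with the fewest structural stall cycles."""
    return min(
        pending,
        key=lambda i: structural_stall_cycles(i, current_cycle, next_free_cycle),
    )
```

For example, at cycle 10 with a resource that frees up at cycle 12, an
instruction needing that resource stalls for 2 cycles even if its hazards
resolve after 1 wait state, while an instruction with a free resource but 4
outstanding wait states stalls for 4.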
py-test-run-parallel: updated to 0.8.0
0.8.0
Update license information format in pyproject.toml
Fix typos discovered by codespell
Add a parallel_threads_limit mark
Detect gc.collect and mark tests as thread unsafe
Test on 3.15 and 3.15t on CI
Deprecate the parallel_threads marker when n>1
Raise helpful error when forever is combined with 0 selected tests
[AMDGPU] Add scaffolding for ML focused scheduling strategy
This patch introduces scaffolding for a new machine instruction
scheduling strategy optimized for machine learning workloads.
Enable the ML scheduler automatically when functions have the
"amdgpu-workload-type"="ml" attribute.
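
The attribute-driven selection described above can be sketched as follows.
This is a hypothetical Python model, not the LLVM implementation: the
function name, the dictionary of attributes, and the returned strategy
labels are all assumptions made for illustration. Only the attribute name
and its "ml" value come from the commit message.

```python
def select_sched_strategy(fn_attrs):
    """Return a hypothetical strategy label for a function's attributes."""
    # The ML strategy is chosen only when the attribute is present and
    # set to "ml"; every other function keeps the default strategy.
    if fn_attrs.get("amdgpu-workload-type") == "ml":
        return "ml"
    return "default"
```

Functions without the attribute, or with a different value, fall through
to the default strategy in this sketch.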