[mlir][bytecode] Add builtin dialect version (#184678)
This adds a single Builtin dialect version for use with bytecode
serialization. The version is currently not printed unless it is set and
nonzero (no bump is planned until the next LLVM version). A unit test was
added, as this was the easiest way to track it.
Additionally add emitWarning virtual method to DialectBytecodeReader,
mirroring emitError.
Tested with an old mlir-opt reader, which could still read the output, so
this should be a non-breaking change.
[X86] Reduce -ffixed-r compile-time overhead (#184606)
PR #180242 added reserve-r support across the driver and backend, but it
also introduced avoidable compile-time work in hot paths.
In Clang, delay +egpr detection until -ffixed-r16 through -ffixed-r31
are actually queried instead of computing it for every x86_64
invocation.
In LLVM, store X86Subtarget::ReservedRReg in a fixed-size std::bitset
and update X86RegisterInfo::getReservedRegs() to iterate only over the
reserve-r register ranges instead of scanning every target register.
These changes keep reserve-r behavior unchanged while trimming the extra
compile-time overhead introduced by the PR.
Signed-off-by: ZhouGuangyuan <zhouguangyuan.xian at gmail.com>
[mlir][NVGPU] Fix double spaces in tests after ODS printer fix. NFC. (#185327)
Follow-up to #184253. Update tests that checked for the old double-space
output of GPU and NVVM ops using GPU_DimensionAttr and
SetMaxRegisterActionAttr.
Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
[mlir][XeGPU] Fix double spaces in tests after ODS printer fix. NFC. (#185324)
Follow-up to #184253. Update tests that checked for the old double-space
output of gpu.block_id using GPU_DimensionAttr.
Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
Refactor createIteratorLoop to use OMPIRBuilder utility functions and make end-of-block insertion robust.
- Replace manual splitBasicBlock/branch with splitBB and redirectTo()
- When insertion point is at BB.end() and the block is terminated, split
before the terminator so the original successor path is preserved
through omp.it.cont
- Add test for unterminated blocks
[Flang][OpenMP] Fix close map flag propagation for derived types in USM (#1557)
This fixes a bug in USM mode where the `close` map type modifier was
attached to some `map.info.op`s corresponding to user-defined type
members while the parent type instance itself was not marked as `close`.
This fix ensures that if a parent record type map does not have the
'close' flag, it is cleared from its members as well, maintaining
consistency.
Gemini was used to create the tests, which were derived from a reproducer
I was working with to debug the issue. The AI-generated test code was
reviewed line-by-line by me.
Assisted-by: Gemini <gemini at google.com>
[AMDGPU] Add structural stall heuristic to scheduling strategies
Implements a structural stall heuristic that considers both resource
hazards and latency constraints when selecting instructions. In coexec,
this changes the pending queue from a binary “not ready to issue”
distinction into part of a unified candidate comparison. Pending
instructions still identify structural stalls in the current cycle, but
they are now evaluated directly against available instructions by stall
cost, making the heuristics both more intuitive and more expressive.
- Add getStructuralStallCycles() to GCNSchedStrategy that computes the
  number of cycles an instruction must wait due to:
  - Resource conflicts on unbuffered resources (from the SchedModel)
  - Sequence-dependent hazards (from GCNHazardRecognizer)
- Add getHazardWaitStates() to GCNHazardRecognizer that returns the number
  of wait states until all hazards for an instruction are resolved,
  providing cycle-accurate hazard information for scheduling heuristics.
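
A rough sketch of comparing available and pending candidates by stall cost
is shown below. It is illustrative only: the Candidate struct, its fields,
and pickByStallCost are hypothetical names, and combining the two stall
sources with max() is an assumption (the message does not say how they are
combined). The real heuristic presumably weighs stall cost alongside other
criteria that are not modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <vector>

// Hypothetical candidate record; the field names are illustrative only.
struct Candidate {
  int Id;
  unsigned ResourceStallCycles; // wait caused by a busy unbuffered resource
  unsigned HazardWaitStates;    // wait until sequence-dependent hazards clear
};

// Assumption: the two stall sources overlap in time, so the instruction can
// issue once the longer of the two waits has elapsed.
unsigned structuralStallCycles(const Candidate &C) {
  return std::max(C.ResourceStallCycles, C.HazardWaitStates);
}

// Picks the candidate with the smallest stall cost from both queues in one
// pass. Ties keep the earlier candidate. Returns nullptr if both are empty.
const Candidate *pickByStallCost(const std::vector<Candidate> &Available,
                                 const std::vector<Candidate> &Pending) {
  const Candidate *Best = nullptr;
  for (const std::vector<Candidate> *Queue : {&Available, &Pending})
    for (const Candidate &C : *Queue)
      if (!Best || structuralStallCycles(C) < structuralStallCycles(*Best))
        Best = &C;
  return Best;
}
```

Because both queues go through the same comparison, a pending instruction
with a short stall can be preferred over an available one with a longer
stall, instead of being excluded outright.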
[AMDGPU] Add ML-oriented coexec scheduler selection and queue handling
This patch adds the initial coexec scheduler scaffold for machine
learning workloads on gfx1250.
It introduces function- and module-level controls for selecting the
AMDGPU preRA and postRA schedulers, including an `amdgpu-workload-type`
module flag that maps ML workloads to coexec preRA scheduling and a nop
postRA scheduler by default.
It also updates the coexec scheduler to use a simplified top-down
candidate selection path that considers both available and pending
queues through a single flow, setting up follow-on heuristic work.
Refactor and support multiple affinity register for a task
- Support multiple affinity register for a task
- Move iterator loop generation logic to OMPIRBuilder
- Extract iterator loop body conversion logic
- Refactor buildAffinityData by hoisting the creation of affinity_list
- IteratorsOp -> IteratorOp
- Add mlir to llvmir test
[mlir][llvmir][OpenMP] Translate affinity clause in task construct to llvmir
Translate affinity entries to LLVMIR by passing affinity information to
createTask (__kmpc_omp_reg_task_with_affinity is created inside PostOutlineCB).
Implement lowering for omp.iterator in affinity
Create IteratorLoopNestScope to build the nested loops for an iterator.
Take advantage of RAII so that each level of the loop nest is exited
correctly.
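
The RAII idea can be sketched as below. This is not the IteratorLoopNestScope
implementation: Emitter, LoopLevelScope, and emitTwoLevelNest are hypothetical
names, and the "builder" only records strings so that the exit order is
visible.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR builder: it only records emitted events.
struct Emitter {
  std::vector<std::string> Log;
};

// Opens one loop level on construction and closes it on destruction, so
// levels are closed in reverse order of opening on every exit path.
class LoopLevelScope {
public:
  LoopLevelScope(Emitter &E, std::string Name) : E(E), Name(std::move(Name)) {
    E.Log.push_back("enter " + this->Name);
  }
  ~LoopLevelScope() { E.Log.push_back("exit " + Name); }
  LoopLevelScope(const LoopLevelScope &) = delete;
  LoopLevelScope &operator=(const LoopLevelScope &) = delete;

private:
  Emitter &E;
  std::string Name;
};

// Builds a two-level nest; the destructors run inner-first at the closing
// brace of the function.
void emitTwoLevelNest(Emitter &E) {
  LoopLevelScope Outer(E, "i");
  LoopLevelScope Inner(E, "j");
  E.Log.push_back("body");
}
```

Tying the exit code to the destructor means an early return or an added
loop level cannot leave an outer level unclosed.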
[Flang][mlir][OpenMP] Support affinity clause codegen in Flang (#182222)
This patch translates the Flang AST to the OpenMP dialect for the
affinity clause, including support for the iterator modifier.
2/3 in stack for implementing affinity clause with iterator modifier
1/3 #182218
2/3 #182222
3/3 #182223
[X86] Remove redundant and-not pattern code in X86 (#157687)
These transforms are now handled in DAGCombine, so enable
hasAndNotCompare for all scalar cases on X86, and remove the
platform-specific code that does the same thing.
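
For context, the rewrite that hasAndNotCompare gates in DAGCombine is, as far
as I understand the hook, (X & Y) == Y to (~X & Y) == 0. The snippet below
only checks that the two forms agree on a few values; the function names are
made up for the example and nothing here is taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Original form: are all bits of Y also set in X?
constexpr bool maskCompare(std::uint32_t X, std::uint32_t Y) {
  return (X & Y) == Y;
}

// And-not form: no bit of Y is missing from X.
constexpr bool andNotCompare(std::uint32_t X, std::uint32_t Y) {
  return (~X & Y) == 0;
}

// Compile-time spot checks that both forms give the same answer.
static_assert(maskCompare(0xF0u, 0x30u) == andNotCompare(0xF0u, 0x30u), "");
static_assert(maskCompare(0xF0u, 0x0Fu) == andNotCompare(0xF0u, 0x0Fu), "");
```

The second form maps onto a single and-not instruction that sets flags,
which is why a target with such an instruction would opt in.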
[mlir][GPU] Fix double spaces in tests after ODS printer fix. NFC. (#185325)
Follow-up to #184253. The ODS attr/type printer fix removed the leading
space from generated print() methods. Update tests that checked for the
old double-space output of GPU ops using GPU_DimensionAttr and
GPU_MmaElementwiseOpAttr.
Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
[mlir][NVVM] Fix double spaces in tests after ODS printer fix. NFC. (#185326)
Follow-up to #184253. Update tests that checked for the old double-space
output of NVVM ops using ReductionKindAttr, ShflKindAttr, and
LoadCacheModifierAttr.
Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
[Loads] Add overload for isDerefAndAlignedInLoop that takes SCEVs. (NFC)
Add an overload of isDereferenceableAndAlignedInLoop that directly takes
the pointer and element sizes as SCEVs. This allows using it from
contexts without relying on an underlying load instruction in follow-up
patches.
[clang-tidy] Fix readability-else-after-return for [[likely]]/[[unlikely]] if (#184684)
Following PR #181878, I noticed a false negative when the `if` statement
carries an attribute.
Repro:
```cpp
void f()
{
  if (true) {
    return;
  } else {
    // Warns as expected
  }
  if (true) [[likely]] {
    return;
  } else {
```
[5 lines not shown]
[MLIR] [Bazel] Removed the stubgen plumbing added in #179211 (#185292)
The hope was that we would be able to reuse parts of this for
Google-internal builds and for open-source jaxlib builds, but we ended
up with custom plumbing in both cases, so I am now removing this
effectively dead code.
[CIR] Split CIR_UnaryOp into individual operations
Split the monolithic cir.unary operation (which dispatched on a
UnaryOpKind enum) into four separate operations: cir.inc, cir.dec,
cir.minus, and cir.not.
This follows the same pattern used when cir.binop was split into
individual binary operations (AddOp, SubOp, etc.).
Changes:
- Add CIR_UnaryOpInterface with getInput()/getResult() methods
- Add CIR_UnaryOp and CIR_UnaryOpWithOverflowFlag base classes
- Define IncOp, DecOp, MinusOp, NotOp with per-op folds
- Add Involution trait to NotOp for not(not(x)) -> x folding
- Replace createUnaryOp() with createInc/Dec/Minus/Not builders
- Split LLVM lowering into four separate patterns
- Split LoweringPrepare complex-type handling per unary op
- Update CIRCanonicalize and CIRSimplify for new op types
- Update all codegen files to use bool params instead of UnaryOpKind
[6 lines not shown]
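
The not(not(x)) -> x fold that the Involution trait provides can be
illustrated with a toy expression tree. This is a hedged sketch: Expr and
foldNot are hypothetical and unrelated to the real CIR or MLIR classes,
which implement the fold through the trait rather than by hand.

```cpp
#include <cassert>

// Hypothetical miniature IR, unrelated to the real CIR classes.
struct Expr {
  enum Kind { Value, Not } K;
  const Expr *Operand; // null for Value nodes
};

// Involution fold: not(not(x)) simplifies to x. Any other expression is
// returned unchanged.
const Expr *foldNot(const Expr *E) {
  if (E->K == Expr::Not && E->Operand->K == Expr::Not)
    return E->Operand->Operand;
  return E;
}
```

A single not, or a plain value, is left as-is; only a directly nested pair
collapses.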
[ARM] Add basic NPM support for LoadStoreOptimizer (#184139)
This is similar to #184090 for ARM, porting the LoadStoreOptimizer to
the new pass manager. This time, both a pre-RA and a post-RA variant are
ported.