[mlir][bufferization] Add error for memref return types
Add validation to reject functions with memref return types, as static
memory planning is incompatible with returning memrefs. In allocate mode,
the arena is freed at function exit, making returned memrefs invalid. In
arg mode, returning a memref from the input arena violates typical memory
ownership patterns.
When a function has memref return types, the pass:
1. Emits a clear error message
2. Fails gracefully without transforming the function
3. Preserves the original IR
This prevents silent bugs where returned memrefs would point to freed or
external memory.
Changes:
- Add return type validation at start of runOnOperation()
- Check all result types for MemRefType
[2 lines not shown]
[mlir][bufferization] Add arena-mode pass option (allocate vs arg)
Add arena-mode pass option to control how the shared arena is obtained:
- 'allocate' (default): Creates arena via memref.alloc within the function
- 'arg': Uses function's first argument as the pre-allocated arena
The 'arg' mode is useful when the arena is pre-allocated externally and
passed to the function, enabling use cases like pre-allocated scratch
buffers or memory pools.
In 'arg' mode, the pass validates that:
1. The context is a function operation
2. The function has at least one argument
3. The first argument is memref<...xi8>
If validation fails, the pass emits an error and fails gracefully.
Changes:
- Add arena-mode option to Passes.td with default 'allocate'
[3 lines not shown]
[mlir][bufferization] Convert arena to i8 byte buffer with memref.view
Change the arena from typed (e.g., memref<Nxf32>) to a generic i8 byte buffer
(memref<Nxi8>). This allows a single arena to hold allocations of different
element types (f32, i64, i16, etc.).
Use memref.view to create typed views into the i8 arena at computed byte
offsets. This is the standard MLIR pattern for type-agnostic memory buffers.
Changes:
- Arena is now memref<totalSizexi8> instead of element-typed
- Use memref.view instead of memref.subview + reinterpret_cast
- Byte offsets passed directly to memref.view via arith.constant
- Update all tests to reflect i8 arena + view pattern
Example transformation:
  Before: memref.alloc() : memref<1024xf32>
  After:  %arena = memref.alloc() : memref<4096xi8>
          %c0 = arith.constant 0 : index
          %view = memref.view %arena[%c0][] : memref<4096xi8> to memref<1024xf32>
[mlir][bufferization] Extract memory planning into pure function
Separate memory planning logic from IR transformation by introducing
trivialMemoryPlanner() - a pure function that computes buffer offsets
without depending on MLIR operations.
Changes:
- Add Alloc structure for allocation-independent planning
- Implement trivialMemoryPlanner(arenaAlignment, allocs) -> offsets
- Refactor runOnOperation() to use the planning function
- Planning logic is now testable independently of MLIR
This architecture enables plugging in different allocation strategies
(firstFit, bestFit) without modifying IR transformation code.
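As a rough illustration of what an IR-independent planner could look like, here is a minimal sketch. It is not the code from the commit: the name trivialMemoryPlannerSketch, the sizeInBytes field of Alloc, and the Plan result type (offsets plus total arena size) are assumptions made for this example. It places allocations back to back and rounds each start offset up to the arena alignment.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the commit's Alloc structure; only the size is modelled.
struct Alloc {
  uint64_t sizeInBytes;
};

// Hypothetical result type: one byte offset per allocation plus the arena size.
struct Plan {
  std::vector<uint64_t> offsets;
  uint64_t totalSize = 0;
};

// Bump-style planner: no MLIR types involved, so it can be unit tested directly.
// Each allocation starts at the next multiple of arenaAlignment.
Plan trivialMemoryPlannerSketch(uint64_t arenaAlignment,
                                const std::vector<Alloc> &allocs) {
  assert(arenaAlignment != 0 && "alignment must be non-zero");
  Plan plan;
  plan.offsets.reserve(allocs.size());
  uint64_t cursor = 0;
  for (const Alloc &a : allocs) {
    // Round the running offset up to the alignment boundary.
    cursor = (cursor + arenaAlignment - 1) / arenaAlignment * arenaAlignment;
    plan.offsets.push_back(cursor);
    cursor += a.sizeInBytes;
  }
  plan.totalSize = cursor;
  return plan;
}
```

With a 16-byte arena alignment and allocations of 4096, 10, and 8 bytes, this sketch yields offsets 0, 4096, and 4112 and a total size of 4120 bytes. A firstFit or bestFit strategy would keep the same inputs and outputs and change only the loop body.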
[mlir][bufferization] Add alignment support to static memory planner
Track alignment requirements from memref.alloc operations and ensure
offsets are properly padded to meet alignment constraints. The arena
allocation receives the maximum alignment of all transformed allocations.
Changes:
- Add alignment field to AllocationCandidate structure
- Compute sizes in bytes to handle alignment padding correctly
- Implement alignOffset() helper to pad offsets to alignment boundaries
- Set arena alignment attribute to maximum required alignment
- Add test demonstrating alignment padding with 64- and 128-byte requirements
This ensures correctness for SIMD and other alignment-sensitive operations.
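The padding rule described above can be sketched as follows. This is an illustration under assumptions, not the commit's implementation: alignOffset() and AllocationCandidate are named in the message, but the field names, the ArenaLayout type, the layoutArena() function, and the power-of-two requirement are hypothetical choices made here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Rounds offset up to the next multiple of alignment.
// Assumes alignment is a power of two (an assumption of this sketch).
uint64_t alignOffset(uint64_t offset, uint64_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0 &&
         "power-of-two alignment expected");
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical fields; the commit only states that an alignment field exists.
struct AllocationCandidate {
  uint64_t sizeInBytes;
  uint64_t alignment;
};

// Hypothetical result type for this sketch.
struct ArenaLayout {
  std::vector<uint64_t> offsets;
  uint64_t totalSize = 0;
  uint64_t arenaAlignment = 1;
};

// Pads each offset to its candidate's alignment and records the maximum
// alignment, which the arena allocation itself would then need.
ArenaLayout layoutArena(const std::vector<AllocationCandidate> &candidates) {
  ArenaLayout layout;
  uint64_t cursor = 0;
  for (const AllocationCandidate &c : candidates) {
    cursor = alignOffset(cursor, c.alignment);
    layout.offsets.push_back(cursor);
    cursor += c.sizeInBytes;
    layout.arenaAlignment = std::max(layout.arenaAlignment, c.alignment);
  }
  layout.totalSize = cursor;
  return layout;
}
```

For candidates of (100 bytes, align 64), (40 bytes, align 128), and (8 bytes, align 64), the sketch produces offsets 0, 128, and 192, a total size of 200 bytes, and an arena alignment of 128.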
[AMDGPU][InsertWaitCnts] Make HWEvent a BitMask
Follow up from comments on https://github.com/llvm/llvm-project/pull/202886
Make HWEvent a bitmask by default instead of having both the enum and a separate HWEventSet. This streamlines the code a bit and opens the possibility of adding "modifiers" to events, e.g. I imagine we could now fold "VMemType" into the Events.
We already do this with things like SMEM_GROUP. At least now it's baked into the design.
I opted for a bit more verbosity by taking inspiration from FastMathFlags (FMF): instead of exposing a raw enum, I wrap it in a class with helper functions. The downside is having to reimplement all the little bitwise ops, but the result is a cleaner, simpler interface than a raw enum (class) with many free helper functions. I initially tried that, but I recoiled at the sight of things like `contains(A, B)`, which isn't very clear, while `A.contains(B)` is self-explanatory.
Considering HWEvent is a bitmask, I also implemented a simple iterator to iterate over all set bits of the mask, which is a useful thing to have as some APIs in InsertWaitCnt rely on treating one event at a time.
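To make the shape of such a wrapper concrete, here is a minimal sketch of a bitmask class with a member `contains()` and a set-bit iterator. It is not the code from the patch: the class name HWEventMask, the placeholder events EventA/EventB/EventC, and the `raw()` accessor are all hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper in the spirit of FastMathFlags: the raw bits stay
// private and all queries go through member functions.
class HWEventMask {
  uint32_t Bits = 0;

public:
  // Placeholder event names, not the real AMDGPU events.
  enum : uint32_t { EventA = 1u << 0, EventB = 1u << 1, EventC = 1u << 2 };

  constexpr HWEventMask() = default;
  constexpr HWEventMask(uint32_t B) : Bits(B) {}

  // True if every bit set in O is also set in this mask.
  constexpr bool contains(HWEventMask O) const {
    return (Bits & O.Bits) == O.Bits;
  }
  constexpr bool any() const { return Bits != 0; }
  constexpr uint32_t raw() const { return Bits; }

  constexpr HWEventMask operator|(HWEventMask O) const {
    return HWEventMask(Bits | O.Bits);
  }
  constexpr HWEventMask operator&(HWEventMask O) const {
    return HWEventMask(Bits & O.Bits);
  }
  constexpr bool operator==(HWEventMask O) const { return Bits == O.Bits; }

  // Forward iterator that visits each set bit as a single-event mask,
  // lowest bit first.
  class iterator {
    uint32_t Rest;

  public:
    explicit iterator(uint32_t R) : Rest(R) {}
    HWEventMask operator*() const {
      assert(Rest != 0 && "dereferencing end iterator");
      return HWEventMask(Rest & (~Rest + 1)); // isolate lowest set bit
    }
    iterator &operator++() {
      Rest &= Rest - 1; // clear lowest set bit
      return *this;
    }
    bool operator!=(const iterator &O) const { return Rest != O.Rest; }
  };

  iterator begin() const { return iterator(Bits); }
  iterator end() const { return iterator(0); }
};
```

With this sketch, `for (HWEventMask E : Mask)` visits one event per iteration, which matches APIs that handle a single event at a time, and `Mask.contains(HWEventMask::EventA)` reads in the member-function style the message argues for.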
[LoopInterchange] Reject inner preheader PHIs with non-identical incoming values (#203842)
When the outer loop header branches to the inner loop preheader via duplicate edges (e.g. br i1 %c, label %inner.ph, label %inner.ph), the preheader can contain PHI nodes with more than one incoming entry for the same predecessor. The transform eliminates these PHIs by substituting each with its first incoming value, but the existing assert required exactly one incoming value and would fire on such input.
Relax the assert to accept any PHI where all incoming values are identical. A PHI with distinct values for the same predecessor is rejected by the IR verifier, so only the identical-value case can arise in practice.
Fixes #203466
[AArch64][SME] Split FP8 FTMOPA intrinsics (#203310)
Introduce separate FP8 FTMOPA intrinsics for ZA16 and ZA32:
llvm.aarch64.sme.fp8.ftmopa.za16
llvm.aarch64.sme.fp8.ftmopa.za32
The FP8 FTMOPA forms need to model their FPMR dependency, so they should
not share the same intrinsic definitions as the non-FP8 FTMOPA forms.
Update the Clang SME builtin definitions and AArch64 instruction
patterns to use the new intrinsics, and add AutoUpgrade support for the
previous FP8-shaped llvm.aarch64.sme.ftmopa.* spellings so existing IR
and bitcode continue to work.
This was split out from #154144 because the intrinsic upgrade needs to
be handled separately to avoid breaking existing bitcode.
Clear last_modified after each response on a persistent HTTP connection
(In case a later response doesn't contain its own "Last-Modified" header field.)
Reported by Ties de Kock.
OK tb@ claudio@
sysutils/mdfried: Update to 0.22.2
- Add option PDF (default ON) to enable support for PDF files
- Add a list of mdfried features to pkg-descr
Reported by: "github-actions[bot]" <notifications at github.com>
[CIR][AArch64] Lower NEON laneq FMA builtins (#202337)
Lower additional AArch64 NEON laneq fused multiply-accumulate builtins
in CIR.
This covers:
- `BI__builtin_neon_vfmaq_laneq_v`
- `vfmaq_laneq_f16`
- `vfmaq_laneq_f32`
- `vfmaq_laneq_f64`
- `BI__builtin_neon_vfmad_laneq_f64`
- `vfmad_laneq_f64`
For `vfmaq_laneq_v`, the lowering bitcasts the operands, splats the
selected lane source, and emits the `llvm.fma` intrinsic with the
operand order matching classic AArch64 CodeGen.
For `vfmad_laneq_f64`, the lowering extracts the selected lane from the
`float64x2_t` source and emits scalar `llvm.fma.f64`.
[7 lines not shown]
[CIR][AArch64] Lower NEON subtraction intrinsics (#202857)
### Summary
Part of: https://github.com/llvm/llvm-project/issues/185382
- Add CIR lowering for the scalar AArch64 NEON subtraction builtins
`vsubd_s64` and `vsubd_u64`.
- Verify that the remaining signed, unsigned, and floating-point
`vsub/vsubq` intrinsics are correctly expanded through arm_neon.h and
emitted as `cir.sub`.