[mlir][bufferization] Introduce reconcileBufferTypeMismatchFn hook (#202667)
This PR is the first part of the work that aims to allow customizations
in resolving mismatching buffer types.
Add a new bufferization hook that lets downstream bufferization
implementations define how to handle buffer mismatches that appear
during type inference in various upstream scenarios.
The hook is used as a fallback mechanism in several upstream operations.
For example, it is used when bufferizing block signatures (scf.execute_region)
and when resolving "branch" conflicts (scf.if, scf.index_switch, scf.for,
arith.select).
The hook returns a valid buffer type when reconciliation succeeds; a
failure indicates that reconciliation failed and should be treated as a
bufferization failure. The caller of the hook is expected to use the
returned buffer type. By default, a memref with a fully-dynamic layout map
is returned (in the unranked case, the buffers are assumed to match).
[3 lines not shown]
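To make the contract above concrete, here is a minimal, self-contained C++ model of a reconciliation hook. It does not use MLIR: `BufferType`, `ReconcileFn`, `defaultReconcile`, and `resolveBranchTypes` are hypothetical names, `std::nullopt` stands in for `failure()`, and rejecting an element-type mismatch is an assumption of this sketch rather than something stated above.

```cpp
#include <functional>
#include <optional>
#include <string>

// Hypothetical, simplified buffer type (not MLIR's BaseMemRefType).
struct BufferType {
  bool ranked;
  std::string elementType;
  std::string layout; // e.g. "identity", "strided", "fully-dynamic"
  bool operator==(const BufferType &o) const {
    return ranked == o.ranked && elementType == o.elementType &&
           layout == o.layout;
  }
};

// Hook signature: a reconciled type on success, std::nullopt on failure.
using ReconcileFn = std::function<std::optional<BufferType>(
    const BufferType &lhs, const BufferType &rhs)>;

// Sketch of the default behavior described in the commit message.
std::optional<BufferType> defaultReconcile(const BufferType &lhs,
                                           const BufferType &rhs) {
  if (lhs.elementType != rhs.elementType)
    return std::nullopt; // assumption: treated as unrecoverable here
  if (!lhs.ranked || !rhs.ranked)
    return lhs; // unranked case: buffers are assumed to match
  return BufferType{true, lhs.elementType, "fully-dynamic"};
}

// Caller side: the hook is consulted only as a fallback on a mismatch,
// and the caller uses whatever type the hook returns.
std::optional<BufferType> resolveBranchTypes(const BufferType &a,
                                             const BufferType &b,
                                             const ReconcileFn &hook) {
  if (a == b)
    return a;
  return hook(a, b);
}
```

A downstream implementation would pass its own `ReconcileFn` (for example, a lambda that always fails, to forbid layout generalization) in place of `defaultReconcile`.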
[ARM] Fix Machine Outliner crash when tBLXr uses non-tcGPR register (#200684)
When the Machine Outliner selects MachineOutlinerThunk mode for a
sequence ending in tBLXr/tBLXr_noip, it converts the indirect call to
tTAILJMPr in buildOutlinedFrame. However, tTAILJMPr requires its operand
to be in tcGPR (R0-R3, R12), while tBLXr accepts any GPR.
If the register is callee-saved (e.g. r4), the Machine Verifier crashes
with 'Illegal physical register for instruction'.
Fixes #188076
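The register-class constraint behind the crash can be sketched as a simple membership check. This is illustrative only: the `Reg` enum, `isTcGPR`, and `canConvertToTailJump` are hypothetical names, and the commit message does not say whether the fix rejects such candidates or picks a different outlining mode; it only states which registers tTAILJMPr accepts.

```cpp
#include <cstdint>

// Hypothetical ARM register numbering for illustration (R13 = SP,
// R14 = LR, R15 = PC); not LLVM's ARM::R* enum.
enum class Reg : uint8_t {
  R0, R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, SP, LR, PC
};

// tcGPR as described above: R0-R3 and R12.
bool isTcGPR(Reg r) {
  switch (r) {
  case Reg::R0: case Reg::R1: case Reg::R2: case Reg::R3: case Reg::R12:
    return true;
  default:
    return false;
  }
}

// Hypothetical guard: rewriting tBLXr into tTAILJMPr is only valid when
// the call-target register satisfies tTAILJMPr's operand class.
bool canConvertToTailJump(Reg callTarget) { return isTcGPR(callTarget); }
```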
[mlir][spirv] Re-enable bf16/fp8 for vector composite ops (#204848)
Allow bf16 and fp8 vector element types in VectorExtractDynamic,
VectorInsertDynamic, and VectorShuffle.
clang: Use the effective triple string for offload jobs
Track the effective triple for the job rather than the
toolchain's default triple. This will change the result
once AMDGPU starts adjusting the triples to include
subarches.
[mlir][tosa] Check same input/output types in pooling ops verifier (#203565)
Adds a missing check to make sure the input and output types of pooling
ops have the same element type.
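As a rough illustration of the added verifier rule, the check reduces to comparing element types and reporting a diagnostic on mismatch. The `TensorType` struct and `verifyPoolingTypes` function are hypothetical stand-ins, not the TOSA dialect's C++ API.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical tensor type model: shape plus element type name.
struct TensorType {
  std::vector<long> shape;
  std::string elementType;
};

// Returns a diagnostic on failure, std::nullopt when verification passes.
// Shapes are allowed to differ (pooling changes spatial dims); only the
// element types must agree.
std::optional<std::string> verifyPoolingTypes(const TensorType &input,
                                              const TensorType &output) {
  if (input.elementType != output.elementType)
    return "expected input and output element types to match, got " +
           input.elementType + " and " + output.elementType;
  return std::nullopt;
}
```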
[AMDGPU] Remove stale declarations. NFC. (#205047)
Remove declarations of functions that are never defined. Also remove
unused field AMDGPUInstructionSelector::TM.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
clang/AMDGPU: Use effective triple instead of raw toolchain triple
Start using the effective triple instead of the raw toolchain triple.
For the moment this is NFC, but will change when new uses of the subarch
field are introduced.
[LV] Allow scalable VFs in `-force-vector-width` (and use in tests) (#204953)
This updates `-force-vector-width=VF` to accept scalable VFs. If a
scalable width is specified, the target is assumed to support scalable
vectors.
So for example, `-force-vector-width="vscale x 4"` works as a shorthand
for `-scalable-vectorization=always -force-target-supports-scalable-vectors=true -force-vector-width=4`.
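The equivalence above can be sketched as a parser for the two spellings followed by an expansion into the three separate options. `parseVectorWidth`, `VF`, `LVOptions`, and `expandForceVectorWidth` are hypothetical names; LLVM's actual option parsing is not shown here, and this sketch does not guard against integer overflow.

```cpp
#include <optional>
#include <string>

// Hypothetical VF model: minimum element count plus a scalable flag.
struct VF {
  unsigned minElts;
  bool scalable;
};

// Accepts "N" or "vscale x N"; anything else is rejected.
std::optional<VF> parseVectorWidth(const std::string &s) {
  const std::string prefix = "vscale x ";
  bool scalable = s.rfind(prefix, 0) == 0; // starts-with check
  std::string num = scalable ? s.substr(prefix.size()) : s;
  if (num.empty())
    return std::nullopt;
  unsigned v = 0;
  for (char c : num) {
    if (c < '0' || c > '9')
      return std::nullopt;
    v = v * 10 + static_cast<unsigned>(c - '0');
  }
  return VF{v, scalable};
}

// Hypothetical bundle of the three options the shorthand stands for.
struct LVOptions {
  unsigned forceVectorWidth = 0;
  bool scalableAlways = false;              // -scalable-vectorization=always
  bool forceTargetSupportsScalable = false; // -force-target-supports-...=true
};

LVOptions expandForceVectorWidth(const VF &vf) {
  LVOptions o;
  o.forceVectorWidth = vf.minElts;
  if (vf.scalable) {
    o.scalableAlways = true;
    o.forceTargetSupportsScalable = true;
  }
  return o;
}
```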
[IRBuilder] (Target|InstSimplify)-fold intrinsics (#204967)
Fold intrinsics with TargetFolder or InstSimplifyFolder in
IRBuilderBase::CreateIntrinsic, in the same way we already fold in
Create(Unary|Binary)Intrinsic. Includes changes to guard against a null
TLI and Call.
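The "try the folder first, emit a call only if folding fails" pattern can be sketched with a toy IR. Everything here is hypothetical (`Value`, `Intrinsic`, `tryFold`, `Builder`, `hasFolder`); `tryFold` merely stands in for what TargetFolder or InstSimplifyFolder would do and only handles two constant operands.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Toy IR value: a known constant, or a named non-constant value.
struct Value {
  std::optional<int64_t> constant;
  std::string name; // argument name or emitted call name
};

enum class Intrinsic { SMax, SMin };

// Hypothetical folder: folds only when both operands are constants.
std::optional<int64_t> tryFold(Intrinsic id, const std::vector<Value> &args) {
  if (args.size() != 2 || !args[0].constant || !args[1].constant)
    return std::nullopt;
  int64_t a = *args[0].constant, b = *args[1].constant;
  return id == Intrinsic::SMax ? std::max(a, b) : std::min(a, b);
}

struct Builder {
  bool hasFolder = true;     // hypothetical: folding can be unavailable
  std::vector<Value> emitted; // calls actually created

  Value createIntrinsic(Intrinsic id, const std::vector<Value> &args) {
    if (hasFolder)
      if (auto folded = tryFold(id, args))
        return Value{folded, ""}; // no instruction emitted
    Value call{std::nullopt, id == Intrinsic::SMax ? "llvm.smax" : "llvm.smin"};
    emitted.push_back(call);
    return call;
  }
};
```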
[AMDGPU][doc] Refactor Barrier Execution Model (#204566)
Remove everything that has to do with named barriers and put it in a
series of model extensions specific to /sbarrier/named-barriers.
I had to change a few things to make it fit. In summary:
Base Model:
- (~) Stylistic changes that make it easier to refer to specific rules.
Each rule is in a rubric instead of a bullet point.
- (-) No longer defines `barrier-mutually-exclusive`
- (-) No longer defines barrier `join` and any associated rule.
New named barrier extensions
- (+) Define "named barrier" as a sub-type of barrier objects. This
makes barrier-mutually-exclusive redundant.
- (+) Define barrier join as an op that can exclusively be done on
[17 lines not shown]
[clang] Respect `CLANG_USE_EXPERIMENTAL_CONST_INTERP` (#200716)
It appears that https://github.com/llvm/llvm-project/pull/199396 had no
effect at all, even though the patch itself looks straightforward.
Change the semantics of the command-line option to support
`-fno-experimental-constant-interpreter` as well. This way, the cmake
option can be used to set the default and the `-f`/`-fno-` command-line
options can be used to override the default behavior.
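The intended precedence (build-time default, overridden by whichever of `-f`/`-fno-` appears last) can be sketched as follows. The macro `USE_EXPERIMENTAL_CONST_INTERP_DEFAULT` and the function `useExperimentalConstInterp` are hypothetical; in the real driver, LLVM's `ArgList::hasFlag(Pos, Neg, Default)` provides similar last-one-wins behavior, but how this patch wires the CMake option through is not shown here.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the default that the CMake option
// CLANG_USE_EXPERIMENTAL_CONST_INTERP would provide.
#ifndef USE_EXPERIMENTAL_CONST_INTERP_DEFAULT
#define USE_EXPERIMENTAL_CONST_INTERP_DEFAULT false
#endif

// With neither flag present, the build default applies; otherwise the
// last occurrence of -f/-fno- wins.
bool useExperimentalConstInterp(
    const std::vector<std::string> &args,
    bool buildDefault = USE_EXPERIMENTAL_CONST_INTERP_DEFAULT) {
  bool result = buildDefault;
  for (const std::string &a : args) {
    if (a == "-fexperimental-constant-interpreter")
      result = true;
    else if (a == "-fno-experimental-constant-interpreter")
      result = false;
  }
  return result;
}
```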
[flang][PFT-to-MLIR] Wrap unstructured Fortran constructs in scf.execute_region
Extend the PFT-to-MLIR (HLFIR/FIR) lowering so unstructured DO and IF
constructs are emitted inside scf.execute_region, hiding their multi-block
CFG behind a single op. OpenACC and OpenMP lowerings that reject
multi-block content (e.g. the "unstructured do loop in combined acc
construct" TODO in OpenACC.cpp) now see a structured op instead.
Flag: -mmlir --wrap-unstructured-constructs-in-execute-region (default on).
An evaluation is wrappable iff all of the following hold:
* wrap flag on
* eval is parser::DoConstruct or parser::IfConstruct
* eval.isUnstructured
* branchesAreInternal(eval) -- every controlSuccessor in the subtree
targets a nested eval or the constructExit
* !hasIncomingBranch(eval) -- no outside eval branches into the body
(PFT's synthetic IfConstruct around `if(c) goto X` absorbs label
[14 lines not shown]
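The visible wrappability conditions can be sketched over a toy evaluation tree. This is not flang's PFT: `Eval`, `Kind`, and the integer IDs are hypothetical, `outsideTargets` models branch targets of evaluations outside the construct, and the full condition list in the original message continues past what is shown, so the real predicate may check more.

```cpp
#include <algorithm>
#include <vector>

enum class Kind { Do, If, Other };

// Hypothetical, simplified evaluation node.
struct Eval {
  int id;
  Kind kind;
  bool isUnstructured;
  std::vector<int> successors; // control-successor target ids
  std::vector<Eval> nested;
};

static bool contains(const std::vector<int> &v, int x) {
  return std::find(v.begin(), v.end(), x) != v.end();
}

static void collectIds(const Eval &e, std::vector<int> &ids) {
  for (const Eval &n : e.nested) {
    ids.push_back(n.id);
    collectIds(n, ids);
  }
}

static void collectSuccessors(const Eval &e, std::vector<int> &succ) {
  succ.insert(succ.end(), e.successors.begin(), e.successors.end());
  for (const Eval &n : e.nested)
    collectSuccessors(n, succ);
}

// Every successor in the subtree targets a nested eval or the construct exit.
bool branchesAreInternal(const Eval &e, int constructExit) {
  std::vector<int> inside, succ;
  collectIds(e, inside);
  collectSuccessors(e, succ);
  for (int t : succ)
    if (t != constructExit && !contains(inside, t))
      return false;
  return true;
}

// Some branch from outside the construct lands inside its body.
bool hasIncomingBranch(const Eval &e, const std::vector<int> &outsideTargets) {
  std::vector<int> inside;
  collectIds(e, inside);
  for (int t : outsideTargets)
    if (contains(inside, t))
      return true;
  return false;
}

bool isWrappable(const Eval &e, bool wrapFlag, int constructExit,
                 const std::vector<int> &outsideTargets) {
  return wrapFlag && (e.kind == Kind::Do || e.kind == Kind::If) &&
         e.isUnstructured && branchesAreInternal(e, constructExit) &&
         !hasIncomingBranch(e, outsideTargets);
}
```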
[orc-rt] Add SPS serialization for ExecutorAddrRange. (#205041)
Allows SPS serialization to/from ExecutorAddrRange. This will be used in
upcoming patches for compact-unwind registration support.
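A rough idea of what serializing an address range involves: the range is written as its two endpoint addresses and read back with a bounds check. The wire format here (two little-endian 64-bit values) and all names (`ExecutorAddrRange` as a plain struct, `serializeRange`, `deserializeRange`) are assumptions of this sketch, not orc-rt's actual SPS encoding or API.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the real ExecutorAddrRange.
struct ExecutorAddrRange {
  uint64_t Start;
  uint64_t End;
};

static void writeU64LE(std::vector<uint8_t> &buf, uint64_t v) {
  for (int i = 0; i < 8; ++i)
    buf.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

static bool readU64LE(const std::vector<uint8_t> &buf, std::size_t &pos,
                      uint64_t &v) {
  if (pos + 8 > buf.size())
    return false; // not enough bytes left
  v = 0;
  for (int i = 7; i >= 0; --i)
    v = (v << 8) | buf[pos + i];
  pos += 8;
  return true;
}

// Assumed encoding: Start followed by End.
void serializeRange(std::vector<uint8_t> &buf, const ExecutorAddrRange &r) {
  writeU64LE(buf, r.Start);
  writeU64LE(buf, r.End);
}

std::optional<ExecutorAddrRange>
deserializeRange(const std::vector<uint8_t> &buf) {
  std::size_t pos = 0;
  ExecutorAddrRange r{};
  if (!readU64LE(buf, pos, r.Start) || !readU64LE(buf, pos, r.End))
    return std::nullopt;
  return r;
}
```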
[FIR] Route embox + projected complex slice through shapeVec
When the array_coor base is a fir.embox with a projected complex %re/%im
slice, take the shapeVec path instead of the descriptor (fir.box_dims)
path. The descriptor path iterates source-rank dims while querying the
rank-reduced embox result box, which miscompiles slices that collapse
dims (e.g. complex(:,k)%re). For embox-derived boxes the underlying
storage is contiguous, so the shape-derived layout is both correct and
the natural place to encode the available static shape information. Non-embox
boxes (rebox, assumed-shape) still go through fir.box_dims.
Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
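Why the shape-derived layout suffices for embox-derived boxes: for contiguous column-major storage, the address of `c(i,k)%re` or `c(i,k)%im` follows directly from the source shape, so a rank-collapsing projection such as `complex(:,k)%re` needs no descriptor query. The helpers below are hypothetical and model only the address arithmetic, assuming each complex element occupies two consecutive real components; they are not FIR code.

```cpp
#include <cstddef>
#include <vector>

// Column-major strides, in whole elements, for contiguous storage.
std::vector<std::size_t> contiguousStrides(const std::vector<std::size_t> &shape) {
  std::vector<std::size_t> strides(shape.size());
  std::size_t s = 1;
  for (std::size_t d = 0; d < shape.size(); ++d) {
    strides[d] = s;
    s *= shape[d];
  }
  return strides;
}

// Offset, in real-component units, of c(i, k)%re (part 0) or %im (part 1)
// for a rank-2 complex array c with the given shape; i and k are 1-based.
std::size_t complexPartOffset(const std::vector<std::size_t> &shape,
                              std::size_t i, std::size_t k, unsigned part) {
  std::vector<std::size_t> st = contiguousStrides(shape);
  std::size_t elem = (i - 1) * st[0] + (k - 1) * st[1];
  return 2 * elem + part; // two reals per complex element
}
```

For `c(4,3)`, `c(:,2)%re` starts at real offset 8 and advances by 2 reals per element, values that come purely from the shape.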
[AArch64] Lower extends of boolean vector loads via scalar load (#203394)
Replace a `load <N x i1>` under a sext/zext with a scalar load +
bitcast, so the `combineToExtendBoolVectorInReg` helper can apply,
avoiding scalarization.
An optimisation for the SVE case with a predicate load will be added in a
follow-up.
Fixes #200325
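The scalar-load rewrite relies on the fact that an `<8 x i1>` load carries the same bits as an `i8` load, and a `bitcast` recovers the lanes. The scalar model below shows what `sext`/`zext` of the bitcast vector produce, assuming little-endian lane numbering where lane `i` is bit `i`. The function names are hypothetical and this is not the DAG combine itself.

```cpp
#include <array>
#include <cstdint>

// Model of: %w = load i8; %v = bitcast i8 %w to <8 x i1>; sext to <8 x i8>.
// Lane i is taken from bit i (little-endian lane order assumed).
std::array<int8_t, 8> sextBoolVectorFromByte(uint8_t w) {
  std::array<int8_t, 8> out{};
  for (unsigned i = 0; i < 8; ++i)
    out[i] = ((w >> i) & 1) ? int8_t(-1) : int8_t(0); // true -> all ones
  return out;
}

// Same, but zero-extending: true lanes become 1.
std::array<uint8_t, 8> zextBoolVectorFromByte(uint8_t w) {
  std::array<uint8_t, 8> out{};
  for (unsigned i = 0; i < 8; ++i)
    out[i] = static_cast<uint8_t>((w >> i) & 1);
  return out;
}
```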
[orc-rt] Tidy up some SPS tag types. NFC. (#205038)
Replaces class definitions with decls for tag types that don't need a
body, and moves the SPSError tag down to just above its
serialization-traits class.
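Tag types that only select template specializations never need a definition: an incomplete class is a valid template argument. The sketch below uses a hypothetical tag `SPSFooTag` and a hypothetical `SerializationTraits` template to show the pattern; it does not reproduce orc-rt's actual declarations.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Declaration only: the tag is never instantiated, just named.
class SPSFooTag;

// Primary template left undefined; only specializations are usable.
template <typename Tag, typename T> class SerializationTraits;

// Specialization keyed on the incomplete tag type.
template <> class SerializationTraits<SPSFooTag, uint32_t> {
public:
  static std::size_t size(uint32_t) { return sizeof(uint32_t); }
  static void serialize(std::vector<uint8_t> &out, uint32_t v) {
    for (int i = 0; i < 4; ++i) // little-endian byte order
      out.push_back(static_cast<uint8_t>(v >> (8 * i)));
  }
};
```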
[LoongArch] Custom scalar UINT_TO_FP and FP_TO_UINT with LSX instructions (#200901)
Use `vftintrz.lu.d` to convert scalar double/float values to
unsigned 64-bit integers, and `vffint.d.lu` for the reverse conversion.
[AMDGPU] Improve the description of asyncmark semantics (#202579)
- The semantics of asyncmarks are now defined purely in terms of
sequences, without referring to the implementation.
- The examples incorrectly relied on (post)dominance; they are now
worded in terms of asyncmark sequences.
[ProfileData] Lazy-load fixed-length MD5 name table (#202014)
When reading extensible binary format profiles with fixed-length MD5
name tables, the reader eagerly allocates and populates a
std::vector<FunctionId> to store the name table. This eager loading
is particularly wasteful when ProfileIsCS is false, as we populate the
entire name table just to support lookups during profile ingestion,
even though we may only use a subset of the profile. Since FunctionId
is 16 bytes on 64-bit systems, a name table containing 10 million MD5
hash values would consume 160MB of heap memory.
This patch implements lazy loading for the name table in extensible
binary format profiles when the fixed-length MD5 layout is used.
Specifically, this patch introduces SampleProfileNameTable to
encapsulate the name table representation, supporting both lazy
loading (pointing directly to the memory-mapped buffer) and eager
loading (using a vector). Eager loading is retained as a fallback for
layouts that do not support O(1) random access (such as
[11 lines not shown]
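The lazy/eager split can be sketched as a small class that either views fixed-length entries in a mapped buffer (O(1) indexing, no per-entry allocation) or falls back to an owned vector. `NameTable` and its members are hypothetical and are not the `SampleProfileNameTable` interface; the sketch assumes 8-byte little-endian MD5 entries and stores raw `uint64_t` hashes rather than `FunctionId`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

class NameTable {
public:
  // Lazy mode: a view over `count` 8-byte entries; nothing is copied.
  static NameTable lazy(const uint8_t *data, std::size_t count) {
    NameTable t;
    t.Data = data;
    t.Count = count;
    return t;
  }
  // Eager mode: fallback for layouts without O(1) random access.
  static NameTable eager(std::vector<uint64_t> hashes) {
    NameTable t;
    t.Hashes = std::move(hashes);
    t.Count = t.Hashes.size();
    return t;
  }
  std::size_t size() const { return Count; }
  // O(1) lookup in both modes; entries decoded on demand in lazy mode.
  std::optional<uint64_t> get(std::size_t i) const {
    if (i >= Count)
      return std::nullopt;
    if (Data) {
      const uint8_t *p = Data + i * 8;
      uint64_t v = 0;
      for (int b = 7; b >= 0; --b) // little-endian entry (assumption)
        v = (v << 8) | p[b];
      return v;
    }
    return Hashes[i];
  }

private:
  const uint8_t *Data = nullptr;
  std::size_t Count = 0;
  std::vector<uint64_t> Hashes;
};
```

In lazy mode the table costs a pointer and a count regardless of entry count, whereas eager mode allocates per entry, which is the memory difference the message describes.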
[FIR] Route embox+projected slice through shapeVec in FIRToMemRef
The descriptor-strides path iterates source-rank dims but queries the
rank-reduced embox result box, miscompiling slices that collapse dims
(e.g. complex %re/%im on b(:,k)). For embox-derived boxes the underlying
storage is contiguous, so the shape-derived layout is both correct and
the natural place to encode "static shape information is available."
Drop the `|| hasProjectedSlice` carve-out from boxNeedsDescriptorStrides
so projection cases also take the shapeVec path. Non-embox boxes
(rebox, assumed-shape) still go through fir.box_dims because their
storage may be non-contiguous.
Fixes the SIGSEGV at -O0 -lro and the miscompile at -O1 -lro on the Fujitsu
0086_0019 reproducer (complex(:,k)%re inside WHERE).
Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
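The effect of dropping the `|| hasProjectedSlice` carve-out can be shown as a before/after predicate. This is a deliberately reduced, hypothetical model (`BoxSource`, `BoxInfo`, and both functions are illustrative names); the real `boxNeedsDescriptorStrides` may consider more than the box's origin.

```cpp
// Hypothetical model of where a box value comes from.
enum class BoxSource { Embox, Rebox, AssumedShape };

struct BoxInfo {
  BoxSource source;
  bool hasProjectedSlice; // e.g. a %re/%im component projection
};

// Before: a projected slice forced even embox-derived boxes onto the
// descriptor (fir.box_dims) path.
bool needsDescriptorStridesOld(const BoxInfo &b) {
  return b.source != BoxSource::Embox || b.hasProjectedSlice;
}

// After: only non-embox boxes (possibly non-contiguous storage) use
// descriptor strides; embox-derived boxes take the shapeVec path.
bool needsDescriptorStridesNew(const BoxInfo &b) {
  return b.source != BoxSource::Embox;
}
```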