[Flang][Semantics] Throw diagnostics for identical entry and result names. (#198500)
Fixes https://github.com/llvm/llvm-project/issues/198499
In flang entry name and result names cannot be same as per
[C1583](https://j3-fortran.org/doc/year/25/25-007r1.pdf).
Previously, Flang did not diagnose cases where the ENTRY name and RESULT
name were identical, e.g.
```
function m1f1()
integer :: m1f1
real :: m1f1e1
m1f1 = 0
entry m1f1e1() result(m1f1e1)
m1f1e1 = 0.1
end function
```
[Flang] Fix -frelaxed-c-loc-checks being ignored when using the driver (#200733)
`-frelaxed-c-loc-checks` worked correctly when passed directly to -fc1,
but was silently ignored when using the driver (e.g., flang -c
-frelaxed-c-loc-checks), causing the flag to go unused. This patch fixes
it by adding `OPT_relaxed_c_loc` to the `addAllArgs` call in Flang.cpp
Also extend the existing test with a driver-mode RUN line to cover this
path.
[mlir][spirv] Add TOSA graph constant marking (#201095)
Add a TOSA to SPIR-V TOSA preprocessing pass that marks large tosa.const
and tosa.const_shape operations for lowering to spirv.ARM.GraphConstant.
Keep small constants inline as spirv.Constant, assign graph constant IDs
with a grapharm-prefixed marker attribute, and teach the existing
constant conversion to use the marker when present.
Expose the grapharm source-side attribute names used for interface ABI
annotations and graph constant IDs.
Add tests for marking large constants, leaving small constants unmarked,
increasing graph constant IDs across mixed constants, and lowering
pre-marked constants to spirv.ARM.GraphConstant.
Signed-off-by: Davide Grohmann <davide.grohmann at arm.com>
[AArch64] Fix DUP-of-extload combine to ignore chain uses (#201351)
The original combine bailed when the load had more than one use, but
counted chain uses too.
[orc-rt] Make NativeDylibManager::lookup return optional addresses. (#201519)
NativeDylibManager::lookup used to return (asynchronously) a vector of
void *s where null represented not-present. This commit updates it to
return a vector of std::optional<void *>s where std::nullopt represents
not-present and an address of zero indicates that the symbol is present
with an address of zero.
This matches the resolve semantics of SimpleExecutorDylibManager,
completing the alignment of the two implementations after the earlier
additions of the Mode argument to load() and the
required/weakly-referenced flag on lookup symbols.
[MLIR][XeGPU] Extend op definitions to support 3D+: dpas, dpas_mx (#199809)
**Summary**
Extend xegpu.dpas and xegpu.dpas_mx operations to support 3D and 4D
operands with batch dimensions, enabling batched matrix multiplication
workloads like [4, 128, 512] x [4, 512, 128] -> [4, 128, 128].
**Changes**
- Type definitions: Extended XeGPU_DpasOprType and XeGPU_DpasResType to
support rank 3-4 (previously 1-3 and 1-2)
- Op definitions: Extended dpas_mx scale operands to support rank 3-4
vectors
- Verifiers: Updated verifyDpasDimensions() to validate batch dimensions
across A, B, and result operands; updated DpasMxOp::verify() for
batch-aware scale dimension checks
- Documentation: Added comprehensive documentation explaining batch
dimensions, microarchitecture-specific matrix sizes, and 3D/4D usage
[7 lines not shown]
[MLIR][XeGPU] Extend op definitions to support 3D+: load_nd, store_nd, prefetch_nd (#199811)
**Summary**
Extend xegpu.load_nd, xegpu.store_nd, and xegpu.prefetch_nd operations
to support 3D and higher-dimensional tensor descriptors with batch
dimensions, enabling batched memory operations for workloads like [4, 8,
16] tensor loads/stores.
**Changes**
- Verifiers: Removed rank > 2 checks in LoadNdOp::verify() and
StoreNdOp::verify() to allow 3D+ tensor descriptors
- Documentation: Added comprehensive documentation explaining: Tensor
descriptors can be 1D, 2D, 3D, or higher dimensional; Batch dimensions
(leading dimensions) are unrolled to unit dimensions during lowering;
Operations execute at 2D granularity at subgroup level to match 2D block
IO hardware; Examples of 3D operations
- Tests: Added unit tests for 3D operations (load_nd_3d, store_nd_3d,
prefetch_nd_3d)
[2 lines not shown]
[LoopInterchange] Check all inner-exit LCSSA PHIs (#200860)
areInnerLoopExitPHIsSupported() returned true as soon as it saw the
reduction LCSSA PHI, skipping the user-check for any later LCSSA PHIs.
If one had a non-PHI user, legality wrongly succeeded and the transform
hit a cast<PHINode> assertion. Use continue so the remaining PHIs are
still validated.
Fixes #200811.
[CodeGen][AMDGPU] Remove premature empty subrange elimination (#201263)
This commit removes a call to `removeEmptySubRanges` inside
`SplitEditor::rewriteAssigned` which removes empty subranges that may be
expected at a later stage. The empty subranges are eliminated by a later
call to `removeEmptySubRanges`.
Fixes https://github.com/llvm/llvm-project/issues/199337.
---------
Signed-off-by: Steffen Holst Larsen <sholstla at amd.com>
[ORC] Replace ExecutorSymbolDef with ExecutorAddr in remote lookup. (#201492)
Update DylibManager and associated interfaces to return ExecutorAddrs
for remote symbols, rather than ExecutorSymbolDefs. No clients were
using the flags component of ExecutorSymbolDef, and this brings the
SimpleExecutorDylibManager implementation in OrcTargetProcess into
closer alignment with the NativeDylibManager implementation in the new
ORC runtime.
Fix metadirective loop variant lowering
Preserve the associated DO evaluation when a dynamic metadirective can
select either a loop-associated directive or a standalone fallback, so
the fallback still lowers the original loop body.
Scope temporary loop-IV data-sharing attributes to the selected variant.
Use the selected variant's collapse clause to determine how many loop IVs
to mark, avoiding DSA state leaking between alternatives.
[flang][OpenMP] Support loop-associated metadirective variants (part 3)
Enable metadirective lowering for loop-associated variants such as
`do`, `simd`, `parallel do`, and `do simd`.
When a metadirective resolves to a loop-associated directive, the
sibling DO evaluation is spliced into the metadirective's evaluation
list so existing loop lowering finds it. Loop IV data-sharing
attributes are marked at lowering time since semantic analysis cannot
know which variant will be selected. The DataSharingProcessor is also
extended to handle spliced evaluations.
This patch is part of the feature work for #188820 and stacked on top
of #194424.
Assisted with copilot and GPT-5.4
[flang][OpenMP] Support lowering of metadirective (part 2) (#194424)
Lower non-constant `user={condition(expr)}` selectors in OpenMP
metadirectives to a runtime `fir.if` / `else` selection cascade.
Dynamic user conditions are handled in two separate phases:
- Static applicability uses only selector traits that are known at
compile time.
- Guarded ranking preserves selector specificity and dynamic-condition
scores for the path where the runtime condition evaluates to true.
Lowering first filters candidates using compile-time selector traits,
then orders the remaining candidates with the normal OpenMP variant
ranking rules. If the selected candidate has a non-constant user
condition, that condition is emitted as a `fir.if` guard. When the
condition evaluates to false, the `else` branch continues selection
among the remaining candidates.
[57 lines not shown]
[RFC][CodeGen] Add generic target feature checks for intrinsics
This PR adds target-independent infrastructure for annotating LLVM intrinsics
with required subtarget feature expressions.
It introduces a TargetFeatures string field to intrinsic TableGen records.
TableGen emits an intrinsic-to-feature mapping table.
Both SelectionDAG and GlobalISel now perform this check before lowering target
intrinsics. This allows targets to opt in by annotating intrinsic definitions
directly, rather than adding custom checks during lowering, legalization, or
instruction selection.
This PR uses one AMDGPU intrinsic as an example.
[MemProf] Change default of memprof-icp-noinline-threshold to 0 (#201474)
This is no longer needed after PR172502 added support to identify
indirect callees from inlined frames.
[OpenMP] Use ext linkage for kernels handles and globals handles keep… (#200964)
… linkage
Host handles are now emmitted with external linkage to clash if two
kernels with the same name are registered. This could have happen right
now and silently corrupt the program, but it can happen more easily once
we allow users to name their kernels.
In the same patch we make global variable handles retain the linkage of
the global variable, forcing clashes for external ones and continue to
support weak use cases.
---------
Co-authored-by: Shilei Tian <i at tianshilei.me>
[clang-tidy][readability] Ignore macros in function-size check (#199549)
This patch adds an IgnoreMacros option to the readability-function-size
check.
Fixes https://github.com/llvm/llvm-project/issues/112835