[mlir][Affine] Fix LICM incorrectly hoisting stores from zero-trip-count loops (#189165)
The affine-loop-invariant-code-motion pass was hoisting side-effectful
operations (e.g. affine.store) out of loops whose trip count is
statically known to be zero. This caused stores to execute
unconditionally even though the loop body should never run, producing
incorrect results.
The fix skips hoisting of non-memory-effect-free ops when
getConstantTripCount returns 0. Pure/side-effect-free ops are still
eligible for hoisting because they cannot change observable program
state.
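The hoisting rule described above can be sketched as a standalone
predicate. This is a simplified model, not the pass's code: `OpModel`
and `mayHoist` are hypothetical names, and the real pass applies further
legality checks to side-effecting ops that are omitted here.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for an operation inside an affine loop body.
struct OpModel {
  bool isLoopInvariant;    // all operands are defined outside the loop
  bool isMemoryEffectFree; // the op has no memory side effects
};

// Returns true if hoisting `op` is acceptable under the rule above.
// `tripCount` is std::nullopt when the trip count is not statically known.
bool mayHoist(const OpModel &op, std::optional<uint64_t> tripCount) {
  if (!op.isLoopInvariant)
    return false;
  // A loop known to run zero times must not have its side effects executed.
  if (tripCount && *tripCount == 0 && !op.isMemoryEffectFree)
    return false;
  return true;
}
```

In this model a store in a zero-trip loop stays put, while a pure
arithmetic op in the same loop may still be hoisted.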
Fixes #128273
Assisted-by: Claude Code
[MachinePipeliner] Remove isLoopCarriedDep and use DDG (#174394)
This patch completely removes `isLoopCarriedDep`, which was used
previously to identify loop-carried dependencies in the DAG. Now that we
have the DDG representation, this special handling is no longer
necessary. Simply replacing its usage with the DDG causes several tests
to fail, since cycle detection takes some of the validation-only edges
in the DDG into account. To address this, this patch introduces extra
edges in the DDG, which are used only for cycle detection and not for
other parts of the pass (e.g., scheduling). The extra edges are
determined to preserve the existing behavior of the pass as closely as
possible, which makes the predicates for adding them somewhat complex.
Split off from #135148; this is the final patch in that series.
[InstCombine][NFC] Expose isKnownExactCastIntToFP as a public method
stack-info: PR: https://github.com/llvm/llvm-project/pull/190327, branch: users/SavchenkoValeriy/feat/instcombine/fcvtzu_fcvtzs_roundtrip/stack/1
[CIR] Auto-generate matchAndRewrite for one-to-one CIR-to-LLVM lowerings
When a CIR op specifies a non-empty `llvmOp` field, the lowering
emitter now generates the `matchAndRewrite` body that converts the
result type and forwards all operands to the corresponding LLVM op.
This removes 27 boilerplate lowering patterns from LowerToLLVM.cpp.
Ops needing custom logic (FMaxNumOp/FMinNumOp for FastmathFlags::nsz)
override `llvmOp = ""` to retain hand-written implementations.
Also fixes llvmOp names (TruncOp -> FTruncOp, FloorOp -> FFloorOp)
and adds a diagnostic rejecting conflicting llvmOp + custom constructor.
[MLIR][MemRef] Fix AllocOp/AllocaOp flattening domination violation (#188980)
The generic MemRefRewritePattern handles AllocOp/AllocaOp by calling
getFlattenMemrefAndOffset with the op's own result as the source memref.
This inserts ExtractStridedMetadataOp and ReinterpretCastOp that consume
op.result before the alloc op itself in the block. After
replaceOpWithNewOp, op.result is RAUW'd to the new ReinterpretCastOp
result, leaving those earlier ops with forward references — a domination
violation caught by MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS.
Replace the AllocOp/AllocaOp cases in MemRefRewritePattern with a
dedicated AllocLikeFlattenPattern that never touches op.result until the
final replaceOpWithNewOp:
- sizes come from op.getMixedSizes() (operands, not the result)
- strides come from getStridesAndOffset on the MemRefType
- the flat allocation size is computed via
getLinearizedMemRefOffsetAndSize plus the static base offset so the
buffer covers [0, offset+extent)
- castAllocResult is simplified to take the pre-computed sizes and
[10 lines not shown]
[BOLT] Move extern "C" out of unnamed namespace (#190282)
GCC 15 changes how it interprets extern "C" in unnamed namespaces and
gives the variable internal linkage.
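A minimal sketch of the kind of move this describes. The variable name
`hypothetical_flag` and the helper `readFlag` are made up for
illustration and are not the symbols touched by the patch.

```cpp
// Before (sketch): the declaration sits inside an unnamed namespace, which,
// per the description above, GCC 15 treats as internal linkage.
//
//   namespace {
//   extern "C" { int hypothetical_flag = 7; }
//   } // namespace

// After: the extern "C" block lives at global scope, outside the unnamed
// namespace, so the variable is declared with external C linkage.
extern "C" {
int hypothetical_flag = 7;
}

namespace {
// Helpers that should stay file-local can remain in the unnamed namespace.
int readFlag() { return hypothetical_flag; }
} // namespace
```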
[MLIR][Affine] Fix null operands in simplifyConstrainedMinMaxOp (#189246)
`mlir::affine::simplifyConstrainedMinMaxOp` called
`canonicalizeMapAndOperands` with `newOperands` that could contain null
`Value()`s. These nulls came from
`unpackOptionalValues(constraints.getMaybeValues(), newOperands)` where
internal constraint variables added by `appendDimVar` (for `dimOp`,
`dimOpBound`, and `resultDimStart*`) have no associated SSA values.
Passing null Values to `canonicalizeMapAndOperands` risks undefined
behavior:
- `seenDims.find(null_value)` in the DenseMap causes all null operands
to collide at the same key, producing incorrect dim remapping.
- Any null operand that remains referenced in the result map would
propagate as a null Value into `AffineValueMap`, crashing callers that
try to use those operands to create ops.
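The key collision in the first bullet can be illustrated with a toy
deduplication map. This is not the code of
`canonicalizeMapAndOperands`: `ValueModel` and `remapOperands` are
hypothetical, a null pointer stands in for a null `Value()`, and
`std::unordered_map` stands in for the DenseMap.

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for mlir::Value; nullptr models a null Value().
using ValueModel = const int *;

// Deduplicates operands by first occurrence and returns, for each input
// position, the position it is remapped to.
std::vector<unsigned> remapOperands(const std::vector<ValueModel> &operands) {
  std::unordered_map<ValueModel, unsigned> seen;
  std::vector<unsigned> remap;
  for (ValueModel v : operands) {
    auto it = seen.find(v);
    if (it == seen.end())
      it = seen.emplace(v, static_cast<unsigned>(seen.size())).first;
    // Every null operand finds the same entry, so distinct internal
    // variables end up sharing one remapped position.
    remap.push_back(it->second);
  }
  return remap;
}
```

With operands `{&a, nullptr, &b, nullptr}` both null entries map to the
same position, even though they may stand for different constraint
variables.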
Fix: Before calling `canonicalizeMapAndOperands`, filter null operands
from `newOperands` by replacing their dim/symbol positions in `newMap`
[6 lines not shown]
[mlir][IntRangeAnalysis] Fix assertion in inferAffineExpr for mod with range crossing modulus boundary (#188842)
The "small range with constant divisor" optimization in
`inferAffineExpr` for `AffineExprKind::Mod` assumed that if the dividend
range span (`lhsMax - lhsMin`) is less than the divisor, then the mod
results form a contiguous range. This is not always true, as the range
can straddle a modulus boundary.
For example, `[14, 17] mod 8`:
- Span is 3 < 8, so the old condition passed
- But `14%8=6` and `17%8=1` (wraps at 16)
- `umin=6, umax=1` → assertion `umin.ule(umax)` fails
The fix adds a same-quotient check (`lhsMin/rhs == lhsMax/rhs`) to
ensure both endpoints fall within the same modular period. When they
don't, we fall back to the conservative `[0, divisor-1]` range.
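The same-quotient check can be sketched on plain unsigned integers. This
is an illustration only: `modRange` is a hypothetical helper and it uses
`uint64_t` rather than the arbitrary-precision integers the analysis
works with.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Returns the inclusive range [umin, umax] of `x % divisor` for every x in
// [lhsMin, lhsMax].
std::pair<uint64_t, uint64_t> modRange(uint64_t lhsMin, uint64_t lhsMax,
                                       uint64_t divisor) {
  assert(divisor != 0 && lhsMin <= lhsMax);
  // Equal quotients mean both endpoints lie in the same modular period, so
  // the remainders grow monotonically from lhsMin to lhsMax.
  if (lhsMin / divisor == lhsMax / divisor)
    return {lhsMin % divisor, lhsMax % divisor};
  // The range straddles a multiple of the divisor: use the conservative
  // result instead.
  return {0, divisor - 1};
}
```

For `[14, 17] mod 8` the quotients are 1 and 2, so the sketch returns
`[0, 7]`; for `[17, 22] mod 8` both quotients are 2 and the result is
`[1, 6]`.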
Assisted-by: Cursor (Claude)
Signed-off-by: Yu-Zhewen <zhewenyu at amd.com>
[NFC][analyzer] Eliminate SwitchNodeBuilder (#188096)
This commit removes the class `SwitchNodeBuilder` because it just
obscured the logic of switch handling by hiding some parts of it in
another source file.
[lldb][Module] Only call LoadScriptingResourceInTarget via ModuleList (#190136)
This patch is motivated by
https://github.com/llvm/llvm-project/pull/189943, where we would like to
print the "these module scripts weren't loaded" warning for *all*
modules batched together. I.e., we want to print the warning *after* all
the script loading attempts, not from within each attempt.
To do so we want to hoist the `ReportWarning` calls in
`Module::LoadScriptingResourceInTarget` out into the callsites. But if
we do that, the callers have to remember to print the warnings. To avoid
this, we redirect all callsites to use
`ModuleList::LoadScriptingResourceInTarget`, which will be responsible
for printing the warnings.
To avoid future accidental uses of
`Module::LoadScriptingResourceInTarget` I moved the API into
`ModuleList` and made it `private`.
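The batching idea can be sketched independently of LLDB. Everything
below is hypothetical (`ModuleModel`, `loadScriptingResource`,
`loadAll`, and the warning format); it only shows the shape of
"attempt each module, then report once from the list-level entry point".

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a module and the outcome of loading its script.
struct ModuleModel {
  std::string Name;
  bool ScriptLoads;
};

// Per-module attempt: reports nothing itself, only returns the outcome.
bool loadScriptingResource(const ModuleModel &m) { return m.ScriptLoads; }

// List-level entry point: tries every module, then builds one warning that
// covers all failures. Returns an empty string when everything loaded.
std::string loadAll(const std::vector<ModuleModel> &modules) {
  std::vector<std::string> failed;
  for (const ModuleModel &m : modules)
    if (!loadScriptingResource(m))
      failed.push_back(m.Name);
  if (failed.empty())
    return "";
  std::string warning = "these module scripts weren't loaded:";
  for (const std::string &name : failed)
    warning += " " + name;
  return warning;
}
```

Because callers only see the list-level function, none of them has to
remember to emit the warning.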
[mlir][reducer] Remove the restriction that OptReductionPass must be a ModuleOp (#189038)
This PR makes the pass more generic by removing the ModuleOp restriction
and reimplements the logic using a standalone PassManager. The
isInteresting method now accepts an Operation* for better flexibility,
and a dedicated test directory has been added to organize the
OptReductionPass tests.
[AMDGPU] Add !noalias metadata to mem-accessing calls w/o pointer args (#188949)
addAliasScopeMetadata in AMDGPULowerKernelArguments skips instructions
with empty PtrArgs, including memory-accessing calls that have no
pointer arguments (e.g. builtins like threadIdx()). Because these calls
never receive !noalias metadata, ScopedNoAliasAA cannot prove they don't
alias noalias kernel arguments. MemorySSA then conservatively reports
them as clobbers, which prevents AMDGPUAnnotateUniformValues from
marking loads as noclobber, blocking scalarization (s_load) and forcing
expensive vector loads (global_load) instead.
Fix by adding all noalias kernel argument scopes to !noalias metadata
for memory-accessing instructions with no pointer arguments. Since such
instructions cannot access memory through any kernel pointer argument,
all noalias scopes are safe to apply.
This fixes a performance regression in rocFFT introduced by bd9668df0f00
("[AMDGPU] Propagate alias information in AMDGPULowerKernelArguments").
Assisted-by: Claude Opus
[clang-doc] Prepare Info types for Arena allocation (#190046)
Info structures that are allocated directly in an Arena cannot have
members with nontrivial destructors, or we will leak memory. Before we
migrate them, we can replace growable vector types with intrusive lists.
This introduces some slight overhead as these types now have new pointer
members for use in ilists in later patches.
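The constraint can be sketched with a toy arena. This is an
illustration under assumptions, not clang-doc's code: `Arena`,
`ParamNode`, `InfoModel`, `append`, and `length` are hypothetical, and
the arena below is a simplified stand-in rather than LLVM's allocator or
its ilist types.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Toy arena: storage is released in bulk when the arena dies, and the
// destructors of the objects placed in it are never run.
class Arena {
  std::vector<std::unique_ptr<std::byte[]>> Blocks;

public:
  template <typename T, typename... Args> T *create(Args &&...args) {
    // A member such as std::vector would leak its heap buffer here,
    // because ~T() is never called.
    static_assert(std::is_trivially_destructible<T>::value,
                  "arena objects must not need a destructor");
    static_assert(alignof(T) <= alignof(std::max_align_t),
                  "over-aligned types are not supported by this sketch");
    Blocks.push_back(std::make_unique<std::byte[]>(sizeof(T)));
    return new (Blocks.back().get()) T{std::forward<Args>(args)...};
  }
};

// Intrusive list node: the link pointer lives inside the element itself.
struct ParamNode {
  int Value;
  ParamNode *Next = nullptr;
};

// Holds raw head/tail pointers instead of a growable vector, so it stays
// trivially destructible.
struct InfoModel {
  ParamNode *Head = nullptr;
  ParamNode *Tail = nullptr;
};

void append(InfoModel &info, ParamNode *node) {
  if (info.Tail)
    info.Tail->Next = node;
  else
    info.Head = node;
  info.Tail = node;
}

std::size_t length(const InfoModel &info) {
  std::size_t n = 0;
  for (const ParamNode *p = info.Head; p; p = p->Next)
    ++n;
  return n;
}
```

The extra `Next` pointer in each node is the kind of per-element
overhead mentioned above.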
| Metric | Baseline | Prev | This | Culm% | Seq% |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Time | 920.5s | 1005.7s | 1010.5s | +9.8% | +0.5% |
| Memory | 86.0G | 42.1G | 42.9G | -50.2% | +1.8% |
| Benchmark | Baseline | Prev | This | Culm% | Seq% |
| :--- | :--- | :--- | :--- | :--- | :--- |
| BM_BitcodeReader_Scale/10 | 67.9us | 68.6us | 69.2us | +1.9% | +0.9% |
| BM_BitcodeReader_Scale/10000 | 70.5ms | 21.3ms | 21.9ms | -68.9% | +2.8% |
[32 lines not shown]