[DA] Bug fix regarding the SameSD levels (#188098)
SCEV isKnownPredicate may crash if the expressions involve different
loops. Verifying whether two loops have the same iteration space does
not require the SCEV APIs; a simple equality check is sufficient.
Moreover, no pass (not even loop fusion) needs to check SameSD levels
beyond one level. This patch therefore limits the SameSD analysis to the
single level after the common levels.
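A minimal sketch of both ideas, assuming loop bounds are represented by canonical, uniqued values so that `==` alone decides whether two iteration spaces match. `LoopBounds` and `countSameSDLevels` are hypothetical names for illustration, not DependenceAnalysis APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch only: hypothetical stand-in for a loop's iteration space. The
// fields are opaque IDs for uniqued bound expressions, so equal IDs mean
// identical bounds and a plain equality check is enough.
struct LoopBounds {
  int LowerId;
  int UpperId;
  int StepId;
  bool operator==(const LoopBounds &O) const {
    return LowerId == O.LowerId && UpperId == O.UpperId && StepId == O.StepId;
  }
};

// Counts SameSD levels after the common levels. Following the patch
// description, at most one level (the one right after them) is examined.
unsigned countSameSDLevels(const std::vector<LoopBounds> &Src,
                           const std::vector<LoopBounds> &Dst,
                           std::size_t CommonLevels) {
  std::size_t Next = CommonLevels; // first level after the common prefix
  if (Next >= Src.size() || Next >= Dst.size())
    return 0;
  return Src[Next] == Dst[Next] ? 1u : 0u;
}
```

Even when several levels past the common prefix match, the result is capped at one.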
[LoopFusion] Remove the InvalidDependencies duplicates (#187744)
If dependencesAllowFusion returns false, fuseCandidates calls
reportLoopFusion, which increments InvalidDependencies and emits an
OptimizationRemarkMissed. If both dependencesAllowFusion and
reportLoopFusion increment InvalidDependencies, the statistic is counted
twice for the same rejection.
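A standalone sketch of the intended invariant: one increment per rejected fusion, owned by the reporting function. The globals stand in for the `InvalidDependencies` statistic and the missed-remark stream; the function bodies and the remark text are hypothetical simplifications, not the LoopFusion code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch only: stand-ins for the STATISTIC counter and emitted remarks.
static unsigned InvalidDependencies = 0;
static std::vector<std::string> MissedRemarks;

// Legality check: answers the question with no statistic side effects
// (incrementing here as well would double-count).
bool dependencesAllowFusion(bool HasBlockingDependence) {
  return !HasBlockingDependence;
}

// The single place that records a rejected fusion.
void reportLoopFusion(const std::string &Reason) {
  ++InvalidDependencies;
  MissedRemarks.push_back(Reason);
}

void fuseCandidates(bool HasBlockingDependence) {
  if (!dependencesAllowFusion(HasBlockingDependence)) {
    reportLoopFusion("InvalidDependencies"); // hypothetical remark text
    return;
  }
  // ... fusion would happen here ...
}
```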
[openacc][flang] full support to handle allocatable/pointer runtime declare-action calls in ACCDeclareActionConversion (#188055)
Supported before: `fir.store`, `fir.box_addr`, and `fir.call` only for
PointerAllocate/PointerDeallocate.
Added now: fir.call support for PointerAllocateSource,
PointerDeallocatePolymorphic, AllocatableAllocate,
AllocatableAllocateSource, AllocatableDeallocate,
AllocatableDeallocatePolymorphic (found in
flang/include/flang/Runtime/allocatable.h).
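The change widens the set of recognized runtime entry points. A minimal sketch of such a lookup, using the entry-point names exactly as listed above; `isHandledDeclareActionCall` is a hypothetical helper, and the real conversion matches `fir.call` operations (whose emitted symbols carry a runtime name prefix, ignored here) rather than plain strings.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string_view>

// Sketch only: runtime calls whose fir.call forms are handled after this
// change (PointerAllocate/PointerDeallocate were already supported).
constexpr std::array<std::string_view, 8> kHandledRuntimeCalls = {
    "PointerAllocate",       "PointerDeallocate",
    "PointerAllocateSource", "PointerDeallocatePolymorphic",
    "AllocatableAllocate",   "AllocatableAllocateSource",
    "AllocatableDeallocate", "AllocatableDeallocatePolymorphic",
};

// Hypothetical helper: true if the callee name is one of the handled calls.
bool isHandledDeclareActionCall(std::string_view Callee) {
  return std::find(kHandledRuntimeCalls.begin(), kHandledRuntimeCalls.end(),
                   Callee) != kHandledRuntimeCalls.end();
}
```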
[AMDGPU][DAGCombiner][GlobalISel] Extend allMulUsesCanBeContracted with FMA/FMAD pattern
Add conservative FMA/FMAD recognition to allMulUsesCanBeContracted:
a multiply used by an existing FMA/FMAD is assumed to be contractable
(it's already being contracted elsewhere). This avoids unnecessary
contraction blocking for multiplies that feed into FMA chains.
Also adds FMA/FMAD to the FPEXT user set (fpext(fmul) --> fma is
recognized as contractable when isFPExtFoldable).
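A toy model of the FMA/FMAD recognition described above. `Op` and `Node` are hypothetical stand-ins for SelectionDAG/GlobalISel nodes and their users; the real check also covers the FNEG and FPEXT patterns and further conditions that are not modeled here.

```cpp
#include <cassert>
#include <vector>

// Sketch only: hypothetical toy IR, not SelectionDAG or GlobalISel types.
enum class Op { FMul, FAdd, FSub, FMA, FMAD, Other };

struct Node {
  Op Opc;
  std::vector<const Node *> Users; // nodes consuming this node's result
};

// True if every user of the multiply can absorb it into a fused op.
bool allMulUsesCanBeContracted(const Node &Mul) {
  for (const Node *U : Mul.Users) {
    switch (U->Opc) {
    case Op::FAdd:
    case Op::FSub:
      continue; // direct fadd/fsub use
    case Op::FMA:
    case Op::FMAD:
      // Conservative assumption from the patch: a multiply that already
      // feeds a fused op is treated as contractable.
      continue;
    default:
      return false; // any other use keeps the fmul alive
    }
  }
  return true;
}
```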
Guards all remaining FMA-chain reassociation fold sites in both
SDAG (visitFADDForFMACombine/visitFSUBForFMACombine, 8 sites) and
GISel (matchCombineFAddFpExtFMulToFMadOrFMAAggressive, 4 sites).
This re-enables contractions that were conservatively blocked in
earlier patches where the multiply had an FMA use that wasn't yet
recognized: dagcombine-fma-crash.ll and dagcombine-fma-fmad.ll
CHECK lines revert to upstream behavior.
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
[lldb] Check for arm64e debugserver in skipUnlessArm64eSupported (#188082)
Explicitly check whether we are building debugserver for arm64e. To
debug an arm64e binary, debugserver itself needs to be an arm64e
process.
Right now, it's possible to configure CMake with
`LLDB_ENABLE_ARM64E_DEBUGSERVER=Off`, and the decorator doesn't account
for that. This PR closes that gap.
[AMDGPU][DAGCombiner][GlobalISel] Extend allMulUsesCanBeContracted with FPEXT pattern
Extend the allMulUsesCanBeContracted analysis to recognize FPEXT patterns
where the multiply result flows through fpext before being used in
contractable operations (fadd, fsub). This covers:
- fmul --> fpext --> {fadd, fsub}: FPEXT folds if isFPExtFoldable
- fmul --> fpext --> fneg --> fsub: FPEXT then FNEG to FSUB
- fmul --> fneg --> fpext --> fsub: FNEG then FPEXT folds if foldable
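A toy model of the three FPEXT shapes listed above; the direct FNEG-to-FSUB case is included so the model is complete. The `Node` type and its `ExtFoldable` flag are hypothetical: the flag stands in for the result of `isFPExtFoldable`, whose real inputs are not modeled.

```cpp
#include <cassert>
#include <vector>

// Sketch only: hypothetical toy IR, not SelectionDAG or GlobalISel types.
enum class Op { FMul, FAdd, FSub, FNeg, FPExt, Other };

struct Node {
  Op Opc;
  std::vector<const Node *> Users; // consumers of this node's result
  bool ExtFoldable = false;        // FPExt only: stands in for isFPExtFoldable
};

// True if every user of N has opcode Opc.
static bool allUsersAre(const Node &N, Op Opc) {
  for (const Node *U : N.Users)
    if (U->Opc != Opc)
      return false;
  return true;
}

bool allMulUsesCanBeContracted(const Node &Mul) {
  for (const Node *U : Mul.Users) {
    switch (U->Opc) {
    case Op::FAdd:
    case Op::FSub:
      continue; // direct use
    case Op::FPExt:
      // fmul -> fpext -> {fadd, fsub} and fmul -> fpext -> fneg -> fsub
      if (!U->ExtFoldable)
        return false;
      for (const Node *V : U->Users) {
        bool Ok = V->Opc == Op::FAdd || V->Opc == Op::FSub ||
                  (V->Opc == Op::FNeg && allUsersAre(*V, Op::FSub));
        if (!Ok)
          return false;
      }
      continue;
    case Op::FNeg:
      // fmul -> fneg -> fsub and fmul -> fneg -> fpext -> fsub
      for (const Node *V : U->Users) {
        bool Ok = V->Opc == Op::FSub ||
                  (V->Opc == Op::FPExt && V->ExtFoldable &&
                   allUsersAre(*V, Op::FSub));
        if (!Ok)
          return false;
      }
      continue;
    default:
      return false;
    }
  }
  return true;
}
```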
Also adds allMulUsesCanBeContracted guards to all FPEXT fold sites in
both SDAG (visitFADDForFMACombine, visitFSUBForFMACombine) and GISel
(matchCombineFAddFpExtFMulToFMadOrFMA, matchCombineFSubFpExtFMulToFMadOrFMA,
matchCombineFSubFpExtFNegFMulToFMadOrFMA).
Fixes a missing isFPExtFoldable check in GISel's
matchCombineFSubFpExtFMulToFMadOrFMA, which could fold without verifying
that the extension is actually foldable.
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
[HLSL] Implement Texture2D::mips[][]
We implement Texture2D::mips[][], following the design in DXC.
There is a new member called `mips` with type mips_type. The member will
contain a copy of the handle for the texture.
The type `mips_type` will have a member function `operator[]` that takes
a level, and returns a `mips_slice_type`. The slice will contain the
handle and the level. It also has an operator[] member function that
takes a coordinate. It loads from the handle at that level and
coordinate and returns the value.
Assisted-by: Gemini
[LoongArch] Fix -Wunused-variable in c4b01ec20d3e845d348b0b005102b1301a8550ca
This variable was only used in assertions, which caused warnings in
release+noasserts builds.
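When a variable exists only to feed an `assert`, defining `NDEBUG` compiles the assertion away and the variable becomes unused. Two common remedies are shown below on a hypothetical helper; the commit message does not say which one this fix uses.

```cpp
#include <cassert>

// Sketch only: hypothetical helper producing a value checked by asserts.
int computeOffset() { return 4; }

int remedyVoidCast() {
  int Offset = computeOffset();
  assert(Offset % 4 == 0 && "offset must be word-aligned");
  (void)Offset; // keeps the variable "used" when NDEBUG removes the assert
  return 0;
}

int remedyMaybeUnused() {
  [[maybe_unused]] int Offset = computeOffset(); // C++17 attribute
  assert(Offset % 4 == 0 && "offset must be word-aligned");
  return 0;
}
```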
[IR][NFC] Rename UncondBrInst 'IfTrue' argument to 'Target' (#187631)
Follow-up to @slackito's suggestion about the naming of the variable and
the discussion with @aengelke in #187196.
[AMDGPU][DAGCombiner][GlobalISel] Extend allMulUsesCanBeContracted with FNEG pattern
Extend allMulUsesCanBeContracted() to recognize fmul -> fneg -> fsub
chains as contractable uses. This allows FMA contraction when a multiply
feeds an fneg that is only used by fsub operations.
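A toy model of the FNEG extension. The fold it enables rests on the identity -(x*y) - z = (-x)*y + (-z), so `fsub(fneg(fmul(x, y)), z)` can become `fma(-x, y, -z)`. `Op` and `Node` are hypothetical stand-ins, not SelectionDAG or GlobalISel types.

```cpp
#include <cassert>
#include <vector>

// Sketch only: hypothetical toy IR.
enum class Op { FMul, FAdd, FSub, FNeg, Other };

struct Node {
  Op Opc;
  std::vector<const Node *> Users; // consumers of this node's result
};

bool allMulUsesCanBeContracted(const Node &Mul) {
  for (const Node *U : Mul.Users) {
    switch (U->Opc) {
    case Op::FAdd:
    case Op::FSub:
      continue; // direct use
    case Op::FNeg:
      // fmul -> fneg -> fsub: contractable only if every fneg user is fsub.
      for (const Node *V : U->Users)
        if (V->Opc != Op::FSub)
          return false;
      continue;
    default:
      return false;
    }
  }
  return true;
}
```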
Changes:
- DAGCombiner.cpp: Add ISD::FNEG case to allMulUsesCanBeContracted()
checking that all FNEG users are ISD::FSUB. Update 1 fold site guard
in visitFSUBForFMACombine (fsub(fneg(fmul))).
- CombinerHelper.cpp: Add G_FNEG case to allMulUsesCanBeContracted()
checking that all FNEG users are G_FSUB. Update 2 fold site guards
in matchCombineFSubFNegFMulToFMadOrFMA. Fix guard ordering to check
isContractableFMul before allMulUsesCanBeContracted (cheaper check first).
- Add 7 new test functions to fma-multiple-uses-contraction.ll covering
fneg single-use, multi-use, mixed contractable/non-contractable, and
cross-pattern (P1 direct + P2 fneg) interactions.
- Update mad-combine.ll CHECK lines affected by the guard changes.
[4 lines not shown]
[DAGCombiner][GlobalISel] Prevent FMA contraction when fmul cannot be eliminated (FADD/FSUB pattern)
fmul nodes with multiple uses can currently be contracted into FMA
operations even when the fmul itself cannot be eliminated, resulting in
a redundant multiply (wasted power and compute). The existing guard
`Aggressive || N0->hasOneUse()` allows contraction under Aggressive mode
regardless of whether the multiply can be removed.
This patch tightens the guard to:
`N0->hasOneUse() || (Aggressive && allMulUsesCanBeContracted(N0))`
`allMulUsesCanBeContracted()` iterates all users of the multiply and
returns true only if every use is itself contractable into an FMA.
For this first patch, only direct FADD and FSUB uses are recognized as
contractable (FNEG, FPEXT, and FMA/FMAD patterns follow in subsequent
patches).
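A toy model contrasting the old and new guards. If any user of the multiply cannot be contracted, the fmul stays alive and contraction would compute the product twice, so the new guard only allows multi-use contraction when every use is contractable. `Op`, `Node`, `oldGuard`, and `newGuard` are hypothetical names; only direct FADD/FSUB users count, as in this first patch.

```cpp
#include <cassert>
#include <vector>

// Sketch only: hypothetical toy IR, not SelectionDAG or GlobalISel types.
enum class Op { FMul, FAdd, FSub, Other };

struct Node {
  Op Opc;
  std::vector<const Node *> Users; // one entry per use
  bool hasOneUse() const { return Users.size() == 1; }
};

// First-patch version: only direct FADD/FSUB users are contractable.
bool allMulUsesCanBeContracted(const Node &Mul) {
  for (const Node *U : Mul.Users)
    if (U->Opc != Op::FAdd && U->Opc != Op::FSub)
      return false;
  return true;
}

// Old guard: Aggressive || N0->hasOneUse()
bool oldGuard(const Node &N0, bool Aggressive) {
  return Aggressive || N0.hasOneUse();
}

// New guard: N0->hasOneUse() || (Aggressive && allMulUsesCanBeContracted(N0))
bool newGuard(const Node &N0, bool Aggressive) {
  return N0.hasOneUse() || (Aggressive && allMulUsesCanBeContracted(N0));
}
```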
The change is applied symmetrically to both DAGCombiner and GlobalISel:
- DAGCombiner: 4 fold sites in visitFADDForFMACombine (2 sites) and
[8 lines not shown]
[flang] Adding a new extension that was noticed to be intentionally in the code. (#182891)
The flang compiler intentionally issues a warning on duplicate
prefix-specs for procedures. This is not consistent with the standard,
which says "shall contain at most one of each"; other tested compilers
correctly issue an error.
It is safe to leave this as a warning, since it can be turned into an
error with the "-Werror" flag.
However, it should be noted in the Extensions document, similar to the
mention of the SAVE attribute.
[CIR] Generalize cxx alloc new size handling (#187790)
The non-constant size handling in `emitCXXNewAllocSize` incorrectly
assumed that the size value would always arrive explicitly cast to
size_t. That is only true in C++14 and later. To handle earlier
standards properly, we need the more robust checking that classic
codegen performs in its equivalent function. This change adds that
handling.
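For intuition, here is a standalone model of the kind of checking involved, written as ordinary C++ over runtime integers rather than as CIR generation. It assumes, as a simplification, that a count that may be signed (possible before C++14) must be rejected when negative, and that any overflow yields an all-ones size so the allocation fails. `computeNewAllocSize` and its parameters are hypothetical; classic codegen handles more cases (constant counts, cookies, minimum element counts from initializers).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Sketch only: NumElements holds the raw count; CountIsSigned says whether
// it must be interpreted as signed. Returns the byte size, or all-ones if
// the request is invalid or overflows.
uint64_t computeNewAllocSize(int64_t NumElements, bool CountIsSigned,
                             uint64_t ElementSize, uint64_t CookieSize) {
  constexpr uint64_t AllOnes = std::numeric_limits<uint64_t>::max();
  // A negative signed count can never describe a valid allocation.
  if (CountIsSigned && NumElements < 0)
    return AllOnes;
  uint64_t Count = static_cast<uint64_t>(NumElements);
  // Overflow-checked Count * ElementSize.
  if (ElementSize != 0 && Count > AllOnes / ElementSize)
    return AllOnes;
  uint64_t Size = Count * ElementSize;
  // Overflow-checked Size + CookieSize.
  if (Size > AllOnes - CookieSize)
    return AllOnes;
  return Size + CookieSize;
}
```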
Assisted-by: Cursor / claude-4.6-opus-high
[SLP]Improve reductions for copyables/split nodes
The original support for copyables led to a regression in x264 on
RISC-V. This patch improves the detection of copyable candidates with a
more precise profitability check, and adds an extra check that
split-node reduction is profitable.
Fixes #184313
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/185697
[AMDGPU] Add structural stall heuristic to scheduling strategies (#169617)
Implements a structural stall heuristic that considers both resource
hazards and latency constraints when selecting instructions. In coexec,
this changes the pending queue from a binary “not ready to issue”
distinction into part of a unified candidate comparison. Pending
instructions still identify structural stalls in the current cycle, but
they are now evaluated directly against available instructions by stall
cost, making the heuristics both more intuitive and more expressive.
- Add getStructuralStallCycles() to GCNSchedStrategy that computes the
  number of cycles an instruction must wait due to:
  - Resource conflicts on unbuffered resources (from the SchedModel)
  - Sequence-dependent hazards (from GCNHazardRecognizer)
- Add getHazardWaitStates() to GCNHazardRecognizer that returns the
  number of wait states until all hazards for an instruction are
  resolved, providing cycle-accurate hazard information for scheduling
  heuristics.
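A rough model of the unified comparison, under stated assumptions: each candidate's stall cost is taken as the largest of its resource-conflict stall, hazard wait states, and remaining latency (all must clear before issue), and the lower cost wins. A pending instruction with a short stall can therefore beat an available one that would stall longer. `SchedCandidate`, `stallCost`, and `pickCandidate` are hypothetical, not the GCNSchedStrategy interfaces.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Sketch only: hypothetical per-candidate stall components.
struct SchedCandidate {
  std::string Name;
  unsigned ResourceStallCycles; // unbuffered-resource conflicts
  unsigned HazardWaitStates;    // sequence-dependent hazards
  unsigned LatencyStallCycles;  // operands not yet ready (pending queue)
};

// Assumption: every cause must be resolved before issue, so take the max.
unsigned stallCost(const SchedCandidate &C) {
  return std::max({C.ResourceStallCycles, C.HazardWaitStates,
                   C.LatencyStallCycles});
}

// Lowest stall cost wins; ties keep the earlier candidate.
const SchedCandidate *pickCandidate(const std::vector<SchedCandidate> &Cands) {
  const SchedCandidate *Best = nullptr;
  for (const SchedCandidate &C : Cands)
    if (!Best || stallCost(C) < stallCost(*Best))
      Best = &C;
  return Best;
}
```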
[CIR] Add Involution trait to BitReverseOp and ByteSwapOp (#187862)
bitreverse(bitreverse(x)) == x and byte_swap(byte_swap(x)) == x are
mathematical involutions.
This adds the MLIR Involution trait to these CIR operations. The trait
encodes this property and automatically folds away the outer op when its
input is produced by the same op type, so `op(op(x))` becomes `x`.
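The property is easy to check on concrete values. The sketch below uses hand-written `bitReverse32` and `byteSwap32` helpers (hypothetical, not CIR APIs) to confirm f(f(x)) == x, and `foldInvolution` mimics on a toy op type what the trait-driven fold does: when an op's operand is produced by the same kind of op, the outer result is replaced by the inner op's operand.

```cpp
#include <cassert>
#include <cstdint>

// Sketch only: reverse the 32 bits of X.
uint32_t bitReverse32(uint32_t X) {
  uint32_t R = 0;
  for (int I = 0; I < 32; ++I) {
    R = (R << 1) | (X & 1u);
    X >>= 1;
  }
  return R;
}

// Sketch only: reverse the 4 bytes of X.
uint32_t byteSwap32(uint32_t X) {
  return ((X & 0x000000FFu) << 24) | ((X & 0x0000FF00u) << 8) |
         ((X & 0x00FF0000u) >> 8) | ((X & 0xFF000000u) >> 24);
}

// Hypothetical toy op: a kind plus a single operand (nullptr for a leaf).
enum class Kind { Value, BitReverse, ByteSwap };
struct ToyOp {
  Kind K;
  const ToyOp *Operand;
};

// Returns the value op(op(x)) folds to, or nullptr if no fold applies.
const ToyOp *foldInvolution(const ToyOp &Op) {
  if (Op.K != Kind::Value && Op.Operand && Op.Operand->K == Op.K)
    return Op.Operand->Operand;
  return nullptr;
}
```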