[MLIR][Mem2Reg] Extract shared utilities for PromotableRegionOpInterface (#188514)
The `PromotableRegionOpInterface` implementations use two helpers that
are likely useful for other dialects implementing this interface as
well:
- `updateTerminator`: Appends the reaching definition as an operand to a
block's terminator, falling back to a default when the block has no
entry (e.g. dead code).
- `replaceWithNewResults`: Clones an operation with additional result
types while preserving its regions, then replaces the original.
This PR extracts them into a common utility header so that downstream
dialects can reuse them directly.
I'm open to discussion about the location of these utilities.
[SLP] Prefer to trim equal-cost alternate-shuffle subtrees
If the trimming candidate subtree is rooted at an alternate-shuffle node
with binary ops and has the same cost as the buildvector node, prefer
the buildvector node to avoid runtime perf regressions from shuffle and
extra-operation overhead that the cost model may underestimate. Skip
trimming if the subtree contains ExtractElement nodes, since those
operate on already-materialized vectors, which may reduce
vector-to-scalar code movement and give better perf.
Reviewers: hiraditya, bababuck, fhahn, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/188272
[DA] Require `nsw` for AddRecs in the WeakCrossing SIV test (#185041)
Before the algorithm in the weak crossing SIV test starts, we need to
ensure that both addrecs are `nsw`.
clang: Reorder linker aux-triple handling
Move the IsCuda check out of the `IsCuda || isHIP` block. This keeps the
aux-triple handling from being split, for future convenience.
[MLIR][SparseTensor] Add #undef FAILURE_IF_FAILED and ERROR_IF (#188685)
Both DimLvlMapParser.cpp and LvlTypeParser.cpp define FAILURE_IF_FAILED
and ERROR_IF macros that are never undefined, which can leak into
subsequent translation units in unity builds. Add #undef at the end of
each file. See
https://discourse.llvm.org/t/rfc-enabling-unity-build/90306 for more
info.
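
The leak described above can be illustrated with a minimal sketch. Both
"files" below live in one translation unit, as they would in a unity
build. The macro bodies and the functions `parseLevel` and `countErrors`
are hypothetical and are not taken from the SparseTensor sources; only
the define-then-#undef pattern reflects the change.

```cpp
#include <string>

// ---- stands in for the first .cpp file of a unity build ----
// Hypothetical file-local helper macro (not the real definition).
#define ERROR_IF(COND, MSG)                                                    \
  if (COND) {                                                                  \
    return std::string(MSG);                                                   \
  }

std::string parseLevel(int lvl) {
  ERROR_IF(lvl < 0, "negative level")
  return "ok";
}

// Undefine at the end of the file so the name does not leak into
// whatever source file the unity build concatenates next.
#undef ERROR_IF

// ---- stands in for the second .cpp file ----
// With the #undef above, this file is free to define its own macro of
// the same name, even with a different parameter list.
#define ERROR_IF(COND) ((COND) ? 1 : 0)

int countErrors(int a, int b) { return ERROR_IF(a < 0) + ERROR_IF(b < 0); }

#undef ERROR_IF
```

Without the first `#undef`, the second `#define` would be an
incompatible macro redefinition, or, if the second file did not define
the macro at all, it would silently pick up the first file's version.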
"clauded" not coded
[MLIR][SparseTensor] Add missing #undef REMUI and DIVUI (#188686)
LoopEmitter.cpp and SparseTensorIterator.cpp define REMUI and DIVUI
macros but the existing #undef block at the end of each file omits them.
This can leak the macros into subsequent translation units in unity
builds. See https://discourse.llvm.org/t/rfc-enabling-unity-build/90306
for more info.
"clauded" not coded
[Clang] Fix constant bit widths in gpuintrin.h (#189387)
Summary:
The `ull` suffix can mean 128 bits on some architectures. Replace this
with the `stdint.h` constructor to be certain.
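
As a rough illustration of the idea, the sketch below builds 64-bit lane
masks from the standard `UINT64_C` macro instead of a `1ull` literal.
The helpers `laneBit` and `lanesBelow` are hypothetical and do not
appear in gpuintrin.h; whether the header uses this exact macro is an
assumption.

```cpp
#include <cstdint>

// Hypothetical helper: a mask with only bit `lane` set (lane < 64).
// UINT64_C(1) has type uint_least64_t, and the return type pins the
// result to exactly 64 bits, independent of how wide the target makes
// `unsigned long long`.
constexpr uint64_t laneBit(uint32_t lane) { return UINT64_C(1) << lane; }

// Hypothetical helper: a mask of all lanes strictly below `lane`.
constexpr uint64_t lanesBelow(uint32_t lane) {
  return laneBit(lane) - UINT64_C(1);
}

static_assert(sizeof(laneBit(0)) == 8, "mask is exactly 64 bits wide");
```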
Reapply "[AMDGPU] Add HWUI pressure heuristics to coexec strategy (#184929)" (#189121)
Reland https://github.com/llvm/llvm-project/pull/184929 after fixing
some issues in the NDEBUG builds.
3a640ee is unchanged from the previously approved PR; the unreviewed
portion of this PR is 9cabd8d.
[Clang] Improve scan in gpuintrin.h (#189381)
Summary:
Right now the scan checks to avoid the unspecified behavior in
`clzg(0)`. This is used as the source to the shuffle instruction, but
the argument is discarded at zero anyway. So, we simply pass unspecified
behavior to shuffle and then discard it. This should be fine. The scan
routines are expected to be optimal.
Also renames `sum` to `add`.
[lld][Hexagon] Fix out-of-range PLT branch thunks (#186545)
Linking large Hexagon binaries (e.g. ASan runtime with >8 MiB of text)
fails with R_HEX_B22_PCREL / R_HEX_PLT_B22_PCREL relocation overflow on
calls to PLT entries, even though the thunk infrastructure exists and
needsThunks is set.
needsThunk() always used s.getVA() to compute the branch destination,
even for PLT calls where the actual destination is the PLT entry. This
meant the distance check used the wrong address and failed to create
thunks when the PLT entry was out of B22_PCREL range.
Fix by using s.getPltVA() when expr == R_PLT_PC. Also override
getThunkSectionSpacing() so ThunkSections are pre-created at appropriate
intervals for large binaries.
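
The destination selection described above can be sketched as follows.
This is a simplified model, not lld's code: `Sym`, `Expr`, and
`inBranchRange` are hypothetical stand-ins, and the +/- 8 MiB reach is
an assumption inferred from the ">8 MiB of text" remark.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for a symbol (not lld's Symbol).
struct Sym {
  uint64_t va;    // address of the symbol's definition
  uint64_t pltVa; // address of the symbol's PLT entry
};

// Hypothetical stand-in for the relocation expression kind.
enum class Expr { PC, PLT_PC };

// Assumed reach of the branch: a signed byte offset of +/- 8 MiB.
constexpr int64_t kBranchReach = INT64_C(1) << 23;

constexpr bool inBranchRange(int64_t off) {
  return off >= -kBranchReach && off < kBranchReach;
}

// A thunk is needed when the real destination is out of range. For PLT
// calls the real destination is the PLT entry, not the symbol itself.
bool needsThunk(Expr expr, uint64_t branchAddr, const Sym &s) {
  uint64_t dst = (expr == Expr::PLT_PC) ? s.pltVa : s.va;
  return !inBranchRange(static_cast<int64_t>(dst - branchAddr));
}
```

With a symbol whose definition is nearby but whose PLT entry is 16 MiB
away, checking `s.va` alone would report that no thunk is needed, which
is the overflow the commit fixes.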
[LLVMABI] Create ABI Utils (#185105)
This PR introduces `ABIFunctionInfo` and surrounding utility helpers,
and is part of the set of breakout PRs to upstream the LLVM ABI lowering
library prototyped in https://github.com/llvm/llvm-project/pull/140112.
`ABIFunctionInfo` is directly analogous to `CGFunctionInfo` from Clang's
existing CodeGen pipeline, and represents an ABI lowered view of the
function signature, decoupled from both the Clang AST and LLVM IR.
`ABIArgInfo` encodes lowering decisions and currently supports Direct,
Extend, Indirect, and Ignore, which are required for our initial goal of
implementing x86-64 SysV and BPF, but this will change as the library
grows to represent more targets that need them.
This PR is a direct precursor to the implementation of `ABIInfo` in the
library as demonstrated in the PR linked above.
[AMDGPU][TTI] Update cost model for transcendental instructions to be more precise (#189430)
Introduce `getTransInstrCost` instead of `getQuarterRateInstrCost` for transcendental ops
[flang][OpenMP] Remove misplaced comment, NFC (#189449)
Remove the seemingly random comment listing clauses allowed on a DO
construct. The nearby code has nothing to do with clauses.
[Support] Move `KnownFPClass` inference from `KnownBits` to Support (#189414)
Move the logic for inferring `KnownFPClass` from known bits into the
Support library so that it can be reused, e.g., by analogous value
tracking functions in SelectionDAG.
[PowerPC] Respect chain operand for llvm.ppc.disassemble.dmr lowering (#188334)
Fix ignoring the input chain when turning llvm.ppc.disassemble.dmr into
a store.