[MTE][Darwin] This patch extends support for the stack frame history buffer to Darwin. (#178049)
Darwin reserves slot 231 for storing a pointer to the history ring
buffer. It also uses bits 60-62 to store the size of the ring buffer.
rdar://168176496
[AMDGPU][SIInsertWaitcnts] Cleanup: Remove WaitEventMaskForInst member variable (#178030)
The event mask is constant and target-dependent, so it should be accessed
through the WCG object.
[libc] [math] Refactor fsqrtl to be header-only (#176169)
This PR refactors fsqrtl to be header-only, as discussed. No functional
change intended. Test and build files were updated as required by the
refactor.
Fixes #175335
[StaticDataLayout][MemProf] Annotate string literal hotness by making use of data access profiles. (#178333)
The change is gated under a new option
`memprof-annotate-string-literal-section-prefix` so we can flag-gate it
for rollout purposes.
A follow-up PR https://github.com/llvm/llvm-project/pull/178336/changes
updates the codegen pass to reconcile the hotness similar to the
reconciliation for other global variables.
[NFC] update doc comment on `setLoopEstimatedTripCount` (#178091)
See [this
discussion](https://github.com/llvm/llvm-project/pull/174896#issuecomment-3802361713)
prompted by PR #174896.
A 0-0 encoding in branch weights is invalid: the probability of an edge
is computed as a fraction whose numerator is the edge's weight and whose
denominator is the sum of all the weights. BPI therefore handles it as
1-1, which raises the BFI of a loop body that is meant to be cold.
The aforementioned PR addressed this, but didn't update the doc comment.
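To illustrate why a 0-0 encoding cannot work, here is a minimal sketch of the weight-to-probability computation described above. It is not LLVM's implementation; `edge_probabilities` is a hypothetical helper written only for this illustration.

```python
from fractions import Fraction


def edge_probabilities(weights):
    """Hypothetical helper: probability of each edge = weight / sum(weights)."""
    total = sum(weights)
    if total == 0:
        # With weights 0-0 the denominator is zero, so no probability exists.
        raise ValueError("all-zero branch weights do not define probabilities")
    return [Fraction(w, total) for w in weights]


if __name__ == "__main__":
    # The 1-1 fallback makes both edges equally likely, so a loop body that
    # was meant to be cold is treated as taken half of the time.
    print(edge_probabilities([1, 1]))
    try:
        edge_probabilities([0, 0])
    except ValueError as err:
        print("invalid:", err)
```

Under the 1-1 fallback each edge gets probability 1/2, which is why the block frequency of the loop body ends up higher than intended.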
[mlir][xegpu] Add initial support for layout conflict handling. (#173090)
This PR adds initial support for layout conflict resolution in XeGPU.
A layout conflict occurs when the use point of some op expects a different
layout than what the op can currently provide. The conflict needs to be
resolved by adding certain other xegpu ops.
Initially, we only handle conflicts at tensor desc use points.
[EarlyIfConversion] Add analysis for data-dependent conditional branches(#174457)
Add infrastructure to identify conditional branches on values loaded from
memory. Such branches are likely to be harder to predict accurately since
branch history (probably) provides little useful information.
This analysis walks the def-use chain from the branch condition to find
loads that feed into it. Several cases are excluded from consideration:
- Loads from constant pools (predictable values)
- Dereferenceable invariant loads (loop-invariant)
- Branches with biased probability (null checks, etc.)
- Loads not "close in program time" to the branch (must be in the same
basic block with no intervening calls)
The analysis is disabled by default behind -enable-early-ifcvt-data-dependent.
[SDPatternMatch][NFC] Use empty SDNodeFlags instead of std::optional (#178483)
I think we can avoid using std::optional for SDNodeFlags in
UnaryOpc_match.
NFC.
[AMDGPU] Introduce V_READANYLANE_B32
This is a non-convergent pseudo suitable for uniform inputs.
The MachineInstr::NoConvergent attribute allows hoisting
which is otherwise prohibited for a convergent instruction.
[Clang] Fix coro_await_elidable breaking with parenthesized expressions
The applySafeElideContext function used IgnoreImplicit() to find the
underlying CallExpr, but this didn't strip ParenExpr nodes. When code
like `co_await (fn(leaf()))` was parsed, the operand was wrapped in a
ParenExpr, causing HALO (Heap Allocation eLision Optimization) to fail.
This fix chains IgnoreImplicit()->IgnoreParens()->IgnoreImplicit() to
handle both orderings of implicit nodes and parentheses in the AST.
Fixes the issue where adding parentheses around co_await's argument
would prevent heap elision for coro_await_elidable coroutines, which
is particularly problematic since parentheses are often required in
real-world code due to co_await's tight binding with operators.
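The effect of the chained stripping can be shown with a toy model. This is not Clang's AST or API: the `Call`, `Paren`, and `Implicit` classes and the helper functions below are hypothetical stand-ins that only mimic the behaviour the message describes.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Toy stand-in for a call expression node."""
    name: str


@dataclass
class Paren:
    """Toy stand-in for a parenthesized expression node."""
    inner: object


@dataclass
class Implicit:
    """Toy stand-in for any implicit wrapper node."""
    inner: object


def ignore_implicit(expr):
    """Strip implicit wrappers only; stops at the first Paren."""
    while isinstance(expr, Implicit):
        expr = expr.inner
    return expr


def ignore_parens(expr):
    """Strip parentheses only; stops at the first Implicit."""
    while isinstance(expr, Paren):
        expr = expr.inner
    return expr


def find_call_old(expr):
    """Behaviour before the fix: implicit nodes are stripped, parens are not."""
    expr = ignore_implicit(expr)
    return expr if isinstance(expr, Call) else None


def find_call_fixed(expr):
    """Behaviour after the fix: implicit -> parens -> implicit."""
    expr = ignore_implicit(ignore_parens(ignore_implicit(expr)))
    return expr if isinstance(expr, Call) else None


if __name__ == "__main__":
    # Models the operand of `co_await (fn(leaf()))` with implicit nodes on
    # either side of the parentheses.
    operand = Implicit(Paren(Implicit(Call("fn"))))
    print(find_call_old(operand))    # None: the Paren hides the call
    print(find_call_fixed(operand))  # Call(name='fn')
```

In this model the old lookup stops at the parenthesized node and finds no call, while the chained version reaches the call whether the implicit nodes sit outside or inside the parentheses.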
[mlir][tosa] Fix pad op verifier when padding is dynamic (#177622)
When padding is dynamic, the verifier should not return failure; it
shouldn't try to check the pad values.
[DAG] SDPatternMatch - allow m_BinOp / m_c_BinOp to take an optional SDNodeFlags required for matching (#178435)
BinaryOpc_match is already wired up for this, but this change allows
m_BinOp/m_c_BinOp to be used with the required flags directly.
Updated the foldShiftToAvg folds to make use of this.
[InstCombine] Add combines for unsigned comparison of absolute value to constant (#176148)
This patch implements the following two peephole optimisations:
1. ``` abs(X) u> K --> K >= 0 ? `X + K u> 2 * K` : `false` ```;
2. If `abs(INT_MIN)` is `poison`, ```abs(X) u< K --> K >= 1 ? `X + (K -
1) u<= 2 * (K - 1)` : K != 0```.
See the following Alive2 proofs:
[1](https://alive2.llvm.org/ce/z/J2SRSv) and
[2](https://alive2.llvm.org/ce/z/tfxTrU).
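As a complement to the Alive2 proofs, both rewrites can be checked exhaustively at a small bit width. The sketch below does this for 8-bit values; it is an independent illustration, not code from the patch, and every function name in it is hypothetical. It assumes `abs(INT_MIN)` wraps to `INT_MIN` when it is not poison, and models the poison case by skipping `X == INT_MIN`.

```python
BITS = 8
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)  # bit pattern of INT_MIN


def abs_wrap(x):
    """abs on a two's-complement bit pattern; abs(INT_MIN) wraps to INT_MIN."""
    return (-x) & MASK if x & SIGN else x


def rule1_rhs(x, k):
    """K >= 0 ? (X + K u> 2 * K) : false, with wrapping arithmetic."""
    if k & SIGN:  # K is negative when read as a signed value
        return False
    return ((x + k) & MASK) > ((2 * k) & MASK)


def rule2_rhs(x, k):
    """K >= 1 ? (X + (K - 1) u<= 2 * (K - 1)) : K != 0, with wrapping."""
    if k == 0 or (k & SIGN):  # K < 1 when read as a signed value
        return k != 0
    return ((x + k - 1) & MASK) <= ((2 * (k - 1)) & MASK)


def count_mismatches(skip_int_min=True):
    """Return (rule 1 mismatches, rule 2 mismatches) over all 8-bit X and K.

    Rule 1 is always checked for every X. Rule 2 skips X == INT_MIN when
    skip_int_min is True, modelling abs(INT_MIN) being poison.
    """
    bad1 = bad2 = 0
    for x in range(1 << BITS):
        for k in range(1 << BITS):
            a = abs_wrap(x)
            if (a > k) != rule1_rhs(x, k):
                bad1 += 1
            if skip_int_min and x == SIGN:
                continue
            if (a < k) != rule2_rhs(x, k):
                bad2 += 1
    return bad1, bad2


if __name__ == "__main__":
    print(count_mismatches())                    # (0, 0)
    print(count_mismatches(skip_int_min=False))  # (0, 1)
```

The single mismatch in the second call is `X == INT_MIN` with `K == 0x80`, which shows why the second rewrite is conditioned on `abs(INT_MIN)` being poison.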