[StaticDataLayout][MemProf] Annotate string literal hotness by making use of data access profiles. (#178333)
The change is gated behind a new option,
`memprof-annotate-string-literal-section-prefix`, so that its rollout can
be controlled by a flag.
A follow-up PR (https://github.com/llvm/llvm-project/pull/178336/changes)
updates the codegen pass to reconcile the hotness in the same way as it
is reconciled for other global variables.
[NFC] update doc comment on `setLoopEstimatedTripCount` (#178091)
See [this
discussion](https://github.com/llvm/llvm-project/pull/174896#issuecomment-3802361713)
prompted by PR #174896.
A 0-0 encoding in branch weights is invalid: the probability of an edge
is computed as a fraction whose numerator is that edge's weight and whose
denominator is the sum of all the weights, so 0-0 yields 0/0. BPI
therefore handles it as 1-1, which raises the BFI of a loop body that is
meant to be cold.
The aforementioned PR addressed this, but didn't update the doc comment.
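The arithmetic behind the 0-0 problem can be sketched as follows. This is a
minimal standalone model, not LLVM code: `EdgeProbs` and
`probabilitiesFromWeights` are hypothetical names, and the even-split
fallback simply mirrors the "handled as 1-1" behaviour described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types, not part of LLVM.
struct EdgeProbs {
  double Taken;
  double NotTaken;
};

// Probability of each edge = edge weight / sum of weights.
inline EdgeProbs probabilitiesFromWeights(uint32_t TakenW, uint32_t NotTakenW) {
  // Sum in 64 bits so two large 32-bit weights cannot overflow.
  uint64_t Sum = uint64_t(TakenW) + uint64_t(NotTakenW);
  if (Sum == 0) {
    // 0/0 is undefined, so fall back to an even split,
    // which is what treating the weights as 1-1 amounts to.
    return {0.5, 0.5};
  }
  return {double(TakenW) / double(Sum), double(NotTakenW) / double(Sum)};
}

// Small self-check illustrating the fallback.
inline void checkExamples() {
  assert(probabilitiesFromWeights(0, 0).Taken == 0.5);
  assert(probabilitiesFromWeights(1, 3).Taken == 0.25);
}
```

Under this model, a loop whose weights were written as 0-0 to mean "never
taken" ends up with a 50% back-edge probability, which is how a cold loop
body can have its frequency raised.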
[mlir][xegpu] Add initial support for layout conflict handling. (#173090)
This PR adds initial support for layout conflict resolution in XeGPU.
A layout conflict occurs when an op's use point expects a different
layout than the one the op can currently provide. The conflict needs to
be resolved by inserting additional xegpu ops.
Initially, we only handle conflicts at tensor descriptor use points.
[EarlyIfConversion] Add analysis for data-dependent conditional branches (#174457)
Add infrastructure to identify conditional branches on values loaded from
memory. Such branches are likely to be harder to predict accurately since
branch history (probably) provides little useful information.
This analysis walks the def-use chain from the branch condition to find
loads that feed into it. Several cases are excluded from consideration:
- Loads from constant pools (predictable values)
- Dereferenceable invariant loads (loop-invariant)
- Branches with biased probability (null checks, etc.)
- Loads not "close in program time" to the branch (must be in the same
basic block with no intervening calls)
The analysis is disabled by default behind -enable-early-ifcvt-data-dependent.
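The exclusion rules listed above can be sketched as a simple filter. This is
a simplified standalone model, not the pass's implementation: `LoadInfo`,
`BranchInfo`, `isDataDependentBranch`, and the `BiasThreshold` default of 0.9
are all hypothetical, and treating a single qualifying load as sufficient is
an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of a load found on the branch condition's def-use chain.
struct LoadInfo {
  bool FromConstantPool;          // value is predictable
  bool DereferenceableInvariant;  // loop-invariant load
  bool SameBlockAsBranch;         // required to be "close in program time"
  bool CallBetweenLoadAndBranch;  // an intervening call breaks closeness
};

// Hypothetical summary of a conditional branch.
struct BranchInfo {
  double TakenProbability;
  std::vector<LoadInfo> FeedingLoads;
};

// Returns true if the branch depends on at least one load that survives
// every exclusion rule. The bias threshold is an assumed value.
inline bool isDataDependentBranch(const BranchInfo &B,
                                  double BiasThreshold = 0.9) {
  assert(BiasThreshold > 0.5 && BiasThreshold <= 1.0);
  // Strongly biased branches (null checks, etc.) are excluded.
  if (B.TakenProbability >= BiasThreshold ||
      B.TakenProbability <= 1.0 - BiasThreshold)
    return false;
  for (const LoadInfo &L : B.FeedingLoads) {
    if (L.FromConstantPool || L.DereferenceableInvariant)
      continue;
    if (!L.SameBlockAsBranch || L.CallBetweenLoadAndBranch)
      continue;
    return true; // found a qualifying load
  }
  return false;
}
```

The bias check runs first because it rejects the branch regardless of which
loads feed it, while the remaining rules are applied per load.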
Do not run make in jails without src
install_world() calls `make delete-old delete-old-libs`, but a jail
created with upstream pkgbase does not have src and so the command
fails. pkg removes any unneeded files, so there's no need to run
`delete-old` on upstream pkgbase jails.
Signed-off-by: Pat Maddox <pat at patmaddox.com>
Change-Id: Ic11f82d89e6059032138fb73ccb2b2ad6a6a6964
[SDPatternMatch][NFC] Use empty SDNodeFlags instead of std::optional (#178483)
I think we can avoid using std::optional for SDNodeFlags in
UnaryOpc_match.
NFC.
[AMDGPU] Introduce V_READANYLANE_B32
This is a non-convergent pseudo instruction suitable for uniform inputs.
The MachineInstr::NoConvergent attribute allows hoisting,
which is otherwise prohibited for a convergent instruction.