[RISCV] Fix i64 gather/scatter cost on rv32 (#176105)
Fixes #175909
We compute the cost of a gather/scatter by multiplying the cost of the
scalar element type memory op by the estimated number of elements. On
rv32, though, a scalar i64 load costs 2, even if we have zve64x.
This causes the cost to diverge between a vector of f64 and a vector of
i64, even though the two operations are otherwise identical. This patch
fixes it by just using TTI::TCC_Basic as the scalar memory op cost; the
element type has already been checked to be legal at this point.
I think we have the same issue for the strided op cost, but we don't
have test coverage for it yet.
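
As a rough, self-contained sketch of the cost change (illustrative
numbers and names only, not the actual RISCVTTIImpl code):

```cpp
#include <cstdio>

// Hypothetical model of the gather/scatter cost described above.
constexpr int TCC_Basic = 1;

// Old behaviour: cost = scalar memory op cost * estimated element count.
// On rv32 a scalar i64 load is legalized to two i32 loads, so its cost
// was 2, while a scalar f64 load cost 1, making the vector costs diverge.
int gatherCostOld(int scalarMemOpCost, int numElements) {
  return scalarMemOpCost * numElements;
}

// New behaviour: the element type is already known to be legal for the
// vector unit, so charge a flat TCC_Basic per element instead.
int gatherCostNew(int numElements) {
  return TCC_Basic * numElements;
}

int main() {
  int numElements = 4;
  printf("old: i64=%d f64=%d\n", gatherCostOld(2, numElements),
         gatherCostOld(1, numElements));               // old: i64=8 f64=4
  printf("new: both=%d\n", gatherCostNew(numElements)); // new: both=4
}
```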
[DAGCombiner] Fold min/max vscale, C -> C (#174708)
This fixes a regression in #174693 caused by using ISD::UMIN to clamp
an offset into a vector address.
For (umin x, y), if we know the minimum value of x is >= the maximum
value of y, then y will always be the smaller operand and we can fold to
y.
We can do similar folds for umax, smin and smax too.
In practice the only time we get a useful ConstantRange is with VScale
and a constant RHS, so this patch limits it to this case. I tried
generalizing it with computeKnownBits but it didn't have any effect on
existing tests.
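
A minimal sketch of the range check behind the fold, using a plain
interval in place of llvm::ConstantRange (the helper names here are made
up for illustration):

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative stand-in for the known unsigned range of a value.
struct Range {
  uint64_t lo, hi; // inclusive bounds
};

// umin(x, y) folds to y when the smallest possible x is >= the largest
// possible y; the symmetric checks give the umax/smin/smax variants.
bool uminFoldsToRHS(Range x, Range y) { return x.lo >= y.hi; }

int main() {
  // E.g. x = vscale * 16 with vscale known to be in [2, 16], so x is in
  // [32, 256], and y is the constant 32: umin(x, 32) is always 32.
  Range x{32, 256}, y{32, 32};
  printf("fold umin(x, 32) -> 32? %s\n", uminFoldsToRHS(x, y) ? "yes" : "no");
}
```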
[AArch64] Prioritize STNP patterns for !nontemporal (#174138)
This patch prioritizes lowering using STNP patterns over temporal store
patterns for store instructions marked `!nontemporal`. We should
generally prioritize STNP lowering for non-temporal stores; here it
costs extra instructions for address calculation, but the non-temporal
hint is explicit from the developer, so the gain from the memory
subsystem should be more significant. It adds test cases for
register-offset stores where the mixed lowering has been observed.
The `AArch64stnp` pattern is sunk under `isLE`, but this doesn't change
functionality because it is only matched when the custom lowering code
in `AArch64ISelLowering.cpp` prepares for it, and that code is guarded
by an `isLittleEndian()` check.
I haven't observed a significant compile-time impact that could stem
from matching more failed patterns for each store.
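
For reference, a minimal and purely illustrative example of the kind of
source that produces a `!nontemporal` store; whether STNP is actually
selected still depends on the target features and addressing mode:

```cpp
#include <arm_neon.h>

// __builtin_nontemporal_store is the clang builtin that attaches the
// !nontemporal metadata to the store. The register-offset address (p + i)
// mirrors the new test cases; treat this as a sketch, not a guaranteed
// codegen result.
void store_nt(int64x2_t v, int64x2_t *p, long i) {
  __builtin_nontemporal_store(v, p + i);
}
```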