[RISCV] Combine ADDD with UMUL_LOHI/SMUL_LOHI into WMACCU/WMACC (#180383)
Combine the pattern:
ADDD(addlo, addhi, UMUL_LOHI(x, y).0, UMUL_LOHI(x, y).1)
into:
WMACCU(x, y, addlo, addhi)
And similarly for SMUL_LOHI -> WMACC.
This patch was written with AI, but I reviewed it carefully.
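For illustration, the equivalence behind this combine can be modeled with plain integers. This is only a sketch, assuming XLEN = 32 and assuming WMACCU computes a 2*XLEN accumulator plus the unsigned XLEN x XLEN product; the helper names below are hypothetical models and not LLVM APIs.

```python
XLEN = 32                       # assumption: RV32, so the wide value is 64 bits
MASK = (1 << XLEN) - 1
WIDE_MASK = (1 << (2 * XLEN)) - 1

def umul_lohi(x, y):
    """Model of UMUL_LOHI: unsigned XLEN x XLEN multiply, returns (lo, hi)."""
    p = (x & MASK) * (y & MASK)
    return p & MASK, p >> XLEN

def addd(alo, ahi, blo, bhi):
    """Model of ADDD: add two 2*XLEN values given as (lo, hi) pairs, wrapping."""
    total = (((ahi << XLEN) | alo) + ((bhi << XLEN) | blo)) & WIDE_MASK
    return total & MASK, total >> XLEN

def wmaccu(x, y, addlo, addhi):
    """Assumed WMACCU semantics: (addhi:addlo) + zext(x) * zext(y), wrapping."""
    acc = (addhi << XLEN) | addlo
    total = (acc + (x & MASK) * (y & MASK)) & WIDE_MASK
    return total & MASK, total >> XLEN

def combine_is_equivalent(x, y, addlo, addhi):
    """Check that ADDD fed by UMUL_LOHI matches the single WMACCU node."""
    lo, hi = umul_lohi(x, y)
    return addd(addlo, addhi, lo, hi) == wmaccu(x, y, addlo, addhi)
```

The signed SMUL_LOHI -> WMACC case would follow the same shape with sign-extended operands.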
[RISCV] Emit MULHU/MULHS/UMUL_LOHI/SMUL_LOHI from our custom XLen*2 expansion. (#180379)
We already do all the checks necessary in order to prioritize
MULHU/MULHS/UMUL_LOHI/SMUL_LOHI over MULHSU/WMULSU. We might as
well just emit the nodes instead of letting generic type legalization
redo the checks.
This is slightly different from the default legalization: because we
don't have access to ExpandInteger, we have to emit TRUNCATE and
BUILD_PAIR nodes. Not sure if this will result in any differences in practice.
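As a rough model of the idea, the sketch below shows why a 2*XLEN multiply whose operands are both zero-extended (or both sign-extended) from XLEN can be rebuilt from a truncate, a UMUL_LOHI/SMUL_LOHI, and a pair construction. It assumes XLEN = 32; the functions are hypothetical integer models, not the SelectionDAG code from the patch.

```python
XLEN = 32                       # assumption: RV32, wide type is 64 bits
MASK = (1 << XLEN) - 1
WIDE_MASK = (1 << (2 * XLEN)) - 1

def truncate(v):
    """Model of TRUNCATE: keep the low XLEN bits."""
    return v & MASK

def build_pair(lo, hi):
    """Model of BUILD_PAIR: glue two XLEN halves into one 2*XLEN value."""
    return ((hi & MASK) << XLEN) | (lo & MASK)

def to_signed(v):
    """Interpret the low XLEN bits of v as a two's-complement integer."""
    v &= MASK
    return v - (1 << XLEN) if v >> (XLEN - 1) else v

def sext(v):
    """Sign-extend an XLEN value to 2*XLEN bits."""
    return to_signed(v) & WIDE_MASK

def umul_lohi(x, y):
    p = (x & MASK) * (y & MASK)
    return p & MASK, p >> XLEN

def smul_lohi(x, y):
    p = (to_signed(x) * to_signed(y)) & WIDE_MASK
    return p & MASK, p >> XLEN

def wide_mul_zext(a, b):
    """a and b are 2*XLEN values known to be zero-extended from XLEN."""
    return build_pair(*umul_lohi(truncate(a), truncate(b)))

def wide_mul_sext(a, b):
    """a and b are 2*XLEN values known to be sign-extended from XLEN."""
    return build_pair(*smul_lohi(truncate(a), truncate(b)))
```

In both cases the result agrees with the full 2*XLEN multiply reduced modulo 2^(2*XLEN).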
[Docs][Intrinsics] Fix the name of llvm.memset.inline in the documentation (#180373)
The name of the LLVM intrinsic `llvm.memset.inline` encodes the types of
the destination pointer and the size. There is no second pointer.
Moreover, the tests already verify that the generated code uses
`@llvm.memset.inline.p0.i32` and `@llvm.memset.inline.p0.i64`. So make
the documentation reference these names as well.
Fixes: https://github.com/llvm/llvm-project/issues/163454
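To make the naming scheme concrete, here is a tiny sketch that builds the suffixed name from the two overloaded types. The helper `memset_inline_name` is hypothetical and only mirrors the two names quoted above; it is not part of LLVM.

```python
def memset_inline_name(addrspace: int, size_bits: int) -> str:
    """Build the suffixed intrinsic name (hypothetical helper).

    The suffix has one component for the destination pointer (p<addrspace>)
    and one for the integer type of the size argument (i<bits>). There is no
    second pointer component, since memset has no source pointer.
    """
    return f"llvm.memset.inline.p{addrspace}.i{size_bits}"
```

Calling it with address space 0 and a 32-bit or 64-bit size type reproduces the two names used by the tests.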
[RISCV] Add support for forming WMULSU during type legalization. (#180331)
Add a DAG combine to turn it into MULHSU if the lower half result
is unused.
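The relationship between the two nodes can be sketched with integers. This assumes XLEN = 32 and the usual RISC-V meaning of MULHSU (high half of signed x times unsigned y), and it assumes WMULSU yields both halves of that same product; the function names are hypothetical models.

```python
XLEN = 32                       # assumption: RV32
MASK = (1 << XLEN) - 1
WIDE_MASK = (1 << (2 * XLEN)) - 1

def to_signed(v):
    """Interpret the low XLEN bits of v as a two's-complement integer."""
    v &= MASK
    return v - (1 << XLEN) if v >> (XLEN - 1) else v

def wmulsu(x, y):
    """Assumed WMULSU semantics: signed x times unsigned y, returns (lo, hi)."""
    p = (to_signed(x) * (y & MASK)) & WIDE_MASK
    return p & MASK, p >> XLEN

def mulhsu(x, y):
    """MULHSU: only the high XLEN bits of signed x times unsigned y."""
    return ((to_signed(x) * (y & MASK)) >> XLEN) & MASK

def high_half_matches(x, y):
    """If the low half is unused, WMULSU can be replaced by MULHSU."""
    _, hi = wmulsu(x, y)
    return hi == mulhsu(x, y)
```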
[VPlan] Use PredBB's terminator as insert point for VPIRPhi extracts.
Use PredBB's terminator as insert point in VPIRPhi::execute to make sure
the extracts are placed after any possibly sunk instructions.
Fixes https://github.com/llvm/llvm-project/issues/180363.
[AArch64] Consider MOVaddr* as cheap if fuse-adrp-add
These pseudo-instructions usually translate into a pair of adrp+add and
have a single cycle latency on some micro-architectures.
[ProfCheck] Add prof data for lowering of @llvm.cond.loop
When there is no target-specific lowering of @llvm.cond.loop, it is
lowered into a simple loop by PreISelIntrinsicLowering. Mark the weights
of the branch into the no-return loop as unknown, given that we do not
have value metadata. This fixes the profcheck test for this feature.
Reviewers: mtrofin, alanzhao1, snehasish, pcc
Pull Request: https://github.com/llvm/llvm-project/pull/180390
[ProfCheck] Add utility to get a MDNode for unknown branch weights
In some cases it is non-trivial to get access to a branch/select
instruction, and the helper function that creates the branch/select of
interest takes an MDNode for branch weights. Add a helper that creates
an MDNode for unknown branch weights if the function is profiled, to
handle this case.
Reviewers: mtrofin, snehasish, alanzhao1
Pull Request: https://github.com/llvm/llvm-project/pull/180389
[VPlan] Pass underlying instr to getMemoryOpCost in ::computeCost.
Pass underlying instruction to getMemoryOpCost in
VPReplicateRecipe::computeCost if UsedByLoadStoreAddress is true.
Some targets use the underlying instruction to improve costs,
and this is needed to match the legacy cost model.
Fixes https://github.com/llvm/llvm-project/issues/177780.
Fixes https://github.com/llvm/llvm-project/issues/177772.
[VPlan] Fall back to legacy cost model if PtrSCEV is nullptr.
There are some cases when PtrSCEV can be nullptr. Fall back to the legacy
cost model to avoid calling isLoopInvariant with nullptr.
Fixes a crash after 0c4f8094939d2.