[AMDGPU][SIInsertWaitcnts][NFC] Make a few WaitcntBracket member functions private (#180018)
The user of the WaitcntBrackets class shouldn't need to know about how
the scoreboard has been implemented internally. So I think it is best to
provide a higher level API that hides things like scoreUB, scoreLB and
score ranges.
This patch makes getScoreUB(), getScoreLB() and getScoreRange() private
and introduces new functions that don't expose the internal
implementation:
- getOutstanding(T)
- hasPendingVMEM(VMEMID, T)
- empty(T)
I also noticed that getSGPRScore() and getVMemScore() are not used
externally so these are now private.
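
For illustration only, the encapsulation idea could look roughly like the sketch below. ToyBrackets, CounterKind, issue() and waitUntil() are hypothetical names invented for this example; only getOutstanding() and empty() mirror names from the description, and their real signatures may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for a per-counter scoreboard.
// This is not the actual SIInsertWaitcnts code.
class ToyBrackets {
public:
  enum CounterKind { LoadCnt = 0, StoreCnt = 1, NumCounters = 2 };

  // Record one more in-flight event for counter T.
  void issue(CounterKind T) { ++ScoreUB[T]; }

  // Model a wait that leaves at most Allowed events of kind T in flight.
  void waitUntil(CounterKind T, uint32_t Allowed) {
    if (getScoreRange(T) > Allowed)
      ScoreLB[T] = ScoreUB[T] - Allowed;
  }

  // Higher-level queries: callers never see the bounds themselves.
  uint32_t getOutstanding(CounterKind T) const { return getScoreRange(T); }
  bool empty(CounterKind T) const { return getScoreRange(T) == 0; }

private:
  // Internal detail: number of events between the lower and upper bound.
  uint32_t getScoreRange(CounterKind T) const {
    return ScoreUB[T] - ScoreLB[T];
  }

  uint32_t ScoreLB[NumCounters] = {0, 0};
  uint32_t ScoreUB[NumCounters] = {0, 0};
};
```

With this shape, the bound representation can change without touching callers, which is the motivation given above.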
[VPlan] Compute predicated load/store costs in VPlan. (NFC) (#179129)
Update VPReplicateRecipe::computeCost to compute predicated load/store
costs directly, unless the pointer is uniform. In that case, the legacy
cost model uses different logic, which will be migrated separately.
PR: https://github.com/llvm/llvm-project/pull/179129
[clang-format][doc] Add GNU style link in KeepFormFeed option (#176654)
It was not clear from the description what this option does.
Added a small example to demonstrate its behavior.
[DSE] Handle variable offsets with sized dead_on_return (#180364)
With a sized dead_on_return, we must not eliminate stores if they
are to a pointer with a variable offset from the underlying object
marked dead_on_return. This manifested as an assertion failure as
BaseValue/V ended up not being equal. It's possible we could do a range
analysis to try and prove the variable offset stays within bounds, but
this case seems to come up relatively rarely (only reproducible with a
UBSan build of LLVM) and is probably not worth the compile time.
Fixes #180361.
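
A minimal sketch of the conservative rule described above, under stated assumptions: ToyStore and isCoveredByDeadOnReturn are hypothetical names made up for this example, and the bounds check for constant offsets is an assumption about how a sized region would be tested, not a copy of the DSE code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a store into an argument marked with a sized
// dead_on_return; these names are illustrative, not taken from DSE.
struct ToyStore {
  bool HasConstantOffset; // false: offset is only known at run time
  uint64_t Offset;        // byte offset from the underlying object (if constant)
  uint64_t Size;          // number of bytes written
};

// Conservative check: a store is only treated as dead at return when its
// offset is a compile-time constant and it lies fully inside the first
// DeadBytes bytes of the object.
bool isCoveredByDeadOnReturn(const ToyStore &S, uint64_t DeadBytes) {
  if (!S.HasConstantOffset)
    return false; // variable offset: might land outside the dead region
  if (S.Offset > DeadBytes)
    return false;
  return S.Size <= DeadBytes - S.Offset;
}
```

Returning false for every variable offset trades a few missed eliminations for not running a range analysis, which matches the compile-time argument made above.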
[ProfCheck] Add PreISelIntrinsicLoweringTest to XFail
Introduced in 191af6c254a83c9eb72df92a5db534d8fd4f0701. Should not be a
complicated fix, but move to the xfail list for now so the bot gets back
to green while we work on fixing.
[AMDGPU][GlobalISel] Add lowering for G_FMODF (#180152)
Add generic expansion for G_FMODF matching the SelectionDAG
implementation.
Enable G_FMODF lowering for AMDGPU with tests.
Related: #179434
[clang][NFC] Replace includes of "Attrs.inc" with "Attr.h" (#180356)
"clang/AST/Attrs.inc" is not a self-contained header and is not intended
to be included directly. Instead, "clang/AST/Attr.h" is the header that
users need.
[lld][ELF] Remove redundant size check in nopInstrFill
We checked twice whether size is equal to zero. This is not necessary and
makes the code a little less readable.
Reviewers: MaskRay, tmsri
Pull Request: https://github.com/llvm/llvm-project/pull/180304
[InferAddressSpaces] Initialize op(generic const, generic const, ...) -> generic (#172143)
Fixes #171890
If the pointer operands of an instruction are all constants with generic
AS, the inferred AS of the instruction always ends up uninitialized.
The rewrite process then skips cloning the instruction, producing
invalid IR.
This patch fixes it by inferring the AS of this kind of instruction as
flat. Maybe we can fold the operator with all constants to get better
performance, but I think this case is rare in the real world.
[HIP][Sema] Fix incorrect CK_NoOp for lvalue-to-rvalue conversion in … (#180314)
…builtin args
The HIP implicit address space cast for builtin pointer arguments used
CK_NoOp to convert lvalue args to rvalues.
This caused an assertion failure in LifetimeSafety analysis:
Assertion `Dst->getLength() == Src->getLength()` failed
in FactsGenerator::flow() in some cases.
Use DefaultLvalueConversion which correctly emits CK_LValueToRValue.
[GlobalISel] add G_ROTL, G_ROTR to computeKnownBits (#166365)
Addresses one of the subtasks of #150515.
The code is ported from `SelectionDAG::computeKnownBits` and tests are
loosely based on `AArch64/GlobalISel/knownbits-shl.mir`.
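
As a rough illustration of the idea for a constant rotate amount: rotating a value moves every bit, known or not, to a new position, so the known-zero and known-one masks can be rotated the same way. ToyKnownBits, rotl32, knownBitsRotl and knownBitsRotr are hypothetical helpers written for this sketch; the patch itself works on LLVM's own known-bits types, and variable rotate amounts are not covered here.

```cpp
#include <cassert>
#include <cstdint>

// Toy 32-bit known-bits record: a bit set in Zero is known to be 0,
// a bit set in One is known to be 1. Hypothetical; not llvm::KnownBits.
struct ToyKnownBits {
  uint32_t Zero;
  uint32_t One;
};

// Rotate left by Amt (taken modulo 32), avoiding an undefined shift by 32.
inline uint32_t rotl32(uint32_t V, unsigned Amt) {
  Amt %= 32;
  return Amt == 0 ? V : (V << Amt) | (V >> (32 - Amt));
}

// Known bits of rotl(Src, Amt) for a constant Amt: rotate both masks.
inline ToyKnownBits knownBitsRotl(ToyKnownBits Src, unsigned Amt) {
  return ToyKnownBits{rotl32(Src.Zero, Amt), rotl32(Src.One, Amt)};
}

// Rotating right by Amt is rotating left by (32 - Amt) modulo 32.
inline ToyKnownBits knownBitsRotr(ToyKnownBits Src, unsigned Amt) {
  return knownBitsRotl(Src, (32 - Amt % 32) % 32);
}
```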
[LSR] Support SCEVPtrToAddr in SCEVDbgValueBuilder.
Allow SCEVPtrToAddr as a cast in the assertion in SCEVDbgValueBuilder.
SCEVPtrToAddr is handled similarly to SCEVPtrToInt.
Fixes a crash with debug info after bd40d1de9c9ee, which started to
generate ptrtoaddr instead of ptrtoint expressions.
[mlir][Interfaces] Add `ExecutionProgressOpInterface` + folding pattern (#179039)
Add the `ExecutionProgressOpInterface` with an interface method to check
if an operation "must progress". Add `mustProgress` attributes to
`scf.for` and `scf.while` (default value is "true").
`mustProgress` corresponds to the [`llvm.loop.mustprogress`
metadata](https://llvm.org/docs/LangRef.html#langref-llvm-loop-mustprogress).
Also add a canonicalization pattern to erase `RegionBranchOpInterface`
ops that must progress but loop infinitely (and are non-side-effecting).
This canonicalization pattern is enabled for `scf.for` and `scf.while`.
RFC: https://discourse.llvm.org/t/infinite-loops-and-dead-code/89530
[mlir] Fix build after #179039 (#179180)
Fix build after #179039.
[ValueTracking] Propagate sign information out of loop (#175590)
LLVM converts a sqrt libcall to an intrinsic call if the argument is known
to be in range (greater than or equal to 0.0). In the reported case the
compiler was not able to deduce the non-negativity on its own. Extend
ValueTracking to understand such loops.
Fixes llvm/llvm-project#174813
[mlir][AMDGPU] Avoid verifier crash in DPPOp on vector operand types (#178887)
### what's the problem
mlir-opt could crash while verifying amdgpu.dpp when its operands had
vector types, such as ARM SME tile vectors produced by arm_sme.get_tile.
The crash occurred during IR verification, before any lowering or passes
ran.
### why it happens
DPPOp::verify() called Type::getIntOrFloatBitWidth() on the operand
type. When the operand was a VectorType, this hit an assertion because
only scalar integer and float types have a bitwidth.
### what's the fix
Query the bitwidth on the element type using getElementTypeOrSelf()
instead of
[5 lines not shown]