AMDGPU: Remove leftover test for old promote-alloca subtarget feature
This feature was removed in a56993a694ed02775285b9fe0e23fce8346491c9.
The test used to have a pair testing the enabled and disabled case,
and there's no point leaving the enabled partner.
[mlir][memref] Remove unsafe `getType()` from ReshapeOp (#205105)
Remove the unsafe `getType` method from ReshapeOp. It unconditionally
casts the result to `MemRefType`, but `memref.reshape` may return an
`UnrankedMemRefType`, leading to an assertion failure. The redundant
build method is also removed alongside this change. Fixes #203812.
[clangd] Navigate go-to-definition through forwarding wrappers to the constructor (#199480)
When the user invokes **Go to Definition** on a call like
`std::make_unique<T>(args...)` or `std::make_shared<T>(args...)`,
surface the constructor of `T` that is actually invoked inside the
wrapper, alongside the wrapper itself. The constructor is added before
the wrapper so LSP clients that auto-jump to the first target land on
it; clients that present a menu still let the user reach the wrapper.
This is the forward-direction counterpart to the find-references work in
#169742 (clangd/clangd#716): the same `isLikelyForwardingFunction` +
`searchConstructorsInForwardingFunction` machinery, applied to
`locateASTReferent`.
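A minimal sketch of the kind of call site this affects. `Widget` and `makeLabelled` are hypothetical names chosen for illustration; the comments restate the behaviour described above rather than showing actual clangd output.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type with two constructors, so it matters which one the
// forwarding wrapper ends up invoking.
struct Widget {
  Widget() : Label("default"), Size(0) {}
  // The constructor that std::make_unique<Widget>("gear", 3) forwards to.
  Widget(std::string L, int S) : Label(std::move(L)), Size(S) {}

  std::string Label;
  int Size;
};

std::unique_ptr<Widget> makeLabelled() {
  // Invoking Go to Definition on `make_unique` here is the case described
  // above: the two-argument Widget constructor would be offered first,
  // followed by std::make_unique itself.
  return std::make_unique<Widget>("gear", 3);
}
```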
[LV][NFC] Remove instcombine pass from RUN lines in target tests (#205848)
There is still one test remaining:
LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll
but this looks more like a phase-ordering test and should probably be
handled separately.
[DirectX][ObjectYAML] Add PRIV part support (#204899)
Add support for DXContainer PRIV in the ObjectYAML pipeline so it can be
represented in structured YAML and round-tripped through
yaml2obj/obj2yaml.
The PRIV part can store arbitrary user-provided binary blobs in a
DXContainer. Unlike other DXContainer parts, the PRIV part does not need
a 4-byte-aligned size. Therefore, if it is present, it is always the last
section in a DXContainer.
llvm-objcopy is already able to extract the PRIV section. A test that
verifies extraction of the binary from PRIV is added.
GlobalISel/LegalizerHelper: Use same LLT kind as WideTy for widen merge (#205816)
In widenScalarMergeValues, WideTy is an input given by the target. Use
the same LLT kind for the other types of different sizes instead of
LLT::scalar.
Makes a difference with extendedLLTs.
GlobalISel/LegalizerHelper: Use type of input load dst for LowerLoad (#205815)
Deduce the dst type for the new instructions that perform the load
lowering from the destination type of the original load instead of from
the MMO.
Makes a difference with extendedLLTs.
[AArch64][SVE] add missing MLA commute instcombine (#205526)
Remove the MLA commuted patterns added in #198566 and canonicalise
those operations in instcombine instead.
[NFC][AMDGPU][SIMemoryLegalizer] Use BitMaskUtils Helpers
We already used BitMaskUtils but did not use any of the helpers.
Fix it so the pass is a bit less verbose.
One unfortunate problem with BitMaskUtils is the lack of a bool operator,
so we need to use `any` instead. This is because C++ doesn't allow
conversion operators as free functions.
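The language restriction mentioned above can be illustrated with a small sketch. The `Scope` enum and the helpers below are hypothetical stand-ins, not the actual BitMaskUtils header or the types used in SIMemoryLegalizer; only the name `any` is taken from the message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scoped enum used as a bitmask.
enum class Scope : std::uint8_t {
  None = 0,
  Global = 1u << 0,
  Local = 1u << 1,
  Private = 1u << 2,
};

// Binary operators can be overloaded for an enum as free functions.
constexpr Scope operator|(Scope A, Scope B) {
  return static_cast<Scope>(static_cast<std::uint8_t>(A) |
                            static_cast<std::uint8_t>(B));
}
constexpr Scope operator&(Scope A, Scope B) {
  return static_cast<Scope>(static_cast<std::uint8_t>(A) &
                            static_cast<std::uint8_t>(B));
}

// A conversion function such as `operator bool` must be a non-static member
// function, and an enum cannot have members. A named free helper fills the
// gap instead.
constexpr bool any(Scope S) { return S != Scope::None; }

// Example use: do two masks share at least one bit?
constexpr bool overlaps(Scope A, Scope B) { return any(A & B); }
```

With a scoped enum, `if (A & B)` does not compile because there is no implicit conversion to `bool`, so call sites spell it `if (any(A & B))`.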
[LV] Only collect strides without predicates under OptForSize when interleaved access analysis (#205793)
During interleaved access analysis, certain addresses require a no-wrap
predicate to form an add recurrence and obtain the stride. However, when
optimizing for size, generating SCEV runtime checks is disallowed.
This patch modifies the constant stride collection when optimizing for
size to only collect strides that do not require predicates. This
ensures that vectorization will not be blocked by disallowed predicates.
[AMDGPU][HWEvents] Refactor VMEM_ACCESS as VMEM_READ_ACCESS (#204545)
Instead of having an HWEvent that can be either a read or a write
depending on the target, keep the events as straightforward as
possible and let InsertWaitCnt interpret it. Rename VMEM_ACCESS
to VMEM_READ_ACCESS and set VMEM_WRITE_ACCESS & similar events
even if the target does not have a VSCnt.
I think this conceptually makes more sense.
This separates concerns better so that HWEvents models events
objectively, and InsertWaitCnt handles them as necessary for the task
it is trying to achieve (insert wait instructions).
My end goal with this series of changes is to de-tangle InsertWaitCnt so
we can divide it into layers, and each layer worries about its own thing.
This is only possible with proper separation of concerns.
[AMDGPU][InsertWaitCnts] Move TENSOR/ASYNC event detection to separate header (#204544)
I forgot to move those out of the way as they were not grouped with the
others.
Now `getEventsFor` does all the work.
[AMDGPU][HWEvents] Refactor VMEM_ACCESS as VMEM_READ_ACCESS
Instead of having an HWEvent that can be either a read or a write
depending on the target, keep the events as straightforward as
possible and let InsertWaitCnt interpret it. Rename VMEM_ACCESS
to VMEM_READ_ACCESS and set VMEM_STORE_ACCESS & similar events
even if the target does not have a VSCnt.
I think this conceptually makes more sense.
This separates concerns better so that HWEvents models events
objectively, and InsertWaitCnt handles them as necessary for the task
it is trying to achieve (insert wait instructions).
[AMDGPU][InsertWaitCnts] Move TENSOR/ASYNC event detection to separate header
I forgot to move those out of the way as they were not grouped with the others.
Now `getEventsFor` does all the work.
[AMDGPU][InsertWaitCnts] Make HWEvent a BitMask (#203864)
Follow up from comments on
https://github.com/llvm/llvm-project/pull/202886
Make HWEvent a bitmask by default instead of having both the enum, and a
separate HWEventSet. This has the advantage of streamlining the code a
bit and opening the possibility of adding "modifiers" to events, e.g. I
imagine we could now fold "VMemType" into the Events.
We already do this with things like SMEM_GROUP. At least now it's baked
into the design.
I opted for a bit more verbosity by taking inspiration from
FastMathFlags (FMF): instead of exposing a raw enum, I wrap it in a
class w/ helper function. The downside is having to reimplement all the
little bitwise ops, but the result is a cleaner, simpler interface than
a raw enum (class) w/ many helper functions. I initially tried that but
I recoiled at the sight of things like `contains(A, B)` which isn't very
clear, while `A.contains(B)` is self-explanatory.
[3 lines not shown]
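The design trade-off described above (a small class wrapping the enum, in the spirit of FastMathFlags) might look roughly like the following. This is a sketch under assumptions: `EventMask`, its enumerators, and its member set are hypothetical and are not the actual HWEvent interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper class around a bitmask enum.
class EventMask {
public:
  enum Kind : std::uint32_t {
    None = 0,
    VmemRead = 1u << 0,
    VmemWrite = 1u << 1,
    SmemAccess = 1u << 2,
  };

  constexpr EventMask() = default;
  constexpr EventMask(Kind K) : Bits(K) {}

  // The bitwise operators have to be reimplemented by hand.
  constexpr EventMask operator|(EventMask O) const {
    return EventMask(Bits | O.Bits);
  }
  constexpr EventMask operator&(EventMask O) const {
    return EventMask(Bits & O.Bits);
  }
  // Without this overload, `Kind | Kind` would pick the built-in operator
  // and yield a plain integer instead of an EventMask.
  friend constexpr EventMask operator|(Kind A, Kind B) {
    return EventMask(A) | EventMask(B);
  }

  // Member helpers read left to right: "does M contain E?".
  constexpr bool contains(EventMask O) const {
    return (Bits & O.Bits) == O.Bits;
  }
  constexpr bool any() const { return Bits != 0; }

private:
  constexpr explicit EventMask(std::uint32_t B) : Bits(B) {}
  std::uint32_t Bits = 0;
};
```

A call site then reads `M.contains(EventMask::VmemRead)` instead of a free `contains(M, E)` whose argument order has to be looked up.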
[InstCombine] Handle shuffle masks selecting poison in unshuffleConstant (#205870)
A shuffle mask can select from the second operand even when that operand
is poison. This caused unshuffleConstant to assert while trying to map
those mask elements into the first operand's constant vector.
Fix this by ignoring mask elements that select the poison operand.
Fixes https://github.com/llvm/llvm-project/issues/205769
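The idea of the fix can be modelled in plain C++. This is a simplified sketch, not the InstCombine implementation: `unshuffleModel` and `kUndef` are hypothetical, non-negative integers stand in for constant elements, and the only assumption about shuffle masks is the usual convention that indices at or beyond the first operand's length select from the second operand.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Marks lanes that no mask element wrote. Constants in this model must be
// non-negative so they never collide with this marker.
constexpr int kUndef = -1;

// Hypothetical helper: given a shuffle mask and a per-lane constant C (one
// entry per mask element), build a vector of NumSrcElts lanes whose shuffle
// by Mask reproduces C in every lane that reads the first operand.
std::optional<std::vector<int>> unshuffleModel(const std::vector<int> &Mask,
                                               const std::vector<int> &C,
                                               int NumSrcElts) {
  if (Mask.size() != C.size())
    return std::nullopt;
  std::vector<int> NewC(NumSrcElts, kUndef);
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    int M = Mask[I];
    if (M < 0)
      continue; // Undefined mask element: nothing to place.
    if (M >= NumSrcElts)
      continue; // Selects the second (poison) operand: ignore it.
    if (NewC[M] != kUndef && NewC[M] != C[I])
      return std::nullopt; // Two lanes demand different values.
    NewC[M] = C[I];
  }
  return NewC;
}
```

Without the `M >= NumSrcElts` check, the write to `NewC[M]` would index past the end of the vector, which corresponds to the out-of-range mapping the message describes.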
Reapply "[Clang] Optionally use NewPM to run CodeGen Pipeline" (#205943)
This reverts commit 0c4cc9f8adc5acda1aa49b8a8704433e237848ee.
This patch also fixes the dependency issue by making the clang CodeGen
library depend on the LLVM CodeGen library which is needed by the NewPM
for CodeGen.
Reviewers: oontvoo
Pull Request: https://github.com/llvm/llvm-project/pull/205986