[AMDGPU] remove DefIsPriv mapping (#202694)
Various earlier commits now avoid immediately casting most temporaries
and follow Sema for variables, so the tests appear to pass without a
second map to correct those issues afterwards. Hopefully, removing the
map will also help surface any similar remaining issues quickly.
[clang] `this` getter missed in ConstructAttributeList (#203010)
https://reviews.llvm.org/D159247 (400d3261a0da56554aee8e5a2fbc27eade9d05db)
appears to have been intended to update all of these calls, but missed
this one. As a result, a reference `&this` in a non-zero addrspace would
take this branch and crash there (because it ends up asserting that
`this` is a pointer). DRY the code, since this branch looks like it was
copied more incorrectly each time. I don't have an actual use or test
for this; I just noticed it while trying to break other things in
fuzzing.
[docs] try again to handle doxygen everywhere (#203081)
The previous attempt at this (b7da9565017e32c18b927a7637714d1b660b558d)
still broke standalone builds. Now I have locally tested standalone
flang, runtimes (with openmp), lldb, combined builds, and the utils
script. Hopefully that covers everything this time, and gets everything
into a more consistent state (always using the HandleDoxygen script in
the same way, included exactly once as required by the cmake design).
[LoopInterchange] Consolidate induction and reduction vars check (#203197)
Previously, the handling of PHI nodes in loop headers was scattered. In
particular, there were two separate functions, `findInductions` and
`findInductionAndReductions`, which made the code difficult to reason
about. This patch consolidates these two functions, along with their
related caller logic, into a single function,
`checkInductionsAndReductions`. Although some remarks and debug outputs
have changed as a result, I believe the functionality itself remains
unchanged.
[Dexter] Write expects for variables in Debugger scopes
Following on from the previous patch, this patch adds support for writing
expects from !value/all nodes, generating separate expects for each
variable in the requested debugger scope, one for each contiguous range of
lines over which it is live.
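The range splitting described above can be sketched as follows. This is a toy in C++, not Dexter's code; `contiguousRanges` and its input format (sorted, de-duplicated line numbers for one variable) are assumptions made for illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: collapse sorted, de-duplicated line numbers into
// inclusive [first, last] ranges. One expect would be written per range.
std::vector<std::pair<int, int>>
contiguousRanges(const std::vector<int> &SortedLines) {
  std::vector<std::pair<int, int>> Ranges;
  for (int Line : SortedLines) {
    if (!Ranges.empty() && Ranges.back().second + 1 == Line)
      Ranges.back().second = Line; // Extends the current run.
    else
      Ranges.emplace_back(Line, Line); // A gap starts a new run.
  }
  return Ranges;
}
```

A variable seen on lines 3, 4, 5, 9, 10 and 14 would yield three ranges under this sketch: 3 to 5, 9 to 10, and 14 alone.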
[LoopInterchange] Bail out when outer loop latch PHI has non-PHI user (#201923)
When there are non-PHI instructions in the outer loop that use values
originating from the LCSSA PHIs of the inner loop, it becomes difficult
to adjust the wiring during the transformation. In fact, multiple issues
(#200819 and #201571) have been raised related to this pattern. #201059
tried to resolve the issue by modifying the transformation phase, but it
was insufficient.
Instead of spending effort in the transformation phase, this patch adds
an additional check in the legality check and rejects such cases. I
think the cases rejected by this additional check are not very
practical, so the impact on realistic cases should be low, and it is
simpler than adjusting the wiring in the transformation phase.
This patch also effectively reverts #201059, as it is no longer
necessary.
Fix #201571.
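For illustration only, the shape of such a legality check might look like the toy below. `ToyValue` and `isLegalForInterchange` are hypothetical and do not use LLVM's IR classes. The sketch only inspects direct users, which may be narrower than what the actual patch checks.

```cpp
#include <cassert>
#include <vector>

// Toy IR node (hypothetical): records whether it is a PHI, whether it
// lives in the outer loop, and which values use it.
struct ToyValue {
  bool IsPhi = false;
  bool InOuterLoop = false;
  std::vector<const ToyValue *> Users;
};

// Reject the loop nest when any inner-loop LCSSA PHI directly feeds a
// non-PHI instruction in the outer loop.
bool isLegalForInterchange(const std::vector<const ToyValue *> &InnerLCSSAPhis) {
  for (const ToyValue *Phi : InnerLCSSAPhis)
    for (const ToyValue *User : Phi->Users)
      if (User->InOuterLoop && !User->IsPhi)
        return false;
  return true;
}
```

Bailing out during legality keeps the transformation phase free of special-case rewiring, at the cost of rejecting some nests outright.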
[X86] Don't assert on EFLAGS copies in unreachable blocks (#203208)
X86FlagsCopyLowering collects the EFLAGS copies to lower using a
ReversePostOrderTraversal, which only visits blocks reachable from the
entry. Its end-of-pass verification, however, iterated over every block
in the function, so an EFLAGS copy left in an unreachable block (e.g.
produced by ISel for an always-taken branch whose other edge is dead)
tripped the "Unlowered EFLAGS copy!" assertion.
Such copies are harmless: the unreachable block is removed by the
unreachable-block elimination pass that runs right after this one,
before register allocation, so the copy never reaches a pass that cannot
handle it. Restrict the verification to reachable blocks (depth_first
from the entry) to match the set of blocks actually processed.
Found via fuzzing (llvm-isel-fuzzer).
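The mismatch between the two block sets can be reproduced on a toy CFG. The sketch below is plain C++ with hypothetical names (`ToyBlock`, `verifyAllBlocks`, `verifyReachableBlocks`); it does not use MachineFunction or the real pass, and a boolean flag stands in for a leftover EFLAGS copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG: block 0 is the entry; each block lists its successor indices.
struct ToyBlock {
  std::vector<std::size_t> Succs;
  bool HasUnloweredFlagsCopy = false; // Stand-in for a leftover copy.
};

// Stack-based depth-first walk from the entry, marking reachable blocks.
std::vector<bool> reachableFromEntry(const std::vector<ToyBlock> &CFG) {
  std::vector<bool> Seen(CFG.size(), false);
  if (CFG.empty())
    return Seen;
  std::vector<std::size_t> Worklist{0};
  Seen[0] = true;
  while (!Worklist.empty()) {
    std::size_t B = Worklist.back();
    Worklist.pop_back();
    for (std::size_t S : CFG[B].Succs)
      if (!Seen[S]) {
        Seen[S] = true;
        Worklist.push_back(S);
      }
  }
  return Seen;
}

// Old-style check: every block in the function must be clean.
bool verifyAllBlocks(const std::vector<ToyBlock> &CFG) {
  for (const ToyBlock &B : CFG)
    if (B.HasUnloweredFlagsCopy)
      return false;
  return true;
}

// New-style check: only blocks the lowering loop could have visited.
bool verifyReachableBlocks(const std::vector<ToyBlock> &CFG) {
  std::vector<bool> Seen = reachableFromEntry(CFG);
  for (std::size_t I = 0; I < CFG.size(); ++I)
    if (Seen[I] && CFG[I].HasUnloweredFlagsCopy)
      return false;
  return true;
}

// Entry (0) always branches to block 1; block 2 is dead but holds a copy.
std::vector<ToyBlock> makeExampleCFG() {
  std::vector<ToyBlock> CFG(3);
  CFG[0].Succs = {1};
  CFG[2].Succs = {1};
  CFG[2].HasUnloweredFlagsCopy = true;
  return CFG;
}
```

On the example CFG the all-blocks check fails while the reachable-only check passes. A leftover copy in a reachable block is still reported by both.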
[lldb-dap] Support loading core files through attachCommands (#202785)
The `attachCommands` attach option lets users bootstrap a session with
arbitrary LLDB commands, but a command that loaded a core (e.g. `target
create --core`) produced a broken session:
`ConfigurationDoneRequestHandler` would call `process.Continue()` on the
core and fail, because the non-live-session handling was keyed on the
`coreFile` attach argument rather than on the actual resulting process.
This teaches `AttachRequestHandler` to detect, after the attach commands
run, whether the selected process was loaded from a core via the
`SBProcess::IsLiveDebugSession()` API added in #203111. When it is a
core, it sets `stop_at_entry` and clears `is_live_session`, mirroring
what the `coreFile` key does.
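The decision described above can be sketched with stand-in types. Nothing here is the lldb-dap or SB API: `ToyProcess`, `ToySessionState`, and both functions are hypothetical. How the configuration-done step consults the flags is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for the selected process after attachCommands run.
struct ToyProcess {
  bool IsValid = false;
  bool IsLive = true; // What an IsLiveDebugSession()-style query would report.
};

// Hypothetical stand-in for the adapter's session flags.
struct ToySessionState {
  bool stop_at_entry = false;
  bool is_live_session = true;
};

// Key the decision on the resulting process, not on whether a coreFile
// argument was supplied.
void updateStateAfterAttachCommands(const ToyProcess &Selected,
                                    ToySessionState &State) {
  if (Selected.IsValid && !Selected.IsLive) {
    State.stop_at_entry = true;
    State.is_live_session = false;
  }
}

// Assumed behavior: only a live session that is not stopping at entry is
// resumed when configuration is done.
bool shouldContinueOnConfigurationDone(const ToySessionState &State) {
  return State.is_live_session && !State.stop_at_entry;
}
```

With a core-backed process the sketch never reaches the resume path, which is the failure the commit describes for `process.Continue()`.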
[StringMap] Invalidate iterators in remove() (#203249)
erase() bumps the epoch to invalidate iterators (#202237), but the
lower-level remove(), which detaches an entry without destroying it and
is used by ValueSymbolTable via Value::setName(), did not. Move the
incrementEpoch() into remove() so remove-while-iterating fails fast
under LLVM_ENABLE_ABI_BREAKING_CHECKS too.
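A minimal sketch of the epoch idea, assuming a toy container rather than llvm::StringMap: `ToyTable` and `Handle` are hypothetical, and only removal is modelled. Placing the bump in remove() means erase() inherits it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy table, not llvm::StringMap. Entries are heap-allocated so remove()
// can hand one back to the caller without destroying it.
class ToyTable {
  std::map<std::string, std::unique_ptr<int>> Entries;
  unsigned Epoch = 0;

public:
  // A handle remembers the epoch it was created in.
  struct Handle {
    const ToyTable *Owner;
    unsigned SeenEpoch;
    bool isValid() const { return Owner->Epoch == SeenEpoch; }
  };

  Handle handle() const { return Handle{this, Epoch}; }

  // Insertion does not touch the epoch in this sketch.
  void insert(const std::string &Key, int Value) {
    Entries[Key] = std::make_unique<int>(Value);
  }

  // Detach the entry and return ownership; the epoch bump lives here.
  std::unique_ptr<int> remove(const std::string &Key) {
    auto It = Entries.find(Key);
    if (It == Entries.end())
      return nullptr;
    std::unique_ptr<int> Detached = std::move(It->second);
    Entries.erase(It);
    ++Epoch;
    return Detached;
  }

  // erase() is remove() plus destruction of the detached entry.
  void erase(const std::string &Key) { remove(Key); }

  std::size_t size() const { return Entries.size(); }
};
```

A handle taken before either call reports itself stale afterwards, which is the fail-fast behavior the commit wants for remove-while-iterating.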
Aided by Claude Opus 4.8
Reland after lldb fix #203035
[mlir][LangRef] Clarify terminator continuations (#201111)
Document that terminators may have no normal control-flow continuation,
such as ub.unreachable. Also clarify that no-return calls do not remove
the structural terminator requirement.
Assisted-by: Codex
[KnownBits] Fix add() SelfAdd assertion for bitwidths >= 512 (#202769)
`KnownBits::add()` with `SelfAdd=true` lowers `X+X` to `shl(X, 1)` using
a fixed 8-bit shift amount:
```cpp
KnownBits Amt = KnownBits::makeConstant(APInt(8, 1));
return KnownBits::shl(LHS, Amt, NUW, NSW, /*ShAmtNonZero=*/true);
```
The comment there claims the shift-amount bitwidth is independent of the
source bitwidth, but that is not true: `shl()`'s `getMaxShiftAmount()`
extracts `Log2_32(BitWidth)` bits from the shift amount's max value when
`BitWidth` is a power of two:
```cpp
static unsigned getMaxShiftAmount(const APInt &MaxValue, unsigned BitWidth) {
  if (isPowerOf2_32(BitWidth))
    return MaxValue.extractBitsAsZExtValue(Log2_32(BitWidth), 0);
```
[23 lines not shown]
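The arithmetic behind the failure can be checked with a standalone toy. It models only the power-of-two branch visible in the message; the helper names are hypothetical. It also assumes that asking for more low bits than the shift amount has is what trips the assertion.

```cpp
#include <cassert>

// Floor of log2 for a non-zero value (toy stand-in for Log2_32).
unsigned floorLog2(unsigned V) {
  unsigned L = 0;
  while (V >>= 1)
    ++L;
  return L;
}

bool isPowerOfTwo(unsigned V) { return V != 0 && (V & (V - 1)) == 0; }

// Low bits the power-of-two branch asks for; 0 means the branch is not
// taken (the other branch is not shown in the message and not modelled).
unsigned bitsRequested(unsigned SrcBitWidth) {
  return isPowerOfTwo(SrcBitWidth) ? floorLog2(SrcBitWidth) : 0;
}

// Whether a shift amount of ShAmtBitWidth bits can supply that many bits.
bool extractionFits(unsigned SrcBitWidth, unsigned ShAmtBitWidth) {
  return bitsRequested(SrcBitWidth) <= ShAmtBitWidth;
}
```

With an 8-bit shift amount, a 256-bit source requests 8 bits and fits, while a 512-bit source requests 9 bits and does not, matching the threshold in the commit title.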
[LoopInterchange] Add tests for outer-variant inner IV step (NFC) (#202750)
Adds test cases for #202383 and #202401. Both have an induction variable
in the inner loop whose step value is not loop-invariant with respect to
the outer loop.
[NFC][AMDGPU][InsertWaitCnts] Move some simple functions into Utils
Move trivial functions into helpers to declutter InsertWaitCnt a bit more.
HardwareLimits had to move into a different header, but it is only used in InsertWaitCnt, so nothing else is affected.
[RFC][AMDGPU] Remove DebugCounter-based WaitCnt debugging
It's 8 years old, only used by a handful of tests, and has not been updated
in a while except for maintenance as far as I can see.
I don't mind keeping it in if there are users of it, but right now it
looks like a dead feature. If we want some more elaborate waitcnt debugging,
we should have a modern, generic system that works on any waitcnt, not
something specific to 3 GFX9 counters.
[AMDGPU][InsertWaitCnts] Move HWEvent analysis code
Building on the previous RFC, if it is accepted:
Move the code that maps a MachineInstr to HWEventSet to a separate file.
This should be NFC.
[RFC][AMDGPU][InsertWaitCnt] Move WaitEventType into separate HWEvent header
I propose to move `WaitEventType` into its own header to start a new
component of the back-end targeted at analyzing and treating hardware events
fired by instructions. Right now this just moves code around and renames things
(NFCI) but over time, we should generalize the events so they can be reused
by other passes instead of being hyper-specialized for InsertWaitCnt.