[AMDGPU][SIInsertWaitcnts][NFC] Introduce WaitEventSet container for events (#178511)
Before this patch WaitEventType events used to be collected in unsigned
integers that were used as small bit vectors.
This patch introduces a WaitEventSet container class to replace the
integer bit vectors with a class that hides the implementation of common
operations like insertion, removal, union, intersection etc. from the
user.
The WaitEventSet API matches that of a set and not a vector because we
don't care about the order of its contents. Internally though it is
still a bit vector that uses an unsigned integer as its storage, just
like the original implementation.
This patch should not change the functionality.
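
As a rough illustration of the idea (not the code from this patch), a set-style wrapper over an unsigned bit mask could look like the sketch below. The enumerators `EventA`, `EventB`, `EventC` and all method names are assumptions made for the example; the real `WaitEventType` values and `WaitEventSet` API may differ.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical event kinds; the real WaitEventType enumerators differ.
enum WaitEventType : unsigned { EventA, EventB, EventC, NumEvents };

// Illustrative set-like wrapper that stores its members in an unsigned mask.
class WaitEventSet {
  unsigned Mask = 0;

  static WaitEventSet fromMask(unsigned M) {
    WaitEventSet S;
    S.Mask = M;
    return S;
  }

public:
  WaitEventSet() = default;
  WaitEventSet(std::initializer_list<WaitEventType> Events) {
    for (WaitEventType E : Events)
      insert(E);
  }

  void insert(WaitEventType E) {
    assert(E < NumEvents && "event out of range");
    Mask |= 1u << E;
  }
  void remove(WaitEventType E) { Mask &= ~(1u << E); }
  bool contains(WaitEventType E) const { return (Mask & (1u << E)) != 0; }
  bool empty() const { return Mask == 0; }

  // Union and intersection map directly onto bitwise OR and AND.
  WaitEventSet operator|(WaitEventSet O) const { return fromMask(Mask | O.Mask); }
  WaitEventSet operator&(WaitEventSet O) const { return fromMask(Mask & O.Mask); }
  bool operator==(WaitEventSet O) const { return Mask == O.Mask; }
};
```

Callers then write `Set.contains(EventA)` rather than `Mask & (1 << EventA)`, which keeps the bit manipulation in one place.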
[SLP] Cast incoming value to a proper type for int nodes bitcast to fp
Before casting the value to an FP type, check whether the type was
reduced during minbitwidth analysis; if so, restore the original
source type to generate a correct bitcast operation.
Fixes #178884
[clang][Modules] Fixing Incorrect Diagnostics Issued during By-name Dependency Scanning (#178542)
The by-name lookup API uses the same diagnostics engine and consumer for
multiple lookups. When multiple lookups fail, the diagnostics could be
incorrect for all but the first failing lookup. All the subsequent
failing lookups inherit the diagnostics from the first failing lookup.
This PR resets the diagnostics consumer's buffer and the
CompilerInstance's diagnostics engine for each by-name lookup, so each
lookup can produce the correct diagnostics.
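
The failure mode and the fix can be sketched with a toy model. Everything here (`BufferingConsumer`, `ByNameScanner`, the message text) is hypothetical and does not use the clang API; it only shows why shared state must be reset before each lookup.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a diagnostics consumer that buffers messages.
struct BufferingConsumer {
  std::vector<std::string> Messages;
  void clear() { Messages.clear(); }
};

// Hypothetical scanner that reuses one consumer across many lookups.
class ByNameScanner {
  std::set<std::string> KnownModules;
  BufferingConsumer Consumer;

public:
  explicit ByNameScanner(std::set<std::string> Known)
      : KnownModules(std::move(Known)) {}

  // Returns the diagnostics produced by this lookup only.
  std::vector<std::string> lookup(const std::string &Name) {
    assert(!Name.empty() && "module name must not be empty");
    // Reset the shared state first; without this, a later failing lookup
    // would also report the messages left behind by an earlier one.
    Consumer.clear();
    if (KnownModules.count(Name) == 0)
      Consumer.Messages.push_back("module '" + Name + "' not found");
    return Consumer.Messages;
  }
};
```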
Part of work for rdar://136303612.
[AMDGPU][Scheduler] Make `finalizeGCNRegion` an overridable hook (NFC) (#177199)
This allows individual stages to make decisions after re-scheduling
individual regions.
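
A minimal sketch of the hook pattern, assuming a simplified stage class: `SchedStage`, `RecordingStage`, `finalizeRegion` and `run` are hypothetical names for the example and are not the actual scheduler interfaces.

```cpp
#include <cassert>
#include <vector>

// Hypothetical base stage. The driver loop calls the hook once per region
// after that region has been re-scheduled.
class SchedStage {
public:
  virtual ~SchedStage() = default;

  // Default hook: nothing extra to do for this stage.
  virtual void finalizeRegion(unsigned /*RegionIdx*/) {}

  void run(unsigned NumRegions) {
    for (unsigned I = 0; I < NumRegions; ++I) {
      // ... re-schedule region I here ...
      finalizeRegion(I);
    }
  }
};

// A stage that overrides the hook to make its own per-region decision.
class RecordingStage : public SchedStage {
public:
  std::vector<unsigned> Finalized;

  void finalizeRegion(unsigned RegionIdx) override {
    assert((Finalized.empty() || Finalized.back() < RegionIdx) &&
           "regions finalized out of order");
    Finalized.push_back(RegionIdx);
  }
};
```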
[DOC][DTLTO] Update DTLTO documentation for the LLVM 22 release (#177368)
This change updates the documentation to reflect work completed during
the LLVM 22 timeframe, including support for the ThinLTO cache and
static libraries/archives.
It also clarifies that the goal of DTLTO is to support distribution of
ThinLTO backend compilations for any in-process ThinLTO invocation.
SIE Internal Tracker: TOOLCHAIN-21016
[VPlan] Mark VPActiveLaneMaskPHIRecipe as readnone. (#177886)
VPActiveLaneMaskPHIRecipe does not have side-effects and also does
not access memory. Mark accordingly. This allows hoisting of some
invariant loads out of loops and also removing unused phi recipes in the
future.
In
llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll,
the hoisting makes vectorization profitable.
PR: https://github.com/llvm/llvm-project/pull/177886
[Flang][OpenMP] Reject INTENT(IN) pointers in LASTPRIVATE clause (#178845)
`LASTPRIVATE` clause requires the list item to be definable since the
value from the last iteration is assigned back to the original variable.
For pointers, this assignment occurs "as if by pointer assignment"
(OpenMP 5.2 Section 5.4.5).
An `INTENT(IN)` pointer dummy argument is not a valid target for pointer
assignment, therefore it should not be permitted in a `LASTPRIVATE`
clause.
This patch adds the `CheckIntentInPointer()` call to the `LASTPRIVATE`
clause handler, consistent with other data-sharing clauses like
`PRIVATE`, `COPYPRIVATE`, and `REDUCTION`.
Fixes [#178398](https://github.com/llvm/llvm-project/issues/178398)
[AArch64][llvm] Gate some `tlbip` insns with +tlbid or +d128
Change the gating of `tlbip` instructions containing `*E1IS*`, `*E1OS*`,
`*E2IS*` or `*E2OS*` to be used with `+tlbid` or `+d128`. This is because
the 2025 Armv9.7-A MemSys specification says:
```
All TLBIP *E1IS*, TLBIP*E1OS*, TLBIP*E2IS* and TLBIP*E2OS* instructions
that are currently dependent on FEAT_D128 are updated to be dependent
on FEAT_D128 or FEAT_TLBID
```
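
The resulting rule can be sketched as a small predicate. This is an illustration under assumptions, not the assembler's code: `Features`, `isRelaxedTLBIPOp` and `isTLBIPAllowed` are hypothetical names, and the sketch assumes that `tlbip` operations outside the four quoted patterns stay gated on `+d128` alone.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Hypothetical subset of target features relevant to this rule.
struct Features {
  bool D128 = false;
  bool TLBID = false;
};

// True if the operation name contains one of the affected substrings.
bool isRelaxedTLBIPOp(const std::string &Op) {
  for (const char *Pattern : {"E1IS", "E1OS", "E2IS", "E2OS"})
    if (Op.find(Pattern) != std::string::npos)
      return true;
  return false;
}

// Affected operations need +d128 or +tlbid; the rest are assumed to
// keep requiring +d128.
bool isTLBIPAllowed(const std::string &Op, const Features &F) {
  assert(!Op.empty() && "operation name must not be empty");
  if (isRelaxedTLBIPOp(Op))
    return F.D128 || F.TLBID;
  return F.D128;
}
```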
[mlir] Verify children interface in transform named sequence (#178881)
Application of sequence blocks in the transform interpreter assumes that
all operations (except for the terminator) in the sequence block have
the `TransformOpInterface`. For `SequenceOp`, this was already verified,
but not for `NamedSequenceOp`, causing assertion failures if the
assumption doesn't hold.
This change adds verification that all operations in the block except
for the terminator have the `TransformOpInterface`.
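
The shape of such a check can be sketched without the MLIR API. `Op` and `verifyBody` below are hypothetical stand-ins and `transform.some_op` is a made-up operation name; the real verifier works on MLIR operations and emits diagnostics rather than returning strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of an operation inside the sequence block.
struct Op {
  std::string Name;
  bool HasTransformInterface;
};

// Returns an empty string on success, or an error message naming the
// first non-terminator op that lacks the interface.
std::string verifyBody(const std::vector<Op> &Body) {
  assert(!Body.empty() && "block must end with a terminator");
  // The last op is the terminator and is exempt from the check.
  for (std::size_t I = 0; I + 1 < Body.size(); ++I)
    if (!Body[I].HasTransformInterface)
      return "op '" + Body[I].Name +
             "' does not implement TransformOpInterface";
  return "";
}
```

Rejecting the block at verification time turns what was an assertion failure in the interpreter into an ordinary verifier error.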
Signed-off-by: Lukas Sommer <lukas.sommer at amd.com>
[acc] Fix acc.loop to scf utilities (#178809)
Fixes a problem encountered with enabling coalesceLoops when bounds were
constructed inside expanded loops. Additionally, ensures that all loop
utilities use rewriter instead of their own builders for proper
tracking.
[HIP] Make `--no-offloadlib` not link HIP's RT (#177677)
Summary:
Right now we have `--no-hip-rt` to suppress the implicit linking of the
HIP runtime. We also already have `--no-offloadlib`, whose name seems to
imply the same thing, but it currently only applies to the device-side
library. More targets will likely use this soon, so it would be nice to
unify the behavior here.
The impact of this change is that `-nogpulib`, which is commonly used to
suppress the ROCm device libraries, will now also suppress the HIP
runtime, and
`--no-hip-rt` will suppress the ROCm device libraries. This is a
functional change, but I'm not sure if anyone truly relies on this
distinction in the wild. Functionally, one turns off the host runtime,
the other the device. This PR makes both do both at the same time. Since
these are libraries we should be able to just get users to pass them
manually if needed.
[AArch64][llvm] Remove `+d128` gating on `sysp`, `msrr` and `mrrs` instructions
Remove `+d128` gating on `sysp`, `msrr` and `mrrs` instructions.
We removed gating for `sys`, `mrs` and `msr` instructions previously,
on the basis that it doesn't add value, as it doesn't indicate that
any particular system registers or system instructions are available.
Therefore, remove `+d128` gating for these too.
(In an upcoming change, some `tlbip` instructions, which are `sysp` aliases,
are allowed to be used with either `+d128` or `+tlbid`. If we don't remove
this gating, then it would require some ugly work-arounds in the code to
support the relaxation mandated by the 2025 MemSys specification.
In this change, retain `+d128` gating for all `tlbip` instructions, which
will then be loosened to either `+d128` or `+tlbid` in a subsequent change.)
[VectorCombine] Trim low end of loads used in shufflevector rebroadcasts. (#149093)
Following on from #128938, trim the low end of loads where only some of
the incoming lanes are used for rebroadcasts in shufflevector
instructions.
---------
Co-authored-by: Leon Clark <leoclark at amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev at redking.me.uk>
[AMDGPU] Fix VOPD checks for commuting OpX and OpY (#178772)
We need to check that OpX does not write the sources of OpY, but if we
swap OpX and OpY with respect to program order, the check was not
swapped correctly.
The checks on gfx1250 can be relaxed slightly; that is planned for a
future patch.
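
One way to read the fix is that the hazard check has to follow program order rather than the X/Y slot assignment. The sketch below illustrates that reading only; `Inst`, `writesSourceOf` and `canPair` are hypothetical, registers are plain integers, and the real VOPD legality rules involve more conditions than this.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical instruction model: one destination, several sources.
struct Inst {
  unsigned Dst;
  std::vector<unsigned> Srcs;
};

// True if Writer's destination is one of Reader's sources.
bool writesSourceOf(const Inst &Writer, const Inst &Reader) {
  return std::find(Reader.Srcs.begin(), Reader.Srcs.end(), Writer.Dst) !=
         Reader.Srcs.end();
}

// XIsFirst says whether OpX precedes OpY in program order. When the two
// are swapped, the roles of writer and reader must be swapped as well.
bool canPair(const Inst &OpX, const Inst &OpY, bool XIsFirst) {
  assert(&OpX != &OpY && "cannot pair an instruction with itself");
  const Inst &First = XIsFirst ? OpX : OpY;
  const Inst &Second = XIsFirst ? OpY : OpX;
  return !writesSourceOf(First, Second);
}
```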
---------
Co-authored-by: Matt Arsenault <arsenm2 at gmail.com>
NFC: Rename CodeGenOptions::StackUsageOutput to StackUsageFile (#178898)
Preparation for #178005.
"Output" has too many different interpretations: it could be an
enabled/disabled flag, a file format, etc. Clarify that it's the
destination file.