[RFC][AMDGPU] Remove DebugCounter-based WaitCnt debugging (#202937)
It's 8 years old, only used by a handful of tests, and has not been
updated in a while except for maintenance as far as I can see.
I don't mind keeping it in if there are users of it, but right now it
looks like a dead feature. If we want some more elaborate waitcnt
debugging, we should have a modern, generic system that works on any
waitcnt, not something specific to 3 GFX9 counters.
[NFC][AMDGPU][InsertWaitCnts] Move some simple functions into Utils (#202936)
Move really trivial functions into helpers to declutter InsertWaitCnt a
bit more.
I had to move HardwareLimits into a different header but it's only used
in InsertWaitCnt so it doesn't matter.
[libc][NFC] Migrate unistd entrypoints to syscall wrappers (#204176)
Migrated link, ftruncate, and getentropy entrypoints to use their
corresponding syscall wrappers instead of direct syscall_impl calls.
Updated CMake dependencies accordingly.
Assisted-by: Automated tooling, human reviewed.
[clang][CodeGen] Fix crash on if/switch init-statement ending in noreturn (#201047)
EmitStmt may call `ClearInsertionPoint()` to mark dead code, but EmitDecl
is not prepared to handle a cleared insertion point. Fix by calling
`EnsureInsertPoint()` in the transition from EmitStmt to EmitDecl. The
if/switch body may contain a label, which makes it reachable.
Fixes #115514.
[LifetimeSafety] Mark lifetime safety LangOptions as `Benign` (#204316)
Without this, we cannot load modules built without lifetime safety.
Analysis options are in general benign and do not affect AST
construction.
See doc:
```cpp
/// For ASTs produced with different option value, signifies their level of
/// compatibility.
enum class CompatibilityKind {
/// Does affect the construction of the AST in a way that does prevent
/// module interoperability.
NotCompatible,
/// Does affect the construction of the AST in a way that doesn't prevent
/// interoperability (that is, the value can be different between an
/// explicit module and the user of that module).
Compatible,
/// Does not affect the construction of the AST in any way (that is, the
[4 lines not shown]
```
[LV] Add initial cost model for VPScalarIVSteps (#203347)
This PR currently only adds a cost model for integer types in
non-replicating regions in order to limit the scope of impact.
We can also support replicating regions, but that requires
looking for a recipe with an underlying value in the same
region in order to get a BasicBlock to pass in to the
getPredBlockCostDivisor function. This can be done in a future
PR.
[DirectX][ObjectYAML][NFC] Remove unused function (#204019)
A small follow-up for #202761.
`updateSize()` function added there is a rebase artifact. It is never
actually used. This change removes it.
[FuncSpec] Do not specialize interposable functions (#204314)
We cannot specialize interposable functions, because the definition we
see may not be the prevailing one. The prevailing definition can have
arbitrarily different behavior.
We *can* still specialize inexact definitions like linkonce_odr, similar
to inlining.
[llvm] Fix unused function warning in Parallel (#204114)
When llvm is built without threading support:
<...>/llvm-project/llvm/lib/Support/Parallel.cpp:230:13: warning: unused function 'isNested' [-Wunused-function]
230 | static bool isNested() {
| ^~~~~~~~
The function is only used once, so I've put the code into the caller,
which is itself guarded with `#if LLVM_ENABLE_THREADS`.
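The pattern described above can be sketched roughly as follows. This is only an illustration: `SKETCH_ENABLE_THREADS`, `WorkerDepth`, `pickWorkers`, and `workersFor` are hypothetical names, and the nesting check is a stand-in rather than the actual logic in Parallel.cpp.

```cpp
#include <cassert>

// Hypothetical stand-in for LLVM_ENABLE_THREADS; defaults to "no threads".
#ifndef SKETCH_ENABLE_THREADS
#define SKETCH_ENABLE_THREADS 0
#endif

#if SKETCH_ENABLE_THREADS
// Hypothetical per-thread nesting depth.
static thread_local int WorkerDepth = 0;

// The only caller of the former helper. Because this function is itself
// inside the #if, inlining the check here means no unguarded static
// function is left behind to trigger -Wunused-function.
static int pickWorkers(int Requested) {
  const bool Nested = WorkerDepth > 0; // formerly a separate static helper
  return Nested ? 1 : Requested;
}
#endif

// Entry point that compiles in both configurations.
int workersFor(int Requested) {
#if SKETCH_ENABLE_THREADS
  return pickWorkers(Requested);
#else
  (void)Requested; // unused when threading is disabled
  return 1;
#endif
}

// Usage: with threading disabled, work always runs on a single worker.
void checkWorkersFor() {
#if !SKETCH_ENABLE_THREADS
  assert(workersFor(8) == 1);
#endif
}
```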
Function added in 8daaa26efdda3802f73367d844b267bda3f84cbe / #189293.
[mlir][tosa] Add row_gather operator (#202895)
Adds support for the row_gather operator defined by the TOSA
specification, see https://github.com/arm/tosa-specification/pull/60.
This includes:
- Operator definition
- Verification logic for the operator
- Output shape inference for the operator
- Validation checks to ensure compliance with the TOSA specification
including profile compliance and level checks.
- Canonicalization to replace row_gather with gather when row_count is
statically known to be 1.
It does not yet cover support for MXFP types. This will be added once
block scaled types are supported.
[lldb][test] Cleanup and modernize TestHiddenIvars.py (#202023)
This is a simple rewrite of the test. The patch improves three things:
* It replaces old expect tests with the new expect_* variants that no
longer rely on substring matching.
* It unifies the stripped/non-stripped checks as we actually produce
identical SBValues in both cases (by fetching data from the Objective-C
runtime).
* It builds this test with a shared build directory. Our stripping logic
generates a new stripped binary in a subdirectory and doesn't touch the
shared build files. This also halves the test runtime to 6s.
[lldb][test] Only calculate LLDB python path once (#201327)
We spend about 70ms on each dotest invocation recalculating the path where
the LLDB module is. This patch changes this so that dotest calculates
this path once and passes it to every dotest invocation.
As a fallback, we still support inferring the location from LLDB as
before, but I would propose we drop this support in the future.
[MIPS] soft-promote `f16` also when using `+msa` (#204158)
Fixes https://github.com/llvm/llvm-project/issues/202808
Re-lands https://github.com/llvm/llvm-project/pull/203065
Make use of the default soft-promote mechanism for f16, rather than an
ad-hoc approach making f16 storage-only.
In theory you could leave it at that, but I added custom implementations
to make use of the instructions for `FP16_TO_FP` and `FP_TO_FP16`, and
manually apply the "fptoui to fptosi trick" which generates shorter
code.
I've now tested that, in combination with
https://github.com/llvm/llvm-project/pull/203390, this PR is able to
build and run the rust `std` test suite, which exercises both `f16` and
vectors a bunch. The tests all pass under `qemu` as well.
The last commit fixes an integer overflow bug that triggered UBSan and
led to an earlier revert of these changes.
[InstCombine] Fold X == Identity ? Y : min/max(X, Y) (#202748)
Fixes #202576
Fold:
```llvm
select (X == -1), Y, umin(X, Y) -> umin(X, Y)
select (X == 0), Y, umax(X, Y) -> umax(X, Y)
select (X == SignedMax), Y, smin(X, Y) -> smin(X, Y)
select (X == SignedMin), Y, smax(X, Y) -> smax(X, Y)
```
And the inverted/commuted forms.
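As a sanity check of the reasoning behind the fold, here is a minimal brute-force sketch over 8-bit values. It is not the InstCombine implementation; `foldHolds` and the wrapper functions are hypothetical names, and the sketch assumes the identity of each operation is the value for which `Op(Identity, Y) == Y` (all-ones for umin, zero for umax, the signed maximum for smin, the signed minimum for smax).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true if `X == Identity ? Y : Op(X, Y)` equals `Op(X, Y)` for
// every pair of values of the 8-bit type T.
template <typename T, typename OpFn>
bool foldHolds(T Identity, OpFn Op) {
  const int Lo = std::numeric_limits<T>::min();
  const int Hi = std::numeric_limits<T>::max();
  for (int XI = Lo; XI <= Hi; ++XI) {
    for (int YI = Lo; YI <= Hi; ++YI) {
      const T X = static_cast<T>(XI);
      const T Y = static_cast<T>(YI);
      const T Folded = Op(X, Y);
      const T Original = (X == Identity) ? Y : Folded;
      if (Original != Folded)
        return false;
    }
  }
  return true;
}

// Checks all four folds with the identities assumed in the lead-in.
bool allFoldsHold() {
  auto UMin = [](uint8_t A, uint8_t B) { return std::min(A, B); };
  auto UMax = [](uint8_t A, uint8_t B) { return std::max(A, B); };
  auto SMin = [](int8_t A, int8_t B) { return std::min(A, B); };
  auto SMax = [](int8_t A, int8_t B) { return std::max(A, B); };
  return foldHolds<uint8_t>(std::numeric_limits<uint8_t>::max(), UMin) &&
         foldHolds<uint8_t>(0, UMax) &&
         foldHolds<int8_t>(std::numeric_limits<int8_t>::max(), SMin) &&
         foldHolds<int8_t>(std::numeric_limits<int8_t>::min(), SMax);
}

// Counter-check: the signed maximum is not an identity for smax
// (for example X = 127, Y = 0 selects 0 but smax gives 127).
bool smaxFoldHoldsForSignedMax() {
  auto SMax = [](int8_t A, int8_t B) { return std::max(A, B); };
  return foldHolds<int8_t>(std::numeric_limits<int8_t>::max(), SMax);
}

// Usage example.
void checkFolds() {
  assert(allFoldsHold());
  assert(!smaxFoldHoldsForSignedMax());
}
```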
AI note: I used AI to help me read through the codebase and write the
tests.
AMDGPU: Refactor AMDGPUTargetID to not store MCSubtargetInfo
Store the triple string and GPUKind instead. The dependence
on checking AMDHSA seems like an anti-feature, but maintain the
behavior of not printing the modifiers for other OSes. Start
parsing the target ID instead of performing a direct string
comparison. Also improve test coverage for the treatment of the
environment component of the triple. The main behavioral change
is this will now produce normalized triples in the output and
diagnostics. Practically, this means all of the places that
currently emit "--" will be expanded into "-unknown-".
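To illustrate the visible effect, the sketch below expands empty components in a dash-separated string. It is a deliberately simplified stand-in, not the normalization the patch uses; `expandEmptyComponents` is a hypothetical helper and the example string in the usage function is only an assumed illustration of a triple with an empty environment component.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces every empty component (seen as "--") with "unknown".
std::string expandEmptyComponents(std::string S) {
  const std::string From = "--";
  const std::string To = "-unknown-";
  std::size_t Pos = 0;
  while ((Pos = S.find(From, Pos)) != std::string::npos) {
    S.replace(Pos, From.size(), To);
    // Resume on the trailing '-' of the replacement so that consecutive
    // empty components ("---") are each expanded.
    Pos += To.size() - 1;
  }
  return S;
}

// Usage example with an assumed input string.
void checkExpand() {
  assert(expandEmptyComponents("amdgcn-amd-amdhsa--gfx90a") ==
         "amdgcn-amd-amdhsa-unknown-gfx90a");
}
```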
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
[AMDGPU][InsertWaitCnts] Make HWEvent a BitMask
Follow up from comments on https://github.com/llvm/llvm-project/pull/202886
Make HWEvent a bitmask by default instead of having both the enum and a separate HWEventSet. This has the advantage of streamlining the code a bit and opening the possibility of adding "modifiers" to events, e.g. I imagine we could now fold "VMemType" into the Events.
We already do this with things like SMEM_GROUP. At least now it's baked into the design.
I opted for a bit more verbosity by taking inspiration from FastMathFlags (FMF): instead of exposing a raw enum, I wrap it in a class with helper functions. The downside is having to reimplement all the little bitwise ops, but the result is a cleaner, simpler interface than a raw enum (class) with many helper functions. I initially tried that but I recoiled at the sight of things like `contains(A, B)`, which isn't very clear, while `A.contains(B)` is self-explanatory.
Considering HWEvent is a bitmask, I also implemented a simple iterator to iterate over all set bits of the mask, which is a useful thing to have as some APIs in InsertWaitCnt rely on treating one event at a time.
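For readers unfamiliar with the pattern, here is a minimal sketch of a class-wrapped bitmask with a `contains` member and an iterator over set bits. It is not the code from the patch: `Event`, `EventMask`, the three event names, and `setBits` are all hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical event bits; the real HWEvent values differ.
enum class Event : uint32_t {
  VMemAccess = 1u << 0,
  SMemAccess = 1u << 1,
  LDSAccess = 1u << 2,
};

class EventMask {
public:
  constexpr EventMask() = default;
  constexpr EventMask(Event E) : Bits(static_cast<uint32_t>(E)) {}

  // True if every bit set in Other is also set in this mask.
  constexpr bool contains(EventMask Other) const {
    return (Bits & Other.Bits) == Other.Bits;
  }
  constexpr bool empty() const { return Bits == 0; }
  constexpr uint32_t raw() const { return Bits; }

  constexpr EventMask operator|(EventMask RHS) const {
    return EventMask(Bits | RHS.Bits);
  }
  EventMask &operator|=(EventMask RHS) {
    Bits |= RHS.Bits;
    return *this;
  }

  // Forward iterator that yields one single-bit mask per set bit,
  // lowest bit first.
  class iterator {
  public:
    explicit constexpr iterator(uint32_t Remaining) : Remaining(Remaining) {}
    EventMask operator*() const {
      return EventMask(Remaining & (~Remaining + 1)); // isolate lowest bit
    }
    iterator &operator++() {
      Remaining &= Remaining - 1; // clear lowest set bit
      return *this;
    }
    bool operator!=(const iterator &O) const {
      return Remaining != O.Remaining;
    }

  private:
    uint32_t Remaining;
  };
  iterator begin() const { return iterator(Bits); }
  iterator end() const { return iterator(0); }

private:
  explicit constexpr EventMask(uint32_t Raw) : Bits(Raw) {}
  uint32_t Bits = 0;
};

// Lets two plain events be combined directly into a mask.
inline EventMask operator|(Event A, Event B) {
  return EventMask(A) | EventMask(B);
}

// Collects the raw value of each set bit by visiting one event at a time.
std::vector<uint32_t> setBits(EventMask M) {
  std::vector<uint32_t> Out;
  for (EventMask Single : M)
    Out.push_back(Single.raw());
  return Out;
}

// Usage example.
void checkEventMask() {
  EventMask M = Event::VMemAccess | Event::LDSAccess;
  assert(M.contains(Event::VMemAccess));
  assert(!M.contains(Event::SMemAccess));
  assert(setBits(M).size() == 2);
}
```

The member form keeps call sites reading as `M.contains(Event::VMemAccess)`, which is the readability argument made above.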
[flang][NFC] remove libFortranEvaluate from Optimizer libraries (#204222)
Replace usages of `AbstractConverter` inside IntrinsicCall.cpp by a
structure that propagates the required option to avoid bringing
libFortranEvaluate as a dependency of libFortranOptimizer while the
Optimizer is not using evaluate::Expr or other front-end data structures
at all.
Also remove headers whose includes have crept in and that were never
removed while not required.
[lldb][test] Fix duplicate error messages in expect_expr/var_path (#202310)
The error message field for expect_* methods always prints the value
object, so there is no need to specify a custom error message that then
just prints the object too.
This fixes the duplicate value object printout on test failures.
[AMDGPU][InsertWaitCnts] Move HWEvent analysis code (#202887)
Building on the previous RFC, if it is accepted:
Move the code that maps a MachineInstr to HWEventSet to a separate file.
This should be NFC.
[mlir][spirv] Fix crash on 0-D vectors in vector unrolling (#203291)
`getTargetShape` and `getNativeVectorShape` called `getShape().back()`
without checking for rank-0 vectors, whose shape is empty. This crashed
when the SPIR-V vector unrolling pass processed a function returning a
0-D vector (e.g. `vector<f32>`) or a 0-D elementwise op.
0-D vectors have no dimension to unroll along and are not SPIR-V vector
types, so bail out and leave them unchanged in both paths.
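The guard can be sketched as follows. This is a simplified stand-in that uses `std::vector` for the shape instead of MLIR types; `innermostDim` is a hypothetical helper, not one of the functions touched by the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the innermost dimension, or std::nullopt for a rank-0 shape.
// Calling back() on an empty vector is undefined behavior, so the empty
// case has to be rejected first.
std::optional<int64_t> innermostDim(const std::vector<int64_t> &Shape) {
  if (Shape.empty())
    return std::nullopt; // 0-D: nothing to unroll along, bail out
  return Shape.back();
}

// Usage example.
void checkInnermostDim() {
  assert(!innermostDim({}).has_value());
  assert(innermostDim({2, 4}).value() == 4);
}
```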
Fixes https://github.com/llvm/llvm-project/issues/203220
[clangd] Replay macro definitions from preamble for clang-tidy checks (#202495)
Clang-tidy checkers observe preprocessor events via PPCallbacks. When
using a preamble, macro definitions in the preamble region of the main
file are not replayed during the main-file build, causing checkers like
bugprone-reserved-identifier to miss them.
This patch extends ReplayPreamble::replay() to also replay MacroDefined
events for macros defined directly in the preamble region of the open
file, similar to how InclusionDirective events are already replayed.
Fixes: https://github.com/clangd/clangd/issues/2501