Revert "[flang][mem2reg] promote memory slots through declares" (#204332)
Reverts llvm/llvm-project#196975
This patch is causing regressions on some of our downstream testing. I
am not sure the logic here is directly to blame, but I'd rather revert
and investigate for now.
[lldb] Remove several system header includes from tests (#204072)
System includes slow down test compilation and create unnecessary
dependencies on system header code.
This patch removes system headers from tests that do not test their
functionality. For the most part, this just removes the dummy 'printf'
we had in many tests.
[RFC][LangRef] Specify that the accessed bytes of concurrent atomics must be either disjoint or the same
So far, the LangRef hasn't been clear on the semantics of partially overlapping
concurrent atomics in LLVM IR (specifically: a set of accesses marked as
`atomic` that would be in a data race if they weren't `atomic` and not all of
them access the exact same set of bytes).
What loads read is defined in terms of individual bytes, but the memory
ordering constraints are formulated closely to the C/C++ (and Java for
`unordered`) memory model, where partially overlapping atomics are not
possible. It's not obvious how concepts like C/C++'s per-location total
modification order for `monotonic` accesses map to accesses that can partially
overlap. While C/C++ relies on the modification order to ensure that atomics
cannot tear (i.e., that an atomic read cannot return bytes from two or more
atomic writes), our IR semantics (as written) currently do not guarantee this
in the presence of partially overlapping accesses.
This PR proposes a solution to this problem: It specifies that concurrent
overlapping atomics must access the exact same set of bytes to act atomically.
[7 lines not shown]
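As a rough illustration of the proposed rule, the sketch below classifies the byte sets of two concurrent atomic accesses. The `Access`, `Overlap`, and `classify` names are hypothetical and do not come from LLVM; the sketch assumes each access is described by a byte offset and size relative to the same object. For example, an `i32` access at offset 0 and an `i16` access at offset 2 partially overlap, which is the case that, under this proposal, does not act atomically.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: the byte range [Offset, Offset + Size) touched by one
// atomic access, relative to the same base object.
struct Access {
  uint64_t Offset;
  uint64_t Size;
};

enum class Overlap { Disjoint, Identical, Partial };

// Classifies the byte sets of two concurrent atomic accesses. Under the
// proposed rule, Disjoint and Identical pairs behave atomically; a Partial
// overlap does not.
Overlap classify(Access A, Access B) {
  assert(A.Size > 0 && B.Size > 0 && "accesses must touch at least one byte");
  uint64_t AEnd = A.Offset + A.Size;
  uint64_t BEnd = B.Offset + B.Size;
  if (AEnd <= B.Offset || BEnd <= A.Offset)
    return Overlap::Disjoint;
  if (A.Offset == B.Offset && A.Size == B.Size)
    return Overlap::Identical;
  return Overlap::Partial;
}
```

For instance, `classify({0, 4}, {2, 2})` reports `Overlap::Partial`, while two 4-byte accesses at offset 0 are `Overlap::Identical`.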
[RFC][AMDGPU] Remove DebugCounter-based WaitCnt debugging (#202937)
It's 8 years old, only used by a handful of tests, and has not been
updated in a while except for maintenance as far as I can see.
I don't mind keeping it in if there are users of it, but right now it
looks like a dead feature. If we want some more elaborate waitcnt
debugging, we should have a modern, generic system that works on any
waitcnt, not something specific to 3 GFX9 counters.
[NFC][AMDGPU][InsertWaitCnts] Move some simple functions into Utils (#202936)
Move really trivial functions into helpers to declutter InsertWaitCnt a
bit more.
I had to move HardwareLimits into a different header but it's only used
in InsertWaitCnt so it doesn't matter.
[libc][NFC] Migrate unistd entrypoints to syscall wrappers (#204176)
Migrated link, ftruncate, and getentropy entrypoints to use their
corresponding syscall wrappers instead of direct syscall_impl calls.
Updated CMake dependencies accordingly.
Assisted-by: Automated tooling, human reviewed.
[clang][CodeGen] Fix crash on if/switch init-statement ending in noreturn (#201047)
EmitStmt may call `ClearInsertionPoint()` to mark dead code, but EmitDecl is
not prepared to handle a cleared insertion point. Fix this by calling
`EnsureInsertPoint()` in the transition from EmitStmt to EmitDecl. The
if/switch body may still contain a label, which makes it reachable rather
than dead.
Fixes #115514.
[LifetimeSafety] Mark lifetime safety LangOptions as `Benign` (#204316)
Without this, we cannot load modules built without lifetime safety.
Analysis options are, in general, benign and do not affect AST
construction.
See doc:
```cpp
/// For ASTs produced with different option value, signifies their level of
/// compatibility.
enum class CompatibilityKind {
/// Does affect the construction of the AST in a way that does prevent
/// module interoperability.
NotCompatible,
/// Does affect the construction of the AST in a way that doesn't prevent
/// interoperability (that is, the value can be different between an
/// explicit module and the user of that module).
Compatible,
/// Does not affect the construction of the AST in any way (that is, the
[4 lines not shown]
[LV] Add initial cost model for VPScalarIVSteps (#203347)
This PR currently only adds a cost model for integer types in
non-replicating regions in order to limit the scope of impact.
We can also support replicating regions, but that requires
looking for a recipe with an underlying value in the same
region in order to get a BasicBlock to pass in to the
getPredBlockCostDivisor function. This can be done in a future
PR.
[DirectX][ObjectYAML][NFC] Remove unused function (#204019)
A small follow-up for #202761.
The `updateSize()` function added there is a rebase artifact. It is never
actually used. This change removes it.
[FuncSpec] Do not specialize interposable functions (#204314)
We cannot specialize interposable functions, because the definition we
see may not be the prevailing one. The prevailing definition can have
arbitrarily different behavior.
We *can* still specialize inexact definitions like linkonce_odr, similar
to inlining.
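The distinction between interposable and merely inexact definitions can be sketched with a simplified, hypothetical linkage model. The `Linkage` enum and both helpers below are assumptions for illustration only; LLVM's real linkage kinds are `llvm::GlobalValue::LinkageTypes`, and this model ignores details such as semantic interposition of external definitions.

```cpp
// Simplified, hypothetical linkage kinds (not LLVM's LinkageTypes).
enum class Linkage { External, Internal, LinkOnceODR, WeakODR, LinkOnceAny, WeakAny };

// Interposable: the linker may pick a different definition from another
// module, and that definition may behave arbitrarily differently.
bool isInterposable(Linkage L) {
  return L == Linkage::LinkOnceAny || L == Linkage::WeakAny;
}

// Inexact but not interposable (e.g. linkonce_odr): another copy may be
// chosen, but the ODR requires it to be equivalent, so specializing the
// body we see is still sound, as with inlining.
bool maySpecialize(Linkage L) { return !isInterposable(L); }
```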
[llvm] Fix unused function warning in Parallel (#204114)
When llvm is built without threading support:
<...>/llvm-project/llvm/lib/Support/Parallel.cpp:230:13: warning: unused function 'isNested' [-Wunused-function]
  230 | static bool isNested() {
      |             ^~~~~~~~
The function is only used once, so I've put the code into the caller,
which is itself guarded with `#if LLVM_ENABLE_THREADS`.
Function added in 8daaa26efdda3802f73367d844b267bda3f84cbe / #189293.
[mlir][tosa] Add row_gather operator (#202895)
Adds support for the row_gather operator defined by the TOSA
specification, see https://github.com/arm/tosa-specification/pull/60.
This includes:
- Operator definition
- Verification logic for the operator
- Output shape inference for the operator
- Validation checks to ensure compliance with the TOSA specification
including profile compliance and level checks.
- Canonicalization to replace row_gather with gather when row_count is
statically known to be 1.
It does not yet cover support for MXFP types. This will be added once
block scaled types are supported.
[lldb][test] Cleanup and modernize TestHiddenIvars.py (#202023)
This is a simple rewrite of the test. The patch improves three things:
* It replaces old expect tests with the new expect_* variants that no
longer rely on substring matching.
* It unifies the stripped/non-stripped checks, as we actually produce
identical SBValues in both cases (by fetching data from the Objective-C
runtime).
* It builds this test with a shared build directory. Our stripping logic
generates a new stripped binary in a subdirectory and doesn't touch the
shared build files. This also halves the test runtime to 6s.
[lldb][test] Only calculate LLDB python path once (#201327)
We spend about 70ms in each dotest invocation recalculating the path where
the LLDB module is. This patch changes this so that dotest calculates
this path once and passes it to every dotest invocation.
As a fallback, we still support inferring the location from LLDB as
before, but I would propose we drop this support in the future.
[MIPS] soft-promote `f16` also when using `+msa` (#204158)
Fixes https://github.com/llvm/llvm-project/issues/202808
Re-lands https://github.com/llvm/llvm-project/pull/203065
Make use of the default soft-promote mechanism for f16, rather than an
ad-hoc approach making f16 storage-only.
In theory you could leave it at that, but I added custom implementations
to make use of the instructions for `FP16_TO_FP` and `FP_TO_FP16`, and
manually apply the "fptoui to fptosi trick" which generates shorter
code.
I've now tested that, in combination with
https://github.com/llvm/llvm-project/pull/203390, this PR is able to
build and run the rust `std` test suite, which exercises both `f16` and
vectors a bunch. The tests all pass under `qemu` as well.
The last commit fixes an integer overflow bug that triggered UBSan and
led to an earlier revert of these changes.
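For reference, under soft promotion `f16` values are carried as 16-bit integers and widened with `FP16_TO_FP` before arithmetic. The portable sketch below spells out what that widening computes for an IEEE 754 binary16 bit pattern; `halfBitsToFloat` is a hypothetical name, and the PR's custom lowering uses the available instructions rather than code like this.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Converts an IEEE 754 binary16 bit pattern to float, i.e. what an
// FP16_TO_FP node computes. Every half value is exactly representable as a
// float, so no rounding is needed.
float halfBitsToFloat(uint16_t H) {
  uint32_t Sign = static_cast<uint32_t>(H & 0x8000u) << 16;
  uint32_t Exp = (H >> 10) & 0x1Fu;
  uint32_t Mant = H & 0x3FFu;
  uint32_t Bits;
  if (Exp == 0) {
    // Zero or subnormal: value = Mant * 2^-24 (sign applied separately).
    float F = std::ldexp(static_cast<float>(Mant), -24);
    return Sign ? -F : F;
  }
  if (Exp == 0x1F) {
    // Infinity or NaN: all-ones exponent, keep the payload bits.
    Bits = Sign | 0x7F800000u | (Mant << 13);
  } else {
    // Normal number: rebias the exponent from 15 to 127.
    Bits = Sign | ((Exp + 112u) << 23) | (Mant << 13);
  }
  float Out;
  std::memcpy(&Out, &Bits, sizeof(Out));
  return Out;
}
```

The reverse direction (`FP_TO_FP16`) additionally has to round to nearest-even and handle overflow, which is why it is not sketched here.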
[InstCombine] Fold X == Identity ? Y : min/max(X, Y) (#202748)
Fixes #202576
Fold:
```llvm
select (X == -1), Y, umin(X, Y) -> umin(X, Y)
select (X == 0), Y, umax(X, Y) -> umax(X, Y)
select (X == SignedMax), Y, smin(X, Y) -> smin(X, Y)
select (X == SignedMin), Y, smax(Y, X) -> smax(X, Y)
```
And the inverted/commuted forms.
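Each fold works because the compared constant is the identity element of the min/max, so selecting `Y` when `X` equals it gives the same result as the min/max itself. The hypothetical brute-force check below (not taken from the PR) verifies this exhaustively for 8-bit values; note that the identity for `smax` is the signed minimum, not the signed maximum.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Brute-force check: for every X and Y of type T,
//   select(X == Identity, Y, Op(X, Y)) == Op(X, Y).
template <typename T, typename MinMax>
bool selectFoldHolds(T Identity, MinMax Op) {
  using Lim = std::numeric_limits<T>;
  for (int XI = Lim::min(); XI <= Lim::max(); ++XI) {
    for (int YI = Lim::min(); YI <= Lim::max(); ++YI) {
      T X = static_cast<T>(XI), Y = static_cast<T>(YI);
      T Folded = Op(X, Y);
      T Select = (X == Identity) ? Y : Op(X, Y);
      if (Select != Folded)
        return false;
    }
  }
  return true;
}

// The four min/max intrinsics on i8, modeled with 8-bit C++ types. They are
// commutative, so the commuted forms follow from the same check.
inline uint8_t umin8(uint8_t A, uint8_t B) { return std::min(A, B); }
inline uint8_t umax8(uint8_t A, uint8_t B) { return std::max(A, B); }
inline int8_t smin8(int8_t A, int8_t B) { return std::min(A, B); }
inline int8_t smax8(int8_t A, int8_t B) { return std::max(A, B); }
```

Calling `selectFoldHolds<int8_t>(std::numeric_limits<int8_t>::max(), smax8)` returns false (for example, X = 127 and Y = 0), which shows why the `smax` fold must guard on the signed minimum.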
AI note: I used AI to help me read through the codebase and write the
tests.
AMDGPU: Refactor AMDGPUTargetID to not store MCSubtargetInfo
Store the triple string and GPUKind instead. The dependence
on checking AMDHSA seems like an anti-feature, but maintain the
behavior of not printing the modifiers for other OSes. Start
parsing the target ID instead of performing a direct string
comparison. Also improve test coverage for the treatment of the
environment component of the triple. The main behavioral change
is this will now produce normalized triples in the output and
diagnostics. Practically, this means all of the places that
currently emit "--" will be expanded into "-unknown-".
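To make the `--` to `-unknown-` change concrete, here is a hypothetical helper that fills empty `-`-separated components with `unknown`, assuming an input of the form `amdgcn-amd-amdhsa--gfx90a`. It only mimics the effect described above; it is not `llvm::Triple::normalize()`, which does considerably more (such as recognizing and reordering components).

```cpp
#include <string>

// Hypothetical illustration only: replace every empty '-'-separated
// component with "unknown". This mimics the "--" -> "-unknown-" change;
// it is not llvm::Triple::normalize().
std::string fillEmptyComponents(const std::string &Id) {
  std::string Out;
  std::string::size_type Start = 0;
  while (true) {
    std::string::size_type Dash = Id.find('-', Start);
    std::string Part = Id.substr(
        Start, Dash == std::string::npos ? std::string::npos : Dash - Start);
    Out += Part.empty() ? std::string("unknown") : Part;
    if (Dash == std::string::npos)
      break;
    Out += '-';
    Start = Dash + 1;
  }
  return Out;
}
```

Applied to `amdgcn-amd-amdhsa--gfx90a`, it yields `amdgcn-amd-amdhsa-unknown-gfx90a`; strings without empty components are returned unchanged.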
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
[AMDGPU][InsertWaitCnts] Make HWEvent a BitMask
Follow up from comments on https://github.com/llvm/llvm-project/pull/202886
Make HWEvent a bitmask by default instead of having both the enum and a separate HWEventSet. This has the advantage of streamlining the code a bit and opens the possibility of adding "modifiers" to events; e.g., I imagine we could now fold "VMemType" into the Events.
We already do this with things like SMEM_GROUP. At least now it's baked into the design.
I opted for a bit more verbosity by taking inspiration from FastMathFlags (FMF): instead of exposing a raw enum, I wrap it in a class with helper functions. The downside is having to reimplement all the little bitwise ops, but the result is a cleaner, simpler interface than a raw enum (class) with many helper functions. I initially tried that, but I recoiled at the sight of things like `contains(A, B)`, which isn't very clear, while `A.contains(B)` is self-explanatory.
Since HWEvent is a bitmask, I also implemented a simple iterator over all set bits of the mask, which is useful because some APIs in InsertWaitCnt rely on handling one event at a time.
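A minimal sketch of the design described above: a small class that wraps a raw bitmask (in the spirit of `FastMathFlags`), exposes `A.contains(B)`, and iterates over the set bits one single-bit mask at a time. `EventMask` and its event names are hypothetical placeholders, not the actual `HWEvent` definition or event list from InsertWaitCnt.

```cpp
#include <cstdint>

// Hypothetical stand-in for HWEvent: a class wrapping a raw bitmask.
class EventMask {
public:
  // Placeholder event bits; the real event list is different.
  enum Bit : uint32_t {
    VMemAccess = 1u << 0,
    LDSAccess = 1u << 1,
    SMemAccess = 1u << 2,
    ExpGPRLock = 1u << 3,
  };

  constexpr EventMask() = default;
  constexpr EventMask(uint32_t Bits) : Raw(Bits) {}

  constexpr EventMask operator|(EventMask O) const { return EventMask(Raw | O.Raw); }
  constexpr EventMask operator&(EventMask O) const { return EventMask(Raw & O.Raw); }
  EventMask &operator|=(EventMask O) { Raw |= O.Raw; return *this; }
  constexpr bool operator==(EventMask O) const { return Raw == O.Raw; }

  // True if every event in O is also set in *this.
  constexpr bool contains(EventMask O) const { return (Raw & O.Raw) == O.Raw; }
  constexpr bool empty() const { return Raw == 0; }

  // Yields one single-bit EventMask per set bit, lowest bit first.
  class iterator {
  public:
    explicit iterator(uint32_t Bits) : Rest(Bits) {}
    // Isolate the lowest set bit.
    EventMask operator*() const { return EventMask(Rest & (~Rest + 1u)); }
    // Clear the lowest set bit.
    iterator &operator++() { Rest &= Rest - 1u; return *this; }
    bool operator!=(const iterator &O) const { return Rest != O.Rest; }

  private:
    uint32_t Rest;
  };
  iterator begin() const { return iterator(Raw); }
  iterator end() const { return iterator(0); }

private:
  uint32_t Raw = 0;
};
```

With this shape, `for (EventMask E : Pending)` visits each pending event individually, which fits APIs that handle one event at a time.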