[AArch64][SVE] Enable known bits for predicated shifts (#200347)
Allow SelectionDAG to query target known-bits information for scalable
vector nodes, and add known-bits cases for SVE predicated SHL, SRL and SRA
nodes.
This enables DAG combines to prove disjointness for ORs involving
scalable vector shifts, enabling USRA/SSRA instruction selection.
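As a rough illustration of why shift known bits help here, the following sketch models a single 64-bit lane and ignores predication entirely. The struct and function names are hypothetical and are not the SelectionDAG API; the sketch only shows that an OR is disjoint when every bit is known zero in at least one operand.

```cpp
#include <cstdint>

// Hypothetical model: a set bit in Zero means that bit is known to be zero.
struct KnownZero64 {
  uint64_t Zero = 0;
};

// After a left shift by a constant, the low Amt bits are known zero.
inline KnownZero64 knownZeroAfterShl(unsigned Amt) {
  KnownZero64 K;
  if (Amt >= 64)
    K.Zero = ~uint64_t{0};
  else if (Amt > 0)
    K.Zero = (uint64_t{1} << Amt) - 1;
  return K;
}

// After a logical right shift by a constant, the high Amt bits are known zero.
inline KnownZero64 knownZeroAfterSrl(unsigned Amt) {
  KnownZero64 K;
  if (Amt >= 64)
    K.Zero = ~uint64_t{0};
  else if (Amt > 0)
    K.Zero = ~(~uint64_t{0} >> Amt);
  return K;
}

// An OR is disjoint if each bit is known zero in at least one operand.
// A disjoint OR computes the same value as an ADD.
inline bool orIsDisjoint(KnownZero64 A, KnownZero64 B) {
  return (A.Zero | B.Zero) == ~uint64_t{0};
}
```

With these helpers, `orIsDisjoint(knownZeroAfterShl(56), knownZeroAfterSrl(8))` holds, so `(x << 56) | (y >> 8)` can be treated as an addition, which is the shape a shift-right-and-accumulate instruction matches.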
[clang][Sema][CUDA] Restrict immediate template resolution to host-device functions (#200662)
Since overload resolution gives higher priority to `__host__` and
`__device__` attributes, HD functions may favor template candidates even
when a non‑template candidate would be a perfect match. This patch
resolves templates eagerly only for HD functions, not for all code
compiled with `-x cuda`, thus preventing valid host code from being
rejected.
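For reference, the baseline C++ rule that the CUDA attribute priority can override looks like this. The sketch is plain C++ with a hypothetical function name `pick`; it does not model `__host__` or `__device__` attributes, which plain C++ cannot express.

```cpp
// Non-template candidate.
inline int pick(int) { return 1; }

// Template candidate that matches any argument type after deduction.
template <typename T> inline int pick(T) { return 2; }

// pick(42)  selects the non-template overload: both candidates are exact
//           matches, and the tie is broken in favor of the non-template.
// pick(42L) selects the template: it is an exact match for long, while the
//           non-template overload would need a long-to-int conversion.
```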
Close #200545
[CIR][AArch64] Lower NEON Widen && Widening subtraction intrinsics (#204088)
## summary
This is part of: https://github.com/llvm/llvm-project/issues/185382
Follow-up to: https://github.com/llvm/llvm-project/pull/202857
Lowers a subset of the Widen and Widening subtraction intrinsics.
### why implement two sets of intrinsics in one PR?
Widening subtraction depends on the widen intrinsics during lowering, so
I implemented them in the same PR.
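A scalar model may make the dependency clearer. The sketch below assumes that "widen" means sign-extending each 8-bit lane to 16 bits and that widening subtraction widens both operands before subtracting; the type aliases and function names are hypothetical and are not the CIR lowering code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using I8x8 = std::array<int8_t, 8>;
using I16x8 = std::array<int16_t, 8>;

// Widen: sign-extend each 8-bit lane to 16 bits.
inline I16x8 widen(const I8x8 &V) {
  I16x8 R{};
  for (std::size_t I = 0; I < V.size(); ++I)
    R[I] = static_cast<int16_t>(V[I]);
  return R;
}

// Widening subtraction: widen both operands, then subtract lane-wise.
// The result cannot overflow 16 bits because each input fits in 8 bits.
inline I16x8 wideningSub(const I8x8 &A, const I8x8 &B) {
  const I16x8 WA = widen(A);
  const I16x8 WB = widen(B);
  I16x8 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = static_cast<int16_t>(WA[I] - WB[I]);
  return R;
}
```

In this model, subtracting -128 from 127 yields 255, a value that only fits because the operands were widened first.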
[flang][OpenMP] Refactor semantic check of SINGLE construct
Extract it into a separate function and simplify the code. Avoid making
the distinction between a clause appearing on the "begin" and the "end"
directives for the purposes of emitting diagnostic messages.
One change in behavior is that using the same list item multiple times
in COPYPRIVATE clause(s) is an error regardless of the placement of the
clauses. Previously in some cases it was treated as a warning.
Part of the motivation is the goal of eliminating explicit definitions
of end-directives for directives that are not delimited, e.g.
"end single", but not "end declare_variant".
[mlir][tosa] Preserve raw const data in signless conversion (#204324)
Use DenseElementsAttr::getFromRawBuffer when rebuilding tosa.const
attributes in TosaConvertIntegerTypeToSignless. The previous
DenseElementsAttr::get(type, ArrayRef<char>) call interpreted raw bytes
as i8 elements, which asserted for integer constants wider than 8 bits.
Add regression coverage for ui16, ui32, and ui48 constants.
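The underlying mismatch can be sketched without MLIR. The helpers below are hypothetical and only illustrate that a raw byte buffer must be decoded with the element width of the target type; counting it as one-byte elements gives the wrong element count for 16-bit constants.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Serialize 16-bit values into a raw byte buffer (host byte order).
inline std::vector<char> toRawBytes(const std::vector<uint16_t> &Values) {
  std::vector<char> Raw(Values.size() * sizeof(uint16_t));
  if (!Raw.empty())
    std::memcpy(Raw.data(), Values.data(), Raw.size());
  return Raw;
}

// Number of elements the buffer holds for a given element width in bytes.
inline std::size_t elementCount(const std::vector<char> &Raw,
                                std::size_t ElemBytes) {
  return Raw.size() / ElemBytes;
}

// Decode the raw buffer using the real element width (2 bytes).
inline std::vector<uint16_t> fromRawBytes(const std::vector<char> &Raw) {
  std::vector<uint16_t> Values(Raw.size() / sizeof(uint16_t));
  if (!Values.empty())
    std::memcpy(Values.data(), Raw.data(),
                Values.size() * sizeof(uint16_t));
  return Values;
}
```

For three 16-bit constants the buffer holds six bytes, so treating it as one-byte elements reports six elements instead of three, which is the kind of shape mismatch that would trip an assertion.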
Signed-off-by: Davide Grohmann <davide.grohmann at arm.com>
[IRBuilder] Refactor for intrinsics const-folding (NFC) (#202738)
In preparation to const-fold intrinsic calls, refactor the IRBuilder
API, generalizing it to return possibly constant-folded values.
AMDGPU: Refactor AMDGPUTargetID to not store MCSubtargetInfo
Store the triple string and GPUKind instead. The dependence
on checking AMDHSA seems like an anti-feature, but maintain the
behavior of not printing the modifiers for other OSes. Start
parsing the target ID instead of performing a direct string
comparison. Also improve test coverage for the treatment of the
environment component of the triple. The main behavioral change
is this will now produce normalized triples in the output and
diagnostics. Practically, this means all of the places that
currently emit "--" will be expanded into "-unknown-".
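The visible effect can be approximated with a small string helper. This is only a sketch: the function name is hypothetical, the input string in the usage note is an illustrative assumption, and the real triple normalization in LLVM does considerably more than fill in empty components.

```cpp
#include <cstddef>
#include <string>

// Replace every empty '-'-separated component with "unknown".
inline std::string expandEmptyComponents(const std::string &Id) {
  std::string Out;
  std::size_t Start = 0;
  while (true) {
    const std::size_t Dash = Id.find('-', Start);
    const std::string Part =
        Id.substr(Start, Dash == std::string::npos ? std::string::npos
                                                   : Dash - Start);
    Out += Part.empty() ? std::string("unknown") : Part;
    if (Dash == std::string::npos)
      break;
    Out += '-';
    Start = Dash + 1;
  }
  return Out;
}
```

Under these assumptions, `expandEmptyComponents("amdgcn-amd-amdhsa--gfx90a")` returns `"amdgcn-amd-amdhsa-unknown-gfx90a"`, and a string without empty components is returned unchanged.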
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
Revert "[flang][mem2reg] promote memory slots through declares" (#204332)
Reverts llvm/llvm-project#196975
This patch is causing regressions on some of our downstream testing. I
am not sure the logic here is directly to blame, but I'd rather revert
and investigate for now.
[lldb] Remove several system header includes from tests (#204072)
System includes slow down test compilation and create unnecessary
dependencies on system header code.
This patch removes system headers from tests that do not test their
functionality. For the most part, this just removes the dummy 'printf'
we had in many tests.
[RFC][LangRef] Specify that the accessed bytes of concurrent atomics must be either disjoint or the same
So far, the LangRef hasn't been clear on the semantics of partially overlapping
concurrent atomics in LLVM IR (specifically: a set of accesses marked as
`atomic` that would be in a data race if they weren't `atomic` and not all of
them access the exact same set of bytes).
What loads read is defined in terms of individual bytes, but the memory
ordering constraints are formulated closely to the C/C++ (and Java for
`unordered`) memory model, where partially overlapping atomics are not
possible. It's not obvious how concepts like C/C++'s per-location total
modification order for `monotonic` accesses map to accesses that can partially
overlap. While C/C++ relies on the modification order to ensure that atomics
cannot tear (i.e., atomic reads return bytes from two or more atomic writes),
our IR semantics (as written) currently does not guarantee this in the presence
of partially overlapping accesses.
This PR proposes a solution to this problem: It specifies that concurrent
overlapping atomics must access the exact same set of bytes to act atomically.
[7 lines not shown]
[RFC][AMDGPU] Remove DebugCounter-based WaitCnt debugging (#202937)
It's 8 years old, only used by a handful of tests, and has not been
updated in a while except for maintenance as far as I can see.
I don't mind keeping it in if there are users of it, but right now it
looks like a dead feature. If we want some more elaborate waitcnt
debugging, we should have a modern, generic system that works on any
waitcnt, not something specific to 3 GFX9 counters.
[NFC][AMDGPU][InsertWaitCnts] Move some simple functions into Utils (#202936)
Move really trivial functions into helpers to declutter InsertWaitCnt a
bit more.
I had to move HardwareLimits into a different header but it's only used
in InsertWaitCnt so it doesn't matter.
[libc][NFC] Migrate unistd entrypoints to syscall wrappers (#204176)
Migrated link, ftruncate, and getentropy entrypoints to use their
corresponding syscall wrappers instead of direct syscall_impl calls.
Updated CMake dependencies accordingly.
Assisted-by: Automated tooling, human reviewed.
[clang][CodeGen] Fix crash on if/switch init-statement ending in noreturn (#201047)
EmitStmt may call `ClearInsertionPoint()` to mark dead code, but EmitDecl
is not prepared to handle a cleared insertion point. Fix this by calling
`EnsureInsertPoint()` in the transition from EmitStmt to EmitDecl. The
if/switch body may contain a label, which makes it not dead.
Fixes #115514.
[LifetimeSafety] Mark lifetime safety LangOptions as `Benign` (#204316)
Without this, we cannot load modules built without lifetime safety.
Analysis options are in general benign and do not affect AST
construction.
See doc:
```cpp
/// For ASTs produced with different option value, signifies their level of
/// compatibility.
enum class CompatibilityKind {
/// Does affect the construction of the AST in a way that does prevent
/// module interoperability.
NotCompatible,
/// Does affect the construction of the AST in a way that doesn't prevent
/// interoperability (that is, the value can be different between an
/// explicit module and the user of that module).
Compatible,
/// Does not affect the construction of the AST in any way (that is, the
[4 lines not shown]
[LV] Add initial cost model for VPScalarIVSteps (#203347)
This PR currently only adds a cost model for integer types in
non-replicating regions in order to limit the scope of impact.
We can also support replicating regions, but that requires
looking for a recipe with an underlying value in the same
region in order to get a BasicBlock to pass in to the
getPredBlockCostDivisor function. This can be done in a future
PR.
[DirectX][ObjectYAML][NFC] Remove unused function (#204019)
A small follow-up for #202761.
The `updateSize()` function added there is a rebase artifact. It is never
actually used. This change removes it.
[FuncSpec] Do not specialize interposable functions (#204314)
We cannot specialize interposable functions, because the definition we
see may not be the prevailing one. The prevailing definition can have
arbitrarily different behavior.
We *can* still specialize inexact definitions like linkonce_odr, similar
to inlining.
[llvm] Fix unused function warning in Parallel (#204114)
When llvm is built without threading support:
<...>/llvm-project/llvm/lib/Support/Parallel.cpp:230:13: warning: unused
function 'isNested' [-Wunused-function]
230 | static bool isNested() {
| ^~~~~~~~
The function is only used once, so I've put the code into the caller,
which is itself guarded with `#if LLVM_ENABLE_THREADS`.
Function added in 8daaa26efdda3802f73367d844b267bda3f84cbe / #189293.
[mlir][tosa] Add row_gather operator (#202895)
Adds support for the row_gather operator defined by the TOSA
specification, see https://github.com/arm/tosa-specification/pull/60.
This includes:
- Operator definition
- Verification logic for the operator
- Output shape inference for the operator
- Validation checks to ensure compliance with the TOSA specification
including profile compliance and level checks.
- Canonicalization to replace row_gather with gather when row_count is
statically known to be 1.
It does not yet cover support for MXFP types. This will be added once
block scaled types are supported.
[lldb][test] Cleanup and modernize TestHiddenIvars.py (#202023)
This is a simple rewrite of the test. The patch improves three things:
* It replaces old expect tests with the new expect_* variants that no
longer rely on substring matching.
* It unifies the stripped/non-stripped checks as we actually produce
identical SBValues in both cases (by fetching data from the Objective-C
runtime).
* It builds this test with a shared build directory. Our stripping logic
generates a new stripped binary in a subdirectory and doesn't touch the
shared build files. This also halves the test runtime to 6s.