[AMDGPU] Pre-commit unit test for RP tracking `reset`/`advance` inconsistencies fix (#196098)
This adds a new AMDGPU unit test file for testing the behavior of
`GCNRPTracker` and its related classes. The two tests showcase confusing
return-value and behavioral semantics for variants of the advance and
reset functions, which will be clarified in a follow-up commit.
[PowerPC][NFC] Refactor EmitInstrWithCustomInserter (#196114)
Currently PPCTargetLowering::EmitInstrWithCustomInserter() uses a large
if/else-if structure. Update it to use a switch and move the
ATOMIC_CMP_SWAP and SELECT code to helper functions for better
readability and maintenance.
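A minimal sketch of the shape of this refactor, not the actual PowerPC code: the `Opcode` enum, the helper names, and the string return values are all hypothetical stand-ins chosen so the example is self-contained.

```cpp
#include <string>

// Hypothetical opcode set; the real function dispatches on MachineInstr opcodes.
enum class Opcode { AtomicCmpSwap, Select, Other };

// Hypothetical helpers: each one holds the code that used to live inline
// in one branch of the if/else-if chain.
static std::string emitAtomicCmpSwap() { return "atomic-cmp-swap"; }
static std::string emitSelect() { return "select"; }

// The dispatcher becomes a flat switch that delegates to the helpers.
static std::string emitWithCustomInserter(Opcode Op) {
  switch (Op) {
  case Opcode::AtomicCmpSwap:
    return emitAtomicCmpSwap();
  case Opcode::Select:
    return emitSelect();
  case Opcode::Other:
    break;
  }
  return "default";
}
```

Listing every enumerator without a `default:` label lets the compiler warn when a new opcode is added but not handled.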
clang/AMDGPU: Pass BoundArch through device libs handling
Pre-work to consolidate target identification for future target
option bug fixes. Also requires updating flang to match recent
clang changes.
Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
[SLP] Vectorize struct-returning intrinsics
Allow SLP to combine across lanes calls that return a literal struct
(llvm.sincos, llvm.*.with.overflow, llvm.frexp, ...) into a single
call returning a struct of vectors, by widening {T, T, ...} to
{<VF x T>, ...} via VectorTypeUtils and emitting extractvalue +
extractelement for external uses.
Reviewers: hiraditya, bababuck, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/195521
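The widening described above can be pictured with a plain C++ analogy. This is only an illustration under assumed names (`SinCos`, `SinCosVec`, `sincosWide`, `extractLane` are hypothetical and are not SLP or VectorTypeUtils APIs): a struct of scalars becomes a struct of VF-wide vectors, and an external user reads one field, then one lane.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Scalar form: one call returns a literal struct {T, T}.
struct SinCos {
  float Sin;
  float Cos;
};
static SinCos sincosScalar(float X) { return {std::sin(X), std::cos(X)}; }

// Widened form: one call returns {<VF x T>, <VF x T>}.
constexpr std::size_t VF = 4;
struct SinCosVec {
  std::array<float, VF> Sin;
  std::array<float, VF> Cos;
};
static SinCosVec sincosWide(const std::array<float, VF> &X) {
  SinCosVec R{};
  for (std::size_t Lane = 0; Lane < VF; ++Lane) {
    R.Sin[Lane] = std::sin(X[Lane]);
    R.Cos[Lane] = std::cos(X[Lane]);
  }
  return R;
}

// External use: pick the struct field first (like extractvalue), then the
// lane (like extractelement).
static float extractLane(const SinCosVec &V, unsigned Field, std::size_t Lane) {
  return Field == 0 ? V.Sin[Lane] : V.Cos[Lane];
}
```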
[AArch64][GlobalISel] Legalize F64 to BF16 fptruncates (#196077)
The two-step expansion of an f64-to-bf16 fptrunc needs to be careful to
avoid double-rounding errors. On AArch64 we can apparently convert to
an fcvtxn, which performs round-to-odd, followed by a standard fp truncate
to bf16, so that the final rounding is done correctly. This
reuses the existing lowering added for vector operations.
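To illustrate why the intermediate step uses round-to-odd, here is a software sketch that emulates the two steps on the host. It is not the GlobalISel lowering: the helper names are hypothetical, the emulation assumes finite inputs and the default round-to-nearest-even FP environment, and it stands in for what fcvtxn does in hardware.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

static uint32_t bitsOf(float F) {
  uint32_t B;
  std::memcpy(&B, &F, sizeof B);
  return B;
}

static float floatOf(uint32_t B) {
  float F;
  std::memcpy(&F, &B, sizeof F);
  return F;
}

// f32 -> bf16 with round-to-nearest-even (finite inputs only).
static uint16_t f32ToBF16(float F) {
  uint32_t B = bitsOf(F);
  uint32_t Bias = 0x7FFFu + ((B >> 16) & 1u);
  return static_cast<uint16_t>((B + Bias) >> 16);
}

// f64 -> f32 with round-to-odd: truncate toward zero and, if the result is
// inexact, force the lowest mantissa bit to 1 so the lost bits stay visible
// to the next rounding step.
static float f64ToF32RoundToOdd(double D) {
  float F = static_cast<float>(D); // nearest-even under the default mode
  if (static_cast<double>(F) == D)
    return F; // exact, nothing to record
  if (std::fabs(static_cast<double>(F)) > std::fabs(D))
    F = std::nextafterf(F, 0.0f); // undo rounding away from zero
  return floatOf(bitsOf(F) | 1u);
}

// A value just above the midpoint between two adjacent bf16 values.
static double justAboveTie() {
  return 1.0 + std::ldexp(1.0, -8) + std::ldexp(1.0, -30);
}
```

With `justAboveTie()`, rounding to f32 with nearest-even first lands exactly on the bf16 tie, which then rounds down to 0x3F80, whereas the round-to-odd path keeps the sticky bit and yields the correctly rounded 0x3F81.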
[Clang][Modules] Fix -Wunused-variable (#196577)
Mark some variables [[maybe_unused]] and inline others that do not have
side effects to avoid -Wunused-variable in non-assert builds.
[Object][Wasm] Fix off-by-one in data segment name index validation (#196338)
The check `Index > DataSegments.size()` in `parseNameSection()` allows
`Index == DataSegments.size()`, which is an out-of-bounds access.
In an assertions-disabled ASan build, a malformed wasm object with one
data segment and a data segment name entry using index 1 triggers a
heap-buffer-overflow READ in `WasmObjectFile::parseNameSection()`.
Fix by checking `Index >= DataSegments.size()` instead.
Also add a regression test that verifies the malformed input is rejected
with "invalid data segment name entry".
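The off-by-one can be reproduced in isolation. The two predicates below are hypothetical stand-ins for the old and new checks, taking the segment count as a plain number rather than the real `DataSegments` container.

```cpp
#include <cstddef>

// Old check: rejects only Index > Count, so Index == Count slips through.
static bool acceptedByOldCheck(std::size_t Index, std::size_t Count) {
  return !(Index > Count);
}

// Fixed check: rejects Index >= Count, so only 0 .. Count-1 are accepted.
static bool acceptedByNewCheck(std::size_t Index, std::size_t Count) {
  return !(Index >= Count);
}
```

With one data segment (`Count == 1`), index 1 passes the old check and is rejected by the new one, which matches the malformed input described above.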
[libc] Fix op_tests Memcmp guard to require SSE4.1 (#196572)
The is_vector<__m128i> specialisation in op_x86.h is gated on
__SSE4_1__, but op_tests.cpp included generic::Memcmp<__m128i> under the
weaker __SSE2__ guard. On baseline x86-64 (where __SSE2__ is always
defined but __SSE4_1__ may not be), this caused a static_assert failure
in is_element_type_v.
Changed the guard from __SSE2__ to __SSE4_1__ to match the
specialisation requirement, consistent with how BcmpImplementations
already guards its __m128i entry.
Assisted-by: Automated tooling, human reviewed.
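The guard mismatch can be sketched as follows. This is an illustration only: `IsVectorSketch`, `checkedElement`, and `HasVectorMemcmp` are hypothetical names, not the ones in op_x86.h or op_tests.cpp. The point is that any use of the `__m128i` specialisation must sit behind the same macro that gates the specialisation itself.

```cpp
#include <type_traits>
#if defined(__SSE2__)
#include <emmintrin.h> // provides __m128i
#endif

// Hypothetical trait: false unless a specialisation says otherwise.
template <typename T> struct IsVectorSketch : std::false_type {};

#if defined(__SSE4_1__)
// The specialisation only exists when SSE4.1 is enabled.
template <> struct IsVectorSketch<__m128i> : std::true_type {};
#endif

// Hypothetical user of the trait; instantiating it with an unsupported
// type fails to compile, similar to the static_assert in the report.
template <typename T> constexpr bool checkedElement() {
  static_assert(IsVectorSketch<T>::value, "type is not a supported vector");
  return true;
}

// Guarding this with __SSE2__ instead would break baseline x86-64 builds,
// because the specialisation above would be missing there.
#if defined(__SSE4_1__)
constexpr bool HasVectorMemcmp = checkedElement<__m128i>();
#else
constexpr bool HasVectorMemcmp = false;
#endif
```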
[DAG] canCreateUndefOrPoison - ISD::FCEIL/FFLOOR/FTRUNC/FRINT/FNEARBYINT/FROUND/FROUNDEVEN can never create poison/undef (#196543)
Also add missing fold support for ftrunc(fround(x)) -> fround(x)
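The fold relies on fround already producing an integral value, so truncating it afterwards changes nothing. A quick numeric check of that identity, using `std::round` and `std::trunc` as stand-ins for the DAG nodes and a hypothetical helper name (NaN inputs are excluded because NaN never compares equal to itself):

```cpp
#include <cmath>

// True when trunc(round(X)) and round(X) agree in both value and sign bit.
static bool truncOfRoundIsRound(double X) {
  double R = std::round(X);
  double T = std::trunc(R);
  return T == R && std::signbit(T) == std::signbit(R);
}
```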
clang: Add BoundArch argument to addClangTargetOptions
addClangTargetOptions already has an OffloadKind argument,
but it makes little sense for a function to know the
OffloadKind without the associated BoundArch.
The current process is kind of convoluted. TranslateArgs
synthesizes a -mcpu argument from BoundArch, and later
addClangTargetOptions re-parses that -mcpu argument each
time it wants the architecture. Add this argument so this
can be cleaned up in a future change.
Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
clang: Consolidate -aux-triple handling
All of the offload languages were essentially doing the
same thing, with overcomplicated conditions that depended on
the language.
[AMDGPU] Pre-commit unit test for RP tracking reset/advance behavior
This adds a new AMDGPU unit test file for testing the behavior of
`GCNRPTracker` and its related classes. The two tests showcase confusing
return-value and behavioral semantics for variants of the advance and
reset functions, which will be clarified in a follow-up commit.
This also moves some common test helpers from other AMDGPU unit tests to
the `AMDGPUUnitTests` TU to avoid repetition between unit tests.
[CodeGen][AMDGPU] Move boilerplate unit test code to base class (NFC) (#196547)
This adds the `CodeGenTestBase` class to handle boilerplate code for
codegen unit tests and makes use of it wherever possible, in particular
in AMDGPU unit tests.
Furthermore, this makes all AMDGPU unit tests rely on GoogleTest's API
for "run once per test-suite" code, instead of re-implementing that
behavior using a `std::once_flag`. As a consequence, all TEST(...) become
TEST_F(...).