[Clang] Track temporary cleanups in rebuilt default member initializers
Fixes https://github.com/llvm/llvm-project/issues/196469
When Clang rebuilds a default member initializer for CWG1815 lifetime extension, TreeTransform's initializer path can drop CXXBindTemporaryExpr cleanup information. That loses destructor cleanup for ordinary temporaries inside the initializer; for a DMI-local lambda with an init-capture, the closure temporary is not destroyed at the end of the full-expression.
Handle CXXBindTemporaryExpr explicitly while rebuilding these initializers, rebind transformed subexpressions with MaybeBindToTemporary, and remember whether the rebuilt initializer still needs non-lifetime-extended cleanups. After discarding the cleanups collected for lifetime extension, restore the ExprWithCleanups marker only when such a rebuilt temporary remains.
When MaybeBindToTemporary references an implicit destructor and Sema has synthesized its body, pass that declaration to the AST consumer because there may be no later top-level definition point for DMI-local closure types. Add a CodeGenCXX regression test for a lambda init-capture in a default member initializer.
Assisted By: OpenAI Codex
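For illustration, the kind of code affected might look like the sketch below. This is a hypothetical reduction, not the regression test added by the commit; `Tracker`, `S`, and `buildAndReadValue` are made-up names. The closure object created in the default member initializer is a temporary, so its captured `Tracker` should be destroyed at the end of the full-expression.

```cpp
// Hypothetical reduction; names are illustrative and not taken from the commit.
struct Tracker {
  static int Live;                      // number of Tracker objects currently alive
  Tracker() { ++Live; }
  Tracker(const Tracker &) { ++Live; }
  ~Tracker() { --Live; }
};
int Tracker::Live = 0;

struct S {
  // The lambda's init-capture owns a Tracker. The closure is a temporary
  // that should be destroyed once the initializer has been evaluated.
  int Value = [T = Tracker{}] { (void)T; return 42; }();
};

// Constructs an S through its default member initializer and reads the value.
int buildAndReadValue() {
  S Obj;
  return Obj.Value;
}
```

On a compiler that destroys the closure temporary correctly, `Tracker::Live` is expected to be back at 0 after `buildAndReadValue()` returns.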
[flang][OpenACC] support collapse on unstructured acc.loop (#196174)
PR #164992 added unstructured-loop support to OpenACC lowering (no
bounds on acc.loop, IVs privatized, body emitted as explicit cf), but it
didn't cover the `collapse(N)` case. Compiling
```
!$acc parallel loop collapse(2)
do j = 1, n
  do i = 1, n
    if (i == jdiag) then
      a(i,j) = 0.0d0
      cycle
    end if
    a(i,j) = real(i + j, 8)
  end do
end do
```
asserted in MLIR's runRegionDCE: "Assertion `mightHaveTerminator()'
failed".
[14 lines not shown]
clang: Consolidate -aux-triple handling (#196551)
All of the offload languages were essentially doing the
same thing, with overcomplicated conditions that depended on
the language.
[AMDGPU] Pre-commit unit test for RP tracking `reset`/`advance` inconsistencies fix (#196098)
This adds a new AMDGPU unit test file for testing the behavior of
`GCNRPTracker` and its related classes. The two tests showcase confusing
return value and behavioral semantics for variants of the advance and
reset functions, which will be clarified in a follow-up commit.
[PowerPC][NFC] Refactor EmitInstrWithCustomInserter (#196114)
Currently PPCTargetLowering::EmitInstrWithCustomInserter() uses a large
if/else-if structure. Update to use switch and
move ATOMIC_CMP_SWAP and SELECT code to helper functions for better
readability and maintenance.
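The shape of this refactor can be sketched generically as follows. This is only an illustration of switch-based dispatch to helpers; `PseudoOp`, `emitAtomicCmpSwap`, `emitSelect`, and `emitWithCustomInserter` are hypothetical names and do not reflect the real PPCTargetLowering code.

```cpp
#include <string>

// Hypothetical opcode set; the real function dispatches on MachineInstr opcodes.
enum class PseudoOp { AtomicCmpSwap, Select, Other };

// Helpers that would hold the code previously inlined in the if/else-if chain.
static std::string emitAtomicCmpSwap() { return "atomic_cmp_swap"; }
static std::string emitSelect() { return "select"; }

// A switch makes each case explicit and keeps the dispatcher short.
std::string emitWithCustomInserter(PseudoOp Op) {
  switch (Op) {
  case PseudoOp::AtomicCmpSwap:
    return emitAtomicCmpSwap();
  case PseudoOp::Select:
    return emitSelect();
  case PseudoOp::Other:
    break;
  }
  return "default";
}
```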
clang/AMDGPU: Pass BoundArch through device libs handling
Pre-work to consolidate target identification for future target
option bug fixes. Also requires updating flang to match recent
clang changes.
Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
[SLP] Vectorize struct-returning intrinsics
Allow SLP to combine across lanes calls that return a literal struct
(llvm.sincos, llvm.*.with.overflow, llvm.frexp, ...) into a single
call returning a struct of vectors, by widening {T, T, ...} to
{<VF x T>, ...} via VectorTypeUtils and emitting extractvalue +
extractelement for external uses.
Reviewers: hiraditya, bababuck, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/195521
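As a rough model of the widening described above, the sketch below uses plain C++ types in place of LLVM IR. `SinCos`, `SinCosVec`, and the function names are assumptions made for illustration; the fixed `VF` of 4 is also arbitrary.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

constexpr std::size_t VF = 4;           // illustrative vectorization factor

// Scalar result: models the literal struct {T, T}.
struct SinCos {
  float Sin;
  float Cos;
};

// Widened result: models {<VF x T>, <VF x T>}, a struct of vectors.
struct SinCosVec {
  std::array<float, VF> Sin;
  std::array<float, VF> Cos;
};

SinCos scalarSinCos(float X) { return {std::sin(X), std::cos(X)}; }

// One "call" covering all lanes instead of VF separate scalar calls.
SinCosVec widenedSinCos(const std::array<float, VF> &X) {
  SinCosVec R{};
  for (std::size_t Lane = 0; Lane < VF; ++Lane) {
    R.Sin[Lane] = std::sin(X[Lane]);
    R.Cos[Lane] = std::cos(X[Lane]);
  }
  return R;
}

// An external scalar use picks a field (extractvalue) and then a lane
// (extractelement).
float externalCosUse(const SinCosVec &R, std::size_t Lane) { return R.Cos[Lane]; }
```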
[AArch64][GlobalISel] Legalize F64 to BF16 fptruncates (#196077)
The two-step expansion of an f64-to-bf16 fptrunc needs to be careful to
avoid double-rounding errors. On AArch64 we can apparently convert with
fcvtxn, which performs round-to-odd, followed by a standard fp truncate
to bf16, so that the final rounding is done correctly. This
reuses the existing lowering added for vector operations.
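The double-rounding hazard can be reproduced in software. The sketch below emulates round-to-odd for f64 to f32 and round-to-nearest-even for f32 to bf16; it is an illustration only, not the GlobalISel lowering. All function names are hypothetical, and NaN and out-of-range inputs are not handled.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

static uint32_t floatBits(float F) {
  uint32_t B;
  std::memcpy(&B, &F, sizeof B);
  return B;
}

static float bitsToFloat(uint32_t B) {
  float F;
  std::memcpy(&F, &B, sizeof F);
  return F;
}

// f32 -> bf16 with round-to-nearest-even (NaN inputs are not handled here).
uint16_t f32ToBF16(float F) {
  uint32_t B = floatBits(F);
  B += 0x7FFFu + ((B >> 16) & 1u);
  return static_cast<uint16_t>(B >> 16);
}

// f64 -> f32 with round-to-odd, emulated for finite inputs in float range:
// truncate toward zero and set the low bit if the conversion was inexact.
float f64ToF32RoundToOdd(double D) {
  float F = static_cast<float>(D);      // round-to-nearest-even
  if (static_cast<double>(F) == D)
    return F;                           // exact, nothing to record
  uint32_t B = floatBits(F);
  if (std::fabs(static_cast<double>(F)) > std::fabs(D))
    --B;                                // rounded away from zero: step back
  return bitsToFloat(B | 1u);           // sticky low bit marks inexactness
}

// Two steps with round-to-odd in the middle.
uint16_t f64ToBF16(double D) { return f32ToBF16(f64ToF32RoundToOdd(D)); }

// Two steps with round-to-nearest-even in both; can double-round.
uint16_t f64ToBF16Naive(double D) { return f32ToBF16(static_cast<float>(D)); }
```

For the input 1 + 2^-8 + 2^-40, which lies just above the midpoint between two bf16 values, the naive path first rounds to exactly the midpoint and then ties to even, giving 0x3F80. The round-to-odd path keeps the sticky bit and gives 0x3F81.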
[Clang][Modules] Fix -Wunused-variable (#196577)
Mark some variables [[maybe_unused]] and inline others that do not have
side effects to avoid -Wunused-variable in non-assert builds.
[Object][Wasm] Fix off-by-one in data segment name index validation (#196338)
The check `Index > DataSegments.size()` in `parseNameSection()` allows
`Index == DataSegments.size()`, which is an out-of-bounds access.
In an assertions-disabled ASan build, a malformed wasm object with one
data segment and a data segment name entry using index 1 triggers a
heap-buffer-overflow READ in `WasmObjectFile::parseNameSection()`.
Fix by checking `Index >= DataSegments.size()` instead.
Also add a regression test that verifies the malformed input is rejected
with "invalid data segment name entry".
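The off-by-one can be shown with a minimal sketch. The two predicates below are hypothetical helpers, not the code in `parseNameSection()`; they only contrast the old and new comparisons.

```cpp
#include <cstddef>
#include <cstdint>

// Old check: rejects only indices strictly greater than the segment count,
// so Index == NumSegments slips through.
bool oldCheckRejects(uint32_t Index, std::size_t NumSegments) {
  return Index > NumSegments;
}

// New check: valid indices are 0 .. NumSegments - 1, so Index == NumSegments
// is rejected as well.
bool newCheckRejects(uint32_t Index, std::size_t NumSegments) {
  return Index >= NumSegments;
}
```

With one data segment and a name entry using index 1, the old check accepts the entry and the new check rejects it.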
[libc] Fix op_tests Memcmp guard to require SSE4.1 (#196572)
The is_vector<__m128i> specialisation in op_x86.h is gated on
__SSE4_1__, but op_tests.cpp included generic::Memcmp<__m128i> under the
weaker __SSE2__ guard. On baseline x86-64 (where __SSE2__ is always
defined but __SSE4_1__ may not be), this caused a static_assert failure
in is_element_type_v.
Changed the guard from __SSE2__ to __SSE4_1__ to match the
specialisation requirement, consistent with how BcmpImplementations
already guards its __m128i entry.
Assisted-by: Automated tooling, human reviewed.
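The guard mismatch can be modeled with a portable sketch. `FakeVec128` stands in for `__m128i` so the example compiles on any target, and `is_vector` and `numVectorTestTypes` here are simplified assumptions rather than the libc definitions. The point is that the test list and the trait specialization use the same macro.

```cpp
#include <type_traits>

// Hypothetical stand-in for __m128i.
struct FakeVec128 {};

// The trait is specialized only when SSE4.1 is available.
template <typename T> struct is_vector : std::false_type {};
#if defined(__SSE4_1__)
template <> struct is_vector<FakeVec128> : std::true_type {};
#endif

// The test list uses the same guard as the specialization, so the
// static_assert can never see a type the trait does not know about.
constexpr int numVectorTestTypes() {
#if defined(__SSE4_1__)
  static_assert(is_vector<FakeVec128>::value,
                "guard must match the trait specialization");
  return 1;
#else
  return 0;
#endif
}
```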
[DAG] canCreateUndefOrPoison - ISD::FCEIL/FFLOOR/FTRUNC/FRINT/FNEARBYINT/FROUND/FROUNDEVEN can never create poison/undef (#196543)
Also add missing fold support for ftrunc(fround(x)) -> fround(x)
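The fold relies on the result of a round already being integral, so truncating it changes nothing. A small check with the C++ standard library equivalents (`std::round` and `std::trunc`, used here as stand-ins for the DAG nodes) might look like this; `truncOfRoundIsRound` is a hypothetical helper.

```cpp
#include <cmath>

// Returns true when trunc(round(X)) equals round(X), treating NaN results
// on both sides as equal.
bool truncOfRoundIsRound(double X) {
  double R = std::round(X);
  double T = std::trunc(R);
  return T == R || (std::isnan(T) && std::isnan(R));
}
```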
clang: Add BoundArch argument to addClangTargetOptions
addClangTargetOptions already has an OffloadKind argument,
but it makes little sense for a function to know the
OffloadKind without the associated BoundArch.
The current process is convoluted. TranslateArgs
synthesizes a -mcpu argument from BoundArch, and later
addClangTargetOptions re-parses that -mcpu argument each
time it wants the architecture. Add this argument so this
can be cleaned up in a future change.
Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
[AMDGPU] Pre-commit unit test for RP tracking reset/advance behavior
This adds a new AMDGPU unit test file for testing the behavior of
`GCNRPTracker` and its related classes. The two tests showcase confusing
return value and behavioral semantics for variants of the advance and
reset functions, which will be clarified in a follow-up commit.
This also moves some common test helpers from other AMDGPU unit tests to
the `AMDGPUUnitTests` TU to avoid repetition between unit tests.
[CodeGen][AMDGPU] Move boilerplate unit test code to base class (NFC) (#196547)
This adds the `CodeGenTestBase` class to handle boilerplate code for
codegen unit tests and makes use of it wherever possible, in particular
in AMDGPU unit tests.
Furthermore, this makes all AMDGPU unit tests rely on GoogleTest's API
for "run once per test-suite" code, instead of re-implementing that
behavior using a `std::once_flag`. As a consequence all TEST(...) become
TEST_F(...).