[AtomicExpand] Support non-integer atomic loads. (#199310)
This is arguably an enhancement rather than a bugfix, but
AtomicExpandPass already tries to support some non-integer atomic ops
using cmpxchg by bitcasting to/from an integer type. Only the path used
by atomic load was missing, and it is easy enough to support.
This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
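To illustrate the idea of reading a non-integer value through an integer cmpxchg, here is a minimal C++ sketch. It is not the AtomicExpand implementation: `to_bits`, `from_bits`, `load_float_via_cmpxchg` and `demo` are hypothetical names, and the float is assumed to be stored as its 32-bit pattern in a `std::atomic<std::uint32_t>`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers (not LLVM code): reinterpret a float as its 32-bit
// pattern and back, the source-level analogue of an IR bitcast.
inline std::uint32_t to_bits(float value) {
  std::uint32_t bits = 0;
  std::memcpy(&bits, &value, sizeof bits);
  return bits;
}

inline float from_bits(std::uint32_t bits) {
  float value = 0.0f;
  std::memcpy(&value, &bits, sizeof value);
  return value;
}

// Atomically read a float using only compare-exchange on the integer view.
inline float load_float_via_cmpxchg(std::atomic<std::uint32_t> &storage) {
  std::uint32_t expected = 0;
  // If memory holds 0, it is replaced by 0 (no visible change). Otherwise the
  // exchange fails and `expected` receives the current contents. Either way,
  // `expected` ends up holding what was in memory.
  storage.compare_exchange_strong(expected, 0);
  return from_bits(expected);
}

// Small usage example: the value read back matches the value stored.
inline void demo() {
  std::atomic<std::uint32_t> storage{to_bits(1.5f)};
  assert(load_float_via_cmpxchg(storage) == 1.5f);
}
```

The compare-exchange never changes the stored value, so it behaves like a load while only requiring an integer cmpxchg.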
Reland "[LifetimeSafety] Detect iterator invalidation through container aliases" (#197873)
This relands #195231, which was reverted in commit
7c9717848851f3a71908becab4312ddc2d8482b8.
The original crash no longer reproduces after #196680, #197220, and
#197604. I verified this with the original `repro.cpp`: it no longer
hits the lifetime-safety assertion.
Also added regression tests for the crash:
```cpp
struct SinkInteriorBorrow {
const char *dest_; // expected-note {{this field dangles}}
SinkInteriorBorrow(std::string *dest, int n) : dest_(dest->data()) { // expected-warning {{parameter which escapes to a field is later invalidated}}
if (n > 0)
dest->clear(); // expected-note {{invalidated here}}
}
[3 lines not shown]
[AMDGPU] Fix v_dot4_i32_i8 alias to set neg_lo modifiers (#197998)
Fixes https://github.com/ROCm/ROCm/issues/6126
The `v_dot4_i32_i8` assembly alias was not setting the `neg_lo` modifier
bits when converted to `v_dot4_i32_iu8`, which causes signed int8
operands to be treated as unsigned.
For example, with `q=[1,-1,1,-1]` and `k=[1,1,1,1]` the expected result
is 0, but 512 is returned. The instruction computes
`1*1 + 255*1 + 1*1 + 255*1 = 512`, treating `-1` (`0xFF`) as `255`.
On AMD GFX11+, the native `v_dot4_i32_i8` instruction doesn't exist. The
hardware provides `v_dot4_i32_iu8` with `neg_lo` modifier bits to
control signedness of each operand. The compiler correctly lowers
`v_dot4_i32_i8` intrinsics by setting `neg_lo:[1,1,0]`, but inline
assembly using the `v_dot4_i32_i8` mnemonic bypasses this lowering and
goes directly to the assembler.
[10 lines not shown]
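The arithmetic in the example above can be checked with a small C++ model. This is only a sketch of the signed versus unsigned reading of the operand bytes, not the hardware definition of `v_dot4_i32_iu8`: `lane`, `dot4`, `kQ` and `kK` are hypothetical names, the two `bool` flags stand in for the per-operand `neg_lo` bits, and the accumulator operand is left out.

```cpp
#include <cstdint>

// Interpret one byte as signed (two's complement) or unsigned.
constexpr std::int32_t lane(std::uint8_t byte, bool is_signed) {
  return (is_signed && byte >= 128) ? static_cast<std::int32_t>(byte) - 256
                                    : static_cast<std::int32_t>(byte);
}

// Hypothetical model of a 4-way 8-bit dot product. a_signed / b_signed play
// the role of the per-operand signedness modifiers.
constexpr std::int32_t dot4(const std::uint8_t (&a)[4],
                            const std::uint8_t (&b)[4],
                            bool a_signed, bool b_signed) {
  std::int32_t sum = 0;
  for (int i = 0; i < 4; ++i)
    sum += lane(a[i], a_signed) * lane(b[i], b_signed);
  return sum;
}

// q = [1,-1,1,-1] and k = [1,1,1,1], stored as raw bytes (-1 is 0xFF).
constexpr std::uint8_t kQ[4] = {0x01, 0xFF, 0x01, 0xFF};
constexpr std::uint8_t kK[4] = {0x01, 0x01, 0x01, 0x01};

// Both operands read as signed: 1 - 1 + 1 - 1 = 0 (the expected result).
static_assert(dot4(kQ, kK, true, true) == 0, "signed reading");
// Both operands read as unsigned: 1 + 255 + 1 + 255 = 512 (the reported bug).
static_assert(dot4(kQ, kK, false, false) == 512, "unsigned reading");
```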
[Flang][OpenMP] Allow workdistribute inside 'target teams' (#199006)
Currently, a `workdistribute` construct nested inside of a combined
`target teams` is incorrectly reported as an error. This patch fixes
that.
[clang] fix getTemplateInstantiationArgs
This implements a new strategy for collecting the template arguments, by
relying on the qualifiers and template parameter lists to navigate the template
context of out-of-line definitions.
This greatly simplifies the signature of that function, by removing a bunch
of workarounds, and simplifying a couple that weren't removed yet.
Since this now relies on qualifiers and template parameter lists,
this patch expends most of its effort making sure these are placed,
transformed and propagated to template instantiations.
Also makes the explicit specialization AST nodes stop abusing the template
parameter lists by storing their own template parameter list, creating a
dedicated field for it, similar to partial specializations.
[clang] implement CWG2064: ignore value dependence for decltype
The 'decltype' of a value-dependent (but non-type-dependent) expression
should be known, so this patch makes such types non-opaque instead.
This patch also implements what's necessary to allow overloading
on pure differences in instantiation dependence, making `std::void_t`
usable for SFINAE purposes.
This also re-adds a few test cases from da98651, which was a previous attempt
at resolving CWG2064.
Fixes #8740
Fixes #61818
Fixes #190388
[CIR][AArch64] Lower NEON vsli/vsliq intrinsics (#198309)
### Summary
Part of: https://github.com/llvm/llvm-project/issues/185382
Lower the AArch64 NEON shift-left-and-insert intrinsics (`vsli_n_v` /
`vsliq_n_v`) in the CIR codegen path. The lowering mirrors classic
CodeGen (`clang/lib/CodeGen/TargetBuiltins/ARM.cpp`): bitcast both
vector operands to the target element type and emit a direct
`llvm.aarch64.neon.vsli` intrinsic call.
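For readers unfamiliar with shift-left-and-insert, the per-lane behaviour can be modelled in plain C++. This is a scalar sketch for a single 8-bit lane, not the CIR lowering itself: `sli_lane_u8` is a hypothetical helper, with `a` as the destination operand, `b` as the value being shifted in, and `n` assumed to be in the range 0 to 7.

```cpp
#include <cstdint>

// Hypothetical scalar model of one 8-bit lane of shift-left-and-insert:
// b is shifted left by n, and the low n bits of a are kept unchanged.
constexpr std::uint8_t sli_lane_u8(std::uint8_t a, std::uint8_t b, unsigned n) {
  // Mask selecting the low n bits that the shift would otherwise fill with 0.
  const unsigned low_mask = (1u << n) - 1u;
  const unsigned shifted = (static_cast<unsigned>(b) << n) & 0xFFu;
  return static_cast<std::uint8_t>(shifted | (a & low_mask));
}

// Shifting 0x01 left by 4 gives 0x10; the low nibble 0xF of a is preserved.
static_assert(sli_lane_u8(0xFF, 0x01, 4) == 0x1F, "low bits come from a");
// With n == 0 nothing of a survives, so the result is just b.
static_assert(sli_lane_u8(0xAB, 0xCD, 0) == 0xCD, "n == 0 yields b");
```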
[Hexagon] Support reserving R16-R28 registers via -ffixed-rN (#197208)
Extend register reservation from R19-only to R16-R28. This allows users
to reserve callee-saved registers (R16-R27) and R28 via command-line
flags -ffixed-r16 through -ffixed-r28. The single bool ReservedR19 is
replaced with an array-based approach (ReservedR[32]) to scale cleanly
across all supported registers.
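A rough sketch of the array-based approach, assuming a plain `bool` array indexed by register number. Apart from the `ReservedR[32]` name taken from the description above, everything here (`ReservedRegs`, `isReservable`, `withFixed`) is hypothetical and does not reflect the actual Hexagon backend code.

```cpp
// Hypothetical container for the per-register flags, indexed by N in rN.
struct ReservedRegs {
  bool ReservedR[32] = {};
};

// Only R16 through R28 may be reserved in this sketch.
constexpr bool isReservable(unsigned n) { return n >= 16 && n <= 28; }

// Return a copy with rN marked reserved; out-of-range requests are ignored.
constexpr ReservedRegs withFixed(ReservedRegs regs, unsigned n) {
  if (isReservable(n))
    regs.ReservedR[n] = true;
  return regs;
}

// Modelling -ffixed-r19: only index 19 is set.
static_assert(withFixed(ReservedRegs{}, 19).ReservedR[19], "r19 reserved");
static_assert(!withFixed(ReservedRegs{}, 19).ReservedR[20], "r20 untouched");
```

Indexing by register number means adding another reservable register only changes the range check, not the data layout.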
---------
Co-authored-by: quic-santdas <quic_santdas at quicinc.com>
[clang] preserve exact redeclaration for getTemplateInstantiationPattern
With this change, these functions no longer always return the definition
when one exists. The few users which depend on that behavior are updated
to fetch the definition themselves.
Also fixes the VarDecl variant returning the queried declaration itself.
[VPlan] Rename Expression::isSingleScalar (NFC) (#199041)
The single-scalar terminology, as used in other places, indicates
that all operands are scalars and that the result is a scalar.
VPExpressionRecipe::isSingleScalar is a misnomer: in the existing
terminology it is actually a vector-to-scalar. Rename it for clarity.
[libc++] remove duplicate assertions for void/reference const any_cast (#199425)
Consider test cases of the const overload of any_cast, such as:
```C++
void test() {
std::any a = 0;
const std::any& a2 = a;
(void)std::any_cast<int&>(&a2);
}
```
(And similarly for void).
The problem is that the assertions are implemented both in the const and
non-const any_cast overloads, but since the const overload delegates to
the non-const overload, that ends up producing the same assertion twice.
This separates those test cases, because those assertions are
implemented in the function body, and that's only instantiated once per
specialization, not once per use.
[X86] Remove shouldCastAtomicLoadInIR; use DAG combine instead
Remove X86's shouldCastAtomicLoadInIR override that cast FP atomic
loads to integer at the IR level. Instead, handle this in a pre-legalize
DAG combine (combineAtomicLoad) that rewrites FP/FP-vector atomic loads
to integer atomic loads plus a bitcast.
This depends on #199310, which adds the necessary cmpxchg support for
non-integer atomic loads in AtomicExpand.