[LoongArch] Add DAG combine for horizontal widening add/sub
Add a DAG combine to recognize horizontal widening add/subtract patterns
and lower them to the corresponding LSX/LASX instructions.
The following pattern is matched for both signed and unsigned variants:
```
ADD/SUB(SEXT/ZEXT(BUILD_VECTOR(extract_elt(vj, 1), extract_elt(vj, 3), ...)),
SEXT/ZEXT(BUILD_VECTOR(extract_elt(vk, 0), extract_elt(vk, 2), ...)))
```
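As a reading aid, the matched pattern can be modelled lane by lane. The sketch below is a scalar reference model only, not the combine itself; the function names and the choice to return raw destination-width bit patterns are assumptions made for illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def zero_extend(value, bits):
    """Interpret the low `bits` bits of value as an unsigned integer."""
    return value & ((1 << bits) - 1)


def horizontal_widening(vj, vk, src_bits, signed=True, subtract=False):
    """Combine odd lanes of vj with even lanes of vk at twice the lane width.

    Returns one raw (unsigned) bit pattern of width 2 * src_bits per lane pair.
    """
    assert len(vj) == len(vk) and len(vj) % 2 == 0
    extend = sign_extend if signed else zero_extend
    out_mask = (1 << (2 * src_bits)) - 1
    result = []
    for i in range(0, len(vj), 2):
        a = extend(vj[i + 1], src_bits)  # extract_elt(vj, 1), (vj, 3), ...
        b = extend(vk[i], src_bits)      # extract_elt(vk, 0), (vk, 2), ...
        wide = a - b if subtract else a + b
        result.append(wide & out_mask)   # wrap to the destination lane width
    return result
```

For example, with 8-bit source lanes `vj = [1, 0x7F, 3, 0xFF]` and `vk = [0x80, 9, 2, 9]`, the signed add yields `[0xFFFF, 1]` and the unsigned add yields `[255, 257]`.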
This covers the following instructions:
```
LSX: VHADDW.H.B, VHADDW.W.H, VHADDW.D.W
VHADDW.HU.BU, VHADDW.WU.HU, VHADDW.DU.WU
VHSUBW.H.B, VHSUBW.W.H, VHSUBW.D.W
VHSUBW.HU.BU, VHSUBW.WU.HU, VHSUBW.DU.WU
[10 lines not shown]
[Clang] Implement CWG 2282
Link: https://wg21.link/cwg2282
For non-overaligned types, overload resolution now falls back to aligned
allocation functions in C++20 and later.
[Clang][Sema][NFCI] Simplify `resolveAllocationOverload()`
`resolveAllocationOverload()` performs multiple rounds of overload
resolution (typed and untyped, aligned and unaligned), each requiring a
slightly different argument list. Previously, the argument vector was
mutated in-place, which made the flow hard to follow.
This refactor prepares the list of arguments before calling
`resolveAllocationOverload()`. The preferred argument list is passed in
`PrefArgs`, while the fallback arguments are passed in `FallbackArgs`.
If the fallback resolution is not required, `FallbackArgs` is empty.
When making a nested call to perform the resolution with the fallback
arguments, the current set of candidates is passed in `PrefCandidates`
(formerly, `AlignedCandidates`). This argument also serves as a flag
used to distinguish the top-level call from nested fallback calls.
[Clang] Implement CWG 2282
Link: https://wg21.link/cwg2282
For non-overaligned types, overload resolution now falls back to aligned
allocation functions.
[NFC][OpenMP] Add mapper-specific tests exercising pointee section mapping.
Also add a couple of tests that require correct propagation of map-type-modifier
bits into the mapper.
Reland "[clang][ssaf] Track target triple in TU and LU summaries. #204027" (#204259)
This commit introduces the following changes:
- Add `TargetTriple` field to `TUSummary`, `LUSummary`, and their encodings.
- Frontend captures the triple from `CompilerInstance::getTarget()` when extracting a TU summary.
- JSON format reads/writes a `target_triple` field at the root of each summary; reader rejects strings not in `llvm::Triple::normalize` form.
- All TU/LU JSON test inputs/outputs and unit tests updated to include the new field.
`clang-ssaf-linker` uses a hardcoded triple value for the link unit; surfacing the triple through the tool will be handled in a follow-up PR.
rdar://179403011
[lldb][test] Skip even more unsupported tests on WebAssembly (#204255)
A second pass over the full API suite for tests that depend on features
unavailable on wasm32-wasip1 or in LLDB's Wasm support:
- Expression evaluation (skipIfWasm) for the C++ tests that the
"expression" category doesn't cover, since that category only applies to
commands/expression/*.
- Attaching to a running process (skipIfWasm). These tests have the
harness spawn the inferior as a host process and then attach, but a
.wasm module isn't a native executable, so exec'ing it fails with
ENOEXEC ("Exec format error"). The wasm module only runs inside the
runtime (e.g. iwasm) that LLDB launches, so there is no host process to
attach to.
Where a test also has supported, passing cases, the decorator is applied
per method.
[HLSL] Codegen for passing cbuffer structs as function args (#203961)
Constant buffer structs are in `hlsl_constant` address space and have a
different layout than structs in default address space. They need to be
copied element-by-element and not by `memcpy`.
This change adds a check for the `hlsl_constant` address space to the
code path that avoids materializing a temporary copy for simple
`CK_LValueToRValue` casts. This makes sure the constant buffer struct
is copied element-by-element to a temporary before being passed to a
function.
Add a useful command to the python examples & "lldb.utils" (#204251)
When debugging GUI programs where you have a bunch of breakpoints set
that you only want to have trigger when in the middle of some UI
interaction (a drag and drop for example) but not before, you need a way
to have the breakpoints disabled till a certain point, then re-enabled.
But since you are in the middle of the interaction, you can't interact
with the debugger to do that.
This little command disables your breakpoints, continues if you were
stopped, waits for a prescribed interval, then re-enables them.
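The disable, wait, re-enable sequence could look roughly like the sketch below. This is not the command added by this commit: the function name `snooze_breakpoints` is hypothetical, and `target` is assumed to behave like an `lldb.SBTarget` (only `GetNumBreakpoints`, `GetBreakpointAtIndex`, `IsEnabled`, and `SetEnabled` are used). Continuing the process is left to the caller.

```python
import threading


def snooze_breakpoints(target, seconds):
    """Disable the currently enabled breakpoints, then re-enable them later.

    `target` is assumed to offer the SBTarget-style breakpoint accessors.
    Returns the timer so the caller can cancel or join it.
    """
    snoozed = []
    for i in range(target.GetNumBreakpoints()):
        bp = target.GetBreakpointAtIndex(i)
        if bp.IsEnabled():
            bp.SetEnabled(False)
            snoozed.append(bp)  # remember only the ones we turned off

    def wake():
        for bp in snoozed:
            bp.SetEnabled(True)

    timer = threading.Timer(seconds, wake)
    timer.daemon = True  # do not keep the interpreter alive for the timer
    timer.start()
    return timer
```

Tracking which breakpoints were enabled beforehand means breakpoints the user had already disabled stay disabled once the interval expires.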
[scudo] For a realloc that shrinks, retag the extra. (#204031)
When MTE is enabled and an allocation is reallocated from a large size
to a smaller size, zero tag the rest of the allocation. Before this
change only a single granule after the new size was zero tagged. This
adds extra security and use-after-realloc protection if code tries to
read or write past the new size, within the old size.
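The difference between the old and new behaviour can be pictured with a toy model of per-granule tags. This is a simplified sketch, not scudo's code: it assumes a 16-byte tag granule, represents the allocation as a list of tags, and ignores partially used granules; `shrink_retag` and its `retag_all` flag are hypothetical names.

```python
GRANULE = 16  # assumed MTE tag granule size in bytes


def granules(size):
    """Number of granules needed to cover `size` bytes."""
    return (size + GRANULE - 1) // GRANULE


def shrink_retag(tags, old_size, new_size, retag_all=True):
    """Return a copy of `tags` after shrinking from old_size to new_size.

    retag_all=True  models zero tagging everything up to the old size.
    retag_all=False models zero tagging only one granule past the new size.
    """
    tags = list(tags)
    first = granules(new_size)
    end = granules(old_size)
    last = end if retag_all else min(first + 1, end)
    for g in range(first, last):
        tags[g] = 0
    return tags
```

Shrinking a 128-byte allocation tagged `5` to 40 bytes gives `[5, 5, 5, 0, 0, 0, 0, 0]` with `retag_all=True`, versus `[5, 5, 5, 0, 5, 5, 5, 5]` when only one granule is cleared, which leaves the tail reachable through a stale pointer that still carries tag `5`.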
[BOLT] Delay indirect call pointer setup (#204229)
There is a race in the instrumentation runtime during setup. The setup
initializes the function pointers for indirect call instrumentation
before the indirect call counters array. If the application spawns a
background thread through a constructor (as does jemalloc), the
background thread has a chance to dereference that uninitialized array
pointer. Defer initialization of these function pointers to prevent this
race.
Fixes #198181.
Co-authored-by: Fabian Parzefall <parzefall at meta.com>
[RFC][CodeGen] Add generic target feature checks for intrinsics
This PR adds target-independent infrastructure for annotating LLVM intrinsics
with required subtarget feature expressions.
It introduces a TargetFeatures string field to intrinsic TableGen records.
TableGen emits an intrinsic-to-feature mapping table.
Both SelectionDAG and GlobalISel now perform this check before lowering target
intrinsics. This allows targets to opt in by annotating intrinsic definitions
directly, rather than adding custom checks during lowering, legalization, or
instruction selection.
This PR uses one AMDGPU intrinsic as an example.
[Clang][CodeGen] Fix C++20 NTTP object field indexing (#204174)
C++20 allows a class object to be used as a non-type template
parameter. For example, a template can take an object of a struct like
`{ char A; long long B; char C; char First[2]; char Second[2]; }`.
That struct has padding before `B`. The constant emitter can represent
the value with an ordinary LLVM struct and let LLVM provide that padding
implicitly. Normal record CodeGen instead uses the memory type for the
record, which may contain explicit padding fields so C++ fields have
stable LLVM field numbers.
The bytes are laid out the same, but the LLVM field numbers are not.
For the normal padded record type, `First` has one field index. For the
compact constant type, that same index names a later field. Clang was
computing the field index for the padded record type, then applying it
to the compact template parameter object type. As a result, reading
`First` could read from the bytes for `Second` instead.
[3 lines not shown]
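The index mismatch described above can be shown with two field lists. The layouts below are illustrative assumptions (the exact LLVM types Clang emits are not given here), and the helper names are hypothetical; the point is only that an index computed against one layout names a different field in the other.

```python
# Hypothetical field lists for the same struct; "<padding>" stands for an
# explicit padding field that exists only in the padded record type.
PADDED = ["A", "<padding>", "B", "C", "First", "Second"]
COMPACT = ["A", "B", "C", "First", "Second"]


def field_index(layout, name):
    """Index of the named field within the given layout."""
    return layout.index(name)


def read_field(layout, index):
    """Name of the field found at `index` in the given layout."""
    return layout[index]


# Index computed for the padded type, then applied to the compact type.
wrong = read_field(COMPACT, field_index(PADDED, "First"))
# Index computed against the same type it is applied to.
right = read_field(COMPACT, field_index(COMPACT, "First"))
```

Here `wrong` is `"Second"` while `right` is `"First"`, matching the symptom of reading `Second`'s bytes when `First` was requested.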
[clang-format] Stop inserting blank line in disabled region (#201995)
Previously, a blank line was inserted before the `// clang-format off`
comment when the `SeparateDefinitionBlocks` option was set.
Fixes #106983 and #146317.