[clang][bytecode] Mark pointers destroyed in destructors (#192460)
We didn't use to do this at all, so calling the destructor explicitly
twice in a row wasn't an error. Calling it and accessing the object
afterwards wasn't an error either.
[LoongArch] Add support for vector FP_ROUND from vxf64 to vxf32
In LoongArch, [x]vfcvt.s.d intstructions require two vector registers
for v4f64->v4f32, v8f64->v8f32 conversions.
This patch handles these cases:
- For FP_ROUND v2f64->v2f32(illegal), add a customized v2f32 widening
to convert it into a target-specific LoongArchISD::VFCVT.
- For FP_ROUND v4f64->v4f32, on LSX platforms, v4f64 is illegal and will
be split into two v2f64->v2f32, resulting in two LoongArchISD::VFCVT.
Finally, they are combined into a single node during combining
LoongArchISD::VPACKEV. On LASX platforms, v4f64->v4f32 can directly
lower to vfcvt.s.d in lowerFP_ROUND.
- For FP_ROUND v8f64->v8f32, on LASX platforms, v8f64 is illegal and
will be split into two v4f64->v4f32 and then combine using
ISD::CONCAT_VECTORS, so xvfcvt.s.d is generated during its
combination.
[LoongArch] Select `V{AND,OR,XOR,NOR}I.B` for bitwise with byte splat immediates
The `V{AND,OR,XOR,NOR}I.B` instructions operate on byte elements and accept
an 8-bit immediate. However, when the same byte splat constant is used with
wider vector element types (e.g. v8i16, v4i32, v2i64), instruction selection
currently falls back to materializing the constant in a temporary register.
```
vrepli.b -1
vxor.v
```
even though the immediate form is available:
```
vxori.b 255
```
This happens because selectVSplatImm requires the splat bit width to match
[11 lines not shown]
[LoongArch] Select V{ADD,SUB}I for operations with negative splat immediates
Currently, vector add/sub with a negative splat immediate is lowered as a
vector splat followed by a register-register add, e.g.:
```
vrepli.b $vr1, -1
vadd.b $vr0, $vr0, $vr1
```
This misses the opportunity to use the more efficient V{ADD,SUB}I instruction
with a positive immediate.
This patch introduces `selectVSplatImmNeg` to detect negative splat
immediates whose negated value fits in a 5-bit unsigned immediate. New
patterns `(Pat{Vr,Vr}Nimm5)` are added to match:
```
add v, splat(-imm) --> vsubi v, v, imm
[7 lines not shown]
[libc][threads] adjust futex library and expose requeue API (#192478)
Make futex a common abstraction layer across platforms.
(linux/wasm/macOS/windows/fuchsia all have the support, which we can
align their support later on).
This patch also expose a requeue API that returns ENOSYS on unsupported
platforms. The requeue operation will be needed to reimplement a strict
FIFO style condvar similar to musl.
Additional cleanup is done to change raw syscall return value to
`ErrorOr<int>`.
Assisted-by: Codex with gpt-5.4 medium fast
[mlir][memref] Remove unit-stride restriction in SubViewOp folding (#192437)
This PR replaces manual offset/size resolution with `affine::mergeOffsetsSizesAndStrides`, simplifying the code and extending subview-of-subview folding to support non-unit strides.
[RISCV] Support MachineOutlinerRegSave for RISCV (#191351)
This patch adds support for the RegSave strategy in the RISC-V
MachineOutliner pass. It uses t1–t6 to preserve the t0 value across the
outlined function call when t0 is unavailable. This enables more
potential outlining candidates.
---------
Co-authored-by: Craig Topper <craig.topper at sifive.com>
[DAGCombiner] Extend convertBuildVecZextToZext to sign extends (#192372)
Generalize the existing fold that collapses a BUILD_VECTOR of ZERO_EXTEND
(or ANY_EXTEND) of EXTRACT_VECTOR_ELTs into a single vector extend so that
it also handles SIGN_EXTEND. Mixed sign and zero extends remain unsupported
because their high-bit semantics differ, so the combine bails out in that
case.
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
[AMDGPU] Add `.amdgpu.info` section for per-function metadata
AMDGPU object linking requires the linker to propagate resource usage
(registers, stack, LDS) across translation units. To support this, the compiler
must emit per-function metadata and call graph edges in the relocatable object
so the linker can compute whole-program resource requirements.
This PR introduces a `.amdgpu.info` ELF section using a tagged, length-prefixed
binary format: each entry is encoded as:
```
[kind: u8] [len: u8] [payload: <len> bytes]
```
A function scope is opened by an `INFO_FUNC` entry (containing a symbol
reference), followed by per-function attributes (register counts, flags, private
segment size) and relational edges (direct calls, LDS uses, indirect call
signatures). String data such as function type signatures is stored in a
companion `.amdgpu.strtab` section.
[4 lines not shown]
[CIR][NFC] Upstream IR roundtrip tests for branch and loop ops (#189006)
Add `clang/test/CIR/IR` roundtrip tests for `cir.br`, `cir.brcond`,
`cir.for`, `cir.while`, and `cir.do`.
This adds parser/printer coverage for the textual forms of these
control-flow operations.
Partially addresses #156747.
[flang][cuda] Add missing pointer deallocation entry point (#192566)
We were missing the deallocation entry point for pointer and wiring all
to allocatable deallocate which will trigger Invalid descriptor error.
[NFC][OpenMP] Make map ordering tests for no host->tgt transfer more robust (#192571)
They were relying on the host value not being seen on the device, but
the value being matched was small enough for the probability of a
successful match against garbage data relatively high.
Now we just rely on the LIBOMPTARGET_DEBUG logs to ensure there wasn't
any transfer.
[libclc] Fix atomic_fetch_add/sub overloads for uintptr_t (#192570)
The overloads taking the memory order and/or scope parameters should
have the `_explicit` suffix, according to the OpenCL C specification.