[AMDGPU][ASAN] Move allocas to entry block in amdgpu-sw-lower-lds pass (#190772)
The `amdgpu-sw-lower-lds` pass inserts a workitem-0 check, malloc, and
barrier before the original entry block, creating a new entry block.
This pushes the original allocas into a non-entry block, causing LLVM to
treat them as dynamic allocas.
AMDGPU backend generates incorrect flat addresses for dynamic alloca
addrspacecasts at -O0, causing memory faults when ASan is enabled with
LDS.
This PR hoists constant-size allocas to the new entry block so they
remain static.
[AMDGPU] Report only local per-function resource usage when object linking is enabled
With object linking the linker aggregates resource usage across TUs via
`.amdgpu.info`, so compile-time pessimism and call-graph propagation duplicate
the linker's work or pollute its inputs.
In this mode, skip the per-callsite conservative bumps in
`AMDGPUResourceUsageAnalysis` and assign each resource symbol in
`AMDGPUMCResourceInfo` a concrete local constant instead of building call-graph
max/or expressions.
[lldb] Add synthetic variable support to Get*VariableList.
This patch adds a new flag to the lldb_private::StackFrame API to get variable lists: `include_synthetic_vars`. This allows ScriptedFrame (and other future synthetic frames) to construct 'fake' variables and return them in the VariableList, so that commands like `fr v` and `SBFrame::GetVariables` can show them to the user as requested.
This patch includes all changes necessary to call the API the new way - I tried to use my best judgement on when to include synthetic variables or not and leave comments explaining the decision.
As a consequence of producing synthetic variables, this patch means that ScriptedFrame can produce Variable objects with ValueType that contains a ValueTypeExtendedMask in a high bit. This necessarily complicates some of the switch/case handling in places where we would expect to find such variables, and this patch makes best effort to address all such cases as well. From experience, they tend to show up whenever we're dealing with checking if a Variable is in a specified scope, which means we basically have to check the high bit against some user input saying "yes/no synthetic variables".
stack-info: PR: https://github.com/llvm/llvm-project/pull/181501, branch: users/bzcheeseman/stack/9
[clang][bytecode] Mark pointers destroyed in destructors (#192460)
We didn't use to do this at all, so calling the destructor explicitly
twice in a row wasn't an error. Calling it and accessing the object
afterwards wasn't an error either.
[LoongArch] Add support for vector FP_ROUND from vxf64 to vxf32
In LoongArch, [x]vfcvt.s.d intstructions require two vector registers
for v4f64->v4f32, v8f64->v8f32 conversions.
This patch handles these cases:
- For FP_ROUND v2f64->v2f32(illegal), add a customized v2f32 widening
to convert it into a target-specific LoongArchISD::VFCVT.
- For FP_ROUND v4f64->v4f32, on LSX platforms, v4f64 is illegal and will
be split into two v2f64->v2f32, resulting in two LoongArchISD::VFCVT.
Finally, they are combined into a single node during combining
LoongArchISD::VPACKEV. On LASX platforms, v4f64->v4f32 can directly
lower to vfcvt.s.d in lowerFP_ROUND.
- For FP_ROUND v8f64->v8f32, on LASX platforms, v8f64 is illegal and
will be split into two v4f64->v4f32 and then combine using
ISD::CONCAT_VECTORS, so xvfcvt.s.d is generated during its
combination.
[LoongArch] Select `V{AND,OR,XOR,NOR}I.B` for bitwise with byte splat immediates
The `V{AND,OR,XOR,NOR}I.B` instructions operate on byte elements and accept
an 8-bit immediate. However, when the same byte splat constant is used with
wider vector element types (e.g. v8i16, v4i32, v2i64), instruction selection
currently falls back to materializing the constant in a temporary register.
```
vrepli.b -1
vxor.v
```
even though the immediate form is available:
```
vxori.b 255
```
This happens because selectVSplatImm requires the splat bit width to match
[11 lines not shown]
[LoongArch] Select V{ADD,SUB}I for operations with negative splat immediates
Currently, vector add/sub with a negative splat immediate is lowered as a
vector splat followed by a register-register add, e.g.:
```
vrepli.b $vr1, -1
vadd.b $vr0, $vr0, $vr1
```
This misses the opportunity to use the more efficient V{ADD,SUB}I instruction
with a positive immediate.
This patch introduces `selectVSplatImmNeg` to detect negative splat
immediates whose negated value fits in a 5-bit unsigned immediate. New
patterns `(Pat{Vr,Vr}Nimm5)` are added to match:
```
add v, splat(-imm) --> vsubi v, v, imm
[7 lines not shown]
[libc][threads] adjust futex library and expose requeue API (#192478)
Make futex a common abstraction layer across platforms.
(linux/wasm/macOS/windows/fuchsia all have the support, which we can
align their support later on).
This patch also expose a requeue API that returns ENOSYS on unsupported
platforms. The requeue operation will be needed to reimplement a strict
FIFO style condvar similar to musl.
Additional cleanup is done to change raw syscall return value to
`ErrorOr<int>`.
Assisted-by: Codex with gpt-5.4 medium fast
[mlir][memref] Remove unit-stride restriction in SubViewOp folding (#192437)
This PR replaces manual offset/size resolution with `affine::mergeOffsetsSizesAndStrides`, simplifying the code and extending subview-of-subview folding to support non-unit strides.
[RISCV] Support MachineOutlinerRegSave for RISCV (#191351)
This patch adds support for the RegSave strategy in the RISC-V
MachineOutliner pass. It uses t1–t6 to preserve the t0 value across the
outlined function call when t0 is unavailable. This enables more
potential outlining candidates.
---------
Co-authored-by: Craig Topper <craig.topper at sifive.com>
[DAGCombiner] Extend convertBuildVecZextToZext to sign extends (#192372)
Generalize the existing fold that collapses a BUILD_VECTOR of ZERO_EXTEND
(or ANY_EXTEND) of EXTRACT_VECTOR_ELTs into a single vector extend so that
it also handles SIGN_EXTEND. Mixed sign and zero extends remain unsupported
because their high-bit semantics differ, so the combine bails out in that
case.
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
[AMDGPU] Add `.amdgpu.info` section for per-function metadata
AMDGPU object linking requires the linker to propagate resource usage
(registers, stack, LDS) across translation units. To support this, the compiler
must emit per-function metadata and call graph edges in the relocatable object
so the linker can compute whole-program resource requirements.
This PR introduces a `.amdgpu.info` ELF section using a tagged, length-prefixed
binary format: each entry is encoded as:
```
[kind: u8] [len: u8] [payload: <len> bytes]
```
A function scope is opened by an `INFO_FUNC` entry (containing a symbol
reference), followed by per-function attributes (register counts, flags, private
segment size) and relational edges (direct calls, LDS uses, indirect call
signatures). String data such as function type signatures is stored in a
companion `.amdgpu.strtab` section.
[4 lines not shown]
[CIR][NFC] Upstream IR roundtrip tests for branch and loop ops (#189006)
Add `clang/test/CIR/IR` roundtrip tests for `cir.br`, `cir.brcond`,
`cir.for`, `cir.while`, and `cir.do`.
This adds parser/printer coverage for the textual forms of these
control-flow operations.
Partially addresses #156747.
[flang][cuda] Add missing pointer deallocation entry point (#192566)
We were missing the deallocation entry point for pointer and wiring all
to allocatable deallocate which will trigger Invalid descriptor error.