[NFC][mlir][shard] Unify MoveLastSplitAxisPattern/MoveLastSplitAxisPattern (#192295)
Made MoveLastSplitAxisPattern more general to also cover MoveLastSplitAxisPattern.
Less code, same functionality.
Assisted by claude.
[LV][RISCV] Fix incorrect pointer operand in interleaved access tests. nfc (#192464)
In some load cases, the index 1 member used the same pointer as the
index 0 member. This patch corrected the pointer use.
[CFI] Extract DropTypeTestsPass from LowerTypeTestsPass (#192578)
This patch introduces `DropTypeTestsPass` as a dedicated pass
to handle the dropping of type tests. Previously, this was handled
by `LowerTypeTestsPass` with a specific parameter.
By splitting this into its own pass, we simplify the pass pipeline
construction and make the intent clearer in `PassRegistry.def` and
various pipeline builders.
It's almost NFC, if not opt command line changes.
[libc][nfc] Fix ucontext buildbot failure with noexcept (#192343) (#192601)
Added noexcept to getcontext and setcontext declarations and definitions
to resolve missing attribute warning on aliases.
This fixes failures on builders using GCC like
libc-x86_64-debian-gcc-fullbuild-dbg.
[AMDGPU][ASAN] Move allocas to entry block in amdgpu-sw-lower-lds pass (#190772)
The `amdgpu-sw-lower-lds` pass inserts a workitem-0 check, malloc, and
barrier before the original entry block, creating a new entry block.
This pushes the original allocas into a non-entry block, causing LLVM to
treat them as dynamic allocas.
AMDGPU backend generates incorrect flat addresses for dynamic alloca
addrspacecasts at -O0, causing memory faults when ASan is enabled with
LDS.
This PR hoists constant-size allocas to the new entry block so they
remain static.
[AMDGPU] Report only local per-function resource usage when object linking is enabled
With object linking the linker aggregates resource usage across TUs via
`.amdgpu.info`, so compile-time pessimism and call-graph propagation duplicate
the linker's work or pollute its inputs.
In this mode, skip the per-callsite conservative bumps in
`AMDGPUResourceUsageAnalysis` and assign each resource symbol in
`AMDGPUMCResourceInfo` a concrete local constant instead of building call-graph
max/or expressions.
[lldb] Add synthetic variable support to Get*VariableList.
This patch adds a new flag to the lldb_private::StackFrame API to get variable lists: `include_synthetic_vars`. This allows ScriptedFrame (and other future synthetic frames) to construct 'fake' variables and return them in the VariableList, so that commands like `fr v` and `SBFrame::GetVariables` can show them to the user as requested.
This patch includes all changes necessary to call the API the new way - I tried to use my best judgement on when to include synthetic variables or not and leave comments explaining the decision.
As a consequence of producing synthetic variables, this patch means that ScriptedFrame can produce Variable objects with ValueType that contains a ValueTypeExtendedMask in a high bit. This necessarily complicates some of the switch/case handling in places where we would expect to find such variables, and this patch makes best effort to address all such cases as well. From experience, they tend to show up whenever we're dealing with checking if a Variable is in a specified scope, which means we basically have to check the high bit against some user input saying "yes/no synthetic variables".
stack-info: PR: https://github.com/llvm/llvm-project/pull/181501, branch: users/bzcheeseman/stack/9
[clang][bytecode] Mark pointers destroyed in destructors (#192460)
We didn't use to do this at all, so calling the destructor explicitly
twice in a row wasn't an error. Calling it and accessing the object
afterwards wasn't an error either.
[LoongArch] Add support for vector FP_ROUND from vxf64 to vxf32
In LoongArch, [x]vfcvt.s.d intstructions require two vector registers
for v4f64->v4f32, v8f64->v8f32 conversions.
This patch handles these cases:
- For FP_ROUND v2f64->v2f32(illegal), add a customized v2f32 widening
to convert it into a target-specific LoongArchISD::VFCVT.
- For FP_ROUND v4f64->v4f32, on LSX platforms, v4f64 is illegal and will
be split into two v2f64->v2f32, resulting in two LoongArchISD::VFCVT.
Finally, they are combined into a single node during combining
LoongArchISD::VPACKEV. On LASX platforms, v4f64->v4f32 can directly
lower to vfcvt.s.d in lowerFP_ROUND.
- For FP_ROUND v8f64->v8f32, on LASX platforms, v8f64 is illegal and
will be split into two v4f64->v4f32 and then combine using
ISD::CONCAT_VECTORS, so xvfcvt.s.d is generated during its
combination.
[LoongArch] Select `V{AND,OR,XOR,NOR}I.B` for bitwise with byte splat immediates
The `V{AND,OR,XOR,NOR}I.B` instructions operate on byte elements and accept
an 8-bit immediate. However, when the same byte splat constant is used with
wider vector element types (e.g. v8i16, v4i32, v2i64), instruction selection
currently falls back to materializing the constant in a temporary register.
```
vrepli.b -1
vxor.v
```
even though the immediate form is available:
```
vxori.b 255
```
This happens because selectVSplatImm requires the splat bit width to match
[11 lines not shown]
[LoongArch] Select V{ADD,SUB}I for operations with negative splat immediates
Currently, vector add/sub with a negative splat immediate is lowered as a
vector splat followed by a register-register add, e.g.:
```
vrepli.b $vr1, -1
vadd.b $vr0, $vr0, $vr1
```
This misses the opportunity to use the more efficient V{ADD,SUB}I instruction
with a positive immediate.
This patch introduces `selectVSplatImmNeg` to detect negative splat
immediates whose negated value fits in a 5-bit unsigned immediate. New
patterns `(Pat{Vr,Vr}Nimm5)` are added to match:
```
add v, splat(-imm) --> vsubi v, v, imm
[7 lines not shown]
[libc][threads] adjust futex library and expose requeue API (#192478)
Make futex a common abstraction layer across platforms.
(linux/wasm/macOS/windows/fuchsia all have the support, which we can
align their support later on).
This patch also expose a requeue API that returns ENOSYS on unsupported
platforms. The requeue operation will be needed to reimplement a strict
FIFO style condvar similar to musl.
Additional cleanup is done to change raw syscall return value to
`ErrorOr<int>`.
Assisted-by: Codex with gpt-5.4 medium fast
[mlir][memref] Remove unit-stride restriction in SubViewOp folding (#192437)
This PR replaces manual offset/size resolution with `affine::mergeOffsetsSizesAndStrides`, simplifying the code and extending subview-of-subview folding to support non-unit strides.