[X86][FastISel] Restore support for struct returns (#194586)
After #180322, X86 FastISel forces SDAG fallback for any call with a
struct return. This caused major compile-time regressions for debug
builds in Rust, where struct returns are very common.
The type legality check should work on the de-aggregated types, not on
the return type directly.
[LLD][COFF] Move Archive::create call to LinkerDriver::addBuffer (NFC) (#194346)
This allows an upcoming change to Archive::create() to make decisions
based on the archive type.
[MLIR][GPU] Add cooperative launch support to gpu.launch_func (#190639)
Add a `cooperative` UnitAttr to `gpu.launch_func` that enables
cooperative kernel launch semantics. Cooperative launches guarantee that
all thread blocks in the grid are co-resident on the GPU simultaneously,
enabling grid-wide synchronization patterns.
## Implementation
When `cooperative` is set (with or without cluster sizes), the lowering
emits a call to the new `mgpuLaunchKernelCooperative` runtime function,
which uses `cuLaunchKernelEx` with a `CUlaunchConfig` and
`CU_LAUNCH_ATTRIBUTE_COOPERATIVE`. This API is guarded behind
`CUDA_VERSION >= 12000`. The HIP path funnels through
`hipModuleLaunchCooperativeKernel`.
## Changes
- **GPUOps.td**: add `cooperative` UnitAttr and assembly format keyword
[17 lines not shown]
[clang][analyzer] Add support for detecting uninitialized dynamically-allocated objects
Adapt the allocated region into a `TypedValueRegion` by retrieving its
type and wrapping it in an `ElementRegion`.
The `willObjectBeAnalyzedLater` function must therefore fall back on
using `SubRegion`s.
CPP-7677
[mlir][MemRefToLLVM] Support floating-point types in GenericAtomicRMWOp lowering (#194300)
`llvm.cmpxchg` only accepts integer or pointer operands. When the memref
element type is floating-point (e.g. `f16`), bitcast the values to a
same-width integer for the CAS and bitcast the newly loaded result back
to the original float type.
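The bitcast-around-CAS idea can be illustrated outside of MLIR. The following is a minimal C++ sketch, not the lowering itself: it uses `float` with a 32-bit integer instead of `f16`, and the helper names (`floatToBits`, `bitsToFloat`, `atomicRMWFloat`) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "same-width integer required");

// Bit-preserving casts between float and a same-width integer.
std::uint32_t floatToBits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return bits;
}

float bitsToFloat(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Hypothetical helper: atomically applies `update` to a float that is stored
// as raw integer bits, and returns the previous float value.
template <typename Fn>
float atomicRMWFloat(std::atomic<std::uint32_t> &cell, Fn update) {
  std::uint32_t oldBits = cell.load();
  for (;;) {
    float oldVal = bitsToFloat(oldBits);                  // integer -> float
    std::uint32_t newBits = floatToBits(update(oldVal));  // float -> integer
    // The CAS only ever sees integers. On failure, oldBits is refreshed with
    // the value currently in memory and the loop retries.
    if (cell.compare_exchange_weak(oldBits, newBits))
      return oldVal;
  }
}
```

For example, calling `atomicRMWFloat(cell, [](float v) { return v + 2.0f; })` on a cell holding the bits of `1.5f` returns `1.5f` and leaves the bits of `3.5f` in the cell.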
[SPIRV] Add missing OpenCL atomic_fetch_min/max builtin mappings (#190443)
## Summary
The SPIR-V backend maps OpenCL `atomic_fetch_add`/`sub`/`or`/`xor`/`and`
(and their `_explicit` variants) to SPIR-V atomic opcodes, but was
missing support for `atomic_fetch_min`/`atomic_fetch_max`, their
`_explicit` variants, and the legacy `atom_min`/`atom_max` builtins.
This caused OpenCL programs using these atomics to emit unresolved
function calls instead of the correct
`OpAtomicSMin`/`OpAtomicSMax`/`OpAtomicUMin`/`OpAtomicUMax`
instructions.
### Approach
Unlike add/sub/or/xor/and (which are sign-agnostic), min/max require
distinct signed vs unsigned SPIR-V opcodes. Rather than inspecting the
`OpTypeInt` signedness bit at runtime (which is always 0 in this
backend), this patch uses the existing prefix-based builtin lookup
[17 lines not shown]
[LoongArch] Legalize BUILD_VECTOR into a broadcast when all non-undef elements are identical
When all non-undef elements of a BUILD_VECTOR are the same, it is
better to emit a broadcast than multiple insertions.
Some floating-point cases suffer performance regressions, so those
specific cases are excluded in this commit:
- only one element is non-undef,
- only two elements are non-undef, and one of them is at index 0,
- for the v8f32 vector type, the only two non-undef elements are at
indices (1,2), (1,3), or (2,3).
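The basic condition (all non-undef elements identical) can be sketched in plain C++. This is only an illustration under assumptions: undef is modelled as an empty `std::optional`, the function name `commonDefinedElement` is hypothetical, and the floating-point exclusions listed above are not modelled.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// An element is either a defined value or undef (modelled as std::nullopt).
using Elt = std::optional<int>;

// Hypothetical helper: returns the value shared by all defined elements, or
// std::nullopt if two defined elements differ or every element is undef.
std::optional<int> commonDefinedElement(const std::vector<Elt> &elts) {
  std::optional<int> splat;
  for (const Elt &e : elts) {
    if (!e)
      continue;              // undef lanes never block a broadcast
    if (!splat)
      splat = e;             // first defined lane picks the candidate
    else if (*splat != *e)
      return std::nullopt;   // two different defined values: no broadcast
  }
  return splat;
}
```

With this model, `{7, undef, 7, undef}` yields 7 (a broadcast candidate), while `{7, 8, undef, undef}` yields no value.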
Services: Kea DHCPv6: infer IPv6 lease type in delete script via lease lookup
This avoids propagating lease type handling through controller and UI
layers while fixing unreliable deletion of IA_PD leases.
The approach is pragmatic: in the extremely unlikely case that IA_NA and
IA_PD share the same base address, multiple leases may be deleted. This
tradeoff is considered acceptable given the low impact and recoverable
nature of DHCP leases.
[X86] Attempt to fold extract_vector_elt(logicop(x,y),i) -> extract_vector_elt(x,i) (#194581)
When extracting from logicops, we often don't need to extract the result
if one of the element sources is identity (and(x,-1) -> x, or/xor(x,0)
-> x etc.), so this patch uses SimplifyMultipleUseDemandedVectorElts to
peek through to an underlying build_vector.
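The identity cases can be modelled on plain arrays. The sketch below is a scalar illustration only, not the DAG combine: `Vec4`, `isIdentityElement`, and `extractFromLogicOp` are hypothetical names, and the second operand is assumed to be a known constant vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class LogicOp { And, Or, Xor };
using Vec4 = std::array<std::uint32_t, 4>;

// True when `c` leaves the other operand unchanged under `op`:
// and(x, -1) == x, or(x, 0) == x, xor(x, 0) == x.
bool isIdentityElement(LogicOp op, std::uint32_t c) {
  return op == LogicOp::And ? c == 0xFFFFFFFFu : c == 0u;
}

std::uint32_t applyOp(LogicOp op, std::uint32_t a, std::uint32_t b) {
  switch (op) {
  case LogicOp::And: return a & b;
  case LogicOp::Or:  return a | b;
  case LogicOp::Xor: return a ^ b;
  }
  return 0; // not reached for valid LogicOp values
}

// Models extract_vector_elt(logicop(x, y), i). If lane i of y is the identity
// for `op`, the result is read straight from x and the logic op is skipped.
// `bypassed` (optional) reports whether that shortcut was taken.
std::uint32_t extractFromLogicOp(LogicOp op, const Vec4 &x, const Vec4 &y,
                                 std::size_t i, bool *bypassed = nullptr) {
  assert(i < x.size());
  bool skip = isIdentityElement(op, y[i]);
  if (bypassed)
    *bypassed = skip;
  return skip ? x[i] : applyOp(op, x[i], y[i]);
}
```

Only the demanded lane matters: other lanes of `y` may hold arbitrary values without blocking the shortcut for lane `i`.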
I had hoped to make this generic, but there's still a lot of yak shaving
to deal with first, as usual - I've included the minimal x86-specific
fixes:
* missing constant folding of (vXi1 logicop(bitcast(c1),bitcast(c2)))
* fold kshiftr(concat_vectors(x,y,z,w),c) -> concat_vectors(z,w,0,0)
Fixes #193700
NAS-140814 / 27.0.0-BETA.1 / Copy VM NVRAM and TPM state on clone (#18828)
## Context
Continuing the changes made in
https://github.com/truenas/middleware/pull/18764, the same fixes have
been applied to the VM cloning process: when a VM is cloned, its
relevant files are copied over as well, so that TPM and secure boot
function as intended.
[flang][pft] visit original symbol in acc use_device (#194588)
Fix regression after https://github.com/llvm/llvm-project/pull/193689
when a use_device is referring to variables from a host module.
The original symbol needs to be visited in the PFT so that it will be
instantiated, but it is no longer visible from the parse tree and is not
directly connected to the new symbol (this is because variables in
use_device are treated in a special way in order to give them the DEVICE
attribute; other data clauses do not need such handling).
Look into the parent scope for a symbol with the same name and visit it.
[AMDGPU] Support Wave Reduction for true-16 types - 3
Support true-16 versions of the reduction intrinsics.
Supported ops: `and`, `or`, `xor`.
Only the iterative strategy is supported; DPP is not yet
supported.
[LoongArch] Support memcmp expansion for vectors and combine for i128/i256 setcc
This commit enables memcmp expansion for lsx/lasx. After doing
this, i128 and i256 loads, which are illegal types on LoongArch,
will be generated. Without further processing, they would be split
into legal scalar types.
So this commit also enables a combine for `setcc` that bitcasts
i128/i256 types to vector types before type legalization and
generates vector instructions.
Inspired by x86 and riscv.
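As a rough illustration of what an equality-only 16-byte memcmp expansion computes, here is a scalar C++ sketch. It models the 128-bit value as two 64-bit halves, which corresponds to the scalar-split form mentioned above rather than to the vector instructions this commit generates; the function name `equal16` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Equality-only comparison of two 16-byte buffers without a byte loop.
// Each buffer is loaded as two 64-bit halves; the XOR of matching halves is
// zero exactly when they are equal, so OR-ing the XORs and testing against
// zero compares all 128 bits at once.
bool equal16(const void *lhs, const void *rhs) {
  assert(lhs != nullptr && rhs != nullptr);
  std::uint64_t l[2], r[2];
  std::memcpy(l, lhs, sizeof(l));
  std::memcpy(r, rhs, sizeof(r));
  return ((l[0] ^ r[0]) | (l[1] ^ r[1])) == 0;
}
```

The result agrees with `std::memcmp(lhs, rhs, 16) == 0`; only the equality outcome is modelled, not the ordering that a full memcmp returns.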