[SPIR-V] Encode Atomic metadata as UserSemantic string decoration (#193019)
AMDGPU uses metadata to guide atomic-related optimisations. SPIR-V was
not handling it, which led to significant and spurious performance
differences. This patch fixes this oversight by encoding the metadata as
UserSemantic string decorations applied to the atomic instructions.
[ExpandMemCmp] Pre-collect memcmp calls to improve compile time (#193415)
Avoid restarting the basic block iteration from the beginning of the
function every time a memcmp/bcmp is expanded. Instead, pre-collect all
memcmp/bcmp calls and process them in a single pass.
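The collect-then-process pattern described above can be sketched as follows. This is a simplified illustration and not the pass's actual code: `Instr`, `Block`, `Function`, `expandCall` and `expandAll` are hypothetical stand-ins, and the three-instruction "expansion" is a placeholder. The sketch uses `std::list` so that iterators to the collected calls stay valid while other calls are expanded.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an instruction; not an LLVM type.
struct Instr {
  std::string opcode;
};

// std::list keeps iterators to untouched elements valid across insert/erase.
using Block = std::list<Instr>;
using Function = std::list<Block>;

// Replace one call with a placeholder expansion (load, load, icmp).
static void expandCall(Block &block, Block::iterator call) {
  assert(call != block.end() && "expected a valid call iterator");
  block.insert(call, {Instr{"load"}, Instr{"load"}, Instr{"icmp"}});
  block.erase(call);
}

// Step 1: walk the function once and record every memcmp/bcmp call.
// Step 2: expand the recorded calls without rescanning the function.
// Returns the number of calls expanded.
std::size_t expandAll(Function &fn) {
  std::vector<std::pair<Block *, Block::iterator>> worklist;
  for (Block &block : fn)
    for (auto it = block.begin(); it != block.end(); ++it)
      if (it->opcode == "memcmp" || it->opcode == "bcmp")
        worklist.emplace_back(&block, it);

  for (auto &entry : worklist)
    expandCall(*entry.first, entry.second);
  return worklist.size();
}
```

Because the worklist is built before any mutation, each instruction is visited once during collection, instead of being rescanned after every expansion.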
[libc][CndVar] reimplement condition variable with FIFO ordering (#192748)
This PR reimplements the condition variable in two variants:
- futex-based shared condvar with atomic counter for waiters
- queue-based private condvar
Note that a thread-local queue node cannot be reliably accessed in the
process-shared case, so we cannot use a unified implementation for both
variants.
POSIX.1-2024 (Issue 8) added atomicity conditions to condition
variables:
- The `pthread_cond_broadcast()` function shall, **as a single atomic
operation**, determine which threads, if any, are blocked on the
specified condition variable cond and unblock all of these threads.
- The `pthread_cond_signal()` function shall, as a **single atomic
operation**, determine which threads, if any, are blocked on the
[41 lines not shown]
[DirectX] Implement lowering of Texture Load and Texture .operator[] (#193343)
Fixes https://github.com/llvm/llvm-project/issues/192546 and
https://github.com/llvm/llvm-project/issues/192558
This PR defines the TextureLoad DXIL Op (opcode 66), and implements
lowering of the texture load (dx_resource_load_level) intrinsic to the
DXIL op.
This PR also implements the transformation of loads from texture
resources (via dx_resource_getpointer) into dx_resource_load_level
intrinsics.
Assisted-by: Claude Opus 4.7
[AMDGPU] Add a sched group mask for LDSDMA instructions
The existing VMEM masks are not fine-grained enough for some use cases. For
example, if users want to control async loads, using VMEM may cause the compiler
to pick instructions it shouldn't.
This PR adds a new sched group mask for LDSDMA instructions. It is a subclass of
VMEM, but only targets isLDSDMA instructions.
[CIR][NFC] Delete unnecessary errorNYI call in emitDelegateCallArg (#193608)
There was a call to errorNYI in `CIRGenFunction::emitDelegateCallArg`
when the parameter decl was a `CXXRecordDecl`. This was an artifact from
an older version of this function in classic codegen, which called
`ErrorUnsupported` for InAlloca arguments, but that handling was deleted
as part of https://reviews.llvm.org/D154007.
[LV][RISCV] Add explicit LMUL controls via computeFeasibleMaxVF
Add components of maxVF and its support for scalable
vectorization. With this change, the default for RISCV when
nothing is specified is LMUL=4, so some tests now pass
the flag that controls max LMUL to extend it to LMUL=8
where they need it.
lang/fpc: Add infrastructure to bootstrap on macOS
It compiles all the way but still fails at link time, and I have not yet
figured out why. Rather than throwing it all away, I guess it's better to
commit it with BROKEN_ON_PLATFORM.
[lldb] Update filecheck_log to use direct input (NFC-ish) (#193618)
Pass log files as direct input to `FileCheck` via its `-input-file`
option.
I had a failing test case where the log file contains the string being
checked for, and yet `FileCheck` failed. While debugging, I noticed the
output from running `platform shell -h -- cat ...` was somehow
truncated. I have not debugged why. As soon as I saw the issue, I
figured it was best to skip all the intermediaries, and pass the log
file straight to `FileCheck`.
linuxkpi: Implement __GFP_THISNODE in alloc_pages()
This flag tells `alloc_pages()` to allocate the pages from the current
NUMA domain. If that allocation fails, `alloc_pages()` must not retry
elsewhere and must return failure.
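A minimal userland sketch of the strict-domain behaviour described above, under stated assumptions: `kThisNode`, `Domain` and `allocPages` are hypothetical names invented for illustration, and the sketch models NUMA domains as simple free-page counters. It is not the linuxkpi implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flag value; stands in for __GFP_THISNODE in this model.
constexpr unsigned kThisNode = 1u << 0;

// Hypothetical model of a NUMA domain: just a count of free pages.
struct Domain {
  std::size_t freePages;
};

// Returns the index of the domain that satisfied the request, or -1 on failure.
int allocPages(std::vector<Domain> &domains, std::size_t current,
               std::size_t npages, unsigned flags) {
  assert(current < domains.size() && "current domain must exist");
  auto tryDomain = [&](std::size_t d) {
    if (domains[d].freePages < npages)
      return false;
    domains[d].freePages -= npages;
    return true;
  };

  // Always try the current domain first.
  if (tryDomain(current))
    return static_cast<int>(current);
  // With the strict flag set, fail instead of falling back.
  if (flags & kThisNode)
    return -1;
  // Otherwise any other domain with enough free pages will do.
  for (std::size_t d = 0; d < domains.size(); ++d)
    if (d != current && tryDomain(d))
      return static_cast<int>(d);
  return -1;
}
```

The only difference between the two modes is the early return: with the flag set, a failure in the current domain is reported to the caller rather than hidden by a fallback.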
Reviewed by: bz
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D56590
pf_purge_states() may trip assert(st->timeout == PFTM_UNLINKED)
in pf_free_state(). Once the ->timeout member of the pf_state structure
reaches the PFTM_UNLINKED value, it must not be updated again. This
diff makes pfsync(4) follow the PFTM_UNLINKED rule too. Currently
pfsync(4) may accidentally update the ->timeout member while a state
is being purged, causing pf_purge_states() to trip the assert.
Issue was kindly reported by Stuart Henderson.
OK @bluhm
[SPIR-V] Handle ASM with multiple outputs (#187128)
Inline ASM that writes to multiple registers is represented as a
struct-returning call in LLVM IR. We did not handle this properly, as we
mutated the callsite, but did not correctly retrieve the type during
lowering to SPIR-V. Furthermore, IRTranslator tries to do some clever
things when lowering ASM, which are completely unhelpful to SPIR-V,
which merely wants to pass the original ASM through. This patch
correctly retains the IR type and cleans up the IRTranslator introduced
noise, matching how the SPIRV-LLVM Translator would handle such cases.
There is probably a cleaner reformulation of this to be had when we
rework the entire callsite mutation infra.
---------
Co-authored-by: Marcos Maronas <marcos.maronas at intel.com>
Co-authored-by: Juan Manuel Martinez Caamaño <jmartinezcaamao at gmail.com>
Ensure that the Synthetic children of a ValueObject are managed by their parent's ClusterManager (#192561)
A very common pattern in our synthetic child providers was to make the
child ValueObject using ValueObjectConstResult::Create or some form of
the static ValueObject::CreateValueObjectFrom*** methods, and store and
hand that out as the child. Doing that creates a "root" ValueObject
whose lifecycle is not linked to the lifecycle of the ValueObject it is
a child of. And that means it is possible that either the child or the
parent could have gotten destroyed when the other ValueObject gets asked
a question about it.
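The lifetime coupling at stake can be illustrated with a small sketch. This is not LLDB's code: `Cluster` and `Node` are hypothetical miniatures, and the sketch assumes the general technique of handing out `std::shared_ptr` handles that share one owner's reference count through the aliasing constructor, so that every member stays alive while any handle exists.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical member object; stands in for a parent or child value.
struct Node {
  std::string name;
};

// Hypothetical owner that keeps all of its nodes alive together.
class Cluster : public std::enable_shared_from_this<Cluster> {
public:
  static std::shared_ptr<Cluster> create() {
    return std::shared_ptr<Cluster>(new Cluster());
  }

  // Create a node owned by this cluster. The returned handle points at the
  // node but shares the cluster's reference count (aliasing constructor).
  std::shared_ptr<Node> addNode(std::string name) {
    assert(!name.empty() && "nodes need a name in this sketch");
    nodes_.push_back(std::make_unique<Node>(Node{std::move(name)}));
    return std::shared_ptr<Node>(shared_from_this(), nodes_.back().get());
  }

private:
  Cluster() = default;
  std::vector<std::unique_ptr<Node>> nodes_;
};
```

In this model, a child created through `addNode` on the parent's cluster cannot outlive the parent or vice versa. A child created as an independent "root" would have its own reference count, which is the situation the commit message describes.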
For the most part this doesn't happen because there are usually enough
other shared pointer references binding the two to keep both sides
alive. But we have gotten a small but steady stream of reports for years
now of crashes where a ValueObject accesses its ClusterManager but that
has already been deleted. I've never been able to find a reproducible
case of this, but one plausible cause is that we are violating the
contract that "all the children of a ValueObject have coterminous
lifespans, enforced by the ClusterManager". So it is unsurprising that
[31 lines not shown]