[CodeGen] Compare MMO atomic ordering and syncscope. (#199892)
MachineMemOperand::operator== compared the address, flags, AA metadata,
range, alignment, and address space, but not atomic success ordering,
failure ordering, or syncscope. Users such as
MachineInstr::cloneMergedMemRefs could therefore treat atomic and
non-atomic MMOs, or atomics with different syncscopes, as identical.
This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
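
The effect of the missing fields can be illustrated with a small model. This is a sketch in Python, not LLVM code: `MemOperandModel`, `Ordering`, and `legacy_equal` are hypothetical names, and the field set is reduced to what the message lists.

```python
from dataclasses import dataclass
from enum import Enum


class Ordering(Enum):
    # Reduced, illustrative subset of atomic orderings.
    NOT_ATOMIC = 0
    MONOTONIC = 1
    ACQUIRE = 2
    SEQ_CST = 3


@dataclass(frozen=True)
class MemOperandModel:
    """Hypothetical stand-in for a machine memory operand."""
    address: str
    flags: int
    alignment: int
    address_space: int
    success_ordering: Ordering = Ordering.NOT_ATOMIC
    failure_ordering: Ordering = Ordering.NOT_ATOMIC
    sync_scope: int = 1  # illustrative scope ID, not a real LLVM constant


def legacy_equal(a: MemOperandModel, b: MemOperandModel) -> bool:
    # Models the old comparison: orderings and syncscope are ignored.
    return (a.address, a.flags, a.alignment, a.address_space) == (
        b.address, b.flags, b.alignment, b.address_space)


plain = MemOperandModel("%p", flags=1, alignment=4, address_space=0)
atomic = MemOperandModel("%p", flags=1, alignment=4, address_space=0,
                         success_ordering=Ordering.SEQ_CST)
scoped = MemOperandModel("%p", flags=1, alignment=4, address_space=0,
                         success_ordering=Ordering.SEQ_CST, sync_scope=2)
```

In this model the dataclass-generated `==` compares every field, so `plain`, `atomic`, and `scoped` are all distinct, while `legacy_equal` treats them as interchangeable.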
[bazel] Add config for hermetic clang toolchain (#192528)
This config uses the https://github.com/hermeticbuild/hermetic-llvm
toolchain to avoid any dependency on the host compiler. This makes it
trivial to test with remote execution and also supports cross
compilation.
NAS-141142 / 27.0.0-BETA.1 / VM/container: parallelize shutdown and fix force_after_timeout (#19000)
## Problem
When middleware itself stops VMs and containers — on system
shutdown/reboot via the `system.shutdown` event, or during HA failover —
it loops through guests one at a time, waiting up to the per-guest
shutdown timeout (90s by default) for each. With many guests this
serializes into a long wait, even though stopping different guests has
no dependency on one another.
Separately, `vm.stop(force_after_timeout=True)` was silently ignored —
`stop_vm` only checked `options.force`. A VM that didn't respond to ACPI
within its `shutdown_timeout` was left running, contradicting the API
docstring and behaving inconsistently with the container path which
honored the flag correctly.
## Solution
[7 lines not shown]
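
The elided solution is not reproduced here. As an illustration only, the two problems described above (serialized waits and an ignored `force_after_timeout`) could be addressed along these lines. This is a minimal sketch assuming an asyncio-based caller; `stop_guest`, `stop_all`, and the `graceful_stop` / `force_stop` callables are hypothetical and are not the middleware's actual API.

```python
import asyncio


async def stop_guest(name, graceful_stop, force_stop, timeout,
                     force_after_timeout=False):
    """Try a graceful stop; optionally force the guest off on timeout."""
    try:
        await asyncio.wait_for(graceful_stop(name), timeout)
        return (name, "stopped")
    except asyncio.TimeoutError:
        if force_after_timeout:
            await force_stop(name)
            return (name, "forced")
        return (name, "timeout")


async def stop_all(names, graceful_stop, force_stop, timeout,
                   force_after_timeout=False):
    # Guests are independent, so all waits run concurrently; the total
    # wait is bounded by roughly one timeout rather than one per guest.
    return await asyncio.gather(*(
        stop_guest(n, graceful_stop, force_stop, timeout, force_after_timeout)
        for n in names))
```

`asyncio.gather` returns results in the order the guests were passed in, which keeps per-guest reporting straightforward.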
[AMDGPU] Implement -amdgpu-spill-cfi-saved-regs
These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.
Change-Id: If4f34abb3a8e0e46b859a7c74ade21eff58c4047
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[AMDGPU] Implement CFI for CSR spills
Introduce new SPILL pseudos to allow CFI to be generated only for CSR
spills, and to make that information accurate at the ISA-instruction
level.
Other targets either generate slightly incorrect information or rely on
conventions for how spills are placed within the entry block. The
approach in this change produces larger unwind tables, with the
increased size being spent on additional DW_CFA_advance_location
instructions needed to describe the unwinding accurately.
Change-Id: I9b09646abd2ac4e56eddf5e9aeca1a5bebbd43dd
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[Clang] Default to async unwind tables for amdgcn
To avoid codegen changes when enabling debug-info (see
https://bugs.llvm.org/show_bug.cgi?id=37240) we want to
enable unwind tables by default.
There is some pessimization in post-prologepilog scheduling, and a
general solution to the problem of CFI_INSTRUCTION-as-scheduling-barrier
should be explored.
Change-Id: I83625875966928c7c4411cd7b95174dc58bda25a
[AMDGPU] Implement CFI for non-kernel functions
This does not implement CSR spills other than those AMDGPU handles
during PEI. The remaining spills are handled in a subsequent patch.
Change-Id: I5e3a9a62cf9189245011a82a129790d813d49373
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[AMDGPU] Emit entry function Dwarf CFI
Entry functions represent the end of unwinding, as they are the
outer-most frame. This implies they can only have a meaningful
definition for the CFA, which AMDGPU defines using a memory location
description with a literal private address space address. The return
address is set to undefined as a sentinel value to signal the end of
unwinding.
Change-Id: I21580f6a24f4869ba32939c9c6332506032cc654
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[MC][Dwarf] Add custom CFI pseudo-ops for use in AMDGPU
While these can be represented with .cfi_escape, using these pseudo-cfi
instructions makes .s/.mir files more readable, and it is necessary to
support updating registers in CFI instructions (something that the
AMDGPU backend requires).
Change-Id: I763d0cabe5990394670281d4afb5a170981e55d0
[MIR] Error on signed integer in getUnsigned
Previously we effectively took the absolute value of the APSInt. Diagnose
the unexpected negative value instead.
Change-Id: I4efe961e7b29fdf1d5f97df12f8139aac12c9219
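
The behavioral difference can be modeled in a few lines. This is a Python sketch of the idea, not the MIR parser: `legacy_get_unsigned` and `get_unsigned` are hypothetical names, the error text is invented for illustration, and any upper-bound check is omitted.

```python
def legacy_get_unsigned(token: str) -> int:
    # Models the old behavior described above: the sign is silently dropped.
    return abs(int(token, 0))


def get_unsigned(token: str) -> int:
    # Models the new behavior: a negative literal is reported as an error.
    value = int(token, 0)
    if value < 0:
        raise ValueError(f"expected unsigned integer, got '{token}'")
    return value
```

With the legacy model, an input such as `-8` is accepted and read as 8, which hides a malformed test input; the new model rejects it.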
Linux 5.6 compat: fix fs_parse API mismatch
Added m4 macro to check fs_parse API signature and wrappers. Before
5.6, fs_parse() took a struct fs_parameter_description which wraps
the parameter specs with name and enum pointers. From 5.6, the
description struct was removed and fs_parse() accepts the
fs_parameter_spec directly.
Reviewed-by: Rob Norris <robn at despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: tiehexue <tiehexue at hotmail.com>
Closes #18585
cap_mkdb: Fix memory leak
This is not a big deal since it only iterates once before exiting, but
that's no reason to set a bad example.
PR: 195128
MFC after: 1 week
Reviewed by: ngie
Differential Revision: https://reviews.freebsd.org/D57251
limits: Improve consistency
Historical precedent seems pretty consistent: size limits have singular
names, number limits have plural names. RLIMIT_VMM broke this, and I
made matters worse by referring to this limit as “vmms” in limits(1).
Consistently use “vms” everywhere user-visible, while leaving the
question of whether or not to rename RLIMIT_VMM itself for another day.
Fixes: 1092ec8b3375 ("kern: Introduce RLIMIT_VMM")
Fixes: 53af2026f213 ("limits: Unbreak after RLIMIT_VMM addition")
Reviewed by: bnovkov
Differential Revision: https://reviews.freebsd.org/D57265
[lldb-dap] Use MainLoop instead of a background thread in OutputRedirector. (#199970)
Replace the background thread in OutputRedirector with LLDB's MainLoop
event loop. This reduces the number of threads created and ensures file
descriptors are properly closed when no longer needed.
Since the debugger's output is not I/O intensive, there is no risk of
hitting the pipe buffer limit with this approach.
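
The general pattern (reading a redirected pipe from an event loop instead of a dedicated thread) can be sketched as follows. This uses Python's `selectors` module on a POSIX pipe as a stand-in; it is not LLDB's MainLoop API, and `drain_pipe` is a hypothetical helper.

```python
import os
import selectors


def drain_pipe(read_fd, on_output):
    """Forward data from read_fd to on_output until the write end closes."""
    sel = selectors.DefaultSelector()
    sel.register(read_fd, selectors.EVENT_READ)
    try:
        while True:
            for key, _ in sel.select():
                chunk = os.read(key.fd, 4096)
                if not chunk:
                    return  # EOF: every writer has closed its end
                on_output(chunk)
    finally:
        # Release the descriptor as soon as it is no longer needed.
        sel.unregister(read_fd)
        sel.close()
        os.close(read_fd)
```

In a real event loop the selector would be shared with other event sources rather than owned by one helper; the sketch keeps it local to stay self-contained.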
[mlir][SliceAnalysis] Fix visited set to avoid infinite recursion (#200008)
Fixes #139694, which introduced use-def cycle detection during slice
analysis, but some cycles were still not detected, potentially leading
to infinite recursion.
This PR fixes the handling of the visited set, which tracks the current
DFS path during recursion. Previously, the set could fail to detect
double cycles because entries were erased even when no recursive call
was made. The insert/erase operations are now only performed when
recursion actually occurs, ensuring that cycle detection correctly
reflects the active DFS path.
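
The invariant described above can be sketched on a plain graph. This is a Python model, not the MLIR implementation: `backward_slice` and `operands_of` are hypothetical names, and the extra `done` set is an addition of this sketch to avoid revisiting finished nodes.

```python
def backward_slice(root, operands_of):
    """Collect transitive operands of root, tolerating use-def cycles.

    operands_of maps a node to the nodes it uses.
    """
    on_path = {root}  # nodes on the active DFS path only
    done = set()      # nodes whose operands were fully explored
    order = []

    def visit(node):
        for operand in operands_of.get(node, ()):
            if operand in on_path or operand in done:
                # No recursion happens here, so on_path is left untouched.
                continue
            on_path.add(operand)     # insert only when recursing
            visit(operand)
            on_path.remove(operand)  # erase only what this frame inserted
            done.add(operand)
            order.append(operand)

    visit(root)
    return order


# Three nodes that all reference each other, forming overlapping cycles.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["b", "a"]}
result = backward_slice("a", graph)
```

Because a frame only erases an entry it inserted itself, a node placed on the path by an outer frame stays there until that outer frame returns.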
[AArch64][GlobalISel] Add BF16 fabs and fneg (#198655)
These should be very simple as they are just legal or expanded based on
whether fullfp16 is available, as the FP16 FNEG and FABS instructions can
be used equally for BF16.
[flang-rt][cuda] Use a thinner I/O in CUDA build (#199769)
Reduce the footprint of I/O in the CUDA build. This makes it easier to
include I/O when using non-relocatable device code mode.
hardware: update missing powerpc entries
Previous patch (4c396c5b7fd7) missed `archetypes/release/hardware.adoc`
which is used for creating new hardware notes. Update the file to
reflect the patch.
Reviewed by: cperciva
Fixes: 4c396c5b7fd7 ("hardware: Update pSeries entries")
Differential Revision: https://reviews.freebsd.org/D57260
[AtomicExpand] Preserve volatile in widenPartwordAtomicRMW. (#199722)
widenPartwordAtomicRMW widens a sub-word atomicrmw to the target's
minimum cmpxchg size by calling CreateAtomicRMW, which has no
IsVolatile parameter, and didn't copy isVolatile() from the original.
Every other expansion path in this file already does. Affects targets
whose MinCmpXchgSizeInBits exceeds the value width (RISC-V without
Zabha, LoongArch base, SPARC, AMDGPU, etc.).
This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
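
A toy model of the widening step shows where the flag has to be carried over. This is a Python sketch, not the AtomicExpand code: `AtomicRMWModel` and `widen_partword` are hypothetical, the layout assumes a little-endian target, and the operand rewriting for the widened operation is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicRMWModel:
    """Hypothetical stand-in for an atomicrmw instruction."""
    op: str
    address: int
    width_bits: int
    is_volatile: bool = False


def widen_partword(rmw, min_cmpxchg_bits=32):
    """Widen a sub-word operation to the minimum supported width."""
    word_bytes = min_cmpxchg_bits // 8
    aligned = rmw.address & ~(word_bytes - 1)
    shift = (rmw.address - aligned) * 8  # little-endian bit offset
    mask = ((1 << rmw.width_bits) - 1) << shift
    widened = AtomicRMWModel(
        rmw.op, aligned, min_cmpxchg_bits,
        is_volatile=rmw.is_volatile,  # the property the fix preserves
    )
    return widened, shift, mask


narrow = AtomicRMWModel("or", address=0x1003, width_bits=8, is_volatile=True)
widened, shift, mask = widen_partword(narrow)
```

Dropping the `is_volatile=` argument in this model reproduces the bug: the widened operation silently falls back to the non-volatile default.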
[ProfCheck] Fix #199174 (#200013)
The patch added another large fp conversion test for which we are
currently missing some profile annotations, so add it to the xfail list
for now.