[AMDGPU][GFX1250] Optimize s_wait_xcnt for back-to-back atomic RMWs (#177620)
This patch optimizes the insertion of s_wait_xcnt instruction for
sequences of atomic read-modify-write (RMW) operations in the
SIInsertWaitcnts pass. The Memory Legalizer conservatively inserts a
soft xcnt instruction before each atomic RMW operation as part of PR
168852, which is correct given the nature of atomic operations.
However, for back-to-back atomic RMWs, only the first s_wait_xcnt is
necessary for better runtime performance. This patch tracks atomic
RMW blocks within each basic block and removes redundant soft xcnt
instructions, keeping only the first wait in each sequence. An atomic
RMW block continues through subsequent atomic RMWs and non-memory
instructions (e.g., ALU operations) but is broken by CU-scoped memory
operations, atomic stores, or basic block boundaries.
Cleanup: Remove SmallVector hacks (#177667)
We no longer support any platform that uses Clang 5. This was a
workaround for older clang versions where template arguments weren't
merged between forward declarations and definitions correctly.
Since we don't support anything this old anymore, we can drop this
workaround.
Note: I do not have merge permissions.
nanobsd: Use mtree -C to produce the metalog
Prefer an mtree -C output, which is guaranteed to be mtree-compatible.
Add "gname", "uname", and "tags" to the default keyword set, while
removing "size" and "time", the latter being set on kernel file entries
and taking precedence over makefs -T (when paired with -F).
As a side effect, this produces a cleaner file with sorted keywords.
Note that passing "-u" to sort in order to pipe to mtree is no longer
necessary, but we'll do it out of habit.
Reviewed by: imp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D54854
[flang] Added ConditionallySpeculatable and Pure for some FIR ops. (#174013)
This patch implements `ConditionallySpeculatable` interface for some
FIR operations (`embox`, `rebox`, `box_addr`, `box_dims` and `convert`).
It also adds `Pure` trait for `fir.shape`, `fir.shapeshift`,
`fir.shift` and `fir.slice`.
I could have split this into multiple patches, but the changes
are better tested together on real apps, and the amount of affected
code is small.
There are more `NoMemoryEffect` operations for which I am planning
to do the same in future PRs.
[flang] Support cuf.device_address in FIR AliasAnalysis. (#177518)
Support `cuf.device_address` same way as `fir.address_of`.
This implementation implies that the host address and the device
address `MustAlias` (as shown in the new test). This should be
conservatively correct as long as `MustAlias` does not allow
to assume that the actual addresses are the same (that is what
LLVM documentation implies, I believe).
It is probably worth adding an operation interface to handle
`fir::AddrOfOp` and `cuf::DeviceAddressOp` in FIR AliasAnalysis,
but for the initial implementation I hardcoded the checks.
I also removed the call to `fir::valueHasFirAttribute` that performs
on demand SymbolTable lookups, which may be costly, and added
SymbolTable caching in FIR AliasAnalysis object. Anyway,
`fir::valueHasFirAttribute` does not work for `cuf::DeviceAddressOp`.
astro/calceph: New port
CALCEPH Library is designed to access the binary planetary ephemeris
files, such INPOPxx and JPL DExxx ephemeris files, (called 'original
JPL binary' or 'INPOP 2.0 or 3.0 binary' ephemeris files in the next
sections) and the SPICE kernel files (called 'SPICE' ephemeris files
in the next sections).
amd64: Fix sys/pcpu.h usage in vmm_host.h and md_var.h
Include sys/pcpu in vmm_host.h as its structs and functions are used
there, and add a forward declaration of struct pcpu to md_var.h as it
is used in some function prototypes.
Reviewed by: corvink, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D51550
[HIP] Provide implicit include to ROCm library directory (#177704)
Summary:
It's more correct to directly link the HIP runtime if we know the path,
however some users were relying on the old `-L` to pass in some other
non-standard HIP libraries. Put that part back in for now.
[lldb] Avoid redundant calls to `std::shared_ptr::get` (NFC) (#177720)
Avoid redundant calls to `std::shared_ptr::get()`. The class provides a
dereference operator and using that is the standard, idiomatic way to
access the underlying object.
nfsd: Fix handling of hidden/system during Open/Create
When an NFSv4.n client specifies settings for the archive,
hidden and/or system attributes during a Open/Create, the
Open/Create fails for ZFS. This is caused by ZFS doing
a secpolicy_xvattr() call, which fails for non-root.
If this check is bypassed, ZFS panics.
This patch resolves the problem by disabling va_flags
for the VOP_CREATE() call in the NFSv4.n server and
then setting the flags with a subsequent VOP_SETATTR().
This problem only affects FreeBSD-15 and main, since the
archive, system and hidden attributes are not enabled
for FreeBSD-14.
I think a similar problem exists for the NFSv4.n
Open/Create/Exclusive_41, but that will be resolved
in a future commit.
[8 lines not shown]
nfsd: Fix handling of hidden/system during Open/Create
When an NFSv4.n client specifies settings for the archive,
hidden and/or system attributes during a Open/Create, the
Open/Create fails for ZFS. This is caused by ZFS doing
a secpolicy_xvattr() call, which fails for non-root.
If this check is bypassed, ZFS panics.
This patch resolves the problem by disabling va_flags
for the VOP_CREATE() call in the NFSv4.n server and
then setting the flags with a subsequent VOP_SETATTR().
This problem only affects FreeBSD-15 and main, since the
archive, system and hidden attributes are not enabled
for FreeBSD-14.
I think a similar problem exists for the NFSv4.n
Open/Create/Exclusive_41, but that will be resolved
in a future commit.
[8 lines not shown]
[Flang][OpenMP][Offload] Modify MapInfoFinalization to handle attach mapping and 6.1's ref_* and attach map keywords
This PR is one of four required to implement the attach mapping semantics in Flang, alongside the
ref_ptr/ref_ptee/ref_ptr_ptee map modifiers and the attach(always/never/auto) modifiers.
This PR is the MapInfoFinalization changes required to support these features, it mainly deals with
applying the correct attach map type and manipulating the descriptor types maps for base address
and descriptor so that when we specify ref_ptr/ref_ptee we emit one of the two maps and when we
emit ref_ptr_ptee we emit our usual default maps. In all cases we add the "glue" of an new
attach map except in cases where a user has provided attach never. In cases where we are
provided an always, we apply the always map type to our attach maps.
It's important to note the runtime has a toggle for the auto map behaviour, which will flip the
attach behaviour to the newer semantics or the older semantics for backwards compatability (outside
the purview of this PR but good to mention).