[NVPTX] Rewrite kernel signatures in param AS (#204192)
Rewrite kernel signatures, moving byval parameters directly into the
entry parameter address space (similar to how ExpandVariadics handles
va_arg functions). This avoids the need for the somewhat hacky
nvvm_internal_addrspace_wrap intrinsic and enables better support for
parameter short pointers.
p9fs: Remove the "cancel" transport method
Nothing calls it, and the existing virtio transport doesn't implement
it. No functional change intended.
MFC after: 1 week
[Instrumentor] Add runtime examples: [2/N] A FP precision analysis
Second example:
Check all floating point operations and track if they could be done at
lower precision.
Partially developed by Claude (AI), tested and verified by me.
FreeBSD: avoid lookup overhead for nonexistent xattr directories
Port the z_xattr_dir_absent cache to FreeBSD. As on Linux, a
getextattr that misses in the SA otherwise falls through to
zfs_get_xattrdir(), which takes the "" ZXATTR dirlock and reads
SA_ZPL_XATTR only to find the file has no xattr directory. The
in-core znode now caches that: after an SA miss zfs_getextattr_impl()
skips the directory lookup when the flag is set, zfs_get_xattrdir()
sets it when no directory is found and clears it when one is found,
and zfs_make_xattrdir() clears it on creation, which also covers the
TX_MKXATTR ZIL replay path.
The flag is serialized by the base file's vnode lock.
zfs_make_xattrdir(), the only path that creates the directory and
clears the flag, runs with the vnode held exclusive, while every
reader that sets the flag holds it shared, so a set can never race
the clear. ASSERT_VOP_ELOCKED() in zfs_make_xattrdir() and
ASSERT_VOP_LOCKED() in zfs_get_xattrdir() enforce this, both skipped
during ZIL replay since it is single threaded with no locked vnode.
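The caching protocol described above can be sketched as a small
user-space model. This is only an illustration, not the code from the
commit: struct znode_model, the lock enum, the return codes and all
function names below are hypothetical stand-ins for the in-core znode,
the vnode lock state, zfs_get_xattrdir(), zfs_make_xattrdir() and
zfs_getextattr_impl().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the vnode lock state. */
enum lock_model { LOCK_NONE, LOCK_SHARED, LOCK_EXCL };

/* Hypothetical result codes for the model. */
enum { XATTR_DIR_FOUND = 0, XATTR_MISSING = 1 };

/* Hypothetical stand-in for the in-core znode. */
struct znode_model {
    bool has_xattr_dir;     /* models the on-disk SA_ZPL_XATTR entry */
    bool xattr_dir_absent;  /* models the cached "no directory" flag */
    enum lock_model lock;   /* models the base file's vnode lock */
    int dir_lookups;        /* counts expensive directory lookups */
};

/* Models zfs_get_xattrdir(): the expensive lookup, run with the vnode
 * at least share-locked. It sets or clears the cache flag. */
static int get_xattrdir(struct znode_model *zp)
{
    assert(zp->lock != LOCK_NONE);
    zp->dir_lookups++;
    if (!zp->has_xattr_dir) {
        zp->xattr_dir_absent = true;
        return XATTR_MISSING;
    }
    zp->xattr_dir_absent = false;
    return XATTR_DIR_FOUND;
}

/* Models zfs_make_xattrdir(): creation requires the exclusive lock
 * and always clears the flag. */
static void make_xattrdir(struct znode_model *zp)
{
    assert(zp->lock == LOCK_EXCL);
    zp->has_xattr_dir = true;
    zp->xattr_dir_absent = false;
}

/* Models the getextattr path after the SA lookup has already missed:
 * skip the directory lookup entirely when the flag is set. */
static int getextattr_after_sa_miss(struct znode_model *zp)
{
    if (zp->xattr_dir_absent)
        return XATTR_MISSING;
    return get_xattrdir(zp);
}
```

In this model the first miss pays for one directory lookup and every
later miss is answered from the flag until make_xattrdir() clears it.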
[8 lines not shown]
Avoid lookup overhead for nonexistent xattr directories
A getxattr that misses in the file's SA falls through to
zfs_get_xattrdir(), which takes the "" ZXATTR dirlock and issues an
sa_lookup(SA_ZPL_XATTR), only to find the file has no xattr directory at
all. security.capability is the common trigger: the kernel probes it on
file access (get_vfs_caps_from_disk()), so for the many files that carry
no extended attributes the same fruitless lookup repeats constantly.
Profiling an SMB metadata workload showed roughly 6% of CPU spent in
zfs_get_xattrdir(), every call missing and returning ENOENT.
Cache the result in the in-core znode: a new boolean marks a file as
having no xattr directory. When it is set, a getxattr that misses in the
SA returns ENODATA from __zpl_xattr_get() without the zfs_lookup into
zfs_get_xattrdir, so neither the "" ZXATTR dirlock nor the SA_ZPL_XATTR
lookup runs. The flag is set when the directory lookup finds nothing and
cleared in zfs_make_xattrdir() whenever a directory is created, so the
setxattr and TX_MKXATTR ZIL replay paths are both covered. It is updated
under the existing z_xattr_lock and defaults to the real lookup, so
[9 lines not shown]
[offload] add support for aligned allocations (#203353)
This patch is the first step towards introducing alignment support in
memory allocations using liboffload, in order to enable SYCL
implementation of aligned allocations.
At the level of device allocators, it does not modify the Level Zero
code except for forwarding the alignment parameter, since Level Zero
already allows for specifying the alignment in its device allocator
implementation. For AMD and CUDA, it checks whether the alignment passed
by the caller is supported by the given backend; the reasoning behind
this verification is described in the following paragraphs. At the API
level, it adds a new function olMemAllocAligned, which is expected to
work similarly to olMemAlloc, with the difference that the buffers
returned by olMemAllocAligned should be aligned to the alignment passed
by the user. At the level of the plugin interface internal abstractions,
it adds a new argument Alignment to existing functions and delegates
memory allocation between olMemAllocAligned and olMemAlloc
implementations by using a common helper function.
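The "common helper" arrangement can be illustrated with a host-side
sketch. Everything here is an assumption made for illustration: the
function names, the MODEL_BACKEND_MAX_ALIGN limit and the
power-of-two check are hypothetical and do not reproduce the
liboffload API or the actual AMD/CUDA support criteria, which are
described in the part of the message not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical largest alignment the modelled backend can honor. */
#define MODEL_BACKEND_MAX_ALIGN 256u

static int is_pow2(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Assumed support check: 0 means "no specific request"; otherwise the
 * alignment must be a power of two within the backend limit. */
static int alignment_supported(size_t alignment)
{
    return alignment == 0 ||
        (is_pow2(alignment) && alignment <= MODEL_BACKEND_MAX_ALIGN);
}

/* Common helper shared by both entry points. */
static void *alloc_common(size_t size, size_t alignment)
{
    if (size == 0 || !alignment_supported(alignment))
        return NULL;
    if (alignment == 0)
        alignment = _Alignof(max_align_t);
    if (size > SIZE_MAX - alignment)
        return NULL;
    /* C11 aligned_alloc expects size to be a multiple of alignment. */
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    void *p = aligned_alloc(alignment, rounded);
    assert(p == NULL || ((uintptr_t)p % alignment) == 0);
    return p;
}

/* Plain allocation: delegates with no alignment request. */
static void *model_mem_alloc(size_t size)
{
    return alloc_common(size, 0);
}

/* Aligned allocation: forwards the caller's alignment. */
static void *model_mem_alloc_aligned(size_t size, size_t alignment)
{
    return alloc_common(size, alignment);
}
```

Keeping the validation in one helper means both entry points reject an
unsupported alignment in the same way.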
[37 lines not shown]
Improve performance of "zpool offline" for log devices
When offlining a log device, if it's part of a mirror that would still
be available after the offline operation, skip replaying the ZIL for
every dataset. This drastically improves the performance of "zpool
offline" for one log device of a mirrored pair.
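The decision described above can be sketched as a predicate over a
simplified model of the log vdev. The struct and function names are
hypothetical and the real check in the pool code may consider more
state than a per-child online flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a top-level log vdev. */
struct log_vdev_model {
    size_t nchildren;         /* 1 = standalone device, >1 = mirror */
    const bool *child_online; /* per-child availability */
};

/* Returns true when offlining child idx would leave the log vdev
 * unavailable, so the ZIL must be replayed for every dataset first.
 * Returns false when another online child keeps the mirror usable. */
static bool offline_needs_zil_replay(const struct log_vdev_model *vd,
    size_t idx)
{
    assert(idx < vd->nchildren);
    for (size_t i = 0; i < vd->nchildren; i++) {
        if (i != idx && vd->child_online[i])
            return false;
    }
    return true;
}
```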
Sponsored by: ConnectWise
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alek Pinchuk <alek.pinchuk at connectwise.com>
Signed-off-by: Alan Somers <asomers at gmail.com>
Closes #18664
honor file argument in file_wait_event
grep the log path passed by the caller instead of always using
ZED_DEBUG_LOG.
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Alek Pinchuk <Alek.Pinchuk at connectwise.com>
Closes #18700
[mlir][OpenMP] Don't use label prefixes on linear variable rewrite (#200900)
This is a follow-up to #194623. After that PR, matching specific label
prefixes became unnecessary. In fact, doing so could potentially lead
to missed linear variables in the rewrite, if they appear in basic
blocks with unexpected label prefixes.
[Instrumentor] Add runtime examples: [1/N] A flop counter
This adds an instrumentor-tools folder to compiler-rt to showcase
use cases of the instrumentor. The initial example is a program that,
via instrumentation, counts the number of flops performed. Call and
intrinsic support will follow after #198042.
Partially developed by Claude (AI), tested and verified by me.
Merge tag 'ntfs3_for_7.2' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 updates from Konstantin Komarov:
"Added:
- depth limit to indx_find_buffer() to prevent stack overflow
- validate split-point offset in indx_insert_into_buffer()
- bounds check to run_get_highest_vcn()
- fileattr_get() and fileattr_set() support
- zero stale pagecache beyond valid data length
- handle delayed allocation overlap in run lookup
- validate lcns_follow in log_replay() conversion
- cap RESTART_TABLE free-chain walker at rt->used
- resize log->one_page_buf when adopting on-disk page size
- reject direct userspace writes to reserved $LX* xattrs
Fixed:
- out-of-bounds read in decompress_lznt()
- avoid -Wmaybe-uninitialized warnings
- hold ni_lock across readdir metadata walk
[45 lines not shown]
[flang][cmake][perf-training] Optimize flang with PGO and BOLT (#198863)
This is an attempt to replicate a similar feature already available
for clang. The changes in this patch were made with the intent to reuse
as much of the existing infrastructure as possible. Namely, the
two-stage build arrangement, the perf-helper.py script and the means
for building the instrumented binaries have all been incorporated into
this approach.
It was deliberately chosen to optimize clang along with flang, as they
mostly work together in the final toolchain.
See the `llvm/docs/AdvancedBuilds.rst` documentation for more details.
Note that the attempt to optimize flang has exceeded one of the BOLT
limitations. The size of one of the statically allocated buffers needed
to be extended in this patch.
[BOLT] Increase BufSize in runtime/common.h (#204607)
During my work towards BOLTing the flang binary, I encountered a
frequently occurring problem with running out of buffer space.
The problem affects C++ programs with a decent number of very long
symbol names, which is inevitable when using template metaprogramming.
As one can clearly see, flang is one such program.
The proposed BufSize value is the result of a trial-and-error process
aimed at finding the smallest reasonable increase.
Revert "fts: refactor to use fd-relative operations internally"
This reverts commit e03ed9daeb49fffa1d16b8d00240c65e92650d01.
The change to the size of struct FTSENT is breaking backwards
compatibility for some binaries. Jitendra is working on a new version
that will move the new field into a private struct.
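Why a new public field breaks old binaries, and how a private struct
avoids it, can be shown with a reduced model. The struct layouts and
names below are hypothetical and are not the real struct FTSENT or the
planned fix; they only illustrate the general technique.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical public entry as old binaries were compiled against it. */
struct ent_public_v1 {
    struct ent_public_v1 *parent;
    char *path;
};

/* Adding a field to the public struct changes sizeof and therefore the
 * ABI seen by binaries built against the old header. */
struct ent_public_v2 {
    struct ent_public_v2 *parent;
    char *path;
    void *new_field;
};

/* Alternative: keep the public layout and put the new field in a
 * library-private wrapper whose first member is the public struct. */
struct ent_private {
    struct ent_public_v1 pub;
    void *new_field;
};

_Static_assert(sizeof(struct ent_public_v2) > sizeof(struct ent_public_v1),
    "a new public field grows the struct");
_Static_assert(offsetof(struct ent_private, pub) == 0,
    "public part must come first");

/* The library allocates the private struct but hands out the public
 * part only. Returns NULL on allocation failure. */
static struct ent_public_v1 *ent_alloc(void)
{
    struct ent_private *p = calloc(1, sizeof(*p));
    return p != NULL ? &p->pub : NULL;
}

/* Inside the library, recover the wrapper from the public pointer.
 * Valid because pub is the first member of struct ent_private. */
static struct ent_private *ent_to_private(struct ent_public_v1 *e)
{
    return (struct ent_private *)e;
}
```

Callers keep seeing the v1 layout and size, while the library reaches
the extra field through ent_to_private().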
Reported by: bdrewery
Fixes: e03ed9daeb4 ("fts: refactor to use fd-relative operations")
Sponsored by: ConnectWise
Pull up the following revisions, requested by martin in ticket #324:
usr.sbin/sysinst/util.c 1.82,1.83
PR 60354: move the test and new message about optional sets missing
into the correct place so it only shows the message when we really
cannot find the set.
This only applies to local files.