[libc][x86] Add Non-temporal code path for large memcpy (#187108)
Large memcpy calls are rare in general but more common in ML workloads
(copying large matrices/tensors, often to/from the CPU host).
For large copies, non-temporal (NTA) stores can improve performance both
for memcpy itself and for the rest of the workload, by reducing cache
pollution. Other runtimes already have an NTA path for large copies, so
add one to llvm-libc.
Internal whole-program load tests show a small but statistically
significant improvement of 0.1%. ML-specific benchmarks showed a 10-20%
performance gain, and fleetbench (https://github.com/google/fleetbench,
which has a more up-to-date version of the libc benchmarks) shows a ~3%
gain (ns/byte for distributions taken from various applications).
```
[Memcpy_0]_L1 0.01950n ± 3% 0.01900n ± 5% ~ (p=0.390 n=20)
[Memcpy_0]_L2 0.02300n ± 0% 0.02300n ± 0% ~ (p=0.256 n=20)
[35 lines not shown]
[AMDGPU][SIInsertWaitcnts] Add test functions in waitcnt-wcg-attributes.mir (#186504)
This patch adds two more functions for exercising the target-cpu
attribute.
[Clang] Use stable_sort for UnqualUsingDirectiveSet for determinism in ambiguity notes (#187750)
In SemaLookup.cpp, `UnqualUsingDirectiveSet::done()` uses `llvm::sort`
with a comparator that only checks ancestor relationships. If there are
multiple "neighbor" namespaces, they compare equal, and since
`llvm::sort` is not stable, it may return the using-directives in a
nondeterministic order.
This was observed as a test failure in clang/test/CXX/drs/cwg0xx.cpp at
line 220 after PR #187219 started verifying the diagnostics ordering.
The two "candidate found by name lookup" notes were emitted in the
opposite order from the test's expectations in some builds of Clang,
but not in others.
Switching to `llvm::stable_sort` ensures that using-directives are
always traversed in a deterministic order, and thus that the notes are
emitted deterministically.
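Python's `sorted()` is guaranteed to be stable, so it can illustrate the
effect of the switch in a language-neutral way. This is a simplified
model, not Clang's code: `UsingDirective`, its `ancestor` depth key and
`done()` are hypothetical stand-ins for `UnqualUsingDirectiveSet` and its
comparator.

```python
from functools import cmp_to_key
from typing import List, NamedTuple


class UsingDirective(NamedTuple):
    nominated: str  # namespace named by the using-directive
    ancestor: int   # hypothetical sort key: depth of the common ancestor


def compare(lhs: UsingDirective, rhs: UsingDirective) -> int:
    # Only the ancestor key is compared, so two "neighbor" namespaces
    # with the same common ancestor compare equal (result 0).
    return (lhs.ancestor > rhs.ancestor) - (lhs.ancestor < rhs.ancestor)


def done(directives: List[UsingDirective]) -> List[UsingDirective]:
    # sorted() is stable, like llvm::stable_sort: entries that compare
    # equal keep the order in which they were collected.
    return sorted(directives, key=cmp_to_key(compare))


collected = [
    UsingDirective("B", 1),
    UsingDirective("A", 1),
    UsingDirective("C", 0),
]
print([d.nominated for d in done(collected)])  # ['C', 'B', 'A']
```

With an unstable sort, "B" and "A" could come out in either order, which
is exactly the kind of build-dependent note ordering described above.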
[flang][OpenMP] Introduce `WithReason<T>` for nest/sequence properties (#187563)
This helper class contains an optional value and a "reason" message. It
replaces uses of std::pair<optional<...>, Reason>.
Issue: https://github.com/llvm/llvm-project/issues/185287
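The idea of bundling an optional value with an explanation can be
sketched in Python as below. This is an illustration of the pattern
only, not flang's C++ class: the `has_value()` method and the
`nest_depth()` query are hypothetical.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class WithReason(Generic[T]):
    """An optional value plus a message explaining it (or its absence)."""
    value: Optional[T] = None
    reason: str = ""

    def has_value(self) -> bool:
        return self.value is not None


def nest_depth(loops_found: int, expected: int) -> WithReason[int]:
    # Hypothetical property query: report a depth, or explain why there is none.
    if loops_found < expected:
        return WithReason(None, f"expected {expected} nested loops, found {loops_found}")
    return WithReason(expected)
```

Compared with a bare pair, the named fields make call sites such as
`nest_depth(1, 2).reason` self-describing.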
NAS-140382 / 26.0.0-BETA.2 / Add API documentation for RBAC (by anodos325) (#18537)
This commit adds explicit documentation for how the API roles framework
works as well as some rough examples of how to create custom roles.
Original PR: https://github.com/truenas/middleware/pull/18529
Co-authored-by: Andrew Walker <andrew.walker at truenas.com>
[RISCV] Fix the pipe used by `fmv.x.<fp>/<fp>.x` in SiFive7 sched model (#187740)
These FP <-> Integer conversion instructions should use PipeA instead.
NAS-140382 / 27.0.0-BETA.1 / Add API documentation for RBAC (#18529)
This commit adds explicit documentation for how the API roles framework
works as well as some rough examples of how to create custom roles.
NAS-140387 / 27.0.0-BETA.1 / Add batch port validation endpoint `port.validate_ports` (#18532)
This commit adds a new `port.validate_ports` endpoint that validates
multiple port/bindip combinations in a single call. Currently the apps
library calls `port.validate_port` once per port, each of which
internally queries all registered port delegates via `ports_mapping()`.
For apps with many ports (e.g. SeaweedFS, with 15 ports), this results
in redundant work.
The new endpoint accepts a list of `{"port": int, "bindip": str}` dicts
and calls `ports_mapping()` only once for the entire batch. It supports
two modes:
- `raise_error=True`: raises a single `ValidationErrors` with all
conflicts (same pattern as the existing endpoint)
- `raise_error=False`: returns a JSON-serializable list of `(attribute,
errmsg, errno)` tuples
The existing `validate_port` endpoint is refactored to share a
`_validate_single_port` helper but its inputs, outputs, and behavior are
unchanged.
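A minimal sketch of the batching idea follows, assuming a simplified
`ports_mapping()` result of the form `{port: {bindip: owner}}` and
treating either wildcard address as conflicting with every address.
`ValidationErrors` here is a local stand-in, and the attribute naming
(`ports.<index>.port`) is hypothetical; the real middleware code may
differ.

```python
import errno
from typing import Callable, Dict, List, Tuple

# Hypothetical shape of the ports_mapping() result: {port: {bindip: owner}}.
PortsMapping = Dict[int, Dict[str, str]]
ErrorTuple = Tuple[str, str, int]  # (attribute, errmsg, errno)

# Simplification: either wildcard conflicts with every address.
WILDCARDS = {"0.0.0.0", "::"}


class ValidationErrors(Exception):
    """Stand-in that collects (attribute, errmsg, errno) tuples."""

    def __init__(self, errors: List[ErrorTuple]):
        super().__init__(f"{len(errors)} port conflict(s)")
        self.errors = errors


def _validate_single_port(index: int, entry: dict, mapping: PortsMapping) -> List[ErrorTuple]:
    errors = []
    for bindip, owner in mapping.get(entry["port"], {}).items():
        if bindip in WILDCARDS or entry["bindip"] in WILDCARDS or bindip == entry["bindip"]:
            errors.append((
                f"ports.{index}.port",
                f"Port {entry['port']} is already in use by {owner}",
                errno.EADDRINUSE,
            ))
    return errors


def validate_ports(ports: List[dict], ports_mapping: Callable[[], PortsMapping],
                   raise_error: bool = True) -> List[ErrorTuple]:
    mapping = ports_mapping()  # queried once for the whole batch, not per port
    errors: List[ErrorTuple] = []
    for index, entry in enumerate(ports):
        errors.extend(_validate_single_port(index, entry, mapping))
    if raise_error and errors:
        raise ValidationErrors(errors)  # all conflicts reported together
    return errors  # plain tuples; json.dumps() renders them as lists
```

Sharing `_validate_single_port` keeps the per-port rules in one place,
so a single-port endpoint can call it with its own mapping lookup.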
[NFC][LV] Fix what seems to be a typo in the test
The test was added in https://github.com/llvm/llvm-project/commit/4e9894498e166ef6b207c25e780db0b6f006cc89.
Alternative fixes would be:
* Remove the unused GEP, although it's not clear why we'd want to
overwrite the stored `i64` with a `ptr` store.
* Keep this patch, but perform both GEPs with `i64` element type to
reduce the diff. It's not clear whether the scalarization caused by
that type mismatch is intentional/relevant for the original change.
Makefile.inc1: Don't force LLVM_BINUTILS off for cross-tools
Because of this setting, we were still using ELF Tool Chain tools for
buildworld. The sets of binary utilities are largely equivalent, so
this went unnoticed after commit 1cae7121c667 ("Enable LLVM_BINUTILS
by default").
This was discovered recently because ELF Tool Chain objcopy produces
standalone debug files without phdrs and this caused an issue with a
3rd party ELF parser [1]. Remove the forced setting so that we use
LLVM's binutils to build the system.
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=33876
Re-commit after fixing a bootstrapping issue with LLVM binutils (in
17494c6e6b7d "build: Boostrap LLVM_BINUTILS for cross-tools").
Reviewed by: imp, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D55650
Merge tag 'execve-v7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve fixes from Kees Cook:
- binfmt_elf_fdpic: fix AUXV size calculation (Andrei Vagin)
- fs/tests: exec: Remove bad test vector
* tag 'execve-v7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
fs/tests: exec: Remove bad test vector
binfmt_elf_fdpic: fix AUXV size calculation for ELF_HWCAP3 and ELF_HWCAP4
Merge tag 'tty-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty/vt and serial driver fixes for 7.0-rc5.
Included in here are:
- 8250 driver fixes for reported problems
- serial core lockup fix
- uartlite driver bugfix
- vt save/restore bugfix
All of these have been in linux-next for over a week with no reported
problems"
* tag 'tty-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
vt: save/restore unicode screen buffer for alternate screen
[12 lines not shown]
tpm: fix multi-threaded access with per-open state
The TPM driver currently has a single buffer per instance to hold the
result of a command, and does not allow subsequent commands to be sent
until the current result is read by the same OS thread that sent the
command, with a timeout to throw away the result after a while if the
result is not read in a timely fashion. This has a couple of problems:
- The timeout code has a bug which causes all subsequent commands to
hang forever if a different OS thread tries to read the result
before the OS thread which sent the command, and the OS thread
which sent the command never tries to read the result.
- Even if the first problem is fixed, applications expect to be able
to read the result from a different OS thread than the OS thread
which sent the command. The particular case that we saw was a Go
application where the Go runtime scheduled the goroutine which read
the result to a different OS thread from one where the goroutine
that sent the command ran, and there's no way to force these to
[11 lines not shown]
NAS-140381 / 26.0.0-BETA.2 / Use netlink API for default interface detection with IPv6 fallback (by sonicaj) (#18534)
This commit fixes Apps/Docker setup failing on IPv6 single-stack
deployments with an "Unable to determine interface" error. The existing
get_default_interface() only read /proc/net/route (IPv4). This replaces
the procfs text parsing with the truenas_pynetif netlink API
(get_default_route), which tries IPv4 first and falls back to IPv6. The
now-dead constants RTF_GATEWAY and RTF_UP are removed.
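The IPv4-first, IPv6-fallback order can be sketched as below. The real
truenas_pynetif `get_default_route` signature is not shown here, so the
sketch injects a hypothetical lookup callable that maps an address
family to an interface name (or None).

```python
import socket
from typing import Callable, Optional

# Hypothetical lookup: returns the interface of the default route for an
# address family, or None if that family has no default route.
RouteLookup = Callable[[int], Optional[str]]


def get_default_interface(get_default_route: RouteLookup) -> Optional[str]:
    # Prefer IPv4, then fall back to IPv6 so that IPv6 single-stack hosts
    # still resolve an interface instead of failing.
    for family in (socket.AF_INET, socket.AF_INET6):
        ifname = get_default_route(family)
        if ifname:
            return ifname
    return None
```

For example, on a host with only an IPv6 default route,
`get_default_interface({socket.AF_INET6: "enp1s0"}.get)` returns
`"enp1s0"` rather than giving up after the IPv4 lookup.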
Thank you @xionglingfeng for highlighting this issue and your
contribution.
Original PR: https://github.com/truenas/middleware/pull/18528
Co-authored-by: Waqar Ahmed <waqarahmedjoyia at live.com>