[TargetLowering] Avoid unnecessary EVT -> Type -> EVT roundtrip (NFC) (#177328)
For pointers, this gets the pointer EVT, then converts it back into a
type, and then gets the EVT for that type again. We can directly use the
pointer EVT.
[AMDGPU] Fix use-after-erase in OpenCL printf runtime binding (#177356)
When handling OpenCL printf calls, the AMDGPU backend replaces the
actual function call with a runtime binding. However, this replacement
currently assumes that the result of the original call has no uses. If it
does, erasing the function call leaves those uses dangling.
This patch replaces all uses of the original printf call with a constant
0, signalling that the printf operation succeeded.
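The ordering matters because erasing a value that still has users leaves those users pointing at freed memory. Below is a toy C++ model of redirecting uses to a constant 0 before erasing the call. It is not the LLVM API: `Value` and `replaceAllUsesWith` here are hypothetical stand-ins.

```cpp
#include <cassert>
#include <vector>

// Toy IR value: `uses` records every operand slot that refers to it.
struct Value {
  int constant = 0;             // payload when the value is a constant
  std::vector<Value **> uses;   // operand slots pointing at this value
};

// Point every operand slot that refers to `from` at `to` instead,
// so `from` can be destroyed without leaving dangling operands.
void replaceAllUsesWith(Value &from, Value &to) {
  for (Value **slot : from.uses) {
    *slot = &to;
    to.uses.push_back(slot);
  }
  from.uses.clear();
}
```

In LLVM terms this roughly corresponds to calling `replaceAllUsesWith` on the call before `eraseFromParent`. The patch itself may structure this differently.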
---------
Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen at amd.com>
Co-authored-by: Steffen Holst Larsen <HolstLarsen.Steffen at amd.com>
dpaa2: announce transmit checksum support
Let the network stack know that the NIC supports checksum offloading
for the IPv4 header checksum and the TCP and UDP transport checksum.
This avoids computing these checksums in software and therefore provides
the expected performance gain.
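For context, the IPv4 header checksum mentioned above is the 16-bit ones' complement sum defined in RFC 1071. The TCP and UDP checksums use the same sum over a pseudo-header plus the payload. The following is a minimal C++ sketch of the per-packet software computation that offloading avoids. `ipv4HeaderChecksum` is a hypothetical name, not code from the driver.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// RFC 1071 Internet checksum over an IPv4 header, given as 16-bit words
// already converted from network byte order to numeric values.
// The header's checksum field must be zero on input.
uint16_t ipv4HeaderChecksum(const uint16_t *words, std::size_t count) {
  uint32_t sum = 0;
  for (std::size_t i = 0; i < count; ++i)
    sum += words[i];
  // Fold carries out of the high 16 bits back into the low 16 bits.
  while (sum >> 16)
    sum = (sum & 0xFFFF) + (sum >> 16);
  // The checksum is the ones' complement of the folded sum.
  return static_cast<uint16_t>(~sum & 0xFFFF);
}
```

With the header words 4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7 (checksum field zeroed), this returns 0xB861.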
PR: 292006
Reviewed by: dsl, Timo Völker
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D54809
DAG: Remove softPromoteHalfType
Remove the now unimplemented target hook and associated DAG machinery
for the old half legalization path.
Really fixes #97975
R600: Remove softPromoteHalfType
Also includes a somewhat hacky, minimal change that fixes kernel arguments
lowered as f16, which would otherwise hit assertions once
softPromoteHalfType is removed. Half support was never really implemented
for r600; there just happened to be a few incidental tests with a half
argument (and these were not even meaningful, since the function body
folded to nothing due to the lack of callable function support).
AMDGPU: Move softPromoteHalfType override to R600 only
As expected, the generated code is much worse, but more correct.
We could do a better job with source modifier management around
fp16_to_fp/fp_to_fp16.
14.4: Update stable/14 to -PRERELEASE
This marks the start of the FreeBSD 14.4 release cycle; the stable/14
tree is now in "code slush".
Developers are encouraged to prioritize fixing bugs (and/or merging bug
fixes from HEAD) over new features at this time. Commit approval from
re@ is not required, but if new features introduce problems they may be
removed from the release.
Approved by: re (implicit)
Sponsored by: OpenSats Initiative
[AMDGPU] Fix inline constant encoding for `v_pk_fmac_f16` (#176659)
This PR handles the `v_pk_fmac_f16` inline constant encoding/decoding
differences between pre-GFX11 and GFX11+ hardware.
- Pre-GFX11: fp16 inline constants produce `(f16, 0)`, i.e. the value in
the low 16 bits and zero in the high 16 bits.
- GFX11+: fp16 inline constants are duplicated to both halves, `(f16, f16)`.
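To make the difference concrete, here is a C++ sketch of how a 16-bit fp16 inline constant would expand into the 32-bit packed operand under each rule. `expandPackedInlineConstant` is hypothetical and does not mirror the actual encoder/decoder code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the packed operand produced from an fp16 inline
// constant, following the two rules listed above.
uint32_t expandPackedInlineConstant(uint16_t f16Bits, bool isGFX11Plus) {
  uint32_t lo = f16Bits;
  // Pre-GFX11: zero in the high half. GFX11+: value duplicated.
  uint32_t hi = isGFX11Plus ? f16Bits : 0u;
  return (hi << 16) | lo;
}
```

For example, fp16 1.0 (0x3C00) expands to 0x00003C00 before GFX11 and to 0x3C003C00 on GFX11+.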
Fixes #94116.
(cherry picked from commit c253b9f9caf0be95bb16e973f216489d894370e1)
yazi: update to 26.1.22.
v26.1.22
Added
Tree view for the preset archive previewer (#3525)
Support compressed tarballs (.tar.gz, .tar.bz2, etc.) in the preset archive previewer (#3518)
Check and refresh the file list when the terminal gains focus (#3561)
Experimental module-level async support (#3594)
Disable ANSI escape sequences in ya pkg when stdout is not a TTY (#3566)
New Path.os() API creates an OS-native Path (#3541)
Fixed
Smart-case in interactive cd broken due to a typo (#3540)
Fix shell formatting for non-spread opener rules (#3532)
sort extension excludes directories since only files have extensions (#3582)
Account for URL covariance in Url:join() (#3514)
[AArch64] Fix partial_reduce v16i8 -> v2i32 (#177119)
The lowering doesn't need to check for `ConvertToScalable`, because it
lowers to another `PARTIAL_REDUCE_*MLA` node, which is subsequently
lowered using either fixed-length or scalable types.
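For reference, the operation being lowered multiplies extended input elements and accumulates the products into a narrower accumulator vector. Below is a scalar C++ model of the unsigned v16i8 -> v2i32 case. The function name and the assignment of input lane i to accumulator lane i % 2 are assumptions of this sketch, chosen only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Unsigned partial multiply-accumulate reduction: each pair of i8 inputs is
// zero-extended and multiplied, and the 16 products are folded into two i32
// accumulator lanes (8 products per lane under the assumed lane mapping).
std::array<uint32_t, 2> partialReduceUMLA(std::array<uint32_t, 2> acc,
                                          const std::array<uint8_t, 16> &a,
                                          const std::array<uint8_t, 16> &b) {
  for (std::size_t i = 0; i < a.size(); ++i)
    acc[i % acc.size()] += uint32_t(a[i]) * uint32_t(b[i]);
  return acc;
}
```

Only the total across the two result lanes is meant to be meaningful here; the per-lane split follows the assumed mapping.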
This fixes https://github.com/llvm/llvm-project/issues/176954
Re-generate check lines
The check lines for SME were different because of sub-register liveness,
which is enabled for streaming functions on trunk, but isn't enabled on
the release branch.
(cherry picked from commit de997639876db38d20c7ed9fb0c683a239d56bf5)
[MC] Explicitly use memcpy in emitBytes() (NFC) (#177187)
We've observed a compile-time regression in LLVM 22 when including large
blobs. The root cause was that emitBytes() was copying bytes one-by-one,
which is much slower than using memcpy for large objects.
Optimization of std::copy to memmove is apparently much less reliable
than one might think. In particular, when using a non-bleeding-edge
libstdc++ (anything older than version 15), this does not happen if the
types of the input and output iterators do not match (like here, where
there is a signed/unsigned mismatch).
As this code is performance sensitive, I think it makes sense to
directly use memcpy.
Previously this code used SmallVector::append, which explicitly uses
memcpy.
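Below is a minimal C++ sketch of the approach. It assumes a byte buffer of `unsigned char` filled from `char` data, which is the kind of signed/unsigned mismatch described above. `appendBytes` is hypothetical and is not the MC streamer code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>
#include <vector>

// Append a blob to the buffer with one explicit memcpy instead of relying on
// std::copy being lowered to memmove despite the differing element types.
void appendBytes(std::vector<unsigned char> &buf, std::string_view data) {
  std::size_t oldSize = buf.size();
  buf.resize(oldSize + data.size());
  // Skip empty input: data.data() may be null, and memcpy requires valid
  // pointers even for a zero-length copy.
  if (!data.empty())
    std::memcpy(buf.data() + oldSize, data.data(), data.size());
}
```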
(cherry picked from commit 15e421dc643ce4d9d79174fec585cf787e56b1a0)
[flang-rt] Fix system_clock scaling on MacOS (#176753)
The less accurate clock was being adjusted for twice: once in
`GetSystemClockCountRate` and again in `ConvertTimevalToCount`.
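To illustrate the invariant involved, here is a hypothetical model (not the flang-rt code). A count derived from a timeval should be scaled to the reported count rate exactly once, so that dividing the count by the rate gives back seconds. If the resolution adjustment were applied both in the rate and in the conversion, that relationship would break.

```cpp
#include <cassert>
#include <cstdint>

// timeval-style input: whole seconds plus microseconds.
constexpr int64_t kMicrosPerSecond = 1'000'000;

// Convert to a count at `rate` counts per second, scaling exactly once.
int64_t convertToCount(int64_t seconds, int64_t micros, int64_t rate) {
  return seconds * rate + micros * rate / kMicrosPerSecond;
}
```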
Also adding missing `static` specifiers I noticed whilst reading the
file. I don't know of a way of meaningfully testing this in the
repository, but the code in the ticket now produces the correct result.
Fixes #176505
(cherry picked from commit e03049ec0af2f589eb88e0708bfa357cdcc427ad)
[clang][bytecode] Fix a diagnostic discrepancy (#177384)
The current interpreter does _not_ evaluate function calls when checking
for a potential constant expression.
However, it _does_ evaluate the initializers of constructors. In the
bytecode interpreter, this is harder because we compile the initializers
and the body of a constructor all in the same function.
Add a special opcode that we emit after the constructor initializers and
that aborts when we're checking for a potential constant expression.
[AArch64] Disable FEAT_RNG on Grace. (#166387)
The FIXME in the changed test should be cleared by #176340.
(cherry picked from commit b691522e75e05af5ba594e250dc947a9c27802ba)
[AArch64] Protect against unexpected SIGN_EXTEND_INREG in performBuildShuffleExtendCombine (#176733)
Apparently this code only expects a shuffle of SIGN_EXTEND or
ZERO_EXTEND nodes, but it can sometimes see a SIGN_EXTEND_INREG as the
second vector operand. Add a check that the second operand satisfies the
same constraints as the first.
Fixes #176314
(cherry picked from commit 242ca4e116d18849187617d7399be20b136d768b)
[LV] Separate runtime check cost from total overhead in profitability check (#176754)
In the isOutsideLoopWorkProfitable function, there are two places where
only the runtime check cost (RtC) should be used, but the costs of middle
blocks and early-exit blocks were incorrectly included as well:
1. VectorizeMemoryCheckThreshold comparison for interleaving-only
2. Minimum trip count that bounds runtime check overhead, i.e. MinTC2
calculation
This results in an overly conservative minimum profitable trip count.
This patch separates the runtime check cost from the total overhead
cost, and uses only RtC for VectorizeMemoryCheckThreshold comparison and
the MinTC2 calculation.
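Below is a C++ sketch of the separation, using hypothetical names. The `<=` threshold comparison and the roughly 1/10 bound used for MinTC2 are assumptions of this sketch, not a copy of LoopVectorize.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical breakdown of the work done outside the vector loop.
struct OutsideLoopCosts {
  uint64_t RtC;          // runtime check cost
  uint64_t MiddleBlock;  // middle block cost
  uint64_t EarlyExit;    // early-exit block cost
  uint64_t total() const { return RtC + MiddleBlock + EarlyExit; }
};

// Interleaving-only threshold check: uses the runtime check cost alone.
bool withinMemoryCheckThreshold(const OutsideLoopCosts &C, uint64_t Threshold) {
  return C.RtC <= Threshold;
}

// Minimum trip count so the runtime checks cost at most about 1/10 of the
// scalar loop. Uses RtC only; ScalarC must be non-zero.
uint64_t minTC2(const OutsideLoopCosts &C, uint64_t ScalarC) {
  return (C.RtC * 10 + ScalarC - 1) / ScalarC;  // ceiling division
}
```

With RtC = 20, middle and early-exit costs of 5 each, and ScalarC = 4, `minTC2` returns 50. Using the total overhead of 30 instead would give 75, the overly conservative bound described above.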
[flang][OpenMP] Allow ALLOC/RELEASE in place of STORAGE in 6.0 (#176810)
As per the 6.0 spec
> The value alloc may be used on map-entering constructs and the value
> release may be used on map-exiting constructs with identical meaning to
> the value storage.
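The quoted rule can be sketched as a normalization step in C++. The enum and function are hypothetical and do not reflect flang's semantic checks. Other combinations are returned unchanged and are not diagnosed here.

```cpp
#include <cassert>

enum class MapType { Alloc, Release, Storage, To, From };

// `alloc` on a map-entering construct and `release` on a map-exiting
// construct have the same meaning as `storage` (OpenMP 6.0).
MapType normalizeMapType(MapType t, bool mapEntering) {
  if (t == MapType::Alloc && mapEntering)
    return MapType::Storage;
  if (t == MapType::Release && !mapEntering)
    return MapType::Storage;
  return t;
}
```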
(cherry picked from commit eb7adafc68dcf5a86ce916d93df6a1e34fbb9688)
[mlir][Interfaces] Fix use-after-free after #176641 (#177536)
Fix a use-after-free after #176641: the successor operands
`OperandRange` is no longer valid after the terminator has been erased.
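One common way to avoid this class of bug is to copy the operands into owned storage before erasing the operation that owns them. This is an assumption; the patch's actual fix may differ. The toy C++ model below uses hypothetical names, with a non-owning view loosely analogous to an `OperandRange`.

```cpp
#include <cassert>
#include <vector>

// Toy operation that owns its operands.
struct Op {
  std::vector<int> operands;
};

// Non-owning view into an Op's operands; it dangles once the Op is freed.
struct OperandView {
  const int *begin;
  const int *end;
};

OperandView getSuccessorOperands(const Op &op) {
  return {op.operands.data(), op.operands.data() + op.operands.size()};
}

// Copy the viewed operands into owned storage *before* erasing the op,
// so nothing reads through the view after the op is destroyed.
std::vector<int> takeOperandsAndErase(Op *&op) {
  OperandView view = getSuccessorOperands(*op);
  std::vector<int> owned(view.begin, view.end);
  delete op;  // "erase" the terminator
  op = nullptr;
  return owned;
}
```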
bn regress: add test that double checks the RFC 2409 and 3526 primes
Also has code to check the RFC 7919 primes and to run DH_check() on them
once DH_check() knows about these primes.