[libc++][NFC] Clang-format <vector> and remove unused __self alias (#177021)
These changes were extracted out of the size-based vector patch.
Co-authored-by: Christopher Di Bella <cjdb at google.com>
[mlir][tosa] Add a canonicalization to optimize cast cast sequences (#176904)
This commit introduces a new canonicalization over a sequence of cast
operations. cast->cast sequences can be simplified to a single cast when
no narrowing is performed in between. This optimization is limited to
integer types, since floating point casts may impact numerical
behaviour.
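As a rough illustration of why an intermediate narrowing blocks the fold, the
sketch below models integer casts as signed two's-complement wraparound. That
model and the helper names `cast` and `can_fold` are assumptions made for
this example only; they are not taken from the TOSA dialect or from the patch.

```python
def cast(value: int, bits: int) -> int:
    """Wrap value to a signed two's-complement integer of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    # Reinterpret the top bit as the sign bit.
    return value - (1 << bits) if value >> (bits - 1) else value


def can_fold(src_bits: int, mid_bits: int) -> bool:
    """Hypothetical check: the intermediate type must not narrow the source."""
    return mid_bits >= src_bits


# i8 -> i32 -> i16: nothing is lost in between, so one cast gives the same result.
assert all(cast(cast(x, 32), 16) == cast(x, 16) for x in range(-128, 128))

# i16 -> i8 -> i32: the i8 step drops bits, so folding would change the value.
assert not can_fold(16, 8)
assert cast(cast(300, 8), 32) != cast(300, 32)
```

Under this model the fold is safe exactly when the middle type is at least as
wide as the source type, which matches the "no narrowing" condition above.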
opnsense/hostwatch: using NAME_user invokes script magic
that hostwatch doesn't like. Rename the vars so that doesn't happen,
somewhat emulating what was going on before.
CONTRIBUTING.md: Tweaks for clarity
Add a few tweaks to clarify the author and signed-off-by lines. Add
clarifying note about the style checker. Refine the AI statements
for clarity, but these will need to be revised once the AI policy
has been completed.
Sponsored by: Netflix
[OpenACC][MLIR] clone private operands during ACCIfClauseLowering (#177458)
Clone the private operands into the compute region side. This also fixes
an issue where references to acc.private remain on the host side.
mt76: update Mediatek's mt76 driver
This version is based on
git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
24d479d26b25bce5faea3ddd9fa8f3a6c3129ea7 (tag: v6.19-rc6).
Notable change: license got switched from ISC to BSD-3-Clause-Clear.
util.h is now imported from upstream given it is no longer GPL-only.
See upstream commits 909675fd4344f73aad5f75f123bd271ada2ab9fb
and a96fed2825d8dfb068bf640419c619b5f2df4218.
For us the new version should also help with page pools and DMA32.
Sponsored by: The FreeBSD Foundation
[AMDGPU] Remove redundant s_cmp_* after add X, 1 (#176962)
Convert:
```
s_add_u32 X, Y, 1
s_cmp_lg_i32 X, 0
```
to:
```
s_add_u32 X, Y, 1
<invert scc uses>
```
Also delete the compare when it is s_cmp_eq_i32 X, 0; in that case
inverting the scc uses is not necessary.
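
The reasoning behind the two cases can be checked with a small model. The
sketch below assumes that s_add_u32 sets scc to the unsigned carry-out of the
32-bit add; the Python helper names are hypothetical and only mirror the
instructions for illustration.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def s_add_u32_plus_one(y: int) -> Tuple[int, bool]:
    """Model X = Y + 1 on 32 bits; the bool is scc (assumed: unsigned carry-out)."""
    total = (y & MASK32) + 1
    return total & MASK32, total > MASK32


def scc_cmp_lg_zero(x: int) -> bool:
    """Model the scc produced by s_cmp_lg_i32 X, 0."""
    return x != 0


def scc_cmp_eq_zero(x: int) -> bool:
    """Model the scc produced by s_cmp_eq_i32 X, 0."""
    return x == 0


for y in (0, 1, 0x7FFFFFFF, 0xFFFFFFFE, 0xFFFFFFFF):
    x, scc = s_add_u32_plus_one(y)
    # The add carries exactly when X wraps to 0, so the eq compare is redundant as-is.
    assert scc == scc_cmp_eq_zero(x)
    # The lg compare yields the opposite bit, hence the inverted scc uses.
    assert scc == (not scc_cmp_lg_zero(x))
```

In this model the carry-out of Y + 1 is set only when X is 0, so it already
equals the result of the eq compare and is the inverse of the lg compare.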
---------
Signed-off-by: John Lu <John.Lu at amd.com>
NAS-139428 / 26.04 / Rollback `defer_build=True` (#18087)
Using `defer_build=True` can lower memory usage before
`core.get_methods`, but since midcli calls that during startup, the
steady-state memory for actual deployments doesn’t improve (or improves
only briefly).
The tradeoff is that we keep the added complexity and break mail/cloud
sync on middleware restarts unless `core.get_methods` is called.
[LangRef] Clarify specification for float min/max operations (#172012)
This implements some clarifications for the specification of floating
point min/max operations based on the discussion in
https://discourse.llvm.org/t/rfc-a-consistent-set-of-semantics-for-the-floating-point-minimum-and-maximum-operations/89006.
The key changes are:
* Explicitly specify minnum and maxnum with an sNaN operand as
non-deterministically either returning NaN or treating sNaN as qNaN.
This was implied by our general NaN semantics, but is important to call
out here due to the special behavior of sNaN.
* Explicitly specify the same non-determinism for the minnum/maxnum
based vector reductions as well.
* Explicitly specify the meaning of nsz on float min/max ops. In
particular, clarify that unlike normal nsz semantics, it does not allow
introducing a zero with a different sign out of thin air.
* Simplify the semantics comparison section. This now focuses only on
NaN and signed zero behavior, but omits information about exceptions
that is not relevant for these non-constrained intrinsics.
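
To illustrate the nsz point only, the sketch below lists the results a min
operation could return for two non-NaN operands when zero signs may be
ignored. It is a reading of the bullet above, not text from LangRef, and the
helper `minnum_nsz_allowed` is hypothetical.

```python
import math
from typing import List


def minnum_nsz_allowed(a: float, b: float) -> List[float]:
    """Hypothetical helper: permitted results for min(a, b) under nsz, non-NaN inputs."""
    if a < b:
        return [a]
    if b < a:
        return [b]
    # Operands compare equal (this includes +0.0 vs -0.0): either operand
    # may be returned, but no value that is not one of the operands.
    return [a, b]


def signs(values: List[float]) -> List[float]:
    """Return the sorted signs (+1.0 or -1.0) of the given values."""
    return sorted(math.copysign(1.0, v) for v in values)


# Zeros of different sign: either one is acceptable.
assert signs(minnum_nsz_allowed(0.0, -0.0)) == [-1.0, 1.0]
# Two positive zeros: a negative zero may not appear out of thin air.
assert signs(minnum_nsz_allowed(0.0, 0.0)) == [1.0, 1.0]
```

The second assertion is the case the clarification is about: the sign of a
zero result has to come from one of the operands.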
[MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks
This patch refines checks to decide whether to use device shared memory or
regular stack allocations. In particular, it adds support for parallel regions
residing on standalone target device functions.
The changes are:
- Shared memory is introduced for `omp.target` implicit allocations, such as
those related to privatization and mapping, as long as they are shared across
threads in a nested parallel region.
- Standalone target device functions are interpreted as being part of a Generic
kernel, since the fact that they are present in the module after filtering
means they must be reachable from a target region.
- Prevent allocations whose only shared uses inside of an `omp.parallel` region
are as part of a `private` clause from being moved to device shared memory.
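
The last rule can be summarized as a predicate over the uses of an
allocation. The following is a minimal sketch under assumed names (`Use`,
`needs_device_shared_memory`); it does not reflect the actual OMPIRBuilder
or MLIR data structures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Use:
    """Hypothetical record of one use of an allocation."""
    in_parallel_region: bool
    via_private_clause: bool = False


def needs_device_shared_memory(uses: List[Use]) -> bool:
    """True when some use inside a parallel region is not just a `private` clause."""
    shared_uses = [u for u in uses if u.in_parallel_region]
    return any(not u.via_private_clause for u in shared_uses)


# Only referenced from a `private` clause inside the parallel region: stays on the stack.
assert not needs_device_shared_memory([Use(True, via_private_clause=True)])
# Also read directly by threads in the parallel region: moved to shared memory.
assert needs_device_shared_memory([Use(True, via_private_clause=True), Use(True)])
# Never used inside a parallel region: regular stack allocation.
assert not needs_device_shared_memory([Use(False)])
```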