[NFC][AMDGPU] Add test showing caller/callee SGPR mismatch for inreg args
Add a test demonstrating a bug where the caller and callee disagree on which
SGPRs hold user inreg arguments when there are enough to reach the SGPR0-3
range.
On the callee side, `LowerFormalArguments` marks SGPR0-3 as allocated in
`CCState` before the CC analysis runs. On the caller side, `LowerCall` adds the
scratch resource to `RegsToPass` without marking SGPR0-3 in `CCState`. This
causes `CC_AMDGPU_Func` to assign user inreg args to SGPR0-3 on the caller side
(they appear free) while the callee skips them.
In the test, the caller writes arg 0 (value 42) to s0, but the callee reads arg
0 from s16.
[clang][test] Try to fix Sema/format-strings.c on i686 (#181800)
https://github.com/llvm/llvm-project/pull/180566 did this for 32-bit Arm,
but the test still breaks for us downstream on i686 with:
```
# .---command stderr------------
# | error: 'expected-warning' diagnostics expected but not seen:
# | File /builddir/build/BUILD/llvm-project/clang/test/Sema/format-strings.c Line 990: format specifies type 'size_t' (aka '{{.+}}') but the argument has type '_Bool'
# | File /builddir/build/BUILD/llvm-project/clang/test/Sema/format-strings.c Line 991: format specifies type 'ptrdiff_t' (aka '{{.+}}') but the argument has type '_Bool'
# | 2 errors generated.
# `-----------------------------
```
games/stockfish: Update 17.1 => 18
Summary of changes:
+ Improved quality of chess play, Elo gain of up to 46 points.
+ Next generation evaluation introducing the SFNNv10 network
architecture.
+ Hardware and Performance Optimizations.
+ Search Improvements.
Changelog:
https://github.com/official-stockfish/Stockfish/releases/tag/sf_18
PR: 292927
[LV] Only create partial reductions when profitable.
We want the LV cost model to make the best possible choice of VF
and of whether to use partial reductions. At the moment, when the
LV can use partial reductions for a given VF range, it assumes they
are always preferred. Only after transforming the plan to use
partial reductions does it choose the most profitable VF, so a
different VF that would have been more profitable without partial
reductions can be missed.
This PR changes that, to first decide whether partial reductions
are more profitable for a given chain. If not, then it won't do
the transform.
NAS-139934 / 26.0.0-BETA.1 / vacuum db before presenting for download (#18269)
There have been cases in the past in which users were unable to
upload the DB through the web UI due to file size limits. For various
reasons, the DB had grown to tens of MiB in size. Users were able to
work around this by vacuuming. Since we're already preparing a copy of
the DB for download, we should create it as a vacuumed copy.
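As a rough illustration of the idea (not the actual middleware change), the sketch below assumes the configuration database is an SQLite file and uses `VACUUM INTO`, which needs SQLite 3.27 or newer. The function name, file names, and demo table are hypothetical.

```python
import os
import sqlite3
import tempfile


def make_vacuumed_copy(src_path, dst_path):
    """Write a compacted copy of the SQLite file at src_path to dst_path.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # VACUUM cannot run inside a transaction, so open in autocommit mode.
    conn = sqlite3.connect(src_path, isolation_level=None)
    try:
        # VACUUM INTO leaves the source file untouched; the target must
        # not already exist (or must be an empty file).
        conn.execute("VACUUM INTO ?", (dst_path,))
    finally:
        conn.close()


def demo():
    """Build a bloated database, copy it, and return (source, copy) sizes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "config.db")            # hypothetical name
        dst = os.path.join(tmp, "config-download.db")   # hypothetical name

        conn = sqlite3.connect(src)
        conn.execute("CREATE TABLE t (payload BLOB)")
        conn.executemany(
            "INSERT INTO t VALUES (?)",
            [(b"x" * 1000,) for _ in range(2000)],
        )
        conn.commit()
        # Deleting rows frees pages but does not shrink the file.
        conn.execute("DELETE FROM t")
        conn.commit()
        conn.close()

        make_vacuumed_copy(src, dst)
        return os.path.getsize(src), os.path.getsize(dst)
```

Calling `demo()` returns the size of the bloated source and of the vacuumed copy; the copy should be the smaller of the two because the freed pages are not carried over.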
[LV] NFCI: Move extend optimization to transformToPartialReduction.
The reason for doing this in `transformToPartialReduction` is so that
we can create the VPExpressions directly when transforming reductions
into partial reductions (to be done in a follow-up PR).
I also intend to see if we can merge the in-loop reductions with
partial reductions, so that there will be no need for the separate
`convertToAbstractRecipes` VPlan Transform pass.
[MLIR][OpenMP] Unify device shared memory logic
This patch creates a utils library for the OpenMP dialect with functions
used by MLIR to LLVM IR translation as well as the stack-to-shared pass
to determine which allocations must use local stack memory or device
shared memory.
[MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks
This patch refines checks to decide whether to use device shared memory or
regular stack allocations. In particular, it adds support for parallel regions
residing on standalone target device functions.
The changes are:
- Shared memory is introduced for `omp.target` implicit allocations, such as
those related to privatization and mapping, as long as they are shared across
threads in a nested parallel region.
- Standalone target device functions are interpreted as being part of a Generic
kernel, since the fact that they are present in the module after filtering
means they must be reachable from a target region.
- Prevent allocations whose only shared uses inside of an `omp.parallel` region
are as part of a `private` clause from being moved to device shared memory.
[NFC][AArch64] Split fptoi tests and add scal_to_vec convert tests (#179315)
This patch splits the simd-fptoi tests into strictfp and nonstrictfp files
for simplicity and adds tests that check the correct insertion of
bitcasts for a scalar_to_vector variant that will be introduced in
#172837.
java/openjdk21-25: Bootstrap from prebuilt packages
Completes the transition to using prebuilt packages to bootstrap OpenJDK
ports.
PR: 289731
Reviewed by: jrm, fuz (mentor)
Approved by: fuz (mentor)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D54731
[libc++] Simplify and optimize the run-benchmarks script (#181382)
Instead of configuring and running the benchmark suite once for SPEC and
once for the microbenchmarks, run it only once for everything. This
saves a configuration of the test suite (which includes building Google
Benchmark).
To replicate the functionality we had with --disable-microbenchmarks
(whose goal was mostly to run only SPEC), introduce a --filter argument
that can be used to select exactly which benchmarks are run. This is
simpler and more powerful.
Making this work requires hardcoding the only C++ standard that works
for SPEC (C++17) inside spec.gen.py instead of expecting it to be set
correctly when running the test suite.
[libc++] Fix `gps_time` formatting and related tests (#181560)
- The Standard wording in https://eel.is/c++draft/time.format#13 is similar
to TAI formatting in that it's equivalent to formatting a `sys_time`
with a fixed offset. Leap seconds should not be considered.
- Tests need to be adjusted by adding the number of leap seconds between
the GPS epoch and the tested date, which is 15s for 2010 and 18s for
2019.
- The TAI and GPS tests using `meow_time<cr::duration<long, ...>>`
should use `long long` because the offsets will overflow a 32-bit
signed integer.
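The arithmetic behind the last two points can be checked with a small sketch. It is not taken from the libc++ tests: the helper names are hypothetical, the leap-second table is written out by hand, and the millisecond period in the overflow check is an assumption (the commit message elides the period as `...`).

```python
from datetime import date

# Dates on which a newly inserted leap second took effect, restricted to
# those after the GPS epoch (1980-01-06).
LEAP_SECOND_DATES = [
    date(1981, 7, 1), date(1982, 7, 1), date(1983, 7, 1), date(1985, 7, 1),
    date(1988, 1, 1), date(1990, 1, 1), date(1991, 1, 1), date(1992, 7, 1),
    date(1993, 7, 1), date(1994, 7, 1), date(1996, 1, 1), date(1997, 7, 1),
    date(1999, 1, 1), date(2006, 1, 1), date(2009, 1, 1), date(2012, 7, 1),
    date(2015, 7, 1), date(2017, 1, 1),
]

UNIX_EPOCH = date(1970, 1, 1)
GPS_EPOCH = date(1980, 1, 6)
INT32_MAX = 2**31 - 1


def leap_seconds_since_gps_epoch(day):
    """Count the leap seconds in effect on `day` since the GPS epoch."""
    return sum(1 for d in LEAP_SECOND_DATES if d <= day)


def gps_epoch_offset(ticks_per_second):
    """Distance between the Unix and GPS epochs in the given tick unit."""
    return (GPS_EPOCH - UNIX_EPOCH).days * 86400 * ticks_per_second


if __name__ == "__main__":
    print(leap_seconds_since_gps_epoch(date(2010, 1, 1)))  # 15
    print(leap_seconds_since_gps_epoch(date(2019, 1, 1)))  # 18
    # The offset fits in 32 bits as whole seconds ...
    print(gps_epoch_offset(1) <= INT32_MAX)                # True
    # ... but not once the tick is finer (milliseconds assumed here).
    print(gps_epoch_offset(1000) > INT32_MAX)              # True
```

The epoch offset is 315964800 seconds, which fits in a 32-bit signed integer, so the overflow only shows up for sub-second periods on platforms where `long` is 32 bits wide.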