[compiler-rt][ARM] Optimized single-precision FP add/sub
This replaces the previous Thumb1-specific addsf3 with both Thumb1 and
Arm/Thumb2 add/sub.
I've removed the old Thumb1 addsf3 completely, partly because this
implementation is expected to be faster, and partly because the new
tests exposed a bug in the old implementation. However, the new
implementation does take up more code space, so putting the old
implementation back as an alternative, with the bug fixed, might be a
useful option.
[compiler-rt][ARM] Optimized FP -> integer conversions
This commit adds a total of 8 new functions, all converting a
floating-point number to an integer, varying in 3 independent choices:
* input float format (32-bit or 64-bit)
* output integer size (32-bit or 64-bit)
* output integer type (signed or unsigned)
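The three two-way choices above give 2 x 2 x 2 = 8 functions. As a rough illustration of one of them (32-bit float to signed 32-bit integer), here is a minimal C sketch of a truncating conversion working on the raw bits. The name `sketch_f32_to_i32` is hypothetical, and the saturating treatment of out-of-range inputs and NaNs is an assumption of this sketch, not something taken from the assembly in this commit.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration: truncate a float toward zero into an int32_t
 * using only integer operations on the IEEE 754 bit pattern. */
static int32_t sketch_f32_to_i32(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* reinterpret without aliasing issues */

    int negative = (int)(bits >> 31);
    int exponent = (int)((bits >> 23) & 0xFF) - 127;        /* unbiased exponent */
    uint32_t mantissa = (bits & 0x007FFFFFu) | 0x00800000u; /* add implicit 1 */

    if (exponent < 0)
        return 0;                              /* |f| < 1, zeros and denormals */
    if (exponent >= 31)                        /* too large, infinity or NaN: */
        return negative ? INT32_MIN : INT32_MAX; /* saturate (assumption) */

    /* The mantissa has 23 fraction bits; shift so the binary point is at bit 0. */
    uint32_t magnitude = exponent >= 23 ? mantissa << (exponent - 23)
                                        : mantissa >> (23 - exponent);
    return negative ? -(int32_t)magnitude : (int32_t)magnitude;
}
```

The unsigned and 64-bit variants would differ mainly in the range check and the width of the shifted value.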
[compiler-rt][ARM] Optimized single-precision FP comparisons
These comparison functions follow the same structure as the
double-precision ones in a prior commit: a header file contains the
main logic, and several entry points vary the construction of the
return value.
In this case, we have provided versions for Thumb1 as well as
Arm/Thumb2.
[compiler-rt][ARM] Optimized integer -> FP conversions
This commit adds a total of 8 new functions, all converting an integer
to a floating-point number, varying in 3 independent choices:
* input integer size (32-bit or 64-bit)
* input integer type (signed or unsigned)
* output float format (32-bit or 64-bit)
The two conversions of 64-bit integer to 32-bit float live in the same
source file, to save code size, since that conversion is one of the
more complicated ones and the two functions can share most of their
code, with only a few instructions differing at the start to handle
negative numbers (or not).
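The code sharing described above can be sketched in C as one core routine that converts an unsigned magnitude plus a sign bit, with the signed and unsigned entry points differing only in a short prologue. All names here are hypothetical, and round-to-nearest-even is assumed as the rounding mode; this is an illustration of the idea, not a transcription of the assembly.

```c
#include <stdint.h>
#include <string.h>

/* Shared core: convert a 64-bit magnitude to float bits, OR-ing in `sign`. */
static uint32_t sketch_mag_to_f32_bits(uint64_t mag, uint32_t sign) {
    if (mag == 0)
        return sign;
    int top = 63;
    while (!(mag >> top))
        top--;                                  /* index of the highest set bit */

    uint32_t mantissa;                          /* 24 bits, implicit 1 at bit 23 */
    if (top <= 23) {
        mantissa = (uint32_t)(mag << (23 - top)); /* exact, no rounding needed */
    } else {
        int shift = top - 23;
        mantissa = (uint32_t)(mag >> shift);
        uint64_t rem = mag & ((UINT64_C(1) << shift) - 1);
        uint64_t half = UINT64_C(1) << (shift - 1);
        if (rem > half || (rem == half && (mantissa & 1)))
            mantissa++;                         /* round to nearest, ties to even */
    }
    /* Adding the mantissa (with its bit 23 set) to exponent-1 lets a rounding
     * carry propagate into the exponent field naturally. */
    uint32_t exponent = (uint32_t)(top + 127);
    return sign | (((exponent - 1) << 23) + mantissa);
}

static float sketch_bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Unsigned entry point: no prologue needed. */
static float sketch_u64_to_f32(uint64_t x) {
    return sketch_bits_to_float(sketch_mag_to_f32_bits(x, 0));
}

/* Signed entry point: extract the sign and negate, then share the core. */
static float sketch_i64_to_f32(int64_t x) {
    uint32_t sign = x < 0 ? 0x80000000u : 0;
    uint64_t mag = x < 0 ? (uint64_t)0 - (uint64_t)x : (uint64_t)x;
    return sketch_bits_to_float(sketch_mag_to_f32_bits(mag, sign));
}
```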
[compiler-rt][ARM] Optimized FP double <-> single conversion
This commit provides assembly versions of the conversions both ways
between double and float.
[compiler-rt][ARM] cmake properties for complicated builtin sources
In the builtins library, most functions have a portable C
implementation (e.g. `mulsf3.c`), and platforms might provide an
optimized assembler implementation (e.g. `arm/mulsf3.S`). The cmake
script automatically excludes the C source file corresponding to each
assembly source file it includes. Additionally, each source file name
is automatically translated into a flag that lit tests can query, with
a name like `librt_has_mulsf3`, to indicate that a function is
available to be tested.
In future commits I plan to introduce cases where a single .S file
provides more than one function (so that they can share code easily)
and must therefore supersede more than one existing source file.
I've introduced the `crt_supersedes` cmake property, which you can set
on a .S file to name a list of .c files that it should supersede.
Also, the `crt_provides` property can be set on any source file to
indicate a list of functions it makes available for testing, in
addition to the one implied by its name.
[compiler-rt][ARM] Optimized double-precision FP comparisons
The structure of these comparison functions consists of a header file
containing the main code, and several `.S` files that include that
header with different macro definitions, so that they can use the same
procedure to determine the logical comparison result and then just
translate it into a return value in different ways.
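A C analogue of this structure might look like the sketch below: one shared classification routine, plus a macro that stamps out entry points mapping the logical result to different return values. In the real code the sharing is done by including a header from several `.S` files; here everything is collapsed into one file for illustration. The names and the specific return values are assumptions of this sketch.

```c
#include <math.h>

/* Shared logic: decide the logical relationship once. */
enum sketch_order { SK_LT, SK_EQ, SK_GT, SK_UNORDERED };

static enum sketch_order sketch_classify(double a, double b) {
    if (isnan(a) || isnan(b))
        return SK_UNORDERED;
    if (a < b)
        return SK_LT;
    if (a > b)
        return SK_GT;
    return SK_EQ;
}

/* Each entry point differs only in how it encodes the result. */
#define SKETCH_DEFINE_COMPARE(name, lt, eq, gt, unordered) \
    static int name(double a, double b) {                  \
        switch (sketch_classify(a, b)) {                   \
        case SK_LT: return (lt);                           \
        case SK_EQ: return (eq);                           \
        case SK_GT: return (gt);                           \
        default:    return (unordered);                    \
        }                                                  \
    }

/* "a <= b" style: unordered must not look like <= 0. */
SKETCH_DEFINE_COMPARE(sketch_cmp_le, -1, 0, 1, 1)
/* "a >= b" style: unordered must not look like >= 0. */
SKETCH_DEFINE_COMPARE(sketch_cmp_ge, -1, 0, 1, -1)
/* Unordered test: nonzero only when either input is a NaN. */
SKETCH_DEFINE_COMPARE(sketch_cmp_unord, 0, 0, 0, 1)
```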
[compiler-rt][ARM] Optimized double-precision FP mul/div
Optimized AArch32 implementations of `muldf3` and `divdf3` are
provided. The division function is particularly tricky because its
Newton-Raphson approximation strategy requires a rigorous error bound.
In this version of the commit I've left out the full supporting
machinery that validates the error bound via Gappa and Rocq, but full
details are provided via links to the upstream version of this code in
the Arm Optimized Routines repository, and to a pair of Arm Community
blog posts.
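For readers unfamiliar with the technique, the following C sketch shows the textbook Newton-Raphson reciprocal iteration, x' = x * (2 - d * x), whose relative error roughly squares at each step. It uses host double arithmetic and a standard linear initial estimate, both of which are assumptions made for illustration; the assembly in this commit works in integer arithmetic with its own constants and a verified error bound, none of which is reproduced here.

```c
/* Hypothetical illustration: approximate 1/d for d in [0.5, 1). */
static double sketch_reciprocal(double d, int iterations) {
    /* Textbook linear estimate; relative error at most 1/17 on [0.5, 1). */
    double x = 48.0 / 17.0 - (32.0 / 17.0) * d;
    for (int i = 0; i < iterations; i++) {
        /* If x = (1 - e) / d, then the new x is (1 - e*e) / d. */
        x = x * (2.0 - d * x);
    }
    return x;
}
```

Because the error squares each time, a small fixed number of iterations suffices, which is why the quality of the error bound on each step matters so much for a correctly rounded division.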
[compiler-rt][ARM] Optimized double-precision FP add/sub
The one new assembly source file, `arm/adddf3.S`, implements both
addition and subtraction via cross-branching after flipping signs,
since both operations need substantially the same logic. The
new cmake properties introduced in a prior commit are used to arrange
that including `adddf3.S` supersedes the C versions of both addition
and subtraction, and also informs the test suite that both functions
are available to test.
[compiler-rt][ARM] Double-precision FP support functions
This commit adds C helper functions `dnan2`, `dnorm2` and `dunder` for
handling the less critical edge cases of double-precision arithmetic,
similar to `fnan2`, `fnorm2` and `funder` that were added in commit
f7e652127772e93.
It also adds a header file that defines some register aliases for
handling double-precision numbers in AArch32 software floating point
in an endianness-independent way, by providing aliases `xh` and `xl`
for the high and low words of the first double-precision function
argument, regardless of which of them is in r0 and which in r1, and
similarly `yh` and `yl` for the second argument in r2/r3.
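The same idea can be shown in C: split a double into its two 32-bit words in memory order (which is how the pair r0/r1 is filled under the soft-float calling convention), then pick the high and low words according to endianness. The struct and function names are hypothetical, and the `__BYTE_ORDER__` macros are GCC/Clang predefines, so this is a sketch under those assumptions rather than the header from the commit.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical C counterpart of the xh/xl aliases. */
struct sketch_words {
    uint32_t xh;  /* high word: sign, exponent, top 20 mantissa bits */
    uint32_t xl;  /* low word: bottom 32 mantissa bits */
};

static struct sketch_words sketch_split(double x) {
    uint32_t w[2];                 /* w[0] plays the role of r0, w[1] of r1 */
    memcpy(w, &x, sizeof w);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return (struct sketch_words){ w[0], w[1] };  /* big-endian: high word first */
#else
    return (struct sketch_words){ w[1], w[0] };  /* little-endian: low word first */
#endif
}
```

Code written against `xh` and `xl` then never needs to mention which physical word comes first.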
[compiler-rt][ARM] Enable strict mode in divsf3/mulsf3 tests
Commit 5efce7392f3f6cc added optimized AArch32 assembly versions of
mulsf3 and divsf3, with more thorough tests. The new tests included
test cases specific to Arm's particular NaN handling rules, which are
disabled on most platforms, but were intended to be enabled for Arm.
Unfortunately, they were not enabled under any circumstances, because
I made a mistake in `test/builtins/CMakeLists.txt`: the command-line
`-D` option that should have enabled them was added to the cflags list
too early, before the list was reinitialized from scratch. So it never
ended up on the command line.
Also, the test file mulsf3.S only even _tried_ to enable strict mode
in Thumb1, even though the Arm/Thumb2 implementation would also have
met its requirements.
Because the strict-mode tests weren't enabled, I didn't notice that
they would also have failed absolutely everything, because they
[9 lines not shown]
touch: Fix setting time of created file if fstat() fails
Previously, if the file was created but fstat() then failed, we would
have ended up calling utimensat() on that file anyway, with whatever
happened to be in sb. Not that this error is likely to happen...
We don't check the return value of close() because we aren't writing
anything to the file, and the file is always created when open()
succeeds.
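The control flow described above can be sketched as follows. This is a hypothetical helper written for illustration, not the code from touch(1): it only shows that a failed fstat() is reported to the caller, so that the uninitialized `sb` is never used to set timestamps, and that close() is called without checking its result.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create `path` if needed and fill in *sb.
 * Returns 0 on success, -1 if open() or fstat() fails. */
static int sketch_create_and_stat(const char *path, struct stat *sb) {
    int fd = open(path, O_WRONLY | O_CREAT, 0666);
    if (fd == -1)
        return -1;

    if (fstat(fd, sb) == -1) {
        (void)close(fd);   /* don't leak the descriptor */
        return -1;         /* caller must not go on to use *sb */
    }

    /* Nothing was written, so a close() failure loses no data. */
    (void)close(fd);
    return 0;
}
```

A caller would go on to call utimensat() only when this helper returns 0.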
Reviewed by: kevans
Approved by: kevans
Fixes: cb54c500d0e1 ("touch: don't leak descriptor if fstat(2) fails")
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D55117
MFC after: 1 week
graphics/processing: pin to openjdk8
Dependency comms/rxtx is tied to openjdk8 and installs jars in
PREFIX/openjdk8/jre/lib/ext. This directory is gone in jdk9+, but
processing expects it.
So pin processing to openjdk8 also.
If anybody wants this to be supported by jdk9+, patches are welcome.
PR: 292652
Approved by: maintainer timeout
Reapply "AMDGPU: Use real copysign in fast pow (#97152)"
This reverts commit bff619f91015a633df659d7f60f842d5c49351df.
This was reverted due to regressions caused by poor copysign
optimization, which have been fixed.