[mlir][LLVM] Add support for `ptrtoaddr` (#185104)
The `ptrtoaddr` op is akin to `ptrtoint` with some important
differences:
* It does not capture the provenance of the pointer, meaning the pointer
does not escape and a subsequent `inttoptr` does not produce a legal pointer.
LLVM can then assume the pointer never escaped, which helps alias
analysis.
* It does not support arbitrary integer types, but only exactly the
integer type that is equal in width to the pointer type as specified by
the data layout.
This PR adds the op to the MLIR LLVM dialect and adds the corresponding
verification for the datalayout property.
[BOLT][AArch64] Support block reordering beyond 1KB for FEAT_CMPBR. (#185443)
Currently LongJmpPass::relaxLocalBranches bails out early if the estimated
size of a binary function is less than 32KB, on the assumption that the
shortest branches are 16 bits. With FEAT_CMPBR, the fixup value for the
cold branch target may therefore go out of range once the function is
larger than 1KB.
This patch decreases ShortestJumpSpan from 32KB to 1KB, since FEAT_CMPBR
branches are 11 bits.
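
The relation between branch width and span can be sketched as follows. This is only an illustration, assuming the branch offset is a signed byte offset of the stated width; the function names are hypothetical and are not taken from BOLT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reach in bytes, in either direction, of a signed offset that is
   `bits` wide (assumption: the offset counts bytes). */
int64_t jump_span_bytes(unsigned bits) {
  return (int64_t)1 << (bits - 1);
}

/* True if `offset` can be encoded in a signed field of `bits` bits. */
bool offset_in_range(int64_t offset, unsigned bits) {
  int64_t span = jump_span_bytes(bits);
  return offset >= -span && offset < span;
}
```

Under that assumption a 16-bit branch spans 32KB and an 11-bit branch spans 1KB, which matches the old and new values of ShortestJumpSpan.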
[libc] Fix hdrgen test test_small_proxy.h (#185890)
The expected output was outdated as it did not contain the macro
definitions.
This patch fixes the issue.
libclc: Improve fdim handling (#186085)
The maxnum is somewhat overconstraining. This gives slightly
better codegen and avoids the noise from the select and convert,
and saves the cost of materializing the nan literal.
libclc: Replace nextafter implementation (#186082)
Use a more straightforward version which allows
optimizations to delete the edge case checks, and also
codegens better. Implement in terms of new nextup and nextdown
helper functions, which are IEEE functions, and usable in other
functions.
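
A minimal sketch of the idea for 32-bit floats follows. It assumes IEEE binary32 encoding; the helper names are hypothetical and the code is not the libclc implementation.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit encoding and back. */
uint32_t float_bits(float x) { uint32_t u; memcpy(&u, &x, sizeof u); return u; }
float bits_float(uint32_t u) { float x; memcpy(&x, &u, sizeof x); return x; }

/* Least float greater than x. NaN and +inf are returned unchanged. */
float nextup_sketch(float x) {
  uint32_t u = float_bits(x);
  uint32_t mag = u & 0x7fffffffu;
  if (mag > 0x7f800000u) return x;            /* NaN */
  if (u == 0x7f800000u) return x;             /* +inf */
  if (mag == 0) return bits_float(1u);        /* +/-0 -> smallest subnormal */
  /* Positive values step up in encoding, negative values step down. */
  return bits_float((u & 0x80000000u) ? u - 1 : u + 1);
}

/* Greatest float less than x, by symmetry. */
float nextdown_sketch(float x) { return -nextup_sketch(-x); }

/* nextafter expressed through the two helpers. */
float nextafter_sketch(float x, float y) {
  if (x != x || y != y) return x + y;         /* propagate NaN */
  if (x == y) return y;
  return (x < y) ? nextup_sketch(x) : nextdown_sketch(x);
}
```

Because the edge cases live in the helpers, a caller that already knows the direction can use nextup_sketch or nextdown_sketch directly.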
libclc: Replace fmod implementation with elementwise builtin (#186083)
This corresponds to frem, which for whatever reason is a first
class IR instruction. The backend has a heroic freestanding
implementation that should be nearly identical to what was here.
[Clang][AArch64] Remove duplicate CodeGen test for bf16 get/set intrinsics
The following test files contain identical test bodies (aside from the
RUN lines):
* clang/test/CodeGen/AArch64/bf16-getset-intrinsics.c
* clang/test/CodeGen/arm-bf16-getset-intrinsics.c
The differences in the RUN lines do not appear to be relevant for the
tested functionality. This change keeps a single test file and
simplifies its RUN lines to match the generic style used in
clang/test/CodeGen/AArch64/neon.
This also moves toward unifying and reusing RUN lines across tests.
compiler-rt/arm: Check for overflow when adding float denorms (#185245)
When the sum of two subnormal values is not itself subnormal, we need to
set the exponent to one.
Test case:
#include <stdio.h>
/* x is subnormal, but x + x is large enough to be a normal number. */
static volatile float x = 0x1.362b4p-127;
static volatile float x2 = 0x1.362b4p-127 * 2;
int
main (void)
{
    printf("x %a x2 %a x + x %a\n", x, x2, x + x);
    return x2 == x + x ? 0 : 1;
}
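
Why the exponent becomes one can be seen from the encoding. The sketch below assumes IEEE binary32 and positive subnormal inputs; the function names are hypothetical, and this is an illustration rather than the compiler-rt routine.

```c
#include <stdint.h>
#include <string.h>

/* Biased exponent field (bits 23..30) of a float. */
unsigned exponent_field(float v) {
  uint32_t u;
  memcpy(&u, &v, sizeof u);
  return (u >> 23) & 0xffu;
}

/* Add two positive subnormal floats by adding their encodings.
   Assumes both inputs are positive with exponent field 0. */
float add_positive_subnormals(float a, float b) {
  uint32_t ua, ub, sum;
  float result;
  memcpy(&ua, &a, sizeof ua);
  memcpy(&ub, &b, sizeof ub);
  /* Both mantissas are below 2^23, so the sum is below 2^24. A carry
     into bit 23 lands in the exponent field and makes it 1, which is
     the encoding of the smallest normal binade. */
  sum = ua + ub;
  memcpy(&result, &sum, sizeof result);
  return result;
}
```

With the value from the test case, the inputs have exponent field 0 and the sum has exponent field 1.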
Signed-off-by: Keith Packard <keithp at keithp.com>
Revert "[SDAG] (abs (add nsw a, -b)) -> (abds a, b)" (#175801) (#186068)
Reverts llvm/llvm-project#175801 while the miscompilation in #185467 is being investigated.
[AArch64][SVE] Add unpacked fp ISel patterns for clastb (#185688)
Add support for selecting clastb for unpacked float, half, and
bfloat vectors.
Fixes #185670.
[LV] Fix another invalidated iterator in handleFindLastReductions (#185712)
Just collect all the initial phis into a SmallVector first instead
of trying to avoid iterator invalidation in a changing VPlan.
Fixes #185682.
[AMDGPU] Use an X-macro to define ELF machine types and names. NFCI. (#185882)
This reduces the number of files that need to be touched when adding a
new CPU type.
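
As a generic illustration of the X-macro technique, the sketch below keeps a single list and expands it twice. All identifiers and values are hypothetical and do not correspond to the AMDGPU definitions.

```c
/* Single source of truth: one line per machine.
   Hypothetical entries, for illustration only. */
#define MACHINE_LIST(ENTRY)        \
  ENTRY(MACH_ALPHA, 0x01, "alpha") \
  ENTRY(MACH_BETA,  0x02, "beta")  \
  ENTRY(MACH_GAMMA, 0x03, "gamma")

/* First expansion: the enumerators. */
enum machine {
#define AS_ENUM(id, value, name) id = value,
  MACHINE_LIST(AS_ENUM)
#undef AS_ENUM
};

/* Second expansion: the value-to-name mapping. */
const char *machine_name(unsigned value) {
  switch (value) {
#define AS_CASE(id, val, name) case id: return name;
    MACHINE_LIST(AS_CASE)
#undef AS_CASE
  default:
    return "unknown";
  }
}
```

Adding a machine then means adding one line to MACHINE_LIST; the enum and the name table stay in sync automatically.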
Revert "[lldb] Consolidating platform support checks in tests." (#186071)
Reverts llvm/llvm-project#184656
This PR broke linking on Windows and possibly elsewhere. There are at
least 2 possible fixes. Revert while we decide on a single solution.
[libc++] Fix checks for terminal and flushes in std::print() (#70321)
The check whether a stream is associated with a terminal and the
flushing of the stream in `std::print()` are needed only on Windows.
Additionally, the correct flush should be used. When `std::print` is
called with a C stream, `std::fflush()` should be used. When it is
called with a C++ `ostream`, `ostream::flush()` should be called.
Because POSIX does not have a separate Unicode API for terminal output,
checking for a terminal (`isatty`) and flushing are not needed at all.
Moreover, `isatty` has a noticeable performance cost.
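
The platform split for the C stream case can be sketched in plain C as follows. This is a simplified illustration, not the libc++ code: `print_sketch` is a hypothetical name, and the Windows branch only marks where a Unicode console path would go.

```c
#include <stdio.h>
#ifdef _WIN32
#include <io.h>
#endif

/* Write `text` to `stream`. Returns 0 on success, -1 on failure.
   Hypothetical sketch of the Windows-only terminal check. */
int print_sketch(FILE *stream, const char *text) {
#ifdef _WIN32
  /* Only Windows has a separate Unicode console path, so only here
     do the terminal check and the flush matter. */
  if (_isatty(_fileno(stream))) {
    if (fflush(stream) != 0)
      return -1;
    /* A real implementation would switch to the console API here. */
  }
#endif
  /* Elsewhere: no isatty call and no flush, just the write. */
  return fputs(text, stream) == EOF ? -1 : 0;
}
```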
See also https://wg21.link/LWG4044.
Fixes #70142