[libc++] "Always" include_next for non C++ path in stdatomic.h (#178463)
In https://github.com/llvm/llvm-project/pull/176903, `#include
<__configuration/compiler.h>` was moved into the
`#ifdef __cplusplus` clause, so `_LIBCPP_COMPILER_CLANG_BASED` is no
longer set for C compiles. This caused an internal regression: C
compiles that include stdatomic.h no longer get the
corresponding C header.
C++ stdlib headers "shouldn't" be on the search path for C compiles, but
we put them there, and so do lots of other people, so libc++ tends to
support that. The include_next for a C compile should be unconditional,
not conditional upon the compiler being Clang.
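For context, the affected code is ordinary C that uses C11 atomics. The translation unit below is a hypothetical example, not taken from the patch: when it is compiled as C with libc++'s include directory on the search path, `#include <stdatomic.h>` first finds libc++'s wrapper, which is expected to `include_next` the C `stdatomic.h` that follows it on the search path, so that `atomic_int` and `atomic_fetch_add` are defined. The function name `bump` is illustrative only.

```c
/* Hypothetical C translation unit affected by the regression. Build it as C
 * (e.g. with -std=c11) with libc++'s headers on the include path; libc++'s
 * <stdatomic.h> must then forward to the C header via include_next. */
#include <assert.h>
#include <stdatomic.h>

/* Atomically add 1 to *counter and return the value held before the add. */
int bump(atomic_int *counter) {
    int previous = atomic_fetch_add(counter, 1);
    assert(previous >= 0); /* this sketch only counts upward from zero */
    return previous;
}
```

If the wrapper skips the include_next in C mode, a file like this fails to compile because the C11 atomic types and generic functions are never declared.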
[AMDGPU] Emit b32 movs if (a)v_mov_b64_pseudo dest vgprs are misaligned (#160547)
#154115 exposed a v_mov_b64 whose destination could be misaligned.
This relaxes the v_mov_b64_pseudo register class constraint (which
matches av_mov_b64_pseudo's register class).
[SimplifyCFG] Increase iterative simplification convergence limit. (#178406)
https://github.com/llvm/llvm-project/commit/a9b0776a81e84d8042716863842fe1f8adf39cad
added an assertion to avoid infinite loops. However, the limit seems
arbitrary; there is no justification for it in either the code or
the commit message, so I think it can be increased.
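The pattern being tuned is an iterate-to-fixpoint loop guarded by an assertion. The sketch below is a generic illustration under stated assumptions: `MAX_ITERATIONS` is a hypothetical value (the real SimplifyCFG limit is not given here), and `countdown_step` is a toy step function standing in for one round of CFG simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* One round of simplification; returns true if it changed anything. */
typedef bool (*SimplifyStep)(void *state);

/* Hypothetical limit; it must exceed the rounds any legitimate input needs. */
enum { MAX_ITERATIONS = 100 };

/* Repeat `step` until it reports no change. The assertion only exists to
 * catch non-terminating rewrites, so a limit that is too low fires on valid
 * but slowly converging inputs. Returns the number of changing rounds. */
int simplify_to_fixpoint(SimplifyStep step, void *state) {
    int iterations = 0;
    while (step(state)) {
        ++iterations;
        assert(iterations < MAX_ITERATIONS && "simplification did not converge");
    }
    return iterations;
}

/* Toy step: decrements an int until it reaches zero. */
bool countdown_step(void *state) {
    int *n = state;
    if (*n == 0)
        return false;
    --*n;
    return true;
}
```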
[AMDGPU] Ensure all WMMA instructions are marked as convergent (#178314)
This is an extension of
https://github.com/llvm/llvm-project/pull/165602. It is needed to fix an
issue with V_WMMA_F32_16X16X16_F16_twoaddr_w32 being incorrectly sunk by
machine-sink.
All WMMA instructions in AMDGPUGenInstrInfo.inc were verified to be
marked as convergent.
---------
Signed-off-by: John Lu <John.Lu at amd.com>
AMDGPU: Add support for llvm.trap to handling of intrinsics with !nocallback (#175230)
This adds support for whitelisting trap intrinsics when handling
intrinsics that lack nocallback. This fixes the issues behind the
previous revert of #131759.
The attributor was exiting early whenever it saw intrinsics without the nocallback bit, so trap-only kernels lost all the inferred “no implicit arg” metadata and their amdgpu-agpr-alloc=0 guarantees. That conservative fallback broke certain workloads by forcing unnecessary implicit arguments and AGPR reservations. This patch allows the pass to recognize leaf-like trap intrinsics, so they no longer poison the analysis.
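The difference between the old early exit and the new whitelist can be modeled as a scan over a kernel's intrinsic calls. This is a hypothetical sketch, not the Attributor's code: `Callee`, `can_keep_inferences`, and the treatment of `llvm.trap` as the only whitelisted name are illustrative assumptions, and calls to non-intrinsics are ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical description of a callee seen while analyzing a kernel. */
typedef struct {
    const char *name;
    bool is_intrinsic;
    bool has_nocallback;
} Callee;

/* ASSUMPTION: llvm.trap is leaf-like, so it is accepted without nocallback. */
static bool is_whitelisted_intrinsic(const char *name) {
    return strcmp(name, "llvm.trap") == 0;
}

/* Returns true if optimistic inferences (e.g. "no implicit arg") can be kept;
 * false means the analysis must fall back to the conservative answer. */
bool can_keep_inferences(const Callee *callees, int n) {
    assert(n == 0 || callees != NULL);
    for (int i = 0; i < n; ++i) {
        const Callee *c = &callees[i];
        if (!c->is_intrinsic)
            continue; /* out of scope for this sketch */
        if (!c->has_nocallback && !is_whitelisted_intrinsic(c->name))
            return false; /* unknown callback behavior poisons the analysis */
    }
    return true;
}
```

Without the whitelist check, a trap-only kernel would hit the `return false` path and lose its inferred attributes, which is the regression described above.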
---------
Co-authored-by: Matt Arsenault <arsenm2 at gmail.com>
[CIR] 3 more 'quick' function attribute lowerings through LLVMIRDialect (#178443)
This patch lowers 3 more attributes, two of which are trivial, and one
which has a touch of a complication.
The two trivial ones are no_caller_saved_registers and nocallback, which
are language-level attributes that are effectively just passed on.
The final one is a touch more complicated, as it is a 'string'
attribute: modular-format. Also, its LLVM-IR name contains a dash,
which isn't possible in a name in the LLVM-IR MLIR Dialect
(see the comment inline). Its string value also carries some meaning
(LLVM checks it), but it is just passed to LLVM directly.
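One way to handle a name that cannot be spelled directly in the dialect is an explicit rename table applied at lowering time. The sketch below is hypothetical: the dialect-side spelling `modular_format` and the function `llvm_attr_name` are assumptions for illustration, not the CIR implementation. A table is used instead of a blanket `_` to `-` conversion because `no_caller_saved_registers` keeps its underscores in LLVM IR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dialect-to-LLVM renames; only names whose LLVM spelling
 * cannot be used in the dialect need an entry. */
static const struct {
    const char *dialect;
    const char *llvm;
} kRenames[] = {
    {"modular_format", "modular-format"},
};

/* Return the LLVM IR attribute name for a dialect attribute name. Names not
 * in the table are passed through unchanged; string values are not touched. */
const char *llvm_attr_name(const char *dialect_name) {
    assert(dialect_name != NULL);
    for (size_t i = 0; i < sizeof kRenames / sizeof kRenames[0]; ++i)
        if (strcmp(kRenames[i].dialect, dialect_name) == 0)
            return kRenames[i].llvm;
    return dialect_name;
}
```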
InstCombine: Improve single-use fneg(fabs(x)) SimplifyDemandedFPClass handling (#176360)
Match the multi-use case's logic for understanding no-nan/no-inf
context.
Also, only apply the nsz handling in the single-use case. alive2 seems to
treat nsz as nondeterministic for each use.
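The reasoning rests on the fact that `fabs` clears the sign bit and `fneg` then sets it, so every non-NaN result of `fneg(fabs(x))` is negative. The sketch below propagates a class mask through that pair under stated assumptions: the `FC_*` bits are a simplified, hypothetical stand-in loosely modeled on LLVM's `FPClassTest` (quiet and signaling NaN are merged), and nnan/ninf context is modeled simply as removing those classes from the result.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical class bits, loosely modeled on LLVM's FPClassTest. */
enum {
    FC_NAN = 1 << 0,
    FC_NEG_INF = 1 << 1, FC_NEG_NORMAL = 1 << 2,
    FC_NEG_SUBNORMAL = 1 << 3, FC_NEG_ZERO = 1 << 4,
    FC_POS_INF = 1 << 5, FC_POS_NORMAL = 1 << 6,
    FC_POS_SUBNORMAL = 1 << 7, FC_POS_ZERO = 1 << 8,
    FC_ALL = 0x1FF,
};

/* Classes fneg(fabs(x)) may produce, given the classes x may be in.
 * Each magnitude category maps to its negative counterpart. */
unsigned fneg_fabs_classes(unsigned x, bool no_nans, bool no_infs) {
    assert((x & ~(unsigned)FC_ALL) == 0);
    unsigned r = x & FC_NAN; /* NaN stays NaN */
    if (x & (FC_NEG_INF | FC_POS_INF)) r |= FC_NEG_INF;
    if (x & (FC_NEG_NORMAL | FC_POS_NORMAL)) r |= FC_NEG_NORMAL;
    if (x & (FC_NEG_SUBNORMAL | FC_POS_SUBNORMAL)) r |= FC_NEG_SUBNORMAL;
    if (x & (FC_NEG_ZERO | FC_POS_ZERO)) r |= FC_NEG_ZERO;
    if (no_nans) r &= ~(unsigned)FC_NAN;
    if (no_infs) r &= ~(unsigned)FC_NEG_INF;
    return r;
}
```

If the demanded classes of a use exclude every class in the returned mask, the result cannot matter to that use, which is the kind of fact SimplifyDemandedFPClass exploits.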
[Clang] Include clang standard lib header directory from Linux (#175593)
Summary:
LLVM-libc stores its headers in the target-specific include
directory. This PR makes the Linux toolchain add the standard library
directory to the include path when used. This allows LLVM-libc, and any
other standard language headers installed there, to work. This directory
is searched first.
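Because the new directory is searched first, a header it provides shadows any same-named header later in the search order. The model below illustrates first-match-wins lookup; the directory labels `target-specific` and `system` and the `IncludeDir` type are hypothetical placeholders, not the paths or data structures Clang actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical include directory and the headers it provides. */
typedef struct {
    const char *path;
    const char *const *headers; /* NULL-terminated list */
} IncludeDir;

static bool dir_has(const IncludeDir *d, const char *header) {
    for (const char *const *h = d->headers; *h != NULL; ++h)
        if (strcmp(*h, header) == 0)
            return true;
    return false;
}

/* Return the path of the first directory, in search order, that provides
 * `header`, or NULL if none does. */
const char *resolve_header(const IncludeDir *dirs, size_t n, const char *header) {
    assert(header != NULL);
    for (size_t i = 0; i < n; ++i)
        if (dir_has(&dirs[i], header))
            return dirs[i].path;
    return NULL;
}
```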
[VectorCombine] Fix typo in foldPermuteOfBinops cost calculation (#178072)
Addresses an issue in #173153. That patch expanded the supported ops for
folding binary ops through shuffles, but seemingly had a typo that
could inaccurately increase the unmodified cost.
[SPIRV] Properly discover LLVM tools that live next to the compiler (#178779)
Summary:
When we compile with `-emit-llvm`, it will try to use `llvm-link`. The
toolchain does not add the driver directory as a valid path, so
lookup falls back to the user's search path. Like other tools, this
should prioritize the binaries living next to the compiler.
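The preferred lookup order can be sketched as: try the tool in the compiler's own directory, and only otherwise hand back the bare name so the caller searches `PATH`. This is a hypothetical model rather than the Clang driver's C++ implementation; `sibling_path`, `file_exists`, and `find_tool` are illustrative names, and existence is approximated by whether the file can be opened for reading.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build "<directory of compiler_path>/<tool>" into buf. Returns false if
 * compiler_path has no directory part or buf is too small. */
bool sibling_path(const char *compiler_path, const char *tool, char *buf, size_t buf_size) {
    assert(compiler_path != NULL && tool != NULL);
    const char *slash = strrchr(compiler_path, '/');
    if (slash == NULL)
        return false;
    int dir_len = (int)(slash - compiler_path);
    int n = snprintf(buf, buf_size, "%.*s/%s", dir_len, compiler_path, tool);
    return n >= 0 && (size_t)n < buf_size;
}

/* Approximate existence check: true if the file can be opened for reading. */
bool file_exists(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return false;
    fclose(f);
    return true;
}

/* Prefer the tool installed next to the compiler; otherwise return the bare
 * name so the caller falls back to searching the user's PATH. */
const char *find_tool(const char *compiler_path, const char *tool, char *buf, size_t buf_size) {
    if (sibling_path(compiler_path, tool, buf, buf_size) && file_exists(buf))
        return buf;
    return tool;
}
```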
Side note: why is this not the default behavior?
[Offload] Add a function to register an RPC Server callback (#178774)
Summary:
We provide an RPC server to manage calls initiated by the device to run
on the host. This is very useful for the built-in handling we have;
however, there are cases where we would want to extend this
functionality.
Support for cases like Fortran or MPI would be useful, but we cannot put
references to these in the core offloading runtime. Instead, we can
provide a library interface that registers custom handlers for whatever
code people want.
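A callback registry of this kind can be sketched as an opcode-to-handler table that the server consults for requests it does not handle itself. Everything below is hypothetical: the commit does not show the real function's name or signature, so `register_rpc_callback`, `dispatch_rpc`, `RpcHandler`, and the fixed-size table are illustrative only, and the registry is single-threaded for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler: operates on the request payload in place. */
typedef void (*RpcHandler)(uint64_t *payload, void *user_data);

enum { MAX_HANDLERS = 16 };

/* Process-wide registry; not thread-safe in this sketch. */
static struct {
    uint32_t opcode;
    RpcHandler fn;
    void *user_data;
} handlers[MAX_HANDLERS];
static size_t num_handlers = 0;

/* Register fn for opcode. Returns false if the opcode is taken or the table is full. */
bool register_rpc_callback(uint32_t opcode, RpcHandler fn, void *user_data) {
    assert(fn != NULL);
    for (size_t i = 0; i < num_handlers; ++i)
        if (handlers[i].opcode == opcode)
            return false;
    if (num_handlers == MAX_HANDLERS)
        return false;
    handlers[num_handlers].opcode = opcode;
    handlers[num_handlers].fn = fn;
    handlers[num_handlers].user_data = user_data;
    ++num_handlers;
    return true;
}

/* Run the handler registered for opcode; returns false if none is registered. */
bool dispatch_rpc(uint32_t opcode, uint64_t *payload) {
    for (size_t i = 0; i < num_handlers; ++i) {
        if (handlers[i].opcode == opcode) {
            handlers[i].fn(payload, handlers[i].user_data);
            return true;
        }
    }
    return false;
}

/* Example handler: adds the value pointed to by user_data to the payload. */
void add_handler(uint64_t *payload, void *user_data) {
    *payload += *(const uint64_t *)user_data;
}
```

Keeping handlers behind an opaque `user_data` pointer is what lets an external library (for example, a Fortran or MPI runtime) plug in without the core runtime referencing it.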