Reapply "[AMDGPU][SDAG] Add missing cases for SI_INDIRECT_SRC/DST (#170323)" (#171838)
A buildbot failed for the original patch.
https://github.com/llvm/llvm-project/pull/171835 addresses the issue
raised by the buildbot.
After the fix is merged, the original patch is reapplied without any
change.
[AArch64] Add a performBICiCombine function.
This moves the code out of PerformDAGCombine into its own function, changing
the return value to SDValue(N, 0) to match other uses of SimplifyDemandedBits.
[RISCV] Custom legalize i32 saddo/ssubo on RV64 to return a sign extended value for the data result. (#172112)
This is consistent with how we handle regular ADD/SUB and helps with
computeNumSignBits optimizations.
Fixes #172089
[orc-rt] Prevent RTTIExtends from being used for errors. (#172250)
Custom error types (ErrorInfoBase subclasses) should use ErrorExtends as
of 8f51da369e6. Adding a static_assert allows us to enforce that at
compile-time.
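A minimal sketch of how such a compile-time check might look. The class bodies below are simplified, hypothetical stand-ins rather than the actual orc-rt declarations; only the names RTTIExtends, ErrorExtends, and ErrorInfoBase come from the commit message, and RTTIRoot, Plugin, and MyError are invented for illustration.

```cpp
#include <type_traits>

// Hypothetical stand-ins for illustration; not the real orc-rt classes.
struct RTTIRoot {
  virtual ~RTTIRoot() = default;
};
struct ErrorInfoBase : RTTIRoot {};

// General-purpose extension helper: rejects error hierarchies.
template <typename ThisT, typename ParentT>
struct RTTIExtends : ParentT {
  static_assert(!std::is_base_of_v<ErrorInfoBase, ParentT>,
                "error types should use ErrorExtends, not RTTIExtends");
};

// Error-specific extension helper.
template <typename ThisT, typename ParentT>
struct ErrorExtends : ParentT {
  static_assert(std::is_base_of_v<ErrorInfoBase, ParentT>,
                "ErrorExtends is only for ErrorInfoBase subclasses");
};

struct Plugin : RTTIExtends<Plugin, RTTIRoot> {};         // accepted
struct MyError : ErrorExtends<MyError, ErrorInfoBase> {}; // accepted
// struct BadError : RTTIExtends<BadError, ErrorInfoBase> {}; // rejected at compile time
```

Uncommenting the BadError line should turn the misuse into a build error instead of a latent runtime problem.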
[CIR] Rename allEnumCasesCovered to all_enum_cases_covered (#172153)
Use the conventional snake_case for MLIR assembly and align with the
operation documentation, which already mentions the snake_cased attribute.
[offload] Fix CUDA args size by subtracting tail padding (#172249)
This commit makes the cuLaunchKernel call pass the total arguments size without tail padding.
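To illustrate what "without tail padding" means, here is a small sketch that computes both sizes for a made-up argument struct. KernelArgs and the constant names are hypothetical, and this does not reproduce the offload plugin's actual code or the way the size reaches cuLaunchKernel.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical kernel argument block; alignas(8) makes the layout explicit.
struct alignas(8) KernelArgs {
  std::uint64_t Ptr; // offset 0, 8 bytes
  std::uint32_t N;   // offset 8, 4 bytes
};                   // sizeof == 16: 4 bytes of tail padding

// Bytes of padding after the last member.
constexpr std::size_t TailPadding =
    sizeof(KernelArgs) - (offsetof(KernelArgs, N) + sizeof(KernelArgs::N));

// Size that covers only the arguments themselves.
constexpr std::size_t ArgsSize = sizeof(KernelArgs) - TailPadding;
```

With this layout the struct occupies 16 bytes but only the first 12 carry argument data, so 12 is the size one would report.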
[AArch64] Support USDOT in performAddDotCombine (#171864)
This function performs the fold
// ADD(UDOT(zero, x, y), A) --> UDOT(A, x, y)
which applies equally to USDOT now that we have a node for it.
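The fold is sound because the dot product simply adds its products onto the accumulator operand. A scalar model of one 32-bit lane makes that concrete. usdotLane and the sample values are hypothetical; this is not LLVM code, and it ignores the wrap-around behaviour of the real instruction by using small inputs.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of one 32-bit lane of a mixed-sign dot product:
// unsigned 8-bit elements of X times signed 8-bit elements of Y,
// accumulated into Acc. Illustrative only.
constexpr std::int32_t usdotLane(std::int32_t Acc,
                                 const std::array<std::uint8_t, 4> &X,
                                 const std::array<std::int8_t, 4> &Y) {
  for (std::size_t I = 0; I < 4; ++I)
    Acc += static_cast<std::int32_t>(X[I]) * static_cast<std::int32_t>(Y[I]);
  return Acc;
}

constexpr std::array<std::uint8_t, 4> X{200, 3, 0, 17};
constexpr std::array<std::int8_t, 4> Y{-5, 7, 9, -1};
constexpr std::int32_t A = 1000;

// ADD(USDOT(zero, x, y), A) equals USDOT(A, x, y).
static_assert(usdotLane(0, X, Y) + A == usdotLane(A, X, Y));
```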
[AArch64] use `isTRNMask` to calculate shuffle costs (#171524)
This builds on #169858 to fix the divergence in codegen
(https://godbolt.org/z/a9az3h6oq) between two very similar
functions initially observed in #137447 (represented in the diff by test
cases `@transpose_splat_constants` and `@transpose_constants_splat`):
```
int8x16_t f(int8_t x)
{
  return (int8x16_t) { x, 0, x, 1, x, 2, x, 3,
                       x, 4, x, 5, x, 6, x, 7 };
}
int8x16_t g(int8_t x)
{
  return (int8x16_t) { 0, x, 1, x, 2, x, 3, x,
                       4, x, 5, x, 6, x, 7, x };
}
```
[7 lines not shown]
[orc-rt] Add Error / Exception interop. (#172247)
The ORC runtime needs to work in diverse codebases, both with and
without C++ exceptions enabled (e.g. most LLVM projects compile with
exceptions turned off, but regular C++ codebases will typically have
them turned on). This introduces a tension in the ORC runtime: If a C++
exception is thrown (e.g. by a client-supplied callback) it can't be
ignored, but orc_rt::Error values will assert if not handled prior to
destruction. That makes the following pattern fundamentally unsafe in
the ORC runtime:
```
if (auto Err = orc_rt_operation(...)) {
  log("failure, bailing out"); // <- may throw if exceptions enabled
  // Exception unwinds stack before Error is handled, triggers Error-not-checked
  // assertion here.
  return Err;
}
```
[29 lines not shown]
[llvm][RISCV] Add bf16 vfabs and vfneg intrinsics for zvfbfa. (#172130)
These are pseudoinstruction aliases for vfsgnjx and vfsgnjn.
Co-authored-by: Craig Topper <craig.topper at sifive.com>
[flang-rt][device] Use snprintf result for length (#172239)
The buffer might not be null-terminated on the device, which results in a
1-byte invalid read when trying to get the length.
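A rough sketch of the general idea: take the length from the return value of snprintf instead of scanning the buffer for a terminator. formatValue is a hypothetical helper written for this note, not a flang-rt function, and the clamping policy on truncation is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Formats Value into Buf and returns the number of characters stored,
// using snprintf's return value instead of scanning Buf for a terminator.
inline std::size_t formatValue(char *Buf, std::size_t BufSize, int Value) {
  assert(Buf != nullptr && BufSize > 0);
  int Written = std::snprintf(Buf, BufSize, "%d", Value);
  if (Written < 0)
    return 0; // encoding error: nothing usable in Buf
  std::size_t Len = static_cast<std::size_t>(Written);
  // On truncation snprintf returns the untruncated length, so clamp it.
  return Len < BufSize ? Len : BufSize - 1;
}
```

Because the length never depends on reading past the written characters, a missing terminator cannot cause an out-of-bounds read here.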
[BOLT] Introduce getOutputBinaryFunctions(). NFCI (#172174)
To gain better control over the functions that go into the output file
and their order, introduce `BinaryContext::getOutputBinaryFunctions()`.
The new API returns a modifiable list of functions in output order.
This list is filled by a new `PopulateOutputFunctions` pass and includes
emittable functions from the input file, plus functions added by BOLT
(injected functions).
The new functionality allows input functions to be freely intermixed with
injected ones in the output, which will be used in future PRs.
The new function replaces `BinaryContext::getSortedFunctions()`, but
unlike its predecessor, it includes injected functions in the returned
list.
[orc-rt] Ensure EH/RTTI=On overrides LLVM opts, applies to unit tests. (#172155)
When -DORC_RT_ENABLE_EXCEPTIONS=On and -DORC_RT_ENABLE_RTTI=On are
passed we need to ensure that the resulting compiler flags (e.g.
-fexceptions, -frtti for clang/GCC) are appended so that we override any
inherited options (e.g. -fno-exceptions, -fno-rtti) from LLVM.
Updates unit tests to ensure that these compiler options are applied to
them too.
[clang-tidy] Suggest `std::views::reverse` instead of `std::ranges::reverse_view` in `modernize-use-ranges` (#172199)
`std::views::FOO` should in almost all cases be preferred over
`std::ranges::FOO_view`. For a detailed explanation of why that is, see
https://brevzin.github.io/c++/2023/03/14/prefer-views-meow/. The TLDR is
that it's shorter to spell (which is obvious) and can in certain cases
be more efficient (which is less obvious; see the article if curious).