AMDGPU: Stop requiring afn for f32 rsq formation
We were checking for afn or !fpmath attached to the sqrt. We
are not trying to replace a correctly rounded rsqrt; we're replacing
the two correctly rounded operations with the contracted operation.
The net precision is better, so contract on both instructions should
be sufficient. Both the contracted and uncontracted sequences pass
the OpenCL conformance test, with a lower maximum error contracted.
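For reference, the pattern in question is a reciprocal square root spelled as a divide by a sqrt; a minimal C++ sketch (illustrative only, not the conformance test itself):
```cpp
#include <cmath>

// Two correctly rounded operations: the sqrt and the divide. When contraction
// is allowed on both instructions, the backend may fuse them into a single
// hardware rsq, which gives a lower maximum error overall.
float reciprocal_sqrt(float x) {
  return 1.0f / std::sqrt(x);
}
```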
AMDGPU: Introduce f64 rsq pattern in AMDGPUCodeGenPrepare
Handle this here instead of DAGCombine, mostly because the f32
case is handled here due to the dependency on !fpmath. We can also
take advantage of computeKnownFPClass.
[tsan] Export __cxa_guard_ interceptors from TSan runtime. (#171921)
These functions from the C++ ABI are defined in
compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp and are supposed to
replace the implementations from libstdc++/libc++abi.
We need to export them for the same reason we need to export other
interceptors and TSan runtime functions: e.g. if a dlopen-ed shared
library depends on `__cxa_guard_acquire`, it needs to pick up the
exported definition from the TSan runtime that was linked into the main
executable that calls dlopen().
However, because the `__cxa_guard_` functions don't use the traditional
interceptor machinery, they are omitted from the auto-generated
`libclang_rt.tsan.a.syms` files. Fix this by adding them to the
tsan.syms.extra file explicitly.
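For illustration, the kind of code that creates this dependency is thread-safe initialization of a function-local static; a hypothetical snippet that, when built into a dlopen-ed library, references `__cxa_guard_acquire`/`__cxa_guard_release` from the Itanium C++ ABI:
```cpp
struct Config {
  int verbosity;
  Config() { verbosity = 1; }  // not constexpr, so initialization is dynamic
};

const Config &getConfig() {
  // The compiler wraps this thread-safe initialization in
  // __cxa_guard_acquire/__cxa_guard_release. Under TSan, that reference must
  // resolve to the interceptor exported from the runtime linked into the
  // main executable that calls dlopen().
  static Config instance;
  return instance;
}
```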
Co-authored-by: Vitaly Buka <vitalybuka at google.com>
[MLIR][MemRef] Emit error on atomic generic result op defined outside the region (#172190)
While figuring out how to perform an atomic exchange on a memref, I
tried the generic atomic rmw with the yielded value captured from the
enclosing scope (instead of a plain atomic_rmw with
`arith::AtomicRMWKind::assign`). This PR changes the pass to produce an
error instead of segfaulting when the result is not found in the
region's IR map.
It might be more useful to give a suggestion to the user, but giving an
error message instead of a crash is at least an improvement, I think.
See: #172184
[X86] combineVectorSizedSetCCEquality - convert to mayFoldIntoVector helper (#172215)
Add an AssumeSingleUse (default = false) argument to mayFoldIntoVector to
allow us to match combineVectorSizedSetCCEquality behaviour with
AssumeSingleUse=true.
Hopefully we can drop AssumeSingleUse entirely soon, but there are a
number of messy test regressions that need handling first.
[LLDB][NativePDB] Create typedefs in structs (#169248)
Typedef/using declarations in structs and classes were not created with
the native PDB plugin. The following would only create `Foo` and
`Foo::Bar`:
```cpp
struct Foo {
struct Bar {};
using Baz = Bar;
using Int = int;
};
```
With this PR, they're created. One complication is that typedefs and
nested types show up identically in the type records. The example from
above gives:
```
0x1006 | LF_FIELDLIST [size = 40, hash = 0x2E844]
- LF_NESTTYPE [name = `Bar`, parent = 0x1002]
- LF_NESTTYPE [name = `Baz`, parent = 0x1002]
[5 lines not shown]
```
[XRay][test] Mark fdr-mode.cpp test as unsupported for RISC-V
Commit c6f501d479e8 fixed an issue where some tests were incorrectly
marked as unsupported for a bootstrapping build. This exposed, in our
'slow' full-bootstrap qemu-system CI, that fdr-mode.cpp fails on
RISC-V. We mark it as unsupported. I believe _xray_ArgLoggerEntry needs
to be implemented in xray_trampoline_risc*.S for this to work.
[X86] combineVectorSizedSetCCEquality - ensure the load is a normal load (#172212)
Noticed while trying to replace the IsVectorBitCastCheap helper with
mayFoldIntoVector (still some work to do, as we have a number of multiuse
cases) - technically it's possible for an extload to reach this point.
[ARM] Introduce intrinsics for MVE fp-converts under strict-fp. (#170686)
This is the last of the generic instructions created from MVE
intrinsics. It was a little more awkward than the others because it
takes a Type as one of the arguments. This adds a new function to
create the intrinsic we need.
[ARM] Introduce intrinsics for MVE vcmp under strict-fp. (#169798)
Similar to #169156 again, this adds intrinsics for strict-fp compare nodes to
make sure they end up as the original instruction.
[ARM] Introduce intrinsics for MVE vrnd under strict-fp. (#169797)
Similar to #169156 again, this adds intrinsics for strict-fp vrnd nodes to make
sure they end up as the original instruction.
[LLVM][Examples] Disable broken JIT + plugin tests (AIX, Sparc)
Plugins appear to be broken on AIX; CI fails. There is logic in the
CMakeLists for plugins on AIX, but it was never tested before...
Note: when plugins work, also enable tests in Examples/IRTransforms.
There's no Sparc support for JIT tests, so disable the JIT tests in the
examples (copied from ExecutionEngine/lit.local.cfg).
[DAG] SDPatternMatch - Replace runtime data structures with lengths known at compile time (#172064)
Following the suggestions in #170061, I replaced `SmallVector<SDValue>`
with `std::array<SDValue, NumPatterns>` and `SmallBitVector` with
`Bitset<NumPatterns>`.
I had to make some changes to the `collectLeaves` and
`reassociatableMatchHelper` functions. In `collectLeaves` specifically,
I changed the return type so I could propagate a failure in case the
number of found leaves is greater than the number of expected patterns.
I also added a new unit test that, together with the one already present
on the previous line, checks that matching fails when the number of
patterns is smaller or larger than the number of leaves.
I don't think this is going to completely address the increased compile
time reported in #169644, but hopefully it leads to an improvement.
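Roughly the shape of the change, as an illustrative sketch only (not the actual LLVM code; `std::bitset` stands in for LLVM's `Bitset`, and `Leaf` stands in for `SDValue`):
```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <optional>

struct Leaf { int Id = 0; };

// Sizes are known at compile time from the number of patterns, so the leaves
// can live in a fixed-size std::array instead of a runtime-sized SmallVector.
template <std::size_t NumPatterns>
std::optional<std::array<Leaf, NumPatterns>>
collectLeaves(const Leaf *Begin, std::size_t Count) {
  // Propagate a failure instead of growing the container when more leaves
  // are found than there are expected patterns.
  if (Count > NumPatterns)
    return std::nullopt;
  std::array<Leaf, NumPatterns> Leaves{};
  for (std::size_t I = 0; I != Count; ++I)
    Leaves[I] = Begin[I];
  return Leaves;
}

// A fixed-size bitset tracks which patterns have already been matched.
template <std::size_t NumPatterns>
bool allPatternsMatched(const std::bitset<NumPatterns> &Matched) {
  return Matched.all();
}
```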
[libc++] Use native wait in std::barrier instead of sleep loop (#171041)
For some reason, the current wait implementation in `std::barrier` polls
the underlying atomic in a loop with sleeps instead of using the native
atomic wait.
This change should also indirectly fix the performance issue of
`std::barrier` described in
https://github.com/llvm/llvm-project/issues/123855.
Fixes #123855
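Conceptually, the difference looks like this (a simplified sketch, not libc++'s actual implementation), waiting for a barrier phase counter to move past its old value:
```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Old approach: poll the atomic in a loop with sleeps, waking up repeatedly
// even when nothing has changed.
void wait_by_polling(std::atomic<int> &Phase, int OldPhase) {
  while (Phase.load(std::memory_order_acquire) == OldPhase)
    std::this_thread::sleep_for(std::chrono::microseconds(100));
}

// New approach: block in the C++20 native atomic wait (futex and friends);
// the arriving thread updates Phase and calls Phase.notify_all().
void wait_natively(std::atomic<int> &Phase, int OldPhase) {
  while (Phase.load(std::memory_order_acquire) == OldPhase)
    Phase.wait(OldPhase, std::memory_order_acquire);
}
```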