[AArch64] Fix regression from "Fold scalar-to-vector shuffles into DUP/FMOV" (#178227)
Revised #166962.
This patch aims to fix the original compile-time regression by
restricting the optimisation to non-constant splats. Without
the guard, lowering loops forever because the
`CONCAT(SCALAR_TO_VECTOR, zero)` folds back into the same `BUILD_VECTOR`,
which immediately re-enters `LowerBUILD_VECTOR`.
This patch was tested with the original TensorFlow reproducer provided
on the PR and shows a (very) slight compile-time improvement.
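The termination problem can be pictured with a toy rewrite loop. This is a minimal sketch, not SelectionDAG code: `Node`, `lowerBuildVector`, `combineConcat` and `runToFixpoint` are hypothetical names, and the fold back into the original `BUILD_VECTOR` is assumed to happen only for constant splats, which is why skipping constant splats is enough to reach a fixed point.

```cpp
#include <optional>

// Hypothetical stand-in for a BUILD_VECTOR node: only the properties the
// guard inspects are modelled.
struct Node {
  bool IsConstantSplat;
  bool IsConcat; // true once lowered to CONCAT(SCALAR_TO_VECTOR, zero)
};

// Hypothetical lowering: rewrites a BUILD_VECTOR into the CONCAT form, or
// returns std::nullopt to leave the node alone.
std::optional<Node> lowerBuildVector(const Node &N, bool UseGuard) {
  if (N.IsConcat)
    return std::nullopt;
  if (UseGuard && N.IsConstantSplat)
    return std::nullopt; // the guard: do not touch constant splats
  return Node{N.IsConstantSplat, true};
}

// Hypothetical combine: assumed to fold a constant-splat CONCAT straight
// back into the original BUILD_VECTOR.
std::optional<Node> combineConcat(const Node &N) {
  if (N.IsConcat && N.IsConstantSplat)
    return Node{true, false};
  return std::nullopt;
}

// Applies lowering and combining until nothing changes. Returns the number
// of rewrites performed, or -1 if MaxSteps is hit (the loop never settles).
int runToFixpoint(Node N, bool UseGuard, int MaxSteps = 1000) {
  for (int Steps = 0; Steps < MaxSteps; ++Steps) {
    if (auto L = lowerBuildVector(N, UseGuard)) {
      N = *L;
      continue;
    }
    if (auto C = combineConcat(N)) {
      N = *C;
      continue;
    }
    return Steps;
  }
  return -1;
}
```

Without the guard a constant splat ping-pongs between the two rewrites until the step cap; with it, the lowering declines the node and the loop stops immediately, while non-constant splats are still lowered once.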
[clang-doc]: Enable horizontal wrapping on longer function definitions (#181417)
This patch enables wrapping for longer function and template definitions
in the generated HTML. It currently uses the number of parameters to
determine whether to wrap the function. If a function or template has
more than 2 parameters, they are printed one per line. Also fixes a styling
bug where a trailing comma was left after the last parameter.
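The wrapping rule and the trailing-comma fix can be sketched as follows. This is an illustration under assumptions, not clang-doc's implementation: `renderSignature` is a hypothetical helper and plain text stands in for the generated HTML. Emitting the separator only between parameters is what avoids a comma after the last one.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: renders a signature, printing parameters one per
// line when there are more than MaxInline of them.
std::string renderSignature(const std::string &Name,
                            const std::vector<std::string> &Params,
                            std::size_t MaxInline = 2) {
  const bool Wrap = Params.size() > MaxInline;
  std::ostringstream OS;
  OS << Name << "(";
  for (std::size_t I = 0; I < Params.size(); ++I) {
    if (Wrap)
      OS << "\n  "; // each wrapped parameter starts on its own indented line
    OS << Params[I];
    // Separator only between parameters, so there is no trailing comma.
    if (I + 1 < Params.size())
      OS << (Wrap ? "," : ", ");
  }
  if (Wrap)
    OS << "\n";
  OS << ")";
  return OS.str();
}
```

For example, two parameters stay inline as `f(int a, int b)`, while three are printed one per line.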
[compiler-rt][Fuzzer] Relax reduce_inputs.test to account for non-determinism (#182495)
I have seen this test very occasionally fail on a bot, with
only 3 files in the corpus. After running the test in a loop 4000+
times, I saw the same failure twice.
In both cases the first corpus member was some string not containing an
'F'; the second corpus member was 'F[' or 'FZ'; and the final corpus
member was 'FUZ'.
In a normal run there is an intermediate corpus member 'FU.', so this
test fails only in the very rare cases where the fuzzer gets lucky and
matches 2 branch conditions in one mutation.
This patch allows the FileCheck condition to match 3 or 4 corpus files.
It may be possible for the fuzzer to reach the target in 2 files, but I
think that if that is possible, it will be exceptionally rare.
rdar://170440934
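The failure mode can be pictured with a simplified target made of nested byte comparisons. This is an assumption for illustration, not the test's actual source, and `conditionsMatched` is a hypothetical name. Each extra matching prefix byte normally shows up as new coverage and a new corpus entry, so a single mutation from 'F[' to 'FUZ' jumps two levels at once and skips the 'FU.' entry.

```cpp
#include <cstddef>
#include <string>

// Hypothetical fuzz-target shape: one nested branch per byte of "FUZ".
// Returns how many of the nested conditions the input satisfies.
int conditionsMatched(const std::string &Input) {
  static const char Target[] = {'F', 'U', 'Z'};
  int Depth = 0;
  for (std::size_t I = 0; I < sizeof(Target); ++I) {
    if (I >= Input.size() || Input[I] != Target[I])
      break; // this branch is not taken, so deeper ones are unreachable
    ++Depth;
  }
  return Depth;
}
```

Under this model the usual progression is depths 0, 1, 2, 3 (four corpus files), while the rare lucky mutation goes straight from depth 1 to depth 3 (three files).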
[mlir][arith] Add nneg to extui and uitofp. (#183165)
This patchset adds the missing flag nneg (non-negative) to extui
and uitofp, which denotes that the operand is known to be non-negative.
The semantics of this flag mirror LLVM's.
From the [zext nneg RFC](https://discourse.llvm.org/t/rfc-add-zext-nneg-flag/73914):
> If the nneg flag is set, and the zext argument is negative, the result
> is a poison value.
> A corollary is that replacing a zext nneg with sext is a refinement.
and the [uitofp nneg RFC](https://discourse.llvm.org/t/rfc-support-nneg-flag-with-uitofp/77988):
> uitofp nneg iN %x to fM returns poison if %x is negative
> A corollary is that uitofp nneg iN %x to fM is equivalent to sitofp iN
> %x to fM.
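The quoted semantics can be modelled in a few lines. This is an illustrative sketch, not MLIR or LLVM code: the operand's bits are held in a signed `int32_t` so that "negative" means the sign bit is set, `std::nullopt` stands in for poison, and the function names are hypothetical. Whenever the nneg result is not poison it matches the signed conversion, which is the stated corollary.

```cpp
#include <cstdint>
#include <optional>

// zext nneg i32 -> i64: poison if the operand's sign bit is set.
std::optional<std::int64_t> zextNneg(std::int32_t X) {
  if (X < 0)
    return std::nullopt; // poison
  return static_cast<std::int64_t>(static_cast<std::uint32_t>(X));
}

// sext i32 -> i64, for comparison.
std::int64_t sext(std::int32_t X) { return static_cast<std::int64_t>(X); }

// uitofp nneg i32 -> f64: poison if the operand's sign bit is set.
std::optional<double> uitofpNneg(std::int32_t X) {
  if (X < 0)
    return std::nullopt; // poison
  return static_cast<double>(static_cast<std::uint32_t>(X));
}

// sitofp i32 -> f64, for comparison.
double sitofp(std::int32_t X) { return static_cast<double>(X); }
```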
[libc++] Add a thread-safe version of std::lgamma in the dylib (#153631)
Libc++ currently redeclares ::lgamma_r on platforms that provide it.
This causes issues when building with modules, and redeclaring functions
provided by another library (here the C library) is bad hygiene.
Instead, use an asm declaration to call the right function without
having to redeclare it.
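An asm declaration here means binding a differently named C++ declaration to an existing symbol via a GCC/Clang asm label. The sketch below is only an illustration of that technique, not libc++'s code: `sketch_lgamma_r` and `threadSafeLgamma` are hypothetical names, and it assumes an ELF platform such as Linux/glibc, where the C symbol is spelled exactly `lgamma_r` (Darwin, for instance, prefixes C symbols with an underscore).

```cpp
// Declares a new name whose assembler symbol is the C library's lgamma_r,
// so ::lgamma_r itself is never redeclared. Assumes an ELF target where the
// C symbol name carries no underscore prefix.
extern "C" double sketch_lgamma_r(double X, int *Sign) __asm__("lgamma_r");

// Thread-safe lgamma: the sign of Gamma(X) comes back through *Sign rather
// than through the global signgam that plain lgamma may write.
double threadSafeLgamma(double X, int *Sign) {
  return sketch_lgamma_r(X, Sign);
}
```

Gamma(5) = 24, so `threadSafeLgamma(5.0, &s)` returns ln 24 with `s == 1`, while Gamma(-0.5) is negative and yields `s == -1`.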
[AArch64] Fold zero-high vector inserts in MI peephole optimisation
Summary
This patch follows on from #178227.
The previous ISel fold lowers the 64-bit case to:
fmov d0, x0
fmov d0, d0
which is not ideal: a single fmov d0, x0 is enough, since writing a D
register already zeroes the upper 64 bits of the vector register.
The redundant copy comes from the INSERT_SUBREG/INSvi64lane.
This peephole detects <2 x i64> vectors whose upper lane is zero and whose
low lane is produced by FMOVXDr/FMOVDr, then removes the redundant copy.
It also updates existing tests and adds MIR tests.
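The shape of the peephole can be sketched on a toy instruction list. Everything here is hypothetical (not LLVM's MachineInstr API): `InsertLowIntoZero` stands in for the INSERT_SUBREG/INSvi64lane pattern that places a value in the low lane of a zero vector, and registers are assumed to be in SSA form. Because FMOVXDr and FMOVDr write a D register and so already zero the upper lane, such an insert is a pure copy and its uses can be redirected to the source.

```cpp
#include <map>
#include <vector>

// Toy opcodes (hypothetical model).
enum class Op {
  FMOVXDr,           // d = fmov x: writing a D register zeroes bits 64-127
  FMOVDr,            // d = fmov d: likewise zeroes bits 64-127
  InsertLowIntoZero, // q = <src, 0>: stands in for INSERT_SUBREG/INSvi64lane
  Other
};

struct Inst {
  Op Opcode;
  int Dst;
  int Src;
};

// Drops InsertLowIntoZero instructions whose source was defined by FMOVXDr or
// FMOVDr, and rewrites later uses of the dropped result to the source.
// Assumes each register is defined once (SSA form).
std::vector<Inst> foldZeroHighInserts(const std::vector<Inst> &In) {
  std::map<int, Op> DefOp;   // register -> opcode that defined it
  std::map<int, int> Rename; // dropped result -> replacement register
  std::vector<Inst> Out;
  for (Inst I : In) {
    auto It = Rename.find(I.Src);
    if (It != Rename.end())
      I.Src = It->second;
    if (I.Opcode == Op::InsertLowIntoZero) {
      auto D = DefOp.find(I.Src);
      if (D != DefOp.end() &&
          (D->second == Op::FMOVXDr || D->second == Op::FMOVDr)) {
        Rename[I.Dst] = I.Src; // upper lane is already zero: pure copy
        continue;              // drop the redundant instruction
      }
    }
    DefOp[I.Dst] = I.Opcode;
    Out.push_back(I);
  }
  return Out;
}
```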
[OpenMP][Clang] Parsing support for num_teams lower bound (#180608)
According to OpenMP 5.2, the num_teams clause should support a
lower bound as a modifier for its argument. This PR adds parsing support
for the lower bound in the num_teams clause.
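The clause form being parsed is `num_teams(lower-bound : upper-bound)`. The snippet below is a sketch of source that uses it; `sumWithTeamRange` is a hypothetical name, and it assumes a compiler that accepts the lower-bound syntax when built with `-fopenmp`. Without OpenMP the pragma is ignored and the loop simply runs serially with the same result.

```cpp
// Requests between 2 and 8 teams for the offloaded loop. Without OpenMP
// enabled, the pragma is ignored and the loop runs on the host.
long sumWithTeamRange(int N) {
  long Sum = 0;
#pragma omp target teams distribute parallel for num_teams(2 : 8) \
    reduction(+ : Sum) map(tofrom : Sum)
  for (int I = 0; I < N; ++I)
    Sum += I;
  return Sum;
}
```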
[libc++] Add link to the running job from the benchmarking bot (#180217)
This allows following the progress of the benchmarking job and also
spotting when it fails.
Fixes #158296
[clang][ARM] Refactor argument handling in `EmitAArch64BuiltinExpr` (3/N) (NFC) (#183315)
Remove the outstanding calls to `EmitScalarExpr` in
`EmitAArch64BuiltinExpr` that are no longer required.
This is a follow-up to #181794 and #181974; please refer to
those PRs for more context.
[AST][NFC] Move AST dump colors into separate namespace (#183341)
Preparatory work for Clang AST PCH, which will include ASTDumperUtils.h.
Polluting the clang namespace with colors would lead to a name collision
with definitions in clang/lib/Frontend/TextDiagnostic.cpp.
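The effect of moving the colors into their own namespace can be shown with two same-named entities. All names below (`ast_dump_color`, `TerminalColor`, `ExampleColor`) are hypothetical placeholders, not the identifiers the patch uses: once the dump color lives in a nested namespace, an unrelated `clang::ExampleColor` can coexist with it in the same translation unit, as would happen once both end up behind a shared PCH.

```cpp
#include <string>

namespace clang {

// Hypothetical dump colors, moved out of clang:: into a nested namespace.
namespace ast_dump_color {
struct TerminalColor {
  int Code;
  bool Bold;
};
inline constexpr TerminalColor ExampleColor{1, true};
} // namespace ast_dump_color

// Hypothetical unrelated entity with the same name directly in clang::.
// Had the color above been declared in clang:: itself, this would clash.
inline std::string ExampleColor() { return "diagnostic color"; }

} // namespace clang
```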
[clang][ASan][Fuchsia] Have Fuchsia use a dynamic shadow start (#182917)
These are the compiler changes that depend on the runtime changes in
https://github.com/llvm/llvm-project/pull/183154. The runtime changes
need to have landed first. The dynamic shadow global is still set to
zero, but this will change in the future.
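A dynamic shadow start means the instrumented code loads the shadow base from a global at run time instead of folding a constant into each check. The sketch below shows only the general ASan-style mapping `(Addr >> Scale) + Offset`; the scale of 3, the static offset of 0 and the global's name are assumptions for illustration, not Fuchsia's actual values or symbols. With the global still zero, the two mappings agree.

```cpp
#include <cstdint>

// Assumed scale: one shadow byte describes 8 application bytes.
constexpr unsigned kShadowScale = 3;

// Assumed static mapping: the offset is a compile-time constant.
constexpr std::uintptr_t kStaticShadowOffset = 0;

// Hypothetical global the runtime would initialise; still zero for now.
std::uintptr_t hypothetical_dynamic_shadow_start = 0;

std::uintptr_t shadowStatic(std::uintptr_t Addr) {
  return (Addr >> kShadowScale) + kStaticShadowOffset;
}

std::uintptr_t shadowDynamic(std::uintptr_t Addr) {
  // The offset is read from memory on each use rather than encoded as an
  // immediate, so the runtime can choose it at startup.
  return (Addr >> kShadowScale) + hypothetical_dynamic_shadow_start;
}
```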
[lld][Webassembly] Avoid a signed overflow on large sections (#183225)
Wasm section sizes are specified as u32 values, and thus sections can be as
large as 4GB. wasm-ld currently stores the offset into a section as an
int32_t, which overflows on large sections and results in a crash. This
change makes it an int64_t to accommodate any valid wasm section and to
allow catching even larger sections instead of wrapping around.
This PR fixes the issue by storing the offset as an int64_t, adding
checks so that unencodable sections fail instead of producing garbage
wasm binaries, and adding lit tests to make sure it works. I confirmed
the test fails on main but passes with this fix.
This is the same as https://github.com/llvm/llvm-project/pull/178287 but
deletes the temporary files the tests create and requires the tests run
on a 64-bit platform to avoid OOM issues due to the large binaries it
creates.
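The arithmetic behind the fix can be checked directly. This is a sketch, not wasm-ld's code: `fitsInInt32` and `encodableAsU32` are hypothetical helpers. An offset of 3 GiB is below the u32 limit, so it can occur in a valid section, yet it exceeds `INT32_MAX`; a 64-bit value holds it, and a size above `UINT32_MAX` can then be rejected explicitly instead of wrapping.

```cpp
#include <cstdint>
#include <limits>

// True if the offset can be stored in an int32_t without overflow.
bool fitsInInt32(std::int64_t Offset) {
  return Offset >= 0 && Offset <= std::numeric_limits<std::int32_t>::max();
}

// Hypothetical check: a section size must fit the u32 size field,
// otherwise the output cannot be encoded and linking should fail.
bool encodableAsU32(std::int64_t Size) {
  return Size >= 0 &&
         Size <= static_cast<std::int64_t>(
                     std::numeric_limits<std::uint32_t>::max());
}
```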