[AArch64] Fold zero-high vector inserts in MI peephole optimisation
Summary
This patch follows on from #178227.
The previous ISel fold lowers the 64-bit case to:
fmov d0, x0
fmov d0, d0
which is not ideal; the sequence could be a single fmov d0, x0.
The redundant copy comes from the INSERT_SUBREG/INSvi64lane.
This peephole detects <2 x i64> vectors whose upper lane is zeroed and
whose low lane is produced by FMOVXDr/FMOVDr, and removes the redundant copy.
Also updated existing tests and added MIR tests.
[OpenMP][Clang] Parsing support for num_teams lower bound (#180608)
According to OpenMP 5.2, the num_teams clause supports a lower bound as a
modifier for its argument. This PR adds parsing support for the lower
bound in the num_teams clause.
[AArch64] Fix regression from “Fold scalar-to-vector shuffles into DUP/FMOV”
This patch fixes the original compile-time regression by restricting the optimisation to run only on non-constant splats.
Without the guard, the CONCAT(SCALAR_TO_VECTOR, zero) folds back into the same BUILD_VECTOR and
immediately re-enters LowerBUILD_VECTOR, causing an infinite loop.
This patch was tested with the original TensorFlow reproducer provided on the PR and shows a (very) slight improvement in
compile time.
[libc++] Add link to the running job from the benchmarking bot (#180217)
This allows following the progress of the benchmarking job and also
spotting when it fails.
Fixes #158296
[clang][ARM] Refactor argument handling in `EmitAArch64BuiltinExpr` (3/N) (NFC) (#183315)
Remove the remaining calls to `EmitScalarExpr` in
`EmitAArch64BuiltinExpr` that are no longer required.
This is a follow-up for #181794 and #181974 - please refer to
those PRs for more context.
[AST][NFC] Move AST dump colors into separate namespace (#183341)
Preparatory work for Clang AST PCH, which will include ASTDumperUtils.h.
Polluting the clang namespace with colors would lead to a collision with
clang/lib/Frontend/TextDiagnostic.cpp.
[clang][ASan][Fuchsia] Have Fuchsia use a dynamic shadow start (#182917)
These are the compiler changes that depend on the runtime changes in
https://github.com/llvm/llvm-project/pull/183154. The runtime changes
need to have landed first. The dynamic shadow global is still set to
zero, but this will change in the future.
[lld][Webassembly] Avoid a signed overflow on large sections (#183225)
Wasm section sizes are specified as u32s, and thus can be as large as
4GB. wasm-ld currently stores the offset into a section as an int32_t,
which overflows on large sections and results in a crash. This PR fixes
the issue by storing the offset as an int64_t, which accommodates any
valid wasm section and allows even larger sections to be caught instead
of wrapping around. It also adds extra checks so that un-encodeable
sections fail instead of producing garbage wasm binaries, and adds lit
tests to make sure it works. I confirmed the tests fail on main but pass
with this fix.
This is the same as https://github.com/llvm/llvm-project/pull/178287,
but it deletes the temporary files the tests create and requires that
the tests run on a 64-bit platform to avoid OOM issues caused by the
large binaries they create.
[AArch64] Extend condition optimizer to support unsigned comparisons (#144380)
We have to be extra careful not to allow unsigned wraps, however. This
also required adjusting the logic in adjustCmp, as well as comparing the
true immediate value with the add or sub taken into account.
Because SIGNED_MIN and SIGNED_MAX cannot be encoded as immediates, we do
not need to worry about those edge cases when dealing with unsigned
comparisons.
AMDGPU: Implement expansion for f64 exp
I asked AI to port the device libs reference implementation.
It mostly worked, though it got the compares wrong and also
missed a fold that happens in the compiler. With those fixed, I get
identical DAG output and almost identical GlobalISel output (differing
by an inverted compare and select). I also adjusted some stylistic choices.