LLVM/project 5734d97 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp aarch64-tensorflow-isel-regression.ll, llvm/test/CodeGen/AArch64 neon-lowhalf128-optimisation.ll aarch64-addv.ll

[AArch64] Fix regression from "Fold scalar-to-vector shuffles into DUP/FMOV" (#178227)

Revised #166962.

This patch aims to fix the original compile time regression by
restricting the optimisation to run only on non-constant splats. Without
the guard, an infinite loop is caused because the
`CONCAT(SCALAR_TO_VECTOR, zero)` folds back into the same `BUILD_VECTOR`
and immediately re-enters `LowerBUILD_VECTOR`.

This patch was tested with the original TensorFlow reproduction provided
on the PR and shows a (very) slight improvement on compile-time.
Delta File
+92 -0 llvm/test/CodeGen/AArch64/neon-lowhalf128-optimisation.ll
+32 -0 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+18 -0 llvm/lib/Target/AArch64/aarch64-tensorflow-isel-regression.ll
+6 -9 llvm/test/CodeGen/AArch64/aarch64-addv.ll
+4 -4 llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+2 -3 llvm/test/CodeGen/AArch64/bitcast-extend.ll
+154 -16 3 files not shown
+157 -25 9 files

LLVM/project 6d5e051 clang-tools-extra/clang-doc JSONGenerator.cpp, clang-tools-extra/clang-doc/assets clang-doc-mustache.css

[clang-doc]: Enable horizontal wrapping on longer function definitions (#181417)

This patch enables wrapping for longer function and template definitions
in the generated HTML. It currently uses the number of parameters to
decide whether to wrap. If a function or template has more than 2
parameters, they are printed one per line. It also fixes a styling
bug where a trailing comma was left after the last parameter.
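
The wrapping rule described above can be sketched roughly as follows. This is only an illustration of the rule (more than 2 parameters means one parameter per line, with no trailing comma); `formatSignature` and `WrapThreshold` are hypothetical names and this is not clang-doc's actual implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical threshold taken from the commit message: wrap when a
// definition has more than 2 parameters.
constexpr std::size_t WrapThreshold = 2;

// Illustrative helper (not clang-doc code): renders a parameter list either
// on one line or one parameter per line, never leaving a trailing comma.
std::string formatSignature(const std::string &Name,
                            const std::vector<std::string> &Params) {
  const bool Wrap = Params.size() > WrapThreshold;
  std::string Out = Name + "(";
  for (std::size_t I = 0; I < Params.size(); ++I) {
    if (Wrap)
      Out += "\n    ";
    Out += Params[I];
    if (I + 1 != Params.size()) // comma only between parameters
      Out += Wrap ? "," : ", ";
  }
  if (Wrap)
    Out += "\n";
  Out += ")";
  return Out;
}
```

With two parameters the signature stays on one line; with three, each parameter gets its own indented line and the last one carries no comma.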
Delta File
+160 -12 clang-tools-extra/test/clang-doc/templates.cpp
+41 -30 clang-tools-extra/clang-doc/JSONGenerator.cpp
+10 -6 clang-tools-extra/test/clang-doc/json/function-requires.cpp
+9 -0 clang-tools-extra/clang-doc/assets/clang-doc-mustache.css
+5 -3 clang-tools-extra/test/clang-doc/json/class.cpp
+4 -2 clang-tools-extra/test/clang-doc/json/class-template.cpp
+229 -53 7 files not shown
+246 -63 13 files

LLVM/project bfaa15e libc/src/__support/math CMakeLists.txt f16divf128.h, libc/src/math/generic CMakeLists.txt

[libc][math] Refactor float16 basic operations to header-only (#181745)

closes: #181744
Delta File
+159 -0 utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+123 -0 libc/src/__support/math/CMakeLists.txt
+12 -24 libc/src/math/generic/CMakeLists.txt
+35 -0 libc/src/__support/math/f16divf128.h
+35 -0 libc/src/__support/math/f16subf128.h
+35 -0 libc/src/__support/math/f16mulf128.h
+399 -24 36 files not shown
+1,098 -75 42 files

LLVM/project ed1a1ed compiler-rt/test/fuzzer reduce_inputs.test

[compiler-rt][Fuzzer] Relax reduce_inputs.test to account for non-determinism (#182495)

I have seen that very occasionally this test is failing on a bot, with
only 3 files in the corpus. After running the test in a loop 4000+
times, I witnessed this same failure twice.

In both cases the first corpus member was some string not containing a
'F'; the second corpus member was 'F[' or 'FZ'; and the final corpus
member 'FUZ'.

In a normal run there is an intermediate corpus member 'FU.' - so this
test is failing in very rare cases where the fuzzer gets lucky and
matches 2 branch conditions in one mutation.

This patch allows the FileCheck condition to match 3 or 4 corpus files.
It may be possible for the fuzzer to reach the target in 2 files, but I
think that if that is possible, it will be exceptionally rare.

rdar://170440934
Delta File
+3 -2 compiler-rt/test/fuzzer/reduce_inputs.test
+3 -2 1 files

LLVM/project 81afd93 mlir/include/mlir/Dialect/Arith/IR ArithOpsInterfaces.td ArithOps.td, mlir/lib/Dialect/Arith/IR ArithCanonicalization.td

[mlir][arith] Add nneg to extui and uitofp. (#183165)

This patchset adds the missing nneg (non-negative) flag to extui and
uitofp, which denotes that the operand is known to be non-negative.
The semantics of this flag mirror the LLVM semantics.

[From:](https://discourse.llvm.org/t/rfc-add-zext-nneg-flag/73914) 

> If the nneg flag is set, and the zext argument is negative, the result
is a poison value.
> A corollary is that replacing a zext nneg with sext is a refinement.


[and](https://discourse.llvm.org/t/rfc-support-nneg-flag-with-uitofp/77988):

> uitofp nneg iN %x to fM returns poison if %x is negative
> A corollary is that uitofp nneg iN %x to fM is equivalent to sitofp iN
%x to fM.


    [7 lines not shown]
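
The quoted semantics can be illustrated with a small model. This is a sketch, not MLIR or LLVM code: `std::nullopt` stands in for poison, and `zextNneg`, `sext`, `uitofpNneg` and `sitofp` are hypothetical helper names for the i32 to i64 and i32 to f64 cases. It shows why replacing the flagged operation with its signed counterpart is a refinement: whenever the flagged operation is not poison, both give the same result.

```cpp
#include <cstdint>
#include <optional>

// Models `zext nneg i32 -> i64`; std::nullopt stands in for poison.
std::optional<std::uint64_t> zextNneg(std::int32_t X) {
  if (X < 0)
    return std::nullopt; // nneg violated: the result is poison
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(X));
}

// Models `sext i32 -> i64`, which is always defined.
std::uint64_t sext(std::int32_t X) {
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(X));
}

// Models `uitofp nneg i32 -> f64`; poison when the operand is negative.
std::optional<double> uitofpNneg(std::int32_t X) {
  if (X < 0)
    return std::nullopt;
  return static_cast<double>(static_cast<std::uint32_t>(X));
}

// Models `sitofp i32 -> f64`.
double sitofp(std::int32_t X) { return static_cast<double>(X); }
```

For every non-negative input the flagged and signed versions agree, so a rewrite from one to the other only changes behaviour on inputs where the original was already poison.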
Delta File
+127 -0 mlir/test/Dialect/Arith/canonicalize.mlir
+47 -0 mlir/include/mlir/Dialect/Arith/IR/ArithOpsInterfaces.td
+38 -2 mlir/include/mlir/Dialect/Arith/IR/ArithOps.td
+25 -13 mlir/lib/Dialect/Arith/IR/ArithCanonicalization.td
+28 -0 mlir/test/Dialect/Arith/ops.mlir
+24 -0 mlir/test/Conversion/ArithToLLVM/arith-to-llvm.mlir
+289 -15 2 files not shown
+315 -17 8 files

LLVM/project 09d7b89 libcxx/include/__math gamma.h, libcxx/include/__random binomial_distribution.h

[libc++] Add a thread-safe version of std::lgamma in the dylib (#153631)

Libc++ currently redeclares ::lgamma_r on platforms that provide it.
This causes issues when building with modules, and redeclaring functions
provided by another library (here the C library) is bad hygiene.

Instead, use an asm declaration to call the right function without
having to redeclare it.
Delta File
+4 -23 libcxx/include/__random/binomial_distribution.h
+26 -0 libcxx/include/__math/gamma.h
+30 -23 2 files

LLVM/project a224ba0 flang/include/flang/Lower CUDA.h, flang/lib/Lower CUDA.cpp Bridge.cpp

[flang][cuda] Support data transfer with parenthesis around rhs (#183201)

Delta File
+26 -4 flang/lib/Lower/CUDA.cpp
+13 -0 flang/test/Lower/CUDA/cuda-data-transfer.cuf
+6 -5 flang/lib/Lower/Bridge.cpp
+2 -1 flang/include/flang/Lower/CUDA.h
+47 -10 4 files

LLVM/project 060ed4c llvm/test/CodeGen/AArch64 aarch64-neonvector-tensorflow-regression.ll aarch64-addv.ll

Update tests, remove regression test
Delta File
+0 -13 llvm/test/CodeGen/AArch64/aarch64-neonvector-tensorflow-regression.ll
+6 -6 llvm/test/CodeGen/AArch64/aarch64-addv.ll
+2 -2 llvm/test/CodeGen/AArch64/bitcast-extend.ll
+8 -21 3 files

LLVM/project 453b73c llvm/test/CodeGen/AArch64 aarch64-neonvector-tensorflow-regression.ll

Compile time regression test
Delta File
+13 -0 llvm/test/CodeGen/AArch64/aarch64-neonvector-tensorflow-regression.ll
+13 -0 1 files

LLVM/project 8c703da llvm/lib/Target/AArch64 AArch64MIPeepholeOpt.cpp, llvm/test/CodeGen/AArch64 peephole-insvigpr.mir fpclamptosat_vec.ll

[AArch64] Fold zero-high vector inserts in MI peephole optimisation

Summary
This patch follows on from #178227.
The previous ISel fold lowers the 64-bit case to:
    fmov d0, x0
    fmov d0, d0
which is not ideal and could be fmov d0, x0.
A redundant copy comes from the INSERT_SUBREG/INSvi64lane.

This peephole detects <2 x i64> vectors made of a zeroed upper lane and a
low lane produced by FMOVXDr/FMOVDr, then removes the redundant copy.

Further updated tests and added MIR tests.
Delta File
+51 -0 llvm/test/CodeGen/AArch64/peephole-insvigpr.mir
+47 -4 llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
+24 -24 llvm/test/CodeGen/AArch64/fpclamptosat_vec.ll
+7 -8 llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+6 -6 llvm/test/CodeGen/AArch64/aarch64-addv.ll
+2 -2 llvm/test/CodeGen/AArch64/bitcast-extend.ll
+137 -44 2 files not shown
+137 -46 8 files

LLVM/project 5524ce8 clang/lib/AST OpenMPClause.cpp, clang/lib/Parse ParseOpenMP.cpp

[OpenMP][Clang] Parsing support for num_teams lower bound (#180608)

According to OpenMP 5.2, the num_teams clause should support a
lower bound as a modifier for its argument. This PR adds parsing support
for the lower bound in the num_teams clause.
Delta File
+122 -2 clang/test/OpenMP/teams_num_teams_messages.cpp
+103 -0 clang/test/OpenMP/num_teams_clause_ast.cpp
+48 -14 clang/lib/Sema/SemaOpenMP.cpp
+56 -0 clang/lib/Parse/ParseOpenMP.cpp
+14 -2 clang/lib/AST/OpenMPClause.cpp
+2 -2 clang/test/OpenMP/target_teams_distribute_parallel_for_num_teams_messages.cpp
+345 -20 3 files not shown
+353 -24 9 files

LLVM/project b12de4c llvm/test/CodeGen/AArch64 aarch64-addv.ll bitcast-extend.ll

Updated tests
Delta File
+3 -6 llvm/test/CodeGen/AArch64/aarch64-addv.ll
+1 -2 llvm/test/CodeGen/AArch64/bitcast-extend.ll
+4 -8 2 files

LLVM/project 5850f41 llvm/test/CodeGen/AArch64 aarch64-neonvector-tensorflow-regression.ll peephole-insvigpr.mir

Update tests
Reverted peephole-insvigpr.mir
Updated existing tests
Delta File
+0 -285 llvm/test/CodeGen/AArch64/aarch64-neonvector-tensorflow-regression.ll
+0 -51 llvm/test/CodeGen/AArch64/peephole-insvigpr.mir
+24 -24 llvm/test/CodeGen/AArch64/fpclamptosat_vec.ll
+8 -7 llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+3 -0 llvm/test/CodeGen/AArch64/aarch64-addv.ll
+1 -0 llvm/test/CodeGen/AArch64/neon-lowhalf128-optimisation.ll
+36 -367 2 files not shown
+38 -367 8 files

LLVM/project 5021b7f llvm/lib/Target/AArch64 AArch64MIPeepholeOpt.cpp AArch64ISelLowering.cpp

Respond to comments for AArch64ISelLowering.cpp
Remove peephole
Delta File
+4 -47 llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
+10 -9 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+14 -56 2 files

LLVM/project 500c63b llvm/lib/Target/AArch64 aarch64-tensorflow-isel-regression.ll

Added regression test
Delta File
+18 -0 llvm/lib/Target/AArch64/aarch64-tensorflow-isel-regression.ll
+18 -0 1 files

LLVM/project a6b2232 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp

Clang format
Delta File
+1 -1 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+1 -1 1 files

LLVM/project ca7b30a llvm/lib/Target/AArch64 AArch64ISelLowering.cpp

removed all_of loop upon @ilinpv's suggestion
Delta File
+28 -21 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+28 -21 1 files

LLVM/project 5028136 llvm/test/CodeGen/AArch64 aarch64-neonvector-tensorflow-regression.ll

Added reduced reproduction
Delta File
+285 -0 llvm/test/CodeGen/AArch64/aarch64-neonvector-tensorflow-regression.ll
+285 -0 1 files

LLVM/project 10f07c7 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp

Clang format
Delta File
+2 -4 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+2 -4 1 files

LLVM/project f7ce058 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp

Clang format
Delta File
+2 -3 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+2 -3 1 files

LLVM/project e2dc9c7 llvm/lib/Target/AArch64 AArch64MIPeepholeOpt.cpp AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 neon-lowhalf128-optimisation.ll peephole-insvigpr.mir

[AArch64] Fix regression from "Fold scalar-to-vector shuffles into DUP/FMOV"

This patch aims to fix the original compile time regression by restricting the optimisation to run only on non-constant splats.
Without the guard, an infinite loop is caused because the CONCAT(SCALAR_TO_VECTOR, zero) folds back into the same BUILD_VECTOR and
immediately re-enters LowerBUILD_VECTOR.

This patch was tested with the original TensorFlow reproduction provided on the PR and shows a (very) slight improvement on
compile-time.
Delta File
+91 -0 llvm/test/CodeGen/AArch64/neon-lowhalf128-optimisation.ll
+51 -0 llvm/test/CodeGen/AArch64/peephole-insvigpr.mir
+47 -4 llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
+24 -24 llvm/test/CodeGen/AArch64/fpclamptosat_vec.ll
+27 -0 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+7 -8 llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+247 -36 5 files not shown
+253 -53 11 files

LLVM/project b8743de .github/workflows libcxx-run-benchmarks.yml

[libc++] Add link to the running job from the benchmarking bot (#180217)

This allows following the progress of the benchmarking job and also
spotting when it fails.

Fixes #158296
Delta File
+18 -3 .github/workflows/libcxx-run-benchmarks.yml
+18 -3 1 files

LLVM/project 26c6b8c flang/lib/Lower/OpenMP ClauseProcessor.cpp, flang/test/Lower/OpenMP task-affinity.f90

Rebase and replace omp.iterators with omp.iterator
Delta File
+4 -4 mlir/test/Dialect/OpenMP/ops.mlir
+2 -2 flang/test/Lower/OpenMP/task-affinity.f90
+1 -1 flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+7 -7 3 files

LLVM/project ae76def clang/lib/CodeGen/TargetBuiltins ARM.cpp

[clang][ARM] Refactor argument handling in `EmitAArch64BuiltinExpr` (3/N) (NFC) (#183315)

Remove the outstanding calls to `EmitScalarExpr` in
`EmitAArch64BuiltinExpr` that are no longer required.

This is a follow-up for #181794 and #181974 - please refer to
those PRs for more context.
Delta File
+32 -65 clang/lib/CodeGen/TargetBuiltins/ARM.cpp
+32 -65 1 files

LLVM/project 946e4b4 flang/lib/Lower/OpenMP Utils.cpp ClauseProcessor.cpp, flang/test/Lower/OpenMP task-affinity.f90

Rewrite functions in affinity utility functions with hlfir apis
Delta File
+126 -130 flang/lib/Lower/OpenMP/Utils.cpp
+103 -46 flang/test/Lower/OpenMP/task-affinity.f90
+23 -47 flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+6 -3 flang/lib/Lower/OpenMP/Utils.h
+258 -226 4 files

LLVM/project da10c25 clang/include/clang/AST ASTDumperUtils.h TextNodeDumper.h, clang/lib/AST TextNodeDumper.cpp ASTDumper.cpp

[AST][NFC] Move AST dump colors into separate namespace (#183341)

Preparatory work for Clang AST PCH, which will include ASTDumperUtils.h.
Polluting the clang namespace with colors would lead to a collision with
clang/lib/Frontend/TextDiagnostic.cpp.
Delta File
+62 -62 clang/lib/AST/TextNodeDumper.cpp
+45 -42 clang/include/clang/AST/ASTDumperUtils.h
+5 -5 clang/lib/AST/ASTDumper.cpp
+1 -1 clang/include/clang/AST/TextNodeDumper.h
+113 -110 4 files

LLVM/project 545c2a7 llvm/lib/Transforms/Instrumentation AddressSanitizer.cpp, llvm/test/Instrumentation/AddressSanitizer fuchsia.ll

[clang][ASan][Fuchsia] Have Fuchsia use a dynamic shadow start (#182917)

These are the compiler changes that depend on the runtime changes in
https://github.com/llvm/llvm-project/pull/183154. The runtime changes
need to have landed first. The dynamic shadow global is still set to
zero, but this will change in the future.
Delta File
+9 -0 llvm/test/Instrumentation/AddressSanitizer/fuchsia.ll
+4 -3 llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
+13 -3 2 files

LLVM/project 3495ae8 flang/lib/Lower/OpenMP ClauseProcessor.cpp Utils.cpp, flang/test/Lower/OpenMP task-affinity.f90

Support iterator modifier in affinity clause
Delta File
+143 -20 flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+70 -18 mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+53 -35 flang/test/Lower/OpenMP/task-affinity.f90
+83 -0 flang/lib/Lower/OpenMP/Utils.cpp
+53 -0 mlir/test/Dialect/OpenMP/ops.mlir
+16 -0 flang/lib/Lower/OpenMP/Utils.h
+418 -73 3 files not shown
+428 -89 9 files

LLVM/project a52f611 lld/test/wasm large-section.test large-debug-section.test, lld/wasm OutputSections.cpp InputChunks.h

[lld][Webassembly] Avoid a signed overflow on large sections (#183225)

Wasm section sizes are specified as u32s, and thus can be as large as
4GB. wasm-ld currently stores the offset into a section as an int32_t,
which overflows on large sections and results in a crash. This change
makes it an int64_t to accommodate any valid wasm section and allow
catching even larger sections instead of wrapping around.

This PR fixes the issue by storing the offset as an int64_t, adding
extra checks so that un-encodeable sections fail instead of
producing garbage wasm binaries, and adding lit tests to make sure it
works. I confirmed the test fails on main but passes with this fix.

This is the same as https://github.com/llvm/llvm-project/pull/178287 but
deletes the temporary files the tests create and requires the tests run
on a 64-bit platform to avoid OOM issues due to the large binaries it
creates.
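
The range problem can be illustrated with plain integer arithmetic. The sketch below is not wasm-ld code: `isEncodableSectionSize`, `overflowsInt32` and `MaxSectionSize` are hypothetical names, and it only shows why a 64-bit signed offset can both represent every valid u32 section size and detect sizes that cannot be encoded.

```cpp
#include <cstdint>
#include <limits>

// Wasm encodes section sizes as u32, so valid sizes go up to 2^32 - 1.
constexpr std::int64_t MaxSectionSize =
    std::numeric_limits<std::uint32_t>::max();

// Hypothetical check (not wasm-ld code): with a 64-bit offset, an oversized
// section is detected instead of wrapping around.
bool isEncodableSectionSize(std::int64_t Size) {
  return Size >= 0 && Size <= MaxSectionSize;
}

// True when an offset no longer fits in int32_t, i.e. once it exceeds
// 2^31 - 1 even though the section itself may still be valid.
bool overflowsInt32(std::int64_t Offset) {
  return Offset > std::numeric_limits<std::int32_t>::max();
}
```

A 3 GiB section, for example, is a valid wasm section size but already exceeds what an int32_t offset can hold, while a 5 GiB section is rejected by the encodability check.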
Delta File
+37 -0 lld/test/wasm/large-section.test
+31 -0 lld/test/wasm/large-debug-section.test
+23 -0 lld/test/wasm/section-too-large.test
+14 -1 lld/wasm/OutputSections.cpp
+5 -1 lld/wasm/InputChunks.h
+110 -2 5 files

LLVM/project d2d862a libc/shared/math nexttowardf16.h, libc/src/__support/math CMakeLists.txt nexttowardf16.h

[libc][math] Refactor nexttoward family to header-only (#181685)

Closes https://github.com/llvm/llvm-project/issues/181684
Delta File
+82 -3 utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+48 -0 libc/src/__support/math/CMakeLists.txt
+34 -0 libc/src/__support/math/nexttowardf16.h
+29 -0 libc/shared/math/nexttowardf16.h
+28 -0 libc/src/__support/math/nexttowardl.h
+28 -0 libc/src/__support/math/nexttoward.h
+249 -3 15 files not shown
+430 -43 21 files