LLVM/project 060ed4cllvm/test/CodeGen/AArch64 aarch64-neonvector-tensorflow-regression.ll aarch64-addv.ll

Update tests, remove regression test
DeltaFile
+0-13llvm/test/CodeGen/AArch64/aarch64-neonvector-tensorflow-regression.ll
+6-6llvm/test/CodeGen/AArch64/aarch64-addv.ll
+2-2llvm/test/CodeGen/AArch64/bitcast-extend.ll
+8-213 files

LLVM/project 453b73cllvm/test/CodeGen/AArch64 aarch64-neonvector-tensorflow-regression.ll

Compile time regression test
DeltaFile
+13-0llvm/test/CodeGen/AArch64/aarch64-neonvector-tensorflow-regression.ll
+13-01 files

LLVM/project 8c703dallvm/lib/Target/AArch64 AArch64MIPeepholeOpt.cpp, llvm/test/CodeGen/AArch64 peephole-insvigpr.mir fpclamptosat_vec.ll

[AArch64] Fold zero-high vector inserts in MI peephole optimisation

Summary
This patch follows on from #178227.
The previous ISel fold lowers the 64-bit case to:
    fmov d0, x0
    fmov d0, d0
which is not ideal; a single fmov d0, x0 would suffice.
The redundant copy comes from the INSERT_SUBREG/INSvi64lane.

This peephole detects <2 x i64> vectors made of a zeroed upper lane and a
low lane produced by FMOVXDr/FMOVDr, then removes the redundant copy.
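
The effect of the peephole can be illustrated with a toy model. The sketch below is not the LLVM implementation: it represents instructions as hypothetical `(opcode, dst, src)` tuples rather than MachineInstrs, and the function name is made up for illustration. It drops a same-register `fmov dN, dN` that immediately follows an `fmov dN, xM`, relying on the fact that a write to a D register already zeroes the upper 64 bits of the vector register.

```python
# Toy model only: instructions are (opcode, dst, src) tuples, not MachineInstrs.
def drop_redundant_fmov(instrs):
    """Remove `fmov dN, dN` when the previous instruction is `fmov dN, xM`."""
    out = []
    for op, dst, src in instrs:
        prev = out[-1] if out else None
        is_self_copy = op == "fmov" and dst == src and dst.startswith("d")
        prev_defines_dst_from_gpr = (
            prev is not None
            and prev[0] == "fmov"
            and prev[1] == dst
            and prev[2].startswith("x")
        )
        if is_self_copy and prev_defines_dst_from_gpr:
            continue  # upper lane is already zero, so the copy adds nothing
        out.append((op, dst, src))
    return out


before = [("fmov", "d0", "x0"), ("fmov", "d0", "d0")]
after = drop_redundant_fmov(before)
```

A self-copy with no preceding GPR move is left alone in this model, since it may be the instruction that zeroes the upper lane.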

Further updated tests and added MIR tests.
DeltaFile
+51-0llvm/test/CodeGen/AArch64/peephole-insvigpr.mir
+47-4llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
+24-24llvm/test/CodeGen/AArch64/fpclamptosat_vec.ll
+7-8llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+6-6llvm/test/CodeGen/AArch64/aarch64-addv.ll
+2-2llvm/test/CodeGen/AArch64/bitcast-extend.ll
+137-442 files not shown
+137-468 files

LLVM/project 5524ce8clang/lib/AST OpenMPClause.cpp, clang/lib/Parse ParseOpenMP.cpp

[OpenMP][Clang] Parsing support for num_teams lower bound (#180608)

According to OpenMP 5.2, the num_teams clause should support a
lower bound as a modifier for its argument. This PR adds parsing support
for the lower bound in the num_teams clause.
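
To make the accepted shape concrete, here is a minimal sketch of splitting a `num_teams` argument of the form `[lower-bound:]upper-bound`. It is not Clang's parser: `parse_num_teams` is a hypothetical helper, the bounds are kept as raw strings, and it naively treats the first colon as the separator, which a real parser cannot do for expressions that contain `?:`.

```python
# Hypothetical helper for illustration; Clang parses full expressions instead.
def parse_num_teams(arg):
    """Split a num_teams argument into (lower, upper); lower is None if absent."""
    text = arg.strip()
    if ":" in text:
        lower, upper = (part.strip() for part in text.split(":", 1))
        if not lower or not upper:
            raise ValueError(f"malformed num_teams argument: {arg!r}")
        return lower, upper
    if not text:
        raise ValueError("num_teams requires an upper bound")
    return None, text
```

For example, `parse_num_teams("2 : 8")` yields a lower bound of `"2"` and an upper bound of `"8"`, while `parse_num_teams("8")` yields only the upper bound.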
DeltaFile
+122-2clang/test/OpenMP/teams_num_teams_messages.cpp
+103-0clang/test/OpenMP/num_teams_clause_ast.cpp
+48-14clang/lib/Sema/SemaOpenMP.cpp
+56-0clang/lib/Parse/ParseOpenMP.cpp
+14-2clang/lib/AST/OpenMPClause.cpp
+2-2clang/test/OpenMP/target_teams_distribute_parallel_for_num_teams_messages.cpp
+345-203 files not shown
+353-249 files

LLVM/project b12de4cllvm/test/CodeGen/AArch64 aarch64-addv.ll bitcast-extend.ll

Updated tests
DeltaFile
+3-6llvm/test/CodeGen/AArch64/aarch64-addv.ll
+1-2llvm/test/CodeGen/AArch64/bitcast-extend.ll
+4-82 files

LLVM/project 5850f41llvm/test/CodeGen/AArch64 aarch64-neonvector-tensorflow-regression.ll peephole-insvigpr.mir

Update tests
Reverted peephole-insvigpr.mir
Updated existing tests
DeltaFile
+0-285llvm/test/CodeGen/AArch64/aarch64-neonvector-tensorflow-regression.ll
+0-51llvm/test/CodeGen/AArch64/peephole-insvigpr.mir
+24-24llvm/test/CodeGen/AArch64/fpclamptosat_vec.ll
+8-7llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+3-0llvm/test/CodeGen/AArch64/aarch64-addv.ll
+1-0llvm/test/CodeGen/AArch64/neon-lowhalf128-optimisation.ll
+36-3672 files not shown
+38-3678 files

LLVM/project 5021b7fllvm/lib/Target/AArch64 AArch64MIPeepholeOpt.cpp AArch64ISelLowering.cpp

Respond to comments for AArch64ISelLowering.cpp
Remove peephole
DeltaFile
+4-47llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
+10-9llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+14-562 files

LLVM/project 500c63bllvm/lib/Target/AArch64 aarch64-tensorflow-isel-regression.ll

Added regression test
DeltaFile
+18-0llvm/lib/Target/AArch64/aarch64-tensorflow-isel-regression.ll
+18-01 files

LLVM/project a6b2232llvm/lib/Target/AArch64 AArch64ISelLowering.cpp

Clang format
DeltaFile
+1-1llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+1-11 files

LLVM/project ca7b30allvm/lib/Target/AArch64 AArch64ISelLowering.cpp

removed all_of loop upon @ilinpv's suggestion
DeltaFile
+28-21llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+28-211 files

LLVM/project 5028136llvm/test/CodeGen/AArch64 aarch64-neonvector-tensorflow-regression.ll

Added reduced reproduction
DeltaFile
+285-0llvm/test/CodeGen/AArch64/aarch64-neonvector-tensorflow-regression.ll
+285-01 files

LLVM/project 10f07c7llvm/lib/Target/AArch64 AArch64ISelLowering.cpp

Clang format
DeltaFile
+2-4llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+2-41 files

LLVM/project f7ce058llvm/lib/Target/AArch64 AArch64ISelLowering.cpp

Clang format
DeltaFile
+2-3llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+2-31 files

LLVM/project e2dc9c7llvm/lib/Target/AArch64 AArch64MIPeepholeOpt.cpp AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 neon-lowhalf128-optimisation.ll peephole-insvigpr.mir

[AArch64] Fix regression from “Fold scalar-to-vector shuffles into DUP/FMOV”

This patch aims to fix the original compile-time regression by restricting the optimisation to non-constant splats.
Without the guard, the lowering loops forever: the CONCAT(SCALAR_TO_VECTOR, zero) folds back into the same BUILD_VECTOR, which
immediately re-enters LowerBUILD_VECTOR.

This patch was tested with the original TensorFlow reproduction provided on the PR and shows a (very) slight improvement in
compile time.
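
The ping-pong described above can be modelled with a toy rewrite loop. This is only a sketch under stated assumptions, not the SelectionDAG code: nodes are hypothetical `(kind, scalar)` pairs, an `int` scalar stands for a constant and a `str` for a non-constant value, and `MAX_STEPS` is an artificial cap so the unguarded case terminates.

```python
MAX_STEPS = 8  # hypothetical cap so the toy loop always terminates


def lower_build_vector(node, guard):
    """Return (final node, steps taken, converged) for a toy rewrite loop."""
    for step in range(MAX_STEPS):
        kind, scalar = node
        if kind == "BUILD_VECTOR":
            if guard and isinstance(scalar, int):
                return node, step, True  # constant: the fold is skipped
            node = ("CONCAT_S2V_ZERO", scalar)  # the fold under discussion
        else:
            if isinstance(scalar, int):
                node = ("BUILD_VECTOR", scalar)  # constant folds straight back
            else:
                return node, step, True  # non-constant: the fold sticks
    return node, MAX_STEPS, False


unguarded = lower_build_vector(("BUILD_VECTOR", 7), guard=False)
guarded = lower_build_vector(("BUILD_VECTOR", 7), guard=True)
non_constant = lower_build_vector(("BUILD_VECTOR", "x"), guard=True)
```

In this model the unguarded constant case never converges, the guarded constant case stops immediately, and the non-constant case settles after one rewrite.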
DeltaFile
+91-0llvm/test/CodeGen/AArch64/neon-lowhalf128-optimisation.ll
+51-0llvm/test/CodeGen/AArch64/peephole-insvigpr.mir
+47-4llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
+24-24llvm/test/CodeGen/AArch64/fpclamptosat_vec.ll
+27-0llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+7-8llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+247-365 files not shown
+253-5311 files

LLVM/project b8743de.github/workflows libcxx-run-benchmarks.yml

[libc++] Add link to the running job from the benchmarking bot (#180217)

This allows following the progress of the benchmarking job and also
spotting when it fails.

Fixes #158296
DeltaFile
+18-3.github/workflows/libcxx-run-benchmarks.yml
+18-31 files

LLVM/project 26c6b8cflang/lib/Lower/OpenMP ClauseProcessor.cpp, flang/test/Lower/OpenMP task-affinity.f90

Rebase and replace omp.iterators with omp.iterator
DeltaFile
+4-4mlir/test/Dialect/OpenMP/ops.mlir
+2-2flang/test/Lower/OpenMP/task-affinity.f90
+1-1flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+7-73 files

LLVM/project ae76defclang/lib/CodeGen/TargetBuiltins ARM.cpp

[clang][ARM] Refactor argument handling in `EmitAArch64BuiltinExpr` (3/N) (NFC) (#183315)

Remove the outstanding calls to `EmitScalarExpr` in
`EmitAArch64BuiltinExpr` that are no longer required.

This is a follow-up for #181794 and #181974 - please refer to
those PRs for more context.
DeltaFile
+32-65clang/lib/CodeGen/TargetBuiltins/ARM.cpp
+32-651 files

LLVM/project 946e4b4flang/lib/Lower/OpenMP Utils.cpp ClauseProcessor.cpp, flang/test/Lower/OpenMP task-affinity.f90

Rewrite affinity utility functions with HLFIR APIs
DeltaFile
+126-130flang/lib/Lower/OpenMP/Utils.cpp
+103-46flang/test/Lower/OpenMP/task-affinity.f90
+23-47flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+6-3flang/lib/Lower/OpenMP/Utils.h
+258-2264 files

LLVM/project da10c25clang/include/clang/AST ASTDumperUtils.h TextNodeDumper.h, clang/lib/AST TextNodeDumper.cpp ASTDumper.cpp

[AST][NFC] Move AST dump colors into separate namespace (#183341)

Preparatory work for Clang AST PCH, which will include ASTDumperUtils.h.
Polluting the clang namespace with colors would lead to a collision with
clang/lib/Frontend/TextDiagnostic.cpp.
DeltaFile
+62-62clang/lib/AST/TextNodeDumper.cpp
+45-42clang/include/clang/AST/ASTDumperUtils.h
+5-5clang/lib/AST/ASTDumper.cpp
+1-1clang/include/clang/AST/TextNodeDumper.h
+113-1104 files

LLVM/project 545c2a7llvm/lib/Transforms/Instrumentation AddressSanitizer.cpp, llvm/test/Instrumentation/AddressSanitizer fuchsia.ll

[clang][ASan][Fuchsia] Have Fuchsia use a dynamic shadow start (#182917)

These are the compiler changes that depend on the runtime changes in
https://github.com/llvm/llvm-project/pull/183154. The runtime changes
need to have landed first. The dynamic shadow global is still set to
zero, but this will change in the future.
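
For context, ASan maps an application address to its shadow byte as `(addr >> scale) + offset`. The sketch below shows how a dynamic shadow start changes only where the offset comes from. It is an illustration, not the instrumentation pass: `shadow_address` and `dynamic_shadow_base` are hypothetical names, and the scale of 3 is ASan's usual default rather than something stated in this commit.

```python
SHADOW_SCALE = 3  # assumed default: one shadow byte covers 8 application bytes


def shadow_address(addr, shadow_base):
    """Map an application address to the address of its shadow byte."""
    return (addr >> SHADOW_SCALE) + shadow_base


# Stands in for the runtime-provided global; per the commit it is still zero.
dynamic_shadow_base = 0

current = shadow_address(0x1000, dynamic_shadow_base)
relocated = shadow_address(0x1000, 0x7FFF8000)  # arbitrary non-zero base
```

With the base still at zero, the dynamic form computes the same shadow addresses as a fixed zero offset, so behaviour would only differ once the runtime starts setting a non-zero base.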
DeltaFile
+9-0llvm/test/Instrumentation/AddressSanitizer/fuchsia.ll
+4-3llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
+13-32 files

LLVM/project 3495ae8flang/lib/Lower/OpenMP ClauseProcessor.cpp Utils.cpp, flang/test/Lower/OpenMP task-affinity.f90

Support iterator modifier in affinity clause
DeltaFile
+143-20flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+70-18mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+53-35flang/test/Lower/OpenMP/task-affinity.f90
+83-0flang/lib/Lower/OpenMP/Utils.cpp
+53-0mlir/test/Dialect/OpenMP/ops.mlir
+16-0flang/lib/Lower/OpenMP/Utils.h
+418-733 files not shown
+428-899 files

LLVM/project a52f611lld/test/wasm large-section.test large-debug-section.test, lld/wasm OutputSections.cpp InputChunks.h

[lld][Webassembly] Avoid a signed overflow on large sections (#183225)

wasm section sizes are specified as u32s, and thus can be as large as
4GB. wasm-ld currently stores the offset into a section as an int32_t,
which overflows on large sections and results in a crash. This change
makes it an int64_t to accommodate any valid wasm section and to allow
catching even larger sections instead of wrapping around.

This PR fixes the issue by storing the offset as an int64_t. It also
adds extra checks so that un-encodable sections fail instead of
producing garbage wasm binaries, and adds lit tests to make sure it
works. I confirmed the test fails on main but passes with this fix.

This is the same as https://github.com/llvm/llvm-project/pull/178287 but
deletes the temporary files the tests create and requires the tests run
on a 64-bit platform to avoid OOM issues due to the large binaries it
creates.
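
The overflow itself is easy to reproduce outside the linker. The following sketch uses `ctypes` to emulate 32-bit signed storage and adds a size check in the spirit of the new diagnostics; `as_int32` and `check_section_size` are hypothetical helpers, not wasm-ld functions.

```python
import ctypes

U32_MAX = 2**32 - 1  # largest size a u32 section-size field can encode


def as_int32(value):
    """Emulate storing a value in an int32_t (wraps on overflow)."""
    return ctypes.c_int32(value).value


def check_section_size(size):
    """Reject sizes that cannot be encoded as a u32 instead of wrapping."""
    if size > U32_MAX:
        raise ValueError(f"section too large: {size} bytes")
    return size


offset = 3 * 1024**3          # a 3 GiB offset, valid for a u32-sized section
wrapped = as_int32(offset)    # goes negative once squeezed into 32 signed bits
```

A 64-bit signed offset holds every valid u32 value without wrapping, which leaves room to detect and report sizes beyond the u32 limit.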
DeltaFile
+37-0lld/test/wasm/large-section.test
+31-0lld/test/wasm/large-debug-section.test
+23-0lld/test/wasm/section-too-large.test
+14-1lld/wasm/OutputSections.cpp
+5-1lld/wasm/InputChunks.h
+110-25 files

LLVM/project d2d862alibc/shared/math nexttowardf16.h, libc/src/__support/math CMakeLists.txt nexttowardf16.h

[libc][math] Refactor nexttoward family to header-only (#181685)

Closes https://github.com/llvm/llvm-project/issues/181684
DeltaFile
+82-3utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+48-0libc/src/__support/math/CMakeLists.txt
+34-0libc/src/__support/math/nexttowardf16.h
+29-0libc/shared/math/nexttowardf16.h
+28-0libc/src/__support/math/nexttowardl.h
+28-0libc/src/__support/math/nexttoward.h
+249-315 files not shown
+430-4321 files

LLVM/project e07a554llvm/lib/Target/AArch64 AArch64ConditionOptimizer.cpp, llvm/test/CodeGen/AArch64 combine-comparisons-by-cse.ll aarch64-condopt-unsigned.mir

[AArch64] Extend condition optimizer to support unsigned comparisons (#144380)

We have to be extra careful not to allow unsigned wraps, however. This
also required adjusting the logic in adjustCmp, as well as comparing the
true immediate value with the add or sub taken into account.

Because SIGNED_MIN and SIGNED_MAX cannot be an immediate, we do not need
to worry about those edge cases when dealing with unsigned comparisons.
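
The wrap hazard can be checked exhaustively at a small bit width. The sketch below assumes the adjustment in question turns `x >u imm` into `x >=u imm + 1`; it is a model of that one rewrite, not the pass itself, and `adjust_ugt_to_uge` is a hypothetical name.

```python
BITS = 8  # small width so the check can be exhaustive
MASK = (1 << BITS) - 1


def adjust_ugt_to_uge(imm):
    """Rewrite `x >u imm` as `x >=u imm + 1`, or return None if imm + 1 wraps."""
    if imm == MASK:
        return None  # imm + 1 would wrap to 0 and change the meaning
    return imm + 1


def equivalent(imm):
    """Check the rewritten compare against the original for every x."""
    new_imm = adjust_ugt_to_uge(imm)
    return all((x > imm) == (x >= new_imm) for x in range(MASK + 1))


all_safe = all(equivalent(imm) for imm in range(MASK))
wrapped_imm = (MASK + 1) & MASK  # what an unchecked increment would produce
```

If the increment were allowed to wrap, `x >=u 0` would hold for every `x` while the original `x >u MASK` never holds, so the rewrite has to be rejected in that case.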
DeltaFile
+663-0llvm/test/CodeGen/AArch64/combine-comparisons-by-cse.ll
+484-0llvm/test/CodeGen/AArch64/aarch64-condopt-unsigned.mir
+69-25llvm/lib/Target/AArch64/AArch64ConditionOptimizer.cpp
+1,216-253 files

LLVM/project 00de858llvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlan.h, llvm/test/Transforms/LoopVectorize vplan-based-stride-mv.ll

[VPlan] Implement VPlan-based stride speculation
DeltaFile
+998-1,108llvm/test/Transforms/LoopVectorize/vplan-based-stride-mv.ll
+264-145llvm/test/Transforms/LoopVectorize/VPlan/vplan-based-stride-mv.ll
+292-3llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+43-0llvm/lib/Transforms/Vectorize/VPlan.h
+5-5llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+7-0llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+1,609-1,2615 files not shown
+1,630-1,26411 files

LLVM/project 269b2cbllvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp

Capitalize again
DeltaFile
+63-61llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+63-611 files

LLVM/project 525927ellvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp

Capitalize DL
DeltaFile
+57-57llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+57-571 files

LLVM/project 8e053bellvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp

Capitalize
DeltaFile
+41-41llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+41-411 files

LLVM/project bf50461llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp AMDGPUISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.exp10.f64.ll llvm.exp.f64.ll

AMDGPU: Implement expansion for f64 exp

I asked AI to port the device libs reference implementation.
It mostly worked, though it got the compares wrong and also
missed a fold that happened in the compiler. With that fixed I get
identical DAG output, and almost the same GlobalISel output (differing
by an inverted compare and select). Also adjusted some stylistic choices.
DeltaFile
+11,178-0llvm/test/CodeGen/AMDGPU/llvm.exp10.f64.ll
+10,242-0llvm/test/CodeGen/AMDGPU/llvm.exp.f64.ll
+9,987-0llvm/test/CodeGen/AMDGPU/llvm.exp2.f64.ll
+117-9llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+116-1llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+31-7llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+31,671-176 files not shown
+31,729-6512 files

LLVM/project f1a9deelldb/source/Host/windows ConnectionGenericFileWindows.cpp

[lldb][windows] refactor ConnectionGenericFile's Read and Write methods (#183332)

DeltaFile
+97-151lldb/source/Host/windows/ConnectionGenericFileWindows.cpp
+97-1511 files