LLVM/project 0b681ee llvm/test/Transforms/SLPVectorizer/X86 arith-mul-smulo.ll arith-add-saddo.ll

[SLP] Vectorize struct-returning intrinsics

Allow SLP to combine calls that return a literal struct
(llvm.sincos, llvm.*.with.overflow, llvm.frexp, ...) across lanes into a
single call returning a struct of vectors, by widening {T, T, ...} to
{<VF x T>, ...} via VectorTypeUtils and emitting extractvalue +
extractelement for external uses.
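As an informal model (C++, not LLVM IR, and not the SLP implementation), the sketch below contrasts per-lane calls that each return a `{value, overflow}` struct, in the spirit of `llvm.sadd.with.overflow.i32`, with a single widened call that returns a struct of arrays. Reading one lane back corresponds to `extractvalue` followed by `extractelement`. `saddScalar`, `saddWidened` and `extractLane` are hypothetical names, and the sketch assumes a GCC- or Clang-compatible compiler for `__builtin_add_overflow`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar form: one call per lane, each returning a literal struct {T, i1}.
struct ScalarResult {
  int value;
  bool overflow;
};

// Widened form: one call returning {<VF x T>, <VF x i1>}.
template <std::size_t VF>
struct WidenedResult {
  std::array<int, VF> value;
  std::array<bool, VF> overflow;
};

inline ScalarResult saddScalar(int a, int b) {
  ScalarResult r{};
  // Stores the wrapped sum in r.value and returns true on signed overflow.
  r.overflow = __builtin_add_overflow(a, b, &r.value);
  return r;
}

template <std::size_t VF>
WidenedResult<VF> saddWidened(const std::array<int, VF>& a,
                              const std::array<int, VF>& b) {
  WidenedResult<VF> r{};
  for (std::size_t i = 0; i < VF; ++i)
    r.overflow[i] = __builtin_add_overflow(a[i], b[i], &r.value[i]);
  return r;
}

// An external scalar use of one lane: select the struct field
// (extractvalue), then the lane within that field (extractelement).
template <std::size_t VF>
ScalarResult extractLane(const WidenedResult<VF>& r, std::size_t lane) {
  return ScalarResult{r.value[lane], r.overflow[lane]};
}
```

Every lane read back from the widened result matches what the corresponding scalar call would have produced, which is the property the vectorized form has to preserve.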

Original Pull Request: https://github.com/llvm/llvm-project/pull/195521

Reviewers: hiraditya, RKSimon, bababuck

Pull Request: https://github.com/llvm/llvm-project/pull/196756
DeltaFile
+549 -615 llvm/test/Transforms/SLPVectorizer/X86/arith-mul-smulo.ll
+449 -615 llvm/test/Transforms/SLPVectorizer/X86/arith-add-saddo.ll
+449 -615 llvm/test/Transforms/SLPVectorizer/X86/arith-add-uaddo.ll
+449 -615 llvm/test/Transforms/SLPVectorizer/X86/arith-sub-ssubo.ll
+449 -615 llvm/test/Transforms/SLPVectorizer/X86/arith-sub-usubo.ll
+429 -615 llvm/test/Transforms/SLPVectorizer/X86/arith-mul-umulo.ll
+2,774 -3,690 (4 files not shown)
+3,268 -3,913 (10 files)

LLVM/project 8b6731a llvm/lib/Transforms/Scalar LoopInterchange.cpp, llvm/test/Transforms/LoopInterchange ninf.ll reduction2mem.ll

[LoopInterchange] Drop ninf from instructions involved in interchange
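The commit message does not state the motivation. One plausible reading (ours, not the author's) is that interchanging loops changes the order in which a floating-point reduction accumulates, and a different order can overflow to infinity where the original did not. Under `ninf`, an infinite operand or result makes the instruction's result poison, so a flag that held for the original order may not hold afterwards. The C++ sketch sums the same three values in two orders, only one of which passes through infinity; `SumResult` and `sumInOrder` are hypothetical names.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

struct SumResult {
  float value;
  bool sawInf; // true if any partial sum was infinite
};

// Accumulate v[order[0]], v[order[1]], v[order[2]] from left to right.
inline SumResult sumInOrder(const float (&v)[3], const int (&order)[3]) {
  SumResult r{0.0f, false};
  for (int i = 0; i < 3; ++i) {
    r.value += v[order[i]];
    if (std::isinf(r.value))
      r.sawInf = true;
  }
  return r;
}
```

With `{FLT_MAX, -FLT_MAX, FLT_MAX}`, the order `0, 1, 2` never leaves the finite range, while `0, 2, 1` overflows on the second addition.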
DeltaFile
+32 -11 llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+1 -1 llvm/test/Transforms/LoopInterchange/ninf.ll
+1 -1 llvm/test/Transforms/LoopInterchange/reduction2mem.ll
+34 -13 (3 files)

LLVM/project 8aa6d51 llvm/lib/Target/AArch64 AArch64SystemOperands.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

[AArch64][llvm] Remove support for FEAT_MPAMv2_VID

`FEAT_MPAMv2_VID` instructions and system registers, introduced in
change d30f18d2c, are being removed because they have been dropped from
the latest Arm ARM. This doesn't preclude them returning in some form in
the future.

Other system registers introduced with `FEAT_MPAMv2` are unaffected and
continue to be ungated, but since the `+mpamv2` gating is now empty, I'm
removing this superfluous gating code.

Cherry-picked-from: a48159df9
DeltaFile
+5 -86 llvm/test/MC/AArch64/armv9.7a-mpamv2.s
+0 -36 llvm/lib/Target/AArch64/AArch64SystemOperands.td
+5 -17 llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+0 -18 llvm/test/MC/AArch64/armv9.7a-mpamv2-diagnostics.s
+2 -12 llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
+0 -8 llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+12 -177 (6 files not shown)
+13 -195 (12 files)

LLVM/project ed2fda6 mlir/lib/Conversion/ArithToSPIRV ArithToSPIRV.cpp, mlir/test/Conversion/ArithToSPIRV arith-to-spirv.mlir

[mlir][spirv] Convert arith.subui_extended to spirv.ISubBorrow (#197736)
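For reference, SPIR-V's `OpISubBorrow` returns a two-member struct. Member 0 holds the low-order bits of the unsigned difference, and member 1 is 0 if the subtraction did not need to borrow and 1 otherwise. The C++ sketch models that result for 32-bit operands. It is not the conversion pattern itself, and it assumes (without checking the patch) that the overflow result of `arith.subui_extended` corresponds to a nonzero borrow. `SubBorrow` and `iSubBorrow` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the two struct members produced by OpISubBorrow.
struct SubBorrow {
  std::uint32_t difference; // a - b, wrapped modulo 2^32
  std::uint32_t borrow;     // 1 if a < b (a borrow was needed), else 0
};

inline SubBorrow iSubBorrow(std::uint32_t a, std::uint32_t b) {
  return SubBorrow{a - b, a < b ? 1u : 0u};
}
```

For example, `iSubBorrow(3, 5)` yields a difference of `0xFFFFFFFE` (that is, 2^32 + 3 - 5) with a borrow of 1.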
DeltaFile
+19 -16 mlir/lib/Conversion/ArithToSPIRV/ArithToSPIRV.cpp
+27 -0 mlir/test/Conversion/ArithToSPIRV/arith-to-spirv.mlir
+10 -0 mlir/test/Target/SPIRV/arithmetic-ops.mlir
+56 -16 (3 files)

LLVM/project d3c38cf llvm/lib/Target/ARM ARMISelLowering.cpp, llvm/test/CodeGen/Thumb2 mve-pred-const.ll

[ARM][MVE] Constant fold PREDICATE_CAST of 0 and 0xffff (#197832)

This allows us to fold away the vselect when we know that the condition
is all true or all false.
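MVE predicates occupy the 16-bit `VPR.P0` field, with one bit per byte of the 128-bit vector. As a result, `0xffff` selects every byte from the first operand and `0` selects none. The C++ sketch models a byte-wise predicated select and shows that, for those two constants, the result is simply one of the operands, so the select can be folded away. It models the semantics only, not `ARMISelLowering.cpp`. `Vec`, `selectGeneral` and `selectFolded` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec = std::array<std::uint8_t, 16>; // one 128-bit vector, byte by byte

// Generic predicated select: predicate bit i picks byte i from t, else from f.
inline Vec selectGeneral(std::uint16_t pred, const Vec& t, const Vec& f) {
  Vec r{};
  for (int i = 0; i < 16; ++i)
    r[i] = ((pred >> i) & 1u) ? t[i] : f[i];
  return r;
}

// The same select with the two constant predicates folded.
inline Vec selectFolded(std::uint16_t pred, const Vec& t, const Vec& f) {
  if (pred == 0xFFFF)
    return t; // all lanes true
  if (pred == 0x0000)
    return f; // all lanes false
  return selectGeneral(pred, t, f);
}
```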
DeltaFile
+4 -36 llvm/test/CodeGen/Thumb2/mve-pred-const.ll
+11 -0 llvm/lib/Target/ARM/ARMISelLowering.cpp
+15 -36 (2 files)

LLVM/project 17146dc clang/lib/Driver/ToolChains/Arch AArch64.cpp

[clang][AArch64][NFC] Match variable names to code style (#197918)

Follow up to 0ac83dccaf53f3a51714fd53b151314de1a13e48 / #197689.
DeltaFile
+6 -6 clang/lib/Driver/ToolChains/Arch/AArch64.cpp
+6 -6 (1 file)

LLVM/project eec28ba openmp CMakeLists.txt, openmp/device CMakeLists.txt

[OpenMP] Fix missing install-openmp component (#197603)

Summary:
This pattern is consistent throughout all the runtimes and is what the
top-level `install-openmp-<triple>` corresponds to. It should be
provided and used.
DeltaFile
+10 -9 openmp/runtime/src/CMakeLists.txt
+9 -0 openmp/CMakeLists.txt
+4 -2 openmp/device/CMakeLists.txt
+2 -2 openmp/tools/archer/CMakeLists.txt
+2 -1 openmp/docs/CMakeLists.txt
+2 -1 openmp/tools/Modules/CMakeLists.txt
+29 -15 (4 files not shown)
+34 -18 (10 files)

LLVM/project 6996e97 llvm/test/CodeGen/AArch64 fptoi-256.ll

[AArch64] Delete llvm/test/CodeGen/AArch64/fptoi-256.ll (NFC) (#197896)

llvm/test/CodeGen/AArch64/fcvt-i256.ll has since been added and provides
the same coverage and more.
DeltaFile
+0 -11 llvm/test/CodeGen/AArch64/fptoi-256.ll
+0 -11 (1 file)

LLVM/project b152ea8 libc/src/stdlib CMakeLists.txt

[libc] Disable GCC 12 waccess passes to fix ICE in environ_internal (#197916)

The waccess pass in GCC 12 consistently crashes with a segmentation fault
when analyzing the memory allocations in environ_internal.cpp. This change
disables the relevant tree-waccess passes for this specific file,
avoiding the ICE without requiring intrusive code refactoring.

Assisted-by: Automated tooling, human reviewed.
DeltaFile
+3 -1 libc/src/stdlib/CMakeLists.txt
+3 -1 (1 file)

LLVM/project 400c376 lld/MinGW Driver.cpp Options.td, lld/docs ReleaseNotes.rst

[LLD] [MinGW] Implement --{push,pop}-state (#197748)

Implement `--push-state` and `--pop-state` for the MinGW lld driver.
Those options were already implemented by GNU ld for MinGW:
```
  --push-state                Push state of flags governing input file handling
  --pop-state                 Pop state of flags governing input file handling
```

This will align the MinGW frontend's options more closely with those of
the ELF frontend and fix issues caused by, e.g., CMake misdetecting
`--push-state`/`--pop-state` support by accidentally querying the ELF
driver.

Fixes #131007.
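Conceptually, these options maintain a stack of the flags that govern input file handling: `--push-state` saves the current flags and `--pop-state` restores the most recently saved set. The C++ sketch illustrates that idea only. `InputFlags`, its two fields (standing in for options such as `--whole-archive` and `-Bstatic`) and `FlagStack` are hypothetical, not the lld MinGW driver's code, and reporting an unmatched `--pop-state` with an exception is our own choice.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical subset of the flags governing input file handling.
struct InputFlags {
  bool wholeArchive = false; // e.g. --whole-archive / --no-whole-archive
  bool linkStatic = false;   // e.g. -Bstatic / -Bdynamic
};

class FlagStack {
public:
  InputFlags current;

  // --push-state: remember the current flags.
  void pushState() { saved_.push_back(current); }

  // --pop-state: restore the most recently pushed flags.
  void popState() {
    if (saved_.empty())
      throw std::runtime_error("--pop-state without matching --push-state");
    current = saved_.back();
    saved_.pop_back();
  }

private:
  std::vector<InputFlags> saved_;
};
```

For example, `--push-state --whole-archive libfoo.a --pop-state` would apply `--whole-archive` to `libfoo.a` only, leaving later inputs unaffected.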
DeltaFile
+18 -0 lld/MinGW/Driver.cpp
+8 -0 lld/test/MinGW/driver.test
+4 -0 lld/MinGW/Options.td
+4 -0 lld/docs/ReleaseNotes.rst
+34 -0 (4 files)

LLVM/project d1bac63 llvm/lib/Target/ARM ARMInstrInfo.td

[ARM] NOP should be mov r0, r0 on all not V6K, including regular V6 (#196625)

Otherwise, `nop` may not work on ARMv6 targets that lack V6K.
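For concreteness, the two A32 encodings involved (condition code AL) are `mov r0, r0` = `0xE1A00000` and the architected `nop` hint = `0xE320F000`; per the commit, the latter may not work on ARMv6 targets without V6K. The C++ sketch only restates the selection rule from the title. `nopEncoding` and its `hasV6K` parameter are hypothetical names, not the TableGen definitions in `ARMInstrInfo.td`.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kMovR0R0 = 0xE1A00000u; // mov r0, r0 (A32, cond = AL)
constexpr std::uint32_t kNopHint = 0xE320F000u; // nop hint (A32, cond = AL)

// Selection rule from the commit title: use the hint only with V6K.
constexpr std::uint32_t nopEncoding(bool hasV6K) {
  return hasV6K ? kNopHint : kMovR0R0;
}
```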
DeltaFile
+3 -2 llvm/lib/Target/ARM/ARMInstrInfo.td
+3 -2 (1 file)

LLVM/project b03b2dd llvm/lib/Transforms/Vectorize VPlanTransforms.cpp LoopVectorize.cpp

[VPlan] Move call widening decision to VPlan. (NFCI) (#195518)

This patch adds a new makeCallWideningDecisions transform which converts
Call VPInstructions to
VPWidenCallRecipe/VPWidenIntrinsicRecipe/VPReplicateRecipe depending on
their costs.

To compute the costs, static helpers are introduced to re-use the
existing VPlan cost model logic:
 * VPWidenIntrinsicRecipe::computeCallCost
 * VPReplicateRecipe::computeCallCost

The existing cost-model logic is still retained, and we assert that both
decisions match to make sure we do not miss any edge cases. The legacy
logic will be removed in a follow-up.

PR: https://github.com/llvm/llvm-project/pull/195518
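The decision is a cost-based choice among the three recipe kinds. The C++ sketch shows one way such a choice could be expressed. `CallWidening`, `CallCosts` and `decide` are hypothetical, costs are plain integers, and an empty `std::optional` stands for "not applicable" (for example, no vector variant). On ties, the sketch keeps the earlier candidate in the order replicate, widened call, widened intrinsic; this ordering is arbitrary and not necessarily what `makeCallWideningDecisions` does.

```cpp
#include <cassert>
#include <optional>

enum class CallWidening { WidenCall, WidenIntrinsic, Replicate };

struct CallCosts {
  std::optional<unsigned> widenCall;      // vector library call, if any
  std::optional<unsigned> widenIntrinsic; // vector intrinsic, if any
  unsigned replicate;                     // scalarized calls, always possible
};

inline CallWidening decide(const CallCosts& c) {
  CallWidening best = CallWidening::Replicate;
  unsigned bestCost = c.replicate;
  if (c.widenCall && *c.widenCall < bestCost) {
    best = CallWidening::WidenCall;
    bestCost = *c.widenCall;
  }
  if (c.widenIntrinsic && *c.widenIntrinsic < bestCost) {
    best = CallWidening::WidenIntrinsic;
    bestCost = *c.widenIntrinsic;
  }
  return best;
}
```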
DeltaFile
+181 -0 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+44 -90 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+48 -34 llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+16 -4 llvm/lib/Transforms/Vectorize/VPlan.h
+19 -0 llvm/lib/Transforms/Vectorize/VPlanHelpers.h
+2 -12 llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+310 -140 (3 files not shown)
+324 -147 (9 files)

LLVM/project d28372c llvm/test/Transforms/LoopInterchange ninf.ll

[LoopInterchange] Add test for poison can be produced due to ninf (NFC)
DeltaFile
+154 -0 llvm/test/Transforms/LoopInterchange/ninf.ll
+154 -0 (1 file)

LLVM/project e984652 llvm/lib/Transforms/Vectorize VPlanRecipes.cpp, llvm/test/Transforms/LoopVectorize/AArch64 cmp_cost.ll

[VPlan] Compute the cost for scalar cmp outside the vector region (#197146)

Currently we don't compute the cost of any scalar compares. Change this
to only avoid computing the cost if it's inside the vector region, as
compares that are used in the loop exit condition are handled by the
legacy cost model and this is the simplest way to avoid double-counting
those instructions.

This mainly affects the compare in the middle block, and accounting for
the cost of that can change the required minimum trip count.
DeltaFile
+16 -8 llvm/test/Transforms/LoopVectorize/RISCV/partial-reduce-dot-product.ll
+16 -8 llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-div.ll
+11 -11 llvm/test/Transforms/LoopVectorize/X86/CostModel/vpinstruction-cost.ll
+12 -7 llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+7 -7 llvm/test/Transforms/LoopVectorize/AArch64/cmp_cost.ll
+8 -4 llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
+70 -45 (12 files not shown)
+102 -62 (18 files)

LLVM/project be582e4 llvm/lib/CodeGen AtomicExpandPass.cpp, llvm/test/CodeGen/ARM atomic-load-store.ll

[AtomicExpand] Add bitcasts when expanding store atomic vector

AtomicExpand fails for aligned `store atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
ptrtoint + bitcast so that the call can be lowered, mirroring the
load-side handling from #148900.
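The underlying idea is to reinterpret the vector as an integer of the same width, so that an integer-typed atomic operation or library call can carry it. The following is a rough C++ analogy, not the pass itself. It assumes a 2 x float vector that fits in 64 bits and uses `std::atomic<std::uint64_t>` in place of the libcall the pass emits; `Float2`, `atomicStoreVec` and `atomicLoadVec` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// A 2 x float "vector" occupying exactly 64 bits.
struct Float2 {
  float x, y;
};
static_assert(sizeof(Float2) == sizeof(std::uint64_t), "must match integer width");

// Store: reinterpret the vector bits as an integer, then store atomically.
inline void atomicStoreVec(std::atomic<std::uint64_t>& slot, Float2 v) {
  std::uint64_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  slot.store(bits, std::memory_order_seq_cst);
}

// Load: the mirror image, as on the load side.
inline Float2 atomicLoadVec(const std::atomic<std::uint64_t>& slot) {
  std::uint64_t bits = slot.load(std::memory_order_seq_cst);
  Float2 v;
  std::memcpy(&v, &bits, sizeof v);
  return v;
}
```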
DeltaFile
+100 -7 llvm/test/CodeGen/X86/atomic-load-store.ll
+98 -0 llvm/test/Transforms/AtomicExpand/X86/expand-atomic-non-integer.ll
+49 -0 llvm/test/CodeGen/ARM/atomic-load-store.ll
+4 -2 llvm/lib/CodeGen/AtomicExpandPass.cpp
+251 -9 (4 files)

LLVM/project 7126e6a llvm/lib/Target/AArch64 AArch64RegisterInfo.cpp, llvm/test/CodeGen/AArch64 regalloc-hint-movprfx-streaming.mir

[LLVM][CodeGen][SME] Improve regalloc hinting for multi-vector instructions. (#197711)

When an instruction uses one of the results of a multi-vector
instruction it will typically be a subreg. For it to be considered a
suitable reuse candidate we must convert the subreg to its underlying
physical register.
DeltaFile
+87 -0 llvm/test/CodeGen/AArch64/regalloc-hint-movprfx-streaming.mir
+4 -1 llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+91 -1 (2 files)

LLVM/project 381d3a3 llvm/test/CodeGen/AMDGPU reassoc-mul-add-1-to-mad.ll literal64.ll, llvm/test/CodeGen/AMDGPU/GlobalISel mul.ll

[AMDGPU] Add tests for 64bit literals in single DWORD instructions for gfx13.
DeltaFile
+1,343 -0 llvm/test/CodeGen/AMDGPU/reassoc-mul-add-1-to-mad.ll
+413 -186 llvm/test/CodeGen/AMDGPU/literal64.ll
+557 -0 llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll
+344 -171 llvm/test/CodeGen/AMDGPU/scale-offset-global.ll
+310 -162 llvm/test/CodeGen/AMDGPU/scale-offset-scratch.ll
+426 -0 llvm/test/CodeGen/AMDGPU/mul.ll
+3,393 -519 (4 files not shown)
+4,022 -519 (10 files)

LLVM/project dbdb2ed llvm/test/CodeGen/AArch64 select_of_select.ll

[AArch64] Add a test for select-of-select with the same condition. NFC (#197903)
DeltaFile
+51 -0 llvm/test/CodeGen/AArch64/select_of_select.ll
+51 -0 (1 file)

LLVM/project 9dad35f llvm/lib/CodeGen/GlobalISel LegalizerHelper.cpp

[GlobalISel] Add brackets to assert. NFC (#197902)

Fixes a `warning: suggest parentheses around ‘&&’ within ‘||’`
DeltaFile
+3 -3 llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+3 -3 (1 file)

LLVM/project b22721f llvm/test/CodeGen/AMDGPU reassoc-mul-add-1-to-mad.ll literal64.ll, llvm/test/CodeGen/AMDGPU/GlobalISel mul.ll

[AMDGPU] Add tests for 64bit literals in single DWORD instructions for gfx13.
DeltaFile
+1,343 -0 llvm/test/CodeGen/AMDGPU/reassoc-mul-add-1-to-mad.ll
+413 -186 llvm/test/CodeGen/AMDGPU/literal64.ll
+557 -0 llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll
+344 -171 llvm/test/CodeGen/AMDGPU/scale-offset-global.ll
+310 -162 llvm/test/CodeGen/AMDGPU/scale-offset-scratch.ll
+426 -0 llvm/test/CodeGen/AMDGPU/mul.ll
+3,393 -519 (4 files not shown)
+4,021 -519 (10 files)

LLVM/project b549776 llvm/docs/CommandGuide llvm-ar.rst

[llvm-ar] add N modifier to extract operation documentation (#197787)

A follow up to:
https://github.com/llvm/llvm-project/pull/196541#issuecomment-4442635635

Add modifier `N` to the documented list of modifiers allowed with the `X`
option.
Relevant source location:
[llvm-ar.cpp](https://github.com/llvm/llvm-project/blob/ecc8a95/llvm/tools/llvm-ar/llvm-ar.cpp#L462)
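In GNU `ar`, the `N` modifier takes a *count* and selects the count-th instance of a member name when an archive contains several members with that name. The C++ sketch shows that lookup over a list of member names. `findInstance` is a hypothetical helper, not code from `llvm-ar.cpp`, and counting from 1 follows GNU ar's convention as we understand it.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Index of the `count`-th (1-based) member named `name`, if there is one.
inline std::optional<std::size_t> findInstance(
    const std::vector<std::string>& members, const std::string& name,
    std::size_t count) {
  std::size_t seen = 0;
  for (std::size_t i = 0; i < members.size(); ++i)
    if (members[i] == name && ++seen == count)
      return i;
  return std::nullopt;
}
```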


CC: @MaskRay @jh7370
DeltaFile
+5 -5 llvm/docs/CommandGuide/llvm-ar.rst
+5 -5 (1 file)

LLVM/project e36fa4e llvm/docs/CommandGuide llvm-ar.rst

[llvm-ar] fixing corruptions in documentation (#197783)

A follow up to:
https://github.com/llvm/llvm-project/pull/196541#issuecomment-4442635635

My fix for :option:`N` is based on the description of `option:: b`:
```
[...]
 found, the files are placed at the end of the ``archive``. *relpos* cannot
 be consumed without either :option:`a`, :option:`b` or :option:`i`. This
 modifier is identical to the :option:`i` modifier.
```

CC: @MaskRay @jh7370
DeltaFile
+4 -3 llvm/docs/CommandGuide/llvm-ar.rst
+4 -3 (1 file)

LLVM/project 117fa99 llvm/test/Transforms/LowerMatrixIntrinsics multiply-fused-loops-large-matrixes.ll data-layout-multiply-fused.ll

[Matrix] Create inbounds GEPs for matrix load/stores. (#197710)

LowerMatrixIntrinsics splits larger matrix loads/stores into multiple
loads/stores plus GEPs. Those GEPs compute offsets into the memory
accessed by the original, larger load/store, so they are guaranteed to
be inbounds: otherwise the larger load/store would itself access memory
out of bounds.

PR: https://github.com/llvm/llvm-project/pull/197710
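The in-bounds argument can be checked with a little arithmetic. Assume a column-major matrix with `rows` rows and `cols` columns whose columns start `stride` elements apart (`stride >= rows`). The original access then spans `(cols - 1) * stride + rows` elements, and column `j` starts at offset `j * stride`. The C++ sketch models offsets only, not the pass, and its helper names are hypothetical. It checks that every per-column access lies inside that span.

```cpp
#include <cassert>
#include <cstddef>

// Number of elements covered by the original, larger matrix access.
inline std::size_t accessExtent(std::size_t rows, std::size_t cols,
                                std::size_t stride) {
  return cols == 0 ? 0 : (cols - 1) * stride + rows;
}

// True if every column access [j * stride, j * stride + rows) lies
// within [0, accessExtent(rows, cols, stride)).
inline bool columnsInBounds(std::size_t rows, std::size_t cols,
                            std::size_t stride) {
  const std::size_t extent = accessExtent(rows, cols, stride);
  for (std::size_t j = 0; j < cols; ++j)
    if (j * stride + rows > extent)
      return false;
  return true;
}
```

Because `j <= cols - 1`, each column's end `j * stride + rows` is at most the extent, so every offset stays within the memory the original access already touches.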
DeltaFile
+132 -132 llvm/test/Transforms/LowerMatrixIntrinsics/multiply-fused-loops-large-matrixes.ll
+70 -70 llvm/test/Transforms/LowerMatrixIntrinsics/data-layout-multiply-fused.ll
+67 -67 llvm/test/Transforms/LowerMatrixIntrinsics/multiply-fused.ll
+60 -60 llvm/test/Transforms/LowerMatrixIntrinsics/multiply-fused-lifetime-ends.ll
+58 -58 llvm/test/Transforms/LowerMatrixIntrinsics/binop.ll
+53 -53 llvm/test/Transforms/LowerMatrixIntrinsics/multiply-fused-dominance.ll
+440 -440 (35 files not shown)
+847 -847 (41 files)

LLVM/project 98f029c llvm/lib/Transforms/Vectorize LoopVectorizationPlanner.cpp, llvm/test/Transforms/LoopVectorize gather-scatter.ll if-conversion-scalable.ll

[LV] Introduce -force-target-supports-gather-scatter-ops testing option (#196947)

This introduces a new `-force-target-supports-gather-scatter-ops` CLI
option for testing. It can be used to show that the lack of
gather/scatter support prevents if-conversion.
DeltaFile
+165 -0 llvm/test/Transforms/LoopVectorize/gather-scatter.ll
+123 -0 llvm/test/Transforms/LoopVectorize/if-conversion-scalable.ll
+7 -1 llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.cpp
+295 -1 (3 files)

LLVM/project bf9916a libclc CMakeLists.txt

[libclc][CMake] Use set instead of APPEND for LIBCLC_ARCHS_ALL initialization (#197866)
DeltaFile
+1 -1 libclc/CMakeLists.txt
+1 -1 (1 file)

LLVM/project a9b6710 llvm/test/CodeGen/X86 vector-shuffle-combining-avx512f.ll avx512-vbroadcast.ll

[X86] Avoid repeated select masks in avx512 tests (#197886)

Don't reuse the selection masks in unit tests just for expediency;
#197799 will attempt to fold these into single selects.

Also remove an ancient test_vbroadcast test that hasn't actually done
anything since we started using mask vpternlog for mask expansion (and
the test now folds away anyhow).
DeltaFile
+38 -20 llvm/test/CodeGen/X86/vector-shuffle-combining-avx512f.ll
+0 -17 llvm/test/CodeGen/X86/avx512-vbroadcast.ll
+9 -6 llvm/test/CodeGen/X86/fma-fneg-combine.ll
+47 -43 (3 files)

LLVM/project 7395584 mlir/include/mlir/Dialect/Tosa/IR TosaOps.td, mlir/lib/Dialect/Tosa/IR TosaOps.cpp

[mlir][tosa] Use traits to check output type aligns with input type (#193961)

Reduces code duplication and ensures the output shape aligns with the
input shape.
DeltaFile
+24 -0 mlir/test/Dialect/Tosa/verifier.mlir
+1 -15 mlir/lib/Dialect/Tosa/IR/TosaOps.cpp
+6 -2 mlir/include/mlir/Dialect/Tosa/IR/TosaOps.td
+7 -0 mlir/test/Dialect/Tosa/ops.mlir
+38 -17 (4 files)

LLVM/project 47e1dbe clang/lib/Tooling/Syntax Tokens.cpp, clang/unittests/Tooling/Syntax TokensTest.cpp

[Syntax] Append EOF token to truncated expanded token stream when the parser halts prematurely (#196861)

Fixes #196244.

This PR addresses cases where this assertion is triggered in
`TokenCollector::Builder::build()`:
https://github.com/llvm/llvm-project/blob/dff356d47cfc4413f78c858dd8339cb1c9fca255/clang/lib/Tooling/Syntax/Tokens.cpp#L715

`TokenCollector` collects the expanded token stream by registering a
token watcher callback in the preprocessor. Normally, the preprocessor
calls the callback for every token up to and including the `tok::eof`
token. However, when the parser hits a hard limit such as exceeding the
maximum function scope depth (this is the case covered by #196244) or
exceeding the bracket depth limit, it bails out via
`Parser::cutOffParsing()`. `cutOffParsing` forces the current token to
`eof`, but the token watcher callback is never called for it. The result
is a truncated token stream.

Fix by checking if `ExpandedTokens` is missing the final `tok::eof`. If

    [4 lines not shown]
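Going by the title, the repair amounts to appending a `tok::eof` when the collected stream does not already end with one. The C++ sketch below models only that invariant. It uses hypothetical `TokKind`/`Token` types rather than clang's `syntax::Token`, and `ensureTrailingEof` is a hypothetical helper name.

```cpp
#include <cassert>
#include <vector>

enum class TokKind { Identifier, LBrace, Eof }; // hypothetical subset

struct Token {
  TokKind kind;
};

// Append an Eof token if the stream does not already end with one.
inline void ensureTrailingEof(std::vector<Token>& tokens) {
  if (tokens.empty() || tokens.back().kind != TokKind::Eof)
    tokens.push_back(Token{TokKind::Eof});
}
```

Calling it on an already terminated stream leaves the stream unchanged, so the check is safe to run unconditionally after collection.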
DeltaFile
+16 -3 clang/unittests/Tooling/Syntax/TokensTest.cpp
+9 -1 clang/lib/Tooling/Syntax/Tokens.cpp
+25 -4 (2 files)

LLVM/project 2ee06c8 libc/src/__support/OSUtil/linux/syscall_wrappers mmap.h

[libc] Fix truncation warning/error in #197694 (#197889)
DeltaFile
+1 -1 libc/src/__support/OSUtil/linux/syscall_wrappers/mmap.h
+1 -1 (1 file)

LLVM/project 86225b7 libcxx/include/__ranges enumerate_view.h, libcxx/test/std/ranges/range.adaptors/range.enumerate adaptor.pass.cpp

[libc++][ranges] Fix missing `forward` in `views::enumerate` (#197635)

This fixes #197404

---------

Co-authored-by: danielcm585 <danielchristianmandolang at gmail.com>
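Why a missing `std::forward` matters: inside a function taking a forwarding reference `T&&`, the named parameter is an lvalue. Passing it on without `std::forward<T>` therefore copies an rvalue argument instead of moving it, and fails to compile for move-only types. The sketch below is a generic C++17 illustration, not the libc++ `views::enumerate` code. `Tracker`, `Holder` and both `make*` functions are hypothetical.

```cpp
#include <cassert>
#include <utility>

// Records how many copies and moves produced this object.
struct Tracker {
  int copies = 0;
  int moves = 0;
  Tracker() = default;
  Tracker(const Tracker& o) : copies(o.copies + 1), moves(o.moves) {}
  Tracker(Tracker&& o) noexcept : copies(o.copies), moves(o.moves + 1) {}
};

// Stand-in for an adaptor storing its argument.
struct Holder {
  Tracker t;
};

// `t` is an lvalue inside the function, so this always copies.
template <class T>
Holder makeWithoutForward(T&& t) {
  return Holder{t};
}

// Preserves the argument's value category: rvalues are moved.
template <class T>
Holder makeWithForward(T&& t) {
  return Holder{std::forward<T>(t)};
}
```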
DeltaFile
+35 -0 libcxx/test/std/ranges/range.adaptors/range.enumerate/adaptor.pass.cpp
+3 -3 libcxx/include/__ranges/enumerate_view.h
+38 -3 (2 files)