LLVM/project c139d16clang/test/AST ast-dump-lambda-json.cpp ast-dump-template-json-win32-mangler-crash.cpp, llvm/lib/Support UnicodeNameToCodepointGenerated.cpp

Rebase

Created using spr 1.3.7
DeltaFile
+23,873-20,923llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
+12,365-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.av.load.b128.ll
+1,353-5,985llvm/test/CodeGen/X86/avx512-calling-conv.ll
+0-3,387clang/test/AST/ast-dump-lambda-json.cpp
+7-3,217clang/test/AST/ast-dump-template-json-win32-mangler-crash.cpp
+2,148-0llvm/test/CodeGen/AMDGPU/amdgcn-av-scopes.ll
+39,746-33,5121,160 files not shown
+80,625-56,7471,166 files

LLVM/project 0e8f357llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/PhaseOrdering/X86 avg.ll

[SLP] Prefer outer binary op when opcode groups tie in buildInstructionsState

When VL contains instructions of different opcodes with equal counts,
the tie-breaking in buildInstructionsState could replace an outer
operation (e.g., fadd) with an inner one (e.g., fmul) that appears as
its direct operand, depending on SmallMapVector iteration order. Add a
check: if the current MainOp is a BinaryOperator with a direct operand
matching the challenger partition's opcode in the same block, keep
MainOp instead of switching to the inner operation.
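
The added check can be pictured with a small standalone model. This is a minimal sketch, not the code from SLPVectorizer.cpp: `Inst`, `Opcode`, and `keepOuterMainOp` are hypothetical names invented for illustration, and the real check operates on LLVM `Instruction` objects.

```cpp
#include <array>
#include <cassert>

// Hypothetical, simplified stand-in for an LLVM instruction.
enum class Opcode { FAdd, FMul, Load };

struct Inst {
  Opcode Op;
  int Block;                        // id of the parent basic block
  std::array<const Inst *, 2> Ops;  // operands; nullptr if not an instruction
  bool IsBinary;                    // true for binary operators
};

// Returns true when MainOp should stay the main operation: it is a binary
// operator and one of its direct operands, in the same block, has the
// challenger partition's opcode (i.e. the challenger is an "inner" op).
bool keepOuterMainOp(const Inst &MainOp, Opcode ChallengerOp) {
  if (!MainOp.IsBinary)
    return false;
  for (const Inst *Operand : MainOp.Ops)
    if (Operand && Operand->Op == ChallengerOp &&
        Operand->Block == MainOp.Block)
      return true;
  return false;
}
```

With an `fadd` whose operand is an `fmul` in the same block, the model keeps the `fadd` when `fmul` is the challenger, regardless of the order in which the opcode groups are visited.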

Partially fixes #43353

Reviewers: bababuck, hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/198194
DeltaFile
+31-42llvm/test/Transforms/PhaseOrdering/X86/avg.ll
+25-24llvm/test/Transforms/SLPVectorizer/X86/gathered-loads-non-full-reg.ll
+17-0llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+6-10llvm/test/Transforms/SLPVectorizer/buildvector-nodes-dependency.ll
+3-7llvm/test/Transforms/SLPVectorizer/X86/complex-fma-combine.ll
+82-835 files

LLVM/project 4623a99clang/lib/CodeGen CGOpenMPRuntimeGPU.cpp, llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h

use correct number of threads
DeltaFile
+94-6llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+11-7openmp/device/src/Reduction.cpp
+11-1llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+3-1clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+119-154 files

LLVM/project 2309809llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 extractelements-vector-ops-shuffle.ll scatter-vectorize-reorder-non-empty.ll

[SLP] Support ordered fadd reduction via reduction intrinsics

Add matchOrderedReduction() to recognize linearized ordered fadd chains
(both LHS- and RHS-associated) and tryToReduceOrdered() to vectorize
them using ordered reduction intrinsics (llvm.vector.reduce.fadd).

Previously, the SLP vectorizer could only vectorize ordered reductions
by keeping the original scalar chain and emitting extractelement
instructions. The new path replaces the scalar chain with a vector
ordered reduction intrinsic (where profitable), which allows the backend to lower it
more efficiently.
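
For readers unfamiliar with ordered reductions, the sketch below models the semantics in plain C++: lanes are added strictly left to right starting from an accumulator, which is what makes the result match the original LHS-associated scalar chain. The function names are hypothetical and this is only an illustration of the semantics, not the vectorizer's implementation.

```cpp
#include <cassert>
#include <vector>

// Models an ordered fadd reduction: the lanes are added one at a time,
// left to right, starting from the accumulator value Start.
float orderedReduceFAdd(float Start, const std::vector<float> &Lanes) {
  float Acc = Start;
  for (float Lane : Lanes)
    Acc = Acc + Lane;
  return Acc;
}

// The scalar chain being replaced, written out for four elements
// (LHS-associated): (((Start + A) + B) + C) + D.
float scalarChain(float Start, float A, float B, float C, float D) {
  return (((Start + A) + B) + C) + D;
}
```

Because floating-point addition is not associative, only a reduction that preserves this order is guaranteed to give the same value as the scalar chain without fast-math flags.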

Reviewers: hiraditya, RKSimon, bababuck

Pull Request: https://github.com/llvm/llvm-project/pull/189451
DeltaFile
+348-5llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+8-13llvm/test/Transforms/SLPVectorizer/X86/extractelements-vector-ops-shuffle.ll
+8-9llvm/test/Transforms/SLPVectorizer/X86/scatter-vectorize-reorder-non-empty.ll
+2-14llvm/test/Transforms/SLPVectorizer/X86/phi.ll
+366-414 files

LLVM/project 5ac91caflang/include/flang/Support OpenMP-utils.h, flang/lib/Lower/OpenMP OpenMP.cpp ClauseProcessor.cpp

[Flang][OpenMP][NFC] Track Objects for BlockArgs (#197442)

When lowering a BlockArg in OpenMP, the symbol is currently tracked.
This can cause issues later on, because information relating to an
expression may be lost. For example, an ArrayElement will be
represented by its symbol, in this case the full array. This is not
ideal, as it is just the ArrayElement that is intended to be represented.

Now, the object is tracked instead of the symbol. For cases where the
symbol is required, an appropriate API is available to retrieve this
information. This change opens the ability to better handle lowering of
expressions such as ArrayElements.

Assisted-by: Codex
DeltaFile
+217-186flang/lib/Lower/OpenMP/OpenMP.cpp
+60-47flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+30-30flang/lib/Lower/OpenMP/ClauseProcessor.h
+31-7flang/include/flang/Support/OpenMP-utils.h
+6-6flang/lib/Lower/OpenMP/Utils.cpp
+1-1flang/lib/Lower/OpenMP/Utils.h
+345-2776 files

LLVM/project 182ae96libcxx/include CMakeLists.txt module.modulemap.in, libcxx/include/__locale_dir locale_base_api.h

[libc++] Port the OpenBSD localization to the new locale API (#194317)
DeltaFile
+229-0libcxx/include/__locale_dir/support/openbsd.h
+0-37libcxx/include/__support/xlocale/__strtonum_fallback.h
+0-19libcxx/include/__locale_dir/locale_base_api/openbsd.h
+3-5libcxx/include/__locale_dir/locale_base_api.h
+1-2libcxx/include/CMakeLists.txt
+1-1libcxx/include/module.modulemap.in
+234-646 files

LLVM/project 263a80dllvm/lib/CodeGen AtomicExpandPass.cpp, llvm/test/CodeGen/ARM atomic-load-store.ll

[AtomicExpand] Add bitcasts when expanding store atomic vector

AtomicExpand fails for aligned `store atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
ptrtoint + bitcast so that the call can be lowered, mirroring the
load-side handling from #148900.
DeltaFile
+99-6llvm/test/CodeGen/X86/atomic-load-store.ll
+98-0llvm/test/Transforms/AtomicExpand/X86/expand-atomic-non-integer.ll
+49-0llvm/test/CodeGen/ARM/atomic-load-store.ll
+4-2llvm/lib/CodeGen/AtomicExpandPass.cpp
+250-84 files

LLVM/project 4314cabllvm/include/llvm/Target TargetSelectionDAG.td, llvm/lib/Target/X86 X86InstrFragmentsSIMD.td X86InstrAVX512.td

[X86] Cast atomic vectors in IR to support floats

Extend the X86 `alignedstore` PatFrag to also match `atomic_store`
with vector-size alignment, so existing MOVAPS/MOVAPD/MOVDQA-family
aligned-store patterns cover 128-bit aligned vector atomic stores on
SSE/AVX/AVX-512 without per-type duplicates. `<4 x float>`,
`<2 x double>`, `<2 x i64>`, `<4 x i32>`, `<8 x half>`, `<8 x bfloat>`
all codegen to a single `movaps`/`movapd` on AVX+ via this.

Adds v8f16/v8bf16 bitconvert variants to the widen-path
`atomic_store_32` / `atomic_store_64` patterns so `<2 x half>`,
`<2 x bfloat>`, `<4 x half>`, `<4 x bfloat>` atomic stores reaching
the PR4 widen path also collapse to a single instruction on AVX+
targets.

Vectors whose `getTypeAction` is split rather than widen still rely
on PR6's `SplitVecOp_ATOMIC_STORE`; that path bitcasts the vector
to a scalar integer and issues an integer `atomic_store_N`, picked
up by the pre-existing scalar atomic-store patterns. The two

    [4 lines not shown]
DeltaFile
+86-0llvm/test/CodeGen/X86/atomic-load-store.ll
+5-4llvm/lib/Target/X86/X86InstrFragmentsSIMD.td
+3-2llvm/lib/Target/X86/X86InstrAVX512.td
+1-1llvm/include/llvm/Target/TargetSelectionDAG.td
+95-74 files

LLVM/project a839f91llvm/lib/CodeGen/SelectionDAG LegalizeVectorTypes.cpp LegalizeTypes.h, llvm/test/CodeGen/X86 atomic-load-store.ll

[SelectionDAG] Split vector types for atomic store

Vector types that aren't widened are split so that a single ATOMIC_STORE
is issued for the entire vector at once. This enables SelectionDAG to
translate vectors with bfloat and half element types.
DeltaFile
+440-0llvm/test/CodeGen/X86/atomic-load-store.ll
+20-0llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+1-0llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+461-03 files

LLVM/project 6ae5a68llvm/lib/CodeGen AtomicExpandPass.cpp, llvm/test/CodeGen/ARM atomic-load-store.ll

[AtomicExpand] Add bitcasts when expanding store atomic vector

AtomicExpand fails for aligned `store atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
ptrtoint + bitcast so that the call can be lowered, mirroring the
load-side handling from #148900.
DeltaFile
+99-6llvm/test/CodeGen/X86/atomic-load-store.ll
+98-0llvm/test/Transforms/AtomicExpand/X86/expand-atomic-non-integer.ll
+49-0llvm/test/CodeGen/ARM/atomic-load-store.ll
+4-2llvm/lib/CodeGen/AtomicExpandPass.cpp
+250-84 files

LLVM/project de3ee84utils/bazel/llvm-project-overlay/mlir BUILD.bazel

[Bazel] Fixes 6f92180 (#198467)

This fixes 6f9218051ab9e04a8547f7029ca1a9804b5c526d.

Co-authored-by: Google Bazel Bot <google-bazel-bot at google.com>
DeltaFile
+36-6utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+36-61 files

LLVM/project e65b07fllvm/include/llvm/Target TargetSelectionDAG.td, llvm/lib/Target/X86 X86InstrFragmentsSIMD.td X86InstrAVX512.td

[X86] Cast atomic vectors in IR to support floats

Extend the X86 `alignedstore` PatFrag to also match `atomic_store`
with vector-size alignment, so existing MOVAPS/MOVAPD/MOVDQA-family
aligned-store patterns cover 128-bit aligned vector atomic stores on
SSE/AVX/AVX-512 without per-type duplicates. `<4 x float>`,
`<2 x double>`, `<2 x i64>`, `<4 x i32>`, `<8 x half>`, `<8 x bfloat>`
all codegen to a single `movaps`/`movapd` on AVX+ via this.

Adds v8f16/v8bf16 bitconvert variants to the widen-path
`atomic_store_32` / `atomic_store_64` patterns so `<2 x half>`,
`<2 x bfloat>`, `<4 x half>`, `<4 x bfloat>` atomic stores reaching
the PR4 widen path also collapse to a single instruction on AVX+
targets.

Vectors whose `getTypeAction` is split rather than widen still rely
on PR6's `SplitVecOp_ATOMIC_STORE`; that path bitcasts the vector
to a scalar integer and issues an integer `atomic_store_N`, picked
up by the pre-existing scalar atomic-store patterns. The two

    [4 lines not shown]
DeltaFile
+86-0llvm/test/CodeGen/X86/atomic-load-store.ll
+5-4llvm/lib/Target/X86/X86InstrFragmentsSIMD.td
+3-2llvm/lib/Target/X86/X86InstrAVX512.td
+1-1llvm/include/llvm/Target/TargetSelectionDAG.td
+95-74 files

LLVM/project aecb576lldb/source/Breakpoint BreakpointResolverFileRegex.cpp, lldb/source/ValueObject DILEval.cpp

[lldb] Fix no compile unit crash. (#195853)

This crash happens in lldb-dap when hovering over instruction
addresses in a frame that does not have debug information.
DeltaFile
+14-11lldb/source/ValueObject/DILEval.cpp
+3-0lldb/source/Breakpoint/BreakpointResolverFileRegex.cpp
+17-112 files

LLVM/project 647cb06libcxx/test/std/ranges/range.factories/range.iota.view end.pass.cpp begin.pass.cpp, libcxx/test/std/ranges/range.factories/range.iota.view/iterator star.pass.cpp subscript.pass.cpp

[libc++][ranges] `ranges::iota_view` update tests with `__int128` (#175447)

https://github.com/llvm/llvm-project/pull/167869 made `iota_view`
`__int128` aware but tests needed updating.

---------

Co-authored-by: Hristo Hristov <zingam at outlook.com>
DeltaFile
+8-0libcxx/test/std/ranges/range.factories/range.iota.view/end.pass.cpp
+6-0libcxx/test/std/ranges/range.factories/range.iota.view/iterator/star.pass.cpp
+6-0libcxx/test/std/ranges/range.factories/range.iota.view/iterator/subscript.pass.cpp
+4-0libcxx/test/std/ranges/range.factories/range.iota.view/begin.pass.cpp
+4-0libcxx/test/std/ranges/range.factories/range.iota.view/views_iota.pass.cpp
+28-05 files

LLVM/project 4c10240llvm/tools/llvm-ir2vec/Bindings requirements.txt

[llvm-ir2vec][NFC] Adding disclaimer to Bindings requirements.txt to check compatibility with ml-compiler-opt (#198171)

Follow up PR for
https://github.com/llvm/llvm-zorg/pull/846#issuecomment-4467263196
DeltaFile
+33-1llvm/tools/llvm-ir2vec/Bindings/requirements.txt
+33-11 files

LLVM/project 7c45228lldb/include/lldb/Host FileAction.h, lldb/include/lldb/Host/windows WindowsFileAction.h

[lldb] Remove FileAction::Clear (#198350)
DeltaFile
+3-11lldb/source/Host/common/FileAction.cpp
+0-7lldb/include/lldb/Host/windows/WindowsFileAction.h
+2-3lldb/source/Host/windows/WindowsFileAction.cpp
+0-3lldb/include/lldb/Host/FileAction.h
+5-244 files

LLVM/project a256cf7clang/lib/CIR/CodeGen CIRGenModule.cpp, clang/test/CIR/CodeGen attr-target-aarch64.c attr-target-x86.c

[CIR] Implement function target/tune attrs and FMV metadata. (#195813)

Port OGCG's GetCPUAndFeaturesAttributes into CIRGenModule, replacing the
opFuncMultiVersioning placeholder. Handles TargetAttr /
TargetVersionAttr /CPUSpecificAttr / TargetClonesAttr, AMDGPU
delta-feature encoding, and AArch64 fmv-features metadata.
DeltaFile
+244-0clang/test/CIR/CodeGen/attr-target-aarch64.c
+186-0clang/test/CIR/CodeGen/attr-target-x86.c
+74-0clang/test/CIR/CodeGenHIP/attr-target-amdgpu.hip
+43-1clang/lib/CIR/CodeGen/CIRGenModule.cpp
+547-14 files

LLVM/project 8ec15f5llvm/test/TableGen ArtificialRegs.td, llvm/utils/TableGen RegisterInfoEmitter.cpp

[TableGen] Fix getting weights of register classes (#198328)

The first member can be an artificial register, so we have to find a
non-artificial one to query its weight.
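
The idea can be sketched in isolation as follows. `Reg` and `firstNonArtificial` are hypothetical names for illustration only; the actual change lives in TableGen's CodeGenRegisters.cpp and works on its own register records.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified register record.
struct Reg {
  bool Artificial;
  unsigned Weight;
};

// Returns the first non-artificial member of a register class, or nullptr
// if every member is artificial. The class weight is then read from it
// instead of blindly using Members.front().
const Reg *firstNonArtificial(const std::vector<Reg> &Members) {
  auto It = std::find_if(Members.begin(), Members.end(),
                         [](const Reg &R) { return !R.Artificial; });
  return It == Members.end() ? nullptr : &*It;
}
```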
DeltaFile
+7-2llvm/utils/TableGen/Common/CodeGenRegisters.cpp
+2-0llvm/test/TableGen/ArtificialRegs.td
+1-0llvm/utils/TableGen/RegisterInfoEmitter.cpp
+10-23 files

LLVM/project 0eceac1libc/include CMakeLists.txt

[libc] Add regex_macros dependency to regex header (#198453)

Added the regex_macros dependency to the regex header target.
regex-macros.h was not being installed when regex entrypoints were
enabled.

Assisted-by: Automated tooling, human reviewed.
DeltaFile
+1-0libc/include/CMakeLists.txt
+1-01 files

LLVM/project 304d077llvm/lib/Target/AArch64 AArch64TargetTransformInfo.cpp, llvm/test/Transforms/LoopVectorize/AArch64 partial-reduce-sub.ll partial-reduce-sub-epilogue-vec.ll

[AArch64] Add missing FSub case to isLegalToVectorizeReduction (#198302)

Adds missing RecurKind::FSub case to lower to partial reduction.
DeltaFile
+67-63llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub.ll
+53-27llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub-epilogue-vec.ll
+1-0llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+121-903 files

LLVM/project 64f7035lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime AppleObjCRuntimeV2.cpp

[lldb] Fix wrong buffer size when fetching Objective-C classes (#197389)

LLDB calls objc_getRealizedClassList_trylock to fetch the list of
realized Objective-C classes.

Jim spotted that we currently pass the buffer length in *bytes*, when
actually this API takes the buffer length in number of elements. This
causes the Objective-C runtime to write more memory than we allocated
for it. This can cause the function-calling expression to crash and
leave the Objective-C runtime mutex locked.
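
The bytes-versus-elements mix-up can be illustrated with a generic sketch. `fillClassList` below is a hypothetical stand-in for any API whose length parameter counts elements; it is not the signature of the Objective-C runtime function, and `elementCapacity` is likewise an invented helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical callee: writes at most Capacity ELEMENTS (not bytes) into
// Buffer and returns how many it wrote.
std::size_t fillClassList(std::uintptr_t *Buffer, std::size_t Capacity,
                          const std::vector<std::uintptr_t> &Classes) {
  std::size_t N = Classes.size() < Capacity ? Classes.size() : Capacity;
  for (std::size_t I = 0; I < N; ++I)
    Buffer[I] = Classes[I];
  return N;
}

// Converts an allocation size in bytes into the element count the callee
// expects. Passing the byte size directly would overstate the capacity by
// a factor of sizeof(std::uintptr_t) and let the callee overrun the buffer.
std::size_t elementCapacity(std::size_t AllocationBytes) {
  return AllocationBytes / sizeof(std::uintptr_t);
}
```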
DeltaFile
+1-1lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCRuntimeV2.cpp
+1-11 files

LLVM/project 81c69e1cmake/Modules LLVMVersion.cmake, libcxx/include __config

Bump version to 22.1.7
DeltaFile
+1-1cmake/Modules/LLVMVersion.cmake
+1-1libcxx/include/__config
+1-1llvm/utils/gn/secondary/llvm/version.gni
+1-1llvm/utils/lit/lit/__init__.py
+1-1llvm/utils/mlgo-utils/mlgo/__init__.py
+5-55 files

LLVM/project e7887d5llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/AArch64 sve-fixed-length-trunc-stores.ll

[AArch64][SVE] Use truncating stores whenever possible (#196029)

For fixed length SVE and fixed length vectors x/y, fold

```
store(concat_vector(truncate(x), truncate(y)))
-->  store(truncate(x))
     store(truncate(y))
```
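
As a plain C++ analogy of why the fold preserves behaviour (a sketch with hypothetical helper names, not DAGCombiner code), storing the concatenation of two truncated vectors writes the same bytes as two truncating stores to adjacent locations:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Wide = std::vector<std::uint32_t>;
using Narrow = std::vector<std::uint16_t>;

// truncate(v): keep the low 16 bits of every element.
Narrow truncate(const Wide &V) {
  Narrow Out;
  for (std::uint32_t X : V)
    Out.push_back(static_cast<std::uint16_t>(X));
  return Out;
}

// store(concat_vector(truncate(x), truncate(y)))
void storeConcat(Narrow &Mem, const Wide &X, const Wide &Y) {
  Narrow Cat = truncate(X);
  Narrow TruncY = truncate(Y);
  Cat.insert(Cat.end(), TruncY.begin(), TruncY.end());
  Mem = Cat;
}

// Two truncating stores: each element is narrowed as it is written, first
// the elements of x, then the elements of y right behind them.
void storeSplit(Narrow &Mem, const Wide &X, const Wide &Y) {
  Mem.resize(X.size() + Y.size());
  for (std::size_t I = 0; I < X.size(); ++I)
    Mem[I] = static_cast<std::uint16_t>(X[I]);
  for (std::size_t I = 0; I < Y.size(); ++I)
    Mem[X.size() + I] = static_cast<std::uint16_t>(Y[I]);
}
```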
DeltaFile
+67-2llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+12-31llvm/test/CodeGen/AArch64/sve-fixed-length-trunc-stores.ll
+8-16llvm/test/CodeGen/X86/vector-shuffle-combining-avx2.ll
+87-493 files

LLVM/project 60e95e5clang/lib/Driver Driver.cpp ToolChain.cpp, clang/lib/Driver/ToolChains Flang.cpp

[Flang][Driver] Add per-target search path for modules (#196558)

Adds the version- and target-specific path

    ../lib/clang/<version>/finclude/flang/<target>

to the intrinsic module search path in addition to

    ../finclude/flang

with the former taking precedence if a module file should exist in both.
The version/target-specific path is added by the driver by passing
`-fintrinsic-modules-path` to the `-fc1` invocation. This is consistent
with gfortran and the usual pattern that the driver resolves paths into
the resource path, not the frontend.
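
The precedence rule can be sketched as a first-match lookup over an ordered directory list. This is an illustration under stated assumptions, not the driver's code: `intrinsicModuleDirs` and `findModule` are hypothetical helpers, the set of existing files stands in for the file system, and the paths are kept relative exactly as written above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Search directories in precedence order: the version- and target-specific
// directory first, then the generic one.
std::vector<std::string> intrinsicModuleDirs(const std::string &Version,
                                             const std::string &Target) {
  return {"../lib/clang/" + Version + "/finclude/flang/" + Target,
          "../finclude/flang"};
}

// Returns the first candidate path that exists, or an empty string.
// ExistingFiles is a stand-in for querying the file system.
std::string findModule(const std::vector<std::string> &Dirs,
                       const std::set<std::string> &ExistingFiles,
                       const std::string &FileName) {
  for (const std::string &Dir : Dirs) {
    std::string Candidate = Dir + "/" + FileName;
    if (ExistingFiles.count(Candidate))
      return Candidate;
  }
  return "";
}
```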

This PR adds nothing into that directory, which will be done in #171515.

Extracted out of #171515 as requested by

    [4 lines not shown]
DeltaFile
+62-0flang/test/Driver/intrinsic-module-path_per_target.f90
+32-0clang/lib/Driver/ToolChains/Flang.cpp
+6-6flang/test/Driver/use-module.f90
+11-0clang/lib/Driver/Driver.cpp
+6-0flang/lib/Frontend/CompilerInvocation.cpp
+6-0clang/lib/Driver/ToolChain.cpp
+123-66 files not shown
+134-1112 files

LLVM/project ee5c1c5clang/lib/CIR/CodeGen Address.h CIRGenExprCXX.cpp, clang/test/CIR/CodeGen union-agg-init.c

Address comments
DeltaFile
+9-9clang/test/CIR/CodeGenOpenACC/compute-reduction-clause-default-ops.c
+8-1clang/lib/CIR/CodeGen/Address.h
+2-1clang/lib/CIR/CodeGen/CIRGenExprCXX.cpp
+1-1clang/test/CIR/CodeGen/union-agg-init.c
+20-124 files

LLVM/project f3da7b9llvm/lib/Target/Mips MipsLegalizerInfo.cpp

[MIPS][GlobalISel] Remove dependency on legal ruleset (#197379)

This fills in always legal rules, to remove the dependency on the legacy
ruleset. This is not guaranteed to be all the rules, just the ones that
appear in tests.
DeltaFile
+5-0llvm/lib/Target/Mips/MipsLegalizerInfo.cpp
+5-01 files

LLVM/project 6f92180mlir/include/mlir/Dialect/NVGPU/IR NVGPUOps.td NVGPU.td, mlir/lib/Dialect/NVGPU/IR NVGPUDialect.cpp CMakeLists.txt

[MLIR][NVGPU] Use NVVM enums in NVGPU dialect (#195812)

Updates the `nvgpu.rcp` Op to use the NVVM `FPRoundingModeAttr`
attribute instead of redefining the attribute in the NVGPU dialect.
DeltaFile
+9-4mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp
+6-6mlir/test/Dialect/NVGPU/invalid.mlir
+5-4mlir/include/mlir/Dialect/NVGPU/IR/NVGPUOps.td
+1-1mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
+2-0mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td
+1-0mlir/lib/Dialect/NVGPU/IR/CMakeLists.txt
+24-151 files not shown
+25-157 files

LLVM/project b51d4c2llvm/include/llvm/Transforms/Utils LoopPeel.h UnrollLoop.h, llvm/lib/Transforms/Scalar LoopUnrollPass.cpp

[LoopPeel] Peel last iteration to enable load widening

In loops that contain multiple consecutive small loads (e.g., three i8
loads covering 3 bytes), peeling the last iteration makes it safe for the
remaining iterations to read past the bytes they access, enabling the use
of a wider load (e.g., i32) in the other N-1 iterations.
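
A rough source-level analogy of the transformation, assuming a buffer of exactly 3 * N bytes holding packed 3-byte records (the function name is hypothetical, and the pass itself works on LLVM IR, not C++):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Sums all bytes of N packed 3-byte records stored in exactly 3 * N bytes.
unsigned sumRecordsPeeled(const std::uint8_t *P, std::size_t N) {
  unsigned Sum = 0;
  if (N == 0)
    return Sum;
  // First N-1 iterations: one 4-byte read per record. The fourth byte
  // belongs to the next record, so the read stays inside the buffer.
  for (std::size_t I = 0; I + 1 < N; ++I, P += 3) {
    std::uint8_t Wide[4];
    std::memcpy(Wide, P, sizeof(Wide));
    Sum += Wide[0] + Wide[1] + Wide[2];
  }
  // Peeled last iteration: three single-byte reads, because a 4-byte read
  // here would touch one byte past the end of the buffer.
  Sum += P[0] + P[1] + P[2];
  return Sum;
}
```

Without peeling, the wide read could not be used at all, since its last instance would access memory the original loop never touched.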

Patterns such as:
```
  %a = load i8, ptr %p
  %b = load i8, ptr %p+1
  %c = load i8, ptr %p+2
  ...
  %p.next = getelementptr i8, ptr %p, 3
```

Can be transformed to:
```
  %wide = load i32, ptr %p  ; Read 4 bytes

    [9 lines not shown]
DeltaFile
+616-0llvm/test/Transforms/LoopUnroll/peel-last-iteration-load-widening.ll
+230-1llvm/lib/Transforms/Utils/LoopPeel.cpp
+104-0llvm/test/Transforms/LoopUnroll/peel-last-iteration-load-widening-be.ll
+19-13llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
+7-1llvm/include/llvm/Transforms/Utils/LoopPeel.h
+2-1llvm/include/llvm/Transforms/Utils/UnrollLoop.h
+978-166 files

LLVM/project 41b3177llvm/lib/Transforms/Utils LoopPeel.cpp, llvm/test/Transforms/LoopUnroll peel-last-iteration-load-widening.ll peel-last-iteration-load-widening-be.ll

Address comments 1
DeltaFile
+1,694-0llvm/test/Transforms/LoopUnroll/AArch64/peel-last-iteration-load-widening.ll
+0-616llvm/test/Transforms/LoopUnroll/peel-last-iteration-load-widening.ll
+117-79llvm/lib/Transforms/Utils/LoopPeel.cpp
+0-104llvm/test/Transforms/LoopUnroll/peel-last-iteration-load-widening-be.ll
+67-0llvm/test/Transforms/LoopUnroll/PowerPC/peel-last-iteration-load-widening-be.ll
+56-0llvm/test/Transforms/LoopUnroll/AArch64/peel-last-iteration-load-widening-disabled.ll
+1,934-7994 files not shown
+1,948-80910 files

LLVM/project c816a36llvm/test/Transforms/LoopVectorize scalable-first-order-recurrence.ll, llvm/test/Transforms/LoopVectorize/AArch64 sve-interleaved-masked-accesses.ll partial-reduce-chained.ll

[VPlan] Expand simple SCEVs directly to VPInstructions. (#189455)

Add initial simple SCEV expansion directly to VPInstructions. To start
with, just support expanding SCEV expressions for the vector step (VF *
UF). This requires expanding VScale, constants and multiply expressions.

This enables CSE for some redundant vscale calls as a first step
and also enables expanding SCEV expressions in blocks other than the
header as follow-ups. For example, this could be useful to avoid some
code movement with https://github.com/llvm/llvm-project/pull/189372.
DeltaFile
+80-84llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll
+18-36llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll
+15-30llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
+14-28llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-inloop-reduction.ll
+14-28llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reduction.ll
+12-24llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll
+153-23085 files not shown
+467-69291 files