LLVM/project 2f79d41 llvm/test/CodeGen/X86 avx512-calling-conv.ll avx512-masked_memop-16-8.ll

[X86] LowerBUILD_VECTORvXi1 - attempt to fold as VPTESTMB(BUILD_VECTOR_vXi8(X),1) (#198166)

i1 scalar elements will be legalised to i8 (and the BUILD_VECTOR relies
on implicit truncation) - but it will often be cheaper to perform the
BUILD_VECTOR as a vXi8 and then perform a comparison to convert to the
vXi1 mask, assuming we're inserting more than one non-constant i1
element.

Without BWI we have to extend this to vXi32 types to perform the
comparison.

There's probably a lot we can do here (v2i8/v4i8/v8i8 types), but this
patch at least addresses the worst codegen cases.

Fixes #179334
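The following sketch emulates VPTESTMB in portable C++ to illustrate the fold. Mask bit i is set when the AND of the two byte lanes is non-zero, so testing a vXi8 vector of 0/1 bytes against a splat of 1 recovers the vXi1 mask. The names `vptestmb` and `buildMaskViaBytes` are hypothetical, and the code models the semantics only; it is not the X86 lowering code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable model of VPTESTMB semantics (hypothetical helper, not LLVM code):
// mask bit i is set when (a[i] & b[i]) != 0.
template <std::size_t N>
std::uint64_t vptestmb(const std::array<std::uint8_t, N>& a,
                       const std::array<std::uint8_t, N>& b) {
  static_assert(N <= 64, "a 512-bit vector holds at most 64 byte lanes");
  std::uint64_t mask = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (a[i] & b[i])
      mask |= std::uint64_t{1} << i;
  return mask;
}

// Builds the vXi1 mask the way the commit message describes: materialize the
// i1 elements as a vXi8 vector of 0/1 bytes, then test every lane against 1.
// Only bit 0 of each byte matters, mirroring the implicit i8 -> i1 truncation.
template <std::size_t N>
std::uint64_t buildMaskViaBytes(const std::array<bool, N>& bits) {
  std::array<std::uint8_t, N> bytes{};
  std::array<std::uint8_t, N> ones{};
  for (std::size_t i = 0; i < N; ++i) {
    bytes[i] = bits[i] ? 1 : 0;
    ones[i] = 1;
  }
  return vptestmb(bytes, ones);
}
```

For instance, the lanes {true, false, true, true} produce the mask 0b1101. A byte such as 0xFE tests as false because only its low bit is checked against 1.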
Delta File
+749 -4,307  llvm/test/CodeGen/X86/avx512-calling-conv.ll
+207 -1,438  llvm/test/CodeGen/X86/avx512-masked_memop-16-8.ll
+197 -1,083  llvm/test/CodeGen/X86/avx512-load-store.ll
+203 -915  llvm/test/CodeGen/X86/vector-compress.ll
+158 -868  llvm/test/CodeGen/X86/avx512-ext.ll
+154 -866  llvm/test/CodeGen/X86/avx512-mask-op.ll
+1,668 -9,477  1 file not shown
+1,688 -9,478  7 files

LLVM/project 1907b58 llvm/lib/Target/PowerPC PPCISelLowering.cpp, llvm/test/CodeGen/PowerPC ppc-i128-cmp.ll

[PowerPC] Fix i128 vcmpequb optimization for loads with range metadata and small constants (#196801)

The combine introduced in 55aff64d2c6ef50d2ed725d7dd1fb34080486237
lowers scalar i128 compares into vector compares by reissuing the
original loads as v16i8 loads. However, the combine was reusing the
original MachineMemOperand without modification.

If the original i128 load carries !range metadata, the MMO encodes that
range using i128 values. Reusing this MMO for a v16i8 load is incorrect
as range metadata is only valid for integer scalar types and its
bitwidth must match the memory VT.

This patch fixes this by creating a new MachineMemOperand for the
vector load. Additionally, we restrict the combine for constant operands
to avoid cases that are better handled by scalar lowering. Small
constants (those that fit within 16 bits) are excluded to prevent generating
suboptimal vector compares.
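The combine is valid because an i128 equality holds exactly when all 16 bytes match. A lane-wise byte compare such as vcmpequb, followed by an all-lanes check, computes the same result. The sketch below models the i128 as two 64-bit halves; the helper names are hypothetical and the code does not cover the range-metadata or small-constant handling.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// The same 16 bytes viewed as a v16i8 vector (illustrative model only).
using V16i8 = std::array<std::uint8_t, 16>;

// Reinterprets an i128, modelled as two 64-bit halves, as 16 bytes.
V16i8 asV16i8(std::uint64_t lo, std::uint64_t hi) {
  V16i8 bytes{};
  std::memcpy(bytes.data(), &lo, 8);
  std::memcpy(bytes.data() + 8, &hi, 8);
  return bytes;
}

// Scalar i128 equality.
bool eqI128(std::uint64_t aLo, std::uint64_t aHi, std::uint64_t bLo,
            std::uint64_t bHi) {
  return aLo == bLo && aHi == bHi;
}

// Vector form: compare byte lanes pairwise, then require every lane to match.
// Byte order is irrelevant for equality, so endianness does not matter here.
bool eqViaBytes(const V16i8& a, const V16i8& b) {
  for (std::size_t i = 0; i < a.size(); ++i)
    if (a[i] != b[i])
      return false;
  return true;
}
```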
Delta File
+282 -0  llvm/test/CodeGen/PowerPC/ppc-i128-cmp.ll
+28 -8  llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+310 -8  2 files

LLVM/project 24b5f1d llvm/include/llvm/Transforms/Vectorize SLPVectorizer.h, llvm/lib/Transforms/Vectorize SLPVectorizer.cpp

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
Delta File
+53 -28  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+15 -24  llvm/test/Transforms/SLPVectorizer/RISCV/vec3-base.ll
+14 -20  llvm/test/Transforms/SLPVectorizer/AArch64/commute.ll
+10 -16  llvm/test/Transforms/SLPVectorizer/X86/dot-product.ll
+16 -9  llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h
+8 -4  llvm/test/Transforms/SLPVectorizer/X86/slp-fma-loss.ll
+116 -101  1 file not shown
+119 -104  7 files

LLVM/project 6931a33 llvm/test/CodeGen/X86 avx512-masked_memop-16-8.ll avx512-load-store.ll

[X86] avx512-load-store.ll - add test coverage for #198154 and #198165 (#198169)
Delta File
+1,450 -0  llvm/test/CodeGen/X86/avx512-masked_memop-16-8.ll
+1,105 -16  llvm/test/CodeGen/X86/avx512-load-store.ll
+2,555 -16  2 files

LLVM/project aa2f124 llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/PhaseOrdering/AArch64 reduce_submuladd.ll

[SLP] Enable full non-power-of-2 vectorization by default

Default slp-vectorize-non-power-of-2 to true and broaden the set of
supported widths beyond NumElts + 1 == bit_ceil(NumElts) to include
small widths (<= 5), widths where NumElts - 1 is also not a power of two
(e.g. 6, 7, 10..15), and any width when the elements being vectorized
are themselves vectors (REVEC). Adapt gathered-load, store, and
reduction support to the non-power-of-2 vector factors.

Reviewers: hiraditya, bababuck, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/196825
Delta File
+140 -76  llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll
+137 -42  llvm/test/Transforms/SLPVectorizer/X86/dot-product.ll
+120 -29  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+31 -98  llvm/test/Transforms/PhaseOrdering/AArch64/reduce_submuladd.ll
+44 -56  llvm/test/Transforms/SLPVectorizer/X86/horizontal-list.ll
+24 -60  llvm/test/Transforms/SLPVectorizer/RISCV/revec.ll
+496 -361  33 files not shown
+703 -701  39 files

LLVM/project 265dcdd llvm/test/CodeGen/X86 avx512-calling-conv.ll

[X86] avx512-calling-conv.ll - add test coverage for #179334 (#198163)
Delta File
+3,451 -0  llvm/test/CodeGen/X86/avx512-calling-conv.ll
+3,451 -0  1 file

LLVM/project d926f39 flang/include/flang/Optimizer/Dialect/CUF/Attributes CUFAttr.h, flang/lib/Optimizer/Transforms CompilerGeneratedNames.cpp

[CUF] Fix CompilerGeneratedNamesConversion renaming managed companion globals

CUFAddConstructor creates a companion pointer global (e.g. foo.managed.ptr)
for each non-allocatable managed variable. When CompilerGeneratedNamesConversion
ran after CUFAddConstructor, it replaced the dots with 'X',
so CUFOpConversionLate could no longer find the companion by name and fell back
to CUFGetDeviceAddress with the wrong host pointer, causing cudaErrorInvalidSymbol.

Fix: mark the companion global with a cuf.managed_ptr unit attribute in
CUFAddConstructor and skip it in CompilerGeneratedNamesConversionPass.

Co-authored-by: Claude Sonnet 4.6 <noreply at anthropic.com>
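Here is a simplified model of the fix. The renaming step skips any global that carries the marker attribute, so the companion keeps the dotted name that CUFOpConversionLate looks up. `GlobalSketch`, its `managedPtr` flag (standing in for the `cuf.managed_ptr` unit attribute), and `renameCompilerGenerated` are hypothetical. For brevity, every name is treated as compiler-generated.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a FIR global; not the MLIR/FIR API.
struct GlobalSketch {
  std::string name;
  bool managedPtr = false;  // models the cuf.managed_ptr unit attribute
};

// Replaces '.' with 'X' in every name, except for globals marked as
// managed companions, which must keep the name other passes look up.
void renameCompilerGenerated(std::vector<GlobalSketch>& globals) {
  for (auto& g : globals) {
    if (g.managedPtr)
      continue;
    std::replace(g.name.begin(), g.name.end(), '.', 'X');
  }
}
```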
Delta File
+51 -0  flang/test/Fir/CUDA/cuda-managed-ptr-companion.mlir
+7 -0  flang/include/flang/Optimizer/Dialect/CUF/Attributes/CUFAttr.h
+2 -2  flang/test/Fir/CUDA/cuda-constructor-2.f90
+3 -1  flang/lib/Optimizer/Transforms/CompilerGeneratedNames.cpp
+4 -0  flang/lib/Optimizer/Transforms/CUDA/CUFAddConstructor.cpp
+67 -3  5 files

LLVM/project d90a802 llvm/lib/IR PrintPasses.cpp

[IR][PrintPasses] Disable IO Sandbox in doSystemDiff (#198151)

Fix `fatal error: error in backend: IO sandbox violation` when executing
`clang -cc1 -print-after-all -print-changed=diff`.

doSystemDiff performs temporary-file I/O and executes an external diff
program, both of which conflict with the IO sandbox.
Delta File
+3 -0  llvm/lib/IR/PrintPasses.cpp
+3 -0  1 file

LLVM/project e925b35 libcxx/include/__algorithm copy_backward.h move.h

[libc++] Introduce a private version of in_out_result and use it for copy/move algorithms (#198086)

This patch introduces a new `__in_out_result`, an internal backport of
`in_out_result` that is convertible to `in_out_result` when the latter
is available. This improves the readability of the code, since it replaces
uses of `first` and `second` with `__in_` and `__out_`, making it clear
which iterator is accessed.

Other algorithms will be updated in separate patches.
Delta File
+18 -17  libcxx/include/__algorithm/copy_backward.h
+16 -15  libcxx/include/__algorithm/move.h
+16 -15  libcxx/include/__algorithm/copy.h
+16 -14  libcxx/include/__algorithm/move_backward.h
+9 -12  libcxx/include/__algorithm/copy_move_common.h
+14 -0  libcxx/include/__algorithm/in_out_result.h
+89 -73  17 files not shown
+137 -140  23 files

LLVM/project 4dc415f clang/lib/CodeGen CGCall.cpp

[CGCall] Initially store arg attrs using AttrBuilder (NFCI) (#197906)

Make the argument attribute handling more similar to fn/ret handling by
first populating an AttrBuilder and then converting it to an AttributeSet
once at the end, instead of using many intermediate AttrBuilders. This
also ensures that no attributes can be lost because one code path
overwrites another.
Delta File
+16 -20  clang/lib/CodeGen/CGCall.cpp
+16 -20  1 file

LLVM/project e6a1278 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp AArch64Arm64ECCallLowering.cpp, llvm/test/CodeGen/AArch64 arm64ec-exit-thunks.ll arm64ec-hybrid-patchable.ll

[AArch64] Copy x4/x5 vararg payload into the x64 stack in Arm64EC exit thunks (#190933)

Currently, LLVM treats x4/x5 in variadic Arm64EC exit thunks like any
other outgoing arguments. However, x4 contains a pointer to the first
stack parameter and x5 the size of the parameters passed on the stack,
and the generated exit thunk must memcpy those parameters onto the
x86-64 stack. Current MSVC does this correctly.

Rather than introducing a new entry to the CallingConv enum, we mark the
call as vararg in AArch64Arm64ECCallLowering so that the lowering logic in
AArch64ISelLowering.cpp can recognise this case, perform the necessary
memcpy, and drop the x4/x5 arguments.

LLVM should additionally ensure that x0-x3 are mirrored to f0-f3 in
order to match the Windows x86-64 vararg ABI, but that change is left
for a follow-up patch.
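The required copy can be modelled as follows, under three assumptions. First, `x4` points at the first stack parameter. Second, `x5` holds the total size of the stack parameters in bytes. Third, a Windows x64 callee expects stack arguments immediately after the 32-byte home (shadow) space. `buildX64OutgoingArea` is a hypothetical helper that only models the data movement; in the patch, the copy is emitted during call lowering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Windows x64: the caller reserves 32 bytes of home space for RCX/RDX/R8/R9
// below the stack-passed arguments.
constexpr std::size_t kX64HomeSpace = 32;

// Hypothetical model of the exit thunk's job: build the x64 outgoing argument
// area, leaving the home space zeroed and copying x5 bytes from x4 after it.
std::vector<std::uint8_t> buildX64OutgoingArea(const std::uint8_t* x4,
                                               std::size_t x5) {
  std::vector<std::uint8_t> area(kX64HomeSpace + x5, 0);
  if (x5 != 0)
    std::memcpy(area.data() + kX64HomeSpace, x4, x5);
  return area;
}
```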
Delta File
+208 -6  llvm/test/CodeGen/AArch64/arm64ec-exit-thunks.ll
+62 -4  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+11 -9  llvm/test/CodeGen/AArch64/arm64ec-hybrid-patchable.ll
+9 -1  llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
+290 -20  4 files

LLVM/project 606a570 llvm/lib/Target/RISCV/Disassembler RISCVDisassembler.cpp

[𝘀𝗽𝗿] initial version

Created using spr 1.3.8-beta.1
Delta File
+17 -31  llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
+17 -31  1 file

LLVM/project 4b27000 llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU load-atomic-flat.ll load-atomic-global.ll

[AMDGPU][AtomicExpand] Support sub naturally aligned 64 bit atomic load/store
Delta File
+87 -0  llvm/test/CodeGen/AMDGPU/load-atomic-flat.ll
+87 -0  llvm/test/CodeGen/AMDGPU/load-atomic-global.ll
+49 -0  llvm/test/CodeGen/AMDGPU/store-atomic-flat.ll
+48 -0  llvm/test/CodeGen/AMDGPU/store-atomic-global.ll
+44 -0  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+36 -0  llvm/test/Transforms/AtomicExpand/AMDGPU/unaligned-atomic.ll
+351 -0  6 files not shown
+417 -6  12 files

LLVM/project b5cf3c7 clang/lib/CIR/CodeGen CIRGenBuiltinAArch64.cpp, clang/test/CodeGen/AArch64 neon-perm.c

[CIR][AArch64] Lower NEON vtrn intrinsics (#197651)

### Summary

Part of: https://github.com/llvm/llvm-project/issues/185382

Lower the `vtrn` intrinsics listed in:
https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#transpose-elements

Lower `case NEON::BI__builtin_neon_vtrn_v` and
`case NEON::BI__builtin_neon_vtrnq_v` in CIRGenBuiltinAArch64.cpp by
porting the existing incubator logic
(`clangir/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp`), generating
the two transpose results with `cir.vec.shuffle` and storing both
vectors into the returned pair structure.

All intrinsics in `clang/test/CodeGen/AArch64/neon-perm.c` have been
migrated into `clang/test/CodeGen/AArch64/neon/perm.c`, so I deleted the
original file.
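For reference, the two transpose results correspond to shuffles of the concatenated inputs with TRN1/TRN2-style masks. The sketch below computes those indices; an index of `NumElts` or more selects from the second operand. `vtrnMask` is a hypothetical helper, not the CIR code.

```cpp
#include <cassert>
#include <vector>

// Shuffle mask for result `vi` (0 or 1) of a NEON-style transpose of two
// NumElts-lane vectors a and b, indexing into concat(a, b).
// vi == 0 pairs up the even lanes, vi == 1 the odd lanes.
std::vector<int> vtrnMask(int NumElts, int vi) {
  std::vector<int> mask;
  for (int i = 0; i < NumElts; i += 2) {
    mask.push_back(i + vi);            // lane from a
    mask.push_back(i + NumElts + vi);  // matching lane from b
  }
  return mask;
}
```

For 4-lane vectors this yields {0, 4, 2, 6} and {1, 5, 3, 7}, i.e. {a0, b0, a2, b2} and {a1, b1, a3, b3}.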
Delta File
+0 -383  clang/test/CodeGen/AArch64/neon-perm.c
+372 -0  clang/test/CodeGen/AArch64/neon/perm.c
+0 -36  clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_untyped.c
+22 -2  clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+394 -421  4 files

LLVM/project 6215721 clang/test/CodeGen/AArch64/neon perm.c

[CIR][AArch64][NFC] Modify unzip zip intrinsics checkline (#198064)

### Summary

Part of: https://github.com/llvm/llvm-project/issues/185382

This is a follow-up to:
https://github.com/llvm/llvm-project/pull/195591 and
https://github.com/llvm/llvm-project/pull/193658

Change the check lines from hard-coded values to FileCheck variable captures.
Delta File
+120 -120  clang/test/CodeGen/AArch64/neon/perm.c
+120 -120  1 file

LLVM/project 626ef59 llvm/lib/Transforms/Instrumentation IndirectCallPromotion.cpp, llvm/lib/Transforms/Scalar JumpTableToSwitch.cpp

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
Delta File
+3 -4  llvm/lib/Transforms/Scalar/JumpTableToSwitch.cpp
+3 -3  llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
+6 -7  2 files

LLVM/project a1ff77d llvm/lib/Transforms/Instrumentation IndirectCallPromotion.cpp

[𝘀𝗽𝗿] changes to main this commit is based on

Created using spr 1.3.7

[skip ci]
Delta File
+3 -3  llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
+3 -3  1 file

LLVM/project ad342d4 llvm/lib/Transforms/Instrumentation IndirectCallPromotion.cpp

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
Delta File
+3 -3  llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
+3 -3  1 file

LLVM/project bae540b llvm/lib/IR Verifier.cpp, llvm/test/Verifier value-profile.ll

[IR] Verify that VP !prof does not have duplicate prof values

Follows up on #193077. Duplicate values should not occur, as they should
be merged before being moved into IR. Add this check to the verifier so
that we can actually enforce this constraint.

Reviewers: david-xl, teresajohnson

Pull Request: https://github.com/llvm/llvm-project/pull/193083
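The following sketch shows the check. It assumes the value/count pairs have already been extracted from the `!{!"VP", i32 kind, i64 total, i64 value, i64 count, ...}` operand list; any value seen twice is then an error. `hasDuplicateProfValues` is a hypothetical name, not the Verifier's API.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

using ValueCount = std::pair<std::uint64_t, std::uint64_t>;

// Returns true if any profiled value appears more than once.
// Counts are ignored: duplicates should have been merged before reaching IR.
bool hasDuplicateProfValues(const std::vector<ValueCount>& valueCounts) {
  std::unordered_set<std::uint64_t> seen;
  for (const ValueCount& vc : valueCounts)
    if (!seen.insert(vc.first).second)  // insert fails if already present
      return true;
  return false;
}
```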
Delta File
+9 -0  llvm/lib/IR/Verifier.cpp
+9 -0  llvm/test/Verifier/value-profile.ll
+18 -0  2 files

LLVM/project 96285e0 flang/lib/Semantics check-omp-structure.cpp check-omp-structure.h, flang/test/Semantics/OpenMP declare-target09.f90

[flang][OpenMP] Add diagnostic for bare DECLARE TARGET in invalid scopes (#198039)

The bare form of `!$omp declare target` (without arguments or clauses)
is only permitted in the specification part of a subroutine, function,
or interface body (OpenMP 5.2, section 7.8.2). Flang previously accepted
it silently in BLOCK DATA, PROGRAM, MODULE, SUBMODULE, and BLOCK
constructs.

This patch:
- Adds a semantic check rejecting the bare form outside Subprogram
  scopes.
- Adds MpSubprogramStmt/EndMpSubprogramStmt scope tracking to avoid
  false positives in separate module subprograms (MODULE PROCEDURE).
- Fixes pre-existing BlockConstruct scope tracking bugs: the Leave
  handler was pushing instead of popping (stack corruption), and the
  Enter handler used blockStmt.source which resolves to the parent
  scope. Now uses endBlockStmt.source (walked inside the block scope
  during name resolution) for correct BlockConstruct scope identity.


    [4 lines not shown]
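The scope rule and the push/pop fix can be modelled with a small scope stack. `ScopeKind`, `ScopeTracker`, and its methods are hypothetical. `Subprogram` stands in for subroutine, function, and interface-body scopes, so a BLOCK nested inside a subroutine is still rejected because the innermost scope is the block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scope kinds; Subprogram covers subroutine, function and
// interface-body scopes in this model.
enum class ScopeKind { Program, Module, Submodule, BlockData, Block, Subprogram };

class ScopeTracker {
 public:
  void enter(ScopeKind k) { stack_.push_back(k); }
  // Leaving a scope must pop; pushing here would corrupt the stack.
  void leave() {
    assert(!stack_.empty());
    stack_.pop_back();
  }
  // The bare form is accepted only when the innermost scope is a subprogram.
  bool bareDeclareTargetAllowed() const {
    return !stack_.empty() && stack_.back() == ScopeKind::Subprogram;
  }
  std::size_t depth() const { return stack_.size(); }

 private:
  std::vector<ScopeKind> stack_;
};
```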
Delta File
+101 -0  flang/test/Semantics/OpenMP/declare-target09.f90
+25 -14  flang/lib/Semantics/check-omp-structure.cpp
+2 -0  flang/lib/Semantics/check-omp-structure.h
+128 -14  3 files

LLVM/project bf89698 lld/ELF MarkLive.cpp, lld/test/ELF why-live.test

[ELF] Fix imprecise --why-live message for exported symbols (#198139)

The "; may interpose" suffix is imprecise: a symbol is preserved because
it is exported into the dynamic symbol table, regardless of whether it
is interposable (preemptible).

Fixes #192035 and adds a test (the message was previously untested).
Delta File
+11 -0  lld/test/ELF/why-live.test
+1 -1  lld/ELF/MarkLive.cpp
+12 -1  2 files

LLVM/project e407a44 llvm/test/FileCheck/dump-input filter.txt annotations.txt, llvm/test/FileCheck/dump-input/search-range-annotations check-not.txt check-label-follows.txt

[FileCheck] Annotate search ranges with { } in -dump-input

Example
=======

```
$ cat check
CHECK: start
CHECK-NEXT: end

$ FileCheck -v -dump-input-context=2 check < input |& tail -23
<<<<<<
          1: start
check:1      ^~~~~
next:2'0         {   search range start (exclusive)
          2: foo0
          3: foo1
          .
          .

    [134 lines not shown]
Delta File
+373 -214  llvm/test/FileCheck/dump-input/filter.txt
+364 -113  llvm/test/FileCheck/dump-input/annotations.txt
+201 -29  llvm/utils/FileCheck/FileCheck.cpp
+101 -99  llvm/test/FileCheck/dump-input/context.txt
+112 -0  llvm/test/FileCheck/dump-input/search-range-annotations/check-not.txt
+74 -0  llvm/test/FileCheck/dump-input/search-range-annotations/check-label-follows.txt
+1,225 -455  7 files not shown
+1,327 -491  13 files

LLVM/project 2b7c4d0 llvm/test/FileCheck/dump-input annotations.txt, llvm/utils/FileCheck FileCheck.cpp

[FileCheck] Refactor -dump-input test

This patch refactors `llvm/test/FileCheck/dump-input/annotations.txt`
to improve maintainability and coverage and to prepare for the
upcoming implementation of search range annotations.

Lit substitutions
=================

The test repeats the same basic set of RUN lines *many* times.  This
patch encapsulates those in lit substitutions to improve
maintainability.  By doing so, it also helps to ensure more consistent
coverage of all cases and thus slightly expands coverage.

-strict-whitespace
==================

Via those substitutions, this patch adds `-strict-whitespace`
throughout the test, and it drops the initial `-strict-whitespace`

    [27 lines not shown]
Delta File
+592 -509  llvm/test/FileCheck/dump-input/annotations.txt
+11 -0  llvm/utils/FileCheck/FileCheck.cpp
+603 -509  2 files

LLVM/project b2b510a llvm/test/FileCheck/match-time-error-propagation invalid-excluded-pattern.txt matched-excluded-pattern.txt

[FileCheck] Resurrect overflow tests

D150880 (landed as 0726cb004718) uses `APInt` to eliminate most
integer overflow issues from FileCheck numeric variables.  It also
removes the 4 tests in
`llvm/test/FileCheck/match-time-error-propagation`.

While the elimination of overflow issues reduces the importance of
those tests, the tests still seem worthwhile.  Without them, I see no
test that exercises the "unable to substitute variable or numeric
expression: overflow error" diagnostic in FileCheck input dumps.

This patch resurrects those tests and updates them to exercise the
remaining unsigned underflow case.
Delta File
+68 -0  llvm/test/FileCheck/match-time-error-propagation/invalid-excluded-pattern.txt
+63 -0  llvm/test/FileCheck/match-time-error-propagation/matched-excluded-pattern.txt
+58 -0  llvm/test/FileCheck/match-time-error-propagation/matched-expected-pattern.txt
+55 -0  llvm/test/FileCheck/match-time-error-propagation/invalid-expected-pattern.txt
+244 -0  4 files

LLVM/project 20b0089 lld/ELF SymbolTable.cpp

[ELF] Remove redundant memset in SymbolTable::insert. NFC (#198132)

make<SymbolUnion>() value-initializes the union, zero-initializing all
sizeof(SymbolUnion) bytes. The following memset(sym, 0, sizeof(Symbol))
is therefore redundant.

This placeholder path runs no Symbol constructor, so it was not covered
by the constructor initialization in
905a88b923433eb8cd83677ea55bee82eb9ba498.
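The value-initialization argument can be checked with a small stand-in. `new (p) T()` with empty parentheses value-initializes. For a union without a user-provided constructor, that means zero-initialization (of the first member and the padding), whatever the storage held before. `SymbolUnionSketch` and `valueInitInto` are hypothetical and much smaller than lld's `SymbolUnion`.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical, much-reduced stand-in for lld's SymbolUnion: trivial members
// and no user-provided constructor.
union SymbolUnionSketch {
  struct Fields {
    unsigned flags;
    unsigned char binding;
  } fields;
  unsigned char raw[32];
};

// The empty parentheses request value-initialization, which for this union
// zero-initializes its first member (and padding) before any other setup.
SymbolUnionSketch* valueInitInto(void* storage) {
  return new (storage) SymbolUnionSketch();
}
```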
Delta File
+2 -2  lld/ELF/SymbolTable.cpp
+2 -2  1 file

LLVM/project 19502e4 llvm/test/CodeGen/AMDGPU/GlobalISel sdivrem.ll udivrem.ll, llvm/test/CodeGen/Thumb2 mve-clmul.ll

Rebase

Created using spr 1.3.7
Delta File
+8,633 -8,584  llvm/test/CodeGen/Thumb2/mve-clmul.ll
+3,436 -2,769  llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll
+2,801 -2,109  llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll
+0 -4,752  llvm/test/tools/llvm-mca/RISCV/SiFiveP800/vlseg-vsseg.s
+4,549 -0  llvm/test/tools/llvm-mca/RISCV/SiFiveP800/rvv/arithmetic.test
+3,706 -328  llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-local.mir
+23,125 -18,542  2,566 files not shown
+155,715 -74,033  2,572 files

LLVM/project f70897f llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 vector-replicaton-i1-mask.ll

[X86] isExtractSubvectorCheap - fix typo in vXi1 extraction test (#198127)

Fix a typo in the check that the ResVT subvector is half the size of the SrcVT vector (the check was written the other way around).

Fixes #195695
Delta File
+1,243 -8,768  llvm/test/CodeGen/X86/vector-replicaton-i1-mask.ll
+3 -2  llvm/lib/Target/X86/X86ISelLowering.cpp
+1,246 -8,770  2 files

LLVM/project 905a88b lld/ELF Symbols.h InputFiles.cpp

[ELF] Initialize Symbol fields in the constructor instead of via memset (#198129)

`initSectionsAndLocalSyms` and `makeDefined` memset the storage to zero
and then placement-new a Symbol-derived object into it. Placement new
begins a new object's lifetime. The standard does not appear to guarantee
that the memset bytes carry over into members the constructor leaves
uninitialized.

lld built by GCC 16 can make Valgrind report reads of Symbol::flags
(via getSymSectionIndex during finalizeSections) as uses of
uninitialized values (ClangBuiltLinux/linux#2162).

This patch reinstates the per-field initialization that commit
778742760534 ("[ELF] Avoid redundant assignment to Symbol fields. NFC")
had replaced with a bulk memset.
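Here is a minimal sketch of per-field initialization, using a simplified symbol type. Default member initializers run on every placement new, so the fields are well defined even when the storage previously held garbage. Without the `= 0` on `flags`, the sketch would rely on earlier memset bytes surviving placement new, which, as the commit message notes, the standard does not appear to guarantee. `SymbolSketch` is hypothetical, not lld's `Symbol`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>

// Hypothetical, simplified symbol: every field has a default member
// initializer, so the constructor fully initializes the object.
struct SymbolSketch {
  std::uint32_t flags = 0;
  std::uint8_t binding = 0;
  bool used = false;
  explicit SymbolSketch(std::uint8_t b) : binding(b) {}
};
```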
Delta File
+12 -8  lld/ELF/Symbols.h
+0 -2  lld/ELF/InputFiles.cpp
+12 -10  2 files

LLVM/project 2e4c820 llvm/lib/Transforms/Vectorize VPlanCFG.h VPlanConstruction.cpp

[VPlan] Refine plain CFG iterator name and strengthen assert (NFC). (#198124)

Address post-commit comments for
https://github.com/llvm/llvm-project/pull/197499:
* Add an rpo prefix to the name to indicate the traversal order (similar
  to the other vp_depth_first_ helpers).
* Add a comment about skipped VPIRBBs, plus an assert.
Delta File
+8 -3  llvm/lib/Transforms/Vectorize/VPlanCFG.h
+1 -1  llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+9 -4  2 files

LLVM/project e024375 llvm/lib/Transforms/Vectorize VPlanUtils.h VPlanTransforms.cpp

[VPlan] Add blocksAs helper (NFC). (#198122)

Add new blocksAs helper which casts all blocks in the provided range to
the specified type, instead of filtering out non-matching blocks.
Migrate a number of users that expect only VPBasicBlocks.

Pointed out post-commit in
https://github.com/llvm/llvm-project/pull/197499.
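A toy block hierarchy shows the difference between filtering and casting. The filtering variant silently drops blocks of other kinds. The casting variant asserts that every block has the expected type. `BlockBase`, `BasicBlock`, `RegionBlock`, `blocksOnlySketch`, and `blocksAsSketch` are hypothetical stand-ins, not the VPlan API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for a block class hierarchy.
struct BlockBase { virtual ~BlockBase() = default; };
struct BasicBlock : BlockBase {};
struct RegionBlock : BlockBase {};

// Filtering: keeps only the blocks that are T, silently skipping the rest.
template <class T, class Range>
std::vector<T*> blocksOnlySketch(const Range& blocks) {
  std::vector<T*> out;
  for (BlockBase* b : blocks)
    if (auto* t = dynamic_cast<T*>(b))
      out.push_back(t);
  return out;
}

// Casting: every block is expected to be a T; a mismatch is a bug and trips
// the assert (in builds where assertions are enabled).
template <class T, class Range>
std::vector<T*> blocksAsSketch(const Range& blocks) {
  std::vector<T*> out;
  for (BlockBase* b : blocks) {
    auto* t = dynamic_cast<T*>(b);
    assert(t && "unexpected block type");
    out.push_back(t);
  }
  return out;
}
```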
Delta File
+11 -0  llvm/lib/Transforms/Vectorize/VPlanUtils.h
+3 -3  llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+2 -2  llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+1 -1  llvm/lib/Transforms/Vectorize/VPlanCFG.h
+17 -6  4 files