LLVM/project f6e4e71 llvm/lib/Transforms/Vectorize LoopVectorize.cpp VPlan.h, llvm/test/Transforms/LoopVectorize select-cmp-blend-chain.ll

Reapply "[LV] Handle chained selects/blends when creating new rdx cha… (#199559)

This reverts commit ab1745439c7019d0753afc616c5fc5aef7b82fb6 & reapplies
#199443.

Recommit with an additional fix to handle other select-like
recipes, including VPWidenRecipe and VPReplicateRecipe.

Original message:
Make sure we recursively clone chains of selects/blends when re-creating
a reduction chain with new types.
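
The recursive cloning described above can be illustrated with a toy sketch. This is not the LoopVectorize code and does not use the VPlan API: `Node`, `Arena`, and `cloneChain` are hypothetical names, and re-creating a leaf in the new type stands in for whatever cast the real transform would emit.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy node type for illustration only; these are NOT VPlan recipes.
struct Node {
  enum Kind { Leaf, Select, Blend } kind;
  std::string type;            // scalar type name, e.g. "i32"
  std::vector<Node *> values;  // value operands that carry the reduction
  std::vector<Node *> masks;   // conditions/masks, reused unchanged
};

// Owns every node so raw pointers stay valid for the arena's lifetime.
struct Arena {
  std::vector<std::unique_ptr<Node>> nodes;
  Node *make(Node::Kind kind, std::string type,
             std::vector<Node *> values = {}, std::vector<Node *> masks = {}) {
    nodes.push_back(std::make_unique<Node>(
        Node{kind, std::move(type), std::move(values), std::move(masks)}));
    return nodes.back().get();
  }
};

// Re-create `n` in `newType`, recursing through nested select-like nodes so
// that no node in the new chain still refers to an old-typed select/blend.
Node *cloneChain(Node *n, const std::string &newType, Arena &arena,
                 std::unordered_map<Node *, Node *> &cache) {
  auto it = cache.find(n);
  if (it != cache.end())
    return it->second;  // shared operands are cloned only once
  std::vector<Node *> newValues;
  for (Node *v : n->values)
    newValues.push_back(cloneChain(v, newType, arena, cache));
  // Leaves are re-created in the new type (standing in for a cast).
  Node *copy = arena.make(n->kind, newType, std::move(newValues), n->masks);
  cache[n] = copy;
  return copy;
}
```

Cloning only the outermost select would leave its operands pointing at old-typed nodes; the recursion is what keeps the whole chain consistent.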

Fixes https://github.com/llvm/llvm-project/issues/199406.
Delta File
+451 -0 llvm/test/Transforms/LoopVectorize/select-cmp-blend-chain.ll
+30 -24 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+4 -2 llvm/lib/Transforms/Vectorize/VPlan.h
+485 -26 3 files

LLVM/project 586cf1b clang/include/clang/Options Options.td, clang/lib/Driver/ToolChains Clang.cpp

address comments
Delta File
+2 -2 clang/include/clang/Options/Options.td
+1 -1 clang/lib/Driver/ToolChains/Clang.cpp
+3 -3 2 files

LLVM/project d5f223d llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/RISCV revec-strided-store.ll

[SLP] Enable widening strided revectorization of vector stores (#198920)

This commit adds support for re-vectorization of vector stores into
widened strided stores. That is:
```
%p1 = getelementptr i16, ptr %p0, i64 16
store <4 x i16> zeroinitializer, ptr %p1, align 2
store <4 x i16> zeroinitializer, ptr %p0, align 2
```
can be further vectorized to:
```
call void @llvm.experimental.vp.strided.store.v2i64.p0.i64(<2 x i64> zeroinitializer, ptr align 2 %p0, i64 32, <2 x i1> splat (i1 true), i32 2)
```
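
The arithmetic behind the example can be sketched as follows: each `<4 x i16>` store covers 64 bits, so it can be treated as one `i64` element, and the two stores sit 16 * 2 = 32 bytes apart. This is a hedged illustration, not the SLPVectorizer logic; `StridedStorePlan` and `planStridedStore` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical description of one widened strided store.
struct StridedStorePlan {
  int64_t baseOffset;    // byte offset of the lowest store
  int64_t strideBytes;   // distance between consecutive elements
  unsigned numElements;  // one widened element per original vector store
  unsigned elementBits;  // width of each widened element
};

// Given the byte offsets of equally shaped vector stores, check whether they
// form a constant-stride sequence whose stores fit a single integer element.
std::optional<StridedStorePlan>
planStridedStore(std::vector<int64_t> byteOffsets, unsigned lanes,
                 unsigned laneBits) {
  const unsigned storeBits = lanes * laneBits;
  const bool fitsInteger = storeBits == 8 || storeBits == 16 ||
                           storeBits == 32 || storeBits == 64;
  if (byteOffsets.size() < 2 || !fitsInteger)
    return std::nullopt;
  std::sort(byteOffsets.begin(), byteOffsets.end());
  const int64_t storeBytes = storeBits / 8;
  const int64_t stride = byteOffsets[1] - byteOffsets[0];
  if (stride < storeBytes)  // overlapping or duplicate stores
    return std::nullopt;
  for (std::size_t i = 2; i < byteOffsets.size(); ++i)
    if (byteOffsets[i] - byteOffsets[i - 1] != stride)
      return std::nullopt;
  return StridedStorePlan{byteOffsets[0], stride,
                          static_cast<unsigned>(byteOffsets.size()), storeBits};
}
```

For the two stores above, `planStridedStore({32, 0}, 4, 16)` yields two 64-bit elements with a 32-byte stride, matching the `<2 x i64>` value, the `i64 32` stride, and the `i32 2` vector length in the intrinsic call.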
Delta File
+18 -7 llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+4 -9 llvm/test/Transforms/SLPVectorizer/RISCV/revec-strided-store.ll
+22 -16 2 files

LLVM/project 5b93aeb clang/lib/Driver OffloadBundler.cpp, clang/test/Driver clang-offload-bundler-multi-compress.c

clang-offload-bundler incorrectly errors on multi-CCOB binaries (#182579)

Issue: https://github.com/ROCm/llvm-project/issues/448

Objects can have multiple Clang Compressed Offload Bundles (CCOB) in the
.hip_fatbin section. This happens when there are multiple
translation/compilation units built and then linked together into an
Archive or Shared Object. The resulting .hip_fatbin section will have
multiple offload bundles delimited by the magic string "CCOB" (on a 4k
alignment boundary). The Clang Offload bundler API, when a List of
bundle entries is requested, was not properly iterating (looping) over
each separate bundle.
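
As a rough illustration of iterating over every bundle, the sketch below scans a section buffer for the "CCOB" magic at 4 KiB-aligned offsets. It is an assumption-laden toy, not the OffloadBundler code: `findBundleOffsets` is a hypothetical helper, and the real implementation may walk the bundles using size fields from each header rather than scanning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Return the offset of every 4 KiB-aligned position that starts with the
// "CCOB" magic. Only the magic string and the alignment come from the commit
// message; everything else here is illustrative.
std::vector<std::size_t> findBundleOffsets(const std::vector<uint8_t> &section) {
  static const char Magic[] = "CCOB";
  constexpr std::size_t MagicLen = 4;
  constexpr std::size_t Alignment = 4096;
  std::vector<std::size_t> offsets;
  for (std::size_t off = 0; off + MagicLen <= section.size(); off += Alignment)
    if (std::memcmp(section.data() + off, Magic, MagicLen) == 0)
      offsets.push_back(off);
  return offsets;
}
```

A caller that stops after the first hit sees only one bundle, which is the failure mode described above; collecting all offsets lets each bundle be listed in turn.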

REPRODUCTION
Test File: librocblas.so.5 from ROCm 6.x distribution
.hip_fatbin section: 8,163,887 bytes containing 64 concatenated CCOBs

Extract the .hip_fatbin section with:
objcopy --dump-section .hip_fatbin=fatbin.bin binary

    [19 lines not shown]
Delta File
+200 -116 clang/lib/Driver/OffloadBundler.cpp
+187 -0 clang/test/Driver/clang-offload-bundler-multi-compress.c
+387 -116 2 files

LLVM/project f78149c clang/include/clang/AST DeclTemplate.h, clang/lib/AST DeclTemplate.cpp

[clang] fix getTemplateInstantiationArgs

This implements a new strategy for collecting the template arguments, by
relying on the qualifiers and template parameter lists to navigate the template
context of out-of-line definitions.

This greatly simplifies the signature of that function by removing a bunch
of workarounds and simplifying a couple that weren't removed yet.

Since this now relies on qualifiers and template parameter lists,
this patch expends most of its effort making sure these are placed,
transformed and propagated to template instantiations.

Also makes the explicit specialization AST nodes stop abusing the template
parameter lists: they now store their own template parameter list in a
dedicated field, similar to partial specializations.
Delta File
+194 -429 clang/lib/Sema/SemaTemplateInstantiate.cpp
+257 -164 clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
+150 -148 clang/lib/Sema/SemaTemplate.cpp
+96 -95 clang/include/clang/AST/DeclTemplate.h
+59 -129 clang/lib/Sema/SemaConcept.cpp
+60 -92 clang/lib/AST/DeclTemplate.cpp
+816 -1,057 48 files not shown
+1,432 -1,647 54 files

LLVM/project c10922a utils/bazel/llvm-project-overlay/libc BUILD.bazel

[Bazel] Fixes af92edf (#199515)

This fixes af92edf8b3aa4104992de9fe08ce2170d14bc28d.

Co-authored-by: Google Bazel Bot <google-bazel-bot at google.com>
Delta File
+1 -0 utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+1 -0 1 files

LLVM/project 7ba877e llvm/lib/Transforms/Vectorize VPlanPatternMatch.h VPlanUtils.cpp

[VPlan] Add matcher for canonical VPWidenIntOrFpInductionRecipe (NFC). (#199539)

Add matcher for canonical VPWidenIntOrFpInductionRecipe to simplify some
matching.
Delta File
+25 -0 llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+10 -11 llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+8 -8 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+43 -19 3 files

LLVM/project 1db7616 clang/include/clang/AST DeclTemplate.h, clang/lib/AST DeclTemplate.cpp

[clang] fix getTemplateInstantiationArgs

This implements a new strategy for collecting the template arguments, by
relying on the qualifiers and template parameter lists to navigate the template
context of out-of-line definitions.

This greatly simplifies the signature of that function by removing a bunch
of workarounds and simplifying a couple that weren't removed yet.

Since this now relies on qualifiers and template parameter lists,
this patch expends most of its effort making sure these are placed,
transformed and propagated to template instantiations.

Also makes the explicit specialization AST nodes stop abusing the template
parameter lists: they now store their own template parameter list in a
dedicated field, similar to partial specializations.
Delta File
+194 -429 clang/lib/Sema/SemaTemplateInstantiate.cpp
+257 -164 clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
+150 -148 clang/lib/Sema/SemaTemplate.cpp
+96 -95 clang/include/clang/AST/DeclTemplate.h
+59 -129 clang/lib/Sema/SemaConcept.cpp
+60 -92 clang/lib/AST/DeclTemplate.cpp
+816 -1,057 47 files not shown
+1,428 -1,639 53 files

LLVM/project 24ca009 clang/test/Headers __clang_hip_math.hip, llvm/test/CodeGen/PowerPC fp-strict-fcmp-spe.ll

Merge branch 'main' into users/kasuga-fj/da-consolidate-acc-gcd
Delta File
+647 -736 clang/test/Headers/__clang_hip_math.hip
+549 -615 llvm/test/Transforms/SLPVectorizer/X86/arith-mul-smulo.ll
+591 -509 llvm/test/FileCheck/dump-input/annotations.txt
+182 -888 llvm/test/CodeGen/PowerPC/fp-strict-fcmp-spe.ll
+449 -615 llvm/test/Transforms/SLPVectorizer/X86/arith-add-uaddo.ll
+449 -615 llvm/test/Transforms/SLPVectorizer/X86/arith-add-saddo.ll
+2,867 -3,978 1,135 files not shown
+27,616 -16,861 1,141 files

LLVM/project cef112e llvm/include/llvm/Analysis SimplifyQuery.h, llvm/lib/Analysis InstructionSimplify.cpp

Update transformations sensitive to signaling NaNs

Previously, exception handling behavior was used as an indicator of sNaN
support. With the introduction of a special function attribute, `signaling_nans`,
the checks for sNaN support must be changed to use the function
attribute rather than the exception behavior.
Delta File
+230 -22 llvm/test/Transforms/InstSimplify/strictfp-fsub.ll
+119 -14 llvm/test/Transforms/InstSimplify/strictfp-fadd.ll
+23 -15 llvm/lib/Analysis/InstructionSimplify.cpp
+28 -0 llvm/test/Transforms/InstSimplify/floating-point-arithmetic-strictfp.ll
+28 -0 llvm/test/Transforms/InstSimplify/fdiv-strictfp.ll
+8 -0 llvm/include/llvm/Analysis/SimplifyQuery.h
+436 -51 1 files not shown
+436 -57 7 files

LLVM/project 4f96d7b clang/lib/CIR/Lowering/DirectToLLVM LowerToLLVM.cpp, clang/test/CIR/Lowering call-llvm-intrinsic.cir

[CIR] Fix cir.call_llvm_intrinsic lowering for 0-result ops
Delta File
+27 -0 clang/test/CIR/Lowering/call-llvm-intrinsic.cir
+14 -6 clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+41 -6 2 files

LLVM/project 4c9626f clang/include/clang/AST DeclTemplate.h, clang/lib/AST DeclTemplate.cpp

[clang] fix getTemplateInstantiationArgs

This implements a new strategy for collecting the template arguments, by
relying on the qualifiers and template parameter lists to navigate the template
context of out-of-line definitions.

This greatly simplifies the signature of that function by removing a bunch
of workarounds and simplifying a couple that weren't removed yet.

Since this now relies on qualifiers and template parameter lists,
this patch expends most of its effort making sure these are placed,
transformed and propagated to template instantiations.

Also makes the explicit specialization AST nodes stop abusing the template
parameter lists: they now store their own template parameter list in a
dedicated field, similar to partial specializations.
Delta File
+194 -429 clang/lib/Sema/SemaTemplateInstantiate.cpp
+257 -164 clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
+150 -148 clang/lib/Sema/SemaTemplate.cpp
+96 -95 clang/include/clang/AST/DeclTemplate.h
+59 -129 clang/lib/Sema/SemaConcept.cpp
+60 -92 clang/lib/AST/DeclTemplate.cpp
+816 -1,057 47 files not shown
+1,426 -1,639 53 files

LLVM/project c1c4c8e mlir/lib/Dialect/Vector/Transforms VectorDropLeadUnitDim.cpp VectorTransforms.cpp, mlir/test/Dialect/Vector vector-dropleadunitdim-transforms.mlir drop-unit-dims-with-shape-cast.mlir

Revert "[mlir][vector] Migrate drop-lead-unit-dim to shape_cast #196206" (#199546)

This reverts commit 24b8bb18f3417419cbd16fcd31f4e2842df952a1 from
#196206

This broke the AArch64 SVE Linux buildbots; however, it was not reported due
to a glitch in the buildbot infrastructure. The following bots are failing:

https://lab.llvm.org/buildbot/#/builders/121
https://lab.llvm.org/buildbot/#/builders/41
https://lab.llvm.org/buildbot/#/builders/4
https://lab.llvm.org/buildbot/#/builders/199
https://lab.llvm.org/buildbot/#/builders/17
https://lab.llvm.org/buildbot/#/builders/198
https://lab.llvm.org/buildbot/#/builders/143
Delta File
+176 -272 mlir/lib/Dialect/Vector/Transforms/VectorDropLeadUnitDim.cpp
+149 -281 mlir/test/Dialect/Vector/vector-dropleadunitdim-transforms.mlir
+18 -20 mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp
+7 -23 mlir/test/Dialect/Vector/drop-unit-dims-with-shape-cast.mlir
+6 -8 mlir/test/Dialect/Vector/vector-transforms.mlir
+356 -604 5 files

LLVM/project 79f1900 llvm/lib/Target/PowerPC PPCISelLowering.cpp PPCInstrAltivec.td, llvm/test/CodeGen/PowerPC partial-red.ll

[PowerPC] Add PPC BE support for partial reductions (#195927)

Add PPC BE support for partial reduction ISD opcodes
PARTIAL_REDUCE_UMLA/SMLA/SUMLA.
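
For readers unfamiliar with these opcodes, the sketch below models the unsigned variant as a multiply-accumulate that folds a wide input into a narrower accumulator. It is a scalar illustration under assumptions, not the PowerPC lowering: `partialReduceUMLA` is a hypothetical name, the 16-to-4 lane shape is chosen only for the example, and the mapping of input lanes to accumulator lanes shown here is one arbitrary choice.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Zero-extend each pair of 8-bit inputs, multiply, and add the product into
// one of the four 32-bit accumulator lanes. The lane assignment (i % 4) is an
// assumption made for this sketch; the sum over all lanes is what matters.
std::array<uint32_t, 4> partialReduceUMLA(std::array<uint32_t, 4> acc,
                                          const std::array<uint8_t, 16> &a,
                                          const std::array<uint8_t, 16> &b) {
  for (std::size_t i = 0; i < a.size(); ++i)
    acc[i % acc.size()] += static_cast<uint32_t>(a[i]) * static_cast<uint32_t>(b[i]);
  return acc;
}
```

The SMLA and SUMLA variants differ in whether the inputs are sign-extended or zero-extended before the multiply.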
Delta File
+466 -0 llvm/test/CodeGen/PowerPC/partial-red.ll
+35 -0 llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+10 -0 llvm/lib/Target/PowerPC/PPCInstrAltivec.td
+2 -0 llvm/lib/Target/PowerPC/PPCISelLowering.h
+513 -0 4 files

LLVM/project 76c2635 llvm/test/CodeGen/AMDGPU/GlobalISel legalize-sextload-s16-true16.mir load-d16.ll

[AMDGPU][True16] Create tests that will demonstrate true16 G_SEXTLOAD/G_ZEXTLOAD legalization changes (#198669)

Stack PRs:
https://github.com/llvm/llvm-project/pull/198670
https://github.com/llvm/llvm-project/pull/198671

See https://github.com/llvm/llvm-project/pull/195289 for previous
discussion
Delta File
+87 -0 llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-s16-true16.mir
+5 -1 llvm/test/CodeGen/AMDGPU/GlobalISel/load-d16.ll
+92 -1 2 files

LLVM/project a97f71f llvm/lib/Target/AMDGPU AMDGPURegBankCombiner.cpp, llvm/test/CodeGen/AMDGPU global-saddr-load.ll

PR feedback, fix tests
Delta File
+24 -90 llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
+12 -14 llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
+36 -104 2 files

LLVM/project 45a06ac llvm/lib/Target/AMDGPU AMDGPURegBankCombiner.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel load-d16.ll

[AMDGPU][True16] Add regbank combiner cases to fix regression around G_SEXTLOAD
Delta File
+63 -165 llvm/test/CodeGen/AMDGPU/GlobalISel/load-d16.ll
+17 -2 llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
+80 -167 2 files

LLVM/project 5118565 llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp

Update comment around destination reg size for clarity
Delta File
+5 -1 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+5 -1 1 files

LLVM/project cb1bc7a llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU global-saddr-load.ll

Add legalize rules and fix tests
Delta File
+165 -63 llvm/test/CodeGen/AMDGPU/GlobalISel/load-d16.ll
+90 -24 llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
+6 -9 llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-s16-true16.mir
+7 -2 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+268 -98 4 files

LLVM/project 92ffda6 llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU flat-saddr-load.ll

[AMDGPU][True16] Legalize extloads into 16-bit registers

Signed-off-by: Domenic Nutile <domenic.nutile at gmail.com>
Delta File
+80 -38 llvm/test/CodeGen/AMDGPU/flat-saddr-load.ll
+2 -2 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+82 -40 2 files

LLVM/project a9c9925 llvm/docs LangRef.rst

Document that `signaling_nans` requires `strictfp`.
Delta File
+2 -1 llvm/docs/LangRef.rst
+2 -1 1 files

LLVM/project 35babed clang/test/CIR/CodeGenCUDA device-stub.cu

add edge case test
Delta File
+14 -0 clang/test/CIR/CodeGenCUDA/device-stub.cu
+14 -0 1 files

LLVM/project 2bc5459 llvm/include/llvm/Analysis SimplifyQuery.h, llvm/lib/Analysis InstructionSimplify.cpp

Update transformations sensitive to signaling NaNs

Previously, exception handling behavior was used as an indicator of sNaN
support. With the introduction of a special function attribute, `signaling_nans`,
the checks for sNaN support must be changed to use the function
attribute rather than the exception behavior.
Delta File
+230 -22 llvm/test/Transforms/InstSimplify/strictfp-fsub.ll
+119 -14 llvm/test/Transforms/InstSimplify/strictfp-fadd.ll
+23 -15 llvm/lib/Analysis/InstructionSimplify.cpp
+28 -0 llvm/test/Transforms/InstSimplify/fdiv-strictfp.ll
+28 -0 llvm/test/Transforms/InstSimplify/floating-point-arithmetic-strictfp.ll
+8 -0 llvm/include/llvm/Analysis/SimplifyQuery.h
+436 -51 1 files not shown
+436 -57 7 files

LLVM/project 78f660c llvm/lib/CodeGen AtomicExpandPass.cpp, llvm/test/Transforms/AtomicExpand/X86 expand-atomic-non-integer.ll

[AtomicExpand] Support non-integer atomic loads. (#199310)

This is arguably an enhancement rather than a bugfix.  But
AtomicExpandPass already tries to support some non-integer atomic ops
using cmpxchg by bitcasting to/from an integer type.  We're just missing
this one path used by atomic load.  Seems easy enough to support it.
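
The idea of loading a non-integer value through an integer cmpxchg can be shown with a C++ analogy. This is a minimal sketch, not what the pass emits: the pass operates on LLVM IR, and `floatBits` and `loadFloatViaCmpXchg` are hypothetical helpers written for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a float as its 32-bit pattern (the "bitcast to integer" step).
uint32_t floatBits(float f) {
  static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return bits;
}

// Read a float stored as raw bits in an integer atomic using only cmpxchg.
float loadFloatViaCmpXchg(std::atomic<uint32_t> &slot) {
  uint32_t expected = 0;
  // If the slot holds 0, this writes 0 back (no visible change). Otherwise the
  // exchange fails and `expected` is updated to the current contents. Either
  // way `expected` ends up holding the value that was in the slot.
  slot.compare_exchange_strong(expected, 0);
  float result;
  std::memcpy(&result, &expected, sizeof(result));  // "bitcast from integer"
  return result;
}
```

The integer type is only a carrier for the bits; the value never goes through an arithmetic conversion, so the float read back has exactly the stored bit pattern.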

This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
Delta File
+43 -3 llvm/test/Transforms/AtomicExpand/X86/expand-atomic-non-integer.ll
+11 -1 llvm/lib/CodeGen/AtomicExpandPass.cpp
+54 -4 2 files

LLVM/project e9132e9 clang/docs ReleaseNotes.rst, clang/include/clang/Basic Builtins.td

[clang] Implement `__builtin_elementwise_clmul` (#196633)

Follow-up to:
- https://github.com/llvm/llvm-project/pull/140301
- https://github.com/llvm/llvm-project/pull/168731

I'm mostly just following the steps of
https://github.com/llvm/llvm-project/pull/153113/ and other prior PRs
here. I don't have any idea how testing works yet.

CC @artagnon @oscardssmith
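
For context, carry-less multiplication is ordinary shift-and-add multiplication with XOR in place of addition. The scalar reference below is a sketch under assumptions, not the Clang implementation: `clmul32` and `elementwiseClmul` are hypothetical names, and keeping only the low 32 bits of the product is an assumption about the result width.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Carry-less multiply: XOR together copies of `a` shifted by each set bit of
// `b`. Bits shifted past bit 31 are discarded (assumed truncating result).
uint32_t clmul32(uint32_t a, uint32_t b) {
  uint32_t result = 0;
  for (unsigned i = 0; i < 32; ++i)
    if ((b >> i) & 1u)
      result ^= a << i;
  return result;
}

// Apply clmul32 lane by lane, mirroring what an element-wise builtin does
// for vector operands.
template <std::size_t N>
std::array<uint32_t, N> elementwiseClmul(const std::array<uint32_t, N> &a,
                                         const std::array<uint32_t, N> &b) {
  std::array<uint32_t, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = clmul32(a[i], b[i]);
  return out;
}
```

For example, `clmul32(3, 3)` is `3 ^ (3 << 1)`, which is 5 rather than the 9 that ordinary multiplication gives.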
Delta File
+34 -0 clang/test/Sema/builtins-elementwise-math.c
+33 -0 clang/test/CodeGen/builtins-elementwise-math.c
+17 -0 clang/test/AST/ByteCode/builtin-functions.cpp
+11 -0 clang/lib/AST/ExprConstant.cpp
+6 -0 clang/include/clang/Basic/Builtins.td
+4 -0 clang/docs/ReleaseNotes.rst
+105 -0 4 files not shown
+114 -0 10 files

LLVM/project b402d5b clang/lib/CIR/Dialect/Transforms LoweringPrepare.cpp

Type size should be dl alloc size.
Delta File
+2 -2 clang/lib/CIR/Dialect/Transforms/LoweringPrepare.cpp
+2 -2 1 files

LLVM/project 3e6582f clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp, clang/test/Sema warn-lifetime-safety-invalidations.cpp

Reland "[LifetimeSafety] Detect iterator invalidation through container aliases" (#197873)

This relands #195231, which was reverted in commit
7c9717848851f3a71908becab4312ddc2d8482b8.

The original crash from the reproducer no longer reproduces after
#196680, #197220, and #197604. I verified the original `repro.cpp`: it
no longer hits the lifetime-safety assertion now.

Also added regression tests for the crash:

```cpp
struct SinkInteriorBorrow {
  const char *dest_; // expected-note {{this field dangles}}

  SinkInteriorBorrow(std::string *dest, int n) : dest_(dest->data()) { // expected-warning {{parameter which escapes to a field is later invalidated}}
    if (n > 0)
      dest->clear(); // expected-note {{invalidated here}}
  }

    [3 lines not shown]
Delta File
+105 -20 clang/test/Sema/warn-lifetime-safety-invalidations.cpp
+5 -3 clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+110 -23 2 files

LLVM/project 5b5b860 llvm/lib/Target/AMDGPU AMDGPURegBankCombiner.cpp, llvm/test/CodeGen/AMDGPU global-saddr-load.ll

PR feedback, fix tests
Delta File
+24 -90 llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
+12 -14 llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
+36 -104 2 files

LLVM/project 054188b llvm/lib/Target/AMDGPU AMDGPURegBankCombiner.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel load-d16.ll

[AMDGPU][True16] Add regbank combiner cases to fix regression around G_SEXTLOAD
Delta File
+63 -165 llvm/test/CodeGen/AMDGPU/GlobalISel/load-d16.ll
+17 -2 llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
+80 -167 2 files

LLVM/project b075400 llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp

Update comment around destination reg size for clarity
Delta File
+5 -1 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+5 -1 1 files