LLVM/project 1db7616 clang/include/clang/AST DeclTemplate.h, clang/lib/AST DeclTemplate.cpp

[clang] fix getTemplateInstantiationArgs

This implements a new strategy for collecting the template arguments, by
relying on the qualifiers and template parameter lists to navigate the template
context of out-of-line definitions.

This greatly simplifies the signature of that function by removing a bunch
of workarounds and simplifying a couple that weren't removed yet.

Since this now relies on qualifiers and template parameter lists,
this patch expends most of its effort making sure these are placed,
transformed and propagated to template instantiations.

Also makes the explicit specialization AST nodes stop abusing the template
parameter lists to store their own template parameter list, by creating a
dedicated field for it, similar to partial specializations.
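The C++ below shows the source constructs this message talks about. First, an out-of-line definition of a member function template: a `Box<T>::` qualifier plus one template parameter list per enclosing template level. Second, an explicit specialization of that member template for `Box<int>`, whose leading `template <>` is the specialization's own (empty) template parameter list. `Box` and `describe` are hypothetical names. The sketch shows source-level syntax only, not how Clang represents these constructs in its AST.

```cpp
#include <string>

// Class template with a member function template, only declared in-class.
template <typename T>
struct Box {
  template <typename U>
  std::string describe(U value) const;
};

// Out-of-line definition: the qualifier `Box<T>::` names the enclosing class
// template, and each enclosing template level contributes its own template
// parameter list (`template <typename T>` for Box, `template <typename U>`
// for describe).
template <typename T>
template <typename U>
std::string Box<T>::describe(U) const {
  return "generic";
}

// Explicit specialization of the member template for the implicit
// instantiation Box<int>: `template <>` is the empty parameter list of the
// specialized class level, while `template <typename U>` still belongs to
// the member template itself.
template <>
template <typename U>
std::string Box<int>::describe(U) const {
  return "Box<int> specialization";
}
```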
DeltaFile
+194 -429 clang/lib/Sema/SemaTemplateInstantiate.cpp
+257 -164 clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
+150 -148 clang/lib/Sema/SemaTemplate.cpp
+96 -95 clang/include/clang/AST/DeclTemplate.h
+59 -129 clang/lib/Sema/SemaConcept.cpp
+60 -92 clang/lib/AST/DeclTemplate.cpp
+816 -1,057 (47 files not shown)
+1,428 -1,639 (53 files)

LLVM/project 24ca009 clang/test/Headers __clang_hip_math.hip, llvm/test/CodeGen/PowerPC fp-strict-fcmp-spe.ll

Merge branch 'main' into users/kasuga-fj/da-consolidate-acc-gcd
DeltaFile
+647 -736 clang/test/Headers/__clang_hip_math.hip
+549 -615 llvm/test/Transforms/SLPVectorizer/X86/arith-mul-smulo.ll
+591 -509 llvm/test/FileCheck/dump-input/annotations.txt
+182 -888 llvm/test/CodeGen/PowerPC/fp-strict-fcmp-spe.ll
+449 -615 llvm/test/Transforms/SLPVectorizer/X86/arith-add-uaddo.ll
+449 -615 llvm/test/Transforms/SLPVectorizer/X86/arith-add-saddo.ll
+2,867 -3,978 (1,135 files not shown)
+27,616 -16,861 (1,141 files)

LLVM/project cef112e llvm/include/llvm/Analysis SimplifyQuery.h, llvm/lib/Analysis InstructionSimplify.cpp

Update transformations sensitive to signaling NaNs

Previously, exception handling behavior was used as an indicator of sNaN
support. With the introduction of a special function attribute,
`signaling_nans`, the checks for sNaN support must be changed to use the
function attribute rather than the exception behavior.
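Here is a minimal C++ sketch of why sNaN support matters to a transform. It assumes IEEE 754 binary64 hardware that quiets a signaling NaN operand, which is the default behavior on x86-64 SSE and AArch64. Evaluating `x - 0.0` at run time changes the bits of an sNaN input, so folding `x - 0.0` to `x` is only sound when signaling NaNs can be ignored. The helper names are hypothetical. The sketch illustrates the floating-point behavior, not the InstructionSimplify code.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Return the raw IEEE 754 bit pattern of a double.
uint64_t bitsOf(double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// Bit 51 is the quiet bit of a binary64 NaN (set = quiet, clear = signaling),
// assuming the IEEE 754-2008 recommended encoding used by x86-64 and AArch64.
bool isQuietNaNBits(uint64_t bits) {
  return (bits & (uint64_t{1} << 51)) != 0;
}

// Compute x - 0.0 at run time. `volatile` keeps the compiler from folding
// the subtraction away, so the hardware really executes it.
double subtractZero(double x) {
  volatile double zero = 0.0;
  return x - zero;
}
```

For an ordinary value such as `1.5` the subtraction returns the input unchanged. For `std::numeric_limits<double>::signaling_NaN()` it returns a quiet NaN with different bits, which a fold to `x` would not reproduce.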
DeltaFile
+230 -22 llvm/test/Transforms/InstSimplify/strictfp-fsub.ll
+119 -14 llvm/test/Transforms/InstSimplify/strictfp-fadd.ll
+23 -15 llvm/lib/Analysis/InstructionSimplify.cpp
+28 -0 llvm/test/Transforms/InstSimplify/floating-point-arithmetic-strictfp.ll
+28 -0 llvm/test/Transforms/InstSimplify/fdiv-strictfp.ll
+8 -0 llvm/include/llvm/Analysis/SimplifyQuery.h
+436 -51 (1 file not shown)
+436 -57 (7 files)

LLVM/project 4f96d7b clang/lib/CIR/Lowering/DirectToLLVM LowerToLLVM.cpp, clang/test/CIR/Lowering call-llvm-intrinsic.cir

[CIR] Fix cir.call_llvm_intrinsic lowering for 0-result ops
DeltaFile
+27 -0 clang/test/CIR/Lowering/call-llvm-intrinsic.cir
+14 -6 clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+41 -6 (2 files)

LLVM/project 4c9626f clang/include/clang/AST DeclTemplate.h, clang/lib/AST DeclTemplate.cpp

[clang] fix getTemplateInstantiationArgs

This implements a new strategy for collecting the template arguments, by
relying on the qualifiers and template parameter lists to navigate the template
context of out-of-line definitions.

This greatly simplifies the signature of that function by removing a bunch
of workarounds and simplifying a couple that weren't removed yet.

Since this now relies on qualifiers and template parameter lists,
this patch expends most of its effort making sure these are placed,
transformed and propagated to template instantiations.

Also makes the explicit specialization AST nodes stop abusing the template
parameter lists to store their own template parameter list, by creating a
dedicated field for it, similar to partial specializations.
DeltaFile
+194 -429 clang/lib/Sema/SemaTemplateInstantiate.cpp
+257 -164 clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
+150 -148 clang/lib/Sema/SemaTemplate.cpp
+96 -95 clang/include/clang/AST/DeclTemplate.h
+59 -129 clang/lib/Sema/SemaConcept.cpp
+60 -92 clang/lib/AST/DeclTemplate.cpp
+816 -1,057 (47 files not shown)
+1,426 -1,639 (53 files)

LLVM/project c1c4c8e mlir/lib/Dialect/Vector/Transforms VectorDropLeadUnitDim.cpp VectorTransforms.cpp, mlir/test/Dialect/Vector vector-dropleadunitdim-transforms.mlir drop-unit-dims-with-shape-cast.mlir

Revert "[mlir][vector] Migrate drop-lead-unit-dim to shape_cast #196206" (#199546)

This reverts commit 24b8bb18f3417419cbd16fcd31f4e2842df952a1 from
#196206

This broke the AArch64 SVE Linux buildbots; however, it was not reported due
to a glitch in the buildbot infrastructure. The following bots are failing:

https://lab.llvm.org/buildbot/#/builders/121
https://lab.llvm.org/buildbot/#/builders/41
https://lab.llvm.org/buildbot/#/builders/4
https://lab.llvm.org/buildbot/#/builders/199
https://lab.llvm.org/buildbot/#/builders/17
https://lab.llvm.org/buildbot/#/builders/198
https://lab.llvm.org/buildbot/#/builders/143
DeltaFile
+176 -272 mlir/lib/Dialect/Vector/Transforms/VectorDropLeadUnitDim.cpp
+149 -281 mlir/test/Dialect/Vector/vector-dropleadunitdim-transforms.mlir
+18 -20 mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp
+7 -23 mlir/test/Dialect/Vector/drop-unit-dims-with-shape-cast.mlir
+6 -8 mlir/test/Dialect/Vector/vector-transforms.mlir
+356 -604 (5 files)

LLVM/project 79f1900 llvm/lib/Target/PowerPC PPCISelLowering.cpp PPCInstrAltivec.td, llvm/test/CodeGen/PowerPC partial-red.ll

[PowerPC] Add PPC BE support for partial reductions (#195927)

Add PPC BE support for partial reduction ISD opcodes
PARTIAL_REDUCE_UMLA/SMLA/SUMLA.
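As a rough reference for what these opcodes compute, here is a scalar C++ model of a mixed-sign (signed LHS, unsigned RHS) multiply-accumulate partial reduction. It takes 16 x i8 inputs into a 4 x i32 accumulator. The function name and the lane grouping are assumptions. The generic ISD nodes deliberately leave unspecified which products land in which accumulator lane, so only the total is meaningful. This sketch assumes four adjacent products per lane and does not model big-endian lane numbering.

```cpp
#include <array>
#include <cstdint>

using Acc4 = std::array<int32_t, 4>;

// Hypothetical scalar model of a signed x unsigned partial reduction:
// each i8/u8 product is widened to i32 and added into an accumulator lane.
// Grouping four adjacent products per lane is an assumption of this sketch.
// LLVM arithmetic wraps on overflow; this model assumes no overflow occurs.
Acc4 partialReduceSUMLA(Acc4 acc, const std::array<int8_t, 16>& lhs,
                        const std::array<uint8_t, 16>& rhs) {
  for (int lane = 0; lane < 4; ++lane) {
    for (int k = 0; k < 4; ++k) {
      int idx = lane * 4 + k;
      // Sign-extend the LHS byte, zero-extend the RHS byte, then multiply.
      acc[lane] += static_cast<int32_t>(lhs[idx]) * static_cast<int32_t>(rhs[idx]);
    }
  }
  return acc;
}
```

Summing the four result lanes gives the same total under any grouping. That total is what a subsequent full vector reduction would observe.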
DeltaFile
+466 -0 llvm/test/CodeGen/PowerPC/partial-red.ll
+35 -0 llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+10 -0 llvm/lib/Target/PowerPC/PPCInstrAltivec.td
+2 -0 llvm/lib/Target/PowerPC/PPCISelLowering.h
+513 -0 (4 files)

LLVM/project 76c2635 llvm/test/CodeGen/AMDGPU/GlobalISel legalize-sextload-s16-true16.mir load-d16.ll

[AMDGPU][True16] Create tests that will demonstrate true16 G_SEXTLOAD/G_ZEXTLOAD legalization changes (#198669)

Stack PRs:
https://github.com/llvm/llvm-project/pull/198670
https://github.com/llvm/llvm-project/pull/198671

See https://github.com/llvm/llvm-project/pull/195289 for previous
discussion
DeltaFile
+87 -0 llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-s16-true16.mir
+5 -1 llvm/test/CodeGen/AMDGPU/GlobalISel/load-d16.ll
+92 -1 (2 files)

LLVM/project a97f71f llvm/lib/Target/AMDGPU AMDGPURegBankCombiner.cpp, llvm/test/CodeGen/AMDGPU global-saddr-load.ll

PR feedback, fix tests
DeltaFile
+24 -90 llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
+12 -14 llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
+36 -104 (2 files)

LLVM/project 45a06ac llvm/lib/Target/AMDGPU AMDGPURegBankCombiner.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel load-d16.ll

[AMDGPU][True16] Add regbank combiner cases to fix regression around G_SEXTLOAD
DeltaFile
+63 -165 llvm/test/CodeGen/AMDGPU/GlobalISel/load-d16.ll
+17 -2 llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
+80 -167 (2 files)

LLVM/project 5118565 llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp

Update comment around destination reg size for clarity
DeltaFile
+5 -1 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+5 -1 (1 file)

LLVM/project cb1bc7a llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU global-saddr-load.ll

Add legalize rules and fix tests
DeltaFile
+165 -63 llvm/test/CodeGen/AMDGPU/GlobalISel/load-d16.ll
+90 -24 llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
+6 -9 llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-s16-true16.mir
+7 -2 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+268 -98 (4 files)

LLVM/project 92ffda6 llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU flat-saddr-load.ll

[AMDGPU][True16] Legalize extloads into 16-bit registers

Signed-off-by: Domenic Nutile <domenic.nutile at gmail.com>
DeltaFile
+80 -38 llvm/test/CodeGen/AMDGPU/flat-saddr-load.ll
+2 -2 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+82 -40 (2 files)

LLVM/project a9c9925 llvm/docs LangRef.rst

Document that `signaling_nans` requires `strictfp`.
DeltaFile
+2 -1 llvm/docs/LangRef.rst
+2 -1 (1 file)

LLVM/project 35babed clang/test/CIR/CodeGenCUDA device-stub.cu

add edge case test
DeltaFile
+14 -0 clang/test/CIR/CodeGenCUDA/device-stub.cu
+14 -0 (1 file)

LLVM/project 2bc5459 llvm/include/llvm/Analysis SimplifyQuery.h, llvm/lib/Analysis InstructionSimplify.cpp

Update transformations sensitive to signaling NaNs

Previously, exception handling behavior was used as an indicator of sNaN
support. With the introduction of a special function attribute,
`signaling_nans`, the checks for sNaN support must be changed to use the
function attribute rather than the exception behavior.
DeltaFile
+230 -22 llvm/test/Transforms/InstSimplify/strictfp-fsub.ll
+119 -14 llvm/test/Transforms/InstSimplify/strictfp-fadd.ll
+23 -15 llvm/lib/Analysis/InstructionSimplify.cpp
+28 -0 llvm/test/Transforms/InstSimplify/fdiv-strictfp.ll
+28 -0 llvm/test/Transforms/InstSimplify/floating-point-arithmetic-strictfp.ll
+8 -0 llvm/include/llvm/Analysis/SimplifyQuery.h
+436 -51 (1 file not shown)
+436 -57 (7 files)

LLVM/project 78f660c llvm/lib/CodeGen AtomicExpandPass.cpp, llvm/test/Transforms/AtomicExpand/X86 expand-atomic-non-integer.ll

[AtomicExpand] Support non-integer atomic loads. (#199310)

This is arguably an enhancement rather than a bugfix.  But
AtomicExpandPass already tries to support some non-integer atomic ops
using cmpxchg by bitcasting to/from an integer type.  We're just missing
this one path used by atomic load.  Seems easy enough to support it.

This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
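The bitcasting approach can be pictured at the C++ level. An atomic load of a `double` is expressed as a compare-exchange on a 64-bit integer image of the value, followed by a bitcast back to `double`. This sketches the idea only. The function names are hypothetical, and the pass itself operates on LLVM IR (`cmpxchg` plus bitcasts), not on `std::atomic`.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a double as a 64-bit integer (a bitcast).
uint64_t doubleToBits(double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// Reinterpret the bits of a 64-bit integer as a double (a bitcast).
double bitsToDouble(uint64_t bits) {
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// Atomic load expressed as cmpxchg(addr, 0, 0) on the integer image:
// if memory holds 0, the exchange "succeeds" by storing 0 over 0;
// otherwise it fails and writes the current value into `expected`.
// Either way, `expected` ends up holding the value that was in memory.
double atomicLoadDoubleViaCmpxchg(std::atomic<uint64_t>& cell) {
  uint64_t expected = 0;
  cell.compare_exchange_strong(expected, 0, std::memory_order_seq_cst);
  return bitsToDouble(expected);
}
```

The compare-exchange never changes the stored value, which is what allows it to stand in for a plain atomic load.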
DeltaFile
+43 -3 llvm/test/Transforms/AtomicExpand/X86/expand-atomic-non-integer.ll
+11 -1 llvm/lib/CodeGen/AtomicExpandPass.cpp
+54 -4 (2 files)

LLVM/project e9132e9 clang/docs ReleaseNotes.rst, clang/include/clang/Basic Builtins.td

[clang] Implement `__builtin_elementwise_clmul` (#196633)

Follow-up to:
- https://github.com/llvm/llvm-project/pull/140301
- https://github.com/llvm/llvm-project/pull/168731

I'm mostly just following the steps of
https://github.com/llvm/llvm-project/pull/153113/ and other prior PRs
here. I don't have any idea how testing works yet.

CC @artagnon @oscardssmith
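For reference, carry-less multiplication combines shifted partial products with XOR instead of addition. Below is a scalar C++ sketch for one 32-bit element. It assumes the result is truncated to the element width. That assumption is this sketch's own: the exact semantics of `__builtin_elementwise_clmul` are whatever the patch and its documentation define. The function name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical scalar carry-less multiply of one 32-bit element:
// XOR together copies of `a` shifted left by the position of each set bit
// of `b`. Bits shifted past bit 31 are discarded, so only the low 32 bits
// of the carry-less product are returned.
uint32_t clmul32(uint32_t a, uint32_t b) {
  uint32_t result = 0;
  for (int i = 0; i < 32; ++i) {
    if ((b >> i) & 1u)
      result ^= a << i;
  }
  return result;
}
```

For example, `clmul32(3, 3)` is `0b11 ^ 0b110 = 0b101`, that is 5, whereas ordinary multiplication gives 9 because the carry out of bit 1 propagates.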
DeltaFile
+34 -0 clang/test/Sema/builtins-elementwise-math.c
+33 -0 clang/test/CodeGen/builtins-elementwise-math.c
+17 -0 clang/test/AST/ByteCode/builtin-functions.cpp
+11 -0 clang/lib/AST/ExprConstant.cpp
+6 -0 clang/include/clang/Basic/Builtins.td
+4 -0 clang/docs/ReleaseNotes.rst
+105 -0 (4 files not shown)
+114 -0 (10 files)

LLVM/project b402d5b clang/lib/CIR/Dialect/Transforms LoweringPrepare.cpp

Type size should be dl alloc size.
DeltaFile
+2 -2 clang/lib/CIR/Dialect/Transforms/LoweringPrepare.cpp
+2 -2 (1 file)

LLVM/project 3e6582f clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp, clang/test/Sema warn-lifetime-safety-invalidations.cpp

Reland "[LifetimeSafety] Detect iterator invalidation through container aliases" (#197873)

This relands #195231, which was reverted in commit
7c9717848851f3a71908becab4312ddc2d8482b8.

The crash from the original reproducer no longer occurs after
#196680, #197220, and #197604. I verified that the original `repro.cpp`
no longer hits the lifetime-safety assertion.

Also added regression tests for the crash:

```cpp
struct SinkInteriorBorrow {
  const char *dest_; // expected-note {{this field dangles}}

  SinkInteriorBorrow(std::string *dest, int n) : dest_(dest->data()) { // expected-warning {{parameter which escapes to a field is later invalidated}}
    if (n > 0)
      dest->clear(); // expected-note {{invalidated here}}
  }

    [3 lines not shown]
DeltaFile
+105 -20 clang/test/Sema/warn-lifetime-safety-invalidations.cpp
+5 -3 clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+110 -23 (2 files)

LLVM/project 5b5b860 llvm/lib/Target/AMDGPU AMDGPURegBankCombiner.cpp, llvm/test/CodeGen/AMDGPU global-saddr-load.ll

PR feedback, fix tests
DeltaFile
+24 -90 llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
+12 -14 llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
+36 -104 (2 files)

LLVM/project 054188b llvm/lib/Target/AMDGPU AMDGPURegBankCombiner.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel load-d16.ll

[AMDGPU][True16] Add regbank combiner cases to fix regression around G_SEXTLOAD
DeltaFile
+63 -165 llvm/test/CodeGen/AMDGPU/GlobalISel/load-d16.ll
+17 -2 llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
+80 -167 (2 files)

LLVM/project b075400 llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp

Update comment around destination reg size for clarity
DeltaFile
+5 -1 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+5 -1 (1 file)

LLVM/project aa095db llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU global-saddr-load.ll

Add legalize rules and fix tests
DeltaFile
+165 -63 llvm/test/CodeGen/AMDGPU/GlobalISel/load-d16.ll
+90 -24 llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
+6 -9 llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-s16-true16.mir
+7 -2 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+268 -98 (4 files)

LLVM/project 9a486ec llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU flat-saddr-load.ll

[AMDGPU][True16] Legalize extloads into 16-bit registers

Signed-off-by: Domenic Nutile <domenic.nutile at gmail.com>
DeltaFile
+80 -38 llvm/test/CodeGen/AMDGPU/flat-saddr-load.ll
+2 -2 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+82 -40 (2 files)

LLVM/project b688dc0 llvm/test/CodeGen/AMDGPU/GlobalISel legalize-sextload-zextload-s16-true16.mir legalize-sextload-s16-true16.mir

PR feedback
DeltaFile
+0 -376 llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-zextload-s16-true16.mir
+87 -0 llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-s16-true16.mir
+5 -1 llvm/test/CodeGen/AMDGPU/GlobalISel/load-d16.ll
+92 -377 (3 files)

LLVM/project 69b473d llvm/test/CodeGen/AMDGPU/GlobalISel legalize-sextload-zextload-s16-true16.mir

[AMDGPU][True16] Create tests that will demonstrate true16 G_SEXTLOAD/G_ZEXTLOAD legalization changes
DeltaFile
+376 -0 llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-zextload-s16-true16.mir
+376 -0 (1 file)

LLVM/project c52c3d2 llvm/lib/Target/AMDGPU VOP3PInstructions.td AMDGPU.td, llvm/test/MC/AMDGPU gfx11_asm_vop3p_alias.s gfx12_asm_vop3p_aliases.s

[AMDGPU] Fix v_dot4_i32_i8 alias to set neg_lo modifiers (#197998)

Fixes https://github.com/ROCm/ROCm/issues/6126.

The `v_dot4_i32_i8` assembly alias was not setting the `neg_lo` modifier
bits when converted to `v_dot4_i32_iu8`, which causes signed int8
operands to be treated as unsigned.

For example, with `q=[1,-1,1,-1]` and `k=[1,1,1,1]` the expected result is 0,
but 512 is returned: the instruction computes `1*1 + 255*1 + 1*1 + 255*1 = 512`,
treating `-1 (0xFF)` as `255`.

On AMD GFX11+, the native `v_dot4_i32_i8` instruction doesn't exist. The
hardware provides `v_dot4_i32_iu8` with `neg_lo` modifier bits to
control signedness of each operand. The compiler correctly lowers
`v_dot4_i32_i8` intrinsics by setting `neg_lo:[1,1,0]`, but inline
assembly using the `v_dot4_i32_i8` mnemonic bypasses this lowering and
goes directly to the assembler.
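The arithmetic in the example above can be reproduced with a small C++ model of a 4-way 8-bit dot product. In the model, a per-operand flag selects signed or unsigned interpretation, loosely mirroring the role the message gives the `neg_lo` bits. The function names and the packing order are assumptions for illustration: byte 0 is the least significant byte and holds the first list element. This does not specify the hardware instruction.

```cpp
#include <cstdint>

// Extract byte `i` (0 = least significant) of a packed 32-bit operand,
// interpreting it as signed (int8) or unsigned (uint8).
int32_t laneValue(uint32_t packed, int i, bool isSigned) {
  uint8_t byte = static_cast<uint8_t>(packed >> (8 * i));
  return isSigned ? static_cast<int32_t>(static_cast<int8_t>(byte))
                  : static_cast<int32_t>(byte);
}

// Hypothetical model: c + sum of a[i] * b[i] over four byte lanes, with the
// signedness of each source operand chosen independently.
int32_t dot4(uint32_t a, uint32_t b, int32_t c, bool aSigned, bool bSigned) {
  int32_t sum = c;
  for (int i = 0; i < 4; ++i)
    sum += laneValue(a, i, aSigned) * laneValue(b, i, bSigned);
  return sum;
}
```

Packing `q=[1,-1,1,-1]` gives `0xFF01FF01` and `k=[1,1,1,1]` gives `0x01010101`. Treating both operands as unsigned yields 512, the wrong result described above. Treating both as signed yields the expected 0.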


    [10 lines not shown]
DeltaFile
+17 -5 llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+18 -0 llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3p.txt
+2 -2 llvm/test/MC/AMDGPU/gfx11_asm_vop3p_alias.s
+2 -2 llvm/test/MC/AMDGPU/gfx12_asm_vop3p_aliases.s
+3 -0 llvm/lib/Target/AMDGPU/AMDGPU.td
+42 -9 (5 files)

LLVM/project b969520 mlir/include/mlir/Dialect/SCF/IR SCFOps.td, mlir/lib/Dialect/SCF/IR SCF.cpp

[mlir][SCF] Add `scf.loop` op and terminators
DeltaFile
+170 -0 mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
+162 -0 mlir/lib/Dialect/SCF/IR/SCF.cpp
+101 -0 mlir/test/Dialect/SCF/invalid.mlir
+73 -0 mlir/test/Dialect/SCF/ops.mlir
+506 -0 (4 files)

LLVM/project 550ddea llvm/test/CodeGen/AArch64 sme-framelower-use-bp.ll

Remove redundant check
DeltaFile
+0 -1 llvm/test/CodeGen/AArch64/sme-framelower-use-bp.ll
+0 -1 (1 file)