LLVM/project 78f660c llvm/lib/CodeGen AtomicExpandPass.cpp, llvm/test/Transforms/AtomicExpand/X86 expand-atomic-non-integer.ll

[AtomicExpand] Support non-integer atomic loads. (#199310)

This is arguably an enhancement rather than a bugfix.  But
AtomicExpandPass already tries to support some non-integer atomic ops
using cmpxchg by bitcasting to/from an integer type.  We're just missing
this one path used by atomic load.  Seems easy enough to support it.
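
As a rough source-level illustration of the technique described above (an atomic load of a non-integer value performed as an integer cmpxchg plus a bitcast), here is a minimal sketch. It is not the pass's code: `storeFloatBits` and `loadFloatViaCmpXchg` are hypothetical names, and the sketch assumes a 32-bit `float`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: store a float's bit pattern into an integer atomic.
inline void storeFloatBits(std::atomic<std::uint32_t> &cell, float value) {
  static_assert(sizeof(std::uint32_t) == sizeof(float), "assumes 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  cell.store(bits, std::memory_order_seq_cst);
}

// Hypothetical helper: read the value back using only compare-exchange.
// If the cell holds 0, the exchange succeeds and rewrites 0 (no visible
// change); otherwise it fails and copies the current bits into `expected`.
// Either way `expected` ends up holding the cell's contents.
inline float loadFloatViaCmpXchg(std::atomic<std::uint32_t> &cell) {
  std::uint32_t expected = 0;
  cell.compare_exchange_strong(expected, 0, std::memory_order_seq_cst);
  float result;
  std::memcpy(&result, &expected, sizeof(result)); // the "bitcast" back
  return result;
}
```

The integer type carries the bits through the atomic operation; the floating-point interpretation is applied only after the value has been read.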

This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
Delta   File
+43 -3  llvm/test/Transforms/AtomicExpand/X86/expand-atomic-non-integer.ll
+11 -1  llvm/lib/CodeGen/AtomicExpandPass.cpp
+54 -4  2 files

LLVM/project e9132e9 clang/docs ReleaseNotes.rst, clang/include/clang/Basic Builtins.td

[clang] Implement `__builtin_elementwise_clmul` (#196633)

Follow-up to:
- https://github.com/llvm/llvm-project/pull/140301
- https://github.com/llvm/llvm-project/pull/168731

I'm mostly just following the steps of
https://github.com/llvm/llvm-project/pull/153113/ and other prior PRs
here. I don't have any idea how testing works yet.

CC @artagnon @oscardssmith
Delta    File
+34 -0   clang/test/Sema/builtins-elementwise-math.c
+33 -0   clang/test/CodeGen/builtins-elementwise-math.c
+17 -0   clang/test/AST/ByteCode/builtin-functions.cpp
+11 -0   clang/lib/AST/ExprConstant.cpp
+6 -0    clang/include/clang/Basic/Builtins.td
+4 -0    clang/docs/ReleaseNotes.rst
+105 -0  4 files not shown
+114 -0  10 files

LLVM/project b402d5b clang/lib/CIR/Dialect/Transforms LoweringPrepare.cpp

Type size should be dl alloc size.
Delta  File
+2 -2  clang/lib/CIR/Dialect/Transforms/LoweringPrepare.cpp
+2 -2  1 files

LLVM/project 3e6582f clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp, clang/test/Sema warn-lifetime-safety-invalidations.cpp

Reland "[LifetimeSafety] Detect iterator invalidation through container aliases" (#197873)

This relands #195231, which was reverted in commit
7c9717848851f3a71908becab4312ddc2d8482b8.

The original crash no longer reproduces after #196680, #197220, and
#197604. I verified that the original `repro.cpp` no longer hits the
lifetime-safety assertion.

Also added regression tests for the crash:

```cpp
struct SinkInteriorBorrow {
  const char *dest_; // expected-note {{this field dangles}}

  SinkInteriorBorrow(std::string *dest, int n) : dest_(dest->data()) { // expected-warning {{parameter which escapes to a field is later invalidated}}
    if (n > 0)
      dest->clear(); // expected-note {{invalidated here}}
  }

    [3 lines not shown]
```
Delta     File
+105 -20  clang/test/Sema/warn-lifetime-safety-invalidations.cpp
+5 -3     clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+110 -23  2 files

LLVM/project 5b5b860 llvm/lib/Target/AMDGPU AMDGPURegBankCombiner.cpp, llvm/test/CodeGen/AMDGPU global-saddr-load.ll

PR feedback, fix tests
Delta     File
+24 -90   llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
+12 -14   llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
+36 -104  2 files

LLVM/project 054188b llvm/lib/Target/AMDGPU AMDGPURegBankCombiner.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel load-d16.ll

[AMDGPU][True16] Add regbank combiner cases to fix regression around G_SEXTLOAD
Delta     File
+63 -165  llvm/test/CodeGen/AMDGPU/GlobalISel/load-d16.ll
+17 -2    llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
+80 -167  2 files

LLVM/project b075400 llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp

Update comment around destination reg size for clarity
Delta  File
+5 -1  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+5 -1  1 files

LLVM/project aa095db llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU global-saddr-load.ll

Add legalize rules and fix tests
Delta     File
+165 -63  llvm/test/CodeGen/AMDGPU/GlobalISel/load-d16.ll
+90 -24   llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
+6 -9     llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-s16-true16.mir
+7 -2     llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+268 -98  4 files

LLVM/project 9a486ec llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU flat-saddr-load.ll

[AMDGPU][True16] Legalize extloads into 16-bit registers

Signed-off-by: Domenic Nutile <domenic.nutile at gmail.com>
Delta    File
+80 -38  llvm/test/CodeGen/AMDGPU/flat-saddr-load.ll
+2 -2    llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+82 -40  2 files

LLVM/project b688dc0 llvm/test/CodeGen/AMDGPU/GlobalISel legalize-sextload-zextload-s16-true16.mir legalize-sextload-s16-true16.mir

PR feedback
Delta     File
+0 -376   llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-zextload-s16-true16.mir
+87 -0    llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-s16-true16.mir
+5 -1     llvm/test/CodeGen/AMDGPU/GlobalISel/load-d16.ll
+92 -377  3 files

LLVM/project 69b473d llvm/test/CodeGen/AMDGPU/GlobalISel legalize-sextload-zextload-s16-true16.mir

[AMDGPU][True16] Create tests that will demonstrate true16 G_SEXTLOAD/G_ZEXTLOAD legalization changes
Delta    File
+376 -0  llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-zextload-s16-true16.mir
+376 -0  1 files

LLVM/project c52c3d2 llvm/lib/Target/AMDGPU VOP3PInstructions.td AMDGPU.td, llvm/test/MC/AMDGPU gfx11_asm_vop3p_alias.s gfx12_asm_vop3p_aliases.s

[AMDGPU] Fix v_dot4_i32_i8 alias to set neg_lo modifiers (#197998)

Fixes issue here https://github.com/ROCm/ROCm/issues/6126

The `v_dot4_i32_i8` assembly alias was not setting the `neg_lo` modifier
bits when converted to `v_dot4_i32_iu8`, which causes signed int8
operands to be treated as unsigned.

For example, with `q=[1,-1,1,-1], k=[1,1,1,1]` the expected result is 0
but 512 is returned. The instruction computes
`1*1 + 255*1 + 1*1 + 255*1 = 512`, treating `-1 (0xFF)` as `255`.
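
The arithmetic above can be reproduced on the host with a small scalar model. This is only a sketch of the signed versus unsigned interpretation of the same four bytes, not of the instruction's full semantics (no accumulator or clamping); `dot4Signed` and `dot4Unsigned` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes4 = std::array<std::uint8_t, 4>;

// Dot product with every byte interpreted as a signed 8-bit value.
inline std::int32_t dot4Signed(const Bytes4 &a, const Bytes4 &b) {
  std::int32_t sum = 0;
  for (int i = 0; i < 4; ++i)
    sum += static_cast<std::int8_t>(a[i]) * static_cast<std::int8_t>(b[i]);
  return sum;
}

// Dot product with every byte interpreted as an unsigned 8-bit value.
inline std::int32_t dot4Unsigned(const Bytes4 &a, const Bytes4 &b) {
  std::int32_t sum = 0;
  for (int i = 0; i < 4; ++i)
    sum += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
  return sum;
}
```

With `q = {1, 0xFF, 1, 0xFF}` and `k = {1, 1, 1, 1}`, the signed model gives 0 and the unsigned model gives 512, matching the numbers in the report.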

On AMD GFX11+, the native `v_dot4_i32_i8` instruction doesn't exist. The
hardware provides `v_dot4_i32_iu8` with `neg_lo` modifier bits to
control signedness of each operand. The compiler correctly lowers
`v_dot4_i32_i8` intrinsics by setting `neg_lo:[1,1,0]`, but inline
assembly using the `v_dot4_i32_i8` mnemonic bypasses this lowering and
goes directly to the assembler.


    [10 lines not shown]
Delta   File
+17 -5  llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+18 -0  llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3p.txt
+2 -2   llvm/test/MC/AMDGPU/gfx11_asm_vop3p_alias.s
+2 -2   llvm/test/MC/AMDGPU/gfx12_asm_vop3p_aliases.s
+3 -0   llvm/lib/Target/AMDGPU/AMDGPU.td
+42 -9  5 files

LLVM/project b969520 mlir/include/mlir/Dialect/SCF/IR SCFOps.td, mlir/lib/Dialect/SCF/IR SCF.cpp

[mlir][SCF] Add `scf.loop` op and terminators
Delta    File
+170 -0  mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
+162 -0  mlir/lib/Dialect/SCF/IR/SCF.cpp
+101 -0  mlir/test/Dialect/SCF/invalid.mlir
+73 -0   mlir/test/Dialect/SCF/ops.mlir
+506 -0  4 files

LLVM/project 72b1c13 lldb/include/lldb/ValueObject DILEval.h DILParser.h, lldb/source/ValueObject DILEval.cpp DILParser.cpp

[lldb] Add enum lookup to DIL (#192065)
Delta   File
+36 -0  lldb/test/API/commands/frame/var-dil/expr/EnumValueLookup/TestEnumValueLookup.py
+29 -0  lldb/source/ValueObject/DILEval.cpp
+19 -0  lldb/test/API/commands/frame/var-dil/expr/EnumValueLookup/main.cpp
+5 -0   lldb/include/lldb/ValueObject/DILEval.h
+2 -2   lldb/source/ValueObject/DILParser.cpp
+3 -0   lldb/include/lldb/ValueObject/DILParser.h
+94 -2  1 files not shown
+97 -2  7 files

LLVM/project 2894d75 flang/lib/Semantics check-omp-structure.cpp, flang/test/Semantics/OpenMP workdistribute05.f90

[Flang][OpenMP] Allow workdistribute inside 'target teams' (#199006)

Currently, a `workdistribute` construct nested inside of a combined
`target teams` is incorrectly reported as an error. This patch fixes
that.
Delta   File
+23 -0  flang/test/Semantics/OpenMP/workdistribute05.f90
+1 -1   flang/lib/Semantics/check-omp-structure.cpp
+24 -1  2 files

LLVM/project 586b9a4 clang/include/clang/AST DeclTemplate.h, clang/lib/AST DeclTemplate.cpp

[clang] fix getTemplateInstantiationArgs

This implements a new strategy for collecting the template arguments, by
relying on the qualifiers and template parameter lists to navigate the template
context of out-of-line definitions.

This greatly simplifies the signature of that function, by removing a bunch
of workarounds, and simplifying a couple that weren't removed yet.

Since this now relies on qualifiers and template parameter lists,
this patch expends most of its effort making sure these are placed,
transformed and propagated to template instantiations.

Also makes the explicit specialization AST nodes stop abusing the template
parameter lists by storing their own template parameter list, creating a
dedicated field for them, similar to partial specializations.
Delta          File
+194 -429      clang/lib/Sema/SemaTemplateInstantiate.cpp
+257 -164      clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
+150 -148      clang/lib/Sema/SemaTemplate.cpp
+96 -95        clang/include/clang/AST/DeclTemplate.h
+59 -129       clang/lib/Sema/SemaConcept.cpp
+60 -92        clang/lib/AST/DeclTemplate.cpp
+816 -1,057    46 files not shown
+1,384 -1,638  52 files

LLVM/project 00fb002 clang/include/clang/Serialization ASTRecordReader.h, clang/lib/AST ASTContext.cpp Type.cpp

trivial changes
Delta      File
+20 -14    clang/lib/Sema/SemaOpenMP.cpp
+18 -14    clang/lib/AST/ASTContext.cpp
+16 -15    clang/lib/Sema/SemaTemplate.cpp
+14 -11    clang/lib/AST/Type.cpp
+14 -8     clang/lib/AST/ASTDiagnostic.cpp
+11 -6     clang/include/clang/Serialization/ASTRecordReader.h
+93 -68    33 files not shown
+202 -152  39 files

LLVM/project 7cc0b13 clang/include/clang/AST ASTContext.h, clang/lib/AST ASTContext.cpp ItaniumMangle.cpp

[clang] implement CWG2064: ignore value dependence for decltype

The 'decltype' for a value-dependent (but non-type-dependent) expression should be known,
so this patch makes them non-opaque instead.

This patch also implements what's necessary to allow overloading
on pure differences in instantiation dependence, making `std::void_t`
usable for SFINAE purposes.

This also re-adds a few test cases from da98651, which was a previous attempt
at resolving CWG2064.

Fixes #8740
Fixes #61818
Fixes #190388
Delta        File
+888 -161    clang/lib/AST/ASTContext.cpp
+328 -12     clang/test/SemaTemplate/instantiation-dependence.cpp
+176 -96     clang/lib/AST/ItaniumMangle.cpp
+100 -98     clang/lib/Sema/SemaCXXScopeSpec.cpp
+62 -57      clang/lib/AST/Type.cpp
+88 -11      clang/include/clang/AST/ASTContext.h
+1,642 -435  69 files not shown
+2,376 -794  75 files

LLVM/project c237f89 clang/include/clang/AST DeclTemplate.h, clang/lib/AST DeclTemplate.cpp

[clang] fix getTemplateInstantiationArgs

This implements a new strategy for collecting the template arguments, by
relying on the qualifiers and template parameter lists to navigate the template
context of out-of-line definitions.

This greatly simplifies the signature of that function, by removing a bunch
of workarounds, and simplifying a couple that weren't removed yet.

Since this now relies on qualifiers and template parameter lists,
this patch expends most of its effort making sure these are placed,
transformed and propagated to template instantiations.

Also makes the explicit specialization AST nodes stop abusing the template
parameter lists by storing their own template parameter list, creating a
dedicated field for them, similar to partial specializations.
Delta          File
+194 -429      clang/lib/Sema/SemaTemplateInstantiate.cpp
+261 -167      clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
+150 -148      clang/lib/Sema/SemaTemplate.cpp
+96 -95        clang/include/clang/AST/DeclTemplate.h
+59 -129       clang/lib/Sema/SemaConcept.cpp
+60 -92        clang/lib/AST/DeclTemplate.cpp
+820 -1,060    46 files not shown
+1,388 -1,641  52 files

LLVM/project 925ec82 mlir/docs Tokens.md LangRef.md

address comments: symbols / IsolatedFromAbove
Delta  File
+6 -1  mlir/docs/Tokens.md
+1 -2  mlir/docs/LangRef.md
+7 -3  2 files

LLVM/project 2928ac8 clang/include/clang/AST DeclTemplate.h, clang/lib/AST DeclTemplate.cpp

[clang] fix getTemplateInstantiationArgs

This implements a new strategy for collecting the template arguments, by
relying on the qualifiers and template parameter lists to navigate the template
context of out-of-line definitions.

This greatly simplifies the signature of that function, by removing a bunch
of workarounds, and simplifying a couple that weren't removed yet.

Since this now relies on qualifiers and template parameter lists,
this patch expends most of its effort making sure these are placed,
transformed and propagated to template instantiations.

Also makes the explicit specialization AST nodes stop abusing the template
parameter lists by storing their own template parameter list, creating a
dedicated field for them, similar to partial specializations.
Delta          File
+194 -429      clang/lib/Sema/SemaTemplateInstantiate.cpp
+263 -167      clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
+150 -148      clang/lib/Sema/SemaTemplate.cpp
+96 -95        clang/include/clang/AST/DeclTemplate.h
+59 -129       clang/lib/Sema/SemaConcept.cpp
+60 -92        clang/lib/AST/DeclTemplate.cpp
+822 -1,060    46 files not shown
+1,390 -1,641  52 files

LLVM/project e1a9576 clang/lib/CIR/CodeGen CIRGenBuiltinAArch64.cpp, clang/test/CodeGen/AArch64 neon-intrinsics.c

[CIR][AArch64] Lower NEON vsli/vsliq intrinsics (#198309)

### summary

part of: https://github.com/llvm/llvm-project/issues/185382

Lower the AArch64 NEON shift-left-and-insert intrinsics (`vsli_n_v` /
`vsliq_n_v`) in the CIR codegen path. The lowering mirrors classic
CodeGen (`clang/lib/CodeGen/TargetBuiltins/ARM.cpp`): bitcast both
vector operands to the target element type and emit a direct
`llvm.aarch64.neon.vsli` intrinsic call.
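
For readers unfamiliar with shift-left-and-insert, the following scalar model sketches what the operation computes per lane. It is an illustration only, written for 8-bit unsigned lanes, and is unrelated to how CIR emits the intrinsic; `U8x8` and `sliModelU8` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using U8x8 = std::array<std::uint8_t, 8>;

// Scalar model of shift-left-and-insert on 8-bit lanes: each result lane
// keeps the low `n` bits of `a` and places `b << n` in the bits above them.
inline U8x8 sliModelU8(const U8x8 &a, const U8x8 &b, unsigned n) {
  assert(n < 8 && "shift amount must be in [0, 7] for 8-bit lanes");
  const std::uint8_t keepMask = static_cast<std::uint8_t>((1u << n) - 1u);
  U8x8 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = static_cast<std::uint8_t>((a[i] & keepMask) | (b[i] << n));
  return r;
}
```

With every lane of `a` set to `0xFF`, every lane of `b` set to `0x01`, and `n = 4`, each result lane is `0x1F`: the low nibble comes from `a` and the shifted bit from `b`.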
Delta      File
+289 -0    clang/test/CodeGen/AArch64/neon/intrinsics.c
+0 -284    clang/test/CodeGen/AArch64/neon-intrinsics.c
+14 -1     clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+303 -285  3 files

LLVM/project 84f9530 clang/lib/Driver/ToolChains Hexagon.cpp, clang/test/Driver hexagon-toolchain-elf.c

[Hexagon] Support reserving R16-R28 registers via -ffixed-rN (#197208)

Extend register reservation from R19-only to R16-R28. This allows users
to reserve callee-saved registers (R16-R27) and R28 via command-line
flags -ffixed-r16 through -ffixed-r28. The single bool ReservedR19 is
replaced with an array-based approach (ReservedR[32]) to scale cleanly
across all supported registers.
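
The array-based approach can be pictured with a small standalone sketch. This is not the driver or backend code: `parseFixedRegFlag` and `collectReservedRegs` are hypothetical helpers, and a `std::bitset<32>` stands in for the `ReservedR[32]` array mentioned above.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: map a "-ffixed-rN" flag to N, or return -1 if the
// flag does not request a register in the supported R16-R28 range.
inline int parseFixedRegFlag(const std::string &flag) {
  const std::string prefix = "-ffixed-r";
  if (flag.compare(0, prefix.size(), prefix) != 0)
    return -1;
  const std::string digits = flag.substr(prefix.size());
  if (digits.empty() || digits.size() > 2)
    return -1;
  int n = 0;
  for (char c : digits) {
    if (c < '0' || c > '9')
      return -1;
    n = n * 10 + (c - '0');
  }
  return (n >= 16 && n <= 28) ? n : -1;
}

// One slot per general-purpose register, indexed by register number.
inline std::bitset<32> collectReservedRegs(const std::vector<std::string> &flags) {
  std::bitset<32> reserved;
  for (const std::string &flag : flags) {
    const int n = parseFixedRegFlag(flag);
    if (n >= 0)
      reserved.set(static_cast<std::size_t>(n));
  }
  return reserved;
}
```

Indexing by register number means supporting another register only widens the accepted range, instead of adding another boolean per register.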

---------

Co-authored-by: quic-santdas <quic_santdas at quicinc.com>
Delta     File
+38 -2    clang/test/Driver/hexagon-toolchain-elf.c
+30 -0    llvm/test/CodeGen/Hexagon/reserved-regs.ll
+20 -3    clang/lib/Driver/ToolChains/Hexagon.cpp
+7 -2     llvm/lib/Target/Hexagon/HexagonRegisterInfo.cpp
+6 -2     llvm/lib/Target/Hexagon/HexagonSubtarget.h
+4 -2     llvm/lib/Target/Hexagon/Hexagon.td
+105 -11  6 files

LLVM/project 720dd96 clang/lib/AST Decl.cpp DeclCXX.cpp, clang/lib/Sema SemaLookup.cpp

[clang] preserve exact redeclaration for getTemplateInstantiationPattern

This makes these functions not always return the definition if any.
The few users which depend on this are updated to fetch the definition
themselves.

Also fixes the VarDecl variant returning the queried declaration itself.
Delta    File
+7 -28   clang/lib/AST/Decl.cpp
+9 -10   clang/test/AST/ast-dump-templates-pattern.cpp
+3 -10   clang/lib/AST/DeclCXX.cpp
+6 -6    clang/test/AST/ast-dump-decl.cpp
+6 -4    clang/lib/Sema/SemaLookup.cpp
+1 -1    clang/lib/StaticAnalyzer/Core/BugSuppression.cpp
+32 -59  2 files not shown
+34 -61  8 files

LLVM/project 3056add llvm/lib/Transforms/Vectorize VPlan.h VPlanRecipes.cpp

[VPlan] Rename Expression::isSingleScalar (NFC) (#199041)

The single-scalar terminology, as it is used in other places, indicates
that all operands are scalars and that the result is a scalar.
VPExpressionRecipe::isSingleScalar is a misnomer: in the existing
terminology, it is actually a vector-to-scalar. Rename it for clarity.
Delta  File
+2 -2  llvm/lib/Transforms/Vectorize/VPlan.h
+1 -3  llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+1 -1  llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+4 -6  3 files

LLVM/project 00fcb51 flang/test/Driver flang-f-opts.f90

Attempt to fix flang test
Delta  File
+0 -2  flang/test/Driver/flang-f-opts.f90
+0 -2  1 files

LLVM/project 59681c6 libcxx/include any, libcxx/test/std/utilities/any/any.nonmembers/any.cast const_reference_types.verify.cpp void.const.verify.cpp

[libc++] remove duplicate assertions for void/reference const any_cast (#199425)

For test cases of the const overload of any_cast, such as:
```C++
void test() {
  std::any a = 0;
  const std::any& a2 = a;
  (void)std::any_cast<int&>(&a2);
}
```
(And similarly for void).

The problem is that the assertions are implemented both in the const and
non-const any_cast overloads, but since the const overload delegates to
the non-const overload, that ends up producing the same assertion twice.

This separates those test cases, because those assertions are
implemented in the function body, and that's only instantiated once per
specialization, not once per use.
Delta    File
+33 -0   libcxx/test/std/utilities/any/any.nonmembers/any.cast/const_reference_types.verify.cpp
+23 -0   libcxx/test/std/utilities/any/any.nonmembers/any.cast/void.const.verify.cpp
+0 -17   libcxx/test/std/utilities/any/any.nonmembers/any.cast/reference_types.verify.cpp
+3 -14   libcxx/test/std/utilities/any/any.nonmembers/any.cast/void.verify.cpp
+0 -2    libcxx/include/any
+59 -33  5 files

LLVM/project f347813 clang/lib/CIR/CodeGen CIRGenBuiltinAMDGPU.cpp CIRGenFunction.h

[CIR] Add emitBuiltinWithOneOverloadedType helper
Delta    File
+12 -32  clang/lib/CIR/CodeGen/CIRGenBuiltinAMDGPU.cpp
+19 -0   clang/lib/CIR/CodeGen/CIRGenFunction.h
+31 -32  2 files

LLVM/project 25bb6a9 llvm/lib/Transforms/InstCombine InstCombineCasts.cpp, llvm/test/Transforms/InstCombine trunc-minmax-intrinsics.ll

[InstCombine] Narrow umin/umax/smin/smax through trunc. (#199213)

Update EvaluateInDifferentType / canEvaluateTruncated to support
narrowing umin/umax/smin/smax intrinsics, when their result fits in the
narrow type: zero high bits for umin/umax, or enough sign
bits for smin/smax.
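
The unsigned half of the precondition can be checked by brute force on a small width. The sketch below models a 16-bit to 8-bit truncation with plain integers rather than LLVM IR, and the function names are hypothetical; it compares truncating after `umin` with truncating before it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Wide form: compute umin in 16 bits, then truncate to 8 bits.
inline std::uint8_t truncOfUMin(std::uint16_t x, std::uint16_t y) {
  return static_cast<std::uint8_t>(std::min(x, y));
}

// Narrow form: truncate both operands first, then compute umin in 8 bits.
inline std::uint8_t uminOfTrunc(std::uint16_t x, std::uint16_t y) {
  return std::min(static_cast<std::uint8_t>(x), static_cast<std::uint8_t>(y));
}

// Exhaustively compare both forms for operands whose high 8 bits are zero.
inline bool narrowingHoldsWhenHighBitsZero() {
  for (unsigned x = 0; x <= 0xFF; ++x)
    for (unsigned y = 0; y <= 0xFF; ++y)
      if (truncOfUMin(static_cast<std::uint16_t>(x), static_cast<std::uint16_t>(y)) !=
          uminOfTrunc(static_cast<std::uint16_t>(x), static_cast<std::uint16_t>(y)))
        return false;
  return true;
}
```

The two forms agree whenever the high bits are zero, but not in general: for `x = 0x0100` and `y = 0x00FF` the wide form yields `0xFF` while the narrow form yields `0x00`, which is why the transform needs the precondition.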

Alive2 Proofs:
 * umin/umax with high bits zero: https://alive2.llvm.org/ce/z/dJC_Fj
 * smin/smax with sign-bits set: https://alive2.llvm.org/ce/z/z7vM8Z

End-to-end examples from C workloads performing pixel math:
https://llvm.godbolt.org/z/jK3bd3GfY

PR: https://github.com/llvm/llvm-project/pull/199213
Delta    File
+623 -0  llvm/test/Transforms/InstCombine/trunc-minmax-intrinsics.ll
+63 -0   llvm/test/Transforms/PhaseOrdering/AArch64/trunc-intrinsics.ll
+36 -0   llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
+722 -0  3 files

LLVM/project df4f0d0 llvm/lib/Target/X86 X86ISelLowering.cpp X86ISelLowering.h, llvm/test/Transforms/AtomicExpand/X86 expand-atomic-non-integer.ll

[X86] Remove shouldCastAtomicLoadInIR; use DAG combine instead

Remove X86's shouldCastAtomicLoadInIR override that cast FP atomic
loads to integer at the IR level. Instead, handle this in a pre-legalize
DAG combine (combineAtomicLoad) that rewrites FP/FP-vector atomic loads
to integer atomic loads plus a bitcast.
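
At the source level, the rewrite described above corresponds roughly to the following pattern. This is a hedged sketch in portable C++, not the DAG combine itself; `loadFloatAsInteger` is a hypothetical name and the code assumes a 32-bit `float`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Perform the atomic access on the integer type, then reinterpret the bits
// as a float. The memcpy plays the role of the bitcast.
inline float loadFloatAsInteger(const std::atomic<std::uint32_t> &cell) {
  static_assert(sizeof(std::uint32_t) == sizeof(float), "assumes 32-bit float");
  const std::uint32_t bits = cell.load(std::memory_order_acquire);
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```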

This depends on #199310, which adds the necessary cmpxchg support for
non-integer atomic loads in AtomicExpand.
Delta    File
+25 -7   llvm/lib/Target/X86/X86ISelLowering.cpp
+1 -2    llvm/test/Transforms/AtomicExpand/X86/expand-atomic-non-integer.ll
+0 -2    llvm/lib/Target/X86/X86ISelLowering.h
+26 -11  3 files