LLVM/project 2e290dallvm/lib/CodeGen/GlobalISel CombinerHelper.cpp, llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

[AMDGPU][DAGCombiner][GlobalISel] Extend allMulUsesCanBeContracted with FMA/FMAD pattern

Add conservative FMA/FMAD recognition to allMulUsesCanBeContracted:
a multiply used by an existing FMA/FMAD is assumed to be contractable
(it's already being contracted elsewhere). This avoids unnecessary
contraction blocking for multiplies that feed into FMA chains.

Also adds FMA/FMAD to the FPEXT user set (fpext(fmul) --> fma is
recognized as contractable when isFPExtFoldable).

Guards all remaining FMA-chain reassociation fold sites in both
SDAG (visitFADDForFMACombine/visitFSUBForFMACombine, 8 sites) and
GISel (matchCombineFAddFpExtFMulToFMadOrFMAAggressive, 4 sites).

This re-enables contractions that were conservatively blocked in
earlier patches where the multiply had an FMA use that wasn't yet
recognized: dagcombine-fma-crash.ll and dagcombine-fma-fmad.ll
CHECK lines revert to upstream behavior.

Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
DeltaFile
+95-96llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll
+20-3llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+10-12llvm/test/CodeGen/AMDGPU/dagcombine-fma-crash.ll
+17-2llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+142-1134 files

LLVM/project 58dffabllvm/include/llvm/CodeGen/GlobalISel CombinerHelper.h, llvm/lib/CodeGen/GlobalISel CombinerHelper.cpp

[AMDGPU][DAGCombiner][GlobalISel] Extend allMulUsesCanBeContracted with FPEXT pattern

Extend the allMulUsesCanBeContracted analysis to recognize FPEXT patterns
where the multiply result flows through fpext before being used in
contractable operations (fadd, fsub). This covers:
  - fmul --> fpext --> {fadd, fsub}: FPEXT folds if isFPExtFoldable
  - fmul --> fpext --> fneg --> fsub: FPEXT then FNEG to FSUB
  - fmul --> fneg --> fpext --> fsub: FNEG then FPEXT folds if foldable

Also adds allMulUsesCanBeContracted guards to all FPEXT fold sites in
both SDAG (visitFADDForFMACombine, visitFSUBForFMACombine) and GISel
(matchCombineFAddFpExtFMulToFMadOrFMA, matchCombineFSubFpExtFMulToFMadOrFMA,
matchCombineFSubFpExtFNegFMulToFMadOrFMA).

Fixes a missing isFPExtFoldable check in GISel's
matchCombineFSubFpExtFMulToFMadOrFMA which could fold without verifying
the extension is actually foldable.
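
As a rough illustration of the three chains listed above, here is a minimal C++ sketch over a toy node graph. It is not the LLVM implementation: `Node`, `Op`, `allUsersAre`, and the plain `bool FPExtFoldable` parameter (standing in for the target's isFPExtFoldable query) are all hypothetical, and how the real code treats mixed user sets is an assumption.

```cpp
#include <cassert>
#include <vector>

// Toy opcode set; the real code inspects ISD:: / G_ opcodes.
enum class Op { FMul, FPExt, FNeg, FAdd, FSub, Other };

struct Node {
  Op Opcode;
  std::vector<const Node *> Users; // nodes that consume this node's value
};

bool allUsersAre(const Node &N, Op Expected) {
  for (const Node *U : N.Users)
    if (U->Opcode != Expected)
      return false;
  return true;
}

// Covers fmul -> fpext -> {fadd, fsub} and fmul -> fpext -> fneg -> fsub.
// FPExtFoldable is a stand-in for the target hook (assumption).
bool fpextUseIsContractable(const Node &Ext, bool FPExtFoldable) {
  if (!FPExtFoldable)
    return false;
  for (const Node *U : Ext.Users) {
    if (U->Opcode == Op::FAdd || U->Opcode == Op::FSub)
      continue;
    if (U->Opcode == Op::FNeg && allUsersAre(*U, Op::FSub))
      continue;
    return false;
  }
  return true;
}

// Covers fmul -> fneg -> fsub and fmul -> fneg -> fpext -> fsub.
bool fnegUseIsContractable(const Node &Neg, bool FPExtFoldable) {
  for (const Node *U : Neg.Users) {
    if (U->Opcode == Op::FSub)
      continue;
    if (U->Opcode == Op::FPExt && FPExtFoldable && allUsersAre(*U, Op::FSub))
      continue;
    return false;
  }
  return true;
}

// True only if every user of the multiply matches a recognized pattern.
bool allMulUsesCanBeContracted(const Node &Mul, bool FPExtFoldable) {
  for (const Node *U : Mul.Users) {
    switch (U->Opcode) {
    case Op::FAdd:
    case Op::FSub:
      break; // direct use, recognized by the first patch in the series
    case Op::FPExt:
      if (!fpextUseIsContractable(*U, FPExtFoldable))
        return false;
      break;
    case Op::FNeg:
      if (!fnegUseIsContractable(*U, FPExtFoldable))
        return false;
      break;
    default:
      return false;
    }
  }
  return true;
}
```

In this sketch, a chain that passes through fpext is rejected whenever the extension is not foldable, mirroring the missing check the commit says it fixes in GISel.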

Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
DeltaFile
+1,930-11llvm/test/CodeGen/AMDGPU/fma-multiple-uses-contraction.ll
+93-14llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+78-13llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+2-1llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+2,103-394 files

LLVM/project 6bcce6allvm/lib/CodeGen/GlobalISel CombinerHelper.cpp, llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

[AMDGPU][DAGCombiner][GlobalISel] Extend allMulUsesCanBeContracted with FNEG pattern

Extend allMulUsesCanBeContracted() to recognize fmul -> fneg -> fsub
chains as contractable uses. This allows FMA contraction when a multiply
feeds an fneg that is only used by fsub operations.

Changes:
- DAGCombiner.cpp: Add ISD::FNEG case to allMulUsesCanBeContracted()
  checking that all FNEG users are ISD::FSUB. Update 1 fold site guard
  in visitFSUBForFMACombine (fsub(fneg(fmul))).
- CombinerHelper.cpp: Add G_FNEG case to allMulUsesCanBeContracted()
  checking that all FNEG users are G_FSUB. Update 2 fold site guards
  in matchCombineFSubFNegFMulToFMadOrFMA. Fix guard ordering to check
  isContractableFMul before allMulUsesCanBeContracted (cheap first).
- Add 7 new test functions to fma-multiple-uses-contraction.ll covering
  fneg single-use, multi-use, mixed contractable/non-contractable, and
  cross-pattern (P1 direct + P2 fneg) interactions.
- Update mad-combine.ll CHECK lines affected by the guard changes.
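
The FNEG case described above can be pictured with a small C++ sketch over a toy graph. `Node`, `Op`, and `allUsersAre` are hypothetical stand-ins, not LLVM types; the real code works on SDNode / MachineInstr users.

```cpp
#include <cassert>
#include <vector>

// Toy opcode set; the real code checks ISD::FNEG / G_FNEG and friends.
enum class Op { FMul, FNeg, FAdd, FSub, Other };

struct Node {
  Op Opcode;
  std::vector<const Node *> Users; // nodes that consume this node's value
};

bool allUsersAre(const Node &N, Op Expected) {
  for (const Node *U : N.Users)
    if (U->Opcode != Expected)
      return false;
  return true;
}

// A multiply is fully contractable if each user is a direct fadd/fsub,
// or an fneg whose own users are all fsub.
bool allMulUsesCanBeContracted(const Node &Mul) {
  for (const Node *U : Mul.Users) {
    switch (U->Opcode) {
    case Op::FAdd:
    case Op::FSub:
      break; // direct contractable use
    case Op::FNeg:
      if (!allUsersAre(*U, Op::FSub))
        return false; // the fneg escapes to a non-fsub user
      break;
    default:
      return false;
    }
  }
  return true;
}
```

An fneg with one fsub user and one unrelated user makes the whole multiply non-contractable in this model, which corresponds to the "mixed contractable/non-contractable" test case mentioned in the list.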


    [4 lines not shown]
DeltaFile
+666-0llvm/test/CodeGen/AMDGPU/fma-multiple-uses-contraction.ll
+33-7llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+22-2llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+4-5llvm/test/CodeGen/AMDGPU/mad-combine.ll
+725-144 files

LLVM/project d935a23llvm/lib/CodeGen/GlobalISel CombinerHelper.cpp, llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

[DAGCombiner][GlobalISel] Prevent FMA contraction when fmul cannot be eliminated (FADD/FSUB pattern)

fmul nodes with multiple uses can currently be contracted into FMA
operations even when the fmul itself cannot be eliminated, resulting in
a redundant multiply (wasted power and compute). The existing guard
`Aggressive || N0->hasOneUse()` allows contraction under Aggressive mode
regardless of whether the multiply can be removed.

This patch tightens the guard to:
  `N0->hasOneUse() || (Aggressive && allMulUsesCanBeContracted(N0))`

`allMulUsesCanBeContracted()` iterates all users of the multiply and
returns true only if every use is itself contractable into an FMA.
For this first patch, only direct FADD and FSUB uses are recognized as
contractable (FNEG, FPEXT, and FMA/FMAD patterns follow in subsequent
patches).
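
To make the difference between the two guards concrete, here is a minimal C++ sketch on a toy use graph. The `Node` and `Op` types and the `oldGuard` / `newGuard` helper names are hypothetical; only the two boolean expressions are taken from the commit message.

```cpp
#include <cassert>
#include <vector>

// Toy opcode set; the real code inspects ISD:: / G_ opcodes.
enum class Op { FMul, FAdd, FSub, Store };

struct Node {
  Op Opcode;
  std::vector<const Node *> Users; // nodes that consume this node's value
  bool hasOneUse() const { return Users.size() == 1; }
};

// First-patch behavior: only direct fadd/fsub users count as contractable.
bool allMulUsesCanBeContracted(const Node &Mul) {
  for (const Node *U : Mul.Users)
    if (U->Opcode != Op::FAdd && U->Opcode != Op::FSub)
      return false;
  return true;
}

// Existing guard: Aggressive mode contracts regardless of other users.
bool oldGuard(const Node &Mul, bool Aggressive) {
  return Aggressive || Mul.hasOneUse();
}

// Tightened guard: Aggressive mode also requires that every user can
// absorb the multiply, so the fmul itself can be eliminated.
bool newGuard(const Node &Mul, bool Aggressive) {
  return Mul.hasOneUse() || (Aggressive && allMulUsesCanBeContracted(Mul));
}
```

For a multiply used by both an fadd and a store, `oldGuard` accepts under Aggressive mode while `newGuard` rejects, because the store keeps the fmul alive and contraction would only duplicate the multiply.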

The change is applied symmetrically to both DAGCombiner and GlobalISel:
- DAGCombiner: 4 fold sites in visitFADDForFMACombine (2 sites) and

    [8 lines not shown]
DeltaFile
+835-0llvm/test/CodeGen/AMDGPU/fma-multiple-uses-contraction.ll
+115-148llvm/test/CodeGen/AMDGPU/fma.f16.ll
+118-106llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll
+61-26llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+39-42llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll
+43-5llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+1,211-3277 files not shown
+1,254-36413 files

LLVM/project 2bee958llvm/test/CodeGen/AMDGPU llvm.amdgcn.cvt.pknorm.i16.ll llvm.amdgcn.cvt.pknorm.u16.ll

[AMDGPU][NFC] Update test to use update_llc_test_checks (#188102)

Also add globalisel run lines.

Precommit test for https://github.com/llvm/llvm-project/pull/187834.
DeltaFile
+408-40llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pknorm.i16.ll
+408-40llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pknorm.u16.ll
+816-802 files

LLVM/project 53e3ed8llvm/utils/gn/secondary/lldb/test BUILD.gn, llvm/utils/gn/secondary/llvm/lib/Target/AMDGPU BUILD.gn

[gn build] Port commits (#188129)

f7d5e593d38a
b2edc0a3f8a5
3e4efe3ed4a2
DeltaFile
+2-1llvm/utils/gn/secondary/lldb/test/BUILD.gn
+1-0llvm/utils/gn/secondary/llvm/lib/Target/AMDGPU/BUILD.gn
+1-0llvm/utils/gn/secondary/llvm/lib/Target/WebAssembly/BUILD.gn
+4-13 files

LLVM/project fe03749clang/lib/Driver/ToolChains Cuda.cpp AMDGPU.cpp

[Clang] Do not emit multi-gpu warning if they are all the same (#185490)

Summary:
This warning exists because if you do `-mcpu=native` in some contexts it
may not be obvious which GPU you get. But if they are all the same then
it really doesn't make a difference since we just pass the first one to
`-mcpu` anyway. Relax this so it doesn't annoyingly warn on machines
with more than one of the same GPU.
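
The relaxed condition can be sketched in C++ as follows. The function name `shouldWarnMultipleGPUs` and the use of plain strings for detected architectures are assumptions for illustration; the actual driver code in Cuda.cpp and AMDGPU.cpp may be structured differently.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: warn only when the detected GPUs actually differ.
// With identical GPUs, passing the first one to -mcpu loses nothing.
bool shouldWarnMultipleGPUs(const std::vector<std::string> &Archs) {
  if (Archs.size() < 2)
    return false; // zero or one GPU: nothing ambiguous
  return !std::all_of(Archs.begin(), Archs.end(),
                      [&](const std::string &A) { return A == Archs.front(); });
}
```

Under this sketch, a machine reporting the same architecture twice stays silent, while a machine with two different architectures still triggers the warning.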

---------

Co-authored-by: Jacob Lambert <jacob.lambert at amd.com>
DeltaFile
+5-3clang/lib/Driver/ToolChains/Cuda.cpp
+2-2clang/lib/Driver/ToolChains/AMDGPU.cpp
+7-52 files

LLVM/project 6d6c85alibc/src/__support/math log.h

fma
DeltaFile
+1-2libc/src/__support/math/log.h
+1-21 files

LLVM/project 3d6cc6dlibc/src/__support/CPP bit.h

static
DeltaFile
+1-1libc/src/__support/CPP/bit.h
+1-11 files

LLVM/project 40b75e2libc/src/__support/CPP iterator.h, libc/src/__support/wctype perfect_hash_map.h

cleanup
DeltaFile
+1-2libc/src/__support/wctype/perfect_hash_map.h
+1-1libc/src/__support/CPP/iterator.h
+2-32 files

LLVM/project c3fbbb6libc/src/__support/wctype perfect_hash_map.h

rename var
DeltaFile
+2-2libc/src/__support/wctype/perfect_hash_map.h
+2-21 files

LLVM/project 6c63916libc/src/__support/CPP iterator.h

fix 2
DeltaFile
+0-4libc/src/__support/CPP/iterator.h
+0-41 files

LLVM/project 5f35480libc/utils/wctype_utils gen.py

remove flag
DeltaFile
+1-1libc/utils/wctype_utils/gen.py
+1-11 files

LLVM/project dbeb438libc/utils/wctype_utils gen.py, libc/utils/wctype_utils/conversion hex_writer.py

format
DeltaFile
+4-5libc/utils/wctype_utils/gen.py
+1-2libc/utils/wctype_utils/conversion/hex_writer.py
+5-72 files

LLVM/project ef6ed2flibc/src/__support/wctype perfect_hash_map.h CMakeLists.txt

add UInt128
DeltaFile
+7-7libc/src/__support/wctype/perfect_hash_map.h
+1-0libc/src/__support/wctype/CMakeLists.txt
+8-72 files

LLVM/project 57b11c2libc/utils/wctype_utils gen.py, libc/utils/wctype_utils/conversion hex_writer.py

format
DeltaFile
+2-1libc/utils/wctype_utils/conversion/hex_writer.py
+2-1libc/utils/wctype_utils/gen.py
+4-22 files

LLVM/project 36125c7libc/src/__support/CPP iterator.h

fix iterator
DeltaFile
+1-3libc/src/__support/CPP/iterator.h
+1-31 files

LLVM/project c174ca5libc/src/__support/wctype perfect_hash_map.h lower_to_upper.h, libc/utils/wctype_utils/conversion hex_writer.py

[libc][wctype] Add perfect hash map for conversion functions
DeltaFile
+876-0libc/src/__support/wctype/perfect_hash_map.h
+568-0libc/src/__support/wctype/lower_to_upper.h
+553-0libc/src/__support/wctype/upper_to_lower.h
+0-400libc/src/__support/wctype/lower_to_upper.inc
+0-390libc/src/__support/wctype/upper_to_lower.inc
+71-1libc/utils/wctype_utils/conversion/hex_writer.py
+2,068-7918 files not shown
+2,256-79714 files

LLVM/project 5264046libc/src/__support/CPP bit.h

[libc][math] Qualify ceil functions to constexpr
DeltaFile
+1-1libc/src/__support/CPP/bit.h
+1-11 files

LLVM/project d906fa5libc/src/__support/math log.h

[libc][math] Qualify log with constant evaluation support
DeltaFile
+2-1libc/src/__support/math/log.h
+2-11 files

LLVM/project a235d96llvm/test/Instrumentation/AddressSanitizer remove-memory-effects.ll

[test][ASan] Precommit test for #187794 (#188112)
DeltaFile
+7-0llvm/test/Instrumentation/AddressSanitizer/remove-memory-effects.ll
+7-01 files

LLVM/project c0634cbflang-rt/lib/cuda registration.cpp, flang/include/flang/Runtime/CUDA registration.h

[flang][cuda] Add CUFRegisterManagedVariable runtime entry for __cudaRegisterManagedVar (#188124)

Add CUFRegisterManagedVariable runtime wrapper in flang-rt that calls
__cudaRegisterManagedVar.
This is preparation for supporting non-allocatable managed variables.
No functional change -- nothing calls this yet.
DeltaFile
+8-0flang-rt/lib/cuda/registration.cpp
+4-0flang/include/flang/Runtime/CUDA/registration.h
+12-02 files

LLVM/project 8cf542dlibc/test/shared shared_math_test.cpp shared_math_constexpr_test.cpp

configure 2 evaluation environment
DeltaFile
+5-72libc/test/shared/shared_math_test.cpp
+63-0libc/test/shared/shared_math_constexpr_test.cpp
+10-0libc/test/shared/CMakeLists.txt
+78-723 files

LLVM/project 2f1e0d1llvm/test/Transforms/LoopVectorize epilog-vectorization-reductions.ll, llvm/test/Transforms/LoopVectorize/X86 transform-narrow-interleave-to-widen-memory-epilogue-vec.ll

[LV] Add additional epilogue vector tests.

Add additional epilogue vectorization tests for
 * https://github.com/llvm/llvm-project/issues/187323
 * https://github.com/llvm/llvm-project/issues/185345
DeltaFile
+303-1llvm/test/Transforms/LoopVectorize/epilog-vectorization-reductions.ll
+123-0llvm/test/Transforms/LoopVectorize/X86/transform-narrow-interleave-to-widen-memory-epilogue-vec.ll
+426-12 files

LLVM/project 22977fdllvm/test/CodeGen/AMDGPU llvm.amdgcn.ds.bpermute.ll llvm.amdgcn.ds.permute.ll

[AMDGPU][NFC] Update permute tests to use auto-generated checks (#188107)

Also add global-isel run line.
DeltaFile
+125-13llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.bpermute.ll
+50-5llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll
+175-182 files

LLVM/project 3124cb3llvm/lib/Analysis DependenceAnalysis.cpp, llvm/test/Analysis/DependenceAnalysis SameSDLoops.ll

[DA] Bug fix regarding the SameSD levels (#188098)

SCEV isKnownPredicate may crash if the expressions involve different
loops. To verify that two loops have the same iteration space, we do
not need the SCEV APIs; an equality check is sufficient.

Moreover, no pass (not even loop fusion) needs to check SameSD levels
for more than one level. In this patch, we limit the analysis of SameSD
levels to only one level after the common levels.
DeltaFile
+23-20llvm/lib/Analysis/DependenceAnalysis.cpp
+1-1llvm/test/Analysis/DependenceAnalysis/SameSDLoops.ll
+24-212 files

LLVM/project 10580belibclc/clc/lib/generic/math clc_tgamma.inc clc_tgamma.cl

libclc: Improve tgamma handling
DeltaFile
+213-0libclc/clc/lib/generic/math/clc_tgamma.inc
+12-54libclc/clc/lib/generic/math/clc_tgamma.cl
+225-542 files

LLVM/project 13d9304libclc/clc/lib/generic/math clc_lgamma_r_stret.inc clc_lgamma_r.cl

avoid fract
DeltaFile
+2-4libclc/clc/lib/generic/math/clc_lgamma_r_stret.inc
+1-1libclc/clc/lib/generic/math/clc_lgamma_r.cl
+3-52 files

LLVM/project 9096c9cllvm/lib/Transforms/Scalar LoopFuse.cpp, llvm/test/Transforms/LoopFusion da_separate_loops.ll

[LoopFusion] Remove the InvalidDependencies duplicates (#187744)

If dependencesAllowFusion returns false, fuseCandidates calls
reportLoopFusion to increment InvalidDependencies and to emit an
OptimizationRemarkMissed. If both dependencesAllowFusion and
reportLoopFusion increment InvalidDependencies, the statistic is
counted twice.
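
The double counting can be reproduced with a small C++ model. Everything here is a hypothetical stand-in (a plain counter instead of an LLVM STATISTIC, a boolean instead of real dependence analysis), and it assumes the fix is to stop incrementing inside the dependence check.

```cpp
#include <cassert>

// Stand-in for the InvalidDependencies statistic.
struct FusionStats {
  unsigned InvalidDependencies = 0;
};

// Before (assumed): the dependence check bumps the counter on failure.
bool dependencesAllowFusionOld(bool DepsOk, FusionStats &S) {
  if (!DepsOk) {
    ++S.InvalidDependencies;
    return false;
  }
  return true;
}

// After (assumed): the check is pure; counting is left to the reporter.
bool dependencesAllowFusionNew(bool DepsOk) { return DepsOk; }

// Stand-in for reportLoopFusion: counts, and in real code emits a remark.
void reportInvalidDependencies(FusionStats &S) { ++S.InvalidDependencies; }

// Number of InvalidDependencies recorded for one candidate pair.
unsigned countOld(bool DepsOk) {
  FusionStats S;
  if (!dependencesAllowFusionOld(DepsOk, S))
    reportInvalidDependencies(S);
  return S.InvalidDependencies;
}

unsigned countNew(bool DepsOk) {
  FusionStats S;
  if (!dependencesAllowFusionNew(DepsOk))
    reportInvalidDependencies(S);
  return S.InvalidDependencies;
}
```

In this model a single rejected pair is recorded twice by the old path and once by the new one.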
DeltaFile
+0-5llvm/lib/Transforms/Scalar/LoopFuse.cpp
+1-1llvm/test/Transforms/LoopFusion/da_separate_loops.ll
+1-62 files

LLVM/project 205187cflang/lib/Optimizer/OpenACC/Transforms ACCDeclareActionConversion.cpp, flang/test/Fir/OpenACC declare-action-conversion.fir

[openacc][flang] full support to handle allocatable/pointer runtime declare-action calls in ACCDeclareActionConversion (#188055)

Supported before: `fir.store`, `fir.box_addr`, and `fir.call` only for
PointerAllocate/PointerDeallocate.
Added now: fir.call support for PointerAllocateSource,
PointerDeallocatePolymorphic, AllocatableAllocate,
AllocatableAllocateSource, AllocatableDeallocate,
AllocatableDeallocatePolymorphic (found in
flang/include/flang/Runtime/allocatable.h).
DeltaFile
+74-0flang/test/Fir/OpenACC/declare-action-conversion.fir
+42-12flang/lib/Optimizer/OpenACC/Transforms/ACCDeclareActionConversion.cpp
+116-122 files