LLVM/project 34603dbllvm/lib/Transforms/Utils CMakeLists.txt

Fix dependencies after PR #174490 (#175793)

Added the missing direct dependency (it was already an indirect dependency due to `Analysis`)
DeltaFile
+1-0llvm/lib/Transforms/Utils/CMakeLists.txt
+1-01 files

LLVM/project 4a9a13cllvm/include/llvm/ADT FloatingPointMode.h, llvm/include/llvm/Support KnownFPClass.h

InstCombine: Handle fadd in SimplifyDemandedFPClass (#174853)

Note some of the tests currently fail with alive, but not
due to this patch. Namely, when performing the fadd x, 0 -> x
simplification in functions with non-IEEE denormal handling.
The existing instsimplify ignores the denormals-are-zero hazard by
checking cannotBeNegativeZero instead of isKnownNeverLogicalZero.
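
The hazard described above can be sketched in plain C++. This is only a model, not LLVM code: `flushInput`, `faddDAZ`, `isNegativeZero`, `isLogicalZero` and `negativeSubnormal` are hypothetical helpers, and denormals-are-zero input handling is emulated by flushing subnormal inputs to a zero of the same sign. It assumes the host evaluates `float` arithmetic with IEEE semantics and without fast-math flags.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical model of denormals-are-zero (DAZ) input handling: a subnormal
// input is replaced by a zero of the same sign before the operation runs.
float flushInput(float v) {
  return std::fpclassify(v) == FP_SUBNORMAL ? std::copysign(0.0f, v) : v;
}

// fadd as it would behave with DAZ inputs (a model, not an LLVM API).
float faddDAZ(float a, float b) { return flushInput(a) + flushInput(b); }

// Literal check: is the value exactly -0.0?
bool isNegativeZero(float v) { return v == 0.0f && std::signbit(v); }

// Logical check: is the value treated as zero once inputs are flushed?
bool isLogicalZero(float v) { return flushInput(v) == 0.0f; }

// Smallest-magnitude negative subnormal float on the host.
float negativeSubnormal() {
  float x = -std::numeric_limits<float>::denorm_min();
  assert(std::fpclassify(x) == FP_SUBNORMAL && "host must support subnormals");
  return x;
}
```

With `x = negativeSubnormal()`, `isNegativeZero(x)` is false, so a literal negative-zero check would allow folding `fadd x, +0.0` to `x`. Under the model, however, `x` is a logical zero: `faddDAZ(x, 0.0f)` yields `+0.0`, while `x` itself flushes to `-0.0`, so the fold changes the observable sign.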

Also note the self handling doesn't really do anything yet, other
than propagate consistent known-fpclass information until there is
multiple use support.

This also leaves behind the original ValueTracking support, without
switching to the new KnownFPClass::fadd utility. This will be easier
to clean up after the subsequent fsub support patch.
DeltaFile
+93-124llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-fadd.ll
+113-0llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+46-0llvm/lib/Support/KnownFPClass.cpp
+17-0llvm/include/llvm/ADT/FloatingPointMode.h
+10-0llvm/include/llvm/Support/KnownFPClass.h
+279-1245 files

LLVM/project f8ae051libcxx/test/libcxx/ranges/range.adaptors/range.as_rvalue nodiscard.verify.cpp, libcxx/test/libcxx/ranges/range.adaptors/range.as_rvalue_view nodiscard.verify.cpp

[libc++][ranges][NFC] Cleanup `nodiscard.verify.cpp` tests (#175725)

- Removed redundant files
- Renamed files to the common `nodiscard.verify.cpp`
DeltaFile
+63-0libcxx/test/libcxx/ranges/range.adaptors/range.as_rvalue/nodiscard.verify.cpp
+0-63libcxx/test/libcxx/ranges/range.adaptors/range.as_rvalue_view/nodiscard.verify.cpp
+0-61libcxx/test/libcxx/ranges/range.adaptors/range.common.view/nodiscard.verify.cpp
+61-0libcxx/test/libcxx/ranges/range.adaptors/range.common/nodiscard.verify.cpp
+0-21libcxx/test/libcxx/ranges/range.adaptors/range.common.view/adaptor.nodiscard.verify.cpp
+19-0libcxx/test/libcxx/ranges/range.adaptors/range.counted/nodiscard.verify.cpp
+143-1451 files not shown
+143-1647 files

LLVM/project 62f629allvm/include/llvm/Transforms/Utils LowerMemIntrinsics.h, llvm/lib/Transforms/Utils LowerMemIntrinsics.cpp

[LowerMemIntrinsics] Propagate value profile to branch weights (#174490)

If a mem intrinsic has value profile information attached, we can synthesize branch weights when lowering it to a loop.

Issue #147390
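
One way to picture the idea is the sketch below. It is not the code from this patch: `SizeSample`, `LoopWeights` and `estimateLoopWeights` are hypothetical names, and the model assumes a value profile made of (size, count) pairs and a lowered loop that copies a fixed number of bytes per iteration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical value-profile entry: the intrinsic was called `count` times
// with a length of `size` bytes.
struct SizeSample {
  uint64_t size;
  uint64_t count;
};

// Estimated number of times each edge of the lowered loop is taken.
struct LoopWeights {
  uint64_t backedge;
  uint64_t exit;
};

// Derive back-edge and exit counts for a loop that handles
// `bytesPerIteration` bytes per trip. Overflow is ignored in this sketch.
LoopWeights estimateLoopWeights(const std::vector<SizeSample> &profile,
                                uint64_t bytesPerIteration) {
  assert(bytesPerIteration > 0 && "loop must make progress");
  LoopWeights weights{0, 0};
  for (const SizeSample &sample : profile) {
    uint64_t iterations = sample.size / bytesPerIteration;
    if (iterations == 0)
      continue; // The loop body is skipped entirely for these calls.
    weights.exit += sample.count;                        // one exit per call
    weights.backedge += sample.count * (iterations - 1); // remaining trips
  }
  return weights;
}
```

For a profile of 10 calls with 16 bytes and 5 calls with 64 bytes, and 16 bytes per iteration, the sketch yields 15 back-edge executions and 15 exits.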
DeltaFile
+112-37llvm/lib/Transforms/Utils/LowerMemIntrinsics.cpp
+31-18llvm/test/Transforms/PreISelIntrinsicLowering/X86/memcpy-inline-non-constant-len.ll
+20-8llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-inline-non-constant-len.ll
+4-2llvm/include/llvm/Transforms/Utils/LowerMemIntrinsics.h
+0-3llvm/utils/profcheck-xfail.txt
+167-685 files

LLVM/project b88ea53llvm/lib/Target/AMDGPU GCNRegPressure.cpp, llvm/test/CodeGen/AMDGPU machine-scheduler-sink-trivial-remats-attr.mir

[WIP] Change how ArchVGPR excess is computed. It's not clear why it was considering AGPRs.
DeltaFile
+20-20llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-attr.mir
+13-8llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
+33-282 files

LLVM/project 3accd1fllvm/lib/Target/AMDGPU GCNSchedStrategy.cpp

[Review] typos in comment
DeltaFile
+4-3llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+4-31 files

LLVM/project 9b1f3d4llvm/lib/Target/AMDGPU GCNRegPressure.cpp GCNRegPressure.h

[Review] Change constructor of RegExcess to take a pressure and a target and rename spillsToMemory to spillsToMemoryForTargetOccupancy
DeltaFile
+14-8llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
+5-0llvm/lib/Target/AMDGPU/GCNRegPressure.h
+5-0llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+24-83 files

LLVM/project 58e74dellvm/lib/Target/AMDGPU GCNRegPressure.cpp

[Review] Move the class into an anonymous namespace
DeltaFile
+2-0llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
+2-01 files

LLVM/project e24ab2allvm/lib/Target/AMDGPU GCNRegPressure.cpp

[Review] Use unified vgpr count with unified-register-file
DeltaFile
+2-2llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
+2-21 files

LLVM/project 42f18ccllvm/test/CodeGen/AMDGPU swdev-549940.ll

Remove undef from test (it still preserves the test behavior before and after the fix)
DeltaFile
+1-1llvm/test/CodeGen/AMDGPU/swdev-549940.ll
+1-11 files

LLVM/project 9bc63e3llvm/lib/Target/AMDGPU GCNRegPressure.cpp GCNSchedStrategy.cpp, llvm/test/CodeGen/AMDGPU machine-scheduler-sink-trivial-remats-attr.mir swdev-549940.ll

[AMDGPU] Rematerialize VGPR candidates when SGPR spills to VGPR over the VGPR limit

Before, when selecting candidates to rematerialize, we would only
consider SGPR candidates when there was an excess of SGPR registers.

Failing to eliminate the excess would result in spills to VGPRs.
This is normally not an issue, unless spilling to VGPRs results in
excess VGPRs.

This patch does 2 things:
* It relaxes the GCNRPTarget success criteria: now we accept regions
  where we spill SGPRs to VGPRs, as long as this does not end up in
  excess VGPRs.
* It changes isSaveBeneficial to consider the excess VGPRs (which
  includes the SGPRs that would be spilled to VGPR).

With these changes, the compiler rematerializes VGPRs when the excess
SGPRs would result in VGPR excess.
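
The relaxed criterion can be sketched as follows. This is a simplified model rather than the actual `GCNRPTarget` logic: `Pressure`, `Limits` and the helper functions are hypothetical, and it assumes each spilled SGPR occupies one lane of a VGPR, so one VGPR holds up to `waveSize` spilled SGPRs.

```cpp
#include <cassert>

// Hypothetical register pressure of a scheduling region.
struct Pressure {
  unsigned sgprs;
  unsigned vgprs;
};

// Hypothetical limits for the target occupancy.
struct Limits {
  unsigned maxSGPRs;
  unsigned maxVGPRs;
  unsigned waveSize; // lanes per VGPR available for SGPR spills
};

unsigned excess(unsigned used, unsigned limit) {
  return used > limit ? used - limit : 0;
}

// VGPRs needed to hold the SGPRs that do not fit (one SGPR per lane).
unsigned vgprsForSGPRSpills(const Pressure &p, const Limits &l) {
  assert(l.waveSize > 0 && "wave size must be positive");
  unsigned spilled = excess(p.sgprs, l.maxSGPRs);
  return (spilled + l.waveSize - 1) / l.waveSize; // round up
}

// VGPR excess once the SGPR spill slots are accounted for.
unsigned vgprExcessWithSpills(const Pressure &p, const Limits &l) {
  return excess(p.vgprs + vgprsForSGPRSpills(p, l), l.maxVGPRs);
}

// Relaxed success criterion: SGPR spills are tolerated as long as they do
// not push the region over the VGPR limit.
bool regionAcceptable(const Pressure &p, const Limits &l) {
  return vgprExcessWithSpills(p, l) == 0;
}
```

In this model a region with 10 excess SGPRs and spare VGPRs is accepted, while the same SGPR excess in a region already at the VGPR limit reports a VGPR excess of 1, which is the case where rematerializing VGPR candidates becomes beneficial.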


    [4 lines not shown]
DeltaFile
+30-30llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-attr.mir
+15-9llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
+3-1llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+1-1llvm/test/CodeGen/AMDGPU/swdev-549940.ll
+1-0llvm/lib/Target/AMDGPU/GCNRegPressure.h
+50-415 files

LLVM/project d8bbb25llvm/test/CodeGen/AMDGPU swdev-549940.ll

Unacceptably large test
DeltaFile
+609-0llvm/test/CodeGen/AMDGPU/swdev-549940.ll
+609-01 files

LLVM/project 87dc9dfllvm/lib/Target/AMDGPU GCNRegPressure.cpp

[NFC][AMDGPU] Refactor common code computing excess register pressure into RegExcess class
DeltaFile
+47-45llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
+47-451 files

LLVM/project 1797a81flang/lib/Lower/OpenMP ClauseProcessor.cpp Clauses.cpp, flang/test/Lower/OpenMP thread-limit-dims.f90

[Flang] Add lowering from flang to mlir for thread_limit
DeltaFile
+62-0flang/test/Lower/OpenMP/thread-limit-dims.f90
+17-3flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+10-3flang/lib/Lower/OpenMP/Clauses.cpp
+3-1llvm/include/llvm/Frontend/OpenMP/ClauseT.h
+92-74 files

LLVM/project 86e114aoffload/plugins-nextgen/amdgpu/dynamic_hsa hsa.cpp, offload/plugins-nextgen/amdgpu/src rtl.cpp

Revert "[OFFLOAD] Update CUDA and AMD plugins to new debug format" (#175786)

Reverts llvm/llvm-project#175757
DeltaFile
+4-8offload/plugins-nextgen/cuda/dynamic_cuda/cuda.cpp
+3-5offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.cpp
+4-4offload/plugins-nextgen/cuda/src/rtl.cpp
+3-3offload/plugins-nextgen/amdgpu/src/rtl.cpp
+1-1offload/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h
+15-215 files

LLVM/project b93e243llvm/include/llvm/ExecutionEngine/Orc BacktraceTools.h

Remove LLVM_ABI from symbolicate declaration in BacktraceTools.h (#175764)

The class is already annotated with LLVM_ABI, so individual members shouldn't be.
DeltaFile
+1-1llvm/include/llvm/ExecutionEngine/Orc/BacktraceTools.h
+1-11 files

LLVM/project 61f2ce0flang/lib/Lower/OpenMP ClauseProcessor.cpp Clauses.cpp, flang/test/Lower/OpenMP num-threads-dims.f90

[Flang] Add lowering from flang to mlir for num_threads
DeltaFile
+61-0flang/test/Lower/OpenMP/num-threads-dims.f90
+17-3flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+10-3flang/lib/Lower/OpenMP/Clauses.cpp
+3-1llvm/include/llvm/Frontend/OpenMP/ClauseT.h
+91-74 files

LLVM/project 7c2f493offload/plugins-nextgen/amdgpu/dynamic_hsa hsa.cpp, offload/plugins-nextgen/amdgpu/src rtl.cpp

[OFFLOAD] Update CUDA and AMD plugins to new debug format (#175757)

This should be the last step before completely removing the DP macro.
DeltaFile
+8-4offload/plugins-nextgen/cuda/dynamic_cuda/cuda.cpp
+5-3offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.cpp
+4-4offload/plugins-nextgen/cuda/src/rtl.cpp
+3-3offload/plugins-nextgen/amdgpu/src/rtl.cpp
+1-1offload/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h
+21-155 files

LLVM/project 351d06amlir/include/mlir/Bindings/Python IRCore.h, mlir/lib/Bindings/Python IRCore.cpp IRAttributes.cpp

[MLIR][Python] Improve Iterator performance. Don't `throw` in `dunderNext` methods. (#175377)

In
https://github.com/llvm/llvm-project/pull/174139#issuecomment-3733259370
I wrote a scuffed benchmark that mostly iterates MLIR Container Types in
Python. My changes from that PR made the performance worse, so I closed
it.

However, when experimenting with that I also saw a large(?) performance
gain by changing the `dunderNext` methods of the various Iterators to
use `PyErr_SetNone(PyExc_StopIteration);` instead of `throw
nb::stop_iteration();`.
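
The two ways of signalling exhaustion can be compared with a plain C++ analogue. It uses neither nanobind nor the CPython API and makes no performance claim of its own: `StopIteration`, `ThrowingIter` and `SentinelIter` are hypothetical types, with an empty `std::optional` (C++17) standing in for "set the error indicator and return" instead of throwing.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <optional>
#include <vector>

// Hypothetical end-of-iteration exception, analogous to a thrown stop signal.
struct StopIteration : std::exception {};

// Variant 1: exhaustion is reported by throwing.
class ThrowingIter {
  const std::vector<int> &data;
  std::size_t pos = 0;

public:
  explicit ThrowingIter(const std::vector<int> &d) : data(d) {}
  int next() {
    assert(pos <= data.size());
    if (pos == data.size())
      throw StopIteration{};
    return data[pos++];
  }
};

// Variant 2: exhaustion is reported through the return value, no unwinding.
class SentinelIter {
  const std::vector<int> &data;
  std::size_t pos = 0;

public:
  explicit SentinelIter(const std::vector<int> &d) : data(d) {}
  std::optional<int> next() {
    assert(pos <= data.size());
    if (pos == data.size())
      return std::nullopt;
    return data[pos++];
  }
};

long sumThrowing(const std::vector<int> &values) {
  ThrowingIter it(values);
  long total = 0;
  try {
    for (;;)
      total += it.next();
  } catch (const StopIteration &) {
    // Normal termination: every loop pays for one throw and catch.
  }
  return total;
}

long sumSentinel(const std::vector<int> &values) {
  SentinelIter it(values);
  long total = 0;
  while (std::optional<int> value = it.next())
    total += *value;
  return total;
}
```

Both functions compute the same sum; the difference is only in how the end of the sequence travels back to the caller.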

<details><summary>Benchmark attempt script</summary>

```python
import timeit

from mlir.ir import Context, Location, Module, InsertionPoint, Block, Region, OpView

    [93 lines not shown]
```

</details>
DeltaFile
+20-11mlir/lib/Bindings/Python/IRCore.cpp
+5-2mlir/lib/Bindings/Python/IRAttributes.cpp
+3-3mlir/include/mlir/Bindings/Python/IRCore.h
+28-163 files

LLVM/project 7581c70libcxx/include/__algorithm unwrap_iter.h

[libc++] Simplify __unwrap_iter a bit (#175153)

`__unwrap_iter` doesn't need to SFINAE away, so we can just check inside
the function body whether an iterator is copy constructible. This
reduces the overload set, improving compile times a bit.
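
The general technique can be sketched as below. This is not the libc++ implementation: `WrapIter`, `IsWrapIter`, `unwrapSketch` and `checkUnwrapSketch` are hypothetical, and the sketch uses C++17 `if constexpr`, which the real header may not be able to rely on.

```cpp
#include <cassert>
#include <list>
#include <type_traits>

// Hypothetical wrapper around a raw pointer, standing in for a library
// iterator type that can be unwrapped.
template <class T> struct WrapIter {
  T *ptr;
  T *base() const { return ptr; }
};

template <class Iter> struct IsWrapIter : std::false_type {};
template <class T> struct IsWrapIter<WrapIter<T>> : std::true_type {};

// A single function template: the decision is made inside the body instead
// of through several SFINAE-constrained overloads.
template <class Iter> auto unwrapSketch(Iter it) {
  if constexpr (IsWrapIter<Iter>::value &&
                std::is_copy_constructible_v<Iter>) {
    return it.base(); // unwrap to the underlying pointer
  } else {
    return it; // leave every other iterator untouched
  }
}

// Small runtime check that the unwrapping branch returns the raw pointer.
inline void checkUnwrapSketch() {
  int values[3] = {1, 2, 3};
  WrapIter<int> wrapped{values};
  assert(unwrapSketch(wrapped) == values);
}
```

Because there is only one candidate, overload resolution has nothing to discard, which is the effect on the overload set that the commit message refers to.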
DeltaFile
+8-10libcxx/include/__algorithm/unwrap_iter.h
+8-101 files

LLVM/project 75cf8e2flang/lib/Lower/OpenMP ClauseProcessor.cpp Clauses.cpp, flang/test/Lower/OpenMP num-teams-dims.f90

[FLANG] Add flang to mlir lowering for num_teams
DeltaFile
+52-0flang/test/Lower/OpenMP/num-teams-dims.f90
+27-10flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+22-4flang/lib/Lower/OpenMP/Clauses.cpp
+101-143 files

LLVM/project 403d8aaclang/lib/CodeGen/TargetBuiltins ARM.cpp, clang/test/CodeGen/AArch64/sme-intrinsics acle_sme_str.c acle_sme_ldr.c

[AArch64][llvm] Improve codegen for svldr_vnum_za/svstr_vnum_za

When compiling `svldr_vnum_za` or `svstr_vnum_za`, the output
assembly has a superfluous `SXTW` instruction (gcc doesn't add
this); this should be excised, see https://godbolt.org/z/sz4s79rf8

In clang we're using int64_t, and `i32` in llvm. The extra `SXTW`
is due to a call to `DAG.getNode(ISD::SIGN_EXTEND...)`. Make them
both 64-bit to make the extra `SXTW` go away.
DeltaFile
+56-62llvm/test/CodeGen/AArch64/sme-intrinsics-stores.ll
+56-62llvm/test/CodeGen/AArch64/sme-intrinsics-loads.ll
+8-8llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+5-6clang/test/CodeGen/AArch64/sme-intrinsics/acle_sme_str.c
+5-6clang/test/CodeGen/AArch64/sme-intrinsics/acle_sme_ldr.c
+2-2clang/lib/CodeGen/TargetBuiltins/ARM.cpp
+132-1461 files not shown
+133-1477 files

LLVM/project ed0eaa4llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 avx10_2bf16-fma.ll

[X86] Add bf16 support to isFMAFasterThanFMulAndFAdd for basic FMA optimizations (#172006)

This PR extends `isFMAFasterThanFMulAndFAdd` in `X86ISelLowering` to
handle bfloat types. This enables basic FMA optimizations for bf16
operations on AVX10.2 targets.

Includes tests for scalar and vector bf16 cases:
- Scalar bf16 FMA lowering (AVX10.2 does not support scalar bf16
operations)
- Vector bf16 FMA fusion for 128-bit, 256-bit, and 512-bit widths
DeltaFile
+806-0llvm/test/CodeGen/X86/avx10_2bf16-fma.ll
+5-0llvm/lib/Target/X86/X86ISelLowering.cpp
+811-02 files

LLVM/project 3a21ff2llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.960bit.ll

AMDGPU: Change ABI of 16-bit element vectors on gfx6/7

Fix ABI on old subtargets to match new subtargets, packing
16-bit element subvectors into 32-bit registers. Previously
this would be scalarized and promoted to i32/float.
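
The difference in register usage can be illustrated with a small model. It is not backend code: `scalarizeAndPromote` and `packPairs` are hypothetical, the promotion is shown as a zero-extension purely for illustration, and placing element 0 in the low half of each register is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Old scheme in this model: every 16-bit element gets its own 32-bit register.
std::vector<uint32_t> scalarizeAndPromote(const std::vector<uint16_t> &elts) {
  std::vector<uint32_t> regs;
  for (uint16_t e : elts)
    regs.push_back(e); // zero-extension chosen for illustration only
  return regs;
}

// New scheme in this model: two 16-bit elements share one 32-bit register,
// with the even-indexed element in the low half (an assumption here).
std::vector<uint32_t> packPairs(const std::vector<uint16_t> &elts) {
  std::vector<uint32_t> regs;
  for (std::size_t i = 0; i < elts.size(); i += 2) {
    uint32_t lo = elts[i];
    uint32_t hi = i + 1 < elts.size() ? elts[i + 1] : 0; // pad odd tails
    regs.push_back(lo | (hi << 16));
  }
  assert(regs.size() == (elts.size() + 1) / 2);
  return regs;
}
```

A three-element vector takes three registers under the first scheme and two under the second.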

Note this only changes the vector cases. Scalar i16/half are
still promoted to i32/float for now. I've unsuccessfully tried
to make that switch in the past, so leave that for later.

This will help with removal of softPromoteHalfType.
DeltaFile
+47,697-51,378llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+14,474-16,242llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+16,328-12,881llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+13,036-14,705llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+11,668-13,311llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll
+10,558-11,908llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.768bit.ll
+113,761-120,425151 files not shown
+200,130-204,067157 files

LLVM/project 186d520llvm/lib/CodeGen/GlobalISel CallLowering.cpp

GlobalISel: Fix mishandling vector-as-scalar in return values

This fixes 2 cases when the AMDGPU ABI is fixed to pass <2 x i16>
values as packed on gfx6/gfx7. The ABI does not pack values
currently; this is a pre-fix for that change.

Insert a bitcast if there is a single part with a different size.
Previously this would miscompile by going through the scalarization
and extend path, dropping the high element.
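
The miscompile can be modelled in a few lines. This is not the `CallLowering` code: `V2I16`, `bitcastToI32` and `scalarizeAndExtend` are hypothetical, and element 0 is assumed to sit in the low 16 bits of the 32-bit part.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a <2 x i16> value.
struct V2I16 {
  uint16_t elt[2];
};

// Bitcast path: all 32 bits of the vector end up in the single i32 part.
uint32_t bitcastToI32(V2I16 v) {
  uint32_t r = uint32_t(v.elt[0]) | (uint32_t(v.elt[1]) << 16);
  assert((r & 0xFFFFu) == v.elt[0]);
  return r;
}

// Scalarize-and-extend path as described in the commit message: only the
// first element is extended into the i32 part, so the second one is lost.
uint32_t scalarizeAndExtend(V2I16 v) { return uint32_t(v.elt[0]); }
```

For the elements `0x1234` and `0xABCD`, the bitcast path produces `0xABCD1234`, while the scalarize-and-extend path produces `0x00001234` and silently drops the high element.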

Also fix assertions in odd cases, like <3 x i16> -> i32. This needs
to unmerge with excess elements from the widened source vector.

All of this code is in need of a cleanup; this should look more
like the DAG version using getVectorTypeBreakdown.
DeltaFile
+24-2llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
+24-21 files

LLVM/project ae2860ellvm/lib/Target/LoongArch LoongArchInstrInfo.td LoongArchExpandPseudoInsts.cpp, llvm/lib/Target/LoongArch/AsmParser LoongArchAsmParser.cpp

[llvm][LoongArch] Add call and tail macro instruction support

Link: https://sourceware.org/pipermail/binutils/2025-December/146091.html
DeltaFile
+43-14llvm/test/MC/LoongArch/Macros/macros-call.s
+21-15llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
+21-10llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
+7-0llvm/lib/Target/LoongArch/AsmParser/LoongArchAsmParser.cpp
+2-2llvm/lib/Target/LoongArch/LoongArchTargetMachine.cpp
+1-2llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+95-432 files not shown
+97-458 files

LLVM/project c88cbafllvm/lib/Target/LoongArch LoongArchInstrInfo.td LoongArchMCInstLower.cpp, llvm/lib/Target/LoongArch/AsmParser LoongArchAsmParser.cpp

[llvm][LoongArch] Add call30 and tail30 macro instruction support (#175356)

Link:
https://sourceware.org/pipermail/binutils/2025-December/146091.html
DeltaFile
+52-12llvm/lib/Target/LoongArch/AsmParser/LoongArchAsmParser.cpp
+15-1llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
+14-1llvm/test/MC/LoongArch/Macros/macros-call.s
+2-2llvm/test/MC/LoongArch/Basic/Integer/invalid.s
+3-0llvm/lib/Target/LoongArch/LoongArchMCInstLower.cpp
+3-0llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchMCAsmInfo.cpp
+89-163 files not shown
+92-169 files

LLVM/project e6bb35bllvm/test/CodeGen/AMDGPU bf16.ll llvm.exp2.bf16.ll, llvm/test/CodeGen/AMDGPU/GlobalISel irtranslate-bf16.ll

AMDGPU: Directly use v2bf16 as register type for bf16 vectors.

Previously we were casting v2bf16 to i32, unlike the f16 case. Simplify
this by using the natural vector type. This is probably a leftover from
before v2bf16 was treated as legal. This is preparation for fixing a
miscompile in globalisel.
DeltaFile
+465-462llvm/test/CodeGen/AMDGPU/bf16.ll
+121-282llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslate-bf16.ll
+122-133llvm/test/CodeGen/AMDGPU/llvm.exp2.bf16.ll
+91-91llvm/test/CodeGen/AMDGPU/maximumnum.bf16.ll
+91-91llvm/test/CodeGen/AMDGPU/minimumnum.bf16.ll
+14-24llvm/test/CodeGen/AMDGPU/llvm.log2.bf16.ll
+904-1,0833 files not shown
+910-1,1009 files

LLVM/project dd63117llvm/lib/Target/Hexagon HexagonISelLoweringHVX.cpp, llvm/test/CodeGen/Hexagon hvx-constpool-vector-type.ll

[Hexagon] Fix PIC crash when lowering HVX vector constants (#175413)

Fix a PIC-only crash in Hexagon HVX lowering where we ended up treating
a vector-typed constant-pool reference as an address (e.g. when forming
PC-relative addresses), which triggers a type mismatch during lowering.
Build the constant-pool reference with the target pointer type instead,
then load the HVX vector from that address.
DeltaFile
+14-0llvm/test/CodeGen/Hexagon/hvx-constpool-vector-type.ll
+6-4llvm/lib/Target/Hexagon/HexagonISelLoweringHVX.cpp
+20-42 files

LLVM/project 513c29flibunwind/src Unwind-wasm.c

Revert "[libunwind] Silence -Wunused-parameter warnings in Unwind-wasm.c (#12…"

This reverts commit f4206f92c5f900a4e0fc0f6dcab6afb6865df1e9.
DeltaFile
+4-2libunwind/src/Unwind-wasm.c
+4-21 files