LLVM/project 852649d llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp

AMDGPU: Remove an unnecessary lookup of the AMDGPUSubtarget (#177646)

Delta  File
+1 -2  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+1 -2  1 files

LLVM/project 4435083 libc/shared/math f16fmal.h, libc/src/__support/math f16fmal.h CMakeLists.txt

[libc][math] Refactor f16fmal to header-only (#176576)

closes #175324
part of #175313
Delta    File
+34 -0   libc/src/__support/math/f16fmal.h
+31 -0   libc/shared/math/f16fmal.h
+12 -1   utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+10 -0   libc/src/__support/math/CMakeLists.txt
+2 -4    libc/src/math/generic/f16fmal.cpp
+1 -2    libc/src/math/generic/CMakeLists.txt
+90 -7   3 files not shown
+95 -7   9 files

LLVM/project 01ed562 clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp, clang/test/Sema warn-lifetime-safety.cpp

Fix issue with references to fields
Delta    File
+15 -0   clang/test/Sema/warn-lifetime-safety.cpp
+2 -1    clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+17 -1   2 files

LLVM/project 896a667 llvm/include/llvm/Support KnownBits.h, llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp

[KnownBits][SelectionDAG] Add KnownBits::clmul. Support trailing bits.  NFC (#177517)

Borrow the known trailing bits logic from KnownBits::mul, but using
APIntOps::clmul.
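
As a rough illustration of the trailing-bits idea (not the code from this commit), the sketch below models a 64-bit carry-less multiply and covers only the simplest case: the product has at least as many trailing zeros as the two operands have combined. The names `clmul64`, `countTrailingZeros`, and `minTrailingZerosOfClmul` are hypothetical helpers and are not part of the `KnownBits` or `APIntOps` APIs.

```cpp
#include <cstdint>

// Carry-less multiply: like schoolbook multiplication, but partial
// products are combined with XOR instead of addition. The result is
// truncated to the low 64 bits.
uint64_t clmul64(uint64_t a, uint64_t b) {
  uint64_t result = 0;
  for (unsigned i = 0; i < 64; ++i)
    if ((b >> i) & 1)
      result ^= a << i;
  return result;
}

// Number of trailing zero bits in v; returns 64 when v is zero.
unsigned countTrailingZeros(uint64_t v) {
  unsigned n = 0;
  while (n < 64 && ((v >> n) & 1) == 0)
    ++n;
  return n;
}

// Lower bound on the trailing zeros of clmul64(a, b), given the trailing
// zero counts of a and b. Every partial product is a shifted left by at
// least tzB bits, so each one has at least tzA + tzB trailing zeros.
unsigned minTrailingZerosOfClmul(unsigned tzA, unsigned tzB) {
  unsigned sum = tzA + tzB;
  return sum < 64 ? sum : 64;
}
```

For example, `clmul64(12, 10)` combines `12 << 1` and `12 << 3` with XOR, and the result keeps the three trailing zeros predicted from operands with two and one trailing zeros.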
Delta    File
+37 -0   llvm/lib/Support/KnownBits.cpp
+1 -5    llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+3 -0    llvm/include/llvm/Support/KnownBits.h
+3 -0    llvm/unittests/Support/KnownBitsTest.cpp
+44 -5   4 files

LLVM/project e7e2c2b llvm/lib/Target/RISCV RISCVInstrInfoZb.td, llvm/test/CodeGen/RISCV rv64zbc-intrinsic.ll rv64zbc-zbkc-intrinsic.ll

[RISCV] Select (clmul (zext_inreg X, i32), (zext_inreg X, i32)) as (clmulh (slli X, 32), (slli X, 32)). (#177429)

Without Zba. We do the same for MUL->MULHU without Zba.
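
The equivalence behind this selection can be checked with a small software model. This is a sketch under stated assumptions: `clmul` and `clmulh` below are hand-written models of the low and high halves of a 64x64 carry-less product, `originalForm` and `selectedForm` are hypothetical names, and two distinct operands are used for generality.

```cpp
#include <cstdint>

// Model of the low 64 bits of a 64x64 carry-less product.
uint64_t clmul(uint64_t a, uint64_t b) {
  uint64_t result = 0;
  for (unsigned i = 0; i < 64; ++i)
    if ((b >> i) & 1)
      result ^= a << i;
  return result;
}

// Model of the high 64 bits of a 64x64 carry-less product.
uint64_t clmulh(uint64_t a, uint64_t b) {
  uint64_t result = 0;
  for (unsigned i = 1; i < 64; ++i)
    if ((b >> i) & 1)
      result ^= a >> (64 - i);
  return result;
}

// Carry-less product of the zero-extended low 32 bits of each operand.
uint64_t originalForm(uint64_t x, uint64_t y) {
  return clmul(x & 0xFFFFFFFFu, y & 0xFFFFFFFFu);
}

// Shifting both operands left by 32 discards their upper halves and moves
// the product up by 64 bits, so it lands entirely in the high half.
uint64_t selectedForm(uint64_t x, uint64_t y) {
  return clmulh(x << 32, y << 32);
}
```

Both forms agree for any inputs because the product of two 32-bit polynomials fits in 63 bits, so nothing is lost when it is read back from the high half.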
Delta    File
+9 -0    llvm/lib/Target/RISCV/RISCVInstrInfoZb.td
+1 -3    llvm/test/CodeGen/RISCV/rv64zbc-intrinsic.ll
+1 -3    llvm/test/CodeGen/RISCV/rv64zbc-zbkc-intrinsic.ll
+11 -6   3 files

LLVM/project ff2535a llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp, llvm/test/CodeGen/AMDGPU ctlz.ll

AMDGPU: Use generic legality checks instead of checking subtarget feature

Avoid checking predicates on AMDGPUSubtarget when possible. Also add a couple
of tests for the ctlz combine where ffbh isn't legal. I'm not sure what
the point of the previous check was.
Delta      File
+495 -1    llvm/test/CodeGen/AMDGPU/ctlz.ll
+20 -22    llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+515 -23   2 files

LLVM/project fdb05bb llvm/lib/Transforms/Scalar DeadStoreElimination.cpp

[LLVM] Update assert to remove unused variable warning. (#177632)

Remove the variable definition and move the function call directly into
the assert statement. Otherwise builds with -Werror that don't use
asserts would fail.
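
The general pattern can be sketched as follows. This is not the code from `DeadStoreElimination.cpp`; `isValidState` and `process` are hypothetical stand-ins, and the sketch assumes the checked call has no side effects, since it is not evaluated at all when `NDEBUG` is defined.

```cpp
#include <cassert>

// Hypothetical side-effect-free check standing in for the real call.
bool isValidState(int value) { return value >= 0; }

// Before: the result is stored in a local that only the assert reads.
// When NDEBUG is defined the assert expands to nothing, the local is
// never used, and -Werror turns the unused-variable warning into an error.
//
//   bool ok = isValidState(value);
//   assert(ok && "unexpected state");

// After: the call lives inside the assert, so no local variable is left
// behind in builds without assertions.
int process(int value) {
  assert(isValidState(value) && "unexpected state");
  return value + 1;
}
```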
Delta  File
+1 -3  llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
+1 -3  1 files

LLVM/project ff97d1a libc/shared/math dfmaf128.h, libc/src/__support/math dfmaf128.h CMakeLists.txt

[libc][math] Refactor dfmaf128 to Header Only (#176480)

Closes https://github.com/llvm/llvm-project/issues/175315, Part of
https://github.com/llvm/llvm-project/issues/175344
Delta     File
+31 -0    libc/src/__support/math/dfmaf128.h
+29 -0    libc/shared/math/dfmaf128.h
+12 -1    utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+2 -9     libc/src/math/generic/dfmaf128.cpp
+10 -0    libc/src/__support/math/CMakeLists.txt
+1 -2     libc/src/math/generic/CMakeLists.txt
+85 -12   3 files not shown
+89 -12   9 files

LLVM/project b94c5e0 llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU strict_ldexp.f16.ll strict_ldexp.f64.ll

[AMDGPU][GlobalISel] Add RegBankLegalize support for G_STRICT_FLDEXP (#177525)

Delta      File
+87 -15    llvm/test/CodeGen/AMDGPU/strict_ldexp.f16.ll
+40 -8     llvm/test/CodeGen/AMDGPU/strict_ldexp.f64.ll
+31 -4     llvm/test/CodeGen/AMDGPU/strict_ldexp.f32.ll
+1 -1      llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+159 -28   4 files

LLVM/project 7c8a13a llvm/include/llvm/Transforms/Utils LoopPeel.h, llvm/lib/Transforms/Scalar LoopFuse.cpp LoopUnrollPass.cpp

[LoopPeel] change `peelLoop`'s return type from `bool` to `void` (#177488)

Delta     File
+43 -45   llvm/lib/Transforms/Scalar/LoopFuse.cpp
+8 -10    llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
+1 -3     llvm/lib/Transforms/Utils/LoopPeel.cpp
+1 -1     llvm/include/llvm/Transforms/Utils/LoopPeel.h
+53 -59   4 files

LLVM/project 8b00b1d llvm Maintainers.md

Fix formatting in `Maintainers.md` (#177498)

Delta  File
+1 -1  llvm/Maintainers.md
+1 -1  1 files

LLVM/project f7361ef llvm/lib/Support Threading.cpp, llvm/lib/Support/Unix Threading.inc

[Support] Avoid misguided FreeBSD hack (#177508)

FreeBSD doesn't do anything wrong here, it just happens to define and
use a struct thread in its own headers. The problems arise because here
in LLVM we have using namespace llvm prior to including system headers,
which is bad practice for precisely this reason. If we instead play by
the rules and defer our using namespace llvm until after we've included
the system headers then we no longer need this hack.

This hack is particularly problematic by being conditional on
__FreeBSD__ as of 9093ba9f7ee5 ("[Support] Include Support/thread.h
before api implementations (#111175)"), since on non-FreeBSD
Threading.inc can reference anything in Support/thread.h, only causing
errors on FreeBSD, which is precisely what happened in 64be34c562a2
("Enable using threads on z/OS (#171847)").

By deferring the using namespace llvm until after Threading.inc is
included there may be build failures introduced on untested platforms
due to needing to replace unqualified identifiers with qualified ones by
prepending llvm::.
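
The ordering issue can be illustrated with a self-contained sketch. Everything here is a stand-in: the global `thread` struct plays the role of a type declared by a system header, and `mylib` is a hypothetical namespace playing the role of `llvm`. None of these names come from the actual patch.

```cpp
// Stand-in for a system header that defines and uses its own ::thread.
// These declarations are seen before any using-directive, so the
// unqualified name 'thread' can only mean the global struct.
struct thread {
  int id;
};
inline int system_thread_id(const struct thread *t) { return t->id; }

// Stand-in for the library namespace, which has its own thread type.
namespace mylib {
struct thread {
  int handle;
};
inline int handle_of(const thread &t) { return t.handle; }
} // namespace mylib

// The using-directive is deferred until after the "system header" above.
// Had it come first, 'struct thread' inside that header would be
// ambiguous between ::thread and mylib::thread.
using namespace mylib;

int demo() {
  ::thread sys{7};
  // From here on, an unqualified 'thread' would be ambiguous, so the
  // library type is spelled with its namespace.
  mylib::thread lib{35};
  return system_thread_id(&sys) + handle_of(lib);
}
```

The cost of the deferred directive is visible in `demo`: code that follows it has to qualify any name that collides with a system declaration.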
Delta     File
+13 -15   llvm/lib/Support/Threading.cpp
+5 -9     llvm/lib/Support/Unix/Threading.inc
+5 -4     llvm/lib/Support/Windows/Threading.inc
+23 -28   3 files

LLVM/project f6f5ad3 llvm/test/Analysis/CostModel/AArch64 arith.ll

[AArch64] Add some basic i128 arithmetic cost test cases. NFC
Delta    File
+50 -0   llvm/test/Analysis/CostModel/AArch64/arith.ll
+50 -0   1 files

LLVM/project 0d0249e llvm/lib/Target/AMDGPU AMDGPUIGroupLP.cpp, llvm/test/CodeGen/AMDGPU inlineasm-sgmask.ll

Try To Guess SGMasks for Inline Asm Instructions (#155491)

Addresses SWDEV-549227
Delta     File
+179 -0   llvm/test/CodeGen/AMDGPU/inlineasm-sgmask.ll
+57 -0    llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
+236 -0   2 files

LLVM/project c45684f llvm/test/CodeGen/AMDGPU frem.ll fract-match.ll, llvm/test/CodeGen/AMDGPU/GlobalISel frem.ll

AMDGPU: Ignore type legality in isFAbsFree (#177630)

This treats it as free on targets without legal f16. This
matches the existing logic in fneg, and they should be the same.
The test changes are mostly neutral with a few improvements.
Delta       File
+130 -148   llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll
+93 -93     llvm/test/CodeGen/AMDGPU/frem.ll
+22 -24     llvm/test/CodeGen/AMDGPU/fract-match.ll
+11 -21     llvm/test/CodeGen/AMDGPU/fmed3-cast-combine.ll
+12 -15     llvm/test/CodeGen/AMDGPU/fp-classify.ll
+8 -8       llvm/test/CodeGen/AMDGPU/fneg-fabs.f16.ll
+276 -309   3 files not shown
+285 -318   9 files

LLVM/project 1f8ae28 clang/include/clang/Driver ToolChain.h, clang/lib/Driver/ToolChains Linux.cpp MSVC.cpp

[HIP] Pass HIP library directly and refactor (#176019)

Summary:
Currently we pass `-L` and `-l` to get the HIP library. Because we are
attached to a single HIP installation it's far better to pass it by
filename. This is because the `-L` could be out of order with other user
libraries and those could override it. If someone uses HIP with a
specific ROCm installation they most likely want that library, otherwise
incompatibilities can occur. This is still overridable with command line
flags if users want to pass a different one for some reason.

This PR also refactors the handling to be more generic for future
additions.
Delta     File
+20 -11   clang/lib/Driver/ToolChains/Linux.cpp
+11 -6    clang/lib/Driver/ToolChains/MSVC.cpp
+0 -15    clang/lib/Driver/ToolChains/CommonArgs.cpp
+5 -5     clang/test/Driver/hip-runtime-libs-linux.hip
+4 -4     clang/include/clang/Driver/ToolChain.h
+3 -3     clang/test/Driver/rocm-detect.hip
+43 -44   4 files not shown
+50 -50   10 files

LLVM/project 98b55bc llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp SIISelLowering.cpp

AMDGPU: Move f16 legality configuration to SITargetLowering (#177629)

f16 is never legal for R600 so this should not be in the common
base class.
Delta    File
+2 -11   llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+3 -0    llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+5 -11   2 files

LLVM/project 7edf4e1 llvm/include/llvm/MC MCRegisterInfo.h, llvm/test/TableGen regunit-intervals.td regunit-intervals-impossible.td

[TableGen] Allow targets to enforce regunits assignment as intervals (#175823)

General tablegen infrastructure for #174888
Delta     File
+100 -2   llvm/utils/TableGen/Common/CodeGenRegisters.cpp
+73 -0    llvm/test/TableGen/regunit-intervals.td
+35 -0    llvm/test/TableGen/regunit-intervals-impossible.td
+17 -1    llvm/include/llvm/MC/MCRegisterInfo.h
+17 -0    llvm/utils/TableGen/RegisterInfoEmitter.cpp
+13 -1    llvm/utils/TableGen/Common/CodeGenRegisters.h
+255 -4   3 files not shown
+279 -5   9 files

LLVM/project 51c617c llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp

AMDGPU: Remove an unnecessary lookup of the AMDGPUSubtarget
Delta  File
+1 -2  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+1 -2  1 files

LLVM/project e67f934 llvm/lib/Transforms/Utils IntegerDivision.cpp, llvm/test/CodeGen/RISCV idiv_large.ll

[profcheck] Fix profile metadata propagation for Large Integer operations (#175862)

This PR improves the propagation of profile metadata within the
ExpandIRInsts pass. When lowering large integer division operations, the
pass now ensures that branch weights are correctly attached to the
generated control flow, preventing the loss of profile data during IR
expansion.

This PR improves signed and unsigned division/remainder for non-native
bit widths (e.g., `sdiv/udiv i129`, `srem/urem i129`) and implements
heuristic-based branch weight labeling, using established heuristics for
edge cases such as division-by-zero guards and magnitude comparisons
between dividends and divisors.

It also adds detailed comments within the expansion logic to explain the
rationale behind specific branch weight choices and the underlying
mathematical invariants.

Please refer to the implementation details in the source code for the

    [2 lines not shown]
Delta            File
+600 -601        llvm/test/CodeGen/RISCV/idiv_large.ll
+252 -249        llvm/test/CodeGen/X86/div-rem-pair-recomposition-unsigned.ll
+243 -236        llvm/test/CodeGen/X86/div-rem-pair-recomposition-signed.ll
+99 -88          llvm/test/CodeGen/X86/pr38539.ll
+49 -39          llvm/test/Transforms/ExpandIRInsts/X86/vector.ll
+52 -3           llvm/lib/Transforms/Utils/IntegerDivision.cpp
+1,295 -1,216    6 files not shown
+1,388 -1,248    12 files

LLVM/project 41567d8 llvm/include/llvm/CodeGen TargetLoweringObjectFileImpl.h, llvm/include/llvm/MC MCGOFFAttributes.h

[SystemZ] Implement ctor/dtor emission via @@SQINIT and .xtor sections (#171476)

This patch implements support for constructors/destructors by
introducing the `@@SQINIT` section and emitting `.xtor.<priority>`
sections within the SystemZ AsmPrinter and in the GOFF object lowering
layer.
Delta     File
+63 -0    llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp
+36 -0    llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
+34 -0    llvm/test/CodeGen/SystemZ/zos_sinit.ll
+2 -0     llvm/lib/Target/SystemZ/SystemZAsmPrinter.h
+1 -0     llvm/include/llvm/MC/MCGOFFAttributes.h
+1 -0     llvm/include/llvm/CodeGen/TargetLoweringObjectFileImpl.h
+137 -0   6 files

LLVM/project 9f3d143 llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp

AMDGPU: Remove dead code configuring f16 is_fpclass (#177626)

isTypeLegal can never be true here. The register classes
are registered at the end of the target lowering constructor,
and in the subclasses.
Delta  File
+0 -5  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+0 -5  1 files

LLVM/project 1986628 llvm/lib/Target/AMDGPU AMDGPUPromoteAlloca.cpp, llvm/test/CodeGen/AMDGPU flat-scratch.ll target-cpu.ll

[AMDGPU] Remove `FeaturePromoteAlloca`
Delta      File
+14 -14    llvm/test/CodeGen/AMDGPU/flat-scratch.ll
+10 -10    llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch.ll
+1 -16     llvm/test/CodeGen/AMDGPU/target-cpu.ll
+7 -7      llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll
+6 -6      llvm/test/CodeGen/AMDGPU/amdgcn.private-memory.ll
+7 -4      llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+45 -57    23 files not shown
+95 -121   29 files

LLVM/project 09685b7 llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel fmax_legacy.ll fmin_legacy.ll

[AMDGPU][GlobalISel] Add RegBankLegalize rules for fmin/fmax_legacy (#177520)

Delta    File
+26 -3   llvm/test/CodeGen/AMDGPU/GlobalISel/fmax_legacy.ll
+26 -3   llvm/test/CodeGen/AMDGPU/GlobalISel/fmin_legacy.ll
+4 -0    llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+56 -6   3 files

LLVM/project 9109c60 mlir/include/mlir/Dialect/XeGPU/uArch IntelGpuXe2.h uArchBase.h, mlir/lib/Dialect/XeGPU/Transforms XeGPUPropagateLayout.cpp

[MLIR][XeGPU] Add uArch limitation to scatter load store (#172845)

Delta      File
+98 -35    mlir/lib/Dialect/XeGPU/Transforms/XeGPUPropagateLayout.cpp
+71 -2     mlir/test/Dialect/XeGPU/propagate-layout-inst-data.mlir
+32 -4     mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h
+7 -1      mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h
+208 -42   4 files

LLVM/project fef6a14 llvm/lib/Target/AMDGPU AMDGPU.td

[NFCI][AMDGPU] Fix the predicate `HasDsSrc2Insts` (#177621)

I'm not sure why the predicate has a `!`, and more surprisingly,
removing it doesn't change anything.
Delta  File
+1 -5  llvm/lib/Target/AMDGPU/AMDGPU.td
+1 -5  1 files

LLVM/project 6cf189a clang/test/CodeGen builtin-rotate.c

[clang][test] Fix builtin-rotate.c failure on ARM32 (#177290)

Replace unsigned __int128 with unsigned _BitInt(128) since __int128 is
not supported on ARM 32-bit targets.

Fixes https://lab.llvm.org/buildbot/#/builders/79/builds/2754
Delta    File
+14 -4   clang/test/CodeGen/builtin-rotate.c
+14 -4   1 files

LLVM/project 4237e74 llvm/include/llvm/Analysis TargetTransformInfo.h, llvm/lib/Analysis TargetTransformInfo.cpp

[VectorCombine] foldShuffleOfBinops - failure to track OperandValueInfo (#171934)

Resolves #170500.

Implemented mergeInfo static helper to return common
TTI::OperandValueInfo data.

Added common OperandValueInfo `Op0Info` and `Op1Info` to the NewCost
calculation.
Delta     File
+193 -0   llvm/test/Transforms/VectorCombine/X86/shuffle-of-binops.ll
+15 -0    llvm/include/llvm/Analysis/TargetTransformInfo.h
+10 -4    llvm/lib/Analysis/TargetTransformInfo.cpp
+7 -4     llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+225 -8   4 files

LLVM/project 4f09b7a llvm/test/CodeGen/AMDGPU frem.ll fract-match.ll, llvm/test/CodeGen/AMDGPU/GlobalISel frem.ll

AMDGPU: Ignore type legality in isFAbsFree

This treats it as free on targets without legal f16. This
matches the existing logic in fneg, and they should be the same.
The test changes are mostly neutral with a few improvements.
Delta       File
+130 -148   llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll
+93 -93     llvm/test/CodeGen/AMDGPU/frem.ll
+22 -24     llvm/test/CodeGen/AMDGPU/fract-match.ll
+11 -21     llvm/test/CodeGen/AMDGPU/fmed3-cast-combine.ll
+12 -15     llvm/test/CodeGen/AMDGPU/fp-classify.ll
+8 -8       llvm/test/CodeGen/AMDGPU/fneg-fabs.f16.ll
+276 -309   3 files not shown
+285 -318   9 files

LLVM/project c1de2a9 llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp SIISelLowering.cpp

AMDGPU: Move f16 legality configuration to SITargetLowering

f16 is never legal for R600 so this should not be in the common
base class.
Delta    File
+2 -11   llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+3 -0    llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+5 -11   2 files