LLVM/project b94c5e0 llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU strict_ldexp.f16.ll strict_ldexp.f64.ll

[AMDGPU][GlobalISel] Add RegBankLegalize support for G_STRICT_FLDEXP (#177525)

Delta File
+87 -15 llvm/test/CodeGen/AMDGPU/strict_ldexp.f16.ll
+40 -8 llvm/test/CodeGen/AMDGPU/strict_ldexp.f64.ll
+31 -4 llvm/test/CodeGen/AMDGPU/strict_ldexp.f32.ll
+1 -1 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+159 -28 4 files

LLVM/project 7c8a13a llvm/include/llvm/Transforms/Utils LoopPeel.h, llvm/lib/Transforms/Scalar LoopFuse.cpp LoopUnrollPass.cpp

[LoopPeel] change `peelLoop`'s return type from `bool` to `void` (#177488)

Delta File
+43 -45 llvm/lib/Transforms/Scalar/LoopFuse.cpp
+8 -10 llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
+1 -3 llvm/lib/Transforms/Utils/LoopPeel.cpp
+1 -1 llvm/include/llvm/Transforms/Utils/LoopPeel.h
+53 -59 4 files

LLVM/project 8b00b1d llvm Maintainers.md

Fix formatting in `Maintainers.md` (#177498)

Delta File
+1 -1 llvm/Maintainers.md
+1 -1 1 file

LLVM/project f7361ef llvm/lib/Support Threading.cpp, llvm/lib/Support/Unix Threading.inc

[Support] Avoid misguided FreeBSD hack (#177508)

FreeBSD doesn't do anything wrong here, it just happens to define and
use a struct thread in its own headers. The problems arise because here
in LLVM we have using namespace llvm prior to including system headers,
which is bad practice for precisely this reason. If we instead play by
the rules and defer our using namespace llvm until after we've included
the system headers then we no longer need this hack.

The hack is particularly problematic because, as of 9093ba9f7ee5
("[Support] Include Support/thread.h before api implementations
(#111175)"), it is conditional on __FreeBSD__: on non-FreeBSD platforms
Threading.inc can reference anything in Support/thread.h, and the errors
only show up on FreeBSD, which is precisely what happened in 64be34c562a2
("Enable using threads on z/OS (#171847)").

Deferring the using namespace llvm until after Threading.inc is included
may introduce build failures on untested platforms, where unqualified
identifiers now need to be qualified by prepending llvm::.
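
The name clash and the qualification fix described above can be illustrated with a small standalone sketch. Everything in it is a stand-in chosen for illustration: `mylib` plays the role of `llvm`, and the global `struct thread` plays the role of the declaration in a system header. It is not the actual LLVM or FreeBSD code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the llvm namespace and its thread class.
namespace mylib {
struct thread {
  std::string name() const { return "mylib::thread"; }
};
} // namespace mylib

// Hypothetical stand-in for a system header that declares its own
// global `struct thread`.
struct thread {
  int id;
};

// The using-directive comes after the "system header", so the header
// itself never sees mylib's names.
using namespace mylib;

std::string describeThread() {
  // An unqualified `thread` would be ambiguous here (::thread versus
  // mylib::thread), so both names are qualified explicitly.
  mylib::thread T;
  ::thread Sys{42};
  assert(Sys.id == 42);
  return T.name();
}
```

Writing a plain `thread T;` inside `describeThread` would fail to compile with an ambiguity error, which is the kind of breakage the message warns about for untested platforms.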
Delta File
+13 -15 llvm/lib/Support/Threading.cpp
+5 -9 llvm/lib/Support/Unix/Threading.inc
+5 -4 llvm/lib/Support/Windows/Threading.inc
+23 -28 3 files

LLVM/project f6f5ad3 llvm/test/Analysis/CostModel/AArch64 arith.ll

[AArch64] Add some basic i128 arithmetic cost test cases. NFC
Delta File
+50 -0 llvm/test/Analysis/CostModel/AArch64/arith.ll
+50 -0 1 file

LLVM/project 0d0249e llvm/lib/Target/AMDGPU AMDGPUIGroupLP.cpp, llvm/test/CodeGen/AMDGPU inlineasm-sgmask.ll

Try To Guess SGMasks for Inline Asm Instructions (#155491)

Addresses SWDEV-549227
Delta File
+179 -0 llvm/test/CodeGen/AMDGPU/inlineasm-sgmask.ll
+57 -0 llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
+236 -0 2 files

LLVM/project c45684f llvm/test/CodeGen/AMDGPU frem.ll fract-match.ll, llvm/test/CodeGen/AMDGPU/GlobalISel frem.ll

AMDGPU: Ignore type legality in isFAbsFree (#177630)

This treats it as free on targets without legal f16. This
matches the existing logic in fneg, and they should be the same.
The test changes are mostly neutral with a few improvements.
Delta File
+130 -148 llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll
+93 -93 llvm/test/CodeGen/AMDGPU/frem.ll
+22 -24 llvm/test/CodeGen/AMDGPU/fract-match.ll
+11 -21 llvm/test/CodeGen/AMDGPU/fmed3-cast-combine.ll
+12 -15 llvm/test/CodeGen/AMDGPU/fp-classify.ll
+8 -8 llvm/test/CodeGen/AMDGPU/fneg-fabs.f16.ll
+276 -309 3 files not shown
+285 -318 9 files

LLVM/project 1f8ae28 clang/include/clang/Driver ToolChain.h, clang/lib/Driver/ToolChains Linux.cpp MSVC.cpp

[HIP] Pass HIP library directly and refactor (#176019)

Summary:
Currently we pass `-L` and `-l` to get the HIP library. Because we are
attached to a single HIP installation it's far better to pass it by
filename. This is because the `-L` could be out of order with other user
libraries and those could override it. If someone uses HIP with a
specific ROCm installation they most likely want that library, otherwise
incompatibilities can occur. This is still overridable with command line
flags if users want to pass a different one for some reason.

This PR also refactors the handling to be more generic for future
additions.
Delta File
+20 -11 clang/lib/Driver/ToolChains/Linux.cpp
+11 -6 clang/lib/Driver/ToolChains/MSVC.cpp
+0 -15 clang/lib/Driver/ToolChains/CommonArgs.cpp
+5 -5 clang/test/Driver/hip-runtime-libs-linux.hip
+4 -4 clang/include/clang/Driver/ToolChain.h
+3 -3 clang/test/Driver/rocm-detect.hip
+43 -44 4 files not shown
+50 -50 10 files

LLVM/project 98b55bc llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp SIISelLowering.cpp

AMDGPU: Move f16 legality configuration to SITargetLowering (#177629)

f16 is never legal for R600 so this should not be in the common
base class.
Delta File
+2 -11 llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+3 -0 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+5 -11 2 files

LLVM/project 7edf4e1 llvm/include/llvm/MC MCRegisterInfo.h, llvm/test/TableGen regunit-intervals.td regunit-intervals-impossible.td

[TableGen] Allow targets to enforce regunits assignment as intervals (#175823)

General tablegen infrastructure for #174888
Delta File
+100 -2 llvm/utils/TableGen/Common/CodeGenRegisters.cpp
+73 -0 llvm/test/TableGen/regunit-intervals.td
+35 -0 llvm/test/TableGen/regunit-intervals-impossible.td
+17 -1 llvm/include/llvm/MC/MCRegisterInfo.h
+17 -0 llvm/utils/TableGen/RegisterInfoEmitter.cpp
+13 -1 llvm/utils/TableGen/Common/CodeGenRegisters.h
+255 -4 3 files not shown
+279 -5 9 files

LLVM/project 51c617c llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp

AMDGPU: Remove an unnecessary lookup of the AMDGPUSubtarget
Delta File
+1 -2 llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+1 -2 1 file

LLVM/project e67f934 llvm/lib/Transforms/Utils IntegerDivision.cpp, llvm/test/CodeGen/RISCV idiv_large.ll

[profcheck] Fix profile metadata propagation for Large Integer operations (#175862)

This PR improves the propagation of profile metadata within the
ExpandIRInsts pass. When lowering large integer division operations, the
pass now ensures that branch weights are correctly attached to the
generated control flow, preventing the loss of profile data during IR
expansion.

This PR also improves signed and unsigned division/remainder for
non-native bit widths (e.g., `sdiv/udiv i129`, `srem/urem i129`) and
implements heuristic-based branch weight labeling, using established
heuristics for edge cases such as division-by-zero guards and magnitude
comparisons between dividends and divisors.
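
To make the two edge-case branches concrete, here is a minimal sketch of a software unsigned division with a zero-divisor guard and a magnitude comparison in front of the main loop. It is an illustration only: `udivSketch` is a hypothetical helper on 64-bit integers, not the pass's expansion code, and the comments merely mark where heuristic branch weights of the kind described above would apply. Returning 0 for a zero divisor is an arbitrary choice made for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not LLVM API: shift-subtract unsigned division.
uint64_t udivSketch(uint64_t Dividend, uint64_t Divisor) {
  // Division-by-zero guard: a branch one would expect to be rarely taken.
  // The value returned here is an arbitrary choice for this sketch.
  if (Divisor == 0)
    return 0;
  // Magnitude comparison: a divisor larger than the dividend yields 0
  // without entering the loop.
  if (Divisor > Dividend)
    return 0;

  uint64_t Quotient = 0, Remainder = 0;
  for (int Bit = 63; Bit >= 0; --Bit) {
    // Remember the bit shifted out so large divisors are handled correctly.
    bool Carry = (Remainder >> 63) != 0;
    Remainder = (Remainder << 1) | ((Dividend >> Bit) & 1);
    if (Carry || Remainder >= Divisor) {
      Remainder -= Divisor; // wraps to the right value when Carry is set
      Quotient |= uint64_t(1) << Bit;
    }
  }
  assert(Remainder < Divisor);
  return Quotient;
}
```

In an IR-level expansion, each of the two early exits becomes a conditional branch, and those are the branches whose weights would otherwise be missing after lowering.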

It also adds detailed comments within the expansion logic to explain the
rationale behind specific branch weight choices and the underlying
mathematical invariants.

Please refer to the implementation details in the source code for the

    [2 lines not shown]
Delta File
+600 -601 llvm/test/CodeGen/RISCV/idiv_large.ll
+252 -249 llvm/test/CodeGen/X86/div-rem-pair-recomposition-unsigned.ll
+243 -236 llvm/test/CodeGen/X86/div-rem-pair-recomposition-signed.ll
+99 -88 llvm/test/CodeGen/X86/pr38539.ll
+49 -39 llvm/test/Transforms/ExpandIRInsts/X86/vector.ll
+52 -3 llvm/lib/Transforms/Utils/IntegerDivision.cpp
+1,295 -1,216 6 files not shown
+1,388 -1,248 12 files

LLVM/project 41567d8 llvm/include/llvm/CodeGen TargetLoweringObjectFileImpl.h, llvm/include/llvm/MC MCGOFFAttributes.h

[SystemZ] Implement ctor/dtor emission via @@SQINIT and .xtor sections (#171476)

This patch implements support for constructors/destructors by
introducing the `@@SQINIT` section and emitting `.xtor.<priority>`
sections within the SystemZ AsmPrinter and in the GOFF object lowering
layer.
Delta File
+63 -0 llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp
+36 -0 llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
+34 -0 llvm/test/CodeGen/SystemZ/zos_sinit.ll
+2 -0 llvm/lib/Target/SystemZ/SystemZAsmPrinter.h
+1 -0 llvm/include/llvm/MC/MCGOFFAttributes.h
+1 -0 llvm/include/llvm/CodeGen/TargetLoweringObjectFileImpl.h
+137 -0 6 files

LLVM/project 9f3d143 llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp

AMDGPU: Remove dead code configuring f16 is_fpclass (#177626)

isTypeLegal can never be true here. The register classes
are registered at the end of the target lowering constructor,
and in the subclasses.
Delta File
+0 -5 llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+0 -5 1 file

LLVM/project 1986628 llvm/lib/Target/AMDGPU AMDGPUPromoteAlloca.cpp, llvm/test/CodeGen/AMDGPU flat-scratch.ll target-cpu.ll

[AMDGPU] Remove `FeaturePromoteAlloca`
Delta File
+14 -14 llvm/test/CodeGen/AMDGPU/flat-scratch.ll
+10 -10 llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch.ll
+1 -16 llvm/test/CodeGen/AMDGPU/target-cpu.ll
+7 -7 llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll
+6 -6 llvm/test/CodeGen/AMDGPU/amdgcn.private-memory.ll
+7 -4 llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+45 -57 23 files not shown
+95 -121 29 files

LLVM/project 09685b7 llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel fmax_legacy.ll fmin_legacy.ll

[AMDGPU][GlobalISel] Add RegBankLegalize rules for fmin/fmax_legacy (#177520)

Delta File
+26 -3 llvm/test/CodeGen/AMDGPU/GlobalISel/fmax_legacy.ll
+26 -3 llvm/test/CodeGen/AMDGPU/GlobalISel/fmin_legacy.ll
+4 -0 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+56 -6 3 files

LLVM/project 9109c60 mlir/include/mlir/Dialect/XeGPU/uArch IntelGpuXe2.h uArchBase.h, mlir/lib/Dialect/XeGPU/Transforms XeGPUPropagateLayout.cpp

[MLIR][XeGPU] Add uArch limitation to scatter load store (#172845)

Delta File
+98 -35 mlir/lib/Dialect/XeGPU/Transforms/XeGPUPropagateLayout.cpp
+71 -2 mlir/test/Dialect/XeGPU/propagate-layout-inst-data.mlir
+32 -4 mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h
+7 -1 mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h
+208 -42 4 files

LLVM/project fef6a14 llvm/lib/Target/AMDGPU AMDGPU.td

[NFCI][AMDGPU] Fix the predicate `HasDsSrc2Insts` (#177621)

I'm not sure why the predicate has a `!`, and more surprisingly,
removing it doesn't change anything.
Delta File
+1 -5 llvm/lib/Target/AMDGPU/AMDGPU.td
+1 -5 1 file

LLVM/project 6cf189a clang/test/CodeGen builtin-rotate.c

[clang][test] Fix builtin-rotate.c failure on ARM32 (#177290)

Replace unsigned __int128 with unsigned _BitInt(128) since __int128 is
not supported on ARM 32-bit targets.

Fixes https://lab.llvm.org/buildbot/#/builders/79/builds/2754
Delta File
+14 -4 clang/test/CodeGen/builtin-rotate.c
+14 -4 1 file

LLVM/project 4237e74 llvm/include/llvm/Analysis TargetTransformInfo.h, llvm/lib/Analysis TargetTransformInfo.cpp

[VectorCombine] foldShuffleOfBinops - failure to track OperandValueInfo (#171934)

Resolves #170500.

Implemented a static `mergeInfo` helper that returns the common
TTI::OperandValueInfo data.

Added the common OperandValueInfo values `Op0Info` and `Op1Info` to the
NewCost calculation.
Delta File
+193 -0 llvm/test/Transforms/VectorCombine/X86/shuffle-of-binops.ll
+15 -0 llvm/include/llvm/Analysis/TargetTransformInfo.h
+10 -4 llvm/lib/Analysis/TargetTransformInfo.cpp
+7 -4 llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+225 -8 4 files

LLVM/project 4f09b7a llvm/test/CodeGen/AMDGPU frem.ll fract-match.ll, llvm/test/CodeGen/AMDGPU/GlobalISel frem.ll

AMDGPU: Ignore type legality in isFAbsFree

This treats it as free on targets without legal f16. This
matches the existing logic in fneg, and they should be the same.
The test changes are mostly neutral with a few improvements.
Delta File
+130 -148 llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll
+93 -93 llvm/test/CodeGen/AMDGPU/frem.ll
+22 -24 llvm/test/CodeGen/AMDGPU/fract-match.ll
+11 -21 llvm/test/CodeGen/AMDGPU/fmed3-cast-combine.ll
+12 -15 llvm/test/CodeGen/AMDGPU/fp-classify.ll
+8 -8 llvm/test/CodeGen/AMDGPU/fneg-fabs.f16.ll
+276 -309 3 files not shown
+285 -318 9 files

LLVM/project c1de2a9 llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp SIISelLowering.cpp

AMDGPU: Move f16 legality configuration to SITargetLowering

f16 is never legal for R600 so this should not be in the common
base class.
Delta File
+2 -11 llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+3 -0 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+5 -11 2 files

LLVM/project 5b4a5cf llvm/docs ReleaseNotes.md, llvm/docs/CommandGuide llvm-objdump.rst

[RISCV][llvm-objdump] Support --symbolize-operands (#166656)

This adds support for `--symbolize-operands`, so that local references
are turned back into labels by objdump, which makes it easier to tell
what is going on with a linked object.

When `--symbolize-operands` is used, branch target addresses are elided
and only the referenced symbol is printed:

```
# Without --symbolize-operands
       0: 04a05263      blez    a0, 0x44 <.text+0x44>
...
      40: fd1ff06f      j       0x10 <.text+0x10>
      44: 00000613      li      a2, 0x0

# With --symbolize-operands
       0: 04a05263      blez    a0,  <L3>

    [4 lines not shown]
Delta File
+55 -0 llvm/test/MC/RISCV/symbolize-operands.s
+5 -0 llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp
+2 -1 llvm/tools/llvm-objdump/llvm-objdump.cpp
+1 -1 llvm/docs/CommandGuide/llvm-objdump.rst
+2 -0 llvm/docs/ReleaseNotes.md
+65 -2 5 files

LLVM/project 12cea04 lldb/packages/Python/lldbsuite/test lldbtest.py

[lldb] Improve filecheck() by replacing assertTrue with assertEqual (#177212)

Delta File
+2 -2 lldb/packages/Python/lldbsuite/test/lldbtest.py
+2 -2 1 file

LLVM/project 6c941d7 llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp

AMDGPU: Remove dead code configuring f16 is_fpclass

isTypeLegal can never be true here. The register classes
are registered at the end of the target lowering constructor,
and in the subclasses.
Delta File
+0 -5 llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+0 -5 1 file

LLVM/project 56e04be llvm/lib/Target/AMDGPU AMDGPU.td

[NFCI][AMDGPU] Fix the predicate `HasDsSrc2Insts`

I'm not sure why the predicate has a `!`, and more surprisingly, removing it doesn't change anything.
Delta File
+1 -5 llvm/lib/Target/AMDGPU/AMDGPU.td
+1 -5 1 file

LLVM/project 67baa87 offload/test/mapping declare_mapper_target_checks.cpp

Revert "[NFC][OpenMP] Mark new mapper test as XFAIL on intelgpu. (#177491)"

This reverts commit 7d5622f7917815d224b780309432ffe4729e4852.
Delta File
+0 -2 offload/test/mapping/declare_mapper_target_checks.cpp
+0 -2 1 file

LLVM/project 5997b42 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU wait-xcnt-atomic-rmw-optimization.ll

[AMDGPU][GFX1250] Optimize s_wait_xcnt for back-to-back atomic RMWs

This patch optimizes the insertion of s_wait_xcnt instruction for
sequences of atomic read-modify-write (RMW) operations in the
SIInsertWaitcnts pass. The Memory Legalizer conservatively inserts a
soft xcnt instruction before each atomic RMW operation as part of PR
168852, which is correct given the nature of atomic operations.
However, for back-to-back atomic RMWs, only the first s_wait_xcnt is
needed; dropping the rest improves runtime performance. This patch tracks atomic
RMW blocks within each basic block and removes redundant soft xcnt
instructions, keeping only the first wait in each sequence. An atomic
RMW block continues through subsequent atomic RMWs and non-memory
instructions (e.g., ALU operations) but is broken by CU-scoped memory
operations, atomic stores, or basic block boundaries.
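
The tracking rule described above can be sketched over a simplified instruction stream. This is only an illustration under stated assumptions: the `Kind` enum and `pruneSoftWaits` are hypothetical names invented here, each instruction is reduced to one of four categories, and the real pass works on machine instructions with considerably more state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction categories for this sketch only.
enum class Kind {
  SoftXcntWait, // soft s_wait_xcnt inserted before an atomic RMW
  AtomicRMW,    // atomic read-modify-write
  NonMemory,    // e.g. ALU operations; does not end an atomic RMW block
  BlockBreaker  // e.g. a CU-scoped memory operation or an atomic store
};

// Returns the instructions of one basic block with redundant soft waits
// removed: within a run of atomic RMWs only the first wait is kept.
std::vector<Kind> pruneSoftWaits(const std::vector<Kind> &Block) {
  std::vector<Kind> Out;
  bool InRMWBlock = false; // starts fresh for every basic block
  for (std::size_t I = 0; I < Block.size(); ++I) {
    Kind K = Block[I];
    if (K == Kind::SoftXcntWait) {
      bool GuardsRMW =
          I + 1 < Block.size() && Block[I + 1] == Kind::AtomicRMW;
      if (InRMWBlock && GuardsRMW)
        continue; // an earlier wait in this run already covers it
    } else if (K == Kind::AtomicRMW) {
      InRMWBlock = true;
    } else if (K == Kind::BlockBreaker) {
      InRMWBlock = false;
    }
    Out.push_back(K);
  }
  assert(Out.size() <= Block.size());
  return Out;
}
```

Calling the function once per basic block models the block-boundary rule, since the `InRMWBlock` flag never carries over between calls.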
Delta File
+71 -2 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+0 -31 llvm/test/CodeGen/AMDGPU/wait-xcnt-atomic-rmw-optimization.ll
+71 -33 2 files

LLVM/project 63e7070 utils/bazel/llvm-project-overlay/clang BUILD.bazel

[bazel] Add filegroup for builtin_headers (#67757)

I'd like to package these files into a distribution tar as part of
https://github.com/dzbarsky/static-clang. I'm currently applying patches
to the llvm repo but figured this bit could be upstreamed.

(Also open to ideas what to do about the `config.bzl` change in
https://github.com/dzbarsky/static-clang/blob/master/llvm.patch - it's
needed to link with musl libc)
Delta File
+5 -0 utils/bazel/llvm-project-overlay/clang/BUILD.bazel
+5 -0 1 file

LLVM/project 296f5a7 llvm/include/llvm/IR IntrinsicsSPIRV.td IntrinsicsDirectX.td, llvm/lib/Target/SPIRV SPIRVInstructionSelector.cpp SPIRVModuleAnalysis.cpp

[SPIR-V] Implement sample and sample_clamp intrinsics for HLSL resources (#177234)

This patch implements the `sample` and `sample_clamp` intrinsics for
HLSL resources in the SPIR-V backend. It adds the necessary intrinsic
definitions in `IntrinsicsDirectX.td` and `IntrinsicsSPIRV.td`, and
implements the instruction selection logic in
`SPIRVInstructionSelector.cpp`.

Key changes:
- Added `int_dx_resource_sample` and `int_dx_resource_sample_clamp`
  intrinsics.
- Added `int_spv_resource_sample` and `int_spv_resource_sample_clamp`
  intrinsics.
- Implemented `selectSampleIntrinsic` to handle
  `OpImageSampleImplicitLod` generation.
- Added `ResourceDimension` enum in `DXILABI.h` and `HLSLResource.h`.
- Added a new test case
  `llvm/test/CodeGen/SPIRV/hlsl-resources/Sample.ll` to verify the
  implementation.
Delta File
+127 -0 llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
+69 -0 llvm/test/CodeGen/SPIRV/hlsl-resources/Sample.ll
+11 -0 llvm/include/llvm/IR/IntrinsicsSPIRV.td
+11 -0 llvm/include/llvm/IR/IntrinsicsDirectX.td
+3 -0 llvm/lib/Target/SPIRV/SPIRVModuleAnalysis.cpp
+221 -0 5 files