LLVM/project 068176allvm/lib/Analysis BranchProbabilityInfo.cpp

[Analysis] Remove LLVM_ABI annotations from llvm/lib/Analysis/BranchProbabilityInfo.cpp which cause build errors (#187388)

In llvm/lib/Analysis/BranchProbabilityInfo.cpp, several LLVM_ABI
annotations were added that cause build errors when building
LLVM and Clang as a shared library on Windows (see
https://github.com/compiler-research/ci-workflows/actions/runs/22754706570/job/67436382142#step:6:1141
for some of the errors). The changes in this PR fix these build
errors.

After this patch, this is how far the build gets:
https://github.com/compiler-research/ci-workflows/actions/runs/23257495426/job/67635570161#step:6:4601.
The errors at that point were introduced sometime in the last month, but
I couldn't work out how to fix them.
DeltaFile
+9-9llvm/lib/Analysis/BranchProbabilityInfo.cpp
+9-91 files

LLVM/project 98cf90ellvm/lib/Target/AArch64 AArch64InstrInfo.td

[AArch64][GlobalISel] Select lane index sqdmlal when vector_extract of v4i32 present

SQDMLALv1i64_indexed takes in an index of a vector as its final operand, meaning it doesn't need to extract the element in a separate instruction.

This only works when the vector to extract from is a v4i32. Currently, extracting from a v2i32 doesn't work, and I'm unsure why.
DeltaFile
+6-0llvm/lib/Target/AArch64/AArch64InstrInfo.td
+6-01 files

LLVM/project e3415daflang/lib/Semantics resolve-directives.cpp, flang/test/Integration/OpenMP copyprivate.f90

[Flang][OpenMP] Permit THREADPRIVATE variables in EQUIVALENCE statements (#186696)

The OpenMP API does not allow a THREADPRIVATE variable to appear in
an EQUIVALENCE statement. The community has requested that Flang be
extended to permit these non-conforming patterns. This PR
changes Flang so that the equivalenced variables inherit the DSA of the
base object of the EQUIVALENCE statement. The original error message is
turned into a warning.

This PR contains code from downstream PR
https://github.com/arm/arm-toolchain/pull/755 that @tblah pointed to
during the review.

Fixes https://github.com/llvm/llvm-project/issues/180493

Assisted-by: Claude Code, Opus 4.6
DeltaFile
+78-0flang/test/Lower/OpenMP/copyprivate-equivalence.f90
+33-33flang/test/Lower/OpenMP/copyprivate.f90
+19-19flang/test/Integration/OpenMP/copyprivate.f90
+30-0flang/lib/Semantics/resolve-directives.cpp
+17-0flang/test/Semantics/OpenMP/threadprivate-equivalence.f90
+8-8flang/test/Lower/OpenMP/copyprivate2.f90
+185-609 files not shown
+231-7215 files

LLVM/project a32d269utils/bazel/llvm-project-overlay/mlir BUILD.bazel, utils/bazel/llvm-project-overlay/mlir/test BUILD.bazel

[bazel] Gate GPU parsers behind llvm_targets (#187213)

Ideally fixes #63135

---------

Signed-off-by: Vimarsh Sathia <vsathia2 at illinois.edu>
DeltaFile
+5-5utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+2-2utils/bazel/llvm-project-overlay/mlir/test/BUILD.bazel
+7-72 files

LLVM/project a3e3fedllvm/include/llvm/CodeGen MachineCycleAnalysis.h TargetInstrInfo.h, llvm/lib/Target/AArch64 AArch64LoadStoreOptimizer.cpp

[CodeGen] Declare MachineCycleInfo in headers (#187494)

Transform MachineCycleInfo into a class that can be forward-declared, and
remove the include from many source files.

Similar to 810ba55de9159932d498e9387d031f362b93fbea.
DeltaFile
+1-1llvm/include/llvm/CodeGen/MachineCycleAnalysis.h
+1-1llvm/include/llvm/CodeGen/TargetInstrInfo.h
+1-0llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+1-0llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+1-0llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
+5-25 files

LLVM/project 2e2bcf7llvm/lib/Target/AMDGPU AMDGPUInstrInfo.h

[AMDGPU] Remove unused forward declaration
DeltaFile
+0-1llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.h
+0-11 files

LLVM/project dddf01cllvm/lib/Target/RISCV RISCVInstrInfo.td RISCVInstrInfoXqci.td, llvm/lib/Target/RISCV/MCTargetDesc RISCVMCCodeEmitter.cpp RISCVAsmBackend.cpp

[RISCV] Relax out of range Zibi conditional branches (#186965)

If `.Label` is not within +-4KiB range, we convert

```
beqi/bnei reg, imm, .Label
```

to

```
bnei/beqi reg, imm, 8
j .Label
```

This is similar to what is done for the RISCV conditional branches
and `Xqcibi` conditional branches.
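
As a rough illustration of the rewrite described above (not the patch's actual C++ code), the relaxation can be modelled in a few lines of Python. The helper name `relax_branch` is hypothetical, and the exact encodable range `[-4096, 4094]` is an assumption based on the "+-4KiB" wording.

```python
# Hypothetical model of the relaxation; not taken from the LLVM sources.
# Assumed encodable byte offsets for the short branch form.
BRANCH_MIN, BRANCH_MAX = -4096, 4094

# Each conditional branch is replaced by its inverse when relaxed.
INVERTED = {"beqi": "bnei", "bnei": "beqi"}


def relax_branch(mnemonic, reg, imm, offset, label=".Label"):
    """Return the instruction sequence for a branch to `label` at `offset` bytes."""
    if BRANCH_MIN <= offset <= BRANCH_MAX:
        # In range: keep the single conditional branch.
        return [f"{mnemonic} {reg}, {imm}, {label}"]
    # Out of range: the inverted branch skips over an unconditional jump.
    return [f"{INVERTED[mnemonic]} {reg}, {imm}, 8", f"j {label}"]
```

The inverted branch targets offset 8 so that it hops over the 4-byte `j` when the original condition is false.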

---------

Co-authored-by: Sudharsan Veeravalli <svs at qti.qualcomm.com>
DeltaFile
+110-0llvm/test/MC/RISCV/zibi-long-conditional-jump.s
+14-0llvm/lib/Target/RISCV/RISCVInstrInfo.td
+0-13llvm/lib/Target/RISCV/RISCVInstrInfoXqci.td
+10-3llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp
+6-0llvm/lib/Target/RISCV/MCTargetDesc/RISCVAsmBackend.cpp
+5-0llvm/lib/Target/RISCV/RISCVInstrInfoZibi.td
+145-161 files not shown
+148-187 files

LLVM/project 76f7252llvm/lib/CodeGen/SelectionDAG FastISel.cpp, llvm/test/CodeGen/X86 fake-use-fastisel.ll

[FastISel] generate FAKE_USE for llvm.fake.use (#187116)

FastISel was dropping llvm.fake.use calls because they are not meant to be
generated at O0 with clang.

This patch adds support in FastISel to generate FAKE_USE for llvm.fake.use.
The handling is simpler than in SelectionDAGBuilder: to keep FastISel simple, no
attempt is made to get rid of useless FAKE_USE (e.g. for constant SSA values).

The motivation is that flang will generate llvm.fake.use for function arguments under
`-g` (and O0) because Fortran arguments are not copied to the stack (they are
reference-like arguments in most cases), and one should be able to access these
variables from the debugger at any point of the function, even after their last use in the
function.
DeltaFile
+20-0llvm/test/CodeGen/X86/fake-use-fastisel.ll
+7-2llvm/lib/CodeGen/SelectionDAG/FastISel.cpp
+27-22 files

LLVM/project d641186clang/docs UsersManual.rst, clang/test/Driver cl-link.c

[clang-cl] test that `-Xlinker` works, update supported options docs (#187395)

closes #119179
DeltaFile
+709-95clang/docs/UsersManual.rst
+5-0clang/test/Driver/cl-link.c
+714-952 files

LLVM/project a3f0a19llvm/lib/Target/AArch64 AArch64LoadStoreOptimizer.cpp, llvm/lib/Target/RISCV RISCVVLOptimizer.cpp

More include fixes

Created using spr 1.3.8-wip
DeltaFile
+1-0llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+1-0llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
+2-02 files

LLVM/project 18ed1a9llvm/test/CodeGen/X86 bit-manip-i512.ll bit-manip-i256.ll

[X86] Add bitreverse/bswap i128/i256/i512 test coverage for #187353 (#187492)
DeltaFile
+2,986-1llvm/test/CodeGen/X86/bit-manip-i512.ll
+1,492-0llvm/test/CodeGen/X86/bit-manip-i256.ll
+459-0llvm/test/CodeGen/X86/bit-manip-i128.ll
+4,937-13 files

LLVM/project 78a8f00llvm/lib/Transforms/Vectorize LoopVectorize.cpp VPlanConstruction.cpp

Revert "[VPlan] Create header phis once regions have been created (NFC)."

This reverts commit 91b928f919364b29e241821fc639b9ef56dab1a5.

This complicates some analyses that need to happen on the scalar VPlan,
before regions have been created, e.g.
https://github.com/llvm/llvm-project/pull/185323/.
DeltaFile
+9-11llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+6-1llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+15-122 files

LLVM/project 289c588llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 avx512-trunc.ll

[X86] Optimize load-trunc-store for v4i16/v2i32/v2i16 vectors (#186676)

This patch changes the assembly generated for the following IR.
IR
```
define void @cast_i16x4_to_u8x4(ptr %a0, ptr %a1) {
  %1 = load <4 x i16>, ptr %a1
  %2 = trunc <4 x i16> %1 to <4 x i8>
  store <4 x i8> %2, ptr %a0
  ret void
}
```
From Assembly
```
cast_i16x4_to_u8x4:                     # @cast_i16x4_to_u8x4
        vmovq   (%rsi), %xmm0                   # xmm0 = mem[0],zero
        vpmovwb %xmm0, %xmm0
        vmovd   %xmm0, (%rdi)
        retq
```

    [16 lines not shown]
DeltaFile
+119-0llvm/test/CodeGen/X86/avx512-trunc.ll
+22-0llvm/lib/Target/X86/X86ISelLowering.cpp
+141-02 files

LLVM/project 1078a1dllvm/lib/Transforms/InstCombine InstCombineAndOrXor.cpp, llvm/test/CodeGen/X86 bmi.ll

Lowering `~x | (x - 1)` to `~blsi(x)` (#186722)

Alive2 proof: 
https://alive2.llvm.org/ce/z/bK93Cn

I've implemented a fold in `InstCombineAndOrXor.cpp` to canonicalize `~x
| (x - 1)` to `~(x & -x)` which enables the CodeGen to emit the `blsi`
instruction.
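
For readers without Alive2 at hand, the identity can also be spot-checked with a small Python sketch. This is only an illustration on 64-bit values, not a proof and not part of the patch. The function names are made up for the example.

```python
MASK = (1 << 64) - 1  # model 64-bit wraparound arithmetic


def original(x: int) -> int:
    """~x | (x - 1), truncated to 64 bits."""
    return (~x | (x - 1)) & MASK


def canonical(x: int) -> int:
    """~(x & -x), truncated to 64 bits; x & -x isolates the lowest set bit."""
    return ~(x & -x) & MASK


# Both forms clear only the lowest set bit position and set every other bit.
for x in [0, 1, 2, 12, 0x8000_0000_0000_0000, MASK]:
    assert original(x) == canonical(x)
```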

I've also added a test in `CodeGen/X86`.

Fixes #184055

---------

Co-authored-by: Tim Gymnich <tim at gymni.ch>
DeltaFile
+68-0llvm/test/Transforms/InstCombine/fold-bmi.ll
+59-0llvm/test/CodeGen/X86/bmi.ll
+10-0llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+137-03 files

LLVM/project 1745706llvm/include/llvm/CodeGen MachineCycleAnalysis.h TargetInstrInfo.h, llvm/lib/Target/AMDGPU SIInstrInfo.cpp

[spr] initial version

Created using spr 1.3.8-wip
DeltaFile
+1-1llvm/include/llvm/CodeGen/MachineCycleAnalysis.h
+1-1llvm/include/llvm/CodeGen/TargetInstrInfo.h
+1-0llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+3-23 files

LLVM/project 49a5192llvm/include/llvm/ADT GenericCycleImpl.h GenericCycleInfo.h

[CycleInfo] Don't store top-level cycle per block (#187488)

CycleInfo currently has a second map, that stores the top-level cycle
for a block. I don't think storing this per-block makes a lot of sense,
because the top-level cycle is always the same for all blocks in a
cycle.

So instead store it as a member of the cycle.
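
A minimal Python sketch of the idea, with hypothetical names (`Cycle`, `top_level`, `block_to_innermost`) rather than the real GenericCycleInfo members: each cycle records its top-level ancestor once, and a block reaches it through its innermost cycle.

```python
class Cycle:
    """Toy cycle node; field names are illustrative, not LLVM's."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # Stored once per cycle instead of once per block.
        self.top_level = self if parent is None else parent.top_level


outer = Cycle("outer")
inner = Cycle("inner", parent=outer)

# Only one per-block map remains: block -> innermost containing cycle.
block_to_innermost = {"bb1": outer, "bb2": inner}


def top_level_cycle(block):
    """Look up the top-level cycle of a block via its innermost cycle."""
    cycle = block_to_innermost.get(block)
    return None if cycle is None else cycle.top_level
```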
DeltaFile
+10-20llvm/include/llvm/ADT/GenericCycleImpl.h
+5-4llvm/include/llvm/ADT/GenericCycleInfo.h
+15-242 files

LLVM/project 7d02ca6mlir/include/mlir/Dialect/LLVMIR LLVMIntrinsicOps.td, mlir/test/Target/LLVMIR llvmir-intrinsics.mlir

[mlir][LLVM] add llvm.fake.use to LLVM dialect (#187026)

Add llvm.fake.use to the LLVM dialect intrinsics.
See https://llvm.org/docs/LangRef.html#llvm-fake-use-intrinsic.
DeltaFile
+14-0mlir/test/Target/LLVMIR/Import/intrinsic.ll
+12-0mlir/test/Target/LLVMIR/llvmir-intrinsics.mlir
+5-0mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td
+31-03 files

LLVM/project 796b218llvm/test/CodeGen/AArch64 rem-by-const.ll, llvm/test/CodeGen/RISCV split-udiv-by-constant.ll split-urem-by-constant.ll

[LegalizeTypes] Expand UDIV/UREM by constant via chunk summation (#146238)

This patch improves the lowering of 128-bit unsigned division and
remainder by constants (UDIV/UREM) by avoiding a fallback to libcalls
(__udivti3/__umodti3) for specific divisors.

When a divisor D satisfies the condition (1 << ChunkWidth) % D == 1, the
128-bit value is split into fixed-width chunks (e.g., 30-bit) and summed
before applying a smaller UDIV/UREM. This transformation is based on the
"remainder by summing digits" trick described in Hacker’s Delight.

This fixes #137514 for some constants.
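
The remainder half of the trick can be illustrated with a short Python sketch. It is a model of the arithmetic only, not the LegalizeTypes implementation, and the function name `urem_by_chunk_sum` is hypothetical. Because `2**ChunkWidth` is congruent to 1 modulo D, each chunk contributes its own value modulo D, so the chunk sum has the same remainder as the full value.

```python
def urem_by_chunk_sum(value: int, divisor: int, chunk_width: int) -> int:
    """Remainder of an unsigned value via the digit-summing trick."""
    # The trick only applies when 2**chunk_width is congruent to 1 mod divisor.
    if (1 << chunk_width) % divisor != 1:
        raise ValueError("divisor does not satisfy the chunk condition")
    mask = (1 << chunk_width) - 1
    total = 0
    while value:
        total += value & mask   # add the low chunk
        value >>= chunk_width   # move on to the next chunk
    # The sum is far narrower than the input, so a small UREM suffices.
    return total % divisor


x = (1 << 128) - 12345
assert urem_by_chunk_sum(x, 7, 30) == x % 7
```

For a 128-bit input and 30-bit chunks, the sum of the five chunks is less than 5 * 2**30 and therefore fits in 33 bits.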
DeltaFile
+2,859-7llvm/test/CodeGen/X86/i128-udiv.ll
+474-0llvm/test/CodeGen/X86/vector-idiv-udiv-128.ll
+84-69llvm/test/CodeGen/AArch64/rem-by-const.ll
+122-28llvm/test/CodeGen/RISCV/split-udiv-by-constant.ll
+74-28llvm/test/CodeGen/RISCV/split-urem-by-constant.ll
+72-10llvm/test/CodeGen/RISCV/div-by-constant.ll
+3,685-1426 files not shown
+3,892-21912 files

LLVM/project 582fa78llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 matched-gather-part-of-combined.ll

[SLP]Do not match buildvector node, if current node is part of its combined nodes

If the current buildvector node is part of the combined nodes of the
matching candidate node, this matching candidate must be considered
non-matching to prevent a wrong def-use chain.

Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/187491
DeltaFile
+75-0llvm/test/Transforms/SLPVectorizer/X86/matched-gather-part-of-combined.ll
+3-0llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+78-02 files

LLVM/project 191c84bllvm/lib/Transforms/Vectorize VPlanUtils.cpp, llvm/test/Transforms/LoopVectorize/ARM mve-reg-pressure-spills.ll

[VPlan] Permit derived IV in isHeaderMask (#187360)

When matching scalar steps of the canonical IV, also match a derived IV
of the canonical IV if the derivation is essentially a no-op. Fixes a
failure in the mve-reg-pressure-spills.ll test when expensive checks are
enabled.
DeltaFile
+4-3llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+2-2llvm/test/Transforms/LoopVectorize/ARM/mve-reg-pressure-spills.ll
+6-52 files

LLVM/project 6aeeae6llvm/test/Transforms/LoopVectorize/Sparc lit.local.cfg

[SPARC][Tests] Add lit.local.cfg to SPARC LoopVectorize tests (#187489)
DeltaFile
+2-0llvm/test/Transforms/LoopVectorize/Sparc/lit.local.cfg
+2-01 files

LLVM/project 66ed7e4llvm/lib/Target/AMDGPU VOP3PInstructions.td SIFoldOperands.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.h

AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3

Codegen for v_dual_dot2acc_f32_f16/bf16 for targets that only have the VOP3
version of the instruction.
Since there is no VOP2 version, introduce a temporary MIR DOT2ACC pseudo
that is selected when there are no src_modifiers. This DOT2ACC pseudo
has src2 tied to dst (like the VOP2 version); PostRA pseudo expansion will
restore the pseudo to the VOP3 version of the instruction.
CreateVOPD will recognize such a VOP3 pseudo and generate v_dual_dot2acc.
DeltaFile
+170-312llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll
+96-95llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll
+31-4llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+21-8llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+27-0llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+16-1llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+361-4204 files not shown
+380-42210 files

LLVM/project 97f1c60llvm/lib/Target/AMDGPU AMDGPUInstructionSelector.cpp VOP2Instructions.td, llvm/test/CodeGen/AMDGPU llvm.amdgcn.fdot2.ll llvm.amdgcn.fdot2.f32.bf16.ll

AMDGPU: Improve codegen for VOP2 v_dot2c_f32_f16/bf16

Select the VOP2 version when there are no src_modifiers, otherwise VOP3
DeltaFile
+64-212llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll
+20-48llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll
+12-49llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.fdot2.ll
+39-4llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+14-12llvm/lib/Target/AMDGPU/VOP2Instructions.td
+22-0llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+171-3254 files not shown
+188-32510 files

LLVM/project b029b98llvm/test/CodeGen/X86 bit-manip-i128.ll

[X86] Add i128 bit manipulation pattern test coverage (#187480)
DeltaFile
+1,095-0llvm/test/CodeGen/X86/bit-manip-i128.ll
+1,095-01 files

LLVM/project 0525c50llvm/lib/Target/AMDGPU VOP3PInstructions.td AMDGPUInstructionSelector.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.fdot2.ll llvm.amdgcn.fdot2.f32.bf16.ll

AMDGPU: Fix src2_modifiers for v_dot2_f32_f16/bf16
DeltaFile
+114-49llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.fdot2.ll
+14-21llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll
+23-5llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+6-9llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll
+13-0llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+9-0llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+179-845 files not shown
+192-8611 files

LLVM/project 23af867llvm/lib/Target/Sparc SparcTargetTransformInfo.cpp SparcTargetTransformInfo.h, llvm/test/Transforms/LoopVectorize/Sparc no-vectorize.ll

[SPARC] Add TTI implementation for getting register numbers and widths (#180660)

Correctly inform transform passes about our registers; this prevents the
issue with the `find-last` test where the loop vectorizer pass
mistakenly thinks that the backend has vector capabilities and generates
vector types, which causes the backend to crash.

See also: https://github.com/sparclinux/issues/issues/69
DeltaFile
+74-0llvm/test/Transforms/LoopVectorize/Sparc/no-vectorize.ll
+49-0llvm/lib/Target/Sparc/SparcTargetTransformInfo.cpp
+10-1llvm/lib/Target/Sparc/SparcTargetTransformInfo.h
+133-13 files

LLVM/project c3e7624clang/lib/Sema SemaExprCXX.cpp, clang/test/Modules align-val-t-merge.cpp

[clang] Add implicit std::align_val_t to std namespace DeclContext for module merging (#187347)

When a virtual destructor is encountered before any module providing
std::align_val_t is loaded, DeclareGlobalNewDelete() implicitly creates
a std::align_val_t EnumDecl. However, this EnumDecl was not added to the
std namespace's DeclContext -- it was only stored in the
Sema::StdAlignValT field.

Later, when a module containing an explicit std::align_val_t definition
is loaded, ASTReaderDecl::findExisting() attempts to find the implicit
decl via DeclContext::noload_lookup() on the std namespace. Since the
implicit EnumDecl was never added to that DeclContext, the lookup fails,
and the two align_val_t declarations are not merged into a single
redeclaration chain. This results in two distinct types both named
std::align_val_t.

The implicitly declared operator delete overloads (also created by
DeclareGlobalNewDelete) use the implicit align_val_t type for their
aligned-deallocation parameter. When module code (e.g. std::allocator::

    [17 lines not shown]
DeltaFile
+75-0clang/test/Modules/align-val-t-merge.cpp
+7-0clang/lib/Sema/SemaExprCXX.cpp
+82-02 files

LLVM/project cd72600llvm/test/CodeGen/AMDGPU llvm.amdgcn.fdot2.ll llvm.amdgcn.fdot2.f32.bf16.ll

AMDGPU: Add more tests for v_dot2_f32_f16/bf16

Test for src modifiers, inline constants and vopd codegen.
DeltaFile
+1,769-45llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll
+944-116llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll
+2,713-1612 files

LLVM/project f104b73llvm/test/CodeGen/SPIRV trunc-nonstd-bitwidth.ll, llvm/test/CodeGen/SPIRV/extensions/SPV_ALTERA_arbitrary_precision_floating_point arbitrary_precision_floating_point_test.ll

[NFC][SPIRV] Run `spirv-val` on tests related to `SPV_ALTERA_arbitrary_precision_integers` (#187464)

https://github.com/KhronosGroup/SPIRV-Tools/pull/6232 landed support for
this extension in `spirv-val`.

This PR updates some relevant tests to run `spirv-val` on their output.
DeltaFile
+1-1llvm/test/CodeGen/SPIRV/llvm-intrinsics/bitreverse_small_type.ll
+1-1llvm/test/CodeGen/SPIRV/extensions/SPV_ALTERA_arbitrary_precision_integers/i128-switch-lower.ll
+1-1llvm/test/CodeGen/SPIRV/extensions/SPV_ALTERA_arbitrary_precision_integers/i128-arith.ll
+1-1llvm/test/CodeGen/SPIRV/extensions/SPV_ALTERA_arbitrary_precision_integers/i128-addsub.ll
+1-1llvm/test/CodeGen/SPIRV/extensions/SPV_ALTERA_arbitrary_precision_floating_point/arbitrary_precision_floating_point_test.ll
+1-0llvm/test/CodeGen/SPIRV/trunc-nonstd-bitwidth.ll
+6-53 files not shown
+9-59 files

LLVM/project 7663802llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/AArch64 sve-extract-fixed-from-scalable-vector.ll

[LLVM][DAGCombiner] Limit extract_subvec(extract_subvec()) combine to vectors of the same type. (#187334)

The index operand of ISD::EXTRACT_SUBVECTOR is implicitly scaled by
vscale, which is effectively always one for fixed-length vectors. When
combining nested extracts we must ensure all use the same implicit
scaling; otherwise the transform is not equivalent.
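
A small arithmetic sketch in Python shows why mixing the two scalings breaks the combine. It assumes, as a simplification, that an extract's index is multiplied by vscale when its result is scalable and by 1 when its result is fixed-length. The helper names are hypothetical.

```python
def scale(result_is_scalable: bool, vscale: int) -> int:
    """Implicit index scaling: vscale for scalable results, 1 for fixed-length."""
    return vscale if result_is_scalable else 1


def nested_start(inner_idx, inner_scalable, outer_idx, outer_scalable, vscale):
    """First source element read by extract(extract(V, inner_idx), outer_idx)."""
    return (inner_idx * scale(inner_scalable, vscale)
            + outer_idx * scale(outer_scalable, vscale))


def combined_start(inner_idx, outer_idx, outer_scalable, vscale):
    """First source element read by the naive combine extract(V, inner+outer)."""
    return (inner_idx + outer_idx) * scale(outer_scalable, vscale)


# Scalable inner extract, fixed-length outer extract, vscale = 2:
assert nested_start(8, True, 4, False, 2) == 20
assert combined_start(8, 4, False, 2) == 12  # reads different elements

# With matching scaling on both extracts the combine is equivalent.
assert nested_start(8, True, 4, True, 2) == combined_start(8, 4, True, 2)
```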

Fixes https://github.com/llvm/llvm-project/issues/186563
DeltaFile
+10-14llvm/test/CodeGen/AArch64/sve-extract-fixed-from-scalable-vector.ll
+4-3llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+14-172 files