[AArch64][GlobalISel] Select lane index sqdmlal when vector_extract of v4i32 present
SQDMLALv1i64_indexed takes in an index of a vector as its final operand, meaning it doesn't need to extract the element in a separate instruction.
This only works when the vector to extract from is a v4i32. Currently, extracting from a v2i32 doesn't work, and I'm unsure why.
[Flang][OpenMP] Permit THREADPRIVATE variables in EQUIVALENCE statements (#186696)
The OpenMP API does not allow a THREADPRIVATE variable to appear in
an EQUIVALENCE statement. The community has requested that Flang be
extended to permit these non-conforming patterns. This PR changes Flang
so that the equivalenced variables inherit the DSA of the base object of
the EQUIVALENCE statement. The original error message is turned into a
warning.
This PR contains code from downstream PR
https://github.com/arm/arm-toolchain/pull/755 that @tblah pointed to
during the review.
Fixes https://github.com/llvm/llvm-project/issues/180493
Assisted-by: Claude Code, Opus 4.6
[CodeGen] Declare MachineCycleInfo in headers (#187494)
Transform MachineCycleInfo into a class that can be forward-declared and
remove its include from many source files.
Similar to 810ba55de9159932d498e9387d031f362b93fbea.
[RISCV] Relax out of range Zibi conditional branches (#186965)
If `.Label` is not within +-4KiB range, we convert
```
beqi/bnei reg, imm, .Label
```
to
```
bnei/beqi reg, imm, 8
j .Label
```
This is similar to what is done for the RISCV conditional branches
and `Xqcibi` conditional branches.
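The relaxation decision can be sketched roughly as follows. This is a standalone illustration, not the code from the patch: `fitsInBranchRange` and `relaxZibiBranch` are hypothetical names, and the range check assumes that beqi/bnei use a 13-bit signed, 2-byte-aligned offset like the standard B-type branches.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Assumption: the branch offset is a 13-bit signed, 2-byte-aligned value,
// which gives the +-4KiB range mentioned above.
constexpr bool fitsInBranchRange(int64_t Offset) {
  return Offset >= -4096 && Offset <= 4094 && Offset % 2 == 0;
}

// Hypothetical helper: returns the assembly lines for one Zibi branch.
// IsEq selects beqi (true) or bnei (false); RegImm is e.g. "a0, 5".
std::vector<std::string> relaxZibiBranch(bool IsEq, const std::string &RegImm,
                                         const std::string &Label,
                                         int64_t Offset) {
  const std::string Op = IsEq ? "beqi" : "bnei";
  const std::string Inverted = IsEq ? "bnei" : "beqi";
  if (fitsInBranchRange(Offset))
    return {Op + " " + RegImm + ", " + Label};
  // Out of range: invert the condition, skip over the jump (8 bytes ahead),
  // and let the unconditional jump reach the distant label.
  return {Inverted + " " + RegImm + ", 8", "j " + Label};
}
```

The condition is inverted in the relaxed form so that the fall-through path of the original branch becomes the short hop over the `j`.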
---------
Co-authored-by: Sudharsan Veeravalli <svs at qti.qualcomm.com>
[FastISel] generate FAKE_USE for llvm.fake.use (#187116)
FastISel was dropping llvm.fake.use intrinsics because they are not meant
to be generated at O0 with clang.
This patch adds support in FastISel to generate FAKE_USE for llvm.fake.use.
The handling is simpler than in SelectionDAGBuilder: to keep FastISel simple,
no attempt is made to get rid of useless FAKE_USE (e.g. for constant SSA values).
The motivation is that flang will generate llvm.fake.use for function arguments under
`-g` (and O0) because Fortran arguments are not copied to the stack (they are
reference-like arguments in most cases), and one should be able to access these
variables from the debugger at any point in the function, even after their last use in the
function.
Lowering `~x | (x - 1)` to `~blsi(x)` (#186722)
Alive2 proof:
https://alive2.llvm.org/ce/z/bK93Cn
I've implemented a fold in `InstCombineAndOrXor.cpp` to canonicalize `~x
| (x - 1)` to `~(x & -x)`, which enables CodeGen to emit the `blsi`
instruction.
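As a quick sanity check alongside the Alive2 proof, the identity can be exercised on 32-bit values with a small standalone sketch. The function names `orNotDec`, `notBlsi`, and `checkSampleRange` are made up for illustration and do not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Original pattern: ~x | (x - 1).
constexpr uint32_t orNotDec(uint32_t X) { return ~X | (X - 1u); }

// Canonical form: ~(x & -x), the complement of the isolated lowest set bit.
constexpr uint32_t notBlsi(uint32_t X) { return ~(X & (0u - X)); }

// Spot checks: x = 12 (0b1100) has lowest set bit 4, so both forms give ~4.
static_assert(orNotDec(12u) == 0xFFFFFFFBu, "pattern form for x = 12");
static_assert(notBlsi(12u) == 0xFFFFFFFBu, "canonical form for x = 12");
static_assert(orNotDec(0u) == notBlsi(0u), "x = 0 yields all ones in both");

// Compares both forms over all 16-bit values, in the low and the high half.
void checkSampleRange() {
  for (uint32_t I = 0; I <= 0xFFFFu; ++I) {
    assert(orNotDec(I) == notBlsi(I));
    assert(orNotDec(I << 16) == notBlsi(I << 16));
  }
}
```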
I've also added a test in `CodeGen/X86`.
Fixes #184055
---------
Co-authored-by: Tim Gymnich <tim at gymni.ch>
[CycleInfo] Don't store top-level cycle per block (#187488)
CycleInfo currently has a second map that stores the top-level cycle
for a block. I don't think storing this per-block makes a lot of sense,
because the top-level cycle is always the same for all blocks in a
cycle.
So instead store it as a member of the cycle.
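A rough sketch of the resulting shape, under the assumption that each block still maps to its innermost cycle. The `Cycle` and `CycleForest` types below are simplified stand-ins invented for illustration; they are not the real GenericCycleInfo classes.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical, simplified cycle node. The top-level cycle is stored once
// per cycle instead of once per block.
struct Cycle {
  Cycle *Parent = nullptr;   // enclosing cycle, or nullptr if top-level
  Cycle *TopLevel = nullptr; // outermost enclosing cycle (self if top-level)
};

// Hypothetical container with a single map: block name -> innermost cycle.
struct CycleForest {
  std::unordered_map<std::string, Cycle *> BlockToInnermost;

  // The top-level cycle of a block is read off its innermost cycle, so no
  // second per-block map is needed.
  const Cycle *getTopLevelCycle(const std::string &Block) const {
    auto It = BlockToInnermost.find(Block);
    return It == BlockToInnermost.end() ? nullptr : It->second->TopLevel;
  }
};
```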
[LegalizeTypes] Expand UDIV/UREM by constant via chunk summation (#146238)
This patch improves the lowering of 128-bit unsigned division and
remainder by constants (UDIV/UREM) by avoiding a fallback to libcalls
(__udivti3/__umodti3) for specific divisors.
When a divisor D satisfies the condition (1 << ChunkWidth) % D == 1, the
128-bit value is split into fixed-width chunks (e.g., 30-bit) and summed
before applying a smaller UDIV/UREM. This transformation is based on the
"remainder by summing digits" trick described in Hacker’s Delight.
This fixes #137514 for some constants.
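The remainder half of the trick can be illustrated with a small standalone sketch. It is not the legalizer code: `U128`, `isChunkSummable`, and `uremBySum` are hypothetical names, and the chunk width is fixed at 32 bits for simplicity, whereas the patch also uses other widths such as 30 bits. Because 2^32 is congruent to 1 modulo D, each chunk contributes its own value to the remainder, so the chunks can simply be added.

```cpp
#include <cstdint>

// Hypothetical 128-bit value split into two 64-bit halves.
struct U128 {
  uint64_t Hi;
  uint64_t Lo;
};

// True if (1 << 32) % D == 1, the condition that makes 32-bit chunk
// summation valid for divisor D.
constexpr bool isChunkSummable(uint32_t D) {
  return D != 0 && (uint64_t{1} << 32) % D == 1;
}

// Computes X % D by summing the four 32-bit chunks of X and applying one
// 64-bit remainder. Only valid when isChunkSummable(D) holds.
constexpr uint32_t uremBySum(U128 X, uint32_t D) {
  // Four chunks below 2^32 each, so the sum cannot overflow 64 bits.
  uint64_t Sum = (X.Lo & 0xFFFFFFFFu) + (X.Lo >> 32) +
                 (X.Hi & 0xFFFFFFFFu) + (X.Hi >> 32);
  return static_cast<uint32_t>(Sum % D);
}

static_assert(isChunkSummable(3) && isChunkSummable(5), "3 and 5 divide 2^32 - 1");
static_assert(!isChunkSummable(7), "2^32 % 7 == 4, so 32-bit chunks do not work");
```

With 32-bit chunks, 7 is rejected, while a 30-bit chunk width would accept it, since 2^30 % 7 == 1. This is one reason the chunk width depends on the divisor.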
[SLP]Do not match buildvector node, if current node is part of its combined nodes
If the current buildvector node is part of the combined nodes of the
matching candidate node, the candidate must be treated as non-matching
to prevent a wrong def-use chain.
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/187491
[VPlan] Permit derived IV in isHeaderMask (#187360)
When matching scalar steps of the canonical IV, also match a derived IV
of the canonical IV if the derivation is essentially a no-op. Fixes a
failure in the mve-reg-pressure-spills.ll test when expensive checks are
enabled.
AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3
Codegen for v_dual_dot2acc_f32_f16/bf16 for targets that only have the VOP3
version of the instruction.
Since there is no VOP2 version, introduce a temporary MIR DOT2ACC pseudo
that is selected when there are no src_modifiers. This DOT2ACC pseudo
has src2 tied to dst (like the VOP2 version). Post-RA pseudo expansion
restores the pseudo to the VOP3 version of the instruction.
CreateVOPD will recognize such a VOP3 pseudo and generate v_dual_dot2acc.
[SPARC] Add TTI implementation for getting register numbers and widths (#180660)
Correctly inform transform passes about our registers; this prevents the
issue with the `find-last` test where the loop vectorizer pass
mistakenly thinks that the backend has vector capabilities and generates
vector types, which causes the backend to crash.
See also: https://github.com/sparclinux/issues/issues/69
[clang] Add implicit std::align_val_t to std namespace DeclContext for module merging (#187347)
When a virtual destructor is encountered before any module providing
std::align_val_t is loaded, DeclareGlobalNewDelete() implicitly creates
a std::align_val_t EnumDecl. However, this EnumDecl was not added to the
std namespace's DeclContext -- it was only stored in the
Sema::StdAlignValT field.
Later, when a module containing an explicit std::align_val_t definition
is loaded, ASTReaderDecl::findExisting() attempts to find the implicit
decl via DeclContext::noload_lookup() on the std namespace. Since the
implicit EnumDecl was never added to that DeclContext, the lookup fails,
and the two align_val_t declarations are not merged into a single
redeclaration chain. This results in two distinct types both named
std::align_val_t.
The implicitly declared operator delete overloads (also created by
DeclareGlobalNewDelete) use the implicit align_val_t type for their
aligned-deallocation parameter. When module code (e.g. std::allocator::
[17 lines not shown]
[NFC][SPIRV] Run `spirv-val` on tests related to `SPV_ALTERA_arbitrary_precision_integers` (#187464)
https://github.com/KhronosGroup/SPIRV-Tools/pull/6232 landed support for
this extension in `spirv-val`.
This PR updates some relevant tests to run `spirv-val` on their output.
[LLVM][DAGCombiner] Limit extract_subvec(extract_subvec()) combine to vectors of the same type. (#187334)
The index operand of ISD::EXTRACT_SUBVECTOR is implicitly scaled by
vscale, which is effectively always one for fixed-length vectors. When
combining nested extracts, we must ensure that all of them use the same
implicit scaling; otherwise the transform is not equivalent.
Fixes https://github.com/llvm/llvm-project/issues/186563