[X86] combineINSERT_SUBVECTOR - peek through BITCAST and EXTRACT_SUBVECTOR when trying to find shuffle combine candidates (#201781)
Helps with some expanded CONCAT_VECTORS cases where both halves came
from wider shuffles.
More yak shaving for #199445
[SimplifyCFG] Look at all uses when checking phi incoming for UB (#200164)
passingValueIsAlwaysUndefined only looks at the first use of the phi
that has a UB-candidate opcode. If that use is in a different block, the
function gives up, even when another use in the same block would prove
UB. Use-list order is not guaranteed, so this happens in practice.
Move the same-block check into the find_if lambda so the scan keeps
going past cross-block uses.
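Below is a simplified sketch of the before/after scan. It assumes a toy `User` record (a block id plus a "UB-candidate opcode" flag) in place of LLVM's use list. `User`, `firstUseOnly`, and `scanAllUses` are hypothetical names, and the real per-use check is more involved:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified model of a phi user: the block it lives in and
// whether its opcode could make the incoming value UB.
struct User {
  int block;
  bool isUBCandidate;
};

// Old shape: take the first UB-candidate user, then give up if it lives in
// another block. A same-block candidate later in the list is never seen.
bool firstUseOnly(const std::vector<User> &users, int phiBlock) {
  auto it = std::find_if(users.begin(), users.end(),
                         [](const User &u) { return u.isUBCandidate; });
  return it != users.end() && it->block == phiBlock;
}

// New shape: the same-block test is part of the predicate, so the scan
// skips cross-block users and keeps looking.
bool scanAllUses(const std::vector<User> &users, int phiBlock) {
  auto it = std::find_if(users.begin(), users.end(), [&](const User &u) {
    return u.isUBCandidate && u.block == phiBlock;
  });
  return it != users.end();
}
```

Take users in blocks `{2, 1}` with the phi in block 1. Here `firstUseOnly` gives up on the cross-block user, while `scanAllUses` finds the same-block one.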
[alpha.webkit.NoDeleteChecker] Allow no-delete default constructors (#201544)
This PR fixes a bug in TrivialFunctionAnalysis where a default
constructor without an explicit body / definition was treated as not
"trivial". The fix allows the function body to be missing when
isThisDeclarationADefinition is true.
---------
Co-authored-by: Balazs Benics <benicsbalazs at gmail.com>
[NFC][clang] Add pragma comment formatting commit to blame ignore list (#201765)
Add the previously landed formatting-only commit for the pragma comment
kind StringSwitch to `.git-blame-ignore-revs`.
This keeps git blame useful across the NFC formatting change.
Formatting commit:
511d2e40ddeacf25f403b40ed73a41d1dea1b636
Co-authored-by: Tony Varghese <tony.varghese at ibm.com>
OpenMP: Accept amdgpu name in arch directive
Accept amdgpu as an alias for amdgcn as part of the general
trend of preferring the amdgpu name. This is so the name is
consistent in the future when the triple arch name changes.
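One way to picture the alias is as a canonicalization step that runs before arch names are compared. This is a minimal sketch; `canonicalArchName` and `archMatches` are hypothetical helpers, not Clang's actual code:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: map accepted spellings of an arch name onto one
// canonical spelling. "amdgpu" is treated as an alias for "amdgcn"; every
// other name passes through unchanged.
std::string canonicalArchName(std::string_view name) {
  if (name == "amdgpu")
    return "amdgcn";
  return std::string(name);
}

// Compare after canonicalizing both sides, so either spelling matches.
bool archMatches(std::string_view requested, std::string_view targetArch) {
  return canonicalArchName(requested) == canonicalArchName(targetArch);
}
```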
[SLSR] Avoid repeatedly calling canReuseInstruction for the same Basis (#196545)
`canReuseInstruction` only depends on `Basis`, but runs for each
`(Basis, C)` pair. This patch moves the check earlier in the pass to
remove the repeated call.
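Here is a minimal sketch of the hoisting. It assumes candidates can be grouped by their Basis. `Group`, `canReuse` (standing in for `canReuseInstruction`), and the call counter are hypothetical:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins for a Basis and a candidate C.
using Basis = int;
using Candidate = int;

// Counts how often the Basis-only check runs, for illustration.
int reuseChecks = 0;

// Stand-in for canReuseInstruction: its answer depends only on the Basis.
bool canReuse(Basis b) {
  ++reuseChecks;
  return b % 2 == 0;
}

// Candidates grouped by the Basis they would be rewritten against.
struct Group {
  Basis basis;
  std::vector<Candidate> candidates;
};

// The Basis-only check runs once per Basis rather than once per
// (Basis, C) pair inside the inner loop.
std::vector<std::pair<Basis, Candidate>>
rewritablePairs(const std::vector<Group> &groups) {
  std::vector<std::pair<Basis, Candidate>> out;
  for (const Group &g : groups) {
    if (!canReuse(g.basis))
      continue; // skips every candidate for this Basis at once
    for (Candidate c : g.candidates)
      out.push_back({g.basis, c});
  }
  return out;
}
```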
Assisted-by: Claude Code
[Clang][HIP] Include `__clang_cuda_math_forward_declares.h` before `<cmath>`
This patch should fix the following error on Windows: https://github.com/ggml-org/llama.cpp/issues/22570
In HIP, constexpr functions are treated as both __host__ and __device__.
A new version of the MS STL shipped with the build tools version
14.51.36231 has constexpr definitions for some cmath functions when the
compiler in use is Clang.
These definitions conflict with the __device__ declarations we provide
in the header wrappers.
There is a workaround for this: it is possible to overload constexpr
functions **that are defined in a system header** by declaring a __device__
version beforehand.
By including `__clang_cuda_math_forward_declares.h` before `<cmath>`,
we're able to benefit from this behavior.
[mlir][tosa] Allow numeric values to be specified for mxint8 constants (#200762)
This commit uses the DenseElementTypeInterface to allow signless numeric
values to be specified for mxint8 constants by supplying `i8` values.
This is more user-friendly than the previous hex representation.
[Flang][OpenMP] Add semantic check for linear clause with statement function variables (#199743)
### **Description**
1. This patch adds a missing semantic check for the LINEAR clause.
2. OpenMP treats LINEAR variables similarly to PRIVATE variables.
Variables used inside statement function expressions are not allowed to
be privatized, but Flang was not checking this for LINEAR.
3. The existing privatization check already handled PRIVATE,
FIRSTPRIVATE, and LASTPRIVATE. This patch extends the same check to
LINEAR.
Fixes: [199660](https://github.com/llvm/llvm-project/issues/199660)
### **Reproducer**
```
subroutine test()
integer :: pi, r, f, x
f(r) = pi * r + x
[21 lines not shown]
[LoopFusion][docs][NFC] Document atomic accesses as a fusion blocker (#201775)
Loops containing atomic accesses are now rejected outright, mirroring
the volatile blocker. Update the eligibility sections to match.
[RISCV][MC] Add experimental `Zvvmtls` and `Zvvmttls` support (#198229)
This patch adds experimental MC-layer support for the [RISC-V Integrated
Matrix Extension](https://github.com/riscv/integrated-matrix-extension/releases/tag/riscv-isa-release-71c48b9-2026-05-17),
specifically the tile load/store extensions: `Zvvmtls` and `Zvvmttls`.
This PR:
- Adds the optional tile lambda operand syntax (`L1` through `L64`), and
related asm operand.
- Adds the `vmtl.v`, `vmts.v`, `vmttl.v` and `vmtts.v` instructions to
the MC layer.
- Modifies `parseMaskReg` to return `NoMatch` to allow overloaded
mnemonics to continue matching alternative optional operands, such as
parsing `vmtl.v v8, (a0), a1, L4` as the tile-lambda form instead of
failing by treating `L4` as a malformed mask operand. Real mask
registers missing .t, such as v0, still produce the existing diagnostic.
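The following simplified sketch shows why `NoMatch` matters for the optional trailing operand. `ParseStatus` here is a local three-state enum, loosely modeled on the MC parser's success / no-match / failure results. The parser functions are hypothetical. Accepting every value from `L1` to `L64` is also an assumption, since the extension may restrict which lambdas are legal:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Local three-state result, loosely modeled on the MC parser's statuses.
enum class ParseStatus { Success, NoMatch, Failure };

// Hypothetical mask parser: "v0.t" is a mask, a bare "v0" is a real mask
// register missing ".t" (diagnosed), anything else is simply not a mask.
ParseStatus parseMaskReg(const std::string &tok) {
  if (tok == "v0.t")
    return ParseStatus::Success;
  if (tok == "v0")
    return ParseStatus::Failure;
  return ParseStatus::NoMatch;
}

// Hypothetical tile-lambda parser accepting "L<n>" with 1 <= n <= 64.
ParseStatus parseTileLambda(const std::string &tok) {
  if (tok.size() < 2 || tok.size() > 3 || tok[0] != 'L')
    return ParseStatus::NoMatch;
  for (std::string::size_type i = 1; i < tok.size(); ++i)
    if (!std::isdigit(static_cast<unsigned char>(tok[i])))
      return ParseStatus::NoMatch;
  int n = std::stoi(tok.substr(1));
  return (n >= 1 && n <= 64) ? ParseStatus::Success : ParseStatus::NoMatch;
}

// Only a NoMatch from the mask parser lets the tile-lambda alternative run;
// a Failure stops parsing with the mask diagnostic.
ParseStatus parseOptionalOperand(const std::string &tok) {
  ParseStatus s = parseMaskReg(tok);
  if (s != ParseStatus::NoMatch)
    return s;
  return parseTileLambda(tok);
}
```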
[mlirbc] Add AffineMap serialization support (#191970)
Add binary bytecode encoding for AffineMapAttr, replacing the textual fallback.
AffineMap is encoded as numDims, numSymbols, and numResults, followed by the
result expressions. In the general case, each expression (AffineExpr) is
encoded as a recursive/prefix tree: a VarInt kind tag followed by kind-specific
data. To guard a bit more against malformed bytecode, it uses an iterative
parser for these.
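Below is a rough sketch of a prefix-tree encoding with an iterative decoder, restricted to add/mul/dim/symbol/constant nodes. The tag values, the flat `Node` layout, and the LEB128-style varint with zigzag constants are all assumptions made for illustration. MLIR bytecode's own VarInt format and the real AffineExpr kind set differ:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical serialized tags with fixed values.
enum Tag : uint64_t { kAdd = 0, kMul = 1, kDim = 2, kSym = 3, kConst = 4 };

// Flat node; lhs/rhs index into the node vector (-1 if unused).
struct Node {
  uint64_t tag;
  int64_t value = 0; // dim/symbol position or constant literal
  int lhs = -1, rhs = -1;
};

static bool isBinary(uint64_t tag) { return tag == kAdd || tag == kMul; }

static void writeVarInt(std::vector<uint8_t> &out, uint64_t v) {
  while (v >= 0x80) {
    out.push_back(static_cast<uint8_t>(v) | 0x80);
    v >>= 7;
  }
  out.push_back(static_cast<uint8_t>(v));
}

static bool readVarInt(const std::vector<uint8_t> &in, std::size_t &pos,
                       uint64_t &v) {
  v = 0;
  for (unsigned shift = 0; shift < 64; shift += 7) {
    if (pos >= in.size())
      return false; // truncated input
    uint8_t b = in[pos++];
    v |= static_cast<uint64_t>(b & 0x7f) << shift;
    if (!(b & 0x80))
      return true;
  }
  return false; // over-long varint
}

// Prefix (pre-order) encoding: tag, then payload or both children.
void encode(const std::vector<Node> &nodes, int idx, std::vector<uint8_t> &out) {
  const Node &n = nodes[idx];
  writeVarInt(out, n.tag);
  if (isBinary(n.tag)) {
    encode(nodes, n.lhs, out);
    encode(nodes, n.rhs, out);
  } else if (n.tag == kConst) {
    // Zigzag so small negative constants stay short.
    uint64_t u = static_cast<uint64_t>(n.value);
    writeVarInt(out, (u << 1) ^ (n.value < 0 ? ~uint64_t(0) : 0));
  } else {
    writeVarInt(out, static_cast<uint64_t>(n.value));
  }
}

// Iterative decoder: no recursion, plus a node budget that bounds the work
// on malformed input. The root ends up at index 0.
std::optional<std::vector<Node>> decode(const std::vector<uint8_t> &in,
                                        std::size_t &pos, std::size_t maxNodes) {
  std::vector<Node> nodes;
  std::vector<int> open;   // binary nodes still missing a child
  std::size_t pending = 1; // subtrees still owed
  while (pending > 0) {
    uint64_t tag, payload = 0;
    if (nodes.size() >= maxNodes || !readVarInt(in, pos, tag) || tag > kConst)
      return std::nullopt;
    Node n{tag};
    if (!isBinary(tag)) {
      if (!readVarInt(in, pos, payload))
        return std::nullopt;
      n.value = tag == kConst ? static_cast<int64_t>(payload >> 1) ^
                                    -static_cast<int64_t>(payload & 1)
                              : static_cast<int64_t>(payload);
    }
    int idx = static_cast<int>(nodes.size());
    nodes.push_back(n);
    if (!open.empty()) { // attach to the innermost unfinished parent
      Node &parent = nodes[open.back()];
      if (parent.lhs < 0) {
        parent.lhs = idx;
      } else {
        parent.rhs = idx;
        open.pop_back();
      }
    }
    --pending;
    if (isBinary(tag)) {
      open.push_back(idx);
      pending += 2;
    }
  }
  return nodes;
}
```

The decoder counts the subtrees still owed instead of recursing. Deeply nested or truncated input therefore fails cleanly rather than exhausting the stack.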
Common-case AffineMaps get a special-case encoding (it requires less space and
is easy to create without much higher maintenance needs). The ordering of the
serialized enum differs from AffineExprKind, as the latter has an expansion
point in the middle (new kinds can be added there) while the serialized
encoding needs to remain stable.
Updated the checked-in mlirbc file, since memref has a default affine map;
updating it pre-snap.
Assisted-by: Antigravity : Gemini
[lldb][test] Increase polling in TestInterruptThreadNames.py (#201554)
This test runs for a very long time on my machine (11s per variation),
and nearly all of this time is spent on the 10s sleep in this function.
There are two issues here:
1. It uses the (now outdated) logic that arm64 means we have a remote
Darwin device. This is no longer true these days as Macs also run on
arm64.
2. The polling duration of 1s is still very long, and the test will
still spend all its time just waiting for this 1s sleep. A 100ms sleep
that we poll in a loop should be slow enough.
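The test itself is Python. As a language-neutral illustration, here is a C++ sketch of polling with a short interval and an overall timeout (`pollUntil` is a hypothetical helper, not an lldb API):

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Poll `done` every `interval` until it returns true or `timeout` elapses.
// A short interval returns soon after the condition flips; the timeout
// still bounds the worst case.
bool pollUntil(const std::function<bool()> &done,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds interval) {
  auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    if (done())
      return true;
    if (std::chrono::steady_clock::now() >= deadline)
      return false;
    std::this_thread::sleep_for(interval);
  }
}
```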
[lldb][test] Assume clang supports -gmodules (#201333)
We currently spend 50ms in most dotest invocations to check if clang
supports `-gmodules`. The expensive part of this check is creating the
clang process to run `clang --help`.
`-gmodules` was added 11 years ago and is present in any compiler that
has even a remote chance of supporting the rest of our test suite. This
patch just assumes that our compiler supports `-gmodules` if it is clang.
[lldb][test] Increase polling frequency in ProcessAttach (#201532)
The test_attach_to_process_by_id_correct_executable_offset subtest
requires us to hit a breakpoint in an attached process. For this we
implement a loop that hits the breakpoint location every 2 seconds.
This patch increases the rate at which we hit this breakpoint to once
every 50ms. With a 2s interval, this test waits nearly 2 seconds for the
first breakpoint hit, even on a fast system. With a 50ms interval, this
subtest passed immediately.
[lldb][test] Make TestInterruptThreadNames not depend on debug info (#201553)
This test only reads the pthread names, which don't depend on any debug
info.
This halves the runtime of this very long test from 22s to 11s.
[AMDGPU] In `LowerDYNAMIC_STACKALLOC`, hoist the `readfirstlane` up one instruction (#201528)
Instead of:
```
$max_size_vgpr = wave_reduction_umax($vgpr_alloca_size)
$sgpr_newsp = readfirstlane($max_size_vgpr + $sgpr_sp)
```
Hoist the readfirstlane up to perform the addition using scalar
registers:
```
$max_size_sgpr = readfirstlane(wave_reduction_umax($vgpr_alloca_size))
$sgpr_newsp = $max_size_sgpr + $sgpr_sp
```
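The toy model below shows why the hoist is legal. It assumes a wave is represented as a vector of per-lane values. The reduction leaves the same maximum in every lane, so reading lane 0 before or after adding the uniform SP gives the same result, including under unsigned wraparound. All names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy wave model: a "VGPR" holds one value per lane (assumed non-empty).
using VGPR = std::vector<uint32_t>;

// wave_reduction_umax: every lane receives the maximum over all lanes.
VGPR waveReduceUMax(const VGPR &v) {
  uint32_t m = *std::max_element(v.begin(), v.end());
  return VGPR(v.size(), m);
}

// readfirstlane: copy lane 0 into a scalar.
uint32_t readFirstLane(const VGPR &v) { return v[0]; }

// Original order: add the scalar SP in every lane, then read lane 0.
uint32_t newSPBefore(const VGPR &allocaSize, uint32_t sp) {
  VGPR sum = waveReduceUMax(allocaSize);
  for (uint32_t &x : sum)
    x += sp; // per-lane (vector) add
  return readFirstLane(sum);
}

// Hoisted order: read lane 0 of the already-uniform maximum first, so the
// addition becomes a single scalar add.
uint32_t newSPAfter(const VGPR &allocaSize, uint32_t sp) {
  uint32_t maxSize = readFirstLane(waveReduceUMax(allocaSize));
  return maxSize + sp; // scalar add
}
```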
[libc++] Drop transitive includes by default (#195509)
This patch removes the unused transitive includes by default.
`_LIBCPP_KEEP_TRANSITIVE_INCLUDES_LLVM23` can be defined to keep the
transitive includes around for an easier transition. The macro will be
removed in LLVM 24.
This patch implements
https://discourse.llvm.org/t/rfc-remove-unused-transitive-includes-from-the-libc-headers/90157
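For code that breaks, the usual fix is to include every header whose facilities are used, instead of relying on transitive includes. During the transition, `_LIBCPP_KEEP_TRANSITIVE_INCLUDES_LLVM23` can be defined, e.g. with `-D_LIBCPP_KEEP_TRANSITIVE_INCLUDES_LLVM23` on the compiler command line. A minimal example of the include-what-you-use style (the function is hypothetical):

```cpp
// Include what you use: <string> and <algorithm> are named explicitly
// rather than assumed to arrive through <vector>.
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Sort the words and join them with commas.
std::string joinSorted(std::vector<std::string> words) {
  std::sort(words.begin(), words.end());
  std::string out;
  for (const std::string &w : words) {
    if (!out.empty())
      out += ',';
    out += w;
  }
  return out;
}
```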
[offload][OpenMP] Fix record replay when no memory is used
Programs that do not use any memory (e.g., no mappings) were failing because
we were trying to execute zero-size transfers.
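The commit does not show the failing call. Here is a hypothetical sketch of the guard, with a host-side `std::memcpy` standing in for the device transfer; `replayTransfer` is an invented name:

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Counts issued transfers, for illustration.
int transfersIssued = 0;

// Hypothetical replay step: copy a recorded memory snapshot back into the
// "device" buffer. A program with no mappings records an empty snapshot,
// so the transfer is skipped instead of being issued with size zero.
void replayTransfer(std::vector<unsigned char> &deviceBuf,
                    const std::vector<unsigned char> &snapshot) {
  if (snapshot.empty())
    return; // nothing recorded, nothing to transfer
  deviceBuf.resize(snapshot.size());
  std::memcpy(deviceBuf.data(), snapshot.data(), snapshot.size());
  ++transfersIssued;
}
```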
[mlir] Fix crash in test type converter for 1->N result conversion (#201738)
Use `results.append` instead of `results.assign`, preserving previous
results.
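A minimal illustration of the difference follows. `std::vector` stands in for LLVM's `SmallVector` (whose `assign` and `append` behave the same way), and `int` stands in for `mlir::Type`. The two functions are hypothetical:

```cpp
#include <cassert>
#include <vector>

using Type = int; // stand-in for mlir::Type

// Wrong for 1->N: replaces whatever earlier conversions already produced.
void convertWithAssign(std::vector<Type> &results,
                       const std::vector<Type> &expanded) {
  results.assign(expanded.begin(), expanded.end());
}

// Right for 1->N: appends the N new types after the existing results.
void convertWithAppend(std::vector<Type> &results,
                       const std::vector<Type> &expanded) {
  results.insert(results.end(), expanded.begin(), expanded.end());
}
```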
Fixes https://github.com/llvm/llvm-project/issues/201521
[X86] X86FixupInstTuning - fold VPERM2x128 -> VINSERTx128 when shuffling lower xmm half ymm sources (#201618)
VINSERTx128 is never slower than VPERM2x128 and notably quicker on some
targets (btver2, znver1, e-cores, etc.).
Shuffle lowering avoids some VINSERT patterns for AVX targets as it can
affect folding/commutation - but by the time we get to the fixup passes,
these are all done and we can safely convert to VINSERTF128/I128.
There are more variants of the VPERM2 immediate mask that could be folded,
but it's incredibly difficult to hit them as they're easily commuted.
I hit this while working on #199445.
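Below is a sketch that decodes the VPERM2x128 immediate (field layout per the Intel SDM) and matches the both-lower-halves case named in the title. `InsertFold` and `matchLowHalvesInsert` are hypothetical. Operands are numbered in Intel-syntax source order. This is not the X86FixupInstTuning code itself:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Which source operand (1 or 2) a VINSERTx128 $1 rewrite would use as the
// base, and which one supplies the inserted xmm.
struct InsertFold {
  int baseOperand;   // keeps its lower 128 bits in place
  int insertOperand; // its lower xmm half goes into the upper 128 bits
};

// VPERM2x128 imm8: bits [1:0] pick the low result half, bits [5:4] the high
// half (0 = op1.lo, 1 = op1.hi, 2 = op2.lo, 3 = op2.hi); bits 3 and 7 zero
// the matching half. Only result = { opA.lo, opB.lo } is matched here.
std::optional<InsertFold> matchLowHalvesInsert(uint8_t imm) {
  if (imm & 0x88)
    return std::nullopt; // a zeroed half is not an insert
  unsigned lo = imm & 0x3, hi = (imm >> 4) & 0x3;
  if ((lo & 1) || (hi & 1))
    return std::nullopt; // an upper source half is involved
  return InsertFold{lo == 0 ? 1 : 2, hi == 0 ? 1 : 2};
}
```

For example, `0x20` yields `{op1.lo, op2.lo}`. That is `VINSERTx128 $1` of op2's xmm into op1.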
[SeparateConstOffsetFromGEP] Decompose xor constant operand when possible (#195830)
It may be desirable to fold constants directly into the addressing mode
when computing an address. While lowering GEPs and looking for a
constant to extract among the indexes, take into account constants which
are xor expressions as well. When some bits of the constant operand of
the xor are known-zero in the base operand, then, for those specific
bits (disjoint bits), xor and additions behave alike. Such bits may be
extracted from the xor, and are those that can contribute to the final
GEP offset.
Proofs: https://alive2.llvm.org/ce/z/JtmXsu.
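Here is a small sketch of the decomposition on plain integers. It assumes the known-zero mask of the base is already available; the pass presumably derives it from known-bits analysis. `XorSplit`, `splitXorConstant`, and `checkIdentity8` are hypothetical names. The exhaustive 8-bit check exercises the identity `(base ^ C) == (base ^ R) + D`, where `D = C & KnownZero(base)` and `R = C & ~KnownZero(base)`:

```cpp
#include <cassert>
#include <cstdint>

// Split the xor constant C into the bits known to be zero in the base
// (disjoint bits, where xor and add agree) and the rest.
struct XorSplit {
  uint64_t disjoint;  // C & knownZero(base): can become an added offset
  uint64_t remainder; // bits that must stay in the xor
};

XorSplit splitXorConstant(uint64_t c, uint64_t knownZeroInBase) {
  return {c & knownZeroInBase, c & ~knownZeroInBase};
}

// Check (base ^ c) == (base ^ remainder) + disjoint for every 8-bit base
// whose knownZero bits really are zero.
bool checkIdentity8(uint8_t c, uint8_t knownZero) {
  XorSplit s = splitXorConstant(c, knownZero);
  for (unsigned base = 0; base < 256; ++base) {
    if (base & knownZero)
      continue; // violates the known-zero assumption
    uint8_t lhs = static_cast<uint8_t>(base ^ c);
    uint8_t rhs = static_cast<uint8_t>((base ^ s.remainder) + s.disjoint);
    if (lhs != rhs)
      return false;
  }
  return true;
}
```

The identity holds because `base ^ R` still has zeros in every bit of `D`, so adding `D` produces no carries.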
Co-authored-by: Sumanth Gundapaneni <sumanth.gundapaneni at amd.com>
[clangd][modules] Provide correct context to ModulesBuilder::hasRequiredModules() call (#201419)
To make the command mangler use compile command edits from the
configuration, we need to provide the correct context to it.
Without this patch, compile command edits declared in the .clangd file are
not used during the required modules check, which can lead to compile errors
and a false negative `ModulesBuilder::hasRequiredModules()` result.
This PR addresses the problem described here:
https://github.com/llvm/llvm-project/pull/200001#issuecomment-4590514342