LLVM/project 6d4cd34llvm/docs/GlobalISel IRTranslator.rst, llvm/lib/CodeGen LowLevelTypeUtils.cpp

[GlobalISel] Support the byte type in the IRTranslator (#196125)

> [!NOTE]
> Stacked: this PR is the base of a 2-PR stack.
> - **#196125 (this PR)** — GlobalISel byte-type support (lands first)
> - **#177908** — LoadStoreVectorizer mixed-type support (depends on
this; rebased on top)

Mirror SelectionDAG's behaviour by treating byte as integer at the
IR-to-MIR boundary:
- `getLLTForType` maps ByteType to `LLT::integer(N)`, matching the
byte->integer EVT mapping in ValueTypes.cpp.
- `translate(Constant)` handles ConstantByte by routing through
buildConstant with the underlying APInt.
- `translateBitCast` redirects only **scalar** byte<->ptr crossings to
G_INTTOPTR / G_PTRTOINT (the well-typed MIR shape for that boundary).
Vector byte<->ptr (e.g. `<N x b32>` -> ptr produced by mixed-type load
coalescing in #177908) and other legacy ptr/non-ptr IR bitcasts (AMDGPU
iN<->p3 kernarg packing, etc.) keep their historical G_BITCAST lowering

    [2 lines not shown]
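
The `translateBitCast` rule above can be sketched as a small decision function. This is an illustrative model only, assuming the visible part of the message is complete; the `Ty` class and `select_bitcast_opcode` are hypothetical names, not IRTranslator API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ty:
    """Hypothetical stand-in for an IR type: a kind plus a lane count."""
    kind: str       # "byte", "int", "ptr", ...
    lanes: int = 1  # 1 means scalar


def select_bitcast_opcode(src, dst):
    """Pick the generic opcode for an IR bitcast, per the rule described above."""
    scalar = src.lanes == 1 and dst.lanes == 1
    if scalar and src.kind == "byte" and dst.kind == "ptr":
        return "G_INTTOPTR"
    if scalar and src.kind == "ptr" and dst.kind == "byte":
        return "G_PTRTOINT"
    # Vector byte<->ptr and legacy ptr/non-ptr casts keep the old lowering.
    return "G_BITCAST"
```

For example, `select_bitcast_opcode(Ty("byte"), Ty("ptr"))` selects `G_INTTOPTR`, while a `<N x b32>` source modelled as `Ty("byte", lanes=2)` still selects `G_BITCAST`.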
DeltaFile
+793-0llvm/test/CodeGen/Generic/GlobalISel/irtranslator-byte-type.ll
+166-0llvm/test/CodeGen/AMDGPU/GlobalISel/amdgpu-irtranslator.ll
+22-0llvm/docs/GlobalISel/IRTranslator.rst
+19-2llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+7-0llvm/lib/CodeGen/LowLevelTypeUtils.cpp
+6-0llvm/test/CodeGen/PowerPC/fast-isel-cmp-imm.ll
+1,013-24 files not shown
+1,034-210 files

LLVM/project 25e94daflang/lib/Optimizer/Transforms DebugTypeGenerator.cpp DebugTypeGenerator.h, flang/test/Transforms debug-char-type-2.fir

[flang][debug] Fix DIStringType size for arrays of assumed-length chars. (#201649)

When generating DWARF for an assumed-shape array whose element type is
an assumed-length character, `convertBoxedSequenceType` called
`convertType` for the element, which in turn called
`convertCharacterType` with `hasDescriptor=false`. With no descriptor
and a non-constant length, none of the branches that set `sizeInBits` or
produce a length expression were taken, so the resulting
`DIStringTypeAttr` had `sizeInBits` equal to 0 and no
`stringLengthExp`, leaving GDB unable to determine the string length or
display the array elements.

Fix this by detecting a non-constant-length character element in
`convertBoxedSequenceType` and calling `convertCharacterType` directly
with `hasDescriptor=true`. This generates the correct `stringLengthExp`
that reads the element byte-size from the descriptor. A
`genStringLocation` parameter (default true) is also added to suppress
the string location expression for the element type, since the data
location of array elements is already provided by the enclosing array.

Fixes https://github.com/llvm/llvm-project/issues/113895
DeltaFile
+23-6flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp
+19-0flang/test/Transforms/debug-char-type-2.fir
+2-1flang/lib/Optimizer/Transforms/DebugTypeGenerator.h
+44-73 files

LLVM/project d7d2a55llvm/lib/Target/AArch64 AArch64LoadStoreOptimizer.cpp

[AArch64] Add an early return and de-indent a long indented block (NFC) (#201345)

As per LLVM coding standards recommendation in [1].

[1]:
https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code
DeltaFile
+38-38llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+38-381 files

LLVM/project 70d84ablldb/docs python_api_enums.md, lldb/docs/use variable.rst variable.md

Merge branch 'main' into users/arsenm/clang/amdgpu-openmp-accept-arch-amdgpu-name
DeltaFile
+4,489-13lldb/source/Utility/RISCV_DWARF_Registers.h
+4,473-0lldb/source/Plugins/Process/Utility/lldb-riscv-register-enums.h
+4,253-32lldb/source/Plugins/Process/Utility/RegisterInfos_riscv32.h
+3,288-0lldb/docs/python_api_enums.md
+0-1,531lldb/docs/use/variable.rst
+1,496-0lldb/docs/use/variable.md
+17,999-1,576636 files not shown
+36,752-14,187642 files

LLVM/project 6199ca9clang/test/OpenMP amdgpu-arch-compat.c

Remove blank line
DeltaFile
+0-1clang/test/OpenMP/amdgpu-arch-compat.c
+0-11 files

LLVM/project 643eec3llvm/utils/gdb-scripts prettyprinters.py

[prettyprinters] Fix syntax error introduced by 17f85f467249. (#201359)
DeltaFile
+3-2llvm/utils/gdb-scripts/prettyprinters.py
+3-21 files

LLVM/project 982dc4fllvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 vector-interleaved-store-i8-stride-4.ll x86-interleaved-access.ll

[X86] combineINSERT_SUBVECTOR - peek through BITCAST and EXTRACT_SUBVECTOR when trying to find shuffle combine candidates (#201781)

Helps with some expanded CONCAT_VECTORS cases where both halves came
from wider shuffles.

More yak shaving for #199445
DeltaFile
+18-26llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-4.ll
+8-1llvm/lib/Target/X86/X86ISelLowering.cpp
+3-5llvm/test/CodeGen/X86/x86-interleaved-access.ll
+4-4llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-3.ll
+1-2llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll
+34-385 files

LLVM/project 0d566c6llvm/lib/Transforms/Utils SimplifyCFG.cpp, llvm/test/Transforms/SimplifyCFG phi-undef-loadstore.ll

[SimplifyCFG] Look at all uses when checking phi incoming for UB (#200164)

passingValueIsAlwaysUndefined only looks at the first use of the phi
that has a UB-candidate opcode. If that use is in a different block, the
function gives up, even when another use in the same block would prove
UB. Use-list order is not guaranteed, so this happens in practice.

Move the same-block check into the find_if lambda so the scan keeps
going past cross-block uses.
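
A minimal Python model of the difference, not the actual C++ change: `Use`, `UB_CANDIDATES` and both helper functions are hypothetical names, and the opcode set is an illustrative subset chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Use:
    """Hypothetical stand-in for one use of the phi."""
    opcode: str  # e.g. "load", "store", "add"
    block: str   # basic block containing the user


UB_CANDIDATES = {"load", "store"}  # illustrative subset only


def first_candidate_then_block_check(uses, phi_block):
    """Old shape: take the first candidate use, give up if it is cross-block."""
    use = next((u for u in uses if u.opcode in UB_CANDIDATES), None)
    return use if use is not None and use.block == phi_block else None


def first_same_block_candidate(uses, phi_block):
    """New shape: the same-block test is part of the search predicate."""
    return next(
        (u for u in uses if u.opcode in UB_CANDIDATES and u.block == phi_block),
        None,
    )
```

With the use list `[Use("load", "other"), Use("store", "bb")]` and phi block `"bb"`, the first function returns `None` while the second finds the store, which is the order-dependence the patch removes.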
DeltaFile
+39-0llvm/test/Transforms/SimplifyCFG/phi-undef-loadstore.ll
+7-9llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+46-92 files

LLVM/project 7d987f8clang/lib/StaticAnalyzer/Checkers/WebKit PtrTypesSemantics.cpp, clang/test/Analysis/Checkers/WebKit nodelete-annotation.cpp

[alpha.webkit.NoDeleteChecker] Allow no-delete default constructors (#201544)

This PR fixes a bug in TrivialFunctionAnalysis where a default
constructor without an explicit body or definition was treated as not
"trivial". The fix allows the function body to be missing when
isThisDeclarationADefinition is true.

---------

Co-authored-by: Balazs Benics <benicsbalazs at gmail.com>
DeltaFile
+58-0clang/test/Analysis/Checkers/WebKit/nodelete-annotation.cpp
+27-3clang/lib/StaticAnalyzer/Checkers/WebKit/PtrTypesSemantics.cpp
+85-32 files

LLVM/project 9c5dcfc. .git-blame-ignore-revs

[NFC][clang] Add pragma comment formatting commit to blame ignore list (#201765)

Add the previously landed formatting-only commit for the pragma comment
kind StringSwitch to `.git-blame-ignore-revs`.

This keeps git blame useful across the NFC formatting change.

Formatting commit:
511d2e40ddeacf25f403b40ed73a41d1dea1b636

Co-authored-by: Tony Varghese <tony.varghese at ibm.com>
DeltaFile
+3-0.git-blame-ignore-revs
+3-01 files

LLVM/project 38b402fclang/test/OpenMP amdgpu-arch-compat.c, llvm/include/llvm/Frontend/OpenMP OMPKinds.def

OpenMP: Accept amdgpu name in arch directive

Accept amdgpu as an alias for amdgcn as part of the general
trend of preferring the amdgpu name. This is so the name is
consistent in the future when the triple arch name changes.
DeltaFile
+12-0clang/test/OpenMP/amdgpu-arch-compat.c
+1-0llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+13-02 files

LLVM/project e379ef3llvm/lib/Transforms/Scalar StraightLineStrengthReduce.cpp

[SLSR] Avoid repeatedly calling canReuseInstruction for the same Basis (#196545)

`canReuseInstruction` only depends on `Basis`, but runs for each
`(Basis, C)` pair. This patch moves the check earlier in the pass to
remove the repeated call.

Assisted-by: Claude Code
DeltaFile
+16-11llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
+16-111 files

LLVM/project 76fa5fcclang/lib/Headers __clang_hip_runtime_wrapper.h, clang/test/Headers hip-constexpr-cmath.hip

[Clang][HIP] Include `__clang_cuda_math_forward_declares.h` before `<cmath>`

This patch should fix the following error on Windows: https://github.com/ggml-org/llama.cpp/issues/22570

In HIP, constexpr functions are treated as both __host__ and __device__.

A new version of the MS STL shipped with the build tools version
14.51.36231 has constexpr definitions for some cmath functions when the
compiler in use is Clang.

These definitions conflict with the __device__ declarations we provide
in the header wrappers.

There is a workaround for this: It is possible to overload constexpr
functions **that are defined in a system header** by declaring a __device__
version before.

By moving `__clang_cuda_math_forward_declares.h` before `<cmath>` is
included we're able to benefit from this behaviour.
DeltaFile
+6-1clang/lib/Headers/__clang_hip_runtime_wrapper.h
+1-1clang/test/Headers/hip-constexpr-cmath.hip
+7-22 files

LLVM/project 687162fclang/test/Headers hip-constexpr-cmath.hip

[Pre-commit test]
DeltaFile
+70-0clang/test/Headers/hip-constexpr-cmath.hip
+70-01 files

LLVM/project d37537cmlir/include/mlir/Dialect/Tosa/IR TosaTypesBase.td, mlir/lib/Dialect/Tosa/IR TosaOps.cpp

[mlir][tosa] Allow numeric values to be specified for mxint8 constants (#200762)

This commit uses the DenseElementTypeInterface to allow signless numeric
values to be specified for mxint8 constants by supplying `i8` values.
This is more user-friendly than the previous hex representation.
DeltaFile
+23-0mlir/lib/Dialect/Tosa/IR/TosaOps.cpp
+16-0mlir/test/Dialect/Tosa/verifier.mlir
+9-2mlir/test/Dialect/Tosa/ops.mlir
+7-1mlir/include/mlir/Dialect/Tosa/IR/TosaTypesBase.td
+55-34 files

LLVM/project 6ab6b80flang/lib/Semantics resolve-directives.cpp, flang/test/Semantics/OpenMP private03.f90

[Flang][OpenMP]add semantic check for linear clause with statement function variables (#199743)

### **Description**

1. This patch adds a missing semantic check for the LINEAR clause.
2. OpenMP treats LINEAR variables similarly to PRIVATE variables.
Variables used inside statement function expressions are not allowed to
be privatized, but Flang was not checking this for LINEAR.
3. The existing privatization check already handled PRIVATE,
FIRSTPRIVATE, and LASTPRIVATE. This patch extends the same check to
LINEAR.

Fixes : [199660](https://github.com/llvm/llvm-project/issues/199660)

### **Reproducer**
```
subroutine test()
  integer :: pi, r, f, x
  f(r) = pi * r + x

    [21 lines not shown]
DeltaFile
+9-1flang/test/Semantics/OpenMP/private03.f90
+4-1flang/lib/Semantics/resolve-directives.cpp
+13-22 files

LLVM/project 29f6956llvm/docs LoopFusion.rst

[LoopFusion][docs][NFC] Document atomic accesses as a fusion blocker (#201775)

Loops containing atomic accesses are now rejected outright, mirroring
the volatile blocker. Update the eligibility sections to match.
DeltaFile
+17-13llvm/docs/LoopFusion.rst
+17-131 files

LLVM/project 0f18088llvm/lib/Target/RISCV RISCVInstrInfoZvvm.td, llvm/lib/Target/RISCV/AsmParser RISCVAsmParser.cpp

[RISCV][MC] Add experimental `Zvvmtls` and `Zvvmttls` support (#198229)

This patch adds experimental MC-layer support for the [RISC-V Integrated
Matrix
Extension](https://github.com/riscv/integrated-matrix-extension/releases/tag/riscv-isa-release-71c48b9-2026-05-17),
specifically the tile load/store extensions: `Zvvmtls` and `Zvvmttls`.

This PR:

- Adds the optional tile lambda operand syntax (`L1` through `L64`), and
related asm operand.
- Adds the `vmtl.v`, `vmts.v`, `vmttl.v` and `vmtts.v` instructions to
the MC layer.
- Modifies `parseMaskReg` to return `NoMatch` to allow overloaded
mnemonics to continue matching alternative optional operands, such as
parsing `vmtl.v v8, (a0), a1, L4` as the tile-lambda form instead of
failing by treating `L4` as a malformed mask operand. Real mask
registers missing .t, such as v0, still produce the existing diagnostic.
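
The three-state parsing behaviour in the last bullet can be modelled roughly as below. Everything here is a hypothetical sketch: the function names, the diagnostic text, and the assumption that any integer from 1 to 64 is a valid lambda are illustrative, not taken from RISCVAsmParser.cpp.

```python
import re
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    NO_MATCH = "no_match"  # not this operand kind; let another parser try
    FAILURE = "failure"    # recognised but malformed; report a diagnostic


def parse_mask_reg(token):
    if token == "v0.t":
        return Status.SUCCESS, "mask"
    if token == "v0":
        # A real mask register missing .t is still a hard error.
        return Status.FAILURE, "placeholder diagnostic: expected v0.t"
    return Status.NO_MATCH, None


def parse_tile_lambda(token):
    m = re.fullmatch(r"L(\d+)", token)
    if m and 1 <= int(m.group(1)) <= 64:  # accepted range is an assumption
        return Status.SUCCESS, ("lambda", int(m.group(1)))
    return Status.NO_MATCH, None


def parse_optional_operand(token):
    """Try each optional-operand parser until one does more than NO_MATCH."""
    for parser in (parse_mask_reg, parse_tile_lambda):
        status, value = parser(token)
        if status is not Status.NO_MATCH:
            return status, value
    return Status.NO_MATCH, None
```

In this model `L4` falls through the mask parser and is accepted as a tile lambda, while `v0` stops at the mask parser with a failure.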
DeltaFile
+114-12llvm/lib/Target/RISCV/RISCVInstrInfoZvvm.td
+67-0llvm/test/MC/RISCV/rvv/zvvmtls.s
+67-0llvm/test/MC/RISCV/rvv/zvvmttls.s
+34-2llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
+32-0llvm/test/MC/RISCV/rvv/zvvmtls-invalid.s
+32-0llvm/test/MC/RISCV/rvv/zvvmttls-invalid.s
+346-148 files not shown
+399-1614 files

LLVM/project b735680mlir/include/mlir/IR BuiltinDialectBytecode.td, mlir/lib/IR BuiltinDialectBytecode.cpp

[mlirbc] Add AffineMap serialization support  (#191970)

Add binary bytecode encoding for AffineMapAttr, replacing the textual fallback.
AffineMap is encoded as numDims, numSymbols, numResults, followed by the result
expressions. Each expression (an AffineExpr) is encoded in the general case
as a recursive/prefix tree with a VarInt kind tag followed by kind-specific
data. To guard a bit more against malformed bytecode it uses an iterative
parser for these.
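
A rough sketch of a prefix encoding with an iterative (explicit-stack) decoder, to illustrate the shape described above. The kind tags, the tuple representation, and the use of a plain list of ints in place of a VarInt byte stream are all assumptions of this sketch, not the format implemented in BuiltinDialectBytecode.cpp.

```python
# Hypothetical kind tags; the real serialized enum is not reproduced here.
TAGS = {"dim": 0, "sym": 1, "const": 2, "add": 3, "mul": 4}
KINDS = {tag: kind for kind, tag in TAGS.items()}
BINARY = {"add", "mul"}


def encode_expr(expr):
    """Emit the kind tag first, then operands (prefix order), without recursion."""
    out, stack = [], [expr]
    while stack:
        node = stack.pop()
        out.append(TAGS[node[0]])
        if node[0] in BINARY:
            stack.append(node[2])  # push rhs first so lhs is emitted first
            stack.append(node[1])
        else:
            out.append(node[1])    # leaf payload: position or constant value
    return out


def decode_expr(words, pos=0):
    """Decode one expression starting at pos; return (expr, next_pos)."""
    pending = []  # binary nodes still waiting for operands
    while True:
        if pos >= len(words):
            raise ValueError("truncated expression")
        kind = KINDS.get(words[pos])
        pos += 1
        if kind is None:
            raise ValueError("unknown kind tag")
        if kind in BINARY:
            pending.append([kind])
            continue
        if pos >= len(words):
            raise ValueError("truncated leaf")
        value = (kind, words[pos])
        pos += 1
        # Attach the finished value to its parent, folding completed parents.
        while pending:
            pending[-1].append(value)
            if len(pending[-1]) < 3:
                break  # parent still needs its rhs; keep reading
            value = tuple(pending.pop())
        else:
            return value, pos  # nothing pending: this is the root


def encode_map(num_dims, num_syms, results):
    out = [num_dims, num_syms, len(results)]
    for expr in results:
        out.extend(encode_expr(expr))
    return out


def decode_map(words):
    if len(words) < 3:
        raise ValueError("truncated header")
    num_dims, num_syms, num_results = words[:3]
    pos, results = 3, []
    for _ in range(num_results):
        expr, pos = decode_expr(words, pos)
        results.append(expr)
    return num_dims, num_syms, results
```

Because the decoder keeps its own stack, a deeply nested or truncated input raises `ValueError` rather than exhausting the call stack.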

Common-case AffineMaps get a special-case encoding (it requires less space and
is easy to create without much extra maintenance). The ordering of the enum
serialized differs from AffineExprKind as the latter has an expansion point in
the middle (new kinds can be added there) while the serialized encoding needs
to remain stable.

Updated the checked in mlirbc file as memref has a default affinemap, so
updating it pre snap.

Assisted-by: Antigravity : Gemini
DeltaFile
+308-0mlir/lib/IR/BuiltinDialectBytecode.cpp
+63-0mlir/test/Dialect/Builtin/Bytecode/attrs.mlir
+10-0mlir/include/mlir/IR/BuiltinDialectBytecode.td
+0-0mlir/test/Dialect/Builtin/Bytecode/builtin_fixed_0.mlirbc
+381-04 files

LLVM/project ec8c818lldb/unittests/SymbolFile/DWARF XcodeSDKModuleTests.cpp

[lldb][NFC] Don't use C++20 designated initializer (#201075)

This source triggers the `-Wc++20-designator` warning as we're still
using C++17.
DeltaFile
+41-41lldb/unittests/SymbolFile/DWARF/XcodeSDKModuleTests.cpp
+41-411 files

LLVM/project fdfd1c1lldb/test/API/macosx/thread-names TestInterruptThreadNames.py

[lldb][test] Increase polling in TestInterruptThreadNames.py (#201554)

This test runs for a very long time on my machine (11s per variation),
and nearly all of this time is spent on the 10s sleep in this function.

There are two issues here:

1. It uses the (now outdated) logic that arm64 means we have a remote
Darwin device. This is no longer true these days as Macs also run on
arm64.

2. The polling duration of 1s is still very long, and the test will
still spend all its time just waiting for this 1s sleep. A 100ms sleep
that we poll in a loop should be slow enough.
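
A generic version of such a polling loop might look like this. It is a sketch only; `wait_until` and its parameters are hypothetical and not the helper used in the test.

```python
import time


def wait_until(predicate, timeout_s=10.0, interval_s=0.1):
    """Poll predicate every interval_s seconds until it holds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

With a short interval the caller returns soon after the condition becomes true, instead of always paying for one long sleep.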
DeltaFile
+2-9lldb/test/API/macosx/thread-names/TestInterruptThreadNames.py
+2-91 files

LLVM/project baccad7lldb/packages/Python/lldbsuite/support gmodules.py

[lldb][test] Assume clang supports -gmodules (#201333)

We currently spend 50ms in most dotest invocations to check if clang
supports `-gmodules`. The expensive part of this check is creating the
clang process to run `clang --help`.

`-gmodules` was added 11 years ago and is present in any compiler that
has even a remote chance of supporting the rest of our test suite. This
patch just assumes that our compiler supports -gmodules if it is clang.
DeltaFile
+1-6lldb/packages/Python/lldbsuite/support/gmodules.py
+1-61 files

LLVM/project 7b9435blldb/test/API/commands/process/attach main.cpp

[lldb][test] Increase polling frequency in ProcessAttach (#201532)

The test_attach_to_process_by_id_correct_executable_offset subtest
requires us to hit a breakpoint in an attached process. For this we
implement a loop that hits the breakpoint location every 2 seconds.

This patch reduces the interval between breakpoint hits to 50ms.
The reason is that a 2s interval means that this test is waiting on any
fast system for nearly 2 seconds on the first breakpoint hit. With a
50ms interval this subtest passed immediately.
DeltaFile
+7-11lldb/test/API/commands/process/attach/main.cpp
+7-111 files

LLVM/project 59bdd5blldb/test/API/macosx/thread-names TestInterruptThreadNames.py

[lldb][test] Make TestInterruptThreadNames not depend on debug info (#201553)

This test only reads the pthread names, which don't depend on any debug
info.

This halves the runtime of this very long test from 22s to 11s.
DeltaFile
+2-0lldb/test/API/macosx/thread-names/TestInterruptThreadNames.py
+2-01 files

LLVM/project 53e3e24llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU dynamic_stackalloc.ll amdgpu-cs-chain-fp-nosave.ll

[AMDGPU] In `LowerDYNAMIC_STACKALLOC`, hoist the `readfirstlane` up one instruction (#201528)

Instead of:

```
$max_size_vgpr = wave_reduction_umax($vgpr_alloca_size)
$sgpr_newsp = readfirstlane($max_size_vgpr + $sgpr_sp)
```

Hoist the readfirstlane up to perform the addition using scalar
registers:

```
$max_size_sgpr = readfirstlane(wave_reduction_umax($vgpr_alloca_size))
$sgpr_newsp = $max_size_sgpr + $sgpr_sp
```
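
A toy Python model of why the two orders agree, under stated assumptions: all lanes are active, the reduction leaves the same maximum in every lane, and values are 32-bit. The function names mirror the pseudocode above but are hypothetical, and the real lowering has details this does not model.

```python
MASK32 = 0xFFFFFFFF


def wave_reduction_umax(lane_values):
    """Model: every lane ends up holding the unsigned max across the wave."""
    m = max(lane_values)
    return [m] * len(lane_values)


def readfirstlane(lane_values):
    """Model: return the first lane's value as a scalar (all lanes active)."""
    return lane_values[0]


def new_sp_vector_add(alloca_sizes, sp):
    # Before: add in vector registers, then read the first lane.
    max_size = wave_reduction_umax(alloca_sizes)
    return readfirstlane([(v + sp) & MASK32 for v in max_size])


def new_sp_scalar_add(alloca_sizes, sp):
    # After: read the first lane, then add using scalar values.
    max_size = readfirstlane(wave_reduction_umax(alloca_sizes))
    return (max_size + sp) & MASK32
```

Since the reduced value is uniform across lanes in this model, both functions return the same new stack pointer.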
DeltaFile
+180-210llvm/test/CodeGen/AMDGPU/dynamic_stackalloc.ll
+36-49llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-fp-nosave.ll
+5-7llvm/test/CodeGen/AMDGPU/llvm.sponentry.ll
+5-6llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+226-2724 files

LLVM/project ce5b2e8libcxx/docs/ReleaseNotes 23.rst, libcxx/include module.modulemap.in functional

[libc++] Drop transitive includes by default (#195509)

This patch removes the unused transitive includes by default.
`_LIBCPP_KEEP_TRANSITIVE_INCLUDES_LLVM23` can be defined to keep the
transitive includes around for an easier transition. The macro will be
removed in LLVM 24.

This patch implements
https://discourse.llvm.org/t/rfc-remove-unused-transitive-includes-from-the-libc-headers/90157
DeltaFile
+10-2libcxx/include/module.modulemap.in
+4-4libcxx/include/functional
+2-5libcxx/include/utility
+1-5libcxx/test/std/utilities/memory/util.smartptr/util.smartptr.weak/util.smartptr.weak.const/pr40459.pass.cpp
+3-3libcxx/include/any
+6-0libcxx/docs/ReleaseNotes/23.rst
+26-1983 files not shown
+128-11789 files

LLVM/project 004eac2offload/libomptarget omptarget.cpp, offload/plugins-nextgen/common/src RecordReplay.cpp

[offload][OpenMP] Fix record replay when no memory is used

Programs that do not use any memory (e.g., no mappings) were failing because
we were trying to execute zero-size transfers.
DeltaFile
+18-12offload/libomptarget/omptarget.cpp
+26-0offload/test/tools/omp-kernel-replay/record-replay-empty-memory.cpp
+13-9offload/plugins-nextgen/common/src/RecordReplay.cpp
+2-1offload/tools/kernelreplay/llvm-omp-kernel-replay.cpp
+59-224 files

LLVM/project fd6afc5llvm/lib/Object RelocationResolver.cpp

[Object] Support COFF MIPS in RelocationResolver (#200477)

Similar to other 32-bit COFF logic.
DeltaFile
+23-0llvm/lib/Object/RelocationResolver.cpp
+23-01 files

LLVM/project 45a9a69mlir/test/Transforms test-legalizer.mlir, mlir/test/lib/Dialect/Test TestPatterns.cpp

[mlir] Fix crash in test type converter for 1->N result conversion (#201738)

Use `results.append` instead of `results.assign`, preserving previous
results.

Fixes https://github.com/llvm/llvm-project/issues/201521
DeltaFile
+6-0mlir/test/Transforms/test-legalizer.mlir
+1-1mlir/test/lib/Dialect/Test/TestPatterns.cpp
+7-12 files

LLVM/project df170b2llvm/test/CodeGen/X86 vector-interleaved-store-i64-stride-6.ll x86-interleaved-access.ll

[X86] X86FixupInstTuning - fold VPERM2x128 -> VINSERTx128 when shuffling lower xmm half ymm sources (#201618)

VINSERTx128 is never slower than VPERM2x128 and notably quicker on some
targets (btver2, znver1, e-cores, etc.).

Shuffle lowering avoids some VINSERT patterns for AVX targets as it can
affect folding/commutation - but by the time we get to the fixup passes,
these are all done and we can safely convert to VINSERTF128/I128.

There are more variants of the VPERM2 immediate mask that could be folded,
but it's incredibly difficult to hit them as it's easily commutable.

I hit this while working on #199445.
DeltaFile
+93-93llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-6.ll
+32-64llvm/test/CodeGen/X86/x86-interleaved-access.ll
+45-45llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-6.ll
+45-45llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-2.ll
+21-21llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-2.ll
+21-21llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-6.ll
+257-28911 files not shown
+362-36417 files