LLVM/project 7b8c4f5 flang/include/flang/Optimizer/Dialect/MIF MIFOps.td, flang/lib/Optimizer/Builder IntrinsicCall.cpp

[Flang][MIF] Adding support of intrinsics with coarray argument (#192944)

Added support for intrinsics that query a coarray argument:
- Add lowering and MIF dialect operations for UCOBOUND, LCOBOUND,
COSHAPE and IMAGE_INDEX
- Add support for a coarray argument to THIS_IMAGE in the MIF dialect
(and its lowering)

---------

Co-authored-by: Dan Bonachea <dobonachea at lbl.gov>
Co-authored-by: jeanPerier <jean.perier.polytechnique at gmail.com>
DeltaFile
+219-14 flang/lib/Optimizer/Transforms/MIFOpConversion.cpp
+129-4 flang/lib/Optimizer/Builder/IntrinsicCall.cpp
+109-4 flang/include/flang/Optimizer/Dialect/MIF/MIFOps.td
+104-0 flang/test/Fir/MIF/cobound.mlir
+86-3 flang/lib/Optimizer/Dialect/MIF/MIFOps.cpp
+80-0 flang/test/Fir/MIF/coshape.mlir
+727-25 22 files not shown
+1,103-109 28 files

LLVM/project 43424c3 clang/test lit.cfg.py, clang/test/ClangScanDeps modules-full-by-mod-name.c

[clang-scan-deps] Add scan-deps-filter.py test helper to filter full output (#206758)

Add a helper script that projects `clang-scan-deps` experimental-full
JSON down to a chosen set of fields, plus a `%scan-deps-filter` lit
substitution. A bare key (e.g. `file-deps`) matches that key at any
depth. A dotted path (e.g. `modules.command-line`) is anchored at the
document root, which disambiguates keys that appear in several places.

This lets tests assert only on the fields they care about instead of
`CHECK`ing the whole object, which otherwise breaks whenever an
unrelated field is added/modified, and avoids gating emission behind
awkward per-field flags.
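A minimal Python sketch of the projection rule described above, assuming that list elements are traversed without adding anything to the path and that a matched key keeps its whole subtree. The function name `project` and its signature are hypothetical; this is not the contents of `clang/utils/scan-deps-filter.py`.

```python
from typing import Any, Optional, Set, Tuple


def project(node: Any, bare: Set[str], dotted: Set[str],
            path: Tuple[str, ...] = ()) -> Optional[Any]:
    """Return a pruned copy of `node` keeping only selected fields, or None if nothing matches."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            child_path = path + (key,)
            # A bare key matches at any depth; a dotted path must match from the root.
            if key in bare or ".".join(child_path) in dotted:
                out[key] = value  # keep the matched field's whole subtree
                continue
            sub = project(value, bare, dotted, child_path)
            if sub is not None:
                out[key] = sub
        return out or None
    if isinstance(node, list):
        # List elements share their parent's path (no indices in dotted paths).
        kept = [p for p in (project(item, bare, dotted, path) for item in node) if p is not None]
        return kept or None
    return None  # scalars survive only when a matched key selects them


doc = {
    "modules": [{"name": "A", "command-line": ["-x"], "file-deps": ["a.h"]}],
    "translation-units": [{"commands": [{"command-line": ["-y"], "file-deps": ["t.c"]}]}],
}
print(project(doc, {"file-deps"}, {"modules.command-line"}))
```

Here `modules.command-line` selects only the module command lines, while the translation unit's `command-line` is dropped. Dropping non-matching list elements keeps the output small but loses positional alignment; a real filter could reasonably choose otherwise.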

Migrate modules-full-by-mod-name.c as a first example.

Assisted-by: Claude Opus 4.8
DeltaFile
+127-0 clang/utils/scan-deps-filter.py
+8-19 clang/test/ClangScanDeps/modules-full-by-mod-name.c
+11-0 clang/test/lit.cfg.py
+146-19 3 files

LLVM/project 32f14d7 libcxx/include __config, libcxx/include/__configuration platform.h

[libc++] Move threading and random device config into <__configuration/platform.h> (#206262)

These are platform-specific configuration options, so they should live
in `<__configuration/platform.h>`.
DeltaFile
+0-145 libcxx/include/__config
+144-0 libcxx/include/__configuration/platform.h
+144-145 2 files

LLVM/project c646b95 llvm/lib/CodeGen/GlobalISel IRTranslator.cpp, llvm/test/CodeGen/AArch64/GlobalISel threadlocal-address.ll

[GlobalISel] Implement threadlocal.address translation (#206908)

Use the same lowering as sdag.
DeltaFile
+21-0 llvm/test/CodeGen/AArch64/GlobalISel/threadlocal-address.ll
+2-1 llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+23-1 2 files

LLVM/project 562db47 clang/include/clang/Driver CommonArgs.h, clang/lib/Driver/ToolChains CommonArgs.cpp AMDGPU.cpp

clang/AMDGPU: Fix double linking opencl libs with --libclc-lib

Noticed by inspection. When an explicit --libclc-lib flag is used,
do not also link the ROCm device libs, which contain
different implementations of the same OpenCL symbols.

Co-Authored-By: Claude <noreply at anthropic.com>
DeltaFile
+8-7 clang/lib/Driver/ToolChains/CommonArgs.cpp
+9-0 clang/test/Driver/opencl-libclc.cl
+5-1 clang/include/clang/Driver/CommonArgs.h
+2-1 clang/lib/Driver/ToolChains/AMDGPU.cpp
+24-9 4 files

LLVM/project b7e6e07 clang/lib/Driver/ToolChains AMDGPU.cpp, clang/test/Driver amdgpu-openmp-gpu-max-threads-per-block.c

clang/AMDGPU: Remove driver restriction on --gpu-max-threads-per-block

Previously this flag was only handled for HIP; otherwise it produced an
unused argument warning. cc1 has a custom warning for when the argument
isn't supported, but in practice that warning was unreachable because the
driver never forwarded the argument. Also add a test for that previously
untested warning, and use a simpler method for forwarding the flag to cc1.
DeltaFile
+14-0 clang/test/Frontend/openmp-warn-gpu-max-threads-per-block.c
+2-8 clang/lib/Driver/ToolChains/AMDGPU.cpp
+6-0 clang/test/Driver/amdgpu-openmp-gpu-max-threads-per-block.c
+22-8 3 files

LLVM/project 7530c35 clang/lib/Driver/ToolChains AMDGPU.cpp HIPAMD.cpp

clang/AMDGPU: Merge toolchain subclasses

Simplify the toolchain implementations by collapsing
them into one. Previously we had a confusing split. The
AMDGPUToolChain base class implemented much of the base
support. It was subclassed by ROCMToolChain, which would
have been more accurately described as the offloading subclass.

That was further subclassed into HIP and OpenMP specific subclasses.
Deleting those two is the important part of this change. There was
code duplication, and features arbitrarily handled in one but not
the other. The offload kind is passed in almost everywhere if you
really need to know the original language. However, I consider
this an antifeature, and it is really poor QoI to have the HIP
and OpenMP toolchains behave differently in any way. The platform
should be consistent and the driver behaviors should not depend
on the language.

There is additional mess in the handling of spirv, which this

    [9 lines not shown]
DeltaFile
+263-125 clang/lib/Driver/ToolChains/AMDGPU.cpp
+2-193 clang/lib/Driver/ToolChains/HIPAMD.cpp
+0-94 clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+48-23 clang/lib/Driver/ToolChains/AMDGPU.h
+0-68 clang/lib/Driver/ToolChains/AMDGPUOpenMP.h
+1-50 clang/lib/Driver/ToolChains/HIPAMD.h
+314-553 4 files not shown
+339-568 10 files

LLVM/project ce8cf3f llvm/include/llvm/CodeGen BasicTTIImpl.h, llvm/test/Transforms/RelLookupTableConverter unnamed_addr.ll

Revert "Disable RelLookupTableConverter on AArch64" (#207046)

Reverts llvm/llvm-project#204669


https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst#code-models
says that text + rodata should be <2GB on AArch64 for the small code
model, so we should be able to enable RelLookupTableConverter on AArch64
small code model.

With #205963, we now properly diagnose overflows, rather than silently
truncating and miscompiling.
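As a rough illustration of why the 2GB bound matters: a relative lookup table stores each entry as a 32-bit offset from the table's own address instead of an absolute pointer, so every offset must fit in a signed 32-bit field. The sketch below models this in Python with made-up addresses; `make_relative_table` and `lookup` are hypothetical helpers, not LLVM APIs.

```python
from typing import List

INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def make_relative_table(table_addr: int, targets: List[int]) -> List[int]:
    """Convert absolute target addresses into offsets relative to the table's address."""
    offsets = []
    for target in targets:
        offset = target - table_addr
        if not INT32_MIN <= offset <= INT32_MAX:
            # Report the overflow instead of silently truncating to 32 bits.
            raise OverflowError(f"offset {offset:#x} does not fit in a signed 32-bit entry")
        offsets.append(offset)
    return offsets


def lookup(table_addr: int, offsets: List[int], index: int) -> int:
    """Recover the absolute address: table address plus the stored offset."""
    return table_addr + offsets[index]


table = 0x400000  # hypothetical address of the table in rodata
offsets = make_relative_table(table, [0x401000, 0x3FF000])
print([hex(o) for o in offsets], hex(lookup(table, offsets, 1)))
```

If text and rodata together stay under 2GB, as the small code model requires, every such offset is in range; a target 2^31 bytes away would raise instead.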
DeltaFile
+0-47 llvm/test/Transforms/RelLookupTableConverter/AArch64/no_relative_lookup_table.ll
+40-0 llvm/test/Transforms/RelLookupTableConverter/unnamed_addr.ll
+3-4 llvm/include/llvm/CodeGen/BasicTTIImpl.h
+43-51 3 files

LLVM/project 81350fb utils/bazel/llvm-project-overlay/lldb/source/Plugins BUILD.bazel

[Bazel] Fixes 282416b (#207051)

This fixes 282416b6d457b7fdd5d51fed0d7c59d3ee09093f.

Co-authored-by: Google Bazel Bot <google-bazel-bot at google.com>
DeltaFile
+1-0 utils/bazel/llvm-project-overlay/lldb/source/Plugins/BUILD.bazel
+1-0 1 files

LLVM/project c188fdd llvm/test/CodeGen/AMDGPU s-barrier-signal-var-gep.ll

[AMDGPU] Consolidate CHECK lines for barrier-gep test. NFC

Change-Id: I5d1e155cb02acba76bcdd8a1413d8e694b83ee83
DeltaFile
+172-197 llvm/test/CodeGen/AMDGPU/s-barrier-signal-var-gep.ll
+172-197 1 files

LLVM/project 282416b lldb/include/lldb/Target RegisterFlags.h, lldb/include/lldb/Utility RegisterFlags.h

[lldb] Move RegisterFlags from Target to Utility (#206861)

I'm doing this so that I can move RegisterInfo from
`lldb-private-types.h` to lldbUtility. It currently has a `RegisterFlags
*` field, so having it sit in lldb-private-types.h masks the actual
layering of our data types.

I considered moving RegisterInfo into Target, but RegisterValue (in
lldbUtility) uses RegisterInfo directly. Because RegisterFlags has no
internal dependencies, it seemed better to sink that instead.
DeltaFile
+431-0 lldb/source/Utility/RegisterFlags.cpp
+0-431 lldb/source/Target/RegisterFlags.cpp
+0-198 lldb/include/lldb/Target/RegisterFlags.h
+198-0 lldb/include/lldb/Utility/RegisterFlags.h
+2-2 lldb/unittests/Target/RegisterFlagsTest.cpp
+1-1 lldb/source/Plugins/Process/Utility/RegisterFlagsDetector_arm64.h
+632-632 8 files not shown
+639-639 14 files

LLVM/project 929284f llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp

DAG: Preserve poison in more concat_vectors folds (#206948)
DeltaFile
+6-2 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+6-2 1 files

LLVM/project 3794329 llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU s-barrier-signal-var-gep.ll

[AMDGPU] Fold constant offsets into named barrier addresses (#205216)

Allow isOffsetFoldingLegal to fold a constant offset into an LDS
named-barrier global, and include the node offset when materializing the
LDS address in LowerGlobalAddress. `s_barrier_signal_var` on a GEP'd
named barrier now selects the immediate form, matching a bare global and
GlobalISel.
DeltaFile
+14-11 llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+4-8 llvm/test/CodeGen/AMDGPU/s-barrier-signal-var-gep.ll
+8-0 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+26-19 3 files

LLVM/project 524b97a llvm/utils/lit/lit TestRunner.py, llvm/utils/lit/tests shtest-output-printing.py

[lit] Truncate process output to 10 kiB (#206355)

The output of processes is transformed multiple times:

1. All non-piped/redirected output of processes is collected
   (communicate() for the last process in the pipeline, read() for all
   earlier ones).
   - It's not good that we *collect* the entire output at all (frequent
     realloc+memcpy for large buffers) and I'd rather not have a
     possibly large output stored in Python at all.
2. The output is converted into strings (memcpy/utf-8 decode) and stored
   in the results list of executeScriptInternal.
3. executeScriptInternal builds the debug output combining all these
   stdout/stderr.
   - It performs a lot of `out += ...`, which allocates (malloc+memcpy)
     a new string every time. There are many of these concatenations.
4. The combined debug output is returned (together with other things) to
   _runShTest, which determines whether the test passed, executing the
   test multiple times if necessary. It also string-formats the output.

    [23 lines not shown]
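A minimal sketch of two ideas from the list above: cap each captured stream at 10 KiB, and build the combined debug output by joining a list of fragments rather than repeatedly doing `out += ...`. The helper names, the truncation marker, the `[stdout]`/`[stderr]` labels, and the choice to keep the first 10 KiB are all assumptions; the actual change to `TestRunner.py` is not reproduced here.

```python
MAX_OUTPUT_BYTES = 10 * 1024  # 10 KiB, the cap named in the commit title


def truncate_output(data: bytes, limit: int = MAX_OUTPUT_BYTES) -> bytes:
    """Keep at most `limit` bytes, noting how many were dropped (marker format is made up)."""
    if len(data) <= limit:
        return data
    dropped = len(data) - limit
    return data[:limit] + f"\n[... {dropped} bytes truncated ...]\n".encode()


def build_debug_output(results) -> str:
    """Join fragments once instead of growing a string with repeated `out += ...`."""
    parts = []
    for command, stdout, stderr in results:
        parts.append(f"$ {command}\n")
        for label, raw in (("stdout", stdout), ("stderr", stderr)):
            if raw:
                # Truncate before decoding so a huge stream is never decoded in full;
                # errors="replace" tolerates a multi-byte character cut at the limit.
                text = truncate_output(raw).decode("utf-8", errors="replace")
                parts.append(f"[{label}]\n{text}\n")
    return "".join(parts)


print(build_debug_output([("echo hi", b"hi", b"")]), end="")
```

This sketch still truncates after capture; avoiding holding the full output in Python at all, as the first point in the list wishes, would require reading the pipes incrementally.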
DeltaFile
+10-4 llvm/utils/lit/lit/TestRunner.py
+1-1 llvm/utils/lit/tests/shtest-output-printing.py
+11-5 2 files

LLVM/project fa84e62 llvm/lib/Target/AMDGPU/MCA AMDGPUCustomBehaviour.cpp AMDGPUCustomBehaviour.h

[NFC][AMDGPU] Use SIInstrFlags predicates in MCA (#206761)

Replace raw TSFlags accesses with SIInstrFlags predicate calls in
AMDGPUCustomBehaviour.

Part of a series following the introduction of SIInstrFlags predicates.
DeltaFile
+12-26 llvm/lib/Target/AMDGPU/MCA/AMDGPUCustomBehaviour.cpp
+0-4 llvm/lib/Target/AMDGPU/MCA/AMDGPUCustomBehaviour.h
+12-30 2 files

LLVM/project 539ad35 mlir/test/Integration/Dialect/Linalg/CPU/ArmSME multi-tile-matmul.mlir

[mlir][sme] Update the multi-tile e2e example (#202979)

These changes enable hoisting of the accumulator load/store operations
out of the K loop.

Many thanks to @steplong for identifying the missing steps (see #201562)
DeltaFile
+15-20 mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/multi-tile-matmul.mlir
+15-20 1 files

LLVM/project 9b8729b llvm/include/llvm/CodeGen BasicTTIImpl.h, llvm/test/Transforms/RelLookupTableConverter unnamed_addr.ll

Revert "Disable RelLookupTableConverter on AArch64 (#204669)"

This reverts commit 58f086d0cd58252b2b18fa95f96a566a06e96a36.
DeltaFile
+0-47 llvm/test/Transforms/RelLookupTableConverter/AArch64/no_relative_lookup_table.ll
+40-0 llvm/test/Transforms/RelLookupTableConverter/unnamed_addr.ll
+3-4 llvm/include/llvm/CodeGen/BasicTTIImpl.h
+43-51 3 files

LLVM/project fdc29a9 llvm/include/llvm/Transforms/Utils SplitModuleByCategory.h, llvm/lib/Transforms/Utils SplitModuleByCategory.cpp

[llvm][SplitModuleByCategory] Fix infinite loop on cyclic global uses (#206862)

The dependency graph construction in `addUserToGraphRecursively` walked
the users of globals without a visited set. A cycle in the use graph
caused it to push the same users forever, hanging the splitter.

Track visited users so each is processed once.
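A sketch of the fix pattern in Python: a worklist traversal over a use graph that records visited users, so a cycle such as `@a -> @b -> @a` terminates. The graph representation and `collect_users` are hypothetical, and the sketch uses an explicit worklist rather than the recursion the original function name suggests.

```python
from typing import Dict, List, Set


def collect_users(start: str, users_of: Dict[str, List[str]]) -> Set[str]:
    """Collect every transitive user of `start`, processing each user exactly once."""
    visited: Set[str] = set()
    worklist = [start]
    while worklist:
        node = worklist.pop()
        for user in users_of.get(node, []):
            # Without this check, a cycle would push the same users forever.
            if user not in visited:
                visited.add(user)
                worklist.append(user)
    return visited


# @a and @b use each other (a cycle); function f uses @b.
graph = {"@a": ["@b"], "@b": ["@a", "f"], "f": []}
print(sorted(collect_users("@a", graph)))
```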

Co-Authored-By: Claude
DeltaFile
+47-0 llvm/test/tools/llvm-split/SplitByCategory/recursion-and-cycles.ll
+4-1 llvm/lib/Transforms/Utils/SplitModuleByCategory.cpp
+2-2 llvm/include/llvm/Transforms/Utils/SplitModuleByCategory.h
+53-3 3 files

LLVM/project a201ae3 lldb/source/Plugins/LanguageRuntime/ObjC ObjCLanguageRuntime.h, lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime AppleObjCTrampolineHandler.cpp AppleObjCRuntimeV2.h

Handle the case where the ISA we find when looking up a method implementation has masked bits (#206864)

We need to canonicalize these ISAs because we use them for lookups, and
the PointerISA is the right value to use since it actually points at the class.

I can't write a test for this because ObjC mostly uses the masks for
swift/objc interop. I will add a test on the swift fork.
DeltaFile
+4-0 lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCTrampolineHandler.cpp
+1-1 lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCRuntimeV2.h
+2-0 lldb/source/Plugins/LanguageRuntime/ObjC/ObjCLanguageRuntime.h
+7-1 3 files

LLVM/project e0500fa llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp, llvm/test/CodeGen/AMDGPU vector-reduce-or.ll

DAG: Preserve poison in all-undef build_vector/concat_vectors folds (#206947)
DeltaFile
+12-4 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+0-1 llvm/test/CodeGen/AMDGPU/vector-reduce-or.ll
+12-5 2 files

LLVM/project 4a6f7b1 lld/ELF/Arch AArch64.cpp, lld/test/ELF aarch64-prel16.s aarch64-prel32.s

[lld] Make R_AARCH64_PREL32/16 only signed ints (#205963)

After https://github.com/ARM-software/abi-aa/pull/401, these relocations are
defined to accept only signed 32-/16-bit values rather than both signed and unsigned ones.
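To make the range change concrete, the sketch below compares a signed-only n-bit check with a looser check that accepts any value representable as either a signed or an unsigned n-bit integer, i.e. [-2^(n-1), 2^n). The function names are illustrative, and the looser range is an assumption about what "both signed and unsigned" meant; this is not code taken from lld.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if `value` is representable as a signed `bits`-bit integer."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def fits_signed_or_unsigned(value: int, bits: int) -> bool:
    """Looser check: signed or unsigned `bits`-bit, i.e. [-2^(bits-1), 2^bits)."""
    return -(1 << (bits - 1)) <= value < (1 << bits)


# 0x8000 fits in 16 bits only when read as unsigned, so a signed-only
# PREL16 check now rejects it while the looser check accepted it.
for value in (0x7FFF, 0x8000, -0x8000, -0x8001):
    print(value, fits_signed(value, 16), fits_signed_or_unsigned(value, 16))
```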

Assisted-by: Gemini
DeltaFile
+5-5 lld/test/ELF/aarch64-prel16.s
+5-5 lld/test/ELF/aarch64-prel32.s
+5-2 lld/ELF/Arch/AArch64.cpp
+15-12 3 files

LLVM/project f8e3973 flang/lib/Semantics check-omp-variant.cpp check-omp-structure.h, flang/test/Semantics/OpenMP metadirective-loop-nest.f90 metadirective-loop-applicability.f90

[flang][OpenMP] Semantic checks for metadirective loop nests

A loop-associated metadirective variant (`do`, `simd`, ...) is only
resolved during lowering, so it is never checked as a loop construct
during semantic analysis; a malformed or non-canonical associated nest
therefore reaches lowering, which assumes a canonical nest.

This patch validates the nest that follows such a variant (the next
executable construct) during semantics, reusing the diagnostics of a real
loop-associated construct. Each applicable variant is checked against it:

  * Canonical loop: the affected loop must be a canonical DO loop, so a
    `DO WHILE`, a pre-6.0 `DO CONCURRENT`, or a `DO` without loop control
    is rejected.
  * Nest depth: `collapse(n)` and `ordered(n)` must not exceed the depth
    of the associated loop nest.
  * Rectangularity: loops that must be rectangular (e.g. under `tile`) may
    not have bounds that depend on an outer loop's variable.


    [8 lines not shown]
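A toy Python model of the three checks listed above, assuming a perfectly nested loop nest represented as a list of loops from outermost to innermost, where `n` is the argument of `collapse(n)` or `ordered(n)`. `Loop`, `check_nest`, and the diagnostic strings are hypothetical; flang's real checks operate on the parse tree and emit their own messages.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass
class Loop:
    var: str                                  # DO variable
    bound_vars: FrozenSet[str] = frozenset()  # variables read by the bounds/step
    canonical: bool = True                    # False for DO WHILE, DO without loop control, ...


def check_nest(nest: List[Loop], n: int, rectangular: bool) -> List[str]:
    """Check the n loops affected by collapse(n)/ordered(n); `nest` is outermost first."""
    errors: List[str] = []
    if n > len(nest):
        errors.append(f"argument {n} exceeds the nest depth {len(nest)}")
    outer_vars: Set[str] = set()
    for loop in nest[:n]:
        if not loop.canonical:
            errors.append(f"loop over '{loop.var}' is not a canonical DO loop")
        if rectangular and loop.bound_vars & outer_vars:
            errors.append(f"bounds of loop '{loop.var}' depend on an outer loop variable")
        outer_vars.add(loop.var)
    return errors


# do i = 1, n / do j = i, n : triangular, so it fails only the rectangularity check.
nest = [Loop("i"), Loop("j", frozenset({"i"}))]
print(check_nest(nest, 2, rectangular=True))
```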
DeltaFile
+134-0 flang/test/Semantics/OpenMP/metadirective-loop-nest.f90
+123-0 flang/lib/Semantics/check-omp-variant.cpp
+54-0 flang/test/Semantics/OpenMP/metadirective-loop-applicability.f90
+11-0 flang/lib/Semantics/check-omp-structure.h
+4-0 flang/lib/Semantics/check-omp-structure.cpp
+326-0 5 files

LLVM/project e6c909a utils/bazel .bazelrc

[bazel] Fix remote exec with thin docker images (#205849)

rules_python is working on flipping this default, but without this
setting /usr/bin/python3 must exist to run a py_binary. That might not
be the case in remote exec environments that use the smallest possible
image.
DeltaFile
+3-0 utils/bazel/.bazelrc
+3-0 1 files

LLVM/project 842558e llvm/test/TableGen searchabletables-multi-string-key.td, llvm/utils/TableGen SearchableTableEmitter.cpp

[TableGen] Fix SearchableTable lookup comparator w/ multiple string keys (#207021)

This change fixes a bug in `SearchableTableEmitter::emitLookupFunction`
where `emitComparator` redeclares `LHSStr`/`RHSStr` in the same scope
when a table has more than one string key. The fix simply attaches the
`Field.Name` to the emitted `LHSStr`/`RHSStr` variable names.
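A hypothetical Python model of the emitter bug: generating one `StringRef` declaration pair per string key field with fixed names produces duplicate declarations in a single C++ scope, while suffixing each name with the field name keeps them unique. `emit_comparator` and the emitted C++ shape are illustrative, not TableGen's actual output.

```python
import re
from typing import List


def emit_comparator(string_fields: List[str], suffix_with_field: bool = True) -> str:
    """Emit a hypothetical C++ comparator body that compares several string key fields."""
    lines = []
    for field in string_fields:
        suffix = field if suffix_with_field else ""
        lhs, rhs = f"LHSStr{suffix}", f"RHSStr{suffix}"
        lines.append(f"  StringRef {lhs} = LHS.{field};")
        lines.append(f"  StringRef {rhs} = RHS.{field};")
        lines.append(f"  if (int Cmp = {lhs}.compare({rhs}))")
        lines.append("    return Cmp < 0;")
    lines.append("  return false;")
    return "\n".join(lines)


def declared_names(body: str) -> List[str]:
    """Names of the StringRef locals declared in the emitted body."""
    return re.findall(r"StringRef (\w+) =", body)


print(emit_comparator(["Name", "Alias"]))
```

With `suffix_with_field=False`, both fields declare `LHSStr` and `RHSStr`, which a C++ compiler rejects as redefinitions in the same scope.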
DeltaFile
+40-0 llvm/test/TableGen/searchabletables-multi-string-key.td
+15-9 llvm/utils/TableGen/SearchableTableEmitter.cpp
+55-9 2 files

LLVM/project 1d2645f clang/lib/CIR/CodeGen CIRGenModule.cpp, clang/test/CIR/CodeGen global-replace-string-array.c

[CIR] Fix getNewInitValue on string-literal arrays

`getNewInitValue` in `CIRGenModule.cpp` rebuilds a global's initializer when
`replaceGlobal` fixes up references after a global's type changes -- for
example when an `extern` array is referenced while still incomplete and then
completed. Its `ConstArrayAttr` branch cast `getElts()` to an `mlir::ArrayAttr`,
but a `ConstArrayAttr` built from a string literal stores its elements as a
`StringAttr`. A struct global that both points at the replaced global and has a
`char` array member therefore aborted on a failed `cast<ArrayAttr>` during
CIRGen.

`ConstArrayAttr::verify` allows only two element kinds: an `ArrayAttr` or a
`StringAttr`. A `StringAttr` holds raw 8-bit bytes and references no globals, so
there is nothing to rewrite. The fix returns the initializer unchanged for the
`StringAttr` case and `cast`s on the `ArrayAttr` path, so a future third element
kind asserts rather than silently passing through.
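The dispatch described above can be pictured with a small Python analogy, where `bytes` stands in for `StringAttr` and a list stands in for `ArrayAttr`. `new_init_elements` and the `rewrite` callback are hypothetical; the point is the shape of the fix: pass the string payload through unchanged, rewrite array elements, and fail loudly on anything else.

```python
from typing import Callable, List, Union

# Stand-ins: bytes ~ StringAttr (raw 8-bit payload), list ~ ArrayAttr (element attributes).
Elements = Union[bytes, List[object]]


def new_init_elements(elts: Elements, rewrite: Callable[[object], object]) -> Elements:
    """Rebuild an array initializer's elements after a referenced global is replaced."""
    if isinstance(elts, bytes):
        # A string payload references no globals, so there is nothing to rewrite.
        return elts
    if isinstance(elts, list):
        return [rewrite(e) for e in elts]
    # Any other kind is unexpected: fail loudly instead of silently passing it through.
    raise TypeError(f"unexpected element kind: {type(elts).__name__}")


def swap(e: object) -> object:
    """Hypothetical rewrite: redirect references from the old global to the new one."""
    return "@new" if e == "@old" else e


print(new_init_elements(b"abc\0", swap), new_init_elements(["@old", 1], swap))
```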

This surfaced compiling CPython's deep-frozen module data (SPEC CPU 2026
714.cpython_r), where frozen objects cross-reference each other and carry string
payloads. The benchmark advances past this abort to a const-record type-identity
issue that is tracked separately.
DeltaFile
+21-0 clang/test/CIR/CodeGen/global-replace-string-array.c
+9-1 clang/lib/CIR/CodeGen/CIRGenModule.cpp
+30-1 2 files

LLVM/project f2ff9a5 llvm/test/CodeGen/AMDGPU s-barrier-signal-var-gep.ll

[AMDGPU] Consolidate CHECK lines for barrier-gep test. NFC

Change-Id: I5d1e155cb02acba76bcdd8a1413d8e694b83ee83
DeltaFile
+172-197 llvm/test/CodeGen/AMDGPU/s-barrier-signal-var-gep.ll
+172-197 1 files

LLVM/project 8ad161d llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp

No else after return

Change-Id: Iff2630b1ac15ff821eacb4a8c9339c85a876ddbc
DeltaFile
+6-5 llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+6-5 1 files

LLVM/project cf360fb llvm/include/llvm/ADT IntervalMap.h, llvm/lib/Support UnicodeNameToCodepoint.cpp

[LLVM] Avoid nested std::min and std::max. NFC. (#206982)
DeltaFile
+2-2 llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+2-2 llvm/include/llvm/ADT/IntervalMap.h
+1-2 llvm/lib/TextAPI/RecordsSlice.cpp
+1-2 llvm/lib/Transforms/Utils/ASanStackFrameLayout.cpp
+1-1 llvm/lib/Support/UnicodeNameToCodepoint.cpp
+1-1 llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+8-10 6 files

LLVM/project 478acec llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU s-barrier-signal-var-gep.ll

[AMDGPU] Fold constant offsets into named barrier addresses

Allow isOffsetFoldingLegal to fold a constant offset into an LDS
named-barrier global, and include the node offset when materializing the
LDS address in LowerGlobalAddress. s_barrier_signal_var on a GEP'd named
barrier now selects the immediate form, matching a bare global and GlobalISel.
With object linking, the offset folds into the relocation addend.

The barrier ID is derived from the address via (addr >> 4) & 0x3F, so a
byte offset that does not land on a 16-byte barrier boundary is still
valid: it simply selects the containing barrier. No alignment assertion
is needed, and such offsets must not crash the compiler (see the
misaligned test).
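A worked example of the barrier ID formula, using a hypothetical named-barrier base address of `0x20`. Because the ID is `(addr >> 4) & 0x3F`, any byte offset within a 16-byte slot maps to the same barrier, and the 6-bit mask wraps after 64 barriers.

```python
def barrier_id(addr: int) -> int:
    # 16-byte slots (addr >> 4), keeping only a 6-bit ID (& 0x3F).
    return (addr >> 4) & 0x3F


BASE = 0x20  # hypothetical LDS address of a named barrier (slot 2)

# Offsets 16, 20 and 31 all land in slot 3; 20 and 31 are "misaligned" but still valid.
ids = [barrier_id(BASE + offset) for offset in (0, 16, 20, 31, 32)]
print(ids)  # [2, 3, 3, 3, 4]
```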

Change-Id: I639bc723eb001573585cc05d0ad19f2773054f21
Assisted-by: Cursor
DeltaFile
+8-6 llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+4-8 llvm/test/CodeGen/AMDGPU/s-barrier-signal-var-gep.ll
+8-0 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+20-14 3 files

LLVM/project ce8509f mlir/lib/Dialect/OpenACC/Transforms ACCRecipeMaterialization.cpp

[OpenACC] append triples when materializing reduction destroy recipes (#207034)

Append the triples if they exist when materializing the destroy region.
DeltaFile
+1-0 mlir/lib/Dialect/OpenACC/Transforms/ACCRecipeMaterialization.cpp
+1-0 1 files