LLVM/project ba5384a llvm/include/llvm/Support CommandLine.h, llvm/lib/Support CommandLine.cpp

[Support] Add a parser for cl::opt<ElementCount> (#203969)

This adds command-line option parsing support for ElementCount.

This allows the following syntax:
```
  --my-option=4 ; Maps to ElementCount::getFixed(4)
  --my-option="vscale x 8" ; Maps to ElementCount::getScalable(8)
```
This is intended to unify fixed/scalable option handling in the loop
vectorizer. Currently, we have options like
`EpilogueVectorizationForceVF` defined as `cl::opt<unsigned>`, which do
not allow specifying scalable VFs.
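
A minimal sketch of the syntax described above, assuming only what the message states: a bare integer means a fixed count and `vscale x <N>` means a scalable one. `ElemCount` and `parseElemCount` are hypothetical stand-ins, not the LLVM `ElementCount` class or the parser added by this commit, and the real parser may treat whitespace differently.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for llvm::ElementCount (not the LLVM class).
struct ElemCount {
  unsigned MinValue;
  bool Scalable;
};

// Parses "<N>" as a fixed count and "vscale x <N>" as a scalable count.
// Returns std::nullopt for anything else.
std::optional<ElemCount> parseElemCount(std::string_view Arg) {
  constexpr std::string_view Prefix = "vscale x ";
  bool Scalable = false;
  if (Arg.substr(0, Prefix.size()) == Prefix) {
    Scalable = true;
    Arg.remove_prefix(Prefix.size());
  }
  if (Arg.empty())
    return std::nullopt;

  unsigned Value = 0;
  const char *End = Arg.data() + Arg.size();
  auto [Ptr, Ec] = std::from_chars(Arg.data(), End, Value);
  // Reject non-numeric input and trailing characters.
  if (Ec != std::errc() || Ptr != End)
    return std::nullopt;
  return ElemCount{Value, Scalable};
}
```

With this sketch, `parseElemCount("4")` yields a fixed count of 4 and `parseElemCount("vscale x 8")` yields a scalable count with a minimum of 8.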

Assisted-by: Codex
Delta File
+85 -0 llvm/unittests/Support/CommandLineTest.cpp
+46 -0 llvm/lib/Support/CommandLine.cpp
+23 -0 llvm/include/llvm/Support/CommandLine.h
+154 -0 3 files

LLVM/project a8aba70 flang/lib/Lower ConvertVariable.cpp MultiImageFortran.cpp, flang/test/Lower/MIF coarray_allocation5.f90 coarray_allocation4.f90

[Flang] Standardize coarray TODO() diagnostic messages (#204708)
Delta File
+5 -4 flang/lib/Lower/ConvertVariable.cpp
+3 -3 flang/lib/Lower/MultiImageFortran.cpp
+3 -1 flang/lib/Lower/Bridge.cpp
+1 -1 flang/test/Lower/MIF/coarray_allocation5.f90
+1 -1 flang/test/Lower/MIF/coarray_allocation4.f90
+1 -1 flang/test/Lower/MIF/coarray_allocation3.f90
+14 -11 2 files not shown
+16 -13 8 files

LLVM/project c890f4d utils/bazel/llvm-project-overlay/mlir BUILD.bazel

[Bazel] Fixes 95e3219 (#204873)

This fixes 95e321951ad3041998e49bc0353482bcd27c65db.

Co-authored-by: Google Bazel Bot <google-bazel-bot at google.com>
Delta File
+1 -0 utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+1 -0 1 files

LLVM/project 5e7727d offload/ci openmp-offload-amdgpu-libc-runtime.py

Revert "Revert "[AMDGPU] Add compiler-rt checks for the GPU runtime" (#204370)"

This reverts commit 24f4fbf89d7e1c6e7b00efde469adb0a8c529cd2.
Delta File
+7 -0 offload/ci/openmp-offload-amdgpu-libc-runtime.py
+7 -0 1 files

LLVM/project 90b2048 llvm/lib/Bitcode/Reader BitcodeReader.cpp, llvm/test/Bitcode invalid-summary-version.test

bitcode: Improve invalid summary version error (#204888)
Delta File
+3 -4 llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+5 -0 llvm/test/Bitcode/invalid-summary-version.test
+0 -0 llvm/test/Bitcode/Inputs/invalid-summary-version.bc
+8 -4 3 files

LLVM/project f9fa598 llvm/test/CodeGen/AMDGPU rem_i128.ll div_v2i128.ll

[AMDGPU] Use explicit carry nodes for i64 wide integer lowering (#204694)

This PR switches widened i64 add/sub lowering to use explicit UADDO/USUBO carry
nodes instead of glue-based carry chains.
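
As an illustration of the idea only, and not of the AMDGPU lowering code itself, the sketch below adds two 64-bit values as 32-bit halves with the carry passed as an ordinary value between the two steps. The helper names (`uaddo`, `uaddoCarry`, `add64ViaHalves`) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Sum plus an explicit carry-out, modelled as plain data.
struct AddResult {
  uint32_t Sum;
  bool CarryOut;
};

// Unsigned add that reports overflow as a second result.
AddResult uaddo(uint32_t A, uint32_t B) {
  uint32_t Sum = A + B; // wraps modulo 2^32
  return {Sum, Sum < A};
}

// Unsigned add that also consumes a carry-in value.
AddResult uaddoCarry(uint32_t A, uint32_t B, bool CarryIn) {
  AddResult Partial = uaddo(A, B);
  AddResult Full = uaddo(Partial.Sum, CarryIn ? 1u : 0u);
  // At most one of the two steps can overflow.
  return {Full.Sum, Partial.CarryOut || Full.CarryOut};
}

// 64-bit add built from two 32-bit adds linked by the explicit carry.
uint64_t add64ViaHalves(uint64_t A, uint64_t B) {
  AddResult Lo = uaddo(static_cast<uint32_t>(A), static_cast<uint32_t>(B));
  AddResult Hi = uaddoCarry(static_cast<uint32_t>(A >> 32),
                            static_cast<uint32_t>(B >> 32), Lo.CarryOut);
  return (static_cast<uint64_t>(Hi.Sum) << 32) | Lo.Sum;
}
```

Because the carry is a normal value, the dependency between the low and high halves is visible as data flow rather than as an implicit ordering constraint.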
Delta File
+1,255 -1,278 llvm/test/CodeGen/AMDGPU/rem_i128.ll
+950 -975 llvm/test/CodeGen/AMDGPU/div_v2i128.ll
+758 -780 llvm/test/CodeGen/AMDGPU/div_i128.ll
+460 -514 llvm/test/CodeGen/AMDGPU/flat_atomics_i64_system.ll
+226 -250 llvm/test/CodeGen/AMDGPU/flat_atomics_i64.ll
+192 -216 llvm/test/CodeGen/AMDGPU/flat_atomics_i64_system_noprivate.ll
+3,841 -4,013 17 files not shown
+4,729 -4,745 23 files

LLVM/project 086f633 llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.load.async.to.lds.ll

AMDGPU/GlobalISel: RegBankLegalize rules for load_async_to_lds (#204683)
Delta File
+2 -1 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+1 -1 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.load.async.to.lds.ll
+3 -2 2 files

LLVM/project 4195b29 .github/workflows subscriber.yml

workflows/subscriber: Update to latest github automation container (#204692)

This one is about 33% smaller than the previous version.
Delta File
+1 -1 .github/workflows/subscriber.yml
+1 -1 1 files

LLVM/project 39f8f90 llvm/lib/Target/SPIRV SPIRVEmitIntrinsics.cpp, llvm/test/CodeGen/SPIRV/instructions undef-composite.ll

[SPIR-V] Lower undef nested in a constant aggregate (#204377)

A constant aggregate whose element is itself an aggregate `undef` was
never lowered to a placeholder. The raw aggregate operand reached
IRTranslator on the llvm.spv.const.composite call and aborted with
"unable to translate instruction".

A similar issue was found and fixed during the SPV_KHR_poison_freeze
implementation. Instead of reinventing the wheel, unify the lowering with
poison.

Addresses the following observation:
https://github.com/llvm/llvm-project/pull/198037#discussion_r3304013315
Delta File
+61 -71 llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp
+45 -0 llvm/test/CodeGen/SPIRV/instructions/undef-composite.ll
+106 -71 2 files

LLVM/project 6f05646 llvm/include/llvm/Transforms/Vectorize SLPVectorizer.h, llvm/lib/Transforms/Vectorize SLPVectorizer.cpp

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
Delta File
+249 -15 llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+21 -191 llvm/test/Transforms/SLPVectorizer/X86/masked-stores.ll
+2 -1 llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h
+272 -207 3 files

LLVM/project fe9521d llvm/lib/Transforms/Vectorize VPlanRecipes.cpp VPlan.cpp

[LV] Unify header phi fixup and remove fixNonInductionPHIs (NFC). (#204886)

Unify the execute logic for VPPhi and VPWidenPHIRecipe into a shared
executePhiRecipe helper that handles both scalar and vector phis. For
header phis, only the preheader incoming value is added during execute;
the backedge is fixed up later by VPlan::execute().

This allows generalizing the VPlan::execute() fixup loop to handle all
loop headers (not just the first), removing the VPWidenPHIRecipe skip,
and eliminating fixNonInductionPHIs entirely.
Delta File
+22 -19 llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+15 -22 llvm/lib/Transforms/Vectorize/VPlan.cpp
+0 -22 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+37 -63 3 files

LLVM/project 7472a0e llvm/lib/IR Verifier.cpp, llvm/test/Verifier x86-amx-tile-register-index.ll

[Verifier] Verify AMX tile-register index operands are in range

AMX has 8 physical tile registers (TMM0-TMM7), so the tile-index operands
of the AMX intrinsics must be in [0, 8): operand 0 for the tile
load/store/zero intrinsics, operands 0-2 for the tdp* family.
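
The range rule stated above can be sketched as a small predicate. This is an illustration under the message's stated constraint (indices in [0, 8)), not the code added to `Verifier.cpp`; `NumTileRegs` and `tileIndicesInRange` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// AMX exposes 8 physical tile registers, TMM0 through TMM7.
constexpr uint64_t NumTileRegs = 8;

// Returns true only if every tile-index operand lies in [0, NumTileRegs).
// Indices are unsigned here, so only the upper bound needs checking.
bool tileIndicesInRange(std::initializer_list<uint64_t> Indices) {
  for (uint64_t Index : Indices)
    if (Index >= NumTileRegs)
      return false;
  return true;
}
```

A caller would pass one index for the load/store/zero style intrinsics and three for the tdp* family, for example `tileIndicesInRange({0, 1, 2})`.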
Delta File
+30 -0 llvm/test/Verifier/x86-amx-tile-register-index.ll
+24 -0 llvm/lib/IR/Verifier.cpp
+54 -0 2 files

LLVM/project bd70fc0 llvm/lib/Bitcode/Reader BitcodeReader.cpp, llvm/test/Bitcode invalid-summary-version.test

bitcode: Improve invalid summary version error

Include the filename in the description.
Delta File
+3 -4 llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+5 -0 llvm/test/Bitcode/invalid-summary-version.test
+0 -0 llvm/test/Bitcode/Inputs/invalid-summary-version.bc
+8 -4 3 files

LLVM/project 776cea3 llvm/test/CodeGen/AMDGPU rem_i128.ll div_v2i128.ll

[AMDGPU] Use explicit carry nodes for i64 wide integer lowering

This PR switches widened i64 add/sub lowering to use explicit UADDO/USUBO carry
nodes instead of glue-based carry chains.
Delta File
+1,255 -1,278 llvm/test/CodeGen/AMDGPU/rem_i128.ll
+950 -975 llvm/test/CodeGen/AMDGPU/div_v2i128.ll
+758 -780 llvm/test/CodeGen/AMDGPU/div_i128.ll
+460 -514 llvm/test/CodeGen/AMDGPU/flat_atomics_i64_system.ll
+226 -250 llvm/test/CodeGen/AMDGPU/flat_atomics_i64.ll
+192 -216 llvm/test/CodeGen/AMDGPU/flat_atomics_i64_system_noprivate.ll
+3,841 -4,013 17 files not shown
+4,729 -4,745 23 files

LLVM/project 2f0ae3a llvm/lib/Bitcode/Reader BitcodeReader.cpp, llvm/test/Bitcode invalid-summary-version.test

bitcode: Improve invalid summary version error

Include the filename in the description.
Delta File
+5 -0 llvm/test/Bitcode/invalid-summary-version.test
+2 -1 llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+0 -0 llvm/test/Bitcode/Inputs/invalid-summary-version.bc
+7 -1 3 files

LLVM/project f193189 libcxx/include/__cstddef byte.h, libcxx/test/libcxx/language.support nodiscard.verify.cpp

[libc++][byte] Apply [[nodiscard]] to std::byte (#204674)

https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

Towards: #172124
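
To show what `[[nodiscard]]` buys on a byte-like type, here is a small sketch using a hypothetical `Byte` enum rather than libc++'s `std::byte`. Which `std::byte` functions the commit actually annotates is not stated in the message, so the choice of operations below is an assumption.

```cpp
#include <cassert>

// Hypothetical byte-like type; not libc++'s std::byte.
enum class Byte : unsigned char {};

// The result is the only effect of this operator, so discarding it is
// almost certainly a bug; [[nodiscard]] lets the compiler warn about it.
[[nodiscard]] constexpr Byte operator|(Byte Lhs, Byte Rhs) noexcept {
  return static_cast<Byte>(static_cast<unsigned char>(
      static_cast<unsigned>(Lhs) | static_cast<unsigned>(Rhs)));
}

// Conversion helper in the spirit of std::to_integer.
[[nodiscard]] constexpr unsigned toUnsigned(Byte B) noexcept {
  return static_cast<unsigned>(B);
}
```

A statement such as `A | B;` that ignores the result would draw a diagnostic, while `Byte C = A | B;` compiles cleanly.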
Delta File
+24 -0 libcxx/test/libcxx/language.support/nodiscard.verify.cpp
+6 -6 libcxx/include/__cstddef/byte.h
+30 -6 2 files

LLVM/project 6bc3ea3 clang/lib/Driver/ToolChains AMDGPU.cpp, clang/test/Driver amdgpu-openmp-gpu-max-threads-per-block.c

clang/AMDGPU: Remove driver restriction on --gpu-max-threads-per-block

Previously this flag was only handled for HIP, and would produce an unused
argument warning. There is a custom warning produced by cc1 that the
argument isn't supported, but practically speaking that was unreachable
due to not forwarding the argument. Also add a test for the untested warning.
Also use a simpler method for forwarding the flag to cc1.
Delta File
+14 -0 clang/test/Frontend/openmp-warn-gpu-max-threads-per-block.c
+2 -8 clang/lib/Driver/ToolChains/AMDGPU.cpp
+6 -0 clang/test/Driver/amdgpu-openmp-gpu-max-threads-per-block.c
+22 -8 3 files

LLVM/project 22995a6 clang/lib/Driver/ToolChains AMDGPU.cpp, clang/test/Driver amdgpu-openmp-gpu-max-threads-per-block.c

clang/AMDGPU: Remove driver restriction on --gpu-max-threads-per-block

Previously this flag was only handled for HIP, and would produce an unused
argument warning. There is a custom warning produced by cc1 that the
argument isn't supported, but practically speaking that was unreachable
due to not forwarding the argument. Also add a test for the untested warning.
Also use a simpler method for forwarding the flag to cc1.
Delta File
+14 -0 clang/test/Frontend/openmp-warn-gpu-max-threads-per-block.c
+2 -8 clang/lib/Driver/ToolChains/AMDGPU.cpp
+5 -0 clang/test/Driver/amdgpu-openmp-gpu-max-threads-per-block.c
+21 -8 3 files

LLVM/project cbf215c clang/lib/Driver/ToolChains AMDGPU.cpp, clang/test/Driver amdgpu-openmp-max-threads.c

clang/AMDGPU: Remove driver restriction on --gpu-max-threads-per-block

Previously this flag was only handled for HIP, and would produce an unused
argument warning. There is a custom warning produced by cc1 that the
argument isn't supported, but practically speaking that was unreachable
due to not forwarding the argument. Also add a test for the untested warning.
Also use a simpler method for forwarding the flag to cc1.
Delta File
+14 -0 clang/test/Frontend/openmp-warn-gpu-max-threads-per-block.c
+2 -8 clang/lib/Driver/ToolChains/AMDGPU.cpp
+5 -0 clang/test/Driver/amdgpu-openmp-max-threads.c
+21 -8 3 files

LLVM/project 013dffe clang/lib/Driver/ToolChains AMDGPU.cpp HIPAMD.cpp

clang/AMDGPU: Merge toolchain subclasses

Simplify the toolchain implementations by collapsing
them into one. Previously we had a confusing split. The
AMDGPUToolChain base class implemented much of the base
support. It was subclassed by ROCMToolChain, which would
have been more accurately described as the offloading subclass.

That was further subclassed into HIP and OpenMP specific subclasses.
Deleting those two is the important part of this change. There was
code duplication, and features arbitrarily handled in one but not
the other. The offload kind is passed in almost everywhere if you
really need to know the original language. However, I consider
this an antifeature, and it is really poor QoI to have the HIP
and OpenMP toolchains behave differently in any way. The platform
should be consistent and the driver behaviors should not depend
on the language.

There is additional mess in the handling of spirv, which this

    [9 lines not shown]
Delta File
+264 -123 clang/lib/Driver/ToolChains/AMDGPU.cpp
+2 -193 clang/lib/Driver/ToolChains/HIPAMD.cpp
+0 -94 clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+48 -23 clang/lib/Driver/ToolChains/AMDGPU.h
+0 -68 clang/lib/Driver/ToolChains/AMDGPUOpenMP.h
+1 -50 clang/lib/Driver/ToolChains/HIPAMD.h
+315 -551 4 files not shown
+340 -566 10 files

LLVM/project b17e6f7 clang/lib/Driver Driver.cpp

Fix more windows paths
Delta File
+4 -4 clang/lib/Driver/Driver.cpp
+4 -4 1 files

LLVM/project 18f45d9 clang/include/clang/Driver CommonArgs.h, clang/lib/Driver/ToolChains CommonArgs.cpp AMDGPU.cpp

clang/AMDGPU: Fix double linking opencl libs with --libclc-lib

Noticed by inspection. If using an explicit --libclc-lib flag,
do not attempt to also link the rocm device libs which will contain
different implementations of the same opencl symbols.

Co-Authored-By: Claude <noreply at anthropic.com>
Delta File
+8 -7 clang/lib/Driver/ToolChains/CommonArgs.cpp
+9 -0 clang/test/Driver/opencl-libclc.cl
+5 -1 clang/include/clang/Driver/CommonArgs.h
+2 -1 clang/lib/Driver/ToolChains/AMDGPU.cpp
+24 -9 4 files

LLVM/project e6a92e0 offload/plugins-nextgen/common/include PluginInterface.h, offload/plugins-nextgen/common/src RecordReplay.cpp

[offload] Fix teams/threads limits in record replay (#200639)

The recording phase now sets the teams and threads limits provided by
the user (in the corresponding OpenMP clauses), or zero if not specified.
Additionally, PR #199483 already enforces that the replay's configuration
of threads and teams is respected.

This commit also changes the way we test record and replay when multiple
kernels are recorded in the same test. We use the record report to
associate each JSON record descriptor file with its target region in the code.
We no longer rely on the modification time of the files to determine the order,
which was problematic.
Delta File
+31 -6 offload/test/tools/omp-kernel-replay/record-replay-diff-teams-threads.cpp
+22 -4 offload/tools/kernelreplay/llvm-omp-kernel-replay.cpp
+12 -6 offload/plugins-nextgen/common/src/RecordReplay.cpp
+8 -5 offload/test/tools/omp-kernel-replay/record-replay-diff-threads.cpp
+3 -0 offload/plugins-nextgen/common/include/PluginInterface.h
+76 -21 5 files

LLVM/project a665432 clang/include/clang/Driver Driver.h, clang/lib/Driver Driver.cpp

Fix using unsanitized target id in filename
Delta File
+6 -6 clang/lib/Driver/Driver.cpp
+2 -1 clang/include/clang/Driver/Driver.h
+8 -7 2 files

LLVM/project 72a7263 clang/lib/Driver/ToolChains AMDGPU.cpp HIPAMD.cpp

clang/AMDGPU: Merge toolchain subclasses

Simplify the toolchain implementations by collapsing
them into one. Previously we had a confusing split. The
AMDGPUToolChain base class implemented much of the base
support. It was subclassed by ROCMToolChain, which would
have been more accurately described as the offloading subclass.

That was further subclassed into HIP and OpenMP specific subclasses.
Deleting those two is the important part of this change. There was
code duplication, and features arbitrarily handled in one but not
the other. The offload kind is passed in almost everywhere if you
really need to know the original language. However, I consider
this an antifeature, and it is really poor QoI to have the HIP
and OpenMP toolchains behave differently in any way. The platform
should be consistent and the driver behaviors should not depend
on the language.

There is additional mess in the handling of spirv, which this

    [9 lines not shown]
Delta File
+264 -123 clang/lib/Driver/ToolChains/AMDGPU.cpp
+2 -193 clang/lib/Driver/ToolChains/HIPAMD.cpp
+0 -94 clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+48 -23 clang/lib/Driver/ToolChains/AMDGPU.h
+0 -68 clang/lib/Driver/ToolChains/AMDGPUOpenMP.h
+1 -50 clang/lib/Driver/ToolChains/HIPAMD.h
+315 -551 4 files not shown
+340 -566 10 files

LLVM/project db06fa6 clang/include/clang/Driver CommonArgs.h, clang/lib/Driver/ToolChains CommonArgs.cpp AMDGPU.cpp

clang/AMDGPU: Fix double linking opencl libs with --libclc-lib

Noticed by inspection. If using an explicit --libclc-lib flag,
do not attempt to also link the rocm device libs which will contain
different implementations of the same opencl symbols.

Co-Authored-By: Claude <noreply at anthropic.com>
Delta File
+8 -7 clang/lib/Driver/ToolChains/CommonArgs.cpp
+9 -0 clang/test/Driver/opencl-libclc.cl
+5 -1 clang/include/clang/Driver/CommonArgs.h
+2 -1 clang/lib/Driver/ToolChains/AMDGPU.cpp
+24 -9 4 files

LLVM/project 78fece9 clang/lib/Driver/ToolChains AMDGPU.cpp, clang/test/Driver amdgpu-openmp-max-threads.c

clang/AMDGPU: Remove driver restriction on --gpu-max-threads-per-block

Previously this flag was only handled for HIP, and would produce an unused
argument warning. There is a custom warning produced by cc1 that the
argument isn't supported, but practically speaking that was unreachable
due to not forwarding the argument. Also add a test for the untested warning.
Also use a simpler method for forwarding the flag to cc1.
Delta File
+14 -0 clang/test/Frontend/openmp-warn-gpu-max-threads-per-block.c
+2 -8 clang/lib/Driver/ToolChains/AMDGPU.cpp
+5 -0 clang/test/Driver/amdgpu-openmp-max-threads.c
+21 -8 3 files

LLVM/project 73a4a62 clang/lib/Driver Driver.cpp SanitizerArgs.cpp

cleanups
Delta File
+5 -5 clang/lib/Driver/Driver.cpp
+1 -1 clang/lib/Driver/SanitizerArgs.cpp
+6 -6 2 files

LLVM/project eeee48c clang/include/clang/Basic OffloadArch.h, clang/include/clang/Driver BoundArch.h Job.h

Merge into OffloadArch header
Delta File
+0 -49 clang/include/clang/Driver/BoundArch.h
+31 -1 clang/include/clang/Basic/OffloadArch.h
+4 -6 clang/include/clang/Driver/Job.h
+1 -1 clang/include/clang/Driver/Compilation.h
+1 -1 clang/include/clang/Driver/Action.h
+1 -1 clang/include/clang/Driver/Driver.h
+38 -59 3 files not shown
+41 -62 9 files

LLVM/project 0e27d37 clang/include/clang/Driver BoundArch.h Action.h, clang/lib/Driver Driver.cpp

clang/Driver: Use struct type for BoundArch instead of StringRef

Change BoundArch arguments in the clang driver from StringRef (or
sometimes const char*) to a dedicated struct type that contains both
the architecture string and a parsed OffloadArch enum field. In the
future it may be useful to contain other feature bits here.
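
The shape of such a type might look like the sketch below: a struct that keeps the original architecture string next to a parsed enum. All names here (`ArchKind`, `BoundArchSketch`, `makeBoundArch`) are hypothetical and the enum is deliberately tiny; this is not the `BoundArch` definition from the commit, and clang's real `OffloadArch` enum has many more values.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical, heavily reduced stand-in for an offload architecture enum.
enum class ArchKind { Unknown, GFX90A, SM_80 };

// Carries both the spelled architecture and its parsed form, so callers
// no longer need to re-parse a bare string at every use.
struct BoundArchSketch {
  std::string Name;
  ArchKind Kind = ArchKind::Unknown;
};

// Builds the struct once from the user-provided spelling.
BoundArchSketch makeBoundArch(std::string_view Name) {
  ArchKind Kind = ArchKind::Unknown;
  if (Name == "gfx90a")
    Kind = ArchKind::GFX90A;
  else if (Name == "sm_80")
    Kind = ArchKind::SM_80;
  return BoundArchSketch{std::string(Name), Kind};
}
```

Keeping the string alongside the enum means unrecognized spellings are still preserved verbatim, and further fields could be added to the struct later without changing every function signature again.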

Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
Delta File
+132 -140 clang/lib/Driver/Driver.cpp
+49 -0 clang/include/clang/Driver/BoundArch.h
+22 -25 clang/lib/Driver/ToolChains/Cuda.cpp
+23 -23 clang/include/clang/Driver/Action.h
+23 -22 clang/lib/Driver/ToolChains/AMDGPU.cpp
+18 -24 clang/lib/Driver/ToolChains/Darwin.cpp
+267 -234 84 files not shown
+573 -599 90 files