LLVM/project 88ffa46 clang/include/clang/Basic BuiltinsAMDGPU.td BuiltinsAMDGPUDocs.td

[AMDGPU][Clang][Doc] Add documentation for WMMA builtins
Delta      File
+268 -67   clang/include/clang/Basic/BuiltinsAMDGPU.td
+326 -0    clang/include/clang/Basic/BuiltinsAMDGPUDocs.td
+594 -67   2 files

LLVM/project 0b423a5 mlir/test/Dialect/GPU module-to-binary-invalid-format.mlir

[MLIR] Fix invalid test after improving the error message (NFC)
Delta   File
+1 -1   mlir/test/Dialect/GPU/module-to-binary-invalid-format.mlir
+1 -1   1 files

LLVM/project 9b81414 clang/lib/Sema SemaExpr.cpp, clang/test/ParserOpenACC parse-constructs.cpp

[clang] use typo-corrected name qualifier for expressions

Fixes #175783
Delta    File
+12 -0   clang/test/SemaCXX/GH175783.cpp
+7 -0    clang/lib/Sema/SemaExpr.cpp
+2 -2    clang/test/ParserOpenACC/parse-constructs.cpp
+21 -2   3 files

LLVM/project 2c3d5f9 clang/lib/Sema SemaExpr.cpp, clang/test/ParserOpenACC parse-constructs.cpp

[clang] use typo-corrected name qualifier for expressions (#183937)

Fixes #175783
Delta    File
+12 -0   clang/test/SemaCXX/GH175783.cpp
+7 -0    clang/lib/Sema/SemaExpr.cpp
+2 -2    clang/test/ParserOpenACC/parse-constructs.cpp
+21 -2   3 files

LLVM/project 20f36a2 mlir/lib/Dialect/GPU/Transforms ModuleToBinary.cpp

[MLIR][GPU] Improve error message on invalid pass option

This provides a more helpful message to the user when invalid command
line options are passed.
Delta   File
+4 -1   mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp
+4 -1   1 files

LLVM/project b72d8ac llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp, llvm/test/CodeGen/X86 known-never-zero.ll

[DAG] isKnownNeverZero - add ISD::EXTRACT_VECTOR_ELT handling (#183961)

Initialize DemandedElts mask when the index is constant and inbounds, otherwise check all elements.
Delta    File
+22 -0   llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+3 -6    llvm/test/CodeGen/X86/known-never-zero.ll
+25 -6   2 files
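
The mask selection described in this commit can be modelled in a few lines of Python. This is only a sketch of the idea, not the SelectionDAG code: `demanded_elts_mask` and `extract_is_never_zero` are hypothetical names, and per-lane knowledge is represented as a plain list of booleans.

```python
def demanded_elts_mask(num_elts, index):
    """Return a bitmask of the vector lanes that must be checked.

    `index` is an int when the extract index is a known constant, else None.
    """
    all_lanes = (1 << num_elts) - 1
    if index is not None and 0 <= index < num_elts:
        return 1 << index   # constant, in-bounds index: only that lane matters
    return all_lanes        # unknown or out-of-bounds index: check every lane


def extract_is_never_zero(lane_never_zero, index):
    """`lane_never_zero[i]` is True when lane i is known to be non-zero."""
    mask = demanded_elts_mask(len(lane_never_zero), index)
    return all(known for i, known in enumerate(lane_never_zero) if (mask >> i) & 1)
```

With a constant index only one lane has to be proven non-zero, so the query can succeed even when other lanes are unknown.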

LLVM/project df616fb lldb/tools/lldb-dap EventHelper.cpp

[lldb][lldb-dap] Correctly format lldb warnings in the debug console (#173852)

Trivial change to prevent all warnings from being printed on a single
line in the VS Code debug console.
Delta   File
+1 -1   lldb/tools/lldb-dap/EventHelper.cpp
+1 -1   1 files

LLVM/project a0fb4f6 lldb/examples/python formatter_bytecode.py

[lldb] Add BytecodeSection class to formatter_bytecode.py (#183876)

Changes `formatter_bytecode.compile_file` to return a `BytecodeSection`
value. The `BytecodeSection` holds the data that needs to be emitted to
an `__lldbformatters` section.

The `BytecodeSection` currently provides `write_binary`, but will be
updated in a follow-up commit to include `write_source`, which will allow
the data to be emitted as C or Swift source code. This will make it
easier to integrate into build systems, as it's easier to get data into
a binary via source code than as a raw binary file.
Delta     File
+46 -26   lldb/examples/python/formatter_bytecode.py
+46 -26   1 files
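
To illustrate why emitting source code is convenient for build systems, here is a minimal sketch that renders raw bytes as a C array definition. It is not the planned `write_source` implementation: `bytes_to_c_array` and the default variable name are hypothetical, and the section placement needed for `__lldbformatters` is deliberately left out.

```python
def bytes_to_c_array(data: bytes, name: str = "formatter_bytecode") -> str:
    """Render `data` as a C array definition (hypothetical helper)."""
    body = ", ".join(f"0x{b:02x}" for b in data)
    return f"const unsigned char {name}[{len(data)}] = {{{body}}};\n"
```

The resulting text can be compiled like any other translation unit, so no special linker step is needed to embed the bytes.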

LLVM/project 59ba10b mlir/lib/Dialect/SPIRV/IR SPIRVDialect.cpp, mlir/test/Dialect/SPIRV/IR types.mlir

[mlir][spirv] Fix crash when spirv.struct member type is not a SPIR-V type (#183942)

When parsing a spirv.struct type, any MLIR type was accepted as a member
type without validation. This caused a crash in TypeExtensionVisitor and
TypeCapabilityVisitor which unconditionally used cast<SPIRVType> on
struct element types, asserting when a non-SPIR-V type (e.g.,
vector<2x2xi1>) was encountered.

Fix the parser to reject non-SPIR-V member types with a proper error
message.

Fixes #179675
Delta    File
+6 -0    mlir/test/Dialect/SPIRV/IR/types.mlir
+5 -0    mlir/lib/Dialect/SPIRV/IR/SPIRVDialect.cpp
+11 -0   2 files
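
The fix moves the check to parse time, so later visitors never see an invalid member. A rough Python model of that idea follows; `MlirType`, its `is_spirv` flag, `parse_struct_members` and the error text are all hypothetical stand-ins, not the MLIR API or its real diagnostic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MlirType:
    text: str        # textual form of the type
    is_spirv: bool   # stand-in for "this type is a SPIR-V type"


def parse_struct_members(members):
    """Reject non-SPIR-V member types up front instead of failing later."""
    for member in members:
        if not member.is_spirv:
            # Hypothetical wording; the real diagnostic may differ.
            raise ValueError(f"member type must be a SPIR-V type, got {member.text}")
    return tuple(members)
```

Calling it with `MlirType("vector<2x2xi1>", False)` raises an error at parse time, which mirrors replacing the later assertion with a proper diagnostic.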

LLVM/project 3949b08 flang/lib/Semantics check-omp-loop.cpp, flang/test/Semantics/OpenMP fuse1.f90

[flang][OpenMP] Fix counting generated nests

The code in `CountGeneratedNests` returned std::nullopt if the LOOPRANGE
clause was not present on a FUSE construct. That is incorrect: the answer
should be 1, except in cases where the FUSE itself is invalid, such as
having no loops nested in it.

Returning std::nullopt does not cause any messages to be emitted. The case
of zero loops inside a FUSE is diagnosed when analyzing the body of the
FUSE construct itself, not when checking a construct in which the FUSE is
nested. This prevents error messages caused by the same problem from being
emitted for every enclosing loop construct.
Delta    File
+19 -4   flang/lib/Semantics/check-omp-loop.cpp
+18 -0   flang/test/Semantics/OpenMP/fuse1.f90
+37 -4   2 files
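
A small Python model of the corrected behaviour is sketched below. The function name and parameters are hypothetical, and the LOOPRANGE branch rests on an assumption not stated in the commit message: that fusing `count` consecutive loops out of `num_loops` leaves `num_loops - count + 1` nests.

```python
from typing import Optional


def count_generated_nests(num_loops: int, looprange_count: Optional[int]) -> Optional[int]:
    """Return the number of loop nests a FUSE produces, or None if it is invalid."""
    if num_loops == 0:
        return None  # invalid FUSE; diagnosed when its own body is analyzed
    if looprange_count is None:
        return 1     # no LOOPRANGE: everything fuses into a single nest
    if not 1 <= looprange_count <= num_loops:
        return None  # assumption: an out-of-range count is also invalid
    # Assumption: `looprange_count` loops collapse into one, the rest remain.
    return num_loops - looprange_count + 1
```

Returning `None` only for the invalid cases lets an enclosing construct stay silent, while the FUSE itself reports the problem once.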

LLVM/project 4a602c0 lldb/docs index.rst, lldb/source/Plugins/Process/FreeBSD-Kernel-Core RegisterContextFreeBSDKernelCore_riscv64.cpp RegisterContextFreeBSDKernelCore_riscv64.h

[lldb][Process/FreeBSDKernelCore] Add riscv64 support (#180670)

This is the LLDB version of
https://cgit.freebsd.org/ports/tree/devel/gdb/files/kgdb/riscv-fbsd-kern.c.
It enables selecting riscv64 and reading registers from the PCB structure
for core dumps and live kernel debugging. Trapframe unwinding support
will be implemented in the future. Test files using a core dump from
riscv64 will be added once other kernel debugging improvements are done.

---------

Signed-off-by: Minsoo Choo <minsoochoo0122 at proton.me>
Delta     File
+105 -0   lldb/source/Plugins/Process/FreeBSD-Kernel-Core/RegisterContextFreeBSDKernelCore_riscv64.cpp
+42 -0    lldb/source/Plugins/Process/FreeBSD-Kernel-Core/RegisterContextFreeBSDKernelCore_riscv64.h
+8 -0     lldb/source/Plugins/Process/FreeBSD-Kernel-Core/ThreadFreeBSDKernelCore.cpp
+1 -1     llvm/docs/ReleaseNotes.md
+1 -1     lldb/docs/index.rst
+1 -0     lldb/source/Plugins/Process/FreeBSD-Kernel-Core/CMakeLists.txt
+158 -2   6 files

LLVM/project 3e05ab6 llvm/lib/LTO LTO.cpp, llvm/test/Assembler thinlto-summary.ll

[ThinLTO] Reduce the number of renaming due to promotions (#183793)

Currently, for ThinLTO, imported static global values (functions,
variables, etc.) are promoted and renamed, e.g. from foo() to
foo.llvm.(). Such renaming causes difficulties in live patching
because the function name changes ([1]).

Some global value names may have to be promoted to avoid name
collisions and linker failures. In practice, though, the majority of
name promotions can be avoided.

In [2], the suggestion is that the ThinLTO pre-link step decides whether
a particular global value needs name promotion. If it does, the name is
promoted later in thinBackend().

I compiled a particular Linux kernel version (latest bpf-next tree)
and found 1216 global values with suffix .llvm.. With this patch,
the number of promoted functions is 2, a 98% reduction from the
original kernel build.

    [21 lines not shown]
Delta       File
+99 -0      llvm/test/ThinLTO/X86/reduce-promotion-devirt.ll
+69 -0      llvm/test/ThinLTO/X86/reduce-promotion-same-local-name.ll
+30 -30     llvm/test/Assembler/thinlto-summary.ll
+52 -7      llvm/lib/LTO/LTO.cpp
+56 -0      llvm/test/ThinLTO/X86/reduce-promotion.ll
+46 -0      llvm/test/ThinLTO/X86/reduce-promotion-same-file-local-name.ll
+352 -37    38 files not shown
+556 -146   44 files

LLVM/project e317f42 llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer last-buildvector-node.ll

[SLP]Recalculate dependencies for the buildvector schedule node, if they have copyable node

Need to recalculate the deps for all buildvector nodes with copyable
deps to prevent a compiler crash during scheduling of instructions
Delta    File
+88 -0   llvm/test/Transforms/SLPVectorizer/last-buildvector-node.ll
+7 -7    llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+95 -7   2 files

LLVM/project 5ed875a lldb/source/Utility ZipFile.cpp, lldb/unittests/Host/common ZipFileResolverTest.cpp CMakeLists.txt

[lldb][lldb-server] Fix zip file lookup ignoring last entry in the zip file (#173966)

Command qModuleInfo (GDB server protocol) can be used to request
metadata of shared libraries stored in a ZIP archive on the target. This
is typically used for retrieving SO files bundled in an APK file on
Android.

Requesting the last entry in the ZIP file often fails because of a bug
in the entry search mechanism. This PR fixes this.

NOTES:
* The bug appears only if the entry in the zip file has no extra field
or comment
* This is part of an effort to get lldb working for debugging Swift on
Android: https://github.com/swiftlang/llvm-project/issues/10831
Delta     File
+35 -15   lldb/unittests/Host/common/ZipFileResolverTest.cpp
+1 -1     lldb/source/Utility/ZipFile.cpp
+1 -0     lldb/unittests/Host/common/CMakeLists.txt
+0 -0     lldb/unittests/Host/common/Inputs/zip-test-no-extras.zip
+37 -16   4 files
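
The commit message does not show the one-line fix, so the following is only one plausible shape of such a bug, written as a Python sketch rather than the LLDB code. It walks a ZIP central directory and shows how a strict bounds comparison would drop the last entry exactly when that entry has no extra field or comment, because the entry then ends precisely at the end of the central directory. The helper names and the `inclusive_bound` switch are hypothetical; the header offsets follow the ZIP file format.

```python
import io
import struct
import zipfile

CD_SIGNATURE = b"PK\x01\x02"  # central directory file header signature
CD_FIXED_SIZE = 46            # fixed part of a central directory header
EOCD_SIZE = 22                # end-of-central-directory record, no archive comment


def make_zip(names):
    """Build an in-memory archive whose entries have no extra field or comment."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"payload")
    return buf.getvalue()


def list_entries(data, inclusive_bound=True):
    """Walk the central directory and return the entry names that are found."""
    # Central directory size and offset sit at byte 12 of the EOCD record.
    cd_size, cd_offset = struct.unpack_from("<II", data, len(data) - EOCD_SIZE + 12)
    pos, end = cd_offset, cd_offset + cd_size
    names = []
    while pos + CD_FIXED_SIZE <= end and data[pos:pos + 4] == CD_SIGNATURE:
        # Name, extra field and comment lengths sit at byte 28 of each header.
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, pos + 28)
        entry_end = pos + CD_FIXED_SIZE + name_len + extra_len + comment_len
        # A strict '<' rejects an entry that ends exactly at the directory end.
        fits = entry_end <= end if inclusive_bound else entry_end < end
        if not fits:
            break
        names.append(data[pos + CD_FIXED_SIZE:pos + CD_FIXED_SIZE + name_len].decode())
        pos = entry_end
    return names
```

For an archive built with `make_zip(["lib/a.so", "lib/b.so"])`, the inclusive check finds both entries, while `inclusive_bound=False` misses `lib/b.so`.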

LLVM/project 9be2cf2 clang/include/clang/Basic BuiltinsAMDGPU.td BuiltinsAMDGPUDocs.td

[AMDGPU][Clang][Doc] Add documentation for WMMA builtins
Delta      File
+268 -67   clang/include/clang/Basic/BuiltinsAMDGPU.td
+300 -0    clang/include/clang/Basic/BuiltinsAMDGPUDocs.td
+568 -67   2 files

LLVM/project 7aaa2ec clang/lib/Sema SemaExpr.cpp, clang/test/ParserOpenACC parse-constructs.cpp

[clang] use typo-corrected name qualifier for expressions

Fixes #175783
Delta    File
+12 -0   clang/test/SemaCXX/GH175783.cpp
+7 -0    clang/lib/Sema/SemaExpr.cpp
+2 -2    clang/test/ParserOpenACC/parse-constructs.cpp
+21 -2   3 files

LLVM/project f8f8d5a clang/test/TableGen builtin-docs.td, clang/utils/TableGen ClangBuiltinsEmitter.cpp

[Clang][TableGen] Sort undocumented builtins after documented ones in generated docs

The builtin documentation emitter previously sorted all categories purely
alphabetically, which placed the "Undocumented" section before categories like
"WMMA" in the generated RST. This made the output confusing since stub entries
appeared before real documentation.

Push the "Undocumented" category to the end of the output so that all documented
categories appear first, regardless of their names.
Delta     File
+20 -10   clang/test/TableGen/builtin-docs.td
+10 -3    clang/utils/TableGen/ClangBuiltinsEmitter.cpp
+30 -13   2 files
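
The ordering rule is easy to express as a sort key. The sketch below is in Python rather than the emitter's C++, and `order_categories` is a hypothetical name; it only demonstrates the "alphabetical, but Undocumented last" rule from the message.

```python
def order_categories(names):
    """Sort alphabetically, but push the "Undocumented" category to the end."""
    # False sorts before True, so documented categories come first.
    return sorted(names, key=lambda name: (name == "Undocumented", name))
```

For example, `order_categories(["WMMA", "Undocumented", "Atomics"])` yields `["Atomics", "WMMA", "Undocumented"]`, whereas a plain alphabetical sort would place "Undocumented" before "WMMA".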

LLVM/project 3034c09 clang/lib/Format UnwrappedLineParser.cpp, clang/unittests/Format FormatTest.cpp

[clang-format] bugfix: Whitesmiths with IndentAccessModifiers (#182432)

Due to special handling of Whitesmiths when parsing, the additional
level(s) needed for the block, when used with IndentAccessModifiers,
were not being applied. Consequently, when calculating the access
modifier indent offset, the modifiers were being placed at the class
level.

This change ensures that the additional level(s) are not omitted for
Whitesmiths.
Delta    File
+17 -0   clang/unittests/Format/FormatTest.cpp
+9 -5    clang/lib/Format/UnwrappedLineParser.cpp
+26 -5   2 files

LLVM/project e61b516 clang/test/CodeGenOpenCL cl-uniform-wg-size.cl amdgpu-enqueue-kernel.cl, llvm/lib/IR AutoUpgrade.cpp

[AMDGPU] Make uniform-work-group-size a valueless attribute

The "uniform-work-group-size" function attribute previously took a
string value of "true" or "false". Since presence alone can convey
the "true" semantics and absence can convey "false", the value is
unnecessary.

This patch converts it to a valueless string attribute: presence
indicates true, absence indicates false. For backward compatibility,
auto-upgrade logic is added in both UpgradeAttributes (bitcode) and
UpgradeFunctionAttributes: if the old value is "true", the attribute
is kept without a value; if "false", the attribute is removed.

All setters (Clang CodeGen, OMPIRBuilder, AMDGPUAttributor, ROCDL
translation) and readers (AMDGPUAttributor, AMDGPULowerKernelAttributes,
AMDGPUHSAMetadataStreamer) are updated accordingly. The attribute is
also documented in the AMDGPU LLVM IR Attributes table where it was
previously missing.
Delta       File
+43 -17     clang/test/CodeGenOpenCL/cl-uniform-wg-size.cl
+24 -26     clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
+21 -0      llvm/test/Bitcode/upgrade-uniform-work-group-size.ll
+21 -0      llvm/lib/IR/AutoUpgrade.cpp
+4 -9       llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+5 -6       llvm/test/CodeGen/AMDGPU/uniform-work-group-propagate-attribute.ll
+118 -58    45 files not shown
+196 -138   51 files
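
The auto-upgrade rule described in this commit can be sketched as a small Python function over a dictionary of function attributes. This is a model only: the dictionary representation (name mapped to a string value, or `None` for a valueless attribute) and the function names are assumptions, not the AutoUpgrade.cpp code.

```python
ATTR = "uniform-work-group-size"


def upgrade_attrs(attrs):
    """Return a copy of `attrs` with the old string-valued form upgraded."""
    upgraded = dict(attrs)
    value = upgraded.get(ATTR)
    if value == "true":
        upgraded[ATTR] = None   # keep the attribute, drop its value
    elif value == "false":
        del upgraded[ATTR]      # absence now means "false"
    return upgraded


def has_uniform_work_group_size(attrs):
    """Readers only test for presence after the change."""
    return ATTR in attrs
```

Attributes that are already valueless, and unrelated attributes, pass through unchanged in this sketch.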

LLVM/project 4d724c0 llvm/test/CodeGen/X86 known-never-zero.ll

[X86] known-never-zero.ll - add tests showing failure to handle ISD::EXTRACT_VECTOR_ELT nodes (#183934)

Delta    File
+70 -0   llvm/test/CodeGen/X86/known-never-zero.ll
+70 -0   1 files

LLVM/project 1909e43 mlir/lib/Dialect/GPU/IR GPUDialect.cpp, mlir/test/Dialect/GPU invalid.mlir

[mlir][GPU] Fix crash in WarpExecuteOnLane0Op::verify with wrong terminator (#183930)

WarpExecuteOnLane0Op::verify() called getTerminator() which performed an
unconditional cast<gpu::YieldOp> on the block's last operation. When the
op body was written with a different terminator (e.g. affine.yield), the
cast asserted immediately instead of emitting a verifier diagnostic.

Fix by using dyn_cast in verify() before calling getTerminator(), and
emitting a proper error message when the terminator is not gpu.yield.

Add a regression test to invalid.mlir.

Fixes #181450
Delta    File
+17 -0   mlir/test/Dialect/GPU/invalid.mlir
+3 -1    mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
+20 -1   2 files
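
The difference between an asserting cast and a checked cast in a verifier can be modelled in Python. The classes, the `cast` and `dyn_cast` helpers and the error text below are hypothetical stand-ins for the MLIR facilities named in the message, intended only to show the control flow.

```python
class Op:
    name = "op"


class GpuYieldOp(Op):
    name = "gpu.yield"


class AffineYieldOp(Op):
    name = "affine.yield"


def cast(op, cls):
    """Unconditional cast: aborts when the type does not match."""
    if not isinstance(op, cls):
        raise AssertionError(f"cast<{cls.__name__}> on incompatible {op.name}")
    return op


def dyn_cast(op, cls):
    """Checked cast: returns None when the type does not match."""
    return op if isinstance(op, cls) else None


def verify_warp_body(block_ops):
    """Return an error string for a bad terminator, or None when the body is valid."""
    last = block_ops[-1] if block_ops else None
    if last is None or dyn_cast(last, GpuYieldOp) is None:
        # Hypothetical wording; the real diagnostic may differ.
        return "expected body to be terminated with gpu.yield"
    return None
```

Because the verifier checks before any code path reaches `cast`, a body ending in `affine.yield` produces a diagnostic instead of an assertion failure.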

LLVM/project 889714a clang/test/CodeGenOpenCL cl-uniform-wg-size.cl amdgpu-enqueue-kernel.cl, llvm/lib/IR AutoUpgrade.cpp

[AMDGPU] Make uniform-work-group-size a valueless attribute

The "uniform-work-group-size" function attribute previously took a
string value of "true" or "false". Since presence alone can convey
the "true" semantics and absence can convey "false", the value is
unnecessary.

This patch converts it to a valueless string attribute: presence
indicates true, absence indicates false. For backward compatibility,
auto-upgrade logic is added in both UpgradeAttributes (bitcode) and
UpgradeFunctionAttributes: if the old value is "true", the attribute
is kept without a value; if "false", the attribute is removed.

All setters (Clang CodeGen, OMPIRBuilder, AMDGPUAttributor, ROCDL
translation) and readers (AMDGPUAttributor, AMDGPULowerKernelAttributes,
AMDGPUHSAMetadataStreamer) are updated accordingly. The attribute is
also documented in the AMDGPU LLVM IR Attributes table where it was
previously missing.
Delta       File
+41 -17     clang/test/CodeGenOpenCL/cl-uniform-wg-size.cl
+24 -26     clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
+21 -0      llvm/lib/IR/AutoUpgrade.cpp
+21 -0      llvm/test/Bitcode/upgrade-uniform-work-group-size.ll
+4 -9       llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+5 -6       llvm/test/CodeGen/AMDGPU/uniform-work-group-propagate-attribute.ll
+116 -58    45 files not shown
+194 -138   51 files

LLVM/project e27bbd7 clang/test/CodeGenOpenCL cl-uniform-wg-size.cl

[NFC][Clang] Auto generate check lines for `clang/test/CodeGenOpenCL/cl-uniform-wg-size.cl`
Delta     File
+31 -14   clang/test/CodeGenOpenCL/cl-uniform-wg-size.cl
+31 -14   1 files

LLVM/project 2430410 lldb/docs index.rst, lldb/source/Plugins/Process/FreeBSD-Kernel-Core RegisterContextFreeBSDKernelCore_ppc64le.cpp RegisterContextFreeBSDKernelCore_ppc64le.h

[lldb][Process/FreeBSDKernelCore] Add ppc64le support (#180669)

This is the LLDB version of
https://cgit.freebsd.org/ports/tree/devel/gdb/files/kgdb/ppcfbsd-kern.c.
It enables selecting ppc64le and reading registers from the PCB structure
for core dumps and live kernel debugging. FPU registers aren't supported
yet due to a PCB structure issue, but this change still achieves feature
parity with KGDB. Trapframe unwinding support will be implemented in the
future. Test files using a core dump from ppc64le will be added once
other kernel debugging improvements are done.

---------

Signed-off-by: Minsoo Choo <minsoochoo0122 at proton.me>
Delta     File
+95 -0    lldb/source/Plugins/Process/FreeBSD-Kernel-Core/RegisterContextFreeBSDKernelCore_ppc64le.cpp
+33 -0    lldb/source/Plugins/Process/FreeBSD-Kernel-Core/RegisterContextFreeBSDKernelCore_ppc64le.h
+7 -0     lldb/source/Plugins/Process/FreeBSD-Kernel-Core/ThreadFreeBSDKernelCore.cpp
+1 -1     lldb/docs/index.rst
+1 -1     llvm/docs/ReleaseNotes.md
+1 -0     lldb/source/Plugins/Process/FreeBSD-Kernel-Core/CMakeLists.txt
+138 -2   6 files

LLVM/project 4a93b9a llvm/lib/Target/ARM ARMISelLowering.cpp, llvm/test/CodeGen/ARM fp-intrinsics-vector-v8.ll

[ARM] Lower strictfp vector fp16 rounding operations similar to default mode (#183700)

Previously, the strictfp rounding nodes were lowered by unrolling to
scalar operations, which has a negative impact on performance. This
issue was partially fixed in #180480; this change continues that work and
implements optimized lowering for v4f16 and v8f16.
Delta      File
+10 -220   llvm/test/CodeGen/ARM/fp-intrinsics-vector-v8.ll
+7 -12     llvm/lib/Target/ARM/ARMISelLowering.cpp
+17 -232   2 files

LLVM/project a6ceae4 llvm/lib/Target/AMDGPU AMDGPUPromoteAlloca.cpp

[AMDGPU] Assert non-array alloca does have a size (#183834)

Refs
https://github.com/llvm/llvm-project/pull/179523/changes#r2851952141
Delta   File
+1 -2   llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+1 -2   1 files

LLVM/project f898469 clang/test/CodeGenOpenCL cl-uniform-wg-size.cl amdgpu-enqueue-kernel.cl, llvm/lib/IR AutoUpgrade.cpp

[AMDGPU] Make uniform-work-group-size a valueless attribute

The "uniform-work-group-size" function attribute previously took a
string value of "true" or "false". Since presence alone can convey
the "true" semantics and absence can convey "false", the value is
unnecessary.

This patch converts it to a valueless string attribute: presence
indicates true, absence indicates false. For backward compatibility,
auto-upgrade logic is added in both UpgradeAttributes (bitcode) and
UpgradeFunctionAttributes: if the old value is "true", the attribute
is kept without a value; if "false", the attribute is removed.

All setters (Clang CodeGen, OMPIRBuilder, AMDGPUAttributor, ROCDL
translation) and readers (AMDGPUAttributor, AMDGPULowerKernelAttributes,
AMDGPUHSAMetadataStreamer) are updated accordingly. The attribute is
also documented in the AMDGPU LLVM IR Attributes table where it was
previously missing.
Delta       File
+43 -17     clang/test/CodeGenOpenCL/cl-uniform-wg-size.cl
+24 -26     clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
+21 -0      llvm/lib/IR/AutoUpgrade.cpp
+21 -0      llvm/test/Bitcode/upgrade-uniform-work-group-size.ll
+4 -9       llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+5 -6       llvm/test/CodeGen/AMDGPU/uniform-work-group-propagate-attribute.ll
+118 -58    45 files not shown
+196 -138   51 files

LLVM/project 5d6410f clang/test/CodeGenOpenCL cl-uniform-wg-size.cl

[NFC][Clang] Auto generate check lines for `clang/test/CodeGenOpenCL/cl-uniform-wg-size.cl`
Delta     File
+46 -14   clang/test/CodeGenOpenCL/cl-uniform-wg-size.cl
+46 -14   1 files

LLVM/project 3d086f5 clang/lib/CIR/CodeGen CIRGenExprComplex.cpp, clang/test/CIR/CodeGen implicit-value-init-expr.cpp

[CIR] Implement ImplicitValueInitExpr for ComplexType (#183836)

Implement ImplicitValueInitExpr for ComplexType
Delta    File
+25 -0   clang/test/CIR/CodeGen/implicit-value-init-expr.cpp
+3 -3    clang/lib/CIR/CodeGen/CIRGenExprComplex.cpp
+28 -3   2 files

LLVM/project 7585ab0 llvm/lib/Target/AMDGPU GCNSubtarget.h, llvm/test/CodeGen/AMDGPU hazard-shift64.mir

[AMDGPU] Enable shift64 hazard recognition for gfx9 (#183839)

Enable shift64 hazard recognition for gfx9 cores.

---------

Signed-off-by: John Lu <John.Lu at amd.com>
Delta   File
+1 -3   llvm/lib/Target/AMDGPU/GCNSubtarget.h
+2 -0   llvm/test/CodeGen/AMDGPU/hazard-shift64.mir
+3 -3   2 files