LLVM/project 22271c9 mlir/test/Integration/GPU/CUDA/sm90/python/tools matmulBuilder.py nvgpucompiler.py

[MLIR][NVVM][Tests] Re-enable matmul.py tests (#175728)

This patch re-enables the matmul.py tests:
* Fix gpu.wait usages.
* Fix gpu.launchOp usage.
* Fix the format string for gpu.printf.
* Fix a verification failure by removing the block[0] append;
  this is now done by the Python script's init.
* Fix the runtime error by adding the missing initialize() call during JIT.
* Add the missing waitGroup(0) for the _ws implementation.
  This was mistakenly removed in PR #113713. Without this fix,
  I see timing issues, and the _ws tests with stage > 1 randomly show
  output mismatches.

With all these fixes, the test compiles and executes successfully on an
sm90a machine (locally verified for 1K iterations).

Signed-off-by: Durgadoss R <durgadossr at nvidia.com>
Delta File
+21-22 mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py
+3-1 mlir/test/Integration/GPU/CUDA/sm90/python/tools/nvgpucompiler.py
+24-23 2 files

LLVM/project de32b21 clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/non-overloaded vfncvtbf16.c vfncvt.c, clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/overloaded vfncvtbf16.c

[RISCV] Move the intrinsic tests for zvfofp8min to zvfofp8min directory. NFC. (#176100)

Those intrinsic tests for zvfofp8min don't belong to SiFive.
Delta File
+0-2,478 clang/test/CodeGen/RISCV/rvv-intrinsics-sifive/policy/non-overloaded/vfncvtbf16.c
+2,478-0 clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/non-overloaded/vfncvtbf16.c
+2,394-0 clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/overloaded/vfncvtbf16.c
+0-2,394 clang/test/CodeGen/RISCV/rvv-intrinsics-sifive/policy/overloaded/vfncvtbf16.c
+0-1,836 clang/test/CodeGen/RISCV/rvv-intrinsics-sifive/policy/non-overloaded/vfncvt.c
+1,836-0 clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/non-overloaded/vfncvt.c
+6,708-6,708 18 files not shown
+14,114-14,114 24 files

LLVM/project d8487be llvm/lib/Target/RISCV RISCVCallingConv.cpp

[RISCV] Store original LocVT/LocInfo in PendingLocs instead of XLenVT/Indirect. NFC (#176193)

Convert to XLenVT/Indirect when we use the PendingLocs. This allows the
2*XLen case to use the original LocVT and not the overridden XLenVT.

Hoping this reduces some of the changes from #176093.
Delta File
+5-5 llvm/lib/Target/RISCV/RISCVCallingConv.cpp
+5-5 1 file

LLVM/project ac9f0ce libc/shared/math dfmal.h, libc/src/__support/math dfmal.h CMakeLists.txt

[libc][math] Refactor dfmal to Header Only. (#175359)

Builds correctly with both Clang and GCC 12.2.

Since `fma` is not `constexpr`, `dfmal` cannot be declared `constexpr`
either.

Closes #175316.
Delta File
+26-0 libc/src/__support/math/dfmal.h
+24-0 libc/shared/math/dfmal.h
+11-1 utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+10-0 libc/src/__support/math/CMakeLists.txt
+2-4 libc/src/math/generic/dfmal.cpp
+4-1 libc/test/shared/shared_math_test.cpp
+77-6 3 files not shown
+80-7 9 files

FreeNAS/freenas 5769f2a tests/sharing_protocols/iscsi test_261_iscsi_cmd.py test_262_iscsi_alua.py

Add an extra delay when waiting for ALUA to settle
Delta File
+1-0 tests/sharing_protocols/iscsi/test_261_iscsi_cmd.py
+1-0 tests/sharing_protocols/iscsi/test_262_iscsi_alua.py
+2-0 2 files

LLVM/project 007a850 lld/ELF InputSection.cpp

[lld][ELF] Deduplicate PC-relative indirect relocation logic for RISC-V and LoongArch
Delta File
+41-77 lld/ELF/InputSection.cpp
+41-77 1 file

LLVM/project 49389a9 lld/ELF/Arch LoongArch.cpp

[lld][LoongArch] Clean up CALL30 relocation with setK16 and checkInt
Delta File
+2-8 lld/ELF/Arch/LoongArch.cpp
+2-8 1 file

LLVM/project 7d3bbdf llvm/test/CodeGen/AMDGPU memory-legalizer-private-wavefront.ll memory-legalizer-private-singlethread.ll

[AMDGPU] Disable generic DAG combines at -O0 to preserve debuggability.

Disable generic DAG combines for AMDGPU at -O0 via disableGenericCombines()
to preserve instructions that users may want to set breakpoints
on during debugging.

Since power-of-2 division/remainder for types wider than i64 depended on
DAG combine optimizations, this patch adds shouldExpandPowerOf2DivRem()
to request IR-level expansion for these cases at -O0.
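For readers unfamiliar with why power-of-2 division needs special handling, here is a minimal C++ sketch of the textbook shift-and-bias expansion of signed division and remainder by 2^K with truncation toward zero. It is an illustration under assumptions, not the code this patch adds: it works on `int64_t` for portability rather than the wider-than-i64 types the patch targets, the helpers `sdivPow2` and `sremPow2` are hypothetical, and the IR that the requested expansion actually produces may differ.

```cpp
#include <cassert>
#include <cstdint>

// Signed X / 2^K with C-style truncation toward zero, for 1 <= K <= 63,
// using only shifts and an add. An arithmetic right shift alone rounds
// toward negative infinity, so negative dividends are first biased by 2^K - 1.
inline std::int64_t sdivPow2(std::int64_t X, unsigned K) {
  // All ones when X < 0, zero otherwise (arithmetic shift of the sign bit;
  // guaranteed since C++20 and what mainstream compilers do earlier).
  std::uint64_t Sign = static_cast<std::uint64_t>(X >> 63);
  // 2^K - 1 when X < 0, 0 otherwise.
  std::uint64_t Bias = Sign >> (64 - K);
  // Add in unsigned arithmetic so the bias cannot trigger signed overflow.
  std::int64_t Biased =
      static_cast<std::int64_t>(static_cast<std::uint64_t>(X) + Bias);
  return Biased >> K;
}

// Signed X % 2^K (result takes the sign of X), derived from the quotient:
// X - (X / 2^K) * 2^K, with the multiply done as a left shift.
inline std::int64_t sremPow2(std::int64_t X, unsigned K) {
  std::uint64_t Q = static_cast<std::uint64_t>(sdivPow2(X, K));
  return static_cast<std::int64_t>(static_cast<std::uint64_t>(X) - (Q << K));
}
```

For example, `sdivPow2(-7, 1)` biases -7 to -6 before shifting and yields -3, matching `-7 / 2` in C++, whereas a plain `-7 >> 1` would give -4.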
Delta File
+8,544-1,366 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-wavefront.ll
+8,544-1,366 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-singlethread.ll
+8,544-1,366 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-workgroup.ll
+8,449-1,355 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-cluster.ll
+8,449-1,355 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll
+8,069-1,315 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-system.ll
+50,599-8,123 73 files not shown
+196,618-26,145 79 files

LLVM/project 10b7bac llvm/test/CodeGen/AMDGPU swdev503538-move-to-valu-stack-srd-physreg.ll

[NFC] Reduce fragility of swdev503538-... test.

The original test was created in PR #120815, but it depends on -O0 and
implicitly uses DAGCombiner (which is switched on by default at -O0).
This patch makes the test less fragile and removes its dependency on
DAGCombiner.
Delta File
+2-2 llvm/test/CodeGen/AMDGPU/swdev503538-move-to-valu-stack-srd-physreg.ll
+2-2 1 file

LLVM/project afd641f orc-rt/include/orc-rt WrapperFunction.h SPSWrapperFunction.h, orc-rt/lib/executor Session.cpp

[orc-rt] Make WrapperFunctionResult constructor explicit. (#176298)

The WrapperFunctionBuffer(orc_rt_WrapperFunctionBuffer) constructor
takes ownership of the underlying buffer (if one exists). Making the
constructor explicit makes this clearer at the call site.

This mirrors a similar change to the LLVM-side API in dec5d663745.
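To illustrate why an explicit owning constructor makes ownership transfer clearer, here is a minimal C++ sketch. `RawBuffer`, `OwningBuffer`, `consume` and `makeRaw` are hypothetical stand-ins for `orc_rt_WrapperFunctionBuffer` and `WrapperFunctionBuffer`, not the orc-rt API; the `static_assert`s show that a raw buffer no longer converts silently into an owning one, while explicit construction still works.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical plain C struct standing in for orc_rt_WrapperFunctionBuffer.
struct RawBuffer {
  char *Data;
  std::size_t Size;
};

// Hypothetical owning wrapper standing in for WrapperFunctionBuffer.
class OwningBuffer {
public:
  // 'explicit' forbids implicit RawBuffer -> OwningBuffer conversions, so
  // every ownership transfer has to be spelled out at the call site.
  explicit OwningBuffer(RawBuffer Raw) : Buf(Raw) {}

  // Move-only: exactly one owner frees the underlying memory.
  OwningBuffer(const OwningBuffer &) = delete;
  OwningBuffer &operator=(const OwningBuffer &) = delete;
  OwningBuffer(OwningBuffer &&Other) noexcept : Buf(Other.Buf) {
    Other.Buf = RawBuffer{nullptr, 0};
  }

  ~OwningBuffer() { std::free(Buf.Data); } // free(nullptr) is a no-op

  std::size_t size() const { return Buf.Size; }
  const char *data() const { return Buf.Data; }

private:
  RawBuffer Buf;
};

// Taking the wrapper by value means taking ownership of the buffer.
inline std::size_t consume(OwningBuffer Owned) { return Owned.size(); }

// Allocates a malloc'd copy of Str, including the terminating NUL.
inline RawBuffer makeRaw(const char *Str) {
  std::size_t N = std::strlen(Str) + 1;
  char *P = static_cast<char *>(std::malloc(N));
  if (P)
    std::memcpy(P, Str, N);
  return RawBuffer{P, P ? N : 0};
}

// Compile-time checks: conversion must be explicit, construction is allowed.
static_assert(!std::is_convertible<RawBuffer, OwningBuffer>::value,
              "RawBuffer must not convert implicitly to OwningBuffer");
static_assert(std::is_constructible<OwningBuffer, RawBuffer>::value,
              "explicit construction is still allowed");
```

With the constructor explicit, `consume(makeRaw("abc"))` fails to compile, while `consume(OwningBuffer(makeRaw("abc")))` compiles and makes the hand-off of the buffer visible where it happens.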
Delta File
+8-5 orc-rt/include/orc-rt/WrapperFunction.h
+10-0 orc-rt/include/orc-rt/SPSWrapperFunction.h
+2-3 orc-rt/include/orc-rt/Session.h
+2-2 orc-rt/include/orc-rt/AllocAction.h
+2-2 orc-rt/unittests/SessionTest.cpp
+1-1 orc-rt/lib/executor/Session.cpp
+25-13 1 file not shown
+26-14 7 files

LLVM/project 10fea27 llvm/lib/Target/X86 X86CodeGenPassBuilder.cpp, llvm/test/CodeGen/X86 llc-pipeline-npm.ll

[X86][NewPM] Fix X86CodeGenPassBuilder

There were two passes in there that had not actually been ported. In
addition, x86-seses was ported earlier today, before this landed, so it
is added as well.
Delta File
+8-5 llvm/lib/Target/X86/X86CodeGenPassBuilder.cpp
+4-4 llvm/test/CodeGen/X86/llc-pipeline-npm.ll
+12-9 2 files

LLVM/project 84cbe2c llvm/include/llvm/CodeGen ValueTypes.td, llvm/test/TableGen CPtrWildcard.td

[ValueTypes] Add types for v256bf16 and v512bf16 (#176287)

There are v256f16 and v128f16 types for f16. This PR adds the same
number of element types for bf16.
Delta File
+2-2 llvm/test/TableGen/CPtrWildcard.td
+2-0 llvm/include/llvm/CodeGen/ValueTypes.td
+4-2 2 files

LLVM/project 5093d00 llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt

Rebase

Created using spr 1.3.6-beta.1
Delta File
+42,349-42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419-41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428-36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175-28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711-22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276-22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358-193,526 9,512 files not shown
+1,757,028-1,326,222 9,518 files

LLVM/project 561144d llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.6-beta.1

[skip ci]
Delta File
+42,349-42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419-41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428-36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175-28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711-22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276-22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358-193,526 9,512 files not shown
+1,757,028-1,326,222 9,518 files

LLVM/project 9e71019 llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt

Rebase

Created using spr 1.3.6-beta.1
Delta File
+42,349-42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419-41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428-36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175-28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711-22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276-22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358-193,526 9,512 files not shown
+1,757,028-1,326,222 9,518 files

LLVM/project 8fee2c3 llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.6-beta.1

[skip ci]
Delta File
+42,349-42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419-41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428-36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175-28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711-22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276-22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358-193,526 9,512 files not shown
+1,757,028-1,326,222 9,518 files

LLVM/project bd2e4ae llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt

Address review comments

Created using spr 1.3.6-beta.1
Delta File
+42,349-42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419-41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428-36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175-28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711-22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276-22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358-193,526 9,512 files not shown
+1,757,026-1,326,221 9,518 files

LLVM/project bd997ae llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.6-beta.1

[skip ci]
Delta File
+42,349-42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419-41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428-36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175-28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711-22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276-22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358-193,526 9,505 files not shown
+1,756,803-1,325,746 9,511 files

LLVM/project abb4d65 llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt

Rebase

Created using spr 1.3.6-beta.1
Delta File
+42,349-42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419-41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428-36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175-28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711-22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276-22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358-193,526 9,505 files not shown
+1,756,803-1,325,746 9,511 files

LLVM/project 9085a50 llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.6-beta.1

[skip ci]
Delta File
+42,349-42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419-41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428-36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175-28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711-22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276-22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358-193,526 9,505 files not shown
+1,756,803-1,325,746 9,511 files

LLVM/project c770769 llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt

Rebase

Created using spr 1.3.6-beta.1
Delta File
+42,349-42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419-41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428-36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175-28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711-22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276-22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358-193,526 9,505 files not shown
+1,756,803-1,325,746 9,511 files

LLVM/project ce69179 llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.6-beta.1

[skip ci]
Delta File
+42,349-42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419-41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428-36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175-28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711-22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276-22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358-193,526 9,505 files not shown
+1,756,803-1,325,746 9,511 files

LLVM/project ab95de0 mlir/lib/Dialect/NVGPU/Transforms OptimizeSharedMemory.cpp, mlir/test/Dialect/NVGPU optimize-shared-memory.mlir

[mlir][nvgpu] Fix a division by zero crash in OptimizeSharedMemoryPass (#174931)

Fixes #173553.
Delta File
+11-0 mlir/test/Dialect/NVGPU/optimize-shared-memory.mlir
+3-0 mlir/lib/Dialect/NVGPU/Transforms/OptimizeSharedMemory.cpp
+14-0 2 files

FreeBSD/ports de024b9 devel/py-ty distinfo Makefile.crates

devel/py-ty: Update to 0.0.12

Changelog: https://github.com/astral-sh/ty/blob/0.0.12/CHANGELOG.md

Reported by:    Repology
Delta File
+19-15 devel/py-ty/distinfo
+8-6 devel/py-ty/Makefile.crates
+1-1 devel/py-ty/Makefile
+28-22 3 files

LLVM/project f9d34fc llvm/test/CodeGen/AMDGPU memory-legalizer-private-workgroup.ll memory-legalizer-private-singlethread.ll

[AMDGPU] Disable generic DAG combines at -O0 to preserve debuggability.

Disable generic DAG combines for AMDGPU at -O0 via disableGenericCombines()
to preserve instructions that users may want to set breakpoints
on during debugging.

Since power-of-2 division/remainder for types wider than i64 depended on
DAG combine optimizations, this patch adds shouldExpandPowerOf2DivRem()
to request IR-level expansion for these cases at -O0.
Delta File
+8,544-1,366 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-workgroup.ll
+8,544-1,366 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-singlethread.ll
+8,544-1,366 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-wavefront.ll
+8,449-1,355 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-cluster.ll
+8,449-1,355 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll
+8,069-1,315 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-system.ll
+50,599-8,123 73 files not shown
+196,618-26,145 79 files

LLVM/project 96bfc57 mlir/lib/Conversion/ArithAndMathToAPFloat MathToAPFloat.cpp

address comments
Delta File
+4-5 mlir/lib/Conversion/ArithAndMathToAPFloat/MathToAPFloat.cpp
+4-5 1 file

LLVM/project d3d7260 llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt

Merge remote-tracking branch 'origin/users/makslevental/python-remove-obj' into users/makslevental/vectormathapfloat
Delta File
+42,349-42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419-41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428-36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175-28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,708-22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276-22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,355-193,526 884 files not shown
+967,791-959,868 890 files

LLVM/project dc5702c llvm/test/CodeGen/AMDGPU swdev503538-move-to-valu-stack-srd-physreg.ll

[NFC] Reduce fragility of swdev503538-... test.

The original test was created in PR #120815, but it depends on -O0 and
implicitly uses DAGCombiner (which is switched on by default at -O0).
This patch makes the test less fragile and removes its dependency on
DAGCombiner.
Delta File
+2-2 llvm/test/CodeGen/AMDGPU/swdev503538-move-to-valu-stack-srd-physreg.ll
+2-2 1 file

LLVM/project d972311 mlir/lib/Bindings/Python TransformInterpreter.cpp

[mlir][Python] remove stray nb::cast
Delta File
+0-6 mlir/lib/Bindings/Python/TransformInterpreter.cpp
+0-6 1 file

FreeNAS/freenas e6dbc22 src/middlewared/middlewared/plugins/iscsi_ alua.py

Include service reload as part of iscsi.alua.settled
Delta File
+11-0 src/middlewared/middlewared/plugins/iscsi_/alua.py
+11-0 1 file