LLVM /project 22271c9 — mlir/test/Integration/GPU/CUDA/sm90/python/tools matmulBuilder.py nvgpucompiler.py [MLIR][NVVM][Tests] Re-enable matmul.py tests (#175728)
This patch re-enables the matmul.py tests:
* Fix gpu.wait usages
* Fix gpu.launchOp usage
* Fix format-string for gpu.printf
* Fix verification failure by removing the block[0] append.
This is now done by the python script's init.
* Fix the runtime error by adding the missing initialize() call during
JIT.
* Add the missing waitGroup(0) for _ws implementation.
This was mistakenly removed in PR #113713. Without this fix,
I see timing issues and the _ws tests with stage>1 randomly show output
mismatch.
With all these fixes, the test compiles and
executes successfully on an sm90a machine.
(locally verified for 1K iterations)
Signed-off-by: Durgadoss R <durgadossr at nvidia.com>
LLVM /project de32b21 — clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/non-overloaded vfncvtbf16.c vfncvt.c, clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/overloaded vfncvtbf16.c [RISCV] Move the intrinsic tests for zvfofp8min to zvfofp8min directory. NFC. (#176100)
Those intrinsic tests for zvfofp8min don't belong to Sifive.
Delta File
+0 -2,478 clang/test/CodeGen/RISCV/rvv-intrinsics-sifive/policy/non-overloaded/vfncvtbf16.c
+2,478 -0 clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/non-overloaded/vfncvtbf16.c
+2,394 -0 clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/overloaded/vfncvtbf16.c
+0 -2,394 clang/test/CodeGen/RISCV/rvv-intrinsics-sifive/policy/overloaded/vfncvtbf16.c
+0 -1,836 clang/test/CodeGen/RISCV/rvv-intrinsics-sifive/policy/non-overloaded/vfncvt.c
+1,836 -0 clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/non-overloaded/vfncvt.c
+6,708 -6,708 18 files not shown
+14,114 -14,114 24 files
[RISCV] Store original LocVT/LocInfo in PendingLocs instead of XLenVT/Indirect. NFC (#176193)
Convert to XLenVT/Indirect when we use the PendingLocs. This allows the
2*XLen case to use the original LocVT and not the overridden XLenVT.
Hoping this reduces some of the changes from #176093.
LLVM /project ac9f0ce — libc/shared/math dfmal.h, libc/src/__support/math dfmal.h CMakeLists.txt [libc][math] Refactor dfmal to Header Only. (#175359)
builds correctly with both Clang and GCC 12.2.
Since `fma` is not `constexpr`, `dfmal` cannot be declared `constexpr`
either.
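The constraint above can be illustrated with a minimal sketch. The namespace `sketch` and the helper `fma_impl` are hypothetical stand-ins, not the names used in LLVM libc, and the cast from `long double` to `double` is a simplification: a real `dfmal` rounds only once to `double`, which this sketch does not guarantee.

```cpp
#include <cmath>

namespace sketch {

// Hypothetical stand-in for a non-constexpr fma building block.
inline long double fma_impl(long double x, long double y, long double z) {
  return std::fma(x, y, z);
}

// Header-only style wrapper: `inline` lets several translation units
// include this definition. It cannot be marked `constexpr`, because it
// calls fma_impl, which is not constexpr.
inline double dfmal(long double x, long double y, long double z) {
  return static_cast<double>(fma_impl(x, y, z));
}

} // namespace sketch
```

Calling `sketch::dfmal(2.0L, 3.0L, 1.0L)` at run time works as usual; initializing a `constexpr double` from the same call would be rejected by the compiler, since the wrapper is not a constant expression.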
Closes #175316.
Add an extra delay when waiting for ALUA to settle
[lld][ELF] Deduplicate PC-relative indirect relocation logic for RISC-V and LoongArch
[lld][LoongArch] Clean up CALL30 relocation with setK16 and checkInt
LLVM /project 7d3bbdf — llvm/test/CodeGen/AMDGPU memory-legalizer-private-wavefront.ll memory-legalizer-private-singlethread.ll [AMDGPU] Disable generic DAG combines at -O0 to preserve debuggability.
Disable generic DAG combines for AMDGPU at -O0 via disableGenericCombines()
to preserve instructions that users may want to set breakpoints
on during debugging.
Since power-of-2 division/remainder for types > i64 was dependent on
DAG combine optimizations, added shouldExpandPowerOf2DivRem()
to request IR-level expansion for these cases at -O0.
Delta File
+8,544 -1,366 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-wavefront.ll
+8,544 -1,366 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-singlethread.ll
+8,544 -1,366 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-workgroup.ll
+8,449 -1,355 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-cluster.ll
+8,449 -1,355 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll
+8,069 -1,315 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-system.ll
+50,599 -8,123 73 files not shown
+196,618 -26,145 79 files
LLVM /project 10b7bac — llvm/test/CodeGen/AMDGPU swdev503538-move-to-valu-stack-srd-physreg.ll [NFC] Reduce fragility of swdev503538-... test.
The original test was created in PR #120815, but it depends on -O0 and
implicitly uses DAGCombiner (that is switched on by default for -O0).
The patch reduces fragility of the test and removes dependency on
DAGCombiner.
LLVM /project afd641f — orc-rt/include/orc-rt WrapperFunction.h SPSWrapperFunction.h, orc-rt/lib/executor Session.cpp [orc-rt] Make WrapperFunctionResult constructor explicit. (#176298)
The WrapperFunctionBuffer(orc_rt_WrapperFunctionBuffer) constructor
takes ownership of the underlying buffer (if one exists). Making the
constructor explicit makes this clearer at the call site.
This mirrors a similar change to the LLVM-side API in dec5d663745.
LLVM /project 10fea27 — llvm/lib/Target/X86 X86CodeGenPassBuilder.cpp, llvm/test/CodeGen/X86 llc-pipeline-npm.ll [X86][NewPM] Fix X86CodeGenPassBuilder
There were two passes in there that have not actually been ported, and
x86-seses got ported earlier today before this landed, so adding it as
well.
LLVM /project 84cbe2c — llvm/include/llvm/CodeGen ValueTypes.td, llvm/test/TableGen CPtrWildcard.td [ValueTypes] Add types for v256bf16 and v512bf16 (#176287)
There are v256f16 and v128f16 types for f16. This PR adds the same
number of element types for bf16.
LLVM /project 5093d00 — llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt Rebase
Created using spr 1.3.6-beta.1
Delta File
+42,349 -42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419 -41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428 -36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175 -28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711 -22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276 -22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358 -193,526 9,512 files not shown
+1,757,028 -1,326,222 9,518 files
LLVM /project 561144d — llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt [𝘀𝗽𝗿] changes introduced through rebase
Created using spr 1.3.6-beta.1
[skip ci]
Delta File
+42,349 -42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419 -41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428 -36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175 -28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711 -22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276 -22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358 -193,526 9,512 files not shown
+1,757,028 -1,326,222 9,518 files
LLVM /project 9e71019 — llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt Rebase
Created using spr 1.3.6-beta.1
Delta File
+42,349 -42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419 -41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428 -36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175 -28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711 -22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276 -22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358 -193,526 9,512 files not shown
+1,757,028 -1,326,222 9,518 files
LLVM /project 8fee2c3 — llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt [𝘀𝗽𝗿] changes introduced through rebase
Created using spr 1.3.6-beta.1
[skip ci]
Delta File
+42,349 -42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419 -41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428 -36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175 -28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711 -22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276 -22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358 -193,526 9,512 files not shown
+1,757,028 -1,326,222 9,518 files
LLVM /project bd2e4ae — llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt Address review comments
Created using spr 1.3.6-beta.1
Delta File
+42,349 -42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419 -41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428 -36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175 -28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711 -22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276 -22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358 -193,526 9,512 files not shown
+1,757,026 -1,326,221 9,518 files
LLVM /project bd997ae — llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt [𝘀𝗽𝗿] changes introduced through rebase
Created using spr 1.3.6-beta.1
[skip ci]
Delta File
+42,349 -42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419 -41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428 -36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175 -28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711 -22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276 -22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358 -193,526 9,505 files not shown
+1,756,803 -1,325,746 9,511 files
LLVM /project abb4d65 — llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt Rebase
Created using spr 1.3.6-beta.1
Delta File
+42,349 -42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419 -41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428 -36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175 -28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711 -22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276 -22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358 -193,526 9,505 files not shown
+1,756,803 -1,325,746 9,511 files
LLVM /project 9085a50 — llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt [𝘀𝗽𝗿] changes introduced through rebase
Created using spr 1.3.6-beta.1
[skip ci]
Delta File
+42,349 -42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419 -41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428 -36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175 -28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711 -22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276 -22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358 -193,526 9,505 files not shown
+1,756,803 -1,325,746 9,511 files
LLVM /project c770769 — llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt Rebae
Created using spr 1.3.6-beta.1
Delta File
+42,349 -42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419 -41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428 -36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175 -28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711 -22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276 -22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358 -193,526 9,505 files not shown
+1,756,803 -1,325,746 9,511 files
LLVM /project ce69179 — llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt [𝘀𝗽𝗿] changes introduced through rebase
Created using spr 1.3.6-beta.1
[skip ci]
Delta File
+42,349 -42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419 -41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428 -36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175 -28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711 -22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276 -22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358 -193,526 9,505 files not shown
+1,756,803 -1,325,746 9,511 files
LLVM /project ab95de0 — mlir/lib/Dialect/NVGPU/Transforms OptimizeSharedMemory.cpp, mlir/test/Dialect/NVGPU optimize-shared-memory.mlir [mlir][nvgpu] Fix a division by zero crash in OptimizeSharedMemoryPass (#174931)
Fixes #173553.
devel/py-ty: Update to 0.0.12
Changelog: https://github.com/astral-sh/ty/blob/0.0.12/CHANGELOG.md
Reported by: Repology
LLVM /project f9d34fc — llvm/test/CodeGen/AMDGPU memory-legalizer-private-workgroup.ll memory-legalizer-private-singlethread.ll [AMDGPU] Disable generic DAG combines at -O0 to preserve debuggability.
Disable generic DAG combines for AMDGPU at -O0 via disableGenericCombines()
to preserve instructions that users may want to set breakpoints
on during debugging.
Since power-of-2 division/remainder for types > i64 was dependent on
DAG combine optimizations, added shouldExpandPowerOf2DivRem()
to request IR-level expansion for these cases at -O0.
Delta File
+8,544 -1,366 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-workgroup.ll
+8,544 -1,366 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-singlethread.ll
+8,544 -1,366 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-wavefront.ll
+8,449 -1,355 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-cluster.ll
+8,449 -1,355 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll
+8,069 -1,315 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-system.ll
+50,599 -8,123 73 files not shown
+196,618 -26,145 79 files
address comments
LLVM /project d3d7260 — llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt Merge remote-tracking branch 'origin/users/makslevental/python-remove-obj' into users/makslevental/vectormathapfloat
Delta File
+42,349 -42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419 -41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428 -36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175 -28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,708 -22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276 -22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,355 -193,526 884 files not shown
+967,791 -959,868 890 files
LLVM /project dc5702c — llvm/test/CodeGen/AMDGPU swdev503538-move-to-valu-stack-srd-physreg.ll [NFC] Reduce fragility of swdev503538-... test.
The original test was created in PR #120815, but it depends on -O0 and
implicitly uses DAGCombiner (that is switched on by default for -O0).
The patch reduces fragility of the test and removes dependency on
DAGCombiner.
[mlir][Python] remove stray nb::cast
Include service reload as part of iscsi.alua.settled