LLVM/project 7e98d19llvm/lib/Transforms/Scalar LoopFuse.cpp, llvm/test/Transforms/LoopFusion different_guards.ll

[LoopFusion] Do not fuse loops with different guards (#199724)

The testcase that was originally contributed to #193641 exposed a
functional issue in which loop fusion can fuse loops with different
guards. There seem to be two distinct bugs, and each of them alone is
enough to let this happen.

- The condition that checks that loop guards are identical is intended to
exclude loops that require peeling. But the condition is not correct, and
it allows some loops that do not require peeling to pass.

- The condition that checks that two guards are identical implicitly
assumes that the conditions of the guard branches are instructions, but
this is not always the case.

This patch fixes the problem for the loops that do not require peeling.
The issue still exists for loops that require peeling and will be fixed
separately.
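
As a rough illustration of the second bug, here is a toy Python model. It is not the code in LoopFuse.cpp; the `Instr` and `Guard` types and both comparison functions are hypothetical. It shows how a check that only compares conditions when both are instructions can accept two guards whose conditions are different plain values (for example, function arguments).

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Instr:
    """Hypothetical stand-in for an IR instruction (opcode plus operand names)."""
    opcode: str
    operands: Tuple[str, ...]


# In this toy model a guard condition is either an instruction or a plain
# value such as a function argument or constant, named by a string.
Cond = Union[Instr, str]


@dataclass(frozen=True)
class Guard:
    cond: Cond
    loop_on_true: bool  # True if the "true" successor enters the loop


def naive_same_guard(a: Guard, b: Guard) -> bool:
    # Buggy sketch: the conditions are only compared when both are
    # instructions, so two different non-instruction conditions slip through.
    if isinstance(a.cond, Instr) and isinstance(b.cond, Instr):
        if a.cond != b.cond:
            return False
    return a.loop_on_true == b.loop_on_true


def same_guard(a: Guard, b: Guard) -> bool:
    # Stricter sketch: conditions of different kinds never match, and
    # conditions of the same kind must be equal.
    if type(a.cond) is not type(b.cond):
        return False
    return a.cond == b.cond and a.loop_on_true == b.loop_on_true


guard_a = Guard("%flag_a", True)
guard_b = Guard("%flag_b", True)
print(naive_same_guard(guard_a, guard_b))  # the naive check accepts these
print(same_guard(guard_a, guard_b))        # the stricter check rejects them
```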
DeltaFile
+45-0llvm/test/Transforms/LoopFusion/different_guards.ll
+20-15llvm/lib/Transforms/Scalar/LoopFuse.cpp
+65-152 files

LLVM/project e63c406llvm/lib/Target/RISCV RISCVInstrInfoY.td, llvm/lib/Target/RISCV/MCTargetDesc RISCVBaseInfo.h

add operand types

Created using spr 1.3.8-beta.1
DeltaFile
+2-1llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
+2-1llvm/lib/Target/RISCV/RISCVInstrInfoY.td
+4-22 files

LLVM/project 98af862flang/lib/Lower/OpenMP OpenMP.cpp

Remove PFT fallback

The PFT fallback was added when trying to support more complex
constructs and is not required by the current PR.
DeltaFile
+0-32flang/lib/Lower/OpenMP/OpenMP.cpp
+0-321 files

LLVM/project e42046amlir/test/Integration/Dialect/XeGPU/WG load_store_matrix.mlir

[MLIR][XeGPU] Fix pass name in RUN command (#199766)
DeltaFile
+1-1mlir/test/Integration/Dialect/XeGPU/WG/load_store_matrix.mlir
+1-11 files

LLVM/project 596247ellvm/lib/Target/AMDGPU SIFrameLowering.cpp

Fixes for buildbot breaks

Change-Id: I285adf09ac2df239d0ab05459f7388b6970247ad
DeltaFile
+7-8llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+7-81 files

LLVM/project 27abffalldb/docs/resources lldbgdbremote.md, lldb/source/Plugins/Process/gdb-remote GDBRemoteRegisterContext.cpp GDBRemoteRegisterContext.h

[lldb] New expedited register specification for unavailable regs (#193894)

When lldb-server/debugserver send a stop packet, they expedite the
values of many of the general purpose registers in the stop packet, so
lldb doesn't need to fetch them separately.

On Darwin systems using an AArch64 M4 or newer SOC with SME, we need to
fetch the streaming vector length (svl) register when in Streaming SVE
Mode to correctly size the registers in lldb. On Darwin systems, when we
are not in SSVE mode, svl is undefined -- it is not included in the
expedited registers. However, lldb will still try to fetch the value, so
we get a register-read packet at every stop on M4 and newer systems,
trying to fetch the value.

This patch adds a new format for the expedited registers. They are
normally a `;` separated series of `{regnum}:{native endian bytes}`.
This allows for `{regnum}:` alone, indicating that the register value
for regnum cannot be fetched at this stop.
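
The format described above could be parsed along these lines. This is a minimal sketch, not lldb's implementation: the function name `parse_expedited` is hypothetical, it handles only the register fields (not other stop-packet keys), and it assumes that both the register number and the value bytes are hex-encoded.

```python
from typing import Dict, Optional


def parse_expedited(payload: str) -> Dict[int, Optional[bytes]]:
    """Parse a ';'-separated list of '{regnum}:{bytes}' fields.

    A field of the form '{regnum}:' with nothing after the colon is
    recorded as None, meaning the value cannot be fetched at this stop.
    """
    regs: Dict[int, Optional[bytes]] = {}
    for field in payload.split(";"):
        if not field:
            continue  # tolerate a trailing ';'
        regnum_text, sep, value_text = field.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in field {field!r}")
        regnum = int(regnum_text, 16)  # assumption: hex register number
        regs[regnum] = bytes.fromhex(value_text) if value_text else None
    return regs


regs = parse_expedited("00:efbeadde;1f:;")
for regnum, value in regs.items():
    state = "unavailable" if value is None else value.hex()
    print(f"reg {regnum}: {state}")
```

With a table like this, a client can tell "not expedited" (key absent) apart from "known to be unavailable" (key present with `None`) and skip the extra register-read packet in the second case.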


    [42 lines not shown]
DeltaFile
+110-0lldb/test/API/functionalities/gdb_remote_client/TestUnavailableRegisters.py
+54-27lldb/source/Plugins/Process/gdb-remote/GDBRemoteRegisterContext.cpp
+39-16lldb/source/Plugins/Process/gdb-remote/GDBRemoteRegisterContext.h
+25-9lldb/tools/debugserver/source/RNBRemote.cpp
+11-5lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+8-0lldb/docs/resources/lldbgdbremote.md
+247-572 files not shown
+256-578 files

LLVM/project a7aceffllvm/docs SandboxIR.md, llvm/include/llvm/SandboxIR Tracker.h Context.h

Revert "[SandboxIR][Tracker] Implement accept(/*AcceptAll*/) and revert(/*RevertAll*/)" (#199776)

Reverts llvm/llvm-project#197289
DeltaFile
+0-54llvm/unittests/SandboxIR/TrackerTest.cpp
+6-18llvm/include/llvm/SandboxIR/Tracker.h
+4-13llvm/lib/SandboxIR/Tracker.cpp
+2-2llvm/include/llvm/SandboxIR/Context.h
+1-2llvm/docs/SandboxIR.md
+13-895 files

LLVM/project a14d084libc/config/baremetal config.json

Reland "[libc] Enable baremetal float printf using modular format" (#199758)

Reverts llvm/llvm-project#199114

#199118 fixed the issue uncovered in the Fuchsia CI build.
DeltaFile
+3-3libc/config/baremetal/config.json
+3-31 files

LLVM/project f5a3f1dllvm/test/Transforms/SLPVectorizer/AArch64 lcssa-phi-extract-scale.ll

[SLP][NFC] Add a test with the vectorization regression, NFC



Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/199774
DeltaFile
+213-0llvm/test/Transforms/SLPVectorizer/AArch64/lcssa-phi-extract-scale.ll
+213-01 files

LLVM/project 509f332llvm/lib/Target/RISCV RISCVInstrInfoY.td, llvm/lib/Target/RISCV/AsmParser RISCVAsmParser.cpp

[𝘀𝗽𝗿] initial version

Created using spr 1.3.8-beta.1
DeltaFile
+241-0llvm/test/MC/RISCV/rvy/rvy-basic.s
+168-0llvm/lib/Target/RISCV/RISCVInstrInfoY.td
+42-0llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
+39-0llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
+37-0llvm/test/MC/RISCV/rvy/rvy-basic-invalid.s
+33-0llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp
+560-07 files not shown
+581-513 files

LLVM/project 705dff2llvm/lib/Target/RISCV RISCVInstrInfoY.td, llvm/lib/Target/RISCV/AsmParser RISCVAsmParser.cpp

update for version 0.9.8.2 of the spec

Created using spr 1.3.8-beta.1
DeltaFile
+241-0llvm/test/MC/RISCV/rvy/rvy-basic.s
+168-0llvm/lib/Target/RISCV/RISCVInstrInfoY.td
+42-0llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
+39-0llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
+37-0llvm/test/MC/RISCV/rvy/rvy-basic-invalid.s
+33-0llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp
+560-07 files not shown
+581-513 files

LLVM/project 0e0127elldb/source/ValueObject ValueObjectVTable.cpp, lldb/test/API/functionalities/vtable TestVTableValue.py

[lldb] Fix vtable support on arm64e (#199116)

There were 2 small issues.
1. ValueObjectVTableChild was not fixing the addresses it was pulling
from signed pointers. This broke things like `SBValue::GetLoadAddress`
and identifying the function pointer type from debug info.
2. TestVTableValue.py made a lot of assumptions that did not hold on
arm64e.
   a. GetValueAsUnsigned will return a raw pointer value. Most of the
   time, we needed GetValueAsAddress.
   b. The test was reading pointers out of memory without fixing them up.
   c. The summary for a function pointer on arm64e includes the load
   address. This isn't true on other platforms.
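
To illustrate why a raw pointer value and an address can differ on arm64e, here is a toy sketch of stripping signature bits from a signed user-space pointer. It is not lldb's code: the helper name `fix_address` and the 47-bit addressable range are assumptions chosen for the example, and real targets report their own number of addressable bits.

```python
ADDRESSABLE_BITS = 47  # assumption for illustration only


def fix_address(raw: int, addressable_bits: int = ADDRESSABLE_BITS) -> int:
    """Clear the bits above the addressable range of a user-space pointer.

    On arm64e those high bits can hold a pointer authentication
    signature, so the raw value is not directly usable as an address.
    """
    mask = (1 << addressable_bits) - 1
    return raw & mask


signed_ptr = 0x8012000100004000  # made-up signed pointer value
print(hex(fix_address(signed_ptr)))
```

In test code, this is the difference between comparing raw values read from memory and comparing fixed-up addresses.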
DeltaFile
+10-8lldb/test/API/functionalities/vtable/TestVTableValue.py
+4-0lldb/source/ValueObject/ValueObjectVTable.cpp
+14-82 files

LLVM/project aa1e02dllvm/test/CodeGen/AMDGPU accvgpr-spill-scc-clobber.mir pei-build-av-spill.mir, mlir/lib/Dialect/XeGPU/Transforms XeGPUSubgroupDistribute.cpp

minor cleanup, rebase

Created using spr 1.3.8-beta.1
DeltaFile
+5,568-0llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+3,000-96llvm/test/CodeGen/AMDGPU/pei-build-av-spill.mir
+3,075-0llvm/test/CodeGen/AMDGPU/debug-frame.ll
+2,208-72llvm/test/CodeGen/AMDGPU/pei-build-spill.mir
+0-2,280mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp
+2,196-0llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-mov-b32.mir
+16,047-2,4481,232 files not shown
+57,071-25,4751,238 files

LLVM/project c2b92d3llvm/test/CodeGen/AMDGPU accvgpr-spill-scc-clobber.mir pei-build-av-spill.mir, mlir/lib/Dialect/XeGPU/Transforms XeGPUSubgroupDistribute.cpp

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.8-beta.1

[skip ci]
DeltaFile
+5,568-0llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+3,000-96llvm/test/CodeGen/AMDGPU/pei-build-av-spill.mir
+3,075-0llvm/test/CodeGen/AMDGPU/debug-frame.ll
+2,208-72llvm/test/CodeGen/AMDGPU/pei-build-spill.mir
+0-2,280mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp
+2,196-0llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-mov-b32.mir
+16,047-2,4481,229 files not shown
+57,062-25,4741,235 files

LLVM/project 013587ellvm/test/CodeGen/AMDGPU accvgpr-spill-scc-clobber.mir pei-build-av-spill.mir, mlir/lib/Dialect/XeGPU/Transforms XeGPUSubgroupDistribute.cpp

add more tests

Created using spr 1.3.8-beta.1
DeltaFile
+5,568-0llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+3,000-96llvm/test/CodeGen/AMDGPU/pei-build-av-spill.mir
+3,075-0llvm/test/CodeGen/AMDGPU/debug-frame.ll
+2,208-72llvm/test/CodeGen/AMDGPU/pei-build-spill.mir
+0-2,280mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp
+2,196-0llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-mov-b32.mir
+16,047-2,4481,238 files not shown
+57,110-25,6361,244 files

LLVM/project f57f30bllvm/test/CodeGen/AMDGPU accvgpr-spill-scc-clobber.mir pei-build-av-spill.mir, mlir/lib/Dialect/XeGPU/Transforms XeGPUSubgroupDistribute.cpp

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.8-beta.1

[skip ci]
DeltaFile
+5,568-0llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+3,000-96llvm/test/CodeGen/AMDGPU/pei-build-av-spill.mir
+3,075-0llvm/test/CodeGen/AMDGPU/debug-frame.ll
+0-2,280mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp
+2,208-72llvm/test/CodeGen/AMDGPU/pei-build-spill.mir
+2,196-0llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-mov-b32.mir
+16,047-2,4481,237 files not shown
+57,063-25,6361,243 files

LLVM/project 3177b3ellvm/test/CodeGen/AMDGPU accvgpr-spill-scc-clobber.mir pei-build-av-spill.mir, llvm/test/CodeGen/X86 horizontal-reduce-umax.ll

rebase

Created using spr 1.3.8-beta.1
DeltaFile
+5,568-0llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+3,000-96llvm/test/CodeGen/AMDGPU/pei-build-av-spill.mir
+3,075-0llvm/test/CodeGen/AMDGPU/debug-frame.ll
+0-2,353llvm/test/CodeGen/X86/horizontal-reduce-umax.ll
+0-2,280mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp
+2,208-72llvm/test/CodeGen/AMDGPU/pei-build-spill.mir
+13,851-4,8012,377 files not shown
+88,102-48,5232,383 files

LLVM/project 8a64511llvm/test/Analysis/LoopAccessAnalysis clamped-access-pattern.ll, llvm/test/Transforms/LoopVectorize runtime-check-small-clamped-bounds.ll hoist-predicated-loads-with-predicated-stores.ll

[LV] Add tests with pointers based on URem expressions (NFC). (#199763)

Add tests with loads and stores with pointers based on URem expressions.
DeltaFile
+1,076-0llvm/test/Analysis/LoopAccessAnalysis/clamped-access-pattern.ll
+633-176llvm/test/Transforms/LoopVectorize/runtime-check-small-clamped-bounds.ll
+427-0llvm/test/Transforms/LoopVectorize/AArch64/clamped-load.ll
+156-34llvm/test/Transforms/LoopVectorize/hoist-predicated-loads-with-predicated-stores.ll
+152-0llvm/test/Transforms/LoopVectorize/clamped-load-vf-ranges.ll
+101-0llvm/test/Transforms/LoopVectorize/AArch64/discarded-interleave-group.ll
+2,545-2102 files not shown
+2,656-2108 files

LLVM/project 0eb28e6llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/RISCV get-vec-element-size.ll fmuladd_width_prop.ll

[SLP] Propagate through intrinsics in BoUpSLP::getVectorElementSize() (#199129)

We already propagate through simple binary operations, but some
operations are excluded only because they happen to be intrinsics.

Motivated by a case exposed when removing vectorization from pre-LTO, see
https://github.com/llvm/llvm-project/pull/195886#issuecomment-4486422243.
DeltaFile
+16-36llvm/test/Transforms/SLPVectorizer/RISCV/get-vec-element-size.ll
+18-18llvm/test/Transforms/SLPVectorizer/RISCV/fmuladd_width_prop.ll
+18-1llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+52-553 files

LLVM/project 2c33687llvm/test/CodeGen/AMDGPU extract-vector-elt-binop-build-vector.ll

[AMDGPU] Add regression test for extract of vector binop scalarization (#198825)

Test that extracting both lanes from a binop of two build_vectors
sharing a variable operand at different lane positions correctly folds
per-lane constants.

Assisted-by: Cursor (Claude)
DeltaFile
+87-0llvm/test/CodeGen/AMDGPU/extract-vector-elt-binop-build-vector.ll
+87-01 files

LLVM/project a4c8cfdllvm/lib/Target/AMDGPU/MCTargetDesc AMDGPUTargetStreamer.cpp, llvm/test/CodeGen/AMDGPU elf-note-null-terminator.ll

[AMDGPU] Fix ELF note emission to include null terminator (#199720)

The `AMDGPUTargetELFStreamer::EmitNote()` function claims the note name
includes a null terminator (NameSZ = Name.size() + 1) but only emits the
string bytes via `emitBytes(Name)`, relying on alignment padding to
provide the null byte. This works in most situations but breaks with
8-byte names, where the name already ends exactly at the alignment
boundary and no padding is added.

Explicitly emit the null terminator with `S.emitInt8(0)` after
`emitBytes(Name)`.
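
The layout problem can be reproduced with a small Python sketch. This is a model of the note layout, not the streamer code: `emit_note` and `align4` are hypothetical helpers, and 4-byte alignment with little-endian fields is assumed.

```python
import struct


def align4(n: int) -> int:
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def emit_note(name: bytes, desc: bytes, note_type: int, explicit_nul: bool) -> bytes:
    # Header: namesz (counts the null terminator), descsz, type.
    out = struct.pack("<III", len(name) + 1, len(desc), note_type)
    out += name
    if explicit_nul:
        out += b"\0"  # models the explicit terminator added by the fix
    out += b"\0" * (align4(len(out)) - len(out))  # pad the name field
    out += desc
    out += b"\0" * (align4(len(out)) - len(out))  # pad the descriptor
    return out


# A 6-byte name gets two padding bytes either way, so the bug is hidden.
print(len(emit_note(b"AMDGPU", b"abcd", 1, False)), len(emit_note(b"AMDGPU", b"abcd", 1, True)))

# An 8-byte name is already aligned: without the explicit terminator the
# name field is 4 bytes shorter than namesz implies.
print(len(emit_note(b"12345678", b"abcd", 1, False)), len(emit_note(b"12345678", b"abcd", 1, True)))
```

In the 8-byte case without the explicit terminator, the byte that a reader expects to be the null terminator is actually the first byte of the descriptor.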
DeltaFile
+33-0llvm/test/CodeGen/AMDGPU/elf-note-null-terminator.ll
+1-0llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
+34-02 files

LLVM/project c052a26clang/include/clang/Basic DarwinSDKInfo.h, clang/lib/Basic DarwinSDKInfo.cpp

Revert "[clang][driver][darwin] Hold onto full triples in Darwin SDKPlatformInfo (#197791)" (#199756)

This reverts commit 9c06c5de6a20df13cfe6d9a7022308e96f378955. It broke
downstream builds for compiler-rt builtins.

Resolves: rdar://177813095
DeltaFile
+25-109clang/lib/Basic/DarwinSDKInfo.cpp
+33-35clang/include/clang/Basic/DarwinSDKInfo.h
+12-6clang/lib/Driver/ToolChains/Darwin.cpp
+2-2clang/unittests/Basic/DarwinSDKInfoTest.cpp
+1-1clang/lib/Driver/ToolChains/Darwin.h
+73-1535 files

LLVM/project 210323bllvm/lib/Target/AMDGPU SIFrameLowering.cpp SIMachineFunctionInfo.h, llvm/test/CodeGen/AMDGPU amdgpu-spill-cfi-saved-regs.ll

[AMDGPU] Implement -amdgpu-spill-cfi-saved-regs

These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.

Change-Id: If4f34abb3a8e0e46b859a7c74ade21eff58c4047
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
DeltaFile
+2,926-0llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll
+12-0llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+10-0llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
+9-0llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+2-0llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+2,959-05 files

LLVM/project 708acd4llvm/include/llvm/CodeGen MachineFunction.h, llvm/lib/CodeGen MachineFunction.cpp

[AMDGPU][MC] Replace shifted registers in CFI instructions

Change-Id: I0d99e9fe43ec3b6fecac20531119956dca2e4e5c
DeltaFile
+67-67llvm/test/CodeGen/AMDGPU/sgpr-spill-overlap-wwm-reserve.mir
+33-0llvm/lib/MC/MCDwarf.cpp
+15-15llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll
+10-0llvm/lib/CodeGen/MachineFunction.cpp
+4-4llvm/test/CodeGen/AMDGPU/debug-frame.ll
+4-0llvm/include/llvm/CodeGen/MachineFunction.h
+133-865 files not shown
+143-9011 files

LLVM/project 3b7e787llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.960bit.ll

[AMDGPU] Use register pair for PC spill

Change-Id: Ibedeef926f7ff235a06de65a83087c151f66a416
DeltaFile
+4,331-4,331llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+1,742-1,740llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+1,562-1,560llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+1,462-1,460llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+1,238-1,236llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll
+1,030-1,028llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.768bit.ll
+11,365-11,35589 files not shown
+18,153-18,04495 files

LLVM/project b39f39dllvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll gfx-callable-argument-types.ll

[AMDGPU] Implement CFI for CSR spills

Introduce new SPILL pseudos to allow CFI to be generated for only CSR
spills, and to make the information accurate at the ISA-instruction level.

Other targets either generate slightly incorrect information or rely on
conventions for how spills are placed within the entry block. The
approach in this change produces larger unwind tables, with the
increased size being spent on additional DW_CFA_advance_location
instructions needed to describe the unwinding accurately.

Change-Id: I9b09646abd2ac4e56eddf5e9aeca1a5bebbd43dd
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
DeltaFile
+3,568-2,598llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+1,912-1,913llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll
+2,700-12llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+631-631llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+505-510llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+394-399llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+9,710-6,063108 files not shown
+14,819-9,521114 files

LLVM/project 17a001c

[AMDGPU] Implement CFI for non-kernel functions

This does not implement CSR spills other than those AMDGPU handles
during PEI. The remaining spills are handled in a subsequent patch.

Change-Id: I5e3a9a62cf9189245011a82a129790d813d49373
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
DeltaFile
+0-00 files

LLVM/project 9b4ad51

[MC][Dwarf] Add custom CFI pseudo-ops for use in AMDGPU

While these can be represented with .cfi_escape, using these pseudo-cfi
instructions makes .s/.mir files more readable, and it is necessary to
support updating registers in CFI instructions (something that the
AMDGPU backend requires).

Change-Id: I763d0cabe5990394670281d4afb5a170981e55d0
DeltaFile
+0-00 files

LLVM/project 162b859

[AMDGPU] Emit entry function Dwarf CFI

Entry functions represent the end of unwinding, as they are the
outer-most frame. This implies they can only have a meaningful
definition for the CFA, which AMDGPU defines using a memory location
description with a literal private address space address. The return
address is set to undefined as a sentinel value to signal the end of
unwinding.

Change-Id: I21580f6a24f4869ba32939c9c6332506032cc654
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
DeltaFile
+0-00 files

LLVM/project b87bafe

[Clang] Default to async unwind tables for amdgcn

To avoid codegen changes when enabling debug-info (see
https://bugs.llvm.org/show_bug.cgi?id=37240) we want to
enable unwind tables by default.

There is some pessimization in post-prologepilog scheduling, and a
general solution to the problem of CFI_INSTRUCTION-as-scheduling-barrier
should be explored.

Change-Id: I83625875966928c7c4411cd7b95174dc58bda25a
DeltaFile
+0-00 files