LLVM/project d2d8c53 clang/lib/Driver/ToolChains AMDGPU.cpp AMDGPUOpenMP.cpp, clang/lib/Driver/ToolChains/Arch AMDGPU.cpp

[AMDGPU] Rewrite `-march` to `-mcpu` in the AMDGPU Toolchain (#198877)

Summary:
Pretty much every target uses either `-mcpu` or `-march` consistently.
AMDGPU has accidentally been using both for a while, mostly as fallout
from the OpenMP toolchain. This is too deeply entrenched to remove without
potentially disrupting users, but I want to at least contain it by
canonicalizing `-march` to `-mcpu` in the driver. This means the rest of
the driver only needs to check `-mcpu`, as every other target does,
instead of checking both.
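To illustrate the canonicalization idea, here is a small sketch over plain argument strings. It is not the Clang driver code, which operates on `llvm::opt` argument lists; `canonicalizeCpuArgs` and `lastCpu` are hypothetical names, only the joined `-march=<gpu>` spelling is handled, and "last `-mcpu` wins" is an assumption about the lookup convention.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: rewrite "-march=<gpu>" into "-mcpu=<gpu>" once, up
// front, so later code only ever looks for -mcpu. Only the joined
// "-flag=value" spelling is handled here.
std::vector<std::string> canonicalizeCpuArgs(const std::vector<std::string> &Args) {
  const std::string MArch = "-march=";
  std::vector<std::string> Out;
  Out.reserve(Args.size());
  for (const std::string &A : Args) {
    if (A.compare(0, MArch.size(), MArch) == 0)
      Out.push_back("-mcpu=" + A.substr(MArch.size()));
    else
      Out.push_back(A);
  }
  return Out;
}

// After canonicalization, consumers need a single query. The last -mcpu
// wins (an assumption mirroring the usual driver convention).
std::string lastCpu(const std::vector<std::string> &Args) {
  const std::string MCpu = "-mcpu=";
  std::string Cpu;
  for (const std::string &A : Args)
    if (A.compare(0, MCpu.size(), MCpu) == 0)
      Cpu = A.substr(MCpu.size());
  return Cpu;
}
```

Doing the rewrite once at the boundary keeps every later consumer down to one spelling, which is the containment the commit describes.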
Delta File
+8-5 clang/lib/Driver/ToolChains/AMDGPU.cpp
+3-4 clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+1-5 clang/lib/Driver/ToolChains/CommonArgs.cpp
+1-3 clang/lib/Driver/ToolChains/Arch/AMDGPU.cpp
+13-17 4 files

LLVM/project a3da590 flang-rt/test/Driver compare_iso_fortran_env_symbols.f90, llvm/runtimes CMakeLists.txt

[flang-rt] Fix ISO test not respecting real kind flags (#198922)

Summary:
The test previously did not account for CMake overrides of the real kind
flags, so it now reads the file that is actually generated. `sort -u`
should handle the case where both a .so and a .a are present.
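As a rough illustration of why `sort -u` copes with a library built as both a .so and a .a: the same symbol names appear twice in the combined listing, and sorting then dropping adjacent duplicates leaves one copy of each. The sketch mimics that with `std::sort` plus `std::unique`; `sortUnique` is a hypothetical helper, and it compares bytes, which matches `LC_ALL=C sort -u` rather than locale-aware collation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Mimics `sort -u`: sort the combined symbol list, then erase adjacent
// duplicates so a symbol exported by both the .so and the .a appears once.
std::vector<std::string> sortUnique(std::vector<std::string> Names) {
  std::sort(Names.begin(), Names.end());
  Names.erase(std::unique(Names.begin(), Names.end()), Names.end());
  return Names;
}
```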
Delta File
+6-6 flang-rt/test/Driver/compare_iso_fortran_env_symbols.f90
+3-0 llvm/runtimes/CMakeLists.txt
+9-6 2 files

LLVM/project 17e4140 lldb/include/lldb/Symbol TypeSystem.h

[lldb] Make TypeSystem::m_sym_file atomic to fix data race (#198923)

SymbolFileCommon::GetTypeSystemForLanguage unconditionally writes this
pointer with `ts->SetSymbolFile(this)` on every lookup, which races with
concurrent reads from other threads.

The race is benign in practice: there is exactly one SymbolFile per
Module, so every writer stores the same pointer, but it is still
undefined behavior under the C++ memory model.

Make the field std::atomic<SymbolFile *> and turn SetSymbolFile into a
compare-exchange that asserts a TypeSystem is never rebound to a
different SymbolFile, documenting the invariant that lets us get away
with this.
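A minimal sketch of the compare-exchange pattern described above, using simplified stand-in types rather than LLDB's real `TypeSystem` and `SymbolFile` classes (`TypeSystemModel` is hypothetical). The first store wins; any later store must pass the same pointer, otherwise the assertion fires.

```cpp
#include <atomic>
#include <cassert>

struct SymbolFile {};  // stand-in for lldb's SymbolFile

// Simplified model of the pattern, not LLDB's actual TypeSystem.
class TypeSystemModel {
public:
  void SetSymbolFile(SymbolFile *sym_file) {
    SymbolFile *expected = nullptr;
    // Bind only if unbound; on failure `expected` receives the current value.
    if (!m_sym_file.compare_exchange_strong(expected, sym_file))
      assert(expected == sym_file &&
             "TypeSystem must not be rebound to a different SymbolFile");
  }
  SymbolFile *GetSymbolFile() const { return m_sym_file.load(); }

private:
  std::atomic<SymbolFile *> m_sym_file{nullptr};
};
```

`compare_exchange_strong` writes the current value back into `expected` on failure, which is what lets the assertion compare against the pointer that is already bound. Repeated calls with the same pointer, as in the one-SymbolFile-per-Module case, are harmless.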

The alternative is to have the SymbolFile pointer passed in through the
constructor, but that would require updating a bunch of call sites,
including various plugin interfaces.

Found by ThreadSanitizer as part of #197792.
Delta File
+27-5 lldb/include/lldb/Symbol/TypeSystem.h
+27-5 1 file

LLVM/project 3a25cb1 llvm/lib/CodeGen InlineSpiller.cpp

More and more refactoring.
Delta File
+12-4 llvm/lib/CodeGen/InlineSpiller.cpp
+12-4 1 file

LLVM/project 480a6e0 llvm/test/CodeGen/DirectX/DebugInfo di-subprogram.ll

Adjust test after merged PRs.
Delta File
+4-1 llvm/test/CodeGen/DirectX/DebugInfo/di-subprogram.ll
+4-1 1 file

LLVM/project f4caa0a llvm/lib/Target/AMDGPU AMDGPU.td GCNSubtarget.cpp, llvm/test/CodeGen/AMDGPU gfx12-5-generic-no-xnack.ll

[AMDGPU] Remove feature unsupported by gfx12-5-generic target (#198437)

Co-authored-by: Shilei Tian <i at tianshilei.me>
Co-authored-by: Chinmay Deshpande <chdeshpa at amd.com>
Delta File
+8-3 llvm/lib/Target/AMDGPU/AMDGPU.td
+9-0 llvm/test/CodeGen/AMDGPU/gfx12-5-generic-no-xnack.ll
+4-0 llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+21-3 3 files

LLVM/project 4e0d751 llvm/lib/Support UnicodeNameToCodepointGenerated.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.av.load.b128.ll

Merge branch 'main' into users/chenshanzhi/AArch64-TTI-getTgtMemIntrinsic
Delta File
+23,873-20,923 llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
+8,633-8,584 llvm/test/CodeGen/Thumb2/mve-clmul.ll
+12,365-0 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.av.load.b128.ll
+1,243-8,768 llvm/test/CodeGen/X86/vector-replicaton-i1-mask.ll
+8,195-0 llvm/test/MC/AMDGPU/gfx13_asm_vop3.s
+8,182-0 llvm/test/MC/AMDGPU/gfx13_asm_vop3-fake16.s
+62,491-38,275 5,625 files not shown
+425,957-192,431 5,631 files

LLVM/project 4cdb2bd llvm/lib/Transforms/AggressiveInstCombine AggressiveInstCombine.cpp, llvm/test/Transforms/AggressiveInstCombine popcount.ll

[AggressiveInstCombine] Recognizing tail truncation in the popcount pattern (#198658)

We're currently able to recognize the following popcount pattern:
```
int popcnt(unsigned x) {
 x = x - ((x >> 1) & 0x55555555);
 x = x - 3*((x >> 2) & 0x33333333);
 x = (x + (x >> 4)) & 0x0F0F0F0F;
 x = x + (x >> 8);
 x = x + (x >> 16);
 return x & 0x0000003F;
}
```
but if a truncation follows right after the last AND instruction:
```
int16_t popcnt(unsigned x) {
 x = x - ((x >> 1) & 0x55555555);
 x = x - 3*((x >> 2) & 0x33333333);
 x = (x + (x >> 4)) & 0x0F0F0F0F;

    [12 lines not shown]
Delta File
+127-0 llvm/test/Transforms/AggressiveInstCombine/popcount.ll
+26-7 llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
+153-7 2 files

LLVM/project bf1fca2 clang/test/AST ast-dump-lambda-json.cpp ast-dump-template-json-win32-mangler-crash.cpp, lldb/tools/lldb-dap/extension package-lock.json

Merge branch 'users/hvdijk/aaw-emitmdnodeannot' into users/hvdijk/dxilprettyprinter-ir-printing
Delta File
+23,873-20,923 llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
+12,365-0 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.av.load.b128.ll
+3,903-0 llvm/test/CodeGen/NVPTX/machine-cse-predicate-inversion.ll
+2,504-1,285 lldb/tools/lldb-dap/extension/package-lock.json
+0-3,387 clang/test/AST/ast-dump-lambda-json.cpp
+7-3,217 clang/test/AST/ast-dump-template-json-win32-mangler-crash.cpp
+42,652-28,812 1,867 files not shown
+110,110-56,250 1,873 files

LLVM/project 6799f69 lldb/source/Host/macosx/objcxx HostInfoMacOSX.mm

Revert "[LLDB] Add a progress event to xcrun invocations (#198931)" (#198945)

This change requires Host to link against Core, which it cannot do; it
may only link against Utility. Reverting so Adrian can decide what to do.

This reverts commit 5c63509f4cc356639d9c4067e0812c2312689363.
Delta File
+0-8 lldb/source/Host/macosx/objcxx/HostInfoMacOSX.mm
+0-8 1 file

LLVM/project cd2b962 clang/test/AST ast-dump-lambda-json.cpp ast-dump-template-json-win32-mangler-crash.cpp, lldb/tools/lldb-dap/extension package-lock.json

Merge branch 'main' into users/hvdijk/aaw-emitmdnodeannot
Delta File
+23,873-20,923 llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
+12,365-0 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.av.load.b128.ll
+3,903-0 llvm/test/CodeGen/NVPTX/machine-cse-predicate-inversion.ll
+2,504-1,285 lldb/tools/lldb-dap/extension/package-lock.json
+0-3,387 clang/test/AST/ast-dump-lambda-json.cpp
+7-3,217 clang/test/AST/ast-dump-template-json-win32-mangler-crash.cpp
+42,652-28,812 1,867 files not shown
+110,110-56,250 1,873 files

LLVM/project 84de374 .github/workflows libc-shared-tests.yml libc-fullbuild-tests.yml

[Github] Add timeouts to libc tests (#198934)

None of these jobs take anywhere close to the six-hour timeout that
GitHub uses by default. Set timeouts that are 2-3x the typical job
runtime so that if a test or build step hangs indefinitely, the job
times out in a reasonable amount of time and does not hold resources
that could be used elsewhere.

This should not impact jobs that do not hang and will not change the
result of jobs that do hang, but it lets us deal more effectively, from
a resource perspective, with cases like today's, where tests were hanging.

This is also standard in some other workflows, such as the main premerge
workflow definition.
Delta File
+2-0 .github/workflows/libc-shared-tests.yml
+1-0 .github/workflows/libc-fullbuild-tests.yml
+1-0 .github/workflows/libc-freebsd-vm-tests.yml
+1-0 .github/workflows/libc-overlay-tests.yml
+5-0 4 files

LLVM/project c25924f llvm/lib/Analysis InstCount.cpp, llvm/lib/Passes PassBuilderPipelines.cpp PassRegistry.def

Add InstCount Pass Before Optimization (#198874)

This lets us count instructions before the optimization pipeline runs,
for analysis purposes.
Delta File
+49-9 llvm/lib/Analysis/InstCount.cpp
+41-0 llvm/test/Analysis/InstCount/pipeline.ll
+36-3 llvm/lib/Passes/PassBuilderPipelines.cpp
+1-7 llvm/test/Analysis/InstCount/instcount.ll
+6-1 llvm/lib/Passes/PassRegistry.def
+5-0 llvm/lib/Passes/PassBuilder.cpp
+138-20 1 file not shown
+142-21 7 files

LLVM/project dec3552 llvm/include/llvm/CodeGen MachineFunction.h, llvm/lib/CodeGen MachineFunction.cpp

[AMDGPU][MC] Replace shifted registers in CFI instructions

Change-Id: I0d99e9fe43ec3b6fecac20531119956dca2e4e5c
Delta File
+67-67 llvm/test/CodeGen/AMDGPU/sgpr-spill-overlap-wwm-reserve.mir
+33-0 llvm/lib/MC/MCDwarf.cpp
+15-15 llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll
+10-0 llvm/lib/CodeGen/MachineFunction.cpp
+4-4 llvm/test/CodeGen/AMDGPU/debug-frame.ll
+4-0 llvm/include/llvm/CodeGen/MachineFunction.h
+133-86 5 files not shown
+143-90 11 files

LLVM/project 9294b22 llvm/lib/Target/AMDGPU SIFrameLowering.cpp SIMachineFunctionInfo.h, llvm/test/CodeGen/AMDGPU amdgpu-spill-cfi-saved-regs.ll

[AMDGPU] Implement -amdgpu-spill-cfi-saved-regs

These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.

Change-Id: If4f34abb3a8e0e46b859a7c74ade21eff58c4047
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
Delta File
+2,926-0 llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll
+12-0 llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+10-0 llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
+9-0 llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+2-0 llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+2,959-0 5 files

LLVM/project 06d4fb8 llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll gfx-callable-argument-types.ll

[AMDGPU] Implement CFI for CSR spills

Introduce new SPILL pseudos so that CFI can be generated only for CSR
spills, and so that the information is accurate at the ISA-instruction level.

Other targets either generate slightly incorrect information or rely on
conventions for how spills are placed within the entry block. The
approach in this change produces larger unwind tables, with the
increased size being spent on additional DW_CFA_advance_loc
instructions needed to describe the unwinding accurately.
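For context on the size cost mentioned above, this sketch encodes the DWARF advance-location family (`DW_CFA_advance_loc`, `DW_CFA_advance_loc1/2/4`) for a single code delta. It assumes the delta is already divided by the CIE's code alignment factor and that operands are little-endian; `encodeAdvanceLoc` and `advanceBytes` are hypothetical helpers, not LLVM's `MCDwarf` API. Each additional point at which the CFI changes costs at least one such instruction on top of the CFI operation itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encodes one DWARF advance-location CFA instruction. Delta is in units of
// the CIE code_alignment_factor; multi-byte operands are little-endian here.
std::vector<uint8_t> encodeAdvanceLoc(uint32_t Delta) {
  std::vector<uint8_t> Out;
  if (Delta < 0x40) {
    // DW_CFA_advance_loc: high 2 bits 0b01, delta packed in the low 6 bits.
    Out.push_back(static_cast<uint8_t>(0x40 | Delta));
  } else if (Delta <= 0xff) {
    Out = {0x02, static_cast<uint8_t>(Delta)};  // DW_CFA_advance_loc1
  } else if (Delta <= 0xffff) {
    Out = {0x03, static_cast<uint8_t>(Delta),   // DW_CFA_advance_loc2
           static_cast<uint8_t>(Delta >> 8)};
  } else {
    Out.push_back(0x04);                        // DW_CFA_advance_loc4
    for (int I = 0; I < 4; ++I)
      Out.push_back(static_cast<uint8_t>(Delta >> (8 * I)));
  }
  return Out;
}

// Total bytes spent on advances when CFI is emitted at several points.
std::size_t advanceBytes(const std::vector<uint32_t> &Deltas) {
  std::size_t Total = 0;
  for (uint32_t D : Deltas)
    Total += encodeAdvanceLoc(D).size();
  return Total;
}
```

Describing each spill at its exact instruction means one advance per spill point, which is where the extra table size comes from.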

Change-Id: I9b09646abd2ac4e56eddf5e9aeca1a5bebbd43dd
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
Delta File
+3,568-2,598 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+1,912-1,913 llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll
+2,700-12 llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+631-631 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+505-510 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+394-399 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+9,710-6,063 108 files not shown
+14,825-9,526 114 files

LLVM/project 7a6764f llvm/test/CodeGen/AMDGPU accvgpr-spill-scc-clobber.mir pei-build-av-spill.mir

[AMDGPU] Implement CFI for non-kernel functions

This does not implement CFI for CSR spills other than those AMDGPU
handles during PEI. The remaining spills are handled in a subsequent patch.

Change-Id: I5e3a9a62cf9189245011a82a129790d813d49373
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
Delta File
+5,568-0 llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+3,000-96 llvm/test/CodeGen/AMDGPU/pei-build-av-spill.mir
+2,208-72 llvm/test/CodeGen/AMDGPU/pei-build-spill.mir
+2,196-0 llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-mov-b32.mir
+2,136-0 llvm/test/CodeGen/AMDGPU/vgpr-spill-scc-clobber.mir
+1,671-1 llvm/test/CodeGen/AMDGPU/debug-frame.ll
+16,779-169 93 files not shown
+22,925-1,049 99 files

LLVM/project fbfe7b4 llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.960bit.ll

[AMDGPU] Use register pair for PC spill

Change-Id: Ibedeef926f7ff235a06de65a83087c151f66a416
Delta File
+4,331-4,331 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+1,742-1,740 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+1,562-1,560 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+1,462-1,460 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+1,238-1,236 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll
+1,030-1,028 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.768bit.ll
+11,365-11,355 89 files not shown
+18,153-18,044 95 files

LLVM/project 0ce5199

[Clang] Default to async unwind tables for amdgcn

To avoid codegen changes when enabling debug-info (see
https://bugs.llvm.org/show_bug.cgi?id=37240) we want to
enable unwind tables by default.

There is some pessimization in post-prologepilog scheduling, and a
general solution to the problem of CFI_INSTRUCTION-as-scheduling-barrier
should be explored.

Change-Id: I83625875966928c7c4411cd7b95174dc58bda25a
Delta File
+0-0 0 files

LLVM/project 6158198 llvm/lib/Target/AMDGPU SIFrameLowering.cpp, llvm/test/CodeGen/AMDGPU debug-frame.ll eliminate-frame-index-v-add-u32.mir

[AMDGPU] Emit entry function Dwarf CFI

Entry functions represent the end of unwinding, as they are the
outer-most frame. This implies they can only have a meaningful
definition for the CFA, which AMDGPU defines using a memory location
description with a literal private address space address. The return
address is set to undefined as a sentinel value to signal the end of
unwinding.
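The "return address is undefined" sentinel corresponds to the DWARF `DW_CFA_undefined` operation (opcode 0x07) applied to the return-address column, with the register number encoded as unsigned LEB128. This sketch only shows that byte encoding; the register numbers in the usage are arbitrary placeholders rather than AMDGPU's actual DWARF register mapping, the CFA memory location description is not modeled, and `appendULEB128`/`encodeUndefined` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends Value as unsigned LEB128: 7 payload bits per byte, low bits first,
// with the high bit set on every byte except the last.
void appendULEB128(std::vector<uint8_t> &Out, uint64_t Value) {
  do {
    uint8_t Byte = static_cast<uint8_t>(Value & 0x7f);
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (Value != 0);
}

constexpr uint8_t DW_CFA_undefined = 0x07;

// Encodes "register DwarfReg is undefined in the caller". Applied to the
// return-address column, it tells an unwinder there is no caller frame.
std::vector<uint8_t> encodeUndefined(uint64_t DwarfReg) {
  std::vector<uint8_t> Out{DW_CFA_undefined};
  appendULEB128(Out, DwarfReg);
  return Out;
}
```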

Change-Id: I21580f6a24f4869ba32939c9c6332506032cc654
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
Delta File
+1,405-0 llvm/test/CodeGen/AMDGPU/debug-frame.ll
+204-12 llvm/test/CodeGen/AMDGPU/eliminate-frame-index-v-add-u32.mir
+134-6 llvm/test/CodeGen/AMDGPU/eliminate-frame-index-v-add-co-u32.mir
+114-10 llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-add-i32.mir
+42-5 llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+34-0 llvm/test/CodeGen/AMDGPU/entry-function-cfi.mir
+1,933-33 22 files not shown
+2,044-50 28 files

LLVM/project 64def82

[MC][Dwarf] Add custom CFI pseudo-ops for use in AMDGPU

While these can be represented with .cfi_escape, these pseudo-CFI
instructions make .s/.mir files more readable, and they are necessary to
support updating registers in CFI instructions (something that the
AMDGPU backend requires).
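To show what the `.cfi_escape` fallback looks like, this sketch renders raw CFA bytes as an escape directive. The bytes 0x07, 0x10 are `DW_CFA_undefined` for DWARF register 16, which could otherwise be written as `.cfi_undefined 16`. `toCfiEscape` is a hypothetical helper; the point is that a register number buried in escaped bytes (possibly as a multi-byte LEB128) is opaque to anything that later needs to rewrite it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Renders raw CFA bytes as a .cfi_escape directive. Any CFA operation can be
// spelled this way, but the reader (or a later pass) must decode it by hand.
std::string toCfiEscape(const std::vector<uint8_t> &Bytes) {
  std::string S = ".cfi_escape ";
  for (std::size_t I = 0; I < Bytes.size(); ++I) {
    char Buf[8];
    std::snprintf(Buf, sizeof(Buf), "0x%02x", static_cast<unsigned>(Bytes[I]));
    if (I != 0)
      S += ", ";
    S += Buf;
  }
  return S;
}
```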

Change-Id: I763d0cabe5990394670281d4afb5a170981e55d0
Delta File
+0-0 0 files

LLVM/project 62ce978

[MIR] Error on signed integer in getUnsigned

Previously we effectively took the absolute value of the APSInt; instead,
diagnose the unexpected negative value.
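A sketch of the behavior change, using a plain `int64_t` in place of the `APSInt` the MIR parser actually works with; `getUnsignedChecked` and its error text are hypothetical. A negative input now produces an error instead of silently becoming its magnitude.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for a MIR parser helper: convert a parsed signed
// value to unsigned, reporting negatives as errors rather than taking abs().
std::optional<uint64_t> getUnsignedChecked(int64_t Parsed, std::string &Err) {
  if (Parsed < 0) {
    Err = "expected an unsigned integer, got " + std::to_string(Parsed);
    return std::nullopt;
  }
  return static_cast<uint64_t>(Parsed);
}
```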

Change-Id: I4efe961e7b29fdf1d5f97df12f8139aac12c9219
Delta File
+0-0 0 files

LLVM/project 1da21d7 lldb/test/API/functionalities/unwind/libunwind_ret_injection TestLibUnwindRetInjection.py

[lldb] Ensure libunwind architecture matches test for TestLibUnwindRetInjection.py (#198884)

If the test is run as arm64e while the just-built libunwind is arm64-only,
the test will not function correctly.
Delta File
+38-12 lldb/test/API/functionalities/unwind/libunwind_ret_injection/TestLibUnwindRetInjection.py
+38-12 1 file

LLVM/project 0aa7f26 llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

[OpenMP][OMPIRBuilder] Refactor removeUnusedBlocksFromParent

This is essentially post-commit review for #198690, which was landed
quickly to fix nondeterminism in tests introduced in #197637.

Change-Id: Ib3603ef3c70dde5bb22d0fc04d9249e62ecccf0c
Co-authored-by: @Meinersbur
Co-authored-by: @chichunchen
Delta File
+22-27 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+22-27 1 file

LLVM/project 0107b92 mlir/include/mlir/Conversion/TosaToSPIRVTosa TosaToSPIRVTosa.h, mlir/lib/Conversion/TosaToSPIRVTosa TosaToSPIRVTosaPass.cpp TosaToSPIRVTosa.cpp

[mlir][tosa][spirv] Add TOSA to SPIR-V TOSA pass plumbing (#196539)

Introduce the initial TosaToSPIRVTosa conversion pass and library
wiring. This slice converts func.func regions to spirv.ARM.Graph inside
spirv.module, rewrites graph input/result types to SPIR-V ARM tensor
types, maps func.return to spirv.ARM.GraphOutputs, and adds focused
tests for type conversion, descriptor bindings, and nested containers.

Signed-off-by: Davide Grohmann <davide.grohmann at arm.com>
Delta File
+214-0 mlir/lib/Conversion/TosaToSPIRVTosa/TosaToSPIRVTosaPass.cpp
+188-0 mlir/lib/Conversion/TosaToSPIRVTosa/TosaToSPIRVTosa.cpp
+67-0 mlir/test/Conversion/TosaToSPIRVTosa/type-conversions.mlir
+45-0 mlir/include/mlir/Conversion/TosaToSPIRVTosa/TosaToSPIRVTosa.h
+28-0 mlir/test/Conversion/TosaToSPIRVTosa/op-nesting.mlir
+28-0 mlir/test/Conversion/TosaToSPIRVTosa/unsupported-func-calls.mlir
+570-0 5 files not shown
+631-0 11 files

LLVM/project 83a8c33 llvm/lib/Target/DirectX/DirectXIRPasses DXILDebugInfo.cpp, llvm/test/tools/dxil-dis di-subprogram.ll

[DirectX] Drop unsupported DISubprogram flags (#197457)

These flags did not exist in LLVM 3.7, so they should be omitted.
Delta File
+11-5 llvm/lib/Target/DirectX/DirectXIRPasses/DXILDebugInfo.cpp
+1-1 llvm/test/tools/dxil-dis/di-subprogram.ll
+12-6 2 files

LLVM/project b8ed8a2 llvm/lib/Target/RISCV/Disassembler RISCVDisassembler.cpp

[RISCV][Disassembler] Refactor simple predicate decoders using a template

This replaces the manual boilerplate for DecodeGPRNoX0, DecodeGPRNoX2,
DecodeGPRNoX31, and DecodeGPRPairNoX0 with a universal filtering template
and constexpr predicate functions.
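A minimal sketch of the filtering-template idea, assuming a 5-bit register field and a simplified `DecodeStatus`; it omits the `MCInst` operand handling and register-class lookups of the real RISC-V disassembler, and `decodeGPRFiltered` and the predicate names are hypothetical. Each former hand-written decoder becomes a one-line instantiation with a different `constexpr` predicate.

```cpp
#include <cassert>
#include <cstdint>

enum class DecodeStatus { Fail, Success };

// constexpr predicates over the 5-bit GPR number (x0..x31).
constexpr bool isNotX0(unsigned Reg) { return Reg != 0; }
constexpr bool isNotX2(unsigned Reg) { return Reg != 2; }
constexpr bool isNotX31(unsigned Reg) { return Reg != 31; }
static_assert(!isNotX0(0) && isNotX0(1), "predicates usable at compile time");

// Generic decoder: extract the register number, reject it if the predicate
// filters it out, otherwise report it unchanged.
template <bool (*Pred)(unsigned)>
DecodeStatus decodeGPRFiltered(uint32_t Field, unsigned &RegOut) {
  unsigned Reg = Field & 0x1f;
  if (!Pred(Reg))
    return DecodeStatus::Fail;
  RegOut = Reg;
  return DecodeStatus::Success;
}

// The former boilerplate decoders become thin named instantiations.
inline DecodeStatus decodeGPRNoX0(uint32_t F, unsigned &R) {
  return decodeGPRFiltered<isNotX0>(F, R);
}
inline DecodeStatus decodeGPRNoX2(uint32_t F, unsigned &R) {
  return decodeGPRFiltered<isNotX2>(F, R);
}
inline DecodeStatus decodeGPRNoX31(uint32_t F, unsigned &R) {
  return decodeGPRFiltered<isNotX31>(F, R);
}
```

Passing the predicate as a template argument keeps each instantiation as cheap as the hand-written version while leaving a single place to change the decode logic.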

I will need more of these for the RVY patch series, so submitting this NFC
cleanup first.

Pull Request: https://github.com/llvm/llvm-project/pull/198146
Delta File
+17-31 llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
+17-31 1 file

LLVM/project e4b331e llvm/lib/MC/MCParser AsmParser.cpp MasmParser.cpp, llvm/test/MC/AsmParser macro-unknown-directive.s macros-darwin.s

[𝘀𝗽𝗿] initial version

Created using spr 1.3.8-beta.1
Delta File
+29-4 llvm/lib/MC/MCParser/AsmParser.cpp
+29-4 llvm/lib/MC/MCParser/MasmParser.cpp
+2-2 llvm/test/MC/AsmParser/macro-unknown-directive.s
+2-2 llvm/test/MC/AsmParser/macros-darwin.s
+1-2 llvm/test/MC/AsmParser/unmatched-if-macro.s
+63-14 5 files

LLVM/project 5c63509 lldb/source/Host/macosx/objcxx HostInfoMacOSX.mm

[LLDB] Add a progress event to xcrun invocations (#198931)

LLDB invokes xcrun to find SDKs on disk. This is usually very fast, but
sometimes (after an Xcode update, or when the searched SDK does not
exist) it can take a long time (10 seconds or more). The progress event
provides user feedback to explain the hang.
Delta File
+8-0 lldb/source/Host/macosx/objcxx/HostInfoMacOSX.mm
+8-0 1 file

LLVM/project 262207b .github/workflows libc-fullbuild-tests.yml libc-overlay-tests.yml

[Github] Do not restrict branches for CI workflows (#198925)

This is covered in our CI best practices document in
https://llvm.org/docs/CIBestPractices.html#ensuring-workflows-run-on-the-correct-events.

Otherwise we cannot run libc CI workflows on stacked pull requests.
Delta File
+0-1 .github/workflows/libc-fullbuild-tests.yml
+0-1 .github/workflows/libc-overlay-tests.yml
+0-1 .github/workflows/libc-freebsd-vm-tests.yml
+0-3 3 files