LLVM/project 2118499 lld/ELF MarkLive.cpp InputFiles.cpp

[ELF] Decouple SharedFile::isNeeded from GC mark. NFC (#190112)

... out of the per-relocation resolveReloc and into a post-GC scan of
global symbols. This decouples the --as-needed logic from the mark
algorithm, simplifying the imminent parallel GC mark.
Delta  File
+11 -7  lld/ELF/MarkLive.cpp
+2 -1  lld/ELF/InputFiles.cpp
+1 -1  lld/ELF/InputFiles.h
+14 -9  3 files
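
The decoupling described in this commit message can be pictured with a minimal sketch. The types and function names below are hypothetical stand-ins, not lld's real classes; the sketch only assumes that the mark phase records liveness per symbol and that a separate pass over the global symbols then decides which shared objects are needed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins; these are not lld's real classes.
struct SharedFile {
  bool isNeeded = false; // would decide whether the library is recorded as needed
};

struct Symbol {
  SharedFile *definedIn = nullptr;      // non-null if defined by a shared object
  bool referencedByLiveSection = false; // filled in by the mark phase
};

// Mark phase: only records liveness; it does not touch SharedFile at all.
void markReference(Symbol &sym) { sym.referencedByLiveSection = true; }

// Post-GC scan: one pass over the global symbols sets isNeeded.
void scanNeeded(const std::vector<Symbol *> &globals) {
  for (Symbol *sym : globals)
    if (sym->definedIn && sym->referencedByLiveSection)
      sym->definedIn->isNeeded = true;
}
```

Because `markReference` writes only to the symbol it visits, the `SharedFile` flags are no longer touched while marking runs, which is the kind of property a parallel mark phase benefits from.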

LLVM/project 2a7ca3a llvm/test/CodeGen/RISCV/rvv fixed-vectors-ctlz-vp.ll fixed-vectors-cttz-vp.ll

[RISCV] Remove codegen for vp_ctlz, vp_cttz, vp_ctpop (#189904)

Part of the work to remove trivial VP intrinsics from the RISC-V
backend, see
https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999

This splits off 3 intrinsics from #179622.

Note that vp.cttz is the elementwise version, not vp.cttz.elts.
Delta  File
+657 -4,246  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctlz-vp.ll
+850 -3,299  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz-vp.ll
+994 -2,350  llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll
+1,066 -1,197  llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll
+870 -938  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctpop-vp.ll
+762 -962  llvm/test/CodeGen/RISCV/rvv/ctlz-vp.ll
+5,199 -12,992  2 files not shown
+5,213 -13,066  8 files

LLVM/project 0bde74a lld/ELF InputSection.cpp

[ELF] Pass SectionPiece by reference in getSectionPiece. NFC (#190110)

The generated assembly looks more optimized. In addition, this avoids
widened load, which would cause a TSan-detected data race with parallel
--gc-sections (#189321).
Delta  File
+2 -1  lld/ELF/InputSection.cpp
+2 -1  1 files

LLVM/project 3346a76 llvm/include/llvm/ExecutionEngine/JITLink JITLink.h

[JITLink] Remove unnecessary SymbolStringPtr copy. (#190101)

This was probably intended to be a `const SymbolStringPtr&` originally,
but if we were going to copy it anyway it's better to just take the
argument by value and std::move it.
Delta  File
+1 -1  llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
+1 -1  1 files
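
The "take by value and std::move" idiom mentioned in this commit message can be sketched as follows. This is only an illustration: `NamePtr` and `Sym` are hypothetical names, and `std::shared_ptr` stands in for a reference-counted handle such as `SymbolStringPtr`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a reference-counted name handle.
using NamePtr = std::shared_ptr<const std::string>;

class Sym {
public:
  // Take by value, then move into the member: an rvalue argument costs no
  // reference-count increment, and an lvalue argument costs exactly one copy.
  explicit Sym(NamePtr name) : name(std::move(name)) {}

  const NamePtr &getName() const { return name; }

private:
  NamePtr name;
};
```

A caller that is done with its handle can write `Sym s(std::move(handle));` and avoid the copy entirely, while a caller that still needs it writes `Sym s(handle);` and pays for a single copy.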

LLVM/project 9725dc2 clang/lib/CIR/Dialect/Transforms TargetLowering.cpp, clang/test/CIR/CodeGen amdgpu-target-lowering-as.cpp

Coverage for AS target lowering and fix generic lowering conversion pattern on alloca types.
Delta  File
+66 -0  clang/test/CIR/CodeGen/amdgpu-target-lowering-as.cpp
+14 -1  clang/lib/CIR/Dialect/Transforms/TargetLowering.cpp
+80 -1  2 files

LLVM/project a50308c clang/lib/CIR/CodeGen TargetInfo.cpp, clang/test/CIR/CodeGen amdgpu-address-spaces.cpp

proper amdgpu constant AS encoding
Delta  File
+3 -4  clang/lib/CIR/CodeGen/TargetInfo.cpp
+2 -2  clang/test/CIR/CodeGen/amdgpu-address-spaces.cpp
+5 -6  2 files

LLVM/project ee6b9ad clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets AMDGPU.cpp

Use AMDGPU enums to map CIR AS
Delta  File
+7 -6  clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets/AMDGPU.cpp
+7 -6  1 files

LLVM/project 5e53353 clang/lib/CIR/Dialect/IR CIRDialect.cpp

fix code dup rebase bug
Delta  File
+0 -4  clang/lib/CIR/Dialect/IR/CIRDialect.cpp
+0 -4  1 files

LLVM/project 4a919d3 clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets AMDGPU.cpp

Add table-based CIR -> Target AS mapping
Delta  File
+17 -18  clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets/AMDGPU.cpp
+17 -18  1 files

LLVM/project a0abf04 clang/test/CIR/CodeGenCUDA address-spaces.cu

add ogcg cuda checks and todo on nptx lowering
Delta  File
+11 -9  clang/test/CIR/CodeGenCUDA/address-spaces.cu
+11 -9  1 files

LLVM/project ae50e89 clang/lib/CIR/Lowering/DirectToLLVM LowerToLLVM.cpp

more fmt
Delta  File
+2 -2  clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+2 -2  1 files

LLVM/project 035e2e4 clang/lib/CIR/CodeGen TargetInfo.cpp CIRGenModule.cpp, clang/test/CIR/CodeGen amdgpu-address-spaces.cpp

fix tests to represent pre-target lowering state of AS
Delta  File
+0 -85  clang/test/CIR/Lowering/global-address-space.cir
+15 -30  clang/lib/CIR/CodeGen/TargetInfo.cpp
+24 -2  clang/test/CIR/CodeGenCUDA/address-spaces.cu
+14 -6  clang/test/CIR/CodeGen/amdgpu-address-spaces.cpp
+2 -3  clang/lib/CIR/CodeGen/CIRGenModule.cpp
+55 -126  5 files

LLVM/project f0a10bd clang/lib/CIR/Dialect/Transforms TargetLowering.cpp

handle formatting
Delta  File
+38 -42  clang/lib/CIR/Dialect/Transforms/TargetLowering.cpp
+38 -42  1 files

LLVM/project 535be16 clang/lib/CIR/Lowering/DirectToLLVM LowerToLLVM.cpp

more fmt yo
Delta  File
+0 -4  clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+0 -4  1 files

LLVM/project d42e95e clang/lib/CIR/Lowering/DirectToLLVM LowerToLLVM.cpp

fix fmt
Delta  File
+4 -0  clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+4 -0  1 files

LLVM/project 4ddcd54 clang/lib/CIR/CodeGen CIRGenModule.cpp, clang/lib/CIR/Dialect/IR CIRDialect.cpp

[CIR] Address Space support for GlobalOps
Delta  File
+3 -0  clang/lib/CIR/CodeGen/CIRGenModule.cpp
+2 -0  clang/lib/CIR/Dialect/IR/CIRDialect.cpp
+5 -0  2 files

LLVM/project fdf1b4c clang/lib/CIR/CodeGen TargetInfo.cpp, clang/lib/CIR/Dialect/Transforms TargetLowering.cpp

[CIR][AMDGPU] Lower Language specific address spaces and implement AMDGPU target
Delta  File
+252 -1  clang/lib/CIR/Dialect/Transforms/TargetLowering.cpp
+48 -9  clang/test/CIR/Lowering/global-address-space.cir
+51 -0  clang/test/CIR/CodeGen/amdgpu-address-spaces.cpp
+47 -0  clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets/AMDGPU.cpp
+46 -0  clang/lib/CIR/CodeGen/TargetInfo.cpp
+9 -2  clang/lib/CIR/Dialect/Transforms/TargetLowering/LowerModule.cpp
+453 -12  4 files not shown
+470 -18  10 files

LLVM/project 9fcbf82 clang/lib/CIR/CodeGen CIRGenModule.cpp, clang/lib/CIR/Dialect/IR CIRDialect.cpp

Global AS lowering For CUDA and CIRGen tests for target AS
Delta  File
+0 -3  clang/lib/CIR/CodeGen/CIRGenModule.cpp
+2 -0  clang/lib/CIR/Dialect/IR/CIRDialect.cpp
+2 -3  2 files

LLVM/project 9a354fc llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp, llvm/test/CodeGen/AMDGPU fneg-modifier-casting.ll

[SelectionDAG] Use `KnownBits` to determine if an operand may be NaN. (#188606)

Given a bitcast into a fp type, use the known bits of the operand to
infer whether the resulting value can never be NaN.
Delta  File
+67 -0  llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp
+19 -1  llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+2 -2  llvm/test/CodeGen/AMDGPU/fneg-modifier-casting.ll
+88 -3  3 files
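
The reasoning behind this change can be illustrated for a 32-bit float. An IEEE-754 binary32 value is NaN only when all eight exponent bits are 1 and the mantissa is non-zero, so knowing a single exponent bit to be 0 (or the whole mantissa to be 0) rules NaN out. The sketch below uses a hypothetical `Known32` record rather than LLVM's `KnownBits` class, and the exact conditions checked by the patch are not shown in this digest.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, minimal known-bits record for a 32-bit integer value.
struct Known32 {
  uint32_t zero; // bits known to be 0
  uint32_t one;  // bits known to be 1
};

// Returns true if bitcasting the integer to a binary32 float can never
// produce a NaN, judged only from the known bits.
bool bitcastNeverNaN(Known32 k) {
  const uint32_t expMask = 0x7F800000u; // 8 exponent bits
  const uint32_t manMask = 0x007FFFFFu; // 23 mantissa bits
  if (k.zero & expMask)
    return true; // some exponent bit is known 0, so the exponent is not all ones
  if ((k.zero & manMask) == manMask)
    return true; // mantissa is known to be zero: at worst an infinity
  return false;  // not enough information to exclude NaN
}
```

For example, an integer whose bit 23 is known to be 0 (say, the result of masking with `~0x00800000`) can be proven NaN-free after the bitcast.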

LLVM/project dbc206f clang/include/clang/CIR MissingFeatures.h, clang/include/clang/CIR/Dialect/IR CIROps.td

[CIR][CIRGen] Support for section attribute on cir.global (#188200)

Upstreaming clangIR PR: https://github.com/llvm/clangir/pull/422

This PR implements support for `__attribute__((section("name")))` on
global variables in ClangIR, matching OGCG behavior.
Delta  File
+26 -13  clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+20 -4  clang/lib/CIR/CodeGen/CIRGenModule.cpp
+18 -0  clang/test/CIR/CodeGen/global-section.c
+16 -0  clang/include/clang/CIR/Interfaces/CIROpInterfaces.td
+7 -1  clang/include/clang/CIR/Dialect/IR/CIROps.td
+1 -2  clang/include/clang/CIR/MissingFeatures.h
+88 -20  6 files

LLVM/project 53bac5c clang/test/CodeGenHIP incorrect-atomic-scope.hip, clang/test/CodeGenOpenCL incorrect-atomic-scope.cl

move the new tests to Sema
Delta  File
+35 -0  clang/test/SemaOpenCL/incorrect-atomic-scope.cl
+0 -35  clang/test/CodeGenOpenCL/incorrect-atomic-scope.cl
+31 -0  clang/test/SemaHIP/incorrect-atomic-scope.hip
+0 -31  clang/test/CodeGenHIP/incorrect-atomic-scope.hip
+66 -66  4 files

LLVM/project 06aae40 clang/lib/CodeGen CGDebugInfo.cpp, clang/test/CodeGenHLSL/debug source-language.hlsl

[HLSL][SPIRV] Restore support for -g to generate NSDI (#190007)

The original attempt (#187051) produced a regression for
`intel-sycl-gpu` because `SPIRVEmitNonSemanticDI` will now self-activate
whenever `llvm.dbg.cu` is present. This removed the need for the
explicit `--spv-emit-nonsemantic-debug-info` flag.

The pass is now entered unconditionally for all SPIR-V targets, but
`NonSemantic.Shader.DebugInfo.100` requires the
`SPV_KHR_non_semantic_info` extension. Targets like `spirv64-intel` do
not enable that extension by default. When `checkSatisfiable()` ran on
those targets, it issued a fatal error rather than silently skipping.

Adds an early-out from `emitGlobalDI()`: if
`SPV_KHR_non_semantic_info` is not available for the current target, the
pass returns without emitting anything.
Delta  File
+34 -0  clang/test/CodeGenHLSL/debug/source-language.hlsl
+32 -0  llvm/test/CodeGen/SPIRV/debug-info/hlsl-debug-info-auto-activation.ll
+22 -0  llvm/test/CodeGen/SPIRV/debug-info/no-nonsemantic-without-extension.ll
+6 -5  llvm/lib/Target/SPIRV/SPIRVTargetMachine.cpp
+5 -3  llvm/docs/SPIRVUsage.rst
+6 -2  clang/lib/CodeGen/CGDebugInfo.cpp
+105 -10  6 files not shown
+119 -15  12 files

LLVM/project 18a0657 llvm/lib/Target/RISCV RISCVLoadStoreOptimizer.cpp, llvm/test/CodeGen/RISCV xqcilsm-lwmi-swmi-multiple.mir

[RISCV] Move unpaired instruction back in RISCVLoadStoreOptimizer (#189912)

There are cases when the `Xqcilsm` vendor extension is enabled that we
are unable to pair non-adjacent load/store instructions. The
`RISCVLoadStoreOptimizer` moves the instruction adjacent to the other
before attempting to pair them but does not move them back when it
fails. This can sometimes prevent the generation of the `Xqcilsm`
load/store multiple instructions. This patch ensures that we move the
unpaired instruction back to its original location.
Delta  File
+20 -1  llvm/test/CodeGen/RISCV/xqcilsm-lwmi-swmi-multiple.mir
+12 -3  llvm/lib/Target/RISCV/RISCVLoadStoreOptimizer.cpp
+32 -4  2 files
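
The "move, try to pair, move back on failure" behaviour described above can be sketched on a toy instruction list. Everything here is hypothetical: instructions are plain strings, `tryPair` and `canPair` are illustrative names, and the real pass works on machine instructions rather than a `std::vector`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Moves block[from] so that it sits directly after block[to] (to < from),
// asks canPair whether the two now-adjacent instructions can be paired, and
// restores the original order if they cannot.
bool tryPair(std::vector<std::string> &block, std::size_t to, std::size_t from,
             const std::function<bool(const std::string &,
                                      const std::string &)> &canPair) {
  assert(to < from && from < block.size());
  auto first = block.begin() + to + 1;
  auto moved = block.begin() + from;
  std::rotate(first, moved, moved + 1); // block[from] is now at index to + 1
  if (canPair(block[to], block[to + 1]))
    return true;
  std::rotate(first, first + 1, moved + 1); // undo: put it back at index from
  return false;
}
```

With the restore step in place, a failed attempt leaves the block exactly as it was, so later instructions can still be considered for pairing in their original positions.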

LLVM/project 8c2feea bolt/lib/Target/RISCV RISCVMCPlusBuilder.cpp, bolt/runtime sys_riscv64.h instr.cpp

[BOLT] Delete unnecessary instructions (#189297)
Delta  File
+64 -96  bolt/runtime/sys_riscv64.h
+6 -12  bolt/runtime/instr.cpp
+3 -6  bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp
+73 -114  3 files

LLVM/project 495e1a4 mlir/lib/Dialect/Math/Transforms SincosFusion.cpp, mlir/test/Dialect/Math sincos-fusion.mlir

[mlir] added a check in the walk to prevent catching a cos in a nested region (#190064)

The walk in SincosFusion may detect a cos within a nested region of the
sin block. This triggers an assertion in `isBeforeInBlock` later on.
Added a check within the walk so it filters operations in nested
regions, which are not in the same block and should not be fused anyway.

---------

Co-authored-by: Yebin Chon <ychon at nvidia.com>
Delta  File
+23 -0  mlir/test/Dialect/Math/sincos-fusion.mlir
+2 -4  mlir/lib/Dialect/Math/Transforms/SincosFusion.cpp
+25 -4  2 files
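
The filter described in this commit can be pictured on a toy operation tree. The `Op` and `Block` types below are hypothetical stand-ins and do not use the MLIR API; the sketch only assumes that a recursive walk also visits nested regions, so candidates must be checked to live in the same block as the `sin`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy IR: a block holds ops, and an op may hold nested blocks.
struct Block;
struct Op {
  std::string name;
  Block *parent = nullptr;      // block that directly contains this op
  std::vector<Block *> regions; // nested blocks owned by this op
};
struct Block {
  std::vector<Op *> ops;
};

// Recursive walk: visits every op in the block, including nested ones.
void walk(Block &block, const std::function<void(Op &)> &fn) {
  for (Op *op : block.ops) {
    fn(*op);
    for (Block *nested : op->regions)
      walk(*nested, fn);
  }
}

// Collects cos ops that share the sin's block; nested ones are filtered out.
std::vector<Op *> findFusableCos(Op &sinOp) {
  std::vector<Op *> result;
  walk(*sinOp.parent, [&](Op &op) {
    if (op.name == "cos" && op.parent == sinOp.parent)
      result.push_back(&op);
  });
  return result;
}
```

Without the `op.parent == sinOp.parent` check, the nested `cos` would be returned as well, and any later ordering query that assumes both ops share a block would be invalid.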

LLVM/project d52daea libc/test/shared shared_math_test.cpp

[libc] Fix the remaining long double issue in shared_math_test.cpp. (#190098)
Delta  File
+5 -6  libc/test/shared/shared_math_test.cpp
+5 -6  1 files

LLVM/project c8c7186 llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 gfni-rotates.ll vector-fshr-rot-512.ll

[X86] LowerRotate - expand vXi8 non-uniform variable rotates using uniform constant rotates (#189986)

We expand vXi8 non-uniform variable rotates as a sequence of uniform
constant rotates, each combined with a SELECT that applies it only where
the original rotate amount needs it.

This patch removes the premature expansion of uniform constant rotates
to OR(SHL,SRL) sequences, which allows GFNI targets to use single
VGF2P8AFFINEQB instructions.
Delta  File
+301 -623  llvm/test/CodeGen/X86/gfni-rotates.ll
+30 -30  llvm/test/CodeGen/X86/vector-fshr-rot-512.ll
+9 -20  llvm/lib/Target/X86/X86ISelLowering.cpp
+12 -12  llvm/test/CodeGen/X86/vector-fshr-rot-256.ll
+352 -685  4 files
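
One way to picture the expansion is a scalar model over 16 byte lanes. The sketch assumes the per-lane amount is decomposed into its bits 4, 2 and 1, with each step rotating every lane by the same constant and a per-lane select keeping the rotated value only where that bit is set. The names `V16`, `rotlConst` and `rotlVariable` are illustrative, and the step order is an assumption rather than a description of the actual lowering code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V16 = std::array<uint8_t, 16>;

// Uniform constant rotate-left: every lane is rotated by the same amount K.
template <unsigned K> V16 rotlConst(V16 v) {
  static_assert(K > 0 && K < 8, "rotate amount must be in 1..7");
  for (auto &x : v)
    x = static_cast<uint8_t>((x << K) | (x >> (8 - K)));
  return v;
}

// Non-uniform variable rotate-left built from uniform constant rotates.
V16 rotlVariable(V16 v, const V16 &amt) {
  // Per-lane select: take the rotated lane only if `bit` is set in amt[i].
  auto select = [&](const V16 &rotated, unsigned bit) {
    for (std::size_t i = 0; i < v.size(); ++i)
      if (amt[i] & bit)
        v[i] = rotated[i];
  };
  select(rotlConst<4>(v), 4);
  select(rotlConst<2>(v), 2);
  select(rotlConst<1>(v), 1);
  return v;
}
```

Only the low three bits of each amount matter here, since rotating a byte by 8 is the identity. Each `rotlConst` call corresponds to one uniform constant rotate, which is the operation the commit says GFNI targets can now lower directly.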

LLVM/project 8daaa26 lld/test/ELF merge-piece-oob.s, llvm/include/llvm/Support Parallel.h

[Support] Support nested parallel TaskGroup via work-stealing (#189293)

Nested TaskGroups run serially to prevent deadlock, as documented by
https://reviews.llvm.org/D61115 and refined by
https://reviews.llvm.org/D148984 to use threadIndex.

Enable nested parallelism by having worker threads actively execute
tasks from the work queue while waiting (work-stealing), instead of
just blocking. Root-level TaskGroups (main thread) keep the efficient
blocking Latch::sync(), so there is no overhead for the common
non-nested case.

In lld, https://reviews.llvm.org/D131247 worked around the limitation
by passing a single root TaskGroup into OutputSection::writeTo and
spawning 4MB-chunked tasks into it. However, SyntheticSection::writeTo
calls with internal parallelism (e.g. GdbIndexSection,
MergeNoTailSection) still ran serially on worker threads. With this
change, their internal parallelFor/parallelForEach calls parallelize
automatically via helpSync work-stealing.

    [3 lines not shown]
Delta  File
+16 -59  llvm/unittests/Support/ParallelTest.cpp
+27 -7  llvm/lib/Support/Parallel.cpp
+2 -4  llvm/include/llvm/Support/Parallel.h
+2 -2  lld/test/ELF/merge-piece-oob.s
+47 -72  4 files

LLVM/project dee982d llvm/lib/Target/AArch64 AArch64PostCoalescerPass.cpp AArch64.h, llvm/test/CodeGen/AArch64 aarch64-post-coalescer.mir

[NewPM] Adds a port for AArch64PostCoalescerPass (#189520)

Adds a standard port of AArch64PostCoalescer to NewPM.
Delta  File
+69 -52  llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp
+8 -1  llvm/lib/Target/AArch64/AArch64.h
+2 -1  llvm/test/CodeGen/AArch64/aarch64-post-coalescer.mir
+1 -1  llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+1 -0  llvm/lib/Target/AArch64/AArch64PassRegistry.def
+81 -55  5 files

LLVM/project e27e7e4 llvm/lib/Target/AArch64/GISel AArch64PreLegalizerCombiner.cpp

[NFC][AArch64] Remove PreLegalizerCombiner pass dependency on TargetPassConfig (#190073)

This will enable NewPM porting.

Replaced with the definition in
[AArch64PassConfig::getCSEConfig](https://github.com/llvm/llvm-project/blob/1d549d9a777a6faef6d425cb6482ab1fa6b91bb7/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp#L614)
Delta  File
+2 -6  llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp
+2 -6  1 files