[ELF] Decouple SharedFile::isNeeded from GC mark. NFC (#190112)
... out of the per-relocation resolveReloc and into a post-GC scan of
global symbols. This decouples the --as-needed logic from the mark
algorithm, simplifying the imminent parallel GC mark.
[ELF] Pass SectionPiece by reference in getSectionPiece. NFC (#190110)
The generated assembly looks more optimized. In addition, this avoids a
widened load, which would cause a TSan-detected data race with parallel
--gc-sections (#189321).
[JITLink] Remove unnecessary SymbolStringPtr copy. (#190101)
This was probably intended to be a `const SymbolStringPtr&` originally,
but if we were going to copy it anyway it's better to just take the
argument by value and std::move it.
[SelectionDAG] Use `KnownBits` to determine if an operand may be NaN. (#188606)
Given a bitcast into a fp type, use the known bits of the operand to
infer whether the resulting value can never be NaN.
[CIR][CIRGen] Support for section attribute on cir.global (#188200)
Upstreaming clangIR PR: https://github.com/llvm/clangir/pull/422
This PR implements support for `__attribute__((section("name")))` on
global variables in ClangIR, matching OGCG behavior.
[HLSL][SPIRV] Restore support for -g to generate NSDI (#190007)
The original attempt (#187051) produced a regression for
`intel-sycl-gpu` because `SPIRVEmitNonSemanticDI` now self-activates
whenever `llvm.dbg.cu` is present, removing the need for the
explicit `--spv-emit-nonsemantic-debug-info` flag.
The pass is now entered unconditionally for all SPIR-V targets, but
`NonSemantic.Shader.DebugInfo.100` requires the
`SPV_KHR_non_semantic_info` extension. Targets like `spirv64-intel` do not enable
that extension by default. When `checkSatisfiable()` ran on those
targets, it issued a fatal error rather than silently skipping.
This patch adds an early-out to `emitGlobalDI()`: if
`SPV_KHR_non_semantic_info` is not available for the current target, the
pass returns without emitting anything.
[RISCV] Move unpaired instruction back in RISCVLoadStoreOptimizer (#189912)
There are cases when the `Xqcilsm` vendor extension is enabled that we
are unable to pair non-adjacent load/store instructions. The
`RISCVLoadStoreOptimizer` moves the instruction adjacent to the other
before attempting to pair them but does not move them back when it
fails. This can sometimes prevent the generation of the `Xqcilsm`
load/store multiple instructions. This patch ensures that we move the
unpaired instruction back to its original location.
[mlir] Add a check in the walk to prevent catching a cos in a nested region (#190064)
The walk in SincosFusion may detect a cos within a nested region of the
sin block. This triggers an assertion in `isBeforeInBlock` later on.
This patch adds a check within the walk to filter out operations in nested
regions, which are not in the same block and should not be fused anyway.
---------
Co-authored-by: Yebin Chon <ychon at nvidia.com>
[X86] LowerRotate - expand vXi8 non-uniform variable rotates using uniform constant rotates (#189986)
We expand vXi8 non-uniform variable rotates as a sequence of uniform
constant rotates along with a SELECT depending on whether the original
rotate amount needs it.
This patch removes premature uniform constant rotate expansion to the
OR(SHL,SRL) sequences to allow GFNI targets to use single VGF2P8AFFINEQB
calls.
[Support] Support nested parallel TaskGroup via work-stealing (#189293)
Nested TaskGroups run serially to prevent deadlock, as documented by
https://reviews.llvm.org/D61115 and refined by
https://reviews.llvm.org/D148984 to use threadIndex.
Enable nested parallelism by having worker threads actively execute
tasks from the work queue while waiting (work-stealing), instead of
just blocking. Root-level TaskGroups (main thread) keep the efficient
blocking Latch::sync(), so there is no overhead for the common
non-nested case.
In lld, https://reviews.llvm.org/D131247 worked around the limitation
by passing a single root TaskGroup into OutputSection::writeTo and
spawning 4MB-chunked tasks into it. However, SyntheticSection::writeTo
calls with internal parallelism (e.g. GdbIndexSection,
MergeNoTailSection) still ran serially on worker threads. With this
change, their internal parallelFor/parallelForEach calls parallelize
automatically via helpSync work-stealing.
[3 lines not shown]