[SelectionDAG] Use `KnownBits` to determine if an operand may be NaN. (#188606)
Given a bitcast into a fp type, use the known bits of the operand to
infer whether the resulting value can never be NaN.
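The underlying IEEE-754 fact can be sketched in Python (a toy model, not the actual SelectionDAG code; helper names here are hypothetical): a float32 bit pattern is NaN only when its exponent field is all ones, so if any exponent bit of the integer operand is known to be zero, the bitcast result can never be NaN.

```python
import math
import struct

EXP_MASK = 0x7F800000  # float32 exponent field, bits 23..30

def cannot_be_nan(known_zero_bits: int) -> bool:
    """If any exponent bit is known to be zero, the exponent can never be
    all-ones, so the bitcast result can never be NaN (or infinity)."""
    return (known_zero_bits & EXP_MASK) != 0

def bitcast_to_f32(bits: int) -> float:
    # Reinterpret a 32-bit integer pattern as an IEEE-754 float32.
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```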
[CIR][CIRGen] Support for section attribute on cir.global (#188200)
Upstreaming clangIR PR: https://github.com/llvm/clangir/pull/422
This PR implements support for `__attribute__((section("name")))` on
global variables in ClangIR, matching OGCG behavior.
[HLSL][SPIRV] Restore support for -g to generate NSDI (#190007)
The original attempt (#187051) produced a regression for
`intel-sycl-gpu` because `SPIRVEmitNonSemanticDI` will now self-activate
whenever `llvm.dbg.cu` is present. This removed the need for the
explicit `--spv-emit-nonsemantic-debug-info` flag.
The pass is now entered unconditionally for all SPIR-V targets, but
`NonSemantic.Shader.DebugInfo.100` requires the
`SPV_KHR_non_semantic_info` extension. Targets like `spirv64-intel` do not enable
that extension by default. When `checkSatisfiable()` ran on those
targets, it issued a fatal error rather than silently skipping.
Adds an early-out from `emitGlobalDI()`: if
`SPV_KHR_non_semantic_info` is not available for the current target, the
pass returns without emitting anything.
[RISCV] Move unpaired instruction back in RISCVLoadStoreOptimizer (#189912)
When the `Xqcilsm` vendor extension is enabled, there are cases where we
are unable to pair non-adjacent load/store instructions. The
`RISCVLoadStoreOptimizer` moves one instruction adjacent to the other
before attempting to pair them, but does not move it back when pairing
fails. This can sometimes prevent the generation of the `Xqcilsm`
load/store multiple instructions. This patch ensures that we move the
unpaired instruction back to its original location.
[mlir] added a check in the walk to prevent catching a cos in a nested region (#190064)
The walk in SincosFusion may detect a cos within a nested region of the
sin block. This triggers an assertion in `isBeforeInBlock` later on.
Added a check within the walk so it filters out operations in nested
regions, which are not in the same block and should not be fused anyway.
---------
Co-authored-by: Yebin Chon <ychon at nvidia.com>
[X86] LowerRotate - expand vXi8 non-uniform variable rotates using uniform constant rotates (#189986)
We expand vXi8 non-uniform variable rotates as a sequence of uniform
constant rotates along with a SELECT depending on whether the original
rotate amount needs it.
This patch removes the premature uniform constant rotate expansion to
OR(SHL,SRL) sequences, allowing GFNI targets to use single VGF2P8AFFINEQB
calls.
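The expansion idea can be modeled in Python (a sketch with hypothetical helper names, scalarized for clarity): decompose the variable amount into its bits and apply one uniform constant rotate per bit, with a per-bit select standing in for the vector SELECT.

```python
def rotl8(x: int, n: int) -> int:
    # Uniform constant rotate-left of an 8-bit value.
    n &= 7
    return ((x << n) | (x >> (8 - n))) & 0xFF

def rotl8_via_constant_rotates(x: int, amt: int) -> int:
    # Apply constant rotates by 1, 2, and 4; each stage is kept or
    # discarded based on one bit of the variable rotate amount. The
    # conditional stands in for the vector SELECT in the lowering.
    for shift in (1, 2, 4):
        x = rotl8(x, shift) if amt & shift else x
    return x
```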
[Support] Support nested parallel TaskGroup via work-stealing (#189293)
Nested TaskGroups run serially to prevent deadlock, as documented by
https://reviews.llvm.org/D61115 and refined by
https://reviews.llvm.org/D148984 to use threadIndex.
Enable nested parallelism by having worker threads actively execute
tasks from the work queue while waiting (work-stealing), instead of
just blocking. Root-level TaskGroups (main thread) keep the efficient
blocking Latch::sync(), so there is no overhead for the common
non-nested case.
In lld, https://reviews.llvm.org/D131247 worked around the limitation
by passing a single root TaskGroup into OutputSection::writeTo and
spawning 4MB-chunked tasks into it. However, SyntheticSection::writeTo
calls with internal parallelism (e.g. GdbIndexSection,
MergeNoTailSection) still ran serially on worker threads. With this
change, their internal parallelFor/parallelForEach calls parallelize
automatically via helpSync work-stealing.
[3 lines not shown]
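The scheduling idea can be illustrated with a toy single-threaded model (this is not llvm/Support's implementation, which uses real worker threads and a blocking `Latch::sync()` at the root): instead of blocking until its tasks finish, `sync()` on a nested group helps drain the pending queue itself, so a TaskGroup created inside a running task still makes progress.

```python
from collections import deque

class TaskGroup:
    """Toy model of the work-stealing wait: sync() executes pending
    tasks itself instead of blocking, so a nested TaskGroup neither
    deadlocks nor forces serialization."""
    def __init__(self):
        self._pending = deque()

    def spawn(self, fn):
        self._pending.append(fn)

    def sync(self):
        # Work-stealing wait: run queued tasks rather than block.
        while self._pending:
            self._pending.popleft()()
```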
[C++20] [Modules] Add VisiblePromoted module ownership kind (#189903)
This patch adds a new ModuleOwnershipKind::VisiblePromoted to handle
declarations that are not visible to the current TU but are promoted to
be visible to avoid re-parsing.
Originally we set the visibility directly in such cases. But
https://github.com/llvm/llvm-project/issues/188853 shows such decls may
be excluded later when a `#include` is followed by an import. So we have
to introduce a new ownership kind to express the intention that the
visibility of the decl is intentionally promoted.
Close https://github.com/llvm/llvm-project/issues/188853
[lld] Glob-based BP compression sort groups (#185661)
Add
--bp-compression-sort-section=<glob>[=<layout_priority>[=<match_priority>]]
to let users split input sections into multiple compression groups, run
balanced partitioning independently per group, and leave out sections
that are poor candidates for BP. This replaces the old coarse
--bp-compression-sort with a more explicit, user-controlled option.
In ELF, the glob matches input section names (.text.unlikely.cold1). In
Mach-O, it matches the concatenated segment+section name (__TEXT__text).
layout_priority controls group placement in the final layout.
match_priority resolves conflicts when multiple globs match the same
section: explicit priority beats positional matching, and among
positional specs the last match wins.
A CRTP hook getCompressionSubgroupKey() allows backends to further
subdivide glob groups into independent BP instances. This allows Mach-O
[3 lines not shown]
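The match resolution described above can be sketched in Python (function and spec names are hypothetical; how explicit `match_priority` values compare to each other is an assumption here, with a lower value winning): an explicit `match_priority` beats any positional match, and among positional specs the last match wins.

```python
import fnmatch

def pick_group(section, specs):
    """specs: [(glob, layout_priority, match_priority-or-None), ...] in
    command-line order. Returns the layout_priority of the winning spec,
    or None if no glob matches."""
    best_key = best_layout = None
    for pos, (glob, layout_prio, match_prio) in enumerate(specs):
        if not fnmatch.fnmatchcase(section, glob):
            continue
        # Explicit priorities sort before any positional match; among
        # positional matches, a later spec produces a smaller key.
        key = (0, match_prio) if match_prio is not None else (1, -pos)
        if best_key is None or key < best_key:
            best_key, best_layout = key, layout_prio
    return best_layout
```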
[NFC][SSAF][UnsafeBufferUsage] Separate EntityPointerLevel and UnsafeBufferUsage
EntityPointerLevel, as a common data structure, will later be shared by
UnsafeBufferUsage and the pointer-assignments analysis, so this commit
separates them:
- EntityPointerLevel provides the data structure and translation
- UnsafeBufferUsage uses EntityPointerLevel to translate unsafe pointers to EPLs.
[RISCV] Fix stackmap shadow trimming NOP size for compressed targets (#189774)
The shadow trimming loop in LowerSTACKMAP hardcoded a 4-byte decrement
per instruction, but when Zca is enabled NOPs are 2 bytes. Use NOPBytes
instead of the hardcoded 4 so the shadow is correctly trimmed on
compressed targets.
Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
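A simplified model of the trimming arithmetic (not the in-tree C++): the loop pads the remaining shadow with NOPs, so each emitted NOP must shrink the remaining byte count by the actual NOP size, not a hardcoded 4.

```python
def emit_shadow_nops(shadow_bytes: int, nop_bytes: int) -> int:
    # Pad the remaining stackmap shadow with NOPs, decrementing by the
    # real NOP size each time (2 bytes when Zca/compression is enabled,
    # 4 otherwise) instead of a hardcoded 4. Returns the NOP count.
    count = 0
    while shadow_bytes > 0:
        count += 1
        shadow_bytes -= nop_bytes
    return count
```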
[RISCV] Relax VL constraint in convertSameMaskVMergeToVMv (#189797)
When converting a PseudoVMERGE_VVM to PseudoVMV_V_V, we previously
required MIVL <= TrueVL to avoid losing False elements in the tail.
Relax this constraint when the vmerge's False operand equals its
Passthru operand and the True instruction's tail policy is TU
(tail undisturbed). In this case, True's tail lanes preserve its
passthru value (which equals False and Passthru), so the conversion
is safe even when MIVL > TrueVL.
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
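The equivalence being relied on can be checked with a small Python model (a stand-in arithmetic op replaces the real True instruction; names are illustrative): when False equals Passthru and the True instruction is tail-undisturbed, True's lanes at and beyond TrueVL already hold the passthru value, so a plain copy of MIVL lanes matches the masked merge even with MIVL > TrueVL.

```python
def vmerge_vvm(passthru, false_v, true_v, mask, vl):
    # Body lanes pick true/false by the mask; tail lanes keep passthru.
    return [(true_v[i] if mask[i] else false_v[i]) if i < vl else passthru[i]
            for i in range(len(passthru))]

def vmv_v_v(passthru, src, vl):
    # Plain copy of the first vl lanes; tail lanes keep passthru.
    return [src[i] if i < vl else passthru[i] for i in range(len(passthru))]

def masked_tu_op(passthru, src, mask, vl):
    # Stand-in for the True instruction: same mask, tail-undisturbed,
    # so inactive and tail lanes keep the passthru value.
    return [src[i] + 1 if (i < vl and mask[i]) else passthru[i]
            for i in range(len(src))]
```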
[scudo] Fix reallocate for MTE. (#190086)
For MTE, we can't use the whole size or we might trigger a segfault.
Therefore, use the exact size when MTE is enabled or when the
exact-usable-size parameter is true.
Also, optimize out the call to getUsableSize and use a simpler
calculation.
[MLIR][Vector] Enhance vector.multi_reduction unrolling to handle scalar result (#188633)
Previously, UnrollMultiReductionPattern bailed out when all the
dimensions were reduced to a scalar. This PR adds support for this case
by tiling the source vector and chaining partial reductions through the
accumulator operand.
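The tiling-and-chaining scheme can be sketched in Python on a flat 1-D source (function names are illustrative, not the MLIR pattern itself): each tile is reduced to a scalar, and that partial result is threaded through the accumulator into the next tile.

```python
from functools import reduce

def unrolled_full_reduction(src, acc, tile, combine):
    # Tile the source vector and chain each partial reduction through
    # the accumulator, mirroring how the unrolling pattern handles a
    # reduction of every dimension down to a scalar.
    for i in range(0, len(src), tile):
        acc = reduce(combine, src[i:i + tile], acc)
    return acc
```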
[RISCV] Add SATI_RV64/USATI_RV64 to RISCVOptWInstrs. (#190030)
Note that the immediates for these 2 instructions in their MachineInstr
representations both use the type width. The SATI_RV64 binary encoding
and the RISCVISD::SATI encoding use the type width minus one.
Assisted-by: Claude Sonnet 4.5