[IR] Store fast-math flags in subclasses of Instruction (#191190)
Move fast-math flags out of `Value`, because we have run out of space in
`Value::SubclassOptionalData`, and storing them there is incompatible with
other optimization flags like `nneg`.
No FP variant of `call`/`select`/`phi` is introduced, because
`mutateType` may change the type of an `Instruction` instance, which
could cause UB.
RFC:
https://discourse.llvm.org/t/rfc-store-fast-math-flags-in-subclasses-of-instruction/
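To illustrate the storage problem, here is a toy C++ model. It does not use LLVM's real `Value`/`Instruction` classes: `SketchValue`, `SketchFPInst` and the bit positions are all hypothetical. In the model, an FP instruction keeps its fast-math flags in a field of its own, so they no longer collide with a flag like `nneg` kept in the shared optional-data byte.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments, for illustration only. Both flags want
// bit 0 of whichever field they are stored in.
constexpr uint8_t kNNegBit = 1u << 0;
constexpr uint8_t kNoNaNsBit = 1u << 0;

// Hypothetical base class: one narrow field shared by optimization flags
// such as nneg.
struct SketchValue {
  uint8_t SubclassOptionalData = 0;
};

// Hypothetical FP instruction: fast-math flags get separate storage in the
// subclass, so they can coexist with flags kept in SubclassOptionalData.
struct SketchFPInst : SketchValue {
  uint8_t FastMathFlags = 0;

  void setNNeg(bool B) { setBit(SubclassOptionalData, kNNegBit, B); }
  void setNoNaNs(bool B) { setBit(FastMathFlags, kNoNaNsBit, B); }
  bool hasNNeg() const { return SubclassOptionalData & kNNegBit; }
  bool hasNoNaNs() const { return FastMathFlags & kNoNaNsBit; }

private:
  // Set or clear a single bit in the given field.
  static void setBit(uint8_t &Field, uint8_t Bit, bool On) {
    Field = On ? static_cast<uint8_t>(Field | Bit)
               : static_cast<uint8_t>(Field & ~Bit);
  }
};
```

In this model, clearing nneg leaves the fast-math flags untouched. A single shared field cannot guarantee that once both sets of flags need the same bits.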
[X86] Add handling for sub-128bit minmax reductions (#198319)
Fold sub-128bit minmax reductions as ISD::VECREDUCE nodes.
This needed some cleanup to correctly discard the "identity value padded"
upper elements introduced by legalisation. Existing folds struggle to do
this because the DemandedElts mask needs to be accurate enough, and
reductions nearly always result in multiple uses of the source operands.
I've been trying to do something similar in
TargetLowering::expandVecReduce but haven't managed it yet.
This is the final backend patch blocking #194473, though some middle-end
reduction pattern matching still needs fixing first.
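The following is a scalar C++ model of why identity-padded upper elements are safe to discard. It is not SelectionDAG code, and the 4-lane source and 16-lane widened type are illustrative assumptions. Padding with the identity value of smin (INT8_MAX) cannot change the reduction result.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>

// smin reduction after widening a 4 x i8 vector to 16 lanes: the extra
// lanes hold smin's identity (INT8_MAX), so they never win the minimum.
int8_t sminReducePadded(const std::array<int8_t, 4> &Src) {
  std::array<int8_t, 16> Wide;
  Wide.fill(std::numeric_limits<int8_t>::max()); // identity for smin
  std::copy(Src.begin(), Src.end(), Wide.begin());
  return *std::min_element(Wide.begin(), Wide.end());
}

// The same reduction over only the original lanes.
int8_t sminReduceNarrow(const std::array<int8_t, 4> &Src) {
  return *std::min_element(Src.begin(), Src.end());
}
```

Because the two functions always agree, a fold that knows the upper lanes are identity padding can drop them and reduce only the original elements.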
[CUF] Fix CompilerGeneratedNamesConversion renaming managed companion globals
CUFAddConstructor creates a companion pointer global (e.g. foo.managed.ptr)
for each non-allocatable managed variable. When CompilerGeneratedNamesConversion
ran after CUFAddConstructor, it replaced the dots with 'X',
so CUFOpConversionLate could no longer find the companion by name and fell back
to CUFGetDeviceAddress with the wrong host pointer, causing cudaErrorInvalidSymbol.
Fix: mark the companion global with a cuf.managed_ptr unit attribute in
CUFAddConstructor and skip it in CompilerGeneratedNamesConversionPass.
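The fix can be modelled in C++ as follows. The model simplifies the pass so that it treats every global as compiler-generated. `SketchGlobal`, `IsManagedPtr` (a stand-in for the `cuf.managed_ptr` attribute) and `renameCompilerGenerated` are hypothetical names, not the MLIR API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a global op; IsManagedPtr models the
// cuf.managed_ptr unit attribute.
struct SketchGlobal {
  std::string Name;
  bool IsManagedPtr = false;
};

// Replace dots with 'X' in every global name, except on globals marked as
// managed companion pointers, which must stay findable by their original name.
void renameCompilerGenerated(std::vector<SketchGlobal> &Globals) {
  for (SketchGlobal &G : Globals) {
    if (G.IsManagedPtr)
      continue;
    std::replace(G.Name.begin(), G.Name.end(), '.', 'X');
  }
}
```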
Co-authored-by: Claude Sonnet 4.6 <noreply at anthropic.com>
[CUF] Handle renamed managed companion pointer in CUFDeviceAddressOpConversion
CUFAddConstructor creates a companion pointer global (@sym.managed.ptr) for
each non-allocatable managed variable. CompilerGeneratedNamesConversion may
run before CUFOpConversionLate and rename the global by replacing dots with
'X', producing @symXmanagedXptr. Extend CUFDeviceAddressOpConversion to try
both the original and the renamed suffix when looking up the companion pointer.
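Here is a plain C++ sketch of the dual lookup. A `std::set` of symbol names stands in for the module's symbol table, and `findManagedCompanion` is a hypothetical helper, not the actual conversion pattern.

```cpp
#include <cassert>
#include <initializer_list>
#include <optional>
#include <set>
#include <string>

// Try the original companion name first, then the name produced by
// replacing its dots with 'X'.
std::optional<std::string>
findManagedCompanion(const std::set<std::string> &Symbols,
                     const std::string &Sym) {
  for (const std::string &Candidate :
       {Sym + ".managed.ptr", Sym + "XmanagedXptr"}) {
    if (Symbols.count(Candidate))
      return Candidate;
  }
  return std::nullopt; // no companion pointer under either name
}
```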
Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
[MISched] Make TG MacroFusion creation target-specific (NFC) (#198304)
This patch moves TableGen-based macro fusion initialization from the
generic MachineScheduler factory methods to the specific targets that use
it, currently only RISCV. This makes it easier for targets to have complete
control over DAG mutation ordering, e.g. for porting other targets to
TableGen macro fusion. More specifically, AArch64 requires the inverse
partial order between ld/st clustering and macro fusion, as part of an
attempt to migrate it to TableGen.
[AArch64] Apply generational deltas for Apple tuning features (#197115)
This patch refactors how we manage tuning features for AArch64 Apple
CPUs. Instead of duplicating feature lists for each generation, we now
form a chain such that every generation only adds or removes features
from the immediate predecessor.
This creates a much more compact representation, makes generational
changes visible at a glance, and can reduce copy-paste errors by
establishing a more structured dataflow.
This is not without pitfalls, however: to gather the feature list for a
specific generation we need to traverse the chain, and adding or removing
a feature becomes more subtle because the change propagates across the
chain. The TableGen test of complete feature lists per generation may
alleviate these issues somewhat.
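The following C++ sketch shows how a full feature list can be recovered from such a chain of deltas. It models the idea rather than the TableGen implementation. The generation and feature names are hypothetical, and the chain is assumed to be acyclic. Within a single generation, removals are applied after additions.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-generation delta: the immediate predecessor plus the
// features added or removed relative to it.
struct GenDelta {
  std::string Parent; // empty for the root of the chain
  std::vector<std::string> Add;
  std::vector<std::string> Remove;
};

// Walk from Gen up to the root, then apply each delta from the root down.
std::set<std::string>
resolveFeatures(const std::map<std::string, GenDelta> &Chain,
                const std::string &Gen) {
  std::vector<const GenDelta *> Path;
  for (std::string Cur = Gen; !Cur.empty();) {
    auto It = Chain.find(Cur);
    if (It == Chain.end())
      break; // unknown generation: stop at the last known ancestor
    Path.push_back(&It->second);
    Cur = It->second.Parent;
  }
  std::set<std::string> Features;
  for (auto It = Path.rbegin(); It != Path.rend(); ++It) {
    for (const std::string &F : (*It)->Add)
      Features.insert(F);
    for (const std::string &F : (*It)->Remove)
      Features.erase(F);
  }
  return Features;
}
```

The traversal is the cost mentioned above. A removal in one generation silently applies to every descendant unless a later generation adds the feature back.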
[cmake][runtimes] Remove obsolete LLVM_RUNTIMES_PREFIX (#198367)
It hasn't been used by anything for nearly 8 years.
Fixes: 887f26d4703616934fd7a11b6649f605e1c7b4e3
[x86] De-type getMinimalPhysRegClass uses (NFC) (#198332)
Pulled out of #197495, which is de-typing this API. There are very few uses
of this API with a type across the whole codebase and only three in x86
related to callee-saves where an RC can instead be chosen directly.
[AMDGPU] De-type getMinimalPhysRegClass uses (NFC) (#198301)
Pulled out of #197495, which is de-typing this API. There are very few uses
of this API with a type across the whole codebase and only two in AMDGPU.
No test fallout when dropping the type from these calls, so I'm assuming
they're not necessary.
[libc] Migrate socket syscall wrappers to syscall_checked (#198241)
Also update the file headers while I'm at it. Move includes into a
single block so that clang-format can enforce a consistent ordering. Fix
a couple of discrepancies in the cmake file.
Assisted by Gemini.
[alpha.webkit.UncountedCallArgsChecker] Check arguments of function pointers (#188162)
This PR fixes a hole in WebKit's static analysis: we weren't checking
the soundness of arguments to a function call made via a (member)
function pointer.
[alpha.webkit.UncountedCallArgsChecker] Emit a warning for a WeakPtr argument. (#184563)
This PR fixes a bug in UncountedCallArgsChecker where it would not emit a
warning when a function is called with a WeakPtr local variable as an
argument.
In UncountedCallArgsChecker, we normally don't generate a warning for a
local variable passed as a function argument, because the variable may
have a guardian in an outer scope, and only UncountedLocalVarsChecker is
capable of locating one. So rather than generating a warning in
UncountedCallArgsChecker directly, we rely on UncountedLocalVarsChecker
to generate a warning for the local variable.
This all falls apart in the case of a WeakPtr local variable because a
WeakPtr is explicitly allowed as a local variable by
UncountedLocalVarsChecker.
So, this PR fixes the bug by detecting this exact scenario (a WeakPtr
local variable used as a function argument) and generating a warning
[7 lines not shown]
[RISCV] Add assembler and disassembler support for Xqccmt extension (#197673)
Xqccmt is Qualcomm's vendor extension providing compressed (16-bit) jump
table instructions, equivalent to (and mutually exclusive with) the
standard Zcmt extension.
Two instructions are added:
- qc.cm.jt (index 0-31): jump via table, no link register written
- qc.cm.jalt (index 32-255): jump via table with link; bit 0 of the jump
table entry selects the link register at runtime: 0 = ra (x1), 1 = t0
(x5)
The encoding is identical to cm.jt/cm.jalt from Zcmt. Xqccmt and Zcmt
are mutually exclusive and cannot be combined. Xqccmt is also
incompatible with Zcd (overlapping encoding space).
Spec: https://github.com/riscv/riscv-unified-db/pull/1788
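Below is a behavioural sketch of the two instructions as described above, in C++. `executeTableJump` and `TableJump` are hypothetical names. The sketch assumes that the jump target is the table entry with bit 0 cleared. It does not model the table base address or the entry load.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Outcome of a table jump: where control goes and, for the linking form,
// which x-register receives the return address.
struct TableJump {
  uint32_t Target;
  std::optional<unsigned> LinkReg;
};

// Index 0-31 models qc.cm.jt (no link register written); 32-255 models
// qc.cm.jalt, where bit 0 of the entry selects ra (x1) or t0 (x5).
// Assumption: the target is the entry with bit 0 cleared.
TableJump executeTableJump(uint32_t Index, uint32_t Entry) {
  assert(Index <= 255 && "index is an 8-bit field");
  TableJump J{Entry & ~uint32_t{1}, std::nullopt};
  if (Index >= 32)
    J.LinkReg = (Entry & 1) ? 5u : 1u;
  return J;
}
```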
[lldb][bytecode] Add GetParent and Clone selectors (#197312)
`GetParent` and `Clone` are needed to implement a `std::optional<T>`
data formatter for libc++.
[X86] Remove extra MOV after widening atomic store
This change adds patterns to optimize out an extra MOV present after
widening the atomic store. Covers <2 x i8> (SSE4.1+), <2 x i16>,
<4 x i8>, <2 x i32>, <2 x float>, <4 x i16>, <2 x ptr addrspace(270)>.
[SelectionDAG] Widen <2 x T> vector types for atomic store
Two-element vector types must be widened. This change does so for the
vector types of atomic stores in SelectionDAG, so that it can translate
aligned vectors of size >1.
[lldb] Increase availability of ValueObject::GetParent (#197311)
While working on formatter bytecode, I found that one of the C++
formatters needs `GetParent`. While adding `GetParent` support in the
bytecode, I noticed the SB API also does not expose `GetParent`. This
remedies that.
During review of this PR, it was pointed out that `GetParent` does not
work with synthetic value objects. This PR also addresses that shortcoming.
Assisted-by: claude
[clang][bytecode] Improve constexpr-unknown handling (#196334)
1) Global variables as well as dummies can not be marked
constexpr-unknown. There is a subtlety here with global variables: we
can't register it as constexpr-unknown and later figure out that it
actually _isn't_.
2) Add a `GetRefGlobal` op similar to the existing `GetRefLocal`.
3) Reject constexpr-unknown values in `CmpHelperEQ<Pointer>`
4) Diagnose constexpr-unknown values in `GetTypeidPtr`
[AMDGPU] Add amdgcn.av.(load|store).b128 intrinsics (#191390)
The new `@llvm.amdgcn.av` family of intrinsics has availability and
visibility semantics as described in #191246. Each of them takes a scope
operand that is then translated to target-specific cache policy bits.
This allows the user to control how the side-effects of these loads and
stores are made visible to other threads.
This patch was extracted from #172090.
Co-authored-by: macurtis-amd <macurtis at amd.com>
Assisted-by: Claude Opus 4.6
[RISCV][P-ext] Set BITCAST to Custom for 64-bit packed vectors on RV32 (#198267)
Bitcasts between i64 and v8i8/v4i16/v2i32 used to expand to a stack
roundtrip, and the resulting concat_vectors let DAG combine split
paired-register arithmetic into two single-reg ops (e.g. v8i8 add became
two padd.b instead of one padd.db). The existing Is64BitCast handler in
LowerOperation already treats these as no-ops; this just routes through
it.
[lldb] Change ValueObject::Clone to take StringRef (NFC) (#198035)
Make `ValueObject`'s name being a `ConstString` more of an
implementation detail by changing `Clone` and `SetName` to take a
`StringRef`.
[Clang] Fixed a crash when instantiating an invalid out-of-line static data member definition in a local class (#196772)
Add a check before the function call that caused the assertion failure.
Fix #176152, Fix #195416