[LV] Update stale comment for partial reduction operands (NFC) (#198118)
The `neg` form was removed in #187228 (this case now uses the
out-of-loop sub, which is preferable, see #189739).
[Clang][ItaniumMangle][NFC] Refactor FunctionTypeDepthState (#196240)
This patch refactors `FunctionTypeDepthState` to use bit-fields and
moves the `getNestingDepth` logic into it. It also renames
`{enter,leave}ResultType` to `{enter,leave}FunctionDeclSuffix`, since
the old names no longer match their current role.
[X86] LowerBUILD_VECTORvXi1 - attempt to fold as VPTESTMB(BUILD_VECTOR_vXi8(X),1) (#198166)
i1 scalar elements will be legalised to i8 (and the BUILD_VECTOR relies
on implicit truncation) - but it will often be cheaper to perform the
BUILD_VECTOR as a vXi8 and then perform a comparison to convert to the
vXi1 mask, assuming we're inserting more than one non-constant i1
element.
Without BWI we have to extend this to vXi32 types to perform the
comparison.
There's probably a lot we can do here (v2i8/v4i8/v8i8 types), but this
patch at least addresses the worst codegen cases.
Fixes #179334
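The semantics of the fold can be modeled in scalar code: instead of inserting i1 bits lane by lane, build a byte vector and derive the mask by testing each byte's low bit, which is what VPTESTMB against a low-bit constant does. A minimal sketch (lane width and count chosen for illustration):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of VPTESTMB(BUILD_VECTOR_vXi8(X), 1): mask bit i is set
// iff (byte[i] & 1) != 0, matching the implicit i1 truncation of the
// i8 lanes in the original BUILD_VECTOR.
uint16_t buildMaskViaBytes(const std::array<uint8_t, 16> &Lanes) {
  uint16_t Mask = 0;
  for (int I = 0; I < 16; ++I)
    if (Lanes[I] & 1)
      Mask |= uint16_t(1) << I;
  return Mask;
}
```

Note that a lane holding 3 still contributes a set mask bit, consistent with the implicit truncation to i1.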
[PowerPC] Fix i128 vcmpequb optimization for loads with range metadata and small constants (#196801)
The combine introduced in 55aff64d2c6ef50d2ed725d7dd1fb34080486237
lowers scalar i128 compares into vector compares by reissuing the
original loads as v16i8 loads. However, the combine was reusing the
original MachineMemOperand without modification.
If the original i128 load carries !range metadata, the MMO encodes that
range using i128 values. Reusing this MMO for a v16i8 load is incorrect
as range metadata is only valid for integer scalar types and its
bitwidth must match the memory VT.
This patch fixes this by creating a new MachineMemOperand for the
vector load. Additionally, we restrict the combine for constant
operands to avoid cases that are better handled by scalar lowering.
Small constants (those that fit within 16 bits) are excluded to
prevent generating suboptimal vector compares.
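The two checks can be sketched as predicates. The signed 16-bit window is an assumption (a signed range, as with LLVM's `isInt<16>`); the metadata rule restates the IR constraint that `!range` is only valid on scalar integer loads whose bitwidth matches the memory type.

```cpp
#include <cassert>
#include <cstdint>

// Assumed small-constant exclusion: signed 16-bit immediates stay on
// the scalar compare path.
bool fitsIn16Bits(int64_t C) { return C >= -32768 && C <= 32767; }

// !range metadata is only valid for scalar integer loads whose
// bitwidth matches the memory VT, so an i128 MMO with range metadata
// cannot be reused for a v16i8 load.
bool rangeMetadataValid(unsigned RangeBits, unsigned MemScalarBits,
                        bool MemIsVector) {
  return !MemIsVector && RangeBits == MemScalarBits;
}
```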
[SLP] Enable full non-power-of-2 vectorization by default
Default slp-vectorize-non-power-of-2 to true and broaden the set of
supported widths beyond NumElts + 1 == bit_ceil(NumElts) to include
small widths (<= 5), widths where NumElts - 1 is also non-power of two
(e.g. 6, 7, 10..15), and any width when the elements being vectorized
are themselves vectors (REVEC). Tweak gathered loads, stores, and
reduction support to handle non-power-of-2 vector factors.
Reviewers: hiraditya, bababuck, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/196825
[CUF] Fix CompilerGeneratedNamesConversion renaming managed companion globals
CUFAddConstructor creates a companion pointer global (e.g. foo.managed.ptr)
for each non-allocatable managed variable. When CompilerGeneratedNamesConversion
ran after CUFAddConstructor, it replaced the dots with 'X',
so CUFOpConversionLate could no longer find the companion by name and fell back
to CUFGetDeviceAddress with the wrong host pointer, causing cudaErrorInvalidSymbol.
Fix: mark the companion global with a cuf.managed_ptr unit attribute in
CUFAddConstructor and skip it in CompilerGeneratedNamesConversionPass.
Co-authored-by: Claude Sonnet 4.6 <noreply at anthropic.com>
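The failure mode is a plain by-name lookup miss after renaming. A minimal model (symbol names from the commit, lookup machinery hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Model of the bug: the names-conversion pass rewrites '.' to 'X', so a
// later lookup keyed on the original companion name no longer matches.
std::string renameDots(std::string Name) {
  std::replace(Name.begin(), Name.end(), '.', 'X');
  return Name;
}
```

With the fix, globals carrying the `cuf.managed_ptr` attribute are skipped, so `foo.managed.ptr` keeps its name instead of becoming `fooXmanagedXptr`.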
[libc++] Introduce a private version of in_out_result and use it for copy/move algorithms (#198086)
This patch introduces a new `__in_out_result`, which is an internal
back-ported version of `in_out_result`, and is convertible to that when
it exists. This improves the readability of the code, since it replaces
uses of `first` and `second` with `__in_` and `__out_`, making it clear
which iterator is accessed.
Other algorithms will be updated in separate patches.
[CGCall] Initially store arg attrs using AttrBuilder (NFCI) (#197906)
Make the argument attribute more similar to fn/ret handling, by first
populating an AttrBuilder and then converting it to AttributeSet once at
the end, instead of using a lot of intermediate AttrBuilders. This also
ensures we cannot lose any attributes because one code path overwrites
another.
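The accumulate-then-convert pattern can be modeled generically (the real code uses `llvm::AttrBuilder`/`AttributeSet`; this uses a plain set to show the shape of the fix):

```cpp
#include <cassert>
#include <set>
#include <string>

// Generic model of the fix: all code paths add into one builder, and
// the builder is converted to the final attribute set exactly once, so
// a later path cannot overwrite what an earlier path added.
using AttrSet = std::set<std::string>;

AttrSet buildArgAttrs() {
  AttrSet Builder;            // single builder for the whole argument
  Builder.insert("noundef");  // code path 1
  Builder.insert("nonnull");  // code path 2: adds, cannot drop "noundef"
  return Builder;             // converted once at the end
}
```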
[AArch64] Copy x4/x5 vararg payload into the x64 stack in Arm64EC exit thunks (#190933)
Currently, x4/x5 in variadic Arm64EC exit thunks are treated by LLVM
like any other outgoing arguments. x4/x5 contain a pointer to the
first stack parameter and the size of the parameters passed on the
stack, and the generated exit thunk must memcpy these to the x86-64
stack. Current MSVC does this correctly.
Rather than introducing a new entry to the CallingConv enum, we mark the
call as vararg in AArch64ArmECCallLowering so that the lowering logic in
AArch64ISelLowering.cpp can recognise this case, perform the necessary
memcpy, and drop the x4/x5 arguments.
LLVM should additionally ensure that x0-x3 are mirrored to f0-f3 in
order to match the Windows x86-64 vararg ABI, but that change is left
for a follow-up patch.
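What the thunk must do with the payload is essentially a bounded copy; a trivial sketch (register names used as parameter names for clarity, the destination is the emulated x86-64 stack area):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Sketch of the thunk's vararg handling: x4 points at the first stack
// parameter, x5 is the byte size of the stack-passed parameters; the
// bytes are memcpy'd to the x86-64 stack instead of being forwarded as
// ordinary register arguments.
void copyVarargPayload(void *X64Stack, const void *X4, std::size_t X5) {
  std::memcpy(X64Stack, X4, X5);
}
```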