[libc++] Introduce a private version of in_out_result and use it for copy/move algorithms (#198086)
This patch introduces a new `__in_out_result`, which is an internal
back-ported version of `in_out_result`, and is convertible to that when
it exists. This improves the readability of the code, since it replaces
uses of `first` and `second` with `__in_` and `__out_`, making it clear
which iterator is accessed.
Other algorithms will be updated in separate patches.
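A minimal sketch of the idea, not the libc++ implementation. The names `public_in_out_result`, `internal_in_out_result`, `in_`, `out_`, and `copy_sketch` are hypothetical; the real type uses reserved identifiers (`__in_out_result`, `__in_`, `__out_`) that user code must not declare. It shows a result type with named members that converts to the public result type.

```cpp
#include <cassert>

// Hypothetical stand-in for std::ranges::in_out_result (C++20), so the
// sketch also compiles in older language modes.
template <class In, class Out>
struct public_in_out_result {
  In in;
  Out out;
};

// Hypothetical internal result type with descriptive member names.
template <class In, class Out>
struct internal_in_out_result {
  In in_;
  Out out_;

  // Convertible to the public type.
  template <class In2, class Out2>
  operator public_in_out_result<In2, Out2>() const {
    return {in_, out_};
  }
};

// A copy loop that reports both end positions through named members
// instead of pair-style first/second.
template <class In, class Out>
internal_in_out_result<In, Out> copy_sketch(In first, In last, Out out) {
  for (; first != last; ++first, (void)++out)
    *out = *first;
  return {first, out};
}
```

At a call site, `r.in_` and `r.out_` say which iterator is meant, where `r.first` and `r.second` do not.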
[CGCall] Initially store arg attrs using AttrBuilder (NFCI) (#197906)
Make argument attribute handling more similar to fn/ret handling, by first
populating an AttrBuilder and then converting it to AttributeSet once at
the end, instead of using a lot of intermediate AttrBuilders. This also
ensures we cannot lose any attributes because one code path overwrites
another.
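The accumulate-then-convert pattern can be sketched as follows. This is an illustration only: `AttrBuilderSketch`, `AttrSetSketch`, and `buildArgAttrs` are hypothetical stand-ins and do not reproduce the `AttrBuilder` or `AttributeSet` APIs.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical mutable builder: attributes are accumulated here.
struct AttrBuilderSketch {
  std::set<std::string> attrs;
  void add(const std::string &a) { attrs.insert(a); }
};

// Hypothetical immutable result, created once from the builder.
struct AttrSetSketch {
  std::vector<std::string> sorted;
};

// Every code path adds to the same builder, so no path can overwrite
// what another path contributed.
inline AttrSetSketch buildArgAttrs(bool isNonNull, bool isNoAlias) {
  AttrBuilderSketch b;
  if (isNonNull)
    b.add("nonnull");
  if (isNoAlias)
    b.add("noalias");
  // Convert exactly once, at the end.
  return AttrSetSketch{
      std::vector<std::string>(b.attrs.begin(), b.attrs.end())};
}
```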
[AArch64] Copy x4/x5 vararg payload into the x64 stack in Arm64EC exit thunks (#190933)
Currently, x4/x5 in variadic Arm64EC exit thunks are treated by
LLVM like any other outgoing arguments. x4/x5 contain a pointer to the
first stack parameter and the size of the parameters passed on the
stack, and the generated exit thunk must memcpy these to the x86-64
stack. Current MSVC does this correctly.
Rather than introducing a new entry to the CallingConv enum, we mark the
call as vararg in AArch64ArmECCallLowering so that the lowering logic in
AArch64ISelLowering.cpp can recognise this case, perform the necessary
memcpy, and drop the x4/x5 arguments.
LLVM should additionally ensure that x0-x3 are mirrored to f0-f3 in
order to match the Windows x86-64 vararg ABI, but that change is left
for a follow-up patch.
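As a rough model of the copy described above, assuming only what the message states (x4 points at the first stack parameter, x5 holds the byte size of the stack parameters). The function `copyVarargPayload` and its parameters are hypothetical; the real work is done in generated thunk code, not in a C++ helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model of the exit-thunk copy: x4 = pointer to the first
// stack parameter, x5 = size in bytes of the stack parameters.
inline void copyVarargPayload(const void *x4, std::size_t x5,
                              unsigned char *x64Stack,
                              std::size_t capacity) {
  assert(x5 <= capacity && "destination area must be large enough");
  // Copy the payload into the x86-64 stack area; x4/x5 themselves are
  // not forwarded as arguments.
  std::memcpy(x64Stack, x4, x5);
}
```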
[CIR][AArch64] Lower NEON vtrn intrinsics (#197651)
### Summary
Part of https://github.com/llvm/llvm-project/issues/185382.
Lower the `vtrn` intrinsics listed at:
https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#transpose-elements
Lower `case NEON::BI__builtin_neon_vtrn_v` and `case
NEON::BI__builtin_neon_vtrnq_v` in CIRGenBuiltinAArch64.cpp by porting
the existing incubator logic
(`clangir/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp`): generate
the two transpose results with `cir.vec.shuffle` and store both
vectors into the returned pair structure.
All intrinsics in `clang/test/CodeGen/AArch64/neon-perm.c` have been
migrated into `clang/test/CodeGen/AArch64/neon/perm.c`, so the
original file is deleted.
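For illustration, the two transpose shuffles can be modelled in plain C++, assuming the usual TRN1/TRN2 lane selection over the concatenation of both operands. The helpers `vtrnMask` and `shuffle` are hypothetical and are not part of the CIR code.

```cpp
#include <cassert>
#include <vector>

// Shuffle indices for transpose result `half` (0 or 1). Indices below n
// select from the first operand, indices from n upward from the second.
inline std::vector<int> vtrnMask(int n, int half) {
  assert(n > 0 && n % 2 == 0 && (half == 0 || half == 1));
  std::vector<int> mask;
  for (int i = 0; i < n; i += 2) {
    mask.push_back(i + half);     // lane from the first operand
    mask.push_back(i + n + half); // matching lane from the second operand
  }
  return mask;
}

// Reference two-input shuffle used to check the masks.
inline std::vector<int> shuffle(const std::vector<int> &a,
                                const std::vector<int> &b,
                                const std::vector<int> &mask) {
  const int n = static_cast<int>(a.size());
  std::vector<int> out;
  for (int idx : mask)
    out.push_back(idx < n ? a[idx] : b[idx - n]);
  return out;
}
```

For four lanes this gives the masks {0, 4, 2, 6} and {1, 5, 3, 7}, so the first result interleaves the even lanes of both inputs and the second the odd lanes.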
[IR] Verify that VP !prof does not have duplicate prof values
Follows on #193077. We should not have duplicate values as they should
be merged before being moved into IR. Add this to the verifier so that
we can actually enforce this constraint.
Reviewers: david-xl, teresajohnson
Pull Request: https://github.com/llvm/llvm-project/pull/193083
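A duplicate check of this kind can be sketched as below. The flattened (value, count) representation and the name `hasDuplicateProfValues` are assumptions for illustration; the actual verifier works on metadata operands.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical flattened view of value-profile data: (value, count).
using ValueCount = std::pair<std::uint64_t, std::uint64_t>;

// Returns true if any profiled value appears more than once.
inline bool hasDuplicateProfValues(const std::vector<ValueCount> &entries) {
  std::unordered_set<std::uint64_t> seen;
  for (const ValueCount &e : entries)
    if (!seen.insert(e.first).second)
      return true; // insert failed: value was already present
  return false;
}
```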
[flang][OpenMP] Add diagnostic for bare DECLARE TARGET in invalid scopes (#198039)
The bare form of `!$omp declare target` (without arguments or clauses)
is only permitted in the specification part of a subroutine, function,
or interface body (OpenMP 5.2, section 7.8.2). Flang previously accepted
it silently in BLOCK DATA, PROGRAM, MODULE, SUBMODULE, and BLOCK
constructs.
This patch:
- Adds a semantic check rejecting the bare form outside Subprogram
scopes.
- Adds MpSubprogramStmt/EndMpSubprogramStmt scope tracking to avoid
false positives in separate module subprograms (MODULE PROCEDURE).
- Fixes pre-existing BlockConstruct scope tracking bugs: the Leave
handler was pushing instead of popping (stack corruption), and the
Enter handler used blockStmt.source which resolves to the parent
scope. Now uses endBlockStmt.source (walked inside the block scope
during name resolution) for correct BlockConstruct scope identity.
[4 lines not shown]
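The push/pop pairing at the heart of the BlockConstruct fix can be sketched as follows. `ScopeTrackerSketch` and its string scope kinds are hypothetical; the Flang semantic checker uses its own scope types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical scope tracker: Enter pushes, Leave pops.
struct ScopeTrackerSketch {
  std::vector<std::string> stack;

  void enter(const std::string &kind) { stack.push_back(kind); }

  // The bug described above pushed here instead of popping.
  void leave() {
    assert(!stack.empty() && "leave without matching enter");
    stack.pop_back();
  }

  // The bare form is accepted only directly inside a subprogram scope.
  bool bareDeclareTargetAllowed() const {
    return !stack.empty() && stack.back() == "Subprogram";
  }
};
```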
[ELF] Fix imprecise --why-live message for exported symbols (#198139)
The "; may interpose" suffix is imprecise: a symbol is preserved because
it is exported into the dynamic symbol table, regardless of whether it
is interposable (preemptible).
Fixes #192035 and adds a test; this case was previously untested.
[FileCheck] Refactor -dump-input test
This patch refactors `llvm/test/FileCheck/dump-input/annotations.txt`
to improve maintainability and coverage and to prepare for the
upcoming implementation of search range annotations.
Lit substitutions
=================
The test repeats the same basic set of RUN lines *many* times. This
patch encapsulates those in lit substitutions to improve
maintainability. By doing so, it also helps to ensure more consistent
coverage of all cases and thus slightly expands coverage.
-strict-whitespace
==================
Via those substitutions, this patch adds `-strict-whitespace`
throughout the test, and it drops the initial `-strict-whitespace`
[27 lines not shown]
[FileCheck] Resurrect overflow tests
D150880 (landed as 0726cb004718) uses `APInt` to eliminate most
integer overflow issues from FileCheck numeric variables. It also
removes the 4 tests in
`llvm/test/FileCheck/match-time-error-propagation`.
While the elimination of overflow issues reduces the importance of
those tests, the tests still seem worthwhile. Without them, I see no
test that exercises the "unable to substitute variable or numeric
expression: overflow error" diagnostic in FileCheck input dumps.
This patch resurrects those tests and updates them to exercise the
remaining unsigned underflow case.
[ELF] Remove redundant memset in SymbolTable::insert. NFC (#198132)
make<SymbolUnion>() value-initializes the union, zero-initializing all
sizeof(SymbolUnion) bytes. The following memset(sym, 0, sizeof(Symbol))
is therefore redundant.
This placeholder path runs no Symbol constructor, so it was not covered
by the constructor initialization in
905a88b923433eb8cd83677ea55bee82eb9ba498.
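A small sketch of why the memset is redundant after value-initialization. The types `BigSketch`, `SmallSketch`, and `UnionSketch` are hypothetical; the largest member is deliberately placed first so that zero-initializing the first member covers the whole union.

```cpp
#include <cassert>
#include <cstring>
#include <new>

struct BigSketch {
  unsigned long long a;
  unsigned long long b;
};
struct SmallSketch {
  unsigned int c;
};

// Hypothetical union; the first member is also the largest one.
union UnionSketch {
  BigSketch big;
  SmallSketch small;
};

inline bool valueInitZeroes() {
  alignas(UnionSketch) unsigned char storage[sizeof(UnionSketch)];
  std::memset(storage, 0xFF, sizeof(storage)); // poison the storage
  // `UnionSketch()` is value-initialization: the union has no
  // user-provided constructor, so it is zero-initialized.
  UnionSketch *u = new (storage) UnionSketch();
  return u->big.a == 0 && u->big.b == 0;
}
```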
[X86] isExtractSubvectorCheap - fix typo in vXi1 extraction test (#198127)
Fix typo in check for ResVT subvector being half the size of the SrcVT vector (instead of vice-versa).
Fixes #195695
[ELF] Initialize Symbol fields in the constructor instead of via memset (#198129)
`initSectionsAndLocalSyms` and `makeDefined` memset the storage to zero
and then placement-new a Symbol-derived object into it. Placement new
begins a new object's lifetime. The standard does not seem to guarantee
the memset bytes carry into members the constructor leaves
uninitialized.
lld built by GCC 16 can make Valgrind report reads of Symbol::flags
(via getSymSectionIndex during finalizeSections) as uses of
uninitialized values (ClangBuiltLinux/linux#2162).
This patch reinstates the per-field initialization that commit
778742760534 ("[ELF] Avoid redundant assignment to Symbol fields. NFC")
had replaced with a bulk memset.
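The per-field approach can be sketched as below. `SymbolSketch` and its fields are hypothetical and much smaller than lld's `Symbol`; the point is only that every field is initialized by the constructor, so nothing depends on bytes written before placement new.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical symbol type: each field has an initializer, so its
// value is well defined after construction.
struct SymbolSketch {
  unsigned char flags = 0;
  unsigned char binding = 0;
  unsigned int nameSize = 0;

  explicit SymbolSketch(unsigned int size) : nameSize(size) {}
};

// Constructs into caller-provided storage without relying on a
// preceding memset of that storage.
inline SymbolSketch *makeSymbolSketch(void *storage, unsigned int size) {
  return new (storage) SymbolSketch(size);
}
```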
[VPlan] Refine plain CFG iterator name and strengthen assert (NFC). (#198124)
Address post-commit comments for
https://github.com/llvm/llvm-project/pull/197499:
* Add an rpo prefix to the name to indicate the traversal order (similar
to other vp_depth_first_ helpers).
* Add a comment about skipped VPIRBBs and an assert.
[VPlan] Add blocksAs helper (NFC). (#198122)
Add new blocksAs helper which casts all blocks in the provided range to
the specified type, instead of filtering out non-matching blocks.
Migrate a number of users that expect only VPBasicBlocks.
Pointed out post-commit in
https://github.com/llvm/llvm-project/pull/197499.
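The difference between filtering and casting all blocks can be sketched as follows. All types and both functions are hypothetical stand-ins; the real helper works on VPlan block ranges, not on `std::vector`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for a block hierarchy.
struct BlockSketch {
  virtual ~BlockSketch() = default;
  virtual bool isBasic() const { return false; }
};
struct BasicBlockSketch : BlockSketch {
  bool isBasic() const override { return true; }
};
struct RegionSketch : BlockSketch {};

// Filtering: non-matching blocks are silently skipped.
inline std::vector<BasicBlockSketch *>
filterBasic(const std::vector<BlockSketch *> &blocks) {
  std::vector<BasicBlockSketch *> out;
  for (BlockSketch *b : blocks)
    if (b->isBasic())
      out.push_back(static_cast<BasicBlockSketch *>(b));
  return out;
}

// Casting all: every block is expected to match; a mismatch is a bug
// and trips the assert instead of being dropped.
inline std::vector<BasicBlockSketch *>
castAllBasic(const std::vector<BlockSketch *> &blocks) {
  std::vector<BasicBlockSketch *> out;
  for (BlockSketch *b : blocks) {
    assert(b->isBasic() && "range must contain only basic blocks");
    out.push_back(static_cast<BasicBlockSketch *>(b));
  }
  return out;
}
```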
[RISC-V][RVY] Introduce pure-capability ABI names
Adding this will allow updating #177249 to define the datalayout based
only on the triple and ABI instead of inspecting the feature string,
which is a per-function property and not a per-module one.
The RVY ABIs are currently under review at this psABI pull request:
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/499
Pull Request: https://github.com/llvm/llvm-project/pull/194270
[SLP][REVEC] Fix crash on scalable vector types with -slp-revec
isValidElementType incorrectly called toScalarizedTy for scalable vector
types because isVectorizedTy returns true for all vector types. This let
scalable types pass as valid revectorization elements, causing a fatal
"Cannot implicitly convert a scalable size to a fixed-width size" error
in getNumElements when it called getVectorizedTypeVF(Ty).getFixedValue().
Fixes #198076
Pull Request: https://github.com/llvm/llvm-project/pull/198123
[VPlan] Split out adding canonical IV recipes to separate transform. (#197541)
Introduce canonical IV recipes after initial scalar
transformations/simplifications. Conceptually it is a separate
transformation, and moving it later simplifies initial construction. The
canonical IV is only needed once we handle early exits/introduce
regions.
This is needed to compute costs of scalar VPlans, where we need to
compare the cost of the original loop control instructions.
PR: https://github.com/llvm/llvm-project/pull/197541
[AArch64] Decompose constant multiplies used only by memory addresses (#194584)
AArch64 currently avoids decomposing a constant multiply when the
multiply has a single ADD/SUB user, preserving the opportunity to form
MADD/MSUB.
That heuristic is too conservative when the ADD/SUB is used only as a
memory address. In that case the ADD/SUB is selected as part of
load/store address mode selection, so preserving the multiply does not
produce MADD/MSUB and prevents the existing constant-multiply
decomposition from exposing ADD/LSL forms usable by AArch64
register-offset addressing.
Relax the bailout for ADD/SUB users that are only used as unindexed
load/store base addresses.
Fixes #161446.
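The arithmetic behind such a decomposition can be illustrated with a worked example. The constant 24 and the helper names are chosen for illustration only and are not taken from the patch; the sketch shows that a multiply by 24 equals an add with a shifted operand followed by a left shift by 3, the kind of shift a register-offset address can absorb for 8-byte accesses.

```cpp
#include <cassert>
#include <cstdint>

// x * 24 == (x + (x << 1)) << 3
inline std::uint64_t mul24Decomposed(std::uint64_t x) {
  std::uint64_t times3 = x + (x << 1); // ADD with an LSL #1 operand
  return times3 << 3;                  // LSL #3
}

// Address computation: base + x * 24, written with the decomposed form.
inline std::uint64_t addressSketch(std::uint64_t base, std::uint64_t x) {
  return base + mul24Decomposed(x);
}
```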
[SLP] Prefer VF-matching scalar-set match in gather-shuffle lookup
In isGatherShuffledSingleRegisterEntry, the perfect-match search accepted
an entry that isSame(TE->Scalars) regardless of the entry's vector factor.
isSame can succeed via ReuseShuffleIndices on an entry whose actual VF is
smaller than TE->Scalars.size(); the subsequent mask construction then
copies TE->getCommonMask() indices that overrun the chosen source's lanes,
producing wrong shufflevector masks and a more-poisonous result than the
scalar code.
Fixes #197765
Pull Request: https://github.com/llvm/llvm-project/pull/198120
[LV] Update stale comment for partial reduction operands (NFC)
The `neg` form was removed in #187228 (this case now uses the out-of-loop sub, which is preferable, see #189739).