[SLP] Prefer VF-matching scalar-set match in gather-shuffle lookup
In isGatherShuffledSingleRegisterEntry, the perfect-match search accepted
an entry that isSame(TE->Scalars) regardless of the entry's vector factor.
isSame can succeed via ReuseShuffleIndices on an entry whose actual VF is
smaller than TE->Scalars.size(); the subsequent mask construction then
copies TE->getCommonMask() indices that overrun the chosen source's lanes,
producing wrong shufflevector masks and a more-poisonous result than the
scalar code.
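The failure mode can be sketched outside of LLVM. The snippet below is a simplified model, not the SLPVectorizer code: `Candidate`, `maskFitsSource` and `pickSource` are hypothetical names, and a candidate is described only by its vector factor. It shows why a mask built for `TE->Scalars.size()` lanes cannot be applied to a source with a smaller VF, and how preferring a VF-matching candidate avoids that.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a tree entry: only the vector factor matters here.
struct Candidate {
  unsigned VF; // number of lanes the entry actually produces
};

// A shuffle mask is valid for a single source only if every non-poison index
// (poison is modeled as -1) selects a lane that exists in that source.
bool maskFitsSource(const std::vector<int> &Mask, unsigned SourceVF) {
  assert(SourceVF > 0 && "a source must have at least one lane");
  for (int Idx : Mask)
    if (Idx != -1 && (Idx < 0 || static_cast<unsigned>(Idx) >= SourceVF))
      return false;
  return true;
}

// Return the index of the first candidate whose VF equals the number of
// requested scalars, or std::nullopt when no candidate matches.
std::optional<std::size_t> pickSource(const std::vector<Candidate> &Cands,
                                      unsigned NumScalars) {
  for (std::size_t I = 0; I < Cands.size(); ++I)
    if (Cands[I].VF == NumScalars)
      return I;
  return std::nullopt;
}
```

In this model a four-lane mask `{0, 1, 2, 3}` does not fit a candidate with VF 2, which is the overrun described above. What the real lookup does when no VF-matching entry exists is not modeled here.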
Fixes #197765
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/198120
[LV] Update stale comment for partial reduction operands (NFC)
The `neg` form was removed in #187228 (this case now uses the out-of-loop sub, which is preferable, see #189739).
[FileCheck][NFC] Introduce MarkerRange for -dump-input (#196800)
`MarkerRange` makes the computation of marker ranges clearer because it
encapsulates handling of several subtle boundary cases:
- It handles adjustments to line numbers when a range boundary appears
at a line boundary.
- It avoids related mistakes in determining whether the range is
contained within a single line.
- It avoids the mistake of producing no marker in an input annotation
for an empty range.
It will be used more in a future patch that extends `-dump-input` to
present search ranges for all errors.
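To make the boundary cases above concrete, here is a minimal sketch of a range type with the same three concerns. It is not the `MarkerRange` from this patch: `Pos`, `SketchRange` and all member names are hypothetical, positions are assumed to be 1-based, and the end position is assumed to be exclusive.

```cpp
#include <algorithm>
#include <cassert>

struct Pos {
  unsigned Line; // 1-based line number
  unsigned Col;  // 1-based column number
};

struct SketchRange {
  Pos Start;
  Pos End; // exclusive

  bool isEmpty() const {
    return Start.Line == End.Line && Start.Col == End.Col;
  }

  // A non-empty range that ends at column 1 of a later line does not touch
  // that line, so the last line it covers is the previous one.
  unsigned lastLine() const {
    if (!isEmpty() && End.Col == 1 && End.Line > Start.Line)
      return End.Line - 1;
    return End.Line;
  }

  bool isSingleLine() const { return Start.Line == lastLine(); }

  // Marker width for a single-line range. LineLength is the number of
  // characters on Start.Line, excluding the newline. An empty range still
  // gets a marker of width 1 so that it remains visible.
  unsigned markerWidthOnLine(unsigned LineLength) const {
    assert(isSingleLine() && "only meaningful for single-line ranges");
    unsigned EndCol = (End.Line == Start.Line) ? End.Col : LineLength + 1;
    unsigned Width = EndCol > Start.Col ? EndCol - Start.Col : 0;
    return std::max(Width, 1u);
  }
};
```

Keeping these adjustments in one type means callers do not each have to remember that an end position at the start of a line belongs to the previous line.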
This PR is stacked on PR #196799.
[AArch64][GlobalISel] Improve multiplication with multiple registers (#197943)
While working on codegen for `llvm.umul.fix.sat` I noticed that, among
other things, GISel also generates worse code for mul when the data
spans multiple registers (for example, when the register width is
64 bits but you want to multiply two 128-bit values).
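For reference, the arithmetic behind a truncating 128-bit multiply on 64-bit registers can be sketched in C++. This is only an illustration of the decomposition, not the code GISel or this patch emits; `U128`, `mulHigh64` and `mul128` are hypothetical names. `mulHigh64` computes what a single AArch64 `umulh` provides.

```cpp
#include <cassert>
#include <cstdint>

// A 128-bit value held in two 64-bit registers.
struct U128 {
  uint64_t Lo;
  uint64_t Hi;
};

// High 64 bits of the full 128-bit product of two 64-bit values, built from
// 32-bit pieces so that no intermediate sum overflows.
uint64_t mulHigh64(uint64_t A, uint64_t B) {
  uint64_t ALo = A & 0xffffffffu, AHi = A >> 32;
  uint64_t BLo = B & 0xffffffffu, BHi = B >> 32;
  uint64_t LoLo = ALo * BLo;
  uint64_t HiLo = AHi * BLo;
  uint64_t LoHi = ALo * BHi;
  uint64_t HiHi = AHi * BHi;
  uint64_t Cross = (LoLo >> 32) + (HiLo & 0xffffffffu) + LoHi;
  return HiHi + (HiLo >> 32) + (Cross >> 32);
}

// Truncating 128-bit multiply: the A.Hi * B.Hi term only affects bits above
// 127, so it is dropped, and the cross terms wrap modulo 2^64.
U128 mul128(U128 A, U128 B) {
  U128 R;
  R.Lo = A.Lo * B.Lo;
  R.Hi = mulHigh64(A.Lo, B.Lo) + A.Lo * B.Hi + A.Hi * B.Lo;
  return R;
}
```

Under this decomposition the result needs three 64-bit multiplies, one high multiply and two adds.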
Here is the example ll:
```
define i128 @i128(i128 %a, i128 %b) {
entry:
%s = mul i128 %a, %b
ret i128 %s
}
```
This is what GISel gave:
```
mul x9, x0, x3
[19 lines not shown]
```
[X86] LowerVECREDUCE - add AllowScalarization operand (#198109)
Pull the scalarization control out of the LowerVECREDUCE call so that
future patches can make different decisions based on the VECREDUCE opcode.
[AtomicExpand] Add bitcasts when expanding store atomic vector
AtomicExpand fails for aligned `store atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
ptrtoint + bitcast so that the call can be lowered, mirroring the
load-side handling from #148900.
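The idea of reinterpreting a vector as an integer of the same width before an atomic store has a source-level analogue, sketched below. This is not the IR that AtomicExpand emits; `Vec2f`, `toBits`, `fromBits`, `atomicStoreVec` and `atomicLoadVec` are hypothetical helpers, and a two-float vector is assumed to occupy exactly 64 bits.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

using Vec2f = std::array<float, 2>;
static_assert(sizeof(Vec2f) == sizeof(uint64_t),
              "sketch assumes a 64-bit vector");

// Reinterpret the vector's bytes as one integer (the "bitcast" step).
uint64_t toBits(const Vec2f &V) {
  uint64_t Bits;
  std::memcpy(&Bits, V.data(), sizeof(Bits));
  return Bits;
}

// Inverse of toBits.
Vec2f fromBits(uint64_t Bits) {
  Vec2f V;
  std::memcpy(V.data(), &Bits, sizeof(Bits));
  return V;
}

// The whole vector is written with a single integer atomic store.
void atomicStoreVec(std::atomic<uint64_t> &Slot, const Vec2f &V) {
  Slot.store(toBits(V), std::memory_order_seq_cst);
}

// The matching load reads the integer and reinterprets it as a vector.
Vec2f atomicLoadVec(const std::atomic<uint64_t> &Slot) {
  return fromBits(Slot.load(std::memory_order_seq_cst));
}
```

Because the store operates on one integer, a reader can never observe one lane updated without the other.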
Remove default setting signaling_nan attribute for strictfp functions
We cannot describe such behavior in the Clang User Manual because
strictfp is not visible to the user.
[lldb] Make CommandObject::GetTarget filter out the dummy target (#198026)
Follow-up to #197805. Make CommandObject::GetTarget the canonical target
accessor for command code, and tighten its semantics so that DoExecute
methods can't accidentally operate on the dummy target.
GetTarget now returns Target* instead of Target&. The result is the
target from the command's frozen execution context, falling back to the
interpreter's execution context. The dummy target is filtered out and
replaced with nullptr unless the command opts in via one of the
eCommandRequires{Target,Process,Thread,Frame} flags (in which case
CheckRequirements has already guaranteed a real target) or via the new
eCommandAllowsDummyTarget flag.
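The filtering rule can be modeled in a few lines. The sketch below is not lldb code: `Target` is a stand-in struct, `getTargetSketch` is a hypothetical free function, and the flag values are made up for illustration (only the flag names come from this change).

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for lldb's Target; only the dummy-ness matters here.
struct Target {
  bool IsDummy;
};

// Flag names follow the commit message; the bit values are illustrative only.
enum CommandFlags : uint32_t {
  eCommandRequiresTarget = 1u << 0,
  eCommandRequiresProcess = 1u << 1,
  eCommandRequiresThread = 1u << 2,
  eCommandRequiresFrame = 1u << 3,
  eCommandAllowsDummyTarget = 1u << 4,
};

// Return the context's target, but replace the dummy target with nullptr
// unless the command has opted in through one of the flags above.
Target *getTargetSketch(Target *FromContext, uint32_t Flags) {
  if (!FromContext)
    return nullptr;
  const uint32_t OptIn = eCommandRequiresTarget | eCommandRequiresProcess |
                         eCommandRequiresThread | eCommandRequiresFrame |
                         eCommandAllowsDummyTarget;
  if (FromContext->IsDummy && !(Flags & OptIn))
    return nullptr;
  return FromContext;
}
```

Returning a pointer instead of a reference forces callers to handle the "no real target" case explicitly.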
This is the first half of the cleanup discussed at the end of #197805. A
follow-up will audit DoExecute methods that still reach for
GetSelectedTarget or m_exe_ctx.GetTargetPtr() directly and migrate them
to GetTarget.
[X86] Cast atomic vectors in IR to support floats
Extend the X86 `alignedstore` PatFrag to also match `atomic_store`
with vector-size alignment, so existing MOVAPS/MOVAPD/MOVDQA-family
aligned-store patterns cover 128-bit aligned vector atomic stores on
SSE/AVX/AVX-512 without per-type duplicates. `<4 x float>`,
`<2 x double>`, `<2 x i64>`, `<4 x i32>`, `<8 x half>`, `<8 x bfloat>`
all codegen to a single `movaps`/`movapd` on AVX+ via this.
Adds v8f16/v8bf16 bitconvert variants to the widen-path
`atomic_store_32` / `atomic_store_64` patterns so `<2 x half>`,
`<2 x bfloat>`, `<4 x half>`, `<4 x bfloat>` atomic stores reaching
the PR4 widen path also collapse to a single instruction on AVX+
targets.
Vectors whose `getTypeAction` is split rather than widen still rely
on PR6's `SplitVecOp_ATOMIC_STORE`; that path bitcasts the vector
to a scalar integer and issues an integer `atomic_store_N`, picked
up by the pre-existing scalar atomic-store patterns. The two
[4 lines not shown]
[SelectionDAG] Split vector types for atomic store
Vector types that aren't widened are split so that a single ATOMIC_STORE
is issued for the entire vector at once. This enables SelectionDAG to
translate vectors whose element type is bfloat or half.
[JTS] Drop test for multiple zero values in VP metadata
This will soon become a verifier failure. Drop the test so that we can
actually enforce this in the verifier without causing test failures.
Reviewers: mtrofin
Pull Request: https://github.com/llvm/llvm-project/pull/197617
[InstrProf] Deduplicate VP values
Zero VP values can come up in some places. They are intentional around
external symbols for indirect call sites, and it seems like they might
be unintentional around memop VP metadata
(https://reviews.llvm.org/D92074). This patch combines them so that we
can enforce the invariant that there are no duplicate values in VP
metadata, which allows passes to make some simplifying assumptions. We
also deduplicate non-zero values, because there is error handling for
them and still some undebugged cases where they show up
(https://reviews.llvm.org/D136211).
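A minimal sketch of the deduplication step follows. It is not the code in this patch: `ValueCount` and `dedupValueCounts` are hypothetical names, and it assumes that combining duplicates means summing their counts while keeping first-seen order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// One value-profile record: (profiled value, count).
using ValueCount = std::pair<uint64_t, uint64_t>;

// Merge records that share a value by summing their counts. The first
// occurrence of each value determines its position in the output.
std::vector<ValueCount> dedupValueCounts(const std::vector<ValueCount> &In) {
  std::vector<ValueCount> Out;
  std::map<uint64_t, std::size_t> IndexOf; // value -> position in Out
  for (const auto &[Value, Count] : In) {
    auto [It, Inserted] = IndexOf.try_emplace(Value, Out.size());
    if (Inserted)
      Out.push_back({Value, Count});
    else
      Out[It->second].second += Count;
  }
  return Out;
}
```

Zero and non-zero values go through the same path here; the special error handling that llvm-profdata needs for non-zero duplicates is not modeled.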
This ended up being a bit messier than I would like due to the need to
handle non-zero duplicate values and preserve existing error handling
behavior in llvm-profdata. I've left comments explaining this so we can
hopefully clean this up when llvm-profdata eventually gets fixed. The
error has shown up in some places
(https://issues.chromium.org/issues/353702041), so does still exist, but
I still have not been able to find profraw files to be able to fix the
[6 lines not shown]
[clang-tidy][NFC] Fix modernize-macro-to-enum testcases (#198093)
Previously these header files were not tested; the newly added test case
fixes the problem.
AI usage: Codex was used to suggest the new tests.
Closes https://github.com/llvm/llvm-project/issues/173530
[mlir] Cleanup Operation.cpp (NFC) (#197712)
This PR cleans up Operation.cpp based on clangd suggestions. It
removes unused headers, fixes incorrect comments, and improves
performance by applying std::move where appropriate.
[AArch64][GlobalISel] Fold buildvector of bitcast (#141553)
This adds a combine for buildvectors from bitcast values, sinking the
bitcast and generating a buildvector from the original scalar type.
```
%5:_(<4 x s8>) = G_BITCAST %16:_(s32)
%18:_(s8), %19:_(s8), %20:_(s8), %21:_(s8) = G_UNMERGE_VALUES %5:_(<4 x s8>)
%22:_(s8) = G_IMPLICIT_DEF
%23:_(<8 x s8>) = G_BUILD_VECTOR %18:_(s8), %19:_(s8), %20:_(s8), %21:_(s8), %22:_(s8), %22:_(s8), %22:_(s8), %22:_(s8)
=>
%undef:_(s32) = G_IMPLICIT_DEF
%bv:_(<2 x s32>) = G_BUILD_VECTOR %16:_(s32), %undef:_(s32)
%23:_(<8 x s8>) = G_BITCAST %bv:_(<2 x s32>)
```
It helps clean up some of the inefficiencies from widening scalar types.
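The equivalence behind the combine can be checked at the byte level. The sketch below is a model, not GlobalISel code: `viaUnmerge` and `viaWideBuildVector` are hypothetical names, and undef lanes are modeled as zero purely so the two results can be compared.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using V8x8 = std::array<uint8_t, 8>;

// Original form: bitcast the s32 to <4 x s8>, unmerge the bytes, then build
// an <8 x s8> whose upper four lanes are undef (modeled as zero).
V8x8 viaUnmerge(uint32_t X) {
  std::array<uint8_t, 4> Bytes;
  std::memcpy(Bytes.data(), &X, sizeof(X));
  V8x8 Out{}; // zero-initialized stands in for undef lanes
  for (std::size_t I = 0; I < Bytes.size(); ++I)
    Out[I] = Bytes[I];
  return Out;
}

// Combined form: build <2 x s32> from the original scalar and one undef
// (modeled as zero), then bitcast the whole vector to <8 x s8>.
V8x8 viaWideBuildVector(uint32_t X) {
  std::array<uint32_t, 2> BV = {X, 0};
  V8x8 Out;
  std::memcpy(Out.data(), BV.data(), sizeof(Out));
  return Out;
}
```

Both functions place the four bytes of `X` in the low four lanes, so the defined lanes agree and only the undef lanes are free to differ in real code.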
Fixup "[llvm-ir2vec] Breaking up llvm-ir2vec lib implementation to clean up MIR deps from ir2vec python bindings (#194414)" (#198077)
llvm-ir2vec and LLVMMIREmbUtils were missing some deps, which shows up when
building with -DBUILD_SHARED_LIBS=ON. Fixed the CMakeLists.txt to reflect
accurate dependencies.
[offload] Add new features to libompaccsupport for OpenACC
AsyncInfoTy STATIC_NON_BLOCKING type.
Strided array copies and mapping.
No create mapping type.
Refactoring initialization.
Loading offload objects with OpenACC offloading kind.
[offload][clang][llvm] Add new openacc offload kind
The OpenACC offloading kind is equivalent to OpenMP except for which
initialization function is called at initialization time.
[DAG] SimplifyDemandedBits - remove ISD::FREEZE node if all demanded elements are not undef/poison (#198084)
Similar to what we already do in SimplifyDemandedVectorElts