[AArch64] Add SVE shuffle optimization pass (#193951)
Add a pass to perform VLA shuffle optimizations for SVE.
First up is using tbl to replace deinterleave4+uunpk+zext/uitofp
by generating shuffle masks with index, exploiting the fact that
out-of-range indices in the mask produce zeroes in the result
vector. That way, we can easily zero-extend smaller elements
by using the destination type when generating the mask, and
having one index in range with several out-of-range for each
destination element.
[Delinearization] Narrow the scope of the term collection (#204145)
In parametric delinearization, it collects subexpressions whose SCEV
type is `SCEVUnknown` and uses them as candidates for the array
dimensions. When traversing these subexpressions, it may follow any kind
of expression. For example, if it follows a `sext` expression, this can
lead to type inconsistencies among the collected terms.
This patch fixes this issue by preventing traversal into subexpressions
other than `SCEVAddExpr` or `SCEVAddRecExpr`.
Note: I tried to minimize the test case, but this seems to be as far as
it can go.
Fix #204066.
[mlir][ExecutionEngine] Fix dead -Wno-c++98-compat-extra-semi guard (#204524)
`check_cxx_compiler_flag` stores its result in
`CXX_SUPPORTS_NO_CXX98_COMPAT_EXTRA_SEMI_FLAG`, but the guarding `if()`
checked `CXX_SUPPORTS_CXX98_COMPAT_EXTRA_SEMI_FLAG` (without `_NO_`),
which is never set. The condition was therefore always false and the
`-Wno-c++98-compat-extra-semi` suppression for `mlir_rocm_runtime` was
never applied.
The sibling flag checks in the same block (`-Wno-return-type-c-linkage`,
`-Wno-nested-anon-types`, `-Wno-gnu-anonymous-struct`) already use
matching variable names, so this aligns the typo'd guard with the
established pattern.
No test is included, this is a build-system-only (CMake) change to a
warning-suppression guard and is not unit-testable.
Signed-off-by: bogdan-petkovic <bpetkovi at amd.com>
[SPIR-V] Fix crash on void indirect call with aggregate argument (#204388)
removeAggregateTypesFromCalls named the call to key the type-restoration
metadata, which asserts for void-returning calls. Key the metadata via
instruction metadata on the call instead, which works for void results.
[Clang][NEON ACLE] Remove +bf16 requirement from opaque bfloat builtins. (#204201)
Builtins that only care about the size of the element type but not its
format (e.g loads, stores and shuffles) do not require any special
instructions to code generate beyond those already available to +neon.
Fixes https://github.com/llvm/llvm-project/issues/203159
[AArch64] Combine undef UZP and NVCAST away.
These are used to lower insert_subvec nodes quite early in SDAG. After
DAG combines run, it's possible that the inputs to these AArch64 nodes
become UNDEF.
[AArch64][SDAG] Legalise nxv1 gather/scatter nodes (#204620)
This updates WidenVecRes_MGATHER and WidenVecOp_MSCATTER to support
scalable vector types.
[SPIR-V] Legalize G_PHI of oversized vectors via fewer-elements (#203993)
`G_PHI` on vectors wider than the SPIR-V max vector size previously
failed legalization. This PR adds a `fewerElementsIf` rule that splits
them down to `MaxVectorSize`, matching how other vector ops are handled
in `SPIRVLegalizerInfo.cpp`.
Added the following test
`llvm/test/CodeGen/SPIRV/instructions/phi-large-vector.ll` covering
spirv32 and spirv64.
[llvm] Remove LLVM_ABI_FOR_TEST in public headers (#204627)
These annotations were mistakenly set up as LLVM_ABI_FOR_TEST. Since
these are public headers, they should be using LLVM_ABI.
The effort to build LLVM as a dylib is tracked in #109483.
[flang][OpenMP] Lower target in_reduction for host fallback
Enable host-fallback lowering for target in_reduction in Flang and MLIR OpenMP translation.
Model target in_reduction through the matching map entry, force address-preserving implicit mapping for Flang in_reduction list items, and emit the host-side task-reduction lookup with __kmpc_task_reduction_get_th_data. The runtime entry point takes and returns a generic, default-address-space pointer, so normalize a non-default-address-space captured pointer to the generic address space before the call and cast the returned private pointer back to the map block argument's address space, mirroring the in_reduction handling on omp.taskloop. Unsupported device/offload-entry and richer reduction forms remain diagnosed.
Add Flang lowering, MLIR verifier/translation, and LLVM IR tests for the supported host-fallback path, including a non-default-address-space case, and the remaining unsupported cases.
[AArch64] Fix Windows target detection in FrameLowering (#204347)
In #156467, we switched to using `getMCAsmInfo()->usesWindowsCFI()` to
recognize "Windows". This does not include Windows triples with ELF
binary formats.
So, for aarch64-pc-windows-msvc-elf we would use the Windows callee-save
list in `AArch64RegisterInfo::getCalleeSavedRegs()`, but FrameLowering
would handle this like Linux, and fail to invalidate the (x29, x28)
pairing.
This patch switches back to using AArch64Subtarget::isTargetWindows(),
which aligns with getCalleeSavedRegs().
Note: We were using `usesWindowsCFI()` to include UEFI targets, however,
there does not seem to be tests/support for UEFI triples on AArch64
(basic examples that compile for x86 fail: https://godbolt.org/z/dPWdTrEG7).
So, this has been moved to a TODO.
Fixes #204060
[flang][OpenMP] Lower target in_reduction for host fallback
Enable host-fallback lowering for target in_reduction in Flang and MLIR OpenMP translation.
Model target in_reduction through the matching map entry, force address-preserving implicit mapping for Flang in_reduction list items, and emit the host-side task-reduction lookup with __kmpc_task_reduction_get_th_data. The runtime entry point takes and returns a generic, default-address-space pointer, so normalize a non-default-address-space captured pointer to the generic address space before the call and cast the returned private pointer back to the map block argument's address space, mirroring the in_reduction handling on omp.taskloop. Unsupported device/offload-entry and richer reduction forms remain diagnosed.
Add Flang lowering, MLIR verifier/translation, and LLVM IR tests for the supported host-fallback path, including a non-default-address-space case, and the remaining unsupported cases.
[LifetimeSafety] Model bit_cast and atomic casts in the fact generator (#204591)
VisitCastExpr dropped several borrow-carrying cast kinds into its
default case. Propagate the borrow through
`__builtin_bit_cast`/`std::bit_cast` of a pointer and through
wrapping/unwrapping `_Atomic(T*)`, so a stack address laundered through
either is caught (matching reinterpret_cast). hasOrigins and
buildListForType now see through AtomicType, which is transparent for
lifetimes.
Assisted-by: Claude Opus 4.8
Co-authored-by: Gabor Horvath <gaborh at apple.com>
[ELF][AArch64] Relax zero TLSLE add to nop (#204286)
Optimize AArch64 local-exec TLS relocation handling by replacing a
self-add R_AARCH64_TLSLE_ADD_TPREL_HI12 instruction with nop when the
high 12 bits are zero.
The optimization is disabled by --no-relax and avoids non-equivalent
forms such as non-self-adds and 32-bit destination registers.
[lit] Make RecursionError less likely in internal shell (#204573)
The lit internal shell chains together the contents of multiple RUN:
lines by connecting them with implicit && nodes, forming a binary tree
structure which is then executed recursively by `_executeShCommand`.
However the tree structure is constructed in a very simple way which
makes it effectively just a linked list, so `_executeShCommand` must
recurse to a depth equal to the number of commands.
If a test file contains more than 1000 RUN: lines (e.g. running the
clang driver only, with lots of different options), then this causes a
RecursionError exception, which did not happen using the external shell.
Failures of this kind can be avoided by instead connecting the commands
together in a _balanced_ binary tree, which has equivalent behaviour,
since the && shell operator is associative.