[AArch64] Fix Windows target detection in FrameLowering (#204347)
In #156467, we switched to using `getMCAsmInfo()->usesWindowsCFI()` to
recognize "Windows". This does not include Windows triples with ELF
binary formats.
So, for aarch64-pc-windows-msvc-elf we would use the Windows callee-save
list in `AArch64RegisterInfo::getCalleeSavedRegs()`, but FrameLowering
would handle this like Linux, and fail to invalidate the (x29, x28)
pairing.
This patch switches back to using `AArch64Subtarget::isTargetWindows()`,
which aligns with `getCalleeSavedRegs()`.
Note: We were using `usesWindowsCFI()` to include UEFI targets. However,
there do not seem to be any tests or support for UEFI triples on AArch64
(basic examples that compile for x86 fail: https://godbolt.org/z/dPWdTrEG7),
so UEFI handling has been moved to a TODO.
Fixes #204060
[flang][OpenMP] Lower target in_reduction for host fallback
Enable host-fallback lowering for target in_reduction in Flang and MLIR OpenMP translation.
Model target in_reduction through the matching map entry, force address-preserving implicit mapping for Flang in_reduction list items, and emit the host-side task-reduction lookup with __kmpc_task_reduction_get_th_data. The runtime entry point takes and returns a generic, default-address-space pointer, so normalize a non-default-address-space captured pointer to the generic address space before the call and cast the returned private pointer back to the map block argument's address space, mirroring the in_reduction handling on omp.taskloop. Unsupported device/offload-entry and richer reduction forms remain diagnosed.
Add Flang lowering, MLIR verifier/translation, and LLVM IR tests for the supported host-fallback path, including a non-default-address-space case, and the remaining unsupported cases.
[LifetimeSafety] Model bit_cast and atomic casts in the fact generator (#204591)
VisitCastExpr dropped several borrow-carrying cast kinds into its
default case. Propagate the borrow through
`__builtin_bit_cast`/`std::bit_cast` of a pointer and through
wrapping/unwrapping `_Atomic(T*)`, so a stack address laundered through
either is caught (matching reinterpret_cast). hasOrigins and
buildListForType now see through AtomicType, which is transparent for
lifetimes.
Assisted-by: Claude Opus 4.8
Co-authored-by: Gabor Horvath <gaborh at apple.com>
[ELF][AArch64] Relax zero TLSLE add to nop (#204286)
Optimize AArch64 local-exec TLS relocation handling by replacing a
self-add R_AARCH64_TLSLE_ADD_TPREL_HI12 instruction with nop when the
high 12 bits are zero.
The optimization is disabled by --no-relax and is skipped for forms where
the rewrite would not be equivalent, such as non-self-adds and adds with
32-bit destination registers.
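The relaxation condition can be illustrated with a small Python model. This is only a sketch under stated assumptions, not lld's code: the function name `apply_tlsle_hi12` and its parameters are hypothetical, the input is assumed to already be an `add Rd, Rn, #imm, lsl #12` (ADD immediate) encoding, and the real implementation may check further conditions.

```python
NOP = 0xD503201F  # AArch64 NOP encoding


def apply_tlsle_hi12(insn: int, tprel: int, relax: bool = True) -> int:
    """Hypothetical model of patching R_AARCH64_TLSLE_ADD_TPREL_HI12.

    Assumes `insn` is an ADD (immediate) with `lsl #12`. Writes bits
    [23:12] of the thread-pointer offset into the imm12 field, or returns
    NOP when the add would not change any state.
    """
    hi12 = (tprel >> 12) & 0xFFF
    is_64bit = bool((insn >> 31) & 1)  # sf bit: 1 means X registers
    rd = insn & 0x1F                   # destination register field
    rn = (insn >> 5) & 0x1F            # source register field
    # A 64-bit self-add of zero changes nothing. A 32-bit add would zero
    # the upper half of the register, and a non-self-add is a register move.
    if relax and hi12 == 0 and is_64bit and rd == rn:
        return NOP
    # Otherwise clear imm12 (bits 21:10) and insert the high 12 bits.
    return (insn & ~(0xFFF << 10)) | (hi12 << 10)


ADD_X0_X0_LSL12 = 0x91400000  # add x0, x0, #0, lsl #12
patched = apply_tlsle_hi12(ADD_X0_X0_LSL12, tprel=0x10)
```

Passing `relax=False` models `--no-relax`: the instruction is kept and only its immediate field is written.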
[lit] Make RecursionError less likely in internal shell (#204573)
The lit internal shell chains together the contents of multiple RUN:
lines by connecting them with implicit && nodes, forming a binary tree
structure which is then executed recursively by `_executeShCommand`.
However, the tree is constructed so simply that it is effectively a
linked list, so `_executeShCommand` must recurse to a depth equal to
the number of commands.
If a test file contains more than 1000 RUN: lines (e.g. running the
clang driver only, with lots of different options), this causes a
RecursionError exception, which does not happen with the external shell.
Failures of this kind can be avoided by instead connecting the commands
together in a _balanced_ binary tree, which has equivalent behaviour,
since the && shell operator is associative.
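The idea can be sketched in a few lines of Python. This is a minimal illustration, not lit's actual code: the `Seq` class and the `build_balanced`, `execute`, and `depth` helpers are hypothetical stand-ins, and commands are modelled as opaque values passed to a caller-supplied `run` callback that returns an exit status.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class Seq:
    """Hypothetical '&&' node joining two subtrees."""
    lhs: Any
    rhs: Any


def build_balanced(cmds: Sequence[Any]) -> Any:
    """Join cmds with '&&' into a tree whose depth grows as log2(len(cmds))."""
    if not cmds:
        raise ValueError("need at least one command")
    if len(cmds) == 1:
        return cmds[0]
    mid = len(cmds) // 2
    # Splitting into contiguous halves keeps the left-to-right order.
    return Seq(build_balanced(cmds[:mid]), build_balanced(cmds[mid:]))


def execute(node: Any, run: Callable[[Any], int]) -> int:
    """Run the tree recursively; '&&' stops at the first non-zero status."""
    if isinstance(node, Seq):
        status = execute(node.lhs, run)
        if status != 0:
            return status
        return execute(node.rhs, run)
    return run(node)


def depth(node: Any) -> int:
    """Number of nested '&&' nodes on the longest path to a command."""
    if isinstance(node, Seq):
        return 1 + max(depth(node.lhs), depth(node.rhs))
    return 0


tree = build_balanced(list(range(5000)))
executed = []
status = execute(tree, lambda cmd: executed.append(cmd) or 0)
```

With 5000 commands the recursion depth stays far below Python's default recursion limit, while the commands still run in their original order and stop at the first failure, exactly as a left-to-right chain of `&&` would.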