[libc] Honor per-test timeout in lit test format (#193772)
The custom LibcTest format did not pass litConfig.maxIndividualTestTime
to executeCommand. This caused --timeout to be silently ignored, so
hanging tests like fdiv_test on AMDGPU blocked the entire suite until
the buildbot watchdog killed the process after 1200s.
Added timeout propagation and handling of ExecuteCommandTimeoutException
to return lit.Test.TIMEOUT. This follows the same pattern used by the
GoogleTest format in googletest.py.
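A minimal self-contained sketch of that pattern. The helpers below mock lit's `executeCommand`, `ExecuteCommandTimeoutException`, and `lit.Test.TIMEOUT` so the control flow can be seen in isolation; every name here is a stand-in, not the real lit API.

```python
TIMEOUT = "TIMEOUT"   # stands in for lit.Test.TIMEOUT
PASS = "PASS"         # stands in for lit.Test.PASS

class ExecuteCommandTimeoutException(Exception):
    """Mock of lit.util.ExecuteCommandTimeoutException."""

def execute_command(cmd, timeout=0):
    # Mock of lit.util.executeCommand: pretend a command named "hang"
    # exceeds any nonzero timeout.
    if timeout > 0 and cmd == "hang":
        raise ExecuteCommandTimeoutException()
    return "", "", 0

def execute_test(cmd, max_individual_test_time):
    # The fix in miniature: propagate the per-test timeout and map a
    # timeout exception to the TIMEOUT result code.
    try:
        out, err, rc = execute_command(cmd, timeout=max_individual_test_time)
    except ExecuteCommandTimeoutException:
        return TIMEOUT, ("Reached timeout of %d seconds"
                         % max_individual_test_time)
    return PASS, out
```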
[libc++] Implement `ranges::fold_left_first` and `ranges::fold_left_first_with_iter` (#180214)
- Part of #105208.
- Closes #174059.
- Closes #121558.
---------
Co-authored-by: JCGoran <jcgoran at protonmail.com>
Co-authored-by: A. Jiang <de34 at live.cn>
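For reference, the semantics of `ranges::fold_left_first` can be sketched in Python: fold the range with its first element as the initial accumulator, returning an empty result (`None` here, standing in for a disengaged `std::optional`) for an empty range. `fold_left_first_with_iter` additionally returns the end iterator alongside this value.

```python
def fold_left_first(iterable, op):
    # Take the first element as the initial accumulator; an empty range
    # yields None (the analogue of an empty std::optional).
    it = iter(iterable)
    try:
        acc = next(it)
    except StopIteration:
        return None
    # Left fold over the remaining elements.
    for x in it:
        acc = op(acc, x)
    return acc
```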
[DAGCombiner] Fold bswap of single-byte-known-nonzero value to a shift (#193473)
When computeKnownBits proves that a bswap operand has at most one byte
of possibly-nonzero bits at a known byte-aligned position, the bswap is
equivalent to a shift that moves that byte to the mirror position. This
is a producer-side known-bits rule; it fires in visitBSWAP regardless
of how the narrow-value provenance was established, covering shapes
such as
bswap(and X, 0xFF)
bswap(and X, 0xFF00) ; all byte positions
bswap(zext i8 X to iN)
bswap(zext i16 X to i64)
Motivation. While investigating a RISCV codegen regression under
-combiner-topological-sorting (bswap-shift.ll), I traced the root cause
to the existing consumer-side rule in
TargetLowering::SimplifyDemandedBits
for ISD::BSWAP: when a consumer demands only one byte of the bswap
result, that rule rewrites the inner bswap as a shift. Under topological
[33 lines not shown]
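A quick numeric illustration of the rule at 32 bits (helper names hypothetical): when only one byte-aligned byte of the operand can be nonzero, bswap is equivalent to shifting that byte to its mirror position.

```python
def bswap32(v):
    # Byte-reverse a 32-bit value.
    return ((v & 0xFF) << 24 | (v & 0xFF00) << 8 |
            (v >> 8) & 0xFF00 | (v >> 24) & 0xFF)

def byte_to_mirror_shift(v, byte_idx, width_bytes=4):
    # v has nonzero bits only in byte byte_idx; the mirror position is
    # width_bytes-1-byte_idx, so a single shift by the byte distance
    # reproduces the bswap.
    delta = (width_bytes - 1 - 2 * byte_idx) * 8
    return (v << delta) & 0xFFFFFFFF if delta >= 0 else v >> -delta
```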
[Hexagon] Add SafeStack runtime libcall to HexagonSystemLibrary (#191673)
Register DefaultSafeStackGlobals for the Hexagon target so that the
SafeStack pass can locate the thread-local unsafe stack pointer during
codegen.
Without this, compiling with `-fsanitize=safe-stack` for Hexagon fails
with "no location available for safestack pointer address".
[NFC][Clang][Analyses] Fix AccessPath to have deleted copy assignment (#193639)
Static analysis flagged AccessPath because it had a copy constructor but
did not declare a copy assignment. It appears the intent is not to allow
assignment, so declare it deleted.
[TySan] Fix size type mismatch in instrumentMemInst for 32-bit targets (#191601)
The outlined instrumentation path in instrumentMemInst passes Size
directly to the __tysan_instrument_mem_inst runtime call, which declares
its size parameter as uint64_t (i64). On 32-bit targets, Size is
IntptrTy (i32) for allocas and byval arguments, causing an assertion:
Calling a function with a bad signature!
Add CreateZExtOrTrunc to widen Size to U64Ty before the call. This is a
no-op on 64-bit targets where IntptrTy is already i64.
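A Python model (not the LLVM API) of what `CreateZExtOrTrunc` does to the size operand: zero-extend when the source type is narrower than i64, truncate when wider, and pass through unchanged when it is already i64.

```python
def zext_or_trunc(value, from_bits, to_bits=64):
    # value is assumed to fit in from_bits.
    value &= (1 << from_bits) - 1
    if from_bits < to_bits:
        return value                      # zero extension preserves the value
    return value & ((1 << to_bits) - 1)   # truncation keeps the low bits
```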
[CodeGenPrepare] Drop nuw on gep unmerging if the new index is negative (#193488)
Fixes #193487.
Drop nuw if unmerging would result in gep with a negative index.
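To see why the flag must go (Python illustration, helper name hypothetical): `nuw` on a gep asserts that adding the byte offset to the base address does not wrap in unsigned arithmetic, and a negative index is a huge unsigned offset that wraps whenever the base is at least the magnitude of the index.

```python
M = 1 << 64  # 64-bit pointer width

def gep_wraps_unsigned(base, offset):
    # True if base + offset wraps modulo 2^64, i.e. nuw would be violated.
    return base + (offset % M) >= M
```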
[AArch64][clang][llvm] Add ACLE Armv9.7 MMLA intrinsics
Implement new ACLE matrix multiply-accumulate intrinsics for Armv9.7:
```c
// 16-bit floating-point matrix multiply-accumulate.
// Only if __ARM_FEATURE_SVE_B16MM
// Variant also available for _f16 if (__ARM_FEATURE_SVE2p2 && __ARM_FEATURE_F16MM).
svbfloat16_t svmmla[_bf16](svbfloat16_t zda, svbfloat16_t zn, svbfloat16_t zm);
// Half-precision matrix multiply accumulating to single-precision
// instruction from Armv9.7-A. Requires the +f16f32mm architecture extension.
float32x4_t vmmlaq_f32_f16(float32x4_t r, float16x8_t a, float16x8_t b);
// Non-widening half-precision matrix multiply instruction. Requires the
// +f16mm architecture extension.
float16x8_t vmmlaq_f16_f16(float16x8_t r, float16x8_t a, float16x8_t b);
```
[OpenMP][OMPIRBuilder] Convert cmpxchg memory order to C ABI constants (#193536)
`EmitAtomicCompareExchangeLibcall` passed LLVM AtomicOrdering enum
values directly as the success/failure ordering arguments to
`__atomic_compare_exchange`. However, the C ABI expects the `__ATOMIC_*`
constants instead.
`EmitAtomicLoadLibcall` and `EmitAtomicStoreLibcall` already use
`toCABI()` for this conversion. Apply the same conversion in
`EmitAtomicCompareExchangeLibcall`.
This PR is a reland of #191857, which was closed unintentionally when its
parent branch was deleted.
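The mismatch is easy to see numerically. The values below follow `llvm::AtomicOrdering` and the GCC/Clang `__ATOMIC_*` macros, and `to_cabi` mirrors what `toCABI()` computes; passing the raw enum value through hands the libcall the wrong ordering.

```python
# llvm::AtomicOrdering enumerator values (note the gap at 3).
LLVM = {"NotAtomic": 0, "Unordered": 1, "Monotonic": 2, "Acquire": 4,
        "Release": 5, "AcquireRelease": 6, "SequentiallyConsistent": 7}
# C ABI __ATOMIC_* constants expected by __atomic_compare_exchange.
CABI = {"RELAXED": 0, "CONSUME": 1, "ACQUIRE": 2, "RELEASE": 3,
        "ACQ_REL": 4, "SEQ_CST": 5}

def to_cabi(ordering):
    # Mirrors toCABI(): orderings weaker than monotonic map to relaxed.
    return {"NotAtomic": CABI["RELAXED"], "Unordered": CABI["RELAXED"],
            "Monotonic": CABI["RELAXED"], "Acquire": CABI["ACQUIRE"],
            "Release": CABI["RELEASE"], "AcquireRelease": CABI["ACQ_REL"],
            "SequentiallyConsistent": CABI["SEQ_CST"]}[ordering]
```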
[DAGCombine] Relax restriction on (bswap shl(x,c)) combine (#193679)
We can still do the
(bswap shl(x,c)) -> (zext(bswap(trunc(shl(x,sub(c,bw/2))))))
combine if the shift amount is any multiple of 8, not just 16.
https://alive2.llvm.org/ce/z/crnSB6
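The relaxed identity can be spot-checked numerically for i64 (helper names hypothetical): for byte-multiple shift amounts c >= 32, the low half of shl(x, c) is all zeros, so the wide bswap collapses to a narrow one.

```python
M64, M32 = (1 << 64) - 1, (1 << 32) - 1

def bswap(v, nbytes):
    # Byte-reverse by round-tripping through little/big endian byte order.
    return int.from_bytes(v.to_bytes(nbytes, "little"), "big")

def lhs(x, c):
    # bswap(shl(x, c)) at i64.
    return bswap((x << c) & M64, 8)

def rhs(x, c):
    # zext(bswap(trunc(shl(x, c - 32)))); the zext to i64 is implicit
    # since the 32-bit result is already a nonnegative Python int.
    return bswap((x << (c - 32)) & M32, 4)
```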
[RISCV] Pass Subtarget to CC_RISCVAssign2XLen. NFC (#193609)
Previously we passed XLen and EABI, but still needed to get something
else from the Subtarget, so we looked it up from State. Instead, just pass
Subtarget and query all the information from it.
[libc][NFC] Fix typo in GPU test warning message (#193762)
The warning referred to CMAKE_CROSS_COMPILING_EMULATOR (with an extra
underscore) instead of CMAKE_CROSSCOMPILING_EMULATOR.
[lldb] Override UpdateBreakpointSites in ProcessGDBRemote to use MultiBreakpoint
This concludes the implementation of MultiBreakpoint by actually using
the new packet to batch breakpoint requests.
https://github.com/llvm/llvm-project/pull/192910
[X86] Use getTargetVShiftByConstNode helper to reduce code duplication. NFCI. (#193736)
Cleanup getTargetVShiftByConstNode and use it instead of repeatedly handling bitcasts/shift amount creation.
[lldb] Implement delayed breakpoints
This patch changes the Process class so that it delays *physically*
enabling/disabling breakpoints until the process is about to
resume/detach/be destroyed, potentially reducing the packets transmitted
by batching all breakpoints together.
Most classes only need to know whether a breakpoint is "logically"
enabled, as opposed to "physically" enabled (i.e. the remote server has
actually enabled the breakpoint). However, lower level classes like
derived Process classes, or StopInfo may actually need to know whether
the breakpoint was physically enabled. As such, this commit also adds a
"IsPhysicallyEnabled" API.
https://github.com/llvm/llvm-project/pull/192910
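An illustrative sketch of the logical/physical split described above (all class and method names are hypothetical, not the actual lldb API): `set_enabled` only records intent, and pending changes are flushed in a single batch when the process is about to resume.

```python
class Process:
    def __init__(self, send_batch):
        self.logical = {}        # site id -> desired (logical) state
        self.physical = {}       # site id -> state the server last applied
        self.send_batch = send_batch

    def set_enabled(self, site, enabled):
        # Logical change only; no packet is sent yet.
        self.logical[site] = enabled

    def is_physically_enabled(self, site):
        # Whether the remote server has actually enabled the site.
        return self.physical.get(site, False)

    def will_resume(self):
        # Flush every logical change the server has not seen, in one batch.
        pending = {s: e for s, e in self.logical.items()
                   if self.physical.get(s, False) != e}
        if pending:
            self.send_batch(pending)
            self.physical.update(pending)
```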
[mlir][vector] Generalize castAwayContractionLeadingOneDim (#187312)
This PR generalizes castAwayContractionLeadingOneDim to allow
accumulators with rank 1 to be matched.
With this generalization we allow the following contractions:
```
%c = vector.contract {
  indexing_maps = [
    affine_map<(d0, d1) -> (d0)>,
    affine_map<(d0, d1) -> (d1, d0)>,
    affine_map<(d0, d1) -> (d1)>],
  iterator_types = ["reduction", "parallel"],
  kind = #vector.kind<add>
} %0, %1, %2 : vector<64xf32>, vector<1x64xf32> into vector<1xf32>
```
to be matched and transformed to
[36 lines not shown]
[lldb][NFC] Move BreakpointSite::IsEnabled/SetEnabled into Process
The Process class is the one responsible for managing the state of a
BreakpointSite inside the process. As such, it should be the one
answering questions about the state of the site.
https://github.com/llvm/llvm-project/pull/192910
[lldb-server] Implement support for MultiBreakpoint packet
This is fairly straightforward, thanks to the helper functions created
in the previous commit.
https://github.com/llvm/llvm-project/pull/192910
[lldb-server][NFC] Factor out code handling breakpoint packets
This commit extracts the code handling breakpoint packets into a helper
function that can be used by a future implementation of the
MultiBreakpointPacket.
It is meant to be purely NFC.
There are two functions handling breakpoint packets (`handle_Z`
and `handle_z`) with a lot of repeated code. This commit does not attempt
to merge them, as that would make the diff much larger due to subtle
differences in the error messages they produce. The only deduplication
done is in the code processing a GDBStoppointType, where a helper struct
(`BreakpointKind`) and a function
(`std::optional<BreakpointKind> getBreakpointKind(GDBStoppointType stoppoint_type)`)
were created.
https://github.com/llvm/llvm-project/pull/192910