[llvm][formatters] Add LLDB formatter for llvm::PointerUnion (#175218)
We make use of the fact that the `PointerUnion` element is a
`PointerIntPair`, for which we have a synthetic provider already. We get
the `Int` portion of the pair (which is the index into the template
parameter pack of the union) to get the active type and the `Pointer`
portion of the pair to get the actual pointer value.
Before:
```
(lldb) (lldb) v -T z_float
(llvm::PointerUnion<Z *, float *>) z_float = {
(llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<Z *, float *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *> >, 0, Z *, float *>) llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<Z *, float *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *>, llvm::PointerIntPairInfo<void *, 1, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *> > >, 0, Z *, float *> = {
(llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<Z *, float *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *> >, 1, float *>) llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<Z *, float *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *>, llvm::PointerIntPairInfo<void *, 1, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *> > >, 1, float *> = {
(llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<Z *, float *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *> >, 2>) llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<Z *, float *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *>, llvm::PointerIntPairInfo<void *, 1, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *> > >, 2> = {
(llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *> >) Val = {...}
}
}
}
[8 lines not shown]
[X86] InstCombine: Generalize scalar SSE MAX/MIN intrinsics (#175375)
Fixes #175162
This patch handles x86_sse_max_ss/min_ss and related intrinsics. It
check if is known to be safe to convert them to llvm.maxnum/minnum.
These intrinsics can be converted to `@llvm.maxnum` and `@llvm.minnum`.
This optimization can be done if the inputs are free of: NaN, Inf,
Subnormal, and NegZero. If it is not sure to be free of these, the
instructions remain the same.
[SCEV] Handle all PtrtoIntExpr construction in CastSinkingRewriter (NFC) (#174435)
Move SCEVPtrToIntSinkingRewriter out of getLosslessPtrToIntExpr to be
re-used for PtrToAddr. Also streamline code in getLosslessPtrToIntExpr
by moving zero handling to the rewriter and removing special handling
for SCEVUnknown in getLosslessPtrToIntExpr. Instead, always use the
rewriter, which will automatically handle the case where the expression
is a SCEVUnknown.
This makes it slightly easier to add support for PtrToAddr as follow-up
to https://github.com/llvm/llvm-project/pull/158032
PR: https://github.com/llvm/llvm-project/pull/174435
[RISCV] Fix ReplaceNodeResults of Intrinsic::experimental_cttz_elts for RV32 (#174992)
The test case added in this patch crashes on rv32v without this fix. We
attempt to trunc the i32 type of the select produced by lowerCttzElts to
i64, which asserts. Use getZExtOrTrunc instead.
[VPlan] Remove verifier check that EVL can only be used by VPInstruction with one use (#175502)
Fixes #175028
We have a VPlanVerifier assertion that a VPInstruction that uses EVL
only has one use. This used to hold until we implemented CSE, but now we
can run into the case where e.g. a multiply from an expanded
VPWidenPointerInductionRecipe gets cse'd, causing it to have multiple
uses:
EMIT ir<%0> = WIDEN-POINTER-INDUCTION ir<%.pre3>, ir<6>, vp<%5>
EMIT ir<%1> = WIDEN-POINTER-INDUCTION ir<%.pre>, ir<6>, vp<%5>
EMIT-SCALAR vp<%5> = EXPLICIT-VECTOR-LENGTH vp<%avl>
-->
EMIT-SCALAR vp<%10> = EXPLICIT-VECTOR-LENGTH vp<%avl>
EMIT vp<%11> = mul ir<6>, vp<%10>
EMIT vp<%ptr.ind> = ptradd vp<%pointer.phi>, vp<%11>
[13 lines not shown]
[X86][Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - Allow SSE/AVX FP MAX/MIN intrinsics to be used in constexpr (#171966)
* Implemented a generic function interp__builtin_elementwise_fp_binop
* NaN, Infinity, Denormal cases can be integrated into the lambda in
future. For, now these cases are hardcoded in the generic function
Resolves: #169991
[SPIR-V] Do not allow AS(2) to convert to generic (#175275)
Summary:
The original logic permitted this, while it's not permitted by the
standard.
---------
Co-authored-by: Dmitry Sidorov <18708689+MrSidims at users.noreply.github.com>
[AMDGPU] Fix crash in SIInsertWaitcnts debug output (#175518)
In some cases we were accessing `OldWaitcntInstr.getParent()->end()`
after `OldWaitcntInstr` had already been erased from its parent.
[libc++] Make basic_string::__erase_external_with_move noexcept (#171591)
`__erase_external_with_move` is in the dylib, so the compiler doesn't
see the definition. Marking it `noexcept` sometimes allows clang to
remove exceptions related code, improving code size slightly.
[PS5][Driver] forward -ffat-lto-objects to the linker (#172854)
When clang is driving the linker and is passed -ffat-lto-objects, pass
it on to the linker as --fat-lto-objects.
[libc] Improve SIMT control flow in the GPU allocator
Summary:
The Volta independent thread scheduling is very difficult to work with.
This is a first attempt to make the logic more sound when lanes execute
independently. This isn't all that's required, but it ends up improving
control flow for AMDGPU as well.
[flang] Enhance show_descriptor intrinsic to avoid extra descriptor copies (#173461)
Originally, the argument to show_descriptor() intrinsic was declared
with the passing mechanism of "asBox". This resulted in `fir.load`
instruction to be emitted to pass descriptor "asBox", which resulted in
extra llvm.memcpy in LLVM IR. The current change eliminates this, so
that show_descriptor() prints information about the original descriptor,
not about its copy.
The current change modifies the passing mechanism of the argument to
show_intrinsic() to "asInquired". The lowering of show_descriptor() now
passes the reference to a descriptor directly to the runtime routine. If
descriptor is passed as a value in SSA register, then it's spilled on
the stack and its address is passed to the runtime routine. If a
non-descriptor value is passed to show_descriptor(), then this value is
spilled to the stack, wrapped into a descriptor that is also spilled to
the stack, and the resulting descriptor pointer is passed to
show_descriptor().
show_descriptor() LIT test was modified to correspond to the new
implementation and additional test cases were added to it.
[PowerPC] using milicode call for strcpy instead of lib call (#174782)
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __strcpy routine is a millicode implementation;
we use millicode for the strcpy function instead of a library call to
improve performance.
---------
Co-authored-by: Matt Arsenault <arsenm2 at gmail.com>
[mlir][Shape] Fix Yoda condition in OutlineShapeComputation (#174146)
Change `nullptr != inpDefOp` to `inpDefOp != nullptr` for better
readability and consistency with LLVM coding standards.
[include-cleaner] Report refs from macro-concat'd tokens as ambigious (#175532)
Previously we completely ignored these references as we couldn't detect
whether some pieces of concat'd token originated from main file and we
wanted to prevent false positives. Unfortunately these are resulting in
false negatives in certain cases and are breaking builds.
After this change, include-cleaner will treat such references as
ambigious to prevent deletion of likely-used headers (if they're already
directly included), while still giving user the opportunity to
explicitly delete them.
[Clang] prevent an assertion failure caused by C++ constant expression checks in C23 floating conversions (#174113)
Fixes #173847
---
This patch addresses an assertion failure during compilation of C23 code
involving floating-point conversions.
As part of the C23 constexpr support introduced in PR #73099, Clang
began reusing parts of the C++ constant evaluation and narrowing logic.
In C23 mode, a failed constant evaluation caused the condition to
proceed to C++ constant-expression checks, resulting in an assertion
failure.
This change evaluates constants using `EvaluateAsRValue` in C23 mode and
restricts C++ constant-expression checks to C++ mode.
[InlineSpiller][AMDGPU] Implement subreg reload during RA spill
Currently, when a virtual register is partially used, the
entire tuple is restored from the spilled location, even if
only a subset of its sub-registers is needed. This patch
introduces support for partial reloads by analyzing actual
register usage and restoring only the required sub-registers.
This improvement enhances register allocation efficiency,
particularly for cases involving tuple virtual registers.
For AMDGPU, this change brings considerable improvements
in workloads that involve matrix operations, large vectors,
and complex control flows.
[AMDGPU] Test precommit for subreg reload
This test currently fails due to insufficient
registers during allocation. Once the subreg
reload is implemented, it will begin to pass
as the partial reload help mitigate register
pressure.