[libc][math] Improve hypotf performance. (#186627)
Update the check for when a more careful rounding is needed, and remove
the redundant clear exception step.
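For context, here is a minimal sketch of one common way to compute a single-precision hypot through double precision. It is not the code from this patch: the name hypotf_sketch is hypothetical, special cases (inf/NaN) are not handled, and the "more careful rounding" check mentioned above is deliberately omitted.

```cpp
#include <cmath>

// Hypothetical illustration only; NOT the LLVM libc implementation.
// Assumes IEEE-754 binary32 float and binary64 double.
inline float hypotf_sketch(float x, float y) {
  // A float has a 24-bit significand, so x*x and y*y are exact in double
  // (48 bits fit in the 53-bit double significand). Only the addition and
  // the square root can round.
  double xd = static_cast<double>(x);
  double yd = static_cast<double>(y);
  double sum_sq = xd * xd + yd * yd;
  // The final conversion rounds a second time. A production implementation
  // needs an extra check for results near a rounding boundary; this sketch
  // skips it.
  return static_cast<float>(std::sqrt(sum_sq));
}
```

For example, hypotf_sketch(3.0f, 4.0f) yields 5.0f, since every intermediate value is exactly representable.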
[CIR] Add support for arrays-of-pointer-to-member-data (#186887)
This patch adds support for arrays of pointer-to-member-data, just like
we do for pointer-to-member-function. It also refactors some basic value
lowering, which makes this apply to both locals and constants and
unifies the two in preparation for future work on record types.
Other than the not-quite-intentional change (the recursion picked up
this feature, and I realized it worked while looking at other
things!), this is NFCI.
[CIR][NFC] Unify the 'null data member attr' getters (#186876)
In preparation for actually lowering data members as fields to a record
type, this patch does a minor refactor to make their single current use
have a slightly simpler interface. This will prevent us from having to
copy/paste this later.
Also, this patch removes a pair of now-orphaned builders, instead
preferring to use the ones that come from the parent builder type.
[CIR][NFC] Split the CXXABI 'TypeConverter' into its own type. (#186874)
This is currently an NFC change, as the CXXABITypeConverter has no
members yet. This patch splits it off into its own type, as it is going
to need to have members when we start transforming record types, but
doesn't implement that part yet (coming in future patches).
[CIR] Fix bug where block after-unreachable wasn't CXXABILowered (#186869)
If a TU has an 'unreachable' block, it wouldn't be CXXABILower'ed, which
would cause a legalization failure. This patch applies the same solution we
use in LowerToLLVM, which is to make sure we transform those sections
separately.
libclc: Improve float trig function handling (#187264)
Most of this was originally ported from the ROCm device libs in
c0ab2f81e3ab5c7a4c2e0b812a873c3a7f9dca8b, so merge in more recent
changes.
libclc: Clean up sincos macro usage (#187260)
Handle this more like fract, and implement other
address spaces on top of the private overload with
a temporary variable.
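The temporary-variable pattern described above can be sketched in plain C++. This is only an illustration under stated assumptions: the names sincos_private and sincos_via_temp are hypothetical, and a template pointer parameter stands in for OpenCL address-space-qualified pointers, which plain C++ does not have.

```cpp
#include <cmath>

// Hypothetical core overload: writes the cosine through an ordinary
// ("private") pointer and returns the sine.
inline float sincos_private(float x, float *cosval) {
  *cosval = std::cos(x);
  return std::sin(x);
}

// Hypothetical wrapper for any other pointer-like destination. It calls the
// private overload on a local temporary, then copies the result out, so the
// math is implemented only once.
template <typename Ptr>
inline float sincos_via_temp(float x, Ptr out) {
  float tmp;                          // private temporary
  float s = sincos_private(x, &tmp);  // core computation
  *out = tmp;                         // store to the caller's destination
  return s;
}
```

A caller would use it as `float c; float s = sincos_via_temp(x, &c);`.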
graphics/kquickimageeditor: Update to 0.6.1
Release notes:
KQuickImageEditor 0.6.1 is a bug fix release. This release includes fixes
for memory leaks in the old editing system, fixes the dragger for the new
editing system's text tool, enables LSAN in CI, changes CMake include paths
and has a fix for an undefined aspectRatio error in the
new editing system's crop tool.
sysutils/plasma-pass: Update to 1.3.1
Changes since 1.3.0:
* Do not leak pages into the stackview (fixes #515036)
* Consistently install an appstream file again
* Updated translations
libclc: Use select function instead of ?: for some fp selects (#187253)
It seems that ?: is not quite equivalent to select for floating-point
vectors. With ?:, the resulting IR involves integer bitcasts and
integer vector typed select. Use select so this is an fp-select. This
enables finite-math-only contexts to optimize out the select.
This feels like it's a clang bug though.
[flang][OpenMP] Use OmpDirectiveSpecification for range/depth queries, NFC (#187109)
That makes them usable for a potential future implementation of APPLY.
[LoongArch] Mark VREPLGR2VR/XVREPLGR2VR as re-materializable
The VREPLGR2VR and XVREPLGR2VR instruction families replicate a
scalar general-purpose register value into all elements of a vector
register. These instructions are side-effect free and relatively
cheap, with their result depending only on the input register.
Mark them as isReMaterializable to allow the register allocator to
recompute the value when profitable instead of spilling and reloading
it from memory.
This can help reduce register pressure and avoid unnecessary memory
traffic in vectorized code.
AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3
Codegen for v_dual_dot2acc_f32_f16/bf16 for targets that only have the VOP3
version of the instruction.
Since there is no VOP2 version, introduce a temporary MIR DOT2ACC pseudo
that is selected when there are no src_modifiers. This DOT2ACC pseudo
has src2 tied to dst (like the VOP2 version); PostRA pseudo expansion will
restore the pseudo to the VOP3 version of the instruction.
CreateVOPD will recognize such a VOP3 pseudo and generate v_dual_dot2acc.