[AMDGPU] Multi dword spilling for unaligned tuples
While spilling unaligned tuples, rather than breaking the
spill into 32-bit accesses, spill the first register as a
single 32-bit spill, and spill the remainder of the tuple
as an aligned tuple.
Some additional bookkeeping is required in the spilling
loop to manage the state.
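
The splitting rule described above can be sketched as follows. This is only an illustration, not the code from the patch: it assumes that "aligned" means the tuple starts on an even register index, and the function name and the (start, dword count) chunk representation are hypothetical.

```python
def split_unaligned_spill(first_reg, num_regs):
    """Return (start_register, dword_count) chunks for spilling a tuple.

    Assumption (not from the commit): a tuple is aligned when its first
    register index is even.
    """
    if num_regs <= 0:
        raise ValueError("a tuple needs at least one register")
    if first_reg % 2 == 0 or num_regs == 1:
        # Already aligned (or a single dword): spill in one access.
        return [(first_reg, num_regs)]
    # Unaligned: peel off the first register as a 32-bit spill; the
    # remainder then starts on an even index and is spilled as one tuple.
    return [(first_reg, 1), (first_reg + 1, num_regs - 1)]


print(split_unaligned_spill(3, 4))
print(split_unaligned_spill(4, 4))
```

The extra bookkeeping mentioned in the commit corresponds to tracking where the second chunk starts once the first dword has been handled.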
[clang][bytecode][NFC] Refactor visitDeclRef() (#183690)
Move the `!VD` case up so we can assume `VD` to be non-null earlier and
use a local variable instead of calling `D->getType()` several times.
[LV] NFCI: Move extend optimization to transformToPartialReduction. (#182860)
The reason for doing this in `transformToPartialReduction` is so that we
can create the VPExpressions directly when transforming reductions into
partial reductions (to be done in a follow-up PR).
I also intend to see if we can merge the in-loop reductions with partial
reductions, so that there will be no need for the separate
`convertToAbstractRecipes` VPlan Transform pass.
system: create a backup on factory reset
This way we can see all the migrations taking place instead of hiding
them by squashing them into the first backup. This is very important
for image testing as 26.1 taught us.
(cherry picked from commit da4fbf7ee942d6ce0e7b0a475b4d20577dd37183)
[AMX][NFC] Match pseudo name with isa (#182235)
Adds the missing suffix to make the intended ISA clear.
We switched from `TILEMOVROWrre` to `TILEMOVROWrte` in
https://github.com/llvm/llvm-project/pull/168193, but the pseudo kept
its old name. This patch renames `PTILEMOVROWrre` to `PTILEMOVROWrte`
so that the pseudo matches the intended ISA version, even though the
pseudo does not actually have any tile register.
---------
Co-authored-by: mattarde <mattarde at intel.com>
[Clang][NFCI] Make program state GDM key const pointer (#183477)
This commit makes the GDM key in ProgramState a constant pointer. This
is done to better reflect the intention of the key as a unique
identifier for the data stored in the GDM, and to prevent the use of the
storage pointed to by the key as global state.
Signed-off-by: Steffen Holst Larsen <sholstla at amd.com>
dpaa2: improve error messages and log requested cluster size
If m_getjcl() fails, we want to know the size we requested so that we
have a better chance of diagnosing the problem.
MFC after: 3 days
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D55555
Lower strictfp vector rounding operations similar to default mode
Previously the strictfp rounding nodes were lowered by unrolling to
scalar operations, which hurts performance. This issue was partially
fixed in #180480; this change continues that work and implements
optimized lowering for v4f16 and v8f16.
[VectorCombine][X86] Ensure we recognise free sign extends of vector comparison results (#183575)
Unless we're working with AVX512 mask predicate types, sign extending a
vXi1 comparison result back to the width of the comparison source types
is free.
VectorCombine::foldShuffleOfCastops - pass the original CastInst in the
getCastInstrCost calls to track the source comparison instruction.
Fixes #165813
xen/acpi: implement hook to notify Xen about entering sleep state
This is required so that ACPI power-off (entering S5) works as expected, as
the ACPI PM1a and PM1b blocks might not be accessible by dom0 directly.
Additionally, Xen also needs to do cleanup before entering a sleep state,
so it needs to be notified about it.
With this patch FreeBSD dom0 now powers off the host correctly:
acpi0: Powering system off...
(XEN) [ 85.686598] arch/x86/hvm/emulate.c:415:d0v0 fixup p2m mapping for page fedc6 added
(XEN) [ 85.687606] arch/x86/hvm/emulate.c:415:d0v0 fixup p2m mapping for page fbc10 added
(XEN) [ 85.692357] Preparing system for ACPI S5 state.
(XEN) [ 85.692702] Disabling non-boot CPUs ...
(XEN) [ 85.694471] Broke affinity for IRQ9, new: {0-7}
[...]
(XEN) [ 85.903118] Entering ACPI S5 state.
Should be a non-functional change when not running as a Xen dom0.
[5 lines not shown]
AMDGPU: Skip last corrections in afn f64 reciprocal
Device libs has a fast reciprocal macro that is close
to the fast division expansion, but skips the last terms
compared to the full division.
The basic reciprocal handling produces output identical to this
macro. The negative reciprocal case has different fneg placement
and smaller code size, but I believe the result should be the same.
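
To illustrate the idea of skipping the final correction, here is a generic Newton-Raphson sketch. It is not the device-libs macro or the actual AMDGPU expansion: the function names, the number of refinement steps, and the use of a float32-rounded value as a stand-in for a hardware reciprocal estimate are all assumptions made for the example.

```python
import struct


def approx_rcp(d):
    # Stand-in for a hardware reciprocal estimate (assumption): round 1/d
    # to float32 precision. Only valid while 1/d fits in float32 range.
    return struct.unpack("f", struct.pack("f", 1.0 / d))[0]


def refined_rcp(d, steps=2):
    # Newton-Raphson refinement: y <- y + y * (1 - d * y).
    y = approx_rcp(d)
    for _ in range(steps):
        e = 1.0 - d * y  # residual; this would be an FMA on hardware
        y = y + y * e
    return y


def fast_div(n, d):
    # Division: refine the reciprocal, multiply, then apply one last
    # correction based on the remainder n - d * q.
    y = refined_rcp(d)
    q = n * y
    r = n - d * q
    return q + r * y


def fast_rcp(d):
    # Reciprocal: same refinement, but the last correction is skipped.
    return refined_rcp(d)


print(fast_rcp(3.0), fast_div(1.0, 3.0))
```

In this sketch the skipped correction changes the result by at most a rounding error, which is the kind of trade-off an approximate-function (afn) flag permits.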