[OpenACC][flang] Emit NYI when unstructured loops are associated with OpenACC directives
When an unstructured loop is associated with a loop or a combined
directive, we emit an unstructured CFG for the loop's logic nested
within the OpenACC op. This effectively serializes the nested loop on
the device, which is not desirable. For now, emit NYIs while working on
a longer-term solution.
AMDGPU: Reland: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3
For V_DOT2_F32_F16 and V_DOT2_F32_BF16 add their VOPDName and mark
them with usesCustomInserter which will be used to add pre-RA register
allocation hints to preferably assign dst and src2 to the same physical
register. When the hint is satisfied, canMapVOP3PToVOPD recognises the
instruction as eligible for VOPD pairing by checking if it is VOP2-like:
dst==src2, no source modifiers, no clamp, and src1 is a register.
Mark both instructions as commutable to allow a literal in src1 to be
moved to src0, since VOPD only permits a literal in src0.
The original patch had a bug: it did not check whether physical src
registers match the register class of the corresponding operand in fullVOPD
instructions. That check is now done via isValidVOPDSrc.
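The VOP2-like condition listed above can be sketched as a standalone predicate. This is only an illustration: `Dot2Inst` and `isVOP2Like` are hypothetical names, and the real canMapVOP3PToVOPD works on MachineInstr operands rather than a plain struct.

```cpp
// Hypothetical, simplified view of a VOP3P dot2 instruction after register
// allocation. Field names are assumptions made for this sketch.
struct Dot2Inst {
  unsigned Dst;          // physical register assigned to dst
  unsigned Src2;         // physical register assigned to src2
  bool Src1IsReg;        // true if src1 is a register (not a literal/inline)
  unsigned SrcModifiers; // nonzero if any source modifier is set
  bool Clamp;            // true if the clamp bit is set
};

// Returns true when the instruction has the VOP2-like shape described in the
// commit message: dst==src2, no source modifiers, no clamp, src1 a register.
bool isVOP2Like(const Dot2Inst &I) {
  return I.Dst == I.Src2 && I.SrcModifiers == 0 && !I.Clamp && I.Src1IsReg;
}
```

The dst==src2 term is the part the register allocation hint tries to make true; the other terms depend only on how the instruction was selected.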
AMDGPU: Validate VOPD/VOPD3 physical source registers against operand RC
Replace isVGPR checks with isValidVOPDSrc that validates physical source
registers against the actual combined VOPD/VOPD3 instruction's operand
register classes. Now we also validate operands for VOPD instructions.
[ARM][LLD] Fix buildbot failure due to unused variable [NFC] (#202925)
The variable was only used in an assert; the code has been altered to
also use it in a non-assert context.
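A minimal sketch of the general pattern, not the actual LLD change: `countNonZero` and `checkedCount` are hypothetical. A variable read only inside assert() becomes unused when NDEBUG is defined, so reading it on a normal code path keeps release builds warning-free.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for this sketch.
std::size_t countNonZero(const std::vector<int> &Values) {
  std::size_t Count = 0;
  for (int V : Values)
    if (V != 0)
      ++Count;
  return Count;
}

// If N were read only by the assert, it would be unused once NDEBUG removes
// the assert. Returning N gives it a use in every build configuration.
std::size_t checkedCount(const std::vector<int> &Values) {
  std::size_t N = countNonZero(Values);
  assert(N <= Values.size() && "count cannot exceed size");
  return N;
}
```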
kern_fork: guard against NULL newproc on the failure path
Reported and tested by: pho
Fixes: 85a65e393092 ("proc: add tree ref count")
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
[X86] combineConcatVectorOps - add 512-bit PCMPEQ/PCMPGT handling (#202928)
If we can freely concatenate both operands, then it's worth replacing
with a VPCMP+VPMOVM2 pair.
Managed to notice this while triaging #198162 - and the AVX512DQ SGT
test shows another vpmovq2m+vpmovm2q pair codegen issue :(
Document the warn_unused attribute (#201881)
Basically, this attribute is useful for getting -Wunused-variable
diagnostics from class types with a nontrivial constructor or
destructor.
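A small sketch of how the attribute might be applied, using the `[[gnu::warn_unused]]` spelling. The `Label` class and `labelLength` function are hypothetical and not taken from the documentation being added.

```cpp
#include <string>
#include <utility>

// Hypothetical class. Its user-provided constructor and destructor would
// normally suppress unused-variable diagnostics for variables of this type;
// the attribute opts the type back in.
struct [[gnu::warn_unused]] Label {
  explicit Label(std::string Text) : Text(std::move(Text)) {}
  ~Label() {}
  std::string Text;
};

int labelLength() {
  // Label Unused("never read"); // with the attribute, a declaration like
  //                             // this is expected to draw -Wunused-variable
  Label Used("abc");
  return static_cast<int>(Used.Text.size());
}
```

The unused declaration is left commented out so the sketch builds cleanly under -Werror.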
[CIR][AMDGPU] Add support for AMDGCN div_fixup builtins (#197468)
Adds codegen for the following AMDGCN division fixup builtins:
- __builtin_amdgcn_div_fixup (double)
- __builtin_amdgcn_div_fixupf (float)
- __builtin_amdgcn_div_fixuph (half)
These are lowered to the corresponding `llvm.amdgcn.div.fixup` intrinsic.
[RFC][AMDGPU] Remove DebugCounter-based WaitCnt debugging
It's 8 years old, only used by a handful of tests, and has not been updated
in a while except for maintenance as far as I can see.
I don't mind keeping it in if there are users of it, but right now it
looks like a dead feature. If we want some more elaborate waitcnt debugging,
we should have a modern, generic system that works on any waitcnt, not
something specific to 3 GFX9 counters.
[NFC][AMDGPU][InsertWaitCnts] Move some simple functions into Utils
Move really trivial functions into helpers to declutter InsertWaitCnt a bit more.
I had to move HardwareLimits into a different header but it's only used in InsertWaitCnt so it doesn't matter.