[X86] combineConcatVectorOps - add 512-bit PCMPEQ/PCMPGT handling (#202928)
If we can freely concatenate both operands, then it's worth replacing
the concatenated compare results with a single VPCMP+VPMOVM2 pair.
I noticed this while triaging #198162, and the AVX512DQ SGT test shows
another codegen issue involving a vpmovq2m+vpmovm2q pair :(
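Below is a scalar sketch of why the fold is sound. It assumes 64-bit lanes (PCMPGTQ/VPCMPGTQ/VPMOVM2Q), and every function name in it is hypothetical; it models the instructions' lane semantics rather than the DAG combine itself. Lane-wise compares commute with concatenation, so concatenating two 256-bit compare results gives the same lanes as one 512-bit compare into a mask register followed by expanding each mask bit back into an all-ones or zero lane. PCMPEQ behaves the same way with == in place of >.

```cpp
#include <array>
#include <cstdint>

using V4 = std::array<std::int64_t, 4>;  // models a 256-bit ymm register
using V8 = std::array<std::int64_t, 8>;  // models a 512-bit zmm register

// 256-bit PCMPGTQ: each lane becomes all-ones (-1) or zero.
V4 pcmpgt256(const V4 &a, const V4 &b) {
  V4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = a[i] > b[i] ? -1 : 0;
  return r;
}

// Place `lo` in lanes 0-3 and `hi` in lanes 4-7.
V8 concat(const V4 &lo, const V4 &hi) {
  V8 r{};
  for (int i = 0; i < 4; ++i) {
    r[i] = lo[i];
    r[i + 4] = hi[i];
  }
  return r;
}

// 512-bit VPCMPGTQ writing a k-mask: bit i is set when lane i compares true.
std::uint8_t vpcmpgt512(const V8 &a, const V8 &b) {
  std::uint8_t k = 0;
  for (int i = 0; i < 8; ++i)
    if (a[i] > b[i])
      k |= static_cast<std::uint8_t>(1u << i);
  return k;
}

// VPMOVM2Q: expand each mask bit into an all-ones or zero lane.
V8 vpmovm2q(std::uint8_t k) {
  V8 r{};
  for (int i = 0; i < 8; ++i)
    r[i] = ((k >> i) & 1) ? -1 : 0;
  return r;
}
```

For any inputs, vpmovm2q(vpcmpgt512(concat(a0, a1), concat(b0, b1))) equals concat(pcmpgt256(a0, b0), pcmpgt256(a1, b1)); the combine only pays off when the concat(a0, a1) and concat(b0, b1) operands are themselves free to form.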
Document the warn_unused attribute (#201881)
This attribute is useful for getting -Wunused-variable diagnostics for
variables of class types with a nontrivial constructor or destructor,
which the warning otherwise does not diagnose.
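A minimal sketch of the attribute in use, written with the GNU spelling that GCC and Clang accept. The type Label and both functions are hypothetical, chosen only for illustration. The std::string member makes Label's constructor and destructor nontrivial, which is the case where an unreferenced local is normally not reported.

```cpp
#include <cstddef>
#include <string>

// Hypothetical type: the std::string member makes its constructor and
// destructor nontrivial, so without the attribute an unreferenced local
// of this type would not trigger -Wunused-variable.
struct __attribute__((warn_unused)) Label {
  std::string text;
  explicit Label(const char *s) : text(s) {}
};

// Compiled with -Wunused-variable, GCC and Clang are expected to warn
// that `scratch` is unused.
int makeUnusedLabel() {
  Label scratch("scratch");
  return 0;
}

// A local that is actually referenced is not diagnosed.
std::size_t labelLength() {
  Label used("hello");
  return used.text.size();
}
```

Without warn_unused on Label, neither function would produce a diagnostic, because the compiler has to assume that constructing or destroying the object might be the reason it was declared (as with an RAII guard).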
[CIR][AMDGPU] Add support for AMDGCN div_fixup builtins (#197468)
Adds codegen for the following AMDGCN division fixup builtins:
- __builtin_amdgcn_div_fixup (double)
- __builtin_amdgcn_div_fixupf (float)
- __builtin_amdgcn_div_fixuph (half)
These are lowered to the corresponding `llvm.amdgcn.div.fixup` intrinsic.
[RFC][AMDGPU] Remove DebugCounter-based WaitCnt debugging
It's 8 years old, is only used by a handful of tests, and, as far as I
can see, has not been updated in a while except for maintenance.
I don't mind keeping it if it has users, but right now it
looks like a dead feature. If we want more elaborate waitcnt debugging,
we should have a modern, generic system that works on any waitcnt, not
something specific to three GFX9 counters.
[NFC][AMDGPU][InsertWaitCnts] Move some simple functions into Utils
Move some really trivial functions into Utils helpers to declutter InsertWaitCnt a bit more.
I had to move HardwareLimits into a different header, but it's only used in InsertWaitCnt, so that doesn't matter.
[VPlan] Simplify WidenGEP::execute (#193543)
WidenGEP::execute currently depends on whether a given operand is
defined outside loop regions, but loop-invariant operands are not
guaranteed to be hoisted out of the loop, and single-scalar operands
are not guaranteed to be maximally narrowed to single-scalars either.
Instead, use the vputils::isSingleScalar helper to analyze the
single-scalar status of each operand and of the result. This simplifies
execute and also leads to some improvements.
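The decision described above can be pictured with a toy model. This is not VPlan's code: GEPOperand and describeWidenGEP are hypothetical, and the singleScalar flag merely stands in for what vputils::isSingleScalar would report for each operand. The model relies on one documented IR fact: an LLVM vector getelementptr may mix scalar and vector operands, with scalars effectively broadcast.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a GEP operand; `singleScalar` plays the role
// of the per-operand result of vputils::isSingleScalar in VPlan.
struct GEPOperand {
  std::string name;
  bool singleScalar;
};

// Describes how a widened GEP would be emitted under this toy model:
// - if the result is single-scalar, one scalar GEP (assumed here to be
//   built from lane 0 of each operand);
// - otherwise a vector GEP in which single-scalar operands stay scalar
//   and the remaining operands are used in their vector form.
std::string describeWidenGEP(const std::vector<GEPOperand> &ops,
                             bool resultSingleScalar) {
  if (resultSingleScalar)
    return "scalar GEP";
  std::string plan = "vector GEP:";
  for (const GEPOperand &op : ops)
    plan += (op.singleScalar ? " scalar(" : " vector(") + op.name + ")";
  return plan;
}
```

The point of the sketch is that the choice depends only on the single-scalar analysis of the operands and the result, not on where an operand happens to be defined, which is the simplification the message describes.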
[AMDGPU][NFC] Templatise and roundtrip gfx11_asm_vop3_dpp16.s (#202721)
I tried to make sure this covers all important cases from the asm/disasm
tests here upstream and on the downstream true16 branch.
This will resolve ~4k lines of differences vs the true16 branch.
[NFC][lldb][Windows] Clean up TargetThreadWindows (#202722)
- Drop dead `//#include "ForwardDecl.h"` and stale `class HostThread;`
forward declaration.
- Remove redundant `m_thread_reg_ctx_sp()` default-init in the
constructor initializer list.
[NFC][lldb][Windows] Clean up NativeThreadWindows (#202723)
- Drop unused `#include`s of `lldb/Target/Process.h` and `lldb/lldb-forward.h`.
- Inline the one-shot `NativeProcessProtocol&` local in `DoResume`, and
replace the legacy `log->Printf` idiom in `GetStopReason` with `LLDB_LOGF`.
[NFC][lldb][Windows] Clean up DebuggerThread (#202719)
- Fix typos in an llvm_unreachable string and a local variable name.
- Replace a C-style downcast to HostProcessWindows with static_cast.
- Drop redundant braces around a single-statement if and add the
namespace-closer comment in the header.