[ScheduleDAG] Add a reachability cache to amortize DFS calls (#195079)
ScheduleDAGTopologicalSort::IsReachable falls back to a DFS on its
slow path. For some connectivity patterns this can result in ~quadratic
behavior.
Add a cache of {A, B} -> Reachable(A, B). This is invalidated whenever
AddPred or InitDAGTopologicalSorting is called.
For an antagonistic testcase, SelectionDAG time went from 1300s to 250s.
No testcase as no functional change, performance only.
---------
Co-authored-by: James Molloy <jmolloy at google.com>
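The caching scheme above can be sketched as follows. This is a minimal illustration, not the LLVM implementation: the class and member names are made up, and the real code caches {A, B} -> Reachable(A, B) keyed on SUnits, invalidating on AddPred/InitDAGTopologicalSorting.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Sketch: memoize DFS-based reachability queries and drop the cache
// whenever an edge is added, since a new edge may create new paths.
class ReachabilityCache {
  std::vector<std::vector<int>> Succs;
  mutable std::map<std::pair<int, int>, bool> Cache;

  bool dfs(int From, int To, std::set<int> &Seen) const {
    if (From == To)
      return true;
    if (!Seen.insert(From).second)
      return false; // already visited
    for (int S : Succs[From])
      if (dfs(S, To, Seen))
        return true;
    return false;
  }

public:
  explicit ReachabilityCache(int N) : Succs(N) {}

  void addPred(int From, int To) {
    Succs[From].push_back(To);
    Cache.clear(); // invalidate: the new edge may change reachability
  }

  bool isReachable(int From, int To) const {
    auto It = Cache.find({From, To});
    if (It != Cache.end())
      return It->second; // cache hit: skip the DFS entirely
    std::set<int> Seen;
    bool R = dfs(From, To, Seen);
    Cache[{From, To}] = R;
    return R;
  }
};
```

Repeated queries on the same pair amortize to a map lookup, which is what turns the ~quadratic DFS pattern back into something tractable.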
AMDGPU/GlobalISel: Switch to extended LLTs
The switch is required to be able to translate bfloat.
After the switch, most of the codegen patterns now require an explicit
type on the register to match, instead of LLT::scalar.
We can still use LLT::scalar for type checks, but new instructions
created during lowerings/combines need to use the proper extended LLT.
Instruction-select test sources are fully switched to i32/f32 so patterns
can match; tests for the legalizer and regbanklegalize are left as is
(they should probably be switched as well).
New functionality worth noting is the lowering of f16 bitcasts via i32:
  f16 = g_bitcast i16
->
  i32 = g_anyext i16
  f16 = g_trunc i32
where f16 = g_trunc i32 is legal.
[BasicAA] Don't look through llvm.ptrmask in GEP decomposition (#197082)
DecomposeGEPExpression() looked through llvm.ptrmask via
getArgumentAliasingToReturnedPointer(Call, MustPreserveNullness=false).
ptrmask preserves the underlying object but can change the byte address
by clearing low bits, so treating its result as having the same symbolic
offset as its argument produces stale offsets and bogus NoAlias answers.
The bug was introduced by 3f2850bc606c847075673554fe49d4a35f525b61.
Rename MustPreserveNullness to MustPreserveOffset, the property
DecomposeGEPExpression actually needs. Offset preservation is strictly
stronger than nullness preservation, so existing callers remain correct
and the accepted intrinsic set is unchanged (ptrmask stays excluded).
Switch DecomposeGEPExpression to pass MustPreserveOffset=true. Every
call site is now tagged with an explicit MustPreserveOffset= argument.
[DAG] visitBITCAST - fold (conv (scalar_to_vector(load x))) -> (load (conv*)x) (#196978)
Legalization can leave superfluous scalar_to_vector nodes whose
scalar bitwidth matches the vector bitwidth - peek through these when
attempting bitcast folds.
Only one match in trunk at the moment, but there are some additional
folds encountered in #149798
[LLDB] Simplify the API of ClangUserExpression::ScanContext [NFC] (#197037)
- this function is virtual, but it is only called by the leaf
class ClangUserExpression
- it also returns a Status only to then report any error as a warning
This patch devirtualizes the function, since there is no use-case for
overriding it in other expression evaluator plugins, and it cleans up
the Status usage by passing in a DiagnosticManager directly, like its
sibling functions do.
[DAG] canCreateUndefOrPoison - fmaxnum/fminnum/fmaximum/fminimum/fmaximumnum/fminimumnum don't create poison (#197195)
Test coverage is proving tricky due to a lack of folds that work with these - I'm open to suggestions if we don't want to just eyeball this.
[clangd] Avoid crash on pseudo-destructor selection (#195939)
clangd crashes during textDocument/codeAction on valid pseudo-destructor
expressions like y->~decltype(A())(). The bug is in
Selection.cpp::earlySourceRange(), which assumes destructor names always
have NamedTypeInfo. The fix is adding null checks before calling
getTypeLoc().
Fixes #195788.
SymbolizableObjectFile: Fix Wasm test to avoid layering violation (#193574)
Tests for LLVM libraries should not require wasm-ld. It's not necessary
in this case to generate the binary at test time, so instead check in a
YAMLized pre-linked binary.
[GVN][NVPTX] Rename PRE flag to ScalarPRE, disable option in NVPTX (#190386)
Scalar PRE in GVN may cause performance issues in the NVPTX backend
by increasing register pressure. This PR renames the enable-pre flag to
enable-scalar-pre and updates its usage to cover an additional case of
scalar PRE being performed. The newly renamed option is also used to
disable scalar PRE for NVPTX.
[NFC][LLVM][VPlan] Fix "parameter ‘P’ set but not used" warning. (#197194)
For Is... = {} the fold expression short-circuits to true and does not
evaluate P.
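The warning can be reproduced with a small standalone sketch (the function name is made up, not the VPlan code): with an empty parameter pack, an && fold expression collapses to true, so the parameter is never read.

```cpp
#include <cassert>

// With an empty pack Is..., ((Is == P) && ...) folds to true and P is
// never evaluated - which is what triggers "parameter 'P' set but not
// used" on some compilers.
template <typename... Ts>
bool allEqual(int P, Ts... Is) {
  return ((Is == P) && ...);
}
```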
[VPlan] Introduce reduction selects for tail folding in foldTailByMasking. NFCI (#192987)
Currently addComputeReductionResult has to check the cost model to see
if the loop is tail folded, and if so then manually fix up the backedge
value so any tail elements are ignored.
This PR moves this handling into foldTailByMasking itself so the plan
doesn't require fixing up. We do this by setting the incoming value
for the latch phi to the reduction phi instead of poison. A blend will
be created for this automatically.
The main benefits of this are that the reduction is correct when tail
folding is applied, and we don't need to worry about tail folding in as
many places.
In order to preserve some of the optimizations that we get on
VPInstruction::Select we need to convert the VPBlendRecipe to a select.
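A scalar model of the select/blend behavior described above may help. This is purely illustrative (the function and names are made up, not VPlan's): under tail folding, each lane is guarded by a mask, and masked-off lanes keep the incoming reduction value instead of reading poison.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalarized model of a tail-folded sum reduction with vector factor VF.
int maskedSum(const std::vector<int> &V, std::size_t VF) {
  int Red = 0;
  for (std::size_t I = 0; I < V.size(); I += VF) {
    for (std::size_t L = 0; L < VF; ++L) {
      bool Active = I + L < V.size();      // tail-folding mask
      // Blend: inactive lanes keep the incoming reduction value.
      Red = Active ? Red + V[I + L] : Red;
    }
  }
  return Red;
}
```

The key point is the final group: lanes past the end of V contribute nothing because the select keeps the previous reduction value, so no later fix-up of the backedge value is needed.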
[libc] Pass -c to compiler when detecting target (#197012)
Follow-up to #176680 where I claimed having done this, but apparently
didn't actually add it to the commit.
Hopefully no observable behavior change; -c tells the compiler to omit
linker info in its output, which we don't need for this detection step.
[Support] Add a function to print the debug log (#197184)
With `EnableDebugBuffering`, the debug log is stored in a circular
buffer and printed, with a nice banner, on program termination - this is
achieved via a signal handler. For in-process tool execution, such as
for running the regression tests using daemon versions of the tools, we
need to be able to trigger the printing/flushing of the debug log from
the process itself. This PR just adds a small function `printDebugLog`
which checks if debug output and debug log buffering are enabled and, if
so, prints the debug log.
The code for printing the debug log in the signal handler is moved to a
new function `printDebugLogImpl` which is called by the signal handler
and `printDebugLog` - the reason this is separate from `printDebugLog`
is to avoid running the option check in the signal handler
implementation, in case options were reset before the signal handler is
called, as this would be an unintentional behavioral change.
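The split between the guarded entry point and the unconditional impl can be sketched like this. It is an illustrative standalone model, not the LLVM API: the names, the banner, and the buffer size are invented here.

```cpp
#include <cassert>
#include <deque>
#include <ostream>
#include <sstream>
#include <string>

static bool DebugBufferingEnabled = true;
static std::deque<std::string> LogBuffer;
static const std::size_t MaxEntries = 4;

void debugLog(const std::string &Msg) {
  if (!DebugBufferingEnabled)
    return;
  if (LogBuffer.size() == MaxEntries)
    LogBuffer.pop_front(); // circular buffer: drop the oldest entry
  LogBuffer.push_back(Msg);
}

// Unconditional: the signal handler calls this directly, so a late
// option reset cannot suppress the dump.
void printDebugLogImpl(std::ostream &OS) {
  OS << "=== debug log ===\n";
  for (const auto &M : LogBuffer)
    OS << M << "\n";
}

// Guarded entry point for in-process callers (e.g. daemonized tools):
// only flushes when buffering is actually enabled.
void printDebugLog(std::ostream &OS) {
  if (DebugBufferingEnabled)
    printDebugLogImpl(OS);
}
```

The option check lives only in the outer function, matching the rationale above for keeping the signal-handler path unconditional.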
[Clang] Fix assertion when __block is used on global variables in C mode (#194856)
This is a reland PR, related to #183988
I added an extra check in handleBlocksAttr to ensure that illegal Decl
values are not passed to downstream functions, and removed an
unnecessary check in `CheckCompleteVariableDeclaration`.
Also added an extra regression test.
Fixes #183974
[AMDGPU] Optimize fneg and fsub with packed fp16 ops (#196659)
This work optimizes fneg and fsub when packed half math instructions are
supported.
On the GlobalISel path, for wider vectors of G_FSUB with an element type
of f16, we split them to v2f16 so that v_pk_add_f16 can be selected.
On the SelectionDAG path, we make FNEG legal and also make sure to split
wider vectors to v2f16. This way, we can fold fneg into the source
modifiers for packed half ops.