[flang][acc] Accept component reference in non-global `acc declare` (#192563)
The TODO was being issued for every component reference in `acc declare`,
including those treated as having subroutine-scope lifetime. Since
the latter use normal data mapping clauses and need no ctors/dtors,
accept them. Still emit the TODO for the cases where component
references are `acc declare`d in a global context.
[hb]: Add fwget information
Add some information about how to install firmware for graphics cards.
Differential Revision: https://reviews.freebsd.org/D56428
Reviewed by: carlavilla@
[lldb] Rally around triple rather than arch in the API tests (#191416)
This PR removes as many uses of arch as possible, in favor of using
triple directly. Most of the changes are in the builder, which no longer
passes `ARCH` to Make, and of course in Makefile.rules.
This significantly simplifies the remote Darwin test suite, as it
previously had to try and piece together the triple from the platform
and the arch. As an added benefit, we now go through the same code path
for host and remote test runs.
I have tested this on Darwin and Linux and made the changes with the
remote test suites in mind, but it's possible I missed something not
caught by my local testing.
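
A rough illustration of why the triple is the easier source of truth: the arch is just the first component of the triple, while going the other way means guessing the vendor and OS. The helper name `arch_from_triple` is hypothetical and is not part of the lldb test suite.

```python
def arch_from_triple(triple: str) -> str:
    """Return the architecture component of a target triple.

    Hypothetical helper for illustration only; a triple has the shape
    arch-vendor-os[-environment], so the arch is the first field.
    """
    return triple.split("-", 1)[0]


if __name__ == "__main__":
    print(arch_from_triple("arm64-apple-macosx"))
    print(arch_from_triple("x86_64-unknown-linux-gnu"))
```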
[AMDGPU] Mark ASYNCMARK as meta instruction to fix hazard cycle miscounting (#189981)
ASYNCMARK emits no hardware code; it is used for tracking purposes but was
not marked as meta, causing getNumWaitStates to return 1 and
GCNHazardRecognizer to incorrectly count it as a pipeline cycle.
This patch marks ASYNCMARK as a meta instruction so it correctly reports 0
wait states.
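
The miscount can be sketched with a toy model, assuming a simplified rule where every non-meta instruction counts as one wait state. The `Instr` class, the function names, and the `op_a`/`op_b` opcodes are hypothetical and only mimic the behaviour described above; this is not the AMDGPU backend code.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    is_meta: bool = False  # meta instructions emit no hardware code


def num_wait_states(instr: Instr) -> int:
    # Simplified rule: a meta instruction occupies no pipeline slot,
    # everything else counts as one wait state.
    return 0 if instr.is_meta else 1


def wait_states(instrs) -> int:
    # Total wait states a hazard recognizer would count over a sequence.
    return sum(num_wait_states(i) for i in instrs)


# Before the fix: the marker is not flagged as meta and is counted.
before = [Instr("op_a"), Instr("ASYNCMARK"), Instr("op_b")]
# After the fix: the marker is meta and contributes nothing.
after = [Instr("op_a"), Instr("ASYNCMARK", is_meta=True), Instr("op_b")]
```

In this model the unmarked sequence reports one more wait state than the hardware would actually execute, which is the kind of over-count the patch removes.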
Fixes: #186878
[flang][OpenACC] Support acc routine info on ProcEntityDetails for separate compilation (#192367)
When !$acc routine(name) vector is used in a caller for an external
subroutine, the symbol has ProcEntityDetails (not SubprogramDetails).
The routine info (vector/worker/gang/seq) was silently lost because
AddRoutineInfoToSymbol only handled SubprogramDetails, and CallInterface
only checked SubprogramDetails for openACCRoutineInfos.
Add openACCRoutineInfos storage to ProcEntityDetails and handle it in
both AddRoutineInfoToSymbol and CallInterface so the parallelism level
is properly lowered to acc.routine with the correct keyword.
[flang][cuda][openacc] use the ultimate symbol to set the implicit device attribute (#192553)
The attribute was not applied when the symbol had a UseDetails. Use the
ultimate symbol so we get the proper ObjectEntityDetails to apply the
implicit attribute.
[flang][acc] Fix crash on collapse(force:N) with non-tightly nested loops (#191310)
When collapse(force:N) is applied to non-tightly nested loops, the
compiler could crash or generate redundant inner loops.
Crashes occurred because getNestedEvaluations() was called without
checking hasNestedEvaluations() first. Add guards in hasEarlyReturn(),
createRegionOp(), and the collapse-force sinking logic in Bridge.cpp.
Redundant inner loops were generated because processDoLoopBounds
absorbed N levels of do-loops into the outer acc.loop, but the PFT
walker still generated separate acc.loop ops for those same loops.
Fix by tracking absorbed DoConstruct* pointers in visitLoopControl
and skipping them in genFIR(DoConstruct).
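
The tracking idea can be sketched as follows, under heavy simplification: loops absorbed into the outer collapsed loop are recorded in a set, and the tree walker skips emitting a separate loop for them. `Loop`, `absorb`, and `lower` are hypothetical names, and the output strings are placeholders rather than real acc.loop operations.

```python
from dataclasses import dataclass, field


@dataclass
class Loop:
    name: str
    body: list = field(default_factory=list)  # nested Loop objects or statement strings


def absorb(outer: Loop, n: int) -> set:
    """Record the n-1 loops below `outer` that the collapsed loop takes over."""
    absorbed = set()
    current = outer
    for _ in range(n - 1):
        # Loops need not be tightly nested: other statements may precede them.
        inner = next((s for s in current.body if isinstance(s, Loop)), None)
        if inner is None:  # guard: no nested loop to absorb at this level
            break
        absorbed.add(id(inner))
        current = inner
    return absorbed


def lower(node: Loop, absorbed: set, out: list) -> None:
    """Walk the tree, skipping loop emission for absorbed loops."""
    for stmt in node.body:
        if isinstance(stmt, Loop):
            if id(stmt) not in absorbed:
                out.append(f"loop {stmt.name}")
            lower(stmt, absorbed, out)
        else:
            out.append(stmt)


def lower_collapsed(outer: Loop, n: int) -> list:
    out = [f"collapsed loop {outer.name} ({n} levels)"]
    lower(outer, absorb(outer, n), out)
    return out
```

Without the `absorbed` check, the walker would emit a second loop for `j` in a two-level collapse even though the outer loop already covers it.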
[lldb] Convert TestPtrAuth.py from an inline to a regular test (NFC) (#192705)
This PR changes TestPtrAuth.py from an inline to a "regular" API test.
The motivation for this is #191416 and the need to specify parameters to
the build.
iflib: ignore reclaim coalescing when low on tx descriptors
If we are low on TX descriptors, bypass iflib_txq_can_reclaim()
and force a reclaim. This is intended to reduce the number of
output drops under heavy load when using simple transmit.
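
The decision can be sketched like this, assuming a simple low-water threshold on free descriptors. The function name, the `low_water` parameter, and the callback are hypothetical; the actual threshold and the signature of iflib_txq_can_reclaim() are not given here.

```python
def should_reclaim(free_descs: int, low_water: int, can_reclaim) -> bool:
    """Decide whether to reclaim completed TX descriptors.

    free_descs  -- descriptors currently available (hypothetical input)
    low_water   -- illustrative threshold below which we always reclaim
    can_reclaim -- callable standing in for the coalescing check
    """
    if free_descs < low_water:
        # Low on descriptors: bypass the coalescing check and force a reclaim.
        return True
    return can_reclaim()
```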
Differential Revision: https://reviews.freebsd.org/D56339
Sponsored by: Netflix
iflib: accurately count bytes/segments for TSO
When using software-based ifnet counters, iflib has not taken
TSO into account when reporting the segments and bytes sent.
So it underreports NIC bandwidth by a small percentage
and undercounts sent segments by a large factor.
Fix this by calculating the number of added segments the NIC
will send, and adding the header size multiplied by that number
to arrive at a correct accounting of segments and bytes sent.
This makes these software counters directly comparable to
hardware counters.
Doing this requires moving the calculation into iflib_encap() where
we have already parsed the packet and know the header size, MSS, etc.
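
A worked example of the accounting, assuming the NIC splits the payload into MSS-sized segments and replicates the full header on each one. The function and parameter names are hypothetical and this is not the iflib_encap() code.

```python
def tso_wire_counts(pkt_len: int, hdr_len: int, mss: int):
    """Return (segments, bytes) actually sent for one TSO packet.

    pkt_len -- length of the large packet handed to the NIC, headers included
    hdr_len -- combined header size replicated on every segment
    mss     -- maximum payload bytes per segment
    """
    payload = pkt_len - hdr_len
    nsegs = max(1, -(-payload // mss))  # ceiling division
    added = nsegs - 1                   # segments beyond the one already counted
    return nsegs, pkt_len + added * hdr_len
```

For a 65226-byte packet with 66 bytes of headers and an MSS of 1448, the payload is 65160 bytes, or 45 segments. Counting the packet once gives 1 segment and 65226 bytes; the corrected figures are 45 segments and 65226 + 44 * 66 = 68130 bytes, which equals 45 * (1448 + 66).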
Differential Revision: https://reviews.freebsd.org/D56338
Sponsored by: Netflix