[MCA] Enhance debug prints of processor resources (#190132)
Previously, `computeProcResourceMasks()` would print resource masks in
debug mode from multiple call sites, creating noise in the debug output.
This patch aims to fix this and also print more info about the
resources.
It splits the debug prints for resources into two types:
1. No simulation - mask only
2. Simulation - mask + other info
For 2, the printing is done in a single place, the `ResourceManager`
constructor, which should cover all the other simulation cases
indirectly:
1. `llvm/lib/MCA/HardwareUnits/ResourceManager` - covered
2. `llvm/lib/MCA/InstrBuilder.c` - should be covered indirectly - only
used by `llvm-mca` before simulation that constructs a `ResourceManager`
[23 lines not shown]
[Inliner] Put inline history into IR as !inline_history metadata (#190092)
So that it's preserved across all inline invocations rather than just
one inliner pass run.
This prevents cases where devirtualization in the simplification
pipeline uncovers inlining opportunities that should be discarded due to
inline history, but we dropped the inline history between inliner pass
runs, causing code size to blow up, sometimes exponentially.
For compile time reasons, we want to limit this to only call sites that
have the potential to inline through SCCs, potentially with the help of
devirtualization. This means that the callee is in a non-trivial
(Ref)SCC, or the call site was previously an indirect call, which can
potentially be devirtualized to call any function.
The CGSCCUpdater::InlinedInternalEdges logic still seems to be relevant
even with this change, as monster_scc.ll blows up if I remove that code.
[3 lines not shown]
yes: fix argv test race between fork and exec
The argv test checks ps(1) output immediately after backgrounding yes(1), but
the forked child briefly shows the parent shell's argv before exec(2) replaces it.
This caused intermittent failures where ps(1) captured the atf shell wrapper
command line instead of "yes y".
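The commit message does not say how the test was changed. One common way to avoid this kind of fork/exec race, sketched here under that assumption, is to poll the reported command line until it shows the expected value instead of sampling it once. `wait_for_argv` and the `probe` callable are hypothetical names and are not taken from the test suite.

```python
import time


def wait_for_argv(probe, expected, timeout=5.0, interval=0.05):
    """Poll probe() until it returns expected or the timeout expires.

    probe is a hypothetical callable that returns the command line
    currently reported for the child (for example by running ps(1)).
    """
    deadline = time.monotonic() + timeout
    while True:
        if probe() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated probe: the first two samples still show the parent shell's
# argv (the window between fork and exec), later samples show the child.
_samples = iter(["sh -c wrapper", "sh -c wrapper", "yes y"])


def fake_probe():
    return next(_samples, "yes y")


result = wait_for_argv(fake_probe, "yes y", timeout=1.0, interval=0.01)
```

Here `result` is true only because the probe eventually reports the exec'd command line. A single immediate sample would have seen the wrapper's argv.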
Approved by: des
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D56231
Replace pure-python filter_list
This commit replaces the pure-python implementation of filter_list
with the version provided by the truenas/truenas_pyos repo
(truenas_pyfilter). The overall new workflow for this is:
1. convert the filters / options to their respective objects from
truenas_pyfilter (compile_filters, compile_options).
2. use the filters / options to either match (if there is a single item)
or tnfilter (if there is more than one).
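The two steps above can be sketched as "compile once, then match or filter". The functions below are pure-Python stand-ins written only for illustration. They are not the truenas_pyfilter API, whose real signatures and option handling may differ, and the `sketch_` names and the `limit` option are assumptions.

```python
import operator

# Pure-Python stand-ins for illustration only. They are NOT the
# truenas_pyfilter API; the real signatures may differ.
_OPS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt, "<": operator.lt}


def sketch_compile_filters(filters):
    """Turn [field, op, value] triples into (field, callable, value) once."""
    return [(field, _OPS[op], value) for field, op, value in filters]


def sketch_compile_options(options):
    """Normalize options once; only a hypothetical 'limit' key is modelled."""
    return {"limit": options.get("limit")}


def sketch_match(item, compiled_filters):
    """Check a single item against the pre-compiled filters."""
    return all(fn(item.get(field), value) for field, fn, value in compiled_filters)


def sketch_tnfilter(items, compiled_filters, compiled_options):
    """Filter many items, then apply the limit from the compiled options."""
    out = [item for item in items if sketch_match(item, compiled_filters)]
    limit = compiled_options["limit"]
    return out if limit is None else out[:limit]


# Compile once and reuse: this is the part worth keeping around when
# the filters and options never change.
FILTERS = sketch_compile_filters([["uid", ">", 999]])
OPTIONS = sketch_compile_options({"limit": 1})

users = [
    {"name": "root", "uid": 0},
    {"name": "alice", "uid": 1000},
    {"name": "bob", "uid": 1001},
]

single = sketch_match(users[0], FILTERS)         # one item: match
many = sketch_tnfilter(users, FILTERS, OPTIONS)  # several items: filter
```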
The output is the same, so this is mostly a drop-in replacement; however,
in some places in our codebase we keep copies of pre-compiled filters
and options because they do not change. The filter_list util is now
replaced with what is largely a thin wrapper around the extension.
API validation also now wraps around validation provided by the
extension.
packages: Fix build with libucl 0.9.3
In libucl 0.9.3, macros and includes are disabled by default when
creating a new UCL parser. This breaks the package build, which
relies on includes. Fix this by explicitly passing zero flags
to ucl.parser().
MFC after: 3 days
Fixes: abda442d92fd ("contrib/libucl: Import libucl 0.9.3")
Reviewed by: kevans, bapt
Reported by: freebsd at walstatt-de.de
Sponsored by: https://www.patreon.com/bsdivy
Differential Revision: https://reviews.freebsd.org/D56266
fmax.3: Add caveat for going beyond C std requirements
libm's fmax and fmin family of functions treat +0.0 as greater than
-0.0. This is not required by the C standard, so the user may not see
this behaviour due to compiler optimization.
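To make the caveat concrete, here is a small Python model (not libm's code) contrasting a maximum that orders +0.0 above -0.0 with a comparison-only maximum. Since the two zeros compare equal, a comparison-only version returns whichever operand its code path happens to pick, which is the kind of result a compiler-substituted fmax may give. Both function names are hypothetical.

```python
import math


def fmax_signed_zero(x, y):
    """Model of an fmax that orders +0.0 above -0.0 (not libm's code)."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    if x == 0.0 and y == 0.0:
        # The zeros compare equal, so look at the sign bit instead.
        return x if math.copysign(1.0, x) > 0 else y
    return x if x > y else y


def fmax_naive(x, y):
    """Comparison-only version; the sign of a zero result depends on order."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if x > y else y


# Sign of the result for both argument orders.
orders = ((0.0, -0.0), (-0.0, 0.0))
aware = [math.copysign(1.0, fmax_signed_zero(a, b)) for a, b in orders]
naive = [math.copysign(1.0, fmax_naive(a, b)) for a, b in orders]
```

In this model `aware` is positive for both argument orders, while `naive` changes sign with the order of the arguments.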
PR: 294214
Reviewed by: fuz
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D56230
[RISCV] Use a vector MemVT when converting store+extractelt into a vector store. (#190107)
This is needed so that `allowsMemoryAccessForAlignment` checks for
unaligned vector memory support instead of unaligned scalar memory
support when called from `RISCVTargetLowering::expandUnalignedVPStore`.
While there, remove the incorrect setting of the truncating store flag
on the vector instruction, and restrict the transform to simple stores
since we don't have tests for volatile or atomic stores.
Fixes #189037
[RISCV][P-ext] Add isel patterns for macc*.h00/macc*.w00. (#190444)
The RV32 macc*.h00 instructions take the lower half words from rs1 and
rs2, compute the full word product by extending the inputs, and
add to rd. The RV64 macc*.w00 is similar but operates on words
and produces a double word result.
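The RV32 behaviour described above can be modelled in a few lines. This is an illustrative sketch, not the instruction specification: `macc_h00_model` is a hypothetical name, and the different macc* variants are reduced to a single `signed` flag, which is a simplification.

```python
def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def macc_h00_model(rd, rs1, rs2, signed=True):
    """Hypothetical model of an RV32 macc*.h00: rd + lo16(rs1) * lo16(rs2)."""
    if signed:
        a, b = sext(rs1, 16), sext(rs2, 16)
    else:
        a, b = rs1 & 0xFFFF, rs2 & 0xFFFF
    return (rd + a * b) & 0xFFFFFFFF  # the accumulated result wraps to 32 bits


# The upper halves of rs1 and rs2 are ignored; only the low 16 bits matter.
signed_result = macc_h00_model(10, 0x1234FFFF, 0xABCD0003)           # 10 + (-1 * 3)
unsigned_result = macc_h00_model(10, 0x1234FFFF, 0xABCD0003, False)  # 10 + 65535 * 3
```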
I've restricted this to the case where the multiply has a single use.
We don't have a general macc that multiplies the full xlen bits
of rs1 and rs2, so I'm allowing the inputs to be sext_inreg/and or
to have sufficient sign/zero bits according to
ComputeNumSignBits/computeKnownBits.
We should also add mul*.h00/mul*.w00 patterns, but those should be
restricted to at least one input being sext_inreg/and, preferring
regular mul when there is no sext_inreg/and.
[AMDGPU] Add v2i32 and/or patterns for VOP3 AND_OR and OR3 operations (#188375)
Add ThreeOp_v2i32_Pats pattern class to support v2i32 vector operations
for AND_OR_B32 and OR3_B32 instructions. The new patterns match the
v2i32 and-or or or-or instruction sequence, extract the individual 32-bit
elements from the v2i32 operands, and apply the and_or or or3 VOP3
operation to each element.
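As a rough model of what the patterns compute (this is plain Python, not the TableGen patterns themselves), the three-operand operations can be applied to each 32-bit element of a two-element vector. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def and_or_b32(a, b, c):
    """Scalar model of a three-operand and-or: (a & b) | c."""
    return ((a & b) | c) & MASK32


def or3_b32(a, b, c):
    """Scalar model of a three-operand or: a | b | c."""
    return (a | b | c) & MASK32


def apply_v2i32(op, a, b, c):
    """Apply a 32-bit three-operand op to each element of 2-element vectors."""
    return tuple(op(a[i], b[i], c[i]) for i in range(2))


va, vb, vc = (0xFF00, 0x0F0F), (0x0FF0, 0xFFFF), (0x0001, 0xF000)
and_or_result = apply_v2i32(and_or_b32, va, vb, vc)
or3_result = apply_v2i32(or3_b32, va, vb, vc)
```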
sysutils/psutil: Patch to remove procfs dependency
Replace procfs dependency in the following sections:
- cpu stats calculation
- memory usage calculation
While here also handle EBUSY failures gracefully.
diff: use O_CLOEXEC on pipes
This only simplifies the code; no functional changes are expected.
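The commit does not show the code, but the general simplification looks like the following Python analogue: setting close-on-exec at pipe creation replaces a separate fcntl step per descriptor. This is a sketch of the idea rather than diff(1)'s C code. It relies on `os.pipe2`, which is available on Linux and the BSDs but not on every platform, and the helper names are hypothetical.

```python
import fcntl
import os


def pipe_two_steps():
    """Create a pipe, then set close-on-exec on each end separately."""
    r, w = os.pipe2(0)
    for fd in (r, w):
        flags = fcntl.fcntl(fd, fcntl.F_GETFD)
        fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
    return r, w


def pipe_one_step():
    """Create a pipe with close-on-exec already set on both ends."""
    return os.pipe2(os.O_CLOEXEC)


def is_cloexec(fd):
    """Report whether the descriptor has the close-on-exec flag set."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


pairs = [pipe_two_steps(), pipe_one_step()]
states = [is_cloexec(fd) for pair in pairs for fd in pair]
for pair in pairs:
    for fd in pair:
        os.close(fd)
```

Both helpers leave all four descriptors with close-on-exec set; the one-step form simply needs less code and has no window in which the flag is missing.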
MFC After: 1 week
(cherry picked from commit c8d40bf8ecc60cc15e3904410db62065ea681fdc)
[AMDGPU] Change isSingleLaneExecution to account for WWM enabling lanes even if there's only one workitem (#188316)
This issue was discovered during some downstream work around Vulkan CTS
tests, specifically
`dEQP-VK.subgroups.arithmetic.compute.subgroupadd_float`
---------
Co-authored-by: Matt Arsenault <arsenm2 at gmail.com>
Remove pam_truenas etc_group
At this point there's not really a strong reason to have a
separate pam_truenas etc group. The truenas-specific entries
depend on what's generated in the regular pam etc group. This
commit also fixes an issue whereby PAM files weren't updated
on the remote controller when the user enables DS authentication.
This commit also updates various call-sites to generate pam
rather than pam_truenas.