[AMDGPU] Fix handling of setting register classes in MFMA scheduler rewrite stage (#181047)
Fix a problem with setting and resetting register classes in the MFMA
rewrite scheduling stage. The code assumed that the dest and OpC
operands would have the same register class, which does not hold when
subregs are used. Fixes issue #177696.
[libc][mathvec] Initial commit for LIBC vector math component (#173058)
Created mathvec directories and a unit test framework for vector math
functions, as well as an initial implementation of vector expf, which is
currently correctly rounded (CR) for round-to-nearest.
---------
Co-authored-by: Pierre Blanchard <pierre.blanchard at arm.com>
[mlir][tosa] Improve slice op verifier (#181889)
The slice op verifier was missing checks on the values of the start and
size inputs. As in other op verifiers, constant shape_t inputs are now
checked for validity against the spec. The commit adds checks for the
following conditions:
- start values must be non-negative
- size values must be > 0
- start + size must be less than or equal to the input dimension size
- the output shape must be consistent with the size values
The commit also allows kInferableDimSize values (-1) to be passed in for
start and size, which are used to indicate that the dimension size can
be inferred by the compiler. The verifier will skip checks for any start
or size value that is kInferableDimSize. With shape expressions being
added, we should no longer require these values, but removal will be
handled in a separate commit.
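
The conditions listed above can be sketched as a small standalone check.
This is an illustration only, not the MLIR verifier itself: the function
name `verify_slice`, the list-of-strings return value, and the
assumptions that all shapes have equal rank and that input dimensions
are static are hypothetical.

```python
from typing import List, Sequence

# Mirrors the kInferableDimSize value (-1) named in the commit message.
K_INFERABLE_DIM_SIZE = -1


def verify_slice(input_shape: Sequence[int],
                 start: Sequence[int],
                 size: Sequence[int],
                 output_shape: Sequence[int]) -> List[str]:
    """Return a list of violations; an empty list means the slice is valid.

    Assumes all four sequences have the same length and that every
    input dimension is static (hypothetical simplifications).
    """
    errors: List[str] = []
    dims = zip(input_shape, start, size, output_shape)
    for dim, (in_dim, st, sz, out_dim) in enumerate(dims):
        # Checks are skipped for any value equal to kInferableDimSize.
        start_known = st != K_INFERABLE_DIM_SIZE
        size_known = sz != K_INFERABLE_DIM_SIZE

        if start_known and st < 0:
            errors.append(f"dim {dim}: start {st} must be non-negative")
        if size_known and sz <= 0:
            errors.append(f"dim {dim}: size {sz} must be > 0")
        if start_known and size_known and st + sz > in_dim:
            errors.append(
                f"dim {dim}: start + size ({st + sz}) exceeds input dim {in_dim}")
        if size_known and out_dim != sz:
            errors.append(
                f"dim {dim}: output dim {out_dim} does not match size {sz}")
    return errors
```

For example, `verify_slice([4, 6], [1, 2], [2, 4], [2, 4])` reports no
violations, while a start of 3 with a size of 2 on an input dimension of
4 is rejected because 3 + 2 exceeds 4.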
[mlir][ArmSME] Replace nested-region assertion in tile allocation with diagnostic (#181934)
Replace the nested-region assertion in ArmSME tile allocation with a
proper diagnostic and graceful failure.
Fixes #181593
[MLIR][tblgen] Honor `-dialect` in `-gen-{attrdef,op,typedef,enum}-doc` (#182183)
Make all dialect documentation generators use the same set of records as
`-gen-dialect-doc`, which honors the `-dialect` tblgen option to filter
records by dialect. Add a `-keep-op-source-order` option to allow
`-gen-op-doc` to continue producing unsorted op lists if needed.
This commit factors the record collection, filtering, and sorting
performed in `emitDialectDoc` out into a separate `collectRecords`
function, returning a `DialectRecords` with the results. The emit
functions now all accept a `DialectRecords` argument instead of
collecting records themselves. Most changes are mechanical renamings and
moving code around.
This fixes a confusing issue where `-gen-dialect-doc` would produce the
entire documentation for a dialect, but individual calls to
`-gen-attrdef-doc` and the like would seemingly operate on a different
set of records, potentially covering multiple dialects. These now all
produce the overall documentation.
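
The shape of the refactoring can be sketched as follows. This is a loose
Python illustration, not the C++ code in mlir-tblgen: the `Record`
fields, the `keep_op_source_order` parameter name, the choice to sort
only ops by name, and the `emit_op_doc` output format are all
assumptions made for the example. Only the names `collectRecords` and
`DialectRecords` come from the commit message.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    name: str
    kind: str      # "op", "attrdef", "typedef", or "enum" (hypothetical tags)
    dialect: str


@dataclass
class DialectRecords:
    ops: List[Record] = field(default_factory=list)
    attrdefs: List[Record] = field(default_factory=list)
    typedefs: List[Record] = field(default_factory=list)
    enums: List[Record] = field(default_factory=list)


def collect_records(records: List[Record],
                    dialect: Optional[str] = None,
                    keep_op_source_order: bool = False) -> DialectRecords:
    """Collect, filter, and sort once, so every emitter sees the same set."""
    out = DialectRecords()
    buckets = {"op": out.ops, "attrdef": out.attrdefs,
               "typedef": out.typedefs, "enum": out.enums}
    for rec in records:
        # Filter by dialect, as the -dialect option does.
        if dialect is not None and rec.dialect != dialect:
            continue
        buckets[rec.kind].append(rec)
    # Assumption: ops are sorted by name unless source order is requested.
    if not keep_op_source_order:
        out.ops.sort(key=lambda rec: rec.name)
    return out


def emit_op_doc(recs: DialectRecords) -> str:
    """An emitter takes the pre-collected records instead of collecting."""
    return "\n".join(f"### {rec.name}" for rec in recs.ops)
```

The point of the design is that filtering happens in one place, so an
individual generator cannot accidentally pick up records from another
dialect.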
bhyve: Fix unchecked stream I/O in RFB handler
Convert rfb_send_* helpers to return status codes and check their
results. Add missing checks for stream_read() and stream_write() returns
during the handshake in rfb_handle() to avoid acting on failed I/O.
Signed-off-by: Hayzam Sherif <hayzam at gmail.com>
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D55343
icmp6: clear csum_flags on mbuf reuse
When icmp6 sends an ICMPv6 message, it reuses the mbuf of the packet
that triggered the ICMPv6 message and prepends an IPv6 and ICMPv6
header. For a locally generated packet with checksum offloading, the
mbuf still has csum_flags set indicating that a SCTP/TCP/UDP checksum
has to be computed and inserted. Since this is no longer the case,
csum_flags needs to be cleared.
PR: 293227
Reviewed by: kp, zlei, tuexen
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D55367
(cherry picked from commit ada4dc77577f7162353e8c2916ba5c258b6210f0)
(cherry picked from commit 0a87ae18331d5c52dde1e5a4f13ee577e8e5e188)
Merge commit bfb276e55c76 from upstream OpenZFS (by Jessica Clarke)
Once upon a time, 32-bit PowerPC did indeed have a 32-bit time_t, but
FreeBSD 12.0 switched to a 64-bit time_t for PowerPC as an ABI break,
which predates the addition of FreeBSD support to OpenZFS. Moreover,
64-bit PowerPC has existed since FreeBSD 9.0, where __powerpc__ is also
defined (alongside __powerpc64__ to disambiguate), which has always had
a 64-bit time_t. This code has therefore always been wrong for all
PowerPC variants. Fix this by limiting the 32-bit case to just i386,
which is the only architecture in FreeBSD to have a 32-bit time_t and
not have broken ABI, due to its special legacy compatibility status.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Jessica Clarke <jrtc27 at jrtc27.com>
Closes #18217
Closes #18218
Reported by: fuz
[4 lines not shown]
[AMDGPU] Align loop headers to prevent instruction fetch split on GFX950 (#181999)
On GFX9, the instruction sequencer fetches 32 bytes at a time. When an
8-byte instruction at a loop header straddles a 32-byte fetch window
boundary, the sequencer must perform two fetches after a backward
branch, incurring a delay. On GFX950, this causes additional performance
issues.
This patch adds 32-byte alignment (.p2align 5, , 4) for loop headers on
GFX950 when the first real instruction is 8 bytes. At most one s_nop (4
bytes, 1 quad-cycle before the loop) is used for padding. If more than 4
bytes of padding were needed, the 8-byte instruction would not straddle
a 32-byte boundary anyway, so alignment is skipped.
Note: the alignment decision is made during block-placement, before
si-insert-waitcnts. In loops where a 4-byte S_WAITCNT is later inserted
as the first instruction, the alignment becomes redundant but mostly
harmless (at most one extra s_nop per affected loop).
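
The padding argument above can be checked with a little arithmetic. The
sketch below is only a model of the reasoning, not compiler code: the
helper names are hypothetical, and it assumes instruction offsets are
always multiples of 4 bytes.

```python
FETCH_WINDOW = 32  # bytes fetched at a time, per the commit message
MAX_PADDING = 4    # max-skip operand of ".p2align 5, , 4" (one 4-byte s_nop)


def straddles(offset: int, insn_size: int = 8) -> bool:
    """True if an instruction at `offset` crosses a 32-byte fetch boundary."""
    return offset % FETCH_WINDOW + insn_size > FETCH_WINDOW


def padding_emitted(offset: int) -> int:
    """Bytes of padding that .p2align 5, , 4 inserts at `offset`.

    The directive pads to the next 32-byte boundary only if that takes
    at most MAX_PADDING bytes; otherwise it emits nothing.
    """
    needed = -offset % FETCH_WINDOW
    return needed if needed <= MAX_PADDING else 0


def aligned_offset(offset: int) -> int:
    """Offset of the loop header after the alignment directive."""
    return offset + padding_emitted(offset)
```

With 4-byte-aligned offsets, an 8-byte instruction straddles a boundary
only when it starts 28 bytes into a fetch window, which is exactly the
case where 4 bytes of padding reach the next boundary. At every other
offset the directive emits no padding and the instruction already fits
within one window.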
Assisted-by: Claude (Anthropic)
OptionalObsoleteFiles: Don't mark /usr/lib/debug/boot directory obsolete
The intent of the current code is to ignore anything under
/usr/lib/debug/boot/*. But we should also make sure that the
/usr/lib/debug/boot directory itself is ignored and not marked
obsolete. Otherwise, `make -DBATCH_DELETE_OLD_FILES delete-old` will
try to rmdir(1) this directory, which will fail because
/usr/lib/debug/boot may contain nested directories such as kernel/ and
modules/.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D55077