[mlir][MemRef] Make fold-memref-alias-ops use memref interfaces
This replaces the large switch-cases and operation-specific patterns
in FoldMemRefAliashops with patterns that use the new
IndexedAccessOpInterface and IndexedMemCopyOpInterface, which will
allow us to remove the memref transforms' dependency on the NVGPU
dialect.
This does also resolve some bugs and potential unsoundnesses:
1. We will no longer fold in expand_shape into vector.load or
vector.transfer_read in cases where that would alter the strides
between dimensions in multi-dimensional loads. For example, if we have
a `vector.load %e[%i, %j, %k] : memref<8x8x9xf32>, vector<2x3xf32>`
where %e is
`expand_shape %m [[0], [1], [2. 3]] : memref<8x8x3x3xf32> to 8x8x9xf32,
we will no longer fold in that shape, since that would change which
value would be read (the previous patterns tried to account for this
but failed).
2. Subviews that have non-unit strides in positions that aren't being
[15 lines not shown]
[mlir] Add [may]updateStartingPosition to VectorTransferOpInterface
This commit adds methods to VectorTransferOpInterface that allow
transfer operations to be queried for whether their base memref (or
tensor) and permutation map can be updated in some particular way and
then for performing this update. This is part of a series of changes
designed to make passes like fold-memref-alias-ops more generic,
allowing downstream operations, like IREE's transfer_gather, to
participate in them without needing to duplicate patterns.
[mlir] Implement indexed access op interfaces for memref, vector, gpu, nvgpu
This commit implements the IndexedAccessOpInterface and
IndexedMemCopyInterface for all operations in the memref and vector
dialects that it would appear to apply to. It follows the code in
FoldMemRefAliasOps and ExtractAddressComputations to define the
interface implementations. This commit also adds the interface to the
GPU subgroup MMA load and store operations and to any NVGPU operations
currently being handled by the in-memref transformations (there may be
more suitable operations in the NVGPU dialect, but I haven't gone
looking systematically)
This code will be tested by a later commit that updates
fold-memref-alias-ops.
Assisted-by: Claude Code, Cursor (interface boilerplate, sketching out
implementations)
Thread Safety Analysis: Add literal-based alias test (#179041)
Add a simple literal-based alias test that shows that the recently fixed
value-based literal comparison works when resolving aliases.
NFC.
[AMDGPU] Allow hoising of V_READFIRSTLANE_B32 for uniform operand
readfirstlane can be moved across control flow for uniform inputs.
The MachineInstr::NoConvergent attribute allows hoisting
which is otherwise prohibited for a convergent instruction.
heimdal: Pass the correct pointer to realloc when growing a string buffer
The realloc in my_fgetln was trying to grow the pointer to the string
buffer, not the string buffer itself.
In function 'my_fgetln',
inlined from 'mit_prop_dump' at crypto/heimdal/kdc/mit_dump.c:156:19:
crypto/heimdal/kdc/mit_dump.c:119:13: error: 'realloc' called on unallocated object 'line' [-Werror=free-nonheap-object]
119 | n = realloc(buf, *sz + (*sz >> 1));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
crypto/heimdal/kdc/mit_dump.c: In function 'mit_prop_dump':
crypto/heimdal/kdc/mit_dump.c:139:11: note: declared here
139 | char *line = NULL;
| ^~~~
Reviewed by: rmacklem, cy
Fixes: a93e1b731ae4 ("heimdal-kadmin: Add support for the -f dump option")
Differential Revision: https://reviews.freebsd.org/D54933
(cherry picked from commit 03d8ac948b1ad9c419b294c3129b7da58d818363)
iicbb: Fix gcc12 complaint
So gcc12 doesn't understand that t->udelay is >= 1, so thinks that noack
might be unset sometimes. While we specifically constrain this on direct
assignment, there's a sysctl that might not. This is likely also a bug.
Instead of uglifying everything by using MAX(1, sc->udelay), I rewrote
the for loop as a do-while loop (which arguably dictates intent better
because this code clearly assumes it will be executed once).
Sponsored by: Netflix
(cherry picked from commit 4b301f7e7ab43bb61561786c2ab33f3a3c4a725d)
[AMDGPU] Allo hoising of V_READFIRSTLANE_B32 for uniform operand
readfirstlane can be moved across control flow for uniform inputs.
The MachineInstr::NoConvergent attribute allows hoisting
which is otherwise prohibited for a convergent instruction.
runat: Add -h to manipulate a symlink's named attribute dir
Lionel Cons <lionelcons1972 at gmail.com> requested
that a new option be added to runat(1) so that it could
be used to manipulate named attributes associated with
a symbolic link and not the file the symbolic link refers to).
This patch adds the option -h/--nofollow to do this.
Requested by: Lionel Cons <lionelcons1972 at gmail.com>
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D55023
[scudo] Add resident pages info to getStats (#178969)
Adding resident pages field to the primary allocator's getStats function
makes it consistent with the secondary allocator's getStats function.
Attributor: Add -light otions to -attributor-enable flag
Add light, module-light, and cgscc-light options. This just
supplements the existing flag to use the light variants of the
pass in place of the full versions.
Way back when attributor-light was added in 400fde92963588ae2b,
there was no way to change the pass pipeline to use it. There
were some benchmarks posted, but I don't see precisely how it
was benchmarked in the pipeline.
I'm also surprised this option is only additive, and doesn't remove
FunctionAttrs. If this is to be the option to drive the enablement,
I would expect it to not run the old passes.
Fix build for Linux 6.18 with PowerPC/RISC-V kernels. (#18145)
The macro 'flush_dcache_page(...)' modifies the page flags, but in Linux
6.18 the type of the page flags changed from 'unsigned long' to the
struct type 'memdesc_flags_t' with a single member 'f' which is the page
flags field.
Signed-off-by: Erik Larsson <catacombae at gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs at mcmilk.de>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
[AMDGPU][GlobalISel] Add COPY_SCC_VCC combine for VCC-SGPR-VGPR pattern
Eliminate VCC->SGPR->VGPR bounce created by UniInVcc when the uniform boolean
result is consumed by a VALU instruction that requires the input in VGPRs.