[clang][ssaf] Add FormatInfo sub-registry and tests [3/3]
Add `FormatInfoEntry` template to support per-analysis-type serialization
within a `SerializationFormat`.
This allows different formats to be implemented for the different
analyses in a decoupled way.
For testing, this patch also implements the MockSerializationFormat
demonstrating the FormatInfo sub-registry pattern.
Assisted-by: claude
[flang][cuda] Update visibility of declaration copied to in gpu.module (#179725)
https://github.com/llvm/llvm-project/pull/179362 changes which op is
checked for visibility during nested symbol resolution. This causes
issues in the CUDA Fortran pipeline and makes some lookups fail. Update
the visibility of declarations copied to the gpu.module to nested.
[clang][ssaf] Add SerializationFormatRegistry [2/3]
Add a registry infrastructure for SerializationFormat implementations,
enabling registration and instantiation of different serialization formats.
For example:
```c++
static SerializationFormatRegistry::Add<MyFormat>
RegisterFormat("MyFormat", "Description");
```
Formats can then be instantiated by name using `makeFormat()`.
The patch also updates the SerializationFormat base class to accept
FileSystem and OutputBackend parameters for virtualizing I/O
operations.
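As a rough illustration of the registration pattern described above, here
is a minimal standalone sketch of a name-keyed factory registry. It does
not use the actual LLVM or SSAF classes: `Format`, `FormatRegistry`, and
`make()` are hypothetical stand-ins, and the real `makeFormat()` may take
different parameters (for example the FileSystem and OutputBackend
mentioned above).
```c++
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for the SerializationFormat base class.
struct Format {
  virtual ~Format() = default;
  virtual std::string name() const = 0;
};

// Hypothetical name-keyed registry; the real patch may be built differently.
class FormatRegistry {
public:
  using Factory = std::function<std::unique_ptr<Format>()>;

  // Constructing an Add<T> object registers a factory for T under Name.
  template <typename T> struct Add {
    Add(const std::string &Name, const std::string &Desc) {
      assert(entries().count(Name) == 0 && "duplicate format name");
      entries()[Name] = Entry{Desc, [] { return std::make_unique<T>(); }};
    }
  };

  // Returns a new instance of the named format, or nullptr if unknown.
  static std::unique_ptr<Format> make(const std::string &Name) {
    auto It = entries().find(Name);
    if (It == entries().end())
      return nullptr;
    return It->second.Create();
  }

private:
  struct Entry {
    std::string Desc;
    Factory Create;
  };

  // A function-local static avoids static-initialization-order problems.
  static std::map<std::string, Entry> &entries() {
    static std::map<std::string, Entry> Map;
    return Map;
  }
};

// Example format, registered at static-initialization time.
struct MyFormat : Format {
  std::string name() const override { return "MyFormat"; }
};

static FormatRegistry::Add<MyFormat> RegisterFormat("MyFormat", "Description");
```
In this sketch, `FormatRegistry::make("MyFormat")` returns a fresh
`MyFormat`, while an unregistered name yields a null pointer.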
Assisted-by: claude
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
- Fix a bug where AVIC is incorrectly inhibited when running with
x2AVIC disabled via module param (or on a system without x2AVIC)
- Fix a dangling device posted IRQs bug by explicitly checking if the
irqfd is still active (on the list) when handling an eventfd signal,
instead of zeroing the irqfd's routing information when the irqfd is
deassigned.
Zeroing the irqfd's routing info causes arm64 and x86 to not
disable posting for the IRQ (kvm_arch_irq_bypass_del_producer() looks
for an MSI), incorrectly leaving the IRQ in posted mode (and leading
to use-after-free and memory leaks on AMD in particular).
This is both the most pressing and scariest, but it's been in -next
for a while.
[22 lines not shown]
[AMDGPU][MC] Allow nodone etc. in exp instructions (#172749)
This patch allows nodone, nocompr, novm, and norow_en to be used in exp
instructions to indicate the corresponding modifiers are not present.
[AMDGPU] Allow hoisting of V_READFIRSTLANE_B32 for uniform operand
readfirstlane can be moved across control flow for uniform inputs.
The MachineInstr::NoConvergent attribute allows hoisting
which is otherwise prohibited for a convergent instruction.
[mlir] Add [may]updateStartingPosition to VectorTransferOpInterface
This commit adds methods to VectorTransferOpInterface that allow
transfer operations to be queried for whether their base memref (or
tensor) and permutation map can be updated in some particular way and
then for performing this update. This is part of a series of changes
designed to make passes like fold-memref-alias-ops more generic,
allowing downstream operations, like IREE's transfer_gather, to
participate in them without needing to duplicate patterns.
[mlir][MemRef] Make fold-memref-alias-ops use memref interfaces
This replaces the large switch-cases and operation-specific patterns
in FoldMemRefAliasOps with patterns that use the new
IndexedAccessOpInterface and IndexedMemCopyOpInterface, which will
allow us to remove the memref transforms' dependency on the NVGPU
dialect.
This does also resolve some bugs and potential unsoundnesses:
1. We will no longer fold in expand_shape into vector.load or
vector.transfer_read in cases where that would alter the strides
between dimensions in multi-dimensional loads. For example, if we have
a `vector.load %e[%i, %j, %k] : memref<8x8x9xf32>, vector<2x3xf32>`
where %e is
`expand_shape %m [[0], [1], [2, 3]] : memref<8x8x3x3xf32> to 8x8x9xf32`,
we will no longer fold in that shape, since that would change which
value would be read (the previous patterns tried to account for this
but failed).
2. Subviews that have non-unit strides in positions that aren't being
[15 lines not shown]
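To make the stride concern in item 1 concrete, the following sketch
compares row-major linear offsets of a 2x3 tile taken from the two trailing
dimensions of an 8x8x9 buffer with the offsets obtained if the same indices
were naively re-expressed against an 8x8x3x3 buffer. This is plain index
arithmetic under the assumption of contiguous row-major layout, not MLIR
code; both helper names are hypothetical.
```c++
#include <cassert>
#include <cstddef>

// Offset of tile element (R, C) when the tile starts at [I, J, K] in a
// contiguous row-major 8x8x9 buffer. The tile's rows advance along the
// middle dimension, so consecutive rows are 9 elements apart.
inline std::size_t offsetIn8x8x9(std::size_t I, std::size_t J, std::size_t K,
                                 std::size_t R, std::size_t C) {
  assert(R < 2 && C < 3 && "tile is 2x3");
  return I * 72 + (J + R) * 9 + (K + C);
}

// Offset of tile element (R, C) if the start index were rewritten as
// [I, J, K / 3, K % 3] in a contiguous row-major 8x8x3x3 buffer. The
// tile's rows now advance along a dimension of size 3, so consecutive
// rows are only 3 elements apart.
inline std::size_t offsetIn8x8x3x3(std::size_t I, std::size_t J, std::size_t K,
                                   std::size_t R, std::size_t C) {
  assert(R < 2 && C < 3 && "tile is 2x3");
  return I * 72 + J * 9 + (K / 3 + R) * 3 + (K % 3 + C);
}
```
With a start index of [0, 0, 0], the first row of the tile lands on the
same offsets in both cases, but the second row starts at offset 9 in the
first case and at offset 3 in the second, so a folded load would read
different values.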
[mlir] Implement indexed access op interfaces for memref, vector, gpu, nvgpu
This commit implements the IndexedAccessOpInterface and
IndexedMemCopyOpInterface for all operations in the memref and vector
dialects that it would appear to apply to. It follows the code in
FoldMemRefAliasOps and ExtractAddressComputations to define the
interface implementations. This commit also adds the interface to the
GPU subgroup MMA load and store operations and to any NVGPU operations
currently being handled by the in-memref transformations (there may be
more suitable operations in the NVGPU dialect, but I haven't gone
looking systematically).
This code will be tested by a later commit that updates
fold-memref-alias-ops.
Assisted-by: Claude Code, Cursor (interface boilerplate, sketching out
implementations)
[AArch64][llvm] Pre-commit tests for enabling streaming with +fprcvt
Add pre-commit tests for enabling streaming with +fprcvt. Because I've
added a `+sve,+neon,+fullfp16,+fprcvt -force-streaming-compatible` line
to the test files, this required a small change to prevent an assertion
failure.
[MLIR][Python] Remove partial LLVM APIs in python bindings (3/n) (#178984)
This PR continues work from #178290
It cleans up multiple LLVM utilities in *.h files under
`mlir/Bindings/python`, along with the corresponding *.cpp files.
Improve caching for dbuf prefetches
To avoid read errors while a transaction is open, dmu_tx_check_ioerr()
is used to read everything required in advance. But there seems
to be a chance for the buffer to be evicted from the dbuf cache in
between, which results in immediate eviction from ARC, which may
require an additional disk read later in a place where error handling
is problematic.
To partially work around this, introduce a new flag DMU_IS_PREFETCH,
relayed to ARC as ARC_FLAG_PREFETCH | ARC_FLAG_PRESCIENT_PREFETCH,
making ARC delay eviction by at least several seconds, or until the
actual read inside the transaction, which will promote it to demand
access.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #18160
[clang] remove addrspace cast from CreateIRTemp (#179327)
The cast just added unnecessary work to the IR: the results are only used
for load and store, so it only produced IR noise. Tests updated by the UTC
script to remove the extra lines.
[RegAlloc] Change the computation of CSRCost (#177226)
This patch fixes https://github.com/llvm/llvm-project/issues/150737.
The original computed CSRCost is too small, so the optimization of
spilling instead of using CSR is rarely triggered.
Also, the original cost model is too difficult to understand and too
hard for backend developers and users to tune.
So this patch changes the CSRCost to be
CSRCost = TRI->getCSRFirstUseCost() * EntryFreq * Scale
TRI->getCSRFirstUseCost() is the raw cost of saving/restoring a CSR. Usually
we don't need to tune this number.
EntryFreq is the BlockFrequency of the entry block.
Scale is used to scale down the CSRCost, because we usually prefer a CSR
register instead of spilling if we have similar CSRCost and spill cost,
[8 lines not shown]
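The formula above can be sketched as a small standalone helper. This is
only an illustration of the arithmetic, not the code from the patch: the
function names, the parameter types, and the use of `double` for Scale are
all assumptions, and the comparison helper reflects one plausible reading
of when spilling would be chosen.
```c++
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring
//   CSRCost = TRI->getCSRFirstUseCost() * EntryFreq * Scale
// FirstUseCost stands in for the raw save/restore cost, EntryFreq for the
// entry block's frequency, and Scale for the scale-down factor.
inline double computeCSRCost(std::uint64_t FirstUseCost,
                             std::uint64_t EntryFreq, double Scale) {
  assert(Scale >= 0.0 && "scale must be non-negative");
  return static_cast<double>(FirstUseCost) *
         static_cast<double>(EntryFreq) * Scale;
}

// Assumed decision rule: spill only when it is strictly cheaper than
// taking the first use of a callee-saved register.
inline bool preferSpillOverCSR(double SpillCost, double CSRCost) {
  return SpillCost < CSRCost;
}
```
For instance, with a first-use cost of 5, an entry frequency of 16, and a
scale of 0.5, the helper yields a CSRCost of 40, so a spill costing 30
would be preferred while one costing 50 would not.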
[Evaluator] require invariant size to fully span the global (#179518)
Relying on the semantics of the type here is potentially a bit awkward,
since the full allocated space may be accessible to the user if desired,
since that space is defined to be a part of the type's sizeof
computation (e.g. for a `memcpy(gv, gv, sizeof(*gv))` or when making an
array of them). It also gets in the way of removing getAllocatedType
from AllocaInst entirely (they are converted to GlobalVariable
sometimes). It was originally added in
519561f418c77dcf46fd6d96d25d884fa07fd7da, though "correct size" is a
difficult thing to define.
The frontend (in clang) appears to always emit the full type size here,
so it seems this shouldn't be a visible change to clang users.
This is still a bit awkward though, since a global is defined to be any
size that is bigger than this unless it has a known initializer,
rendering the test still incomplete here against the IR semantics.
arc: remove unused l2df_size and l2df_type from l2arc_data_free_t
These fields became unused when ABD was introduced in a6255b7fc.
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #18093
cache_012_pos: disable compression to ensure L2ARC wrap
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #18093
ZTS: Add L2ARC DWPD and parallel writes tests
Add four new functional tests to validate L2ARC DWPD rate limiting and
parallel write features:
- l2arc_dwpd_ratelimit_pos: Verifies DWPD rate limiting with different
values (0, 100, 1000, 10000) and ordering
- l2arc_dwpd_reimport_pos: Verifies DWPD rate limiting persists after
pool export/import
- l2arc_multidev_scaling_pos: Verifies parallel write scaling ratio
(dual devices achieve ~2× single device throughput)
- l2arc_multidev_throughput_pos: Verifies absolute parallel write
throughput scales with device count (~32MB/s per device)
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #18093