When the pagedaemon is triggered to create free memory, there may be
sleeping pmemrange allocations with multi-page alignment requirements
which can't be satisfied by the simplistic freeing of (solo) pages
which the pagedaemon performs. As we near starvation, fragmentation
is the main problem. Our free list could be large enough that the
pagedaemon sees no reason to do more work, but also too fragmented to
satisfy a pending allocation request with complex requirements
(imagine asking for 512K of physically linear memory which is DMA
reachable). When the requirement isn't satisfied, the pagedaemon is
told to try again, but again doesn't mean harder because it has no
mechanism to try harder. Its tracking variables do not show the
fragmentation problem. It spins a lot. Often this becomes a
deadlock.
Time to change strategy: Overshoot creation of (both) inactive and
free pages each time through the loop. After inspecting existing
variables, we generate a minimum of 128 inactive pages (which may be
dynamically drawn down asynchronously by accesses), and then try to
convert a minimum of 128 inactives into free pages (different pages
get freed different ways, including via swapcluster which has been
[7 lines not shown]
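A minimal sketch of the overshoot idea described above, not the actual pagedaemon code. The only value taken from the message is the floor of 128 pages; the names `kMinPagesPerPass` and `overshoot_target` are hypothetical.

```cpp
#include <algorithm>

// Floor from the commit message: at least 128 pages per pass.
// The constant name is hypothetical.
constexpr int kMinPagesPerPass = 128;

// Hypothetical helper: `shortage` is whatever the existing tracking
// variables say is missing (it may be zero or negative when the free
// list merely looks large enough). The target never drops below the
// floor, so each pass still does some work.
constexpr int overshoot_target(int shortage) {
    return std::max(shortage, kMinPagesPerPass);
}
```

The point of the floor is that a fragmented free list can report no shortage at all; clamping the target keeps the loop producing pages anyway.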
Changed stat passes to count instructions before and after optimizations (#188837)
Created this for instcount and func-properties-analysis to be able to
see the effect the optimization pipelines have on stats
To support swapencrypt, the swapcluster code has a memory allocation codepath.
Since this runs inside the pagedaemon, that is unworkable. We'd like to
encrypt the pages in place for IO, but there are architectures not ready for
a high-mem page to be written to a dma-restricted device (work in progress).
So for now we need to bounce through a dma-reachable memory buffer. A previous
attempt had 1 extra bounce buffer, but then slept on allocation inside the
pagedaemon context, which is also unworkable. This version contains 32
pre-allocated swapclusters (64K each), and through a counter signals to the
pagedaemon when it should stop trying to create memory. 32 swap clusters
is comfortably more than the minimum we expect the pagedaemon to frantically
generate. This crummy solution is good enough until the dma reach problem
is solved (soon)
ok kettenis kirill (who looked into other solutions) beck
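A rough sketch of the pre-allocated pool idea, assuming a simple free-list design; this is not the kernel code. The sizes (32 clusters of 64K) come from the message above. The class name `BouncePool` and its methods are hypothetical, and a real kernel pool would use dma-reachable memory rather than a `std::vector`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sizes from the commit message; all names here are hypothetical.
constexpr std::size_t kClusterCount = 32;
constexpr std::size_t kClusterBytes = 64 * 1024;

class BouncePool {
public:
    // All memory is obtained once, up front, so acquire() never allocates.
    BouncePool()
        : storage_(kClusterCount * kClusterBytes), free_count_(kClusterCount) {
        for (std::size_t i = 0; i < kClusterCount; ++i)
            free_list_[i] = static_cast<int>(i);
    }

    // Returns a cluster index, or -1 when none are left. Never sleeps.
    int acquire() {
        if (free_count_ == 0)
            return -1;
        return free_list_[--free_count_];
    }

    // Hands a cluster back once its IO has completed.
    void release(int idx) {
        assert(idx >= 0 && static_cast<std::size_t>(idx) < kClusterCount);
        assert(free_count_ < kClusterCount);
        free_list_[free_count_++] = idx;
    }

    // The counter-based signal: true means "stop trying to create memory".
    bool exhausted() const { return free_count_ == 0; }

    // Start of the 64K buffer that backs a given cluster index.
    std::uint8_t* data(int idx) {
        return storage_.data() + static_cast<std::size_t>(idx) * kClusterBytes;
    }

private:
    std::vector<std::uint8_t> storage_;
    std::array<int, kClusterCount> free_list_{};
    std::size_t free_count_;
};
```

A caller would check `exhausted()` before starting another swapcluster write and back off instead of sleeping on an allocation.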
nfs_nfsdsocket.c: Allow Copy and Clone across file systems
For some server file system types, such as ZFS, a Copy/Clone
operation can be done across file systems of the same file
system type.
As such, this patch allows the Copy/Clone to be attempted
when the file handles are for files on different file systems.
This fixes a problem for exported ZFS file systems when a
copy_file_range(2) between file systems in the same
NFSv4 mount is attempted.
PR: 294010
(cherry picked from commit b65e7b4944cc2f594c9d9e6abc9b8618d3d62ff8)
[clang-doc] Use distinct APIs for fixed arena allocation sites
Typically, code either always emits data into the TransientArena or the
PersistentArena. Use more explicit APIs to convey the intent directly
instead of relying on parameters or defaults.
[clang-doc] Removed OwnedPtr alias
The alias served a purpose during migration, but now conveys the wrong
semantics, as the memory of these pointers is generally interned inside
a local arena.
[clang-doc] Update type aliases
Many of the type aliases we introduced to simplify migration to arena
allocation are no longer relevant after completing the migration. We
can use more relevant names and remove dead aliases.
[flang] Detect non-optional boxes inside acc.compute_region. (#191328)
This should be a temporary change until we figure out
a better way for representing definitely present boxes.
It allows me to experiment with flang-licm further,
so I would like to ask for approval.
[flang][CUF] Limit Flang LICM for operations with symbol operands. (#191494)
There is probably an ordering issue between `CUFDeviceGlobal`
and `OffloadLiveInValueCanonicalization` passes: Flang LICM hoists
`fir.address_of` out of `cuf.kernel`, it is pulled back by
`OffloadLiveInValueCanonicalization`, but the symbol is never added
into the device module because `CUFDeviceGlobal` does not run after.
Changing the passes order may take some time, so this is a temporary
workaround to unblock #191309.
The change is currently NFC.
[Support] On Windows, silence FARPROC casts (#191563)
When building with clang-cl 19, this was generating:
```
warning: cast from 'FARPROC' ... converts to incompatible function type
[-Wcast-function-type-mismatch]
```
[clang-doc] Initialize member variable (#191570)
We don't always initialize the IsType field in the current
implementation. We can ensure this field is always initialized to
`false`, and avoid any UB due to garbage data.
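A minimal illustration of the fix pattern, assuming a default member initializer is used. Only the field name `IsType` comes from the message; the struct `ExampleInfo` is a hypothetical stand-in, not the real clang-doc type.

```cpp
// Hypothetical stand-in for the clang-doc structure in question.
struct ExampleInfo {
    // With a default member initializer, every constructor that does not
    // set the field explicitly leaves it as false instead of indeterminate.
    bool IsType = false;
};
```

Without the initializer, `ExampleInfo Info;` at block scope would leave `IsType` indeterminate, and reading it would be undefined behavior.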
[mlir][sparse][gpu] fix sparse GPU codegen out buffer (#189221)
When lowering sparse tensor operations to GPU code using
`-sparse-gpu-codegen`, the generated `gpu.memcpy` op for device-to-host
copy was targeting the wrong buffer. In my case, it did not copy back
the output buffer and instead only copied back the input positions
buffer which results in the output buffer in host memory being empty.
The `SparseGPUCodegen` pass carries an assumption that the first buffer
is the out buffer. It looks like this assumption is not always true, as
in my case it's the input positions buffer which made it the only buffer
getting copied back to host.
This change introduces a fix by removing the assumption and replacing it
with an analysis that checks for `memref::StoreOp` and write
MemoryEffects. This change also adds a regression test which highlights
the problematic edge case.
Assisted by Gemini 3.1 Pro for finding the issue of using incorrect
buffers in `gpu.memcpy` op in the lowered code.
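A toy model of the selection logic, in plain C++ rather than MLIR, to show the difference between "copy back the first buffer" and "copy back whatever the kernel writes". The types and the function `buffers_to_copy_back` are hypothetical; the real pass inspects `memref::StoreOp` and write MemoryEffects instead of a boolean flag.

```cpp
#include <string>
#include <vector>

// Toy model, not MLIR: each kernel buffer records whether it is written.
struct KernelBuffer {
    std::string name;
    bool written;  // stands in for "has a store or a write memory effect"
};

// Select buffers for the device-to-host copy by what is written,
// not by position in the argument list.
std::vector<std::string> buffers_to_copy_back(
    const std::vector<KernelBuffer>& bufs) {
    std::vector<std::string> out;
    for (const auto& b : bufs)
        if (b.written)
            out.push_back(b.name);
    return out;
}
```

With the input positions buffer listed first and the output buffer second, a position-based rule copies back the wrong one, while the write-based rule picks the output buffer.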
[clang-doc] Initialize member variable
We don't always initialize the IsType field in the current
implementation. We can ensure this field is always initialized to
`false`, and avoid any UB due to garbage data.
NAS-140050 / 27.0.0-BETA.1 / Remove ips/interfaces fields from tnc configuration (by sonicaj) (#18720)
Automatic cherry-pick failed. Please resolve conflicts by running:
git reset --hard HEAD~1
git cherry-pick -x e051b5507eac4371aa721495215e492538074bf7
If the original PR was merged via a squash, you can just cherry-pick the
squashed commit:
git reset --hard HEAD~1
git cherry-pick -x 2c4add974e477f95a4233d93371dc4221b9b0675
This PR removes the ips/interfaces fields from the TNC
configuration. These fields were earlier used to determine which IPs
the TNC domain should resolve to. We have simplified the
implementation: instead of asking the user, we now rely on
`system.general.config`, where the ipv4/ipv6 values determine which
IPs the TNC domain name resolves to. In case we have wildcard set in
[7 lines not shown]
[Codegen, X86] Add prefetch insertion based on Propeller profile (#166324)
This PR implements the prefetch insertion in the InsertCodePrefetch pass
based on the
[RFC](https://discourse.llvm.org/t/rfc-code-prefetch-insertion/88668).
If the prefetch target is not defined in the same module (i.e., the prefetch
target function is not defined in the same module), we emit a fallback
weak symbol after the prefetch instruction so that if the symbol is
never defined, we don't get an undefined symbol error and the prefetch
instruction prefetches the next address:
```
prefetchit1 __llvm_prefetch_target_foo(%rip)
.weak __llvm_prefetch_target_foo
__llvm_prefetch_target_foo:
```
The weak symbol semantics are tied to ELF, so this makes this PR
target-dependent.
[SystemZ][z/OS] Show instruction encoding in HLASM output
This change adds support for showing the instruction encoding as a comment
when emitting HLASM text. With this, the last 2 LIT tests migrate to
HLASM syntax.