[AMDGPU] Fold constant offsets into named barrier addresses
Allow isOffsetFoldingLegal to fold a constant offset into an LDS
named-barrier global, and include the node offset when materializing the
LDS address in LowerGlobalAddress. s_barrier_signal_var on a GEP'd named
barrier now selects the immediate form, matching a bare global and GlobalISel.
With object linking the offset folds into the relocation addend.
Change-Id: I639bc723eb001573585cc05d0ad19f2773054f21
Assisted-by: Cursor
[AMDGPU] Pre-commit test for constant-offset named barrier signal_var
A GEP into a named-barrier array (&bars[1]) lowers s_barrier_signal_var to
the dynamic m0 form on SelectionDAG, unlike the bare global and GlobalISel.
With object linking it emits a runtime add of the offset instead of folding
it into the relocation addend.
Change-Id: I7cea0dd64d050eb3e2143841e7136355cbb3bc50
Assisted-by: Cursor
[AMDGPU] Fold constant offsets into named barrier addresses
Allow isOffsetFoldingLegal to fold a constant offset into an LDS
named-barrier global, and include the node offset when materializing the
LDS address in LowerGlobalAddress. s_barrier_signal_var on a GEP'd named
barrier now selects the immediate form, matching a bare global and GlobalISel.
With object linking the offset folds into the relocation addend.
Change-Id: Ie05b8c8cd127604ff174c423a74340fd2de4e405
Assisted-by: Cursor
[AMDGPU] Pre-commit test for constant-offset named barrier signal_var
A GEP into a named-barrier array (&bars[1]) lowers s_barrier_signal_var to
the dynamic m0 form on SelectionDAG, unlike the bare global and GlobalISel.
With object linking it emits a runtime add of the offset instead of folding
it into the relocation addend.
Change-Id: I59f0e6fe6a72b4c96c8efb926610f7f2d3833e38
Assisted-by: Cursor
Merge tag 'erofs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"The most notable change is the removal of the fscache backend: it has
been deprecated for almost two years, mainly because EROFS file-backed
mounts and fanotify pre-content hooks (together with erofs-utils) now
provide better functionality and simpler codebase. In addition,
fscache has depended on netfslib for years, which is undesirable for
EROFS since it is a local filesystem. More details in [1].
In addition, sparse support has been added to the pcluster layout,
which is helpful for large sparse AI datasets, and map requests for
chunk-based inodes have been optimized to be more efficient as well.
There are also the usual fixes and cleanups.
Summary:
- Report more consecutive chunks of the same type for
each iomap request
[21 lines not shown]
don't increment scatterlist length twice
this occurs as sg_dma_len() returns the length member of struct scatterlist
where as on x86 linux it returns a dma_length member of the struct
Problem reported by Ryan Fahy in FreeBSD drm-kmod PR 468.
Avoids a 'Data modified on freelist' panic on boot when using discrete
Intel cards (DG2). DG2 has other issues, so remains disabled for now.
[RISCV][P-ext] packed exchanged add/sub codegen (#203473)
Wire up the already-defined exchanged add/sub instructions
pas/psa/psas/pssa/paas/pasa with llvm.riscv.* intrinsics and isel
patterns.
[Instrumentor] Add runtime examples: [2/N] A FP precision analysis
Second example:
Check all floating point operations and track if they could be done at
lower precision.
Partially developped by Claude (AI), tested and verified by me.
nfscl: Add support for flexible file layout striping
Commit 72e57bc26417 added support for striping to the pNFS
server configuration. This patch adds support for striping
to the NFS client.
For striped flexible file layouts, an extra structure
must be malloc()d for each stripe, since the number
of stripe servers can vary from one mirror to another.
This new structure is called nfsffs and a single one
of these structures is in the nfsffm structure so that
the non-striped layouts can avoid the additional malloc()'s.
This patch only affects NFSv4.1/4.2 mounts that use the
"pnfs" mount option against servers that support the
flexible file layout.
[BPF] Increase BPFMaxStoresPerMemFunc from 128 to 192 (#205222)
With commits [1] and [2], memory operations like memcpy/memmove lower to
a sequence of loads/stores whose width is the minimum of the source and
destination alignment, and the store count is bounded by
BPFMaxStoresPerMemFunc. For 1-byte alignment, the maximum copy length
that can be inlined is therefore 128 bytes.
This may regress cases that previously inlined. Consider a memcpy with
src alignment 8, dst alignment 1 and size 136. After [1]/[2], the store
width is the minimum alignment (1 byte), so the store count is 136,
which exceeds the 128 limit and the copy falls back. Before [1]/[2], the
store count was computed with a fixed 8-byte unit regardless of the
actual alignment (each unit expands to 8 one-byte stores when the
minimum alignment is 1), so the total count was only 17 (136/8 < 128)
and the copy was inlined.
Raise the limit from 128 to 192 to mitigate. Alternatively, users can
increase alignment to avoid the regression.
[2 lines not shown]
[Instrumentor] Add runtime examples: [2/N] A FP precision analysis
Second example:
Check all floating point operations and track if they could be done at
lower precision.
Partially developped by Claude (AI), tested and verified by me.
[Instrumentor] Add runtime examples: [1/N] A flop counter
This adds a instrumentor-examples folder into compiler RT to showcase
use cases of the instrumentor. The initial example is a program that,
via instrumentation, counts the number of flops performed.
Partially developped by Claude (AI), tested and verified by me.
[OpenCL] Warn if filter_mode is linear in read_image{i|ui} (#204086)
Per OpenCL spec:
The read_image{i|ui} calls support a nearest filter only. The
filter_mode specified in sampler must be set to CLK_FILTER_NEAREST;
otherwise the values returned are undefined.
Warn users when they apply a linear filter accidentally.
Address https://github.com/intel/compute-runtime/issues/379#issuecomment-4592083032
Assisted-by: Claude Sonnet 4.6
[HLSL] Emit lifetime.start before copy-in for inout parameters (#191917)
For inout parameters, Clang was emitting lifetime.start after the
copy-in store that initializes the temporary. Per LLVM's lifetime
semantics, any access to memory outside its lifetime is undefined
behavior, so the copy-in store was technically UB and the value was
undefined after lifetime.start.
Move EmitLifetimeStart into EmitHLSLOutArgLValues so that it is emitted
before EmitInitializationToLValue, putting the copy-in store within the
lifetime of the temporary.
---------
Co-authored-by: Alexandre Isoard <alexandre.isoard at amd.com>
Co-authored-by: Deric C. <cheung.deric at gmail.com>
[lldb] Survive ptrace(PT_DENY_ATTACH) when attaching (#204688) (#205198)
A process can opt out of being debugged with ptrace(PT_DENY_ATTACH). The
XNU kernel enforces this by delivering SIGSEGV to the *attaching*
process while it is still inside the ptrace(PT_ATTACHEXC) syscall. This
means debugserver gets killed before it can inspect the result. LLDB
only sees the dropped connection ("error: attach failed: lost
connection").
The condition can't be detected up front: the target's P_LNOATTACH flag
is not exposed to userspace. To work around this, install a temporary
SIGSEGV handler around the ptrace(PT_ATTACHEXC) call in AttachForDebug
and siglongjmp back out if it fires, turning the fatal signal into an
EPERM that propagates to lldb as a clear message:
```
error: attach failed: cannot attach to process N because it has
disabled debugging via ptrace(PT_DENY_ATTACH)
```
[7 lines not shown]
[Clang] Fix crash when comparing fixed point type with BitInt (#199912)
Fixes #196948
Added checks in `handleFixedPointConversion`: reject fixed point/BitInt
comparisons
Now clang properly emits an error instead of crashing.
---------
Co-authored-by: cry <2091136672 at foxmail.com>