Reland [Allocator] Keep bump pointer at a minimum alignment (#205240)
Reland #203718 (reverted in #205091) by making computation in integer
domain to avoid UB (nullptr + non-zero offset).
Add a `MinAlign` template parameter (default 8, sizeof(size_t) on 64-bit
platforms) so that the common case `Alignment <= MinAlign` can skip
realigning `CurPtr`.
This is achieved by rounding each allocation's size up to MinAlign, so
the bump pointer stays MinAlign-aligned between allocations.
SpecificBumpPtrAllocator::DestroyAll() walks objects at a fixed
sizeof(T) stride and needs tight packing, so it uses MinAlign=1.
(alignof(T) would
pack just as tightly and reuse the default instantiation, but T may be
incomplete here, e.g. `SpecificBumpPtrAllocator<MCSectionELF>`.)
Its `Allocate` still skips the realign: the slab is max_align_t-aligned
[9 lines not shown]
[OpenMPOpt][Attributor] Selectively seed deglobalization AAs (#198710)
This addresses a compile-time issue observed on a large generated C++
translation unit compiled with `-fopenmp`.
The source code is not OpenMP-heavy. It mainly consists of generated
function-registration wrappers, template instantiations, lambdas, and
small helper functions. However, because the TU is compiled with OpenMP
enabled, `OpenMPOptCGSCCPass` runs and drives Attributor on a module
with many functions.
`OpenMPOpt::registerAAsForFunction` currently eagerly creates the
deglobalization AAs for every function in OpenMP device modules:
* `AAHeapToShared`
* `AAHeapToStack`
Most generated wrapper/helper functions in the motivating workload do
not contain `__kmpc_alloc_shared`, removable allocations, or free-like
[25 lines not shown]
[AMDGPU] Fold constant offsets into named barrier addresses
Allow isOffsetFoldingLegal to fold a constant offset into an LDS
named-barrier global, and include the node offset when materializing the
LDS address in LowerGlobalAddress. s_barrier_signal_var on a GEP'd named
barrier now selects the immediate form, matching a bare global and GlobalISel.
With object linking the offset folds into the relocation addend.
Change-Id: I639bc723eb001573585cc05d0ad19f2773054f21
Assisted-by: Cursor
[AMDGPU] Pre-commit test for constant-offset named barrier signal_var
A GEP into a named-barrier array (&bars[1]) lowers s_barrier_signal_var to
the dynamic m0 form on SelectionDAG, unlike the bare global and GlobalISel.
With object linking it emits a runtime add of the offset instead of folding
it into the relocation addend.
Change-Id: I7cea0dd64d050eb3e2143841e7136355cbb3bc50
Assisted-by: Cursor
[AMDGPU] Fold constant offsets into named barrier addresses
Allow isOffsetFoldingLegal to fold a constant offset into an LDS
named-barrier global, and include the node offset when materializing the
LDS address in LowerGlobalAddress. s_barrier_signal_var on a GEP'd named
barrier now selects the immediate form, matching a bare global and GlobalISel.
With object linking the offset folds into the relocation addend.
Change-Id: Ie05b8c8cd127604ff174c423a74340fd2de4e405
Assisted-by: Cursor
[AMDGPU] Pre-commit test for constant-offset named barrier signal_var
A GEP into a named-barrier array (&bars[1]) lowers s_barrier_signal_var to
the dynamic m0 form on SelectionDAG, unlike the bare global and GlobalISel.
With object linking it emits a runtime add of the offset instead of folding
it into the relocation addend.
Change-Id: I59f0e6fe6a72b4c96c8efb926610f7f2d3833e38
Assisted-by: Cursor
[RISCV][P-ext] packed exchanged add/sub codegen (#203473)
Wire up the already-defined exchanged add/sub instructions
pas/psa/psas/pssa/paas/pasa with llvm.riscv.* intrinsics and isel
patterns.
[Instrumentor] Add runtime examples: [2/N] A FP precision analysis
Second example:
Check all floating point operations and track if they could be done at
lower precision.
Partially developped by Claude (AI), tested and verified by me.
[BPF] Increase BPFMaxStoresPerMemFunc from 128 to 192 (#205222)
With commits [1] and [2], memory operations like memcpy/memmove lower to
a sequence of loads/stores whose width is the minimum of the source and
destination alignment, and the store count is bounded by
BPFMaxStoresPerMemFunc. For 1-byte alignment, the maximum copy length
that can be inlined is therefore 128 bytes.
This may regress cases that previously inlined. Consider a memcpy with
src alignment 8, dst alignment 1 and size 136. After [1]/[2], the store
width is the minimum alignment (1 byte), so the store count is 136,
which exceeds the 128 limit and the copy falls back. Before [1]/[2], the
store count was computed with a fixed 8-byte unit regardless of the
actual alignment (each unit expands to 8 one-byte stores when the
minimum alignment is 1), so the total count was only 17 (136/8 < 128)
and the copy was inlined.
Raise the limit from 128 to 192 to mitigate. Alternatively, users can
increase alignment to avoid the regression.
[2 lines not shown]
[Instrumentor] Add runtime examples: [2/N] A FP precision analysis
Second example:
Check all floating point operations and track if they could be done at
lower precision.
Partially developped by Claude (AI), tested and verified by me.
[Instrumentor] Add runtime examples: [1/N] A flop counter
This adds a instrumentor-examples folder into compiler RT to showcase
use cases of the instrumentor. The initial example is a program that,
via instrumentation, counts the number of flops performed.
Partially developped by Claude (AI), tested and verified by me.
[OpenCL] Warn if filter_mode is linear in read_image{i|ui} (#204086)
Per OpenCL spec:
The read_image{i|ui} calls support a nearest filter only. The
filter_mode specified in sampler must be set to CLK_FILTER_NEAREST;
otherwise the values returned are undefined.
Warn users when they apply a linear filter accidentally.
Address https://github.com/intel/compute-runtime/issues/379#issuecomment-4592083032
Assisted-by: Claude Sonnet 4.6
[HLSL] Emit lifetime.start before copy-in for inout parameters (#191917)
For inout parameters, Clang was emitting lifetime.start after the
copy-in store that initializes the temporary. Per LLVM's lifetime
semantics, any access to memory outside its lifetime is undefined
behavior, so the copy-in store was technically UB and the value was
undefined after lifetime.start.
Move EmitLifetimeStart into EmitHLSLOutArgLValues so that it is emitted
before EmitInitializationToLValue, putting the copy-in store within the
lifetime of the temporary.
---------
Co-authored-by: Alexandre Isoard <alexandre.isoard at amd.com>
Co-authored-by: Deric C. <cheung.deric at gmail.com>
[lldb] Survive ptrace(PT_DENY_ATTACH) when attaching (#204688) (#205198)
A process can opt out of being debugged with ptrace(PT_DENY_ATTACH). The
XNU kernel enforces this by delivering SIGSEGV to the *attaching*
process while it is still inside the ptrace(PT_ATTACHEXC) syscall. This
means debugserver gets killed before it can inspect the result. LLDB
only sees the dropped connection ("error: attach failed: lost
connection").
The condition can't be detected up front: the target's P_LNOATTACH flag
is not exposed to userspace. To work around this, install a temporary
SIGSEGV handler around the ptrace(PT_ATTACHEXC) call in AttachForDebug
and siglongjmp back out if it fires, turning the fatal signal into an
EPERM that propagates to lldb as a clear message:
```
error: attach failed: cannot attach to process N because it has
disabled debugging via ptrace(PT_DENY_ATTACH)
```
[7 lines not shown]
[Clang] Fix crash when comparing fixed point type with BitInt (#199912)
Fixes #196948
Added checks in `handleFixedPointConversion`: reject fixed point/BitInt
comparisons
Now clang properly emits an error instead of crashing.
---------
Co-authored-by: cry <2091136672 at foxmail.com>
Change 'benign-entity-name-conflict.cpp' to
'entity-name-no-conflict.cpp' because it is a USR generation bug even
though the erroneous behavior is benign in this example.
[Instrumentor] Add runtime examples: [1/N] A flop counter
This adds a instrumentor-examples folder into compiler RT to showcase
use cases of the instrumentor. The initial example is a program that,
via instrumentation, counts the number of flops performed.
Partially developped by Claude (AI), tested and verified by me.