[RISCV][P-ext] packed exchanged add/sub codegen (#203473)
Wire up the already-defined exchanged add/sub instructions
pas/psa/psas/pssa/paas/pasa with llvm.riscv.* intrinsics and isel
patterns.
[Instrumentor] Add runtime examples: [2/N] A FP precision analysis
Second example:
Check all floating point operations and track if they could be done at
lower precision.
Partially developed by Claude (AI), tested and verified by me.
[BPF] Increase BPFMaxStoresPerMemFunc from 128 to 192 (#205222)
With commits [1] and [2], memory operations like memcpy/memmove lower to
a sequence of loads/stores whose width is the minimum of the source and
destination alignment, and the store count is bounded by
BPFMaxStoresPerMemFunc. For 1-byte alignment, the maximum copy length
that can be inlined is therefore 128 bytes.
This may regress cases that previously inlined. Consider a memcpy with
src alignment 8, dst alignment 1 and size 136. After [1]/[2], the store
width is the minimum alignment (1 byte), so the store count is 136,
which exceeds the 128 limit and the copy falls back. Before [1]/[2], the
store count was computed with a fixed 8-byte unit regardless of the
actual alignment (each unit expands to 8 one-byte stores when the
minimum alignment is 1), so the total count was only 17 (136/8 < 128)
and the copy was inlined.
Raise the limit from 128 to 192 to mitigate. Alternatively, users can
increase alignment to avoid the regression.
[2 lines not shown]
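The arithmetic in this message can be checked with a small sketch. It is a simplified model, not the BPF backend's code: it assumes every store uses one uniform width, and all names (`kOldLimit`, `kNewLimit`, `storesByMinAlign`, `storesByFixedUnit`, `isInlined`) are hypothetical.

```cpp
#include <cstdint>

// Hypothetical constants mirroring the old and new BPFMaxStoresPerMemFunc values.
constexpr std::uint64_t kOldLimit = 128;
constexpr std::uint64_t kNewLimit = 192;

// Store count after [1]/[2]: the store width is the minimum of the source
// and destination alignment (simplified: one uniform width, rounded up).
constexpr std::uint64_t storesByMinAlign(std::uint64_t size,
                                         std::uint64_t srcAlign,
                                         std::uint64_t dstAlign) {
  return (size + (srcAlign < dstAlign ? srcAlign : dstAlign) - 1) /
         (srcAlign < dstAlign ? srcAlign : dstAlign);
}

// Store count before [1]/[2]: a fixed 8-byte unit regardless of alignment.
constexpr std::uint64_t storesByFixedUnit(std::uint64_t size) {
  return (size + 7) / 8;
}

// A copy is inlined only if its store count does not exceed the limit.
constexpr bool isInlined(std::uint64_t stores, std::uint64_t limit) {
  return stores <= limit;
}

// The memcpy from the message: src alignment 8, dst alignment 1, size 136.
static_assert(storesByFixedUnit(136) == 17, "old count: 136 / 8");
static_assert(storesByMinAlign(136, 8, 1) == 136, "new count: 1-byte stores");
static_assert(!isInlined(136, kOldLimit), "regresses with the 128 limit");
static_assert(isInlined(136, kNewLimit), "inlined again with the 192 limit");
```

Under this model, raising either alignment so that the minimum is 2 bytes or more would also bring the 136-byte copy back under the old limit.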
[Instrumentor] Add runtime examples: [1/N] A flop counter
This adds an instrumentor-examples folder to compiler-rt to showcase
use cases of the instrumentor. The initial example is a program that,
via instrumentation, counts the number of flops performed.
Partially developed by Claude (AI), tested and verified by me.
[OpenCL] Warn if filter_mode is linear in read_image{i|ui} (#204086)
Per OpenCL spec:
The read_image{i|ui} calls support a nearest filter only. The
filter_mode specified in sampler must be set to CLK_FILTER_NEAREST;
otherwise the values returned are undefined.
Warn users when they apply a linear filter accidentally.
Address https://github.com/intel/compute-runtime/issues/379#issuecomment-4592083032
Assisted-by: Claude Sonnet 4.6
[HLSL] Emit lifetime.start before copy-in for inout parameters (#191917)
For inout parameters, Clang was emitting lifetime.start after the
copy-in store that initializes the temporary. Per LLVM's lifetime
semantics, any access to memory outside its lifetime is undefined
behavior, so the copy-in store was technically UB and the value was
undefined after lifetime.start.
Move EmitLifetimeStart into EmitHLSLOutArgLValues so that it is emitted
before EmitInitializationToLValue, putting the copy-in store within the
lifetime of the temporary.
---------
Co-authored-by: Alexandre Isoard <alexandre.isoard at amd.com>
Co-authored-by: Deric C. <cheung.deric at gmail.com>
[lldb] Survive ptrace(PT_DENY_ATTACH) when attaching (#204688) (#205198)
A process can opt out of being debugged with ptrace(PT_DENY_ATTACH). The
XNU kernel enforces this by delivering SIGSEGV to the *attaching*
process while it is still inside the ptrace(PT_ATTACHEXC) syscall. This
means debugserver gets killed before it can inspect the result. LLDB
only sees the dropped connection ("error: attach failed: lost
connection").
The condition can't be detected up front: the target's P_LNOATTACH flag
is not exposed to userspace. To work around this, install a temporary
SIGSEGV handler around the ptrace(PT_ATTACHEXC) call in AttachForDebug
and siglongjmp back out if it fires, turning the fatal signal into an
EPERM that propagates to lldb as a clear message:
```
error: attach failed: cannot attach to process N because it has
disabled debugging via ptrace(PT_DENY_ATTACH)
```
[7 lines not shown]
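The handler-plus-siglongjmp technique described above can be sketched in isolation. This is a minimal POSIX illustration, not debugserver's code: `CallGuarded`, `OnSegv`, `g_jump_buffer`, `DeniedAttach` and `AllowedAttach` are hypothetical names, and `raise(SIGSEGV)` stands in for the kernel delivering the signal during ptrace(PT_ATTACHEXC).

```cpp
#include <errno.h>
#include <setjmp.h>
#include <signal.h>

// Jump target used by the temporary handler (hypothetical name).
static sigjmp_buf g_jump_buffer;

// Temporary SIGSEGV handler: unwind back to the sigsetjmp call site.
extern "C" void OnSegv(int) { siglongjmp(g_jump_buffer, 1); }

// Runs `call` with a temporary SIGSEGV handler installed. Returns 0 if the
// call completes, or EPERM if SIGSEGV fired while it was running.
int CallGuarded(void (*call)()) {
  struct sigaction action = {};
  struct sigaction old_action = {};
  action.sa_handler = OnSegv;
  sigemptyset(&action.sa_mask);
  if (sigaction(SIGSEGV, &action, &old_action) != 0)
    return errno;

  int result = 0;
  // Passing 1 saves the signal mask so that siglongjmp restores it.
  if (sigsetjmp(g_jump_buffer, 1) == 0)
    call();
  else
    result = EPERM; // Reached only through siglongjmp from the handler.

  // Always restore the previous SIGSEGV disposition.
  sigaction(SIGSEGV, &old_action, nullptr);
  return result;
}

// Stand-ins for the guarded call: one is "denied" with SIGSEGV, one succeeds.
void DeniedAttach() { raise(SIGSEGV); }
void AllowedAttach() {}
```

Keeping the handler installed only around the single guarded call means that a genuine crash elsewhere in the process is still reported as a crash.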
[Clang] Fix crash when comparing fixed point type with BitInt (#199912)
Fixes #196948
Add checks in `handleFixedPointConversion` to reject comparisons between
fixed-point and BitInt types.
Clang now emits an error instead of crashing.
---------
Co-authored-by: cry <2091136672 at foxmail.com>
Change 'benign-entity-name-conflict.cpp' to
'entity-name-no-conflict.cpp' because it is a USR generation bug even
though the erroneous behavior is benign in this example.
[docs] Rename LangRef.{rst|md}
Tracking issue: #201242
This commit does not use valid markdown, so the docs will not build, but they will be fixed in an immediate follow-up commit that does the migration.
[CIR] Use the AST result type for sizeof/alignof constants (#203942)
On targets where `size_t` is narrower than 64 bits (e.g. `i686`), CIR
codegen for `sizeof`/`alignof`/`__builtin_vectorelements` crashes with a
type/value bitwidth mismatch.
The result of these expressions is `size_t`, but the emitted integer
constant was built with a hardcoded 64-bit type. `EvaluateKnownConstInt`
returns an `APSInt` with the width of the AST result type (32 bits on
this target), so it no longer matches the `IntAttr`'s type and trips the
`IntAttr` verifier.
### How to Reproduce
```c++
using size_t = decltype(sizeof(int));
size_t size_of_int() { return sizeof(int); }
clang -cc1 -std=c++20 -triple i686-unknown-linux-gnu -fclangir \
-emit-cir test.cpp -o test.cir
[9 lines not shown]
[AMDGPU] Fold constant offsets into named barrier addresses
Allow isOffsetFoldingLegal to fold a constant offset into an LDS
named-barrier global, and include the node offset when materializing the
LDS address in LowerGlobalAddress. s_barrier_signal_var on a GEP'd named
barrier now selects the immediate form, matching a bare global and GlobalISel.
Change-Id: I2ce500917c3d47cd3687473406decc7430d73361
Assisted-by: Cursor
[AMDGPU] Pre-commit test for constant-offset named barrier signal_var
A GEP into a named-barrier array (&bars[1]) lowers s_barrier_signal_var to
the dynamic m0 form on SelectionDAG, unlike the bare global and GlobalISel.
Change-Id: I8846eb200b1e28785adfdfcaa082390170f4ea2d
Assisted-by: Cursor
[lldb] Use heuristics to extend rather than replace error message (#205196)
When an attach fails, HandlePacket_A tries to explain why. The last two
checks are heuristics that discard any error debugserver already
produced for this specific failure.
The guess can be wrong. For example, the PT_DENY_ATTACH test case from
#204688 is incorrectly reported as failing because it runs in a
non-interactive debug session on the bots.
Include debugserver's real error in the heuristic message instead of
replacing it, so the real reason is never lost.
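As a rough illustration of "extend rather than replace", a helper along these lines would keep both messages. `ExtendAttachError` and the parenthesized output format are assumptions made for this sketch; they are not taken from HandlePacket_A.

```cpp
#include <string>

// Hypothetical helper: combine a heuristic explanation with the error that
// debugserver already produced, so the real error is never discarded.
std::string ExtendAttachError(const std::string &heuristic,
                              const std::string &real_error) {
  if (real_error.empty())
    return heuristic; // Nothing to preserve: report the heuristic alone.
  if (heuristic.empty())
    return real_error; // No heuristic matched: report the real error.
  // Assumed format: heuristic first, real error appended in parentheses.
  return heuristic + " (" + real_error + ")";
}
```

With this shape, a wrong guess can still mislead, but the reader always sees the underlying error next to it.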