[LLD][ELF] Do not reuse thunks in OVERLAYs (#200415)
We cannot guarantee that a thunk in an OVERLAY will be in memory at the
same time as the caller if the caller is not in the same output section.
It is safe for a caller in an OVERLAY to reuse a thunk in a non-OVERLAY
section as we know that will be in memory. Thunks that are placed
before their target are alternative entry points and can also be reused.
Resurrect the isThunkSectionCompatible function that was recently
removed as it served a similar purpose for thunks in different
partitions.
Potentially fixes #199966 which mentions a similar problem for sections
assigned to TCM (Tightly Coupled Memory). It should be possible to model
a TCM as an OVERLAY. If not then there may need to be a command-line
option to inhibit thunk sharing across output sections.
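The reuse rule described above can be sketched as a small predicate. This is an illustrative model only, not LLD's actual `isThunkSectionCompatible`: the `OutputSec` and `ThunkInfo` types, their fields, and the `canReuseThunk` name are all hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model of an output section.
struct OutputSec {
  std::string name;
  bool inOverlay; // true if the section is placed inside an OVERLAY
};

// Hypothetical description of an already-created thunk.
struct ThunkInfo {
  const OutputSec *sec;    // output section that contains the thunk
  bool placedBeforeTarget; // thunk is an alternative entry point to its target
};

// Returns true if a caller located in `caller` may reuse `thunk`.
bool canReuseThunk(const OutputSec &caller, const ThunkInfo &thunk) {
  assert(thunk.sec && "a thunk must live in some output section");
  // A thunk placed directly before its target is an alternative entry point.
  if (thunk.placedBeforeTarget)
    return true;
  // A thunk outside any OVERLAY is known to be in memory.
  if (!thunk.sec->inOverlay)
    return true;
  // A thunk inside an OVERLAY is only safe for callers in the same section.
  return thunk.sec == &caller;
}
```

In this model a caller in one overlay section is refused a thunk that lives in a different overlay section, while a thunk in a non-OVERLAY section is accepted from any caller.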
[orc-rt] Treat empty path as "process symbols" in NativeDylibManager. (#202905)
NativeDylibManager::load now handles an empty path by returning the
process's global lookup handle (RTLD_DEFAULT on POSIX) directly,
bypassing dlopen and the shutdown-time dlclose registration. This
matches the behavior of OrcTargetProcess's SimpleExecutorDylibManager.
[AMDGPU] Support Wave Reduction for i16 types - 1 (#194808)
Supported Ops: `min`, `umin`, `max`, `umax`.
16-bit wave reduce ops are promoted to 32-bit
operations before ISEL. From there they use the
existing implementations for 32-bit reductions.
Assisted by - Claude-sonnet:4.6
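The promotion idea can be modelled on the host as follows. This is a rough sketch, not the AMDGPU lowering: a wave is represented as a vector of active-lane values, the `waveReduce*` function names are hypothetical, and the choice of zero-extension for unsigned ops and sign-extension for signed ops is an assumption about how the promotion preserves ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins for the existing 32-bit reductions over the active lanes.
uint32_t waveReduceUMin32(const std::vector<uint32_t> &lanes) {
  assert(!lanes.empty() && "model assumes at least one active lane");
  return *std::min_element(lanes.begin(), lanes.end());
}

int32_t waveReduceSMax32(const std::vector<int32_t> &lanes) {
  assert(!lanes.empty() && "model assumes at least one active lane");
  return *std::max_element(lanes.begin(), lanes.end());
}

// 16-bit umin: zero-extend each lane, reuse the 32-bit op, truncate the result.
uint16_t waveReduceUMin16(const std::vector<uint16_t> &lanes) {
  std::vector<uint32_t> wide(lanes.begin(), lanes.end());
  return static_cast<uint16_t>(waveReduceUMin32(wide));
}

// 16-bit signed max: sign-extend each lane so negative values keep their order.
int16_t waveReduceSMax16(const std::vector<int16_t> &lanes) {
  std::vector<int32_t> wide(lanes.begin(), lanes.end());
  return static_cast<int16_t>(waveReduceSMax32(wide));
}
```

Because the extension preserves the relative order of the 16-bit inputs, the truncated 32-bit result is always one of the original lane values.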
[AMDGPU] Support Wave Reduction for i16 types - 3 (#194812)
Supported Ops: `and`, `or`, `xor`.
Supports only the iterative strategy; DPP is yet
to be supported.
Supports only Fake-16 versions of the lowering.
True-16 support is yet to be added.
[AMDGPU] Support Wave Reduction for i16 types - 2 (#194810)
Supported Ops: `add`, `sub`.
Supports only the iterative strategy; DPP is yet
to be supported.
Supports only Fake-16 versions of the lowering.
True-16 support is yet to be added.
[AMDGPU][InsertWaitCnts] Move HWEvent analysis code
Building up on the previous RFC, if it is accepted:
Move the code that maps a MachineInstr to HWEventSet to a separate file.
This should be NFC.
[llvm][ADT] Make ImmutableList conform to the fwd iterator concept (#202580)
We missed post increment and a couple of typedefs. This would enable
llvm algorithms like filter_range, etc.
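For illustration, here is what a forward iterator over a singly linked list needs, including the post-increment operator and the nested typedefs mentioned above. This is a standalone sketch, not the `ImmutableList` code: `Node` and `NodeIterator` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical immutable singly linked list node.
struct Node {
  int value;
  const Node *next;
};

class NodeIterator {
  const Node *cur = nullptr; // nullptr doubles as the end iterator

public:
  // Typedefs that std::iterator_traits and generic algorithms look for.
  using iterator_category = std::forward_iterator_tag;
  using value_type = int;
  using difference_type = std::ptrdiff_t;
  using pointer = const int *;
  using reference = const int &;

  NodeIterator() = default;
  explicit NodeIterator(const Node *n) : cur(n) {}

  reference operator*() const {
    assert(cur && "dereferencing the end iterator");
    return cur->value;
  }
  pointer operator->() const { return &**this; }

  // Pre-increment: advance and return the updated iterator.
  NodeIterator &operator++() {
    assert(cur && "incrementing the end iterator");
    cur = cur->next;
    return *this;
  }
  // Post-increment: advance but return the previous position.
  NodeIterator operator++(int) {
    NodeIterator old = *this;
    ++*this;
    return old;
  }

  bool operator==(const NodeIterator &other) const { return cur == other.cur; }
  bool operator!=(const NodeIterator &other) const { return !(*this == other); }
};
```

With these members in place, standard utilities such as `std::distance` accept the iterator directly.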
AMDGPU/GlobalISel: Implement RegBankLegalize rules for SALUFloat variants of G_INTRINSIC_TRUNC, G_FFLOOR and G_FCEIL. (#187679)
As requested on PR #179954.
[lldb][docs] Document what a Platform is (#202332)
Fixes #201875.
In #201875 a user was understandably confused about what a platform even is,
and I had never had to explain it from the conceptual point of view
either.
So I wrote a long explanation
(https://github.com/llvm/llvm-project/issues/201875#issuecomment-4634087717)
specific to what they were trying to do. I don't think we need all that
in the docs and we don't have a great place for it anyway.
My alternative is:
* A high level explanation in the overview, to say what a platform does.
* A link from there to https://lldb.llvm.org/use/remote.html which has a
practical example of using one.
* A note in the platform extensions doc that our platform mode is not
related to gdb's extended remote.
[3 lines not shown]
[clang][OpenMP] Improve loop structure for distributed loops (pt 2)
This patch complements https://github.com/llvm/llvm-project/pull/201670
for non-reduction loops and wires the existing
`kmp_sched_distr_static_chunk_sched_static_chunkone` to be used by
CodeGen for these loops as well.
[AArch64] Use PNR rather than PPR register class for aarch64svcount (#202394)
While predicates and predicate-as-counter both use the same underlying
registers, within LLVM they use different register classes (PPR vs PNR).
Mapping aarch64svcount to the PPRRegClass results in unnecessary
cross-register-class copies around PHIs, which in turn produce
unnecessary moves.
[libc++] Make __is_less_than_compatable a variable template (#202525)
This makes the code a bit more readable and improves compile times a
bit, since variable templates are faster to instantiate than class
templates.
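The difference between the two forms can be sketched with a trait that checks whether `a < b` is a valid expression. This is not the libc++ implementation: `has_less_class` and `has_less_v` are hypothetical names chosen for the example.

```cpp
#include <utility>

// Class-template form: every query instantiates a class specialization.
template <class T, class U, class = void>
struct has_less_class {
  static constexpr bool value = false;
};
template <class T, class U>
struct has_less_class<T, U,
                      decltype(void(std::declval<T>() < std::declval<U>()))> {
  static constexpr bool value = true;
};

// Variable-template form: the same check, written as a constant.
template <class T, class U, class = void>
constexpr bool has_less_v = false;
template <class T, class U>
constexpr bool
    has_less_v<T, U, decltype(void(std::declval<T>() < std::declval<U>()))> =
        true;
```

Call sites also get shorter: `has_less_v<T, U>` replaces `has_less_class<T, U>::value`.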
[SPIR-V] Look up printf format string type in the correct function (#201523)
addPrintfRequirements() resolved the SPIR-V type of the format string
operand via getSPIRVTypeForVReg() without passing the instruction's
parent MachineFunction, so the lookup defaulted to the registry's CurMF:
whichever function happened to be processed last. Virtual register
numbers are only unique within a function, so in multi-function modules
the check could inspect an unrelated function's type, misreading its
second operand as the format string's storage class (an OpTypeInt's
width immediate, in the added test). For a format string in the constant
address space this spuriously triggered the fatal
"SPV_EXT_relaxed_printf_string_address_space is required" error, or
silently added the unnecessary extension when it was available;
conversely, the requirement could be silently omitted when the colliding
vreg had no recorded type.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
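The failure mode can be reproduced with a toy registry. This is a simplified model, not the SPIR-V backend's registry: `TypeRegistry`, `FunctionId`, and the string type names are hypothetical, and the default argument stands in for a lookup that falls back to the most recently processed function.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using FunctionId = int; // hypothetical stand-in for a MachineFunction
using VReg = unsigned;  // virtual register number, unique only per function

class TypeRegistry {
  std::map<std::pair<FunctionId, VReg>, std::string> types;
  FunctionId curFn = -1; // function that was processed last

public:
  void record(FunctionId fn, VReg reg, std::string type) {
    assert(fn >= 0 && "function ids are non-negative in this model");
    types[{fn, reg}] = std::move(type);
    curFn = fn;
  }

  // Returns nullptr when no type is recorded for the register.
  // Omitting `fn` silently falls back to the last processed function.
  const std::string *lookup(VReg reg, FunctionId fn = -1) const {
    auto it = types.find({fn < 0 ? curFn : fn, reg});
    return it == types.end() ? nullptr : &it->second;
  }
};
```

If functions 0 and 1 both use vreg 5, calling `lookup(5)` after function 1 has been processed returns function 1's type even when the instruction being checked belongs to function 0. Passing the owning function explicitly, as in `lookup(5, 0)`, avoids the collision.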
[RISCV] Add riscv_packed_simd.h for P extension intrinsics (#181115)
Add `riscv_packed_simd.h` with initial RISC-V P extension intrinsics, covering:
- Packed Splat
- Packed Addition and Subtraction
- Packed Addition with Scalar
- Packed Saturating Addition and Subtraction
- Packed Shift-Add
- Packed Minimum and Maximum
- Packed Shifts
- Packed Logical Operations
The intrinsics are implemented as thin wrappers over standard C operators
and existing generic builtins (`__builtin_elementwise_add_sat` etc.), letting
the RISC-V backend lower the resulting `<N x iN>` IR to P-ext instructions.
No new clang builtins or `llvm.riscv.*` intrinsics are introduced.
Spec: https://github.com/riscv/riscv-p-spec/blob/master/P-ext-intrinsics.adoc
[InstCombine] Remove knowledge retention folding (#202890)
The knowledge retention API for simplifying assumes isn't that useful
anymore, since most simplifications done by it are now done
unconditionally directly in InstCombine. It's also known to miscompile
multiple patterns.