[LLD][ELF] Do not reuse thunks in OVERLAYs (#200415)
We cannot guarantee that a thunk in an OVERLAY will be in memory at the
same time as the caller if the caller is not in the same output section.
It is safe for a caller in an OVERLAY to reuse a thunk in a non-OVERLAY
section, as we know that section will be in memory. Thunks that are placed
before their target are alternative entry points and can also be reused.
Resurrect the isThunkSectionCompatible function that was recently
removed as it served a similar purpose for thunks in different
partitions.
Potentially fixes #199966 which mentions a similar problem for sections
assigned to TCM (Tightly Coupled Memory). It should be possible to model
a TCM as an OVERLAY. If not then there may need to be a command-line
option to inhibit thunk sharing across output sections.
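The reuse rules described above can be sketched as a small predicate. This is only an illustration: `OutputSection`, `Thunk`, and `canReuseThunk` are hypothetical stand-ins, not LLD's real types or the actual isThunkSectionCompatible implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the linker's data structures (not LLD's types).
struct OutputSection {
  std::string name;
  bool inOverlay = false;
};

struct Thunk {
  const OutputSection *section = nullptr; // output section holding the thunk
  bool placedBeforeTarget = false;        // acts as an alternative entry point
};

// Returns true if a caller in `caller` may reuse `thunk`.
bool canReuseThunk(const OutputSection &caller, const Thunk &thunk) {
  // A thunk placed directly before its target is an alternative entry
  // point, so it can be reused.
  if (thunk.placedBeforeTarget)
    return true;
  // A thunk outside any OVERLAY is known to be in memory.
  if (!thunk.section->inOverlay)
    return true;
  // A thunk inside an OVERLAY is only safe for callers in that same
  // output section.
  return thunk.section == &caller;
}
```

With this sketch, a caller in one overlay section is refused a thunk that lives in a different overlay section, while thunks in ordinary sections stay shareable.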
[orc-rt] Treat empty path as "process symbols" in NativeDylibManager. (#202905)
NativeDylibManager::load now handles an empty path by returning the
process's global lookup handle (RTLD_DEFAULT on POSIX) directly,
bypassing dlopen and the shutdown-time dlclose registration. This
matches the behavior of OrcTargetProcess's SimpleExecutorDylibManager.
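A minimal POSIX sketch of the empty-path behavior, assuming a hypothetical `loadDylib` helper and `DylibHandle` struct (neither is the orc-rt API). The `NeedsClose` flag stands in for the shutdown-time dlclose registration mentioned above.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>

// Hypothetical result type: the lookup handle plus whether the caller
// should dlclose it at shutdown.
struct DylibHandle {
  void *Handle = nullptr;
  bool NeedsClose = false;
};

// Hypothetical helper illustrating the empty-path special case.
DylibHandle loadDylib(const std::string &Path) {
  // Empty path: hand back the process-wide lookup handle without calling
  // dlopen, so there is nothing to dlclose later.
  if (Path.empty())
    return {RTLD_DEFAULT, false};
  // Otherwise open the library; only a successful dlopen needs a dlclose.
  void *H = dlopen(Path.c_str(), RTLD_NOW | RTLD_LOCAL);
  return {H, H != nullptr};
}
```

Note that on some platforms `RTLD_DEFAULT` is a null pointer, so a real implementation would need a separate way to report dlopen failures.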
[AMDGPU] Support Wave Reduction for i16 types - 1 (#194808)
Supported Ops: `min`, `umin`, `max`, `umax`.
16-bit wave reduce ops are promoted to 32-bit
operations before ISEL. From there they use the
existing implementations for 32-bit reductions.
Assisted by - Claude-sonnet:4.6
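As a host-side illustration of the promotion idea (not the actual lowering), a 16-bit reduction can be widened to 32 bits, reduced, and truncated back. The choice of sign extension for `min` and zero extension for `umax` is an assumption made here so that ordering is preserved; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Signed min over i16 lanes: sign-extend to i32, reduce, truncate.
// Assumes at least one active lane.
int16_t waveReduceMinI16(const std::vector<int16_t> &Lanes) {
  int32_t Acc = std::numeric_limits<int32_t>::max(); // identity for 32-bit min
  for (int16_t V : Lanes)
    Acc = std::min(Acc, static_cast<int32_t>(V)); // sign extension
  return static_cast<int16_t>(Acc);
}

// Unsigned max over i16 lanes: zero-extend to i32, reduce, truncate.
uint16_t waveReduceUMaxI16(const std::vector<uint16_t> &Lanes) {
  uint32_t Acc = 0; // identity for unsigned max
  for (uint16_t V : Lanes)
    Acc = std::max(Acc, static_cast<uint32_t>(V)); // zero extension
  return static_cast<uint16_t>(Acc);
}
```

Because the extension preserves the ordering of the 16-bit values, the truncated 32-bit result equals the 16-bit reduction.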
[AMDGPU] Support Wave Reduction for i16 types - 3 (#194812)
Supported Ops: `and`, `or`, `xor`.
Supports only the iterative strategy; DPP is not yet
supported.
Supports only Fake-16 versions of the lowering.
True-16 support is yet to be added.
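For readers unfamiliar with the iterative strategy, here is a host-side simulation under stated assumptions: a 32-lane wave, an exec mask with one bit per active lane, and a loop that visits one active lane per iteration. The commit message does not describe the real lowering; `iterativeXorReduce` is a hypothetical name used only for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Simulated 16-bit XOR reduction over the active lanes of a 32-wide wave.
uint16_t iterativeXorReduce(const std::array<uint16_t, 32> &Lanes,
                            uint32_t ExecMask) {
  uint16_t Acc = 0; // identity for xor
  while (ExecMask != 0) {
    // Find the lowest active lane.
    unsigned Lane = 0;
    while (((ExecMask >> Lane) & 1u) == 0)
      ++Lane;
    // Fold that lane's value into the accumulator.
    Acc ^= Lanes[Lane];
    // Clear the lowest set bit so the next iteration sees the next lane.
    ExecMask &= ExecMask - 1;
  }
  return Acc;
}
```

The same loop shape works for `and` and `or` by swapping the operator and the identity value (all ones for `and`, zero for `or`).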
[AMDGPU] Support Wave Reduction for i16 types - 2 (#194810)
Supported Ops: `add`, `sub`.
Supports only the iterative strategy; DPP is not yet
supported.
Supports only Fake-16 versions of the lowering.
True-16 support is yet to be added.
[AMDGPU][InsertWaitCnts] Move HWEvent analysis code
Building on the previous RFC, if it is accepted:
Move the code that maps a MachineInstr to HWEventSet to a separate file.
This should be NFC.
[llvm][ADT] Make ImmutableList conform to the forward iterator concept (#202580)
The iterator was missing post-increment and a couple of typedefs. Adding
them enables LLVM algorithms like filter_range, etc.
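To show what post-increment and the iterator typedefs look like in practice, here is a minimal forward iterator over a singly linked list. `Node` and `ListIterator` are hypothetical and do not reflect llvm::ImmutableList's real layout or iterator class.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical singly linked node; not llvm::ImmutableList's real layout.
struct Node {
  int Value;
  const Node *Tail;
};

class ListIterator {
  const Node *Cur = nullptr; // nullptr marks the end of the list

public:
  // The typedefs that generic algorithms and std::iterator_traits rely on.
  using iterator_category = std::forward_iterator_tag;
  using value_type = int;
  using difference_type = std::ptrdiff_t;
  using pointer = const int *;
  using reference = const int &;

  ListIterator() = default;
  explicit ListIterator(const Node *N) : Cur(N) {}

  reference operator*() const { return Cur->Value; }
  pointer operator->() const { return &Cur->Value; }

  // Pre-increment: advance and return the updated iterator.
  ListIterator &operator++() {
    Cur = Cur->Tail;
    return *this;
  }
  // Post-increment: advance but return the previous position.
  ListIterator operator++(int) {
    ListIterator Tmp = *this;
    ++*this;
    return Tmp;
  }

  bool operator==(const ListIterator &O) const { return Cur == O.Cur; }
  bool operator!=(const ListIterator &O) const { return Cur != O.Cur; }
};
```

With these members in place, standard facilities such as `std::distance` and `std::iterator_traits` accept the iterator, which is the same property that range adaptors depend on.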
AMDGPU/GlobalISel: Implement RegBankLegalize rules for SALUFloat variants of G_INTRINSIC_TRUNC, G_FFLOOR and G_FCEIL. (#187679)
As requested on PR #179954.