[Unwind][AArch64] Match sigreturn instructions in big endian (#167139)
Since insns are always stored LE, on a BE system the opcodes will be
loaded byte-reversed. Therefore, define two sets of opcodes, one for LE
and one for BE.
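
The effect can be reproduced outside the unwinder. The sketch below is only an illustration, not the libunwind code: the helper names are hypothetical, and the two opcodes are assumed to be the AArch64 Linux sigreturn trampoline (`mov x8, #0x8b` followed by `svc #0`). It reads the little-endian instruction bytes as a host of either byte order would, then compares them against the matching opcode set.

```python
import struct

# Assumed encodings of the AArch64 Linux sigreturn trampoline.
MOV_X8_SIGRETURN = 0xD2801168  # mov x8, #0x8b
SVC_0 = 0xD4000001             # svc #0


def byteswap32(word):
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(word.to_bytes(4, "little"), "big")


def expected_opcodes(host_byteorder):
    """Opcode pair as it appears after a native 32-bit load on the host."""
    if host_byteorder == "little":
        return (MOV_X8_SIGRETURN, SVC_0)
    # On a BE host the LE-stored instructions are loaded byte-reversed.
    return (byteswap32(MOV_X8_SIGRETURN), byteswap32(SVC_0))


def is_sigreturn_trampoline(code, host_byteorder):
    """Check the first two instruction words of `code` (raw memory bytes)."""
    if len(code) < 8:
        return False
    fmt = "<II" if host_byteorder == "little" else ">II"
    return struct.unpack_from(fmt, code) == expected_opcodes(host_byteorder)
```

The same memory image matches on both kinds of host because the comparison constants are swapped instead of the loaded words.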
[AMDGPU] Add baseline test to show spilling of wmma scale. NFC
This shows the spilling of WMMA scale values, which are limited to the
low 256 VGPRs. Free registers are available, but RA allocates the low
256 first. The only problem is that this test is large.
[libc++] Apply `[[nodiscard]]` to `in/out_ptr` (#167097)
...according to Coding Guidelines: `[[nodiscard]]` should be applied to
functions where discarding the return value is most likely a correctness
issue.
Changes to:
- [x] `inout_ptr()`
- [x] `out_ptr()`
At the time of implementation the `[[nodiscard]]` policy had not been
established yet.
---------
Co-authored-by: Hristo Hristov <zingam at outlook.com>
[lldb] Add the ability to load DWARF64 .debug_str_offsets tables for DWARF32 DWARF units in .dwp files in LLDB. (#167997)
This patch updates the reading capabilities of the LLDB DWARF parser
for an llvm-dwp patch https://github.com/llvm/llvm-project/pull/167457
that will emit .dwp files where the compile units are DWARF32 and the
.debug_str_offsets tables will be emitted as DWARF64 to allow .debug_str
sections that exceed 4GB in size.
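
To illustrate why the table format can differ from the unit format, here is a minimal sketch of reading a `.debug_str_offsets` contribution. It is not LLDB's parser: the function name is hypothetical, and it assumes a DWARF v5 style header (initial length, 2-byte version, 2-byte padding) in little-endian byte order. The 32-bit or 64-bit format is decided by the table's own initial length, independently of the compile unit.

```python
import struct


def parse_str_offsets_table(data):
    """Return (is_dwarf64, version, offsets) for one contribution."""
    (initial,) = struct.unpack_from("<I", data, 0)
    if initial == 0xFFFFFFFF:
        # DWARF64: escape value followed by a 64-bit length.
        (unit_length,) = struct.unpack_from("<Q", data, 4)
        pos, entry_fmt, entry_size, is_dwarf64 = 12, "<Q", 8, True
    else:
        # DWARF32: the initial 32-bit value is the length itself.
        unit_length = initial
        pos, entry_fmt, entry_size, is_dwarf64 = 4, "<I", 4, False

    end = pos + unit_length  # the length counts the bytes after itself
    version, _padding = struct.unpack_from("<HH", data, pos)
    pos += 4

    offsets = [
        struct.unpack_from(entry_fmt, data, p)[0]
        for p in range(pos, end, entry_size)
    ]
    return is_dwarf64, version, offsets
```

With 8-byte entries, an offset such as `0x100000005` can point past the 4GB mark in `.debug_str`, which a 4-byte entry cannot represent.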
[GlobalISel] Return byte offsets from computeValueLLTs (NFC) (#166747)
This avoids scaling offsets back and forth. It is also what the
SelectionDAG equivalent (ComputeValueVTs) does, and it will allow
reusing ComputeValueTypes with less effort.
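
As a rough illustration of what returning byte offsets means, the sketch below flattens a nested aggregate into its scalar leaves, each paired with a byte offset. It uses a toy type model that is purely an assumption for this example (an `int` is a scalar of that many bytes with natural alignment, a `list` is a struct) and does not reflect LLVM's DataLayout or the real function signatures.

```python
def align_to(offset, alignment):
    """Round `offset` up to the next multiple of `alignment`."""
    return (offset + alignment - 1) // alignment * alignment


def layout(ty):
    """Return (size_in_bytes, alignment) for a toy type."""
    if isinstance(ty, int):
        return ty, ty
    size, max_align = 0, 1
    for member in ty:
        m_size, m_align = layout(member)
        size = align_to(size, m_align) + m_size
        max_align = max(max_align, m_align)
    return align_to(size, max_align), max_align


def compute_value_types(ty, start_offset=0, out=None):
    """Collect (scalar_size, byte_offset) pairs for every leaf of `ty`."""
    if out is None:
        out = []
    if isinstance(ty, int):
        out.append((ty, start_offset))
        return out
    offset = start_offset
    for member in ty:
        m_size, m_align = layout(member)
        offset = align_to(offset, m_align)
        compute_value_types(member, offset, out)
        offset += m_size
    return out
```

Because the offsets are already in bytes, a caller that computes addresses can use them directly without dividing a bit offset by 8.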
[lldb] Enforce Py_LIMITED_API in the SWIG typemaps (#168147)
We missed a handful of uses of the Python private API in the SWIG
typemaps because they are included before we include the Python header
that defines Py_LIMITED_API.
This fixes that and guards the last private use on whether or not you're
targeting the limited API. Unfortunately there doesn't appear to be an
alternative, so we have to resort to being slightly less defensive.
[lldb] Drop support for the Buffer Protocol (#168144)
This is an alternative solution to the issue described in #167990, which
can be summarized as follows: we cannot target Python 3.8 with the
stable API and also support building for Python 3.13 and later, due to
the buffer protocol.
The approach taken in this PR, and proposed by Ismail, is to sidestep
the issue by dropping support for the buffer protocol. The only two
users are SBFile::Read and SBFile::Write. Instead, we support PyBytes
and PyByteArray which are the builtin types that conform to the buffer
protocol. Technically, this means a small regression, where those
methods could previously take custom types that conform to Python's
buffer protocol. Like Ismail, I think this is acceptable given the
alternatives.
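
From the script author's side, the regression can be pictured with the following sketch. The helper names are hypothetical and nothing here calls into LLDB; it only shows which Python objects pass an exact-type check for `bytes` and `bytearray`, and how a caller holding some other buffer-protocol object could convert it first.

```python
import array


def accepted_without_buffer_protocol(obj):
    """True for the two builtin types that remain supported."""
    return isinstance(obj, (bytes, bytearray))


def to_accepted(obj):
    """Copy any buffer-protocol object into a `bytes` object.

    Raises TypeError if `obj` does not support the buffer protocol.
    """
    if accepted_without_buffer_protocol(obj):
        return obj
    return bytes(memoryview(obj))
```

Objects such as `memoryview` or `array.array` implement the buffer protocol but fail the type check, so they would need a conversion like `to_accepted` before being handed to a write call. The copy is the cost of the workaround.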
Co-authored-by: Med Ismail Bennani <ismail at bennani.ma>
[SLP]Check if the copyable element is a sub instruction with abs in isCommutable
Need to check if the non-copyable element is an instruction before actually
trying to check its NSW attribute.
[lldb] fix parallel module loading deadlock for Linux DYLD (#166480)
Another attempt at resolving the deadlock issue @GeorgeHuyubo discovered
(his previous
[attempt](https://github.com/llvm/llvm-project/pull/160225)).
This change can be summarized as the following:
* Plumb through a boolean flag to force no preload in
`GetOrCreateModules` all the way through to `LoadModuleAtAddress`.
* Parallelize `Module::PreloadSymbols` separately from
`Target::GetOrCreateModule` and its caller `LoadModuleAtAddress` (this
is what avoids the deadlock).
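
The two bullets above can be pictured as a two-phase scheme. The sketch below is a loose Python analogy, not LLDB code: `Module`, `Target` and `load_modules` are hypothetical stand-ins, and it assumes for simplicity that module creation happens serially under a lock while only symbol preloading runs in parallel.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Module:
    def __init__(self, name):
        self.name = name
        self.symbols_loaded = False

    def preload_symbols(self):
        # Stand-in for the expensive symbol table parsing.
        self.symbols_loaded = True


class Target:
    def __init__(self):
        self._lock = threading.Lock()
        self.modules = {}

    def get_or_create_module(self, name, preload=True):
        with self._lock:
            module = self.modules.get(name)
            if module is None:
                module = self.modules[name] = Module(name)
        if preload:
            module.preload_symbols()
        return module


def load_modules(target, names):
    # Phase 1: create modules with preloading forced off.
    modules = [target.get_or_create_module(n, preload=False) for n in names]
    # Phase 2: preload symbols in parallel, outside the creation path.
    with ThreadPoolExecutor() as pool:
        list(pool.map(Module.preload_symbols, modules))
    return modules
```

Keeping the parallel work out of the creation path means no worker thread waits on the creation lock while another thread holds it.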
These changes roughly maintain the performance characteristics of the
previous implementation of parallel module loading. Testing on targets
with between 5000 and 14000 modules, I saw similar numbers as before,
often more than 10% faster in the new implementation across multiple
trials for these massive targets. I think it's because we have less lock
contention with this approach.
[109 lines not shown]
[CIR] Upstream CIR codegen for vec_ext x86 builtins (#167942)
This PR upstreams the codegen for the x86 vec_ext builtins from the
incubator. It is part of #167752.
[ProfCheck] Refactor Select Instrumentation to use Early Exits (#168086)
I think this is quite a bit more readable than the nested conditionals.
From review feedback that was not addressed pre-commit in #167973.
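
For readers unfamiliar with the pattern, here is a generic sketch of the same kind of refactoring. The functions are hypothetical and unrelated to the actual instrumentation code; they only contrast nested conditionals with early exits that produce identical results.

```python
def classify_nested(value):
    """Nested conditionals: the main path is buried three levels deep."""
    if value is not None:
        if isinstance(value, int):
            if value >= 0:
                return "ok"
            else:
                return "negative"
        else:
            return "not an int"
    else:
        return "missing"


def classify_early_exit(value):
    """Early exits: each precondition is rejected up front."""
    if value is None:
        return "missing"
    if not isinstance(value, int):
        return "not an int"
    if value < 0:
        return "negative"
    return "ok"
```

Both functions return the same answer for every input. The second keeps the common case at the lowest indentation level.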
[lldb] Diagnose unsupported configurations when targeting the Limited C API (#168145)
Diagnose unsupported configurations when targeting the Python Limited C
API. I used SEND_ERROR so that if there are multiple issues, you don't
need to keep reconfiguring.