[OFFLOAD][L0] Improve cleanup on errors (#188251)
Additional cleanup improvements on error conditions (in addition to
those in #187597):
* Fixed incomplete cleanup in L0Context::init()
* Fixed build log leak in addModule()
* Fixed inconsistent context state in findDevices()
Disclaimer: The base of this PR was generated by Claude and adjusted by
me afterwards.
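
A minimal sketch of the kind of leak such a cleanup fix addresses, assuming
a build log that must be released on every exit path. All names here
(BuildLog, createBuildLog, destroyBuildLog, addModuleSketch) are
hypothetical stand-ins, not the plugin's actual code or the Level Zero API:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a driver-owned build log.
struct BuildLog {
  std::string Text;
};

// Counts logs that have been created but not yet destroyed.
static int LiveLogs = 0;

BuildLog *createBuildLog() {
  ++LiveLogs;
  return new BuildLog{"build failed"};
}

void destroyBuildLog(BuildLog *Log) {
  assert(Log && "destroying a null build log");
  --LiveLogs;
  delete Log;
}

// Deleter so that std::unique_ptr releases the log on every exit path.
struct BuildLogDeleter {
  void operator()(BuildLog *Log) const { destroyBuildLog(Log); }
};
using BuildLogPtr = std::unique_ptr<BuildLog, BuildLogDeleter>;

// Returns false on a simulated build failure; the log is freed either way.
bool addModuleSketch(bool BuildSucceeds) {
  BuildLogPtr Log(createBuildLog());
  if (!BuildSucceeds)
    return false; // early return: the deleter still runs here
  return true;
}
```

Tying the release to scope exit means an early error return cannot skip it,
which is the general property the fixes above aim for.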
[libc] Remove redundant and incorrect comments in `logf.cpp` (#188236)
This PR fixes a nit (mostly introduced during the refactor in
693a018dcf08e):
- During the refactor, the implementation was moved to the header along
with its explanatory comments.
```CPP
// This is an algorithm for log(x) in single precision which is correctly
// rounded for all rounding modes, based on the implementation of log(x) from
// the RLIBM project at:
// https://people.cs.rutgers.edu/~sn349/rlibm
// Step 1 - Range reduction:
// For x = 2^m * 1.mant, log(x) = m * log(2) + log(1.m)
// If x is denormal, we normalize it by multiplying x by 2^23 and subtracting
.....
// Symposium on Principles of Programming Languages (POPL-2022), Philadelphia,
// USA, January 16-22, 2022.
[12 lines not shown]
[llvm] Update terminal list with ANSI colors (#187920)
ncurses has not adopted `xterm-*` style terminfo entries for some
terminal emulators, such as Alacritty, Ghostty, and kitty. To work around
a mismatch between the remote terminfo-db and the terminal emulator making
the connection (e.g. over SSH), users explicitly set `$TERM` to `ghostty`
or similar on the remote side to make colors work. Because `$TERM` then no
longer matches the `xterm-*` format, llvm's `xterm-*` filter fails to
recognize the terminal. Add entries for the most widely used of these
emulators (Alacritty, Ghostty, kitty), using their terminfo-db names, so
that llvm's filter recognizes them.
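
A rough sketch of the filter described above, assuming it boils down to a
prefix check plus a short list of known names. The function name and the
exact strings are assumptions for illustration, not llvm's actual list:

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Hypothetical helper: returns true when the given $TERM value names a
// terminal assumed to support ANSI colors.
bool termSupportsColorsSketch(std::string_view Term) {
  // Names assumed for illustration, based on the emulators named above.
  constexpr std::array<std::string_view, 3> Known = {"alacritty", "ghostty",
                                                     "kitty"};
  // Matches "xterm" as well as "xterm-*" style values.
  if (Term.substr(0, 5) == "xterm")
    return true;
  for (std::string_view Name : Known)
    if (Term == Name)
      return true;
  return false;
}
```

With this shape, `TERM=ghostty` is accepted even though it does not start
with `xterm`, while an unrelated value such as `dumb` is still rejected.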
Signed-off-by: Minsoo Choo <minsoochoo0122 at proton.me>
[libc][docs] Avoid docgen target collisions and restore pthread docs (#188221)
Fixes llvm/llvm-project#123821.
Re-enabling pthread docs created a global CMake utility target named
`pthread`, which collides in combined runtime builds where `pthread` is
expected to be a library name. Namespace the internal libc docgen helper
targets under `libc-docgen-*` and restore the generated pthread docs
page. `docs-libc-html` is unchanged.
[Offload] Fix destroying signal that was never initialized
Summary:
We create the RPC doorbell signal lazily and destroy it at the plugin
level. This means that we can't rely on the normal 'per-device' handling
so this needs to be called unconditionally. We only create the signal if
a device is registered, but deinit is called unconditionally. Just check
the handle.
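
A minimal sketch of the guard, assuming a zero handle means the signal was
never created. SignalHandle, destroySignal and PluginSketch are
hypothetical names, not the plugin's real types:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical signal handle; zero means "never created".
struct SignalHandle {
  std::uint64_t Handle = 0;
};

// Counts how often the destroy routine actually ran.
static int DestroyCalls = 0;

void destroySignal(SignalHandle &S) {
  assert(S.Handle != 0 && "destroying a signal that was never created");
  ++DestroyCalls;
  S.Handle = 0;
}

struct PluginSketch {
  SignalHandle Doorbell; // created lazily, on first device registration

  void registerDevice() {
    if (Doorbell.Handle == 0)
      Doorbell.Handle = 1; // stands in for the real signal creation
  }

  // Called unconditionally, so it must tolerate a missing signal.
  void deinit() {
    if (Doorbell.Handle != 0)
      destroySignal(Doorbell);
  }
};
```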
[AArch64][GlobalISel] Select SQDMLSLv1i64_indexed when vector_extract present
Like SQDMLALv1i64_indexed, selecting this intrinsic reduces the number of instructions generated by 1, as it performs both the vector extract and the sqdmlsl in one instruction.
This only works when the vector to extract from is v4i32, not v2i32. This is due to some issues GlobalISel has selecting intrinsics using v2i32.
liblfds: add new package
This is liblfds, a portable, license-free, lock-free data structure
library written in C.
Lock-free data structures are process, thread and interrupt safe
(i.e. the same data structure instance can be safely used concurrently
and simultaneously across cores, processes, threads and both inside
and outside of interrupt handlers), never sleep (and so are safe
for kernel use when sleeping is not permitted), operate without
context switches, cannot fail (no need to handle error cases, as
there are none), perform and scale literally orders of magnitude
better than locking data structures, and liblfds itself (as of
release 7.0.0) is implemented such that it performs no allocations
(and so works with NUMA, stack, heap and shared memory) and compiles
not just on a freestanding C89 implementation, but on a bare C89
implementation.
The library is completely documented (every API, function, macro,
[2 lines not shown]
[clang] Apply lvalue conversions to __builtin_classify_type argument (#175627)
According to the GCC documentation, default argument promotion is applied to
the argument, which includes the function-to-pointer and
array-to-pointer lvalue conversions.
This also implies checking of the argument for placeholder types.
Fixes #175589.
Reland "[flang][OpenMP] Fix lowering of LINEAR iteration variables (#183794)" (#187766)
Linear iteration variables were being treated as private. This fixes
one of the issues reported in #170784.
The previous regressions are fixed by #187097.
[CodeGen] Fix multiple connected component issue in rematerializer
This fixes a rematerializer issue wherein re-creating the interval of a
non-rematerializable super-register defined over multiple MIs, some of
which define entirely dead subregisters, could cause a crash when
changing the order of sub-definitions (for example during scheduling)
because the re-created interval could end up with multiple connected
components, which is illegal. The solution is to split separate
components of the interval in such cases. The added unit test crashes
without that added behavior.
libpkg: fix --register-only with empty packages
Currently we end up calling archive_read_next_header() in
populate_config_file_contents() even when there are no files in the
package. This results in the following libarchive error:
pkg: archive_read_next_header(): INTERNAL ERROR: Function
'archive_read_next_header' invoked with archive structure in
state 'eof', should be in state 'header/data'
This commit fixes the error and adds a test to prevent regression.
Sponsored by: The FreeBSD Foundation
[CodeGen] Move rollback capabilities outside of the rematerializer
The rematerializer implements support for rolling back
rematerializations by modifying MIs that should normally be deleted in
an attempt to make them "transparent" to other analyses. This involves:
1. setting their opcode to DBG_VALUE and
2. setting their read register operands to the sentinel register.
This approach has several drawbacks.
1. It forces the rematerializer to support tracking these "dead MIs".
2. It is not actually clear whether this mechanism will interact well
with all other analyses. This is an issue since the intent of the
rematerializer is to be usable in as many contexts as possible.
3. In practice, it has shown itself to be relatively error-prone.
This commit removes rollback support from the rematerializer and moves
those capabilities to a rematerializer listener that can be instantiated
[5 lines not shown]
[RISCV] canCreateUndefOrPoisonForTargetNode - RISCVISD::READ_VLENB nodes don't create undef/poison (#188231)
Fixes a number of regressions in an upcoming FREEZE patch
[libc] Support AMDGPU device interrupts for the RPC interface (#188067)
Summary:
One of the main disadvantages to using the RPC interface is that it
requires a server thread to spin on the mailboxes checking for work.
The vast majority of the time, there will be no work and work will come
in large bursts.
The HSA / KFD interface supports device-side interrupts and already has
handling for binding these events to an HSA signal. This means that we
can send interrupts from the GPU to wake a sleeping thread on the CPU.
The sleeping thread will be descheduled with a blocking HSA wait call
and woken up when its event ID is raised through the kernel driver's
interrupt.
This is very target-specific handling, but I believe it is valuable
enough to warrant it being in the protocol. It is completely optional,
as it is ignored if uninitialized. This should bring this support to
parity with the interface HIP expects.
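
The difference between spinning and sleeping can be sketched on the host
side alone. In this sketch a std::condition_variable stands in for the
blocking HSA wait; DoorbellSketch and its methods are hypothetical and do
not reflect the real HSA or RPC interfaces:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Stand-in for an interrupt-backed signal.
class DoorbellSketch {
  std::mutex Mutex;
  std::condition_variable Cond;
  std::uint64_t Pending = 0; // number of wake-ups not yet handled

public:
  // "Device" side: record the event and wake the sleeping server thread.
  void raise() {
    {
      std::lock_guard<std::mutex> Lock(Mutex);
      ++Pending;
    }
    Cond.notify_one();
  }

  // "Server" side: block (no spinning) until at least one event is
  // pending, then consume all pending events and return their count.
  std::uint64_t waitForWork() {
    std::unique_lock<std::mutex> Lock(Mutex);
    Cond.wait(Lock, [this] { return Pending != 0; });
    std::uint64_t Count = Pending;
    Pending = 0;
    return Count;
  }
};
```

A server loop built on `waitForWork()` only runs when work arrives, so a
burst of events costs one wake-up rather than continuous polling.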
AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3
For V_DOT2_F32_F16 and V_DOT2_F32_BF16 add pre-RA register allocation
hints to preferably assign dst and src2 to the same physical register.
When the hint is satisfied, canMapVOP3PToVOPD recognises the instruction
as eligible for VOPD pairing by checking if it is VOP2-like:
dst==src2, no source modifiers, no clamp, and src1 is a register.
Mark both instructions as commutable to allow a literal in src1 to be
moved to src0, since VOPD only permits a literal in src0.
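
The VOP2-like condition listed above can be written as a single predicate.
This is a simplified sketch over a hypothetical Dot2Sketch struct; it is
not the actual canMapVOP3PToVOPD implementation, which operates on
MachineInstr operands:

```cpp
#include <cassert>

// Hypothetical, simplified view of a dot2 instruction after register
// allocation.
struct Dot2Sketch {
  unsigned Dst;            // physical register assigned to dst
  unsigned Src2;           // physical register assigned to src2
  bool Src1IsRegister;     // false if src1 is a literal or constant
  bool HasSourceModifiers; // any neg/abs style modifier present
  bool HasClamp;
};

// True when the instruction satisfies every condition listed above.
bool looksLikeVOP2(const Dot2Sketch &I) {
  return I.Dst == I.Src2 && !I.HasSourceModifiers && !I.HasClamp &&
         I.Src1IsRegister;
}
```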
[flang][debug] Always include (kind=X) suffix in debug type names (#186255)
Previously, 32-bit types (integer, real, logical, complex) were printed
without the (kind=4) suffix in DWARF debug type names, while other sizes
always included the kind suffix. This inconsistency is now removed by
always appending (kind=X) to all basic type names, making the format
uniform across all type sizes.
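
As an illustration of the resulting naming scheme, here is a small sketch
that always appends the suffix. debugTypeNameSketch is a hypothetical
helper, not flang's actual code:

```cpp
#include <cassert>
#include <string>

// Builds a debug type name that always carries the kind suffix,
// including for 32-bit types (kind 4).
std::string debugTypeNameSketch(const std::string &Base, unsigned Kind) {
  return Base + "(kind=" + std::to_string(Kind) + ")";
}
```

Under this scheme a 32-bit integer is named `integer(kind=4)`, in the same
format as `integer(kind=8)`.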
Fixes https://github.com/llvm/llvm-project/issues/119478.