libc: Fix dtor order in __cxa_thread_atexit
A thread_local variable may create another thread_local variable
inside its dtor. The new object is immediately registered with
__cxa_thread_atexit() and needs to be destroyed before the next
variable is processed.
This fixes the libcxx test thread_local_destruction_order.pass.cpp.
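The ordering requirement can be illustrated with a small standalone model.
This is only a sketch, not the libc implementation: DtorList, register_dtor()
and run_all() are hypothetical names, and the list is a plain vector rather
than the per-thread list libc keeps. It shows that always taking the most
recently registered entry makes a dtor registered during processing run
before older entries.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-thread dtor list (not the libc code).
struct DtorList {
    std::vector<std::function<void()>> entries;

    void register_dtor(std::function<void()> fn) {
        entries.push_back(std::move(fn));
    }

    // Re-read the tail of the list on every iteration, so an entry
    // registered by a running dtor is processed before older entries.
    void run_all() {
        while (!entries.empty()) {
            std::function<void()> fn = std::move(entries.back());
            entries.pop_back();
            fn();
        }
    }
};

// Registers A, then B; B's dtor registers C while the list is being run.
std::vector<std::string> simulate() {
    DtorList list;
    std::vector<std::string> log;
    list.register_dtor([&] { log.push_back("A"); });
    list.register_dtor([&] {
        log.push_back("B");
        list.register_dtor([&] { log.push_back("C"); });
    });
    list.run_all();
    return log;
}
```

In this model simulate() records B, then C, then A: the object created inside
B's dtor is torn down before the older object A.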
Reported by: kib
Approved by: lwhsu (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D55826
kern_time: Honor the precise option when counting diff
When the precise option is used, the true elapsed time should also be
computed with the precise timer.
This fixes the test case sleep_for.signals.pass.cpp in libcxx.
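A rough model of why the clock choice matters, under stated assumptions: the
coarse clock is taken to advance only on tick boundaries, the 1 ms tick length
is made up, and kTickNs, coarse() and elapsed() are hypothetical names. It is
not the kernel code, only a sketch of how a coarse reading can misreport the
elapsed time of a short sleep.

```cpp
#include <cstdint>

// Assumed tick length of the coarse clock, in nanoseconds (illustrative).
const std::int64_t kTickNs = 1000000;

// Coarse reading: the instant rounded down to the last tick boundary.
std::int64_t coarse(std::int64_t now_ns) {
    return now_ns - now_ns % kTickNs;
}

// Elapsed time between two instants, read with the precise clock or
// with the coarse one.
std::int64_t elapsed(std::int64_t start_ns, std::int64_t end_ns, bool precise) {
    if (precise)
        return end_ns - start_ns;
    return coarse(end_ns) - coarse(start_ns);
}
```

For a sleep from 900000 ns to 1100000 ns, the precise difference is 200000 ns,
while the coarse difference in this model is a full tick.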
Reviewed by: kib, imp
Approved by: lwhsu (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D55824
tcp: fix up !VIMAGE builds
tcp_seq.h uses getmicrouptime() in an inline function, but it doesn't
include <sys/time.h>. This was usually masked by tcp_var.h always being
included before tcp_seq.h, so restore that ordering.
Fixes: c0462c2deafdcfe885e8d6f91b529d8cbddc6014
inpcb: fix up !VIMAGE builds
Some files use the inpcb locking macros without including mutex.h and
rwlock.h. With VIMAGE, net/vnet.h pulls in half of the possible kernel
includes, masking the problem. in_pcb.h also used to mask the problem,
so restore that.
Fixes: 041e9eb1ae094a81e55fbcaba37eb2ac194658cc
[Driver] Enable -ftime-trace for CUDA/HIP device compilation (#179701)
Previously, -ftime-trace only generated trace files for host compilation
when compiling CUDA/HIP code. Device compilation was excluded because
the OffloadingPrefix was non-empty, causing handleTimeTrace() to be
skipped.
This patch enables -ftime-trace for offload device compilation by:
1. Passing the offloading prefix to handleTimeTrace()
2. Including the bound architecture in the trace filename
3. Deriving the trace output directory from the -o option for device
compilation (since the device output is a temp file)
Trace files are now generated for each offload target:
- Host: output.json
- Device: output-hip-amdgcn-amd-amdhsa-gfx906.json
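The naming scheme in the two examples above can be sketched as follows.
traceFileName() and its parameters are hypothetical and are not the Clang
driver's code. The sketch assumes the device name is the output stem joined
with the offload kind, the target triple and the bound architecture, which is
what the examples suggest.

```cpp
#include <string>

// Hypothetical helper, not the Clang driver's implementation.
std::string traceFileName(const std::string& stem,
                          const std::string& offloadKind,
                          const std::string& triple,
                          const std::string& boundArch) {
    std::string name = stem;
    // Host compilation has no offload kind, so nothing is appended.
    if (!offloadKind.empty())
        name += "-" + offloadKind + "-" + triple + "-" + boundArch;
    return name + ".json";
}
```

With stem "output" this reproduces both names listed above: an empty offload
kind gives output.json, and ("hip", "amdgcn-amd-amdhsa", "gfx906") gives
output-hip-amdgcn-amd-amdhsa-gfx906.json.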
Note: When using --save-temps, multiple compilation phases (preprocess,
compile, codegen) write to the same trace file, with each phase
[3 lines not shown]
[CUDA/HIP][SYCL] Deduplicate deferred diagnostics across multiple callers (#185926)
Deferred diagnostics for a function were emitted once per caller that
forced the function into device context. When multiple device functions
called the same host-device function containing errors, the diagnostics
were repeated for each caller, producing noisy duplicate output.
Change the deferred diagnostic emission to a two-pass approach:
1. During the call graph walk, collect callers in DeviceKnownEmittedFns
(now storing multiple callers per function) and mark functions that
need diagnostics, but don't emit yet.
2. After the walk completes, emit diagnostics once per function with
all callers listed as notes.
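The two passes above can be modelled in a few lines. This is a sketch only:
CallEdge, emitDeferred() and the message strings are hypothetical and do not
mirror Clang's data structures or diagnostic wording. It lists direct callers
only and does not model longer call chains.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical call-graph edge visited during the walk.
struct CallEdge {
    std::string caller;
    std::string callee;
};

std::vector<std::string> emitDeferred(
    const std::vector<CallEdge>& walk,
    const std::map<std::string, std::string>& deferredErrors) {
    // Pass 1: record every caller of a function that has a deferred
    // error, but emit nothing yet.
    std::map<std::string, std::vector<std::string>> callers;
    for (const CallEdge& e : walk)
        if (deferredErrors.count(e.callee))
            callers[e.callee].push_back(e.caller);

    // Pass 2: one diagnostic per function, then one note per caller.
    std::vector<std::string> out;
    for (const auto& entry : callers) {
        out.push_back("error in " + entry.first + ": " +
                      deferredErrors.at(entry.first));
        for (const std::string& c : entry.second)
            out.push_back("note: called by " + c);
    }
    return out;
}
```

If two kernels k1 and k2 both call a function hd that carries one deferred
error, this model yields a single error line followed by two notes, rather
than the error repeated once per caller.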
Call chain notes now use "called by" for the first caller in each chain
and "which is called by" for subsequent callers in the chain, making it
[5 lines not shown]