[AMDGPU][SIInsertWaitcnt] Implement Waitcnt Expansion for Profiling (#169345)
Reference issue: https://github.com/ROCm/llvm-project/issues/67
This patch adds support for expanding s_waitcnt instructions into
sequences with decreasing counter values, enabling PC-sampling profilers
to identify which specific memory operation is causing a stall.
This is controlled via:
Clang flag: -mamdgpu-expand-waitcnt-profiling /
-mno-amdgpu-expand-waitcnt-profiling
Function attribute: "amdgpu-expand-waitcnt-profiling"
When enabled, instead of emitting a single waitcnt, the pass generates a
sequence that waits for each outstanding operation individually. For
example, if there are 5 outstanding memory operations and the target is
to wait until 2 remain:
[23 lines not shown]
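As a rough illustration of the "decreasing counter values" idea (not the code in SIInsertWaitcnt), the sketch below computes the wait values for an expanded sequence. The helper names expandWaitValues and renderVmcnt are hypothetical. The pre-GFX12 `s_waitcnt vmcnt(N)` spelling and the single-wait fallback when nothing needs to retire are assumptions made for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of the patch: returns the counter values an
// expanded wait would step through, from Outstanding - 1 down to Target.
std::vector<unsigned> expandWaitValues(unsigned Outstanding, unsigned Target) {
  std::vector<unsigned> Values;
  if (Target >= Outstanding) {
    // Assumption: nothing has to retire, so a single wait is emitted.
    Values.push_back(Target);
    return Values;
  }
  for (unsigned V = Outstanding - 1;; --V) {
    Values.push_back(V);
    if (V == Target)
      break; // Stop before V can drop below Target (and before unsigned wrap).
  }
  return Values;
}

// Hypothetical formatter using the pre-GFX12 vmcnt spelling (assumption).
std::string renderVmcnt(unsigned N) {
  return "s_waitcnt vmcnt(" + std::to_string(N) + ")";
}
```

With 5 outstanding operations and a target of 2, expandWaitValues(5, 2) yields 4, 3, 2, so a PC sample landing on any one of the three waits points at one specific operation.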
Revert "[Clang] Warn when `std::atomic_thread_fence` is used with `fsanitize=thread`" (#175520)
Reverts llvm/llvm-project#166542
It caused clang to assert with: `!isa<CXXDestructorDecl>(D) && "Use
other ctor with dtor decls!"`
See the comment on the PR.
[clang][bytecode] Fix calling lambdas with broken instance pointers (#175511)
Clang will make the instance pointer be of type 'int' if it is invalid,
which trips up later logic. Mark functions as invalid if any of their
parameters is invalid, and compile and check them early in CallPtr.
Fixes https://github.com/llvm/llvm-project/issues/175425
[clang-repl] Move the produced temporary files in wasm in a temp folder. (#175508)
This patch avoids bloating the current folder with temporary files
created by wasm and moves them into a separate tmp directory. It is a
version of a downstream patch that resolves this issue.
Co-authored with Anutosh Bhat.
[LLDB] Fix MS STL `variant` with non-trivial types (#171489)
When using `std::variant` with non-trivial types, we need to go through
multiple bases to find the `_Which` member. The MSVC STL implements this
in `xsmf_control.h` which conditionally adds/deletes copy/move
constructors/operators.
We now go to `_Variant_base` (the holder of `_Which`). This inherits
from `_Variant_storage`, which is our entry point to finding the n-th
storage (going through `_Tail`).
Address reviewer feedback: fix getWaitCountMax and reduce code duplication
- Fix getWaitCountMax() to use the correct bitmasks based on architecture:
  - Pre-GFX12: Use getVmcntBitMask/getLgkmcntBitMask for LOAD_CNT/DS_CNT
  - GFX12+: Use getLoadcntBitMask/getDscntBitMask for LOAD_CNT/DS_CNT
- Refactor repetitive if-blocks for LOAD_CNT, DS_CNT, EXP_CNT into
  a single loop using the getCounterRef helper function
- Fix X_CNT to return the proper getXcntBitMask(IV) instead of 0
[lldb] Disable flaky TestDetachResumes.py on Windows/AArch64
This patch marks TestDetachResumes.py as skipped on Windows/AArch64.
It has been failing intermittently on the Windows AArch64 buildbot:
https://lab.llvm.org/buildbot/#/builders/141/
This extends the prior change that disabled the same test on Windows
x86_64 (commit 6d8d4cf9a46b3729732736ffe288f6b722d85121 by Dmitry
Vasilyev, 2025-06-23). See #144891 for background and original
discussion.
[SDAG] Combine select into ABD?, for const (#173581)
(select (setcc ...) (sub a, b) (sub b, a))
When b is a constant, `sub a, b` becomes `add a, -b`, which this patch handles with the m_SpecificNeg() matcher.
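A small arithmetic sketch of why the constant case still matches an absolute difference. It is plain C++, not SelectionDAG code. The unsigned-greater-than condition, the constant 7, and the names abdu and selectForm are assumptions chosen for illustration, since the commit message elides the setcc operands.

```cpp
#include <cstdint>

// Reference unsigned absolute difference, i.e. the value an ABDU-style node
// would compute.
constexpr uint32_t abdu(uint32_t A, uint32_t B) {
  return A > B ? A - B : B - A;
}

// Illustrative constant operand (assumption).
constexpr uint32_t C = 7;

// The select pattern after `sub a, C` has been rewritten to `add a, -C`.
// Unsigned arithmetic wraps modulo 2^32, so A + (0 - C) equals A - C.
constexpr uint32_t selectForm(uint32_t A) {
  return A > C ? A + (0u - C) : C - A;
}

// Compile-time spot checks that both forms agree.
static_assert(selectForm(10) == abdu(10, C), "forms agree when a > C");
static_assert(selectForm(2) == abdu(2, C), "forms agree when a < C");
```

Because the true arm now contains an add of the negated constant rather than a sub, a matcher has to recognise that the add operand is the negation of the constant used in the other arm.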
[LoongArch] Disable strict node mutation to fix strict FP lowering crash (#175484)
The patch disables strict node mutation for LoongArch by setting
IsStrictFPEnabled to true.
This change fixes the current strict FP lowering crash only.
ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS can be further improved.
Fixes #174606
[clang-repl] Rework layering of incremental executors. (#175448)
The original Interpreter implementation had a hard dependency on ORC and
grew organically with the addition of out-of-process JIT support. This
tightly coupled the Interpreter to a specific execution engine and
leaked ORC-specific assumptions (runtime layout, symbol lookup,
exception model) into higher layers.
The WebAssembly integration demonstrated that incremental execution can
be implemented without ORC, exposing the need for a cleaner abstraction
boundary.
This change introduces an IncrementalExecutor interface and moves
ORC-based execution behind a concrete implementation. The Interpreter
now depends only on the abstract executor, improving layering and
encapsulation.
In addition, the Interpreter can be configured with user-provided
incremental executor implementations, enabling ORC-independent
execution, easier testing, and future extensions without modifying the
core Interpreter.
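The layering described above can be sketched as a plain dependency inversion. Every name below (SketchExecutor, RecordingExecutor, SketchInterpreter, addModule, run) is hypothetical and does not reflect the actual Clang interfaces or their signatures. The point is only that the interpreter holds an abstract executor and a user can inject a different implementation.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical abstract executor; stands in for the interface the commit
// introduces, without mirroring its real methods.
class SketchExecutor {
public:
  virtual ~SketchExecutor() = default;
  // Accepts one increment of code; returns true on success.
  virtual bool addModule(const std::string &Code) = 0;
};

// A user-provided implementation with no JIT behind it, e.g. for testing.
class RecordingExecutor : public SketchExecutor {
public:
  std::vector<std::string> Seen;
  bool addModule(const std::string &Code) override {
    Seen.push_back(Code);
    return true;
  }
};

// The interpreter depends only on the abstract interface.
class SketchInterpreter {
  std::unique_ptr<SketchExecutor> Exec;

public:
  explicit SketchInterpreter(std::unique_ptr<SketchExecutor> E)
      : Exec(std::move(E)) {}
  bool run(const std::string &Code) { return Exec->addModule(Code); }
};
```

An ORC-backed or WebAssembly-backed executor would be one more subclass in this picture, and the interpreter would not need to change to accommodate it.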