LLVM/project 3dfb782 clang/include/clang/Basic CodeGenOptions.def, clang/include/clang/Options Options.td

[AMDGPU][SIInsertWaitcnt] Implement Waitcnt Expansion for Profiling (#169345)

Reference issue: https://github.com/ROCm/llvm-project/issues/67

This patch adds support for expanding s_waitcnt instructions into
sequences with decreasing counter values, enabling PC-sampling profilers
to identify which specific memory operation is causing a stall.

This is controlled via:
- Clang flag: -mamdgpu-expand-waitcnt-profiling /
  -mno-amdgpu-expand-waitcnt-profiling
- Function attribute: "amdgpu-expand-waitcnt-profiling"

When enabled, instead of emitting a single waitcnt, the pass generates a
sequence that waits for each outstanding operation individually. For
example, if there are 5 outstanding memory operations and the target is
to wait until 2 remain:



    [23 lines not shown]
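
A minimal sketch of the counting logic described above, assuming the expansion steps the counter down by one per emitted wait. `expandWaitcntValues` is a hypothetical helper, not code from SIInsertWaitcnts.cpp, and it ignores the out-of-order events and hardware counter limits that the real pass has to handle.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (an assumption for illustration only): given the number
// of outstanding operations on one counter and the value the original
// s_waitcnt would wait for, return the counter values of the expanded
// sequence in emission order.
std::vector<unsigned> expandWaitcntValues(unsigned Outstanding,
                                          unsigned Target) {
  std::vector<unsigned> Values;
  // Nothing is pending beyond the target, so one wait is enough.
  if (Outstanding <= Target) {
    Values.push_back(Target);
    return Values;
  }
  // Step down one operation at a time: Outstanding-1, ..., Target.
  for (unsigned N = Outstanding; N > Target; --N)
    Values.push_back(N - 1);
  // The last emitted wait is always the one the original instruction asked for.
  assert(Values.back() == Target);
  return Values;
}
```

With 5 outstanding operations and a target of 2, the helper yields the values 4, 3, 2, so a sampled PC that lands on a given wait points at one specific operation.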
Delta File
+944 -0 llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+204 -93 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+20 -0 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+19 -0 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+7 -0 clang/include/clang/Options/Options.td
+4 -0 clang/include/clang/Basic/CodeGenOptions.def
+1,198 -93 1 files not shown
+1,200 -93 7 files

LLVM/project a6378b6 llvm/include/llvm/CodeGen ReachingDefAnalysis.h, llvm/lib/CodeGen ReachingDefAnalysis.cpp

[ReachingDefAnalysis][NFC] Use named constants. (#175075)

Delta File
+3 -3 llvm/lib/CodeGen/ReachingDefAnalysis.cpp
+3 -1 llvm/include/llvm/CodeGen/ReachingDefAnalysis.h
+6 -4 2 files

LLVM/project 2a8be8b clang/docs ReleaseNotes.rst, clang/include/clang/Basic DiagnosticSemaKinds.td

Revert "[Clang] Warn when `std::atomic_thread_fence` is used with `fsanitize=thread`" (#175520)

Reverts llvm/llvm-project#166542

It caused Clang to assert with `!isa<CXXDestructorDecl>(D) && "Use
other ctor with dtor decls!"`; see the comment on the PR.
Delta File
+0 -70 clang/test/SemaCXX/warn-tsan-atomic-fence.cpp
+0 -67 clang/lib/Sema/SemaChecking.cpp
+2 -7 clang/include/clang/Sema/Sema.h
+2 -2 clang/lib/Sema/Sema.cpp
+0 -3 clang/include/clang/Basic/DiagnosticSemaKinds.td
+0 -2 clang/docs/ReleaseNotes.rst
+4 -151 6 files

LLVM/project 6a28bd6 llvm/test/Transforms/NaryReassociate/AMDGPU nary-add-uniform.ll

add GEP test
Delta File
+8 -9 llvm/test/Transforms/NaryReassociate/AMDGPU/nary-add-uniform.ll
+8 -9 1 files

LLVM/project ac62f12 mlir/include/mlir/Dialect/AMDGPU/IR AMDGPU.td, mlir/lib/Dialect/AMDGPU/IR AMDGPUDialect.cpp

[mlir][amdgpu] Remove redundant barriers (#175436)

Delta File
+15 -0 mlir/lib/Dialect/AMDGPU/IR/AMDGPUDialect.cpp
+12 -0 mlir/test/Dialect/AMDGPU/canonicalize.mlir
+5 -4 mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
+2 -6 mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
+34 -10 4 files

LLVM/project f0982d5 clang/lib/AST/ByteCode Interp.cpp ByteCodeEmitter.cpp, clang/test/AST/ByteCode cxx23.cpp

[clang][bytecode] Fix calling lambdas with broken instance pointers (#175511)

Clang gives the instance pointer type 'int' if it is invalid, which
trips up later logic. Mark functions as invalid if any of their
parameters is invalid, and compile and check them early in CallPtr.

Fixes https://github.com/llvm/llvm-project/issues/175425
Delta File
+11 -2 clang/test/AST/ByteCode/cxx23.cpp
+9 -0 clang/lib/AST/ByteCode/Interp.cpp
+4 -2 clang/lib/AST/ByteCode/ByteCodeEmitter.cpp
+24 -4 3 files

LLVM/project 4cec622 clang/include/clang/Interpreter IncrementalExecutor.h, clang/lib/Interpreter Wasm.cpp Wasm.h

[clang-repl] Move the produced temporary files in wasm in a temp folder. (#175508)

This patch avoids cluttering the current directory with the temporary
files created for wasm by moving them into a separate tmp directory. It
is a version of a downstream patch that resolves the same issue.

Co-authored with Anutosh Bhat.
Delta File
+28 -5 clang/lib/Interpreter/Wasm.cpp
+5 -1 clang/lib/Interpreter/Wasm.h
+1 -1 clang/include/clang/Interpreter/IncrementalExecutor.h
+1 -1 clang/lib/Interpreter/IncrementalExecutor.cpp
+35 -8 4 files

LLVM/project aeff209 clang/docs ReleaseNotes.rst, clang/include/clang/Basic DiagnosticSemaKinds.td

Revert "[Clang] Warn when `std::atomic_thread_fence` is used with `fsanitize=…"

This reverts commit d6d5c5f6a2f468cb5aac7b466d2802841b8d1441.
Delta File
+0 -70 clang/test/SemaCXX/warn-tsan-atomic-fence.cpp
+0 -67 clang/lib/Sema/SemaChecking.cpp
+2 -7 clang/include/clang/Sema/Sema.h
+2 -2 clang/lib/Sema/Sema.cpp
+0 -3 clang/include/clang/Basic/DiagnosticSemaKinds.td
+0 -2 clang/docs/ReleaseNotes.rst
+4 -151 6 files

LLVM/project 91268a5 libcxx/include scoped_allocator, libcxx/test/libcxx/diagnostics scoped_allocator.nodiscard.verify.cpp

[libc++][scoped_allocator] Applied `[[nodiscard]]` (#175291)

`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.

- https://libcxx.llvm.org/CodingGuidelines.html
- https://wg21.link/allocator.adaptor

Towards #172124
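
A small sketch of what the attribute buys, assuming a toy allocator-like class. `ToyAllocator` and `clampRequest` are hypothetical names for illustration and are not part of libc++ or of this patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an allocator-like class; this is not libc++'s
// scoped_allocator_adaptor, only an illustration of the attribute.
class ToyAllocator {
public:
  explicit ToyAllocator(std::size_t Limit) : Limit(Limit) {}

  // Discarding the result of a pure query is almost certainly a mistake,
  // so the compiler is asked to diagnose it.
  [[nodiscard]] std::size_t max_size() const noexcept { return Limit; }

private:
  std::size_t Limit;
};

// Uses the result, so no diagnostic is expected here.
std::size_t clampRequest(const ToyAllocator &A, std::size_t Requested) {
  assert(A.max_size() > 0 && "toy allocator must allow at least one element");
  return Requested < A.max_size() ? Requested : A.max_size();
}
```

Writing `A.max_size();` as a bare statement would typically draw an unused-result warning from the compiler, which is the kind of diagnostic a `.verify.cpp` test checks for.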
Delta File
+39 -0 libcxx/test/libcxx/utilities/allocator.adaptor/scoped_allocator.nodiscard.verify.cpp
+0 -22 libcxx/test/libcxx/diagnostics/scoped_allocator.nodiscard.verify.cpp
+11 -6 libcxx/include/scoped_allocator
+50 -28 3 files

LLVM/project b646209 llvm/lib/Target/AArch64 AArch64Features.td, llvm/unittests/TargetParser TargetParserTest.cpp

[AArch64][llvm] Add extra dependencies for recently added features (#175215)

Delta File
+9 -0 llvm/unittests/TargetParser/TargetParserTest.cpp
+2 -2 llvm/lib/Target/AArch64/AArch64Features.td
+11 -2 2 files

LLVM/project 74001cc llvm/lib/Transforms/Scalar NaryReassociate.cpp

review: refactor to keep default order code unchanged
Delta File
+38 -33 llvm/lib/Transforms/Scalar/NaryReassociate.cpp
+38 -33 1 files

LLVM/project 45388eb llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

avoid duplicating getWaitCountMax
Delta File
+47 -63 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+47 -63 1 files

LLVM/project 25a2c0c clang/include/clang/Basic CodeGenOptions.def, clang/include/clang/Options Options.td

review: use function attr instead of cl::opt flag
Delta File
+17 -15 llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+7 -0 clang/include/clang/Options/Options.td
+3 -1 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+4 -0 clang/include/clang/Basic/CodeGenOptions.def
+2 -0 clang/lib/CodeGen/Targets/AMDGPU.cpp
+33 -16 5 files

LLVM/project 9a632fd lldb/source/Plugins/Language/CPlusPlus MsvcStlVariant.cpp, lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/variant TestDataFormatterStdVariant.py main.cpp

[LLDB] Fix MS STL `variant` with non-trivial types (#171489)

When using `std::variant` with non-trivial types, we need to go through
multiple bases to find the `_Which` member. The MSVC STL implements this
in `xsmf_control.h` which conditionally adds/deletes copy/move
constructors/operators.

We now go to `_Variant_base` (the holder of `_Which`). This inherits
from `_Variant_storage`, which is our entry point to finding the n-th
storage (going through `_Tail`).
Delta File
+21 -0 lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/variant/TestDataFormatterStdVariant.py
+11 -5 lldb/source/Plugins/Language/CPlusPlus/MsvcStlVariant.cpp
+5 -0 lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/variant/main.cpp
+37 -5 3 files

LLVM/project 1f27f4a llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.h AMDGPUBaseInfo.cpp

review: move hardwareLimit inside AMDGPUBaseInfo
Delta File
+30 -51 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+20 -0 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+19 -0 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+69 -51 3 files

LLVM/project 0790c75 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Update SIInsertWaitcnts.cpp

Co-authored-by: Pierre van Houtryve <pierre.vanhoutryve at amd.com>
Delta File
+1 -1 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+1 -1 1 files

LLVM/project 1e10ed0 llvm/test/CodeGen/AMDGPU expand-waitcnt-profiling.ll

add more test
Delta File
+225 -0 llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+225 -0 1 files

LLVM/project 06ce938 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Update SIInsertWaitcnts.cpp

Co-authored-by: Pierre van Houtryve <pierre.vanhoutryve at amd.com>
Delta File
+1 -1 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+1 -1 1 files

LLVM/project 4cd57e1 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

fix: resolve issue after rebase
Delta File
+0 -15 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+0 -15 1 files

LLVM/project d2565ea llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU expand-waitcnt-profiling.ll

skip expanding out-of-order events
Delta File
+143 -20 llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+42 -12 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+185 -32 2 files

LLVM/project 45dd426 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Address reviewer feedback: fix getWaitCountMax and reduce code duplication

- Fix getWaitCountMax() to use correct bitmasks based on architecture:
  - Pre-GFX12: Use getVmcntBitMask/getLgkmcntBitMask for LOAD_CNT/DS_CNT
  - GFX12+: Use getLoadcntBitMask/getDscntBitMask for LOAD_CNT/DS_CNT
- Refactor repetitive if-blocks for LOAD_CNT, DS_CNT, EXP_CNT into
  a single loop using getCounterRef helper function
- Fix X_CNT to return proper getXcntBitMask(IV) instead of 0
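
The refactoring idea in the second bullet can be sketched as follows. Everything here is an assumption for illustration: `Counter`, `ToyWaitcnt`, and `clampAll` are hypothetical, and `getCounterRef` only borrows the name from the message, with a guessed signature and body. The clamping done inside the loop is a placeholder for whatever the original if-blocks did.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical stand-ins; the real types live in the AMDGPU backend.
enum class Counter { Load, Ds, Exp };

struct ToyWaitcnt {
  unsigned LoadCnt = ~0u; // ~0u means "no wait required"
  unsigned DsCnt = ~0u;
  unsigned ExpCnt = ~0u;
};

// Maps a counter kind to the matching field so callers can loop over kinds.
unsigned &getCounterRef(ToyWaitcnt &W, Counter C) {
  switch (C) {
  case Counter::Load:
    return W.LoadCnt;
  case Counter::Ds:
    return W.DsCnt;
  case Counter::Exp:
    return W.ExpCnt;
  }
  assert(false && "unknown counter");
  return W.LoadCnt;
}

// One loop replaces three near-identical if-blocks: clamp each counter that
// requests a wait to at most Max.
void clampAll(ToyWaitcnt &W, unsigned Max) {
  for (Counter C : {Counter::Load, Counter::Ds, Counter::Exp}) {
    unsigned &Ref = getCounterRef(W, C);
    if (Ref != ~0u && Ref > Max)
      Ref = Max;
  }
}
```

Returning a reference keeps the per-counter logic in one place, so adding another counter kind means touching the enum and the switch rather than copying a block.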
Delta File
+18 -32 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+18 -32 1 files

LLVM/project f7e94f1 llvm/test/CodeGen/AMDGPU expand-waitcnt-profiling.ll

add RUN line for different GPU generations and counter types
Delta File
+567 -203 llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+567 -203 1 files

LLVM/project 8ac8c55 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU expand-waitcnt-profiling.ll

[AMDGPU] Add -amdgpu-expand-waitcnt-profiling option for PC-sampling profiling
Delta File
+230 -0 llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+167 -22 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+397 -22 2 files

LLVM/project 9b8dd2c llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

avoid duplicating getWaitCountMax
Delta File
+41 -58 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+41 -58 1 files

LLVM/project 6bde19f lldb/test/API/commands/process/detach-resumes TestDetachResumes.py

[lldb] Disable flaky TestDetachResumes.py on Windows/AArch64

This patch marks TestDetachResumes.py as skipped on Windows/AArch64.
It has been failing intermittently on the Windows AArch64 buildbot:
https://lab.llvm.org/buildbot/#/builders/141/

This extends the prior change that disabled the same test on Windows
x86_64 (commit 6d8d4cf9a46b3729732736ffe288f6b722d85121 by Dmitry
Vasilyev, 2025-06-23). See #144891 for background and original
discussion.
Delta File
+0 -1 lldb/test/API/commands/process/detach-resumes/TestDetachResumes.py
+0 -1 1 files

LLVM/project 4b813be openmp/runtime/unittests CMakeLists.txt

improve LLVM_RUNTIMES_BUILD var handling
Delta File
+13 -2 openmp/runtime/unittests/CMakeLists.txt
+13 -2 1 files

LLVM/project e51f25a llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/AArch64 arm64-vabs.ll

[SDAG] Combine select into ABD?, for const (#173581)

(select (setcc ...) (sub a, b) (sub b, a))

When b is a constant, `sub a, b` becomes `add a, -b`, which this patch handles with the m_SpecificNeg() matcher.
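
A scalar sketch of why the constant case still computes an absolute difference, assuming unsigned 32-bit operands and an unsigned greater-than comparison. `absDiff` and `selectFormWithConst` are hypothetical helpers for illustration, the constant 10 is arbitrary, and the setcc condition in the real pattern is elided in the message above.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics of an unsigned absolute difference.
uint32_t absDiff(uint32_t A, uint32_t B) {
  return A > B ? A - B : B - A;
}

// The select form with a constant operand, written the way the message
// describes it: "a - C" has become "a + (-C)".
uint32_t selectFormWithConst(uint32_t A) {
  constexpr uint32_t C = 10;        // arbitrary constant for this example
  constexpr uint32_t NegC = 0u - C; // two's-complement negation of C
  static_assert(NegC + C == 0u, "NegC must be the additive inverse of C");
  return A > C ? A + NegC : C - A;
}
```

Because unsigned arithmetic wraps modulo 2^32, `A + NegC` equals `A - C`, so both arms of the select are the two orderings of the same subtraction.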
Delta File
+91 -0 llvm/test/CodeGen/AArch64/arm64-vabs.ll
+16 -8 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+107 -8 2 files

LLVM/project fcff5b0 llvm/lib/Target/LoongArch LoongArchISelLowering.cpp, llvm/test/CodeGen/LoongArch/lsx issue174606.ll

[LoongArch] Disable strict node mutation to fix strict FP lowering crash (#175484)

The patch disables strict node mutation for LoongArch by setting
IsStrictFPEnabled to true.

This change fixes the current strict FP lowering crash only.
ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS can be further improved.

Fixes #174606
Delta File
+32 -0 llvm/test/CodeGen/LoongArch/lsx/issue174606.ll
+3 -0 llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+35 -0 2 files

LLVM/project 19317ad clang/include/clang/Interpreter IncrementalExecutor.h, clang/lib/Interpreter IncrementalExecutor.cpp Interpreter.cpp

[clang-repl] Rework layering of incremental executors. (#175448)

The original Interpreter implementation had a hard dependency on ORC and
grew organically with the addition of out-of-process JIT support. This
tightly coupled the Interpreter to a specific execution engine and
leaked ORC-specific assumptions (runtime layout, symbol lookup,
exception model) into higher layers.

The WebAssembly integration demonstrated that incremental execution can
be implemented without ORC, exposing the need for a cleaner abstraction
boundary.

This change introduces an IncrementalExecutor interface and moves
ORC-based execution behind a concrete implementation. The Interpreter
now depends only on the abstract executor, improving layering and
encapsulation.

In addition, the Interpreter can be configured with user-provided
incremental executor implementations, enabling ORC-independent
execution, easier testing, and future extensions without modifying the
core Interpreter.
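
The layering described above can be sketched in miniature. All names below (`ToyExecutor`, `RecordingExecutor`, `ToyInterpreter`) are hypothetical and heavily simplified; the actual interface is declared in clang/include/clang/Interpreter/IncrementalExecutor.h and differs from this.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical abstract executor: the only thing the interpreter knows about.
class ToyExecutor {
public:
  virtual ~ToyExecutor() = default;
  // Returns true on success.
  virtual bool execute(const std::string &Code) = 0;
};

// A user-provided executor that only records its input, handy for testing
// without any JIT behind it.
class RecordingExecutor : public ToyExecutor {
public:
  explicit RecordingExecutor(std::vector<std::string> &Log) : Log(Log) {}
  bool execute(const std::string &Code) override {
    Log.push_back(Code);
    return true;
  }

private:
  std::vector<std::string> &Log;
};

// The interpreter depends on the abstract interface, never a concrete engine.
class ToyInterpreter {
public:
  explicit ToyInterpreter(std::unique_ptr<ToyExecutor> E)
      : Exec(std::move(E)) {
    assert(Exec && "an executor must be supplied");
  }
  bool parseAndExecute(const std::string &Code) {
    return Exec->execute(Code);
  }

private:
  std::unique_ptr<ToyExecutor> Exec;
};
```

Swapping in a different engine then means passing a different `ToyExecutor` subclass to the constructor, with no change to the interpreter class itself.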
Delta File
+250 -100 clang/lib/Interpreter/IncrementalExecutor.cpp
+13 -260 clang/lib/Interpreter/Interpreter.cpp
+121 -0 clang/lib/Interpreter/OrcIncrementalExecutor.cpp
+93 -0 clang/include/clang/Interpreter/IncrementalExecutor.h
+0 -90 clang/lib/Interpreter/IncrementalExecutor.h
+67 -4 clang/unittests/Interpreter/InterpreterExtensionsTest.cpp
+544 -454 7 files not shown
+643 -516 13 files

LLVM/project 65a5cdf llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt

Merge branch 'main' into users/hev/issue-168152
Delta File
+42,349 -42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419 -41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428 -36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175 -28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,708 -22,884 llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276 -22,275 llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,355 -193,526 11,266 files not shown
+1,800,125 -1,347,924 11,272 files