LLVM/project ff005aautils/bazel/llvm-project-overlay/clang BUILD.bazel, utils/bazel/llvm-project-overlay/clang/unittests BUILD.bazel

Fix bazel test failures caused in #175435 (#175533)

DeltaFile
+1-1utils/bazel/llvm-project-overlay/clang/unittests/BUILD.bazel
+1-0utils/bazel/llvm-project-overlay/clang/BUILD.bazel
+2-12 files

LLVM/project 4790a14llvm/lib/Target/X86 X86CompressEVEX.cpp, llvm/test/CodeGen/X86 pr174871.ll evex-to-vex-compress.mir

[X86] Allow EVEX compression for VPMOV*2M + KMOV pattern (#175219)

This patch extends the X86CompressEVEX pass to recognize and compress
multi-instruction masking patterns. It also adds relevant tests for the
new pattern.

Fixes #171746
Fixes #174871
DeltaFile
+151-4llvm/lib/Target/X86/X86CompressEVEX.cpp
+146-0llvm/test/CodeGen/X86/pr174871.ll
+60-0llvm/test/CodeGen/X86/evex-to-vex-compress.mir
+11-22llvm/test/CodeGen/X86/masked_gather_scatter.ll
+6-12llvm/test/CodeGen/X86/vector-shuffle-v1.ll
+3-6llvm/test/CodeGen/X86/avx512dqvl-intrinsics-upgrade.ll
+377-449 files not shown
+390-7015 files

LLVM/project 424998cllvm/lib/Target/X86 X86InstCombineIntrinsic.cpp

[InstCombine][X86] Move simplifyX86FPMaxMin handling from simplifyDemandedVectorEltsIntrinsic to instCombineIntrinsic (#175441)

My fault for missing this when reviewing #174806. Technically we might
benefit from demanded elts handling for these intrinsics some day, but
the base implementation should be in instCombineIntrinsic.

Noticed while reviewing #175375, where I recommended reusing more of the
simplifyX86FPMaxMin handling.
DeltaFile
+27-26llvm/lib/Target/X86/X86InstCombineIntrinsic.cpp
+27-261 files

LLVM/project 8f18252llvm/lib/Transforms/Vectorize VPlanTransforms.cpp, llvm/test/Transforms/LoopVectorize/X86 predicated-udiv.ll

[VPlan] Don't fold UDiv in replicate regions. (#175460)

The UDiv fold added in d12e993 (#174581) is currently also applied to
replicate regions, so we may end up with VPInstructions in replicate
regions, which is not currently supported.

Fixes https://github.com/llvm/llvm-project/issues/175295.

PR: https://github.com/llvm/llvm-project/pull/175460
DeltaFile
+265-0llvm/test/Transforms/LoopVectorize/X86/predicated-udiv.ll
+6-1llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+271-12 files

LLVM/project 4415ea5lldb/test/API/commands/frame/var-dil/expr/Casts main.cpp

[lldb] Fix TestFrameVarDILCast.py build on Windows AArch64

This patch adds <cstddef> to main.cpp in TestFrameVarDILCast.py so that
std::nullptr_t is properly declared. It fixes the TestFrameVarDILCast.py
compile failure observed on the LLDB Windows AArch64 buildbot:
https://lab.llvm.org/buildbot/#/builders/141

The issue was introduced by commit 539cf92 in #170332.
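
To illustrate why the include matters, here is a minimal sketch, not the test's actual source. The `isNull` overloads are hypothetical names chosen for illustration. The point is only that `std::nullptr_t` is declared by `<cstddef>`, so a file that names it should include that header directly instead of relying on another header pulling it in.

```cpp
#include <cstddef> // declares std::nullptr_t

// Hypothetical overload set (not taken from main.cpp): the first overload
// names std::nullptr_t, which is only guaranteed to be declared once
// <cstddef> has been included.
constexpr bool isNull(std::nullptr_t) { return true; }
constexpr bool isNull(const void *P) { return P == nullptr; }
```

Some standard library implementations happen to declare `std::nullptr_t` through other headers, which may be why a missing include goes unnoticed on one platform and fails on another.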
DeltaFile
+1-0lldb/test/API/commands/frame/var-dil/expr/Casts/main.cpp
+1-01 files

LLVM/project b574f44lldb/docs/use lldbdap.md

[LLDB] Increase level of headings in lldb-dap docs (#175519)

The lldb-dap docs had more than one top-level heading (one `#`). All top
level headings are shown in the "Using LLDB" list on the left side. In
this case, "Supported Features" and "Configuration Settings Reference"
showed up there.

With this PR, these headings are increased by one level. This also
increases the level of "Debug Console" (child of "Supported Features")
and "Common/Launch/Attach configurations" (child of "Configuration
Settings Reference").
DeltaFile
+6-6lldb/docs/use/lldbdap.md
+6-61 files

LLVM/project 3dfb782clang/include/clang/Basic CodeGenOptions.def, clang/include/clang/Options Options.td

[AMDGPU][SIInsertWaitcnt] Implement Waitcnt Expansion for Profiling (#169345)

Reference issue: https://github.com/ROCm/llvm-project/issues/67

This patch adds support for expanding s_waitcnt instructions into
sequences with decreasing counter values, enabling PC-sampling profilers
to identify which specific memory operation is causing a stall.

This is controlled via:
- Clang flag: -mamdgpu-expand-waitcnt-profiling / -mno-amdgpu-expand-waitcnt-profiling
- Function attribute: "amdgpu-expand-waitcnt-profiling"

When enabled, instead of emitting a single waitcnt, the pass generates a
sequence that waits for each outstanding operation individually. For
example, if there are 5 outstanding memory operations and the target is
to wait until 2 remain:



    [23 lines not shown]
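
The commit's own example is elided above and is not reproduced here. As a rough sketch only, assuming that "decreasing counter values" means one wait per retired operation, counting down from one less than the outstanding count to the target, the counter values could be computed as follows. `expandWaitCounts` is a hypothetical helper, not a function from SIInsertWaitcnts.cpp.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (an assumption, not the pass's real API): returns the
// counter values an expanded wait sequence would use, in emission order.
std::vector<unsigned> expandWaitCounts(unsigned Outstanding, unsigned Target) {
  std::vector<unsigned> Counts;
  // Assumption: if nothing beyond the target is outstanding, a single wait
  // on the target value is kept.
  if (Outstanding <= Target) {
    Counts.push_back(Target);
    return Counts;
  }
  // One wait per retired operation: Outstanding-1, Outstanding-2, ..., Target.
  for (unsigned N = Outstanding; N > Target; --N)
    Counts.push_back(N - 1);
  // The last wait leaves exactly Target operations outstanding.
  assert(Counts.back() == Target);
  return Counts;
}
```

Under these assumptions, `expandWaitCounts(5, 2)` yields 4, 3, 2, so a PC sample that lands on a given wait points at the operation that wait is blocked on.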
DeltaFile
+944-0llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+204-93llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+20-0llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+19-0llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+7-0clang/include/clang/Options/Options.td
+4-0clang/include/clang/Basic/CodeGenOptions.def
+1,198-931 files not shown
+1,200-937 files

LLVM/project a6378b6llvm/include/llvm/CodeGen ReachingDefAnalysis.h, llvm/lib/CodeGen ReachingDefAnalysis.cpp

[ReachingDefAnalysis][NFC] Use named constants. (#175075)

DeltaFile
+3-3llvm/lib/CodeGen/ReachingDefAnalysis.cpp
+3-1llvm/include/llvm/CodeGen/ReachingDefAnalysis.h
+6-42 files

LLVM/project 2a8be8bclang/docs ReleaseNotes.rst, clang/include/clang/Basic DiagnosticSemaKinds.td

Revert "[Clang] Warn when `std::atomic_thread_fence` is used with `fsanitize=thread`" (#175520)

Reverts llvm/llvm-project#166542

It caused clang to assert with: `!isa<CXXDestructorDecl>(D) && "Use
other ctor with dtor decls!"`
See the comment on the PR.
DeltaFile
+0-70clang/test/SemaCXX/warn-tsan-atomic-fence.cpp
+0-67clang/lib/Sema/SemaChecking.cpp
+2-7clang/include/clang/Sema/Sema.h
+2-2clang/lib/Sema/Sema.cpp
+0-3clang/include/clang/Basic/DiagnosticSemaKinds.td
+0-2clang/docs/ReleaseNotes.rst
+4-1516 files

LLVM/project 6a28bd6llvm/test/Transforms/NaryReassociate/AMDGPU nary-add-uniform.ll

add GEP test
DeltaFile
+8-9llvm/test/Transforms/NaryReassociate/AMDGPU/nary-add-uniform.ll
+8-91 files

LLVM/project ac62f12mlir/include/mlir/Dialect/AMDGPU/IR AMDGPU.td, mlir/lib/Dialect/AMDGPU/IR AMDGPUDialect.cpp

[mlir][amdgpu] Remove redundant barriers (#175436)

DeltaFile
+15-0mlir/lib/Dialect/AMDGPU/IR/AMDGPUDialect.cpp
+12-0mlir/test/Dialect/AMDGPU/canonicalize.mlir
+5-4mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
+2-6mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
+34-104 files

LLVM/project f0982d5clang/lib/AST/ByteCode Interp.cpp ByteCodeEmitter.cpp, clang/test/AST/ByteCode cxx23.cpp

[clang][bytecode] Fix calling lambdas with broken instance pointers (#175511)

Clang will make the instance pointer be of type 'int' if it is invalid,
which trips up later logic. Mark functions as invalid if any of their
parameters is invalid, and compile and check them early in CallPtr.

Fixes https://github.com/llvm/llvm-project/issues/175425
DeltaFile
+11-2clang/test/AST/ByteCode/cxx23.cpp
+9-0clang/lib/AST/ByteCode/Interp.cpp
+4-2clang/lib/AST/ByteCode/ByteCodeEmitter.cpp
+24-43 files

LLVM/project 4cec622clang/include/clang/Interpreter IncrementalExecutor.h, clang/lib/Interpreter Wasm.cpp Wasm.h

[clang-repl] Move the produced temporary files in wasm in a temp folder. (#175508)

This patch avoids bloating the current folder with temporary files
created by wasm and moves them into a separate tmp directory. This patch
is a version of a downstream one resolving this issue.

Co-authored with Anutosh Bhat.
DeltaFile
+28-5clang/lib/Interpreter/Wasm.cpp
+5-1clang/lib/Interpreter/Wasm.h
+1-1clang/include/clang/Interpreter/IncrementalExecutor.h
+1-1clang/lib/Interpreter/IncrementalExecutor.cpp
+35-84 files

LLVM/project aeff209clang/docs ReleaseNotes.rst, clang/include/clang/Basic DiagnosticSemaKinds.td

Revert "[Clang] Warn when `std::atomic_thread_fence` is used with `fsanitize=…"

This reverts commit d6d5c5f6a2f468cb5aac7b466d2802841b8d1441.
DeltaFile
+0-70clang/test/SemaCXX/warn-tsan-atomic-fence.cpp
+0-67clang/lib/Sema/SemaChecking.cpp
+2-7clang/include/clang/Sema/Sema.h
+2-2clang/lib/Sema/Sema.cpp
+0-3clang/include/clang/Basic/DiagnosticSemaKinds.td
+0-2clang/docs/ReleaseNotes.rst
+4-1516 files

LLVM/project 91268a5libcxx/include scoped_allocator, libcxx/test/libcxx/diagnostics scoped_allocator.nodiscard.verify.cpp

[libc++][scoped_allocator] Applied `[[nodiscard]]` (#175291)

`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.

- https://libcxx.llvm.org/CodingGuidelines.html
- https://wg21.link/allocator.adaptor

Towards #172124
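
As a minimal sketch of the guideline, and not libc++'s actual `scoped_allocator_adaptor` code, the hypothetical `CountingAllocator` below marks `allocate` as `[[nodiscard]]`. Discarding its result would lose the only pointer to the allocation, which is the kind of correctness issue the attribute is meant to flag.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical allocator for illustration only; names are assumptions.
struct CountingAllocator {
  std::size_t Live = 0; // number of allocations not yet deallocated

  // Ignoring the returned pointer leaks the block, so a compiler that
  // supports the attribute warns on `A.allocate(4);` used as a statement.
  [[nodiscard]] int *allocate(std::size_t N) {
    assert(N > 0 && "sketch does not handle zero-sized requests");
    ++Live;
    return static_cast<int *>(::operator new(N * sizeof(int)));
  }

  void deallocate(int *P) noexcept {
    ::operator delete(P);
    --Live;
  }
};
```

A call such as `int *P = A.allocate(4);` compiles cleanly, while a bare `A.allocate(4);` draws a diagnostic; the `.verify.cpp` tests listed below check for diagnostics of that kind.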
DeltaFile
+39-0libcxx/test/libcxx/utilities/allocator.adaptor/scoped_allocator.nodiscard.verify.cpp
+0-22libcxx/test/libcxx/diagnostics/scoped_allocator.nodiscard.verify.cpp
+11-6libcxx/include/scoped_allocator
+50-283 files

LLVM/project b646209llvm/lib/Target/AArch64 AArch64Features.td, llvm/unittests/TargetParser TargetParserTest.cpp

[AArch64][llvm] Add extra dependencies for recently added features (#175215)

DeltaFile
+9-0llvm/unittests/TargetParser/TargetParserTest.cpp
+2-2llvm/lib/Target/AArch64/AArch64Features.td
+11-22 files

LLVM/project 74001ccllvm/lib/Transforms/Scalar NaryReassociate.cpp

review: refactor to keep default order code unchanged
DeltaFile
+38-33llvm/lib/Transforms/Scalar/NaryReassociate.cpp
+38-331 files

LLVM/project 45388ebllvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

avoid duplicating getWaitCountMax
DeltaFile
+47-63llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+47-631 files

LLVM/project 25a2c0cclang/include/clang/Basic CodeGenOptions.def, clang/include/clang/Options Options.td

review: use function attr instead of cl::opt flag
DeltaFile
+17-15llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+7-0clang/include/clang/Options/Options.td
+3-1llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+4-0clang/include/clang/Basic/CodeGenOptions.def
+2-0clang/lib/CodeGen/Targets/AMDGPU.cpp
+33-165 files

LLVM/project 9a632fdlldb/source/Plugins/Language/CPlusPlus MsvcStlVariant.cpp, lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/variant TestDataFormatterStdVariant.py main.cpp

[LLDB] Fix MS STL `variant` with non-trivial types (#171489)

When using `std::variant` with non-trivial types, we need to go through
multiple bases to find the `_Which` member. The MSVC STL implements this
in `xsmf_control.h` which conditionally adds/deletes copy/move
constructors/operators.

We now go to `_Variant_base` (the holder of `_Which`). This inherits
from `_Variant_storage`, which is our entry point to finding the n-th
storage (going through `_Tail`).
DeltaFile
+21-0lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/variant/TestDataFormatterStdVariant.py
+11-5lldb/source/Plugins/Language/CPlusPlus/MsvcStlVariant.cpp
+5-0lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/variant/main.cpp
+37-53 files

LLVM/project 1f27f4allvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.h AMDGPUBaseInfo.cpp

review: move hardwareLimit inside AMDGPUBaseInfo
DeltaFile
+30-51llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+20-0llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+19-0llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+69-513 files

LLVM/project 0790c75llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Update SIInsertWaitcnts.cpp

Co-authored-by: Pierre van Houtryve <pierre.vanhoutryve at amd.com>
DeltaFile
+1-1llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+1-11 files

LLVM/project 1e10ed0llvm/test/CodeGen/AMDGPU expand-waitcnt-profiling.ll

add more test
DeltaFile
+225-0llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+225-01 files

LLVM/project 06ce938llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Update SIInsertWaitcnts.cpp

Co-authored-by: Pierre van Houtryve <pierre.vanhoutryve at amd.com>
DeltaFile
+1-1llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+1-11 files

LLVM/project 4cd57e1llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

fix: resolve issue after rebase
DeltaFile
+0-15llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+0-151 files

LLVM/project d2565eallvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU expand-waitcnt-profiling.ll

skip expanding out-of-order events
DeltaFile
+143-20llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+42-12llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+185-322 files

LLVM/project 45dd426llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Address reviewer feedback: fix getWaitCountMax and reduce code duplication

- Fix getWaitCountMax() to use correct bitmasks based on architecture:
  - Pre-GFX12: Use getVmcntBitMask/getLgkmcntBitMask for LOAD_CNT/DS_CNT
  - GFX12+: Use getLoadcntBitMask/getDscntBitMask for LOAD_CNT/DS_CNT
- Refactor repetitive if-blocks for LOAD_CNT, DS_CNT, EXP_CNT into
  a single loop using getCounterRef helper function
- Fix X_CNT to return proper getXcntBitMask(IV) instead of 0
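
The loop refactor in the second bullet might look roughly like the sketch below. Everything here is an assumption for illustration: the `Waits` struct, the `clampToMax` function, and the signature of `getCounterRef` are hypothetical and are not copied from SIInsertWaitcnts.cpp. The idea is that a helper returning a reference to the right field lets one loop replace three near-identical if-blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>

// Hypothetical stand-ins for the pass's counter kinds and wait state.
enum CounterKind { LOAD_CNT, DS_CNT, EXP_CNT, NUM_KINDS };

struct Waits {
  unsigned LoadCnt = 0;
  unsigned DsCnt = 0;
  unsigned ExpCnt = 0;
};

// Maps a counter kind to the field that stores it.
unsigned &getCounterRef(Waits &W, CounterKind K) {
  switch (K) {
  case LOAD_CNT:
    return W.LoadCnt;
  case DS_CNT:
    return W.DsCnt;
  default:
    assert(K == EXP_CNT && "unknown counter kind");
    return W.ExpCnt;
  }
}

// One loop body handles all three counters; here the body clamps each
// counter to a per-kind maximum (an illustrative operation, not the
// commit's actual logic).
void clampToMax(Waits &W, const unsigned (&Max)[NUM_KINDS]) {
  for (CounterKind K : {LOAD_CNT, DS_CNT, EXP_CNT}) {
    unsigned &C = getCounterRef(W, K);
    C = std::min(C, Max[K]);
  }
}
```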
DeltaFile
+18-32llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+18-321 files

LLVM/project f7e94f1llvm/test/CodeGen/AMDGPU expand-waitcnt-profiling.ll

add run line for diff GPU Gen and counter types
DeltaFile
+567-203llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+567-2031 files

LLVM/project 8ac8c55llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU expand-waitcnt-profiling.ll

[AMDGPU] Add -amdgpu-expand-waitcnt-profiling option for PC-sampling profiling
DeltaFile
+230-0llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+167-22llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+397-222 files

LLVM/project 9b8dd2cllvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

avoid duplicating getWaitCountMax
DeltaFile
+41-58llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+41-581 files