[X86] Allow EVEX compression for VPMOV*2M + KMOV pattern (#175219)
This patch extends the X86CompressEVEX pass to recognize and compress
multi-instruction masking patterns. It also adds relevant tests for the
new pattern.
Fixes #171746
Fixes #174871
[InstCombine][X86] Move simplifyX86FPMaxMin handling from simplifyDemandedVectorEltsIntrinsic to instCombineIntrinsic (#175441)
My fault for missing this when reviewing #174806 - technically we might
benefit from demanded elts handling for these intrinsics some day, but
the base implementation should be in instCombineIntrinsic.
Noticed while reviewing #175375, where I recommended reusing more of the
simplifyX86FPMaxMin handling.
[lldb] Fix TestFrameVarDILCast.py build on Windows AArch64
This patch adds <cstddef> to main.cpp in TestFrameVarDILCast.py so that
std::nullptr_t is properly declared. It fixes the TestFrameVarDILCast.py
compile failure observed on the LLDB Windows AArch64 buildbot:
https://lab.llvm.org/buildbot/#/builders/141
The issue was introduced by commit 539cf92 in #170332.
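For context, a minimal sketch of the kind of declaration that needs the header. The variable name is hypothetical and not taken from the test's main.cpp. `std::nullptr_t` is declared in `<cstddef>`; whether it is visible without that include depends on which other standard headers a given toolchain pulls in transitively, which would explain a failure on only one platform.

```cpp
#include <cstddef>      // declares std::nullptr_t
#include <type_traits>

// Hypothetical variable for illustration, not taken from the test's main.cpp.
std::nullptr_t null_value = nullptr;

// std::nullptr_t is defined as the type of the nullptr literal.
static_assert(std::is_same<std::nullptr_t, decltype(nullptr)>::value,
              "std::nullptr_t is the type of nullptr");
```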
[LLDB] Increase level of headings in lldb-dap docs (#175519)
The lldb-dap docs had more than one top-level heading (one `#`). All
top-level headings are shown in the "Using LLDB" list on the left side. In
this case, "Supported Features" and "Configuration Settings Reference"
showed up there.
With this PR, these headings are increased by one level (from `#` to `##`). This also
increases the level of "Debug Console" (child of "Supported Features")
and "Common/Launch/Attach configurations" (child of "Configuration
Settings Reference").
[AMDGPU][SIInsertWaitcnt] Implement Waitcnt Expansion for Profiling (#169345)
Reference issue: https://github.com/ROCm/llvm-project/issues/67
This patch adds support for expanding s_waitcnt instructions into
sequences with decreasing counter values, enabling PC-sampling profilers
to identify which specific memory operation is causing a stall.
This is controlled via:
- Clang flag: -mamdgpu-expand-waitcnt-profiling /
  -mno-amdgpu-expand-waitcnt-profiling
- Function attribute: "amdgpu-expand-waitcnt-profiling"
When enabled, instead of emitting a single waitcnt, the pass generates a
sequence that waits for each outstanding operation individually. For
example, if there are 5 outstanding memory operations and the target is
to wait until 2 remain:
[23 lines not shown]
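The arithmetic behind the expansion can be sketched as follows. This is a standalone illustration, not the pass's code: the helper name `expandWaitcnt` is hypothetical, and keeping a single wait when nothing needs expanding is an assumption. It only computes the counter values such a sequence would carry, one wait per outstanding operation that has to retire.

```cpp
#include <vector>

// Hypothetical helper (not from the patch): returns the counter values of
// the expanded wait sequence, from Outstanding - 1 down to Target.
std::vector<unsigned> expandWaitcnt(unsigned Outstanding, unsigned Target) {
  std::vector<unsigned> Seq;
  // Assumption: if nothing needs to retire, keep the single original wait.
  if (Outstanding <= Target) {
    Seq.push_back(Target);
    return Seq;
  }
  // Each step waits for exactly one more operation to complete.
  for (unsigned N = Outstanding; N > Target; --N)
    Seq.push_back(N - 1);
  return Seq;
}
```

With 5 outstanding operations and a target of 2, the helper yields the values 4, 3, 2, so a sampled PC landing on one of the three waits indicates which operation was still pending.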
Revert "[Clang] Warn when `std::atomic_thread_fence` is used with `fsanitize=thread`" (#175520)
Reverts llvm/llvm-project#166542
It caused clang to assert with: `!isa<CXXDestructorDecl>(D) && "Use
other ctor with dtor decls!"`
See the comment on the PR.
[clang][bytecode] Fix calling lambdas with broken instance pointers (#175511)
Clang will make the instance pointer be of type 'int' if it is invalid,
which trips up later logic. Mark functions as invalid if any of their
parameters is invalid, and compile and check them early in CallPtr.
Fixes https://github.com/llvm/llvm-project/issues/175425
[clang-repl] Move the produced temporary files in wasm in a temp folder. (#175508)
This patch avoids bloating the current folder with temporary files
created by wasm and moves them into a separate tmp directory. This patch
is a version of a downstream patch that resolves this issue.
Co-authored with Anutosh Bhat.
[LLDB] Fix MS STL `variant` with non-trivial types (#171489)
When using `std::variant` with non-trivial types, we need to go through
multiple bases to find the `_Which` member. The MSVC STL implements this
in `xsmf_control.h`, which conditionally adds/deletes copy/move
constructors/operators.
We now go to `_Variant_base` (the holder of `_Which`). This inherits
from `_Variant_storage`, which is our entry point to finding the n-th
storage (going through `_Tail`).
Address reviewer feedback: fix getWaitCountMax and reduce code duplication
- Fix getWaitCountMax() to use correct bitmasks based on architecture:
  - Pre-GFX12: Use getVmcntBitMask/getLgkmcntBitMask for LOAD_CNT/DS_CNT
  - GFX12+: Use getLoadcntBitMask/getDscntBitMask for LOAD_CNT/DS_CNT
- Refactor repetitive if-blocks for LOAD_CNT, DS_CNT, EXP_CNT into
  a single loop using getCounterRef helper function
- Fix X_CNT to return proper getXcntBitMask(IV) instead of 0
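The loop refactoring can be illustrated with a self-contained sketch. Everything here is a hypothetical stand-in: `WaitSketch`, `CounterKind`, and `clampToMax` are not the backend's types, the signature of `getCounterRef` is assumed, and clamping is just a placeholder for whatever the real per-counter logic does. The point is only the shape: a helper that maps a counter kind to a field reference lets one loop replace three near-identical if-blocks.

```cpp
#include <array>
#include <initializer_list>

// Hypothetical stand-ins; the real types live in the AMDGPU backend.
enum CounterKind { LOAD_CNT, DS_CNT, EXP_CNT };

struct WaitSketch {
  unsigned LoadCnt = 0;
  unsigned DsCnt = 0;
  unsigned ExpCnt = 0;
};

// Maps a counter kind to the matching field so callers can loop over kinds.
unsigned &getCounterRef(WaitSketch &W, CounterKind K) {
  switch (K) {
  case LOAD_CNT: return W.LoadCnt;
  case DS_CNT:   return W.DsCnt;
  case EXP_CNT:  return W.ExpCnt;
  }
  return W.LoadCnt; // not reached for valid kinds
}

// One loop instead of three if-blocks. Clamping to a per-kind maximum is an
// illustrative assumption, not the patch's actual per-counter logic.
void clampToMax(WaitSketch &W, const std::array<unsigned, 3> &Max) {
  for (CounterKind K : {LOAD_CNT, DS_CNT, EXP_CNT}) {
    unsigned &Cnt = getCounterRef(W, K);
    if (Cnt > Max[K])
      Cnt = Max[K];
  }
}
```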