[RISCV][Disassembler] Refactor simple predicate decoders using a template
This replaces the manual boilerplate for DecodeGPRNoX0, DecodeGPRNoX2,
DecodeGPRNoX31, and DecodeGPRPairNoX0 with a universal filtering template
and constexpr predicate functions.
I will need more of these for the RVY patch series, so submitting this NFC
cleanup first.
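As a rough illustration of the pattern described above (not the code from the patch), a filtering template parameterized on a constexpr predicate could look like the sketch below. `DecodeStatus`, `Inst`, `decodeGPR`, `decodeGPRFiltered`, and the `isNotX*` predicates are hypothetical stand-ins, not the actual LLVM disassembler API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the LLVM types; not the real MCDisassembler API.
enum class DecodeStatus { Fail, Success };

struct Inst {
  uint32_t Reg = 0;
};

// Constexpr predicates: each returns true when the register number is allowed.
constexpr bool isNotX0(uint32_t RegNo) { return RegNo != 0; }
constexpr bool isNotX2(uint32_t RegNo) { return RegNo != 2; }
constexpr bool isNotX31(uint32_t RegNo) { return RegNo != 31; }

// Base decoder shared by every filtered variant.
inline DecodeStatus decodeGPR(Inst &I, uint32_t RegNo) {
  if (RegNo >= 32)
    return DecodeStatus::Fail;
  I.Reg = RegNo;
  return DecodeStatus::Success;
}

// Generic filter: reject what the predicate refuses, otherwise defer to the
// base decoder. One template replaces a hand-written wrapper per predicate.
template <bool (*Pred)(uint32_t)>
DecodeStatus decodeGPRFiltered(Inst &I, uint32_t RegNo) {
  if (!Pred(RegNo))
    return DecodeStatus::Fail;
  return decodeGPR(I, RegNo);
}
```

In this sketch, adding another restricted register class only requires a new one-line predicate, e.g. `decodeGPRFiltered<isNotX2>(I, RegNo)`.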
Pull Request: https://github.com/llvm/llvm-project/pull/198146
[LLDB] Add a progress event to xcrun invocations (#198931)
LLDB invokes xcrun to find SDKs on disk. This is usually very fast, but
sometimes (after an Xcode update, or when the searched SDK does not
exist) it can take a long time (10 seconds or more). The progress event
gives the user feedback that explains the apparent hang.
[SSAF] Let UnsafeBufferUsageExtractor & PointerFlowExtractor ignore templates
Templates are ignored for two reasons:
- Template instantiations are still handled. Template facts can be
inferred from their instantiations.
- Templates are inherently difficult to reason about. Their ASTs can
contain dependent expression types (such as ParenListExpr) that
complicate analysis.
[libc++] Add support for thread-id handling for llvm-libc. (#198595)
This change adds support for properly defining and obtaining
`__libcpp_thread_id` when llvm-libc is used. It defines the integral
thread-id (which satisfies necessary restrictions of having total order,
being hashable and formattable) as `pthread_id_np_t` type and uses
`pthread_getthreadid_np` and `pthread_getunique_np` functions to obtain
it (added in
https://github.com/llvm/llvm-project/pull/197027, following the
discussions in https://github.com/llvm/llvm-project/pull/195139 and
https://github.com/llvm/llvm-project/pull/195202).
We also let the `_LIBCPP_NULL_THREAD` macro use the more portable
`PTHREAD_NULL` (defined in the latest POSIX) when that macro is
available, so that it works as expected for opaque `pthread_t`
implementations, where the default constructor might not necessarily
zero-initialize all the members.
This is the last remaining change to allow building libc++ against
llvm-libc with threads enabled (test-suite results TBD).
[NVPTX] Add commutativity to SETP instructions to enable MachineCSE of inverted predicates (#191890)
Inverted predicates can be used freely in PTX. If we can invert a
predicate and CSE the generating instruction we can save calculating the
inverse.
Teach the NVPTX `commuteInstructionImpl` that SETP instructions can be
inverted, allowing CSE with a previous SETP that matches the inverted
form. This also inverts the branch users of the predicate to maintain
correctness.
Currently only allow the SETP inversion if all users are branches.
Future work can extend this to `sel` and `not` instructions.
Depends on #191889.
Assisted-by: Cursor / Claude
[libc] Implement pthread_sigmask (#198682)
* Extract `rt_sigprocmask` syscall wrapper into the
libc/src/__support/OSUtil/linux/syscall_wrappers/ directory
* Convert all existing users of this syscall, and simplify the logic
where applicable.
* Implement `pthread_sigmask`, which is effectively another POSIX
wrapper around `rt_sigprocmask` syscall similar to `sigprocmask`
[libc] Update prctl() declaration to use variadic arguments. (#198654)
The prctl declaration should use variadic arguments (e.g. see
https://man7.org/linux/man-pages/man2/prctl.2.html), as the types and
number of subsequent arguments depend on the `option`. We can't
rely on all `<prctl.h>` users to explicitly cast arguments to
`unsigned long` and pass all 5 of them every time.
* Don't add any option-specific logic, and just consume `arg2`-`arg5`
from variadic arguments and pass them to syscall implementation as-is,
assuming that they won't be used by the kernel if they are not needed,
and consuming these arguments won't lead to crashes.
* Updated the test to use `prctl` variants with fewer than 5 explicit
arguments (for PR_SET_NAME and PR_GET_NAME).
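The argument-forwarding approach described above can be sketched roughly as follows. This is an illustration only, not the llvm-libc implementation: `my_prctl`, `fake_prctl_syscall`, `Recorded`, and `lastCall` are hypothetical names, and the fake syscall merely records what it receives.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical record of the last forwarded call, used only for illustration.
struct Recorded {
  int option;
  unsigned long args[4];
};

inline Recorded &lastCall() {
  static Recorded R{};
  return R;
}

// Hypothetical stand-in for the raw syscall: stores its arguments, returns 0.
inline int fake_prctl_syscall(int option, unsigned long a2, unsigned long a3,
                              unsigned long a4, unsigned long a5) {
  lastCall() = Recorded{option, {a2, a3, a4, a5}};
  return 0;
}

// Variadic wrapper with no option-specific logic: always consume four
// `unsigned long` values and forward them unchanged.
// Assumption (as in the commit message): reading arguments the caller did not
// pass is tolerated by the target ABI; strictly speaking this is not
// guaranteed by the C/C++ standards.
inline int my_prctl(int option, ...) {
  va_list vargs;
  va_start(vargs, option);
  unsigned long a2 = va_arg(vargs, unsigned long);
  unsigned long a3 = va_arg(vargs, unsigned long);
  unsigned long a4 = va_arg(vargs, unsigned long);
  unsigned long a5 = va_arg(vargs, unsigned long);
  va_end(vargs);
  return fake_prctl_syscall(option, a2, a3, a4, a5);
}
```

A call such as `my_prctl(15, 1UL, 2UL, 3UL, 4UL)` forwards all four values as-is. Callers passing fewer arguments rely on the ABI assumption noted in the comment.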
[CIR][CIRGen] Cast stack allocas to the language-visible address space (#196868)
This patch aims to improve parity with OG codegen on targets with
non-flat alloca address space. I observed this after getting some
crashes while compiling PolybenchGpu for HIP (amdgpu). This work had
previously been merged in the incubator, most notably:
https://github.com/llvm/clangir/pull/2090 and https://github.com/llvm/clangir/pull/2088.
CIR currently returns the raw `cir.alloca` address from temporary/local
alloca creation. On AMDGPU, stack allocas live in private addrspace(5),
but ordinary C/C++/HIP auto variables are still used through the
language-visible generic/flat address space.
OG CodeGen handles this by creating the alloca in the target stack
address space and immediately casting it to the language-visible address
space when those differ. For example:
```llvm
[11 lines not shown]
[SLP]Bail out when copyable has cross-block reused non-schedulable user
When a copyable scalar in the bundle being scheduled has a same-block,
non-PHI, non-schedulable user with multiple uses, and at least one of
those uses is a non-PHI use in another block, the user's dependency
tracking across multiple bundles can be inconsistent.
Cancel scheduling of such copyable bundles instead.
Fixes #198364.
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/198915
[AtomicExpand] Add bitcasts when expanding store atomic vector
AtomicExpand fails for aligned `store atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
ptrtoint + bitcast so that the call can be lowered, mirroring the
load-side handling from #148900.
[X86] Cast atomic vectors in IR to support floats
Extend the X86 `alignedstore` PatFrag to also match `atomic_store`
with vector-size alignment, so existing MOVAPS/MOVAPD/MOVDQA-family
aligned-store patterns cover 128-bit aligned vector atomic stores on
SSE/AVX/AVX-512 without per-type duplicates. `<4 x float>`,
`<2 x double>`, `<2 x i64>`, `<4 x i32>`, `<8 x half>`, `<8 x bfloat>`
all codegen to a single `movaps`/`movapd` on AVX+ via this.
Adds v8f16/v8bf16 bitconvert variants to the widen-path
`atomic_store_32` / `atomic_store_64` patterns so `<2 x half>`,
`<2 x bfloat>`, `<4 x half>`, `<4 x bfloat>` atomic stores reaching
the PR4 widen path also collapse to a single instruction on AVX+
targets.
Vectors whose `getTypeAction` is split rather than widen still rely
on PR6's `SplitVecOp_ATOMIC_STORE`; that path bitcasts the vector
to a scalar integer and issues an integer `atomic_store_N`, picked
up by the pre-existing scalar atomic-store patterns. The two
[4 lines not shown]
[SelectionDAG] Split vector types for atomic store
Vector types that aren't widened are split so that a single ATOMIC_STORE
is issued for the entire vector at once. This enables SelectionDAG to
translate vectors with element types bfloat and half.
[X86] Remove extra MOV after widening atomic store
This change adds patterns to optimize out an extra MOV present after
widening the atomic store. Covers <2 x i8> (SSE4.1+), <2 x i16>,
<4 x i8>, <2 x i32>, <2 x float>, <4 x i16>, <2 x ptr addrspace(270)>.
[SelectionDAG] Widen <2 x T> vector types for atomic store
Vector types of 2 elements must be widened. This change widens them
for atomic stores in SelectionDAG so that it can translate aligned
vectors with more than one element.
[Clang] [Docs] Remove stray release note (#198913)
The patch that added this release note was reverted (#198341), but then
(#198167) accidentally added it back.
[Clang] Disallow `break`/`continue` in loop conditions (#198436)
tl;dr: This makes e.g. `while (({ break; 1; })) {}` ill-formed.
GCC used to allow this a long time ago (< GCC 9 I believe), but
eventually removed support for it; we originally allowed this both for
GCC compatibility and because there was actual code in the wild using it
(see Richard’s comment here for more background:
https://github.com/llvm/llvm-project/pull/152606#issuecomment-3166130973).
Note that this _is_ still allowed inside another loop, e.g. this
```c++
for (;;) {
while (({ break; true; })) {}
}
```
is well-formed; the `break` here will break out of the `for` loop.
Removing support for this gets rid of quite a bit of code and has a few
[32 lines not shown]