[libc] clean up wchar file deps and includes (#198648)
There were a couple comments left on the wchar file series after I'd
already merged some. This PR should apply those changes to the rest of
the wchar file functions.
Assisted-by: Automated tooling, human reviewed.
[LLVM][ARM] Add native ct.select support for ARM32 and Thumb
This patch implements architecture-specific lowering for ct.select on ARM
(both ARM32 and Thumb modes) using conditional move instructions and
bitwise operations for constant-time selection.
Implementation details:
- Uses pseudo-instructions that are expanded Post-RA to bitwise operations
- Post-RA expansion in ARMBaseInstrInfo for BUNDLE pseudo-instructions
- Handles scalar integer types, floating-point, and half-precision types
- Handles vector types with NEON when available
- Support for both ARM and Thumb instruction sets (Thumb1 and Thumb2)
- Special handling for Thumb1 which lacks conditional execution
- Comprehensive test coverage including half-precision and vectors
The implementation includes:
- ISelLowering: Custom lowering to CTSELECT pseudo-instructions
- ISelDAGToDAG: Selection of appropriate pseudo-instructions
- BaseInstrInfo: Post-RA expansion of BUNDLE to bitwise instruction sequences
[3 lines not shown]
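The bitwise form of constant-time selection mentioned above can be sketched in portable C++. This only illustrates the masking idea, not the ARM lowering itself: `ct_select_u32` is a hypothetical helper name, and a C++ compiler is free to turn this back into a branch, which is part of why a dedicated lowering is useful.

```cpp
#include <cstdint>

// Hypothetical helper: returns `a` when `cond` is true and `b` otherwise,
// without branching on `cond` at the source level.
inline std::uint32_t ct_select_u32(bool cond, std::uint32_t a, std::uint32_t b) {
  // All-ones when cond is true, all-zeros when it is false.
  const std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond);
  // Keep the bits of `a` under the mask and the bits of `b` outside it.
  return (a & mask) | (b & ~mask);
}
```

Both operands are always read and combined, so the sequence of operations does not depend on the condition.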
[X86] Manage atomic store of fp -> int promotion in DAG (#197166)
When lowering `atomic store <1 x T>` vector types with floats (i.e.
during scalarization in the selection DAG), selection can fail since
this pattern is unsupported. To support this, floats can be cast to
an integer type of the same size.
Store-side counterpart to #148895. Stacked on top of #197165 and below
#197618.
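A source-level analogue of the float-to-integer cast is sketched below. It is not the DAG code from the patch; the function names are hypothetical, and the sketch assumes a 32-bit IEEE-754 `float`.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical helper: store a float atomically by first reinterpreting its
// bits as a same-sized integer, mirroring the fp -> int promotion.
inline void atomic_store_float_bits(std::atomic<std::uint32_t> &slot, float value) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "sketch assumes a 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));  // bit-preserving, like a bitcast
  slot.store(bits, std::memory_order_seq_cst);
}

// Hypothetical helper: the matching load, converting the bits back to float.
inline float atomic_load_float_bits(const std::atomic<std::uint32_t> &slot) {
  const std::uint32_t bits = slot.load(std::memory_order_seq_cst);
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```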
Reland [VectorCombine] foldShuffleChainsToReduce - add support for partial vector reductions (#197659)
Reland of #195119, which was reverted in 2b26355 due to:
1. An assertion failure on AArch64 where
`getShuffleCost(SK_ExtractSubvector)` was called without the `SubTp`
parameter.
2. A miscompilation on non-power-of-2 vector sizes where parity-based
shuffle masks cause lane duplication in the reduction tree.
Fixes:
- Pass `ReduceVecTy` as `SubTp` to `getShuffleCost`.
- Restrict partial reductions to power-of-2 vector sizes.
---
Extend foldShuffleChainsToReduce to recognize partial reduction patterns
where only a subvector of the full vector is being reduced.
[11 lines not shown]
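As a rough scalar model of a partial reduction, the sketch below reduces only the low `width` lanes of an 8-lane vector with a halving shuffle-and-add chain. The function name and the fixed 8-lane type are assumptions for illustration; the power-of-2 assertion echoes the restriction listed in the fixes above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model: sum the low `width` lanes of `v` by repeatedly adding
// the upper half of the active region onto the lower half.
inline int partial_reduce_add(std::array<int, 8> v, std::size_t width) {
  // Only power-of-2 widths are handled, matching the restriction above.
  assert(width != 0 && width <= v.size() && (width & (width - 1)) == 0);
  for (std::size_t half = width / 2; half >= 1; half /= 2) {
    // One "shuffle + add" step: lane i receives lane i + half.
    for (std::size_t lane = 0; lane < half; ++lane)
      v[lane] += v[lane + half];
  }
  return v[0];  // lanes at or above `width` never contribute
}
```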
[LLVM][X86] Add f80 support for ct.select
Add special handling for x86_fp80 types in CTSELECT lowering by splitting
them into three 32-bit chunks, performing constant-time selection on each
chunk, and reassembling the result. This fixes crashes when compiling
tests with f80 types.
Also updated ctselect.ll to match current generic fallback implementation.
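The chunked selection can be modeled as below. How the patch pads the 80-bit value out to three 32-bit chunks is not described here, so this sketch assumes the value has already been widened to 96 bits; `Chunks96` and `ct_select_chunks` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumed representation: an 80-bit value padded to three 32-bit chunks.
using Chunks96 = std::array<std::uint32_t, 3>;

// Hypothetical helper: select `a` or `b` chunk by chunk with one shared mask.
inline Chunks96 ct_select_chunks(bool cond, const Chunks96 &a, const Chunks96 &b) {
  // All-ones when cond is true, all-zeros otherwise.
  const std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond);
  Chunks96 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = (a[i] & mask) | (b[i] & ~mask);  // same work for either outcome
  return out;  // the caller reassembles the chunks into the wide value
}
```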
[LV] Add -epilogue-tail-folding-policy flag for tail-folded epilogue (#190697)
This is the first patch in a series implementing **tail-folding on the
epilogue loop** — a vectorization style that pairs an unpredicated
vector main loop with a predicated vector epilogue.
It adds a new flag, `-epilogue-tail-folding-policy`, that makes the
style opt-in. Subsequent patches will build out the implementation.
Motivation behind this work:
- The current vectorization styles force either tail-folding on the main
vector loop with no interleaving, or unpredicated main vector loop with
interleaving.
The first style prevents us from getting the benefit of high interleaving
when it’s beneficial/possible, and the second one prevents
tail-folding even where it could be beneficial, especially for low trip counts.
- The proposed hybrid approach of having unpredicated main vector loop
with tail-folded vector epilogue combines the strengths of both styles,
[7 lines not shown]
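The hybrid loop shape can be pictured with a scalar model in C++. This is a sketch of the control structure only, not code generated by the vectorizer; `kVF`, `kIC` and `add_one` are illustrative assumptions.

```cpp
#include <cstddef>
#include <vector>

// Assumed illustration parameters, not values taken from the patch.
constexpr std::size_t kVF = 4;  // lanes per vector
constexpr std::size_t kIC = 2;  // interleave count of the main loop

// Hypothetical kernel: add 1 to every element.
inline void add_one(std::vector<int> &data) {
  const std::size_t n = data.size();
  std::size_t i = 0;

  // Main vector loop: unpredicated and interleaved, so it only runs while a
  // full kVF * kIC block of elements remains.
  for (; i + kVF * kIC <= n; i += kVF * kIC)
    for (std::size_t lane = 0; lane < kVF * kIC; ++lane)
      data[i + lane] += 1;

  // Tail-folded epilogue: steps by kVF and masks off lanes past the end,
  // so no scalar remainder loop is needed.
  for (; i < n; i += kVF)
    for (std::size_t lane = 0; lane < kVF; ++lane) {
      const bool active = i + lane < n;  // per-lane predicate
      if (active)
        data[i + lane] += 1;
    }
}
```

With 11 elements, the main loop handles the first 8 and a single masked epilogue iteration handles the remaining 3.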
[SelectionDAG] Scalarize <1 x T> vector types for atomic store (#197165)
`store atomic <1 x T>` is not valid. This change legalizes
vector types of atomic store via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.
This is the store-side counterpart to #148894. Stacked on top of
#197372 and below #197166.
[X86][AtomicExpand] Remove X86's shouldCastAtomicLoadInIR override (added in #148899)
So that atomic floating-point and FP-vector loads are no longer bitcast to an integer
at the IR level by AtomicExpand.
[clang][Sema] Diagnose nested local classes defined in a different block scope than their parent (#197863)
Fixes #193472.
[[class.local]/3](https://eel.is/c++draft/class.local#3) says:
> A class nested within a local class is a local class. A member of a
local class `X` shall be declared only in the definition of `X` or, __if
the member is a nested class, in the nearest enclosing block scope of
X__.
In other words:
```cpp
void f() {
struct X { struct S; };
struct X::S {}; // okay
struct X { struct S; };
[17 lines not shown]
[VPlan] Add branch-on-cond false original unconditional latch (NFC). (#198539)
For loops where the latch does not exit, addInitialSkeleton adds the
middle block as additional successor, as early canonicalization.
But then we end up with a block without terminator and multiple
successors. Fix this by adding a branch-on-cond false as terminator.
This preserves the original behavior (backedge always taken) and
resolves the verifier issue.
PR: https://github.com/llvm/llvm-project/pull/198539
[OpenMP][OMPIRBuilder] Fix non-determinism in removeUnusedBlocksFromParent
The openmp-cli-fuse02.mlir test fails non-deterministically (~20%) when
unrelated patches add or reorder code, causing the linker to place
objects at different offsets and the heap allocator to return different
addresses at runtime. SmallPtrSet iteration order depends on these
pointer addresses (also randomized across runs by ASLR), so fuseLoops
sometimes leaves dead blocks in the function.
The root cause is erasing blocks from the set while iterating it to
check for external uses—removing one block mid-pass changes the result
for blocks checked later. Both the remove_if form and the earlier
make_early_inc_range version (pre-b6a94b6bfb2c) have this defect.
Fix by collecting all blocks to keep before erasing, so every block is
evaluated against the same snapshot of the set.
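The order dependence can be reproduced with a toy model in C++. It uses integers for blocks and an explicit user map instead of IR use lists; every name is hypothetical, and the model only illustrates why a snapshot makes the result independent of iteration order.

```cpp
#include <map>
#include <set>
#include <vector>

using Block = int;
// users[b] lists the blocks that reference b (stand-in for IR use lists).
using UserMap = std::map<Block, std::vector<Block>>;

// True if any user of `b` lies outside the candidate set.
inline bool has_use_outside(Block b, const UserMap &users,
                            const std::set<Block> &candidates) {
  auto it = users.find(b);
  if (it == users.end()) return false;
  for (Block u : it->second)
    if (candidates.count(u) == 0) return true;
  return false;
}

// Order-dependent variant: erases from the set while scanning it, so later
// checks see a different set than earlier ones.
inline std::set<Block> prune_while_iterating(std::set<Block> candidates,
                                             const UserMap &users,
                                             const std::vector<Block> &order) {
  for (Block b : order)
    if (has_use_outside(b, users, candidates))
      candidates.erase(b);
  return candidates;  // blocks that would be deleted
}

// Snapshot variant: decide what to keep first, then erase.
inline std::set<Block> prune_with_snapshot(const std::set<Block> &candidates,
                                           const UserMap &users,
                                           const std::vector<Block> &order) {
  std::vector<Block> keep;
  for (Block b : order)
    if (has_use_outside(b, users, candidates))  // always the original set
      keep.push_back(b);
  std::set<Block> dead = candidates;
  for (Block b : keep) dead.erase(b);
  return dead;
}
```

With block 1 used from outside and block 2 used only by block 1, visiting 1 before 2 makes the first variant keep both blocks, while visiting 2 first deletes block 2. The snapshot variant gives the same answer for both orders.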
[Offload] Fix build install directory and remove 'add_llvm_library' (#198622)
Summary:
The problem is that we do not correctly set the build directory output
for offload/. Normally, it's supposed to mirror the install pattern.
This is both because we have variants and so that people can use the
compiler from the build directory.
Currently, if you build more than one variant of the offload/ library
they will clobber each other in `<build>/lib/`, so cross-compiling is
not possible. Additionally, these will not be usable in the build directory
because the compiler will think that they are in the triple directory
when they are not.
Relatively simple fix, just copy-paste the pattern every other runtime
uses and then remove the implicit handling we get from
`add_llvm_library`. The only thing it did for us was automatically map
component names to the libraries, which is easy enough to do.
[VPlan] Sink VPRecipeValue dtors. (#198623)
Currently (after https://github.com/llvm/llvm-project/pull/195483) the
VPRecipeValue destructor accesses the defining value and removes it. This can cause
uninitialized memory reads, because the Def pointer held by the
VPMultiDefValue is destroyed before the super class destructor runs.
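The underlying C++ rule is that a derived class's members are destroyed before the base class destructor runs. The sketch below uses stand-in types rather than the VPlan classes; `Base`, `Derived`, `Member` and `event_log` are hypothetical names.

```cpp
#include <string>
#include <vector>

// Records the order in which destructors run.
inline std::vector<std::string> &event_log() {
  static std::vector<std::string> log;
  return log;
}

struct Member {
  ~Member() { event_log().push_back("member destroyed"); }
};

struct Base {
  // By the time this body runs, every member of the derived class is already
  // destroyed, so it must not reach into derived state.
  virtual ~Base() { event_log().push_back("base destructor"); }
};

struct Derived : Base {
  Member def;  // stands in for state owned by the subclass
  // Cleanup that needs `def` belongs here, where `def` is still alive.
  ~Derived() override { event_log().push_back("derived destructor"); }
};
```

Destroying a `Derived` logs the derived destructor first, then the member, then the base destructor, which is why cleanup that touches subclass state has to be sunk into the subclass destructor.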
[libc][freebsd] initialize freebsd support (#124459)
Initialize FreeBSD support. Currently, only overlay build (mainly math
routines) is supported.
This PR mainly defines the target entrypoints and basic syscall support.
Unlike on Linux, a FreeBSD syscall return always consists of two
components:
- return value as arch register
- error flag
On x86-64, the flag is returned via the carry bit state. Hence, for
syscall stubs, we always return a structure containing these two fields.
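A two-field result of this kind might look like the sketch below. The struct and function names are hypothetical and not taken from the PR; the conversion assumes the usual conventions that FreeBSD reports a positive errno value when the error flag is set and that Linux reports failures as a negated errno.

```cpp
// Hypothetical result type: the raw register value plus the error flag
// (the carry bit on x86-64).
struct SyscallResult {
  long value;  // return value, or a positive errno when `error` is set
  bool error;  // true when the kernel signalled failure
};

// Hypothetical adapter: fold the pair into a Linux-style single return value,
// where failures are reported as a negated errno.
inline long to_linux_style(SyscallResult r) {
  return r.error ? -r.value : r.value;
}
```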
For math support, the only big difference is that FreeBSD uses a different
naming convention for some exception macros.
Further fixes for C++ userland are tracked in #197605.
Assisted-by: Codex with gpt-5.5 high fast
[lldb-dap] Fall back to name when arguments are missing (#198626)
When `lldb-dap --list-processes` omits the `arguments` field for a
process, fall back to `name` (matching the `command` field's fallback)
instead of an empty string. This keeps the process picker from showing
blank entries for processes whose full command line is unavailable.
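The fallback rule itself is small enough to model directly. The picker code is not shown here, so this C++ sketch only mirrors the described behavior; `ProcessEntry` and `display_arguments` are hypothetical names.

```cpp
#include <optional>
#include <string>

// Hypothetical model of one entry reported by `lldb-dap --list-processes`.
struct ProcessEntry {
  std::string name;
  std::optional<std::string> arguments;  // may be missing
};

// Use the full argument string when present, otherwise fall back to the
// process name instead of an empty string.
inline std::string display_arguments(const ProcessEntry &p) {
  return p.arguments.value_or(p.name);
}
```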
clang: Refactor handling of offload sanitizer arguments
Previously the AMDGPU toolchains hackily handled -fsanitize arguments.
They would lie and report that all host side sanitizers are available,
then TranslateArgs would filter out the device side cases that do not
work, providing diagnostics for the skipped cases. Move that logic
into the base sanitizer argument parsing.
This makes the produced diagnostics more consistent. Previously we
would get repeated warnings when a sanitizer was fully unsupported
by amdgpu; the warning should now be emitted once per toolchain. These could
be further improved; we're printing the specific field of -fsanitize
in more cases where it could be skipped. In other cases we have the
opposite problem, where we aren't reporting the exact sanitizer
from the -f flag in the case that depends on a subtarget feature.
This will help fix other broken target specific flag forwarding bugs
in the future.
Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
clang/AMDGPU: Use TranslateArgs from the base toolchain instead of the host
This fixes -Xopenmp-target / -Xarch for arbitrary arguments. HIP and OpenMP
had cargo-cult broken implementations of TranslateArgs, which called the host
toolchain's implementation, and then special-cased the transfer of either -march
or -mcpu to the device argument list. The respective device forwarding flags
should work for any argument, not just these two. The main feature that needs
to be preserved is the shared filtering of unsupported sanitizers to degrade
them into warnings.
Most of the changes here are dealing with fallout observed when
the host target is darwin. The darwin toolchain happens to have
some hacky statefulness tracking the compile target version, which
gets written and rewritten on argument parsing. To maintain this hack,
there are a few unused calls to getArgsForToolChain; start passing OFK_Host
to these so the offload toolchains don't get confused and think they're in
a non-offload context.