[libc] clean up wchar file deps and includes (#198648)
There were a couple comments left on the wchar file series after I'd
already merged some. This PR should apply those changes to the rest of
the wchar file functions.
Assisted-by: Automated tooling, human reviewed.
[LLVM][ARM] Add native ct.select support for ARM32 and Thumb
This patch implements architecture-specific lowering for ct.select on ARM
(both ARM32 and Thumb modes) using conditional move instructions and
bitwise operations for constant-time selection.
Implementation details:
- Uses pseudo-instructions that are expanded post-RA into bitwise operations
- Post-RA expansion in ARMBaseInstrInfo for BUNDLE pseudo-instructions
- Handles scalar integer types, floating-point, and half-precision types
- Handles vector types with NEON when available
- Support for both ARM and Thumb instruction sets (Thumb1 and Thumb2)
- Special handling for Thumb1 which lacks conditional execution
- Comprehensive test coverage including half-precision and vectors
The implementation includes:
- ISelLowering: Custom lowering to CTSELECT pseudo-instructions
- ISelDAGToDAG: Selection of appropriate pseudo-instructions
- BaseInstrInfo: Post-RA expansion of BUNDLE to bitwise instruction sequences
[3 lines not shown]
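The bitwise form of constant-time selection can be sketched at the source level as follows. This is a minimal illustration, not the ARM backend code: `ct_select_u32` is a hypothetical name, and a C++ compiler is free to turn this pattern back into a branch. Providing that guarantee is exactly why ct.select needs dedicated lowering.

```cpp
#include <cstdint>

// Branch-free select: returns `a` when `cond` is true, `b` otherwise.
// Hypothetical helper for illustration only.
inline std::uint32_t ct_select_u32(bool cond, std::uint32_t a, std::uint32_t b) {
    // Expand the condition to a full-width mask: true -> 0xFFFFFFFF, false -> 0.
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond);
    // Blend with AND/OR so the same instructions execute for either outcome.
    return (a & mask) | (b & ~mask);
}
```

The mask-and-blend sequence is what lets Thumb1, which lacks conditional execution, still avoid a data-dependent branch.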
[X86] Manage atomic store of fp -> int promotion in DAG (#197166)
When lowering `atomic store <1 x T>` vector types with floats (i.e.
during scalarization in the selection DAG), selection can fail since
this pattern is unsupported. To support this, floats can be cast to
an integer type of the same size.
Store-side counterpart to #148895. Stacked on top of #197165 and below
#197618.
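As a source-level analogy of the DAG-level fix, a float can be stored atomically by reinterpreting its bits as a same-sized integer. This sketch assumes a 32-bit `float`; the helper names are hypothetical and do not correspond to X86 lowering code.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");

// Hypothetical helper: bitcast float -> i32, then perform an integer atomic store.
inline void atomic_store_float_bits(std::atomic<std::uint32_t>& slot, float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // same-size bit reinterpretation
    slot.store(bits, std::memory_order_seq_cst);
}

// Hypothetical helper: integer atomic load, then bitcast i32 -> float.
inline float atomic_load_float_bits(const std::atomic<std::uint32_t>& slot) {
    const std::uint32_t bits = slot.load(std::memory_order_seq_cst);
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Because the cast only reinterprets bits, the stored integer is exactly the IEEE-754 encoding of the float (for example, 1.5f is 0x3FC00000).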
Reland [VectorCombine] foldShuffleChainsToReduce - add support for partial vector reductions (#197659)
Reland of #195119, which was reverted in 2b26355 due to:
1. An assertion failure on AArch64 where
`getShuffleCost(SK_ExtractSubvector)` was called without the `SubTp`
parameter.
2. A miscompilation on non-power-of-2 vector sizes where parity-based
shuffle masks cause lane duplication in the reduction tree.
Fixes:
- Pass `ReduceVecTy` as `SubTp` to `getShuffleCost`.
- Restrict partial reductions to power-of-2 vector sizes.
---
Extend foldShuffleChainsToReduce to recognize partial reduction patterns
where only a subvector of the full vector is being reduced.
[11 lines not shown]
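The shape of a partial reduction can be modeled in scalar code: a halving tree of "shuffle upper half onto lower half, then add" steps applied to the first N lanes of a wider vector. This is a simplified model, not VectorCombine code, and `partial_reduce_add` is a hypothetical name. It shows why N is restricted to a power of two: each halving step assumes the active lane count divides evenly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr bool is_pow2(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }

// Reduce lanes [0, n) of a W-lane vector with log2(n) shuffle+add steps.
// Each step adds lane (i + half) onto lane i, like a shufflevector that moves
// the upper half of the active lanes down, followed by a vector add.
template <std::size_t W>
int partial_reduce_add(std::array<int, W> v, std::size_t n) {
    // In this model, a non-power-of-two n would leave an odd lane that is
    // either skipped or combined twice, so reject it up front.
    assert(n <= W && is_pow2(n));
    for (std::size_t half = n / 2; half >= 1; half /= 2)
        for (std::size_t i = 0; i < half; ++i)
            v[i] += v[i + half];
    return v[0];
}
```

Lanes at index n and above are never read, which is what makes this a partial (subvector) reduction.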
[LLVM][X86] Add f80 support for ct.select
Add special handling for x86_fp80 types in CTSELECT lowering by splitting
them into three 32-bit chunks, performing constant-time selection on each
chunk, and reassembling the result. This fixes crashes when compiling
tests with f80 types.
Also update ctselect.ll to match the current generic fallback implementation.
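The split-select-reassemble approach can be sketched with an x86_fp80 value held as three 32-bit words (96 bits, so 16 bits are padding). The `F80Words` representation and helper names are hypothetical; the point is that every chunk is selected with the same mask, so no chunk depends on a branch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: an 80-bit value spread across three 32-bit chunks.
using F80Words = std::array<std::uint32_t, 3>;

// Branch-free 32-bit select (mask-and-blend).
inline std::uint32_t ct_select_u32(bool cond, std::uint32_t a, std::uint32_t b) {
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond);
    return (a & mask) | (b & ~mask);
}

// Select each chunk independently with the same condition, then reassemble.
inline F80Words ct_select_f80(bool cond, const F80Words& a, const F80Words& b) {
    F80Words out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = ct_select_u32(cond, a[i], b[i]);
    return out;
}
```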
[LV] Add -epilogue-tail-folding-policy flag for tail-folded epilogue (#190697)
This is the first patch in a series implementing **tail-folding on the
epilogue loop** — a vectorization style that pairs an unpredicated
vector main loop with a predicated vector epilogue.
It adds a new flag, `-epilogue-tail-folding-policy`, to opt in to this
style. Subsequent patches will build out the implementation.
Motivation behind this work:
- The current vectorization styles force either tail-folding on the main
vector loop with no interleaving, or an unpredicated main vector loop with
interleaving.
The first style prevents us from getting the benefit of high interleaving
when it is possible and profitable, and the second one prevents
tail-folding where it could be beneficial, especially for low trip counts.
- The proposed hybrid approach of having an unpredicated main vector loop
with a tail-folded vector epilogue combines the strengths of both styles,
[7 lines not shown]
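A scalar model of the proposed loop shape: an unpredicated, interleaved main loop that takes full VF*UF-wide steps, followed by a predicated epilogue that covers the remaining elements in masked VF-wide steps. The `VF` and `UF` values and the function name are illustrative assumptions, not what LoopVectorize would choose or emit.

```cpp
#include <cstddef>
#include <vector>

// Illustrative constants only.
constexpr std::size_t VF = 4;  // lanes per vector
constexpr std::size_t UF = 2;  // interleave (unroll) factor

long sum_epilogue_tail_folded(const std::vector<int>& a) {
    const std::size_t n = a.size();
    long acc = 0;
    std::size_t i = 0;
    // Main vector loop: full VF*UF steps, no masks needed.
    for (; i + VF * UF <= n; i += VF * UF)
        for (std::size_t lane = 0; lane < VF * UF; ++lane)
            acc += a[i + lane];
    // Tail-folded epilogue: fewer than VF*UF elements remain, handled in
    // VF-wide steps where a lane mask disables out-of-bounds lanes.
    for (; i < n; i += VF)
        for (std::size_t lane = 0; lane < VF; ++lane) {
            const bool active = i + lane < n;  // per-lane predicate
            if (active) acc += a[i + lane];
        }
    return acc;
}
```

With n = 11, the main loop handles elements 0 to 7 and a single masked epilogue step handles 8 to 10 with one lane disabled, so no scalar remainder loop is needed.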
[SelectionDAG] Scalarize <1 x T> vector types for atomic store (#197165)
`store atomic <1 x T>` is not valid. This change legalizes
vector types of atomic store via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.
This is the store-side counterpart to #148894. Stacked on top of
#197372 and below #197166.
[X86][AtomicExpand] Remove X86's shouldCastAtomicLoadInIR override (added in #148899)
With this change, atomic floating-point and FP-vector loads are no longer bitcast to an integer
at the IR level by AtomicExpand.
[clang][Sema] Diagnose nested local classes defined in a different block scope than their parent (#197863)
Fixes #193472.
[[class.local]/3](https://eel.is/c++draft/class.local#3) says:
> A class nested within a local class is a local class. A member of a
local class `X` shall be declared only in the definition of `X` or, __if
the member is a nested class, in the nearest enclosing block scope of
X__.
In other words:
```cpp
void f() {
struct X { struct S; };
struct X::S {}; // okay
struct X { struct S; };
[17 lines not shown]
[VPlan] Add branch-on-cond false to original unconditional latch (NFC). (#198539)
For loops where the latch does not exit, addInitialSkeleton adds the
middle block as an additional successor, as an early canonicalization.
But then we end up with a block that has no terminator and multiple
successors. Fix this by adding a branch-on-cond false as the terminator.
This preserves the original behavior (backedge always taken) and
resolves the verifier issue.
PR: https://github.com/llvm/llvm-project/pull/198539
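The invariant being restored can be modeled with a toy CFG: a block with more than one successor must end in a terminator that selects between them. Everything here is hypothetical (not VPlan's classes), including the assumed successor order of {middle, header}, where a true condition takes successor 0 and false takes successor 1.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified CFG block.
struct Block {
    std::string name;
    std::vector<Block*> succs;
    std::optional<bool> branch_on_const;  // set => "branch-on-cond <const>" terminator
};

// Modeled verifier rule: multiple successors require a terminator.
bool verify(const Block& b) {
    return b.succs.size() <= 1 || b.branch_on_const.has_value();
}

// Which successor control actually reaches (true -> succs[0], false -> succs[1]).
const Block* taken_successor(const Block& b) {
    if (b.succs.empty()) return nullptr;
    if (!b.branch_on_const) return b.succs[0];
    return *b.branch_on_const ? b.succs[0] : b.succs[1];
}
```

Setting the terminator to a constant false satisfies the rule while keeping the backedge as the only edge ever taken, so behavior is unchanged (NFC).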
[OpenMP][OMPIRBuilder] Fix non-determinism in removeUnusedBlocksFromParent
The openmp-cli-fuse02.mlir test fails non-deterministically (~20%) when
unrelated patches add or reorder code, causing the linker to place
objects at different offsets and the heap allocator to return different
addresses at runtime. SmallPtrSet iteration order depends on these
pointer addresses (also randomized across runs by ASLR), so fuseLoops
sometimes leaves dead blocks in the function.
The root cause is erasing blocks from the set while iterating it to
check for external uses—removing one block mid-pass changes the result
for blocks checked later. Both the remove_if form and the earlier
make_early_inc_range version (pre-b6a94b6bfb2c) have this defect.
Fix by collecting all blocks to keep before erasing, so every block is
evaluated against the same snapshot of the set.
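The defect and the fix can be reproduced with a small model. Blocks are integers, `users` lists which blocks use each block, and a candidate is kept if it has a user outside the candidate set. This is a deliberately simplified sketch with hypothetical names; it does not reproduce removeUnusedBlocksFromParent's exact rules, only the order-dependence of erasing mid-pass.

```cpp
#include <map>
#include <set>
#include <vector>

using BlockId = int;
using UseMap = std::map<BlockId, std::vector<BlockId>>;  // block -> its users

// A block must be kept if any user lies outside the candidate set.
static bool hasExternalUse(BlockId b, const std::set<BlockId>& candidates,
                           const UseMap& users) {
    auto it = users.find(b);
    if (it == users.end()) return false;
    for (BlockId u : it->second)
        if (!candidates.count(u)) return true;
    return false;
}

// Buggy pattern: erasing while deciding shrinks the set seen by later checks,
// so the result depends on the iteration order.
std::set<BlockId> blocksToRemoveBuggy(const std::vector<BlockId>& order,
                                      const UseMap& users) {
    std::set<BlockId> candidates(order.begin(), order.end());
    for (BlockId b : order)
        if (hasExternalUse(b, candidates, users))
            candidates.erase(b);
    return candidates;
}

// Fixed pattern: decide every block against the same snapshot, then erase.
std::set<BlockId> blocksToRemove(std::set<BlockId> candidates, const UseMap& users) {
    std::vector<BlockId> keep;
    for (BlockId b : candidates)
        if (hasExternalUse(b, candidates, users))
            keep.push_back(b);
    for (BlockId b : keep)
        candidates.erase(b);
    return candidates;
}
```

With block 1 used from outside and block 2 used only by block 1, the buggy version removes nothing when it visits 1 first but removes {2} when it visits 2 first; the snapshot version always yields {2}.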
[Offload] Fix build install directory and remove 'add_llvm_library' (#198622)
Summary:
The problem is that we do not correctly set the build directory output
for offload/. Normally, it is supposed to mirror the install pattern,
both because we have multiple variants and so that people can use the
compiler from the build directory.
Currently, if you build more than one variant of the offload/ library,
they will clobber each other in `<build>/lib/`, which rules out cross
compiling. Additionally, these libraries are not usable from the build
directory because the compiler expects them in the triple directory,
where they are not.
The fix is relatively simple: copy the pattern every other runtime
uses and then remove the implicit handling we get from
`add_llvm_library`. The only thing it did for us was automatically map
component names to the libraries, which is easy enough to do.
[VPlan] Sink VPRecipeValue dtors. (#198623)
Currently (after https://github.com/llvm/llvm-project/pull/195483) the
VPRecipeValue accesses the defining value and removes it. This can cause
uninitialized memory reads because the Def pointer held by the
VPMultiDefValue is destroyed before the superclass destructor runs.
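The underlying C++ rule is that a derived class's destructor body runs first, then its members are destroyed, and only then does the base destructor run. A base destructor that reaches into derived state therefore sees already-destroyed members. The classes below are hypothetical stand-ins, not VPlan types; they log the order and show the "sunk" pattern, where cleanup that needs the member runs in the derived destructor.

```cpp
#include <string>
#include <vector>

std::vector<std::string> g_log;  // records destruction order

struct ValueBase {
    virtual ~ValueBase() { g_log.push_back("~ValueBase"); }
};

struct Def {  // stands in for the state behind the Def pointer
    ~Def() { g_log.push_back("~Def"); }
};

struct MultiDefValue : ValueBase {
    Def def;
    ~MultiDefValue() override {
        // "Sunk" cleanup: `def` is still alive here.
        g_log.push_back("~MultiDefValue");
    }
    // After this body: `def` is destroyed, then ~ValueBase runs.
};
```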
[libc][freebsd] initialize freebsd support (#124459)
Initialize FreeBSD support. Currently, only overlay build (mainly math
routines) is supported.
This PR mainly defines the target entrypoints and basic syscall support.
Unlike Linux, a FreeBSD syscall always returns two
components:
- the return value, in an architecture register
- an error flag
On x86-64, the flag is returned via the carry bit. Hence, our
syscall stubs always return a structure containing these two fields.
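A minimal sketch of such a two-field result and one way to fold it into a single Linux-style return value (negative errno on failure). The struct and function names are hypothetical, not LLVM libc's types; the sketch assumes that on failure the return register holds the errno value, as FreeBSD does on x86-64.

```cpp
#include <cerrno>

// Hypothetical two-part syscall result.
struct SyscallResult {
    long value;  // contents of the return register
    bool error;  // error flag (the carry bit on x86-64)
};

// Fold into one value: the result on success, -errno on failure.
inline long to_negative_errno(SyscallResult r) {
    return r.error ? -r.value : r.value;
}
```

Keeping the flag as a separate field avoids guessing whether a large unsigned return value is an error, since success and failure are distinguished by the flag rather than by the value's range.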
For math support, the only big difference is that FreeBSD uses different
naming conventions for some exception macros.
Further fixes for C++ userland are tracked in #197605.
Assisted-by: Codex with gpt-5.5 high fast