LLVM/project 529ab5allvm/lib/Target/AMDGPU AMDGPURegBankCombiner.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel legalize-sextload-zextload-s16-true16.mir

[AMDGPU][True16] Add regbank combiner cases to fix regression around G_SEXTLOAD
DeltaFile
+42-132llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-zextload-s16-true16.mir
+17-2llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
+59-1342 files

LLVM/project b0cac7ellvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU global-saddr-load.ll

Add legalize rules and fix tests
DeltaFile
+504-222llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-zextload-s16-true16.mir
+90-24llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
+45-10llvm/test/CodeGen/AMDGPU/GlobalISel/load-d16.ll
+7-2llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+646-2584 files

LLVM/project 84700d2llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU flat-saddr-load.ll

[AMDGPU][True16] Legalize extloads into 16-bit registers

Signed-off-by: Domenic Nutile <domenic.nutile at gmail.com>
DeltaFile
+80-38llvm/test/CodeGen/AMDGPU/flat-saddr-load.ll
+2-2llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+82-402 files

LLVM/project 61db56ellvm/test/CodeGen/AMDGPU/GlobalISel legalize-sextload-zextload-s16-true16.mir

[AMDGPU][True16] Create tests that will demonstrate true16 G_SEXTLOAD/G_ZEXTLOAD legalization changes
DeltaFile
+376-0llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-zextload-s16-true16.mir
+376-01 files

LLVM/project d63ca96libc/src/wchar CMakeLists.txt fgetws.cpp, libc/test/src/wchar CMakeLists.txt

[libc] clean up wchar file deps and includes (#198648)

There were a couple comments left on the wchar file series after I'd
already merged some. This PR should apply those changes to the rest of
the wchar file functions.

Assisted-by: Automated tooling, human reviewed.
DeltaFile
+37-19libc/src/wchar/CMakeLists.txt
+24-23libc/test/src/wchar/CMakeLists.txt
+1-0libc/src/wchar/fgetws.cpp
+1-0libc/src/wchar/fgetws.h
+1-0libc/src/wchar/fputwc.cpp
+1-0libc/src/wchar/fputwc.h
+65-4210 files not shown
+75-4216 files

LLVM/project e0ef9b7llvm/lib/Support UnicodeNameToCodepointGenerated.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.av.load.b128.ll

Merge branch 'main' into users/jofrn/widen-vec-atomic-store
DeltaFile
+23,873-20,923llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
+8,633-8,584llvm/test/CodeGen/Thumb2/mve-clmul.ll
+12,365-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.av.load.b128.ll
+1,243-8,768llvm/test/CodeGen/X86/vector-replicaton-i1-mask.ll
+6,862-0llvm/test/tools/llvm-mca/AArch64/Cortex/C1Nano-sve-instructions.s
+3,436-2,769llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll
+56,412-41,0444,260 files not shown
+313,383-156,9124,266 files

LLVM/project 9dd4f7cutils/bazel/llvm-project-overlay/libc BUILD.bazel

[Bazel] Fixes f97e1d4 (#198640)

This fixes f97e1d46878d53a50a64c6b3faa45f43741d69ac.

Co-authored-by: Google Bazel Bot <google-bazel-bot at google.com>
DeltaFile
+1-0utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+1-01 files

LLVM/project 066e3eallvm/lib/Target/ARM ARMISelLowering.cpp ARMBaseInstrInfo.cpp, llvm/test/CodeGen/ARM ctselect-vector.ll ctselect-half.ll

[LLVM][ARM] Add native ct.select support for ARM32 and Thumb

This patch implements architecture-specific lowering for ct.select on ARM
(both ARM32 and Thumb modes) using conditional move instructions and
bitwise operations for constant-time selection.

Implementation details:
- Uses pseudo-instructions that are expanded Post-RA to bitwise operations
- Post-RA expansion in ARMBaseInstrInfo for BUNDLE pseudo-instructions
- Handles scalar integer types, floating-point, and half-precision types
- Handles vector types with NEON when available
- Support for both ARM and Thumb instruction sets (Thumb1 and Thumb2)
- Special handling for Thumb1 which lacks conditional execution
- Comprehensive test coverage including half-precision and vectors

The implementation includes:
- ISelLowering: Custom lowering to CTSELECT pseudo-instructions
- ISelDAGToDAG: Selection of appropriate pseudo-instructions
- BaseInstrInfo: Post-RA expansion of BUNDLE to bitwise instruction sequences

    [3 lines not shown]
DeltaFile
+1,839-0llvm/test/CodeGen/ARM/ctselect-vector.ll
+867-0llvm/test/CodeGen/ARM/ctselect-half.ll
+549-0llvm/test/CodeGen/ARM/ctselect.ll
+311-62llvm/lib/Target/ARM/ARMISelLowering.cpp
+335-2llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
+187-0llvm/lib/Target/ARM/ARMInstrInfo.td
+4,088-644 files not shown
+4,187-6910 files
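
For readers unfamiliar with the "bitwise operations for constant-time selection" mentioned above, here is a minimal source-level sketch of the idea. It is not code from the patch: `ctSelect32` is a hypothetical helper, and the condition is assumed to be normalized to 0 or 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not LLVM code: pick A when Cond is 1 and B when
// Cond is 0, without branching on Cond.
inline uint32_t ctSelect32(uint32_t Cond, uint32_t A, uint32_t B) {
  assert(Cond <= 1 && "sketch assumes the condition is normalized to 0 or 1");
  // 0 -> 0x00000000, 1 -> 0xFFFFFFFF (unsigned subtraction wraps around).
  uint32_t Mask = 0u - Cond;
  // Take the bits of A where Mask is set and the bits of B elsewhere.
  return (A & Mask) | (B & ~Mask);
}
```

A source-level version like this carries no guarantee that an optimizer keeps it branch-free, which is the kind of problem that lowering to pseudo-instructions expanded post-RA is meant to avoid.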

LLVM/project d9a9b3bllvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 atomic-load-store.ll

[X86] Manage atomic store of fp -> int promotion in DAG (#197166)

When lowering `atomic store <1 x T>` vector types with floats (i.e.
during scalarization in the selection DAG), selection can fail since
this pattern is unsupported. To support this, floats can be cast to
an integer type of the same size.

Store-side counterpart to #148895. Stacked on top of #197165 and below
#197618.
DeltaFile
+130-0llvm/test/CodeGen/X86/atomic-load-store.ll
+4-0llvm/lib/Target/X86/X86ISelLowering.cpp
+134-02 files
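
The float-to-integer cast described above has a familiar source-level analogue. The sketch below is only an illustration under stated assumptions, not the DAG lowering itself: the helper names are hypothetical, and `float` is assumed to be 32 bits wide.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper: store a float atomically by first reinterpreting
// its bits as an integer of the same size.
inline void atomicStoreFloatAsInt(std::atomic<uint32_t> &Slot, float Value) {
  uint32_t Bits;
  std::memcpy(&Bits, &Value, sizeof(Bits)); // bit-preserving cast
  Slot.store(Bits, std::memory_order_seq_cst);
}

// Hypothetical helper: the matching load, casting the bits back to float.
inline float atomicLoadFloatFromInt(const std::atomic<uint32_t> &Slot) {
  uint32_t Bits = Slot.load(std::memory_order_seq_cst);
  float Value;
  std::memcpy(&Value, &Bits, sizeof(Value));
  return Value;
}
```

The cast changes only the type of the stored value, never its bit pattern, so a later load through the same path recovers the original float.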

LLVM/project e28e7ecllvm/lib/Transforms/Vectorize VectorCombine.cpp, llvm/test/Transforms/PhaseOrdering/X86 horizontal-reduce-smin.ll horizontal-reduce-smax.ll

Reland [VectorCombine] foldShuffleChainsToReduce - add support for partial vector reductions (#197659)

Reland of #195119, which was reverted in 2b26355 due to:
1. An assertion failure on AArch64 where
`getShuffleCost(SK_ExtractSubvector)` was called without the `SubTp`
parameter.
2. A miscompilation on non-power-of-2 vector sizes where parity-based
shuffle masks cause lane duplication in the reduction tree.

Fixes:
- Pass `ReduceVecTy` as `SubTp` to `getShuffleCost`.
- Restrict partial reductions to power-of-2 vector sizes.

---

Extend foldShuffleChainsToReduce to recognize partial reduction patterns
where only a subvector of the full vector is being reduced.


    [11 lines not shown]
DeltaFile
+70-0llvm/test/Transforms/VectorCombine/fold-shuffle-chains-to-reduce.ll
+37-6llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+8-32llvm/test/Transforms/PhaseOrdering/X86/horizontal-reduce-smin.ll
+8-32llvm/test/Transforms/PhaseOrdering/X86/horizontal-reduce-smax.ll
+8-32llvm/test/Transforms/PhaseOrdering/X86/horizontal-reduce-umax.ll
+8-32llvm/test/Transforms/PhaseOrdering/X86/horizontal-reduce-umin.ll
+139-1341 files not shown
+164-1347 files

LLVM/project 403887ellvm/lib/Target/X86 X86InstrInfo.cpp X86ISelLowering.cpp, llvm/test/CodeGen/X86 ctselect-vector.ll ctselect-i386-mmx.ll

[LLVM][X86] Add f80 support for ct.select

Add special handling for x86_fp80 types in CTSELECT lowering by splitting
them into three 32-bit chunks, performing constant-time selection on each
chunk, and reassembling the result. This fixes crashes when compiling
tests with f80 types.

Also updated ctselect.ll to match current generic fallback implementation.
DeltaFile
+463-452llvm/lib/Target/X86/X86InstrInfo.cpp
+211-492llvm/test/CodeGen/X86/ctselect-vector.ll
+188-255llvm/test/CodeGen/X86/ctselect-i386-mmx.ll
+126-146llvm/test/CodeGen/X86/ctselect-i386-fp.ll
+69-6llvm/lib/Target/X86/X86ISelLowering.cpp
+3-6llvm/lib/Target/X86/X86InstrInfo.h
+1,060-1,3572 files not shown
+1,061-1,3608 files

LLVM/project 499b2fallvm/lib/Transforms/Vectorize LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize fold-epilogue-tail.ll

[LV] Add -epilogue-tail-folding-policy flag for tail-folded epilogue (#190697)

This is the first patch in a series implementing **tail-folding on the
epilogue loop** — a vectorization style that pairs an unpredicated
vector main loop with a predicated vector epilogue.

It adds a new flag, `-epilogue-tail-folding-policy`, to enable the
style opt-in. Subsequent patches will build out the implementation.

Motivation behind this work:
- The current vectorization styles force either tail-folding on the main
vector loop with no interleaving, or unpredicated main vector loop with
interleaving.
The first style prevents us from getting the benefit of high interleaving
when it’s beneficial/possible, and the second one prevents
tail-folding even where it would be beneficial, especially for low trip counts.

- The proposed hybrid approach of having unpredicated main vector loop
with tail-folded vector epilogue combines the strengths of both styles,

    [7 lines not shown]
DeltaFile
+98-0llvm/test/Transforms/LoopVectorize/fold-epilogue-tail.ll
+70-5llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+168-52 files
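
The hybrid style described above (an unpredicated main vector loop followed by a predicated vector epilogue) can be emulated in scalar code. This is a sketch of the loop shape only, not of what LoopVectorize emits: `VF` and `addOne` are hypothetical, and the epilogue is assumed to use the same width as the main loop and to run at most once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4; // hypothetical vectorization factor

// Scalar emulation of "unpredicated main loop + tail-folded epilogue".
inline void addOne(std::vector<int> &Data) {
  const std::size_t N = Data.size();
  std::size_t I = 0;

  // Main "vector" loop: handles full chunks only, so no lane needs a mask.
  for (; I + VF <= N; I += VF)
    for (std::size_t Lane = 0; Lane < VF; ++Lane)
      Data[I + Lane] += 1;

  // Tail-folded epilogue: one predicated "vector" iteration in which each
  // lane is active only while its index is still in bounds.
  if (I < N) {
    for (std::size_t Lane = 0; Lane < VF; ++Lane) {
      const bool Active = I + Lane < N;
      if (Active)
        Data[I + Lane] += 1;
    }
  }
}
```

With 10 elements and a width of 4, the main loop covers indices 0 to 7 in two unmasked iterations and the epilogue covers indices 8 and 9 with two of its four lanes active.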

LLVM/project 3a47318llvm/lib/CodeGen/SelectionDAG LegalizeVectorTypes.cpp LegalizeTypes.h, llvm/test/CodeGen/X86 atomic-load-store.ll

[SelectionDAG] Scalarize <1 x T> vector types for atomic store (#197165)

`store atomic <1 x T>` is not valid. This change legalizes
vector types of atomic store via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.

This is the store-side counterpart to #148894. Stacked on top of
#197372; and below #197166.
DeltaFile
+57-0llvm/test/CodeGen/X86/atomic-load-store.ll
+12-0llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+1-0llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+70-03 files

LLVM/project c398ae2llvm/lib/CodeGen AtomicExpandPass.cpp, llvm/lib/Target/X86 X86ISelLowering.cpp X86ISelLowering.h

[X86][AtomicExpand] Remove X86's shouldCastAtomicLoadInIR override (added in #148899)

So that atomic floating-point and FP-vector loads are no longer bitcast to an integer
at the IR level by AtomicExpand.
DeltaFile
+32-7llvm/lib/Target/X86/X86ISelLowering.cpp
+14-1llvm/lib/CodeGen/AtomicExpandPass.cpp
+0-2llvm/lib/Target/X86/X86ISelLowering.h
+46-103 files

LLVM/project cbc6a27flang/test/Driver intrinsic-module-path_per_target.f90

[Flang][test] Require x86 target for test (#198643)

#196558 uses x86_64-unknown-linux-gnu in a target and needs
LLVM_TARGETS_TO_BUILD=X86, even though it uses -fsyntax-only to not
generate code.

Reported by
https://github.com/llvm/llvm-project/pull/196558#issuecomment-4487790588
DeltaFile
+1-0flang/test/Driver/intrinsic-module-path_per_target.f90
+1-01 files

LLVM/project 389b818llvm/test/CodeGen/AArch64 fneg.ll fabs.ll

[AArch64] Add bf16 negs and abs tests. NFC (#198647)

Adding them to fabs.ll and fneg.ll helps test both with and without
+fullfp16.
DeltaFile
+189-157llvm/test/CodeGen/AArch64/fneg.ll
+171-142llvm/test/CodeGen/AArch64/fabs.ll
+360-2992 files

LLVM/project d0ae038clang/docs ReleaseNotes.rst, clang/include/clang/Basic DiagnosticSemaKinds.td

[clang][Sema] Diagnose nested local classes defined in a different block scope than their parent (#197863)

Fixes #193472.

[[class.local]/3](https://eel.is/c++draft/class.local#3) says:

> A class nested within a local class is a local class. A member of a
local class `X` shall be declared only in the definition of `X` or, __if
the member is a nested class, in the nearest enclosing block scope of
X__.

In other words:

```cpp
void f() {
  struct X { struct S; };
  struct X::S {}; // okay

  struct X { struct S; };

    [17 lines not shown]
DeltaFile
+138-30clang/test/CXX/class/class.local/p3.cpp
+21-0clang/lib/Sema/SemaDecl.cpp
+3-0clang/docs/ReleaseNotes.rst
+2-0clang/include/clang/Basic/DiagnosticSemaKinds.td
+164-304 files

LLVM/project 4c07402libc/utils/wctype_utils gen.py, libc/utils/wctype_utils/conversion hex_writer.py

[libc][wctype] Add gen script for conversion functions
DeltaFile
+87-8libc/utils/wctype_utils/conversion/hex_writer.py
+11-3libc/utils/wctype_utils/gen.py
+98-112 files

LLVM/project ea73ddfllvm/lib/Transforms/Vectorize LoopVectorize.cpp VPlanConstruction.cpp

[VPlan] Add branch-on-cond false original unconditional latch (NFC). (#198539)

For loops where the latch does not exit, addInitialSkeleton adds the
middle block as additional successor, as early canonicalization.

But then we end up with a block without terminator and multiple
successors. Fix this by adding a branch-on-cond false as terminator.
This preserves the original behavior (backedge always taken) and
resolves the verifier issue.

PR: https://github.com/llvm/llvm-project/pull/198539
DeltaFile
+7-7llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+5-1llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+12-82 files

LLVM/project 04e0c61llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp, llvm/unittests/Frontend OpenMPIRBuilderTest.cpp

[OpenMP][OMPIRBuilder] Fix non-determinism in removeUnusedBlocksFromParent

The openmp-cli-fuse02.mlir test fails non-deterministically (~20%) when
unrelated patches add or reorder code, causing the linker to place
objects at different offsets and the heap allocator to return different
addresses at runtime. SmallPtrSet iteration order depends on these
pointer addresses (also randomized across runs by ASLR), so fuseLoops
sometimes leaves dead blocks in the function.

The root cause is erasing blocks from the set while iterating it to
check for external uses—removing one block mid-pass changes the result
for blocks checked later. Both the remove_if form and the earlier
make_early_inc_range version (pre-b6a94b6bfb2c) have this defect.

Fix by collecting all blocks to keep before erasing, so every block is
evaluated against the same snapshot of the set.
DeltaFile
+55-0llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
+11-2llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+66-22 files
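
The "collect first, erase afterwards" fix described above can be sketched on a toy graph. None of this is the OMPIRBuilder code: nodes are plain integers, `UserMap` and `dropExternallyUsed` are hypothetical, and a node counts as externally used when at least one of its users is outside the candidate set.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using Node = int;
using UserMap = std::map<Node, std::vector<Node>>; // node -> its users

// Hypothetical stand-in: remove from Candidates every node that has a user
// outside the set, deciding all nodes against the same snapshot.
inline std::set<Node> dropExternallyUsed(std::set<Node> Candidates,
                                         const UserMap &Users) {
  // Phase 1: evaluate every node while the set is still unmodified.
  std::vector<Node> Keep;
  for (Node N : Candidates) {
    auto It = Users.find(N);
    if (It == Users.end())
      continue; // no users at all
    for (Node U : It->second) {
      if (!Candidates.count(U)) { // user lives outside the set
        Keep.push_back(N);
        break;
      }
    }
  }
  // Phase 2: mutate the set only after all decisions have been made.
  for (Node N : Keep)
    Candidates.erase(N);
  return Candidates;
}
```

Take candidates {1, 2, 3} where 1 is used by 2, 2 is used by an outside node, and 3 is used by 1. The snapshot version always removes only 2. Erasing during the scan would give the same answer if 1 were visited first, but would cascade and remove all three if 2 were visited first, which is the order dependence the commit message describes.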

LLVM/project 3e5388bclang/lib/Lex PPDirectives.cpp, clang/test/CXX/cpp/cpp.replace.general p9.cpp p14.cpp

Revert "Reapply [Clang] Implement P2843R3 - Preprocessing is never undefined …"

This reverts commit 22e8c55ccf808620cce66c9b1fb6f1341d15a3cc.
DeltaFile
+14-36clang/lib/Lex/PPDirectives.cpp
+0-44clang/test/CXX/cpp/cpp.replace.general/p9.cpp
+15-15clang/test/Preprocessor/macro-reserved.c
+14-14clang/test/Preprocessor/macro-reserved-attrs-cxx11.cpp
+14-14clang/test/Preprocessor/macro-reserved.cpp
+0-24clang/test/CXX/cpp/cpp.replace.general/p14.cpp
+57-14723 files not shown
+115-24629 files

LLVM/project 019e986clang/lib/CIR/FrontendAction CIRGenAction.cpp

add note on sharing `linkInModules`
DeltaFile
+2-0clang/lib/CIR/FrontendAction/CIRGenAction.cpp
+2-01 files

LLVM/project 2ac9cd7llvm/include/llvm/SandboxIR Type.h Context.h, llvm/lib/SandboxIR Type.cpp

[SandboxIR][Type] Implement ByteType (#197309)

This is mirroring LLVM IR's ByteType.
DeltaFile
+51-0llvm/unittests/SandboxIR/TypesTest.cpp
+40-0llvm/include/llvm/SandboxIR/Type.h
+31-0llvm/lib/SandboxIR/Type.cpp
+1-0llvm/include/llvm/SandboxIR/Context.h
+123-04 files

LLVM/project f97e1d4libc/src/__support/wctype perfect_hash_map.h upper_to_lower.h, libc/test/src/__support/wctype wctype_perfect_hash_test.cpp

[libc][wctype] Add perfect hash map for conversion functions (#187670)

- Upstream PTRHash and PerfectHashTable
- Add lower_bound and distance
DeltaFile
+986-0libc/test/src/__support/wctype/wctype_perfect_hash_test.cpp
+872-0libc/src/__support/wctype/perfect_hash_map.h
+561-0libc/src/__support/wctype/upper_to_lower.h
+481-0libc/src/__support/wctype/lower_to_upper.h
+0-400libc/src/__support/wctype/lower_to_upper.inc
+0-390libc/src/__support/wctype/upper_to_lower.inc
+2,900-7904 files not shown
+3,001-79210 files

LLVM/project 2eb4f2eclang/include/clang/CodeGen ModuleLinker.h, clang/lib/CIR/FrontendAction CIRGenAction.cpp

share function attribute propagation between OG Codegen and CIR consumers.
DeltaFile
+29-0clang/include/clang/CodeGen/ModuleLinker.h
+1-23clang/lib/CodeGen/CGCall.h
+18-0clang/test/CIR/CodeGen/link-bitcode-file.c
+12-3clang/lib/CIR/FrontendAction/CIRGenAction.cpp
+60-264 files

LLVM/project 12abc4dclang/include/clang/CIR/FrontendAction CIRGenAction.h

Rename llvm context for cir consumer
DeltaFile
+1-1clang/include/clang/CIR/FrontendAction/CIRGenAction.h
+1-11 files

LLVM/project 3383f0doffload CMakeLists.txt, offload/liboffload CMakeLists.txt

[Offload] Fix build install directory and remove 'add_llvm_library' (#198622)

Summary:
The problem is that we do not correctly set the build directory output
for offload/. Normally, it's supposed to mirror the install pattern.
This is both because we have variants and so that people can use the
compiler from the build directory.

Currently, if you build more than one variant of the offload/ library
they will clobber each other in `<build>/lib/`, so no cross compiling
allowed. Additionally, these will not be usable in the build directory
because the compiler will think that they are in the triple directory
when they are not.

Relatively simple fix, just copy-paste the pattern every other runtime
uses and then remove the implicit handling we get from
`add_llvm_library`. The only thing it did for us was automatically map
component names to the libraries, which is easy enough to do.
DeltaFile
+7-32offload/plugins-nextgen/CMakeLists.txt
+4-19offload/libomptarget/CMakeLists.txt
+11-10offload/CMakeLists.txt
+5-10offload/liboffload/CMakeLists.txt
+27-714 files

LLVM/project 6de0400llvm/lib/Transforms/Vectorize VPlan.cpp VPlanValue.h

[VPlan] Sink VPRecipeValue dtors. (#198623)

Currently (after https://github.com/llvm/llvm-project/pull/195483) the
VPRecipeValue accesses the defining value and removes it. This can cause
uninitialized memory reads, because the Def pointer held by the
VPMultiDefValue is destroyed before the super class destructor runs.
DeltaFile
+8-1llvm/lib/Transforms/Vectorize/VPlan.cpp
+2-2llvm/lib/Transforms/Vectorize/VPlanValue.h
+10-32 files
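
The ordering problem described above follows from a general C++ rule: a derived class's members are destroyed before its base class destructor runs. The sketch below demonstrates only that rule; `Tracked`, `Base` and `Derived` are hypothetical stand-ins, not the VPlan classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which destructors run.
inline std::vector<std::string> &destructionLog() {
  static std::vector<std::string> Log;
  return Log;
}

// Hypothetical stand-in for a member held by the derived class.
struct Tracked {
  ~Tracked() { destructionLog().push_back("member"); }
};

// Hypothetical stand-in for the super class.
struct Base {
  ~Base() { destructionLog().push_back("base"); }
};

struct Derived : Base {
  Tracked Def;
  // Runs first; Def is still alive here, but is gone by the time ~Base runs.
  ~Derived() { destructionLog().push_back("derived body"); }
};
```

Destroying a `Derived` object logs "derived body", then "member", then "base", so any cleanup that needs the member has to happen no later than the derived destructor body.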

LLVM/project 9997b11clang/lib/Driver/ToolChains CommonArgs.cpp, clang/lib/Sema SemaAMDGPU.cpp

[NFC][AMDGPU] Move AMDGPU related code out of generic TargetParser.cpp (#198433)
DeltaFile
+659-0llvm/lib/TargetParser/AMDGPUTargetParser.cpp
+1-643llvm/lib/TargetParser/TargetParser.cpp
+109-0llvm/include/llvm/TargetParser/AMDGPUTargetParser.h
+1-90llvm/include/llvm/TargetParser/TargetParser.h
+1-1clang/lib/Driver/ToolChains/CommonArgs.cpp
+1-1clang/lib/Sema/SemaAMDGPU.cpp
+772-73527 files not shown
+797-76033 files

LLVM/project 633539b.github/workflows libc-freebsd-vm-tests.yml, libc/config/freebsd/x86_64 entrypoints.txt

[libc][freebsd] initialize freebsd support (#124459)

Initialize FreeBSD support. Currently, only the overlay build (mainly math
routines) is supported.
This PR mainly defines the target entrypoints and basic syscall support.
Unlike on Linux, a FreeBSD syscall return always consists of two
components:
- return value in an arch register
- error flag
On x86-64, the flag is returned via the carry bit state. Hence, for
syscall stubs, we always return a structure containing these two fields.

For math support, the only big difference is that FreeBSD uses a different
naming convention for some exception macros.

Further fixes for C++ userland are tracked in #197605

Assisted-by: Codex with gpt-5.5 high fast
DeltaFile
+656-0libc/config/freebsd/x86_64/entrypoints.txt
+118-0libc/src/__support/OSUtil/freebsd/x86_64/syscall.h
+92-0libc/src/__support/OSUtil/freebsd/syscall_wrappers/CMakeLists.txt
+64-0.github/workflows/libc-freebsd-vm-tests.yml
+52-0libc/src/__support/OSUtil/freebsd/syscall.h
+39-0libc/src/__support/OSUtil/freebsd/syscall_wrappers/mmap.h
+1,021-021 files not shown
+1,445-927 files
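
The two-field syscall result described above might look roughly like the sketch below. The names `SyscallResult` and `toLinuxStyle` are hypothetical and are not taken from the added headers. The sketch also assumes that, when the error flag is set, the value register holds a positive errno code.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical shape of a FreeBSD-style syscall result: the raw register
// value plus the error flag (the carry bit on x86-64).
struct SyscallResult {
  long Value; // return value, or the errno code when Error is set
  bool Error; // true when the kernel reported a failure
};

// Fold the pair into the Linux-style convention of a single value that is
// a negated errno code on failure.
inline long toLinuxStyle(SyscallResult R) {
  return R.Error ? -R.Value : R.Value;
}
```

Keeping the flag separate means a successful call may return any register value, including ones that a single-value convention would have to reserve for errors.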