LLVM/project d0bf354 llvm/include/llvm/ADT Bitset.h, llvm/unittests/ADT BitsetTest.cpp

[ADT] Reinstate "Refactor Bitset to Be More Constexpr-Usable" (#189497)

Reland of #172062 (a71b1d2), which was reverted in b0234d1.

This patch makes essential Bitset member functions constexpr (`set()`,
`any()`, `none()`, `count()`, `operator==`, `!=`, `<`, `~`) and adds a
new `all()` method. It also introduces a `maskLastWord()` invariant to
ensure unused high bits in the last word are always zero, which is
required for correctness of `operator~`, `set()`, `all()`, and
comparisons on non-word-aligned sizes (e.g., `Bitset<33>`).
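
The role of the last-word invariant can be illustrated with a minimal sketch. `MiniBitset` below is a hypothetical, simplified stand-in written for this note; it is not the code from `Bitset.h`, and only the method names mentioned above are taken from the commit message.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-in for llvm::Bitset (not the LLVM code).
template <std::size_t N> class MiniBitset {
  static_assert(N > 0, "this sketch assumes at least one bit");
  static constexpr std::size_t WordBits = 64;
  static constexpr std::size_t NumWords = (N + WordBits - 1) / WordBits;
  std::uint64_t Words[NumWords] = {};

  // Invariant: bits at positions >= N in the last word are always zero.
  constexpr void maskLastWord() {
    constexpr std::size_t Rem = N % WordBits;
    if (Rem != 0)
      Words[NumWords - 1] &= (std::uint64_t(1) << Rem) - 1;
  }

public:
  constexpr MiniBitset &set() {
    for (std::size_t I = 0; I != NumWords; ++I)
      Words[I] = ~std::uint64_t(0);
    maskLastWord(); // without this, count() would report 64 for N == 33
    return *this;
  }

  constexpr MiniBitset operator~() const {
    MiniBitset Result{};
    for (std::size_t I = 0; I != NumWords; ++I)
      Result.Words[I] = ~Words[I];
    Result.maskLastWord();
    return Result;
  }

  constexpr std::size_t count() const {
    std::size_t Total = 0;
    for (std::size_t I = 0; I != NumWords; ++I)
      for (std::uint64_t W = Words[I]; W != 0; W &= W - 1)
        ++Total; // each iteration clears the lowest set bit
    return Total;
  }

  constexpr bool any() const {
    for (std::size_t I = 0; I != NumWords; ++I)
      if (Words[I] != 0)
        return true;
    return false;
  }
  constexpr bool none() const { return !any(); }
  constexpr bool all() const { return count() == N; }

  constexpr bool operator==(const MiniBitset &RHS) const {
    for (std::size_t I = 0; I != NumWords; ++I)
      if (Words[I] != RHS.Words[I])
        return false;
    return true;
  }
};

static_assert(MiniBitset<33>().set().all(), "set() fills exactly 33 bits");
static_assert((~MiniBitset<33>()).count() == 33, "no stray high bits");
static_assert(~MiniBitset<33>().set() == MiniBitset<33>(), "~all is empty");
```

Because every operation that can set high bits re-applies the mask, `all()` and `operator==` can compare whole words without special-casing the tail.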

Changes from the original reverted PR:
- Replaced `llvm::any_of` with an inline loop to avoid depending on
constexpr `any_of`/`none_of` from `STLExtras` (#172536), which was also
reverted due to a GCC 15.2.1 bootstrap miscompile.
- The patch is now fully self-contained with no prerequisite changes.

Motivation: This is a prerequisite for making `LaneBitmask` a wrapper
around `Bitset`, enabling scalable lane bitmasks beyond 64 bits
(https://discourse.llvm.org/t/rfc-out-of-lanebitmask-bits-again/88613).
Delta     File
+226 -0   llvm/unittests/ADT/BitsetTest.cpp
+49 -18   llvm/include/llvm/ADT/Bitset.h
+275 -18  2 files

LLVM/project dc9be4e lld/ELF/Arch ARM.cpp, lld/test/ELF arm-be8-data-mapsym.s

[LLD][ELF] Skip non-inputsections to avoid invalid cast in Arm BE8 handling (#188154)

This patch fixes https://github.com/llvm/llvm-project/issues/187033

In BE8 mode, instruction bytes are reversed for sections containing
code. This logic currently assumes that arm mapping symbols (e.g. $a,
$t, $d) are always associated with InputSections.

However, mapping symbols can also be defined in other section types such
as mergeable sections (SHF_MERGE). These are not represented as
InputSection, and attempting to cast them using
cast_if_present<InputSection> results in an assertion failure.
Delta   File
+19 -0  lld/test/ELF/arm-be8-data-mapsym.s
+1 -1   lld/ELF/Arch/ARM.cpp
+20 -1  2 files

LLVM/project 4c9a739 bolt/test/AArch64 compare-and-branch-inversion.S compare-and-branch-reorder-blocks.S

[BOLT][AArch64] Strip unneeded labels from FEAT_CMPBR tests. (#189931)

Eliminates the temporary labels so that BOLT does not recognize them as
secondary entry points.
Delta    File
+1 -8    bolt/test/AArch64/compare-and-branch-inversion.S
+3 -6    bolt/test/AArch64/compare-and-branch-reorder-blocks.S
+3 -0    bolt/test/AArch64/compare-and-branch-split-functions.S
+3 -0    bolt/test/AArch64/compare-and-branch-unsupported.S
+10 -14  4 files

LLVM/project 2a30e72 clang/lib/CIR/CodeGen CIRGenBuiltinAMDGPU.cpp, clang/test/CIR/CodeGenHIP builtins-amdgcn.hip

[CIR][AMDGPU] Add amdgpu wave reduce builtins codegen
Delta    File
+180 -0  clang/test/CIR/CodeGenHIP/builtins-amdgcn.hip
+42 -4   clang/lib/CIR/CodeGen/CIRGenBuiltinAMDGPU.cpp
+222 -4  2 files

LLVM/project d835dd2 llvm/lib/Transforms/Vectorize LoopVectorize.cpp VPlanRecipes.cpp

[LV] Strip createStepForVF (NFC) (#185668)

The mul -> shl simplification is already done in VPlan.
Delta   File
+0 -14  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+2 -3   llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+0 -4   llvm/lib/Transforms/Vectorize/VPlanHelpers.h
+2 -21  3 files

LLVM/project 018e048 mlir/include/mlir/Dialect/Linalg/IR LinalgInterfaces.h, mlir/lib/Dialect/Linalg/IR LinalgInterfaces.cpp

[MLIR][Linalg] Generic to category specialization for unary elementwise ops (#187217)

Handle specialization of `linalg.generic` ops representing a unary
elementwise computation to the `linalg.elementwise` category op. This
implements a previously absent path in the linalg morphism.
Delta     File
+102 -41  mlir/lib/Dialect/Linalg/Transforms/Specialize.cpp
+95 -0    mlir/test/Dialect/Linalg/roundtrip-morphism-linalg-category-ops.mlir
+70 -2    mlir/test/Dialect/Linalg/specialize-generic-ops.mlir
+14 -9    mlir/lib/Dialect/Linalg/IR/LinalgInterfaces.cpp
+7 -2     mlir/include/mlir/Dialect/Linalg/IR/LinalgInterfaces.h
+288 -54  5 files

LLVM/project 81691d2 llvm/lib/Target/RISCV RISCVTargetTransformInfo.cpp, llvm/test/Analysis/CostModel/RISCV extract-last-active.ll

[RISCV][TTI] Update cost and prevent exceed m8 for vector.extract.last.active (#188160)

This patch contains two parts.
1. Update the costs to reflect the codegen changes. The result is not
fully accurate, since the step vector can use a smaller type when a
vscale_range attribute is present, but that information is not available
to the type-based query in TTI.
2. Return an invalid cost for any vector.extract.last.active that needs a
vector split for the step vector. This case is currently not handled
correctly and hits an assertion.

To avoid blocking the FindLast reduction in LV
(https://github.com/llvm/llvm-project/pull/184931), we should land this
first and then fix the SelectionDAG lowering of vector.extract.last.active.
Delta    File
+22 -22  llvm/test/Analysis/CostModel/RISCV/extract-last-active.ll
+8 -1    llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+30 -23  2 files

LLVM/project ca2cb81 llvm/lib/Target/AArch64 AArch64InstrFormats.td AArch64InstrInfo.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

[AArch64][llvm] Some instructions should be `HINT` aliases (NFC)

Implement the following instructions as a `HINT` alias instead of a
dedicated instruction in separate classes:
  * `stshh`
  * `stcph`
  * `shuh`
  * `tsb`

Updated all their helper methods too, and updated the `stshh` pseudo
expansion for the intrinsic to emit `HINT #0x30 | policy`.

Code in AArch64AsmPrinter::emitInstruction identified an initial BTI using a
broad bitmask on the HINT immediate, which also matched shuh/stcph (50..52).
This could move the patchable entry label after a non-BTI instruction.
Replaced it with an exact BTI check using the BTI HINT range (32..63) and
AArch64BTIHint::lookupBTIByEncoding(Imm ^ 32).
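
The difference between a broad mask and an exact check can be sketched as follows. Both the shape of the old mask and the small encoding table are assumptions made for illustration: `isKnownBTIEncoding` is a hypothetical stand-in for `AArch64BTIHint::lookupBTIByEncoding`, and the listed encodings (2, 4, 6 for the `c`, `j`, `jc` variants) are not taken from the patch.

```cpp
#include <cstdint>

// Assumed shape of a broad test: bit 5 set plus either of bits 1..2.
constexpr bool looksLikeBTIBroad(std::int64_t Imm) {
  return (Imm & 32) != 0 && (Imm & 6) != 0;
}

// Hypothetical stand-in for AArch64BTIHint::lookupBTIByEncoding: accepts
// only the encodings assumed here for the c, j and jc variants.
constexpr bool isKnownBTIEncoding(std::int64_t Enc) {
  return Enc == 2 || Enc == 4 || Enc == 6;
}

// Exact test: the immediate must lie in the BTI HINT range 32..63 and
// Imm ^ 32 must name a known BTI variant.
constexpr bool isInitialBTIExact(std::int64_t Imm) {
  return Imm >= 32 && Imm <= 63 && isKnownBTIEncoding(Imm ^ 32);
}

static_assert(looksLikeBTIBroad(50) && !isInitialBTIExact(50),
              "HINT #50 passes the broad mask but is not a BTI");
static_assert(looksLikeBTIBroad(34) && isInitialBTIExact(34),
              "HINT #34 is accepted by both checks");
```

Under these assumptions, immediates 50..52 pass the broad mask while being rejected by the table lookup, which is the misclassification described above.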

A following change will remove duplicated code and simplify.

    [2 lines not shown]
Delta     File
+115 -0   llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+41 -39   llvm/lib/Target/AArch64/AArch64InstrFormats.td
+22 -3    llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
+5 -14    llvm/lib/Target/AArch64/AArch64InstrInfo.td
+5 -10    llvm/lib/Target/AArch64/AArch64SystemOperands.td
+4 -2     llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+192 -68  3 files not shown
+201 -71  9 files

LLVM/project 703d43c llvm/include/llvm/CodeGen BasicTTIImpl.h, llvm/lib/Target/AArch64 AArch64TargetTransformInfo.cpp

[CostModel] Move default expand cost for partial reductions to BasicTTIImpl (#189905)

This is a follow-up of the suggestion left here:

https://github.com/llvm/llvm-project/pull/181707#discussion_r2995733831

The override functions in AMDGPU/ARM/SystemZ/X86 are required to avoid
enabling partial reductions where they were previously disabled (I've
added this for all targets that implement getArithmeticReductionCost).
Delta    File
+43 -0   llvm/include/llvm/CodeGen/BasicTTIImpl.h
+7 -27   llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+10 -0   llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h
+9 -0    llvm/lib/Target/ARM/ARMTargetTransformInfo.h
+9 -0    llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
+9 -0    llvm/lib/Target/X86/X86TargetTransformInfo.h
+87 -27  6 files

LLVM/project b8ca2a8 llvm/lib/Target/AArch64 AArch64InstrFormats.td AArch64InstrInfo.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

[AArch64][llvm] Some instructions should be `HINT` aliases (NFC)

Implement the following instructions as a `HINT` alias instead of a
dedicated instruction in separate classes:
  * `stshh`
  * `stcph`
  * `shuh`
  * `tsb`

Updated all their helper methods too, and updated the `stshh` pseudo
expansion for the intrinsic to emit `HINT #0x30 | policy`.

Code in AArch64AsmPrinter::emitInstruction identified an initial BTI using a
broad bitmask on the HINT immediate, which also matched shuh/stcph (50..52).
This could move the patchable entry label after a non-BTI instruction.
Replaced it with an exact BTI check using the BTI HINT range (32..63) and
AArch64BTIHint::lookupBTIByEncoding(Imm ^ 32).

A following change will remove duplicated code and simplify.

    [2 lines not shown]
Delta     File
+115 -0   llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+41 -39   llvm/lib/Target/AArch64/AArch64InstrFormats.td
+22 -3    llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
+5 -14    llvm/lib/Target/AArch64/AArch64InstrInfo.td
+5 -10    llvm/lib/Target/AArch64/AArch64SystemOperands.td
+4 -2     llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+192 -68  3 files not shown
+201 -71  9 files

LLVM/project 5f6835d lldb/source/Plugins/Process/Linux NativeRegisterContextLinux_arm64.cpp

[lldb][AArch64][Linux] Qualify uses of user_sve_header (#190130)

Fixes #165413, where a build failure was reported:
```
/b/s/w/ir/x/w/llvm-llvm-project/lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp:1182:9: error: unknown type name 'user_sve_header'; did you mean 'sve::user_sve_header'?
 1182 |         user_sve_header *header =
      |         ^~~~~~~~~~~~~~~
      |         sve::user_sve_header
```
To fix this, add sve:: as we do for all other uses of this.

This is LLDB's copy of a structure that Linux also defines. I think the
build worked on some machines because that version ended up being
included, but with a more isolated build, it may not.

We have our own definition of it so we can be sure what we're using in
case Linux extends it later.
Delta  File
+3 -3  lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp
+3 -3  1 files

LLVM/project 76fc936 clang/include/clang/Basic BuiltinsLoongArchLSX.def BuiltinsLoongArchLASX.def, clang/test/CodeGen/LoongArch/lasx lasxintrin-lax-vector-conversions.c

[Clang][LoongArch] Align LSX/LASX built-in signatures with intrinsic types to avoid lax conversions (#189900)

Update the built-in signatures in BuiltinsLoongArchLSX.def and
BuiltinsLoongArchLASX.def to precisely match the vector types used in
the corresponding intrinsic headers (lsxintrin.h and lasxintrin.h).

This alignment ensures that these intrinsics can be compiled
successfully even when -flax-vector-conversions=none is specified, since
the built-in arguments no longer rely on implicit vector type
conversions.

Added new test cases to verify the macro-defined LSX/LASX
intrinsic interfaces under -flax-vector-conversions=none.

Fixes #189898
Delta       File
+528 -0     clang/test/CodeGen/LoongArch/lasx/lasxintrin-lax-vector-conversions.c
+463 -0     clang/test/CodeGen/LoongArch/lsx/lsxintrin-lax-vector-conversions.c
+20 -20     clang/include/clang/Basic/BuiltinsLoongArchLSX.def
+18 -18     clang/include/clang/Basic/BuiltinsLoongArchLASX.def
+1,029 -38  4 files

LLVM/project f95d973 clang/lib/CIR/Lowering/DirectToLLVM LowerToLLVM.cpp

fix fmt
Delta  File
+4 -5  clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+4 -5  1 files

LLVM/project e3cfcf4 clang/include/clang/Basic DiagnosticCrossTUKinds.td, clang/include/clang/CrossTU CrossTranslationUnit.h

[clang][analyzer] Forward CTU-import failure conditions

Forward all CTU-import failures as diagnostics (remarks, warnings,
errors), except for `index_error_code::missing_definition` which has the
potential of generating too many diagnostics.

--
CPP-7804
Delta     File
+127 -37  clang/lib/CrossTU/CrossTranslationUnit.cpp
+28 -5    clang/test/Analysis/ctu/diag/load-threshold.cpp
+26 -1    clang/include/clang/Basic/DiagnosticCrossTUKinds.td
+13 -10   clang/include/clang/CrossTU/CrossTranslationUnit.h
+19 -0    clang/test/Analysis/ctu/diag/invlist-wrong-format-late.cpp
+16 -0    clang/unittests/CrossTU/CrossTranslationUnitTest.cpp
+229 -53  14 files not shown
+248 -73  20 files

LLVM/project 5e0a06b llvm/lib/CodeGen ExpandMemCmp.cpp, llvm/lib/Transforms/Scalar ExpandMemCmp.cpp MergeICmps.cpp

Move ExpandMemCmp and MergeICmps to the middle end (#77370)

Moving these into the middle-end pipeline will allow for additional
optimization of the expansion result, such as CSE of redundant loads
(c.f. https://godbolt.org/z/bEna4Md9r). For now, we conservatively place
the passes at the end of the middle-end pipeline, so we mostly don't
benefit from additional optimizations yet. The pipeline position will be
moved in a future change.
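
As a rough source-level picture of why later optimization can help, the sketch below writes out by hand what an expanded 16-byte equality `memcmp` amounts to. The function names are hypothetical and the byte-wise `load64` merely stands in for a single wide load; this is not the pass's output.

```cpp
#include <cstdint>

// Assemble eight bytes into one 64-bit value (stand-in for one wide load).
constexpr std::uint64_t load64(const unsigned char *P) {
  std::uint64_t V = 0;
  for (int I = 0; I != 8; ++I)
    V |= std::uint64_t(P[I]) << (8 * I);
  return V;
}

// Hand-written equivalent of `memcmp(A, B, 16) == 0` after expansion:
// two wide loads per operand and two integer compares.
constexpr bool equal16(const unsigned char *A, const unsigned char *B) {
  return load64(A) == load64(B) && load64(A + 8) == load64(B + 8);
}

// Two expanded comparisons against the same key. load64(Key) and
// load64(Key + 8) occur in both, so a CSE pass running after the
// expansion could load them once.
constexpr bool matchesEither(const unsigned char *Key, const unsigned char *X,
                             const unsigned char *Y) {
  return equal16(Key, X) || equal16(Key, Y);
}
```

When expansion happens only in the backend, redundant loads like those of `Key` are no longer visible to the middle-end passes that would normally remove them.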

This builds on work done by legrosbuffle in
https://reviews.llvm.org/D60318.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
Delta          File
+0 -1,013      llvm/lib/CodeGen/ExpandMemCmp.cpp
+963 -0        llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp
+61 -0         llvm/test/Transforms/ExpandMemCmp/X86/sanitizer-skip.ll
+2 -48         llvm/lib/Transforms/Scalar/MergeICmps.cpp
+37 -0         llvm/test/Transforms/PhaseOrdering/X86/expand-memcmp-middle-end.ll
+16 -16        llvm/test/CodeGen/RISCV/memcmp-optsize.ll
+1,079 -1,077  70 files not shown
+1,236 -1,352  76 files

LLVM/project a599a06 libc/test/src/math CMakeLists.txt

[libc] Indentation consistency in CMake (#190120)

This PR just fixes the indentation/style for the whole CMake file for
consistency.
No other changes.
c698f55b0245ffbaae55c7f854fadba33df16e9d
Delta    File
+40 -40  libc/test/src/math/CMakeLists.txt
+40 -40  1 files

LLVM/project d1cee04 clang/lib/CIR/Lowering/DirectToLLVM LowerToLLVM.cpp, clang/test/CIR/CodeGenCUDA address-spaces.cu

Poison attr lowering and llvm `__shared__` lowering.
Delta   File
+13 -5  clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+4 -0   clang/test/CIR/CodeGenCUDA/address-spaces.cu
+17 -5  2 files

LLVM/project 7ccd1cb clang/test/CodeGenCoroutines coro-suspend-cleanups.cpp, llvm/lib/Transforms/Coroutines CoroFrame.cpp

Reland "[CoroSplit] Erase trivially dead allocas after spilling (#189295)" (#190124)

The original PR contained a use-after-delete issue, which has been
resolved in #189521.

Reland #189295, which was reverted in #189311.
Delta  File
+3 -0  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
+0 -2  clang/test/CodeGenCoroutines/coro-suspend-cleanups.cpp
+0 -1  llvm/test/Transforms/Coroutines/coro-await-suspend-lower.ll
+0 -1  llvm/test/Transforms/Coroutines/coro-await-suspend-lower-invoke.ll
+0 -1  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-01.ll
+3 -5  5 files

LLVM/project 1662c20 llvm/lib/Passes PassBuilderPipelines.cpp, llvm/lib/Transforms/Scalar LoopRotation.cpp

[Passes][LoopRotate] Move minsize handling fully into pass (#189956)

Make this dependent only on the minsize attribute and drop the pipeline
handling.

Rename the enable-loop-header-duplication option to
enable-loop-header-duplication-at-minsize to clarify that it controls
header duplication at minsize only (in other cases it is enabled by
default, independently of this option).
Delta    File
+29 -10  llvm/test/Transforms/LoopRotate/oz-disable.ll
+6 -20   llvm/lib/Passes/PassBuilderPipelines.cpp
+15 -5   llvm/lib/Transforms/Scalar/LoopRotation.cpp
+8 -9    llvm/test/Transforms/PhaseOrdering/enable-loop-header-duplication-oz.ll
+58 -44  4 files

LLVM/project 51fc1ff clang/lib/CIR/Dialect/Transforms/TargetLowering LowerModule.cpp TargetLoweringInfo.h, clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets NVPTX.cpp

[CIR][NVPTX] NVPTX lowering info skeleton and target AS mapping
Delta    File
+30 -17  clang/test/CIR/CodeGenCUDA/address-spaces.cu
+39 -0   clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets/NVPTX.cpp
+3 -0    clang/lib/CIR/Dialect/Transforms/TargetLowering/LowerModule.cpp
+2 -0    clang/lib/CIR/Dialect/Transforms/TargetLowering/TargetLoweringInfo.h
+1 -0    clang/lib/CIR/Dialect/Transforms/TargetLowering/CMakeLists.txt
+75 -17  5 files

LLVM/project 40e7fa6 llvm/lib/Passes PassBuilderPipelines.cpp, llvm/lib/Transforms/IPO FunctionSpecialization.cpp

[Passes][FuncSpec] Move optsize/minsize handling into pass (#189952)

Instead of using the Os/Oz level during pass pipeline construction,
query the optsize/minsize attribute on the function to determine whether
specialization is allowed to take place. This ensures consistent
behavior for per-function attributes.

It's worth noting that FuncSpec *already* checks for minsize, but at the
call-site level.
Delta   File
+39 -0  llvm/test/Transforms/FunctionSpecialization/function-specialization-optsize.ll
+3 -8   llvm/lib/Passes/PassBuilderPipelines.cpp
+3 -0   llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
+45 -8  3 files

LLVM/project 3b81be8 llvm/lib/Transforms/IPO WholeProgramDevirt.cpp, llvm/test/Transforms/WholeProgramDevirt import.ll export-vcp.ll

WholeProgramDevirt: Import/export the CVP byte directly in the summary (#188979)

rather than using absolute symbol constants on ELF/x86.

This leads to better codegen as the absolute symbol constants were not
resolved until link time (see bug for example).

Fixes #188470
Delta    File
+6 -13   llvm/test/Transforms/WholeProgramDevirt/import.ll
+2 -4    llvm/test/Transforms/WholeProgramDevirt/export-vcp.ll
+2 -4    llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
+10 -21  3 files

LLVM/project 29810d7 clang/lib/CIR/CodeGen CIRGenModule.cpp, clang/test/CIR/CodeGen attr-retain.c attr-used.c

add gv section attribute
Delta   File
+1 -5   clang/lib/CIR/CodeGen/CIRGenModule.cpp
+2 -2   clang/test/CIR/CodeGen/attr-retain.c
+1 -1   clang/test/CIR/CodeGen/attr-used.c
+1 -1   clang/test/CIR/CodeGen/keep-persistent-storage-variables.cpp
+1 -1   clang/test/CIR/CodeGen/keep-static-consts.cpp
+6 -10  5 files

LLVM/project 0981c88 clang/test/CIR/CodeGen keep-persistent-storage-variables.cpp keep-static-consts.cpp

add tests persistent-storage-variables and keep-static-consts
Delta   File
+20 -0  clang/test/CIR/CodeGen/keep-persistent-storage-variables.cpp
+11 -0  clang/test/CIR/CodeGen/keep-static-consts.cpp
+31 -0  2 files

LLVM/project 30fefdb clang/lib/CIR/CodeGen CIRGenModule.cpp CIRGenModule.h, clang/test/CIR/CodeGen attr-retain.c attr-used.c

use CIRGlobalValueInterface
Delta    File
+30 -29  clang/lib/CIR/CodeGen/CIRGenModule.cpp
+18 -0   clang/test/CIR/CodeGen/attr-retain.c
+7 -7    clang/lib/CIR/CodeGen/CIRGenModule.h
+14 -0   clang/test/CIR/CodeGen/attr-used.c
+69 -36  4 files

LLVM/project 0f80a8a clang/lib/CIR/CodeGen CIRGenModule.cpp CIRGenModule.h, clang/test/CIR/CodeGenHIP hip-cuid.hip

[CIR] Add addLLVMUsed and addLLVMCompilerUsed methods to CIRGenModule
Delta    File
+100 -2  clang/lib/CIR/CodeGen/CIRGenModule.cpp
+27 -0   clang/test/CIR/CodeGenHIP/hip-cuid.hip
+19 -0   clang/lib/CIR/CodeGen/CIRGenModule.h
+146 -2  3 files

LLVM/project 4250a0f llvm/lib/Target/RISCV RISCVAsmPrinter.cpp, llvm/test/CodeGen/RISCV rv64-stackmap-nops.ll

[RISCV] Fix stackmap shadow trimming NOP size for compressed targets (#189774)

The shadow trimming loop in LowerSTACKMAP hardcoded a 4-byte decrement
per instruction, but when Zca is enabled NOPs are 2 bytes. Use NOPBytes
instead of the hardcoded 4 so the shadow is correctly trimmed on
compressed targets.
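
The effect of the step size on the trimmed shadow can be shown with a small arithmetic sketch. `nopsAfterTrimming` is a hypothetical helper written for this note, not code from `RISCVAsmPrinter.cpp`; the clamping at zero is an addition of the sketch, and it assumes, as the description above does, that each following instruction is credited `NOPBytes` toward the shadow.

```cpp
// Number of NOPs still needed to fill a stackmap shadow of ShadowBytes once
// NumFollowing later instructions have each been credited StepBytes.
// Hypothetical helper; the clamp at zero is an assumption of this sketch.
constexpr unsigned nopsAfterTrimming(unsigned ShadowBytes,
                                     unsigned NumFollowing,
                                     unsigned StepBytes, unsigned NOPBytes) {
  unsigned Remaining = ShadowBytes;
  for (unsigned I = 0; I != NumFollowing && Remaining != 0; ++I)
    Remaining = Remaining > StepBytes ? Remaining - StepBytes : 0;
  return Remaining / NOPBytes;
}

// Compressed target (NOPBytes == 2), 8-byte shadow, two following
// instructions: stepping by NOPBytes leaves 4 bytes, i.e. two NOPs.
static_assert(nopsAfterTrimming(8, 2, /*StepBytes=*/2, /*NOPBytes=*/2) == 2,
              "step == NOPBytes keeps the remaining padding");
// Stepping by a hardcoded 4 trims the whole shadow and emits no NOPs.
static_assert(nopsAfterTrimming(8, 2, /*StepBytes=*/4, /*NOPBytes=*/2) == 0,
              "hardcoded 4-byte step over-trims on compressed targets");
```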

Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
(cherry picked from commit 3d7eedce5658c41a1b22775938359bfafac47fc9)
Delta   File
+14 -2  llvm/test/CodeGen/RISCV/rv64-stackmap-nops.ll
+1 -1   llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp
+15 -3  2 files

LLVM/project da8a5b9 flang/docs Extensions.md

[flang] Update Flang Extension doc to reflect previous change (#188088)

Update Flang Extension doc to remove note about a warning that was
removed in a previous PR (PR #178088). It is an oversight that this doc
change was not made in that previous PR. The oversight was only recently
discovered and has led to this PR.

(cherry picked from commit 45b932a2d452c997d98b57e1aa31bc4951c5e9f4)
Delta  File
+0 -5  flang/docs/Extensions.md
+0 -5  1 files

LLVM/project e3cbd99 clang/lib/CIR/CodeGen TargetInfo.cpp, clang/lib/CIR/Dialect/Transforms TargetLowering.cpp

[CIR][AMDGPU] Lower Language specific address spaces and implement AMDGPU target (#179084)
Delta     File
+261 -1   clang/lib/CIR/Dialect/Transforms/TargetLowering.cpp
+66 -0    clang/test/CIR/CodeGen/amdgpu-target-lowering-as.cpp
+59 -0    clang/test/CIR/CodeGen/amdgpu-address-spaces.cpp
+47 -0    clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets/AMDGPU.cpp
+0 -46    clang/test/CIR/Lowering/global-address-space.cir
+31 -1    clang/lib/CIR/CodeGen/TargetInfo.cpp
+464 -48  6 files not shown
+516 -59  12 files

LLVM/project 13fd079 clang/include/clang/AST DeclCXX.h, clang/lib/AST DeclCXX.cpp

[CIR] Implement isMemcpyEquivalentSpecialMember for trivial copy/move ctors
Delta     File
+41 -0    clang/test/CIR/CodeGen/copy-constructor-memcpy.cpp
+7 -33    clang/lib/CodeGen/CGClass.cpp
+34 -0    clang/lib/AST/DeclCXX.cpp
+16 -7    clang/lib/CIR/CodeGen/CIRGenClass.cpp
+13 -0    clang/include/clang/AST/DeclCXX.h
+6 -5     clang/test/CIR/CodeGen/cxx-special-member-attr.cpp
+117 -45  6 files not shown
+130 -50  12 files