LLVM/project 5dbd049mlir/include/mlir/Conversion Passes.td, mlir/lib/Conversion/ArithToAPFloat ArithToAPFloat.cpp CMakeLists.txt

[mlir][arith] `arith-to-apfloat`: Add vector support (#171024)

Add support for vectorized operations such as `arith.addf ... :
vector<4xf4E2M1FN>`. The computation is scalarized: scalar operands are
extracted with `vector.to_elements`, multiple scalar computations are
performed and the result is inserted back into a vector with
`vector.from_elements`.
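
The scalarization described above can be sketched in plain C++. This is only an illustration of the extract / compute / re-insert shape, not the pass's code: `scalarizeBinary` is a hypothetical helper, `float` stands in for the emulated element type, and `std::vector` stands in for an MLIR vector value.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not part of MLIR): apply a scalar binary operation
// lane by lane and collect the scalar results into a new vector.
std::vector<float> scalarizeBinary(
    const std::vector<float> &lhs, const std::vector<float> &rhs,
    const std::function<float(float, float)> &scalarOp) {
  assert(lhs.size() == rhs.size() && "operands must have the same shape");
  std::vector<float> results;
  results.reserve(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    // Reading lhs[i] / rhs[i] plays the role of vector.to_elements.
    // The call below is the per-lane scalar computation.
    results.push_back(scalarOp(lhs[i], rhs[i]));
  }
  // Returning the collected lanes plays the role of vector.from_elements.
  return results;
}
```

Each lane is computed independently, so the result has the same number of elements as the operands.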
DeltaFile
+349-251mlir/lib/Conversion/ArithToAPFloat/ArithToAPFloat.cpp
+39-0mlir/test/Conversion/ArithToApfloat/arith-to-apfloat.mlir
+26-0mlir/test/Integration/Dialect/Arith/CPU/test-apfloat-emulation-vector.mlir
+2-1mlir/include/mlir/Conversion/Passes.td
+1-0mlir/lib/Conversion/ArithToAPFloat/CMakeLists.txt
+417-2525 files

LLVM/project c369b96mlir/include/mlir/Conversion Passes.td, mlir/lib/Conversion/ArithToAPFloat ArithToAPFloat.cpp CMakeLists.txt

[mlir][arith] `arith-to-apfloat`: Add vector support
DeltaFile
+349-251mlir/lib/Conversion/ArithToAPFloat/ArithToAPFloat.cpp
+39-0mlir/test/Conversion/ArithToApfloat/arith-to-apfloat.mlir
+26-0mlir/test/Integration/Dialect/Arith/CPU/test-apfloat-emulation-vector.mlir
+2-1mlir/include/mlir/Conversion/Passes.td
+1-0mlir/lib/Conversion/ArithToAPFloat/CMakeLists.txt
+417-2525 files

LLVM/project b176593llvm/lib/Target/AArch64 AArch64CollectCPSpillInfo.cpp AArch64TargetMachine.cpp, llvm/test/CodeGen/AArch64 fptosi-sat-vector.ll fptoui-sat-vector.ll

Constant pool spilling
DeltaFile
+503-525llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll
+912-0llvm/lib/Target/AArch64/AArch64CollectCPSpillInfo.cpp
+177-177llvm/test/CodeGen/AArch64/fptoui-sat-vector.ll
+19-44llvm/test/CodeGen/AArch64/arm64-fp128.ll
+11-0llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+2-7llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-stores.ll
+1,624-7534 files not shown
+1,634-75510 files

LLVM/project 7bfdaa5llvm/lib/Transforms/Vectorize VPlanPredicator.cpp

[VPlan] Fix unused variable warning

llvm-project/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp:312:19: warning: unused variable 'EB' [-Wunused-variable]
  312 |     VPBasicBlock *EB = Plan.getExitBlocks().front();
      |                   ^~

This showed up in a non-assertions build.
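
For context, a minimal sketch of how this kind of warning typically arises and one common way to silence it. The commit's actual two-line fix is not shown in this entry, so the remedy below is an assumption, and `lastBlockId` is a hypothetical stand-in for the affected code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in: a variable that is only read inside an assert.
int lastBlockId(const std::vector<int> &exitBlocks) {
  // When NDEBUG is defined, assert() expands to nothing, so `front` would
  // otherwise be unused and trigger -Wunused-variable. [[maybe_unused]]
  // (C++17) suppresses the warning without changing behaviour.
  [[maybe_unused]] int front = exitBlocks.front();
  assert(front >= 0 && "block ids are assumed non-negative");
  return exitBlocks.back();
}
```

Another common option is to move the expression directly into the `assert`, so that no named variable exists in non-assertions builds.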
DeltaFile
+2-2llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
+2-21 files

LLVM/project 5bd5595llvm/lib/Target/AArch64 AArch64CollectCPSpillInfo.cpp AArch64TargetMachine.cpp, llvm/test/CodeGen/AArch64 fptosi-sat-vector.ll fptoui-sat-vector.ll

Constant pool spilling
DeltaFile
+503-525llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll
+855-0llvm/lib/Target/AArch64/AArch64CollectCPSpillInfo.cpp
+177-177llvm/test/CodeGen/AArch64/fptoui-sat-vector.ll
+19-44llvm/test/CodeGen/AArch64/arm64-fp128.ll
+11-0llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+2-7llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-stores.ll
+1,567-7534 files not shown
+1,577-75510 files

LLVM/project 68fea00clang/lib/CodeGen/Targets SPIR.cpp, clang/test/CodeGenCUDA amdgpu-kernel-arg-pointer-type.cu kernel-args.cu

[SPIRV] Use AMDGPU ABI for AMDGCN flavoured SPIRV (#169865)

At the moment AMDGCN flavoured SPIRV uses the SPIRV ABI with some tweaks
revolving around passing aggregates as direct. This is problematic in
multiple ways:

- it leads to divergence from code compiled for a concrete target, which
makes it difficult to debug;
- it incurs a run time cost, when dealing with larger aggregates;
- it incurs a compile time cost, when dealing with larger aggregates.

This patch switches over AMDGCN flavoured SPIRV to implement the AMDGPU
ABI (except for dealing with variadic functions, which will be added in
the future). One additional complication (and the primary motivation
behind the current less than ideal state of affairs) stems from `byref`,
which AMDGPU uses, not being expressible in SPIR-V. We deal with this by
CodeGen-ing for `byref`, lowering it to the `FuncParamAttr ByVal` in
SPIR-V, and restoring it when doing reverse translation from AMDGCN
flavoured SPIR-V.
DeltaFile
+321-0clang/test/CodeGenHIP/amdgcnspirv-uses-amdgpu-abi.cpp
+254-54clang/lib/CodeGen/Targets/SPIR.cpp
+71-73clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu
+24-0llvm/test/CodeGen/SPIRV/pointers/ptr-argument-byref-amdgcnspirv.ll
+9-1llvm/lib/Target/SPIRV/SPIRVCallLowering.cpp
+4-4clang/test/CodeGenCUDA/kernel-args.cu
+683-1326 files

LLVM/project 11fd760mlir/include/mlir-c ExecutionEngine.h, mlir/lib/Bindings/Python ExecutionEngineModule.cpp

[MLIR][ExecutionEngine] Enable PIC option (#170995)

This PR enables the MLIR execution engine to dump object files as
position-independent code (PIC), which is needed when the object file is
later bundled into a dynamic shared library.

---------

Co-authored-by: Mehdi Amini <joker.eph at gmail.com>
DeltaFile
+5-5mlir/lib/Bindings/Python/ExecutionEngineModule.cpp
+5-2mlir/lib/CAPI/ExecutionEngine/ExecutionEngine.cpp
+4-1mlir/include/mlir-c/ExecutionEngine.h
+2-2mlir/test/CAPI/execution_engine.c
+1-1mlir/test/CAPI/global_constructors.c
+1-0mlir/test/python/execution_engine.py
+18-116 files

LLVM/project 3fc7419llvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlanRecipes.cpp, llvm/test/Transforms/LoopVectorize first-order-recurrence-chains-vplan.ll

[VPlan] Replace ExtractLast(Elem|LanePerPart) with ExtractLast(Lane/Part) (#164124)

Replace ExtractLastElement and ExtractLastLanePerPart with more generic
and specific ExtractLastLane and ExtractLastPart, which model distinct
parts of extracting across parts and lanes. ExtractLastElement ==
ExtractLastLane(ExtractLastPart) and ExtractLastLanePerPart ==
ExtractLastLane, the latter clarifying the name of the opcode. A new
m_ExtractLastElement matcher is provided for convenience.

The patch should be NFC modulo printing changes.
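
The identity ExtractLastElement == ExtractLastLane(ExtractLastPart) can be illustrated with a toy model. This is a sketch under assumptions, not VPlan code: an unrolled value is modeled as a list of parts, each part as a list of lanes, and the function names merely mirror the opcode names.

```cpp
#include <cassert>
#include <vector>

using Part = std::vector<int>;    // one unrolled part: a vector of lanes
using Parts = std::vector<Part>;  // all parts of an unrolled value

// Toy model of ExtractLastPart: pick the last unrolled part.
const Part &extractLastPart(const Parts &parts) {
  assert(!parts.empty() && "need at least one part");
  return parts.back();
}

// Toy model of ExtractLastLane: pick the last lane of a single part.
int extractLastLane(const Part &part) {
  assert(!part.empty() && "need at least one lane");
  return part.back();
}

// Toy model of the convenience form: the composition of the two above.
int extractLastElement(const Parts &parts) {
  return extractLastLane(extractLastPart(parts));
}
```

In this model, applying `extractLastLane` to each part separately corresponds to what the old ExtractLastLanePerPart name described.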

PR: https://github.com/llvm/llvm-project/pull/164124
DeltaFile
+34-35llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+16-15llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+13-17llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+18-9llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains-vplan.ll
+14-6llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+8-10llvm/lib/Transforms/Vectorize/VPlan.h
+103-928 files not shown
+135-11214 files

LLVM/project 915e206llvm/lib/Target/AArch64 AArch64CollectCPSpillInfo.cpp, llvm/test/CodeGen/AArch64 fptosi-sat-vector.ll fptoui-sat-vector.ll

DUP const spilling
DeltaFile
+543-60llvm/lib/Target/AArch64/AArch64CollectCPSpillInfo.cpp
+148-188llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll
+108-135llvm/test/CodeGen/AArch64/fptoui-sat-vector.ll
+19-44llvm/test/CodeGen/AArch64/arm64-fp128.ll
+8-8llvm/test/CodeGen/AArch64/fcopysign-noneon.ll
+0-5llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-stores.ll
+826-4406 files

LLVM/project 9e7ce77clang/docs UsersManual.rst, clang/test/CodeGenCXX speculative-devirt-metadata.cpp

[Clang]: Support opt-in speculative devirtualization (#159685)

This patch adds Clang support for speculative devirtualization and
integrates the related pass into the pass pipeline.
It builds on the LLVM backend implementation from PR #159048.
Speculative devirtualization transforms an indirect call (the virtual
function call) into a guarded direct call.
The guard compares the virtual function pointer to the expected target.
The optimization remains safe without LTO because the direct call is
never unconditional: it is taken only when the function pointer matches.
This optimization:
- Is opt-in: disabled by default, enabled via `-fdevirtualize-speculatively`
- Works in non-LTO mode
- Handles publicly-visible objects
- Uses guarded devirtualization with fallback to indirect calls when the
speculation is incorrect.

For this C++ example:

    [50 lines not shown]
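
Independently of the commit's own example (elided above), the guarded-call shape can be written out by hand. This is a rough sketch under assumptions: the function-pointer member models a one-slot vtable so the comparison is visible in source, and all names (`Shape`, `squareArea`, `doubledSide`, `callArea`) are hypothetical.

```cpp
#include <cassert>

struct Shape;
using AreaFn = int (*)(const Shape &);

// Hypothetical hand-rolled "vtable" with a single slot.
struct Shape {
  AreaFn area;
  int side;
};

int squareArea(const Shape &s) { return s.side * s.side; }
int doubledSide(const Shape &s) { return 2 * s.side; }

// Guarded direct call: speculate that the target is squareArea.
int callArea(const Shape &s) {
  assert(s.area != nullptr && "slot must be populated");
  if (s.area == &squareArea)
    return squareArea(s);  // speculation correct: direct, inlinable call
  return s.area(s);        // speculation wrong: fall back to indirect call
}
```

Both paths return the same value as the original indirect call would, which is why the transformation does not depend on knowing every possible target.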
DeltaFile
+78-0clang/test/CodeGenCXX/speculative-devirt-metadata.cpp
+64-0llvm/test/Transforms/WholeProgramDevirt/devirt-metadata.ll
+60-0llvm/test/Transforms/PhaseOrdering/speculative-devirt-then-inliner.ll
+52-0clang/docs/UsersManual.rst
+31-15llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
+36-0llvm/lib/Passes/PassBuilderPipelines.cpp
+321-1511 files not shown
+379-3817 files

LLVM/project a3beac1mlir/lib/Dialect/Tensor/IR TensorOps.cpp, utils/bazel/llvm-project-overlay/mlir BUILD.bazel

[DO NOT COMMIT] Break tensor dependence on linalg introduced in #123902
DeltaFile
+3-3mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
+0-1utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+3-42 files

LLVM/project 1226a6dclang-tools-extra/test/clang-tidy/infrastructure/Inputs/param parameters.txt

[clang-tidy] Fix fragile test in `read-parameters-from-file` (#171033)

[CommandLine.cpp](https://github.com/llvm/llvm-project/blob/fb0400fe1f1f9e83f3148db8ce2c72ab5bc6728e/llvm/lib/Support/CommandLine.cpp#L940)
treats single quotes as literal characters on Windows, so the argument is
parsed as a check named `' -*,llvm-namespace-comment '`, which matches
no existing checks, so no checks are enabled via the command line.

Previously, the test passed because it fell back to the root
`.clang-tidy` configuration which enables `llvm-*`.
DeltaFile
+1-1clang-tools-extra/test/clang-tidy/infrastructure/Inputs/param/parameters.txt
+1-11 files

LLVM/project 73a1383libcxx/include/__atomic atomic_sync.h contention_t.h, libcxx/include/__configuration availability.h

[libc++] Allows any types of size 4 and 8 to use native platform ulock_wait (#161086)

This is to address #146145

Previously, `std::atomic::wait/notify` used the native `ulock_wait`
directly only for `uint64_t`. All other types went through the global
contention table's `atomic`, increasing the chance of spurious wakeups.
This PR allows any type of size 4 or 8 to go directly to `ulock_wait`.

This PR is just a proof of concept. If we like this idea, I can go further
and update the Linux/FreeBSD branch and add ABI macros so the existing
behaviours are preserved under the stable ABI.

Here are some benchmark results

```
Benchmark                                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------

    [48 lines not shown]
```
DeltaFile
+219-85libcxx/src/atomic.cpp
+152-0libcxx/include/__atomic/atomic_sync.h
+57-0libcxx/test/std/atomics/atomics.types.operations/atomics.types.operations.wait/lost_wakeup.pass.cpp
+27-3libcxx/include/__atomic/contention_t.h
+15-0libcxx/include/__configuration/availability.h
+14-0libcxx/lib/abi/CHANGELOG.TXT
+484-8815 files not shown
+579-10421 files

LLVM/project fb0400fmlir/lib/Target/Cpp TranslateToCpp.cpp, mlir/test/Target/Cpp common-cpp.mlir

[mlir][emitc] Fix bug in dereference translation (#171028)

The op was not added to `hasDeferredEmission()` when introduced by
f17abc280c70, causing incorrect translation.
DeltaFile
+2-1mlir/lib/Target/Cpp/TranslateToCpp.cpp
+1-1mlir/test/Target/Cpp/common-cpp.mlir
+3-22 files

LLVM/project 1034291mlir/lib/Dialect/Affine/IR AffineOps.cpp

[mlir][Affine] Avoid forcing a non-composable affine Inliner impl
DeltaFile
+2-1mlir/lib/Dialect/Affine/IR/AffineOps.cpp
+2-11 files

LLVM/project a8474ddmlir/include/mlir/Conversion Passes.td, mlir/lib/Conversion/ArithToAPFloat ArithToAPFloat.cpp CMakeLists.txt

[mlir][arith] `arith-to-apfloat`: Add vector support
DeltaFile
+351-251mlir/lib/Conversion/ArithToAPFloat/ArithToAPFloat.cpp
+39-0mlir/test/Conversion/ArithToApfloat/arith-to-apfloat.mlir
+26-0mlir/test/Integration/Dialect/Arith/CPU/test-apfloat-emulation-vector.mlir
+2-1mlir/include/mlir/Conversion/Passes.td
+1-0mlir/lib/Conversion/ArithToAPFloat/CMakeLists.txt
+419-2525 files

LLVM/project 9da9241mlir/include/mlir/Conversion Passes.td, mlir/lib/Conversion/ArithToAPFloat ArithToAPFloat.cpp CMakeLists.txt

[mlir][arith] `arith-to-apfloat`: Add vector support
DeltaFile
+351-251mlir/lib/Conversion/ArithToAPFloat/ArithToAPFloat.cpp
+39-0mlir/test/Conversion/ArithToApfloat/arith-to-apfloat.mlir
+2-1mlir/include/mlir/Conversion/Passes.td
+1-0mlir/lib/Conversion/ArithToAPFloat/CMakeLists.txt
+393-2524 files

LLVM/project 8c9174cmlir/include/mlir/Conversion Passes.td, mlir/lib/Conversion/ArithToAPFloat ArithToAPFloat.cpp CMakeLists.txt

[mlir][arith] `arith-to-apfloat`: Add vector support
DeltaFile
+339-251mlir/lib/Conversion/ArithToAPFloat/ArithToAPFloat.cpp
+39-0mlir/test/Conversion/ArithToApfloat/arith-to-apfloat.mlir
+2-1mlir/include/mlir/Conversion/Passes.td
+1-0mlir/lib/Conversion/ArithToAPFloat/CMakeLists.txt
+381-2524 files

LLVM/project 40f7224llvm/lib/Target/AArch64 AArch64CollectCPSpillInfo.cpp AArch64TargetMachine.cpp, llvm/test/CodeGen/AArch64 fptoui-sat-vector.ll fptosi-sat-vector.ll

Constant pool spilling
DeltaFile
+372-0llvm/lib/Target/AArch64/AArch64CollectCPSpillInfo.cpp
+90-63llvm/test/CodeGen/AArch64/fptoui-sat-vector.ll
+77-59llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll
+12-8llvm/test/CodeGen/AArch64/fcopysign-noneon.ll
+11-0llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+4-4llvm/test/CodeGen/AArch64/arm64-fp128.ll
+566-1344 files not shown
+572-13610 files

LLVM/project a1ba1famlir/lib/Target/Cpp TranslateToCpp.cpp

[mlir][emitc] Simplify inlining logic (NFCI) (#169978)

This change makes inlining logic in the translator simpler and more
consistent by

(a) Extending the inlining concept to include CExpression ops, which by
    definition are inlined if and only if they reside within an
    ExpressionOp.

(b) Concentrating all inlining decisions in `shouldBeInlined()` to make
    sure that ops get the same decision when queried as operations and
    as operands.
DeltaFile
+19-21mlir/lib/Target/Cpp/TranslateToCpp.cpp
+19-211 files

LLVM/project ad31a25mlir/lib/Conversion/FuncToLLVM FuncToLLVM.cpp, mlir/lib/Transforms/Utils DialectConversion.cpp

[mlir][Transforms] Remove `replaceAllUsesWith` workaround
DeltaFile
+12-21mlir/lib/Transforms/Utils/DialectConversion.cpp
+10-2mlir/lib/Conversion/FuncToLLVM/FuncToLLVM.cpp
+10-1mlir/test/lib/Conversion/FuncToLLVM/TestConvertFuncOp.cpp
+2-1mlir/test/Transforms/test-convert-func-op.mlir
+34-254 files

LLVM/project bdb918emlir/lib/Conversion/ArithToAPFloat ArithToAPFloat.cpp, mlir/test/Conversion/ArithToApfloat arith-to-apfloat.mlir

[mlir][arith] `arith-to-apfloat`: Bail on unsupported bitwidth (#170994)

Bitwidths greater than 64 are not supported by `arith-to-apfloat`.
DeltaFile
+26-8mlir/lib/Conversion/ArithToAPFloat/ArithToAPFloat.cpp
+25-0mlir/test/Conversion/ArithToApfloat/arith-to-apfloat.mlir
+51-82 files

LLVM/project 3b355b2mlir/tools/mlir-tblgen PassGen.cpp

[mlir] Remove deprecated GEN_PASS_CLASSES. (#166904)

This was marked as deprecated in 2022, but only in a comment. Switch it to
an error to make the deprecation visible, and stop generating the classes.
The error message will be removed in a follow-up; emitting it for now
seemed easier for folks than deciphering compilation errors. The change
required to move to the new form is rather minimal.
DeltaFile
+2-79mlir/tools/mlir-tblgen/PassGen.cpp
+2-791 files

LLVM/project 4fe780allvm/lib/Target/AArch64 AArch64ISelLowering.cpp AArch64ExpandPseudoInsts.cpp

MVNI + FNEG
DeltaFile
+53-4llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+34-0llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
+34-0llvm/lib/Target/AArch64/AArch64InstrInfo.td
+121-43 files

LLVM/project b63f49dllvm/lib/Target/X86 X86ISelLowering.cpp X86ISelLowering.h, llvm/test/CodeGen/X86 ctselect-i386-fp.ll

[LLVM][X86] Add f80 support for ct.select

Add special handling for x86_fp80 types in CTSELECT lowering by splitting
them into three 32-bit chunks, performing constant-time selection on each
chunk, and reassembling the result. This fixes crashes when compiling
tests with f80 types.

Also updated ctselect.ll to match current generic fallback implementation.
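
The chunked selection can be sketched at source level as follows. This is an illustration under assumptions, not the lowering itself: the 80-bit value is modeled as three 32-bit words (96 bits, enough to cover it), `ctSelectChunks` is a hypothetical name, and C++ source alone does not guarantee constant-time machine code, since the compiler is free to reintroduce branches.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Three 32-bit chunks covering an 80-bit value (modeling assumption).
using F80Chunks = std::array<std::uint32_t, 3>;

// Select a (cond true) or b (cond false) chunk by chunk, using a mask
// instead of a branch on cond.
F80Chunks ctSelectChunks(bool cond, const F80Chunks &a, const F80Chunks &b) {
  // All ones when cond is true, all zeros otherwise.
  const std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond);
  assert(mask == 0u || mask == 0xFFFFFFFFu);
  F80Chunks out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = (a[i] & mask) | (b[i] & ~mask);  // per-chunk masked select
  return out;  // reassembled result
}
```

The same mask is applied to every chunk, so all three words are taken from the same operand.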
DeltaFile
+2,857-3,098llvm/lib/Target/X86/X86ISelLowering.cpp
+1,619-1,634llvm/lib/Target/X86/X86ISelLowering.h
+463-452llvm/lib/Target/X86/X86InstrInfo.cpp
+126-146llvm/test/CodeGen/X86/ctselect-i386-fp.ll
+9-12llvm/lib/Target/X86/X86InstrInfo.h
+8-7llvm/lib/Target/X86/X86TargetMachine.cpp
+5,082-5,3496 files

LLVM/project d29036ellvm/lib/Target/X86 X86ISelLowering.cpp X86InstrInfo.cpp, llvm/test/CodeGen/X86 ctselect-vector.ll ctselect.ll

[LLVM][X86] Add native ct.select support for X86 and i386

Add native X86 implementation with CMOV instructions and comprehensive tests:
- X86 ISelLowering with CMOV for x86_64 and i386
- Fallback bitwise operations for i386 targets without CMOV
- Post-RA expansion for pseudo-instructions
- Comprehensive test coverage:
  - Edge cases (zero conditions, large integers)
  - i386-specific tests (FP, MMX, non-CMOV fallback)
  - Vector operations
  - Optimization patterns

The basic test demonstrating fallback is in the core infrastructure PR.
DeltaFile
+1,274-0llvm/test/CodeGen/X86/ctselect-vector.ll
+583-413llvm/test/CodeGen/X86/ctselect.ll
+763-28llvm/lib/Target/X86/X86ISelLowering.cpp
+722-0llvm/test/CodeGen/X86/ctselect-i386-fp.ll
+604-5llvm/lib/Target/X86/X86InstrInfo.cpp
+428-0llvm/test/CodeGen/X86/ctselect-i386-mmx.ll
+4,374-44611 files not shown
+5,671-45117 files

LLVM/project bd1047allvm/lib/Target/AArch64 AArch64ISelLowering.cpp AArch64InstrInfo.td, llvm/test/CodeGen/AArch64 ctselect.ll

[LLVM][AArch64] Add native ct.select support for ARM64

This patch implements architecture-specific lowering for ct.select on AArch64
using CSEL (conditional select) instructions for constant-time selection.

Implementation details:
- Uses CSEL family of instructions for scalar integer types
- Uses FCSEL for floating-point types (F16, BF16, F32, F64)
- Post-RA MC lowering to convert pseudo-instructions to real CSEL/FCSEL
- Handles vector types appropriately
- Comprehensive test coverage for AArch64

The implementation includes:
- ISelLowering: Custom lowering to CTSELECT pseudo-instructions
- InstrInfo: Pseudo-instruction definitions and patterns
- MCInstLower: Post-RA lowering of pseudo-instructions to actual CSEL/FCSEL
- Proper handling of condition codes for constant-time guarantees
DeltaFile
+153-0llvm/test/CodeGen/AArch64/ctselect.ll
+56-0llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+40-0llvm/lib/Target/AArch64/AArch64InstrInfo.td
+35-4llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+18-0llvm/lib/Target/AArch64/AArch64MCInstLower.cpp
+11-0llvm/lib/Target/AArch64/AArch64ISelLowering.h
+313-46 files

LLVM/project 6091c90clang/docs LanguageExtensions.rst, clang/include/clang/Basic Builtins.td

[ConstantTime][Clang] Add __builtin_ct_select for constant-time selection
DeltaFile
+683-0clang/test/Sema/builtin-ct-select.c
+373-0clang/test/Sema/builtin-ct-select-edge-cases.c
+64-0clang/lib/Sema/SemaChecking.cpp
+44-0clang/docs/LanguageExtensions.rst
+13-0clang/lib/CodeGen/CGBuiltin.cpp
+8-0clang/include/clang/Basic/Builtins.td
+1,185-06 files

LLVM/project b49b9cbllvm/test/CodeGen/Mips ctselect-fallback-vector.ll ctselect-fallback-patterns.ll

[LLVM][MIPS] Add comprehensive tests for ct.select
DeltaFile
+830-0llvm/test/CodeGen/Mips/ctselect-fallback-vector.ll
+426-0llvm/test/CodeGen/Mips/ctselect-fallback-patterns.ll
+371-0llvm/test/CodeGen/Mips/ctselect-fallback.ll
+244-0llvm/test/CodeGen/Mips/ctselect-fallback-edge-cases.ll
+183-0llvm/test/CodeGen/Mips/ctselect-side-effects.ll
+2,054-05 files

LLVM/project 2ba35b0llvm/test/CodeGen/WebAssembly ctselect-fallback-vector.ll ctselect-fallback-patterns.ll

[ConstantTime][WebAssembly] Add comprehensive tests for ct.select
DeltaFile
+714-0llvm/test/CodeGen/WebAssembly/ctselect-fallback-vector.ll
+641-0llvm/test/CodeGen/WebAssembly/ctselect-fallback-patterns.ll
+552-0llvm/test/CodeGen/WebAssembly/ctselect-fallback.ll
+376-0llvm/test/CodeGen/WebAssembly/ctselect-fallback-edge-cases.ll
+226-0llvm/test/CodeGen/WebAssembly/ctselect-side-effects.ll
+2,509-05 files