[mlir][arith] `arith-to-apfloat`: Add vector support (#171024)
Add support for vectorized operations such as `arith.addf ... :
vector<4xf4E2M1FN>`. The computation is scalarized: scalar operands are
extracted with `vector.to_elements`, multiple scalar computations are
performed and the result is inserted back into a vector with
`vector.from_elements`.
[SPIRV] Use AMDGPU ABI for AMDGCN flavoured SPIRV (#169865)
At the moment AMDGCN flavoured SPIRV uses the SPIRV ABI with some tweaks
revolving around passing aggregates as direct. This is problematic in
multiple ways:
- it leads to divergence from code compiled for a concrete target, which
makes it difficult to debug;
- it incurs a run time cost, when dealing with larger aggregates;
- it incurs a compile time cost, when dealing with larger aggregates.
This patch switches over AMDGCN flavoured SPIRV to implement the AMDGPU
ABI (except for dealing with variadic functions, which will be added in
the future). One additional complication (and the primary motivation
behind the current less than ideal state of affairs) stems from `byref`,
which AMDGPU uses, not being expressible in SPIR-V. We deal with this by
CodeGen-ing for `byref`, lowering it to the `FuncParamAttr ByVal` in
SPIR-V, and restoring it when doing reverse translation from AMDGCN
flavoured SPIR-V.
[MLIR][ExecutionEngine] Enable PIC option (#170995)
This PR enables the MLIR execution engine to dump object file as PIC
code, which is needed when the object file is later bundled into a dynamic
shared library.
---------
Co-authored-by: Mehdi Amini <joker.eph at gmail.com>
[VPlan] Replace ExtractLast(Elem|LanePerPart) with ExtractLast(Lane/Part) (#164124)
Replace ExtractLastElement and ExtractLastLanePerPart with more generic
and specific ExtractLastLane and ExtractLastPart, which model distinct
parts of extracting across parts and lanes. ExtractLastElement ==
ExtractLastLane(ExtractLastPart) and ExtractLastLanePerPart ==
ExtractLastLane, the latter clarifying the name of the opcode. A new
m_ExtractLastElement matcher is provided for convenience.
The patch should be NFC modulo printing changes.
PR: https://github.com/llvm/llvm-project/pull/164124
[Clang]: Support opt-in speculative devirtualization (#159685)
This patch adds Clang support for speculative devirtualization and
integrates the related pass into the pass pipeline.
It's building on the LLVM backend implementation from PR #159048.
Speculative devirtualization transforms an indirect call (the virtual
function) to a guarded direct call.
It is guarded by a comparison of the virtual function pointer to the
expected target.
This optimization is still safe without LTO because it doesn't do direct
calls, it's conditional according to the function ptr.
This optimization:
- Opt-in: Disabled by default, enabled via `-fdevirtualize-speculatively`
- Works in non-LTO mode
- Handles publicly-visible objects.
- Uses guarded devirtualization with fallback to indirect calls when the
speculation is incorrect.
For this C++ example:
[50 lines not shown]
[clang-tidy] Fix fragile test in `read-parameters-from-file` (#171033)
[CommandLine.cpp](https://github.com/llvm/llvm-project/blob/fb0400fe1f1f9e83f3148db8ce2c72ab5bc6728e/llvm/lib/Support/CommandLine.cpp#L940)
treats single quote as literal characters on Windows, so the argument is
parsed as a check named `' -*,llvm-namespace-comment '`, which matches
no existing checks, so no checks are enabled via the command line.
Previously, the test passed because it fell back to the root
`.clang-tidy` configuration which enables `llvm-*`.
[libc++] Allows any types of size 4 and 8 to use native platform ulock_wait (#161086)
This is to address #146145
The issue before was that, for `std::atomic::wait/notify`, we only
support `uint64_t` to go through the native `ulock_wait` directly. Any
other types will go through the global contention table's `atomic`,
increasing the chances of spurious wakeup. This PR tries to allow any
types that are of size 4 or 8 to directly go to the `ulock_wait`.
This PR is just proof of concept. If we like this idea, I can go further
to update the Linux/FreeBSD branch and add ABI macros so the existing
behaviours are reserved under the stable ABI
Here are some benchmark results
```
Benchmark Time CPU Time Old Time New CPU Old CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------
[48 lines not shown]
[mlir][emitc] Fix bug in dereference translation (#171028)
The op was not added to `hasDeferredEmission()` when introduced by
f17abc280c70, causing incorrect translation.
[mlir][emitc] Simplify inlining logic (NFCI) (#169978)
This change makes inlining logic in the translator simpler and more
consistent by
(a) Extending the inlining concept to include CExpression ops, which by
definition are inlined if and only if they reside within an
ExpressionOp.
(b) Concentraing all inlining decisions in `shouldBeInlined()` to make
sure that ops get the same decision when queried as operations and
as operands.
[mlir] Remove deprecated GEN_PASS_CLASSES. (#166904)
This was marked as deprecated in 2022, but as comment. Switch to error
to make visible and stop generating. Will remove the error message in
follow up, just felt this was easier for folks to understand compilation
errors. The change required to new form is rather minimal.
[LLVM][X86] Add f80 support for ct.select
Add special handling for x86_fp80 types in CTSELECT lowering by splitting
them into three 32-bit chunks, performing constant-time selection on each
chunk, and reassembling the result. This fixes crashes when compiling
tests with f80 types.
Also updated ctselect.ll to match current generic fallback implementation.
[LLVM][X86] Add native ct.select support for X86 and i386
Add native X86 implementation with CMOV instructions and comprehensive tests:
- X86 ISelLowering with CMOV for x86_64 and i386
- Fallback bitwise operations for i386 targets without CMOV
- Post-RA expansion for pseudo-instructions
- Comprehensive test coverage:
- Edge cases (zero conditions, large integers)
- i386-specific tests (FP, MMX, non-CMOV fallback)
- Vector operations
- Optimization patterns
The basic test demonstrating fallback is in the core infrastructure PR.
[LLVM][AArch64] Add native ct.select support for ARM64
This patch implements architecture-specific lowering for ct.select on AArch64
using CSEL (conditional select) instructions for constant-time selection.
Implementation details:
- Uses CSEL family of instructions for scalar integer types
- Uses FCSEL for floating-point types (F16, BF16, F32, F64)
- Post-RA MC lowering to convert pseudo-instructions to real CSEL/FCSEL
- Handles vector types appropriately
- Comprehensive test coverage for AArch64
The implementation includes:
- ISelLowering: Custom lowering to CTSELECT pseudo-instructions
- InstrInfo: Pseudo-instruction definitions and patterns
- MCInstLower: Post-RA lowering of pseudo-instructions to actual CSEL/FCSEL
- Proper handling of condition codes for constant-time guarantees