TrimmedMlirOptLib: Remove IRDLDialect dependency
Add conditional compilation guards to MlirOptLib to allow building
without the IRDLDialect dependency.
- Add #ifndef MLIR_MLIROPTLIB_NO_IRDL guards around IRDL includes
- Guard --irdl-file command line option
- Guard loadIRDLDialects() function
- Guard calls to loadIRDLDialects() in doVerifyRoundTrip and processBuffer
- Add TrimmedMlirOptLib bazel target with local_defines
Dependency path removed:
aster-tools → aster-opt → MlirOptLib → IRDLDialect
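The guard pattern listed above can be sketched as follows. This is a minimal illustration, not the actual MlirOptLib source: `processBufferSketch` and the body of `loadIRDLDialects` are hypothetical stand-ins, and the sketch assumes the `TrimmedMlirOptLib` target passes `MLIR_MLIROPTLIB_NO_IRDL` through `local_defines`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not the real MlirOptLib code.
// Building with -DMLIR_MLIROPTLIB_NO_IRDL compiles the IRDL path out entirely.

#ifndef MLIR_MLIROPTLIB_NO_IRDL
// In the real library, the IRDL headers would be included inside this guard.
static bool loadIRDLDialects(const std::string &irdlFile,
                             std::vector<std::string> &log) {
  if (irdlFile.empty())
    return true; // nothing requested
  log.push_back("loaded IRDL dialects from " + irdlFile);
  return true;
}
#endif

bool processBufferSketch(const std::string &irdlFile,
                         std::vector<std::string> &log) {
#ifndef MLIR_MLIROPTLIB_NO_IRDL
  // The call site is guarded too, so no IRDL symbol is referenced when trimmed.
  if (!loadIRDLDialects(irdlFile, log))
    return false;
#else
  (void)irdlFile; // unused in the trimmed build
#endif
  log.push_back("processed buffer");
  return true;
}
```

Guarding both the definition and every call site is what lets the trimmed target link without the IRDL dependency.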
[clang-tidy] Don't cache classes by name in `fuchsia-multiple-inheritance` (#171016)
Context: for every class, this check needs to compute whether that class
is an interface (i.e. only has pure virtual methods). This is expensive,
so the check caches the computation. But it caches by class name, which
is problematic, because the same name can refer to different classes at
different scopes. For example, it causes this false negative:
https://godbolt.org/z/bMGc5sYqh. This PR changes it to cache by
`CXXRecordDecl *` instead.
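The difference between the two cache keys can be sketched without Clang. `RecordDecl` below is a hypothetical stand-in for `CXXRecordDecl`, and the two helper functions are illustrative, not the check's real code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for clang's CXXRecordDecl; only what the sketch needs.
struct RecordDecl {
  std::string name;
  bool onlyPureVirtual; // pretend this is the expensive computation's result
};

// Cache keyed by name: distinct classes that share a name collide.
bool isInterfaceByName(const RecordDecl *d,
                       std::map<std::string, bool> &cache) {
  auto it = cache.find(d->name);
  if (it != cache.end())
    return it->second; // may be a stale answer for a different class
  return cache[d->name] = d->onlyPureVirtual;
}

// Cache keyed by declaration pointer: each class gets its own entry.
bool isInterfaceByDecl(const RecordDecl *d,
                       std::map<const RecordDecl *, bool> &cache) {
  auto it = cache.find(d);
  if (it != cache.end())
    return it->second;
  return cache[d] = d->onlyPureVirtual;
}
```

With two classes named `Base` in different scopes, one an interface and one not, the name-keyed cache returns the first answer for both, while the pointer-keyed cache keeps them apart.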
[SPIRV] Add support for pointers to functions with aggregate args/returns as global variables / constant initialisers (#169595)
This patch does two things:
1. it extends the aggregate arg / ret replacement transform to work on
indirect calls / pointers to function. It is somewhat spread out as
retrieving the original function type is needed in a few places. In
general, we should rethink / rework the entire infrastructure around
aggregate arg/ret handling, using an opaque target specific type rather
than i32;
2. it enables global variables of pointer to function type, and, more
specifically, global variables of an aggregate type (arrays / structures)
with pointer to function elements.
This also exposes some issues in how we handle pointers to function and
lowering indirect function calls, primarily around not using the program
address space. These will be handled in a subsequent patch as they'll
require somewhat more intrusive surgery, possibly involving modifying
the data layout.
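As an illustration of the source shape the two points describe, here is a small host-side C++ sketch: functions that take and return an aggregate, referenced from global arrays and structures of function pointers. All names (`Vec2`, `kOps`, `kTable`, `apply`) are hypothetical, and the sketch makes no claim about how this particular code lowers to SPIR-V.

```cpp
#include <cassert>

// Aggregate passed and returned by value.
struct Vec2 {
  int x;
  int y;
};

Vec2 addVec(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }
Vec2 subVec(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }

using BinOp = Vec2 (*)(Vec2, Vec2);

// Global array whose elements are pointers to functions.
BinOp kOps[] = {&addVec, &subVec};

// Global structure with pointer-to-function members.
struct OpTable {
  BinOp first;
  BinOp second;
};
OpTable kTable = {&addVec, &subVec};

// Indirect call through the global table.
Vec2 apply(unsigned i, Vec2 a, Vec2 b) { return kOps[i % 2](a, b); }
```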
[mlir][arith] `arith-to-apfloat`: Add vector support (#171024)
Add support for vectorized operations such as `arith.addf ... :
vector<4xf4E2M1FN>`. The computation is scalarized: scalar operands are
extracted with `vector.to_elements`, multiple scalar computations are
performed and the result is inserted back into a vector with
`vector.from_elements`.
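The scalarization strategy can be mimicked in plain C++. This is only an analogy under stated assumptions: `scalarAdd` is a hypothetical stand-in for the per-element scalar computation, `float` replaces the narrow float type, and `std::array` plays the role of the vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar kernel standing in for the per-element computation.
float scalarAdd(float a, float b) { return a + b; }

// Scalarized elementwise add: unpack, compute per element, repack.
template <std::size_t N>
std::array<float, N> scalarizedAdd(const std::array<float, N> &lhs,
                                   const std::array<float, N> &rhs) {
  std::array<float, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    float a = lhs[i];            // analogous to vector.to_elements
    float b = rhs[i];
    result[i] = scalarAdd(a, b); // collected as with vector.from_elements
  }
  return result;
}
```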
[SPIRV] Use AMDGPU ABI for AMDGCN flavoured SPIRV (#169865)
At the moment AMDGCN flavoured SPIRV uses the SPIRV ABI with some tweaks
revolving around passing aggregates as direct. This is problematic in
multiple ways:
- it leads to divergence from code compiled for a concrete target, which
makes it difficult to debug;
- it incurs a run time cost, when dealing with larger aggregates;
- it incurs a compile time cost, when dealing with larger aggregates.
This patch switches over AMDGCN flavoured SPIRV to implement the AMDGPU
ABI (except for dealing with variadic functions, which will be added in
the future). One additional complication (and the primary motivation
behind the current less than ideal state of affairs) stems from `byref`,
which AMDGPU uses, not being expressible in SPIR-V. We deal with this by
CodeGen-ing for `byref`, lowering it to the `FuncParamAttr ByVal` in
SPIR-V, and restoring it when doing reverse translation from AMDGCN
flavoured SPIR-V.
[MLIR][ExecutionEngine] Enable PIC option (#170995)
This PR enables the MLIR execution engine to dump the object file as PIC
code, which is needed when the object file is later bundled into a dynamic
shared library.
---------
Co-authored-by: Mehdi Amini <joker.eph at gmail.com>
TrimmedMemRefTransforms: Remove BufferizationDialect and NVGPUDialect deps
Exclude AllocationOpInterfaceImpl.cpp and BufferViewFlowOpInterfaceImpl.cpp
(which use BufferizationDialect), and remove NVGPUDialect dep since
ExtractAddressComputations.cpp and FoldMemRefAliasOps.cpp are already excluded.
[VPlan] Replace ExtractLast(Elem|LanePerPart) with ExtractLast(Lane/Part) (#164124)
Replace ExtractLastElement and ExtractLastLanePerPart with more generic
and specific ExtractLastLane and ExtractLastPart, which model distinct
parts of extracting across parts and lanes. ExtractLastElement ==
ExtractLastLane(ExtractLastPart) and ExtractLastLanePerPart ==
ExtractLastLane, the latter clarifying the name of the opcode. A new
m_ExtractLastElement matcher is provided for convenience.
The patch should be NFC modulo printing changes.
PR: https://github.com/llvm/llvm-project/pull/164124
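The identity ExtractLastElement == ExtractLastLane(ExtractLastPart) can be modelled with nested containers. This is a toy model, not VPlan code: a value is assumed to be a list of parts, each part a list of lanes, and the function names merely echo the opcodes.

```cpp
#include <cassert>
#include <vector>

// Toy model: one unrolled part is a list of lanes.
using Part = std::vector<int>;

// Selects the last part of a value that spans several parts.
const Part &extractLastPart(const std::vector<Part> &parts) {
  assert(!parts.empty());
  return parts.back();
}

// Selects the last lane of a single part.
int extractLastLane(const Part &part) {
  assert(!part.empty());
  return part.back();
}

// The old combined operation, expressed as the composition of the two.
int extractLastElement(const std::vector<Part> &parts) {
  return extractLastLane(extractLastPart(parts));
}
```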
[Clang]: Support opt-in speculative devirtualization (#159685)
This patch adds Clang support for speculative devirtualization and
integrates the related pass into the pass pipeline.
It builds on the LLVM backend implementation from PR #159048.
Speculative devirtualization transforms an indirect call (the virtual
function) to a guarded direct call.
It is guarded by a comparison of the virtual function pointer to the
expected target.
This optimization remains safe without LTO because the direct call is
never unconditional: it is taken only when the function pointer matches
the expected target.
This optimization:
- Is opt-in: disabled by default, enabled via `-fdevirtualize-speculatively`.
- Works in non-LTO mode.
- Handles publicly-visible objects.
- Uses guarded devirtualization with fallback to indirect calls when the
speculation is incorrect.
For this C++ example:
[50 lines not shown]
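Separately from the elided example above, the shape of a guarded call can be sketched by hand. This is a source-level analogy only: the pass works on LLVM IR, whereas this sketch uses an explicit function pointer in place of a vtable slot, and every name in it is hypothetical.

```cpp
#include <cassert>

struct Shape;
using AreaFn = int (*)(const Shape *);

// Explicit function pointer standing in for a vtable entry.
struct Shape {
  AreaFn area;
  int w;
  int h;
};

int rectArea(const Shape *s) { return s->w * s->h; }
int triArea(const Shape *s) { return s->w * s->h / 2; }

// Plain indirect call.
int callIndirect(const Shape *s) { return s->area(s); }

// Guarded direct call: compare the pointer against the speculated target.
int callSpeculative(const Shape *s) {
  if (s->area == &rectArea)
    return rectArea(s); // direct call, a candidate for inlining
  return s->area(s);    // fallback when the speculation is wrong
}
```

Both paths return the same value for every input, which is why the guard keeps the transformation safe even when other targets exist elsewhere in the program.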
[clang-tidy] Fix fragile test in `read-parameters-from-file` (#171033)
[CommandLine.cpp](https://github.com/llvm/llvm-project/blob/fb0400fe1f1f9e83f3148db8ce2c72ab5bc6728e/llvm/lib/Support/CommandLine.cpp#L940)
treats single quotes as literal characters on Windows, so the argument is
parsed as a check named `' -*,llvm-namespace-comment '`, which matches
no existing checks, so no checks are enabled via the command line.
Previously, the test passed because it fell back to the root
`.clang-tidy` configuration which enables `llvm-*`.
[libc++] Allows any types of size 4 and 8 to use native platform ulock_wait (#161086)
This addresses #146145.
The issue before was that, for `std::atomic::wait/notify`, only
`uint64_t` went through the native `ulock_wait` directly. Any other
type went through the global contention table's `atomic`, increasing
the chance of spurious wakeups. This PR tries to allow any type of
size 4 or 8 to go directly to `ulock_wait`.
This PR is just a proof of concept. If we like this idea, I can go further
and update the Linux/FreeBSD branch and add ABI macros so the existing
behaviours are preserved under the stable ABI.
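The size-based selection described above can be sketched as a compile-time dispatch. This is a simplified illustration, not libc++ code: `WaitPath` and `selectWaitPath` are hypothetical names, and the real implementation may impose further conditions.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical labels for the two waiting strategies.
enum class WaitPath { Native, ContentionTable };

// Types of size 4 or 8 take the native path; everything else falls back
// to the shared contention table.
template <class T>
constexpr WaitPath selectWaitPath() {
  static_assert(std::is_trivially_copyable<T>::value,
                "sketch assumes a trivially copyable value type");
  return (sizeof(T) == 4 || sizeof(T) == 8) ? WaitPath::Native
                                            : WaitPath::ContentionTable;
}
```

Before the change, the equivalent predicate would accept only `uint64_t`, so a 4-byte type such as `std::int32_t` would have taken the contention-table path.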
Here are some benchmark results
```
Benchmark Time CPU Time Old Time New CPU Old CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------
[48 lines not shown]
[mlir][emitc] Fix bug in dereference translation (#171028)
The op was not added to `hasDeferredEmission()` when introduced by
f17abc280c70, causing incorrect translation.