[orc-rt] Capture a Session& in SimpleNativeMemoryMap, fix TODOs. (#187200)
SimpleNativeMemoryMap now captures a reference to the Session that it
was constructed for. This is used to fix some outstanding TODOs: using
the real page size for the process, and reporting errors that were
previously discarded.
[CodeGen] Improve `getLoadExtAction` and friends (#181104)
Alternative approach to the same goals as #162407
This takes `TargetLoweringBase::getLoadExtAction`, renames it to
`TargetLoweringBase::getLoadAction`, merges `getAtomicLoadExtAction`
into it, and adds more inputs for relevant information (alignment,
address space).
The `isLoadExtLegal[OrCustom]` helpers are also modified in a matching
manner.
This is fully backwards compatible, with the existing `setLoadExtAction`
working as before. But this allows targets to override a new hook to
allow the query to make more use of the information. The hook
`getCustomLoadAction` is called with all the parameters whenever the
table lookup yields `LegalizeAction::Custom`, and can return any other
action it wants.
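
The lookup-then-hook flow described above can be sketched as a toy model. This is Python rather than LLVM's C++, and every name in it (`ToyTarget`, `AlignmentAwareTarget`, the string type names, the alignment policy, the `LEGAL` default for unset entries) is a hypothetical stand-in, not the real `TargetLoweringBase` interface.

```python
from enum import Enum, auto


class LegalizeAction(Enum):
    LEGAL = auto()
    PROMOTE = auto()
    EXPAND = auto()
    CUSTOM = auto()


class ToyTarget:
    """Toy stand-in for a target lowering class (not LLVM's API)."""

    def __init__(self):
        # (value type, memory type, extension kind) -> action
        self._table = {}

    def set_load_ext_action(self, ext, val_vt, mem_vt, action):
        # Existing-style setter: only fills the table.
        self._table[(val_vt, mem_vt, ext)] = action

    def get_custom_load_action(self, val_vt, mem_vt, align, addr_space, ext, atomic):
        # Default hook: keep the table's answer.
        return LegalizeAction.CUSTOM

    def get_load_action(self, val_vt, mem_vt, align, addr_space, ext, atomic):
        # Assumption: entries that were never set are treated as LEGAL.
        action = self._table.get((val_vt, mem_vt, ext), LegalizeAction.LEGAL)
        if action is LegalizeAction.CUSTOM:
            # Only a Custom table entry consults the hook.
            return self.get_custom_load_action(
                val_vt, mem_vt, align, addr_space, ext, atomic)
        return action


class AlignmentAwareTarget(ToyTarget):
    def get_custom_load_action(self, val_vt, mem_vt, align, addr_space, ext, atomic):
        # Hypothetical policy: aligned loads are legal, others are expanded.
        return LegalizeAction.LEGAL if align >= 4 else LegalizeAction.EXPAND


target = AlignmentAwareTarget()
target.set_load_ext_action("sext", "i32", "i16", LegalizeAction.CUSTOM)
aligned = target.get_load_action("i32", "i16", 4, 0, "sext", False)
unaligned = target.get_load_action("i32", "i16", 1, 0, "sext", False)
```

In this sketch a target that never overrides the hook sees the same answers as before, which is one way to read the backwards-compatibility claim.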
[CodeGen] Use separate MBB number for analyses (#187086)
Block numbers are updated too frequently, which makes it difficult to
keep analyses up to date. Therefore, introduce a second number per basic
block that is used for analyses and is renumbered less often. This frees
analyses from having to provide somewhat efficient facilities for dealing
with changed block numbers, making them simpler to implement, e.g. in
LoopInfo or CycleInfo.
(Currently, "less often" means not at all, but we might want to renumber
after certain passes if the numbering gets too sparse and no analyses
are preserved anyway.)
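
The idea of keeping two numbers per block can be illustrated with a small toy model. It is Python, not the MachineFunction code, and the names `Block`, `Function`, `layout_number`, and `analysis_number` are hypothetical. As a simplification, the toy renumbers the layout on every insertion.

```python
class Block:
    def __init__(self, name, analysis_number):
        self.name = name
        self.layout_number = -1                 # rewritten by renumber_layout()
        self.analysis_number = analysis_number  # assigned once, never changed here


class Function:
    def __init__(self):
        self.blocks = []
        self._next_analysis_number = 0

    def insert_block(self, index, name):
        block = Block(name, self._next_analysis_number)
        self._next_analysis_number += 1
        self.blocks.insert(index, block)
        self.renumber_layout()
        return block

    def renumber_layout(self):
        # Layout numbers follow block order, so they shift on insertion.
        for i, block in enumerate(self.blocks):
            block.layout_number = i


fn = Function()
entry = fn.insert_block(0, "entry")
exit_ = fn.insert_block(1, "exit")

# A toy analysis result keyed by the stable number.
loop_depth = {entry.analysis_number: 0, exit_.analysis_number: 0}

# Inserting a block in the middle shifts exit_'s layout number from 1 to 2,
# but the analysis table stays valid without any fix-up.
body = fn.insert_block(1, "body")
loop_depth[body.analysis_number] = 1
```

The cost visible even in this sketch is that the stable numbers no longer follow layout order, which is why a sparse numbering might eventually call for a renumbering pass.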
When we introduced a more general use of block numbers some time ago,
using the existing numbers seemed to be a somewhat obvious choice, but I
now think that this was a bad decision, as it conflates a number that is
used for ordering with a number that should be more stable.
MachineBasicBlock isn't particularly size-optimized and there's a fair
[2 lines not shown]
[orc-rt] Publish controller interface from SimpleNativeMemoryMap ctor. (#187198)
Add named constructors to SimpleNativeMemoryMap to publish
SimpleNativeMemoryMap's controller interface when an instance is
constructed.
This supports correct setup by construction, since API clients can't
forget to publish the interface that the controller will need to
interact with the SimpleNativeMemoryMap object.
[BOLT] Enable compatibility of instrumentation-file-append-pid with instrumentation-sleep-time (#183919)
This commit enables compatibility of instrumentation-file-append-pid and
instrumentation-sleep-time options. It also requires keeping the
counters mapping between the watcher process and the instrumented binary
process in shared mode. This is useful when we instrument a shared
library that is used by several tasks running on the target system. When
we cannot wait for every task to complete, we must use the sleep-time
option. Without the append-pid option, profiles collected from different
tasks would overwrite each other at the same path, leading to unexpected
or suboptimal optimization effects.
Co-authored-by: Vasily Leonenko <vasily.leonenko at huawei.com>
[CIR] Fix reference alignment to use pointee type
getNaturalTypeAlignment on a reference type returned pointer alignment
instead of pointee alignment. Pass the pointee type with
forPointeeType=true to match traditional codegen's
getNaturalPointeeTypeAlignment behavior. Fix applies to both argument
and return type attribute construction paths.
[orc-rt] De-duplicate some test helper APIs. (#187187)
Moves noErrors, mockExecutorProcessInfo, and NoDispatcher into
CommonTestUtils.h where they can be re-used between tests.
[clang] Reshuffle compiler options in C++ DR tests
This patch changes the order of compiler options on RUN lines so that options that differ in length (like -verify with its multiple prefixes) are at the end. This way it's much easier to see what is common and what is different between RUN lines.
[mlir][x86] Lower packed type vector.contract to AMX dot-product (#182810)
A transform pass to lower `vector.contract` operation to (a)
`amx.tile_mulf` for BF16, or (b) `amx.tile_muli` for Int8 packed types.
[flang][mlir][OpenMP] Add linear modifier (val, ref, uval)
Add support for OpenMP linear modifiers `val`, `ref`, and `uval`
as defined in OpenMP 5.2 (5.4.6).
[ELF] Orphan placement: remove hasInputSections condition
https://reviews.llvm.org/D60131 (Change default output section type to
SHT_PROGBITS) caused an orphan placement regression for Fuchsia
`zircon.elf`: #40998. The orphan section `code_patch_table` was placed
before the first output section description `.text.boot0`, breaking the
address requirement.
https://reviews.llvm.org/D61197 (Fix getRankProximity to "ignore" not
live sections) fixed the regression by adding a `Live` condition (which
later became `hasInputSections`).
This condition added complexity, which turns out to be unneeded after
* 3bdc90e3ff4c9a18caeb3e6ad40fa5d15bbf9d5e ("[ELF] adjustOutputSections: update sortRank. NFC")
* 747d670baef35f0615b32652e93c97a2ff8dba18 ("[ELF] Make .interp/SHT_NOTE not special")
* #94099
The new orphan placement rule is slightly different (orphans can be
[12 lines not shown]
[OFFLOAD] Improve handling of synchronization errors in L0 plugin and reenable tests (#186927)
This change improves handling of errors during synchronization in the
Level Zero plugin by ensuring cleanup of queues and events in case of a
synchronization error. As a result, multiple tests stopped hanging.
---------
Co-authored-by: Duran, Alex <alejandro.duran at intel.com>
[DA] Rewrite formula in the Weak Zero SIV tests (#183738)
This patch rewrites the formula in the Weak Zero SIV tests to match the
one used in the Strong SIV test that was updated in #179665. In this
form, `ConstantRange` is used so we don't need to pay attention to any
corner cases such as overflow.
Also fix some test cases that were added in past PRs to represent edge
cases.
[DA] Fix overflow in symbolic RDIV test (#185805)
The symbolic RDIV test relies on computing the extremes of affine
expressions (e.g., `A1*N1` and `A2*N2`) to disprove dependencies. These
calculations were previously done using `SE->getMulExpr` and
`SE->getMinusSCEV` without guarding against signed integer overflow. If
large coefficients or loop bounds cause a wrap, `isKnownPredicate`
evaluates the wrapped values, potentially disproving a valid dependence
and leading to miscompilations.
This patch reimplements symbolicRDIVtest using `ConstantRange` to work
around overflows.
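
The failure mode and the range-based guard can be sketched with a toy model. This uses Python's unbounded integers to detect a 64-bit wrap and is not LLVM's `ConstantRange`, which represents ranges differently. The helpers `wrap_signed`, `mul_range`, and `ranges_disjoint` are hypothetical names.

```python
BITS = 64
MIN_S = -(1 << (BITS - 1))
MAX_S = (1 << (BITS - 1)) - 1


def wrap_signed(x):
    """Reduce x to a BITS-wide two's-complement signed value."""
    x &= (1 << BITS) - 1
    return x - (1 << BITS) if x > MAX_S else x


def mul_range(a, lo, hi):
    """Range of a*n for n in [lo, hi], or None if it may not fit in BITS bits."""
    ends = (a * lo, a * hi)
    low, high = min(ends), max(ends)
    if low < MIN_S or high > MAX_S:
        return None  # overflow possible: nothing can be concluded
    return (low, high)


def ranges_disjoint(r1, r2):
    """True only when both ranges are known and cannot share a value."""
    if r1 is None or r2 is None:
        return False  # conservative: assume a dependence may exist
    return r1[1] < r2[0] or r2[1] < r1[0]


# A large coefficient times a small bound wraps in 64-bit arithmetic,
# so a comparison on the wrapped value would be meaningless.
wrapped = wrap_signed((1 << 62) * 4)

# The guarded version refuses to disprove the dependence instead.
unknown = mul_range(1 << 62, 0, 4)
disproved = ranges_disjoint(unknown, (1, 10))
```

The design point is that an overflow turns into "unknown" rather than into a small, plausible-looking number, so the test can only fail towards reporting a dependence.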
---------
Signed-off-by: Ruoyu Qiu <cabbaken at outlook.com>
Co-authored-by: Ryotaro Kasuga <kasuga.ryotaro at fujitsu.com>
[NFC][PowerPC] Pre-commit to optimize bswap64 builtin for power8 (#181776)
The current codegen (for Power8 targets specifically) does not exploit
the available parallelism and performs most of the operations
sequentially. This will be optimized in a future patch that follows this
NFC PR. It should improve performance and reduce the instruction count.
---------
Co-authored-by: himadhith <himadhith.v at ibm.com>
[libclang/python] Add type annotations to the TranslationUnit class (#180876)
This adds type annotations to the `TranslationUnit` class, enough to
pass a strict typecheck. This resolves 19 strict typing errors as the
next step towards https://github.com/llvm/llvm-project/issues/76664
[mlir-python] Fix duplicate EnumAttr builder registration across dialects.
When multiple dialects share .td includes (e.g. affine includes arith),
each dialect's _*_enum_gen.py file registered attribute builders under
the same keys, causing "already registered" errors on the second import.
Two-pronged fix:
1. Add `allow_existing=True` to `register_attribute_builder` (and the
underlying C++ `registerAttributeBuilder`). When set, silently skips
registration if the key already exists (first-wins semantics). This
handles EnumInfo-based builders (e.g. `AtomicRMWKindAttr`,
`Arith_CmpFPredicateAttr`) that are emitted by every dialect whose
.td file includes the defining file.
2. Filter EnumAttr-loop builders by `-bind-dialect` in
`EnumPythonBindingGen.cpp` and register them under dialect-qualified
keys (`"dialect.AttrName"`). Update `OpPythonBindingGen.cpp` to look
up the same qualified keys for EnumAttr-typed op attributes (detected
[5 lines not shown]
[clang][Driver][Darwin] Use `xcselect` for `*-apple-darwin*` targets too (#186683)
This is a follow-up to #119670. There, we introduced a CMake option
`CLANG_USE_XCSELECT`, which, when enabled, uses `libxcselect` to find
the right SDK to inject as an `-isysroot` flag when targeting
`*-apple-macos*`.
We intentionally left out `*-apple-darwin*` targets because it broke
many tests. This is unfortunate because `*-apple-darwin*` is the default
triple when building LLVM on macOS, so one isn't able to take advantage
of `xcselect` without an explicit `-target` flag or a change to the
toolchain's default target.
We fix this in two ways.
First, we defer the injection of the `-isysroot` flag using `xcselect`
until after we are sure that we are targeting macOS. This avoids
confusing the earlier deployment target detection code when we inject
the macOS SDK but actually intend to target a non-macOS platform.
[3 lines not shown]