[Offload][OpenMP][libdevice] Make check to enter state machine architecture dependent (#188144)
The genericStateMachine call uses synchronize::thread, which is expected
to be implemented using a workgroup-level barrier.
Currently, because on some architectures a race condition can occur if
threads in the same warp as the main thread reach the barrier, there is a
condition that prevents those threads from entering the state machine.
But in Intel GPUs all threads must reach the barrier for it to be
completed, otherwise the threads in the state machine never make
progress.
This PR moves the condition into an architecture-dependent config so it
can work correctly for both kinds of hardware.
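A minimal sketch of the idea, assuming a per-architecture constexpr predicate. `Arch`, `warpPeersMayEnterStateMachine`, and `entersStateMachine` are hypothetical names, not the DeviceRTL's actual config interface. On architectures where warp peers of the main thread could race at the barrier, they are kept out of the state machine; on Intel GPUs every thread must arrive at the barrier, so none are excluded.

```cpp
#include <cstdint>

// Hypothetical architecture tags used only for this sketch.
enum class Arch { NVPTX, AMDGPU, IntelGPU };

// Hypothetical per-architecture config: may threads that share a warp with
// the main thread enter the generic state machine (and hence the barrier)?
constexpr bool warpPeersMayEnterStateMachine(Arch A) {
  // Intel GPUs complete a workgroup barrier only when every thread arrives,
  // so no thread can be held back from it.
  return A == Arch::IntelGPU;
}

// Decide whether thread `Tid` waits in the state machine for parallel work.
constexpr bool entersStateMachine(Arch A, std::uint32_t Tid,
                                  std::uint32_t MainTid,
                                  std::uint32_t WarpSize) {
  if (Tid == MainTid)
    return false; // the main thread executes the sequential target region
  bool SameWarpAsMain = Tid / WarpSize == MainTid / WarpSize;
  return !SameWarpAsMain || warpPeersMayEnterStateMachine(A);
}
```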
[flang][mlir][OpenMP] Add linear modifier (val, ref, uval) (#187142)
Add support for OpenMP linear modifiers `val`, `ref`, and `uval` as
defined in OpenMP 5.2 (5.4.6).
[acc] Lower acc if with multi-block host fallback via scf.execute_region (#188350)
Handle multi-block host fallback regions by wrapping them in
scf.execute_region, instead of rejecting with `not yet implemented:
region with multiple blocks`.
InstCombine: Fold out nanless canonicalize pattern (#172998)
Pattern match a wrapper around llvm.canonicalize which
weakens the semantics to not require quieting signaling
nans. Depending on the denormal mode and FP type, we can
either drop the pattern entirely or reduce it only to
a canonicalize call. I'm inventing this pattern to deal
with LLVM's lax canonicalization model in math library
code.
The math library code currently has explicit checks for
the denormal mode, and conditionally canonicalizes the
result if there is flushing. Semantically, this could be
directly replaced with a simple call to llvm.canonicalize,
but doing so would incur an additional cost when using
standard IEEE behavior. If we do not care about quieting
a signaling nan, this should be a no-op unless the denormal
mode may flush. This will allow replacement of the
conditional code with a zero-cost abstraction utility
[16 lines not shown]
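The fold decision described above can be modeled as a small function of the denormal mode that applies to the value's FP type. This is a sketch under that simplification: `DenormalKind`, `NanlessCanonicalizeFold`, and `foldFor` are illustrative names, not InstCombine code, and the real transform also accounts for the FP type itself.

```cpp
// Hypothetical names for this sketch; LLVM itself models this with DenormalMode.
enum class DenormalKind { IEEE, PreserveSign, PositiveZero, Dynamic };
enum class NanlessCanonicalizeFold { DropPattern, ReduceToCanonicalize };

// Only IEEE mode is known never to flush denormals; Dynamic might flush.
constexpr bool mayFlush(DenormalKind K) { return K != DenormalKind::IEEE; }

// For standard IEEE formats, once signaling-NaN quieting is not required,
// canonicalization can only change a value by flushing a denormal, so the
// pattern is a no-op unless flushing is possible.
constexpr NanlessCanonicalizeFold foldFor(DenormalKind ModeForType) {
  return mayFlush(ModeForType) ? NanlessCanonicalizeFold::ReduceToCanonicalize
                               : NanlessCanonicalizeFold::DropPattern;
}
```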
[DSE] Use CycleInfo instead of LoopInfo (#188253)
DSE needs to reason about cycles in order to correctly handle
loop-carried dependencies. It currently does this by using LoopInfo and
performing a separate check for irreducible control flow.
Instead, we can use CycleInfo, which is like LoopInfo but also handles
irreducible cycles.
This requires computing CycleInfo (which, unlike LoopInfo, won't be
reused by surrounding passes), but ends up being neutral in terms of
compile-time overall.
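To show why natural-loop detection alone misses some cycles, the sketch below finds every cycle in a CFG as a nontrivial strongly connected component using Tarjan's algorithm. This is an illustration only: LLVM's CycleInfo additionally builds a cycle nesting tree, and `findCycles` is a hypothetical helper. In the test graph, blocks 1 and 2 form an irreducible cycle because block 0 can enter it at either block, so no single header dominates it and LoopInfo would not report it as a loop.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Returns the cycles of a CFG given as successor lists: every SCC with more
// than one block, plus single blocks with a self-edge. Irreducible cycles
// (several entry blocks) are found just like reducible ones.
std::vector<std::vector<int>>
findCycles(const std::vector<std::vector<int>> &Succs) {
  const int N = static_cast<int>(Succs.size());
  std::vector<int> Index(N, -1), Low(N, 0), Stack;
  std::vector<bool> OnStack(N, false);
  std::vector<std::vector<int>> Result;
  int Counter = 0;

  std::function<void(int)> Visit = [&](int V) {
    Index[V] = Low[V] = Counter++;
    Stack.push_back(V);
    OnStack[V] = true;
    for (int W : Succs[V]) {
      if (Index[W] == -1) { // tree edge: recurse
        Visit(W);
        Low[V] = std::min(Low[V], Low[W]);
      } else if (OnStack[W]) { // back edge into the current SCC
        Low[V] = std::min(Low[V], Index[W]);
      }
    }
    if (Low[V] != Index[V])
      return; // V is not the root of its SCC
    std::vector<int> SCC;
    int W;
    do {
      W = Stack.back();
      Stack.pop_back();
      OnStack[W] = false;
      SCC.push_back(W);
    } while (W != V);
    bool SelfLoop =
        std::find(Succs[V].begin(), Succs[V].end(), V) != Succs[V].end();
    if (SCC.size() > 1 || SelfLoop) {
      std::sort(SCC.begin(), SCC.end());
      Result.push_back(SCC);
    }
  };

  for (int V = 0; V < N; ++V)
    if (Index[V] == -1)
      Visit(V);
  return Result;
}
```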
[lldb] Clear up GetModuleSpecifications return value confusion (#188276)
Some plugins were returning the number of specifications they have
added, while others were returning the total final number. Particularly
devious plugins (Minidump) were clearing the specification list
altogether. This resulted in nondeterministic failures (depending on
plugin initialization order) in TestSBModule.
This PR defines the problem away by having each plugin only return the
specifications it is responsible for. If the caller wants to merge them,
it is free to do so. This *might* be slightly less efficient, but this is
hardly hot code.
I'm not touching the ObjectFile::GetModuleSpecifications function (the
caller of all these functions) as the PR is big enough, although the
same approach might be warranted there as well.
Fixes https://github.com/llvm/llvm-project/issues/178625.
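A sketch of the new contract, with `ModuleSpec`, `ModuleSpecList`, `GetSpecsFn`, and `collectSpecs` as simplified stand-ins rather than lldb's real types: each plugin returns only its own specifications and the caller concatenates them, so the merged result no longer depends on the order in which plugins run.

```cpp
#include <functional>
#include <string>
#include <vector>

// Simplified stand-ins for lldb's ModuleSpec / ModuleSpecList.
using ModuleSpec = std::string;
using ModuleSpecList = std::vector<ModuleSpec>;

// Under the new contract a plugin returns only the specs it recognizes;
// it never sees, clears, or counts another plugin's entries.
using GetSpecsFn = std::function<ModuleSpecList(const std::string &File)>;

// The caller is responsible for merging the per-plugin results.
ModuleSpecList collectSpecs(const std::vector<GetSpecsFn> &Plugins,
                            const std::string &File) {
  ModuleSpecList All;
  for (const GetSpecsFn &Plugin : Plugins) {
    ModuleSpecList Mine = Plugin(File);
    All.insert(All.end(), Mine.begin(), Mine.end());
  }
  return All;
}
```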
[libc] implement fflush(NULL) support (#188217)
Implement support for flushing all open streams when fflush is called
with a NULL pointer.
* Added a global linked list to track all open File objects.
* Updated File class to include prev/next pointers and list management
methods.
* Implemented POSIX requirement for fflush to sync seekable input
streams back to the host environment.
* Updated Linux-specific file creation to register new files in the
global list.
* Fixed a memory safety bug in create_file_from_fd using delete instead
of free.
* Added unit test for fflush(NULL).
* Added explanatory comments to fflush.cpp and file.cpp.
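A minimal sketch of the bookkeeping, assuming an intrusive doubly linked list of open streams. `File`, `register_file`, `unregister_file`, and `flush_all` are hypothetical names rather than LLVM libc's internals; a real implementation must also lock the list and handle syncing seekable input streams.

```cpp
// Hypothetical stream object carrying its own list links.
struct File {
  File *prev = nullptr;
  File *next = nullptr;
  int pending = 0; // bytes buffered but not yet written
  int flushed = 0; // bytes written out so far

  int flush() {
    flushed += pending;
    pending = 0;
    return 0; // 0 on success, like fflush
  }
};

File *open_files = nullptr; // head of the global list of open streams

// Called when a stream is created: push it onto the front of the list.
void register_file(File *f) {
  f->prev = nullptr;
  f->next = open_files;
  if (open_files)
    open_files->prev = f;
  open_files = f;
}

// Called when a stream is closed: unlink it in O(1).
void unregister_file(File *f) {
  if (f->prev)
    f->prev->next = f->next;
  else
    open_files = f->next;
  if (f->next)
    f->next->prev = f->prev;
  f->prev = f->next = nullptr;
}

// fflush(NULL): flush every open stream; report failure if any flush fails.
int flush_all() {
  int result = 0;
  for (File *f = open_files; f; f = f->next)
    if (f->flush() != 0)
      result = -1; // EOF
  return result;
}
```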
[MLIR][XeGPU] Validate DPAS operand types against uArch in XeGPUToXeVM conversion (#185081)
The `DpasOp` would crash with `llvm_unreachable` on unsupported types
(such as i16, or i32 as an operand) during lowering to the XeVM dialect.
This happens in both `encodePrecision` and `getNumOperandsPerDword`.
Per
https://github.com/llvm/llvm-project/issues/180107#issuecomment-4009160113,
we handle this in the `matchAndRewrite` by retrieving the uArch instance
and fetching the registered `SubgroupMatrixMultiplyAcc` instruction.
Then, we check `aTy`, `bTy`, and `resultType` against `getSupportedTypes`,
using `notifyMatchError` to report unsupported types so that the
conversion fails gracefully.
We add a failing-conversion test based on a simplified version of the
reproducer in #180107.
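A simplified model of the validation step, assuming the instruction exposes a list of supported element types per operand. `ElemTy`, `MmaInstr`, and `validateDpas` are hypothetical, and the supported-type lists in any example are illustrative rather than a real uArch's; the actual pattern queries the registered `SubgroupMatrixMultiplyAcc` instruction and reports through the rewriter. Returning a diagnostic instead of reaching `llvm_unreachable` is what lets the conversion fail cleanly.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Hypothetical element types for this sketch.
enum class ElemTy { F16, BF16, F32, I8, U8, I16, I32 };

// Hypothetical summary of what the matrix instruction accepts per operand.
struct MmaInstr {
  std::vector<ElemTy> supportedA, supportedB, supportedResult;
};

// Returns std::nullopt if every type is supported, otherwise a diagnostic
// that the caller can turn into a match failure.
std::optional<std::string> validateDpas(const MmaInstr &I, ElemTy A, ElemTy B,
                                        ElemTy Result) {
  auto Contains = [](const std::vector<ElemTy> &Set, ElemTy T) {
    return std::find(Set.begin(), Set.end(), T) != Set.end();
  };
  if (!Contains(I.supportedA, A))
    return "unsupported A operand type";
  if (!Contains(I.supportedB, B))
    return "unsupported B operand type";
  if (!Contains(I.supportedResult, Result))
    return "unsupported result type";
  return std::nullopt;
}
```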
InstCombine: Fold out nanless canonicalize pattern
Pattern match a wrapper around llvm.canonicalize which
weakens the semantics to not require quieting signaling
nans. Depending on the denormal mode and FP type, we can
either drop the pattern entirely or reduce it only to
a canonicalize call. I'm inventing this pattern to deal
with LLVM's lax canonicalization model in math library
code.
The math library code currently has explicit checks for
the denormal mode, and conditionally canonicalizes the
result if there is flushing. Semantically, this could be
directly replaced with a simple call to llvm.canonicalize,
but doing so would incur an additional cost when using
standard IEEE behavior. If we do not care about quieting
a signaling nan, this should be a no-op unless the denormal
mode may flush. This will allow replacement of the
conditional code with a zero-cost abstraction utility
[17 lines not shown]
[CycleInfo] Preserve if CFG preserved (#188443)
Add a custom invalidator that makes sure CycleAnalysis is preserved if
the CFGAnalyses set is preserved. The implementation matches that of
LoopInfo.
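The invalidation rule can be sketched with a toy model of the preserved-analyses set. `Preserved` and `cycleInfoInvalidated` are stand-ins, not LLVM's `PreservedAnalyses` API. The logic follows the LoopInfo-style rule: the cached result survives if it was preserved explicitly, if all function analyses were preserved, or if the CFG analyses set was preserved, since cycles depend only on the CFG.

```cpp
#include <set>
#include <string>

// Toy model of what a pass reports as preserved.
struct Preserved {
  std::set<std::string> analyses; // individually preserved analyses
  std::set<std::string> sets;     // preserved analysis sets, e.g. "CFGAnalyses"
};

// Returns true if a cached cycle analysis result must be recomputed.
bool cycleInfoInvalidated(const Preserved &PA) {
  bool Kept = PA.analyses.count("CycleAnalysis") ||
              PA.sets.count("AllAnalysesOnFunction") ||
              PA.sets.count("CFGAnalyses");
  return !Kept;
}
```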
[MLIR] [Python] More improvements to type annotations (#188468)
* `mlir.ir` now exports `_OperationBase`. It is handy to use when both
`Operation` and `OpView` are accepted.
* Added type arguments where they were missing, e.g.
`list[ir.Attribute]` instead of just `list`.
* Changed `OpView.build_generic` and `OpView.parse` to return `Self`
instead of the supertype `Type`.
* Changed the bindings generator to emit a parameterized `OpResult` when
the exact type is available.
[CodeGen] Fix incorrect rematerialization order in rematerializer
When rematerializing DAGs of registers in which multiple paths exist
between some registers of the DAG, the rematerializer may compute an
incorrect rematerialization order that does not ensure a register's
dependencies are rematerialized before the register itself, an invariant
that is otherwise required.
This fixes that by using simpler recursive logic to determine a correct
rematerialization order that honors this invariant. A minimal unit test
that fails on the current implementation is added.
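A sketch of a dependency-first order, assuming the registers form a DAG described by a `Deps` map (a hypothetical representation, not the rematerializer's data structures). A post-order depth-first walk emits each register only after all of its dependencies, and the visited set keeps a register that is reachable along several paths from being emitted twice. In a diamond where register 0 depends on 1 and 2, both of which depend on 3, the walk yields 3, 1, 2, 0.

```cpp
#include <functional>
#include <map>
#include <set>
#include <vector>

// Deps[R] lists the registers that R's defining instruction reads and that
// must be rematerialized before R. Assumes the dependency graph is acyclic.
std::vector<int> rematOrder(int Root,
                            const std::map<int, std::vector<int>> &Deps) {
  std::vector<int> Order;
  std::set<int> Visited;
  std::function<void(int)> Visit = [&](int R) {
    if (!Visited.insert(R).second)
      return; // already scheduled via another path
    auto It = Deps.find(R);
    if (It != Deps.end())
      for (int D : It->second)
        Visit(D); // schedule every dependency first
    Order.push_back(R); // post-order: R comes after all it depends on
  };
  Visit(Root);
  return Order;
}
```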
[lldb][ADT] Fix LLDB/GDB formatters for PointerUnion after refactoring (#188483)
In #188242, we replaced `PointerUnion`'s `PointerIntPair` storage with
`PunnedPointer<void*>`. The old formatters relied on the PIP synthetic
provider (LLDB) and the `get_pointer_int_pair` helper (GDB), which no longer
work.
Instead, read raw bytes from `PunnedPointer` and compute the active tag
from template argument type alignments -- the same fixed-width encoding
the C++ implementation uses. When template arg enumeration is truncated
(e.g., function-local types in GDB), the formatters fall back to showing
a tag-stripped `void*` instead of silently misdecoding.
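The decoding step can be sketched under an explicit layout assumption: the tag is `ceil(log2(N))` bits wide and sits in the highest of the low bits that every member type's alignment leaves free, which is where the old `PointerIntPair`-based storage placed it. `decodePointerUnion` and `Decoded` are hypothetical, and `llvm/ADT/PointerUnion.h` is the authority on the actual encoding. If the alignments cannot hold the tag, the sketch reports the value as undecodable instead of guessing.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Decoded {
  bool Valid;            // false if the tag cannot be located
  unsigned Tag;          // index of the active member type
  std::uint64_t Pointer; // pointer value with the tag bits cleared
};

// ceil(log2(N)): bits needed to tell N member types apart.
unsigned bitsFor(unsigned N) {
  unsigned Bits = 0;
  while ((1u << Bits) < N)
    ++Bits;
  return Bits;
}

// floor(log2(A)) for a power-of-two alignment A: low bits that are always 0.
unsigned lowZeroBits(std::uint64_t A) {
  unsigned L = 0;
  while (L + 1 < 64 && (std::uint64_t(1) << (L + 1)) <= A)
    ++L;
  return L;
}

Decoded decodePointerUnion(std::uint64_t Raw,
                           const std::vector<std::uint64_t> &Alignments) {
  unsigned Free = 64;
  for (std::uint64_t A : Alignments)
    Free = std::min(Free, lowZeroBits(A));
  unsigned TagBits = bitsFor(static_cast<unsigned>(Alignments.size()));
  if (Alignments.empty() || TagBits > Free)
    return {false, 0, Raw};
  // Assumed layout: the tag occupies bits [Free - TagBits, Free).
  unsigned Shift = Free - TagBits;
  std::uint64_t TagMask = ((std::uint64_t(1) << TagBits) - 1) << Shift;
  return {true, static_cast<unsigned>((Raw & TagMask) >> Shift),
          Raw & ~TagMask};
}
```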
Alternatives that didn't work out:
- Adding a C++ helper (`getActiveMemberIdx`) callable from Python: gets
optimized out even with `__attribute__((used, noinline))`, and
expression evaluation fails for synthetic children.
- Using `isa`/`dyn_cast` checks from Python: requires expression
evaluation, which does not work for local types or synthetic children
[2 lines not shown]
[CHERI] Allow @llvm.returnaddress to return a pointer in any address space. (#188464)
Clang now constructs calls to it using the default program address space from the DataLayout.
Co-authored-by: Alex Richardson <alexrichardson at google.com>
[flang][OpenMP] Check if loop nest/sequence is well-formed
Check whether the code associated with a nest or sequence construct is
well-formed. Emit diagnostic messages if it is not.
Make a clearer separation for checks of loop-nest-associated and loop-
sequence-associated constructs.
Unify structure of some of the more common messages.
Issue: https://github.com/llvm/llvm-project/issues/185287
[flang][OpenMP] Provide reasons for calculated sequence length (#187866)
If the length was limited by some factor, include the reason for the
reduction.
Issue: https://github.com/llvm/llvm-project/issues/185287