[Flang][OpenMP] Add combined construct information
This patch adds the `omp.combined` attribute to OpenMP dialect
operations following changes to the `ComposableOpInterface`.
This attribute is added to operations representing non-innermost leaf
constructs of a combined construct and to standalone block-associated
constructs that can be combined with their parent construct.
Changes are made to the OpenMP lowering logic, as well as the
do-concurrent, workshare and workdistribute transformation passes.
[MLIR][OpenMP] Explicit tagging of combined constructs
Combined OpenMP constructs, such as `parallel do`, which represent
nests of constructs where each one contains a single other construct
without any other directives or statements in between, are currently not
marked in any way in the MLIR representation.
This works because they don't usually require any specific handling
other than what would be done for the included operations. However, the
handling of `target` regions needs to know whether it was part of a
combined construct in order to properly optimize for the SPMD case and
detect when certain clauses must be unconditionally evaluated on the
host.
So far, this has been achieved by having some MLIR pattern-matching
logic to infer whether a nest of operations could have potentially been
produced for a combined construct. This approach is error-prone and
computationally expensive, and it cannot work in the general case.
On the other hand, a compiler frontend can easily tell the difference
[10 lines not shown]
[openmp] Fix export file paths (#202692)
The files omp_lib.h and omp-tools.h are the outputs of two
configure_file invocations which specify the full path of the outputs.
Use these full paths in LibompExports.cmake so they can actually be
found.
[Dexter] Add at_frame_idx to check values in frames above current
This patch adds a new attribute for !and nodes, `at_frame_idx`, which
matches against frames above its parent node; for example, in the script:
```
!where {function: foo}:
  !where {function: bar}:
    !and {at_frame_idx: 1}:
      !value x: 0
```
The `!value x` node checks the value of 'x' in 'foo' while the debugger is
inside 'bar'. Use of this attribute comes with some restrictions: a !where
node can never be nested under a !and{at_frame_idx} node, and neither can
another !and{at_frame_idx} node.
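
The two nesting restrictions can be sketched as a recursive check. This is only an illustration and not Dexter's implementation: the dictionary-based node representation and the name `check_nesting` are hypothetical.

```python
def check_nesting(node, inside_frame_idx=False):
    """Return error strings for a hypothetical node tree.

    Each node is a dict with a "kind" ("where", "and", "value"), optional
    "attrs" and optional "children". This shape is an assumption made for
    the sketch, not Dexter's internal representation.
    """
    kind = node["kind"]
    has_frame_idx = kind == "and" and "at_frame_idx" in node.get("attrs", {})
    errors = []
    if inside_frame_idx and kind == "where":
        errors.append("!where nested under !and{at_frame_idx}")
    if inside_frame_idx and has_frame_idx:
        errors.append("!and{at_frame_idx} nested under !and{at_frame_idx}")
    for child in node.get("children", []):
        # Once an at_frame_idx node is seen, every descendant is "inside" it.
        errors.extend(check_nesting(child, inside_frame_idx or has_frame_idx))
    return errors


# The script from the commit message, expressed in the sketch's node format.
example = {
    "kind": "where", "attrs": {"function": "foo"}, "children": [{
        "kind": "where", "attrs": {"function": "bar"}, "children": [{
            "kind": "and", "attrs": {"at_frame_idx": 1}, "children": [
                {"kind": "value", "attrs": {"x": 0}},
            ],
        }],
    }],
}
```

Calling `check_nesting(example)` yields an empty list, while a tree that places a `where` node below the `at_frame_idx` node yields one error.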
runtimes: Pass CMAKE_SYSTEM_NAME based on target triple
Compute the CMake system name from the target triple, rather
than passing through the host's. This is primarily to stop
forwarding OSX-specific CMake variables.
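
As a rough illustration of the idea (not the CMake logic in this patch), the mapping from a triple to a `CMAKE_SYSTEM_NAME` value could look like the sketch below. The function name, the prefix table, and the `Generic` fallback for triples such as `amdgcn-amd-amdhsa` are all assumptions made for the example.

```python
# Assumed table of triple OS prefixes and the CMAKE_SYSTEM_NAME each maps to.
# The real patch may recognize more platforms or choose different names.
_OS_PREFIX_TO_CMAKE = (
    ("linux", "Linux"),
    ("darwin", "Darwin"),
    ("macos", "Darwin"),
    ("windows", "Windows"),
    ("freebsd", "FreeBSD"),
)


def cmake_system_name(triple, default="Generic"):
    """Guess a CMake system name from a target triple (hypothetical helper)."""
    # Skip the architecture component, then look for a known OS prefix so
    # versioned names such as "darwin23.1.0" still match.
    for component in triple.lower().split("-")[1:]:
        for prefix, name in _OS_PREFIX_TO_CMAKE:
            if component.startswith(prefix):
                return name
    # Assumption: bare-metal and GPU triples fall back to "Generic".
    return default
```

The point of deriving the name from the triple is that a macOS host building for `amdgcn-amd-amdhsa` no longer reports `Darwin` to the runtimes build.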
This fixes build failures when building the GPU libc on macOS
hosts. Previously the build failed on several issues, starting with
an unused-argument error for -mmacos-version-min, followed by other
errors caused by passing -isysroot.
Secondarily, restrict the cmake imported targets when cross compiling.
Without this, the amdgpu build prints many cmake warnings about the
target not supporting shared libraries.
Claude did most of the actual work, though it required quite a few
rounds of prodding to get it into the right place. In particular, it
took care of mapping the triple to all of the platform names that
CMake recognizes.
[2 lines not shown]
Emit debug type vector (#200056)
This emits `DebugTypeVector` for HLSL `float4`-style vectors.
`partitionTypes()` separates vector `DICompositeType` nodes from basic
types so both can be visited in a single pass over the debug metadata. A
new `emitDebugTypeVector()` helper builds the `DebugTypeVector`
instruction and looks up the base-type register in `DebugTypeRegs`.
The helper skips four cases silently:
1. Absent or non-`DIBasicType` base type: only scalar element types are
supported for now.
2. Base type not yet emitted: the type was not reached during the
`DebugTypeBasic` pass.
3. Multiple subranges: `DebugTypeVector` models one-dimensional vectors
only (NSDI cannot encode multi-subrange types).
4. Non-constant subrange count: NSDI cannot represent variable-length
counts.
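
The four skip cases amount to a chain of early returns. The sketch below models them with plain Python data classes. It is not the SPIR-V backend code: the classes and the `emit_debug_type_vector` signature are hypothetical stand-ins for the `DICompositeType` metadata and the `DebugTypeRegs` lookup.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BasicType:          # stand-in for a DIBasicType
    name: str


@dataclass
class Subrange:           # stand-in for a DISubrange
    count: Optional[int]  # None models a non-constant count


@dataclass
class VectorType:         # stand-in for a vector DICompositeType
    base: Optional[object]
    subranges: List[Subrange] = field(default_factory=list)


def emit_debug_type_vector(vec, debug_type_regs):
    """Return a tuple describing the instruction, or None when skipped."""
    if not isinstance(vec.base, BasicType):
        return None       # case 1: absent or non-basic base type
    reg = debug_type_regs.get(vec.base.name)
    if reg is None:
        return None       # case 2: base type not emitted yet
    if len(vec.subranges) != 1:
        return None       # case 3: multiple subranges (this sketch also skips none)
    count = vec.subranges[0].count
    if count is None:
        return None       # case 4: non-constant subrange count
    return ("DebugTypeVector", reg, count)
```

With `debug_type_regs = {"float": 7}`, a `float4`-style vector with a single constant subrange of 4 produces `("DebugTypeVector", 7, 4)`, and each of the four cases returns `None`.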
[2 lines not shown]
[NFC][AMDGPU] Generalize some LDS MemoryUtils
In preparation for upcoming work, I need some functions used by the LDS lowering
system to work on any GV. I removed the LDS-specific queries inside these functions
and replaced them with functors passed by the caller, so these utility functions can be reused.
I also cleaned up a few things that did not follow the coding standards, such as lowercase variable names.
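
The shape of this refactoring can be shown with a small sketch. It does not mirror the actual MemoryUtils functions: `collect_globals`, the dictionary representation of a global variable, and both predicates are hypothetical.

```python
def collect_globals(global_vars, is_candidate):
    """Return the globals accepted by a caller-supplied predicate.

    Before a refactoring like this one, the address-space test would be
    hard-coded inside the utility; here the caller decides what qualifies.
    """
    return [gv for gv in global_vars if is_candidate(gv)]


# Hypothetical globals; address space 3 is LDS (local) on AMDGPU.
module_globals = [
    {"name": "lds_buf", "addrspace": 3},
    {"name": "table", "addrspace": 1},
    {"name": "scratch", "addrspace": 3},
]


def is_lds(gv):
    return gv["addrspace"] == 3


def is_global_memory(gv):
    return gv["addrspace"] == 1
```

The same utility then serves the existing LDS caller through `is_lds` and any future caller through a different predicate.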
[RFC][AMDGPU] Add BARRIER address space
Add a new BARRIER address space for global variables that represent the barrier IDs in GFX12.5.
These barrier addresses just have values corresponding 1-1 to barrier IDs. They are still implemented on top of LDS, but the offsetting happens during an addrspacecast to generic, not whenever the barrier GV is used.
The motivation for this is to make the relation between LDS and barrier GVs explicit in the compiler. It does add a bit more complexity, but that complexity was already there, just hidden by pretending barrier GVs were actual LDS.
[NFCI][clang] Allow overriding any global variable address space
Allow the target to change the AS of a global variable at will, not just whenever Clang cannot assign one.
This enables the next patch that will specialize LDS GVs for barriers as a separate address space.
[AArch64](NFC) Introduce unified `isLegalArithImmed()` and `isLegalCmpImmed()` (#203020)
Quick tidy-up to factor out some common helpers into
`AArch64AddressingModes.h`.
[KnownFPClass] Fix canonicalize incorrectly dropping fcNegZero under positive-zero denormal mode (#202268)
The denormal mode only flushes *denormal* (subnormal) values; -0.0 is
not a denormal, and per LangRef canonicalize must preserve the sign of
zero (canonicalize(-0.0) == -0.0).
Alive2 (InstCombine fold of canonicalize on a {+/-0, nan} value):
before (miscompiles -0.0 -> +0.0): https://alive2.llvm.org/ce/z/ZRK-sr
after (verifies): https://alive2.llvm.org/ce/z/L3tPu3
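
A simplified model of the intended behaviour is sketched below. It is not the `KnownFPClass` code: the string class names and the function are invented for illustration, and the model covers only denormal flushing, ignoring other effects of canonicalize such as NaN quieting.

```python
SUBNORMAL_TO_ZERO = {"psub": "pzero", "nsub": "nzero"}
DENORMAL_MODES = ("ieee", "preserve-sign", "positive-zero", "dynamic")


def canonicalize_known_classes(classes, denormal_mode):
    """Map the set of possible input classes to possible result classes."""
    if denormal_mode not in DENORMAL_MODES:
        raise ValueError("unknown denormal mode: " + denormal_mode)
    result = set()
    for cls in classes:
        if cls not in SUBNORMAL_TO_ZERO:
            # Zeros of either sign, normals, infinities and NaNs are not
            # denormals, so the denormal mode never changes them.
            result.add(cls)
            continue
        if denormal_mode in ("ieee", "dynamic"):
            result.add(cls)                     # subnormal may survive
        if denormal_mode in ("preserve-sign", "dynamic"):
            result.add(SUBNORMAL_TO_ZERO[cls])  # flush, keeping the sign
        if denormal_mode in ("positive-zero", "dynamic"):
            result.add("pzero")                 # flush to +0.0
    return result
```

In this model, `{"nzero", "nan"}` under `positive-zero` stays `{"nzero", "nan"}`: only a negative subnormal input is turned into `pzero`, which is the distinction the fix restores.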
[Comgr][hotswap] Address PR #2437 review comments
Reviewer feedback from chinmaydd and jmmartinez:
- readKernelDescriptor now returns Expected<KernelDescriptorFields> by
value instead of writing through an out-parameter (jmmartinez), folding
the byte read and field extraction into one function.
- Group the KD register fields into a KernelDescriptorFields struct stored
as std::optional<KernelDescriptorFields> on KernelMeta, replacing the
HasKernelDescriptor bool flag (jmmartinez). PrivateSegmentFixedSize now
lives only in the descriptor struct, read authoritatively from .rodata.
- extractKernelMeta propagates a KD parse failure as an error rather than
swallowing it into a partial success (ftynse/chinmaydd; martin-luecke
agreed), so a successful KernelMeta always carries the descriptor.
- raiser.cpp reuses the shared kAMDGPUTriple from mc-state.h instead of a
local duplicate constant (chinmaydd).
- Add TODOs flagging the non-thread-safe target init and the
non-exhaustive stripEncoding suffix list (chinmaydd).
Assisted-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
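
The first three review items describe a common pattern: parse into a value, store it as an optional field, and let failures propagate. A Python analogue might look like the sketch below. It is not the Comgr code, which is C++ and uses `Expected` and `std::optional`. The snake_case names are invented, and the layout is an assumption: only the three leading little-endian 32-bit fields of a 64-byte descriptor are read.

```python
import struct
from dataclasses import dataclass
from typing import Optional

KD_SIZE = 64  # assumed size of a kernel descriptor in bytes


@dataclass(frozen=True)
class KernelDescriptorFields:
    group_segment_fixed_size: int
    private_segment_fixed_size: int
    kernarg_size: int


@dataclass
class KernelMeta:
    name: str
    # An optional descriptor replaces a separate "has descriptor" flag.
    descriptor: Optional[KernelDescriptorFields] = None


def read_kernel_descriptor(rodata, offset):
    """Read the bytes and extract the fields in one step, returning by value."""
    if offset < 0 or offset + KD_SIZE > len(rodata):
        raise ValueError("kernel descriptor out of bounds")
    group, private, kernarg = struct.unpack_from("<III", rodata, offset)
    return KernelDescriptorFields(group, private, kernarg)


def extract_kernel_meta(name, rodata, offset):
    # A parse failure propagates to the caller instead of producing a
    # partially filled KernelMeta.
    return KernelMeta(name, read_kernel_descriptor(rodata, offset))
```

Because the error is raised rather than swallowed, any `KernelMeta` returned by `extract_kernel_meta` in this sketch always carries a descriptor.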