[libc] Update the GPU allocator to work under post-Volta ITS
Summary:
There were several gaps that prevented the allocator from working under
NVIDIA's independent thread scheduling (ITS) model. The problems I know
of are fixed in this commit. Generally this required using correct masks,
synchronizing before a few dependent operations, and overhauling the
allocate function to reuse the mask obtained on entry instead of
querying it again.
The general idiom here is that at the start we obtain a single mask and
opportunistically use it. Every use must explicitly synchronize that
same subset of lanes; that is, query the mask a single time and never
change it.
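As a rough illustration of the "query once, reuse everywhere" idiom, the
following is a host-side C++ model and not the libc allocator code. The
names `SlabModel`, `count_bits`, `lane_rank`, and `allocate_for_mask` are
hypothetical; the mask is assumed to have been captured once by the
caller, and every later step derives its result from that same value
rather than re-querying which lanes are active.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side model, not the libc implementation: a "warp" is
// represented only by a 64-bit mask of the lanes that entered the call.
struct SlabModel {
  uint32_t next_free = 0; // stand-in for an atomically bumped counter
};

// Number of set bits in `bits` (portable, no C++20 <bit> needed).
static uint32_t count_bits(uint64_t bits) {
  uint32_t n = 0;
  for (; bits != 0; bits &= bits - 1)
    ++n;
  return n;
}

// Rank of `lane` among the participating lanes: how many set bits of
// `mask` lie strictly below it.
static uint32_t lane_rank(uint64_t mask, uint32_t lane) {
  return count_bits(mask & ((uint64_t(1) << lane) - 1));
}

// Reserve one slot per participating lane. `mask` is the value captured
// once on entry; it is never re-queried, so the reservation size and the
// per-lane offsets are guaranteed to agree with each other.
std::vector<int64_t> allocate_for_mask(SlabModel &slab, uint64_t mask) {
  std::vector<int64_t> slots(64, -1); // -1 marks non-participating lanes
  const uint32_t base = slab.next_free;
  slab.next_free += count_bits(mask); // one reservation for the group
  for (uint32_t lane = 0; lane < 64; ++lane)
    if (mask & (uint64_t(1) << lane))
      slots[lane] = int64_t(base) + lane_rank(mask, lane);
  return slots;
}
```

With a mask of `0x52` (lanes 1, 4, and 6), the model hands those lanes
consecutive slots and advances the counter by three. On a real GPU each
step would additionally synchronize exactly the lanes in that mask.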
This passes most tests; however, I have encountered two issues.
1. A bug in `nvlink` that fails to link symbols called in `free`
2. A deadlock under heavy divergence caused by IPSCCP altering control
flow.
[2 lines not shown]
[OMPIRBuilder] Add support for explicit deallocation points
In this patch, some OMPIRBuilder codegen functions and callbacks are updated to
work with arrays of deallocation insertion points. The purpose of this is to
enable the replacement of `alloca`s with other types of allocations that
require explicit deallocations in a way that makes it possible for
`CodeExtractor` instances created during OMPIRBuilder finalization to also use
them.
The OpenMP to LLVM IR MLIR translation pass is updated to properly store and
forward deallocation points together with their matching allocation point to
the OMPIRBuilder.
Currently, only the `DeviceSharedMemCodeExtractor` uses this feature to get the
`CodeExtractor` to use device shared memory for intermediate allocations when
outlining a parallel region inside of a Generic kernel (a code path currently
used only by Flang via MLIR). However, in the long term this might also be
useful to refactor finalization of variables with destructors, potentially
reducing the use of callbacks and simplifying privatization and reductions.
[5 lines not shown]
Do not format .td files in Clang; NFC (#182075)
We have varying needs for these files. For example, a diagnostics file
is a different kind of file than compiler options, which is different
than attributes, which is different than attribute documentation, etc.
So running clang-format over .td files in Clang is not going well in
practice because it often reformats things inconsistently with the rest
of the file. This results in a poor new contributor experience:
pre-commit CI tells them the changes are not clang-format clean, but we
don't want the changes to be clang-format clean, so a reviewer asks
them to revert and ignore pre-commit CI.
---------
Co-authored-by: Sirraide <aeternalmail at gmail.com>
[mlir][NFC] Remove unused deprecated API wrappers (#182715)
Remove deprecated functions and constructors that have zero callers in
the monorepo: `applyPatternsAndFoldGreedily`, `applyOpPatternsAndFold`,
`NamedAttrList(std::nullopt_t)`, and `OpPrintingFlags(std::nullopt_t)`.
These had FIXME comments requesting their removal.