[SPIRV] Add `<2 x half>` and `<4 x half>` atomics via `SPV_NV_shader_atomic_fp16_vector` (#170213)
This adds support for the `SPV_NV_shader_atomic_fp16_vector` extension,
and then uses it to enable lowering of atomic add, sub, min and max on 2-
and 4-component vectors of FP16, which are rather common operations in ML
workloads. Even though `bfloat16` also works in practice, we do not
enable it since it's not specified in the extension (which might need
updating / promoting to KHR at least). A `TODO` is also inserted in
`SPIRVModuleAnalysis.cpp` regarding the need to upgrade its ample usage
of `report_fatal_error`; I have a WiP patch for that, but it still needs
a bit of baking. Finally, a paired patch will be necessary in the
Translator, as it's not aware of the extension either - I'll update this
review to reference the PR once I create it.
[AArch64] Add isAppleMLike helper to check for M cores and aligned CPUs. (#170553)
Add a new isAppleMLike helper that returns true if the core is part of
the Apple M core family or is Apple A14 or later. It is used to apply
cost decisions consistently to those groups of cores.
The function is now a single place to update when new cores are added.
It also makes sure we apply unrolling decisions for newer Apple cores to
Apple A17.
PR: https://github.com/llvm/llvm-project/pull/170553
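A minimal sketch of the idea behind such a helper, not the actual patch: the `ProcFamily` enum, its members, and `pickUnrollCount` with its numbers are hypothetical stand-ins for the real AArch64 subtarget code. It only illustrates how one predicate can centralize the "Apple M family, or Apple A14 or later" check so that cost decisions stay consistent.

```cpp
#include <cassert>

// Hypothetical, trimmed-down processor family list. The real AArch64
// subtarget enum has different members and names.
enum class ProcFamily {
  Generic,
  AppleA13,
  AppleA14,
  AppleA15,
  AppleA16,
  AppleA17,
  AppleM1,
  AppleM4,
  CortexA76,
};

// Single predicate for "Apple M family, or Apple A14 or later".
// When a new core appears, only this switch needs to be updated.
inline bool isAppleMLike(ProcFamily F) {
  switch (F) {
  case ProcFamily::AppleA14:
  case ProcFamily::AppleA15:
  case ProcFamily::AppleA16:
  case ProcFamily::AppleA17:
  case ProcFamily::AppleM1:
  case ProcFamily::AppleM4:
    return true;
  default:
    return false;
  }
}

// Illustrative cost decision keyed on the helper. The values are made up
// and do not reflect the real unrolling heuristics.
inline unsigned pickUnrollCount(ProcFamily F) {
  return isAppleMLike(F) ? 4 : 1;
}
```

With this shape, callers ask `isAppleMLike(F)` instead of repeating a list of cores, so a core such as Apple A17 cannot be missed in one place and included in another.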
[flang][OpenMP] Reject END DO on construct that crosses label-DO (#169714)
In a label-DO construct where two or more loops share the same
terminator, an OpenMP construct must enclose all the loops if an
end-directive is present. E.g.
```
do 100 i = 1,10
!$omp do
do 100 j = 1,10
100 continue
!$omp end do ! Error, but ok if this line is removed
```
Fixes https://github.com/llvm/llvm-project/issues/169536.
[clang] Use tighter lifetime bounds for C temporary arguments
In C, consecutive statements in the same scope are under
CompoundStmt/CallExpr, while in C++ they typically fall under
CompoundStmt/ExprWithCleanups. This leads to different behavior with
respect to where pushFullExprCleanup inserts the lifetime end markers
(e.g., at the end of scope).
For these cases, we can track and insert the lifetime end markers right
after the call completes, allowing the stack space to be reused
immediately. This partially addresses #109204 and #43598 for improving
stack usage.
[clang] Limit lifetimes of temporaries to the full expression (#170517)
We have several issues describing suboptimal stack usage related to the
lifetimes of temporary objects, such as #68747, #43598, and #109204.
Previously, https://reviews.llvm.org/D74094 tried to address this. In
that review, a few issues were brought up, particularly a concern about
the lifetimes of the temporaries needing to be extended to end of the
full expression. While there are arguably more optimal lifetime bounds
we could enforce, for now we can conservatively make them extend to the
end of the full expression, and later refine the optimization to use
tighter bounds (or perhaps a better mechanism in the middle end?).
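As a language-level illustration of the "end of the full expression" boundary mentioned above, the following sketch records when a temporary is created and destroyed. It does not show the lifetime markers the patch emits; `Temp`, `consume`, `runExample`, and the `Events` log are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records construction/destruction order so the lifetime is observable.
static std::vector<std::string> Events;

struct Temp {
  int Value;
  explicit Temp(int V) : Value(V) { Events.push_back("ctor"); }
  ~Temp() { Events.push_back("dtor"); }
};

// Takes the temporary by const reference, so no copy is made.
static int consume(const Temp &T) {
  Events.push_back("call");
  return T.Value;
}

// Evaluates one full expression that materializes a temporary, followed
// by a second statement, and returns the recorded event order.
static std::vector<std::string> runExample() {
  Events.clear();
  // The temporary Temp(1) lives until the end of this full expression,
  // that is, until the initializer of R has been completely evaluated.
  int R = consume(Temp(1)) + 1;
  Events.push_back("next statement");
  (void)R;
  return Events;
}
```

The temporary is destroyed before the next statement runs, which is the point after which its stack slot is no longer needed and can conservatively be marked dead.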
Fixes #68747
Co-authored-by: Nick Desaulniers <nick.desaulniers at gmail.com>
Co-authored-by: Erik Pilkington <erik.pilkington at gmail.com>
[mlir][bufferization] Enable moving dependent values in eliminate-empty-tensors (#169718)
Currently, empty tensor elimination constructs a SubsetExtractionOp to
match a SubsetInsertionOp at the end of a DPS chain. This fails if any
operand required by the insertion op does not dominate the insertion
point for the extraction op.
This change improves the transformation by attempting to move all pure
producers of required operands to the insertion point of the extraction
op. In the process this improves a number of tests for empty tensor
elimination.
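The hoisting step can be sketched on a toy model of a single block. This is not the MLIR implementation: `Op`, `indexOf`, and `moveProducersBefore` are hypothetical, ops are identified by name, and names not found in the block are assumed to be defined outside it and therefore to dominate already.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy operation: a name, the names of the ops producing its operands,
// and whether it is free of side effects.
struct Op {
  std::string Name;
  std::vector<std::string> Operands;
  bool IsPure;
};

// Position of the op called Name in Block, or -1 if it is not in the block.
static int indexOf(const std::vector<Op> &Block, const std::string &Name) {
  for (size_t I = 0; I < Block.size(); ++I)
    if (Block[I].Name == Name)
      return static_cast<int>(I);
  return -1;
}

// Tries to make every op in Required appear before the op InsertBefore by
// hoisting pure producers (and, transitively, their producers) above it.
// Returns false and leaves Block untouched if an impure op would have to move.
static bool moveProducersBefore(std::vector<Op> &Block,
                                const std::string &InsertBefore,
                                const std::vector<std::string> &Required) {
  std::vector<Op> Work = Block; // work on a copy so failure has no effect
  std::function<bool(const std::string &)> Hoist =
      [&](const std::string &Name) -> bool {
    int Point = indexOf(Work, InsertBefore);
    if (Point < 0)
      return false; // unknown insertion point
    int Def = indexOf(Work, Name);
    if (Def < 0 || Def < Point)
      return true; // defined outside the block, or already dominates
    if (Def == Point || !Work[Def].IsPure)
      return false; // cannot move the anchor itself or an impure op
    Op Moved = Work[Def];
    for (const std::string &Operand : Moved.Operands)
      if (!Hoist(Operand))
        return false; // a dependency could not be hoisted
    Work.erase(Work.begin() + indexOf(Work, Name));
    Work.insert(Work.begin() + indexOf(Work, InsertBefore), Moved);
    return true;
  };
  for (const std::string &Name : Required)
    if (!Hoist(Name))
      return false;
  Block = Work;
  return true;
}
```

Operands are hoisted before the op that uses them, so the relative order of moved producers is preserved, and any impure producer makes the whole attempt fail without modifying the block.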
[mlir][amdgpu] Add lowering for make_dma_descriptor (#169955)
* Add initial lowering for make_dma_descriptor, supporting tensors of
rank 2.
* Add folders for make_dma_descriptor, allowing statically known
operands to be folded into attributes.
* Add AllElementTypesMatch<["lds", "global"]> to make_dma_base.
* Rename pad to pad_amount.
* Rename pad_every to pad_interval.
[lldb][NFCI] Remove FileAction::GetPath (#170764)
This method puts strings into the ConstString pool and vends them as
llvm::StringRefs. Most of the uses only require a `std::string` or a
`const char *`. This can be achieved without wasting memory.
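To illustrate the memory argument, here is a small sketch that contrasts an interning getter with one that returns an owned string. It does not use lldb's classes: `pool`, `FileSpecLike`, `getPathInterned`, and `getPathOwned` are hypothetical stand-ins, and the pool is modelled as a set that is never cleared.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_set>

// Stand-in for a global string pool whose entries are never released.
static std::unordered_set<std::string> &pool() {
  static std::unordered_set<std::string> P;
  return P;
}

// Hypothetical minimal file description.
struct FileSpecLike {
  std::string Dir;
  std::string Name;
};

// Interning style: build the path, store it in the pool forever, and
// hand out a view into the pooled copy.
static std::string_view getPathInterned(const FileSpecLike &F) {
  return *pool().insert(F.Dir + "/" + F.Name).first;
}

// Owning style: the caller gets a std::string that is freed when it goes
// out of scope, and the pool is not touched.
static std::string getPathOwned(const FileSpecLike &F) {
  return F.Dir + "/" + F.Name;
}
```

Callers that only need a `std::string` or a `const char *` (via `c_str()`) can use the owning form, so no permanent pool entry is created for a value that is used once.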
[NFC][LLVM] Minor code cleanup in DebugLoc (#170757)
Remove the indentation of code inside the llvm namespace in the header
file. Remove the braces around a single-statement `if` in the .cpp file.
[flang][cuda] Add double descriptor information in allocate/deallocate operations (#170901)
After https://github.com/llvm/llvm-project/pull/169740, the allocate and
deallocate cuf operations can be converted later. Update the way the
double descriptor case is recognized by recording this information
directly on the operation itself.
[mlir][IntegerRangeAnalysis] Handle multi-dimensional loops (#170765)
Since LoopLikeInterface has (for some time) been extended to handle
multiple induction variables (and thus lower and upper bounds), handle
those bounds one at a time.
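A rough sketch of what "one at a time" could look like, outside of MLIR: `Range`, `LoopBounds`, and `inferInductionVarRanges` are hypothetical, and the sketch assumes exclusive upper bounds, positive steps, and loops that execute at least once. Each induction variable is paired with its own lower and upper bound.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Closed integer interval [Min, Max].
struct Range {
  int64_t Min;
  int64_t Max;
};

// Hypothetical multi-dimensional loop: one lower and one (exclusive) upper
// bound per induction variable, each bound itself known only as a range.
struct LoopBounds {
  std::vector<Range> Lower;
  std::vector<Range> Upper;
};

// Infers a range for every induction variable by handling the bounds
// pairwise: the smallest possible start and the largest possible value
// strictly below the upper bound.
static std::vector<Range> inferInductionVarRanges(const LoopBounds &L) {
  assert(L.Lower.size() == L.Upper.size() && "one bound pair per variable");
  std::vector<Range> Result;
  for (size_t I = 0; I < L.Lower.size(); ++I)
    Result.push_back({L.Lower[I].Min, L.Upper[I].Max - 1});
  return Result;
}
```

For a two-dimensional loop with bounds `[0, 10)` and `[2..4, 8..16)`, this yields `[0, 9]` for the first variable and `[2, 15]` for the second.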