[flang][NFC] Strip trailing whitespace from tests (11 of 14)
Only some Fortran source files in flang/test/Semantics have been
modified. The remaining files will be cleaned up in subsequent commits.
[RISCV] LMUL lists for indexed and strided loads (#169756)
Create additional lists representing the valid LMULs for strided and
indexed loads of particular element sizes.
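
As a rough illustration of why the set of valid LMULs depends on element size, the sketch below filters LMULs with the RVV rule that a fractional LMUL only has to support SEW up to LMUL * ELEN. It assumes ELEN = 64, and the names `Lmul` and `validLmuls` are hypothetical. The lists added by this commit may be derived differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// LMUL expressed in eighths so that MF8..M8 are all integers.
struct Lmul {
  const char *name;
  unsigned eighths;
};

// Hypothetical helper: LMULs for which SEW <= LMUL * ELEN holds.
// ELEN defaults to 64, which is an assumption of this sketch.
inline std::vector<std::string> validLmuls(unsigned sew, unsigned elen = 64) {
  static const Lmul all[] = {{"MF8", 1}, {"MF4", 2}, {"MF2", 4}, {"M1", 8},
                             {"M2", 16}, {"M4", 32}, {"M8", 64}};
  std::vector<std::string> out;
  for (const Lmul &l : all)
    if (sew * 8 <= l.eighths * elen) // SEW <= (eighths / 8) * ELEN
      out.push_back(l.name);
  return out;
}
```

Under these assumptions, 8-bit elements admit every LMUL from MF8 to M8, while 64-bit elements admit only M1 and above.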
[flang][OpenMP] Store list of expressions in InitializerT
The INITIALIZER clause holds a stylized expression that can be
instantiated with different types. Currently, the InitializerT
class only holds one expression, which happens to correspond to
the first type in the DECLARE_REDUCTION type list.
Change InitializerT to hold a list of expressions instead, one
for each type. Keep the lowering code unchanged by picking the
first expression from the list.
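
The shape of the change might look roughly like the sketch below. All names here (`TypeT`, `ExprT`, `InitializerSketch`, `instantiate`, `exprForLowering`) are hypothetical stand-ins rather than flang's actual types, and the string concatenation merely simulates instantiating the stylized expression for a type.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are not flang's real classes.
struct TypeT { std::string name; };
struct ExprT { std::string text; };

// Sketch of an initializer that keeps one expression per listed type.
struct InitializerSketch {
  std::vector<ExprT> exprs; // exprs[i] corresponds to the i-th type
};

// Instantiate the stylized expression once for every type in the list.
inline InitializerSketch instantiate(const std::string &stylized,
                                     const std::vector<TypeT> &types) {
  InitializerSketch init;
  for (const TypeT &t : types)
    init.exprs.push_back(ExprT{stylized + " : " + t.name});
  return init;
}

// Lowering keeps its previous behavior by using only the first expression.
inline const ExprT &exprForLowering(const InitializerSketch &init) {
  assert(!init.exprs.empty() && "expected at least one type");
  return init.exprs.front();
}
```

With two types in the list, the sketch stores two expressions, and the lowering path still sees only the one for the first type.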
AMDGPU/PromoteAlloca: Simplify how deferred loads work (#170510)
The second pass of promotion to vector can be quite simple. Reflect that
simplicity in the code for better maintainability.
[clang] Temporarily disable Darwin test for linking against libc++ on non-darwin systems (#170912)
Disable the test added in #170303, which breaks bots that don't use ld
as their linker. This is a temporary and narrow disablement of the test
until we can make it more general again, to get the bots green.
Co-authored-by: Louis Dionne <ldionne.2 at gmail.com>
[SPIRV] Add `<2 x half>` and `<4 x half>` atomics via `SPV_NV_shader_atomic_fp16_vector` (#170213)
This adds support for the `SPV_NV_shader_atomic_fp16_vector` extension,
and then uses it to enable lowering of atomic add, sub, min and max on 2
and 4 component vectors of FP16, which are rather common options in ML
workloads. Even though `bfloat16` also works in practice, we do not
enable it since it's not specified in the extension (which might need
updating / promoting to KHR at least). A `TODO` is also inserted in
`SPIRVModuleAnalysis.cpp` regarding the need to upgrade its ample usage
of `report_fatal_error`; I have a WiP patch for that, but it still needs
a bit of baking. Finally, a paired patch will be necessary in the
Translator, as it's not aware of the extension either - I'll update this
review to reference the PR once I create it.
[AArch64] Add isAppleMLike helper to check for M cores and aligned CPUs. (#170553)
Add a new isAppleMLike helper that returns true if the core is part of
the Apple M core family or Apple A14 or later. Used to apply cost
decisions consistently to those groups of cores.
The function is now a single place to update when new cores are added.
It also makes sure we apply unrolling decisions for newer Apple cores to
Apple A17.
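
A minimal sketch of such a predicate, assuming a hypothetical `CoreFamily` enum; the real enumerators and the exact set of cores in the AArch64 backend may differ:

```cpp
#include <cassert>

// Hypothetical core list for illustration only.
enum class CoreFamily {
  CortexA78,
  AppleA13,
  AppleA14,
  AppleA15,
  AppleA16,
  AppleA17,
  AppleM4
};

// Single place that decides which cores get the shared cost decisions:
// the Apple M family plus Apple A14 and later.
inline bool isAppleMLikeSketch(CoreFamily f) {
  switch (f) {
  case CoreFamily::AppleA14:
  case CoreFamily::AppleA15:
  case CoreFamily::AppleA16:
  case CoreFamily::AppleA17:
  case CoreFamily::AppleM4:
    return true;
  default:
    return false;
  }
}
```

Callers that previously listed cores one by one could query the predicate instead, so a newly added core only has to be added to the switch.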
PR: https://github.com/llvm/llvm-project/pull/170553
[flang][OpenMP] Reject END DO on construct that crosses label-DO (#169714)
In a label-DO construct where two or more loops share the same
terminator, an OpenMP construct must enclose all the loops if an
end-directive is present. E.g.
```
do 100 i = 1,10
!$omp do
do 100 j = 1,10
100 continue
!$omp end do ! Error, but ok if this line is removed
```
Fixes https://github.com/llvm/llvm-project/issues/169536.
[clang] Use tighter lifetime bounds for C temporary arguments
In C, consecutive statements in the same scope are under
CompoundStmt/CallExpr, while in C++ they typically fall under
CompoundStmt/ExprWithCleanups. This leads to different behavior with
respect to where pushFullExprCleanup inserts the lifetime end markers
(e.g., at the end of scope).
For these cases, we can track and insert the lifetime end markers right
after the call completes, allowing the stack space to be reused
immediately. This partially addresses #109204 and #43598 for improving
stack usage.
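
The kind of code that could benefit is sketched below, kept close to the common subset of C and C++. The names `Big`, `makeBig`, `sumBig`, and `twoCalls` are hypothetical, and whether the two temporaries actually share a stack slot depends on the compiler and optimization level.

```cpp
#include <cassert>

// A large aggregate, so each by-value temporary occupies noticeable stack.
struct Big { int data[256]; };

static struct Big makeBig(int seed) {
  struct Big b;
  for (int i = 0; i < 256; ++i)
    b.data[i] = seed + i;
  return b;
}

static int sumBig(struct Big b) {
  int total = 0;
  for (int i = 0; i < 256; ++i)
    total += b.data[i];
  return total;
}

static int twoCalls(void) {
  int a = sumBig(makeBig(1)); // temporary #1 is dead once sumBig returns
  int b = sumBig(makeBig(2)); // temporary #2 could reuse the same slot
  return a + b;
}
```

If the lifetime of each temporary ends right after its call rather than at the end of the enclosing scope, the two temporaries never overlap and the frame of `twoCalls` needs room for only one of them.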
[clang] Limit lifetimes of temporaries to the full expression (#170517)
We have several issues describing suboptimal stack usage related to the
lifetimes of temporary objects, such as #68747, #43598, and #109204.
Previously, https://reviews.llvm.org/D74094 tried to address this. In
that review, a few issues were brought up, particularly a concern about
the lifetimes of the temporaries needing to be extended to end of the
full expression. While there are arguably more optimal lifetime bounds
we could enforce, for now we can conservatively make them extend to the
end of the full expression, and later refine the optimization to use
tighter bounds (or perhaps a better mechanism in the middle end?).
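
The conservative bound matches the C++ rule that temporaries are destroyed at the end of the full expression that created them. The sketch below makes that observable with a logging destructor. The names `Temp`, `readValue`, and `traceFullExpression` are hypothetical, and the relative order of the two operand evaluations is unspecified, so only the position of the destructor calls is meaningful.

```cpp
#include <cassert>
#include <string>
#include <vector>

static std::vector<std::string> events;

struct Temp {
  int value;
  explicit Temp(int v) : value(v) { events.push_back("ctor"); }
  ~Temp() { events.push_back("dtor"); }
};

static int readValue(const Temp &t) {
  events.push_back("read");
  return t.value;
}

static std::vector<std::string> traceFullExpression() {
  events.clear();
  // Both temporaries stay alive until the end of this full expression,
  // even though each is only needed for the duration of one call.
  int x = readValue(Temp(1)) + readValue(Temp(2));
  events.push_back("next statement");
  (void)x;
  return events;
}
```

Both destructor entries appear after every constructor and read entry and before the next statement. This is the window in which lifetime end markers can be placed without changing observable behavior.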
Fixes #68747
Co-authored-by: Nick Desaulniers <nick.desaulniers at gmail.com>
Co-authored-by: Erik Pilkington <erik.pilkington at gmail.com>
[mlir][bufferization] Enable moving dependent values in eliminate-empty-tensors (#169718)
Currently, empty tensor elimination works by constructing a
SubsetExtractionOp to match a SubsetInsertionOp at the end of a DPS
chain. It fails if any operand required by the insertion op does not
dominate the insertion point for the extraction op.
This change improves the transformation by attempting to move all pure
producers of required operands to the insertion point of the extraction
op. In the process this improves a number of tests for empty tensor
elimination.