[LV] Handle partial sub-reductions with sub in middle block. (#178919)
Sub-reductions can be implemented in two ways:
(1) negate the operand in the vector loop (the default way).
(2) subtract the reduced value from the init value in the middle block.
Note that both ways keep the reduction itself as an 'add' reduction,
which is necessary because only llvm.vector.partial.reduce.add exists.
The ISD nodes for partial reductions don't support folding the
sub/negation into their operands because the following is not a valid
transformation:
```
sub(0, mul(ext(a), ext(b)))
-> mul(ext(a), ext(sub(0, b)))
```
It can therefore be better to choose option (2) such that the partial
reduction is always positive (starting at '0') and to do a final
subtract in the middle block.
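A minimal scalar sketch of the two options, and of why the fold above is
invalid, assuming 8-bit inputs that are zero-extended to 32 bits. All
function names here are hypothetical and do not come from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Option (1): negate the operand inside the loop; the reduction stays an add. */
int32_t sub_reduce_negate(int32_t init, const uint8_t *a,
                          const uint8_t *b, int n) {
    int32_t acc = init;
    for (int i = 0; i < n; ++i)
        acc += -((int32_t)a[i] * (int32_t)b[i]);
    return acc;
}

/* Option (2): accumulate from 0 and subtract once after the loop,
 * which models the subtract in the middle block. */
int32_t sub_reduce_middle(int32_t init, const uint8_t *a,
                          const uint8_t *b, int n) {
    int32_t partial = 0;
    for (int i = 0; i < n; ++i)
        partial += (int32_t)a[i] * (int32_t)b[i];
    return init - partial;
}

/* sub(0, mul(ext(a), ext(b))): negation happens in the wide type. */
int32_t negate_after_mul(uint8_t a, uint8_t b) {
    return 0 - (int32_t)a * (int32_t)b;
}

/* mul(ext(a), ext(sub(0, b))): negation happens in the narrow type
 * and wraps before the extension. */
int32_t negate_before_ext(uint8_t a, uint8_t b) {
    uint8_t neg_b = (uint8_t)(0 - b);
    return (int32_t)a * (int32_t)neg_b;
}

void demo_sub_reduction(void) {
    const uint8_t a[4] = {1, 2, 3, 4};
    const uint8_t b[4] = {5, 6, 7, 8};
    /* Both options compute the same result. */
    assert(sub_reduce_negate(100, a, b, 4) == sub_reduce_middle(100, a, b, 4));
    /* The fold changes the result, so it is not a valid transformation. */
    assert(negate_after_mul(1, 1) != negate_before_ext(1, 1));
}
```

With a = b = 1, negating after the multiply gives -1, while negating the
narrow operand first gives 255, so the two expressions disagree.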
[9 lines not shown]
[NFC] Modify the comment of LoopRotate param (#180675)
The first param of LoopRotatePass is EnableHeaderDuplication. The value
'true' means 'enable the header duplication'.
`LoopRotatePass(bool EnableHeaderDuplication, bool PrepareForLTO)`
---------
Co-authored-by: Pengcheng Wang <wangpengcheng.pp at bytedance.com>
[flang] Fix -debug crash from VScaleAttrPass (#180234)
This patch splits the `vscaleRange` pass option of the
`VScaleAttrPass` into `vscaleMin` and `vscaleMax`, since a
`std::pair<>` cannot be used as a CLI option and crashes when running
`flang -march=rv64gcv -O3 file.f90 -mmlir -debug`.
Since the options can now be set individually, I added some error
checking following the semantics described in the LangRef:
https://llvm.org/docs/LangRef.html#function-attributes.
I also added tests, since there were previously none for this pass alone.
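The exact checks added by the patch are not shown here. As a rough
sketch, assuming the LangRef rules for `vscale_range` (the minimum is a
power of two greater than 0; the maximum is either 0 for "unbounded" or
a power of two no smaller than the minimum), validation could look like
this. The function names and messages are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if v is a non-zero power of two. */
bool is_power_of_two(unsigned v) {
    return v != 0 && (v & (v - 1)) == 0;
}

/* Hypothetical validator: returns NULL on success, else an error message. */
const char *check_vscale_range(unsigned vscale_min, unsigned vscale_max) {
    if (!is_power_of_two(vscale_min))
        return "vscaleMin must be a power of two greater than 0";
    if (vscale_max == 0)
        return NULL; /* 0 signifies an unbounded maximum */
    if (!is_power_of_two(vscale_max))
        return "vscaleMax must be 0 or a power of two";
    if (vscale_max < vscale_min)
        return "vscaleMax must not be smaller than vscaleMin";
    return NULL;
}

void demo_vscale_range(void) {
    assert(check_vscale_range(1, 16) == NULL);
    assert(check_vscale_range(8, 4) != NULL);
}
```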
[CoroSplit][DebugInfo] Fix scope of continuation funclets (#180523)
The heuristic for deciding which scope line to use for a continuation
funclet relies on iterating over the instructions of the first BB of the
continuation. Often, this contains a single unconditional branch, which
is skipped by the heuristic. However, in coro-retcon, two such
"jump-only" BBs are generated. This patch amends the heuristic to
account for that.
[flang] optimize WHERE with identical and disjoint array sections (#180279)
Improve `ScheduleOrderedAssignments` to avoid creating temporary storage
for masks in `WHERE` constructs when the mask modification is "aligned"
with the assignment (e.g., `where(a(i)>0) a(i)=...`).
- Identify "aligned" conflicts (identical array elements accessed in
order) using the `ArraySectionAnalyzer` that is extracted from
OptimizedBufferization.
- Defer saving regions with aligned conflicts, allowing fusion if
possible.
- Implement retroactive saving: if a region was modified in a previous
run (fused via aligned conflict) but is needed by a later split run,
insert a `SaveEntity` action before the modifying run.
- Use `std::list` for the schedule to support stable iterators for run
insertion.
- Update tests to verify fewer temporaries and correct retroactive
saves.
- Update flang pipeline at O2 and more to try fusing assignments in
[6 lines not shown]
[AMDGPU] Add legalization rules for atomicrmw max/min ops (#180502)
Adds rules for G_ATOMICRMW_{MAX, MIN, UMAX, UMIN, UINC_WRAP, UDEC_WRAP}.
Each of these generic opcodes is supported for S32 and S64 types
in the flat, global, and local address spaces.
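For reference, the less familiar wrapping operations can be modelled in
scalar C as below. This is a non-atomic sketch of the `atomicrmw`
`uinc_wrap` and `udec_wrap` semantics as described in the LangRef, not
the legalization code. Each helper (the names are hypothetical) returns
the value that would be stored back to memory:

```c
#include <assert.h>
#include <stdint.h>

/* uinc_wrap: increment, wrapping to 0 once the old value reaches val. */
uint32_t uinc_wrap(uint32_t old, uint32_t val) {
    return (old >= val) ? 0 : old + 1;
}

/* udec_wrap: decrement, wrapping to val when the old value is 0
 * or already above val. */
uint32_t udec_wrap(uint32_t old, uint32_t val) {
    return (old == 0 || old > val) ? val : old - 1;
}

/* Unsigned max/min, for comparison with the signed MAX/MIN opcodes. */
uint32_t umax32(uint32_t old, uint32_t val) { return old > val ? old : val; }
uint32_t umin32(uint32_t old, uint32_t val) { return old < val ? old : val; }

void demo_wrap_ops(void) {
    assert(uinc_wrap(3, 3) == 0); /* reached the bound: wrap to 0 */
    assert(udec_wrap(0, 7) == 7); /* at 0: wrap to the bound */
}
```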
[SCEV] Add ptrtoaddr tests with external state/unstable addrspaces.
Add ptrtoaddr tests with address spaces with unstable and external but
stable pointer representations.
Currently we incorrectly form ptrtoaddr for unstable pointers. See
discussion in https://github.com/llvm/llvm-project/pull/178861 for more
details.
rtw88: Add bus attachments to the module Makefile
In addition to PCIe we will support USB and also prepare for SDIO (still
disabled locally). The module SRCS are split up into a common part,
which we always add. All three bus parts are guarded by a local
variable in the Makefile.
In addition the PCI parts require PCI to be compiled into the kernel.
We add that check in case of, e.g., SoCs with SDIO but no PCI, which
may not have PCI in the kernel config and thus the module would fail
to attach.
USB has no additional check as it is fully loadable and does not have
to be in a kernel config.
SDIO depends on an MMCCAM-enabled kernel but is otherwise loadable.
While we could, we are not splitting the various bus attachments into
individual modules as we generally do not do that in FreeBSD. [1]
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
[3 lines not shown]
rtw89: harmonize all MODULE_DEPEND to rtw89
rtw89 was modelled after rtw88: since rtw88 was once split up, rtw89
was structured the same way. Clean this up too.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
rtw89: cleanup static_assert() calls
These days we can use static_assert() without trouble so remove the
FreeBSD-specific rtw89_static_assert implementation. This reduces
the diff to upstream and will ease future driver updates.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
rtw88: harmonize all MODULE_DEPEND to rtw88
From the time when I split up the driver into a core part and
bus attachment sub-drivers, the various bus attachments had their own
module names, but everything is "rtw88" now.
Core functionality depends on linuxkpi, linuxkpi_wlan, and, for debug.c,
lindebugfs.
Each bus attachment then depends on its own parent layer if needed:
PCI gets pulled in through linuxkpi, USB depends on [the future] linuxkpi_usb,
and SDIO depends on [the future] linuxkpi_sdio.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D55021
[AArch64] Add support for intent to read prefetch intrinsic (#179709)
This patch adds support in Clang for the PRFM IR instruction, by adding
the following builtin:
void __pldir(void const *addr);
This builtin is described in the following ACLE proposal:
https://github.com/ARM-software/acle/pull/406
Reland "[LV] Support conditional scalar assignments of masked operations" (#180708)
This patch extends the support added in #158088 to loops where the
assignment is non-speculatable (e.g. a conditional load or divide).
For example, the following loop can now be vectorized:
```
int simple_csa_int_load(
    int* a, int* b, int default_val, int N, int threshold)
{
  int result = default_val;
  for (int i = 0; i < N; ++i)
    if (a[i] > threshold)
      result = b[i];
  return result;
}
```
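One way to picture a vectorized form of this loop is the scalar model
below. It is a sketch under assumptions, not necessarily the lowering
the patch produces: it assumes a fixed vector width `VF` of 4, keeps the
mask and data of the most recent block that had any active lane, and
extracts the last active lane after the loop. The load of `b` is only
performed for active lanes, which models a masked load. All names are
hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define VF 4 /* assumed vector width, for illustration only */

int csa_int_load_blocked(const int *a, const int *b, int default_val,
                         int n, int threshold) {
    int data[VF] = {0};      /* values of the last block with an active lane */
    bool mask[VF] = {false}; /* which lanes of that block were active */
    int i = 0;

    for (; i + VF <= n; i += VF) {
        bool cur[VF];
        bool any = false;
        for (int l = 0; l < VF; ++l) {
            cur[l] = a[i + l] > threshold;
            any = any || cur[l];
        }
        if (!any)
            continue; /* no active lane: keep the previous mask and data */
        for (int l = 0; l < VF; ++l) {
            mask[l] = cur[l];
            if (cur[l])
                data[l] = b[i + l]; /* masked load: active lanes only */
        }
    }

    /* After the vector loop: extract the last active lane, if any. */
    int result = default_val;
    for (int l = 0; l < VF; ++l)
        if (mask[l])
            result = data[l];

    /* Scalar remainder loop for the trailing iterations. */
    for (; i < n; ++i)
        if (a[i] > threshold)
            result = b[i];
    return result;
}

void demo_csa(void) {
    const int a[10] = {1, 9, 2, 9, 0, 0, 0, 0, 9, 1};
    const int b[10] = {10, 11, 12, 13, 14, 15, 16, 17, 18, 19};
    assert(csa_int_load_blocked(a, b, -1, 8, 5) == 13);
    assert(csa_int_load_blocked(a, b, -1, 10, 5) == 18);
}
```

The model returns the same value as the scalar loop because only the
last iteration whose condition holds determines `result`.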
[9 lines not shown]
[mlir][vector] Reuse vector TD op in vector.xfer flatten tests (#180606)
This change adds a `RUN` line to vector-transfer-flatten.mlir that uses
`vector.flatten_vector_transfer_ops`, which was introduced in #178134.
It also removes a test added in the original PR whose coverage is
already provided by pre-existing tests.