[llvm][Tablegen] Link to tutorial before programmer's reference
The natural assumption is that the links are listed in some meaningful order,
and having people read the reference manual before the basic tutorial does not
make sense to me.
[Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - Allow AVX512 VPMULTISHIFTQB intrinsics to be used in constexpr (#168995)
Resolves #167477
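A hedged sketch of the kind of code this unblocks, assuming a target with
`-mavx512vbmi` and constexpr-enabled `set1` intrinsics (values are
illustrative, not from the PR):
```cxx
#include <immintrin.h>

// Each control byte selects an 8-bit field from the corresponding data qword;
// control 8 picks bits [8..15].
constexpr __m512i ctrl = _mm512_set1_epi8(8);
constexpr __m512i data = _mm512_set1_epi64(0x0011223344556677LL);
constexpr __m512i res  = _mm512_multishift_epi64_epi8(ctrl, data);
// Bits [8..15] of each data qword are 0x66, so every result byte is 0x66.
static_assert(res[0] == 0x6666666666666666LL, "folded at compile time");
```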
[ShrinkWrap] Modify shrink wrapping to accommodate functions terminated by no-return blocks (#167548)
At present, the shrink wrapping pass misses opportunities to shrink wrap
in the presence of machine basic blocks which exit the function without
returning. Such cases arise from C++ functions like the following:
```cxx
// Assumed context for the example: MyException has a (const char*, const char*,
// int) constructor, handle(void*) is declared, and STATUS_OK is a constant.
int foo(int err, void* ptr) {
  if (err == -1) {
    if (ptr == nullptr) {
      throw MyException("Received `nullptr`!", __FILE__, __LINE__);
    }
    handle(ptr);
  }
  return STATUS_OK;
}
```
In particular, assuming `MyException`'s constructor is not marked
[6 lines not shown]
[AMDGPU] Relax restrictions on amdgcn.cs.chain intrinsic
We have a new use-case for chain functions, so slightly relax the
restriction on which calling conventions may contain calls to chain
functions.
[NFC][HIP] Add __builtin_*_load_lds type check test cases (#165388)
These tests show how type-checking is performed for
`__builtin_amdgcn_load_to_lds`,
but not for `__builtin_amdgcn_raw_ptr_buffer_load_lds`,
`__builtin_amdgcn_struct_ptr_buffer_load_lds` and
`__builtin_amdgcn_global_load_lds` since they are declared with the 't'
attribute.
Stacked on top of: https://github.com/llvm/llvm-project/pull/165387
[ARM] Remove IR from mve vpt mir tests. NFC
As far as I can tell, the llvm.arm.mve.vminnm.m intrinsic used in these tests
was the pre-upstream name of llvm.arm.mve.min.predicated. The tests should not
need IR sections, so remove them and rely on just the MIR portions.
[mlir][llvm] Fix import of branch weights with "expected" field (#169776)
This commit fixes the import of `branch_weights` metadata from LLVM IR
to the LLVM dialect. Previously, `branch_weights` metadata containing
the `!"expected"` field, e.g. `!{!"branch_weights", !"expected", i32 2000, i32 1}`,
was rejected because the importer expected integer weights at operand 1
but found a string.
[OpenMP][flang] Support GPU team reductions on allocatables
Extends the work started in #165714 by supporting team reductions.
Similar to what was done in #165714, this PR introduces proper
allocations, loads, and stores for by-ref reductions in teams-related
callbacks (a hedged C++ analogue follows the list):
* `_omp_reduction_list_to_global_copy_func`,
* `_omp_reduction_list_to_global_reduce_func`,
* `_omp_reduction_global_to_list_copy_func`, and
* `_omp_reduction_global_to_list_reduce_func`.
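The commit itself targets Fortran allocatables, but as a hedged C++ analogue
(my example, not from the PR), an array-section reduction on a combined
target teams construct goes through the same by-ref team-reduction path:
```cxx
#include <cstdio>

int main() {
  constexpr int N = 4;
  double acc[N] = {};
  // Each thread gets a private copy of the section; partial results are
  // combined across teams via the runtime's reduction callbacks.
  #pragma omp target teams distribute parallel for reduction(+ : acc[0:N])
  for (int i = 0; i < 1024; ++i)
    acc[i % N] += 1.0;
  std::printf("%g %g %g %g\n", acc[0], acc[1], acc[2], acc[3]);
}
```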
[AggressiveInstCombine] Match long high-half multiply (#168396)
This patch adds recognition of a high-half multiply built up from parts,
folding it into a single larger multiply.
Considering a multiply made up of high and low parts, we can split the
multiply into:
x * y == (xh*T + xl) * (yh*T + yl)
where `xh == x>>32`, `xl == x & 0xffffffff`, and `T == 2^32`.
This expands to
xh*yh*T*T + xh*yl*T + xl*yh*T + xl*yl
which I find helpful to draw as
    [ xh*yh ]
        [ xh*yl ]
        [ xl*yh ]
            [ xl*yl ]
We are looking for the "high" half, which is xh*yh + xh*yl>>32 + xl*yh>>32 +
carries. The carry makes this difficult and there are multiple ways of
[15 lines not shown]
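For reference, a hedged C++ sketch of the pattern being matched and what it
folds to (my code, not the patch's; `__uint128_t` is a GCC/Clang extension):
```cxx
#include <cstdint>

// High 64 bits of a 64x64->128 multiply, computed from 32-bit parts (T = 2^32).
uint64_t mulhi_by_parts(uint64_t x, uint64_t y) {
  uint64_t xl = x & 0xffffffff, xh = x >> 32;
  uint64_t yl = y & 0xffffffff, yh = y >> 32;
  uint64_t ll = xl * yl, lh = xl * yh, hl = xh * yl, hh = xh * yh;
  // The middle column; its carry into the top 64 bits is what makes
  // recognition tricky.
  uint64_t mid = (ll >> 32) + (lh & 0xffffffff) + (hl & 0xffffffff);
  return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}

// What the whole pattern folds to: one wider multiply plus a shift.
uint64_t mulhi_folded(uint64_t x, uint64_t y) {
  return static_cast<uint64_t>((static_cast<__uint128_t>(x) * y) >> 64);
}
```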
[InstCombine] Fold @llvm.experimental.get.vector.length when cnt <= max_lanes (#169293)
On RISC-V, some loops that the loop vectorizer vectorizes pre-LTO may
turn out to have the exact trip count exposed after LTO, see #164762.
If the trip count is small enough we can fold away the
@llvm.experimental.get.vector.length intrinsic based on this corollary
from the LangRef:
> If %cnt is less than or equal to %max_lanes, the return value is equal
> to %cnt.
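As a hedged illustration of why the fold is valid, here is a C++ model of
those semantics (my names, not LLVM code); when a constant `%cnt` is known to
be `<= %max_lanes`, the result is fully determined:
```cxx
#include <algorithm>
#include <cstdint>

// max_lanes stands for %vf * vscale for scalable calls, or plain %vf otherwise.
uint32_t get_vector_length(uint64_t cnt, uint64_t max_lanes) {
  if (cnt <= max_lanes)
    return static_cast<uint32_t>(cnt);  // fully determined: fold to (trunc) %cnt
  // Otherwise the result is target-chosen and only bounded above; min() is
  // just one conforming placeholder.
  return static_cast<uint32_t>(std::min(cnt, max_lanes));
}
```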
This on its own doesn't remove the @llvm.experimental.get.vector.length
in #164762 since we also need to teach computeKnownBits about
@llvm.experimental.get.vector.length and the sub recurrence, but this PR
is a starting point.
I've added this in InstCombine rather than InstSimplify since we may
need to insert a truncation (@llvm.experimental.get.vector.length can
[3 lines not shown]
[MLIR][XeGPU] Add anchor_layout and update propagation to honor user-specified layouts (#169267)
Introduce anchor layout for XeGPU anchor ops: load_nd, store_nd,
prefetch_nd, dpas, load, store, prefetch, load_matrix, store_matrix, and
atomic_rmw. An anchor layout is permanent and is guaranteed to be honored
by XeGPU distribution and lowerings once specified.
1. Add anchor_layout for the XeGPU anchor ops: load_nd, store_nd,
prefetch_nd, dpas, load, store, prefetch, load_matrix, store_matrix, and
atomic_rmw.
2. Rename the layout attributes to anchor_layout for these ops: load, store,
load_matrix, and store_matrix.
3. Update the layout propagation pass: only when the user does not specify an
anchor layout does the pass compute a default layout, set it as the anchor
op's permanent layout, and use it for propagation. If the user specifies an
anchor layout, the pass takes the user-specified layout and uses that for
propagation.