[LLVM][CodeView] Add `S_REGREL32_INDIR` (#183172)
This adds `RegRelativeIndirSym` (`S_REGREL32_INDIR`) as a record, so we
can emit and dump it (#34392). It encodes a variable at the location
`*($Register + Offset) + OffsetInUdt` and is used by MSVC in C++20
coroutines and C++17 structured bindings. Clang also needs this for
coroutines (for `__promise`, which has the location `DW_OP_deref,
DW_OP_plus_uconst, 16`).
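To make the addressing rule concrete, here is a minimal sketch that contrasts `S_REGREL32` with `S_REGREL32_INDIR`. It is not LLVM code: memory is modeled as a flat byte buffer, an "address" is just an index into it, and the names `Memory`, `regRelAddress`, and `regRelIndirAddress` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model: memory is a flat byte buffer and an "address" is an
// index into it; `reg` stands for the value of the base register (e.g. rsp).
struct Memory {
  std::vector<std::uint8_t> bytes;
};

// S_REGREL32: the variable itself lives at $Register + Offset.
std::size_t regRelAddress(std::size_t reg, std::int32_t offset) {
  return reg + offset;
}

// S_REGREL32_INDIR: $Register + Offset holds a pointer; load it, then add
// OffsetInUdt, i.e. *($Register + Offset) + OffsetInUdt.
std::size_t regRelIndirAddress(const Memory &mem, std::size_t reg,
                               std::int32_t offset, std::int32_t offsetInUdt) {
  std::size_t slot = regRelAddress(reg, offset);
  std::uint64_t base = 0;
  assert(slot + sizeof(base) <= mem.bytes.size() && "pointer slot out of range");
  std::memcpy(&base, &mem.bytes[slot], sizeof(base));
  return static_cast<std::size_t>(base) + offsetInUdt;
}
```

With a pointer to a two-`int` object stored at `reg + 8`, `regRelIndirAddress(mem, reg, 8, 4)` loads that pointer and lands on the object's second `int`, which is how a structured binding to the second member would be located.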
For example:
```cpp
struct Foo { int a, b; };
void fn() {
Foo f = {1, 2};
// ╰─ S_REGREL32{ reg = rsp, offset = 0 }
auto &[x, y] = f;
// │ ╰─ S_REGREL32_INDIR{ reg = rsp, offset = 8, offset-in-udt = 4, type = int }
[17 lines not shown]
[Clang][AArch64] Remove duplicate CodeGen test for bf16 get/set intrinsics (#186084)
The following test files contain identical test bodies (aside from the
RUN lines):
* clang/test/CodeGen/AArch64/bf16-getset-intrinsics.c
* clang/test/CodeGen/arm-bf16-getset-intrinsics.c
The differences in the RUN lines do not appear to be relevant for the
tested functionality. This change keeps a single test file and
simplifies its RUN lines to match the generic style used in
clang/test/CodeGen/AArch64/neon.
This also moves toward unifying and reusing RUN lines across tests.
[HLSL] Implement Texture2D::operator[]
Implements the Texture2D::operator[] method. It uses the same design as
Buffer::operator[]. However, this requires us to change the
resource_getpointer intrinsic to accept integer vectors for the index.
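The key difference from Buffer::operator[] is that the index is a two-component integer vector instead of a scalar. The following C++ sketch is only a CPU-side analogy, not the HLSL or Clang implementation: `uint2`, `Texture2DModel`, and `getPointer` are hypothetical stand-ins, and row-major texel storage is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for HLSL's two-component integer index vector.
struct uint2 {
  std::uint32_t x, y;
};

// Hypothetical CPU-side model of a read-only 2D texture with texels stored
// row-major (an assumption made for this sketch).
template <typename T> class Texture2DModel {
public:
  Texture2DModel(std::uint32_t width, std::vector<T> texels)
      : width_(width), texels_(std::move(texels)) {}

  // Plays the role of resource_getpointer: the index is a vector (x, y)
  // rather than the single scalar a Buffer uses.
  const T *getPointer(uint2 idx) const {
    return &texels_[std::size_t(idx.y) * width_ + idx.x];
  }

  // Like Buffer::operator[], operator[] simply loads through that pointer.
  const T &operator[](uint2 idx) const { return *getPointer(idx); }

private:
  std::uint32_t width_;
  std::vector<T> texels_;
};
```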
Assisted-by: Gemini
[flang][OpenMP] Implement checks of intervening code (#185295)
Invalid intervening code will cause the containing loop to be the final
loop in the loop nest. Transparent intervening code will not affect
perfect nesting if present. Currently, compiler directives are considered
transparent so that code mixing OpenMP and such directives still compiles.
Issue: https://github.com/llvm/llvm-project/issues/185287
[VPlan] Simplify the computation of the block entry mask. (#173265)
When encountering a control-flow join, VPPredicator emits a disjunction
over the incoming edge masks as the entry mask of the joining block.
However, such a complex mask is not always necessary. If the block is
control-flow equivalent to the header block, we can directly use the
header block’s entry mask as the entry mask of that block.
This patch introduces a VPlan post-dominator tree to determine whether a
block is control-flow equivalent to the header block, and simplifies the
computation of block masks accordingly.
Based on #178724
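The check relies on the standard definition: blocks A and B are control-flow equivalent when A dominates B and B post-dominates A, so B executes exactly when A does and can reuse A's mask. The sketch below applies that definition to a small acyclic CFG using textbook iterative dominator sets; it is not VPlan's dominator-tree code, and `CFG`, `dominatorSets`, and `controlFlowEquivalent` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// succs[b] lists the successors of block b (an acyclic region is assumed).
using CFG = std::vector<std::vector<int>>;

// Reverse every edge of the graph.
static CFG reverseEdges(const CFG &g) {
  CFG r(g.size());
  for (std::size_t b = 0; b < g.size(); ++b)
    for (int s : g[b])
      r[s].push_back(static_cast<int>(b));
  return r;
}

// Iterative dataflow: dom(root) = {root}; for any other block b,
// dom(b) = {b} plus the intersection of dom(p) over its predecessors p.
static std::vector<std::set<int>> dominatorSets(const CFG &preds, int root) {
  const int n = static_cast<int>(preds.size());
  std::set<int> all;
  for (int i = 0; i < n; ++i)
    all.insert(i);
  std::vector<std::set<int>> dom(n, all);
  dom[root] = {root};
  for (bool changed = true; changed;) {
    changed = false;
    for (int b = 0; b < n; ++b) {
      if (b == root)
        continue;
      std::set<int> d = all;
      for (int p : preds[b]) {
        std::set<int> meet;
        for (int x : d)
          if (dom[p].count(x))
            meet.insert(x);
        d = std::move(meet);
      }
      d.insert(b);
      if (d != dom[b]) {
        dom[b] = std::move(d);
        changed = true;
      }
    }
  }
  return dom;
}

// a and b are control-flow equivalent iff a dominates b and b post-dominates
// a. Post-dominators are dominators of the reversed graph rooted at the exit.
bool controlFlowEquivalent(const CFG &succs, int entry, int exitBlock, int a,
                           int b) {
  std::vector<std::set<int>> dom = dominatorSets(reverseEdges(succs), entry);
  std::vector<std::set<int>> pdom = dominatorSets(succs, exitBlock);
  return dom[b].count(a) != 0 && pdom[a].count(b) != 0;
}
```

For a diamond `0 -> {1, 2} -> 3 -> 4`, the join block 3 is equivalent to header 0, so its entry mask could be the header's mask instead of `mask(1->3) | mask(2->3)`, while block 1 is not equivalent and keeps its own edge mask.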
[AMDGPU] Fix missing "---" in MIR test. NFCI. (#186097)
The only problem this caused was confusing the update script so that it
failed to update checks in the following function.
[libc] Use the proper name for the 'llvm-gpu-loader' (#186101)
Summary:
These used to be two separate executables, which were merged a while back. The
LLVM libc code was never updated to use the new tool name and a recent
refactoring unintentionally removed the symlinks. Just look for
`llvm-gpu-loader`.
[LowerMemIntrinsics][AMDGPU] Optimize memset.pattern lowering
This patch changes the lowering of the [experimental.memset.pattern intrinsic](https://llvm.org/docs/LangRef.html#llvm-experimental-memset-pattern-intrinsic)
to match the optimized memset and memcpy lowering when possible. (The tl;dr of
memset.pattern is that it is like memset, except that you can use it to set
values that are wider than a single byte.)
The memset.pattern lowering now queries `TTI::getMemcpyLoopLoweringType` for a
preferred memory access type. If the size of that type is a multiple of the set
value's type, and if both types have consistent store and alloc sizes (since
memset.pattern behaves in a way that is not well suited for access widening
if store and alloc size differ), the memset.pattern is lowered into two loops:
a main loop that stores a sufficiently wide vector splat of the SetValue with
the preferred memory access type and a residual loop that covers the remaining
set values individually.
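The shape of the resulting code can be sketched in plain C++. This is an analogy rather than the IR the pass emits: `memsetPatternSketch` is a hypothetical name, the template parameter `WideBytes` stands in for the size of the type returned by `TTI::getMemcpyLoopLoweringType`, and because C++ `sizeof` does not distinguish store size from alloc size, the sketch simply assumes they match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Store `count` copies of `value` starting at `dst`: a main loop of wide
// stores of a splat, then a residual loop of individual stores.
template <std::size_t WideBytes, typename T>
void memsetPatternSketch(unsigned char *dst, T value, std::size_t count) {
  static_assert(std::is_trivially_copyable<T>::value, "pattern must be POD-like");
  static_assert(WideBytes >= sizeof(T) && WideBytes % sizeof(T) == 0,
                "wide access size must be a multiple of the pattern size");
  constexpr std::size_t perChunk = WideBytes / sizeof(T);

  // The "vector splat": WideBytes bytes holding perChunk copies of value.
  unsigned char splat[WideBytes];
  for (std::size_t i = 0; i < perChunk; ++i)
    std::memcpy(splat + i * sizeof(T), &value, sizeof(T));

  // Main loop: one wide store per iteration.
  const std::size_t mainIters = count / perChunk;
  for (std::size_t i = 0; i < mainIters; ++i)
    std::memcpy(dst + i * WideBytes, splat, WideBytes);

  // Residual loop: the remaining values are stored one at a time.
  unsigned char *tail = dst + mainIters * WideBytes;
  for (std::size_t i = 0; i < count % perChunk; ++i)
    std::memcpy(tail + i * sizeof(T), &value, sizeof(T));
}
```

Setting seven 2-byte values with an 8-byte preferred access performs one wide store covering four values and three individual stores for the rest.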
In contrast to the memset lowering, this patch doesn't include a specialized
lowering for residual loops with known constant lengths. Loops that are
statically known to be unreachable will not be emitted.
[7 lines not shown]
[LowerMemIntrinsics] Avoid emitting unreachable loops in insertLoopExpansion
This patch refactors insertLoopExpansion and allows it to skip loops that are
statically known to be unreachable and make conditional branches with a
statically known condition unconditional. Those situations arise when the loop
count is a known constant.
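When the count is a compile-time constant, both trip counts fold to constants, and that is what makes loops provably dead or guards provably taken. The sketch below only models that decision; it is not the `insertLoopExpansion` interface, and `LoopPlan`, `planLoopExpansion`, and the `step` parameter (elements handled per main-loop iteration) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical summary of the control flow a loop expansion would emit.
struct LoopPlan {
  bool emitMainLoop;       // false when the main loop can never run
  bool emitResidualLoop;   // false when no residual elements can remain
  bool needsRuntimeChecks; // false when every guard condition is known
};

// With a known constant count, unreachable loops are dropped and the
// guarding branches become unconditional; otherwise both loops are emitted
// behind runtime checks.
LoopPlan planLoopExpansion(std::optional<std::uint64_t> knownCount,
                           std::uint64_t step) {
  assert(step != 0 && "main-loop step must be non-zero");
  if (!knownCount)
    return {true, true, true};
  return {*knownCount / step != 0, *knownCount % step != 0, false};
}
```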
These cases don't occur at the existing call sites in the memcpy and memset
lowering, since they have custom handling for constant loop sizes anyway. They
will however occur in a follow-up patch that uses insertLoopExpansion for
memset.pattern, where similar custom handling for constant loop sizes would
make less sense.
This is mostly NFC for the current uses, except for slight changes in the branch
weight computation from profiling data (which cause the included test
changes).
AMDGPU: Add more tests for fp min/max combines (#184336)
There's some overlap with existing tests which
use the nnan flag. The vector cases get missed here.
[OpenMP][OMPT] Remove Threads dependency from omptest (#185930)
Removed link against `Threads`.
Reason: it is potentially problematic and optional.
The issue would manifest if `omptest` is used via `find_package`, since
`Threads` might not be found, causing a link error.
[OpenMP] Add variable capture support for transparent clause expression. (#185419)
This patch extends the `transparent` clause implementation to properly
handle runtime variable expressions as the `impex-type` argument, as
required by the OpenMP specification:
`"The use of a variable in an impex-type expression causes an implicit
reference to the variable in all enclosing constructs. The impex-type
expression is evaluated in the context outside of the construct on which
the clause appears."`
[llvm][Support] formatv: non-negative-plus for integral numbers (#185008)
The older `format()` allows you to print a `+` sign for non-negative
integral numbers upon request.
Examples:
```c++
format("%+d", 255); // -> "+255"
format("%+d", -12); // -> "-12"
```
This change adds the ability to do the same with `formatv()`:
```c++
formatv("{0:+d}", 255); // -> "+255"
formatv("{0:+d}", -12); // -> "-12"
```
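The sign rule itself is small: non-negative values, including zero, get a leading `+`, and negative values keep their `-`, matching the `%+d` semantics of `printf`. The helper below is a hypothetical illustration of that rule, not part of LLVM's `formatv` implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper showing the "+" rule: prefix '+' for value >= 0.
std::string formatWithPlus(std::int64_t value) {
  std::string digits = std::to_string(value);
  return value >= 0 ? "+" + digits : digits;
}
```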
[9 lines not shown]
[flang][OpenMP] Allow parsing ODS as directive-specification list item (#185737)
Normally a directive specification may use commas between the directive
name and the clauses, and between the clauses. There are some instances,
however, when a directive-specification is treated as a list item.
Specifically, this happens in arguments to the APPLY clause and as an argument
to WHEN, OTHERWISE, and the now-deprecated DEFAULT when used on a
METADIRECTIVE. In those cases, the use of commas is prohibited to avoid
confusion between commas that are part of the directive-specification and
the argument list separators.
[AMDGPU] Codegen for min/max instructions for gfx1170 (#185625)
gfx1170 does not have s_minimum/maximum_f16/f32 instructions, so a new
feature `SALUMinimumMaximumInsts` is added for gfx12+ subtargets.