[OpenMP][clang] Indirect and Virtual function call mapping from host to device (#159857)
This patch implements the CodeGen logic for calling __llvm_omp_indirect_call_lookup
on the device when an indirect function call or a virtual function call is made
within an OpenMP target region.
---------
Co-authored-by: Youngsuk Kim
[AMDGPU] Insert readfirstlane for uniform VGPR arguments (#178198)
Fix inreg argument, which is uniform, but using VGPR due to run out of
SGPR.
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault at amd.com>
[HLSL] Add globals for resources embedded in structs
For each resource or resource array member of a struct declared
at global scope or inside a cbuffer, create an implicit global
variable of the same resource type. The variable name will be
derived from the struct instance name and the member name.
The new global is associated with the struct declaration using
a new attribute HLSLAssociatedResourceDeclAttr.
Closes #182988
[mlir][acc] Add ACCRecipeMaterialization pass and reduction ops (#184252)
Pass
----
Add the `acc-recipe-materialization` pass, which materializes OpenACC
privatization, firstprivate and reduction recipes by inlining their
init, copy, combiner, and destroy regions into the operation for the
construct. The pass runs on acc.parallel, acc.serial, acc.kernels, and
acc.loop.
- Firstprivate: Inserts acc.firstprivate_map so the initial value is
available on the device, then clones the recipe init and copy regions
into the construct and replaces uses with the materialized alloca.
Optional destroy region is cloned before the region terminator.
- Private: Clones the recipe init region into the construct (at region
entry or at the loop op for acc.loop private). Replaces uses of the
recipe result with the materialized alloca. Optional destroy region is
cloned before the region terminator.
[42 lines not shown]
[Github] Respect LLVM_VERSION when building windows container (#184231)
Otherwise setting LLVM_VERSION does not actually do anything. This
avoids needing to update ~8 different locations in the file when doing a
toolchain bump to just 1 place.
[Github] Bump Github Runner to v2.332.0 (#184230)
To stay ahead of the support horizon. There were no major feature
changes/bug fixes from a cursory glance at the release notes.
[Clang] Add missing extension cl_intel_split_work_group_barrier declaration (#184269)
All the OpenCL extensions must be declared in OpenCLExtensions.def,
otherwise the frontend won't recognize them and won't be able to use
them in the code. This patch adds the missing declaration for the
`cl_intel_split_work_group_barrier` extension.
[CIR] Upstream basic CodeGen tests from incubator (#183998)
This PR upstreams `expressions.cpp` and `c89-implicit-int.c` from the
ClangIR incubator to the mainline.
Following the incremental approach discussed in #156747 and the feedback
from the closed PR #157333, I have:
1. Copied the files directly from the incubator to preserve history.
2. Updated the `RUN` lines to use the `--check-prefix=CIR` flag.
3. Converted `CHECK:` lines to `CIR:`.
4. Standardized variable captures using the `%[[VAR:.*]]` regex syntax
(in `expressions.cpp`).
Verified locally with `llvm-lit`. This is a partial fix for #156747.
*Note: As suggested in previous reviews, I am focusing only on the `CIR`
checks for now to keep the upstreaming incremental. OGCG/LLVM
verification can be added in a follow-up PR once the base tests land.*
[RISCV] Update Andes45 vector reduction scheduling info (#182980)
This PR adds latency/throughput for all RVV reductions to the andes45
series scheduling model.
[clang-doc] Improve complexity of Index construction
The existing implementation ends up with an O(N^2) algorithm due to
repeated linear scans during index construction. Switching to a
StringMap allows us to reduce this to O(N), since we no longer need to
search the vector.
The `BM_Index_Insertion` benchmark measures the time taken to insert N
unique records into the index.
| Scale (N Items) | Baseline (ns) | Patched (ns) | Speedup | Change |
|----------------:|--------------:|-------------:|--------:|-------:|
| 10 | 9,977 | 11,004 | 0.91x | +10.3% |
| 64 | 69,249 | 69,166 | 1.00x | -0.1% |
| 512 | 1,932,714 | 525,877 | 3.68x | -72.8% |
| 4,096 | 92,411,535 | 4,589,030 | 20.1x | -95.0% |
| 10,000 | 577,384,945 | 12,998,039 | 44.4x | -97.7% |
The patch delivers significant improvements to scalability. At 10,000
[13 lines not shown]
[clang-doc] Add basic benchmarks for library functionality
clang-doc's performance is good, but we suspect it could be better. To
track this with more fidelity, we can add a set of GoogleBenchmarks that
exercise portions of the library. To start we try to track high level
items that we monitor via the TimeTrace functions, and give them their
own micro benchmarks. This should give us more confidence that switching
out data structures or updating algorthms will have a positive
performance impact.
Note that an LLM helped generate portions of the benchmarks and
parameterize them. Most of the internal logic was written by me, but
the LLM was used to handle boilerplate and adaptation to the harness.
[NFC][VPlan] Add initial tests for future VPlan-based stride MV
I tried to include both the features that current
LoopAccessAnalysis-based transformation supports (e.g., trunc/sext of
stride) but also cases where the current implementation behaves poorly,
e.g., https://godbolt.org/z/h31c3zKxK; as well as some other potentially
interesting scenarios I could imagine.
The are two test files with the same content. One is for VPlan dump change of
the future transformation alone (I'll update `-vplan-print-after` in the next
PR), another is for the full vectorizer pipeline. The latter have two `RUN:`
lines:
* No multiversioning, so the next PR diff can show the transformation itself
* Stride multiversionin performed in LAA, so that we can compare future
VPlan-based transformation vs old behavior.
[VPlan] Scalarize to first-lane-only directly on VPlan
This is needed to enable subsequent https://github.com/llvm/llvm-project/pull/182595.
I don't think we can fully port all scalarization logic from the legacy
path to VPlan-based right now because that would require us to introduce
interleave groups much earlier in VPlan pipeline, and without that we
can't really `assert` this new decision matches the previous CM-based
one. And without those `assert`s it's really hard to ensure we properly
port all the previous logic.
As such, I decided just to implement something much simpler that would
be enough for #182595. However, we perform this transformation before
delegating to the old CM-based decision, so it **is** effective
immediately and taking precedence even for consecutive loads/stores
right away.
Depends on https://github.com/llvm/llvm-project/pull/182592 but is stacked on
top of https://github.com/llvm/llvm-project/pull/182594 to enable linear
stacking for https://github.com/llvm/llvm-project/pull/182595.
[NFC][OpenMP] Remove redundant prints in `target` regions from tests added in #184260. (#184266)
Some buildbots don't like them, and the correctness of the values in the
`target` region is ensured via prints after the region.
[VPlan] Scalarize to first-lane-only directly on VPlan
This is needed to enable subsequent https://github.com/llvm/llvm-project/pull/182595.
I don't think we can fully port all scalarization logic from the legacy
path to VPlan-based right now because that would require us to introduce
interleave groups much earlier in VPlan pipeline, and without that we
can't really `assert` this new decision matches the previous CM-based
one. And without those `assert`s it's really hard to ensure we properly
port all the previous logic.
As such, I decided just to implement something much simpler that would
be enough for #182595. However, we perform this transformation before
delegating to the old CM-based decision, so it **is** effective
immediately and taking precedence even for consecutive loads/stores
right away.
Depends on https://github.com/llvm/llvm-project/pull/182592 but is stacked on
top of https://github.com/llvm/llvm-project/pull/182594 to enable linear
stacking for https://github.com/llvm/llvm-project/pull/182595.
[VPlan] Scalarize to first-lane-only directly on VPlan
This is needed to enable subsequent https://github.com/llvm/llvm-project/pull/182595.
I don't think we can fully port all scalarization logic from the legacy
path to VPlan-based right now because that would require us to introduce
interleave groups much earlier in VPlan pipeline, and without that we
can't really `assert` this new decision matches the previous CM-based
one. And without those `assert`s it's really hard to ensure we properly
port all the previous logic.
As such, I decided just to implement something much simpler that would
be enough for #182595. However, we perform this transformation before
delegating to the old CM-based decision, so it **is** effective
immediately and taking precedence even for consecutive loads/stores
right away.
Reland "[OpenMP][Offload] Handle `present/to/from` when a different entry did `alloc/delete`." (#184260)
Some tests that were checking for prints inside/outside `target` regions
needed to be updated to work on systems where the ordering wasn't
deterministic.
Reverts llvm/llvm-project#184240
Original description from #165494:
-----
OpenMP allows cases like the following:
```c
int *p1, *p2, x;
p1 = p2 = &x;
...
#pragma omp target_exit_data map(delete: p1[:]) from(p2[0])
[35 lines not shown]
[lldb] Terminate the LLDB Log in SystemInitializerCommon::Terminate (#184261)
Currently, when calling SBDebugger::Initialize after
SBDebugger::Terminate, you hit an assert in LLDBLog when trying to
register the LLDB log a second time. Also fix the awkward
capitalization.