LLVM/project b234386offload/test/api omp_virtual_func_multiple_inheritance_02.cpp omp_virtual_func_multiple_inheritance_01.cpp

[OpenMP][clang] Indirect and Virtual function call mapping from host to device (#159857)

This patch implements the CodeGen logic for calling __llvm_omp_indirect_call_lookup
on the device when an indirect function call or a virtual function call is made
within an OpenMP target region.
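
For illustration, here is a minimal sketch of the kind of user code this affects: a function pointer taken on the host and called inside a target region. The function names (`add_one`, `twice`, `call_on_device`) are hypothetical and do not come from the patch or its tests. The sketch assumes the OpenMP `indirect` clause on `begin declare target`. When built without OpenMP offloading, the pragmas are ignored and the call simply runs on the host.

```cpp
#include <cassert>

// Functions that may be called through a pointer inside a target region.
// The `indirect` clause asks the compiler to make them reachable via
// host-to-device function pointer translation.
#pragma omp begin declare target indirect
int add_one(int x) { return x + 1; }
int twice(int x) { return 2 * x; }
#pragma omp end declare target

// Hypothetical helper: calls `fn` inside a target region. `fn` holds a host
// address, so on a device the call has to go through an address lookup.
int call_on_device(int (*fn)(int), int arg) {
  int result = 0;
#pragma omp target map(tofrom : result)
  { result = fn(arg); }
  return result;
}
```

Calling `call_on_device(add_one, 41)` should produce the same value as `add_one(41)` on the host, whether or not the region is offloaded.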
---------
Co-authored-by: Youngsuk Kim
DeltaFile
+403-0offload/test/api/omp_virtual_func_multiple_inheritance_02.cpp
+400-0offload/test/api/omp_virtual_func_multiple_inheritance_01.cpp
+322-0offload/test/api/omp_indirect_func_struct.c
+153-0offload/test/api/omp_virtual_func.cpp
+124-0offload/test/api/omp_indirect_func_array.c
+95-0offload/test/api/omp_indirect_func_basic.c
+1,497-014 files not shown
+1,808-120 files

LLVM/project 908782fclang/lib/Sema SemaHLSL.cpp

Reorder and format
DeltaFile
+47-53clang/lib/Sema/SemaHLSL.cpp
+47-531 files

LLVM/project 572a0e4llvm/lib/Target/AMDGPU SIInstrInfo.cpp

AMDGPU: Remove "MBUF" from "loadMBUFScalarOperandsFromVGPR" (#184282)

There is nothing MBUF-specific about this function.
DeltaFile
+11-12llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+11-121 files

LLVM/project 6d25af0llvm/utils/lit/lit TestRunner.py display.py

[utils] use annotations from __future__ in lit (#184225)

DeltaFile
+4-6llvm/utils/lit/lit/TestRunner.py
+3-3llvm/utils/lit/lit/display.py
+7-92 files

LLVM/project 768240dllvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.512bit.ll

[AMDGPU] Insert readfirstlane for uniform VGPR arguments (#178198)

Fix handling of inreg arguments that are uniform but end up in VGPRs
because the SGPRs have run out.

---------

Co-authored-by: Matt Arsenault <Matthew.Arsenault at amd.com>
DeltaFile
+84,419-78,498llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+25,751-24,782llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+23,663-20,281llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+21,867-18,577llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+19,112-16,445llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll
+17,646-15,131llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.768bit.ll
+192,458-173,71432 files not shown
+244,721-216,71538 files

LLVM/project 43a2695llvm/lib/Target/AMDGPU SIInstrInfo.cpp

AMDGPU: Remove "MBUF" from "loadMBUFScalarOperandsFromVGPR"

There is nothing MBUF-specific about this function.

commit-id:3c711dc9
DeltaFile
+11-12llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+11-121 files

LLVM/project 1a3c736clang/include/clang/AST HLSLResource.h, clang/include/clang/Basic Attr.td

[HLSL] Add globals for resources embedded in structs

For each resource or resource array member of a struct declared
at global scope or inside a cbuffer, create an implicit global
variable of the same resource type. The variable name will be
derived from the struct instance name and the member name.

The new global is associated with the struct declaration using
a new attribute HLSLAssociatedResourceDeclAttr.

Closes #182988
DeltaFile
+163-8clang/lib/Sema/SemaHLSL.cpp
+167-0clang/test/AST/HLSL/resources-in-structs.hlsl
+46-0clang/lib/AST/HLSLResource.cpp
+34-0clang/include/clang/AST/HLSLResource.h
+8-6clang/include/clang/Sema/SemaHLSL.h
+8-0clang/include/clang/Basic/Attr.td
+426-143 files not shown
+440-149 files

LLVM/project 99a6b3eclang-tools-extra/clang-doc/assets/md class-template.mustache namespace-template.mustache, clang-tools-extra/test/clang-doc enum.cpp templates.cpp

fix conflicts and update tests
DeltaFile
+4-10clang-tools-extra/test/clang-doc/enum.cpp
+5-5clang-tools-extra/test/clang-doc/templates.cpp
+1-1clang-tools-extra/clang-doc/assets/md/class-template.mustache
+1-1clang-tools-extra/clang-doc/assets/md/namespace-template.mustache
+11-174 files

LLVM/project e63e55cflang/test/Transforms/OpenACC acc-recipe-materialization-firstprivate-derived.fir, mlir/include/mlir/Dialect/OpenACC OpenACCCGOps.td

[mlir][acc] Add ACCRecipeMaterialization pass and reduction ops (#184252)

Pass
----
Add the `acc-recipe-materialization` pass, which materializes OpenACC
privatization, firstprivate and reduction recipes by inlining their
init, copy, combiner, and destroy regions into the operation for the
construct. The pass runs on acc.parallel, acc.serial, acc.kernels, and
acc.loop.

- Firstprivate: Inserts acc.firstprivate_map so the initial value is
available on the device, then clones the recipe init and copy regions
into the construct and replaces uses with the materialized alloca.
Optional destroy region is cloned before the region terminator.

- Private: Clones the recipe init region into the construct (at region
entry or at the loop op for acc.loop private). Replaces uses of the
recipe result with the materialized alloca. Optional destroy region is
cloned before the region terminator.

    [42 lines not shown]
DeltaFile
+459-0mlir/lib/Dialect/OpenACC/Transforms/ACCRecipeMaterialization.cpp
+59-40mlir/lib/Dialect/OpenACC/Utils/OpenACCUtilsLoop.cpp
+86-0mlir/unittests/Dialect/OpenACC/OpenACCUtilsLoopTest.cpp
+66-0mlir/include/mlir/Dialect/OpenACC/OpenACCCGOps.td
+63-0mlir/lib/Dialect/OpenACC/IR/OpenACCCG.cpp
+60-0flang/test/Transforms/OpenACC/acc-recipe-materialization-firstprivate-derived.fir
+793-4017 files not shown
+1,329-4123 files

LLVM/project 92aa2d3.github/workflows/containers/github-action-ci-windows Dockerfile

[Github] Respect LLVM_VERSION when building windows container (#184231)

Otherwise setting LLVM_VERSION does not actually do anything. With this
change, a toolchain bump only needs to update one place in the file
instead of ~8 different locations.
DeltaFile
+5-5.github/workflows/containers/github-action-ci-windows/Dockerfile
+5-51 files

LLVM/project 52f32d7.github/workflows/containers/github-action-ci Dockerfile, .github/workflows/containers/github-action-ci-windows Dockerfile

[Github] Bump Github Runner to v2.332.0 (#184230)

To stay ahead of the support horizon. There were no major feature
changes/bug fixes from a cursory glance at the release notes.
DeltaFile
+1-1.github/workflows/containers/github-action-ci-windows/Dockerfile
+1-1.github/workflows/containers/github-action-ci/Dockerfile
+2-22 files

LLVM/project 8decfb8mlir/lib/Conversion/ArithToEmitC ArithToEmitCPass.cpp, mlir/lib/Conversion/FuncToEmitC FuncToEmitCPass.cpp

[mlir][emitc] Do not convert illegal types to emitc (#156222)

This PR adds fallbacks for other types instead of converting unsupported
types to emitc.
DeltaFile
+8-0mlir/test/Conversion/ArithToEmitC/arith-to-emitc-failed.mlir
+6-1mlir/lib/Conversion/ArithToEmitC/ArithToEmitCPass.cpp
+6-1mlir/lib/Conversion/FuncToEmitC/FuncToEmitCPass.cpp
+6-0mlir/test/Conversion/FuncToEmitC/func-to-emitc-failed.mlir
+26-24 files

LLVM/project 2407564clang/include/clang/Basic OpenCLExtensions.def, clang/test/SemaOpenCL extension-version.cl

[Clang] Add missing extension cl_intel_split_work_group_barrier declaration (#184269)

All the OpenCL extensions must be declared in OpenCLExtensions.def,
otherwise the frontend won't recognize them and won't be able to use
them in the code. This patch adds the missing declaration for the
`cl_intel_split_work_group_barrier` extension.
DeltaFile
+12-0clang/test/SemaOpenCL/extension-version.cl
+1-0clang/include/clang/Basic/OpenCLExtensions.def
+13-02 files

LLVM/project a6fa21cclang/test/CIR/CodeGen c89-implicit-int.c expressions.cpp

[CIR] Upstream basic CodeGen tests from incubator (#183998)

This PR upstreams `expressions.cpp` and `c89-implicit-int.c` from the
ClangIR incubator to the mainline.

Following the incremental approach discussed in #156747 and the feedback
from the closed PR #157333, I have:
1. Copied the files directly from the incubator to preserve history.
2. Updated the `RUN` lines to use the `--check-prefix=CIR` flag.
3. Converted `CHECK:` lines to `CIR:`.
4. Standardized variable captures using the `%[[VAR:.*]]` regex syntax
(in `expressions.cpp`).

Verified locally with `llvm-lit`. This is a partial fix for #156747.

*Note: As suggested in previous reviews, I am focusing only on the `CIR`
checks for now to keep the upstreaming incremental. OGCG/LLVM
verification can be added in a follow-up PR once the base tests land.*
DeltaFile
+11-0clang/test/CIR/CodeGen/c89-implicit-int.c
+11-0clang/test/CIR/CodeGen/expressions.cpp
+22-02 files

LLVM/project 82319d7llvm/lib/Target/RISCV RISCVSchedAndes45.td, llvm/test/tools/llvm-mca/RISCV/Andes45 rvv-reduction.s

[RISCV] Update Andes45 vector reduction scheduling info (#182980)

This PR adds latency/throughput for all RVV reductions to the andes45
series scheduling model.
DeltaFile
+589-589llvm/test/tools/llvm-mca/RISCV/Andes45/rvv-reduction.s
+109-8llvm/lib/Target/RISCV/RISCVSchedAndes45.td
+698-5972 files

LLVM/project 2cfc12aclang-tools-extra/clang-doc YAMLGenerator.cpp JSONGenerator.cpp

Format
DeltaFile
+2-4clang-tools-extra/clang-doc/YAMLGenerator.cpp
+3-2clang-tools-extra/clang-doc/JSONGenerator.cpp
+2-1clang-tools-extra/clang-doc/MDGenerator.cpp
+1-1clang-tools-extra/clang-doc/Representation.cpp
+8-84 files

LLVM/project 9a4420fclang-tools-extra/clang-doc MDGenerator.cpp Generators.cpp, clang-tools-extra/unittests/clang-doc GeneratorTest.cpp ClangDocTest.cpp

[clang-doc] Improve complexity of Index construction

The existing implementation ends up with an O(N^2) algorithm due to
repeated linear scans during index construction. Switching to a
StringMap allows us to reduce this to O(N), since we no longer need to
search the vector.
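
As a rough sketch of the complexity argument (not the clang-doc code itself), the two classes below contrast a vector-backed index that scans on every insertion with a hash-map-backed one. `Record`, `LinearIndex`, and `MapIndex` are hypothetical names, and `std::unordered_map` stands in for `llvm::StringMap` so the example builds without LLVM headers.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct Record {
  std::string name;
  int hits;
};

// Baseline shape: every insertion scans the vector for an existing entry,
// so inserting N unique names costs O(N^2) comparisons in total.
class LinearIndex {
public:
  // Returns true if `name` was newly inserted.
  bool insert(const std::string &name) {
    for (const Record &r : records_)
      if (r.name == name)
        return false;
    records_.push_back(Record{name, 0});
    return true;
  }
  std::size_t size() const { return records_.size(); }

private:
  std::vector<Record> records_;
};

// Patched shape: a hash map keyed by name makes each insertion O(1) on
// average, so N insertions cost O(N) on average.
class MapIndex {
public:
  // Returns true if `name` was newly inserted.
  bool insert(const std::string &name) {
    return records_.emplace(name, Record{name, 0}).second;
  }
  std::size_t size() const { return records_.size(); }

private:
  std::unordered_map<std::string, Record> records_;
};
```

Both classes deduplicate the same way; only the cost of finding an existing entry differs.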

The `BM_Index_Insertion` benchmark measures the time taken to insert N
unique records into the index.

| Scale (N Items) | Baseline (ns) | Patched (ns) | Speedup | Change |
|----------------:|--------------:|-------------:|--------:|-------:|
| 10              | 9,977         | 11,004       | 0.91x   | +10.3% |
| 64              | 69,249        | 69,166       | 1.00x   | -0.1%  |
| 512             | 1,932,714     | 525,877      | 3.68x   | -72.8% |
| 4,096           | 92,411,535    | 4,589,030    | 20.1x   | -95.0% |
| 10,000          | 577,384,945   | 12,998,039   | 44.4x   | -97.7% |

The patch delivers significant improvements to scalability. At 10,000

    [13 lines not shown]
DeltaFile
+71-17clang-tools-extra/unittests/clang-doc/GeneratorTest.cpp
+21-10clang-tools-extra/clang-doc/MDGenerator.cpp
+13-11clang-tools-extra/clang-doc/Generators.cpp
+11-5clang-tools-extra/clang-doc/JSONGenerator.cpp
+3-3clang-tools-extra/clang-doc/YAMLGenerator.cpp
+2-2clang-tools-extra/unittests/clang-doc/ClangDocTest.cpp
+121-482 files not shown
+124-518 files

LLVM/project c092d36clang-tools-extra/clang-doc/benchmarks ClangDocBenchmark.cpp

Add license header
DeltaFile
+20-6clang-tools-extra/clang-doc/benchmarks/ClangDocBenchmark.cpp
+20-61 files

LLVM/project e48739bclang-tools-extra/clang-doc/benchmarks CMakeLists.txt

Simplify CMake handling of include_directories
DeltaFile
+5-7clang-tools-extra/clang-doc/benchmarks/CMakeLists.txt
+5-71 files

LLVM/project a126145clang-tools-extra/clang-doc CMakeLists.txt, clang-tools-extra/clang-doc/benchmarks ClangDocBenchmark.cpp CMakeLists.txt

[clang-doc] Add basic benchmarks for library functionality

clang-doc's performance is good, but we suspect it could be better. To
track this with more fidelity, we can add a set of GoogleBenchmarks that
exercise portions of the library. To start we try to track high level
items that we monitor via the TimeTrace functions, and give them their
own micro benchmarks. This should give us more confidence that switching
out data structures or updating algorithms will have a positive
performance impact.

Note that an LLM helped generate portions of the benchmarks and
parameterize them. Most of the internal logic was written by me, but
the LLM was used to handle boilerplate and adaptation to the harness.
DeltaFile
+220-0clang-tools-extra/clang-doc/benchmarks/ClangDocBenchmark.cpp
+20-0clang-tools-extra/clang-doc/benchmarks/CMakeLists.txt
+4-0clang-tools-extra/clang-doc/CMakeLists.txt
+244-03 files

LLVM/project 44c4070clang-tools-extra/clang-doc/benchmarks CMakeLists.txt

Add missing newline
DeltaFile
+1-1clang-tools-extra/clang-doc/benchmarks/CMakeLists.txt
+1-11 files

LLVM/project 4103855llvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlan.h, llvm/test/Transforms/LoopVectorize vplan-based-stride-mv.ll

[VPlan] Implement VPlan-based stride speculation
DeltaFile
+986-1,154llvm/test/Transforms/LoopVectorize/vplan-based-stride-mv.ll
+289-160llvm/test/Transforms/LoopVectorize/VPlan/vplan-based-stride-mv.ll
+249-3llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+43-0llvm/lib/Transforms/Vectorize/VPlan.h
+5-5llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+7-0llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+1,579-1,3225 files not shown
+1,601-1,32511 files

LLVM/project db7d04dllvm/test/Transforms/LoopVectorize vplan-based-stride-mv.ll, llvm/test/Transforms/LoopVectorize/VPlan vplan-based-stride-mv.ll

[NFC][VPlan] Add initial tests for future VPlan-based stride MV

I tried to include both the features that the current
LoopAccessAnalysis-based transformation supports (e.g., trunc/sext of
stride) and cases where the current implementation behaves poorly,
e.g., https://godbolt.org/z/h31c3zKxK, as well as some other potentially
interesting scenarios I could imagine.

There are two test files with the same content. One is for the VPlan dump change of
the future transformation alone (I'll update `-vplan-print-after` in the next
PR); the other is for the full vectorizer pipeline. The latter has two `RUN:`
lines:
 * No multiversioning, so the next PR diff can show the transformation itself
 * Stride multiversioning performed in LAA, so that we can compare the future
   VPlan-based transformation vs. the old behavior.
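
For readers unfamiliar with stride multiversioning, the idea can be sketched at source level as below. This is only an illustration of the concept, not what LAA or VPlan emits: the vectorizer works on IR, and the function name `scaleStrided` is hypothetical. A loop whose stride is only known at run time is duplicated behind a `stride == 1` check, so the speculated copy accesses memory contiguously.

```cpp
#include <cstddef>
#include <vector>

// Doubles `n` elements of `data`, visiting every `stride`-th element.
// Assumes `data` holds at least (n - 1) * stride + 1 elements.
void scaleStrided(std::vector<float> &data, std::size_t n, std::size_t stride) {
  if (stride == 1) {
    // Speculated version: unit stride, so accesses are consecutive and the
    // loop can use plain wide loads and stores.
    for (std::size_t i = 0; i < n; ++i)
      data[i] *= 2.0f;
  } else {
    // Fallback version: arbitrary stride, same semantics as the original loop.
    for (std::size_t i = 0; i < n; ++i)
      data[i * stride] *= 2.0f;
  }
}
```

Both branches compute the same result whenever `stride == 1`, which is what makes the runtime check safe.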
DeltaFile
+4,736-0llvm/test/Transforms/LoopVectorize/vplan-based-stride-mv.ll
+3,381-0llvm/test/Transforms/LoopVectorize/VPlan/vplan-based-stride-mv.ll
+8,117-02 files

LLVM/project c13d776llvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlanTransforms.h, llvm/test/Transforms/LoopVectorize pr37248.ll runtime-check-needed-but-empty.ll

[VPlan] Scalarize to first-lane-only directly on VPlan

This is needed to enable subsequent https://github.com/llvm/llvm-project/pull/182595.

I don't think we can fully port all scalarization logic from the legacy
path to VPlan-based right now because that would require us to introduce
interleave groups much earlier in VPlan pipeline, and without that we
can't really `assert` this new decision matches the previous CM-based
one. And without those `assert`s it's really hard to ensure we properly
port all the previous logic.

As such, I decided just to implement something much simpler that would
be enough for #182595. However, we perform this transformation before
delegating to the old CM-based decision, so it **is** effective
immediately and takes precedence even for consecutive loads/stores
right away.

Depends on https://github.com/llvm/llvm-project/pull/182592 but is stacked on
top of https://github.com/llvm/llvm-project/pull/182594 to enable linear
stacking for https://github.com/llvm/llvm-project/pull/182595.
DeltaFile
+65-0llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+6-0llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+2-2llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll
+3-0llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+1-1llvm/test/Transforms/LoopVectorize/pr37248.ll
+1-1llvm/test/Transforms/LoopVectorize/runtime-check-needed-but-empty.ll
+78-41 files not shown
+79-47 files

LLVM/project 4f91d0blibcxx/test/benchmarks/numeric gcd.bench.cpp, libcxx/test/benchmarks/streams getline.bench.cpp ofstream.bench.cpp

[libc++] Give proper names to a few benchmarks (#183333)

DeltaFile
+3-3libcxx/test/benchmarks/numeric/gcd.bench.cpp
+1-1libcxx/test/benchmarks/streams/getline.bench.cpp
+1-1libcxx/test/benchmarks/streams/ofstream.bench.cpp
+5-53 files

LLVM/project 0ced81foffload/test/mapping map_ordering_tgt_alloc_tofrom.c map_ordering_tgt_data_alloc_to_from.c

[NFC][OpenMP] Remove redundant prints in `target` regions from tests added in #184260. (#184266)

Some buildbots don't like them, and the correctness of the values in the
`target` region is ensured via prints after the region.
DeltaFile
+1-2offload/test/mapping/map_ordering_tgt_alloc_tofrom.c
+1-2offload/test/mapping/map_ordering_tgt_data_alloc_to_from.c
+1-2offload/test/mapping/map_ordering_tgt_data_alloc_tofrom.c
+1-2offload/test/mapping/map_ordering_ptee_tgt_alloc_mapper_alloc_from_to.c
+1-2offload/test/mapping/map_ordering_tgt_alloc_from_to.c
+5-105 files

LLVM/project aa4bba3llvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlanTransforms.h, llvm/test/Transforms/LoopVectorize runtime-check-needed-but-empty.ll pr37248.ll

[VPlan] Scalarize to first-lane-only directly on VPlan

This is needed to enable subsequent https://github.com/llvm/llvm-project/pull/182595.

I don't think we can fully port all scalarization logic from the legacy
path to VPlan-based right now because that would require us to introduce
interleave groups much earlier in VPlan pipeline, and without that we
can't really `assert` this new decision matches the previous CM-based
one. And without those `assert`s it's really hard to ensure we properly
port all the previous logic.

As such, I decided just to implement something much simpler that would
be enough for #182595. However, we perform this transformation before
delegating to the old CM-based decision, so it **is** effective
immediately and takes precedence even for consecutive loads/stores
right away.

Depends on https://github.com/llvm/llvm-project/pull/182592 but is stacked on
top of https://github.com/llvm/llvm-project/pull/182594 to enable linear
stacking for https://github.com/llvm/llvm-project/pull/182595.
DeltaFile
+65-0llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+6-0llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+2-2llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll
+3-0llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+1-1llvm/test/Transforms/LoopVectorize/runtime-check-needed-but-empty.ll
+1-1llvm/test/Transforms/LoopVectorize/pr37248.ll
+78-41 files not shown
+79-47 files

LLVM/project c0a7bb3llvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlanTransforms.h, llvm/test/Transforms/LoopVectorize runtime-check-needed-but-empty.ll pr37248.ll

[VPlan] Scalarize to first-lane-only directly on VPlan

This is needed to enable subsequent https://github.com/llvm/llvm-project/pull/182595.

I don't think we can fully port all scalarization logic from the legacy
path to VPlan-based right now because that would require us to introduce
interleave groups much earlier in VPlan pipeline, and without that we
can't really `assert` this new decision matches the previous CM-based
one. And without those `assert`s it's really hard to ensure we properly
port all the previous logic.

As such, I decided just to implement something much simpler that would
be enough for #182595. However, we perform this transformation before
delegating to the old CM-based decision, so it **is** effective
immediately and takes precedence even for consecutive loads/stores
right away.
DeltaFile
+65-0llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+6-0llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+2-2llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll
+3-0llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+1-1llvm/test/Transforms/LoopVectorize/runtime-check-needed-but-empty.ll
+1-1llvm/test/Transforms/LoopVectorize/pr37248.ll
+78-41 files not shown
+79-47 files

LLVM/project 1d1c83aoffload/include/OpenMP Mapping.h, offload/libomptarget omptarget.cpp

Reland "[OpenMP][Offload] Handle `present/to/from` when a different entry did `alloc/delete`." (#184260)

Some tests that were checking for prints inside/outside `target` regions
needed to be updated to work on systems where the ordering wasn't
deterministic.

Reverts llvm/llvm-project#184240
    
Original description from #165494:

-----

OpenMP allows cases like the following:

```c
  int *p1, *p2, x;
  p1 = p2 = &x;
  ...
  #pragma omp target_exit_data map(delete: p1[:]) from(p2[0])

    [35 lines not shown]
DeltaFile
+223-50offload/libomptarget/omptarget.cpp
+103-15offload/include/OpenMP/Mapping.h
+50-0offload/test/mapping/map_ordering_tgt_exit_data_from_mapper_overlap.c
+49-0offload/test/mapping/map_ordering_ptee_tgt_data_alloc_tgt_mapper_present_delete_from_to.c
+48-0offload/test/mapping/map_ordering_ptee_tgt_alloc_mapper_alloc_from_to.c
+43-0offload/test/mapping/map_ordering_tgt_exit_data_delete_from_assumedsize.c
+516-6511 files not shown
+765-7617 files

LLVM/project d4d1824lldb/include/lldb/Utility LLDBLog.h, lldb/source/Initialization SystemInitializerCommon.cpp

[lldb] Terminate the LLDB Log in SystemInitializerCommon::Terminate (#184261)

Currently, when calling SBDebugger::Initialize after
SBDebugger::Terminate, you hit an assert in LLDBLog when trying to
register the LLDB log a second time. Also fix the awkward
capitalization.
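
The failure mode can be modeled with a small stand-alone sketch. It does not use LLDB's real API: `ChannelRegistry`, `Registry`, `Initialize`, and `Terminate` are hypothetical stand-ins. Registering the same log channel twice is rejected, so a terminate step that does not unregister the channel makes a second initialization fail.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for a process-wide log channel registry.
class ChannelRegistry {
public:
  // Returns false if `name` is already registered. This models the
  // condition that trips the assert on double registration.
  bool Register(const std::string &name) {
    return channels_.insert(name).second;
  }
  void Unregister(const std::string &name) { channels_.erase(name); }

private:
  std::set<std::string> channels_;
};

ChannelRegistry &Registry() {
  static ChannelRegistry registry;
  return registry;
}

// Model of initialization: registers the "lldb" channel.
bool Initialize() { return Registry().Register("lldb"); }

// Model of the fixed termination path: unregisters the channel so that a
// later Initialize() can register it again.
void Terminate() { Registry().Unregister("lldb"); }
```

With the unregister step in `Terminate()`, an initialize, terminate, initialize sequence succeeds; without it, the second registration would be rejected.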
DeltaFile
+3-1lldb/source/Initialization/SystemInitializerCommon.cpp
+3-1lldb/source/Utility/LLDBLog.cpp
+2-1lldb/include/lldb/Utility/LLDBLog.h
+8-33 files