[clang-doc] Improve complexity of Index construction
The existing implementation ends up with an O(N^2) algorithm due to
repeated linear scans during index construction. Switching to a
StringMap allows us to reduce this to O(N), since we no longer need to
search the vector.
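The difference between the two strategies can be sketched in standalone C++. This is only an illustration, not the clang-doc code: `Entry`, `buildLinear`, and `buildHashed` are hypothetical names, and `std::unordered_map` stands in for `llvm::StringMap` so the snippet compiles without LLVM headers.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an index entry; not clang-doc's actual type.
struct Entry {
  std::string Key;
  int Count = 0;
};

// Quadratic overall: each insertion scans the vector for an existing key.
std::vector<Entry> buildLinear(const std::vector<std::string> &Keys) {
  std::vector<Entry> Index;
  for (const std::string &K : Keys) {
    bool Found = false;
    for (Entry &E : Index) {
      if (E.Key == K) {
        ++E.Count;
        Found = true;
        break;
      }
    }
    if (!Found)
      Index.push_back({K, 1});
  }
  return Index;
}

// Expected linear overall: a hash map from key to vector position
// replaces the scan (assumption: std::unordered_map in place of StringMap).
std::vector<Entry> buildHashed(const std::vector<std::string> &Keys) {
  std::vector<Entry> Index;
  std::unordered_map<std::string, std::size_t> Pos;
  for (const std::string &K : Keys) {
    auto [It, Inserted] = Pos.try_emplace(K, Index.size());
    if (Inserted)
      Index.push_back({K, 1});
    else
      ++Index[It->second].Count;
  }
  return Index;
}
```

Both functions produce the same entries in the same order; only the lookup cost per insertion differs.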
The `BM_Index_Insertion` benchmark measures the time taken to insert N
unique records into the index.
| Scale (N Items) | Baseline (ns) | Patched (ns) | Speedup | Change |
|----------------:|--------------:|-------------:|--------:|-------:|
| 10 | 9,977 | 11,004 | 0.91x | +10.3% |
| 64 | 69,249 | 69,166 | 1.00x | -0.1% |
| 512 | 1,932,714 | 525,877 | 3.68x | -72.8% |
| 4,096 | 92,411,535 | 4,589,030 | 20.1x | -95.0% |
| 10,000 | 577,384,945 | 12,998,039 | 44.4x | -97.7% |
The patch delivers significant improvements to scalability. At 10,000
[13 lines not shown]
[clang-doc] Add basic benchmarks for library functionality
clang-doc's performance is good, but we suspect it could be better. To
track this with more fidelity, we can add a set of GoogleBenchmarks that
exercise portions of the library. To start, we track the high-level
items that we monitor via the TimeTrace functions and give them their
own microbenchmarks. This should give us more confidence that switching
out data structures or updating algorithms will have a positive
performance impact.
Note that an LLM helped generate portions of the benchmarks and
parameterize them. Most of the internal logic was written by me, but
the LLM was used to handle boilerplate and adaptation to the harness.
[HLSL][Matrix] EmitFromMemory when emitting load of vector and matrix element LValues (#178315)
Fixes #177712
The MatrixElt and VectorElt cases of `EmitLoadOfLValue` did not convert
the loaded scalar from its load/store type to its primary IR type, as
the other cases do. This caused issues for HLSL in particular, which
requires bools to be converted between i32 (the load/store type) and i1
(the primary IR type).
This PR fixes the issue by applying `EmitFromMemory` to the loaded
scalar.
[VPlan] Add simple driver option to run some individual transforms. (#178522)
Add an alternative way to test VPlan in more isolation via a new
`vplan-test-transform` option, which builds VPlan0 for each loop in the
input IR and can then invoke a set of transforms on it.
In order to allow different recipe types to be created, a new
widen-from-metadata transform is added, which transforms VPInstructions
to different recipes, based on custom !vplan.widen metadata. Currently
this supports creating widen & replicate recipes, but can easily be
extended in the future.
Currently the handling is intentionally bare-bones, to be extended
gradually as needed.
PR: https://github.com/llvm/llvm-project/pull/178522
AMDGPU: Cleanup the handling of flags in getTgtMemIntrinsic (#179469)
Some of the flag handling seems a bit inconsistent and dodgy, but this
is meant to be a pure refactoring for now.
[Hexagon] Fix extractHvxSubvectorPred shuffle mask for small predicates (#181364)
The loop generating the shuffle mask in extractHvxSubvectorPred used
HwLen/ResLen as the iteration count, but each iteration produces 8
elements (ResLen * Rep where Rep = 8/ResLen). This means the total mask
size was (HwLen/ResLen) * 8, which only equals HwLen when ResLen == 8.
For smaller predicate subvectors (e.g., <4 x i1> or <2 x i1>), the mask
was too large, causing an assertion failure in getVectorShuffle.
Fix by using HwLen/8 as the loop bound, which correctly produces HwLen
elements regardless of ResLen.
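The mask-size arithmetic can be checked with a small standalone sketch. This is not the Hexagon lowering code: `buildMask`, `oldMaskSize`, and `newMaskSize` are hypothetical helpers, the mask entries are placeholders, and only the resulting length is meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a mask with the same shape as the loop described above: each
// outer iteration emits ResLen * Rep entries, where Rep = 8 / ResLen,
// i.e. always 8 entries. The entry values are placeholders.
std::vector<int> buildMask(unsigned OuterIters, unsigned ResLen) {
  assert(ResLen != 0 && 8 % ResLen == 0 && "ResLen must divide 8");
  const unsigned Rep = 8 / ResLen;
  std::vector<int> Mask;
  for (unsigned I = 0; I != OuterIters; ++I)
    for (unsigned J = 0; J != ResLen; ++J)
      for (unsigned K = 0; K != Rep; ++K)
        Mask.push_back(static_cast<int>(I * ResLen + J)); // placeholder
  return Mask;
}

// Old loop bound: HwLen / ResLen outer iterations.
std::size_t oldMaskSize(unsigned HwLen, unsigned ResLen) {
  return buildMask(HwLen / ResLen, ResLen).size();
}

// Fixed loop bound: HwLen / 8 outer iterations.
std::size_t newMaskSize(unsigned HwLen, unsigned ResLen) {
  return buildMask(HwLen / 8, ResLen).size();
}
```

Taking HwLen = 128 as an illustrative value, the old bound yields 128 entries for ResLen = 8 but 256 for ResLen = 4 and 512 for ResLen = 2, while the fixed bound yields 128 in all three cases.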
[AArch64] Add basic scmp and ucmp costs. (#182180)
This adds basic llvm.scmp and llvm.ucmp costs. Scalars are costed as
cmp+cset+csinv. Neon vectors can use cmgt - cmgt as the vectors write
full vector lanes.
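Why a difference of two compares gives the three-way result can be sketched with a lane-wise emulation in plain C++. This is an illustration only, not the cost-model code: `Vec4`, `cmgt`, and `scmpViaCmgt` are hypothetical names, and the operand order shown is an assumption about how the two compares would be combined.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<int32_t, 4>;

// Emulates a signed greater-than vector compare that writes all ones
// (-1) to lanes where A > B and 0 to the remaining lanes.
Vec4 cmgt(const Vec4 &A, const Vec4 &B) {
  Vec4 R{};
  for (std::size_t I = 0; I != R.size(); ++I)
    R[I] = A[I] > B[I] ? -1 : 0;
  return R;
}

// Three-way compare per lane: -1 if A < B, 0 if equal, 1 if A > B.
// Because true lanes are -1, cmgt(B, A) - cmgt(A, B) yields exactly that.
Vec4 scmpViaCmgt(const Vec4 &A, const Vec4 &B) {
  const Vec4 Lt = cmgt(B, A); // -1 where A < B
  const Vec4 Gt = cmgt(A, B); // -1 where A > B
  Vec4 R{};
  for (std::size_t I = 0; I != R.size(); ++I)
    R[I] = Lt[I] - Gt[I];
  return R;
}

// Scalar reference for comparison.
int32_t scmpScalar(int32_t A, int32_t B) { return (A > B) - (A < B); }
```

Since each compare already fills the whole lane with 0 or -1, a single subtraction produces the result and no separate select or extension is needed.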
[clang][ssaf][NFC] Avoid incomplete EntitySummary type breakage (#182946)
When parsing LUSummary.h as a standalone header unit, EntitySummary is
an incomplete type, causing compilation to fail:
```
__memory/unique_ptr.h:72:19: error: invalid application of 'sizeof' to an incomplete type 'clang::ssaf::EntitySummary'
72 | static_assert(sizeof(_Tp) >= 0, "cannot delete an incomplete type");
...
clang/include/clang/Analysis/Scalable/EntityLinker/LUSummary.h:48:12: note: in instantiation of member function 'std::map<clang::ssaf::SummaryName, std::map<clang::ssaf::EntityId, std::unique_ptr<clang::ssaf::EntitySummary>>>::map' requested here
48 | explicit LUSummary(NestedBuildNamespace LUNamespace)
| ^
clang/include/clang/Analysis/Scalable/EntityLinker/LUSummary.h:27:7: note: forward declaration of 'clang::ssaf::EntitySummary'
27 | class EntitySummary;
```
This is not a total breakage because the header builds successfully
when it is used in a .cpp file that includes EntitySummary.h before
including LUSummary.h.
See https://llvm.org/docs/CodingStandards.html#self-contained-headers
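A minimal sketch of the self-contained pattern follows, using hypothetical `Summary` and `Unit` types rather than the real ssaf classes. It assumes the remedy is to make the complete type visible before the container of `std::unique_ptr` is used; the actual change in this commit may differ.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Complete definition is visible before any use. In a real header this
// would come from an #include rather than a forward declaration.
class Summary {
public:
  virtual ~Summary() = default;
};

class Unit {
  std::string Name;
  std::map<std::string, std::unique_ptr<Summary>> Data;

public:
  // An inline constructor may need to destroy Data if construction
  // fails, so Summary must be complete at this point.
  explicit Unit(std::string Name) : Name(std::move(Name)) {}

  void add(const std::string &Key) {
    Data[Key] = std::make_unique<Summary>();
  }

  std::size_t size() const { return Data.size(); }
};
```

With only `class Summary;` in scope, the inline constructor could trigger the same kind of incomplete-type diagnostic shown above.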
[Clang][Docs] Update OpenMP support status for loop transformations (#182591)
Update loop fusion transformation codegen status to done and add
additional PR links. Mark loop index set splitting parsing as in
progress.
Co-authored-by: Cursor <cursoragent at cursor.com>
AMDGPU: Cleanup the handling of flags in getTgtMemIntrinsic
Some of the flag handling seems a bit inconsistent and dodgy, but this
is meant to be a pure refactoring for now.
commit-id:99911619