[LLVM] Metric added - largest number of basic blocks in a single func… (#182970)
This metric records the largest number of basic blocks found in any
single function.
[NewPM][X86] Port AsmPrinter to NewPM
This patch makes AsmPrinter work with the NewPM. We essentially create
three new passes that wrap different parts of AsmPrinter so that we can
separate out doInitialization/doFinalization without needing to
materialize all MachineFunctions at the same time. This has three main
drawbacks for now:
1. We do not transfer any state between the three new AsmPrinter passes.
This means that debuginfo/CFI currently does not work. This will be
fixed in future patches by moving this state to MachineModuleInfo.
2. We probably incur some overhead by needing to set up analysis
callbacks for every MF rather than just per module. This should not
be large, and can be optimized in the future on top of this if
needed.
3. This solution is not really clean. However, a lot of cleanup is going
to be difficult to do while supporting two pass managers. Once we
remove LegacyPM support, we can make the code much cleaner and better
enforce invariants like a lack of state between
[5 lines not shown]
[NFCi][NewPM][x86] Use callbacks to get analyses in AsmPrinter
This allows for overriding these callbacks when using the NewPM, which
has different methods for obtaining analysis results.
Reviewers: RKSimon, arsenm, phoebewang, mingmingl-llvm, aeubanks
Pull Request: https://github.com/llvm/llvm-project/pull/182796
[CodeGen][NewPM] Adjust pipeline for AsmPrinter
AsmPrinter needs to be split into three passes (begin, per MF, end) to
avoid the need to materialize all machine functions at the same time.
Update the CodeGenPassBuilder hooks for this.
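The begin / per-MF / end split can be pictured with a minimal sketch on a toy model. `ToyPrinter` and `runPipeline` are hypothetical names for illustration only; they are not the CodeGenPassBuilder hooks or the actual AsmPrinter passes. The point is that functions are handled one at a time between a single begin step and a single end step:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical three-phase printer (not the LLVM API).
struct ToyPrinter {
  std::vector<std::string> Log;
  bool Started = false;

  // Runs once per module, before any function is printed.
  void begin() {
    assert(!Started && "begin() must run exactly once");
    Started = true;
    Log.push_back("begin");
  }

  // Runs once per function; only this function needs to exist right now.
  void printFunction(const std::string &Name) {
    assert(Started && "begin() must run before any function is printed");
    Log.push_back("fn:" + Name);
  }

  // Runs once per module, after the last function.
  void end() {
    assert(Started && "end() without begin()");
    Started = false;
    Log.push_back("end");
  }
};

// Drives the three phases over a list of function names.
std::vector<std::string>
runPipeline(const std::vector<std::string> &FunctionNames) {
  ToyPrinter P;
  P.begin();
  for (const std::string &Name : FunctionNames)
    P.printFunction(Name);
  P.end();
  return P.Log;
}
```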
Reviewers: aeubanks, paperchalice, arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/182795
[CodeGen][NewPM] Plumb MCContext through buildCodeGenPipeline
Otherwise we cannot create an MCStreamer without getting MMI, which we
cannot do until we have started running AsmPrinter without also plumbing
MMI through CodeGenPassBuilder.
Reviewers: arsenm, paperchalice, aeubanks
Pull Request: https://github.com/llvm/llvm-project/pull/182794
[NFCi][AsmPrinter] Refactor getting analyses to callbacks
As part of making AsmPrinter work with the new pass manager, we need to
be able to override how we get analyses. This patch does that by
refactoring getting all analyses/other related functionality to
callbacks that are set by default but can be overridden later (like by a
NewPM wrapper pass).
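The pattern described here, a callback with a default that a wrapper can replace, might look roughly like the following sketch. `ToyAsmPrinter`, `ToyLoopInfo`, `GetLoopInfo`, and `makeNewPMPrinter` are hypothetical names chosen for illustration and do not reflect the real AsmPrinter interface:

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for an analysis result.
struct ToyLoopInfo {
  std::string Source;
};

class ToyAsmPrinter {
public:
  // Default callback: models fetching the analysis the legacy way.
  std::function<ToyLoopInfo()> GetLoopInfo = [] {
    return ToyLoopInfo{"legacy"};
  };

  // The printer never asks a pass manager directly; it only uses the callback.
  std::string emit() const {
    assert(GetLoopInfo && "analysis callback must be set");
    return "emitted using " + GetLoopInfo().Source + " analysis";
  }
};

// A wrapper (modelling a NewPM wrapper pass) overrides the default callback.
ToyAsmPrinter makeNewPMPrinter() {
  ToyAsmPrinter P;
  P.GetLoopInfo = [] { return ToyLoopInfo{"newpm"}; };
  return P;
}
```

Because the printer only ever calls the callback, the same emission code can serve both pass managers in this toy model.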
Reviewers: aeubanks
Pull Request: https://github.com/llvm/llvm-project/pull/182793
Revert "[ASan][Fuchsia] Have Fuchsia use a dynamic shadow start" (#182972)
Reverts llvm/llvm-project#180880
This is breaking Fuchsia's CI. Something in the CMake configuration needs to be
adjusted. Reverting on the author's request.
[HIP] Move HIP to the new driver by default (#123359)
Summary:
This patch matches CUDA, moving the HIP compilation jobs to the new
driver by default. The old behavior can be restored with
`--no-offload-new-driver`. The main difference is that objects compiled
with the old driver are no longer compatible: they will need to be
recompiled, or the old driver must be used.
[clang-doc] Improve complexity of Index construction
The existing implementation ends up with an O(N^2) algorithm due to
repeated linear scans during index construction. Switching to a
StringMap allows us to reduce this to O(N), since we no longer need to
search the vector.
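To make the complexity argument concrete, here is a minimal sketch contrasting the two lookup strategies. It assumes `std::unordered_map` as a stand-in for `llvm::StringMap`, and `ToyIndex` and `insertLinear` are hypothetical names, not clang-doc's actual data structures:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Baseline: every insertion scans the vector, so N insertions cost O(N^2).
std::size_t insertLinear(std::vector<std::string> &Entries,
                         const std::string &Name) {
  for (std::size_t I = 0; I < Entries.size(); ++I)
    if (Entries[I] == Name)
      return I;
  Entries.push_back(Name);
  return Entries.size() - 1;
}

// Map-backed index: a hash lookup replaces the scan, so N insertions
// cost O(N) on average.
struct ToyIndex {
  std::vector<std::string> Entries;
  std::unordered_map<std::string, std::size_t> Position;

  // Returns the position of Name, appending it if it is not present yet.
  std::size_t insert(const std::string &Name) {
    auto Result = Position.emplace(Name, Entries.size());
    if (Result.second) // Name was new: keep the vector in sync with the map.
      Entries.push_back(Name);
    return Result.first->second;
  }
};
```

The extra map costs some memory and a little fixed overhead per insertion, which is consistent with the map-based version only pulling ahead at larger N.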
The `BM_Index_Insertion` benchmark measures the time taken to insert N
unique records into the index.
| Scale (N Items) | Baseline (ns) | Patched (ns) | Speedup | Change |
|----------------:|--------------:|-------------:|--------:|-------:|
| 10 | 9,977 | 11,004 | 0.91x | +10.3% |
| 64 | 69,249 | 69,166 | 1.00x | -0.1% |
| 512 | 1,932,714 | 525,877 | 3.68x | -72.8% |
| 4,096 | 92,411,535 | 4,589,030 | 20.1x | -95.0% |
| 10,000 | 577,384,945 | 12,998,039 | 44.4x | -97.7% |
The patch delivers significant improvements to scalability. At 10,000
[13 lines not shown]
[clang-doc] Add basic benchmarks for library functionality
clang-doc's performance is good, but we suspect it could be better. To
track this with more fidelity, we can add a set of GoogleBenchmarks that
exercise portions of the library. To start we try to track high level
items that we monitor via the TimeTrace functions, and give them their
own micro benchmarks. This should give us more confidence that switching
out data structures or updating algorithms will have a positive
performance impact.
Note that an LLM helped generate portions of the benchmarks and
parameterize them. Most of the internal logic was written by me, but
the LLM was used to handle boilerplate and adaptation to the harness.
[HLSL][Matrix] EmitFromMemory when emitting load of vector and matrix element LValues (#178315)
Fixes #177712
The MatrixElt and VectorElt cases of `EmitLoadOfLValue` did not convert
the scalar value from its load/store type into its primary IR type like
the other cases do. This caused issues with HLSL in particular, which
requires bools to be converted between i32 (the load/store type) and i1
(the primary IR type).
This PR fixes the issue by applying `EmitFromMemory` to the loaded
scalar.
[VPlan] Add simple driver option to run some individual transforms. (#178522)
Add a way to test VPlan in more isolation via a new
`vplan-test-transform` option, which builds VPlan0 for each loop in the
input IR and can then invoke a set of transforms on it.
In order to allow different recipe types to be created, a new
widen-from-metadata transform is added, which transforms VPInstructions
to different recipes, based on custom !vplan.widen metadata. Currently
this supports creating widen & replicate recipes, but can easily be
extended in the future.
Currently the handling is intentionally bare-bones, to be extended
gradually as needed.
PR: https://github.com/llvm/llvm-project/pull/178522
AMDGPU: Cleanup the handling of flags in getTgtMemIntrinsic (#179469)
Some of the flag handling seems a bit inconsistent and dodgy, but this
is meant to be a pure refactoring for now.
[Hexagon] Fix extractHvxSubvectorPred shuffle mask for small predicates (#181364)
The loop generating the shuffle mask in extractHvxSubvectorPred used
HwLen/ResLen as the iteration count, but each iteration produces 8
elements (ResLen * Rep where Rep = 8/ResLen). This means the total mask
size was (HwLen/ResLen) * 8, which only equals HwLen when ResLen == 8.
For smaller predicate subvectors (e.g., <4 x i1> or <2 x i1>), the mask
was too large, causing an assertion failure in getVectorShuffle.
Fix by using HwLen/8 as the loop bound, which correctly produces HwLen
elements regardless of ResLen.
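The arithmetic above can be checked with a small sketch. The function names are hypothetical and this is not the Hexagon backend code; it only counts how many mask elements each loop bound would produce:

```cpp
#include <cassert>

// Elements produced per loop iteration: ResLen * Rep, with Rep = 8 / ResLen.
unsigned elementsPerIteration(unsigned ResLen) {
  assert(ResLen != 0 && 8 % ResLen == 0 && "ResLen must divide 8");
  unsigned Rep = 8 / ResLen;
  return ResLen * Rep; // always 8
}

// Mask size with the old loop bound, HwLen / ResLen.
unsigned maskSizeOldBound(unsigned HwLen, unsigned ResLen) {
  return (HwLen / ResLen) * elementsPerIteration(ResLen);
}

// Mask size with the fixed loop bound, HwLen / 8.
unsigned maskSizeNewBound(unsigned HwLen, unsigned ResLen) {
  return (HwLen / 8) * elementsPerIteration(ResLen);
}
```

Taking HwLen = 128 as an assumed example value, the old bound gives 128 elements for ResLen = 8 but 256 for ResLen = 4 and 512 for ResLen = 2, while the new bound gives 128 in all three cases.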
[AArch64] Add basic scmp and ucmp costs. (#182180)
This adds basic llvm.scmp and llvm.ucmp costs. Scalars are costed as
cmp+cset+csinv. Neon vectors can use cmgt - cmgt as the vectors write
full vector lanes.
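The "cmgt - cmgt" idea can be modelled in scalar code. This is a sketch under assumptions: `laneCmgt` and `laneCmhi` are hypothetical helpers that model one lane of a compare writing an all-ones mask (-1) when true, and the operand order shown is one way to get the three-way result, not necessarily the sequence the backend emits:

```cpp
#include <cassert>
#include <cstdint>

// One lane of a signed "greater than" compare: all ones (-1) if true, else 0.
int32_t laneCmgt(int32_t A, int32_t B) { return A > B ? -1 : 0; }

// One lane of an unsigned "higher" compare: all ones (-1) if true, else 0.
int32_t laneCmhi(uint32_t A, uint32_t B) { return A > B ? -1 : 0; }

// Signed three-way compare: -1 if A < B, 0 if A == B, 1 if A > B.
int32_t scmpViaMasks(int32_t A, int32_t B) {
  // (B > A) contributes -1; subtracting (A > B), which is -1, contributes +1.
  return laneCmgt(B, A) - laneCmgt(A, B);
}

// Unsigned three-way compare using the same subtraction trick.
int32_t ucmpViaMasks(uint32_t A, uint32_t B) {
  return laneCmhi(B, A) - laneCmhi(A, B);
}
```

Because each compare already yields 0 or -1 across the whole lane, a single subtraction produces the -1/0/1 result without a separate select step.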
[clang][ssaf][NFC] Avoid incomplete EntitySummary type breakage (#182946)
When parsing LUSummary.h as a standalone header unit, EntitySummary is
an incomplete type, causing compilation to fail:
```
__memory/unique_ptr.h:72:19: error: invalid application of 'sizeof' to an incomplete type 'clang::ssaf::EntitySummary'
72 | static_assert(sizeof(_Tp) >= 0, "cannot delete an incomplete type");
...
clang/include/clang/Analysis/Scalable/EntityLinker/LUSummary.h:48:12: note: in instantiation of member function 'std::map<clang::ssaf::SummaryName, std::map<clang::ssaf::EntityId, std::unique_ptr<clang::ssaf::EntitySummary>>>::map' requested here
48 | explicit LUSummary(NestedBuildNamespace LUNamespace)
| ^
clang/include/clang/Analysis/Scalable/EntityLinker/LUSummary.h:27:7: note: forward declaration of 'clang::ssaf::EntitySummary'
27 | class EntitySummary;
```
This is not a total breakage because this header file builds
successfully when used in a .cpp file that includes EntitySummary.h
before it.
See https://llvm.org/docs/CodingStandards.html#self-contained-headers
[Clang][Docs] Update OpenMP support status for loop transformations (#182591)
Update loop fusion transformation codegen status to done and add
additional PR links. Mark loop index set splitting parsing as in
progress.
Co-authored-by: Cursor <cursoragent at cursor.com>