[clang-doc] Support deep copy between arenas for merging
Upcoming changes to the merge step will necessitate that we clear the
transient arenas and merge new items into the persistent arena. However
there are some challenges with that, as the existing types typically
don't want to be copied. We introduce some new APIs to simplify that
task and ensure we don't accidentally leak memory.
On the performance front, we reclaim about 2% of the overhead, bringing
the cumulative overhead from the series of patches down to about 7% over
the baseline.
| Metric | Baseline | Prev | This | Culm% | Seq% |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Time | 920.5s | 1014.5s | 991.5s | +7.7% | -2.3% |
| Memory | 86.0G | 39.9G | 40.0G | -53.4% | +0.3% |
| Benchmark | Baseline | Prev | This | Culm% | Seq% |
| :--- | :--- | :--- | :--- | :--- | :--- |
[28 lines not shown]
[clang-doc] Merge data into persistent memory
We have a need for persistent memory for the final info. Since each
group processes a single USR at a time, every USR is only ever processed by
a single thread from the thread pool. This means that we can keep per
thread persistent storage for all the info. There is significant
duplicated data between all the serialized records, so we can just merge
the final/unique items into the persistent arena, and clear out the
scratch/transient arena as we process each record in the bitcode.
The patch adds some APIs to help with managing the data, merging, and
allocation of data in the correct arena. It also safely merges and deep
copies data from the transient arenas into persistent storage that is
never reset until the program completes.
This patch reduces memory by another % over the previous patches,
bringing the total savings over the baseline to 57%. Runtime performance
and benchmarks stay mostly flat with modest improvements.
[31 lines not shown]
[clang-doc] Move Info types into arenas
Info types used to own significant chunks of data. As we move these into
local arenas, these types must be trivially destructible, to avoid
leaking resources when the arena is reset. Unfortunaly, there isn't a
good way to transition all the data types one at a time, since most of
them are tied together in some way. Further, as they're now allocated in
the arenas, they often cannot be treated the same way, and even the
aliases and interfaces put in pLace to simplify the transition cannot
cover the full range of changes required.
We also use some SFINAE tricks to avoid adding boilerplate for helper
APIs, we'd otherwise ahve to support
Though it introduces some additional churn, we also try to keep tests
from using arena allocation as much as possible, since this is not
required to test the implementation of the library. As much of the test
code needed to be rewritten anyway, we take the opportunity to
transition now.
[41 lines not shown]
[BOLT] Template patchELFPHDRTable and rewriteNoteSections for ELF32 (#189715)
Template patchELFPHDRTable, rewriteNoteSections, markGnuRelroSections,
and discoverStorage to support both ELF32LE and ELF64LE binaries.
Previously these functions were hardcoded for ELF64LE, causing crashes
when processing 32-bit ELF binaries.
The RewriteInstance constructor now accepts ELF32LE objects in addition
to ELF64LE. The ELF_FUNCTION macro is reused (and moved earlier in the
header) to dispatch to the correct template instantiation.
These changes are preparation for adding support to hexagon architecture
in Bolt.
[OpenMP] Move alloc / free shared from TLI to alloc tags (#190365)
Summary:
Allocation kinds were added after these were introduced. We only needed
the TLI to identify these in the attributor so we can now just use
attributes. Update the usage in OpenMP and drop the TLI interface.
Fixes: https://github.com/llvm/llvm-project/issues/190072
[AMDGPU] Specialize gfx1250 codegen tests form fake and real t16. NFC.
This is preparation of turning on real true16, so we can easily
apply it or revert.
[Analysis] No block map in MemoryDependenceAnalysis (#190367)
Avoid expensive hash map of block to value by using a vector. To avoid
allocating and clearing the entire vector per query, cache the
allocation and use an epoch to identify stale values from previous
queries.
[clang-doc] Consolidate merging logic
As we migrate things in the arena, this logic may get more complex.
Factoring it out now, will give clear extension points to make this
easier to manage.
[clang-doc] Enforce arena allocated types are trivially destructible
We can enforce at compile-time that the types we want to place in the
arenas are always safe to allocate there.
[clang-doc] Move non-arena allocated types off the OwnedPtr alias
Some types should not be using this alias, which was over applied to
APIs that wont participate in arena style allocation. This patch
restores them to their correct spelling.
[clang-doc] Make CommentInfo arena allocated
This patch move the CommentInfo type into the arena. It updates block
handling to collect child info types and serialize the array in one
shot.
We also clean up the test code to avoid using the arenas in the tests.
This has the upside of making the test more hermetic, and avoids churn
in the related code as the allocation API interfaces evolve.
Performance and memory usage regress slightly. This is somewhat expected
as we do not yet aggressively release short term memory during merge
operations. Future patches will reclaim this overhead.
| Metric | Baseline | Prev | This | Culm% | Seq% |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Time | 920.5s | 998.5s | 1010.5s | +9.8% | +1.2% |
| Memory | 86.0G | 43.8G | 47.8G | -44.4% | +9.2% |
[26 lines not shown]
[clang-doc] Simplify parsing and reading bitcode blocks
Much of the logic int he readBlock implementation is boilerplate, and is
repeated for each implementation/specialization. This will become much
worse as we introduce new custom block reading logic as we migrate
towards arena allocation. In preparation for that, we're introducing the
change in logic now, which should make later refactoring much more
straightforward.
[Hexagon] Use trap0(#0xDB) for debugtrap instead of brkpt (#189446)
The brkpt instruction is intended for the in-silicon debugger (ISDB).
When ISDB is not enabled, brkpt is treated as a NOP, so
__builtin_debugtrap() would silently do nothing in user-mode Linux
processes.
Use trap0(#0xDB) instead.
[Matrix] Use matrix element type for TBAA nodes. (#190029)
Matrix loads and stores are accesses of their element types. Emit TBAA
nodes using their element type to allow more precise TBAA alias
analysis.
PR: https://github.com/llvm/llvm-project/pull/190029
[AMDGPU] Specialize gfx1250 codegen tests form fake and real t16. NFC.
This is preparation of turning on real true16, so we can easily
apply it or revert.
[SROA][NFC] Don't materialize name when discarding names (#190368)
SetValuePrefix has to materialize the prefix into a std::string, which
is non-free and pointless when value names are discarded.
[Hexagon] Clean up library and include paths and fix --sysroot (#188824)
Unify include and library paths by reusing common code to compute path
prefixes. First, determine the effective sysroot by choosing a
user-provided sysroot, "../target/<triple>", or "../target/hexagon",
in the order of precedence. Based on the sysroot, derive the standard
include path, C++ include path, and base library path.
Fix the default -L library paths so they are taken from the external
sysroot, when one specified. Previously, these paths were always
relative to the install directory and sysroot was ignored.
Remove certain locations from considerations, as there are never used
for the corresponding purpose in existing sysroots:
- fallback to install path, typically "../target/bin", as the base path
when other sysroot cannot be found;
- similarly, fallback to "../target/" for startup files;
- "../target/bin" for program paths as there are no program files in
current sysroots.
[3 lines not shown]
[AMDGPU] Specialize gfx1250 codegen tests form fake and real t16. NFC.
This is preparation of turning on real true16, so we can easily
apply it or revert.
[Clang][Sema] Prevent implicit casting Complex type to Vector (#187954)
Emitting an error message in case of implicit casting of a complex type
to a built-in vector type in C
Fixes: #186805
Introduce and use Verifier::visitDIType (#189067)
This adds a new method Verifier::visitDIType, and then changes method
for subclasses of DIType to call it. The new method just dispatches to
DIScope and adds a file/line check inspired by
Verifier::visitDISubprogram.
[SampleProfile] Fix FuncMappings key mismatch for renamed functions in stale profile matching (#187899)
Fix a bug where `distributeIRToProfileLocationMap` fails to find
location mappings from IR to profile for renamed functions because
`FuncMappings` is indexed by the IR function name while
`distributeIRToProfileLocationMap` looks up by the profile function
name. Fixed by making `FuncMappings` to use profile function name as
key.
[lldb][Utility] Remove address size from Stream class (NFC) (#190375)
It violates abstraction. Luckily, it was used only in two places, see
DumpDataExtractor.cpp and CommandObjectMemory.cpp.
[clang-doc] Simplify parsing and reading bitcode blocks
Much of the logic int he readBlock implementation is boilerplate, and is
repeated for each implementation/specialization. This will become much
worse as we introduce new custom block reading logic as we migrate
towards arena allocation. In preparation for that, we're introducing the
change in logic now, which should make later refactoring much more
straightforward.
[clang-doc] Move non-arena allocated types off the OwnedPtr alias
Some types should not be using this alias, which was over applied to
APIs that wont participate in arena style allocation. This patch
restores them to their correct spelling.
[clang-doc] Merge data into persistent memory
We have a need for persistent memory for the final info. Since each
group processes a single USR at a time, every USR is only ever processed by
a single thread from the thread pool. This means that we can keep per
thread persistent storage for all the info. There is significant
duplicated data between all the serialized records, so we can just merge
the final/unique items into the persistent arena, and clear out the
scratch/transient arena as we process each record in the bitcode.
The patch adds some APIs to help with managing the data, merging, and
allocation of data in the correct arena. It also safely merges and deep
copies data from the transient arenas into persistent storage that is
never reset until the program completes.
This patch reduces memory by another % over the previous patches,
bringing the total savings over the baseline to 57%. Runtime performance
and benchmarks stay mostly flat with modest improvements.
[31 lines not shown]