Revert "RuntimeLibcalls: Add mustprogress to common function attributes (#167080)" (#191524)
This reverts commit eb5297e0ade96fe8a6297763f28219be97dfac76.
This is redundant with willreturn.
[clang-doc] Removed OwnedPtr alias
The alias served a purpose during migration, but now conveys the wrong
semantics, as the memory of these pointers is generally interned inside
a local arena.
[clang-doc] Use distinct APIs for fixed arena allocation sites
Typically, code either always emits data into the TransientArena or the
PersistentArena. Use more explicit APIs to convey the intent directly
instead of relying on parameters or defaults.
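A minimal sketch of the difference between a parameterized allocation site and explicit per-arena APIs. It assumes a context object that holds two arenas, with `std::pmr::monotonic_buffer_resource` standing in for clang-doc's arena type. `ArenaContext`, `allocate`, `allocateTransient`, and `allocatePersistent` are hypothetical names, not clang-doc's actual API.

```cpp
#include <cassert>
#include <memory_resource>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical context owning both arenas (not clang-doc's real type).
struct ArenaContext {
  std::pmr::monotonic_buffer_resource Transient;
  std::pmr::monotonic_buffer_resource Persistent;

  // Before: the caller selects the arena through a parameter, so the
  // intent is not visible from the name of the call.
  template <typename T, typename... Args>
  T *allocate(std::pmr::memory_resource &Arena, Args &&...As) {
    // Arenas never run destructors, so only trivially destructible types fit.
    static_assert(std::is_trivially_destructible_v<T>);
    void *Mem = Arena.allocate(sizeof(T), alignof(T));
    return new (Mem) T(std::forward<Args>(As)...);
  }

  // After: each call site names the arena it always targets.
  template <typename T, typename... Args>
  T *allocateTransient(Args &&...As) {
    return allocate<T>(Transient, std::forward<Args>(As)...);
  }
  template <typename T, typename... Args>
  T *allocatePersistent(Args &&...As) {
    return allocate<T>(Persistent, std::forward<Args>(As)...);
  }
};
```

With the explicit names, a reviewer can see directly from a call site whether the data survives a transient reset.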
[clang-doc] Update type aliases
Many of the type aliases we introduced to simplify migration to arena
allocation are no longer relevant after completing the migration. We
can use more relevant names and remove dead aliases.
[clang-doc] Merge data into persistent memory
We need persistent memory for the final info. Since each
group processes a single USR at a time, every USR is only ever processed by
a single thread from the thread pool. This means that we can keep
per-thread persistent storage for all the info. There is significant
duplicated data across the serialized records, so we can merge only
the final/unique items into the persistent arena, and clear out the
scratch/transient arena as we process each record in the bitcode.
The patch adds some APIs to help with managing the data, merging, and
allocation of data in the correct arena. It also safely merges and deep
copies data from the transient arenas into persistent storage that is
never reset until the program completes.
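The per-thread scheme described above can be sketched as follows. This is reduced to interning strings instead of merging full Info records. `std::pmr::monotonic_buffer_resource` stands in for the arena type, and `ThreadStorage`, `internPersistent`, and `processRecord` are hypothetical names. The sketch decodes each record into the transient arena, copies only data not already present into the persistent arena, and then resets the transient arena.

```cpp
#include <cassert>
#include <cstring>
#include <memory_resource>
#include <string_view>
#include <unordered_set>

// Hypothetical per-thread storage: the persistent arena lives until the
// program ends; the transient arena is reset after every record.
struct ThreadStorage {
  std::pmr::monotonic_buffer_resource Persistent;
  std::pmr::monotonic_buffer_resource Transient;
  // Unique values already copied into the persistent arena.
  std::unordered_set<std::string_view> Merged;

  // Deep-copy S into the persistent arena unless an equal value is already
  // there, so the result stays valid after the transient arena is reset.
  std::string_view internPersistent(std::string_view S) {
    if (auto It = Merged.find(S); It != Merged.end())
      return *It; // duplicated across records: reuse the existing copy
    char *Mem = static_cast<char *>(Persistent.allocate(S.size() + 1, 1));
    std::memcpy(Mem, S.data(), S.size());
    Mem[S.size()] = '\0';
    std::string_view Copy(Mem, S.size());
    Merged.insert(Copy);
    return Copy;
  }

  // Decode one record into scratch memory, merge what is new, then reset.
  void processRecord(std::string_view Encoded) {
    char *Scratch = static_cast<char *>(Transient.allocate(Encoded.size(), 1));
    std::memcpy(Scratch, Encoded.data(), Encoded.size());
    internPersistent(std::string_view(Scratch, Encoded.size()));
    Transient.release(); // scratch memory is reused for the next record
  }
};
```

Because a USR is only ever handled by one thread, `Merged` and both arenas need no locking in this model.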
This patch reduces memory by another % over the previous patches,
bringing the total savings over the baseline to 57%. Runtime performance
and benchmarks stay mostly flat with modest improvements.
[31 lines not shown]
[clang-doc] Support deep copy between arenas for merging
Upcoming changes to the merge step will require that we clear the
transient arenas and merge new items into the persistent arena. However,
this is challenging, because the existing types typically aren't meant to
be copied. We introduce some new APIs to simplify that task and ensure we
don't accidentally leak memory.
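A minimal sketch of a deep copy between arenas, assuming a record type that only holds pointers into an arena. The record deletes its copy operations so that a shallow copy, which would alias the transient arena, cannot happen by accident. `Reference`, `RecordInfo`, `copyString`, and `deepCopy` are hypothetical, and `std::pmr::memory_resource` stands in for the arena type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory_resource>
#include <new>
#include <string_view>
#include <type_traits>

// Hypothetical arena-allocated types: they only point into an arena.
struct Reference {
  std::string_view Name;
};

struct RecordInfo {
  std::string_view Name;
  Reference *Members = nullptr; // array living in some arena
  std::size_t NumMembers = 0;

  RecordInfo() = default;
  // A member-wise copy would alias the source arena; use deepCopy instead.
  RecordInfo(const RecordInfo &) = delete;
  RecordInfo &operator=(const RecordInfo &) = delete;
};
static_assert(std::is_trivially_destructible_v<RecordInfo>);

// Copy the characters of S into the arena To.
std::string_view copyString(std::string_view S, std::pmr::memory_resource &To) {
  if (S.empty())
    return {};
  char *Mem = static_cast<char *>(To.allocate(S.size(), alignof(char)));
  std::memcpy(Mem, S.data(), S.size());
  return {Mem, S.size()};
}

// Build a RecordInfo whose every byte, including nested data, lives in To.
RecordInfo *deepCopy(const RecordInfo &From, std::pmr::memory_resource &To) {
  auto *R = new (To.allocate(sizeof(RecordInfo), alignof(RecordInfo))) RecordInfo();
  R->Name = copyString(From.Name, To);
  R->NumMembers = From.NumMembers;
  if (From.NumMembers != 0)
    R->Members = static_cast<Reference *>(
        To.allocate(sizeof(Reference) * From.NumMembers, alignof(Reference)));
  for (std::size_t I = 0; I < From.NumMembers; ++I)
    new (&R->Members[I]) Reference{copyString(From.Members[I].Name, To)};
  return R;
}
```

After `deepCopy`, the source arena can be released without invalidating the copy.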
On the performance front, we reclaim about 2% of runtime relative to the
previous patch, bringing the cumulative overhead from the series of
patches down to about 7.7% over the baseline.
| Metric | Baseline | Prev | This | Cumul% | Seq% |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Time | 920.5s | 1014.5s | 991.5s | +7.7% | -2.3% |
| Memory | 86.0G | 39.9G | 40.0G | -53.4% | +0.3% |
| Benchmark | Baseline | Prev | This | Cumul% | Seq% |
| :--- | :--- | :--- | :--- | :--- | :--- |
[28 lines not shown]
[clang-doc] Move Info types into arenas (#190054)
Info types used to own significant chunks of data. As we move these into
local arenas, these types must be trivially destructible to avoid
leaking resources when the arena is reset. Unfortunately, there isn't a
good way to transition all the data types one at a time, since most of
them are tied together in some way. Further, as they're now allocated in
the arenas, they often cannot be treated the same way, and even the
aliases and interfaces put in place to simplify the transition cannot
cover the full range of changes required.
We also use some SFINAE tricks to avoid adding boilerplate for helper
APIs that we'd otherwise have to support.
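A minimal sketch of the two constraints in the paragraphs above. First, arena allocation rejects types that are not trivially destructible at compile time. Second, a SFINAE-selected helper covers every type with a `Name` member, which avoids writing one overload per type. `arenaNew`, `HasName`, `displayName`, `FunctionInfo`, and `CommentInfo` are hypothetical names, and `std::pmr::memory_resource` stands in for the arena.

```cpp
#include <cassert>
#include <memory_resource>
#include <new>
#include <string_view>
#include <type_traits>
#include <utility>

// Arenas are reset without running destructors, so any type placed in one
// must not own resources; enforce that at compile time.
template <typename T, typename... Args>
T *arenaNew(std::pmr::memory_resource &Arena, Args &&...As) {
  static_assert(std::is_trivially_destructible_v<T>,
                "arena types must be trivially destructible");
  return new (Arena.allocate(sizeof(T), alignof(T))) T{std::forward<Args>(As)...};
}

// SFINAE detection: true when T has a member named `Name`.
template <typename T, typename = void>
struct HasName : std::false_type {};
template <typename T>
struct HasName<T, std::void_t<decltype(std::declval<const T &>().Name)>>
    : std::true_type {};

// One template per behavior instead of one overload per Info type.
template <typename T>
std::enable_if_t<HasName<T>::value, std::string_view> displayName(const T &I) {
  return I.Name;
}
template <typename T>
std::enable_if_t<!HasName<T>::value, std::string_view> displayName(const T &) {
  return "<anonymous>";
}

struct FunctionInfo { std::string_view Name; };
struct CommentInfo { std::string_view Text; };
```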
Though it introduces some additional churn, we also try to keep tests
from using arena allocation as much as possible, since this is not
required to test the implementation of the library. As much of the test
code needed to be rewritten anyway, we take the opportunity to
transition now.
[50 lines not shown]
Revert "RuntimeLibcalls: Add mustprogress to common function attributes (#167080)"
This reverts commit eb5297e0ade96fe8a6297763f28219be97dfac76.
This is redundant with willreturn.
SymbolizableObjectFile: Invalidate Wasm addresses that map outside the code section (#191329)
A fix after #191068: For linked files, invalidate any address that
is outside the text section to prevent it from being matched in DWARF as
a section-relative address.
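A minimal sketch of the validity check described above. It assumes, as the test description suggests, that addresses in linked Wasm files are file offsets that must fall inside the code section, and that object-file addresses are already section-relative and kept as-is. `WasmCodeSection` and `shouldSymbolize` are hypothetical, not `SymbolizableObjectFile`'s API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of where the code section sits in the file.
struct WasmCodeSection {
  uint64_t Offset; // file offset of the code section's contents
  uint64_t Size;
};

// Assumption: linked-file addresses outside [Offset, Offset + Size) cannot
// be code, so they are rejected before any DWARF lookup. Object-file
// addresses are treated as section-relative and always kept.
bool shouldSymbolize(uint64_t Addr, bool IsLinked, const WasmCodeSection &Code) {
  if (!IsLinked)
    return true;
  return Addr >= Code.Offset && Addr < Code.Offset + Code.Size;
}
```

This reproduces the distinction the tests cover: a small address such as 3 is accepted in an object file but rejected in a linked file whose code section starts later.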
Add test cases that cover the distinction (e.g. address 3 should match
in an object file but not in a linked file).
Also, fix the comments in the test to match the updated line numbers.
[cmake] Add support for statically linking libxml2 (#166867)
Dynamically depending on libxml2 causes various annoyances across
different Linux distros for release artifacts. Specifically, on Fedora
and NixOS the library has a different name than on Debian, and on
Arch Linux they tried to remove the old name entirely.
With this option, enabled by default for releases, we avoid these issues
without changing any behavior. For lld, the binary size impact is <1 MB.
This continues to use the shared libxml2 for lldb, since otherwise
it would require linking ICU, which is off by default.
macOS ignores this setting, since libxml2 is part of the OS and stable
enough.
This mirrors what we do for zstd.
[3 lines not shown]
Optimize the basename matching logic.
This change optimizes the basename matching logic in
`SampleProfileMatcher::matchFunctionsWithoutProfileByBasename` by
replacing the existing O(N*M) nested loop with an O(N+M) hash-based
lookup, while strictly preserving the original matching semantics.
The previous implementation relied on a substring heuristic
(`ProfName.contains(BaseName)`) to bypass expensive demangling operations
during the nested iteration; however, in codebases with common or
overlapping function names, this heuristic frequently evaluated to true,
resulting in redundant demangling and quadratic time complexity.
The updated approach demangles each profile name exactly once and uses a
`StringMap` to perform O(1) lookups against the orphan functions. This
eliminates the need for the substring pre-check while maintaining exactly
the same constraints: a strict 1:1 mapping between orphaned IR functions
and profile entries, and correct identification and rejection of
ambiguous matches where multiple entities share the same demangled
basename.
This results in a 9x speedup on a large module with common basenames.
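The hash-based matching can be sketched as follows. This version uses `std::unordered_map` in place of LLVM's `StringMap`. A hypothetical `basenameOf` stands in for demangling: it assumes names are already demangled and takes the last `::` component. `matchByBasename` is also hypothetical. Each input is visited once, and a basename shared by more than one orphan or more than one profile is marked ambiguous and rejected.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for demangling: assumes names such as "ns::foo" and returns the
// last component.
std::string basenameOf(const std::string &Name) {
  auto Pos = Name.rfind("::");
  return Pos == std::string::npos ? Name : Name.substr(Pos + 2);
}

// Returns (orphan function, profile name) pairs that match 1:1 by basename.
std::vector<std::pair<std::string, std::string>>
matchByBasename(const std::vector<std::string> &OrphanNames,
                const std::vector<std::string> &ProfileNames) {
  // O(N): index orphans by basename; -1 marks an ambiguous basename.
  std::unordered_map<std::string, int> OrphanIndex;
  for (int I = 0; I < static_cast<int>(OrphanNames.size()); ++I) {
    auto [It, Inserted] = OrphanIndex.try_emplace(basenameOf(OrphanNames[I]), I);
    if (!Inserted)
      It->second = -1;
  }
  // O(M): compute each profile basename exactly once, with O(1) lookups.
  std::unordered_map<std::string, int> ProfileIndex;
  for (int J = 0; J < static_cast<int>(ProfileNames.size()); ++J) {
    std::string Base = basenameOf(ProfileNames[J]);
    if (OrphanIndex.count(Base) == 0)
      continue;
    auto [It, Inserted] = ProfileIndex.try_emplace(Base, J);
    if (!Inserted)
      It->second = -1;
  }
  std::vector<std::pair<std::string, std::string>> Matches;
  for (const auto &[Base, J] : ProfileIndex) {
    int I = OrphanIndex.at(Base);
    if (I >= 0 && J >= 0) // reject ambiguity on either side
      Matches.emplace_back(OrphanNames[I], ProfileNames[J]);
  }
  return Matches;
}
```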
[clang-doc] Move Info types into arenas
Info types used to own significant chunks of data. As we move these into
local arenas, these types must be trivially destructible to avoid
leaking resources when the arena is reset. Unfortunately, there isn't a
good way to transition all the data types one at a time, since most of
them are tied together in some way. Further, as they're now allocated in
the arenas, they often cannot be treated the same way, and even the
aliases and interfaces put in place to simplify the transition cannot
cover the full range of changes required.
We also use some SFINAE tricks to avoid adding boilerplate for helper
APIs that we'd otherwise have to support.
Though it introduces some additional churn, we also try to keep tests
from using arena allocation as much as possible, since this is not
required to test the implementation of the library. As much of the test
code needed to be rewritten anyway, we take the opportunity to
transition now.
[41 lines not shown]
[OpenMP][OMPIRBuilder] Support complex types in atomic update/capture
Route struct-typed values through the libcall path in
`emitAtomicUpdate`.
Previously, the libcall path was gated on `RMWOp == BAD_BINOP`, so
atomic capture swap patterns (`v = x; x = expr`) for complex values
lowered as structs fell through to the cmpxchg path. That path called
`getScalarSizeInBits()` on a struct type, produced 0, and triggered an
assertion in `IntegerType::get()`.
Remove the `BAD_BINOP` restriction so struct types always use the
libcall path. This is safe because the libcall path does not use
`RMWOp` and already handles arbitrary type sizes correctly.
Also fix `LoadSize` in the libcall path to use `XElemTy` rather than
the pointer type, which previously gave the wrong size for larger
complex types such as `complex(8)`.
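A minimal plain-C++ model of the capture-swap pattern `v = x; x = expr` on a complex value lowered as a struct. It shows why the size passed to a size-parameterized libcall must come from the element type rather than the pointer type. This is not OMPIRBuilder code. `Complex8`, `genericAtomicExchange`, and `captureSwap` are hypothetical, and a mutex stands in for the atomic runtime library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>

// complex(8) lowered as a struct of two doubles (16 bytes).
struct Complex8 {
  double Re, Im;
};

// Stand-in for a size-parameterized atomic libcall: it copies raw bytes of
// any size, so struct types need no integer type of matching width.
static std::mutex LibcallLock;
void genericAtomicExchange(std::size_t Size, void *Obj, const void *Desired,
                           void *Old) {
  std::lock_guard<std::mutex> Guard(LibcallLock);
  std::memcpy(Old, Obj, Size);
  std::memcpy(Obj, Desired, Size);
}

// v = x; x = expr;  The size comes from the element type (16 bytes), not
// from the pointer type (8 bytes on a 64-bit target).
Complex8 captureSwap(Complex8 *X, Complex8 Expr) {
  Complex8 V{};
  genericAtomicExchange(sizeof(Complex8), X, &Expr, &V);
  return V;
}
```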
[3 lines not shown]
[Modules] Enrich diags for out of date module dependencies (#190203)
Replace the opaque `ErrorStr` parameter in `ModuleManager::addModule`
with a structured object that carries specific information for
reporting. This allows the diagnostics to emit targeted notes instead of
a single error string appended to the main diagnostic. Information that
is relevant but tied to implementation details of the compilation is
reported as notes (e.g. signature mismatches).
This patch additionally:
* Adds minor comments to places I found unintuitive
* Adds a contrived test case for signature mismatches
* Sets `InputFilesValidation` based on module kind at construction
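A minimal sketch of replacing an opaque error string with a structured failure object. The caller can then emit one main diagnostic plus targeted notes. `ModuleLoadFailure`, its `Kind` values, its fields, and the note wording are all hypothetical and do not reflect the actual type added to `ModuleManager::addModule`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical structured failure: the kind of problem plus the specific
// facts needed to report it, instead of one preformatted string.
struct ModuleLoadFailure {
  enum class Kind { None, FileChanged, SignatureMismatch };
  Kind K = Kind::None;
  std::string FileName;
  std::string ExpectedSignature;
  std::string ActualSignature;

  // Implementation details such as signatures become separate notes rather
  // than text appended to the main diagnostic.
  std::vector<std::string> notes() const {
    std::vector<std::string> Notes;
    if (K == Kind::SignatureMismatch)
      Notes.push_back("expected signature " + ExpectedSignature + ", found " +
                      ActualSignature);
    return Notes;
  }
};
```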
[Offload] Fix inefficient GNU Hash ELF symbol lookup (#191477)
Summary:
This PR fixes the handling of the GNU Hash table. The chain format uses
the lowest bit to indicate the end of the list. Previously we just
continued the loop instead of breaking, meaning we would exhaustively
search the whole list if the symbol was not found instead of exiting
early. This was still correct and worked in practice, but it was
slightly inefficient. It was likely not noticed because the symbol tables
on these GPU binaries tend to be relatively small.
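The corrected loop shape can be sketched as a simplified, standalone GNU hash lookup. The Bloom-filter check is omitted, and `GnuHashTable` and `lookup` are hypothetical names, not the Offload plugin's code. Each chain entry stores a symbol's hash with the lowest bit reused as an end-of-chain flag. When that bit is set, the loop breaks instead of continuing.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// GNU hash function used by DT_GNU_HASH: h = h * 33 + c, seeded with 5381.
uint32_t gnuHash(const char *Name) {
  uint32_t H = 5381;
  for (auto *P = reinterpret_cast<const unsigned char *>(Name); *P; ++P)
    H = H * 33 + *P;
  return H;
}

// Simplified table: Chain[I - SymOffset] holds the hash of symbol I with
// its lowest bit replaced by an end-of-chain flag.
struct GnuHashTable {
  uint32_t SymOffset;
  std::vector<uint32_t> Buckets;
  std::vector<uint32_t> Chain;
};

// Returns the symbol index of Name, or 0 if it is not present.
uint32_t lookup(const GnuHashTable &T, const char *Name,
                const std::vector<const char *> &SymNames) {
  uint32_t H = gnuHash(Name);
  uint32_t I = T.Buckets[H % T.Buckets.size()];
  if (I == 0)
    return 0; // empty bucket
  for (;; ++I) {
    uint32_t ChainHash = T.Chain[I - T.SymOffset];
    // Compare all bits except the flag bit, then confirm by name.
    if ((H | 1) == (ChainHash | 1) && std::strcmp(Name, SymNames[I]) == 0)
      return I;
    if (ChainHash & 1)
      break; // last entry of this chain: stop instead of scanning further
  }
  return 0;
}
```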
[clang-doc] Simplify parsing and reading bitcode blocks (#190053)
Much of the logic in the readBlock implementation is boilerplate, and is
repeated for each implementation/specialization. This will become much
worse as we introduce new custom block reading logic as we migrate
towards arena allocation. In preparation for that, we're introducing the
change in logic now, which should make later refactoring much more
straightforward.
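A minimal sketch of folding repeated block-reading boilerplate into one generic driver. The driver owns the record loop, end-of-block handling, and error propagation, and each Info type supplies only a `parseRecord` overload, found by argument-dependent lookup at instantiation. The stream model (`Record`, `END_BLOCK`) and `FunctionInfo` are hypothetical simplifications, not LLVM's bitstream API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified stream: a block is a run of records ended by an
// end-of-block marker.
struct Record {
  int ID;
  std::string Payload;
};
constexpr int END_BLOCK = -1;

// Generic driver shared by every Info type.
template <typename InfoT>
bool readBlock(const std::vector<Record> &Stream, std::size_t &Pos, InfoT &I) {
  while (Pos < Stream.size()) {
    const Record &R = Stream[Pos++];
    if (R.ID == END_BLOCK)
      return true;
    if (!parseRecord(R, I))
      return false; // propagate the first failure
  }
  return false; // input ended before the block was closed
}

// Per-type logic: only the record IDs this type understands.
struct FunctionInfo {
  std::string Name, ReturnType;
};
bool parseRecord(const Record &R, FunctionInfo &I) {
  switch (R.ID) {
  case 1: I.Name = R.Payload; return true;
  case 2: I.ReturnType = R.Payload; return true;
  default: return false; // unknown record
  }
}
```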
[AMDGPU] Fix .Lfunc_end label placement
Currently it is placed after the kernel descriptor, even though that
section is .rodata, which is wrong. Fixing the placement allows proper
code size calculation in MC.
Remove name qualifiers on a class, using namespace instead.
This should fix the bot failure:
FAILED: clang/lib/ScalableStaticAnalysisFramework/Analyses/EntityPointerLevel/EntityPointerLevel.cpp:61:5: error: qualified name does not name a class before ‘:’ token
61 | : ConstStmtVisitor<EntityPointerLevelTranslator,
| ^
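A minimal reproduction of this class of error, with hypothetical names (`demo`, `ToyVisitor`, `Translator`). GCC rejects a class definition whose name is namespace-qualified when that class was not previously declared in the namespace. The sketch shows the rejected form in a comment and the accepted form, which reopens the namespace and uses an unqualified class name.

```cpp
#include <cassert>

namespace demo {
// Toy CRTP visitor standing in for a real visitor base class.
template <typename Derived, typename RetTy> class ToyVisitor {
public:
  RetTy visit() { return static_cast<Derived *>(this)->visitImpl(); }
};
} // namespace demo

// Rejected by GCC ("qualified name does not name a class before ':'"):
//   class demo::Translator : public demo::ToyVisitor<demo::Translator, int> {
//     ...
//   };
//
// Accepted: reopen the namespace and define the class unqualified.
namespace demo {
class Translator : public ToyVisitor<Translator, int> {
public:
  int visitImpl() { return 42; }
};
} // namespace demo
```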