[clang] Use tighter lifetime bounds for C temporary arguments
In C, consecutive statements in the same scope are under
CompoundStmt/CallExpr, while in C++ they typically fall under
CompoundStmt/ExprWithCleanup. This leads to different behavior with
respect to where pushFullExprCleanUp inserts the lifetime end markers
(e.g., at the end of scope).
For these cases, we can track and insert the lifetime end markers right
after the call completes. Allowing the stack space to be reused
immediately. This partially addresses #109204 and #43598 for improving
stack usage.
Use setExprNeedsCleanups in BuildCXXNew and avoid breaking c++98
This approach is much cleaner, but broke checkICE reporting in c++98.
Stepping through a debugger shows that this happend because the
static_assert test didn not recognize ExprWithCleanups as transparent to
constant evaluation. To addresse this, we update CheckICE to recurse
into the sub-expression, and keep the old behavior.
[clang] Use uniform lifetime bounds under exceptions
To do this we have to slightly modify how some expressions are handled
in Sema. Principally, we need to ensure that calls to new for
non-trivial types still have their destructors run. Generally this isn't
an issue, since these just get sunk into the surrounding scope. With
more lifetime annotations being produced for the expressions, we found
that some calls to `new` in an unreachable switch arm would not be
wrapped in ExprWithCleanups. As a result, they remain on the EhStack
when processing the default label, and since the dead arm doesn't
dominate the default label, we can end up with a case where the def-use
chain is broken (e.g. the def doesn't dominate all uses). Technically
this path would be impossible to reach due to the active bit, but it
still failed to satisfy a dominance relationship.
With that in place, we can remove the constraint on only using tighter
lifetimes when exceptions are disabled.
[clang] Use tighter lifetime bounds for C temporary arguments
In C, consecutive statements in the same scope are under
CompoundStmt/CallExpr, while in C++ they typically fall under
CompoundStmt/ExprWithCleanup. This leads to different behavior with
respect to where pushFullExprCleanUp inserts the lifetime end markers
(e.g., at the end of scope).
For these cases, we can track and insert the lifetime end markers right
after the call completes. Allowing the stack space to be reused
immediately. This partially addresses #109204 and #43598 for improving
stack usage.
Use setExprNeedsCleanups in BuildCXXNew and avoid breaking c++98
This approach is much cleaner, but broke checkICE reporting in c++98.
Stepping through a debugger shows that this happend because the
static_assert test didn not recognize ExprWithCleanups as transparent to
constant evaluation. To addresse this, we update CheckICE to recurse
into the sub-expression, and keep the old behavior.
[clang] Use uniform lifetime bounds under exceptions
To do this we have to slightly modify how some expressions are handled
in Sema. Principally, we need to ensure that calls to new for
non-trivial types still have their destructors run. Generally this isn't
an issue, since these just get sunk into the surrounding scope. With
more lifetime annotations being produced for the expressions, we found
that some calls to `new` in an unreachable switch arm would not be
wrapped in ExprWithCleanups. As a result, they remain on the EhStack
when processing the default label, and since the dead arm doesn't
dominate the default label, we can end up with a case where the def-use
chain is broken (e.g. the def doesn't dominate all uses). Technically
this path would be impossible to reach due to the active bit, but it
still failed to satisfy a dominance relationship.
With that in place, we can remove the constraint on only using tighter
lifetimes when exceptions are disabled.
[AMDGPU] Fix i16/i8 flat store in true16 with sramecc (#190238)
The pattern was guarded by the D16PreservesUnusedBits predicate
which is not needed for stores.
Reapply "[clang] Limit lifetimes of temporaries to the full expression (#170517)"
This reverts commit 6d38c876478dac4a42f9d6e37692348deabf6a25. The
current version only works when exceptions are not enabled until we
determine how to resolve issues around broken dominance relationships
with the def-use chain.
[AArch64][SME] Preserve ZA in agnostic ZA functions without +sme (#190141)
`__arm_agnostic("sme_za_state")` does not require +sme, but we must
still preserve ZA in case the function is used with code that makes use
of ZA:
> The use of `__arm_agnostic("sme_za_state")` allows writing functions
> that are compatible with ZA state without having to share ZA state
> with the caller, as required by `__arm_preserves`. The use of this
> attribute does not imply that SME is available.
[lldb][kernel debug] Add a missing call to scan local fs for kexts (#190281)
A kernel developer noticed that I missed a call to index the local
filesystem in one of our codepaths, and had a use case that depended on
that working.
rdar://173814556
[Runtimes] Gracefully handle invalid LLVM_TARGET_TRIPLE (#190284)
In some situations such as reported at
https://github.com/llvm/llvm-project/pull/177953#issuecomment-4179014239,
LLVM_(DEFAULT_)TARGET_TRIPLE is not set. It is used to derive the output
directory in #177953. Only flang-rt currently uses
RUNTIMES_(INSTALL|OUTPUT)_RESOURCE_LIB_PATH, we should not fail building
other despite a missing LLVM_TARGET_TRIPLE.
Compiler-rt uses COMPILER_RT_DEFAULT_TARGET_TRIPLE instead which it
derives itself. Most other LLVM runtimes libraries just skip the target
portion of the library path (explicitly so since #93354). Do the same
for RUNTIMES_(INSTALL|OUTPUT)_RESOURCE_LIB_PATH which we hope eventually
can replace the other mechanisms.
[clang-doc] Switch to string internment (#190044)
This is the first step in migrating all the Info types to be POD. We
introduced a shared string saver that can be used safely across threads,
and updated the internal represntation of various data types to use
these over owned strings, like SmallString or std::string.
This also required changes to YAMLGenerator to keep the single quoted
string formatting and to update the YAML traits.
This change gives an almost 50% reduction in peak memory when building
documentation for clang, at about a 10% performance loss. Future patches
can mitigate the performance penalties, and further reduce memory use.
| Metric | Baseline | Prev | This | Culm% | Seq% |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Time | 920.5s | 920.5s | 1011.0s | +9.8% | +9.8% |
| Memory | 86.0G | 86.0G | 44.9G | -47.8% | -47.8% |
[30 lines not shown]
[clang-doc] Merge data into persistent memory
We have a need for persistent memory for the final info. Since each
group processes a single USR at a time, every USR is only ever processed by
a single thread from the thread pool. This means that we can keep per
thread persistent storage for all the info. There is significant
duplicated data between all the serialized records, so we can just merge
the final/unique items into the persistent arena, and clear out the
scratch/transient arena as we process each record in the bitcode.
The patch adds some APIs to help with managing the data, merging, and
allocation of data in the correct arena. It also safely merges and deep
copies data from the transient arenas into persistent storage that is
never reset until the program completes.
This patch reduces memory by another % over the previous patches,
bringing the total savings over the baseline to 57%. Runtime performance
and benchmarks stay mostly flat with modest improvements.
[31 lines not shown]
[clang-doc] Support deep copy between arenas for merging
Upcoming changes to the merge step will necessitate that we clear the
transient arenas and merge new items into the persistent arena. However
there are some challenges with that, as the existing types typically
don't want to be copied. We introduce some new APIs to simplify that
task and ensure we don't accidentally leak memory.
On the performance front, we reclaim about 2% of the overhead, bringing
the cumulative overhead from the series of patches down to about 7% over
the baseline.
| Metric | Baseline | Prev | This | Culm% | Seq% |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Time | 920.5s | 1014.5s | 991.5s | +7.7% | -2.3% |
| Memory | 86.0G | 39.9G | 40.0G | -53.4% | +0.3% |
| Benchmark | Baseline | Prev | This | Culm% | Seq% |
| :--- | :--- | :--- | :--- | :--- | :--- |
[28 lines not shown]