[RISC-V][LoongArch] Revert Musttail Fixes (#191508)
This reverts:
- 2b839f66ae0191039fb82049ec515bcbd726f612 (#168506)
- 6a81656f7d729615c296e5da774e78ad5b21a558 (#170547)
- ab17b5408ac83a03807b6f0ea22f51dfb84b0b8a (#188006)
- e65dd1f8a0c8cfd2255f336e5096232f587ed397 (#191093)
The changes in #168506 and #170547 both have a lifetime issue where an
SDValue is kept for the duration of a function, despite being valid only
when processing the same basic block.
Reverting both on LoongArch and RISC-V as the implementations are
identical and one of the fix commits touches both targets, rather than
doing only a RISC-V revert. I also think this more cleanly shows what is
being undone when starting again with the changes.
[SystemZ][z/OS] Remove use of subsections. (#184167)
HLASM has no notion of subsections. There are several possible ways to deal
with this; however,
- using a different section introduces a lot of relocations, which slows
down the binder later
- emitting the PPA1 after the code changes the location which may break
existing tools
The chosen solution is to record the PPA1 data and emit it at the end of the
assembly into the code section. This solves both issues, at the expense of
some bookkeeping.
This change moves the position of the PPA2, too, but this is less
critical.
[AArch64][GISel] Update and regenerate arm64-this-return.ll (#191515)
This updates the arm64-this-return.ll test, splitting the GISel
update_mir_test_checks into a separate GlobalISel test file.
[gn] put hlsl generated headers in hlsl/ subdirectory (#191513)
Needed after 88af28072637, which populated the previously-empty
hlsl_inline_intrinsics_gen.inc. (See also 627f6aa1cd930e6a8.)
[clang-doc] Update type aliases
Many of the type aliases we introduced to simplify migration to arena
allocation are no longer relevant after completing the migration. We
can use more relevant names and remove dead aliases.
[clang-doc] Use distinct APIs for fixed arena allocation sites
Typically, code either always emits data into the TransientArena or the
PersistentArena. Use more explicit APIs to convey the intent directly
instead of relying on parameters or defaults.
[clang-doc] Merge data into persistent memory
We have a need for persistent memory for the final info. Since each
group processes a single USR at a time, every USR is only ever processed by
a single thread from the thread pool. This means that we can keep per-thread
persistent storage for all the info. There is significant
duplicated data between all the serialized records, so we can just merge
the final/unique items into the persistent arena, and clear out the
scratch/transient arena as we process each record in the bitcode.
The patch adds some APIs to help with managing the data, merging, and
allocation of data in the correct arena. It also safely merges and deep
copies data from the transient arenas into persistent storage that is
never reset until the program completes.
This patch reduces memory by another % over the previous patches,
bringing the total savings over the baseline to 57%. Runtime performance
and benchmarks stay mostly flat with modest improvements.
[31 lines not shown]
[clang-doc] Support deep copy between arenas for merging
Upcoming changes to the merge step will necessitate that we clear the
transient arenas and merge new items into the persistent arena. However,
there are some challenges with that, as the existing types typically
don't want to be copied. We introduce some new APIs to simplify that
task and ensure we don't accidentally leak memory.
On the performance front, we reclaim about 2% of the overhead, bringing
the cumulative overhead from the series of patches down to about 7% over
the baseline.
| Metric | Baseline | Prev | This | Culm% | Seq% |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Time | 920.5s | 1014.5s | 991.5s | +7.7% | -2.3% |
| Memory | 86.0G | 39.9G | 40.0G | -53.4% | +0.3% |
| Benchmark | Baseline | Prev | This | Culm% | Seq% |
| :--- | :--- | :--- | :--- | :--- | :--- |
[28 lines not shown]
[clang-doc] Removed OwnedPtr alias
The alias served a purpose during migration, but now conveys the wrong
semantics, as the memory of these pointers is generally interned inside
a local arena.
[clang-doc] Move Info types into arenas
Info types used to own significant chunks of data. As we move these into
local arenas, these types must be trivially destructible, to avoid
leaking resources when the arena is reset. Unfortunately, there isn't a
good way to transition all the data types one at a time, since most of
them are tied together in some way. Further, as they're now allocated in
the arenas, they often cannot be treated the same way, and even the
aliases and interfaces put in place to simplify the transition cannot
cover the full range of changes required.
We also use some SFINAE tricks to avoid adding boilerplate for helper
APIs we'd otherwise have to support.
Though it introduces some additional churn, we also try to keep tests
from using arena allocation as much as possible, since this is not
required to test the implementation of the library. As much of the test
code needed to be rewritten anyway, we take the opportunity to
transition now.
[41 lines not shown]
[clang-doc] Move non-arena allocated types off the OwnedPtr alias
Some types should not be using this alias, which was over-applied to
APIs that won't participate in arena-style allocation. This patch
restores them to their correct spelling.
[clang-doc] Simplify parsing and reading bitcode blocks
Much of the logic in the readBlock implementation is boilerplate, and is
repeated for each implementation/specialization. This will become much
worse as we introduce new custom block reading logic as we migrate
towards arena allocation. In preparation for that, we're introducing the
change in logic now, which should make later refactoring much more
straightforward.
[clang] Improve Ofast Warning (#183002)
`-Ofast` has an effect on the defaults for `-ffast-math` (documented
before this patch), and `-fstrict-aliasing` (not documented before this
patch).
On some platforms, `-Ofast` cannot be replaced with `-O3 -ffast-math`,
because the strict aliasing default would change. `-Ofast` can only be
replaced (in the exact same position) with `-O3 -ffast-math
-fstrict-aliasing` if `-Ofast` is the effective optimization level
(i.e., it is not followed by another `-O<value>` flag). Otherwise, the
`-Ofast` flag should just be deleted, as it has no effect.
This is all too difficult to summarise in a warning message, so this PR
mostly updates the docs. We keep the message about "use `-O3` to get
conforming optimizations" in the hope this encourages people to adopt
`-O3` alone.
The warning message is now emitted any time there is `-Ofast` in the
command-line string, rather than only when `-Ofast` is the effective
optimization level.
[UnitTests] Enable PCH (#191402)
I originally didn't enable PCH for unit tests, because I intended to
build a gtest PCH. But while gtest.h is slow, having many large standard
library headers pre-compiled via the LLVMSupport PCH already helps a
lot, leaving ~250ms for parsing gtest.h (plus a fair amount of time for
template instantiation). Additionally, for unit tests that include IR or
AST headers, re-using the PCHs that include these is more beneficial
than gtest.h. Therefore, PCH is no longer disabled for unit tests.
[RISCV] Remove RISCVCCAssignFn and Simplify (#191071)
I think the signature of `CCAssignFn` has been updated since
`RISCVCCAssignFn` was introduced. There is now enough information passed
that we don't need a separate signature and custom reimplementations to
thread the value through.
We now expose just two `CC_RISCV` functions: one for arguments
(`CC_RISCV`) and one for return values (`RetCC_RISCV`). The argument
version now dispatches to different functions internally depending on
the `CallingConv`.
This allows the backend to remove:
- GISel's custom `RISCVOutgoingValueAssigner` and
`RISCVIncomingValueAssigner`
- GISel's custom implementation of `RISCVCallLowering::canLowerReturn`
- `llvm::RISCVCCAssignFn` which is no longer used.
- SDag's custom `RISCVTargetLowering::analyzeInputArgs` and
`RISCVTargetLowering::analyzeOutputArgs`.
[9 lines not shown]
[flang][NFC] Converted five tests from old lowering to new lowering (part 41) (#190575)
Tests converted from test/Lower/Intrinsics: shifta.f90, shiftl.f90,
shiftr.f90, size.f90, spread.f90
[MLIR][XeVM] Update HandleVectorExtract pattern. (#191052)
Split loads only if pointer address space is private.
Splitting loads from non-private memory could hurt performance.
[MLIR][XeVM] Update XeVM type converter (#189306)
Ideally, DLTI should be used to determine the Index type, as it is tied
to the bitwidth of the pointer type, which can be expressed with DLTI.
But currently, a separate pass option for the bitwidth of the Index type
is used in many passes.
The GPU to XeVM lowering pipeline also uses passes with such options.
But the XeVM type converter does not provide a way to reflect the chosen
Index type bitwidth and uses a hardcoded value.
This PR updates the XeVM type converter to use the Index type bitwidth
from the pass option. This is done by using the LLVM type converter for
converting element types instead of the previous custom logic.
In addition to handling the Index type properly, using the LLVM type
converter also means low-precision float types are correctly converted
to LLVM-supported types.