[clang-doc] Make CommentInfo arena allocated (#190050)
This patch moves the CommentInfo type into the arena. It updates block
handling to collect child info types and serialize the array in one
shot.
We also clean up the test code to avoid using the arenas in the tests.
This has the upside of making the test more hermetic, and avoids churn
in the related code as the allocation API interfaces evolve.
Performance and memory usage regress slightly. This is somewhat expected,
as we do not yet aggressively release short-term memory during merge
operations. Future patches will reclaim this overhead.
| Metric | Baseline | Prev | This | Cumul% | Seq% |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Time | 920.5s | 998.5s | 1010.5s | +9.8% | +1.2% |
| Memory | 86.0G | 43.8G | 47.8G | -44.4% | +9.2% |
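As a reading aid for the last two columns: assuming the fifth column compares This against Baseline and the sixth compares This against Prev (an interpretation inferred from the numbers, not stated in the commit), the relative changes can be recomputed as in this small C++ sketch. `percentChange` is a hypothetical helper.

```cpp
// Relative change from `From` to `To`, in percent.
constexpr double percentChange(double From, double To) {
  return (To - From) / From * 100.0;
}

// Inputs are copied from the table above, where they are already rounded.
constexpr double TimeVsBaseline = percentChange(920.5, 1010.5);
constexpr double TimeVsPrev = percentChange(998.5, 1010.5);
constexpr double MemVsBaseline = percentChange(86.0, 47.8);
constexpr double MemVsPrev = percentChange(43.8, 47.8);
```

From the rounded table values this gives roughly +9.8% and +1.2% for time and -44.4% for memory against the baseline. The memory change against Prev comes out near +9.1% rather than +9.2%, presumably because the table was computed from unrounded measurements.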
[36 lines not shown]
[AMDGPU] IGroupLP: Fix BestCost assignment in greedy solver (NFC) (#186995)
The greedy solver's greedyFind method incorrectly reports the cost of
the last processed group instead of the best one. In practice, this does
not have any effect since (1) the cost is only used to decide whether
or not to run the exact solver and for this it only matters if it is
zero or not, and (2) the edges of the best group are used correctly.
But it clearly is conceptually wrong.
Use the best group cost, refactor how the information about the best
group is represented, and add debug output that prints the greedy
solver's overall cost.
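The difference between reporting the last processed cost and the best cost can be sketched as follows. This is a minimal C++17 illustration of the bug class only, not the IGroupLP code: `Candidate`, `Best`, `lastCost`, and `pickBest` are hypothetical names.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a candidate group and its cost.
struct Candidate {
  int GroupId;
  int Cost;
};

// Hypothetical record of the best candidate seen so far.
struct Best {
  int GroupId;
  int Cost;
};

// Buggy pattern: the reported cost is whatever the loop visited last.
template <std::size_t N>
constexpr int lastCost(const std::array<Candidate, N> &Cands) {
  int Cost = 0;
  for (std::size_t I = 0; I < N; ++I)
    Cost = Cands[I].Cost; // overwritten on every iteration
  return Cost;
}

// Fixed pattern: group id and cost are tracked together, so the reported
// cost always belongs to the chosen group.
template <std::size_t N>
constexpr Best pickBest(const std::array<Candidate, N> &Cands) {
  static_assert(N > 0, "need at least one candidate");
  Best B{Cands[0].GroupId, Cands[0].Cost};
  for (std::size_t I = 1; I < N; ++I)
    if (Cands[I].Cost < B.Cost)
      B = Best{Cands[I].GroupId, Cands[I].Cost};
  return B;
}
```

Keeping the id and the cost in one record is one way to make it hard for the two to drift apart; whether the patch uses a similar representation is not stated here.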
Disable MSVC-incompatible portions of `disable_container_overflow_checks` for MSVC (#191456)
**Context:**
The test `disable_container_overflow_checks` recently started running on
Windows, as per:
https://github.com/llvm/llvm-project/pull/181721/changes
As a result, the MSVC ASan fork of LLVM ASan started executing this
test, which has been failing for 2 reasons.
1) MSVC does not support the `__has_feature` syntax.
2) The `__SANITIZER_DISABLE_CONTAINER_OVERFLOW__` macro is not supported
in MSVC ASan (we have an equivalent in `_DISABLE_STL_ANNOTATION`)
because `__SANITIZER_DISABLE_CONTAINER_OVERFLOW__` also invokes
MSVC-incompatible syntax.
**This PR** addresses these two failures.
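One common way to keep such a test preprocessable on compilers without `__has_feature` is the guard below. This is a sketch of the general idiom, not necessarily what this PR does; `EXAMPLE_ASAN_ENABLED` and `asanEnabled` are hypothetical names. `__SANITIZE_ADDRESS__` is the macro that GCC and MSVC define when building with AddressSanitizer.

```cpp
// Compilers that lack __has_feature would otherwise fail to preprocess
// an expression such as `#if __has_feature(address_sanitizer)`.
#ifndef __has_feature
#define __has_feature(x) 0
#endif

// Hypothetical flag: 1 when AddressSanitizer is detected, 0 otherwise.
#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
#define EXAMPLE_ASAN_ENABLED 1
#else
#define EXAMPLE_ASAN_ENABLED 0
#endif

// Hypothetical helper so regular code can branch on the flag.
constexpr bool asanEnabled() { return EXAMPLE_ASAN_ENABLED != 0; }
```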
[19 lines not shown]
rge: add Wake-on-LAN support for magic packet
Advertise IFCAP_WOL_MAGIC when PCI power management is available
and enable it by default. On suspend or shutdown, rge_setwol()
enables the WOL_MAGIC and WOL_LANWAKE bits in CFG3/CFG5, disables
the RXDV gate, and enables PM so the NIC stays powered to watch
for magic packets.
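For context, the magic packet payload the NIC watches for is 6 bytes of 0xFF followed by the target MAC address repeated 16 times. A minimal C++17 sketch of that payload layout, unrelated to the driver code itself (`MacAddress`, `MagicPayload`, and `makeMagicPayload` are hypothetical names):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using MacAddress = std::array<std::uint8_t, 6>;
// 6 synchronization bytes plus 16 copies of the 6-byte MAC address.
using MagicPayload = std::array<std::uint8_t, 6 + 16 * 6>;

constexpr MagicPayload makeMagicPayload(const MacAddress &Mac) {
  MagicPayload P{};
  // Synchronization stream: six bytes of 0xFF.
  for (std::size_t I = 0; I < 6; ++I)
    P[I] = 0xFF;
  // Target MAC address, repeated 16 times back to back.
  for (std::size_t Rep = 0; Rep < 16; ++Rep)
    for (std::size_t I = 0; I < 6; ++I)
      P[6 + Rep * 6 + I] = Mac[I];
  return P;
}
```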
Move hardware-specific WOL register configuration into
rge_wol_config() in if_rge_hw.c to keep hardware-specific
functions in sync with OpenBSD.
Update rge.4 to document WoL support.
Tested on FreeBSD 16.0-CURRENT bare metal with Realtek RTL8125
on a Gigabyte B650 Gaming X AX motherboard.
Signed-off-by: Christos Longros <chris.longros at gmail.com>
[2 lines not shown]
[AMDGPU] Implement -amdgpu-spill-cfi-saved-regs
These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.
Change-Id: If4f34abb3a8e0e46b859a7c74ade21eff58c4047
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[AMDGPU] Implement CFI for CSR spills
Introduce new SPILL pseudos so that CFI is generated only for CSR
spills and the information is accurate at the ISA-instruction level.
Other targets either generate slightly incorrect information or rely on
conventions for how spills are placed within the entry block. The
approach in this change produces larger unwind tables, with the
increased size being spent on additional DW_CFA_advance_location
instructions needed to describe the unwinding accurately.
Change-Id: I9b09646abd2ac4e56eddf5e9aeca1a5bebbd43dd
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[Clang] Default to async unwind tables for amdgcn
To avoid codegen changes when enabling debug-info (see
https://bugs.llvm.org/show_bug.cgi?id=37240) we want to
enable unwind tables by default.
There is some pessimization in post-prologepilog scheduling, and a
general solution to the problem of CFI_INSTRUCTION-as-scheduling-barrier
should be explored.
Change-Id: I83625875966928c7c4411cd7b95174dc58bda25a
[AMDGPU] Implement CFI for non-kernel functions
This does not implement CSR spills other than those AMDGPU handles
during PEI. The remaining spills are handled in a subsequent patch.
Change-Id: I5e3a9a62cf9189245011a82a129790d813d49373
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[AMDGPU] Emit entry function Dwarf CFI
Entry functions represent the end of unwinding, as they are the
outer-most frame. This implies they can only have a meaningful
definition for the CFA, which AMDGPU defines using a memory location
description with a literal private address space address. The return
address is set to undefined as a sentinel value to signal the end of
unwinding.
Change-Id: I21580f6a24f4869ba32939c9c6332506032cc654
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[MC][Dwarf] Add custom CFI pseudo-ops for use in AMDGPU
While these can be represented with .cfi_escape, using these pseudo-cfi
instructions makes .s/.mir files more readable, and it is necessary to
support updating registers in CFI instructions (something that the
AMDGPU backend requires).
Change-Id: I763d0cabe5990394670281d4afb5a170981e55d0
[LoopUnrollAndJam] Fix out-of-date LoopInfo being used during unroll and jam (#191250)
Fixed issue #190671, where loop unroll and jam did not update LoopInfo
entirely correctly.
Invalid LoopInfo was passed into `simplifyLoopAfterUnroll()` and then
used by SCEV at the beginning of
`ScalarEvolution::createSCEVIter()`, which triggered hidden bugs. The
fix updates LoopInfo correctly before its use.
The loop blocks that `simplifyLoopAfterUnroll()` iterates through
become unavailable after the LoopInfo update. Therefore we store the
loop blocks beforehand for later use in
`simplifyLoopAfterUnroll()`.
[LifetimeSafety] Flow origins from lifetimebound args in `gsl::Pointer` construction (#189907)
This PR adds origin flow from `[[clang::lifetimebound]]` constructor
arguments during `gsl::Pointer` construction.
Fixes #175898
Stop using spir_kernel calling convention on non-SPIR targets. (#191090)
This behavior traces back to fc2629a65a05fa05bc5c5bc37cf910c8e41cdac3,
but neither the commit message nor the reviews actually justify using
this calling convention. The actual behavior which is important for that
change is the way clang calling convention lowering works.
There isn't really any other reason to use spir_kernel: every non-SPIR
target either rejects it, or treats it as the C calling convention. So
let's stop doing it.
Fixes #157028.
[ADT][NFC] Make po iterator stack entry trivially copyable (#191290)
std::tuple is not trivially copyable, leading to the use of less
efficient SmallVector implementations. Additionally, named members are
more readable than std::get<N>.
Also make sure that successors() is called only once per traversed basic
block -- this is difficult here: when the begin iterator is stored in
the vector between the calls, the second call can't be eliminated due to
the potentially visible store. When copying the entry into the vector,
SmallVector exposes the address of the alloca via ptrtoint to ensure
that the object indeed doesn't reside in the vector. We're missing
some optimization here... so very carefully work around this problem.
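A minimal C++ sketch of the first point, using a hypothetical `StackEntry` (the field names and pointer types are illustrative, not taken from the patch). A plain aggregate of pointers is trivially copyable, which the type trait verifies at compile time. Whether the equivalent `std::tuple` is trivially copyable depends on the standard library implementation, so the sketch only records that value instead of asserting it.

```cpp
#include <tuple>
#include <type_traits>

// Hypothetical stack entry with named members.
struct StackEntry {
  const int *Node;     // node being visited
  const int *ChildIt;  // next child to visit
  const int *ChildEnd; // one past the last child
};

static_assert(std::is_trivially_copyable<StackEntry>::value,
              "a plain struct of pointers is trivially copyable");

// The tuple-based equivalent. The result of the trait is implementation
// dependent; the commit message reports that std::tuple is not trivially
// copyable.
using TupleEntry = std::tuple<const int *, const int *, const int *>;
constexpr bool TupleEntryIsTriviallyCopyable =
    std::is_trivially_copyable<TupleEntry>::value;

// Named members read more clearly than std::get<1>(E) != std::get<2>(E).
constexpr bool hasMoreChildren(const StackEntry &E) {
  return E.ChildIt != E.ChildEnd;
}
```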
Strip .llvm. suffix after removing the coroutine suffixes to avoid breaking pseudo probe (#191354)
Pseudo probe is currently broken when a coroutine function is promoted
with a global name during ThinLTO import. The top-level function GUIDs in
the .pseudo_probe section are computed from the promoted name (with the
".llvm.xxxx" suffix) instead of the original function name. This leaves
a dangling top-level GUID that has no reference in the pseudo probe
desc and can hurt profile quality.
The root causes of the issue were:
1) ThinLTO post-link imports and promotes a local coroutine function,
creating a global function with a ".llvm.xxxx" suffix.
2) https://github.com/llvm/llvm-project/pull/141889 introduced a change
in the CoroSplit pass that updates the coroutine functions' linkage names
with the ".cleanup", ".destroy", ".resume" suffixes, which creates
top-level functions with ".llvm.xxxx.cleanup", ".llvm.xxxx.destroy",
".llvm.xxxx.resume" suffixes.
3) PseudoProbePrinter and PseudoProbeInserter only strip the coroutine
suffix and do not consider the ".llvm." suffix.
This patch fixes the issue in step 3).
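The intended name canonicalization can be sketched as follows. This is a simplified C++17 illustration, not the code in PseudoProbePrinter or PseudoProbeInserter: `stripCoroSuffix`, `stripLLVMSuffix`, and `canonicalName` are hypothetical helpers, and treating everything from the last ".llvm." onward as the promotion suffix is an assumption.

```cpp
#include <cstddef>
#include <string_view>

constexpr bool endsWith(std::string_view S, std::string_view Suffix) {
  return S.size() >= Suffix.size() &&
         S.substr(S.size() - Suffix.size()) == Suffix;
}

// Suffixes named in the commit message.
constexpr std::string_view CoroSuffixes[] = {".cleanup", ".destroy",
                                             ".resume"};

// Step 1: drop a trailing coroutine suffix, if any.
constexpr std::string_view stripCoroSuffix(std::string_view Name) {
  for (std::string_view Suffix : CoroSuffixes)
    if (endsWith(Name, Suffix))
      return Name.substr(0, Name.size() - Suffix.size());
  return Name;
}

// Step 2: drop the ".llvm.xxxx" promotion suffix, if any.
constexpr std::string_view stripLLVMSuffix(std::string_view Name) {
  std::size_t Pos = Name.rfind(".llvm.");
  return Pos == std::string_view::npos ? Name : Name.substr(0, Pos);
}

// Coroutine suffix first, then the ".llvm." suffix.
constexpr std::string_view canonicalName(std::string_view Name) {
  return stripLLVMSuffix(stripCoroSuffix(Name));
}
```

In this sketch, stripping only the coroutine suffix from a name such as "foo.llvm.1234.resume" leaves "foo.llvm.1234", which is the kind of promoted name the commit message describes as producing a dangling GUID; applying both steps recovers "foo".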
japanese/font-takao: Update to 003.03.01 and take maintainership
In this release, Takao and TakaoEx fonts are distributed separately.
Update MASTER_SITES and DISTNAME.
Lint with portclippy.
Refactor do-install.
Changelog: https://launchpad.net/takao-fonts/trunk/15.03
PR: 277679
Approved by: hrs (maintainer timeout > 3 months)
Approved by: fluffy (mentor)
[AMDGPU] Always update SETREG MSBs if offset is 0
We can always update the immediate if Offset is zero. The bits the
HW will write are always at the same position if the offset is 0.
In particular, this removes redundant mode changes, as seen
in hazard-setreg-vgpr-msb-gfx1250.mir.
This still relies on the wrong behaviour that SETREG updates
MSBs, so it will have to be changed later. Test immediates may be
off from the desired values for that reason in this patch.
Add SwitchableSimpleService base class
Subclasses can override select_systemd_unit_name() to switch between
systemd units at runtime, or return None when no unit is involved.
select_etc() allows mode-dependent config generation. Intended to
support services with alternative kernel/userspace implementations.
(cherry picked from commit fb396ad0d74bdd90796b7f682c359f0c666050ce)
[Clang] Permit '--target=amdgcn--' for binaries (#191451)
Summary:
We have always accepted `--target=amdgcn--` to create IR object files, but
it did not allow creating actual binaries without user intervention. This
is because it would fall through to the GCC toolchain, which does not
know how to handle AMDGCN / AMDGPU targets. This PR just adds a single
line to handle it, which effectively allows this as a 'bare' target.
Perhaps the argument could be made that AMDGPU should not support
anything but strictly HSA because it has many assumptions in the
compiler itself, such as implicit arguments, but I feel like it is
relatively harmless to support this case if users decide they really do
not need it.
NAS-140642 / 27.0.0-BETA.1 / Add SwitchableSimpleService base class (#18716)
Subclasses can override select_systemd_unit_name() to switch between
systemd units at runtime, or return None when no unit is involved.
select_etc() allows mode-dependent config generation. Intended to
support services with alternative kernel/userspace implementations.
[flang][OpenMP] Rename GetRequiredCount to GetMinimumSequenceCount
The new name better describes the calculated value.
Also adjust a diagnostic message to say that *at least* N loops are
expected in the sequence.