[clang-index][USR] GenLoc prints file entry at most once, allow repeated offsets
GenLoc previously printed the source location at most once per USR,
gated by a member flag toggled on the first call. During the
recursive visit, if both an outer and an inner decl needed to print
the location, only the outer one was printed. When the outer decl did
not need the offset, no offset was ever printed. For example, the USR
of `Holder<decltype([]{})>::method` depends on the location of the
type of the lambda but the outer decl prints the file entry only,
which disables offset printing.
Change the logic so the file-entry part of the location is printed at
most once (it must be identical), while offsets of sub-decl locations
may be printed multiple times.
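As an illustration only, the new gating can be modelled in a few lines of Python. This is a toy sketch, not the C++ in clang's USR generation: the class `LocPrinter`, its method names, and the `file:`/`@offset` output format are all hypothetical and chosen only to show which part is gated.

```python
class LocPrinter:
    """Toy model of the revised location gating (all names hypothetical)."""

    def __init__(self):
        self.out = []
        self.file_entry_printed = False  # gates only the file-entry part

    def gen_loc(self, filename, offset=None):
        # The file entry is identical for every decl in one USR, so emit it once.
        if not self.file_entry_printed:
            self.out.append(f"file:{filename}")
            self.file_entry_printed = True
        # Offsets identify individual sub-decls, so they are never suppressed.
        if offset is not None:
            self.out.append(f"@{offset}")

    def result(self):
        return "".join(self.out)


p = LocPrinter()
p.gen_loc("a.cpp")             # outer decl: file entry only
p.gen_loc("a.cpp", offset=42)  # inner lambda type: offset is still printed
p.gen_loc("a.cpp", offset=57)  # a second sub-decl offset is printed too
usr_fragment = p.result()
```

Under the old single-flag scheme, the first call would have suppressed both later offsets. In this model only the repeated file entry is dropped.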
[docs] Enforce unambiguous toctree in llvm/docs
It seems that using a non-`hidden` `toctree` for page navigation is a
bit of a trap: every doc must have a single unique path through the
global toctree to the root doc, and it is very easy to end up with
more than one.
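The uniqueness condition can be checked mechanically. The following is a minimal sketch, assuming the toctree entries have already been extracted into a plain dict. The function `find_ambiguous_docs` and the sample layout are hypothetical, and this is not how Sphinx itself performs its consistency check.

```python
from collections import defaultdict


def find_ambiguous_docs(toctrees):
    """Return {doc: sorted parents} for docs listed by more than one toctree.

    `toctrees` maps a parent document name to the documents its toctree
    directives reference (a hypothetical, pre-extracted structure).
    """
    parents = defaultdict(set)
    for parent, children in toctrees.items():
        for child in children:
            parents[child].add(parent)
    return {doc: sorted(ps) for doc, ps in parents.items() if len(ps) > 1}


# Made-up layout reusing a few document names for illustration.
toctrees = {
    "index": ["UserGuides", "GettingInvolved"],
    "UserGuides": ["GitHub", "AMDGPUUsage"],
    "GettingInvolved": ["GitHub"],
}
ambiguous = find_ambiguous_docs(toctrees)
```

Because parents are collected into a set, this sketch does not flag a document that is listed twice by the same parent.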
This patch tries to address the warnings in llvm/docs/ (actually infos,
which is why they do not fail the build), namely:
$ sphinx-build -b html -jauto llvm/docs/ /tmp/sphinx-out
checking consistency...
llvm/docs/AMDGPUDwarfExtensionAllowLocationDescriptionOnTheDwarfExpressionStack/AMDGPUDwarfExtensionAllowLocationDescriptionOnTheDwarfExpressionStack.md: document is referenced in multiple toctrees: ['UserGuides', 'AMDGPUUsage'], selecting: UserGuides <- AMDGPUDwarfExtensionAllowLocationDescriptionOnTheDwarfExpressionStack/AMDGPUDwarfExtensionAllowLocationDescriptionOnTheDwarfExpressionStack
llvm/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.rst: document is referenced in multiple toctrees: ['UserGuides', 'AMDGPUUsage'], selecting: UserGuides <- AMDGPUDwarfExtensionsForHeterogeneousDebugging
llvm/docs/CommandGuide/llvm-reduce.rst: document is referenced in multiple toctrees: ['CommandGuide/index', 'CommandGuide/index', 'Reference'], selecting: Reference <- CommandGuide/llvm-reduce
llvm/docs/GitHub.rst: document is referenced in multiple toctrees: ['GettingInvolved', 'UserGuides'], selecting: UserGuides <- GitHub
llvm/docs/GlobalISel/IRTranslator.rst: document is referenced in multiple toctrees: ['GlobalISel/index', 'GlobalISel/Pipeline'], selecting: GlobalISel/index <- GlobalISel/IRTranslator
llvm/docs/GlobalISel/InstructionSelect.rst: document is referenced in multiple toctrees: ['GlobalISel/index', 'GlobalISel/Pipeline'], selecting: GlobalISel/index <- GlobalISel/InstructionSelect
llvm/docs/GlobalISel/Legalizer.rst: document is referenced in multiple toctrees: ['GlobalISel/index', 'GlobalISel/Pipeline'], selecting: GlobalISel/index <- GlobalISel/Legalizer
[35 lines not shown]
[docs] Create utils/docs
llvm-project is home to many sphinx documentation sites, each with
configuration quirks and bespoke extensions.
The sphinx config model makes sharing code somewhat difficult. There
are options like sphinx-multiproject, but some of our docs builds are
out of the source tree while some are done out of the binary tree, so
the multiproject configuration itself would need to be generated. It
also would impose more uniformity around extensions than required.
This change instead creates a python package at utils/docs/llvm_sphinx
and makes it available to all sphinx-build processes via PYTHONPATH.
Each conf.py does not modify its own sys.path because not all builds are
out of the source tree, so there isn't a stable relative path to use to
refer to the utils/docs/ directory.
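A minimal sketch of the mechanism, assuming nothing about the real package contents: a temporary directory stands in for utils/docs/, a one-line `__init__.py` stands in for llvm_sphinx, and a `python -c` child process stands in for a sphinx-build run loading a conf.py. The `GREETING` attribute is hypothetical.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as utils_docs:
    # Stand-in for utils/docs/llvm_sphinx (contents are hypothetical).
    pkg = os.path.join(utils_docs, "llvm_sphinx")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("GREETING = 'shared sphinx helpers'\n")

    # The build system, not conf.py, puts the directory on PYTHONPATH.
    env = dict(os.environ, PYTHONPATH=utils_docs)
    # Stand-in for a conf.py: it imports the package without touching sys.path.
    conf_py = "import llvm_sphinx; print(llvm_sphinx.GREETING)"
    result = subprocess.run(
        [sys.executable, "-c", conf_py],
        env=env, capture_output=True, text=True, check=True,
    )
    imported_value = result.stdout.strip()
```

The child process needs no knowledge of where the package lives relative to its own location, which is the property that matters for builds run out of the binary tree.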
Type checking via pyright in the new package is pinned to Python 3.8
compatibility.
[29 lines not shown]
Bump minimum required sphinx Python to 3.8
There seems to be de-facto use of at least 3.6 in docs, namely:
* Use of pathlib (3.4) in various places
* Use of f-strings (3.6) in clang/docs/ghlinks.py
I don't see a strong reason to maintain the divide in minimum version
between test/docs, especially considering the "FIXME" indicating
the 3.0 lower bound was just a guess to begin with.
Change-Id: I11e00295ae0a13ec0f1c5cefbb2fdd2db272b152
[docs] Add BOLTAArch64OptimizationStatus to toctree
Building docs-bolt-html fails with:
Warning, treated as error:
/home/slinder1/llvm-project/scratch/bolt/docs/BOLTAArch64OptimizationStatus.rst:document isn't included in any toctree
Just add the orphan document to the toctree in the index to silence
this. If there is a better parent it can be moved somewhere else in the
tree.
Change-Id: I1d26d96d5485d97d29231da89f8c8408b375c41f
[dsymutil] Reuse a single thread pool across architectures (#204691)
dsymutil links the architectures of a universal binary on a thread pool,
and the parallel linker's DWARFLinkerImpl::link() then created a second
pool to link each architecture's object files. With one such inner pool
per architecture, dsymutil spun up more worker threads than the machine
has cores.
Add DWARFLinkerBase::setThreadPool() so the caller provides the pool.
The parallel linker schedules the object files on it as a
ThreadPoolTaskGroup. dsymutil hands over the pool it already uses to
schedule the architectures, llvm-dwarfutil passes one sized by
--num-threads, and the classic linker ignores it and manages its own
threads (always 2 for the lockstep algorithm).
The per-compile-unit cloning still runs on the global llvm::parallel
executor, whose per-thread allocators are indexed by getThreadIndex(),
so it can't move onto this pool.
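The ownership change can be sketched in Python as follows. This is an analogy under stated assumptions, not the LLVM API: `ToyLinker`, `set_thread_pool`, and `link` are hypothetical stand-ins, and `concurrent.futures.ThreadPoolExecutor` replaces the LLVM thread pool and task group. Unlike dsymutil, the sketch drives the architectures sequentially from the main thread, because blocking on nested tasks inside a Python executor can deadlock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


class ToyLinker:
    """Stand-in for a linker that borrows its pool from the caller."""

    def __init__(self):
        self.pool = None

    def set_thread_pool(self, pool):
        self.pool = pool  # borrowed; the linker never creates its own pool

    def link(self, object_files, seen_threads):
        def link_one(name):
            seen_threads.add(threading.get_ident())
            return f"linked {name}"

        # One batch of tasks per link() call, all on the shared pool.
        futures = [self.pool.submit(link_one, obj) for obj in object_files]
        wait(futures)
        return [f.result() for f in futures]


seen = set()
results = {}
with ThreadPoolExecutor(max_workers=4) as shared_pool:
    for arch in ("x86_64", "arm64"):
        linker = ToyLinker()
        linker.set_thread_pool(shared_pool)
        results[arch] = linker.link([f"{arch}/a.o", f"{arch}/b.o"], seen)
```

However many architectures are linked, the number of distinct worker threads recorded in `seen` cannot exceed the size of the one shared pool.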
[IR] Remove ProfileCount Abstraction
This only exists to differentiate between real and synthetic profiles.
Remove the abstraction now that we plan to fully remove synthetic
profiles.
Reviewers: mtrofin, david-xl
Reviewed By: mtrofin
Pull Request: https://github.com/llvm/llvm-project/pull/204770
[MLIR][OpenACC] Add acc-emit-remarks-loop pass (#205203)
Add a function-level pass that emits optimization remarks for loops in
`acc.compute_region`, describing their mapping to OpenACC parallel
levels (gang, worker, vector, sequential) and GPU dimensions (blockIdx,
threadIdx).
[lldb] Disable dynamic script interpreters by default under Xcode (#205423)
When LLDB_ENABLE_DYNAMIC_SCRIPTINTERPRETERS is set, liblldb's export
list is built by merging the undefined LLDB symbols extracted from each
script interpreter plugin's objects (119e57630281). Because the plugins
link liblldb, the generated file is wired into liblldb's link via
LINK_DEPENDS, a file-level dependency with no target-level edge.
The Xcode generator only has coarse target-level dependencies, so that
generated liblldb-script-interpreter.exports ends up attached to two
targets with no common dependency, which its "new build system" rejects
at generation time:
```
CMake Error in source/API/CMakeLists.txt:
.../source/API/liblldb-script-interpreter.exports
is attached to multiple targets ... but none of these is a common
dependency of the other(s). This is not allowed by the Xcode "new
build system".
[6 lines not shown]
[PSI] Return raw entry count values
Now that synthetic entry counts are being removed, stop using the
ProfileCount wrapper around entry counts, given that it only exists to
distinguish between synthetic and real profile counts.
Reviewers: teresajohnson, david-xl, mtrofin
Pull Request: https://github.com/llvm/llvm-project/pull/204769
[WebAssembly] Cooperative threading for WASIP3 (#200855)
This PR builds on the changes to allow libcall thread context from
https://github.com/llvm/llvm-project/pull/175800/changes and adds the
necessary changes to support cooperative multithreading in the WASIP3
target:
- Not marking memory as shared
- Allowing thread local accesses without atomics
- Only using passive segments for TLS segments
The linker changes are supported by a new flag called
`--cooperative-multithreading`. We talked about having two flags, one
for the `--libcall-thread-context` part and one for the cooperative
multithreading part. For now, I've simply replaced the
`--libcall-thread-context` flag with the `--cooperative-multithreading`
one and kept the internal configuration intact for simplicity.
[IR] Remove Synthetic Profile Support from Function
Synthetic profiles are not generated anywhere and support is very
sporadic across the code base. They are slated to be removed, so remove
support for them from Function member functions.
A future PR will clean up the ProfileCount abstraction that is now no
longer necessary.
Reviewers: teresajohnson, david-xl, mtrofin
Pull Request: https://github.com/llvm/llvm-project/pull/204768
[BFI] Drop AllowSynthetic Parameter
This was never set anywhere to something other than the default outside
of the implementation and synthetic profile counts are slated for
removal.
Reviewers: teresajohnson, mtrofin, david-xl
Pull Request: https://github.com/llvm/llvm-project/pull/204767
[PSI] Drop AllowSynthetic parameter to getProfileCount
This was not set anywhere and synthetic profile counts are not
emitted/used anywhere, so remove it.
Reviewers: david-xl, mtrofin, teresajohnson
Pull Request: https://github.com/llvm/llvm-project/pull/204765
[MLIR][XeGPU] Refactor XeGPU layout propagation: passing lane_layout/lane_data with inst_data (#203156)
**Motivation**
Enhance setup* rules in layout propagation to pass lane_layout and
lane_data information during inst_data propagation, so that the
propagation has lane-level information when choosing an optimal
inst_data. This branch makes that relationship explicit and uniform
across all setup rules.
**Invariant**
All setup rules now produce layouts that satisfy:
Nd ops + dpas/dpas_mx: inst_data = k * (lane_layout * lane_data), k ≥ 1
Scatter/matrix ops + non-anchor ops: inst_data = lane_layout * lane_data
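For illustration, the invariant can be written as a small checker. This is a sketch under assumptions that the description above does not confirm: the product `lane_layout * lane_data` is taken elementwise, and where a multiple is allowed each dimension may use its own integer k ≥ 1. The function name, its `allow_multiple` flag, and the sample shapes are hypothetical.

```python
def satisfies_invariant(inst_data, lane_layout, lane_data, allow_multiple):
    """Check inst_data against lane_layout * lane_data, dimension by dimension.

    allow_multiple=True models the "k * (...)" form of the invariant,
    allow_multiple=False models the exact-equality form.
    """
    if not (len(inst_data) == len(lane_layout) == len(lane_data)):
        return False
    for inst, layout, data in zip(inst_data, lane_layout, lane_data):
        lane_tile = layout * data
        if lane_tile <= 0:
            return False  # layouts are assumed to be positive
        if allow_multiple:
            # inst must be k * lane_tile for some integer k >= 1.
            if inst < lane_tile or inst % lane_tile != 0:
                return False
        elif inst != lane_tile:
            return False
    return True


# Illustrative shapes only.
ok_multiple = satisfies_invariant([8, 16], [1, 16], [1, 1], allow_multiple=True)
ok_exact = satisfies_invariant([1, 16], [1, 16], [1, 1], allow_multiple=False)
bad_exact = satisfies_invariant([8, 16], [1, 16], [1, 1], allow_multiple=False)
```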
**Key changes in XeGPULayoutImpl**
- New per-op anchor setup rules: setupStoreNdAnchorLayout,
[38 lines not shown]
[LoopVectorize] Don't assert in getVectorCallCost for vector library variants (#202085)
During loop vectorization, `computePredInstDiscount` queries the cost of
instructions at vector VF using `getInstructionCost`. A `CallInst` with
a vector library variant delegates to `getVectorCallCost`, which
asserted that such variants should not reach it.
A predicated call can however reach `getVectorCallCost` via
`computePredInstDiscount` — before its widening decision is made — when
a predicated user (e.g. a scatter store) is being considered for
scalarization. Remove the assert and fall through to the existing
scalarization cost, which is the cost relevant to that analysis.
Adds a regression test exercising that path.
[AMDGPU] Change static NOP last terminator SI_DEMOTE_I1 to be replaced by S_BRANCH instead of assert (#204649)
This issue was first discovered in some testing downstream. A specific
chain of transformations on a ballot instruction with a constant
argument followed by an llvm.amdgcn.wqm.demote call leads to an
instruction of `SI_DEMOTE_I1 -1, 0` being the last terminator of a block
with a single successor. This instruction is a NOP and can safely be
replaced with an S_BRANCH to the block's successor instead of asserting
failure.
The test added in this change is a very simplified recreation of the
pattern seen in the downstream shader compilation that led to the
assertion failure.