[MLIR][OpenMP][Docs] Reorganize 'omp' dialect documentation (NFC) (#107232)
This patch creates a handwritten main documentation page for the OpenMP
dialect linking to the ODS-generated one as a sub-section.
This new page can be extended to better describe overall design
decisions of the dialect rather than relying exclusively on
documentation generated automatically from ODS descriptions. After some
investigation, there seem to be a few main ways we could structure
dialect documentation to allow the introduction of possibly extensive
handwritten text.
- Create a top-level OpenMPDialect.td file that includes the
auto-generated one. This is what the `acc` dialect currently does, but
it results in the addition of two equal TOCs. It would be possible to
move the `include` before all handwritten sections so that the page
would have a single TOC, but I believe moving general descriptions to
the end of the document would hurt readability. Also keeping the section
order without introducing a second TOC would mean the TOC would be
inserted somewhere halfway through the page, which isn't useful.
[14 lines not shown]
[RISCV] Add Syntacore SCR7 processor definition (#108406)
Syntacore SCR7 is a high-performance Linux-capable RISC-V processor
core.
The core has rv64imafdcv_zba_zbb_zbc_zbs_zkn march.
Overview: https://syntacore.com/products/scr7
Scheduling model will be added in a subsequent PR.
---------
Co-authored-by: Dmitrii Petrov <dmitrii.petrov at syntacore.com>
Co-authored-by: Anton Afanasyev <anton.afanasyev at syntacore.com>
Co-authored-by: Elena Lepilkina <elena.lepilkina at syntacore.com>
[GlobalISel][AArch64] Add G_FPTOSI_SAT/G_FPTOUI_SAT (#96297)
This is an implementation of the saturating fp to int conversions for
GlobalISel. On AArch64 the converstion instrctions work this way,
producing saturating results. LegalizerHelper::lowerFPTOINT_SAT is
ported from SDAG.
AArch64 has a lot of existing tests for fptosi_sat, covering a wide
range of types. I have tried to make most of them work all at once, but
a few fall back due to other missing features such as f128 handling for
min/max.
[lldb][test] Remove benchmark API tests (#108629)
These benchmarks don't get run as part of the regular API test-suite.
And I'm not aware of any CI running this. Also, I haven't quite managed
to actually run them locally using the `bench.py` script. It looks like
these are obsolete, so I'm proposing to remove the infrastructure around
it entirely.
If anyone does know of a use for these do let me know.
[lldb] Support new libc++ __compressed_pair layout (#96538)
This patch is in preparation for the `__compressed_pair` refactor in
https://github.com/llvm/llvm-project/pull/76756.
This is mostly reviewable now. With the new layout we no longer need to
unwrap the `__compressed_pair`. Instead, we just need to look for child
members. E.g., to get to the underlying pointer of `std::unique_ptr` we
no longer do,
```
GetFirstValueOfCXXCompressedPair(GetChildMemberWithName("__ptr_"))
```
but instead do
```
GetChildMemberWithName("__ptr_")
```
We need to be slightly careful because previously the
[4 lines not shown]
[RISCV] Add some missing VP zvfhmin test coverage. NFC
We're also missing coverage for the vp_reduce_* nodes, but they
currently crash with zvfhmin. It looks like they're getting expanded
instead of promoted.
[libc++] Replace `__compressed_pair` with `[[no_unique_address]]` (#76756)
This significantly simplifies the code, improves compile times and
improves the object layout of types using `__compressed_pair` in the
unstable ABI. The only downside is that this is extremely ABI sensitive
and pedantically breaks the ABI for empty final types, since the address
of the subobject may change. The ABI of the whole object should not be
affected.
Fixes #91266
Fixes #93069
[SelectionDAG] Do not build illegal nodes with users (#108573)
When we build a node with illegal type which has a user, it's possible
that it can end up being processed by the DAG combiner later before it's
removed, which can trigger an assert expecting the types to be legalized
already.
[mlir][AsmParser] Avoid use of moved value (#108789)
'std::string detailData' is moved in the innermost loop of a 2-layer
loop, but is written to throughout the whole duration of the 2-layer
loop.
After move, std::string is in an unspecified state
(implementation-dependent).
Avoid using a moved value, as it incurs undefined behavior.
[MCA][ResourceManager] Fix a bug in the instruction issue logic. (#108386)
Before this patch, the pipeline selection logic in
ResourceManager::issueInstruction() didn't know how to correctly handle
instructions which consume multiple partially overlapping resource
groups. In some cases (like the test case from #108157), the inability
to correctly allocate resources on instruction issue was leading to
crashes.
The presence of multiple partially overlapping groups complicates the
selection process by introducing extra constraints. For those cases, the
issue logic now prioritizes groups which are more constrained than
others.
Fixes #108157
[IPSCCP] Infer attributes on arguments (#107114)
During inter-procedural SCCP, also infer attributes on arguments, not
just return values. This allows other non-interprocedural passes to make
use of the information later.
[NFC][LoopVectorize] Dont pass LLVMContext to VPTypeAnalysis constructor (#108540)
We already pass a Type object into the VPTypeAnalysis constructor, which
can be used to obtain the context. While in the same area it also made
sense to avoid passing the context into the VPTransformState and
VPCostContext constructors.
[RecursiveASTVisitor] Do not inline TraverseStmt (NFC) (#107601)
As things are now, this reduces size of clang bootstrapped with ThinLTO
by 0.3% and reduces thin link time by 0.3%. More importantly, it avoids
a large regression once https://github.com/llvm/llvm-project/pull/107114
is merged. Without this change, there would be a 0.4% regression in code
size and 4% (!) regression in thin link time. There is no impact on
run-time performance.
[InitUndef] Enable the InitUndef pass on non-AMDGPU targets (#108353)
The InitUndef pass works around a register allocation issue, where undef
operands can be allocated to the same register as early-clobber result
operands. This may lead to ISA constraint violations, where certain
input and output registers are not allowed to overlap.
Originally this pass was implemented for RISCV, and then extended to ARM
in #77770. I've since removed the target-specific parts of the pass in
#106744 and #107885. This PR reduces the pass to use a single
requiresDisjointEarlyClobberAndUndef() target hook and enables it by
default. The hook is disabled for AMDGPU, because overlapping
early-clobber and undef operands are known to be safe for that target,
and we get significant codegen diffs otherwise.
The motivating case is the one in arm64-ldxr-stxr.ll, where we were
previously incorrectly allocating a stxp input and output to the same
register.
[ASan][test] XFAIL global-overflow.cpp etc. on SPARC (#108200)
When enabling ASan testing on SPARC as per PR #107405, two tests `FAIL`
in similar ways as detailed in Issue #108194: at `-O1` and above, one
line of the stacktrace lacks the line number info, causing the tests to
`FAIL`. I could trace this to `clang` generating incomplete line number
info; `g++` gets this right.
To avoid this, this patch `XFAIL`s the affected tests on SPARC.
Tested on `sparcv9-sun-solaris2.11`.
[sanitizer_common][test] Disable sanitizer_coverage_trace_pc_guard.cp… (#108206)
…p etc. on SPARC
When enabling ASan testing on SPARC as per PR #107405, two tests `FAIL`:
```
SanitizerCommon-asan-sparc-SunOS :: sanitizer_coverage_trace_pc_guard-dso.cpp
SanitizerCommon-asan-sparc-SunOS :: sanitizer_coverage_trace_pc_guard.cpp
```
The issue is the same in both cases:
```
WARNING: No coverage file for projects/compiler-rt/test/sanitizer_common/asan-sparc-SunOS/Output/sanitizer_coverage_trace_pc_guard.cpp.tmp
WARNING: No coverage file for sanitizer_coverage_trace_pc_guard.cpp.tmp.22766.sancov
ERROR: No valid coverage files given.
```
Checking the file with `sancov -print` reveals `Wrong magic:
4294967090`. There seems to be an endianess bug somewhere, since the
[4 lines not shown]
[ASan] Disable InstallAtForkHandler on Linux/sparc64 (#108542)
When SPARC Asan testing is enabled by PR #107405, many Linux/sparc64
tests just hang like
```
#0 0xf7ae8e90 in syscall () from /usr/lib32/libc.so.6
#1 0x701065e8 in __sanitizer::FutexWait(__sanitizer::atomic_uint32_t*, unsigned int) ()
at compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp:766
#2 0x70107c90 in Wait ()
at compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp:35
#3 0x700f7cac in Lock ()
at compiler-rt/lib/asan/../sanitizer_common/sanitizer_mutex.h:196
#4 Lock ()
at compiler-rt/lib/asan/../sanitizer_common/sanitizer_thread_registry.h:98
#5 LockThreads ()
at compiler-rt/lib/asan/asan_thread.cpp:489
#6 0x700e9c8c in __asan::BeforeFork() ()
at compiler-rt/lib/asan/asan_posix.cpp:157
#7 0xf7ac83f4 in ?? () from /usr/lib32/libc.so.6
[11 lines not shown]
[X86] Avoid generating nested CALLSEQ for TLS pointer function arguments
When a pointer to thread-local storage is passed in a function call,
ISel first lowers the call and wraps the resulting code in CALLSEQ
markers. Afterwards, to compute the pointer to TLS, a call to retrieve
the TLS base address is generated and then wrapped in a set of CALLSEQ
markers. If the latter call is inserted into the call sequence of the
former call, this leads to nested call frames, which are illegal and
lead to errors in the machine verifier.
This patch avoids surrounding the call to compute the TLS base address
in CALLSEQ markers if it is already surrounded by such markers. It
relies on zero-sized call frames being represented in the call frame
size info stored in the MachineBBs.
Fixes #45574 and #98042.
[CodeGen] Compute call frame sizes instead of storing them in MachineBBs
Previously, we would need to update stored call frame sizes whenever a
MachineBB with a call sequence is split or call frame instructions are
moved. By recomputing them instead, this change avoids the need for
keeping track of the call frame sizes throughout transformations. It
also fixes several cases where updates of call frame sizes were missing.
The overhead according to the compile time tracker is < +0.03% executed
instructions on all CT benchmarks.
This also allows distinguishing a zero-sized call frame from no open
call frame, which is helpful to avoid nested CALLSEQs (which are illegal
and cause machine verifier errors).
[ELF] Move InStruct into Ctx. NFC
Ctx was introduced in March 2022 as a more suitable place for such
singletons.
llvm/Support/thread.h includes <thread>, which transitively includes
sstream in libc++ and uses ios_base::in, so we cannot use `#define in ctx.sec`.
`symtab, config, ctx` are now the only variables using
LLVM_LIBRARY_VISIBILITY.
[analyzer] Refactor MallocChecker to use `BindExpr` in `evalCall` (#106081)
PR refactors `MallocChecker` to not violate invariant of `BindExpr`,
which should be called only during `evalCall` to avoid conflicts.
To achieve this, most of `postCall` logic was moved to `evalCall` with
addition return value binding in case of processing of allocation
functions. Check functions prototypes was changed to use `State` with
bound return value.
`checkDelim` logic was left in `postCall` to avoid conflicts with
`StreamChecker` which also evaluates `getline` and friends.
PR also introduces breaking change in the unlikely case when the
definition of an allocation function (e.g. `malloc()`) is visible: now
checker does not try to inline allocation functions and assumes their
initial semantics.
Closes #73830
[lldb] Nits on uses of llvm::raw_string_ostream (NFC) (#108745)
As specified in the docs,
1) raw_string_ostream is always unbuffered and
2) the underlying buffer may be used directly
( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Avoid unneeded calls to raw_string_ostream::str(), to avoid excess
indirection.
[Instrumentation] Move out to Utils (NFC) (#108532)
Utility functions have been moved out to Utils. Minor opportunity to
drop the header where not needed.
[Attributor] Take the address space from addrspacecast directly
If the value to be analyzed is directly from addrspacecast, we take the source
address space directly. This is to improve the case where in
`AMDGPUPromoteKernelArgumentsPass`, the kernel argument is promoted by
insertting an addrspacecast directly from a generic pointer. However, during the
analysis, the underlying object will be the generic pointer, instead of the
addrspacecast, thus the inferred address space is the generic one, which is not
ideal.
[Attributor] Take the address space from addrspacecast directly
If the value to be analyzed is directly from addrspacecast, we take the source
address space directly. This is to improve the case where in
`AMDGPUPromoteKernelArgumentsPass`, the kernel argument is promoted by
insertting an addrspacecast directly from a generic pointer. However, during the
analysis, the underlying object will be the generic pointer, instead of the
addrspacecast, thus the inferred address space is the generic one, which is not
ideal.