Remove LLVM_ABI from symbolicate declaration in BacktraceTools.h (#175764)
The class is already annotated with LLVM_ABI, so individual members shouldn't be.
[MLIR][Python] Improve Iterator performance. Don't `throw` in `dunderNext` methods. (#175377)
In
https://github.com/llvm/llvm-project/pull/174139#issuecomment-3733259370
I wrote a scuffed benchmark that mostly iterates MLIR Container Types in
Python. My changes from that PR made the performance worse, so I closed
it.
However, when experimenting with that I also saw a large(?) performance
gain by changing the `dunderNext` methods of the various Iterators to
use `PyErr_SetNone(PyExc_StopIteration);` instead of `throw
nb::stop_iteration();`.
<details><summary>Benchmark attempt script</summary>
```python
import timeit
from mlir.ir import Context, Location, Module, InsertionPoint, Block, Region, OpView
[93 lines not shown]
[libc++] Simplify __unwrap_iter a bit (#175153)
`__unwrap_iter` doesn't need to SFINAE away, so we can just check inside
the function body whether an iterator is copy constructible. This
reduces the overload set, improving compile times a bit.
[AArch64][llvm] Improve codegen for svldr_vnum_za/svstr_vnum_za
When compiling `svldr_vnum_za` or `svstr_vnum_za`, the output
assembly has a superfluous `SXTW` instruction (gcc doesn't add
this); this should be excised, see https://godbolt.org/z/sz4s79rf8
In clang we're using int64_t, and `i32` in llvm. The extra `SXTW`
is due to a call to `DAG.getNode(ISD::SIGN_EXTEND...)`. Make them
both 64bit to make the extra `SXTW` go away.
[X86] Add bf16 support to isFMAFasterThanFMulAndFAdd for basic FMA optimizations (#172006)
This PR extends `isFMAFasterThanFMulAndFAdd` in `X86ISelLowering` to
handle bfloat types. This enables basic FMA optimizations for bf16
operations on AVX10.2 targets.
Includes tests for scalar and vector bf16 cases:
- Scalar bf16 FMA lowering (AVX10.2 does not support scalar bf16
operations)
- Vector bf16 FMA fusion for 128-bit, 256-bit, and 512-bit widths
AMDGPU: Change ABI of 16-bit element vectors on gfx6/7
Fix ABI on old subtargets to match new subtargets, packing
16-bit element subvectors into 32-bit registers. Previously
this would be scalarized and promoted to i32/float.
Note this only changes the vector cases. Scalar i16/half are
still promoted to i32/float for now. I've unsuccessfully tried
to make that switch in the past, so leave that for later.
This will help with removal of softPromoteHalfType.
GlobalISel: Fix mishandling vector-as-scalar in return values
This fixes 2 cases when the AMDGPU ABI is fixed to pass <2 x i16>
values as packed on gfx6/gfx7. The ABI does not pack values
currently; this is a pre-fix for that change.
Insert a bitcast if there is a single part with a different size.
Previously this would miscompile by going through the scalarization
and extend path, dropping the high element.
Also fix assertions in odd cases, like <3 x i16> -> i32. This needs
to unmerge with excess elements from the widened source vector.
All of this code is in need of a cleanup; this should look more
like the DAG version using getVectorTypeBreakdown.
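
The difference between the two return paths can be modelled outside the compiler. The following is only a rough sketch, not GlobalISel code: all names are hypothetical, and it assumes element 0 occupies the low half of the 32-bit part.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a <2 x i16> value.
using V2I16 = std::array<std::uint16_t, 2>;

// Bitcast path: both elements share the single 32-bit part.
// Assumption: element 0 lives in the low 16 bits.
inline std::uint32_t pack_bitcast(const V2I16 &v) {
  return static_cast<std::uint32_t>(v[0]) |
         (static_cast<std::uint32_t>(v[1]) << 16);
}

// Model of the miscompile: only element 0 is extended into the
// 32-bit part, so element 1 never reaches the caller.
inline std::uint32_t scalarize_and_extend(const V2I16 &v) {
  return static_cast<std::uint32_t>(v[0]);
}

// What a caller expecting a packed value reads back.
inline V2I16 unpack(std::uint32_t part) {
  return {static_cast<std::uint16_t>(part & 0xFFFFu),
          static_cast<std::uint16_t>(part >> 16)};
}
```

Round-tripping through `pack_bitcast` and `unpack` preserves both elements, while the `scalarize_and_extend` model reads back a zero high element.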
AMDGPU: Directly use v2bf16 as register type for bf16 vectors.
Previously we were casting v2bf16 to i32, unlike the f16 case. Simplify
this by using the natural vector type. This is probably a leftover from
before v2bf16 was treated as legal. This is preparation for fixing a
miscompile in globalisel.
[Hexagon] Fix PIC crash when lowering HVX vector constants (#175413)
Fix a PIC-only crash in Hexagon HVX lowering where we ended up treating
a vector-typed constant-pool reference as an address (e.g. when forming
PC-relative addresses), which triggers a type mismatch during lowering.
Build the constant-pool reference with the target pointer type instead,
then load the HVX vector from that address.
[lldb][test] Fix warning in DAP unit tests
ProtocolTypesTest.cpp:1140:19: warning: loop variable '[value, expected]' creates a copy from type 'std::pair<lldb_dap::protocol::ExceptionBreakMode, llvm::StringRef> const' [-Wrange-loop-construct]
for (const auto [value, expected] : test_cases) {
Not that it matters because the types are lightweight, but the warning
is creating noise in my builds.
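
One common way to silence this warning is to bind the structured binding by reference. A minimal sketch with stand-in types (the enum, function name, and test data below are hypothetical, not the ones in ProtocolTypesTest.cpp):

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for lldb_dap::protocol::ExceptionBreakMode.
enum class BreakMode { Never, Always };

// Concatenates the expected strings; the loop header is the point here.
inline std::string join_expected(
    const std::vector<std::pair<BreakMode, std::string_view>> &test_cases) {
  std::string out;
  // `const auto &[...]` binds to each pair in place. Writing
  // `const auto [...]` would copy every pair, which is what
  // -Wrange-loop-construct reports.
  for (const auto &[value, expected] : test_cases) {
    (void)value; // unused in this sketch
    out += expected;
    out += ';';
  }
  return out;
}
```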
[mlir][SCF] Fix region branch op interfaces for `scf.forall` and its terminator (#174221)
`scf.forall` does not completely implement the
`RegionBranchOpInterface`: `scf.forall.in_parallel` does not implement
the `RegionBranchTerminatorOpInterface`.
Incomplete interface implementation is a problem for transformations
that try to understand the control flow by querying the
`RegionBranchOpInterface`.
Detailed explanation of what is wrong with the current implementation:
- There is exactly one region branch point: "parent". `in_parallel` is
not a region branch point because it does not implement the
`RegionBranchTerminatorOpInterface`. (Clarified in #174978.)
- `ForallOp::getSuccessorRegions(parent)` returns one region successor:
the region of the `scf.forall` op.
- Since there is no region branch point in the region, there is no way
to leave the region. This means: once you enter the region, you are
stuck in it indefinitely. (It is unspecified what happens once you are
[18 lines not shown]
[flang][Lower] Lower OmpDependClause to Depend or Doacross
The clause::Depend class was a variant that either held a TaskDep
class or a Doacross clause. This mirrors the OmpDependClause in
the AST, which due to changes in the OpenMP spec can contain two
different forms.
This is not actually necessary, and we can save some complexity by
having clause::Depend only represent task dependence, and lowering
OmpDependClause to either clause::Depend or clause::Doacross.
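
The shape of this lowering can be sketched with `std::variant`. Everything below is a hypothetical simplification: the struct names mirror the ones mentioned above, but the fields and the `lower` function are invented for illustration and do not reflect flang's actual definitions.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

namespace ast {
// Hypothetical, simplified parse-tree forms of the depend clause.
struct TaskDep {
  std::string kind;                 // e.g. "in", "out"
  std::vector<std::string> objects; // dependence objects
};
struct Doacross {
  std::string kind; // e.g. "source", "sink"
};
struct OmpDependClause {
  std::variant<TaskDep, Doacross> u; // the AST keeps both forms
};
} // namespace ast

namespace clause {
// After the change, Depend models task dependence only.
struct Depend {
  std::string kind;
  std::vector<std::string> objects;
};
struct Doacross {
  std::string kind;
};
} // namespace clause

using LoweredClause = std::variant<clause::Depend, clause::Doacross>;

// The AST-level variant is resolved once, during lowering, so later
// code never has to ask which form a clause::Depend holds.
inline LoweredClause lower(const ast::OmpDependClause &c) {
  if (const auto *dep = std::get_if<ast::TaskDep>(&c.u))
    return clause::Depend{dep->kind, dep->objects};
  return clause::Doacross{std::get<ast::Doacross>(c.u).kind};
}
```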
[libc++][test] Move the SFINAE test for return types of `quoted` to `libcxx/test/libcxx/` (#157026)
[quoted.manip] only specifies that `operator<<`/`operator>>` is
well-formed for operands with suitable types, and leaves it undefined
whether they are SFINAE-friendly.
Although it's worthwhile making them SFINAE-friendly, perhaps the
SFINAE-friendliness should be considered as a libc++-specific choice at
this moment.
See LWG4364 for whether this should be considered portable.
[IR] Fix Module move-assignment missing NamedMDSymTab, ComdatSymTab and Parent update (#175501)
`Module::operator=(Module&&)` had three bugs:
1. `NamedMDSymTab` was not moved, which may cause getNamedMetadata() to
fail.
2. `ComdatSymTab` was not moved, which may cause getOrInsertComdat() to
fail.
3. `NamedMDNode::Parent` was not updated after splice, which may cause
getParent() to return the wrong Module.
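
The same class of bug can be reproduced in a toy container. The sketch below is not LLVM code: `ToyModule` and `ToyNode` are hypothetical stand-ins that only illustrate why the lookup table and the back-pointers have to follow the spliced nodes.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>
#include <utility>

struct ToyModule;

// Hypothetical stand-in for NamedMDNode: each node remembers its owner.
struct ToyNode {
  std::string name;
  ToyModule *parent = nullptr;
};

// Hypothetical stand-in for Module: an owning list plus a name table.
struct ToyModule {
  std::list<ToyNode> nodes;                // owning list of nodes
  std::map<std::string, ToyNode *> symtab; // name -> node lookup

  ToyModule() = default;
  ToyModule(const ToyModule &) = delete;
  ToyModule &operator=(const ToyModule &) = delete;

  ToyModule &operator=(ToyModule &&other) {
    if (this == &other)
      return *this;
    nodes.clear();
    symtab.clear();
    // Splicing transfers the nodes without invalidating pointers to them.
    nodes.splice(nodes.end(), other.nodes);
    // The lookup table has to move along with the nodes it points at.
    symtab = std::move(other.symtab);
    other.symtab.clear();
    // Back-pointers still name `other` until they are rewritten.
    for (ToyNode &n : nodes)
      n.parent = this;
    return *this;
  }

  ToyNode *add(const std::string &name) {
    nodes.push_back(ToyNode{name, this});
    symtab[name] = &nodes.back();
    return &nodes.back();
  }

  ToyNode *lookup(const std::string &name) const {
    auto it = symtab.find(name);
    return it == symtab.end() ? nullptr : it->second;
  }
};
```

Dropping either the `symtab` move or the reparenting loop leaves the destination with nodes it cannot find by name, or with nodes that still point at the moved-from object.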
[SystemZ][z/OS] Handle labels for parts (#175665)
Global data is emitted into parts, which are modelled as an MCSection. A
label (symbol of type LD) is not allowed in a part, which requires
special handling. The approach is to not emit the label at all and to
use the part symbol in relocations instead.