[MLIR][OpenMP] Bail early in sortMapIndices if indices are the same (#169474)
If the comparator callback is given the same index for both arguments,
simply return false. Otherwise we end up adding invalid items to
occludedChildren, which causes items that should be kept to be removed
and results in failures that manifest in different forms (assertions,
ASan failures, UBSan failures, etc.).
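A minimal standalone sketch of the guard described above, not the actual
MLIR code: `Path`, `indexLess`, and `sortIndices` are hypothetical names,
and "occlusion" is simplified to "one index list is a prefix of another".
It shows why a comparator with side effects should return early when both
arguments are the same index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a member-index list, e.g. {0, 1}.
using Path = std::vector<int>;

// Returns true if paths[a] should sort before paths[b]. When the two paths
// agree up to the shorter length, the non-parent entry is recorded in
// `occluded` (a side effect of the comparison).
bool indexLess(const std::vector<Path> &paths,
               std::vector<std::size_t> &occluded, std::size_t a,
               std::size_t b) {
  assert(a < paths.size() && b < paths.size());

  // Guard: an element is never less than itself, and comparing it with
  // itself must not record anything as occluded.
  if (a == b)
    return false;

  const Path &pa = paths[a];
  const Path &pb = paths[b];
  const std::size_t n = std::min(pa.size(), pb.size());
  for (std::size_t i = 0; i < n; ++i)
    if (pa[i] != pb[i])
      return pa[i] < pb[i];

  // Equal up to the shorter length: treat the shorter path as the parent.
  const bool aIsParent = pa.size() < pb.size();
  occluded.push_back(aIsParent ? b : a);
  return aIsParent;
}

// Sorts positions into `paths`, collecting occluded positions on the way.
std::vector<std::size_t> sortIndices(const std::vector<Path> &paths,
                                     std::vector<std::size_t> &occluded) {
  std::vector<std::size_t> order(paths.size());
  for (std::size_t i = 0; i < order.size(); ++i)
    order[i] = i;
  std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return indexLess(paths, occluded, a, b);
  });
  return order;
}
```

Without the early return, a self-comparison in this sketch would fall
through to the final branch and record the element itself as occluded.
Whether a sort implementation ever compares an element with itself is
implementation-dependent, which may be why the failures showed up in
different forms.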
[OpenMP][flang] Add initial support for by-ref reductions on the GPU
Adds initial support for GPU by-ref reductions. In particular, this diff
adds support for reductions on scalar allocatables where reductions
happen on loops nested in `target` regions. For example:
```fortran
integer :: i
real, allocatable :: scalar_alloc
allocate(scalar_alloc)
scalar_alloc = 0
!$omp target map(tofrom: scalar_alloc)
!$omp parallel do reduction(+: scalar_alloc)
do i = 1, 1000000
  scalar_alloc = scalar_alloc + 1
end do
!$omp end target
```
[12 lines not shown]
Revert "Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG""
This reverts commit bb78728826ff57f3df859e79bfd857b5a175bb6d.
[DAG] SDPatternMatch - add m_SpecificFP matcher (#167438)
This patch introduces a SpecificFP matcher for SelectionDAG nodes.
This includes:
- Adding SpecificFP_match() in SDPatternMatch.h.
- Adding test coverage in SelectionDAGPatternMatchTest.cpp.
Closes #165566
[AArch64][SME] Support saving/restoring ZT0 in the MachineSMEABIPass
This patch extends the MachineSMEABIPass to support ZT0. This is done
with the addition of two new states:
- `ACTIVE_ZT0_SAVED`
  * This is used when calling a function that shares ZA, but does
    not share ZT0 (i.e., no ZT0 attributes).
  * This state indicates ZT0 must be saved to the save slot, but
    ZA must remain on, with no lazy save setup.
- `LOCAL_COMMITTED`
  * This is used for saving ZT0 in functions without ZA state.
  * This state indicates ZA is off and ZT0 has been saved.
  * This state is general enough to support ZA, but those
    have not been implemented†
To aid with readability, the state transitions have been reworked to a
switch of `transitionFrom(<FromState>).to(<ToState>)`, rather than
nested ifs, which helps manage more transitions.
[5 lines not shown]
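One way a `transitionFrom(<FromState>).to(<ToState>)` switch can be built
is sketched below. This is an illustration under assumptions, not the
pass's code: the `ACTIVE` and `OFF` states, the packing scheme, and the
returned action strings are all hypothetical; only `ACTIVE_ZT0_SAVED` and
`LOCAL_COMMITTED` come from the description above.

```cpp
#include <cstdint>
#include <string>

// ACTIVE and OFF are placeholder states added for illustration.
enum class State : std::uint8_t { ACTIVE, ACTIVE_ZT0_SAVED, LOCAL_COMMITTED, OFF };

// Packs a (from, to) pair into one integer usable as a case label.
struct FromState {
  State from;
  constexpr std::uint16_t to(State target) const {
    return static_cast<std::uint16_t>((static_cast<unsigned>(from) << 8) |
                                      static_cast<unsigned>(target));
  }
};

constexpr FromState transitionFrom(State from) { return FromState{from}; }

// Returns an illustrative (hypothetical) action for a state transition.
std::string describeTransition(State from, State to) {
  switch (transitionFrom(from).to(to)) {
  case transitionFrom(State::ACTIVE).to(State::ACTIVE_ZT0_SAVED):
    return "save ZT0, keep ZA on";
  case transitionFrom(State::ACTIVE_ZT0_SAVED).to(State::ACTIVE):
    return "restore ZT0";
  case transitionFrom(State::ACTIVE).to(State::LOCAL_COMMITTED):
    return "save ZT0, turn ZA off";
  default:
    return "unhandled";
  }
}
```

Each transition becomes a single flat case label, so adding one is a
one-line change instead of another level of nested conditionals, and the
compiler rejects duplicate labels.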
[AMDGPU] Insert inliner anchor earlier
Add a new hook for inserting passes right after the last DummyCGSCC pass
and use it to insert the anchor. This changes the last FunctionPass
manager to be an inlining pass manager, thus preserving some of the
analyses that might be computed before the inliner and used after it (to
be fair that's never going to be a lot of analyses, since inlining is
pretty plastic, but at least some of the IR-level analyses that have
absolutely no reason to change can be computed only once).
This is how I originally designed the code, but I don't feel like I have
a good name/abstraction for this exact point in the pipeline, hence the
separate patch.
[AMDGPU] Update machine frame info during inlining
Update some of the machine frame info while inlining functions. The
stack of the caller will now contain an additional object representing
the stacks of its callees that have been inlined.
Also update some other info such as HasCalls and a few other pieces of
info that are trivial to update (this isn't very thorough or exhaustive,
and notably doesn't handle tail calls).
Support inlining in backend update scripts
Generate CHECK-NOT for MIR functions that are missing from the output.
Also look for conflicts where a MIR function is generated for some runs
but not others with the same prefixes.
[LoongArch][DAGCombiner] Combine vand (vnot ..) to vandn (#161037)
After this commit, DAGCombiner will have more opportunities to perform
vector folding. This patch includes several foldings, as follows:
- VANDN(x, NOT(y)) -> AND(NOT(x), NOT(y)) -> NOT(OR(x, y))
- VANDN(x, SplatVector(Imm)) -> AND(NOT(x), NOT(SplatVector(~Imm)))
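The two folds above rest on plain bitwise identities, which can be
checked on a single scalar lane. The sketch below assumes
VANDN(a, b) = AND(NOT(a), b), as the first fold implies; the helper names
are hypothetical and a 32-bit integer stands in for one vector lane.

```cpp
#include <cstdint>

// Assumed lane semantics: VANDN(a, b) = ~a & b.
constexpr std::uint32_t vandn(std::uint32_t a, std::uint32_t b) {
  return ~a & b;
}

// VANDN(x, NOT(y)) == NOT(OR(x, y)), by De Morgan's law.
constexpr bool foldNotOperandHolds(std::uint32_t x, std::uint32_t y) {
  return vandn(x, ~y) == static_cast<std::uint32_t>(~(x | y));
}

// VANDN(x, Imm) == AND(NOT(x), NOT(~Imm)), since NOT(NOT(Imm)) == Imm.
constexpr bool foldSplatImmHolds(std::uint32_t x, std::uint32_t imm) {
  return vandn(x, imm) == (~x & ~(~imm));
}
```

The second identity is trivially true. Rewriting the splat as a NOT of
the inverted immediate appears to be what lets the first fold apply to
constant operands as well.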
[DAG][X86] Improve custom i256/i512 AVX512 CTLZ/CTTZ Handling with MVT::i256/i512 (#168860)
This patch moves the AVX512 CTLZ/CTTZ i256/i512 codegen to
ReplaceNodeResults so that these ops can be declared as custom lowering.
This allows larger integer types (e.g. i1024) to fall back to them
during their expansion.
However, to declare these i256/i512 ops as custom, we need to add
MVT::i256/i512 simple types. I'm intending to add further large integer
handling in the future, some of which will use vector register
instructions, and it's going to be much easier if this can be handled
with i128/i256/i512 types that match the vector register sizes.
This exposed a regression in NVPTX due to their use of EVT::isSimple()
to match their upper integer size bounds.