[libomp] Add kmp_vector (ADT 2/2)
See rationale in the commit adding kmp_str_ref.
This commit introduces kmp_vector, a class intended primarily for small
vectors. It currently only includes methods I need at the moment, but
it's easily extensible.
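
As a rough sketch of the small-vector idea only, not kmp_vector's actual
interface: the first N elements live in inline storage, and the heap is
touched only once the vector grows past N. The name `small_vec`, its
members, the restriction to trivially copyable types and the use of
malloc/free are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical small vector; illustrative only, not the kmp_vector API.
template <typename T, std::size_t N> class small_vec {
  static_assert(N > 0, "need at least one inline slot");
  static_assert(std::is_trivially_copyable<T>::value,
                "sketch copies elements with memcpy");

  T inline_buf_[N];            // storage used while size() <= N
  T *data_ = inline_buf_;      // points at inline_buf_ or a malloc'd block
  std::size_t size_ = 0;
  std::size_t cap_ = N;

public:
  small_vec() = default;
  small_vec(const small_vec &) = delete;            // keep the sketch simple
  small_vec &operator=(const small_vec &) = delete;
  ~small_vec() {
    if (data_ != inline_buf_)
      std::free(data_);
  }

  void push_back(const T &v) {
    if (size_ == cap_) {
      // Grow geometrically; the first growth moves off the inline buffer.
      std::size_t new_cap = cap_ * 2;
      T *p = static_cast<T *>(std::malloc(new_cap * sizeof(T)));
      assert(p && "allocation failed");
      std::memcpy(p, data_, size_ * sizeof(T));
      if (data_ != inline_buf_)
        std::free(data_);
      data_ = p;
      cap_ = new_cap;
    }
    data_[size_++] = v;
  }

  std::size_t size() const { return size_; }
  bool is_inline() const { return data_ == inline_buf_; }
  T &operator[](std::size_t i) {
    assert(i < size_);
    return data_[i];
  }
};
```

With `small_vec<int, 2>`, the first two push_back calls stay in the
inline buffer and the third one allocates.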
AMDGPU: Reland: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3
For V_DOT2_F32_F16 and V_DOT2_F32_BF16 add their VOPDName and mark
them with usesCustomInserter which will be used to add pre-RA register
allocation hints to preferably assign dst and src2 to the same physical
register. When the hint is satisfied, canMapVOP3PToVOPD recognises the
instruction as eligible for VOPD pairing by checking if it is VOP2-like:
dst==src2, no source modifiers, no clamp, and src1 is a register.
Mark both instructions as commutable to allow a literal in src1 to be
moved to src0, since VOPD only permits a literal in src0.
The original patch had a bug: it did not check whether physical source
registers match the register class of the corresponding operand in the
full VOPD instructions. That check is now done via isValidVOPDSrc.
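
The VOP2-like conditions and the literal commute described above can be
sketched on a toy model. This is not the backend code: `ToyOperand`,
`ToyDot2`, `isVOP2Like` and `commuteLiteralToSrc0` are hypothetical
names, and the real check also validates register classes via
isValidVOPDSrc, which is not modelled here.

```cpp
#include <cassert>
#include <utility>

// Toy operand: either a physical register or a literal.
struct ToyOperand {
  bool is_reg;        // false means a literal
  unsigned reg;       // register number, meaningful only if is_reg
  unsigned modifiers; // nonzero if any source modifier is set
};

// Toy stand-in for a V_DOT2_F32_F16-style VOP3 instruction.
struct ToyDot2 {
  unsigned dst;
  ToyOperand src0, src1, src2;
  bool clamp;
};

// dst==src2, no source modifiers, no clamp, and src1 is a register.
bool isVOP2Like(const ToyDot2 &I) {
  bool no_mods = I.src0.modifiers == 0 && I.src1.modifiers == 0 &&
                 I.src2.modifiers == 0;
  bool dst_is_src2 = I.src2.is_reg && I.src2.reg == I.dst;
  return dst_is_src2 && no_mods && !I.clamp && I.src1.is_reg;
}

// VOPD only permits a literal in src0, so move a src1 literal there.
ToyDot2 commuteLiteralToSrc0(ToyDot2 I) {
  if (!I.src1.is_reg && I.src0.is_reg)
    std::swap(I.src0, I.src1);
  return I;
}
```

An instruction with a literal in src1 fails the check as written but
passes after the commute, provided the register allocation hint placed
dst and src2 in the same register.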
[AArch64][LLD] Update tests to reduce file size [NFC] (#202547)
Remove large object and executable files after running test. Some tests
need to run within a single OutputSection and cannot use a Linker Script
to increase distance without a large object and corresponding executable
file.
Fixes AArch64 part of #202261
[ARM][LLD] Remove large files at end of test [NFC] (#202548)
Some range extension and erratum fix thunks can't easily use a linker
script to make gaps that don't result in a large output. Explicitly
remove the large object and linker output files to reduce storage usage.
Related to #202261
[ARM][LLD] Reduce thunk test case size, linkerscript changes [NFC] (#202549)
These changes take one of two forms. Some tests are refactored to use
split-file and then delete their outputs, as the size saving is not
large. Others adapt the linker script to reduce the size by introducing
sparse program segments. All of these are fairly simple changes, with
minimal changes to the CHECK lines.
Related to #202261
[SPIR-V] Take ArrayRef instead of owning containers in selection helpers (NFC) (#203908)
Avoid per-call heap allocations where call sites pass braced-list
temporaries.
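
To illustrate why a non-owning view helps here, the sketch below uses a
hypothetical `array_view` as a simplified stand-in for llvm::ArrayRef;
`sum_owning` and `sum_view` are made-up helpers, not the SPIR-V
selection helpers themselves. A braced list passed to the owning
signature must first build a temporary std::vector, while the view just
points at the list's backing array.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical, minimal non-owning view over a contiguous range.
template <typename T> class array_view {
  const T *data_;
  std::size_t size_;

public:
  // Valid only for the duration of the full expression that owns the list.
  array_view(std::initializer_list<T> il)
      : data_(il.begin()), size_(il.size()) {}
  array_view(const std::vector<T> &v) : data_(v.data()), size_(v.size()) {}

  const T *begin() const { return data_; }
  const T *end() const { return data_ + size_; }
  std::size_t size() const { return size_; }
  const T &operator[](std::size_t i) const {
    assert(i < size_);
    return data_[i];
  }
};

// Owning parameter: sum_owning({1, 2, 3}) constructs a temporary vector.
unsigned sum_owning(const std::vector<unsigned> &ops) {
  unsigned s = 0;
  for (unsigned v : ops)
    s += v;
  return s;
}

// View parameter: sum_view({1, 2, 3}) copies no elements.
unsigned sum_view(array_view<unsigned> ops) {
  unsigned s = 0;
  for (unsigned v : ops)
    s += v;
  return s;
}
```

Both functions accept the same call-site syntax, so switching the
parameter type does not require changing callers.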
AMDGPU: Validate VOPD/VOPD3 physical source registers against operand RC
Replace isVGPR checks with isValidVOPDSrc that validates physical source
registers against the actual combined VOPD/VOPD3 instruction's operand
register classes. Now we also validate operands for VOPD instructions.
[lldb] Reformat doxygen comments in lldb-enumerations.h (NFC) (#203079)
Convert doxygen comments to precede the enumerator to which they apply
(using `///`). This placement of documentation is more consistent with
how functions and classes are documented. Additionally, with the column
limit, the documentation was quite cramped as it was. Lastly, comments
have been reflowed so that they make full use of the horizontal space.
Assisted-by: claude
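
For illustration, the two comment placements look roughly like this.
The enums and enumerator names are hypothetical and are not taken from
lldb-enumerations.h.

```cpp
// Before: trailing `///<` comments, squeezed by the column limit.
enum ToyStateBefore {
  eToyBeforeStopped, ///< The process is
                     ///< stopped.
  eToyBeforeRunning  ///< The process is
                     ///< running.
};

// After: `///` comments precede the enumerator they document.
enum ToyStateAfter {
  /// The process is stopped.
  eToyAfterStopped,
  /// The process is running.
  eToyAfterRunning
};
```

Only the comments move; the enumerator values are unchanged.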
[Lower] Wrap unstructured constructs in scf.execute_region
For each unstructured DO construct whose body only exits via the
construct's lexical exit (no GOTOs to outer labels), move the loop's
blocks into the region of a freshly-created scf.execute_region op
marked `no_inline`. The op sits in the outer CFG followed by a
cf.br to what used to be the construct-exit block; in-loop edges to
that block become scf.yield in a single yield block inside the region.
Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
[OFFLOAD][L0] Switch to use inorder queues by default (#203897)
Now that other pieces are in place we can switch to using Level Zero
inorder queues by default.
[lldb] Recompute the statusline on resize without clearing the screen (#202691)
On a terminal resize the statusline cleared the whole screen (ESC[2J)
and redrew, because recomputing in place was buggy: the statusline
wrapped and duplicated. The clear also wiped the visible scrollback on
every resize. I got lots of feedback that this wasn't a great user
experience so I spent some time taking another stab at this.
This PR goes back to recomputing the statusline. After a resize the
terminal still shows the old statusline:
- Making the terminal smaller (horizontally) reflows the full-width line
into ceil(prev_width / width) rows at the bottom
- Making the terminal larger (vertically) leaves it stranded at its old
row
Clear only the rows it can still occupy and redraw, preserving the
scrollback above. Disable autowrap while drawing so a line briefly wider
than the terminal is clipped at the margin rather than wrapping onto the
[7 lines not shown]
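
The row count from the reflow case above can be sketched as an integer
ceiling division. `stale_statusline_rows` is a hypothetical helper for
illustration and not the function used in lldb.

```cpp
#include <cassert>

// Number of bottom rows a statusline drawn at prev_width columns can
// occupy after the terminal is resized to width columns:
// ceil(prev_width / width), computed without floating point.
unsigned stale_statusline_rows(unsigned prev_width, unsigned width) {
  assert(width > 0 && "terminal width must be positive");
  return (prev_width + width - 1) / width;
}
```

Shrinking from 120 to 80 columns gives 2 rows to clear; growing from 80
to 120 columns gives 1.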
[SPIR-V] Deduce pointee type for all global variables, not only initialized ones (#202047)
SPIRVEmitIntrinsics::processGlobalValue only recorded a global variable's
element type when hasInitializer() was true. It now records it unconditionally.
Assisted-by: Claude (Anthropic)
AMDGPU/GlobalISel: Remove -new-reg-bank-select option
AMDGPU's -global-isel pipeline that uses AMDGPURegBankSelect and
AMDGPURegBankLegalize, previously -global-isel -new-reg-bank-select,
is now the default -global-isel pipeline.
Remove the -new-reg-bank-select option from the compiler.
Remove -new-reg-bank-select from all LLVM regression tests.
Edit a couple of comments to reference RegBankLegalize instead of
-new-reg-bank-select.
[Flang][Runtime] Fix write endfile abort (#191633)
A WRITE after ENDFILE with ERR= or IOSTAT= crashed instead of handling
the error properly. The crash occurred because the error was raised too
early, before error handling was ready.
---------
Co-authored-by: Jay Satish Kumar Patel <kumarpat at hpe.com>
[Flang][OpenMP] remove enable-delayed-privatization-staging to support target first private default (#203626)
This commit follows the decision in #182356 to remove the "not yet
implemented" restriction on delayed privatization of firstprivate and
private in `omp target` regions in Flang.
Fixes #182356
Assisted with Opus
[libomp] Add kmp_str_ref (ADT 1/2) (#176162)
libomp currently has two limitations:
1) although it's C++, it doesn't link against the C++ stdlib
2) it cannot link against the implementation of LLVM ADTs
These limitations shall not be altered at the moment.
As a result, this commit introduces kmp_str_ref, which is similar to
LLVM's StringRef. It currently only includes methods I need at the
moment, but it's easily extensible.
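
As a sketch of what a StringRef-like type looks like under these
constraints, and not kmp_str_ref's actual interface: the hypothetical
`str_ref` below is a pointer plus a length and relies only on C library
functions. The method names are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical non-owning string view; illustrative only.
class str_ref {
  const char *data_;
  std::size_t len_;

public:
  str_ref(const char *s) : data_(s), len_(std::strlen(s)) {}
  str_ref(const char *s, std::size_t n) : data_(s), len_(n) {}

  std::size_t size() const { return len_; }

  bool equals(str_ref other) const {
    return len_ == other.len_ &&
           std::memcmp(data_, other.data_, len_) == 0;
  }

  bool starts_with(str_ref prefix) const {
    return len_ >= prefix.len_ &&
           std::memcmp(data_, prefix.data_, prefix.len_) == 0;
  }

  // Returns a view without the first n characters; copies nothing.
  str_ref drop_front(std::size_t n) const {
    assert(n <= len_);
    return str_ref(data_ + n, len_ - n);
  }
};
```

For example, `str_ref("OMP_NUM_THREADS=4").drop_front(16)` yields a
view of "4" without allocating or copying.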
[JumpThreading] Use isGuaranteedToTransferExecutionToSuccessor() with range (#203918)
Use the overload that accepts a range of instructions. This is not NFC
because the scan is now subject to ScanLimit.
AMDGPU/GlobalISel: Use AMDGPURegBankSelect + AMDGPURegBankLegalize by default
Change AMDGPU's default -global-isel pipeline to use AMDGPURegBankSelect
and AMDGPURegBankLegalize (previously -global-isel -new-reg-bank-select)
by default instead of RegBankSelect which uses AMDGPURegisterBankInfo.
The -global-isel pipeline that used RegBankSelect/AMDGPURegisterBankInfo
is now deprecated, since it could not generate functionally correct code
in some cases involving divergent control flow and phis.
The -new-reg-bank-select option now does nothing and will be removed in
a follow-up patch.
Delete regbankselect-mui.ll and regbankselect-mui-salu-float.ll, which
existed to compare -global-isel against -global-isel -new-reg-bank-select.
Temporarily disable a couple of tests that are missing AMDGPURegBankLegalize
support.
[ConstantHoisting] Skip PHI edges from unreachable blocks (#203892)
When collecting constant candidates, skip incoming PHI edges from blocks
that are unreachable from entry. This avoids assertion failures when the
pass later tries to find insertion points or update users for constants
that only appear on unreachable edges.
While touching this part of the code, also remove an older XFAIL from
test/Transforms/ConstantHoisting/X86/pr52689-not-all-uses-rebased.ll.
That test case also triggered the same assert once upon a time, but it
has been set to XFAIL for some time since the reproducer no longer
triggered the bug. This patch turns it into a normal test case instead
of an XFAIL test. As far as I can tell, the original problem may have
been the same: we have PHI nodes with edges from unreachable blocks. One difference
compared to the new aarch64 test is that here the involved constants are
GEPs and not simple scalars.
Closes https://github.com/llvm/llvm-project/issues/52689
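
The idea of skipping PHI edges from unreachable blocks can be sketched
on a toy CFG. This is not the pass's code: `reachable_from_entry`,
`ToyIncoming` and `collect_candidates` are hypothetical, and block 0 is
assumed to be the entry block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// succs[b] lists the successor blocks of block b; block 0 is the entry.
std::vector<bool>
reachable_from_entry(const std::vector<std::vector<std::size_t>> &succs) {
  std::vector<bool> seen(succs.size(), false);
  std::vector<std::size_t> work;
  if (!succs.empty()) {
    seen[0] = true;
    work.push_back(0);
  }
  // Plain worklist traversal from the entry block.
  while (!work.empty()) {
    std::size_t b = work.back();
    work.pop_back();
    for (std::size_t s : succs[b]) {
      if (!seen[s]) {
        seen[s] = true;
        work.push_back(s);
      }
    }
  }
  return seen;
}

// One incoming (predecessor block, constant) pair of a toy PHI node.
struct ToyIncoming {
  std::size_t pred_block;
  long constant;
};

// Collect constants only from edges whose predecessor is reachable.
std::vector<long> collect_candidates(const std::vector<ToyIncoming> &phi,
                                     const std::vector<bool> &reachable) {
  std::vector<long> out;
  for (const ToyIncoming &in : phi) {
    assert(in.pred_block < reachable.size());
    if (reachable[in.pred_block])
      out.push_back(in.constant);
  }
  return out;
}
```

A constant that appears only on an edge from an unreachable block never
becomes a candidate, so later steps never need an insertion point for it.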