[clang][docs] Add link to C++ modules Wikipedia page to docs (#169200)
This PR adds a link to the "[Modules
(C++)](https://en.wikipedia.org/wiki/Modules_(C++))" page on Wikipedia
and similar on cpp reference, as per recommendation by another
contributor.
[AArch64] Improve select dagcombine (#169925)
An AnyOf reduction (aka vector.reduce.or) with a fixed-width vector is
canonicalized to a bitcast of the mask vector to an integer of the same
overall size, which is then compared against zero.
If the scalar result of the bitcast is smaller than the element size of
vectors being selected, we often end up with suboptimal codegen. This
fixes the main cases, removing scalarized code.
[libc++] Use private CMake flags to enable the pragma system_header macro when building (#138826)
That property doesn't need to be propagated beyond the translation units
of the libc++ built library itself.
Address reviewer feedback: fix getWaitCountMax and reduce code duplication
- Fix getWaitCountMax() to use correct bitmasks based on architecture:
- Pre-GFX12: Use getVmcntBitMask/getLgkmcntBitMask for LOAD_CNT/DS_CNT
- GFX12+: Use getLoadcntBitMask/getDscntBitMask for LOAD_CNT/DS_CNT
- Refactor repetitive if-blocks for LOAD_CNT, DS_CNT, EXP_CNT into
a single loop using getCounterRef helper function
- Fix X_CNT to return proper getXcntBitMask(IV) instead of 0
AMDGPU: Fix treating unknown mem operands as uniform (#168980)
The test changes are mostly GlobalISel specific regressions.
GlobalISel is still relying on isUniformMMO, but it doesn't really
have an excuse for doing so. These should be avoidable with new
regbankselect.
There is an additional regression for addrspacecast for cov4. We
probably ought to be using a separate PseudoSourceValue for the
access of the queue pointer.
AMDGPU: Use ConstantPool as source value for DAG lowered kernarg loads
This isn't quite a constant pool, but probably close enough for this
purpose. We just need some known invariant value address. The aliasing
queries against the real kernarg base pointer will falsely report
no aliasing, but for invariant memory it probably doesn't matter.
[X86] combine-icmp.ll - fix copy+paste typo in concat_icmp_v64i8_v16i8 test (#170281)
I changed the condcode for variety but failed to update the constant to prevent constant folding
[Delinearization] Add test for inferred array size exceeds integer range (NFC) (#169048)
Add test cases where the delinearized arrays may not satisfy the
following "common" property:
`&A[I_1][I_2]...[I_n] == &A[J_1][J_2]...[J_n]` iff
`(I_1, I_2, ..., I_n) == (J_1, J_2, ..., J_n)`
The root cause of this issue is that the inferred array size is too
large and the offset calculation overflows.
Such results should be discarded during validation. This will be fixed
by #169902 .
Revert "[AArch64][ARM] Move ARM-specific InstCombine transforms into `Transforms/Utils` (#169589)"
This reverts commit 1c32b6f51ccaaf9c65be11d7dca9e5a476cddb5a due to failures on
BUILD_SHARED_LIBS builds.
[LV] Emit better debug and opt-report messages when vectorization is disallowed in the LoopVectorizer (#158513)
While looking into fixing #158499, I found some other cases where the
messages emitted could be improved. This PR improves both the messages
printed to the debug output and the missed-optimization messages in
cases where:
- loop vectorization is explicitly disabled
- loop vectorization is implicitly disabled by disabling all loop
transformations
- loop vectorization is set to happen only where explicitly enabled
A branch that should currently be unreachable is also added. If the
related logic ever breaks (eg. due to changes to getForce() or the
ForceKind enum) this should alert devs and users. New test cases are
also added to verify that the correct messages (and only them) are
outputted.
---------
[2 lines not shown]
[VPlan] Sink predicated stores with complementary masks. (#168771)
Extend the logic to hoist predicated loads
(https://github.com/llvm/llvm-project/pull/168373) to sink predicated
stores with complementary masks in a similar fashion.
The patch refactors some of the existing logic for legality checks to be
shared between hosting and sinking, and adds a new sinking transform on
top.
With respect to the legality checks, for sinking stores the code also
checks if there are any aliasing stores that may alias, not only loads.
PR: https://github.com/llvm/llvm-project/pull/168771
[X86] Make VBMI2 funnel shifts use VSHLD/VSHRD for const splats (#169401)
Make ISD::FSHL/FSHR legal on VBMI2 vector targets and convert to VSHLD/VSHRD in a combine
closes #166949
[AArch64] [test] Make unwind info tests actually use the right instructions
This makes them match the expected decoding of the unwind info
opcodes, avoiding mismatch indications from "dumpbin -unwindinfo".
[AArch64] [test] Write the seh.s test output object to a file
This is what is done in other tests; this makes it easier to
inspect the output of this test manually.
[AArch64] [test] Move tests for custom unwind opcodes to a separate function
These custom opcodes disable the checker for having the prologue
length actually match the opcodes (see checkARM64Instructions in
MCWin64EH.cpp) - which led to the prologue mismatching the opcodes
by one instruction, since 312d6b488ef9d7c0e4d649827820db7285e36406.
Move the special opcodes to a separate test function.
Remove the mismatched nop instruction at the end of the main
function, as this prologue now is assembled with the strict length
checking enabled.