[AMDGPU] Saturate at i16 for f16 to i1/i8 conversion (#187467)
By using a native `v_cvt_i16/u16_f16` conversion and saturating at `i16`,
we avoid the additional `f16` to `f32` conversion that is required to
perform saturation at `i32`. It also allows clamping to be performed
with `i16` instructions, reducing the number of registers needed in
*true16* mode in some of the lit tests. The behavior is disabled for
pre-gfx8 targets by checking `has16BitInsts()`.
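The following Python sketch checks why saturating at 16 bits first is safe. It models saturating float-to-int conversion as we assume `llvm.fptosi.sat`/`llvm.fptoui.sat` define it (NaN to 0, truncate toward zero, clamp). It then compares a direct conversion to `i8`/`u8`/`i1`/`u1` against "saturate at 16 bits, then clamp" for every `f16` bit pattern. It models value semantics only, not AMDGPU instruction selection, and all function names are hypothetical.

```python
import math
import struct


def sat_convert(x: float, lo: int, hi: int) -> int:
    """Saturating float-to-int conversion (assumed fptosi.sat/fptoui.sat
    semantics): NaN -> 0, otherwise truncate toward zero and clamp."""
    if math.isnan(x):
        return 0
    if x <= lo:
        return lo
    if x >= hi:
        return hi
    return math.trunc(x)


def via_16bit(x: float, lo: int, hi: int, signed: bool) -> int:
    """Saturate at i16/u16 first, then clamp to the narrow range [lo, hi]."""
    wide = sat_convert(x, -32768, 32767) if signed else sat_convert(x, 0, 65535)
    return min(max(wide, lo), hi)


def all_f16_values():
    """Yield every f16 value, including infinities, NaNs and subnormals."""
    for bits in range(1 << 16):
        yield struct.unpack("<e", bits.to_bytes(2, "little"))[0]


def mismatches(lo: int, hi: int, signed: bool) -> int:
    """Count f16 inputs where the two-step lowering differs from direct saturation."""
    return sum(
        1
        for x in all_f16_values()
        if sat_convert(x, lo, hi) != via_16bit(x, lo, hi, signed)
    )


for name, lo, hi, signed in [("i8", -128, 127, True), ("u8", 0, 255, False),
                             ("i1", -1, 0, True), ("u1", 0, 1, False)]:
    print(name, mismatches(lo, hi, signed))  # prints "<name> 0" for each width
```

Because each narrow range lies inside the corresponding 16-bit range, clamping twice gives the same result as clamping once, so every count is zero.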
[flang][NFC] Converted five tests from old lowering to new lowering (part 36) (#187628)
Tests converted from test/Lower/Intrinsics: maxloc.f90, maxval.f90,
merge.f90, merge_bits.f90, minloc.f90
[VPlan] Skip epilogue vectorization if dead after narrowing IGs. (#187016)
When narrowing interleave groups, the main vector loop processes IC
iterations instead of VF * IC. Update selectEpilogueVectorizationFactor
to use the effective VF, checking if the canonical IV controlling the
loop now steps by UF instead of VF * UF.
This avoids vectorizing the epilogue when its vector loop would be dead,
and also prevents crashes in cases where we can prove that both the
epilogue and the scalar loop are dead.
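Here is a simplified model of the arithmetic. If the main vector loop advances by `main_step` original iterations, fewer than `main_step` iterations remain afterwards. An epilogue vector loop that advances by at least that many therefore never executes. This Python sketch ignores minimum-iteration checks, runtime checks and tail folding. `iteration_split` is a hypothetical helper, not the VPlan logic.

```python
def iteration_split(trip_count: int, main_step: int, epilogue_step: int):
    """Split trip_count across a main vector loop, an epilogue vector loop
    and a scalar remainder loop that run one after another.

    Each step is the number of original loop iterations consumed by one
    iteration of that loop (e.g. VF * UF for an unnarrowed vector loop).
    """
    main_iters, rest = divmod(trip_count, main_step)
    epilogue_iters, scalar_iters = divmod(rest, epilogue_step)
    return main_iters, epilogue_iters, scalar_iters


VF, UF = 4, 2
EPILOGUE_STEP = 4  # e.g. an epilogue vector loop with VF = 4 and UF = 1

# Without narrowing, the main loop consumes VF * UF = 8 iterations per step.
print(iteration_split(23, VF * UF, EPILOGUE_STEP))  # (2, 1, 3)

# After narrowing interleave groups, the canonical IV steps by UF = 2,
# so at most 1 iteration is left and the epilogue vector loop is dead.
print(iteration_split(23, UF, EPILOGUE_STEP))  # (11, 0, 1)
```

In this model the epilogue vector loop can run only when `epilogue_step < main_step`.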
Fixes https://github.com/llvm/llvm-project/issues/186846
PR: https://github.com/llvm/llvm-project/pull/187016
[libc][NFC] Fix typo in file.cpp (#91192) (#187688)
Corrected language and spelling errors in a comment within file.cpp.
Credit GH user @iBlanket for identifying this typo.
[analyzer] Don't rule out symbolic pointer pointing to stack (#187080)
Ensure that the analyzer doesn't rule out the equality (or guarantee
disequality) of a pointer to the stack and a symbolic pointer in unknown
space. Previously the analyzer incorrectly assumed that stack pointers
cannot be equal to symbolic pointers in unknown space.
It is true that functions cannot validly return pointers to their own
stack frame, but they can easily return a pointer to some other stack
frame (e.g. a function can return a pointer received as an argument).
The old behavior was introduced intentionally in 2012 by commit
3563fde6a02c2a75d0b4ba629d80c5511056a688, but it causes incorrect
analysis, e.g. it prevents the correct handling of some testcases from
the Juliet suite because it rules out the "fgets succeeds" branch.
Reported-by: Daniel Krupp <daniel.krupp at ericsson.com>
[OFFLOAD] Add GPU wrappers for headers currently supported by SPIRV built libc (#181913)
This is to add GPU wrappers for headers that are currently supported by
libc built for SPIRV.
[VPlan] Simplify mul x, -1 -> sub 0, x (#187551)
Simplify exactly as InstCombine does. A follow-up would include
simplifying add x, (sub 0, y) -> sub x, y.
Alive2 proof: https://alive2.llvm.org/ce/z/Af7QiD
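The fold relies on `mul x, -1` and `sub 0, x` agreeing in wrapping two's-complement arithmetic. This small Python sketch checks that identity exhaustively for narrow bit widths. It ignores `nsw`/`nuw` flags, and the Alive2 proof remains the authoritative check. The function names are illustrative.

```python
def wrap(value: int, bits: int) -> int:
    """Reduce value modulo 2**bits, as an iN register would hold it."""
    return value & ((1 << bits) - 1)


def mul_by_minus_one(x: int, bits: int) -> int:
    # `mul x, -1`: -1 in iN is the all-ones bit pattern.
    return wrap(x * wrap(-1, bits), bits)


def sub_from_zero(x: int, bits: int) -> int:
    # `sub 0, x`
    return wrap(0 - x, bits)


def fold_holds(bits: int) -> bool:
    """Check the fold for every bits-wide input value."""
    return all(
        mul_by_minus_one(x, bits) == sub_from_zero(x, bits)
        for x in range(1 << bits)
    )


print(all(fold_holds(bits) for bits in range(1, 13)))  # True
```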
[lldb] Add HTTP support in SymbolLocatorSymStore (#186986)
The initial version of SymbolLocatorSymStore supported only servers on
local paths. This patch extends it to HTTP/HTTPS endpoints. For that to
work on Windows, we add a WinHTTP-based HTTP client backend in LLVM next
to the existing CURL-based implementation.
We don't add an HTTP server implementation, because there is no use for
one right now. Test coverage for the LLVM part is built on
llvm-debuginfod-find and works without a server, since it checks the
textual output of request headers. The existing CURL-based
implementation uses the same approach.
The LLDB API test for the specific SymbolLocatorSymStore feature spawns
an HTTP server from Python.
To keep the size of this patch within reasonable limits, the initial
implementation of the SymbolLocatorSymStore feature is basic: there is
no caching, no verification of downloaded files and no protection
against file corruption. We use a local implementation of LLVM's
HTTPResponseHandler, but should think about extracting and reusing
[9 lines not shown]
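For orientation only: SymStore servers are conventionally laid out as `<file name>/<key>/<file name>`. For a PDB, the key is its GUID as 32 uppercase hex digits followed by its age in hex. Assuming that layout, an HTTP lookup comes down to building a URL, as in the Python sketch below. `symstore_url` and the server address are hypothetical, not the LLDB implementation.

```python
import uuid
from urllib.parse import quote


def symstore_url(base_url: str, pdb_name: str, guid: str, age: int) -> str:
    """Build a SymStore-style lookup URL for a PDB (assumed layout:
    <name>/<GUID as 32 uppercase hex digits><age in hex>/<name>)."""
    key = uuid.UUID(guid).hex.upper() + format(age, "X")
    name = quote(pdb_name)
    return f"{base_url.rstrip('/')}/{name}/{key}/{name}"


print(symstore_url("https://symbols.example.com/", "foo.pdb",
                   "12345678-9abc-def0-1234-56789abcdef0", 1))
# https://symbols.example.com/foo.pdb/123456789ABCDEF0123456789ABCDEF01/foo.pdb
```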
[analyzer] Fix logic in CallEvent::getReturnValueUnderConstruction (#187020)
The `CallEvent` has data members that store the `LocationContext` and
the `CFGElementRef` (i.e. `CFGBlock` + index of statement within that
block); but the method `getReturnValueUnderConstruction` ignored these
and used the currently analyzed `LocationContext` and `CFGBlock` instead
of them.
This was logically incorrect and would have caused problems if the
`CallEvent` were used later, when the "currently analyzed" things are
different. However, the lit tests still pass even if I assert that the
currently analyzed `LocationContext` and `CFGBlock` are the same as the
ones saved in the `CallEvent`, so I'm pretty sure that this bad logic
caused no actual problem and that this commit won't cause functional
changes.
I also evaluated this change on a set of open source projects (postgres,
tinyxml2, libwebm, xerces, bitcoin, protobuf, qtbase, contour, openrct2)
and validated that it doesn't change the results of the analysis.
[CGP][PAC] Flip PHI and blends when all immediate modifiers are the same
GVN PRE, SimplifyCFG and possibly other passes may hoist calls to the
`@llvm.ptrauth.blend` intrinsic, introducing multiple duplicate call
instructions hidden behind a PHI node. This prevents the instruction
selector from generating safer code by absorbing the address and
immediate modifiers into separate operands of the AUT, PAC, etc. pseudo
instructions.
This patch makes the CodeGenPrepare pass detect when a discriminator is
computed as a PHI node whose incoming values are all blends with the
same immediate modifier. Each such discriminator value is replaced by a
single blend whose address argument is computed by a PHI node.
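Below is a toy Python model of the rewrite on a hypothetical mini-IR, not LLVM's C++ API. A PHI whose incoming values are all blends with the same immediate modifier becomes a single blend of a PHI of the addresses. The class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Blend:
    """Toy stand-in for a call to @llvm.ptrauth.blend(addr, imm)."""
    addr: "Value"
    imm: int


@dataclass(frozen=True)
class Phi:
    """Toy PHI node: (predecessor block, incoming value) pairs."""
    incoming: Tuple[Tuple[str, "Value"], ...]


# Plain strings stand for other SSA values such as "%addr1".
Value = Union[str, Blend, Phi]


def flip_phi_of_blends(phi: Phi) -> Value:
    """Rewrite phi(blend(a1, C), blend(a2, C), ...) into
    blend(phi(a1, a2, ...), C); return phi unchanged otherwise."""
    values = [value for _, value in phi.incoming]
    if not values or not all(isinstance(v, Blend) for v in values):
        return phi
    immediates = {v.imm for v in values}
    if len(immediates) != 1:
        return phi  # different immediate modifiers: leave the PHI alone
    address_phi = Phi(tuple((block, v.addr) for block, v in phi.incoming))
    return Blend(address_phi, immediates.pop())


phi = Phi((("if.then", Blend("%addr1", 1234)),
           ("if.else", Blend("%addr2", 1234))))
print(flip_phi_of_blends(phi))
```

If the immediates differ, the sketch leaves the PHI untouched. This matches the "same immediate modifier" condition the pass checks for.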
[clang-tidy] Fix alphabetical order check for multiline doc entries and whitespace handling (#186950)
The `check_alphabetical_order.py` script previously scanned only the
first line of each bullet point in `ReleaseNotes.rst`, causing sorting
failures when a `:doc:` tag was split across multiple lines.
Also, when sorting the last entry of a section, the script inserted
unnecessary whitespace.
This PR fixes both problems.
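Here is a minimal Python sketch of both fixes. It assumes entries are bullets starting with `- ` whose continuation lines are indented, and it sorts by the check name inside `:doc:`. The helper names are hypothetical and not taken from `check_alphabetical_order.py`.

```python
import re

# Matches a :doc:`...` role; [^`]* also spans line breaks.
DOC_ROLE = re.compile(r":doc:`([^`]*)`")


def entry_key(entry: str) -> str:
    """Sort key for one bullet entry, which may span several lines."""
    # Join continuation lines so a :doc: role split across lines is whole.
    joined = " ".join(line.strip() for line in entry.splitlines())
    match = DOC_ROLE.search(joined)
    text = match.group(1) if match else joined
    # For `name <target>`, use the displayed name.
    return text.split("<", 1)[0].strip().lower()


def sort_section(body: str) -> str:
    """Sort the bullet entries of one section by their :doc: name."""
    entries, current = [], []
    for line in body.strip().splitlines():
        if line.startswith("- ") and current:
            entries.append("\n".join(current).rstrip())
            current = []
        current.append(line)
    if current:
        entries.append("\n".join(current).rstrip())
    entries.sort(key=entry_key)
    # One blank line between entries and none after the last one.
    return "\n\n".join(entries)


section = """\
- Improved :doc:`readability-foo
  <clang-tidy/checks/readability/foo>` check.

- Improved :doc:`bugprone-bar
  <clang-tidy/checks/bugprone/bar>` check.
"""
print(sort_section(section))
```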
[AArch64][PAC] Rework discriminator analysis for calls and tail calls
Make use of fixupBlendComponents for AUTH_TCRETURN[_BTI] and for
BLRA[_RVMARKER] pseudos the same way it is done for AUT/PAC/AUTPAC.
This patch unifies discriminator analysis for DAGISel and GlobalISel
and improves cross-BB analysis in case of DAGISel.
[LoopRotate] Use SCEV exit counts to improve rotation profitability (#187483)
Most loop transformations, like unrolling and vectorization, expect the
latch branch to be countable. Allow rotation if it turns the latch from
uncountable to countable.
This uses SCEV to check for countable exits if CheckExitCount is set.
Currently it is set only in LPM, not for the LPM1 run (where SCEV is
not used by other passes).
With that, the compile-time impact is mostly neutral:
https://llvm-compile-time-tracker.com/compare.php?from=eba342d0ba930a404a026c80aada51c43974f0db&to=2e676337b45fae63ce9498116d8e6e43772363c5&stat=instructions:u
ClamAV is consistently slower (~+0.15%), and 7zip is faster in most
cases (~-0.13%).
Across a large test set based on C/C++ workloads, this rotates ~0.8%
more loops with ~2.68M rotated loops.
[16 lines not shown]
[SPIR-V] Support global variable annotations in llvm.global.annotations (#187241)
The SPIR-V backend previously supported only function annotations in
llvm.global.annotations and crashed with a fatal error when encountering
global variable entries.