Reland "Redesign Straight-Line Strength Reduction (SLSR) (#162930)" (#169614)
This PR implements parts of
https://github.com/llvm/llvm-project/issues/162376
- **Broader equivalence than constant index deltas**:
- Add Base-delta and Stride-delta matching for Add and GEP forms using
ScalarEvolution deltas.
- Reuse enabled for both constant and variable deltas when an available
IR value dominates the user.
- **Dominance-aware dictionary instead of linear scans**:
- Tuple-keyed candidate dictionary grouped by basic block.
- Walk the immediate-dominator chain to find the nearest dominating
basis quickly and deterministically.
- **Simple cost model and best-rewrite selection**:
- Score candidate expressions and rewrites; select the highest-profit
rewrite per instruction.
- Skip rewriting when expressions are already foldable or already efficient.
[15 lines not shown]
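As a rough illustration of the kind of reuse the redesigned pass targets, here is a minimal source-level sketch (the function and variable names are made up, not from the patch): the two address expressions differ by one stride, so once the first computation dominates the second, the second can be rewritten as the first plus a delta instead of being recomputed from scratch.
```cpp
// Minimal sketch of two strength-reduction candidates with a stride delta.
// Before SLSR, each store recomputes base + index * stride from scratch;
// with delta matching, the second address can be rewritten as the first
// address plus one stride, reusing the dominating computation.
void store_pair(float *base, long i, long stride, float a, float b) {
  base[i * stride] = a;       // basis: base + i * stride
  base[(i + 1) * stride] = b; // candidate: basis + stride after the rewrite
}
```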
[hwasan] Add config for AArch64 Linux with 39-bit VA. (#170927)
This leverages work already done for Android, which ships 39-bit VA
kernels, and extends it to other embedded Linux targets.
(SANITIZER_AARCH64_39BIT_VA was added in 58c8f57681.)
[asan] Add config for AArch64 Linux with 39-bit VA. (#170929)
This leverages work already done for Android, which ships 39-bit VA
kernels, and extends it to other embedded Linux targets.
(SANITIZER_AARCH64_39BIT_VA was added in 58c8f57681.)
[LoadStoreVectorizer] Fill gaps in load/store chains to enable vectorization (#159388)
This change introduces Gap Filling, an optimization that aims to fill in
holes in otherwise contiguous load/store chains to enable vectorization.
It also introduces Chain Extending, which extends the end of a chain to
the closest power of 2.
This was originally motivated by the NVPTX target, but I tried to
generalize it to be universally applicable to all targets that may use
the LSV. I'm more than willing to make adjustments to improve the
target-agnostic-ness of this change. I fully expect there are some
issues and encourage feedback on how to improve things.
For both loads and stores we only perform the optimization when we can
generate a legal llvm masked load/store intrinsic, masking off the
"extra" elements. Determining legality for stores is a little tricky
from the NVPTX side, because these intrinsics are only supported for
256-bit vectors. See the other PR I opened for the implementation of the
NVPTX lowering of masked store intrinsics, which include NVPTX TTI
[12 lines not shown]
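A hypothetical access pattern (not taken from the patch or its tests) that gap filling is aimed at: three of four contiguous elements are loaded, leaving a hole at index 2. With gap filling, the vectorizer can emit a single masked 4-wide load with the unused lane masked off instead of leaving the accesses scalar.
```cpp
// Illustrative load chain with a gap. Without gap filling the three loads
// cannot form a contiguous vectorizable chain; with it, they can become one
// masked vector load that skips the unread element.
float sum_with_gap(const float *p) {
  float a = p[0];
  float b = p[1];
  // p[2] is never read: this is the "gap" in the otherwise contiguous chain.
  float c = p[3];
  return a + b + c;
}
```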
VectorCombine: Improve the insert/extract fold in the narrowing case
Keeping the extracted element in a natural position in the narrowed
vector has two beneficial effects:
1. It makes the narrowing shuffles cheaper (at least on AMDGPU), which
allows the insert/extract fold to trigger.
2. It makes the narrowing shuffles in a chain of extract/insert
compatible, which allows foldLengthChangingShuffles to successfully
recognize a chain that can be folded.
There are minor X86 test changes that look reasonable to me. The IR
change for AVX2 in llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll
doesn't change the assembly generated by `llc -mtriple=x86_64-- -mattr=AVX2`
at all.
commit-id:c151bb04
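For readers unfamiliar with the fold, here is an illustrative extract/insert pair written with Clang vector extensions; the widths and lane indices are invented for the example and do not come from the patch or its tests.
```cpp
typedef float v8f __attribute__((vector_size(32)));
typedef float v4f __attribute__((vector_size(16)));

v4f copy_lane(v8f wide, v4f dst) {
  float x = wide[5]; // extractelement from the wide vector
  dst[1] = x;        // insertelement into the narrow vector
  // VectorCombine may first narrow the wide operand with a shuffle; keeping
  // the extracted lane at a natural position in the narrowed vector keeps
  // that shuffle cheap and makes the shuffles in a chain of such pairs
  // compatible with each other.
  return dst;
}
```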
[LLVM] Mark reloc-none test unsupported on Hexagon (#171205)
Prevents the infinite loop issue recorded in #147427. More work will be
required to make @llvm.reloc_none work correctly on Hexagon.
[LV] Compare induction start values via SCEV in assertion (NFCI).
Instead of comparing plain VPValues in the assertion checking the start
values, directly compare the SCEVs. This future-proofs the code in
preparation for performing more simplifications/canonicalizations for
live-ins.
[Github] Make premerge update correct comments file for Windows
platform.machine() on x86_64 on Windows returns AMD64 rather than
x86_64. Make premerge.yaml reflect this.
[MemProf] Merge all callee guids for indirect call VP metadata (#170964)
When matching memprof profiles, for indirect calls we use the callee
guids recorded on callsites in the profile to synthesize indirect call
VP metadata when none exists. However, we only do this for the first
matching CallSiteEntry from the profile.
In some cases there can be multiple, for example when the current
function was eventually inlined into multiple callers. Profile
generation propagates the CallSiteEntry from those callers into the
inlined callee's profile as it may not yet have been inlined in the
new compile.
To capture all of these potential indirect call targets, merge callee
guids across all matching CallSiteEntries.
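A generic sketch of the merging idea; the types and names below are illustrative stand-ins, not the actual MemProf data structures. The point is simply that callee GUIDs are unioned across all matching call-site entries rather than taken from the first one.
```cpp
#include <cstdint>
#include <set>
#include <vector>

using GUID = uint64_t;

struct CallSiteEntry {
  std::vector<GUID> CalleeGuids; // possible indirect-call targets from the profile
};

// Union the callee GUIDs of every matching entry before synthesizing the
// indirect-call value-profile metadata.
std::set<GUID> mergeCalleeGuids(const std::vector<CallSiteEntry> &Matches) {
  std::set<GUID> Merged;
  for (const CallSiteEntry &E : Matches)
    Merged.insert(E.CalleeGuids.begin(), E.CalleeGuids.end());
  return Merged;
}
```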
[flang] add simplification for ProductOp intrinsic (#169575)
Add simplification for `ProductOp` by implementing support for
`ReductionConversion` and adding it to the pattern list in the
`SimplifyHLFIRIntrinsics` pass.
Closes:
https://github.com/issues/recent?issue=llvm%7Cllvm-project%7C169433
---------
Co-authored-by: Eugene Epshteyn <eepshteyn at nvidia.com>
[FlowSensitive] [StatusOr] [11/N] Assume const accessor calls are stable (#170935)
This is not necessarily correct, but it prevents us from flagging lots
of false positives, since code usually abides by this assumption.
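A hypothetical code shape (not from the check's tests) that this assumption unblocks: `value()` is a const accessor, so both calls are assumed to return the same value and the dereference guarded by `ok()` is not reported.
```cpp
#include "absl/status/statusor.h"

struct Widget {
  const absl::StatusOr<int> &value() const { return value_; }
  absl::StatusOr<int> value_;
};

int get(const Widget &w) {
  if (w.value().ok())
    return *w.value(); // assumed safe: the const accessor is treated as stable
  return 0;
}
```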
[RISCV] Remove unnecessary override of getVectorTypeBreakdownForCallingConv. NFC (#171155)
There used to be code in here to make i32 legal on RV64, but it was
removed.
Also remove unnecessary temporary variable from
getRegisterTypeForCallingConv.
[AMDGPU] Add argument range annotations to intrinsics where applicable (#170958)
This commit adds annotations to AMDGPU intrinsics that take arguments
which are documented to lie within a specified range, ensuring that
invalid instances of these intrinsics don't pass verification.
(Note that certain intrinsics that could have range annotations don't,
as their existing behavior is to clamp out-of-range values silently.)
Disclaimer: tests generated by LLM (code is mine)
[NFC] [FlowSensitive] Fix missing namespace in MockHeaders (#170954)
This happened to work because we were missing both a namespace close and
a namespace open, and things happened to be included in the correct
order.