[Clang] Fix unused variable warning from 1911ce132659222aee353882bd5570d689745a7d (#171223)
These are only used in assertions, so they trigger unused-variable warnings in release builds.
Fix this per the LLVM Coding Standards.
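A minimal sketch of the usual pattern (the `computeSize` helper is hypothetical): a value consumed only by `assert` vanishes in release builds, where `NDEBUG` compiles the assertion away, so it is annotated to keep `-Wunused-variable` quiet.

```cpp
#include <cassert>

int computeSize() { return 4; } // hypothetical helper

void example() {
  // Used only in the assertion below; without the annotation, release
  // builds (NDEBUG) would warn that Size is set but never used.
  [[maybe_unused]] int Size = computeSize();
  assert(Size > 0 && "expected a positive size");
}
```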
AMDGPU: Fix truncstore from v6f32 to v6f16 (#171212)
The v6bf16 cases work, but that's likely because v6bf16 isn't
currently an MVT.
Fixes: SWDEV-570985
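For illustration only, a minimal IRBuilder sketch of the pattern in question (the helper name is made up): an `fptrunc` from `<6 x float>` to `<6 x half>` feeding a store, which DAG combining can turn into a truncating store.

```cpp
#include "llvm/IR/IRBuilder.h"

// Hypothetical helper: emit the v6f32 -> v6f16 truncating-store pattern.
llvm::StoreInst *emitTruncStore(llvm::IRBuilder<> &B, llvm::Value *V6F32,
                                llvm::Value *Ptr) {
  auto *V6F16 = llvm::FixedVectorType::get(B.getHalfTy(), 6);
  llvm::Value *Trunc = B.CreateFPTrunc(V6F32, V6F16); // <6 x float> -> <6 x half>
  return B.CreateStore(Trunc, Ptr);
}
```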
Fix a typo in "breakpoint add file" and add a test (#171206)
lldbutil.run_to_line_breakpoint had usages that set column breakpoints,
so I thought there was command-line coverage of that, but actually all
the `run_to` utilities use the SB APIs, and there weren't any tests of
setting a file, line & column breakpoint through `run_break_set`. So I
missed that I had typed the column option as `c`, which is already taken
by `--command`.
This patch fixes that typo and adds a CLI test for file + line + column.
Reland "Redesign Straight-Line Strength Reduction (SLSR) (#162930)" (#169614)
This PR implements parts of
https://github.com/llvm/llvm-project/issues/162376
- **Broader equivalence than constant index deltas**:
- Add Base-delta and Stride-delta matching for Add and GEP forms using
ScalarEvolution deltas.
- Reuse enabled for both constant and variable deltas when an available
IR value dominates the user.
- **Dominance-aware dictionary instead of linear scans**:
- Tuple-keyed candidate dictionary grouped by basic block.
- Walk the immediate-dominator chain to find the nearest dominating
basis quickly and deterministically (see the sketch below).
- **Simple cost model and best-rewrite selection**:
- Score candidate expressions and rewrites; select the highest-profit
rewrite per instruction.
- Skip rewriting when the expression is already foldable or already
efficient.
[15 lines not shown]
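A rough sketch of the dominator-walk lookup, under stated assumptions (all names here are illustrative, not the patch itself, and same-block ordering checks are elided): candidates are keyed by `(block, key)`, so finding the nearest dominating basis is a walk up the immediate-dominator chain rather than a scan over every prior candidate.

```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/Dominators.h"
#include <tuple>
#include <utility>

// Illustrative candidate key: (candidate kind, base, stride).
using CandKey = std::tuple<unsigned, llvm::Value *, llvm::Value *>;
using CandMap =
    llvm::DenseMap<std::pair<llvm::BasicBlock *, CandKey>, llvm::Instruction *>;

// Walk the idom chain from BB upward; the first hit is the nearest
// dominating basis, found deterministically without a linear scan.
llvm::Instruction *findNearestDominatingBasis(const CandMap &Candidates,
                                              llvm::DominatorTree &DT,
                                              llvm::BasicBlock *BB,
                                              const CandKey &Key) {
  for (llvm::DomTreeNode *N = DT.getNode(BB); N; N = N->getIDom())
    if (auto It = Candidates.find({N->getBlock(), Key});
        It != Candidates.end())
      return It->second;
  return nullptr;
}
```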
[hwasan] Add config for AArch64 Linux with 39-bit VA. (#170927)
This leverages work already done for Android, which ships 39-bit VA
kernels, and extends it to other embedded Linux targets.
(SANITIZER_AARCH64_39BIT_VA was added in 58c8f57681.)
[asan] Add config for AArch64 Linux with 39-bit VA. (#170929)
This leverages work already done for Android, which ships 39-bit VA
kernels, and extends it to other embedded Linux targets.
(SANITIZER_AARCH64_39BIT_VA was added in 58c8f57681.)
[LoadStoreVectorizer] Fill gaps in load/store chains to enable vectorization (#159388)
This change introduces Gap Filling, an optimization that aims to fill in
holes in otherwise contiguous load/store chains to enable vectorization.
It also introduces Chain Extending, which extends the end of a chain to
the closest power of 2.
This was originally motivated by the NVPTX target, but I tried to
generalize it to be universally applicable to all targets that may use
the LSV. I'm more than willing to make adjustments to improve the
target-agnostic-ness of this change. I fully expect there are some
issues and encourage feedback on how to improve things.
For both loads and stores we only perform the optimization when we can
generate a legal llvm masked load/store intrinsic, masking off the
"extra" elements. Determining legality for stores is a little tricky
from the NVPTX side, because these intrinsics are only supported for
256-bit vectors. See the other PR I opened for the implementation of the
NVPTX lowering of masked store intrinsics, which include NVPTX TTI
[12 lines not shown]
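As a concrete illustration of the masking idea (a sketch under assumed names, not the patch): three contiguous `i32` loads plus one hole become a single masked `<4 x i32>` load whose mask switches off the gap lane.

```cpp
#include "llvm/IR/IRBuilder.h"

// Hypothetical helper: vectorize loads of lanes 0..2 by emitting a masked
// <4 x i32> load; lane 3 is the gap and is masked off (poison pass-through).
llvm::Value *emitGapFilledLoad(llvm::IRBuilder<> &B, llvm::Value *Ptr) {
  auto *VecTy = llvm::FixedVectorType::get(B.getInt32Ty(), 4);
  llvm::Value *Mask = llvm::ConstantVector::get(
      {B.getTrue(), B.getTrue(), B.getTrue(), B.getFalse()});
  llvm::Value *PassThru = llvm::PoisonValue::get(VecTy);
  return B.CreateMaskedLoad(VecTy, Ptr, llvm::Align(16), Mask, PassThru);
}
```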
VectorCombine: Improve the insert/extract fold in the narrowing case
Keeping the extracted element in a natural position in the narrowed
vector has two beneficial effects:
1. It makes the narrowing shuffles cheaper (at least on AMDGPU), which
allows the insert/extract fold to trigger.
2. It makes the narrowing shuffles in a chain of extract/insert
compatible, which allows foldLengthChangingShuffles to successfully
recognize a chain that can be folded.
There are minor X86 test changes that look reasonable to me. The IR
change for AVX2 in llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll
doesn't change the assembly generated by `llc -mtriple=x86_64-- -mattr=AVX2`
at all.
commit-id:c151bb04
[LLVM] Mark reloc-none test unsupported on Hexagon (#171205)
Prevents the infinite-loop issue recorded in #147427. More work will be
required to make @llvm.reloc_none work correctly on Hexagon.
[LV] Compare induction start values via SCEV in assertion (NFCI).
Instead of comparing plain VPValues in the assertion checking the start
values, directly compare the SCEVs. This future-proofs the code in
preparation for performing more simplifications/canonicalizations for
live-ins.
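A hypothetical illustration of why this works (the helper and value names are assumptions, not from the patch): `ScalarEvolution` uniques its expressions, so start values that canonicalize to the same SCEV compare equal by pointer even when the live-in IR values differ syntactically.

```cpp
#include "llvm/Analysis/ScalarEvolution.h"
#include <cassert>

void assertSameStartValue(llvm::ScalarEvolution &SE, llvm::Value *StartA,
                          llvm::Value *StartB) {
  // SCEV nodes are uniqued, so pointer equality means semantic equality
  // of the two canonicalized start expressions.
  assert(SE.getSCEV(StartA) == SE.getSCEV(StartB) &&
         "start values must be SCEV-equivalent");
}
```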
[Github] Make premerge update correct comments file for Windows
On x86_64 Windows, platform.machine() returns AMD64 rather than x86_64.
Make premerge.yaml reflect this.
[MemProf] Merge all callee guids for indirect call VP metadata (#170964)
When matching memprof profiles, for indirect calls we use the callee
guids recorded on callsites in the profile to synthesize indirect call
VP metadata when none exists. However, we only do this for the first
matching CallSiteEntry from the profile.
In some cases there can be multiple, for example when the current
function was eventually inlined into multiple callers. Profile
generation propagates the CallSiteEntry from those callers into the
inlined callee's profile as it may not yet have been inlined in the
new compile.
To capture all of these potential indirect call targets, merge callee
guids across all matching CallSiteEntries.
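A sketch of the merge under assumed types (`CallSiteEntryInfo` is a stand-in, not the real profile class): instead of synthesizing VP metadata from the first matching entry alone, union the callee GUIDs across every matching entry.

```cpp
#include "llvm/ADT/SetVector.h"
#include <cstdint>
#include <vector>

struct CallSiteEntryInfo {           // stand-in for a profile call-site entry
  std::vector<uint64_t> CalleeGuids; // callee GUIDs recorded for this site
};

// Union callee GUIDs across all matching entries; SetVector dedups while
// keeping a stable, deterministic order for the synthesized metadata.
llvm::SetVector<uint64_t>
mergeCalleeGuids(const std::vector<CallSiteEntryInfo> &MatchingEntries) {
  llvm::SetVector<uint64_t> Merged;
  for (const CallSiteEntryInfo &CSE : MatchingEntries)
    Merged.insert(CSE.CalleeGuids.begin(), CSE.CalleeGuids.end());
  return Merged;
}
```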
[flang] add simplification for ProductOp intrinsic (#169575)
Add simplification for `ProductOp` by implementing support for
`ReductionConversion` and adding it to the pattern list in the
`SimplifyHLFIRIntrinsics` pass.
Closes: https://github.com/llvm/llvm-project/issues/169433
---------
Co-authored-by: Eugene Epshteyn <eepshteyn@nvidia.com>