LLVM/project ef7cfd1 llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp, llvm/test/CodeGen/AMDGPU trunc-store.ll fp_trunc_store_fp32_to_bf16.ll

AMDGPU: Fix truncstore from v6f32 to v6f16

The v6bf16 cases work, but that's likely because v6bf16 isn't
currently an MVT.

Fixes: SWDEV-570985
Delta File
+125 -0 llvm/test/CodeGen/AMDGPU/trunc-store.ll
+48 -0 llvm/test/CodeGen/AMDGPU/fp_trunc_store_fp32_to_bf16.ll
+14 -0 llvm/test/CodeGen/AMDGPU/trunc-store-f64-to-f16.ll
+1 -0 llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+188 -0 4 files

LLVM/project f803e46 llvm/lib/Transforms/Scalar StraightLineStrengthReduce.cpp, llvm/test/CodeGen/AMDGPU promote-constOffset-to-imm.ll waitcnt-vscnt.ll

Reland "Redesign Straight-Line Strength Reduction (SLSR) (#162930)" (#169614)

This PR implements parts of
https://github.com/llvm/llvm-project/issues/162376

- **Broader equivalence than constant index deltas**:
  - Add Base-delta and Stride-delta matching for Add and GEP forms using
    ScalarEvolution deltas.
  - Reuse enabled for both constant and variable deltas when an available
    IR value dominates the user.
- **Dominance-aware dictionary instead of linear scans**:
  - Tuple-keyed candidate dictionary grouped by basic block.
  - Walk the immediate-dominator chain to find the nearest dominating
    basis quickly and deterministically.
- **Simple cost model and best-rewrite selection**:
  - Score candidate expressions and rewrites; select the highest-profit
    rewrite per instruction.
  - Skip rewriting when expressions are already foldable or
    high-efficiency.

    [15 lines not shown]
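
The dominance-aware lookup described in the list above can be pictured with a small Python sketch. This is not the pass's code: the dictionary layout, the string block names, and the function `nearest_dominating_basis` are all assumptions made for illustration. It also assumes candidates are recorded in program order and that the user is looked up before it is registered, so any candidate found in the user's own block precedes it.

```python
from typing import Dict, Hashable, List, Optional

# Hypothetical layout: candidate key -> basic block name -> candidates in
# program order. The real pass works on LLVM IR, not on strings.
CandidateDict = Dict[Hashable, Dict[str, List[str]]]


def nearest_dominating_basis(
    candidates: CandidateDict,
    idom: Dict[str, Optional[str]],
    key: Hashable,
    block: str,
) -> Optional[str]:
    """Return the closest candidate with `key` found on the idom chain of `block`."""
    per_block = candidates.get(key, {})
    current: Optional[str] = block
    while current is not None:
        in_block = per_block.get(current)
        if in_block:
            # The most recently recorded candidate in the nearest block wins.
            return in_block[-1]
        # Step to the immediate dominator; the entry block maps to None.
        current = idom.get(current)
    return None
```

Because every block has at most one immediate dominator, the walk visits each block at most once and always yields the same answer for the same input, which is one way to read "quickly and deterministically".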
Delta File
+883 -273 llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
+243 -244 llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll
+133 -196 llvm/test/CodeGen/AMDGPU/waitcnt-vscnt.ll
+271 -0 llvm/test/Transforms/StraightLineStrengthReduce/NVPTX/slsr-i8-gep.ll
+156 -7 llvm/test/Transforms/StraightLineStrengthReduce/slsr-gep.ll
+83 -80 llvm/test/CodeGen/AMDGPU/idot4s.ll
+1,769 -800 12 files not shown
+2,200 -836 18 files

LLVM/project c3acafc compiler-rt/lib/hwasan hwasan_allocator.h

[hwasan] Add config for AArch64 Linux with 39-bit VA. (#170927)

This is leveraging work which has already been done for Android, which
ships 39-bit VA kernels, and extending it to other embedded Linux
targets.

(SANITIZER_AARCH64_39BIT_VA was added in 58c8f57681.)
Delta File
+2 -1 compiler-rt/lib/hwasan/hwasan_allocator.h
+2 -1 1 files

LLVM/project b71eb53 compiler-rt/lib/asan asan_allocator.h

[asan] Add config for AArch64 Linux with 39-bit VA. (#170929)

This is leveraging work which has already been done for Android, which
ships 39-bit VA kernels, and extending it to other embedded Linux
targets.

(SANITIZER_AARCH64_39BIT_VA was added in 58c8f57681.)
Delta File
+2 -1 compiler-rt/lib/asan/asan_allocator.h
+2 -1 1 files

LLVM/project 5c8c7f3 llvm/lib/Transforms/Vectorize LoadStoreVectorizer.cpp, llvm/test/Transforms/LoadStoreVectorizer/NVPTX masked-store.ll vectorize_i8.ll

[LoadStoreVectorizer] Fill gaps in load/store chains to enable vectorization (#159388)

This change introduces Gap Filling, an optimization that aims to fill in
holes in otherwise contiguous load/store chains to enable vectorization.
It also introduces Chain Extending, which extends the end of a chain to
the closest power of 2.

This was originally motivated by the NVPTX target, but I tried to
generalize it to be universally applicable to all targets that may use
the LSV. I'm more than willing to make adjustments to improve the
target-agnostic-ness of this change. I fully expect there are some
issues and encourage feedback on how to improve things.

For both loads and stores we only perform the optimization when we can
generate a legal llvm masked load/store intrinsic, masking off the
"extra" elements. Determining legality for stores is a little tricky
from the NVPTX side, because these intrinsics are only supported for
256-bit vectors. See the other PR I opened for the implementation of the
NVPTX lowering of masked store intrinsics, which include NVPTX TTI

    [12 lines not shown]
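
As a rough illustration of Gap Filling and Chain Extending, the Python sketch below takes the byte offsets of same-sized accesses, pads the chain to a power-of-two element count, and builds the lane mask that a masked load/store would use. The function name `fill_gaps_and_extend` is hypothetical, "closest power of 2" is read here as the next power of two at or above the chain length, and the legality and alignment checks the pass performs are left out.

```python
from typing import List, Tuple


def fill_gaps_and_extend(offsets: List[int], elem_size: int) -> Tuple[int, List[bool]]:
    """Return (vector width in elements, per-lane mask) for a sorted chain.

    `offsets` are sorted byte offsets of accesses that all have `elem_size` bytes.
    """
    first, last = offsets[0], offsets[-1]
    # Number of element slots between the first and last access, holes included.
    span = (last - first) // elem_size + 1
    # Chain extending: round the slot count up to a power of two.
    width = 1 << (span - 1).bit_length()
    present = set(offsets)
    # Lanes that were holes, or that were added by extension, are masked off.
    mask = [(first + i * elem_size) in present for i in range(width)]
    return width, mask
```

For example, accesses at offsets 0, 4 and 12 with 4-byte elements give a 4-lane vector with the third lane masked off, while accesses at 0, 4 and 8 are extended to 4 lanes with the last lane masked off.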
Delta File
+541 -0 llvm/test/Transforms/LoadStoreVectorizer/NVPTX/masked-store.ll
+407 -69 llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
+216 -41 llvm/test/Transforms/LoadStoreVectorizer/NVPTX/vectorize_i8.ll
+194 -0 llvm/test/Transforms/LoadStoreVectorizer/NVPTX/gap-fill.ll
+166 -0 llvm/test/Transforms/LoadStoreVectorizer/NVPTX/gap-fill-vectors.ll
+113 -0 llvm/test/Transforms/LoadStoreVectorizer/NVPTX/extend-chain.ll
+1,637 -110 8 files not shown
+2,029 -152 14 files

LLVM/project 89bc5ff clang/lib/CIR/Lowering/DirectToLLVM LowerToLLVM.cpp

[CIR] Clean up visibility conversion (NFC) (#171000)

Delta File
+5 -7 clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+5 -7 1 files

LLVM/project 2ee921f llvm/lib/Transforms/Vectorize VectorCombine.cpp, llvm/test/Transforms/VectorCombine/AMDGPU extract-insert-i8.ll

VectorCombine: Improve the insert/extract fold in the narrowing case

Keeping the extracted element in a natural position in the narrowed
vector has two beneficial effects:

1. It makes the narrowing shuffles cheaper (at least on AMDGPU), which
   allows the insert/extract fold to trigger.
2. It makes the narrowing shuffles in a chain of extract/insert
   compatible, which allows foldLengthChangingShuffles to successfully
   recognize a chain that can be folded.

There are minor X86 test changes that look reasonable to me. The IR
change for AVX2 in llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll
doesn't change the assembly generated by `llc -mtriple=x86_64-- -mattr=AVX2`
at all.

commit-id:c151bb04
Delta File
+6 -16 llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+2 -16 llvm/test/Transforms/VectorCombine/AMDGPU/extract-insert-i8.ll
+8 -4 llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll
+4 -4 llvm/test/Transforms/VectorCombine/X86/extract-insert.ll
+2 -2 llvm/test/Transforms/VectorCombine/X86/pr126085.ll
+22 -42 5 files

LLVM/project 316715e llvm/lib/Transforms/Vectorize VectorCombine.cpp, llvm/test/Transforms/VectorCombine/AMDGPU extract-insert-i8.ll shuffles-of-length-changing-shuffles.ll

VectorCombine: Fold chains of shuffles fed by length-changing shuffles

Such chains can arise from folding insert/extract chains.

commit-id:a960175d
Delta File
+192 -0 llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+4 -32 llvm/test/Transforms/VectorCombine/AMDGPU/extract-insert-i8.ll
+4 -8 llvm/test/Transforms/VectorCombine/AMDGPU/shuffles-of-length-changing-shuffles.ll
+200 -40 3 files

LLVM/project 1e44d4b llvm/test/Transforms/VectorCombine/AMDGPU shuffles-of-length-changing-shuffles.ll

AMDGPU: Precommit a test

commit-id:a0814f87
Delta File
+50 -0 llvm/test/Transforms/VectorCombine/AMDGPU/shuffles-of-length-changing-shuffles.ll
+50 -0 1 files

LLVM/project e7f6038 llvm/test/CodeGen/Generic reloc-none.ll

[LLVM] Mark reloc-none test unsupported on Hexagon (#171205)

Prevents infinite loop issue recorded in #147427. More work will be
required to make @llvm.reloc_none work correctly on Hexagon.
Delta File
+1 -0 llvm/test/CodeGen/Generic/reloc-none.ll
+1 -0 1 files

LLVM/project 65dd29b llvm/lib/Transforms/Vectorize LoopVectorize.cpp

[LV] Compare induction start values via SCEV in assertion (NFCI).

Instead of comparing plain VPValues in the assertion checking the start
values, directly compare the SCEVs. This future-proofs the code in
preparation for performing more simplifications/canonicalizations for
live-ins.
Delta File
+4 -1 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+4 -1 1 files

LLVM/project d33d9ac .github/workflows premerge.yaml

[Github] Make premerge update correct comments file for Windows

platform.machine() on x86_64 on Windows returns AMD64 rather than
x86_64. Make premerge.yaml reflect this.
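
A small Python sketch of why the mismatch matters: any file name derived from `platform.machine()` differs between Linux and Windows on the same hardware. The naming scheme and both helper functions below are hypothetical and are not taken from the premerge scripts.

```python
import platform
from typing import Optional


def comments_file_name(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Build a per-platform file name (hypothetical scheme, for illustration only)."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return f"comments-{system}-{machine}"


def is_x86_64(machine: str) -> bool:
    """Treat both spellings as the same architecture."""
    return machine in ("x86_64", "AMD64")
```

A workflow that expects an `x86_64` suffix on Windows would look for a file that is never written, since the name produced there ends in `AMD64`.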
Delta File
+1 -1 .github/workflows/premerge.yaml
+1 -1 1 files

LLVM/project 3a77056 clang-tools-extra/clang-tidy/abseil UncheckedStatusOrAccessCheck.h

change

Created using spr 1.3.7
Delta File
+1 -1 clang-tools-extra/clang-tidy/abseil/UncheckedStatusOrAccessCheck.h
+1 -1 1 files

LLVM/project 1ec8e00 clang/test/Headers __clang_hip_math.hip, llvm/test/CodeGen/X86 shift-i512.ll bitcnt-big-integer.ll

fix

Created using spr 1.3.7
Delta File
+2,027 -185 llvm/test/CodeGen/X86/shift-i512.ll
+1,563 -413 llvm/test/CodeGen/X86/bitcnt-big-integer.ll
+1,005 -956 clang/test/Headers/__clang_hip_math.hip
+0 -1,298 openmp/runtime/src/include/omp_lib.h.var
+1,298 -0 openmp/module/omp_lib.h.var
+1,183 -0 openmp/module/omp_lib.F90.var
+7,076 -2,852 734 files not shown
+22,047 -12,945 740 files

LLVM/project 9cc4265 clang/test/Headers __clang_hip_math.hip, llvm/test/CodeGen/X86 shift-i512.ll bitcnt-big-integer.ll

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
Delta File
+2,027 -185 llvm/test/CodeGen/X86/shift-i512.ll
+1,563 -413 llvm/test/CodeGen/X86/bitcnt-big-integer.ll
+1,005 -956 clang/test/Headers/__clang_hip_math.hip
+0 -1,298 openmp/runtime/src/include/omp_lib.h.var
+1,298 -0 openmp/module/omp_lib.h.var
+1,183 -0 openmp/module/omp_lib.F90.var
+7,076 -2,852 731 files not shown
+22,029 -12,926 737 files

LLVM/project f6971bf clang/lib/Analysis/FlowSensitive/Models UncheckedStatusOrAccessModel.cpp, clang/unittests/Analysis/FlowSensitive UncheckedStatusOrAccessModelTestFixture.cpp

[FlowSensitive] [StatusOr] [12/N] Add support for smart pointers (#170943)

Delta File
+74 -0 clang/unittests/Analysis/FlowSensitive/UncheckedStatusOrAccessModelTestFixture.cpp
+48 -0 clang/lib/Analysis/FlowSensitive/Models/UncheckedStatusOrAccessModel.cpp
+122 -0 2 files

LLVM/project 0f60cdf lld/ELF Relocations.cpp SyntheticSections.h, lld/ELF/Arch AArch64.cpp

[𝘀𝗽𝗿] initial version

Created using spr 1.3.5
Delta File
+32 -44 lld/ELF/Relocations.cpp
+29 -5 lld/ELF/SyntheticSections.h
+8 -24 lld/ELF/Arch/AArch64.cpp
+5 -15 lld/ELF/Writer.cpp
+5 -1 lld/ELF/SyntheticSections.cpp
+2 -0 lld/ELF/InputSection.cpp
+81 -89 1 files not shown
+82 -89 7 files

LLVM/project 0eb9c6c lld/ELF Relocations.cpp SyntheticSections.h, lld/ELF/Arch AArch64.cpp

[𝘀𝗽𝗿] changes to main this commit is based on

Created using spr 1.3.5

[skip ci]
Delta File
+32 -44 lld/ELF/Relocations.cpp
+8 -24 lld/ELF/Arch/AArch64.cpp
+25 -2 lld/ELF/SyntheticSections.h
+2 -1 lld/ELF/Writer.cpp
+2 -0 lld/ELF/InputSection.cpp
+1 -0 lld/ELF/Relocations.h
+70 -71 1 files not shown
+70 -72 7 files

LLVM/project e3905a4 llvm/lib/Transforms/Instrumentation MemProfUse.cpp, llvm/test/Transforms/PGOProfile memprof_annotate_indirect_call.test

[MemProf] Merge all callee guids for indirect call VP metadata (#170964)

When matching memprof profiles, for indirect calls we use the callee
guids recorded on callsites in the profile to synthesize indirect call
VP metadata when none exists. However, we only do this for the first
matching CallSiteEntry from the profile.

In some cases there can be multiple, for example when the current
function was eventually inlined into multiple callers. Profile
generation propagates the CallSiteEntry from those callers into the
inlined callee's profile as it may not yet have been inlined in the
new compile.

To capture all of these potential indirect call targets, merge callee
guids across all matching CallSiteEntries.
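
The difference between taking the first match and merging all matches can be sketched in a few lines of Python. The entry layout (a dict with `frames` and `callee_guids`) and the function `merge_callee_guids` are assumptions for illustration; they do not mirror the data structures in MemProfUse.cpp.

```python
from typing import Callable, Dict, Iterable, List


def merge_callee_guids(
    entries: Iterable[Dict],
    matches: Callable[[Dict], bool],
) -> List[int]:
    """Union the callee GUIDs of every matching entry, keeping first-seen order."""
    merged: List[int] = []
    seen = set()
    for entry in entries:
        if not matches(entry):
            continue
        for guid in entry["callee_guids"]:
            # Drop duplicates that appear in more than one matching entry.
            if guid not in seen:
                seen.add(guid)
                merged.append(guid)
    return merged
```

With two matching entries carrying GUIDs `[10, 20]` and `[20, 30]`, stopping at the first match would report only `[10, 20]`, while the merged result is `[10, 20, 30]`.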
Delta File
+40 -41 llvm/lib/Transforms/Instrumentation/MemProfUse.cpp
+14 -2 llvm/test/Transforms/PGOProfile/memprof_annotate_indirect_call.test
+54 -43 2 files

LLVM/project 1e16f4e flang/include/flang/Optimizer/Builder FIRBuilder.h, flang/lib/Optimizer/Builder FIRBuilder.cpp

[flang] add simplification for ProductOp intrinsic (#169575)

Add simplification for `ProductOp`, by implementing support for
`ReductionConversion` and adding it to the pattern list in
`SimplifyHLFIRIntrinsics` pass.

Closes:
https://github.com/issues/recent?issue=llvm%7Cllvm-project%7C169433

---------

Co-authored-by: Eugene Epshteyn <eepshteyn at nvidia.com>
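
For readers unfamiliar with what an inlined product reduction computes, here is a minimal Python sketch of the scalar case: start from the multiplicative identity and multiply in each element that an optional mask selects. The function `inline_product` is hypothetical; it does not model the DIM argument or the HLFIR operations the pass actually emits.

```python
from typing import Optional, Sequence


def inline_product(values: Sequence[float], mask: Optional[Sequence[bool]] = None) -> float:
    """Multiply the selected elements; an empty selection yields 1."""
    result = 1
    for i, value in enumerate(values):
        # Without a mask every element participates.
        if mask is None or mask[i]:
            result *= value
    return result
```

Starting the accumulator at 1 matches the Fortran rule that PRODUCT over a zero-sized array is 1.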
Delta File
+457 -0 flang/test/HLFIR/simplify-hlfir-intrinsics-product.fir
+49 -0 flang/lib/Optimizer/HLFIR/Transforms/SimplifyHLFIRIntrinsics.cpp
+20 -0 flang/lib/Optimizer/Builder/FIRBuilder.cpp
+10 -0 flang/include/flang/Optimizer/Builder/FIRBuilder.h
+536 -0 4 files

LLVM/project d17f3b5 compiler-rt/test/xray/TestCases/Posix basic-filtering.cpp fdr-mode.cpp

[XRay] Disable two more tests on armhf

Similar to d6f92050c0c2f60e78f3c8bcf557c5e69b025d7a. Needed now that
these tests are actually running more broadly.
Delta File
+2 -0 compiler-rt/test/xray/TestCases/Posix/basic-filtering.cpp
+1 -0 compiler-rt/test/xray/TestCases/Posix/fdr-mode.cpp
+3 -0 2 files

LLVM/project 2ba52c1 clang/test/Headers __clang_hip_math.hip, llvm/test/CodeGen/X86 shift-i512.ll bitcnt-big-integer.ll

reb

Created using spr 1.3.7
Delta File
+2,027 -185 llvm/test/CodeGen/X86/shift-i512.ll
+1,563 -413 llvm/test/CodeGen/X86/bitcnt-big-integer.ll
+1,005 -956 clang/test/Headers/__clang_hip_math.hip
+0 -1,298 openmp/runtime/src/include/omp_lib.h.var
+1,298 -0 openmp/module/omp_lib.h.var
+0 -1,183 openmp/runtime/src/include/omp_lib.F90.var
+5,893 -4,035 725 files not shown
+21,447 -12,890 731 files

LLVM/project 41363b4 clang/unittests/Analysis/FlowSensitive MockHeaders.cpp

[NFC] [FlowSensitive] Add mock unique_ptr header (#170942)

Delta File
+90 -0 clang/unittests/Analysis/FlowSensitive/MockHeaders.cpp
+90 -0 1 files

LLVM/project bb79b35 clang/include/clang/Analysis/FlowSensitive/Models UncheckedStatusOrAccessModel.h, clang/lib/Analysis/FlowSensitive/Models UncheckedStatusOrAccessModel.cpp

[FlowSensitive] [StatusOr] [11/N] Assume const accessor calls are stable (#170935)

This is not necessarily correct, but prevents us from flagging lots of
false positives because code usually abides by this.
Delta File
+173 -0 clang/unittests/Analysis/FlowSensitive/UncheckedStatusOrAccessModelTestFixture.cpp
+168 -0 clang/lib/Analysis/FlowSensitive/Models/UncheckedStatusOrAccessModel.cpp
+3 -1 clang/include/clang/Analysis/FlowSensitive/Models/UncheckedStatusOrAccessModel.h
+344 -1 3 files

LLVM/project b802fdb llvm/lib/Target/RISCV RISCVISelLowering.cpp RISCVISelLowering.h

[RISCV] Remove unnecessary override of getVectorTypeBreakdownForCallingConv. NFC (#171155)

There used to be code in here to make i32 legal on RV64, but it was
removed.

Also remove unnecessary temporary variable from
getRegisterTypeForCallingConv.
Delta File
+1 -12 llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+0 -6 llvm/lib/Target/RISCV/RISCVISelLowering.h
+1 -18 2 files

LLVM/project 324d589 utils/bazel/llvm-project-overlay/libc BUILD.bazel

Fix bazel build for 2a5420ea5184a334c2af9f2f9f43de4dfc6b76d3
Delta File
+9 -16 utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+9 -16 1 files

LLVM/project 7ecc9ee llvm/lib/CodeGen/GlobalISel LegalizerHelper.cpp, llvm/lib/Target/X86/GISel X86LegalizerInfo.cpp

[X86][GlobalISel] Set Dst register correctly when narrowing G_ICMP (#169947)

Due to untested branch in #119335

Fixes #167326
Delta File
+172 -6 llvm/test/CodeGen/X86/isel-icmp.ll
+1 -1 llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+1 -0 llvm/lib/Target/X86/GISel/X86LegalizerInfo.cpp
+174 -7 3 files

LLVM/project 863a4e4 llvm/include/llvm/IR IntrinsicsAMDGPU.td, llvm/test/Verifier/AMDGPU test-cvt-fp4f6f8-immarg-ranges.ll

[AMDGPU] Add argument range annotations to intrinsics where applicable (#170958)

This commit adds annotations to AMDGPU intrinsics that take arguments
which are documented to lie within a specified range, ensuring that
invalid instances of these intrinsics don't pass verification.

(Note that certain intrinsics that could have range annotations don't,
as their existing behavior is to clamp out-of-range values silently.)

Disclaimer: tests generated by LLM (code is mine)
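
The two behaviours the message contrasts, rejecting an out-of-range argument versus clamping it silently, can be sketched in Python as follows. Both functions and the bounds used in the example are hypothetical, and the half-open `[lo, hi)` interval is an assumption of this sketch rather than something stated in the commit.

```python
def verify_immarg(value: int, lo: int, hi: int) -> int:
    """Reject values outside the half-open range [lo, hi), as a verifier would."""
    if not lo <= value < hi:
        raise ValueError(f"immediate {value} is outside [{lo}, {hi})")
    return value


def clamp_immarg(value: int, lo: int, hi: int) -> int:
    """Silently pull an out-of-range value back into [lo, hi)."""
    return max(lo, min(value, hi - 1))
```

An intrinsic whose existing behaviour resembles `clamp_immarg` keeps accepting any value, which is why annotating it with a range would change what counts as valid IR.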
Delta File
+147 -0 llvm/test/Verifier/AMDGPU/test-cvt-fp4f6f8-immarg-ranges.ll
+10 -8 llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+157 -8 2 files

LLVM/project 5d53085 clang/unittests/Analysis/FlowSensitive MockHeaders.cpp

[NFC] [FlowSensitive] Fix missing namespace in MockHeaders (#170954)

This happened to work because we were missing both a namespace close and
open and things happened to be included in the correct order.
Delta File
+4 -0 clang/unittests/Analysis/FlowSensitive/MockHeaders.cpp
+4 -0 1 files

LLVM/project 56fdc61 mlir/lib/Dialect/SCF/IR SCF.cpp

[MLIR] Apply clang-tidy fixes for llvm-qualified-auto in SCF.cpp (NFC)
Delta File
+4 -4 mlir/lib/Dialect/SCF/IR/SCF.cpp
+4 -4 1 files