LLVM/project 392c302 clang/lib/AST/ByteCode InterpBuiltin.cpp

[Clang] Fix unused variable warning from 1911ce132659222aee353882bd55… (#171223)

…70d689745a7d

These variables are only used in assertions, so they trigger
unused-variable warnings in release builds.
Fix this per the LLVM programming standards.
Delta  File
+4 -4  clang/lib/AST/ByteCode/InterpBuiltin.cpp
+4 -4  1 files

LLVM/project cda8bfa mlir/include/mlir/Conversion/MathToAPFloat MathToAPFloat.h, mlir/include/mlir/Dialect/Func/Utils Utils.h

[mlir][math] Add FP software implementation lowering pass: math-to-apfloat
Delta  File
+14 -53  mlir/lib/Conversion/ArithToAPFloat/ArithToAPFloat.cpp
+52 -0  mlir/lib/Conversion/MathToAPFloat/MathToAPFloat.cpp
+38 -0  mlir/lib/Dialect/Func/Utils/Utils.cpp
+21 -0  mlir/include/mlir/Conversion/MathToAPFloat/MathToAPFloat.h
+17 -0  mlir/lib/Conversion/MathToAPFloat/CMakeLists.txt
+16 -0  mlir/include/mlir/Dialect/Func/Utils/Utils.h
+158 -53  3 files not shown
+175 -53  9 files

LLVM/project 05b7720 clang/include/clang/CIR/Dialect/IR CIROps.td, clang/lib/CIR/CodeGen CIRGenBuiltinX86.cpp

[CIR][X86] Implement lowering for sqrt builtins (#169310)

Implements CIR IR generation for X86-specific sqrt builtin functions,
addressing issue #167765.

## Test Results 

Successfully tested the implementation locally. All tests pass:

```bash
$ ./bin/llvm-lit -v ../clang/test/CIR/CodeGen/X86/cir-sqrt-builtins.c

Testing: 1 tests, 1 workers
PASS: Clang :: CIR/CodeGen/X86/cir-sqrt-builtins.c (1 of 1)

Testing Time: 1.18s
Total Discovered Tests: 1
  Passed: 1 (100.00%)
```

    [4 lines not shown]
Delta  File
+45 -0  clang/test/CIR/CodeGen/X86/cir-sqrt-builtins.c
+21 -0  clang/include/clang/CIR/Dialect/IR/CIROps.td
+7 -3  clang/lib/CIR/CodeGen/CIRGenBuiltinX86.cpp
+8 -0  clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+81 -3  4 files

LLVM/project 49f813a mlir/include/mlir/Conversion Passes.td, mlir/include/mlir/Conversion/MathToAPFloat MathToAPFloat.h

[mlir][math] Add FP software implementation lowering pass: math-to-apfloat
Delta  File
+185 -0  mlir/lib/Conversion/MathToAPFloat/MathToAPFloat.cpp
+21 -0  mlir/include/mlir/Conversion/MathToAPFloat/MathToAPFloat.h
+17 -0  mlir/lib/Conversion/MathToAPFloat/CMakeLists.txt
+15 -0  mlir/include/mlir/Conversion/Passes.td
+1 -0  mlir/lib/Conversion/CMakeLists.txt
+239 -0  5 files

LLVM/project 786498b llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp, llvm/test/CodeGen/AMDGPU trunc-store.ll fp_trunc_store_fp32_to_bf16.ll

AMDGPU: Fix truncstore from v6f32 to v6f16 (#171212)

The v6bf16 cases work, but that's likely because v6bf16 isn't
currently an MVT.

Fixes: SWDEV-570985
Delta  File
+125 -0  llvm/test/CodeGen/AMDGPU/trunc-store.ll
+48 -0  llvm/test/CodeGen/AMDGPU/fp_trunc_store_fp32_to_bf16.ll
+14 -0  llvm/test/CodeGen/AMDGPU/trunc-store-f64-to-f16.ll
+1 -0  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+188 -0  4 files

LLVM/project 0ce6d56 lldb/source/Commands CommandObjectBreakpoint.cpp, lldb/test/API/functionalities/breakpoint/breakpoint_by_line_and_column TestBreakpointByLineAndColumn.py

Fix a typo in "breakpoint add file" and add a test (#171206)

lldbutil.run_to_line_breakpoint had usages that set column breakpoints,
so I thought the command line had coverage for that, but all the
`run_to` utilities actually use the SB APIs, and there weren't any tests
that set a file, line, and column breakpoint through `run_break_set`.
So I missed that I had typed the column option as `c`, which is already
taken by `--command`.

This patch fixes that typo and adds a CLI test for file + line + column.
Delta  File
+19 -0  lldb/test/API/functionalities/breakpoint/breakpoint_by_line_and_column/TestBreakpointByLineAndColumn.py
+1 -1  lldb/source/Commands/CommandObjectBreakpoint.cpp
+20 -1  2 files

LLVM/project 2ab198f llvm/test/Transforms/VectorCombine/AMDGPU shuffles-of-length-changing-shuffles.ll

AMDGPU: Precommit a test (#171208)

Delta  File
+50 -0  llvm/test/Transforms/VectorCombine/AMDGPU/shuffles-of-length-changing-shuffles.ll
+50 -0  1 files

LLVM/project 1911ce1 clang/include/clang/Basic BuiltinsX86.td, clang/lib/AST ExprConstant.cpp ExprConstShared.h

[Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - Allow GFNI intrinsics to be used in constexpr (#169619)

Resolves #169295

Delta  File
+363 -31  clang/test/CodeGen/X86/gfni-builtins.c
+165 -0  clang/lib/AST/ExprConstant.cpp
+109 -0  clang/lib/AST/ByteCode/InterpBuiltin.cpp
+7 -25  clang/include/clang/Basic/BuiltinsX86.td
+30 -0  clang/lib/Headers/gfniintrin.h
+8 -0  clang/lib/AST/ExprConstShared.h
+682 -56  6 files

LLVM/project ef7cfd1 llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp, llvm/test/CodeGen/AMDGPU trunc-store.ll fp_trunc_store_fp32_to_bf16.ll

AMDGPU: Fix truncstore from v6f32 to v6f16

The v6bf16 cases work, but that's likely because v6bf16 isn't
currently an MVT.

Fixes: SWDEV-570985
Delta  File
+125 -0  llvm/test/CodeGen/AMDGPU/trunc-store.ll
+48 -0  llvm/test/CodeGen/AMDGPU/fp_trunc_store_fp32_to_bf16.ll
+14 -0  llvm/test/CodeGen/AMDGPU/trunc-store-f64-to-f16.ll
+1 -0  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+188 -0  4 files

LLVM/project f803e46 llvm/lib/Transforms/Scalar StraightLineStrengthReduce.cpp, llvm/test/CodeGen/AMDGPU promote-constOffset-to-imm.ll waitcnt-vscnt.ll

Reland "Redesign Straight-Line Strength Reduction (SLSR) (#162930)" (#169614)

This PR implements parts of
https://github.com/llvm/llvm-project/issues/162376

- **Broader equivalence than constant index deltas**:
  - Add Base-delta and Stride-delta matching for Add and GEP forms using
    ScalarEvolution deltas.
  - Reuse enabled for both constant and variable deltas when an available
    IR value dominates the user.
- **Dominance-aware dictionary instead of linear scans**:
  - Tuple-keyed candidate dictionary grouped by basic block.
  - Walk the immediate-dominator chain to find the nearest dominating
    basis quickly and deterministically.
- **Simple cost model and best-rewrite selection**:
  - Score candidate expressions and rewrites; select the highest-profit
    rewrite per instruction.
  - Skip rewriting when expressions are already foldable or
    high-efficiency.
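
The dominance-aware lookup in the list above can be sketched roughly as follows. This is an illustration in Python, not the pass's C++ code: `CandidateDict`, `IDOM`, and the key tuples are hypothetical names, and the sketch assumes a candidate is registered only after the lookup for it, so any entry found in the same block appears earlier in program order.

```python
from collections import defaultdict

# Hypothetical immediate-dominator map: entry -> a, a -> b, a -> c.
IDOM = {"entry": None, "a": "entry", "b": "a", "c": "a"}


class CandidateDict:
    """Tuple-keyed candidate dictionary grouped by basic block."""

    def __init__(self, idom):
        self.idom = idom
        # key tuple -> basic block -> candidates in insertion order
        self.table = defaultdict(lambda: defaultdict(list))

    def add(self, key, block, value):
        self.table[key][block].append(value)

    def nearest_basis(self, key, block):
        """Walk the immediate-dominator chain upward from `block`."""
        per_block = self.table.get(key)
        if per_block is None:
            return None
        while block is not None:
            found = per_block.get(block)
            if found:
                # Latest candidate in the nearest dominating block.
                return found[-1]
            block = self.idom[block]
        return None
```

Each lookup costs at most one dictionary probe per block on the dominator chain, instead of a linear scan over every earlier candidate.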

    [15 lines not shown]
Delta  File
+883 -273  llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
+243 -244  llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll
+133 -196  llvm/test/CodeGen/AMDGPU/waitcnt-vscnt.ll
+271 -0  llvm/test/Transforms/StraightLineStrengthReduce/NVPTX/slsr-i8-gep.ll
+156 -7  llvm/test/Transforms/StraightLineStrengthReduce/slsr-gep.ll
+83 -80  llvm/test/CodeGen/AMDGPU/idot4s.ll
+1,769 -800  12 files not shown
+2,200 -836  18 files

LLVM/project c3acafc compiler-rt/lib/hwasan hwasan_allocator.h

[hwasan] Add config for AArch64 Linux with 39-bit VA. (#170927)

This is leveraging work which has already been done for Android, which
ships 39-bit VA kernels, and extending it to other embedded Linux
targets.

(SANITIZER_AARCH64_39BIT_VA was added in 58c8f57681.)
Delta  File
+2 -1  compiler-rt/lib/hwasan/hwasan_allocator.h
+2 -1  1 files

LLVM/project b71eb53 compiler-rt/lib/asan asan_allocator.h

[asan] Add config for AArch64 Linux with 39-bit VA. (#170929)

This is leveraging work which has already been done for Android, which
ships 39-bit VA kernels, and extending it to other embedded Linux
targets.

(SANITIZER_AARCH64_39BIT_VA was added in 58c8f57681.)
Delta  File
+2 -1  compiler-rt/lib/asan/asan_allocator.h
+2 -1  1 files

LLVM/project 5c8c7f3 llvm/lib/Transforms/Vectorize LoadStoreVectorizer.cpp, llvm/test/Transforms/LoadStoreVectorizer/NVPTX masked-store.ll vectorize_i8.ll

[LoadStoreVectorizer] Fill gaps in load/store chains to enable vectorization (#159388)

This change introduces Gap Filling, an optimization that aims to fill in
holes in otherwise contiguous load/store chains to enable vectorization.
It also introduces Chain Extending, which extends the end of a chain to
the closest power of 2.
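
As a rough illustration of the two ideas, the Python sketch below works on element indices rather than IR. It is not the LoadStoreVectorizer implementation: `fill_and_extend` and `next_power_of_two` are hypothetical names, "closest power of 2" is assumed to mean the next power of two at or above the chain's span, and the legality checks for masked intrinsics are left out entirely.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def fill_and_extend(indices):
    """Return (start, length, mask) for a sorted list of distinct indices.

    Gaps inside the chain and the elements appended at its end both get
    a False mask bit, i.e. they would be masked off in the vector access.
    """
    start = indices[0]
    span = indices[-1] - start + 1       # contiguous span after gap filling
    length = next_power_of_two(span)     # chain extending
    present = set(indices)
    mask = [(start + i) in present for i in range(length)]
    return start, length, mask
```

For example, `fill_and_extend([0, 1, 3])` fills the hole at index 2 and yields a 4-element access with mask `[True, True, False, True]`, while `fill_and_extend([0, 1, 2])` extends the chain by one masked-off element.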

This was originally motivated by the NVPTX target, but I tried to
generalize it to be universally applicable to all targets that may use
the LSV. I'm more than willing to make adjustments to improve the
target-agnostic-ness of this change. I fully expect there are some
issues and encourage feedback on how to improve things.

For both loads and stores we only perform the optimization when we can
generate a legal llvm masked load/store intrinsic, masking off the
"extra" elements. Determining legality for stores is a little tricky
from the NVPTX side, because these intrinsics are only supported for
256-bit vectors. See the other PR I opened for the implementation of the
NVPTX lowering of masked store intrinsics, which include NVPTX TTI

    [12 lines not shown]
Delta  File
+541 -0  llvm/test/Transforms/LoadStoreVectorizer/NVPTX/masked-store.ll
+407 -69  llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
+216 -41  llvm/test/Transforms/LoadStoreVectorizer/NVPTX/vectorize_i8.ll
+194 -0  llvm/test/Transforms/LoadStoreVectorizer/NVPTX/gap-fill.ll
+166 -0  llvm/test/Transforms/LoadStoreVectorizer/NVPTX/gap-fill-vectors.ll
+113 -0  llvm/test/Transforms/LoadStoreVectorizer/NVPTX/extend-chain.ll
+1,637 -110  8 files not shown
+2,029 -152  14 files

LLVM/project 89bc5ff clang/lib/CIR/Lowering/DirectToLLVM LowerToLLVM.cpp

[CIR] Clean up visibility conversion (NFC) (#171000)

Delta  File
+5 -7  clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+5 -7  1 files

LLVM/project 2ee921f llvm/lib/Transforms/Vectorize VectorCombine.cpp, llvm/test/Transforms/VectorCombine/AMDGPU extract-insert-i8.ll

VectorCombine: Improve the insert/extract fold in the narrowing case

Keeping the extracted element in a natural position in the narrowed
vector has two beneficial effects:

1. It makes the narrowing shuffles cheaper (at least on AMDGPU), which
   allows the insert/extract fold to trigger.
2. It makes the narrowing shuffles in a chain of extract/insert
   compatible, which allows foldLengthChangingShuffles to successfully
   recognize a chain that can be folded.

There are minor X86 test changes that look reasonable to me. The IR
change for AVX2 in llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll
doesn't change the assembly generated by `llc -mtriple=x86_64-- -mattr=AVX2`
at all.

commit-id:c151bb04
Delta  File
+6 -16  llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+2 -16  llvm/test/Transforms/VectorCombine/AMDGPU/extract-insert-i8.ll
+8 -4  llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll
+4 -4  llvm/test/Transforms/VectorCombine/X86/extract-insert.ll
+2 -2  llvm/test/Transforms/VectorCombine/X86/pr126085.ll
+22 -42  5 files

LLVM/project 316715e llvm/lib/Transforms/Vectorize VectorCombine.cpp, llvm/test/Transforms/VectorCombine/AMDGPU extract-insert-i8.ll shuffles-of-length-changing-shuffles.ll

VectorCombine: Fold chains of shuffles fed by length-changing shuffles

Such chains can arise from folding insert/extract chains.

commit-id:a960175d
Delta  File
+192 -0  llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+4 -32  llvm/test/Transforms/VectorCombine/AMDGPU/extract-insert-i8.ll
+4 -8  llvm/test/Transforms/VectorCombine/AMDGPU/shuffles-of-length-changing-shuffles.ll
+200 -40  3 files

LLVM/project 1e44d4b llvm/test/Transforms/VectorCombine/AMDGPU shuffles-of-length-changing-shuffles.ll

AMDGPU: Precommit a test

commit-id:a0814f87
Delta  File
+50 -0  llvm/test/Transforms/VectorCombine/AMDGPU/shuffles-of-length-changing-shuffles.ll
+50 -0  1 files

LLVM/project e7f6038 llvm/test/CodeGen/Generic reloc-none.ll

[LLVM] Mark reloc-none test unsupported on Hexagon (#171205)

Prevents infinite loop issue recorded in #147427. More work will be
required to make @llvm.reloc_none work correctly on Hexagon.
Delta  File
+1 -0  llvm/test/CodeGen/Generic/reloc-none.ll
+1 -0  1 files

LLVM/project 65dd29b llvm/lib/Transforms/Vectorize LoopVectorize.cpp

[LV] Compare induction start values via SCEV in assertion (NFCI).

Instead of comparing plain VPValues in the assertion that checks the
start values, directly compare the SCEVs. This future-proofs the code in
preparation for performing more simplifications/canonicalizations for
live-ins.
Delta  File
+4 -1  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+4 -1  1 files

LLVM/project d33d9ac .github/workflows premerge.yaml

[Github] Make premerge update correct comments file for Windows

platform.machine() on x86_64 on Windows returns AMD64 rather than
x86_64. Make premerge.yaml reflect this.
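
A small sketch of why the platform string matters when a file name is derived from it. The `comments-<system>-<machine>` naming scheme and the `comments_file_name` helper are hypothetical and only illustrate the mismatch; the real file name used by the workflow is not shown in this commit message.

```python
import platform


def comments_file_name(system=None, machine=None):
    """Build a per-platform file name (hypothetical naming scheme)."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return f"comments-{system}-{machine}"
```

With the values named in the commit message, `comments_file_name("Windows", "AMD64")` and `comments_file_name("Linux", "x86_64")` give different machine suffixes, so a workflow that hard-codes `x86_64` would look for the wrong file on Windows.
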
Delta  File
+1 -1  .github/workflows/premerge.yaml
+1 -1  1 files

LLVM/project 3a77056 clang-tools-extra/clang-tidy/abseil UncheckedStatusOrAccessCheck.h

change

Created using spr 1.3.7
Delta  File
+1 -1  clang-tools-extra/clang-tidy/abseil/UncheckedStatusOrAccessCheck.h
+1 -1  1 files

LLVM/project 1ec8e00 clang/test/Headers __clang_hip_math.hip, llvm/test/CodeGen/X86 shift-i512.ll bitcnt-big-integer.ll

fix

Created using spr 1.3.7
Delta  File
+2,027 -185  llvm/test/CodeGen/X86/shift-i512.ll
+1,563 -413  llvm/test/CodeGen/X86/bitcnt-big-integer.ll
+1,005 -956  clang/test/Headers/__clang_hip_math.hip
+0 -1,298  openmp/runtime/src/include/omp_lib.h.var
+1,298 -0  openmp/module/omp_lib.h.var
+1,183 -0  openmp/module/omp_lib.F90.var
+7,076 -2,852  734 files not shown
+22,047 -12,945  740 files

LLVM/project 9cc4265 clang/test/Headers __clang_hip_math.hip, llvm/test/CodeGen/X86 shift-i512.ll bitcnt-big-integer.ll

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
Delta  File
+2,027 -185  llvm/test/CodeGen/X86/shift-i512.ll
+1,563 -413  llvm/test/CodeGen/X86/bitcnt-big-integer.ll
+1,005 -956  clang/test/Headers/__clang_hip_math.hip
+0 -1,298  openmp/runtime/src/include/omp_lib.h.var
+1,298 -0  openmp/module/omp_lib.h.var
+1,183 -0  openmp/module/omp_lib.F90.var
+7,076 -2,852  731 files not shown
+22,029 -12,926  737 files

LLVM/project f6971bf clang/lib/Analysis/FlowSensitive/Models UncheckedStatusOrAccessModel.cpp, clang/unittests/Analysis/FlowSensitive UncheckedStatusOrAccessModelTestFixture.cpp

[FlowSensitive] [StatusOr] [12/N] Add support for smart pointers (#170943)

Delta  File
+74 -0  clang/unittests/Analysis/FlowSensitive/UncheckedStatusOrAccessModelTestFixture.cpp
+48 -0  clang/lib/Analysis/FlowSensitive/Models/UncheckedStatusOrAccessModel.cpp
+122 -0  2 files

LLVM/project 0f60cdf lld/ELF Relocations.cpp SyntheticSections.h, lld/ELF/Arch AArch64.cpp

[𝘀𝗽𝗿] initial version

Created using spr 1.3.5
Delta  File
+32 -44  lld/ELF/Relocations.cpp
+29 -5  lld/ELF/SyntheticSections.h
+8 -24  lld/ELF/Arch/AArch64.cpp
+5 -15  lld/ELF/Writer.cpp
+5 -1  lld/ELF/SyntheticSections.cpp
+2 -0  lld/ELF/InputSection.cpp
+81 -89  1 files not shown
+82 -89  7 files

LLVM/project 0eb9c6c lld/ELF Relocations.cpp SyntheticSections.h, lld/ELF/Arch AArch64.cpp

[𝘀𝗽𝗿] changes to main this commit is based on

Created using spr 1.3.5

[skip ci]
Delta  File
+32 -44  lld/ELF/Relocations.cpp
+8 -24  lld/ELF/Arch/AArch64.cpp
+25 -2  lld/ELF/SyntheticSections.h
+2 -1  lld/ELF/Writer.cpp
+2 -0  lld/ELF/InputSection.cpp
+1 -0  lld/ELF/Relocations.h
+70 -71  1 files not shown
+70 -72  7 files

LLVM/project e3905a4 llvm/lib/Transforms/Instrumentation MemProfUse.cpp, llvm/test/Transforms/PGOProfile memprof_annotate_indirect_call.test

[MemProf] Merge all callee guids for indirect call VP metadata (#170964)

When matching memprof profiles, for indirect calls we use the callee
guids recorded on callsites in the profile to synthesize indirect call
VP metadata when none exists. However, we only do this for the first
matching CallSiteEntry from the profile.

In some cases there can be multiple, for example when the current
function was eventually inlined into multiple callers. Profile
generation propagates the CallSiteEntry from those callers into the
inlined callee's profile as it may not yet have been inlined in the
new compile.

To capture all of these potential indirect call targets, merge callee
guids across all matching CallSiteEntries.
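
The merge can be pictured with the following Python sketch. It is not the MemProfUse.cpp code: the `CallSiteEntry` dataclass, its `frames` and `callee_guids` fields, and matching by frame equality are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CallSiteEntry:
    """Hypothetical stand-in for a profile call-site record."""
    frames: tuple
    callee_guids: list = field(default_factory=list)


def merge_callee_guids(entries, frames):
    """Collect callee guids from every entry matching `frames`.

    Duplicates are dropped and first-seen order is kept, so a target
    recorded by several callers appears once.
    """
    merged, seen = [], set()
    for entry in entries:
        if entry.frames != frames:
            continue
        for guid in entry.callee_guids:
            if guid not in seen:
                seen.add(guid)
                merged.append(guid)
    return merged
```

Taking only the first matching entry would return just that entry's guids; merging across all matches also picks up the targets recorded by the other entries.
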
Delta  File
+40 -41  llvm/lib/Transforms/Instrumentation/MemProfUse.cpp
+14 -2  llvm/test/Transforms/PGOProfile/memprof_annotate_indirect_call.test
+54 -43  2 files

LLVM/project 1e16f4e flang/include/flang/Optimizer/Builder FIRBuilder.h, flang/lib/Optimizer/Builder FIRBuilder.cpp

[flang] add simplification for ProductOp intrinsic (#169575)

Add simplification for `ProductOp` by implementing support for it in
`ReductionConversion` and adding it to the pattern list in the
`SimplifyHLFIRIntrinsics` pass.

Closes:
https://github.com/issues/recent?issue=llvm%7Cllvm-project%7C169433

---------

Co-authored-by: Eugene Epshteyn <eepshteyn at nvidia.com>
Delta  File
+457 -0  flang/test/HLFIR/simplify-hlfir-intrinsics-product.fir
+49 -0  flang/lib/Optimizer/HLFIR/Transforms/SimplifyHLFIRIntrinsics.cpp
+20 -0  flang/lib/Optimizer/Builder/FIRBuilder.cpp
+10 -0  flang/include/flang/Optimizer/Builder/FIRBuilder.h
+536 -0  4 files

LLVM/project d17f3b5 compiler-rt/test/xray/TestCases/Posix basic-filtering.cpp fdr-mode.cpp

[XRay] Disable two more tests on armhf

Similar to d6f92050c0c2f60e78f3c8bcf557c5e69b025d7a. Needed now that
these tests are actually running more broadly.
Delta  File
+2 -0  compiler-rt/test/xray/TestCases/Posix/basic-filtering.cpp
+1 -0  compiler-rt/test/xray/TestCases/Posix/fdr-mode.cpp
+3 -0  2 files

LLVM/project 2ba52c1 clang/test/Headers __clang_hip_math.hip, llvm/test/CodeGen/X86 shift-i512.ll bitcnt-big-integer.ll

reb

Created using spr 1.3.7
Delta  File
+2,027 -185  llvm/test/CodeGen/X86/shift-i512.ll
+1,563 -413  llvm/test/CodeGen/X86/bitcnt-big-integer.ll
+1,005 -956  clang/test/Headers/__clang_hip_math.hip
+0 -1,298  openmp/runtime/src/include/omp_lib.h.var
+1,298 -0  openmp/module/omp_lib.h.var
+0 -1,183  openmp/runtime/src/include/omp_lib.F90.var
+5,893 -4,035  725 files not shown
+21,447 -12,890  731 files