LLVM/project 3cf97d8 llvm/lib/Target/X86 X86ISelLowering.cpp

[X86] Make ISD::ROTL/ROTR vector rotates legal on XOP+AVX512 targets (#184587)

Similar to what we did for funnel shifts in #166949: set vector rotates
as legal on XOP (128-bit ROTL) and AVX512 (vXi32/vXi64 ROTL/ROTR)
targets, and custom fold to X86ISD::VROTLI/VROTRI as a later fixup.

128/256-bit vector widening to 512-bit instructions is already fully
supported and tested on AVX512F-only targets.

First part of #184002
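
For readers less familiar with the rotate nodes, the following is a minimal scalar sketch of what one 32-bit lane of a rotate computes, and of how a rotate relates to a funnel shift with both inputs equal. The helper names (rotl32, rotr32, fshl32) are hypothetical and are not taken from the patch; the sketch models the semantics only, not the X86 lowering.

```cpp
#include <cstdint>

// Rotate one 32-bit lane left; the amount is taken modulo the lane width.
constexpr std::uint32_t rotl32(std::uint32_t x, unsigned n) {
  n &= 31u;
  return n == 0 ? x : (x << n) | (x >> (32u - n));
}

// Rotate one 32-bit lane right; the amount is taken modulo the lane width.
constexpr std::uint32_t rotr32(std::uint32_t x, unsigned n) {
  n &= 31u;
  return n == 0 ? x : (x >> n) | (x << (32u - n));
}

// Funnel shift left: concatenate hi:lo, shift left by n (mod 32),
// and keep the upper 32 bits of the result.
constexpr std::uint32_t fshl32(std::uint32_t hi, std::uint32_t lo, unsigned n) {
  n &= 31u;
  return n == 0 ? hi : (hi << n) | (lo >> (32u - n));
}
```

With both funnel-shift inputs set to the same value, fshl32(x, x, n) gives the same result as rotl32(x, n), which is why the two families of nodes can be handled along similar lines.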
Delta File
+38 -37 llvm/lib/Target/X86/X86ISelLowering.cpp
+38 -37 1 files

LLVM/project 9c35a7b llvm/include/llvm/CodeGen SDPatternMatch.h, llvm/lib/Target/AArch64 AArch64ISelLowering.cpp

[AArch64] Refine reduction VT selection in CTPOP -> VECREDUCE combine (#183025)

Use the same VT as the SETcc source, or fall back to using the VT of the
unextended operand of the CTPOP if the element size of the SETcc is too
small to fit the negative popcount.
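
As a rough illustration of why the element size matters: each true lane of a SETcc result is all-ones (-1), so the popcount of the mask equals the negated sum of the lanes, provided the sum fits in the lane type. The sketch below models that sum with wrapping arithmetic. The helper name negatedLaneSum and the lane counts are hypothetical, chosen only to make the overflow visible; they are not taken from the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Adds `trueLanes` all-ones lanes in a wrapping accumulator of type Lane,
// then negates the result. This equals the popcount only while the
// (negative) sum is representable in Lane.
template <typename Lane>
constexpr int negatedLaneSum(std::size_t trueLanes) {
  using U = std::make_unsigned_t<Lane>;
  U acc = 0;
  for (std::size_t i = 0; i < trueLanes; ++i)
    acc = static_cast<U>(acc + static_cast<U>(-1)); // add one -1 lane, wrapping
  return -static_cast<int>(static_cast<Lane>(acc));
}
```

Under these assumptions, 16 true lanes summed in 8-bit elements still give 16, while 200 true lanes wrap around in 8-bit elements and are only recovered correctly with a wider element type.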
Delta File
+39 -35 llvm/test/CodeGen/AArch64/popcount_vmask.ll
+17 -5 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+15 -5 llvm/include/llvm/CodeGen/SDPatternMatch.h
+71 -45 3 files

LLVM/project 6778c11 llvm/lib/Target/AArch64 AArch64TargetTransformInfo.cpp, llvm/test/Analysis/CostModel/AArch64 sve-intrinsics.ll sve-min-max.ll

[AArch64] Fix SVE cost model for various math intrinsics (#184358)

The implementation of getIntrinsicInstrCost in BasicTTIImpl
assumes that, for some intrinsics, the cost is 2 when the
equivalent DAG node uses custom lowering, rather than 1 as
for legal ops. However, even though we use custom lowering
for these scalable vector operations when SVE is available,
we still end up generating the same efficient codegen as for
fixed-width vectors. This patch handles a few obvious
intrinsics that we know get lowered to something sensible
and returns the same cost as NEON, i.e. 1.
Delta File
+311 -6 llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll
+168 -148 llvm/test/Transforms/LoopVectorize/AArch64/veclib-intrinsic-calls.ll
+36 -36 llvm/test/Analysis/CostModel/AArch64/sve-min-max.ll
+23 -3 llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+1 -1 llvm/test/Analysis/CostModel/AArch64/sve-math.ll
+539 -194 5 files

LLVM/project 1f53da0 llvm/docs/CommandGuide llvm-objdump.rst, llvm/test/CodeGen/BPF objdump_cond_op_2.ll objdump_cond_op.ll

[llvm-objdump] Default --symbolize-operands for BPF (#184043)

BPF users expect to see basic block labels (e.g. <L0>, <L1>) in
disassembly output
(https://github.com/llvm/llvm-project/pull/95103#issuecomment-3771234810).
Default --symbolize-operands to on for BPF targets when neither
--symbolize-operands nor --no-symbolize-operands is explicitly
specified.

Add --no-symbolize-operands to allow users to opt out.
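
The defaulting rule described above can be sketched as a small tri-state resolution: an explicit flag always wins, and the target only decides when neither flag was given. The function name resolveSymbolizeOperands and its parameters are hypothetical; this is not the code in llvm-objdump.cpp.

```cpp
#include <optional>

// explicitFlag: true for --symbolize-operands, false for
// --no-symbolize-operands, std::nullopt when neither was passed.
// isBPF: whether the object being disassembled targets BPF.
constexpr bool resolveSymbolizeOperands(std::optional<bool> explicitFlag,
                                        bool isBPF) {
  return explicitFlag.has_value() ? *explicitFlag : isBPF;
}
```

Under this sketch, a BPF object with no flag resolves to on, and --no-symbolize-operands on a BPF object resolves to off.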
Delta File
+21 -2 llvm/test/tools/llvm-objdump/BPF/disassemble-symbolize-operands.s
+11 -2 llvm/tools/llvm-objdump/llvm-objdump.cpp
+3 -2 llvm/tools/llvm-objdump/ObjdumpOpts.td
+3 -2 llvm/docs/CommandGuide/llvm-objdump.rst
+1 -1 llvm/test/CodeGen/BPF/objdump_cond_op_2.ll
+1 -1 llvm/test/CodeGen/BPF/objdump_cond_op.ll
+40 -10 6 files

LLVM/project 0418700 llvm/lib/CodeGen/SelectionDAG SelectionDAGBuilder.cpp, llvm/lib/Target/NVPTX NVPTXInstrInfo.td

[SDAGBuilder] Fix incorrect fcmp+select to minnum/maxnum transform (#184590)

minnum/maxnum don't have the correct sNaN semantics, so we must convert
to minimumnum/maximumnum instead.

To avoid an NVPTX regression, make it handle fmaximumnum in one
TableGen pattern.

This is intended as a targeted fix for the miscompile, as the complete
removal of this transform (#93575) appears to be blocked.

Fixes https://github.com/llvm/llvm-project/issues/176624.
Delta File
+89 -49 llvm/test/CodeGen/ARM/vminmaxnm-safe.ll
+57 -29 llvm/test/CodeGen/ARM/fp16-vminmaxnm-safe.ll
+36 -24 llvm/test/CodeGen/X86/avx512-broadcast-unfold.ll
+5 -17 llvm/test/CodeGen/X86/sse-minmax.ll
+5 -5 llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+8 -2 llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+200 -126 2 files not shown
+208 -131 8 files

LLVM/project 95685ca bolt/include/bolt/Core BinaryContext.h, bolt/lib/Core BinaryContext.cpp

[BOLT] Retain certain local symbols (#184074)

BOLT currently strips all STT_NOTYPE STB_LOCAL zero-sized symbols
that fall inside function bodies. Some of these symbols are named
labels (loop markers and subroutine entry points) or local function
symbols in hand-written assembly. We now keep them in the local symbol
table of BOLT-processed binaries for better symbolication.
Delta File
+29 -14 bolt/lib/Rewrite/RewriteInstance.cpp
+30 -0 bolt/test/AArch64/retain-local-symbols.s
+17 -12 bolt/lib/Core/BinaryContext.cpp
+8 -0 bolt/test/AArch64/compare-and-branch-inversion.S
+3 -2 bolt/test/X86/dynamic-relocs-on-entry.s
+3 -1 bolt/include/bolt/Core/BinaryContext.h
+90 -29 2 files not shown
+94 -30 8 files

LLVM/project 17e783b mlir/include/mlir/Dialect/LLVMIR NVVMOps.td, mlir/lib/Dialect/LLVMIR/IR NVVMDialect.cpp

[MLIR][NVVM] Add nvvm.addf and nvvm.subf Ops (#179162)

Adds `nvvm.addf` and `nvvm.subf` Ops to the NVVM dialect. `nvvm.addf`
performs a floating-point addition between two operands. `nvvm.subf`
performs a floating-point subtraction between two operands and is
equivalent to an `llvm.fneg` followed by an `nvvm.addf` operation.

PTX ISA Reference:
1.
https://docs.nvidia.com/cuda/parallel-thread-execution/#floating-point-instructions-add
2.
https://docs.nvidia.com/cuda/parallel-thread-execution/#half-precision-floating-point-instructions-add
Delta File
+285 -0 mlir/test/Target/LLVMIR/nvvm/addf/addf_vector.mlir
+117 -0 mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
+90 -2 mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+89 -0 mlir/test/Target/LLVMIR/nvvm/addf/addf.mlir
+67 -0 mlir/test/Target/LLVMIR/nvvm/addf/addf_invalid.mlir
+64 -0 mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
+712 -2 1 files not shown
+721 -2 7 files

LLVM/project 3b65752 llvm/test/CodeGen/AArch64 clmul-fixed.ll

[AArch64] Enable and regenerate clmul-fixed.ll. NFC (#184628)

The v2i64 tests are now fixed. The disabled ones in clmul-scalable.ll
require i128 vectors which are generally not supported.
Delta File
+3,707 -46 llvm/test/CodeGen/AArch64/clmul-fixed.ll
+3,707 -46 1 files

LLVM/project 821368f lld/ELF/Arch RISCV.cpp LoongArch.cpp, lld/test/ELF riscv-relax-synthetic-in-text.s loongarch-relax-synthetic-in-text.s

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
Delta File
+28 -0 lld/test/ELF/riscv-relax-synthetic-in-text.s
+27 -0 lld/test/ELF/loongarch-relax-synthetic-in-text.s
+10 -1 lld/ELF/Arch/RISCV.cpp
+6 -1 lld/ELF/Arch/LoongArch.cpp
+71 -2 4 files

LLVM/project 7691c0b clang/lib/Headers hvx_hexagon_protos.h, llvm/test/CodeGen/AArch64 arm64-cvt-simd-intrinsics.ll sve-fixed-vector-lrint.ll

Merge remote-tracking branch 'external-upstream/main' into users/mariusz-sikora-at-amd/add-flat-offset-bits-feature
Delta File
+53,024 -7,001 llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+15,172 -1,553 llvm/test/CodeGen/RISCV/rvv/clmul-sdnode.ll
+1,205 -1 llvm/test/CodeGen/AArch64/arm64-cvt-simd-intrinsics.ll
+287 -891 clang/lib/Headers/hvx_hexagon_protos.h
+506 -540 llvm/test/CodeGen/AArch64/sve-fixed-vector-lrint.ll
+250 -281 llvm/test/CodeGen/AArch64/vector-lrint.ll
+70,444 -10,267 637 files not shown
+84,425 -15,730 643 files

LLVM/project 8251293 llvm/lib/Transforms/Vectorize VPlanConstruction.cpp VPlanPredicator.cpp, llvm/test/Transforms/LoopVectorize/VPlan tail-folding.ll

[VPlan] Move tail folding out of VPlanPredicator. NFC (#176143)

Currently the logic for introducing a header mask and predicating the
vector loop region is done inside introduceMasksAndLinearize.

This splits the tail folding part out into an individual VPlan transform
so that VPlanPredicator.cpp doesn't need to worry about tail folding,
which seemed to be a temporary measure according to a comment in
VPlanTransforms.h.

To perform tail folding independently, this splits the "body" of the
vector loop region between the phis in the header and the branch + iv
increment in the latch:

Before:

```
+-------------------------------------------+
|%iv = ...                                  |

    [39 lines not shown]
Delta File
+338 -0 llvm/test/Transforms/LoopVectorize/VPlan/tail-folding.ll
+90 -0 llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+3 -58 llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
+37 -3 llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+4 -3 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+5 -0 llvm/lib/Transforms/Vectorize/VPlan.cpp
+477 -64 1 files not shown
+478 -64 7 files

LLVM/project b0da64e .ci monolithic-linux.sh

[CI] Enable LTO linker plugin tests (#184076)

We've recently had two instances of test failures for the LTO linker
plugin being introduced. Build and test the LTO linker plugin in
pre-merge CI to avoid this.
Delta File
+2 -1 .ci/monolithic-linux.sh
+2 -1 1 files

LLVM/project d5378da llvm/lib/Target/SystemZ SystemZISelLowering.cpp SystemZInstrVector.td, llvm/test/CodeGen/SystemZ fminimumnum-fmaximumnum.ll

[SystemZ] Mark fminimumnum/fmaximumnum as legal (#184595)

In M=4 mode, the behavior matches IEEE 754-2019 minimumNumber, except
that if both operands are sNaN, the result will be sNaN rather than
qNaN. However, this is explicitly allowed for LLVM's minimumnum
intrinsic, as canonicalization can be omitted for non-constrained FP.

As such, mark fminimumnum/fmaximumnum as legal, and lower them the same
way as fminnum/fmaxnum. In the future, we may wish to switch those to
use M=0, to match IEEE 754-2008 maxNum/minNum instead.
Delta File
+164 -0 llvm/test/CodeGen/SystemZ/fminimumnum-fmaximumnum.ll
+8 -24 llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+2 -0 llvm/lib/Target/SystemZ/SystemZInstrVector.td
+174 -24 3 files

LLVM/project d998a70 libclc CMakeLists.txt

libclc: Fix checking for arch including OS in wrong place (#184683)

Delta File
+1 -1 libclc/CMakeLists.txt
+1 -1 1 files

LLVM/project b414d77 mlir/include/mlir/Dialect/LLVMIR NVVMOps.td, mlir/test/Target/LLVMIR/nvvm tcgen05-mma-block-scale-shared.mlir tcgen05-mma-block-scale-tensor.mlir

[MLIR][NVVM] Unify and move to a single tcgen05_mma_kind attr for all tcgen05.mma Ops (#184433)

This change unifies the use of the `tcgen05_mma_kind` attribute for
tcgen05.mma Ops in MLIR.

Before this change, two attributes were used for tcgen05.mma Ops. One
was `MMABlockScaleKindAttr`, with the values `mxf8f6f4`, `mxf4` and
`mxf4nvf4`, used for `tcgen05.mma.block_scale` and
`tcgen05.mma.sp.block_scale`. The other was `Tcgen05MMAKindAttr`, with
the values `f16`, `tf32`, `f8f6f4` and `i8`, used for `tcgen05.mma`,
`tcgen05.mma.sp`, `tcgen05.mma.ws` and `tcgen05.mma.ws.sp`.

`Tcgen05MMAKindAttr` has been extended with the values from
`MMABlockScaleKindAttr`. There is now a single `tcgen05_mma_kind`
attribute for all `tcgen05.mma` Ops in MLIR.

Backward compatibility is not supported. Existing tests and scripts
should be updated to use the `tcgen05_mma_kind` attribute instead of
`block_scale_kind` for all tcgen05.mma MLIR Ops.
Delta File
+49 -48 mlir/test/Target/LLVMIR/nvvm/tcgen05-mma-block-scale-shared.mlir
+49 -48 mlir/test/Target/LLVMIR/nvvm/tcgen05-mma-block-scale-tensor.mlir
+49 -48 mlir/test/Target/LLVMIR/nvvm/tcgen05-mma-sp-block-scale-shared.mlir
+49 -48 mlir/test/Target/LLVMIR/nvvm/tcgen05-mma-sp-block-scale-tensor.mlir
+64 -4 mlir/test/Target/LLVMIR/nvvm/tcgen05-mma-invalid.mlir
+41 -18 mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+301 -214 1 files not shown
+310 -224 7 files

LLVM/project 50826f9 mlir/lib/Dialect/MemRef/IR MemRefOps.cpp, mlir/test/Dialect/MemRef fold-memref-alias-ops.mlir canonicalize.mlir

[mlir][MemRef] Add position-based matching heuristics for rank-reduction with dynamic strides (#184334)

When the source has multiple unit dimensions, stride-based
disambiguation can be wrong with dynamic strides. Add
position-based matching: for each result dimension in order, pick the
leftmost unmatched source dimension with the same size; unmatched source
dims are dropped.

Example: subview from memref<1x8x1x3> to memref<1x8x3>. Both dim 0 and
dim 2 have size 1. Stride-based logic cannot distinguish when strides
are dynamic. Position-based matching correctly drops dim 2 (middle unit
dim) instead of dim 0.

When we have non-trivial static strides, we make use of the stride-based
logic, else we fall back to position-based logic as introduced by this
patch.

INPUT :-
```

    [22 lines not shown]
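
The position-based matching described in the message can be sketched as a single left-to-right scan. This is only an illustration under stated assumptions: the function name droppedDims is hypothetical, sizes are compared as plain integers, and the search for each result dimension resumes after the previous match so that dimension order is preserved. The actual logic in MemRefOps.cpp may differ.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Returns a bitmask whose bit i is set when source dimension i is dropped.
template <std::size_t SrcRank, std::size_t ResRank>
constexpr std::uint32_t
droppedDims(const std::array<std::int64_t, SrcRank> &src,
            const std::array<std::int64_t, ResRank> &res) {
  std::uint32_t dropped = 0;
  std::size_t s = 0;
  for (std::size_t r = 0; r < ResRank; ++r) {
    // Skip, and mark as dropped, source dims whose size does not match.
    while (s < SrcRank && src[s] != res[r]) {
      dropped |= 1u << s;
      ++s;
    }
    if (s < SrcRank)
      ++s; // src[s] is matched to res[r]
  }
  // Source dims left over after the last match were never matched.
  for (; s < SrcRank; ++s)
    dropped |= 1u << s;
  return dropped;
}
```

For the 1x8x1x3 to 1x8x3 example from the message, this scan matches dims 0, 1 and 3 and marks only dim 2 as dropped.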
Delta File
+114 -35 mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
+25 -4 mlir/test/Dialect/MemRef/fold-memref-alias-ops.mlir
+1 -1 mlir/test/Dialect/MemRef/canonicalize.mlir
+140 -40 3 files

LLVM/project 6e2ffd5 clang/lib/Driver ToolChain.cpp, flang-rt/lib/runtime execute.cpp

rebase

Created using spr 1.3.7
Delta File
+64 -64 flang-rt/lib/runtime/execute.cpp
+2 -104 llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
+25 -0 clang/lib/Driver/ToolChain.cpp
+0 -19 llvm/test/CodeGen/RISCV/rvv/reduce-vl-peephole.ll
+6 -11 flang-rt/unittests/Runtime/CommandTest.cpp
+0 -15 llvm/test/CodeGen/RISCV/rvv/reduce-vl-peephole.mir
+97 -213 82 files not shown
+228 -318 88 files

LLVM/project 35f20e5 clang/lib/Driver ToolChain.cpp, flang-rt/lib/runtime execute.cpp

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
Delta File
+64 -64 flang-rt/lib/runtime/execute.cpp
+2 -104 llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
+25 -0 clang/lib/Driver/ToolChain.cpp
+0 -19 llvm/test/CodeGen/RISCV/rvv/reduce-vl-peephole.ll
+6 -11 flang-rt/unittests/Runtime/CommandTest.cpp
+0 -15 llvm/test/CodeGen/RISCV/rvv/reduce-vl-peephole.mir
+97 -213 82 files not shown
+228 -318 88 files

LLVM/project 544de17 clang/lib/Driver ToolChain.cpp, flang-rt/lib/runtime execute.cpp

clang only

Created using spr 1.3.7
Delta File
+64 -64 flang-rt/lib/runtime/execute.cpp
+2 -104 llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
+25 -0 clang/lib/Driver/ToolChain.cpp
+0 -19 llvm/test/CodeGen/RISCV/rvv/reduce-vl-peephole.ll
+6 -11 flang-rt/unittests/Runtime/CommandTest.cpp
+0 -15 llvm/test/CodeGen/RISCV/rvv/reduce-vl-peephole.mir
+97 -213 82 files not shown
+228 -318 88 files

LLVM/project 73b1c2b libcxx/include string

don't leak

Created using spr 1.3.7
Delta File
+1 -5 libcxx/include/string
+1 -5 1 files

LLVM/project a2407f6 clang-tools-extra/clangd/index Ref.cpp Ref.h

[clangd][NFC] Add RefKind::Call into RefKind::All and insertion operator (#184677)

Without this patch:
- RefKind output doesn't show RefKind::Call bit.
- RefKind::Call isn't included in RefKind::All.

I don't think these changes require additional tests, as the problems
above mainly appear during testing/debugging (e.g. if a comparison of
two RefKinds fails in a test, `Call` isn't shown in the output even
if this bit is set).
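
To make the two issues concrete, here is a minimal sketch of a bitmask enum in which the aggregate value includes the Call bit and the insertion operator prints every set bit. The enumerator values, the printed names, and the operator bodies are assumptions for illustration and are not copied from clangd's Ref.h or Ref.cpp.

```cpp
#include <cstdint>
#include <ostream>

// Hypothetical bitmask enum modelled on the description above.
enum class RefKind : std::uint8_t {
  Unknown = 0,
  Declaration = 1 << 0,
  Definition = 1 << 1,
  Reference = 1 << 2,
  Spelled = 1 << 3,
  Call = 1 << 4,
  // The aggregate has to list Call explicitly to include that bit.
  All = Declaration | Definition | Reference | Spelled | Call,
};

constexpr RefKind operator&(RefKind a, RefKind b) {
  return static_cast<RefKind>(static_cast<std::uint8_t>(a) &
                              static_cast<std::uint8_t>(b));
}

// Prints the name of every set bit, including Call.
std::ostream &operator<<(std::ostream &os, RefKind k) {
  if (k == RefKind::Unknown)
    return os << "Unknown";
  static constexpr const char *names[] = {"Decl", "Def", "Ref", "Spelled",
                                          "Call"};
  bool first = true;
  for (unsigned i = 0; i < 5; ++i) {
    if (static_cast<std::uint8_t>(k) & (1u << i)) {
      if (!first)
        os << ", ";
      os << names[i];
      first = false;
    }
  }
  return os;
}
```

If the name table or the aggregate omitted the last entry, a value with only the Call bit set would print nothing and would not be covered by All, which matches the two symptoms listed in the message.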
Delta File
+2 -2 clang-tools-extra/clangd/index/Ref.cpp
+1 -1 clang-tools-extra/clangd/index/Ref.h
+3 -3 2 files

LLVM/project 7bb3139 clang-tools-extra/test/clang-tidy/checkers/bugprone suspicious-string-compare.c, clang-tools-extra/test/clang-tidy/checkers/hicpp signed-bitwise-integer-literals.cpp

[clang-tidy][NFC] Run tests in multiple language modes where possible (#184741)

Delta File
+4 -4 clang-tools-extra/test/clang-tidy/checkers/modernize/use-designated-initializers.cpp
+3 -3 clang-tools-extra/test/clang-tidy/checkers/readability/redundant-casting.cpp
+2 -3 clang-tools-extra/test/clang-tidy/checkers/hicpp/signed-bitwise-integer-literals.cpp
+2 -3 clang-tools-extra/test/clang-tidy/checkers/bugprone/suspicious-string-compare.c
+2 -2 clang-tools-extra/test/clang-tidy/checkers/readability/redundant-inline-specifier.cpp
+2 -2 clang-tools-extra/test/clang-tidy/checkers/modernize/use-std-print-absl.cpp
+15 -17 64 files not shown
+91 -94 70 files

LLVM/project 6c30f83 libc/src/__support/FPUtil multiply_add.h, libc/src/__support/macros attributes.h config.h

[libc][math] Qualify log with constant evaluation support
Delta File
+16 -1 libc/test/shared/shared_math_test.cpp
+14 -0 libc/src/__support/macros/attributes.h
+10 -0 libc/src/__support/macros/config.h
+5 -5 libc/src/__support/math/log.h
+2 -2 libc/src/__support/FPUtil/multiply_add.h
+2 -2 libc/src/__support/math/log_range_reduction.h
+49 -10 1 files not shown
+53 -10 7 files

LLVM/project 87b6dbe offload/test/offloading dyn_groupprivate.cpp, openmp/device/include DeviceTypes.h

Add omp_get_dyn_gprivate_memspace routine
Delta File
+14 -4 openmp/device/src/State.cpp
+16 -0 offload/test/offloading/dyn_groupprivate.cpp
+7 -0 openmp/runtime/src/kmp_stub.cpp
+6 -0 openmp/device/include/DeviceTypes.h
+4 -0 openmp/runtime/src/kmp_csupport.cpp
+4 -0 openmp/runtime/src/include/omp.h.var
+51 -4 3 files not shown
+55 -4 9 files

LLVM/project 0f77c55 libc/src/__support/FPUtil multiply_add.h, libc/src/__support/macros attributes.h config.h

[libc][math] Qualify log with constant evaluation support
Delta File
+16 -0 libc/test/shared/shared_math_test.cpp
+14 -0 libc/src/__support/macros/attributes.h
+10 -0 libc/src/__support/macros/config.h
+5 -5 libc/src/__support/math/log.h
+2 -2 libc/src/__support/FPUtil/multiply_add.h
+2 -2 libc/src/__support/math/log_range_reduction.h
+49 -9 1 files not shown
+53 -9 7 files

LLVM/project 22bf237 flang-rt/lib/runtime edit-input.cpp

[flang-rt] Handle NAMELIST logical comments without preceding space (#183202)

If a comment appears immediately after a logical value in a NAMELIST
file, the flang runtime returns IostatGenericError. No error occurs when
a space precedes the exclamation point. Add code to handle a comment
while parsing logical values.
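
The behaviour being fixed can be pictured with a toy parser that treats `!` as one of the characters that may end a logical value, so that `T!comment` parses the same way as `T !comment`. The names parseLogical and isValueTerminator, and the exact set of terminators, are hypothetical; this is not the code in edit-input.cpp.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Characters assumed to end a value in this sketch; '!' starts a comment.
constexpr bool isValueTerminator(char c) {
  return c == ' ' || c == '\t' || c == ',' || c == '/' || c == '!';
}

// Parses a leading logical value such as T, F, .TRUE. or .FALSE.
// Returns std::nullopt when the input does not start with one.
constexpr std::optional<bool> parseLogical(std::string_view in) {
  std::size_t i = 0;
  if (i < in.size() && in[i] == '.')
    ++i; // optional leading period
  if (i == in.size())
    return std::nullopt;
  bool value = false;
  if (in[i] == 'T' || in[i] == 't')
    value = true;
  else if (in[i] == 'F' || in[i] == 'f')
    value = false;
  else
    return std::nullopt;
  // Consume the rest of the token; stop at a terminator, including '!'.
  for (++i; i < in.size() && !isValueTerminator(in[i]); ++i) {
    const char c = in[i];
    const bool ok =
        (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '.';
    if (!ok)
      return std::nullopt;
  }
  return value;
}
```

Without `!` in the terminator set, the loop would reject `T!comment` as a malformed token, which is analogous to the error described in the message.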

Co-authored-by: John Otken john.otken at hpe.com
Delta File
+5 -0 flang-rt/lib/runtime/edit-input.cpp
+5 -0 1 files

LLVM/project f49871e libc/src/__support/math CMakeLists.txt

[libc][math][NFC] Fix dependency for acospif (#184738)

Delta File
+1 -1 libc/src/__support/math/CMakeLists.txt
+1 -1 1 files

LLVM/project 7418d1b clang/include/clang/Basic DiagnosticDriverKinds.td, clang/include/clang/Driver ToolChain.h

[Clang] Add clang flag --cstdlib (#183254)

Introduce clang flag --cstdlib based on RFC:

https://discourse.llvm.org/t/rfc-add-command-line-option-for-selecting-c-library/87335

This flag accepts a string, i.e. the name of the C library that the user
wants to use.
Toolchain drivers can handle this flag as needed or ignore it.
Delta File
+25 -0 clang/lib/Driver/ToolChain.cpp
+13 -0 clang/include/clang/Driver/ToolChain.h
+5 -0 clang/include/clang/Options/Options.td
+2 -0 clang/include/clang/Basic/DiagnosticDriverKinds.td
+45 -0 4 files

LLVM/project b5c4051 libc/src/__support/math CMakeLists.txt

[libc][math][NFC] Fix dependency for acospif
Delta File
+1 -1 libc/src/__support/math/CMakeLists.txt
+1 -1 1 files

LLVM/project 31c405d offload/plugins-nextgen/common/src PluginInterface.cpp

Fix usage of REPORT
Delta File
+2 -2 offload/plugins-nextgen/common/src/PluginInterface.cpp
+2 -2 1 files