LLVM/project 13b20e7llvm/lib/Target/AMDGPU SILoadStoreOptimizer.cpp SIFoldOperands.cpp, llvm/test/CodeGen/AMDGPU promote-constOffset-to-imm-gfx12.mir promote-constOffset-to-imm-gfx12.ll

[AMDGPU][SILoadStoreOptimizer] Fix lds address operand offset (#176816)

The offset operand in GLOBAL_LOAD_ASYNC_TO_LDS_B128, for instance, is
added to both the LDS and global addresses, but SILoadStoreOptimizer is
currently unaware of that. This PR inserts an add to counteract the
offset meant for the global address. This one add is better than not
doing the optimization at all and having to insert 2 adds for each
global address calculation (with no offset).
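
The compensation can be sketched with plain integer arithmetic. This is a minimal model, not the pass's code: `AsyncAddrs`, `effectiveAddrs` and `compensateLds` are hypothetical names, and the only behaviour assumed is what the message states, namely that the immediate offset is added to both addresses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the two addresses an async global-to-LDS load uses.
struct AsyncAddrs {
  int64_t Lds;
  int64_t Global;
};

// Assumption taken from the commit message: the immediate offset is applied
// to both the LDS address and the global address.
constexpr AsyncAddrs effectiveAddrs(int64_t LdsReg, int64_t GlobalReg,
                                    int64_t Imm) {
  return {LdsReg + Imm, GlobalReg + Imm};
}

// When a constant is promoted into the immediate for the sake of the global
// address, one extra add on the LDS register cancels it on the LDS side.
constexpr int64_t compensateLds(int64_t LdsReg, int64_t Imm) {
  return LdsReg - Imm;
}
```

With an immediate of -256 and a global base advanced by 512, `effectiveAddrs(compensateLds(Lds, -256), Base + 512, -256)` gives back the original `Lds` and `Base + 256`, which mirrors the constants in the check lines that follow.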

```
; ENABLE-LABEL: name: promote_async_load_offset
; ENABLE: liveins: $ttmp7, $vgpr0, $sgpr0_sgpr1
; ENABLE-NEXT: {{  $}}
; ENABLE-NEXT: renamable $vgpr1 = V_LSHLREV_B32_e32 8, $vgpr0, implicit $exec
; ENABLE-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 $vgpr0, 512, 0, implicit $exec
; ENABLE-NEXT: renamable $vgpr3, dead $sgpr_null = V_ADDC_U32_e64 0, killed $vgpr0, killed $vcc_lo, 0, implicit $exec
; ENABLE-NEXT: renamable $vgpr1 = disjoint V_OR_B32_e32 0, killed $vgpr1, implicit $exec
; ENABLE-NEXT: renamable $vgpr0 = V_ADD_U32_e32 256, $vgpr1, implicit $exec
; ENABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr0, $vgpr2_vgpr3, -256, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)

```

    [18 lines not shown]
DeltaFile
+110-26llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
+111-0llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.mir
+97-0llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.ll
+6-24llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+18-0llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+10-0llvm/lib/Target/AMDGPU/SIInstrInfo.h
+352-506 files

LLVM/project a53daacclang/test/Driver riscv-cpus.c, clang/test/Driver/print-enabled-extensions riscv-spacemit-x100.c

[RISCV] Add Spacemit X100 processor definition (#173988)

SpacemiT X100 is a 4-issue, out-of-order, RVA23 processor

https://www.spacemit.com/en/spacemit-x100-core/
DeltaFile
+83-0clang/test/Driver/print-enabled-extensions/riscv-spacemit-x100.c
+28-0llvm/lib/Target/RISCV/RISCVProcessors.td
+8-0clang/test/Driver/riscv-cpus.c
+2-0clang/test/Misc/target-invalid-cpu-note/riscv.c
+1-0llvm/docs/ReleaseNotes.md
+122-05 files

LLVM/project a9b7b4dclang/lib/AST/ByteCode Interp.h, clang/test/AST/ByteCode floats.cpp

[clang][bytecode] Fix crash caused by overflow of Casting float number to integer (#177815)

Before this PR, evaluation stopped immediately regardless of whether it
was set to handle overflow.

This prevented us from getting the value from the stack, which led to a
crash (with or without assertions).

Closes #177751.
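
The failure mode can be pictured with a toy stack machine. This is only a sketch under stated assumptions: `MiniInterp`, its fields and the placeholder value pushed on overflow are all hypothetical and are not taken from clang's bytecode interpreter.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical miniature interpreter with a value stack.
struct MiniInterp {
  std::vector<int32_t> Stack;
  bool HandleOverflow = false; // assumption: caller opted in to continue
  bool SawOverflow = false;

  // Casts F to int32_t and pushes the result. Returns false to abort.
  bool castFloatToInt(double F) {
    // Range check in floating point; both bounds are exact doubles.
    bool InRange =
        std::isfinite(F) && F > -2147483649.0 && F < 2147483648.0;
    if (InRange) {
      Stack.push_back(static_cast<int32_t>(F)); // truncates toward zero
      return true;
    }
    SawOverflow = true;
    if (!HandleOverflow)
      return false; // abort: nothing is pushed
    // When overflow is being handled, later ops still expect a value on
    // the stack, so push a placeholder (the value 0 is an assumption).
    Stack.push_back(0);
    return true;
  }
};
```

The point of the sketch is the last branch: if the cast returned early in every overflow case, a caller that keeps evaluating would pop from a stack that never received a value.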
DeltaFile
+15-0clang/test/AST/ByteCode/floats.cpp
+6-4clang/lib/AST/ByteCode/Interp.h
+21-42 files

LLVM/project e68eadfclang/lib/AST/ByteCode Compiler.cpp, clang/test/AST/ByteCode complex.cpp

[clang][bytecode] Fix crash on discarded complex comparison (#177731)

Fixes llvm#176902: [clang][bytecode] crashes on ill-formed
_Static_assert comparing complex value

This patch resolves a crash in Clang's constant evaluation when handling
complex number comparisons in discarded expressions, such as those
involving short-circuiting logical operators. The crash occurred due to
unnecessary evaluation of the comparison in the experimental constant
interpreter.

The issue was originally observed and minimized in the following
example:

```cpp
#define EVAL(a, b) _Static_assert(a == b, "")

void foo() {
  EVAL(; + 0, 1i);

```

    [19 lines not shown]
DeltaFile
+13-0clang/test/AST/ByteCode/complex.cpp
+2-1clang/lib/AST/ByteCode/Compiler.cpp
+15-12 files

LLVM/project 772b15bclang/lib/Sema SemaLambda.cpp, clang/test/Modules pr177385.cppm

[C++20] [Modules] Set ManglingContextDecl when we need to mangle a lambda but it's nullptr (#177899)

Close https://github.com/llvm/llvm-project/issues/177385

The root cause of the problem is that when we decide to mangle a lambda
in a module interface while the ManglingContextDecl is nullptr, we didn't
update ManglingContextDecl, so the following use of ManglingContextDecl
sees an invalid value.
DeltaFile
+165-0clang/test/Modules/pr177385.cppm
+12-11clang/lib/Sema/SemaLambda.cpp
+177-112 files

LLVM/project 551b713llvm/test/CodeGen/AArch64 arm64-neon-mul-div.ll, llvm/test/CodeGen/X86 clmul-vector-256.ll

rebase after merging #176870

Created using spr 1.3.8-beta.1
DeltaFile
+52,760-0polly/lib/External/isl/include/isl/typed_cpp.h
+30,864-0polly/lib/External/isl/include/isl/cpp.h
+21,192-0polly/lib/External/isl/include/isl/cpp-checked.h
+19,097-0polly/lib/External/isl/interface/isl.py.core
+1,963-1,787llvm/test/CodeGen/X86/clmul-vector-256.ll
+2,459-1,242llvm/test/CodeGen/AArch64/arm64-neon-mul-div.ll
+128,335-3,0291,002 files not shown
+173,760-18,6741,008 files

LLVM/project ca3b1bbllvm/test/CodeGen/AArch64 arm64-neon-mul-div.ll, llvm/test/CodeGen/X86 clmul-vector-256.ll

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.8-beta.1

[skip ci]
DeltaFile
+52,760-0polly/lib/External/isl/include/isl/typed_cpp.h
+30,864-0polly/lib/External/isl/include/isl/cpp.h
+21,192-0polly/lib/External/isl/include/isl/cpp-checked.h
+19,097-0polly/lib/External/isl/interface/isl.py.core
+1,963-1,787llvm/test/CodeGen/X86/clmul-vector-256.ll
+2,459-1,242llvm/test/CodeGen/AArch64/arm64-neon-mul-div.ll
+128,335-3,0291,002 files not shown
+173,760-18,6741,008 files

LLVM/project fc2230fclang/test/Driver print-supported-extensions-riscv.c, llvm/lib/Target/RISCV RISCVFeatures.td

[RISC-V][MC] Introduce RVY extension feature

This adds the initial feature for the base RVY extension,
other extensions such as the hybrid mode will be added later.
RVY specification: https://riscv.github.io/riscv-cheri/

Co-authored-by: Jessica Clarke <jrtc27 at jrtc27.com>
Co-authored-by: Petr Vesely <petr.vesely at codasip.com>

Pull Request: https://github.com/llvm/llvm-project/pull/176870
DeltaFile
+6-0llvm/lib/Target/RISCV/RISCVFeatures.td
+2-0llvm/test/CodeGen/RISCV/attributes.ll
+1-0llvm/test/CodeGen/RISCV/features-info.ll
+1-0clang/test/Driver/print-supported-extensions-riscv.c
+1-0llvm/unittests/TargetParser/RISCVISAInfoTest.cpp
+11-05 files

LLVM/project 313bec1llvm/test/Analysis/CostModel/X86 vscale-insertelement-crash.ll

Add quotes around "print<cost-model>" argument

This is consistent with other tests and is required for some test runners.
DeltaFile
+1-1llvm/test/Analysis/CostModel/X86/vscale-insertelement-crash.ll
+1-11 files

LLVM/project c563a88llvm/lib/Target/RISCV RISCVInstrInfoC.td RISCVInstrInfo.cpp

[RISC-V] Reduce code duplication for uimm*_lsb* operands. NFC

Use a common tablegen class instead of duplicating all the data and add a
new case macro to handle the isShiftedUInt<>() call. This refactoring was
motivated by adding RVY support since I needed to add uimm{9,10}_lsb0000.
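
As a rough illustration of the check behind these operand names: LLVM's `isShiftedUInt<N, S>()` accepts an N-bit unsigned value shifted left by S bits. The sketch below is a standalone re-implementation, and `isUImmLsbZero` is a hypothetical helper that assumes a `uimm<Bits>_lsb<zeros>` operand is a `Bits`-wide unsigned immediate whose low bits are zero; it is not the macro added by this change.

```cpp
#include <cassert>
#include <cstdint>

// Standalone re-implementation for illustration: true if X is an N-bit
// unsigned value shifted left by S bits (so its low S bits are zero).
template <unsigned N, unsigned S>
constexpr bool isShiftedUIntSketch(uint64_t X) {
  static_assert(N > 0 && N + S < 64, "widths must fit in 64 bits");
  return X < (uint64_t{1} << (N + S)) &&
         (X & ((uint64_t{1} << S) - 1)) == 0;
}

// Hypothetical helper: Bits is the total operand width, ZeroLsbs is the
// number of low bits that must be zero.
template <unsigned Bits, unsigned ZeroLsbs>
constexpr bool isUImmLsbZero(uint64_t X) {
  static_assert(Bits > ZeroLsbs, "need at least one significant bit");
  return isShiftedUIntSketch<Bits - ZeroLsbs, ZeroLsbs>(X);
}
```

Under that reading, a `uimm10_lsb0000` value would be checked as `isUImmLsbZero<10, 4>(Imm)`: `0x3F0` passes, while `0x3F8` (low bits set) and `0x400` (too wide) do not.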

Pull Request: https://github.com/llvm/llvm-project/pull/177743
DeltaFile
+19-46llvm/lib/Target/RISCV/RISCVInstrInfoC.td
+16-26llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+2-26llvm/lib/Target/RISCV/RISCVInstrInfoXwch.td
+1-13llvm/lib/Target/RISCV/RISCVInstrInfoXMips.td
+1-13llvm/lib/Target/RISCV/RISCVInstrInfoZc.td
+39-1245 files

LLVM/project eae7535clang-tools-extra/include-cleaner/lib Record.cpp, clang-tools-extra/test/clang-tidy/checkers/bugprone invalid-enum-default-initialization.cpp

[NFC] Fix "FIMXE" typos to "FIXME" (#177895)

Replace common typo "FIMXE" with the intended "FIXME" across the
codebase.
DeltaFile
+2-2clang-tools-extra/test/clang-tidy/checkers/bugprone/invalid-enum-default-initialization.cpp
+1-1clang/include/clang/AST/RecursiveASTVisitor.h
+1-1clang/lib/Sema/SemaDecl.cpp
+1-1llvm/lib/Target/AVR/AVRInstrInfo.td
+1-1llvm/test/CodeGen/RISCV/GlobalISel/rv32zbkb.ll
+1-1clang-tools-extra/include-cleaner/lib/Record.cpp
+7-76 files

LLVM/project 4c7ced2llvm/lib/Target/RISCV RISCVInstrFormatsV.td RISCVInstrInfoV.td

[RISCV] Use inheritance to simplify RVInstSet*VL* classes. NFC (#177797)

Rename classes to start with RVInstV to make it more clear they are
vector related.
DeltaFile
+9-30llvm/lib/Target/RISCV/RISCVInstrFormatsV.td
+9-9llvm/lib/Target/RISCV/RISCVInstrInfoV.td
+1-7llvm/lib/Target/RISCV/RISCVInstrInfoXSfmm.td
+19-463 files

LLVM/project 55de73dllvm/lib/Target/RISCV RISCVInstrInfoXSfmm.td RISCVInstrInfoXAndes.td

[RISCV] Put VL before VTYPE in XAndes and XSfmm instruction Defs/Uses. NFC (#177877)

This is the order we use for standard vector instructions.
DeltaFile
+7-7llvm/lib/Target/RISCV/RISCVInstrInfoXSfmm.td
+2-2llvm/lib/Target/RISCV/RISCVInstrInfoXAndes.td
+9-92 files

LLVM/project ff13eb7llvm/test/CodeGen/RISCV selectopt.ll

Refactor tests

Created using spr 1.3.6-beta.1
DeltaFile
+47-77llvm/test/CodeGen/RISCV/selectopt.ll
+47-771 files

LLVM/project 2c7cf89llvm/lib/Transforms/Utils UnifyLoopExits.cpp ControlFlowUtils.cpp, llvm/test/Transforms/UnifyLoopExits no-exit-blocks.ll

[llvm][UnifyLoopExits] Avoid optimization if no exit block is found (#165343)

If there is no exit block, we should not try to unify the loop exits.
Instead we should just return.

Fixes #165252
DeltaFile
+15-0llvm/test/Transforms/UnifyLoopExits/no-exit-blocks.ll
+5-0llvm/lib/Transforms/Utils/UnifyLoopExits.cpp
+2-0llvm/lib/Transforms/Utils/ControlFlowUtils.cpp
+22-03 files

LLVM/project c03d0feclang/docs LanguageExtensions.rst, clang/include/clang/Basic OpenCLExtensions.def

[OpenCL] Add clang internal extension __cl_clang_function_scope_local_variables  (#176726)

The OpenCL spec requires that variables in the local address space be
declared only at kernel function scope.
Add a Clang internal extension __cl_clang_function_scope_local_variables
to lift the restriction.

To expose static local allocations at kernel scope, targets can either
force-inline non-kernel functions that declare local memory or pass a
kernel-allocated local buffer to those functions via an implicit argument.

Motivation: support local memory allocation in libclc's implementation
of work-group collective built-ins, see example at:
https://github.com/intel/llvm/blob/41455e305117/libclc/libspirv/lib/amdgcn-amdhsa/group/collectives_helpers.ll
https://github.com/intel/llvm/blob/41455e305117/libclc/libspirv/lib/amdgcn-amdhsa/group/collectives.cl#L182

Right now this is a Clang-only OpenCL extension intended for compiling
OpenCL libraries with Clang. It could be proposed as a standard OpenCL
extension in the future.
DeltaFile
+44-0clang/docs/LanguageExtensions.rst
+22-9clang/test/SemaOpenCL/storageclass.cl
+19-0clang/test/CodeGenOpenCL/local-scope.cl
+11-2clang/lib/Sema/SemaDecl.cpp
+5-0clang/test/SemaOpenCL/extension-version.cl
+1-0clang/include/clang/Basic/OpenCLExtensions.def
+102-116 files

LLVM/project 20c15c7libclc/clc/lib/generic/math clc_remquo.inc clc_remquo.cl

[libclc] replace float remquo with amd ocml implementation (#177131)

The current implementation has two issues:
* It unconditionally soft-flushes denormals.
* It can't pass the OpenCL CTS test "test_bruteforce remquo" on Intel GPUs.

This PR upstreams the remquo implementation from
https://github.com/ROCm/llvm-project/tree/amd-staging/amd/device-libs/ocml/src/remainderF_base.h
It supports denormals and passes the OpenCL CTS test. The number of LLVM IR
instructions in function _Z6remquoffPU3AS5i increased from 96 to 680.
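
For reference, the behaviour being tested can be reproduced on the host with `std::remquo` from `<cmath>`. This is a sketch only: it uses the host C++ library rather than the libclc or OCML code, and `RemQuoResult` and `remquoParts` are hypothetical names. Only the low three bits of the quotient magnitude are compared, since the C standard guarantees no more than that.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical result type: remainder, low 3 bits of |quotient|, and sign.
struct RemQuoResult {
  float Rem;
  int QuoLow3;
  bool QuoNegative;
};

// Wraps std::remquo and keeps only the portable part of the quotient.
inline RemQuoResult remquoParts(float X, float Y) {
  int Quo = 0;
  float Rem = std::remquo(X, Y, &Quo);
  unsigned Mag = Quo < 0 ? 0u - static_cast<unsigned>(Quo)
                         : static_cast<unsigned>(Quo);
  return {Rem, static_cast<int>(Mag & 7u), Quo < 0};
}
```

For example, `remquoParts(10.0f, 3.0f)` has remainder `1.0f` and quotient bits `3`. With `D = std::numeric_limits<float>::denorm_min()`, `remquoParts(5.0f * D, 2.0f * D)` should give remainder `D` and quotient bits `2` (2.5 rounds to even) on a host that does not flush denormals; an implementation that flushes them cannot produce that result.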

---------

Co-authored-by: Copilot <175728472+Copilot at users.noreply.github.com>
DeltaFile
+70-57libclc/clc/lib/generic/math/clc_remquo.inc
+10-1libclc/clc/lib/generic/math/clc_remquo.cl
+80-582 files

LLVM/project cdc6a84llvm/test/CodeGen/ARM vminmax.ll minnum-maxnum-intrinsics.ll, llvm/test/CodeGen/WebAssembly simd-arith.ll f64.ll

TargetLowering: Allow FMINNUM/FMAXNUM to lower to FMINIMUM/FMAXIMUM even without `nsz` (#177828)

This restriction was originally added in
https://reviews.llvm.org/D143256, with the given justification:

> Currently, in TargetLowering, if the target does not support fminnum,
we lower to fminimum if neither operand could be a NaN. But this isn't
quite correct because fminnum and fminimum treat +/-0 differently; so,
we need to prove that one of the operands isn't a zero.

As far as I can tell, this was never correct. Before
https://github.com/llvm/llvm-project/pull/172012, `minnum` and `maxnum`
were nondeterministic with regards to signed zero, so it's always been
perfectly legal to lower them to operations that order signed zeroes.
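
A small host-side sketch of the argument, assuming neither operand is NaN: an operation that orders -0.0 before +0.0 always returns a value that compares equal to whatever `std::fmin` returns, and only the sign bit of a zero result can differ. `fminimumSketch` is a hypothetical stand-in written for this example, not LLVM's lowering.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-in for an FMINIMUM-style operation: NaN-propagating,
// and it orders -0.0 before +0.0.
inline double fminimumSketch(double A, double B) {
  if (std::isnan(A) || std::isnan(B))
    return std::numeric_limits<double>::quiet_NaN();
  if (A == B)                       // equal values, including +0.0 vs -0.0
    return std::signbit(A) ? A : B; // prefer the operand with the sign bit
  return A < B ? A : B;
}
```

For instance, `fminimumSketch(0.0, -0.0)` always has its sign bit set, while the C library leaves the sign of `std::fmin(0.0, -0.0)` up to the implementation; the two still compare equal.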
DeltaFile
+337-176llvm/test/CodeGen/ARM/vminmax.ll
+78-314llvm/test/CodeGen/WebAssembly/simd-arith.ll
+43-112llvm/test/CodeGen/ARM/minnum-maxnum-intrinsics.ll
+14-26llvm/test/CodeGen/WebAssembly/f64.ll
+11-20llvm/test/CodeGen/WebAssembly/f32.ll
+16-12llvm/test/CodeGen/ARM/lower-vmax.ll
+499-6601 files not shown
+503-6687 files

LLVM/project 7b445ddllvm/test/Transforms/LoopVectorize early-exit-load-live-out.ll single_early_exit_unsafe_ptrs.ll, llvm/test/Transforms/LoopVectorize/AArch64 early_exit_cost.ll

[LV] Add additional tests for early-exit loops loads not known deref.

Add additional test coverage for loops with loads that are not known to
be dereferenceable.
DeltaFile
+236-0llvm/test/Transforms/LoopVectorize/early-exit-load-live-out.ll
+132-0llvm/test/Transforms/LoopVectorize/AArch64/early_exit_cost.ll
+38-0llvm/test/Transforms/LoopVectorize/single_early_exit_unsafe_ptrs.ll
+406-03 files

LLVM/project 9b3b643llvm/lib/Transforms/InstCombine InstCombineSelect.cpp, llvm/test/Transforms/InstCombine fcmp-fadd-select.ll minmax-fp.ll

[InstCombine] Don't convert a compare+select into a minnum/maxnum intrinsic that can't be lowered back to a compare+select (#177821)

This is a step on the yak-shaving expedition to properly implement the
new `minnum`/`maxnum` signed-zero semantics.

`InstCombineSelect` will convert a `fcmp`+`select` sequence to a
`minnum`/`maxnum` intrinsic. It doesn't require the `fcmp` to have any
particular fast-math flags, just that the `select` has `nnan` and `nsz`
(or is being used in a context where the result doesn't care about
signed zero).

It's not correct to propagate the `nnan` flag from the `fcmp`
instruction for poison-propagation reasons. Patches like
https://github.com/llvm/llvm-project/pull/117977 and
https://github.com/llvm/llvm-project/pull/141010 have *generously* made
it so that if `fcmp` doesn't have fast-math flags, we can still perform
the transformation by simply dropping the flags on the generated
intrinsic.
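
To see why the `select` needs `nnan` and `nsz` at all, the compare+select pattern can be written as scalar C++. This is an illustrative sketch; `selectMin` is a hypothetical name and the comparison with `std::fmin` is a host-side analogy, not LLVM IR semantics.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// The "fcmp olt" + "select" pattern, written as scalar C++.
inline double selectMin(double A, double B) { return A < B ? A : B; }
```

With a NaN in the second operand, `selectMin(1.0, NaN)` returns NaN whereas `std::fmin(1.0, NaN)` returns `1.0`. With zeros of opposite sign, the comparison is false either way, so the result is simply the second operand: `selectMin(-0.0, 0.0)` is `+0.0` and `selectMin(0.0, -0.0)` is `-0.0`.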


    [25 lines not shown]
DeltaFile
+107-92llvm/test/Transforms/InstCombine/fcmp-fadd-select.ll
+25-25llvm/test/Transforms/InstCombine/minmax-fp.ll
+27-3llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+7-7llvm/test/Transforms/InstCombine/unordered-fcmp-select.ll
+6-6llvm/test/Transforms/InstCombine/fcmp-select.ll
+2-2llvm/test/Transforms/InstCombine/fneg.ll
+174-1356 files

LLVM/project e5d2358polly/include/polly ScopDetectionDiagnostic.h ScopDetection.h, polly/lib/Analysis ScopDetectionDiagnostic.cpp ScopDetection.cpp

[Polly] Reject scalable vector types (#177871)

Polly currently does not consider types without fixed length, which can
be encountered if an input source uses e.g. ARM SVE builtins. Such
programs have already been optimized manually. Non-fixed type lengths
also add to the difficulty of dependency analysis. Skip such types
entirely for now.
 
Fixes: #177859
DeltaFile
+95-0polly/test/ScopDetectionDiagnostics/ReportIncompatibleType.ll
+32-0polly/lib/Analysis/ScopDetectionDiagnostic.cpp
+28-0polly/include/polly/ScopDetectionDiagnostic.h
+17-0polly/lib/Analysis/ScopDetection.cpp
+4-0polly/include/polly/ScopDetection.h
+176-05 files

LLVM/project 14bdd06mlir/lib/Dialect/SCF/Transforms LoopSpecialization.cpp, mlir/lib/Dialect/Utils StaticValueUtils.cpp

[mlir][DialectUtils] Fix 0 step handling in `constantTripCount` (#177329)

A step size of "zero" does not indicate "zero iterations". It may
indicate an infinite number of iterations.

This commit makes some transformations more conservative. We used to
fold away some loops with step size 0 and that's now no longer the case.
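
The intended behaviour can be sketched on plain integers. This is a simplified, hypothetical stand-in: the names `TripCount` and `tripCountSketch` are invented here, the real `constantTripCount` has a different signature, and overflow and negative steps are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Known == false means "not a known finite constant".
struct TripCount {
  bool Known;
  int64_t Value;
};

// Hypothetical integer-only trip count for a loop from Lb to Ub by Step.
inline TripCount tripCountSketch(int64_t Lb, int64_t Ub, int64_t Step) {
  if (Step == 0)
    return {false, 0}; // zero step: possibly infinite, not "zero iterations"
  if (Step < 0)
    return {false, 0}; // negative steps are out of scope for this sketch
  if (Ub <= Lb)
    return {true, 0};
  // ceil((Ub - Lb) / Step); overflow is ignored in this sketch
  return {true, (Ub - Lb + Step - 1) / Step};
}
```

A caller that folds loops away only when `Known && Value == 0` would keep a loop such as `tripCountSketch(0, 10, 0)`, which is the more conservative outcome described above.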

Related discussion:
https://discourse.llvm.org/t/infinite-loops-and-dead-code/89530
DeltaFile
+11-3mlir/lib/Dialect/Utils/StaticValueUtils.cpp
+8-2mlir/test/Dialect/SCF/for-loop-peeling.mlir
+4-3mlir/test/Dialect/SCF/canonicalize.mlir
+3-0mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
+26-84 files

LLVM/project 544c300llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

DAG: Use poison instead of undef in some vector combines (#177612)

Use poison for the unused or out of bounds vector components.
DeltaFile
+48-48llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+48-481 files

LLVM/project 0666a77llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 vec_list_bias-inseltpoison.ll

[SLP]Support for tree throttling in SLP graphs with gathered loads

Gathered loads form a DAG instead of a tree in the SLP vectorizer. When
doing the throttling analysis for such graphs, we need to consider partially
matched gathered-load DAG nodes and account for extract and/or gather
operations and their costs.
The patch adds this analysis and allows cutting off expensive
sub-graphs with gathered loads.

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/177855
DeltaFile
+99-14llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+12-13llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias-inseltpoison.ll
+111-272 files

LLVM/project 73ebadaclang/docs ReleaseNotes.rst, clang/include/clang/Sema Overload.h

[clang] Don't assert on perfect overload match with _Atomic (#176619)

An assertion incorrectly treated a difference in _Atomic qualification
as a difference in type for the purpose of verifying a perfect match in
overload resolution in C++.

Fixes #170433
DeltaFile
+16-0clang/test/SemaCXX/crash-GH170433.cpp
+2-1clang/include/clang/Sema/Overload.h
+1-0clang/docs/ReleaseNotes.rst
+19-13 files

LLVM/project 9d6f011llvm/include/llvm/IR PatternMatch.h, llvm/lib/Transforms/Vectorize VectorCombine.cpp

[VectorCombine] Fold vector.reduce.OP(F(X)) == 0 -> OP(X) == 0 (#173069)

This commit introduces a pattern to do the following fold:

  vector.reduce.OP f(X_i) == 0 -> vector.reduce.OP X_i == 0

In order to decide on this fold, we use the following properties:

  1.  OP X_i == 0 <=> \forall i \in [1, N] X_i == 0
  1'. OP X_i == 0 <=> \exists j \in [1, N] X_j == 0
  2.  f(x) == 0 <=> x == 0

From 1 and 2 (or 1' and 2), we can infer that

  OP f(X_i) == 0 <=> OP X_i == 0.

For some of the OP's and f's, we need to have domain constraints on X to
ensure properties 1 (or 1') and 2.
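
As a concrete instance of properties 1 and 2, take OP = or over 8-bit lanes and, purely as an assumed example, f = wrapping negation (which is zero exactly when its input is zero). Whether the patch handles this particular f is not stated here; the helper names are hypothetical and the sketch only checks the equivalence on given inputs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// OP = or: the reduction is zero iff every lane is zero (property 1).
inline uint8_t reduceOr(const std::vector<uint8_t> &X) {
  uint8_t Acc = 0;
  for (uint8_t V : X)
    Acc |= V;
  return Acc;
}

// Hypothetical f with f(x) == 0 <=> x == 0 (property 2): wrapping negation.
inline uint8_t negate(uint8_t V) { return static_cast<uint8_t>(0u - V); }

// Compares both sides of the fold on one input vector.
inline bool foldAgrees(const std::vector<uint8_t> &X) {
  std::vector<uint8_t> FX;
  for (uint8_t V : X)
    FX.push_back(negate(V));
  return (reduceOr(FX) == 0) == (reduceOr(X) == 0);
}
```

Calling `foldAgrees` on an all-zero vector and on vectors with any non-zero lane returns true in both cases, which is the equivalence the fold relies on.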


    [52 lines not shown]
DeltaFile
+672-0llvm/test/Transforms/VectorCombine/AArch64/icmp-vector-reduce.ll
+672-0llvm/test/Transforms/VectorCombine/X86/icmp-vector-reduce.ll
+183-0llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+9-0llvm/include/llvm/IR/PatternMatch.h
+1,536-04 files

LLVM/project 9eaa1ffclang/test/CodeGen builtin-rotate.c

[clang][test] Fix builtin-rotate.c test __int128 test failure on ARM32 (#177732)

- Run the INT128 prefix checks on 64-bit targets since __int128 is not
supported on ARM32

Fixes https://lab.llvm.org/buildbot/#/builders/154/builds/26813

DeltaFile
+4-3clang/test/CodeGen/builtin-rotate.c
+4-31 files

LLVM/project 029efa6utils/bazel/llvm-project-overlay/mlir BUILD.bazel

[bazel] Add missing dependencies for 778a2491149512109541cd5d59bad2d55024bdb7
DeltaFile
+2-0utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+2-01 files

LLVM/project 13d82f3llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

update comments
DeltaFile
+5-5llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+5-51 files

LLVM/project 4e1d431llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

DAG: Use poison instead of undef in some vector combines

Use poison for the unused or out of bounds vector components.
DeltaFile
+43-43llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+43-431 files