LLVM/project 08866cellvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.512bit.ll, llvm/test/CodeGen/PowerPC vector-popcnt-128-ult-ugt.ll

Merge branch 'main' into users/paschalis-mpeis/nfc-rename-plt-taildup-getnames
DeltaFile
+238,275-0llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+92,826-0llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+65,599-0llvm/test/CodeGen/SPIRV/extensions/SPV_INTEL_long_composites/long-spec-const-composite.ll
+52,868-0llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+48,746-0llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+22,442-22,606llvm/test/CodeGen/PowerPC/vector-popcnt-128-ult-ugt.ll
+520,756-22,60663,811 files not shown
+7,932,640-2,665,80463,817 files

LLVM/project 8615193llvm/lib/Transforms/Vectorize VPlanTransforms.cpp LoopVectorize.cpp

[VPlan] Move getCanonicalIV to VPRegionBlock (NFC). (#163020)

The canonical IV is tied to region blocks; move getCanonicalIV there and
update all users.

PR: https://github.com/llvm/llvm-project/pull/163020
DeltaFile
+33-24llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+12-12llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+13-10llvm/lib/Transforms/Vectorize/VPlan.h
+9-6llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+6-3llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+2-1llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+75-562 files not shown
+78-588 files

LLVM/project 4b89704lldb/source/Plugins/SymbolFile/NativePDB SymbolFileNativePDB.cpp PdbUtil.cpp, lldb/test/Shell/SymbolFile/NativePDB simple-types.cpp local-variables-registers.s

[LLDB][NativePDB] Consolidate simple types (#163209)

This aligns the simple types created by the native plugin with the ones
from DIA as well as LLVM and the original cvdump.

- A few type names weren't handled when creating the LLDB `Type` name
(e.g. `short`)
- 64-bit integers were created as `(u)int64_t` and are now created as
`(unsigned) long long` (matches DIA)
- 128-bit integers (only supported by clang-cl) weren't created as types
(they have `SimpleTypeKind::(U)Int128Oct`)
- All complex types had the same name - now they have `_Complex
<float-type>`

Some types like `SimpleTypeKind::Float48` can't be tested because they
can't be created in C++.
DeltaFile
+49-15lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
+28-3lldb/source/Plugins/SymbolFile/NativePDB/PdbUtil.cpp
+22-7lldb/test/Shell/SymbolFile/NativePDB/simple-types.cpp
+2-2lldb/test/Shell/SymbolFile/NativePDB/local-variables-registers.s
+101-274 files

LLVM/project f29f237llvm/test/Analysis/CostModel/AArch64 sve-cast.ll cast.ll

[Analysis][AArch64][NFC] Change undef to poison in most tests (#163532)

Whenever someone modifies an existing test that has `undef` in it, the
GitHub code formatter complains, so it's not easy to know whether the
complaint is due to a new or an old use. I figured I may as well do a
simple sed replacement of undef with poison in all the tests to clean
them up. Hopefully it makes the contribution process a bit easier.
DeltaFile
+1,938-1,938llvm/test/Analysis/CostModel/AArch64/sve-cast.ll
+962-962llvm/test/Analysis/CostModel/AArch64/cast.ll
+144-144llvm/test/Analysis/CostModel/AArch64/sve-fptoi.ll
+64-64llvm/test/Analysis/CostModel/AArch64/sve-trunc.ll
+42-42llvm/test/Analysis/CostModel/AArch64/sve-fptrunc.ll
+40-40llvm/test/Analysis/CostModel/AArch64/sve-ext.ll
+3,190-3,1902 files not shown
+3,223-3,2238 files

LLVM/project 4ad625bclang/docs UsersManual.rst

[clang][docs] Fix typos in option names (#163482)

DeltaFile
+3-3clang/docs/UsersManual.rst
+3-31 files

LLVM/project fcd7b8dllvm/test/CodeGen/X86/apx cf.ll

[X86] Add baseline test for X86 conditional load/store optimization bug (#163354)

This PR adds a baseline test that exposes a bug in the current
`combineX86CloadCstore` optimization. The generated assembly
demonstrates incorrect behavior when the optimization is applied without
proper constraints.

Without any assumptions about `X`, this transformation is only valid when
`Y` is a nonzero power of two (a single-bit mask).
```cpp
      // res, flags2 = sub 0, (and (xor X, -1), Y)
      // cload/cstore ..., cond_ne, flag2
      // ->
      // res, flags2 = sub 0, (and X, Y)
      // cload/cstore ..., cond_e, flag2
```
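
The single-bit requirement can be checked exhaustively on a small model. The sketch below is not LLVM code: it assumes 8-bit values, and `condBefore`, `condAfter`, `isSingleBit` and `rewriteIsSafeFor` are hypothetical helper names. It models `cond_ne` as "the AND result is nonzero" and `cond_e` as "the AND result is zero".

```cpp
#include <cstdint>

// Before the rewrite: cond_ne on the flags of `sub 0, (and (xor X, -1), Y)`,
// which holds when (~X & Y) != 0.
bool condBefore(uint8_t x, uint8_t y) {
  return (static_cast<uint8_t>(~x) & y) != 0;
}

// After the rewrite: cond_e on the flags of `sub 0, (and X, Y)`,
// which holds when (X & Y) == 0.
bool condAfter(uint8_t x, uint8_t y) { return (x & y) == 0; }

// True when y has exactly one bit set.
bool isSingleBit(uint8_t y) { return y != 0 && (y & (y - 1)) == 0; }

// True when both conditions agree for every possible 8-bit X.
bool rewriteIsSafeFor(uint8_t y) {
  for (unsigned x = 0; x < 256; ++x)
    if (condBefore(static_cast<uint8_t>(x), y) !=
        condAfter(static_cast<uint8_t>(x), y))
      return false;
  return true;
}
```

In this model, `rewriteIsSafeFor(y)` agrees with `isSingleBit(y)` for every `y`: with `Y == 0` the original condition is never taken while the rewritten one always is, and with two or more bits set an `X` containing only one of them makes the two disagree.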

In the provided test case, the value in `%al` is unknown at compile
time. If `%al` contains `0`, the optimization cannot be applied, because

    [2 lines not shown]
DeltaFile
+17-0llvm/test/CodeGen/X86/apx/cf.ll
+17-01 files

LLVM/project 7275256llvm/lib/Target/AArch64 AArch64ISelLowering.cpp AArch64SVEInstrInfo.td

[NFC] Rename AArch64ISD::SRAD_MERGE_OP1 as ASRD_MERGE_OP1.

This aligns with the specific instruction it represents.
DeltaFile
+3-3llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+1-1llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+4-42 files

LLVM/project 17e06aalldb/source/Plugins/Language/CPlusPlus MsvcStlAtomic.cpp

[lldb] Only get child if m_storage and m_element_type is valid (#163077)

Without this check LLDB crashes, because lldb-dap inspects the first
child to decide whether the value is array-like and its children should
be lazily loaded.
DeltaFile
+1-1lldb/source/Plugins/Language/CPlusPlus/MsvcStlAtomic.cpp
+1-11 files

LLVM/project e249c51lldb/test/API/tools/lldb-dap/stackTraceDisassemblyDisplay TestDAP_stackTraceDisassemblyDisplay.py

[lldb-dap][test] create temp source file in test directory. (#163383)

Fixes #163288

---------

Co-authored-by: Jonas Devlieghere <jonas at devlieghere.com>
DeltaFile
+2-1lldb/test/API/tools/lldb-dap/stackTraceDisassemblyDisplay/TestDAP_stackTraceDisassemblyDisplay.py
+2-11 files

LLVM/project 3141bdemlir/test/Dialect/Bufferization/Transforms one-shot-non-module-bufferize.mlir, mlir/test/lib/Dialect/Bufferization TestOneShotModuleBufferize.cpp

[mlir][bufferization] Test tensor encoding -> memref layout conversion (#161166)

Support custom types (4/N): test that it is possible to customize memref
layout specification for custom operations and function boundaries.

This is purely a test setup (no API modifications) to ensure users are
able to pass information from tensors to memrefs within the bufferization
process. To achieve this, a test pass is required (since bufferization
options have to be set manually). As there is already a
--test-one-shot-module-bufferize pass present, it is extended for this
purpose.
DeltaFile
+38-6mlir/test/lib/Dialect/Test/TestOpDefs.cpp
+37-1mlir/test/Dialect/Bufferization/Transforms/one-shot-non-module-bufferize.mlir
+26-0mlir/test/lib/Dialect/Bufferization/TestOneShotModuleBufferize.cpp
+18-0mlir/test/lib/Dialect/Test/TestAttributes.cpp
+17-0mlir/test/lib/Dialect/Test/TestAttrDefs.td
+8-7mlir/test/lib/Dialect/Test/TestOps.td
+144-143 files not shown
+150-159 files

LLVM/project bf64316llvm/include/llvm/CodeGen SelectionDAGNodes.h

[DAG] Fix incorrect doxygen comment for isZeroOrZeroSplat (NFC) (#163527)

DeltaFile
+1-1llvm/include/llvm/CodeGen/SelectionDAGNodes.h
+1-11 files

LLVM/project 3f2aacbmlir/include/mlir/Dialect/Linalg/Utils Utils.h, mlir/lib/Dialect/Linalg/Transforms Vectorization.cpp Transforms.cpp

[mlir][linalg] Update vectorization of linalg.pack

This patch changes `vectorizeAsTensorPackOp` to require users to specify
all write-side vector sizes for `linalg.pack` (not just the outer
dimensions). This makes `linalg.pack` vectorization consistent with
`linalg.unpack` (see #149293 for a similar change).

Conceptually, `linalg.pack` consists of these high-level steps:
  * **Read** from the source tensor using `vector.transfer_read`.
  * **Re-associate** dimensions of the transposed value, as specified by
    the op (via `vector.shape_cast`)
  * **Transpose** the re-associated value according to the permutation
    in the `linalg.pack` op (via `vector.transpose`).
  * **Write** the result into the destination tensor via
    `vector.transfer_write`.

Previously, the vector sizes provided by the user were interpreted as
write-vector-sizes for PackOp _outer_ dims (i.e. the final step above).
These were used to:

    [27 lines not shown]
DeltaFile
+116-98mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+93-5mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
+4-4mlir/lib/Dialect/Linalg/Utils/Utils.cpp
+2-1mlir/include/mlir/Dialect/Linalg/Utils/Utils.h
+2-1mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
+2-0mlir/test/Dialect/Linalg/vectorization/linalg-ops-with-patterns.mlir
+219-1096 files

LLVM/project 3ade12cllvm/lib/Target/AArch64 MachineSMEABIPass.cpp, llvm/test/CodeGen/AArch64 sme-za-function-with-many-blocks.ll sme-za-lazy-save-buffer.ll

[AArch64][SME] Propagate desired ZA states in the MachineSMEABIPass

This patch adds a propagation step to the MachineSMEABIPass that
propagates desired ZA states forwards/backwards (from predecessors to
successors, or vice versa).

The aim of this is to pick better ZA states for edge bundles, as when
many (or all) blocks in a bundle do not have a preferred ZA state, the
ZA state assigned to a bundle can be less than ideal.

An important case is nested loops, where only the inner loop has a
preferred ZA state. Here we'd like to propagate the ZA state up from the
inner loop to the outer loops (to avoid saves/restores in any loop).
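
The nested-loop case can be illustrated with a toy fixed-point propagation. This is only a sketch under assumptions, not the pass's implementation: `ToyZA`, `ToyBlock` and `propagate` are hypothetical names, the CFG is a plain successor list, and conflicts are resolved by taking the first decided neighbour, which the real pass may handle quite differently.

```cpp
#include <cstddef>
#include <vector>

// Toy ZA states; Any means the block has no preference.
enum class ToyZA { Any, Active, LocalSaved };

struct ToyBlock {
  ToyZA desired;                  // preferred state, or Any
  std::vector<std::size_t> succs; // indices of successor blocks
};

// Give each undecided block the state of its first decided successor
// (backwards), or failing that its first decided predecessor (forwards),
// and repeat until nothing changes.
void propagate(std::vector<ToyBlock> &cfg) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t b = 0; b < cfg.size(); ++b) {
      if (cfg[b].desired != ToyZA::Any)
        continue;
      ToyZA picked = ToyZA::Any;
      for (std::size_t s : cfg[b].succs)
        if (picked == ToyZA::Any)
          picked = cfg[s].desired;
      for (std::size_t p = 0; p < cfg.size(); ++p)
        for (std::size_t s : cfg[p].succs)
          if (s == b && picked == ToyZA::Any)
            picked = cfg[p].desired;
      if (picked != ToyZA::Any) {
        cfg[b].desired = picked;
        changed = true;
      }
    }
  }
}
```

With a CFG of entry, outer header, inner loop, outer latch and exit, where only the inner loop asks for `Active`, every block ends up `Active` in this model, so no block in either loop would need a state change.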

Change-Id: I39f9c7d7608e2fa070be2fb88351b4d1d0079041
DeltaFile
+296-0llvm/test/CodeGen/AArch64/sme-za-function-with-many-blocks.ll
+115-17llvm/lib/Target/AArch64/MachineSMEABIPass.cpp
+37-73llvm/test/CodeGen/AArch64/sme-za-lazy-save-buffer.ll
+40-67llvm/test/CodeGen/AArch64/sme-za-control-flow.ll
+5-13llvm/test/CodeGen/AArch64/sme-za-exceptions.ll
+2-5llvm/test/CodeGen/AArch64/sme-agnostic-za.ll
+495-1756 files

LLVM/project 6b6e62allvm/lib/Target/AArch64 MachineSMEABIPass.cpp AArch64TargetMachine.cpp

Pass CodeGenOptLevel to MachineSMEABI pass (part of #149065)

Change-Id: Idef5b1e2a45585f97897fc11c4f237996edb7c8b
DeltaFile
+8-2llvm/lib/Target/AArch64/MachineSMEABIPass.cpp
+3-3llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+1-1llvm/lib/Target/AArch64/AArch64.h
+12-63 files

LLVM/project 8b93f27llvm/lib/Target/AArch64 MachineSMEABIPass.cpp, llvm/test/CodeGen/AArch64 machine-sme-abi-find-insert-pt.mir sme-lazy-sve-nzcv-live.mir

[AArch64][SME] Fixup ABI routine insertion points to avoid clobbering NZCV (#161353)

This updates the `MachineSMEABIPass` to find insertion points for state
changes (i.e., calls to ABI routines) where the NZCV register (status
flags) is not live.

It works by stepping backwards from where the state change is needed
until we find an instruction where NZCV is not live, a previous state
change, or a call sequence. We conservatively don't move into/over
calls, as they may require a different state before the start of the
call sequence.
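
The backwards search can be sketched on a toy instruction list. This is a simplified model, not the pass's code: `ToyInst` and `findInsertPt` are hypothetical, and liveness is assumed to be precomputed per instruction rather than derived from register liveness tracking.

```cpp
#include <cstddef>
#include <vector>

struct ToyInst {
  bool nzcvLiveBefore; // are the flags live immediately before this instruction?
  bool isStateChange;  // an earlier state change
  bool isCallSeq;      // part of a call sequence
};

// Returns the index of the instruction before which the state change would be
// inserted. `needed` must be a valid index into `block`.
std::size_t findInsertPt(const std::vector<ToyInst> &block, std::size_t needed) {
  std::size_t pt = needed;
  while (pt > 0 && block[pt].nzcvLiveBefore) {
    const ToyInst &prev = block[pt - 1];
    if (prev.isStateChange || prev.isCallSeq)
      break; // conservatively never move over these
    --pt;
  }
  return pt;
}
```

If the flags are live before instructions 2 and 3 only, a state change needed at index 3 moves up to index 1 in this model. If instruction 1 is part of a call sequence, the search stops at index 2 instead.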
DeltaFile
+227-0llvm/test/CodeGen/AArch64/machine-sme-abi-find-insert-pt.mir
+86-22llvm/lib/Target/AArch64/MachineSMEABIPass.cpp
+4-8llvm/test/CodeGen/AArch64/sme-lazy-sve-nzcv-live.mir
+1-3llvm/test/CodeGen/AArch64/sme-agnostic-za.ll
+318-334 files

LLVM/project bcf9e91clang/lib/Sema SemaTypeTraits.cpp, clang/test/SemaCXX type-traits.cpp

[Clang] Fix a regression introduced by #161163. (#162612)

Classes with a user-provided constructor are still implicit-lifetime if
they have an implicit, trivial copy constructor.
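
A small illustration of the kind of class in question, using only standard type traits rather than the Clang builtin the patch touches. `Widget` and `Tracked` are hypothetical examples. The traits below check the ingredients (a trivial copy constructor and a trivial destructor), not implicit-lifetime status itself.

```cpp
#include <type_traits>

// Has a user-provided constructor, so it is not an aggregate...
struct Widget {
  explicit Widget(int v) : value(v) {}
  int value;
};

// ...but its implicitly declared copy constructor and destructor are trivial.
static_assert(std::is_trivially_copy_constructible<Widget>::value,
              "implicit copy constructor is trivial");
static_assert(std::is_trivially_destructible<Widget>::value,
              "implicit destructor is trivial");

// Counter-example: a user-provided copy constructor is not trivial.
struct Tracked {
  explicit Tracked(int v) : value(v) {}
  Tracked(const Tracked &other) : value(other.value) {}
  int value;
};
static_assert(!std::is_trivially_copy_constructible<Tracked>::value,
              "user-provided copy constructor is not trivial");
```

`Widget` is the shape of class the fix is about; `Tracked` has no trivial constructor left, so the same reasoning does not apply to it.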
DeltaFile
+40-1clang/test/SemaCXX/type-traits.cpp
+20-9clang/lib/Sema/SemaTypeTraits.cpp
+60-102 files

LLVM/project 8395a36clang/lib/Headers avx512vlcdintrin.h avx512cdintrin.h, clang/test/CodeGen/X86 avx512cd-builtins.c avx512vlcd-builtins.c

[Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - allow AVX512 mask broadcast intrinsics to be used in constexpr (#163475)

Fix #161334
DeltaFile
+8-13clang/lib/Headers/avx512vlcdintrin.h
+5-8clang/lib/Headers/avx512cdintrin.h
+4-0clang/test/CodeGen/X86/avx512cd-builtins.c
+4-0clang/test/CodeGen/X86/avx512vlcd-builtins.c
+21-214 files

LLVM/project fbd6ac3mlir/include/mlir/Dialect/Shard/IR ShardOps.td

[MLIR][shard] Fix tblgen description of `shard.neighbors_linear_indices` (#163409)

This PR fixed an issue where inline code blocks in the ODS description of `shard.neighbors_linear_indices` were not properly closed.
DeltaFile
+1-1mlir/include/mlir/Dialect/Shard/IR/ShardOps.td
+1-11 files

LLVM/project bd9bc53lld/COFF DLL.cpp, lld/test/COFF arm64x-delayimport.test arm64-delayimport.yaml

[LLD] [COFF] Fix aarch64 delayimport of sret arguments (#163096)

For sret arguments on aarch64, the x8 register is used as an input
parameter to functions, even though x8 normally isn't an input parameter
register.

When delayloading a DLL, the first call of a delayloaded function ends
up calling a helper which resolves the function. Therefore, any input
arguments to the actual function to be called need to be backed up and
restored - this also includes x8.

This matches how MS link.exe also changed its delayloading trampoline,
between MSVC 2019 16.7 and 16.8 (between link.exe 14.27.29110.0 and
14.28.29333.0).

This fixes running LLDB on aarch64 mingw, after
ec28b95b7491bc2fbb6ec66cdbfd939e71255c42 and
93d326038959fd87fb666a8bf97d774d0abb3591. Those commits make LLDB load
liblldb.dll with delayloading, and the first function to be called,

    [4 lines not shown]
DeltaFile
+48-44lld/test/COFF/arm64x-delayimport.test
+24-22lld/test/COFF/arm64-delayimport.yaml
+16-14lld/COFF/DLL.cpp
+88-803 files

LLVM/project 7b785dclld/COFF DLL.cpp, lld/test/COFF arm64x-delayimport.test arm64-delayimport.yaml

[LLD][COFF] Fix tailMergeARM64 delayload thunk 128 MB range limitation (#161844)

lld would fail with "error: relocation out of range" if the thunk was
laid out more than 128 MB away from __delayLoadHelper2.

This patch changes the call sequence to load the offset into a register
and call through that, allowing for 32-bit offsets.
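
The 128 MB figure follows from the AArch64 `B`/`BL` encoding, which holds a signed 26-bit offset counted in 4-byte words. A minimal range check under that assumption is sketched below. `fitsInBranch26` and the constants are hypothetical names, not lld functions, and the sketch does not model the new register-based sequence.

```cpp
#include <cstdint>

// A signed 26-bit word offset covers byte offsets in [-2^27, 2^27 - 4].
constexpr int64_t kBranch26Min = -(int64_t{1} << 27);
constexpr int64_t kBranch26Max = (int64_t{1} << 27) - 4;

// True when a direct branch at address `from` can reach address `to`.
bool fitsInBranch26(uint64_t from, uint64_t to) {
  const int64_t off = static_cast<int64_t>(to - from);
  return off % 4 == 0 && off >= kBranch26Min && off <= kBranch26Max;
}
```

A thunk placed exactly 128 MiB below its target fails this check, which corresponds to the case where a direct branch cannot be used.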

Fixes #161812

(cherry picked from commit 69b8d6d4ead01b88fb8d6642914ca7492e32fdb6)
DeltaFile
+28-24lld/test/COFF/arm64x-delayimport.test
+14-12lld/test/COFF/arm64-delayimport.yaml
+7-3lld/COFF/DLL.cpp
+49-393 files

LLVM/project 4145818clang/include/clang/Basic arm_sve.td, clang/test/CodeGen/AArch64/sme2-intrinsics acle_sme2_bfmul.c acle_sme2_bfscale.c

Revert "[AArch64] Add intrinsics for multi-vector FEAT_SVE_BFSCALE instructions" (#163535)

Reverts llvm/llvm-project#163346
DeltaFile
+0-76clang/test/CodeGen/AArch64/sme2-intrinsics/acle_sme2_bfmul.c
+0-76clang/test/CodeGen/AArch64/sme2-intrinsics/acle_sme2_bfscale.c
+0-56llvm/test/CodeGen/AArch64/sme2-intrinsics-bfscale.ll
+0-56llvm/test/CodeGen/AArch64/sme2-intrinsics-bfmul.ll
+0-24llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+0-7clang/include/clang/Basic/arm_sve.td
+0-2951 files not shown
+2-2977 files

LLVM/project 8a09111clang/include/clang/Basic arm_sve.td, clang/test/CodeGen/AArch64/sme2-intrinsics acle_sme2_bfmul.c acle_sme2_bfscale.c

[AArch64] Add intrinsics for multi-vector FEAT_SVE_BFSCALE instructions (#163346)

This patch adds intrinsic support for the multi-vector BFMUL and BFSCALE
instructions, based on
[this](https://github.com/ARM-software/acle/pull/410) ACLE specification
proposal.
DeltaFile
+76-0clang/test/CodeGen/AArch64/sme2-intrinsics/acle_sme2_bfmul.c
+76-0clang/test/CodeGen/AArch64/sme2-intrinsics/acle_sme2_bfscale.c
+56-0llvm/test/CodeGen/AArch64/sme2-intrinsics-bfmul.ll
+56-0llvm/test/CodeGen/AArch64/sme2-intrinsics-bfscale.ll
+24-0llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+7-0clang/include/clang/Basic/arm_sve.td
+295-01 files not shown
+297-27 files

LLVM/project 140d465clang/lib/AST ExprConstant.cpp, clang/lib/AST/ByteCode InterpBuiltin.cpp

[X86][ByteCode] Allow PSHUFB intrinsics to be used in constexpr #156612 (#163148)

The PSHUFB instruction shuffles bytes within each 128-bit lane: for each
control byte, if bit 7 is set, the output byte is zeroed; otherwise, the
low 4 bits select a source byte (0–15) from the same lane.
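
The per-lane rule described above can be written as a scalar reference model. This is a sketch for illustration only, not the code added to `ExprConstant.cpp` or `InterpBuiltin.cpp`; `Lane` and `pshufbLane` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Lane = std::array<uint8_t, 16>; // one 128-bit lane

// For each control byte: bit 7 set zeroes the output byte, otherwise the low
// 4 bits select a source byte from the same lane.
Lane pshufbLane(const Lane &src, const Lane &ctrl) {
  Lane out{};
  for (std::size_t i = 0; i < 16; ++i)
    out[i] = (ctrl[i] & 0x80) ? uint8_t{0} : src[ctrl[i] & 0x0F];
  return out;
}
```

Wider vectors would apply the same function to each 128-bit lane independently, so a control byte never selects across lanes.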

Note: _mm_shuffle_pi8 function had to change as __anyext128 had negative
indices which are invalid in constant expression context.

Fixes #156612
DeltaFile
+47-0clang/lib/AST/ExprConstant.cpp
+33-0clang/lib/AST/ByteCode/InterpBuiltin.cpp
+8-12clang/lib/Headers/avx512vlbwintrin.h
+9-11clang/lib/Headers/tmmintrin.h
+6-9clang/lib/Headers/avx512bwintrin.h
+13-0clang/test/CodeGen/X86/avx512vlbw-builtins.c
+116-326 files not shown
+138-3812 files

LLVM/project 9734aa8clang/lib/AST DeclBase.cpp

[NFC] [clang] Add comments for a defect

See the patch for details.

I tried to fix a defect left over from previous refactorings, but it
turned out to be more complex than expected. Add a comment that
describes it more clearly.
DeltaFile
+24-2clang/lib/AST/DeclBase.cpp
+24-21 files

LLVM/project 7f7f249llvm/lib/Target/AArch64 AArch64MachineFunctionInfo.cpp AArch64PostCoalescerPass.cpp, llvm/test/CodeGen/AArch64 aarch64-post-coalescer.mir mir-yaml-has-streaming-mode-changes.ll

[AArch64PostCoalescer] Propagate undef flag after replacing (#163119)

I encountered a compiler crash that, after analysis, turned out to be
caused by the AArch64PostCoalescerPass; see https://godbolt.org/z/vPeqeo5Pa.
When replacing a register, if the source register has the undef flag, we
should propagate the flag to all uses of the destination register.
DeltaFile
+16-0llvm/test/CodeGen/AArch64/aarch64-post-coalescer.mir
+13-0llvm/test/CodeGen/AArch64/mir-yaml-has-streaming-mode-changes.ll
+7-1llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.cpp
+4-0llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp
+2-0llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
+42-15 files

LLVM/project cd24d10openmp/runtime/src kmp_alloc.cpp kmp_affinity.cpp

[OpenMP] Fix preprocessor mismatches between include and usages of hwloc (#158349)

Fix https://github.com/llvm/llvm-project/issues/156679

There is a mismatch between the preprocessor guards around the include
of `hwloc.h` and those protecting its usages, leading to build failures
on Darwin: https://github.com/spack/spack-packages/pull/1212

This change introduces `KMP_HWLOC_ENABLED` that reflects
whether hwloc is actually used.
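
The general pattern of one macro guarding both the include and every use can be sketched as follows. The names here are hypothetical: `DEMO_USE_HWLOC` stands in for a build-system switch and `DEMO_HWLOC_ENABLED` for the single source of truth. The actual conditions behind `KMP_HWLOC_ENABLED` are not shown in this log.

```cpp
// One macro decides whether hwloc is in use at all.
#if defined(DEMO_USE_HWLOC)
#define DEMO_HWLOC_ENABLED 1
#else
#define DEMO_HWLOC_ENABLED 0
#endif

// The include is guarded by that macro...
#if DEMO_HWLOC_ENABLED
#include <hwloc.h>
#endif

// ...and so is every use, so the two can never disagree.
unsigned demoTopologyDepth() {
#if DEMO_HWLOC_ENABLED
  hwloc_topology_t topo;
  hwloc_topology_init(&topo);
  hwloc_topology_load(topo);
  const unsigned depth = static_cast<unsigned>(hwloc_topology_get_depth(topo));
  hwloc_topology_destroy(topo);
  return depth;
#else
  return 0; // fallback when hwloc is not in use
#endif
}
```

Built without `DEMO_USE_HWLOC`, the header is never included and `demoTopologyDepth()` takes the fallback path.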
DeltaFile
+15-15openmp/runtime/src/kmp_alloc.cpp
+12-12openmp/runtime/src/kmp_affinity.cpp
+13-10openmp/runtime/src/kmp.h
+10-10openmp/runtime/src/kmp_settings.cpp
+3-3openmp/runtime/src/kmp_affinity.h
+2-2openmp/runtime/src/kmp_global.cpp
+55-521 files not shown
+57-547 files

LLVM/project c9b07f3lldb/source/Plugins/Process/Utility RegisterContextFreeBSD_x86_64.cpp

[LLDB, FreeBSD, x86] Fix empty register set when trying to get size of register (#162890)

The register set information is stored as a singleton in
GetRegisterInfo_i386. However, other functions later access this
information assuming it is stored in GetSharedRegisterInfoVector. To
resolve this inconsistency, we remove the original construction logic
and instead initialize the singleton using llvm::call_once within the
appropriate function (GetSharedRegisterInfoVector_i386).
DeltaFile
+20-24lldb/source/Plugins/Process/Utility/RegisterContextFreeBSD_x86_64.cpp
+20-241 files

LLVM/project 0a71fd1mlir/lib/Dialect/Vector/Transforms VectorDistribute.cpp, mlir/test/Dialect/Vector vector-warp-distribute.mlir

[MLIR][Vector] Improve warp distribution robustness (#161647)

DeltaFile
+22-40mlir/lib/Dialect/Vector/Transforms/VectorDistribute.cpp
+19-0mlir/test/Dialect/Vector/vector-warp-distribute.mlir
+41-402 files

LLVM/project 0e6557dclang/test/CodeGen attr-target-mv.c, clang/test/Driver x86-march.c

[X86] Add support for Wildcat Lake (#163214)

Add support for Wildcat Lake, per Intel Architecture Instruction Set
Extensions Programming Reference rev. 59
(https://cdrdv2.intel.com/v1/dl/getContent/671368)
DeltaFile
+7-0llvm/lib/TargetParser/Host.cpp
+7-0compiler-rt/lib/builtins/cpu_model/x86.c
+6-0clang/test/Preprocessor/predefined-arch-macros.c
+5-0clang/test/CodeGen/attr-target-mv.c
+3-2llvm/lib/Target/X86/X86.td
+4-0clang/test/Driver/x86-march.c
+32-210 files not shown
+48-216 files

LLVM/project 4f2c867llvm/test/Transforms/LoopVectorize/AArch64 partial-reduce.ll partial-reduce-dot-product.ll

[LV][NFC] Fix "cpu" attribute in some partial-reduce*.ll tests (#163518)

DeltaFile
+80-63llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce.ll
+1-1llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product.ll
+81-642 files