LLVM/project f016ee5utils/bazel/llvm-project-overlay/mlir BUILD.bazel

[bazel] Add missing dependency for b9ab8885c89b80cdb638aecbd5114672ec4fdb4b
DeltaFile
+1-0utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+1-01 files

LLVM/project 7f39d92clang/lib/AST/ByteCode Interp.h Interp.cpp

[clang][bytecode][NFC] Move some opcode impls to the source file (#177543)

They aren't templated, so move them to Interp.cpp to make the header
file a bit shorter.
DeltaFile
+6-98clang/lib/AST/ByteCode/Interp.h
+96-0clang/lib/AST/ByteCode/Interp.cpp
+102-982 files

LLVM/project 4e51f90mlir/include/mlir/Dialect/SPIRV/IR SPIRVTosaOps.td SPIRVTosaTypes.td, mlir/lib/Dialect/SPIRV/IR SPIRVTosaOps.cpp

[mlir][spirv] Add Conv operations for TOSA Extended Instruction Set (001000.1) (#176908)

This patch expands support for the TOSA Extended Instruction Set
(001000.1) to the SPIR-V dialect in MLIR. The TOSA extended instruction
set provides a standardized set of machine learning operations designed
to be used within `spirv.ARM.Graph` operations (corresponding to
OpGraphARM in SPV_ARM_graph) and typed with `!spirv.arm.tensor<...>`
(corresponding to OpTypeTensorARM in SPV_ARM_tensor).

The change introduces:
* Extended dialect plumbing for import, serialization, and
deserialization of the TOSA extended instruction set.
* The `spirv.Tosa.*Conv*` convolution operations from the TOSA extended
instruction set, each lowering to the corresponding `OpExtInst`.
* Verification enforcing that the new convolution operations appear only
within `spirv.ARM.Graph` regions, operate on `!spirv.arm.tensor<...>`
types, and are well-formed according to the TOSA 001000.1 specification.

All convolution operations from TOSA 001000.1 extended instructions are

    [11 lines not shown]
DeltaFile
+337-0mlir/test/Dialect/SPIRV/IR/tosa-ops-verification.mlir
+287-1mlir/include/mlir/Dialect/SPIRV/IR/SPIRVTosaOps.td
+184-0mlir/test/Target/SPIRV/tosa-ops.mlir
+133-3mlir/lib/Dialect/SPIRV/IR/SPIRVTosaOps.cpp
+104-0mlir/test/Dialect/SPIRV/IR/tosa-ops.mlir
+29-0mlir/include/mlir/Dialect/SPIRV/IR/SPIRVTosaTypes.td
+1,074-44 files not shown
+1,116-610 files

LLVM/project 9ef96bbllvm/test/Transforms/Attributor/IPConstantProp openmp_parallel_for.ll

Attributor: Regenerate baseline test checks
DeltaFile
+12-10llvm/test/Transforms/Attributor/IPConstantProp/openmp_parallel_for.ll
+12-101 files

LLVM/project 580b6dcllvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 clmul-vector-256.ll clmul-vector-512.ll

[X86] Enable custom lowering of 256/512-bit vXi32 and vXi64 CLMUL nodes (#177554)

Similar to 128-bit v4i32/v2i64 support, these can now be efficiently
lowered to PCLMUL nodes through unrolling, shuffle combining and
concatenation.

If the target only supports PCLMUL then they will remain as 128-bit
nodes, but if VPCLMULQDQ is supported then they should merge into wider
types.
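
For readers unfamiliar with the operation, a carry-less multiply treats its operands as polynomials over GF(2), so partial products are combined with XOR rather than addition. The scalar sketch below is a reference model of what one vector lane computes, assuming the node keeps the low 32 bits of the product. The name `clmul32` is hypothetical, and this is not the lowering code in X86ISelLowering.cpp.

```cpp
#include <cstdint>

// Reference model of a 32-bit carry-less multiply (low 32 bits only).
// Each set bit i of B contributes A shifted left by i, and the
// contributions are combined with XOR instead of ADD.
constexpr std::uint32_t clmul32(std::uint32_t A, std::uint32_t B) {
  std::uint32_t Result = 0;
  for (int I = 0; I < 32; ++I)
    if ((B >> I) & 1u)
      Result ^= A << I;
  return Result;
}
```

As a check on the model, `clmul32(3, 3)` is 5, since (x + 1)^2 = x^2 + 1 over GF(2), whereas ordinary multiplication gives 9.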
DeltaFile
+119-1,787llvm/test/CodeGen/X86/clmul-vector-256.ll
+64-381llvm/test/CodeGen/X86/clmul-vector-512.ll
+118-208llvm/test/CodeGen/X86/clmul-vector.ll
+16-0llvm/lib/Target/X86/X86ISelLowering.cpp
+317-2,3764 files

LLVM/project b9ab888mlir/include/mlir/Dialect/Shard/IR ShardOps.td, mlir/include/mlir/Dialect/Shard/Transforms Transforms.h

[mlir][shard,mpi] Lowering shard.allgather to MPI (#177202)

- lowering `shard.allgather` to `mpi.allgather`
- fixing lowering of `shard.allreduce`
- minor refactoring
DeltaFile
+127-77mlir/lib/Conversion/ShardToMPI/ShardToMPI.cpp
+45-14mlir/test/Conversion/ShardToMPI/convert-shard-to-mpi.mlir
+9-8mlir/lib/Dialect/Shard/Transforms/Transforms.cpp
+6-6mlir/include/mlir/Dialect/Shard/Transforms/Transforms.h
+2-2mlir/include/mlir/Dialect/Shard/IR/ShardOps.td
+1-1mlir/lib/Dialect/Linalg/Transforms/ShardingInterfaceImpl.cpp
+190-1086 files

LLVM/project fed07a9llvm/test/CodeGen/AArch64/GlobalISel pr166541.ll

[AArch64][GlobalISel] Commit reproducer for crash #166541 (#177190)

Crash was fixed in #175810, this is just committing the reproducer.
DeltaFile
+28-0llvm/test/CodeGen/AArch64/GlobalISel/pr166541.ll
+28-01 files

LLVM/project cc0b331llvm/lib/Transforms/InstCombine InstCombineMulDivRem.cpp InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine binop-itofp.ll pow-to-ldexp.ll

InstCombine: Use SimplifyDemandedFPClass on fmul

Start trying to use SimplifyDemandedFPClass on instructions, starting
with fmul. This subsumes the old transform on multiply of 0. The
main change is the introduction of nnan/ninf. I do not think anywhere
was systematically trying to introduce fast math flags before, though
a few odd transforms would set them.

Previously we only called SimplifyDemandedFPClass on function returns
with nofpclass annotations. Start following the pattern of
SimplifyDemandedBits, where this will be called from relevant root
instructions.

I was wondering if this should go into InstCombineAggressive, but that
apparently does not make use of InstCombineInternal's worklist.
DeltaFile
+12-12llvm/test/Transforms/InstCombine/binop-itofp.ll
+10-10llvm/test/Transforms/InstCombine/pow-to-ldexp.ll
+9-7llvm/test/Transforms/InstCombine/fsqrtdiv-transform.ll
+2-13llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
+13-1llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+4-4llvm/test/Transforms/InstCombine/pow_fp_int16.ll
+50-4714 files not shown
+79-7820 files

LLVM/project a01e091llvm/lib/Target/ARC ARCRegisterInfo.cpp

[NFC][ARC] Tidy Up RegState in ARC Backend (#177546)

This was missed in llvm/llvm-project#177090 because GitHub CI and my
local build don't have experimental targets enabled.

This is the only problematic RegState use in the experimental targets.
DeltaFile
+1-1llvm/lib/Target/ARC/ARCRegisterInfo.cpp
+1-11 files

LLVM/project adc64c6clang/include/clang/Sema Sema.h, clang/lib/Sema SemaConcept.cpp SemaTemplateInstantiate.cpp

[Clang] Fix the normalization of fold constraints (#177531)

Fold constraints can contain packs expanded from different locations.
For `C<Ps...>`, where the ellipsis immediately follows the argument, the
pack should be expanded in place regardless of the fold expression. For
`C<Ps> && ...`, the fold expression itself is responsible for expanding
Ps.

Previously, both kinds of packs were expanded by the fold expression,
which broke assumptions within concept caching. This patch fixes that by
preserving PackExpansionTypes for the first kind of pack while rewriting
them to non-packs for the second kind.

This patch also removes an unused function and performs some cleanup of
the evaluation contexts. Hopefully it is viable for backporting.

No release note, as this issue was a regression.

Fixes https://github.com/llvm/llvm-project/issues/177245
DeltaFile
+89-53clang/lib/Sema/SemaConcept.cpp
+12-30clang/lib/Sema/SemaTemplateInstantiate.cpp
+16-0clang/test/SemaCXX/cxx2c-fold-exprs.cpp
+1-11clang/include/clang/Sema/Sema.h
+1-1clang/lib/Sema/TreeTransform.h
+119-955 files

LLVM/project 11ba240clang/lib/CodeGen/TargetBuiltins AMDGPU.cpp, clang/test/CodeGenOpenCL builtins-amdgcn-gfx1250-load-monitor.cl

Revert to old name
DeltaFile
+42-42llvm/test/CodeGen/AMDGPU/llvm.amdgcn.load.monitor.gfx1250.ll
+24-24clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250-load-monitor.cl
+21-21clang/test/SemaOpenCL/builtins-amdgcn-error-gfx1250-param.cl
+20-20llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+18-18clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+8-8llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+133-1339 files not shown
+173-17915 files

LLVM/project fe29ae0clang/lib/CodeGen/TargetBuiltins AMDGPU.cpp, clang/lib/Sema SemaAMDGPU.cpp

[AMDGPU][GFX12.5] Reimplement monitor load as an atomic operation

Load monitor operations make more sense as atomic operations, as
non-atomic operations cannot be used for inter-thread communication without
additional synchronization.
The previous built-in made it work because one could just override the CPol
bits, but that bypasses the memory model and forces the user to learn about
ISA bits encoding.

Making load monitor an atomic operation has a couple of advantages. First,
the memory model foundation for it is stronger. We just lean on the existing
rules for atomic operations. Second, the CPol bits are abstracted away from
the user, which avoids leaking ISA details into the API.

This patch also adds supporting memory model and intrinsics documentation to
AMDGPUUsage.

Solves SWDEV-516398.
DeltaFile
+73-53llvm/test/CodeGen/AMDGPU/llvm.amdgcn.load.monitor.gfx1250.ll
+90-18llvm/docs/AMDGPUUsage.rst
+58-28llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+51-24clang/lib/Sema/SemaAMDGPU.cpp
+31-24clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+24-24clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250-load-monitor.cl
+327-17110 files not shown
+450-20816 files

LLVM/project 7119dccclang/lib/CodeGen/TargetBuiltins AMDGPU.cpp, clang/test/CodeGenOpenCL builtins-amdgcn-gfx1250-load-monitor.cl

Revert to old name
DeltaFile
+42-42llvm/test/CodeGen/AMDGPU/llvm.amdgcn.load.monitor.gfx1250.ll
+24-24clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250-load-monitor.cl
+21-21clang/test/SemaOpenCL/builtins-amdgcn-error-gfx1250-param.cl
+20-20llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+18-18clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+8-8llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+133-1339 files not shown
+173-17915 files

LLVM/project a648128clang/bindings/python/clang cindex.py, clang/bindings/python/tests/cindex test_code_completion.py test_enums.py

[libclang/python] Add CompletionChunkKind enum and deprecate old CompletionChunk.Kind (#176631)

This addresses point 1 from
https://github.com/llvm/llvm-project/issues/156680.
Since step 4 is already completed, `CompletionChunk.Kind` becomes unused
in this PR, so it is removed.
DeltaFile
+59-53clang/bindings/python/clang/cindex.py
+42-2clang/bindings/python/tests/cindex/test_code_completion.py
+8-1clang/docs/ReleaseNotes.rst
+2-0clang/bindings/python/tests/cindex/test_enums.py
+111-564 files

LLVM/project e85bbd0llvm/lib/Target/AMDGPU AMDGPUSubtarget.cpp, llvm/test/CodeGen/AMDGPU waves-per-eu-hints-lower-occupancy-target.ll default-flat-work-group-size-overrides-waves-per-eu.ll

Revert "[AMDGPU] Allow amdgpu-waves-per-eu to lower target occupancy range" (#177544)

Reverts llvm/llvm-project#168358

Buildbot failure as commented in original PR.
DeltaFile
+0-84llvm/test/CodeGen/AMDGPU/waves-per-eu-hints-lower-occupancy-target.ll
+61-0llvm/test/CodeGen/AMDGPU/default-flat-work-group-size-overrides-waves-per-eu.ll
+14-20llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
+12-13llvm/test/CodeGen/AMDGPU/propagate-waves-per-eu.ll
+0-12llvm/test/CodeGen/AMDGPU/attr-amdgpu-waves-per-eu.ll
+2-2llvm/test/CodeGen/AMDGPU/vgpr-limit-gfx1250.ll
+89-1311 files not shown
+90-1347 files

LLVM/project f1eb1f2llvm/test/tools/llvm-exegesis/AArch64 debug-gen-asm.s

[llvm-exegesis] Add -mtriple to AArch64 test (#177485)

Similar to https://github.com/llvm/llvm-project/pull/148968

(cherry picked from commit 12b3a9f52e76611b3d3ceb746559d2c384b2565e)
DeltaFile
+2-2llvm/test/tools/llvm-exegesis/AArch64/debug-gen-asm.s
+2-21 files

LLVM/project 2d62af8bolt/test lit.local.cfg, bolt/test/X86 lit.local.cfg

[NFCI][bolt][test] Enable AT&T syntax generally (#172355)

Having it in the X86 subdirectory only affects tests in that directory.
That is not sufficient, however: for example, runtime/X86/pie-exceptions-split.test
is affected but is not located in the X86 directory.
This essentially fixes the fix for the original commit by guarding it properly for when the X86
target has been built and the flag is recognized.

Fixes: 6c48fbc1dcfbd44a47f126f21e575340b67aac06
DeltaFile
+5-0bolt/test/lit.local.cfg
+1-1bolt/test/X86/lit.local.cfg
+6-12 files

LLVM/project 84b2f12llvm/docs ReleaseNotes.md, llvm/include/llvm/MC MCStreamer.h

[MC][X86/M68k] Emit syntax directive for AT&T (#167234)

This eases interoperability by making it explicit in emitted assembly code which syntax is used.
Refactored to remove X86-specific directives and logic from the generic MC(Asm)Streamer.

Motivated by building LLVM with `-mllvm -x86-asm-syntax=intel` (i.e. a global preference for Intel
syntax). A Bolt test (`runtime/X86/fdata-escape-chars.ll`) was using `llc` to compile to assembly
and then assembling with `clang`. The specific option causes Clang to assume Intel syntax but only
for assembly and not inline assembly.
DeltaFile
+6-9llvm/lib/MC/MCAsmStreamer.cpp
+7-0llvm/test/CodeGen/X86/asm-dialect-directive.ll
+5-1llvm/lib/Target/X86/X86AsmPrinter.cpp
+4-0llvm/docs/ReleaseNotes.md
+0-4llvm/lib/Target/M68k/M68kAsmPrinter.cpp
+1-1llvm/include/llvm/MC/MCStreamer.h
+23-153 files not shown
+25-179 files

LLVM/project 7633143llvm/lib/Target/AMDGPU AMDGPUSubtarget.cpp, llvm/test/CodeGen/AMDGPU waves-per-eu-hints-lower-occupancy-target.ll default-flat-work-group-size-overrides-waves-per-eu.ll

Revert "[AMDGPU] Allow amdgpu-waves-per-eu to lower target occupancy range (#…"

This reverts commit 967aeecdaa7db58db4cc896823b0327636c7219c.
DeltaFile
+0-84llvm/test/CodeGen/AMDGPU/waves-per-eu-hints-lower-occupancy-target.ll
+61-0llvm/test/CodeGen/AMDGPU/default-flat-work-group-size-overrides-waves-per-eu.ll
+14-20llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
+12-13llvm/test/CodeGen/AMDGPU/propagate-waves-per-eu.ll
+0-12llvm/test/CodeGen/AMDGPU/attr-amdgpu-waves-per-eu.ll
+2-2llvm/test/CodeGen/AMDGPU/vgpr-limit-gfx1250.ll
+89-1311 files not shown
+90-1347 files

LLVM/project 7184229llvm/include/llvm/CodeGen MachineInstrBuilder.h, llvm/lib/Target/AMDGPU SIInstrInfo.cpp AMDGPUInstructionSelector.cpp

[NFC][MI] Tidy Up RegState enum use (2/2) (#177090)

This change makes `RegState` into an enum class, with bitwise operators.
It also:
- Updates declarations of flag variables/arguments/returns from
`unsigned` to `RegState`.
- Updates empty RegState initializers from 0 to `{}`.

If this is causing problems in downstream code:
- Adopt the `RegState getXXXRegState(bool)` functions instead of using a
ternary operator such as `bool ? RegState::XXX : 0`.
- Adopt the `bool hasRegState(RegState, RegState)` function instead of
using a bitwise check of the flags.
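
The migration advice above can be illustrated with a small self-contained model. Only `RegState`, `hasRegState`, and the `getXXXRegState(bool)` naming pattern come from the commit message; the enumerators, their values, and the exact semantics of `hasRegState` below are assumptions for illustration. The real definitions are in MachineInstrBuilder.h and may differ.

```cpp
#include <cstdint>

// Hypothetical stand-in for the real enum; enumerators and values are
// illustrative only.
enum class RegState : std::uint32_t {
  Define = 1u << 0,
  Kill = 1u << 1,
};

constexpr RegState operator|(RegState A, RegState B) {
  return static_cast<RegState>(static_cast<std::uint32_t>(A) |
                               static_cast<std::uint32_t>(B));
}

constexpr RegState operator&(RegState A, RegState B) {
  return static_cast<RegState>(static_cast<std::uint32_t>(A) &
                               static_cast<std::uint32_t>(B));
}

// Replaces `IsDef ? RegState::Define : 0`: with an enum class, the literal 0
// no longer converts implicitly, so the empty state is spelled `RegState{}`.
constexpr RegState getDefRegState(bool IsDef) {
  return IsDef ? RegState::Define : RegState{};
}

// Replaces a raw `Flags & RegState::Kill` truthiness check. Assumed here to
// mean "all bits of Test are set in Value".
constexpr bool hasRegState(RegState Value, RegState Test) {
  return (Value & Test) == Test;
}
```

With these helpers, downstream code such as `getDefRegState(IsDef) | RegState::Kill` stays strongly typed, and flag queries go through `hasRegState` instead of relying on an implicit conversion to an integer.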
DeltaFile
+40-42llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+29-29llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+27-25llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp
+26-20llvm/include/llvm/CodeGen/MachineInstrBuilder.h
+19-20llvm/lib/Target/Hexagon/HexagonBitSimplify.cpp
+17-17llvm/lib/Target/AVR/AVRISelLowering.cpp
+158-15390 files not shown
+474-45496 files

LLVM/project 4d55fb4llvm/include/llvm/CodeGen TargetLowering.h

[TargetLowering] Avoid unnecessary EVT -> Type -> EVT roundtrip (NFC) (#177328)

For pointers, this gets the pointer EVT, then converts it back into a
type, and then gets the EVT for that type again. We can directly use the
pointer EVT.
DeltaFile
+12-12llvm/include/llvm/CodeGen/TargetLowering.h
+12-121 files

LLVM/project 7e25a31llvm/lib/Target/AMDGPU AMDGPUPrintfRuntimeBinding.cpp, llvm/test/CodeGen/AMDGPU opencl-printf.ll

[AMDGPU] Fix use-after-erase in OpenCL printf runtime binding (#177356)

When handling OpenCL printf calls, the AMDGPU backend replaces the
actual function call with a runtime binding. However, this replacement
currently assumes that there are no uses of the original call value
result. If there are uses, the erasure of the function call leads to
errors.
This patch replaces all uses of the original printf call with a 0 value
constant, signalling success of the printf operation.

---------

Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen at amd.com>
Co-authored-by: Steffen Holst Larsen <HolstLarsen.Steffen at amd.com>
DeltaFile
+91-0llvm/test/CodeGen/AMDGPU/opencl-printf.ll
+6-2llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
+97-22 files

LLVM/project 88dbbe8flang/docs ReleaseNotes.md

flang ReleaseNotes: Mention experimental multi-image feature
DeltaFile
+3-0flang/docs/ReleaseNotes.md
+3-01 files

LLVM/project 8e53101llvm/include/llvm/CodeGen TargetLowering.h, llvm/lib/CodeGen TargetLoweringBase.cpp

DAG: Remove softPromoteHalfType

Remove the now unimplemented target hook and associated DAG machinery
for the old half legalization path.

Really fixes #97975
DeltaFile
+7-22llvm/include/llvm/CodeGen/TargetLowering.h
+0-20llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
+0-11llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+2-7llvm/lib/CodeGen/TargetLoweringBase.cpp
+0-8llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp
+0-2llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+9-701 files not shown
+9-717 files

LLVM/project 074f0b4llvm/lib/Target/AMDGPU R600ISelLowering.cpp R600ISelLowering.h, llvm/test/CodeGen/AMDGPU kernel-args.ll

R600: Remove softPromoteHalfType

Also includes a kind of hacky, minimal change to avoid assertions
when softPromoteHalfType is removed to fix kernel arguments
lowered as f16. Half support was never really implemented
for r600, and there just happened to be a few incidental tests
which included a half argument (which were also not even meaningful,
since the function body just folded to nothing due to no callable
function support).
DeltaFile
+164-0llvm/test/CodeGen/AMDGPU/kernel-args.ll
+3-0llvm/lib/Target/AMDGPU/R600ISelLowering.cpp
+0-2llvm/lib/Target/AMDGPU/R600ISelLowering.h
+167-23 files

LLVM/project 8c1dd3allvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.960bit.ll

AMDGPU: Move softPromoteHalfType override to R600 only

As expected the code is much worse, but more correct.
We could do a better job with source modifier management around
fp16_to_fp/fp_to_fp16.
DeltaFile
+19,051-23,588llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+7,381-11,318llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+6,645-10,108llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+6,103-9,009llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll
+7,004-7,821llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+5,419-8,032llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.768bit.ll
+51,603-69,876116 files not shown
+97,949-126,397122 files

LLVM/project 9708409llvm/lib/Target/AMDGPU/AsmParser AMDGPUAsmParser.cpp, llvm/lib/Target/AMDGPU/Disassembler AMDGPUDisassembler.cpp

[AMDGPU] Fix inline constant encoding for `v_pk_fmac_f16` (#176659)

This PR handles `v_pk_fmac_f16` inline constant encoding/decoding
differences between pre-GFX11 and GFX11+ hardware.

- Pre-GFX11: fp16 inline constants produce `(f16, 0)` - value in low 16
bits, zero in high.
- GFX11+: fp16 inline constants are duplicated to both halves `(f16,
f16)`.
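
The two behaviours listed above can be modelled as the way a 16-bit pattern is widened into the 32-bit packed operand. This is a sketch only: the function name and the boolean parameter are hypothetical and do not correspond to the helpers added in AMDGPUBaseInfo.cpp.

```cpp
#include <cstdint>

// Widen the raw bits of an fp16 inline constant to a 32-bit packed value.
// Pre-GFX11: (f16, 0), value in the low half and zero in the high half.
// GFX11+:    (f16, f16), value duplicated into both halves.
constexpr std::uint32_t expandPkF16InlineConstant(std::uint16_t Bits,
                                                  bool IsGFX11Plus) {
  return IsGFX11Plus
             ? ((static_cast<std::uint32_t>(Bits) << 16) | Bits)
             : static_cast<std::uint32_t>(Bits);
}
```

For example, fp16 1.0 has the bit pattern 0x3C00, so the model yields 0x00003C00 for pre-GFX11 and 0x3C003C00 for GFX11+.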

Fixes #94116.

(cherry picked from commit c253b9f9caf0be95bb16e973f216489d894370e1)
DeltaFile
+37-0llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+20-0llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
+18-1llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+8-0llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+8-0llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
+8-0llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+99-111 files not shown
+133-417 files

LLVM/project e4a114eclang/test/Driver riscv-arch.c, clang/test/Preprocessor riscv-target-features.c

[RISCV]Remove experimental from Zalasr (#177120)

Zalasr 1.0 was ratified in October 2025.

Documentation: https://docs.riscv.org/reference/isa/extensions/zalasr/_attachments/riscv-zalasr.pdf
(cherry picked from commit a43b55edf42b4e70e3b26e493e5997a2f5682fea)
DeltaFile
+17-18llvm/unittests/TargetParser/RISCVISAInfoTest.cpp
+15-9clang/test/Driver/riscv-arch.c
+9-9clang/test/Preprocessor/riscv-target-features.c
+6-6llvm/test/MC/RISCV/rvzalasr-valid.s
+6-6llvm/test/CodeGen/RISCV/attributes.ll
+4-4llvm/test/CodeGen/RISCV/atomic-load-store.ll
+57-5212 files not shown
+81-7818 files

LLVM/project 0db6ae1llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 sve-partial-reduce-dot-product.ll

[AArch64] Fix partial_reduce v16i8 -> v2i32 (#177119)

The lowering doesn't need to check for `ConvertToScalable`, because it
lowers to another `PARTIAL_REDUCE_*MLA` node, which is subsequently
lowered using either fixed-length or scalable types.

This fixes https://github.com/llvm/llvm-project/issues/176954

Re-generate check lines

The check lines for SME were different because of sub-register liveness,
which is enabled for streaming functions on trunk, but isn't enabled on
the release branch.

(cherry picked from commit de997639876db38d20c7ed9fb0c683a239d56bf5)
DeltaFile
+46-0llvm/test/CodeGen/AArch64/sve-partial-reduce-dot-product.ll
+5-5llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+51-52 files

LLVM/project e4ff2fcllvm/lib/MC MCObjectStreamer.cpp

[MC] Explicitly use memcpy in emitBytes() (NFC) (#177187)

We've observed a compile-time regression in LLVM 22 when including large
blobs. The root cause was that emitBytes() was copying bytes one-by-one,
which is much slower than using memcpy for large objects.

Optimization of std::copy to memmove is apparently much less reliable
than one might think. In particular, when using a non-bleeding-edge
libstdc++ (anything older than version 15), this does not happen if the
types of the input and output iterators do not match (like here, where
there is a signed/unsigned mismatch).

As this code is performance sensitive, I think it makes sense to
directly use memcpy.

Previously this code used SmallVector::append, which explicitly uses
memcpy.
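
A minimal sketch of the pattern, assuming a growable byte buffer whose element type differs in signedness from the source (`unsigned char` destination, `char` source). The function `appendBytes` and the use of `std::vector` are illustrative and are not the code in MCObjectStreamer.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Append Size bytes from Data to Out with a single memcpy, rather than
// relying on std::copy being optimized into memmove when the source and
// destination element types differ.
void appendBytes(std::vector<unsigned char> &Out, const char *Data,
                 std::size_t Size) {
  assert(Data != nullptr || Size == 0);
  if (Size == 0)
    return; // Avoid calling memcpy with a possibly null pointer.
  const std::size_t OldSize = Out.size();
  Out.resize(OldSize + Size); // Grow first, then overwrite the new tail.
  std::memcpy(Out.data() + OldSize, Data, Size);
}
```

The copy is valid regardless of the signed/unsigned mismatch because `memcpy` operates on raw bytes, which is what makes it independent of the standard library's iterator-type heuristics.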

(cherry picked from commit 15e421dc643ce4d9d79174fec585cf787e56b1a0)
DeltaFile
+3-1llvm/lib/MC/MCObjectStreamer.cpp
+3-11 files