LLVM / project / 42ff31a / [X86] combineTargetShuffle - fold VPERMV3(HI,MASK,LO) -> VPERMV(COMMUTE(MASK),CONCAT(LO,HI)) (#127199)

28 days ago by Simon Pilgrim via GitHub on main

[X86] combineTargetShuffle - fold VPERMV3(HI,MASK,LO) -> VPERMV(COMMUTE(MASK),CONCAT(LO,HI)) (#127199)

We already handle the simpler VPERMV3(LO,MASK,HI) fold, which can reuse
the (widened) mask. This patch attempts to match the flipped concatenation
as well, and commutes the mask to handle the flip.
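
As an illustration of why commuting the mask makes the flipped fold valid,
here is a minimal scalar sketch. It is not the code in X86ISelLowering.cpp:
the helper names (vpermv3, vpermv, concat, commuteMask, foldPreservesResult)
are hypothetical, vectors are modelled as std::vector, and only the low half
of the widened VPERMV result is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;
using Mask = std::vector<unsigned>;

// Two-source permute modelled on VPERMV3(A, M, B): with N elements per
// source, an index below N reads A and an index in [N, 2N) reads B.
Vec vpermv3(const Vec &A, const Mask &M, const Vec &B) {
  const std::size_t N = A.size();
  assert(B.size() == N && M.size() == N && "operands must have equal length");
  Vec R(N);
  for (std::size_t I = 0; I < N; ++I) {
    const std::size_t Idx = M[I] % (2 * N);
    R[I] = Idx < N ? A[Idx] : B[Idx - N];
  }
  return R;
}

// Single-source permute modelled on VPERMV(M, Wide). Wide has twice as many
// elements as M, so this models only the low half of the widened result.
Vec vpermv(const Mask &M, const Vec &Wide) {
  Vec R(M.size());
  for (std::size_t I = 0; I < M.size(); ++I)
    R[I] = Wide[M[I] % Wide.size()];
  return R;
}

// CONCAT(Lo, Hi): Lo occupies the low elements, Hi the high elements.
Vec concat(const Vec &Lo, const Vec &Hi) {
  Vec R(Lo);
  R.insert(R.end(), Hi.begin(), Hi.end());
  return R;
}

// Commute a constant mask: every index is redirected to the other operand.
Mask commuteMask(const Mask &M) {
  const std::size_t N = M.size();
  Mask R(N);
  for (std::size_t I = 0; I < N; ++I) {
    const std::size_t Idx = M[I] % (2 * N);
    R[I] = static_cast<unsigned>(Idx < N ? Idx + N : Idx - N);
  }
  return R;
}

// True if VPERMV3(Hi, M, Lo) equals VPERMV(COMMUTE(M), CONCAT(Lo, Hi)).
bool foldPreservesResult(const Vec &Lo, const Vec &Hi, const Mask &M) {
  return vpermv3(Hi, M, Lo) == vpermv(commuteMask(M), concat(Lo, Hi));
}
```

In VPERMV3(HI,MASK,LO) an index below N selects from HI, but in CONCAT(LO,HI)
the HI elements sit at positions N and above, so each index has to move to
the other half before the single-source permute reads the same element.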

I've limited this to cases where we can extract the constant mask for
commutation. A more general solution would XOR the MSB of the shuffle
mask indices to commute, but this almost never constant folds away after
lowering, so the benefit was minimal.
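
For reference, the XOR-based commute mentioned above could look roughly like
the sketch below. It assumes the element count N is a power of two and that
"MSB" means the operand-select bit (value N) of each in-range index; the
function name commuteMaskByXor is hypothetical and does not exist in LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flip the operand-select bit of every index. For an index in [0, 2N) this
// maps Idx to Idx + N when Idx < N and to Idx - N otherwise, so no decoded
// constant mask is needed.
std::vector<unsigned> commuteMaskByXor(const std::vector<unsigned> &Mask) {
  const unsigned N = static_cast<unsigned>(Mask.size());
  assert(N != 0 && (N & (N - 1)) == 0 && "element count must be a power of two");
  std::vector<unsigned> R(Mask.size());
  for (std::size_t I = 0; I < Mask.size(); ++I)
    R[I] = Mask[I] ^ N;
  return R;
}
```

On a variable mask this XOR would remain as an extra vector operation in the
lowered code, which is the cost the commit message refers to.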

Delta		File
+240	-258	llvm/test/CodeGen/X86/avx512-shuffles/partial_permute.ll
+88	-96	llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-6.ll
+36	-40	llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-3.ll
+20	-26	llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast.ll
+16	-20	llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-7.ll
+21	-4	llvm/lib/Target/X86/X86ISelLowering.cpp
+6	-8	llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast_from_memory.ll
+6	-8	llvm/test/CodeGen/X86/zero_extend_vector_inreg_of_broadcast.ll
+6	-8	llvm/test/CodeGen/X86/zero_extend_vector_inreg_of_broadcast_from_memory.ll
+439	-468	9 files
