LLVM/project 42ff31a — llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 vector-interleaved-load-i16-stride-6.ll vector-interleaved-load-i32-stride-3.ll
[X86] combineTargetShuffle - fold VPERMV3(HI,MASK,LO) -> VPERMV(COMMUTE(MASK),CONCAT(LO,HI)) (#127199)
We already handle the simpler VPERMV3(LO,MASK,HI) fold, which can reuse
the (widened) mask. This patch also matches the flipped concatenation
and commutes the mask to account for the flip.
I've limited this to cases where we can extract the constant mask for
commutation. A more general solution would XOR the MSB of the shuffle
mask indices to commute, but that XOR almost never constant folds away
after lowering, so the benefit was minimal.
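
To illustrate the fold, here is a minimal standalone sketch of the mask
commutation on plain integer vectors. It is not the code in
X86ISelLowering.cpp: the helper names (permv3, permv, commuteMask,
concat) are hypothetical, every mask index is assumed to be a constant in
[0, 2N), and undef indices and the mask widening step are ignored.

```cpp
#include <vector>

// Two-source permute modelled on VPERMV3(A, MASK, B): an index m < N
// selects a[m]; an index m >= N selects b[m - N]. N is a.size().
std::vector<int> permv3(const std::vector<int> &a,
                        const std::vector<int> &mask,
                        const std::vector<int> &b) {
  const int n = static_cast<int>(a.size());
  std::vector<int> out;
  out.reserve(mask.size());
  for (int m : mask)
    out.push_back(m < n ? a[m] : b[m - n]);
  return out;
}

// Single-source permute modelled on VPERMV(MASK, SRC).
std::vector<int> permv(const std::vector<int> &mask,
                       const std::vector<int> &src) {
  std::vector<int> out;
  out.reserve(mask.size());
  for (int m : mask)
    out.push_back(src[m]);
  return out;
}

// Swap which source each index refers to. For power-of-two N this is the
// same as XORing each index with N, i.e. flipping the source-select bit.
std::vector<int> commuteMask(std::vector<int> mask, int n) {
  for (int &m : mask)
    m = (m < n) ? m + n : m - n;
  return mask;
}

// CONCAT(LO, HI): LO occupies the low half, HI the high half.
std::vector<int> concat(const std::vector<int> &lo,
                        const std::vector<int> &hi) {
  std::vector<int> out(lo);
  out.insert(out.end(), hi.begin(), hi.end());
  return out;
}
```

With lo = {0,1,2,3}, hi = {10,11,12,13} and mask = {0,5,2,7}, the
commuted mask is {4,1,6,3}, and permv3(hi, mask, lo) and
permv(commuteMask(mask, 4), concat(lo, hi)) both produce {10,1,12,3}.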