Hi Uros,
Many thanks and merry Christmas. Here's the version as committed,
implemented using your preferred idiom with mode iterators for
movss/movsd. Thanks again.

2022-12-25  Roger Sayle
	    Uroš Bizjak

gcc/ChangeLog
	* config/i386/i386-builtin.def (__builtin_ia32_movss): Update
	CODE_FOR_sse_movss to CODE_FOR_sse_movss_v4sf.
	(__builtin_ia32_movsd): Likewise, update CODE_FOR_sse2_movsd to
	CODE_FOR_sse2_movsd_v2df.
	* config/i386/i386-expand.cc (split_convert_uns_si_sse): Update
	gen_sse_movss call to gen_sse_movss_v4sf, and gen_sse2_movsd call
	to gen_sse2_movsd_v2df.
	(expand_vec_perm_movs): Also allow V4SImode with TARGET_SSE and
	V2DImode with TARGET_SSE2.
	* config/i386/sse.md (avx512fp16_fcmaddcsh_v8hf_mask3): Update
	gen_sse_movss call to gen_sse_movss_v4sf.
	(avx512fp16_fmaddcsh_v8hf_mask3): Likewise.
	(sse_movss_): Renamed from sse_movss using VI4F_128 mode
	iterator to handle both V4SF and V4SI.
	(sse2_movsd_): Likewise, renamed from sse2_movsd using VI8F_128
	mode iterator to handle both V2DF and V2DI.

gcc/testsuite/ChangeLog
	* gcc.target/i386/sse-movss-4.c: New test case.
	* gcc.target/i386/sse2-movsd-3.c: New test case.

Roger
--

> -----Original Message-----
> From: Uros Bizjak
> Sent: 23 December 2022 17:18
> To: Roger Sayle
> Cc: GCC Patches
> Subject: Re: [x86 PATCH] Use movss/movsd to implement V4SI/V2DI VEC_PERM.
>
> On Fri, Dec 23, 2022 at 5:46 PM Roger Sayle wrote:
> >
> >
> > This patch tweaks the x86 backend to use the movss and movsd
> > instructions to perform some vector permutations on integer vectors
> > (V4SI and V2DI) in the same way they are used for floating point
> > vectors (V4SF and V2DF).
> >
> > As a motivating example, consider:
> >
> > typedef unsigned int v4si __attribute__((vector_size(16)));
> > typedef float v4sf __attribute__((vector_size(16)));
> > v4si foo(v4si x,v4si y) { return (v4si){y[0],x[1],x[2],x[3]}; }
> > v4sf bar(v4sf x,v4sf y) { return (v4sf){y[0],x[1],x[2],x[3]}; }
> >
> > which is currently compiled with -O2 to:
> >
> > foo:    movdqa  %xmm0, %xmm2
> >         shufps  $80, %xmm0, %xmm1
> >         movdqa  %xmm1, %xmm0
> >         shufps  $232, %xmm2, %xmm0
> >         ret
> >
> > bar:    movss   %xmm1, %xmm0
> >         ret
> >
> > With this patch, both functions compile to the same form.
> > Likewise for the V2DI case:
> >
> > typedef unsigned long v2di __attribute__((vector_size(16)));
> > typedef double v2df __attribute__((vector_size(16)));
> >
> > v2di foo(v2di x,v2di y) { return (v2di){y[0],x[1]}; }
> > v2df bar(v2df x,v2df y) { return (v2df){y[0],x[1]}; }
> >
> > which currently generates:
> >
> > foo:    shufpd  $2, %xmm0, %xmm1
> >         movdqa  %xmm1, %xmm0
> >         ret
> >
> > bar:    movsd   %xmm1, %xmm0
> >         ret
> >
> > There are two possible approaches to adding integer vector forms of
> > the sse_movss and sse2_movsd instructions. One is to use a mode
> > iterator (VI4F_128 or VI8F_128) on the existing define_insn patterns,
> > but this requires renaming the patterns to sse_movss_, which then
> > requires changes to i386-builtin.def and throughout the backend to
> > reflect the new naming of gen_sse_movss_v4sf. The alternate approach
> > (taken here) is to simply clone and specialize the existing patterns.
> > Uros, if you'd prefer the first approach, I'm happy to make/test/commit
> > those changes.
>
> I would really prefer the variant with VI4F_128/VI8F_128, these two
> iterators were introduced specifically for this case (see e.g.
> sse_shufps_ and sse2_shufpd_. The internal name of the pattern is
> fairly irrelevant, and a trivial search and replace operation can
> replace the grand total of 6 occurrences ...)
>
> Also, changing sse2_movsd to use the VI8F_128 mode iterator would
> enable more alternatives besides movsd, so we give the combine pass
> some more opportunities with memory operands.
>
> So, the patch with those two iterators is pre-approved.
>
> Uros.
>
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures. Ok for mainline?
> >
> > 2022-12-23  Roger Sayle
> >
> > gcc/ChangeLog
> >	* config/i386/i386-expand.cc (expand_vec_perm_movs): Also allow
> >	V4SImode with TARGET_SSE and V2DImode with TARGET_SSE2.
> >	* config/i386/sse.md (sse_movss_v4si): New define_insn, a V4SI
> >	specialization of sse_movss.
> >	(sse2_movsd_v2di): Likewise, a V2DI specialization of sse2_movsd.
> >
> > gcc/testsuite/ChangeLog
> >	* gcc.target/i386/sse-movss-4.c: New test case.
> >	* gcc.target/i386/sse2-movsd-3.c: New test case.
> >
> >
> > Thanks in advance,
> > Roger
> > --
> >