From: Uros Bizjak
Date: Wed, 8 May 2024 09:07:21 +0200
Subject: Re: [PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]
To: Levy Hsu
Cc: gcc-patches@gcc.gnu.org, liwei.xu@intel.com, crazylht@gmail.com
In-Reply-To: <20240508024205.3623179-1-admin@levyhsu.com>

On Wed, May 8, 2024 at 4:44 AM Levy Hsu wrote:
>
> PR target/107563
>
> gcc/ChangeLog:
>
>	* config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New
>	subroutine.
>	(ix86_expand_vec_perm_const_1): New Entry.
>
> gcc/testsuite/ChangeLog:
>
>	* g++.target/i386/pr107563.C: New test.
> ---
>  gcc/config/i386/i386-expand.cc           | 64 ++++++++++++++++++++++++
>  gcc/testsuite/g++.target/i386/pr107563.C | 23 +++++++++
>  2 files changed, 87 insertions(+)
>  create mode 100755 gcc/testsuite/g++.target/i386/pr107563.C
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 2f27bfb484c..2718b0acb87 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -22362,6 +22362,67 @@ expand_vec_perm_2perm_pblendv (struct expand_vec_perm_d *d, bool two_insn)
>    return true;
>  }
>
> +/* A subroutine of ix86_expand_vec_perm_const_1.
> +   Implement a permutation with psrlw, psllw and por.
> +   It handles case:
> +   __builtin_shufflevector (v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
> +   __builtin_shufflevector (v,v,1,0,3,2,5,4,7,6); */
> +
> +static bool
> +expand_vec_perm_psrlw_psllw_por (struct expand_vec_perm_d *d)
> +{
> +  unsigned i;
> +  rtx (*gen_shr) (rtx, rtx, rtx);
> +  rtx (*gen_shl) (rtx, rtx, rtx);
> +  rtx (*gen_or) (rtx, rtx, rtx);
> +  machine_mode mode = VOIDmode;
> +
> +  if (!TARGET_SSE2 || !d->one_operand_p)
> +    return false;
> +
> +  switch (d->vmode)
> +    {
> +    case E_V8QImode:
> +      if (!TARGET_MMX_WITH_SSE)
> +        return false;
> +      mode = V4HImode;
> +      gen_shr = gen_ashrv4hi3;
> +      gen_shl = gen_ashlv4hi3;
> +      gen_or = gen_iorv4hi3;
> +      break;
> +    case E_V16QImode:
> +      mode = V8HImode;
> +      gen_shr = gen_vlshrv8hi3;
> +      gen_shl = gen_vashlv8hi3;
> +      gen_or = gen_iorv8hi3;
> +      break;
> +    default: return false;
> +    }
> +
> +  if (!rtx_equal_p (d->op0, d->op1))
> +    return false;
> +
> +  for (i = 0; i < d->nelt; i += 2)
> +    if (d->perm[i] != i + 1 || d->perm[i + 1] != i)
> +      return false;
> +
> +  if (d->testing_p)
> +    return true;
> +
> +  rtx tmp1 = gen_reg_rtx (mode);
> +  rtx tmp2 = gen_reg_rtx (mode);
> +  rtx op0 = force_reg (d->vmode, d->op0);
> +
> +  emit_move_insn (tmp1, lowpart_subreg (mode, op0, d->vmode));
> +  emit_move_insn (tmp2, lowpart_subreg (mode, op0, d->vmode));
> +  emit_insn (gen_shr (tmp1, tmp1, GEN_INT (8)));
> +  emit_insn (gen_shl (tmp2, tmp2, GEN_INT (8)));
> +  emit_insn (gen_or (tmp1, tmp1, tmp2));
> +  emit_move_insn (d->target, lowpart_subreg (d->vmode, tmp1, mode));
> +
> +  return true;
> +}
> +
>  /* A subroutine of ix86_expand_vec_perm_const_1. Implement a V4DF
>     permutation using two vperm2f128, followed by a vshufpd insn blending
>     the two vectors together. */
> @@ -23781,6 +23842,9 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
>
>    if (expand_vec_perm_2perm_pblendv (d, false))
>      return true;
> +
> +  if (expand_vec_perm_psrlw_psllw_por (d))
> +    return true;
>
>    /* Try sequences of four instructions. */
>
> diff --git a/gcc/testsuite/g++.target/i386/pr107563.C b/gcc/testsuite/g++.target/i386/pr107563.C
> new file mode 100755
> index 00000000000..5b0c648e8f1
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr107563.C
> @@ -0,0 +1,23 @@
> +/* PR target/107563.C */
> +/* { dg-do compile { target { ! ia32 } } } */

Please split the testcase into two files, one (e.g. pr107563-a.C) testing
8-byte vectors and the other (e.g. pr107563-b.C) using 16-byte vectors.
The latter can also be tested with 32-bit targets.

Uros.

> +/* { dg-options "-std=c++2b -O3 -msse2" } */
> +/* { dg-final { scan-assembler-not "movzbl" } } */
> +/* { dg-final { scan-assembler-not "salq" } } */
> +/* { dg-final { scan-assembler-not "orq" } } */
> +/* { dg-final { scan-assembler-not "punpcklqdq" } } */
> +/* { dg-final { scan-assembler-times "psllw" 2 } } */
> +/* { dg-final { scan-assembler-times "psrlw" 1 } } */
> +/* { dg-final { scan-assembler-times "psraw" 1 } } */
> +/* { dg-final { scan-assembler-times "por" 2 } } */
> +
> +using temp_vec_type [[__gnu__::__vector_size__ (16)]] = char;
> +void foo (temp_vec_type& v) noexcept
> +{
> +  v = __builtin_shufflevector(v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
> +}
> +
> +using temp_vec_type2 [[__gnu__::__vector_size__ (8)]] = char;
> +void foo2 (temp_vec_type2& v) noexcept
> +{
> +  v=__builtin_shufflevector(v,v,1,0,3,2,5,4,7,6);
> +}
> --
> 2.31.1
>
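For reference, the byte pattern accepted by expand_vec_perm_psrlw_psllw_por ({1,0,3,2,...}) swaps each even/odd byte pair. On little-endian x86 that is the same as rotating every 16-bit lane by 8 bits, (w >> 8) | (w << 8), which is what the psrlw/psllw/por sequence computes. The following is a minimal scalar C++ sketch of that equivalence, not GCC code. The helper names is_adjacent_byte_swap, shuffle_ref and swap_via_shifts are hypothetical, and the lane packing assumes little-endian byte order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (illustration only, not GCC code): the same test as
// the loop in expand_vec_perm_psrlw_psllw_por, i.e. every even/odd byte
// pair must be swapped: {1, 0, 3, 2, 5, 4, ...}.
template <std::size_t N>
bool is_adjacent_byte_swap(const std::array<unsigned, N>& perm) {
  static_assert(N % 2 == 0, "need an even number of bytes");
  for (std::size_t i = 0; i < N; i += 2)
    if (perm[i] != i + 1 || perm[i + 1] != i)
      return false;
  return true;
}

// Reference semantics of __builtin_shufflevector(v, v, perm...): indices
// 0..N-1 pick from the first operand and N..2N-1 from the second, which is
// the same vector here, so index % N selects the byte.
template <std::size_t N>
std::array<std::uint8_t, N> shuffle_ref(const std::array<std::uint8_t, N>& v,
                                        const std::array<unsigned, N>& perm) {
  std::array<std::uint8_t, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    assert(perm[i] < 2 * N);  // valid two-operand shuffle index
    out[i] = v[perm[i] % N];
  }
  return out;
}

// Scalar model of psrlw $8 / psllw $8 / por: treat each byte pair as a
// little-endian 16-bit lane w and compute (w >> 8) | (w << 8) using a
// logical (unsigned) right shift.
template <std::size_t N>
std::array<std::uint8_t, N> swap_via_shifts(const std::array<std::uint8_t, N>& v) {
  static_assert(N % 2 == 0, "need an even number of bytes");
  std::array<std::uint8_t, N> out{};
  for (std::size_t i = 0; i < N; i += 2) {
    const auto w = static_cast<std::uint16_t>(v[i] | (v[i + 1] << 8));
    const auto shr = static_cast<std::uint16_t>(w >> 8);    // psrlw $8
    const auto shl = static_cast<std::uint16_t>(w << 8);    // psllw $8
    const auto r = static_cast<std::uint16_t>(shr | shl);   // por
    out[i] = static_cast<std::uint8_t>(r & 0xFF);           // old high byte
    out[i + 1] = static_cast<std::uint8_t>(r >> 8);         // old low byte
  }
  return out;
}
```

The model deliberately uses a logical right shift, as psrlw does. With an arithmetic shift (psraw) the sign bit of the high byte is replicated, so whenever that high byte is 0x80 or above, the upper byte of the OR result becomes 0xFF instead of the original low byte.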