References: <20240521071334.1450276-1-haochen.jiang@intel.com>
In-Reply-To: <20240521071334.1450276-1-haochen.jiang@intel.com>
From: Hongtao Liu
Date: Tue, 21 May 2024 15:21:23 +0800
Subject: Re: [PATCH v2] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW
To: Haochen Jiang
Cc: gcc-patches@gcc.gnu.org, hongtao.liu@intel.com, ubizjak@gmail.com

On Tue, May 21, 2024 at 3:14 PM Haochen Jiang wrote:
>
> Hi all,
>
> This is the v2 patch to fix PR115069. The new testcase has passed.
>
> Changes in v2:
>   - Added a testcase.
>   - Change the comment for the early exit.
>
> Thx,
> Haochen
>
> Since vpermq is really slow, we should avoid using it for permutation
> when vpmovwb is not available (needs AVX512BW) for ix86_expand_vecop_qihi2
> and fall back to ix86_expand_vecop_qihi.
>
> gcc/ChangeLog:
>
>         PR target/115069
>         * config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
>         Do not enable the optimization when AVX512BW is not enabled.
>
> gcc/testsuite/ChangeLog:
>
>         PR target/115069
>         * gcc.target/i386/pr115069.c: New.
> ---
>  gcc/config/i386/i386-expand.cc           |  7 +++
>  gcc/testsuite/gcc.target/i386/pr115069.c | 78 ++++++++++++++++++++++++
>  2 files changed, 85 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr115069.c
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index a6132911e6a..f7939761879 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -24323,6 +24323,13 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, rtx op1, rtx op2)
>    bool op2vec = GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT;
>    bool uns_p = code != ASHIFTRT;
>
> +  /* Without VPMOVWB (provided by AVX512BW ISA), the expansion uses the
> +     generic permutation to merge the data back into the right place.  This
> +     permutation results in VPERMQ, which is slow, so better fall back to
> +     ix86_expand_vecop_qihi.  */
> +  if (!TARGET_AVX512BW)
> +    return false;
> +
>    if ((qimode == V16QImode && !TARGET_AVX2)
>        || (qimode == V32QImode && (!TARGET_AVX512BW || !TARGET_EVEX512))
>        /* There are no V64HImode instructions.  */
> diff --git a/gcc/testsuite/gcc.target/i386/pr115069.c b/gcc/testsuite/gcc.target/i386/pr115069.c
> new file mode 100644
> index 00000000000..c4b48b602ef
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr115069.c
> @@ -0,0 +1,78 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx2" } */
> +/* { dg-final { scan-assembler-not "vpermq" } } */
> +
> +#include
> +#include
> +#include
> +#include
> +
> +typedef int8_t stress_vint8_t __attribute__ ((vector_size (16)));

No need for such a big testcase;

typedef char v16qi __attribute__((vector_size(16)));
v16qi
foo (v16qi a, v16qi b)
{
    return a * b;
}

should be enough, with -mavx2 -mno-avx512f.

> +
> +#define OPS(a, b, c, s, v23, v3) \
> +do { \
> +    a += b; \
> +    a |= b; \
> +    a -= b; \
> +    a &= ~b; \
> +    a *= c; \
> +    a = ~a; \
> +    a *= s; \
> +    a ^= c; \
> +    a <<= 1; \
> +    b >>= 1; \
> +    b += c; \
> +    a %= v23; \
> +    c /= v3; \
> +    b = b ^ c; \
> +    c = b ^ c; \
> +    b = b ^ c; \
> +} while (0)
> +
> +volatile uint8_t csum8_put;
> +
> +void stress_vecmath(void)
> +{
> +    const stress_vint8_t v23_8 = {
> +        0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17,
> +        0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17
> +    };
> +    const stress_vint8_t v3_8 = {
> +        0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
> +        0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03
> +    };
> +    stress_vint8_t a8 = {
> +        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
> +        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
> +    };
> +    stress_vint8_t b8 = {
> +        0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef,
> +        0x0f, 0x1e, 0x2d, 0x3c, 0x4b, 0x5a, 0x69, 0x78
> +    };
> +    stress_vint8_t c8 = {
> +        0x01, 0x02, 0x03, 0x02, 0x01, 0x02, 0x03, 0x02,
> +        0x03, 0x02, 0x01, 0x02, 0x03, 0x02, 0x01, 0x02
> +    };
> +    stress_vint8_t s8 = {
> +        0x01, 0x01, 0x01, 0x01, 0x02, 0x02, 0x02, 0x02,
> +        0x01, 0x01, 0x02, 0x02, 0x01, 0x01, 0x02, 0x02,
> +    };
> +    const uint8_t csum8_val = (uint8_t)0x1b;
> +    int i;
> +    uint8_t csum8;
> +
> +    for (i = 1000; i; i--) {
> +        OPS(a8, b8, c8, s8, v23_8, v3_8);
> +        OPS(a8, b8, c8, s8, v23_8, v3_8);
> +        OPS(a8, b8, c8, s8, v23_8, v3_8);
> +        OPS(a8, b8, c8, s8, v23_8, v3_8);
> +        OPS(a8, b8, c8, s8, v23_8, v3_8);
> +        OPS(a8, b8, c8, s8, v23_8, v3_8);
> +    }
> +
> +    csum8 = a8[0] ^ a8[1] ^ a8[2] ^ a8[3] ^
> +            a8[4] ^ a8[5] ^ a8[6] ^ a8[7] ^
> +            a8[8] ^ a8[9] ^ a8[10] ^ a8[11] ^
> +            a8[12] ^ a8[13] ^ a8[14] ^ a8[15];
> +    csum8_put = csum8;
> +}
> --
> 2.31.1
>

-- 
BR,
Hongtao
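
For reference, the reduced testcase suggested in the review could be written as a complete testsuite file roughly as follows. This is only a sketch: it assumes the dg- directives from the v2 patch are kept as they are and that -O2 is retained next to the suggested -mavx2 -mno-avx512f. Neither choice is confirmed in this thread.

```c
/* { dg-do compile } */
/* Assumption: -O2 is carried over from the v2 patch; -mavx2 -mno-avx512f
   are the options suggested in the review.  */
/* { dg-options "-O2 -mavx2 -mno-avx512f" } */
/* { dg-final { scan-assembler-not "vpermq" } } */

typedef char v16qi __attribute__((vector_size(16)));

/* Element-wise multiply of two 16-byte vectors; the review suggests this
   single operation is enough to exercise the change.  */
v16qi
foo (v16qi a, v16qi b)
{
  return a * b;
}
```

Since the file is compile-only, the check is purely on the generated assembly: the test passes when no vpermq appears in the output for foo.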