From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yw1-x112c.google.com (mail-yw1-x112c.google.com [IPv6:2607:f8b0:4864:20::112c]) by sourceware.org (Postfix) with ESMTPS id 0EAA63858D35 for ; Sun, 25 Jun 2023 04:42:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0EAA63858D35 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-yw1-x112c.google.com with SMTP id 00721157ae682-56ff9cc91b4so22712837b3.0 for ; Sat, 24 Jun 2023 21:42:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1687668175; x=1690260175; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=ySSPv9+v6DQvF/SH3wo1GZkd5QL9YlnxIl7qLunEaZg=; b=Jjcnu3YDg2kcVGvjERYrlhW0v+lCClfhwwlWyYhH5S/24i/Pp12hq0M2pScVNl6IHf R7pJcgvm9di+qWyjq4LrlAF4KJ+4iBcs3F8Z/6yUJhFiZko+eJENq5WXS/3gJE2ufWHt u0Lq/+QaUnAjoz2rcLmfXCL7UEowf2egQCLCcJ8c0t4Yhc0uSve7/4VL3hjSJtKJ0OGm nHmYOIvULYkNe4milXw63P+TxtWvV0WuY76/zwW2TQyfpY46GpR/bBE2vIR5b+y2IE7I ZZ4FXRHm017tXJJTRBndO4W2m0CvNIy/mNz1iwcKcm7tp1YmeC0k3+GRRdiJestM6/S8 XSyw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687668175; x=1690260175; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ySSPv9+v6DQvF/SH3wo1GZkd5QL9YlnxIl7qLunEaZg=; b=jjQTbt2iE1LSIisdlTsL9/u9nhhnMmRSLQRW2sVUOis/NBGqLsD4BCHHoggGSoP/be ifCKlh1aTX+AWRMsnX4Gu2TOraXimeR2/nuMk2l+dd814Znzya34p1Cxp6kSfzVuDp+R 3cp4tPHpB6kK5kzGpv1MW6yg+yx4BptKb0CDfMeOEV7033H1eepwzSIrZoqurc+PERn3 84rC+EKHbw1XoElApSMZlu3i94NMo9JG4N/PPlzctuq2fdGwvdNBdzAmdqy8USdD1e2Z 5XfHfs1eaMpnIBLES2zUcFtn3k9BbPhd5vc6vISMItV81F24dNs+yVUBuGqtu7SOcl5m s0kA== X-Gm-Message-State: AC+VfDw7hwcjqxGstvmR+O5riiIl6sbgNG6BdqbkDWtd7RnhzqjSE+GZ EPYZWkn4++DbAxCMvYcgw64BOQQ0hSYK8y8PhAk= X-Google-Smtp-Source: ACHHUZ5oOrsGdXM1h7q1Out1n1zSiZx3a3rVIyTywZxAYz/Lf50CPA3a4vqPTrIQqp/tB14207erHQefTvkHgfWEGEs= X-Received: by 2002:a0d:f945:0:b0:576:a0b8:eb06 with SMTP id j66-20020a0df945000000b00576a0b8eb06mr2928667ywf.52.1687668175202; Sat, 24 Jun 2023 21:42:55 -0700 (PDT) MIME-Version: 1.0 References: <04f99abe-a563-d093-23b7-4abf0f91633d@suse.com> <457ffad0-9ecd-3e19-f5ab-6153ce4b8bad@suse.com> In-Reply-To: <457ffad0-9ecd-3e19-f5ab-6153ce4b8bad@suse.com> From: Hongtao Liu Date: Sun, 25 Jun 2023 12:42:43 +0800 Message-ID: Subject: Re: [PATCH 1/5] x86: use VPTERNLOG for further bitwise two-vector operations To: Jan Beulich Cc: "gcc-patches@gcc.gnu.org" , Hongtao Liu , Kirill Yukhin Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Wed, Jun 21, 2023 at 2:26=E2=80=AFPM Jan Beulich via Gcc-patches wrote: > > All combinations of and, ior, xor, and not involving two operands can be > expressed that way in a single insn. > > gcc/ > > PR target/93768 > * config/i386/i386.cc (ix86_rtx_costs): Further special-case > bitwise vector operations. > * config/i386/sse.md (*iornot3): New insn. > (*xnor3): Likewise. > (*3): Likewise. > (andor): New code iterator. > (nlogic): New code attribute. > (ternlog_nlogic): Likewise. > > gcc/testsuite/ > > PR target/93768 > gcc.target/i386/avx512-binop-not-1.h: New. > gcc.target/i386/avx512-binop-not-2.h: New. > gcc.target/i386/avx512f-orn-si-zmm-1.c: New test. > gcc.target/i386/avx512f-orn-si-zmm-2.c: New test. > --- > The use of VI matches that in e.g. one_cmpl2 / > one_cmpl2 and *andnot3, despite > (here and there) > - V64QI and V32HI being needlessly excluded when AVX512BW isn't enabled, > - VTI not being covered, > - vector modes more narrow than 16 bytes not being covered. > > --- a/gcc/config/i386/i386.cc > +++ b/gcc/config/i386/i386.cc > @@ -21178,6 +21178,32 @@ ix86_rtx_costs (rtx x, machine_mode mode > return false; > > case IOR: > + if (GET_MODE_CLASS (mode) =3D=3D MODE_VECTOR_INT) > + { > + /* (ior (not ...) ...) can be a single insn in AVX512. */ > + if (GET_CODE (XEXP (x, 0)) =3D=3D NOT && TARGET_AVX512F > + && (GET_MODE_SIZE (mode) =3D=3D 64 > + || (TARGET_AVX512VL > + && (GET_MODE_SIZE (mode) =3D=3D 32 > + || GET_MODE_SIZE (mode) =3D=3D 16)))) > + { > + rtx right =3D GET_CODE (XEXP (x, 1)) !=3D NOT > + ? XEXP (x, 1) : XEXP (XEXP (x, 1), 0); > + > + *total =3D ix86_vec_cost (mode, cost->sse_op) > + + rtx_cost (XEXP (XEXP (x, 0), 0), mode, > + outer_code, opno, speed) > + + rtx_cost (right, mode, outer_code, opno, speed); > + return true; > + } > + *total =3D ix86_vec_cost (mode, cost->sse_op); > + } > + else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD) > + *total =3D cost->add * 2; > + else > + *total =3D cost->add; > + return false; > + > case XOR: > if (GET_MODE_CLASS (mode) =3D=3D MODE_VECTOR_INT) > *total =3D ix86_vec_cost (mode, cost->sse_op); > @@ -21198,11 +21224,20 @@ ix86_rtx_costs (rtx x, machine_mode mode > /* pandn is a single instruction. */ > if (GET_CODE (XEXP (x, 0)) =3D=3D NOT) > { > + rtx right =3D XEXP (x, 1); > + > + /* (and (not ...) (not ...)) can be a single insn in AVX512= . */ > + if (GET_CODE (right) =3D=3D NOT && TARGET_AVX512F > + && (GET_MODE_SIZE (mode) =3D=3D 64 > + || (TARGET_AVX512VL > + && (GET_MODE_SIZE (mode) =3D=3D 32 > + || GET_MODE_SIZE (mode) =3D=3D 16)))) > + right =3D XEXP (right, 0); > + > *total =3D ix86_vec_cost (mode, cost->sse_op) > + rtx_cost (XEXP (XEXP (x, 0), 0), mode, > outer_code, opno, speed) > - + rtx_cost (XEXP (x, 1), mode, > - outer_code, opno, speed); > + + rtx_cost (right, mode, outer_code, opno, speed); > return true; > } > else if (GET_CODE (XEXP (x, 1)) =3D=3D NOT) > @@ -21260,8 +21295,25 @@ ix86_rtx_costs (rtx x, machine_mode mode > > case NOT: > if (GET_MODE_CLASS (mode) =3D=3D MODE_VECTOR_INT) > - // vnot is pxor -1. > - *total =3D ix86_vec_cost (mode, cost->sse_op) + 1; > + { > + /* (not (xor ...)) can be a single insn in AVX512. */ > + if (GET_CODE (XEXP (x, 0)) =3D=3D XOR && TARGET_AVX512F > + && (GET_MODE_SIZE (mode) =3D=3D 64 > + || (TARGET_AVX512VL > + && (GET_MODE_SIZE (mode) =3D=3D 32 > + || GET_MODE_SIZE (mode) =3D=3D 16)))) > + { > + *total =3D ix86_vec_cost (mode, cost->sse_op) > + + rtx_cost (XEXP (XEXP (x, 0), 0), mode, > + outer_code, opno, speed) > + + rtx_cost (XEXP (XEXP (x, 0), 1), mode, > + outer_code, opno, speed); > + return true; > + } > + > + // vnot is pxor -1. > + *total =3D ix86_vec_cost (mode, cost->sse_op) + 1; > + } > else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD) > *total =3D cost->add * 2; > else > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -17616,6 +17616,98 @@ > operands[2] =3D force_reg (V1TImode, CONSTM1_RTX (V1TImode)); > }) > > +(define_insn "*iornot3" > + [(set (match_operand:VI 0 "register_operand" "=3Dv,v,v,v") > + (ior:VI > + (not:VI > + (match_operand:VI 1 "bcst_vector_operand" "v,Br,v,m")) > + (match_operand:VI 2 "bcst_vector_operand" "vBr,v,m,v")))] > + "( =3D=3D 64 || TARGET_AVX512VL > + || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) > + && (register_operand (operands[1], mode) > + || register_operand (operands[2], mode))" > +{ > + if (!register_operand (operands[1], mode)) > + { > + if (TARGET_AVX512VL) > + return "vpternlog\t{$0xdd, %1, %2, %0|%0, %2, %1, = 0xdd}"; > + return "vpternlog\t{$0xdd, %g1, %g2, %g0|%g0, %g2, = %g1, 0xdd}"; > + } > + if (TARGET_AVX512VL) > + return "vpternlog\t{$0xbb, %2, %1, %0|%0, %1, %2, 0xb= b}"; > + return "vpternlog\t{$0xbb, %g2, %g1, %g0|%g0, %g1, %g2,= 0xbb}"; > +} > + [(set_attr "type" "sselog") > + (set_attr "length_immediate" "1") > + (set_attr "prefix" "evex") > + (set (attr "mode") > + (if_then_else (match_test "TARGET_AVX512VL") > + (const_string "") > + (const_string "XI"))) > + (set (attr "enabled") > + (if_then_else (eq_attr "alternative" "2,3") > + (symbol_ref " =3D=3D 64 || TARGET_AVX512= VL") > + (const_string "*")))]) > + > +(define_insn "*xnor3" > + [(set (match_operand:VI 0 "register_operand" "=3Dv,v") > + (not:VI > + (xor:VI > + (match_operand:VI 1 "bcst_vector_operand" "%v,v") > + (match_operand:VI 2 "bcst_vector_operand" "vBr,m"))))] > + "( =3D=3D 64 || TARGET_AVX512VL > + || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) > + && (register_operand (operands[1], mode) > + || register_operand (operands[2], mode))" > +{ > + if (TARGET_AVX512VL) > + return "vpternlog\t{$0x99, %2, %1, %0|%0, %1, %2, 0x9= 9}"; > + else > + return "vpternlog\t{$0x99, %g2, %g1, %g0|%g0, %g1, %g= 2, 0x99}"; > +} > + [(set_attr "type" "sselog") > + (set_attr "length_immediate" "1") > + (set_attr "prefix" "evex") > + (set (attr "mode") > + (if_then_else (match_test "TARGET_AVX512VL") > + (const_string "") > + (const_string "XI"))) > + (set (attr "enabled") > + (if_then_else (eq_attr "alternative" "1") > + (symbol_ref " =3D=3D 64 || TARGET_AVX512= VL") > + (const_string "*")))]) > + > +(define_code_iterator andor [and ior]) > +(define_code_attr nlogic [(and "nor") (ior "nand")]) > +(define_code_attr ternlog_nlogic [(and "0x11") (ior "0x77")]) > + > +(define_insn "*3" > + [(set (match_operand:VI 0 "register_operand" "=3Dv,v") > + (andor:VI > + (not:VI (match_operand:VI 1 "bcst_vector_operand" "%v,v")) > + (not:VI (match_operand:VI 2 "bcst_vector_operand" "vBr,m"))))] I'm thinking of doing it in simplify_rtx or gimple match.pd to transform (and (not op1)) (not op2)) -> (not: (ior: op1 op2)) (ior (not op1) (not op2)) -> (not : (and op1 op2)) Even w/o avx512f, the transformation should also benefit since it takes less logic operations 3 -> 2.(or 2 -> 2 for pandn). The other 2 patterns: *xnor3 and iornot3 LGTM. > + "( =3D=3D 64 || TARGET_AVX512VL > + || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) > + && (register_operand (operands[1], mode) > + || register_operand (operands[2], mode))" > +{ > + if (TARGET_AVX512VL) > + return "vpternlog\t{$, %2, %1, %0|%0,= %1, %2, }"; > + else > + return "vpternlog\t{$, %g2, %g1, %g0|= %g0, %g1, %g2, }"; > +} > + [(set_attr "type" "sselog") > + (set_attr "length_immediate" "1") > + (set_attr "prefix" "evex") > + (set (attr "mode") > + (if_then_else (match_test "TARGET_AVX512VL") > + (const_string "") > + (const_string "XI"))) > + (set (attr "enabled") > + (if_then_else (eq_attr "alternative" "1") > + (symbol_ref " =3D=3D 64 || TARGET_AVX512= VL") > + (const_string "*")))]) > + > (define_mode_iterator AVX512ZEXTMASK > [(DI "TARGET_AVX512BW") (SI "TARGET_AVX512BW") HI]) > > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512-binop-not-1.h > @@ -0,0 +1,13 @@ > +#include > + > +#define PASTER2(x,y) x##y > +#define PASTER3(x,y,z) _mm##x##_##y##_##z > +#define OP(vec, op, suffix) PASTER3 (vec, op, suffix) > +#define DUP(vec, suffix, val) PASTER3 (vec, set1, suffix) (val) > + > +type > +foo (type x, SCALAR *f) > +{ > + return OP (vec, op, suffix) (x, OP (vec, xor, suffix) (DUP (vec, suffi= x, *f), > + DUP (vec, suffix= , ~0))); > +} > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512-binop-not-2.h > @@ -0,0 +1,13 @@ > +#include > + > +#define PASTER2(x,y) x##y > +#define PASTER3(x,y,z) _mm##x##_##y##_##z > +#define OP(vec, op, suffix) PASTER3 (vec, op, suffix) > +#define DUP(vec, suffix, val) PASTER3 (vec, set1, suffix) (val) > + > +type > +foo (type x, SCALAR *f) > +{ > + return OP (vec, op, suffix) (OP (vec, xor, suffix) (x, DUP (vec, suffi= x, ~0)), > + DUP (vec, suffix, *f)); > +} > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512f-orn-si-zmm-1.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512f -mno-avx512vl -mprefer-vector-width=3D512 -O2= " } */ > +/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\\\$0xdd, \\(%(= ?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */ > +/* { dg-final { scan-assembler-not "vpbroadcast" } } */ > + > +#define type __m512i > +#define vec 512 > +#define op or > +#define suffix epi32 > +#define SCALAR int > + > +#include "avx512-binop-not-1.h" > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512f-orn-si-zmm-2.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512f -mno-avx512vl -mprefer-vector-width=3D512 -O2= " } */ > +/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\\\$0xbb, \\(%(= ?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */ > +/* { dg-final { scan-assembler-not "vpbroadcast" } } */ > + > +#define type __m512i > +#define vec 512 > +#define op or > +#define suffix epi32 > +#define SCALAR int > + > +#include "avx512-binop-not-2.h" > --=20 BR, Hongtao