From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <prathamesh.kulkarni@linaro.org>
Received: from mail-io1-xd32.google.com (mail-io1-xd32.google.com [IPv6:2607:f8b0:4864:20::d32])
	by sourceware.org (Postfix) with ESMTPS id 445A73858291
	for <gcc-patches@gcc.gnu.org>; Fri,  4 Nov 2022 06:43:39 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 445A73858291
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org
Received: by mail-io1-xd32.google.com with SMTP id q21so240056iod.4
        for <gcc-patches@gcc.gnu.org>; Thu, 03 Nov 2022 23:43:39 -0700 (PDT)
MIME-Version: 1.0
References: <20221104000432.15254-1-hongyu.wang@intel.com>
In-Reply-To: <20221104000432.15254-1-hongyu.wang@intel.com>
From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Date: Fri, 4 Nov 2022 12:13:02 +0530
Message-ID: <CAAgBjM=uZ_+057TkSd9wHXfs3D1770GfRsbeR7QfCWpfo3o8FQ@mail.gmail.com>
Subject: Re: [PATCH] Optimize VEC_PERM_EXPR with same permutation index and
 operation [PR98167]
To: Hongyu Wang <hongyu.wang@intel.com>
Cc: gcc-patches@gcc.gnu.org, hongtao.liu@intel.com
Content-Type: text/plain; charset="UTF-8"
List-Id: <gcc-patches.gcc.gnu.org>

On Fri, 4 Nov 2022 at 05:36, Hongyu Wang via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> This is a follow-up patch for PR98167
>
> The sequence
>      c1 = VEC_PERM_EXPR (a, a, mask)
>      c2 = VEC_PERM_EXPR (b, b, mask)
>      c3 = c1 op c2
> can be optimized to
>      c = a op b
>      c3 = VEC_PERM_EXPR (c, c, mask)
> for all integer vector operation, and float operation with
> full permutation.
>
> Bootstrapped & regrtested on x86_64-pc-linux-gnu.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         PR target/98167
>         * match.pd: New perm + vector op patterns for int and fp vector.
>
> gcc/testsuite/ChangeLog:
>
>         PR target/98167
>         * gcc.target/i386/pr98167.c: New test.
> ---
>  gcc/match.pd                            | 49 +++++++++++++++++++++++++
>  gcc/testsuite/gcc.target/i386/pr98167.c | 44 ++++++++++++++++++++++
>  2 files changed, 93 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr98167.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 194ba8f5188..b85ad34f609 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -8189,3 +8189,52 @@ and,
>   (bit_and (negate @0) integer_onep@1)
>   (if (!TYPE_OVERFLOW_SANITIZED (type))
>    (bit_and @0 @1)))
> +
> +/* Optimize
> +   c1 = VEC_PERM_EXPR (a, a, mask)
> +   c2 = VEC_PERM_EXPR (b, b, mask)
> +   c3 = c1 op c2
> +   -->
> +   c = a op b
> +   c3 = VEC_PERM_EXPR (c, c, mask)
> +   For all integer non-div operations.  */
> +(for op (plus minus mult bit_and bit_ior bit_xor
> +        lshift rshift)
> + (simplify
> +  (op (vec_perm @0 @0 VECTOR_CST@2) (vec_perm @1 @1 VECTOR_CST@2))
> +    (if (VECTOR_INTEGER_TYPE_P (type))
> +     (vec_perm (op @0 @1) (op @0 @1) @2))))
Just wondering, why does the mask need to be a VECTOR_CST here?
I guess the transform should work as long as the mask is the same for
both vectors, even if it is not constant?
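
To illustrate what I mean, here is a minimal sketch on plain arrays (not
GCC code; permute and check_mask are hypothetical helpers, and unsigned
elements are used so that the addition cannot overflow into UB).  It only
models the integer case for plus: the mask is a runtime value, and the
element-wise add still commutes with the permutation, including for masks
that repeat or skip elements.

```c
#include <assert.h>
#include <stddef.h>

#define NELTS 4

/* Hypothetical model of VEC_PERM_EXPR (v, v, mask) on plain arrays.
   With both inputs equal, selecting from the 2*NELTS-element
   concatenation reduces to indexing v with mask[i] % NELTS.  */
static void
permute (const unsigned *v, const unsigned *mask, unsigned *out)
{
  assert (out != v);  /* The sketch does not support in-place permutes.  */
  for (size_t i = 0; i < NELTS; i++)
    out[i] = v[mask[i] % NELTS];
}

/* Return 1 if permute (a) + permute (b) == permute (a + b) for the mask
   {m0, m1, m2, m3}.  The mask elements are ordinary runtime values.  */
int
check_mask (unsigned m0, unsigned m1, unsigned m2, unsigned m3)
{
  const unsigned a[NELTS] = { 1, 2, 3, 4 };
  const unsigned b[NELTS] = { 10, 20, 30, 40 };
  const unsigned mask[NELTS] = { m0, m1, m2, m3 };
  unsigned a1[NELTS], b1[NELTS], sum[NELTS], expected[NELTS];

  /* c1 = VEC_PERM_EXPR (a, a, mask); c2 = VEC_PERM_EXPR (b, b, mask).  */
  permute (a, mask, a1);
  permute (b, mask, b1);

  /* c = a op b; c3 = VEC_PERM_EXPR (c, c, mask).  */
  for (size_t i = 0; i < NELTS; i++)
    sum[i] = a[i] + b[i];
  permute (sum, mask, expected);

  for (size_t i = 0; i < NELTS; i++)
    if (a1[i] + b1[i] != expected[i])
      return 0;
  return 1;
}
```

Nothing in this model depends on the mask being known at compile time,
which is why I would expect the pattern to match a non-constant @2 as well.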
> +
> +/* Similar for float arithmetic when permutation constant covers
> +   all vector elements.  */
> +(for op (plus minus mult)
> + (simplify
> +  (op (vec_perm @0 @0 VECTOR_CST@2) (vec_perm @1 @1 VECTOR_CST@2))
> +    (if (VECTOR_FLOAT_TYPE_P (type))
> +     (with
> +      {
> +       tree perm_cst = @2;
> +       vec_perm_builder builder;
> +       bool full_perm_p = false;
> +       if (tree_to_vec_perm_builder (&builder, perm_cst))
> +         {
> +           /* Create a vec_perm_indices for the integer vector.  */
> +           int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
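If this transform is meant only for VLS (fixed-length) vectors, I guess
you should bail out when TYPE_VECTOR_SUBPARTS is not constant, for
instance by testing is_constant () before using the value; otherwise
to_constant () will crash for VLA (variable-length) vectors.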
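
A rough, self-contained model of the guard I have in mind follows.  This
is a sketch, not GCC code: struct poly_count, poly_is_constant,
poly_to_constant and full_perm_p are hypothetical stand-ins.  In the
actual pattern the equivalent check would be
TYPE_VECTOR_SUBPARTS (type).is_constant (&nelts).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the poly_uint64 returned by
   TYPE_VECTOR_SUBPARTS: the element count is base + scale * X, where X
   is only known at run time.  scale == 0 means a fixed-length vector.  */
struct poly_count
{
  unsigned base;
  unsigned scale;
};

/* Models poly_int::is_constant (T *): store the count in *OUT and return
   true only when it does not depend on the runtime factor.  */
bool
poly_is_constant (struct poly_count n, unsigned *out)
{
  if (n.scale != 0)
    return false;
  *out = n.base;
  return true;
}

/* Models poly_int::to_constant (): only valid for constant counts, so
   calling it on a variable-length count trips the assertion.  */
unsigned
poly_to_constant (struct poly_count n)
{
  assert (n.scale == 0);
  return n.base;
}

/* Guarded form of the coverage check.  Returns -1 (do not transform) for
   variable-length vectors instead of asserting; otherwise returns 1 if
   SEL selects every element at least once and 0 if it does not.  */
int
full_perm_p (struct poly_count subparts, const unsigned *sel)
{
  unsigned nelts;
  if (!poly_is_constant (subparts, &nelts))
    return -1;

  for (unsigned i = 0; i < nelts; i++)
    {
      bool found = false;
      for (unsigned j = 0; j < nelts && !found; j++)
	found = (sel[j] % nelts == i);
      if (!found)
	return 0;
    }
  return 1;
}
```

The point is only the early return: the variable-length case is rejected
before anything asks for a constant element count.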

Thanks,
Prathamesh
> +           vec_perm_indices sel (builder, 1, nelts);
> +
> +           /* Check if perm indices covers all vector elements.  */
> +           int count = 0, i, j;
> +           for (i = 0; i < nelts; i++)
> +             for (j = 0; j < nelts; j++)
> +               {
> +                 if (sel[j].to_constant () == i)
> +                   {
> +                     count++;
> +                     break;
> +                   }
> +               }
> +           full_perm_p = count == nelts;
> +         }
> +       }
> +       (if (full_perm_p)
> +       (vec_perm (op @0 @1) (op @0 @1) @2))))))
> diff --git a/gcc/testsuite/gcc.target/i386/pr98167.c b/gcc/testsuite/gcc.target/i386/pr98167.c
> new file mode 100644
> index 00000000000..40e0ac11332
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr98167.c
> @@ -0,0 +1,44 @@
> +/* PR target/98167 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx2" } */
> +
> +/* { dg-final { scan-assembler-times "vpshufd\t" 8 } } */
> +/* { dg-final { scan-assembler-times "vpermilps\t" 3 } } */
> +
> +#define VEC_PERM_4 \
> +  2, 3, 1, 0
> +#define VEC_PERM_8 \
> +  4, 5, 6, 7, 3, 2, 1, 0
> +#define VEC_PERM_16 \
> +  8, 9, 10, 11, 12, 13, 14, 15, 7, 6, 5, 4, 3, 2, 1, 0
> +
> +#define TYPE_PERM_OP(type, size, op, name) \
> +  typedef type v##size##s##type __attribute__ ((vector_size(4*size))); \
> +  v##size##s##type type##foo##size##i_##name (v##size##s##type a, \
> +                                             v##size##s##type b) \
> +  { \
> +    v##size##s##type a1 = __builtin_shufflevector (a, a, \
> +                                                  VEC_PERM_##size); \
> +    v##size##s##type b1 = __builtin_shufflevector (b, b, \
> +                                                  VEC_PERM_##size); \
> +    return a1 op b1; \
> +  }
> +
> +#define INT_PERMS(op, name) \
> +  TYPE_PERM_OP (int, 4, op, name) \
> +
> +#define FP_PERMS(op, name) \
> +  TYPE_PERM_OP (float, 4, op, name) \
> +
> +INT_PERMS (+, add)
> +INT_PERMS (-, sub)
> +INT_PERMS (*, mul)
> +INT_PERMS (|, ior)
> +INT_PERMS (^, xor)
> +INT_PERMS (&, and)
> +INT_PERMS (<<, shl)
> +INT_PERMS (>>, shr)
> +FP_PERMS (+, add)
> +FP_PERMS (-, sub)
> +FP_PERMS (*, mul)
> +
> --
> 2.18.1
>