From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=wM+B=BM=gmail.com=richard.guenther@sourceware.org>
Received: from mail-lf1-x12e.google.com (mail-lf1-x12e.google.com [IPv6:2a00:1450:4864:20::12e])
	by sourceware.org (Postfix) with ESMTPS id 43A453858D38
	for <gcc-patches@gcc.gnu.org>; Tue, 23 May 2023 06:19:31 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 43A453858D38
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
Received: by mail-lf1-x12e.google.com with SMTP id 2adb3069b0e04-4f3baf04f0cso2917526e87.1
        for <gcc-patches@gcc.gnu.org>; Mon, 22 May 2023 23:19:31 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20221208; t=1684822770; x=1687414770;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:from:to:cc:subject:date
         :message-id:reply-to;
        bh=OQPwN3960hHcbI3Jp79FbkBIvFpuqbiqklIf9sY0Owc=;
        b=KMEGUUTJJyBbesKKxCr8ay7TKyrgmRHmgfkcK8OjgtUBNtu2wU/q4+4LZCHx9BeW92
         mlz+XMNwdwU1GAzOl29j0ClkpTJDfoxzOhEsM77ZeL1tnv5exRlcHIPWKE1MYEVZPSpr
         GNUxGy3d2XCZ45oDlcwyzgaMsJeJfoxVs3Qs+Qt1l92kq0JwJu5WNOCJZSKg+qTaIvUx
         SfVv7avmqm9SvaYwWvMIAp++rmNFFEjXB72PDxke2sL8lYzMf1lijVED1c/cUObD7Trm
         4q7E/oruUPFVnzS15vw5NyMqmLzAS2Y+7NsakOThEfxEcLdw5FMnpvbn1JbmecTzRyvh
         wYNA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1684822770; x=1687414770;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=OQPwN3960hHcbI3Jp79FbkBIvFpuqbiqklIf9sY0Owc=;
        b=RVUb8VviyPHg37u4EvUkW6ewFdSGkIMCxDESzk8UfBhA+rNGb6qC8PcDtCudi/bVwE
         FQ4NnO60OjxizNBiwlpDG+QdgYy0UUfdpg82CBXJ3LGW4hpeNAyRSftbyhhm1ug9fseI
         9XiWkIhevC3EosxZw1JzNPTz96b0XiqmNzPIMNAVLTcThPbXWlx6OvCVcn89D8kOXEBs
         AO+krg7tSZA45qL4VVxLyorPTmOqDSqRpp++UNswYhvy4cmthXt547kmWCeiy78TuUki
         FrcrghilovsUFRiUPTknnklUN3uLPEEhn9T23dRfujy0GVRxvApKODz5sEkgDy1kxxF6
         x/nQ==
X-Gm-Message-State: AC+VfDz0cDfSxg+lsiAr6DP7uy8N33jd1CEOCXhzKxcF3kTunqO7uh2x
	sJODVyxqtUgixwTKe6Yrh+p/kwGfdYE3KMT29PI=
X-Google-Smtp-Source: ACHHUZ4t5DhTM982/Q4hgRXUCRJKxeyeVbwivU7r5SoZAc7C+aVRPQtykE1ViwDgtthUMuwcZH0D+I7yijAb+onsl/g=
X-Received: by 2002:ac2:5ec2:0:b0:4f3:94b5:3272 with SMTP id
 d2-20020ac25ec2000000b004f394b53272mr4072842lfq.11.1684822769338; Mon, 22 May
 2023 23:19:29 -0700 (PDT)
MIME-Version: 1.0
References: <72a5c5db-bc06-eded-d229-82af34342515@linux.ibm.com>
 <71fda837-6a92-7f74-43e1-90b046919f6a@linux.ibm.com> <CAFiYyc0t03oJ3D0rdfzrviPk7Zggek095_F9-aJS2VESP-u10g@mail.gmail.com>
 <06c5a418-ca90-b117-04b1-c3bef50ae28c@linux.ibm.com>
In-Reply-To: <06c5a418-ca90-b117-04b1-c3bef50ae28c@linux.ibm.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Tue, 23 May 2023 08:19:17 +0200
Message-ID: <CAFiYyc0J=Ot7U4BC0sANFQ2BTSut797gDjzwfU5LnGUm9uxS3Q@mail.gmail.com>
Subject: Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1
To: "Kewen.Lin" <linkw@linux.ibm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>, Richard Sandiford <richard.sandiford@arm.com>, 
	Segher Boessenkool <segher@kernel.crashing.org>, Peter Bergner <bergner@linux.ibm.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-6.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_NUMSUBJECT,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Tue, May 23, 2023 at 5:01=E2=80=AFAM Kewen.Lin <linkw@linux.ibm.com> wro=
te:
>
> Hi Richi,
>
> Thanks for the review!
>
> on 2023/5/22 21:44, Richard Biener wrote:
> > On Wed, May 17, 2023 at 8:15=E2=80=AFAM Kewen.Lin <linkw@linux.ibm.com>=
 wrote:
> >>
> >> Hi,
> >>
> >> Following Richi's suggestion in [1], I'm working on deferring
> >> cost evaluation next to the transformation, this patch is
> >> to enhance function vect_transform_slp_perm_load_1 which
> >> could under-cost for vector permutation, since the costing
> >> doesn't try to consider nvectors_per_build, it's inconsistent
> >> with the transformation part.
> >>
> >> Bootstrapped and regtested on x86_64-redhat-linux,
> >> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
> >>
> >> Is it ok for trunk?
> >>
> >> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html
> >>
> >> BR,
> >> Kewen
> >> -----
> >> gcc/ChangeLog:
> >>
> >>         * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust th=
e
> >>         calculation on n_perms by considering nvectors_per_build.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>         * gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test.
> >> ---
> >>  .../vect/costmodel/ppc/costmodel-slp-perm.c   | 23 +++++++
> >>  gcc/tree-vect-slp.cc                          | 66 ++++++++++--------=
-
> >>  2 files changed, 57 insertions(+), 32 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-=
slp-perm.c
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-per=
m.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
> >> new file mode 100644
> >> index 00000000000..e5c4dceddfb
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
> >> @@ -0,0 +1,23 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-require-effective-target vect_int } */
> >> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> >> +/* Specify power9 to ensure the vectorization is profitable
> >> +   and test point stands, otherwise it could be not profitable
> >> +   to vectorize.  */
> >> +/* { dg-additional-options "-mdejagnu-cpu=3Dpower9 -mpower9-vector" }=
 */
> >> +
> >> +/* Verify we cost the exact count for required vec_perm.  */
> >> +
> >> +int x[1024], y[1024];
> >> +
> >> +void
> >> +foo ()
> >> +{
> >> +  for (int i =3D 0; i < 512; ++i)
> >> +    {
> >> +      x[2 * i] =3D y[1023 - (2 * i)];
> >> +      x[2 * i + 1] =3D y[1023 - (2 * i + 1)];
> >> +    }
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } =
*/
> >> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> >> index e5c9d7e766e..af9a6dd4fa9 100644
> >> --- a/gcc/tree-vect-slp.cc
> >> +++ b/gcc/tree-vect-slp.cc
> >> @@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinf=
o, slp_tree node,
> >>
> >>    mode =3D TYPE_MODE (vectype);
> >>    poly_uint64 nunits =3D TYPE_VECTOR_SUBPARTS (vectype);
> >> +  unsigned int nstmts =3D SLP_TREE_NUMBER_OF_VEC_STMTS (node);
> >>
> >>    /* Initialize the vect stmts of NODE to properly insert the generat=
ed
> >>       stmts later.  */
> >>    if (! analyze_only)
> >> -    for (unsigned i =3D SLP_TREE_VEC_STMTS (node).length ();
> >> -        i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++)
> >> +    for (unsigned i =3D SLP_TREE_VEC_STMTS (node).length (); i < nstm=
ts; i++)
> >>        SLP_TREE_VEC_STMTS (node).quick_push (NULL);
> >>
> >>    /* Generate permutation masks for every NODE. Number of masks for e=
ach NODE
> >> @@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo=
, slp_tree node,
> >>          (b) the permutes only need a single vector input.  */
> >>        mask.new_vector (nunits, group_size, 3);
> >>        nelts_to_build =3D mask.encoded_nelts ();
> >> -      nvectors_per_build =3D SLP_TREE_VEC_STMTS (node).length ();
> >> +      /* It's possible to obtain zero nstmts during analyze_only, so =
make
> >> +        it at least one to ensure the later computation for n_perms
> >> +        proceed.  */
> >> +      nvectors_per_build =3D nstmts > 0 ? nstmts : 1;
> >>        in_nlanes =3D DR_GROUP_SIZE (stmt_info) * 3;
> >>      }
> >>    else
> >> @@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinf=
o, slp_tree node,
> >>                   return false;
> >>                 }
> >>
> >> -             ++*n_perms;
> >> -
> >> +             tree mask_vec =3D NULL_TREE;
> >>               if (!analyze_only)
> >> -               {
> >> -                 tree mask_vec =3D vect_gen_perm_mask_checked (vectyp=
e, indices);
> >> +               mask_vec =3D vect_gen_perm_mask_checked (vectype, indi=
ces);
> >>
> >> -                 if (second_vec_index =3D=3D -1)
> >> -                   second_vec_index =3D first_vec_index;
> >> +             if (second_vec_index =3D=3D -1)
> >> +               second_vec_index =3D first_vec_index;
> >>
> >> -                 for (unsigned int ri =3D 0; ri < nvectors_per_build;=
 ++ri)
> >> +             for (unsigned int ri =3D 0; ri < nvectors_per_build; ++r=
i)
> >> +               {
> >> +                 ++*n_perms;
> >
> > So the "real" change is doing
> >
> >       *n_perms +=3D nvectors_per_build;
> >
> > and *n_perms was unused when !analyze_only?  And since at
>
> Yes, although both !analyze_only and analyze_only calls pass n_perms in, =
now
> only the call sites with analyze_only will use the returned n_perms furth=
er.
>
> > analysis time we (sometimes?) have zero nvectors you have to
> > fixup above?  Which cases are that?
>
> Yes, the fixup is to avoid to result in unexpected n_perms in function
> vect_optimize_slp_pass::internal_node_cost=E3=80=82  One typical case is
> gcc.dg/vect/bb-slp-50.c, without special casing zero, slp2 fails to optim=
ize
> out one more vec_perm unexpectedly.
>
> In vect_optimize_slp_pass::internal_node_cost, it checks if the returned =
n_perms
> is zero or not (vec_perm not needed or needed).
>
>       if (!vect_transform_slp_perm_load_1 (m_vinfo, node, tmp_perm, vNULL=
,
>                                            nullptr, vf, true, false, &n_p=
erms))
>         {
>           auto rep =3D SLP_TREE_REPRESENTATIVE (node);
>           if (out_layout_i =3D=3D 0)
>             {
>               /* Use the fallback cost if the load is an N-to-N permutati=
on.
>                  Otherwise assume that the node will be rejected later
>                  and rebuilt from scalars.  */
>               if (STMT_VINFO_GROUPED_ACCESS (rep)
>                   && (DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (rep))
>                       =3D=3D SLP_TREE_LANES (node)))
>                 return fallback_cost;
>               return 0;
>             }
>           return -1;
>         }
>
>       /* See the comment above the corresponding VEC_PERM_EXPR handling. =
 */
>       return n_perms =3D=3D 0 ? 0 : 1;
>
> In vect_optimize_slp_pass::forward_pass (), it only considers the case th=
at
> factor > 0 (there is some vec_perm needed).
>
>               /* Accumulate the cost of using LAYOUT_I within NODE,
>                  both for the inputs and the outputs.  */
>               int factor =3D internal_node_cost (vertex.node, layout_i,
>                                                layout_i);
>               if (factor < 0)
>                 {
>                   is_possible =3D false;
>                   break;
>                 }
>               else if (factor)
>                 layout_costs.internal_cost.add_serial_cost
>                   ({ vertex.weight * factor, m_optimize_size });

Ah, OK - thanks for clarifying.

The patch is OK.
Richard.

> BR,
> Kewen
>
> >
> > In principle the patch looks good to me.
> >
> > Richard.
> >
> >> +                 if (analyze_only)
> >> +                   continue;
> >> +                 /* Generate the permute statement if necessary.  */
> >> +                 tree first_vec =3D dr_chain[first_vec_index + ri];
> >> +                 tree second_vec =3D dr_chain[second_vec_index + ri];
> >> +                 gassign *stmt =3D as_a<gassign *> (stmt_info->stmt);
> >> +                 tree perm_dest
> >> +                   =3D vect_create_destination_var (gimple_assign_lhs=
 (stmt),
> >> +                                                  vectype);
> >> +                 perm_dest =3D make_ssa_name (perm_dest);
> >> +                 gimple *perm_stmt
> >> +                   =3D gimple_build_assign (perm_dest, VEC_PERM_EXPR,=
 first_vec,
> >> +                                          second_vec, mask_vec);
> >> +                 vect_finish_stmt_generation (vinfo, stmt_info, perm_=
stmt,
> >> +                                              gsi);
> >> +                 if (dce_chain)
> >>                     {
> >> -                     /* Generate the permute statement if necessary. =
 */
> >> -                     tree first_vec =3D dr_chain[first_vec_index + ri=
];
> >> -                     tree second_vec =3D dr_chain[second_vec_index + =
ri];
> >> -                     gassign *stmt =3D as_a<gassign *> (stmt_info->st=
mt);
> >> -                     tree perm_dest
> >> -                       =3D vect_create_destination_var (gimple_assign=
_lhs (stmt),
> >> -                                                      vectype);
> >> -                     perm_dest =3D make_ssa_name (perm_dest);
> >> -                     gimple *perm_stmt
> >> -                       =3D gimple_build_assign (perm_dest, VEC_PERM_E=
XPR,
> >> -                                              first_vec, second_vec, =
mask_vec);
> >> -                     vect_finish_stmt_generation (vinfo, stmt_info, p=
erm_stmt,
> >> -                                                  gsi);
> >> -                     if (dce_chain)
> >> -                       {
> >> -                         bitmap_set_bit (used_defs, first_vec_index +=
 ri);
> >> -                         bitmap_set_bit (used_defs, second_vec_index =
+ ri);
> >> -                       }
> >> -
> >> -                     /* Store the vector statement in NODE.  */
> >> -                     SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++]
> >> -                       =3D perm_stmt;
> >> +                     bitmap_set_bit (used_defs, first_vec_index + ri)=
;
> >> +                     bitmap_set_bit (used_defs, second_vec_index + ri=
);
> >>                     }
> >> +
> >> +                 /* Store the vector statement in NODE.  */
> >> +                 SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] =3D =
perm_stmt;
> >>                 }
> >>             }
> >>           else if (!analyze_only)
> >> --
> >> 2.39.1
>