From: Richard Biener
Date: Mon, 22 May 2023 15:44:30 +0200
Subject: Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1
To: "Kewen.Lin"
Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner
References: <72a5c5db-bc06-eded-d229-82af34342515@linux.ibm.com> <71fda837-6a92-7f74-43e1-90b046919f6a@linux.ibm.com>
In-Reply-To: <71fda837-6a92-7f74-43e1-90b046919f6a@linux.ibm.com>

On Wed, May 17, 2023 at 8:15 AM Kewen.Lin wrote:
>
> Hi,
>
> Following Richi's suggestion in [1], I'm working on deferring
> cost evaluation next to the transformation, this patch is
> to enhance function vect_transform_slp_perm_load_1 which
> could under-cost for vector permutation, since the costing
> doesn't try to consider nvectors_per_build, it's inconsistent
> with the transformation part.
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html
>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
>         * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the
>         calculation on n_perms by considering nvectors_per_build.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test.
> ---
>  .../vect/costmodel/ppc/costmodel-slp-perm.c   | 23 +++++++
>  gcc/tree-vect-slp.cc                          | 66 ++++++++++---------
>  2 files changed, 57 insertions(+), 32 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
> new file mode 100644
> index 00000000000..e5c4dceddfb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* Specify power9 to ensure the vectorization is profitable
> +   and test point stands, otherwise it could be not profitable
> +   to vectorize.  */
> +/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */
> +
> +/* Verify we cost the exact count for required vec_perm.  */
> +
> +int x[1024], y[1024];
> +
> +void
> +foo ()
> +{
> +  for (int i = 0; i < 512; ++i)
> +    {
> +      x[2 * i] = y[1023 - (2 * i)];
> +      x[2 * i + 1] = y[1023 - (2 * i + 1)];
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index e5c9d7e766e..af9a6dd4fa9 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
>
>    mode = TYPE_MODE (vectype);
>    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +  unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node);
>
>    /* Initialize the vect stmts of NODE to properly insert the generated
>       stmts later.  */
>    if (! analyze_only)
> -    for (unsigned i = SLP_TREE_VEC_STMTS (node).length ();
> -         i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++)
> +    for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++)
>        SLP_TREE_VEC_STMTS (node).quick_push (NULL);
>
>    /* Generate permutation masks for every NODE. Number of masks for each NODE
> @@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
>           (b) the permutes only need a single vector input.  */
>        mask.new_vector (nunits, group_size, 3);
>        nelts_to_build = mask.encoded_nelts ();
> -      nvectors_per_build = SLP_TREE_VEC_STMTS (node).length ();
> +      /* It's possible to obtain zero nstmts during analyze_only, so make
> +         it at least one to ensure the later computation for n_perms
> +         proceed.  */
> +      nvectors_per_build = nstmts > 0 ? nstmts : 1;
>        in_nlanes = DR_GROUP_SIZE (stmt_info) * 3;
>      }
>    else
> @@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
>                    return false;
>                  }
>
> -              ++*n_perms;
> -
> +              tree mask_vec = NULL_TREE;
>                if (!analyze_only)
> -                {
> -                  tree mask_vec = vect_gen_perm_mask_checked (vectype, indices);
> +                mask_vec = vect_gen_perm_mask_checked (vectype, indices);
>
> -                  if (second_vec_index == -1)
> -                    second_vec_index = first_vec_index;
> +              if (second_vec_index == -1)
> +                second_vec_index = first_vec_index;
>
> -                  for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
> +              for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
> +                {
> +                  ++*n_perms;

So the "real" change is doing

  *n_perms += nvectors_per_build;

and *n_perms was unused when !analyze_only?  And since at analysis time
we (sometimes?) have zero nvectors, you have to fix it up above?  Which
cases are those?

In principle the patch looks good to me.

Richard.

> +                  if (analyze_only)
> +                    continue;
> +                  /* Generate the permute statement if necessary.  */
> +                  tree first_vec = dr_chain[first_vec_index + ri];
> +                  tree second_vec = dr_chain[second_vec_index + ri];
> +                  gassign *stmt = as_a <gassign *> (stmt_info->stmt);
> +                  tree perm_dest
> +                    = vect_create_destination_var (gimple_assign_lhs (stmt),
> +                                                   vectype);
> +                  perm_dest = make_ssa_name (perm_dest);
> +                  gimple *perm_stmt
> +                    = gimple_build_assign (perm_dest, VEC_PERM_EXPR, first_vec,
> +                                           second_vec, mask_vec);
> +                  vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> +                                               gsi);
> +                  if (dce_chain)
>                      {
> -                      /* Generate the permute statement if necessary.  */
> -                      tree first_vec = dr_chain[first_vec_index + ri];
> -                      tree second_vec = dr_chain[second_vec_index + ri];
> -                      gassign *stmt = as_a <gassign *> (stmt_info->stmt);
> -                      tree perm_dest
> -                        = vect_create_destination_var (gimple_assign_lhs (stmt),
> -                                                       vectype);
> -                      perm_dest = make_ssa_name (perm_dest);
> -                      gimple *perm_stmt
> -                        = gimple_build_assign (perm_dest, VEC_PERM_EXPR,
> -                                               first_vec, second_vec, mask_vec);
> -                      vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> -                                                   gsi);
> -                      if (dce_chain)
> -                        {
> -                          bitmap_set_bit (used_defs, first_vec_index + ri);
> -                          bitmap_set_bit (used_defs, second_vec_index + ri);
> -                        }
> -
> -                      /* Store the vector statement in NODE.  */
> -                      SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++]
> -                        = perm_stmt;
> +                      bitmap_set_bit (used_defs, first_vec_index + ri);
> +                      bitmap_set_bit (used_defs, second_vec_index + ri);
>                      }
> +
> +                  /* Store the vector statement in NODE.  */
> +                  SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt;
>                  }
>              }
>            else if (!analyze_only)
> --
> 2.39.1
>
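
To make the counting question concrete, here is a minimal standalone
sketch of the two accounting schemes as I read them from the patch.  It
is not GCC code: count_perms_old, count_perms_new and the nmasks
parameter are hypothetical names introduced only for illustration, and
the model assumes the repeating case where one built mask is applied to
nvectors_per_build vectors.

```c
/* Hypothetical stand-alone model of the n_perms accounting; not GCC code.
   NMASKS stands for the number of permute masks that get built.  */

/* Pre-patch accounting: n_perms is bumped once per built mask, no matter
   how many vectors that mask is later applied to.  */
unsigned
count_perms_old (unsigned nmasks, unsigned nvectors_per_build)
{
  (void) nvectors_per_build;	/* Ignored by the old costing.  */
  unsigned n_perms = 0;
  for (unsigned m = 0; m < nmasks; ++m)
    ++n_perms;
  return n_perms;
}

/* Patched accounting: n_perms is bumped once per vector a mask is applied
   to.  NSTMTS may be zero at analysis time, so it is clamped to one, as
   the patch does for nvectors_per_build.  */
unsigned
count_perms_new (unsigned nmasks, unsigned nstmts)
{
  unsigned nvectors_per_build = nstmts > 0 ? nstmts : 1;
  unsigned n_perms = 0;
  for (unsigned m = 0; m < nmasks; ++m)
    for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
      ++n_perms;
  return n_perms;
}
```

With one mask and two vector statements, count_perms_old (1, 2) gives 1
while count_perms_new (1, 2) gives 2; with zero statements at analysis
time, count_perms_new (1, 0) still gives 1 because of the clamp.  The
inner loop is equivalent to adding nvectors_per_build to n_perms in one
step.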