From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-x12e.google.com (mail-lf1-x12e.google.com [IPv6:2a00:1450:4864:20::12e]) by sourceware.org (Postfix) with ESMTPS id 43A453858D38 for ; Tue, 23 May 2023 06:19:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 43A453858D38 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-lf1-x12e.google.com with SMTP id 2adb3069b0e04-4f3baf04f0cso2917526e87.1 for ; Mon, 22 May 2023 23:19:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1684822770; x=1687414770; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=OQPwN3960hHcbI3Jp79FbkBIvFpuqbiqklIf9sY0Owc=; b=KMEGUUTJJyBbesKKxCr8ay7TKyrgmRHmgfkcK8OjgtUBNtu2wU/q4+4LZCHx9BeW92 mlz+XMNwdwU1GAzOl29j0ClkpTJDfoxzOhEsM77ZeL1tnv5exRlcHIPWKE1MYEVZPSpr GNUxGy3d2XCZ45oDlcwyzgaMsJeJfoxVs3Qs+Qt1l92kq0JwJu5WNOCJZSKg+qTaIvUx SfVv7avmqm9SvaYwWvMIAp++rmNFFEjXB72PDxke2sL8lYzMf1lijVED1c/cUObD7Trm 4q7E/oruUPFVnzS15vw5NyMqmLzAS2Y+7NsakOThEfxEcLdw5FMnpvbn1JbmecTzRyvh wYNA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684822770; x=1687414770; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=OQPwN3960hHcbI3Jp79FbkBIvFpuqbiqklIf9sY0Owc=; b=RVUb8VviyPHg37u4EvUkW6ewFdSGkIMCxDESzk8UfBhA+rNGb6qC8PcDtCudi/bVwE FQ4NnO60OjxizNBiwlpDG+QdgYy0UUfdpg82CBXJ3LGW4hpeNAyRSftbyhhm1ug9fseI 9XiWkIhevC3EosxZw1JzNPTz96b0XiqmNzPIMNAVLTcThPbXWlx6OvCVcn89D8kOXEBs AO+krg7tSZA45qL4VVxLyorPTmOqDSqRpp++UNswYhvy4cmthXt547kmWCeiy78TuUki FrcrghilovsUFRiUPTknnklUN3uLPEEhn9T23dRfujy0GVRxvApKODz5sEkgDy1kxxF6 x/nQ== X-Gm-Message-State: AC+VfDz0cDfSxg+lsiAr6DP7uy8N33jd1CEOCXhzKxcF3kTunqO7uh2x sJODVyxqtUgixwTKe6Yrh+p/kwGfdYE3KMT29PI= X-Google-Smtp-Source: ACHHUZ4t5DhTM982/Q4hgRXUCRJKxeyeVbwivU7r5SoZAc7C+aVRPQtykE1ViwDgtthUMuwcZH0D+I7yijAb+onsl/g= X-Received: by 2002:ac2:5ec2:0:b0:4f3:94b5:3272 with SMTP id d2-20020ac25ec2000000b004f394b53272mr4072842lfq.11.1684822769338; Mon, 22 May 2023 23:19:29 -0700 (PDT) MIME-Version: 1.0 References: <72a5c5db-bc06-eded-d229-82af34342515@linux.ibm.com> <71fda837-6a92-7f74-43e1-90b046919f6a@linux.ibm.com> <06c5a418-ca90-b117-04b1-c3bef50ae28c@linux.ibm.com> In-Reply-To: <06c5a418-ca90-b117-04b1-c3bef50ae28c@linux.ibm.com> From: Richard Biener Date: Tue, 23 May 2023 08:19:17 +0200 Message-ID: Subject: Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1 To: "Kewen.Lin" Cc: GCC Patches , Richard Sandiford , Segher Boessenkool , Peter Bergner Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-6.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_NUMSUBJECT,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, May 23, 2023 at 5:01=E2=80=AFAM Kewen.Lin wro= te: > > Hi Richi, > > Thanks for the review! > > on 2023/5/22 21:44, Richard Biener wrote: > > On Wed, May 17, 2023 at 8:15=E2=80=AFAM Kewen.Lin = wrote: > >> > >> Hi, > >> > >> Following Richi's suggestion in [1], I'm working on deferring > >> cost evaluation next to the transformation, this patch is > >> to enhance function vect_transform_slp_perm_load_1 which > >> could under-cost for vector permutation, since the costing > >> doesn't try to consider nvectors_per_build, it's inconsistent > >> with the transformation part. > >> > >> Bootstrapped and regtested on x86_64-redhat-linux, > >> aarch64-linux-gnu and powerpc64{,le}-linux-gnu. > >> > >> Is it ok for trunk? > >> > >> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html > >> > >> BR, > >> Kewen > >> ----- > >> gcc/ChangeLog: > >> > >> * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust th= e > >> calculation on n_perms by considering nvectors_per_build. > >> > >> gcc/testsuite/ChangeLog: > >> > >> * gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test. > >> --- > >> .../vect/costmodel/ppc/costmodel-slp-perm.c | 23 +++++++ > >> gcc/tree-vect-slp.cc | 66 ++++++++++--------= - > >> 2 files changed, 57 insertions(+), 32 deletions(-) > >> create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-= slp-perm.c > >> > >> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-per= m.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c > >> new file mode 100644 > >> index 00000000000..e5c4dceddfb > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c > >> @@ -0,0 +1,23 @@ > >> +/* { dg-do compile } */ > >> +/* { dg-require-effective-target vect_int } */ > >> +/* { dg-require-effective-target powerpc_p9vector_ok } */ > >> +/* Specify power9 to ensure the vectorization is profitable > >> + and test point stands, otherwise it could be not profitable > >> + to vectorize. */ > >> +/* { dg-additional-options "-mdejagnu-cpu=3Dpower9 -mpower9-vector" }= */ > >> + > >> +/* Verify we cost the exact count for required vec_perm. */ > >> + > >> +int x[1024], y[1024]; > >> + > >> +void > >> +foo () > >> +{ > >> + for (int i =3D 0; i < 512; ++i) > >> + { > >> + x[2 * i] =3D y[1023 - (2 * i)]; > >> + x[2 * i + 1] =3D y[1023 - (2 * i + 1)]; > >> + } > >> +} > >> + > >> +/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } = */ > >> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > >> index e5c9d7e766e..af9a6dd4fa9 100644 > >> --- a/gcc/tree-vect-slp.cc > >> +++ b/gcc/tree-vect-slp.cc > >> @@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinf= o, slp_tree node, > >> > >> mode =3D TYPE_MODE (vectype); > >> poly_uint64 nunits =3D TYPE_VECTOR_SUBPARTS (vectype); > >> + unsigned int nstmts =3D SLP_TREE_NUMBER_OF_VEC_STMTS (node); > >> > >> /* Initialize the vect stmts of NODE to properly insert the generat= ed > >> stmts later. */ > >> if (! analyze_only) > >> - for (unsigned i =3D SLP_TREE_VEC_STMTS (node).length (); > >> - i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++) > >> + for (unsigned i =3D SLP_TREE_VEC_STMTS (node).length (); i < nstm= ts; i++) > >> SLP_TREE_VEC_STMTS (node).quick_push (NULL); > >> > >> /* Generate permutation masks for every NODE. Number of masks for e= ach NODE > >> @@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo= , slp_tree node, > >> (b) the permutes only need a single vector input. */ > >> mask.new_vector (nunits, group_size, 3); > >> nelts_to_build =3D mask.encoded_nelts (); > >> - nvectors_per_build =3D SLP_TREE_VEC_STMTS (node).length (); > >> + /* It's possible to obtain zero nstmts during analyze_only, so = make > >> + it at least one to ensure the later computation for n_perms > >> + proceed. */ > >> + nvectors_per_build =3D nstmts > 0 ? nstmts : 1; > >> in_nlanes =3D DR_GROUP_SIZE (stmt_info) * 3; > >> } > >> else > >> @@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinf= o, slp_tree node, > >> return false; > >> } > >> > >> - ++*n_perms; > >> - > >> + tree mask_vec =3D NULL_TREE; > >> if (!analyze_only) > >> - { > >> - tree mask_vec =3D vect_gen_perm_mask_checked (vectyp= e, indices); > >> + mask_vec =3D vect_gen_perm_mask_checked (vectype, indi= ces); > >> > >> - if (second_vec_index =3D=3D -1) > >> - second_vec_index =3D first_vec_index; > >> + if (second_vec_index =3D=3D -1) > >> + second_vec_index =3D first_vec_index; > >> > >> - for (unsigned int ri =3D 0; ri < nvectors_per_build;= ++ri) > >> + for (unsigned int ri =3D 0; ri < nvectors_per_build; ++r= i) > >> + { > >> + ++*n_perms; > > > > So the "real" change is doing > > > > *n_perms +=3D nvectors_per_build; > > > > and *n_perms was unused when !analyze_only? And since at > > Yes, although both !analyze_only and analyze_only calls pass n_perms in, = now > only the call sites with analyze_only will use the returned n_perms furth= er. > > > analysis time we (sometimes?) have zero nvectors you have to > > fixup above? Which cases are that? > > Yes, the fixup is to avoid to result in unexpected n_perms in function > vect_optimize_slp_pass::internal_node_cost=E3=80=82 One typical case is > gcc.dg/vect/bb-slp-50.c, without special casing zero, slp2 fails to optim= ize > out one more vec_perm unexpectedly. > > In vect_optimize_slp_pass::internal_node_cost, it checks if the returned = n_perms > is zero or not (vec_perm not needed or needed). > > if (!vect_transform_slp_perm_load_1 (m_vinfo, node, tmp_perm, vNULL= , > nullptr, vf, true, false, &n_p= erms)) > { > auto rep =3D SLP_TREE_REPRESENTATIVE (node); > if (out_layout_i =3D=3D 0) > { > /* Use the fallback cost if the load is an N-to-N permutati= on. > Otherwise assume that the node will be rejected later > and rebuilt from scalars. */ > if (STMT_VINFO_GROUPED_ACCESS (rep) > && (DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (rep)) > =3D=3D SLP_TREE_LANES (node))) > return fallback_cost; > return 0; > } > return -1; > } > > /* See the comment above the corresponding VEC_PERM_EXPR handling. = */ > return n_perms =3D=3D 0 ? 0 : 1; > > In vect_optimize_slp_pass::forward_pass (), it only considers the case th= at > factor > 0 (there is some vec_perm needed). > > /* Accumulate the cost of using LAYOUT_I within NODE, > both for the inputs and the outputs. */ > int factor =3D internal_node_cost (vertex.node, layout_i, > layout_i); > if (factor < 0) > { > is_possible =3D false; > break; > } > else if (factor) > layout_costs.internal_cost.add_serial_cost > ({ vertex.weight * factor, m_optimize_size }); Ah, OK - thanks for clarifying. The patch is OK. Richard. > BR, > Kewen > > > > > In principle the patch looks good to me. > > > > Richard. > > > >> + if (analyze_only) > >> + continue; > >> + /* Generate the permute statement if necessary. */ > >> + tree first_vec =3D dr_chain[first_vec_index + ri]; > >> + tree second_vec =3D dr_chain[second_vec_index + ri]; > >> + gassign *stmt =3D as_a (stmt_info->stmt); > >> + tree perm_dest > >> + =3D vect_create_destination_var (gimple_assign_lhs= (stmt), > >> + vectype); > >> + perm_dest =3D make_ssa_name (perm_dest); > >> + gimple *perm_stmt > >> + =3D gimple_build_assign (perm_dest, VEC_PERM_EXPR,= first_vec, > >> + second_vec, mask_vec); > >> + vect_finish_stmt_generation (vinfo, stmt_info, perm_= stmt, > >> + gsi); > >> + if (dce_chain) > >> { > >> - /* Generate the permute statement if necessary. = */ > >> - tree first_vec =3D dr_chain[first_vec_index + ri= ]; > >> - tree second_vec =3D dr_chain[second_vec_index + = ri]; > >> - gassign *stmt =3D as_a (stmt_info->st= mt); > >> - tree perm_dest > >> - =3D vect_create_destination_var (gimple_assign= _lhs (stmt), > >> - vectype); > >> - perm_dest =3D make_ssa_name (perm_dest); > >> - gimple *perm_stmt > >> - =3D gimple_build_assign (perm_dest, VEC_PERM_E= XPR, > >> - first_vec, second_vec, = mask_vec); > >> - vect_finish_stmt_generation (vinfo, stmt_info, p= erm_stmt, > >> - gsi); > >> - if (dce_chain) > >> - { > >> - bitmap_set_bit (used_defs, first_vec_index += ri); > >> - bitmap_set_bit (used_defs, second_vec_index = + ri); > >> - } > >> - > >> - /* Store the vector statement in NODE. */ > >> - SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++] > >> - =3D perm_stmt; > >> + bitmap_set_bit (used_defs, first_vec_index + ri)= ; > >> + bitmap_set_bit (used_defs, second_vec_index + ri= ); > >> } > >> + > >> + /* Store the vector statement in NODE. */ > >> + SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] =3D = perm_stmt; > >> } > >> } > >> else if (!analyze_only) > >> -- > >> 2.39.1 >