From: Richard Biener
Date: Wed, 27 Sep 2023 13:30:38 +0200
Subject: Re: [PATCH 10/10] vect: Consider vec_perm costing for VMAT_CONTIGUOUS_REVERSE
To: Kewen Lin
Cc: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
In-Reply-To: <7514680ad7b9b859a054ca1a59356f58b5ac9089.1694657495.git.linkw@linux.ibm.com>
References: <7514680ad7b9b859a054ca1a59356f58b5ac9089.1694657495.git.linkw@linux.ibm.com>

On Thu, Sep 14, 2023 at 5:12 AM Kewen Lin wrote:
>
> For VMAT_CONTIGUOUS_REVERSE, the transform code in function
> vectorizable_store generates a VEC_PERM_EXPR stmt before
> storing, but it's never considered in costing.
>
> This patch is to make it consider vec_perm in costing, it
> adjusts the order of transform code a bit to make it easy
> to early return for costing_p.

OK.

> gcc/ChangeLog:
>
>     * tree-vect-stmts.cc (vectorizable_store): Consider generated
>     VEC_PERM_EXPR stmt for VMAT_CONTIGUOUS_REVERSE in costing as
>     vec_perm.
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c: New test.
> ---
>  .../costmodel/ppc/costmodel-vect-store-2.c    | 29 +++++++++
>  gcc/tree-vect-stmts.cc                        | 63 +++++++++++--------
>  2 files changed, 65 insertions(+), 27 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c
> new file mode 100644
> index 00000000000..72b67cf9040
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-additional-options "-mvsx" } */
> +
> +/* Verify we do cost the required vec_perm. */
> +
> +int
> +foo (int *a, int *b, int len)
> +{
> +  int i;
> +  int *a1 = a;
> +  int *a0 = a1 - 4;
> +  for (i = 0; i < len; i++)
> +    {
> +      *b = *a0 + *a1;
> +      b--;
> +      a0++;
> +      a1++;
> +    }
> +  return 0;
> +}
> +
> +/* The reason why it doesn't check the exact count is that
> +   we can get more than 1 vec_perm when it's compiled with
> +   partial vector capability like Power10 (retrying for
> +   the epilogue) or it's complied without unaligned vector
> +   memory access support (realign). */
> +/* { dg-final { scan-tree-dump {\mvec_perm\M} "vect" } } */
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 3d451c80bca..ce925cc1d53 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -9279,6 +9279,40 @@ vectorizable_store (vec_info *vinfo,
>        stmt_vec_info next_stmt_info = first_stmt_info;
>        for (i = 0; i < vec_num; i++)
>          {
> +          if (!costing_p)
> +            {
> +              if (slp)
> +                vec_oprnd = vec_oprnds[i];
> +              else if (grouped_store)
> +                /* For grouped stores vectorized defs are interleaved in
> +                   vect_permute_store_chain(). */
> +                vec_oprnd = result_chain[i];
> +            }
> +
> +          if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
> +            {
> +              if (costing_p)
> +                inside_cost += record_stmt_cost (cost_vec, 1, vec_perm,
> +                                                 stmt_info, 0, vect_body);
> +              else
> +                {
> +                  tree perm_mask = perm_mask_for_reverse (vectype);
> +                  tree perm_dest = vect_create_destination_var (
> +                    vect_get_store_rhs (stmt_info), vectype);
> +                  tree new_temp = make_ssa_name (perm_dest);
> +
> +                  /* Generate the permute statement. */
> +                  gimple *perm_stmt
> +                    = gimple_build_assign (new_temp, VEC_PERM_EXPR, vec_oprnd,
> +                                           vec_oprnd, perm_mask);
> +                  vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> +                                               gsi);
> +
> +                  perm_stmt = SSA_NAME_DEF_STMT (new_temp);
> +                  vec_oprnd = new_temp;
> +                }
> +            }
> +
>            if (costing_p)
>              {
>                vect_get_store_cost (vinfo, stmt_info, 1,
> @@ -9294,8 +9328,6 @@ vectorizable_store (vec_info *vinfo,
>
>                continue;
>              }
> -          unsigned misalign;
> -          unsigned HOST_WIDE_INT align;
>
>            tree final_mask = NULL_TREE;
>            tree final_len = NULL_TREE;
> @@ -9315,13 +9347,8 @@ vectorizable_store (vec_info *vinfo,
>              dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, gsi,
>                                             stmt_info, bump);
>
> -          if (slp)
> -            vec_oprnd = vec_oprnds[i];
> -          else if (grouped_store)
> -            /* For grouped stores vectorized defs are interleaved in
> -               vect_permute_store_chain(). */
> -            vec_oprnd = result_chain[i];
> -
> +          unsigned misalign;
> +          unsigned HOST_WIDE_INT align;
>            align = known_alignment (DR_TARGET_ALIGNMENT (first_dr_info));
>            if (alignment_support_scheme == dr_aligned)
>              misalign = 0;
> @@ -9338,24 +9365,6 @@ vectorizable_store (vec_info *vinfo,
>                                      misalign);
>            align = least_bit_hwi (misalign | align);
>
> -          if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
> -            {
> -              tree perm_mask = perm_mask_for_reverse (vectype);
> -              tree perm_dest
> -                = vect_create_destination_var (vect_get_store_rhs (stmt_info),
> -                                               vectype);
> -              tree new_temp = make_ssa_name (perm_dest);
> -
> -              /* Generate the permute statement. */
> -              gimple *perm_stmt
> -                = gimple_build_assign (new_temp, VEC_PERM_EXPR, vec_oprnd,
> -                                       vec_oprnd, perm_mask);
> -              vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, gsi);
> -
> -              perm_stmt = SSA_NAME_DEF_STMT (new_temp);
> -              vec_oprnd = new_temp;
> -            }
> -
>            /* Compute IFN when LOOP_LENS or final_mask valid. */
>            machine_mode vmode = TYPE_MODE (vectype);
>            machine_mode new_vmode = vmode;
> --
> 2.31.1
>