From: Richard Sandiford
To: "Andre Vieira (lists) via Gcc-patches"
Cc: "Andre Vieira (lists)", Richard Biener
Subject: Re: [PATCH 2/3][vect] Consider outside costs earlier for epilogue loops
Date: Fri, 22 Oct 2021 16:33:45 +0100

"Andre Vieira (lists) via Gcc-patches" writes:
> Hi,
>
> This patch changes the order in which we
> check outside and inside costs
> for epilogue loops, this is to ensure that a predicated epilogue is more
> likely to be picked over an unpredicated one, since it saves having to
> enter a scalar epilogue loop.
>
> gcc/ChangeLog:
>
>         * tree-vect-loop.c (vect_better_loop_vinfo_p): Change how
> epilogue loop costs are compared.

OK, thanks.  Sorry for the slow review.

Richard

> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 14f8150d7c262b9422784e0e997ca4387664a20a..038af13a91d43c9f09186d042cf415020ea73a38 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2881,17 +2881,75 @@ vect_better_loop_vinfo_p (loop_vec_info new_loop_vinfo,
>          return new_simdlen_p;
>      }
>  
> +  loop_vec_info main_loop = LOOP_VINFO_ORIG_LOOP_INFO (old_loop_vinfo);
> +  if (main_loop)
> +    {
> +      poly_uint64 main_poly_vf = LOOP_VINFO_VECT_FACTOR (main_loop);
> +      unsigned HOST_WIDE_INT main_vf;
> +      unsigned HOST_WIDE_INT old_factor, new_factor, old_cost, new_cost;
> +      /* If we can determine how many iterations are left for the epilogue
> +         loop, that is if both the main loop's vectorization factor and number
> +         of iterations are constant, then we use them to calculate the cost of
> +         the epilogue loop together with a 'likely value' for the epilogues
> +         vectorization factor.  Otherwise we use the main loop's vectorization
> +         factor and the maximum poly value for the epilogue's.  If the target
> +         has not provided with a sensible upper bound poly vectorization
> +         factors are likely to be favored over constant ones.  */
> +      if (main_poly_vf.is_constant (&main_vf)
> +          && LOOP_VINFO_NITERS_KNOWN_P (main_loop))
> +        {
> +          unsigned HOST_WIDE_INT niters
> +            = LOOP_VINFO_INT_NITERS (main_loop) % main_vf;
> +          HOST_WIDE_INT old_likely_vf
> +            = estimated_poly_value (old_vf, POLY_VALUE_LIKELY);
> +          HOST_WIDE_INT new_likely_vf
> +            = estimated_poly_value (new_vf, POLY_VALUE_LIKELY);
> +
> +          /* If the epilogue is using partial vectors we account for the
> +             partial iteration here too.  */
> +          old_factor = niters / old_likely_vf;
> +          if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (old_loop_vinfo)
> +              && niters % old_likely_vf != 0)
> +            old_factor++;
> +
> +          new_factor = niters / new_likely_vf;
> +          if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (new_loop_vinfo)
> +              && niters % new_likely_vf != 0)
> +            new_factor++;
> +        }
> +      else
> +        {
> +          unsigned HOST_WIDE_INT main_vf_max
> +            = estimated_poly_value (main_poly_vf, POLY_VALUE_MAX);
> +
> +          old_factor = main_vf_max / estimated_poly_value (old_vf,
> +                                                           POLY_VALUE_MAX);
> +          new_factor = main_vf_max / estimated_poly_value (new_vf,
> +                                                           POLY_VALUE_MAX);
> +
> +          /* If the loop is not using partial vectors then it will iterate one
> +             time less than one that does.  It is safe to subtract one here,
> +             because the main loop's vf is always at least 2x bigger than that
> +             of an epilogue.  */
> +          if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (old_loop_vinfo))
> +            old_factor -= 1;
> +          if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (new_loop_vinfo))
> +            new_factor -= 1;
> +        }
> +
> +      /* Compute the costs by multiplying the inside costs with the factor and
> +         add the outside costs for a more complete picture.  The factor is the
> +         amount of times we are expecting to iterate this epilogue.  */
> +      old_cost = old_loop_vinfo->vec_inside_cost * old_factor;
> +      new_cost = new_loop_vinfo->vec_inside_cost * new_factor;
> +      old_cost += old_loop_vinfo->vec_outside_cost;
> +      new_cost += new_loop_vinfo->vec_outside_cost;
> +      return new_cost < old_cost;
> +    }
> +
>    /* Limit the VFs to what is likely to be the maximum number of iterations,
>       to handle cases in which at least one loop_vinfo is fully-masked.  */
> -  HOST_WIDE_INT estimated_max_niter;
> -  loop_vec_info main_loop = LOOP_VINFO_ORIG_LOOP_INFO (old_loop_vinfo);
> -  unsigned HOST_WIDE_INT main_vf;
> -  if (main_loop
> -      && LOOP_VINFO_NITERS_KNOWN_P (main_loop)
> -      && LOOP_VINFO_VECT_FACTOR (main_loop).is_constant (&main_vf))
> -    estimated_max_niter = LOOP_VINFO_INT_NITERS (main_loop) % main_vf;
> -  else
> -    estimated_max_niter = likely_max_stmt_executions_int (loop);
> +  HOST_WIDE_INT estimated_max_niter = likely_max_stmt_executions_int (loop);
>    if (estimated_max_niter != -1)
>      {
>        if (known_le (estimated_max_niter, new_vf))
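
For anyone following the thread, the known-iteration-count branch of the comparison quoted above can be sketched as standalone C. This is only a simplified illustration, not the GCC code: struct epilogue_candidate and the three function names are hypothetical stand-ins for the loop_vec_info fields the patch reads, and the vectorization factor is assumed to be a plain constant rather than a poly_uint64.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the loop_vec_info fields used by the patch.  */
struct epilogue_candidate
{
  unsigned long vf;            /* Likely vectorization factor (assumed constant).  */
  unsigned long inside_cost;   /* Cost of one vector iteration.  */
  unsigned long outside_cost;  /* Cost paid once, outside the loop body.  */
  bool partial_vectors;        /* True for a predicated epilogue.  */
};

/* Expected number of times the epilogue iterates when NITERS scalar
   iterations are left over from the main loop.  A predicated epilogue
   runs one extra, partial iteration to cover any remainder.  */
unsigned long
epilogue_factor (const struct epilogue_candidate *c, unsigned long niters)
{
  assert (c->vf != 0);
  unsigned long factor = niters / c->vf;
  if (c->partial_vectors && niters % c->vf != 0)
    factor++;
  return factor;
}

/* Inside cost scaled by the expected iteration count, plus outside cost.  */
unsigned long
epilogue_cost (const struct epilogue_candidate *c, unsigned long niters)
{
  return c->inside_cost * epilogue_factor (c, niters) + c->outside_cost;
}

/* Mirrors the patch's final "return new_cost < old_cost;".  */
bool
new_epilogue_is_better (const struct epilogue_candidate *new_c,
                        const struct epilogue_candidate *old_c,
                        unsigned long niters)
{
  return epilogue_cost (new_c, niters) < epilogue_cost (old_c, niters);
}
```

With made-up numbers, say 7 leftover iterations, an unpredicated candidate { 4, 10, 20, false } costs 10 * 1 + 20 = 30, while a predicated candidate { 4, 12, 2, true } costs 12 * 2 + 2 = 26, so the predicated one is preferred once the outside costs are part of the total.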