From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-267194-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 17982 invoked by alias); 2 Jul 2010 16:42:20 -0000
Received: (qmail 17972 invoked by uid 22791); 2 Jul 2010 16:42:19 -0000
X-SWARE-Spam-Status: No, hits=-1.8 required=5.0	tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE
X-Spam-Check-By: sourceware.org
Received: from mail-vw0-f47.google.com (HELO mail-vw0-f47.google.com) (209.85.212.47)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 02 Jul 2010 16:42:11 +0000
Received: by vws3 with SMTP id 3so320444vws.20        for <gcc-patches@gcc.gnu.org>; Fri, 02 Jul 2010 09:42:09 -0700 (PDT)
Received: by 10.224.104.220 with SMTP id q28mr558261qao.371.1278088929246; 	Fri, 02 Jul 2010 09:42:09 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.224.28.82 with HTTP; Fri, 2 Jul 2010 09:41:38 -0700 (PDT)
In-Reply-To: <AANLkTimL0po55_PnkbcCe2lbJNlaeWmcLeRqy48kTBen@mail.gmail.com>
References: <D4C76825A6780047854A11E93CDE84D02F7759@SAUSEXMBP01.amd.com> 	<AANLkTimL0po55_PnkbcCe2lbJNlaeWmcLeRqy48kTBen@mail.gmail.com>
From: Sebastian Pop <sebpop@gmail.com>
Date: Fri, 02 Jul 2010 16:42:00 -0000
Message-ID: <AANLkTindf030Kg9A63pYVJQ4sKLbMVcmjATeoq3oR9X1@mail.gmail.com>
Subject: Re: [Patch PR 44576]: Reduce the computation cost in 	compute_miss_rate for prefetching loop arrays
To: Richard Guenther <richard.guenther@gmail.com>
Cc: "Fang, Changpeng" <Changpeng.Fang@amd.com>, Zdenek Dvorak <rakdver@kam.mff.cuni.cz>, 	Christian Borntraeger <borntraeger@de.ibm.com>, 	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, "uweigand@de.ibm.com" <uweigand@de.ibm.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2010-07/txt/msg00190.txt.bz2

On Wed, Jun 30, 2010 at 03:52, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Jun 30, 2010 at 2:06 AM, Fang, Changpeng <Changpeng.Fang@amd.com>=
 wrote:
>> Hi,
>>
>> Attached is the patch that partially fixes bug 44576: =A0testsuite/gfort=
ran.dg/zero_sized_1.f90 with huge compile
>> time on prefetching + peeling.
>>
>> The cost of compute_miss_rate in tree-ssa-loop-prefetch.c is found to be=
 very high, and it results in an exploding
>> compilation time when there are thousands of memory references in the lo=
op.
>>
>> compute_miss_rate is to compute the possibility (miss_rate) that two mem=
ory references fall on difference cache
>> lines. We will insert prefetches for both references if the computed mis=
s rate is greater than the experimentally
>> determined ACCEPTABLE_MISS_RATE.
>>
>> In this patch, compute_miss_rate is renamed as is_miss_rate_acceptable, =
and does an early return if the computed
>> miss rate is already greater than the ACCEPTABLE_MISS_RATE, and thus sig=
nificantly reduces the cost for such
>> case. Note that the worst case cost is not affected by this patch.
>>
>> This patch reduces the compile time of the test case from 1m20'' to 1m03=
'' on an amd-linux64 system.
>> Note that without -fprefetching-loop-arrays, the compile time on the sam=
e system is 0m30''. The extra 33'' is mostly
>> spent in "compute_all_dependences (datarefs, &dependences, vloops, true)=
". =A0I am not sure whether the cost in
>> =A0dependence analysis for loop should be that high (with several thousa=
nds memory references in loop) or there
>> is a bug in dependence computation.
>
> Well, it seems that you compute the same information over-and-over
> by doing
>
> unsigned int
> tree_ssa_prefetch_arrays (void)
> {
> ...
> =A0FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST)
> =A0 =A0{
> =A0 =A0 =A0if (dump_file && (dump_flags & TDF_DETAILS))
> =A0 =A0 =A0 =A0fprintf (dump_file, "Processing loop %d:\n", loop->num);
>
> =A0 =A0 =A0unrolled |=3D loop_prefetch_arrays (loop);
>
> static bool
> loop_prefetch_arrays (struct loop *loop)
> {
> ...
> =A0determine_loop_nest_reuse (loop, refs, no_other_refs);
>
> static void
> determine_loop_nest_reuse (struct loop *loop, struct mem_ref_group *refs,
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 bool no_other_refs)
> {
> ...
> =A0if (loop->inner)
> =A0 =A0return;
> ...
> =A0/* Find the outermost loop of the loop nest of loop (we require that
> =A0 =A0 there are no sibling loops inside the nest). =A0*/
> =A0nest =3D loop;
> =A0while (1)
> =A0 =A0{
> =A0 =A0 =A0aloop =3D loop_outer (nest);
>
> =A0 =A0 =A0if (aloop =3D=3D current_loops->tree_root
> =A0 =A0 =A0 =A0 =A0|| aloop->inner->next)
> =A0 =A0 =A0 =A0break;
>
> =A0 =A0 =A0nest =3D aloop;
> =A0 =A0}
> ...
> =A0compute_all_dependences (datarefs, &dependences, vloops, true);
>
>
> In fact you should iterate _only_ over innermost loops like
>
> =A0FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
>
> does that make a difference?
>
>>
>> The patch passed Bootstrapping and regression tests.
>>
>> Is this patch OK to commit?
>
> Ok.
>

Committed r161728