From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 17982 invoked by alias); 2 Jul 2010 16:42:20 -0000 Received: (qmail 17972 invoked by uid 22791); 2 Jul 2010 16:42:19 -0000 X-SWARE-Spam-Status: No, hits=-1.8 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE X-Spam-Check-By: sourceware.org Received: from mail-vw0-f47.google.com (HELO mail-vw0-f47.google.com) (209.85.212.47) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 02 Jul 2010 16:42:11 +0000 Received: by vws3 with SMTP id 3so320444vws.20 for ; Fri, 02 Jul 2010 09:42:09 -0700 (PDT) Received: by 10.224.104.220 with SMTP id q28mr558261qao.371.1278088929246; Fri, 02 Jul 2010 09:42:09 -0700 (PDT) MIME-Version: 1.0 Received: by 10.224.28.82 with HTTP; Fri, 2 Jul 2010 09:41:38 -0700 (PDT) In-Reply-To: References: From: Sebastian Pop Date: Fri, 02 Jul 2010 16:42:00 -0000 Message-ID: Subject: Re: [Patch PR 44576]: Reduce the computation cost in compute_miss_rate for prefetching loop arrays To: Richard Guenther Cc: "Fang, Changpeng" , Zdenek Dvorak , Christian Borntraeger , "gcc-patches@gcc.gnu.org" , "uweigand@de.ibm.com" Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2010-07/txt/msg00190.txt.bz2 On Wed, Jun 30, 2010 at 03:52, Richard Guenther wrote: > On Wed, Jun 30, 2010 at 2:06 AM, Fang, Changpeng = wrote: >> Hi, >> >> Attached is the patch that partially fixes bug 44576: =A0testsuite/gfort= ran.dg/zero_sized_1.f90 with huge compile >> time on prefetching + peeling. >> >> The cost of compute_miss_rate in tree-ssa-loop-prefetch.c is found to be= very high, and it results in an exploding >> compilation time when there are thousands of memory references in the lo= op. >> >> compute_miss_rate is to compute the possibility (miss_rate) that two mem= ory references fall on difference cache >> lines. We will insert prefetches for both references if the computed mis= s rate is greater than the experimentally >> determined ACCEPTABLE_MISS_RATE. >> >> In this patch, compute_miss_rate is renamed as is_miss_rate_acceptable, = and does an early return if the computed >> miss rate is already greater than the ACCEPTABLE_MISS_RATE, and thus sig= nificantly reduces the cost for such >> case. Note that the worst case cost is not affected by this patch. >> >> This patch reduces the compile time of the test case from 1m20'' to 1m03= '' on an amd-linux64 system. >> Note that without -fprefetching-loop-arrays, the compile time on the sam= e system is 0m30''. The extra 33'' is mostly >> spent in "compute_all_dependences (datarefs, &dependences, vloops, true)= ". =A0I am not sure whether the cost in >> =A0dependence analysis for loop should be that high (with several thousa= nds memory references in loop) or there >> is a bug in dependence computation. > > Well, it seems that you compute the same information over-and-over > by doing > > unsigned int > tree_ssa_prefetch_arrays (void) > { > ... > =A0FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST) > =A0 =A0{ > =A0 =A0 =A0if (dump_file && (dump_flags & TDF_DETAILS)) > =A0 =A0 =A0 =A0fprintf (dump_file, "Processing loop %d:\n", loop->num); > > =A0 =A0 =A0unrolled |=3D loop_prefetch_arrays (loop); > > static bool > loop_prefetch_arrays (struct loop *loop) > { > ... > =A0determine_loop_nest_reuse (loop, refs, no_other_refs); > > static void > determine_loop_nest_reuse (struct loop *loop, struct mem_ref_group *refs, > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 bool no_other_refs) > { > ... > =A0if (loop->inner) > =A0 =A0return; > ... > =A0/* Find the outermost loop of the loop nest of loop (we require that > =A0 =A0 there are no sibling loops inside the nest). =A0*/ > =A0nest =3D loop; > =A0while (1) > =A0 =A0{ > =A0 =A0 =A0aloop =3D loop_outer (nest); > > =A0 =A0 =A0if (aloop =3D=3D current_loops->tree_root > =A0 =A0 =A0 =A0 =A0|| aloop->inner->next) > =A0 =A0 =A0 =A0break; > > =A0 =A0 =A0nest =3D aloop; > =A0 =A0} > ... > =A0compute_all_dependences (datarefs, &dependences, vloops, true); > > > In fact you should iterate _only_ over innermost loops like > > =A0FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST) > > does that make a difference? > >> >> The patch passed Bootstrapping and regression tests. >> >> Is this patch OK to commit? > > Ok. > Committed r161728