From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 13721 invoked by alias); 8 Jun 2011 09:55:28 -0000 Received: (qmail 13713 invoked by uid 22791); 8 Jun 2011 09:55:27 -0000 X-SWARE-Spam-Status: No, hits=-1.0 required=5.0 tests=AWL,BAYES_50,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,RFC_ABUSE_POST,TW_CB X-Spam-Check-By: sourceware.org Received: from mail-wy0-f175.google.com (HELO mail-wy0-f175.google.com) (74.125.82.175) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 08 Jun 2011 09:55:13 +0000 Received: by wye20 with SMTP id 20so253067wye.20 for ; Wed, 08 Jun 2011 02:55:11 -0700 (PDT) MIME-Version: 1.0 Received: by 10.227.163.76 with SMTP id z12mr907880wbx.75.1307526911564; Wed, 08 Jun 2011 02:55:11 -0700 (PDT) Received: by 10.227.37.152 with HTTP; Wed, 8 Jun 2011 02:55:11 -0700 (PDT) In-Reply-To: <4DEF4408.4040001@codesourcery.com> References: <4DEF4408.4040001@codesourcery.com> Date: Wed, 08 Jun 2011 10:09:00 -0000 Message-ID: Subject: Re: [PATCH, PR43864] Gimple level duplicate block cleanup. From: Richard Guenther To: Tom de Vries Cc: Steven Bosscher , gcc-patches@gcc.gnu.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2011-06/txt/msg00629.txt.bz2 On Wed, Jun 8, 2011 at 11:42 AM, Tom de Vries wrot= e: > Hi Richard, > > I have a patch for PR43864. The patch adds a gimple level duplicate block > cleanup. The patch has been bootstrapped and reg-tested on x86_64, and > reg-tested on ARM. The size impact on ARM for spec2000 is shown in the fo= llowing > table (%, lower is better). > > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 none =A0 =A0 =A0 =A0 =A0 =A0pic > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0thumb1 =A0thumb2 =A0thumb1 thumb2 > spec2000 =A0 =A0 =A0 =A0 =A099.9 =A0 =A099.9 =A0 =A099.8 =A0 99.8 > > PR43864 is currently marked as a duplicate of PR20070, but I'm not sure t= hat the > optimizations proposed in PR20070 would fix this PR. > > The problem in this PR is that when compiling with -O2, the example below= should > only have one call to free. The original problem is formulated in terms o= f -Os, > but currently we generate one call to free with -Os, although still not t= he > smallest code possible. I'll show here the -O2 case, since that's similar= to the > original PR. > > #include > void foo (char*, FILE*); > char* hprofStartupp(char *outputFileName, char *ctx) > { > =A0 =A0char fileName[1000]; > =A0 =A0FILE *fp; > =A0 =A0sprintf(fileName, outputFileName); > =A0 =A0if (access(fileName, 1) =3D=3D 0) { > =A0 =A0 =A0 =A0free(ctx); > =A0 =A0 =A0 =A0return 0; > =A0 =A0} > > =A0 =A0fp =3D fopen(fileName, 0); > =A0 =A0if (fp =3D=3D 0) { > =A0 =A0 =A0 =A0free(ctx); > =A0 =A0 =A0 =A0return 0; > =A0 =A0} > > =A0 =A0foo(outputFileName, fp); > > =A0 =A0return ctx; > } > > AFAIU, there are 2 complementary methods of rtl optimizations proposed in= PR20070. > - Merging 2 blocks which are identical expect for input registers, by usi= ng a > =A0conditional move to choose between the different input registers. > - Merging 2 blocks which have different local registers, by ignoring those > =A0differences > > Blocks .L6 and.L7 have no difference in local registers, but they have a > difference in input registers: r3 and r1. Replacing the move to r5 by a > conditional move would probably be benificial in terms of size, but it's = not > clear what condition the conditional move should be using. Calculating su= ch a > condition would add in size and increase the execution path. > > gcc -O2 -march=3Darmv7-a -mthumb pr43864.c -S: > ... > =A0 =A0 =A0 =A0push =A0 =A0{r4, r5, lr} > =A0 =A0 =A0 =A0mov =A0 =A0 r4, r0 > =A0 =A0 =A0 =A0sub =A0 =A0 sp, sp, #1004 > =A0 =A0 =A0 =A0mov =A0 =A0 r5, r1 > =A0 =A0 =A0 =A0mov =A0 =A0 r0, sp > =A0 =A0 =A0 =A0mov =A0 =A0 r1, r4 > =A0 =A0 =A0 =A0bl =A0 =A0 =A0sprintf > =A0 =A0 =A0 =A0mov =A0 =A0 r0, sp > =A0 =A0 =A0 =A0movs =A0 =A0r1, #1 > =A0 =A0 =A0 =A0bl =A0 =A0 =A0access > =A0 =A0 =A0 =A0mov =A0 =A0 r3, r0 > =A0 =A0 =A0 =A0cbz =A0 =A0 r0, .L6 > =A0 =A0 =A0 =A0movs =A0 =A0r1, #0 > =A0 =A0 =A0 =A0mov =A0 =A0 r0, sp > =A0 =A0 =A0 =A0bl =A0 =A0 =A0fopen > =A0 =A0 =A0 =A0mov =A0 =A0 r1, r0 > =A0 =A0 =A0 =A0cbz =A0 =A0 r0, .L7 > =A0 =A0 =A0 =A0mov =A0 =A0 r0, r4 > =A0 =A0 =A0 =A0bl =A0 =A0 =A0foo > .L3: > =A0 =A0 =A0 =A0mov =A0 =A0 r0, r5 > =A0 =A0 =A0 =A0add =A0 =A0 sp, sp, #1004 > =A0 =A0 =A0 =A0pop =A0 =A0 {r4, r5, pc} > .L6: > =A0 =A0 =A0 =A0mov =A0 =A0 r0, r5 > =A0 =A0 =A0 =A0mov =A0 =A0 r5, r3 > =A0 =A0 =A0 =A0bl =A0 =A0 =A0free > =A0 =A0 =A0 =A0b =A0 =A0 =A0 .L3 > .L7: > =A0 =A0 =A0 =A0mov =A0 =A0 r0, r5 > =A0 =A0 =A0 =A0mov =A0 =A0 r5, r1 > =A0 =A0 =A0 =A0bl =A0 =A0 =A0free > =A0 =A0 =A0 =A0b =A0 =A0 =A0 .L3 > ... > > The proposed patch solved the problem by dealing with the 2 blocks at a l= evel > when they are still identical: at gimple level. It detect that the 2 bloc= ks are > identical, and removes one of them. > > The following table shows the impact of the patch on the example in terms= of > size for -march=3Darmv7-a: > > =A0 =A0 =A0 =A0 =A0without =A0 =A0 with =A0 =A0delta > Os =A0 =A0 =A0: =A0 =A0 108 =A0 =A0 =A0104 =A0 =A0 =A0 -4 > O2 =A0 =A0 =A0: =A0 =A0 120 =A0 =A0 =A0104 =A0 =A0 =A0-16 > Os thumb: =A0 =A0 =A068 =A0 =A0 =A0 64 =A0 =A0 =A0 -4 > O2 thumb: =A0 =A0 =A076 =A0 =A0 =A0 64 =A0 =A0 =A0-12 > > The gain in size for -O2 is that of removing the entire block, plus the > replacement of 2 moves by a constant set, which also decreases the execut= ion > path. The patch ensures optimal code for both -O2 and -Os. > > > By keeping track of equivalent definitions in the 2 blocks, we can ignore= those > differences in comparison. Without this feature, we would only match bloc= ks with > resultless operations, due the the ssa-nature of gimples. > For example, with this feature, we reduce the following function to its m= inimum > at gimple level, rather than at rtl level. > > int f(int c, int b, int d) > { > =A0int r, e; > > =A0if (c) > =A0 =A0r =3D b + d; > =A0else > =A0 =A0{ > =A0 =A0 =A0e =3D b + d; > =A0 =A0 =A0r =3D e; > =A0 =A0} > > =A0return r; > } > > ;; Function f (f) > > f (int c, int b, int d) > { > =A0int e; > > : > =A0e_6 =3D b_3(D) + d_4(D); > =A0return e_6; > > } > > I'll send the patch with the testcases in a separate email. > > OK for trunk? I don't like that you hook this into cleanup_tree_cfg - that is called _way_ too often. This also duplicates the literal matching done on the RTL level - instead I think this optimization would be more related to value-numbering (either that of SCCVN/FRE/PRE or that of DOM which also does jump-threading). Richard. > Thanks, > - Tom > > 2011-06-08 =A0Tom de Vries =A0 > > =A0 =A0 =A0 =A0PR middle-end/43864 > =A0 =A0 =A0 =A0* tree-cfgcleanup.c (int_int_splay_lookup, int_int_splay_i= nsert) > =A0 =A0 =A0 =A0(int_int_splay_node_contained_in, int_int_splay_contained_= in) > =A0 =A0 =A0 =A0(equiv_lookup, equiv_insert, equiv_contained_in, equiv_ini= t) > =A0 =A0 =A0 =A0(equiv_delete, gimple_base_equal_p, pt_solution_equal_p, g= imple_equal_p) > =A0 =A0 =A0 =A0(bb_gimple_equal_p, update_debug_stmts, cleanup_duplicate_= preds_1) > =A0 =A0 =A0 =A0(same_or_local_phi_alternatives, cleanup_duplicate_preds):= New function. > =A0 =A0 =A0 =A0(cleanup_tree_cfg_bb): Use cleanup_duplicate_preds. >