From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 12186 invoked by alias); 8 Jun 2011 10:36:46 -0000 Received: (qmail 12033 invoked by uid 22791); 8 Jun 2011 10:36:44 -0000 X-SWARE-Spam-Status: No, hits=-2.2 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,RFC_ABUSE_POST,TW_CB X-Spam-Check-By: sourceware.org Received: from mail-gw0-f47.google.com (HELO mail-gw0-f47.google.com) (74.125.83.47) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 08 Jun 2011 10:36:27 +0000 Received: by gwb11 with SMTP id 11so129720gwb.20 for ; Wed, 08 Jun 2011 03:36:26 -0700 (PDT) MIME-Version: 1.0 Received: by 10.101.196.24 with SMTP id y24mr6098946anp.154.1307529386238; Wed, 08 Jun 2011 03:36:26 -0700 (PDT) Received: by 10.100.250.17 with HTTP; Wed, 8 Jun 2011 03:36:26 -0700 (PDT) In-Reply-To: References: <4DEF4408.4040001@codesourcery.com> Date: Wed, 08 Jun 2011 10:40:00 -0000 Message-ID: Subject: Re: [PATCH, PR43864] Gimple level duplicate block cleanup. From: Steven Bosscher To: Richard Guenther Cc: Tom de Vries , gcc-patches@gcc.gnu.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2011-06/txt/msg00636.txt.bz2 On Wed, Jun 8, 2011 at 11:55 AM, Richard Guenther wrote: > On Wed, Jun 8, 2011 at 11:42 AM, Tom de Vries wr= ote: >> Hi Richard, >> >> I have a patch for PR43864. The patch adds a gimple level duplicate block >> cleanup. The patch has been bootstrapped and reg-tested on x86_64, and >> reg-tested on ARM. The size impact on ARM for spec2000 is shown in the f= ollowing >> table (%, lower is better). >> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 none =A0 =A0 =A0 =A0 =A0 =A0pic >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0thumb1 =A0thumb2 =A0thumb1 thumb2 >> spec2000 =A0 =A0 =A0 =A0 =A099.9 =A0 =A099.9 =A0 =A099.8 =A0 99.8 >> >> PR43864 is currently marked as a duplicate of PR20070, but I'm not sure = that the >> optimizations proposed in PR20070 would fix this PR. >> >> The problem in this PR is that when compiling with -O2, the example belo= w should >> only have one call to free. The original problem is formulated in terms = of -Os, >> but currently we generate one call to free with -Os, although still not = the >> smallest code possible. I'll show here the -O2 case, since that's simila= r to the >> original PR. >> >> #include >> void foo (char*, FILE*); >> char* hprofStartupp(char *outputFileName, char *ctx) >> { >> =A0 =A0char fileName[1000]; >> =A0 =A0FILE *fp; >> =A0 =A0sprintf(fileName, outputFileName); >> =A0 =A0if (access(fileName, 1) =3D=3D 0) { >> =A0 =A0 =A0 =A0free(ctx); >> =A0 =A0 =A0 =A0return 0; >> =A0 =A0} >> >> =A0 =A0fp =3D fopen(fileName, 0); >> =A0 =A0if (fp =3D=3D 0) { >> =A0 =A0 =A0 =A0free(ctx); >> =A0 =A0 =A0 =A0return 0; >> =A0 =A0} >> >> =A0 =A0foo(outputFileName, fp); >> >> =A0 =A0return ctx; >> } >> >> AFAIU, there are 2 complementary methods of rtl optimizations proposed i= n PR20070. >> - Merging 2 blocks which are identical expect for input registers, by us= ing a >> =A0conditional move to choose between the different input registers. >> - Merging 2 blocks which have different local registers, by ignoring tho= se >> =A0differences >> >> Blocks .L6 and.L7 have no difference in local registers, but they have a >> difference in input registers: r3 and r1. Replacing the move to r5 by a >> conditional move would probably be benificial in terms of size, but it's= not >> clear what condition the conditional move should be using. Calculating s= uch a >> condition would add in size and increase the execution path. >> >> gcc -O2 -march=3Darmv7-a -mthumb pr43864.c -S: >> ... >> =A0 =A0 =A0 =A0push =A0 =A0{r4, r5, lr} >> =A0 =A0 =A0 =A0mov =A0 =A0 r4, r0 >> =A0 =A0 =A0 =A0sub =A0 =A0 sp, sp, #1004 >> =A0 =A0 =A0 =A0mov =A0 =A0 r5, r1 >> =A0 =A0 =A0 =A0mov =A0 =A0 r0, sp >> =A0 =A0 =A0 =A0mov =A0 =A0 r1, r4 >> =A0 =A0 =A0 =A0bl =A0 =A0 =A0sprintf >> =A0 =A0 =A0 =A0mov =A0 =A0 r0, sp >> =A0 =A0 =A0 =A0movs =A0 =A0r1, #1 >> =A0 =A0 =A0 =A0bl =A0 =A0 =A0access >> =A0 =A0 =A0 =A0mov =A0 =A0 r3, r0 >> =A0 =A0 =A0 =A0cbz =A0 =A0 r0, .L6 >> =A0 =A0 =A0 =A0movs =A0 =A0r1, #0 >> =A0 =A0 =A0 =A0mov =A0 =A0 r0, sp >> =A0 =A0 =A0 =A0bl =A0 =A0 =A0fopen >> =A0 =A0 =A0 =A0mov =A0 =A0 r1, r0 >> =A0 =A0 =A0 =A0cbz =A0 =A0 r0, .L7 >> =A0 =A0 =A0 =A0mov =A0 =A0 r0, r4 >> =A0 =A0 =A0 =A0bl =A0 =A0 =A0foo >> .L3: >> =A0 =A0 =A0 =A0mov =A0 =A0 r0, r5 >> =A0 =A0 =A0 =A0add =A0 =A0 sp, sp, #1004 >> =A0 =A0 =A0 =A0pop =A0 =A0 {r4, r5, pc} >> .L6: >> =A0 =A0 =A0 =A0mov =A0 =A0 r0, r5 >> =A0 =A0 =A0 =A0mov =A0 =A0 r5, r3 >> =A0 =A0 =A0 =A0bl =A0 =A0 =A0free >> =A0 =A0 =A0 =A0b =A0 =A0 =A0 .L3 >> .L7: >> =A0 =A0 =A0 =A0mov =A0 =A0 r0, r5 >> =A0 =A0 =A0 =A0mov =A0 =A0 r5, r1 >> =A0 =A0 =A0 =A0bl =A0 =A0 =A0free >> =A0 =A0 =A0 =A0b =A0 =A0 =A0 .L3 >> ... >> >> The proposed patch solved the problem by dealing with the 2 blocks at a = level >> when they are still identical: at gimple level. It detect that the 2 blo= cks are >> identical, and removes one of them. >> >> The following table shows the impact of the patch on the example in term= s of >> size for -march=3Darmv7-a: >> >> =A0 =A0 =A0 =A0 =A0without =A0 =A0 with =A0 =A0delta >> Os =A0 =A0 =A0: =A0 =A0 108 =A0 =A0 =A0104 =A0 =A0 =A0 -4 >> O2 =A0 =A0 =A0: =A0 =A0 120 =A0 =A0 =A0104 =A0 =A0 =A0-16 >> Os thumb: =A0 =A0 =A068 =A0 =A0 =A0 64 =A0 =A0 =A0 -4 >> O2 thumb: =A0 =A0 =A076 =A0 =A0 =A0 64 =A0 =A0 =A0-12 >> >> The gain in size for -O2 is that of removing the entire block, plus the >> replacement of 2 moves by a constant set, which also decreases the execu= tion >> path. The patch ensures optimal code for both -O2 and -Os. >> >> >> By keeping track of equivalent definitions in the 2 blocks, we can ignor= e those >> differences in comparison. Without this feature, we would only match blo= cks with >> resultless operations, due the the ssa-nature of gimples. >> For example, with this feature, we reduce the following function to its = minimum >> at gimple level, rather than at rtl level. >> >> int f(int c, int b, int d) >> { >> =A0int r, e; >> >> =A0if (c) >> =A0 =A0r =3D b + d; >> =A0else >> =A0 =A0{ >> =A0 =A0 =A0e =3D b + d; >> =A0 =A0 =A0r =3D e; >> =A0 =A0} >> >> =A0return r; >> } >> >> ;; Function f (f) >> >> f (int c, int b, int d) >> { >> =A0int e; >> >> : >> =A0e_6 =3D b_3(D) + d_4(D); >> =A0return e_6; >> >> } >> >> I'll send the patch with the testcases in a separate email. >> >> OK for trunk? > > I don't like that you hook this into cleanup_tree_cfg - that is called > _way_ too often. > > This also duplicates the literal matching done on the RTL level - instead > I think this optimization would be more related to value-numbering > (either that of SCCVN/FRE/PRE or that of DOM which also does > jump-threading). And while at it: Put it in a separate file ;-) Ciao! Steven