From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-293951-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 12186 invoked by alias); 8 Jun 2011 10:36:46 -0000
Received: (qmail 12033 invoked by uid 22791); 8 Jun 2011 10:36:44 -0000
X-SWARE-Spam-Status: No, hits=-2.2 required=5.0	tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,RFC_ABUSE_POST,TW_CB
X-Spam-Check-By: sourceware.org
Received: from mail-gw0-f47.google.com (HELO mail-gw0-f47.google.com) (74.125.83.47)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 08 Jun 2011 10:36:27 +0000
Received: by gwb11 with SMTP id 11so129720gwb.20        for <gcc-patches@gcc.gnu.org>; Wed, 08 Jun 2011 03:36:26 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.101.196.24 with SMTP id y24mr6098946anp.154.1307529386238; Wed, 08 Jun 2011 03:36:26 -0700 (PDT)
Received: by 10.100.250.17 with HTTP; Wed, 8 Jun 2011 03:36:26 -0700 (PDT)
In-Reply-To: <BANLkTimAvGy1jRniWftbEvLnGwEN_0Ma8Q@mail.gmail.com>
References: <4DEF4408.4040001@codesourcery.com>	<BANLkTimAvGy1jRniWftbEvLnGwEN_0Ma8Q@mail.gmail.com>
Date: Wed, 08 Jun 2011 10:40:00 -0000
Message-ID: <BANLkTikMua4WTn1Nn79H8PndQKn=HMqGtQ@mail.gmail.com>
Subject: Re: [PATCH, PR43864] Gimple level duplicate block cleanup.
From: Steven Bosscher <stevenb.gcc@gmail.com>
To: Richard Guenther <richard.guenther@gmail.com>
Cc: Tom de Vries <vries@codesourcery.com>, gcc-patches@gcc.gnu.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2011-06/txt/msg00636.txt.bz2

On Wed, Jun 8, 2011 at 11:55 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Jun 8, 2011 at 11:42 AM, Tom de Vries <vries@codesourcery.com> wr=
ote:
>> Hi Richard,
>>
>> I have a patch for PR43864. The patch adds a gimple level duplicate block
>> cleanup. The patch has been bootstrapped and reg-tested on x86_64, and
>> reg-tested on ARM. The size impact on ARM for spec2000 is shown in the f=
ollowing
>> table (%, lower is better).
>>
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 none =A0 =A0 =A0 =A0 =A0 =A0pic
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0thumb1 =A0thumb2 =A0thumb1 thumb2
>> spec2000 =A0 =A0 =A0 =A0 =A099.9 =A0 =A099.9 =A0 =A099.8 =A0 99.8
>>
>> PR43864 is currently marked as a duplicate of PR20070, but I'm not sure =
that the
>> optimizations proposed in PR20070 would fix this PR.
>>
>> The problem in this PR is that when compiling with -O2, the example belo=
w should
>> only have one call to free. The original problem is formulated in terms =
of -Os,
>> but currently we generate one call to free with -Os, although still not =
the
>> smallest code possible. I'll show here the -O2 case, since that's simila=
r to the
>> original PR.
>>
>> #include <stdio.h>
>> void foo (char*, FILE*);
>> char* hprofStartupp(char *outputFileName, char *ctx)
>> {
>> =A0 =A0char fileName[1000];
>> =A0 =A0FILE *fp;
>> =A0 =A0sprintf(fileName, outputFileName);
>> =A0 =A0if (access(fileName, 1) =3D=3D 0) {
>> =A0 =A0 =A0 =A0free(ctx);
>> =A0 =A0 =A0 =A0return 0;
>> =A0 =A0}
>>
>> =A0 =A0fp =3D fopen(fileName, 0);
>> =A0 =A0if (fp =3D=3D 0) {
>> =A0 =A0 =A0 =A0free(ctx);
>> =A0 =A0 =A0 =A0return 0;
>> =A0 =A0}
>>
>> =A0 =A0foo(outputFileName, fp);
>>
>> =A0 =A0return ctx;
>> }
>>
>> AFAIU, there are 2 complementary methods of rtl optimizations proposed i=
n PR20070.
>> - Merging 2 blocks which are identical expect for input registers, by us=
ing a
>> =A0conditional move to choose between the different input registers.
>> - Merging 2 blocks which have different local registers, by ignoring tho=
se
>> =A0differences
>>
>> Blocks .L6 and.L7 have no difference in local registers, but they have a
>> difference in input registers: r3 and r1. Replacing the move to r5 by a
>> conditional move would probably be benificial in terms of size, but it's=
 not
>> clear what condition the conditional move should be using. Calculating s=
uch a
>> condition would add in size and increase the execution path.
>>
>> gcc -O2 -march=3Darmv7-a -mthumb pr43864.c -S:
>> ...
>> =A0 =A0 =A0 =A0push =A0 =A0{r4, r5, lr}
>> =A0 =A0 =A0 =A0mov =A0 =A0 r4, r0
>> =A0 =A0 =A0 =A0sub =A0 =A0 sp, sp, #1004
>> =A0 =A0 =A0 =A0mov =A0 =A0 r5, r1
>> =A0 =A0 =A0 =A0mov =A0 =A0 r0, sp
>> =A0 =A0 =A0 =A0mov =A0 =A0 r1, r4
>> =A0 =A0 =A0 =A0bl =A0 =A0 =A0sprintf
>> =A0 =A0 =A0 =A0mov =A0 =A0 r0, sp
>> =A0 =A0 =A0 =A0movs =A0 =A0r1, #1
>> =A0 =A0 =A0 =A0bl =A0 =A0 =A0access
>> =A0 =A0 =A0 =A0mov =A0 =A0 r3, r0
>> =A0 =A0 =A0 =A0cbz =A0 =A0 r0, .L6
>> =A0 =A0 =A0 =A0movs =A0 =A0r1, #0
>> =A0 =A0 =A0 =A0mov =A0 =A0 r0, sp
>> =A0 =A0 =A0 =A0bl =A0 =A0 =A0fopen
>> =A0 =A0 =A0 =A0mov =A0 =A0 r1, r0
>> =A0 =A0 =A0 =A0cbz =A0 =A0 r0, .L7
>> =A0 =A0 =A0 =A0mov =A0 =A0 r0, r4
>> =A0 =A0 =A0 =A0bl =A0 =A0 =A0foo
>> .L3:
>> =A0 =A0 =A0 =A0mov =A0 =A0 r0, r5
>> =A0 =A0 =A0 =A0add =A0 =A0 sp, sp, #1004
>> =A0 =A0 =A0 =A0pop =A0 =A0 {r4, r5, pc}
>> .L6:
>> =A0 =A0 =A0 =A0mov =A0 =A0 r0, r5
>> =A0 =A0 =A0 =A0mov =A0 =A0 r5, r3
>> =A0 =A0 =A0 =A0bl =A0 =A0 =A0free
>> =A0 =A0 =A0 =A0b =A0 =A0 =A0 .L3
>> .L7:
>> =A0 =A0 =A0 =A0mov =A0 =A0 r0, r5
>> =A0 =A0 =A0 =A0mov =A0 =A0 r5, r1
>> =A0 =A0 =A0 =A0bl =A0 =A0 =A0free
>> =A0 =A0 =A0 =A0b =A0 =A0 =A0 .L3
>> ...
>>
>> The proposed patch solved the problem by dealing with the 2 blocks at a =
level
>> when they are still identical: at gimple level. It detect that the 2 blo=
cks are
>> identical, and removes one of them.
>>
>> The following table shows the impact of the patch on the example in term=
s of
>> size for -march=3Darmv7-a:
>>
>> =A0 =A0 =A0 =A0 =A0without =A0 =A0 with =A0 =A0delta
>> Os =A0 =A0 =A0: =A0 =A0 108 =A0 =A0 =A0104 =A0 =A0 =A0 -4
>> O2 =A0 =A0 =A0: =A0 =A0 120 =A0 =A0 =A0104 =A0 =A0 =A0-16
>> Os thumb: =A0 =A0 =A068 =A0 =A0 =A0 64 =A0 =A0 =A0 -4
>> O2 thumb: =A0 =A0 =A076 =A0 =A0 =A0 64 =A0 =A0 =A0-12
>>
>> The gain in size for -O2 is that of removing the entire block, plus the
>> replacement of 2 moves by a constant set, which also decreases the execu=
tion
>> path. The patch ensures optimal code for both -O2 and -Os.
>>
>>
>> By keeping track of equivalent definitions in the 2 blocks, we can ignor=
e those
>> differences in comparison. Without this feature, we would only match blo=
cks with
>> resultless operations, due the the ssa-nature of gimples.
>> For example, with this feature, we reduce the following function to its =
minimum
>> at gimple level, rather than at rtl level.
>>
>> int f(int c, int b, int d)
>> {
>> =A0int r, e;
>>
>> =A0if (c)
>> =A0 =A0r =3D b + d;
>> =A0else
>> =A0 =A0{
>> =A0 =A0 =A0e =3D b + d;
>> =A0 =A0 =A0r =3D e;
>> =A0 =A0}
>>
>> =A0return r;
>> }
>>
>> ;; Function f (f)
>>
>> f (int c, int b, int d)
>> {
>> =A0int e;
>>
>> <bb 2>:
>> =A0e_6 =3D b_3(D) + d_4(D);
>> =A0return e_6;
>>
>> }
>>
>> I'll send the patch with the testcases in a separate email.
>>
>> OK for trunk?
>
> I don't like that you hook this into cleanup_tree_cfg - that is called
> _way_ too often.
>
> This also duplicates the literal matching done on the RTL level - instead
> I think this optimization would be more related to value-numbering
> (either that of SCCVN/FRE/PRE or that of DOM which also does
> jump-threading).

And while at it: Put it in a separate file ;-)

Ciao!
Steven