From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-293944-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 13721 invoked by alias); 8 Jun 2011 09:55:28 -0000
Received: (qmail 13713 invoked by uid 22791); 8 Jun 2011 09:55:27 -0000
X-SWARE-Spam-Status: No, hits=-1.0 required=5.0	tests=AWL,BAYES_50,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,RFC_ABUSE_POST,TW_CB
X-Spam-Check-By: sourceware.org
Received: from mail-wy0-f175.google.com (HELO mail-wy0-f175.google.com) (74.125.82.175)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 08 Jun 2011 09:55:13 +0000
Received: by wye20 with SMTP id 20so253067wye.20        for <gcc-patches@gcc.gnu.org>; Wed, 08 Jun 2011 02:55:11 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.227.163.76 with SMTP id z12mr907880wbx.75.1307526911564; Wed, 08 Jun 2011 02:55:11 -0700 (PDT)
Received: by 10.227.37.152 with HTTP; Wed, 8 Jun 2011 02:55:11 -0700 (PDT)
In-Reply-To: <4DEF4408.4040001@codesourcery.com>
References: <4DEF4408.4040001@codesourcery.com>
Date: Wed, 08 Jun 2011 10:09:00 -0000
Message-ID: <BANLkTimAvGy1jRniWftbEvLnGwEN_0Ma8Q@mail.gmail.com>
Subject: Re: [PATCH, PR43864] Gimple level duplicate block cleanup.
From: Richard Guenther <richard.guenther@gmail.com>
To: Tom de Vries <vries@codesourcery.com>
Cc: Steven Bosscher <stevenb.gcc@gmail.com>, gcc-patches@gcc.gnu.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2011-06/txt/msg00629.txt.bz2

On Wed, Jun 8, 2011 at 11:42 AM, Tom de Vries <vries@codesourcery.com> wrot=
e:
> Hi Richard,
>
> I have a patch for PR43864. The patch adds a gimple level duplicate block
> cleanup. The patch has been bootstrapped and reg-tested on x86_64, and
> reg-tested on ARM. The size impact on ARM for spec2000 is shown in the fo=
llowing
> table (%, lower is better).
>
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 none =A0 =A0 =A0 =A0 =A0 =A0pic
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0thumb1 =A0thumb2 =A0thumb1 thumb2
> spec2000 =A0 =A0 =A0 =A0 =A099.9 =A0 =A099.9 =A0 =A099.8 =A0 99.8
>
> PR43864 is currently marked as a duplicate of PR20070, but I'm not sure t=
hat the
> optimizations proposed in PR20070 would fix this PR.
>
> The problem in this PR is that when compiling with -O2, the example below=
 should
> only have one call to free. The original problem is formulated in terms o=
f -Os,
> but currently we generate one call to free with -Os, although still not t=
he
> smallest code possible. I'll show here the -O2 case, since that's similar=
 to the
> original PR.
>
> #include <stdio.h>
> void foo (char*, FILE*);
> char* hprofStartupp(char *outputFileName, char *ctx)
> {
> =A0 =A0char fileName[1000];
> =A0 =A0FILE *fp;
> =A0 =A0sprintf(fileName, outputFileName);
> =A0 =A0if (access(fileName, 1) =3D=3D 0) {
> =A0 =A0 =A0 =A0free(ctx);
> =A0 =A0 =A0 =A0return 0;
> =A0 =A0}
>
> =A0 =A0fp =3D fopen(fileName, 0);
> =A0 =A0if (fp =3D=3D 0) {
> =A0 =A0 =A0 =A0free(ctx);
> =A0 =A0 =A0 =A0return 0;
> =A0 =A0}
>
> =A0 =A0foo(outputFileName, fp);
>
> =A0 =A0return ctx;
> }
>
> AFAIU, there are 2 complementary methods of rtl optimizations proposed in=
 PR20070.
> - Merging 2 blocks which are identical expect for input registers, by usi=
ng a
> =A0conditional move to choose between the different input registers.
> - Merging 2 blocks which have different local registers, by ignoring those
> =A0differences
>
> Blocks .L6 and.L7 have no difference in local registers, but they have a
> difference in input registers: r3 and r1. Replacing the move to r5 by a
> conditional move would probably be benificial in terms of size, but it's =
not
> clear what condition the conditional move should be using. Calculating su=
ch a
> condition would add in size and increase the execution path.
>
> gcc -O2 -march=3Darmv7-a -mthumb pr43864.c -S:
> ...
> =A0 =A0 =A0 =A0push =A0 =A0{r4, r5, lr}
> =A0 =A0 =A0 =A0mov =A0 =A0 r4, r0
> =A0 =A0 =A0 =A0sub =A0 =A0 sp, sp, #1004
> =A0 =A0 =A0 =A0mov =A0 =A0 r5, r1
> =A0 =A0 =A0 =A0mov =A0 =A0 r0, sp
> =A0 =A0 =A0 =A0mov =A0 =A0 r1, r4
> =A0 =A0 =A0 =A0bl =A0 =A0 =A0sprintf
> =A0 =A0 =A0 =A0mov =A0 =A0 r0, sp
> =A0 =A0 =A0 =A0movs =A0 =A0r1, #1
> =A0 =A0 =A0 =A0bl =A0 =A0 =A0access
> =A0 =A0 =A0 =A0mov =A0 =A0 r3, r0
> =A0 =A0 =A0 =A0cbz =A0 =A0 r0, .L6
> =A0 =A0 =A0 =A0movs =A0 =A0r1, #0
> =A0 =A0 =A0 =A0mov =A0 =A0 r0, sp
> =A0 =A0 =A0 =A0bl =A0 =A0 =A0fopen
> =A0 =A0 =A0 =A0mov =A0 =A0 r1, r0
> =A0 =A0 =A0 =A0cbz =A0 =A0 r0, .L7
> =A0 =A0 =A0 =A0mov =A0 =A0 r0, r4
> =A0 =A0 =A0 =A0bl =A0 =A0 =A0foo
> .L3:
> =A0 =A0 =A0 =A0mov =A0 =A0 r0, r5
> =A0 =A0 =A0 =A0add =A0 =A0 sp, sp, #1004
> =A0 =A0 =A0 =A0pop =A0 =A0 {r4, r5, pc}
> .L6:
> =A0 =A0 =A0 =A0mov =A0 =A0 r0, r5
> =A0 =A0 =A0 =A0mov =A0 =A0 r5, r3
> =A0 =A0 =A0 =A0bl =A0 =A0 =A0free
> =A0 =A0 =A0 =A0b =A0 =A0 =A0 .L3
> .L7:
> =A0 =A0 =A0 =A0mov =A0 =A0 r0, r5
> =A0 =A0 =A0 =A0mov =A0 =A0 r5, r1
> =A0 =A0 =A0 =A0bl =A0 =A0 =A0free
> =A0 =A0 =A0 =A0b =A0 =A0 =A0 .L3
> ...
>
> The proposed patch solved the problem by dealing with the 2 blocks at a l=
evel
> when they are still identical: at gimple level. It detect that the 2 bloc=
ks are
> identical, and removes one of them.
>
> The following table shows the impact of the patch on the example in terms=
 of
> size for -march=3Darmv7-a:
>
> =A0 =A0 =A0 =A0 =A0without =A0 =A0 with =A0 =A0delta
> Os =A0 =A0 =A0: =A0 =A0 108 =A0 =A0 =A0104 =A0 =A0 =A0 -4
> O2 =A0 =A0 =A0: =A0 =A0 120 =A0 =A0 =A0104 =A0 =A0 =A0-16
> Os thumb: =A0 =A0 =A068 =A0 =A0 =A0 64 =A0 =A0 =A0 -4
> O2 thumb: =A0 =A0 =A076 =A0 =A0 =A0 64 =A0 =A0 =A0-12
>
> The gain in size for -O2 is that of removing the entire block, plus the
> replacement of 2 moves by a constant set, which also decreases the execut=
ion
> path. The patch ensures optimal code for both -O2 and -Os.
>
>
> By keeping track of equivalent definitions in the 2 blocks, we can ignore=
 those
> differences in comparison. Without this feature, we would only match bloc=
ks with
> resultless operations, due the the ssa-nature of gimples.
> For example, with this feature, we reduce the following function to its m=
inimum
> at gimple level, rather than at rtl level.
>
> int f(int c, int b, int d)
> {
> =A0int r, e;
>
> =A0if (c)
> =A0 =A0r =3D b + d;
> =A0else
> =A0 =A0{
> =A0 =A0 =A0e =3D b + d;
> =A0 =A0 =A0r =3D e;
> =A0 =A0}
>
> =A0return r;
> }
>
> ;; Function f (f)
>
> f (int c, int b, int d)
> {
> =A0int e;
>
> <bb 2>:
> =A0e_6 =3D b_3(D) + d_4(D);
> =A0return e_6;
>
> }
>
> I'll send the patch with the testcases in a separate email.
>
> OK for trunk?

I don't like that you hook this into cleanup_tree_cfg - that is called
_way_ too often.

This also duplicates the literal matching done on the RTL level - instead
I think this optimization would be more related to value-numbering
(either that of SCCVN/FRE/PRE or that of DOM which also does
jump-threading).

Richard.

> Thanks,
> - Tom
>
> 2011-06-08 =A0Tom de Vries =A0<tom@codesourcery.com>
>
> =A0 =A0 =A0 =A0PR middle-end/43864
> =A0 =A0 =A0 =A0* tree-cfgcleanup.c (int_int_splay_lookup, int_int_splay_i=
nsert)
> =A0 =A0 =A0 =A0(int_int_splay_node_contained_in, int_int_splay_contained_=
in)
> =A0 =A0 =A0 =A0(equiv_lookup, equiv_insert, equiv_contained_in, equiv_ini=
t)
> =A0 =A0 =A0 =A0(equiv_delete, gimple_base_equal_p, pt_solution_equal_p, g=
imple_equal_p)
> =A0 =A0 =A0 =A0(bb_gimple_equal_p, update_debug_stmts, cleanup_duplicate_=
preds_1)
> =A0 =A0 =A0 =A0(same_or_local_phi_alternatives, cleanup_duplicate_preds):=
 New function.
> =A0 =A0 =A0 =A0(cleanup_tree_cfg_bb): Use cleanup_duplicate_preds.
>