From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-409717-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 55786 invoked by alias); 9 Oct 2015 09:29:20 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 55773 invoked by uid 89); 9 Oct 2015 09:29:20 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=0.8 required=5.0 tests=AWL,BAYES_99,BAYES_999,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=no version=3.3.2
X-HELO: mail-yk0-f173.google.com
Received: from mail-yk0-f173.google.com (HELO mail-yk0-f173.google.com) (209.85.160.173) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Fri, 09 Oct 2015 09:29:18 +0000
Received: by ykft14 with SMTP id t14so73311908ykf.0        for <gcc-patches@gcc.gnu.org>; Fri, 09 Oct 2015 02:29:16 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.129.76.18 with SMTP id z18mr9251519ywa.263.1444382956600; Fri, 09 Oct 2015 02:29:16 -0700 (PDT)
Received: by 10.37.93.136 with HTTP; Fri, 9 Oct 2015 02:29:16 -0700 (PDT)
In-Reply-To: <c792164a9e6f8c4ba4db346c2df0e2341c73a3df.1444322573.git.segher@kernel.crashing.org>
References: <c792164a9e6f8c4ba4db346c2df0e2341c73a3df.1444322573.git.segher@kernel.crashing.org>
Date: Fri, 09 Oct 2015 09:29:00 -0000
Message-ID: <CAFiYyc3pioW-Xe0A1JXW2JPkcVbVecV784fHy7OP2yVRR0T5OQ@mail.gmail.com>
Subject: Re: [PATCH] bb-reorder: Improve the simple algorithm for -Os (PR67864)
From: Richard Biener <richard.guenther@gmail.com>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset=UTF-8
X-IsSubscribed: yes
X-SW-Source: 2015-10/txt/msg00938.txt.bz2

On Thu, Oct 8, 2015 at 6:57 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> As the PR points out, the "simple" reorder algorithm makes bigger code
> than the STC algorithm did, for -Os, for x86.  I now tested it for many
> different targets and it turns out to be worse everywhere.
>
> This simple patch tunes "simple" a bit; this makes it better than STC
> almost everywhere.  The only exceptions (for the targets where I have
> results) are x86 and mn10300.  For those targets it may be best to switch
> the default algorithm for -Os to STC.
>
> The raw results.  This is text size for vmlinux for 31 targets; as you
> can see many did not build, but at least all primary targets did.
> "none" is no basic block reordering; "trunk" is current trunk; "stc" is
> with the STC algorithm; "nodyn" is with simple, but considering all edges
> equally likely; "swaps" is that, but prefering the more likely edge from
> conditional branches; and "falls" prefers the edge from conditional
> branches that is already the fallthrough edge.  This patch is "falls".
>
>    none    trunk     stc     nodyn    swaps    falls    best
>  3728663  3721479  3700831  3706407  3717971  3690367  falls  arm
>  2097684  2095560  2089484  2094024  2094212  2086720  falls  blackfin
>  2096118  2107356  2081894  2092276  2103732  2077162  falls  cris
>  3204044  3200972  3187196  3191932  3198156  3177980  falls  frv
>  9254222  9340258  9208805  9265886  9331362  9247119   stc   i386
>  3353034  3355482  3334726  3334466  3349710  3314970  falls  m32r
>  4545720  4549824  4514256  4541832  4544540  4498416  falls  microblaze
>  4276743  4266363  4246255  4259923  4261367  4227723  falls  mips
>  5779199  5770567  5741663  5757663  5764475  5721803  falls  mips64
>  2059078  2086000  2051005  2064365  2083923  2055705   stc   mn10300
>  3311925  3320113  3293873  3305949  3317865  3284081  falls  nios2
>  6738701  6747077  6710253  6742917  6740965  6696757  falls  parisc64
>  8312408  8312744  8261016  8294480  8306488  8237188  falls  powerpc
> 17782722 17788382 17722326 17749526 17780642 17683810  falls  powerpc64
> 11016289 11029481 10970081 10980065 11024617 10942409  falls  s390
>  1406832  1417224  1400344  1409392  1416172  1399428  falls  shnommu
>  3771111  3776223  3751623  3768459  3771455  3732967  falls  sparc
>  6113427  6113527  6074875  6091167  6106015  6049571  falls  sparc64
> 10449681 10529883 10398908 10458240 10522149 10440814   stc   x86_64
>  1905733  1905733  1905733  1905733  1905733  1905733  -----  xtensa
>
>
> Is this okay for trunk?

I think the patch makes sense but it also raises a question for me - how
did we decide what edge gets EDGE_FALLTHRU when going out-of-cfglayout?
And isn't _that_ mechanism then not part of basic-block reordering that
needs to be tweaked for choosing the EDGE_FALLTHRU as better?

Thanks,
Richard.

>
> Segher
>
>
> 2015-10-08  Segher Boessenkool  <segher@kernel.crashing.org>
>
>         PR rtl-optimization/67864
>         * gcc/bb-reorder (reorder_basic_blocks_simple): Prefer existing
>         fallthrough edges for conditional jumps.  Don't sort candidate
>         edges if not optimizing for speed.
>
> ---
>  gcc/bb-reorder.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
> index cb001e8..3b7098e 100644
> --- a/gcc/bb-reorder.c
> +++ b/gcc/bb-reorder.c
> @@ -2318,16 +2318,24 @@ reorder_basic_blocks_simple (void)
>
>        if (any_condjump_p (end))
>         {
> -         edges[n++] = EDGE_SUCC (bb, 0);
> -         edges[n++] = EDGE_SUCC (bb, 1);
> +         edge e0 = EDGE_SUCC (bb, 0);
> +         edge e1 = EDGE_SUCC (bb, 1);
> +         /* When optimizing for size it is best to keep the original
> +            fallthrough edges.  */
> +         if (e1->flags & EDGE_FALLTHRU)
> +           std::swap (e0, e1);
> +         edges[n++] = e0;
> +         edges[n++] = e1;
>         }
>        else if (single_succ_p (bb))
>         edges[n++] = EDGE_SUCC (bb, 0);
>      }
>
> -  /* Sort the edges, the most desirable first.  */
> +  /* Sort the edges, the most desirable first.  When optimizing for size
> +     all edges are equally desirable.  */
>
> -  std::stable_sort (edges, edges + n, edge_order);
> +  if (optimize_function_for_speed_p (cfun))
> +    std::stable_sort (edges, edges + n, edge_order);
>
>    /* Now decide which of those edges to make fallthrough edges.  We set
>       BB_VISITED if a block already has a fallthrough successor assigned
> --
> 1.8.1.4
>