* [PATCH] bb-reorder: Improve the simple algorithm for -Os (PR67864)
@ 2015-10-08 16:57 Segher Boessenkool
2015-10-09 9:29 ` Richard Biener
2015-10-09 10:35 ` Bernd Schmidt
0 siblings, 2 replies; 5+ messages in thread
From: Segher Boessenkool @ 2015-10-08 16:57 UTC (permalink / raw)
To: gcc-patches; +Cc: Segher Boessenkool
As the PR points out, the "simple" reorder algorithm makes bigger code
than the STC algorithm did, for -Os, for x86. I now tested it for many
different targets and it turns out to be worse everywhere.
This simple patch tunes "simple" a bit; this makes it better than STC
almost everywhere. The only exceptions (for the targets where I have
results) are x86 and mn10300. For those targets it may be best to switch
the default algorithm for -Os to STC.
The raw results. This is text size for vmlinux for 31 targets; as you
can see many did not build, but at least all primary targets did.
"none" is no basic block reordering; "trunk" is current trunk; "stc" is
with the STC algorithm; "nodyn" is with simple, but considering all edges
equally likely; "swaps" is that, but prefering the more likely edge from
conditional branches; and "falls" prefers the edge from conditional
branches that is already the fallthrough edge. This patch is "falls".
none trunk stc nodyn swaps falls best
3728663 3721479 3700831 3706407 3717971 3690367 falls arm
2097684 2095560 2089484 2094024 2094212 2086720 falls blackfin
2096118 2107356 2081894 2092276 2103732 2077162 falls cris
3204044 3200972 3187196 3191932 3198156 3177980 falls frv
9254222 9340258 9208805 9265886 9331362 9247119 stc i386
3353034 3355482 3334726 3334466 3349710 3314970 falls m32r
4545720 4549824 4514256 4541832 4544540 4498416 falls microblaze
4276743 4266363 4246255 4259923 4261367 4227723 falls mips
5779199 5770567 5741663 5757663 5764475 5721803 falls mips64
2059078 2086000 2051005 2064365 2083923 2055705 stc mn10300
3311925 3320113 3293873 3305949 3317865 3284081 falls nios2
6738701 6747077 6710253 6742917 6740965 6696757 falls parisc64
8312408 8312744 8261016 8294480 8306488 8237188 falls powerpc
17782722 17788382 17722326 17749526 17780642 17683810 falls powerpc64
11016289 11029481 10970081 10980065 11024617 10942409 falls s390
1406832 1417224 1400344 1409392 1416172 1399428 falls shnommu
3771111 3776223 3751623 3768459 3771455 3732967 falls sparc
6113427 6113527 6074875 6091167 6106015 6049571 falls sparc64
10449681 10529883 10398908 10458240 10522149 10440814 stc x86_64
1905733 1905733 1905733 1905733 1905733 1905733 ----- xtensa
Is this okay for trunk?
Segher
2015-10-08 Segher Boessenkool <segher@kernel.crashing.org>
PR rtl-optimization/67864
* gcc/bb-reorder (reorder_basic_blocks_simple): Prefer existing
fallthrough edges for conditional jumps. Don't sort candidate
edges if not optimizing for speed.
---
gcc/bb-reorder.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index cb001e8..3b7098e 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -2318,16 +2318,24 @@ reorder_basic_blocks_simple (void)
if (any_condjump_p (end))
{
- edges[n++] = EDGE_SUCC (bb, 0);
- edges[n++] = EDGE_SUCC (bb, 1);
+ edge e0 = EDGE_SUCC (bb, 0);
+ edge e1 = EDGE_SUCC (bb, 1);
+ /* When optimizing for size it is best to keep the original
+ fallthrough edges. */
+ if (e1->flags & EDGE_FALLTHRU)
+ std::swap (e0, e1);
+ edges[n++] = e0;
+ edges[n++] = e1;
}
else if (single_succ_p (bb))
edges[n++] = EDGE_SUCC (bb, 0);
}
- /* Sort the edges, the most desirable first. */
+ /* Sort the edges, the most desirable first. When optimizing for size
+ all edges are equally desirable. */
- std::stable_sort (edges, edges + n, edge_order);
+ if (optimize_function_for_speed_p (cfun))
+ std::stable_sort (edges, edges + n, edge_order);
/* Now decide which of those edges to make fallthrough edges. We set
BB_VISITED if a block already has a fallthrough successor assigned
--
1.8.1.4
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] bb-reorder: Improve the simple algorithm for -Os (PR67864)
2015-10-08 16:57 [PATCH] bb-reorder: Improve the simple algorithm for -Os (PR67864) Segher Boessenkool
@ 2015-10-09 9:29 ` Richard Biener
2015-10-09 16:08 ` Segher Boessenkool
2015-10-09 10:35 ` Bernd Schmidt
1 sibling, 1 reply; 5+ messages in thread
From: Richard Biener @ 2015-10-09 9:29 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: GCC Patches
On Thu, Oct 8, 2015 at 6:57 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> As the PR points out, the "simple" reorder algorithm makes bigger code
> than the STC algorithm did, for -Os, for x86. I now tested it for many
> different targets and it turns out to be worse everywhere.
>
> This simple patch tunes "simple" a bit; this makes it better than STC
> almost everywhere. The only exceptions (for the targets where I have
> results) are x86 and mn10300. For those targets it may be best to switch
> the default algorithm for -Os to STC.
>
> The raw results. This is text size for vmlinux for 31 targets; as you
> can see many did not build, but at least all primary targets did.
> "none" is no basic block reordering; "trunk" is current trunk; "stc" is
> with the STC algorithm; "nodyn" is with simple, but considering all edges
> equally likely; "swaps" is that, but prefering the more likely edge from
> conditional branches; and "falls" prefers the edge from conditional
> branches that is already the fallthrough edge. This patch is "falls".
>
> none trunk stc nodyn swaps falls best
> 3728663 3721479 3700831 3706407 3717971 3690367 falls arm
> 2097684 2095560 2089484 2094024 2094212 2086720 falls blackfin
> 2096118 2107356 2081894 2092276 2103732 2077162 falls cris
> 3204044 3200972 3187196 3191932 3198156 3177980 falls frv
> 9254222 9340258 9208805 9265886 9331362 9247119 stc i386
> 3353034 3355482 3334726 3334466 3349710 3314970 falls m32r
> 4545720 4549824 4514256 4541832 4544540 4498416 falls microblaze
> 4276743 4266363 4246255 4259923 4261367 4227723 falls mips
> 5779199 5770567 5741663 5757663 5764475 5721803 falls mips64
> 2059078 2086000 2051005 2064365 2083923 2055705 stc mn10300
> 3311925 3320113 3293873 3305949 3317865 3284081 falls nios2
> 6738701 6747077 6710253 6742917 6740965 6696757 falls parisc64
> 8312408 8312744 8261016 8294480 8306488 8237188 falls powerpc
> 17782722 17788382 17722326 17749526 17780642 17683810 falls powerpc64
> 11016289 11029481 10970081 10980065 11024617 10942409 falls s390
> 1406832 1417224 1400344 1409392 1416172 1399428 falls shnommu
> 3771111 3776223 3751623 3768459 3771455 3732967 falls sparc
> 6113427 6113527 6074875 6091167 6106015 6049571 falls sparc64
> 10449681 10529883 10398908 10458240 10522149 10440814 stc x86_64
> 1905733 1905733 1905733 1905733 1905733 1905733 ----- xtensa
>
>
> Is this okay for trunk?
I think the patch makes sense but it also raises a question for me - how
did we decide what edge gets EDGE_FALLTHRU when going out-of-cfglayout?
And isn't _that_ mechanism then not part of basic-block reordering that
needs to be tweaked for choosing the EDGE_FALLTHRU as better?
Thanks,
Richard.
>
> Segher
>
>
> 2015-10-08 Segher Boessenkool <segher@kernel.crashing.org>
>
> PR rtl-optimization/67864
> * gcc/bb-reorder (reorder_basic_blocks_simple): Prefer existing
> fallthrough edges for conditional jumps. Don't sort candidate
> edges if not optimizing for speed.
>
> ---
> gcc/bb-reorder.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
> index cb001e8..3b7098e 100644
> --- a/gcc/bb-reorder.c
> +++ b/gcc/bb-reorder.c
> @@ -2318,16 +2318,24 @@ reorder_basic_blocks_simple (void)
>
> if (any_condjump_p (end))
> {
> - edges[n++] = EDGE_SUCC (bb, 0);
> - edges[n++] = EDGE_SUCC (bb, 1);
> + edge e0 = EDGE_SUCC (bb, 0);
> + edge e1 = EDGE_SUCC (bb, 1);
> + /* When optimizing for size it is best to keep the original
> + fallthrough edges. */
> + if (e1->flags & EDGE_FALLTHRU)
> + std::swap (e0, e1);
> + edges[n++] = e0;
> + edges[n++] = e1;
> }
> else if (single_succ_p (bb))
> edges[n++] = EDGE_SUCC (bb, 0);
> }
>
> - /* Sort the edges, the most desirable first. */
> + /* Sort the edges, the most desirable first. When optimizing for size
> + all edges are equally desirable. */
>
> - std::stable_sort (edges, edges + n, edge_order);
> + if (optimize_function_for_speed_p (cfun))
> + std::stable_sort (edges, edges + n, edge_order);
>
> /* Now decide which of those edges to make fallthrough edges. We set
> BB_VISITED if a block already has a fallthrough successor assigned
> --
> 1.8.1.4
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] bb-reorder: Improve the simple algorithm for -Os (PR67864)
2015-10-09 9:29 ` Richard Biener
@ 2015-10-09 16:08 ` Segher Boessenkool
0 siblings, 0 replies; 5+ messages in thread
From: Segher Boessenkool @ 2015-10-09 16:08 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches
On Fri, Oct 09, 2015 at 11:29:16AM +0200, Richard Biener wrote:
> I think the patch makes sense but it also raises a question for me - how
> did we decide what edge gets EDGE_FALLTHRU when going out-of-cfglayout?
Good question. I think it just tries to make "natural" control flow; I'll
investigate.
> And isn't _that_ mechanism then not part of basic-block reordering that
> needs to be tweaked for choosing the EDGE_FALLTHRU as better?
Yes.
This patch uses the existing fallthrough for conditional branches, which
seems to work best; but it still can significantly improve on unconditional
branches. I'm sure there are better heuristics possible but I don't know
them (and actually *solving* the problem isn't even polynomial of course).
Segher
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] bb-reorder: Improve the simple algorithm for -Os (PR67864)
2015-10-08 16:57 [PATCH] bb-reorder: Improve the simple algorithm for -Os (PR67864) Segher Boessenkool
2015-10-09 9:29 ` Richard Biener
@ 2015-10-09 10:35 ` Bernd Schmidt
2015-10-09 16:27 ` Segher Boessenkool
1 sibling, 1 reply; 5+ messages in thread
From: Bernd Schmidt @ 2015-10-09 10:35 UTC (permalink / raw)
To: Segher Boessenkool, gcc-patches
On 10/08/2015 06:57 PM, Segher Boessenkool wrote:
> As the PR points out, the "simple" reorder algorithm makes bigger code
> than the STC algorithm did, for -Os, for x86. I now tested it for many
> different targets and it turns out to be worse everywhere.
That's somewhat disappointing. Wasn't it supposed to improve over it?
What went wrong?
> Is this okay for trunk?
>
> 2015-10-08 Segher Boessenkool <segher@kernel.crashing.org>
>
> PR rtl-optimization/67864
> * gcc/bb-reorder (reorder_basic_blocks_simple): Prefer existing
> fallthrough edges for conditional jumps. Don't sort candidate
> edges if not optimizing for speed.
Ok. Although it would be nice to understand exactly what causes code
growth and possibly get a real cost estimate rather than such a heuristic.
Bernd
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] bb-reorder: Improve the simple algorithm for -Os (PR67864)
2015-10-09 10:35 ` Bernd Schmidt
@ 2015-10-09 16:27 ` Segher Boessenkool
0 siblings, 0 replies; 5+ messages in thread
From: Segher Boessenkool @ 2015-10-09 16:27 UTC (permalink / raw)
To: Bernd Schmidt; +Cc: gcc-patches
On Fri, Oct 09, 2015 at 12:35:46PM +0200, Bernd Schmidt wrote:
> On 10/08/2015 06:57 PM, Segher Boessenkool wrote:
> >As the PR points out, the "simple" reorder algorithm makes bigger code
> >than the STC algorithm did, for -Os, for x86. I now tested it for many
> >different targets and it turns out to be worse everywhere.
>
> That's somewhat disappointing. Wasn't it supposed to improve over it?
> What went wrong?
I first somehow tested -O2 only. Then when you asked I tested -Os again,
but only on powerpc64, and it was 0.1% there. It seems combine.ii is not
very representative for this :-/
> >Is this okay for trunk?
> >
> >2015-10-08 Segher Boessenkool <segher@kernel.crashing.org>
> >
> > PR rtl-optimization/67864
> > * gcc/bb-reorder (reorder_basic_blocks_simple): Prefer existing
> > fallthrough edges for conditional jumps. Don't sort candidate
> > edges if not optimizing for speed.
>
> Ok. Although it would be nice to understand exactly what causes code
> growth and possibly get a real cost estimate rather than such a heuristic.
There are two main factors: firstly, you want to have as many fallthroughs
as possible (any block that does not end in a fallthrough gets an extra
jump insn). I haven't found any purely local heuristic that works better
than this patch. Very likely you can do better by looking at bigger parts
of the graph (say, identify loops and diamonds).
The second thing (which still hurts x86) is that on some targets the "cheap"
conditional branches have a very short range. "Simple" has no notion of
jump distance at all.
Segher
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-10-09 16:27 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-08 16:57 [PATCH] bb-reorder: Improve the simple algorithm for -Os (PR67864) Segher Boessenkool
2015-10-09 9:29 ` Richard Biener
2015-10-09 16:08 ` Segher Boessenkool
2015-10-09 10:35 ` Bernd Schmidt
2015-10-09 16:27 ` Segher Boessenkool
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).