public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/58033] New: counterproductive bb-reorder
@ 2013-07-30 21:00 olegendo at gcc dot gnu.org
2013-07-30 21:01 ` [Bug rtl-optimization/58033] " olegendo at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-07-30 21:00 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033
Bug ID: 58033
Summary: counterproductive bb-reorder
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: olegendo at gcc dot gnu.org
CC: steven at gcc dot gnu.org, tejohnson at google dot com
Target: sh*-*-*
On SH, compiling the following code with -O2
#include <bitset>
std::bitset<32> make_bits (void)
{
std::bitset<32> r;
for (auto&& i : { 4, 5, 6, 10 })
if (i < r.size ())
r.set (i);
return r;
}
results in the following code:
mov.l .L8,r1
mov #0,r0
mov #31,r7
mov #1,r6
mov #4,r2
.L2:
mov.l @r1,r3
cmp/hi r7,r3
bf/s .L7
mov r6,r5
.L3:
dt r2
bf/s .L2 // branch if value not > 31, i.e. in each iteration
add #4,r1
rts
nop
.align 1
.L7:
shld r3,r5
bra .L3
or r5,r0
.L9:
.align 2
.L8:
.long _._45+0
_._45:
.long 4
.long 5
.long 6
.long 10
Disabling the bb-reorder pass or using -Os results in more compact and faster
code:
mov.l .L7,r1
mov #0,r0
mov #31,r7
mov #1,r6
mov #4,r2
.L2:
mov.l @r1,r3
cmp/hi r7,r3
bt/s .L3 // branch if value > 31, i.e. never.
mov r6,r5
shld r3,r5
or r5,r0
.L3:
dt r2
bf/s .L2
add #4,r1
rts
nop
Of course the bb-reorder pass doesn't know that the values in this case are
always in range. Still, maybe it could be improved by not splitting out a BB
if it consists only of a few insns? I've tried increasing the branch cost but
it won't do anything.
Teresa, Steven,
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug rtl-optimization/58033] counterproductive bb-reorder
2013-07-30 21:00 [Bug rtl-optimization/58033] New: counterproductive bb-reorder olegendo at gcc dot gnu.org
@ 2013-07-30 21:01 ` olegendo at gcc dot gnu.org
2013-07-30 21:16 ` tejohnson at google dot com
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-07-30 21:01 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033
--- Comment #1 from Oleg Endo <olegendo at gcc dot gnu.org> ---
... according to the changelog you've been doing some things on bb-reorder.c,
maybe you have an idea?
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug rtl-optimization/58033] counterproductive bb-reorder
2013-07-30 21:00 [Bug rtl-optimization/58033] New: counterproductive bb-reorder olegendo at gcc dot gnu.org
2013-07-30 21:01 ` [Bug rtl-optimization/58033] " olegendo at gcc dot gnu.org
@ 2013-07-30 21:16 ` tejohnson at google dot com
2013-07-30 21:40 ` olegendo at gcc dot gnu.org
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: tejohnson at google dot com @ 2013-07-30 21:16 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033
--- Comment #2 from Teresa Johnson <tejohnson at google dot com> ---
On Tue, Jul 30, 2013 at 2:00 PM, olegendo at gcc dot gnu.org
<gcc-bugzilla@gcc.gnu.org> wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033
>
> Bug ID: 58033
> Summary: counterproductive bb-reorder
> Product: gcc
> Version: 4.9.0
> Status: UNCONFIRMED
> Severity: normal
> Priority: P3
> Component: rtl-optimization
> Assignee: unassigned at gcc dot gnu.org
> Reporter: olegendo at gcc dot gnu.org
> CC: steven at gcc dot gnu.org, tejohnson at google dot com
> Target: sh*-*-*
>
> On SH, compiling the following code with -O2
>
> #include <bitset>
>
> std::bitset<32> make_bits (void)
> {
> std::bitset<32> r;
> for (auto&& i : { 4, 5, 6, 10 })
> if (i < r.size ())
> r.set (i);
>
> return r;
> }
>
> results in the following code:
>
> mov.l .L8,r1
> mov #0,r0
> mov #31,r7
> mov #1,r6
> mov #4,r2
> .L2:
> mov.l @r1,r3
> cmp/hi r7,r3
> bf/s .L7
I assume it is the above branch that is the issue (not the bf/s .L2
below as that is the same in both versions of the code). I'm assuming
this is not build with FDO? In that case bbro is probably at the mercy
of whatever probabilities the static heuristics assigned to the
branches. Although if it is 50-50 then I'm not sure offhand what
happens - maybe it is biasing in favor of having the shortest trace?
This is a great test case for motivating range propagation. =)
Can you attach the dump created with -fdump-rtl-bbro-all? We can see
what the edge probabilities are. For some reason it is not compiling
for me - what options do you use? My (4_7-based) g++ is complaining
about the "auto":
$ g++ -O2 pr58033.cc -S
pr58033.cc: In function 'std::bitset<32ul> make_bits()':
pr58033.cc:6:12: error: expected unqualified-id before '&&' token
for (auto&& i : { 4, 5, 6, 10 })
Teresa
> mov r6,r5
> .L3:
> dt r2
> bf/s .L2 // branch if value not > 31, i.e. in each iteration
> add #4,r1
> rts
> nop
> .align 1
> .L7:
> shld r3,r5
> bra .L3
> or r5,r0
> .L9:
> .align 2
> .L8:
> .long _._45+0
>
> _._45:
> .long 4
> .long 5
> .long 6
> .long 10
>
> Disabling the bb-reorder pass or using -Os results in more compact and faster
> code:
>
> mov.l .L7,r1
> mov #0,r0
> mov #31,r7
> mov #1,r6
> mov #4,r2
> .L2:
> mov.l @r1,r3
> cmp/hi r7,r3
> bt/s .L3 // branch if value > 31, i.e. never.
> mov r6,r5
> shld r3,r5
> or r5,r0
> .L3:
> dt r2
> bf/s .L2
> add #4,r1
> rts
> nop
>
> Of course the bb-reorder pass doesn't know that the values in this case are
> always in range. Still, maybe it could be improved by not splitting out a BB
> if it consists only of a few insns? I've tried increasing the branch cost but
> it won't do anything.
>
> Teresa, Steven,
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug rtl-optimization/58033] counterproductive bb-reorder
2013-07-30 21:00 [Bug rtl-optimization/58033] New: counterproductive bb-reorder olegendo at gcc dot gnu.org
2013-07-30 21:01 ` [Bug rtl-optimization/58033] " olegendo at gcc dot gnu.org
2013-07-30 21:16 ` tejohnson at google dot com
@ 2013-07-30 21:40 ` olegendo at gcc dot gnu.org
2013-07-30 23:49 ` tejohnson at google dot com
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-07-30 21:40 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033
--- Comment #3 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Created attachment 30574
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30574&action=edit
bbro dump
(In reply to Teresa Johnson from comment #2)
>
> I assume it is the above branch that is the issue (not the bf/s .L2
> below as that is the same in both versions of the code). I'm assuming
> this is not build with FDO?
Right, no FDO.
> In that case bbro is probably at the mercy
> of whatever probabilities the static heuristics assigned to the
> branches. Although if it is 50-50 then I'm not sure offhand what
> happens - maybe it is biasing in favor of having the shortest trace?
> This is a great test case for motivating range propagation. =)
>
> Can you attach the dump created with -fdump-rtl-bbro-all? We can see
> what the edge probabilities are. For some reason it is not compiling
> for me - what options do you use? My (4_7-based) g++ is complaining
> about the "auto":
Sorry, it's C++11, so you need to specify -std=c++11.
Here's a C++03 version, which on my setup (4.9 trunk rev 201282) results in
exactly the same code:
#include <bitset>
std::bitset<32> make_bits2 (void)
{
static const int ii[] = { 4, 5, 6, 10 };
std::bitset<32> r;
for (int i = 0; i < 4; ++i)
if (ii[i] < r.size ())
r.set (ii[i]);
return r;
}
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug rtl-optimization/58033] counterproductive bb-reorder
2013-07-30 21:00 [Bug rtl-optimization/58033] New: counterproductive bb-reorder olegendo at gcc dot gnu.org
` (2 preceding siblings ...)
2013-07-30 21:40 ` olegendo at gcc dot gnu.org
@ 2013-07-30 23:49 ` tejohnson at google dot com
2013-11-25 18:44 ` olegendo at gcc dot gnu.org
2015-04-04 11:29 ` olegendo at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: tejohnson at google dot com @ 2013-07-30 23:49 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033
--- Comment #4 from Teresa Johnson <tejohnson at google dot com> ---
On Tue, Jul 30, 2013 at 2:40 PM, olegendo at gcc dot gnu.org
<gcc-bugzilla@gcc.gnu.org> wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033
>
> --- Comment #3 from Oleg Endo <olegendo at gcc dot gnu.org> ---
> Created attachment 30574
> --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30574&action=edit
> bbro dump
>
> (In reply to Teresa Johnson from comment #2)
>>
>> I assume it is the above branch that is the issue (not the bf/s .L2
>> below as that is the same in both versions of the code). I'm assuming
>> this is not build with FDO?
>
> Right, no FDO.
>
>> In that case bbro is probably at the mercy
>> of whatever probabilities the static heuristics assigned to the
>> branches. Although if it is 50-50 then I'm not sure offhand what
>> happens - maybe it is biasing in favor of having the shortest trace?
>> This is a great test case for motivating range propagation. =)
Thanks for the dump. The issue is the heuristically assigned edge frequencies:
;; succ: 6 [29.0%] (FALLTHRU,CAN_FALLTHRU)
;; 4 [71.0%] (CAN_FALLTHRU)
BB 6 contains the code within the then clause that is in fact always
executed. But it is being assigned only 29% probability, which is why
bbro lays out the code as it does.
I'm not familiar with the static edge weight heuristics in gcc, but it
would be interesting to see why it is predicting that the if statement
will be false.
Teresa
>>
>> Can you attach the dump created with -fdump-rtl-bbro-all? We can see
>> what the edge probabilities are. For some reason it is not compiling
>> for me - what options do you use? My (4_7-based) g++ is complaining
>> about the "auto":
>
> Sorry, it's C++11, so you need to specify -std=c++11.
>
> Here's a C++03 version, which on my setup (4.9 trunk rev 201282) results in
> exactly the same code:
>
> #include <bitset>
>
> std::bitset<32> make_bits2 (void)
> {
> static const int ii[] = { 4, 5, 6, 10 };
> std::bitset<32> r;
> for (int i = 0; i < 4; ++i)
> if (ii[i] < r.size ())
> r.set (ii[i]);
>
> return r;
> }
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug rtl-optimization/58033] counterproductive bb-reorder
2013-07-30 21:00 [Bug rtl-optimization/58033] New: counterproductive bb-reorder olegendo at gcc dot gnu.org
` (3 preceding siblings ...)
2013-07-30 23:49 ` tejohnson at google dot com
@ 2013-11-25 18:44 ` olegendo at gcc dot gnu.org
2015-04-04 11:29 ` olegendo at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-11-25 18:44 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033
--- Comment #5 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Another somewhat related SH example, compiled with -m4 -O2 -ml on rev 205313:
int
foo1 (long long value)
{
const long long constant = 0xc000000080000000LL;
if (value < constant)
return 1;
else
return 2;
}
mov.l .L12,r1
cmp/ge r1,r5
bt/s .L10 <<< should branch to L10
cmp/gt r1,r5
mov #1,r0
.L6:
rts
nop
.align 1
.L10:
bf .L11 <<< never branches, always falls through
.L5:
rts
mov #2,r0
.align 1
.L11:
mov.l .L13,r1
cmp/hs r1,r4
bt .L5
mov #1,r0
bra .L6 <<< could just duplicate BB from L6 here
nop (see also PR 58615)
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug rtl-optimization/58033] counterproductive bb-reorder
2013-07-30 21:00 [Bug rtl-optimization/58033] New: counterproductive bb-reorder olegendo at gcc dot gnu.org
` (4 preceding siblings ...)
2013-11-25 18:44 ` olegendo at gcc dot gnu.org
@ 2015-04-04 11:29 ` olegendo at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-04-04 11:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033
--- Comment #6 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Another example on SH:
const char* test (const char* s0, int c, int* rout)
{
int r = 0;
for (int i = 0; i < c; ++i)
r += s0[i];
*rout = r;
return s0;
}
compiled with -O2:
_test:
cmp/pl r5
bf .L4
mov #0,r1
mov r4,r2
.align 2
.L3:
mov.b @r2+,r3
dt r5
bf/s .L3
add r3,r1
.L2:
mov.l r1,@r6
rts
mov r4,r0
.align 1
.L4:
bra .L2
mov #0,r1
compiled with -O2 -fno-reorder-blocks:
_test:
cmp/pl r5
bf/s .L4
mov #0,r1
mov r4,r2
.align 2
.L3:
mov.b @r2+,r3
dt r5
bf/s .L3
add r3,r1
bra .L7
mov.l r1,@r6
.align 1
.L4:
mov.l r1,@r6
.L7:
rts
mov r4,r0
.. which is better, except for the redundant stores. Folding the two stores
gives the minimal code:
_test:
cmp/pl r5
bf/s .L4
mov #0,r1
mov r4,r2
.align 2
.L3:
mov.b @r2+,r3
dt r5
bf/s .L3
add r3,r1
.L4:
mov.l r1,@r6
rts
mov r4,r0
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-04-04 11:29 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-30 21:00 [Bug rtl-optimization/58033] New: counterproductive bb-reorder olegendo at gcc dot gnu.org
2013-07-30 21:01 ` [Bug rtl-optimization/58033] " olegendo at gcc dot gnu.org
2013-07-30 21:16 ` tejohnson at google dot com
2013-07-30 21:40 ` olegendo at gcc dot gnu.org
2013-07-30 23:49 ` tejohnson at google dot com
2013-11-25 18:44 ` olegendo at gcc dot gnu.org
2015-04-04 11:29 ` olegendo at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).