[Bug rtl-optimization/58033] New: counterproductive bb-reorder

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug rtl-optimization/58033] New: counterproductive bb-reorder
@ 2013-07-30 21:00 olegendo at gcc dot gnu.org
  2013-07-30 21:01 ` [Bug rtl-optimization/58033] " olegendo at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-07-30 21:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033

            Bug ID: 58033
           Summary: counterproductive bb-reorder
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: olegendo at gcc dot gnu.org
                CC: steven at gcc dot gnu.org, tejohnson at google dot com
            Target: sh*-*-*

On SH, compiling the following code with -O2

#include <bitset>

std::bitset<32> make_bits (void)
{
  std::bitset<32> r;
  for (auto&& i : { 4, 5, 6, 10 })
    if (i < r.size ())
      r.set (i);

  return r;
}

results in the following code:

        mov.l   .L8,r1
        mov     #0,r0
        mov     #31,r7
        mov     #1,r6
        mov     #4,r2
.L2:
        mov.l   @r1,r3
        cmp/hi  r7,r3
        bf/s    .L7
        mov     r6,r5
.L3:
        dt      r2
        bf/s    .L2     // branch if value not > 31, i.e. in each iteration
        add     #4,r1
        rts
        nop
        .align 1
.L7:
        shld    r3,r5
        bra     .L3
        or      r5,r0
.L9:
        .align 2
.L8:
        .long   _._45+0

_._45:
        .long   4
        .long   5
        .long   6
        .long   10

Disabling the bb-reorder pass or using -Os results in more compact and faster
code:

        mov.l   .L7,r1
        mov     #0,r0
        mov     #31,r7
        mov     #1,r6
        mov     #4,r2
.L2:
        mov.l   @r1,r3
        cmp/hi  r7,r3
        bt/s    .L3    // branch if value > 31, i.e. never.
        mov     r6,r5
        shld    r3,r5
        or      r5,r0
.L3:
        dt      r2
        bf/s    .L2
        add     #4,r1
        rts    
        nop

Of course the bb-reorder pass doesn't know that the values in this case are
always in range.  Still, maybe it could be improved by not splitting out a BB
if it consists only of a few insns?  I've tried increasing the branch cost but
it won't do anything.

Teresa, Steven,


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug rtl-optimization/58033] counterproductive bb-reorder
  2013-07-30 21:00 [Bug rtl-optimization/58033] New: counterproductive bb-reorder olegendo at gcc dot gnu.org
@ 2013-07-30 21:01 ` olegendo at gcc dot gnu.org
  2013-07-30 21:16 ` tejohnson at google dot com
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-07-30 21:01 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033

--- Comment #1 from Oleg Endo <olegendo at gcc dot gnu.org> ---
... according to the changelog you've been doing some things on bb-reorder.c,
maybe you have an idea?


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug rtl-optimization/58033] counterproductive bb-reorder
  2013-07-30 21:00 [Bug rtl-optimization/58033] New: counterproductive bb-reorder olegendo at gcc dot gnu.org
  2013-07-30 21:01 ` [Bug rtl-optimization/58033] " olegendo at gcc dot gnu.org
@ 2013-07-30 21:16 ` tejohnson at google dot com
  2013-07-30 21:40 ` olegendo at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: tejohnson at google dot com @ 2013-07-30 21:16 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033

--- Comment #2 from Teresa Johnson <tejohnson at google dot com> ---
On Tue, Jul 30, 2013 at 2:00 PM, olegendo at gcc dot gnu.org
<gcc-bugzilla@gcc.gnu.org> wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033
>
>             Bug ID: 58033
>            Summary: counterproductive bb-reorder
>            Product: gcc
>            Version: 4.9.0
>             Status: UNCONFIRMED
>           Severity: normal
>           Priority: P3
>          Component: rtl-optimization
>           Assignee: unassigned at gcc dot gnu.org
>           Reporter: olegendo at gcc dot gnu.org
>                 CC: steven at gcc dot gnu.org, tejohnson at google dot com
>             Target: sh*-*-*
>
> On SH, compiling the following code with -O2
>
> #include <bitset>
>
> std::bitset<32> make_bits (void)
> {
>   std::bitset<32> r;
>   for (auto&& i : { 4, 5, 6, 10 })
>     if (i < r.size ())
>       r.set (i);
>
>   return r;
> }
>
> results in the following code:
>
>         mov.l   .L8,r1
>         mov     #0,r0
>         mov     #31,r7
>         mov     #1,r6
>         mov     #4,r2
> .L2:
>         mov.l   @r1,r3
>         cmp/hi  r7,r3
>         bf/s    .L7

I assume it is the above branch that is the issue (not the bf/s .L2
below as that is the same in both versions of the code). I'm assuming
this is not build with FDO? In that case bbro is probably at the mercy
of whatever probabilities the static heuristics assigned to the
branches. Although if it is 50-50 then I'm not sure offhand what
happens - maybe it is biasing in favor of having the shortest trace?
This is a great test case for motivating range propagation. =)

Can you attach the dump created with -fdump-rtl-bbro-all? We can see
what the edge probabilities are. For some reason it is not compiling
for me - what options do you use? My (4_7-based) g++ is complaining
about the "auto":

$ g++ -O2 pr58033.cc -S
pr58033.cc: In function 'std::bitset<32ul> make_bits()':
pr58033.cc:6:12: error: expected unqualified-id before '&&' token
   for (auto&& i : { 4, 5, 6, 10 })

Teresa

>         mov     r6,r5
> .L3:
>         dt      r2
>         bf/s    .L2     // branch if value not > 31, i.e. in each iteration
>         add     #4,r1
>         rts
>         nop
>         .align 1
> .L7:
>         shld    r3,r5
>         bra     .L3
>         or      r5,r0
> .L9:
>         .align 2
> .L8:
>         .long   _._45+0
>
> _._45:
>         .long   4
>         .long   5
>         .long   6
>         .long   10
>
> Disabling the bb-reorder pass or using -Os results in more compact and faster
> code:
>
>         mov.l   .L7,r1
>         mov     #0,r0
>         mov     #31,r7
>         mov     #1,r6
>         mov     #4,r2
> .L2:
>         mov.l   @r1,r3
>         cmp/hi  r7,r3
>         bt/s    .L3    // branch if value > 31, i.e. never.
>         mov     r6,r5
>         shld    r3,r5
>         or      r5,r0
> .L3:
>         dt      r2
>         bf/s    .L2
>         add     #4,r1
>         rts
>         nop
>
> Of course the bb-reorder pass doesn't know that the values in this case are
> always in range.  Still, maybe it could be improved by not splitting out a BB
> if it consists only of a few insns?  I've tried increasing the branch cost but
> it won't do anything.
>
> Teresa, Steven,
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug rtl-optimization/58033] counterproductive bb-reorder
  2013-07-30 21:00 [Bug rtl-optimization/58033] New: counterproductive bb-reorder olegendo at gcc dot gnu.org
  2013-07-30 21:01 ` [Bug rtl-optimization/58033] " olegendo at gcc dot gnu.org
  2013-07-30 21:16 ` tejohnson at google dot com
@ 2013-07-30 21:40 ` olegendo at gcc dot gnu.org
  2013-07-30 23:49 ` tejohnson at google dot com
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-07-30 21:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033

--- Comment #3 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Created attachment 30574
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30574&action=edit
bbro dump

(In reply to Teresa Johnson from comment #2)
> 
> I assume it is the above branch that is the issue (not the bf/s .L2
> below as that is the same in both versions of the code). I'm assuming
> this is not build with FDO?

Right, no FDO.

> In that case bbro is probably at the mercy
> of whatever probabilities the static heuristics assigned to the
> branches. Although if it is 50-50 then I'm not sure offhand what
> happens - maybe it is biasing in favor of having the shortest trace?
> This is a great test case for motivating range propagation. =)
> 
> Can you attach the dump created with -fdump-rtl-bbro-all? We can see
> what the edge probabilities are. For some reason it is not compiling
> for me - what options do you use? My (4_7-based) g++ is complaining
> about the "auto":

Sorry, it's C++11, so you need to specify -std=c++11.

Here's a C++03 version, which on my setup (4.9 trunk rev 201282) results in
exactly the same code:

#include <bitset>

std::bitset<32> make_bits2 (void)
{
  static const int ii[] = { 4, 5, 6, 10 };
  std::bitset<32> r;
  for (int i = 0; i < 4; ++i)
    if (ii[i] < r.size ())
      r.set (ii[i]);

  return r;
}


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug rtl-optimization/58033] counterproductive bb-reorder
  2013-07-30 21:00 [Bug rtl-optimization/58033] New: counterproductive bb-reorder olegendo at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2013-07-30 21:40 ` olegendo at gcc dot gnu.org
@ 2013-07-30 23:49 ` tejohnson at google dot com
  2013-11-25 18:44 ` olegendo at gcc dot gnu.org
  2015-04-04 11:29 ` olegendo at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: tejohnson at google dot com @ 2013-07-30 23:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033

--- Comment #4 from Teresa Johnson <tejohnson at google dot com> ---
On Tue, Jul 30, 2013 at 2:40 PM, olegendo at gcc dot gnu.org
<gcc-bugzilla@gcc.gnu.org> wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033
>
> --- Comment #3 from Oleg Endo <olegendo at gcc dot gnu.org> ---
> Created attachment 30574
>   --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30574&action=edit
> bbro dump
>
> (In reply to Teresa Johnson from comment #2)
>>
>> I assume it is the above branch that is the issue (not the bf/s .L2
>> below as that is the same in both versions of the code). I'm assuming
>> this is not build with FDO?
>
> Right, no FDO.
>
>> In that case bbro is probably at the mercy
>> of whatever probabilities the static heuristics assigned to the
>> branches. Although if it is 50-50 then I'm not sure offhand what
>> happens - maybe it is biasing in favor of having the shortest trace?
>> This is a great test case for motivating range propagation. =)

Thanks for the dump. The issue is the heuristically assigned edge frequencies:

;;  succ:       6 [29.0%]  (FALLTHRU,CAN_FALLTHRU)
;;              4 [71.0%]  (CAN_FALLTHRU)

BB 6 contains the code within the then clause that is in fact always
executed. But it is being assigned only 29% probability, which is why
bbro lays out the code as it does.

I'm not familiar with the static edge weight heuristics in gcc, but it
would be interesting to see why it is predicting that the if statement
will be false.

Teresa

>>
>> Can you attach the dump created with -fdump-rtl-bbro-all? We can see
>> what the edge probabilities are. For some reason it is not compiling
>> for me - what options do you use? My (4_7-based) g++ is complaining
>> about the "auto":
>
> Sorry, it's C++11, so you need to specify -std=c++11.
>
> Here's a C++03 version, which on my setup (4.9 trunk rev 201282) results in
> exactly the same code:
>
> #include <bitset>
>
> std::bitset<32> make_bits2 (void)
> {
>   static const int ii[] = { 4, 5, 6, 10 };
>   std::bitset<32> r;
>   for (int i = 0; i < 4; ++i)
>     if (ii[i] < r.size ())
>       r.set (ii[i]);
>
>   return r;
> }
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug rtl-optimization/58033] counterproductive bb-reorder
  2013-07-30 21:00 [Bug rtl-optimization/58033] New: counterproductive bb-reorder olegendo at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2013-07-30 23:49 ` tejohnson at google dot com
@ 2013-11-25 18:44 ` olegendo at gcc dot gnu.org
  2015-04-04 11:29 ` olegendo at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-11-25 18:44 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033

--- Comment #5 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Another somewhat related SH example, compiled with -m4 -O2 -ml on rev 205313:

int
foo1 (long long value)
{
  const long long constant = 0xc000000080000000LL;

  if (value < constant)
    return 1;
  else
    return 2;
}

        mov.l   .L12,r1
        cmp/ge  r1,r5
        bt/s    .L10         <<< should branch to L10
        cmp/gt  r1,r5
        mov     #1,r0
.L6:
        rts    
        nop
    .align 1
.L10:
        bf      .L11         <<< never branches, always falls through
.L5:
        rts
        mov    #2,r0
        .align 1
.L11:
        mov.l   .L13,r1
        cmp/hs  r1,r4
        bt      .L5
        mov     #1,r0
        bra     .L6          <<< could just duplicate BB from L6 here
        nop                      (see also PR 58615)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug rtl-optimization/58033] counterproductive bb-reorder
  2013-07-30 21:00 [Bug rtl-optimization/58033] New: counterproductive bb-reorder olegendo at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2013-11-25 18:44 ` olegendo at gcc dot gnu.org
@ 2015-04-04 11:29 ` olegendo at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-04-04 11:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033

--- Comment #6 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Another example on SH:

const char* test (const char* s0, int c, int* rout)
{
  int r = 0;

  for (int i = 0; i < c; ++i)
    r += s0[i];

  *rout = r;
  return s0;
}


compiled with -O2:
_test:
        cmp/pl  r5
        bf      .L4
        mov     #0,r1
        mov     r4,r2
        .align 2
.L3:
        mov.b   @r2+,r3
        dt      r5
        bf/s    .L3
        add     r3,r1
.L2:
        mov.l   r1,@r6
        rts
        mov     r4,r0
    .align 1
.L4:
        bra     .L2
        mov     #0,r1


compiled with -O2 -fno-reorder-blocks:

_test:
        cmp/pl  r5
        bf/s    .L4
        mov     #0,r1
        mov     r4,r2
    .align 2
.L3:
        mov.b   @r2+,r3
        dt      r5
        bf/s    .L3
        add     r3,r1
        bra     .L7
        mov.l   r1,@r6
        .align 1
.L4:
        mov.l   r1,@r6
.L7:
        rts
        mov     r4,r0

.. which is better, except for the redundant stores.  Folding the two stores
gives the minimal code:
_test:
        cmp/pl  r5
        bf/s    .L4
        mov     #0,r1
        mov     r4,r2
    .align 2
.L3:
        mov.b   @r2+,r3
        dt      r5
        bf/s    .L3
        add     r3,r1
.L4:
        mov.l   r1,@r6
        rts
        mov     r4,r0


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-04-04 11:29 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-30 21:00 [Bug rtl-optimization/58033] New: counterproductive bb-reorder olegendo at gcc dot gnu.org
2013-07-30 21:01 ` [Bug rtl-optimization/58033] " olegendo at gcc dot gnu.org
2013-07-30 21:16 ` tejohnson at google dot com
2013-07-30 21:40 ` olegendo at gcc dot gnu.org
2013-07-30 23:49 ` tejohnson at google dot com
2013-11-25 18:44 ` olegendo at gcc dot gnu.org
2015-04-04 11:29 ` olegendo at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).