public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c++/66573] New: Unexpected change in static, branch-prediction cost from O1 to O2.
@ 2015-06-17 15:07 jmcguiness at liquidcapital dot com
  2015-06-18  8:55 ` [Bug c++/66573] " jmcguiness at liquidcapital dot com
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: jmcguiness at liquidcapital dot com @ 2015-06-17 15:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66573

            Bug ID: 66573
           Summary: Unexpected change in static, branch-prediction cost
                    from O1 to O2.
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jmcguiness at liquidcapital dot com
  Target Milestone: ---

For the following versions of gcc: 4.8.2, 4.9.0 and 5.10 with the following
code sequence:

extern void bar1();
extern void bar2();

void foo(bool i) {
  if (i)
        bar1();
  else
    bar2();
}

I have examined the assembler output for -O0, -O1, -O2 and -O3.

For all versions with -O0 & -O1 I get this:

foo(bool):
        subq    $8, %rsp
        testb   %dil, %dil
        je      .L2
        call    bar1()
        jmp     .L1
.L2:
        call    bar2()
.L1:
        addq    $8, %rsp
        ret

For all version with -O2 and -O3 I get this:

foo(bool):
        testb   %dil, %dil
        jne     .L4
        jmp     bar2()
.L4:
        jmp     bar1()

Note how the calls to bar1() and bar2() have been swapped. (I realise that the
condition has been swapped too, so that the generated code is correct.)

According to:

https://software.intel.com/en-us/articles/branch-and-loop-reorganization-to-prevent-mispredicts/

Forward jumps are not taken. So what has happened is that for -O0 and -O1 the
static branch-prediction has given no mis-prediction if bar1() is called, i.e.
the condition is usually true. But this flips for -O2 and -O3 so that now
bar2() suffers no mis-prediction if the condition is usually false.

The result is that if one codes so that one minimises the mis-predition cost
for -O0 and -O1 this becomes a pessimisation for -O2 and -O3. A rather
unexpected result.

Note that icc v13.0.1 generates the following assembly sequence:

foo(bool):
        testb     %dil, %dil                                    #5.7
        je        ..B1.3        # Prob 50%                      #5.7
        jmp       bar1()                                      #6.2
..B1.3:                         # Preds ..B1.1
        jmp       bar2()                                      #8.5

Which is stable for all optimisation levels, and works as one might expect: the
if-branch is the predicted branch, the else not.

(Yes I am aware of __builtin_expected(), but that is beside the point. One
would expect that the cost should reduce with increasing levels of optimisation
not increase!)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug c++/66573] Unexpected change in static, branch-prediction cost from O1 to O2.
  2015-06-17 15:07 [Bug c++/66573] New: Unexpected change in static, branch-prediction cost from O1 to O2 jmcguiness at liquidcapital dot com
@ 2015-06-18  8:55 ` jmcguiness at liquidcapital dot com
  2015-06-18  9:02 ` jmcguiness at liquidcapital dot com
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: jmcguiness at liquidcapital dot com @ 2015-06-18  8:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66573

--- Comment #1 from Jason McG <jmcguiness at liquidcapital dot com> ---
Note that clang++ for all tested versions (3.0, 3.2, 3.3, 3.4.1, 3.5, 3.5.1,
3.6 (rc2), 3.7 (experimental)) produce the same assembler output for -O1, -O2 &
-O3:

foo(bool):                                # @foo(bool)
        testb   %dil, %dil
        je      .LBB0_2
        jmp     bar1()                # TAILCALL
.LBB0_2:
        jmp     bar2()                # TAILCALL

With bar1() being the statically-predicted branch, as per icc and Intel's own
documentation (as referenced in the link in the initial comment).


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug c++/66573] Unexpected change in static, branch-prediction cost from O1 to O2.
  2015-06-17 15:07 [Bug c++/66573] New: Unexpected change in static, branch-prediction cost from O1 to O2 jmcguiness at liquidcapital dot com
  2015-06-18  8:55 ` [Bug c++/66573] " jmcguiness at liquidcapital dot com
@ 2015-06-18  9:02 ` jmcguiness at liquidcapital dot com
  2015-06-18 22:25 ` [Bug rtl-optimization/66573] Unexpected change in static, branch-prediction cost from O1 to O2 in if-then-else msebor at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: jmcguiness at liquidcapital dot com @ 2015-06-18  9:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66573

--- Comment #2 from Jason McG <jmcguiness at liquidcapital dot com> ---
If I try with this code:

#define likely(x)       __builtin_expect((x),1)
#define unlikely(x)     __builtin_expect((x),0)

extern void bar1();
extern void bar2();

void foo(bool i) {
//  if (i)
  if (likely(i))
        bar1();
  else
    bar2();
}

Then the assembly output for gcc 4.8.2 & 5.10 (the ones I tested) for -O0, -O1,
-O2 & -O3 is the same:

foo(bool):
        testb   %dil, %dil
        je      .L2
        jmp     bar1()
.L2:
        jmp     bar2()

i.e. gcc now behaves as per icc and clang++. So using __builtin_expect seems to
be a work-around to get gcc to behave like other compilers.

I would sincerely hope that at least the documentation for these optimisation
options warns the programmer that gcc optimises if-else statements in the
reverse manner to other major compilers and even across it's own optimisation
levels (!), as this apparent change in behaviour is most unexpected...


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug rtl-optimization/66573] Unexpected change in static, branch-prediction cost from O1 to O2 in if-then-else.
  2015-06-17 15:07 [Bug c++/66573] New: Unexpected change in static, branch-prediction cost from O1 to O2 jmcguiness at liquidcapital dot com
  2015-06-18  8:55 ` [Bug c++/66573] " jmcguiness at liquidcapital dot com
  2015-06-18  9:02 ` jmcguiness at liquidcapital dot com
@ 2015-06-18 22:25 ` msebor at gcc dot gnu.org
  2015-06-18 23:13 ` [Bug tree-optimization/66573] " pinskia at gcc dot gnu.org
  2020-04-13 11:27 ` gcc-bugs at hussar dot me.uk
  4 siblings, 0 replies; 6+ messages in thread
From: msebor at gcc dot gnu.org @ 2015-06-18 22:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66573

Martin Sebor <msebor at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
          Component|c++                         |rtl-optimization

--- Comment #3 from Martin Sebor <msebor at gcc dot gnu.org> ---
Since this isn't C++ specific but rather a doing of the RTL optimizer I changed
the component to rtl-optimization.

On powerpc64, which historically has implemented the same static branch
prediction strategy, and where IBM XLC emits the same code as Clang, GCC emits
the following at -O1:

foo:
        ...
        cmpdi 7,3,0
        beq 7,.L2
        bl bar1
        ...
        b .L1
.L2:
        bl bar2
        ...
.L1:
        ...
        blr

while the following at -O2:

foo:
        ...
        cmpdi 7,3,0
        ...
        bne 7,.L6
        bl bar2
        ...
        blr
.L6:
        bl bar1
        ...
        blr

Comparing the RTL dumps between -O1 and -O2 it looks like the change is
introduced in the basic block reordering (bbro) pass that only runs at -O2. 
There, the if_then_else eq instruction

(jump_insn 7 6 8 2 (set (pc)
        (if_then_else (eq (reg:CC 75 7 [156])
                (const_int 0 [0]))
            (label_ref 12)
            (pc))) t.c:5 693 {*rs6000.md:12637}
     (expr_list:REG_DEAD (reg:CC 75 7 [156])
        (int_list:REG_BR_PROB 6100 (nil)))
 -> 12)
...
(call_insn 14 13 17 3 (parallel [
            (call (mem:SI (symbol_ref:DI ("bar2") [flags 0x41]  <function_decl
0x3fff84018c10 bar2>) [0 bar2 S4 A8])

is replaced with

(jump_insn 7 6 13 2 (set (pc)
        (if_then_else (ne (reg:CC 75 7 [156])
                (const_int 0 [0]))
            (label_ref:DI 53)
            (pc))) t.c:5 693 {*rs6000.md:12637}
     (expr_list:REG_DEAD (reg:CC 75 7 [156])
        (int_list:REG_BR_PROB 3900 (nil)))
 -> 53)
...
(call_insn 14 13 34 3 (parallel [
            (call (mem:SI (symbol_ref:DI ("bar2") [flags 0x41]  <function_decl
0x3fff84018c10 bar2>) [0 bar2 S4 A8])

I would expect this to then be corrected if necessary according to the
processor's static branch prediction strategy but it clearly doesn't happen for
powerpc64 or apparently x86_64.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug tree-optimization/66573] Unexpected change in static, branch-prediction cost from O1 to O2 in if-then-else.
  2015-06-17 15:07 [Bug c++/66573] New: Unexpected change in static, branch-prediction cost from O1 to O2 jmcguiness at liquidcapital dot com
                   ` (2 preceding siblings ...)
  2015-06-18 22:25 ` [Bug rtl-optimization/66573] Unexpected change in static, branch-prediction cost from O1 to O2 in if-then-else msebor at gcc dot gnu.org
@ 2015-06-18 23:13 ` pinskia at gcc dot gnu.org
  2020-04-13 11:27 ` gcc-bugs at hussar dot me.uk
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2015-06-18 23:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66573

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |tree-optimization

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>REG_BR_PROB 6100 (nil)

This is the issue.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug tree-optimization/66573] Unexpected change in static, branch-prediction cost from O1 to O2 in if-then-else.
  2015-06-17 15:07 [Bug c++/66573] New: Unexpected change in static, branch-prediction cost from O1 to O2 jmcguiness at liquidcapital dot com
                   ` (3 preceding siblings ...)
  2015-06-18 23:13 ` [Bug tree-optimization/66573] " pinskia at gcc dot gnu.org
@ 2020-04-13 11:27 ` gcc-bugs at hussar dot me.uk
  4 siblings, 0 replies; 6+ messages in thread
From: gcc-bugs at hussar dot me.uk @ 2020-04-13 11:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66573

--- Comment #12 from Jason <gcc-bugs at hussar dot me.uk> ---
This significant change appears to have been resolved in g++ v8.1.0. So now g++
does the same as the massive corpus of computer science and other compilers.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-04-13 11:27 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-17 15:07 [Bug c++/66573] New: Unexpected change in static, branch-prediction cost from O1 to O2 jmcguiness at liquidcapital dot com
2015-06-18  8:55 ` [Bug c++/66573] " jmcguiness at liquidcapital dot com
2015-06-18  9:02 ` jmcguiness at liquidcapital dot com
2015-06-18 22:25 ` [Bug rtl-optimization/66573] Unexpected change in static, branch-prediction cost from O1 to O2 in if-then-else msebor at gcc dot gnu.org
2015-06-18 23:13 ` [Bug tree-optimization/66573] " pinskia at gcc dot gnu.org
2020-04-13 11:27 ` gcc-bugs at hussar dot me.uk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).