[Bug c++/66520] New: bad code generated code for branches with single &

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug c++/66520] New: bad code generated code for branches with single &
@ 2015-06-12 11:52 fkastrati at gmail dot com
  2015-06-13 10:13 ` [Bug c++/66520] " ebotcazou at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: fkastrati at gmail dot com @ 2015-06-12 11:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66520

            Bug ID: 66520
           Summary: bad code generated code for branches with single &
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fkastrati at gmail dot com
  Target Milestone: ---

consider the following code snippet (c++):

void ampamp(int x, int y) {
  if (x < 3 && y > 10 ) 
    printf("%d%d", x, y);
}

void amp(int x, int y) {
  if ((x < 3) & (y > 10) ) 
    printf("%d%d", x, y);
}

the assembly code generated by g++ (all versions I tested with  optimization
flag `-O3'), is not optimal (see the link on the bottom of this message).
Basically, for both methods, the generated assembly code is identical. For the
method "amp" (with single `&') there should be generated only one jump, as
branch misprediction penalty is very high otherwise!

As a side note: the code by intel's  compiler (ICC) is however generating
optimal code for such scenarios, at least for versions icc13, and icc15 that
I've tested.

See: http://goo.gl/oiTPX5

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug c++/66520] bad code generated code for branches with single &
  2015-06-12 11:52 [Bug c++/66520] New: bad code generated code for branches with single & fkastrati at gmail dot com
@ 2015-06-13 10:13 ` ebotcazou at gcc dot gnu.org
  2015-06-13 23:21 ` fkastrati at gmail dot com
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: ebotcazou at gcc dot gnu.org @ 2015-06-13 10:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66520

Eric Botcazou <ebotcazou at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |ebotcazou at gcc dot gnu.org
         Resolution|---                         |WORKSFORME

--- Comment #1 from Eric Botcazou <ebotcazou at gcc dot gnu.org> ---
> consider the following code snippet (c++):
> 
> void ampamp(int x, int y) {
>   if (x < 3 && y > 10 ) 
>     printf("%d%d", x, y);
> }
> 
> void amp(int x, int y) {
>   if ((x < 3) & (y > 10) ) 
>     printf("%d%d", x, y);
> }
> 
> 
> the assembly code generated by g++ (all versions I tested with  optimization
> flag `-O3'), is not optimal (see the link on the bottom of this message).
> Basically, for both methods, the generated assembly code is identical.

An optimizing compiler should generate the same code in both cases, either with
or without branches, whatever form is deemed the fastest for the target, since
there are no side-effects involved in the evaluation of the comparisons.

> As a side note: the code by intel's  compiler (ICC) is however generating
> optimal code for such scenarios, at least for versions icc13, and icc15 that
> I've tested.

One of the forms is necessarily not optimal (unless they are equivalent).


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug c++/66520] bad code generated code for branches with single &
  2015-06-12 11:52 [Bug c++/66520] New: bad code generated code for branches with single & fkastrati at gmail dot com
  2015-06-13 10:13 ` [Bug c++/66520] " ebotcazou at gcc dot gnu.org
@ 2015-06-13 23:21 ` fkastrati at gmail dot com
  2015-06-14  7:23 ` ebotcazou at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: fkastrati at gmail dot com @ 2015-06-13 23:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66520

--- Comment #2 from Fisnik <fkastrati at gmail dot com> ---
(In reply to Eric Botcazou from comment #1)
> > consider the following code snippet (c++):
> > 
> > void ampamp(int x, int y) {
> >   if (x < 3 && y > 10 ) 
> >     printf("%d%d", x, y);
> > }
> > 
> > void amp(int x, int y) {
> >   if ((x < 3) & (y > 10) ) 
> >     printf("%d%d", x, y);
> > }
> > 
> > 
> > the assembly code generated by g++ (all versions I tested with  optimization
> > flag `-O3'), is not optimal (see the link on the bottom of this message).
> > Basically, for both methods, the generated assembly code is identical.
> 
> An optimizing compiler should generate the same code in both cases, either
> with or without branches, whatever form is deemed the fastest for the
> target, since
> there are no side-effects involved in the evaluation of the comparisons.
> 
> > As a side note: the code by intel's  compiler (ICC) is however generating
> > optimal code for such scenarios, at least for versions icc13, and icc15 that
> > I've tested.
> 
> One of the forms is necessarily not optimal (unless they are equivalent).

I do not quite agree. If the first operand is "always" true, than the version
with double ampersand `&&' is faster. But if probability of the first operand
being true is around 0.5, then the version with single `&' is faster, given
that the compiler generates the correct assembly, i.e., a single jump. The
reason for this is the branch misprediction penalty, which is severe at
probabilities around 0.5.

I already did experiments with both methods `&', and `&&', and varying
probabilities for predicates being true.
The difference (in speed) is around factor 2.3, as the selectivity of the first
predicate approaches 0.5, (i.e., the probability of the first operand being
true -> 0.5).
That is, the method with single ampersand with beat clearly the method with
double ampersand.

The predicates I used were simple, but for more expensive predicates the
difference is even worse.
If still not convinced, read the paper: "Conjunctive selection conditions in
main memory" by K.Ross.

There should be at least some flag available, such that we can set such a flag
and have the compiler generate only a single jump for the method with single
ampersand.

[reply] [-] Comment 4

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug c++/66520] bad code generated code for branches with single &
  2015-06-12 11:52 [Bug c++/66520] New: bad code generated code for branches with single & fkastrati at gmail dot com
  2015-06-13 10:13 ` [Bug c++/66520] " ebotcazou at gcc dot gnu.org
  2015-06-13 23:21 ` fkastrati at gmail dot com
@ 2015-06-14  7:23 ` ebotcazou at gcc dot gnu.org
  2015-06-14 10:44 ` fkastrati at gmail dot com
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: ebotcazou at gcc dot gnu.org @ 2015-06-14  7:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66520

--- Comment #3 from Eric Botcazou <ebotcazou at gcc dot gnu.org> ---
> There should be at least some flag available, such that we can set such a
> flag and have the compiler generate only a single jump for the method with
> single ampersand.

You missed the point though, the compiler should generate the same code in both
cases.  Whether or not it's the branchy form is another debate, which cannot be
settled with toy examples like this one but by using real-life benchmarks.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug c++/66520] bad code generated code for branches with single &
  2015-06-12 11:52 [Bug c++/66520] New: bad code generated code for branches with single & fkastrati at gmail dot com
                   ` (2 preceding siblings ...)
  2015-06-14  7:23 ` ebotcazou at gcc dot gnu.org
@ 2015-06-14 10:44 ` fkastrati at gmail dot com
  2015-06-14 10:57 ` ebotcazou at gcc dot gnu.org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: fkastrati at gmail dot com @ 2015-06-14 10:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66520

--- Comment #4 from Fisnik <fkastrati at gmail dot com> ---
(In reply to Eric Botcazou from comment #3)
> > There should be at least some flag available, such that we can set such a
> > flag and have the compiler generate only a single jump for the method with
> > single ampersand.
> 
> You missed the point though, the compiler should generate the same code in
> both cases.  Whether or not it's the branchy form is another debate, which
> cannot be settled with toy examples like this one but by using real-life
> benchmarks.

Compiler should not generate the same code, and should listen to the developer,
when she/he connects predicates with single `&'. I already wrote that I did
benchmarks, and also mentioned a paper from a top computer science conference
(which I see you didn't even bother to read it) which has all the evidence
required.
The branch misprediction penalty for the assembly code with two jumps is simply
too severe when the selectivity of a predicate is around 0.5. 
And I also mentioned it that Intel's compiler is catching this very well. The
example was given just to point the problem, I cannot possibly write you here
10k lines of code of my project.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug c++/66520] bad code generated code for branches with single &
  2015-06-12 11:52 [Bug c++/66520] New: bad code generated code for branches with single & fkastrati at gmail dot com
                   ` (3 preceding siblings ...)
  2015-06-14 10:44 ` fkastrati at gmail dot com
@ 2015-06-14 10:57 ` ebotcazou at gcc dot gnu.org
  2015-06-14 10:58 ` schwab@linux-m68k.org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: ebotcazou at gcc dot gnu.org @ 2015-06-14 10:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66520

--- Comment #5 from Eric Botcazou <ebotcazou at gcc dot gnu.org> ---
> Compiler should not generate the same code, and should listen to the
> developer, when she/he connects predicates with single `&'. I already wrote
> that I did benchmarks, and also mentioned a paper from a top computer
> science conference (which I see you didn't even bother to read it) which has
> all the evidence required.

You're still missing the point!  If the non-branchy form is the fastest, then
the compiler should generate it also with the short-circuit form &&.

> And I also mentioned it that Intel's compiler is catching this very well.

Nope, the Intel compiler is inconsistent here (unlike GCC and Clang).


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug c++/66520] bad code generated code for branches with single &
  2015-06-12 11:52 [Bug c++/66520] New: bad code generated code for branches with single & fkastrati at gmail dot com
                   ` (4 preceding siblings ...)
  2015-06-14 10:57 ` ebotcazou at gcc dot gnu.org
@ 2015-06-14 10:58 ` schwab@linux-m68k.org
  2015-06-14 11:26 ` fkastrati at gmail dot com
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: schwab@linux-m68k.org @ 2015-06-14 10:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66520

--- Comment #6 from Andreas Schwab <schwab@linux-m68k.org> ---
If a and b are side-effect-free, pure-boolean expressions then `a && b' and `a
& b' are completely equivalent and there is no reason to generate different
code for them.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug c++/66520] bad code generated code for branches with single &
  2015-06-12 11:52 [Bug c++/66520] New: bad code generated code for branches with single & fkastrati at gmail dot com
                   ` (5 preceding siblings ...)
  2015-06-14 10:58 ` schwab@linux-m68k.org
@ 2015-06-14 11:26 ` fkastrati at gmail dot com
  2015-06-14 11:34 ` fkastrati at gmail dot com
  2015-06-14 12:26 ` redi at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: fkastrati at gmail dot com @ 2015-06-14 11:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66520

--- Comment #7 from Fisnik <fkastrati at gmail dot com> ---
(In reply to Eric Botcazou from comment #5)
> > Compiler should not generate the same code, and should listen to the
> > developer, when she/he connects predicates with single `&'. I already wrote
> > that I did benchmarks, and also mentioned a paper from a top computer
> > science conference (which I see you didn't even bother to read it) which has
> > all the evidence required.
> 
> You're still missing the point!  If the non-branchy form is the fastest,
> then the compiler should generate it also with the short-circuit form &&.
> 
> > And I also mentioned it that Intel's compiler is catching this very well.
> 
> Nope, the Intel compiler is inconsistent here (unlike GCC and Clang).

As I said, I would leave that to the developer. If I connect predicates with &,
the compiler should generate the non-branchy code. So this issue is
definatively an optimization bug.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug c++/66520] bad code generated code for branches with single &
  2015-06-12 11:52 [Bug c++/66520] New: bad code generated code for branches with single & fkastrati at gmail dot com
                   ` (6 preceding siblings ...)
  2015-06-14 11:26 ` fkastrati at gmail dot com
@ 2015-06-14 11:34 ` fkastrati at gmail dot com
  2015-06-14 12:26 ` redi at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: fkastrati at gmail dot com @ 2015-06-14 11:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66520

--- Comment #8 from Fisnik <fkastrati at gmail dot com> ---
(In reply to Andreas Schwab from comment #6)
> If a and b are side-effect-free, pure-boolean expressions then `a && b' and
> `a & b' are completely equivalent and there is no reason to generate
> different code for them.

I said it is an optimization bug. 
The branchy code is very expensive when the first operand has probability of
around 0.5 of being true, therefore it ruins the branch prediction. Now you can
imagine a much more complex scenario, such as  e.g. (A & B & ... & Z), and if
all predicates have probabilities of 0.5, then I'll end here with very
inefficient code (the way the g++ is generating assembly). This does not happen
for e.g. with icc.

To this end, the compiler should respect the code written by the developer. I'm
already having trouble with g++, as for the code with single &, it is
generating inefficient code. Or I have to remove completely the optimization
flag '-O3', but then I have a very slow code, and your comments are not helping
me at all solve the problem!

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug c++/66520] bad code generated code for branches with single &
  2015-06-12 11:52 [Bug c++/66520] New: bad code generated code for branches with single & fkastrati at gmail dot com
                   ` (7 preceding siblings ...)
  2015-06-14 11:34 ` fkastrati at gmail dot com
@ 2015-06-14 12:26 ` redi at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: redi at gcc dot gnu.org @ 2015-06-14 12:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66520

--- Comment #9 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Fisnik from comment #8)
> To this end, the compiler should respect the code written by the developer.

As far as the C++ standard is concerned the built-in && and & operators for
type bool are equivalent if evaluating the operands has no side-effects. C++ is
not a high-level assembly language.

> I'm already having trouble with g++, as for the code with single &, it is
> generating inefficient code. Or I have to remove completely the optimization
> flag '-O3', but then I have a very slow code

Have you tried just using -O3 -fno-xxx for the relevant pass?


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-06-14 12:26 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-12 11:52 [Bug c++/66520] New: bad code generated code for branches with single & fkastrati at gmail dot com
2015-06-13 10:13 ` [Bug c++/66520] " ebotcazou at gcc dot gnu.org
2015-06-13 23:21 ` fkastrati at gmail dot com
2015-06-14  7:23 ` ebotcazou at gcc dot gnu.org
2015-06-14 10:44 ` fkastrati at gmail dot com
2015-06-14 10:57 ` ebotcazou at gcc dot gnu.org
2015-06-14 10:58 ` schwab@linux-m68k.org
2015-06-14 11:26 ` fkastrati at gmail dot com
2015-06-14 11:34 ` fkastrati at gmail dot com
2015-06-14 12:26 ` redi at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).