[Bug tree-optimization/67628] New: [tree-optimization] (a && b) && c shows better codegen than a && (b && c)

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/67628] New: [tree-optimization] (a && b) && c shows better codegen than a && (b && c)
@ 2015-09-18 14:32 ktkachov at gcc dot gnu.org
  2015-09-18 14:48 ` [Bug tree-optimization/67628] " pinskia at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2015-09-18 14:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67628

            Bug ID: 67628
           Summary: [tree-optimization] (a && b) && c shows better codegen
                    than a && (b && c)
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

Consider the two functions:
int
foo1 (int a, int b, int c, int d)
{
  return a > b && b <= c && c > d;
}

int
foo2 (int a, int b, int c, int d)
{
  return a > b && (b <= c && c > d);
}

On aarch64 foo1 generates:
foo1:
        cmp     w1, w2
        ccmp    w2, w3, 4, le
        ccmp    w0, w1, 4, gt
        cset    w0, gt
        ret

but foo2 generates:
foo2:
        cmp     w0, w1
        ble     .L4
        cmp     w1, w2
        cset    w1, le
        cmp     w2, w3
        cset    w0, gt
        and     w0, w1, w0
        ret


Something similar is observed on x86_64, where foo2 contains a conditional
branch instruction whereas foo1 is a single basic block.

In foo2 we end up generating multiple basic blocks, whereas for foo1 we manage
to merge them all into one basic block, which then goes through the
conditional-compare pass nicely.

Looking at the final .optimized tree dump the foo1 tree is:
  _BoolD.2673 _1;
  _BoolD.2673 _4;
  _BoolD.2673 _6;
  _BoolD.2673 _10;
  _BoolD.2673 _11;
  intD.7 _12;

;;   basic block 2, loop depth 0, count 0, freq 10000, maybe hot
;;    prev block 0, next block 1, flags: (NEW, REACHABLE)
;;    pred:       ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
  # RANGE [0, 1]
  _4 = a_2(D) > b_3(D);
  # RANGE [0, 1]
  _6 = b_3(D) <= c_5(D);
  # RANGE [0, 1]
  _10 = c_5(D) > d_8(D);
  # RANGE [0, 1]
  _1 = _6 & _10;
  # RANGE [0, 1]
  _11 = _1 & _4;
  # RANGE [0, 1] NONZERO 1
  _12 = (intD.7) _11;
  # VUSE <.MEM_9(D)>
  return _12;
;;    succ:       EXIT [100.0%] 




whereas for foo2 it's more complex:
  intD.7 iftmp.0_1;
  _BoolD.2673 _5;
  _BoolD.2673 _7;
  _BoolD.2673 _8;
  intD.7 _10;
  _BoolD.2673 _11;

;;   basic block 2, loop depth 0, count 0, freq 10000, maybe hot
;;    prev block 0, next block 3, flags: (NEW, REACHABLE)
;;    pred:       ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
  if (a_2(D) > b_3(D))
    goto <bb 3>;
  else
    goto <bb 4>;
;;    succ:       3 [50.0%]  (TRUE_VALUE,EXECUTABLE)
;;                4 [50.0%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 3, loop depth 0, count 0, freq 5000, maybe hot
;;    prev block 2, next block 4, flags: (NEW, REACHABLE)
;;    pred:       2 [50.0%]  (TRUE_VALUE,EXECUTABLE)
  # RANGE [0, 1]
  _5 = b_3(D) <= c_4(D);
  # RANGE [0, 1]
  _7 = c_4(D) > d_6(D);
  # RANGE [0, 1]
  _8 = _5 & _7;
;;    succ:       4 [100.0%]  (FALLTHRU,EXECUTABLE)

;;   basic block 4, loop depth 0, count 0, freq 10000, maybe hot
;;    prev block 3, next block 1, flags: (NEW, REACHABLE)
;;    pred:       3 [100.0%]  (FALLTHRU,EXECUTABLE)
;;                2 [50.0%]  (FALSE_VALUE,EXECUTABLE)
  # _11 = PHI <_8(3), 0(2)>
  # RANGE [0, 1] NONZERO 1
  iftmp.0_1 = (intD.7) _11;
  # VUSE <.MEM_9(D)>
  return iftmp.0_1;
;;    succ:       EXIT [100.0%] 



If we were to pick some kind of canonicalization for these equivalent
expressions, it would make life easier for later passes to generate consistent
code.
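
The equivalence of the two expressions can be checked with a small brute-force
harness. This is only a sketch: foo1 and foo2 are copied from the report, while
count_mismatches and its range arguments are hypothetical names introduced here
for illustration. It says nothing about code generation, only that both
functions return the same value for every input in the tested range.

```c
/* The two functions from the report, reproduced unchanged.  */
int
foo1 (int a, int b, int c, int d)
{
  return a > b && b <= c && c > d;
}

int
foo2 (int a, int b, int c, int d)
{
  return a > b && (b <= c && c > d);
}

/* Hypothetical helper (not part of the report): count the inputs with
   every argument in [lo, hi] for which foo1 and foo2 disagree.  */
long
count_mismatches (int lo, int hi)
{
  long mismatches = 0;

  for (int a = lo; a <= hi; a++)
    for (int b = lo; b <= hi; b++)
      for (int c = lo; c <= hi; c++)
        for (int d = lo; d <= hi; d++)
          if (foo1 (a, b, c, d) != foo2 (a, b, c, d))
            mismatches++;

  return mismatches;
}
```

Calling count_mismatches (-4, 4) from a test driver is expected to return 0,
since && is associative when its operands have no side effects.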



* [Bug tree-optimization/67628] [tree-optimization] (a && b) && c shows better codegen than a && (b && c)
  2015-09-18 14:32 [Bug tree-optimization/67628] New: [tree-optimization] (a && b) && c shows better codegen than a && (b && c) ktkachov at gcc dot gnu.org
@ 2015-09-18 14:48 ` pinskia at gcc dot gnu.org
  2015-09-18 15:00 ` ktkachov at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2015-09-18 14:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67628

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-09-18
                 CC|                            |pinskia at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is due to the fold-const.c optimization which should not be there any
more. You need to do benchmarking on x86 also if you remove it. 

I have not done that yet which is why I have not submitted the patch to fix
this.



* [Bug tree-optimization/67628] [tree-optimization] (a && b) && c shows better codegen than a && (b && c)
  2015-09-18 14:32 [Bug tree-optimization/67628] New: [tree-optimization] (a && b) && c shows better codegen than a && (b && c) ktkachov at gcc dot gnu.org
  2015-09-18 14:48 ` [Bug tree-optimization/67628] " pinskia at gcc dot gnu.org
@ 2015-09-18 15:00 ` ktkachov at gcc dot gnu.org
  2015-09-18 15:10 ` pinskia at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2015-09-18 15:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67628

--- Comment #2 from ktkachov at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #1)
> This is due to the fold-const.c optimization which should not be there any
> more. You need to do benchmarking on x86 also if you remove it. 
> 

Could you elaborate on which optimization that is?
Is it a matter of fold-const.c or match.pd canonicalizing the expression in
some way?



* [Bug tree-optimization/67628] [tree-optimization] (a && b) && c shows better codegen than a && (b && c)
  2015-09-18 14:32 [Bug tree-optimization/67628] New: [tree-optimization] (a && b) && c shows better codegen than a && (b && c) ktkachov at gcc dot gnu.org
  2015-09-18 14:48 ` [Bug tree-optimization/67628] " pinskia at gcc dot gnu.org
  2015-09-18 15:00 ` ktkachov at gcc dot gnu.org
@ 2015-09-18 15:10 ` pinskia at gcc dot gnu.org
  2015-09-21 14:23 ` ktkachov at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2015-09-18 15:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67628

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to ktkachov from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > This is due to the fold-const.c optimization which should not be there any
> > more. You need to do benchmarking on x86 also if you remove it. 
> > 
> 
> could you elaborate what optimization is that?
> Is it a matter of fold-const.c or match.pd canonicalizing the expression in
> some way?

The optimization is the conversion of andif into and, that is, of a
short-circuit && into a non-short-circuit &. Basically, the ifcombine pass
does not recombine them if fold-const does it early on.
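
As a rough source-level illustration of the two shapes discussed here (a sketch
of the semantics only, not of GCC's internal representation): && evaluates its
right operand conditionally, while & applied to two 0/1 comparison results
evaluates both sides unconditionally. The function names below are
hypothetical; the bodies are written by hand to mirror the foo1 and foo2
.optimized dumps from the original report.

```c
/* Shape of the foo1 dump: all three comparisons are evaluated and
   combined with bitwise AND, so no branch is needed.  */
int
all_nonshort_circuit (int a, int b, int c, int d)
{
  int t4 = a > b;
  int t6 = b <= c;
  int t10 = c > d;
  int t1 = t6 & t10;

  return t1 & t4;
}

/* Shape of the foo2 dump: the inner pair has been combined with
   bitwise AND, but the outer test is still a branch.  */
int
mixed_form (int a, int b, int c, int d)
{
  if (a > b)
    return (b <= c) & (c > d);

  return 0;
}
```

Both functions return the same value for every input, because the comparisons
have no side effects; they differ only in how much control flow is left for
later passes to clean up.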



* [Bug tree-optimization/67628] [tree-optimization] (a && b) && c shows better codegen than a && (b && c)
  2015-09-18 14:32 [Bug tree-optimization/67628] New: [tree-optimization] (a && b) && c shows better codegen than a && (b && c) ktkachov at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2015-09-18 15:10 ` pinskia at gcc dot gnu.org
@ 2015-09-21 14:23 ` ktkachov at gcc dot gnu.org
  2021-06-07  7:30 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2015-09-21 14:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67628

--- Comment #4 from ktkachov at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #3)
> (In reply to ktkachov from comment #2)
> > (In reply to Andrew Pinski from comment #1)
> > > This is due to the fold-const.c optimization which should not be there any
> > > more. You need to do benchmarking on x86 also if you remove it. 
> > > 
> > 
> > could you elaborate what optimization is that?
> > Is it a matter of fold-const.c or match.pd canonicalizing the expression in
> > some way?
> 
> The optimization is converting andif into and. Basically ifcombine pass does
> not recombine them if fold-const does it early on.

Right, that seems to be the code in fold_truth_andor.
I think ifcombine should be doing a better job here, though.



* [Bug tree-optimization/67628] [tree-optimization] (a && b) && c shows better codegen than a && (b && c)
  2015-09-18 14:32 [Bug tree-optimization/67628] New: [tree-optimization] (a && b) && c shows better codegen than a && (b && c) ktkachov at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2015-09-21 14:23 ` ktkachov at gcc dot gnu.org
@ 2021-06-07  7:30 ` pinskia at gcc dot gnu.org
  2021-07-20  6:50 ` pinskia at gcc dot gnu.org
  2023-09-03  3:29 ` pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-07  7:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67628

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Another testcase which shows the problem to be even worse:
#define isgap(c) ((c) == ' ' || (c) == '.' || (c) == '_' || (c) == '-' \
                  || (c) == '~')

int f(char a)
{
  return isgap(a);
}
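
For comparison, a hand-written variant that combines the five comparisons
without short-circuiting computes the same result. It is shown only to make the
branch-free form concrete; isgap_noshort, f_noshort and count_isgap_mismatches
are hypothetical names that do not appear in the bug report, and no claim is
made here about the code GCC generates for either variant.

```c
#include <limits.h>

#define isgap(c) ((c) == ' ' || (c) == '.' || (c) == '_' || (c) == '-' \
                  || (c) == '~')

/* Hypothetical variant: the same five tests joined with bitwise OR, so
   every comparison is evaluated unconditionally.  */
#define isgap_noshort(c) (((c) == ' ') | ((c) == '.') | ((c) == '_') \
                          | ((c) == '-') | ((c) == '~'))

int
f (char a)
{
  return isgap (a);
}

int
f_noshort (char a)
{
  return isgap_noshort (a);
}

/* Count the char values on which the two variants disagree.  */
int
count_isgap_mismatches (void)
{
  int mismatches = 0;

  for (int v = CHAR_MIN; v <= CHAR_MAX; v++)
    if (f ((char) v) != f_noshort ((char) v))
      mismatches++;

  return mismatches;
}
```

count_isgap_mismatches () is expected to return 0, since each comparison
yields 0 or 1 and has no side effects.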


* [Bug tree-optimization/67628] [tree-optimization] (a && b) && c shows better codegen than a && (b && c)
  2015-09-18 14:32 [Bug tree-optimization/67628] New: [tree-optimization] (a && b) && c shows better codegen than a && (b && c) ktkachov at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2021-06-07  7:30 ` pinskia at gcc dot gnu.org
@ 2021-07-20  6:50 ` pinskia at gcc dot gnu.org
  2023-09-03  3:29 ` pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-07-20  6:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67628

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2021-06-07 00:00:00         |2021-7-19
           Severity|normal                      |enhancement

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
  if (a_5(D) > b_6(D))
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870913]:
  _1 = b_6(D) <= c_7(D);
  _2 = c_7(D) > d_8(D);
  _3 = _1 & _2;
  _10 = (int) _3;

  <bb 4> [local count: 1073741824]:
  # iftmp.1_4 = PHI <_10(3), 0(2)>


If what was in BB 3 were simpler, I have patches that implement that already.
The last time I tried to disable the code in fold_truth_andor, there was some
testsuite fallout. I don't think I will be able to get to fixing this until
next year at the earliest.


* [Bug tree-optimization/67628] [tree-optimization] (a && b) && c shows better codegen than a && (b && c)
  2015-09-18 14:32 [Bug tree-optimization/67628] New: [tree-optimization] (a && b) && c shows better codegen than a && (b && c) ktkachov at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2021-07-20  6:50 ` pinskia at gcc dot gnu.org
@ 2023-09-03  3:29 ` pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-09-03  3:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67628

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |pinskia at gcc dot gnu.org

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Mine. 

I have ideas on how to fix ifcombine here to allow defining statements but only
if they are boolean in type ...
