[Bug tree-optimization/32698] New: [4.3 regression] inefficient pointer expression

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/32698]  New: [4.3 regression] inefficient pointer expression
@ 2007-07-09 13:25 zippel at gcc dot gnu dot org
  2007-07-09 13:40 ` [Bug tree-optimization/32698] " rguenth at gcc dot gnu dot org
                   ` (22 more replies)
  0 siblings, 23 replies; 24+ messages in thread
From: zippel at gcc dot gnu dot org @ 2007-07-09 13:25 UTC (permalink / raw)
  To: gcc-bugs

Taking this example:

int foo(int *p, unsigned int i)
{       
  return p[i + 1] + p[i + 2] + p[i + 3];
}

produces inefficient code. The problem already starts at the tree level, where
"p[i+1]" becomes an expression like *(p + (i + 1) * 4), which is not a
common pointer expression.
Also, since this differs from the other generated pointer expressions, the
common index expression isn't completely replaced.
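For reference, the tree-level expression for p[i + 1] scales the index to a byte offset before adding it to the pointer. The sketch below spells the testcase out both ways in C, assuming sizeof (int) is 4 as on the i686 target in this report; foo_bytes is a hypothetical helper, not something GCC produces.

```c
#include <stddef.h>

/* Testcase from the report.  */
int foo(int *p, unsigned int i)
{
  return p[i + 1] + p[i + 2] + p[i + 3];
}

/* Hypothetical helper: the same sum written like the tree dump,
   *(p + (i + 1) * 4), with every offset counted in bytes.  */
int foo_bytes(int *p, unsigned int i)
{
  const char *base = (const char *) p;
  return *(const int *) (base + (size_t) (i + 1) * sizeof (int))
         + *(const int *) (base + (size_t) (i + 2) * sizeof (int))
         + *(const int *) (base + (size_t) (i + 3) * sizeof (int));
}
```

Both functions read the same three elements; the report is about which of these address shapes the middle end hands to RTL expansion.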

An initial discussion about this can be found here:
http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00418.html


-- 
           Summary: [4.3 regression] inefficient pointer expression
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: zippel at gcc dot gnu dot org
  GCC host triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-09 13:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2007-07-09 13:39 -------
With 2.95, 3.3.6 and 3.4.6 I get (for 32bit):

foo:
        pushl %ebp
        movl %esp,%ebp
        movl 8(%ebp),%edx
        movl 12(%ebp),%eax
        movl %ebp,%esp
        popl %ebp
        movl 8(%edx,%eax,4),%ecx
        addl 4(%edx,%eax,4),%ecx
        addl 12(%edx,%eax,4),%ecx
        movl %ecx,%eax
        ret

4.0.3 and 4.1.2 generate:

foo:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %edx
        sall    $2, %edx
        addl    8(%ebp), %edx
        movl    8(%edx), %eax
        addl    4(%edx), %eax
        addl    12(%edx), %eax
        leave
        ret

4.2.0 generates

foo:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %ecx
        movl    8(%ebp), %edx
        popl    %ebp
        sall    $2, %ecx
        movl    4(%ecx,%edx), %eax
        addl    8(%ecx,%edx), %eax
        addl    12(%ecx,%edx), %eax
        ret

while mainline generates

foo:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %ecx
        movl    8(%ebp), %edx
        pushl   %ebx
        leal    0(,%ecx,4), %ebx
        movl    8(%ebx,%edx), %eax
        addl    4(%edx,%ecx,4), %eax
        addl    12(%ebx,%edx), %eax
        popl    %ebx
        popl    %ebp
        ret

64bit variants of all of the above create

foo:
.LFB2:
        leal    1(%rsi), %edx
        leal    2(%rsi), %eax
        addl    $3, %esi
        movl    (%rdi,%rax,4), %eax
        addl    (%rdi,%rdx,4), %eax
        addl    (%rdi,%rsi,4), %eax
        ret

Tree dumps (of mainline) 64bit vs. 32bit are:

  return (*(p + (long unsigned int) (i + 2) * 4) + *(p + (long unsigned int) (i + 1) * 4)) + *(p + (long unsigned int) (i + 3) * 4);

  unsigned int D.1660;
  D.1660 = i * 4;
  return (*(p + (D.1660 + 8)) + *(p + (i + 1) * 4)) + *(p + (D.1660 + 12));
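One plausible reason the 64-bit dump keeps (i + 1) inside the conversion is that i + 1 is computed in 32-bit unsigned arithmetic and may wrap before it is widened, so (long unsigned int) (i + 1) * 4 is not in general equal to (long unsigned int) i * 4 + 4. The check below uses uint64_t to stand in for a 64-bit long unsigned int (an LP64 assumption); both function names are hypothetical. On the 32-bit target the offset is itself 32 bits wide, so both forms wrap identically, which would be consistent with the 32-bit dump distributing the multiplication.

```c
#include <stdint.h>

/* 64-bit dump shape: add in 32 bits (may wrap), widen, then scale.  */
uint64_t offset_widen_after_add(uint32_t i, uint32_t k)
{
  return (uint64_t) (uint32_t) (i + k) * 4;
}

/* Distributed shape: widen first, then scale and add the constant.  */
uint64_t offset_widen_first(uint32_t i, uint32_t k)
{
  return (uint64_t) i * 4 + (uint64_t) k * 4;
}
```

For ordinary values the two agree; for i == UINT32_MAX and k == 1 the first yields 0 while the second yields 2^34.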


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   GCC host triplet|i686-pc-linux-gnu           |
 GCC target triplet|                            |i686-*-*
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2007-07-09 13:39:51
               date|                            |
   Target Milestone|---                         |4.3.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-09 13:41 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from rguenth at gcc dot gnu dot org  2007-07-09 13:41 -------
Created an attachment (id=13875)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13875&action=view)
patch

With the proposed patch mainline generates again

foo:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %ecx
        movl    12(%ebp), %edx
        popl    %ebp
        movl    8(%ecx,%edx,4), %eax
        addl    4(%ecx,%edx,4), %eax
        addl    12(%ecx,%edx,4), %eax
        ret

and

  return (*(p + (i + 2) * 4) + *(p + (i + 1) * 4)) + *(p + (i + 3) * 4);


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-09 13:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from rguenth at gcc dot gnu dot org  2007-07-09 13:42 -------
Mine.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rguenth at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2007-07-09 13:39:51         |2007-07-09 13:42:24
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: zippel at gcc dot gnu dot org @ 2007-07-09 14:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from zippel at gcc dot gnu dot org  2007-07-09 14:40 -------
IMHO something like this should be generated:

p2 = p + (i * 4);
return (*(p2 + 4) + *(p2 + 8) + *(p2 + 12));

Right now not even the (i*4) expression is removed from the last instruction
anymore.
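Written as compilable C, the form suggested above computes one base pointer from the scaled index and reaches the three elements through constant offsets. A sketch, assuming a 4-byte int so that the byte offsets 4, 8 and 12 correspond to p2[1], p2[2] and p2[3]; foo_hoisted is a hypothetical name.

```c
/* Source-level equivalent of
     p2 = p + (i * 4);
     return *(p2 + 4) + *(p2 + 8) + *(p2 + 12);
   where the offsets are in bytes.  C pointer arithmetic on int * is
   already scaled by sizeof (int), so p + i is the byte address
   p + i * 4 on a target with a 4-byte int.  */
int foo_hoisted(int *p, unsigned int i)
{
  int *p2 = p + i;      /* common base, computed once */
  return p2[1] + p2[2] + p2[3];
}
```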


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-09 15:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from rguenth at gcc dot gnu dot org  2007-07-09 15:00 -------
Note that we don't hoist the i*4 or the addition to p, to help addressing mode
selection, which likes to see the "whole" address.

Of course it shouldn't matter in which form we see the addresses, and the same
code should be generated for all of them; still, canonicalizing to the same form
makes a difference.

In fact, canonicalizing to

  unsigned int D.1656;

<bb 2>:
  D.1656 = i * 4;
  return (*(p + (D.1656 + 8)) + *(p + (D.1656 + 4))) + *(p + (D.1656 + 12));

as you suggest creates worse assembly (look at the extra shift)

foo:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %ecx
        movl    8(%ebp), %edx
        popl    %ebp
        sall    $2, %ecx
        movl    8(%ecx,%edx), %eax
        addl    4(%ecx,%edx), %eax
        addl    12(%ecx,%edx), %eax
        ret

In fact, the above shows that the proper fix would be in the backend (if
there is anything to fix), and that making sure we canonicalize consistently
is good enough.  Consider the related testcase

int foo(int *p, short *q, char *r, unsigned int i)
{
  return p[i + 1] + q[i + 1] + r[i + 1];
}

which in one case is canonicalized to

  unsigned int D.1658;

<bb 2>:
  D.1658 = i + 1;
  return ((int) *(r + D.1658) + *(p + D.1658 * 4)) + (int) *(q + D.1658 * 2);

in the other to

  return ((int) *(r + (i + 1)) + *(p + (i * 4 + 4))) + (int) *(q + (i * 2 + 2));

So there is no form that is clearly better to canonicalize to.  The
correct "form" depends on the context (whether it is profitable to CSE
either i * 4 or i + 1).
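The two canonical forms for the related testcase can be compared directly in C. In this sketch (all names hypothetical; the testcase is renamed bar to keep it distinct from the original foo), bar_cse_index shares i + 1 across all three accesses as in the first dump, while bar_cse_scaled mirrors the second dump, where each access gets its own folded byte offset. It assumes sizeof (int) == 4 and sizeof (short) == 2 as on i686.

```c
/* Related testcase from the comment above, renamed bar.  */
int bar(int *p, short *q, char *r, unsigned int i)
{
  return p[i + 1] + q[i + 1] + r[i + 1];
}

/* First dump: compute i + 1 once and reuse it as the element index.  */
int bar_cse_index(int *p, short *q, char *r, unsigned int i)
{
  unsigned int d = i + 1;
  return r[d] + p[d] + q[d];
}

/* Second dump: each byte offset folded separately
   (i * 4 + 4 for int, i * 2 + 2 for short, i + 1 for char).  */
int bar_cse_scaled(int *p, short *q, char *r, unsigned int i)
{
  const char *pb = (const char *) p;
  const char *qb = (const char *) q;
  return r[i + 1]
         + *(const int *) (pb + (i * 4 + 4))
         + *(const short *) (qb + (i * 2 + 2));
}
```

All three read the same elements; the only difference is which subexpression a later pass could reuse.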


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: zippel at gcc dot gnu dot org @ 2007-07-09 15:28 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from zippel at gcc dot gnu dot org  2007-07-09 15:27 -------
(In reply to comment #5)
> as you suggest creates worse assembly (look at the extra shift)
> 
> foo:
>         pushl   %ebp
>         movl    %esp, %ebp
>         movl    12(%ebp), %ecx
>         movl    8(%ebp), %edx
>         popl    %ebp
>         sall    $2, %ecx
>         movl    8(%ecx,%edx), %eax
>         addl    4(%ecx,%edx), %eax
>         addl    12(%ecx,%edx), %eax
>         ret

The cost of this is dependent on the target, so IMO the shift could be
propagated back into the address at RTL level.

> so there is no form that is clearly better to canonicalize to.

Your example is rather artificial and depends on (i + x) * y being completely
eliminated. My main point is still that such expressions are far more difficult
to translate into proper address operations.
For generating addresses, targeting the form (i * x) + y is clearly better.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenther at suse dot de @ 2007-07-09 15:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from rguenther at suse dot de  2007-07-09 15:37 -------
Subject: Re:  [4.3 regression] inefficient
 pointer expression

On Mon, 9 Jul 2007, zippel at gcc dot gnu dot org wrote:

> (In reply to comment #5)
> > as you suggest creates worse assembly (look at the extra shift)
> > 
> > foo:
> >         pushl   %ebp
> >         movl    %esp, %ebp
> >         movl    12(%ebp), %ecx
> >         movl    8(%ebp), %edx
> >         popl    %ebp
> >         sall    $2, %ecx
> >         movl    8(%ecx,%edx), %eax
> >         addl    4(%ecx,%edx), %eax
> >         addl    12(%ecx,%edx), %eax
> >         ret
> 
> The cost of this is dependent on the target, so IMO the shift could be
> propagated back into the address at RTL level.

Or the other way around.

> > so there is no form that is clearly better to canonicalize to.
> 
> Your example is rather artificial and depends on that (i + x) * y is completely
> eliminated. My main point is still that such expression are far more difficult
> to translate into proper address operations.
> To generate addresses targeting a form of (i * x) + y is clearly better.

My example is not artificial.  It is quite common to have loops that
stream data from one type to another.  Also it's not 'clearly' better to 
me.

Anyway, fold () is certainly not able to decide this, and both value
numbering and re-association can improve the IL by taking the context
into account.

On the backend side we have the fwprop pass which is supposed to do
addressing mode selection and the backend which is supposed to provide
accurate costs for them.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: zippel at gcc dot gnu dot org @ 2007-07-09 17:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from zippel at gcc dot gnu dot org  2007-07-09 17:42 -------
(In reply to comment #7)
> On the backend side we have the fwprop pass which is supposed to do
> addressing mode selection and the backend which is supposed to provide
> accurate costs for them.

Let's take your proposed form:

  return (*(p + (i + 2) * 4) + *(p + (i + 1) * 4)) + *(p + (i + 3) * 4);

After initial RTL expansion something like this is generated:

  t1 = i + 2;
  r = *(p + t1 * 4);
  t2 = i + 1;
  r += *(p + t2 * 4);
  t3 = i + 3;
  r += *(p + t3 * 4);

The problem now is that it takes until combine for this to be generated:

r = *(p + i * 4 + 8);
r += *(p + i * 4 + 4);
r += *(p + i * 4 + 12);

and at this point it's too late. It's easy to blame the back end, but at this
point it's IMHO unrealistic to undo this mess at RTL level. The proper address
form should be generated as early as possible, so that at the time RTL is
generated, it's as close as possible to the final form.
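The two stages described above can be mimicked in C to make the data flow visible: right after expansion each access has its own index temporary (t1, t2, t3), and only after combine does every address take the form p + i * 4 + constant, with i * 4 appearing three times. This is a sketch of the shape of the code, not of GCC's actual RTL; the function names are hypothetical and a 4-byte int is assumed.

```c
/* Shape right after expansion: one index temporary per access.  */
int expanded(const int *p, unsigned int i)
{
  unsigned int t1 = i + 2;
  int r = p[t1];
  unsigned int t2 = i + 1;
  r += p[t2];
  unsigned int t3 = i + 3;
  r += p[t3];
  return r;
}

/* Shape after combine: base + scaled index + constant, in bytes.  */
int combined(const int *p, unsigned int i)
{
  const char *b = (const char *) p;
  int r = *(const int *) (b + i * 4 + 8);
  r += *(const int *) (b + i * 4 + 4);
  r += *(const int *) (b + i * 4 + 12);
  return r;
}
```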


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-09 19:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from rguenth at gcc dot gnu dot org  2007-07-09 19:42 -------
Subject: Bug 32698

Author: rguenth
Date: Mon Jul  9 19:41:54 2007
New Revision: 126494

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126494
Log:
2007-07-09  Richard Guenther  <rguenther@suse.de>

        PR middle-end/32698
        * fold-const.c (fold_plusminus_mult_expr): Move constant
        arguments second to allow decomposing.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
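As background for this change: the dumps in this report show fold turning a * 4 + 4 into (a + 1) * 4, i.e. recognising the constant as a multiple of the other multiplier and factoring it out. The C below is a hypothetical toy version of that rewrite on unsigned values (it is not the code in fold-const.c) and checks that the factored form is equal modulo 2^32 whenever the multiplier divides the constant.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy: decide whether a * c1 + c2 can be rewritten as
   (a + k) * c1, i.e. whether c1 divides c2, and compute k.  */
bool factor_plus_mult(uint32_t c1, uint32_t c2, uint32_t *k)
{
  if (c1 == 0 || c2 % c1 != 0)
    return false;
  *k = c2 / c1;
  return true;
}

/* Both forms, evaluated with wrapping unsigned arithmetic.  */
uint32_t unfactored(uint32_t a, uint32_t c1, uint32_t c2)
{
  return a * c1 + c2;
}

uint32_t factored(uint32_t a, uint32_t c1, uint32_t k)
{
  return (a + k) * c1;
}
```

Because (a + k) * c1 and a * c1 + k * c1 agree modulo 2^32, the rewrite is value-preserving for unsigned types even when a + k wraps.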


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: pinskia at gcc dot gnu dot org @ 2007-07-18  6:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from pinskia at gcc dot gnu dot org  2007-07-18 06:00 -------
Fixed.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: zippel at gcc dot gnu dot org @ 2007-07-18 12:56 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from zippel at gcc dot gnu dot org  2007-07-18 12:56 -------
This bug is not fixed yet.
Current gcc still generates:

return (*(p + (i + 2) * 4) + *(p + (i + 1) * 4)) + *(p + (i + 3) * 4);

1. it still fails to extract the common expression at tree level.
2. it generates inefficient initial RTL, so later optimizers have little chance
to do something useful with it.


-- 

zippel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-19 16:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from rguenth at gcc dot gnu dot org  2007-07-19 16:50 -------
The IL representation is not a thing to complain about.  Do you have a testcase
that shows a missed optimization, rather than one whose IL is different from
what you expect?


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|rguenth at gcc dot gnu dot  |unassigned at gcc dot gnu
                   |org                         |dot org
             Status|REOPENED                    |NEW


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: zippel at gcc dot gnu dot org @ 2007-07-19 18:27 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from zippel at gcc dot gnu dot org  2007-07-19 18:27 -------
The initial test case is part of the missed optimization. For example current
stable Debian gcc (4.1.2 20061115) produces code like this:

        movl    4(%esp), %eax
        movl    8(%esp), %edx
        leal    (%eax,%edx,4), %edx
        movl    4(%edx), %ecx
        movl    8(%edx), %eax
        addl    %ecx, %eax
        movl    12(%edx), %ecx
        addl    %ecx, %eax
        ret

This has some unnecessary moves, but it shows the basic idea; with the moves
eliminated it would be:

        movl    4(%esp), %eax
        movl    8(%esp), %edx
        leal    (%eax,%edx,4), %edx
        movl    4(%edx), %eax
        addl    8(%edx), %eax
        addl    12(%edx), %eax
        ret

In terms of code size, this is identical to:

        movl    4(%esp), %ecx
        movl    8(%esp), %edx
        movl    8(%ecx,%edx,4), %eax
        addl    4(%ecx,%edx,4), %eax
        addl    12(%ecx,%edx,4), %eax
        ret

Which instruction sequence is better now depends on the target.
The problem with the new canonical form is that, AFAICT, it has become
very difficult in practice to generate the optimal sequence based on
instruction costs.

The older gcc produces this IL before RTL generation:

  D.1283 = (int *) (i * 4) + p;
  return *(D.1283 + 4B) + *(D.1283 + 8B) + *(D.1283 + 12B);

which produces far better RTL for the optimizers to work with.

BTW this problem is not limited to pointer expressions, since the lea
instruction is used in other expressions as well.
Let's take this example:

void f(unsigned int *p, unsigned int a)
{       
  p[0] = a * 4 + 4;
  p[1] = a * 4 + 8;
  p[2] = a * 4 + 12;
}

For the above, gcc 4.1 produces this:

  D.1281 = a * 4;
  *p = D.1281 + 4;
  *(p + 4B) = D.1281 + 8;
  *(p + 8B) = D.1281 + 12;

        movl    8(%esp), %eax
        movl    4(%esp), %ecx
        sall    $2, %eax
        leal    4(%eax), %edx
        movl    %edx, (%ecx)
        leal    8(%eax), %edx
        addl    $12, %eax
        movl    %edx, 4(%ecx)
        movl    %eax, 8(%ecx)
        ret

gcc 4.2 produces this:

  *p = (a + 1) * 4;
  D.1545 = a * 4;
  *(p + 4B) = D.1545 + 8;
  *(p + 8B) = D.1545 + 12;

        movl    8(%esp), %eax
        movl    4(%esp), %ecx
        leal    4(,%eax,4), %edx
        sall    $2, %eax
        movl    %edx, (%ecx)
        leal    8(%eax), %edx
        addl    $12, %eax
        movl    %edx, 4(%ecx)
        movl    %eax, 8(%ecx)
        ret

So 4.2 already produces slightly worse code.
Current gcc finally produces:

  *p = (a + 1) * 4;
  *(p + 4) = (a + 2) * 4;
  *(p + 8) = (a + 3) * 4;

        movl    8(%esp), %eax
        movl    4(%esp), %ecx
        leal    4(,%eax,4), %edx
        movl    %edx, (%ecx)
        leal    8(,%eax,4), %edx
        leal    12(,%eax,4), %eax
        movl    %edx, 4(%ecx)
        movl    %eax, 8(%ecx)
        ret

This now has the largest code size of all versions.

This new canonical form IMHO clearly conflicts with what is expected at RTL
level, so I don't understand why it's so important to use this one. Could you
maybe explain the reason behind this choice?
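The IL shapes quoted above for f() all store the same values; they differ only in which subexpression is shared. A C sketch of the gcc 4.1 shape (one shared a * 4) next to the current-trunk shape (each store folded to (a + k) * 4) makes that explicit; the function names are hypothetical.

```c
/* gcc 4.1 shape: compute a * 4 once, then add constants.  */
void f_shared_scale(unsigned int *p, unsigned int a)
{
  unsigned int s = a * 4;
  p[0] = s + 4;
  p[1] = s + 8;
  p[2] = s + 12;
}

/* Current-trunk shape: every store folded to (a + k) * 4,
   leaving no common subexpression between the three stores.  */
void f_folded(unsigned int *p, unsigned int a)
{
  p[0] = (a + 1) * 4;
  p[1] = (a + 2) * 4;
  p[2] = (a + 3) * 4;
}
```

On i686 each folded store maps naturally onto one lea with a scaled index, which matches the three leal instructions in the current-trunk assembly above.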


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-20 11:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from rguenth at gcc dot gnu dot org  2007-07-20 11:48 -------
For current mainline I get (-O2)

foo:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %ecx
        movl    12(%ebp), %edx
        popl    %ebp
        movl    8(%ecx,%edx,4), %eax
        addl    4(%ecx,%edx,4), %eax
        addl    12(%ecx,%edx,4), %eax
        ret

Can you be more specific about what processor tuning you are using?  That is,
can you provide the output of adding -v to the gcc commandline that produces
the mainline results in the last comment?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: zippel at gcc dot gnu dot org @ 2007-07-20 11:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from zippel at gcc dot gnu dot org  2007-07-20 11:58 -------
In the examples I used -fomit-frame-pointer.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-20 16:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from rguenth at gcc dot gnu dot org  2007-07-20 16:05 -------
That makes it

foo:
        movl    4(%esp), %ecx
        movl    8(%esp), %edx
        movl    8(%ecx,%edx,4), %eax
        addl    4(%ecx,%edx,4), %eax
        addl    12(%ecx,%edx,4), %eax
        ret

for me.  Still different from what you claim.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: zippel at gcc dot gnu dot org @ 2007-07-20 16:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from zippel at gcc dot gnu dot org  2007-07-20 16:21 -------
Which claim?
It's exactly the third code example in comment #13.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-20 16:35 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from rguenth at gcc dot gnu dot org  2007-07-20 16:35 -------
I mean

<cite>
Current gcc finally produces:

  *p = (a + 1) * 4;
  *(p + 4) = (a + 2) * 4;
  *(p + 8) = (a + 3) * 4;

        movl    8(%esp), %eax
        movl    4(%esp), %ecx
        leal    4(,%eax,4), %edx
        movl    %edx, (%ecx)
        leal    8(,%eax,4), %edx
        leal    12(,%eax,4), %eax
        movl    %edx, 4(%ecx)
        movl    %eax, 8(%ecx)
        ret

This has now the largest code size of all versions.

This new canonical form IMHO clearly conflicts with what is expected at RTL
level, so I don't understand why it's so important to use this one. Could you
maybe explain the reason behind this choice?
</cite>

which suggests that current trunk is worse with the patch.  Or am I confused
and you are happy with the code generated by current trunk for the original
testcase?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: zippel at gcc dot gnu dot org @ 2007-07-20 17:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #19 from zippel at gcc dot gnu dot org  2007-07-20 17:06 -------
There is another small source example in between, which is used to produce all
the code examples that follow it. :)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698


* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-20 17:22 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #20 from rguenth at gcc dot gnu dot org  2007-07-20 17:22 -------
Whoops ;)  I missed that.

I have a counter-example that is better with the patch in the same way yours
is worse with it.

void f(unsigned int *p, unsigned int a)
{
  p[0] = a * 4 + 4;
  p[1] = a * 8 + 8;
  p[2] = a * 12 + 12;
}

As I said, the fold canonicalization is just canonicalization; the code
generation has to be fixed elsewhere.



* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: mmitchel at gcc dot gnu dot org @ 2007-08-10  0:44 UTC
  To: gcc-bugs



------- Comment #21 from mmitchel at gcc dot gnu dot org  2007-08-10 00:44 -------
I'm not convinced that there's anything to fix here; it sounds like we've just
traded which of two examples is better.  If there is a bug here, please add a
note explaining it and upgrade the priority back to P3.


-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P4



* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: steven at gcc dot gnu dot org @ 2007-11-19  9:01 UTC
  To: gcc-bugs



------- Comment #22 from steven at gcc dot gnu dot org  2007-11-19 09:01 -------
"...and then he said: ``well, that's nice and all, but, ehm, where's the
bug?''"


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING



* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2008-01-13 15:18 UTC
  To: gcc-bugs



------- Comment #23 from rguenth at gcc dot gnu dot org  2008-01-13 14:56 -------
Closing as fixed.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |FIXED



Thread overview: 24+ messages
2007-07-09 13:25 [Bug tree-optimization/32698] New: [4.3 regression] inefficient pointer expression zippel at gcc dot gnu dot org
2007-07-09 13:40 ` [Bug tree-optimization/32698] " rguenth at gcc dot gnu dot org
2007-07-09 13:41 ` rguenth at gcc dot gnu dot org
2007-07-09 13:42 ` rguenth at gcc dot gnu dot org
2007-07-09 14:40 ` zippel at gcc dot gnu dot org
2007-07-09 15:01 ` rguenth at gcc dot gnu dot org
2007-07-09 15:28 ` zippel at gcc dot gnu dot org
2007-07-09 15:37 ` rguenther at suse dot de
2007-07-09 17:42 ` zippel at gcc dot gnu dot org
2007-07-09 19:42 ` rguenth at gcc dot gnu dot org
2007-07-18  6:00 ` pinskia at gcc dot gnu dot org
2007-07-18 12:56 ` zippel at gcc dot gnu dot org
2007-07-19 16:50 ` rguenth at gcc dot gnu dot org
2007-07-19 18:27 ` zippel at gcc dot gnu dot org
2007-07-20 11:48 ` rguenth at gcc dot gnu dot org
2007-07-20 11:58 ` zippel at gcc dot gnu dot org
2007-07-20 16:06 ` rguenth at gcc dot gnu dot org
2007-07-20 16:21 ` zippel at gcc dot gnu dot org
2007-07-20 16:35 ` rguenth at gcc dot gnu dot org
2007-07-20 17:06 ` zippel at gcc dot gnu dot org
2007-07-20 17:22 ` rguenth at gcc dot gnu dot org
2007-08-10  0:44 ` mmitchel at gcc dot gnu dot org
2007-11-19  9:01 ` steven at gcc dot gnu dot org
2008-01-13 15:18 ` rguenth at gcc dot gnu dot org
