public inbox for gcc-bugs@sourceware.org
* [Bug c/29256]  New: [4.2.0 performance regression]
From: edmar at freescale dot com @ 2006-09-27 18:29 UTC (permalink / raw)
  To: gcc-bugs

Compiler configured with --enable-e500_double.

The compiler generates worse code than gcc-4.1 does.

The source code is:
# define N      2000000
static double   a[N],c[N];
void tuned_STREAM_Copy()
{
        int j;
        for (j=0; j<N; j++)
            c[j] = a[j];
}

Attached are stream.s-4.1 and stream.s-4.2.

When compiled with 4.2, the command line is:
/temp/gnu_toolchain/install_area/gcc-trunk/gcc-trunk-20060926-e500v2/bin/powerpc-unknown-linux-gnuspe-gcc
-O3 -funroll-loops -funroll-all-loops -S stream.c -v
Using built-in specs.
Target: powerpc-unknown-linux-gnuspe
Configured with: ../gcc-trunk/configure
--prefix=/temp/gnu_toolchain/install_area/gcc-trunk/gcc-trunk-20060926-e500v2
--with-local-prefix=/temp/gnu_toolchain/install_area/gcc-trunk/gcc-trunk-20060926-e500v2
--enable-languages=c,c++,fortran --enable-threads
--target=powerpc-unknown-linux-gnuspe
--with-gmp=/proj/ppc/sysperf/sw/gnu_toolchain/gcc_support/linuxAMD64
--with-mpfr=/proj/ppc/sysperf/sw/gnu_toolchain/gcc_support/linuxAMD64
--disable-shared --disable-multilib --disable-linux-futex --enable-e500_double
Thread model: posix
gcc version 4.2.0 20060926 (experimental)

/temp/gnu_toolchain/install_area/gcc-trunk/gcc-trunk-20060926-e500v2/libexec/gcc/powerpc-unknown-linux-gnuspe/4.2.0/cc1
-quiet -v -D__unix__ -D__gnu_linux__ -D__linux__ -Dunix -D__unix -Dlinux
-D__linux -Asystem=linux -Asystem=unix -Asystem=posix stream.c -quiet -dumpbase
stream.c -auxbase stream -O3 -version -funroll-loops -funroll-all-loops -o
stream.s
#include "..." search starts here:
#include <...> search starts here:

/temp/gnu_toolchain/install_area/gcc-trunk/gcc-trunk-20060926-e500v2/lib/gcc/powerpc-unknown-linux-gnuspe/4.2.0/include

/temp/gnu_toolchain/install_area/gcc-trunk/gcc-trunk-20060926-e500v2/lib/gcc/powerpc-unknown-linux-gnuspe/4.2.0/../../../../powerpc-unknown-linux-gnuspe/sys-include

/temp/gnu_toolchain/install_area/gcc-trunk/gcc-trunk-20060926-e500v2/lib/gcc/powerpc-unknown-linux-gnuspe/4.2.0/../../../../powerpc-unknown-linux-gnuspe/include
End of search list.
GNU C version 4.2.0 20060926 (experimental) (powerpc-unknown-linux-gnuspe)
        compiled by GNU C version 3.4.3.
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: af19c94719eeca398c0b645020867b59

And when compiled with 4.1, the command line is:
/temp/gnu_toolchain/install_area/gcc-4_1-branch/gcc-4_1-branch-20060926-e500v2/bin/powerpc-unknown-linux-gnuspe-gcc
-O3 -funroll-loops -funroll-all-loops -S stream.c -v
Using built-in specs.
Target: powerpc-unknown-linux-gnuspe
Configured with: ../gcc-4_1-branch/configure
--prefix=/temp/gnu_toolchain/install_area/gcc-4_1-branch/gcc-4_1-branch-20060926-e500v2
--with-local-prefix=/temp/gnu_toolchain/install_area/gcc-4_1-branch/gcc-4_1-branch-20060926-e500v2
--enable-languages=c,c++,fortran --enable-threads
--target=powerpc-unknown-linux-gnuspe
--with-gmp=/proj/ppc/sysperf/sw/gnu_toolchain/gcc_support/linuxAMD64
--with-mpfr=/proj/ppc/sysperf/sw/gnu_toolchain/gcc_support/linuxAMD64
--disable-shared --disable-multilib --disable-shared --disable-multilib
--enable-e500_double
Thread model: posix
gcc version 4.1.2 20060926 (prerelease)

/temp/gnu_toolchain/install_area/gcc-4_1-branch/gcc-4_1-branch-20060926-e500v2/libexec/gcc/powerpc-unknown-linux-gnuspe/4.1.2/cc1
-quiet -v -D__unix__ -D__gnu_linux__ -D__linux__ -Dunix -D__unix -Dlinux
-D__linux -Asystem=linux -Asystem=unix -Asystem=posix stream.c -quiet -dumpbase
stream.c -auxbase stream -O3 -version -funroll-loops -funroll-all-loops -o
stream.s
ignoring nonexistent directory
"/temp/gnu_toolchain/install_area/gcc-4_1-branch/gcc-4_1-branch-20060926-e500v2/lib/gcc/powerpc-unknown-linux-gnuspe/4.1.2/../../../../powerpc-unknown-linux-gnuspe/include"
#include "..." search starts here:
#include <...> search starts here:

/temp/gnu_toolchain/install_area/gcc-4_1-branch/gcc-4_1-branch-20060926-e500v2/lib/gcc/powerpc-unknown-linux-gnuspe/4.1.2/include

/temp/gnu_toolchain/install_area/gcc-4_1-branch/gcc-4_1-branch-20060926-e500v2/lib/gcc/powerpc-unknown-linux-gnuspe/4.1.2/../../../../powerpc-unknown-linux-gnuspe/sys-include
End of search list.
GNU C version 4.1.2 20060926 (prerelease) (powerpc-unknown-linux-gnuspe)
        compiled by GNU C version 3.4.3.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 565818e6f0c83f0e9f8781118c7d40c3


-- 
           Summary: [4.2.0 performance regression]
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: edmar at freescale dot com
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: powerpc-unknown-linux-gnuspe-gcc


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Bug c/29256] [4.2.0 performance regression]
From: edmar at freescale dot com @ 2006-09-27 18:30 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from edmar at freescale dot com  2006-09-27 18:30 -------
Created an attachment (id=12340)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12340&action=view)
Result of 4.1 compilation


-- 


* [Bug c/29256] [4.2.0 performance regression]
From: edmar at freescale dot com @ 2006-09-27 18:30 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from edmar at freescale dot com  2006-09-27 18:30 -------
Created an attachment (id=12341)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12341&action=view)
Result of 4.2 compilation


-- 


* [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
From: pinskia at gcc dot gnu dot org @ 2006-09-28  3:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from pinskia at gcc dot gnu dot org  2006-09-28 02:59 -------
This is a generic regression: x86 has the same problem with this code.  Even
with -Ddouble=int, we have the same problem.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu dot
                   |                            |org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   GCC host triplet|x86_64-unknown-linux-gnu    |
 GCC target triplet|powerpc-unknown-linux-gnuspe|
           Keywords|                            |missed-optimization
      Known to fail|                            |4.2.0
      Known to work|                            |4.1.2
   Last reconfirmed|0000-00-00 00:00:00         |2006-09-28 02:59:57
               date|                            |
            Summary|[4.2 regression] performance|[4.2 regression] loop
                   |regression with double on   |unrolling performance
                   |SPE2                        |regression
   Target Milestone|---                         |4.2.0


* [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
From: rguenth at gcc dot gnu dot org @ 2006-09-28 11:08 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from rguenth at gcc dot gnu dot org  2006-09-28 11:08 -------
On x86_64, 4.2 decides to unroll 9 times while 4.1 unrolls 8 times.  This is
a code-size regression, but other than that?  The 4.2 version runs slightly
faster than the 4.1 version, though the difference may be in the noise.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org, rakdver at gcc dot gnu
                   |                            |dot org


* [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
From: rakdver at gcc dot gnu dot org @ 2006-09-28 11:34 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from rakdver at gcc dot gnu dot org  2006-09-28 11:34 -------
(In reply to comment #4)
> On x86_64 4.2 decides to unroll 9 times while on 4.1 it unrolls 8 times.  This is
> a code-size regression, but other than that?  The 4.2 version runs slightly
> faster than the 4.1 version, though the difference may be in the noise.

Choosing 9 instead of 8 looks weird, though :-).  The reason is the following:
jump threading in the vrp2 pass peels one iteration of the loop.  With this change,
unrolling by a factor of 9 creates smaller code (only one extra iteration needs
to be peeled to make the number of iterations divisible by 9, while one would
need to peel 7 more iterations to make it divisible by 8).


-- 


* [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
From: pinskia at gcc dot gnu dot org @ 2006-09-28 13:47 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from pinskia at gcc dot gnu dot org  2006-09-28 13:47 -------
(In reply to comment #4)
> On x86_64 4.2 decides to unroll 9 times while on 4.1 it unrolls 8 times.  This is
> a code-size regression, but other than that?  The 4.2 version runs slightly
> faster than the 4.1 version, though the difference may be in the noise.

No, no, no, Edmar and I are not complaining about how many times it unrolled,
but about the use of indexed addressing mode instead of offset addressing mode
for the stores, and about the extra adds.


-- 


* [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
From: rguenth at gcc dot gnu dot org @ 2006-09-28 14:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from rguenth at gcc dot gnu dot org  2006-09-28 14:02 -------
Oh, but those do not happen on x86_64.  So this is a target issue really.


-- 


* [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
From: pinskia at gcc dot gnu dot org @ 2006-09-28 14:08 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from pinskia at gcc dot gnu dot org  2006-09-28 14:08 -------
  D.1563 = -&a;
  MEM[base: (int *) D.1563 + &c, index: D.1562] = MEM[base: D.1562];

WTFFFFFFF


-- 


* [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
From: rguenth at gcc dot gnu dot org @ 2006-09-28 14:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from rguenth at gcc dot gnu dot org  2006-09-28 14:11 -------
Oh, didn't I fix this?  See PR26726.


-- 


* [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
From: rakdver at gcc dot gnu dot org @ 2006-09-28 14:15 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from rakdver at gcc dot gnu dot org  2006-09-28 14:15 -------
(In reply to comment #8)
>   D.1563 = -&a;
>   MEM[base: (int *) D.1563 + &c, index: D.1562] = MEM[base: D.1562];
> 
> WTFFFFFFF

ivopts are having fun :-)  On the other hand, this is one of several equally
cheap ways to express the code, and it should not affect the creation of
offsetted modes on RTL.  So although this is indeed somewhat curious (in fact
a bug, for reasons unrelated to the problem covered by this PR), it is not
the cause of this problem.

On x86, tree optimizers seem to do just fine, producing

MEM[symbol: c, index: D.1569, step: 8B] = MEM[symbol: a, index: D.1569, step:
8B];

However, on RTL, we fail to create the offsetted version of this addressing mode
after unrolling.


-- 


* [Bug middle-end/29256] [4.2 regression] loop performance regression
From: pinskia at gcc dot gnu dot org @ 2006-09-28 14:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from pinskia at gcc dot gnu dot org  2006-09-28 14:16 -------
(In reply to comment #9)
> Oh, didn't I fix this?  See PR26726.
This is unrelated to that: the trees produced are well defined but just look
weird, and really the IV selection is messed up.  It should have chosen two
IVs for this loop instead of just one.
Actually, unrolling is not needed to produce the bad code:
.L2:
        lwz 0,0(9)
        stwx 0,11,9
        addi 9,9,4
        bdnz .L2
I bet a beer that loop.c actually fixed this crap up before.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.2 regression] loop       |[4.2 regression] loop
                   |unrolling performance       |performance regression
                   |regression                  |


* [Bug middle-end/29256] [4.2 regression] loop performance regression
From: rakdver at gcc dot gnu dot org @ 2006-09-28 14:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from rakdver at gcc dot gnu dot org  2006-09-28 14:21 -------
(In reply to comment #11)
> (In reply to comment #9)
> > Oh, didn't I fix this?  See PR26726.
> This is unrelated to that as the trees produced is defined but just looks weird
> and really the one IV selection is messed up.  It should have chosen two IVs
> for this loop instead of just one.
> Actually unrolling is not need to produced the bad code:
> .L2:
>         lwz 0,0(9)
>         stwx 0,11,9
>         addi 9,9,4
>         bdnz .L2
> I bet a beer that loop.c actually fixed this crap up before.

I am bad at reading ppc assembler; could you please explain what exactly is
wrong with the code you present?


-- 


* [Bug middle-end/29256] [4.2 regression] loop performance regression
From: pinskia at gcc dot gnu dot org @ 2006-09-28 14:35 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from pinskia at gcc dot gnu dot org  2006-09-28 14:34 -------
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #9)
> > > Oh, didn't I fix this?  See PR26726.
> > This is unrelated to that as the trees produced is defined but just looks weird
> > and really the one IV selection is messed up.  It should have chosen two IVs
> > for this loop instead of just one.
> > Actually unrolling is not need to produced the bad code:
> > .L2:
> >         lwz 0,0(9)
> >         stwx 0,11,9
> >         addi 9,9,4
> >         bdnz .L2
> > I bet a beer that loop.c actually fixed this crap up before.
> 
> I am bad at reading ppc assembler; could you please explain what exactly is
> wrong with the code you present?

First, there are still two adds there (one of them is implicit in the indexed store),
so why not do the loop as:
 .L2:
         lwz r0,0(r9)
         stw r0,0(r11)
         addi r9,r9,4
         addi r11,r11,4
         bdnz .L2
Or:
 .L2:
         lwzx r0,r9,r12
         stwx r0,r11,r12
         addi r12,r12,4
         bdnz .L2


-- 


* [Bug middle-end/29256] [4.2 regression] loop performance regression
From: rakdver at gcc dot gnu dot org @ 2006-09-28 14:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from rakdver at gcc dot gnu dot org  2006-09-28 14:40 -------
> > > for this loop instead of just one.
> > > Actually unrolling is not need to produced the bad code:
> > > .L2:
> > >         lwz 0,0(9)
> > >         stwx 0,11,9
> > >         addi 9,9,4
> > >         bdnz .L2
> > > I bet a beer that loop.c actually fixed this crap up before.
> > 
> > I am bad at reading ppc assembler; could you please explain what exactly is
> > wrong with the code you present?
> 
> One, there are two adds still there (just one is implicated)
> so why not do the loop as:

There is only one add, as far as I can see.

>  .L2:
>          lwz r0,0(r9)
>          stw r0,0(r11)
>          addi r9,r9,4
>          addi r11,r11,4
>          bdnz .L2

Otoh, this seems worse to me (one more add).

> Or:
>  .L2:
>          lwzx r0,r9,r12
>          stwx r0,r11,r12
>          addi r12,r12,4
>          bdnz .L2

Yes, this would be about the same.  Still, ivopts chose one of the best
possible ways, so I do not see what you are complaining about so much.
The unrolled case is something different -- of course we should use offsetted
modes there.


-- 


* [Bug middle-end/29256] [4.2 regression] loop performance regression
From: rakdver at gcc dot gnu dot org @ 2006-09-28 14:44 UTC (permalink / raw)
  To: gcc-bugs



-- 

rakdver at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rakdver at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2006-09-28 02:59:57         |2006-09-28 14:44:02
               date|                            |


* [Bug middle-end/29256] [4.2 regression] loop performance regression
From: rakdver at gcc dot gnu dot org @ 2006-09-28 14:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from rakdver at gcc dot gnu dot org  2006-09-28 14:50 -------
(In reply to comment #8)
>   D.1563 = -&a;
>   MEM[base: (int *) D.1563 + &c, index: D.1562] = MEM[base: D.1562];
> 
> WTFFFFFFF

This is caused by my change to ivopts in
http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00198.html.


-- 


* [Bug middle-end/29256] [4.2 regression] loop performance regression
From: rakdver at gcc dot gnu dot org @ 2006-09-28 23:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from rakdver at gcc dot gnu dot org  2006-09-28 23:48 -------
Patch for the induction variable selection (which, however, does not fix the
problem with offsetted addressing modes not being created after unrolling):

http://gcc.gnu.org/ml/gcc-patches/2006-09/msg01308.html


-- 


* [Bug middle-end/29256] [4.2 regression] loop performance regression
From: mmitchel at gcc dot gnu dot org @ 2006-10-01 23:04 UTC (permalink / raw)
  To: gcc-bugs



-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2


* [Bug middle-end/29256] [4.2 regression] loop performance regression
From: rakdver at gcc dot gnu dot org @ 2006-10-06 19:32 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from rakdver at gcc dot gnu dot org  2006-10-06 19:32 -------
Subject: Bug 29256

Author: rakdver
Date: Fri Oct  6 19:32:04 2006
New Revision: 117513

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=117513
Log:
        PR middle-end/29256
        * tree-ssa-loop-ivopts.c (determine_base_object): Handle pointers
        casted to integer type.
        (get_address_cost): Decrease cost of [symbol + index] addressing modes
        if they are significantly more expensive than [reg + index] ones.

        * gcc.dg/tree-ssa/loop-19.c: New test.


Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-loop-ivopts.c


-- 


* [Bug middle-end/29256] [4.2/4.3 regression] loop performance regression
From: mmitchel at gcc dot gnu dot org @ 2007-05-14 21:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from mmitchel at gcc dot gnu dot org  2007-05-14 22:26 -------
Will not be fixed in 4.2.0; retargeting at 4.2.1.


-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.2.0                       |4.2.1


* [Bug middle-end/29256] [4.2/4.3 regression] loop performance regression
From: mmitchel at gcc dot gnu dot org @ 2007-07-20  3:50 UTC (permalink / raw)
  To: gcc-bugs



-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.2.1                       |4.2.2


* [Bug middle-end/29256] [4.2/4.3 regression] loop performance regression
From: mmitchel at gcc dot gnu dot org @ 2007-10-09 19:25 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #19 from mmitchel at gcc dot gnu dot org  2007-10-09 19:21 -------
Change target milestone to 4.2.3, as 4.2.2 has been released.


-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.2.2                       |4.2.3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.2/4.3 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (21 preceding siblings ...)
  2007-10-09 19:25 ` mmitchel at gcc dot gnu dot org
@ 2008-01-11  5:16 ` ghazi at gcc dot gnu dot org
  2008-01-11  6:04 ` rakdver at kam dot mff dot cuni dot cz
                   ` (19 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: ghazi at gcc dot gnu dot org @ 2008-01-11  5:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #20 from ghazi at gcc dot gnu dot org  2008-01-11 04:21 -------
Is the testcase gcc.dg/tree-ssa/loop-19.c supposed to work with -fpic/-fPIC?
I'm getting failures on mainline and 4.2 with x86_64, and only on 4.2 with
i686.  Mainline i686 seems to work though.

Fails:
http://gcc.gnu.org/ml/gcc-testresults/2008-01/msg00383.html
http://gcc.gnu.org/ml/gcc-testresults/2008-01/msg00365.html
http://gcc.gnu.org/ml/gcc-testresults/2008-01/msg00410.html

works:
http://gcc.gnu.org/ml/gcc-testresults/2008-01/msg00366.html

Thanks,
--Kaveh


-- 

ghazi at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ghazi at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.2/4.3 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (22 preceding siblings ...)
  2008-01-11  5:16 ` ghazi at gcc dot gnu dot org
@ 2008-01-11  6:04 ` rakdver at kam dot mff dot cuni dot cz
  2008-01-12  8:43 ` ghazi at gcc dot gnu dot org
                   ` (18 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: rakdver at kam dot mff dot cuni dot cz @ 2008-01-11  6:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #21 from rakdver at kam dot mff dot cuni dot cz  2008-01-11 04:44 -------
Subject: Re:  [4.2/4.3 regression] loop performance regression

> Is the testcase gcc.dg/tree-ssa/loop-19.c supposed to work with -fpic/-fPIC?

Not necessarily; with -fpic, both memory accesses are fully
strength-reduced, which seems to be the correct thing to do; however,

> I'm getting failures on mainline and 4.2 with x86_64, and only on 4.2 with
> i686.  Mainline i686 seems to work though.

the difference in the costs of the two variants is so small that you
will basically get one of them at random.  This test is not intended to
be run with -fpic.
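The two shapes being weighed against each other can be pictured at the C level. This is only an illustrative sketch. It assumes the competing candidates are roughly "address each array through the shared index j" and "walk each array with its own pointer" (the fully strength-reduced form described above). The exact alternative that loop-19.c expects is not spelled out in this thread, and the function names are hypothetical. The compiler makes this choice internally from its cost estimates, whichever way the source is written.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form: a single induction variable j; each access is formed
   as base + j * sizeof (double), exactly as the source is written. */
void copy_indexed(double *restrict c, const double *restrict a, size_t n)
{
    for (size_t j = 0; j < n; j++)
        c[j] = a[j];
}

/* Fully strength-reduced form: each array is walked by its own pointer,
   bumped by one element per iteration, so the loop body needs no index
   scaling and no base + index addition. */
void copy_strength_reduced(double *restrict c, const double *restrict a,
                           size_t n)
{
    const double *src = a;
    const double *end = a + n;   /* one past the end: a valid C pointer */
    double *dst = c;

    while (src != end)
        *dst++ = *src++;
}
```

Both functions compute the same result. The only difference is how many address computations the loop body performs. When their estimated costs are nearly equal, small target differences such as -fpic can tip the choice either way.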


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.2/4.3 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (23 preceding siblings ...)
  2008-01-11  6:04 ` rakdver at kam dot mff dot cuni dot cz
@ 2008-01-12  8:43 ` ghazi at gcc dot gnu dot org
  2008-02-01 17:00 ` jsm28 at gcc dot gnu dot org
                   ` (17 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: ghazi at gcc dot gnu dot org @ 2008-01-12  8:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #22 from ghazi at gcc dot gnu dot org  2008-01-12 08:35 -------
Thanks, testsuite patch posted here:
http://gcc.gnu.org/ml/gcc-patches/2008-01/msg00530.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.2/4.3 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (24 preceding siblings ...)
  2008-01-12  8:43 ` ghazi at gcc dot gnu dot org
@ 2008-02-01 17:00 ` jsm28 at gcc dot gnu dot org
  2008-05-19 20:35 ` [Bug middle-end/29256] [4.2/4.3/4.4 " jsm28 at gcc dot gnu dot org
                   ` (16 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: jsm28 at gcc dot gnu dot org @ 2008-02-01 17:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #23 from jsm28 at gcc dot gnu dot org  2008-02-01 16:53 -------
4.2.3 is being released now, changing milestones of open bugs to 4.2.4.


-- 

jsm28 at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.2.3                       |4.2.4


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.2/4.3/4.4 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (25 preceding siblings ...)
  2008-02-01 17:00 ` jsm28 at gcc dot gnu dot org
@ 2008-05-19 20:35 ` jsm28 at gcc dot gnu dot org
  2008-08-06  6:58 ` cnstar9988 at gmail dot com
                   ` (15 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: jsm28 at gcc dot gnu dot org @ 2008-05-19 20:35 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #24 from jsm28 at gcc dot gnu dot org  2008-05-19 20:22 -------
4.2.4 is being released, changing milestones to 4.2.5.


-- 

jsm28 at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.2.4                       |4.2.5


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.2/4.3/4.4 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (26 preceding siblings ...)
  2008-05-19 20:35 ` [Bug middle-end/29256] [4.2/4.3/4.4 " jsm28 at gcc dot gnu dot org
@ 2008-08-06  6:58 ` cnstar9988 at gmail dot com
  2008-08-06 21:52 ` rakdver at gcc dot gnu dot org
                   ` (14 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: cnstar9988 at gmail dot com @ 2008-08-06  6:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #25 from cnstar9988 at gmail dot com  2008-08-06 06:57 -------
ping...
Can this be fixed before 4.3.2? Thanks.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.2/4.3/4.4 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (27 preceding siblings ...)
  2008-08-06  6:58 ` cnstar9988 at gmail dot com
@ 2008-08-06 21:52 ` rakdver at gcc dot gnu dot org
  2008-08-06 21:55 ` rakdver at gcc dot gnu dot org
                   ` (13 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: rakdver at gcc dot gnu dot org @ 2008-08-06 21:52 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #26 from rakdver at gcc dot gnu dot org  2008-08-06 21:51 -------
Created an attachment (id=16036)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16036&action=view)
possible fix

One place where this can be fixed is fwprop (something like the attached
patch).  I am not sure whether it is the right place, though; maybe cse should
be handling this?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.2/4.3/4.4 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (28 preceding siblings ...)
  2008-08-06 21:52 ` rakdver at gcc dot gnu dot org
@ 2008-08-06 21:55 ` rakdver at gcc dot gnu dot org
  2008-08-06 21:57 ` rakdver at gcc dot gnu dot org
                   ` (12 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: rakdver at gcc dot gnu dot org @ 2008-08-06 21:55 UTC (permalink / raw)
  To: gcc-bugs



-- 

rakdver at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bonzini at gnu dot org
         AssignedTo|rakdver at gcc dot gnu dot  |unassigned at gcc dot gnu
                   |org                         |dot org
             Status|ASSIGNED                    |NEW


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.2/4.3/4.4 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (29 preceding siblings ...)
  2008-08-06 21:55 ` rakdver at gcc dot gnu dot org
@ 2008-08-06 21:57 ` rakdver at gcc dot gnu dot org
  2008-08-07  5:03 ` bonzini at gnu dot org
                   ` (11 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: rakdver at gcc dot gnu dot org @ 2008-08-06 21:57 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #27 from rakdver at gcc dot gnu dot org  2008-08-06 21:56 -------
(In reply to comment #26)
> Created an attachment (id=16036)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16036&action=view)
> possible fix
> 
> One place where this can be fixed is fwprop (something like the attached
> patch).  I am not sure whether it is the right place, though; maybe cse should
> be handling this?

Also, I only checked the problem on x86; most likely, something different is
happening on ppc.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.2/4.3/4.4 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (30 preceding siblings ...)
  2008-08-06 21:57 ` rakdver at gcc dot gnu dot org
@ 2008-08-07  5:03 ` bonzini at gnu dot org
  2008-10-29 17:05 ` janis at gcc dot gnu dot org
                   ` (10 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: bonzini at gnu dot org @ 2008-08-07  5:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #28 from bonzini at gnu dot org  2008-08-07 05:01 -------
fwprop does seem to be the right place to do that.

The only thing: I wonder whether you need to "find a location to add the
constant"; it could be enough to do

  *x = simplify_gen_binary (PLUS, Pmode, *x, cst_to_add);

because simplify_plus_minus should already have the machinery to do what you
are doing.  Indeed, I wonder whether this code should go in simplify_plus_minus,
so that propagate_rtx would call it automatically.

Also, when you compute cst_to_add, you can use the _const_ version of the
simplification routine.
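The suggestion is to let a generic simplifier canonicalize "x + cst" instead of searching for the spot where the constant belongs. A toy can show the idea. The sketch below is not GCC code: `struct expr`, the node kinds and `flatten_plus` are hypothetical stand-ins for RTL. The function mimics only the conceptual job of a plus/minus simplifier: collect the operands of a PLUS chain and merge every constant into one, wherever it appears in the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy expression (NOT GCC's rtx): a leaf is a register or a
   constant; an interior node is a PLUS of two operands. */
enum kind { REG, CONST_INT, PLUS };

struct expr {
    enum kind kind;
    long value;                    /* register number or constant value */
    const struct expr *op0, *op1;  /* operands, only used for PLUS      */
};

/* Flatten a PLUS tree: append every register operand to REGS (at most MAX
   entries, *NREGS counts them) and return the sum of all constants. */
long flatten_plus(const struct expr *e, long *regs, size_t *nregs, size_t max)
{
    switch (e->kind) {
    case CONST_INT:
        return e->value;
    case REG:
        assert(*nregs < max);
        regs[(*nregs)++] = e->value;
        return 0;
    case PLUS: {
        /* Visit op0 first so the register order is deterministic. */
        long sum = flatten_plus(e->op0, regs, nregs, max);
        return sum + flatten_plus(e->op1, regs, nregs, max);
    }
    }
    assert(0 && "unknown expression kind");
    return 0;
}
```

For example, (plus (plus r9 8) 16) flattens to the single register r9 plus the constant 24. The caller never needs to know which operand originally held the 8.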


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.2/4.3/4.4 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (31 preceding siblings ...)
  2008-08-07  5:03 ` bonzini at gnu dot org
@ 2008-10-29 17:05 ` janis at gcc dot gnu dot org
  2009-03-31 19:46 ` [Bug middle-end/29256] [4.3/4.4/4.5 " jsm28 at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: janis at gcc dot gnu dot org @ 2008-10-29 17:05 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #29 from janis at gcc dot gnu dot org  2008-10-29 17:05 -------
On powerpc-linux the submitter's testcase gets better code with the patch from
comment #17, but the same testcase with the loop starting with 1 instead of
zero gets worse code.  From the 4.1 branch with -O2:

.L2:
        lfd 0,0(9)
        addi 9,9,8
        stfd 0,0(11)
        addi 11,11,8
        bdnz .L2

From the 4.2 branch:

.L2:
        add 9,10,0
        add 11,10,8
        addi 10,10,8
        lfd 0,8(9)
        stfd 0,8(11)
        bdnz .L2

Code from current mainline is the same except for the order of the addi and lfd
instructions.
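Both testcases can be written out in full as follows. The first function is the submitter's STREAM copy kernel from the original report. The second is a reading of "the loop starting with 1 instead of zero"; its name, `tuned_STREAM_Copy_from1`, is made up for this sketch.

```c
#include <assert.h>

#define N 2000000

static double a[N], c[N];

/* Submitter's testcase from the original report. */
void tuned_STREAM_Copy(void)
{
    int j;
    for (j = 0; j < N; j++)
        c[j] = a[j];
}

/* Hypothetical variant: identical except that the loop starts at 1,
   so element 0 of c is left untouched. */
void tuned_STREAM_Copy_from1(void)
{
    int j;
    for (j = 1; j < N; j++)
        c[j] = a[j];
}
```

Compiling both with the same flags (for example -O2, as in the listings above) and comparing the inner loops should show the difference described.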


-- 

janis at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |janis at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.3/4.4/4.5 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (32 preceding siblings ...)
  2008-10-29 17:05 ` janis at gcc dot gnu dot org
@ 2009-03-31 19:46 ` jsm28 at gcc dot gnu dot org
  2009-08-04 12:35 ` rguenth at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: jsm28 at gcc dot gnu dot org @ 2009-03-31 19:46 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #30 from jsm28 at gcc dot gnu dot org  2009-03-31 19:45 -------
Closing 4.2 branch.


-- 

jsm28 at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.2/4.3/4.4/4.5 regression]|[4.3/4.4/4.5 regression]
                   |loop performance regression |loop performance regression
   Target Milestone|4.2.5                       |4.3.4


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.3/4.4/4.5 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (33 preceding siblings ...)
  2009-03-31 19:46 ` [Bug middle-end/29256] [4.3/4.4/4.5 " jsm28 at gcc dot gnu dot org
@ 2009-08-04 12:35 ` rguenth at gcc dot gnu dot org
  2010-05-22 18:23 ` [Bug middle-end/29256] [4.3/4.4/4.5/4.6 " rguenth at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-08-04 12:35 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #31 from rguenth at gcc dot gnu dot org  2009-08-04 12:27 -------
GCC 4.3.4 is being released, adjusting target milestone.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.3.4                       |4.3.5


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (34 preceding siblings ...)
  2009-08-04 12:35 ` rguenth at gcc dot gnu dot org
@ 2010-05-22 18:23 ` rguenth at gcc dot gnu dot org
  2010-07-16 19:14 ` pthaugen at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-05-22 18:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #32 from rguenth at gcc dot gnu dot org  2010-05-22 18:11 -------
GCC 4.3.5 is being released, adjusting target milestone.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.3.5                       |4.3.6


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (35 preceding siblings ...)
  2010-05-22 18:23 ` [Bug middle-end/29256] [4.3/4.4/4.5/4.6 " rguenth at gcc dot gnu dot org
@ 2010-07-16 19:14 ` pthaugen at gcc dot gnu dot org
  2010-07-18 17:49 ` rguenth at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: pthaugen at gcc dot gnu dot org @ 2010-07-16 19:14 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #33 from pthaugen at gcc dot gnu dot org  2010-07-16 19:14 -------
gcc.dg/tree-ssa/loop-19.c started failing on powerpc with -m64 between 7/5 and
7/7. The tree dump now looks like the following:

<bb 2>:
  ivtmp.10_12 = (long unsigned int) &a[-1];
  ivtmp.16_15 = (long unsigned int) &c[-1];
  a.21_18 = (long unsigned int) &a;
  D.2035_19 = a.21_18 + 15999992;

<bb 3>:
  # ivtmp.10_9 = PHI <ivtmp.10_5(3), ivtmp.10_12(2)>
  # ivtmp.16_13 = PHI <ivtmp.16_14(3), ivtmp.16_15(2)>
  ivtmp.10_5 = ivtmp.10_9 + 8;
  D.2032_16 = (void *) ivtmp.10_5;
  D.2007_3 = MEM[(double[2000000] *)D.2032_16];
  ivtmp.16_14 = ivtmp.16_13 + 8;
  D.2033_17 = (void *) ivtmp.16_14;
  MEM[(double[2000000] *)D.2033_17] = D.2007_3;
  if (ivtmp.10_5 != D.2035_19)
    goto <bb 3>;
  else
    goto <bb 4>;
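The dump above can be approximated by hand-written C, with N turned into a parameter `n` (an assumption of this sketch; the function name is hypothetical).

- As in the dump, the induction variables are plain integers (`uintptr_t` here, "long unsigned int" in the dump).
- Each one starts one element before its array and is incremented before every access.
- The loop exits after touching the last element, whose address is &a + 8 * (N - 1) = &a + 15999992 for N = 2000000.
- Writing the same thing with pointers (&a[-1]) would be undefined behaviour in C, so the sketch stays in integer arithmetic.
- Converting the integer back to a pointer relies on implementation-defined behaviour that holds on common flat-address-space targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hand-written approximation of the ivopts output shown above. */
void copy_like_ivopts(double *c, const double *a, size_t n)
{
    assert(n > 0);  /* the do-while below always runs at least once */

    uintptr_t iv_a = (uintptr_t)a - sizeof(double);            /* ivtmp.10_12 */
    uintptr_t iv_c = (uintptr_t)c - sizeof(double);            /* ivtmp.16_15 */
    uintptr_t last = (uintptr_t)a + (n - 1) * sizeof(double);  /* D.2035_19   */

    do {
        iv_a += sizeof(double);                /* ivtmp.10_5  */
        double tmp = *(const double *)iv_a;    /* D.2007_3    */
        iv_c += sizeof(double);                /* ivtmp.16_14 */
        *(double *)iv_c = tmp;
    } while (iv_a != last);
}
```

The "bump the address, then access it" pattern matches the semantics of powerpc update-form loads and stores. For example, `ldu rT,8(rA)` loads from rA+8 and then sets rA to rA+8.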


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (36 preceding siblings ...)
  2010-07-16 19:14 ` pthaugen at gcc dot gnu dot org
@ 2010-07-18 17:49 ` rguenth at gcc dot gnu dot org
  2010-07-21  4:16 ` sandra at codesourcery dot com
                   ` (4 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-07-18 17:49 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #34 from rguenth at gcc dot gnu dot org  2010-07-18 17:49 -------
In particular we are now back to generating the very bogus

  ivtmp.10_12 = (long unsigned int) &a[-1];
  ivtmp.16_15 = (long unsigned int) &c[-1];


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sandra at codesourcery dot
                   |                            |com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (37 preceding siblings ...)
  2010-07-18 17:49 ` rguenth at gcc dot gnu dot org
@ 2010-07-21  4:16 ` sandra at codesourcery dot com
  2010-07-21  4:17 ` sandra at codesourcery dot com
                   ` (3 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: sandra at codesourcery dot com @ 2010-07-21  4:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #35 from sandra at codesourcery dot com  2010-07-21 04:16 -------
Created an attachment (id=21274)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21274&action=view)
-fdump-tree-ivopts-details output from r161843


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (38 preceding siblings ...)
  2010-07-21  4:16 ` sandra at codesourcery dot com
@ 2010-07-21  4:17 ` sandra at codesourcery dot com
  2010-07-21  4:21 ` sandra at codesourcery dot com
                   ` (2 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: sandra at codesourcery dot com @ 2010-07-21  4:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #36 from sandra at codesourcery dot com  2010-07-21 04:16 -------
Created an attachment (id=21275)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21275&action=view)
-fdump-tree-ivopts-details output from r161844


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (39 preceding siblings ...)
  2010-07-21  4:17 ` sandra at codesourcery dot com
@ 2010-07-21  4:21 ` sandra at codesourcery dot com
  2010-07-21 16:10 ` sandra at codesourcery dot com
  2010-07-21 21:51 ` pthaugen at gcc dot gnu dot org
  42 siblings, 0 replies; 44+ messages in thread
From: sandra at codesourcery dot com @ 2010-07-21  4:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #37 from sandra at codesourcery dot com  2010-07-21 04:21 -------
It seems the change was introduced by my patch for PR42505 in r161844.
However, ivopts is correctly choosing the lower-cost candidate set; the problem
is in the cost model, which is unchanged from r161843.  Take a look at the
"Use-candidate costs" section of the dump.  Those costs with negative values
(like -7) look very suspicious to me.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (40 preceding siblings ...)
  2010-07-21  4:21 ` sandra at codesourcery dot com
@ 2010-07-21 16:10 ` sandra at codesourcery dot com
  2010-07-21 21:51 ` pthaugen at gcc dot gnu dot org
  42 siblings, 0 replies; 44+ messages in thread
From: sandra at codesourcery dot com @ 2010-07-21 16:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #38 from sandra at codesourcery dot com  2010-07-21 16:08 -------
On reading the code again, I think the -7 is coming from the can_autoinc case
in determine_use_iv_cost_address.  I also think it is correct to prefer
autoinc.  E.g., here's the generated code for the loop in r161843:

.L2:
        addi 11,8,9216
        ldx 0,10,9
        stdx 0,11,9
        addi 9,9,8
        bdnz .L2

and in r161844:

.L2:
        ldu 0,8(11)
        stdu 0,8(9)
        bdnz .L2

I'm no expert on powerpc architecture, but 3 instructions versus 5 looks like a
win to me.  Bit-rotten test case?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



* [Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
  2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
                   ` (41 preceding siblings ...)
  2010-07-21 16:10 ` sandra at codesourcery dot com
@ 2010-07-21 21:51 ` pthaugen at gcc dot gnu dot org
  42 siblings, 0 replies; 44+ messages in thread
From: pthaugen at gcc dot gnu dot org @ 2010-07-21 21:51 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #39 from pthaugen at gcc dot gnu dot org  2010-07-21 21:51 -------
(In reply to comment #38)
> 
> .L2:
>         addi 11,8,9216
>         ldx 0,10,9
>         stdx 0,11,9
>         addi 9,9,8
>         bdnz .L2
> 
> and in r161844:
> 
> .L2:
>         ldu 0,8(11)
>         stdu 0,8(9)
>         bdnz .L2
> 
> I'm no expert on powerpc architecture, but 3 instructions versus 5 looks like a
> win to me.  Bit-rotten test case?
> 

The 'addi 11,8,9216' in the first loop is invariant and should be hoisted out
of the loop. Separate issue?
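The hoisting point can be pictured in C. This is a hypothetical model of the r161843 loop, not compiler output. `anchor` stands in for register 8 and `disp` for the constant 9216; the thread does not say what they represent in the original compilation. Neither value changes inside the loop, so the sketch computes the destination base once, before the loop. The compiled loop instead recomputes it with `addi 11,8,9216` on every iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed copy with one shared byte offset (r9 in the listing) added to
   two base registers (r10 for the source, r11 for the destination). */
void copy_indexed_hoisted(char *anchor, size_t disp, const double *a, size_t n)
{
    /* Keep the stores aligned (assumes ANCHOR itself is double-aligned). */
    assert(disp % sizeof(double) == 0);

    const char *src = (const char *)a;  /* r10: source base                      */
    char *dst = anchor + disp;          /* r11 = r8 + 9216, hoisted out of the loop */

    /* r9: shared byte offset.  The compiled loop counts trips with CTR
       (bdnz) rather than comparing the offset. */
    for (size_t off = 0; off != n * sizeof(double); off += sizeof(double))
        *(double *)(dst + off) = *(const double *)(src + off);  /* ldx / stdx */
}
```

With the invariant base hoisted, the loop body corresponds to ldx, stdx, addi and bdnz. That is four instructions rather than five, against three for the update-form loop.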

As for indexed ld/st+addi vs. update-form ld/st: the update forms are cracked
into ld/st+addi, which imposes a scheduling restriction on them (cracked insns
start a dispatch group). This may not make any difference in this simple loop,
but indexed ld/st+addi may offer better scheduling opportunities when there
are more insns in the loop.

This testcase also appears to depend on the -mcpu value. With -mcpu=power7 the
testcase passes (although the invariant addi is still in the loop). With -m32,
it fails only for -mcpu=power6.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



end of thread, other threads:[~2010-07-21 21:51 UTC | newest]

Thread overview: 44+ messages
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
2006-09-27 18:30 ` [Bug c/29256] " edmar at freescale dot com
2006-09-27 18:30 ` edmar at freescale dot com
2006-09-28  3:00 ` [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression pinskia at gcc dot gnu dot org
2006-09-28 11:08 ` rguenth at gcc dot gnu dot org
2006-09-28 11:34 ` rakdver at gcc dot gnu dot org
2006-09-28 13:47 ` pinskia at gcc dot gnu dot org
2006-09-28 14:03 ` rguenth at gcc dot gnu dot org
2006-09-28 14:08 ` pinskia at gcc dot gnu dot org
2006-09-28 14:11 ` rguenth at gcc dot gnu dot org
2006-09-28 14:15 ` rakdver at gcc dot gnu dot org
2006-09-28 14:16 ` [Bug middle-end/29256] [4.2 regression] loop " pinskia at gcc dot gnu dot org
2006-09-28 14:21 ` rakdver at gcc dot gnu dot org
2006-09-28 14:35 ` pinskia at gcc dot gnu dot org
2006-09-28 14:40 ` rakdver at gcc dot gnu dot org
2006-09-28 14:44 ` rakdver at gcc dot gnu dot org
2006-09-28 14:50 ` rakdver at gcc dot gnu dot org
2006-09-28 23:48 ` rakdver at gcc dot gnu dot org
2006-10-01 23:04 ` mmitchel at gcc dot gnu dot org
2006-10-06 19:32 ` rakdver at gcc dot gnu dot org
2007-05-14 21:37 ` [Bug middle-end/29256] [4.2/4.3 " mmitchel at gcc dot gnu dot org
2007-07-20  3:50 ` mmitchel at gcc dot gnu dot org
2007-10-09 19:25 ` mmitchel at gcc dot gnu dot org
2008-01-11  5:16 ` ghazi at gcc dot gnu dot org
2008-01-11  6:04 ` rakdver at kam dot mff dot cuni dot cz
2008-01-12  8:43 ` ghazi at gcc dot gnu dot org
2008-02-01 17:00 ` jsm28 at gcc dot gnu dot org
2008-05-19 20:35 ` [Bug middle-end/29256] [4.2/4.3/4.4 " jsm28 at gcc dot gnu dot org
2008-08-06  6:58 ` cnstar9988 at gmail dot com
2008-08-06 21:52 ` rakdver at gcc dot gnu dot org
2008-08-06 21:55 ` rakdver at gcc dot gnu dot org
2008-08-06 21:57 ` rakdver at gcc dot gnu dot org
2008-08-07  5:03 ` bonzini at gnu dot org
2008-10-29 17:05 ` janis at gcc dot gnu dot org
2009-03-31 19:46 ` [Bug middle-end/29256] [4.3/4.4/4.5 " jsm28 at gcc dot gnu dot org
2009-08-04 12:35 ` rguenth at gcc dot gnu dot org
2010-05-22 18:23 ` [Bug middle-end/29256] [4.3/4.4/4.5/4.6 " rguenth at gcc dot gnu dot org
2010-07-16 19:14 ` pthaugen at gcc dot gnu dot org
2010-07-18 17:49 ` rguenth at gcc dot gnu dot org
2010-07-21  4:16 ` sandra at codesourcery dot com
2010-07-21  4:17 ` sandra at codesourcery dot com
2010-07-21  4:21 ` sandra at codesourcery dot com
2010-07-21 16:10 ` sandra at codesourcery dot com
2010-07-21 21:51 ` pthaugen at gcc dot gnu dot org
