public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug regression/44423]  New: [4.5/4.6] Massive performance regression in SSE code
@ 2010-06-05 10:17 martin at mpa-garching dot mpg dot de
  2010-06-05 10:18 ` [Bug regression/44423] " martin at mpa-garching dot mpg dot de
                   ` (18 more replies)
  0 siblings, 19 replies; 20+ messages in thread
From: martin at mpa-garching dot mpg dot de @ 2010-06-05 10:17 UTC (permalink / raw)
  To: gcc-bugs

If the attached test case is compiled with current trunk, it runs almost 4
times more slowly than the same code compiled with gcc 4.4 and identical
options:

~/ujedi/splotchnew>gcc -O2 -v bugrep.c -W -Wall
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /tmp/S/gcc-4.4.2/configure --prefix=/usr --sysconfdir=/etc
--localstatedir=/var --mandir=/usr/share/man --infodir=/usr/share/info
--enable-cpp --enable-nls --enable-shared --enable-multilib
--enable-languages=c,c++,fortran
Thread model: posix
gcc version 4.4.2 (GCC) 
COLLECT_GCC_OPTIONS='-O2' '-v' '-W' '-Wall' '-mtune=generic'

/afs/mpa/@sys/system/MPA-5.13/usr/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.4.2/cc1
-quiet -v -iprefix
/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/
bugrep.c -quiet -dumpbase bugrep.c -mtune=generic -auxbase bugrep -O2 -W -Wall
-version -o /tmp/cc3cnddf.s
ignoring nonexistent directory
"/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../x86_64-unknown-linux-gnu/include"
ignoring duplicate directory
"/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/../../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/include"
ignoring duplicate directory
"/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/../../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/include-fixed"
ignoring nonexistent directory
"/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/../../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../x86_64-unknown-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:

/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/include

/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/include-fixed
 /usr/local/include
 /usr/include
End of search list.
GNU C (GCC) version 4.4.2 (x86_64-unknown-linux-gnu)
        compiled by GNU C version 4.4.2, GMP version 4.3.1, MPFR version 2.4.1.
warning: MPFR header version 2.4.1 differs from library version 2.4.2.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 364bbcc3e471e0834234549d6d2cb3d0
COLLECT_GCC_OPTIONS='-O2' '-v' '-W' '-Wall' '-mtune=generic'

/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../x86_64-unknown-linux-gnu/bin/as
-V -Qy -o /tmp/ccKlmGkp.o /tmp/cc3cnddf.s
GNU assembler version 2.19.1 (x86_64-unknown-linux-gnu) using BFD version (GNU
Binutils) 2.19.1
COMPILER_PATH=/afs/mpa/@sys/system/MPA-5.13/usr/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.4.2/:/afs/mpa/@sys/system/MPA-5.13/usr/bin/../libexec/gcc/:/usr/libexec/gcc/x86_64-unknown-linux-gnu/4.4.2/:/usr/libexec/gcc/x86_64-unknown-linux-gnu/:/usr/lib/gcc/x86_64-unknown-linux-gnu/4.4.2/:/usr/lib/gcc/x86_64-unknown-linux-gnu/:/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../x86_64-unknown-linux-gnu/bin/
LIBRARY_PATH=/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/:/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/:/usr/lib/gcc/x86_64-unknown-linux-gnu/4.4.2/:/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../x86_64-unknown-linux-gnu/lib/:/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-O2' '-v' '-W' '-Wall' '-mtune=generic'

/afs/mpa/@sys/system/MPA-5.13/usr/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.4.2/collect2
--eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2
/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../lib64/crt1.o
/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../lib64/crti.o
/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/crtbegin.o
-L/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2
-L/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc
-L/usr/lib/gcc/x86_64-unknown-linux-gnu/4.4.2
-L/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../lib64
-L/lib/../lib64 -L/usr/lib/../lib64
-L/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../x86_64-unknown-linux-gnu/lib
-L/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../..
/tmp/ccKlmGkp.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed
-lgcc_s --no-as-needed
/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/crtend.o
/afs/mpa/@sys/system/MPA-5.13/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../lib64/crtn.o

~/ujedi/splotchnew>time ./a.out
0.440u 0.000s 0:00.44 100.0%    0+0k 0+0io 0pf+0w



~/ujedi/splotchnew>gcc -O2 -v bugrep.c -W -Wall
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/afs/mpa/data/martin/ugcc/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /scratch/martin/gcc/configure --disable-bootstrap
--enable-gold --enable-plugins --prefix=/afs/mpa/data/martin/ugcc
--with-ppl=/afs/mpa/data/martin/numlibs64
--with-cloog=/afs/mpa/data/martin/numlibs64
--with-libelf=/afs/mpa/data/martin/numlibs64 --enable-languages=c++,fortran
--enable-target=all --enable-checking=release
Thread model: posix
gcc version 4.6.0 20100604 (experimental) [trunk revision 160258] (GCC) 
COLLECT_GCC_OPTIONS='-O2' '-v' '-W' '-Wall' '-mtune=generic' '-march=x86-64'
 /afs/mpa/data/martin/ugcc/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/cc1
-quiet -v bugrep.c -quiet -dumpbase bugrep.c -mtune=generic -march=x86-64
-auxbase bugrep -O2 -W -Wall -version -o /tmp/ccm5w6IQ.s
GNU C (GCC) version 4.6.0 20100604 (experimental) [trunk revision 160258]
(x86_64-unknown-linux-gnu)
        compiled by GNU C version 4.4.2, GMP version 4.3.2, MPFR version 2.4.2,
MPC version 0.8.1
warning: GMP header version 4.3.2 differs from library version 4.3.1.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory
"/afs/mpa/data/martin/ugcc/lib/gcc/x86_64-unknown-linux-gnu/4.6.0/../../../../x86_64-unknown-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /afs/mpa/data/martin/ugcc/include
 /afs/mpa/data/martin/ugcc/lib/gcc/x86_64-unknown-linux-gnu/4.6.0/include
 /afs/mpa/data/martin/ugcc/lib/gcc/x86_64-unknown-linux-gnu/4.6.0/include-fixed
 /usr/include
End of search list.
GNU C (GCC) version 4.6.0 20100604 (experimental) [trunk revision 160258]
(x86_64-unknown-linux-gnu)
        compiled by GNU C version 4.4.2, GMP version 4.3.2, MPFR version 2.4.2,
MPC version 0.8.1
warning: GMP header version 4.3.2 differs from library version 4.3.1.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: d4da26fe4a5a46c88c2087304495247d
COLLECT_GCC_OPTIONS='-O2' '-v' '-W' '-Wall' '-mtune=generic' '-march=x86-64'
 as -V -Qy --64 -o /tmp/ccmGCvZC.o /tmp/ccm5w6IQ.s
GNU assembler version 2.19.1 (x86_64-unknown-linux-gnu) using BFD version (GNU
Binutils) 2.19.1
COMPILER_PATH=/afs/mpa/data/martin/ugcc/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/:/afs/mpa/data/martin/ugcc/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/:/afs/mpa/data/martin/ugcc/libexec/gcc/x86_64-unknown-linux-gnu/:/afs/mpa/data/martin/ugcc/lib/gcc/x86_64-unknown-linux-gnu/4.6.0/:/afs/mpa/data/martin/ugcc/lib/gcc/x86_64-unknown-linux-gnu/
LIBRARY_PATH=/afs/mpa/data/martin/ugcc/lib/gcc/x86_64-unknown-linux-gnu/4.6.0/:/afs/mpa/data/martin/ugcc/lib/gcc/x86_64-unknown-linux-gnu/4.6.0/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/afs/mpa/data/martin/ugcc/lib/gcc/x86_64-unknown-linux-gnu/4.6.0/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-O2' '-v' '-W' '-Wall' '-mtune=generic' '-march=x86-64'
 /afs/mpa/data/martin/ugcc/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/collect2
--eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2
/usr/lib/../lib64/crt1.o /usr/lib/../lib64/crti.o
/afs/mpa/data/martin/ugcc/lib/gcc/x86_64-unknown-linux-gnu/4.6.0/crtbegin.o
-L/afs/mpa/data/martin/ugcc/lib/gcc/x86_64-unknown-linux-gnu/4.6.0
-L/afs/mpa/data/martin/ugcc/lib/gcc/x86_64-unknown-linux-gnu/4.6.0/../../../../lib64
-L/lib/../lib64 -L/usr/lib/../lib64
-L/afs/mpa/data/martin/ugcc/lib/gcc/x86_64-unknown-linux-gnu/4.6.0/../../..
/tmp/ccmGCvZC.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed
-lgcc_s --no-as-needed
/afs/mpa/data/martin/ugcc/lib/gcc/x86_64-unknown-linux-gnu/4.6.0/crtend.o
/usr/lib/../lib64/crtn.o

~/ujedi/splotchnew>time ./a.out
1.580u 0.000s 0:01.58 100.0%    0+0k 0+0io 0pf+0w


This was run on a Core2Duo.
gcc 4.5 also exhibits this regression.


Another observation: if I change ARRSZ in the testcase to 20 instead of 1024,
all executables run even more slowly (by about another factor of five); I have
absolutely no explanation for this :-/


-- 
           Summary: [4.5/4.6] Massive performance regression in SSE code
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: regression
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: martin at mpa-garching dot mpg dot de
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug regression/44423] [4.5/4.6] Massive performance regression in SSE code
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
@ 2010-06-05 10:18 ` martin at mpa-garching dot mpg dot de
  2010-06-05 10:47 ` rguenth at gcc dot gnu dot org
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: martin at mpa-garching dot mpg dot de @ 2010-06-05 10:18 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from martin at mpa-garching dot mpg dot de  2010-06-05 10:18 -------
Created an attachment (id=20849)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20849&action=view)
test case


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug regression/44423] [4.5/4.6] Massive performance regression in SSE code
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
  2010-06-05 10:18 ` [Bug regression/44423] " martin at mpa-garching dot mpg dot de
@ 2010-06-05 10:47 ` rguenth at gcc dot gnu dot org
  2010-06-05 10:56 ` [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA rguenth at gcc dot gnu dot org
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-06-05 10:47 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from rguenth at gcc dot gnu dot org  2010-06-05 10:47 -------
We have (4.4):

<bb 2>:
  va.f[0] = a->r;
  va.f[1] = a->g;
  va.f[2] = a->b;
  va.f[3] = 0.0;
  pretmp.40 = va.v;
  ivtmp.61 = 0;

<bb 3>:
  att.12 = MEM[base: pre1, index: ivtmp.61] * pre2;
  tmpatt = {att.12, att.12, att.12, att.12};
  tmpatt.66 = __builtin_ia32_mulps (tmpatt, pretmp.40);
  D.3720 = __builtin_ia32_addps (tmpatt.66, MEM[base: lpic, index: ivtmp.61,
step: 4]);
  MEM[base: lpic, index: ivtmp.61, step: 4] = D.3720;
  ivtmp.61 = ivtmp.61 + 4;
  if (ivtmp.61 != 80)
    goto <bb 3>;
  else
    goto <bb 4>;

vs. (4.6):

<bb 2>:
  va$f$0_3 = a_2(D)->r;
  va$f$1_4 = a_2(D)->g;
  va$f$2_5 = a_2(D)->b;

<bb 3>:
  # ivtmp.31_52 = PHI <ivtmp.31_51(3), 0(2)>
  D.4345_11 = MEM[base: pre1_9(D), index: ivtmp.31_52];
  att.1_13 = D.4345_11 * pre2_12(D);
  tmpatt_37 = {att.1_13, att.1_13, att.1_13, att.1_13};
  va.f[0] = va$f$0_3;
  va.f[1] = va$f$1_4;
  va.f[2] = va$f$2_5;
  va.f[3] = 0.0;
  D.4347_16 = va.v;
  tmpatt_38 = __builtin_ia32_mulps (tmpatt_37, D.4347_16);
  D.4350_25 = MEM[base: lpic_20(D), index: ivtmp.31_52, step: 4];
  D.4382_39 = __builtin_ia32_addps (tmpatt_38, D.4350_25);
  MEM[base: lpic_20(D), index: ivtmp.31_52, step: 4] = D.4382_39;
  ivtmp.31_51 = ivtmp.31_52 + 4;
  if (ivtmp.31_51 != 80)
    goto <bb 3>;
  else
    goto <bb 4>;

which means that somehow we were not able to hoist loop invariant stuff.

investigating.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-06-05 10:47:29
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
  2010-06-05 10:18 ` [Bug regression/44423] " martin at mpa-garching dot mpg dot de
  2010-06-05 10:47 ` rguenth at gcc dot gnu dot org
@ 2010-06-05 10:56 ` rguenth at gcc dot gnu dot org
  2010-06-08 13:16 ` jamborm at gcc dot gnu dot org
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-06-05 10:56 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from rguenth at gcc dot gnu dot org  2010-06-05 10:56 -------
Ok.  Fact is that no pass can move invariant store/load pairs.  But that's
pre-existing - the main issue is that the new SRA implementation ends up
rematerializing the stores inside the loop!

Diff of pre-esra vs. esra:

 <bb 2>:
   D.4339_3 = a_2(D)->r;
-  va.f[0] = D.4339_3;
+  va$f$0_33 = D.4339_3;
   D.4340_4 = a_2(D)->g;
-  va.f[1] = D.4340_4;
+  va$f$1_32 = D.4340_4;
   D.4341_5 = a_2(D)->b;
-  va.f[2] = D.4341_5;
-  va.f[3] = 0.0;
+  va$f$2_31 = D.4341_5;
+  va$f$3_30 = 0.0;
   y_6 = 0;
   goto <bb 4>;

@@ -504,6 +203,10 @@
   tmpatt_37 = {D.4375_36, D.4375_36, D.4375_36, D.4375_36};
   tmpatt_40 = tmpatt_37;
   tmpatt_15 = tmpatt_40;
+  va.f[0] = va$f$0_33;
+  va.f[1] = va$f$1_32;
+  va.f[2] = va$f$2_31;
+  va.f[3] = va$f$3_30;
   D.4347_16 = va.v;
   tmpatt_38 = __builtin_ia32_mulps (tmpatt_15, D.4347_16);
   tmpatt_41 = tmpatt_38;

that's of course bad (and the scalarization in this particular case looks
useless, too - the only use is an aggregate one, covering all stores).


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu dot
                   |                            |org
          Component|regression                  |tree-optimization
            Summary|[4.5/4.6] Massive           |[4.5/4.6 Regression] Massive
                   |performance regression in   |performance regression in
                   |SSE code                    |SSE code due to SRA
   Target Milestone|---                         |4.5.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
                   ` (2 preceding siblings ...)
  2010-06-05 10:56 ` [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA rguenth at gcc dot gnu dot org
@ 2010-06-08 13:16 ` jamborm at gcc dot gnu dot org
  2010-06-08 13:54 ` martin at mpa-garching dot mpg dot de
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: jamborm at gcc dot gnu dot org @ 2010-06-08 13:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from jamborm at gcc dot gnu dot org  2010-06-08 13:16 -------
Mine


-- 

jamborm at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |jamborm at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2010-06-05 10:47:29         |2010-06-08 13:16:32
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
                   ` (3 preceding siblings ...)
  2010-06-08 13:16 ` jamborm at gcc dot gnu dot org
@ 2010-06-08 13:54 ` martin at mpa-garching dot mpg dot de
  2010-06-08 14:03 ` rguenth at gcc dot gnu dot org
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: martin at mpa-garching dot mpg dot de @ 2010-06-08 13:54 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from martin at mpa-garching dot mpg dot de  2010-06-08 13:54 -------
(In reply to comment #2)
> We have (4.4):
> 
> <bb 2>:
>   va.f[0] = a->r;
>   va.f[1] = a->g;
>   va.f[2] = a->b;
>   va.f[3] = 0.0;
>   pretmp.40 = va.v;
>   ivtmp.61 = 0;

[...]

Could you please tell me the compiler flag(s) needed to produce this kind of
information? That seems much more useful for debugging and chasing performance
bottlenecks than assembler dumps...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
                   ` (4 preceding siblings ...)
  2010-06-08 13:54 ` martin at mpa-garching dot mpg dot de
@ 2010-06-08 14:03 ` rguenth at gcc dot gnu dot org
  2010-06-08 14:29 ` jamborm at gcc dot gnu dot org
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-06-08 14:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from rguenth at gcc dot gnu dot org  2010-06-08 14:02 -------
(In reply to comment #5)
> (In reply to comment #2)
> > We have (4.4):
> > 
> > <bb 2>:
> >   va.f[0] = a->r;
> >   va.f[1] = a->g;
> >   va.f[2] = a->b;
> >   va.f[3] = 0.0;
> >   pretmp.40 = va.v;
> >   ivtmp.61 = 0;
> 
> [...]
> 
> Could you please tell me the compiler flag(s) needed to produce this kind of
> information? That seems much more useful for debugging and chasing performance
> bottlenecks than assembler dumps...

-fdump-tree-all will leave you with a hunded dump files, program state
after each individual tree optimization pass.  -da will add dumps after
each RTL pass.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
                   ` (5 preceding siblings ...)
  2010-06-08 14:03 ` rguenth at gcc dot gnu dot org
@ 2010-06-08 14:29 ` jamborm at gcc dot gnu dot org
  2010-06-08 14:51 ` rguenth at gcc dot gnu dot org
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: jamborm at gcc dot gnu dot org @ 2010-06-08 14:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from jamborm at gcc dot gnu dot org  2010-06-08 14:29 -------
I don't think I can fix this bug in its most general form without
doing some flow-sensitive decisions (which can be difficult for
aggregates) and without causing PR 43846 again.  (Aggregate
copy-propagation and either of the two things described below should
do the trick, though).

As noted, this is caused by a fix to PR 43846 which on the other hand
is certainly not necessary for non-aggregate types when we do type
punning of register types through unions.  I've got a two line patch
testing that and it works (and bootstraps and tests) fine.

However, that is only a change in the new heuristics and if the array
elements are individually read somewhere else in the function too, a
different decision making condition will kick in and we will end up
with the replacements and extra statements in the loop again.

Therefore, I now tend to think that these accesses to SSE vectors are
a good reason to simply disallow scalarization of anything that has a
non-aggregate parent in the SRA access tree.  This would only affect
type punning through unions and weird typecasts (none of which could
be processed by previous SRA).

Actually, I had this disallowed when I was developing the new SRA most
of the time and then decided to allow it only very late.  I don't
remember why I did that.  I'm now testing a patch doing that, maybe
some testcase will remind me what the reason was.

I will ponder about this a bit more but probably will soon submit a
patch doing the latter.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
                   ` (6 preceding siblings ...)
  2010-06-08 14:29 ` jamborm at gcc dot gnu dot org
@ 2010-06-08 14:51 ` rguenth at gcc dot gnu dot org
  2010-06-08 15:00 ` jamborm at gcc dot gnu dot org
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-06-08 14:51 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from rguenth at gcc dot gnu dot org  2010-06-08 14:50 -------
I don't think you need flow-sensitivity.

Basically when you have only aggregate uses (as in this case) then you only
want to scalarize if the destination of the use is scalarized as well
(to be able to copyprop out the aggregate copy).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
                   ` (7 preceding siblings ...)
  2010-06-08 14:51 ` rguenth at gcc dot gnu dot org
@ 2010-06-08 15:00 ` jamborm at gcc dot gnu dot org
  2010-06-08 15:11 ` rguenth at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: jamborm at gcc dot gnu dot org @ 2010-06-08 15:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from jamborm at gcc dot gnu dot org  2010-06-08 15:00 -------
(In reply to comment #8)
> I don't think you need flow-sensitivity.
> 
> Basically when you have only aggregate uses (as in this case)

Vectors are considered scalars in GCC.  That is why the solutions
described above work.

> then you only want to scalarize if the destination of the use is
> scalarized as well (to be able to copyprop out the aggregate copy).

Well, that is what I thought until someone filed PR 43846.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
                   ` (8 preceding siblings ...)
  2010-06-08 15:00 ` jamborm at gcc dot gnu dot org
@ 2010-06-08 15:11 ` rguenth at gcc dot gnu dot org
  2010-06-09  9:03 ` jamborm at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-06-08 15:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from rguenth at gcc dot gnu dot org  2010-06-08 15:11 -------
(In reply to comment #9)
> (In reply to comment #8)
> > I don't think you need flow-sensitivity.
> > 
> > Basically when you have only aggregate uses (as in this case)
> 
> Vectors are considered scalars in GCC.  That is why the solutions
> described above work.
> 
> > then you only want to scalarize if the destination of the use is
> > scalarized as well (to be able to copyprop out the aggregate copy).
> 
> Well, that is what I thought until someone filed PR 43846.

Hm, yes.  But there you know that

  D.2464.m[0] = D.2473_20;
  D.2464.m[1] = D.2472_19;
  D.2464.m[2] = D.2471_18;
  *b_1(D) = D.2464;

D.2464 will be dead after scalarization.  In the particular case of the
current bug the aggregate remains live because of the load from va.v
which we cannot scalarize(*).

(*) we can scalarize this particular case if you manage to build a
proper constructor from the elements - but that's probably a bit
involved.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
                   ` (9 preceding siblings ...)
  2010-06-08 15:11 ` rguenth at gcc dot gnu dot org
@ 2010-06-09  9:03 ` jamborm at gcc dot gnu dot org
  2010-06-09  9:06 ` jamborm at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: jamborm at gcc dot gnu dot org @ 2010-06-09  9:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from jamborm at gcc dot gnu dot org  2010-06-09 09:02 -------
(In reply to comment #10)
> (In reply to comment #9)
> > (In reply to comment #8)
> > > I don't think you need flow-sensitivity.
> > > 
> > > Basically when you have only aggregate uses (as in this case)
> > 
> > Vectors are considered scalars in GCC.  That is why the solutions
> > described above work.
> > 
> > > then you only want to scalarize if the destination of the use is
> > > scalarized as well (to be able to copyprop out the aggregate copy).
> > 
> > Well, that is what I thought until someone filed PR 43846.
> 
> Hm, yes.  But there you know that
> 
>   D.2464.m[0] = D.2473_20;
>   D.2464.m[1] = D.2472_19;
>   D.2464.m[2] = D.2471_18;
>   *b_1(D) = D.2464;
> 
> D.2464 will be dead after scalarization.

If D.2464 was larger than just m, that would not necessarily be the
case and we would still want to avoid the extra copies.

However, I it is true that it would make sense to take
grp_assignment_read into account only if the whole access subtree
would end up with grp_unscalarized_data set to zero but that would
require quite a rewrite of analyze_access_subtree and would not help
in this case because grp_unscalarized_data is zero, the union is
covered by scalar replacements.

The real issue is that

>  In the particular case of the
> current bug the aggregate remains live because of the load from va.v
> which we cannot scalarize(*).

we determine this very late, in sra_modify_assign (right after the big
comment) and in the most general form this can be determined only when
we already have the whole access tree (so if we wanted to do this
during analysis, we would have to scan the function body twice).
Nevertheless, for scalar accesses that have scalar sub-accesses this
is always true and it can be easily detected and so we can simply
disallow them, like I wrote in comment #7.  And disallow them always,
since otherwise it would be easy to _add_ stuff to the function that
is causing trouble now so that any heuristics is confused and decides
to produce replacements.

I'll submit a patch in a while.

> 
> (*) we can scalarize this particular case if you manage to build a
> proper constructor from the elements - but that's probably a bit
> involved.
> 

Well, I don't think I want to implement that... but I am curious,
would that actually lead to better (or even different) code if I
placed something like that into the loop?  And I also thought that in
gimple, constructors only could have invariants in them.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
                   ` (10 preceding siblings ...)
  2010-06-09  9:03 ` jamborm at gcc dot gnu dot org
@ 2010-06-09  9:06 ` jamborm at gcc dot gnu dot org
  2010-06-09 11:21 ` jamborm at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: jamborm at gcc dot gnu dot org @ 2010-06-09  9:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from jamborm at gcc dot gnu dot org  2010-06-09 09:05 -------
(In reply to comment #11)
> >   D.2464.m[0] = D.2473_20;
> >   D.2464.m[1] = D.2472_19;
> >   D.2464.m[2] = D.2471_18;
> >   *b_1(D) = D.2464;
> > 
> > D.2464 will be dead after scalarization.
> 
> If D.2464 was larger than just m, that would not necessarily be the
> case and we would still want to avoid the extra copies.
> 

Of course this would be true only if the last assignment was 

*b_1(D) = D.2464.m;

but the argument is the same.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
                   ` (11 preceding siblings ...)
  2010-06-09  9:06 ` jamborm at gcc dot gnu dot org
@ 2010-06-09 11:21 ` jamborm at gcc dot gnu dot org
  2010-06-09 12:06 ` martin at mpa-garching dot mpg dot de
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: jamborm at gcc dot gnu dot org @ 2010-06-09 11:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from jamborm at gcc dot gnu dot org  2010-06-09 11:20 -------
Subject: Bug 44423

Author: jamborm
Date: Wed Jun  9 11:20:03 2010
New Revision: 160462

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=160462
Log:
2010-06-09  Martin Jambor  <mjambor@suse.cz>

        PR tree-optimization/44423
        * tree-sra.c (dump_access): Dump also grp_assignment_read.
        (analyze_access_subtree): Pass negative allow_replacements to children
        if the current type is scalar.

        * testsuite/gcc.dg/tree-ssa/pr44423.c: New test.


Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr44423.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-sra.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
                   ` (12 preceding siblings ...)
  2010-06-09 11:21 ` jamborm at gcc dot gnu dot org
@ 2010-06-09 12:06 ` martin at mpa-garching dot mpg dot de
  2010-06-14 12:40 ` jamborm at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: martin at mpa-garching dot mpg dot de @ 2010-06-09 12:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from martin at mpa-garching dot mpg dot de  2010-06-09 12:06 -------
SSE performance is fine again, thanks a lot!

One more question, if that's OK...
Depending on ARRSZ the testcase uses wildly varying amounts of CPU time; it's
about half a second for ARRSZ=1024, but almost 10 seconds for ARRSZ=20 on my
machine, which is extremely strange because the operation count is the same in
both cases. I suspect that something weird is happening with respect to the
cache and prefetching. Should I open another PR for this?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
                   ` (13 preceding siblings ...)
  2010-06-09 12:06 ` martin at mpa-garching dot mpg dot de
@ 2010-06-14 12:40 ` jamborm at gcc dot gnu dot org
  2010-06-14 12:46 ` martin at mpa-garching dot mpg dot de
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: jamborm at gcc dot gnu dot org @ 2010-06-14 12:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from jamborm at gcc dot gnu dot org  2010-06-14 12:39 -------
(In reply to comment #14)
> SSE performance is fine again, thanks a lot!
> 
> One more question, if that's OK...
> Depending on ARRSZ the testcase uses wildly varying amounts of CPU time; it's
> about half a second for ARRSZ=1024, but almost 10 seconds for ARRSZ=20 on my
> machine, which is extremely strange because the operation count is the same in
> both cases. I suspect that something weird is happening with respect to the
> cache and prefetching. Should I open another PR for this?
> 

The generated assembly is not different for the two cases, except that
there are much smaller offsets, of course.  This means that the lpic
and pre1 arrays are much closer to each other which may be something
the processor does not like.  I find this surprising but unless you
can think of a specific missed optimization opportunity (I can't), I
don't think it is a PR material.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
                   ` (14 preceding siblings ...)
  2010-06-14 12:40 ` jamborm at gcc dot gnu dot org
@ 2010-06-14 12:46 ` martin at mpa-garching dot mpg dot de
  2010-06-14 12:50 ` jamborm at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: martin at mpa-garching dot mpg dot de @ 2010-06-14 12:46 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from martin at mpa-garching dot mpg dot de  2010-06-14 12:46 -------
(In reply to comment #15)

I have found the problem in the meantime ... it's my mistake, sorry about the
noise :(

The problem is that I did not explicitly zero the arrays in main(), so they
apparently contained NaN or similar nastinesses for the small ARRSZ, and
"usual" numbers for large ARRSZ. Of course the processor chokes on the
"unusual" numbers and takes much longer to execute the code.
I'm not sure whether the zeroing should be added for the regression test case
... but since you check for compiler diagnostic and do not try to run the
resulting executable that's probably not necessary.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
                   ` (15 preceding siblings ...)
  2010-06-14 12:46 ` martin at mpa-garching dot mpg dot de
@ 2010-06-14 12:50 ` jamborm at gcc dot gnu dot org
  2010-06-15  9:49 ` jamborm at gcc dot gnu dot org
  2010-06-15 10:04 ` jamborm at gcc dot gnu dot org
  18 siblings, 0 replies; 20+ messages in thread
From: jamborm at gcc dot gnu dot org @ 2010-06-14 12:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from jamborm at gcc dot gnu dot org  2010-06-14 12:50 -------
OK, I did not put much effort into my thinking about it :-)
Yes, the testcase is fine as it is.
I'm not testing the patch on the 4.5 branch and will commit it today
if everything goes fine.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
                   ` (16 preceding siblings ...)
  2010-06-14 12:50 ` jamborm at gcc dot gnu dot org
@ 2010-06-15  9:49 ` jamborm at gcc dot gnu dot org
  2010-06-15 10:04 ` jamborm at gcc dot gnu dot org
  18 siblings, 0 replies; 20+ messages in thread
From: jamborm at gcc dot gnu dot org @ 2010-06-15  9:49 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from jamborm at gcc dot gnu dot org  2010-06-15 09:48 -------
Subject: Bug 44423

Author: jamborm
Date: Tue Jun 15 09:48:39 2010
New Revision: 160775

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=160775
Log:
2010-06-15  Martin Jambor  <mjambor@suse.cz>

        PR tree-optimization/44423
        * tree-sra.c (dump_access): Dump also grp_assignment_read.
        (analyze_access_subtree): Pass negative allow_replacements to children
        if the current type is scalar.

        * testsuite/gcc.dg/tree-ssa/pr44423.c: New test.


Added:
    branches/gcc-4_5-branch/gcc/testsuite/gcc.dg/tree-ssa/pr44423.c
Modified:
    branches/gcc-4_5-branch/gcc/ChangeLog
    branches/gcc-4_5-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_5-branch/gcc/tree-sra.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
  2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
                   ` (17 preceding siblings ...)
  2010-06-15  9:49 ` jamborm at gcc dot gnu dot org
@ 2010-06-15 10:04 ` jamborm at gcc dot gnu dot org
  18 siblings, 0 replies; 20+ messages in thread
From: jamborm at gcc dot gnu dot org @ 2010-06-15 10:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #19 from jamborm at gcc dot gnu dot org  2010-06-15 10:04 -------
This is now fixed on both the trunk and the 4.5 branch.


-- 

jamborm at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2010-06-15 10:04 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-05 10:17 [Bug regression/44423] New: [4.5/4.6] Massive performance regression in SSE code martin at mpa-garching dot mpg dot de
2010-06-05 10:18 ` [Bug regression/44423] " martin at mpa-garching dot mpg dot de
2010-06-05 10:47 ` rguenth at gcc dot gnu dot org
2010-06-05 10:56 ` [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA rguenth at gcc dot gnu dot org
2010-06-08 13:16 ` jamborm at gcc dot gnu dot org
2010-06-08 13:54 ` martin at mpa-garching dot mpg dot de
2010-06-08 14:03 ` rguenth at gcc dot gnu dot org
2010-06-08 14:29 ` jamborm at gcc dot gnu dot org
2010-06-08 14:51 ` rguenth at gcc dot gnu dot org
2010-06-08 15:00 ` jamborm at gcc dot gnu dot org
2010-06-08 15:11 ` rguenth at gcc dot gnu dot org
2010-06-09  9:03 ` jamborm at gcc dot gnu dot org
2010-06-09  9:06 ` jamborm at gcc dot gnu dot org
2010-06-09 11:21 ` jamborm at gcc dot gnu dot org
2010-06-09 12:06 ` martin at mpa-garching dot mpg dot de
2010-06-14 12:40 ` jamborm at gcc dot gnu dot org
2010-06-14 12:46 ` martin at mpa-garching dot mpg dot de
2010-06-14 12:50 ` jamborm at gcc dot gnu dot org
2010-06-15  9:49 ` jamborm at gcc dot gnu dot org
2010-06-15 10:04 ` jamborm at gcc dot gnu dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).