public inbox for gcc-bugs@sourceware.org
* [Bug c/25500]  New: REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
@ 2005-12-20  5:25 yuri at tsoft dot com
  2005-12-20  5:34 ` [Bug c/25500] " yuri at tsoft dot com
                   ` (24 more replies)
  0 siblings, 25 replies; 26+ messages in thread
From: yuri at tsoft dot com @ 2005-12-20  5:25 UTC (permalink / raw)
  To: gcc-bugs

The following testcase, when compiled with
'g++ -O3 -msse3 -o testcase testcase.C',
finishes in 0m0.277s with gcc-3.4.4 and in 0m44.843s with gcc-4.0.2
(similar on all 4.x.x).

Yuri


-----------------------------------------------------------------------

typedef float __v2df __attribute__ ((__vector_size__ (16)));
typedef __v2df __m128;

static __inline __m128 _mm_sub_pd (__m128 __A, __m128 __B) { return
(__m128)__builtin_ia32_subps ((__v2df)__A, (__v2df)__B); }
static __inline __m128 _mm_add_pd (__m128 __A, __m128 __B) { return
(__m128)__builtin_ia32_addps ((__v2df)__A, (__v2df)__B); }
static __inline __m128 _mm_setr_ps (float __Z, float __Y, float __X, float __W)
{ return __extension__ (__m128)(__v2df){ __Z, __Y, __X, __W }; }

struct FF {
  __m128 d;

  __inline FF() { }
  __inline FF(__m128 new_d) : d(new_d) { }
  __inline FF(float f) : d(_mm_setr_ps(f, f, f, f)) { }

  __inline FF operator+(FF other) { return (FF(_mm_add_pd(d,other.d))); }
  __inline FF operator-(FF other) { return (FF(_mm_sub_pd(d,other.d))); }
};

float f[1024*1024];

int main() {
  int i;

  for (i = 0; i < 1024*1024; i++) { f[i] = 1.f/(1024*1024 + 10 - i); }

  FF total(0.f);

  for (int rpt = 0; rpt < 1000; rpt++) {
    FF p1(0.f), p2(0.), c;

    __m128 *pf = (__m128*)f;
    for (i = 0; i < 1024*1024/4; i++) {
      FF c(*pf++);

      total = total + c - p2 + p1;

      p1 = p2;
      p2 = c;
    }
  }
}


-- 
           Summary: REGREGRESSION: SSE2 vectorized code is many times slower
                    on 4.x.x than on 3.4.4
           Product: gcc
           Version: 4.0.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: yuri at tsoft dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25500



* [Bug c/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: yuri at tsoft dot com @ 2005-12-20  5:34 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from yuri at tsoft dot com  2005-12-20 05:34 -------
Actually, the testcase above is defective: the result is never used, so the
compiler is free to optimize the computation away.
But the run times still differ greatly with the corrected testcase below:
44.9s on 4.x.x vs. 0m2.371s on 3.4.4.

--- begin corrected testcase -----------------------------------
#include <iostream>
using namespace std;

typedef float __v2df __attribute__ ((__vector_size__ (16)));
typedef __v2df __m128;

static __inline __m128 _mm_sub_pd (__m128 __A, __m128 __B) { return
(__m128)__builtin_ia32_subps ((__v2df)__A, (__v2df)__B); }
static __inline __m128 _mm_add_pd (__m128 __A, __m128 __B) { return
(__m128)__builtin_ia32_addps ((__v2df)__A, (__v2df)__B); }
static __inline __m128 _mm_setr_ps (float __Z, float __Y, float __X, float __W)
{ return __extension__ (__m128)(__v2df){ __Z, __Y, __X, __W }; }

struct FF {
  __m128 d;

  __inline FF() { }
  __inline FF(__m128 new_d) : d(new_d) { }
  __inline FF(float f) : d(_mm_setr_ps(f, f, f, f)) { }

  __inline FF operator+(FF other) { return (FF(_mm_add_pd(d,other.d))); }
  __inline FF operator-(FF other) { return (FF(_mm_sub_pd(d,other.d))); }
};

float f[1024*1024];

    union U {
      __m128 m;
      float f[4];
    };

int main() {
  int i;

  FF gtotal(0.f);
  for (i = 0; i < 1024*1024; i++) { f[i] = 1.f/(1024*1024 + 10 - i); }

  FF total(0.f);

  for (int rpt = 0; rpt < 1000; rpt++) {
    FF p1(0.f), p2(0.), c;

    __m128 *pf = (__m128*)f;
    for (i = 0; i < 1024*1024/4; i++) {
      FF c(*pf++);

      total = total + c - p2 + p1;

      p1 = p2;
      p2 = c;
    }
    gtotal = gtotal + total;
  }

  U u;
  u.m = gtotal.d;

  cout << (u.f[0]) << endl;
}
--- end corrected testcase -------------------------------------
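
The defect described in this comment can be shown without any SSE code. If a
benchmark never observes its result, the optimizer may delete the work being
timed. Below is a minimal, portable sketch of the same fix (fold every pass
into a value the caller consumes). The names accumulate_all and run_benchmark
are hypothetical and do not appear in the testcase.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel, not part of the original testcase: sums the input,
// standing in for the loop that accumulates into `total`.
float accumulate_all(const std::vector<float>& data) {
  float total = 0.0f;
  for (std::size_t i = 0; i < data.size(); ++i) {
    total += data[i];
  }
  return total;  // returning the value keeps the loop observable
}

// Runs the kernel `repeats` times and folds every result into one value.
// The caller is expected to consume that value (for example by printing it),
// so the compiler cannot delete the computation outright.
float run_benchmark(const std::vector<float>& data, int repeats) {
  assert(repeats >= 0);
  float grand_total = 0.0f;
  for (int r = 0; r < repeats; ++r) {
    grand_total += accumulate_all(data);
  }
  return grand_total;
}
```

Printing the returned value, as the corrected testcase does with gtotal, is
what makes the timing meaningful. The compiler may still simplify the loop in
other ways, so this only guards against the work being dropped entirely.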



* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: pinskia at gcc dot gnu dot org @ 2005-12-20  5:55 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from pinskia at gcc dot gnu dot org  2005-12-20 05:55 -------
I cannot reproduce this on an Athlon 64 running in either 32 or 64 bit mode.

Everything I tried shows that 4.x is actually faster than 3.4.4.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |target
            Summary|REGRESSION: SSE2 vectorized |REGREGRESSION: SSE2
                   |code is many times slower on|vectorized code is many
                   |4.x.x than on 3.4.4         |times slower on 4.x.x than
                   |                            |on 3.4.4



* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: yuri at tsoft dot com @ 2005-12-20  6:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from yuri at tsoft dot com  2005-12-20 06:01 -------
Subject: Re:  REGREGRESSION: SSE2 vectorized code is many
 times slower on 4.x.x than on 3.4.4

I am running on an Athlon64-3200 in i386-compatible mode.
Strange.

I had the problem with gcc-4.0.1; yesterday I compiled gcc-4.0.2 and got the
same thing.

Yuri





* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: yuri at tsoft dot com @ 2005-12-20  6:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from yuri at tsoft dot com  2005-12-20 06:03 -------
Subject: Re:  REGREGRESSION: SSE2 vectorized code is many
 times slower on 4.x.x than on 3.4.4

Also, I use FreeBSD-6.0, in case that makes a difference.






* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: yuri at tsoft dot com @ 2005-12-20  6:19 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from yuri at tsoft dot com  2005-12-20 06:19 -------
Subject: Re:  REGREGRESSION: SSE2 vectorized code is many
 times slower on 4.x.x than on 3.4.4

Here is an attachment with the assembly generated in both cases.

testcase-old.s is from 3.4.4 and testcase-new.s is from 4.0.2.

In testcase-new.s the SSE2 code is diluted with plain i386 instructions,
notably 'rep movsl', which never occurs in the 3.4.4 output.

Yuri


------- Comment #6 from yuri at tsoft dot com  2005-12-20 06:19 -------
Created an attachment (id=10534)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10534&action=view)



* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: pinskia at gcc dot gnu dot org @ 2005-12-20  6:33 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from pinskia at gcc dot gnu dot org  2005-12-20 06:33 -------
I don't get
        rep
        movsl
at all on GNU/Linux. I am building a cross compiler to FreeBSD right now.



* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: pinskia at gcc dot gnu dot org @ 2005-12-20  6:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from pinskia at gcc dot gnu dot org  2005-12-20 06:36 -------
Can you show the output of "gcc -v" for the 3.4 compiler and the 4.0
compiler?

This looks like the two compilers just use a different default arch.

A compiler configured for i686 by default gives the good code, but one
configured for i386 gives the bad code.



* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: yuri at tsoft dot com @ 2005-12-20  6:51 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from yuri at tsoft dot com  2005-12-20 06:51 -------
Subject: Re:  REGREGRESSION: SSE2 vectorized code is many
 times slower on 4.x.x than on 3.4.4

-----------------------------------
Using built-in specs.
Configured with: FreeBSD/i386 system compiler
Thread model: posix
gcc version 3.4.4 [FreeBSD] 20050518
-----------------------------------
g++ -v (4.0.2)
Using built-in specs.
Target: i386-unknown-freebsd6.0
Configured with: ../gcc-4.0.2/configure --prefix=/usr/local/gcc-4.0.2
Thread model: posix
gcc version 4.0.2





* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: pinskia at gcc dot gnu dot org @ 2005-12-20  6:55 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from pinskia at gcc dot gnu dot org  2005-12-20 06:55 -------
Oh, I looked a little more and yes, it depends on the arch you are building
for, but only for 4.x.

Since you are using SSE, you should also add -march=i686 or -march=k8 so that
the code is also tuned for the processor you are using.

Anyway, the problem with i386 with 4.0 is really just PR 14295.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |14295



* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: yuri at tsoft dot com @ 2005-12-20  7:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from yuri at tsoft dot com  2005-12-20 07:40 -------
Subject: Re:  REGREGRESSION: SSE2 vectorized code is many
 times slower on 4.x.x than on 3.4.4

With that, the huge runtime difference disappears, but the 4.0.2-generated
code is still always slower by roughly 20% or more.
It contains many memory accesses that are not needed at all and that 3.4.4
did not generate.

I tried -march=i686 and -march=k8, both are slower than 3.4.4.

Do I also have to recompile gcc with some special options?

Yuri





* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: pinskia at gcc dot gnu dot org @ 2005-12-25  1:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from pinskia at gcc dot gnu dot org  2005-12-25 01:02 -------
Confirmed, it really only affects i386/i486 code (maybe i586 also, but I did
not try that).

The only thing I can think is to change MOVE_COST for those subtargets or just
have PR 14295 fixed.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
 GCC target triplet|                            |i386-*-*
           Keywords|                            |missed-optimization, ssemmx
      Known to fail|                            |4.0.0 4.1.0 4.2.0
      Known to work|                            |3.4.0
   Last reconfirmed|0000-00-00 00:00:00         |2005-12-25 01:02:34
               date|                            |
            Summary|REGREGRESSION: SSE2         |[4.0/4.1/4.2 Regression]:
                   |vectorized code is many     |SSE2 vectorized code is many
                   |times slower on 4.x.x than  |times slower on 4.x.x than
                   |on 3.4.4                    |on 3.4.4
   Target Milestone|---                         |4.0.3



* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: jakub at gcc dot gnu dot org @ 2005-12-28 16:53 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from jakub at gcc dot gnu dot org  2005-12-28 16:53 -------
Benchmarking -mtune=i386 tuned code on Athlon64 is simply a bad idea.
Either you need to tune for your CPU (or at least some contemporary one
like -mtune=pentium4 if you want to run quickly on a wider range of CPUs),
or you should be benchmarking on real i386 hardware.



* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than on 3.4.4
From: mmitchel at gcc dot gnu dot org @ 2006-01-15 22:13 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from mmitchel at gcc dot gnu dot org  2006-01-15 22:13 -------
We're generating correct code, so I've marked this as P2, rather than P1.


-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2



* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than on 3.4.4
From: mmitchel at gcc dot gnu dot org @ 2006-02-24  0:31 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from mmitchel at gcc dot gnu dot org  2006-02-24 00:26 -------
This issue will not be resolved in GCC 4.1.0; retargeted at GCC 4.1.1.


-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.0.3                       |4.1.1



* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: mmitchel at gcc dot gnu dot org @ 2006-05-25  2:38 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from mmitchel at gcc dot gnu dot org  2006-05-25 02:33 -------
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.


-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.1.1                       |4.1.2



* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: pinskia at gcc dot gnu dot org @ 2006-07-05  9:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from pinskia at gcc dot gnu dot org  2006-07-05 09:50 -------
struct FF {
  __m128 d;
.....
}

Mine. I have a patch for this; I cannot believe I had not found this before.
The patch has been tested a bit, at least in the local tree I have been
playing around with.
SRA should use element-based copy for that struct because it has only one
element.
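
To make the distinction concrete, here is a small sketch of the two copy
strategies for a record with a single vector field. It assumes the GCC/Clang
vector_size extension that the testcase also uses. The names Wrapper,
copy_as_block, copy_by_element and same_bytes are hypothetical. This is
source-level C++ for illustration only, not code from tree-sra.c.

```cpp
#include <cassert>
#include <cstring>

// GCC/Clang generic vector extension: four floats in one 16-byte value.
typedef float v4sf __attribute__((vector_size(16)));

// A record whose only field is a vector, mirroring `struct FF`.
struct Wrapper {
  v4sf d;
};

static_assert(sizeof(Wrapper) == sizeof(v4sf), "wrapper adds no padding");

// Block copy: treats the record as opaque bytes. A compiler may lower this
// to a memory-to-memory copy.
inline void copy_as_block(Wrapper* dst, const Wrapper* src) {
  assert(dst != nullptr && src != nullptr);
  std::memcpy(dst, src, sizeof(Wrapper));
}

// Element copy: copies the single field as one vector value, which lets the
// value travel through a vector register.
inline void copy_by_element(Wrapper* dst, const Wrapper* src) {
  assert(dst != nullptr && src != nullptr);
  dst->d = src->d;
}

// Both strategies must leave the destination with identical bytes.
inline bool same_bytes(const Wrapper& a, const Wrapper& b) {
  return std::memcmp(&a, &b, sizeof(Wrapper)) == 0;
}
```

Since the record has exactly one field and no padding, the two copies are
interchangeable in effect, so choosing the element copy changes only the
generated code, not the result.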


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |pinskia at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED



* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: bonzini at gnu dot org @ 2006-08-07  7:55 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from bonzini at gnu dot org  2006-08-07 07:54 -------
One element, but with some additional complication because it is a vector.



* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: bonzini at gnu dot org @ 2006-08-07  7:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #19 from bonzini at gnu dot org  2006-08-07 07:59 -------
This patchlet makes GCC use element-copy for struct FF:

Index: expr.c
===================================================================
--- expr.c      (revision 115990)
+++ expr.c      (working copy)
@@ -4763,7 +4763,7 @@ count_type_elements (tree type, bool all
       return 2;

     case VECTOR_TYPE:
-      return TYPE_VECTOR_SUBPARTS (type);
+      return TYPE_MODE (type) == BLKmode ? TYPE_VECTOR_SUBPARTS (type) : 1;

     case INTEGER_TYPE:
     case REAL_TYPE:
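
The effect of the one-line change can be modelled outside GCC. The sketch
below is a toy: Mode, VectorType, count_elements_before and
count_elements_after are hypothetical stand-ins for TYPE_MODE, the tree type
and count_type_elements, and they model only the vector case touched by the
diff.

```cpp
#include <cassert>

// Toy stand-ins for GCC internals. Blk models BLKmode (no machine mode holds
// the whole value); Vector models any real vector mode.
enum class Mode { Blk, Vector };

struct VectorType {
  int subparts;  // number of lanes, e.g. 4 for four floats
  Mode mode;     // Vector when the target has a mode for the whole type
};

// Before the patchlet: a vector always counts as `subparts` elements.
inline int count_elements_before(const VectorType& t) {
  assert(t.subparts > 0);
  return t.subparts;
}

// After the patchlet: a vector that has a machine mode counts as a single
// element; only BLKmode vectors are still counted lane by lane.
inline int count_elements_after(const VectorType& t) {
  assert(t.subparts > 0);
  return t.mode == Mode::Blk ? t.subparts : 1;
}
```

With a four-lane vector that has a machine mode, the count drops from 4 to 1,
which is what lets struct FF be treated as a one-element record.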



* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: pinskia at gcc dot gnu dot org @ 2006-08-07 15:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #20 from pinskia at gcc dot gnu dot org  2006-08-07 15:35 -------
(In reply to comment #19)
> This patchlet makes GCC use element-copy for struct FF:

You have to be careful when editing count_type_elements so that the elements
of a constructor that are not explicit are still zeroed.
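
The language-level rule behind this caution can be checked with an ordinary
aggregate: elements omitted from an initializer must come out as zero. The
sketch below uses a plain array member rather than a vector type. The names
Quad, make_partial and lane_at are hypothetical, and nothing here reflects how
GCC uses count_type_elements internally.

```cpp
#include <cassert>

// Four lanes stored in a plain array, so only standard C++ rules apply.
struct Quad {
  float lane[4];
};

// Only the first lane is given explicitly. Aggregate initialization zeroes
// the remaining three, and the compiler has to preserve that.
inline Quad make_partial(float first) {
  Quad q = { { first } };
  return q;
}

// Bounds-checked lane access.
inline float lane_at(const Quad& q, int i) {
  assert(i >= 0 && i < 4);
  return q.lane[i];
}
```

If a partially specified constructor were mistaken for a complete one, the
implicit lanes could be left uninitialized instead of cleared.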



* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: bonzini at gnu dot org @ 2006-08-17  8:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #21 from bonzini at gnu dot org  2006-08-17 08:16 -------
I'll see if I can construct a case where my patch fails (actually a newer one)



* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: bonzini at gnu dot org @ 2006-08-18 16:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #22 from bonzini at gnu dot org  2006-08-18 16:16 -------
patch withdrawn, I'll wait for pinskia's


-- 

bonzini at gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|http://gcc.gnu.org/ml/gcc-  |
                   |patches/2006-               |
                   |08/msg00171.html            |
           Keywords|patch                       |



* [Bug target/25500] [4.0/4.1/4.2/4.3 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: pinskia at gcc dot gnu dot org @ 2006-11-12  8:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #23 from pinskia at gcc dot gnu dot org  2006-11-12 08:07 -------
I should be posting a patch for this next week.



* [Bug target/25500] [4.0/4.1/4.2/4.3 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: pinskia at gcc dot gnu dot org @ 2006-11-15  0:38 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #24 from pinskia at gcc dot gnu dot org  2006-11-15 00:38 -------
Patch submitted:
http://gcc.gnu.org/ml/gcc-patches/2006-11/msg01005.html


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |http://gcc.gnu.org/ml/gcc-
                   |                            |patches/2006-
                   |                            |11/msg01005.html
           Keywords|                            |patch


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25500



* [Bug target/25500] [4.0/4.1/4.2/4.3 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
  2005-12-20  5:25 [Bug c/25500] New: REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4 yuri at tsoft dot com
                   ` (22 preceding siblings ...)
  2006-11-15  0:38 ` pinskia at gcc dot gnu dot org
@ 2006-11-20 20:29 ` pinskia at gcc dot gnu dot org
  2006-11-20 20:29 ` pinskia at gcc dot gnu dot org
  24 siblings, 0 replies; 26+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-11-20 20:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #25 from pinskia at gcc dot gnu dot org  2006-11-20 20:29 -------
Subject: Bug 25500

Author: pinskia
Date: Mon Nov 20 20:29:10 2006
New Revision: 119026

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=119026
Log:
2006-11-20  Andrew Pinski  <andrew_pinski@playstation.sony.com>

        PR tree-opt/25500
        * tree-sra.c (single_scalar_field_in_record_p): New function.
        (decide_block_copy): Use it.

2006-11-20  Andrew Pinski  <andrew_pinski@playstation.sony.com>

        PR tree-opt/25500
        * gcc.dg/tree-ssa/sra-4.c: New testcase.



Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/sra-4.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-sra.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25500



* [Bug target/25500] [4.0/4.1/4.2/4.3 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
  2005-12-20  5:25 [Bug c/25500] New: REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4 yuri at tsoft dot com
                   ` (23 preceding siblings ...)
  2006-11-20 20:29 ` pinskia at gcc dot gnu dot org
@ 2006-11-20 20:29 ` pinskia at gcc dot gnu dot org
  24 siblings, 0 replies; 26+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-11-20 20:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #26 from pinskia at gcc dot gnu dot org  2006-11-20 20:29 -------
Fixed for 4.3.0 and above.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|4.1.2                       |4.3.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25500



Thread overview: 26+ messages
2005-12-20  5:25 [Bug c/25500] New: REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4 yuri at tsoft dot com
2005-12-20  5:34 ` [Bug c/25500] " yuri at tsoft dot com
2005-12-20  5:55 ` [Bug target/25500] " pinskia at gcc dot gnu dot org
2005-12-20  6:01 ` yuri at tsoft dot com
2005-12-20  6:03 ` yuri at tsoft dot com
2005-12-20  6:19 ` yuri at tsoft dot com
2005-12-20  6:33 ` pinskia at gcc dot gnu dot org
2005-12-20  6:36 ` pinskia at gcc dot gnu dot org
2005-12-20  6:51 ` yuri at tsoft dot com
2005-12-20  6:55 ` pinskia at gcc dot gnu dot org
2005-12-20  7:40 ` yuri at tsoft dot com
2005-12-25  1:02 ` [Bug target/25500] [4.0/4.1/4.2 Regression]: " pinskia at gcc dot gnu dot org
2005-12-28 16:53 ` jakub at gcc dot gnu dot org
2006-01-15 22:13 ` [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is " mmitchel at gcc dot gnu dot org
2006-02-24  0:31 ` mmitchel at gcc dot gnu dot org
2006-05-25  2:38 ` [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than previous mmitchel at gcc dot gnu dot org
2006-07-05  9:50 ` pinskia at gcc dot gnu dot org
2006-08-07  7:55 ` bonzini at gnu dot org
2006-08-07  7:59 ` bonzini at gnu dot org
2006-08-07 15:36 ` pinskia at gcc dot gnu dot org
2006-08-17  8:16 ` bonzini at gnu dot org
2006-08-18 16:16 ` bonzini at gnu dot org
2006-11-12  8:07 ` [Bug target/25500] [4.0/4.1/4.2/4.3 " pinskia at gcc dot gnu dot org
2006-11-15  0:38 ` pinskia at gcc dot gnu dot org
2006-11-20 20:29 ` pinskia at gcc dot gnu dot org
2006-11-20 20:29 ` pinskia at gcc dot gnu dot org
