public inbox for gcc-bugs@sourceware.org
* [Bug c/25500] New: REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: yuri at tsoft dot com @ 2005-12-20 5:25 UTC (permalink / raw)
To: gcc-bugs
The following testcase, compiled with 'g++ -O3 -msse3 -o testcase testcase.C',
finishes in 0m0.277s with gcc-3.4.4 and in 0m44.843s with gcc-4.0.2
(similar on all 4.x.x).
Yuri
-----------------------------------------------------------------------
typedef float __v2df __attribute__ ((__vector_size__ (16)));
typedef __v2df __m128;

static __inline __m128 _mm_sub_pd (__m128 __A, __m128 __B)
{ return (__m128)__builtin_ia32_subps ((__v2df)__A, (__v2df)__B); }
static __inline __m128 _mm_add_pd (__m128 __A, __m128 __B)
{ return (__m128)__builtin_ia32_addps ((__v2df)__A, (__v2df)__B); }
static __inline __m128 _mm_setr_ps (float __Z, float __Y, float __X, float __W)
{ return __extension__ (__m128)(__v2df){ __Z, __Y, __X, __W }; }

struct FF {
  __m128 d;
  __inline FF() { }
  __inline FF(__m128 new_d) : d(new_d) { }
  __inline FF(float f) : d(_mm_setr_ps(f, f, f, f)) { }
  __inline FF operator+(FF other) { return (FF(_mm_add_pd(d,other.d))); }
  __inline FF operator-(FF other) { return (FF(_mm_sub_pd(d,other.d))); }
};

float f[1024*1024];

int main() {
  int i;
  for (i = 0; i < 1024*1024; i++) { f[i] = 1.f/(1024*1024 + 10 - i); }
  FF total(0.f);
  for (int rpt = 0; rpt < 1000; rpt++) {
    FF p1(0.f), p2(0.), c;
    __m128 *pf = (__m128*)f;
    for (i = 0; i < 1024*1024/4; i++) {
      FF c(*pf++);
      total = total + c - p2 + p1;
      p1 = p2;
      p2 = c;
    }
  }
}
--
Summary: REGREGRESSION: SSE2 vectorized code is many times slower
on 4.x.x than on 3.4.4
Product: gcc
Version: 4.0.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: yuri at tsoft dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25500
* [Bug c/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: yuri at tsoft dot com @ 2005-12-20 5:34 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from yuri at tsoft dot com 2005-12-20 05:34 -------
Actually, the testcase above is defective: its result is not used.
But the runtimes are very different in any case:
44.9s on 4.x.x vs. 0m2.371s on 3.4.4
--- begin corrected testcase -----------------------------------
#include <iostream>
using namespace std;

typedef float __v2df __attribute__ ((__vector_size__ (16)));
typedef __v2df __m128;

static __inline __m128 _mm_sub_pd (__m128 __A, __m128 __B)
{ return (__m128)__builtin_ia32_subps ((__v2df)__A, (__v2df)__B); }
static __inline __m128 _mm_add_pd (__m128 __A, __m128 __B)
{ return (__m128)__builtin_ia32_addps ((__v2df)__A, (__v2df)__B); }
static __inline __m128 _mm_setr_ps (float __Z, float __Y, float __X, float __W)
{ return __extension__ (__m128)(__v2df){ __Z, __Y, __X, __W }; }

struct FF {
  __m128 d;
  __inline FF() { }
  __inline FF(__m128 new_d) : d(new_d) { }
  __inline FF(float f) : d(_mm_setr_ps(f, f, f, f)) { }
  __inline FF operator+(FF other) { return (FF(_mm_add_pd(d,other.d))); }
  __inline FF operator-(FF other) { return (FF(_mm_sub_pd(d,other.d))); }
};

float f[1024*1024];

union U {
  __m128 m;
  float f[4];
};

int main() {
  int i;
  FF gtotal(0.f);
  for (i = 0; i < 1024*1024; i++) { f[i] = 1.f/(1024*1024 + 10 - i); }
  FF total(0.f);
  for (int rpt = 0; rpt < 1000; rpt++) {
    FF p1(0.f), p2(0.), c;
    __m128 *pf = (__m128*)f;
    for (i = 0; i < 1024*1024/4; i++) {
      FF c(*pf++);
      total = total + c - p2 + p1;
      p1 = p2;
      p2 = c;
    }
    gtotal = gtotal + total;
  }
  U u;
  u.m = gtotal.d;
  cout << (u.f[0]) << endl;
}
--- end corrected testcase -------------------------------------
* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: pinskia at gcc dot gnu dot org @ 2005-12-20 5:55 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from pinskia at gcc dot gnu dot org 2005-12-20 05:55 -------
I cannot reproduce this on an Athlon 64 running in either 32 or 64 bit mode.
Everything I tried shows that 4.x is actually faster than 3.4.4.
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|c |target
Summary|REGRESSION: SSE2 vectorized |REGREGRESSION: SSE2
|code is many times slower on|vectorized code is many
|4.x.x than on 3.4.4 |times slower on 4.x.x than
| |on 3.4.4
* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: yuri at tsoft dot com @ 2005-12-20 6:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from yuri at tsoft dot com 2005-12-20 06:01 -------
Subject: Re: REGREGRESSION: SSE2 vectorized code is many
times slower on 4.x.x than on 3.4.4
I run on an Athlon64-3200 in i386-compatible mode.
Strange.
I had the problem with gcc-4.0.1; yesterday I compiled gcc-4.0.2 and got the
same thing.
Yuri
pinskia at gcc dot gnu dot org wrote:
>------- Comment #2 from pinskia at gcc dot gnu dot org 2005-12-20 05:55 -------
>I cannot reproduce this on an Athlon 64 running in either 32 or 64 bit mode.
>
>Everything I tried shows that 4.x is actually faster than 3.4.4.
* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: yuri at tsoft dot com @ 2005-12-20 6:03 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from yuri at tsoft dot com 2005-12-20 06:03 -------
Subject: Re: REGREGRESSION: SSE2 vectorized code is many
times slower on 4.x.x than on 3.4.4
Also, I use FreeBSD-6.0, if that can even make a difference.
pinskia at gcc dot gnu dot org wrote:
>------- Comment #2 from pinskia at gcc dot gnu dot org 2005-12-20 05:55 -------
>I cannot reproduce this on an Athlon 64 running in either 32 or 64 bit mode.
>
>Everything I tried shows that 4.x is actually faster than 3.4.4.
* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: yuri at tsoft dot com @ 2005-12-20 6:19 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from yuri at tsoft dot com 2005-12-20 06:19 -------
Subject: Re: REGREGRESSION: SSE2 vectorized code is many
times slower on 4.x.x than on 3.4.4
Here's an attachment with the assembly generated in both cases.
testcase-old.s is from 3.4.4 and testcase-new.s is from 4.0.2.
In testcase-new.s the SSE2 code is diluted with plain i386 assembly, notably
'rep movsl', which never occurs in the 3.4.4 output.
Yuri
------- Comment #6 from yuri at tsoft dot com 2005-12-20 06:19 -------
Created an attachment (id=10534)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10534&action=view)
* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: pinskia at gcc dot gnu dot org @ 2005-12-20 6:33 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from pinskia at gcc dot gnu dot org 2005-12-20 06:33 -------
I don't get
  rep
  movsl
at all on GNU/Linux; I am building a cross compiler to FreeBSD right now.
* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: pinskia at gcc dot gnu dot org @ 2005-12-20 6:36 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from pinskia at gcc dot gnu dot org 2005-12-20 06:36 -------
Can you show the output of "gcc -v" for the 3.4 compiler and the 4.0
compiler?
This looks like just a difference in the arch used by default.
A compiler configured for i686 by default gives good code, but code compiled
for i386 is bad.
* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: yuri at tsoft dot com @ 2005-12-20 6:51 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from yuri at tsoft dot com 2005-12-20 06:51 -------
Subject: Re: REGREGRESSION: SSE2 vectorized code is many
times slower on 4.x.x than on 3.4.4
-----------------------------------
Using built-in specs.
Configured with: FreeBSD/i386 system compiler
Thread model: posix
gcc version 3.4.4 [FreeBSD] 20050518
-----------------------------------
g++ -v (4.0.2)
Using built-in specs.
Target: i386-unknown-freebsd6.0
Configured with: ../gcc-4.0.2/configure --prefix=/usr/local/gcc-4.0.2
Thread model: posix
gcc version 4.0.2
pinskia at gcc dot gnu dot org wrote:
>------- Comment #8 from pinskia at gcc dot gnu dot org 2005-12-20 06:36 -------
>Can you show what the output of "gcc -v" for the 3.4 compiler and the 4.0
>compiler?
>
>This looks like just a different using arch by default.
>
>a Compiler compiled for i686 by default gives the good code but code compiled
>for i386 give bad code.
* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: pinskia at gcc dot gnu dot org @ 2005-12-20 6:55 UTC (permalink / raw)
To: gcc-bugs
------- Comment #10 from pinskia at gcc dot gnu dot org 2005-12-20 06:55 -------
Oh, I looked a little more and yes, it depends on the arch you are building
for, but only for 4.x.
Since you are using SSE, you should also add -march=i686 or -march=k8 so that
the code is also tuned for the processor you are using.
Anyway, the problem with i386 with 4.0 is really just PR 14295.
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
BugsThisDependsOn| |14295
* [Bug target/25500] REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: yuri at tsoft dot com @ 2005-12-20 7:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #11 from yuri at tsoft dot com 2005-12-20 07:40 -------
Subject: Re: REGREGRESSION: SSE2 vectorized code is many
times slower on 4.x.x than on 3.4.4
Now the huge runtime difference has disappeared,
but the 4.0.2-generated code is still always roughly 20% or more slower.
It contains many memory accesses that are not needed at all and that did not
exist with 3.4.4.
I tried -march=i686 and -march=k8; both are slower than 3.4.4.
Do I also have to recompile gcc with some special options?
Yuri
pinskia at gcc dot gnu dot org wrote:
>------- Comment #10 from pinskia at gcc dot gnu dot org 2005-12-20 06:55 -------
>Oh, I looked a little more and yes it depends on the arch you are building for
>but only for 4.x.
>
>Since you are using SSE, you should add also -march=i686 or -march=k8 so that
>the code is also tuned for the processor you are using.
>
>Anyways the problem with i386 with 4.0 is really just PR 14295.
* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: pinskia at gcc dot gnu dot org @ 2005-12-25 1:02 UTC (permalink / raw)
To: gcc-bugs
------- Comment #12 from pinskia at gcc dot gnu dot org 2005-12-25 01:02 -------
Confirmed, it really only affects i386/i486 code (maybe i586 also, but I did
not try that).
The only thing I can think of is to change MOVE_COST for those subtargets or
just have PR 14295 fixed.
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
GCC target triplet| |i386-*-*
Keywords| |missed-optimization, ssemmx
Known to fail| |4.0.0 4.1.0 4.2.0
Known to work| |3.4.0
Last reconfirmed|0000-00-00 00:00:00 |2005-12-25 01:02:34
date| |
Summary|REGREGRESSION: SSE2 |[4.0/4.1/4.2 Regression]:
|vectorized code is many |SSE2 vectorized code is many
|times slower on 4.x.x than |times slower on 4.x.x than
|on 3.4.4 |on 3.4.4
Target Milestone|--- |4.0.3
* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4
From: jakub at gcc dot gnu dot org @ 2005-12-28 16:53 UTC (permalink / raw)
To: gcc-bugs
------- Comment #13 from jakub at gcc dot gnu dot org 2005-12-28 16:53 -------
Benchmarking -mtune=i386 tuned code on Athlon64 is simply a bad idea.
Either you need to tune for your CPU (or at least some contemporary one
like -mtune=pentium4 if you want to run quickly on a wider range of CPUs),
or you should be benchmarking on real i386 hardware.
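
To check which tuning a given build actually used, one option is to look at the macros the compiler predefines. The sketch below assumes GCC's x86 `__tune_*__` macros and `__SSE2__`; the function names `tuning_description`, `sse2_enabled` and `print_target_info` are made up for illustration, and every macro is only tested with `#if defined`, so the code still builds where none of them exist.

```cpp
#include <cstdio>

// Describe the x86 tuning target the compiler predefined for this
// translation unit. The macro names are the ones GCC uses on x86;
// other compilers or targets may define none of them.
const char *tuning_description() {
#if defined(__tune_k8__)
    return "tuned for k8";
#elif defined(__tune_pentium4__)
    return "tuned for pentium4";
#elif defined(__tune_i686__) || defined(__tune_pentiumpro__)
    return "tuned for i686";
#elif defined(__tune_i386__)
    return "tuned for i386";
#else
    return "no recognized x86 tuning macro";
#endif
}

// True when the compiler was allowed to emit SSE2 instructions.
bool sse2_enabled() {
#if defined(__SSE2__)
    return true;
#else
    return false;
#endif
}

// Print one line summarizing the tuning target and SSE2 availability.
void print_target_info() {
    std::printf("%s, SSE2 %s\n", tuning_description(),
                sse2_enabled() ? "enabled" : "disabled");
}
```

Calling `print_target_info()` from a build made with and without `-march=k8` (or `-mtune=...`) should show whether the flag changed the tuning target; the exact text depends on the compiler and its defaults.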
* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than on 3.4.4
From: mmitchel at gcc dot gnu dot org @ 2006-01-15 22:13 UTC (permalink / raw)
To: gcc-bugs
------- Comment #14 from mmitchel at gcc dot gnu dot org 2006-01-15 22:13 -------
We're generating correct code, so I've marked this as P2, rather than P1.
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P2
* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than on 3.4.4
From: mmitchel at gcc dot gnu dot org @ 2006-02-24 0:31 UTC (permalink / raw)
To: gcc-bugs
------- Comment #15 from mmitchel at gcc dot gnu dot org 2006-02-24 00:26 -------
This issue will not be resolved in GCC 4.1.0; retargeted at GCC 4.1.1.
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.0.3 |4.1.1
* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: mmitchel at gcc dot gnu dot org @ 2006-05-25 2:38 UTC (permalink / raw)
To: gcc-bugs
------- Comment #16 from mmitchel at gcc dot gnu dot org 2006-05-25 02:33 -------
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.1.1 |4.1.2
* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: pinskia at gcc dot gnu dot org @ 2006-07-05 9:50 UTC (permalink / raw)
To: gcc-bugs
------- Comment #17 from pinskia at gcc dot gnu dot org 2006-07-05 09:50 -------
struct FF {
__m128 d;
.....
}
Mine. I have a patch for this; I cannot believe I did not find this before.
The patch has been tested a bit, at least in the local tree I have been
playing around with.
SRA should use element-based copy for that struct because it has only one
element.
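
The point about a one-element struct can be illustrated at the source level. This is only a sketch of the two copy strategies, not of what SRA does internally; it assumes the GCC/Clang `vector_size` extension, and `Wrapped`, `copy_as_block`, `copy_by_element` and `copies_agree` are hypothetical names.

```cpp
#include <cstring>

// Generic GCC/Clang vector extension: four floats in one 16-byte value.
typedef float v4sf __attribute__((vector_size(16)));

// A record with exactly one field, shaped like struct FF in the testcase.
struct Wrapped {
    v4sf d;
};

// The wrapper adds no padding, so it is exactly as large as its field.
static_assert(sizeof(Wrapped) == sizeof(v4sf), "wrapper adds no padding");

// Whole-object copy: the form that may end up as a block move.
Wrapped copy_as_block(const Wrapped &src) {
    Wrapped dst;
    std::memcpy(&dst, &src, sizeof dst);
    return dst;
}

// Element-wise copy: a single vector assignment of the only field.
Wrapped copy_by_element(const Wrapped &src) {
    Wrapped dst;
    dst.d = src.d;
    return dst;
}

// Returns true when both strategies produce identical bytes.
bool copies_agree(const Wrapped &w) {
    Wrapped a = copy_as_block(w);
    Wrapped b = copy_by_element(w);
    return std::memcmp(&a, &b, sizeof a) == 0;
}
```

Because the struct and its field occupy the same bytes, the two copies are interchangeable, which is why copying the single element is a safe replacement for a block copy here.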
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|unassigned at gcc dot gnu |pinskia at gcc dot gnu dot
|dot org |org
Status|NEW |ASSIGNED
* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: bonzini at gnu dot org @ 2006-08-07 7:55 UTC (permalink / raw)
To: gcc-bugs
------- Comment #18 from bonzini at gnu dot org 2006-08-07 07:54 -------
One element, but with some additional complication because it is a vector.
* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: bonzini at gnu dot org @ 2006-08-07 7:59 UTC (permalink / raw)
To: gcc-bugs
------- Comment #19 from bonzini at gnu dot org 2006-08-07 07:59 -------
This patchlet makes GCC use element-copy for struct FF:
Index: expr.c
===================================================================
--- expr.c (revision 115990)
+++ expr.c (working copy)
@@ -4763,7 +4763,7 @@ count_type_elements (tree type, bool all
       return 2;
 
     case VECTOR_TYPE:
-      return TYPE_VECTOR_SUBPARTS (type);
+      return TYPE_MODE (type) == BLKmode ? TYPE_VECTOR_SUBPARTS (type) : 1;
 
     case INTEGER_TYPE:
     case REAL_TYPE:
* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: pinskia at gcc dot gnu dot org @ 2006-08-07 15:36 UTC (permalink / raw)
To: gcc-bugs
------- Comment #20 from pinskia at gcc dot gnu dot org 2006-08-07 15:35 -------
(In reply to comment #19)
> This patchlet makes GCC use element-copy for struct FF:
You have to be careful when editing count_type_elements so that the elements
of a constructor that are not explicit are still zeroed.
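
The source-level behaviour that has to be preserved is the zero-filling of elements an initializer leaves out. A minimal sketch, assuming the GCC/Clang `vector_size` extension treats a short brace list like an aggregate initializer; `partial_vector_lanes`, `Quad` and `partial_struct` are hypothetical names.

```cpp
#include <cstring>

typedef float v4sf __attribute__((vector_size(16)));

// Initialize only the first two lanes explicitly, then copy all four lanes
// into a plain array so they can be inspected without vector subscripting.
void partial_vector_lanes(float out[4]) {
    v4sf v = {1.0f, 2.0f};   // lanes 2 and 3 are not written explicitly
    std::memcpy(out, &v, sizeof v);
}

// The same rule for an ordinary aggregate, where the C++ standard
// guarantees that omitted members are zero-initialized.
struct Quad {
    float a, b, c, d;
};

Quad partial_struct() {
    Quad q = {1.0f, 2.0f};   // c and d are not written explicitly
    return q;
}
```

If the element count reported for a type is too small, a partial initializer like these could be mistaken for a complete one and the zeroing of the remaining elements skipped, which appears to be the risk the comment above is pointing at.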
* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: bonzini at gnu dot org @ 2006-08-17 8:16 UTC (permalink / raw)
To: gcc-bugs
------- Comment #21 from bonzini at gnu dot org 2006-08-17 08:16 -------
I'll see if I can construct a case where my patch fails (actually a newer one)
* [Bug target/25500] [4.0/4.1/4.2 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: bonzini at gnu dot org @ 2006-08-18 16:16 UTC (permalink / raw)
To: gcc-bugs
------- Comment #22 from bonzini at gnu dot org 2006-08-18 16:16 -------
patch withdrawn, I'll wait for pinskia's
--
bonzini at gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
URL|http://gcc.gnu.org/ml/gcc- |
|patches/2006- |
|08/msg00171.html |
Keywords|patch |
* [Bug target/25500] [4.0/4.1/4.2/4.3 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: pinskia at gcc dot gnu dot org @ 2006-11-12 8:07 UTC (permalink / raw)
To: gcc-bugs
------- Comment #23 from pinskia at gcc dot gnu dot org 2006-11-12 08:07 -------
I should be posting a patch for this next week.
* [Bug target/25500] [4.0/4.1/4.2/4.3 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: pinskia at gcc dot gnu dot org @ 2006-11-15 0:38 UTC (permalink / raw)
To: gcc-bugs
------- Comment #24 from pinskia at gcc dot gnu dot org 2006-11-15 00:38 -------
Patch submitted:
http://gcc.gnu.org/ml/gcc-patches/2006-11/msg01005.html
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
URL| |http://gcc.gnu.org/ml/gcc-
| |patches/2006-
| |11/msg01005.html
Keywords| |patch
* [Bug target/25500] [4.0/4.1/4.2/4.3 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: pinskia at gcc dot gnu dot org @ 2006-11-20 20:29 UTC (permalink / raw)
To: gcc-bugs
------- Comment #25 from pinskia at gcc dot gnu dot org 2006-11-20 20:29 -------
Subject: Bug 25500
Author: pinskia
Date: Mon Nov 20 20:29:10 2006
New Revision: 119026
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=119026
Log:
2006-11-20 Andrew Pinski <andrew_pinski@playstation.sony.com>
PR tree-opt/25500
* tree-sra.c (single_scalar_field_in_record_p): New function.
(decide_block_copy): Use it.
2006-11-20 Andrew Pinski <andrew_pinski@playstation.sony.com>
PR tree-opt/25500
* gcc.dg/tree-ssa/sra-4.c: New testcase.
Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/sra-4.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-sra.c
* [Bug target/25500] [4.0/4.1/4.2/4.3 Regression]: SSE2 vectorized code is slower on 4.x.x than previous
From: pinskia at gcc dot gnu dot org @ 2006-11-20 20:29 UTC (permalink / raw)
To: gcc-bugs
------- Comment #26 from pinskia at gcc dot gnu dot org 2006-11-20 20:29 -------
Fixed for 4.3.0 and above.
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
Target Milestone|4.1.2 |4.3.0