* Re: Bad performance (regression) on oopack's Complex test
@ 2003-01-27 22:02 Paolo Carlini
From: Paolo Carlini @ 2003-01-27 22:02 UTC
To: Jan Hubicka; +Cc: gcc List
> What effect does the -mfpmath=sse switch have?
It helps! But, as expected, mostly for the C test rather than its OOP
counterpart, so the Ratio becomes even bigger ;)
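3.2.1
-----
                      Seconds        Mflops
Test    Iterations      C   OOP        C     OOP  Ratio
----    ----------  -----------  ---------------  -----
-O2
Complex     200000    1.6  21.4    981.6    74.8   13.1
-O2 -march=pentium4 -mfpmath=sse
Complex     200000    1.0  19.1   1616.2    84.0   19.2
3.4
---
                      Seconds        Mflops
Test    Iterations      C   OOP        C     OOP  Ratio
----    ----------  -----------  ---------------  -----
-O2
Complex     200000    1.6  29.2    993.8    54.8   18.1
-O2 -march=pentium4 -mfpmath=sse
Complex     200000    1.0  27.0   1649.5    59.3   27.8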
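The Mflops and Ratio columns can be cross-checked against the times. A minimal C++ sketch, assuming (from the dumps, not from the oopack source) that one Complex-test iteration updates 1000 elements (the loops end in cmp $0x3e7 / jle) at 8 flops per element (4 multiplies plus 4 additions/subtractions); the helper names complex_mflops, oopack_ratio and round1 are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Assumptions inferred from the dumps, not taken from the oopack source:
// one iteration updates 1000 elements (cmp $0x3e7 / jle, i.e. k <= 999),
// at 8 flops per element (4 multiplies plus 4 additions/subtractions).
constexpr double kElementsPerIteration = 1000.0;
constexpr double kFlopsPerElement = 8.0;

// Mflops for `iterations` passes over the arrays taking `seconds` in total.
double complex_mflops(double iterations, double seconds) {
    assert(seconds > 0.0);
    return iterations * kElementsPerIteration * kFlopsPerElement / seconds / 1e6;
}

// Ratio column: C Mflops over OOP Mflops; 1.0 means no abstraction penalty.
double oopack_ratio(double c_mflops, double oop_mflops) {
    assert(oop_mflops > 0.0);
    return c_mflops / oop_mflops;
}

// Round to one decimal place, as the tables print their values.
double round1(double v) { return std::round(v * 10.0) / 10.0; }
```

For example, complex_mflops(200000, 21.4) comes out at about 74.8 and oopack_ratio(1616.2, 84.0) at about 19.2, matching the rows above. Recomputing the Ratio from the rounded times instead (21.4 / 1.6, about 13.4) does not reproduce the printed 13.1, since the times are shown to one decimal only.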
Thanks,
Paolo
* Re: Bad performance (regression) on oopack's Complex test
2003-01-27 16:55 ` Paolo Carlini
@ 2003-01-27 17:00 ` Jan Hubicka
From: Jan Hubicka @ 2003-01-27 17:00 UTC
To: Paolo Carlini; +Cc: Falk Hueffner, gcc
> Falk Hueffner wrote:
>
> >Have you tried -O3, or generally tuning inlining?
> >
> I have tried both, with no positive results (the inlined operators are
> very small).
>
> >BTW, on Alpha, I can see an improvement:
> >
> Nice. In fact, on an older PII machine I was seeing Ratios similar to yours.
> I think there is a general issue plus a P4-specific one
> (slow memory accesses?).
I will try to look into this more closely soon, as -march=pentium4 is my
work after all. It needs a lot more effort than I was able to put into
it; for instance, scheduling is entirely missing.
What effect does the -mfpmath=sse switch have?
Honza
>
> >Just for fun, here's the result of cxx -O5 -fast:
> >
> I know :-( ... this is the final goal, which icc already achieves on any
> x86 CPU.
>
> Paolo.
* Re: Bad performance (regression) on oopack's Complex test
2003-01-27 16:26 ` Falk Hueffner
@ 2003-01-27 16:55 ` Paolo Carlini
2003-01-27 17:00 ` Jan Hubicka
From: Paolo Carlini @ 2003-01-27 16:55 UTC
To: Falk Hueffner; +Cc: gcc
Falk Hueffner wrote:
>Have you tried -O3, or generally tuning inlining?
>
I have tried both, with no positive results (the inlined operators are
very small).
>BTW, on Alpha, I can see an improvement:
>
Nice. In fact, on an older PII machine I was seeing Ratios similar to yours.
I think there is a general issue plus a P4-specific one
(slow memory accesses?).
>Just for fun, here's the result of cxx -O5 -fast:
>
I know :-( ... this is the final goal, which icc already achieves on any
x86 CPU.
Paolo.
* Re: Bad performance (regression) on oopack's Complex test
2003-01-27 15:46 Paolo Carlini
@ 2003-01-27 16:26 ` Falk Hueffner
2003-01-27 16:55 ` Paolo Carlini
From: Falk Hueffner @ 2003-01-27 16:26 UTC
To: Paolo Carlini; +Cc: gcc, Jan Hubicka
Paolo Carlini <pcarlini@unitus.it> writes:
> Hi everyone,
>
> I'm noticing bad performance on this test. Moreover, the numbers are
> worse for 3.4 than for 3.2.1.
>
> On my P4-2400, with 200000 iterations at -O2 (ideally, the ratio should
> be 1) :(
Have you tried -O3, or generally tuning inlining?
BTW, on Alpha, I can see an improvement:
g++ 3.2 -O3
                      Seconds        Mflops
Test    Iterations      C   OOP        C     OOP  Ratio
----    ----------  -----------  ---------------  -----
Complex     200000    2.4  17.7    677.1    90.3    7.5
g++ 3.4 20030116 -O3
                      Seconds        Mflops
Test    Iterations      C   OOP        C     OOP  Ratio
----    ----------  -----------  ---------------  -----
Complex     200000    2.3  11.3    701.5   141.6    5.0
Just for fun, here's the result of cxx -O5 -fast:
                      Seconds        Mflops
Test    Iterations      C   OOP        C     OOP  Ratio
----    ----------  -----------  ---------------  -----
Max         100000    1.5   1.6     65.0    64.5    1.0
Matrix        2000    0.9   0.9    537.0   539.8    1.0
Complex     200000    1.6   1.6    999.6   988.7    1.0
Iterator    200000    0.5   0.5    729.2   729.2    1.0
--
Falk
* Bad performance (regression) on oopack's Complex test
@ 2003-01-27 15:46 Paolo Carlini
2003-01-27 16:26 ` Falk Hueffner
From: Paolo Carlini @ 2003-01-27 15:46 UTC
To: gcc; +Cc: Jan Hubicka
[-- Attachment #1: Type: text/plain, Size: 10413 bytes --]
Hi everyone,
I'm noticing bad performance on this test. Moreover, the
numbers are worse for 3.4 than for 3.2.1.
On my P4-2400, with 200000 iterations at -O2 (ideally, the ratio
should be 1) :(
3.2.1
-----
                      Seconds        Mflops
Test    Iterations      C   OOP        C     OOP  Ratio
----    ----------  -----------  ---------------  -----
Complex     200000    1.6  21.4    981.6    74.8   13.1
3.4
---
                      Seconds        Mflops
Test    Iterations      C   OOP        C     OOP  Ratio
----    ----------  -----------  ---------------  -----
Complex     200000    1.6  29.2    993.8    54.8   18.1
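For readers without the attachment, a minimal C++ sketch of the kind of loop the Complex test times. The function names mirror the symbols in the dumps (ComplexBenchmark::c_style / oop_style), but the Complex type, the free-function signatures and the operands are assumptions: the shape y[k] = y[k] + factor * x[k] is inferred from the disassembly, not copied from oopack_v1p8.C.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for oopack's Complex class; the real one in
// oopack_v1p8.C may differ in details.
struct Complex {
    double re, im;
    Complex(double r = 0.0, double i = 0.0) : re(r), im(i) {}
};

// Small inline operators, each returning a new Complex by value.
inline Complex operator+(const Complex& a, const Complex& b) {
    return Complex(a.re + b.re, a.im + b.im);
}

inline Complex operator*(const Complex& a, const Complex& b) {
    return Complex(a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re);
}

// "C style": the complex multiply-add written out on plain doubles,
// with the factor's parts kept in local variables.
void c_style(Complex* y, const Complex* x, std::size_t n, Complex factor) {
    assert(n == 0 || (y != nullptr && x != nullptr));
    const double f_re = factor.re;
    const double f_im = factor.im;
    for (std::size_t k = 0; k < n; ++k) {
        const double x_re = x[k].re;
        const double x_im = x[k].im;
        y[k].re += f_re * x_re - f_im * x_im;
        y[k].im += f_re * x_im + f_im * x_re;
    }
}

// "OOP style": the same update expressed with the operators above.
void oop_style(Complex* y, const Complex* x, std::size_t n, Complex factor) {
    assert(n == 0 || (y != nullptr && x != nullptr));
    for (std::size_t k = 0; k < n; ++k) {
        y[k] = y[k] + factor * x[k];
    }
}
```

Both functions do the same 8 flops per element in the same order, so they produce identical results; ideally a compiler would turn oop_style into the same machine code as c_style, which is what a Ratio of 1.0 would mean.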
Dumps of the relevant loops follow:
3.2.1 C
-------
000003d0 <ComplexBenchmark::c_style() const>:
3f0: 89 d0 mov %edx,%eax
3f2: d9 c1 fld %st(1)
3f4: c1 e0 04 shl $0x4,%eax
3f7: dd 80 00 00 00 00 fldl 0x0(%eax)
3fd: 42 inc %edx
3fe: d9 c2 fld %st(2)
400: dd 80 08 00 00 00 fldl 0x8(%eax)
406: d9 cb fxch %st(3)
408: d8 ca fmul %st(2),%st
40a: d9 c9 fxch %st(1)
40c: 81 fa e7 03 00 00 cmp $0x3e7,%edx
412: d8 cb fmul %st(3),%st
414: d9 cb fxch %st(3)
416: d8 cd fmul %st(5),%st
418: d9 ca fxch %st(2)
41a: d8 cc fmul %st(4),%st
41c: d9 c9 fxch %st(1)
41e: dc 80 00 00 00 00 faddl 0x0(%eax)
424: d9 ca fxch %st(2)
426: dc 80 08 00 00 00 faddl 0x8(%eax)
42c: d9 ca fxch %st(2)
42e: de e3 fsubp %st,%st(3)
430: de c1 faddp %st,%st(1)
432: d9 c9 fxch %st(1)
434: dd 98 00 00 00 00 fstpl 0x0(%eax)
43a: dd 98 08 00 00 00 fstpl 0x8(%eax)
440: 7e ae jle 3f0 <ComplexBenchmark::c_style() const+0x20>
3.2.1 OOP
---------
00000450 <ComplexBenchmark::oop_style() const>:
490: 89 7d 9c mov %edi,0xffffff9c(%ebp)
493: 89 ca mov %ecx,%edx
495: c1 e2 04 shl $0x4,%edx
498: 89 75 a0 mov %esi,0xffffffa0(%ebp)
49b: 8b 82 00 00 00 00 mov 0x0(%edx),%eax
4a1: 41 inc %ecx
4a2: 89 5d a4 mov %ebx,0xffffffa4(%ebp)
4a5: 81 f9 e7 03 00 00 cmp $0x3e7,%ecx
4ab: 89 45 b8 mov %eax,0xffffffb8(%ebp)
4ae: 8b 82 04 00 00 00 mov 0x4(%edx),%eax
4b4: dd 45 a0 fldl 0xffffffa0(%ebp)
4b7: 89 45 bc mov %eax,0xffffffbc(%ebp)
4ba: 8b 82 08 00 00 00 mov 0x8(%edx),%eax
4c0: d9 c0 fld %st(0)
4c2: 89 45 c0 mov %eax,0xffffffc0(%ebp)
4c5: 8b 82 0c 00 00 00 mov 0xc(%edx),%eax
4cb: 89 45 c4 mov %eax,0xffffffc4(%ebp)
4ce: 8b 45 84 mov 0xffffff84(%ebp),%eax
4d1: 89 45 98 mov %eax,0xffffff98(%ebp)
4d4: 8b 82 00 00 00 00 mov 0x0(%edx),%eax
4da: dd 45 98 fldl 0xffffff98(%ebp)
4dd: 89 45 88 mov %eax,0xffffff88(%ebp)
4e0: 8b 82 04 00 00 00 mov 0x4(%edx),%eax
4e6: d9 c0 fld %st(0)
4e8: 89 45 8c mov %eax,0xffffff8c(%ebp)
4eb: 8b 82 08 00 00 00 mov 0x8(%edx),%eax
4f1: dd 45 88 fldl 0xffffff88(%ebp)
4f4: 89 45 90 mov %eax,0xffffff90(%ebp)
4f7: 8b 82 0c 00 00 00 mov 0xc(%edx),%eax
4fd: dc c9 fmul %st,%st(1)
4ff: 89 45 94 mov %eax,0xffffff94(%ebp)
502: de cc fmulp %st,%st(4)
504: dd 45 90 fldl 0xffffff90(%ebp)
507: dc cb fmul %st,%st(3)
509: de ca fmulp %st,%st(2)
50b: de e2 fsubp %st,%st(2)
50d: de c2 faddp %st,%st(2)
50f: dc 45 b8 faddl 0xffffffb8(%ebp)
512: d9 c9 fxch %st(1)
514: dc 45 c0 faddl 0xffffffc0(%ebp)
517: d9 c9 fxch %st(1)
519: dd 5d c8 fstpl 0xffffffc8(%ebp)
51c: 8b 45 c8 mov 0xffffffc8(%ebp),%eax
51f: dd 5d d0 fstpl 0xffffffd0(%ebp)
522: 89 82 00 00 00 00 mov %eax,0x0(%edx)
528: 8b 45 cc mov 0xffffffcc(%ebp),%eax
52b: 89 82 04 00 00 00 mov %eax,0x4(%edx)
531: 8b 45 d0 mov 0xffffffd0(%ebp),%eax
534: 89 82 08 00 00 00 mov %eax,0x8(%edx)
53a: 8b 45 d4 mov 0xffffffd4(%ebp),%eax
53d: 89 82 0c 00 00 00 mov %eax,0xc(%edx)
543: 0f 8e 47 ff ff ff jle 490 <ComplexBenchmark::oop_style() const+0x40>
3.4 C
-----
000003a0 <ComplexBenchmark::c_style() const>:
3c0: 89 d0 mov %edx,%eax
3c2: d9 c1 fld %st(1)
3c4: c1 e0 04 shl $0x4,%eax
3c7: dd 80 00 00 00 00 fldl 0x0(%eax)
3cd: 42 inc %edx
3ce: d9 c2 fld %st(2)
3d0: dd 80 08 00 00 00 fldl 0x8(%eax)
3d6: d9 cb fxch %st(3)
3d8: 81 fa e7 03 00 00 cmp $0x3e7,%edx
3de: d8 ca fmul %st(2),%st
3e0: d9 c9 fxch %st(1)
3e2: d8 cb fmul %st(3),%st
3e4: d9 cb fxch %st(3)
3e6: d8 cd fmul %st(5),%st
3e8: d9 ca fxch %st(2)
3ea: d8 cc fmul %st(4),%st
3ec: d9 c9 fxch %st(1)
3ee: dc 80 00 00 00 00 faddl 0x0(%eax)
3f4: d9 ca fxch %st(2)
3f6: dc 80 08 00 00 00 faddl 0x8(%eax)
3fc: d9 ca fxch %st(2)
3fe: de e3 fsubp %st,%st(3)
400: de c1 faddp %st,%st(1)
402: d9 c9 fxch %st(1)
404: dd 98 00 00 00 00 fstpl 0x0(%eax)
40a: dd 98 08 00 00 00 fstpl 0x8(%eax)
410: 7e ae jle 3c0 <ComplexBenchmark::c_style() const+0x20>
3.4 OOP
-------
00000420 <ComplexBenchmark::oop_style() const>:
460: 89 7d 9c mov %edi,0xffffff9c(%ebp)
463: 89 ca mov %ecx,%edx
465: c1 e2 04 shl $0x4,%edx
468: 89 75 a0 mov %esi,0xffffffa0(%ebp)
46b: 8b 82 00 00 00 00 mov 0x0(%edx),%eax
471: 41 inc %ecx
472: 89 5d a4 mov %ebx,0xffffffa4(%ebp)
475: 81 f9 e7 03 00 00 cmp $0x3e7,%ecx
47b: dd 45 a0 fldl 0xffffffa0(%ebp)
47e: 89 45 b8 mov %eax,0xffffffb8(%ebp)
481: 8b 82 04 00 00 00 mov 0x4(%edx),%eax
487: d9 c0 fld %st(0)
489: 89 45 bc mov %eax,0xffffffbc(%ebp)
48c: 8b 82 08 00 00 00 mov 0x8(%edx),%eax
492: 89 45 c0 mov %eax,0xffffffc0(%ebp)
495: 8b 82 0c 00 00 00 mov 0xc(%edx),%eax
49b: 89 45 c4 mov %eax,0xffffffc4(%ebp)
49e: 8b 85 74 ff ff ff mov 0xffffff74(%ebp),%eax
4a4: 89 45 98 mov %eax,0xffffff98(%ebp)
4a7: 8b 82 00 00 00 00 mov 0x0(%edx),%eax
4ad: dd 45 98 fldl 0xffffff98(%ebp)
4b0: 89 45 88 mov %eax,0xffffff88(%ebp)
4b3: 8b 82 04 00 00 00 mov 0x4(%edx),%eax
4b9: d9 c0 fld %st(0)
4bb: 89 45 8c mov %eax,0xffffff8c(%ebp)
4be: 8b 82 08 00 00 00 mov 0x8(%edx),%eax
4c4: dd 45 88 fldl 0xffffff88(%ebp)
4c7: 89 45 90 mov %eax,0xffffff90(%ebp)
4ca: 8b 82 0c 00 00 00 mov 0xc(%edx),%eax
4d0: dc c9 fmul %st,%st(1)
4d2: de cc fmulp %st,%st(4)
4d4: 89 45 94 mov %eax,0xffffff94(%ebp)
4d7: dd 45 90 fldl 0xffffff90(%ebp)
4da: dc cb fmul %st,%st(3)
4dc: de ca fmulp %st,%st(2)
4de: de e2 fsubp %st,%st(2)
4e0: de c2 faddp %st,%st(2)
4e2: dd 9d 78 ff ff ff fstpl 0xffffff78(%ebp)
4e8: 8b 85 78 ff ff ff mov 0xffffff78(%ebp),%eax
4ee: dd 5d 80 fstpl 0xffffff80(%ebp)
4f1: 89 45 a8 mov %eax,0xffffffa8(%ebp)
4f4: 8b 85 7c ff ff ff mov 0xffffff7c(%ebp),%eax
4fa: 89 45 ac mov %eax,0xffffffac(%ebp)
4fd: 8b 45 80 mov 0xffffff80(%ebp),%eax
500: 89 45 b0 mov %eax,0xffffffb0(%ebp)
503: 8b 45 84 mov 0xffffff84(%ebp),%eax
506: 89 45 b4 mov %eax,0xffffffb4(%ebp)
509: dd 45 a8 fldl 0xffffffa8(%ebp)
50c: dc 45 b8 faddl 0xffffffb8(%ebp)
50f: dd 45 b0 fldl 0xffffffb0(%ebp)
512: dc 45 c0 faddl 0xffffffc0(%ebp)
515: d9 c9 fxch %st(1)
517: dd 9d 78 ff ff ff fstpl 0xffffff78(%ebp)
51d: 8b 85 78 ff ff ff mov 0xffffff78(%ebp),%eax
523: dd 5d 80 fstpl 0xffffff80(%ebp)
526: 89 82 00 00 00 00 mov %eax,0x0(%edx)
52c: 8b 85 7c ff ff ff mov 0xffffff7c(%ebp),%eax
532: 89 82 04 00 00 00 mov %eax,0x4(%edx)
538: 8b 45 80 mov 0xffffff80(%ebp),%eax
53b: 89 82 08 00 00 00 mov %eax,0x8(%edx)
541: 8b 45 84 mov 0xffffff84(%ebp),%eax
544: 89 82 0c 00 00 00 mov %eax,0xc(%edx)
54a: 0f 8e 10 ff ff ff jle 460 <ComplexBenchmark::oop_style() const+0x40>
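Read side by side, the two OOP dumps suggest where the time goes. In both, each Complex value is copied word by word (mov through %eax) into %ebp-relative stack temporaries and reloaded with fldl, instead of being kept in registers as in the C loops. In 3.4 the product additionally makes an extra round trip (fstpl to 0xffffff78/0xffffff80, four movs to 0xffffffa8..0xffffffb4, fldl back) before the additions, which 3.2.1 performs directly on the x87 stack; this is plausibly part of the regression. The C++ sketch below models that data flow with std::memcpy. The Complex type and oop_iteration_as_compiled are hypothetical, the roles of the copies (which one is y[k], which x[k]) are inferred from how they are used in the arithmetic, and the model reproduces only the values, not the timing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical 16-byte Complex, matching the 16-byte element stride
// (shl $0x4) used to index the arrays in the dumps.
struct Complex {
    double re, im;
};

// Model of one iteration of the 3.4 OOP loop. Each std::memcpy stands for a
// 16-byte copy that the compiled code does as four 32-bit mov pairs through
// %eax; the locals stand for the %ebp-relative stack temporaries.
void oop_iteration_as_compiled(Complex* y, const Complex* x, std::size_t k,
                               const Complex& factor) {
    assert(y != nullptr && x != nullptr);
    Complex y_tmp, f_tmp, x_tmp, prod_tmp, prod_copy, sum_tmp;
    std::memcpy(&y_tmp, &y[k], sizeof y_tmp);    // to 0xffffffb8..0xffffffc7
    std::memcpy(&f_tmp, &factor, sizeof f_tmp);  // to 0xffffff98..0xffffffa7
    std::memcpy(&x_tmp, &x[k], sizeof x_tmp);    // to 0xffffff88..0xffffff97

    // The multiplies reload their operands with fldl from the temporaries.
    prod_tmp.re = f_tmp.re * x_tmp.re - f_tmp.im * x_tmp.im;  // fstpl 0xffffff78
    prod_tmp.im = f_tmp.re * x_tmp.im + f_tmp.im * x_tmp.re;  // fstpl 0xffffff80

    // Extra round trip absent from the 3.2.1 loop: the product is copied to
    // yet another temporary and reloaded before the additions.
    std::memcpy(&prod_copy, &prod_tmp, sizeof prod_copy);  // to 0xffffffa8..0xffffffb7
    sum_tmp.re = prod_copy.re + y_tmp.re;  // fldl 0xffffffa8; faddl 0xffffffb8
    sum_tmp.im = prod_copy.im + y_tmp.im;  // fldl 0xffffffb0; faddl 0xffffffc0

    // The sum is stored to the stack again, then copied word by word to y[k].
    std::memcpy(&y[k], &sum_tmp, sizeof sum_tmp);
}
```

Called with y[0] = {1, 1}, x[0] = {2, 4} and factor = {0.5, 0.25}, it leaves y[0] = {1, 3.5}, the same value y[k] = y[k] + factor * x[k] would give; the difference between the compilers lies entirely in how many memory trips that value takes.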
Any ideas?
Thanks,
Paolo.
[-- Attachment #2: oopack_v1p8.C.bz2 --]
[-- Type: application/octet-stream, Size: 6984 bytes --]