public inbox for gcc@gcc.gnu.org
* Re: Bad performance (regression) on oopack's Complex test
@ 2003-01-27 22:02 Paolo Carlini
  0 siblings, 0 replies; 5+ messages in thread
From: Paolo Carlini @ 2003-01-27 22:02 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc List

> What effect does the -mfpmath=sse switch have?

It helps! But, as expected, it mostly benefits the C test, not its
OOP counterpart (so the Ratio becomes even bigger ;)

                         Seconds       Mflops
Test       Iterations     C    OOP     C    OOP  Ratio
----       ----------  -----------  -----------  -----

3.2.1
-----
-O2
Complex        200000    1.6  21.4  981.6  74.8   13.1

-O2 -march=pentium4 -mfpmath=sse
Complex        200000    1.0  19.1 1616.2  84.0   19.2


3.4
---
-O2
Complex        200000    1.6  29.2  993.8  54.8   18.1

-O2 -march=pentium4 -mfpmath=sse
Complex        200000    1.0  27.0 1649.5  59.3   27.8


Thanks,
Paolo
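
A note on reading the tables: the Ratio column appears to be the C Mflops
figure divided by the OOP Mflops figure. This is inferred from the numbers
quoted in this thread, not taken from the oopack source. The sketch below,
with hypothetical helper names, reproduces the reported ratios to the
tables' one-decimal precision. It also shows why -mfpmath=sse enlarges the
Ratio: the switch raises the C throughput far more than the OOP throughput.

```cpp
#include <cassert>
#include <cmath>

// Ratio as the tables appear to report it: C Mflops divided by OOP Mflops.
// This is an inference from the figures in the thread, not oopack's code.
double mflops_ratio(double c_mflops, double oop_mflops) {
    assert(oop_mflops > 0.0);
    return c_mflops / oop_mflops;
}

// Round to one decimal place, the precision used in the tables.
double round1(double x) {
    return std::round(x * 10.0) / 10.0;
}

// Relative change in throughput between two compilers, in percent
// (a negative value means the newer compiler is slower).
double percent_change(double old_mflops, double new_mflops) {
    assert(old_mflops > 0.0);
    return (new_mflops - old_mflops) / old_mflops * 100.0;
}
```

With the -O2 rows, mflops_ratio(981.6, 74.8) rounds to 13.1 for 3.2.1 and
mflops_ratio(993.8, 54.8) rounds to 18.1 for 3.4. percent_change(74.8, 54.8)
is about -26.7, so the OOP loop lost roughly a quarter of its throughput
between the two compilers while the C loop stayed essentially unchanged.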



* Re: Bad performance (regression) on oopack's Complex test
  2003-01-27 16:55   ` Paolo Carlini
@ 2003-01-27 17:00     ` Jan Hubicka
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Hubicka @ 2003-01-27 17:00 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: Falk Hueffner, gcc

> Falk Hueffner wrote:
> 
> >Have you tried -O3, or generally tuning inlining?
> >
> I have tried both, with no positive results (the inlined operators are 
> very small)
> 
> >BTW, on Alpha, I can see an improvement:
> >
> Nice. In fact on an older PII machine I was seeing Ratios similar to yours.
> I think that there is a general issue and a specific one concerning P4 
> (slowness of memory accesses?)

I will try to look into this more closely soon, since -march=pentium4 is my
work after all.  It needs a lot more effort than I have been able to put into
it; for instance, scheduling is entirely missing.
What effect does the -mfpmath=sse switch have?

Honza
> 
> >Just for fun, here's the result of cxx -O5 -fast:
> >
> I know :-( ... this is the final goal which icc already achieves on any 
> x86 cpu.
> 
> Paolo.


* Re: Bad performance (regression) on oopack's Complex test
  2003-01-27 16:26 ` Falk Hueffner
@ 2003-01-27 16:55   ` Paolo Carlini
  2003-01-27 17:00     ` Jan Hubicka
  0 siblings, 1 reply; 5+ messages in thread
From: Paolo Carlini @ 2003-01-27 16:55 UTC (permalink / raw)
  To: Falk Hueffner; +Cc: gcc

Falk Hueffner wrote:

>Have you tried -O3, or generally tuning inlining?
>
I have tried both, with no positive results (the inlined operators are 
very small)

>BTW, on Alpha, I can see an improvement:
>
Nice. In fact on an older PII machine I was seeing Ratios similar to yours.
I think that there is a general issue and a specific one concerning P4 
(slowness of memory accesses?)

>Just for fun, here's the result of cxx -O5 -fast:
>
I know :-( ... this is the final goal which icc already achieves on any 
x86 cpu.

Paolo.


* Re: Bad performance (regression) on oopack's Complex test
  2003-01-27 15:46 Paolo Carlini
@ 2003-01-27 16:26 ` Falk Hueffner
  2003-01-27 16:55   ` Paolo Carlini
  0 siblings, 1 reply; 5+ messages in thread
From: Falk Hueffner @ 2003-01-27 16:26 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: gcc, Jan Hubicka

Paolo Carlini <pcarlini@unitus.it> writes:

> Hi everyone,
> 
> I'm noticing bad performance on this test. Moreover, the numbers are
> worse for 3.4 vs 3.2.1.
> 
> On my P4-2400, 200000 iterations, -O2 (ideally, the ratio should be
> 1 :(

Have you tried -O3, or generally tuning inlining?

BTW, on Alpha, I can see an improvement:

g++ 3.2 -O3
                         Seconds       Mflops         
Test       Iterations     C    OOP     C    OOP  Ratio
----       ----------  -----------  -----------  -----
Complex        200000    2.4  17.7  677.1  90.3    7.5

g++ 3.4 20030116 -O3
                         Seconds       Mflops         
Test       Iterations     C    OOP     C    OOP  Ratio
----       ----------  -----------  -----------  -----
Complex        200000    2.3  11.3  701.5 141.6    5.0


Just for fun, here's the result of cxx -O5 -fast:

                         Seconds       Mflops         
Test       Iterations     C    OOP     C    OOP  Ratio
----       ----------  -----------  -----------  -----
Max            100000    1.5   1.6   65.0  64.5    1.0
Matrix           2000    0.9   0.9  537.0 539.8    1.0
Complex        200000    1.6   1.6  999.6 988.7    1.0
Iterator       200000    0.5   0.5  729.2 729.2    1.0

-- 
	Falk


* Bad performance (regression) on oopack's Complex test
@ 2003-01-27 15:46 Paolo Carlini
  2003-01-27 16:26 ` Falk Hueffner
  0 siblings, 1 reply; 5+ messages in thread
From: Paolo Carlini @ 2003-01-27 15:46 UTC (permalink / raw)
  To: gcc; +Cc: Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 10413 bytes --]

Hi everyone,

I'm noticing bad performance on this test. Moreover, the
numbers are worse for 3.4 vs 3.2.1.

On my P4-2400, 200000 iterations, -O2 (ideally, the ratio
should be 1 :(

3.2.1
-----
                         Seconds       Mflops
Test       Iterations     C    OOP     C    OOP  Ratio
----       ----------  -----------  -----------  -----
Complex        200000    1.6  21.4  981.6  74.8   13.1


3.4
---
                         Seconds       Mflops
Test       Iterations     C    OOP     C    OOP  Ratio
----       ----------  -----------  -----------  -----
Complex        200000    1.6  29.2  993.8  54.8   18.1
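
For readers without the attachment, the two loops being compared can be
sketched roughly as follows. This is an assumption reconstructed from the
dumps, not a copy of the oopack source: the dumps show 16-byte elements, a
trip count of 1000, and four multiplies, one subtract and three adds per
element, which is what a complex multiply-add Y[k] = Y[k] + factor * X[k]
needs. The names Complex, c_style and oop_style here are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the benchmark's complex type: two doubles,
// 16 bytes, matching the shl $0x4 index scaling in the dumps.
struct Complex {
    double re;
    double im;
};

inline Complex operator+(Complex a, Complex b) {
    return Complex{a.re + b.re, a.im + b.im};
}

inline Complex operator*(Complex a, Complex b) {
    return Complex{a.re * b.re - a.im * b.im,
                   a.re * b.im + a.im * b.re};
}

// C style: real and imaginary parts are updated with scalar arithmetic.
void c_style(Complex* y, const Complex* x, Complex factor, std::size_t n) {
    for (std::size_t k = 0; k < n; ++k) {
        double re = y[k].re + factor.re * x[k].re - factor.im * x[k].im;
        double im = y[k].im + factor.re * x[k].im + factor.im * x[k].re;
        y[k].re = re;
        y[k].im = im;
    }
}

// OOP style: the same update through overloaded operators that pass and
// return Complex by value, creating temporaries the compiler must remove.
void oop_style(Complex* y, const Complex* x, Complex factor, std::size_t n) {
    for (std::size_t k = 0; k < n; ++k) {
        y[k] = y[k] + factor * x[k];
    }
}
```

Both functions compute the same values; a Ratio of 1 would mean the
compiler eliminated every temporary in oop_style and produced the same
loop as c_style.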


Dumps of the relevant loops follow:

3.2.1 C
-------
000003d0 <ComplexBenchmark::c_style() const>:
 3f0:    89 d0                    mov    %edx,%eax
 3f2:    d9 c1                    fld    %st(1)
 3f4:    c1 e0 04                 shl    $0x4,%eax
 3f7:    dd 80 00 00 00 00        fldl   0x0(%eax)
 3fd:    42                       inc    %edx
 3fe:    d9 c2                    fld    %st(2)
 400:    dd 80 08 00 00 00        fldl   0x8(%eax)
 406:    d9 cb                    fxch   %st(3)
 408:    d8 ca                    fmul   %st(2),%st
 40a:    d9 c9                    fxch   %st(1)
 40c:    81 fa e7 03 00 00        cmp    $0x3e7,%edx
 412:    d8 cb                    fmul   %st(3),%st
 414:    d9 cb                    fxch   %st(3)
 416:    d8 cd                    fmul   %st(5),%st
 418:    d9 ca                    fxch   %st(2)
 41a:    d8 cc                    fmul   %st(4),%st
 41c:    d9 c9                    fxch   %st(1)
 41e:    dc 80 00 00 00 00        faddl  0x0(%eax)
 424:    d9 ca                    fxch   %st(2)
 426:    dc 80 08 00 00 00        faddl  0x8(%eax)
 42c:    d9 ca                    fxch   %st(2)
 42e:    de e3                    fsubp  %st,%st(3)
 430:    de c1                    faddp  %st,%st(1)
 432:    d9 c9                    fxch   %st(1)
 434:    dd 98 00 00 00 00        fstpl  0x0(%eax)
 43a:    dd 98 08 00 00 00        fstpl  0x8(%eax)
 440:    7e ae                    jle    3f0 <ComplexBenchmark::c_style() const+0x20>

3.2.1 OOP
---------
00000450 <ComplexBenchmark::oop_style() const>:
 490:    89 7d 9c                 mov    %edi,0xffffff9c(%ebp)
 493:    89 ca                    mov    %ecx,%edx
 495:    c1 e2 04                 shl    $0x4,%edx
 498:    89 75 a0                 mov    %esi,0xffffffa0(%ebp)
 49b:    8b 82 00 00 00 00        mov    0x0(%edx),%eax
 4a1:    41                       inc    %ecx
 4a2:    89 5d a4                 mov    %ebx,0xffffffa4(%ebp)
 4a5:    81 f9 e7 03 00 00        cmp    $0x3e7,%ecx
 4ab:    89 45 b8                 mov    %eax,0xffffffb8(%ebp)
 4ae:    8b 82 04 00 00 00        mov    0x4(%edx),%eax
 4b4:    dd 45 a0                 fldl   0xffffffa0(%ebp)
 4b7:    89 45 bc                 mov    %eax,0xffffffbc(%ebp)
 4ba:    8b 82 08 00 00 00        mov    0x8(%edx),%eax
 4c0:    d9 c0                    fld    %st(0)
 4c2:    89 45 c0                 mov    %eax,0xffffffc0(%ebp)
 4c5:    8b 82 0c 00 00 00        mov    0xc(%edx),%eax
 4cb:    89 45 c4                 mov    %eax,0xffffffc4(%ebp)
 4ce:    8b 45 84                 mov    0xffffff84(%ebp),%eax
 4d1:    89 45 98                 mov    %eax,0xffffff98(%ebp)
 4d4:    8b 82 00 00 00 00        mov    0x0(%edx),%eax
 4da:    dd 45 98                 fldl   0xffffff98(%ebp)
 4dd:    89 45 88                 mov    %eax,0xffffff88(%ebp)
 4e0:    8b 82 04 00 00 00        mov    0x4(%edx),%eax
 4e6:    d9 c0                    fld    %st(0)
 4e8:    89 45 8c                 mov    %eax,0xffffff8c(%ebp)
 4eb:    8b 82 08 00 00 00        mov    0x8(%edx),%eax
 4f1:    dd 45 88                 fldl   0xffffff88(%ebp)
 4f4:    89 45 90                 mov    %eax,0xffffff90(%ebp)
 4f7:    8b 82 0c 00 00 00        mov    0xc(%edx),%eax
 4fd:    dc c9                    fmul   %st,%st(1)
 4ff:    89 45 94                 mov    %eax,0xffffff94(%ebp)
 502:    de cc                    fmulp  %st,%st(4)
 504:    dd 45 90                 fldl   0xffffff90(%ebp)
 507:    dc cb                    fmul   %st,%st(3)
 509:    de ca                    fmulp  %st,%st(2)
 50b:    de e2                    fsubp  %st,%st(2)
 50d:    de c2                    faddp  %st,%st(2)
 50f:    dc 45 b8                 faddl  0xffffffb8(%ebp)
 512:    d9 c9                    fxch   %st(1)
 514:    dc 45 c0                 faddl  0xffffffc0(%ebp)
 517:    d9 c9                    fxch   %st(1)
 519:    dd 5d c8                 fstpl  0xffffffc8(%ebp)
 51c:    8b 45 c8                 mov    0xffffffc8(%ebp),%eax
 51f:    dd 5d d0                 fstpl  0xffffffd0(%ebp)
 522:    89 82 00 00 00 00        mov    %eax,0x0(%edx)
 528:    8b 45 cc                 mov    0xffffffcc(%ebp),%eax
 52b:    89 82 04 00 00 00        mov    %eax,0x4(%edx)
 531:    8b 45 d0                 mov    0xffffffd0(%ebp),%eax
 534:    89 82 08 00 00 00        mov    %eax,0x8(%edx)
 53a:    8b 45 d4                 mov    0xffffffd4(%ebp),%eax
 53d:    89 82 0c 00 00 00        mov    %eax,0xc(%edx)
 543:    0f 8e 47 ff ff ff        jle    490 <ComplexBenchmark::oop_style() const+0x40>


3.4 C
-----
000003a0 <ComplexBenchmark::c_style() const>:
 3c0:    89 d0                    mov    %edx,%eax
 3c2:    d9 c1                    fld    %st(1)
 3c4:    c1 e0 04                 shl    $0x4,%eax
 3c7:    dd 80 00 00 00 00        fldl   0x0(%eax)
 3cd:    42                       inc    %edx
 3ce:    d9 c2                    fld    %st(2)
 3d0:    dd 80 08 00 00 00        fldl   0x8(%eax)
 3d6:    d9 cb                    fxch   %st(3)
 3d8:    81 fa e7 03 00 00        cmp    $0x3e7,%edx
 3de:    d8 ca                    fmul   %st(2),%st
 3e0:    d9 c9                    fxch   %st(1)
 3e2:    d8 cb                    fmul   %st(3),%st
 3e4:    d9 cb                    fxch   %st(3)
 3e6:    d8 cd                    fmul   %st(5),%st
 3e8:    d9 ca                    fxch   %st(2)
 3ea:    d8 cc                    fmul   %st(4),%st
 3ec:    d9 c9                    fxch   %st(1)
 3ee:    dc 80 00 00 00 00        faddl  0x0(%eax)
 3f4:    d9 ca                    fxch   %st(2)
 3f6:    dc 80 08 00 00 00        faddl  0x8(%eax)
 3fc:    d9 ca                    fxch   %st(2)
 3fe:    de e3                    fsubp  %st,%st(3)
 400:    de c1                    faddp  %st,%st(1)
 402:    d9 c9                    fxch   %st(1)
 404:    dd 98 00 00 00 00        fstpl  0x0(%eax)
 40a:    dd 98 08 00 00 00        fstpl  0x8(%eax)
 410:    7e ae                    jle    3c0 <ComplexBenchmark::c_style() const+0x20>

3.4 OOP
-------
00000420 <ComplexBenchmark::oop_style() const>:
 460:    89 7d 9c                 mov    %edi,0xffffff9c(%ebp)
 463:    89 ca                    mov    %ecx,%edx
 465:    c1 e2 04                 shl    $0x4,%edx
 468:    89 75 a0                 mov    %esi,0xffffffa0(%ebp)
 46b:    8b 82 00 00 00 00        mov    0x0(%edx),%eax
 471:    41                       inc    %ecx
 472:    89 5d a4                 mov    %ebx,0xffffffa4(%ebp)
 475:    81 f9 e7 03 00 00        cmp    $0x3e7,%ecx
 47b:    dd 45 a0                 fldl   0xffffffa0(%ebp)
 47e:    89 45 b8                 mov    %eax,0xffffffb8(%ebp)
 481:    8b 82 04 00 00 00        mov    0x4(%edx),%eax
 487:    d9 c0                    fld    %st(0)
 489:    89 45 bc                 mov    %eax,0xffffffbc(%ebp)
 48c:    8b 82 08 00 00 00        mov    0x8(%edx),%eax
 492:    89 45 c0                 mov    %eax,0xffffffc0(%ebp)
 495:    8b 82 0c 00 00 00        mov    0xc(%edx),%eax
 49b:    89 45 c4                 mov    %eax,0xffffffc4(%ebp)
 49e:    8b 85 74 ff ff ff        mov    0xffffff74(%ebp),%eax
 4a4:    89 45 98                 mov    %eax,0xffffff98(%ebp)
 4a7:    8b 82 00 00 00 00        mov    0x0(%edx),%eax
 4ad:    dd 45 98                 fldl   0xffffff98(%ebp)
 4b0:    89 45 88                 mov    %eax,0xffffff88(%ebp)
 4b3:    8b 82 04 00 00 00        mov    0x4(%edx),%eax
 4b9:    d9 c0                    fld    %st(0)
 4bb:    89 45 8c                 mov    %eax,0xffffff8c(%ebp)
 4be:    8b 82 08 00 00 00        mov    0x8(%edx),%eax
 4c4:    dd 45 88                 fldl   0xffffff88(%ebp)
 4c7:    89 45 90                 mov    %eax,0xffffff90(%ebp)
 4ca:    8b 82 0c 00 00 00        mov    0xc(%edx),%eax
 4d0:    dc c9                    fmul   %st,%st(1)
 4d2:    de cc                    fmulp  %st,%st(4)
 4d4:    89 45 94                 mov    %eax,0xffffff94(%ebp)
 4d7:    dd 45 90                 fldl   0xffffff90(%ebp)
 4da:    dc cb                    fmul   %st,%st(3)
 4dc:    de ca                    fmulp  %st,%st(2)
 4de:    de e2                    fsubp  %st,%st(2)
 4e0:    de c2                    faddp  %st,%st(2)
 4e2:    dd 9d 78 ff ff ff        fstpl  0xffffff78(%ebp)
 4e8:    8b 85 78 ff ff ff        mov    0xffffff78(%ebp),%eax
 4ee:    dd 5d 80                 fstpl  0xffffff80(%ebp)
 4f1:    89 45 a8                 mov    %eax,0xffffffa8(%ebp)
 4f4:    8b 85 7c ff ff ff        mov    0xffffff7c(%ebp),%eax
 4fa:    89 45 ac                 mov    %eax,0xffffffac(%ebp)
 4fd:    8b 45 80                 mov    0xffffff80(%ebp),%eax
 500:    89 45 b0                 mov    %eax,0xffffffb0(%ebp)
 503:    8b 45 84                 mov    0xffffff84(%ebp),%eax
 506:    89 45 b4                 mov    %eax,0xffffffb4(%ebp)
 509:    dd 45 a8                 fldl   0xffffffa8(%ebp)
 50c:    dc 45 b8                 faddl  0xffffffb8(%ebp)
 50f:    dd 45 b0                 fldl   0xffffffb0(%ebp)
 512:    dc 45 c0                 faddl  0xffffffc0(%ebp)
 515:    d9 c9                    fxch   %st(1)
 517:    dd 9d 78 ff ff ff        fstpl  0xffffff78(%ebp)
 51d:    8b 85 78 ff ff ff        mov    0xffffff78(%ebp),%eax
 523:    dd 5d 80                 fstpl  0xffffff80(%ebp)
 526:    89 82 00 00 00 00        mov    %eax,0x0(%edx)
 52c:    8b 85 7c ff ff ff        mov    0xffffff7c(%ebp),%eax
 532:    89 82 04 00 00 00        mov    %eax,0x4(%edx)
 538:    8b 45 80                 mov    0xffffff80(%ebp),%eax
 53b:    89 82 08 00 00 00        mov    %eax,0x8(%edx)
 541:    8b 45 84                 mov    0xffffff84(%ebp),%eax
 544:    89 82 0c 00 00 00        mov    %eax,0xc(%edx)
 54a:    0f 8e 10 ff ff ff        jle    460 <ComplexBenchmark::oop_style() const+0x40>


Any ideas?

Thanks,
Paolo.
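
One way to read the OOP dumps above: each 16-byte value is moved between
the arrays and the stack as four 32-bit integer words through %eax, and is
then reloaded as 8-byte doubles with fldl. The C loops (27 instructions
each) never do this, while the OOP loops grow from 52 instructions in 3.2.1
to 63 in 3.4, where the product is copied through one more stack temporary
before the add. The sketch below writes that data movement out in source
form. It is illustrative only: the names are hypothetical, the operand
order is an assumption, and a current compiler will optimize the copies
away, so it does not reproduce the slowdown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Complex {
    double re;
    double im;
};
static_assert(sizeof(Complex) == 16, "sketch assumes two packed 8-byte doubles");

// Copy a Complex as four 32-bit words, the way the OOP dumps move each
// value through %eax.
inline void copy_by_words(Complex* dst, const Complex* src) {
    const unsigned char* from = reinterpret_cast<const unsigned char*>(src);
    unsigned char* to = reinterpret_cast<unsigned char*>(dst);
    for (std::size_t i = 0; i < sizeof(Complex); i += 4) {
        std::uint32_t word;
        std::memcpy(&word, from + i, 4);  // like: mov 0xN(%edx),%eax
        std::memcpy(to + i, &word, 4);    // like: mov %eax,0xM(%ebp)
    }
}

// One loop step arranged to mirror the 3.4 OOP dump's data movement.
Complex step_like_oop_dump(const Complex& y, const Complex& x,
                           const Complex& factor) {
    Complex y_tmp, x_tmp, prod, prod_copy, sum, out;
    copy_by_words(&y_tmp, &y);        // array -> stack, word by word
    copy_by_words(&x_tmp, &x);        // array -> stack, word by word
    prod.re = factor.re * x_tmp.re - factor.im * x_tmp.im;  // 8-byte reloads
    prod.im = factor.re * x_tmp.im + factor.im * x_tmp.re;
    copy_by_words(&prod_copy, &prod); // extra temporary seen only in 3.4
    sum.re = prod_copy.re + y_tmp.re;
    sum.im = prod_copy.im + y_tmp.im;
    copy_by_words(&out, &sum);        // stack -> array, word by word
    return out;
}
```

On processors that cannot forward two pending 4-byte stores to a following
8-byte load, every such reload has to wait for the stores to complete.
Whether that accounts for the P4 numbers is only a guess, in line with the
"slowness of memory accesses?" suspicion raised in this thread; nothing in
the thread confirms it.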

[-- Attachment #2: oopack_v1p8.C.bz2 --]
[-- Type: application/octet-stream, Size: 6984 bytes --]
