public inbox for gcc-bugs@sourceware.org
* [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3
@ 2005-02-12 20:07 gj at pointblue dot com dot pl
  2005-02-12 20:11 ` [Bug target/19923] " pinskia at gcc dot gnu dot org
                   ` (28 more replies)
  0 siblings, 29 replies; 30+ messages in thread
From: gj at pointblue dot com dot pl @ 2005-02-12 20:07 UTC (permalink / raw)
  To: gcc-bugs

Here are the openssl speed results when it is compiled with 3.3 (original Debian 
unstable package): 
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) 
blowfish(idx) 
compiler: gcc -fPIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H 
-DOPENSSL_NO_KRB5 -DOPENSSL_NO_IDEA -DOPENSSL_NO_MDC2 -DOPENSSL_NO_RC5 
-DL_ENDIAN -DTERMIO -O3 -march=i686 -mcpu=i686 -fomit-frame-pointer -Wall 
-DSHA1_ASM -DMD5_ASM -DRMD160_ASM 
available timing options: TIMES TIMEB HZ=100 [sysconf value] 
timing function used: times 
The 'numbers' are in 1000s of bytes per second processed. 
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes 
md2                510.80k     1064.79k     1486.96k     1641.83k     1702.87k 
mdc2                 0.00         0.00         0.00         0.00         0.00 
md4               4999.47k    17746.97k    51392.88k    97451.59k   131711.89k 
md5               4405.95k    15208.16k    43027.34k    77946.11k   101040.96k 
hmac(md5)         4951.58k    16851.67k    46126.90k    81002.65k   101700.77k 
sha1              3892.54k    12223.89k    29586.19k    45767.99k    54082.03k 
rmd160            3715.14k    10397.52k    23079.49k    33148.87k    37651.83k 
rc4              58941.98k    66899.63k    71733.39k    72572.54k    72476.92k 
des cbc          13353.92k    13897.80k    14067.26k    14088.53k    14107.61k 
des ede3          4887.63k     5039.28k     5083.63k     5116.70k     5086.58k 
idea cbc             0.00         0.00         0.00         0.00         0.00 
rc2 cbc           5257.37k     5534.13k     5560.97k     5610.12k     5582.42k 
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00 
blowfish cbc     21054.83k    22340.34k    22704.49k    22895.90k    22860.91k 
cast cbc         14478.39k    15882.31k    16400.99k    16570.03k    16585.01k 
aes-128 cbc      13612.33k    14364.39k    14382.68k    14404.12k    14440.26k 
aes-192 cbc      12075.70k    12370.43k    12530.49k    12518.63k    12559.92k 
aes-256 cbc      10806.91k    11093.65k    11179.27k    11185.67k    11205.97k 
                  sign    verify    sign/s verify/s 
rsa  512 bits   0.0023s   0.0002s    438.5   4928.2 
rsa 1024 bits   0.0109s   0.0006s     91.6   1746.1 
rsa 2048 bits   0.0646s   0.0019s     15.5    527.6 
rsa 4096 bits   0.4317s   0.0066s      2.3    152.0 
                  sign    verify    sign/s verify/s 
dsa  512 bits   0.0018s   0.0022s    546.0    460.7 
dsa 1024 bits   0.0054s   0.0065s    186.6    154.8 
dsa 2048 bits   0.0179s   0.0220s     55.7     45.5 
 
And here is the same package compiled with gcc 4.0: 
gcc-4.0 (GCC) 4.0.0 20050212 (experimental) 
compiler: gcc -fPIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H 
-DOPENSSL_NO_KRB5 -DOPENSSL_NO_IDEA -DOPENSSL_NO_MDC2 -DOPENSSL_NO_RC5 
-DL_ENDIAN -DTERMIO -O3 -march=i686 -mcpu=i686 -fomit-frame-pointer -Wall 
-DSHA1_ASM -DMD5_ASM -DRMD160_ASM 
available timing options: TIMES TIMEB HZ=100 [sysconf value] 
timing function used: times 
The 'numbers' are in 1000s of bytes per second processed. 
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes 
md2                361.81k      781.01k     1103.19k     1231.36k     1278.84k 
mdc2                 0.00         0.00         0.00         0.00         0.00 
md4               3103.64k    11338.88k    36135.04k    79292.67k   123123.36k 
md5               2758.32k    10084.74k    31863.54k    66522.25k    98860.02k 
hmac(md5)         4581.08k    15784.49k    43771.66k    78227.60k   101959.42k 
sha1              2638.72k     8889.12k    24063.88k    41890.99k    53462.15k 
rmd160            2477.15k     7918.19k    19696.52k    31106.04k    37317.88k 
rc4              60284.27k    67543.46k    71379.34k    72455.38k    72581.12k 
des cbc          13547.77k    13876.64k    14049.67k    14102.25k    14020.78k 
des ede3          4950.20k     5050.99k     5068.80k     5111.00k     5088.06k 
idea cbc             0.00         0.00         0.00         0.00         0.00 
rc2 cbc           5814.75k     6060.45k     6150.37k     6169.60k     6196.13k 
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00 
blowfish cbc     20941.23k    22373.68k    22868.43k    22822.28k    23014.29k 
cast cbc         12790.60k    14102.95k    14514.24k    14494.77k    14622.21k 
aes-128 cbc      13030.43k    13549.49k    13653.51k    13694.85k    13696.33k 
aes-192 cbc      11257.66k    11517.92k    11545.25k    11604.32k    11568.43k 
aes-256 cbc      10065.01k    10296.48k    10403.82k    10332.02k    10382.25k 
                  sign    verify    sign/s verify/s 
rsa  512 bits   0.0024s   0.0002s    418.5   4201.7 
rsa 1024 bits   0.0112s   0.0006s     89.5   1550.7 
rsa 2048 bits   0.0650s   0.0020s     15.4    504.9 
rsa 4096 bits   0.4311s   0.0068s      2.3    147.9 
                  sign    verify    sign/s verify/s 
dsa  512 bits   0.0019s   0.0023s    521.4    441.9 
dsa 1024 bits   0.0055s   0.0067s    182.9    148.3 
dsa 2048 bits   0.0181s   0.0222s     55.2     45.1 
 
As you can see, almost every test is worse with 4.0.  
Not sure why. The same test on ultrasparc and amd64 shows 4.0 as the clear winner 
(although it still crashes on amd64... ;) )
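
To put a number on the gap between the two tables, the relative change between a 3.3 figure and the matching 4.0 figure can be computed directly. This is only a minimal sketch; the helper name pct_change is hypothetical and is not part of OpenSSL or GCC.

```c
/* Relative change, in percent, from an old throughput figure to a new one.
   Negative values mean the new build is slower.  pct_change is a
   hypothetical helper name used only for this illustration. */
double pct_change(double old_kbps, double new_kbps)
{
    return (new_kbps - old_kbps) / old_kbps * 100.0;
}
```

For the md2 16-byte column, pct_change(510.80, 361.81) comes out at roughly -29%, while rc4 at 16 bytes (58941.98k against 60284.27k) is about 2% faster with 4.0.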

-- 
           Summary: openssl is slower when compiled with gcc 4.0 than 3.3
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: gj at pointblue dot com dot pl
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i86
  GCC host triplet: i86
GCC target triplet: i86


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
@ 2005-02-12 20:11 ` pinskia at gcc dot gnu dot org
  2005-02-13  6:44 ` pinskia at gcc dot gnu dot org
                   ` (27 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-02-12 20:11 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |target
           Keywords|                            |missed-optimization


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
  2005-02-12 20:11 ` [Bug target/19923] " pinskia at gcc dot gnu dot org
@ 2005-02-13  6:44 ` pinskia at gcc dot gnu dot org
  2005-05-14 20:36 ` pinskia at gcc dot gnu dot org
                   ` (26 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-02-13  6:44 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-02-12 22:24 -------
We need a self contained example.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu dot
                   |                            |org
             Status|UNCONFIRMED                 |WAITING


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
  2005-02-12 20:11 ` [Bug target/19923] " pinskia at gcc dot gnu dot org
  2005-02-13  6:44 ` pinskia at gcc dot gnu dot org
@ 2005-05-14 20:36 ` pinskia at gcc dot gnu dot org
  2005-06-01 20:47 ` yx at cs dot ucla dot edu
                   ` (25 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-05-14 20:36 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-05-14 20:36 -------
No feedback in 3 months.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |INVALID


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (2 preceding siblings ...)
  2005-05-14 20:36 ` pinskia at gcc dot gnu dot org
@ 2005-06-01 20:47 ` yx at cs dot ucla dot edu
  2005-06-01 20:55 ` pinskia at gcc dot gnu dot org
                   ` (24 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: yx at cs dot ucla dot edu @ 2005-06-01 20:47 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From yx at cs dot ucla dot edu  2005-06-01 20:47 -------
When we ran 'openssl speed md2', we did see that gcc-4.0 was slower
than earlier versions, so we created a minimal test case, which
we will attach.  Here is how long it took to run a 34 megabyte
file through the test program when compiled with various compilers and
options:

gcc-2.95.3 -fPIC -O1 4.940s
gcc-3.4.3  -fPIC -O1 5.190s
gcc-4.0.0  -fPIC -O1 3.510s

gcc-2.95.3 -fPIC -O2 3.470s
gcc-3.4.3  -fPIC -O2 3.460s
gcc-4.0.0  -fPIC -O2 4.050s

gcc-2.95.3 -fPIC -O3 3.400s
gcc-3.4.3  -fPIC -O3 3.740s
gcc-4.0.0  -fPIC -O3 4.010s

This test was done on a pentium 4 workstation, and no smoothing was
done on the resulting times, but they seemed to be repeatable.
We also tried without -fPIC, but did not see as large a regression there.
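
Since no smoothing was applied to the times above, one simple way to make such measurements less noisy is to repeat the run and keep the fastest time. The sketch below assumes nothing about how the original test program was timed; best_of and sample_workload are hypothetical names, and the workload is a stand-in, not the attached test case.

```c
#include <time.h>

unsigned long workload_calls = 0;   /* counts invocations, for checking */
volatile unsigned int sink;         /* keeps the loop from being optimized away */

/* Stand-in workload; replace with a call into the code under test. */
void sample_workload(void)
{
    unsigned int acc = 0;
    for (unsigned int i = 0; i < 100000u; i++)
        acc ^= i * 2654435761u;
    sink = acc;
    workload_calls++;
}

/* Run fn() `reps` times and return the fastest CPU time in seconds.
   Returns -1.0 if reps is not positive. */
double best_of(void (*fn)(void), int reps)
{
    double best = -1.0;
    for (int r = 0; r < reps; r++) {
        clock_t start = clock();
        fn();
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || secs < best)
            best = secs;
    }
    return best;
}
```

Taking the minimum rather than the mean is a design choice: it discards runs disturbed by other activity on the machine, at the cost of hiding real run-to-run variation.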

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (3 preceding siblings ...)
  2005-06-01 20:47 ` yx at cs dot ucla dot edu
@ 2005-06-01 20:55 ` pinskia at gcc dot gnu dot org
  2005-06-01 22:55 ` giovannibajo at libero dot it
                   ` (23 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-06-01 20:55 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-06-01 20:55 -------
I would not be surprised if this is just GCC not using the i386 addressing modes.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (4 preceding siblings ...)
  2005-06-01 20:55 ` pinskia at gcc dot gnu dot org
@ 2005-06-01 22:55 ` giovannibajo at libero dot it
  2005-06-01 22:59 ` giovannibajo at libero dot it
                   ` (22 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: giovannibajo at libero dot it @ 2005-06-01 22:55 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From giovannibajo at libero dot it  2005-06-01 22:55 -------
Confirmed. The regression appears only with -fPIC, and it's pretty evident. The 
core is md2_block, the inner loop:

GCC 3.4
=============================================================
.L29:
        xorl    %edx, %edx
        .p2align 2,,3
.L28:
        movl    S@GOTOFF(%ebx,%eax,4), %esi
        xorl    -216(%ebp,%edx,4), %esi
        movl    S@GOTOFF(%ebx,%esi,4), %eax
        xorl    -212(%ebp,%edx,4), %eax
        movl    S@GOTOFF(%ebx,%eax,4), %edi
        xorl    -208(%ebp,%edx,4), %edi
        movl    %esi, -216(%ebp,%edx,4)
        movl    S@GOTOFF(%ebx,%edi,4), %esi
        xorl    -204(%ebp,%edx,4), %esi
        movl    %eax, -212(%ebp,%edx,4)
        movl    S@GOTOFF(%ebx,%esi,4), %eax
        xorl    -200(%ebp,%edx,4), %eax
        movl    %edi, -208(%ebp,%edx,4)
        movl    S@GOTOFF(%ebx,%eax,4), %edi
        xorl    -196(%ebp,%edx,4), %edi
        movl    %esi, -204(%ebp,%edx,4)
        movl    S@GOTOFF(%ebx,%edi,4), %esi
        xorl    -192(%ebp,%edx,4), %esi
        movl    %eax, -200(%ebp,%edx,4)
        movl    S@GOTOFF(%ebx,%esi,4), %eax
        xorl    -188(%ebp,%edx,4), %eax
        movl    %edi, -196(%ebp,%edx,4)
        movl    %esi, -192(%ebp,%edx,4)
        movl    %eax, -188(%ebp,%edx,4)
        addl    $8, %edx
        cmpl    $47, %edx
        jle     .L28
        addl    %ecx, %eax
        incl    %ecx
        andl    $255, %eax
        cmpl    $17, %ecx
        jle     .L29
=============================================================



GCC 4.0
=============================================================
.L16:
        movl    -384(%ebp), %eax
        movl    -208(%ebp), %esi
        incl    -384(%ebp)
        addl    %esi, %eax
        movl    -456(%ebp), %esi
        andl    $255, %eax
        movl    (%edi,%eax,4), %ecx
        movl    -464(%ebp), %eax
        xorl    %ecx, %esi
        movl    (%edi,%esi,4), %edx
        movl    %esi, -368(%ebp)
        movl    %esi, -456(%ebp)
        movl    -488(%ebp), %esi
        xorl    %edx, %eax
        movl    -472(%ebp), %edx
        movl    (%edi,%eax,4), %ecx
        movl    %eax, -364(%ebp)
        movl    %eax, -464(%ebp)
        xorl    %ecx, %edx
        movl    -480(%ebp), %ecx
        movl    (%edi,%edx,4), %eax
        movl    %edx, -360(%ebp)
        movl    %edx, -472(%ebp)
        xorl    %eax, %ecx
        movl    (%edi,%ecx,4), %eax
        movl    %ecx, -356(%ebp)
        movl    %ecx, -480(%ebp)
        xorl    %eax, %esi
        movl    -496(%ebp), %eax
        movl    (%edi,%esi,4), %edx
        movl    %esi, -352(%ebp)
        movl    %esi, -488(%ebp)
        xorl    %edx, %eax
        movl    -504(%ebp), %edx
        movl    (%edi,%eax,4), %ecx
        movl    %eax, -348(%ebp)
        movl    %eax, -496(%ebp)
        xorl    %ecx, %edx
        movl    -512(%ebp), %ecx
        movl    (%edi,%edx,4), %eax
        movl    %edx, -344(%ebp)
        movl    %edx, -504(%ebp)
        xorl    %eax, %ecx
        movl    %ecx, -340(%ebp)
        movl    (%edi,%ecx,4), %eax
        movl    -520(%ebp), %esi
        movl    %ecx, -512(%ebp)
        xorl    %eax, %esi
        movl    -528(%ebp), %eax
        movl    (%edi,%esi,4), %edx
        movl    %esi, -336(%ebp)
        movl    %esi, -520(%ebp)
        movl    -552(%ebp), %esi
        xorl    %edx, %eax
        movl    -536(%ebp), %edx
        movl    (%edi,%eax,4), %ecx
        movl    %eax, -332(%ebp)
        movl    %eax, -528(%ebp)
        xorl    %ecx, %edx
        movl    -544(%ebp), %ecx
        movl    (%edi,%edx,4), %eax
        movl    %edx, -328(%ebp)
        movl    %edx, -536(%ebp)
        xorl    %eax, %ecx
        movl    (%edi,%ecx,4), %eax
        movl    %ecx, -324(%ebp)
        movl    %ecx, -544(%ebp)
        xorl    %eax, %esi
        movl    -556(%ebp), %eax
        movl    (%edi,%esi,4), %edx
        movl    %esi, -320(%ebp)
        movl    %esi, -552(%ebp)
        movl    -568(%ebp), %esi
        xorl    %edx, %eax
        movl    -560(%ebp), %edx
        movl    (%edi,%eax,4), %ecx
        movl    %eax, -316(%ebp)
        movl    %eax, -556(%ebp)
        xorl    %ecx, %edx
        movl    -564(%ebp), %ecx
        movl    (%edi,%edx,4), %eax
        movl    %edx, -312(%ebp)
        movl    %edx, -560(%ebp)
        xorl    %eax, %ecx
        movl    (%edi,%ecx,4), %eax
        movl    %ecx, -308(%ebp)
        movl    %ecx, -564(%ebp)
        xorl    %eax, %esi
        movl    %esi, -304(%ebp)
        movl    (%edi,%esi,4), %edx
        movl    -572(%ebp), %eax
        movl    %esi, -568(%ebp)
        movl    -396(%ebp), %esi
        xorl    %edx, %eax
        movl    -576(%ebp), %edx
        movl    (%edi,%eax,4), %ecx
        movl    %eax, -300(%ebp)
        movl    %eax, -572(%ebp)
        xorl    %ecx, %edx
        movl    -580(%ebp), %ecx
        movl    (%edi,%edx,4), %eax
        movl    %edx, -296(%ebp)
        movl    %edx, -576(%ebp)
        xorl    %eax, %ecx
        movl    (%edi,%ecx,4), %eax
        movl    %ecx, -292(%ebp)
        movl    %ecx, -580(%ebp)
        xorl    %eax, %esi
        movl    -400(%ebp), %eax
        movl    (%edi,%esi,4), %edx
        movl    %esi, -288(%ebp)
        movl    %esi, -396(%ebp)
        movl    -412(%ebp), %esi
        xorl    %edx, %eax
        movl    -404(%ebp), %edx
        movl    (%edi,%eax,4), %ecx
        movl    %eax, -284(%ebp)
        movl    %eax, -400(%ebp)
        xorl    %ecx, %edx
        movl    -408(%ebp), %ecx
        movl    (%edi,%edx,4), %eax
        movl    %edx, -280(%ebp)
        movl    %edx, -404(%ebp)
        xorl    %eax, %ecx
        movl    (%edi,%ecx,4), %eax
        movl    %ecx, -276(%ebp)
        movl    %ecx, -408(%ebp)
        xorl    %eax, %esi
        movl    -416(%ebp), %eax
        movl    (%edi,%esi,4), %edx
        movl    %esi, -272(%ebp)
        movl    %esi, -412(%ebp)
        xorl    %edx, %eax
        movl    %eax, -268(%ebp)
        movl    (%edi,%eax,4), %ecx
        movl    -420(%ebp), %edx
        movl    %eax, -416(%ebp)
        movl    -428(%ebp), %esi
        xorl    %ecx, %edx
        movl    -424(%ebp), %ecx
        movl    (%edi,%edx,4), %eax
        movl    %edx, -264(%ebp)
        movl    %edx, -420(%ebp)
        xorl    %eax, %ecx
        movl    (%edi,%ecx,4), %eax
        movl    %ecx, -260(%ebp)
        movl    %ecx, -424(%ebp)
        xorl    %eax, %esi
        movl    -432(%ebp), %eax
        movl    (%edi,%esi,4), %edx
        movl    %esi, -256(%ebp)
        movl    %esi, -428(%ebp)
        movl    -444(%ebp), %esi
        xorl    %edx, %eax
        movl    -436(%ebp), %edx
        movl    (%edi,%eax,4), %ecx
        movl    %eax, -252(%ebp)
        movl    %eax, -432(%ebp)
        xorl    %ecx, %edx
        movl    -440(%ebp), %ecx
        movl    (%edi,%edx,4), %eax
        movl    %edx, -248(%ebp)
        movl    %edx, -436(%ebp)
        xorl    %eax, %ecx
        movl    (%edi,%ecx,4), %eax
        movl    %ecx, -244(%ebp)
        movl    %ecx, -440(%ebp)
        xorl    %eax, %esi
        movl    -448(%ebp), %eax
        movl    (%edi,%esi,4), %edx
        movl    %esi, -240(%ebp)
        movl    %esi, -444(%ebp)
        xorl    %edx, %eax
        movl    -452(%ebp), %edx
        movl    (%edi,%eax,4), %ecx
        movl    %eax, -236(%ebp)
        movl    %eax, -448(%ebp)
        xorl    %ecx, %edx
        movl    %edx, -232(%ebp)
        movl    (%edi,%edx,4), %eax
        movl    -460(%ebp), %ecx
        movl    -468(%ebp), %esi
        movl    %edx, -452(%ebp)
        xorl    %eax, %ecx
        movl    (%edi,%ecx,4), %eax
        movl    %ecx, -228(%ebp)
        movl    %ecx, -460(%ebp)
        xorl    %eax, %esi
        movl    -476(%ebp), %eax
        movl    (%edi,%esi,4), %edx
        movl    %esi, -224(%ebp)
        movl    %esi, -468(%ebp)
        movl    -500(%ebp), %esi
        xorl    %edx, %eax
        movl    -484(%ebp), %edx
        movl    (%edi,%eax,4), %ecx
        movl    %eax, -220(%ebp)
        movl    %eax, -476(%ebp)
        xorl    %ecx, %edx
        movl    -492(%ebp), %ecx
        movl    (%edi,%edx,4), %eax
        movl    %edx, -216(%ebp)
        movl    %edx, -484(%ebp)
        xorl    %eax, %ecx
        movl    (%edi,%ecx,4), %eax
        movl    %ecx, -212(%ebp)
        movl    %ecx, -492(%ebp)
        xorl    %eax, %esi
        movl    -508(%ebp), %eax
        movl    (%edi,%esi,4), %edx
        movl    %esi, -380(%ebp)
        movl    %esi, -500(%ebp)
        xorl    %edx, %eax
        movl    -516(%ebp), %edx
        movl    (%edi,%eax,4), %esi
        movl    %eax, -376(%ebp)
        movl    %eax, -508(%ebp)
        xorl    %esi, %edx
        movl    -524(%ebp), %esi
        movl    (%edi,%edx,4), %ecx
        movl    %edx, -372(%ebp)
        movl    %edx, -516(%ebp)
        xorl    %ecx, %esi
        movl    %esi, -524(%ebp)
        movl    -532(%ebp), %ecx
        movl    (%edi,%esi,4), %edx
        xorl    %edx, %ecx
        movl    -540(%ebp), %edx
        movl    (%edi,%ecx,4), %eax
        movl    %ecx, -532(%ebp)
        xorl    %eax, %edx
        movl    -548(%ebp), %eax
        xorl    (%edi,%edx,4), %eax
        movl    %edx, -540(%ebp)
        movl    %eax, -584(%ebp)
        movl    %eax, -548(%ebp)
        movl    (%edi,%eax,4), %eax
        xorl    %eax, -208(%ebp)
        cmpl    $17, -384(%ebp)
        jne     .L16
=============================================================


The loop was unrolled, but it's clear that the address mode selection is worse.
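
For readers who do not have the source at hand, the loop both listings appear to implement can be sketched in C as follows. This is a reconstruction from the GCC 3.4 assembly above (table lookup, XOR into the state word, then a masked add of the outer counter), not a copy of the OpenSSL sources; the name md2_mix and the assumption that the outer counter starts at 0 are mine.

```c
/* Sketch of the mixing loop, reconstructed from the GCC 3.4 listing.
   md2_mix is a hypothetical name.  Every entry of state[] and S[] must be
   below 256 on entry, because each result is reused as an index into S. */
unsigned int md2_mix(unsigned int state[48], const unsigned int S[256])
{
    unsigned int t = 0;
    for (int i = 0; i < 18; i++) {          /* outer loop: cmpl $17 / jle .L29 */
        for (int j = 0; j < 48; j += 8) {   /* inner loop: cmpl $47 / jle .L28 */
            for (int k = 0; k < 8; k++) {   /* eight steps per pass in the asm */
                state[j + k] ^= S[t];       /* movl S@GOTOFF(...) / xorl */
                t = state[j + k];
            }
        }
        t = (t + (unsigned int)i) & 0xff;   /* addl %ecx,%eax / andl $255,%eax */
    }
    return t;
}
```

Each step depends on the value produced by the previous one, so the chain cannot be parallelized; the quality of the generated code comes down to how cheaply each load, XOR and store is addressed.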

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2005-06-01 22:55:36
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (5 preceding siblings ...)
  2005-06-01 22:55 ` giovannibajo at libero dot it
@ 2005-06-01 22:59 ` giovannibajo at libero dot it
  2005-06-02  8:01 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: giovannibajo at libero dot it @ 2005-06-01 22:59 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From giovannibajo at libero dot it  2005-06-01 22:59 -------
I wonder if this is fixed by TARGET_MEM_REF.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (6 preceding siblings ...)
  2005-06-01 22:59 ` giovannibajo at libero dot it
@ 2005-06-02  8:01 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
  2005-06-06  7:16 ` steven at gcc dot gnu dot org
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-02  8:01 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz  2005-06-02 08:01 -------
Subject: Re:  openssl is slower when compiled with gcc 4.0 than 3.3

The assembly attributed to 4.0 was produced by mainline (or some
patched version of 4.0), wasn't it?
Otherwise I cannot imagine why the inner loop would be unrolled.

For plain 4.0, we get the following code, which seems just fine
and equivalent to the one obtained with 3.4 (one of the memory
references is strength reduced, but since we still fit into registers,
this is OK).

I don't see right now whether there is a problem with the code
produced by 4.1, but I also don't see anything related to addressing
mode selection there.

.L21:
        movl    S@GOTOFF(%ebx,%eax,4), %eax
        xorl    (%edx), %eax
        movl    %eax, (%edx)
        movl    S@GOTOFF(%ebx,%eax,4), %eax
        xorl    4(%edx), %eax
        movl    %eax, 4(%edx)
        movl    S@GOTOFF(%ebx,%eax,4), %eax
        xorl    8(%edx), %eax
        movl    %eax, 8(%edx)
        movl    S@GOTOFF(%ebx,%eax,4), %eax
        xorl    12(%edx), %eax
        movl    %eax, 12(%edx)
        movl    S@GOTOFF(%ebx,%eax,4), %eax
        xorl    16(%edx), %eax
        movl    %eax, 16(%edx)
        movl    S@GOTOFF(%ebx,%eax,4), %eax
        xorl    20(%edx), %eax
        movl    %eax, 20(%edx)
        movl    S@GOTOFF(%ebx,%eax,4), %eax
        xorl    24(%edx), %eax
        movl    %eax, 24(%edx)
        movl    S@GOTOFF(%ebx,%eax,4), %eax
        xorl    28(%edx), %eax
        movl    %eax, 28(%edx)
        addl    $32, %edx
        leal    -12(%ebp), %esi
        cmpl    %esi, %edx
        jne     .L21
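
The strength reduction mentioned above (the pointer in %edx replacing an indexed reference into the state array) can be pictured at the source level. The two functions below are a hedged illustration with hypothetical names; they are not taken from OpenSSL, and they only show that the indexed and the pointer-bumping forms compute the same thing.

```c
/* Indexed form: state[] is addressed through a running index,
   as in the 3.4 listing.  chain_indexed is a hypothetical name. */
unsigned int chain_indexed(unsigned int *state, int n, const unsigned int *S,
                           unsigned int t)
{
    for (int j = 0; j < n; j++) {
        t = S[t] ^ state[j];
        state[j] = t;
    }
    return t;
}

/* Pointer form: roughly what strength reduction turns the loop into,
   with the index replaced by a pointer that is bumped each step and
   compared against an end pointer.  chain_pointer is a hypothetical name. */
unsigned int chain_pointer(unsigned int *state, int n, const unsigned int *S,
                           unsigned int t)
{
    for (unsigned int *p = state, *end = state + n; p != end; p++) {
        t = S[t] ^ *p;
        *p = t;
    }
    return t;
}
```

Both forms need the same number of live values here, which matches the remark that the reduced version still fits into registers.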



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (7 preceding siblings ...)
  2005-06-02  8:01 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
@ 2005-06-06  7:16 ` steven at gcc dot gnu dot org
  2005-06-06  7:30 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: steven at gcc dot gnu dot org @ 2005-06-06  7:16 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2005-06-06 07:16 -------
Could L1 icache blow-out be the reason? 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (8 preceding siblings ...)
  2005-06-06  7:16 ` steven at gcc dot gnu dot org
@ 2005-06-06  7:30 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
  2005-06-06 13:33 ` giovannibajo at libero dot it
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-06  7:30 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz  2005-06-06 07:30 -------
Subject: Re:  openssl is slower when compiled with gcc 4.0 than 3.3

> Could L1 icache blow-out be the reason? 

This is not likely with the minimized example.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (9 preceding siblings ...)
  2005-06-06  7:30 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
@ 2005-06-06 13:33 ` giovannibajo at libero dot it
  2005-06-06 14:40 ` giovannibajo at libero dot it
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: giovannibajo at libero dot it @ 2005-06-06 13:33 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From giovannibajo at libero dot it  2005-06-06 13:33 -------
Hmm, at this point I no longer believe that the loop I posted is the cause 
of the regression. Maybe the regression is somewhere else. I'll investigate.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (10 preceding siblings ...)
  2005-06-06 13:33 ` giovannibajo at libero dot it
@ 2005-06-06 14:40 ` giovannibajo at libero dot it
  2005-06-06 15:00 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: giovannibajo at libero dot it @ 2005-06-06 14:40 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From giovannibajo at libero dot it  2005-06-06 14:40 -------
Looks like the culprit is this:

=========================================================================
static unsigned int S[256];
unsigned
md2_block (unsigned int *sp1, unsigned int *sp2, const unsigned char *d)
{
	register unsigned int t;
	register int i, j;
	static unsigned int state[48];

	j = sp2[16 - 1];
	for (i = 0; i < 16; i++)
	{
		state[i] = sp1[i];
		state[i + 16] = t = d[i];
		state[i + 32] = (t ^ sp1[i]);
		j = sp2[i] ^= S[t ^ j];
	}
}
=========================================================================


gcc 3.4.3 -fPIC -O2:
===================================================
.L5:
	movl	8(%ebp), %esi
	movl	(%esi,%ecx,4), %eax
	movl	%eax, state.0@GOTOFF(%ebx,%ecx,4)
	movl	16(%ebp), %edx
	movzbl	(%edx,%ecx), %eax
	movl	%eax, 64+state.0@GOTOFF(%ebx,%ecx,4)
	movl	(%esi,%ecx,4), %edx
	xorl	%eax, %edx
	movl	-16(%ebp), %esi
	xorl	-20(%ebp), %eax
	movl	%edx, 128+state.0@GOTOFF(%ebx,%ecx,4)
	movl	(%esi,%eax,4), %eax
	xorl	(%edi,%ecx,4), %eax
	movl	%eax, (%edi,%ecx,4)
	incl	%ecx
	cmpl	$15, %ecx
	movl	%eax, -20(%ebp)
	jle	.L5
===================================================



gcc 4.1.0 20050529 -fPIC -O2:
===================================================
.L2:
	movl	8(%ebp), %eax
	leal	0(,%edi,4), %ecx
	movl	%ecx, -28(%ebp)
	addl	%ecx, %eax
	movl	16(%ebp), %ecx
	movl	%eax, %edx
	movl	%eax, -24(%ebp)
	movl	-4(%eax), %eax
	movl	%eax, (%esi)
	movzbl	-1(%ecx,%edi), %eax
	incl	%edi
	movl	%eax, 64(%esi)
	movl	-4(%edx), %ecx
	movl	12(%ebp), %edx
	xorl	%eax, %ecx
	movl	%ecx, 128(%esi)
	movl	-28(%ebp), %ecx
	addl	$4, %esi
	addl	%edx, %ecx
	movl	-16(%ebp), %edx
	xorl	%edx, %eax
	movl	-20(%ebp), %edx
	movl	(%edx,%eax,4), %eax
	movl	-4(%ecx), %edx
	xorl	%edx, %eax
	cmpl	$17, %edi
	movl	%eax, -4(%ecx)
	movl	%eax, -16(%ebp)
	jne	.L2
===================================================


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (11 preceding siblings ...)
  2005-06-06 14:40 ` giovannibajo at libero dot it
@ 2005-06-06 15:00 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
  2005-06-08 13:15 ` [Bug target/19923] [4.0/4.1 Regression] " giovannibajo at libero dot it
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-06 15:00 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz  2005-06-06 15:00 -------
Subject: Re:  openssl is slower when compiled with gcc 4.0 than 3.3

> Looks like the culprit is this:
> 
> =========================================================================
> static unsigned int S[256];
> unsigned
> md2_block (unsigned int *sp1, unsigned int *sp2, const unsigned char *d)
> {
> 	register unsigned int t;
> 	register int i, j;
> 	static unsigned int state[48];
> 
> 	j = sp2[16 - 1];
> 	for (i = 0; i < 16; i++)
> 	{
> 		state[i] = sp1[i];
> 		state[i + 16] = t = d[i];
> 		state[i + 32] = (t ^ sp1[i]);
> 		j = sp2[i] ^= S[t ^ j];
> 	}
> }
> =========================================================================

with the TARGET_MEM_REFs patch the result is much better.  At
least we avoid the multiplication by 4

> 	leal	0(,%edi,4), %ecx

and the other results of DOM's misoptimization of addressing modes, which
was one of the main motivations for TARGET_MEM_REFs.

We still use one more iv than in the 3.4 case, and as a result we need one
more register.

.L2:
        movl    8(%ebp), %edi
        movl    -4(%edi,%ecx,4), %eax
        movl    %eax, (%esi)
        movl    16(%ebp), %edx
        movzbl  -1(%ecx,%edx), %eax
        movl    %eax, 64(%esi)
        movl    -4(%edi,%ecx,4), %edx
        xorl    %eax, %edx
        movl    %edx, 128(%esi)
        xorl    -20(%ebp), %eax
        movl    -16(%ebp), %edi
        movl    (%edi,%eax,4), %eax
        movl    12(%ebp), %edx
        xorl    -4(%edx,%ecx,4), %eax
        movl    %eax, -4(%edx,%ecx,4)
        movl    %eax, -20(%ebp)
        incl    %ecx
        addl    $4, %esi
        cmpl    $17, %ecx
        jne     .L2
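
To make the induction-variable point concrete, here is a minimal C sketch
of the two loop shapes.  It is not the code GCC generates, only an
illustration.  md2_loop_indexed addresses everything as base + i, much like
the 3.4 output; md2_loop_pointer_iv keeps a second induction variable st
that walks state[], much like %esi in the listing above.  The function
names, the explicit state and S parameters, and the & 0xFF mask (added only
so that arbitrary inputs stay inside S) are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum { MD2_WORDS = 16 };

/* Indexed shape: a single induction variable i, every access is base + i. */
static void md2_loop_indexed(unsigned int state[48], const unsigned int *sp1,
                             unsigned int *sp2, const unsigned char *d,
                             const unsigned int S[256])
{
    assert(state != NULL && sp1 != NULL && sp2 != NULL && d != NULL && S != NULL);
    unsigned int t;
    unsigned int j = sp2[MD2_WORDS - 1];
    for (int i = 0; i < MD2_WORDS; i++) {
        state[i] = sp1[i];
        state[i + 16] = t = d[i];
        state[i + 32] = t ^ sp1[i];
        /* Mask is an assumption of this sketch, to keep the index in bounds. */
        j = sp2[i] ^= S[(t ^ j) & 0xFF];
    }
}

/* Two-iv shape: i still indexes sp1/sp2/d, while st walks state[]
   separately, so one more value stays live across iterations. */
static void md2_loop_pointer_iv(unsigned int state[48], const unsigned int *sp1,
                                unsigned int *sp2, const unsigned char *d,
                                const unsigned int S[256])
{
    assert(state != NULL && sp1 != NULL && sp2 != NULL && d != NULL && S != NULL);
    unsigned int t;
    unsigned int j = sp2[MD2_WORDS - 1];
    unsigned int *st = state;              /* second induction variable */
    for (int i = 0; i < MD2_WORDS; i++, st++) {
        st[0] = sp1[i];
        st[16] = t = d[i];
        st[32] = t ^ sp1[i];
        j = sp2[i] ^= S[(t ^ j) & 0xFF];
    }
}
```

Both functions compute the same values; they differ only in how many
variables have to stay live across iterations, which is what costs the
extra register on x86.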


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (12 preceding siblings ...)
  2005-06-06 15:00 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
@ 2005-06-08 13:15 ` giovannibajo at libero dot it
  2005-06-17  0:59 ` dank at kegel dot com
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: giovannibajo at libero dot it @ 2005-06-08 13:15 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|openssl is slower when      |[4.0/4.1 Regression] openssl
                   |compiled with gcc 4.0 than  |is slower when compiled with
                   |3.3                         |gcc 4.0 than 3.3
   Target Milestone|---                         |4.0.2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (13 preceding siblings ...)
  2005-06-08 13:15 ` [Bug target/19923] [4.0/4.1 Regression] " giovannibajo at libero dot it
@ 2005-06-17  0:59 ` dank at kegel dot com
  2005-06-17  1:11 ` pinskia at gcc dot gnu dot org
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: dank at kegel dot com @ 2005-06-17  0:59 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From dank at kegel dot com  2005-06-17 00:59 -------
We're learning more about this bug.
Anthony Danalis has boiled down the testcase much further;
I'll attach the reduced testcase as foo4.i.

It looks like it shows up if your /proc/cpuinfo says

vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping        : 9
cpu MHz         : 2793.051
cache size      : 512 KB

but not if your /proc/cpuinfo says
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping        : 1
cpu MHz         : 3200.255
cache size      : 1024 KB

But here's the fun part: on the newer CPU with the bigger
cache, gcc-2.95.3 was just as slow as gcc-3.4.3/gcc-4.0.0.
Go figure.

We'll add more details once we've got more info.
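
For scripting this kind of comparison across machines, the two fields that
distinguish the CPUs above can be pulled out of cpuinfo-style text with a
few lines of C.  This is only a sketch: parse_cpu_family_model is a
hypothetical helper, and it assumes the "cpu family" and "model" line
layout shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract "cpu family" and "model" from cpuinfo-style
   text.  Returns 1 if both fields were found, 0 otherwise. */
static int parse_cpu_family_model(const char *text, int *family, int *model)
{
    assert(text != NULL && family != NULL && model != NULL);
    int have_family = 0, have_model = 0;
    const char *line = text;

    while (line != NULL && *line != '\0') {
        int value;
        /* Whitespace in a scanf format matches any run of blanks or tabs.
           "model name : ..." fails to match "model : %d" at the colon. */
        if (!have_family && sscanf(line, "cpu family : %d", &value) == 1) {
            *family = value;
            have_family = 1;
        } else if (!have_model && sscanf(line, "model : %d", &value) == 1) {
            *model = value;
            have_model = 1;
        }
        const char *nl = strchr(line, '\n');
        line = (nl != NULL) ? nl + 1 : NULL;   /* advance to the next line */
    }
    return have_family && have_model;
}
```

Fed the first excerpt above it yields family 15, model 2; fed the second,
family 15, model 4.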



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (14 preceding siblings ...)
  2005-06-17  0:59 ` dank at kegel dot com
@ 2005-06-17  1:11 ` pinskia at gcc dot gnu dot org
  2005-06-18  6:24 ` dank at kegel dot com
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-06-17  1:11 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-06-17 01:10 -------
(In reply to comment #14)
> We're learning more about this bug.
> Anthony Danalis has boiled down the testcase much further;
> I'll attach the reduced testcase as foo4.i.

You know what the difference between those two is?  The second one is not
really a P4 but a new core: Intel marketing at its best.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (15 preceding siblings ...)
  2005-06-17  1:11 ` pinskia at gcc dot gnu dot org
@ 2005-06-18  6:24 ` dank at kegel dot com
  2005-06-18  6:39 ` dank at kegel dot com
                   ` (11 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: dank at kegel dot com @ 2005-06-18  6:24 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From dank at kegel dot com  2005-06-18 06:24 -------
Looks to me like gcc-3.4.3 is known to fail, too, depending on the CPU.
Anthony Danalis and I came up with a little script to run foo4.i
on various processors with various values for -mtune, which I'll
attach; here are the results for four different x86 variants.

The last two columns are the time on gcc-3.4.3 and gcc-4.0.0
divided by the time on gcc-2.95.3, so any value above 1.0 in
the last column is a performance regression.  
Rows are sorted by the last column.  The first five
rows represent performance regressions for gcc-3.4.3;
the first three also represent performance regressions
for gcc-4.0.0.

family,model,name               pic?  tune      [t_295, t_343, t_400]  [t_295/t_295, t_343/t_295, t_400/t_295]

6,8, Pentium III (Coppermine),  -fPIC athlon-xp [9.25, 16.22, 18.79]  [1.00, 1.75, 2.03]
15,2, Xeon(TM) CPU 2.60GHz,     -fPIC pentium4  [1.91, 3.89, 3.27]    [1.00, 2.04, 1.71]
6,8, Pentium III (Coppermine),  -fPIC pentium3  [9.15, 10.10, 13.20]  [1.00, 1.10, 1.44]
15,2, Xeon(TM) CPU 2.60GHz,     -fPIC athlon-xp [1.91, 2.00, 1.95]    [1.00, 1.05, 1.02]
6,8, Pentium III (Coppermine),  -fPIC pentium4  [9.27, 10.49,  8.87]  [1.00, 1.13, 0.96]

--- ok below this line ---

6,8, Pentium III (Coppermine),        pentium4  [14.74, 13.71, 14.12] [1.00, 0.93, 0.96]
15,4, Athlon(tm) 64 3000+,      -fPIC pentium4  [4.12, 3.68, 3.74]    [1.00, 0.89, 0.91]
15,4, Pentium(R) 4 CPU 3.20GHz, -fPIC pentium4  [2.48, 2.18, 2.09]    [1.00, 0.88, 0.84]
15,4, Athlon(tm) 64 3000+,      -fPIC athlon-xp [4.12, 3.50, 3.20]    [1.00, 0.85, 0.78]
15,4, Pentium(R) 4 CPU 3.20GHz,       pentium4  [2.17, 1.07, 1.07]    [1.00, 0.49, 0.49]
6,8, Pentium III (Coppermine),        pentium3  [14.22,  6.26,  6.46] [1.00, 0.44, 0.45]
6,8, Pentium III (Coppermine),        athlon-xp [14.93,  6.26,  6.27] [1.00, 0.42, 0.42]
15,4, Athlon(tm) 64 3000+,            pentium4  [3.65, 1.39, 1.39]    [1.00, 0.38, 0.38]
15,4, Athlon(tm) 64 3000+,            athlon-xp [3.65, 1.39, 1.40]    [1.00, 0.38, 0.38]
15,2, Xeon(TM) CPU 2.60GHz,           pentium4  [6.42, 0.97, 0.98]    [1.00, 0.15, 0.15]


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (16 preceding siblings ...)
  2005-06-18  6:24 ` dank at kegel dot com
@ 2005-06-18  6:39 ` dank at kegel dot com
  2005-06-18 17:47 ` dank at kegel dot com
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: dank at kegel dot com @ 2005-06-18  6:39 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From dank at kegel dot com  2005-06-18 06:38 -------
To be clear, here are the two most worrying rows from the above table,
expanded a bit.  These are the runtimes of foo4.i in seconds.
The cpu family, model, and name are as shown by /proc/cpuinfo.

cpu family 15, model 2, Intel(R) Xeon(TM) CPU 2.60GHz:
-fPIC -mtune=pentium4 -O3 
gcc-2.95.3: 1.91 seconds
gcc-3.4.3:  3.89
gcc-4.0.0:  3.27

cpu family 6, model 8, Pentium III (Coppermine)
-fPIC -mtune=pentium3 -O3
gcc-2.95.3: 9.15
gcc-3.4.3: 10.10
gcc-4.0.0: 13.20

gcc-4.0.0 produces code that runs 1.7 and 1.4 times slower than gcc-2.95.3
on these (fairly common!) cpus, even when the proper -mtune is used.
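
The factors quoted here are simply the newer compiler's runtime divided by
the gcc-2.95.3 runtime.  A tiny C sketch of that arithmetic (slowdown is a
hypothetical helper name, not part of the test script):

```c
#include <assert.h>

/* Runtime of the newer compiler's code relative to the baseline;
   a result above 1.0 means the newer compiler's code is slower. */
static double slowdown(double t_new, double t_baseline)
{
    assert(t_baseline > 0.0);
    return t_new / t_baseline;
}
```

With the numbers above, slowdown(3.27, 1.91) is about 1.71 for the Xeon and
slowdown(13.20, 9.15) is about 1.44 for the Pentium III.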


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (17 preceding siblings ...)
  2005-06-18  6:39 ` dank at kegel dot com
@ 2005-06-18 17:47 ` dank at kegel dot com
  2005-06-18 22:46 ` dank at kegel dot com
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: dank at kegel dot com @ 2005-06-18 17:47 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From dank at kegel dot com  2005-06-18 17:46 -------
The above tests did not use -mcpu on gcc-2.95.3,
so they were comparing apples to oranges, kind of.

I reran them on a PIII with gcc-2.95.3 -mcpu=$tune -O3 
and gcc-[34] -mtune=$tune -O3.  The problem persists
even when using the most appropriate tuning option
for the CPU in question.

cpu family 6,model 8, Pentium III (Coppermine):
-fPIC -mcpu=pentium -O3 
gcc-2.95.3: 7.61
gcc-3.4.3: 27.43
gcc-4.0.0: 17.57

cpu family 6,model 8, Pentium III (Coppermine):
-fPIC -mcpu=pentiumpro -O3
gcc-2.95.3: 9.27
gcc-3.4.3: 10.09
gcc-4.0.0: 13.96

cpu family 15, model 2, Intel(R) Xeon(TM) CPU 2.60GHz:
-fPIC -mtune=pentium4 -O3 
gcc-2.95.3: 1.91 seconds
gcc-3.4.3:  3.89
gcc-4.0.0:  3.27



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (18 preceding siblings ...)
  2005-06-18 17:47 ` dank at kegel dot com
@ 2005-06-18 22:46 ` dank at kegel dot com
  2005-06-24 15:00 ` dank at kegel dot com
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: dank at kegel dot com @ 2005-06-18 22:46 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From dank at kegel dot com  2005-06-18 22:45 -------
I asked the fellow who posted the original problem report to give
me the results of 'cat /proc/cpuinfo' on the affected machine.
Here it is:

vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 10
cpu MHz         : 896.153

This is the same as one of the two affected CPU types here.

The slow routine appears to be the buffer cleaning routine,
though I haven't verified this with oprofile yet.
Here's its loop:
static char cleanse_ctr;
...
    while (len--) {
        *(ptr++) = cleanse_ctr;
        cleanse_ctr += (17 + (unsigned char) ((int) ptr & 0xF));
    }
and the output of -O3 -fPIC for both gcc-2.95.3 and gcc-4.0.0:

--- gcc-2.95.3 ---
.L5:    
        movl cleanse_ctr@GOT(%ebx),%edi
        movb (%edi),%al
        movb %al,(%edx)
        incl %edx
        movb (%edi),%cl
        addb $17,%cl
        movb %dl,%al
        andb $15,%al
        addb %al,%cl
        movb %cl,(%edi)
        subl $1,%esi
        jnc .L5
.L4:

--- gcc-4 ---    
.L4:    
        movb    (%esi), %al
        movb    %al, (%edx)
        leal    (%ecx,%edi), %eax
        andl    $15, %eax
        incl    %ecx
        addb    (%esi), %al
        incl    %edx
        addl    $17, %eax
        cmpl    %ecx, 12(%ebp)
        movb    %al, (%esi)
        jne     .L4

It's not obvious to me why the gcc-4.0.0 generated code
should be slower when run on some CPUs, if in fact it is.
Is it the fact that the loop condition is checked with
a cmp against memory instead of a flag being set by subtracting
1 from a register?

(And where's the best place to learn about how to predict
how long assembly snippets like this will take to run
on various modern CPUs, anyway?)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (19 preceding siblings ...)
  2005-06-18 22:46 ` dank at kegel dot com
@ 2005-06-24 15:00 ` dank at kegel dot com
  2005-06-24 15:01 ` dank at kegel dot com
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: dank at kegel dot com @ 2005-06-24 15:00 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From dank at kegel dot com  2005-06-24 15:00 -------
Michael Meissner looked at the code, and saw that
gcc-2.95.3 converts the loop to a countdown loop,
but gcc-3.x doesn't, which wastes a precious register.
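
In source terms, the two loop shapes look roughly like the sketch below.
It is an illustration only: the function names are made up, the counter is
passed as a parameter instead of living in a global, and uintptr_t replaces
the original (int) cast so that the code also builds cleanly on 64-bit hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count-up shape: a separate counter i is compared against len on every
   iteration, so i and len both have to stay live inside the loop. */
static unsigned char fill_count_up(unsigned char *ptr, size_t len,
                                   unsigned char ctr)
{
    assert(ptr != NULL || len == 0);
    for (size_t i = 0; i != len; i++) {
        *(ptr++) = ctr;
        ctr += (unsigned char)(17 + ((uintptr_t)ptr & 0xF));
    }
    return ctr;
}

/* Countdown shape: len itself is decremented and tested, so only one
   value is needed to control the loop. */
static unsigned char fill_count_down(unsigned char *ptr, size_t len,
                                     unsigned char ctr)
{
    assert(ptr != NULL || len == 0);
    while (len--) {
        *(ptr++) = ctr;
        ctr += (unsigned char)(17 + ((uintptr_t)ptr & 0xF));
    }
    return ctr;
}
```

Both functions write the same bytes to the same buffer and return the same
final counter; the countdown form just needs one register less to control
the loop.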

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (20 preceding siblings ...)
  2005-06-24 15:00 ` dank at kegel dot com
@ 2005-06-24 15:01 ` dank at kegel dot com
  2005-06-24 15:53 ` steven at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: dank at kegel dot com @ 2005-06-24 15:01 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From dank at kegel dot com  2005-06-24 15:01 -------
And, for what it's worth, the latest 4.1 snapshot also suffers from this.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (21 preceding siblings ...)
  2005-06-24 15:01 ` dank at kegel dot com
@ 2005-06-24 15:53 ` steven at gcc dot gnu dot org
  2005-06-24 16:24 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: steven at gcc dot gnu dot org @ 2005-06-24 15:53 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2005-06-24 15:53 -------
I don't see how the precious register would matter much.  But this compare 
with memory is strange: 
 
        cmpl    %ecx, 12(%ebp) 
 
Why isn't len loaded into a register?? 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (22 preceding siblings ...)
  2005-06-24 15:53 ` steven at gcc dot gnu dot org
@ 2005-06-24 16:24 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
  2005-06-24 17:41 ` dann at godzilla dot ics dot uci dot edu
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-24 16:24 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz  2005-06-24 16:24 -------
Subject: Re:  [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3

> I don't see how the precious register would matter much.  But this compare 
> with memory is strange: 
>  
>         cmpl    %ecx, 12(%ebp) 
>  
> Why isn't len loaded into a register?? 

You answer your own question -- because there is no register free;
that's why the precious register matters that much.

(I guess; I may be wrong).

Zdenek


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (23 preceding siblings ...)
  2005-06-24 16:24 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
@ 2005-06-24 17:41 ` dann at godzilla dot ics dot uci dot edu
  2005-06-25  2:49 ` rakdver at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: dann at godzilla dot ics dot uci dot edu @ 2005-06-24 17:41 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From dann at godzilla dot ics dot uci dot edu  2005-06-24 17:41 -------
(In reply to comment #21)

> The slow routine appears to be the buffer cleaning routine,
> though I haven't verified this with oprofile yet.
> Here's its loop:
> static char cleanse_ctr;
> ...
>     while (len--) {
>         *(ptr++) = cleanse_ctr;
>         cleanse_ctr += (17 + (unsigned char) ((int) ptr & 0xF));
>     }

[Not entirely related, but..] There's one obvious way to improve this loop. 
The compiler cannot prove that the write *(ptr++) does not alias the global
variable cleanse_ctr, so it has to re-read it from memory in each iteration.
To avoid the extra memory read just do something like:

void OPENSSL_cleanse(unsigned char *ptr, unsigned int len)
{
  unsigned char local_cleanse_ctr = cleanse_ctr;
  while (len--) {
        *(ptr++) = local_cleanse_ctr;
        local_cleanse_ctr += (17 + (unsigned char) ((int) ptr & 0xF));
    }
  local_cleanse_ctr += 63;
  cleanse_ctr = local_cleanse_ctr;  
}
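
A quick way to check that such a rewrite preserves behavior, as long as the
buffer does not overlap the counter, is to run both shapes on the same
buffer and compare.  The sketch below is an assumption-laden stand-in, not
OpenSSL's code: cleanse_global and cleanse_local are made-up names, the
counter is declared unsigned char, and uintptr_t replaces the (int) cast so
that it builds on 64-bit hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static unsigned char cleanse_ctr;   /* stand-in for the global counter */

/* Shape from the report: the counter is read from and written back to
   memory on every iteration, because *ptr might alias it. */
static void cleanse_global(unsigned char *ptr, size_t len)
{
    assert(ptr != NULL || len == 0);
    while (len--) {
        *(ptr++) = cleanse_ctr;
        cleanse_ctr += (unsigned char)(17 + ((uintptr_t)ptr & 0xF));
    }
    cleanse_ctr += 63;
}

/* Rewritten shape: the counter lives in a local for the whole loop and is
   stored back once at the end. */
static void cleanse_local(unsigned char *ptr, size_t len)
{
    assert(ptr != NULL || len == 0);
    unsigned char local = cleanse_ctr;
    while (len--) {
        *(ptr++) = local;
        local += (unsigned char)(17 + ((uintptr_t)ptr & 0xF));
    }
    local += 63;
    cleanse_ctr = local;
}
```

Starting from the same counter value and the same buffer, both functions
leave identical bytes in the buffer and the same final cleanse_ctr.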


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (24 preceding siblings ...)
  2005-06-24 17:41 ` dann at godzilla dot ics dot uci dot edu
@ 2005-06-25  2:49 ` rakdver at gcc dot gnu dot org
  2005-06-25 10:15 ` steven at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: rakdver at gcc dot gnu dot org @ 2005-06-25  2:49 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From rakdver at gcc dot gnu dot org  2005-06-25 02:49 -------
Ivopts seems to make several quite doubtful decisions in this testcase.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rakdver at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (25 preceding siblings ...)
  2005-06-25  2:49 ` rakdver at gcc dot gnu dot org
@ 2005-06-25 10:15 ` steven at gcc dot gnu dot org
  2005-06-25 11:32 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
  2005-09-27 15:57 ` mmitchel at gcc dot gnu dot org
  28 siblings, 0 replies; 30+ messages in thread
From: steven at gcc dot gnu dot org @ 2005-06-25 10:15 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2005-06-25 10:15 -------
Re. comment #25, as far as I can tell there are registers available in 
that loop.  To quote the loop from comment #12: 
 
.L4:     
        movb    (%esi), %al 
        movb    %al, (%edx) 
        leal    (%ecx,%edi), %eax 
        andl    $15, %eax 
        incl    %ecx 
        addb    (%esi), %al 
        incl    %edx 
        addl    $17, %eax 
        cmpl    %ecx, 12(%ebp) 
        movb    %al, (%esi) 
        jne     .L4 
 
Checking off used registers in this loop: 
%esi x 
%edi x 
%eax x 
%ebx 
%ecx x 
%edx x 
 
So %ebx at least is free (and iiuc, with -fomit-frame-pointer %ebp is 
also free, right?).  Maybe the allocator thinks %ebx can't be used 
because it is the PIC register. 
 
Here is what mainline today ("GCC: (GNU) 4.1.0 20050625 (experimental)") 
gives me (x86-64 compiler with "-m32 -march=i686 -O3 -fPIC"): 
 
.L4: 
        movzbl  (%esi), %eax 
        movb    %al, (%ecx) 
        incl    %ecx 
        movzbl  -13(%ebp), %eax 
        movzbl  (%esi), %edx 
        incb    -13(%ebp) 
        andb    $15, %al 
        addb    $17, %dl 
        addb    %dl, %al 
        cmpl    %edi, %ecx 
        movb    %al, (%esi) 
        jne     .L4 
 
The .optimized tree dump looks like this: 
 
<bb 0>: 
  len.23 = len - 1; 
  if (len.23 != 4294967295) goto <L6>; else goto <L2>; 
 
<L6>:; 
  ivtmp.19 = (unsigned char) (signed char) (int) (ptr + 1B); 
  ptr.27 = ptr; 
 
<L0>:; 
  MEM[base: ptr.27] = cleanse_ctr; 
  ptr.27 = ptr.27 + 1B; 
  cleanse_ctr = (unsigned char) (((signed char) ivtmp.19 & 15) 
                                 + (signed char) cleanse_ctr + 17); 
  ivtmp.19 = ivtmp.19 + 1; 
  if (ptr.27 != (unsigned char *) (ptr + (void *) len.23 + 1B)) goto <L0>; else goto <L2>; 
 
<L2>:; 
  cleanse_ctr = (unsigned char) ((signed char) cleanse_ctr + 63); 
  return; 
 
Note how the loop test is against ptr.  Also, as far as I can tell the 
right hand side of the test (i.e. "(ptr + (void *) len.23 + 1B)") is loop 
invariant and should have been moved out.  And the first two lines are 
also just weird, it is probably cheaper on almost any machine to do 
 
  len.23 = len; 
  if (len.23 != 0) goto <L6>; else goto <L2>; 
 
<L6>: 
  len.23 = len.23 - 1; 
  (etc...) 
 
In summary, we just produce crap code here ;-) 
 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (26 preceding siblings ...)
  2005-06-25 10:15 ` steven at gcc dot gnu dot org
@ 2005-06-25 11:32 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
  2005-09-27 15:57 ` mmitchel at gcc dot gnu dot org
  28 siblings, 0 replies; 30+ messages in thread
From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-25 11:32 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz  2005-06-25 11:32 -------
Subject: Re:  [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3

> ------- Additional Comments From steven at gcc dot gnu dot org  2005-06-25 10:15 -------
> Re. comment #25, as far as I can tell there are registers available in 
> that loop.  To quote the loop from comment #12: 
>  
> .L4:     
>         movb    (%esi), %al 
>         movb    %al, (%edx) 
>         leal    (%ecx,%edi), %eax 
>         andl    $15, %eax 
>         incl    %ecx 
>         addb    (%esi), %al 
>         incl    %edx 
>         addl    $17, %eax 
>         cmpl    %ecx, 12(%ebp) 
>         movb    %al, (%esi) 
>         jne     .L4 
>  
> Checking off used registers in this loop: 
> %esi x 
> %edi x 
> %eax x 
> %ebx 
> %ecx x 
> %edx x 
>  
> So %ebx at least is free (and iiuc, with -fomit-frame-pointer %ebp is 
> also free, right?).  Maybe the allocator thinks %ebx can't be used 
> because it is the PIC register. 

Yes, %ebx cannot be used because of PIC, and -fomit-frame-pointer is off
by default.

> Here is what mainline today ("GCC: (GNU) 4.1.0 20050625 (experimental)") 
> gives me (x86-64 compiler with "-m32 -march=i686 -O3 -fPIC"): 
>  
> .L4: 
>         movzbl  (%esi), %eax 
>         movb    %al, (%ecx) 
>         incl    %ecx 
>         movzbl  -13(%ebp), %eax 
>         movzbl  (%esi), %edx 
>         incb    -13(%ebp) 
>         andb    $15, %al 
>         addb    $17, %dl 
>         addb    %dl, %al 
>         cmpl    %edi, %ecx 
>         movb    %al, (%esi) 
>         jne     .L4 
>  
> The .optimized tree dump looks like this: 
>  
> <bb 0>: 
>   len.23 = len - 1; 
>   if (len.23 != 4294967295) goto <L6>; else goto <L2>; 

> And the first two lines are 
> also just weird, it is probably cheaper on almost any machine to do 
>   len.23 = len; 
>   if (len.23 != 0) goto <L6>; else goto <L2>; 
>  
> <L6>: 
>   len.23 = len.23 - 1; 
>   (etc...) 

Not really.  On i686, there should be no difference.
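
Whatever the relative cost, the two guards at least accept exactly the same
inputs when len is a 32-bit unsigned value.  A small C sketch of that
equivalence (the function names are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Guard as it appears in the .optimized dump: decrement first, then
   compare against 2^32 - 1. */
static int guard_dump_form(uint32_t len)
{
    uint32_t len_23 = len - 1u;      /* wraps to 4294967295 when len == 0 */
    return len_23 != UINT32_C(4294967295);
}

/* Guard as proposed: test len against zero, decrement afterwards. */
static int guard_proposed_form(uint32_t len)
{
    return len != 0u;
}

/* Aborts if the two guards ever disagree for the given length. */
static void check_guards_agree(uint32_t len)
{
    assert(guard_dump_form(len) == guard_proposed_form(len));
}
```

Only len == 0 makes the decrement wrap around to 4294967295, so both forms
skip the loop for an empty buffer and enter it for everything else.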


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
  2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
                   ` (27 preceding siblings ...)
  2005-06-25 11:32 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
@ 2005-09-27 15:57 ` mmitchel at gcc dot gnu dot org
  28 siblings, 0 replies; 30+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-09-27 15:57 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.0.2                       |4.0.3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923


Thread overview: 30+ messages
2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
2005-02-12 20:11 ` [Bug target/19923] " pinskia at gcc dot gnu dot org
2005-02-13  6:44 ` pinskia at gcc dot gnu dot org
2005-05-14 20:36 ` pinskia at gcc dot gnu dot org
2005-06-01 20:47 ` yx at cs dot ucla dot edu
2005-06-01 20:55 ` pinskia at gcc dot gnu dot org
2005-06-01 22:55 ` giovannibajo at libero dot it
2005-06-01 22:59 ` giovannibajo at libero dot it
2005-06-02  8:01 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
2005-06-06  7:16 ` steven at gcc dot gnu dot org
2005-06-06  7:30 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
2005-06-06 13:33 ` giovannibajo at libero dot it
2005-06-06 14:40 ` giovannibajo at libero dot it
2005-06-06 15:00 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
2005-06-08 13:15 ` [Bug target/19923] [4.0/4.1 Regression] " giovannibajo at libero dot it
2005-06-17  0:59 ` dank at kegel dot com
2005-06-17  1:11 ` pinskia at gcc dot gnu dot org
2005-06-18  6:24 ` dank at kegel dot com
2005-06-18  6:39 ` dank at kegel dot com
2005-06-18 17:47 ` dank at kegel dot com
2005-06-18 22:46 ` dank at kegel dot com
2005-06-24 15:00 ` dank at kegel dot com
2005-06-24 15:01 ` dank at kegel dot com
2005-06-24 15:53 ` steven at gcc dot gnu dot org
2005-06-24 16:24 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
2005-06-24 17:41 ` dann at godzilla dot ics dot uci dot edu
2005-06-25  2:49 ` rakdver at gcc dot gnu dot org
2005-06-25 10:15 ` steven at gcc dot gnu dot org
2005-06-25 11:32 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
2005-09-27 15:57 ` mmitchel at gcc dot gnu dot org
