* [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3
@ 2005-02-12 20:07 gj at pointblue dot com dot pl
2005-02-12 20:11 ` [Bug target/19923] " pinskia at gcc dot gnu dot org
` (28 more replies)
0 siblings, 29 replies; 30+ messages in thread
From: gj at pointblue dot com dot pl @ 2005-02-12 20:07 UTC (permalink / raw)
To: gcc-bugs
Here's the openssl speed result when it's compiled with 3.3 (original Debian
unstable package):
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial)
blowfish(idx)
compiler: gcc -fPIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H
-DOPENSSL_NO_KRB5 -DOPENSSL_NO_IDEA -DOPENSSL_NO_MDC2 -DOPENSSL_NO_RC5
-DL_ENDIAN -DTERMIO -O3 -march=i686 -mcpu=i686 -fomit-frame-pointer -Wall
-DSHA1_ASM -DMD5_ASM -DRMD160_ASM
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
md2 510.80k 1064.79k 1486.96k 1641.83k 1702.87k
mdc2 0.00 0.00 0.00 0.00 0.00
md4 4999.47k 17746.97k 51392.88k 97451.59k 131711.89k
md5 4405.95k 15208.16k 43027.34k 77946.11k 101040.96k
hmac(md5) 4951.58k 16851.67k 46126.90k 81002.65k 101700.77k
sha1 3892.54k 12223.89k 29586.19k 45767.99k 54082.03k
rmd160 3715.14k 10397.52k 23079.49k 33148.87k 37651.83k
rc4 58941.98k 66899.63k 71733.39k 72572.54k 72476.92k
des cbc 13353.92k 13897.80k 14067.26k 14088.53k 14107.61k
des ede3 4887.63k 5039.28k 5083.63k 5116.70k 5086.58k
idea cbc 0.00 0.00 0.00 0.00 0.00
rc2 cbc 5257.37k 5534.13k 5560.97k 5610.12k 5582.42k
rc5-32/12 cbc 0.00 0.00 0.00 0.00 0.00
blowfish cbc 21054.83k 22340.34k 22704.49k 22895.90k 22860.91k
cast cbc 14478.39k 15882.31k 16400.99k 16570.03k 16585.01k
aes-128 cbc 13612.33k 14364.39k 14382.68k 14404.12k 14440.26k
aes-192 cbc 12075.70k 12370.43k 12530.49k 12518.63k 12559.92k
aes-256 cbc 10806.91k 11093.65k 11179.27k 11185.67k 11205.97k
sign verify sign/s verify/s
rsa 512 bits 0.0023s 0.0002s 438.5 4928.2
rsa 1024 bits 0.0109s 0.0006s 91.6 1746.1
rsa 2048 bits 0.0646s 0.0019s 15.5 527.6
rsa 4096 bits 0.4317s 0.0066s 2.3 152.0
sign verify sign/s verify/s
dsa 512 bits 0.0018s 0.0022s 546.0 460.7
dsa 1024 bits 0.0054s 0.0065s 186.6 154.8
dsa 2048 bits 0.0179s 0.0220s 55.7 45.5
and here's the same package compiled with gcc 4.0,
gcc-4.0 (GCC) 4.0.0 20050212 (experimental)
compiler: gcc -fPIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H
-DOPENSSL_NO_KRB5 -DOPENSSL_NO_IDEA -DOPENSSL_NO_MDC2 -DOPENSSL_NO_RC5
-DL_ENDIAN -DTERMIO -O3 -march=i686 -mcpu=i686 -fomit-frame-pointer -Wall
-DSHA1_ASM -DMD5_ASM -DRMD160_ASM
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
md2 361.81k 781.01k 1103.19k 1231.36k 1278.84k
mdc2 0.00 0.00 0.00 0.00 0.00
md4 3103.64k 11338.88k 36135.04k 79292.67k 123123.36k
md5 2758.32k 10084.74k 31863.54k 66522.25k 98860.02k
hmac(md5) 4581.08k 15784.49k 43771.66k 78227.60k 101959.42k
sha1 2638.72k 8889.12k 24063.88k 41890.99k 53462.15k
rmd160 2477.15k 7918.19k 19696.52k 31106.04k 37317.88k
rc4 60284.27k 67543.46k 71379.34k 72455.38k 72581.12k
des cbc 13547.77k 13876.64k 14049.67k 14102.25k 14020.78k
des ede3 4950.20k 5050.99k 5068.80k 5111.00k 5088.06k
idea cbc 0.00 0.00 0.00 0.00 0.00
rc2 cbc 5814.75k 6060.45k 6150.37k 6169.60k 6196.13k
rc5-32/12 cbc 0.00 0.00 0.00 0.00 0.00
blowfish cbc 20941.23k 22373.68k 22868.43k 22822.28k 23014.29k
cast cbc 12790.60k 14102.95k 14514.24k 14494.77k 14622.21k
aes-128 cbc 13030.43k 13549.49k 13653.51k 13694.85k 13696.33k
aes-192 cbc 11257.66k 11517.92k 11545.25k 11604.32k 11568.43k
aes-256 cbc 10065.01k 10296.48k 10403.82k 10332.02k 10382.25k
sign verify sign/s verify/s
rsa 512 bits 0.0024s 0.0002s 418.5 4201.7
rsa 1024 bits 0.0112s 0.0006s 89.5 1550.7
rsa 2048 bits 0.0650s 0.0020s 15.4 504.9
rsa 4096 bits 0.4311s 0.0068s 2.3 147.9
sign verify sign/s verify/s
dsa 512 bits 0.0019s 0.0023s 521.4 441.9
dsa 1024 bits 0.0055s 0.0067s 182.9 148.3
dsa 2048 bits 0.0181s 0.0222s 55.2 45.1
As you can see, almost every test is worse with 4.0.
Not sure why. The same test on ultrasparc and amd64 shows 4.0 as the clear winner.
( Although it still crashes on amd64... ;) )
--
Summary: openssl is slower when compiled with gcc 4.0 than 3.3
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: gj at pointblue dot com dot pl
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: i86
GCC host triplet: i86
GCC target triplet: i86
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
From: pinskia at gcc dot gnu dot org @ 2005-02-12 20:11 UTC (permalink / raw)
To: gcc-bugs
--
What |Removed |Added
----------------------------------------------------------------------------
Component|c |target
Keywords| |missed-optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
From: pinskia at gcc dot gnu dot org @ 2005-02-13 6:44 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2005-02-12 22:24 -------
We need a self contained example.
--
What |Removed |Added
----------------------------------------------------------------------------
CC| |pinskia at gcc dot gnu dot
| |org
Status|UNCONFIRMED |WAITING
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
From: pinskia at gcc dot gnu dot org @ 2005-05-14 20:36 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2005-05-14 20:36 -------
No feedback in 3 months.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |RESOLVED
Resolution| |INVALID
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
From: yx at cs dot ucla dot edu @ 2005-06-01 20:47 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From yx at cs dot ucla dot edu 2005-06-01 20:47 -------
When we ran 'openssl speed md2', we did see that gcc-4.0 was slower
than earlier versions, so we created a minimal test case, which
we will attach. Here is how long it took to run a 34 megabyte
file through the test program when compiled with various compilers and
options:
gcc-2.95.3 -fPIC -O1 4.940s
gcc-4.0.0 -fPIC -O1 3.510s
gcc-3.4.3 -fPIC -O1 5.190s
gcc-2.95.3 -fPIC -O2 3.470s
gcc-3.4.3 -fPIC -O2 3.460s
gcc-4.0.0 -fPIC -O2 4.050s
gcc-2.95.3 -fPIC -O3 3.400s
gcc-3.4.3 -fPIC -O3 3.740s
gcc-4.0.0 -fPIC -O3 4.010s
This test was done on a pentium 4 workstation, and no smoothing was
done on the resulting times, but they seemed to be repeatable.
We also tried without -fPIC, but did not see as large a regression there.
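For readers who want to reproduce this kind of measurement, a minimal timing sketch follows. It is not the attached test program: `lookup_pass`, `time_passes`, and the 256-entry `table` argument are hypothetical names standing in for an MD2-style chain of dependent table lookups, and `clock()` reports CPU time rather than wall-clock time.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical workload: each lookup depends on the previous result,
   as in an MD2-style substitution loop.  The index is masked so it
   always stays inside the 256-entry table. */
unsigned int
lookup_pass (const unsigned int table[256], const unsigned char *buf,
             size_t len, unsigned int seed)
{
  unsigned int t = seed & 0xff;
  size_t i;

  for (i = 0; i < len; i++)
    t = table[(t ^ buf[i]) & 0xff];
  return t;
}

/* Run `reps` passes over the buffer and return the CPU seconds used.
   Each pass is seeded with the previous result, and the final value is
   written to *sink, so the passes cannot be merged or discarded. */
double
time_passes (const unsigned int table[256], const unsigned char *buf,
             size_t len, int reps, unsigned int *sink)
{
  unsigned int acc = 0;
  clock_t start = clock ();
  int r;

  for (r = 0; r < reps; r++)
    acc = lookup_pass (table, buf, len, acc);
  *sink = acc;
  return (double) (clock () - start) / CLOCKS_PER_SEC;
}
```

Building the same file once per compiler and option set (for example with and without -fPIC) and comparing the returned times would give a table of the shape shown above.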
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
From: pinskia at gcc dot gnu dot org @ 2005-06-01 20:55 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2005-06-01 20:55 -------
I would not be surprised if this is just GCC not using the i386 addressing modes.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |UNCONFIRMED
Resolution|INVALID |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
From: giovannibajo at libero dot it @ 2005-06-01 22:55 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From giovannibajo at libero dot it 2005-06-01 22:55 -------
Confirmed. The regression appears only with -fPIC, and it's pretty evident. The
core is md2_block, the inner loop:
GCC 3.4
=============================================================
.L29:
xorl %edx, %edx
.p2align 2,,3
.L28:
movl S@GOTOFF(%ebx,%eax,4), %esi
xorl -216(%ebp,%edx,4), %esi
movl S@GOTOFF(%ebx,%esi,4), %eax
xorl -212(%ebp,%edx,4), %eax
movl S@GOTOFF(%ebx,%eax,4), %edi
xorl -208(%ebp,%edx,4), %edi
movl %esi, -216(%ebp,%edx,4)
movl S@GOTOFF(%ebx,%edi,4), %esi
xorl -204(%ebp,%edx,4), %esi
movl %eax, -212(%ebp,%edx,4)
movl S@GOTOFF(%ebx,%esi,4), %eax
xorl -200(%ebp,%edx,4), %eax
movl %edi, -208(%ebp,%edx,4)
movl S@GOTOFF(%ebx,%eax,4), %edi
xorl -196(%ebp,%edx,4), %edi
movl %esi, -204(%ebp,%edx,4)
movl S@GOTOFF(%ebx,%edi,4), %esi
xorl -192(%ebp,%edx,4), %esi
movl %eax, -200(%ebp,%edx,4)
movl S@GOTOFF(%ebx,%esi,4), %eax
xorl -188(%ebp,%edx,4), %eax
movl %edi, -196(%ebp,%edx,4)
movl %esi, -192(%ebp,%edx,4)
movl %eax, -188(%ebp,%edx,4)
addl $8, %edx
cmpl $47, %edx
jle .L28
addl %ecx, %eax
incl %ecx
andl $255, %eax
cmpl $17, %ecx
jle .L29
=============================================================
GCC 4.0
=============================================================
.L16:
movl -384(%ebp), %eax
movl -208(%ebp), %esi
incl -384(%ebp)
addl %esi, %eax
movl -456(%ebp), %esi
andl $255, %eax
movl (%edi,%eax,4), %ecx
movl -464(%ebp), %eax
xorl %ecx, %esi
movl (%edi,%esi,4), %edx
movl %esi, -368(%ebp)
movl %esi, -456(%ebp)
movl -488(%ebp), %esi
xorl %edx, %eax
movl -472(%ebp), %edx
movl (%edi,%eax,4), %ecx
movl %eax, -364(%ebp)
movl %eax, -464(%ebp)
xorl %ecx, %edx
movl -480(%ebp), %ecx
movl (%edi,%edx,4), %eax
movl %edx, -360(%ebp)
movl %edx, -472(%ebp)
xorl %eax, %ecx
movl (%edi,%ecx,4), %eax
movl %ecx, -356(%ebp)
movl %ecx, -480(%ebp)
xorl %eax, %esi
movl -496(%ebp), %eax
movl (%edi,%esi,4), %edx
movl %esi, -352(%ebp)
movl %esi, -488(%ebp)
xorl %edx, %eax
movl -504(%ebp), %edx
movl (%edi,%eax,4), %ecx
movl %eax, -348(%ebp)
movl %eax, -496(%ebp)
xorl %ecx, %edx
movl -512(%ebp), %ecx
movl (%edi,%edx,4), %eax
movl %edx, -344(%ebp)
movl %edx, -504(%ebp)
xorl %eax, %ecx
movl %ecx, -340(%ebp)
movl (%edi,%ecx,4), %eax
movl -520(%ebp), %esi
movl %ecx, -512(%ebp)
xorl %eax, %esi
movl -528(%ebp), %eax
movl (%edi,%esi,4), %edx
movl %esi, -336(%ebp)
movl %esi, -520(%ebp)
movl -552(%ebp), %esi
xorl %edx, %eax
movl -536(%ebp), %edx
movl (%edi,%eax,4), %ecx
movl %eax, -332(%ebp)
movl %eax, -528(%ebp)
xorl %ecx, %edx
movl -544(%ebp), %ecx
movl (%edi,%edx,4), %eax
movl %edx, -328(%ebp)
movl %edx, -536(%ebp)
xorl %eax, %ecx
movl (%edi,%ecx,4), %eax
movl %ecx, -324(%ebp)
movl %ecx, -544(%ebp)
xorl %eax, %esi
movl -556(%ebp), %eax
movl (%edi,%esi,4), %edx
movl %esi, -320(%ebp)
movl %esi, -552(%ebp)
movl -568(%ebp), %esi
xorl %edx, %eax
movl -560(%ebp), %edx
movl (%edi,%eax,4), %ecx
movl %eax, -316(%ebp)
movl %eax, -556(%ebp)
xorl %ecx, %edx
movl -564(%ebp), %ecx
movl (%edi,%edx,4), %eax
movl %edx, -312(%ebp)
movl %edx, -560(%ebp)
xorl %eax, %ecx
movl (%edi,%ecx,4), %eax
movl %ecx, -308(%ebp)
movl %ecx, -564(%ebp)
xorl %eax, %esi
movl %esi, -304(%ebp)
movl (%edi,%esi,4), %edx
movl -572(%ebp), %eax
movl %esi, -568(%ebp)
movl -396(%ebp), %esi
xorl %edx, %eax
movl -576(%ebp), %edx
movl (%edi,%eax,4), %ecx
movl %eax, -300(%ebp)
movl %eax, -572(%ebp)
xorl %ecx, %edx
movl -580(%ebp), %ecx
movl (%edi,%edx,4), %eax
movl %edx, -296(%ebp)
movl %edx, -576(%ebp)
xorl %eax, %ecx
movl (%edi,%ecx,4), %eax
movl %ecx, -292(%ebp)
movl %ecx, -580(%ebp)
xorl %eax, %esi
movl -400(%ebp), %eax
movl (%edi,%esi,4), %edx
movl %esi, -288(%ebp)
movl %esi, -396(%ebp)
movl -412(%ebp), %esi
xorl %edx, %eax
movl -404(%ebp), %edx
movl (%edi,%eax,4), %ecx
movl %eax, -284(%ebp)
movl %eax, -400(%ebp)
xorl %ecx, %edx
movl -408(%ebp), %ecx
movl (%edi,%edx,4), %eax
movl %edx, -280(%ebp)
movl %edx, -404(%ebp)
xorl %eax, %ecx
movl (%edi,%ecx,4), %eax
movl %ecx, -276(%ebp)
movl %ecx, -408(%ebp)
xorl %eax, %esi
movl -416(%ebp), %eax
movl (%edi,%esi,4), %edx
movl %esi, -272(%ebp)
movl %esi, -412(%ebp)
xorl %edx, %eax
movl %eax, -268(%ebp)
movl (%edi,%eax,4), %ecx
movl -420(%ebp), %edx
movl %eax, -416(%ebp)
movl -428(%ebp), %esi
xorl %ecx, %edx
movl -424(%ebp), %ecx
movl (%edi,%edx,4), %eax
movl %edx, -264(%ebp)
movl %edx, -420(%ebp)
xorl %eax, %ecx
movl (%edi,%ecx,4), %eax
movl %ecx, -260(%ebp)
movl %ecx, -424(%ebp)
xorl %eax, %esi
movl -432(%ebp), %eax
movl (%edi,%esi,4), %edx
movl %esi, -256(%ebp)
movl %esi, -428(%ebp)
movl -444(%ebp), %esi
xorl %edx, %eax
movl -436(%ebp), %edx
movl (%edi,%eax,4), %ecx
movl %eax, -252(%ebp)
movl %eax, -432(%ebp)
xorl %ecx, %edx
movl -440(%ebp), %ecx
movl (%edi,%edx,4), %eax
movl %edx, -248(%ebp)
movl %edx, -436(%ebp)
xorl %eax, %ecx
movl (%edi,%ecx,4), %eax
movl %ecx, -244(%ebp)
movl %ecx, -440(%ebp)
xorl %eax, %esi
movl -448(%ebp), %eax
movl (%edi,%esi,4), %edx
movl %esi, -240(%ebp)
movl %esi, -444(%ebp)
xorl %edx, %eax
movl -452(%ebp), %edx
movl (%edi,%eax,4), %ecx
movl %eax, -236(%ebp)
movl %eax, -448(%ebp)
xorl %ecx, %edx
movl %edx, -232(%ebp)
movl (%edi,%edx,4), %eax
movl -460(%ebp), %ecx
movl -468(%ebp), %esi
movl %edx, -452(%ebp)
xorl %eax, %ecx
movl (%edi,%ecx,4), %eax
movl %ecx, -228(%ebp)
movl %ecx, -460(%ebp)
xorl %eax, %esi
movl -476(%ebp), %eax
movl (%edi,%esi,4), %edx
movl %esi, -224(%ebp)
movl %esi, -468(%ebp)
movl -500(%ebp), %esi
xorl %edx, %eax
movl -484(%ebp), %edx
movl (%edi,%eax,4), %ecx
movl %eax, -220(%ebp)
movl %eax, -476(%ebp)
xorl %ecx, %edx
movl -492(%ebp), %ecx
movl (%edi,%edx,4), %eax
movl %edx, -216(%ebp)
movl %edx, -484(%ebp)
xorl %eax, %ecx
movl (%edi,%ecx,4), %eax
movl %ecx, -212(%ebp)
movl %ecx, -492(%ebp)
xorl %eax, %esi
movl -508(%ebp), %eax
movl (%edi,%esi,4), %edx
movl %esi, -380(%ebp)
movl %esi, -500(%ebp)
xorl %edx, %eax
movl -516(%ebp), %edx
movl (%edi,%eax,4), %esi
movl %eax, -376(%ebp)
movl %eax, -508(%ebp)
xorl %esi, %edx
movl -524(%ebp), %esi
movl (%edi,%edx,4), %ecx
movl %edx, -372(%ebp)
movl %edx, -516(%ebp)
xorl %ecx, %esi
movl %esi, -524(%ebp)
movl -532(%ebp), %ecx
movl (%edi,%esi,4), %edx
xorl %edx, %ecx
movl -540(%ebp), %edx
movl (%edi,%ecx,4), %eax
movl %ecx, -532(%ebp)
xorl %eax, %edx
movl -548(%ebp), %eax
xorl (%edi,%edx,4), %eax
movl %edx, -540(%ebp)
movl %eax, -584(%ebp)
movl %eax, -548(%ebp)
movl (%edi,%eax,4), %eax
xorl %eax, -208(%ebp)
cmpl $17, -384(%ebp)
jne .L16
=============================================================
The loop was unrolled, but it's clear that the address mode selection is worse.
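Read back into C, the GCC 3.4 listing appears to implement the mixing loop sketched below: one table lookup and one XOR per state word, eight words per inner iteration (`addl $8, %edx`), and 18 outer rounds (`cmpl $17, %ecx`). This is a reconstruction from the assembly, not the OpenSSL source; the function names and the table contents are assumptions, and every state word and the initial `t` are assumed to be below 256.

```c
/* Hypothetical stand-in for the table S referenced in the assembly. */
static unsigned int S[256];

/* Fill S with an arbitrary byte permutation (assumed contents; an odd
   multiplier makes the mapping a bijection modulo 256). */
void
md2_mix_init_table (void)
{
  unsigned int i;

  for (i = 0; i < 256; i++)
    S[i] = (i * 167u + 13u) & 0xffu;
}

/* Plain form: one lookup and one XOR per state word. */
unsigned int
md2_mix_rolled (unsigned int state[48], unsigned int t)
{
  int i, j;

  for (i = 0; i < 18; i++)
    {
      for (j = 0; j < 48; j++)
        t = state[j] ^= S[t];
      t = (t + i) & 0xff;
    }
  return t;
}

/* Same computation with the inner loop unrolled by 8, matching the
   eight lookup/XOR/store groups in the GCC 3.4 listing. */
unsigned int
md2_mix_unrolled8 (unsigned int state[48], unsigned int t)
{
  int i, j;

  for (i = 0; i < 18; i++)
    {
      for (j = 0; j < 48; j += 8)
        {
          t = state[j + 0] ^= S[t];
          t = state[j + 1] ^= S[t];
          t = state[j + 2] ^= S[t];
          t = state[j + 3] ^= S[t];
          t = state[j + 4] ^= S[t];
          t = state[j + 5] ^= S[t];
          t = state[j + 6] ^= S[t];
          t = state[j + 7] ^= S[t];
        }
      t = (t + i) & 0xff;
    }
  return t;
}
```

Each lookup depends on the previous result, so the loop is a serial chain of loads; extra stores and reloads of `state[]` around that chain are what make the two listings differ.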
--
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed| |1
Last reconfirmed|0000-00-00 00:00:00 |2005-06-01 22:55:36
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
From: giovannibajo at libero dot it @ 2005-06-01 22:59 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From giovannibajo at libero dot it 2005-06-01 22:59 -------
I wonder if this is fixed by TARGET_MEM_REF.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-02 8:01 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-06-02 08:01 -------
Subject: Re: openssl is slower when compiled with gcc 4.0 than 3.3
The assembler attributed to 4.0 was produced by mainline (or some
patched version of 4.0), wasn't it?
Otherwise I cannot imagine why the inner loop would be unrolled.
For plain 4.0, we get the following code, which seems just fine
and equivalent to the one obtained with 3.4 (one of the memory
references is strength reduced, but since we still fit into registers,
this is OK).
Right now I don't see whether there is a problem with the code
produced by 4.1, but I also don't see anything related to addressing
mode selection there.
.L21:
movl S@GOTOFF(%ebx,%eax,4), %eax
xorl (%edx), %eax
movl %eax, (%edx)
movl S@GOTOFF(%ebx,%eax,4), %eax
xorl 4(%edx), %eax
movl %eax, 4(%edx)
movl S@GOTOFF(%ebx,%eax,4), %eax
xorl 8(%edx), %eax
movl %eax, 8(%edx)
movl S@GOTOFF(%ebx,%eax,4), %eax
xorl 12(%edx), %eax
movl %eax, 12(%edx)
movl S@GOTOFF(%ebx,%eax,4), %eax
xorl 16(%edx), %eax
movl %eax, 16(%edx)
movl S@GOTOFF(%ebx,%eax,4), %eax
xorl 20(%edx), %eax
movl %eax, 20(%edx)
movl S@GOTOFF(%ebx,%eax,4), %eax
xorl 24(%edx), %eax
movl %eax, 24(%edx)
movl S@GOTOFF(%ebx,%eax,4), %eax
xorl 28(%edx), %eax
movl %eax, 28(%edx)
addl $32, %edx
leal -12(%ebp), %esi
cmpl %esi, %edx
jne .L21
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
From: steven at gcc dot gnu dot org @ 2005-06-06 7:16 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2005-06-06 07:16 -------
Could L1 icache blow-out be the reason?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-06 7:30 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-06-06 07:30 -------
Subject: Re: openssl is slower when compiled with gcc 4.0 than 3.3
> Could L1 icache blow-out be the reason?
This is not likely with the minimized example.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
From: giovannibajo at libero dot it @ 2005-06-06 13:33 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From giovannibajo at libero dot it 2005-06-06 13:33 -------
Uhm, at this point, I don't believe anymore that the loop I posted is the cause
of the regression. Maybe the regression is somewhere else. I'll investigate.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
From: giovannibajo at libero dot it @ 2005-06-06 14:40 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From giovannibajo at libero dot it 2005-06-06 14:40 -------
Looks like the culprit is this:
=========================================================================
static unsigned int S[256];
unsigned
md2_block (unsigned int *sp1, unsigned int *sp2, const unsigned char *d)
{
register unsigned int t;
register int i, j;
static unsigned int state[48];
j = sp2[16 - 1];
for (i = 0; i < 16; i++)
{
state[i] = sp1[i];
state[i + 16] = t = d[i];
state[i + 32] = (t ^ sp1[i]);
j = sp2[i] ^= S[t ^ j];
}
}
=========================================================================
gcc 3.4.3 -fPIC -O2:
===================================================
.L5:
movl 8(%ebp), %esi
movl (%esi,%ecx,4), %eax
movl %eax, state.0@GOTOFF(%ebx,%ecx,4)
movl 16(%ebp), %edx
movzbl (%edx,%ecx), %eax
movl %eax, 64+state.0@GOTOFF(%ebx,%ecx,4)
movl (%esi,%ecx,4), %edx
xorl %eax, %edx
movl -16(%ebp), %esi
xorl -20(%ebp), %eax
movl %edx, 128+state.0@GOTOFF(%ebx,%ecx,4)
movl (%esi,%eax,4), %eax
xorl (%edi,%ecx,4), %eax
movl %eax, (%edi,%ecx,4)
incl %ecx
cmpl $15, %ecx
movl %eax, -20(%ebp)
jle .L5
===================================================
gcc 4.1.0 20050529 -fPIC -O2:
===================================================
.L2:
movl 8(%ebp), %eax
leal 0(,%edi,4), %ecx
movl %ecx, -28(%ebp)
addl %ecx, %eax
movl 16(%ebp), %ecx
movl %eax, %edx
movl %eax, -24(%ebp)
movl -4(%eax), %eax
movl %eax, (%esi)
movzbl -1(%ecx,%edi), %eax
incl %edi
movl %eax, 64(%esi)
movl -4(%edx), %ecx
movl 12(%ebp), %edx
xorl %eax, %ecx
movl %ecx, 128(%esi)
movl -28(%ebp), %ecx
addl $4, %esi
addl %edx, %ecx
movl -16(%ebp), %edx
xorl %edx, %eax
movl -20(%ebp), %edx
movl (%edx,%eax,4), %eax
movl -4(%ecx), %edx
xorl %edx, %eax
cmpl $17, %edi
movl %eax, -4(%ecx)
movl %eax, -16(%ebp)
jne .L2
===================================================
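To time or disassemble the reduced function outside OpenSSL, it can be wrapped as in the sketch below. The loop body is copied from the testcase above; the `return j`, the `md2_block_sketch` name, the `md2_block_fill_table` helper, and its table contents are additions for illustration. The caller is assumed to pass only byte-sized values in `sp2` and `d`, so that `t ^ j` stays a valid index into `S`.

```c
/* Lookup table; the contents are arbitrary here (an assumption), only
   the access pattern matters for code generation. */
static unsigned int S[256];

/* Hypothetical helper: fill S with a simple byte permutation. */
void
md2_block_fill_table (void)
{
  unsigned int i;

  for (i = 0; i < 256; i++)
    S[i] = 255u - i;
}

/* Loop body as in the reduced testcase.  Unlike the testcase, this
   version returns j so that the function has a defined result.
   Precondition: all sp2[] and d[] values are below 256. */
unsigned int
md2_block_sketch (unsigned int *sp1, unsigned int *sp2,
                  const unsigned char *d)
{
  register unsigned int t;
  register int i, j;
  static unsigned int state[48];

  j = sp2[16 - 1];
  for (i = 0; i < 16; i++)
    {
      state[i] = sp1[i];
      state[i + 16] = t = d[i];
      state[i + 32] = (t ^ sp1[i]);
      j = sp2[i] ^= S[t ^ j];
    }
  return j;
}
```

Compiling this file with `-S -O2`, once with and once without `-fPIC`, should make it possible to compare the loop against the two listings above.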
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3
From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-06 15:00 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-06-06 15:00 -------
Subject: Re: openssl is slower when compiled with gcc 4.0 than 3.3
> Looks like the culprit is this:
>
> =========================================================================
> static unsigned int S[256];
> unsigned
> md2_block (unsigned int *sp1, unsigned int *sp2, const unsigned char *d)
> {
> register unsigned int t;
> register int i, j;
> static unsigned int state[48];
>
> j = sp2[16 - 1];
> for (i = 0; i < 16; i++)
> {
> state[i] = sp1[i];
> state[i + 16] = t = d[i];
> state[i + 32] = (t ^ sp1[i]);
> j = sp2[i] ^= S[t ^ j];
> }
> }
> =========================================================================
With the TARGET_MEM_REFs patch the result is much better. At
least we avoid the multiplication by 4
> leal 0(,%edi,4), %ecx
and other results of the DOM misoptimization of addressing modes, which was
one of the main motivations for TARGET_MEM_REFs.
We still use one more induction variable (iv) than in the 3.4 case, and as a
result we need one more register.
.L2:
movl 8(%ebp), %edi
movl -4(%edi,%ecx,4), %eax
movl %eax, (%esi)
movl 16(%ebp), %edx
movzbl -1(%ecx,%edx), %eax
movl %eax, 64(%esi)
movl -4(%edi,%ecx,4), %edx
xorl %eax, %edx
movl %edx, 128(%esi)
xorl -20(%ebp), %eax
movl -16(%ebp), %edi
movl (%edi,%eax,4), %eax
movl 12(%ebp), %edx
xorl -4(%edx,%ecx,4), %eax
movl %eax, -4(%edx,%ecx,4)
movl %eax, -20(%ebp)
incl %ecx
addl $4, %esi
cmpl $17, %ecx
jne .L2
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: giovannibajo at libero dot it @ 2005-06-08 13:15 UTC (permalink / raw)
To: gcc-bugs
--
What |Removed |Added
----------------------------------------------------------------------------
Summary|openssl is slower when |[4.0/4.1 Regression] openssl
|compiled with gcc 4.0 than |is slower when compiled with
|3.3 |gcc 4.0 than 3.3
Target Milestone|--- |4.0.2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: dank at kegel dot com @ 2005-06-17 0:59 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From dank at kegel dot com 2005-06-17 00:59 -------
We're learning more about this bug.
Anthony Danalis has boiled down the testcase much further;
I'll attach the reduced testcase as foo4.i.
It looks like it shows up if your /proc/cpuinfo says
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 9
cpu MHz : 2793.051
cache size : 512 KB
but not if your /proc/cpuinfo says
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping : 1
cpu MHz : 3200.255
cache size : 1024 KB
But here's the fun part: on the newer CPU with the bigger
cache, gcc-2.95.3 was just as slow as gcc-3.4.3/gcc-4.0.0.
Go figure.
We'll add more details once we've got more info.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: pinskia at gcc dot gnu dot org @ 2005-06-17 1:11 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2005-06-17 01:10 -------
(In reply to comment #14)
> We're learning more about this bug.
> Anthony Danalis has boiled down the testcase much further;
> I'll attach the reduced testcase as foo4.i.
Yes, and you know what the difference between those two is: the second one is not really a P4 but
a new core. Intel marketing at its best.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: dank at kegel dot com @ 2005-06-18 6:24 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From dank at kegel dot com 2005-06-18 06:24 -------
Looks to me like gcc-3.4.3 is known to fail, too, depending on the CPU.
Anthony Danalis and I came up with a little script to run foo4.i
on various processors with various values for -mtune, which I'll
attach; here are the results for four different x86 variants.
The last two columns are the time on gcc-3.4.3 and gcc-4.0.0
divided by the time on gcc-2.95.3, so any value above 1.0 in
the last column is a performance regression.
Rows are sorted by the last column. The first five
rows represent performance regressions for gcc-3.4.3;
the first three also represent performance regressions
for gcc-4.0.0.
family,model,name                 pic?   tune       [t_295, t_343, t_400]  [t_295/t_295, t_343/t_295, t_400/t_295]
6,8, Pentium III (Coppermine),    -fPIC  athlon-xp  [9.25, 16.22, 18.79]   [1.00, 1.75, 2.03]
15,2, Xeon(TM) CPU 2.60GHz,       -fPIC  pentium4   [1.91, 3.89, 3.27]     [1.00, 2.04, 1.71]
6,8, Pentium III (Coppermine),    -fPIC  pentium3   [9.15, 10.10, 13.20]   [1.00, 1.10, 1.44]
15,2, Xeon(TM) CPU 2.60GHz,       -fPIC  athlon-xp  [1.91, 2.00, 1.95]     [1.00, 1.05, 1.02]
6,8, Pentium III (Coppermine),    -fPIC  pentium4   [9.27, 10.49, 8.87]    [1.00, 1.13, 0.96]
--- ok below this line ---
6,8, Pentium III (Coppermine),           pentium4   [14.74, 13.71, 14.12]  [1.00, 0.93, 0.96]
15,4, Athlon(tm) 64 3000+,        -fPIC  pentium4   [4.12, 3.68, 3.74]     [1.00, 0.89, 0.91]
15,4, Pentium(R) 4 CPU 3.20GHz,   -fPIC  pentium4   [2.48, 2.18, 2.09]     [1.00, 0.88, 0.84]
15,4, Athlon(tm) 64 3000+,        -fPIC  athlon-xp  [4.12, 3.50, 3.20]     [1.00, 0.85, 0.78]
15,4, Pentium(R) 4 CPU 3.20GHz,          pentium4   [2.17, 1.07, 1.07]     [1.00, 0.49, 0.49]
6,8, Pentium III (Coppermine),           pentium3   [14.22, 6.26, 6.46]    [1.00, 0.44, 0.45]
6,8, Pentium III (Coppermine),           athlon-xp  [14.93, 6.26, 6.27]    [1.00, 0.42, 0.42]
15,4, Athlon(tm) 64 3000+,               pentium4   [3.65, 1.39, 1.39]     [1.00, 0.38, 0.38]
15,4, Athlon(tm) 64 3000+,               athlon-xp  [3.65, 1.39, 1.40]     [1.00, 0.38, 0.38]
15,2, Xeon(TM) CPU 2.60GHz,              pentium4   [6.42, 0.97, 0.98]     [1.00, 0.15, 0.15]
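For reference, the ratio columns can be recomputed from the raw timings.
Here is a minimal C sketch of that arithmetic. The helper names `slowdown`
and `print_row` are hypothetical and are not taken from the attached script,
which is not shown here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not taken from the attached script: runtime of a
   newer compiler's code divided by the gcc-2.95.3 runtime.  A result
   above 1.0 means the newer compiler produced slower code. */
double slowdown(double t_new, double t_295)
{
    assert(t_295 > 0.0);   /* a zero baseline would make the ratio meaningless */
    return t_new / t_295;
}

/* Print one row as "[times] [ratios]", mirroring the table layout. */
void print_row(const char *label, double t_295, double t_343, double t_400)
{
    printf("%-34s [%.2f, %.2f, %.2f] [%.2f, %.2f, %.2f]\n", label,
           t_295, t_343, t_400,
           slowdown(t_295, t_295),
           slowdown(t_343, t_295),
           slowdown(t_400, t_295));
}
```

Feeding it the first row's timings (9.25, 16.22, 18.79) reproduces the
1.75 and 2.03 ratios shown for the Pentium III with -fPIC and athlon-xp tuning.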
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
` (16 preceding siblings ...)
2005-06-18 6:24 ` dank at kegel dot com
@ 2005-06-18 6:39 ` dank at kegel dot com
2005-06-18 17:47 ` dank at kegel dot com
` (10 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: dank at kegel dot com @ 2005-06-18 6:39 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From dank at kegel dot com 2005-06-18 06:38 -------
To be clear, here are the two most worrying rows from the above table,
expanded a bit. These are the runtimes of foo4.i in seconds.
The cpu family, model, and name are as shown by /proc/cpuinfo.
cpu family 15, model 2, Intel(R) Xeon(TM) CPU 2.60GHz:
-fPIC -mtune=pentium4 -O3
gcc-2.95.3: 1.91 seconds
gcc-3.4.3: 3.89
gcc-4.0.0: 3.27
cpu family 6, model 8, Pentium III (Coppermine)
-fPIC -mtune=pentium3 -O3
gcc-2.95.3: 9.15
gcc-3.4.3: 10.10
gcc-4.0.0: 13.20
gcc-4.0.0 produces code that runs 1.7 and 1.4 times slower than gcc-2.95.3
on these (fairly common!) cpus, even when the proper -mtune is used.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
` (17 preceding siblings ...)
2005-06-18 6:39 ` dank at kegel dot com
@ 2005-06-18 17:47 ` dank at kegel dot com
2005-06-18 22:46 ` dank at kegel dot com
` (9 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: dank at kegel dot com @ 2005-06-18 17:47 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From dank at kegel dot com 2005-06-18 17:46 -------
The above tests did not use -mcpu on gcc-2.95.3,
so they were comparing apples to oranges, kind of.
I reran them on a PIII with gcc-2.95.3 -mcpu=$tune -O3
and gcc-[34] -mtune=$tune -O3. The problem persists
even when using the most appropriate tuning option
for the CPU in question.
cpu family 6,model 8, Pentium III (Coppermine):
-fPIC -mcpu=pentium -O3
gcc-2.95.3: 7.61
gcc-3.4.3: 27.43
gcc-4.0.0: 17.57
cpu family 6,model 8, Pentium III (Coppermine):
-fPIC -mcpu=pentiumpro -O3
gcc-2.95.3: 9.27
gcc-3.4.3: 10.09
gcc-4.0.0: 13.96
cpu family 15, model 2, Intel(R) Xeon(TM) CPU 2.60GHz:
-fPIC -mtune=pentium4 -O3
gcc-2.95.3: 1.91 seconds
gcc-3.4.3: 3.89
gcc-4.0.0: 3.27
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
` (18 preceding siblings ...)
2005-06-18 17:47 ` dank at kegel dot com
@ 2005-06-18 22:46 ` dank at kegel dot com
2005-06-24 15:00 ` dank at kegel dot com
` (8 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: dank at kegel dot com @ 2005-06-18 22:46 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From dank at kegel dot com 2005-06-18 22:45 -------
I asked the fellow who posted the original problem report to give
me the results of 'cat /proc/cpuinfo' on the affected machine.
Here it is:
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 10
cpu MHz : 896.153
This is the same as one of the two affected CPU types here.
The slow routine appears to be the buffer cleaning routine,
though I haven't verified this with oprofile yet.
Here's its loop:
static char cleanse_ctr;
...
while (len--) {
    *(ptr++) = cleanse_ctr;
    cleanse_ctr += (17 + (unsigned char) ((int) ptr & 0xF));
}
and the output of -O3 -fPIC for both gcc-2.95.3 and gcc-4.0.0:
--- gcc-2.95.3 ---
.L5:
movl cleanse_ctr@GOT(%ebx),%edi
movb (%edi),%al
movb %al,(%edx)
incl %edx
movb (%edi),%cl
addb $17,%cl
movb %dl,%al
andb $15,%al
addb %al,%cl
movb %cl,(%edi)
subl $1,%esi
jnc .L5
.L4:
--- gcc-4 ---
.L4:
movb (%esi), %al
movb %al, (%edx)
leal (%ecx,%edi), %eax
andl $15, %eax
incl %ecx
addb (%esi), %al
incl %edx
addl $17, %eax
cmpl %ecx, 12(%ebp)
movb %al, (%esi)
jne .L4
It's not obvious to me why the gcc-4.0.0 generated code
should be slower when run on some CPUs, if in fact it is.
Is it the fact that the loop condition is checked with
a cmp against memory instead of a flag being set by subtracting
1 from a register?
(And where's the best place to learn about how to predict
how long assembly snippets like this will take to run
on various modern CPUs, anyway?)
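For anyone who wants to time this loop in isolation, here is a minimal C
sketch. It is not the attached foo4.i, which is not shown here. The
assumptions are: `cleanse_ctr` is declared `unsigned char` rather than
`char`, the `(int)` cast is replaced by `uintptr_t` so the code also builds
on 64-bit hosts, and `time_cleanse` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* The report declares this as "static char"; unsigned char is used here
   (an assumption) so that the wraparound is well defined everywhere. */
static unsigned char cleanse_ctr;

/* The loop quoted above, with uintptr_t replacing the (int) cast. */
void cleanse(unsigned char *ptr, size_t len)
{
    while (len--) {
        *(ptr++) = cleanse_ctr;
        cleanse_ctr += (unsigned char)(17 + ((uintptr_t)ptr & 0xF));
    }
}

/* Hypothetical timing helper: CPU seconds spent on `reps` passes over
   a `len`-byte buffer, measured with the standard clock() function. */
double time_cleanse(unsigned char *buf, size_t len, unsigned reps)
{
    assert(buf != NULL);
    clock_t start = clock();
    for (unsigned i = 0; i < reps; i++)
        cleanse(buf, len);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Building this once per compiler with the same flags as in the table
(for example -O3 -fPIC plus the tuning option) and comparing the values
returned by `time_cleanse` should give numbers of the same kind as those
reported above, though not necessarily the same magnitudes.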
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
` (19 preceding siblings ...)
2005-06-18 22:46 ` dank at kegel dot com
@ 2005-06-24 15:00 ` dank at kegel dot com
2005-06-24 15:01 ` dank at kegel dot com
` (7 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: dank at kegel dot com @ 2005-06-24 15:00 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From dank at kegel dot com 2005-06-24 15:00 -------
Michael Meissner looked at the code, and saw that
gcc-2.95.3 converts the loop to a countdown loop,
but gcc-3.x doesn't, which wastes a precious register.
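To illustrate the two loop shapes being contrasted, here is a small C
sketch with a simplified loop body (a plain fill, not the cleanse
arithmetic). The function names are hypothetical. A compiler is free to
turn either form into the other, so this only shows the shapes, not what
any particular gcc version emits.

```c
#include <assert.h>
#include <stddef.h>

/* Count-up form: the index i and the bound len are both live for the
   whole loop, so each wants a register (or the bound is compared
   from memory, as in the cmpl against 12(%ebp) above). */
void fill_count_up(unsigned char *dst, size_t len, unsigned char v)
{
    assert(dst != NULL || len == 0);
    for (size_t i = 0; i != len; i++)
        dst[i] = v;
}

/* Countdown form: len itself is the counter, so no separate bound has
   to be kept around while the loop runs. */
void fill_count_down(unsigned char *dst, size_t len, unsigned char v)
{
    assert(dst != NULL || len == 0);
    while (len--)
        *dst++ = v;
}
```

Both functions write the same bytes; the difference is only in how many
values must stay live across iterations.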
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
` (20 preceding siblings ...)
2005-06-24 15:00 ` dank at kegel dot com
@ 2005-06-24 15:01 ` dank at kegel dot com
2005-06-24 15:53 ` steven at gcc dot gnu dot org
` (6 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: dank at kegel dot com @ 2005-06-24 15:01 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From dank at kegel dot com 2005-06-24 15:01 -------
And, for what it's worth, the latest 4.1 snapshot also suffers from this.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
` (21 preceding siblings ...)
2005-06-24 15:01 ` dank at kegel dot com
@ 2005-06-24 15:53 ` steven at gcc dot gnu dot org
2005-06-24 16:24 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
` (5 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: steven at gcc dot gnu dot org @ 2005-06-24 15:53 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2005-06-24 15:53 -------
I don't see how the precious register would matter much. But this compare
with memory is strange:
cmpl %ecx, 12(%ebp)
Why isn't len loaded into a register??
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
` (22 preceding siblings ...)
2005-06-24 15:53 ` steven at gcc dot gnu dot org
@ 2005-06-24 16:24 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
2005-06-24 17:41 ` dann at godzilla dot ics dot uci dot edu
` (4 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-24 16:24 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-06-24 16:24 -------
Subject: Re: [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
> I don't see how the precious register would matter much. But this compare
> with memory is strange:
>
> cmpl %ecx, 12(%ebp)
>
> Why isn't len loaded into a register??
You answer your own question -- because there is no register free;
that's why the precious register matters that much.
(I guess; I may be wrong).
Zdenek
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
` (23 preceding siblings ...)
2005-06-24 16:24 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
@ 2005-06-24 17:41 ` dann at godzilla dot ics dot uci dot edu
2005-06-25 2:49 ` rakdver at gcc dot gnu dot org
` (3 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: dann at godzilla dot ics dot uci dot edu @ 2005-06-24 17:41 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-06-24 17:41 -------
(In reply to comment #21)
> The slow routine appears to be the buffer cleaning routine,
> though I haven't verified this with oprofile yet.
> Here's its loop:
> static char cleanse_ctr;
> ...
> while (len--) {
> *(ptr++) = cleanse_ctr;
> cleanse_ctr += (17 + (unsigned char) ((int) ptr & 0xF));
> }
[Not entirely related, but..] There's one obvious way to improve this loop.
The compiler cannot prove that the write *(ptr++) does not alias the global
variable cleanse_ctr, so it will read it from memory in each iteration.
To avoid the extra memory read just do something like:
void OPENSSL_cleanse(unsigned char *ptr, unsigned int len)
{
    unsigned char local_cleanse_ctr = cleanse_ctr;
    while (len--) {
        *(ptr++) = local_cleanse_ctr;
        local_cleanse_ctr += (17 + (unsigned char) ((int) ptr & 0xF));
    }
    local_cleanse_ctr += 63;
    cleanse_ctr = local_cleanse_ctr;
}
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
` (24 preceding siblings ...)
2005-06-24 17:41 ` dann at godzilla dot ics dot uci dot edu
@ 2005-06-25 2:49 ` rakdver at gcc dot gnu dot org
2005-06-25 10:15 ` steven at gcc dot gnu dot org
` (2 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: rakdver at gcc dot gnu dot org @ 2005-06-25 2:49 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From rakdver at gcc dot gnu dot org 2005-06-25 02:49 -------
Ivopts seems to make several quite doubtful decisions in this testcase.
--
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|unassigned at gcc dot gnu |rakdver at gcc dot gnu dot
|dot org |org
Status|NEW |ASSIGNED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
` (25 preceding siblings ...)
2005-06-25 2:49 ` rakdver at gcc dot gnu dot org
@ 2005-06-25 10:15 ` steven at gcc dot gnu dot org
2005-06-25 11:32 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
2005-09-27 15:57 ` mmitchel at gcc dot gnu dot org
28 siblings, 0 replies; 30+ messages in thread
From: steven at gcc dot gnu dot org @ 2005-06-25 10:15 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2005-06-25 10:15 -------
Re. comment #25, as far as I can tell there are registers available in
that loop. To quote the loop from comment #12:
.L4:
movb (%esi), %al
movb %al, (%edx)
leal (%ecx,%edi), %eax
andl $15, %eax
incl %ecx
addb (%esi), %al
incl %edx
addl $17, %eax
cmpl %ecx, 12(%ebp)
movb %al, (%esi)
jne .L4
Checking off used registers in this loop:
%esi x
%edi x
%eax x
%ebx
%ecx x
%edx x
So %ebx at least is free (and iiuc, with -fomit-frame-pointer %ebp is
also free, right?). Maybe the allocator thinks %ebx can't be used
because it is the PIC register.
Here is what mainline today ("GCC: (GNU) 4.1.0 20050625 (experimental)")
gives me (x86-64 compiler with "-m32 -march=i686 -O3 -fPIC"):
.L4:
movzbl (%esi), %eax
movb %al, (%ecx)
incl %ecx
movzbl -13(%ebp), %eax
movzbl (%esi), %edx
incb -13(%ebp)
andb $15, %al
addb $17, %dl
addb %dl, %al
cmpl %edi, %ecx
movb %al, (%esi)
jne .L4
The .optimized tree dump looks like this:
<bb 0>:
len.23 = len - 1;
if (len.23 != 4294967295) goto <L6>; else goto <L2>;
<L6>:;
ivtmp.19 = (unsigned char) (signed char) (int) (ptr + 1B);
ptr.27 = ptr;
<L0>:;
MEM[base: ptr.27] = cleanse_ctr;
ptr.27 = ptr.27 + 1B;
cleanse_ctr = (unsigned char) (((signed char) ivtmp.19 & 15)
+ (signed char) cleanse_ctr + 17);
ivtmp.19 = ivtmp.19 + 1;
if (ptr.27 != (unsigned char *) (ptr + (void *) len.23 + 1B)) goto <L0>;
else goto <L2>;
<L2>:;
cleanse_ctr = (unsigned char) ((signed char) cleanse_ctr + 63);
return;
Note how the loop test is against ptr. Also, as far as I can tell the
right hand side of the test (i.e. "(ptr + (void *) len.23 + 1B)") is loop
invariant and should have been moved out. And the first two lines are
also just weird, it is probably cheaper on almost any machine to do
len.23 = len;
if (len.23 != 0) goto <L6>; else goto <L2>;
<L6>:
len.23 = len.23 - 1;
(etc...)
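The two entry tests select the same values of len; only the order of the
decrement and the comparison differs. Here is a minimal C sketch of that
equivalence, assuming len is a 32-bit unsigned value as the constant
4294967295 in the dump suggests. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Entry test as it appears in the tree dump: decrement first, then
   compare with 4294967295 (0xFFFFFFFF), the value that 0 - 1 wraps to. */
int enters_loop_dump(uint32_t len)
{
    uint32_t len_23 = (uint32_t)(len - 1u);
    return len_23 != UINT32_C(4294967295);
}

/* Entry test as suggested above: compare with zero before decrementing. */
int enters_loop_simple(uint32_t len)
{
    return len != 0;
}

/* Hypothetical spot check over a few boundary values. */
void check_entry_tests(void)
{
    const uint32_t samples[] = { 0u, 1u, 2u, UINT32_MAX - 1u, UINT32_MAX };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(enters_loop_dump(samples[i]) == enters_loop_simple(samples[i]));
}
```

So any cost difference between the two forms comes from the generated
instructions on a given target, not from the set of inputs that enter the loop.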
In summary, we just produce crap code here ;-)
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
` (26 preceding siblings ...)
2005-06-25 10:15 ` steven at gcc dot gnu dot org
@ 2005-06-25 11:32 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
2005-09-27 15:57 ` mmitchel at gcc dot gnu dot org
28 siblings, 0 replies; 30+ messages in thread
From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-25 11:32 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-06-25 11:32 -------
Subject: Re: [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
> ------- Additional Comments From steven at gcc dot gnu dot org 2005-06-25 10:15 -------
> Re. comment #25, as far as I can tell there are registers available in
> that loop. To quote the loop from comment #12:
>
> .L4:
> movb (%esi), %al
> movb %al, (%edx)
> leal (%ecx,%edi), %eax
> andl $15, %eax
> incl %ecx
> addb (%esi), %al
> incl %edx
> addl $17, %eax
> cmpl %ecx, 12(%ebp)
> movb %al, (%esi)
> jne .L4
>
> Checking off used registers in this loop:
> %esi x
> %edi x
> %eax x
> %ebx
> %ecx x
> %edx x
>
> So %ebx at least is free (and iiuc, with -fomit-frame-pointer %ebp is
> also free, right?). Maybe the allocator thinks %ebx can't be used
> because it is the PIC register.
Yes, %ebx cannot be used because of PIC, and -fomit-frame-pointer is off
by default.
> Here is what mainline today ("GCC: (GNU) 4.1.0 20050625 (experimental)")
> gives me (x86-64 compiler with "-m32 -march=i686 -O3 -fPIC"):
>
> .L4:
> movzbl (%esi), %eax
> movb %al, (%ecx)
> incl %ecx
> movzbl -13(%ebp), %eax
> movzbl (%esi), %edx
> incb -13(%ebp)
> andb $15, %al
> addb $17, %dl
> addb %dl, %al
> cmpl %edi, %ecx
> movb %al, (%esi)
> jne .L4
>
> The .optimized tree dump looks like this:
>
> <bb 0>:
> len.23 = len - 1;
> if (len.23 != 4294967295) goto <L6>; else goto <L2>;
> And the first two lines are
> also just weird, it is probably cheaper on almost any machine to do
> len.23 = len;
> if (len.23 != 0) goto <L6>; else goto <L2>;
>
> <L6>:
> len.23 = len.23 - 1;
> (etc...)
Not really. On i686, there should be no difference.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl
` (27 preceding siblings ...)
2005-06-25 11:32 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
@ 2005-09-27 15:57 ` mmitchel at gcc dot gnu dot org
28 siblings, 0 replies; 30+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-09-27 15:57 UTC (permalink / raw)
To: gcc-bugs
--
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.0.2 |4.0.3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923