public inbox for gcc-bugs@sourceware.org
* [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 @ 2005-02-12 20:07 gj at pointblue dot com dot pl 2005-02-12 20:11 ` [Bug target/19923] " pinskia at gcc dot gnu dot org ` (28 more replies) 0 siblings, 29 replies; 35+ messages in thread From: gj at pointblue dot com dot pl @ 2005-02-12 20:07 UTC (permalink / raw) To: gcc-bugs

Here is the openssl speed result when it is compiled with 3.3 (original Debian unstable package):

options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) blowfish(idx)
compiler: gcc -fPIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DOPENSSL_NO_KRB5 -DOPENSSL_NO_IDEA -DOPENSSL_NO_MDC2 -DOPENSSL_NO_RC5 -DL_ENDIAN -DTERMIO -O3 -march=i686 -mcpu=i686 -fomit-frame-pointer -Wall -DSHA1_ASM -DMD5_ASM -DRMD160_ASM
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2                510.80k     1064.79k     1486.96k     1641.83k     1702.87k
mdc2                 0.00         0.00         0.00         0.00         0.00
md4               4999.47k    17746.97k    51392.88k    97451.59k   131711.89k
md5               4405.95k    15208.16k    43027.34k    77946.11k   101040.96k
hmac(md5)         4951.58k    16851.67k    46126.90k    81002.65k   101700.77k
sha1              3892.54k    12223.89k    29586.19k    45767.99k    54082.03k
rmd160            3715.14k    10397.52k    23079.49k    33148.87k    37651.83k
rc4              58941.98k    66899.63k    71733.39k    72572.54k    72476.92k
des cbc          13353.92k    13897.80k    14067.26k    14088.53k    14107.61k
des ede3          4887.63k     5039.28k     5083.63k     5116.70k     5086.58k
idea cbc             0.00         0.00         0.00         0.00         0.00
rc2 cbc           5257.37k     5534.13k     5560.97k     5610.12k     5582.42k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00
blowfish cbc     21054.83k    22340.34k    22704.49k    22895.90k    22860.91k
cast cbc         14478.39k    15882.31k    16400.99k    16570.03k    16585.01k
aes-128 cbc      13612.33k    14364.39k    14382.68k    14404.12k    14440.26k
aes-192 cbc      12075.70k    12370.43k    12530.49k    12518.63k    12559.92k
aes-256 cbc      10806.91k    11093.65k    11179.27k    11185.67k    11205.97k
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0023s   0.0002s    438.5   4928.2
rsa 1024 bits   0.0109s   0.0006s     91.6   1746.1
rsa 2048 bits   0.0646s   0.0019s     15.5    527.6
rsa 4096 bits   0.4317s   0.0066s      2.3    152.0
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0018s   0.0022s    546.0    460.7
dsa 1024 bits   0.0054s   0.0065s    186.6    154.8
dsa 2048 bits   0.0179s   0.0220s     55.7     45.5

And here is the same package compiled with gcc 4.0, gcc-4.0 (GCC) 4.0.0 20050212 (experimental):

compiler: gcc -fPIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DOPENSSL_NO_KRB5 -DOPENSSL_NO_IDEA -DOPENSSL_NO_MDC2 -DOPENSSL_NO_RC5 -DL_ENDIAN -DTERMIO -O3 -march=i686 -mcpu=i686 -fomit-frame-pointer -Wall -DSHA1_ASM -DMD5_ASM -DRMD160_ASM
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2                361.81k      781.01k     1103.19k     1231.36k     1278.84k
mdc2                 0.00         0.00         0.00         0.00         0.00
md4               3103.64k    11338.88k    36135.04k    79292.67k   123123.36k
md5               2758.32k    10084.74k    31863.54k    66522.25k    98860.02k
hmac(md5)         4581.08k    15784.49k    43771.66k    78227.60k   101959.42k
sha1              2638.72k     8889.12k    24063.88k    41890.99k    53462.15k
rmd160            2477.15k     7918.19k    19696.52k    31106.04k    37317.88k
rc4              60284.27k    67543.46k    71379.34k    72455.38k    72581.12k
des cbc          13547.77k    13876.64k    14049.67k    14102.25k    14020.78k
des ede3          4950.20k     5050.99k     5068.80k     5111.00k     5088.06k
idea cbc             0.00         0.00         0.00         0.00         0.00
rc2 cbc           5814.75k     6060.45k     6150.37k     6169.60k     6196.13k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00
blowfish cbc     20941.23k    22373.68k    22868.43k    22822.28k    23014.29k
cast cbc         12790.60k    14102.95k    14514.24k    14494.77k    14622.21k
aes-128 cbc      13030.43k    13549.49k    13653.51k    13694.85k    13696.33k
aes-192 cbc      11257.66k    11517.92k    11545.25k    11604.32k    11568.43k
aes-256 cbc      10065.01k    10296.48k    10403.82k    10332.02k    10382.25k
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0024s   0.0002s    418.5   4201.7
rsa 1024 bits   0.0112s   0.0006s     89.5   1550.7
rsa 2048 bits   0.0650s   0.0020s     15.4    504.9
rsa 4096 bits   0.4311s   0.0068s      2.3    147.9
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0019s   0.0023s    521.4    441.9
dsa 1024 bits   0.0055s   0.0067s    182.9    148.3
dsa 2048 bits   0.0181s   0.0222s     55.2     45.1

As you can see, almost every test is worse with 4.0. Not sure why. The same test on ultrasparc and amd64 shows 4.0 as the clear winner. (Although it still crashes on amd64... ;) )

--
           Summary: openssl is slower when compiled with gcc 4.0 than 3.3
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: gj at pointblue dot com dot pl
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i86
  GCC host triplet: i86
GCC target triplet: i86

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3 2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl @ 2005-02-12 20:11 ` pinskia at gcc dot gnu dot org 2005-02-13 6:44 ` pinskia at gcc dot gnu dot org ` (27 subsequent siblings) 28 siblings, 0 replies; 35+ messages in thread From: pinskia at gcc dot gnu dot org @ 2005-02-12 20:11 UTC (permalink / raw) To: gcc-bugs -- What |Removed |Added ---------------------------------------------------------------------------- Component|c |target Keywords| |missed-optimization http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3 2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl 2005-02-12 20:11 ` [Bug target/19923] " pinskia at gcc dot gnu dot org @ 2005-02-13 6:44 ` pinskia at gcc dot gnu dot org 2005-05-14 20:36 ` pinskia at gcc dot gnu dot org ` (26 subsequent siblings) 28 siblings, 0 replies; 35+ messages in thread From: pinskia at gcc dot gnu dot org @ 2005-02-13 6:44 UTC (permalink / raw) To: gcc-bugs ------- Additional Comments From pinskia at gcc dot gnu dot org 2005-02-12 22:24 ------- We need a self contained example. -- What |Removed |Added ---------------------------------------------------------------------------- CC| |pinskia at gcc dot gnu dot | |org Status|UNCONFIRMED |WAITING http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3 2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl 2005-02-12 20:11 ` [Bug target/19923] " pinskia at gcc dot gnu dot org 2005-02-13 6:44 ` pinskia at gcc dot gnu dot org @ 2005-05-14 20:36 ` pinskia at gcc dot gnu dot org 2005-06-01 20:47 ` yx at cs dot ucla dot edu ` (25 subsequent siblings) 28 siblings, 0 replies; 35+ messages in thread From: pinskia at gcc dot gnu dot org @ 2005-05-14 20:36 UTC (permalink / raw) To: gcc-bugs ------- Additional Comments From pinskia at gcc dot gnu dot org 2005-05-14 20:36 ------- No feedback in 3 months. -- What |Removed |Added ---------------------------------------------------------------------------- Status|WAITING |RESOLVED Resolution| |INVALID http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3 2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl ` (2 preceding siblings ...) 2005-05-14 20:36 ` pinskia at gcc dot gnu dot org @ 2005-06-01 20:47 ` yx at cs dot ucla dot edu 2005-06-01 20:55 ` pinskia at gcc dot gnu dot org ` (24 subsequent siblings) 28 siblings, 0 replies; 35+ messages in thread From: yx at cs dot ucla dot edu @ 2005-06-01 20:47 UTC (permalink / raw) To: gcc-bugs ------- Additional Comments From yx at cs dot ucla dot edu 2005-06-01 20:47 -------

When we ran 'openssl speed md2', we did see that gcc-4.0 was slower than earlier versions, so we created a minimal test case, which we will attach. Here is how long it took to run a 34 megabyte file through the test program when compiled with various compilers and options:

gcc-2.95.3 -fPIC -O1   4.940s
gcc-4.0.0  -fPIC -O1   3.510s
gcc-3.4.3  -fPIC -O1   5.190s

gcc-2.95.3 -fPIC -O2   3.470s
gcc-3.4.3  -fPIC -O2   3.460s
gcc-4.0.0  -fPIC -O2   4.050s

gcc-2.95.3 -fPIC -O3   3.400s
gcc-3.4.3  -fPIC -O3   3.740s
gcc-4.0.0  -fPIC -O3   4.010s

This test was done on a Pentium 4 workstation. No smoothing was done on the resulting times, but they seemed to be repeatable. We also tried without -fPIC, but did not see as large a regression there.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3 2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl ` (3 preceding siblings ...) 2005-06-01 20:47 ` yx at cs dot ucla dot edu @ 2005-06-01 20:55 ` pinskia at gcc dot gnu dot org 2005-06-01 22:55 ` giovannibajo at libero dot it ` (23 subsequent siblings) 28 siblings, 0 replies; 35+ messages in thread From: pinskia at gcc dot gnu dot org @ 2005-06-01 20:55 UTC (permalink / raw) To: gcc-bugs ------- Additional Comments From pinskia at gcc dot gnu dot org 2005-06-01 20:55 ------- I would not doubt this is just not using the i386 address mode -- What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |UNCONFIRMED Resolution|INVALID | http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3 2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl ` (4 preceding siblings ...) 2005-06-01 20:55 ` pinskia at gcc dot gnu dot org @ 2005-06-01 22:55 ` giovannibajo at libero dot it 2005-06-01 22:59 ` giovannibajo at libero dot it ` (22 subsequent siblings) 28 siblings, 0 replies; 35+ messages in thread From: giovannibajo at libero dot it @ 2005-06-01 22:55 UTC (permalink / raw) To: gcc-bugs ------- Additional Comments From giovannibajo at libero dot it 2005-06-01 22:55 ------- Confirmed. The regression appears only with -fPIC, and it's pretty evident. The core is md2_block, the inner loop: GCC 3.4 ============================================================= .L29: xorl %edx, %edx .p2align 2,,3 .L28: movl S@GOTOFF(%ebx,%eax,4), %esi xorl -216(%ebp,%edx,4), %esi movl S@GOTOFF(%ebx,%esi,4), %eax xorl -212(%ebp,%edx,4), %eax movl S@GOTOFF(%ebx,%eax,4), %edi xorl -208(%ebp,%edx,4), %edi movl %esi, -216(%ebp,%edx,4) movl S@GOTOFF(%ebx,%edi,4), %esi xorl -204(%ebp,%edx,4), %esi movl %eax, -212(%ebp,%edx,4) movl S@GOTOFF(%ebx,%esi,4), %eax xorl -200(%ebp,%edx,4), %eax movl %edi, -208(%ebp,%edx,4) movl S@GOTOFF(%ebx,%eax,4), %edi xorl -196(%ebp,%edx,4), %edi movl %esi, -204(%ebp,%edx,4) movl S@GOTOFF(%ebx,%edi,4), %esi xorl -192(%ebp,%edx,4), %esi movl %eax, -200(%ebp,%edx,4) movl S@GOTOFF(%ebx,%esi,4), %eax xorl -188(%ebp,%edx,4), %eax movl %edi, -196(%ebp,%edx,4) movl %esi, -192(%ebp,%edx,4) movl %eax, -188(%ebp,%edx,4) addl $8, %edx cmpl $47, %edx jle .L28 addl %ecx, %eax incl %ecx andl $255, %eax cmpl $17, %ecx jle .L29 ============================================================= GCC 4.0 ============================================================= .L16: movl -384(%ebp), %eax movl -208(%ebp), %esi incl -384(%ebp) addl %esi, %eax movl -456(%ebp), %esi andl $255, %eax movl (%edi,%eax,4), %ecx movl -464(%ebp), %eax xorl %ecx, 
%esi movl (%edi,%esi,4), %edx movl %esi, -368(%ebp) movl %esi, -456(%ebp) movl -488(%ebp), %esi xorl %edx, %eax movl -472(%ebp), %edx movl (%edi,%eax,4), %ecx movl %eax, -364(%ebp) movl %eax, -464(%ebp) xorl %ecx, %edx movl -480(%ebp), %ecx movl (%edi,%edx,4), %eax movl %edx, -360(%ebp) movl %edx, -472(%ebp) xorl %eax, %ecx movl (%edi,%ecx,4), %eax movl %ecx, -356(%ebp) movl %ecx, -480(%ebp) xorl %eax, %esi movl -496(%ebp), %eax movl (%edi,%esi,4), %edx movl %esi, -352(%ebp) movl %esi, -488(%ebp) xorl %edx, %eax movl -504(%ebp), %edx movl (%edi,%eax,4), %ecx movl %eax, -348(%ebp) movl %eax, -496(%ebp) xorl %ecx, %edx movl -512(%ebp), %ecx movl (%edi,%edx,4), %eax movl %edx, -344(%ebp) movl %edx, -504(%ebp) xorl %eax, %ecx movl %ecx, -340(%ebp) movl (%edi,%ecx,4), %eax movl -520(%ebp), %esi movl %ecx, -512(%ebp) xorl %eax, %esi movl -528(%ebp), %eax movl (%edi,%esi,4), %edx movl %esi, -336(%ebp) movl %esi, -520(%ebp) movl -552(%ebp), %esi xorl %edx, %eax movl -536(%ebp), %edx movl (%edi,%eax,4), %ecx movl %eax, -332(%ebp) movl %eax, -528(%ebp) xorl %ecx, %edx movl -544(%ebp), %ecx movl (%edi,%edx,4), %eax movl %edx, -328(%ebp) movl %edx, -536(%ebp) xorl %eax, %ecx movl (%edi,%ecx,4), %eax movl %ecx, -324(%ebp) movl %ecx, -544(%ebp) xorl %eax, %esi movl -556(%ebp), %eax movl (%edi,%esi,4), %edx movl %esi, -320(%ebp) movl %esi, -552(%ebp) movl -568(%ebp), %esi xorl %edx, %eax movl -560(%ebp), %edx movl (%edi,%eax,4), %ecx movl %eax, -316(%ebp) movl %eax, -556(%ebp) xorl %ecx, %edx movl -564(%ebp), %ecx movl (%edi,%edx,4), %eax movl %edx, -312(%ebp) movl %edx, -560(%ebp) xorl %eax, %ecx movl (%edi,%ecx,4), %eax movl %ecx, -308(%ebp) movl %ecx, -564(%ebp) xorl %eax, %esi movl %esi, -304(%ebp) movl (%edi,%esi,4), %edx movl -572(%ebp), %eax movl %esi, -568(%ebp) movl -396(%ebp), %esi xorl %edx, %eax movl -576(%ebp), %edx movl (%edi,%eax,4), %ecx movl %eax, -300(%ebp) movl %eax, -572(%ebp) xorl %ecx, %edx movl -580(%ebp), %ecx movl (%edi,%edx,4), 
%eax movl %edx, -296(%ebp) movl %edx, -576(%ebp) xorl %eax, %ecx movl (%edi,%ecx,4), %eax movl %ecx, -292(%ebp) movl %ecx, -580(%ebp) xorl %eax, %esi movl -400(%ebp), %eax movl (%edi,%esi,4), %edx movl %esi, -288(%ebp) movl %esi, -396(%ebp) movl -412(%ebp), %esi xorl %edx, %eax movl -404(%ebp), %edx movl (%edi,%eax,4), %ecx movl %eax, -284(%ebp) movl %eax, -400(%ebp) xorl %ecx, %edx movl -408(%ebp), %ecx movl (%edi,%edx,4), %eax movl %edx, -280(%ebp) movl %edx, -404(%ebp) xorl %eax, %ecx movl (%edi,%ecx,4), %eax movl %ecx, -276(%ebp) movl %ecx, -408(%ebp) xorl %eax, %esi movl -416(%ebp), %eax movl (%edi,%esi,4), %edx movl %esi, -272(%ebp) movl %esi, -412(%ebp) xorl %edx, %eax movl %eax, -268(%ebp) movl (%edi,%eax,4), %ecx movl -420(%ebp), %edx movl %eax, -416(%ebp) movl -428(%ebp), %esi xorl %ecx, %edx movl -424(%ebp), %ecx movl (%edi,%edx,4), %eax movl %edx, -264(%ebp) movl %edx, -420(%ebp) xorl %eax, %ecx movl (%edi,%ecx,4), %eax movl %ecx, -260(%ebp) movl %ecx, -424(%ebp) xorl %eax, %esi movl -432(%ebp), %eax movl (%edi,%esi,4), %edx movl %esi, -256(%ebp) movl %esi, -428(%ebp) movl -444(%ebp), %esi xorl %edx, %eax movl -436(%ebp), %edx movl (%edi,%eax,4), %ecx movl %eax, -252(%ebp) movl %eax, -432(%ebp) xorl %ecx, %edx movl -440(%ebp), %ecx movl (%edi,%edx,4), %eax movl %edx, -248(%ebp) movl %edx, -436(%ebp) xorl %eax, %ecx movl (%edi,%ecx,4), %eax movl %ecx, -244(%ebp) movl %ecx, -440(%ebp) xorl %eax, %esi movl -448(%ebp), %eax movl (%edi,%esi,4), %edx movl %esi, -240(%ebp) movl %esi, -444(%ebp) xorl %edx, %eax movl -452(%ebp), %edx movl (%edi,%eax,4), %ecx movl %eax, -236(%ebp) movl %eax, -448(%ebp) xorl %ecx, %edx movl %edx, -232(%ebp) movl (%edi,%edx,4), %eax movl -460(%ebp), %ecx movl -468(%ebp), %esi movl %edx, -452(%ebp) xorl %eax, %ecx movl (%edi,%ecx,4), %eax movl %ecx, -228(%ebp) movl %ecx, -460(%ebp) xorl %eax, %esi movl -476(%ebp), %eax movl (%edi,%esi,4), %edx movl %esi, -224(%ebp) movl %esi, -468(%ebp) movl -500(%ebp), %esi xorl %edx, %eax movl 
-484(%ebp), %edx movl (%edi,%eax,4), %ecx movl %eax, -220(%ebp) movl %eax, -476(%ebp) xorl %ecx, %edx movl -492(%ebp), %ecx movl (%edi,%edx,4), %eax movl %edx, -216(%ebp) movl %edx, -484(%ebp) xorl %eax, %ecx movl (%edi,%ecx,4), %eax movl %ecx, -212(%ebp) movl %ecx, -492(%ebp) xorl %eax, %esi movl -508(%ebp), %eax movl (%edi,%esi,4), %edx movl %esi, -380(%ebp) movl %esi, -500(%ebp) xorl %edx, %eax movl -516(%ebp), %edx movl (%edi,%eax,4), %esi movl %eax, -376(%ebp) movl %eax, -508(%ebp) xorl %esi, %edx movl -524(%ebp), %esi movl (%edi,%edx,4), %ecx movl %edx, -372(%ebp) movl %edx, -516(%ebp) xorl %ecx, %esi movl %esi, -524(%ebp) movl -532(%ebp), %ecx movl (%edi,%esi,4), %edx xorl %edx, %ecx movl -540(%ebp), %edx movl (%edi,%ecx,4), %eax movl %ecx, -532(%ebp) xorl %eax, %edx movl -548(%ebp), %eax xorl (%edi,%edx,4), %eax movl %edx, -540(%ebp) movl %eax, -584(%ebp) movl %eax, -548(%ebp) movl (%edi,%eax,4), %eax xorl %eax, -208(%ebp) cmpl $17, -384(%ebp) jne .L16 ============================================================= The loop was unrolled, but it's clear that the address mode selection is worse. -- What |Removed |Added ---------------------------------------------------------------------------- Status|UNCONFIRMED |NEW Ever Confirmed| |1 Last reconfirmed|0000-00-00 00:00:00 |2005-06-01 22:55:36 date| | http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3 2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl ` (5 preceding siblings ...) 2005-06-01 22:55 ` giovannibajo at libero dot it @ 2005-06-01 22:59 ` giovannibajo at libero dot it 2005-06-02 8:01 ` rakdver at atrey dot karlin dot mff dot cuni dot cz ` (21 subsequent siblings) 28 siblings, 0 replies; 35+ messages in thread From: giovannibajo at libero dot it @ 2005-06-01 22:59 UTC (permalink / raw) To: gcc-bugs ------- Additional Comments From giovannibajo at libero dot it 2005-06-01 22:59 ------- I wonder if this is fixed by TARGET_MEM_REF. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3 2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl ` (6 preceding siblings ...) 2005-06-01 22:59 ` giovannibajo at libero dot it @ 2005-06-02 8:01 ` rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-06-06 7:16 ` steven at gcc dot gnu dot org ` (20 subsequent siblings) 28 siblings, 0 replies; 35+ messages in thread From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-02 8:01 UTC (permalink / raw) To: gcc-bugs ------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-06-02 08:01 ------- Subject: Re: openssl is slower when compiled with gcc 4.0 than 3.3 The assembler attributed to 4.0 was produced by mainline (or some patched version of 4.0), wasn't it? Otherwise I cannot imagine why the inner loop would be unrolled. For plain 4.0, we get the following code, which seems just fine and equivalent to the one obtained with 3.4 (one of the memory references is strength reduced, but since we still fit into registers, this is OK). I don't just now see what/whether there is some problem with the code produced by 4.1, but I also don't see anything related to addressing mode selection there. 
.L21:
	movl	S@GOTOFF(%ebx,%eax,4), %eax
	xorl	(%edx), %eax
	movl	%eax, (%edx)
	movl	S@GOTOFF(%ebx,%eax,4), %eax
	xorl	4(%edx), %eax
	movl	%eax, 4(%edx)
	movl	S@GOTOFF(%ebx,%eax,4), %eax
	xorl	8(%edx), %eax
	movl	%eax, 8(%edx)
	movl	S@GOTOFF(%ebx,%eax,4), %eax
	xorl	12(%edx), %eax
	movl	%eax, 12(%edx)
	movl	S@GOTOFF(%ebx,%eax,4), %eax
	xorl	16(%edx), %eax
	movl	%eax, 16(%edx)
	movl	S@GOTOFF(%ebx,%eax,4), %eax
	xorl	20(%edx), %eax
	movl	%eax, 20(%edx)
	movl	S@GOTOFF(%ebx,%eax,4), %eax
	xorl	24(%edx), %eax
	movl	%eax, 24(%edx)
	movl	S@GOTOFF(%ebx,%eax,4), %eax
	xorl	28(%edx), %eax
	movl	%eax, 28(%edx)
	addl	$32, %edx
	leal	-12(%ebp), %esi
	cmpl	%esi, %edx
	jne	.L21

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3 2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl ` (7 preceding siblings ...) 2005-06-02 8:01 ` rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-06 7:16 ` steven at gcc dot gnu dot org 2005-06-06 7:30 ` rakdver at atrey dot karlin dot mff dot cuni dot cz ` (19 subsequent siblings) 28 siblings, 0 replies; 35+ messages in thread From: steven at gcc dot gnu dot org @ 2005-06-06 7:16 UTC (permalink / raw) To: gcc-bugs ------- Additional Comments From steven at gcc dot gnu dot org 2005-06-06 07:16 ------- Could L1 icache blow-out be the reason? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3 2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl ` (8 preceding siblings ...) 2005-06-06 7:16 ` steven at gcc dot gnu dot org @ 2005-06-06 7:30 ` rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-06-06 13:33 ` giovannibajo at libero dot it ` (18 subsequent siblings) 28 siblings, 0 replies; 35+ messages in thread From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-06 7:30 UTC (permalink / raw) To: gcc-bugs ------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-06-06 07:30 ------- Subject: Re: openssl is slower when compiled with gcc 4.0 than 3.3 > Could L1 icache blow-out be the reason? This is not likely with the minimized example. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3 2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl ` (9 preceding siblings ...) 2005-06-06 7:30 ` rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-06 13:33 ` giovannibajo at libero dot it 2005-06-06 14:40 ` giovannibajo at libero dot it ` (17 subsequent siblings) 28 siblings, 0 replies; 35+ messages in thread From: giovannibajo at libero dot it @ 2005-06-06 13:33 UTC (permalink / raw) To: gcc-bugs ------- Additional Comments From giovannibajo at libero dot it 2005-06-06 13:33 ------- Uhm, at this point, I don't believe anymore that the loop I posted is the cause of the regression. Maybe the regression is somewhere else. I'll investigate. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3 2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl ` (10 preceding siblings ...) 2005-06-06 13:33 ` giovannibajo at libero dot it @ 2005-06-06 14:40 ` giovannibajo at libero dot it 2005-06-06 15:00 ` rakdver at atrey dot karlin dot mff dot cuni dot cz ` (16 subsequent siblings) 28 siblings, 0 replies; 35+ messages in thread From: giovannibajo at libero dot it @ 2005-06-06 14:40 UTC (permalink / raw) To: gcc-bugs ------- Additional Comments From giovannibajo at libero dot it 2005-06-06 14:40 -------

Looks like the culprit is this:

=========================================================================
static unsigned int S[256];
unsigned
md2_block (unsigned int *sp1, unsigned int *sp2, const unsigned char *d)
{
  register unsigned int t;
  register int i, j;
  static unsigned int state[48];

  j = sp2[16 - 1];
  for (i = 0; i < 16; i++)
    {
      state[i] = sp1[i];
      state[i + 16] = t = d[i];
      state[i + 32] = (t ^ sp1[i]);
      j = sp2[i] ^= S[t ^ j];
    }
}
=========================================================================

gcc 3.4.3 -fPIC -O2:
===================================================
.L5:
	movl	8(%ebp), %esi
	movl	(%esi,%ecx,4), %eax
	movl	%eax, state.0@GOTOFF(%ebx,%ecx,4)
	movl	16(%ebp), %edx
	movzbl	(%edx,%ecx), %eax
	movl	%eax, 64+state.0@GOTOFF(%ebx,%ecx,4)
	movl	(%esi,%ecx,4), %edx
	xorl	%eax, %edx
	movl	-16(%ebp), %esi
	xorl	-20(%ebp), %eax
	movl	%edx, 128+state.0@GOTOFF(%ebx,%ecx,4)
	movl	(%esi,%eax,4), %eax
	xorl	(%edi,%ecx,4), %eax
	movl	%eax, (%edi,%ecx,4)
	incl	%ecx
	cmpl	$15, %ecx
	movl	%eax, -20(%ebp)
	jle	.L5
===================================================

gcc 4.1.0 20050529 -fPIC -O2:
===================================================
.L2:
	movl	8(%ebp), %eax
	leal	0(,%edi,4), %ecx
	movl	%ecx, -28(%ebp)
	addl	%ecx, %eax
	movl	16(%ebp), %ecx
	movl	%eax, %edx
	movl	%eax, -24(%ebp)
	movl	-4(%eax), %eax
	movl	%eax, (%esi)
	movzbl	-1(%ecx,%edi), %eax
	incl	%edi
	movl	%eax, 64(%esi)
	movl	-4(%edx), %ecx
	movl	12(%ebp), %edx
	xorl	%eax, %ecx
	movl	%ecx, 128(%esi)
	movl	-28(%ebp), %ecx
	addl	$4, %esi
	addl	%edx, %ecx
	movl	-16(%ebp), %edx
	xorl	%edx, %eax
	movl	-20(%ebp), %edx
	movl	(%edx,%eax,4), %eax
	movl	-4(%ecx), %edx
	xorl	%edx, %eax
	cmpl	$17, %edi
	movl	%eax, -4(%ecx)
	movl	%eax, -16(%ebp)
	jne	.L2
===================================================

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
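[Editorial sketch] For anyone who wants to time the md2_block loop outside OpenSSL, a minimal driver might look like the following. This is an illustration under stated assumptions, not the test program attached to the bug: `init_table`, `time_md2_block`, the table contents, and the input data are all hypothetical, and this copy of `md2_block` returns `j`, whereas the posted test case has no return statement. Note also that nothing reads `state`, so a newer compiler may discard those stores.

```c
#include <assert.h>
#include <time.h>

static unsigned int S[256];

/* Loop body copied from the reduced test case. Assumption: returning j
   gives the function a defined result; the posted version returns nothing. */
unsigned md2_block(unsigned int *sp1, unsigned int *sp2, const unsigned char *d)
{
    unsigned int t;
    int i, j;
    static unsigned int state[48];

    j = sp2[16 - 1];
    for (i = 0; i < 16; i++) {
        state[i] = sp1[i];
        state[i + 16] = t = d[i];
        state[i + 32] = (t ^ sp1[i]);
        j = sp2[i] ^= S[t ^ j];
    }
    return (unsigned)j;
}

/* Hypothetical helper: fill S with a fixed permutation of 0..255 so that
   every index t ^ j stays inside the table. */
void init_table(void)
{
    int i;
    for (i = 0; i < 256; i++)
        S[i] = (unsigned int)(i * 167 + 13) & 255u;
}

/* Hypothetical driver: call md2_block `iterations` times on fixed input and
   return the CPU time used, in seconds. The XOR of all results is stored in
   *checksum so the calls cannot simply be optimized away. */
double time_md2_block(long iterations, unsigned int *checksum)
{
    unsigned int sp1[16], sp2[16], acc = 0;
    unsigned char d[16];
    clock_t start, end;
    long n;
    int i;

    assert(iterations >= 0 && checksum != 0);
    for (i = 0; i < 16; i++) {
        sp1[i] = (unsigned int)i;
        sp2[i] = (unsigned int)(255 - i);   /* keep all values below 256 */
        d[i] = (unsigned char)(i * 3);
    }
    start = clock();
    for (n = 0; n < iterations; n++)
        acc ^= md2_block(sp1, sp2, d);
    end = clock();
    *checksum = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call `init_table()` once, then `time_md2_block()` with an iteration count large enough to run for a few seconds. Building the same file with `-fPIC -O2` under each compiler, as in the listings above, should make the difference in the generated loop measurable.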
* [Bug target/19923] openssl is slower when compiled with gcc 4.0 than 3.3 2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl ` (11 preceding siblings ...) 2005-06-06 14:40 ` giovannibajo at libero dot it @ 2005-06-06 15:00 ` rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-06-08 13:15 ` [Bug target/19923] [4.0/4.1 Regression] " giovannibajo at libero dot it ` (15 subsequent siblings) 28 siblings, 0 replies; 35+ messages in thread From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-06 15:00 UTC (permalink / raw) To: gcc-bugs ------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-06-06 15:00 ------- Subject: Re: openssl is slower when compiled with gcc 4.0 than 3.3

> Looks like the culprit is this:
>
> =========================================================================
> static unsigned int S[256];
> unsigned
> md2_block (unsigned int *sp1, unsigned int *sp2, const unsigned char *d)
> {
>   register unsigned int t;
>   register int i, j;
>   static unsigned int state[48];
>
>   j = sp2[16 - 1];
>   for (i = 0; i < 16; i++)
>     {
>       state[i] = sp1[i];
>       state[i + 16] = t = d[i];
>       state[i + 32] = (t ^ sp1[i]);
>       j = sp2[i] ^= S[t ^ j];
>     }
> }
> =========================================================================

With the TARGET_MEM_REFs patch the result is much better. At least we avoid the multiplication by 4

> leal 0(,%edi,4), %ecx

and other results of the DOM misoptimization of addressing modes, which was one of the main motivations for TARGET_MEM_REFs. We still use one more induction variable (iv) than in the 3.4 case, and as a result we need one more register.

.L2:
	movl	8(%ebp), %edi
	movl	-4(%edi,%ecx,4), %eax
	movl	%eax, (%esi)
	movl	16(%ebp), %edx
	movzbl	-1(%ecx,%edx), %eax
	movl	%eax, 64(%esi)
	movl	-4(%edi,%ecx,4), %edx
	xorl	%eax, %edx
	movl	%edx, 128(%esi)
	xorl	-20(%ebp), %eax
	movl	-16(%ebp), %edi
	movl	(%edi,%eax,4), %eax
	movl	12(%ebp), %edx
	xorl	-4(%edx,%ecx,4), %eax
	movl	%eax, -4(%edx,%ecx,4)
	movl	%eax, -20(%ebp)
	incl	%ecx
	addl	$4, %esi
	cmpl	$17, %ecx
	jne	.L2

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
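[Editorial sketch] The remark about needing one more induction variable can be pictured at the C level. The two functions below are only an analogy with hypothetical names, and are not what the optimizer literally does. The first loop keeps a single counter and leans on scaled-index addressing of the `(%edi,%ecx,4)` kind. The second has been strength-reduced by hand into separately advancing pointers, each of which is an induction variable that wants a register of its own. A compiler is free to turn either form into the other, so the source shape does not guarantee the generated code.

```c
#include <stddef.h>

/* Indexed form: one induction variable (i). On i386 each access can use a
   base + index*4 addressing mode, so no extra pointer registers are needed. */
void xor_indexed(unsigned int *dst, const unsigned int *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] ^= src[i];
}

/* Strength-reduced form: the multiplication by 4 is gone, but dst and src
   are now two separate induction variables, each advanced on every pass. */
void xor_pointers(unsigned int *dst, const unsigned int *src, size_t n)
{
    const unsigned int *end = src + n;
    while (src != end)
        *dst++ ^= *src++;
}
```

Both functions compute the same result. On a register-starved target such as i386, the extra live pointers in the second form are the kind of cost the comment above refers to.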
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3 2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl ` (12 preceding siblings ...) 2005-06-06 15:00 ` rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-08 13:15 ` giovannibajo at libero dot it 2005-06-17 0:59 ` dank at kegel dot com ` (14 subsequent siblings) 28 siblings, 0 replies; 35+ messages in thread From: giovannibajo at libero dot it @ 2005-06-08 13:15 UTC (permalink / raw) To: gcc-bugs -- What |Removed |Added ---------------------------------------------------------------------------- Summary|openssl is slower when |[4.0/4.1 Regression] openssl |compiled with gcc 4.0 than |is slower when compiled with |3.3 |gcc 4.0 than 3.3 Target Milestone|--- |4.0.2 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3 2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl ` (13 preceding siblings ...) 2005-06-08 13:15 ` [Bug target/19923] [4.0/4.1 Regression] " giovannibajo at libero dot it @ 2005-06-17 0:59 ` dank at kegel dot com 2005-06-17 1:11 ` pinskia at gcc dot gnu dot org ` (13 subsequent siblings) 28 siblings, 0 replies; 35+ messages in thread From: dank at kegel dot com @ 2005-06-17 0:59 UTC (permalink / raw) To: gcc-bugs ------- Additional Comments From dank at kegel dot com 2005-06-17 00:59 -------

We're learning more about this bug. Anthony Danalis has boiled down the testcase much further; I'll attach the reduced testcase as foo4.i.

It looks like it shows up if your /proc/cpuinfo says

vendor_id  : GenuineIntel
cpu family : 15
model      : 2
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping   : 9
cpu MHz    : 2793.051
cache size : 512 KB

but not if your /proc/cpuinfo says

vendor_id  : GenuineIntel
cpu family : 15
model      : 4
model name : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping   : 1
cpu MHz    : 3200.255
cache size : 1024 KB

But here's the fun part: on the newer CPU with the bigger cache, gcc-2.95.3 was just as slow as gcc-3.4.3/gcc-4.0.0. Go figure. We'll add more details once we've got more info.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3 2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl ` (14 preceding siblings ...) 2005-06-17 0:59 ` dank at kegel dot com @ 2005-06-17 1:11 ` pinskia at gcc dot gnu dot org 2005-06-18 6:24 ` dank at kegel dot com ` (12 subsequent siblings) 28 siblings, 0 replies; 35+ messages in thread From: pinskia at gcc dot gnu dot org @ 2005-06-17 1:11 UTC (permalink / raw) To: gcc-bugs ------- Additional Comments From pinskia at gcc dot gnu dot org 2005-06-17 01:10 ------- (In reply to comment #14) > We're learning more about this bug. > Anthony Danalis has boiled down the testcase much further; > I'll attach the reduced testcase as foo4.i. Yes you know what the difference is between those two, the second one is not really a P4 but really a new core, Intel marketing at its best. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3 2005-02-12 20:07 [Bug c/19923] New: openssl is slower when compiled with gcc 4.0 than 3.3 gj at pointblue dot com dot pl ` (15 preceding siblings ...) 2005-06-17 1:11 ` pinskia at gcc dot gnu dot org @ 2005-06-18 6:24 ` dank at kegel dot com 2005-06-18 6:39 ` dank at kegel dot com ` (11 subsequent siblings) 28 siblings, 0 replies; 35+ messages in thread From: dank at kegel dot com @ 2005-06-18 6:24 UTC (permalink / raw) To: gcc-bugs ------- Additional Comments From dank at kegel dot com 2005-06-18 06:24 -------

Looks to me like gcc-3.4.3 is known to fail, too, depending on the CPU. Anthony Danalis and I came up with a little script to run foo4.i on various processors with various values for -mtune, which I'll attach; here are the results for four different x86 variants. The last two columns are the time on gcc-3.4.3 and gcc-4.0.0 divided by the time on gcc-2.95.3, so any value above 1.0 in the last column is a performance regression. Rows are sorted by the last column. The first five rows represent performance regressions for gcc-3.4.3; the first three also represent performance regressions for gcc-4.0.0.

family,model,name pic? tune [t_295, t_343, t_400] [t_295/t_295, t_343/t_295, t_400/t_295]
6,8, Pentium III (Coppermine), -fPIC athlon-xp [9.25, 16.22, 18.79] [1.00, 1.75, 2.03]
15,2, Xeon(TM) CPU 2.60GHz, -fPIC pentium4 [1.91, 3.89, 3.27] [1.00, 2.04, 1.71]
6,8, Pentium III (Coppermine), -fPIC pentium3 [9.15, 10.10, 13.20] [1.00, 1.10, 1.44]
15,2, Xeon(TM) CPU 2.60GHz, -fPIC athlon-xp [1.91, 2.00, 1.95] [1.00, 1.05, 1.02]
6,8, Pentium III (Coppermine), -fPIC pentium4 [9.27, 10.49, 8.87] [1.00, 1.13, 0.96]
--- ok below this line ---
6,8, Pentium III (Coppermine), pentium4 [14.74, 13.71, 14.12] [1.00, 0.93, 0.96]
15,4, Athlon(tm) 64 3000+, -fPIC pentium4 [4.12, 3.68, 3.74] [1.00, 0.89, 0.91]
15,4, Pentium(R) 4 CPU 3.20GHz, -fPIC pentium4 [2.48, 2.18, 2.09] [1.00, 0.88, 0.84]
15,4, Athlon(tm) 64 3000+, -fPIC athlon-xp [4.12, 3.50, 3.20] [1.00, 0.85, 0.78]
15,4, Pentium(R) 4 CPU 3.20GHz, pentium4 [2.17, 1.07, 1.07] [1.00, 0.49, 0.49]
6,8, Pentium III (Coppermine), pentium3 [14.22, 6.26, 6.46] [1.00, 0.44, 0.45]
6,8, Pentium III (Coppermine), athlon-xp [14.93, 6.26, 6.27] [1.00, 0.42, 0.42]
15,4, Athlon(tm) 64 3000+, pentium4 [3.65, 1.39, 1.39] [1.00, 0.38, 0.38]
15,4, Athlon(tm) 64 3000+, athlon-xp [3.65, 1.39, 1.40] [1.00, 0.38, 0.38]
15,2, Xeon(TM) CPU 2.60GHz, pentium4 [6.42, 0.97, 0.98] [1.00, 0.15, 0.15]

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923 ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: dank at kegel dot com @ 2005-06-18 6:39 UTC
To: gcc-bugs

------- Additional Comments From dank at kegel dot com 2005-06-18 06:38 -------
To be clear, here are the two most worrying rows from the above table,
expanded a bit. These are the runtimes of foo4.i in seconds.
The cpu family, model, and name are as shown by /proc/cpuinfo.

cpu family 15, model 2, Intel(R) Xeon(TM) CPU 2.60GHz:
-fPIC -mtune=pentium4 -O3
  gcc-2.95.3: 1.91 seconds
  gcc-3.4.3:  3.89
  gcc-4.0.0:  3.27

cpu family 6, model 8, Pentium III (Coppermine)
-fPIC -mtune=pentium3 -O3
  gcc-2.95.3:  9.15
  gcc-3.4.3:  10.10
  gcc-4.0.0:  13.20

gcc-4.0.0 produces code that runs 1.7 and 1.4 times slower than gcc-2.95.3
on these (fairly common!) cpus, even when the proper -mtune is used.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923

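The 1.7 and 1.4 figures are simply each gcc-4.0.0 runtime divided by the gcc-2.95.3 runtime on the same machine. A minimal C sketch of that arithmetic follows; the helper name `slowdown` is made up for illustration and is not part of the attached script.

```c
#include <assert.h>

/* Hypothetical helper: runtime relative to the gcc-2.95.3 baseline.
   A result above 1.0 means the newer compiler's code is slower. */
double slowdown(double t_new, double t_baseline)
{
    assert(t_baseline > 0.0);
    return t_new / t_baseline;
}
```

With the timings above, `slowdown(3.27, 1.91)` gives roughly 1.71 for the Xeon and `slowdown(13.20, 9.15)` roughly 1.44 for the Pentium III.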
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: dank at kegel dot com @ 2005-06-18 17:47 UTC
To: gcc-bugs

------- Additional Comments From dank at kegel dot com 2005-06-18 17:46 -------
The above tests did not use -mcpu on gcc-2.95.3, so they were
comparing apples to oranges, kind of. I reran them on a PIII
with gcc-2.95.3 -mcpu=$tune -O3 and gcc-[34] -mtune=$tune -O3.
The problem persists even when using the most appropriate
tuning option for the CPU in question.

cpu family 6, model 8, Pentium III (Coppermine):
-fPIC -mcpu=pentium -O3
  gcc-2.95.3:  7.61
  gcc-3.4.3:  27.43
  gcc-4.0.0:  17.57

cpu family 6, model 8, Pentium III (Coppermine):
-fPIC -mcpu=pentiumpro -O3
  gcc-2.95.3:  9.27
  gcc-3.4.3:  10.09
  gcc-4.0.0:  13.96

cpu family 15, model 2, Intel(R) Xeon(TM) CPU 2.60GHz:
-fPIC -mtune=pentium4 -O3
  gcc-2.95.3: 1.91 seconds
  gcc-3.4.3:  3.89
  gcc-4.0.0:  3.27

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923

* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: dank at kegel dot com @ 2005-06-18 22:46 UTC
To: gcc-bugs

------- Additional Comments From dank at kegel dot com 2005-06-18 22:45 -------
I asked the fellow who posted the original problem report
to give me the results of 'cat /proc/cpuinfo' on the affected machine.
Here it is:

vendor_id  : GenuineIntel
cpu family : 6
model      : 8
model name : Pentium III (Coppermine)
stepping   : 10
cpu MHz    : 896.153

This is the same as one of the two affected CPU types here.

The slow routine appears to be the buffer cleaning routine,
though I haven't verified this with oprofile yet.
Here's its loop:

    static char cleanse_ctr;
    ...
    while (len--) {
        *(ptr++) = cleanse_ctr;
        cleanse_ctr += (17 + (unsigned char) ((int) ptr & 0xF));
    }

and the output of -O3 -fPIC for both gcc-2.95.3 and gcc-4.0.0:

--- gcc-2.95.3 ---
.L5:
        movl cleanse_ctr@GOT(%ebx),%edi
        movb (%edi),%al
        movb %al,(%edx)
        incl %edx
        movb (%edi),%cl
        addb $17,%cl
        movb %dl,%al
        andb $15,%al
        addb %al,%cl
        movb %cl,(%edi)
        subl $1,%esi
        jnc .L5
.L4:

--- gcc-4 ---
.L4:
        movb (%esi), %al
        movb %al, (%edx)
        leal (%ecx,%edi), %eax
        andl $15, %eax
        incl %ecx
        addb (%esi), %al
        incl %edx
        addl $17, %eax
        cmpl %ecx, 12(%ebp)
        movb %al, (%esi)
        jne .L4

It's not obvious to me why the gcc-4.0.0 generated code should be
slower when run on some CPUs, if in fact it is. Is it the
fact that the loop condition is checked with a cmp against memory
instead of a flag being set by subtracting 1 from a register?

(And where's the best place to learn about how to predict
how long assembly snippets like this will take to run on various
modern CPUs, anyway?)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923

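For anyone wanting to time the loop outside OpenSSL, here is a minimal C sketch of a harness around it. This is not foo4.i. The names `cleanse_loop`, `time_cleanse_loop` and `cleanse_sink` are made up, the counter is declared `unsigned char` rather than `char`, and the pointer cast goes through `uintptr_t` instead of `int` so the sketch also builds cleanly on 64-bit hosts. The elided rest of the routine is not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Assumption: unsigned char instead of the original "static char",
   so the wrap-around arithmetic is well defined. */
unsigned char cleanse_ctr;

/* Keeps the compiler from discarding the buffer stores as dead. */
volatile unsigned char cleanse_sink;

/* The loop from the comment above; only the cast differs. */
void cleanse_loop(unsigned char *ptr, size_t len)
{
    while (len--) {
        *(ptr++) = cleanse_ctr;
        cleanse_ctr += (17 + (unsigned char)((uintptr_t)ptr & 0xF));
    }
}

/* Hypothetical helper: CPU seconds spent on reps passes over len bytes. */
double time_cleanse_loop(size_t len, int reps)
{
    unsigned char *buf = calloc(len ? len : 1, 1);
    assert(buf != NULL);

    clock_t start = clock();
    for (int i = 0; i < reps; i++) {
        cleanse_loop(buf, len);
        cleanse_sink = buf[0];   /* observable read of the buffer */
    }
    clock_t end = clock();

    free(buf);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Built with the flags under discussion (for example -O3 -fPIC -mtune=pentium3), calling `time_cleanse_loop` with a large enough repetition count should give numbers comparable across compiler versions on the same machine, though not with the figures reported in this thread.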
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: dank at kegel dot com @ 2005-06-24 15:00 UTC
To: gcc-bugs

------- Additional Comments From dank at kegel dot com 2005-06-24 15:00 -------
Michael Meissner looked at the code, and saw that
gcc-2.95.3 converts the loop to a countdown loop,
but gcc-3.x doesn't, which wastes a precious register.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923

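To make the countdown point concrete, here is a small C sketch of the two loop shapes at source level. It only illustrates the idea; it is not what either compiler emits, the function names are made up, and a modern compiler may well turn both into the same code (or a memset call).

```c
#include <assert.h>
#include <stddef.h>

/* Count-up shape: the index i and the bound len are both live for
   the whole loop, so each needs a register or a stack slot. */
void fill_count_up(unsigned char *p, size_t len, unsigned char v)
{
    assert(p != NULL || len == 0);
    for (size_t i = 0; i != len; i++)
        *p++ = v;
}

/* Countdown shape: only the remaining count is live, and the
   decrement itself can set the flags the branch tests. */
void fill_count_down(unsigned char *p, size_t len, unsigned char v)
{
    assert(p != NULL || len == 0);
    while (len != 0) {
        *p++ = v;
        len--;
    }
}
```

On a register-starved target such as 32-bit x86 with -fPIC, the one register saved by the second shape can decide whether the loop bound stays in a register or is compared from the stack.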
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: dank at kegel dot com @ 2005-06-24 15:01 UTC
To: gcc-bugs

------- Additional Comments From dank at kegel dot com 2005-06-24 15:01 -------
And, for what it's worth, the latest 4.1 snapshot also suffers from this.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923

* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: steven at gcc dot gnu dot org @ 2005-06-24 15:53 UTC
To: gcc-bugs

------- Additional Comments From steven at gcc dot gnu dot org 2005-06-24 15:53 -------
I don't see how the precious register would matter much. But this compare
with memory is strange:

        cmpl %ecx, 12(%ebp)

Why isn't len loaded into a register??

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923

* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-24 16:24 UTC
To: gcc-bugs

------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-06-24 16:24 -------
Subject: Re: [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3

> I don't see how the precious register would matter much. But this compare
> with memory is strange:
>
> cmpl %ecx, 12(%ebp)
>
> Why isn't len loaded into a register??

You answer your own question -- because there is no register free; that's why
the precious register matters that much. (I guess; I may be wrong).

Zdenek

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923

* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: dann at godzilla dot ics dot uci dot edu @ 2005-06-24 17:41 UTC
To: gcc-bugs

------- Additional Comments From dann at godzilla dot ics dot uci dot edu 2005-06-24 17:41 -------
(In reply to comment #21)
> The slow routine appears to be the buffer cleaning routine,
> though I haven't verified this with oprofile yet.
> Here's its loop:
> static char cleanse_ctr;
> ...
> while (len--) {
>     *(ptr++) = cleanse_ctr;
>     cleanse_ctr += (17 + (unsigned char) ((int) ptr & 0xF));
> }

[Not entirely related, but..]
There's one obvious way to improve this loop. The compiler cannot prove that the
write *(ptr++) does not alias the global variable cleanse_ctr, so it will
read it from memory in each iteration.
To avoid the extra memory read just do something like:

    void OPENSSL_cleanse(unsigned char *ptr, unsigned int len)
    {
        unsigned char local_cleanse_ctr = cleanse_ctr;
        while (len--) {
            *(ptr++) = local_cleanse_ctr;
            local_cleanse_ctr += (17 + (unsigned char) ((int) ptr & 0xF));
        }
        local_cleanse_ctr += 63;
        cleanse_ctr = local_cleanse_ctr;
    }

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923

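The local-copy rewrite is only valid when the buffer does not overlap the counter, which is exactly what the compiler cannot assume. A minimal C sketch that checks the two variants agree on an ordinary buffer is given below. The names `cleanse_global`, `cleanse_local` and `variants_agree` are made up, the trailing `+= 63` in the global variant is assumed so that it matches the suggested rewrite, and `uintptr_t` stands in for the original `(int)` cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

unsigned char cleanse_ctr;

/* Variant that re-reads the global counter on every iteration. */
void cleanse_global(unsigned char *ptr, size_t len)
{
    while (len--) {
        *(ptr++) = cleanse_ctr;
        cleanse_ctr += (17 + (unsigned char)((uintptr_t)ptr & 0xF));
    }
    cleanse_ctr += 63;
}

/* Variant from the comment above: counter cached in a local. */
void cleanse_local(unsigned char *ptr, size_t len)
{
    unsigned char local_cleanse_ctr = cleanse_ctr;
    while (len--) {
        *(ptr++) = local_cleanse_ctr;
        local_cleanse_ctr += (17 + (unsigned char)((uintptr_t)ptr & 0xF));
    }
    local_cleanse_ctr += 63;
    cleanse_ctr = local_cleanse_ctr;
}

/* Returns 1 when both variants write the same bytes and leave the
   same counter, starting from the same seed on the same buffer. */
int variants_agree(unsigned char *buf, size_t len, unsigned char seed)
{
    unsigned char saved[256];
    assert(len <= sizeof saved);

    cleanse_ctr = seed;
    cleanse_global(buf, len);
    unsigned char ctr_global = cleanse_ctr;
    memcpy(saved, buf, len);

    cleanse_ctr = seed;
    cleanse_local(buf, len);
    return cleanse_ctr == ctr_global && memcmp(saved, buf, len) == 0;
}
```

Both runs use the same buffer because the written bytes depend on the low four bits of each address.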
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: rakdver at gcc dot gnu dot org @ 2005-06-25 2:49 UTC
To: gcc-bugs

------- Additional Comments From rakdver at gcc dot gnu dot org 2005-06-25 02:49 -------
Ivopts seems to make several quite doubtful decisions in this testcase.

--
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rakdver at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923

* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: steven at gcc dot gnu dot org @ 2005-06-25 10:15 UTC
To: gcc-bugs

------- Additional Comments From steven at gcc dot gnu dot org 2005-06-25 10:15 -------
Re. comment #25, as far as I can tell there are registers available in
that loop. To quote the loop from comment #12:

.L4:
        movb (%esi), %al
        movb %al, (%edx)
        leal (%ecx,%edi), %eax
        andl $15, %eax
        incl %ecx
        addb (%esi), %al
        incl %edx
        addl $17, %eax
        cmpl %ecx, 12(%ebp)
        movb %al, (%esi)
        jne .L4

Checking off used registers in this loop:
%esi x
%edi x
%eax x
%ebx
%ecx x
%edx x

So %ebx at least is free (and iiuc, with -fomit-frame-pointer %ebp is
also free, right?). Maybe the allocator thinks %ebx can't be used
because it is the PIC register.

Here is what mainline today ("GCC: (GNU) 4.1.0 20050625 (experimental)")
gives me (x86-64 compiler with "-m32 -march=i686 -O3 -fPIC"):

.L4:
        movzbl (%esi), %eax
        movb %al, (%ecx)
        incl %ecx
        movzbl -13(%ebp), %eax
        movzbl (%esi), %edx
        incb -13(%ebp)
        andb $15, %al
        addb $17, %dl
        addb %dl, %al
        cmpl %edi, %ecx
        movb %al, (%esi)
        jne .L4

The .optimized tree dump looks like this:

<bb 0>:
  len.23 = len - 1;
  if (len.23 != 4294967295) goto <L6>; else goto <L2>;

<L6>:;
  ivtmp.19 = (unsigned char) (signed char) (int) (ptr + 1B);
  ptr.27 = ptr;

<L0>:;
  MEM[base: ptr.27] = cleanse_ctr;
  ptr.27 = ptr.27 + 1B;
  cleanse_ctr = (unsigned char) (((signed char) ivtmp.19 & 15) + (signed char) cleanse_ctr + 17);
  ivtmp.19 = ivtmp.19 + 1;
  if (ptr.27 != (unsigned char *) (ptr + (void *) len.23 + 1B)) goto <L0>; else goto <L2>;

<L2>:;
  cleanse_ctr = (unsigned char) ((signed char) cleanse_ctr + 63);
  return;

Note how the loop test is against ptr. Also, as far as I can tell the
right hand side of the test (i.e. "(ptr + (void *) len.23 + 1B)") is
loop invariant and should have been moved out. And the first two lines are
also just weird, it is probably cheaper on almost any machine to do

  len.23 = len;
  if (len.23 != 0) goto <L6>; else goto <L2>;

<L6>:
  len.23 = len.23 - 1;
  (etc...)

In summary, we just produce crap code here ;-)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923

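The two entry guards discussed above decide the same thing. A small C sketch of that equivalence follows, assuming `len` is a 32-bit unsigned value (which the constant 4294967295 in the dump suggests); the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Guard as it appears in the dump: decrement first, then compare
   with 4294967295 (UINT32_MAX), i.e. detect wrap-around from zero. */
int guard_as_dumped(uint32_t len)
{
    uint32_t len_m1 = len - 1u;   /* wraps to UINT32_MAX when len == 0 */
    return len_m1 != UINT32_MAX;
}

/* Guard as proposed in the comment: compare with zero directly. */
int guard_proposed(uint32_t len)
{
    return len != 0u;
}

/* Spot-check that both guards agree on a few boundary values. */
void check_guards_agree(void)
{
    const uint32_t samples[] = { 0u, 1u, 2u, 255u, 65536u, UINT32_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(guard_as_dumped(samples[i]) == guard_proposed(samples[i]));
}
```

Since the results are identical, the only question is which form is cheaper to emit on a given target.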
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2005-06-25 11:32 UTC
To: gcc-bugs

------- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-06-25 11:32 -------
Subject: Re: [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3

> ------- Additional Comments From steven at gcc dot gnu dot org 2005-06-25 10:15 -------
> Re. comment #25, as far as I can tell there are registers available in
> that loop. To quote the loop from comment #12:
>
> .L4:
>         movb (%esi), %al
>         movb %al, (%edx)
>         leal (%ecx,%edi), %eax
>         andl $15, %eax
>         incl %ecx
>         addb (%esi), %al
>         incl %edx
>         addl $17, %eax
>         cmpl %ecx, 12(%ebp)
>         movb %al, (%esi)
>         jne .L4
>
> Checking off used registers in this loop:
> %esi x
> %edi x
> %eax x
> %ebx
> %ecx x
> %edx x
>
> So %ebx at least is free (and iiuc, with -fomit-frame-pointer %ebp is
> also free, right?). Maybe the allocator thinks %ebx can't be used
> because it is the PIC register.

Yes, %ebx cannot be used because of PIC, and -fomit-frame-pointer is off by
default.

> Here is what mainline today ("GCC: (GNU) 4.1.0 20050625 (experimental)")
> gives me (x86-64 compiler with "-m32 -march=i686 -O3 -fPIC"):
>
> .L4:
>         movzbl (%esi), %eax
>         movb %al, (%ecx)
>         incl %ecx
>         movzbl -13(%ebp), %eax
>         movzbl (%esi), %edx
>         incb -13(%ebp)
>         andb $15, %al
>         addb $17, %dl
>         addb %dl, %al
>         cmpl %edi, %ecx
>         movb %al, (%esi)
>         jne .L4
>
> The .optimized tree dump looks like this:
>
> <bb 0>:
>   len.23 = len - 1;
>   if (len.23 != 4294967295) goto <L6>; else goto <L2>;

> And the first two lines are
> also just weird, it is probably cheaper on almost any machine to do
>   len.23 = len;
>   if (len.23 != 0) goto <L6>; else goto <L2>;
>
> <L6>:
>   len.23 = len.23 - 1;
>   (etc...)

Not really. On i686, there should be no difference.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923

* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: mmitchel at gcc dot gnu dot org @ 2005-09-27 15:57 UTC
To: gcc-bugs

--
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.0.2                       |4.0.3

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923

* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: pinskia at gcc dot gnu dot org @ 2005-10-27 0:47 UTC
To: gcc-bugs

------- Comment #31 from pinskia at gcc dot gnu dot org 2005-10-27 00:47 -------
(In reply to comment #30)
> This patch could help; I need to benchmark it before submitting it.

Any news about this patch?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923

* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: mmitchel at gcc dot gnu dot org @ 2005-10-31 2:39 UTC
To: gcc-bugs

------- Comment #32 from mmitchel at gcc dot gnu dot org 2005-10-31 02:39 -------
Leaving as P2 as this is a significant pessimization on a significant piece of
code on relatively common processors.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923

* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: steven at gcc dot gnu dot org @ 2005-11-16 9:42 UTC
To: gcc-bugs

------- Comment #33 from steven at gcc dot gnu dot org 2005-11-16 09:42 -------
Zdenek, any news about your patch from comment #30?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923

* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: rakdver at gcc dot gnu dot org @ 2005-11-17 13:35 UTC
To: gcc-bugs

------- Comment #34 from rakdver at gcc dot gnu dot org 2005-11-17 13:35 -------
It behaves somewhat erratically on SPEC2000 (it increases the overall score,
but there are some significant regressions). And, it also causes us to produce
worse code for this testcase at the moment, due to a misunderstanding between
ivopts and fold; the expression

  (unsigned char) (signed char) (int) (ptr + 1B) - (unsigned char) ptr

is produced, and it is not folded to 1.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923

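Why that expression ought to fold to 1 can be checked by hand in C. The sketch below is only an illustration: `addr` stands in for `ptr`, the function names are made up, and the final subtraction is truncated back to `unsigned char` to mimic the 8-bit type of the tree expression (plain C would promote both operands to `int`). The `(signed char)` conversion of values above 127 is implementation-defined in C; GCC wraps modulo 256.

```c
#include <assert.h>
#include <stdint.h>

/* Difference between the two truncated values, computed in 8 bits. */
unsigned char iv_step(uintptr_t addr)
{
    unsigned char next = (unsigned char)(signed char)(int)(addr + 1);
    unsigned char cur  = (unsigned char)addr;
    return (unsigned char)(next - cur);
}

/* Returns 1 if the step is 1 for every low-byte value, including
   the wrap from 0xFF to 0x00. */
int step_is_always_one(void)
{
    for (uintptr_t addr = 0; addr < 1024; addr++)
        if (iv_step(addr) != 1)
            return 0;
    return 1;
}
```

The intermediate casts only change how the low byte is interpreted, not its bits, so in modulo-256 arithmetic the difference is always 1.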
* [Bug target/19923] [4.0/4.1 Regression] openssl is slower when compiled with gcc 4.0 than 3.3
From: rakdver at gcc dot gnu dot org @ 2005-11-17 15:09 UTC
To: gcc-bugs

------- Comment #35 from rakdver at gcc dot gnu dot org 2005-11-17 15:09 -------
Created an attachment (id=10263)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10263&action=view)
Patch

After some playing with fold, I arrived at the following patch, which almost
works. With the patch, the code for the loop is

<L0>:;
  MEM[base: ptr]{*ptr} = cleanse_ctr;
  ptr = ptr + 1B;
  cleanse_ctr = (unsigned char) (((signed char) ptr & 15) + (signed char) cleanse_ctr + 17);
  len = len - 1;
  if (len != 0) goto <L0>; else goto <L2>;

Which seems just fine. The assembler is

.L3:
        movb (%edi), %al
        movb %al, (%ecx)
        incl %ecx
        movb %cl, %al
        andl $15, %eax
        movb (%edi), %dl
        addl $17, %edx
        addl %edx, %eax
        movb %al, (%edi)
        decl %esi
        jne .L3

Which also seems OK to me. However, the "ugly" version we produce without the
patch:

.L4:
        movb (%edi), %al
        movb %al, (%ecx)
        incl %ecx
        movb -16(%ebp), %al
        addl %esi, %eax
        andl $15, %eax
        movb (%edi), %dl
        addl $17, %edx
        addl %edx, %eax
        movb %al, (%edi)
        incl %esi
        cmpl 12(%ebp), %esi
        jne .L4

is faster by 30%, for reasons I just don't understand :-(

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
