* Re: gcc will become the best optimizing x86 compiler @ 2008-07-30 15:34 Eus 2008-07-30 16:09 ` Dennis Clarke 0 siblings, 1 reply; 46+ messages in thread From: Eus @ 2008-07-30 15:34 UTC (permalink / raw) To: Dennis Clarke; +Cc: GCC Development Mailing List Hi Ho! --- On Tue, 7/29/08, "Dennis Clarke" <blastwave@gmail.com> wrote: > hold on .. on the NEWS page I see ... okay .. how very user friendly. > Sort of the thing one would put on the project homepage I would think. Do you mind to tell me what you saw? I was looking for the interesting part on the latest release of the NEWS on the CVS but to no avail. Thank you for your help. > Dennis Best regards, Eus ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler
  2008-07-30 15:34 gcc will become the best optimizing x86 compiler Eus
@ 2008-07-30 16:09 ` Dennis Clarke
  0 siblings, 0 replies; 46+ messages in thread
From: Dennis Clarke @ 2008-07-30 16:09 UTC
To: eus; +Cc: GCC Development Mailing List

On Wed, Jul 30, 2008 at 3:23 PM, Eus <eus@member.fsf.org> wrote:
> Hi Ho!
>
> --- On Tue, 7/29/08, "Dennis Clarke" <blastwave@gmail.com> wrote:
>
>> hold on .. on the NEWS page I see ... okay .. how very user friendly.
>> Sort of the thing one would put on the project homepage I would think.
>
> Do you mind to tell me what you saw?
> I was looking for the interesting part on the latest release of the NEWS on the CVS but to no avail.
>
> Thank you for your help.

It says :

GNU C Library NEWS -- history of user-visible changes.  2008-7-27
Copyright (C) 1992-2007, 2008 Free Software Foundation, Inc.
See the end for copying conditions.

Please send GNU C library bug reports via <http://sources.redhat.com/bugzilla/>
using `glibc' in the "product" field.

Version 2.9

* Unified lookup for getaddrinfo: IPv4 and IPv6 addresses are now looked
  up at the same time. Implemented by Ulrich Drepper.

* TLS descriptors for LD and GD on x86 and x86-64.
  Implemented by Alexandre Oliva.

* getaddrinfo now handles DCCP and UDPlite.
  Implemented by Ulrich Drepper.

* New fixed-size conversion macros: htobe16, htole16, be16toh, le16toh,
  htobe32, htole32, be32toh, le32toh, htobe64, htole64, be64toh, le64toh.
  Implemented by Ulrich Drepper.

* New implementation of memmem, strstr, and strcasestr which is O(n).
  Implemented by Eric Blake.

* New Linux interfaces: inotify_init1, paccept, dup3, epoll_create2, pipe2

* Implement "e" option for popen to open file descriptor with the
  close-on-exec flag set

* Many functions, exported and internal, now atomically set the
  close-on-exec flag when run on a sufficiently new kernel.
  Implemented by Ulrich Drepper.

Version 2.8

* New locales: bo_CN, bo_IN, shs_CA.

* New encoding: HP-ROMAN9, HP-GREEK8, HP-THAI8, HP-TURKISH8.

* Sorting rules for some Indian languages (Devanagari and Gujarati).
  Implemented by Pravin Satpute.

* IPV6 addresses in /etc/resolv.conf can now have a scope ID

* nscd caches now all timeouts for DNS entries.
  Implemented by Ulrich Drepper.

* nscd is more efficient and wakes up less often.
  Implemented by Ulrich Drepper.

* More checking functions: asprintf, dprintf, obstack_printf, vasprintf,
  vdprintf, and obstack_vprintf. Implemented by Jakub Jelinek.

* Faster memset for x86-64.
  Implemented by Harsha Jagasia and H.J. Lu.

* Faster memcpy on x86. Implemented by Ulrich Drepper.

* ARG_MAX is not anymore constant on Linux. Use sysconf(_SC_ARG_MAX).
  Implemented by Ulrich Drepper.

* Faster sqrt and sqrtf implementation for some PPC variants.
  Implemented by Steven Munroe.

Version 2.7

* More checking functions: fread, fread_unlocked, open*, mq_open.
  Implemented by Jakub Jelinek and Ulrich Drepper.

* Extend fortification to C++. Implemented by Jakub Jelinek.

* Implement 'm' modifier for scanf. Add stricter C99/SUS compliance
  by not recognizing 'a' as a modifier when those specs are requested.
  Implemented by Jakub Jelinek.

* PPC optimizations to math and string functions.
  Implemented by Steven Munroe.

* New interfaces: mkostemp, mkostemp64. Like mkstemp* but allow additional
  options to be passed. Implemented by Ulrich Drepper.

* More CPU set manipulation functions. Implemented by Ulrich Drepper.

* New Linux interfaces: signalfd, eventfd, eventfd_read, and eventfd_write.
  Implemented by Ulrich Drepper.

* Handle private futexes in the NPTL implementation.
  Implemented by Jakub Jelinek and Ulrich Drepper.

* Add support for O_CLOEXEC. Implement in Hurd. Use throughout libc.
  Implemented by Roland McGrath and Ulrich Drepper.

* Linux/x86-64 vDSO support. Implemented by Ulrich Drepper.

* SHA-256 and SHA-512 based password encryption.
  Implemented by Ulrich Drepper.

* New locales: ber_DZ, ber_MA, en_NG, fil_PH, fur_IT, fy_DE, ha_NG, ig_NG,
  ik_CA, iu_CA, li_BE, li_NL, nds_DE, nds_NL, pap_AN, sc_IT, tk_TM, ug_CN,
  yo_NG.

* New iconv modules: MAC-CENTRALEUROPE, ISO-8859-9E, KOI8-RU.
  Implemented by Ulrich Drepper.

Version 2.6

* New Linux interfaces: epoll_pwait, sched_getcpu.

* New generic interfaces: strerror_l.

* nscd can now cache the services database. Implemented by Ulrich Drepper.

etc etc etc

Dennis
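The fixed-size conversion macros listed under Version 2.9 can be tried with a few lines of C. This is a minimal sketch, assuming a glibc-based Linux system where <endian.h> provides htobe32 and be32toh; the helper names put_be32 and get_be32 are made up for illustration and are not part of glibc.

```c
#define _DEFAULT_SOURCE   /* asks glibc's <endian.h> to expose htobe32 and friends */
#include <assert.h>
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in big-endian (network) byte order.
   put_be32 is a hypothetical helper name. */
void put_be32(unsigned char out[4], uint32_t host_value)
{
    assert(out != NULL);
    uint32_t wire = htobe32(host_value);   /* host order -> big-endian */
    memcpy(out, &wire, sizeof wire);
}

/* Read a big-endian 32-bit value back into host byte order.
   get_be32 is a hypothetical helper name. */
uint32_t get_be32(const unsigned char in[4])
{
    assert(in != NULL);
    uint32_t wire;
    memcpy(&wire, in, sizeof wire);
    return be32toh(wire);                  /* big-endian -> host order */
}
```

Because the macros name the width and the byte order explicitly, the same source should behave identically on little-endian and big-endian hosts.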
* Is cross-section inlining valid behaviour?
@ 2008-07-23 14:59 Bingfeng Mei
  2008-07-23 17:25 ` gcc will become the best optimizing x86 compiler Agner Fog
  0 siblings, 1 reply; 46+ messages in thread
From: Bingfeng Mei @ 2008-07-23 14:59 UTC
To: gcc

Hello,

I came across a problem related to cross-section inlining. For the
following example,

#include <stdio.h>

static void foo(void) __attribute__((section ("foo")));
static void foo(void)
{
  printf("Hello\n");
}

void bar(void) __attribute__((section ("bar")));
void bar(void)
{
  foo();
}

I compiled with the latest mainline gcc:

gcc tst.c -O3 -S

The foo function is inlined into bar anyway, even though they have
different section attributes. Is this a bug or expected behaviour?

        .file   "tst.c"
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "Hello"
        .section        bar,"ax",@progbits
        .p2align 4,,15
.globl bar
        .type   bar, @function
bar:
.LFB3:
        movl    $.LC0, %edi
        jmp     puts
.LFE3:

Thanks.

Bingfeng Mei
* gcc will become the best optimizing x86 compiler
  2008-07-23 14:59 Is cross-section inlining valid behaviour? Bingfeng Mei
@ 2008-07-23 17:25 ` Agner Fog
  2008-07-23 17:33   ` Tim Prince
                     ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Agner Fog @ 2008-07-23 17:25 UTC
To: gcc

Hi,

I am doing research on optimization of microprocessors and compilers.
Some of you already know my optimization manuals
(www.agner.org/optimize/).

I have tested many different compilers and compared how well they
optimize C++ code. I have been pleased to observe that gcc has been
improved a lot in the last couple of years. The gcc compiler itself now
matches the optimizing performance of the Intel compiler and it beats
all other compilers I have tested. All you hard-working developers
deserve credit for this!

I can imagine that gcc might be the compiler of choice for all x86 and
x86-64 platforms in the future. Actually, the compiler itself is very
close to being the best, but it appears that the function libraries are
lagging behind. I have tested a few of the most important functions in
libc and compared them with other available libraries (MS, Borland,
Intel, Mac). The comparison does not look good for gnu libc. See my test
results in http://www.agner.org/optimize/optimizing_cpp.pdf section 2.6.
The 64-bit version is better than the 32-bit version, though.

The first thing that you can do to improve the performance is to drop
the builtin versions of memory and string functions. The speed can be
improved by up to a factor of 5 in some cases by compiling with
-fno-builtin. The builtin version is never optimal, except for memcpy in
cases where the count is a small compile-time constant so that it can be
replaced by simple mov instructions.

Next, the function libraries should have CPU-dispatching and use the
latest instruction sets where appropriate. You are not even using XMM
registers for memcpy in 64-bit libc.

I think you can borrow code from the Mac/Darwin/Xnu project. They have
optimized these functions very carefully for the Intel Core and Core 2
processors. Of course they have the advantage that they don't need to
support any other processors, whereas gcc has to support every possible
Intel and AMD processor. This means more CPU-dispatching.

I have made a few optimized functions myself and published them as a
multi-platform library (www.agner.org/optimize/asmlib.zip). It is faster
than most other libraries on an Intel Core2 and up to ten times faster
than gcc using builtin functions. My library is published under the GPL
license, but I will allow you to use my code in gnu libc if you wish
(Sorry, I don't have the time to work on the gnu project myself, but you
may contact me for details about the code).

The Windows version of gcc is not up to date, but I think that when gcc
gets a reputation as the best compiler, more people will be motivated to
update cygwin/mingw. A lot of people are actually using it.
* Re: gcc will become the best optimizing x86 compiler 2008-07-23 17:25 ` gcc will become the best optimizing x86 compiler Agner Fog @ 2008-07-23 17:33 ` Tim Prince 2008-07-24 8:04 ` Dennis Clarke 2008-07-24 10:09 ` Zoltán Kócsi 2 siblings, 0 replies; 46+ messages in thread From: Tim Prince @ 2008-07-23 17:33 UTC (permalink / raw) To: Agner Fog; +Cc: gcc Agner Fog wrote: > I have tested a few of the most important functions in > libc and compared them with other available libraries (MS, Borland, > Intel, Mac). The comparison does not look good for gnu libc. See my test > results in http://www.agner.org/optimize/optimizing_cpp.pdf section 2.6. As far as I can see, you identify the library you tested only as "ubuntu g++ 4.2.3." Presumably, that implies some version of glibc? On my x86-64 system where I have glibc-2.6.1-18.3, some of the functions perform much better than those provided with earlier glibc versions. Speaking of the one case where I have looked into it, the builtin_memcpy of gcc for 32-bit linux uses a string move which performs well only for certain cases of short non-aligned strings. The corresponding 64-bit linux will see vastly different levels of performance, depending on the glibc version, as it doesn't use a builtin string move. Certain newer CPUs aim to improve performance of the 32-bit gcc builtin string moves, but don't entirely eliminate the situations where it isn't optimum. The machinery for getting good performing versions in glibc isn't visible on this list. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-23 17:25 ` gcc will become the best optimizing x86 compiler Agner Fog 2008-07-23 17:33 ` Tim Prince @ 2008-07-24 8:04 ` Dennis Clarke 2008-07-24 9:41 ` Agner Fog 2008-07-24 10:09 ` Zoltán Kócsi 2 siblings, 1 reply; 46+ messages in thread From: Dennis Clarke @ 2008-07-24 8:04 UTC (permalink / raw) To: Agner Fog; +Cc: gcc On Wed, Jul 23, 2008 at 12:15 PM, Agner Fog <agner@agner.org> wrote: > Hi, I am doing research on optimization of microprocessors and compilers. > Some of you already know my optimization manuals (www.agner.org/optimize/). Sorry but I'm not buying. The Sun Studio 12 compiler with Solaris 10 on AMD Opteron or UltraSparc beats GCC in almost every single test case that I have seen. On the same hardware. Regardless of a single threaded test case or a multi-threaded test case. The differences do occur with file IO and with situations where peripherals get involved but for pure number crunching and pushing data around in heaps of memory I simply have not seen GCC ever do as well as Sun Studio 12 or even Sun Studio 10. Also, you have provided no data at all. So your assertions are those of a marketing person at the moment. Please post some code that can be compiled and then tested with high resolution timers and perhaps we can compare notes. Dennis Clarke ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler
  2008-07-24  8:04   ` Dennis Clarke
@ 2008-07-24  9:41     ` Agner Fog
  2008-07-24 10:10       ` Dave Korn
  2008-07-24 17:21       ` Raksit Ashok
  0 siblings, 2 replies; 46+ messages in thread
From: Agner Fog @ 2008-07-24 9:41 UTC
To: dclarke; +Cc: gcc, TimothyPrince

Dennis Clarke wrote:
>The Sun Studio 12 compiler with Solaris 10 on AMD Opteron or
>UltraSparc beats GCC in almost every single test case that I have
>seen.

This is memcpy on Solaris:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/i386/gen/memcpy.s

It uses exactly the same method as memcpy on gcc libc, with only minor
differences that have no influence on performance.

> Also, you have provided no data at all.

I have linked to the data rather than copying it here to save space on
the mailing list. Here is the link again:
http://www.agner.org/optimize/optimizing_cpp.pdf section 2.6, page 12.

> So your assertions are those of a marketing person at the moment.

Who sounds like a marketing person, you or me? :-)

> Please post some code that can be compiled and then tested with high resolution timers and perhaps
> we can compare notes.

Here is my code, again:
http://www.agner.org/optimize/asmlib.zip
My test results, referred to above, use the "core clock cycles"
performance counter on Intel and RDTSC on AMD. It's the highest
resolution you can get. Feel free to do your own tests; it's as simple
as linking my library into your test program.

Tim Prince wrote:
>you identify the library you tested only as "ubuntu g++ 4.2.3."

Where can I see the libc version?

>The corresponding 64-bit linux will see vastly different levels of performance, depending on the
>glibc version, as it doesn't use a builtin string move.

Yes, this is exactly what my tests show. 64-bit libc is better than
32-bit libc, but still 3-4 times slower than the best library for
unaligned operands on an Intel.

>Certain newer CPUs aim to improve performance of the 32-bit gcc builtin string moves, but don't
> entirely eliminate the situations where it isn't optimum.

The Intel manuals are not clear about this. Intel Optimization reference
manual says:
>In most cases, applications should take advantage of the default memory routines provided by Intel compilers.

What excellent advice - the Intel compiler puts in a library with an
automatic run-slowly-on-AMD feature!
The Intel library does not use rep movs when running on an Intel CPU.

The AMD software optimization guide mentions specific situations where
rep movs is optimal. However, my tests on an Opteron (K8) show that rep
movs is never optimal on AMD either. I have no access to test it on the
new AMD K10, but I expect the XMM register code to run much faster on
K10 than on K8 because K10 has 128-bit data paths where K8 has only
64-bit.

Evidently, the problem with memcpy has been ignored for years, see
http://softwarecommunity.intel.com/Wiki/Linux/719.htm
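For readers who want to repeat this kind of measurement, a rough harness is sketched below. It is an illustration under stated assumptions, not the test program behind the PDF: it uses POSIX clock_gettime(CLOCK_MONOTONIC) as a portable stand-in for the cycle counters mentioned above, which is much coarser, and the name time_memcpy_ns and its parameters are made up here. The call goes through a volatile function pointer so the compiler cannot substitute its builtin memcpy, so the library version is what gets timed.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Nanoseconds from a monotonic clock. */
static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Best-of-`reps` time for one library memcpy of `len` bytes, with source
   and destination starting at the given byte offsets into 16-byte-aligned
   buffers. time_memcpy_ns is a hypothetical name used for illustration. */
long long time_memcpy_ns(size_t len, size_t src_off, size_t dst_off, int reps)
{
    enum { BUF = 1 << 16 };
    static _Alignas(16) unsigned char src[BUF], dst[BUF];

    assert(reps > 0 && len + src_off <= BUF && len + dst_off <= BUF);

    /* Calling through a volatile pointer forces a real call into libc. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;

    memset(src, 0x5a, sizeof src);
    long long best = -1;
    for (int i = 0; i < reps; i++) {
        long long t0 = now_ns();
        copy(dst + dst_off, src + src_off, len);
        long long t1 = now_ns();
        if (best < 0 || t1 - t0 < best)
            best = t1 - t0;   /* keep the least-disturbed run */
    }
    return best;
}
```

Comparing, say, time_memcpy_ns(4096, 0, 0, 1000) with time_memcpy_ns(4096, 1, 3, 1000) gives a first impression of the aligned versus unaligned cost; for small counts the clock overhead dominates and a cycle counter is the better tool.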
* RE: gcc will become the best optimizing x86 compiler
  2008-07-24  9:41     ` Agner Fog
@ 2008-07-24 10:10       ` Dave Korn
  2008-07-24 13:20         ` Basile STARYNKEVITCH
  2008-07-24 17:21       ` Raksit Ashok
  1 sibling, 1 reply; 46+ messages in thread
From: Dave Korn @ 2008-07-24 10:10 UTC
To: 'Agner Fog', dclarke; +Cc: gcc, TimothyPrince

Agner Fog wrote on 24 July 2008 09:04:

> Tim Prince wrote:
> >you identify the library you tested only as "ubuntu g++ 4.2.3."
> Where can I see the libc version?

  Use whichever package manager ubuntu provides to check the version of the
glibc package. Here's an example from a CentOS system (using rpm):

[dk@quattro ~]$ rpm -q glibc
glibc-2.3.4-2.36
glibc-2.3.4-2.36
[dk@quattro ~]$

    cheers,
      DaveK
--
Can't think of a witty .sigline today....
* Re: gcc will become the best optimizing x86 compiler
  2008-07-24 10:10       ` Dave Korn
@ 2008-07-24 13:20         ` Basile STARYNKEVITCH
  2008-07-24 13:31           ` Dave Korn
                             ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Basile STARYNKEVITCH @ 2008-07-24 13:20 UTC
Cc: 'Agner Fog', gcc, TimothyPrince

Dave Korn wrote:
> Agner Fog wrote on 24 July 2008 09:04:
>
>> Tim Prince wrote:
>> >you identify the library you tested only as "ubuntu g++ 4.2.3."
>> Where can I see the libc version?
>
> Use whichever package manager ubuntu provides to check the version of the
> glibc package. Here's an example fron a centos (using rpm):

On most Linux systems, in addition to using the package manager, the
libc.so file is executable, and when executed, shows info, so on my
Debian/Sid/AMD64 I'm getting

% /lib/libc.so.6
GNU C Library stable release version 2.7, by Roland McGrath et al.
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.3.1 20080523 (prerelease).
Compiled on a Linux >>2.6.25-2-amd64<< system on 2008-06-02.
Available extensions:
        crypt add-on version 2.1 by Michael Glad and others
        GNU Libidn by Simon Josefsson
        Native POSIX Threads Library by Ulrich Drepper et al
        BIND-8.2.3-T5B
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.

Of course, on Ubuntu & Debian, you can query the package system

% dpkg -l libc6
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name           Version        Description
+++-==============-==============-============================================
ii  libc6          2.7-12         GNU C Library: Shared libraries

Regarding the original thread (performance of GCC & standard functions)
it should be stressed that gcc would probably compile them better if
passed machine specific flags.

Lastly, at the recent (July 2008) GCC summit, someone (sorry I forgot
who, probably someone from SuSE) proposed in a BOFS to have architecture
and machine specific hand-tuned (or even hand-written assembly) low
level libraries for such basic things as memset etc..

Thanks for reading
--
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***
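Besides running libc.so.6 or asking the package manager, a program can ask glibc for its version at run time. A minimal sketch, assuming a glibc system: gnu_get_libc_version() is declared in <gnu/libc-version.h>; the wrapper name libc_version and the "unknown" fallback for other C libraries are made up for this example.

```c
#include <assert.h>
#include <stdio.h>   /* any libc header; on glibc this also defines __GLIBC__ */

#ifdef __GLIBC__
#include <gnu/libc-version.h>
#endif

/* Returns the run-time C library version string, or "unknown" when the
   program is not built against glibc. libc_version is a hypothetical
   wrapper name. */
const char *libc_version(void)
{
#ifdef __GLIBC__
    const char *v = gnu_get_libc_version();   /* a string such as "2.7" */
    assert(v != NULL);
    return v;
#else
    return "unknown";
#endif
}
```

Printing libc_version() from a benchmark program records which library was actually loaded. The macros __GLIBC__ and __GLIBC_MINOR__ give the version of the headers the program was compiled against, which may differ from the library found at run time.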
* RE: gcc will become the best optimizing x86 compiler 2008-07-24 13:20 ` Basile STARYNKEVITCH @ 2008-07-24 13:31 ` Dave Korn 2008-07-24 13:59 ` Agner Fog 2008-07-24 15:02 ` Joseph S. Myers 2 siblings, 0 replies; 46+ messages in thread From: Dave Korn @ 2008-07-24 13:31 UTC (permalink / raw) To: 'Basile STARYNKEVITCH'; +Cc: 'Agner Fog', gcc, TimothyPrince Basile STARYNKEVITCH wrote on 24 July 2008 11:28: > On most Linux systems, in addition of using the package manager, the > libc.so file is executable, and when executed, shows info, so on my > Debian/Sid/AMD64 I'm getting > > % /lib/libc.so.6 > GNU C Library stable release version 2.7, by Roland McGrath et al. [snip!] oooh, nice - that's loads more informative than "This program cannot be run in DOS mode" ;-) Thanks for the tip! cheers, DaveK -- Can't think of a witty .sigline today.... ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler
  2008-07-24 13:20         ` Basile STARYNKEVITCH
  2008-07-24 13:31           ` Dave Korn
@ 2008-07-24 13:59           ` Agner Fog
  2008-07-24 14:40             ` Richard Guenther
  2008-07-28 10:57             ` Andrew Haley
  2008-07-24 15:02           ` Joseph S. Myers
  2 siblings, 2 replies; 46+ messages in thread
From: Agner Fog @ 2008-07-24 13:59 UTC
To: Basile STARYNKEVITCH; +Cc: gcc, TimothyPrince

Basile STARYNKEVITCH wrote:
>At last, at the recent (july 2008) GCC summit, someone (sorry I forgot who, probably someone from SuSE)
> proposed in a BOFS to have architecture and machine specific hand-tuned (or even hand-written assembly) low
> level libraries for such basic things as memset etc..

That's exactly what I meant. The most important memory, string and math
functions should use hand-tuned assembly with CPU dispatching for the
latest instruction sets. My experiments show that the speed can be
improved by a factor of 3 to 10 for unaligned memcpy on Intel processors
(http://www.agner.org/optimize/optimizing_cpp.pdf page 12).

There will be more hand-tuning work to do when the 256-bit YMM registers
become available in a few years - and more to gain in speed.
* Re: gcc will become the best optimizing x86 compiler 2008-07-24 13:59 ` Agner Fog @ 2008-07-24 14:40 ` Richard Guenther 2008-07-28 10:57 ` Andrew Haley 1 sibling, 0 replies; 46+ messages in thread From: Richard Guenther @ 2008-07-24 14:40 UTC (permalink / raw) To: Agner Fog; +Cc: Basile STARYNKEVITCH, gcc, TimothyPrince On Thu, Jul 24, 2008 at 3:28 PM, Agner Fog <agner@agner.org> wrote: > Basile STARYNKEVITCH wrote: >>At last, at the recent (july 2008) GCC summit, someone (sorry I forgot who, >> probably someone from SuSE) That was me and Michael Matz. Richard. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-24 13:59 ` Agner Fog 2008-07-24 14:40 ` Richard Guenther @ 2008-07-28 10:57 ` Andrew Haley 1 sibling, 0 replies; 46+ messages in thread From: Andrew Haley @ 2008-07-28 10:57 UTC (permalink / raw) To: Agner Fog; +Cc: Basile STARYNKEVITCH, gcc, TimothyPrince Agner Fog wrote: > Basile STARYNKEVITCH wrote: >>At last, at the recent (july 2008) GCC summit, someone (sorry I forgot > who, probably someone from SuSE) >> proposed in a BOFS to have architecture and machine specific > hand-tuned (or even hand-written assembly) low >> level libraries for such basic things as memset etc.. > > That's exactly what I meant. The most important memory, string and math > functions should use hand-tuned assembly with CPU dispatching for the > latest instruction sets. My experiments show that the speed can be > improved by a factor 3 - 10 for unaligned memcpy on Intel processors > (http://www.agner.org/optimize/optimizing_cpp.pdf page 12). Is this still true if you have to go through the PLT to make a position- independent call? That's the most common case for userspace on GNU/Linux. Andrew. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-24 13:20 ` Basile STARYNKEVITCH 2008-07-24 13:31 ` Dave Korn 2008-07-24 13:59 ` Agner Fog @ 2008-07-24 15:02 ` Joseph S. Myers 2008-07-24 16:26 ` Agner Fog 2008-07-24 17:17 ` Basile STARYNKEVITCH 2 siblings, 2 replies; 46+ messages in thread From: Joseph S. Myers @ 2008-07-24 15:02 UTC (permalink / raw) To: Basile STARYNKEVITCH; +Cc: 'Agner Fog', gcc, TimothyPrince On Thu, 24 Jul 2008, Basile STARYNKEVITCH wrote: > At last, at the recent (july 2008) GCC summit, someone (sorry I forgot who, > probably someone from SuSE) proposed in a BOFS to have architecture and > machine specific hand-tuned (or even hand-written assembly) low level > libraries for such basic things as memset etc.. I don't recall seeing any BOF minutes on this list yet this year. Are people going to be posting them? I don't know if it was proposed in this context, but the ARM EABI has various __aeabi_mem* functions for calls known to have particular alignment and the idea is relevant to other platforms if you provide such functions with the compiler. The compiler could also generate calls to different functions depending on the -march options and so save the runtime CPU check cost (you could have options to call either generic versions, or versions for a particular CPU, depending on whether you are building a generic binary for CPU-X-or-newer or a binary just for CPU X). As usual in this area, careful negotiation with the FSF at an early stage to be able to reuse glibc versions of the functions where useful would be a good idea. Reusing the glibc testcases for string functions (that e.g. they don't access beyond the memory they are allowed to access at the end of a page) would be a good idea as well, and doesn't have the problems with changing licenses that reusing the functions themselves does. -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-24 15:02 ` Joseph S. Myers @ 2008-07-24 16:26 ` Agner Fog 2008-07-24 17:17 ` Basile STARYNKEVITCH 1 sibling, 0 replies; 46+ messages in thread From: Agner Fog @ 2008-07-24 16:26 UTC (permalink / raw) To: Joseph S. Myers; +Cc: Basile STARYNKEVITCH, gcc, TimothyPrince Joseph S. Myers wrote: >I don't know if it was proposed in this context, but the ARM EABI has >various __aeabi_mem* functions for calls known to have particular >alignment and the idea is relevant to other platforms if you provide such >functions with the compiler. The compiler could also generate calls to >different functions depending on the -march options and so save the >runtime CPU check cost (you could have options to call either generic >versions, or versions for a particular CPU, depending on whether you are >building a generic binary for CPU-X-or-newer or a binary just for CPU X). memcpy in the Intel and Mac libraries, as well as my own code, have different branches for different alignments and different CPU instruction sets. The runtime cost for this branching is negligible compared to the gain, even when the byte count is small. No need to bother the programmer with different versions. You can just copy the code from the Mac library, or from me. ^ permalink raw reply [flat|nested] 46+ messages in thread
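The idea of branching on alignment inside a single entry point can be illustrated in portable C. The sketch below is only an illustration of the control flow, not code from the Intel, Mac or asmlib libraries: the function name copy_by_alignment is made up, the buffers are assumed not to overlap, and plain 64-bit words stand in for the XMM registers a tuned version would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes from src to dst (non-overlapping), choosing a branch by
   the relative alignment of the two pointers. copy_by_alignment is a
   hypothetical name. */
void *copy_by_alignment(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(n == 0 || (d != NULL && s != NULL));

    /* Branch 1: both pointers have the same offset within an 8-byte word,
       so a short head loop aligns both and the bulk moves whole words. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 7u) == 0) {
        while (n > 0 && ((uintptr_t)d & 7u) != 0) {
            *d++ = *s++;
            n--;
        }
        while (n >= 8) {
            uint64_t w;
            memcpy(&w, s, 8);   /* fixed-size copies compile to plain loads/stores */
            memcpy(d, &w, 8);
            d += 8;
            s += 8;
            n -= 8;
        }
    }

    /* Branch 2, and the tail of branch 1: byte by byte. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

The alignment test is one XOR, one AND and one conditional jump, which gives a feel for why the branching cost is described above as negligible next to the copy itself.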
* Re: gcc will become the best optimizing x86 compiler 2008-07-24 15:02 ` Joseph S. Myers 2008-07-24 16:26 ` Agner Fog @ 2008-07-24 17:17 ` Basile STARYNKEVITCH 1 sibling, 0 replies; 46+ messages in thread From: Basile STARYNKEVITCH @ 2008-07-24 17:17 UTC (permalink / raw) To: Joseph S. Myers; +Cc: 'Agner Fog', gcc, TimothyPrince Joseph S. Myers wrote: > On Thu, 24 Jul 2008, Basile STARYNKEVITCH wrote: > >> At last, at the recent (july 2008) GCC summit, someone (sorry I forgot who, >> probably someone from SuSE) proposed in a BOFS to have architecture and >> machine specific hand-tuned (or even hand-written assembly) low level >> libraries for such basic things as memset etc.. > > I don't recall seeing any BOF minutes on this list yet this year. Are > people going to be posting them? for the BOFS I did propose, the summary is on the wiki. http://gcc.gnu.org/wiki/MakingGCCEasierToLearn (I agree that my summary is not very good. Feel free to improve it) Regards. -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mines, sont seulement les miennes} *** ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-24 9:41 ` Agner Fog 2008-07-24 10:10 ` Dave Korn @ 2008-07-24 17:21 ` Raksit Ashok 2008-07-25 7:23 ` Agner Fog 1 sibling, 1 reply; 46+ messages in thread From: Raksit Ashok @ 2008-07-24 17:21 UTC (permalink / raw) To: Agner Fog; +Cc: dclarke, gcc, TimothyPrince On Thu, Jul 24, 2008 at 1:03 AM, Agner Fog <agner@agner.org> wrote: > Dennis Clarke wrote: >>The Sun Studio 12 compiler with Solaris 10 on AMD Opteron or >>UltraSparc beats GCC in almost every single test case that I have >>seen. > > This is memcpy on Solaris: > http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/i386/gen/memcpy.s > > It uses exactly the same method as memcpy on gcc libc, with only minor > differences that have no influence on performance. There is a more optimized version for 64-bit: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/amd64/gen/memcpy.s I think this looks similar to your implementation, Agner. -raksit > >> Also, you have provided no data at all. > > I have linked to the data rather than copying it here to save space on the > mailing list. Here is the link again: > http://www.agner.org/optimize/optimizing_cpp.pdf section 2.6, page 12. > >> So your assertions are those of a marketing person at the moment. > > Who sounds like a marketing person, you or me? :-) > >> Please post some code that can be compiled and then tested with high >> resolution timers and perhaps >> we can compare notes. > > Here is my code, again: > http://www.agner.org/optimize/asmlib.zip > My test results, referred to above, uses the "core clock cycles" performance > counter on Intel and RDTSC on AMD. It's the highest resolution you can get. > Feel free to do you own tests, it's as simple as linking my library into > your test program. > > Tim Prince wrote: >>you identify the library you tested only as "ubuntu g++ 4.2.3." > Where can I see the libc version? 
> >>The corresponding 64-bit linux will see vastly different levels of >> performance, depending on the >>glibc version, as it doesn't use a builtin string move. > Yes, this is exactly what my tests show. 64-bit libc is better than 32-bit > libc, but still 3-4 times slower than the best library for unaligned > operands on an Intel. > >>Certain newer CPUs aim to improve performance of the 32-bit gcc builtin >> string moves, but don't >> entirely eliminate the situations where it isn't optimum. > > The Intel manuals are not clear about this. Intel Optimization reference > manual says: >>In most cases, applications should take advantage of the default memory >> routines provided by Intel compilers. > What an excellent advice - the Intel compiler puts in a library with an > automatic run-slowly-on-AMD feature! > The Intel library does not use rep movs when running on an Intel CPU. > > The AMD software optimization guide mentions specific situations where rep > movs is optimal. However, my tests on an Opteron (K8) tell that rep movs is > never optimal on AMD either. I have no access to test it on the new AMD K10, > but I expect the XMM register code to run much faster on K10 than on K8 > because K10 has 128-bit data paths where K8 has only 64-bit. > > Evidently, the problem with memcpy has been ignored for years, see > http://softwarecommunity.intel.com/Wiki/Linux/719.htm > > ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-24 17:21 ` Raksit Ashok @ 2008-07-25 7:23 ` Agner Fog 2008-07-26 0:23 ` Michael Meissner 2008-07-30 16:37 ` Denys Vlasenko 0 siblings, 2 replies; 46+ messages in thread From: Agner Fog @ 2008-07-25 7:23 UTC (permalink / raw) To: Raksit Ashok; +Cc: dclarke, gcc, TimothyPrince Raksit Ashok wrote: >There is a more optimized version for 64-bit: >http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/amd64/gen/memcpy.s >I think this looks similar to your implementation, Agner. Yes it is similar to my code. Gnu libc could borrow a lot of optimized functions from Opensolaris and Mac and other open source projects. They look better than Gnu libc, but there is still room for improvement. For example, Opensolaris does not use XMM registers for strlen, although this is simpler than using general purpose registers (see my code www.agner.org/optimize/asmlib.zip) ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-25 7:23 ` Agner Fog @ 2008-07-26 0:23 ` Michael Meissner 2008-07-26 17:49 ` Agner Fog 2008-07-28 11:45 ` Agner Fog 2008-07-30 16:37 ` Denys Vlasenko 1 sibling, 2 replies; 46+ messages in thread From: Michael Meissner @ 2008-07-26 0:23 UTC (permalink / raw) To: Agner Fog; +Cc: Raksit Ashok, dclarke, gcc, TimothyPrince On Fri, Jul 25, 2008 at 09:08:42AM +0200, Agner Fog wrote: > Raksit Ashok wrote: > >There is a more optimized version for 64-bit: > >http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/amd64/gen/memcpy.s > >I think this looks similar to your implementation, Agner. > > Yes it is similar to my code. > > Gnu libc could borrow a lot of optimized functions from Opensolaris and > Mac and other open source projects. They look better than Gnu libc, but > there is still room for improvement. For example, Opensolaris does not > use XMM registers for strlen, although this is simpler than using > general purpose registers (see my code www.agner.org/optimize/asmlib.zip) Note, glibc can only take code that is appropriately licensed and donated to the FSF. In addition it must meet the coding standards for glibc. Also note, that it depends on the basic chip level what is fastest for the operation (for example, using XMM registers are not faster for current AMD platforms). Memcpy/memset optimizations were added to glibc 2.8, though when your favorite distribution will provide it is a different question: http://sourceware.org/ml/libc-alpha/2008-04/msg00050.html -- Michael Meissner email: gnu@the-meissners.org http://www.the-meissners.org ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-26 0:23 ` Michael Meissner @ 2008-07-26 17:49 ` Agner Fog 2008-07-28 11:45 ` Agner Fog 1 sibling, 0 replies; 46+ messages in thread From: Agner Fog @ 2008-07-26 17:49 UTC (permalink / raw) To: Michael Meissner, Agner Fog, Raksit Ashok, dclarke, gcc, TimothyPrince Michael Meissner wrote: > On Fri, Jul 25, 2008 at 09:08:42AM +0200, Agner Fog wrote: > >> Gnu libc could borrow a lot of optimized functions from Opensolaris and >> Mac and other open source projects. They look better than Gnu libc, but >> there is still room for improvement. For example, Opensolaris does not >> use XMM registers for strlen, although this is simpler than using >> general purpose registers (see my code www.agner.org/optimize/asmlib.zip) >> > > Note, glibc can only take code that is appropriately licensed and donated to > the FSF. In addition it must meet the coding standards for glibc. > The Mac/Xnu and Opensolaris projects have fairly liberal public licenses. If there are legal differences, maybe the copyright owner is open to negotiation. My own code has GPL license. The fact that I am offering my code to you also means, of course, that I am willing to grant the necessary license. > Also note, that it depends on the basic chip level what is fastest for the > operation (for example, using XMM registers are not faster for current AMD > platforms). > Indeed. That's why I am talking about CPU dispatching (i.e. different branches for different CPUs). The CPU dispatching can be done with just a single jump instruction: At the function entry there is an indirect jump through a pointer to the appropriate version. The code pointer initially points to a CPU dispatcher. The CPU dispatcher detects which CPU it is running on, and replaces the code pointer with a pointer to the appropriate version, then jumps to the pointer. The next time the function is called, it follows the pointer directly to the right version. 
My memcpy runs faster with XMM registers than with 64-bit x64 registers on AMD K8. My strlen runs slower with XMM registers than with 64-bit x64 registers on AMD K8. I expect the XMM versions to run much faster on AMD K10, because it has full 128-bit execution units and data paths, where K8 has only 64-bits. I have not had the chance to test this on AMD K10 yet. I believe it is best to optimize for the newest processors, because the processor that is brand new today will become mainstream in a few years. > Memcpy/memset optimizations were added to glibc 2.8, though when your favorite > distribution will provide it is a different question: > http://sourceware.org/ml/libc-alpha/2008-04/msg00050.html > I have libc version 2.7. Can't find version 2.8. ^ permalink raw reply [flat|nested] 46+ messages in thread
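The dispatching scheme described in this message (an indirect jump through a code pointer that initially targets the dispatcher) can be sketched in C roughly as follows. This is only an illustration of the idea, not Agner's code or anything glibc does: the names `my_copy`, `copy_ptr`, `copy_dispatch` and the two stand-in copy routines are made up for the example, and `cpu_has_fast_path()` is a hypothetical placeholder where a real implementation would query CPUID.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Stand-in for a "generic" version (hypothetical). */
static void *copy_bytewise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for a "CPU-specific" version (hypothetical). */
static void *copy_via_libc(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Hypothetical placeholder: a real dispatcher would inspect CPUID here. */
static int cpu_has_fast_path(void)
{
    return 1;
}

static void *copy_dispatch(void *dst, const void *src, size_t n);

/* The code pointer initially points to the dispatcher. */
static copy_fn copy_ptr = copy_dispatch;

/* First call: pick a version, overwrite the pointer, then jump to it. */
static void *copy_dispatch(void *dst, const void *src, size_t n)
{
    copy_ptr = cpu_has_fast_path() ? copy_via_libc : copy_bytewise;
    return copy_ptr(dst, src, n);
}

/* Public entry point: a single indirect call through the pointer. */
void *my_copy(void *dst, const void *src, size_t n)
{
    return copy_ptr(dst, src, n);
}
```

After the first call, `copy_ptr` no longer refers to the dispatcher, so every later call costs one indirect jump and no CPU detection.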
* Re: gcc will become the best optimizing x86 compiler 2008-07-26 0:23 ` Michael Meissner 2008-07-26 17:49 ` Agner Fog @ 2008-07-28 11:45 ` Agner Fog 2008-07-28 14:40 ` Daniel Jacobowitz 2008-07-28 17:19 ` Michael Matz 1 sibling, 2 replies; 46+ messages in thread From: Agner Fog @ 2008-07-28 11:45 UTC (permalink / raw) To: Michael Meissner, Agner Fog, Raksit Ashok, dclarke, gcc, TimothyPrince, Tarjei Knapstad Michael Meissner wrote: >Memcpy/memset optimizations were added to glibc 2.8, though when your favorite >distribution will provide it is a different question: >http://sourceware.org/ml/libc-alpha/2008-04/msg00050.html I finally got a SUSE with glibc 2.8. I can see that 32-bit memcpy has been modified with an extra misalignment branch, but no significant improvement. Glibc 2.8 is NOT faster than glibc 2.7 in my tests. It still doesn't use XMM registers. Glibc 2.8 is still almost 5 times slower than the best function libraries for unaligned data on Intel Core 2, and the default builtin function is slower than any other implementation I have seen (copies 1 byte at a time!). Tarjei Knapstad wrote: >2008/7/26 Agner Fog <agner@agner.org>: >>I have libc version 2.7. Can't find version 2.8 >It's in Fedora 9, I have no idea why the source isn't directly >available from the glibc homepage. 2.8 is not an official final release yet. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-28 11:45 ` Agner Fog @ 2008-07-28 14:40 ` Daniel Jacobowitz 2008-07-28 17:37 ` Dennis Clarke 2008-07-28 17:19 ` Michael Matz 1 sibling, 1 reply; 46+ messages in thread From: Daniel Jacobowitz @ 2008-07-28 14:40 UTC (permalink / raw) To: Agner Fog Cc: Michael Meissner, Raksit Ashok, dclarke, gcc, TimothyPrince, Tarjei Knapstad On Mon, Jul 28, 2008 at 12:56:57PM +0200, Agner Fog wrote: > >2008/7/26 Agner Fog <agner@agner.org>: > >>I have libc version 2.7. Can't find version 2.8 > >It's in Fedora 9, I have no idea why the source isn't directly > >available from the glibc homepage. > > 2.8 is not an official final release yet. That's incorrect; the glibc maintainers just don't care much for tarballs. You can find the tag in CVS from several months ago. -- Daniel Jacobowitz CodeSourcery ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-28 14:40 ` Daniel Jacobowitz @ 2008-07-28 17:37 ` Dennis Clarke 2008-07-28 17:54 ` Paolo Carlini 0 siblings, 1 reply; 46+ messages in thread From: Dennis Clarke @ 2008-07-28 17:37 UTC (permalink / raw) To: Agner Fog, Michael Meissner, Raksit Ashok, dclarke, gcc, TimothyPrince, Tarjei Knapstad On Mon, Jul 28, 2008 at 8:10 AM, Daniel Jacobowitz <drow@false.org> wrote: > On Mon, Jul 28, 2008 at 12:56:57PM +0200, Agner Fog wrote: >> >2008/7/26 Agner Fog <agner@agner.org>: >> >>I have libc version 2.7. Can't find version 2.8 >> >It's in Fedora 9, I have no idea why the source isn't directly >> >available from the glibc homepage. >> >> 2.8 is not an official final release yet. > > That's incorrect; the glibc maintainers just don't care much for > tarballs. You can find the tag in CVS from several months ago. this page : http://www.gnu.org/software/libc/ says : Current Status The current version is 2.7. See the NEWS file for more information. There is a FAQ which you should read first. also, IMO, the NEWS sections says nothing useful to any human. Dennis ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-28 17:37 ` Dennis Clarke @ 2008-07-28 17:54 ` Paolo Carlini 2008-07-28 18:31 ` Dennis Clarke 0 siblings, 1 reply; 46+ messages in thread From: Paolo Carlini @ 2008-07-28 17:54 UTC (permalink / raw) To: dclarke; +Cc: gcc Dennis Clarke wrote: > also, IMO, the NEWS sections says nothing useful to any human. > but, *some* humans like to click on the first (download) link on top. Paolo. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-28 17:54 ` Paolo Carlini @ 2008-07-28 18:31 ` Dennis Clarke 2008-07-28 18:37 ` Ian Lance Taylor ` (2 more replies) 0 siblings, 3 replies; 46+ messages in thread From: Dennis Clarke @ 2008-07-28 18:31 UTC (permalink / raw) To: Paolo Carlini; +Cc: gcc On Mon, Jul 28, 2008 at 1:17 PM, Paolo Carlini <paolo.carlini@oracle.com> wrote: > Dennis Clarke wrote: >> >> also, IMO, the NEWS sections says nothing useful to any human. >> > > but, *some* humans like to click on the first (download) link on top. where ? It says Availability The releases are available at http://ftp.gnu.org/gnu/glibc/ and its mirrors. which has glibc-2.7.tar.bz2 as the latest. hold on .. on the NEWS page I see ... okay .. how very user friendly. Sort of the thing one would put on the project homepage I would think. Dennis ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-28 18:31 ` Dennis Clarke @ 2008-07-28 18:37 ` Ian Lance Taylor 2008-07-28 19:44 ` Dave Korn 2008-07-29 1:31 ` Gerald Pfeifer 2 siblings, 0 replies; 46+ messages in thread From: Ian Lance Taylor @ 2008-07-28 18:37 UTC (permalink / raw) To: dclarke; +Cc: Paolo Carlini, gcc "Dennis Clarke" <blastwave@gmail.com> writes: > hold on .. on the NEWS page I see ... okay .. how very user friendly. > Sort of the thing one would put on the project homepage I would think. The glibc project has their own special approach to user friendliness. Ian ^ permalink raw reply [flat|nested] 46+ messages in thread
* RE: gcc will become the best optimizing x86 compiler 2008-07-28 18:31 ` Dennis Clarke 2008-07-28 18:37 ` Ian Lance Taylor @ 2008-07-28 19:44 ` Dave Korn 2008-07-28 21:40 ` Dennis Clarke 2008-07-29 1:31 ` Gerald Pfeifer 2 siblings, 1 reply; 46+ messages in thread From: Dave Korn @ 2008-07-28 19:44 UTC (permalink / raw) To: dclarke, 'Paolo Carlini'; +Cc: gcc Dennis Clarke wrote on 28 July 2008 18:54: > On Mon, Jul 28, 2008 at 1:17 PM, Paolo Carlini <paolo.carlini@oracle.com> > wrote: >> Dennis Clarke wrote: >>> >>> also, IMO, the NEWS sections says nothing useful to any human. >>> >> >> but, *some* humans like to click on the first (download) link on top. > > where ? > > It says > > Availability > The releases are available at http://ftp.gnu.org/gnu/glibc/ and its > mirrors. > > which has glibc-2.7.tar.bz2 as the latest. > > hold on .. on the NEWS page I see ... okay .. how very user friendly. > Sort of the thing one would put on the project homepage I would think. It's not the NEWS page; it's a link to the source of the NEWS file stored in the glibc CVS repository. The gnu.org page is rather out of date, and a bit obfuscated. Most GNU projects have a prominent link in their gnu.org directory page to the actual project home page; in this case it's tucked away on the "resources" page in the "Project website" section. (Oh, and it still points to "sources.redhat.com", which is a sign of just how out-of-date that gnu.org page really is...) Follow that link, and you'll see the *real* project home, with the real news and the real latest-release info, and the real list of mailing lists, and the wiki, and ... cheers, DaveK -- Can't think of a witty .sigline today.... ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-28 19:44 ` Dave Korn @ 2008-07-28 21:40 ` Dennis Clarke 0 siblings, 0 replies; 46+ messages in thread From: Dennis Clarke @ 2008-07-28 21:40 UTC (permalink / raw) To: Dave Korn; +Cc: Paolo Carlini, gcc On Mon, Jul 28, 2008 at 2:30 PM, Dave Korn <dave.korn@artimi.com> wrote: > Dennis Clarke wrote on 28 July 2008 18:54: > >> On Mon, Jul 28, 2008 at 1:17 PM, Paolo Carlini <paolo.carlini@oracle.com> >> wrote: >>> Dennis Clarke wrote: >>>> >>>> also, IMO, the NEWS sections says nothing useful to any human. >>>> >>> >>> but, *some* humans like to click on the first (download) link on top. >> >> where ? >> >> It says >> >> Availability >> The releases are available at http://ftp.gnu.org/gnu/glibc/ and its >> mirrors. >> >> which has glibc-2.7.tar.bz2 as the latest. >> >> hold on .. on the NEWS page I see ... okay .. how very user friendly. >> Sort of the thing one would put on the project homepage I would think. > > It's not the NEWS page; it's a link to the source of the NEWS file stored > in the glibc CVS repository. > > The gnu.org page is rather out of date, and a bit obfuscated. > > Most GNU projects have a prominent link in their gnu.org directory page to > the actual project home page; in this case it's tucked away on the > "resources" page in the "Project website" section. (Oh, and it still points > to "sources.redhat.com", which is a sign of just how out-of-date that > gnu.org page really is...) > > Follow that link, and you'll see the *real* project home, with the real > news and the real latest-release info, and the real list of mailing lists, > and the wiki, and ... the *real* wiki ? :-) Dennis ps: I used CVS to get the sources. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-28 18:31 ` Dennis Clarke 2008-07-28 18:37 ` Ian Lance Taylor 2008-07-28 19:44 ` Dave Korn @ 2008-07-29 1:31 ` Gerald Pfeifer 2008-07-29 6:29 ` Agner Fog 2 siblings, 1 reply; 46+ messages in thread From: Gerald Pfeifer @ 2008-07-29 1:31 UTC (permalink / raw) To: dclarke; +Cc: Paolo Carlini, gcc On Mon, 28 Jul 2008, Dennis Clarke wrote: > hold on .. on the NEWS page I see ... okay .. how very user friendly. > Sort of the thing one would put on the project homepage I would think. See how user friendly we in GCC-land are in comparison? ;-) Gerald ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-29 1:31 ` Gerald Pfeifer @ 2008-07-29 6:29 ` Agner Fog 2008-07-29 9:24 ` Ben Elliston 0 siblings, 1 reply; 46+ messages in thread From: Agner Fog @ 2008-07-29 6:29 UTC (permalink / raw) To: Gerald Pfeifer; +Cc: dclarke, Paolo Carlini, gcc Gerald Pfeifer wrote: > See how user friendly we in GCC-land are in comparison? ;-) > Since there is no libc mailing list, I thought that the gcc list is the place to contact the maintainers of libc. Am I on the wrong list? Or are there no maintainers of libc? ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-29 6:29 ` Agner Fog @ 2008-07-29 9:24 ` Ben Elliston 2008-07-31 8:12 ` Christopher Faylor 0 siblings, 1 reply; 46+ messages in thread From: Ben Elliston @ 2008-07-29 9:24 UTC (permalink / raw) To: Agner Fog; +Cc: gcc > Since there is no libc mailing list, I thought that the gcc list is the > place to contact the maintainers of libc. Am I on the wrong list? Or are > there no maintainers of libc? See: http://sources.redhat.com/glibc/ You want the libc-alpha list, I think. Cheers, Ben ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-29 9:24 ` Ben Elliston @ 2008-07-31 8:12 ` Christopher Faylor 0 siblings, 0 replies; 46+ messages in thread From: Christopher Faylor @ 2008-07-31 8:12 UTC (permalink / raw) To: gcc, Agner Fog, Ben Elliston On Tue, Jul 29, 2008 at 04:14:49PM +1000, Ben Elliston wrote: >> Since there is no libc mailing list, I thought that the gcc list is the >> place to contact the maintainers of libc. Am I on the wrong list? Or are >> there no maintainers of libc? > >See: > http://sources.redhat.com/glibc/ > >You want the libc-alpha list, I think. I think libc-help is a more likely place to start. cgf ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-28 11:45 ` Agner Fog 2008-07-28 14:40 ` Daniel Jacobowitz @ 2008-07-28 17:19 ` Michael Matz 2008-07-29 6:15 ` Agner Fog 1 sibling, 1 reply; 46+ messages in thread From: Michael Matz @ 2008-07-28 17:19 UTC (permalink / raw) To: Agner Fog Cc: Michael Meissner, Raksit Ashok, dclarke, gcc, TimothyPrince, Tarjei Knapstad Hi, On Mon, 28 Jul 2008, Agner Fog wrote: > Glibc 2.8 is still almost 5 times slower than the best function > libraries for unaligned data on Intel Core 2, and the default builtin > function is slower than any other implementation I have seen (copies 1 > byte at a time!). You must be doing something wrong. If the compiler decides to inline the string ops it either knows the size or you told it to do it anyway (-minline-all-stringops or -minline-stringops-dynamically). In both cases will it use wider than byte moves when possible. Ciao, Michael. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-28 17:19 ` Michael Matz @ 2008-07-29 6:15 ` Agner Fog 2008-07-29 9:31 ` Richard Guenther ` (2 more replies) 0 siblings, 3 replies; 46+ messages in thread From: Agner Fog @ 2008-07-29 6:15 UTC (permalink / raw) To: Michael Matz Cc: Michael Meissner, Raksit Ashok, dclarke, gcc, TimothyPrince, Tarjei Knapstad Michael Matz wrote: > You must be doing something wrong. If the compiler decides to inline the > string ops it either knows the size or you told it to do it anyway > (-minline-all-stringops or -minline-stringops-dynamically). In both cases > will it use wider than byte moves when possible. > g++ (v. 4.2.3) without any options converts memcpy with unknown size to rep movsb g++ with option -fno-builtin calls memcpy in libc The rep movs, stos, scas, cmps instructions are slower than function calls except in rare cases. The compiler should never use the string instructions. It is OK to use mov instructions if the size is known, but not string instructions. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-29 6:15 ` Agner Fog @ 2008-07-29 9:31 ` Richard Guenther 2008-07-29 9:55 ` Steven Bosscher 2008-07-29 14:11 ` Michael Matz 2008-07-29 14:45 ` Tim Prince 2 siblings, 1 reply; 46+ messages in thread From: Richard Guenther @ 2008-07-29 9:31 UTC (permalink / raw) To: Agner Fog Cc: Michael Matz, Michael Meissner, Raksit Ashok, dclarke, gcc, TimothyPrince, Tarjei Knapstad On Tue, Jul 29, 2008 at 7:26 AM, Agner Fog <agner@agner.org> wrote: > Michael Matz wrote: >> >> You must be doing something wrong. If the compiler decides to inline the >> string ops it either knows the size or you told it to do it anyway >> (-minline-all-stringops or -minline-stringops-dynamically). In both cases >> will it use wider than byte moves when possible. >> > > g++ (v. 4.2.3) without any options converts memcpy with unknown size to rep > movsb Make sure to use -D__NO_STRING_INLINES to not get glibcs inline implementation. Richard. > g++ with option -fno-builtin calls memcpy in libc > > The rep movs, stos, scas, cmps instructions are slower than function calls > except in rare cases. The compiler should never use the string instructions. > It is OK to use mov instructions if the size is known, but not string > instructions. > ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-29 9:31 ` Richard Guenther @ 2008-07-29 9:55 ` Steven Bosscher 2008-07-29 13:09 ` Joseph S. Myers 0 siblings, 1 reply; 46+ messages in thread From: Steven Bosscher @ 2008-07-29 9:55 UTC (permalink / raw) To: Richard Guenther Cc: Agner Fog, Michael Matz, Michael Meissner, Raksit Ashok, dclarke, gcc, TimothyPrince, Tarjei Knapstad On Tue, Jul 29, 2008 at 11:26 AM, Richard Guenther <richard.guenther@gmail.com> wrote: >> g++ (v. 4.2.3) without any options converts memcpy with unknown size to rep >> movsb > > Make sure to use -D__NO_STRING_INLINES to not get glibcs inline > implementation. Why is this not the default? Gr. Steven ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-29 9:55 ` Steven Bosscher @ 2008-07-29 13:09 ` Joseph S. Myers 0 siblings, 0 replies; 46+ messages in thread From: Joseph S. Myers @ 2008-07-29 13:09 UTC (permalink / raw) To: Steven Bosscher Cc: Richard Guenther, Agner Fog, Michael Matz, Michael Meissner, Raksit Ashok, dclarke, gcc, TimothyPrince, Tarjei Knapstad On Tue, 29 Jul 2008, Steven Bosscher wrote: > On Tue, Jul 29, 2008 at 11:26 AM, Richard Guenther > <richard.guenther@gmail.com> wrote: > >> g++ (v. 4.2.3) without any options converts memcpy with unknown size to rep > >> movsb > > > > Make sure to use -D__NO_STRING_INLINES to not get glibcs inline > > implementation. > > Why is this not the default? Because GNU projects are supposed to work together rather than forcibly overriding each other. As GCC gets optimizations that obsolete particular parts of the optimizations in glibc's headers, Jakub updates the glibc headers to have only those optimizations not obsoleted by GCC (some call particular glibc-specific functions GCC doesn't know about, for example), depending on the GCC version. If GCC were to override glibc unconditionally, for all the inline implementations, then the natural consequence would be for glibc to change __NO_STRING_INLINES to __REALLY_NO_STRING_INLINES, and so on - this macro is for the user to override, if particular inlines are not needed or not optimal for particular compiler versions or processors then the headers should be updated in glibc. If you have issues with particular inlines (not limited to string functions), please file bugs in glibc Bugzilla, send patches to libc-alpha or contact Jakub. Anyone finding memcpy converted inappropriately needs to give the full testcase - both original and preprocessed source - and full command-line options, so we can tell what inlines if any are being used. -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 46+ messages in thread
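A minimal test case of the kind requested above might look like the sketch below. The function and struct names are invented for illustration. What the compiler emits for each function depends on the GCC version, the glibc headers and the target, so the commands in the comments only show how to inspect the result and do not predict it.

```c
#include <stddef.h>
#include <string.h>

/*
 * Ways to inspect what the compiler does with these calls
 * (the file name testcase.c is just an example):
 *
 *   gcc -O2 -S testcase.c                        # default expansion
 *   gcc -O2 -fno-builtin -S testcase.c           # no builtin memcpy
 *   gcc -O2 -D__NO_STRING_INLINES -S testcase.c  # no glibc header inlines
 *   gcc -O2 -E testcase.c                        # preprocessed source for a bug report
 */

/* Size known only at run time. */
void *copy_unknown(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

struct pair {
    int a;
    int b;
};

/* Size is a compile-time constant. */
void copy_known(struct pair *dst, const struct pair *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

Attaching the assembly output together with the preprocessed source and the exact command line lets others see which inline expansion, if any, was used.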
* Re: gcc will become the best optimizing x86 compiler 2008-07-29 6:15 ` Agner Fog 2008-07-29 9:31 ` Richard Guenther @ 2008-07-29 14:11 ` Michael Matz 2008-07-29 14:45 ` Tim Prince 2 siblings, 0 replies; 46+ messages in thread From: Michael Matz @ 2008-07-29 14:11 UTC (permalink / raw) To: Agner Fog Cc: Michael Meissner, Raksit Ashok, dclarke, gcc, TimothyPrince, Tarjei Knapstad Hi, On Tue, 29 Jul 2008, Agner Fog wrote: > g++ (v. 4.2.3) without any options converts memcpy with unknown size to rep > movsb Use newer GCCs. They will (1) not expand memcpy inline for unknown sizes (without special options, also make sure you don't get the glibc inlines) and (2) won't expand to movsb. > The rep movs, stos, scas, cmps instructions are slower than function > calls except in rare cases. Depends on the microarchitecture. For AMD Fam10 for instance REP prefixes are the preferred form for sizes between page-size and half of L1 size, when destination is aligned. > The compiler should never use the string instructions. It is OK to use > mov instructions if the size is known, but not string instructions. General statements are generally wrong :) Ciao, Michael. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-29 6:15 ` Agner Fog 2008-07-29 9:31 ` Richard Guenther 2008-07-29 14:11 ` Michael Matz @ 2008-07-29 14:45 ` Tim Prince 2 siblings, 0 replies; 46+ messages in thread From: Tim Prince @ 2008-07-29 14:45 UTC (permalink / raw) To: Agner Fog Cc: Michael Matz, Michael Meissner, Raksit Ashok, dclarke, gcc, TimothyPrince, Tarjei Knapstad Agner Fog wrote: > Michael Matz wrote: >> You must be doing something wrong. If the compiler decides to inline >> the string ops it either knows the size or you told it to do it anyway >> (-minline-all-stringops or -minline-stringops-dynamically). In both >> cases will it use wider than byte moves when possible. >> > g++ (v. 4.2.3) without any options converts memcpy with unknown size to > rep movsb > g++ with option -fno-builtin calls memcpy in libc > > The rep movs, stos, scas, cmps instructions are slower than function > calls except in rare cases. The compiler should never use the string > instructions. It is OK to use mov instructions if the size is known, but > not string instructions. I assume Agner is talking about the i386 target defaults, while Michael was talking about other target defaults. People who code for i386 must often use memcpy for short or medium length unaligned strings, and should be aware of the issues of long memcpy strings. Even for i386, the compiler should recognize where, for example, a single int move is explicitly the right thing. The rep string issue has been prominent enough for newer CPUs to be designed to recover some of the performance. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-25 7:23 ` Agner Fog 2008-07-26 0:23 ` Michael Meissner @ 2008-07-30 16:37 ` Denys Vlasenko 2008-07-30 16:40 ` Denys Vlasenko 1 sibling, 1 reply; 46+ messages in thread From: Denys Vlasenko @ 2008-07-30 16:37 UTC (permalink / raw) To: Agner Fog; +Cc: Raksit Ashok, dclarke, gcc, TimothyPrince On Fri, Jul 25, 2008 at 9:08 AM, Agner Fog <agner@agner.org> wrote: > Raksit Ashok wrote: >>There is a more optimized version for 64-bit: >>http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/amd64/gen/memcpy.s >>I think this looks similar to your implementation, Agner. > > Yes it is similar to my code. 3164 line source file which implements memcpy(). You got to be kidding. How much of L1 icache it blows away in the process? I bet it performs wonderfully on microbenchmarks though. 2991 .balign 16 # sadistic alignment strikes again 2992 L(bkPxQx): .int L(bkP0Q0)-L(bkPxQx) # why use two bytes when we can use four? Seriously. What possible reason there can be to align a randomly accessed data table to 16 bytes? 4 bytes I understand, but 16? -- vda ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-30 16:37 ` Denys Vlasenko @ 2008-07-30 16:40 ` Denys Vlasenko 2008-07-30 17:52 ` Agner Fog 0 siblings, 1 reply; 46+ messages in thread From: Denys Vlasenko @ 2008-07-30 16:40 UTC (permalink / raw) To: Agner Fog; +Cc: Raksit Ashok, dclarke, gcc, TimothyPrince On Wed, Jul 30, 2008 at 5:57 PM, Denys Vlasenko <vda.linux@googlemail.com> wrote: > On Fri, Jul 25, 2008 at 9:08 AM, Agner Fog <agner@agner.org> wrote: >> Raksit Ashok wrote: >>>There is a more optimized version for 64-bit: >>>http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/amd64/gen/memcpy.s >>>I think this looks similar to your implementation, Agner. >> >> Yes it is similar to my code. > > 3164 line source file which implements memcpy(). > You got to be kidding. > How much of L1 icache it blows away in the process? > I bet it performs wonderfully on microbenchmarks though. > > 2991 .balign 16 # sadistic alignment strikes again > 2992 L(bkPxQx): .int L(bkP0Q0)-L(bkPxQx) # why use two bytes when > we can use four? > > Seriously. What possible reason there can be to align > a randomly accessed data table to 16 bytes? > 4 bytes I understand, but 16? I'm afraid I sounded a bit confrontational above, so here comes the clarification. I have nothing against making code faster. But there should be some balance between the -O999 mindset and the -Os mindset. If you just found a tweak which gives you a 1.2% speedup in a microbenchmark but the code grew 4 times bigger, *stop*. Think about it. "We unrolled the loop two gazillion times and it's 3% faster now" is a similarly bad idea. I must admit that I didn't look too closely at http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/amd64/gen/memcpy.s but at first glance it sure looks like someone got carried away a bit. -- vda ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-30 16:40 ` Denys Vlasenko @ 2008-07-30 17:52 ` Agner Fog 2008-07-30 22:42 ` Dennis Clarke 2008-07-31 2:57 ` Denys Vlasenko 0 siblings, 2 replies; 46+ messages in thread From: Agner Fog @ 2008-07-30 17:52 UTC (permalink / raw) To: Denys Vlasenko; +Cc: Raksit Ashok, dclarke, gcc, TimothyPrince Denys Vlasenko wrote: >> 3164 line source file which implements memcpy(). >> You got to be kidding. >> How much of L1 icache it blows away in the process? >> I bet it performs wonderfully on microbenchmarks though. >> I agree that the OpenSolaris memcpy is bigger than necessary. However, it is necessary to have 16 branches for covering all possible alignments modulo 16. This is because, unfortunately, there is no XMM shift instruction with a variable count, only with a constant count, so we need one branch for each value of the shift count. Since only one of the branches is used, it doesn't take much space in the code cache. The speed is improved by a factor 4-5 by this 16-branch algorithm, so it is certainly worth the extra complexity. The future AMD SSE5 instruction set offers a possibility to join the many branches into one, but only on AMD processors. Intel is not going to support SSE5, and the future Intel AVX instruction set doesn't have an instruction that can be used for this purpose. So we will need separate branches for Intel and AMD code in future implementation of libc. (Explained in www.agner.org/optimize/asmexamples.zip). > "We unrolled the loop two gazillion times and it's 3% faster now" > is a similarly bad idea. > I agree completely. My memcpy code is much smaller than the OpenSolaris and Mac implementations and approximately equally fast. Some compilers unroll loops way too much in my opinion. ^ permalink raw reply [flat|nested] 46+ messages in thread
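The point about needing one branch per shift count can be illustrated with SSE2 intrinsics. The whole-register byte shifts (`_mm_srli_si128`, `_mm_slli_si128`) accept only compile-time constants, so the sketch below expands one loop per misalignment value through a macro. This is a simplified illustration, not the OpenSolaris code or Agner's asmlib: `copy_blocks_shifted` and `SHIFTED_COPY_CASE` are hypothetical names, only whole 16-byte blocks are handled, and the caller is assumed to guarantee that every aligned 16-byte block overlapping the source range is readable. On targets without SSE2 the sketch simply falls back to `memcpy`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>

/* One loop per shift count K, because the byte shifts need immediates. */
#define SHIFTED_COPY_CASE(K)                                           \
    case K:                                                            \
        for (i = 0; i < nblocks; i++) {                                \
            __m128i hi = _mm_load_si128(base + i + 1);                 \
            __m128i v = _mm_or_si128(_mm_srli_si128(lo, K),            \
                                     _mm_slli_si128(hi, 16 - (K)));    \
            _mm_storeu_si128((__m128i *)(dst + 16 * i), v);            \
            lo = hi;                                                   \
        }                                                              \
        break;

/* Copy nblocks * 16 bytes from src (any alignment) using aligned loads. */
void copy_blocks_shifted(unsigned char *dst, const unsigned char *src,
                         size_t nblocks)
{
    size_t k = (uintptr_t)src & 15;                   /* misalignment of src */
    const __m128i *base = (const __m128i *)(src - k); /* aligned block start */
    __m128i lo;
    size_t i;

    if (k == 0) {               /* already aligned: nothing to shift */
        memcpy(dst, src, 16 * nblocks);
        return;
    }

    lo = _mm_load_si128(base);
    switch (k) {
        SHIFTED_COPY_CASE(1)  SHIFTED_COPY_CASE(2)  SHIFTED_COPY_CASE(3)
        SHIFTED_COPY_CASE(4)  SHIFTED_COPY_CASE(5)  SHIFTED_COPY_CASE(6)
        SHIFTED_COPY_CASE(7)  SHIFTED_COPY_CASE(8)  SHIFTED_COPY_CASE(9)
        SHIFTED_COPY_CASE(10) SHIFTED_COPY_CASE(11) SHIFTED_COPY_CASE(12)
        SHIFTED_COPY_CASE(13) SHIFTED_COPY_CASE(14) SHIFTED_COPY_CASE(15)
    default:
        break;
    }
}
#else
/* Fallback for targets without SSE2. */
void copy_blocks_shifted(unsigned char *dst, const unsigned char *src,
                         size_t nblocks)
{
    memcpy(dst, src, 16 * nblocks);
}
#endif
```

Only one of the fifteen loops runs for a given call, which matches the observation that the unused branches cost little in the code cache. A production routine would also have to handle the leading and trailing partial blocks and align the destination.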
* Re: gcc will become the best optimizing x86 compiler 2008-07-30 17:52 ` Agner Fog @ 2008-07-30 22:42 ` Dennis Clarke 2008-07-31 2:57 ` Denys Vlasenko 1 sibling, 0 replies; 46+ messages in thread From: Dennis Clarke @ 2008-07-30 22:42 UTC (permalink / raw) To: Agner Fog; +Cc: Denys Vlasenko, Raksit Ashok, gcc, TimothyPrince On Wed, Jul 30, 2008 at 5:14 PM, Agner Fog <agner@agner.org> wrote: > Denys Vlasenko wrote: >>> >>> 3164 line source file which implements memcpy(). >>> You got to be kidding. >>> How much of L1 icache it blows away in the process? >>> I bet it performs wonderfully on microbenchmarks though. >>> > > I agree that the OpenSolaris memcpy is bigger than necessary. However, it is > necessary to have 16 branches for covering all possible alignments modulo > 16. This is because, unfortunately, there is no XMM shift instruction with a > variable count, only with a constant count, so we need one branch for each > value of the shift count. Since only one of the branches is used, it doesn't > take much space in the code cache. The speed is improved by a factor 4-5 by > this 16-branch algorithm, so it is certainly worth the extra complexity. You forgot to look at PowerPC : http://cvs.opensolaris.org/source/xref/ppc-dev/ppc-dev/usr/src/lib/libc/ppc/gen/memcpy.s is that nice and small ? Dennis Clarke ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-30 17:52 ` Agner Fog 2008-07-30 22:42 ` Dennis Clarke @ 2008-07-31 2:57 ` Denys Vlasenko 2008-07-31 8:18 ` Agner Fog 1 sibling, 1 reply; 46+ messages in thread From: Denys Vlasenko @ 2008-07-31 2:57 UTC (permalink / raw) To: Agner Fog; +Cc: Raksit Ashok, dclarke, gcc, TimothyPrince On Wednesday 30 July 2008 19:14, Agner Fog wrote: > I agree that the OpenSolaris memcpy is bigger than necessary. However, > it is necessary to have 16 branches for covering all possible alignments > modulo 16. This is because, unfortunately, there is no XMM shift > instruction with a variable count, only with a constant count, so we > need one branch for each value of the shift count. Since only one of the > branches is used, it doesn't take much space in the code cache. The > speed is improved by a factor 4-5 by this 16-branch algorithm, so it is > certainly worth the extra complexity. I tend to doubt that odd-byte aligned large memcpys are anywhere near typical. malloc and mmap both return well-aligned buffers (say, 8 byte aligned). Static and on-stack objects are also at least word-aligned 99% of the time. memcpy can just use "relatively simple" code for copies in which either src or dst is not word aligned. This cuts possibilities down from 16 to 4 (or even 2?). -- vda ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: gcc will become the best optimizing x86 compiler 2008-07-31 2:57 ` Denys Vlasenko @ 2008-07-31 8:18 ` Agner Fog 2008-07-31 11:00 ` Dave Korn 0 siblings, 1 reply; 46+ messages in thread From: Agner Fog @ 2008-07-31 8:18 UTC (permalink / raw) To: Denys Vlasenko; +Cc: Raksit Ashok, dclarke, gcc, TimothyPrince Denys Vlasenko wrote: > I tend to doubt that odd-byte aligned large memcpys are anywhere > near typical. malloc and mmap both return well-aligned buffers > (say, 8 byte aligned). Static and on-stack objects are also > at least word-aligned 99% of the time. > > memcpy can just use "relatively simple" code for copies in which > either src or dst is not word aligned. This cuts possibilities down > from 16 to 4 (or even 2?). > The XMM code is still more than 3 times faster than rep movsl when data are aligned by 4 or 8, but not by 16. Even if odd addresses are rare, they must be supported, but we can put the most common cases first. strcpy and strcat can be implemented efficiently simply by calling strlen and memcpy, since both strlen and memcpy can be optimized very well. This can give unaligned addresses. Dennis Clarke wrote: > You forgot to look at PowerPC : > > http://cvs.opensolaris.org/source/xref/ppc-dev/ppc-dev/usr/src/lib/libc/ppc/gen/memcpy.s > > is that nice and small ? > .. and slow. Why doesn't it use Altivec? ^ permalink raw reply [flat|nested] 46+ messages in thread
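The remark that strcpy and strcat can be built from strlen and memcpy can be sketched as follows. The function names are invented for the example. The sketch shows the structure only and makes no performance claim.

```c
#include <stddef.h>
#include <string.h>

/* strcpy expressed as strlen + memcpy (the +1 copies the terminating NUL). */
char *strcpy_via_memcpy(char *dst, const char *src)
{
    memcpy(dst, src, strlen(src) + 1);
    return dst;
}

/* strcat expressed the same way: find the end of dst, then copy. */
char *strcat_via_memcpy(char *dst, const char *src)
{
    memcpy(dst + strlen(dst), src, strlen(src) + 1);
    return dst;
}
```

In the strcat case the destination address is `dst + strlen(dst)`, which can land on any byte boundary. This is one way unaligned memcpy calls arise even when the buffers themselves are well aligned.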
* RE: gcc will become the best optimizing x86 compiler
From: Dave Korn @ 2008-07-31 11:00 UTC
To: 'Agner Fog', 'Denys Vlasenko'
Cc: 'Raksit Ashok', dclarke, gcc, TimothyPrince

Agner Fog wrote on 31 July 2008 07:14:
> Denys Vlasenko wrote:
>> I tend to doubt that odd-byte aligned large memcpys are anywhere
>> near typical. malloc and mmap both return well-aligned buffers
>> (say, 8 byte aligned). Static and on-stack objects are also
>> at least word-aligned 99% of the time.
>>
>> memcpy can just use "relatively simple" code for copies in which
>> either src or dst is not word aligned. This cuts possibilities down from
>> 16 to 4 (or even 2?).
>>
> The XMM code is still more than 3 times faster than rep movsl when data
> are aligned by 4 or 8, but not by 16.
> Even if odd addresses are rare, they must be supported, but we can put
> the most common cases first.

In the real world, unaligned memcpys are anything but rare. Everything's
networked these days, remember? Stuff gets misaligned real quick when you
start adding and removing various network layer headers and trailers to
unpredictably-sized packets.

    cheers,
      DaveK
--
Can't think of a witty .sigline today....
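To make the header-stripping point concrete, here is a small C sketch. The 14-byte Ethernet II header and the 20-byte minimum IPv4 header are used only as familiar examples; they are not mentioned in the message, and `payload_misalignment` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

enum {
    ETH_HEADER_LEN  = 14,  /* Ethernet II header, no VLAN tag */
    IPV4_HEADER_MIN = 20   /* IPv4 header without options */
};

/* Offset, modulo `align`, of the payload that remains after stripping
   `header_len` bytes from a buffer whose start is aligned to `align`. */
static size_t payload_misalignment(size_t header_len, size_t align)
{
    assert(align != 0);
    return header_len % align;
}
```

With a 16-byte-aligned receive buffer, the data after the Ethernet header starts at offset 14, and the data after Ethernet plus IPv4 starts at offset 34, which is 2 modulo both 4 and 16. A memcpy of either payload therefore begins at an address that is not even word aligned.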
* Re: gcc will become the best optimizing x86 compiler
From: Zoltán Kócsi @ 2008-07-24 10:09 UTC
To: gcc

> [...]
> I have made a few optimized functions myself and published them as a
> multi-platform library (www.agner.org/optimize/asmlib.zip). It is
> faster than most other libraries on an Intel Core2 and up to ten
> times faster than gcc using builtin functions. My library is
> published with GPL license, but I will allow you to use my code in
> gnu libc if you wish (Sorry, I don't have the time to work on the gnu
> project myself, but you may contact me for details about the code).
> [...]

But then it's not gcc that is the best optimising compiler, but it's
the best library *hand optimised so that gcc compiles it very well*.

Here's an example:

    void func( void );

    void foo( void )
    {
      unsigned x;

      for ( x = 0 ; x < 200 ; x++ ) func();
    }

    void bar( void )
    {
      unsigned x;

      for ( x = 201 ; --x ; ) func();
    }

foo() and bar() are completely equivalent, they call func() 200 times
and that's all. Yet, if you compile them with -O3 for arm-elf target
with version 4.0.2 (yes, I know, it's an ancient version, but still)
bar() will be 6 insns long with the loop itself being 3 while foo()
compiles to 7 insns of which 4 is the loop. In fact, the compiler is
clever enough to transform bar()'s loop from

    for ( x = 201 ; --x ; ) func();

to

    x = 200; do func(); while ( --x );

internally, the latter form being shorter to evaluate and since x is
not used other than as the loop counter it doesn't matter. However, it
is not clever enough to figure out that foo()'s loop is doing exactly
what bar()'s is doing. Since x is only the loop counter, gcc could
transform foo()'s loop to bar()'s freely but it doesn't. It generates
the equivalent of this:

    x = 0; do { x += 1; func(); } while ( x != 200 );

that is not as efficient as what it generates from bar()'s code.

Of course you get surprised when you change -O3 to -Os, in which case
gcc suddenly realises that foo() can indeed be transformed to the
internal representation that it used for bar() with -O3. Thus, we have
foo() now being only 6 insns long with a 3 insn loop. Unfortunately,
bar() is not that lucky. Although its loop remains 3 insns long, the
entire function is increased by an additional instruction, for bar()
internally now looks like this:

    x = 201; goto label; do { func(); label: ; } while ( --x );

You can play with gcc and see which one of the equivalent C constructs
it compiles to better code with any particular -O level (and if you
have to work with severely constrained embedded systems you often do)
but then hand-crafting your C code to fit gcc's taste is actually not
that good an idea. With the next release, when different constructs
will be recognised, you may end up with larger and/or slower code (as
it happened to me when changing 4.0.x -> 4.3.x and before when going
from 2.9.x to 3.1.x).

Gcc will be the best optimising compiler when it will generate
faster/shorter code than the other compilers on the majority of a large
set of arbitrary, *not* hand-optimised sources. Preferably for most
targets, not only for the x86, if possible :-)

Zoltan
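The claim that the loop forms are equivalent can be checked with a small counting harness. This is a sketch only: `func` is replaced by a stand-in that counts its calls, and `baz` and `count_calls` are hypothetical names; `baz` spells out the do/while form that the message says gcc uses internally for bar().

```c
#include <assert.h>
#include <stddef.h>

static unsigned calls;                 /* number of func() invocations */
static void func(void) { calls++; }    /* stand-in for the real func() */

static void foo(void) { unsigned x; for (x = 0; x < 200; x++) func(); }
static void bar(void) { unsigned x; for (x = 201; --x; ) func(); }
static void baz(void) { unsigned x = 200; do func(); while (--x); }

/* Run one loop variant and report how many times it called func(). */
static unsigned count_calls(void (*loop)(void))
{
    assert(loop != NULL);
    calls = 0;
    loop();
    return calls;
}
```

All three variants call func() exactly 200 times, so any difference in instruction count is down to how the compiler handles each spelling, not to a difference in behaviour.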