* [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Roger Sayle @ 2004-07-12 0:54 UTC
To: gcc-patches

I was originally just going to ask whether there were any remaining
hold-ups to enabling -fomit-frame-pointer by default when optimizing
on IA-32.  More pro-actively, I thought I'd propose the following patch
and see what people think.  The x86_64 targets in i386.c already do
this by default, and with the variable tracking changes, gcc 3.5 will
already require users to use a recent gdb 6.x for debugging.

The following patch has been tested on i686-pc-linux-gnu with a full
"make bootstrap", all default languages, and regression tested with a
top-level "make -k check" with no new failures.  I'll admit that I've
never run the gdb testsuite, but I'd hope that by now any regressions
there would be considered problems with gdb :)

Ok for mainline?


2004-07-11  Roger Sayle  <roger@eyesopen.com>

        PR middle-end/16373
        * config/i386/i386.h (CAN_DEBUG_WITHOUT_FP): Define.
        * config/i386/i386.c (optimization_options): Don't track whether
        flag_omit_frame_pointer has been specified by user.
        (override_options): Don't adjust target dependent default for
        flag_omit_frame_pointer.


Index: i386.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.h,v
retrieving revision 1.388
diff -c -3 -p -r1.388 i386.h
*** i386.h      7 Jul 2004 19:24:17 -0000       1.388
--- i386.h      11 Jul 2004 19:53:17 -0000
*************** struct machine_function GTY(())
*** 3119,3124 ****
--- 3119,3127 ----
  #define X86_FILE_START_VERSION_DIRECTIVE false
  #define X86_FILE_START_FLTUSED false

+ /* Show we can debug even without a frame pointer.  */
+ #define CAN_DEBUG_WITHOUT_FP
+
  /*
  Local variables:
  version-control: t
Index: i386.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
retrieving revision 1.689
diff -c -3 -p -r1.689 i386.c
*** i386.c      10 Jul 2004 19:01:40 -0000      1.689
--- i386.c      11 Jul 2004 19:53:20 -0000
*************** override_options (void)
*** 1199,1206 ****
       in case they weren't overwritten by command line options.  */
    if (TARGET_64BIT)
      {
-       if (flag_omit_frame_pointer == 2)
-         flag_omit_frame_pointer = 1;
        if (flag_asynchronous_unwind_tables == 2)
          flag_asynchronous_unwind_tables = 1;
        if (flag_pcc_struct_return == 2)
--- 1199,1204 ----
*************** override_options (void)
*** 1208,1215 ****
      }
    else
      {
-       if (flag_omit_frame_pointer == 2)
-         flag_omit_frame_pointer = 0;
        if (flag_asynchronous_unwind_tables == 2)
          flag_asynchronous_unwind_tables = 0;
        if (flag_pcc_struct_return == 2)
--- 1206,1211 ----
*************** optimization_options (int level, int siz
*** 1578,1585 ****
       that is not known at this moment.  Mark these values with 2 and
       let user the to override these.  In case there is no command line option
       specifying them, we will set the defaults in override_options.  */
-   if (optimize >= 1)
-     flag_omit_frame_pointer = 2;
    flag_pcc_struct_return = 2;
    flag_asynchronous_unwind_tables = 2;
  }
--- 1574,1579 ----

Roger
--
Roger Sayle,                         E-mail: roger@eyesopen.com
OpenEye Scientific Software,         WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road,     Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507.         Fax: (+1) 505-473-0833

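The i386.c hunks above remove one use of a convention in which a flag is
set to 2 to mean "not specified on the command line" and is later replaced
by a target default.  A minimal sketch of that convention in isolation
follows.  It is not GCC's code: the names FLAG_UNSET and resolve_flag are
hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Hypothetical sentinel: the option was not given on the command line. */
enum { FLAG_UNSET = 2 };

/* Resolve a tri-state flag.  An explicit 0 or 1 from the user wins;
   otherwise the target-dependent default is used.  */
int resolve_flag(int flag, int target_default)
{
    assert(flag == 0 || flag == 1 || flag == FLAG_UNSET);
    assert(target_default == 0 || target_default == 1);
    return flag == FLAG_UNSET ? target_default : flag;
}
```

Under this sketch, resolve_flag(FLAG_UNSET, 1) yields the target default,
while resolve_flag(0, 1) keeps the user's explicit choice.  The patch drops
this two-step handling for flag_omit_frame_pointer only; the other two flags
in the hunks keep it.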
* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Joseph S. Myers @ 2004-07-12 1:43 UTC
To: Roger Sayle; +Cc: gcc-patches

On Sun, 11 Jul 2004, Roger Sayle wrote:

> The following patch has been tested on i686-pc-linux-gnu with a full
> "make bootstrap", all default languages, and regression tested with a
> top-level "make -k check" with no new failures.  I'll admit that I've

Performance statistics (compile time, run time and code size)?  After all,
the point of this change is presumably that it improves performance.

--
Joseph S. Myers               http://www.srcf.ucam.org/~jsm28/gcc/
    jsm@polyomino.org.uk (personal mail)
    jsm28@gcc.gnu.org (Bugzilla assignments and CCs)

* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Roger Sayle @ 2004-07-12 2:19 UTC
To: Joseph S. Myers; +Cc: gcc-patches

On Sun, 11 Jul 2004, Joseph S. Myers wrote:
> Performance statistics (compile time, run time and code size)?  After all,
> the point of this change is presumably that it improves performance.

Do you have any doubt at all that this patch will improve run-time,
reduce code size and speed up a bootstrapped compiler? :>

The bugzilla PR reports performance increases of up to 40% on some
benchmarks, and indeed the Intel and Microsoft C/C++ compilers omit
frame pointers by default.  I'd hope that any gains will help redress
some of the tree-ssa related compile-time slow-down, but I haven't done
any timings.  The benefits of -fomit-frame-pointer are otherwise well
documented by Scott Robert Ladd and others.

But the real reason I submitted the patch was to decrease the maintenance
burden of the six additional lines in i386.c that provide target dependent
defaults for flag_omit_frame_pointer, but break the documented semantics
of the CAN_DEBUG_WITHOUT_FP target macro.  Any other side-effects of this
patch are unintentional and purely coincidental. :>

The alternative fix to PR middle-end/16373 is to change the documentation.

Roger
--

* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Jakub Jelinek @ 2004-07-12 7:07 UTC
To: Roger Sayle; +Cc: gcc-patches

On Sun, Jul 11, 2004 at 05:06:02PM -0600, Roger Sayle wrote:
>
> I was originally just going to ask whether there were any remaining
> hold-ups to enabling -fomit-frame-pointer by default when optimizing
> on IA-32.  More pro-actively, I thought I'd propose the following patch
> and see what people think.  The x86_64 targets in i386.c already do
> this by default, and with the variable tracking changes, gcc 3.5 will
> already require users to use a recent gdb 6.x for debugging.
>
>
> The following patch has been tested on i686-pc-linux-gnu with a full
> "make bootstrap", all default languages, and regression tested with a
> top-level "make -k check" with no new failures.  I'll admit that I've
> never run the gdb testsuite, but I'd hope that by now any regressions
> there would be considered problems with gdb :)
>
> Ok for mainline?

Just search the archives for -fomit-frame-pointer i386.
This patch came up several times even this year.

I don't think we can turn -fomit-frame-pointer on by default unless
we also turn on -fasynchronous-unwind-tables, because otherwise things
like glibc's backtrace(3) have zero chance of ever working.
Java relies heavily on this function, as do several other programs.
x86-64, which enables -fomit-frame-pointer by default, enables async
unwind tables at the same time.

        Jakub

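For readers unfamiliar with backtrace(3), here is a minimal sketch of how a
program typically calls it.  It assumes glibc's <execinfo.h>; the helper
name print_backtrace and the MAX_FRAMES limit are illustrative choices, not
anything from the thread.  How many frames it can recover on i386 depends on
how the callers were compiled, which is the concern raised above.

```c
#include <assert.h>
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Capture the current call stack and write one line per frame to
   stderr.  Returns the number of frames that were captured.  */
int print_backtrace(void)
{
    void *frames[MAX_FRAMES];
    int count = backtrace(frames, MAX_FRAMES);

    assert(count >= 0 && count <= MAX_FRAMES);
    backtrace_symbols_fd(frames, count, STDERR_FILENO);
    return count;
}
```

Calling print_backtrace() from a few levels of nested functions, built once
with -fno-omit-frame-pointer and once with -fomit-frame-pointer, is one way
to observe the difference being discussed.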
* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Roger Sayle @ 2004-07-12 8:18 UTC
To: Jakub Jelinek; +Cc: gcc-patches

On Mon, 12 Jul 2004, Jakub Jelinek wrote:
> I don't think we can turn -fomit-frame-pointer on by default unless
> we also turn on -fasynchronous-unwind-tables, because otherwise things
> like glibc's backtrace(3) have zero chance of ever working.

I doubt that this would be acceptable.  Could you confirm that glibc's
backtrace(3) function only requires -fasynchronous-unwind-tables to
handle "asynchronous" events, such as interrupts and pre-emption, rather
than being necessary for regular code?  I notice that by default the
x86 Linux kernel is compiled with -fomit-frame-pointer but without
-fasynchronous-unwind-tables.  Similarly, how's this handled by other
targets that define CAN_DEBUG_WITHOUT_FP, including the alpha, arm, avr,
fr30, frv, ia64, iq2000, m32r, mips, mmix, mn10300, pa, rs6000, s390,
sparc, stormy16, v850 and xtensa?

The reason why defaulting to -fasynchronous-unwind-tables is unreasonable
is the large code size regression that this incurs.  The following table
is an extended version of the one from earlier today (CSiBE with -Os):

                        Before    After    +Async Unwind Info
bzip2,blocksort           7025     6305      6975
bzip2,bzip2              16221    16069     21289
bzip2,bzip2recover        3397     3365      4831
bzip2,compress            9532     8998      9658
bzip2,decompress          8216     8294*     8606
bzip2,dlltest              675      671       923
bzip2,huffman             1107     1054      1238
bzip2,mk251                 30       24        76
bzip2,spewG                284      272       396
bzip2,unzcrash             912      908      1200
catdvi,adobe2h           33622    33636*    33972
catdvi,bytesex             790      774      1286
catdvi,canvas             2668     2630      4162
catdvi,catdvi             3933     3926      4710
catdvi,density            2041     2006      3278
catdvi,fixword             123      115       215
...

To summarize, the patch as proposed reduced the CSiBE benchmarks on
i686-pc-linux-gnu from 939614 to 933795 (a 0.62% reduction), but
making -fasynchronous-unwind-tables the default (in addition to
-fomit-frame-pointer) increases this back up to 1220631, a whopping
30% increase over the original.

My suggestion is that for code that requires use of glibc's backtrace(3)
function, we recommend/require the user to manually specify either
-fno-omit-frame-pointer or -fasynchronous-unwind-tables depending upon
whether size or speed is more important.  Of course, any improvements
in glibc's backtrace functionality would be welcome.

As a compromise, we could consider only enabling -fomit-frame-pointer
by default on cygwin, mingw, netbsd, darwin/x86 and solaris/x86,
retaining the frame pointer only for glibc targets, such as Linux/GNU,
until the backtrace issue is resolved.

I don't believe that gcc/g++ itself makes use of glibc's backtrace,
so it could benefit even on Linux by adding -fomit-frame-pointer to
its appropriate CFLAGS.

Roger
--

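The two percentages in the summary above follow directly from the three
CSiBE totals.  A small sketch of the arithmetic is given here; the function
name percent_change is an illustrative choice, and the totals are the ones
quoted in the message.

```c
#include <assert.h>

/* Percentage change going from BEFORE to AFTER.
   A negative result means the size went down.  */
double percent_change(long before, long after)
{
    assert(before > 0);
    return 100.0 * (double)(after - before) / (double)before;
}
```

With the quoted totals, percent_change(939614, 933795) comes out at about
-0.62 and percent_change(939614, 1220631) at about +29.9, matching the
"0.62% reduction" and "30% increase" figures.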
* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Andrew Pinski @ 2004-07-12 8:28 UTC
To: Roger Sayle; +Cc: Jakub Jelinek, gcc-patches

>
>
> On Mon, 12 Jul 2004, Jakub Jelinek wrote:
> > I don't think we can turn -fomit-frame-pointer on by default unless
> > we also turn on -fasynchronous-unwind-tables, because otherwise things
> > like glibc's backtrace(3) have zero chance of ever working.
>
> The reason why defaulting to -fasynchronous-unwind-tables is unreasonable
> is the large code size regression that this incurs.  The following table
> is an extended version of the one from earlier today (CSiBE with -Os):
>
>                         Before    After    +Async Unwind Info
> bzip2,blocksort           7025     6305      6975
> bzip2,bzip2              16221    16069     21289
> bzip2,bzip2recover        3397     3365      4831
> bzip2,compress            9532     8998      9658
> bzip2,decompress          8216     8294*     8606
> bzip2,dlltest              675      671       923
> bzip2,huffman             1107     1054      1238
> bzip2,mk251                 30       24        76
> bzip2,spewG                284      272       396
> bzip2,unzcrash             912      908      1200
> catdvi,adobe2h           33622    33636*    33972
> catdvi,bytesex             790      774      1286
> catdvi,canvas             2668     2630      4162
> catdvi,catdvi             3933     3926      4710
> catdvi,density            2041     2006      3278
> catdvi,fixword             123      115       215
> ...
>
>
> To summarize, the patch as proposed reduced the CSiBE benchmarks on
> i686-pc-linux-gnu from 939614 to 933795 (a 0.62% reduction), but
> making -fasynchronous-unwind-tables the default (in addition to
> -fomit-frame-pointer) increases this back up to 1220631, a whopping
> 30% increase over the original.

Huh?  That is not a code size problem but rather a data size problem.  Can
you try again, this time using size to show the size of the text section?
In fact for ELF this section is not loaded at all except when it is
needed.  I think you need to reduce the test again as it looks like you
are measuring data size also.

Thanks,
Andrew Pinski

* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Michael Matz @ 2004-07-12 8:38 UTC
To: Andrew Pinski; +Cc: Roger Sayle, Jakub Jelinek, gcc-patches

Hi,

On Mon, 12 Jul 2004, Andrew Pinski wrote:

> > On Mon, 12 Jul 2004, Jakub Jelinek wrote:
> > > I don't think we can turn -fomit-frame-pointer on by default unless
> > > we also turn on -fasynchronous-unwind-tables, because otherwise things
> > > like glibc's backtrace(3) have zero chance of ever working.
> >
> > The reason why defaulting to -fasynchronous-unwind-tables is unreasonable
> > is the large code size regression that this incurs.

It incurs an enlargement of .eh_frame and .eh_frame_hdr, not of any code
size.

> > The following table
> > is an extended version of the one from earlier today (CSiBE with -Os):
> >
> >                         Before    After    +Async Unwind Info
> > bzip2,blocksort           7025     6305      6975
> >
>
> Huh?  That is not a code size problem but rather a data size problem.  Can
> you try again, this time using size to show the size of the text section?

CSiBE uses 'size' internally AFAIK.  Unfortunately it adds the .eh
sections to text size.  So CSiBE results have to be argued about carefully.

> In fact for ELF this section is not loaded at all except when it is
> needed.  I think you need to reduce the test again as it looks like you
> are measuring data size also.

Ciao,
Michael.

* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Jakub Jelinek @ 2004-07-12 12:42 UTC
To: Roger Sayle; +Cc: gcc-patches

On Mon, Jul 12, 2004 at 12:45:05AM -0600, Roger Sayle wrote:
> > I don't think we can turn -fomit-frame-pointer on by default unless
> > we also turn on -fasynchronous-unwind-tables, because otherwise things
> > like glibc's backtrace(3) have zero chance of ever working.
>
> I doubt that this would be acceptable.  Could you confirm that glibc's
> backtrace(3) function only requires -fasynchronous-unwind-tables to
> handle "asynchronous" events, such as interrupts and pre-emption, rather
> than being necessary for regular code?  I notice that by default the
> x86 Linux kernel is compiled with -fomit-frame-pointer but without
> -fasynchronous-unwind-tables.  Similarly, how's this handled by other

It increases read-only data, and as long as .eh_frame/.eh_frame_hdr
sections are never used by the application, because
.eh_frame/.eh_frame_hdr/.gcc_except_table sections are located these days
at the end of the read-only ELF segment, the size impact is either zero,
or at most a partial 4KB page.

> targets that define CAN_DEBUG_WITHOUT_FP, including the alpha, arm, avr,
> fr30, frv, ia64, iq2000, m32r, mips, mmix, mn10300, pa, rs6000, s390,
> sparc, stormy16, v850 and xtensa?

E.g. ia64 emits its format of unwind data unconditionally, sparc
essentially always has a frame pointer present except in leaf functions
(due to register windows), s390* has -fasynchronous-unwind-tables by
default, and various other arches have fixed conventions for how to do the
backtrace even without a frame pointer (storing a backchain on the stack,
etc.).

        Jakub

* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Florian Weimer @ 2004-07-12 18:03 UTC
To: Jakub Jelinek; +Cc: Roger Sayle, gcc-patches

* Jakub Jelinek:

> It increases read-only data, and as long as .eh_frame/.eh_frame_hdr
> sections are never used by the application, because
> .eh_frame/.eh_frame_hdr/.gcc_except_table sections are located these days
> at the end of the read-only ELF segment, the size impact is either zero,
> or at most a partial 4KB page.

I don't understand your distinction between code and data directly
supporting code execution.  If executable machine code size is the
only criterion, we could compile code to P-code and have almost zero
code size, but I fail to see how this would help those who struggle
with code size.

* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Paolo Bonzini @ 2004-07-12 18:06 UTC
To: gcc-patches; +Cc: Roger Sayle, gcc-patches

> I don't understand your distinction between code and data directly
> supporting code execution.  If executable machine code size is the
> only criterion, we could compile code to P-code and have almost zero
> code size, but I fail to see how this would help those who struggle
> with code size.

Unwind data is never loaded from disk in the first place, it only
wastes address space.  Read-only data is shared between different
"instances" of the same application, and swapped to/from the on-disk
binary rather than the swap file.

Paolo

* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Florian Weimer @ 2004-07-12 19:06 UTC
To: Paolo Bonzini; +Cc: Roger Sayle, gcc-patches

* Paolo Bonzini:

>> I don't understand your distinction between code and data directly
>> supporting code execution.  If executable machine code size is the
>> only criterion, we could compile code to P-code and have almost zero
>> code size, but I fail to see how this would help those who struggle
>> with code size.
>
> Unwind data is never loaded from disk in the first place, it only
> wastes address space.  Read-only data is shared between different
> "instances" of the same application, and swapped to/from the on-disk
> binary rather than the swap file.

Disk?  What is a disk? 8-)

* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Giovanni Bajo @ 2004-07-12 19:07 UTC
To: gcc-patches

Paolo Bonzini wrote:

>> I don't understand your distinction between code and data directly
>> supporting code execution.  If executable machine code size is the
>> only criterion, we could compile code to P-code and have almost zero
>> code size, but I fail to see how this would help those who struggle
>> with code size.
>
> Unwind data is never loaded from disk in the first place, it only
> wastes address space.  Read-only data is shared between different
> "instances" of the same application, and swapped to/from the on-disk
> binary rather than the swap file.

There are other systems than GNU/Linux, you know.  Like embedded systems
with no MMU or without a smart enough OS.  People who care about -Os
usually want the total binary ELF image to be as small as possible,
because it needs to be burnt into flash, etc.

-fasynchronous-unwind-tables should be a no-no for -Os.
--
Giovanni Bajo

* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Jakub Jelinek @ 2004-07-12 19:24 UTC
To: Giovanni Bajo; +Cc: gcc-patches

On Mon, Jul 12, 2004 at 02:48:13PM +0200, Giovanni Bajo wrote:
> There are other systems than GNU/Linux, you know.  Like embedded systems
> with no MMU or without a smart enough OS.  People who care about -Os
> usually want the total binary ELF image to be as small as possible,
> because it needs to be burnt into flash, etc.
>
> -fasynchronous-unwind-tables should be a no-no for -Os.

Not for -Os, but for those embedded targets.

        Jakub

* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Paolo Bonzini @ 2004-07-12 19:31 UTC
To: gcc-patches

> There are other systems than GNU/Linux, you know.

Yes.  But for x86, which is being discussed, nearly all OSes are smart
enough to optimize read-only sections.

> -fasynchronous-unwind-tables should be a no-no for -Os.

This makes sense only to some extent, unfortunately.  Very often if you
care about code size you also try to use -fno-rtti -fno-exceptions if
possible, and in that case -fasynchronous-unwind-tables is a necessity.
On the other hand, you often use -Os on a "smart OS" with powerful
enough memory management, and in that case -fomit-frame-pointer
-fasynchronous-unwind-tables is a better choice.

Paolo

* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Jakub Jelinek @ 2004-07-12 19:09 UTC
To: Florian Weimer; +Cc: Roger Sayle, gcc-patches

On Mon, Jul 12, 2004 at 02:33:44PM +0200, Florian Weimer wrote:
> > It increases read-only data, and as long as .eh_frame/.eh_frame_hdr
> > sections are never used by the application, because
> > .eh_frame/.eh_frame_hdr/.gcc_except_table sections are located these days
> > at the end of the read-only ELF segment, the size impact is either zero,
> > or at most a partial 4KB page.
>
> I don't understand your distinction between code and data directly
> supporting code execution.  If executable machine code size is the
> only criterion, we could compile code to P-code and have almost zero
> code size, but I fail to see how this would help those who struggle
> with code size.

Frame unwind info is data supporting code execution only when you actually
query the unwind data.  If you don't need the unwind info, the impact is
(almost) zero.  If you need it, then its size of course matters, but
that is certainly not "fixed" by not generating it at all.

Roger was suggesting enabling -fomit-frame-pointer by default but not
-fasynchronous-unwind-tables, and telling people that if they need
backtrace(3), they need to recompile their stuff (plus any libraries which
might show up in the backtraces) with -fasynchronous-unwind-tables
or -fno-omit-frame-pointer.
I claim that there is no justification for this from the code-size POV,
and unless the suggestion were to always use -fasynchronous-unwind-tables,
it is going to lead to severe problems as well.
(glibc backtrace(3) can either use unwind info only, or use frame pointers
only, or can use unwind info first and, when unwind info stops, use frame
pointers.  But it can hardly use frame pointers for a few frames and
then switch back to unwind info.)

        Jakub

* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Richard Henderson @ 2004-07-12 23:43 UTC
To: Jakub Jelinek; +Cc: Roger Sayle, gcc-patches

On Mon, Jul 12, 2004 at 01:24:14AM -0400, Jakub Jelinek wrote:
> I don't think we can turn -fomit-frame-pointer on by default unless
> we also turn on -fasynchronous-unwind-tables, because otherwise things
> like glibc's backtrace(3) have zero chance of ever working.

Indeed.  We might talk about turning on -momit-leaf-frame-pointer
again, though.


r~

* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Roger Sayle @ 2004-07-15 14:44 UTC
To: Richard Henderson; +Cc: Jakub Jelinek, gcc-patches

On Mon, 12 Jul 2004, Richard Henderson wrote:
> On Mon, Jul 12, 2004 at 01:24:14AM -0400, Jakub Jelinek wrote:
> > I don't think we can turn -fomit-frame-pointer on by default unless
> > we also turn on -fasynchronous-unwind-tables, because otherwise things
> > like glibc's backtrace(3) have zero chance of ever working.
>
> Indeed.  We might talk about turning on -momit-leaf-frame-pointer
> again, though.

Reading glibc's implementation of backtrace for i386, I think I have
a cunning plan.  The code in sysdeps/i386/backtrace.c assumes that the
full backtrace can be recovered by traversing a linked list of frames,
starting with the current frame %ebp.

A compromise which provides most of the performance advantages of
-fomit-frame-pointer but still allows backtrace(3) to function might
be for the i386 to treat %ebp as a pseudo-fixed register.  The theory
is that over all subroutine calls %ebp must contain a valid stack
frame pointer, but not necessarily for the current function.

Any function that uses a frame-pointer behaves as it always has.
Any function that doesn't must preserve the %ebp from its parent when
calling children.  This ensures that backtrace still sees a linked
list of stack frames, just not *every* stack frame.  To a user of
backtrace, this is no different than inlining.  In the worst case,
backtrace will see a valid backtrace containing just "main" and its
parents.

Theoretically, GCC could still use %ebp in leaf functions, or in
non-leaf functions provided that it restores its value prior to making
any function calls.

This scheme allows backtrace(3) to continue to function even when
mixing and interleaving functions compiled with and without
-fomit-frame-pointer.  Almost all of the dramatic performance improvement
with -fomit-frame-pointer is from not setting up and restoring a
stack-frame; the loss of %ebp as an additional cheap register in this
scheme should be negligible.  For exception handling code, this should
even be a performance win, as backtrace can skip "uninteresting" stack
frames without a frame-pointer, assuming that all functions (or children
of functions) that catch exceptions are compiled with a frame pointer.

Hopefully, this makes some kind of sense, and is a generalization of
-momit-leaf-frame-pointer.  If you're unhappy with this approach, I'm
also investigating "plan B", but that requires significantly more code.

Is this doable?

Roger
--

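The "linked list of frames" idea can be modelled without touching real
registers.  The sketch below is a simplified model under stated assumptions,
not glibc's sysdeps/i386/backtrace.c: each frame record holds the saved
caller %ebp and a label standing in for the return address, and the names
struct frame, walk_frames and demo_frameless_callee are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Model of what a frame pointer points at on i386: the caller's saved
   %ebp, followed by the return address (here just a label).  */
struct frame {
    const struct frame *caller;  /* saved %ebp, next record in the chain */
    const char *returns_into;    /* stands in for the return address */
};

/* Follow the chain from FP, storing up to SIZE labels in OUT.
   Returns the number of entries stored.  */
int walk_frames(const struct frame *fp, const char **out, int size)
{
    int n = 0;

    assert(out != NULL && size >= 0);
    while (fp != NULL && n < size) {
        out[n++] = fp->returns_into;
        fp = fp->caller;
    }
    return n;
}

/* Model the call chain main -> f -> g -> h, where g sets up no frame
   but leaves %ebp untouched.  h therefore saves f's frame pointer, so
   the chain seen from h has three records for four active functions.  */
int demo_frameless_callee(const char **out, int size)
{
    const struct frame main_frame = { NULL, "startup code" };
    const struct frame f_frame = { &main_frame, "main" };
    const struct frame h_frame = { &f_frame, "g" };

    return walk_frames(&h_frame, out, size);
}
```

In this model the walk stays well formed and terminates at the outermost
frame, but it yields fewer entries than the real call depth, which is the
"just not every stack frame" behaviour described above.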
* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Richard Henderson @ 2004-07-15 14:54 UTC
To: Roger Sayle; +Cc: Jakub Jelinek, gcc-patches

On Wed, Jul 14, 2004 at 07:53:01PM -0600, Roger Sayle wrote:
> Is this doable?

It would be *possible*, but would be quite a lot of code to get right.
I'm not sure that it's worth it.


r~

* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Roger Sayle @ 2004-07-15 15:07 UTC
To: Richard Henderson; +Cc: Jakub Jelinek, gcc-patches

On Wed, 14 Jul 2004, Richard Henderson wrote:
> On Wed, Jul 14, 2004 at 07:53:01PM -0600, Roger Sayle wrote:
> > Is this doable?
>
> It would be *possible*, but would be quite a lot of code to get right.

Would it be significantly easier just to treat %ebp as a "real" fixed
register, either saved and restored in functions with a frame-pointer,
or completely unused in functions without?  This should limit the
changes to prologue/epilogue generation, and perhaps tweaking register
classes.  Less pretty, but safer (i.e. avoids your "a lot of code to
get right").

> I'm not sure that it's worth it.

The benchmarking results are compelling.  I'll time some bootstraps to
determine how many minutes it shaves off of a full bootstrap.

Roger
--

* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
From: Richard Henderson @ 2004-07-15 16:04 UTC
To: Roger Sayle; +Cc: Jakub Jelinek, gcc-patches

On Wed, Jul 14, 2004 at 08:13:24PM -0600, Roger Sayle wrote:
> The benchmarking results are compelling.  I'll time some bootstraps to
> determine how many minutes it shaves off of a full bootstrap.

If you do this, do include -momit-leaf-frame-pointer, since
I suspect this is where the bulk of the improvement comes from.


r~

* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86 2004-07-15 16:04 ` Richard Henderson @ 2004-07-16 20:36 ` Roger Sayle 2004-07-16 21:12 ` Zack Weinberg 2004-07-16 22:34 ` Richard Henderson 0 siblings, 2 replies; 28+ messages in thread From: Roger Sayle @ 2004-07-16 20:36 UTC (permalink / raw) To: Richard Henderson; +Cc: Jakub Jelinek, gcc-patches On Wed, 14 Jul 2004, Richard Henderson wrote: > On Wed, Jul 14, 2004 at 08:13:24PM -0600, Roger Sayle wrote: > > The benchmarking results are compelling. I'll time some bootstraps to > > determine how many minutes it shaves off of a full bootstrap. > > If you do this, do include -momit-leaf-frame-pointer, since > I suspect this is where the bulk of the improvement comes from. Ok, here are the effects on three stage bootstrap times on i686-pc-linux with an unmodified host compiler, building all default languages and all runtime libraries. These are user-times reported for three bootstraps: mainline: 59m05.170s 59m14.470s 59m22.780s -fomit-*: 58m30.580s 58m43.730s 58m47.490s -momit-*: 58m49.220s 59m15.230s 59m31.480s Firstly, the numbers are extremely noisy possibly due to other loads on the machine being timed, but I'd estimate that -fomit-frame-pointer reduces bootstrap times by 40 seconds, and -momit-leaf-frame-pointer by approximately half of that, about 20 seconds. These values are consistent with the SPECint2000 analysis of command line options (for GCC 3.0.1) that was performed by Andreas Jaeger at http://www.suse.de/~aj/SPEC/compare-flags.html To summarise his "geometric mean" results: -O2 -march=athlon: 401s -O2 -march=athlon -fomit-frame-pointer: 391s +2.5% -O2 -march=athlon -momit-leaf-frame-pointer: 394s +1.0% m/f=40% -O3 -march=athlon: 397s -O3 -march=athlon -fomit-frame-pointer: 391s +1.5% -O3 -march=athlon -momit-leaf-frame-pointer: 394s +0.8% m/f=50% So it looks like -momit-leaf-frame-pointer consistently captures about half of the benefit of -fomit-frame-pointer. Now for the bad news. 
I suspect that -momit-leaf-frame-pointer is not much safer than
-fomit-frame-pointer with respect to glibc's backtrace. Consider the
scenario of a leaf function that receives a signal or triggers a division
by zero exception. If, in that leaf function, %ebp has been used as a
random scratch register, the signal/trap handler will be unable to use
backtrace(3) to determine which function is supposed to catch the
exception. The probability of failure is lower, but the underlying
pathology still remains. If someone could correct the flaw in the above
analysis, I'd love to hear it.

Hence, I believe that the correct way to obtain that not insignificant
2.5% improvement in SPECint2000 performance above is for the i386 to
treat %ebp as a fixed (non general purpose) register in the scheme that
I described earlier. This would allow frame pointers to be eliminated
both from leaf and non-leaf functions, and would resolve the issue
above of non-call exceptions in leaf functions (not unusual for gcj).

There's even an additional benefit of reserving %ebp solely as a stack
frame pointer, and just deciding to use it or not in a function. It
means that this register is always available to become a frame pointer
during reload. Hence, we can postpone the decision of whether we need a
frame pointer or not until spill code generation, when the requirement
to save and restore a significant number of temporaries to and from the
stack should trigger the use of "%ebp" to allow using its more efficient
addressing modes.

Any help the x86 maintainers could offer experimenting with reserving
%ebp as a frame pointer would be very much appreciated. Pretty please.
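The failure mode described above can be illustrated with a small model of frame-pointer walking. This is only a sketch under stated assumptions: `struct frame`, `walk_frames` and `demo_walk` are hypothetical names, the frames are ordinary C structs rather than a real IA-32 stack, and it is not glibc's backtrace(3) implementation. A clobbered %ebp is modelled as NULL; a real walker would instead dereference whatever value happened to be in the register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a frame-pointer-based stack frame: the saved
   caller %ebp, plus a string standing in for the return address. */
struct frame {
    const struct frame *saved_fp;  /* caller's frame, NULL at the outermost frame */
    const char *return_to;         /* name of the function we return into */
};

/* Follow the saved-%ebp chain starting at fp, recording up to max
   return targets in out.  Returns the number of frames recorded. */
static size_t walk_frames(const struct frame *fp, const char **out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    while (fp != NULL && n < max) {
        out[n++] = fp->return_to;
        fp = fp->saved_fp;
    }
    return n;
}

/* Model the call chain main -> middle -> leaf and walk it from inside
   the leaf, as a signal handler relying on %ebp would. */
static size_t demo_walk(int leaf_clobbers_ebp, const char **out, size_t max)
{
    struct frame main_frame   = { NULL,          "_start" };
    struct frame middle_frame = { &main_frame,   "main"   };
    struct frame leaf_frame   = { &middle_frame, "middle" };

    /* With a frame pointer, %ebp points at the leaf's own frame.  If the
       leaf used %ebp as a scratch register, the walker starts from an
       arbitrary value; NULL stands in for that value in this model. */
    const struct frame *ebp = leaf_clobbers_ebp ? NULL : &leaf_frame;

    return walk_frames(ebp, out, max);
}
```

Calling `demo_walk(0, out, 8)` records the three return targets "middle", "main" and "_start", while `demo_walk(1, out, 8)` records none, which is the model's stand-in for a backtrace that cannot get started.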
Just for the record, here's the "momit-leaf-frame-pointer" patch that
I used in the above testing:

Index: config/i386/i386.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.h,v
retrieving revision 1.390
diff -c -3 -p -r1.390 i386.h
*** config/i386/i386.h	14 Jul 2004 06:24:15 -0000	1.390
--- config/i386/i386.h	15 Jul 2004 18:10:38 -0000
*************** extern int x86_prefetch_sse;
*** 440,449 ****
  #define TARGET_TLS_DIRECT_SEG_REFS_DEFAULT 0
  #endif

! /* Once GDB has been enhanced to deal with functions without frame
!    pointers, we can change this to allow for elimination of
!    the frame pointer in leaf functions. */
! #define TARGET_DEFAULT 0

  /* This is not really a target flag, but is done this way so that
     it's analogous to similar code for Mach-O on PowerPC. darwin.h
--- 440,447 ----
  #define TARGET_TLS_DIRECT_SEG_REFS_DEFAULT 0
  #endif

! /* Omit the frame pointer from leaf functions by default. */
! #define TARGET_DEFAULT MASK_OMIT_LEAF_FRAME_POINTER

  /* This is not really a target flag, but is done this way so that
     it's analogous to similar code for Mach-O on PowerPC. darwin.h

Roger
--
Roger Sayle,                         E-mail: roger@eyesopen.com
OpenEye Scientific Software,         WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road,     Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507.         Fax: (+1) 505-473-0833
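The message does not say how the 40 second and 20 second estimates were derived from the three user-times per configuration. As a sketch, the following compares the runs two ways, by mean and by best-of-three; the array and function names are hypothetical and only the timing values come from the message.

```c
#include <assert.h>
#include <stddef.h>

#define RUNS 3

/* User-times from the message, converted from "MMmSS.sss" to seconds. */
static const double mainline[RUNS] = { 59*60 + 5.170,  59*60 + 14.470, 59*60 + 22.780 };
static const double fomit[RUNS]    = { 58*60 + 30.580, 58*60 + 43.730, 58*60 + 47.490 };
static const double momit[RUNS]    = { 58*60 + 49.220, 59*60 + 15.230, 59*60 + 31.480 };

/* Arithmetic mean of n timings. */
static double mean_of(const double *t, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        sum += t[i];
    return sum / (double)n;
}

/* Smallest of n timings; the fastest run is the one least disturbed by
   other load on the machine. */
static double min_of(const double *t, size_t n)
{
    double best;
    size_t i;

    assert(n > 0);
    best = t[0];
    for (i = 1; i < n; i++)
        if (t[i] < best)
            best = t[i];
    return best;
}

/* Seconds saved relative to the baseline, comparing mean run times. */
static double saving_by_mean(const double *base, const double *run, size_t n)
{
    return mean_of(base, n) - mean_of(run, n);
}

/* Seconds saved relative to the baseline, comparing fastest run times. */
static double saving_by_min(const double *base, const double *run, size_t n)
{
    return min_of(base, n) - min_of(run, n);
}
```

With these inputs, the best-of-three comparison gives roughly 35 seconds for -fomit-frame-pointer and 16 seconds for -momit-leaf-frame-pointer, while the mean comparison gives roughly 34 seconds and 2 seconds. The spread between the two methods for -momit-leaf-frame-pointer is one way to see how noisy the measurements are.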
* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
  2004-07-16 20:36 ` Roger Sayle
@ 2004-07-16 21:12 ` Zack Weinberg
  2004-07-17  0:41 ` Roger Sayle
  2004-07-16 22:34 ` Richard Henderson
  1 sibling, 1 reply; 28+ messages in thread
From: Zack Weinberg @ 2004-07-16 21:12 UTC
To: Roger Sayle; +Cc: Richard Henderson, Jakub Jelinek, gcc-patches

Roger Sayle <roger@eyesopen.com> writes:

> Hence, I believe that the correct way to obtain that not insignificant
> 2.5% improvement in SPECint2000 performance above, is for the i386 to
> treat %ebp as a fixed (non general purpose) register in the scheme that
> I described earlier. This would allow frame pointers to be eliminated
> both from leaf and non-leaf functions, and would resolve the issue
> above of non-call exceptions in a leaf functions (not unusual for gcj).

Since this scheme doesn't make %ebp available as a general register,
I will be very surprised if it actually gets the same 2.5% performance
improvement.

What was wrong with making -fomit-frame-pointer imply
-fasynchronous-unwind-tables? People concerned with code size can
turn it back off again.

zw
* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
  2004-07-16 21:12 ` Zack Weinberg
@ 2004-07-17  0:41 ` Roger Sayle
  0 siblings, 0 replies; 28+ messages in thread
From: Roger Sayle @ 2004-07-17 0:41 UTC
To: Zack Weinberg; +Cc: Richard Henderson, Jakub Jelinek, gcc-patches

On Fri, 16 Jul 2004, Zack Weinberg wrote:
> What was wrong with making -fomit-frame-pointer imply
> -fasynchronous-unwind-tables?

This is something that perhaps someone can clear up for me. Adding
-fasynchronous-unwind-tables to GCC doesn't appear to fix glibc's
backtrace(3) function (which reserving %ebp would). Clearly there's
some context that needs to be explained to me, probably "gdb"?, that
explains why/how -fasynchronous-unwind-tables addresses/resolves an
issue that's currently blocking -fomit-frame-pointer by default.

I can understand why backtrace(3) causes a problem, but I'm still
unsure about -fasynchronous-unwind-tables; for example why this has
any more benefit than -funwind-tables. If "backtrace" can't be used
through signal frames, then perhaps a compromise might be to add a new
-ftrapping-unwind-tables? And if it's only a "gdb" issue, maybe the
solution need only be restricted to "-g".

> People concerned with code size can turn it back off again.

It's a correctness issue. I've been told omitting the frame pointer
without asynchronous unwind tables is unsafe, and therefore
inappropriate for -Os. A better alternative would be to have -Os
require a frame pointer and thereby omit the unwind tables, as the
resulting binary object files (as measured by CSiBE) would be smaller.

Of course, if someone had recently proposed a method for "safely"
omitting the frame pointer, independently of DWARF-2 annotations,
that would still be the best solution :>

Confused of Santa Fe
--
* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
  2004-07-16 20:36 ` Roger Sayle
  2004-07-16 21:12 ` Zack Weinberg
@ 2004-07-16 22:34 ` Richard Henderson
  2004-07-16 23:27 ` Geert Bosch
  2004-07-16 23:28 ` Roger Sayle
  1 sibling, 2 replies; 28+ messages in thread
From: Richard Henderson @ 2004-07-16 22:34 UTC
To: Roger Sayle; +Cc: Jakub Jelinek, gcc-patches

On Fri, Jul 16, 2004 at 10:32:03AM -0600, Roger Sayle wrote:
> Now for the bad news. I suspect that -momit-leaf-frame-pointer is not
> much safer than -fomit-frame-pointer with respect to glibc's backtrace.
> Consider the scenario of a leaf function that receives a signal ...

Folks that use %ebp walking to examine the stack DO NOT use it
in the presence of signals. They get broken by the signal frame
before being broken by the leaf function. Being able to walk
through a signal frame is a new feature of dwarf2 unwinding.

> Hence, I believe that the correct way to obtain that not insignificant
> 2.5% improvement in SPECint2000 performance above, is for the i386 to
> treat %ebp as a fixed (non general purpose) register in the scheme that
> I described earlier. This would allow frame pointers to be eliminated
> both from leaf and non-leaf functions, and would resolve the issue
> above of non-call exceptions in leaf functions (not unusual for gcj).

GCJ will be using real unwind info, and thus it won't matter what
scheme we use for the local stack frame.

As for treating %ebp as fixed... well, I guess we could see what
kind of improvement that gives.

r~
* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
  2004-07-16 22:34 ` Richard Henderson
@ 2004-07-16 23:27 ` Geert Bosch
  2004-07-16 23:28 ` Roger Sayle
  1 sibling, 0 replies; 28+ messages in thread
From: Geert Bosch @ 2004-07-16 23:27 UTC
To: Richard Henderson; +Cc: Jakub Jelinek, gcc-patches, Roger Sayle

On Jul 16, 2004, at 15:16, Richard Henderson wrote:
> Folks that use %ebp walking to examine the stack DO NOT use it
> in the presence of signals. They get broken by the signal frame
> before being broken by the leaf function. Being able to walk
> through a signal frame is a new feature of dwarf2 unwinding.

I'm not sure that's true for GDB's usage for example. Many unwinders
use heuristics or specific knowledge about signal frames to be able
to trace past these.

-Geert
* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
  2004-07-16 22:34 ` Richard Henderson
  2004-07-16 23:27 ` Geert Bosch
@ 2004-07-16 23:28 ` Roger Sayle
  1 sibling, 0 replies; 28+ messages in thread
From: Roger Sayle @ 2004-07-16 23:28 UTC
To: Richard Henderson; +Cc: Jakub Jelinek, gcc-patches

On Fri, 16 Jul 2004, Richard Henderson wrote:
> As for treating %ebp as fixed... well, I guess we could see what
> kind of improvement that gives.

Thanks, I'd really appreciate it. If nothing else, the performance
numbers would answer Zack's question about the relative performance
overhead of setting up and restoring the frame pointer vs. the benefit
of having %ebp available as a general purpose register during register
allocation.

Roger
--
[parent not found: <Pine.LNX.4.44.0407111933390.11880-100000@nondot.org>]
* Re: [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86
  [not found] <Pine.LNX.4.44.0407111933390.11880-100000@nondot.org>
@ 2004-07-12  3:11 ` Roger Sayle
  0 siblings, 0 replies; 28+ messages in thread
From: Roger Sayle @ 2004-07-12 3:11 UTC
To: Chris Lattner; +Cc: Joseph S. Myers, gcc-patches

On Sun, 11 Jul 2004, Chris Lattner wrote:
> > Do you have any doubt at all that this patch won't improve run-time,
> > reduce code size and speed up a bootstrapped compiler? :>
>
> Yes. It's quite likely to substantially increase code size. Almost every
> reference to the stack will have to be relative to the ESP register.
> References like [ESP+10] are substantially larger than the
> equivalent [EBP-10] encoding because the SIB encoding must be used.
>
> Food for thought,

This is a fair concern. I've just benchmarked the code size effect of
the patch, using the CSiBE v1.0.1 benchmark and the default "-Os" flags.
The total code size on i686-pc-linux-gnu drops from 939614 to 933795
(or by about 0.62%).

The first few differences look like:

                        Before   After
bzip2,blocksort           7025    6305
bzip2,bzip2              16221   16069
bzip2,bzip2recover        3397    3365
bzip2,compress            9532    8998
bzip2,decompress          8216    8294*
bzip2,dlltest              675     671
bzip2,huffman             1107    1054
bzip2,mk251                 30      24
bzip2,spewG                284     272
bzip2,unzcrash             912     908
catdvi,adobe2h           33622   33636*
catdvi,bytesex             790     774
catdvi,canvas             2668    2630
catdvi,catdvi             3933    3926
catdvi,density            2041    2006
catdvi,fixword             123     115
...

Hence although there are a few regressions (marked with an asterisk)
overall omitting the frame-pointer is a code-size win.

I will however admit that the stripped "cc1" binary grows from 4986396
bytes to 5002748 bytes (or about 0.33%), but I'm not sure if a little of
this isn't caused by the longer pathnames in the "patcho" directory than
the "clean" directory.

Roger
--
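The encoding concern quoted in this message can be made concrete by counting bytes. The following is a rough sketch, assuming 32-bit mode and a plain `mov r32, [base+disp]` load with a one-byte opcode; the enum and function names are hypothetical. It relies on two IA-32 encoding rules: an ESP base always requires a SIB byte, and an EBP base always requires an explicit displacement, even when that displacement is zero.

```c
#include <assert.h>
#include <stdint.h>

/* Base registers that matter for this comparison; BASE_OTHER stands for
   any general register without a special-case encoding (EAX, ECX, ...). */
enum base_reg { BASE_EBP, BASE_ESP, BASE_OTHER };

/* Bytes needed to encode the memory operand [base+disp] in 32-bit mode:
   ModRM, an optional SIB byte, and an optional 8- or 32-bit displacement. */
static int mem_operand_bytes(enum base_reg base, long disp)
{
    int bytes = 1;                      /* ModRM byte is always present */

    assert(disp >= INT32_MIN && disp <= INT32_MAX);

    if (base == BASE_ESP)
        bytes += 1;                     /* ESP as base forces a SIB byte */

    if (disp == 0 && base != BASE_EBP)
        return bytes;                   /* no displacement needed */

    if (disp >= -128 && disp <= 127)
        bytes += 1;                     /* disp8 */
    else
        bytes += 4;                     /* disp32 */

    return bytes;
}

/* Total length of "mov r32, [base+disp]": one opcode byte plus operand. */
static int mov_load_bytes(enum base_reg base, long disp)
{
    return 1 + mem_operand_bytes(base, disp);
}
```

Under these assumptions, `mov_load_bytes(BASE_EBP, -10)` is 3 and `mov_load_bytes(BASE_ESP, 10)` is 4, so each ESP-relative stack reference costs one extra byte. The CSiBE totals above suggest that, at least with -Os, this cost is usually outweighed by the savings elsewhere.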
end of thread, other threads:[~2004-07-16 20:35 UTC | newest]

Thread overview: 28+ messages
2004-07-12  0:54 [PATCH] PR 16373: -fomit-frame-pointer when optimizing on x86 Roger Sayle
2004-07-12  1:43 ` Joseph S. Myers
2004-07-12  2:19 ` Roger Sayle
2004-07-12  7:07 ` Jakub Jelinek
2004-07-12  8:18 ` Roger Sayle
2004-07-12  8:28 ` Andrew Pinski
2004-07-12  8:38 ` Michael Matz
2004-07-12 12:42 ` Jakub Jelinek
2004-07-12 18:03 ` Florian Weimer
2004-07-12 18:06 ` Paolo Bonzini
2004-07-12 18:19 ` Paolo Bonzini
2004-07-12 19:06 ` Florian Weimer
2004-07-12 19:07 ` Giovanni Bajo
2004-07-12 19:24 ` Jakub Jelinek
2004-07-12 19:31 ` Paolo Bonzini
2004-07-12 19:09 ` Jakub Jelinek
2004-07-12 23:43 ` Richard Henderson
2004-07-15 14:44 ` Roger Sayle
2004-07-15 14:54 ` Richard Henderson
2004-07-15 15:07 ` Roger Sayle
2004-07-15 16:04 ` Richard Henderson
2004-07-16 20:36 ` Roger Sayle
2004-07-16 21:12 ` Zack Weinberg
2004-07-17  0:41 ` Roger Sayle
2004-07-16 22:34 ` Richard Henderson
2004-07-16 23:27 ` Geert Bosch
2004-07-16 23:28 ` Roger Sayle
[not found] <Pine.LNX.4.44.0407111933390.11880-100000@nondot.org>
2004-07-12  3:11 ` Roger Sayle