* -Os is weak...
From: DJ Delorie @ 2010-09-09 16:43 UTC
To: gcc

The docs say...

@item -Os
@opindex Os
Optimize for size.  @option{-Os} enables all @option{-O2} optimizations that
do not typically increase code size.  It also performs further
optimizations designed to reduce code size.

@option{-Os} disables the following optimization flags:
@gccoptlist{-falign-functions  -falign-jumps  -falign-loops @gol
-falign-labels  -freorder-blocks  -freorder-blocks-and-partition @gol
-fprefetch-loop-arrays  -ftree-vect-loop-version}

But in reality, the only thing -Os does beyond -O2, aside from a few
niche special cases, is enable inlining, and maybe scheduling, which
for some cases may be the wrong thing to do.

Is this what we want?

  flag_schedule_insns = opt2 && ! optimize_size;

  if (optimize_size)
    {
      /* Inlining of functions reducing size is a good idea regardless of them
         being declared inline.  */
      flag_inline_functions = 1;

      /* Basic optimization options.  */
      optimize_size = 1;
      if (optimize > 2)
        optimize = 2;

      /* We want to crossjump as much as possible.  */
      set_param_value ("min-crossjump-insns", 1);
    }
  else
    set_param_value ("min-crossjump-insns", initial_min_crossjump_insns);

$ grep optimize_size *.c
genconditions.c:  { "! optimize_size && ! TARGET_READ_MODIFY_WRITE",
genconditions.c:    __builtin_constant_p (! optimize_size && ! TARGET_READ_MODIFY_WRITE)
genconditions.c:    ? (int) (! optimize_size && ! TARGET_READ_MODIFY_WRITE)
opts.c:  optimize_size = 0;
opts.c:  optimize_size = 0;
opts.c:  optimize_size = 1;
opts.c:  optimize_size = 0;
opts.c:  flag_schedule_insns = opt2 && ! optimize_size;
opts.c:  if (optimize_size)
opts.c:  optimize_size = 1;
opts.c:  OPTIMIZATION_OPTIONS (optimize, optimize_size);
predict.c:  if (optimize_size)
predict.c:  return (optimize_size
toplev.c:   The only valid values are zero and nonzero.  When optimize_size is
toplev.c:int optimize_size = 0;
toplev.c:  if (flag_prefetch_loop_arrays > 0 && optimize_size)
tree-inline.c:  if (size < 0 || size > MOVE_MAX_PIECES * MOVE_RATIO (!optimize_size))
tree-inline.c:  || (caller_opt->optimize_size != callee_opt->optimize_size))

* Re: -Os is weak...
From: Ian Lance Taylor @ 2010-09-09 17:16 UTC
To: DJ Delorie; +Cc: gcc

DJ Delorie <dj@redhat.com> writes:

> But in reality, the only thing -Os does beyond -O2, aside from a few
> niche special cases, is enable inlining, and maybe scheduling, which
> for some cases may be the wrong thing to do.

Some backends also check optimize_size to change their cost algorithms
to favor shorter instruction sequences.

Ian

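The kind of backend check Ian describes can be sketched roughly as follows. This is a toy model, not GCC's cost hooks: the struct, the toy_* function names, and any byte or cycle numbers fed to it are invented for illustration. It only shows the idea that the same candidate sequences are ranked by encoded size when optimizing for size and by estimated cycles otherwise.

```c
#include <stddef.h>

/* Hypothetical description of one candidate instruction sequence.
   Not a GCC data structure.  */
struct toy_sequence
{
  const char *text;   /* human-readable form, for illustration only */
  int size_bytes;     /* encoded length (made-up number) */
  int cycles;         /* estimated latency (made-up number) */
};

/* Cost of a sequence: bytes when optimizing for size, cycles otherwise.  */
int
toy_sequence_cost (const struct toy_sequence *seq, int optimize_for_size)
{
  return optimize_for_size ? seq->size_bytes : seq->cycles;
}

/* Return the index of the cheapest of N candidates (N must be >= 1).
   Ties keep the earliest candidate.  */
size_t
toy_pick_cheapest (const struct toy_sequence *cands, size_t n,
                   int optimize_for_size)
{
  size_t best = 0;
  for (size_t i = 1; i < n; i++)
    if (toy_sequence_cost (&cands[i], optimize_for_size)
        < toy_sequence_cost (&cands[best], optimize_for_size))
      best = i;
  return best;
}
```

Given one candidate of 3 bytes and 4 cycles and another of 6 bytes and 2 cycles, the size setting picks the first and the speed setting picks the second.
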
* Re: -Os is weak...
From: Andrew Pinski @ 2010-09-09 17:20 UTC
To: Ian Lance Taylor; +Cc: DJ Delorie, gcc

On Thu, Sep 9, 2010 at 10:16 AM, Ian Lance Taylor <iant@google.com> wrote:
> Some backends also check optimize_size to change their cost algorithms
> to favor shorter instruction sequences.

Also see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16996 for all the
other known code size improvements that could be done.

Thanks,
Andrew Pinski

* Re: -Os is weak...
From: DJ Delorie @ 2010-09-09 17:38 UTC
To: Ian Lance Taylor; +Cc: gcc

> Some backends also check optimize_size to change their cost algorithms
> to favor shorter instruction sequences.

But why doesn't it do what the documentation says?  -falign-* seems
like an obvious one - aligning labels and such always makes the code
bigger.

* Re: -Os is weak...
From: Steven Bosscher @ 2010-09-09 19:42 UTC
To: DJ Delorie; +Cc: gcc

On Thu, Sep 9, 2010 at 6:43 PM, DJ Delorie <dj@redhat.com> wrote:
> $ grep optimize_size *.c

Try egrep "optimize_.*_for_speed|optimize_.*_for_size" * config/*/*

Ciao!
Steven

* Re: -Os is weak...
From: Steven Bosscher @ 2010-09-09 20:00 UTC
To: DJ Delorie; +Cc: gcc, Jan Hubicha

On Thu, Sep 9, 2010 at 6:43 PM, DJ Delorie <dj@redhat.com> wrote:
> $ grep optimize_size *.c
> genconditions.c:  { "! optimize_size && ! TARGET_READ_MODIFY_WRITE",
> genconditions.c:    __builtin_constant_p (! optimize_size && ! TARGET_READ_MODIFY_WRITE)
> genconditions.c:    ? (int) (! optimize_size && ! TARGET_READ_MODIFY_WRITE)

These are in comments, not actual tests of optimize_size.

> opts.c:  optimize_size = 0;
> opts.c:  optimize_size = 0;
> opts.c:  optimize_size = 1;
> opts.c:  optimize_size = 0;
> opts.c:  flag_schedule_insns = opt2 && ! optimize_size;
> opts.c:  if (optimize_size)
> opts.c:  optimize_size = 1;
> opts.c:  OPTIMIZATION_OPTIONS (optimize, optimize_size);

Various initialization bits for optimize_size, this is OK.

> predict.c:  if (optimize_size)

This looks like a bug, it should probably be:

  if (optimize_function_for_size_p (DECL_STRUCT_FUNCTION (edge->caller->decl)))

Honza, what do you think about this one?

> predict.c:  return (optimize_size

This is OK, this is inside optimize_function_for_size_p.

> toplev.c:   The only valid values are zero and nonzero.  When optimize_size is
> toplev.c:int optimize_size = 0;
> toplev.c:  if (flag_prefetch_loop_arrays > 0 && optimize_size)

These are OK.

> tree-inline.c:  if (size < 0 || size > MOVE_MAX_PIECES * MOVE_RATIO (!optimize_size))

This lacks context to call one of the optimize_*_for_size_p functions.
So this is OK.

> tree-inline.c:  || (caller_opt->optimize_size != callee_opt->optimize_size))

This is inside an #if 0'ed block and would not be a reference to the
global variable optimize_size anyway.  It looks like this code, if
enabled again, would need modifications to make it compile again.

In general, any reference to the global var optimize_size should be
checked to verify that there shouldn't be a more fine-grained check
instead.

Ciao!
Steven

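The difference between testing the global optimize_size variable and using a finer-grained predicate such as optimize_function_for_size_p can be sketched with a toy model. The thread only confirms that the real predicate reads optimize_size; the extra assumption made here is that it also answers "yes" for a function known to be rarely executed. All toy_* names and the frequency enum are hypothetical and are not GCC's data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the global flag that -Os would set.  */
int toy_optimize_size = 0;

/* Hypothetical per-function profile information.  */
enum toy_frequency { TOY_FREQ_COLD, TOY_FREQ_NORMAL, TOY_FREQ_HOT };

struct toy_function
{
  const char *name;
  enum toy_frequency frequency;
};

/* Coarse check: one answer for the whole compilation.  */
bool
toy_global_size_check (void)
{
  return toy_optimize_size != 0;
}

/* Fine-grained check (assumed behavior): optimize for size if the
   global flag is set, or if this particular function is cold.
   A null FN falls back to the global answer.  */
bool
toy_function_for_size_p (const struct toy_function *fn)
{
  return toy_optimize_size != 0
         || (fn != NULL && fn->frequency == TOY_FREQ_COLD);
}
```

In this model a cold function is treated as a size-optimization candidate even without the global flag, which is the kind of decision a bare optimize_size test cannot make.
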
* Re: -Os is weak...
From: Steven Bosscher @ 2010-09-10 8:44 UTC
To: DJ Delorie; +Cc: gcc

On Thu, Sep 9, 2010 at 6:43 PM, DJ Delorie <dj@redhat.com> wrote:
>
> The docs say...
>
> @item -Os
> @opindex Os
> Optimize for size.  @option{-Os} enables all @option{-O2} optimizations that
> do not typically increase code size.  It also performs further
> optimizations designed to reduce code size.
>
> @option{-Os} disables the following optimization flags:
> @gccoptlist{-falign-functions  -falign-jumps  -falign-loops @gol
> -falign-labels  -freorder-blocks  -freorder-blocks-and-partition @gol
> -fprefetch-loop-arrays  -ftree-vect-loop-version}
>
> But in reality, the only thing -Os does beyond -O2, aside from a few
> niche special cases, is enable inlining, and maybe scheduling, which
> for some cases may be the wrong thing to do.
>
> Is this what we want?

So yesterday I already sent out a few mails explaining that there is
really more than just the things you described above. It seems that
you haven't followed GCC development closely enough for a while to
notice that the "optimize_size" checks have mostly been replaced with
more fine-grained checks, even at the level of individual insns.

What you quote above, from the documentation, is also actually
incomplete. The -Os option also enables optimizations that are not
performed at -O[123], e.g. code hoisting only runs at -Os (see
gcse.c:pass_rtl_hoist).

That said, it is true that GCC does not have the strongest code size
optimizations compared to other compilers on the market. There are
many things GCC could do better/more to improve code size further.
This is a matter of focus, as you know: If no-one cares enough to
commit enough resources to code-size optimizations, they will not get
implemented in GCC.

I guess the most important missing optimizations are various forms of
code unification, such as the sequence abstraction code that GCC used
to have (http://gcc.gnu.org/projects/cfo.html, but it never worked
properly and it was way too slow), or some suffix-tree based sequence
finding code. Various algorithms can be found in the academic
literature about code size optimizations via abstraction (see e.g.
"Procedural Abstraction with Reverse Prefix Trees",
http://portal.acm.org/citation.cfm?id=1545074).

Even for the existing code size optimizations, improvements are
possible. I've played with some ideas myself, with new work
(implementing code hoisting for GIMPLE, cf.
http://gcc.gnu.org/PR23286) and extending other people's work
(if-conversion and cross-jumping, cf. http://gcc.gnu.org/PR20070). If
you have plans to work on improved -Os optimizations, those two could
be good starting points to warm up.

Personally, I had hoped that the ARM folks (Linaro, or what's it
called?) would work on -Os. While I've never actually used it, a web
search suggests that the RealView compilers generate code that is as
much as 20% smaller than GCC at -Os (for unnamed benchmarks), so
apparently there is a lot of room for improvement in GCC and the ARM
people should know where.

Finally, of course there are just various issues with instruction
selection in GCC that result in larger-than-necessary code. It seems
that this doesn't hurt code speed so much, but for code size GCC
doesn't always select the shortest sequence possible. Some Google folks
(Carrot Wei in particular) have filed bugs and patches for a couple of
cases for ARM, but there is no target-independent framework for
selecting the shortest insn or sequence.

Is there a particular target you're interested in?

Ciao!
Steven

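As a source-level illustration of the code hoisting mentioned above: when the same expression is evaluated on every path after a branch, it can be computed once before the branch, so only one copy needs to be emitted. This is only a sketch of the idea. GCC's pass works on RTL rather than on C source, the function names are hypothetical, and whether the emitted code actually shrinks depends on the target and compiler version.

```c
/* Before: the product a * b is written in both arms of the branch.  */
int
toy_before_hoisting (int flag, int a, int b)
{
  int r;
  if (flag)
    r = a * b + 1;
  else
    r = a * b - 1;
  return r;
}

/* After: the common subexpression is hoisted above the branch and
   computed exactly once; each arm only keeps what differs.  */
int
toy_after_hoisting (int flag, int a, int b)
{
  int t = a * b;
  return flag ? t + 1 : t - 1;
}
```

Both functions compute the same result for every input; the second form contains a single multiplication instead of two.
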
* Re: -Os is weak...
From: DJ Delorie @ 2010-09-10 16:49 UTC
To: Steven Bosscher; +Cc: gcc

> Is there a particular target you're interested in?

Not in that way, no.

My biggest concern is that the documentation is wrong.

My second concern is that the help option says it basically does
nothing (well, one or two options) instead of the big list it used to
do (or that the other -O* do).

My third concern is that it doesn't globally affect as many options as
you'd expect - like forcing alignments to 1 for all targets.  Why
should every target duplicate that code?

I wasn't really asking "how do I make code smaller" I was asking "why
does -Os appear to be useless" - emphasis on "appear".

* Re: -Os is weak...
From: Yao Qi @ 2010-09-16 17:13 UTC
To: gcc

On Fri, Sep 10, 2010 at 10:44:24AM +0200, Steven Bosscher wrote:
> On Thu, Sep 9, 2010 at 6:43 PM, DJ Delorie <dj@redhat.com> wrote:
> >
>
> I guess the most important missing optimizations are various forms of
> code unification, such as the sequence abstraction code that GCC used
> to have (http://gcc.gnu.org/projects/cfo.html, but it never worked
> properly and it was way too slow), or some suffix-tree based sequence
> finding code. Various algorithms can be found in the academic
> literature about code size optimizations via abstraction (see e.g.
> "Procedural Abstraction with Reverse Prefix Trees",
> http://portal.acm.org/citation.cfm?id=1545074).

Was CFO finally merged to mainline?  At least, I can't find it in
current gcc.

> Personally, I had hoped that the ARM folks (Linaro, or what's it
> called?) would work on -Os. While I've never actually used it, a web
> search suggests that the RealView compilers generate code that is as
> much as 20% smaller than GCC at -Os (for unnamed benchmarks), so
> apparently there is a lot of room for improvement in GCC and the ARM
> people should know where.
>

We, the Linaro Toolchain Working Group, are investigating code size
improvements on Thumb-2.  As you said, there is a lot of room for
improvement; here is the report we produced, FYI.
http://lists.linaro.org/pipermail/linaro-toolchain/2010-September/000202.html

> Finally, of course there are just various issues with instruction
> selection in GCC that result in larger-than-necessary code. It seems
> that this doesn't hurt code speed so much, but for code size GCC
> doesn't always select the shortest sequence possible. Some Google folks
> (Carrot Wei in particular) have filed bugs and patches for a couple of
> cases for ARM, but there is no target-independent framework for
> selecting the shortest insn or sequence.

During the investigation, I have found that all the potential
improvements are identified by ARM experts or by reading asm code
manually.  This approach doesn't scale very well.  IMO, it is necessary
to have a target-independent framework for code size optimization.  I
have no idea how to build that framework, though.

--
Yao Qi
CodeSourcery
yao@codesourcery.com
(650) 331-3385 x739

* Re: -Os is weak...
From: Andi Kleen @ 2010-09-16 22:06 UTC
To: Yao Qi; +Cc: gcc

Yao Qi <yao@codesourcery.com> writes:
>
> During the investigation, I feel that all the potential improvements
> are identified by ARM experts or by reading asm code manually.  This
> mode doesn't scale very well.  IMO, it is necessary to have a
> target-independent framework for code size optimization.  I have no
> idea to do that framework though.

On x86 gcc is definitely behind some other compilers in terms of code
size.

Try reading some examples from http://embed.cs.utah.edu/embarrassing/
Since the criterion of the comparisons is code size, it can show
you where gcc is behind some other compilers.

(But note that these comparisons do not include the best compilers
for small size and also do not run with -Os currently.)

This is for x86, but could probably be used for other architectures
too.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.

* Re: -Os is weak...
From: Jakub Jelinek @ 2010-09-18 13:31 UTC
To: Andi Kleen; +Cc: Yao Qi, gcc

On Thu, Sep 16, 2010 at 10:55:22AM +0200, Andi Kleen wrote:
> Try reading some examples from http://embed.cs.utah.edu/embarrassing/
> Since the criteria of the comparisons is code size it can show
> you where gcc is behind some other compilers
>
> (but note that these comparisons do not include the best compilers
> for small size and also do not run with -Os currently)

I'm not denying that there is lots of room for -Os code size
improvements, but from the http://embed.cs.utah.edu/embarrassing/
results GCC doesn't perform that badly in comparison with other
compilers (gcc 3.4 best, then icc and then gcc 4.5), and all the
results there were from -Os compilations across the different
compilers.

	Jakub

* Re: -Os is weak...
From: Steven Bosscher @ 2010-09-16 22:15 UTC
To: Yao Qi; +Cc: gcc

On Thu, Sep 16, 2010 at 9:35 AM, Yao Qi <yao@codesourcery.com> wrote:
> Was CFO finally merged to mainline?  At least, I can't find it in
> current gcc.

Yes, it was merged. And then it was removed again because the
implementation had several big problems. Such as, it didn't actually
work.

Ciao!
Steven

* Re: -Os is weak...
From: Gerald Pfeifer @ 2010-09-27 3:50 UTC
To: Steven Bosscher; +Cc: DJ Delorie, gcc

On Fri, 10 Sep 2010, Steven Bosscher wrote:
>> The docs say...
>>
>> @item -Os
>> @opindex Os
>> Optimize for size.  @option{-Os} enables all @option{-O2} optimizations that
>> do not typically increase code size.  It also performs further
>> optimizations designed to reduce code size.
>>
>> @option{-Os} disables the following optimization flags:
>> @gccoptlist{-falign-functions  -falign-jumps  -falign-loops @gol
>> -falign-labels  -freorder-blocks  -freorder-blocks-and-partition @gol
>> -fprefetch-loop-arrays  -ftree-vect-loop-version}
> What you quote above, from the documentation, is also actually
> incomplete. The -Os option also enables optimizations that are not
> performed at -O[123], e.g. code hoisting only runs at -Os (see
> gcse.c:pass_rtl_hoist).

Any chance you could update the documentation, Steven or DJ?

Gerald