* Re: GCC 3.0.3 produces large code
[not found] <Pine.LNX.4.33.0201312206150.23617-100000@pandora.x256.com>
@ 2002-02-01 9:14 ` Joe Buck
2002-02-01 9:36 ` Gabriel Dos Reis
0 siblings, 1 reply; 7+ messages in thread
From: Joe Buck @ 2002-02-01 9:14 UTC (permalink / raw)
To: Nicholas Adrian Vinen; +Cc: Zack Weinberg, Ingo Krabbe, gcc
> I agree, like I said, small functions should be optimised for space
> really, large functions with big loops for speed. Actually, most of my
> inner loops are in assembly language, so for the C++ code I actually
> prefer size optimisation so that as much will stay in cache as possible.
I don't see any reason why this should be true in general. What should
happen in my view is that a user who has strong ideas like this would
be free to sort his/her functions into separate compilation units, using
-Os to optimize some for space and -O2 or -O3 to optimize others for
speed. And yes, we're aware that 3.0.x doesn't do as good a job as it
should in either department (optimizing for space or speed), but 3.1
will be at least somewhat better.
* Re: GCC 3.0.3 produces large code
2002-02-01 9:14 ` GCC 3.0.3 produces large code Joe Buck
@ 2002-02-01 9:36 ` Gabriel Dos Reis
2002-02-01 10:40 ` Nicholas Adrian Vinen
0 siblings, 1 reply; 7+ messages in thread
From: Gabriel Dos Reis @ 2002-02-01 9:36 UTC (permalink / raw)
To: Joe Buck; +Cc: Nicholas Adrian Vinen, Zack Weinberg, Ingo Krabbe, gcc
Joe Buck <jbuck@synopsys.COM> writes:
| > I agree, like I said, small functions should be optimised for space
| > really, large functions with big loops for speed. Actually, most of my
| > inner loops are in assembly language, so for the C++ code I actually
| > prefer size optimisation so that as much will stay in cache as possible.
|
| I don't see any reason why this should be true in general. What should
| happen in my view is that a user who has strong ideas like this would
| be free to sort his/her functions into separate compilation units, using
| -Os to optimize some for space and -O2 or -O3 to optimize others for
| speed.
When the small functions are static ones used in large functions, that
is not an option.
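
To illustrate the point, here is a minimal sketch. The names clamp_to_byte and sum_clamped are hypothetical and do not come from the thread. A static helper has internal linkage, so it has to live in the same translation unit as its caller and is compiled with that file's flags; once it is inlined, it is optimised as part of the large function.

```cpp
#include <cassert>

// File-local helper: 'static' gives it internal linkage, so it cannot be
// moved into a separate .cpp file (built with -Os) and still be called here.
static int clamp_to_byte(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

// Stand-in for a "large" function that one would want built with -O2.
// The helper above is likely to be inlined into this loop, at which point
// it is optimised with whatever flags this translation unit was given.
int sum_clamped(const int* values, int count)
{
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += clamp_to_byte(values[i]);
    return total;
}
```

Calling sum_clamped on the three values -5, 10 and 300 clamps them to 0, 10 and 255 before summing.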
-- Gaby
CodeSourcery, LLC http://www.codesourcery.com
* Re: GCC 3.0.3 produces large code
2002-02-01 9:36 ` Gabriel Dos Reis
@ 2002-02-01 10:40 ` Nicholas Adrian Vinen
0 siblings, 0 replies; 7+ messages in thread
From: Nicholas Adrian Vinen @ 2002-02-01 10:40 UTC (permalink / raw)
To: Gabriel Dos Reis; +Cc: Joe Buck, Zack Weinberg, Ingo Krabbe, gcc
> | I don't see any reason why this should be true in general. What should
> | happen in my view is that a user who has strong ideas like this would
> | be free to sort his/her functions into separate compilation units, using
> | -Os to optimize some for space and -O2 or -O3 to optimize others for
> | speed.
>
> When the small functions are static ones used in large functions, that
> is not an option.
>
> -- Gaby
Actually I think having #pragmas to set optimisations for individual
functions would be handy. Some other compilers have this. It's good for
the above reasons, and also, in case some optimisation triggers a bug in
the compiler, you can work around it more easily. But my idea of the compiler
automatically choosing how to optimise each function was really just me
brainstorming. It'd be a cool "experimental" feature if nothing else :)
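
For reference, a sketch of what per-function control can look like. This is not available in the 3.0.x compiler discussed here: it assumes a much later GCC (the optimize attribute and the matching pragmas appeared around GCC 4.4, as far as I know). The macro and function names are hypothetical, and the GCC manual suggests this mechanism mainly for debugging rather than as a tuning tool.

```cpp
#include <cassert>

// Only real GCC understands these; other compilers get empty macros.
#if defined(__GNUC__) && !defined(__clang__)
#  define OPT_FOR_SIZE  __attribute__((optimize("Os")))
#  define OPT_FOR_SPEED __attribute__((optimize("O2")))
#else
#  define OPT_FOR_SIZE
#  define OPT_FOR_SPEED
#endif

// Small accessor: ask for size optimisation on this function only.
OPT_FOR_SIZE int small_accessor(const int* table, int index)
{
    return table[index];
}

// Loop-heavy function: ask for speed optimisation on this function only.
OPT_FOR_SPEED long hot_loop(const int* table, int count)
{
    long sum = 0;
    for (int i = 0; i < count; ++i)
        sum += table[i];
    return sum;
}

// Pragma form: applies to every function defined up to pop_options.
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("Os")
#endif
int another_small(int x)
{
    return x + 1;
}
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```

The attribute form keeps the choice next to the function it affects; the pragma form is handier for switching a whole group of functions at once.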
Nicholas
* Re: GCC 3.0.3 produces large code
2002-01-31 10:20 ` Ingo Krabbe
@ 2002-01-31 11:21 ` Zack Weinberg
0 siblings, 0 replies; 7+ messages in thread
From: Zack Weinberg @ 2002-01-31 11:21 UTC (permalink / raw)
To: Ingo Krabbe; +Cc: Nicholas Adrian Vinen, gcc
On Thu, Jan 31, 2002 at 04:47:23PM +0100, Ingo Krabbe wrote:
>
> I'm not sure about that topic, but I don't think that code size
> reduction is pushed too far in the development process of gcc, since
> it isn't used by too many people ?! I would be interested in the
> results of -O2 replacing -Os. The code structure optimizations of
> -O2 may also result in a reduction of code size. BTW, in my opinion
> the use of two return statements in one function is a design
> fault. It is also remarkable that the cleanest code in terms of
> function design compiles into the smallest result. That's exactly
> what I want from my gcc.
In *my* opinion, GCC should generate equally good code for all three
functions, rather than registering a preference for one style or
another. Also, GCC should care more about code size than it currently
does. It's true that most people use -O2, but with modern computers
code size has direct effects on performance.
Let's look at this in a bit more depth. Here's what we get with
Nicholas' switches and the current mainline. (Warning, long lines.
The numbers in parens are size in bytes as reported by nm
--size-sort.)
_ZN1b7DoThingEv (50):   _ZN1b8DoThing2Ev (44):  _ZN1b8DoThing3Ev (44):
pushl %esi              pushl %esi              pushl %esi
pushl %ebx              pushl %ebx              pushl %ebx
pushl %ebx              pushl %edx              pushl %ebx
pushl %ebx              pushl %edx              pushl %ebx
movl 20(%esp), %esi     movl 20(%esp), %esi     movl 20(%esp), %esi
pushl $.LC0             pushl $.LC0             pushl $.LC0
call printf             call printf             call printf
popl %ecx               popl %eax               popl %ecx
leal 8(%esi), %ebx      leal 8(%esi), %ebx      leal 8(%esi), %ebx
movl %ebx, (%esp)       movl %ebx, (%esp)       movl %ebx, (%esp)
movl $1, 4(%esp)        movl $1, 4(%esp)        xorl %ebx, %ebx
cmpl $0, 4(%esi)        cmpl $0, 4(%esi)        movl $1, 4(%esp)
je .L23                 jne .L43                cmpl $0, 4(%esi)
call rand               xorl %ebx, %ebx         jne .L51
pushl $.LC1             .L38:                   .L46:
movl %eax, %ebx         pushl $.LC1             pushl $.LC1
call printf             call printf             call printf
popl %edx               addl $12, %esp          addl $12, %esp
movl %ebx, %eax         movl %ebx, %eax         movl %ebx, %eax
.L21:                   popl %ebx               popl %ebx
popl %ebx               popl %esi               popl %esi
popl %esi               ret                     ret
popl %ebx               .L43:                   .L51:
popl %esi               call rand               call rand
ret                     movl %eax, %ebx         movl %eax, %ebx
.L23:                   jmp .L38                jmp .L46
pushl $.LC1
call printf
popl %eax
xorl %eax, %eax
jmp .L21
First, you will notice that the code generated for DoThing2 and
DoThing3 is identical except for the position of one xorl instruction.
That's good. We ought to have hoisted the xor operation in DoThing2,
but the global optimizer isn't up to it yet.
Second, the differences between DoThing and DoThing2 are entirely
caused by branch prediction. GCC decided that in DoThing, it was more
likely for m_pa to be non-NULL, and in DoThing2, it was more likely
for it to be NULL. I am not sure why the printf operation got
duplicated in DoThing; I would have expected to see code like this:
xorl %ebx, %ebx
cmpl $0, 4(%esi)
je .L23
call rand
movl %eax, %ebx
.L23:
pushl $.LC1
call printf
<tear down stack frame and return>
-O2 produces similar code except for the stack manipulations.
_ZN1b7DoThingEv (70):   _ZN1b8DoThing2Ev (60):  _ZN1b8DoThing3Ev (60):
subl $28, %esp          subl $28, %esp          subl $28, %esp
movl %esi, 24(%esp)     movl %esi, 24(%esp)     movl %esi, 24(%esp)
movl 32(%esp), %esi     movl 32(%esp), %esi     movl 32(%esp), %esi
movl %ebx, 20(%esp)     movl %ebx, 20(%esp)     movl %ebx, 20(%esp)
movl $.LC0, (%esp)      movl $.LC0, (%esp)      movl $.LC0, (%esp)
call printf             call printf             call printf
movl $1, 12(%esp)       movl $1, 12(%esp)       movl $1, 12(%esp)
leal 8(%esi), %ebx      leal 8(%esi), %ebx      leal 8(%esi), %ebx
movl %ebx, 8(%esp)      movl %ebx, 8(%esp)      movl %ebx, 8(%esp)
movl 4(%esi), %edx      movl 4(%esi), %ecx      xorl %ebx, %ebx
testl %edx, %edx        testl %ecx, %ecx        movl 4(%esi), %esi
je .L23                 jne .L43                testl %esi, %esi
call rand               xorl %ebx, %ebx         jne .L51
movl $.LC1, (%esp)      .L38:                   .L46:
movl %eax, %ebx         movl $.LC1, (%esp)      movl $.LC1, (%esp)
call printf             call printf             call printf
movl %ebx, %eax         movl 24(%esp), %esi     movl 24(%esp), %esi
.L21:                   movl %ebx, %eax         movl %ebx, %eax
movl 20(%esp), %ebx     movl 20(%esp), %ebx     movl 20(%esp), %ebx
movl 24(%esp), %esi     addl $28, %esp          addl $28, %esp
addl $28, %esp          ret                     ret
ret                     .L43:                   .L51:
.L23:                   call rand               call rand
movl $.LC1, (%esp)      movl %eax, %ebx         movl %eax, %ebx
call printf             jmp .L38                jmp .L46
xorl %eax, %eax
jmp .L21
This code _looks_ smaller, but it produces bigger object code. That's
probably because the instructions being used take more bytes in
machine language. Other than that, it's the same thing.
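
A small worked example of that size difference, as a sketch only. The byte counts are the standard IA-32 encodings for these instruction forms and agree with the objdump bytes in the original post (for example `53` for a register push and `83 ec 20` for `sub $0x20,%esp`). The Insn struct and total_bytes helper are hypothetical names introduced for the illustration.

```cpp
#include <cassert>
#include <cstddef>

// One instruction and its encoded length in bytes.
struct Insn { const char* text; int bytes; };

// -Os prologue from the first listing: four one-byte register pushes.
static const Insn os_prologue[] = {
    { "pushl %esi", 1 }, { "pushl %ebx", 1 },
    { "pushl %ebx", 1 }, { "pushl %ebx", 1 },
};

// -O2 prologue from the second listing: one stack adjustment plus two
// register saves addressed through %esp (opcode, ModRM, SIB, disp8).
static const Insn o2_prologue[] = {
    { "subl $28, %esp",      3 },
    { "movl %esi, 24(%esp)", 4 },
    { "movl %ebx, 20(%esp)", 4 },
};

// Passing the format string to printf in each style.
static const Insn os_arg[] = { { "pushl $.LC0",        5 } };
static const Insn o2_arg[] = { { "movl $.LC0, (%esp)", 7 } };

// Sum the encoded lengths of a fixed-size instruction list.
template <std::size_t N>
int total_bytes(const Insn (&insns)[N])
{
    int total = 0;
    for (std::size_t i = 0; i < N; ++i)
        total += insns[i].bytes;
    return total;
}
```

With these figures the -Os prologue takes 4 bytes against 11 for -O2, and each argument store costs 7 bytes instead of 5, so fewer lines of assembly still add up to more machine code.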
Okay, so why does the Visual C++ compiler do so much better on this?
Well, let's look at (part of) the source code...
struct b
{
    b() { m_pa = new a; }
    ~b() { delete m_pa; }
    virtual int DoThing()
    {
        AutoRWL Lock(&m_RWL, 1);
        if( m_pa )
            return m_pa->DoOtherThing();
        else
            return 0;
    }
};
You can see that gcc has inlined the calls to AutoRWL's constructor
and destructor, and to a::DoOtherThing. Now suppose we were going to
write assembly language for DoThing by hand. The first thing we'd
probably notice is that DoThing can only be called on a validly
constructed object of class b, which means that m_pa cannot possibly
be NULL, and therefore we could throw away the else branch entirely:
_ZN1b7DoThingEv:
pushl %esi
pushl %ebx
pushl %ebx
pushl %ebx
movl 20(%esp), %esi
pushl $.LC0
call printf
popl %ecx
leal 8(%esi), %ebx
movl %ebx, (%esp)
movl $1, 4(%esp)
call rand
pushl $.LC1
movl %eax, %ebx
call printf
popl %edx
movl %ebx, %eax
popl %ebx
popl %esi
popl %ebx
popl %esi
ret
We would then notice that there is no point in doing a complete
construction of the AutoRWL object, since that data is never used
again. That in turn means we don't ever use the this pointer.
Furthermore, the program cannot tell whether the second printf call
occurs before or after the call to rand, and if we swap them we can
use a sibling call. Finally, we've managed to eliminate all need for
a stack frame.
_ZN1b7DoThingEv:
pushl $.LC0
call printf
popl %eax
pushl $.LC1
call printf
popl %eax
jmp rand
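
The same steps can be sketched at source level. This is only an illustration of the reasoning above, not compiler output: the Lock and Callee types and the three function names are hypothetical stand-ins for AutoRWL, a and b::DoThing, and the pointer is passed as a parameter to keep the example short.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Stand-in for AutoRWL: only the visible side effects are kept.
struct Lock {
    Lock()  { std::printf("Lock"); }
    ~Lock() { std::printf("Unlock"); }
};

// Stand-in for class a.
struct Callee {
    int DoOtherThing() { return std::rand(); }
};

// Starting point: the shape of b::DoThing as posted.
int do_thing_original(Callee* pa)
{
    Lock lock;
    if (pa)
        return pa->DoOtherThing();
    else
        return 0;
}

// Step 1: pa is known to be non-null, so the else branch disappears.
int do_thing_no_null_check(Callee* pa)
{
    Lock lock;
    return pa->DoOtherThing();
}

// Steps 2 and 3: the lock object's stored fields are never read, so only
// the two printf calls remain, and the second one can move ahead of rand()
// so that rand() becomes the final (sibling) call.
int do_thing_hand_optimised()
{
    std::printf("Lock");
    std::printf("Unlock");
    return std::rand();
}
```

Seeding rand() with the same value before each call makes all three versions return the same number and print the same text, which is the sense in which the program cannot tell them apart.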
You didn't post the code generated by Visual C++ but I bet it's
capable of one or more of those optimizations. GCC has basically no
framework for whole-program analysis, but we're working on it.
zw
* Re: GCC 3.0.3 produces large code
@ 2002-01-31 10:57 Nicholas Adrian Vinen
0 siblings, 0 replies; 7+ messages in thread
From: Nicholas Adrian Vinen @ 2002-01-31 10:57 UTC (permalink / raw)
To: gcc
> I'm not sure about that topic, but I don't think that code size reduction
> is pushed too far in the development process of gcc, since it isn't used
> by too many people ?! I would be interested in the results of -O2
> replacing -Os. The code structure optimizations of -O2 may also result
> in a reduction of code size.
Actually the difference between -O2 and -Os is almost nothing.
> BTW, in my opinion the use of two return statements in one function
> is a design fault. It is also remarkable that the cleanest code
> in terms of function design compiles into the smallest result. That's
> exactly what I want from my gcc.
Well, C obviously permits it, otherwise there would be no "return"
statement... you'd just "return" the value of the last statement or
something. And I'm obviously not using C++ because I want a portable
assembler... I expect a C++ compiler to do a lot of optimisation. It has
to, in order to make object-oriented code at all efficient. Most C++ compilers do a
somewhat decent job of this.
Also, I think that GCC should be at least as good as other C++
compilers in most respects. It's unfortunate that Visual C does a far
superior job of code size optimisation. The performance advantages GCC has
(if any) aren't worth the tradeoff in code size, I don't think. Not that
Visual C is a good compiler, but they obviously worked hard on this aspect
of it.
I guess to reiterate, my objections to your objections are twofold:
I don't want to have to edit all my code just to get it to produce good
output in GCC, it should be slightly smart about it. And I think C++ is
way beyond the point these days where I expect to have to optimise every
little thing in the code. If I wanted that I'd write in C.
Nicholas
* Re: GCC 3.0.3 produces large code
2002-01-31 9:44 Nicholas Adrian Vinen
@ 2002-01-31 10:20 ` Ingo Krabbe
2002-01-31 11:21 ` Zack Weinberg
0 siblings, 1 reply; 7+ messages in thread
From: Ingo Krabbe @ 2002-01-31 10:20 UTC (permalink / raw)
To: gcc
I'm not sure about that topic, but I don't think that code size reduction
is pushed too far in the development process of gcc, since it isn't used
by too many people ?! I would be interested in the results of -O2 replacing
-Os. The code structure optimizations of -O2 may also result in a reduction
of code size.
BTW, in my opinion the use of two return statements in one function
is a design fault. It is also remarkable that the cleanest code in terms of
function design compiles into the smallest result. That's exactly what
I want from my gcc.
CU INGO
On Thursday, 31 January 2002 14:49, you wrote:
>
> g++ -Os -fomit-frame-pointer -falign-labels=1 -falign-jumps=1
> -fno-exceptions -o test test.cpp
>
> I chose -Os for obvious reasons... the others are also in an attempt to
> get file size down. The version, as mentioned in the subject, is 3.0.3.
> Can anyone suggest how to make it compile the code any smaller? And
> perhaps the people responsible for optimisations might take note of this.
> I can understand maybe why the code needs to be so big with -O3, but with
> -Os, it seems to me it could be a lot smaller, and really routines like
> this benefit most from being small rather than fast (they'll never really
> be all that fast). Maybe the compiler can be super-smart about it, and
> automatically optimise small functions for size and large functions for
> speed?
>
> Anyway, I hope someone can improve the situation a bit.
>
>
> Thanks,
> Nicholas
>
> P.S. Other than code size, GCC 3.0.3 seems great.
* GCC 3.0.3 produces large code
@ 2002-01-31 9:44 Nicholas Adrian Vinen
2002-01-31 10:20 ` Ingo Krabbe
0 siblings, 1 reply; 7+ messages in thread
From: Nicholas Adrian Vinen @ 2002-01-31 9:44 UTC (permalink / raw)
To: gcc
Hi, I'm the author of a fairly large project which builds in GCC and
Visual C. I noticed that GCC was producing very large files and with a
series of hacks I managed to get the size down, but it's still
considerably larger than Visual C. Here's an example of a function that
doesn't get optimised very well:
#include <stdio.h>
#include <stdlib.h>

class RWL
{
public:
    int a;
};

class AutoRWL
{
public:
    AutoRWL(RWL* pRWL, int foo)
    {
        printf("Lock");
        m_pRWL = pRWL;
        m_foo = foo;
    }
    ~AutoRWL()
    {
        printf("Unlock");
    }
    RWL* m_pRWL;
    int m_foo;
};

class a
{
public:
    int DoOtherThing()
    {
        return rand();
    }
};

class b
{
public:
    b() { m_pa = new a; }
    ~b() { delete m_pa; }
    virtual int DoThing()
    {
        AutoRWL Lock(&m_RWL, 1);
        if( m_pa )
            return m_pa->DoOtherThing();
        else
            return 0;
    }
    virtual int DoThing2()
    {
        AutoRWL Lock(&m_RWL, 1);
        return m_pa ? m_pa->DoOtherThing() : 0;
    }
    virtual int DoThing3()
    {
        AutoRWL Lock(&m_RWL, 1);
        int ret = 0;
        if( m_pa ) ret = m_pa->DoOtherThing();
        return ret;
    }
    a* m_pa;
    RWL m_RWL;
};

int main(void)
{
    b bb;
    return bb.DoThing();
}
This is a disassembly of the three DoThing routines:
08048870 <b::DoThing()>:
8048870: 56 push %esi
8048871: 53 push %ebx
8048872: 83 ec 20 sub $0x20,%esp
8048875: 8b 74 24 2c mov 0x2c(%esp,1),%esi
8048879: 68 a4 89 04 08 push $0x80489a4
804887e: e8 91 fd ff ff call 8048614 <_init+0x98>
8048883: 83 c4 10 add $0x10,%esp
8048886: c7 44 24 0c 01 00 00 movl $0x1,0xc(%esp,1)
804888d: 00
804888e: 8d 5e 08 lea 0x8(%esi),%ebx
8048891: 89 5c 24 08 mov %ebx,0x8(%esp,1)
8048895: 83 7e 04 00 cmpl $0x0,0x4(%esi)
8048899: 74 25 je 80488c0 <b::DoThing()+0x50>
804889b: e8 a4 fd ff ff call 8048644 <_init+0xc8>
80488a0: 83 ec 0c sub $0xc,%esp
80488a3: 89 c3 mov %eax,%ebx
80488a5: 68 a9 89 04 08 push $0x80489a9
80488aa: e8 65 fd ff ff call 8048614 <_init+0x98>
80488af: 83 c4 10 add $0x10,%esp
80488b2: 89 d8 mov %ebx,%eax
80488b4: 83 c4 14 add $0x14,%esp
80488b7: 5b pop %ebx
80488b8: 5e pop %esi
80488b9: c3 ret
80488ba: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
80488c0: 83 ec 0c sub $0xc,%esp
80488c3: 68 a9 89 04 08 push $0x80489a9
80488c8: e8 47 fd ff ff call 8048614 <_init+0x98>
80488cd: 83 c4 10 add $0x10,%esp
80488d0: 31 c0 xor %eax,%eax
80488d2: eb e0 jmp 80488b4 <b::DoThing()+0x44>
80488d4: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
80488da: 8d bf 00 00 00 00 lea 0x0(%edi),%edi
080488e0 <b::DoThing2()>:
80488e0: 56 push %esi
80488e1: 53 push %ebx
80488e2: 83 ec 20 sub $0x20,%esp
80488e5: 8b 74 24 2c mov 0x2c(%esp,1),%esi
80488e9: 68 a4 89 04 08 push $0x80489a4
80488ee: e8 21 fd ff ff call 8048614 <_init+0x98>
80488f3: 83 c4 10 add $0x10,%esp
80488f6: c7 44 24 0c 01 00 00 movl $0x1,0xc(%esp,1)
80488fd: 00
80488fe: 8d 5e 08 lea 0x8(%esi),%ebx
8048901: 89 5c 24 08 mov %ebx,0x8(%esp,1)
8048905: 83 7e 04 00 cmpl $0x0,0x4(%esi)
8048909: 74 1c je 8048927 <b::DoThing2()+0x47>
804890b: e8 34 fd ff ff call 8048644 <_init+0xc8>
8048910: 89 c3 mov %eax,%ebx
8048912: 83 ec 0c sub $0xc,%esp
8048915: 68 a9 89 04 08 push $0x80489a9
804891a: e8 f5 fc ff ff call 8048614 <_init+0x98>
804891f: 83 c4 24 add $0x24,%esp
8048922: 89 d8 mov %ebx,%eax
8048924: 5b pop %ebx
8048925: 5e pop %esi
8048926: c3 ret
8048927: 31 db xor %ebx,%ebx
8048929: eb e7 jmp 8048912 <b::DoThing2()+0x32>
804892b: 90 nop
804892c: 8d 74 26 00 lea 0x0(%esi,1),%esi
08048930 <b::DoThing3()>:
8048930: 56 push %esi
8048931: 53 push %ebx
8048932: 83 ec 20 sub $0x20,%esp
8048935: 8b 74 24 2c mov 0x2c(%esp,1),%esi
8048939: 68 a4 89 04 08 push $0x80489a4
804893e: e8 d1 fc ff ff call 8048614 <_init+0x98>
8048943: 83 c4 10 add $0x10,%esp
8048946: c7 44 24 0c 01 00 00 movl $0x1,0xc(%esp,1)
804894d: 00
804894e: 8d 5e 08 lea 0x8(%esi),%ebx
8048951: 89 5c 24 08 mov %ebx,0x8(%esp,1)
8048955: 31 db xor %ebx,%ebx
8048957: 83 7e 04 00 cmpl $0x0,0x4(%esi)
804895b: 74 07 je 8048964 <b::DoThing3()+0x34>
804895d: e8 e2 fc ff ff call 8048644 <_init+0xc8>
8048962: 89 c3 mov %eax,%ebx
8048964: 83 ec 0c sub $0xc,%esp
8048967: 68 a9 89 04 08 push $0x80489a9
804896c: e8 a3 fc ff ff call 8048614 <_init+0x98>
8048971: 83 c4 24 add $0x24,%esp
8048974: 89 d8 mov %ebx,%eax
8048976: 5b pop %ebx
8048977: 5e pop %esi
8048978: c3 ret
8048979: 8d b4 26 00 00 00 00 lea 0x0(%esi,1),%esi
Here are the sizes:
08048930 w F .text 00000049 b::DoThing3()
080488e0 w F .text 0000004b b::DoThing2()
08048870 w F .text 00000064 b::DoThing()
What vexes me is that all three routines do very similar things; in
fact, I have really just done a trivial optimisation by hand in each
case, but each time this has reduced the code size, in fact quite a lot.
I don't understand why GCC isn't able to do this itself, and I'm still not
sure that the smallest code that I can get it to produce couldn't be made
quite a bit smaller. Here's the command line, by the way:
g++ -Os -fomit-frame-pointer -falign-labels=1 -falign-jumps=1
-fno-exceptions -o test test.cpp
I chose -Os for obvious reasons... the others are also in an attempt to
get file size down. The version, as mentioned in the subject, is 3.0.3.
Can anyone suggest how to make it compile the code any smaller? And
perhaps the people responsible for optimisations might take note of this.
I can understand maybe why the code needs to be so big with -O3, but with
-Os, it seems to me it could be a lot smaller, and really routines like
this benefit most from being small rather than fast (they'll never really
be all that fast). Maybe the compiler can be super-smart about it, and
automatically optimise small functions for size and large functions for
speed?
Anyway, I hope someone can improve the situation a bit.
Thanks,
Nicholas
P.S. Other than code size, GCC 3.0.3 seems great.