powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?

public inbox for gcc@gcc.gnu.org

* powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?
@ 2008-01-16 11:24 Sergei Poselenov
  2008-01-16 12:15 ` Duncan Sands
  2008-01-16 13:17 ` Andrew Haley
  0 siblings, 2 replies; 21+ messages in thread
From: Sergei Poselenov @ 2008-01-16 11:24 UTC
  To: gcc

Hello all,

I'm using the ppc-linux gcc-4.2.2 compiler and noticed that the code
size has increased significantly (about 40%!) compared with the
old 4.0.0 when using the -Os option. Same code, same compile-
and configuration-time options. The Binutils versions differ
(2.16.1 vs 2.17.50), though.

I've looked at the CSiBE testing results for ppc-elf with -Os,
comparing gcc_4_0_0 with mainline and found that the mainline
actually optimizes better, at least for the CSiBE test environment.
After some analysis I arrived at the following results:
  Number of packages in the CSiBE test environment: 863
  N of packages where mainline GCC optimizes better:   290
  N of packages where mainline GCC optimizes worse: 436

And the regression in code size is up to 40%, like in my case.
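The package counts above can be turned into shares with a few lines of Python. This is only a sketch: it assumes that every package not counted as better or worse is unchanged, which the message does not state. Note that these counts say nothing about the summed code size.

```python
# Package counts quoted above from the CSiBE comparison (ppc-elf, -Os).
total = 863
better = 290   # mainline produces smaller code than gcc_4_0_0
worse = 436    # mainline produces larger code than gcc_4_0_0

# Assumption (not stated in the message): the remaining packages are unchanged.
unchanged = total - better - worse


def share(count, whole):
    """Return count as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * count / whole, 1)


print("unchanged packages:", unchanged)
print("better: %.1f%%" % share(better, total))
print("worse:  %.1f%%" % share(worse, total))
```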

What am I missing here? Apparently, just "-Os" is not enough to
get the best size-optimized code?

Currently, it is built as follows:
ppc-linux-gcc -g  -Os   -fPIC -ffixed-r14 -meabi -fno-strict-aliasing 
-D__KERNEL__ -DTEXT_BASE=0xfffc0000  -I/work/psl/tmp/u-boot/include 
-fno-builtin -ffreestanding -nostdinc -isystem 
/opt/eldk-4.2-01-08/usr/bin/../lib/gcc/powerpc-linux/4.2.2/include 
-pipe  -DCONFIG_PPC -D__powerpc__ -DCONFIG_4xx -ffixed-r2 -ffixed-r29 
-mstring -msoft-float -Wa,-m440 -mcpu=440 -DCONFIG_440=1 -Wall 
-Wstrict-prototypes -c -o interrupts.o interrupts.c

What would you suggest?

Thanks for any help,
Sergei

* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?
  2008-01-16 11:24 powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression? Sergei Poselenov
@ 2008-01-16 12:15 ` Duncan Sands
  2008-01-16 12:19   ` Sergei Poselenov
  2008-01-16 13:17 ` Andrew Haley
  1 sibling, 1 reply; 21+ messages in thread
From: Duncan Sands @ 2008-01-16 12:15 UTC
  To: gcc; +Cc: Sergei Poselenov

Hi,

> I'm using the ppc-linux gcc-4.2.2 compiler and noted the code
> size have increased significantly (about 40%!), comparing with
> old 4.0.0 when using the -Os option. Same code, same compile-
> and configuration-time options. Binutils are differ
> (2.16.1 vs 2.17.50), though.

what LLVM version is old 4.0.0?  Are you compiling C++ (I don't know
what CSiBE is)?  Are you using exception handling?

> I've looked at the CSiBE testing results for ppc-elf with -Os,
> comparing gcc_4_0_0 with mainline and found that the mainline
> actually optimizes better, at least for the CSiBE test environment.
> After some analysis I've came to the following results:
>   Number of packages in the CSiBE test environment: 863
>   N of packages where mainline GCC optimizes better:   290
>   N of packages where mainline GCC optimizes worse: 436

From these numbers it looks like llvm-gcc is better than mainline
most of the time.  However you say: "... found that the mainline
actually optimizes better".  Can you please clarify.

Best wishes,

Duncan.

* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?
  2008-01-16 12:15 ` Duncan Sands
@ 2008-01-16 12:19   ` Sergei Poselenov
  2008-01-16 12:35     ` Duncan Sands
  2008-01-16 13:20     ` Andrew Haley
  0 siblings, 2 replies; 21+ messages in thread
From: Sergei Poselenov @ 2008-01-16 12:19 UTC
  To: Duncan Sands; +Cc: gcc

Hi Duncan,

Duncan Sands wrote:
> Hi,
> 
>> I'm using the ppc-linux gcc-4.2.2 compiler and noted the code
>> size have increased significantly (about 40%!), comparing with
>> old 4.0.0 when using the -Os option. Same code, same compile-
>> and configuration-time options. Binutils are differ
>> (2.16.1 vs 2.17.50), though.
> 
> what LLVM version is old 4.0.0?  Are you compiling C++ (I don't know
> what CSiBE is)?  Are you using exception handling?
> 

LLVM? As far as I know, llvm-gcc is an alternative to gcc. Are any
parts of LLVM used in current GCC? None that I know of.

CSiBE is the Code-Size Benchmark Environment, see 
http://www.inf.u-szeged.hu/csibe

>> I've looked at the CSiBE testing results for ppc-elf with -Os,
>> comparing gcc_4_0_0 with mainline and found that the mainline
>> actually optimizes better, at least for the CSiBE test environment.
>> After some analysis I've came to the following results:
>>   Number of packages in the CSiBE test environment: 863
>>   N of packages where mainline GCC optimizes better:   290
>>   N of packages where mainline GCC optimizes worse: 436
> 
> From these numbers it looks like llvm-gcc is better than mainline
> most of the time.  However you say: "... found that the mainline
> actually optimizes better".  Can you please clarify.
> 

No, all results are for the GCC project. "Mainline" here means the
current development version of GCC. For it, the sum of the test code
size is 3503061, vs. 3542052 for the gcc_4_0_0 branch. But again,
this overall result is achieved despite a significant regression in
most of the test packages.

Regards,
Sergei

> Best wishes,
> 
> Duncan.
> 

* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?
  2008-01-16 12:19   ` Sergei Poselenov
@ 2008-01-16 12:35     ` Duncan Sands
  2008-01-16 13:20     ` Andrew Haley
  1 sibling, 0 replies; 21+ messages in thread
From: Duncan Sands @ 2008-01-16 12:35 UTC
  To: gcc; +Cc: Sergei Poselenov

> LLVM? From what I know llvm-gcc is an alternative for gcc. Are any
> parts of LLVM used in current GCC? None of what I know.

Sorry, I confused my mailing lists and thought you had asked on
the LLVM mailing list.  This explains why I didn't understand
your questions :)

Sorry about the noise,

Duncan.

* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?
  2008-01-16 11:24 powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression? Sergei Poselenov
  2008-01-16 12:15 ` Duncan Sands
@ 2008-01-16 13:17 ` Andrew Haley
  2008-01-16 15:59   ` Sergei Poselenov
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Haley @ 2008-01-16 13:17 UTC
  To: Sergei Poselenov; +Cc: gcc

Sergei Poselenov writes:
 > Hello all,
 > 
 > I'm using the ppc-linux gcc-4.2.2 compiler and noted the code
 > size have increased significantly (about 40%!), comparing with
 > old 4.0.0 when using the -Os option. Same code, same compile-
 > and configuration-time options. Binutils are differ
 > (2.16.1 vs 2.17.50), though.
 > 
 > I've looked at the CSiBE testing results for ppc-elf with -Os,
 > comparing gcc_4_0_0 with mainline and found that the mainline
 > actually optimizes better, at least for the CSiBE test environment.
 > After some analysis I've came to the following results:
 >   Number of packages in the CSiBE test environment: 863
 >   N of packages where mainline GCC optimizes better:   290
 >   N of packages where mainline GCC optimizes worse: 436
 > 
 > And the regression in code size is up to 40%, like in my case.

40% seems severe, but it may be an outlier.  What is the average
increase in code size, including the packages where it got better?

 > What I'm missing here? Apparently, just "-Os" is not enough to
 > get the best size-optimized code?

Not necessarily.  -Os is mostly [*] the same as -O2 but with
optimizations that might reasonably be expected to cause code size
increase disabled.  You might find some better combination of options
for your code base.

Andrew.

[*] Full disclosure:  There are a few optimizations, such as code
hoisting, that are specially used with -Os.  There are also some
passes with special tuning for -Os.
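
One way to look for a better combination of options is to generate every subset of a few candidate flags and compare the resulting object sizes. The sketch below only builds the command lines. The candidate flags are illustrative picks from the GCC manual, not recommendations made in this thread, and the shortened base command is an assumption (the real build uses many more options).

```python
import itertools
import shlex

# Shortened base command; the compiler name and source file come from the
# thread, the reduced option list is an assumption made for brevity.
BASE = ["ppc-linux-gcc", "-Os", "-mcpu=440", "-msoft-float", "-c", "interrupts.c"]

# Candidate extras to toggle. Illustrative only: whether any of them shrinks
# this particular code is untested.
CANDIDATES = ["-mmultiple", "-finline-limit=20", "-ffunction-sections"]


def flag_combinations(candidates):
    """Yield every subset of the candidate flags, smallest subsets first."""
    for n in range(len(candidates) + 1):
        for combo in itertools.combinations(candidates, n):
            yield list(combo)


def build_commands(base, candidates):
    """Return one compiler argument list per flag subset."""
    # The extra flags go before the trailing "-c <source>" arguments.
    return [base[:-2] + extra + base[-2:] for extra in flag_combinations(candidates)]


if __name__ == "__main__":
    for cmd in build_commands(BASE, CANDIDATES):
        print(shlex.join(cmd))
```

Each printed command could then be run and the resulting object measured with 'size'.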

-- 
Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, UK
Registered in England and Wales No. 3798903

* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?
  2008-01-16 12:19   ` Sergei Poselenov
  2008-01-16 12:35     ` Duncan Sands
@ 2008-01-16 13:20     ` Andrew Haley
  1 sibling, 0 replies; 21+ messages in thread
From: Andrew Haley @ 2008-01-16 13:20 UTC
  To: Sergei Poselenov; +Cc: Duncan Sands, gcc

Sergei Poselenov writes:
 > 
 > No, all results are for the GCC project. "Mainline" here means the
 > current development version of GCC. For it, the sum of the test code
 > size is 3503061, vs. 3542052 for the gcc_4_0_0 branch. But again,
 > this performance is achieved by the significant regression for the
 > most of the test packages.

Hold on, that's a 1.1% improvement in code size, not a regression.
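
The 1.1% figure follows directly from the two totals quoted above. A minimal check, with variable names chosen here for illustration:

```python
# Summed CSiBE code sizes quoted in the thread (ppc-elf, -Os).
gcc_4_0_0_total = 3542052
mainline_total = 3503061


def relative_change(old, new):
    """Percentage change from old to new; negative means the code got smaller."""
    return 100.0 * (new - old) / old


change = relative_change(gcc_4_0_0_total, mainline_total)
print("mainline vs gcc_4_0_0: %.2f%%" % change)
```

A negative result means mainline produces less code in total, even though more individual packages grew than shrank.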

Andrew.

* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?
  2008-01-16 13:17 ` Andrew Haley
@ 2008-01-16 15:59   ` Sergei Poselenov
  2008-01-16 17:10     ` Andrew Haley
  2008-01-17 12:14     ` powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression? Gabriel Paubert
  0 siblings, 2 replies; 21+ messages in thread
From: Sergei Poselenov @ 2008-01-16 15:59 UTC
  To: Andrew Haley; +Cc: gcc

Hello,

I've just noticed an error in my calculations: the regression is
10%, not 40% (I used gdb to do the calculations and forgot to convert
the inputs to float). Sorry.

But the problem still persists for me - I'm building an embedded
firmware (U-Boot) and it doesn't fit into the reserved space
anymore.

Andrew Haley wrote:
> Sergei Poselenov writes:
>  > Hello all,
>  > 
>  > I'm using the ppc-linux gcc-4.2.2 compiler and noted the code
>  > size have increased significantly (about 40%!), comparing with
>  > old 4.0.0 when using the -Os option. Same code, same compile-
>  > and configuration-time options. Binutils are differ
>  > (2.16.1 vs 2.17.50), though.
>  > 
>  > I've looked at the CSiBE testing results for ppc-elf with -Os,
>  > comparing gcc_4_0_0 with mainline and found that the mainline
>  > actually optimizes better, at least for the CSiBE test environment.
>  > After some analysis I've came to the following results:
>  >   Number of packages in the CSiBE test environment: 863
>  >   N of packages where mainline GCC optimizes better:   290
>  >   N of packages where mainline GCC optimizes worse: 436
>  > 
>  > And the regression in code size is up to 40%, like in my case.
> 
> 40% seems severe, but it may be an outlier.  What is the average
> increase in code size, including the packages where it got better?
> 


Specifically, in my case the figures are as follows (as reported by
'size'):
gcc 4.2.2:
    text    data     bss     dec     hex filename
    2696      60    1536    4292    10c4 interrupts.o

gcc 4.0.0:
    text    data     bss     dec     hex filename
    2424      88    1536    4048     fd0 interrupts.o

(about 10% regression)
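
The two 'size' listings above can be compared mechanically. The following sketch parses the Berkeley-style output shown in the message and computes the growth per column; the helper names are made up for this example.

```python
# 'size' output copied from the message above.
SIZE_4_2_2 = """\
    text    data     bss     dec     hex filename
    2696      60    1536    4292    10c4 interrupts.o
"""
SIZE_4_0_0 = """\
    text    data     bss     dec     hex filename
    2424      88    1536    4048     fd0 interrupts.o
"""


def parse_size(output):
    """Parse a two-line 'size' listing into a dict of the decimal columns."""
    header, row = output.strip().splitlines()
    names = header.split()[:4]       # text, data, bss, dec
    values = row.split()[:4]
    return {name: int(value) for name, value in zip(names, values)}


def growth(old, new):
    """Percentage growth from old to new."""
    return 100.0 * (new - old) / old


new = parse_size(SIZE_4_2_2)
old = parse_size(SIZE_4_0_0)
for column in ("text", "data", "dec"):
    print("%-4s %5d -> %5d  (%+.1f%%)" % (
        column, old[column], new[column], growth(old[column], new[column])))
```

On these numbers the text section grows by a little over 11%, while the data section shrinks, so the total ('dec') grows less than the text alone.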

As for the CSiBE results - the average regression is
3%, including the top 3 worst cases:
100% (32768 vs 16384 for "linux-2.4.23-pre3-testplatform - 
arch/testplatform/kernel/init_task")
35% (1440 vs 1064 for "teem-1.6.0-src - src/air/enum")
34% (1712 vs 1280 for "teem-1.6.0-src - src/nrrd/encodingHex")
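
The three percentages above can be reproduced as follows. This assumes the first number in each pair is the mainline size and the second the gcc_4_0_0 size, which the message implies but does not spell out.

```python
# (package, mainline size, gcc_4_0_0 size); the ordering of the two sizes
# is an assumption based on the message calling these regressions.
cases = [
    ("linux-2.4.23-pre3-testplatform - arch/testplatform/kernel/init_task", 32768, 16384),
    ("teem-1.6.0-src - src/air/enum", 1440, 1064),
    ("teem-1.6.0-src - src/nrrd/encodingHex", 1712, 1280),
]


def regression_percent(new, old):
    """Size increase of new over old, in percent."""
    return 100.0 * (new - old) / old


percents = [round(regression_percent(new, old)) for _, new, old in cases]
for (name, new, old), pct in zip(cases, percents):
    print("%3d%%  %s" % (pct, name))
```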

(See 
http://www.inf.u-szeged.hu/csibe/osingle.php?branchid=mainline&targetid=ppc-elf&timestamp=2008-01-15%2012:00:00&flags=-Os&csibever=2.x.x&projectid=---%20all%20files%20---&finish_button=Finish
for mainline results, and
http://www.inf.u-szeged.hu/csibe/osingle.php?branchid=gcc_4_0_0_release&targetid=ppc-elf&timestamp=2003-01-01%2012:00:00&flags=-Os&csibever=2.x.x&projectid=---%20all%20files%20---&finish_button=Finish
for gcc_4_0_0_release results)

For comparison bar chart see
http://www.inf.u-szeged.hu/csibe/ocomp.php?branchid_a=gcc_4_0_0_release&branchid_b=mainline&targetid_a=ppc-elf&targetid_b=ppc-elf&timestamp_a=2003-01-01%2012:00:00&timestamp_b=2008-01-15%2012:00:00&flags_a=-Os&flags_b=-Os&csibever_a=2.x.x&csibever_b=2.x.x&dataview=Code%20size&viewmode=Bar%20chart&finish_button=Finish


>  > What I'm missing here? Apparently, just "-Os" is not enough to
>  > get the best size-optimized code?
> 
> Not necessarily.  -Os is mostly [*] the same as -O2 but with
> optimizations that might reasonably be expected to cause code size
> increase disabled.  You might find some better combination of options
> for your code base.
> 

Understood. I raised the question in the hope that somebody already
has a set of useful options for a powerpc cross-compiler to try with -Os.

Thanks,
Sergei
> Andrew.
> 
> [*] Full disclosure:  There are a few optimizations, such as code
> hoisting, that are specially used with -Os.  There are also some
> passes with special tuning for -Os.
> 

* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?
  2008-01-16 15:59   ` Sergei Poselenov
@ 2008-01-16 17:10     ` Andrew Haley
  2008-01-16 17:14       ` Sergei Poselenov
  2008-01-17 12:14     ` powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression? Gabriel Paubert
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Haley @ 2008-01-16 17:10 UTC
  To: Sergei Poselenov; +Cc: gcc

Sergei Poselenov writes:
 > Hello,
 > 
 > I've just noted an error in my calculations: not 40%, but 10%
 > regression (used gdb to do the calculations and forgot to convert
 > inputs to float). Sorry.
 > 
 > But the problem still persists for me - I'm building an embedded
 > firmware (U-Boot) and it doesn't fit into the reserved space
 > anymore.
 > 
 > Andrew Haley wrote:
 > > Sergei Poselenov writes:
 > >  > Hello all,
 > >  > 
 > >  > I'm using the ppc-linux gcc-4.2.2 compiler and noted the code
 > >  > size have increased significantly (about 40%!), comparing with
 > >  > old 4.0.0 when using the -Os option. Same code, same compile-
 > >  > and configuration-time options. Binutils are differ
 > >  > (2.16.1 vs 2.17.50), though.
 > >  > 
 > >  > I've looked at the CSiBE testing results for ppc-elf with -Os,
 > >  > comparing gcc_4_0_0 with mainline and found that the mainline
 > >  > actually optimizes better, at least for the CSiBE test environment.
 > >  > After some analysis I've came to the following results:
 > >  >   Number of packages in the CSiBE test environment: 863
 > >  >   N of packages where mainline GCC optimizes better:   290
 > >  >   N of packages where mainline GCC optimizes worse: 436
 > >  > 
 > >  > And the regression in code size is up to 40%, like in my case.
 > > 
 > > 40% seems severe, but it may be an outlier.  What is the average
 > > increase in code size, including the packages where it got better?
 > > 
 > 
 > 
 > Specifically, in my case the digits are as follows (as reported by
 > 'size'):
 > gcc 4.2.2:
 >     text    data     bss     dec     hex filename
 >     2696      60    1536    4292    10c4 interrupts.o
 > 
 > gcc 4.0.0:
 >   text    data     bss     dec     hex filename
 >     2424      88    1536    4048     fd0 interrupts.o
 > 
 > (about 10% regression)

Sure, but this is a tiny sample.

 > As for the CSiBE results - the average regression is
 > 3%, including top 3 winners:
 > 100% (32768 vs 16384 for "linux-2.4.23-pre3-testplatform - 
 > arch/testplatform/kernel/init_task")
 > 35% (1440 vs 1064 for "teem-1.6.0-src - src/air/enum")
 > 34% (1712 vs 1280 for "teem-1.6.0-src - src/nrrd/encodingHex")

I've just re-read what you wrote, and noticed your comment above: "the
mainline actually optimizes better, at least for the CSiBE test
environment."

Quite so:

http://www.inf.u-szeged.hu/csibe/ocomp.php?branchid_a=gcc_4_0_0_release&branchid_b=mainline&targetid_a=arm-elf&targetid_b=arm-elf&timestamp_a=2003-01-01%2012:00:00&timestamp_b=2008-01-14%2012:00:00&flags_a=-Os&flags_b=-Os&csibever_a=2.x.x&csibever_b=2.x.x&dataview=Code%20size&viewmode=Summarized%20bar%20chart&finish_button=Finish

So we're actually doing better now than we were in 4.0.0.

Now, I sympathize that in your particular case you have a code size
regression.  This happens: when we do optimization in gcc, some code
bases will lose out.  All that we can promise is that we try not to
make it worse for most users.

What we can do is compare your code that has got much worse, and try
to figure out why.

Andrew.

* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?
  2008-01-16 17:10     ` Andrew Haley
@ 2008-01-16 17:14       ` Sergei Poselenov
  2008-01-16 19:36         ` Andrew Haley
  0 siblings, 1 reply; 21+ messages in thread
From: Sergei Poselenov @ 2008-01-16 17:14 UTC
  To: Andrew Haley; +Cc: gcc

Hello Andrew,

> Now, I sympathize that in your particular case you have a code size
> regression.  This happens: when we do optimization in gcc, some code
> bases will lose out.  All that we can promise is that we try not to
> make it worse for most users.
> 
> What we can do is compare your code that has got much worse, and try
> to figure out why.
> 

Would the generated asm listings be enough? Or should I send
the preprocessed sources as well?

Thanks a lot!

Regards,
Sergei

> Andrew.
> 

* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?
  2008-01-16 17:14       ` Sergei Poselenov
@ 2008-01-16 19:36         ` Andrew Haley
  2008-01-17 14:52           ` powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression? [Emcraft #11717] Sergei Poselenov
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Haley @ 2008-01-16 19:36 UTC
  To: Sergei Poselenov; +Cc: gcc

Sergei Poselenov writes:
 > Hello Andrew,
 > 
 > > Now, I sympathize that in your particular case you have a code size
 > > regression.  This happens: when we do optimization in gcc, some code
 > > bases will lose out.  All that we can promise is that we try not to
 > > make it worse for most users.
 > > 
 > > What we can do is compare your code that has got much worse, and try
 > > to figure out why.
 > > 
 > 
 > Would the generated asm listings be enough? Or should I send
 > the preprocessed sources as well?

Both.

Rather than sending stuff, best to stick it on a web site if you can.

Having said that, your example is tiny, so it's likely that it won't
be very representative, and the less representative the code is the
less likely a gcc maintainer will be interested.  But at least we'll
be able to see the difference.

Andrew.

* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?
  2008-01-16 15:59   ` Sergei Poselenov
  2008-01-16 17:10     ` Andrew Haley
@ 2008-01-17 12:14     ` Gabriel Paubert
  2008-01-17 12:47       ` Sergei Poselenov
  1 sibling, 1 reply; 21+ messages in thread
From: Gabriel Paubert @ 2008-01-17 12:14 UTC
  To: Sergei Poselenov; +Cc: Andrew Haley, gcc

On Wed, Jan 16, 2008 at 04:55:19PM +0300, Sergei Poselenov wrote:
> Hello,
> 
> I've just noted an error in my calculations: not 40%, but 10%
> regression (used gdb to do the calculations and forgot to convert
> inputs to float). Sorry.
> 
> But the problem still persists for me - I'm building an embedded
> firmware (U-Boot) and it doesn't fit into the reserved space
> anymore.
> 
[snipped]

> As for the CSiBE results - the average regression is
> 3%, including top 3 winners:
> 100% (32768 vs 16384 for "linux-2.4.23-pre3-testplatform - 
> arch/testplatform/kernel/init_task")

A change from an exact power of 2 to the next one looks very
suspicious: I seriously doubt that it is a code generation
or instruction choice issue. While there might be a relatively
small increase in size inherent to the compiler, it looks like 
it then goes to a "round to the next power of 2" step.
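
The observation is easy to verify: both sizes reported for init_task are exact powers of two and one is double the other, unlike the teem sizes from the same list. A small sketch (the helper name is made up here):

```python
def is_power_of_two(n):
    """True if n is a positive integer with exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


# Sizes reported for init_task in the CSiBE comparison.
old_size, new_size = 16384, 32768

print(is_power_of_two(old_size), is_power_of_two(new_size))
print("doubled:", new_size == 2 * old_size)

# For contrast, the teem sizes from the same list are not powers of two.
print([is_power_of_two(n) for n in (1064, 1440, 1280, 1712)])
```

This pattern is consistent with an alignment or padding effect, as suggested above, though the check alone does not prove the cause.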

Do you set the right options for your particular processor
(-Os might not override some scheduling decisions and the
default target processor might have changed between GCC
releases)?

	Regards,
	Gabriel

* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?
  2008-01-17 12:14     ` powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression? Gabriel Paubert
@ 2008-01-17 12:47       ` Sergei Poselenov
  2008-01-17 13:10         ` Gabriel Paubert
  0 siblings, 1 reply; 21+ messages in thread
From: Sergei Poselenov @ 2008-01-17 12:47 UTC
  To: Gabriel Paubert; +Cc: Andrew Haley, gcc

Hello Gabriel,

Gabriel Paubert wrote:
> On Wed, Jan 16, 2008 at 04:55:19PM +0300, Sergei Poselenov wrote:
>> Hello,
>>
>> I've just noted an error in my calculations: not 40%, but 10%
>> regression (used gdb to do the calculations and forgot to convert
>> inputs to float). Sorry.
>>
>> But the problem still persists for me - I'm building an embedded
>> firmware (U-Boot) and it doesn't fit into the reserved space
>> anymore.
>>
> [snipped]
> 
>> As for the CSiBE results - the average regression is
>> 3%, including top 3 winners:
>> 100% (32768 vs 16384 for "linux-2.4.23-pre3-testplatform - 
>> arch/testplatform/kernel/init_task")
> 
> A change from an exact power of 2 to the next one looks very
> suspiscious: I seriously doubt that it is a code generation
> or instruction choice issue. While there might be a relatively
> small increase in size inherent to the compiler, it looks like 
> it then goes to a "round to the next power of 2" step.
> 

You are probably right.

> Do you set the right options for your particular processor
> (-Os might not override some scheduling decisions and the
> default target processor might have changed between GCC
> releases)?
> 

I don't know, actually; this is what I'm asking. As for the
target processor - as I stated in the initial message:

...
Currently, it builds as following:
ppc-linux-gcc -g -Os -fPIC -ffixed-r14 -meabi -fno-strict-aliasing 
-D__KERNEL__ -DTEXT_BASE=0xfffc0000 -I/work/psl/tmp/u-boot/include 
-fno-builtin -ffreestanding -nostdinc -isystem 
/opt/eldk-4.2-01-08/usr/bin/../lib/gcc/powerpc-linux/4.2.2/include 
-pipe -DCONFIG_PPC -D__powerpc__ -DCONFIG_4xx -ffixed-r2 -ffixed-r29 
-mstring -msoft-float -Wa,-m440 -mcpu=440 -DCONFIG_440=1 -Wall 
-Wstrict-prototypes -c -o interrupts.o interrupts.c
...

Note the "-mcpu=440" switch.

I removed all the "-ffixed" options (just as a test; we definitely
need them), and it doesn't change the size of the resulting gcc-4.2.2 code.

Regards,
Sergei

> 	Regards,
> 	Gabriel
> 

* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?
  2008-01-17 12:47       ` Sergei Poselenov
@ 2008-01-17 13:10         ` Gabriel Paubert
  0 siblings, 0 replies; 21+ messages in thread
From: Gabriel Paubert @ 2008-01-17 13:10 UTC
  To: Sergei Poselenov; +Cc: Andrew Haley, gcc


	Hello Sergei,

On Thu, Jan 17, 2008 at 03:13:59PM +0300, Sergei Poselenov wrote:
> I don't know now, actually, this is what I'm asking. As for the
> target processor - as I stated in the initial message:
> 
> ...
> Currently, it builds as following:
> ppc-linux-gcc -g -Os -fPIC -ffixed-r14 -meabi -fno-strict-aliasing 
> -D__KERNEL__ -DTEXT_BASE=0xfffc0000 -I/work/psl/tmp/u-boot/include 
> -fno-builtin -ffreestanding -nostdinc -isystem 
> /opt/eldk-4.2-01-08/usr/bin/../lib/gcc/powerpc-linux/4.2.2/include 
> -pipe -DCONFIG_PPC -D__powerpc__ -DCONFIG_4xx -ffixed-r2 -ffixed-r29 
> -mstring -msoft-float -Wa,-m440 -mcpu=440 -DCONFIG_440=1 -Wall 
> -Wstrict-prototypes -c -o interrupts.o interrupts.c
> ...
> 
> Note the "-mcpu=440" switch.

Doh, I missed this, sorry.

> 
> I removed all "-ffixed" option (just for test - we surely need them)
>  - it doesn't change the size of the resultant gcc-4.2.2 code.

I'm not sure that having -ffixed-r29 is a wise choice when you are
looking for small code size. It might prevent the use of load/store 
multiple in prologue and epilogue code.

	Regards,
	Gabriel

* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?  [Emcraft #11717]
  2008-01-16 19:36         ` Andrew Haley
@ 2008-01-17 14:52           ` Sergei Poselenov
  2008-01-17 15:11             ` Richard Guenther
  2008-01-17 18:27             ` Gabriel Paubert
  0 siblings, 2 replies; 21+ messages in thread
From: Sergei Poselenov @ 2008-01-17 14:52 UTC
  To: Andrew Haley; +Cc: gcc, 'rt'

Hello Andrew,

Andrew Haley wrote:
> Sergei Poselenov writes:
>  > Hello Andrew,
>  > 
>  > > Now, I sympathize that in your particular case you have a code size
>  > > regression.  This happens: when we do optimization in gcc, some code
>  > > bases will lose out.  All that we can promise is that we try not to
>  > > make it worse for most users.
>  > > 
>  > > What we can do is compare your code that has got much worse, and try
>  > > to figure out why.
>  > > 
>  > 
>  > Would the generated asm listings be enough? Or should I send
>  > the preprocessed sources as well?
> 
> Both.
> 
> Rather than sending stuff, best to stick it on a web site if you can.
> 

Here it is:
Preprocessed and assembler code generated by the GCC 4.2.2 ppc-linux
cross-compiler:
http://www.emcraft.com/codesize/gcc-4.2.2/interrupts.i
http://www.emcraft.com/codesize/gcc-4.2.2/interrupts.s

The same code built with gcc-4.0.0 cross-compiler:
http://www.emcraft.com/codesize/gcc-4.0.0/interrupts.i
http://www.emcraft.com/codesize/gcc-4.0.0/interrupts.s

Again, for convenience, the compilation string is:
ppc-linux-gcc -g  -Os   -fPIC -D__KERNEL__ -DTEXT_BASE=0xfffc0000 
-I/work/psl/tmp/u-boot/include -fno-builtin -nostdinc -isystem 
/opt/eldk-4.2-01-08/usr/bin/../lib/gcc/powerpc-linux/4.2.2/include 
-pipe  -DCONFIG_PPC -D__powerpc__ -DCONFIG_4xx -msoft-float -Wa,-m440 
-mcpu=440 -DCONFIG_440=1 -Wall -Wstrict-prototypes -c  interrupts.c

The 'size' output for both cases (gcc 4.2.2 first, then gcc 4.0.0):
    text    data     bss     dec     hex filename
    2696      60    1536    4292    10c4 interrupts.o
and
    text    data     bss     dec     hex filename
    2424      88    1536    4048     fd0 interrupts.o


> Having said that, your example is tiny, so it's likely that it won't
> be very representative, and the less representative the code is the
> less likely a gcc maintainer will be interested.  But at least we'll
> be able to see the difference.
> 

I agree. Actually, the CSiBE results are impressive: I've built the 
bzip2 library for powerpc and got similar results.

I wonder why GCC maintainers are ignoring the -Os regression for
most of their cases (at least for powerpc).

Thanks,
Sergei

> Andrew.
> 

* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression? [Emcraft #11717]
  2008-01-17 14:52           ` powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression? [Emcraft #11717] Sergei Poselenov
@ 2008-01-17 15:11             ` Richard Guenther
  2008-01-17 18:27             ` Gabriel Paubert
  1 sibling, 0 replies; 21+ messages in thread
From: Richard Guenther @ 2008-01-17 15:11 UTC
  To: Sergei Poselenov; +Cc: Andrew Haley, gcc, rt

On Jan 17, 2008 3:48 PM, Sergei Poselenov <sposelenov@emcraft.com> wrote:
> Hello Andrew,
>
> I agree. Actually, the CSiBE results are impressive: I've built the
> bzip2 library for powerpc and got similar results.
>
> I wonder why GCC maintainers are ignoring the -Os regression for
> most of their cases (at least for powerpc).

As always - patches welcome.

Richard.

* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?  [Emcraft #11717]
  2008-01-17 14:52           ` powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression? [Emcraft #11717] Sergei Poselenov
  2008-01-17 15:11             ` Richard Guenther
@ 2008-01-17 18:27             ` Gabriel Paubert
  2008-01-19 16:26               ` Andrew Haley
  1 sibling, 1 reply; 21+ messages in thread
From: Gabriel Paubert @ 2008-01-17 18:27 UTC
  To: Sergei Poselenov; +Cc: Andrew Haley, gcc, 'rt'

On Thu, Jan 17, 2008 at 05:48:10PM +0300, Sergei Poselenov wrote:
> Hello Andrew,
> 
> Andrew Haley wrote:
> >Sergei Poselenov writes:
> > > Hello Andrew,
> > > 
> > > > Now, I sympathize that in your particular case you have a code size
> > > > regression.  This happens: when we do optimization in gcc, some code
> > > > bases will lose out.  All that we can promise is that we try not to
> > > > make it worse for most users.
> > > > 
> > > > What we can do is compare your code that has got much worse, and try
> > > > to figure out why.
> > > > 
> > > 
> > > Would the generated asm listings be enough? Or should I send
> > > the preprocessed sources as well?
> >
> >Both.
> >
> >Rather than sending stuff, best to stick it on a web site if you can.
> >
> 
> Here it is:
> Preprocessed and assembler code generated by the GCC 4.2.2 ppc-linux
> cross-compiler:
> http://www.emcraft.com/codesize/gcc-4.2.2/interrupts.i
> http://www.emcraft.com/codesize/gcc-4.2.2/interrupts.s
> 
> The same code built with gcc-4.0.0 cross-compiler:
> http://www.emcraft.com/codesize/gcc-4.0.0/interrupts.i
> http://www.emcraft.com/codesize/gcc-4.0.0/interrupts.s
> 

The functions do not appear in the same order in both files, it's a
bit surprising! Anyway look for example at irq_install_handler:

- gcc-4.0 saves all registers using stmw r24,xx(r1) and restores them
with lmw r24,xx(r1); however, this means that r29 is overwritten in
the epilogue.

- gcc-4.2.2 saves and restores registers individually which
means that it takes 12 more instructions. There go 48 bytes.

This is especially visible in the epilogue (in the prologue
the saves are interspersed with other instructions).

In this case -ffixed-r29 hurts, but gcc4.2.2 looks more correct.
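
The figure of 12 extra instructions can be reconstructed under one assumption that the message does not state: gcc-4.2.2 saves and restores each of r24..r31 except the fixed r29 with its own instruction, where gcc-4.0 used a single stmw and a single lmw. A sketch of that arithmetic:

```python
INSN_BYTES = 4  # PowerPC instructions in this code are 4 bytes each

saved_range = range(24, 32)   # r24..r31, as transferred by "stmw r24,..."
fixed = {29}                  # register that must not be touched (-ffixed-r29)

# Assumption: 4.2.2 handles every register in the range except the fixed one
# individually, with one store in the prologue and one load in the epilogue.
individual_regs = [r for r in saved_range if r not in fixed]

multiple_insns = 2                           # one stmw + one lmw
individual_insns = 2 * len(individual_regs)  # one store + one load per register

extra_insns = individual_insns - multiple_insns
extra_bytes = extra_insns * INSN_BYTES
print(extra_insns, "extra instructions,", extra_bytes, "extra bytes")
```

The result matches the count given above, but it has not been checked against the posted assembly.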

	Regards,
	Gabriel

* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?   [Emcraft #11717]
  2008-01-17 18:27             ` Gabriel Paubert
@ 2008-01-19 16:26               ` Andrew Haley
  2008-01-19 16:35                 ` David Edelsohn
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Haley @ 2008-01-19 16:26 UTC
  To: Gabriel Paubert; +Cc: Sergei Poselenov, gcc, 'rt'

Gabriel Paubert wrote:
> On Thu, Jan 17, 2008 at 05:48:10PM +0300, Sergei Poselenov wrote:
>> Hello Andrew,

>> Preprocessed and assembler code generated by the GCC 4.2.2 ppc-linux
>> cross-compiler:
>> http://www.emcraft.com/codesize/gcc-4.2.2/interrupts.i
>> http://www.emcraft.com/codesize/gcc-4.2.2/interrupts.s
>>
>> The same code built with gcc-4.0.0 cross-compiler:
>> http://www.emcraft.com/codesize/gcc-4.0.0/interrupts.i
>> http://www.emcraft.com/codesize/gcc-4.0.0/interrupts.s
>>
> 
> The functions do not appear in the same order in both files, it's a
> bit surprising! Anyway look for example at irq_install_handler:
> 
> - gcc-4.0 saves all registers using stmw r24,xx(r1) and restores them
> with lmw r24,xx(r1) however this means that r29 is overwritten in 
> the epilogue.
> 
> - gcc-4.2.2 saves and restores registers individually which
> means that it takes 12 more instructions. There go 48 bytes.
> 
> This is especially visible in the epilogue (in the prologue
> the saves are interspersed with other instructions).
> 
> In this case -ffixed-r29 hurts, but gcc4.2.2 looks more correct.

OK, so the code here is bigger, partly because a bug has been fixed.
Not a bad thing, in general.

I suspect that the real reason for the change in save/restore is that
not using lmw/stmw is faster.  That's just a guess, though.  gcc could probably
be fixed to use lmw/stmw if -Os is used.

Anyway, now that we've found something specific, this is for the ppc maintainer to comment on.

Andrew.


* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression? [Emcraft #11717]
  2008-01-19 16:26               ` Andrew Haley
@ 2008-01-19 16:35                 ` David Edelsohn
  2008-01-19 16:51                   ` Andrew Haley
  0 siblings, 1 reply; 21+ messages in thread
From: David Edelsohn @ 2008-01-19 16:35 UTC (permalink / raw)
  To: Andrew Haley; +Cc: Gabriel Paubert, Sergei Poselenov, gcc, 'rt'

>>>>> Andrew Haley writes:

Andrew> I suspect that the real reason for the change in save/restore is because
Andrew> not using lmw/stmw is faster.  That's just a guess though.  gcc could probably
Andrew> be fixed to use lmw/stmw if -Os is used.

Andrew> Anyway, now we've found something specific this is for the ppc maintainer to comment.

	GCC does use load/store multiple and load/store string
instructions if -Os is used, but not when the sequence is broken up by a
fixed register.

David


* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?  [Emcraft #11717]
  2008-01-19 16:35                 ` David Edelsohn
@ 2008-01-19 16:51                   ` Andrew Haley
  2008-01-19 19:10                     ` David Edelsohn
  2008-01-21 17:25                     ` powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression? (filed bug 34903)[Emcraft #11717] Sergei Poselenov
  0 siblings, 2 replies; 21+ messages in thread
From: Andrew Haley @ 2008-01-19 16:51 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Gabriel Paubert, Sergei Poselenov, gcc, 'rt'

David Edelsohn wrote:
>>>>>> Andrew Haley writes:
> 
> Andrew> I suspect that the real reason for the change in save/restore is because
> Andrew> not using lmw/stmw is faster.  That's just a guess though.  gcc could probably
> Andrew> be fixed to use lmw/stmw if -Os is used.
> 
> Andrew> Anyway, now we've found something specific this is for the ppc maintainer to comment.
> 
> 	GCC does use load/store multiple and load/store string
> instructions if -Os is used, but not when the sequence is broken up by a
> fixed register.

Err, why not?

Thanks,
Andrew.


* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression? [Emcraft #11717]
  2008-01-19 16:51                   ` Andrew Haley
@ 2008-01-19 19:10                     ` David Edelsohn
  2008-01-21 17:25                     ` powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression? (filed bug 34903)[Emcraft #11717] Sergei Poselenov
  1 sibling, 0 replies; 21+ messages in thread
From: David Edelsohn @ 2008-01-19 19:10 UTC (permalink / raw)
  To: Andrew Haley; +Cc: Gabriel Paubert, Sergei Poselenov, gcc, 'rt'

>>>>> Andrew Haley writes:

Andrew> Err, why not?

	Because the fixed register means it is no longer a contiguous
sequence of registers.  And the PowerPC port does not break it up into two
sequences.  And fixed registers in that range are not part of any standard
ABI.
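
To make the point about the broken sequence concrete, here is a small
sketch. It is not GCC's code; the names and the bit-mask representation
are assumptions for illustration. It lists the runs of consecutive
non-fixed registers between a first saved register and r31. With r29
fixed, r24..r31 falls apart into two runs, and only the run ending at
r31 could be covered by lmw/stmw, since those instructions always
continue through r31.

```c
#include <stddef.h>

struct reg_run { int first, last; };

/* Split first_saved..r31 into runs of consecutive registers that are
   not set in fixed_mask (bit N = rN). Writes at most max_runs entries
   to out and returns how many were written. */
size_t split_runs(int first_saved, unsigned fixed_mask,
                  struct reg_run *out, size_t max_runs)
{
    size_t n = 0;
    int start = -1;

    /* r == 32 acts as a sentinel that closes a run ending at r31. */
    for (int r = first_saved; r <= 32; r++) {
        int usable = r <= 31 && !(fixed_mask & (1u << r));
        if (usable && start < 0) {
            start = r;                 /* a new run begins here */
        } else if (!usable && start >= 0) {
            if (n < max_runs) {
                out[n].first = start;
                out[n].last = r - 1;   /* run ended just before r */
                n++;
            }
            start = -1;
        }
    }
    return n;
}
```

Calling split_runs(24, 1u << 29, runs, 4) yields two runs, r24..r28
and r30..r31. With no fixed registers it yields the single run
r24..r31.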

David


* Re: powercp-linux cross GCC 4.2 vs GCC 4.0.0: -Os code size regression?  (filed bug 34903)[Emcraft #11717]
  2008-01-19 16:51                   ` Andrew Haley
  2008-01-19 19:10                     ` David Edelsohn
@ 2008-01-21 17:25                     ` Sergei Poselenov
  1 sibling, 0 replies; 21+ messages in thread
From: Sergei Poselenov @ 2008-01-21 17:25 UTC (permalink / raw)
  To: Andrew Haley; +Cc: David Edelsohn, Gabriel Paubert, gcc, 'rt'

Hello all,

I've filed bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34903

Regards,
Sergei

Andrew Haley wrote:
> David Edelsohn wrote:
>>>>>>> Andrew Haley writes:
>>
>> Andrew> I suspect that the real reason for the change in save/restore 
>> is because
>> Andrew> not using lmw/stmw is faster.  That's just a guess though.  
>> gcc could probably
>> Andrew> be fixed to use lmw/stmw if -Os is used.
>>
>> Andrew> Anyway, now we've found something specific this is for the ppc 
>> maintainer to comment.
>>
>>     GCC does use load/store multiple and load/store string
>> instructions if -Os is used, but not when the sequence is broken up by a
>> fixed register.
> 
> Err, why not?
> 
> Thanks,
> Andrew.
> 
