public inbox for gcc-help@gcc.gnu.org
* Re: Bizarrely Poor Code from Bizarre Machine-Generated C Sources
@ 2007-05-29 16:10 Barak A. Pearlmutter
  2007-05-29 23:39 ` Andrew Haley
  0 siblings, 1 reply; 7+ messages in thread
From: Barak A. Pearlmutter @ 2007-05-29 16:10 UTC (permalink / raw)
  To: gcc-help

Success!

Some working magic seems to be this:

    gcc -s -o particle1 \
	-O3 \
	-march=k8 \
	-mfpmath=sse \
	-finline-limit=100000 \
	--param large-function-insns=1000000 \
	--param inline-unit-growth=1000000 \
	--param sra-field-structure-ratio=0 \
	particle1.c -lm

although it looks like -Os gives an additional improvement.

This (with GCC 4.1) reduces code volume to about 16k from nearly 1M,
and reduces runtime by a factor of about 2700, compared to plain -O3.

Further improvements welcome.

I'd also suggest adding a section to the GCC documentation on "how to
use GCC as a back-end to another compiler" which gives some typical
magic options like the above that would be useful in circumstances
like these.
--
Barak A. Pearlmutter <barak@cs.nuim.ie>
 Hamilton Institute & Dept Comp Sci, NUI Maynooth, Co. Kildare, Ireland
 http://www.bcl.hamilton.ie/~barak/


* Re: Bizarrely Poor Code from Bizarre Machine-Generated C Sources
  2007-05-29 16:10 Bizarrely Poor Code from Bizarre Machine-Generated C Sources Barak A. Pearlmutter
@ 2007-05-29 23:39 ` Andrew Haley
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Haley @ 2007-05-29 23:39 UTC (permalink / raw)
  To: Barak A. Pearlmutter; +Cc: gcc-help

Barak A. Pearlmutter writes:
 > Success!
 > 
 > Some working magic seems to be this:
 > 
 >     gcc -s -o particle1 \
 > 	-O3 \
 > 	-march=k8 \
 > 	-mfpmath=sse \
 > 	-finline-limit=100000 \
 > 	--param large-function-insns=1000000 \
 > 	--param inline-unit-growth=1000000 \
 > 	--param sra-field-structure-ratio=0 \
 > 	particle1.c -lm
 > 
 > although it looks like -Os gives an additional improvement.
 > 
 > This (with GCC 4.1) reduces code volume to about 16k from nearly 1M,
 > and reduces runtime by a factor of about 2700, compared to plain -O3.
 > 
 > Further improvements welcome.
 > 
 > I'd also suggest adding a section to the GCC documentation on "how to
 > use GCC as a back-end to another compiler" which gives some typical
 > magic options like the above that would be useful in circumstances
 > like these.

http://gcc.gnu.org/wiki

Enjoy...

Andrew.


* Re: Bizarrely Poor Code from Bizarre Machine-Generated C Sources
  2007-05-28  7:23   ` Barak A. Pearlmutter
  2007-05-28  9:42     ` Mattias Engdegård
@ 2007-05-28 11:26     ` Rask Ingemann Lambertsen
  1 sibling, 0 replies; 7+ messages in thread
From: Rask Ingemann Lambertsen @ 2007-05-28 11:26 UTC (permalink / raw)
  To: Barak A. Pearlmutter; +Cc: gcc-help

On Sun, May 27, 2007 at 10:28:09PM +0100, Barak A. Pearlmutter wrote:
> Hope you don't mind if I ask some follow-up questions.

   Not at all.

> Yup: we had also noticed the zillions of calls to memcpy with static
> arguments.  This is part of what I meant by "unnecessary data
> shuffling".  Is there some way to tell GCC that it isn't worth calling
> memcpy to copy such short structures?

   GCC optimizes memcpy according to the size of the memory block and the
CPU it is optimizing for. I'm not sure the most recent work on optimizing
memcpy() for x86 processors went into GCC 4.2, though.
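
   For readers following along, here is a minimal sketch of the kind of copy
under discussion. The structure vec3 and the function copy_then_scale are
made up for illustration and are not taken from particle1.c. The source only
contains a plain structure assignment; whether the compiler emits a call to
memcpy, a few inline moves, or no copy at all is its own choice and varies
with version, target, and flags.

```c
/* Hypothetical 24-byte structure, standing in for the short
   structures in the generated code. */
struct vec3 { double x, y, z; };

/* The assignment below copies the whole structure by value.  The C
   source never calls memcpy; a compiler may still choose to
   implement the copy with one. */
struct vec3 copy_then_scale(struct vec3 a, double k)
{
    struct vec3 b = a;   /* structure copy */
    b.x *= k;
    b.y *= k;
    b.z *= k;
    return b;
}
```

   Compiling this with -S and searching the output for memcpy, as the grep
loop quoted in this thread does, shows which strategy a given GCC picked.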

> We could re-jigger our back end to generate FORTRAN instead of C and
> use GCC's FORTRAN stuff, maybe that would help?

   I don't know FORTRAN. I have no idea.

> > You will definitely want a lot of inlining for this sort of code, so
> > at least use -O3, but perhaps play with the inlining parameters too.
> 
> Right; -O3 didn't make any qualitative difference.  (I certainly tried
> that before posting.)  I do see a whole bunch of inline-related
> parameters in the GCC documentation, but it is not clear which I
> should tweak.  I tried -O3 -finline-limit=60000 (default 600) but
> even that doesn't make any qualitative difference.

   You *really* need to crank up those limits. I don't have GCC 4.2, but I
tried GCC 4.3 with

    --param inline-call-cost=10000
    --param max-inline-insns-auto=20000
    --param large-function-growth=1000
    --param inline-unit-growth=1000

which wasn't enough. I ran out of memory (256 MB RAM + 757 MB swap) with

    -finline-limit=60000
    --param inline-call-cost=10000
    --param max-inline-insns-auto=200000
    --param large-function-growth=10000
    --param inline-unit-growth=10000

Some versions of GCC need much more memory than others. YMMV.

   I only just noticed that you have many functions marked inline. In that
case you will also want to increase the parameter max-inline-insns-single.

-- 
Rask Ingemann Lambertsen


* Re: Bizarrely Poor Code from Bizarre Machine-Generated C Sources
  2007-05-28  7:23   ` Barak A. Pearlmutter
@ 2007-05-28  9:42     ` Mattias Engdegård
  2007-05-28 11:26     ` Rask Ingemann Lambertsen
  1 sibling, 0 replies; 7+ messages in thread
From: Mattias Engdegård @ 2007-05-28  9:42 UTC (permalink / raw)
  To: gcc-help

"Barak A. Pearlmutter" <barak@cs.nuim.ie> writes:

>But none of the structures are on the heap (no malloc) and we never
>take any addresses, so in theory they could be held in registers and
>kept in fragmented representations and that sort of thing.  I do not
>know if GCC can do that, or if there's any way to tell it to.

GCC usually (always?) obeys the platform ABI even for static
functions. Some ABIs pass small structs in registers (x86-64,
SPARC-V9, and others) but older ABIs do not (32-bit x86 in particular).

(For some reason I never understood, even the enlightened ABIs are
asymmetrical: they allow more registers for passing arguments than for
return values. Perhaps tradition and the lack of multiple return
values in C are to blame.)
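
Since C has no multiple return values, the usual workaround is to return
a small structure by value; whether it comes back in registers or through
memory is then up to the ABI. A sketch with made-up names (quot_rem and
div_mod are hypothetical, not from the generated sources):

```c
/* Hypothetical two-result function: quotient and remainder returned
   together in one small structure. */
struct quot_rem { long quot, rem; };

struct quot_rem div_mod(long n, long d)
{
    struct quot_rem r;
    r.quot = n / d;   /* truncates toward zero in C99 and later */
    r.rem  = n % d;   /* takes the sign of n in C99 and later */
    return r;
}
```

The -freg-struct-return option, which appears in the original compile
commands in this thread, asks GCC to return such structures in registers
when possible.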

>Right; -O3 didn't make any qualitative difference.  (I certainly tried
>that before posting.)  I do see a whole bunch of inline-related
>parameters in the GCC documentation, but it is not clear which I
>should tweak.  I tried -O3 -finline-limit=60000 (default 600) but
>even that doesn't make any qualitative difference.

If you think you know better than the compiler, declare your
functions with __attribute__ ((always_inline)). Example:

static inline __attribute__ ((always_inline)) double square(double x)
{ return x * x; }
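
A code generator might wrap the attribute in a macro so that its output
still compiles with other compilers. The following is only a sketch: the
macro name ALWAYS_INLINE, the structure pair, and the three functions are
all made up for illustration. Where __GNUC__ is not defined it falls back
to plain "static inline", which is merely a hint.

```c
/* ALWAYS_INLINE is a hypothetical macro name, not a GCC feature.
   The attribute itself is the one shown in the example above. */
#if defined(__GNUC__)
#define ALWAYS_INLINE static inline __attribute__ ((always_inline))
#else
#define ALWAYS_INLINE static inline
#endif

/* Hypothetical two-field structure passed and returned by value. */
struct pair { double a, b; };

ALWAYS_INLINE struct pair pair_add(struct pair p, struct pair q)
{
    struct pair r = { p.a + q.a, p.b + q.b };
    return r;
}

ALWAYS_INLINE double pair_dot(struct pair p, struct pair q)
{
    return p.a * q.a + p.b * q.b;
}

/* An ordinary external function whose callees are forced inline. */
double sum_then_dot(struct pair p, struct pair q)
{
    struct pair s = pair_add(p, q);
    return pair_dot(s, s);
}
```

With the attribute in effect, the assembly for sum_then_dot should contain
no calls to pair_add or pair_dot.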


* Re: Bizarrely Poor Code from Bizarre Machine-Generated C Sources
  2007-05-27 18:22 ` Rask Ingemann Lambertsen
@ 2007-05-28  7:23   ` Barak A. Pearlmutter
  2007-05-28  9:42     ` Mattias Engdegård
  2007-05-28 11:26     ` Rask Ingemann Lambertsen
  0 siblings, 2 replies; 7+ messages in thread
From: Barak A. Pearlmutter @ 2007-05-28  7:23 UTC (permalink / raw)
  To: Rask Ingemann Lambertsen; +Cc: gcc-help

Thanks for the hints.  I really appreciate the advice, and this access
to the secrets of the GCC initiate.

Hope you don't mind if I ask some follow-up questions.

> $ for i in *.s; do echo -n "${i}: "; grep -F -e memcpy ${i} | wc --lines; done

Yup: we had also noticed the zillions of calls to memcpy with static
arguments.  This is part of what I meant by "unnecessary data
shuffling".  Is there some way to tell GCC that it isn't worth calling
memcpy to copy such short structures?  If GCC did the copying using
explicit assembly code, it would probably be able to notice a host of
shuffling reduction opportunities using things like peephole
optimization.  At least, that was our impression from looking at the
generated code.

> The way you are using structures forces GCC to copy data around.
> ...  Change structures into scalar variables ... GCC has more
> freedom to place scalar variables than structures.

Some copying is of course unavoidable, especially at procedure call
boundaries.

But none of the structures are on the heap (no malloc) and we never
take any addresses, so in theory they could be held in registers and
kept in fragmented representations and that sort of thing.  I do not
know if GCC can do that, or if there's any way to tell it to.
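
To make the "fragmented representation" idea concrete, here is a
hand-written sketch of one computation done twice: once with a structure
passed by value, once with the fields split into separate scalars.  The
structure and all function names are hypothetical and are not taken from
particle1.c, and no claim is made about what any particular GCC version
does with either form.

```c
/* Hypothetical structure-based form, in the style described. */
struct point { double x, y; };

static struct point point_scale(struct point p, double k)
{
    struct point r = { p.x * k, p.y * k };
    return r;
}

double norm2_struct(double x, double y, double k)
{
    struct point p = { x, y };
    struct point q = point_scale(p, k);
    return q.x * q.x + q.y * q.y;
}

/* The same computation with each field as its own scalar variable,
   so the source contains no aggregate to copy. */
double norm2_scalar(double x, double y, double k)
{
    double qx = x * k;
    double qy = y * k;
    return qx * qx + qy * qy;
}
```

Comparing the assembly generated for the two functions is a quick way to
see how much of the structure traffic a given compiler removes by itself.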

We could re-jigger our back end to generate FORTRAN instead of C and
use GCC's FORTRAN stuff, maybe that would help?

> Unless you somehow manage to inline the whole program into main(), I
> don't see how it can be any different.

That would certainly be ideal!  (Modulo cycles in the call graph that
contain a non-tail-recursive link; I do not believe there is any such
cycle in this particular hunk of C code.)  Perhaps there is some way to
tell GCC that when I declare a procedure "inline", I really mean it?
As you can see, our compiler goes to some trouble to mark which
procedures it thinks should always be inlined versus which should be
the C compiler's judgement call.

> You will definitely want a lot of inlining for this sort of code, so
> at least use -O3, but perhaps play with the inlining parameters too.

Right; -O3 didn't make any qualitative difference.  (I certainly tried
that before posting.)  I do see a whole bunch of inline-related
parameters in the GCC documentation, but it is not clear which I
should tweak.  I tried -O3 -finline-limit=60000 (default 600) but
even that doesn't make any qualitative difference.

					Cheers & Thanks,

					--Barak.


* Re: Bizarrely Poor Code from Bizarre Machine-Generated C Sources
  2007-05-27 15:11 Barak A. Pearlmutter
@ 2007-05-27 18:22 ` Rask Ingemann Lambertsen
  2007-05-28  7:23   ` Barak A. Pearlmutter
  0 siblings, 1 reply; 7+ messages in thread
From: Rask Ingemann Lambertsen @ 2007-05-27 18:22 UTC (permalink / raw)
  To: Barak A. Pearlmutter; +Cc: gcc-help

On Sun, May 27, 2007 at 03:05:38PM +0100, Barak A. Pearlmutter wrote:

> In particular, it defines gobs of new
> structure types and gobs of very very short functions, and there are
> no pointers used.  It should be possible, using the optimization
> techniques already present in GCC, for very tense machine code to be
> generated from this admittedly strange FORTRAN-style C source code.
> But instead, the assembly code GCC generates is full of unnecessary
> data shuffling.

   The way you are using structures forces GCC to copy data around. Unless
you somehow manage to inline the whole program into main(), I don't see how
it can be any different.

>  - Some small change we could make to the generated C sources that
>    would cause it to be optimized well.  (Add some magic __attribute__
>    somewhere.)

   Change the structures into scalar variables for a start. GCC has more
freedom to place scalar variables than structures. Also, try to arrange
function parameters such that sibling call optimization has a chance of
working.

BAD:

int g (int c, int b, int a)
{ ... }

int f (int a, int b, int c)
{
  return g (c, b, a);
}

GOOD:

int g (int a, int b, int c)
{ ... }

int f (int a, int b, int c)
{
  return g (a, b, c);
}
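
   A compilable variant of the GOOD case, for anyone who wants to inspect the
assembly. The body of g is made up, since the example above elides it. With
the arguments passed through in the order they were received, the call in f
is in tail position and is a candidate for sibling call optimization; whether
it is actually applied depends on the target and the flags.

```c
/* Hypothetical body; the example above leaves it out.  noinline is
   a GCC attribute, used here only so that the call stays visible
   when looking at the assembly. */
__attribute__ ((noinline))
int g (int a, int b, int c)
{
  return a * 100 + b * 10 + c;
}

/* The call is the last action of f and passes a, b, c through in
   the order they were received. */
int f (int a, int b, int c)
{
  return g (a, b, c);
}
```

   On x86 a sibling call typically shows up in the -O2 -S output as a jmp to
g at the end of f, rather than a call followed by ret.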

> Below are notes that include detailed version information on the
> compilers used.  In the notes below we used
>  -O2 -freg-struct-return -fomit-frame-pointer -mfpmath=sse -msse3
> but the results don't seem to improve by changing them.

   You will definitely want a lot of inlining for this sort of code, so at
least use -O3, but perhaps play with the inlining parameters too. On a side
note, consider using -march to tell GCC which model of CPU you intend
to run the code on.

> $ wc --lines *.s
>   163922 particle1-gcc295.s
>   343012 particle1-gcc33.s
>   353057 particle1-gcc34.s
>   100697 particle1-gcc41.s
>    47030 particle1-gcc42.s

   I imagine you'll be enlightened by running

$ for i in *.s; do echo -n "${i}: "; grep -F -e memcpy ${i} | wc --lines; done

-- 
Rask Ingemann Lambertsen


* Bizarrely Poor Code from Bizarre Machine-Generated C Sources
@ 2007-05-27 15:11 Barak A. Pearlmutter
  2007-05-27 18:22 ` Rask Ingemann Lambertsen
  0 siblings, 1 reply; 7+ messages in thread
From: Barak A. Pearlmutter @ 2007-05-27 15:11 UTC (permalink / raw)
  To: gcc-help

A colleague and I have developed a fancy compiler for a new sort of
advanced numeric programming language.  The output of this compiler is
C source code.  Although optimized in some respects, this C is
somewhat bizarre in others.  In particular, it defines gobs of new
structure types and gobs of very very short functions, and there are
no pointers used.  It should be possible, using the optimization
techniques already present in GCC, for very tense machine code to be
generated from this admittedly strange FORTRAN-style C source code.
But instead, the assembly code GCC generates is full of unnecessary
data shuffling.  There is so much of it that it dominates the actual
useful arithmetic instructions by a factor of hundreds, causing a
slowdown in the generated executable of a similar magnitude.  The poor
optimization is present no matter what we try, across all versions of
GCC and all optimization flags, although it does seem to be a little
better in GCC 4.2.

What I'm hoping for is one of the following:

 - Some new GCC option magic that would get this all optimized.

 - Some small change we could make to the generated C sources that
   would cause it to be optimized well.  (Add some magic __attribute__
   somewhere.)

 - Some other magic (rebuild GCC with build option XXX, or patch the
   GCC sources *here* and *here*) that would make it optimize well.

 - Some combination of the above.

 - A pointer to some other compiler (horrors!) that would optimize
   this well.

The C sources, and generated assembly, are too long to attach below.
Instead, I am making them available at

 http://www.bcl.hamilton.ie/~barak/stalingrad-vs-gcc/

Below are notes that include detailed version information on the
compilers used.  In the notes below we used
 -O2 -freg-struct-return -fomit-frame-pointer -mfpmath=sse -msse3
but the results don't seem to improve by changing them.

Our thanks, to anyone who takes up the challenge, for looking at and
thinking about this issue.
--
Barak A. Pearlmutter <barak@cs.nuim.ie>
 Hamilton Institute & Dept Comp Sci, NUI Maynooth, Co. Kildare, Ireland
 http://www.bcl.hamilton.ie/~barak/

----------------------------------------------------------------
--- NOTES ---
----------------------------------------------------------------

$ gcc-4.1 -v

Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --with-tune=i686 --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

$ gcc-4.1 -S -O2 -Wall -freg-struct-return -fomit-frame-pointer -mfpmath=sse -msse3 particle1.c

particle1.c:10763: warning: 'f95' defined but not used
particle1.c:10775: warning: 'f110' defined but not used
particle1.c:10788: warning: 'f126' defined but not used
particle1.c:10887: warning: 'f273' defined but not used
particle1.c:10888: warning: 'f274' defined but not used
particle1.c:10889: warning: 'f275' defined but not used
particle1.c:10890: warning: 'f277' defined but not used
particle1.c:12456: warning: 'f2456' defined but not used
particle1.c:12478: warning: 'f2482' defined but not used
particle1.c:12583: warning: 'f2623' defined but not used
particle1.c:12631: warning: 'f2690' defined but not used
particle1.c:12678: warning: 'f2752' defined but not used
particle1.c:12720: warning: 'f2828' defined but not used

$ mv particle1.s particle1-gcc41.s 



$ gcc-4.2 -v

Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --enable-targets=all --disable-werror --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.2.1 20070525 (prerelease) (Debian 4.2-20070525-1)

$ gcc-4.2 -S -O2 -Wall -freg-struct-return -fomit-frame-pointer -mfpmath=sse -msse3 particle1.c

particle1.c:10763: warning: 'f95' defined but not used
particle1.c:10775: warning: 'f110' defined but not used
particle1.c:10788: warning: 'f126' defined but not used
particle1.c:10887: warning: 'f273' defined but not used
particle1.c:10888: warning: 'f274' defined but not used
particle1.c:10889: warning: 'f275' defined but not used
particle1.c:10890: warning: 'f277' defined but not used
particle1.c:12456: warning: 'f2456' defined but not used
particle1.c:12478: warning: 'f2482' defined but not used
particle1.c:12583: warning: 'f2623' defined but not used
particle1.c:12631: warning: 'f2690' defined but not used
particle1.c:12678: warning: 'f2752' defined but not used
particle1.c:12720: warning: 'f2828' defined but not used

$ mv particle1.s particle1-gcc42.s 



$ gcc-2.95 -v

Reading specs from /usr/lib/gcc-lib/i486-linux-gnu/2.95.4/specs
gcc version 2.95.4 20011002 (Debian prerelease)

$ gcc-2.95 -S -O2 -Wall -freg-struct-return -fomit-frame-pointer -mfpmath=sse -msse3 particle1.c

cc1: Invalid option `fpmath=sse'
cc1: Invalid option `sse3'
particle1.c: In function `write_real':
particle1.c:7: warning: use of `l' length character with `g' type character
particle1.c: At top level:
particle1.c:10763: warning: `f95' defined but not used
particle1.c:10775: warning: `f110' defined but not used
particle1.c:10788: warning: `f126' defined but not used
particle1.c:10887: warning: `f273' defined but not used
particle1.c:10888: warning: `f274' defined but not used
particle1.c:10889: warning: `f275' defined but not used
particle1.c:10890: warning: `f277' defined but not used
particle1.c:12456: warning: `f2456' defined but not used
particle1.c:12478: warning: `f2482' defined but not used
particle1.c:12583: warning: `f2623' defined but not used
particle1.c:12631: warning: `f2690' defined but not used
particle1.c:12678: warning: `f2752' defined but not used
particle1.c:12720: warning: `f2828' defined but not used

$ mv particle1.s particle1-gcc295.s 



$ gcc-3.3 -v

Reading specs from /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/specs
Configured with: ../src/configure -v --enable-languages=c,c++ --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared --enable-__cxa_atexit --with-system-zlib --enable-nls --without-included-gettext --enable-clocale=gnu --enable-debug i486-linux-gnu
Thread model: posix
gcc version 3.3.6 (Debian 1:3.3.6-15)

$ gcc-3.3 -S -O2 -Wall -freg-struct-return -fomit-frame-pointer -mfpmath=sse -msse3 particle1.c

particle1.c:10763: warning: `f95' defined but not used
particle1.c:10775: warning: `f110' defined but not used
particle1.c:10788: warning: `f126' defined but not used
particle1.c:10887: warning: `f273' defined but not used
particle1.c:10888: warning: `f274' defined but not used
particle1.c:10889: warning: `f275' defined but not used
particle1.c:10890: warning: `f277' defined but not used
particle1.c:12456: warning: `f2456' defined but not used
particle1.c:12478: warning: `f2482' defined but not used
particle1.c:12583: warning: `f2623' defined but not used
particle1.c:12631: warning: `f2690' defined but not used
particle1.c:12678: warning: `f2752' defined but not used
particle1.c:12720: warning: `f2828' defined but not used

$ mv particle1.s particle1-gcc33.s 



$ gcc-3.4 -v

Reading specs from /usr/lib/gcc/i486-linux-gnu/3.4.6/specs
Configured with: ../src/configure -v --enable-languages=c,c++,f77,pascal --prefix=/usr --libexecdir=/usr/lib --with-gxx-include-dir=/usr/include/c++/3.4 --enable-shared --with-system-zlib --enable-nls --without-included-gettext --program-suffix=-3.4 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --with-tune=i686 i486-linux-gnu
Thread model: posix
gcc version 3.4.6 (Debian 3.4.6-5)

$ gcc-3.4 -S -O2 -Wall -freg-struct-return -fomit-frame-pointer -mfpmath=sse -msse3 particle1.c

particle1.c:10763: warning: 'f95' defined but not used
particle1.c:10775: warning: 'f110' defined but not used
particle1.c:10788: warning: 'f126' defined but not used
particle1.c:10887: warning: 'f273' defined but not used
particle1.c:10888: warning: 'f274' defined but not used
particle1.c:10889: warning: 'f275' defined but not used
particle1.c:10890: warning: 'f277' defined but not used
particle1.c:12456: warning: 'f2456' defined but not used
particle1.c:12478: warning: 'f2482' defined but not used
particle1.c:12583: warning: 'f2623' defined but not used
particle1.c:12631: warning: 'f2690' defined but not used
particle1.c:12678: warning: 'f2752' defined but not used
particle1.c:12720: warning: 'f2828' defined but not used

$ mv particle1.s particle1-gcc34.s 



$ gcc -o particle1 particle1.c -lm

$ ./particle1
0.01999188620615792


$ ls -l
-rw-rw-r-- 1 barak barak    6764 2007-05-27 14:38 NOTES
-rwxrwxr-x 1 barak barak  736714 2007-05-27 13:08 particle1
-rw-r--r-- 1 barak barak  901853 2007-05-27 12:14 particle1.c
-rw-r--r-- 1 barak barak 2383226 2007-05-27 12:41 particle1-gcc295.s
-rw-r--r-- 1 barak barak 7291988 2007-05-27 12:46 particle1-gcc33.s
-rw-r--r-- 1 barak barak 8005026 2007-05-27 12:55 particle1-gcc34.s
-rw-rw-r-- 1 barak barak 1703481 2007-05-27 12:33 particle1-gcc41.s
-rw-r--r-- 1 barak barak 1000722 2007-05-27 12:36 particle1-gcc42.s

$ wc --lines particle1.c
   12825 particle1.c

$ wc --lines *.s
  163922 particle1-gcc295.s
  343012 particle1-gcc33.s
  353057 particle1-gcc34.s
  100697 particle1-gcc41.s
   47030 particle1-gcc42.s

