public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* profile-driven optimization and the linker?
@ 2004-11-25 19:49 Dan Kegel
  2004-11-25 20:38 ` Jeffrey A Law
  0 siblings, 1 reply; 14+ messages in thread
From: Dan Kegel @ 2004-11-25 19:49 UTC (permalink / raw)
  To: GCC Mailing List

http://gcc.gnu.org/news/profiledriven.html doesn't mention
having ld reorder sections to improve locality of reference
of frequently called functions.    Has that technique
been tried with gcc and binutils?
- Dan

-- 
Trying to get a job as a c++ developer?  See http://kegel.com/academy/getting-hired.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: profile-driven optimization and the linker?
  2004-11-25 19:49 profile-driven optimization and the linker? Dan Kegel
@ 2004-11-25 20:38 ` Jeffrey A Law
  2004-11-25 21:07   ` Dan Kegel
                     ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Jeffrey A Law @ 2004-11-25 20:38 UTC (permalink / raw)
  To: Dan Kegel; +Cc: GCC Mailing List

On Thu, 2004-11-25 at 10:14 -0800, Dan Kegel wrote:
> http://gcc.gnu.org/news/profiledriven.html doesn't mention
> having ld reorder sections to improve locality of reference
> of frequently called functions.    Has that technique
> been tried with gcc and binutils?
Yes.  I had some success with it a while back.

With ELF it's not terribly hard since we can arrange for
each function to be put in its own section.  Thus you get
to do function-level reordering by just twiddling a linker
map.  In fact, that's where -ffunction-sections name from.

We had some hacks to gprof which would take profile data
and build a linker map.  That's the code that really needs
to be improved.  I never understood the gprof code well
enough to do it right.

Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: profile-driven optimization and the linker?
  2004-11-25 20:38 ` Jeffrey A Law
@ 2004-11-25 21:07   ` Dan Kegel
  2004-11-25 22:52     ` Joseph S. Myers
  2004-11-25 23:13     ` Jeffrey A Law
  2004-11-26  0:01   ` Ben Elliston
  2004-11-29 16:20   ` Will Cohen
  2 siblings, 2 replies; 14+ messages in thread
From: Dan Kegel @ 2004-11-25 21:07 UTC (permalink / raw)
  To: law; +Cc: GCC Mailing List, bastian

Jeffrey A Law wrote:
>>http://gcc.gnu.org/news/profiledriven.html doesn't mention
>>having ld reorder sections to improve locality of reference
>>of frequently called functions.    Has that technique
>>been tried with gcc and binutils?
> 
> Yes.  I had some success with it a while back.

Was that on static executables, or dynamically linked ones?

> With ELF it's not terribly hard since we can arrange for
> each function to be put in its own section.  Thus you get
> to do function-level reordering by just twiddling a linker
> map.  In fact, that's where -ffunction-sections name from.
> 
> We had some hacks to gprof which would take profile data
> and build a linker map.  That's the code that really needs
> to be improved.  I never understood the gprof code well
> enough to do it right.

I may take a whack at this in my copious free time.
Are any such hacks online, or shall I just start from scratch?

I guess further discussion should be on the binutils mailing list.
- Dan

p.s.

I first heard of this technique about 1993 or so, when
Lee Hasiuk implemented it on MS-DOS.

http://mail.kde.org/pipermail/kde-optimize/2003-January/000108.html
describes something similar, but which only aimed at
increasing *disk* cache locality when paging in functions from
disk for the first call.

FWIW, Alan Cox mentioned the technique a couple weeks ago,
http://marc.theaimsgroup.com/?l=linux-kernel&m=109935741612533&w=2
but didn't mention any code.

-- 
Trying to get a job as a c++ developer?  See http://kegel.com/academy/getting-hired.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: profile-driven optimization and the linker?
  2004-11-25 21:07   ` Dan Kegel
@ 2004-11-25 22:52     ` Joseph S. Myers
  2004-11-26  5:32       ` Dan Kegel
  2004-11-29 18:05       ` Joe Buck
  2004-11-25 23:13     ` Jeffrey A Law
  1 sibling, 2 replies; 14+ messages in thread
From: Joseph S. Myers @ 2004-11-25 22:52 UTC (permalink / raw)
  To: Dan Kegel; +Cc: law, GCC Mailing List, bastian

On Thu, 25 Nov 2004, Dan Kegel wrote:

> I may take a whack at this in my copious free time.
> Are any such hacks online, or shall I just start from scratch?

You could try to get hold of the semi-mythical "GNU Rope".  (Most of the 
online references are people asking what happened to it, e.g. 
<http://www.advogato.org/article/660.html>.)

-- 
Joseph S. Myers               http://www.srcf.ucam.org/~jsm28/gcc/
    jsm@polyomino.org.uk (personal mail)
    joseph@codesourcery.com (CodeSourcery mail)
    jsm28@gcc.gnu.org (Bugzilla assignments and CCs)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: profile-driven optimization and the linker?
  2004-11-25 21:07   ` Dan Kegel
  2004-11-25 22:52     ` Joseph S. Myers
@ 2004-11-25 23:13     ` Jeffrey A Law
  2004-11-26  5:08       ` Dan Kegel
  2004-11-26  6:19       ` Dan Kegel
  1 sibling, 2 replies; 14+ messages in thread
From: Jeffrey A Law @ 2004-11-25 23:13 UTC (permalink / raw)
  To: Dan Kegel; +Cc: GCC Mailing List, bastian

On Thu, 2004-11-25 at 11:59 -0800, Dan Kegel wrote:
> Jeffrey A Law wrote:
> >>http://gcc.gnu.org/news/profiledriven.html doesn't mention
> >>having ld reorder sections to improve locality of reference
> >>of frequently called functions.    Has that technique
> >>been tried with gcc and binutils?
> > 
> > Yes.  I had some success with it a while back.
> 
> Was that on static executables, or dynamically linked ones?
I've done both at different times with different linker technologies.

> 
> > With ELF it's not terribly hard since we can arrange for
> > each function to be put in its own section.  Thus you get
> > to do function-level reordering by just twiddling a linker
> > map.  In fact, that's where -ffunction-sections name from.
> > 
> > We had some hacks to gprof which would take profile data
> > and build a linker map.  That's the code that really needs
> > to be improved.  I never understood the gprof code well
> > enough to do it right.
> 
> I may take a whack at this in my copious free time.
> Are any such hacks online, or shall I just start from scratch?
I believe we put all the code into gprof, though I haven't followed
it over time and it may have been removed.

Basically there was two pieces to gprof.  The first within gprof
itself to produce an ordering of functions.  That code will
probably need to be rewritten (I never liked the way I did it
and I never got the coalescing of information step done).

The second piece was a simple shell script which took the ordering
produced by gprof and created a linker script.  Nothing too radical
here.


> I guess further discussion should be on the binutils mailing list.
Probably.  From a compiler standpoint I think all the pieces are in
place.


> I first heard of this technique about 1993 or so, when
> Lee Hasiuk implemented it on MS-DOS.
I did a lot of this kind of stuff in the early 90s -- back when
a.out was still popular.  With some assembler help we were able
to do function level reordering even on a.out files.  There's
some awful memories...  I have no idea if OMOS lives on or not.

The gcc/gas/gprof stuff was done in the mid 90s.


> http://mail.kde.org/pipermail/kde-optimize/2003-January/000108.html
> describes something similar, but which only aimed at
> increasing *disk* cache locality when paging in functions from
> disk for the first call.
We were primarily measuring TLB effects with ours -- page locality
for disk access falls out from the obvious algorithms.  Though IIRC
we actually did better by first removing all the single call nodes
from the graph and putting them into one contigious region, then
applying the greedy algorithm on the rest of the call graph.

> FWIW, Alan Cox mentioned the technique a couple weeks ago,
> http://marc.theaimsgroup.com/?l=linux-kernel&m=109935741612533&w=2
> but didn't mention any code.
I've heard of GNU Rope, but I've never seen any code...

jeff


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: profile-driven optimization and the linker?
  2004-11-25 20:38 ` Jeffrey A Law
  2004-11-25 21:07   ` Dan Kegel
@ 2004-11-26  0:01   ` Ben Elliston
  2004-11-29 16:20   ` Will Cohen
  2 siblings, 0 replies; 14+ messages in thread
From: Ben Elliston @ 2004-11-26  0:01 UTC (permalink / raw)
  To: gcc

Jeffrey A Law <law@redhat.com> writes:

> We had some hacks to gprof which would take profile data and build a
> linker map.  That's the code that really needs to be improved.  I
> never understood the gprof code well enough to do it right.

A year or two ago, I wrote a Python program that could compile and
link a program (with -fdata-sections), run it under a simulator,
collect memory traffic stats and re-run the linker with a custom
linker script that would rearrange global variables so that they were
allocated to faster memory regions (on systems where this was the
case).

The idea was that this could be used as a wrapper around the compiler
driver.  It never got quite that refined, but the idea seemed sound.

Ben

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: profile-driven optimization and the linker?
  2004-11-25 23:13     ` Jeffrey A Law
@ 2004-11-26  5:08       ` Dan Kegel
  2004-11-26  6:19       ` Dan Kegel
  1 sibling, 0 replies; 14+ messages in thread
From: Dan Kegel @ 2004-11-26  5:08 UTC (permalink / raw)
  To: law; +Cc: GCC Mailing List, bastian

Jeffrey A Law wrote:
> On Thu, 2004-11-25 at 11:59 -0800, Dan 
>>>>http://gcc.gnu.org/news/profiledriven.html doesn't mention
>>>>having ld reorder sections to improve locality of reference
>>>>of frequently called functions.    Has that technique
>>>>been tried with gcc and binutils?
>>>
>>>Yes.  I had some success with it a while back.  ...
> 
> I believe we put all the code into gprof, though I haven't followed
> it over time and it may have been removed.
> 
> Basically there was two pieces to gprof.  The first within gprof
> itself to produce an ordering of functions.  That code will
> probably need to be rewritten (I never liked the way I did it
> and I never got the coalescing of information step done).
> 
> The second piece was a simple shell script which took the ordering
> produced by gprof and created a linker script.  Nothing too radical
> here.
> ...
> We were primarily measuring TLB effects with ours -- page locality
> for disk access falls out from the obvious algorithms.  Though IIRC
> we actually did better by first removing all the single call nodes
> from the graph and putting them into one contigious region, then
> applying the greedy algorithm on the rest of the call graph.

Hey, looks like it might still be there in current gprof.
(The grope paper describes your algorithm, btw.)

Here are the gprof options from the doc:

--function-ordering
     The `--function-ordering' option causes gprof to print a suggested function ordering for the program based on profiling data. This option suggests an ordering which may improve paging, tlb and cache behavior for the program on systems 
which support arbitrary ordering of functions in an executable. The exact details of how to force the linker to place functions in a particular order is system dependent and out of the scope of this manual.
--file-ordering map_file
     The `--file-ordering' option causes gprof to print a suggested .o link line ordering for the program based on profiling data. This option suggests an ordering which may improve paging, tlb and cache behavior for the program on systems 
which do not support arbitrary ordering of functions in an executable. Use of the `-a' argument is highly recommended with this option. The map_file argument is a pathname to a file which provides function name to object file mappings. The 
format of the file is similar to the output of the program nm.

c-parse.o:00000000 T yyparse
c-parse.o:00000004 C yyerrflag
c-lang.o:00000000 T maybe_objc_method_name
c-lang.o:00000000 T print_lang_statistics
c-lang.o:00000000 T recognize_objc_keyword
c-decl.o:00000000 T print_lang_identifier
c-decl.o:00000000 T print_lang_type
...
     GNU nm `--extern-only' `--defined-only' `-v' `--print-file-name' can be used to create map_file.

Sounds like it ought to be easy to try.

Thanks,
Dan

-- 
Trying to get a job as a c++ developer?  See http://kegel.com/academy/getting-hired.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: profile-driven optimization and the linker?
  2004-11-25 22:52     ` Joseph S. Myers
@ 2004-11-26  5:32       ` Dan Kegel
  2004-11-29 18:05       ` Joe Buck
  1 sibling, 0 replies; 14+ messages in thread
From: Dan Kegel @ 2004-11-26  5:32 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: law, GCC Mailing List, bastian

Joseph S. Myers wrote:
> On Thu, 25 Nov 2004, Dan Kegel wrote:
> 
> 
>>I may take a whack at this in my copious free time.
>>Are any such hacks online, or shall I just start from scratch?
> 
> 
> You could try to get hold of the semi-mythical "GNU Rope".  (Most of the 
> online references are people asking what happened to it, e.g. 
> <http://www.advogato.org/article/660.html>.)

I just emailed Nat, but I have little hope of succeeding where others
have failed.

And then there's SCO's "fur", said to have been released under
a BSD license (http://www.aplawrence.com/News/sconews0051.html).
Anyone have a copy of that?
- Dan


-- 
Trying to get a job as a c++ developer?  See http://kegel.com/academy/getting-hired.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: profile-driven optimization and the linker?
  2004-11-25 23:13     ` Jeffrey A Law
  2004-11-26  5:08       ` Dan Kegel
@ 2004-11-26  6:19       ` Dan Kegel
  2004-11-26  8:55         ` Dan Kegel
  2004-11-29 16:30         ` Will Cohen
  1 sibling, 2 replies; 14+ messages in thread
From: Dan Kegel @ 2004-11-26  6:19 UTC (permalink / raw)
  To: law; +Cc: GCC Mailing List, bastian

I went looking for instances of people using gprof's --function-ordering
option, but came up fairly empty handed.  Anyone
have good references?  Here's the little bit I dug up:

Ben's post just now was pretty good, even though that was for data and not code:
http://gcc.gnu.org/ml/gcc/2004-11/msg01002.html
Did reordering the global variables end up helping performance on any
interesting tasks?

http://gcc.gnu.org/ml/gcc-patches/2001-11/msg00935.html says
"Steve Christiansen tried using gprof output to create a linker
script that orders functions based on run-time call graphs
and call counts, but couldn't show that it made a difference, based on SPEC CPU2000 results."
(Since the --function-ordering option was added to gprof at
the end of 1995, I imagine Steve Christiansen used it.
On the other hand, since glibc aborted with more
than 64K symbols when run with -pg until late 2002,
maybe he ran into trouble there.)

http://vr3.uid0.sk/cd/Mailinglist_Archives/agenda-mail/agenda-user/200107/995378424.txt says
Jay Carlson gave it a try, but doesn't have any results.

http://sources.redhat.com/binutils/docs-2.15/ld/Input-Section.html#Input%20Section
describes (just barely) how one writes a linker script
to map a given list of input sections to an output section.
That plus gcc's -ffunction-sections option are probably needed
to use this.
- Dan

-- 
Trying to get a job as a c++ developer?  See http://kegel.com/academy/getting-hired.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: profile-driven optimization and the linker?
  2004-11-26  6:19       ` Dan Kegel
@ 2004-11-26  8:55         ` Dan Kegel
  2004-11-29 16:30         ` Will Cohen
  1 sibling, 0 replies; 14+ messages in thread
From: Dan Kegel @ 2004-11-26  8:55 UTC (permalink / raw)
  To: Dan Kegel; +Cc: law, GCC Mailing List, bastian

Dan Kegel wrote:
> I went looking for instances of people using gprof's --function-ordering
> option, but came up fairly empty handed.  Anyone
> have good references?  Here's the little bit I dug up:

One more example: mozilla does something like this (driven
not by gprof, but by some other execution trace analyzer);
the perl script that generates the linker script, along with
other interesting stuff, is the last attachment to
https://bugzilla.mozilla.org/show_bug.cgi?id=65845
- Dan

> Ben's post just now was pretty good, even though that was for data and 
> not code:
> http://gcc.gnu.org/ml/gcc/2004-11/msg01002.html
> Did reordering the global variables end up helping performance on any
> interesting tasks?
> 
> http://gcc.gnu.org/ml/gcc-patches/2001-11/msg00935.html says
> "Steve Christiansen tried using gprof output to create a linker
> script that orders functions based on run-time call graphs
> and call counts, but couldn't show that it made a difference, based on 
> SPEC CPU2000 results."
> (Since the --function-ordering option was added to gprof at
> the end of 1995, I imagine Steve Christiansen used it.
> On the other hand, since glibc aborted with more
> than 64K symbols when run with -pg until late 2002,
> maybe he ran into trouble there.)
> 
> http://vr3.uid0.sk/cd/Mailinglist_Archives/agenda-mail/agenda-user/200107/995378424.txt 
> says
> Jay Carlson gave it a try, but doesn't have any results.
> 
> http://sources.redhat.com/binutils/docs-2.15/ld/Input-Section.html#Input%20Section 
> 
> describes (just barely) how one writes a linker script
> to map a given list of input sections to an output section.
> That plus gcc's -ffunction-sections option are probably needed
> to use this.
> - Dan
> 


-- 
Trying to get a job as a c++ developer?  See http://kegel.com/academy/getting-hired.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: profile-driven optimization and the linker?
  2004-11-25 20:38 ` Jeffrey A Law
  2004-11-25 21:07   ` Dan Kegel
  2004-11-26  0:01   ` Ben Elliston
@ 2004-11-29 16:20   ` Will Cohen
  2004-11-30  3:24     ` Dan Kegel
  2 siblings, 1 reply; 14+ messages in thread
From: Will Cohen @ 2004-11-29 16:20 UTC (permalink / raw)
  To: law; +Cc: Dan Kegel, GCC Mailing List

Jeffrey A Law wrote:
> On Thu, 2004-11-25 at 10:14 -0800, Dan Kegel wrote:
> 
>>http://gcc.gnu.org/news/profiledriven.html doesn't mention
>>having ld reorder sections to improve locality of reference
>>of frequently called functions.    Has that technique
>>been tried with gcc and binutils?
> 
> Yes.  I had some success with it a while back.
> 
> With ELF it's not terribly hard since we can arrange for
> each function to be put in its own section.  Thus you get
> to do function-level reordering by just twiddling a linker
> map.  In fact, that's where -ffunction-sections name from.
> 
> We had some hacks to gprof which would take profile data
> and build a linker map.  That's the code that really needs
> to be improved.  I never understood the gprof code well
> enough to do it right.
> 
> Jeff
> 

There are a couple of students doing a ECE senior design project at 
North Carolina Statue University using valgrind/callgrind to collect 
call-graph information. The information is feed into a program to 
generate a link script. It should do something similar to ROPE assuming 
that the code is compiled with -ffunction-section. There should be some 
information available on it in the near future, the NCSU semester is 
wrapping up.

-Will

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: profile-driven optimization and the linker?
  2004-11-26  6:19       ` Dan Kegel
  2004-11-26  8:55         ` Dan Kegel
@ 2004-11-29 16:30         ` Will Cohen
  1 sibling, 0 replies; 14+ messages in thread
From: Will Cohen @ 2004-11-29 16:30 UTC (permalink / raw)
  To: Dan Kegel; +Cc: law, GCC Mailing List, bastian

Dan Kegel wrote:

> 
> http://gcc.gnu.org/ml/gcc-patches/2001-11/msg00935.html says
> "Steve Christiansen tried using gprof output to create a linker
> script that orders functions based on run-time call graphs
> and call counts, but couldn't show that it made a difference, based on 
> SPEC CPU2000 results."
> (Since the --function-ordering option was added to gprof at
> the end of 1995, I imagine Steve Christiansen used it.
> On the other hand, since glibc aborted with more
> than 64K symbols when run with -pg until late 2002,
> maybe he ran into trouble there.)

SPEC CPU2000 is not likely to be a big winner for reordering fuctions. 
There are small sections of code that consume most of the excution time 
for most of the benchmarks. Bigger winners are going to be code where 
things do not fit neatly into a cache or a limited number of pages, e.g. 
code using many shared libraries.

On some versions of the Pentium III there are only 32 instruction TLB 
entries, each for a 4K page. That isn't going to go very far when there 
are over 100 shared libraries mapped as in the case of openoffice. Each 
library has to be on a separate page.

-Will

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: profile-driven optimization and the linker?
  2004-11-25 22:52     ` Joseph S. Myers
  2004-11-26  5:32       ` Dan Kegel
@ 2004-11-29 18:05       ` Joe Buck
  1 sibling, 0 replies; 14+ messages in thread
From: Joe Buck @ 2004-11-29 18:05 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Dan Kegel, law, GCC Mailing List, bastian

On Thu, Nov 25, 2004 at 09:16:55PM +0000, Joseph S. Myers wrote:
> You could try to get hold of the semi-mythical "GNU Rope".  (Most of the 
> online references are people asking what happened to it, e.g. 
> <http://www.advogato.org/article/660.html>.)

Well, the guy who did it (Nat Friedman) found a more interesting project
to spend his time on (he started a company called Ximian that you might
have heard of), so it appears that GNU Rope was orphaned.  The web site
mentioned in the GNU Rope paper no longer exists.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: profile-driven optimization and the linker?
  2004-11-29 16:20   ` Will Cohen
@ 2004-11-30  3:24     ` Dan Kegel
  0 siblings, 0 replies; 14+ messages in thread
From: Dan Kegel @ 2004-11-30  3:24 UTC (permalink / raw)
  To: Will Cohen; +Cc: law, GCC Mailing List

Will Cohen wrote:
>>> http://gcc.gnu.org/news/profiledriven.html doesn't mention
>>> having ld reorder sections to improve locality of reference
>>> of frequently called functions.    Has that technique
>>> been tried with gcc and binutils?
>>
>> Yes.  I had some success with it a while back.  ...
>> We had some hacks to gprof which would take profile data
>> and build a linker map.  That's the code that really needs
>> to be improved.  I never understood the gprof code well
>> enough to do it right.  ...
> 
> There are a couple of students doing a ECE senior design project at 
> North Carolina Statue University using valgrind/callgrind to collect 
> call-graph information. The information is feed into a program to 
> generate a link script. It should do something similar to ROPE assuming 
> that the code is compiled with -ffunction-section. There should be some 
> information available on it in the near future, the NCSU semester is 
> wrapping up.

Excellent.  I'm trying to put together a similar project for
a course at UCLA, and it'd be nice to be able to build on
what the NCSU project did.
- Dan

-- 
Trying to get a job as a c++ developer?  See http://kegel.com/academy/getting-hired.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2004-11-30  2:56 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-11-25 19:49 profile-driven optimization and the linker? Dan Kegel
2004-11-25 20:38 ` Jeffrey A Law
2004-11-25 21:07   ` Dan Kegel
2004-11-25 22:52     ` Joseph S. Myers
2004-11-26  5:32       ` Dan Kegel
2004-11-29 18:05       ` Joe Buck
2004-11-25 23:13     ` Jeffrey A Law
2004-11-26  5:08       ` Dan Kegel
2004-11-26  6:19       ` Dan Kegel
2004-11-26  8:55         ` Dan Kegel
2004-11-29 16:30         ` Will Cohen
2004-11-26  0:01   ` Ben Elliston
2004-11-29 16:20   ` Will Cohen
2004-11-30  3:24     ` Dan Kegel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).