public inbox for gcc@gcc.gnu.org
* re: Large, modular C++ application performance ...
@ 2005-07-30 15:19 dank
  2005-08-01  9:53 ` michael meeks
  0 siblings, 1 reply; 16+ messages in thread
From: dank @ 2005-07-30 15:19 UTC (permalink / raw)
  To: gcc; +Cc: michael.meeks

MM wrote in http://go-oo.org/~michael/OOoStartup.pdf:
"... not one slot was overridden by an implementation
method external to the implementing library."

Hmm.  For some reason that reminds me of the 'final'
keyword which is periodically proposed
(e.g. http://gcc.gnu.org/ml/gcc/2004-02/msg01483.html).
Is this a situation where 'final' would have a benefit?


* re: Large, modular C++ application performance ...
  2005-07-30 15:19 Large, modular C++ application performance dank
@ 2005-08-01  9:53 ` michael meeks
  0 siblings, 0 replies; 16+ messages in thread
From: michael meeks @ 2005-08-01  9:53 UTC (permalink / raw)
  To: dank; +Cc: gcc

Hi Dan,

On Sat, 2005-07-30 at 11:19 -0400, dank@kegel.com wrote:
> MM wrote in http://go-oo.org/~michael/OOoStartup.pdf:
> "... not one slot was overridden by an implementation
> method external to the implementing library."

	This is really an issue rather orthogonal to that of 'final'. What I'm
trying to say (clearly, rather badly) is that in those 3 libraries
there were 0 instances of a virtual function of a class implemented
in that DSO being overridden outside that DSO.[1]

	The significance of this is that if we can mark up classes to generate
internal relocations for their overridden slots, and copy the parent
library's (also internally) relocated version for inherited slots
(during this proposed idle vtable relocation process), then we would
avoid needing ~any named relocations at all to construct these vtables,
ie. go from many tens of thousands of the slowest type of relocation to
none.
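
	[ A toy model of that copy-then-patch idea, purely for illustration: it
is not the real C++ ABI vtable layout and not code from the paper. The
names Slot, build_vtable and the sample functions are all hypothetical.
The point is only that no symbol name is looked up anywhere: inherited
slots are copied from the parent's already-relocated table, overridden
slots are patched with addresses local to the library. ]

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for a virtual function pointer (hypothetical, not the real ABI).
using Slot = int (*)();

// Sample "methods"; the return values only identify which one was called.
inline int parent_foo()  { return 1; }
inline int parent_bar()  { return 2; }
inline int derived_bar() { return 20; }  // overrides parent_bar
inline int derived_baz() { return 30; }  // new virtual added by the derived class

// Build a derived table without any by-name lookup:
//  - inherited slots are copied from the parent's already-relocated table,
//  - overridden slots are patched with library-local addresses,
//  - newly introduced slots are appended at the end.
inline std::vector<Slot> build_vtable(
    const std::vector<Slot>& parent_slots,
    const std::vector<std::pair<std::size_t, Slot>>& local_overrides,
    const std::vector<Slot>& new_slots)
{
    std::vector<Slot> table = parent_slots;
    for (const auto& entry : local_overrides) {
        assert(entry.first < table.size());  // can only override an existing slot
        table[entry.first] = entry.second;
    }
    table.insert(table.end(), new_slots.begin(), new_slots.end());
    return table;
}
```

	Calling build_vtable({parent_foo, parent_bar}, {{1, derived_bar}},
{derived_baz}) yields a three-slot table whose first slot is still the
parent's function.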

	HTH,

		Michael.

[1] - further research AFAIR showed only a handful of these instances
across all OO.o libraries.
-- 
 michael.meeks@novell.com  <><, Pseudo Engineer, itinerant idiot


* Re: Large, modular C++ application performance ...
  2005-08-02 13:57         ` H. J. Lu
@ 2005-08-02 16:15           ` michael meeks
  0 siblings, 0 replies; 16+ messages in thread
From: michael meeks @ 2005-08-02 16:15 UTC (permalink / raw)
  To: H. J. Lu
  Cc: Martin Hollmichel - Sun Germany - ham02 - Hamburg, Giovanni Bajo, gcc


On Tue, 2005-08-02 at 06:57 -0700, H. J. Lu wrote:
> Maintaining a C++ linker map isn't easy. I think gcc should help out
> here.

	What do you suggest ? - something separate from the visibility markup ?
perhaps what I'm suggesting is some horrible misuse of that. Clearly
adding a new visibility attribute that would bind that symbol
internally, yet export it would be a simple approach; did you have a
better idea ? and/or suggestions for a name ? - or is this a total
non-starter for some other reason ?

> > 	That would suit our needs beautifully - if, when used to annotate a
> > class, it would allow the various typeinfo / vague-linkage pieces
> > through as 'default'. Is it a realistic suggestion ? / if so, am happy
> > to knock up a patch.
> > 
> > 	[ and of course, this is only 1/2 the problem - the other half isn't
> > much helped by visibility markup as previously discussed ;-]
>
> Why not? If you know a symbol in DSO won't be overridden by others,
> you can resolve it locally via a linker map.

	Sure - the other (more than) 1/2 of the performance problem comes from
named relocations to symbols external to the DSO.

	Thanks,

		Michael.

-- 
 michael.meeks@novell.com  <><, Pseudo Engineer, itinerant idiot


* Re: Large, modular C++ application performance ...
  2005-08-02  9:59       ` michael meeks
@ 2005-08-02 13:57         ` H. J. Lu
  2005-08-02 16:15           ` michael meeks
  0 siblings, 1 reply; 16+ messages in thread
From: H. J. Lu @ 2005-08-02 13:57 UTC (permalink / raw)
  To: michael meeks
  Cc: Martin Hollmichel - Sun Germany - ham02 - Hamburg, Giovanni Bajo, gcc

On Tue, Aug 02, 2005 at 10:59:01AM +0100, michael meeks wrote:
> Hi H.J.,
> 
> >  Why can't you do it with ELF using a linker map? Libstdc++.so is
> > built with a linker map. Any C++ shared library should use one if the
> > startup time is a big concern. Of course, if gcc can generate a list
> > of symbols suitable for linker map, which needs to be exported, it will
> > be very helpful. I don't think it will be too hard to implement.
> 
> 	So - the thing about linker maps (cf. the ldump4 tool) is that they
> tend to be hard to maintain, not portable across platforms, a source of
> grief and problems etc. ;-) [ we have several strata of old, now defunct
> link maps lying around from previous investments of effort that
> subsequently became useless ].

Maintaining a C++ linker map isn't easy. I think gcc should help out
here.

> 
> 	As I recall, I saw a suggestion (from you I think), for a new
> visibility attribute 'export' or somesuch, that would resolve names
> internally to the library, while still exporting the symbols.

I suggested the "export" visibility to export a symbol from an
executable, even if it wasn't used by any DSOs.

> 
> 	That would suit our needs beautifully - if, when used to annotate a
> class, it would allow the various typeinfo / vague-linkage pieces
> through as 'default'. Is it a realistic suggestion ? / if so, am happy
> to knock up a patch.
> 
> 	[ and of course, this is only 1/2 the problem - the other half isn't
> much helped by visibility markup as previously discussed ;-]
> 

Why not? If you know a symbol in DSO won't be overridden by others,
you can resolve it locally via a linker map.



H.J.


* Re: Large, modular C++ application performance ...
  2005-08-01 15:55     ` H. J. Lu
@ 2005-08-02  9:59       ` michael meeks
  2005-08-02 13:57         ` H. J. Lu
  0 siblings, 1 reply; 16+ messages in thread
From: michael meeks @ 2005-08-02  9:59 UTC (permalink / raw)
  To: H. J. Lu
  Cc: Martin Hollmichel - Sun Germany - ham02 - Hamburg, Giovanni Bajo, gcc

Hi H.J.,

On Mon, 2005-08-01 at 08:55 -0700, H. J. Lu wrote:
> > 	-fvisibility is helpful - as the paper says, not as helpful as the old
> > -Bsymbolic (or link maps exposing only 3 or so functions) were. However
> > - -fvisibility can only help so much - if you have:
>
> Since you were comparing Windows vs. ELF, doesn't Windows need a file
> to define which symbols to export for a shared library ?

	Apparently so - here is my (fragmentary) understanding of that -
Martin - please do correct me. OO.o builds the .defs on Win32 with a
custom tool called 'ldump4'. That (interestingly) goes groping in some
binary file format, reads the symbol table, groks symbols tagged with
'EXPORT:', and builds a .def file. ie. it *looks* like it's automated,
and can use the API as marked (__dllexport etc.) where appropriate.

>  Why can't you do it with ELF using a linker map? Libstdc++.so is
> built with a linker map. Any C++ shared library should use one if the
> startup time is a big concern. Of course, if gcc can generate a list
> of symbols suitable for linker map, which needs to be exported, it will
> be very helpful. I don't think it will be too hard to implement.

	So - the thing about linker maps (cf. the ldump4 tool) is that they
tend to be hard to maintain, not portable across platforms, a source of
grief and problems etc. ;-) [ we have several strata of old, now defunct
link maps lying around from previous investments of effort that
subsequently became useless ].

	As I recall, I saw a suggestion (from you I think), for a new
visibility attribute 'export' or somesuch, that would resolve names
internally to the library, while still exporting the symbols.

	That would suit our needs beautifully - if, when used to annotate a
class, it would allow the various typeinfo / vague-linkage pieces
through as 'default'. Is it a realistic suggestion ? / if so, am happy
to knock up a patch.

	[ and of course, this is only 1/2 the problem - the other half isn't
much helped by visibility markup as previously discussed ;-]

	Thanks,

		Michael.

-- 
 michael.meeks@novell.com  <><, Pseudo Engineer, itinerant idiot


* Re: Large, modular C++ application performance ...
  2005-08-01 12:18       ` Steven Bosscher
@ 2005-08-02  9:22         ` michael meeks
  0 siblings, 0 replies; 16+ messages in thread
From: michael meeks @ 2005-08-02  9:22 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc, Andrew Haley, Giovanni Bajo


On Mon, 2005-08-01 at 14:18 +0200, Steven Bosscher wrote:
> On Monday 01 August 2005 11:44, michael meeks wrote:
> > 	However - the log(s) term is rather irrelevant to my argument :-)
> 
> Not really.  Maybe the oprofile results for the linker show that the
> behavior is worse, or maybe better - who knows :-)
> Have you looked at any profiles btw?  Just for the curious...

	Yes - identifying the linker and relocation processing as the root
cause of the problem isn't just a stab in the dark :-)

	This flags up as the no.1 (individual) performance killer with whatever
profiling tools you use eg.:

	* vtune
	* speedprof
	* instrumenting top/tail of dlopen calls

	etc. :-)
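
	[ For the third item, a minimal sketch of what instrumenting the top
and tail of a dlopen call could look like. timed_dlopen and TimedHandle
are hypothetical names, not anything from OO.o; the only real API used
is dlopen() itself plus std::chrono. Passing RTLD_NOW makes the loader
resolve all undefined symbols before dlopen returns, so that binding
work lands inside the measured interval. ]

```cpp
#include <cassert>
#include <chrono>
#include <dlfcn.h>

// Result of one instrumented dlopen call (hypothetical helper type).
struct TimedHandle {
    void* handle;                                  // nullptr if dlopen failed
    std::chrono::steady_clock::duration elapsed;   // wall time spent inside dlopen
};

// Wrap dlopen with a monotonic clock reading before ("top") and after ("tail").
inline TimedHandle timed_dlopen(const char* path, int flags)
{
    const auto start = std::chrono::steady_clock::now();
    void* handle = dlopen(path, flags);
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // steady_clock never goes backwards
    return TimedHandle{handle, stop - start};
}
```

	Summing `elapsed` over every library a process loads gives a rough
per-startup figure for loading and relocation. On glibc older than 2.34
this needs linking with -ldl.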

	Regards,

		Michael.

-- 
 michael.meeks@novell.com  <><, Pseudo Engineer, itinerant idiot


* Re: Large, modular C++ application performance ...
  2005-07-29 19:49 michael meeks
  2005-07-29 20:19 ` Florian Weimer
  2005-07-30 13:36 ` Giovanni Bajo
@ 2005-08-01 16:59 ` Dan Nicolaescu
  2 siblings, 0 replies; 16+ messages in thread
From: Dan Nicolaescu @ 2005-08-01 16:59 UTC (permalink / raw)
  To: michael.meeks; +Cc: gcc

michael meeks <michael.meeks@novell.com> writes:

  > Hi there,
  > 
  > 	I've been doing a little thinking about how to improve OO.o startup
  > performance recently; and - well, relocation processing happens to be
  > the single, biggest thing that most tools flag.

Have you tried eliminating all the unneeded shared libraries linked to
all the OO.o binaries and shared libraries? This should have an impact on
startup time.

ldd -u -r BINARY_OR_SHARED_LIBRARY 
should not print anything 

(as a side note Gnome is a much bigger offender on linking way too
many unused shared libraries...)


* Re: Large, modular C++ application performance ...
  2005-08-01  9:39   ` michael meeks
@ 2005-08-01 15:55     ` H. J. Lu
  2005-08-02  9:59       ` michael meeks
  0 siblings, 1 reply; 16+ messages in thread
From: H. J. Lu @ 2005-08-01 15:55 UTC (permalink / raw)
  To: michael meeks; +Cc: Giovanni Bajo, gcc

On Mon, Aug 01, 2005 at 10:38:46AM +0100, michael meeks wrote:
> Hi Giovanni,
> 
> On Sat, 2005-07-30 at 15:36 +0200, Giovanni Bajo wrote:
> > I'm slow, but I can't understand why a careful design of the interfaces of
> > the dynamic libraries
> 
> 	Well - sure, depends how 'careful' you are ;-) clearly if no C++
> classes with virtual methods form the interface of any library, then
> there is no problem ;-) unfortunately, mandating that would rather
> cripple C++.
> 
> >  together with the new -fvisibility flags, should not
> > be sufficient. It worked well in other scenarios
> 
> 	-fvisibility is helpful - as the paper says, not as helpful as the old
> -Bsymbolic (or link maps exposing only 3 or so functions) were. However
> - -fvisibility can only help so much - if you have:
> 

Since you were comparing Windows vs. ELF, doesn't Windows need a file
to define which symbols to export for a shared library? Why can't you
do it with ELF using a linker map? Libstdc++.so is built with
a linker map. Any C++ shared library should use one if the startup
time is a big concern. Of course, if gcc can generate a list of
symbols suitable for linker map, which needs to be exported, it will
be very helpful. I don't think it will be too hard to implement.


H.J.


* Re: Large, modular C++ application performance ...
  2005-08-01  9:45     ` michael meeks
@ 2005-08-01 12:18       ` Steven Bosscher
  2005-08-02  9:22         ` michael meeks
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Bosscher @ 2005-08-01 12:18 UTC (permalink / raw)
  To: gcc, michael.meeks; +Cc: Andrew Haley, Giovanni Bajo

On Monday 01 August 2005 11:44, michael meeks wrote:
> 	However - the log(s) term is rather irrelevant to my argument :-)

Not really.  Maybe the oprofile results for the linker show that the
behavior is worse, or maybe better - who knows :-)
Have you looked at any profiles btw?  Just for the curious...

Gr.
Steven


* Re: Large, modular C++ application performance ...
  2005-07-30 17:24   ` Andrew Haley
@ 2005-08-01  9:45     ` michael meeks
  2005-08-01 12:18       ` Steven Bosscher
  0 siblings, 1 reply; 16+ messages in thread
From: michael meeks @ 2005-08-01  9:45 UTC (permalink / raw)
  To: Andrew Haley; +Cc: Giovanni Bajo, gcc


On Sat, 2005-07-30 at 18:25 +0100, Andrew Haley wrote:
>  > > All input much appreciated; no doubt my terminology is irritatingly up
>  > > the creek, hopefully the sentiment will win through.
>  > >
>  > > http://go-oo.org/~michael/OOoStartup.pdf
> 
> One thing I don't understand is the formula where you write linking
> time is proportional to the log of the total number of symbols.  Does
> this come from drepper's paper, or somewhere else?

	I defer to Ulrich's text:
		http://people.redhat.com/drepper/dsohowto.pdf

	Section 1.5 of:

	"Deficiencies in the ELF hash table function and various ELF extensions
modifying the symbol lookup functionality may well increase the factor
to O(R + r.n.log(s)) where s is the number of symbols. This should make
clear that for improved performance it is significant to reduce the
number of relocations and symbols as much as possible".

	However - the log(s) term is rather irrelevant to my argument :-)
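
	[ To see why the log(s) term matters less than the r.n product, the
quoted bound can be written as a tiny function. This is only a sketch:
the function name is made up, constant factors are ignored, and the
reading of the symbols (R = relative relocations, r = named relocations,
n = DSOs in the lookup scope, s = total symbols) is my understanding of
Ulrich's text rather than something stated in this thread. ]

```cpp
#include <cassert>
#include <cmath>

// Abstract cost in arbitrary units for the bound O(R + r.n.log(s)).
//   R: relative relocations   r: named (by-symbol) relocations
//   n: DSOs searched          s: total number of symbols
inline double estimated_relocation_cost(double R, double r, double n, double s)
{
    assert(s >= 1.0);  // log2 needs at least one symbol to be meaningful
    return R + r * n * std::log2(s);
}
```

	With r = 0 the cost collapses to R whatever n and s are, which is
the case being argued for: remove the named relocations and the lookup
term disappears entirely.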

	HTH,

		Michael.

-- 
 michael.meeks@novell.com  <><, Pseudo Engineer, itinerant idiot


* Re: Large, modular C++ application performance ...
  2005-07-30 13:36 ` Giovanni Bajo
  2005-07-30 17:24   ` Andrew Haley
@ 2005-08-01  9:39   ` michael meeks
  2005-08-01 15:55     ` H. J. Lu
  1 sibling, 1 reply; 16+ messages in thread
From: michael meeks @ 2005-08-01  9:39 UTC (permalink / raw)
  To: Giovanni Bajo; +Cc: gcc

Hi Giovanni,

On Sat, 2005-07-30 at 15:36 +0200, Giovanni Bajo wrote:
> I'm slow, but I can't understand why a careful design of the interfaces of
> the dynamic libraries

	Well - sure, depends how 'careful' you are ;-) clearly if no C++
classes with virtual methods form the interface of any library, then
there is no problem ;-) unfortunately, mandating that would rather
cripple C++.

>  together with the new -fvisibility flags, should not
> be sufficient. It worked well in other scenarios

	-fvisibility is helpful - as the paper says, not as helpful as the old
-Bsymbolic (or link maps exposing only 3 or so functions) were. However
- -fvisibility can only help so much - if you have:

class LibraryAClass {
	virtual void doFoo(void);
};
class LibraryBClass : public LibraryAClass {
	virtual void doBaa(void);
};

	then there are 2 problems:

	a) there is no symbol visibility that will trigger internal
	   binding in addition to a symbol export. ie. if 
	   'LibraryBClass' is a public interface - no useful
	   visibility markup can be done; and hence we have a named
	   relocation for 'doBaa's vtable slot.
	   [ IMHO this is a feature-gap, we need a new ('export'?)
	     visibility attribute for this case ].

	b) even if LibraryBClass is a 'hidden' class - to build its
	   vtable we have to have a slot for 'doFoo' which is in
	   an external library (A) => another named relocation. An 
	   unavoidable consequence of using virtual classes as part of
	   a library's API.
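
	[ To make (a) and (b) concrete, a minimal sketch using GCC's visibility
attribute. The LIB_PUBLIC / LIB_HIDDEN macros, the LibraryBHidden class,
the string return values and sanity_check() are assumptions added for
illustration; everything sits in one file here, so "library A" and
"library B" are only imagined. The comments restate the two cases
above, they are not something the compiler checks. ]

```cpp
#include <cassert>
#include <string>

#if defined(__GNUC__)
#  define LIB_PUBLIC __attribute__((visibility("default")))
#  define LIB_HIDDEN __attribute__((visibility("hidden")))
#else
#  define LIB_PUBLIC
#  define LIB_HIDDEN
#endif

// Imagine this lives in library A and is part of its public API.
class LIB_PUBLIC LibraryAClass {
public:
    virtual ~LibraryAClass() = default;
    virtual std::string doFoo() { return "A::doFoo"; }
};

// Imagine this lives in library B.
// Case (a): the class is public, so doBaa stays an exported symbol and,
// per the text above, its vtable slot is filled via a named relocation.
class LIB_PUBLIC LibraryBClass : public LibraryAClass {
public:
    virtual std::string doBaa() { return "B::doBaa"; }
};

// Case (b): hiding the class lets doBaa bind inside library B, but the
// inherited doFoo slot still refers to a symbol that lives in library A.
class LIB_HIDDEN LibraryBHidden : public LibraryAClass {
public:
    virtual std::string doBaa() { return "BHidden::doBaa"; }
};

// Both variants dispatch identically; only the relocation cost differs.
inline void sanity_check()
{
    LibraryBHidden hidden;
    LibraryAClass& base = hidden;
    assert(base.doFoo() == "A::doFoo");
    assert(hidden.doBaa() == "BHidden::doBaa");
}
```

	Neither "default" nor "hidden" expresses "export this, but bind it
internally", which is the gap described in (a).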

> IMHO, it's unreasonable to break the C++ ABI for 1 second of warm time
> startup.

	Well - it's an option that was considered, although - as you say -
highly unpleasant, and probably quite unnecessary - as the paper
explains.

	Regards,

		Michael.

-- 
 michael.meeks@novell.com  <><, Pseudo Engineer, itinerant idiot


* Re: Large, modular C++ application performance ...
  2005-07-30 13:36 ` Giovanni Bajo
@ 2005-07-30 17:24   ` Andrew Haley
  2005-08-01  9:45     ` michael meeks
  2005-08-01  9:39   ` michael meeks
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Haley @ 2005-07-30 17:24 UTC (permalink / raw)
  To: Giovanni Bajo; +Cc: michael.meeks, gcc

Giovanni Bajo writes:
 > michael meeks <michael.meeks@novell.com> wrote:
 > 
 > > I've been doing a little thinking about how to improve OO.o startup
 > > performance recently; and - well, relocation processing happens to be
 > > the single, biggest thing that most tools flag.
 > >
 > > Anyhow - so I wrote up the problem, and a couple of potential
 > > solutions / extensions / workarounds, and - being of a generally
 > > clueless nature, was hoping to solicit instruction from those of a more
 > > enlightened disposition.
 > >
 > > All input much appreciated; no doubt my terminology is irritatingly up
 > > the creek, hopefully the sentiment will win through.
 > >
 > > http://go-oo.org/~michael/OOoStartup.pdf

One thing I don't understand is the formula where you write linking
time is proportional to the log of the total number of symbols.  Does
this come from drepper's paper, or somewhere else?

Andrew.


* Re: Large, modular C++ application performance ...
  2005-07-29 19:49 michael meeks
  2005-07-29 20:19 ` Florian Weimer
@ 2005-07-30 13:36 ` Giovanni Bajo
  2005-07-30 17:24   ` Andrew Haley
  2005-08-01  9:39   ` michael meeks
  2005-08-01 16:59 ` Dan Nicolaescu
  2 siblings, 2 replies; 16+ messages in thread
From: Giovanni Bajo @ 2005-07-30 13:36 UTC (permalink / raw)
  To: michael.meeks; +Cc: gcc

michael meeks <michael.meeks@novell.com> wrote:

> I've been doing a little thinking about how to improve OO.o startup
> performance recently; and - well, relocation processing happens to be
> the single, biggest thing that most tools flag.
>
> Anyhow - so I wrote up the problem, and a couple of potential
> solutions / extensions / workarounds, and - being of a generally
> clueless nature, was hoping to solicit instruction from those of a more
> enlightened disposition.
>
> All input much appreciated; no doubt my terminology is irritatingly up
> the creek, hopefully the sentiment will win through.
>
> http://go-oo.org/~michael/OOoStartup.pdf
>
> Two solutions are proposed - there are almost certainly more that I'm
> not thinking of. I'm interested in people's views as to which approach
> is best. So far the constructor hook approach seems to be the path of
> least resistance.


I'm slow, but I can't understand why a careful design of the interfaces of
the dynamic libraries, together with the new -fvisibility flags, should not
be sufficient. It worked well in other scenarios
(http://gcc.gnu.org/wiki/Visibility).

IMHO, it's unreasonable to break the C++ ABI for 1 second of warm time
startup.
-- 
Giovanni Bajo


* Re: Large, modular C++ application performance ...
  2005-07-29 20:19 ` Florian Weimer
@ 2005-07-30 13:26   ` Nix
  0 siblings, 0 replies; 16+ messages in thread
From: Nix @ 2005-07-30 13:26 UTC (permalink / raw)
  To: Florian Weimer; +Cc: michael.meeks, gcc

On 29 Jul 2005, Florian Weimer announced authoritatively:
> * michael meeks:
> 
>> 	I've been doing a little thinking about how to improve OO.o startup
>> performance recently; and - well, relocation processing happens to be
>> the single, biggest thing that most tools flag.
> 
> Have you tried prelinking?

Prelinking is mentioned near the start of the paper and was actually
implemented with OOo (and KDE) in mind.

Alas, it's ineffective for dlopen()ed objects, and OOo dlopen()s nearly
everything. (To my mind the solution is `don't do that then; DT_NEEDED
has a purpose dammit'... certainly this is less disruptive than a change
to the C++ ABI, requiring cooperation with other vendors and a rebuild
of the entire known C++ universe yet again! But I am but an egg in these
waters.)

-- 
`Tor employs several thousand editors who they keep in dank
 subterranean editing facilities not unlike Moria' -- James Nicoll 


* Re: Large, modular C++ application performance ...
  2005-07-29 19:49 michael meeks
@ 2005-07-29 20:19 ` Florian Weimer
  2005-07-30 13:26   ` Nix
  2005-07-30 13:36 ` Giovanni Bajo
  2005-08-01 16:59 ` Dan Nicolaescu
  2 siblings, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2005-07-29 20:19 UTC (permalink / raw)
  To: michael.meeks; +Cc: gcc

* michael meeks:

> 	I've been doing a little thinking about how to improve OO.o startup
> performance recently; and - well, relocation processing happens to be
> the single, biggest thing that most tools flag.

Have you tried prelinking?


* Large, modular C++ application performance ...
@ 2005-07-29 19:49 michael meeks
  2005-07-29 20:19 ` Florian Weimer
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: michael meeks @ 2005-07-29 19:49 UTC (permalink / raw)
  To: gcc

Hi there,

	I've been doing a little thinking about how to improve OO.o startup
performance recently; and - well, relocation processing happens to be
the single, biggest thing that most tools flag.

	Anyhow - so I wrote up the problem, and a couple of potential
solutions / extensions / workarounds, and - being of a generally
clueless nature, was hoping to solicit instruction from those of a more
enlightened disposition.

	All input much appreciated; no doubt my terminology is irritatingly up
the creek, hopefully the sentiment will win through.

	http://go-oo.org/~michael/OOoStartup.pdf

	Two solutions are proposed - there are almost certainly more that I'm
not thinking of. I'm interested in people's views as to which approach
is best. So far the constructor hook approach seems to be the path of
least resistance.

	Thanks,

		Michael.

-- 
 michael.meeks@novell.com  <><, Pseudo Engineer, itinerant idiot

