public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* call for libstdc++ profile mode diagnostic ideas
@ 2010-12-17  7:58 Silvius Rus
  2010-12-29  1:17 ` Hargett, Matt
  0 siblings, 1 reply; 5+ messages in thread
From: Silvius Rus @ 2010-12-17  7:58 UTC (permalink / raw)
  To: libstdc++; +Cc: gcc

Hello libstdc++ developers and users,

(I cc-ed gcc@gcc.gnu.org for a larger audience as others may offer
suggestions not as library developers but as C++ programmers.  Sorry
in advance if this is spam to you.)

I'm planning to add a set of new performance diagnostics to the
libstdc++ profile mode
(http://gcc.gnu.org/onlinedocs/libstdc++/manual/profile_mode.html) and
am trying to come up with a list of what diagnostics are most
meaningful and most wanted.

The profile mode currently gives diagnostics such as "use
unordered_map instead of map at context ...".  The full list is at
http://gcc.gnu.org/onlinedocs/libstdc++/manual/bk01pt03ch19s07.html.

At this (brainstorming) point I'm looking for any suggestions,
including and not limited to:
- container selection (e.g., replace list with deque).
- container implementation selection (e.g., string implementation).
- algorithm selection (e.g., sort implementation)
- data structure or algorithm parameter selection (e.g., fixed string
reserved size value for a particular context).

Please reply to this message or email me privately (rus@google.com) if
you have any suggestions for new diagnostics, or about the profile
mode in general, or to express your support for a particular proposed
diagnostic.  For new diagnostics, it works best if you can provide:
- A simple description, e.g., "replace vector with list".
- (optional) The instrumentation points, even if vague, e.g.,
"instrument insertion, element access and iterators".
- (optional) How the decision is made, e.g. "there are many inserts in
the middle, there is no random access and it's not iterated through
many times".
Keep in mind that profile mode analysis is context sensitive (context
= dynamic call stack), so it's OK to propose both "change list to
vector" and "change vector to list", with different criteria sets as
they may both be right in different contexts.

To make sure the diagnostics are realistic, keep in mind that analysis
is usually based on observing, at run time, container state before and
after each operation on the container or on iterators of the
container.

Once I have a good pool of ideas and preferences, I will present a
refined set to libstdc++@gcc.gnu.org for discussion.


Thank you,
Silvius

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: call for libstdc++ profile mode diagnostic ideas
  2010-12-17  7:58 call for libstdc++ profile mode diagnostic ideas Silvius Rus
@ 2010-12-29  1:17 ` Hargett, Matt
  2010-12-29  4:15   ` Xinliang David Li
  2010-12-29 12:37   ` Jonathan Wakely
  0 siblings, 2 replies; 5+ messages in thread
From: Hargett, Matt @ 2010-12-29  1:17 UTC (permalink / raw)
  To: Silvius Rus, libstdc++; +Cc: gcc

> I'm planning to add a set of new performance diagnostics to the
> libstdc++ profile mode
> (http://gcc.gnu.org/onlinedocs/libstdc++/manual/profile_mode.html) and
> am trying to come up with a list of what diagnostics are most
> meaningful and most wanted.
> 
> At this (brainstorming) point I'm looking for any suggestions,
> including and not limited to:
> - container selection (e.g., replace list with deque).
> - container implementation selection (e.g., string implementation).
> - algorithm selection (e.g., sort implementation)
> - data structure or algorithm parameter selection (e.g., fixed string
> reserved size value for a particular context).
> 
> Please reply to this message or email me privately (rus@google.com) if
> you have any suggestions for new diagnostics, or about the profile
> mode in general, or to express your support for a particular proposed
> diagnostic.  For new diagnostics, it works best if you can provide:

First idea, based on a performance issue I fixed in a codebase in 2001:

> - A simple description, e.g., "replace vector with list".

"cache value of std::string::c_str() instead of multiple invocations with a non-shareable declared std::string"

> - (optional) The instrumentation points, even if vague, e.g.,
> "instrument insertion, element access and iterators".

Instrument calls to std::string::c_str() that are allocation and invocation context-aware.

> - (optional) How the decision is made, e.g. "there are many inserts in
> the middle, there is no random access and it's not iterated through
> many times".

1) allocation of std::string in local variable
2) calls to said local string's c_str() method within loops
3) and said loops do not modify the contents of the value returned from c_str()

Example:

#include <string>
#include <cstdio>

void notify(const char* printable) { printf("%s\n", printable); }

int main(void)
{
  std::string name("bob");
  for (int i = 0; i < name.length(); i++)
  {
    notify(name.c_str());
  }

  return 0;
}


Second idea, based on the same codebase as before. Removing 5 conversions to/from std::string and char* resulted in a 10X throughput improvement in network throughput in that codebase:

> - A simple description, e.g., "replace vector with list".

"avoid converting a std::string to a char*, just to convert it back to std::string later in the call stack"
 
> - (optional) The instrumentation points, even if vague, e.g.,
> "instrument insertion, element access and iterators".

Instrument calls to std::string::c_str(), tracking the resulting value returned by c_str().

> - (optional) How the decision is made, e.g. "there are many inserts in
> the middle, there is no random access and it's not iterated through
> many times".

1) where a value is returned by std::string::c_str()
2) and said char* value is fed back into std::string constructor, locally or further down the call chain
3) and said char* value was not modified between the calls to std::string::c_str() and std::string()

Example:

#include <cstdio>
#include <string>

void tally(const std::string& index) { printf("%s\n", index.c_str()); }

void notify(const char* printable) { tally(std::string(printable)); }

int main(void)
{
  std::string name("bob");
  notify(name.c_str());

  return 0;
}


Over the years, I have seen both of these occur multiple times across multiple teams. The constant seems to be C programmers who are passive-aggressive about their distaste for STL, and/or teams with poor communication between module owners.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: call for libstdc++ profile mode diagnostic ideas
  2010-12-29  1:17 ` Hargett, Matt
@ 2010-12-29  4:15   ` Xinliang David Li
  2011-01-05 23:55     ` Hargett, Matt
  2010-12-29 12:37   ` Jonathan Wakely
  1 sibling, 1 reply; 5+ messages in thread
From: Xinliang David Li @ 2010-12-29  4:15 UTC (permalink / raw)
  To: Hargett, Matt; +Cc: Silvius Rus, libstdc++, gcc

Your first example points to a weakness in the compiler optimization.
If base_string constructor is inlined, the compiler should be able to
figure out both 'name' and the heap memory it points to can not be
modified by the call to notify, and therefore hoist access name.c_str
() and name.length () out of the loop. Even without inlining, with
more powerful modeling of standard function side effect (e.g.,
base_string ctor does not expose name's address), the same
optimization should be performed.

David

On Tue, Dec 28, 2010 at 5:15 PM, Hargett, Matt
<matt.hargett@bluecoat.com> wrote:
>> I'm planning to add a set of new performance diagnostics to the
>> libstdc++ profile mode
>> (http://gcc.gnu.org/onlinedocs/libstdc++/manual/profile_mode.html) and
>> am trying to come up with a list of what diagnostics are most
>> meaningful and most wanted.
>>
>> At this (brainstorming) point I'm looking for any suggestions,
>> including and not limited to:
>> - container selection (e.g., replace list with deque).
>> - container implementation selection (e.g., string implementation).
>> - algorithm selection (e.g., sort implementation)
>> - data structure or algorithm parameter selection (e.g., fixed string
>> reserved size value for a particular context).
>>
>> Please reply to this message or email me privately (rus@google.com) if
>> you have any suggestions for new diagnostics, or about the profile
>> mode in general, or to express your support for a particular proposed
>> diagnostic.  For new diagnostics, it works best if you can provide:
>
> First idea, based on a performance issue I fixed in a codebase in 2001:
>
>> - A simple description, e.g., "replace vector with list".
>
> "cache value of std::string::c_str() instead of multiple invocations with a non-shareable declared std::string"
>
>> - (optional) The instrumentation points, even if vague, e.g.,
>> "instrument insertion, element access and iterators".
>
> Instrument calls to std::string::c_str() that are allocation and invocation context-aware.
>
>> - (optional) How the decision is made, e.g. "there are many inserts in
>> the middle, there is no random access and it's not iterated through
>> many times".
>
> 1) allocation of std::string in local variable
> 2) calls to said local string's c_str() method within loops
> 3) and said loops do not modify the contents of the value returned from c_str()
>
> Example:
>
> #include <string>
> #include <cstdio>
>
> void notify(const char* printable) { printf("%s\n", printable); }
>
> int main(void)
> {
>  std::string name("bob");
>  for (int i = 0; i < name.length(); i++)
>  {
>    notify(name.c_str());
>  }
>
>  return 0;
> }
>
>
> Second idea, based on the same codebase as before. Removing 5 conversions to/from std::string and char* resulted in a 10X throughput improvement in network throughput in that codebase:
>
>> - A simple description, e.g., "replace vector with list".
>
> "avoid converting a std::string to a char*, just to convert it back to std::string later in the call stack"
>
>> - (optional) The instrumentation points, even if vague, e.g.,
>> "instrument insertion, element access and iterators".
>
> Instrument calls to std::string::c_str(), tracking the resulting value returned by c_str().
>
>> - (optional) How the decision is made, e.g. "there are many inserts in
>> the middle, there is no random access and it's not iterated through
>> many times".
>
> 1) where a value is returned by std::string::c_str()
> 2) and said char* value is fed back into std::string constructor, locally or further down the call chain
> 3) and said char* value was not modified between the calls to std::string::c_str() and std::string()
>
> Example:
>
> #include <cstdio>
> #include <string>
>
> void tally(const std::string& index) { printf("%s\n", index.c_str()); }
>
> void notify(const char* printable) { tally(std::string(printable)); }
>
> int main(void)
> {
>  std::string name("bob");
>  notify(name.c_str());
>
>  return 0;
> }
>
>
> Over the years, I have seen both of these occur multiple times across multiple teams. The constant seems to be C programmers who are passive-aggressive about their distaste for STL, and/or teams with poor communication between module owners.
>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: call for libstdc++ profile mode diagnostic ideas
  2010-12-29  1:17 ` Hargett, Matt
  2010-12-29  4:15   ` Xinliang David Li
@ 2010-12-29 12:37   ` Jonathan Wakely
  1 sibling, 0 replies; 5+ messages in thread
From: Jonathan Wakely @ 2010-12-29 12:37 UTC (permalink / raw)
  To: Hargett, Matt; +Cc: Silvius Rus, libstdc++, gcc

On 29 December 2010 01:15, Hargett, Matt wrote:
>
> 1) allocation of std::string in local variable
> 2) calls to said local string's c_str() method within loops
> 3) and said loops do not modify the contents of the value returned from c_str()

You can't modify the contents, c_str() returns a const char*

> Example:
>
> #include <string>
> #include <cstdio>
>
> void notify(const char* printable) { printf("%s\n", printable); }
>
> int main(void)
> {
>  std::string name("bob");
>  for (int i = 0; i < name.length(); i++)
>  {
>    notify(name.c_str());

What would you do to improve performance here?
I don't see any significant opportunity for optimisation here.

The second idea is obviously useful, as it could avoid unnecessary
heap allocations (although not in C++0x where strings are not going to
be reference-counted.)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: call for libstdc++ profile mode diagnostic ideas
  2010-12-29  4:15   ` Xinliang David Li
@ 2011-01-05 23:55     ` Hargett, Matt
  0 siblings, 0 replies; 5+ messages in thread
From: Hargett, Matt @ 2011-01-05 23:55 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: gcc

> Your first example points to a weakness in the compiler optimization.
> If base_string constructor is inlined, the compiler should be able to
> figure out both 'name' and the heap memory it points to can not be
> modified by the call to notify, and therefore hoist access name.c_str
> () and name.length () out of the loop. Even without inlining, with
> more powerful modeling of standard function side effect (e.g.,
> base_string ctor does not expose name's address), the same
> optimization should be performed.

You're right, 4.5.1 and current trunk appear to correctly hoist those accesses out of the loop.

It would still be useful for codebases that are built using different compilers that don't have this kind of optimization, though.

> > Example:
> >
> > #include <string>
> > #include <cstdio>
> >
> > void notify(const char* printable) { printf("%s\n", printable); }
> >
> > int main(void)
> > {
> >  std::string name("bob");
> >  for (int i = 0; i < name.length(); i++)
> >  {
> >    notify(name.c_str());
> >  }
> >
> >  return 0;
> > }

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-01-05 23:55 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-17  7:58 call for libstdc++ profile mode diagnostic ideas Silvius Rus
2010-12-29  1:17 ` Hargett, Matt
2010-12-29  4:15   ` Xinliang David Li
2011-01-05 23:55     ` Hargett, Matt
2010-12-29 12:37   ` Jonathan Wakely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).