From: Xinliang David Li
To: "Hargett, Matt"
Cc: Silvius Rus, "libstdc++", gcc@gcc.gnu.org
Date: Wed, 29 Dec 2010 04:15:00 -0000
Subject: Re: call for libstdc++ profile mode diagnostic ideas

Your first example points to a weakness in the compiler optimization.
If the basic_string constructor is inlined, the compiler should be able to
figure out that both 'name' and the heap memory it points to cannot be
modified by the call to notify, and therefore hoist the accesses
name.c_str() and name.length() out of the loop. Even without inlining,
with more powerful modeling of the side effects of standard functions
(e.g., the basic_string ctor does not expose name's address), the same
optimization should be performed.

David

On Tue, Dec 28, 2010 at 5:15 PM, Hargett, Matt wrote:
>> I'm planning to add a set of new performance diagnostics to the
>> libstdc++ profile mode
>> (http://gcc.gnu.org/onlinedocs/libstdc++/manual/profile_mode.html) and
>> am trying to come up with a list of what diagnostics are most
>> meaningful and most wanted.
>>
>> At this (brainstorming) point I'm looking for any suggestions,
>> including and not limited to:
>> - container selection (e.g., replace list with deque).
>> - container implementation selection (e.g., string implementation).
>> - algorithm selection (e.g., sort implementation)
>> - data structure or algorithm parameter selection (e.g., fixed string
>> reserved size value for a particular context).
>>
>> Please reply to this message or email me privately (rus@google.com) if
>> you have any suggestions for new diagnostics, or about the profile
>> mode in general, or to express your support for a particular proposed
>> diagnostic. For new diagnostics, it works best if you can provide:
>
> First idea, based on a performance issue I fixed in a codebase in 2001:
>
>> - A simple description, e.g., "replace vector with list".
>
> "cache value of std::string::c_str() instead of multiple invocations with
> a non-shareable declared std::string"
>
>> - (optional) The instrumentation points, even if vague, e.g.,
>> "instrument insertion, element access and iterators".
>
> Instrument calls to std::string::c_str() that are allocation and
> invocation context-aware.
>
>> - (optional) How the decision is made, e.g. "there are many inserts in
>> the middle, there is no random access and it's not iterated through
>> many times".
>
> 1) allocation of std::string in local variable
> 2) calls to said local string's c_str() method within loops
> 3) and said loops do not modify the contents of the value returned from
>    c_str()
>
> Example:
>
> #include <stdio.h>
> #include <string>
>
> void notify(const char* printable) { printf("%s\n", printable); }
>
> int main(void)
> {
>   std::string name("bob");
>   for (int i = 0; i < name.length(); i++)
>   {
>     notify(name.c_str());
>   }
>
>   return 0;
> }
>
>
> Second idea, based on the same codebase as before. Removing 5 conversions
> to/from std::string and char* resulted in a 10X improvement in network
> throughput in that codebase:
>
>> - A simple description, e.g., "replace vector with list".
>
> "avoid converting a std::string to a char*, just to convert it back to
> std::string later in the call stack"
>
>> - (optional) The instrumentation points, even if vague, e.g.,
>> "instrument insertion, element access and iterators".
>
> Instrument calls to std::string::c_str(), tracking the resulting value
> returned by c_str().
>
>> - (optional) How the decision is made, e.g. "there are many inserts in
>> the middle, there is no random access and it's not iterated through
>> many times".
>
> 1) where a value is returned by std::string::c_str()
> 2) and said char* value is fed back into std::string constructor, locally
>    or further down the call chain
> 3) and said char* value was not modified between the calls to
>    std::string::c_str() and std::string()
>
> Example:
>
> #include <stdio.h>
> #include <string>
>
> void tally(const std::string& index) { printf("%s\n", index.c_str()); }
>
> void notify(const char* printable) { tally(std::string(printable)); }
>
> int main(void)
> {
>   std::string name("bob");
>   notify(name.c_str());
>
>   return 0;
> }
>
>
> Over the years, I have seen both of these occur multiple times across
> multiple teams.
> The constant seems to be C programmers who are passive-aggressive about
> their distaste for STL, and/or teams with poor communication between
> module owners.
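
For readers following the thread, here is a minimal hand-written sketch of the two rewrites being discussed: hoisting the loop-invariant c_str() and length() calls out of the loop (what the reply above expects an optimizer to do), and passing the std::string through by const reference instead of round-tripping through char*. It is not taken from either message. The names notify_hoisted, notify_string and seen are hypothetical, and recording into a vector instead of calling printf is an assumption made only so the effect can be checked.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sink: records each value it receives instead of printing it.
static std::vector<std::string> seen;

void notify(const char* printable) { seen.push_back(printable); }

// First idea: 'name' is not modified inside the loop, so c_str() and
// length() are loop-invariant and can be read once before the loop.
void notify_hoisted(const std::string& name)
{
    const char* const text = name.c_str();
    const std::size_t count = name.length();
    for (std::size_t i = 0; i < count; ++i)
    {
        notify(text);
    }
}

// Second idea: keep the value as a std::string across the call chain, so
// no temporary std::string is rebuilt from a char* on the way down.
void tally(const std::string& index) { seen.push_back(index); }

void notify_string(const std::string& printable) { tally(printable); }
```

Calling notify_hoisted(name) with name set to "bob" invokes notify once per character of the string, as the loop in the first quoted example does, while notify_string(name) reaches tally without constructing an intermediate std::string.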