Subject: Re: [PATCH] Improve strtok(_r) performance
To: libc-alpha@sourceware.org
From: Adhemerval Zanella
Date: Fri, 04 Nov 2016 12:55:00 -0000

On 28/10/2016 09:35, Wilco Dijkstra wrote:
> Improve strtok(_r) performance.
> Instead of calling strpbrk which calls
> strcspn, call strcspn directly so we get the end of the token without
> an extra call to rawmemchr.  Also avoid an unnecessary call to strcspn after
> the last token by adding an early exit for an empty string.  The result
> is a ~2x speedup of strtok on most inputs in bench-strtok.
>
> Passes regression tests, OK for commit?

Why not aim for simplicity and just use strtok_r in strtok?  It should be
a tail call on most architectures, and the performance loss should be
minimal.  Either way, LGTM.

I also found that the powerpc64 optimized version performs worse than this
new default one; once you push it in, I plan to remove it.

>
> ChangeLog:
> 2015-10-28  Wilco Dijkstra
>
> 	* string/strtok.c (STRTOK): Optimize for performance.
> 	* string/strtok_r.c (__strtok_r): Likewise.
> --
>
> diff --git a/string/strtok.c b/string/strtok.c
> index 7a4574db5c80501e47d045ad4347e8a287b32191..b1ed48c24c8d20706b7d05481a138b18a01ff802 100644
> --- a/string/strtok.c
> +++ b/string/strtok.c
> @@ -38,11 +38,18 @@ static char *olds;
>  char *
>  STRTOK (char *s, const char *delim)
>  {
> -  char *token;
> +  char *end;
>
>    if (s == NULL)
>      s = olds;
>
> +  /* Return immediately at end of string.  */
> +  if (*s == '\0')
> +    {
> +      olds = s;
> +      return NULL;
> +    }
> +
>    /* Scan leading delimiters.  */
>    s += strspn (s, delim);
>    if (*s == '\0')
> @@ -52,16 +59,15 @@ STRTOK (char *s, const char *delim)
>      }
>
>    /* Find the end of the token.  */
> -  token = s;
> -  s = strpbrk (token, delim);
> -  if (s == NULL)
> -    /* This token finishes the string.  */
> -    olds = __rawmemchr (token, '\0');
> -  else
> +  end = s + strcspn (s, delim);
> +  if (*end == '\0')
>      {
> -      /* Terminate the token and make OLDS point past it.  */
> -      *s = '\0';
> -      olds = s + 1;
> +      olds = end;
> +      return s;
>      }
> -  return token;
> +
> +  /* Terminate the token and make OLDS point past it.  */
> +  *end = '\0';
> +  olds = end + 1;
> +  return s;
>  }
> diff --git a/string/strtok_r.c b/string/strtok_r.c
> index f351304766108dad2c1cff881ad3bebae821b2a0..e049a5c82e026a3b6c1ba5da16ce81743717805e 100644
> --- a/string/strtok_r.c
> +++ b/string/strtok_r.c
> @@ -45,11 +45,17 @@
>  char *
>  __strtok_r (char *s, const char *delim, char **save_ptr)
>  {
> -  char *token;
> +  char *end;
>
>    if (s == NULL)
>      s = *save_ptr;
>
> +  if (*s == '\0')
> +    {
> +      *save_ptr = s;
> +      return NULL;
> +    }
> +
>    /* Scan leading delimiters.  */
>    s += strspn (s, delim);
>    if (*s == '\0')
> @@ -59,18 +65,17 @@ __strtok_r (char *s, const char *delim, char **save_ptr)
>      }
>
>    /* Find the end of the token.  */
> -  token = s;
> -  s = strpbrk (token, delim);
> -  if (s == NULL)
> -    /* This token finishes the string.  */
> -    *save_ptr = __rawmemchr (token, '\0');
> -  else
> +  end = s + strcspn (s, delim);
> +  if (*end == '\0')
>      {
> -      /* Terminate the token and make *SAVE_PTR point past it.  */
> -      *s = '\0';
> -      *save_ptr = s + 1;
> +      *save_ptr = end;
> +      return s;
>      }
> -  return token;
> +
> +  /* Terminate the token and make *SAVE_PTR point past it.  */
> +  *end = '\0';
> +  *save_ptr = end + 1;
> +  return s;
>  }
>  #ifdef weak_alias
>  libc_hidden_def (__strtok_r)
>
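
To make the wrapper idea concrete, here is a rough standalone sketch of
what I have in mind.  It is not the glibc source: the names sketch_strtok_r
and sketch_strtok are hypothetical, and the glibc macros (STRTOK,
weak_alias, libc_hidden_def) are left out.  The reentrant function follows
the control flow of the patched __strtok_r above, and the non-reentrant one
only forwards to it with a static save pointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical standalone copy of the patched __strtok_r control flow.  */
char *
sketch_strtok_r (char *s, const char *delim, char **save_ptr)
{
  char *end;

  if (s == NULL)
    s = *save_ptr;

  /* Early exit: after the last token there is nothing left to scan,
     so skip the strspn/strcspn calls entirely.  */
  if (*s == '\0')
    {
      *save_ptr = s;
      return NULL;
    }

  /* Scan leading delimiters.  */
  s += strspn (s, delim);
  if (*s == '\0')
    {
      *save_ptr = s;
      return NULL;
    }

  /* strcspn yields the token length directly, so the end of the token
     is known without a separate search for the terminating NUL.  */
  end = s + strcspn (s, delim);
  if (*end == '\0')
    {
      /* This token finishes the string.  */
      *save_ptr = end;
      return s;
    }

  /* Terminate the token and make *SAVE_PTR point past it.  */
  *end = '\0';
  *save_ptr = end + 1;
  return s;
}

/* Non-reentrant variant as a thin wrapper.  The call is in tail
   position, so a compiler may emit it as a plain jump (assumption:
   this depends on the target and optimization level).  */
char *
sketch_strtok (char *s, const char *delim)
{
  static char *olds;
  return sketch_strtok_r (s, delim, &olds);
}
```

With this layout the tokenizing logic lives in one place, and the
non-reentrant entry point costs at most one extra call or jump.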