From: Wilco Dijkstra
To: "libc-alpha@sourceware.org"
CC: nd
Subject: Re: [PATCH] Improve strtok(_r) performance
Date: Mon, 14 Nov 2016 12:21:00 -0000

ping

From: Wilco Dijkstra
Sent: 28 October 2016 12:35
To: libc-alpha@sourceware.org
Cc: nd
Subject: [PATCH] Improve strtok(_r) performance

Improve strtok(_r) performance.  Instead of calling strpbrk which calls
strcspn, call strcspn directly so we get the end of the token without
an extra call to rawmemchr.  Also avoid an unnecessary call to strcspn after
the last token by adding an early exit for an empty string.  The result is a
~2x speedup of strtok on most inputs in bench-strtok.

Passes regression tests, OK for commit?

ChangeLog:
2015-10-28  Wilco Dijkstra

        * string/strtok.c (STRTOK): Optimize for performance.
        * string/strtok_r.c (__strtok_r): Likewise.

--
diff --git a/string/strtok.c b/string/strtok.c
index 7a4574db5c80501e47d045ad4347e8a287b32191..b1ed48c24c8d20706b7d05481a138b18a01ff802 100644
--- a/string/strtok.c
+++ b/string/strtok.c
@@ -38,11 +38,18 @@ static char *olds;
 char *
 STRTOK (char *s, const char *delim)
 {
-  char *token;
+  char *end;
 
   if (s == NULL)
     s = olds;
 
+  /* Return immediately at end of string.  */
+  if (*s == '\0')
+    {
+      olds = s;
+      return NULL;
+    }
+
   /* Scan leading delimiters.  */
   s += strspn (s, delim);
   if (*s == '\0')
@@ -52,16 +59,15 @@ STRTOK (char *s, const char *delim)
     }
 
   /* Find the end of the token.  */
-  token = s;
-  s = strpbrk (token, delim);
-  if (s == NULL)
-    /* This token finishes the string.  */
-    olds = __rawmemchr (token, '\0');
-  else
+  end = s + strcspn (s, delim);
+  if (*end == '\0')
     {
-      /* Terminate the token and make OLDS point past it.  */
-      *s = '\0';
-      olds = s + 1;
+      olds = end;
+      return s;
     }
-  return token;
+
+  /* Terminate the token and make OLDS point past it.  */
+  *end = '\0';
+  olds = end + 1;
+  return s;
 }
diff --git a/string/strtok_r.c b/string/strtok_r.c
index f351304766108dad2c1cff881ad3bebae821b2a0..e049a5c82e026a3b6c1ba5da16ce81743717805e 100644
--- a/string/strtok_r.c
+++ b/string/strtok_r.c
@@ -45,11 +45,17 @@
 char *
 __strtok_r (char *s, const char *delim, char **save_ptr)
 {
-  char *token;
+  char *end;
 
   if (s == NULL)
     s = *save_ptr;
 
+  if (*s == '\0')
+    {
+      *save_ptr = s;
+      return NULL;
+    }
+
   /* Scan leading delimiters.  */
   s += strspn (s, delim);
   if (*s == '\0')
@@ -59,18 +65,17 @@ __strtok_r (char *s, const char *delim, char **save_ptr)
     }
 
   /* Find the end of the token.  */
-  token = s;
-  s = strpbrk (token, delim);
-  if (s == NULL)
-    /* This token finishes the string.  */
-    *save_ptr = __rawmemchr (token, '\0');
-  else
+  end = s + strcspn (s, delim);
+  if (*end == '\0')
     {
-      /* Terminate the token and make *SAVE_PTR point past it.  */
-      *s = '\0';
-      *save_ptr = s + 1;
+      *save_ptr = end;
+      return s;
     }
-  return token;
+
+  /* Terminate the token and make *SAVE_PTR point past it.  */
+  *end = '\0';
+  *save_ptr = end + 1;
+  return s;
 }
 #ifdef weak_alias
 libc_hidden_def (__strtok_r)
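
For reference, the control flow after the patch can be read as one
stand-alone function.  The following is only a sketch, not the glibc
source: sketch_strtok_r and sketch_count_tokens are hypothetical names
made up for illustration, the assert is an addition, and the body of the
second "*s == '\0'" check is not visible in the hunks above, so it is
assumed here to store S and return NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-alone rendering of the patched logic;
   sketch_strtok_r is an illustrative name, not a glibc symbol.  */
char *
sketch_strtok_r (char *s, const char *delim, char **save_ptr)
{
  char *end;

  assert (save_ptr != NULL);	/* Illustrative check, not in the patch.  */

  if (s == NULL)
    s = *save_ptr;

  /* Early exit: nothing is left, so skip strspn and strcspn.  */
  if (*s == '\0')
    {
      *save_ptr = s;
      return NULL;
    }

  /* Scan leading delimiters.  */
  s += strspn (s, delim);
  if (*s == '\0')
    {
      /* Assumed body: this part is elided from the diff context.  */
      *save_ptr = s;
      return NULL;
    }

  /* strcspn returns the token length, so END points at either a
     delimiter or the terminating NUL; no second scan is needed.  */
  end = s + strcspn (s, delim);
  if (*end == '\0')
    {
      *save_ptr = end;
      return s;
    }

  /* Terminate the token and make *SAVE_PTR point past it.  */
  *end = '\0';
  *save_ptr = end + 1;
  return s;
}

/* Count the tokens in BUF, which is modified in place.  */
size_t
sketch_count_tokens (char *buf, const char *delim)
{
  size_t n = 0;
  char *save;

  for (char *t = sketch_strtok_r (buf, delim, &save); t != NULL;
       t = sketch_strtok_r (NULL, delim, &save))
    n++;
  return n;
}
```

When the last token runs to the end of the string, *SAVE_PTR is left on
the terminating NUL, so the following call takes the early exit without
calling strspn or strcspn at all.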