* RE: [PATCH][PING] Improve stpncpy performance
@ 2015-07-06 11:29 Wilco Dijkstra
2015-07-09 13:21 ` Ondřej Bílka
0 siblings, 1 reply; 5+ messages in thread
From: Wilco Dijkstra @ 2015-07-06 11:29 UTC (permalink / raw)
To: libc-alpha
> Wilco wrote:
> ping
>
> > > -----Original Message-----
> > > From: Wilco Dijkstra [mailto:wdijkstr@arm.com]
> > > Sent: 12 January 2015 15:09
> > > To: 'libc-alpha@sourceware.org'
> > > Subject: [PATCH] Improve stpncpy performance
> > >
> > > Like strncpy, this patch improves stpncpy performance by using strnlen/memcpy/memset
> > > rather than a byte loop. Performance on bench-stpncpy is ~2x faster on average.
> > >
> > > ChangeLog:
> > >
> > > 2015-01-12  Wilco Dijkstra  <wdijkstr@arm.com>
> > >
> > > * string/stpncpy.c (stpncpy): Improve performance using
> > > __strnlen/memcpy/memset.
> > >
> > > ---
> > > string/stpncpy.c | 59 ++++++--------------------------------------------------
> > > 1 file changed, 6 insertions(+), 53 deletions(-)
> > >
> > > diff --git a/string/stpncpy.c b/string/stpncpy.c
> > > index fad747e..50521aa 100644
> > > --- a/string/stpncpy.c
> > > +++ b/string/stpncpy.c
> > > @@ -15,7 +15,7 @@
> > > License along with the GNU C Library; if not, see
> > > <http://www.gnu.org/licenses/>. */
> > >
> > > -/* This is almost copied from strncpy.c, written by Torbjorn Granlund. */
> > > +#include <stddef.h>
> > >
> > > #ifdef HAVE_CONFIG_H
> > > # include <config.h>
> > > @@ -41,59 +41,12 @@ weak_alias (__stpncpy, stpncpy)
> > > char *
> > > STPNCPY (char *dest, const char *src, size_t n)
> > > {
> > > - char c;
> > > - char *s = dest;
> > > -
> > > - if (n >= 4)
> > > - {
> > > - size_t n4 = n >> 2;
> > > -
> > > - for (;;)
> > > - {
> > > - c = *src++;
> > > - *dest++ = c;
> > > - if (c == '\0')
> > > - break;
> > > - c = *src++;
> > > - *dest++ = c;
> > > - if (c == '\0')
> > > - break;
> > > - c = *src++;
> > > - *dest++ = c;
> > > - if (c == '\0')
> > > - break;
> > > - c = *src++;
> > > - *dest++ = c;
> > > - if (c == '\0')
> > > - break;
> > > - if (--n4 == 0)
> > > - goto last_chars;
> > > - }
> > > - n -= dest - s;
> > > - goto zero_fill;
> > > - }
> > > -
> > > - last_chars:
> > > - n &= 3;
> > > - if (n == 0)
> > > + size_t size = __strnlen (src, n);
> > > + memcpy (dest, src, size);
> > > + dest += size;
> > > + if (size == n)
> > > return dest;
> > > -
> > > - for (;;)
> > > - {
> > > - c = *src++;
> > > - --n;
> > > - *dest++ = c;
> > > - if (c == '\0')
> > > - break;
> > > - if (n == 0)
> > > - return dest;
> > > - }
> > > -
> > > - zero_fill:
> > > - while (n-- > 0)
> > > - dest[n] = '\0';
> > > -
> > > - return dest - 1;
> > > + return memset (dest, '\0', n - size);
> > > }
> > > #ifdef weak_alias
> > > libc_hidden_def (__stpncpy)
> > > --
> > > 1.9.1
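For reference, stripped of glibc's internal plumbing (the STPNCPY macro, __strnlen and the
weak aliases), the patched function reduces to roughly the standalone sketch below. The
sketch uses the public strnlen, and the small main only illustrates stpncpy's padding and
return-value semantics; neither is part of the patch.

#define _POSIX_C_SOURCE 200809L   /* for strnlen */
#include <stdio.h>
#include <string.h>

/* Illustrative standalone version of the patched algorithm.  */
static char *
stpncpy_sketch (char *dest, const char *src, size_t n)
{
  size_t size = strnlen (src, n);       /* length of src, capped at n */
  memcpy (dest, src, size);             /* copy the string body */
  dest += size;
  if (size == n)                        /* buffer full: nothing to pad */
    return dest;
  /* memset returns its first argument, i.e. a pointer to the first
     padding NUL, which is exactly what stpncpy must return.  */
  return memset (dest, '\0', n - size);
}

int
main (void)
{
  char buf[8];
  char *end = stpncpy_sketch (buf, "hi", sizeof buf);
  /* buf now holds "hi" followed by six NULs; end points at buf + 2,
     the first padding NUL.  */
  printf ("%td\n", end - buf);          /* prints 2 */
  return 0;
}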
* Re: [PATCH][PING] Improve stpncpy performance
2015-07-06 11:29 [PATCH][PING] Improve stpncpy performance Wilco Dijkstra
@ 2015-07-09 13:21 ` Ondřej Bílka
2015-07-09 15:02 ` Wilco Dijkstra
0 siblings, 1 reply; 5+ messages in thread
From: Ondřej Bílka @ 2015-07-09 13:21 UTC (permalink / raw)
To: Wilco Dijkstra; +Cc: libc-alpha
On Mon, Jul 06, 2015 at 12:29:24PM +0100, Wilco Dijkstra wrote:
> > Wilco wrote:
> > ping
> >
> > > > -----Original Message-----
> > > > From: Wilco Dijkstra [mailto:wdijkstr@arm.com]
> > > > Sent: 12 January 2015 15:09
> > > > To: 'libc-alpha@sourceware.org'
> > > > Subject: [PATCH] Improve stpncpy performance
> > > >
> > > > Like strncpy, this patch improves stpncpy performance by using strnlen/memcpy/memset
> > > > rather than a byte loop. Performance on bench-stpncpy is ~2x faster on average.
> > > >
> > > > + size_t size = __strnlen (src, n);
> > > > + memcpy (dest, src, size);
> > > > + dest += size;
> > > > + if (size == n)
> > > > return dest;
> > > > -
> > > > - for (;;)
> > > > - {
> > > > - c = *src++;
> > > > - --n;
> > > > - *dest++ = c;
> > > > - if (c == '\0')
> > > > - break;
> > > > - if (n == 0)
> > > > - return dest;
> > > > - }
> > > > -
> > > > - zero_fill:
> > > > - while (n-- > 0)
> > > > - dest[n] = '\0';
> > > > -
> > > > - return dest - 1;
> > > > + return memset (dest, '\0', n - size);
> > > > }
> > > > #ifdef weak_alias
> > > > libc_hidden_def (__stpncpy)
> > > > --
> > > > 1.9.1
>
You don't have to use the special case

  if (size == n)
    return dest;

as it should be handled by

  return memset (dest, '\0', 0);

That could improve performance a bit if it's a rare case. That doesn't
matter much, as the memset makes this function slow and it shouldn't be
used in performance-sensitive code.
Otherwise ok for me.
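A minimal sketch of the simplification suggested above (a hypothetical variant for
comparison, not what was committed): drop the early return and always call memset, relying
on a zero-length memset writing nothing and returning its first argument, so the returned
pointer is dest + size in both cases.

#define _POSIX_C_SOURCE 200809L   /* for strnlen */
#include <string.h>

/* Hypothetical variant without the special case, for comparison only.
   When size == n the memset has length 0, writes nothing, and returns
   dest + size, so the result is unchanged; the cost is an extra call.  */
char *
stpncpy_no_branch (char *dest, const char *src, size_t n)
{
  size_t size = strnlen (src, n);
  memcpy (dest, src, size);
  return memset (dest + size, '\0', n - size);
}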
* RE: [PATCH][PING] Improve stpncpy performance
2015-07-09 13:21 ` Ondřej Bílka
@ 2015-07-09 15:02 ` Wilco Dijkstra
2015-08-12 20:57 ` Ondřej Bílka
0 siblings, 1 reply; 5+ messages in thread
From: Wilco Dijkstra @ 2015-07-09 15:02 UTC (permalink / raw)
To: 'Ondřej Bílka'; +Cc: libc-alpha
> Ondřej Bílka wrote:
>
> You don't have to use the special case
>
>   if (size == n)
>     return dest;
>
> as it should be handled by
>
>   return memset (dest, '\0', 0);
>
> That could improve performance a bit if it's a rare case. That doesn't
> matter much, as the memset makes this function slow and it shouldn't be
> used in performance-sensitive code.
>
> Otherwise ok for me.
On the benchtests the extra if made a significant difference, particularly
since a memset of zero bytes is relatively expensive, as it is treated as a very
rare case. It seems this case should be less likely than the benchtests indicate,
but we'd have to fix the benchtests first to use more realistic data.
Wilco
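The kind of comparison involved can be approximated with a throwaway microbenchmark like
the sketch below. This is not glibc's bench-stpncpy; the names copy_branch, copy_memset
and time_it are made up for illustration, and the input is chosen so that size == n on
every call (the case being discussed).

#define _POSIX_C_SOURCE 200809L   /* for strnlen and clock_gettime */
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Variant with the early return for the size == n case.  */
static char *
copy_branch (char *dest, const char *src, size_t n)
{
  size_t size = strnlen (src, n);
  memcpy (dest, src, size);
  dest += size;
  if (size == n)
    return dest;                        /* no memset call at all */
  return memset (dest, '\0', n - size);
}

/* Variant that always calls memset, with length 0 when size == n.  */
static char *
copy_memset (char *dest, const char *src, size_t n)
{
  size_t size = strnlen (src, n);
  memcpy (dest, src, size);
  return memset (dest + size, '\0', n - size);
}

static double
time_it (char *(*copy) (char *, const char *, size_t))
{
  static char dst[64];
  const char *src = "0123456789abcdef";   /* 16 bytes, no NUL within n */
  struct timespec t0, t1;
  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (int i = 0; i < 10000000; i++)
    copy (dst, src, 16);                   /* size == n on every call */
  clock_gettime (CLOCK_MONOTONIC, &t1);
  return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

int
main (void)
{
  printf ("with branch: %.3f s\n", time_it (copy_branch));
  printf ("with memset: %.3f s\n", time_it (copy_memset));
  return 0;
}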
* Re: [PATCH][PING] Improve stpncpy performance
2015-07-09 15:02 ` Wilco Dijkstra
@ 2015-08-12 20:57 ` Ondřej Bílka
2015-08-19 16:15 ` Wilco Dijkstra
0 siblings, 1 reply; 5+ messages in thread
From: Ondřej Bílka @ 2015-08-12 20:57 UTC (permalink / raw)
To: Wilco Dijkstra; +Cc: libc-alpha
On Thu, Jul 09, 2015 at 04:02:26PM +0100, Wilco Dijkstra wrote:
> > Ondřej Bílka wrote:
> >
> > You don't have to use the special case
> >
> >   if (size == n)
> >     return dest;
> >
> > as it should be handled by
> >
> >   return memset (dest, '\0', 0);
> >
> > That could improve performance a bit if it's a rare case. That doesn't
> > matter much, as the memset makes this function slow and it shouldn't be
> > used in performance-sensitive code.
> >
> > Otherwise ok for me.
>
> On the benchtests the extra if made a significant difference, particularly
> since a memset of zero bytes is relatively expensive, as it is treated as a very
> rare case. It seems this case should be less likely than the benchtests indicate,
> but we'd have to fix the benchtests first to use more realistic data.
>
Ok, I did some data collection and I take my objection back, as the special
case almost always happens in bash. I was surprised that it needs to use strncpy to
copy such small numbers of bytes.

When I tested with the dryrun benchmark, the special-cased version is faster. I got the
following data on strncpy but not on stpncpy, so we could reuse it for the stpcpy patch
that you also submitted.
replaying bash
calls 194
average n: 15.6082 n <= 0: 4.6% n <= 4: 6.7% n <= 8: 43.3% n <= 16: 80.9% n <= 24: 88.7% n <= 32: 88.7% n <= 48: 91.2% n <= 64: 98.5%
s aligned to 4 bytes: 99.5% 8 bytes: 97.9% 16 bytes: 0.5%
average *s access cache latency 0.9072 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
s2 aligned to 4 bytes: 34.0% 8 bytes: 24.2% 16 bytes: 1.5%
s-s2 aligned to 4 bytes: 34.5% 8 bytes: 22.7% 16 bytes: 22.7%
average *s2 access cache latency 1.1186 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
average capacity: 15.6082 c <= 0: 4.6% c <= 4: 6.7% c <= 8: 43.3% c <= 16: 80.9% c <= 24: 88.7% c <= 32: 88.7% c <= 48: 91.2% c <= 64: 98.5% n == capa : 100.0%
replaying mc
calls 6971
average n: 9.0773 n <= 0: 2.7% n <= 4: 54.3% n <= 8: 71.9% n <= 16: 85.5% n <= 24: 91.0% n <= 32: 94.2% n <= 48: 96.9% n <= 64: 98.7%
s aligned to 4 bytes: 100.0% 8 bytes: 100.0% 16 bytes: 100.0%
average *s access cache latency 36.3347 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
s2 aligned to 4 bytes: 55.3% 8 bytes: 49.5% 16 bytes: 48.1%
s-s2 aligned to 4 bytes: 55.3% 8 bytes: 49.5% 16 bytes: 48.1%
average *s2 access cache latency 1.0126 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
average capacity: 9.5847 c <= 0: 0.7% c <= 4: 52.2% c <= 8: 70.9% c <= 16: 84.9% c <= 24: 90.7% c <= 32: 93.8% c <= 48: 96.5% c <= 64: 98.7% n == capa : 63.6%
replaying mutt
calls 10415
average n: 7.6572 n <= 0: 0.1% n <= 4: 68.6% n <= 8: 82.5% n <= 16: 86.4% n <= 24: 87.7% n <= 32: 94.5% n <= 48: 96.6% n <= 64: 98.9%
s aligned to 4 bytes: 57.9% 8 bytes: 49.1% 16 bytes: 45.2%
average *s access cache latency 1.1092 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
s2 aligned to 4 bytes: 85.7% 8 bytes: 79.8% 16 bytes: 79.5%
s-s2 aligned to 4 bytes: 43.6% 8 bytes: 28.9% 16 bytes: 24.7%
average *s2 access cache latency 1.1324 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
average capacity: 51.2750 c <= 0: 0.0% c <= 4: 65.3% c <= 8: 73.0% c <= 16: 74.6% c <= 24: 75.2% c <= 32: 75.3% c <= 48: 75.3% c <= 64: 75.3% n == capa : 64.4%
replaying /bin/bash
calls 60
average n: 10.3167 n <= 0: 1.7% n <= 4: 26.7% n <= 8: 56.7% n <= 16: 88.3% n <= 24: 98.3% n <= 32: 98.3% n <= 48: 98.3% n <= 64: 98.3%
s aligned to 4 bytes: 100.0% 8 bytes: 100.0% 16 bytes: 1.7%
average *s access cache latency 0.8833 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
s2 aligned to 4 bytes: 36.7% 8 bytes: 33.3% 16 bytes: 3.3%
s-s2 aligned to 4 bytes: 36.7% 8 bytes: 33.3% 16 bytes: 31.7%
average *s2 access cache latency 0.9167 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
average capacity: 10.3167 c <= 0: 1.7% c <= 4: 26.7% c <= 8: 56.7% c <= 16: 88.3% c <= 24: 98.3% c <= 32: 98.3% c <= 48: 98.3% c <= 64: 98.3% n == capa : 100.0%
replaying as
calls 122
average n: 6.8115 n <= 0: 0.8% n <= 4: 6.6% n <= 8: 95.1% n <= 16: 98.4% n <= 24: 100.0% n <= 32: 100.0% n <= 48: 100.0% n <= 64: 100.0%
s aligned to 4 bytes: 100.0% 8 bytes: 100.0% 16 bytes: 100.0%
average *s access cache latency 1.0410 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
s2 aligned to 4 bytes: 25.4% 8 bytes: 13.1% 16 bytes: 5.7%
s-s2 aligned to 4 bytes: 25.4% 8 bytes: 13.1% 16 bytes: 5.7%
average *s2 access cache latency 0.9262 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
average capacity: 126.9508 c <= 0: 0.8% c <= 4: 0.8% c <= 8: 0.8% c <= 16: 0.8% c <= 24: 0.8% c <= 32: 0.8% c <= 48: 0.8% c <= 64: 0.8% n == capa : 0.8%
* RE: [PATCH][PING] Improve stpncpy performance
2015-08-12 20:57 ` Ondřej Bílka
@ 2015-08-19 16:15 ` Wilco Dijkstra
0 siblings, 0 replies; 5+ messages in thread
From: Wilco Dijkstra @ 2015-08-19 16:15 UTC (permalink / raw)
To: 'Ondřej Bílka'; +Cc: libc-alpha
> Ondřej Bílka wrote:
> On Thu, Jul 09, 2015 at 04:02:26PM +0100, Wilco Dijkstra wrote:
> > > Ondřej Bílka wrote:
> > >
> > > You don't have to use the special case
> > >
> > >   if (size == n)
> > >     return dest;
> > >
> > > as it should be handled by
> > >
> > >   return memset (dest, '\0', 0);
> > >
> > > That could improve performance a bit if it's a rare case. That doesn't
> > > matter much, as the memset makes this function slow and it shouldn't be
> > > used in performance-sensitive code.
> > >
> > > Otherwise ok for me.
> >
> > On the benchtests the extra if made a significant difference, particularly
> > since a memset of zero bytes is relatively expensive, as it is treated as a very
> > rare case. It seems this case should be less likely than the benchtests indicate,
> > but we'd have to fix the benchtests first to use more realistic data.
> >
> Ok, I did some data collection and I take my objection back, as the special
> case almost always happens in bash. I was surprised that it needs to use strncpy to
> copy such small numbers of bytes.
>
> When I tested with the dryrun benchmark, the special-cased version is faster. I got the
> following data on strncpy but not on stpncpy, so we could reuse it for the stpcpy patch
> that you also submitted.
OK thanks for checking the statistics. I've now committed it.
Wilco