From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-52769-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Received: (qmail 26336 invoked by alias); 12 Sep 2014 11:04:14 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
From: "Wilco Dijkstra" <wdijkstr@arm.com>
To: 'Ondřej Bílka' <neleai@seznam.cz>
Cc: "'Rich Felker'" <dalias@libc.org>,
	"Florian Weimer" <fweimer@redhat.com>,
	<azanella@linux.vnet.ibm.com>,
	<libc-alpha@sourceware.org>
References: <001301cfcd0a$f0b62670$d2227350$@com> <54108BB0.90902@redhat.com> <20140910180144.GK23797@brightrain.aerifal.cx> <002501cfcdf7$cc046510$640d2f30$@com> <20140912062203.GB19287@domone>
In-Reply-To: <20140912062203.GB19287@domone>
Subject: RE: [PATCH] Improve performance of strncpy
Date: Fri, 12 Sep 2014 11:04:00 -0000
Message-ID: <002601cfce79$461ffd60$d25ff820$@com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-2
Content-Transfer-Encoding: quoted-printable
X-SW-Source: 2014-09/txt/msg00266.txt.bz2

> Ondřej Bílka wrote:
> On Thu, Sep 11, 2014 at 08:37:17PM +0100, Wilco Dijkstra wrote:
> > I did a quick experiment with strcpy as it's simpler. Replacing it
> > with memcpy (d, s, strlen (s) + 1) is 3 times faster even on strings
> > of 16Mbytes! Perhaps more surprisingly, it has similar performance on
> > these huge strings as an optimized strcpy.
> >
> What architecture? This could also happen because memcpy has a special
> case to handle large strings that speeds this up. It's something that I
> tried in one-pass strcpy, but it harms performance, as the overhead of
> checking the size is bigger than the benefit for large sizes.

The 3x speedup happens on all 3 ISAs I tried. On ARM the strlen+memcpy
variant even beats the optimized strcpy for most sizes; on x64 it runs at
about 80% of the speed of the optimized strcpy for sizes above 4KB.
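To make the comparison concrete, here is a minimal C sketch of the two approaches: a naive single-pass byte loop (standing in for an unoptimized strcpy; it is not glibc's implementation) and the two-pass strlen+memcpy variant from the quoted experiment. The names naive_strcpy, two_pass_strcpy and copies_agree are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Single-pass baseline: copies one byte per iteration and stops after
   writing the terminating NUL.  A naive loop for illustration only,
   not any libc's optimized strcpy. */
static char *naive_strcpy(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* Two-pass variant: strlen finds the length, then memcpy copies the
   string plus its NUL terminator in bulk. */
static char *two_pass_strcpy(char *dst, const char *src)
{
    return memcpy(dst, src, strlen(src) + 1);
}

/* Hypothetical check: copy src with both variants into identically
   pre-filled buffers and report whether every byte matches afterwards
   and both functions returned dst.  Assumes strlen(src) < 256. */
static int copies_agree(const char *src)
{
    char a[256], b[256];

    assert(strlen(src) < sizeof a);   /* keep the sketch in bounds */
    memset(a, 'x', sizeof a);
    memset(b, 'x', sizeof b);

    /* Call outside assert() so the copies still happen under NDEBUG. */
    char *ra = naive_strcpy(a, src);
    char *rb = two_pass_strcpy(b, src);

    return ra == a && rb == b && memcmp(a, b, sizeof a) == 0;
}
```

Both variants write exactly the same bytes; they differ only in how. The two-pass version reads the source twice, but strlen and memcpy are typically tuned per architecture to work on whole words or vectors, which is presumably why two fast passes can beat one byte-at-a-time pass unless strcpy itself is hand-optimized.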

> > So the results are pretty clear, if you don't have a super optimized
> > strcpy, then strlen+memcpy is the best way to do it.
> >
> It is not that clear, as you spend a considerable amount of time on small
> lengths, where what matters is the constant overhead of strcpy startup.
> However, this needs platform-specific tricks to decide which alternative
> is fastest.

The overheads are relatively small on modern cores. The strlen+memcpy
variant is always faster than the single-loop strcpy for lengths larger
than 8-16 bytes.
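One way to look for the 8-16 byte crossover on a particular core is a small timing sweep. The sketch below is a rough, hypothetical harness: the helper names, the power-of-two lengths and the iteration count are arbitrary choices, and it times a naive byte loop rather than any libc's optimized strcpy. Absolute numbers will vary with compiler, flags and CPU; glibc's own benchtests are the better tool for real measurements.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef char *(*copy_fn)(char *, const char *);

/* Naive single-pass byte loop (not glibc's strcpy). */
static char *naive_strcpy(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* Two-pass strlen+memcpy variant. */
static char *two_pass_strcpy(char *dst, const char *src)
{
    return memcpy(dst, src, strlen(src) + 1);
}

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average nanoseconds per call of fn copying src into dst.  Calling
   through a volatile function pointer stops the compiler from inlining
   the copy or deleting the loop. */
static double ns_per_call(copy_fn fn, char *dst, const char *src, long iters)
{
    copy_fn volatile f = fn;
    assert(iters > 0);
    double t0 = now_ns();
    for (long i = 0; i < iters; i++)
        f(dst, src);
    return (now_ns() - t0) / (double)iters;
}

/* Hypothetical driver: print per-call times for string lengths
   1, 2, 4, ..., 64 so the point where strlen+memcpy overtakes the
   byte loop can be read off. */
void print_sweep(long iters)
{
    static char src[65], dst[65];
    for (size_t len = 1; len <= 64; len *= 2) {
        memset(src, 'a', len);
        src[len] = '\0';
        double loop = ns_per_call(naive_strcpy, dst, src, iters);
        double two = ns_per_call(two_pass_strcpy, dst, src, iters);
        printf("len %2zu: byte loop %6.2f ns, strlen+memcpy %6.2f ns\n",
               len, loop, two);
    }
}
```

Calling print_sweep(1000000) from main prints one line per length. Build with optimization (e.g. -O2) and keep in mind that the compiler may transform the byte loop, so treat the output as indicative rather than definitive.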

Wilco