From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-63524-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Received: (qmail 71132 invoked by alias); 28 Sep 2015 09:23:43 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 71122 invoked by uid 89); 28 Sep 2015 09:23:42 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.2
X-HELO: eu-smtp-delivery-143.mimecast.com
From: "Wilco Dijkstra" <wdijkstr@arm.com>
To: =?iso-8859-2?Q?'Ond=F8ej_B=EDlka'?= <neleai@seznam.cz>
Cc: "'GNU C Library'" <libc-alpha@sourceware.org>
References: <002d01d0f795$0ce77eb0$26b67c10$@com> <20150926084544.GA31280@domone>
In-Reply-To: <20150926084544.GA31280@domone>
Subject: RE: [PATCH][AArch64] Add optimized memchr
Date: Mon, 28 Sep 2015 09:23:00 -0000
Message-ID: <003601d0f9cf$5070f170$f152d450$@com>
MIME-Version: 1.0
X-MC-Unique: lIRpzaPFSQeZ4DFfDkr4cg-1
Content-Type: text/plain; charset=ISO-8859-2
Content-Transfer-Encoding: quoted-printable
X-SW-Source: 2015-09/txt/msg00678.txt.bz2

> Ondřej Bílka wrote:
> On Fri, Sep 25, 2015 at 02:21:13PM +0100, Wilco Dijkstra wrote:
> > An optimized memchr was missing for AArch64. This version is similar to
> > strchr and is significantly faster than the C version. Passes GLIBC tests.
> >
> > OK for commit?
> >
> > ChangeLog:
> > 2015-09-25  Wilco Dijkstra  <wdijkstr@arm.com>
> > 2015-09-25  Kevin Petit  <kevin.petit@arm.com>
> >
> > 	* sysdeps/aarch64/memchr.S (__memchr): New file.
>
> How did you test performance? I think that here, too, loading the first 32
> bytes unaligned should be better. Could you use dryrun to verify?
>
> Also, the same optimization could be used for memrchr.

I haven't tuned this at all; it is an existing implementation that
was added to Newlib last year but had not yet been ported to GLIBC.

For maximum performance, doing the first 16/32 bytes unaligned will likely
be fastest, just like it was for strlen.
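
To illustrate what "doing the first 16 bytes unaligned" could look like, here
is a rough portable C sketch. It is not the AArch64 assembly from the patch and
not the Newlib code: the names sketch_memchr, word_has_byte and scan_bytes are
made up for this example, and two 8-byte loads through memcpy stand in for a
single 16-byte vector load. The idea is to check the first 16 bytes at
whatever alignment the pointer has, then round up to a 16-byte boundary and
continue with aligned blocks.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if any byte of the 8-byte word equals c. */
static int word_has_byte(uint64_t w, unsigned char c)
{
    const uint64_t ones = 0x0101010101010101ULL;
    const uint64_t high = 0x8080808080808080ULL;
    uint64_t x = w ^ (ones * c);           /* matching bytes become zero */
    return ((x - ones) & ~x & high) != 0;  /* classic zero-byte test */
}

/* Hypothetical helper: plain byte scan of [s, s + n). */
static void *scan_bytes(const unsigned char *s, unsigned char c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] == c)
            return (void *)(s + i);
    return NULL;
}

/* Sketch only: unaligned 16-byte head, then 16-byte aligned blocks. */
void *sketch_memchr(const void *src, int ch, size_t n)
{
    const unsigned char *s = src;
    const unsigned char c = (unsigned char)ch;
    uint64_t w0, w1;

    /* Short inputs leave no room for a 16-byte head. */
    if (n < 16)
        return scan_bytes(s, c, n);

    /* Head: check the first 16 bytes regardless of alignment. */
    memcpy(&w0, s, 8);
    memcpy(&w1, s + 8, 8);
    if (word_has_byte(w0, c) || word_has_byte(w1, c))
        return scan_bytes(s, c, 16);

    /* Body: resume at the next 16-byte boundary. That boundary is at most
       s + 16, so any bytes checked twice are already known not to match. */
    const unsigned char *end = s + n;
    const unsigned char *p = s + (16 - ((uintptr_t)s & 15));

    while ((size_t)(end - p) >= 16) {
        memcpy(&w0, p, 8);
        memcpy(&w1, p + 8, 8);
        if (word_has_byte(w0, c) || word_has_byte(w1, c))
            return scan_bytes(p, c, 16);
        p += 16;
    }

    /* Tail: fewer than 16 bytes remain. */
    return scan_bytes(p, c, (size_t)(end - p));
}
```

The overlap between the head and the first aligned block costs a few repeated
comparisons but avoids a byte-by-byte alignment loop at the start, which is
presumably where the gain for short searches would come from.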

Wilco