From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-ports-return-4057-listarch-libc-ports=sources.redhat.com@sourceware.org>
Received: (qmail 18008 invoked by alias); 22 Apr 2013 08:32:35 -0000
Mailing-List: contact libc-ports-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-ports.sourceware.org>
List-Subscribe: <mailto:libc-ports-subscribe@sourceware.org>
List-Post: <mailto:libc-ports@sourceware.org>
List-Help: <mailto:libc-ports-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: libc-ports-owner@sourceware.org
Received: (qmail 17997 invoked by uid 89); 22 Apr 2013 08:32:34 -0000
X-Spam-SWARE-Status: No, score=-3.4 required=5.0 tests=AWL,BAYES_00,KHOP_RCVD_UNTRUST,KHOP_THREADED,RCVD_IN_DNSWL_LOW,RCVD_IN_HOSTKARMA_YE,TW_CP autolearn=ham version=3.3.1
Received: from mail-ie0-f182.google.com (HELO mail-ie0-f182.google.com) (209.85.223.182)    by sourceware.org (qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP; Mon, 22 Apr 2013 08:32:34 +0000
Received: by mail-ie0-f182.google.com with SMTP id bn7so4595181ieb.41        for <libc-ports@sourceware.org>; Mon, 22 Apr 2013 01:32:32 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=google.com; s=20120113;        h=mime-version:x-received:in-reply-to:references:date:message-id         :subject:from:to:cc:content-type:x-gm-message-state;        bh=y1Hr/oIYFjl2ZRQXzVN7OvPmvkYQG9Fh5efGN/FJ4PY=;        b=RbqlLX5hQpNgl3CCudk+yRPBKrRxz7Ovrn4ko5rnBd13Vs1il11vV87m0CBglp1+8K         AyC9NrWFHluloMD+P2944V7GIQMo2jsNVpql9pRR3wcRoL4+QVMS38FbAHqmYjyXGfNE         IDKO5e3parALDfl9dLblF125FazG6yVQS6iceEtkdvtDDlz4ExREWA6EtNhWCNfpKEuP         jRiGtgWy1QweT9hHJymBbRrvA8qqcL+OGCeG7Sh6wfUUh2ryfaYXNX20kKnTJnyGeRAD         JyMf7tphoDtVwrg/vRhHnpKYl/SYVAZY+46172eaoAsjWePFdCuIfqYeYyzKvUObdEbf         8C+Q==
MIME-Version: 1.0
X-Received: by 10.43.134.202 with SMTP id id10mr12504498icc.46.1366619552659; Mon, 22 Apr 2013 01:32:32 -0700 (PDT)
Received: by 10.64.100.174 with HTTP; Mon, 22 Apr 2013 01:32:32 -0700 (PDT)
In-Reply-To: <Pine.LNX.4.64.1304192138380.27838@digraph.polyomino.org.uk>
References: <516D18F0.4060009@linaro.org>	<Pine.LNX.4.64.1304192138380.27838@digraph.polyomino.org.uk>
Date: Mon, 22 Apr 2013 08:32:00 -0000
Message-ID: <CANu=DmiTJM7-RzHFjwuhZ=d+s=_iWAdVVmrwLk0kEN70BNmv2w@mail.gmail.com>
Subject: Re: [PATCH v2] ARM: Add Cortex-A15 optimized NEON and VFP memcpy routines, with IFUNC.
From: Will Newton <will.newton@linaro.org>
To: "Joseph S. Myers" <joseph@codesourcery.com>
Cc: libc-ports@sourceware.org, Patch Tracking <patches@linaro.org>
Content-Type: text/plain; charset=ISO-8859-1
X-Gm-Message-State: ALoCoQmQ+zk27E339ZBn5gJROTAQPpcmDhLnEbURt6j+zC1GXDE5TDJ6dupHZGbt50CtSI0DDVOH
X-SW-Source: 2013-04/txt/msg00098.txt.bz2

On 19 April 2013 22:47, Joseph S. Myers <joseph@codesourcery.com> wrote:

Hi Joseph,

> On Tue, 16 Apr 2013, Will Newton wrote:
>
>> Add a high performance memcpy routine optimized for Cortex-A15 with
>> variants for use in the presence of NEON and VFP hardware selected
>> at runtime using indirect function support.
>
> The functions __aeabi_memcpy, __aeabi_memcpy4 and __aeabi_memcpy8,
> currently implemented to call memcpy, have their ABI defined to clobber
> only the core registers permitted to be clobbered by AAPCS, and not the
> normally call-clobbered VFP/NEON registers.
>
> This patch would cause those functions to start clobbering some VFP/NEON
> registers.  So you need to do something to avoid that, whether making the
> __aeabi_* functions save and restore registers in the affected case,
> making the new functions do so or some other approach such as making
> __aeabi_* use a variant of the code with an extra save/restore.
>
> As I understand the code, memcpy within ld.so itself will always be a
> version using the core registers only, so you shouldn't have the extra
> issue of needing to avoid corrupting such registers when used for argument
> passing in the VFP ABI variant.  Though if you were to support building a
> glibc version that requires VFP/NEON, where the new code is used
> unconditionally rather than just through IFUNC - and such a glibc is a
> perfectly reasonable thing to build, after all if you are building for the
> VFP ABI then you may as well assume at least VFP to be present everywhere
> - then you would need to deal with that issue.  (Cf.
> <http://sourceware.org/ml/libc-ports/2012-04/msg00087.html>.)

I suspect that adding extra saving/restoring would be a significant
performance overhead, particularly for small copies. Would it make
sense just to make the __aeabi_memcpy functions call the fallback ARM
routine? That would mean no performance improvement for __aeabi_memcpy
calls, but also no performance degradation for the explicit memcpy case.
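
A minimal sketch of that idea, written in C purely for illustration (the
patch itself is assembly, and glibc would use IFUNC rather than a plain
function pointer). Every name below (memcpy_core, memcpy_neon,
select_memcpy, aeabi_memcpy_sketch, HYPOTHETICAL_HWCAP_NEON) is a
hypothetical stand-in, not something taken from the patch:

```c
#include <stddef.h>

/* Hypothetical stand-in for the existing fallback routine.  The point is
   that it touches core registers only, so it is safe to use behind the
   __aeabi_* entry points.  */
static void *memcpy_core(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical placeholder for the NEON variant.  The real routine would
   use (and clobber) VFP/NEON registers; here it just forwards so that the
   sketch stays portable.  */
static void *memcpy_neon(void *dst, const void *src, size_t n)
{
    return memcpy_core(dst, src, n);
}

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Assumed capability bit, for illustration only.  */
#define HYPOTHETICAL_HWCAP_NEON (1u << 0)

/* Resolver-style selection for the explicit memcpy case: pick the fast
   variant only when the hardware capability is reported.  */
static memcpy_fn select_memcpy(unsigned int hwcap)
{
    return (hwcap & HYPOTHETICAL_HWCAP_NEON) ? memcpy_neon : memcpy_core;
}

/* The __aeabi_memcpy-style entry point bypasses the selection entirely
   and always uses the core-register-only routine, so it never clobbers
   VFP/NEON registers.  */
static void aeabi_memcpy_sketch(void *dst, const void *src, size_t n)
{
    memcpy_core(dst, src, n);
}
```

With this shape, explicit memcpy callers get whichever variant the
selection picks, while the __aeabi_* path pays no save/restore cost and
sees no speed-up either.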


--
Will Newton
Toolchain Working Group, Linaro