From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact libc-ports-help@sourceware.org; run by ezmlm
Sender: libc-ports-owner@sourceware.org
References: <516D18F0.4060009@linaro.org>
Date: Mon, 22 Apr 2013 08:32:00 -0000
Subject: Re: [PATCH v2] ARM: Add Cortex-A15 optimized NEON and VFP memcpy routines, with IFUNC.
From: Will Newton
To: "Joseph S. Myers"
Cc: libc-ports@sourceware.org, Patch Tracking
Content-Type: text/plain; charset=ISO-8859-1
X-SW-Source: 2013-04/txt/msg00098.txt.bz2

On 19 April 2013 22:47, Joseph S. Myers wrote:

Hi Joseph,

> On Tue, 16 Apr 2013, Will Newton wrote:
>
>> Add a high performance memcpy routine optimized for Cortex-A15 with
>> variants for use in the presence of NEON and VFP hardware selected
>> at runtime using indirect function support.
>
> The functions __aeabi_memcpy, __aeabi_memcpy4 and __aeabi_memcpy8,
> currently implemented to call memcpy, have their ABI defined to clobber
> only the core registers permitted to be clobbered by AAPCS, and not the
> normally call-clobbered VFP/NEON registers.
>
> This patch would cause those functions to start clobbering some VFP/NEON
> registers. So you need to do something to avoid that, whether making the
> __aeabi_* functions save and restore registers in the affected case,
> making the new functions do so or some other approach such as making
> __aeabi_* use a variant of the code with an extra save/restore.
>
> As I understand the code, memcpy within ld.so itself will always be a
> version using the core registers only, so you shouldn't have the extra
> issue of needing to avoid corrupting such registers when used for argument
> passing in the VFP ABI variant. Though if you were to support building a
> glibc version that requires VFP/NEON, where the new code is used
> unconditionally rather than just through IFUNC - and such a glibc is a
> perfectly reasonable thing to build, after all if you are building for the
> VFP ABI then you may as well assume at least VFP to be present everywhere
> - then you would need to deal with that issue. (Cf.
> .)

I suspect adding in extra saving/restoring would be a significant
performance overhead, particularly for small copies.
Would it make sense just to make __aeabi_memcpy call the fallback arm
routine? That would mean no performance improvement for __aeabi_memcpy
calls but no performance degradation for the explicit memcpy case.

--
Will Newton
Toolchain Working Group, Linaro
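
A rough C sketch of the arrangement proposed above, for illustration only. It is not the patch's code: every name here (SKETCH_HWCAP_*, memcpy_core, memcpy_vfp, memcpy_neon, select_memcpy, aeabi_memcpy_sketch) is hypothetical, the VFP/NEON variants are plain placeholders that touch no vector registers, and an ordinary function that returns a function pointer stands in for the real IFUNC resolver. It only shows the shape of the idea: memcpy is chosen by hardware capability, while the __aeabi_* entry point always uses the core-register routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capability bits; stand-ins for the real hwcap flags. */
#define SKETCH_HWCAP_VFP  (1u << 0)
#define SKETCH_HWCAP_NEON (1u << 1)

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Stand-in for the existing fallback routine: core registers only. */
void *memcpy_core(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Placeholder for a variant that would use VFP registers. */
void *memcpy_vfp(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Placeholder for a variant that would use NEON registers. */
void *memcpy_neon(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Roughly the decision an IFUNC resolver would make once at load time. */
memcpy_fn select_memcpy(unsigned int hwcap)
{
    if (hwcap & SKETCH_HWCAP_NEON)
        return memcpy_neon;
    if (hwcap & SKETCH_HWCAP_VFP)
        return memcpy_vfp;
    return memcpy_core;
}

/* The proposal: the __aeabi_* entry point bypasses the selection and
   always uses the core-register routine, so it cannot clobber any
   VFP/NEON registers. */
void aeabi_memcpy_sketch(void *dst, const void *src, size_t n)
{
    (void) memcpy_core(dst, src, n);
}
```

With this split, callers of memcpy get whichever variant select_memcpy() picks, while callers of the __aeabi_* entry point see unchanged behaviour and pay no save/restore cost, at the price of not getting the faster routines.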