From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-79743-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Received: (qmail 29853 invoked by alias); 25 May 2017 19:26:51 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 29828 invoked by uid 89); 25 May 2017 19:26:49 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.2 spammy=Micro, thursday, rare
X-HELO: homiemail-a92.g.dreamhost.com
Subject: Re: Ping: [Patch] aarch64: Thunderx specific memcpy and memmove
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>, Andrew Pinski <pinskia@gmail.com>
Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com>,
 "Ellcey, Steve" <Steve.Ellcey@cavium.com>,
 libc-alpha <libc-alpha@sourceware.org>, nd <nd@arm.com>
References: <1493663254.29498.11.camel@cavium.com> <5909E2C5.7090603@arm.com>
 <d8858afd-ad4d-8fc3-9c0d-f95d6ab03c9c@gotplt.org>
 <1494366305.9224.26.camel@cavium.com>
 <74006e0a-fb4a-dc36-bc29-77303cef3cfb@gotplt.org>
 <DM5PR07MB34662F805C1EDE45882B82F6F5F90@DM5PR07MB3466.namprd07.prod.outlook.com>
 <5925BD04.7000902@arm.com> <0950612b-cff4-2256-6f81-3bacf30ce7e9@gotplt.org>
 <CA+=Sn1nwmW++JVH+ibFpC=80pByaSGWnB6MNGY0vsU8T5vV=-g@mail.gmail.com>
 <AM5PR0802MB26104227EC9F325E95C80D0883FF0@AM5PR0802MB2610.eurprd08.prod.outlook.com>
From: Siddhesh Poyarekar <siddhesh@gotplt.org>
Message-ID: <135198a3-ad77-5117-9c13-b4456268e74a@gotplt.org>
Date: Thu, 25 May 2017 19:26:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.1.0
MIME-Version: 1.0
In-Reply-To: <AM5PR0802MB26104227EC9F325E95C80D0883FF0@AM5PR0802MB2610.eurprd08.prod.outlook.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-SW-Source: 2017-05/txt/msg00771.txt.bz2

On Thursday 25 May 2017 11:19 PM, Wilco Dijkstra wrote:
> Given the number of micro architectures already existing, it would be a really
> bad situation to end up with one memcpy per micro architecture...

It's not just per micro-architecture...

> Micro architectures will tend to converge rather than diverge as performance
> level increases. So I believe it's generally best to use the same instructions for
> memcpy as for compiled code as that is what CPUs will actually encounter
> and optimize for. For the rare, very large copies we could do something different
> if it helps (eg. prefetch, non-temporals, SIMD registers etc).

... because as you say, micro-architectures may well converge over time
to some extent, but you will still end up with multiple memcpy
implementations that take advantage of different features of the aarch64
architecture.  For example, SVE routines vs non-SVE routines.
You'll need both, and looking at how x86 has evolved, there will be much
more to come.
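
To illustrate the kind of selection I mean, here is a minimal sketch,
not glibc's actual multiarch code.  The feature flags, the function
names and select_memcpy are all hypothetical, and both "tuned" routines
simply defer to memcpy so that the example stays self-contained.  A
real resolver would query the hardware instead of taking a flag
argument.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical feature flags, for illustration only.  */
enum cpu_feature { CPU_GENERIC = 0, CPU_HAS_SVE = 1 };

typedef void *(*memcpy_fn) (void *, const void *, size_t);

/* Stand-in for a generic routine; defers to the library memcpy.  */
static void *
memcpy_generic (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

/* Stand-in for an SVE-tuned routine; also defers to memcpy here.  */
static void *
memcpy_sve (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

/* Pick one implementation based on the reported feature bits.  */
static memcpy_fn
select_memcpy (unsigned int features)
{
  if (features & CPU_HAS_SVE)
    return memcpy_sve;
  return memcpy_generic;
}
```

The point is only that the choice keys on an architecture feature
rather than on a specific core, so it stays necessary even if the
micro-architectures themselves converge.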

> An ifunc has a measurable overhead unfortunately, and that would no longer
> be trivially avoidable via static linking. Most calls to memcpy tend to be very
> small copies. Maybe we should investigate statically linking the small copy part
> of memcpy with say -O3?

Sure, that might be something to look at as a data point, but again
getting rid of multiarch is not an option for desktop/server
implementations, especially if micro-architecture specific routines give
measurable gains over generic implementations in the general case, i.e.
dynamically linked programs that need to run out of the box and
optimally on multiple types of hardware.  Static binaries unfortunately
become the edge case here.
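
For reference, the small-copy idea could look roughly like the sketch
below.  This is an assumption about what such an experiment might be,
not a proposal: copy_small_inline and the SMALL_COPY_MAX threshold of 16
bytes are hypothetical names and values, and the compiler is free to
turn the byte loop back into a library call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-off below which the copy is done inline.  */
#define SMALL_COPY_MAX 16

/* Copy small sizes inline and leave everything else to the
   (possibly ifunc-dispatched) library memcpy.  */
static inline void *
copy_small_inline (void *dst, const void *src, size_t n)
{
  if (n <= SMALL_COPY_MAX)
    {
      unsigned char *d = dst;
      const unsigned char *s = src;
      for (size_t i = 0; i < n; i++)
        d[i] = s[i];
      return dst;
    }
  return memcpy (dst, src, n);
}
```

Whether this beats the indirect call for small sizes is exactly the
data point that would need measuring.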

Siddhesh