From: Ramana Radhakrishnan
Date: Mon, 27 Mar 2017 10:52:00 -0000
Subject: Re: [Patch] aarch64: Thunderx specific memcpy and memmove
To: Steve Ellcey
Cc: libc-alpha, Siddhesh Poyarekar, Adhemerval Zanella
In-Reply-To: <1490397926.19074.73.camel@caviumnetworks.com>
References: <1490397926.19074.73.camel@caviumnetworks.com>
X-SW-Source: 2017-03/txt/msg00610.txt.bz2

On Fri, Mar 24, 2017 at 11:25 PM, Steve Ellcey wrote:
> Now that the IFUNC infrastructure for aarch64 is in place, here is a
> patch to use it to create ThunderX specific versions of memcpy and
> memmove.
>
> This was part of my original patch before it was split in two and a
> couple of issues were raised at that time.
>
> Siddhesh Poyarekar wanted to separate the generic and thunderx copies
> of memcpy/memmove instead of using ifdefs in a combined source file.
> I prefer the ifdef version as a cleaner implementation with less code
> duplication but I can change it if that is the consensus.
>
> Also Adhemerval Zanella did some benchmarking that showed the
> prefetching done in the thunderx version might be appropriate for the
> generic version. However if you look at the prefetching we only do it
> every other time through the loop. This is because the loop copies 64
> bytes and the ThunderX cache line size is 128 bytes. If other aarch64
> chips have a 64 byte cache line they might want a different prefetching
> setup.

Can you link to the benchmark numbers, the workloads, and the systems
they were run on?

Ramana

>
> If people think we should use the ThunderX version of memcpy for all
> aarch64 systems I am happy to drop this patch and create one that just
> changes memcpy.S to do the ThunderX style prefetches for all aarch64
> systems.
>
> Steve Ellcey
> sellcey@cavium.com
>
>
> 2017-03-24  Steve Ellcey
>
>     * sysdeps/aarch64/memcpy.S (MEMMOVE, MEMCPY): New macros.
>     (memmove): Use MEMMOVE for name.
>     (memcpy): Use MEMCPY for name.  Add loop with prefetching
>     under USE_THUNDERX macro.
>     * sysdeps/aarch64/multiarch/Makefile: New file.
>     * sysdeps/aarch64/multiarch/ifunc-impl-list.c: Likewise.
>     * sysdeps/aarch64/multiarch/init-arch.h: Likewise.
>     * sysdeps/aarch64/multiarch/memcpy.c: Likewise.
>     * sysdeps/aarch64/multiarch/memcpy_generic.S: Likewise.
>     * sysdeps/aarch64/multiarch/memcpy_thunderx.S: Likewise.
>     * sysdeps/aarch64/multiarch/memmove.c: Likewise.
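
For readers following the prefetch discussion, here is a rough C sketch of the "prefetch every other iteration" idea described in the quoted text: the loop copies 64 bytes per pass, and a 128-byte cache line spans two passes, so one hint per two passes is enough. This is not the code from the patch (which is aarch64 assembly). The function name, the constants, the prefetch distance, and the returned counter are all assumptions made up for illustration, and `__builtin_prefetch` is a GCC/Clang builtin rather than anything glibc's memcpy.S uses.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative constants only; none of these names come from the patch. */
enum {
    CHUNK = 64,                 /* bytes copied per loop iteration */
    LINE = 128,                 /* assumed cache line size (ThunderX) */
    PREFETCH_AHEAD = 2 * LINE   /* assumed prefetch distance */
};

/* Hypothetical helper: copies n bytes (non-overlapping buffers) and returns
   how many prefetch hints were issued, so the pattern can be inspected. */
size_t copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t off = 0;
    size_t prefetches = 0;

    while (n - off >= CHUNK) {
        /* Two 64-byte chunks fit in one 128-byte line, so a hint is only
           useful on every other iteration.  Offsets are relative to the
           start of the buffer; a real routine would care about alignment.
           The bound keeps the hinted address inside the source buffer. */
        if (off % LINE == 0 && off + PREFETCH_AHEAD < n) {
            __builtin_prefetch(s + off + PREFETCH_AHEAD, 0, 0);
            prefetches++;
        }
        memcpy(d + off, s + off, CHUNK);
        off += CHUNK;
    }

    /* Copy whatever is left after the last full chunk. */
    memcpy(d + off, s + off, n - off);
    return prefetches;
}
```

On a chip with a 64-byte cache line, setting LINE to 64 in this sketch would issue a hint on every pass instead, which is the "different prefetching setup" the quoted text alludes to.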