Subject: Re: [Patch] aarch64: Thunderx specific memcpy and memmove
To: Steve Ellcey, libc-alpha
Cc: Siddhesh Poyarekar, Adhemerval Zanella
From: Wainer dos Santos Moschetta
Date: Sun, 02 Apr 2017 00:02:00 -0000
In-Reply-To: <1490397926.19074.73.camel@caviumnetworks.com>
References: <1490397926.19074.73.camel@caviumnetworks.com>

sysdeps/aarch64/multiarch/memcpy_generic.S has:

+#include "../memcpy.S"

Is it OK to use a relative path here, or is it recommended to use the
full path starting from sysdeps?

On 03/24/2017 08:25 PM, Steve Ellcey wrote:
> Now that the IFUNC infrastructure for aarch64 is in place, here is a
> patch to use it to create ThunderX specific versions of memcpy and
> memmove.
>
> This was part of my original patch before it was split in two and a
> couple of issues were raised at that time.
>
> Siddhesh Poyarekar wanted to separate the generic and thunderx copies
> of memcpy/memmove instead of using ifdefs in a combined source file.
> I prefer the ifdef version as a cleaner implementation with less code
> duplication but I can change it if that is the consensus.
>
> Also Adhemerval Zanella did some benchmarking that showed the
> prefetching done in the thunderx version might be appropriate for the
> generic version. However if you look at the prefetching we only do it
> every other time through the loop. This is because the loop copies 64
> bytes and the ThunderX cache line size is 128 bytes. If other aarch64
> chips have a 64 byte cache line they might want a different prefetching
> setup.
>
> If people think we should use the ThunderX version of memcpy for all
> aarch64 systems I am happy to drop this patch and create one that just
> changes memcpy.S to do the ThunderX style prefetches for all aarch64
> systems.
>
> Steve Ellcey
> sellcey@cavium.com
>
>
> 2017-03-24  Steve Ellcey
>
> 	* sysdeps/aarch64/memcpy.S (MEMMOVE, MEMCPY): New macros.
> 	(memmove): Use MEMMOVE for name.
> 	(memcpy): Use MEMCPY for name.  Add loop with prefetching
> 	under USE_THUNDERX macro.
> 	* sysdeps/aarch64/multiarch/Makefile: New file.
> 	* sysdeps/aarch64/multiarch/ifunc-impl-list.c: Likewise.
> 	* sysdeps/aarch64/multiarch/init-arch.h: Likewise.
> 	* sysdeps/aarch64/multiarch/memcpy.c: Likewise.
> 	* sysdeps/aarch64/multiarch/memcpy_generic.S: Likewise.
> 	* sysdeps/aarch64/multiarch/memcpy_thunderx.S: Likewise.
> 	* sysdeps/aarch64/multiarch/memmove.c: Likewise.
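
To make the "prefetch every other time through the loop" point above concrete, here is a rough C sketch of the idea. It is not the code from the patch, which is AArch64 assembly: copy_with_prefetch, BLOCK_SIZE and CACHE_LINE_SIZE are hypothetical names, the prefetch distance of one cache line is an assumption, and __builtin_prefetch assumes GCC or Clang. The function returns the number of hints issued only so the behaviour is easy to check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values taken from the description: 64-byte loop body, 128-byte cache line. */
enum { BLOCK_SIZE = 64, CACHE_LINE_SIZE = 128 };

/* Hypothetical helper: copies n bytes in 64-byte blocks and issues one
   prefetch hint per 128-byte cache line, i.e. on every other iteration.
   Returns the number of prefetch hints issued.  */
size_t
copy_with_prefetch (unsigned char *dst, const unsigned char *src, size_t n)
{
  const size_t blocks_per_line = CACHE_LINE_SIZE / BLOCK_SIZE; /* 2 */
  size_t prefetches = 0;
  size_t off = 0;

  for (size_t i = 0; off + BLOCK_SIZE <= n; i++, off += BLOCK_SIZE)
    {
      /* Only the first block of each cache line triggers a hint, and
         only while the next line still lies inside the source buffer
         (assumed prefetch distance: one cache line ahead).  */
      if (i % blocks_per_line == 0 && off + CACHE_LINE_SIZE < n)
        {
          __builtin_prefetch (src + off + CACHE_LINE_SIZE, 0, 0);
          prefetches++;
        }
      memcpy (dst + off, src + off, BLOCK_SIZE);
    }

  /* Copy the remaining tail of fewer than BLOCK_SIZE bytes.  */
  memcpy (dst + off, src + off, n - off);
  return prefetches;
}
```

On a chip with a 64-byte cache line, blocks_per_line would be 1 and the hint would fire on every iteration, which is the "different prefetching setup" the description alludes to.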