From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 59448 invoked by alias); 6 Nov 2018 15:01:23 -0000
Mailing-List: contact newlib-cvs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: newlib-cvs-owner@sourceware.org
Received: (qmail 58644 invoked by uid 9007); 6 Nov 2018 15:00:34 -0000
Date: Tue, 06 Nov 2018 15:01:00 -0000
Message-ID: <20181106150034.58641.qmail@sourceware.org>
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Richard Earnshaw
To: newlib-cvs@sourceware.org
Subject: [newlib-cygwin] Adjust writeback in non-zero memset
X-Act-Checkin: newlib-cygwin
X-Git-Author: Wilco Dijkstra
X-Git-Refname: refs/heads/master
X-Git-Oldrev: 535903696c339b71edd3575ab44bbf2e5eab689a
X-Git-Newrev: d80db600664bec381230be85955b54884f21a619
X-SW-Source: 2018-q4/txt/msg00023.txt.bz2

https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;h=d80db600664bec381230be85955b54884f21a619

commit d80db600664bec381230be85955b54884f21a619
Author: Wilco Dijkstra
Date:   Tue Nov 6 14:42:10 2018 +0000

    Adjust writeback in non-zero memset

    This fixes an inefficiency in the non-zero memset.  Delaying the
    writeback until the end of the loop is slightly faster on some cores -
    this shows ~5% performance gain on Cortex-A53 when doing large non-zero
    memsets.

    Tested against the GLIBC testsuite.

Diff:
---
 newlib/libc/machine/aarch64/memset.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/newlib/libc/machine/aarch64/memset.S b/newlib/libc/machine/aarch64/memset.S
index 799e7b7..7c8fe58 100644
--- a/newlib/libc/machine/aarch64/memset.S
+++ b/newlib/libc/machine/aarch64/memset.S
@@ -142,10 +142,10 @@ L(set_long):
 	b.eq	L(try_zva)
 L(no_zva):
 	sub	count, dstend, dst	/* Count is 16 too large.  */
-	add	dst, dst, 16
+	sub	dst, dst, 16		/* Dst is biased by -32.  */
 	sub	count, count, 64 + 16	/* Adjust count and bias for loop.  */
-1:	stp	q0, q0, [dst], 64
-	stp	q0, q0, [dst, -32]
+1:	stp	q0, q0, [dst, 32]
+	stp	q0, q0, [dst, 64]!
 L(tail64):
 	subs	count, count, 64
 	b.hi	1b
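
The address arithmetic in the diff above can be checked with a small C
model.  This is only a sketch, not part of the commit: it assumes each
"stp q0, q0" stores 32 bytes, models post-index and pre-index writeback as
explicit updates of dst, and ignores the count and tail handling.  The names
old_loop, new_loop and loops_agree, the buffer size and the entry offset are
all hypothetical.

```c
#include <string.h>

#define BUF 512  /* hypothetical buffer size for the model */

/* Old sequence: dst advanced by 16, writeback on the first store. */
static long old_loop(unsigned char *buf, long dst, long iters, unsigned char v)
{
    dst += 16;                          /* add dst, dst, 16 */
    for (long i = 0; i < iters; i++) {
        memset(buf + dst, v, 32);       /* stp q0, q0, [dst], 64: store ... */
        dst += 64;                      /* ... then post-index writeback */
        memset(buf + dst - 32, v, 32);  /* stp q0, q0, [dst, -32] */
    }
    return dst;
}

/* New sequence: dst biased by -32 relative to the old one,
   writeback delayed to the last store of the iteration. */
static long new_loop(unsigned char *buf, long dst, long iters, unsigned char v)
{
    dst -= 16;                          /* sub dst, dst, 16 */
    for (long i = 0; i < iters; i++) {
        memset(buf + dst + 32, v, 32);  /* stp q0, q0, [dst, 32] */
        dst += 64;                      /* stp q0, q0, [dst, 64]!: pre-index ... */
        memset(buf + dst, v, 32);       /* ... then store at the updated dst */
    }
    return dst;
}

/* Returns 1 if both sequences write the same bytes and the final dst
   values differ by exactly the 32-byte bias, 0 if they do not, and -1
   if iters does not fit in the model buffer. */
int loops_agree(long iters)
{
    unsigned char a[BUF] = {0}, b[BUF] = {0};
    const long entry = 64;              /* hypothetical dst on entry to L(no_zva) */

    if (iters < 0 || entry + 16 + 64 * iters > BUF)
        return -1;

    long old_dst = old_loop(a, entry, iters, 0xAB);
    long new_dst = new_loop(b, entry, iters, 0xAB);
    return memcmp(a, b, BUF) == 0 && old_dst - new_dst == 32;
}
```

In this model both sequences fill the same 64 bytes per iteration.  The only
differences are the constant -32 bias on dst and the point in the iteration
at which dst is updated.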