From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 1895)
	id A9AB7386F448; Wed, 14 Oct 2020 16:26:29 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A9AB7386F448
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Wilco Dijkstra
To: glibc-cvs@sourceware.org
Subject: [glibc/release/2.29/master] AArch64: Improve backwards memmove performance
X-Act-Checkin: glibc
X-Git-Author: Wilco Dijkstra
X-Git-Refname: refs/heads/release/2.29/master
X-Git-Oldrev: 58c6a7ae53c647390a3057c247d34643e1201aac
X-Git-Newrev: 64458aabeb7f6d15b389cb49b9faf4925db354fa
Message-Id: <20201014162629.A9AB7386F448@sourceware.org>
Date: Wed, 14 Oct 2020 16:26:29 +0000 (GMT)
X-BeenThere: glibc-cvs@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Glibc-cvs mailing list
List-Unsubscribe: ,
List-Archive:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 14 Oct 2020 16:26:29 -0000

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=64458aabeb7f6d15b389cb49b9faf4925db354fa

commit 64458aabeb7f6d15b389cb49b9faf4925db354fa
Author: Wilco Dijkstra
Date:   Fri Aug 28 17:51:40 2020 +0100

    AArch64: Improve backwards memmove performance

    On some microarchitectures performance of the backwards memmove
    improves if the stores use STR with decreasing addresses.  So change
    the memmove loop in memcpy_advsimd.S to use 2x STR rather than STP.

    Reviewed-by: Adhemerval Zanella

    (cherry picked from commit bd394d131c10c9ec22c6424197b79410042eed99)

Diff:
---
 sysdeps/aarch64/multiarch/memcpy_advsimd.S | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/sysdeps/aarch64/multiarch/memcpy_advsimd.S b/sysdeps/aarch64/multiarch/memcpy_advsimd.S
index d4ba747777..48bb6d7ca4 100644
--- a/sysdeps/aarch64/multiarch/memcpy_advsimd.S
+++ b/sysdeps/aarch64/multiarch/memcpy_advsimd.S
@@ -223,12 +223,13 @@ L(copy_long_backwards):
 	b.ls	L(copy64_from_start)

 L(loop64_backwards):
-	stp	A_q, B_q, [dstend, -32]
+	str	B_q, [dstend, -16]
+	str	A_q, [dstend, -32]
 	ldp	A_q, B_q, [srcend, -96]
-	stp	C_q, D_q, [dstend, -64]
+	str	D_q, [dstend, -48]
+	str	C_q, [dstend, -64]!
 	ldp	C_q, D_q, [srcend, -128]
 	sub	srcend, srcend, 64
-	sub	dstend, dstend, 64
 	subs	count, count, 64
 	b.hi	L(loop64_backwards)
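
For readers who do not follow A64 assembly, the store ordering in the new
loop can be modelled in C.  This is only a simplified sketch, not the glibc
code: the type q_reg and the helpers load_q, store_q and
copy_backwards_model are hypothetical names, the count is assumed to be a
multiple of 64, and the software pipelining and tail handling of the real
routine are left out.  It shows the four 16-byte stores issued at strictly
decreasing addresses, with the pointer update that the "!" writeback on the
last STR folds in.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one 128-bit Q register. */
typedef struct { unsigned char b[16]; } q_reg;

static q_reg load_q(const unsigned char *p)
{
    q_reg r;
    memcpy(r.b, p, 16);
    return r;
}

static void store_q(unsigned char *p, q_reg r)
{
    memcpy(p, r.b, 16);
}

/* Simplified model of a backwards 64-byte-per-iteration copy.
   Assumption: count is a multiple of 64 and, if the buffers overlap,
   dst is above src (the case where a backwards copy is needed). */
static void copy_backwards_model(unsigned char *dst, const unsigned char *src,
                                 size_t count)
{
    unsigned char *dstend = dst + count;
    const unsigned char *srcend = src + count;

    while (count > 0) {
        /* Load the whole 64-byte block before storing any of it. */
        q_reg a = load_q(srcend - 32);
        q_reg b = load_q(srcend - 16);
        q_reg c = load_q(srcend - 64);
        q_reg d = load_q(srcend - 48);

        /* Four single stores, highest address first. */
        store_q(dstend - 16, b);
        store_q(dstend - 32, a);
        store_q(dstend - 48, d);
        store_q(dstend - 64, c);

        dstend -= 64;   /* models the pre-index writeback on the last STR */
        srcend -= 64;
        count -= 64;
    }
}
```

On an overlapping move with dst above src, the model should produce the same
bytes as memmove, because each 64-byte block is fully loaded before any of
its stores and blocks are processed from the end of the buffer downwards.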