From: Siddhesh Poyarekar
To: libc-alpha@sourceware.org
Subject: [PATCH 0/2] aarch64,falkor: memcpy/memmove performance improvements
Date: Thu, 03 May 2018 17:52:00 -0000
Message-Id: <20180503175209.2943-1-siddhesh@sourceware.org>

Hi,

Here are a couple of patches to improve performance of the falkor memcpy
and memmove implementations based on testing on the latest hardware.  The
theme of the optimization is to avoid trying to train the hardware
prefetcher for smaller sizes and in the loop tail since that just
mis-trains the prefetcher.  Instead, use multiple registers to aid
reordering wherever possible.  Testing showed that regressions in these
sizes compared to generic memcpy are resolved with this patch.

Siddhesh

Siddhesh Poyarekar (2):
  aarch64,falkor: Ignore prefetcher hints for memmove tail
  Ignore prefetcher tagging for smaller copies

 sysdeps/aarch64/multiarch/memcpy_falkor.S  | 68 ++++++++++++++++++------------
 sysdeps/aarch64/multiarch/memmove_falkor.S | 48 ++++++++++++---------
 2 files changed, 70 insertions(+), 46 deletions(-)

-- 
2.14.3