From: Ondřej Bílka
To: marcus.shawcroft@linaro.org
Cc: libc-ports@sourceware.org
Subject: memset on aarch64
Date: Mon, 25 Mar 2013 15:33:00 -0000
Message-ID: <20130325152822.GA23513@domone.kolej.mff.cuni.cz>

Hi Marcus,

could you try how the following memset implementation performs on
aarch64? It could be faster for small n. I found that on x64 overlapping
stores are the best way to handle end conditions. I do not know how well
ARM handles them.

I do not know how good a loop gcc will generate. If this header is
faster, you can rewrite the loop in assembly.

#include <stddef.h>
#include <stdint.h>

/* Align VALUE down by ALIGN bytes.  */
#define ALIGN_DOWN(value, align) \
  ALIGN_DOWN_M1(value, (align) - 1)

/* Align VALUE down by ALIGN_M1 + 1 bytes.
   Useful if you have precomputed ALIGN - 1.  */
#define ALIGN_DOWN_M1(value, align_m1) \
  (void *)((uintptr_t)(value) \
           & ~(uintptr_t)(align_m1))

/* Align VALUE up by ALIGN bytes.  */
#define ALIGN_UP(value, align) \
  ALIGN_UP_M1(value, (align) - 1)

/* Align VALUE up by ALIGN_M1 + 1 bytes.
   Useful if you have precomputed ALIGN - 1.  */
#define ALIGN_UP_M1(value, align_m1) \
  (void *)(((uintptr_t)(value) + (uintptr_t)(align_m1)) \
           & ~(uintptr_t)(align_m1))

/* Store 16 bytes as two 64-bit words.  STOREU is used where the
   destination may be unaligned; both expand to the same code here.  */
#define STOREU(x, y) STORE(x, y)
#define STORE(x, y) \
  do { ((uint64_t *)(x))[0] = (y); ((uint64_t *)(x))[1] = (y); } while (0)

static char *memset_small (char *dest, uint64_t c, size_t no, char *ret);

void *memset_new(char *dest, int _c, size_t n)
{
  char *ret = dest;              /* memset returns the original pointer.  */
  unsigned char c = _c;
  uint64_t vc = 0x0101010101010101ULL * c;   /* C in every byte.  */

  if (n < 16)
    {
      return memset_small(dest, vc, n, ret);
    }
  else
    {
      /* Head and tail: two 16-byte stores that may overlap each
         other and the aligned body.  */
      STOREU(dest, vc);
      STOREU(dest + n - 16, vc);

      /* Body: aligned 16-byte stores between the two.  */
      char *to = ALIGN_DOWN(dest + n, 16);
      dest = ALIGN_DOWN(dest + 16, 16);
      while (dest != to)
        {
          STORE(dest, vc);
          dest += 16;
        }
    }
  return ret;
}

static char *memset_small (char *dest, uint64_t c, size_t no, char *ret)
{
  /* 8 <= no <= 15: two overlapping 8-byte stores.  */
  if (no & 8)
    {
      ((uint64_t *) dest)[0] = c;
      ((uint64_t *)(dest + no - 8))[0] = c;
      return ret;
    }
  /* 4 <= no <= 7: two overlapping 4-byte stores.  */
  if (no & 4)
    {
      ((uint32_t *) dest)[0] = c;
      ((uint32_t *)(dest + no - 4))[0] = c;
      return ret;
    }
  /* 0 <= no <= 3: one byte at the start, two bytes at the end.  */
  if (no & 1)
    {
      dest[0] = c;
    }
  if (no & 2)
    {
      ((uint16_t *)(dest + no - 2))[0] = c;
    }
  return ret;
}
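
The overlapping-store idea can also be looked at in isolation. The following is only a minimal sketch, not part of the proposed implementation: the names store8, fill_8_to_16 and fill_is_exact are hypothetical, and memcpy is swapped in for the pointer casts used in the code above so that the sketch does not depend on unaligned-access or aliasing behaviour. Compilers such as gcc typically turn a fixed-size memcpy like this into a plain store when optimizing, but that is an assumption about the compiler, not something measured here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store 8 bytes at P, valid for any alignment.  */
static inline void store8(unsigned char *p, uint64_t v)
{
  memcpy(p, &v, sizeof v);
}

/* Fill N bytes (8 <= N <= 16) with two 8-byte stores.  For N < 16 the
   second store overlaps the first, so no per-byte tail loop is needed.  */
static void fill_8_to_16(unsigned char *dest, int c, size_t n)
{
  uint64_t vc = 0x0101010101010101ULL * (unsigned char) c;

  assert(n >= 8 && n <= 16);
  store8(dest, vc);            /* covers bytes [0, 8)      */
  store8(dest + n - 8, vc);    /* covers bytes [n - 8, n)  */
}

/* Hypothetical checker: fill N bytes at offset OFF inside a guarded
   buffer and return 1 if exactly those bytes changed, 0 otherwise.  */
static int fill_is_exact(size_t off, size_t n)
{
  unsigned char buf[64];

  memset(buf, 0xAA, sizeof buf);
  fill_8_to_16(buf + off, 0x5C, n);
  for (size_t i = 0; i < sizeof buf; i++)
    {
      unsigned char want = (i >= off && i < off + n) ? 0x5C : 0xAA;
      if (buf[i] != want)
        return 0;
    }
  return 1;
}
```

Calling fill_is_exact(off, n) for every off in 0..15 and n in 8..16 checks all start alignments and all lengths handled by the two-store case, including that nothing is written before or after the requested range.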