From: Andrew Pinski
Date: Fri, 18 Aug 2017 06:25:00 -0000
Subject: Re: [PATCH] powerpc: Use aligned stores in memset
To: Rajalakshmi Srinivasaraghavan
Cc: GNU C Library
In-Reply-To: <1503033107-20047-1-git-send-email-raji@linux.vnet.ibm.com>
References: <1503033107-20047-1-git-send-email-raji@linux.vnet.ibm.com>

On Thu, Aug 17, 2017 at 10:11 PM, Rajalakshmi Srinivasaraghavan wrote:
>
> The powerpc hardware does not allow unaligned accesses on non cacheable
> memory. This patch avoids misaligned stores for sizes less than 8 in
> memset to avoid such cases. Tested on powerpc64 and powerpc64le.

Why are you using memset on non cacheable memory?
In fact how are you getting non-cacheable memory, mmap of /dev/mem or
something different?

Thanks,
Andrew

>
> 2017-08-17  Rajalakshmi Srinivasaraghavan
>
>         * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
>         for unaligned inputs if size is less than 8.
> ---
>  sysdeps/powerpc/powerpc64/power8/memset.S | 68 ++++++++++++++++++++++++++++++-
>  1 file changed, 66 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/powerpc/powerpc64/power8/memset.S b/sysdeps/powerpc/powerpc64/power8/memset.S
> index 7ad3bb1b00..504bab0841 100644
> --- a/sysdeps/powerpc/powerpc64/power8/memset.S
> +++ b/sysdeps/powerpc/powerpc64/power8/memset.S
> @@ -377,7 +377,8 @@ L(write_LT_32):
>         subf    r5,r0,r5
>
>  2:     bf      30,1f
> -       sth     r4,0(r10)
> +       stb     r4,0(r10)
> +       stb     r4,1(r10)
>         addi    r10,r10,2
>
>  1:     bf      31,L(end_4bytes_alignment)
> @@ -437,11 +438,74 @@ L(tail5):
>         /* Handles copies of 0~8 bytes. */
>         .align  4
>  L(write_LE_8):
> -       bne     cr6,L(tail4)
> +       /* Use stb instead of sth which is safe for
> +          both aligned and unaligned inputs. */
> +       bne     cr6,L(LE7_tail4)
> +       /* If input is word aligned, use stw, Else use stb. */
> +       andi.   r0,r10,3
> +       bne     L(8_unalign)
>
>         stw     r4,0(r10)
>         stw     r4,4(r10)
>         blr
> +
> +       /* Unaligned input and size is 8. */
> +       .align  4
> +L(8_unalign):
> +       andi.   r0,r10,1
> +       beq     L(8_hwalign)
> +       stb     r4,0(r10)
> +       sth     r4,1(r10)
> +       sth     r4,3(r10)
> +       sth     r4,5(r10)
> +       stb     r4,7(r10)
> +       blr
> +
> +       /* Halfword aligned input and size is 8. */
> +       .align  4
> +L(8_hwalign):
> +       sth     r4,0(r10)
> +       sth     r4,2(r10)
> +       sth     r4,4(r10)
> +       sth     r4,6(r10)
> +       blr
> +
> +       .align  4
> +       /* Copies 4~7 bytes. */
> +L(LE7_tail4):
> +       bf      29,L(LE7_tail2)
> +       stb     r4,0(r10)
> +       stb     r4,1(r10)
> +       stb     r4,2(r10)
> +       stb     r4,3(r10)
> +       bf      30,L(LE7_tail5)
> +       stb     r4,4(r10)
> +       stb     r4,5(r10)
> +       bflr    31
> +       stb     r4,6(r10)
> +       blr
> +
> +       .align  4
> +       /* Copies 2~3 bytes. */
> +L(LE7_tail2):
> +       bf      30,1f
> +       stb     r4,0(r10)
> +       stb     r4,1(r10)
> +       bflr    31
> +       stb     r4,2(r10)
> +       blr
> +
> +       .align  4
> +L(LE7_tail5):
> +       bflr    31
> +       stb     r4,4(r10)
> +       blr
> +
> +       .align  4
> +1:     bflr    31
> +       stb     r4,0(r10)
> +       blr
> +
>  END_GEN_TB (MEMSET,TB_TOCLESS)
>  libc_hidden_builtin_def (memset)
>
> --
> 2.11.0
>
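
For readers following the quoted hunk, the size-8 path can be modelled in C
roughly as below. This is only an illustrative sketch, not the patch itself:
the name fill8_aligned_stores is hypothetical and not part of glibc, and
memcpy of 2 or 4 bytes merely stands in for sth/stw, since portable C cannot
force a particular store instruction. It shows how the store width is picked
from the low address bits so that each store is naturally aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of glibc): fill exactly 8 bytes at dst
   with byte c, choosing the store width from the alignment of dst, in
   the spirit of the quoted L(write_LE_8) path.  memcpy of 2 or 4 bytes
   models sth/stw here; it does not guarantee which instruction the
   compiler emits.  */
static void
fill8_aligned_stores (unsigned char *dst, unsigned char c)
{
  assert (dst != NULL);

  uint16_t h = (uint16_t) (c * 0x0101u);      /* c replicated into 2 bytes */
  uint32_t w = (uint32_t) c * 0x01010101u;    /* c replicated into 4 bytes */
  uintptr_t addr = (uintptr_t) dst;

  if ((addr & 3) == 0)
    {
      /* Word aligned: two 4-byte stores (stw, stw).  */
      memcpy (dst, &w, 4);
      memcpy (dst + 4, &w, 4);
    }
  else if ((addr & 1) == 0)
    {
      /* Halfword aligned: four 2-byte stores (sth x4).  */
      for (size_t i = 0; i < 8; i += 2)
        memcpy (dst + i, &h, 2);
    }
  else
    {
      /* Odd address: one byte, three halfwords at offsets 1, 3 and 5
         (which are even addresses), then one trailing byte.  */
      dst[0] = c;
      for (size_t i = 1; i < 7; i += 2)
        memcpy (dst + i, &h, 2);
      dst[7] = c;
    }
}
```

For sizes below 8 the quoted patch takes the simpler route of byte stores
only (stb), which are aligned at any address.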