From: "Tulio Magno Quites Machado Filho"
To: Florian Weimer, Rajalakshmi Srinivasaraghavan
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH] powerpc: Use aligned stores in memset
References: <1503033107-20047-1-git-send-email-raji@linux.vnet.ibm.com>
User-Agent: Notmuch/0.24.2 (http://notmuchmail.org) Emacs/25.2.1 (x86_64-redhat-linux-gnu)
Date: Wed, 13 Sep 2017 13:12:00 -0000
Message-Id: <87mv5yhhdh.fsf@linux.vnet.ibm.com>
Florian Weimer writes:

> On 08/18/2017 11:10 AM, Florian Weimer wrote:
>> On 08/18/2017 08:51 AM, Rajalakshmi Srinivasaraghavan wrote:
>>>
>>> On 08/18/2017 11:51 AM, Florian Weimer wrote:
>>>> On 08/18/2017 07:11 AM, Rajalakshmi Srinivasaraghavan wrote:
>>>>> 	* sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
>>>>> 	for unaligned inputs if size is less than 8.
>>>>
>>>> This makes me rather nervous.  powerpc64le was supposed to have
>>>> reasonably efficient unaligned loads and stores.  GCC happily
>>>> generates them, too.
>>>
>>> This is meant ONLY for caching inhibited accesses.  Caching Inhibited
>>> accesses are required to be Guarded and properly aligned.
>>
>> The intent is to support memset for such memory regions, right?  This
>> change is insufficient.  You have to fix GCC as well, because it will
>> inline memset of unaligned pointers, like this:
>
> Here's a more complete example:
>
> #include <assert.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> typedef long __attribute__ ((aligned(1))) long_unaligned;
>
> __attribute__ ((noinline, noclone, weak))
> void
> clear (long_unaligned *p)
> {
>   memset (p, 0, sizeof (*p));
> }
>
> struct data
> {
>   char misalign;
>   long_unaligned data;
> };
>
> int
> main (void)
> {
>   struct data *data = malloc (sizeof (*data));
>   assert (data != NULL);
>   long_unaligned *p = &data->data;
>   printf ("pointer: %p\n", p);
>   clear (p);
>   return 0;
> }
>
> The clear function compiles to:
>
> [generated assembly lost in the archive]
>
> At run time, I get:
>
> pointer: 0x10003c10011
>
> This means that GCC introduced an unaligned store, no matter how memset
> was implemented.

Which isn't necessarily a problem.  The performance penalty only appears
when the memory access refers to an address that is not at the
instruction's natural boundary.  In this case, memset should use stb to
avoid an alignment interrupt.
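To illustrate, the byte-by-byte fallback could be sketched in C roughly as follows.  This is a hand-written illustration of the idea only, not the actual power8 memset.S code; the function name, the 8-byte threshold, and the overall C shape are my assumptions:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: when the destination is not 8-byte aligned and the length is
   under 8, store byte by byte so that no unaligned wide store is ever
   issued.  On caching-inhibited memory, an unaligned wide store would
   take an alignment interrupt.  */
void *
memset_sketch (void *s, int c, size_t n)
{
  unsigned char *p = s;

  if (n < 8 && ((uintptr_t) p & 7) != 0)
    {
      /* Short and unaligned: byte stores only (stb on POWER).  */
      while (n-- > 0)
        *p++ = (unsigned char) c;
      return s;
    }

  /* Otherwise: reach an 8-byte boundary with byte stores, then use
     naturally aligned doubleword stores for the bulk.  */
  while (n > 0 && ((uintptr_t) p & 7) != 0)
    {
      *p++ = (unsigned char) c;
      n--;
    }

  /* Replicate the fill byte across a 64-bit word.  */
  uint64_t word = (unsigned char) c;
  word |= word << 8;
  word |= word << 16;
  word |= word << 32;

  while (n >= 8)
    {
      *(uint64_t *) p = word;  /* naturally aligned doubleword store */
      p += 8;
      n -= 8;
    }

  /* Trailing bytes, again one at a time.  */
  while (n-- > 0)
    *p++ = (unsigned char) c;
  return s;
}
```

The point of the dispatch is that every store in the sketch lands on its natural boundary, which is what the patch needs for caching-inhibited regions.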
Notice that if the memory access is not at the natural boundary, an
alignment interrupt is generated, but this is not an error: the access
still completes, only with a performance penalty.

> So I think the implementation constraint on the mem* functions is
> wrong.  It leads to a slower implementation of the mem* functions for
> most of userspace, which does not access device memory, and even for
> device memory, it is probably not what you want.

Makes sense.  But as there is nothing in the standard allowing or
prohibiting the use of the mem* functions to access caching-inhibited
memory, I thought it would make sense to provide functions that are as
generic as possible.

IMHO, it's easier for programmers to use generic functions in most
scenarios and have access to specialized functions, e.g. a function for
data already aligned to 16 bytes.

-- 
Tulio Magno