From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 4A13B395BC51 for ; Thu, 29 Apr 2021 18:41:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 4A13B395BC51 Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 13TIXRC4138887; Thu, 29 Apr 2021 14:40:56 -0400 Received: from ppma04dal.us.ibm.com (7a.29.35a9.ip4.static.sl-reverse.com [169.53.41.122]) by mx0a-001b2d01.pphosted.com with ESMTP id 3881hts84d-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 29 Apr 2021 14:40:55 -0400 Received: from pps.filterd (ppma04dal.us.ibm.com [127.0.0.1]) by ppma04dal.us.ibm.com (8.16.0.43/8.16.0.43) with SMTP id 13TIRIRi017138; Thu, 29 Apr 2021 18:40:55 GMT Received: from b01cxnp22034.gho.pok.ibm.com (b01cxnp22034.gho.pok.ibm.com [9.57.198.24]) by ppma04dal.us.ibm.com with ESMTP id 384aya7te0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 29 Apr 2021 18:40:55 +0000 Received: from b01ledav005.gho.pok.ibm.com (b01ledav005.gho.pok.ibm.com [9.57.199.110]) by b01cxnp22034.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 13TIesMd23855498 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 29 Apr 2021 18:40:54 GMT Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 50FB2AE063; Thu, 29 Apr 2021 18:40:54 +0000 (GMT) Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 1BF3AAE060; Thu, 29 Apr 2021 18:40:53 +0000 (GMT) Received: from work-tp (unknown [9.65.216.65]) by b01ledav005.gho.pok.ibm.com (Postfix) with ESMTPS; Thu, 29 Apr 2021 18:40:52 +0000 (GMT) Date: Thu, 29 Apr 2021 15:40:49 -0300 From: Raoni Fassina Firmino To: "Lucas A. M. Magalhaes" Cc: libc-alpha@sourceware.org, tuliom@linux.ibm.com, anton@ozlabs.org Subject: Re: [PATCH] powerpc64le: Optimize memset for POWER10 Message-ID: <20210429184049.hbdv5l6yijfedt47@work-tp> Mail-Followup-To: "Lucas A. M. Magalhaes" , libc-alpha@sourceware.org, tuliom@linux.ibm.com, anton@ozlabs.org References: <20210428144048.gyeulahanuzjiotq@work-tp> <161963570877.1422733.11894775104498167968@fedora.local> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <161963570877.1422733.11894775104498167968@fedora.local> X-TM-AS-GCONF: 00 X-Proofpoint-GUID: qLvoK2yidgKHjGUVlw9ZwMYsikM9TEhL X-Proofpoint-ORIG-GUID: qLvoK2yidgKHjGUVlw9ZwMYsikM9TEhL X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.761 definitions=2021-04-29_10:2021-04-28, 2021-04-29 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 clxscore=1015 bulkscore=0 suspectscore=0 adultscore=0 phishscore=0 lowpriorityscore=0 mlxlogscore=999 priorityscore=1501 malwarescore=0 mlxscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2104060000 definitions=main-2104290118 X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, KAM_NUMSUBJECT, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Apr 2021 18:41:02 -0000 Thanks for the review Lucas, please let me know if I missed something. On Wed, Apr 28, 2021 at 03:48:28PM -0300, Lucas A. M. Magalhaes wrote: > > + /* After alignment, if there is 127B or less left > s/127B/64B/ Done. This is an awkward position/block, this comment is about the whole block, up until the `beq L(tail_128)`, but there is the label 'L(aligned)' in the middle that makes it hard to underhand. But it truly is that, after the alignment if there is less than 128 bytes is goes to the tail, but there is this optimization to go straight to the part of the tail depending on the amount left. any way, refrased it to be a single line (less obstrusive) and add a new one for the second branch: > > + go directly to the tail. */ > > + cmpldi r5,64 > > + blt L(tail_64) > > + > > + .balign 16 > > +L(aligned): > > + srdi. r0,r5,7 > > + beq L(tail_128) Here^, added another after L(aligned). > > + > > + cmpldi cr5,r5,255 > > + cmpldi cr6,r4,0 > > + crand 27,26,21 > > + bt 27,L(dcbz) > Maybe add a comment to explain this branch. Done. I was counting on the comment on the label itself, but I guess it makes sense to add a brief comment here also, avoid going back and forward to understand the condition check. > > + .balign 16 > > +L(tail_128): > The label tail_128 made me think that here would be copied 128 bytes. > Maybe add a comment here. Done. Sorry, yes, all this "tail_*" sections are "up to", the number being the maximum that it will write. But this one is in fact from 64 up to 128. > > > + stxv v0+32,0(r6) > > + stxv v0+32,16(r6) > > + stxv v0+32,32(r6) > > + stxv v0+32,48(r6) > > + addi r6,r6,64 > > + andi. r5,r5,63 > > + beqlr > > + > > + .balign 16 > > +L(tail_64): > Maybe add a comment here to explay this section as well. Done. o/ Raoni