Date: Fri, 18 Sep 2020 10:53:41 -0500
From: "Paul A. Clarke"
To: Raphael M Zinsly
Cc: Matheus Castanho, libc-alpha@sourceware.org
Subject: Re: [PATCH v2 2/2] powerpc: Add optimized stpncpy for POWER9
Message-ID: <20200918155341.GB377037@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com>
References: <20200904165653.16202-1-rzinsly@linux.ibm.com> <20200904165653.16202-2-rzinsly@linux.ibm.com> <67a53c6b-d350-f27f-f8a6-617f5658ccf2@linux.ibm.com>

On Wed, Sep 16, 2020 at 09:56:59AM -0300, Raphael M Zinsly via Libc-alpha wrote:
> On 16/09/2020 09:32, Matheus Castanho wrote:
> > On 9/4/20 1:59 PM, Raphael M Zinsly via Libc-alpha wrote:
> > > Benchtest output:
> > >                                        generic_stpncpy  __stpncpy_power9  __stpncpy_power8  __stpncpy_power7  __stpncpy_ppc
> > > Length  512, n 1024, alignment  0/ 0:          20.5111           22.9782           19.6648           21.3857        42.4801
> > > Length  512, n 1024, alignment  1/ 6:          29.9694           24.3087           22.0513           46.7436        51.5908
> >
> > These two seem to be the only cases in which the power9 version loses to
> > the power8 one. Have you investigated what happens in these two specific
> > cases?
> >
> Yes the power8 optimization calls memset to do the zero padding at the end
> if n > length. In this case where n is way higher, memset is faster than the
> loop used in my implementation.

Is there some sort of threshold that would help these cases by transitioning
to memset (or replicating the relevant part of that code here)?

PC
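
To make the threshold idea concrete, here is a minimal portable C sketch, not the POWER9 assembly from the patch. It copies the string, then zero-pads with an inline loop when the pad is short and hands off to memset when it is long. The name stpncpy_sketch and the PAD_THRESHOLD value of 64 are assumptions for illustration only; a real cutoff would have to come from benchtest measurements.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff: pads shorter than this stay in the inline loop.
   64 is a placeholder, not a measured value.  */
#define PAD_THRESHOLD 64

/* Sketch of stpncpy semantics with threshold-based zero padding.
   Illustrative only; this is not the code under review.  */
char *
stpncpy_sketch (char *dst, const char *src, size_t n)
{
  /* Length of src, capped at n (the null byte is not counted).  */
  const char *nul = memchr (src, '\0', n);
  size_t len = nul != NULL ? (size_t) (nul - src) : n;

  memcpy (dst, src, len);

  /* Zero bytes still owed to dst; 0 when src fills all n bytes.  */
  size_t pad = n - len;
  if (pad >= PAD_THRESHOLD)
    memset (dst + len, 0, pad);       /* long pad: use memset */
  else
    for (size_t i = 0; i < pad; i++)  /* short pad: stay inline */
      dst[len + i] = '\0';

  /* Points at the first null written, or at dst + n if none was.  */
  return dst + len;
}
```

With length 512 and n 1024, as in the two benchmark rows above, the pad is 512 bytes and would take the memset path under this placeholder threshold.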