Subject: Re: [PATCH v2 2/2] powerpc: Add optimized stpncpy for POWER9
To: Matheus Castanho ,
libc-alpha@sourceware.org
From: Raphael M Zinsly
Date: Wed, 16 Sep 2020 09:56:59 -0300
References: <20200904165653.16202-1-rzinsly@linux.ibm.com> <20200904165653.16202-2-rzinsly@linux.ibm.com> <67a53c6b-d350-f27f-f8a6-617f5658ccf2@linux.ibm.com>

Hi Matheus,

On 16/09/2020 09:32, Matheus Castanho wrote:
> On 9/4/20 1:59 PM, Raphael M Zinsly via Libc-alpha wrote:
>> Benchtest output:
>>                                         generic_stpncpy  __stpncpy_power9  __stpncpy_power8  __stpncpy_power7  __stpncpy_ppc
>
>> Length  512, n 1024, alignment  0/ 0:   20.5111          22.9782           19.6648           21.3857           42.4801
>
>> Length  512, n 1024, alignment  1/ 6:   29.9694          24.3087           22.0513           46.7436           51.5908
>
> These two seem to be the only cases in which the power9 version loses to
> the power8 one. Have you investigated what happens in these two specific
> cases?
>

Yes, the power8 optimization calls memset to do the zero padding at the end if n > length. In these cases, where n is much larger than the length, memset is faster than the padding loop used in my implementation.

Thanks for the review!

Regards,
-- 
Raphael Moreira Zinsly
IBM Linux on Power Toolchain
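
For illustration, the two padding strategies mentioned above (a store loop versus a call to memset when n > length) can be sketched in portable C. This is only a sketch under assumptions: it is not the POWER8 or POWER9 assembly from the patch, and the names copy_prefix, sketch_stpncpy_loop and sketch_stpncpy_memset are hypothetical. Both variants are meant to produce the same bytes and the same return value as stpncpy; they differ only in how the tail is zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most n bytes of src (stopping before the
   terminator) into dst.  Stores the number of bytes copied in *copied and
   returns dst + *copied, which is the value stpncpy returns.  */
static char *
copy_prefix (char *dst, const char *src, size_t n, size_t *copied)
{
  size_t i = 0;
  while (i < n && src[i] != '\0')
    {
      dst[i] = src[i];
      i++;
    }
  *copied = i;
  return dst + i;
}

/* Variant 1: zero the remaining n - len bytes with a plain store loop.  */
char *
sketch_stpncpy_loop (char *dst, const char *src, size_t n)
{
  assert (dst != NULL && src != NULL);
  size_t len;
  char *end = copy_prefix (dst, src, n, &len);
  for (size_t i = len; i < n; i++)
    dst[i] = '\0';
  return end;
}

/* Variant 2: delegate the zero padding to memset when n > len.  */
char *
sketch_stpncpy_memset (char *dst, const char *src, size_t n)
{
  assert (dst != NULL && src != NULL);
  size_t len;
  char *end = copy_prefix (dst, src, n, &len);
  if (n > len)
    memset (end, 0, n - len);
  return end;
}
```

When n is much larger than the string length, as in the "Length 512, n 1024" rows, most of the work is in the padding step, so whichever padding strategy is faster for large sizes dominates the result.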