Subject: Re: [PATCH v3 2/2] powerpc: Add optimized stpncpy for POWER9
To: Adhemerval Zanella, libc-alpha@sourceware.org
From: Raphael M Zinsly
Date: Wed, 30 Sep 2020 11:21:09 -0300
Message-ID: <8c436fe7-cdf4-a1fa-6777-f641f3b8a59c@linux.ibm.com>
In-Reply-To: <37dd785c-60ec-f064-bfeb-7c5ec5483936@linaro.org>
References: <20200929152103.18564-1-rzinsly@linux.ibm.com> <20200929152103.18564-2-rzinsly@linux.ibm.com> <37dd785c-60ec-f064-bfeb-7c5ec5483936@linaro.org>

Hi Adhemerval,

On 30/09/2020 10:42, Adhemerval Zanella wrote:
>
>
> On 29/09/2020 12:21, Raphael Moreira Zinsly via Libc-alpha wrote:
>> Add stpncpy support into the POWER9 strncpy.
>
> The benchmark numbers you provided [1] seem to show it is slightly worse than
> the generic_strncpy, which uses the same strategy as string/strncpy.c
> (which would use VSX instructions through memset/memcpy).

My implementation is always better than the generic_strncpy, almost three times better on average. And it calls memset as well.
Are you talking about __strncpy_ppc? For some reason it is using strnlen_ppc instead of strnlen_power8, but I didn't touch it.

> Did you compare this
> optimization against an implementation that just calls power8/9 memset/memcpy
> instead?
>

I'm not sure I understand. Isn't that generic_strncpy and strncpy_ppc?

> It should result in a smaller implementation, which reduces i-cache usage, and
> the code is much simpler and more maintainable. The same applies for stpncpy.
>
> I tried to convince Intel developers that such micro-optimizations are not
> a real gain and that instead we should optimize only a handful of string
> operations (memcpy/memset/etc.) and use composable implementations instead
> (as generic strncpy does). It still resulted in 1a153e47fcc, but I think we
> might do better for powerpc.
>
> [1] https://sourceware.org/pipermail/libc-alpha/2020-September/118049.html
>

Best Regards,
-- 
Raphael Moreira Zinsly
IBM Linux on Power Toolchain
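
For reference, the "composable" strategy discussed in this thread (build strncpy/stpncpy out of strnlen, memcpy and memset so that only those few primitives need per-CPU tuning) can be sketched roughly as below. This is a minimal illustration, not the glibc source or the patch under review: the names composed_strncpy, composed_stpncpy and composed_copy_selfcheck are hypothetical, and the code assumes a libc that provides the POSIX strnlen.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* assumption: needed for strnlen on strict builds */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name. Behaves like strncpy: copies at most n bytes from
   src and zero-fills the rest of dst when src is shorter than n.  */
char *composed_strncpy(char *dst, const char *src, size_t n)
{
    size_t len = strnlen(src, n);          /* bounded length, never reads past n */
    if (len != n)
        memset(dst + len, '\0', n - len);  /* pad the tail with NUL bytes */
    return memcpy(dst, src, len);          /* memcpy returns dst */
}

/* Hypothetical name. Behaves like stpncpy: same copy, but returns a
   pointer to the first NUL written, or dst + n if src was truncated.  */
char *composed_stpncpy(char *dst, const char *src, size_t n)
{
    size_t len = strnlen(src, n);
    memcpy(dst, src, len);
    if (len != n)
        memset(dst + len, '\0', n - len);
    return dst + len;
}

/* Small self-check of both the padded and the truncated case.  */
void composed_copy_selfcheck(void)
{
    char buf[8];
    memset(buf, 'x', sizeof buf);

    /* Short source: 3 bytes copied, 3 bytes of padding, last 2 untouched.  */
    char *end = composed_stpncpy(buf, "abc", 6);
    assert(end == buf + 3);
    assert(memcmp(buf, "abc\0\0\0xx", 8) == 0);

    /* Long source: exactly n bytes copied, no terminator added.  */
    char *ret = composed_strncpy(buf, "abcdefgh", 4);
    assert(ret == buf);
    assert(memcmp(buf, "abcd", 4) == 0);
}
```

In a layout like this, all the heavy lifting happens inside strnlen, memcpy and memset, so whichever optimized variants of those the library selects are what determine the performance of the composed functions.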