From: Tulio Magno Quites Machado Filho
To: Raoni Fassina Firmino, libc-alpha@sourceware.org
Cc:
    rzinsly@linux.ibm.com, anton@ozlabs.org
Subject: Re: [PATCH v2] powerpc64le: Optimize memset for POWER10
In-Reply-To: <20210429234542.aeer5ncryowx34gs@work-tp>
Date: Fri, 30 Apr 2021 18:15:22 -0300
Message-ID: <87eeermq39.fsf@linux.ibm.com>

Raoni Fassina Firmino via Libc-alpha writes:

> This implementation is based on __memset_power8 and integrates a lot
> of suggestions from Anton Blanchard.
>
> The biggest difference is that it makes extensive use of stxvl in the
> alignment and tail code to avoid branches and small stores.
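stxvl (Store VSX Vector with Length) writes only the first N bytes of a 16-byte vector register, where N is taken from the high-order byte of RB and capped at 16; with N = 0 nothing is stored. That is what lets a misaligned head or a short tail be written with a single store instead of a sequence of smaller stores chosen by branches. The C model below is only a sketch of that idea, not the glibc code: stxvl_model and memset_general_sketch are hypothetical names, memcpy stands in for the hardware store, and the loop shape follows the description in the commit message rather than the actual assembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of stxvl: store the first n bytes of a 16-byte
   vector to dst. In hardware n comes from the high-order byte of RB and
   is capped at 16; a length of 0 stores nothing. */
static void stxvl_model(unsigned char *dst, const unsigned char vec[16],
                        size_t n)
{
    if (n > 16)
        n = 16;
    memcpy(dst, vec, n);
}

/* Sketch of the "general case" shape: one length-limited store to reach
   16-byte alignment, a 128-byte main loop, then length-limited tail
   stores. Only the overall structure is modeled. */
static void *memset_general_sketch(void *s, int c, size_t len)
{
    unsigned char vec[16];
    unsigned char *p = s;

    /* Splat the fill byte across the whole vector. */
    for (int i = 0; i < 16; i++)
        vec[i] = (unsigned char)c;

    /* Head: 0..15 bytes up to the next 16-byte boundary, written with a
       single length-limited store whatever the misalignment is. */
    size_t head = (size_t)(-(uintptr_t)p & 15);
    if (head > len)
        head = len;
    stxvl_model(p, vec, head);
    p += head;
    len -= head;

    /* Main loop: 128 bytes per iteration as eight full, aligned
       16-byte vector stores. */
    while (len >= 128) {
        for (int i = 0; i < 8; i++)
            memcpy(p + 16 * i, vec, 16);
        p += 128;
        len -= 128;
    }

    /* Tail: fewer than 128 bytes remain; each store writes min(len, 16). */
    while (len > 0) {
        size_t n = len < 16 ? len : 16;
        stxvl_model(p, vec, n);
        p += n;
        len -= n;
    }
    return s;
}
```

Once the head store has aligned the pointer, the main loop only needs full 16-byte stores; the length-limited form is required just for the head and the tail.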
> It has three main execution paths:
>
> a) "Short lengths" for lengths up to 64 bytes, avoiding as many
>    branches as possible.
>
> b) "General case" for larger lengths: an alignment section using
>    stxvl to avoid branches, a 128-byte loop, and then tail code that
>    again uses stxvl with few branches.
>
> c) "Zeroing cache blocks" for lengths of 256 bytes and up when the
>    value being set is zero. It is mostly the __memset_power8 code, but
>    the alignment phase was simplified because, at this point, the
>    address is already 16-byte aligned, and it was changed to use
>    vector stores. The tail code was also simplified to reuse the
>    general case tail.
>
> All unaligned stores use stxvl instructions that do not generate
> alignment interrupts on POWER10, making it safe to use on
> caching-inhibited memory.
>
> On average, this implementation provides about a 30% improvement
> over __memset_power8.

LGTM.

Reviewed-by: Tulio Magno Quites Machado Filho

Pushed as 23fdf8178cce3c2ec320dd5eca8b544245bcaef0.

Thanks!

-- 
Tulio Magno
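The choice between the three execution paths in the quoted commit message can be summarized in a few lines of C. This is a conceptual sketch only: the enum and memset_path_for are hypothetical names, and the 64- and 256-byte thresholds are taken from the commit message. According to that message, the zeroing path is reached once the address is already 16-byte aligned, i.e. from inside the general case, so collapsing it into one up-front decision on the original length is a simplification.

```c
#include <stddef.h>

/* Hypothetical labels for the three paths in the commit message. */
enum memset_path { PATH_SHORT, PATH_GENERAL, PATH_ZERO_BLOCKS };

/* Conceptual path selection; not the order of checks in the assembly. */
static enum memset_path memset_path_for(size_t len, int c)
{
    if (len <= 64)
        return PATH_SHORT;          /* a) short lengths, up to 64 bytes */
    /* memset stores (unsigned char)c, so test that byte for zero. */
    if (len >= 256 && (unsigned char)c == 0)
        return PATH_ZERO_BLOCKS;    /* c) zeroing cache blocks */
    return PATH_GENERAL;            /* b) general case */
}
```

Non-zero fills of any length above 64 bytes always take the general case; only zero fills of 256 bytes or more can use whole-cache-block zeroing.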