From: Aaron Sawdey
To: Segher Boessenkool
Cc: gcc-patches@gcc.gnu.org, Bill Schmidt, David Edelsohn
Subject: Re: [PATCH][rs6000] better use of unaligned vsx in memset() expansion
Date: Wed, 28 Nov 2018 19:25:00 -0000
References: <0e5a2fa3-47df-47d4-89cb-5c421a1e366b@linux.ibm.com> <20181126222902.GK23873@gate.crashing.org>
In-Reply-To: <20181126222902.GK23873@gate.crashing.org>

The first version of this had a big bug and cleared past the requested bytes.
This version passes regstrap on ppc64le(power7/8/9), ppc64be(power6/7/8),
and ppc32(power8).

OK for trunk (and 8 backport after a week)?

Thanks!
   Aaron

Index: gcc/config/rs6000/rs6000-string.c
===================================================================
--- gcc/config/rs6000/rs6000-string.c (revision 266524)
+++ gcc/config/rs6000/rs6000-string.c (working copy)
@@ -85,6 +85,8 @@
   if (! optimize_size && bytes > 8 * clear_step)
     return 0;
 
+  bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX);
+
   for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
     {
       machine_mode mode = BLKmode;
@@ -91,8 +93,7 @@
       rtx dest;
 
       if (TARGET_ALTIVEC
-          && ((bytes >= 16 && align >= 128)
-              || (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX)))
+          && (bytes >= 16 && ( align >= 128 || unaligned_vsx_ok)))
         {
           clear_bytes = 16;
           mode = V4SImode;

On 11/26/18 4:29 PM, Segher Boessenkool wrote:
> On Mon, Nov 26, 2018 at 03:08:32PM -0600, Aaron Sawdey wrote:
>> When I previously added the use of unaligned vsx stores to inline expansion
>> of memset, I didn't do a good job of managing boundary conditions. The intention
>> was to only use unaligned vsx if the block being cleared was more than 32 bytes.
>> What it actually did was to prevent the use of unaligned vsx for the last 32
>> bytes of any block being cleared. So this change puts the test up front so it
>> is not affected by the decrement of bytes.
>
> Oh wow.  Yes, that isn't so great.  Okay for trunk (and whatever backports).
> Thanks,
>
>
> Segher
>
>
>> 2018-11-26  Aaron Sawdey
>>
>> * config/rs6000/rs6000-string.c (expand_block_clear): Change how
>> we determine if unaligned vsx is ok.
>

-- 
Aaron Sawdey, Ph.D.  acsawdey@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain
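
For anyone reading this in the archive, here is a rough standalone model of
the boundary condition discussed above. It is only a sketch, not the code in
expand_block_clear: the function names, the efficient_unaligned_vsx constant
(standing in for TARGET_EFFICIENT_UNALIGNED_VSX), and the simplified 8/4/2/1
scalar fallback are all assumptions made for illustration. It counts how many
16-byte vector stores get chosen when the size test is re-evaluated on every
iteration versus when it is computed once before the loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for TARGET_EFFICIENT_UNALIGNED_VSX, fixed to true
   so that the unaligned path is exercised.  */
static const bool efficient_unaligned_vsx = true;

/* Simplified fallback (an assumption): the widest scalar store that still
   fits in the remaining bytes.  */
static int
scalar_step (int bytes)
{
  if (bytes >= 8) return 8;
  if (bytes >= 4) return 4;
  if (bytes >= 2) return 2;
  return 1;
}

/* Pre-patch shape: the "bytes >= 32" test sees the decremented count, so
   it starts failing once fewer than 32 bytes remain.  */
int
vector_stores_old (int bytes, int align)
{
  int vector_stores = 0;
  while (bytes > 0)
    {
      int step;
      if ((bytes >= 16 && align >= 128)
          || (bytes >= 32 && efficient_unaligned_vsx))
        {
          step = 16;
          vector_stores++;
        }
      else
        step = scalar_step (bytes);
      bytes -= step;
    }
  assert (bytes == 0);  /* Never clear past the requested size.  */
  return vector_stores;
}

/* Patched shape: decide once, from the original size, whether unaligned
   vsx is allowed; each iteration then only needs 16 bytes remaining.  */
int
vector_stores_new (int bytes, int align)
{
  bool unaligned_vsx_ok = (bytes >= 32 && efficient_unaligned_vsx);
  int vector_stores = 0;
  while (bytes > 0)
    {
      int step;
      if (bytes >= 16 && (align >= 128 || unaligned_vsx_ok))
        {
          step = 16;
          vector_stores++;
        }
      else
        step = scalar_step (bytes);
      bytes -= step;
    }
  assert (bytes == 0);  /* Never clear past the requested size.  */
  return vector_stores;
}
```

In this model, a 64-byte block with 8-bit alignment gets three vector stores
from the old test (the final 16 bytes fall back to scalar stores) and four
from the hoisted test. Keeping "bytes >= 16" inside the loop is what stops
the last vector store from running past the end of the block.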