Date: Thu, 7 Dec 2023 15:14:28 +0800
Subject: Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp
To: Michael Meissner
References: <77426697-1571-e180-add9-cfb6d10f8424@linux.ibm.com> <57d3fbcb-98b6-4658-8d08-e30f8c68a18c@linux.ibm.com> <11198028-5b04-4ebd-9374-a78dc85376a8@linux.ibm.com>
Cc: Ajit Agarwal, Segher Boessenkool, David Edelsohn, Peter Bergner, GCC Patches
From: "Kewen.Lin"

on 2023/12/6 13:09, Michael Meissner wrote:
> On Wed, Dec 06, 2023 at 10:22:57AM +0800, Kewen.Lin wrote:
>> I'd expect you use UNSPEC_MMA_EXTRACT to extract V16QI from the result of lxvp,
>> the current define_insn_and_split "*vsx_disassemble_pair" should be able to take
>> care of it further (eg: reg and regoff).
>>
>> BR,
>> Kewen
>
> With Peter's subreg patch, UNSPEC_MMA_EXTRACT would produce two moves with
> SUBREGs:

Given the details below, I think you mean that even with Peter's subreg patch,
which was intended to get rid of UNSPEC_MMA_EXTRACT for OOmode, we could still
end up with sub-optimal moves?  The proposed subreg and the current
UNSPEC_MMA_EXTRACT unspec are alternative ways to extract a component from the
result of lxvp.  Since the latest trunk still uses UNSPEC_MMA_EXTRACT, that is
what I suggested to Ajit.

>
> For an FMA type loop such as:
>
> union vector_hack2 {
>   vector unsigned char vuc[2];
>   vector double v[2];
> };
>
> static void
> use_mma_ld_st_normal_no_unroll (double * __restrict__ r,
>                                 const double * __restrict__ a,
>                                 const double * __restrict__ b,
>                                 size_t num)
> {
>   __vector_pair * __restrict__ v_r = ( __vector_pair * __restrict__) r;
>   const __vector_pair * __restrict__ v_a = (const __vector_pair * __restrict__) a;
>   const __vector_pair * __restrict__ v_b = (const __vector_pair * __restrict__) b;
>   size_t num_vector = num / (2 * (sizeof (vector double) / sizeof (double)));
>   size_t num_scalar = num % (2 * (sizeof (vector double) / sizeof (double)));
>   size_t i;
>   union vector_hack2 a_union;
>   union vector_hack2 b_union;
>   union vector_hack2 r_union;
>   vector double a_hi, a_lo;
>   vector double b_hi, b_lo;
>   vector double r_hi, r_lo;
>   union vector_hack result_hi, result_lo;
>
> #pragma GCC unroll 0
>   for (i = 0; i < num_vector; i++)
>     {
>       __builtin_vsx_disassemble_pair (&a_union.vuc, &v_a[i]);
>       __builtin_vsx_disassemble_pair (&b_union.vuc, &v_b[i]);
>       __builtin_vsx_disassemble_pair (&r_union.vuc, &v_r[i]);
>
>       a_hi = a_union.v[0];
>       b_hi = b_union.v[0];
>       r_hi = r_union.v[0];
>
>       a_lo = a_union.v[1];
>       b_lo = b_union.v[1];
>       r_lo = r_union.v[1];
>
>       result_hi.v = (a_hi * b_hi) + r_hi;
>       result_lo.v = (a_lo * b_lo) + r_lo;
>
>       __builtin_vsx_build_pair (&v_r[i], result_hi.vuc, result_lo.vuc);
>     }
>
>   if (num_scalar)
>     {
>       r += num_vector * (2 * (sizeof (vector double) / sizeof (double)));
>       a += num_vector * (2 * (sizeof (vector double) / sizeof (double)));
>       b += num_vector * (2 * (sizeof (vector double) / sizeof (double)));
>
> #pragma GCC unroll 0
>       for (i = 0; i < num_scalar; i++)
>         r[i] += (a[i] * b[i]);
>     }
>
>   return;
> }
>
> Peter's code would produce the following in the inner loop:
>
> (insn 16 15 19 4 (set (reg:OO 133 [ _43 ])
>         (mem:OO (plus:DI (reg/v/f:DI 150 [ a ])
>                 (reg:DI 143 [ ivtmp.1088 ])) [6 MEM[(__vector_pair *)a_30(D) + ivtmp.1088_88 * 1]+0 S32 A128])) "p10-fma.h":3285:1 2181 {*movoo}
>      (nil))
> (insn 19 16 22 4 (set (reg:OO 136 [ _48 ])
>         (mem:OO (plus:DI (reg/v/f:DI 151 [ b ])
>                 (reg:DI 143 [ ivtmp.1088 ])) [6 MEM[(__vector_pair *)b_31(D) + ivtmp.1088_88 * 1]+0 S32 A128])) "p10-fma.h":3285:1 2181 {*movoo}
>      (nil))
> (insn 22 19 25 4 (set (reg:OO 139 [ _53 ])
>         (mem:OO (plus:DI (reg/v/f:DI 149 [ r ])
>                 (reg:DI 143 [ ivtmp.1088 ])) [6 MEM[(__vector_pair *)r_29(D) + ivtmp.1088_88 * 1]+0 S32 A128])) "p10-fma.h":3285:1 2181 {*movoo}
>      (nil))
> (insn 25 22 26 4 (set (reg:V2DF 117 [ _6 ])
>         (fma:V2DF (subreg:V2DF (reg:OO 136 [ _48 ]) 16)
>             (subreg:V2DF (reg:OO 133 [ _43 ]) 16)
>             (subreg:V2DF (reg:OO 139 [ _53 ]) 16))) "p10-fma.h":3319:35 1265 {*vsx_fmav2df4}
>      (nil))
> (insn 26 25 27 4 (set (reg:V2DF 118 [ _8 ])
>         (fma:V2DF (subreg:V2DF (reg:OO 136 [ _48 ]) 0)
>             (subreg:V2DF (reg:OO 133 [ _43 ]) 0)
>             (subreg:V2DF (reg:OO 139 [ _53 ]) 0))) "p10-fma.h":3320:35 1265 {*vsx_fmav2df4}
>      (expr_list:REG_DEAD (reg:OO 139 [ _53 ])
>         (expr_list:REG_DEAD (reg:OO 136 [ _48 ])
>             (expr_list:REG_DEAD (reg:OO 133 [ _43 ])
>                 (nil)))))
> (insn 27 26 28 4 (set (reg:OO 142 [ _59 ])
>         (unspec:OO [
>                 (subreg:V16QI (reg:V2DF 117 [ _6 ]) 0)
>                 (subreg:V16QI (reg:V2DF 118 [ _8 ]) 0)
>             ] UNSPEC_VSX_ASSEMBLE)) 2183 {*vsx_assemble_pair}
>      (expr_list:REG_DEAD (reg:V2DF 118 [ _8 ])
>         (expr_list:REG_DEAD (reg:V2DF 117 [ _6 ])
>             (nil))))
>
> Now in theory you could get rid of the UNSPEC_VSX_ASSEMBLE also using SUBREGs.
Agreed, it looks doable; this comment seems more for Peter's subreg patch. :)

BR,
Kewen
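
For readers following the thread without a Power10 machine at hand, here is a
rough portable C sketch of what the quoted loop and RTL do.  It is only an
illustration under stated assumptions, not the rs6000 implementation: the
types vec2d and pair32 and the helpers extract_half, assemble_pair, fma2 and
fma_by_pairs are hypothetical stand-ins for a V2DF vector, an OOmode
__vector_pair, the (subreg:V2DF (reg:OO ...) 0/16) extractions, and
UNSPEC_VSX_ASSEMBLE.  It assumes 8-byte doubles and does not model which half
the built-ins call "hi" or "lo".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: a 16-byte vector of two doubles and a 32-byte
   pair.  These are not the real VSX/MMA types.  */
typedef struct { double d[2]; } vec2d;
typedef struct { unsigned char bytes[32]; } pair32;

/* Doubles per pair: 2 vectors * 2 doubles, assuming 8-byte doubles.  */
enum { DOUBLES_PER_PAIR = sizeof (pair32) / sizeof (double) };

/* Copy out the 16-byte half at byte offset OFF (0 or 16), in the spirit
   of (subreg:V2DF (reg:OO ...) OFF) in the dump.  */
static vec2d
extract_half (const pair32 *p, size_t off)
{
  vec2d v;
  assert (off == 0 || off == 16);
  memcpy (&v, p->bytes + off, sizeof v);
  return v;
}

/* Build a pair from two halves, the role UNSPEC_VSX_ASSEMBLE plays.  */
static pair32
assemble_pair (vec2d at0, vec2d at16)
{
  pair32 p;
  memcpy (p.bytes, &at0, sizeof at0);
  memcpy (p.bytes + 16, &at16, sizeof at16);
  return p;
}

/* Element-wise a * b + c on one two-double vector.  */
static vec2d
fma2 (vec2d a, vec2d b, vec2d c)
{
  vec2d r = { { a.d[0] * b.d[0] + c.d[0], a.d[1] * b.d[1] + c.d[1] } };
  return r;
}

/* r[i] += a[i] * b[i]: whole pairs first, then a scalar tail, using the
   same num / 4 and num % 4 split as the quoted function.  */
static void
fma_by_pairs (double *r, const double *a, const double *b, size_t num)
{
  size_t num_pairs = num / DOUBLES_PER_PAIR;
  size_t num_scalar = num % DOUBLES_PER_PAIR;
  size_t i;

  for (i = 0; i < num_pairs; i++)
    {
      pair32 pa, pb, pr;
      memcpy (&pa, a + i * DOUBLES_PER_PAIR, sizeof pa);
      memcpy (&pb, b + i * DOUBLES_PER_PAIR, sizeof pb);
      memcpy (&pr, r + i * DOUBLES_PER_PAIR, sizeof pr);

      vec2d at0 = fma2 (extract_half (&pa, 0), extract_half (&pb, 0),
                        extract_half (&pr, 0));
      vec2d at16 = fma2 (extract_half (&pa, 16), extract_half (&pb, 16),
                         extract_half (&pr, 16));

      pr = assemble_pair (at0, at16);
      memcpy (r + i * DOUBLES_PER_PAIR, &pr, sizeof pr);
    }

  /* Scalar tail for the elements that do not fill a whole pair.  */
  r += num_pairs * DOUBLES_PER_PAIR;
  a += num_pairs * DOUBLES_PER_PAIR;
  b += num_pairs * DOUBLES_PER_PAIR;
  for (i = 0; i < num_scalar; i++)
    r[i] += a[i] * b[i];
}
```

With num = 6, for example, this processes one pair (four doubles) in the
first loop and two doubles in the scalar tail.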