From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 6 Dec 2023 00:09:09 -0500
From: Michael Meissner
To: "Kewen.Lin"
Cc: Ajit Agarwal, Segher Boessenkool, David Edelsohn, Peter Bergner,
 Michael Meissner, GCC Patches
Subject: Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous
 addresses vector load lxv with lxvp
References: <77426697-1571-e180-add9-cfb6d10f8424@linux.ibm.com>
 <57d3fbcb-98b6-4658-8d08-e30f8c68a18c@linux.ibm.com>
 <11198028-5b04-4ebd-9374-a78dc85376a8@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

On Wed, Dec 06, 2023 at 10:22:57AM +0800, Kewen.Lin wrote:
> I'd expect you use UNSPEC_MMA_EXTRACT to extract V16QI from the result of lxvp,
> the current define_insn_and_split "*vsx_disassemble_pair" should be able to take
> care of it further (eg: reg and regoff).
>
> BR,
> Kewen

With Peter's subreg patch, UNSPEC_MMA_EXTRACT would produce two moves with
SUBREGs.

For an FMA type loop such as:

#include <stddef.h>	/* for size_t */

/* Not shown in the original message; inferred from the uses of
   result_hi and result_lo below.  */
union vector_hack {
  vector unsigned char vuc;
  vector double v;
};

union vector_hack2 {
  vector unsigned char vuc[2];
  vector double v[2];
};

static void
use_mma_ld_st_normal_no_unroll (double * __restrict__ r,
				const double * __restrict__ a,
				const double * __restrict__ b,
				size_t num)
{
  __vector_pair * __restrict__ v_r = ( __vector_pair * __restrict__) r;
  const __vector_pair * __restrict__ v_a = (const __vector_pair * __restrict__) a;
  const __vector_pair * __restrict__ v_b = (const __vector_pair * __restrict__) b;
  size_t num_vector = num / (2 * (sizeof (vector double) / sizeof (double)));
  size_t num_scalar = num % (2 * (sizeof (vector double) / sizeof (double)));
  size_t i;
  union vector_hack2 a_union;
  union vector_hack2 b_union;
  union vector_hack2 r_union;
  vector double a_hi, a_lo;
  vector double b_hi, b_lo;
  vector double r_hi, r_lo;
  union vector_hack result_hi, result_lo;

#pragma GCC unroll 0
  for (i = 0; i < num_vector; i++)
    {
      __builtin_vsx_disassemble_pair (&a_union.vuc, &v_a[i]);
      __builtin_vsx_disassemble_pair (&b_union.vuc, &v_b[i]);
      __builtin_vsx_disassemble_pair (&r_union.vuc, &v_r[i]);

      a_hi = a_union.v[0];
      b_hi = b_union.v[0];
      r_hi = r_union.v[0];

      a_lo = a_union.v[1];
      b_lo = b_union.v[1];
      r_lo = r_union.v[1];

      result_hi.v = (a_hi * b_hi) + r_hi;
      result_lo.v = (a_lo * b_lo) + r_lo;

      __builtin_vsx_build_pair (&v_r[i], result_hi.vuc, result_lo.vuc);
    }

  if (num_scalar)
    {
      r += num_vector * (2 * (sizeof (vector double) / sizeof (double)));
      a += num_vector * (2 * (sizeof (vector double) / sizeof (double)));
      b += num_vector * (2 * (sizeof (vector double) / sizeof (double)));

#pragma GCC unroll 0
      for (i = 0; i < num_scalar; i++)
	r[i] += (a[i] * b[i]);
    }

  return;
}

Peter's code would produce the following in the inner loop:

(insn 16 15 19 4 (set (reg:OO 133 [ _43 ])
        (mem:OO (plus:DI (reg/v/f:DI 150 [ a ])
                (reg:DI 143 [ ivtmp.1088 ])) [6 MEM[(__vector_pair *)a_30(D) + ivtmp.1088_88 * 1]+0 S32 A128])) "p10-fma.h":3285:1 2181 {*movoo}
     (nil))
(insn 19 16 22 4 (set (reg:OO 136 [ _48 ])
        (mem:OO (plus:DI (reg/v/f:DI 151 [ b ])
                (reg:DI 143 [ ivtmp.1088 ])) [6 MEM[(__vector_pair *)b_31(D) + ivtmp.1088_88 * 1]+0 S32 A128])) "p10-fma.h":3285:1 2181 {*movoo}
     (nil))
(insn 22 19 25 4 (set (reg:OO 139 [ _53 ])
        (mem:OO (plus:DI (reg/v/f:DI 149 [ r ])
                (reg:DI 143 [ ivtmp.1088 ])) [6 MEM[(__vector_pair *)r_29(D) + ivtmp.1088_88 * 1]+0 S32 A128])) "p10-fma.h":3285:1 2181 {*movoo}
     (nil))
(insn 25 22 26 4 (set (reg:V2DF 117 [ _6 ])
        (fma:V2DF (subreg:V2DF (reg:OO 136 [ _48 ]) 16)
            (subreg:V2DF (reg:OO 133 [ _43 ]) 16)
            (subreg:V2DF (reg:OO 139 [ _53 ]) 16))) "p10-fma.h":3319:35 1265 {*vsx_fmav2df4}
     (nil))
(insn 26 25 27 4 (set (reg:V2DF 118 [ _8 ])
        (fma:V2DF (subreg:V2DF (reg:OO 136 [ _48 ]) 0)
            (subreg:V2DF (reg:OO 133 [ _43 ]) 0)
            (subreg:V2DF (reg:OO 139 [ _53 ]) 0))) "p10-fma.h":3320:35 1265 {*vsx_fmav2df4}
     (expr_list:REG_DEAD (reg:OO 139 [ _53 ])
        (expr_list:REG_DEAD (reg:OO 136 [ _48 ])
            (expr_list:REG_DEAD (reg:OO 133 [ _43 ])
                (nil)))))
(insn 27 26 28 4 (set (reg:OO 142 [ _59 ])
        (unspec:OO [
                (subreg:V16QI (reg:V2DF 117 [ _6 ]) 0)
                (subreg:V16QI (reg:V2DF 118 [ _8 ]) 0)
            ] UNSPEC_VSX_ASSEMBLE)) 2183 {*vsx_assemble_pair}
     (expr_list:REG_DEAD (reg:V2DF 118 [ _8 ])
        (expr_list:REG_DEAD (reg:V2DF 117 [ _6 ])
            (nil))))

Now in theory you could get rid of the UNSPEC_VSX_ASSEMBLE also using
SUBREGs.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com
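
As a reading aid for the dump above, here is a minimal portable C sketch of what
the (subreg:V2DF (reg:OO ...) 0) and (subreg:V2DF (reg:OO ...) 16) operands mean:
a 16-byte view at byte offset 0 or 16 of a 32-byte value. Everything in it is
hypothetical and for illustration only (pair_t, v2df_t, pair_subreg,
pair_set_subreg, fma_v2df, pair_fma); it is not GCC or rs6000 code, it uses a
plain multiply-add rather than a fused one, and it makes no claim about which
half the builtins treat as element 0 on a given endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for an OOmode register pair (32 bytes) and a
   V2DFmode vector (two doubles, 16 bytes).  */
typedef struct { unsigned char bytes[32]; } pair_t;
typedef struct { double d[2]; } v2df_t;

/* Model of reading (subreg:V2DF (reg:OO p) byte_offset): copy out the
   16 bytes that start at the given byte offset of the pair.  */
static v2df_t
pair_subreg (const pair_t *p, size_t byte_offset)
{
  v2df_t v;
  assert (byte_offset + sizeof v <= sizeof p->bytes);
  memcpy (&v, p->bytes + byte_offset, sizeof v);
  return v;
}

/* Model of writing one 16-byte half of the pair through a subreg,
   which is the idea behind replacing the assemble unspec.  */
static void
pair_set_subreg (pair_t *p, size_t byte_offset, v2df_t v)
{
  assert (byte_offset + sizeof v <= sizeof p->bytes);
  memcpy (p->bytes + byte_offset, &v, sizeof v);
}

/* Element-wise a * b + c on one 16-byte half (not fused in this model).  */
static v2df_t
fma_v2df (v2df_t a, v2df_t b, v2df_t c)
{
  v2df_t r;
  for (int i = 0; i < 2; i++)
    r.d[i] = a.d[i] * b.d[i] + c.d[i];
  return r;
}

/* One loop iteration of the dump: two multiply-adds on the halves at
   offsets 0 and 16, with the result written back half by half.  */
static pair_t
pair_fma (const pair_t *a, const pair_t *b, const pair_t *r)
{
  pair_t out;
  for (size_t off = 0; off < sizeof out.bytes; off += sizeof (v2df_t))
    pair_set_subreg (&out, off,
		     fma_v2df (pair_subreg (a, off),
			       pair_subreg (b, off),
			       pair_subreg (r, off)));
  return out;
}
```

Calling pair_fma on three 32-byte buffers updates both halves without any
separate "disassemble" or "assemble" step, which is the shape the subreg form
of the RTL is aiming for.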