Message-ID: <2ad6513b-3d85-4871-eac7-a39cb23772d7@linux.ibm.com>
Date: Fri, 8 Dec 2023 17:51:18 +0800
Subject: Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp
To: Ajit Agarwal
Cc: Segher Boessenkool, David Edelsohn, Peter Bergner, Michael Meissner, GCC Patches, Vladimir Makarov, Richard Sandiford, Richard Biener, Jeff Law, Jakub Jelinek
From: "Kewen.Lin"
In-Reply-To: <75bdc1fc-23c9-4ca2-a338-fc1328319114@linux.ibm.com>

Hi Ajit,

on 2023/12/8 16:01, Ajit Agarwal wrote:
> Hello Kewen:
>
> On 07/12/23 4:31 pm, Ajit Agarwal wrote:
>> Hello Kewen:
>>
>> On 06/12/23 7:52 am, Kewen.Lin wrote:
>>> on 2023/12/6 02:01, Ajit Agarwal wrote:
>>>> Hello Kewen:
>>>>
>>>>
>>>> On 05/12/23 7:13 pm, Ajit Agarwal wrote:
>>>>> Hello Kewen:
>>>>>
>>>>> On 04/12/23 7:31 am, Kewen.Lin wrote:
>>>>>> Hi Ajit,
>>>>>>
>>>>>> on 2023/12/1 17:10, Ajit Agarwal wrote:
>>>>>>> Hello Kewen:
>>>>>>>
>>>>>>> On 24/11/23 3:01 pm, Kewen.Lin wrote:
>>>>>>>> Hi Ajit,
>>>>>>>>
>>>>>>>> Don't forget to CC David (CC-ed) :), some comments are inlined below.
>>>>>>>>
>>>>>>>> on 2023/10/8 03:04, Ajit Agarwal wrote:
>>>>>>>>> Hello All:
>>>>>>>>>
>>>>>>>>> This patch add new pass to replace contiguous addresses vector load lxv with mma instruction
>>>>>>>>> lxvp.
>>>>>>>>
>>>>>>>> IMHO the current binding lxvp (and lxvpx, stxvp{x,}) to MMA looks wrong, it's only
>>>>>>>> Power10 and VSX required, these instructions should perform well without MMA support.
>>>>>>>> So one patch to separate their support from MMA seems to go first.
>>>>>>>>
>>>>>>>
>>>>>>> I will make the changes for Power10 and VSX.
>>>>>>>
>>>>>>>>> This patch addresses one regressions failure in ARM architecture.
>>>>>>>>
>>>>>>>> Could you explain this? I don't see any test case for this.
>>>>>>>
>>>>>>> I have submitted v1 of the patch and there were regressions failure for Linaro.
>>>>>>> I have fixed in version V2.
>>>>>>
>>>>>> OK, thanks for clarifying. So some unexpected changes on generic code in v1
>>>>>> caused the failure exposed on arm.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Besides, it seems a bad idea to put this pass after reload? as register allocation
>>>>>>>> finishes, this pairing has to be restricted by the reg No. (I didn't see any
>>>>>>>> checking on the reg No. relationship for paring btw.)
>>>>>>>>
>>>>>>>
>>>>>>> Adding before reload pass deletes one of the lxv and replaced with lxvp.
>>>>>>> This fails in reload pass while freeing reg_eqivs as ira populates them and then
>>>>>>
>>>>>> I can't find reg_eqivs, I guessed you meant reg_equivs and moved this pass right before
>>>>>> pass_reload (between pass_ira and pass_reload)? IMHO it's unexpected as those two passes
>>>>>> are closely correlated. I was expecting to put it somewhere before ira.
>>>>>
>>>>> Yes they are tied together and moving before reload will not work.
>>>>>
>>>>>>
>>>>>>> vecload pass deletes some of insns and while freeing in reload pass as insn
>>>>>>> is already deleted in vecload pass reload pass segfaults.
>>>>>>>
>>>>>>> Moving vecload pass before ira will not make register pairs with lxvp and
>>>>>>> in ira and that will be a problem.
>>>>>>
>>>>>> Could you elaborate the obstacle for moving such pass before pass_ira?
>>>>>>
>>>>>> Basing on the status quo, the lxvp is bundled with OOmode, then I'd expect
>>>>>> we can generate OOmode move (load) and use the components with unspec (or
>>>>>> subreg with Peter's patch) to replace all the previous use places, it looks
>>>>>> doable to me.
>>>>>
>>>>> Moving before ira passes, we delete the offset lxv and generate lxvp and replace all
>>>>> the uses, that I am doing. But the offset lxvp register generated by ira are not
>>>>> register pair and generate random register and hence we cannot generate lxvp.
>>>>>
>>>>> For example one lxv is generated with register 32 and other pair is generated
>>>>> with register 45 by ira if we move it before ira passes.
>>>>
>>>> It generates the following.
>>>>         lxvp %vs32,0(%r4)
>>>>         xvf32ger 0,%vs34,%vs32
>>>>         xvf32gerpp 0,%vs34,%vs45
>>>
>>> What do the RTL insns for these insns look like?
>>>
>>> I'd expect you use UNSPEC_MMA_EXTRACT to extract V16QI from the result of lxvp,
>>> the current define_insn_and_split "*vsx_disassemble_pair" should be able to take
>>> >> >> Yes with UNSPEC_MMA_EXTRACT it generates lxvp with register pair instead of random >> register by ira and reload pass. But there is an extra moves that gets generated. >> > > With UNSPEC_MMA_EXTRACT I could generate the register pair but functionally here is the > below code which is incorrect.> > l lxvp %vs0,0(%r4) > xxlor %vs32,%vs0,%vs0 > xvf32ger 0,%vs34,%vs32 > xvf32gerpp 0,%vs34,%vs33 > xxmfacc 0 > stxvp %vs2,0(%r3) > stxvp %vs0,32(%r3) > blr > > > Here is the RTL Code: > > (insn 19 4 20 2 (set (reg:OO 124 [ *ptr_4(D) ]) > (mem:OO (reg/v/f:DI 122 [ ptr ]) [0 *ptr_4(D)+0 S16 A128])) -1 > (nil)) > (insn 20 19 9 2 (set (reg:V16QI 129 [orig:124 *ptr_4(D) ] [124]) > (subreg:V16QI (reg:OO 124 [ *ptr_4(D) ]) 0)) -1 > (nil)) > (insn 9 20 11 2 (set (reg:XO 119 [ _7 ]) > (unspec:XO [ > (reg/v:V16QI 123 [ src ]) > (reg:V16QI 129 [orig:124 *ptr_4(D) ] [124]) > ] UNSPEC_MMA_XVF32GER)) 2195 {mma_xvf32ger} > (expr_list:REG_DEAD (reg:OO 124 [ *ptr_4(D) ]) > (nil))) > (insn 11 9 12 2 (set (reg:XO 120 [ _9 ]) > (unspec:XO [ > (reg:XO 119 [ _7 ]) > (reg/v:V16QI 123 [ src ]) > (reg:V16QI 125 [ MEM[(__vector unsigned char *)ptr_4(D) + 16B] ]) Thanks for trying this and the update! I think the functionality issue is due to this "reg:V16QI 125" isn't defined, it seems that you don't make explicit code to extract this component from "reg:OO 124"? > ] UNSPEC_MMA_XVF32GERPP)) 2209 {mma_xvf32gerpp} > (expr_list:REG_DEAD (reg:V16QI 125 [ MEM[(__vector unsigned char *)ptr_4(D) + 16B] ]) > (expr_list:REG_DEAD (reg/v:V16QI 123 [ src ]) > (expr_list:REG_DEAD (reg:XO 119 [ _7 ]) > (nil))))) > (insn 12 11 18 2 (set (mem:XO (reg:DI 126) [1 *dst_10(D)+0 S64 A128]) > (reg:XO 120 [ _9 ])) "../gcc/testsuite/g++.target/powerpc/vecload.C":13:8 2182 {*movxo} > (expr_list:REG_DEAD (reg:DI 126) > (expr_list:REG_DEAD (reg:XO 120 [ _9 ]) > (nil)))) > (note 18 12 0 NOTE_INSN_DELETED) > > r124 and r129 conflicts live range amd ira generates different registers which will not > serve our purpose. 
This conflict issue is tough, but IMHO it only causes unnecessary register moves and the functionality should be still fine with it. Not sure if there is some existing practice for this kind of issue, one immature idea seems to directly use (subreg:V16QI (reg:OO 124) 0/1) in those uses like UNSPEC_MMA_XVF32GER and UNSPEC_MMA_XVF32GERPP, and fix up them when dropping subreg. CC more experts for advice! > > Making r124 and r129 as same will not allocate register by ira as r124 could have both OOmode > and V16QImode. > > Doing this pass before ira_pass has such above issues and we could solve them after making > after reload pass. > > Please suggest. > > Thanks & Regards > Ajit > >> I am working further on this and send the new version of the patch with all the >> comments incorporated. >> >> Thanks & Regards >> Ajit >>> BR, >>> Kewen >>> >>>> xxmfacc 0 >>>> stxvp %vs2,0(%r3) >>>> stxvp %vs0,32(%r3) >>>> blr >>>> >>>> >>>> Instead of vs33 ira generates vs45 if we move before pass_ira. >>>> >>>> Thanks & Regards >>>> Ajit >>>> >>>> >>>>> Thanks & Regards >>>>> Ajit >>>>>> >>>>> >>>>>>> >>>>>>> Making after reload pass is the only solution I see as ira and reload pass >>>>>>> makes register pairs and vecload pass will be easier with generation of >>>>>>> lxvp. >>>>>>> >>>>>>> Please suggest. >>>>>>> >>>>>>>> Looking forward to the comments from Segher/David/Peter/Mike etc. >>>>>> >>>>>> Still looking forward. :) >>>>>> >>>>>> BR, >>>>>> Kewen >>> BR, Kewen