From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 756E13870871 for ; Fri, 31 May 2024 16:59:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 756E13870871 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 756E13870871 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.156.1 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1717174778; cv=none; b=c/0lz4s5h7Vj6AN5awctsDSEJWKQnPnOQDqn0JVbMqbA+cGXm3OprUOQfBe/lTTnLor/Z2qAag364fI4SMaUKjAxrjUpMin8wtm5BLyQr6B/opyCaEj7fD9JQkVZQHvggVfvgbE/h+T3Tm7T8wgYH2CFgCAmm4ilxyyeBpJe1ws= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1717174778; c=relaxed/simple; bh=KVm9huL/cHmW9Xbw49/8uJ3ErS2/y/mEXWOtgMgALHQ=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=n+JpPMe4CZE9jU6z//TYYBT3fIlEKqYMLLUz7Misrc9DkvaittXPx2D1ACvuxz+Ks4WMBWYhlJIkTP9lj8aMG2393Q2qc4j5djnMkczyEx/fy4gxCgyazKyzGKkpkCb8zvSRup3K4rnMnl805YmK7uCqJ3bVrXTjJsDnP6JXPC4= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0353729.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 44VGlGH6003465; Fri, 31 May 2024 16:59:31 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=content-transfer-encoding : content-type : date : from : in-reply-to : message-id : mime-version : references : subject : to; s=pp1; bh=JN0px/hv745AZ/KkhRtj+pC3JoS0GIeCrrZEuQwsE54=; b=mHRNOKLeaF4BUf/suP1o83kJuXnk2aqEDHHrU9kkpTC93omc0DwfoSvnFJmWphR+aQK0 +pIVHD+t159xiSt77pWhhs5DEn5JNcLWv68Geh8WGYNjR9Tadnkh4me+a2pZSWCPL2DI gpnRLgVQgACMTkKGiCg0k37wtJSpz1MF7SnmRQdIrR5Lk+mVgBmNfGBSN4oLF4HRQhw2 1q5modWHbvTUk57kKX8sqJg9yUuqKGo0Ob0TyBv1ZiC9ZzADIZBpQ/QC9WWSzn+0MTst 54IdsGwKAvXYMtnJny74njVkt3KlhSAQODK/6d/ixwecpc1KYwrdaFlkSMb94MxEKTv6 wA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3yfjccr0mm-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 31 May 2024 16:59:30 +0000 Received: from m0353729.ppops.net (m0353729.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 44VGsXGY014088; Fri, 31 May 2024 16:59:30 GMT Received: from ppma13.dal12v.mail.ibm.com (dd.9e.1632.ip4.static.sl-reverse.com [50.22.158.221]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3yfjccr0mj-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 31 May 2024 16:59:30 +0000 Received: from pps.filterd (ppma13.dal12v.mail.ibm.com [127.0.0.1]) by ppma13.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 44VG2HuE002783; Fri, 31 May 2024 16:59:29 GMT Received: from smtprelay02.wdc07v.mail.ibm.com ([172.16.1.69]) by ppma13.dal12v.mail.ibm.com (PPS) with ESMTPS id 3ydpb110wq-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 31 May 2024 16:59:29 +0000 Received: from smtpav06.wdc07v.mail.ibm.com (smtpav06.wdc07v.mail.ibm.com [10.39.53.233]) by smtprelay02.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 44VGxP2k28639766 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 31 May 2024 16:59:27 GMT Received: from smtpav06.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 901DC58060; Fri, 31 May 2024 16:59:25 +0000 (GMT) Received: from smtpav06.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 00A2B58068; Fri, 31 May 2024 16:59:22 +0000 (GMT) Received: from [9.43.13.10] (unknown [9.43.13.10]) by smtpav06.wdc07v.mail.ibm.com (Postfix) with ESMTP; Fri, 31 May 2024 16:59:21 +0000 (GMT) Message-ID: <95a33b0a-8090-4218-a62c-da1f53bebbb7@linux.ibm.com> Date: Fri, 31 May 2024 22:29:19 +0530 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion To: Alex Coplan , "Kewen.Lin" , Segher Boessenkool , Michael Meissner , Peter Bergner , David Edelsohn , gcc-patches , richard.sandiford@arm.com References: <53ba46de-6c01-4c68-bd98-1ba6950a793a@linux.ibm.com> Content-Language: en-US From: Ajit Agarwal In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: v5HTJqTZQ7DryuAHD8eoFQJivHxWFq1E X-Proofpoint-ORIG-GUID: lTx6KfD4NgCxj1htBGMrTMNw3-ZZFWwG X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1039,Hydra:6.0.650,FMLib:17.12.28.16 definitions=2024-05-31_12,2024-05-30_01,2024-05-17_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 suspectscore=0 spamscore=0 impostorscore=0 priorityscore=1501 malwarescore=0 bulkscore=0 lowpriorityscore=0 clxscore=1015 phishscore=0 mlxscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2405010000 definitions=main-2405310128 X-Spam-Status: No, score=-5.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,KAM_MANYTO,KAM_SHORT,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hello Richard: On 31/05/24 8:08 pm, Richard Sandiford wrote: > Ajit Agarwal writes: >> On 31/05/24 3:23 pm, Richard Sandiford wrote: >>> Ajit Agarwal writes: >>>> Hello All: >>>> >>>> Common infrastructure using generic code for pair mem fusion of different >>>> targets. >>>> >>>> rs6000 target specific specific code implements virtual functions defined >>>> by generic code. >>>> >>>> Code is implemented with pure virtual functions to interface with target >>>> code. >>>> >>>> Target specific code are added in rs6000-mem-fusion.cc and additional virtual >>>> function implementation required for rs6000 are added in aarch64-ldp-fusion.cc. >>>> >>>> Bootstrapped and regtested for aarch64-linux-gnu and powerpc64-linux-gnu. >>>> >>>> Thanks & Regards >>>> Ajit >>>> >>>> >>>> aarch64, rs6000, middle-end: Add implementation for different targets for pair mem fusion >>>> >>>> Common infrastructure using generic code for pair mem fusion of different >>>> targets. >>>> >>>> rs6000 target specific specific code implements virtual functions defined >>>> by generic code. >>>> >>>> Code is implemented with pure virtual functions to interface with target >>>> code. >>>> >>>> Target specific code are added in rs6000-mem-fusion.cc and additional virtual >>>> function implementation required for rs6000 are added in aarch64-ldp-fusion.cc. >>>> >>>> 2024-05-31 Ajit Kumar Agarwal >>>> >>>> gcc/ChangeLog: >>>> >>>> * config/aarch64/aarch64-ldp-fusion.cc: Add target specific >>>> implementation of additional virtual functions added in pair_fusion >>>> struct. >>>> * config/rs6000/rs6000-passes.def: New mem fusion pass >>>> before pass_early_remat. >>>> * config/rs6000/rs6000-mem-fusion.cc: Add new pass. >>>> Add target specific implementation using pure virtual >>>> functions. >>>> * config.gcc: Add new object file. >>>> * config/rs6000/rs6000-protos.h: Add new prototype for mem >>>> fusion pass. >>>> * config/rs6000/t-rs6000: Add new rule. >>>> * rtl-ssa/accesses.h: Moved set_is_live_out_use as public >>>> from private. >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> * g++.target/powerpc/me-fusion.C: New test. >>>> * g++.target/powerpc/mem-fusion-1.C: New test. >>>> * gcc.target/powerpc/mma-builtin-1.c: Modify test. >>>> --- >>> >>> This isn't a complete review, just some initial questions & comments >>> about selected parts. >>> >>>> [...] >>>> +/* Check whether load can be fusable or not. >>>> + Return true if dependent use is UNSPEC otherwise false. */ >>>> +bool >>>> +rs6000_pair_fusion::fuseable_load_p (insn_info *info) >>>> +{ >>>> + rtx_insn *insn = info->rtl (); >>>> + >>>> + for (rtx note = REG_NOTES (insn); note; note = XEXP (note, 1)) >>>> + if (REG_NOTE_KIND (note) == REG_EQUAL >>>> + || REG_NOTE_KIND (note) == REG_EQUIV) >>>> + return false; >>> >>> It's unusual to punt on an optimisation because of a REG_EQUAL/EQUIV >>> note. What's the reason for doing this? Are you trying to avoid >>> fusing pairs before reload that are equivalent to a MEM (i.e. have >>> a natural spill slot)? I think Alex hit a similar situation. >>> >> >> We have used the above check because of some SPEC benchmarks failing with >> with MEM pairs having REG_EQUAL/EQUIV notes. >> >> By adding the checks the benchmarks passes and also it improves the >> performance. >> >> This checks were added during initial implementation of pair fusion >> pass. >> >> I will investigate further if this check is still required or not. > > Thanks. If it does affect SPEC results, it would be good to look > at the underlying reason, as a justification for the check. > > AIUI, the case Alex was due to the way that the RA recognises: > > (set (reg R) (mem address-of-a-stack-variable)) > REG_EQUIV: (mem address-of-a-stack-variable) > > where the REG_EQUIV is either explicit or detected by the RA. > If R needs to be spilled, it can then be spilled to its existing > location on the stack. And if R needs to be spilled in the > instruction above (because of register pressure before the first > use of R), the RA is able to delete the instruction. > > But if that is the reason, the condition should be restricted > to cases in which the note is a memory. > > I think Alex had tried something similar and found that it wasn't > always effective. > Sure I will check. >> [...] >>>> + && info->is_real ()) >>>> + { >>>> + rtx_insn *rtl_insn = info->rtl (); >>>> + rtx set = single_set (rtl_insn); >>>> + >>>> + if (set == NULL_RTX) >>>> + return false; >>>> + >>>> + rtx op0 = SET_SRC (set); >>>> + if (GET_CODE (op0) != UNSPEC) >>>> + return false; >>> >>> What's the motivation for rejecting unspecs? It's unsual to treat >>> all unspecs as a distinct group. >>> >>> Also, using single_set means that the function still lets through >>> parallels of two sets in which the sources are unspecs. Is that >>> intentional? >>> >>> The reasons behind things like the REG_EQUAL/EQUIV and UNSPEC decisions >>> need to be described in comments, so that other people coming to this >>> code later can understand the motivation. The same thing applies to >>> other decisions in the patch. >>> >> >> Adjacent load pair fusion with 256 bit OOmode is seen and valid with use of load >> in UNSPEC. Thats why this check is added. > > Can you give an example (in terms of rtl)? (insn 23 72 25 4 (set (reg:OO 124 [ vect__1.24 ]) (mem:OO (reg:DI 118 [ ivtmp.73 ]) [0 MEM [(unsigned char *)_65]+0 S16 A8])) {*movoo} (nil)) (insn 25 23 103 4 (set (reg:OO 174 [ vect__1.26 ]) (mem:OO (plus:DI (reg:DI 118 [ ivtmp.73 ]) (const_int 32 [0x20])) [0 MEM [(unsigned char *)_65 + 32B]+0 S16 A8])) {*movoo} (nil)) (insn 103 25 27 4 (set (reg:DI 118 [ ivtmp.73 ]) (plus:DI (reg:DI 118 [ ivtmp.73 ]) (const_int 64 [0x40]))) 66 {*adddi3} (nil)) (insn 27 103 29 4 (set (reg:V16QI 176 [ vect_perm_even_123 ]) (unspec:V16QI [ (subreg:V16QI (reg:OO 124 [ vect__1.24 ]) 0) (subreg:V16QI (reg:OO 124 [ vect__1.24 ]) 16) (reg:V16QI 300) ] UNSPEC_VPERM)) {altivec_vperm_v16qi_direct} (nil)) This is the example where UNSPEC is used with 256 bit OOmode. > >> [...] >>>> [...] >>>> + // Given insn_info pair I1 and I2, return true if offsets are in order. >>>> + virtual bool should_handle_unordered_insns (rtl_ssa::insn_info *i1, >>>> + rtl_ssa::insn_info *i2) = 0; >>>> + >>> >>> This name seems a bit misleading. The function is used in: >>> >>> @@ -2401,6 +2405,9 @@ pair_fusion_bb_info::try_fuse_pair (bool load_p, unsigned access_size, >>> reversed = true; >>> } >>> >>> + if (!m_pass->should_handle_unordered_insns (i1, i2)) >>> + return false; >>> + >>> rtx cand_mems[2]; >>> rtx reg_ops[2]; >>> rtx pats[2]; >>> >>> and so it acts as a general opt-out. The insns aren't known to be unordered. >>> >>> It looks like the rs6000 override requires the original insns to be >>> in offset order. Could you say why that's necessary? (Both in email >>> and as a comment in the code.) >>> >> >> Yes rs6000 requires the original load insns to be in offset order. >> Some regression tests like vect-outer-4f fails if we do load pair >> fusion with load offsets are not in offset order as this breaks lxvp >> semantics. > > How does it break the semantics though? In principle, the generic code > only fuses if it has "proved" that the loads can happen in either order. > So it shouldn't matter which order the hardware does things in. > > Could you give an example of the kind of situation that you want > to avoid, and why it generates the wrong result? > (insn 31 62 32 2 (set (reg:V16QI 177 [ MEM [(short unsigned int *)vectp.62_36 + 64B] ]) (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ]) (const_int 64 [0x40])) [1 MEM [(short unsigned int *)vectp.62_36 + 64B]+0 S16 A16])) {vsx_movv16qi_64bit} (nil)) (insn 32 31 16 2 (set (reg:V16QI 178 [ MEM [(short unsigned int *)vectp.62_36 + 80B] ]) (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ]) (const_int 80 [0x50])) [1 MEM [(short unsigned int *)vectp.62_36 + 80B]+0 S16 A16])) {vsx_movv16qi_64bit} (nil)) (insn 16 32 21 2 (set (reg:V16QI 159 [ MEM [(short unsigned int *)vectp.62_36 + 16B] ]) (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ]) (const_int 16 [0x10])) [1 MEM [(short unsigned int *)vectp.62_36 + 16B]+0 S16 A16])) {vsx_movv16qi_64bit} (nil)) (insn 21 16 22 2 (set (reg:V16QI 165 [ MEM [(short unsigned int *)vectp.62_36 + 32B] ]) (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ]) (const_int 32 [0x20])) [1 MEM [(short unsigned int *)vectp.62_36 + 32B]+0 S16 A16])) {vsx_movv16qi_64bit} (nil)) (insn 22 21 37 2 (set (reg:V16QI 166 [ MEM [(short unsigned int *)vectp.62_36 + 48B] ]) (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ]) (const_int 48 [0x30])) [1 MEM [(short unsigned int *)vectp.62_36 + 48B]+0 S16 A16])) {vsx_movv16qi_64bit} (nil)) insn 22 and insn 31 is merged in the failure case and breaks the code. > Thanks, > Richard > Thanks & Regards Ajit