From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id B0D00385840F for ; Fri, 31 May 2024 13:54:39 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B0D00385840F Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org B0D00385840F Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.156.1 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1717163682; cv=none; b=Zoel/Hzf2apkEwYtWzawEMcu6u3ml1oahyKa7+7I+Dgu3yNP/BvvPOIYb2pl8FqqldxOooyXDkMa17W0HQINvPtByHW9ICbsX86PANa/DD7DB5/Dgso6F26STK8PV0ClxCPHiWWo5QMEubxqlgSiBqthdm/AcD3XJdywYANfvRg= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1717163682; c=relaxed/simple; bh=o2EkhtaR2+I92YdXtamNsvUSCtGVYjiDFzz8AgyCVKo=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=kS8jE9qnfofK0aa5qJrw6DKmXmzb2az057ADhvERlYUVLsngtX15EPWXfWxuI8IrVQnkAcoVLxqsJ2uEWjmlH/T5/amvLg6hTbRWqXEKoQK9iTUhP8oq6cODm/BxpVagHjrZrj4lMgABViOr8FxZqh1RPP0tc18aohmxKs4IFbw= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0360083.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 44VDlfZ7000632; Fri, 31 May 2024 13:54:35 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=content-transfer-encoding : content-type : date : from : in-reply-to : message-id : mime-version : references : subject : to; s=pp1; bh=149y/TM7Dxwnd3gXqiwb8Wd6fUHZAyKa7Sxg8d3LuZU=; b=NVQXmwW7HnznSPL5q2+in8xUg2UCXbQayu7YAczeFh/dQACFno5yScbcbFYt+lpFdyp+ I46YnuyfDtyy+eaE6s/G6E2o1s4qfgllLyeWGrxbW2lCXbNf6zM0DwVuITTAr1ARBue5 arG0e4e0YIbGGEJY/0StznqUlsOgx6OGqIHV/iwGKLcpHyOFz5fkeOW35rurUXNWz0V7 gSxxRqyBpLZmtdaXur9b3G+OnquLHYaRavxyHKu0DnrHIuB148Ni3/stoBrscE+HUN1v j72xKW4frW/KSxymty8xEDwcVcBoQru3v0vGYcyXlcMUdl8Cdzncv8OoE9qNNy5aKdU7 NA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3yffr480r2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 31 May 2024 13:54:35 +0000 Received: from m0360083.ppops.net (m0360083.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 44VDsYll012164; Fri, 31 May 2024 13:54:34 GMT Received: from ppma22.wdc07v.mail.ibm.com (5c.69.3da9.ip4.static.sl-reverse.com [169.61.105.92]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3yffr480qy-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 31 May 2024 13:54:34 +0000 Received: from pps.filterd (ppma22.wdc07v.mail.ibm.com [127.0.0.1]) by ppma22.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 44VCTVio028965; Fri, 31 May 2024 13:54:33 GMT Received: from smtprelay04.dal12v.mail.ibm.com ([172.16.1.6]) by ppma22.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3ydpb008g4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 31 May 2024 13:54:33 +0000 Received: from smtpav04.dal12v.mail.ibm.com (smtpav04.dal12v.mail.ibm.com [10.241.53.103]) by smtprelay04.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 44VDsUDU20251384 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 31 May 2024 13:54:32 GMT Received: from smtpav04.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5489858056; Fri, 31 May 2024 13:54:30 +0000 (GMT) Received: from smtpav04.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 44EE858052; Fri, 31 May 2024 13:54:27 +0000 (GMT) Received: from [9.43.13.10] (unknown [9.43.13.10]) by smtpav04.dal12v.mail.ibm.com (Postfix) with ESMTP; Fri, 31 May 2024 13:54:26 +0000 (GMT) Message-ID: Date: Fri, 31 May 2024 19:24:25 +0530 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion To: Alex Coplan , "Kewen.Lin" , Segher Boessenkool , Michael Meissner , Peter Bergner , David Edelsohn , gcc-patches , richard.sandiford@arm.com References: <53ba46de-6c01-4c68-bd98-1ba6950a793a@linux.ibm.com> Content-Language: en-US From: Ajit Agarwal In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: 1a9pcSMyOc4rQ4pJl20KPfNU1Pri0viE X-Proofpoint-ORIG-GUID: gnzEgUj3D_TedWjhxjQPEfmtD4IqJSbZ X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1039,Hydra:6.0.650,FMLib:17.12.28.16 definitions=2024-05-31_10,2024-05-30_01,2024-05-17_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 phishscore=0 adultscore=0 suspectscore=0 priorityscore=1501 spamscore=0 malwarescore=0 clxscore=1015 bulkscore=0 impostorscore=0 mlxscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2405010000 definitions=main-2405310103 X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,KAM_MANYTO,KAM_SHORT,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hello Richard: On 31/05/24 3:23 pm, Richard Sandiford wrote: > Ajit Agarwal writes: >> Hello All: >> >> Common infrastructure using generic code for pair mem fusion of different >> targets. >> >> rs6000 target specific specific code implements virtual functions defined >> by generic code. >> >> Code is implemented with pure virtual functions to interface with target >> code. >> >> Target specific code are added in rs6000-mem-fusion.cc and additional virtual >> function implementation required for rs6000 are added in aarch64-ldp-fusion.cc. >> >> Bootstrapped and regtested for aarch64-linux-gnu and powerpc64-linux-gnu. >> >> Thanks & Regards >> Ajit >> >> >> aarch64, rs6000, middle-end: Add implementation for different targets for pair mem fusion >> >> Common infrastructure using generic code for pair mem fusion of different >> targets. >> >> rs6000 target specific specific code implements virtual functions defined >> by generic code. >> >> Code is implemented with pure virtual functions to interface with target >> code. >> >> Target specific code are added in rs6000-mem-fusion.cc and additional virtual >> function implementation required for rs6000 are added in aarch64-ldp-fusion.cc. >> >> 2024-05-31 Ajit Kumar Agarwal >> >> gcc/ChangeLog: >> >> * config/aarch64/aarch64-ldp-fusion.cc: Add target specific >> implementation of additional virtual functions added in pair_fusion >> struct. >> * config/rs6000/rs6000-passes.def: New mem fusion pass >> before pass_early_remat. >> * config/rs6000/rs6000-mem-fusion.cc: Add new pass. >> Add target specific implementation using pure virtual >> functions. >> * config.gcc: Add new object file. >> * config/rs6000/rs6000-protos.h: Add new prototype for mem >> fusion pass. >> * config/rs6000/t-rs6000: Add new rule. >> * rtl-ssa/accesses.h: Moved set_is_live_out_use as public >> from private. >> >> gcc/testsuite/ChangeLog: >> >> * g++.target/powerpc/me-fusion.C: New test. >> * g++.target/powerpc/mem-fusion-1.C: New test. >> * gcc.target/powerpc/mma-builtin-1.c: Modify test. >> --- > > This isn't a complete review, just some initial questions & comments > about selected parts. > >> [...] >> +/* Check whether load can be fusable or not. >> + Return true if dependent use is UNSPEC otherwise false. */ >> +bool >> +rs6000_pair_fusion::fuseable_load_p (insn_info *info) >> +{ >> + rtx_insn *insn = info->rtl (); >> + >> + for (rtx note = REG_NOTES (insn); note; note = XEXP (note, 1)) >> + if (REG_NOTE_KIND (note) == REG_EQUAL >> + || REG_NOTE_KIND (note) == REG_EQUIV) >> + return false; > > It's unusual to punt on an optimisation because of a REG_EQUAL/EQUIV > note. What's the reason for doing this? Are you trying to avoid > fusing pairs before reload that are equivalent to a MEM (i.e. have > a natural spill slot)? I think Alex hit a similar situation. > We have used the above check because of some SPEC benchmarks failing with with MEM pairs having REG_EQUAL/EQUIV notes. By adding the checks the benchmarks passes and also it improves the performance. This checks were added during initial implementation of pair fusion pass. I will investigate further if this check is still required or not. Sorry for the inconvenience caused. >> + >> + for (auto def : info->defs ()) >> + { >> + auto set = dyn_cast (def); >> + if (set && set->has_any_uses ()) >> + { >> + for (auto use : set->all_uses()) > > Nit: has_any_uses isn't necessary: the inner loop will simply do nothing > in that case. Also, we can/should restrict the scan to non-debug uses. > > This can then be: > > for (auto def : info->defs ()) > if (auto set = dyn_cast (def)) > for (auto use : set->nondebug_insn_uses()) > Sure. I will change as above. >> + { >> + if (use->insn ()->is_artificial ()) >> + return false; >> + >> + insn_info *info = use->insn (); >> + >> + if (info >> + && info->rtl () > > This test shouldn't be necessary. > Sure I will remove this check. >> + && info->is_real ()) >> + { >> + rtx_insn *rtl_insn = info->rtl (); >> + rtx set = single_set (rtl_insn); >> + >> + if (set == NULL_RTX) >> + return false; >> + >> + rtx op0 = SET_SRC (set); >> + if (GET_CODE (op0) != UNSPEC) >> + return false; > > What's the motivation for rejecting unspecs? It's unsual to treat > all unspecs as a distinct group. > > Also, using single_set means that the function still lets through > parallels of two sets in which the sources are unspecs. Is that > intentional? > > The reasons behind things like the REG_EQUAL/EQUIV and UNSPEC decisions > need to be described in comments, so that other people coming to this > code later can understand the motivation. The same thing applies to > other decisions in the patch. > Adjacent load pair fusion with 256 bit OOmode is seen and valid with use of load in UNSPEC. Thats why this check is added. >> + } >> + } >> + } >> + } >> + return true; >> +} >> [...] >> diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc >> index 9f897ac04e2..2dbe9f854ef 100644 >> --- a/gcc/pair-fusion.cc >> +++ b/gcc/pair-fusion.cc >> @@ -312,7 +312,7 @@ static int >> encode_lfs (lfs_fields fields) >> { >> int size_log2 = exact_log2 (fields.size); >> - gcc_checking_assert (size_log2 >= 2 && size_log2 <= 4); >> + gcc_checking_assert (size_log2 >= 2 && size_log2 <= 6); >> return ((int)fields.load_p << 3) >> | ((int)fields.fpsimd_p << 2) >> | (size_log2 - 2); > > The point of the assert is to make sure that "size_log2 - 2" > fits within 2 bits, and so does not interfere with the fpsimd_p > and load_p parts of the encoding. If rs6000 needs size_log2 == 6, > the shift amounts should be increased by 1 to compensate. > > If we do bump the shifts by 1, the new range can be: > > gcc_checking_assert (size_log2 >= 2 && size_log2 <= 9); > Sure I will make this change. >> [...] >> + // Given insn_info pair I1 and I2, return true if offsets are in order. >> + virtual bool should_handle_unordered_insns (rtl_ssa::insn_info *i1, >> + rtl_ssa::insn_info *i2) = 0; >> + > > This name seems a bit misleading. The function is used in: > > @@ -2401,6 +2405,9 @@ pair_fusion_bb_info::try_fuse_pair (bool load_p, unsigned access_size, > reversed = true; > } > > + if (!m_pass->should_handle_unordered_insns (i1, i2)) > + return false; > + > rtx cand_mems[2]; > rtx reg_ops[2]; > rtx pats[2]; > > and so it acts as a general opt-out. The insns aren't known to be unordered. > > It looks like the rs6000 override requires the original insns to be > in offset order. Could you say why that's necessary? (Both in email > and as a comment in the code.) > Yes rs6000 requires the original load insns to be in offset order. Some regression tests like vect-outer-4f fails if we do load pair fusion with load offsets are not in offset order as this breaks lxvp semantics. > If we do need a general opt-out like this, it should probably go at the > very start of try_fuse_pair, before even the dump message. (Alternatively, > it could go after the dump message, but then the "return false" would need > a dump message of its own to explain the failure.) > Sure. > Thanks, > Richard Thanks & Regards Ajit