From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <595cd863-3ba5-492b-84d1-ab470411507e@linux.ibm.com>
Date: Fri, 12 Apr 2024 20:02:41 +0530
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file
To: Alex Coplan
Cc: Richard Sandiford, "Kewen.Lin", Segher Boessenkool, Michael Meissner, David Edelsohn, Peter Bergner, gcc-patches
References: <54ab422a-4ffa-4ca9-8029-eb23c42ed653@linux.ibm.com>
Content-Language: en-US
From: Ajit Agarwal
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

Hello Alex:

On 11/04/24 7:55 pm, Alex Coplan wrote:
> On 10/04/2024 23:48, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 10/04/24 7:52 pm, Alex Coplan wrote:
>>> Hi Ajit,
>>>
>>> On 10/04/2024 15:31, Ajit Agarwal wrote:
>>>> Hello Alex:
>>>>
>>>> On 10/04/24 1:42 pm, Alex Coplan wrote:
>>>>> Hi Ajit,
>>>>>
>>>>> On 09/04/2024 20:59, Ajit Agarwal wrote:
>>>>>> Hello Alex:
>>>>>>
>>>>>> On 09/04/24 8:39 pm, Alex Coplan wrote:
>>>>>>> On 09/04/2024 20:01, Ajit Agarwal wrote:
>>>>>>>> Hello Alex:
>>>>>>>>
>>>>>>>> On 09/04/24 7:29 pm, Alex Coplan wrote:
>>>>>>>>> On 09/04/2024 17:30, Ajit Agarwal wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 05/04/24 10:03 pm, Alex Coplan wrote:
>>>>>>>>>>> On 05/04/2024 13:53, Ajit Agarwal wrote:
>>>>>>>>>>>> Hello Alex/Richard:
>>>>>>>>>>>>
>>>>>>>>>>>> All review comments are incorporated.
>>>
>>>>>>>>>>>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t &left_list,
>>>>>>>>>>>>  // of accesses.  If we find two sets of adjacent accesses, call
>>>>>>>>>>>>  // merge_pairs.
>>>>>>>>>>>>  void
>>>>>>>>>>>> -ldp_bb_info::transform_for_base (int encoded_lfs,
>>>>>>>>>>>> -                                access_group &group)
>>>>>>>>>>>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
>>>>>>>>>>>> +                                         access_group &group)
>>>>>>>>>>>>  {
>>>>>>>>>>>>    const auto lfs = decode_lfs (encoded_lfs);
>>>>>>>>>>>>    const unsigned access_size = lfs.size;
>>>>>>>>>>>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs,
>>>>>>>>>>>>                                   access.cand_insns,
>>>>>>>>>>>>                                   lfs.load_p,
>>>>>>>>>>>>                                   access_size);
>>>>>>>>>>>> -         skip_next = access.cand_insns.empty ();
>>>>>>>>>>>> +         skip_next = bb_state->cand_insns_empty_p (access.cand_insns);
>>>>>>>>>>>
>>>>>>>>>>> As above, why is this needed?
>>>>>>>>>>
>>>>>>>>>> For rs6000 we want to always return true, as load/store pairs
>>>>>>>>>> that are to be merged with 8/16, 16/32, 32/64 occur for rs6000,
>>>>>>>>>> and we want load/store pairs at 8/16 and 32/64.  That's why we
>>>>>>>>>> want to always return true for rs6000, to skip pairs as above.
>>>>>>>>>
>>>>>>>>> Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000 you have
>>>>>>>>> load/store pair instructions where the two arms of the access are storing
>>>>>>>>> operands of different sizes?  Or something else?
>>>>>>>>>
>>>>>>>>> As it stands the logic is to skip the next iteration only if we
>>>>>>>>> exhausted all the candidate insns for the current access.
>>>>>>>>> In the case
>>>>>>>>> that we didn't exhaust all such candidates, then the idea is that when
>>>>>>>>> access becomes prev_access, we can attempt to use those candidates as
>>>>>>>>> the "left-hand side" of a pair in the next iteration since we failed to
>>>>>>>>> use them as the "right-hand side" of a pair in the current iteration.
>>>>>>>>> I don't see why you wouldn't want that behaviour.  Please can you
>>>>>>>>> explain?
>>>>>>>>>
>>>>>>>>
>>>>>>>> In merge_pairs we get the two load candidates: one load from offset 0 and
>>>>>>>> the other load from offset 16.  Then in the next iteration we get a load
>>>>>>>> from offset 16 and the other load from offset 32.  In the next iteration
>>>>>>>> we get a load from offset 32 and the other load from offset 48.
>>>>>>>>
>>>>>>>> For example:
>>>>>>>>
>>>>>>>> Currently we get the load candidates as follows.
>>>>>>>>
>>>>>>>> pairs:
>>>>>>>>
>>>>>>>> load from offset 0.
>>>>>>>> load from offset 16.
>>>>>>>>
>>>>>>>> next pairs:
>>>>>>>>
>>>>>>>> load from offset 16.
>>>>>>>> load from offset 32.
>>>>>>>>
>>>>>>>> next pairs:
>>>>>>>>
>>>>>>>> load from offset 32.
>>>>>>>> load from offset 48.
>>>>>>>>
>>>>>>>> Instead, for rs6000 we should get:
>>>>>>>>
>>>>>>>> pairs:
>>>>>>>>
>>>>>>>> load from offset 0.
>>>>>>>> load from offset 16.
>>>>>>>>
>>>>>>>> next pairs:
>>>>>>>>
>>>>>>>> load from offset 32.
>>>>>>>> load from offset 48.
>>>>>>>
>>>>>>> Hmm, so then I guess my question is: why wouldn't you consider merging
>>>>>>> the pair with offsets (16,32) for rs6000?  Is it because you have a
>>>>>>> stricter alignment requirement on the base pair offsets (i.e. they have
>>>>>>> to be a multiple of 32 when the operand size is 16)?  So the pair
>>>>>>> offsets have to be a multiple of the entire pair size rather than a
>>>>>>> single operand size?
>>>>>>
>>>>>> We get a load pair at a certain point with (0,16), and at another program
>>>>>> point we get the load pair (32, 48).
>>>>>>
>>>>>> The current implementation takes the load offsets as (0, 16),
>>>>>> (16, 32), (32, 48).
>>>>>>
>>>>>> But in rs6000 we want the load pairs to be merged at different points,
>>>>>> as (0,16) and (32, 48): for (0,16) we want to replace the loads with an
>>>>>> lxvp at offset 0, and the other loads (32, 48) with an lxvp at offset 32.
>>>>>>
>>>>>> In the current case it will merge into an lxvp at offset 0 and an lxvp
>>>>>> at offset 16, then an lxvp at offset 32 and an lxvp at offset 48, which
>>>>>> is incorrect in our case: in the (16,32) case the load at offset 16 will
>>>>>> not load from an even register, and that breaks for rs6000.
>>>>>
>>>>> Sorry, I think I'm still missing something here.  Why does the address offset
>>>>> affect the parity of the transfer register?  ISTM they needn't be related at
>>>>> all (and indeed we can't even know the parity of the transfer register before
>>>>> RA, but perhaps you're only intending on running the pass after RA?)
>>>>>
>>>>
>>>> We have a load pair with (0,16) wherein these loads are adjacent and
>>>> replaced with lxvp.
>>>>
>>>> The semantics of the lxvp instruction are that it loads the adjacent
>>>> load pair into an even register and even_register + 1.
>>>>
>>>> We replace the above load pair with the lxvp instruction, and then we
>>>> don't need to merge (16,32) as (0, 16) is already merged; instead
>>>> we merge (32,48).
>>>
>>> Ok, but the existing logic should already account for this.  I.e. if we
>>> successfully merge (0,16), then we don't attempt to merge (16,32).  We'd only
>>> attempt to merge (16,32) if the merge of (0,16) failed (for whatever reason).
>>> So I don't see that there's anything specific to lxvp that requires this logic
>>> to change, _unless_ you have a stricter alignment requirement on the offsets as
>>> I mentioned before.
>>>
>>
>> Thanks for the suggestion.  It worked for rs6000 also with the current changes.
>> Sorry for the confusion.
>
> Alright, glad we got to the bottom of this!  Thanks.
>
>>
>>>> Yes, you are correct: the address offset doesn't affect the parity of
>>>> the register transfer as we are doing the fusion pass before RA.
>>>>
>>>>>>> If that is the case then I think it would be better to introduce a
>>>>>>> virtual function (say pair_offset_alignment_ok_p) that vets the base
>>>>>>> offset of the pair (prev_access->offset in transform_for_base).  I guess
>>>>>>> it would also take access_size as a parameter and for aarch64 it should
>>>>>>> check:
>>>>>>>
>>>>>>> multiple_p (offset, access_size)
>>>>>>>
>>>>>>> and for rs6000 it could check:
>>>>>>>
>>>>>>> multiple_p (offset, access_size * 2)
>>>>>>>
>>>>>>> and we would then incorporate a call to that predicate in the else if
>>>>>>> condition of transform_for_base.
>>>>>>>
>>>>>>> It would have the same limitation whereby we assume that MEM_EXPR offset
>>>>>>> alignment is a good proxy for RTL offset alignment, but we already
>>>>>>> depend on that with the multiple_p check in track_via_mem_expr.
>>>>>>>
>>>> I have addressed the above hooks and it worked fine with both rs6000
>>>> and aarch64.  I am sending a subsequent patch in some time that addresses
>>>> the above.
>>>>
>>>>> How do you plan on handling this even-odd requirement for rs6000?
>>>>>
>>>>
>>>> We plan to handle it with V16QI subreg:0 and V16QI subreg:16 to
>>>> generate the register pair, and that's what we generate and implement
>>>> in the rs6000 target code.
>>>
>>> Ah, this is coming back to me now.  Sorry, I should have remembered this from
>>> the previous discussion with Richard S.
>>>
>>> Apologies for going on a slight tangent here, but if you're running
>>> before RA are you planning to create a new OImode pseudo register for
>>> the lxvp insn and then somehow update uses of the old transfer registers
>>> to replace them with subregs of that OImode pseudo?
>>
>> Yes, I do the same as you have mentioned.  We generate register pairs
>> with a 256-bit mode with two subregs of 128-bit modes at offsets 0 and
>> 16.
>>
>>> Or do you just plan
>>> on replacing the individual loads with moves (instead of deleting them)?
>>> I guess the latter would be simpler and might work better in the
>>> presence of hard regs.
>>>
>>
>> Would you mind explaining how to generate register pairs with lxvp by
>> replacing the loads with moves?
>
> Yeah, so suppose you have something like:
>
> (set (reg:V16QI v1) (mem:V16QI addr))
> (set (reg:V16QI v2) (mem:V16QI addr+16))
>
> then when you insert the lxvp you can then (logically) replace the
> original load instructions with moves from the appropriate subreg, as
> follows:
>
> (set (reg:OI pair-pseudo) (mem:OI addr)) ; lxvp
> (set (reg:V16QI v1) (subreg:V16QI (reg:OI pair-pseudo) 0))
> (set (reg:V16QI v2) (subreg:V16QI (reg:OI pair-pseudo) 16))
>

Any pseudo created with gen_rtx_REG, like gen_rtx_REG (OOmode, REGNO (dest_exp)),
errors out with an unrecognizable insn in LRA.

If I create the pseudo with gen_reg_rtx (OOmode), it errors out because the
new pseudo register is not found in change->new_defs.  Also, sequential
register pairs are not generated by the register allocator.

That's why I haven't used the above method, though I also thought it through.
Please let me know what you think.

Thanks & Regards
Ajit

> now I'm not sure off-hand if this exact combination of subregs and mode
> changes is valid, but hopefully you get the idea.  The benefit of this
> approach is that it keeps the transformation local and is more
> compile-time efficient (we don't have to look through all mentions of
> v1/v2 and replace them with subregs).  I'd hope that the RA can then
> clean up any redundant moves (especially in the case that v1/v2 are
> pseudos).
>
> That would mean that you don't need all the grubbing around with DF
> looking for occurrences of the transfer regs.  Instead we'd just need
> some logic to insert the extra moves after the lxvp for rs6000, and I
> think this might fit more cleanly into the current pass.
>
> Does that make sense?
> In any case, this shouldn't really affect the
> preparatory aarch64 patch because we were planning to defer adding any
> hooks that are only needed for rs6000 from the initial aarch64/generic
> split.
>
> Thanks,
> Alex
>
>>
>> Thanks & Regards
>> Ajit
>>
>>> Thanks,
>>> Alex
>>>
>>>>
>>>> Thanks & Regards
>>>> Ajit
>>>>> Thanks,
>>>>> Alex
>>>>>
>>>>>>
>>>>>> lxvp should load from even registers, and then the loaded value will
>>>>>> be in the even register and even register + 1 (which is odd).
>>>>>>
>>>>>> Thanks & Regards
>>>>>> Ajit
>>>>>>> If that is the case then I think it would be better to introduce a
>>>>>>> virtual function (say pair_offset_alignment_ok_p) that vets the base
>>>>>>> offset of the pair (prev_access->offset in transform_for_base).  I guess
>>>>>>> it would also take access_size as a parameter and for aarch64 it should
>>>>>>> check:
>>>>>>>
>>>>>>> multiple_p (offset, access_size)
>>>>>>>
>>>>>>> and for rs6000 it could check:
>>>>>>>
>>>>>>> multiple_p (offset, access_size * 2)
>>>>>>>
>>>>>>> and we would then incorporate a call to that predicate in the else if
>>>>>>> condition of transform_for_base.
>>>>>>>
>>>>>>> It would have the same limitation whereby we assume that MEM_EXPR offset
>>>>>>> alignment is a good proxy for RTL offset alignment, but we already
>>>>>>> depend on that with the multiple_p check in track_via_mem_expr.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Alex
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks & Regards
>>>>>>>> Ajit
>>>>>>>>> Thanks,
>>>>>>>>> Alex
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>      }
>>>>>>>>>>>>    prev_access = &access;
>>>>>>>>>>>>  }
>>>>>>>>>>>> @@ -2919,7 +3047,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs,
>>>>>>>>>>>>  // and remove all the tombstone insns, being sure to reparent any uses
>>>>>>>>>>>>  // of mem to previous defs when we do this.
>>>>>>>>>>>>  void
>>>>>>>>>>>> -ldp_bb_info::cleanup_tombstones ()
>>>>>>>>>>>> +pair_fusion_bb_info::cleanup_tombstones ()
>>>>>>>>>>>>  {
>>>>>>>>>>>>    // No need to do anything if we didn't emit a tombstone insn for this BB.
>>>>>>>>>>>>    if (!m_emitted_tombstone)
>>>>>>>>>>>> @@ -2947,7 +3075,7 @@ ldp_bb_info::cleanup_tombstones ()
>>>>>>>>>>>>
>>>>>>>>>>>>  template
>>>>>>>>>>>>  void
>>>>>>>>>>>> -ldp_bb_info::traverse_base_map (Map &map)
>>>>>>>>>>>> +pair_fusion_bb_info::traverse_base_map (Map &map)
>>>>>>>>>>>>  {
>>>>>>>>>>>>    for (auto kv : map)
>>>>>>>>>>>>      {
>>>>>>>>>>>> @@ -2958,7 +3086,7 @@ ldp_bb_info::traverse_base_map (Map &map)
>>>>>>>>>>>>  }
>>>>>>>>>>>>
>>>>>>>>>>>>  void
>>>>>>>>>>>> -ldp_bb_info::transform ()
>>>>>>>>>>>> +pair_fusion_bb_info::transform ()
>>>>>>>>>>>>  {
>>>>>>>>>>>>    traverse_base_map (expr_map);
>>>>>>>>>>>>    traverse_base_map (def_map);
>>>>>>>>>>>> @@ -3167,14 +3295,13 @@ try_promote_writeback (insn_info *insn)
>>>>>>>>>>>>  // for load/store candidates.  If running after RA, also try and promote
>>>>>>>>>>>>  // non-writeback pairs to use writeback addressing.  Then try to fuse
>>>>>>>>>>>>  // candidates into pairs.
>>>>>>>>>>>> -void ldp_fusion_bb (bb_info *bb)
>>>>>>>>>>>> +void pair_fusion::ldp_fusion_bb (bb_info *bb)
>>>>>>>>>>>>  {
>>>>>>>>>>>> -  const bool track_loads
>>>>>>>>>>>> -    = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
>>>>>>>>>>>> -  const bool track_stores
>>>>>>>>>>>> -    = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
>>>>>>>>>>>> +  const bool track_loads = track_load_p ();
>>>>>>>>>>>> +  const bool track_stores = track_store_p ();
>>>>>>>>>>>>
>>>>>>>>>>>> -  ldp_bb_info bb_state (bb);
>>>>>>>>>>>
>>>>>>>>>>> This:
>>>>>>>>>>>
>>>>>>>>>>>> +  aarch64_pair_fusion derived;
>>>>>>>>>>>
>>>>>>>>>>> can be deleted and then:
>>>>>>>>>>>
>>>>>>>>>>>> +  pair_fusion_bb_info bb_info (bb, &derived);
>>>>>>>>>>>
>>>>>>>>>>> can just be:
>>>>>>>>>>>
>>>>>>>>>>>   pair_fusion_bb_info bb_info (bb, this);
>>>>>>>>>>>
>>>>>>>>>>> (or you can pass *this if you make bb_info take a reference).
>>>>>>>>>>>
>>>>>>>>>>> I don't think there's a particular need to change the variable name
>>>>>>>>>>> (bb_state -> bb_info).  I chose the former because it doesn't clash
>>>>>>>>>>> with the RTL-SSA structure of the same name as the latter.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Addressed.
>>>>>>>>>>>>
>>>>>>>>>>>>    for (auto insn : bb->nondebug_insns ())
>>>>>>>>>>>>      {
>>>>>>>>>>>> @@ -3184,31 +3311,31 @@ void ldp_fusion_bb (bb_info *bb)
>>>>>>>>>>>>        continue;
>>>>>>>>>>>>
>>>>>>>>>>>>        rtx pat = PATTERN (rti);
>>>>>>>>>>>> -      if (reload_completed
>>>>>>>>>>>> -          && aarch64_ldp_writeback > 1
>>>>>>>>>>>> -          && GET_CODE (pat) == PARALLEL
>>>>>>>>>>>> -          && XVECLEN (pat, 0) == 2)
>>>>>>>>>>>> +      if (pair_mem_promote_writeback_p (pat))
>>>>>>>>>>>>        try_promote_writeback (insn);
>>>>>>>>>>>
>>>>>>>>>>> It looks like try_promote_writeback itself will need some further work
>>>>>>>>>>> to make it target-independent.
>>>>>>>>>>> I suppose this check:
>>>>>>>>>>>
>>>>>>>>>>>   auto rti = insn->rtl ();
>>>>>>>>>>>   const auto attr = get_attr_ldpstp (rti);
>>>>>>>>>>>   if (attr == LDPSTP_NONE)
>>>>>>>>>>>     return;
>>>>>>>>>>>
>>>>>>>>>>>   bool load_p = (attr == LDPSTP_LDP);
>>>>>>>>>>>   gcc_checking_assert (load_p || attr == LDPSTP_STP);
>>>>>>>>>>>
>>>>>>>>>>> will need to become part of the pair_mem_promote_writeback_p hook that you
>>>>>>>>>>> added, potentially changing it to return a boolean for load_p.
>>>>>>>>>>>
>>>>>>>>>>> Then I guess we will need hooks for destructuring the pair insn and
>>>>>>>>>>> another hook to wrap aarch64_gen_writeback_pair.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Addressed.
>>>>>>>>>>>>
>>>>>>>>>>>>        if (GET_CODE (pat) != SET)
>>>>>>>>>>>>        continue;
>>>>>>>>>>>>
>>>>>>>>>>>>        if (track_stores && MEM_P (XEXP (pat, 0)))
>>>>>>>>>>>> -      bb_state.track_access (insn, false, XEXP (pat, 0));
>>>>>>>>>>>> +      bb_info.track_access (insn, false, XEXP (pat, 0));
>>>>>>>>>>>>        else if (track_loads && MEM_P (XEXP (pat, 1)))
>>>>>>>>>>>> -      bb_state.track_access (insn, true, XEXP (pat, 1));
>>>>>>>>>>>> +      bb_info.track_access (insn, true, XEXP (pat, 1));
>>>>>>>>>>>>      }
>>>>>>>>>>>>
>>>>>>>>>>>> -  bb_state.transform ();
>>>>>>>>>>>> -  bb_state.cleanup_tombstones ();
>>>>>>>>>>>> +  bb_info.transform ();
>>>>>>>>>>>> +  bb_info.cleanup_tombstones ();
>>>>>>>>>>>>  }
>>>>>>>>>>>>
>>>>>>>>>>>>  void ldp_fusion ()
>>>>>>>>>>>>  {
>>>>>>>>>>>>    ldp_fusion_init ();
>>>>>>>>>>>> +  pair_fusion *pfuse;
>>>>>>>>>>>> +  aarch64_pair_fusion derived;
>>>>>>>>>>>> +  pfuse = &derived;
>>>>>>>>>>>
>>>>>>>>>>> This is indeed the one place where I think it is acceptable to
>>>>>>>>>>> instantiate aarch64_pair_fusion.  But again there's no need to create a
>>>>>>>>>>> pointer to the parent class, just call any function you like directly.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Addressed.
>>>>>>>>>>>>
>>>>>>>>>>>>    for (auto bb : crtl->ssa->bbs ())
>>>>>>>>>>>> -    ldp_fusion_bb (bb);
>>>>>>>>>>>> +    pfuse->ldp_fusion_bb (bb);
>>>>>>>>>>>
>>>>>>>>>>> I think even the code to iterate over bbs should itself be a member
>>>>>>>>>>> function of pair_fusion (say "run") and then that becomes part of the
>>>>>>>>>>> generic code.
>>>>>>>>>>>
>>>>>>>>>>> So this function would just become:
>>>>>>>>>>>
>>>>>>>>>>>   aarch64_pair_fusion pass;
>>>>>>>>>>>   pass.run ();
>>>>>>>>>>>
>>>>>>>>>>> and could be inlined into the caller.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Addressed.
>>>>>>>>>>> Perhaps you could also add an early return like:
>>>>>>>>>>>
>>>>>>>>>>>   if (!track_loads_p () && !track_stores_p ())
>>>>>>>>>>>     return;
>>>>>>>>>>>
>>>>>>>>>>> in pair_fusion::run () and then remove the corresponding code from
>>>>>>>>>>> pass_ldp_fusion::gate?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Addressed.
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    ldp_fusion_destroy ();
>>>>>>>>>>>>  }
>>>>>>>>>>>> --
>>>>>>>>>>>> 2.39.3
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Alex
>>>>>>>>>>
>>>>>>>>>> Thanks & Regards
>>>>>>>>>> Ajit