From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 97003385840E for ; Wed, 14 Feb 2024 21:08:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 97003385840E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 97003385840E Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.156.1 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1707944912; cv=none; b=lfI98rNhFijU9g6DJatQMyrKzvMKsj2rUhTJRFVSOTj8mOLl1TmD13ZGTvG1Yzzb6RwoVifH73n4hgt4R3WIPWIcf1AygsRHbOrDZKJuZdcMtyQBa7x3Ki14EM0h86ha2OW5U6SLNUqZMD7G54n1UG5V5sPGOmFN2s7V58nAByU= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1707944912; c=relaxed/simple; bh=Sy4q3guubDFEWFFdTW72p1ddQx84Uj7L1KXFyj2PRYg=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=Yindbj2PbP/2m+vpdCs2iurEgjzHRqpq4CWqznrZjP0m+Tru6V3i6N0TnitVZDMOFTW9NT8P1ttWjalpOPf3VZ4A9P0ovth1gquYQBGXvLwKB03xukTvgFK4Bv8m6OHDh/lNGmHyIBzvcCR7qD/RsX88SxpJfcb4FLB3AozxX/A= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0353727.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 41EKqQVL029793; Wed, 14 Feb 2024 21:08:26 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=message-id : date : mime-version : subject : to : references : from : in-reply-to : content-type : content-transfer-encoding; s=pp1; bh=5zwXL9RDOcL4xQ9zlKsc3AIs1QLWjU4mVq65Xjh6esg=; b=QU3et6cuaKdXHQF34X9kusdJXHKQArWicUAqbwdalexm6JQwB2DzO4OtHPVLZoYLIaEw /jCs1n2gqav0RKV2tTeMdMSB0st3SdmGNG2I7mnnHDKoQo5D7lDwEeYQlO1zXMvlTiL0 XqGOwXEQwxyq05xzovmIlseKa6CIXPEfqqrxJUDCv/Uug+Med1bhcyxWdcF8kiRZtCbT NYk3c4znW2UmCENHuz1gX1AheOn1mClgLo4ROw6f/SBnmsXsBSR+hJdoV7n0XdefEiH/ QrmYaZtk+LT6eVvY8V/uiTWzGzEfum7DGxdganLgjki8z7ecA+aFWlTX7dUqAHiDXwIr Ew== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3w94x8g9sr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 14 Feb 2024 21:08:25 +0000 Received: from m0353727.ppops.net (m0353727.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 41EKqQED029773; Wed, 14 Feb 2024 21:08:24 GMT Received: from ppma11.dal12v.mail.ibm.com (db.9e.1632.ip4.static.sl-reverse.com [50.22.158.219]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3w94x8g9sa-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 14 Feb 2024 21:08:24 +0000 Received: from pps.filterd (ppma11.dal12v.mail.ibm.com [127.0.0.1]) by ppma11.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 41EKKXuQ009935; Wed, 14 Feb 2024 21:08:24 GMT Received: from smtprelay07.wdc07v.mail.ibm.com ([172.16.1.74]) by ppma11.dal12v.mail.ibm.com (PPS) with ESMTPS id 3w6p63023a-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 14 Feb 2024 21:08:24 +0000 Received: from smtpav03.wdc07v.mail.ibm.com (smtpav03.wdc07v.mail.ibm.com [10.39.53.230]) by smtprelay07.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 41EL8K7k26280480 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 14 Feb 2024 21:08:22 GMT Received: from smtpav03.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8D20758064; Wed, 14 Feb 2024 21:08:20 +0000 (GMT) Received: from smtpav03.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0E79D5805C; Wed, 14 Feb 2024 21:08:17 +0000 (GMT) Received: from [9.43.109.182] (unknown [9.43.109.182]) by smtpav03.wdc07v.mail.ibm.com (Postfix) with ESMTP; Wed, 14 Feb 2024 21:08:16 +0000 (GMT) Message-ID: Date: Thu, 15 Feb 2024 02:38:15 +0530 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH V1] Common infrastructure for load-store fusion for aarch64 and rs6000 target Content-Language: en-US To: Alex Coplan , "Kewen.Lin" , Peter Bergner , Segher Boessenkool , David Edelsohn , Michael Meissner , gcc-patches , richard.sandiford@arm.com References: <39a67573-e2b6-475d-add4-6388c3bc82a4@linux.ibm.com> <8ec8d813-2a4e-4d4e-9b5e-11bf1f08193a@linux.ibm.com> <72f3af6a-218a-4bce-9006-44990dec6bdf@linux.ibm.com> From: Ajit Agarwal In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: 9o4hd1lovzIgU857frTVRb3L4TJuKij3 X-Proofpoint-ORIG-GUID: vxmKWBz76qHXbLN9SW69OCmk9yqEFM2W X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.1011,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-02-14_14,2024-02-14_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 phishscore=0 mlxscore=0 bulkscore=0 priorityscore=1501 impostorscore=0 malwarescore=0 mlxlogscore=999 lowpriorityscore=0 suspectscore=0 spamscore=0 clxscore=1015 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2311290000 definitions=main-2402140163 X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,KAM_MANYTO,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hello Richard: On 15/02/24 1:14 am, Richard Sandiford wrote: > Ajit Agarwal writes: >> On 14/02/24 10:56 pm, Richard Sandiford wrote: >>> Ajit Agarwal writes: >>>>>> diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc >>>>>> index 88ee0dd67fc..a8d0ee7c4db 100644 >>>>>> --- a/gcc/df-problems.cc >>>>>> +++ b/gcc/df-problems.cc >>>>>> @@ -3360,7 +3360,7 @@ df_set_unused_notes_for_mw (rtx_insn *insn, struct df_mw_hardreg *mws, >>>>>> if (df_whole_mw_reg_unused_p (mws, live, artificial_uses)) >>>>>> { >>>>>> unsigned int regno = mws->start_regno; >>>>>> - df_set_note (REG_UNUSED, insn, mws->mw_reg); >>>>>> + //df_set_note (REG_UNUSED, insn, mws->mw_reg); >>>>>> dead_debug_insert_temp (debug, regno, insn, DEBUG_TEMP_AFTER_WITH_REG); >>>>>> >>>>>> if (REG_DEAD_DEBUGGING) >>>>>> @@ -3375,7 +3375,7 @@ df_set_unused_notes_for_mw (rtx_insn *insn, struct df_mw_hardreg *mws, >>>>>> if (!bitmap_bit_p (live, r) >>>>>> && !bitmap_bit_p (artificial_uses, r)) >>>>>> { >>>>>> - df_set_note (REG_UNUSED, insn, regno_reg_rtx[r]); >>>>>> + // df_set_note (REG_UNUSED, insn, regno_reg_rtx[r]); >>>>>> dead_debug_insert_temp (debug, r, insn, DEBUG_TEMP_AFTER_WITH_REG); >>>>>> if (REG_DEAD_DEBUGGING) >>>>>> df_print_note ("adding 2: ", insn, REG_NOTES (insn)); >>>>>> @@ -3493,9 +3493,9 @@ df_create_unused_note (rtx_insn *insn, df_ref def, >>>>>> || bitmap_bit_p (artificial_uses, dregno) >>>>>> || df_ignore_stack_reg (dregno))) >>>>>> { >>>>>> - rtx reg = (DF_REF_LOC (def)) >>>>>> - ? *DF_REF_REAL_LOC (def): DF_REF_REG (def); >>>>>> - df_set_note (REG_UNUSED, insn, reg); >>>>>> + //rtx reg = (DF_REF_LOC (def)) >>>>>> + // ? *DF_REF_REAL_LOC (def): DF_REF_REG (def); >>>>>> + //df_set_note (REG_UNUSED, insn, reg); >>>>>> dead_debug_insert_temp (debug, dregno, insn, DEBUG_TEMP_AFTER_WITH_REG); >>>>>> if (REG_DEAD_DEBUGGING) >>>>>> df_print_note ("adding 3: ", insn, REG_NOTES (insn)); >>>>> >>>>> I don't think this can be right. The last hunk of the var-tracking.cc >>>>> patch also seems to be reverting a correct change. >>>>> >>>> >>>> We generate sequential registers using (subreg V16QI (reg 00mode) 16) >>>> and (reg OOmode 0) >>>> where OOmode is 256 bit and V16QI is 128 bits in order to generate >>>> sequential register pair. >>> >>> OK. As I mentioned in the other message I just sent, it seems pretty >>> standard to use a 256-bit mode to represent a pair of 128-bit values. >>> In that case: >>> >>> - (reg:OO R) always refers to both registers in the pair, and any assignment >>> to it modifies both registers in the pair >>> >>> - (subreg:V16QI (reg:OO R) 0) refers to the first register only, and can >>> be modified independently of the second register >>> >>> - (subreg:V16QI (reg:OO R) 16) refers to the second register only, and can >>> be modified independently of the first register >>> >>> Is that how you're using it? >>> >> >> This is how I use it. >> (insn 27 21 33 2 (set (reg:OO 157 [ vect__5.11_76 ]) >> >> (insn 21 35 27 2 (set (subreg:V2DI (reg:OO 157 [ vect__5.11_76 ]) 16) >> >> to generate sequential registers. With the above sequential registers >> are generated by RA. >> >> >>> One thing to be wary of is that it isn't possible to assign to two >>> subregs of the same reg in a single instruction (at least AFAIK). >>> So any operation that wants to store to both registers in the pair >>> must store to (reg:OO R) itself, not to the two subregs. >>> >>>> If I keep the above REG_UNUSED notes ira generates >>>> REG_UNUSED and in cprop_harreg pass and dce pass deletes store pairs and >>>> we get incorrect code. >>>> >>>> By commenting REG_UNUSED notes it is not generated and we get the correct store >>>> pair fusion and cprop_hardreg and dce doesn't deletes them. >>>> >>>> Please let me know is there are better ways to address this instead of commenting >>>> above generation of REG_UNUSED notes. >>> >>> Could you quote an example rtl sequence that includes incorrect notes? >>> It might help to understand the problem a bit more. >>> >> >> Here is the rtl code: >> >> (insn 21 35 27 2 (set (subreg:V2DI (reg:OO 157 [ vect__5.11_76 ]) 16) >> (plus:V2DI (reg:V2DI 153 [ vect__4.10_72 ]) >> (reg:V2DI 154 [ _63 ]))) "/home/aagarwa/gcc-sources-fusion/gcc/testsuite/gcc.c-torture/execute/20030928-1.c":11:18 1706 {addv2di3} >> (expr_list:REG_DEAD (reg:V2DI 154 [ _63 ]) >> (expr_list:REG_DEAD (reg:V2DI 153 [ vect__4.10_72 ]) >> (expr_list:REG_UNUSED (reg:OO 157 [ vect__5.11_76 ]) >> (nil))))) >> (insn 27 21 33 2 (set (reg:OO 157 [ vect__5.11_76 ]) >> (plus:V2DI (reg:V2DI 158 [ vect__4.10_73 ]) >> (reg:V2DI 159 [ _60 ]))) "/home/aagarwa/gcc-sources-fusion/gcc/testsuite/gcc.c-torture/execute/20030928-1.c":11:18 1706 {addv2di3} >> (expr_list:REG_DEAD (reg:V2DI 159 [ _60 ]) >> (expr_list:REG_DEAD (reg:V2DI 158 [ vect__4.10_73 ]) >> (nil)))) >> (insn 33 27 39 2 (set (subreg:V2DI (reg:OO 167 [ vect__5.11_78 ]) 16) >> (plus:V2DI (reg:V2DI 163 [ vect__4.10_74 ]) >> (reg:V2DI 164 [ _57 ]))) "/home/aagarwa/gcc-sources-fusion/gcc/testsuite/gcc.c-torture/execute/20030928-1.c":11:18 1706 {addv2di3} >> (expr_list:REG_DEAD (reg:V2DI 164 [ _57 ]) >> (expr_list:REG_DEAD (reg:V2DI 163 [ vect__4.10_74 ]) >> (expr_list:REG_UNUSED (reg:OO 167 [ vect__5.11_78 ]) >> (nil))))) >> (insn 39 33 22 2 (set (reg:OO 167 [ vect__5.11_78 ]) >> (plus:V2DI (reg:V2DI 168 [ vect__4.10_75 ]) >> (reg:V2DI 169 [ _54 ]))) "/home/aagarwa/gcc-sources-fusion/gcc/testsuite/gcc.c-torture/execute/20030928-1.c":11:18 1706 {addv2di3} >> (expr_list:REG_DEAD (reg:V2DI 169 [ _54 ]) >> (expr_list:REG_DEAD (reg:V2DI 168 [ vect__4.10_75 ]) >> (nil)))) > > Thanks. Here, insns 27 and 39 are not well-formed RTL: the destination > of a SET must have the same mode as the source of the SET. (Except for > the special case of a const_int source, but even there, the source > *conceptually* has the same mode as the destination, even though the > mode isn't directly represented.) > > If the intention is for insn 27 to store the result of the plus:V2DI to > the first register in (reg:OO 157), while preserving the second register, > the RTL should instead be: > > (insn 27 21 33 2 (set (subreg:V2DI (reg:OO 157 [ vect__5.11_76 ]) 0) > (plus:V2DI (reg:V2DI 158 [ vect__4.10_73 ]) > (reg:V2DI 159 [ _60 ]))) "/home/aagarwa/gcc-sources-fusion/gcc/testsuite/gcc.c-torture/execute/20030928-1.c":11:18 1706 {addv2di3} > > (with the only change being to the first line). > > I think the other problems stem from that. Any assignment to (reg R) > (as opposed to a subreg of R) changes every bit of R. Since the original > form of insn 27 stores to all of (reg:OO 157), it makes the previous > partial definition in insn 21 dead. > > [We ought to have an RTL linter to catch this kind of thing, a bit like > tree-dfa and tree-cfg do for gimple. But unfortunately it'd take quite > a bit of time to implement, and would likely slow down checking builds > of the compiler quite a bit.] > I fixed the above issues and the code worked fine. Thanks a lot for your help. Now all the changes to df-problems.cc/emit-rtl.cc/dce.cc are not required and I will send the modified patch. i will send separate patch for rs6000 target. For aarch64 target I will send different target independent changes and target dependent changes as separate patch for review. Thanks again. Thanks so much for your help. Thanks & Regards Ajit > Richard > >> (insn 22 39 34 2 (set (mem:OO (reg/v/f:DI 143 [ x ]) [2 MEM [(const char * *)x_43(D)]+0 S16 A64]) >> (reg:OO 157 [ vect__5.11_76 ])) "/home/aagarwa/gcc-sources-fusion/gcc/testsuite/gcc.c-torture/execute/20030928-1.c":11:8 2186 {*movoo} >> (expr_list:REG_DEAD (reg:OO 157 [ vect__5.11_76 ]) >> (nil))) >> (insn 34 22 45 2 (set (mem:OO (plus:DI (reg/v/f:DI 143 [ x ]) >> (const_int 32 [0x20])) [2 MEM [(const char * *)x_43(D) + 32B]+0 S16 A64]) >> (reg:OO 167 [ vect__5.11_78 ])) "/home/aagarwa/gcc-sources-fusion/gcc/testsuite/gcc.c-torture/execute/20030928-1.c":11:8 2186 {*movoo} >> (expr_list:REG_DEAD (reg:OO 167 [ vect__5.11_78 ]) >> (expr_list:REG_DEAD (reg/v/f:DI 143 [ x ]) >> (nil)))) >> >> The first instruction in the above rtl code has REG_UNUSED and removed by cprop_hardreg. >> This problem occurs in store fusion and I dont see any issues with load fusion as >> REG_UNUSED are not generated in load fusion. >> >> Thanks & Regards >> Ajit >> >>> Thanks, >>> Richard