From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 1079C3855026; Tue, 17 Aug 2021 05:24:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1079C3855026 Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 17H52jQL164625; Tue, 17 Aug 2021 01:24:34 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com with ESMTP id 3aeua0mqe7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 17 Aug 2021 01:24:34 -0400 Received: from m0098419.ppops.net (m0098419.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 17H52w6v165707; Tue, 17 Aug 2021 01:24:34 -0400 Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0b-001b2d01.pphosted.com with ESMTP id 3aeua0mqdt-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 17 Aug 2021 01:24:33 -0400 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 17H5Mv8N008635; Tue, 17 Aug 2021 05:24:32 GMT Received: from b06cxnps4076.portsmouth.uk.ibm.com (d06relay13.portsmouth.uk.ibm.com [9.149.109.198]) by ppma04ams.nl.ibm.com with ESMTP id 3ae5f8c90x-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 17 Aug 2021 05:24:32 +0000 Received: from b06wcsmtp001.portsmouth.uk.ibm.com (b06wcsmtp001.portsmouth.uk.ibm.com [9.149.105.160]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 17H5OTcp56033762 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 17 Aug 2021 05:24:29 GMT Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B552CA4068; Tue, 17 Aug 2021 05:24:29 +0000 (GMT) Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 7AF62A4069; Tue, 17 Aug 2021 05:24:27 +0000 (GMT) Received: from luoxhus-MacBook-Pro.local (unknown [9.197.234.118]) by b06wcsmtp001.portsmouth.uk.ibm.com (Postfix) with ESMTPS; Tue, 17 Aug 2021 05:24:27 +0000 (GMT) Subject: Re: [PATCH] Fix incorrect computation in fill_always_executed_in_1 To: Richard Biener Cc: segher@kernel.crashing.org, wschmidt@linux.ibm.com, linkw@gcc.gnu.org, gcc-patches@gcc.gnu.org, dje.gcc@gmail.com References: <20210816084612.7999-1-luoxhu@linux.ibm.com> From: Xionghu Luo Message-ID: <6f0e3219-0a36-28a6-6313-0b40ac80d366@linux.ibm.com> Date: Tue, 17 Aug 2021 13:24:24 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.0; rv:68.0) Gecko/20100101 Thunderbird/68.12.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: IDrnVU7XfUyiNZ06_TJgcqOpeLBtgXd9 X-Proofpoint-ORIG-GUID: LQA9q25X5x0fR2Nh06smKhimzVaXLHtQ X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-08-17_01:2021-08-16, 2021-08-17 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 bulkscore=0 malwarescore=0 priorityscore=1501 mlxlogscore=999 phishscore=0 spamscore=0 suspectscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000 definitions=main-2108170031 X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT, NICE_REPLY_A, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 Aug 2021 05:24:36 -0000 On 2021/8/17 13:17, Xionghu Luo via Gcc-patches wrote: > Hi, > > On 2021/8/16 19:46, Richard Biener wrote: >> On Mon, 16 Aug 2021, Xiong Hu Luo wrote: >> >>> It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for >>> nested loops.  inn_loop is updated to inner loop, so it need be restored >>> when exiting from innermost loop. With this patch, the store instruction >>> in outer loop could also be moved out of outer loop by store motion. >>> Any comments?  Thanks. >> >>> gcc/ChangeLog: >>> >>>     * tree-ssa-loop-im.c (fill_always_executed_in_1): Restore >>>     inn_loop when exiting from innermost loop. >>> >>> gcc/testsuite/ChangeLog: >>> >>>     * gcc.dg/tree-ssa/ssa-lim-19.c: New test. >>> --- >>>   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++++++++++++++++++++++ >>>   gcc/tree-ssa-loop-im.c                     |  6 +++++- >>>   2 files changed, 29 insertions(+), 1 deletion(-) >>>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c >>> >>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c >>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c >>> new file mode 100644 >>> index 00000000000..097a5ee4a4b >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c >>> @@ -0,0 +1,24 @@ >>> +/* PR/101293 */ >>> +/* { dg-do compile } */ >>> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */ >>> + >>> +struct X { int i; int j; int k;}; >>> + >>> +void foo(struct X *x, int n, int l) >>> +{ >>> +  for (int j = 0; j < l; j++) >>> +    { >>> +      for (int i = 0; i < n; ++i) >>> +    { >>> +      int *p = &x->j; >>> +      int tem = *p; >>> +      x->j += tem * i; >>> +    } >>> +      int *r = &x->k; >>> +      int tem2 = *r; >>> +      x->k += tem2 * j; >>> +    } >>> +} >>> + >>> +/* { dg-final { scan-tree-dump-times "Executing store motion" 2 >>> "lim2" } } */ >>> + >>> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c >>> index b24bc64f2a7..5ca4738b20e 100644 >>> --- a/gcc/tree-ssa-loop-im.c >>> +++ b/gcc/tree-ssa-loop-im.c >>> @@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, >>> sbitmap contains_call) >>>         if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) >>>           last = bb; >>> +      if (inn_loop != loop >>> +          && flow_loop_nested_p (bb->loop_father, inn_loop)) >>> +        inn_loop = bb->loop_father; >>> + >> >> The comment says >> >>                /* In a loop that is always entered we may proceed anyway. >>                   But record that we entered it and stop once we leave >> it. >> */ >>                inn_loop = bb->loop_father; >> >> and your change would defeat that early return, no? > > The issue is the search method exits too early when iterating the outer > loop.  For example of a nested loop, loop 1 includes 5,8,3,10,4,9 > and loop2 includes 3,10.  Currently, it breaks when bb is 3 as bb 3 > doesn't dominate bb 9 of loop 1.  But actually, both bb 5 and bb 4 are > ALWAYS_EXECUTED for loop 1, so if there are store instructions in bb 4 > they won't be processed by store motion again. > > >     5<---- >     |\   | >     8 \  9 >     |  \ | > --->3--->4 > |    | 10---| Correct the graph display: 5<---- |\ | 8 \ 9 | \ | --->3--->4 | | ---10 > > > SET_ALWAYS_EXECUTED_IN is only set to bb 5 on master code now, with this > patch, it will continue search when meet bb 3 until bb 4, then last is > updated > to bb 4, it will break until exit edge is found at bb 4 by > "if (!flow_bb_inside_loop_p (loop, e->dest))".  Then the followed loop > code will > set bb 4 as ALWAYS_EXEUCTED and all it's idoms bb 5. > > >      while (1) >     { >       SET_ALWAYS_EXECUTED_IN (last, loop); >       if (last == loop->header) >         break; >       last = get_immediate_dominator (CDI_DOMINATORS, last); >     } > > After further discussion with Kewen, we found that the inn_loop variable is > totally useless and could be removed. > > >> >>>         if (bitmap_bit_p (contains_call, bb->index)) >>>           break; >>> @@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, >>> sbitmap contains_call) >>>         if (bb->loop_father->header == bb) >>>           { >>> -          if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) >>> +          if (!dominated_by_p (CDI_DOMINATORS, >>> bb->loop_father->latch, bb)) >>>           break; >> >> That's now a always false condition - a loops latch is always dominated >> by its header.  The condition as written tries to verify whether the >> loop is always entered - mind we visit all blocks, not only those >> always executed. > > Thanks for the catch!  I am afraid the piece of code should be removed > since it stops > search of potential ALWAYS EXECUTED bb after inner loop... > >> >> In fact for your testcase the x->j ref is _not_ always executed >> since the inner loop is conditional on n > 0. > > Yes.  But I want to move x->k (not x->j) out of loop 1 when l > 0 in > store-motion. > Attached the diff file without and with my patch to show the extra > optimization. > > x->j is already moved out of loop 2 on master code. > If change n and l to constant numbers like 100, master code could also > do 2 store > motions as expected. The edge from bb 5 to bb 4 doesn't exist now, so bb > 4, bb 3 > and bb 5 are ALWAYS EXECUTED for loop 1. > > > struct X { int i; int j; int k;}; > > void foo(struct X *x, int n, int l) > { >  for (int j = 0; j < l; j++) // loop 1 >    { >      for (int i = 0; i < n; ++i)  // loop 2 >        { >          int *p = &x->j; >          int tem = *p; >          x->j += tem * i; >        } >      int *r = &x->k; >      int tem2 = *r; >      x->k += tem2 * j; >    } > } > >> >> Richard. >> > -- Thanks, Xionghu