From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 115375 invoked by alias); 23 Nov 2015 16:36:41 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 115366 invoked by uid 89); 23 Nov 2015 16:36:40 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-3.3 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_PASS autolearn=ham version=3.3.2 X-HELO: fencepost.gnu.org Received: from fencepost.gnu.org (HELO fencepost.gnu.org) (208.118.235.10) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Mon, 23 Nov 2015 16:36:39 +0000 Received: from eggs.gnu.org ([2001:4830:134:3::10]:58910) by fencepost.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1a0u6X-0005ck-CE for gcc-patches@gnu.org; Mon, 23 Nov 2015 11:36:37 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a0u6S-0005bF-Si for gcc-patches@gnu.org; Mon, 23 Nov 2015 11:36:36 -0500 Received: from mx2.suse.de ([195.135.220.15]:60977) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a0u6S-0005bA-HX for gcc-patches@gnu.org; Mon, 23 Nov 2015 11:36:32 -0500 Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id C39A4ABFA; Mon, 23 Nov 2015 16:35:22 +0000 (UTC) User-Agent: K-9 Mail for Android In-Reply-To: <565332AE.1020907@mentor.com> References: <5640BD31.2060602@mentor.com> <5640FB07.6010008@mentor.com> <5649C41A.40403@mentor.com> <564A64B3.7080305@mentor.com> <564B3F69.50600@mentor.com> <564D1930.8040104@mentor.com> <56502AFB.8050105@mentor.com> <565332AE.1020907@mentor.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Subject: Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def From: Richard Biener Date: Mon, 23 Nov 2015 16:38:00 -0000 To: Tom de Vries CC: Richard Biener ,"gcc-patches@gnu.org" ,Jakub Jelinek Message-ID: <4541E680-3D50-4369-9FB9-3A5B00EF5478@suse.de> Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x (no timestamps) [generic] X-Received-From: 195.135.220.15 X-SW-Source: 2015-11/txt/msg02757.txt.bz2 On November 23, 2015 4:37:18 PM GMT+01:00, Tom de Vries wrote: >On 23/11/15 12:31, Richard Biener wrote: >>>> From the dump below I understand you want no memory references in >>>> > >the outer loop? >>>> > >So the issue seems to be that store motion fails >>>> > >to insert the preheader load / exit store to the outermost loop >>>> > >possible and thus another LIM pass is needed to "store motion" >those >>>> > >again? >>> > >>> >Yep. >>> > >>>> > > But a simple testcase >>>> > > >>>> > >int a; >>>> > >int *p =3D &a; >>>> > >int foo (int n) >>>> > >{ >>>> > > for (int i =3D 0; i < n; ++i) >>>> > > for (int j =3D 0; j < 100; ++j) >>>> > > *p +=3D j + i; >>>> > > return a; >>>> > >} >>>> > > >>>> > >shows that LIM can do this in one step. >>> > >>> >I've filed a FTR PR68465 - "pass_lim doesn't detect identical loop >entry >>> >conditions" for a test-case where that doesn't happen (when using >>> >-fno-tree-dominator-opts). >>> > >>>> > >Which means it should >>>> > >be investigated why it doesn't do this properly for your >testcase >>>> > >(store motion of *_25). >>> > >>> >There seems to be two related problems: >>> >1. the store has tree_could_trap_p (ref->mem.ref) true, which >should be >>> > false. I'll work on a fix for this. >>> >2. Give that the store can trap, I was running into PR68465. I >managed >>> > to eliminate the 2nd pass_lim by moving the pass_dominator >instance >>> > before the pass_lim instance. >>> > >>> >Attached patch shows the pass group with only one pass_lim. I hope >to be able >>> >to eliminate the first pass_dominator instance before pass_lim once >I fix 1. >>> > >>>> > >Simply adding two LIM passes either papers over a wrong-code >>>> > >bug (in LIM or in DOM) or over a missed-optimization in LIM. >>> > >>> >AFAIU now, it's PR68465, a missed optimization in LIM. >> Ok, it's not really LIMs job to cleanup loop header copying that way. >> >> DOM performs jump-threading for this but FRE should also be able >> to handle this just fine. Ah, it doesn't because the outer loop >> header directly contains the condition >> >> Index: gcc/tree-ssa-sccvn.c >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> --- gcc/tree-ssa-sccvn.c (revision 230737) >> +++ gcc/tree-ssa-sccvn.c (working copy) >> @@ -4357,20 +4402,32 @@ sccvn_dom_walker::before_dom_children (b >> >> /* If we have a single predecessor record the equivalence from a >> possible condition on the predecessor edge. */ >> - if (single_pred_p (bb)) >> + edge pred_e =3D NULL; >> + FOR_EACH_EDGE (e, ei, bb->preds) >> + { >> + if (e->flags & EDGE_DFS_BACK) >> + continue; >> + if (! pred_e) >> + pred_e =3D e; >> + else >> + { >> + pred_e =3D NULL; >> + break; >> + } >> + } >> + if (pred_e) >> { >> - edge e =3D single_pred_edge (bb); >> /* Check if there are multiple executable successor edges in >> the source block. Otherwise there is no additional info >> to be recorded. */ >> edge e2; >> - FOR_EACH_EDGE (e2, ei, e->src->succs) >> - if (e2 !=3D e >> + FOR_EACH_EDGE (e2, ei, pred_e->src->succs) >> + if (e2 !=3D pred_e >> && e2->flags & EDGE_EXECUTABLE) >> break; >> if (e2 && (e2->flags & EDGE_EXECUTABLE)) >> { >> - gimple *stmt =3D last_stmt (e->src); >> + gimple *stmt =3D last_stmt (pred_e->src); >> if (stmt >> && gimple_code (stmt) =3D=3D GIMPLE_COND) >> { >> @@ -4378,11 +4435,11 @@ sccvn_dom_walker::before_dom_children (b >> tree lhs =3D gimple_cond_lhs (stmt); >> tree rhs =3D gimple_cond_rhs (stmt); >> record_conds (bb, code, lhs, rhs, >> - (e->flags & EDGE_TRUE_VALUE) !=3D 0); >> + (pred_e->flags & EDGE_TRUE_VALUE) !=3D 0); >> code =3D invert_tree_comparison (code, HONOR_NANS >(lhs)); >> if (code !=3D ERROR_MARK) >> record_conds (bb, code, lhs, rhs, >> - (e->flags & EDGE_TRUE_VALUE) =3D=3D 0); >> + (pred_e->flags & EDGE_TRUE_VALUE) =3D=3D >0); >> } >> } >> } >> >> fixes this for me (for a small testcase). Does it help yours? >> > >Yes, it has the desired effect (of not needing pass_dominator before=20 >pass_lim) . But, patch "Mark by_ref mem_ref in build_receiver_ref as=20 >non-trapping" committed as r230738, also has that effect, so AFAIU I=20 >don't require this tree-ssa-sccvn.c fix. OK, I committed it anyway already. Richard. >Thanks, >- Tom > >> Otherwise untested of course (I hope EDGE_DFS_BACK is good enough, >> it's supposed to match edges that have the src dominated by the >dest). >> Testing the above now.