From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 117554 invoked by alias); 19 Jun 2015 17:38:35 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 117542 invoked by uid 89); 19 Jun 2015 17:38:34 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.2 X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 19 Jun 2015 17:38:33 +0000 Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-1-FdoxFPxvRCCoTC3uszsB-w-1 Received: from [10.2.207.65] ([10.1.2.79]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Fri, 19 Jun 2015 18:38:29 +0100 Message-ID: <55845395.3020306@arm.com> Date: Fri, 19 Jun 2015 17:54:00 -0000 From: Alan Lawrence User-Agent: Thunderbird 2.0.0.24 (X11/20101213) MIME-Version: 1.0 To: Jeff Law CC: "gcc-patches@gcc.gnu.org" , Richard Biener Subject: Re: [PATCH/RFC] Make loop-header-copying more aggressive, rerun before tree-if-conversion References: <555F4E54.6080600@arm.com> <5565E839.2090402@redhat.com> In-Reply-To: <5565E839.2090402@redhat.com> X-MC-Unique: FdoxFPxvRCCoTC3uszsB-w-1 Content-Type: text/plain; charset=WINDOWS-1252; format=flowed Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes X-SW-Source: 2015-06/txt/msg01358.txt.bz2 Jeff Law wrote: > On 05/22/2015 09:42 AM, Alan Lawrence wrote: >> This patch does so (and makes slightly less conservative, to tackle the >> example above). I found I had to make this a separate pass, so that the >> phi nodes were cleaned up at the end of the pass before running >> tree_if_conversion. > What PHI node cleanup needs to be done? I don't doubt something's=20 > needed, but would like to understand the cleanup -- depending on what=20 > needs to be done, it may be the case that we can cleanup on-the-fly or=20 > it may point at a general issue we should be resolving prior to running=20 > tree_if_conversion. If I change pass_ch_vect to return 0 rather than TODO_update_cfg, my testca= se gives: foo () { int m; int i; unsigned int ivtmp_3; int _5; int _6; int _7; int _10; int _12; unsigned int ivtmp_13; int _15; unsigned int ivtmp_19; int _20; unsigned int ivtmp_23; unsigned int ivtmp_27; : _10 =3D A[0]; : # m_11 =3D PHI <4(2)> # _12 =3D PHI <_10(2)> # i_1 =3D PHI <0(2)> # ivtmp_19 =3D PHI <16(2)> _20 =3D m_11 * _12; A[i_1] =3D _20; i_22 =3D i_1 + 1; ivtmp_23 =3D ivtmp_19 - 1; if (ivtmp_23 !=3D 0) goto ; else goto ; : : # i_26 =3D PHI # ivtmp_27 =3D PHI _5 =3D A[i_26]; _6 =3D _5 & i_26; if (_6 !=3D 0) goto ; else goto ; : : # m_14 =3D PHI <5(3), 4(4)> : # m_2 =3D PHI # _15 =3D PHI <_5(5)> # i_16 =3D PHI # ivtmp_3 =3D PHI _7 =3D m_2 * _15; A[i_16] =3D _7; i_9 =3D i_16 + 1; ivtmp_13 =3D ivtmp_3 - 1; if (ivtmp_13 !=3D 0) goto ; else goto ; : goto ; : return; } if-conversion then bails out in if_convertible_phi_p, with a phi (whose res= ult=20 is a virtual operand, specifically an SSA name) used as operand to another = phi=20 ("Difficult to handle this virtual phi" in tree-if-conv.c): see m_2, i_16.= =20 Returning TODO_update_cfg, causes merging of blocks 2 and 8, and blocks 5 a= nd 6,=20 giving instead: foo () { int m; int i; int _5; int _6; int _7; int _10; unsigned int ivtmp_13; int _20; unsigned int ivtmp_23; unsigned int ivtmp_27; : _10 =3D A[0]; _20 =3D _10 * 4; A[0] =3D _20; i_22 =3D 1; ivtmp_23 =3D 15; if (ivtmp_23 !=3D 0) goto ; else goto ; : : # i_26 =3D PHI # ivtmp_27 =3D PHI _5 =3D A[i_26]; _6 =3D _5 & i_26; if (_6 !=3D 0) goto ; else goto ; : : # m_14 =3D PHI <5(4), 4(5)> _7 =3D _5 * m_14; A[i_26] =3D _7; i_9 =3D i_26 + 1; ivtmp_13 =3D ivtmp_27 + 4294967295; if (ivtmp_13 !=3D 0) goto ; else goto ; : goto ; : return; } and one can see that the troublesome phi's are gone. > Don't we have a flag to turn off loop header copying? If so, does=20 > adding that flag to the test "fix" it without resorting to something > gross like preserving the old logic for the first pass and new logic for= =20 > the second pass. Ah, yes, thanks, -fno-tree-ch works. In fact here the problem was caused by= the=20 second pass, which following Richard's suggestion, I've now gated with=20 -ftree-vectorize also - which is turned off by default at -O2, so the test = is=20 passing without changes. However, it turns out I was implicitly using different logic in the two pas= ses:=20 the 'if (!single_exit(loop)) return true' always bailed out in the first=20 pass_ch, as single_exit-ness is not known in the absence of=20 LOOPS_HAVE_RECORDED_EXITS, so the first pass_ch was still using the origina= l=20 logic anyway! I've now had to refactor the two classes to allow different c= ode=20 anyway, which makes the distinction clear. (I also tried a more complicated version of do_while_loop_p that computed i= t's=20 own single_exit criteria to have that in the first pass; this gave more tes= t=20 failures, which are fixable, but it's not really the purpose of this patch = to do=20 additional header-copying when we are not vectorizing.) > My biggest worry would be cases where data initialized by=20 > loop_optimizer_init gets invalidated by the header copying. Have you=20 > looked at all at that possibility? I don't have anything specific in=20 > mind to point you at -- just a general concern. Well, this code from tree-ssa-loop-ch.c appears to update the preheader and= loop=20 latch (I've verified with printf's that loop->latch, loop->header,=20 loop_preheader_edge(loop) and loop_latch_edge(loop) are all sensible after = this): /* Ensure that the latch and the preheader is simple (we know that they are not now, since there was the loop exit condition. */ split_edge (loop_preheader_edge (loop)); split_edge (loop_latch_edge (loop)); and I think the exit edges don't change (they get cloned outside the loop).= The=20 one remaining worry might be irreducible code, but I believe this should ha= ve=20 been removed by the stage pass_ch_vect runs: I've also run an entire bootst= rap +=20 testsuite with both (a) an assertion in pass_ch_vect::process_loop_p that n= one=20 of the blocks in the loop are BB_IRREDUCIBLE, and (b) a call to=20 verify_loop_structure at the end of pass_ch_base::copy_headers. I agree this isn't (/cannot be) totally definitive, so if you are not=20 sufficiently reassured - would you be if I called loop_optimizer_init=20 (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS) at the end of pass_ch_vect, redo= ing=20 the setup done in pass_tree_loop_init::execute? Patch v2 at https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01355.html . Thanks for the review! --Alan