From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-400823-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 117554 invoked by alias); 19 Jun 2015 17:38:35 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 117542 invoked by uid 89); 19 Jun 2015 17:38:34 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.2
X-HELO: eu-smtp-delivery-143.mimecast.com
Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 19 Jun 2015 17:38:33 +0000
Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-1-FdoxFPxvRCCoTC3uszsB-w-1
Received: from [10.2.207.65] ([10.1.2.79]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959);	 Fri, 19 Jun 2015 18:38:29 +0100
Message-ID: <55845395.3020306@arm.com>
Date: Fri, 19 Jun 2015 17:54:00 -0000
From: Alan Lawrence <alan.lawrence@arm.com>
User-Agent: Thunderbird 2.0.0.24 (X11/20101213)
MIME-Version: 1.0
To: Jeff Law <law@redhat.com>
CC: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,  Richard Biener <richard.guenther@gmail.com>
Subject: Re: [PATCH/RFC] Make loop-header-copying more aggressive, rerun before tree-if-conversion
References: <555F4E54.6080600@arm.com> <5565E839.2090402@redhat.com>
In-Reply-To: <5565E839.2090402@redhat.com>
X-MC-Unique: FdoxFPxvRCCoTC3uszsB-w-1
Content-Type: text/plain; charset=WINDOWS-1252; format=flowed
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2015-06/txt/msg01358.txt.bz2

Jeff Law wrote:
> On 05/22/2015 09:42 AM, Alan Lawrence wrote:
>> This patch does so (and makes slightly less conservative, to tackle the
>> example above). I found I had to make this a separate pass, so that the
>> phi nodes were cleaned up at the end of the pass before running
>> tree_if_conversion.
> What PHI node cleanup needs to be done?  I don't doubt something's=20
> needed, but would like to understand the cleanup -- depending on what=20
> needs to be done, it may be the case that we can cleanup on-the-fly or=20
> it may point at a general issue we should be resolving prior to running=20
> tree_if_conversion.

If I change pass_ch_vect to return 0 rather than TODO_update_cfg, my testca=
se gives:

foo ()
{
   int m;
   int i;
   unsigned int ivtmp_3;
   int _5;
   int _6;
   int _7;
   int _10;
   int _12;
   unsigned int ivtmp_13;
   int _15;
   unsigned int ivtmp_19;
   int _20;
   unsigned int ivtmp_23;
   unsigned int ivtmp_27;

   <bb 2>:
   _10 =3D A[0];

   <bb 8>:
   # m_11 =3D PHI <4(2)>
   # _12 =3D PHI <_10(2)>
   # i_1 =3D PHI <0(2)>
   # ivtmp_19 =3D PHI <16(2)>
   _20 =3D m_11 * _12;
   A[i_1] =3D _20;
   i_22 =3D i_1 + 1;
   ivtmp_23 =3D ivtmp_19 - 1;
   if (ivtmp_23 !=3D 0)
     goto <bb 9>;
   else
     goto <bb 7>;

   <bb 9>:

   <bb 3>:
   # i_26 =3D PHI <i_9(10), i_22(9)>
   # ivtmp_27 =3D PHI <ivtmp_13(10), ivtmp_23(9)>
   _5 =3D A[i_26];
   _6 =3D _5 & i_26;
   if (_6 !=3D 0)
     goto <bb 5>;
   else
     goto <bb 4>;

   <bb 4>:

   <bb 5>:
   # m_14 =3D PHI <5(3), 4(4)>

   <bb 6>:
   # m_2 =3D PHI <m_14(5)>
   # _15 =3D PHI <_5(5)>
   # i_16 =3D PHI <i_26(5)>
   # ivtmp_3 =3D PHI <ivtmp_27(5)>
   _7 =3D m_2 * _15;
   A[i_16] =3D _7;
   i_9 =3D i_16 + 1;
   ivtmp_13 =3D ivtmp_3 - 1;
   if (ivtmp_13 !=3D 0)
     goto <bb 10>;
   else
     goto <bb 7>;

   <bb 10>:
   goto <bb 3>;

   <bb 7>:
   return;

}

if-conversion then bails out in if_convertible_phi_p, with a phi (whose res=
ult=20
is a virtual operand, specifically an SSA name) used as operand to another =
phi=20
("Difficult to handle this virtual phi" in tree-if-conv.c): see m_2, i_16.=
=20
Returning TODO_update_cfg, causes merging of blocks 2 and 8, and blocks 5 a=
nd 6,=20
giving instead:

foo ()
{
   int m;
   int i;
   int _5;
   int _6;
   int _7;
   int _10;
   unsigned int ivtmp_13;
   int _20;
   unsigned int ivtmp_23;
   unsigned int ivtmp_27;

   <bb 2>:
   _10 =3D A[0];
   _20 =3D _10 * 4;
   A[0] =3D _20;
   i_22 =3D 1;
   ivtmp_23 =3D 15;
   if (ivtmp_23 !=3D 0)
     goto <bb 3>;
   else
     goto <bb 8>;

   <bb 3>:

   <bb 4>:
   # i_26 =3D PHI <i_9(7), i_22(3)>
   # ivtmp_27 =3D PHI <ivtmp_13(7), ivtmp_23(3)>
   _5 =3D A[i_26];
   _6 =3D _5 & i_26;
   if (_6 !=3D 0)
     goto <bb 6>;
   else
     goto <bb 5>;

   <bb 5>:

   <bb 6>:
   # m_14 =3D PHI <5(4), 4(5)>
   _7 =3D _5 * m_14;
   A[i_26] =3D _7;
   i_9 =3D i_26 + 1;
   ivtmp_13 =3D ivtmp_27 + 4294967295;
   if (ivtmp_13 !=3D 0)
     goto <bb 7>;
   else
     goto <bb 8>;

   <bb 7>:
   goto <bb 4>;

   <bb 8>:
   return;

}

and one can see that the troublesome phi's are gone.

> Don't we have a flag to turn off loop header copying?  If so, does=20
> adding that flag to the test "fix" it  without resorting to something
> gross like preserving the old logic for the first pass and new logic for=
=20
> the second pass.

Ah, yes, thanks, -fno-tree-ch works. In fact here the problem was caused by=
 the=20
second pass, which following Richard's suggestion, I've now gated with=20
-ftree-vectorize also - which is turned off by default at -O2, so the test =
is=20
passing without changes.

However, it turns out I was implicitly using different logic in the two pas=
ses:=20
the 'if (!single_exit(loop)) return true' always bailed out in the first=20
pass_ch, as single_exit-ness is not known in the absence of=20
LOOPS_HAVE_RECORDED_EXITS, so the first pass_ch was still using the origina=
l=20
logic anyway! I've now had to refactor the two classes to allow different c=
ode=20
anyway, which makes the distinction clear.

(I also tried a more complicated version of do_while_loop_p that computed i=
t's=20
own single_exit criteria to have that in the first pass; this gave more tes=
t=20
failures, which are fixable, but it's not really the purpose of this patch =
to do=20
additional header-copying when we are not vectorizing.)

> My biggest worry would be cases where data initialized by=20
> loop_optimizer_init gets invalidated by the header copying.  Have you=20
> looked at all at that possibility?  I don't have anything specific in=20
> mind to point you at -- just a general concern.

Well, this code from tree-ssa-loop-ch.c appears to update the preheader and=
 loop=20
latch (I've verified with printf's that loop->latch, loop->header,=20
loop_preheader_edge(loop) and loop_latch_edge(loop) are all sensible after =
this):

/* Ensure that the latch and the preheader is simple (we know that they
	 are not now, since there was the loop exit condition.  */
       split_edge (loop_preheader_edge (loop));
       split_edge (loop_latch_edge (loop));

and I think the exit edges don't change (they get cloned outside the loop).=
 The=20
one remaining worry might be irreducible code, but I believe this should ha=
ve=20
been removed by the stage pass_ch_vect runs: I've also run an entire bootst=
rap +=20
testsuite with both (a) an assertion in pass_ch_vect::process_loop_p that n=
one=20
of the blocks in the loop are BB_IRREDUCIBLE, and (b) a call to=20
verify_loop_structure at the end of pass_ch_base::copy_headers.

I agree this isn't (/cannot be) totally definitive, so if you are not=20
sufficiently reassured - would you be if I called loop_optimizer_init=20
(LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS) at the end of pass_ch_vect, redo=
ing=20
the setup done in pass_tree_loop_init::execute?

Patch v2 at https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01355.html .

Thanks for the review!

--Alan