From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-410837-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 103385 invoked by alias); 21 Oct 2015 10:42:21 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 103373 invoked by uid 89); 21 Oct 2015 10:42:20 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-yk0-f179.google.com
Received: from mail-yk0-f179.google.com (HELO mail-yk0-f179.google.com) (209.85.160.179) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Wed, 21 Oct 2015 10:42:18 +0000
Received: by ykba4 with SMTP id a4so38438777ykb.3        for <gcc-patches@gcc.gnu.org>; Wed, 21 Oct 2015 03:42:16 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.129.49.211 with SMTP id x202mr6423548ywx.147.1445424136226; Wed, 21 Oct 2015 03:42:16 -0700 (PDT)
Received: by 10.37.117.136 with HTTP; Wed, 21 Oct 2015 03:42:16 -0700 (PDT)
In-Reply-To: <D24A647B.901F%alan.hayward@arm.com>
References: <D24A647B.901F%alan.hayward@arm.com>
Date: Wed, 21 Oct 2015 10:46:00 -0000
Message-ID: <CAFiYyc0RV55f+SM5qiRS0qJ1f1UAbQVnoG7_ksbC0XjjgBbN5Q@mail.gmail.com>
Subject: Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
From: Richard Biener <richard.guenther@gmail.com>
To: Alan Hayward <alan.hayward@arm.com>
Cc: Alan Lawrence <Alan.Lawrence@arm.com>, 	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2015-10/txt/msg02058.txt.bz2

On Mon, Oct 19, 2015 at 10:17 AM, Alan Hayward <alan.hayward@arm.com> wrote:
>
>
>>On 30/09/2015 13:45, "Richard Biener" <richard.guenther@gmail.com> wrote:
>>
>>>On Wed, Sep 23, 2015 at 5:51 PM, Alan Hayward <alan.hayward@arm.com>
>>>wrote:
>>>>
>>>>
>>>> On 18/09/2015 14:53, "Alan Hayward" <Alan.Hayward@arm.com> wrote:
>>>>
>>>>>
>>>>>
>>>>>On 18/09/2015 14:26, "Alan Lawrence" <Alan.Lawrence@arm.com> wrote:
>>>>>
>>>>>>On 18/09/15 13:17, Richard Biener wrote:
>>>>>>>
>>>>>>> Ok, I see.
>>>>>>>
>>>>>>> That this case is already vectorized is because it implements
>>>>>>>MAX_EXPR,
>>>>>>> modifying it slightly to
>>>>>>>
>>>>>>> int foo (int *a)
>>>>>>> {
>>>>>>>    int val =3D 0;
>>>>>>>    for (int i =3D 0; i < 1024; ++i)
>>>>>>>      if (a[i] > val)
>>>>>>>        val =3D a[i] + 1;
>>>>>>>    return val;
>>>>>>> }
>>>>>>>
>>>>>>> makes it no longer handled by current code.
>>>>>>>
>>>>>>
>>>>>>Yes. I believe the idea for the patch is to handle arbitrary
>>>>>>expressions
>>>>>>like
>>>>>>
>>>>>>int foo (int *a)
>>>>>>{
>>>>>>    int val =3D 0;
>>>>>>    for (int i =3D 0; i < 1024; ++i)
>>>>>>      if (some_expression (i))
>>>>>>        val =3D another_expression (i);
>>>>>>    return val;
>>>>>>}
>>>>>
>>>>>Yes, that=C3=A2s correct. Hopefully my new test cases should cover
>>>>>everything.
>>>>>
>>>>
>>>> Attached is a new version of the patch containing all the changes
>>>> requested by Richard.
>>>
>>>+      /* Compare the max index vector to the vector of found indexes to
>>>find
>>>+        the postion of the max value.  This will result in either a
>>>single
>>>+        match or all of the values.  */
>>>+      tree vec_compare =3D make_ssa_name (index_vec_type_signed);
>>>+      gimple vec_compare_stmt =3D gimple_build_assign (vec_compare,
>>>EQ_EXPR,
>>>+                                                    induction_index,
>>>+                                                    max_index_vec);
>>>
>>>I'm not sure all targets can handle this.  If I deciper the code
>>>correctly then we do
>>>
>>>  mask =3D induction_index =3D=3D max_index_vec;
>>>  vec_and =3D mask & vec_data;
>>>
>>>plus some casts.  So this is basically
>>>
>>>  vec_and =3D induction_index =3D=3D max_index_vec ? vec_data : {0, 0, .=
.. };
>>>
>>>without the need to relate the induction index vector type to the data
>>>vector type.
>>>I believe this is also the form all targets support.
>>
>>
>>Ok, I=C3=A2ll replace this.
>>
>>>
>>>I am missing a comment before all this code-generation that shows the
>>>transform
>>>result with the variable names used in the code-gen.  I have a hard
>>>time connecting
>>>things here.
>>
>>Ok, I=C3=A2ll add some comments.
>>
>>>
>>>+      tree matched_data_reduc_cast =3D build1 (VIEW_CONVERT_EXPR,
>>>scalar_type,
>>>+                                            matched_data_reduc);
>>>+      epilog_stmt =3D gimple_build_assign (new_scalar_dest,
>>>+                                        matched_data_reduc_cast);
>>>+      new_temp =3D make_ssa_name (new_scalar_dest, epilog_stmt);
>>>+      gimple_assign_set_lhs (epilog_stmt, new_temp);
>>>
>>>this will leave the stmt unsimplified.  scalar sign-changes should use
>>>NOP_EXPR,
>>>not VIEW_CONVERT_EXPR.  The easiest fix is to use fold_convert instead.
>>>Also just do like before - first make_ssa_name and then directly use it
>>>in the
>>>gimple_build_assign.
>>
>>We need the VIEW_CONVERT_EXPR for the cases where we have float data
>>values. The index is always integer.
>>
>>
>>>
>>>The patch is somewhat hard to parse with all the indentation changes.  A
>>>context
>>>diff would be much easier to read in those contexts.
>>
>>Ok, I=C3=A2ll make the next patch like that
>>
>>>
>>>+  if (v_reduc_type =3D=3D COND_REDUCTION)
>>>+    {
>>>+      widest_int ni;
>>>+
>>>+      if (! max_loop_iterations (loop, &ni))
>>>+       {
>>>+         if (dump_enabled_p ())
>>>+           dump_printf_loc (MSG_NOTE, vect_location,
>>>+                            "loop count not known, cannot create cond "
>>>+                            "reduction.\n");
>>>
>>>ugh.  That's bad.
>>>
>>>+      /* The additional index will be the same type as the condition.
>>>Check
>>>+        that the loop can fit into this less one (because we'll use up
>>>the
>>>+        zero slot for when there are no matches).  */
>>>+      tree max_index =3D TYPE_MAX_VALUE (cr_index_scalar_type);
>>>+      if (wi::geu_p (ni, wi::to_widest (max_index)))
>>>+       {
>>>+         if (dump_enabled_p ())
>>>+           dump_printf_loc (MSG_NOTE, vect_location,
>>>+                            "loop size is greater than data size.\n");
>>>+         return false;
>>>
>>>Likewise.
>>
>>We could do better if we made the index type larger.
>>But as a first implementation of this optimisation, I didn=C3=A2t want to
>>overcomplicate things more.
>>
>>>
>>>@@ -5327,6 +5540,8 @@ vectorizable_reduction (gimple stmt,
>>>gimple_stmt_iterator *gsi,
>>>   if (dump_enabled_p ())
>>>     dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
>>>
>>>+  STMT_VINFO_TYPE (stmt_info) =3D reduc_vec_info_type;
>>>+
>>>   /* FORNOW: Multiple types are not supported for condition.  */
>>>   if (code =3D=3D COND_EXPR)
>>>
>>>this change looks odd (or wrong).  The type should be _only_ set/changed
>>>during
>>>analysis.
>>
>>
>>The problem is, for COND_EXPRs, this function calls
>>vectorizable_condition(), which sets STMT_VINFO_TYPE to
>>condition_vec_info_type.

Ah, the pre-existing issue of the transform phase re-doing the analysis...
a fix would be to condition that call on vec_stmt =3D=3D NULL, thus analysis
phase.

>>Therefore we need something to restore it back to reduc_vec_info_type on
>>the non-analysis call.
>>
>>I considered setting STMT_VINFO_TYPE to reduc_vec_info_type directly after
>>the call to vectorizable_condition(), but that looked worse.
>>I could back up the value of STMT_VINFO_TYPE before calling
>>vectorizable_condition() and then restore it after? I think that=C3=A2ll =
look a
>>lot better.
>>
>>
>>>
>>>+
>>>+      /* For cond reductions we need to add an additional conditional
>>>based on
>>>+        the loop index.  */
>>>+      if (v_reduc_type =3D=3D COND_REDUCTION)
>>>+       {
>>>+         int nunits_out =3D TYPE_VECTOR_SUBPARTS (vectype_out);
>>>+         int k;
>>>...
>>>+         STMT_VINFO_VECTYPE (index_vec_info) =3D cr_index_vector_type;
>>>+         set_vinfo_for_stmt (index_condition, index_vec_info);
>>>+
>>>+         /* Update the phi with the vec cond.  */
>>>+         add_phi_arg (new_phi, cond_name, loop_latch_edge (loop),
>>>+                      UNKNOWN_LOCATION);
>>>
>>>same as before - I am missing a comment that shows the generated code
>>>and connects
>>>the local vars used.
>>
>>Ok, I=C3=A2ll add something
>>
>>>
>>>
>>>+         tree ccompare_name =3D make_ssa_name (TREE_TYPE (ccompare));
>>>+         gimple ccompare_stmt =3D gimple_build_assign (ccompare_name,
>>>ccompare);
>>>+         gsi_insert_before (&vec_stmt_gsi, ccompare_stmt,
>>>GSI_SAME_STMT);
>>>+         gimple_assign_set_rhs1 (*vec_stmt, ccompare_name);
>>>
>>>hum - are you sure this works with ncopies > 1?  Will it use the
>>>correct vec_stmt?
>>
>>We don=C3=A2t support this when ncopies >1.
>>
>>In vectorizable_reduction():
>>
>>if ((double_reduc || v_reduc_type =3D=3D COND_REDUCTION) && ncopies > 1)
>>    {
>>      if (dump_enabled_p ())
>>       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>                        "multiple types in double reduction or condition "
>>                        "reduction.\n");
>>      return false;
>>    }
>>
>>
>>
>>>
>>>I still dislike the v_reduc_type plastered and passed everywhere.  Can
>>>you explore
>>>adding the reduction kind to stmt_info?
>>
>>Ok, I can do that.
>>
>>
>>Thanks for the comments.
>>I=C3=A2ll put together a patch with the above changes.
>>
>>Thanks,
>>Alan.
>>
>>
>
> Richard, as requested I've updated with the follow changes:
>
> * AND and EQ replaced with a COND_EXPR
> * Better comments for the code gen, included references to variable names
> * Kept the VIEW_CONVERT_EXPR - we need this for when the data is float ty=
pe
> * Backed up STMT_VINFO_TYPE before the call to vectorizable_condition()
> and restored after the call. I considered extracting the relvant parts of
> vectorizable_condition() into a sub function, but it had too many
> dependencies with the rest of vectorizable_condition().
> * v_reduc_type is now part of stmt_info
> * Created a diff using 50 lines of context

Ok with the vectorizable_condition call guarded as suggested instead.

Thanks,
Richard.

>
> Thanks,
> Alan.
>
>