From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 103385 invoked by alias); 21 Oct 2015 10:42:21 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 103373 invoked by uid 89); 21 Oct 2015 10:42:20 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-yk0-f179.google.com Received: from mail-yk0-f179.google.com (HELO mail-yk0-f179.google.com) (209.85.160.179) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Wed, 21 Oct 2015 10:42:18 +0000 Received: by ykba4 with SMTP id a4so38438777ykb.3 for ; Wed, 21 Oct 2015 03:42:16 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.129.49.211 with SMTP id x202mr6423548ywx.147.1445424136226; Wed, 21 Oct 2015 03:42:16 -0700 (PDT) Received: by 10.37.117.136 with HTTP; Wed, 21 Oct 2015 03:42:16 -0700 (PDT) In-Reply-To: References: Date: Wed, 21 Oct 2015 10:46:00 -0000 Message-ID: Subject: Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947) From: Richard Biener To: Alan Hayward Cc: Alan Lawrence , "gcc-patches@gcc.gnu.org" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes X-SW-Source: 2015-10/txt/msg02058.txt.bz2 On Mon, Oct 19, 2015 at 10:17 AM, Alan Hayward wrote: > > >>On 30/09/2015 13:45, "Richard Biener" wrote: >> >>>On Wed, Sep 23, 2015 at 5:51 PM, Alan Hayward >>>wrote: >>>> >>>> >>>> On 18/09/2015 14:53, "Alan Hayward" wrote: >>>> >>>>> >>>>> >>>>>On 18/09/2015 14:26, "Alan Lawrence" wrote: >>>>> >>>>>>On 18/09/15 13:17, Richard Biener wrote: >>>>>>> >>>>>>> Ok, I see. >>>>>>> >>>>>>> That this case is already vectorized is because it implements >>>>>>>MAX_EXPR, >>>>>>> modifying it slightly to >>>>>>> >>>>>>> int foo (int *a) >>>>>>> { >>>>>>> int val =3D 0; >>>>>>> for (int i =3D 0; i < 1024; ++i) >>>>>>> if (a[i] > val) >>>>>>> val =3D a[i] + 1; >>>>>>> return val; >>>>>>> } >>>>>>> >>>>>>> makes it no longer handled by current code. >>>>>>> >>>>>> >>>>>>Yes. I believe the idea for the patch is to handle arbitrary >>>>>>expressions >>>>>>like >>>>>> >>>>>>int foo (int *a) >>>>>>{ >>>>>> int val =3D 0; >>>>>> for (int i =3D 0; i < 1024; ++i) >>>>>> if (some_expression (i)) >>>>>> val =3D another_expression (i); >>>>>> return val; >>>>>>} >>>>> >>>>>Yes, that=C3=A2s correct. Hopefully my new test cases should cover >>>>>everything. >>>>> >>>> >>>> Attached is a new version of the patch containing all the changes >>>> requested by Richard. >>> >>>+ /* Compare the max index vector to the vector of found indexes to >>>find >>>+ the postion of the max value. This will result in either a >>>single >>>+ match or all of the values. */ >>>+ tree vec_compare =3D make_ssa_name (index_vec_type_signed); >>>+ gimple vec_compare_stmt =3D gimple_build_assign (vec_compare, >>>EQ_EXPR, >>>+ induction_index, >>>+ max_index_vec); >>> >>>I'm not sure all targets can handle this. If I deciper the code >>>correctly then we do >>> >>> mask =3D induction_index =3D=3D max_index_vec; >>> vec_and =3D mask & vec_data; >>> >>>plus some casts. So this is basically >>> >>> vec_and =3D induction_index =3D=3D max_index_vec ? vec_data : {0, 0, .= .. }; >>> >>>without the need to relate the induction index vector type to the data >>>vector type. >>>I believe this is also the form all targets support. >> >> >>Ok, I=C3=A2ll replace this. >> >>> >>>I am missing a comment before all this code-generation that shows the >>>transform >>>result with the variable names used in the code-gen. I have a hard >>>time connecting >>>things here. >> >>Ok, I=C3=A2ll add some comments. >> >>> >>>+ tree matched_data_reduc_cast =3D build1 (VIEW_CONVERT_EXPR, >>>scalar_type, >>>+ matched_data_reduc); >>>+ epilog_stmt =3D gimple_build_assign (new_scalar_dest, >>>+ matched_data_reduc_cast); >>>+ new_temp =3D make_ssa_name (new_scalar_dest, epilog_stmt); >>>+ gimple_assign_set_lhs (epilog_stmt, new_temp); >>> >>>this will leave the stmt unsimplified. scalar sign-changes should use >>>NOP_EXPR, >>>not VIEW_CONVERT_EXPR. The easiest fix is to use fold_convert instead. >>>Also just do like before - first make_ssa_name and then directly use it >>>in the >>>gimple_build_assign. >> >>We need the VIEW_CONVERT_EXPR for the cases where we have float data >>values. The index is always integer. >> >> >>> >>>The patch is somewhat hard to parse with all the indentation changes. A >>>context >>>diff would be much easier to read in those contexts. >> >>Ok, I=C3=A2ll make the next patch like that >> >>> >>>+ if (v_reduc_type =3D=3D COND_REDUCTION) >>>+ { >>>+ widest_int ni; >>>+ >>>+ if (! max_loop_iterations (loop, &ni)) >>>+ { >>>+ if (dump_enabled_p ()) >>>+ dump_printf_loc (MSG_NOTE, vect_location, >>>+ "loop count not known, cannot create cond " >>>+ "reduction.\n"); >>> >>>ugh. That's bad. >>> >>>+ /* The additional index will be the same type as the condition. >>>Check >>>+ that the loop can fit into this less one (because we'll use up >>>the >>>+ zero slot for when there are no matches). */ >>>+ tree max_index =3D TYPE_MAX_VALUE (cr_index_scalar_type); >>>+ if (wi::geu_p (ni, wi::to_widest (max_index))) >>>+ { >>>+ if (dump_enabled_p ()) >>>+ dump_printf_loc (MSG_NOTE, vect_location, >>>+ "loop size is greater than data size.\n"); >>>+ return false; >>> >>>Likewise. >> >>We could do better if we made the index type larger. >>But as a first implementation of this optimisation, I didn=C3=A2t want to >>overcomplicate things more. >> >>> >>>@@ -5327,6 +5540,8 @@ vectorizable_reduction (gimple stmt, >>>gimple_stmt_iterator *gsi, >>> if (dump_enabled_p ()) >>> dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n"); >>> >>>+ STMT_VINFO_TYPE (stmt_info) =3D reduc_vec_info_type; >>>+ >>> /* FORNOW: Multiple types are not supported for condition. */ >>> if (code =3D=3D COND_EXPR) >>> >>>this change looks odd (or wrong). The type should be _only_ set/changed >>>during >>>analysis. >> >> >>The problem is, for COND_EXPRs, this function calls >>vectorizable_condition(), which sets STMT_VINFO_TYPE to >>condition_vec_info_type. Ah, the pre-existing issue of the transform phase re-doing the analysis... a fix would be to condition that call on vec_stmt =3D=3D NULL, thus analysis phase. >>Therefore we need something to restore it back to reduc_vec_info_type on >>the non-analysis call. >> >>I considered setting STMT_VINFO_TYPE to reduc_vec_info_type directly after >>the call to vectorizable_condition(), but that looked worse. >>I could back up the value of STMT_VINFO_TYPE before calling >>vectorizable_condition() and then restore it after? I think that=C3=A2ll = look a >>lot better. >> >> >>> >>>+ >>>+ /* For cond reductions we need to add an additional conditional >>>based on >>>+ the loop index. */ >>>+ if (v_reduc_type =3D=3D COND_REDUCTION) >>>+ { >>>+ int nunits_out =3D TYPE_VECTOR_SUBPARTS (vectype_out); >>>+ int k; >>>... >>>+ STMT_VINFO_VECTYPE (index_vec_info) =3D cr_index_vector_type; >>>+ set_vinfo_for_stmt (index_condition, index_vec_info); >>>+ >>>+ /* Update the phi with the vec cond. */ >>>+ add_phi_arg (new_phi, cond_name, loop_latch_edge (loop), >>>+ UNKNOWN_LOCATION); >>> >>>same as before - I am missing a comment that shows the generated code >>>and connects >>>the local vars used. >> >>Ok, I=C3=A2ll add something >> >>> >>> >>>+ tree ccompare_name =3D make_ssa_name (TREE_TYPE (ccompare)); >>>+ gimple ccompare_stmt =3D gimple_build_assign (ccompare_name, >>>ccompare); >>>+ gsi_insert_before (&vec_stmt_gsi, ccompare_stmt, >>>GSI_SAME_STMT); >>>+ gimple_assign_set_rhs1 (*vec_stmt, ccompare_name); >>> >>>hum - are you sure this works with ncopies > 1? Will it use the >>>correct vec_stmt? >> >>We don=C3=A2t support this when ncopies >1. >> >>In vectorizable_reduction(): >> >>if ((double_reduc || v_reduc_type =3D=3D COND_REDUCTION) && ncopies > 1) >> { >> if (dump_enabled_p ()) >> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> "multiple types in double reduction or condition " >> "reduction.\n"); >> return false; >> } >> >> >> >>> >>>I still dislike the v_reduc_type plastered and passed everywhere. Can >>>you explore >>>adding the reduction kind to stmt_info? >> >>Ok, I can do that. >> >> >>Thanks for the comments. >>I=C3=A2ll put together a patch with the above changes. >> >>Thanks, >>Alan. >> >> > > Richard, as requested I've updated with the follow changes: > > * AND and EQ replaced with a COND_EXPR > * Better comments for the code gen, included references to variable names > * Kept the VIEW_CONVERT_EXPR - we need this for when the data is float ty= pe > * Backed up STMT_VINFO_TYPE before the call to vectorizable_condition() > and restored after the call. I considered extracting the relvant parts of > vectorizable_condition() into a sub function, but it had too many > dependencies with the rest of vectorizable_condition(). > * v_reduc_type is now part of stmt_info > * Created a diff using 50 lines of context Ok with the vectorizable_condition call guarded as suggested instead. Thanks, Richard. > > Thanks, > Alan. > >