From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 10759 invoked by alias); 2 Dec 2014 15:29:06 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 10741 invoked by uid 89); 2 Dec 2014 15:29:04 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.8 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-wi0-f173.google.com Received: from mail-wi0-f173.google.com (HELO mail-wi0-f173.google.com) (209.85.212.173) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Tue, 02 Dec 2014 15:29:02 +0000 Received: by mail-wi0-f173.google.com with SMTP id r20so28420142wiv.12 for ; Tue, 02 Dec 2014 07:29:00 -0800 (PST) MIME-Version: 1.0 X-Received: by 10.180.95.68 with SMTP id di4mr5982996wib.49.1417534138880; Tue, 02 Dec 2014 07:28:58 -0800 (PST) Received: by 10.216.77.73 with HTTP; Tue, 2 Dec 2014 07:28:58 -0800 (PST) In-Reply-To: References: Date: Tue, 02 Dec 2014 15:29:00 -0000 Message-ID: Subject: Re: [PATCH 2/3] Extended if-conversion From: Yuri Rumyantsev To: Richard Biener Cc: gcc-patches , Igor Zamyatin Content-Type: text/plain; charset=UTF-8 X-SW-Source: 2014-12/txt/msg00172.txt.bz2 Thanks Richard for your quick reply! 1. I agree that we can combine predicate_extended_ and predicate_arbitrary_ to one function as you proposed. 2. What is your opinion about using more simple decision about insertion point - if bb has use of phi result insert phi predication before it and at the bb end otherwise. I assume that critical edge splitting is not a good decision. Best regards. Yuri. 2014-12-02 16:28 GMT+03:00 Richard Biener : > On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev wrote: >> Hi Richard, >> >> I resend you patch1 and patch2 with minor changes: >> 1. I renamed flag_force_vectorize to aggressive_if_conv. >> 2. Use static cast for the first argument of gimple_phi_arg_edge. >> I also very sorry that I sent you bad patch. >> >> Now let me answer on your questions related to second patch. >> 1. Why we need both predicate_extended_scalar_phi and >> predicate_arbitrary_scalar_phi? >> >> Let's consider the following simple test-case: >> >> #pragma omp simd safelen(8) >> for (i=0; i<512; i++) >> { >> float t = a[i]; >> if (t > 0.0f & t < 1.0e+17f) >> if (c[i] != 0) /* c is integer array. */ >> res += 1; >> } >> >> we can see the following phi node correspondent to res: >> >> # res_1 = PHI >> >> It is clear that we can optimize it to phi node with 2 arguments only >> and only one check can be used for phi predication (for reduction in >> our case), namely predicate of bb_5. In general case we can't do it >> even if we sort all phi argument values since we still have to produce >> a chain of cond expressions to perform phi predication (see comments >> for predicate_arbitrary_scalar_phi). > > How so? We can always use !(condition) for the "last" value, thus > treat it as an 'else' case. That even works for > > # res_1 = PHI > > where the condition for edges 5 and 7 can be computed as > ! (condition for 3 || condition for 4). > > Of course it is worthwhile to also sort single-occurances first > so your case gets just the condiiton for edge 5 and its inversion > used for edges 3 and 4 combined. > >> 2. Why we need to introduce find_insertion_point? >> Let's consider another test-case extracted from 175.vpr ( t5.c is >> attached) and we can see that bb_7 and bb_9 containig phi nodes has >> only critical incoming edges and both contain code computing edge >> predicates, e.g. >> >> : >> # xmax_edge_30 = PHI >> _46 = xmax_17 == xmax_37; >> _47 = xmax_17 == xmax_27; >> _48 = _46 & _47; >> _53 = xmax_17 == xmax_37; >> _54 = ~_53; >> _55 = xmax_17 == xmax_27; >> _56 = _54 & _55; >> _57 = _48 | _56; >> xmax_edge_19 = xmax_edge_39 + 1; >> goto ; >> >> It is evident that we can not put phi predication at the block >> beginning but need to put it after predicate computations. >> Note also that if there are no critical edges for phi arguments >> insertion point will be "after labels" Note also that phi result can >> have use in this block too, so we can't put predication code to the >> block end. > > So the issue is that predicate insertion for edge predicates does > not happen on the edge but somewhere else (generally impossible > for critical edges unless you split them). > > I think I've told you before that I prefer simple solutions to such issues, > like splitting the edge! Certainly not involving a function walking > GENERIC expressions. > > Thanks, > Richard. > >> Let me know if you still have any questions. >> >> Best regards. >> Yuri. >> >> >> >> >> 2014-11-28 15:43 GMT+03:00 Richard Biener : >>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev wrote: >>>> Hi All, >>>> >>>> Here is the second patch related to extended predication. >>>> Few comments which explain a main goal of design. >>>> >>>> 1. I don't want to insert any critical edge splitting since it may >>>> lead to less efficient binaries. >>>> 2. One special case of extended PHI node predication was introduced >>>> when #arguments is more than 2 but only two arguments are different >>>> and one argument has the only occurrence. For such PHI conditional >>>> scalar reduction is applied. >>>> This is correspondent to the following statement: >>>> if (q1 && q2 && q3) var++ >>>> New function phi_has_two_different_args was introduced to detect such phi. >>>> 3. Original algorithm for PHI predication used assumption that at >>>> least one incoming edge for blocks containing PHI is not critical - it >>>> guarantees that all computations related to predicate of normal edge >>>> are already inserted above this block and >>>> code related to PHI predication can be inserted at the beginning of >>>> block. But this is not true for critical edges for which predicate >>>> computations are in the block where code for phi predication must be >>>> inserted. So new function find_insertion_point is introduced which is >>>> simply found out the last statement in block defining predicates >>>> correspondent to all incoming edges and insert phi predication code >>>> after it (with some minor exceptions). >>> >>> Unfortunately the patch doesn't apply for me - I get >>> >>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@ >>> predicate_all_scalar_phis (struct loop *loop) >>> >>> a few remarks nevertheless. I don't see how we need both >>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi. >>> Couldn't we simply sort an array of (edge, value) pairs after value >>> and handle equal values specially in predicate_extended_scalar_phi? >>> That would even make PHI more optimal. >>> >>> I don't understand the need for find_insertion_point. All SSA names >>> required for the predicates are defined upward - and the complex CFG >>> is squashed to a single basic-block, thus the defs will dominate the >>> inserted code if you insert after labels just like for the other case. >>> Or what am I missing? ("flattening" of the basic-blocks of course needs >>> to happen in dominator order - but I guess that happens already?) >>> >>> I'd like the extended PHI handling to be enablable by a flag even >>> for !force-vectorization - I've seen cases with 3 PHI args multiple >>> times that would have been nice to vectorize. I suggest to >>> add -ftree-loop-if-convert-aggressive for this. We can do this as >>> followup, but please rename the local flag_force_vectorize flag >>> to something less looking like a flag, like simply 'aggressive'. >>> >>> Otherwise patch 2 looks ok to me. >>> >>> Richard. >>> >>> >>>> ChangeLog: >>>> >>>> 2014-10-24 Yuri Rumyantsev >>>> >>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use >>>> FLAG_FORCE_VECTORIZE instead of loop flag. >>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>> FLAG_FORCE_VECTORIZE is true. >>>> (if_convertible_bb_p): Delete check that bb has at least one >>>> non-critical incoming edge. >>>> (phi_has_two_different_args): New function. >>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access >>>> to phi arguments. Invoke phi_has_two_different_args to get phi >>>> arguments if EXTENDED is true. Change check that block >>>> containing reduction statement candidate is predecessor >>>> of phi-block since phi may have more than two arguments. >>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert >>>> statement before/after gsi point. >>>> (predicate_scalar_phi): Add argument false (which means non-extended >>>> predication) to call of is_cond_scalar_reduction. Add argument >>>> true (which correspondent to argument BEFORE) to call of >>>> convert_scalar_cond_reduction. >>>> (get_predicate_for_edge): New function. >>>> (predicate_arbitrary_scalar_phi): New function. >>>> (predicate_extended_scalar_phi): New function. >>>> (find_insertion_point): New function. >>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and >>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more >>>> than 2 predecessors or both incoming edges are critical. Invoke >>>> find_phi_replacement_condition and predicate_scalar_phi or >>>> find_insertion_point and predicate_extended_scalar_phi depending on >>>> EXTENDED value. >>>> (insert_gimplified_predicates): Add check that non-predicated block >>>> may have statements to insert. Insert predicate of BB just after label >>>> if FLAG_FORCE_VECTORIZE is true. >>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which >>>> is copy of inner or outer loop field force_vectorize.