From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-386343-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 10759 invoked by alias); 2 Dec 2014 15:29:06 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 10741 invoked by uid 89); 2 Dec 2014 15:29:04 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.8 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-wi0-f173.google.com
Received: from mail-wi0-f173.google.com (HELO mail-wi0-f173.google.com) (209.85.212.173) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Tue, 02 Dec 2014 15:29:02 +0000
Received: by mail-wi0-f173.google.com with SMTP id r20so28420142wiv.12        for <gcc-patches@gcc.gnu.org>; Tue, 02 Dec 2014 07:29:00 -0800 (PST)
MIME-Version: 1.0
X-Received: by 10.180.95.68 with SMTP id di4mr5982996wib.49.1417534138880; Tue, 02 Dec 2014 07:28:58 -0800 (PST)
Received: by 10.216.77.73 with HTTP; Tue, 2 Dec 2014 07:28:58 -0800 (PST)
In-Reply-To: <CAFiYyc2TKP4FxbZ7L8=iDVgx-nQC_Dorz5tqaidSa+5vS_EAPQ@mail.gmail.com>
References: <CAEoMCqSS=M5vOV46ZdjNKpELqGqFHOi5ct39QMRg5NdO5uzfWg@mail.gmail.com>	<CAFiYyc3ONiHL1nsLK1MSEqZR94pHYmb+rK-M24n2fzACvFDiVA@mail.gmail.com>	<CAEoMCqQX5qEExZ7Xtz_JNzus5FQizvWkbFSW5y+TGNjt71BHaQ@mail.gmail.com>	<CAFiYyc2TKP4FxbZ7L8=iDVgx-nQC_Dorz5tqaidSa+5vS_EAPQ@mail.gmail.com>
Date: Tue, 02 Dec 2014 15:29:00 -0000
Message-ID: <CAEoMCqR3i8PgA3RF9dS6QrNBpBAujiaEN0xtLqeMUrcgMCgLfA@mail.gmail.com>
Subject: Re: [PATCH 2/3] Extended if-conversion
From: Yuri Rumyantsev <ysrumyan@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>, Igor Zamyatin <izamyatin@gmail.com>
Content-Type: text/plain; charset=UTF-8
X-SW-Source: 2014-12/txt/msg00172.txt.bz2

Thanks Richard for your quick reply!

1. I agree that we can combine predicate_extended_ and
predicate_arbitrary_ to one function as you proposed.
2. What is your opinion about using more simple decision about
insertion point - if bb has use of phi result insert phi predication
before it and at the bb end otherwise. I assume that critical edge
splitting is not a good decision.

Best regards.
Yuri.

2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Hi Richard,
>>
>> I resend you patch1 and patch2 with minor changes:
>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>> I also very sorry that I sent you bad patch.
>>
>> Now let me answer on your questions related to second patch.
>> 1. Why we need both predicate_extended_scalar_phi and
>> predicate_arbitrary_scalar_phi?
>>
>> Let's consider the following simple test-case:
>>
>>   #pragma omp simd safelen(8)
>>   for (i=0; i<512; i++)
>>   {
>>     float t = a[i];
>>     if (t > 0.0f & t < 1.0e+17f)
>>       if (c[i] != 0)  /* c is integer array. */
>> res += 1;
>>   }
>>
>> we can see the following phi node correspondent to res:
>>
>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>
>> It is clear that we can optimize it to phi node with 2 arguments only
>> and only one check can be used for phi predication (for reduction in
>> our case), namely predicate of bb_5. In general case we can't do it
>> even if we sort all phi argument values since we still have to produce
>> a chain of cond expressions to perform phi predication (see comments
>> for predicate_arbitrary_scalar_phi).
>
> How so?  We can always use !(condition) for the "last" value, thus
> treat it as an 'else' case.  That even works for
>
> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>
> where the condition for edges 5 and 7 can be computed as
> ! (condition for 3 || condition for 4).
>
> Of course it is worthwhile to also sort single-occurances first
> so your case gets just the condiiton for edge 5 and its inversion
> used for edges 3 and 4 combined.
>
>> 2. Why we need to introduce find_insertion_point?
>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>> only critical incoming edges and both contain code computing edge
>> predicates, e.g.
>>
>> <bb 7>:
>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>> _46 = xmax_17 == xmax_37;
>> _47 = xmax_17 == xmax_27;
>> _48 = _46 & _47;
>> _53 = xmax_17 == xmax_37;
>> _54 = ~_53;
>> _55 = xmax_17 == xmax_27;
>> _56 = _54 & _55;
>> _57 = _48 | _56;
>> xmax_edge_19 = xmax_edge_39 + 1;
>> goto <bb 11>;
>>
>> It is evident that we can not put phi predication at the block
>> beginning but need to put it after predicate computations.
>> Note also that if there are no critical edges for phi arguments
>> insertion point will be "after labels" Note also that phi result can
>> have use in this block too, so we can't put predication code to the
>> block end.
>
> So the issue is that predicate insertion for edge predicates does
> not happen on the edge but somewhere else (generally impossible
> for critical edges unless you split them).
>
> I think I've told you before that I prefer simple solutions to such issues,
> like splitting the edge!  Certainly not involving a function walking
> GENERIC expressions.
>
> Thanks,
> Richard.
>
>> Let me know if you still have any questions.
>>
>> Best regards.
>> Yuri.
>>
>>
>>
>>
>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi All,
>>>>
>>>> Here is the second patch related to extended predication.
>>>> Few comments which explain a main goal of design.
>>>>
>>>> 1. I don't want to insert any critical edge splitting since it may
>>>> lead to less efficient binaries.
>>>> 2. One special case of extended PHI node predication was introduced
>>>> when #arguments is more than 2 but only two arguments are different
>>>> and one argument has the only occurrence. For such PHI conditional
>>>> scalar reduction is applied.
>>>> This is correspondent to the following statement:
>>>>     if (q1 && q2 && q3) var++
>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>> 3. Original algorithm for PHI predication used assumption that at
>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>> guarantees that all computations related to predicate of normal edge
>>>> are already inserted above this block and
>>>> code related to PHI predication can be inserted at the beginning of
>>>> block. But this is not true for critical edges for which predicate
>>>> computations are  in the block where code for phi predication must be
>>>> inserted. So new function find_insertion_point is introduced which is
>>>> simply found out the last statement in block defining predicates
>>>> correspondent to all incoming edges and insert phi predication code
>>>> after it (with some minor exceptions).
>>>
>>> Unfortunately the patch doesn't apply for me - I get
>>>
>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>> predicate_all_scalar_phis (struct loop *loop)
>>>
>>> a few remarks nevertheless.  I don't see how we need both
>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>> and handle equal values specially in predicate_extended_scalar_phi?
>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>
>>> I don't understand the need for find_insertion_point.  All SSA names
>>> required for the predicates are defined upward - and the complex CFG
>>> is squashed to a single basic-block, thus the defs will dominate the
>>> inserted code if you insert after labels just like for the other case.
>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>> to happen in dominator order - but I guess that happens already?)
>>>
>>> I'd like the extended PHI handling to be enablable by a flag even
>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>> times that would have been nice to vectorize.  I suggest to
>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>> followup, but please rename the local flag_force_vectorize flag
>>> to something less looking like a flag, like simply 'aggressive'.
>>>
>>> Otherwise patch 2 looks ok to me.
>>>
>>> Richard.
>>>
>>>
>>>> ChangeLog:
>>>>
>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>> FLAG_FORCE_VECTORIZE is true.
>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>> non-critical incoming edge.
>>>> (phi_has_two_different_args): New function.
>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>> arguments if EXTENDED is true. Change check that block
>>>> containing reduction statement candidate is predecessor
>>>> of phi-block since phi may have more than two arguments.
>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>> statement before/after gsi point.
>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>> true (which correspondent to argument BEFORE) to call of
>>>> convert_scalar_cond_reduction.
>>>> (get_predicate_for_edge): New function.
>>>> (predicate_arbitrary_scalar_phi): New function.
>>>> (predicate_extended_scalar_phi): New function.
>>>> (find_insertion_point): New function.
>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>> EXTENDED value.
>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>> may have statements to insert. Insert predicate of BB just after label
>>>> if FLAG_FORCE_VECTORIZE is true.
>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>> is copy of inner or outer loop field force_vectorize.