From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-432542-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 63254 invoked by alias); 26 Jul 2016 13:37:29 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 63240 invoked by uid 89); 26 Jul 2016 13:37:28 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 spammy=utilize, optsc, opts.c, UD:opts.c
X-HELO: mail-wm0-f50.google.com
Received: from mail-wm0-f50.google.com (HELO mail-wm0-f50.google.com) (74.125.82.50) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 26 Jul 2016 13:37:17 +0000
Received: by mail-wm0-f50.google.com with SMTP id q128so173808562wma.1        for <gcc-patches@gcc.gnu.org>; Tue, 26 Jul 2016 06:37:17 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=1e100.net; s=20130820;        h=x-gm-message-state:mime-version:in-reply-to:references:from:date         :message-id:subject:to:cc;        bh=qLXKNGUzzPi3GGmvYxIh63lLc5E62Zqq8Ij5YmTtja0=;        b=ZvWO+sSkXt1wxmnl7sARyP2hU+5sR9blhQQpWB76xfABMMEZJhM7wVWqSvXPGo+CLA         L6HkR8NQ/Y4xW0cpMz45cr/FIE1FjElee0tPdNLCfpi6d1H0Bw3e+i+z+Ih/mncaFz9p         XfcuOMJvSFyF5WJ1P8IMjRwDWFL565sYSymSuJSZOfCS8ratI7n0FzDNJcRceb3pVTmp         ucfqApJpWCxDy7twOQD5OjeClzTGjoDqvEdxmZOKUaDVaOpWCGUhel9Ay9TFXg8YImFb         MQHRNatdo86QqNxl/S3n8d3bNJfJ/pnmLc5K7sfWmzbq626X4Pn7lYOovt0/9E8mTEVC         V6Ew==
X-Gm-Message-State: AEkoous785nEPnON+e7vRuBD4MYSibHDAUQkomcuFcbxviBA0em/+g/CBPYfEWcn6eDNMmI26VX5KhZcLtyK1w==
X-Received: by 10.194.77.142 with SMTP id s14mr20431034wjw.77.1469540234098; Tue, 26 Jul 2016 06:37:14 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.28.137.202 with HTTP; Tue, 26 Jul 2016 06:37:13 -0700 (PDT)
In-Reply-To: <19ff8188-aed7-0f9e-cc0b-0603698787ff@linaro.org>
References: <57886949.8010300@linaro.org> <57886A71.6030103@linaro.org> <CA+=Sn1mYiB-xs=1H0rR_FxyUP0AaXS410C=+jyGu7MHDgsVOjg@mail.gmail.com> <57888BD6.6070200@linaro.org> <CA+=Sn1=qcCwu2VDTGTTfvFiUdDdUPOz78J2PAx0Pu-WMatZ-gQ@mail.gmail.com> <578891C8.7090609@linaro.org> <CAFiYyc2O1rh=UkOY6bvsciXmg4fLcua4xvT+3CZ_QKVO27WJxQ@mail.gmail.com> <e7d718c3-405a-ddd3-45be-e16f989bc91c@linaro.org> <CAFiYyc26xXnOqF=ocCAAnzQKC_ENJydj44QLbwRLo7cK72w3VA@mail.gmail.com> <19ff8188-aed7-0f9e-cc0b-0603698787ff@linaro.org>
From: Richard Biener <richard.guenther@gmail.com>
Date: Tue, 26 Jul 2016 13:37:00 -0000
Message-ID: <CAFiYyc0b3J7a6kB0VoLWGG9_qJ6eK2NuGshQfDpewsipkUh_7g@mail.gmail.com>
Subject: Re: [RFC][IPA-VRP] Early VRP Implementation
To: kugan <kugan.vivekanandarajah@linaro.org>
Cc: Andrew Pinski <pinskia@gmail.com>, "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, 	Jan Hubicka <hubicka@ucw.cz>, Martin Jambor <mjambor@suse.cz>
Content-Type: text/plain; charset=UTF-8
X-IsSubscribed: yes
X-SW-Source: 2016-07/txt/msg01714.txt.bz2

On Tue, Jul 26, 2016 at 2:27 PM, kugan
<kugan.vivekanandarajah@linaro.org> wrote:
>
>
> Hi Riachard,
>
> Thanks for the review. Here is an updated patch with comments below.
>
>> +/* Restore/Pop all the old VRs maintained in the cond_stack.  */
>> +
>> +void evrp_dom_walker::finalize_dom_walker ()
>> +{
>> +  while (!cond_stack.is_empty ())
>> +    {
>> +      tree var = cond_stack.last ().second;
>> +      pop_value_range (var);
>> +      cond_stack.pop ();
>> +    }
>>
>> why is this necessary at all?  Looks like a waste of time (and the
>> stack should be empty here
>> anyway).  I think you leak cond_stack as well, I suppose using
>> auto_vec might work here,
>> you have to check.
>
> Done.
>
>>
>>>>
>>>> +         /* Discover VR when condition is true.  */
>>>> +         if (te == e
>>>> +             && !TREE_OVERFLOW_P (op0)
>>>> +             && !TREE_OVERFLOW_P (op1))
>>>
>>>
>>>
>>> This is mainly to handle gcc.dg/pr38934.c. This would result in ICE from
>>> set_value_range otherwise:
>>>
>>> gcc_assert ((!TREE_OVERFLOW_P (min) || is_overflow_infinity (min))
>>>                   && (!TREE_OVERFLOW_P (max) || is_overflow_infinity
>>> (max)));
>>
>>
>> Ok, instead make sure to remove the overflow flag via
>>
>>   if (TREE_OVERFLOW_P (op0))
>>     op0 = drop_tree_overflow (op0);
>
>  ...
> Done.
>
>>
>> it looks like you want to merge the if ( & EDGE_TRUE_VALUE) and
>> FALSE_VALUE
>> cases and only invert the tree comparison for the false edge.
>
> Done.
>
>>
>> +             tree cond = build2 (code, boolean_type_node, op0, op1);
>> +             extract_range_for_var_from_comparison_expr (&vr, op0, cond);
>>
>> no wasteful tree building here please.  Instead of cond pass in code,
>> op0 and op1
>> as separate arguments.
>
> Done.
>
>>
>> +         /* If we found any usable VR, set the VR to ssa_name and create
>> a
>> +            PUSH old value in the cond_stack with the old VR.  */
>> +         if (vr.type == VR_RANGE || vr.type == VR_ANTI_RANGE)
>> +           {
>> +             value_range *new_vr = vrp_value_range_pool.allocate ();
>> +             memset (new_vr, 0, sizeof (*new_vr));
>> +             *new_vr = vr;
>> +             cond_stack.safe_push (std::make_pair (bb, op0));
>>
>> the memset looks redundant given you copy-assing from vr anyway.
>
> Done.
>
>>
>> You seem to miss the chance to register ranges for x_2 in i_1 < x_2 where
>> both i_1 and x_2 might have ranges that are useful.
>
> I will address this once the basic structure is in place if that is OK with
> you.

Sure.  Just note it down somewhere.

>>
>> push and pop_value_range should be methods in the dom walker class
>> and vrp_stack and cond_stack should be a single stack.  You can omit
>> basic_block if you use a "magic" NULL_TREE var as marker (simply
>> push a NULL_TREE, NULL pair at the end of before_dom_children).
>>
> Done.
>
>
>>>>
>>>> you can use e->flags & EDGE_TRUE_VALUE/EDGE_FALSE_VALUE
>>>>
>>>> why do you need those TREE_OVERFLOW checks?
>>>>
>>>> +             tree cond = build2 (code, boolean_type_node, op0, op1);
>>>> +             tree a = build2 (ASSERT_EXPR, TREE_TYPE (op0), op0, cond);
>>>> +             extract_range_from_assert (&vr, a);
>>>>
>>>> so I was hoping that the "refactoring" patch in the series would expose
>>>> a
>>>> more
>>>> useful interface than extract_range_from_assert ... namely one that can
>>>> extract a range from the comparison directly and does not require
>>>> building
>>>> a scratch ASSERT_EXPR.
>>>
>>>
>>>
>>> I have split extract_range_from_assert into
>>> extract_range_for_var_from_comparison_expr and using it now. Is that
>>> better?
>>
>>
>> See above.
>>
>>>>
>>>>
>>>> +         /* If we found any usable VR, set the VR to ssa_name and
>>>> create
>>>> a
>>>> +            restore point in the cond_stack with the  old VR. */
>>>> +         if (vr.type == VR_RANGE || vr.type == VR_ANTI_RANGE)
>>>> +           {
>>>> +             value_range *new_vr = XCNEW (value_range);
>>>> +             *new_vr = vr;
>>>> +             cond_stack.safe_push (std::make_pair (bb,
>>>> +                                                   std::make_pair (op0,
>>>> +
>>>> old_vr)));
>>>> +             change_value_range (op0, new_vr);
>>>>
>>>> I don't like 'change_value_range' as interface, please integrate that
>>>> into
>>>> a push/pop_value_range style interface instead.
>>>
>>>
>>>
>>> Tried using push/pop interface.
>>>>
>>>>
>>>>
>>>> +       vrp_visit_stmt (stmt, &taken_edge_p, &output_p);
>>>> +    }
>>>> +
>>>> +  return NULL;
>>>>
>>>> you should return taken_edge_p (misnamed as it isn't a pointer) and take
>>>> advantage of EDGE_EXECUTABLE.  Again see DOM/SCCVN (might want to
>>>> defer this as a followup improvement).
>>>
>>>
>>>
>>> I have added a TODO to this effect and will comeback to it.
>>>>
>>>>
>>>>
>>>> Note that the advantage of a DOM-based VRP is that backtracking is easy
>>>> to implement (you don't do that yet).  That is, after DEF got a (better)
>>>> value-range you can simply re-visit all the defs of its uses (and
>>>> recursively).
>>>> I think you have to be careful with stmts that might prematurely leave a
>>>> BB
>>>> though (like via EH).  So sth for a followup as well.
>>>
>>>
>>>
>>> Is this OK now. Bootstrapped and regression tested on x86_64-linux  with
>>> no
>>> new regressions.
>>
>>
>> Better, still not perfect.
>>
>> I'm also arguing on the pass placement now - you put it where it may make
>> sense
>> for IPA VRP (but then IPA VRP could simply do this in its analysis phase)
>> but
>> not so much where it makes sense during the early pipeline.  I think it
>> makes
>> sense after FRE.
>>
>> Looking at how you finalize I see several issues with simply re-using
>> vrp_finalize.
>>
>> First of all the loop doing the set_range_info will turn up with
>> nothing as you've
>> popped off all value-ranges from the lattice.  Same for
>> substitute-and-fold (or
>> jump-threading).
>
> I am not sure understanding you. I am poping the value ranges for scope when
> we leave them. Other value ranges which lives in all the regions will
> remain.

Ah, yeah.  Sorry for the confusion.

>>
>> What you instead need to do with the DOM scheme is at the point you
>> call vrp_visit_stmt do sth like
>>
>>      vrp_visit_stmt (....);
>>      if (fold_stmt (&gsi, follow_single_use_edges))
>>        update_stmt ();
>
> I have added this. I think this will be good as we would be able to optimize
> with value ranges that is valid within the scope.

Yes, it also improves memory access locality and thus compile-time I think.

>>      tree def = SINGLE_SSA_DEF_OPERAND (stmt, SSA_OP_DEF);
>>      if (def
>>          && INTEGRAL_TYPE_P (TREE_TYPE (def))
>>          && (TREE_CODE (vr_value[i]->min) == INTEGER_CST)
>>           && (TREE_CODE (vr_value[i]->max) == INTEGER_CST)
>>           && (vr_value[i]->type == VR_RANGE
>>               || vr_value[i]->type == VR_ANTI_RANGE))
>>         set_range_info (name, vr_value[i]->type, vr_value[i]->min,
>>                         vr_value[i]->max);
>>
> I am not sure if this is needed. I dont know if my push/pop value range
> implementation is not what you wanted. If you still prefer, I am happy to
> add this.

See above, also if you fold at this point names used should have their
ranges updated so match.pd patterns can utilize them.

I see you are still calling substitute_and_fold though so if you want to
continue doing that (for now) then you don't need any of the above.
If you stop doing that then you need to supply vrp_valueize instead of
follow_single_use_edges to fold_stmt, otherwise you won't get any
constants propagated.

I'd prefer to not call substitute_and_fold as that is a function of the
SSA propagator.

>
>> thus please split vrp_finalize into two parts, one of it with the memory
>> freeing which is the one you'd call.
>>
>> Note that EVRP is not enabled by default at any optimization level
>> it seems so bootstrap / test of it would be useless (did you only
>> test with the full series?  I never looked at the IPA VRP part)
>>
> I have:
>
> +ftree-evrp
> +Common Report Var(flag_tree_early_vrp) Init(1) Optimization
> +Perform Early Value Range Propagation on trees.
>
> I would like to run this only for -O2 and above but for now I am using this
> to test.

There is an array in opts.c, default_options_table, where you can set
defaults based on the optimization level.  If you want to turn it on
at -O2 that would match the setting for -ftree-vrp in which case I would
rather not add another option.  For debugging one can use
-fdisable-tree-evrp1 for example.

+value_range *evrp_dom_walker::pop_value_range (const_tree var)
+{

coding style nit: newline missing after return type.

It seems that in your pop_value_range you assume you only pop one
range per BB - while that's likely true at the moment it will be a limitation
in the future.  You want to pop ranges until you hit the NULL marker
in after_dom_children and unconditionally push a NULL marker.

For example to match current VRPs behavior on say

   i_2 = (int) j_3;
   if (i_2 < 0)
     ...

which can register an assert for j_3 when i_2 < 0 is true we'd do that
by re-simulating DEFs of uses we figured out new ranges of (and all
their uses).  All those ranges would be temporary as well, thus they'd
need to be pushed/popped.  In my quick prototype this was done
using a worklist seeded by the names we can derive a range from from
conditionals and "SSA propagating" from it.  Note that for this
the generic vrp_visit_stmt cannot be re-used as it doesn't push/pop,
factoring out the lattice update is what is needed here.

+/* Visit the basic blocks in the dominance order and set the Value Ranges (VR)
+   for SSA_NAMEs in the scope.  Use this VR to discover more VRs.  Restore the
+   old VR once the scope is exited.  */
+
+static bool
+evrp_visit_phi_node_local (gphi *phi)
+{
+  size_t i;
+  tree lhs = PHI_RESULT (phi);
+  value_range vr_result = VR_INITIALIZER;
+  bool first = true;
+  int edges;
+
+  edges = 0;
+  for (i = 0; i < gimple_phi_num_args (phi); i++)
+    {
+      edge e = gimple_phi_arg_edge (phi, i);
+      tree arg = PHI_ARG_DEF (phi, i);
+      value_range vr_arg = VR_INITIALIZER;
+      ++edges;
+
+      /* If there is a back-edge, set the result to VARYING.  */
+      if (e->flags & (EDGE_DFS_BACK | EDGE_COMPLEX))
+       {
+         set_value_range_to_varying (&vr_result);
+         break;
+       }
...
+      /* If any of the RHS value is VARYING, set the result to VARYING.  */
+      if ((vr_arg.type != VR_RANGE)
+         && (vr_arg.type != VR_ANTI_RANGE))
+       {
+         set_value_range_to_varying (&vr_result);
+         break;
+       }

this shows that you need to start conservative for a DOM based VRP,
thus with all lattice values initialized to VARYING (but undefined SSA
names of course still can be UNDEFINED) rather than UNDEFINED.

+      if (TREE_CODE (arg) == SSA_NAME)
+       vr_arg = *(get_value_range (arg));
+      else
+       set_value_range_to_varying (&vr_arg);

err - what about constants?  When you initialize the lattice properly
you should be able to re-use vrp_visit_phi_node (maybe split out
its head to avoid using SCEV or the iteration limitation).

Btw, you don't want to call vrp_initialize in evrp either, this is setting
SSA propagator state which you do not want to do.  Please factor
out lattice allocation/deallocation instead.  I see that might require
really factoring vrp_visit_stmt into a function that omits updating
the lattice and just returns a range for the single DEF.

That said, a good refactoring is to split the SSA propagator callback
implementations (vrp_visit_stmt and vrp_visit_phi_node) into workers
returning a value range and the callback that does the update_value_range
call plus returing a SSA propgator state.  You can then re-use the worker.

Thanks,
Richard.

> I  have tested the last set of patch separately.
>
> I will do more testing on this patch based on your feedback. Does this look
> better?
>
> Thanks,
> Kugan
>
>