On 2018-06-20 04:23 AM, Richard Biener wrote:
> On Wed, Jun 20, 2018 at 7:31 AM Jeff Law <law@redhat.com> wrote:
>>
>> On 06/19/2018 12:30 PM, Michael Ploujnikov wrote:
>>> Hi everyone,
>>>
>>> (I hope this is the right place to ask, if not my apologies; please
>>> point me in the right direction)
>>>
>>> I'm trying to get a better understanding of the following part in
>>> tree_swap_operands_p():
>>>
>>>   /* It is preferable to swap two SSA_NAME to ensure a canonical form
>>>      for commutative and comparison operators.  Ensuring a canonical
>>>      form allows the optimizers to find additional redundancies without
>>>      having to explicitly check for both orderings.  */
>>>   if (TREE_CODE (arg0) == SSA_NAME
>>>       && TREE_CODE (arg1) == SSA_NAME
>>>       && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
>>>     return 1;
>>>
>>> My questions in no particular order: It looks like this was added in
>>> 2004. I couldn't find any info other than what's in the corresponding
>>> commit (cc0bdf913) so I'm wondering if the canonical form/order still
>>> relevant/needed today? Does the ordering have to be done based on the
>>> name versions specifically? Or can it be based on something more
>>> intrinsic to the input source code rather than a GCC internal value, eg:
>>> would alphabetic sort order of the variable names be a reasonable
>>> replacement?
>> Canonicalization is still important and useful.
> 
> Indeed.
> 
>> However, canonicalizing on SSA_NAMEs is problematical due to the way we
>> recycle nodes and re-pack them.
> 
> In the past we made sure to not disrupt order - hopefully that didn't change
> so the re-packing shoudln't invaidate previous canonicalization:
> 
> static void
> release_free_names_and_compact_live_names (function *fun)
> {
> ...
>   /* And compact the SSA number space.  We make sure to not change the
>      relative order of SSA versions.  */
>   for (i = 1, j = 1; i < fun->gimple_df->ssa_names->length (); ++i)
>     {
> 
> 
>> I think defining additional rules for canonicalization prior to using
>> SSA_NAME_VERSION as the fallback would be looked upon favorably.
> 
> I don't see a good reason to do that, it will be harder to spot canonicalization
> issues and it will take extra compile-time.
> 
>> Note however, that many of the _DECL nodes referenced by SSA_NAMEs are
>> temporaries generated by the compiler and do not correspond to any
>> declared/defined object in the original source.  So you'll still need
>> the SSA_NAME_VERSION (or something as stable or better) canonicalization
>> to handle those cases.
> 
> And not all SSA_NAMEs have underlying _DECL nodes (or IDENTIFIER_NODE names).
> 
> Richard.
> 
>> Jeff

After a bit more digging I found that insert_phi_nodes inserts PHIs in
the order of UIDs, which indirectly affects the order of vars in
old_ssa_names, which in turn affects the order in which make_ssa_name_fn
is called to pick SSA versions from FREE_SSANAMES so in the end
ordering by SSA_NAME_VERSION's is more or less equivalent to ordering by
UIDs. I'm trying to figure out if there's a way to avoid depending on
UIDs being ordered in a certain way. So if tree_swap_operands_p stays
the same I'm wondering if there's some other info available at the point
of insert_phi_nodes that would be a good replacement for UID. From my
very limited experience with a very small source input, and if I
understand things correctly, all of the var_infos have a var which is
DECL_P and thus should have an IDENTIFIER_NODE. Is that true in the
general case? I don't fully understand what are all the things that
insert_phi_nodes iterates over.

- Michael