public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Sandiford <richard.sandiford@linaro.org>
To: Richard Biener <richard.guenther@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: Handle data dependence relations with different bases
Date: Thu, 04 May 2017 17:23:00 -0000	[thread overview]
Message-ID: <87a86sfsgg.fsf@linaro.org> (raw)
In-Reply-To: <CAFiYyc0y66Rc9OU_ocZrvNZ9ogjTdzc3VhcGAWhBUdOdKg4H5g@mail.gmail.com>	(Richard Biener's message of "Thu, 4 May 2017 14:32:43 +0200")

Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, May 4, 2017 at 2:12 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> This patch tries to calculate conservatively-correct distance
>>> vectors for two references whose base addresses are not the same.
>>> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence
>>> isn't guaranteed to occur.
>>>
>>> The motivating example is:
>>>
>>>   struct s { int x[8]; };
>>>   void
>>>   f (struct s *a, struct s *b)
>>>   {
>>>     for (int i = 0; i < 8; ++i)
>>>       a->x[i] += b->x[i];
>>>   }
>>>
>>> in which the "a" and "b" accesses are either independent or have a
>>> dependence distance of 0 (assuming -fstrict-aliasing).  Neither case
>>> prevents vectorisation, so we can vectorise without an alias check.
>>>
>>> I'd originally wanted to do the same thing for arrays as well, e.g.:
>>>
>>>   void
>>>   f (int a[][8], struct b[][8])
>>>   {
>>>     for (int i = 0; i < 8; ++i)
>>>       a[0][i] += b[0][i];
>>>   }
>>>
>>> I think this is valid because C11 6.7.6.2/6 says:
>>>
>>>   For two array types to be compatible, both shall have compatible
>>>   element types, and if both size specifiers are present, and are
>>>   integer constant expressions, then both size specifiers shall have
>>>   the same constant value.
>>>
>>> So if we access an array through an int (*)[8], it must have type X[8]
>>> or X[], where X is compatible with int.  It doesn't seem possible in
>>> either case for "a[0]" and "b[0]" to overlap when "a != b".
>>>
>>> However, Richard B said that (at least in gimple) we support arbitrary
>>> overlap of arrays and allow arrays to be accessed with different
>>> dimensionality.  There are examples of this in PR50067.  I've therefore
>>> only handled references that end in a structure field access.
>>>
>>> There are two ways of handling these dependences in the vectoriser:
>>> use them to limit VF, or check at runtime as before.  I've gone for
>>> the approach of checking at runtime if we can, to avoid limiting VF
>>> unnecessarily.  We still fall back to a VF cap when runtime checks
>>> aren't allowed.
>>>
>>> The patch tests whether we queued an alias check with a dependence
>>> distance of X and then picked a VF <= X, in which case it's safe to
>>> drop the alias check.  Since vect_prune_runtime_alias_check_list can
>>> be called twice with different VF for the same loop, it's no longer
>>> safe to clear may_alias_ddrs on exit.  Instead we should use
>>> comp_alias_ddrs to check whether versioning is necessary.
>>>
>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>
>> You seem to do your "fancy" thing but also later compute the old
>> base equality anyway (for same_base_p).  It looks to me for this
>> case the new fancy code can be simply skipped, keeping num_dimensions
>> as before?
>>
>> +      /* Try to approach equal type sizes.  */
>> +      if (!COMPLETE_TYPE_P (type_a)
>> +         || !COMPLETE_TYPE_P (type_b)
>> +         || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a))
>> +         || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b)))
>> +       break;
>>
>> ah, interesting idea to avoid a quadratic search.  Note that you should
>> conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR
>> as they are used for type-punning.

All the component refs here should be REALPART_EXPRs, IMAGPART_EXPRs,
ARRAY_REFs or COMPONENT_REFs of structures, since that's all that
dr_analyze_indices allows, so I think we safe in terms of the tree codes.

>> I see nonoverlapping_component_refs_of_decl_p should simply skip
>> ARRAY_REFs - but I also see there:
>>
>>       /* ??? We cannot simply use the type of operand #0 of the refs here
>>          as the Fortran compiler smuggles type punning into COMPONENT_REFs
>>          for common blocks instead of using unions like everyone else.  */
>>       tree type1 = DECL_CONTEXT (field1);
>>       tree type2 = DECL_CONTEXT (field2);
>>
>> so you probably can't simply use TREE_TYPE (outer_ref) for type compatibility.
>> You also may not use types_compatible_p here as for LTO that is _way_ too
>> lax for aggregates.  The above uses
>>
>>       /* We cannot disambiguate fields in a union or qualified union.  */
>>       if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
>>          return false;
>>
>> so you should also bail out on unions here, rather than the check you do later.

The loop stops before we get to a union, so I think "only" the RECORD_TYPE
COMPONENT_REF handling is a potential problem.  Does this mean that
I should use the nonoverlapping_component_refs_of_decl_p code:

      tree field1 = TREE_OPERAND (ref1, 1);
      tree field2 = TREE_OPERAND (ref2, 1);

      /* ??? We cannot simply use the type of operand #0 of the refs here
	 as the Fortran compiler smuggles type punning into COMPONENT_REFs
	 for common blocks instead of using unions like everyone else.  */
      tree type1 = DECL_CONTEXT (field1);
      tree type2 = DECL_CONTEXT (field2);

      /* We cannot disambiguate fields in a union or qualified union.  */
      if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
	 return false;

      if (field1 != field2)
	{
	  /* A field and its representative need to be considered the
	     same.  */
	  if (DECL_BIT_FIELD_REPRESENTATIVE (field1) == field2
	      || DECL_BIT_FIELD_REPRESENTATIVE (field2) == field1)
	    return false;
	  /* Different fields of the same record type cannot overlap.
	     ??? Bitfields can overlap at RTL level so punt on them.  */
	  if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2))
	    return false;
	  return true;
	}

as the disambiguation test for COMPONENT_REFs, instead of types_compatible_p
during the new loop?  And test for this as well as unions in the outer
references?

>> You seem to rely on getting an access_fn entry for each handled_component_p.
>> It looks like this is the case -- we even seem to stop at unions (with the same
>> fortran "issue").  I'm not sure that's the best thing to do but you
>> rely on that.

Yeah, the loop is deliberately limited to the components associated with
an access_fn.  I did wonder at first whether dr_analyze_indices should
store the original component reference trees for each access function.
That would make things simpler and more explicit, but would also eat up
more memory.  Things like object_address_invariant_in_loop_p rely on the
access_fns in the same way that the loop in the patch does.

>> I don't understand the looping, it needs more comments.  You seem to be
>> looking for the innermost compatible RECORD_TYPE but then num_dimensions
>> is how many compatible refs you found on the way (with incompatible ones
>> not counting?!).  What about an inner varying array of structs?  This seems to
>> be disregarded in the analysis now?  Thus, a[i].s.b[i].j vs. __real
>> b[i].s.b[i].j?

I'll try to improve the comments.  But the idea is that both sequences are
as long as possible, while that still gives compatible types.  If there is
more than one such sequence, we pick the one nearest the base.

So in your example, the access functions would be:

               0   1   2   3   4
  a:          .j [i]  .b  .s [i]

           0   1   2   3   4   5
  b:  __real  .j [i]  .b  .s [i]

If a and b are pointers, the final access functions would be
unconstrained base accesses, so we'd end up with:

  a: [0, 3]
  b: [1, 4]

for both sequences.

>> nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p
>> conveniently start from the other
>> end of the ref here.
>
> That said, for the motivational cases we either have one ref having
> more dimensions than the other (the __real vs. full complex access) or
> they have the same number of dimensions (and no access fn for the
> base).
>
> For the first case we should simply "drop" access_fns of the larger
> dimensional ref (from the start, plus outer component refs) up to the
> point the number of dimensions are equal.

Yeah, that's what happens for your example.  But if we had:

    a[i].s.c.d
    __real b[i].s.b[i].j

(where d is the same type as the real component) then the access
functions would be:

                   0   1   2   3
  a:              .d  .c  .s [i]

           0   1   2   3   4   5
  b:  __real  .j [i]  .b  .s [i]

Comparing the a0/b2 column doesn't make sense, because one's an array
and the other is a structure.  In this case the sequence we care about is:

  a: [1, 3]
  b: [3, 5]

which is what the loop gives.  The a1/b3 column is the one that proves
there's no dependence.

> Then we have the case of
>
>   ! types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b))
>
> where we have to punt.
>
> Then we have the case of
>
>   ! operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
>
> which is where the new code should kick in to see if we can drop access_fns
> from the other end (as unanalyzable but either having distance zero or not
> aliased because of TBAA).
>
> At least your testcases suggest you do not want to handle
>
>  struct s { int x[N]; };
>  struct r { struct s s; };
>  f (struct s *a, struct r *b)
>  {
>     for (i = 0; i < N; ++i)
>       a->s.x[i] = b->x[i];
>  }
>
> ?
>
> With this example your loop which seems to search for a "common"
> sequence in (different) midst of the reference trees makes more sense
> (still that loop is awkward to understand).

Yeah, I want to handle that too, just hadn't thought of it as a specific
testcase.  The code does give the expected dependence distance of 0.

Thanks,
Richard

  reply	other threads:[~2017-05-04 17:21 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-03  8:02 Richard Sandiford
2017-05-04  9:36 ` Bin.Cheng
2017-05-04 10:07   ` Richard Sandiford
2017-05-04 10:59     ` Bin.Cheng
2017-05-04 12:11       ` Richard Sandiford
2017-05-04 12:13 ` Richard Biener
2017-05-04 12:37   ` Richard Biener
2017-05-04 17:23     ` Richard Sandiford [this message]
2017-05-09 10:10       ` Richard Biener
2017-05-18  8:35         ` Richard Sandiford
2017-05-31  6:57           ` Richard Sandiford
2017-06-07 10:17             ` Richard Biener
2017-06-07 21:27               ` Richard Sandiford
2017-06-15 13:45                 ` [PING, Ada] " Richard Sandiford
2017-06-22 11:34                   ` [PING*2, " Richard Sandiford
2017-06-29 17:20                     ` [PING*3, " Richard Sandiford
2017-07-04  9:40                 ` Eric Botcazou
2017-07-07 14:01                   ` Richard Sandiford
2017-07-07 16:03                     ` Eric Botcazou
2017-07-27 12:20                     ` Richard Sandiford
2017-08-03 20:52                       ` Richard Sandiford
2017-08-04  9:06                       ` Richard Biener
2017-08-04  9:29                         ` Richard Sandiford
2017-08-04  9:35                           ` Richard Biener
2017-05-05 22:03   ` Bernhard Reutner-Fischer
2017-05-09 10:21     ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87a86sfsgg.fsf@linaro.org \
    --to=richard.sandiford@linaro.org \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).