From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-453106-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 32420 invoked by alias); 9 May 2017 10:09:07 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 32409 invoked by uid 89); 9 May 2017 10:09:06 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,RCVD_IN_SORBS_SPAM,SPF_PASS autolearn=no version=3.3.2 spammy=conveniently, disambiguation, type2, Things
X-HELO: mail-oi0-f53.google.com
Received: from mail-oi0-f53.google.com (HELO mail-oi0-f53.google.com) (209.85.218.53) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 09 May 2017 10:09:04 +0000
Received: by mail-oi0-f53.google.com with SMTP id h4so81182642oib.3        for <gcc-patches@gcc.gnu.org>; Tue, 09 May 2017 03:09:07 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=1e100.net; s=20161025;        h=x-gm-message-state:mime-version:in-reply-to:references:from:date         :message-id:subject:to;        bh=Mgo1zWrR9RJoASG9vR/DUuqbKfiNdazcEyHQZnr4Wvg=;        b=OQYeFPu/YAi1BDJ/QQcfqE4elds2EbqGu6wCxBJAF8rKR6JtmqgNn5A337g8mj8PP0         1XzzH9DnDNEbV35SiZ/O27CmJych97zieC5nhX/YLGxGgJPqBvPD3/z5W0+NfkLjOnUZ         ZebTpLa31xhY5YA/F/Rfua2UXujgW/F0uoKP/blc0NIootMjn7jNxn1aJyKIwUQuf2Rv         O2Vf4hVPzI/5YBxTf06NfBY86lYPeSO0RYyUCOZmUhXV7QjQyq5XIof79Ibm1/uVR62J         /twk0pNZ6G9SFNdth4P/PTFqmoID7aV2+tgoOe76zjVOH6QkEVtbqj33nrkgWUnJb/yF         7/Wg==
X-Gm-Message-State: AN3rC/6wFv9bKbtmN8hqQSbClPHTlYyxz+g8vk0KBCOkpx92wJs1prnP	Bh0eJsyKtsVjS3pcHTWGjMCozJZvAg==
X-Received: by 10.157.46.234 with SMTP id w97mr24344450ota.78.1494324545246; Tue, 09 May 2017 03:09:05 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.157.51.83 with HTTP; Tue, 9 May 2017 03:09:04 -0700 (PDT)
In-Reply-To: <87a86sfsgg.fsf@linaro.org>
References: <87tw52s73r.fsf@linaro.org> <CAFiYyc0XFciO8BEitmMSuDQbVTc60FXiaCKtQxoiSQ_RjKM4dQ@mail.gmail.com> <CAFiYyc0y66Rc9OU_ocZrvNZ9ogjTdzc3VhcGAWhBUdOdKg4H5g@mail.gmail.com> <87a86sfsgg.fsf@linaro.org>
From: Richard Biener <richard.guenther@gmail.com>
Date: Tue, 09 May 2017 10:10:00 -0000
Message-ID: <CAFiYyc3jfLvP7t06mJoWHHuQxKRQ1XdaMru+x1Q3rLxVizjPFg@mail.gmail.com>
Subject: Re: Handle data dependence relations with different bases
To: Richard Biener <richard.guenther@gmail.com>, GCC Patches <gcc-patches@gcc.gnu.org>, 	Richard Sandiford <richard.sandiford@linaro.org>
Content-Type: text/plain; charset=UTF-8
X-IsSubscribed: yes
X-SW-Source: 2017-05/txt/msg00616.txt.bz2

On Thu, May 4, 2017 at 7:21 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Thu, May 4, 2017 at 2:12 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> This patch tries to calculate conservatively-correct distance
>>>> vectors for two references whose base addresses are not the same.
>>>> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence
>>>> isn't guaranteed to occur.
>>>>
>>>> The motivating example is:
>>>>
>>>>   struct s { int x[8]; };
>>>>   void
>>>>   f (struct s *a, struct s *b)
>>>>   {
>>>>     for (int i = 0; i < 8; ++i)
>>>>       a->x[i] += b->x[i];
>>>>   }
>>>>
>>>> in which the "a" and "b" accesses are either independent or have a
>>>> dependence distance of 0 (assuming -fstrict-aliasing).  Neither case
>>>> prevents vectorisation, so we can vectorise without an alias check.
>>>>
>>>> I'd originally wanted to do the same thing for arrays as well, e.g.:
>>>>
>>>>   void
>>>>   f (int a[][8], struct b[][8])
>>>>   {
>>>>     for (int i = 0; i < 8; ++i)
>>>>       a[0][i] += b[0][i];
>>>>   }
>>>>
>>>> I think this is valid because C11 6.7.6.2/6 says:
>>>>
>>>>   For two array types to be compatible, both shall have compatible
>>>>   element types, and if both size specifiers are present, and are
>>>>   integer constant expressions, then both size specifiers shall have
>>>>   the same constant value.
>>>>
>>>> So if we access an array through an int (*)[8], it must have type X[8]
>>>> or X[], where X is compatible with int.  It doesn't seem possible in
>>>> either case for "a[0]" and "b[0]" to overlap when "a != b".
>>>>
>>>> However, Richard B said that (at least in gimple) we support arbitrary
>>>> overlap of arrays and allow arrays to be accessed with different
>>>> dimensionality.  There are examples of this in PR50067.  I've therefore
>>>> only handled references that end in a structure field access.
>>>>
>>>> There are two ways of handling these dependences in the vectoriser:
>>>> use them to limit VF, or check at runtime as before.  I've gone for
>>>> the approach of checking at runtime if we can, to avoid limiting VF
>>>> unnecessarily.  We still fall back to a VF cap when runtime checks
>>>> aren't allowed.
>>>>
>>>> The patch tests whether we queued an alias check with a dependence
>>>> distance of X and then picked a VF <= X, in which case it's safe to
>>>> drop the alias check.  Since vect_prune_runtime_alias_check_list can
>>>> be called twice with different VF for the same loop, it's no longer
>>>> safe to clear may_alias_ddrs on exit.  Instead we should use
>>>> comp_alias_ddrs to check whether versioning is necessary.
>>>>
>>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>>
>>> You seem to do your "fancy" thing but also later compute the old
>>> base equality anyway (for same_base_p).  It looks to me for this
>>> case the new fancy code can be simply skipped, keeping num_dimensions
>>> as before?
>>>
>>> +      /* Try to approach equal type sizes.  */
>>> +      if (!COMPLETE_TYPE_P (type_a)
>>> +         || !COMPLETE_TYPE_P (type_b)
>>> +         || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a))
>>> +         || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b)))
>>> +       break;
>>>
>>> ah, interesting idea to avoid a quadratic search.  Note that you should
>>> conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR
>>> as they are used for type-punning.
>
> All the component refs here should be REALPART_EXPRs, IMAGPART_EXPRs,
> ARRAY_REFs or COMPONENT_REFs of structures, since that's all that
> dr_analyze_indices allows, so I think we safe in terms of the tree codes.

Yeah.  I think we need to document that we should have a 1:1 match here.

>>> I see nonoverlapping_component_refs_of_decl_p should simply skip
>>> ARRAY_REFs - but I also see there:
>>>
>>>       /* ??? We cannot simply use the type of operand #0 of the refs here
>>>          as the Fortran compiler smuggles type punning into COMPONENT_REFs
>>>          for common blocks instead of using unions like everyone else.  */
>>>       tree type1 = DECL_CONTEXT (field1);
>>>       tree type2 = DECL_CONTEXT (field2);
>>>
>>> so you probably can't simply use TREE_TYPE (outer_ref) for type compatibility.
>>> You also may not use types_compatible_p here as for LTO that is _way_ too
>>> lax for aggregates.  The above uses
>>>
>>>       /* We cannot disambiguate fields in a union or qualified union.  */
>>>       if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
>>>          return false;
>>>
>>> so you should also bail out on unions here, rather than the check you do later.
>
> The loop stops before we get to a union, so I think "only" the RECORD_TYPE
> COMPONENT_REF handling is a potential problem.  Does this mean that
> I should use the nonoverlapping_component_refs_of_decl_p code:
>
>       tree field1 = TREE_OPERAND (ref1, 1);
>       tree field2 = TREE_OPERAND (ref2, 1);
>
>       /* ??? We cannot simply use the type of operand #0 of the refs here
>          as the Fortran compiler smuggles type punning into COMPONENT_REFs
>          for common blocks instead of using unions like everyone else.  */
>       tree type1 = DECL_CONTEXT (field1);
>       tree type2 = DECL_CONTEXT (field2);
>
>       /* We cannot disambiguate fields in a union or qualified union.  */
>       if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
>          return false;
>
>       if (field1 != field2)
>         {
>           /* A field and its representative need to be considered the
>              same.  */
>           if (DECL_BIT_FIELD_REPRESENTATIVE (field1) == field2
>               || DECL_BIT_FIELD_REPRESENTATIVE (field2) == field1)
>             return false;
>           /* Different fields of the same record type cannot overlap.
>              ??? Bitfields can overlap at RTL level so punt on them.  */
>           if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2))
>             return false;
>           return true;
>         }
>
> as the disambiguation test for COMPONENT_REFs, instead of types_compatible_p
> during the new loop?

Yes.  OTOH you want to "match" while the above disambiguates.  So it means
you should use either FIELD_DECL equality or DECL_CONTEXT of the FIELD_DECL
equality (which should be the same in the end).  The RTL concern
should not matter
here.

>  And test for this as well as unions in the outer
> references?

So looking at dr_analyze_indices a union would be always the DR_BASE_OBJECT,
and you (should) stop the ref walk at DR_BASE_OBJECT.  The dr_analyze_indices
code is also somewhat fishy in that it simply ignores everything below unhandled
component-refs even if there are indices involved (and it gets away
with this because
dependence analysis likely/hopefully gives up on the DR_BASE_OBJECT equality
test in case it is sth like a[i].union for example ... hopefully ...).

>>> You seem to rely on getting an access_fn entry for each handled_component_p.
>>> It looks like this is the case -- we even seem to stop at unions (with the same
>>> fortran "issue").  I'm not sure that's the best thing to do but you
>>> rely on that.
>
> Yeah, the loop is deliberately limited to the components associated with
> an access_fn.  I did wonder at first whether dr_analyze_indices should
> store the original component reference trees for each access function.
> That would make things simpler and more explicit, but would also eat up
> more memory.  Things like object_address_invariant_in_loop_p rely on the
> access_fns in the same way that the loop in the patch does.

in fact it fails to handle ARRAY_RANGE_REFs ...

>>> I don't understand the looping, it needs more comments.  You seem to be
>>> looking for the innermost compatible RECORD_TYPE but then num_dimensions
>>> is how many compatible refs you found on the way (with incompatible ones
>>> not counting?!).  What about an inner varying array of structs?  This seems to
>>> be disregarded in the analysis now?  Thus, a[i].s.b[i].j vs. __real
>>> b[i].s.b[i].j?
>
> I'll try to improve the comments.  But the idea is that both sequences are
> as long as possible, while that still gives compatible types.  If there is
> more than one such sequence, we pick the one nearest the base.
>
> So in your example, the access functions would be:
>
>                0   1   2   3   4
>   a:          .j [i]  .b  .s [i]
>
>            0   1   2   3   4   5
>   b:  __real  .j [i]  .b  .s [i]
>
> If a and b are pointers, the final access functions would be
> unconstrained base accesses, so we'd end up with:
>
>   a: [0, 3]
>   b: [1, 4]
>
> for both sequences.
>
>>> nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p
>>> conveniently start from the other
>>> end of the ref here.
>>
>> That said, for the motivational cases we either have one ref having
>> more dimensions than the other (the __real vs. full complex access) or
>> they have the same number of dimensions (and no access fn for the
>> base).
>>
>> For the first case we should simply "drop" access_fns of the larger
>> dimensional ref (from the start, plus outer component refs) up to the
>> point the number of dimensions are equal.
>
> Yeah, that's what happens for your example.  But if we had:
>
>     a[i].s.c.d
>     __real b[i].s.b[i].j
>
> (where d is the same type as the real component) then the access
> functions would be:
>
>                    0   1   2   3
>   a:              .d  .c  .s [i]
>
>            0   1   2   3   4   5
>   b:  __real  .j [i]  .b  .s [i]
>
> Comparing the a0/b2 column doesn't make sense, because one's an array
> and the other is a structure.  In this case the sequence we care about is:
>
>   a: [1, 3]
>   b: [3, 5]
>
> which is what the loop gives.  The a1/b3 column is the one that proves
> there's no dependence.
>
>> Then we have the case of
>>
>>   ! types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b))
>>
>> where we have to punt.
>>
>> Then we have the case of
>>
>>   ! operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
>>
>> which is where the new code should kick in to see if we can drop access_fns
>> from the other end (as unanalyzable but either having distance zero or not
>> aliased because of TBAA).
>>
>> At least your testcases suggest you do not want to handle
>>
>>  struct s { int x[N]; };
>>  struct r { struct s s; };
>>  f (struct s *a, struct r *b)
>>  {
>>     for (i = 0; i < N; ++i)
>>       a->s.x[i] = b->x[i];
>>  }
>>
>> ?
>>
>> With this example your loop which seems to search for a "common"
>> sequence in (different) midst of the reference trees makes more sense
>> (still that loop is awkward to understand).
>
> Yeah, I want to handle that too, just hadn't thought of it as a specific
> testcase.  The code does give the expected dependence distance of 0.

Ok.

I think the patch is reasonable, maybe the loop can be restructured / simplified
a bit and handling of the union case for example be done first (by looking at
DR_BASE_OBJECT).

Thanks,
Richard.

>
> Thanks,
> Richard