From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-460560-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 51707 invoked by alias); 18 Aug 2017 10:39:31 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 91502 invoked by uid 89); 18 Aug 2017 10:35:38 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-6.3 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,GIT_PATCH_1,RCVD_IN_DNSWL_NONE,RCVD_IN_SORBS_SPAM,SPF_PASS autolearn=ham version=3.3.2 spammy=
X-HELO: mail-wr0-f182.google.com
Received: from mail-wr0-f182.google.com (HELO mail-wr0-f182.google.com) (209.85.128.182) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 18 Aug 2017 10:35:33 +0000
Received: by mail-wr0-f182.google.com with SMTP id y96so63966668wrc.1        for <gcc-patches@gcc.gnu.org>; Fri, 18 Aug 2017 03:35:33 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=1e100.net; s=20161025;        h=x-gm-message-state:mime-version:in-reply-to:references:from:date         :message-id:subject:to:cc;        bh=FK50Q7LSLL1ElgK9moTdXFNTRWTLciht5/wynj8fw/A=;        b=W3FcVV0ParOz0Sn3Sk9ZidH7eRJwWhTWAKVtVNwWNh+WNWwWAIdVWxLWYHosyoQbGD         jeKHF/W9M6MrqcORpRZBdw4BlTlC5gT6mrMTwgo3oakZpJhghQaPi40UC4WHoIxuOwEN         YNld4G2d/yFi5JOynIXxPvBNpcpLp7oec7fJs4C6ZPrGspf8cTPvVoK04gZ8QEAyid+y         /pMIJKKIAeKFCe0AFQ4wuUlYMLqn93PZtfuoYwsD4+bPlyz5lJJoItu1ri2kFK1+4wwg         s9vvApqm6lSViMWKbLBEEJpDx2VW0NqW1xk7w4/SH5LXUt1IZKyWNvCDzR79YkoSnGWi         3r9g==
X-Gm-Message-State: AHYfb5hhVTYE4E30yil9P7hlAqzqpWUIIp8QAn1/3ysvRLOL5RNjOx/k	s1xu63ASHAS27jcj7nC3TJjMZtbSgQ==
X-Received: by 10.80.191.4 with SMTP id f4mr4420596edk.217.1503052531109; Fri, 18 Aug 2017 03:35:31 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.80.180.249 with HTTP; Fri, 18 Aug 2017 03:35:30 -0700 (PDT)
In-Reply-To: <CAFiYyc3e3ZH0hdrh3QdJTgx06Lm+6xpeM+3DPvHoAJ8ZuziG1w@mail.gmail.com>
References: <87pobvr5t8.fsf@linaro.org> <CAHFci2_ECtKEELwvdcr10JgeWJqoSxGX8vJHg+TqpfrKTFbOFQ@mail.gmail.com> <877ey3qz90.fsf@linaro.org> <CAHFci2-ZGKM8ndjPZD95WNRzkEKZpFX1tL2XW0L0e0QWM4hwYQ@mail.gmail.com> <87y3qjpfl2.fsf@linaro.org> <CAHFci29_Kobreer+SP2EoL1XwZrM10UxjncEc41qS2U7kHsZXw@mail.gmail.com> <874lt6fmvq.fsf@linaro.org> <CAHFci29bAdZRX14=1CqH=gtvuovGevCW1w2fD9pDxfDMPvQ=Aw@mail.gmail.com> <CAFiYyc3e3ZH0hdrh3QdJTgx06Lm+6xpeM+3DPvHoAJ8ZuziG1w@mail.gmail.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Fri, 18 Aug 2017 12:18:00 -0000
Message-ID: <CAFiYyc0_tZB0mEW3nTCvyxTwzKyDAzUky6LP=tv3XKO1fD-GdQ@mail.gmail.com>
Subject: Re: PR81635: Use chrecs to help find related data refs
To: "Bin.Cheng" <amker.cheng@gmail.com>
Cc: gcc-patches List <gcc-patches@gcc.gnu.org>, 	Richard Sandiford <richard.sandiford@linaro.org>
Content-Type: text/plain; charset="UTF-8"
X-IsSubscribed: yes
X-SW-Source: 2017-08/txt/msg01123.txt.bz2

On Fri, Aug 18, 2017 at 12:30 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Aug 17, 2017 at 2:24 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Thu, Aug 17, 2017 at 12:35 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> "Bin.Cheng" <amker.cheng@gmail.com> writes:
>>>> On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford
>>>> <richard.sandiford@linaro.org> wrote:
>>>>> "Bin.Cheng" <amker.cheng@gmail.com> writes:
>>>>>> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
>>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>>> "Bin.Cheng" <amker.cheng@gmail.com> writes:
>>>>>>>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>>>>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>>>>> The first loop in the testcase regressed after my recent changes to
>>>>>>>>> dr_analyze_innermost.  Previously we would treat "i" as an iv even
>>>>>>>>> for bb analysis and end up with:
>>>>>>>>>
>>>>>>>>>    DR_BASE_ADDRESS: p or q
>>>>>>>>>    DR_OFFSET: 0
>>>>>>>>>    DR_INIT: 0 or 4
>>>>>>>>>    DR_STEP: 16
>>>>>>>>>
>>>>>>>>> We now always keep the step as 0 instead, so for an int "i" we'd have:
>>>>>>>>>
>>>>>>>>>    DR_BASE_ADDRESS: p or q
>>>>>>>>>    DR_OFFSET: (intptr_t) i
>>>>>>>>>    DR_INIT: 0 or 4
>>>>>>>>>    DR_STEP: 0
>>>>>>>>>
>>>>>>>>> This is also what we'd like to have for the unsigned "i", but the
>>>>>>>>> problem is that strip_constant_offset thinks that the "i + 1" in
>>>>>>>>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
>>>>>>>>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
>>>>>>>>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
>>>>>>>>> longer seem to be related to the [i] ones.
>>>>>>>>
>>>>>>>> Didn't read the change in detail, so sorry if I mis-understood the issue.
>>>>>>>> I made changes in scev to better fold type conversion by various sources
>>>>>>>> of information, for example, vrp, niters, undefined overflow behavior etc.
>>>>>>>> In theory these information should be available for other
>>>>>>>> optimizers without
>>>>>>>> querying scev.  For the mentioned test, vrp should compute accurate range
>>>>>>>> information for "i" so that we can fold (intptr_t) (i + 1) it without
>>>>>>>> worrying
>>>>>>>> overflow.  Note we don't do it in generic folding because
>>>>>>>> (intptr_t) (i) + 1
>>>>>>>> could be more expensive (especially in case of (T)(i + j)), or because the
>>>>>>>> CST part is in bigger precision after conversion.
>>>>>>>> But such folding is wanted in several places, e.g, IVOPTs.  To provide such
>>>>>>>> an interface, we changed tree-affine and made it do aggressive fold.  I am
>>>>>>>> curious if it's possible to use aff_tree to implement strip_constant_offset
>>>>>>>> here since aggressive folding is wanted.  After all, using additional chrec
>>>>>>>> looks like a little heavy wrto the simple test.
>>>>>>>
>>>>>>> Yeah, using aff_tree does work here when the bounds are constant.
>>>>>>> It doesn't look like it works for things like:
>>>>>>>
>>>>>>>     double p[1000];
>>>>>>>     double q[1000];
>>>>>>>
>>>>>>>     void
>>>>>>>     f4 (unsigned int n)
>>>>>>>     {
>>>>>>>       for (unsigned int i = 0; i < n; i += 4)
>>>>>>>         {
>>>>>>>           double a = q[i] + p[i];
>>>>>>>           double b = q[i + 1] + p[i + 1];
>>>>>>>           q[i] = a;
>>>>>>>           q[i + 1] = b;
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>> though, where the bounds on the global arrays guarantee that [i + 1] can't
>>>>>>> overflow, even though "n" is unconstrained.  The patch as posted handles
>>>>>>> this case too.
>>>>>> BTW is this a missed optimization in value range analysis?  The range
>>>>>> information for i should flow in a way like: array boundary -> niters
>>>>>> -> scev/vrp.
>>>>>> I think that's what niters/scev do in analysis.
>>>>>
>>>>> Yeah, maybe :-)  It looks like the problem is that when SLP runs,
>>>>> the previous VRP pass came before loop header copying, so the (single)
>>>>> header has to cope with n == 0 case.  Thus we get:
>>>> Ah, there are several passes in between vrp and pass_ch, not sure if
>>>> any such pass depends on vrp intensively.  I would suggestion reorder
>>>> the two passes, or standalone VRP interface updating information for
>>>> loop region after header copied?   This is a non-trivial issue that
>>>> needs to be fixed.  Niters analyzer rely on
>>>> simplify_using_initial_conditions heavily to get the same information,
>>>> which in my opinion should be provided by VRP.  Though this won't be
>>>> able to obsolete simplify_using_initial_conditions because VRP is weak
>>>> in symbolic range...
>>>>
>>>>>
>>>>>   Visiting statement:
>>>>>   i_15 = ASSERT_EXPR <i_6, i_6 < n_9(D)>;
>>>>>   Intersecting
>>>>>     [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>>>>>   and
>>>>>     [0, 0]
>>>>>   to
>>>>>     [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>>>>>   Intersecting
>>>>>     [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>>>>>   and
>>>>>     [0, 1000]
>>>>>   to
>>>>>     [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>>>>>   Found new range for i_15: [0, 0]
>>>>>
>>>>>   Visiting statement:
>>>>>   _3 = i_15 + 1;
>>>>>   Match-and-simplified i_15 + 1 to 1
>>>>>   Intersecting
>>>>>     [1, 1]
>>>>>   and
>>>>>     [0, +INF]
>>>>>   to
>>>>>     [1, 1]
>>>>>   Found new range for _3: [1, 1]
>>>>>
>>>>> (where _3 is the index we care about), followed by:
>>>>>
>>>>>   Visiting statement:
>>>>>   i_15 = ASSERT_EXPR <i_6, i_6 < n_9(D)>;
>>>>>   Intersectings/aarch64-linux/trunk-orig/debug/gcc'
>>>>>     [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>>>>>   and
>>>>>     [0, 4]
>>>>>   to
>>>>>     [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>>>>>   Intersecting
>>>>>     [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>>>>>   and
>>>>>     [0, 1000]
>>>>>   to
>>>>>     [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>>>>>   Found new range for i_15: [0, n_9(D) + 4294967295]
>>>>>
>>>>>   Visiting statement:
>>>>>   _3 = i_15 + 1;
>>>>>   Intersecting
>>>>>     [0, +INF]
>>>>>   and
>>>>>     [0, +INF]
>>>>>   to
>>>>>     [0, +INF]
>>>>>   Found new range for _3: [0, +INF]
>>>>>
>>>>> I guess in this case it would be better to intersect the i_15 ranges
>>>>> to [0, 1000] rather than [0, n_9(D) + 4294967295].
>>>>>
>>>>> It does work if another VRP pass runs after CH.  But even then,
>>>>> is it a good idea to rely on the range info being kept up-to-date
>>>>> all the way through to SLP?  A lot happens inbetween.
>>>> To some extend yes.  Now I understand that SCEV uses niters
>>>> information to prove no_overflow.  Niters analysis does infer better
>>>> information from array boundary, while VRP fails to do that.  I don't
>>>> worry much about gap between vrp pass and slp, it's basically the same
>>>> as niters.  Both information are analyzed at one point and meant to be
>>>> used by following passes.  It's also not common for vrp information
>>>> become invalid given we are on SSA?
>>>
>>> I'm not worried so much about vrp information becoming invalid on
>>> an SSA name that existed when VRP was run.  It's more a question
>>> of what happens about SSA names that get introduced after VRP,
>>> e.g. by things like dom, reassoc, PRE, etc.
>> For induction variables in concern, these passes shouldn't
>> aggressively introduces new variables I think.
>>>
>>>> Now that data address is not analyzed against loop, VRP would be the
>>>> only information we can use unless adding back scev analysis.  IMHO,
>>>> the patch is doing so in another way than before.  It requires
>>>> additional chrec data structure.  I remember the previous patch
>>>> enables more slp vectorization, is it because of "step" information
>>>> from scev?
>>>
>>> Do you mean that:
>>>
>>> 2017-07-03  Richard Sandiford  <richard.sandiford@linaro.org>
>>>
>>>         * tree-data-ref.c (dr_analyze_innermost): Replace the "nest"
>>>         parameter with a "loop" parameter and use it instead of the
>>>         loop containing DR_STMT.  Don't check simple_iv when doing
>>>         BB analysis.  Describe the two analysis modes in the comment.
>>>
>>> enabled more SLP vectorisation in bb-slp-pr65935.c?  That was due
>>> to us not doing IV analysis for BB vectorisation, and ensuring that
>>> the step was always zero.
>> Which means vectorizer code can handle not IV-analyzed offset, but
>> can't for analyzed form?
>>>
>>>> In this patch, step information is simply discarded.  I am
>>>> wondering if possible to always analyze scev within innermost loop for
>>>> slp while discards step information.
>>>
>>> Well, we don't calculate a step for bb analysis (i.e. it's not the case
>>> the patch calculates step information and then simply discards it).
>>> I don't see how that would work.  For bb analysis, the DR_OFFSET + DR_INIT
>>> has to give the offset for every execution of the block, not just the
>>> first iteration of the containing loop.  So if we get back a nonzero
>>> step, we have to do something with it.
>> Yeah.
>>>
>>> But:
>>>
>>> (a) the old simple_iv analysis is more expensive than simply calling
>>>     analyze_scev, so I don't think this is a win in terms of complexity.
>>>
>>> (b) for bb analysis, there's nothing particularly special about the
>>>     innermost loop.  It makes more sense to analyse it in the innermost
>>>     loop for which the offset is invariant, as shown by the second
>>>     testcase in the patch.
>>>
>>> (c) The patch helps with loop vectorisation too, since analysing the
>>>     starting DR_OFFSET in the context of the containing loop can help
>>>     in a similar way as analysing the full offset does for SLP.
>>
>> I have to admit I am not very much into this method.  It complicates
>> structure as well as code.
>> Mostly because now dr_init are split into two different fields and one
>> of it is lazily computed.
>>
>> For example:
>>> @@ -2974,12 +2974,12 @@ vect_vfa_segment_size (struct data_refer
>>>  vect_no_alias_p (struct data_reference *a, struct data_reference *b,
>>>                   tree segment_length_a, tree segment_length_b)
>>>  {
>>> -  gcc_assert (TREE_CODE (DR_INIT (a)) == INTEGER_CST
>>> -          && TREE_CODE (DR_INIT (b)) == INTEGER_CST);
>>> -  if (tree_int_cst_equal (DR_INIT (a), DR_INIT (b)))
>>> +  gcc_assert (TREE_CODE (DR_CHREC_INIT (a)) == INTEGER_CST
>>> +          && TREE_CODE (DR_CHREC_INIT (b)) == INTEGER_CST);
>>> +  if (tree_int_cst_equal (DR_CHREC_INIT (a), DR_CHREC_INIT (b)))
>>>      return false;
>>>
>>> -  tree seg_a_min = DR_INIT (a);
>>> +  tree seg_a_min = DR_CHREC_INIT (a);
>>>    tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min),
>>>                  seg_a_min, segment_length_a);
>>>    /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT
>>> @@ -2990,10 +2990,10 @@ vect_no_alias_p (struct data_reference *
>>>        tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a)));
>>>        seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max),
>>>                     seg_a_max, unit_size);
>>> -      seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (a)),
>>> -                   DR_INIT (a), unit_size);
>>> +      seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_CHREC_INIT (a)),
>>> +                   DR_CHREC_INIT (a), unit_size);
>>>      }
>>> -  tree seg_b_min = DR_INIT (b);
>>> +  tree seg_b_min = DR_CHREC_INIT (b);
>>>    tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min),
>>>                  seg_b_min, segment_length_b);
>>>    if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0)
>>
>> Use of DR_INIT is simply replaced by DR_CHREC_INIT.  Is it safe to do
>> so in case of non-ZERO
>> DR_INIT?  It worries me that I may need to think twice before
>> referring to DR_INIT because it's
>> not clear when DR_OFFSET is split and DR_CHREC_INIT becomes non-ZERO.
>> It may simply
>> because I am too dumb to handle this.  I will leave this to richi.
>
> I'm trying to understand this a bit (not liking it very much in its
> current form).
>
> Can code currently using DR_OFFSET and DR_INIT simply use
> DR_CHREC_INIT and CHREC_LEFT (DR_CHREC_OFFSET) (or DR_CHREC_OFFSET
> if DR_CHREC_OFFSET is not a CHREC)?  If yes, would there be any downside
> in doing that?  If not, then why?
>
> That said, I'm all for making DR info more powerful.  But I detest
> having extra fields
> and adding confusion as to when to use which.  Thus if we can make DR_CHREC_INIT
> be DR_INIT and DR_OFFSET an inline function accessing CHREC_LEFT if
> suitable plus exposing DR_OFFSET_CHREC that would make me more happy.

And if we want to make it opt-in do a dr_analyze_me_harder () which will
re-write DR_OFFSET/INIT into the new form.

But DR_OFFSET and DR_INIT (read) accessors would maintain their
semantics while DR_OFFSET_CHREC would expose the CHREC if
available.

Richard.

> One downside might be that the scalar evolution of the offset might pull in
> SSA vars into "DR_OFFSET" that are "far away" and thus less optimal for
> code-generation when one re-constructs a ref by adding the components?
>
> Thanks,
> Richard.
>
>
>> Thanks,
>> bin
>>>
>>> Thanks,
>>> Richard
>>>
>>>>
>>>> Thanks,
>>>> bin
>>>>>
>>>>> FWIW, the old simple_iv check that I removed for bb data-ref analysis
>>>>> relies on SCEV analysis too, so I don't think this is more expensive
>>>>> than what we had before.
>>>>>
>>>>> Thanks,
>>>>> Richard