From: Richard Biener <richard.guenther@gmail.com>
To: "Bin.Cheng" <amker.cheng@gmail.com>
Cc: Michael Matz <matz@suse.de>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH PR81740]Enforce dependence check for outer loop vectorization
Date: Mon, 18 Dec 2017 13:32:00 -0000 [thread overview]
Message-ID: <CAFiYyc3Vx62h7L2eBgJ5hsLxxqxcmOzuYA-uDPB3A0jStXOtpw@mail.gmail.com> (raw)
In-Reply-To: <CAFiYyc3+QJdcoa3moGRkPTSKh-vSAekBkOkyQzk+vSc0b5O+0Q@mail.gmail.com>
On Mon, Dec 18, 2017 at 1:37 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Dec 15, 2017 at 7:39 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Fri, Dec 15, 2017 at 1:19 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Fri, Dec 15, 2017 at 1:35 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>> On Fri, Dec 15, 2017 at 12:09 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>>> On Fri, Dec 15, 2017 at 11:55 AM, Richard Biener
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>> On Fri, Dec 15, 2017 at 12:30 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>>>>> Hi,
>>>>>>> As explained in the PR, given below test case:
>>>>>>> int a[8][10] = { [2][5] = 4 }, c;
>>>>>>>
>>>>>>> int
>>>>>>> main ()
>>>>>>> {
>>>>>>> short b;
>>>>>>> int i, d;
>>>>>>> for (b = 4; b >= 0; b--)
>>>>>>> for (c = 0; c <= 6; c++)
>>>>>>> a[c + 1][b + 2] = a[c][b + 1];
>>>>>>> for (i = 0; i < 8; i++)
>>>>>>> for (d = 0; d < 10; d++)
>>>>>>> if (a[i][d] != (i == 3 && d == 6) * 4)
>>>>>>> __builtin_abort ();
>>>>>>> return 0;
>>>>>>>
>>>>>>> the loop nest is illegal for vectorization without reversing inner loop. The issue
>>>>>>> is in data dependence checking of vectorizer, I believe the mentioned revision just
>>>>>>> exposed this. Previously the vectorization is skipped because of unsupported memory
>>>>>>> operation. The outer loop vectorization unrolls the outer loop into:
>>>>>>>
>>>>>>> for (b = 4; b > 0; b -= 4)
>>>>>>> {
>>>>>>> for (c = 0; c <= 6; c++)
>>>>>>> a[c + 1][6] = a[c][5];
>>>>>>> for (c = 0; c <= 6; c++)
>>>>>>> a[c + 1][5] = a[c][4];
>>>>>>> for (c = 0; c <= 6; c++)
>>>>>>> a[c + 1][4] = a[c][3];
>>>>>>> for (c = 0; c <= 6; c++)
>>>>>>> a[c + 1][3] = a[c][2];
>>>>>>> }
>>>>>>> Then four inner loops are fused into:
>>>>>>> for (b = 4; b > 0; b -= 4)
>>>>>>> {
>>>>>>> for (c = 0; c <= 6; c++)
>>>>>>> {
>>>>>>> a[c + 1][6] = a[c][5]; // S1
>>>>>>> a[c + 1][5] = a[c][4]; // S2
>>>>>>> a[c + 1][4] = a[c][3];
>>>>>>> a[c + 1][3] = a[c][2];
>>>>>>> }
>>>>>>> }
>>>>>>
>>>>>> Note that they are not really "fused" but they are interleaved. With
>>>>>> GIMPLE in mind
>>>>>> that makes a difference, you should get the equivalent of
>>>>>>
>>>>>> for (c = 0; c <= 6; c++)
>>>>>> {
>>>>>> tem1 = a[c][5];
>>>>>> tem2 = a[c][4];
>>>>>> tem3 = a[c][3];
>>>>>> tem4 = a[c][2];
>>>>>> a[c+1][6] = tem1;
>>>>>> a[c +1][5] = tem2;
>>>>>> a[c+1][4] = tem3;
>>>>>> a[c+1][3] = tem4;
>>>>>> }
>>>>> Yeah, I will double check if this abstract breaks the patch and how.
>>>> Hmm, I think this doesn't break it, well at least for part of the
>>>> analysis, because it is loop carried (backward) dependence goes wrong,
>>>> interleaving or not with the same iteration doesn't matter here.
>>>
>>> I think the idea is that forward dependences are always fine (negative distance)
>>> to vectorize. But with backward dependences we have to adhere to max_vf.
>>>
>>> It looks like for outer loop vectorization we only look at the distances in the
>>> outer loop but never at inner ones. But here the same applies but isn't that
>>> independend on the distances with respect to the outer loop?
>>>
>>> But maybe I'm misunderstanding how "distances" work here.
>> Hmm, I am not sure I understand "distance" correctly. With
>> description as in book like "Optimizing compilers for Modern
>> Architectures", distance is "# of iteration of sink ref - # of
>> iteration of source ref". Given below example:
>> for (i = 0; i < N; ++i)
>> {
>> x = arr[idx_1]; // S1
>> arr[idx_2] = x; // S2
>> }
>> if S1 is source ref, distance = idx_2 - idx_1, and distance > 0. Also
>> this is forward dependence. For example, idx_1 is i + 1 and idx_2 is
>> i;
>> If S2 is source ref, distance = idx_1 - idx_2, and distance < 0. Also
>> this is backward dependence. For example idx_1 is i and idx_2 is i +
>> 1;
>>
>> In GCC, we always try to subtract idx_2 from idx_1 first in computing
>> classic distance, we could result in negative distance in case of
>> backward dependence. When this happens at dependence carried level,
>> we manually reverse it. When this happens at inner level loop, we
>> simply keep the negative distance. And it's meaningless to talk about
>> forward/backward given dependence is carried at outer level loop.
>>
>> Outer loop vectorization is interesting. The problematic case has
>> backward dependence carried by outer loop. Because we don't check
>> dist vs. max_vf for such dep, the unrolled references could have outer
>> loop index equal to each other, as in a[c][5] and a[c+1][5]. So it's
>> like we have outer loop index resolved as equal. Now it has
>> dependence only if c == c' + 1. I think previous comment on fusion
>> still hold for this and we now need to check backward dependence
>> between the two reference at inner level loop. I guess it's a bit
>> trick to model dependence between a[c][5] and a[c+1][5] using the
>> original references and dependence. I think it's okay to do that
>> because distance of a[c][5] and a[c+1][5] is what we computed
>> previously for the original references at inner level loop.
>>
>> Not sure if I have missed something important.
>
> Not sure either. unroll-and-jam does
>
> FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
> {
> /* A distance (a,b) is at worst transformed into (a/N,b) by the
> unrolling (factor N), so the transformation is valid if
> a >= N, or b > 0, or b is zero and a > 0. Otherwise the unroll
> factor needs to be limited so that the first condition holds.
> That may limit the factor down to zero in the worst case. */
> int dist = dist_v[0];
> if (dist < 0)
> gcc_unreachable ();
> else if ((unsigned)dist >= *unroll)
> ;
> else if (lambda_vector_lexico_pos (dist_v + 1, DDR_NB_LOOPS (ddr) - 1)
> || (lambda_vector_zerop (dist_v + 1, DDR_NB_LOOPS (ddr) - 1)
> && dist > 0))
> ;
> else
> *unroll = dist;
>
> where *unroll is similar to *max_vf I think. dist_v[0] is the innermost loop.
>
> The vectorizer does way more complicated things and only looks at the
> distance with respect to the outer loop as far as I can see which can
> be negative.
>
> Not sure if fusion and vectorizer "interleaving" makes a difference here.
> I think the idea was that when interleaving stmt-by-stmt then forward
> dependences would be preserved and thus we don't need to check the
> inner loop dependences. speaking with "forward vs. backward" dependences
> again, not distances...
>
> This also means that unroll-and-jam could be enhanced to "interleave"
> stmts and thus cover more cases?
>
> Again, I hope Micha can have a look here...
And _maybe_ PR60276 and its fix (adding STMT_VINFO_MIN_NEG_DIST)
is related.
Richard.
> Richard.
>
>> Thanks,
>> bin
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> bin
>>>>>
>>>>>>
>>>>>>> The loop fusion needs to meet the dependence requirement. Basically, GCC's data
>>>>>>> dependence analyzer does not model dep between references in sibling loops, but
>>>>>>> in practice, fusion requirement can be checked by analyzing all data references
>>>>>>> after fusion, and there is no backward data dependence.
>>>>>>>
>>>>>>> Apparently, the requirement is violated because we have backward data dependence
>>>>>>> between references (a[c][5], a[c+1][5]) in S1/S2. Note, if we reverse the inner
>>>>>>> loop, the outer loop would become legal for vectorization.
>>>>>>>
>>>>>>> This patch fixes the issue by enforcing dependence check. It also adds two tests
>>>>>>> with one shouldn't be vectorized and the other should. Bootstrap and test on x86_64
>>>>>>> and AArch64. Is it OK?
>>>>>>
>>>>>> I think you have identified the spot where things go wrong but I'm not
>>>>>> sure you fix the
>>>>>> problem fully. The spot you pacth is (loop is the outer loop):
>>>>>>
>>>>>> loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr));
>>>>>> ...
>>>>>> FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
>>>>>> {
>>>>>> int dist = dist_v[loop_depth];
>>>>>> ...
>>>>>> if (dist > 0 && DDR_REVERSED_P (ddr))
>>>>>> {
>>>>>> /* If DDR_REVERSED_P the order of the data-refs in DDR was
>>>>>> reversed (to make distance vector positive), and the actual
>>>>>> distance is negative. */
>>>>>> if (dump_enabled_p ())
>>>>>> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>>>>> "dependence distance negative.\n");
>>>>>>
>>>>>> where you add
>>>>>>
>>>>>> + /* When doing outer loop vectorization, we need to check if there is
>>>>>> + backward dependence at inner loop level if dependence at the outer
>>>>>> + loop is reversed. See PR81740 for more information. */
>>>>>> + if (nested_in_vect_loop_p (loop, DR_STMT (dra))
>>>>>> + || nested_in_vect_loop_p (loop, DR_STMT (drb)))
>>>>>> + {
>>>>>> + unsigned inner_depth = index_in_loop_nest (loop->inner->num,
>>>>>> + DDR_LOOP_NEST (ddr));
>>>>>> + if (dist_v[inner_depth] < 0)
>>>>>> + return true;
>>>>>> + }
>>>>>>
>>>>>> but I don't understand how the dependence direction with respect to the
>>>>>> outer loop matters here.
>>>>> If the direction wrto outer loop is positive by itself, i.e,
>>>>> reversed_p equals to false, then dist is checked against max_vf. In
>>>>> this case, it's not possible to have references refer to the same
>>>>> object?
>>>>> On the other hand, dist is not checked at all for reversed case.
>>>>> Maybe an additional check "dist < max_vf" can relax the patch a bit.
>>>>>>
>>>>>> Given there's DDR_REVERSED on the outer loop distance what does that
>>>>>> mean for the inner loop distance given the quite non-obvious code handling
>>>>>> this case in tree-data-ref.c:
>>>>>>
>>>>>> /* Verify a basic constraint: classic distance vectors should
>>>>>> always be lexicographically positive.
>>>>>>
>>>>>> Data references are collected in the order of execution of
>>>>>> the program, thus for the following loop
>>>>>>
>>>>>> | for (i = 1; i < 100; i++)
>>>>>> | for (j = 1; j < 100; j++)
>>>>>> | {
>>>>>> | t = T[j+1][i-1]; // A
>>>>>> | T[j][i] = t + 2; // B
>>>>>> | }
>>>>>>
>>>>>> references are collected following the direction of the wind:
>>>>>> A then B. The data dependence tests are performed also
>>>>>> following this order, such that we're looking at the distance
>>>>>> separating the elements accessed by A from the elements later
>>>>>> accessed by B. But in this example, the distance returned by
>>>>>> test_dep (A, B) is lexicographically negative (-1, 1), that
>>>>>> means that the access A occurs later than B with respect to
>>>>>> the outer loop, ie. we're actually looking upwind. In this
>>>>>> case we solve test_dep (B, A) looking downwind to the
>>>>>> lexicographically positive solution, that returns the
>>>>>> distance vector (1, -1). */
>>>>>> if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr)))
>>>>>> {
>>>>>> lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr));
>>>>>> if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest))
>>>>>> return false;
>>>>>> compute_subscript_distance (ddr);
>>>>>> if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b,
>>>>>> &index_carry))
>>>>>> return false;
>>>>>> save_dist_v (ddr, save_v);
>>>>>> DDR_REVERSED_P (ddr) = true;
>>>>>>
>>>>>> that is, what does dist_v[inner_depth] < 0 tell us here? Does it tell us
>>>>>> that the dependence direction is reverse of the outer loop?
>>>>> Hmm, this part of code is a bit confusing for me. So we always
>>>>> compute classic distance by sorting two data references in lexico
>>>>> order, which could be the opposite given we collect references by dom
>>>>> order. IIUC, the negative dist at inner loop means a backward
>>>>> dependence for that index/loop. It's not related to outer loop or the
>>>>> reversed_p flag.
>>>>>
>>>>> Thanks,
>>>>> bin
>>>>>>
>>>>>> CCing Micha who might remember some more details from unroll-and-jam.
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>> Thanks,
>>>>>>> bin
>>>>>>> 2017-12-15 Bin Cheng <bin.cheng@arm.com>
>>>>>>>
>>>>>>> PR tree-optimization/81740
>>>>>>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): In case
>>>>>>> of outer loop vectorization, check backward dependence at inner loop
>>>>>>> if dependence at outer loop is reversed.
>>>>>>>
>>>>>>> gcc/testsuite
>>>>>>> 2017-12-15 Bin Cheng <bin.cheng@arm.com>
>>>>>>>
>>>>>>> PR tree-optimization/81740
>>>>>>> * gcc.dg/vect/pr81740-1.c: New test.
>>>>>>> * gcc.dg/vect/pr81740-2.c: Refine test.
next prev parent reply other threads:[~2017-12-18 13:32 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-15 11:30 Bin Cheng
2017-12-15 11:55 ` Richard Biener
2017-12-15 12:09 ` Bin.Cheng
2017-12-15 12:35 ` Bin.Cheng
2017-12-15 13:19 ` Richard Biener
2017-12-15 18:39 ` Bin.Cheng
2017-12-18 12:37 ` Richard Biener
2017-12-18 13:32 ` Richard Biener [this message]
2017-12-18 14:35 ` Michael Matz
2017-12-19 11:58 ` Bin.Cheng
2017-12-19 12:56 ` Richard Biener
2017-12-20 14:56 ` Bin.Cheng
2019-03-21 12:29 ` Richard Biener
2019-03-22 7:21 ` Bin.Cheng
2019-03-25 13:51 ` Richard Biener
2019-03-26 1:33 ` Richard Sandiford
2019-03-26 7:57 ` Bin.Cheng
2019-03-26 8:40 ` Richard Biener
2019-04-01 17:26 ` Richard Sandiford
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAFiYyc3Vx62h7L2eBgJ5hsLxxqxcmOzuYA-uDPB3A0jStXOtpw@mail.gmail.com \
--to=richard.guenther@gmail.com \
--cc=amker.cheng@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=matz@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).