Questions about vectorizing a simple loop by inferring the range from array

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Questions about vectorizing a simple loop by inferring the range from array
@ 2023-11-09  9:49 Hao Liu OS
  2023-11-10  9:39 ` Richard Biener
  0 siblings, 1 reply; 3+ messages in thread
From: Hao Liu OS @ 2023-11-09  9:49 UTC (permalink / raw)
  To: gcc

Hi,

I'm investigating how to vectorize the following simple case:

    int A[1024 * 2];
    int foo1 (unsigned offset) {
      int sum = 0;
      for (unsigned i = 0; i < 1024; i++)
        sum += A[i + offset];
      return sum;
    }

The loop body and loop vectorizer dumps are:

    # i_13 = PHI <i_9(5), 0(2)>
    # ivtmp_14 = PHI <ivtmp_12(5), 1024(2)>
    _1 = offset_7(D) + i_13;
    _2 = A[_1];
    sum_8 = _2 + sum_11;
    i_9 = i_13 + 1;
    ...
    Creating dr for A[_1]
      ...
      (chrec = (sizetype) {offset_7(D), +, 1}_1)
      (res = (sizetype) {offset_7(D), +, 1}_1)
    case1.c:7:13: missed:  failed: evolution of offset is not affine.

As SCEV thinks {offset,+,1} may overflow, it can not propagate the sizetype and
fails. The call-stack is:

    dr_analyze_innermost() -> 
        simple_iv() -> ... ->
            convert_affine_scev() -> 
                scev_probably_wraps_p() -> 
                    scev_var_range_cant_overflow()
                    loop_exits_before_overflow()

BTW. If we add a stmt like "if (offset <= 1024)" before the loop, GCC can
vectorized it successfully, as scev_var_range_cant_overflow() knows the range
of _1.

For the original case, I think GCC is able to infer the range from the array
length and do the vectorization. There is already functions to infer this info
from undefined behavior like array-ref and record nonwrappiong IVs. We can see
the dumps of passes like evrp, cunroll, vrp1, etc:

    Induction variable (unsigned int) offset_8(D) + 1 * iteration does not wrap in statement _2 = A[_1];
    in loop 1.
    Statement _2 = A[_1];
     is executed at most 2047 (bounded by 2047) + 1 times in loop 1.

The call-stack is:

    estimate_numbers_of_iterations() ->
        infer_loop_bounds_from_undefined() ->
            infer_loop_bounds_from_array() -> ... ->
                record_nonwrapping_iv()

So, I have two questions:

1. is it legal to vectorize such case by inferring the no wrap info from array
   ref (I'm wondering if there is any corner case that can not do)?
2. If the 1st question is true, then how could we implement this in GCC?

For the 2nd question, I think there may be several possible solutions:
- Could we re-use the existing nonwrapping IV information found by niter?
- Or, could we implement a function called infer_nowrapping_from_array() (like
  infer_loop_bounds_from_array)? For this way, there are also different possible
  places to call it:
  - in the very beginning (i.e., dr_analyze_innermost()), where we knows it is
    analyzing an array-ref A[_1].
  - in scev_probably_wraps_p(), where it doesn't know it's analyzing an array
    subscript, so we have to find if the user of _1 is array-ref (i.e., A[_1]).

As SCEV is the fundamental pass for loop analysis, I think we'd be very careful
about the correctness. Do you have any comments?

Thanks!
-Hao

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Questions about vectorizing a simple loop by inferring the range from array
  2023-11-09  9:49 Questions about vectorizing a simple loop by inferring the range from array Hao Liu OS
@ 2023-11-10  9:39 ` Richard Biener
  2023-11-10 10:16   ` Hao Liu OS
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Biener @ 2023-11-10  9:39 UTC (permalink / raw)
  To: Hao Liu OS; +Cc: gcc

On Thu, Nov 9, 2023 at 10:51 AM Hao Liu OS via Gcc <gcc@gcc.gnu.org> wrote:
>
> Hi,
>
> I'm investigating how to vectorize the following simple case:
>
>     int A[1024 * 2];
>     int foo1 (unsigned offset) {
>       int sum = 0;
>       for (unsigned i = 0; i < 1024; i++)
>         sum += A[i + offset];
>       return sum;
>     }
>
> The loop body and loop vectorizer dumps are:
>
>     # i_13 = PHI <i_9(5), 0(2)>
>     # ivtmp_14 = PHI <ivtmp_12(5), 1024(2)>
>     _1 = offset_7(D) + i_13;
>     _2 = A[_1];
>     sum_8 = _2 + sum_11;
>     i_9 = i_13 + 1;
>     ...
>     Creating dr for A[_1]
>       ...
>       (chrec = (sizetype) {offset_7(D), +, 1}_1)
>       (res = (sizetype) {offset_7(D), +, 1}_1)
>     case1.c:7:13: missed:  failed: evolution of offset is not affine.
>
> As SCEV thinks {offset,+,1} may overflow, it can not propagate the sizetype and
> fails. The call-stack is:
>
>     dr_analyze_innermost() ->
>         simple_iv() -> ... ->
>             convert_affine_scev() ->
>                 scev_probably_wraps_p() ->
>                     scev_var_range_cant_overflow()
>                     loop_exits_before_overflow()
>
>
> BTW. If we add a stmt like "if (offset <= 1024)" before the loop, GCC can
> vectorized it successfully, as scev_var_range_cant_overflow() knows the range
> of _1.
>
> For the original case, I think GCC is able to infer the range from the array
> length and do the vectorization. There is already functions to infer this info
> from undefined behavior like array-ref and record nonwrappiong IVs. We can see
> the dumps of passes like evrp, cunroll, vrp1, etc:
>
>     Induction variable (unsigned int) offset_8(D) + 1 * iteration does not wrap in statement _2 = A[_1];
>     in loop 1.
>     Statement _2 = A[_1];
>      is executed at most 2047 (bounded by 2047) + 1 times in loop 1.
>
> The call-stack is:
>
>     estimate_numbers_of_iterations() ->
>         infer_loop_bounds_from_undefined() ->
>             infer_loop_bounds_from_array() -> ... ->
>                 record_nonwrapping_iv()
>
>
> So, I have two questions:
>
> 1. is it legal to vectorize such case by inferring the no wrap info from array
>    ref (I'm wondering if there is any corner case that can not do)?
> 2. If the 1st question is true, then how could we implement this in GCC?
>
> For the 2nd question, I think there may be several possible solutions:
> - Could we re-use the existing nonwrapping IV information found by niter?
> - Or, could we implement a function called infer_nowrapping_from_array() (like
>   infer_loop_bounds_from_array)? For this way, there are also different possible
>   places to call it:
>   - in the very beginning (i.e., dr_analyze_innermost()), where we knows it is
>     analyzing an array-ref A[_1].
>   - in scev_probably_wraps_p(), where it doesn't know it's analyzing an array
>     subscript, so we have to find if the user of _1 is array-ref (i.e., A[_1]).
>
> As SCEV is the fundamental pass for loop analysis, I think we'd be very careful
> about the correctness. Do you have any comments?

Also SCEV and niter analysis are very closely inter-winded ...

It was previously suggested that SCEV analysis should get something like
niter analysis "assumptions" so that we can record a condition that needs
to hold true for the SCEV result to be valid.  That condition can then for
example checked at runtime by the vectorizer, just like we check the
niter analysis 'assumptions'.  But it's going to be quite a bit of work to
wire in something like 'assumptions' there.

In your particular case we could indeed use undefined behavior like
niter analysis does and it would be nice to somehow record its
analysis and use it from SCEV.

Apart from introducing 'assumptions' I don't have any good idea on
how to actually do this.  It will require experimenting.

Richard.

>
> Thanks!
> -Hao

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Questions about vectorizing a simple loop by inferring the range from array
  2023-11-10  9:39 ` Richard Biener
@ 2023-11-10 10:16   ` Hao Liu OS
  0 siblings, 0 replies; 3+ messages in thread
From: Hao Liu OS @ 2023-11-10 10:16 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc

> In your particular case we could indeed use undefined behavior like
> niter analysis does and it would be nice to somehow record its
> analysis and use it from SCEV.

Thanks for your confirmation!

I also saw the suggestion about "assumptions" (in pr101450) and thought about
how to support it generally. This case is a special, as it doesn't need runtime
check. So, I'm trying to fix this one as the 1st step.

Thanks,
-Hao

________________________________________
From: Richard Biener <richard.guenther@gmail.com>
Sent: Friday, November 10, 2023 17:39
To: Hao Liu OS
Cc: gcc@gcc.gnu.org
Subject: Re: Questions about vectorizing a simple loop by inferring the range from array

On Thu, Nov 9, 2023 at 10:51 AM Hao Liu OS via Gcc <gcc@gcc.gnu.org> wrote:
>
> Hi,
>
> I'm investigating how to vectorize the following simple case:
>
>     int A[1024 * 2];
>     int foo1 (unsigned offset) {
>       int sum = 0;
>       for (unsigned i = 0; i < 1024; i++)
>         sum += A[i + offset];
>       return sum;
>     }
>
> The loop body and loop vectorizer dumps are:
>
>     # i_13 = PHI <i_9(5), 0(2)>
>     # ivtmp_14 = PHI <ivtmp_12(5), 1024(2)>
>     _1 = offset_7(D) + i_13;
>     _2 = A[_1];
>     sum_8 = _2 + sum_11;
>     i_9 = i_13 + 1;
>     ...
>     Creating dr for A[_1]
>       ...
>       (chrec = (sizetype) {offset_7(D), +, 1}_1)
>       (res = (sizetype) {offset_7(D), +, 1}_1)
>     case1.c:7:13: missed:  failed: evolution of offset is not affine.
>
> As SCEV thinks {offset,+,1} may overflow, it can not propagate the sizetype and
> fails. The call-stack is:
>
>     dr_analyze_innermost() ->
>         simple_iv() -> ... ->
>             convert_affine_scev() ->
>                 scev_probably_wraps_p() ->
>                     scev_var_range_cant_overflow()
>                     loop_exits_before_overflow()
>
>
> BTW. If we add a stmt like "if (offset <= 1024)" before the loop, GCC can
> vectorized it successfully, as scev_var_range_cant_overflow() knows the range
> of _1.
>
> For the original case, I think GCC is able to infer the range from the array
> length and do the vectorization. There is already functions to infer this info
> from undefined behavior like array-ref and record nonwrappiong IVs. We can see
> the dumps of passes like evrp, cunroll, vrp1, etc:
>
>     Induction variable (unsigned int) offset_8(D) + 1 * iteration does not wrap in statement _2 = A[_1];
>     in loop 1.
>     Statement _2 = A[_1];
>      is executed at most 2047 (bounded by 2047) + 1 times in loop 1.
>
> The call-stack is:
>
>     estimate_numbers_of_iterations() ->
>         infer_loop_bounds_from_undefined() ->
>             infer_loop_bounds_from_array() -> ... ->
>                 record_nonwrapping_iv()
>
>
> So, I have two questions:
>
> 1. is it legal to vectorize such case by inferring the no wrap info from array
>    ref (I'm wondering if there is any corner case that can not do)?
> 2. If the 1st question is true, then how could we implement this in GCC?
>
> For the 2nd question, I think there may be several possible solutions:
> - Could we re-use the existing nonwrapping IV information found by niter?
> - Or, could we implement a function called infer_nowrapping_from_array() (like
>   infer_loop_bounds_from_array)? For this way, there are also different possible
>   places to call it:
>   - in the very beginning (i.e., dr_analyze_innermost()), where we knows it is
>     analyzing an array-ref A[_1].
>   - in scev_probably_wraps_p(), where it doesn't know it's analyzing an array
>     subscript, so we have to find if the user of _1 is array-ref (i.e., A[_1]).
>
> As SCEV is the fundamental pass for loop analysis, I think we'd be very careful
> about the correctness. Do you have any comments?

Also SCEV and niter analysis are very closely inter-winded ...

It was previously suggested that SCEV analysis should get something like
niter analysis "assumptions" so that we can record a condition that needs
to hold true for the SCEV result to be valid.  That condition can then for
example checked at runtime by the vectorizer, just like we check the
niter analysis 'assumptions'.  But it's going to be quite a bit of work to
wire in something like 'assumptions' there.

In your particular case we could indeed use undefined behavior like
niter analysis does and it would be nice to somehow record its
analysis and use it from SCEV.

Apart from introducing 'assumptions' I don't have any good idea on
how to actually do this.  It will require experimenting.

Richard.

>
> Thanks!
> -Hao

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-11-10 10:16 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-09  9:49 Questions about vectorizing a simple loop by inferring the range from array Hao Liu OS
2023-11-10  9:39 ` Richard Biener
2023-11-10 10:16   ` Hao Liu OS

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).