public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Richard Biener <rguenther@suse.de>,
	Jeff Law <jeffreyalaw@gmail.com>,
	gcc-patches@gcc.gnu.org
Subject: Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization
Date: Mon, 14 Aug 2023 20:28:15 +0530	[thread overview]
Message-ID: <CAAgBjMkyPNAhaELEbgi6MBCRx_Ccajtwct6nJpTt4TdwzD8NsQ@mail.gmail.com> (raw)
In-Reply-To: <CAFiYyc0OaYqEULUiytCSAmHE-7eJPevCf4xHj3KugmY0PqAAHg@mail.gmail.com>

On Mon, 7 Aug 2023 at 13:19, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Mon, Aug 7, 2023 at 2:05 AM Prathamesh Kulkarni via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, 3 Aug 2023 at 17:48, Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Thu, 3 Aug 2023, Richard Biener wrote:
> > >
> > > > On Thu, 3 Aug 2023, Richard Biener wrote:
> > > >
> > > > > On Thu, 3 Aug 2023, Prathamesh Kulkarni wrote:
> > > > >
> > > > > > On Wed, 2 Aug 2023 at 14:17, Richard Biener via Gcc-patches
> > > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > >
> > > > > > > On Mon, 31 Jul 2023, Jeff Law wrote:
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On 7/28/23 01:05, Richard Biener via Gcc-patches wrote:
> > > > > > > > > The following delays sinking of loads within the same innermost
> > > > > > > > > loop when it was unconditional before.  That's a not uncommon
> > > > > > > > > issue preventing vectorization when masked loads are not available.
> > > > > > > > >
> > > > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > > > > > >
> > > > > > > > > I have a followup patch improving sinking that without this would
> > > > > > > > > cause more of the problematic sinking - now that we have a second
> > > > > > > > > sink pass after loop opts this looks like a reasonable approach?
> > > > > > > > >
> > > > > > > > > OK?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Richard.
> > > > > > > > >
> > > > > > > > >  PR tree-optimization/92335
> > > > > > > > >  * tree-ssa-sink.cc (select_best_block): Before loop
> > > > > > > > >  optimizations avoid sinking unconditional loads/stores
> > > > > > > > >  in innermost loops to conditional executed places.
> > > > > > > > >
> > > > > > > > >  * gcc.dg/tree-ssa/ssa-sink-10.c: Disable vectorizing.
> > > > > > > > >  * gcc.dg/tree-ssa/predcom-9.c: Clone from ssa-sink-10.c,
> > > > > > > > >  expect predictive commoning to happen instead of sinking.
> > > > > > > > >  * gcc.dg/vect/pr65947-3.c: Adjust.
> > > > > > > > I think it's reasonable -- there's probably going to be cases where it's not
> > > > > > > > great, but more often than not I think it's going to be a reasonable
> > > > > > > > heuristic.
> > > > > > > >
> > > > > > > > If there is undesirable fallout, better to find it over the coming months than
> > > > > > > > next spring.  So I'd suggest we go forward now to give more time to find any
> > > > > > > > pathological cases (if they exist).
> > > > > > >
> > > > > > > Agreed, I've pushed this now.
> > > > > > Hi Richard,
> > > > > > After this patch (committed in 399c8dd44ff44f4b496223c7cc980651c4d6f6a0),
> > > > > > pr65947-7.c "failed" for aarch64-linux-gnu:
> > > > > > FAIL: gcc.dg/vect/pr65947-7.c scan-tree-dump-not vect "LOOP VECTORIZED"
> > > > > > FAIL: gcc.dg/vect/pr65947-7.c -flto -ffat-lto-objects
> > > > > > scan-tree-dump-not vect "LOOP VECTORIZED"
> > > > > >
> > > > > > /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target {
> > > > > > ! vect_fold_extract_last } } } } */
> > > > > >
> > > > > > With your commit, condition_reduction in pr65947-7.c gets vectorized
> > > > > > regardless of vect_fold_extract_last,
> > > > > > which gates the above test (which is an improvement, because the
> > > > > > function didn't get vectorized before the commit).
> > > > > >
> > > > > > The attached patch thus removes the gating on vect_fold_extract_last,
> > > > > > and the test passes again.
> > > > > > OK to commit ?
> > > > >
> > > > > OK.
> > > >
> > > > Or wait - the loop doesn't vectorize on x86_64, so I guess one
> > > > critical target condition is missing.  Can you figure out which?
> > >
> > > I see
> > >
> > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> > > note:   vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>,
> > > type of def: reduction
> > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> > > note:   vect_is_simple_use: vectype vector(4) int
> > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> > > missed:   multiple types in double reduction or condition reduction or
> > > fold-left reduction.
> > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:13:1:
> > > missed:   not vectorized: relevant phi not supported: last_19 = PHI
> > > <last_8(7), 108(15)>
> > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> > > missed:  bad operation or unsupported loop bound.
> > Hi Richard,
> > Looking at the aarch64 vect dump, it seems the loop in
> > condition_reduction gets vectorized with V4HI mode
> > while fails for other modes in vectorizable_condition:
> >
> >   if ((double_reduc || reduction_type != TREE_CODE_REDUCTION)
> >       && ncopies > 1)
> >     {
> >       if (dump_enabled_p ())
> >         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >                          "multiple types in double reduction or condition "
> >                          "reduction or fold-left reduction.\n");
> >       return false;
> >     }
> >
> > From the dump:
> > foo.c:9:21: note:   === vect_analyze_loop_operations ===
> > foo.c:9:21: note:   examining phi: last_19 = PHI <last_8(7), 108(15)>
> > foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of
> > def: internal
> > foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int
> > foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI
> > <last_8(7), 108(15)>, type of def: reduction
> > foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int
> >
> > For V8HI, VF = 8, and vectype_in = vector(4) int.
> > Thus ncopies = VF / length(vectype_in) = 2, which is greater than 1,
> > and thus fails:
> > foo.c:9:21: missed:   multiple types in double reduction or condition
> > reduction or fold-left reduction.
> > foo.c:4:1: missed:   not vectorized: relevant phi not supported:
> > last_19 = PHI <last_8(7), 108(15)>
> > While for V4HI, VF = 4 and thus ncopies = 1, so it succeeds.
> >
> > For x86_64, it seems the vectorizer doesn't seem to try V4HI mode.
> > If I "force" the vectorizer to use V4HI mode, we get the following dump:
> > foo.c:9:21: note:   === vect_analyze_loop_operations ===
> > foo.c:9:21: note:   examining phi: last_19 = PHI <last_8(7), 108(15)>
> > foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of
> > def: internal
> > foo.c:9:21: note:   vect_is_simple_use: vectype vector(2) int
> > foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI
> > <last_8(7), 108(15)>, type of def: reduction
> > foo.c:9:21: note:   vect_is_simple_use: vectype vector(2) int
> > foo.c:9:21: missed:   multiple types in double reduction or condition
> > reduction or fold-left reduction.
> >
> > Not sure tho if this is the only reason for the test to fail to
> > vectorize on the target.
> > Will investigate in more details next week.
>
> The odd thing is that you say
>
>   for (int i = 0; i < N; i++)
>     {
>       aval = a[i];
>       if (b[i] < min_v)
>         last = aval;
>     }
>
> fails to vectorize but
>
>   for (int i = 0; i < N; i++)
>     {
>       if (b[i] < min_v)
>         last = a[i];
>     }
>
> succeeds?  The IL difference should be irrelevant for the reduction
Hi Richard,
Sorry for late response.
No this case containing a conditional load doesn't vectorize on aarch64 either:
foo2.c:9:21: note:  === analyze_loop_nest ===
foo2.c:9:21: note:   === vect_analyze_loop_form ===
foo2.c:9:21: missed:   not vectorized: control flow in loop.
foo2.c:9:21: missed:  bad loop form.

For this test:
for (int i = 0; i < N; i++)
    {
      aval = a[i];
      if (b[i] < min_v)
        last = aval;
    }

IIUC sink pass made the load conditional preventing vectorization
(similar to above),
but your PR92335 fix delays the sinking of load before loop opts, and
thus gets vectorized.
Till vect pass, the dumps are similar for x86 and aarch64.
> vectorization:
>
>   <bb 3> [local count: 1049367889]:
>   # last_19 = PHI <last_8(7), 108(15)>
>   # i_21 = PHI <i_17(7), 0(15)>
>   # ivtmp_18 = PHI <ivtmp_10(7), 43(15)>
>   _1 = (long unsigned int) i_21;
>   _2 = _1 * 2;
>   _3 = a_12(D) + _2;
>   aval_13 = *_3;
>   _5 = _1 * 4;
>   _6 = b_14(D) + _5;
>   _7 = *_6;
>   last_16 = (int) aval_13;
>   _9 = _7 < min_v_15(D);
>   last_8 = _9 ? last_16 : last_19;
>   i_17 = i_21 + 1;
>   ivtmp_10 = ivtmp_18 - 1;
>   if (ivtmp_10 != 0)
>     goto <bb 7>; [97.68%]
>
> vs
>
>   <bb 3> [local count: 1049367889]:
>   # last_19 = PHI <last_9(7), 108(15)>
>   # i_21 = PHI <i_17(7), 0(15)>
>   # ivtmp_11 = PHI <ivtmp_10(7), 43(15)>
>   _1 = (long unsigned int) i_21;
>   _2 = _1 * 4;
>   _3 = b_13(D) + _2;
>   _4 = *_3;
>   _5 = _4 < min_v_14(D);
>   _6 = _1 * 2;
>   _38 = _37 + _6;
>   _7 = (short int *) _38;
>   _8 = .MASK_LOAD (_7, 16B, _5);
>   last_16 = (int) _8;
>   last_9 = _5 ? last_16 : last_19;
>   i_17 = i_21 + 1;
>   ivtmp_10 = ivtmp_11 - 1;
>   if (ivtmp_10 != 0)
>     goto <bb 7>; [97.68%]
>
> maybe since the "mask" is used twice with the .MASK_LOAD
> we are not actually looking at the def (the comparison) and it's
> the comparison which would introduce the "multiple types"?
>
> That is, I wonder why not sinking the load, avoiding a conditional
> load, makes a difference to vectorizing the condition/extract last reduction.
IIUC, the issue is that the vector type used for reduction seems to be
different than
the vector type used for determining VF, and it passes the above check
in vectoriable_reduction,
only if VF matches the length of the vector type used for reduction.

For V4HI mode on aarch64, it sets VF = 4 which matches with the length of vector
type used for reduction (vector (4) int):
foo.c:9:21: note:   examining phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of
def: internal
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int
foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI
<last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int
>
> It doesn't seem to make a difference for x86.  That said, the "fix" is
> probably sticking the correct target on the dump-check, it seems
> that vect_fold_extract_last is no longer correct here.
Um sorry, I did go thru various checks in target-supports.exp, but not
sure which one will be appropriate for this case,
and am stuck here :/ Could you please suggest how to proceed ?

Thanks,
Prathamesh
>
> Richard.
>
> > Thanks,
> > Prathamesh
> > >
> > > Richard.

  reply	other threads:[~2023-08-14 14:58 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-28  7:05 Richard Biener
2023-07-31 15:33 ` Jeff Law
2023-08-02  8:46   ` Richard Biener
2023-08-03 11:42     ` Prathamesh Kulkarni
2023-08-03 12:14       ` Richard Biener
2023-08-03 12:16         ` Richard Biener
2023-08-03 12:18           ` Richard Biener
2023-08-07  0:04             ` Prathamesh Kulkarni
2023-08-07  7:48               ` Richard Biener
2023-08-14 14:58                 ` Prathamesh Kulkarni [this message]
2023-08-15  7:36                   ` Richard Biener
2023-08-15  8:58                     ` Richard Sandiford
2023-08-17 17:10                       ` Prathamesh Kulkarni
2023-08-18  9:29                         ` Richard Biener
2023-08-18 11:41                           ` Richard Biener
2023-08-19 15:48                             ` Prathamesh Kulkarni
2023-08-21  6:57                               ` Richard Biener
2023-08-21 11:32                                 ` Prathamesh Kulkarni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAAgBjMkyPNAhaELEbgi6MBCRx_Ccajtwct6nJpTt4TdwzD8NsQ@mail.gmail.com \
    --to=prathamesh.kulkarni@linaro.org \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=rguenther@suse.de \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).