On Thu, 3 Aug 2023 at 17:48, Richard Biener <rguenther@suse.de> wrote:
>
> On Thu, 3 Aug 2023, Richard Biener wrote:
>
> > On Thu, 3 Aug 2023, Richard Biener wrote:
> >
> > > On Thu, 3 Aug 2023, Prathamesh Kulkarni wrote:
> > >
> > > > On Wed, 2 Aug 2023 at 14:17, Richard Biener via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > On Mon, 31 Jul 2023, Jeff Law wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > On 7/28/23 01:05, Richard Biener via Gcc-patches wrote:
> > > > > > > The following delays sinking of loads within the same innermost
> > > > > > > loop when it was unconditional before.  That's a not uncommon
> > > > > > > issue preventing vectorization when masked loads are not available.
> > > > > > >
> > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > > > >
> > > > > > > I have a followup patch improving sinking that without this would
> > > > > > > cause more of the problematic sinking - now that we have a second
> > > > > > > sink pass after loop opts this looks like a reasonable approach?
> > > > > > >
> > > > > > > OK?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Richard.
> > > > > > >
> > > > > > >  PR tree-optimization/92335
> > > > > > >  * tree-ssa-sink.cc (select_best_block): Before loop
> > > > > > >  optimizations avoid sinking unconditional loads/stores
> > > > > > >  in innermost loops to conditional executed places.
> > > > > > >
> > > > > > >  * gcc.dg/tree-ssa/ssa-sink-10.c: Disable vectorizing.
> > > > > > >  * gcc.dg/tree-ssa/predcom-9.c: Clone from ssa-sink-10.c,
> > > > > > >  expect predictive commoning to happen instead of sinking.
> > > > > > >  * gcc.dg/vect/pr65947-3.c: Adjust.
> > > > > > I think it's reasonable -- there's probably going to be cases where it's not
> > > > > > great, but more often than not I think it's going to be a reasonable
> > > > > > heuristic.
> > > > > >
> > > > > > If there is undesirable fallout, better to find it over the coming months than
> > > > > > next spring.  So I'd suggest we go forward now to give more time to find any
> > > > > > pathological cases (if they exist).
> > > > >
> > > > > Agreed, I've pushed this now.
> > > > Hi Richard,
> > > > After this patch (committed in 399c8dd44ff44f4b496223c7cc980651c4d6f6a0),
> > > > pr65947-7.c "failed" for aarch64-linux-gnu:
> > > > FAIL: gcc.dg/vect/pr65947-7.c scan-tree-dump-not vect "LOOP VECTORIZED"
> > > > FAIL: gcc.dg/vect/pr65947-7.c -flto -ffat-lto-objects
> > > > scan-tree-dump-not vect "LOOP VECTORIZED"
> > > >
> > > > /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target {
> > > > ! vect_fold_extract_last } } } } */
> > > >
> > > > With your commit, condition_reduction in pr65947-7.c gets vectorized
> > > > regardless of vect_fold_extract_last,
> > > > which gates the above test (which is an improvement, because the
> > > > function didn't get vectorized before the commit).
> > > >
> > > > The attached patch thus removes the gating on vect_fold_extract_last,
> > > > and the test passes again.
> > > > OK to commit ?
> > >
> > > OK.
> >
> > Or wait - the loop doesn't vectorize on x86_64, so I guess one
> > critical target condition is missing.  Can you figure out which?
>
> I see
>
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> note:   vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>,
> type of def: reduction
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> note:   vect_is_simple_use: vectype vector(4) int
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> missed:   multiple types in double reduction or condition reduction or
> fold-left reduction.
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:13:1:
> missed:   not vectorized: relevant phi not supported: last_19 = PHI
> <last_8(7), 108(15)>
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> missed:  bad operation or unsupported loop bound.
Hi Richard,
Looking at the aarch64 vect dump, it seems the loop in
condition_reduction gets vectorized with V4HI mode,
while it fails for the other modes in vectorizable_condition:

  if ((double_reduc || reduction_type != TREE_CODE_REDUCTION)
      && ncopies > 1)
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "multiple types in double reduction or condition "
                         "reduction or fold-left reduction.\n");
      return false;
    }

From the dump:
foo.c:9:21: note:   === vect_analyze_loop_operations ===
foo.c:9:21: note:   examining phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of
def: internal
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int
foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI
<last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int

For V8HI, VF = 8, and vectype_in = vector(4) int.
Thus ncopies = VF / length(vectype_in) = 2, which is greater than 1,
so the check above rejects it:
foo.c:9:21: missed:   multiple types in double reduction or condition
reduction or fold-left reduction.
foo.c:4:1: missed:   not vectorized: relevant phi not supported:
last_19 = PHI <last_8(7), 108(15)>
While for V4HI, VF = 4 and thus ncopies = 1, so it succeeds.
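
To spell out the arithmetic, here is a minimal standalone sketch of the
ncopies check for the two aarch64 cases above. It is not GCC code: the
helper names ncopies_for and cond_reduction_ok are made up for
illustration, and it assumes reduction_type != TREE_CODE_REDUCTION
(i.e. a condition reduction), so only the ncopies > 1 part of the test
matters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not a GCC function: number of vector statements
   needed to cover one vectorized iteration, i.e. VF / length(vectype_in).  */
unsigned
ncopies_for (unsigned vf, unsigned nunits)
{
  /* The sketch only handles the simple case where VF is a multiple
     of the number of vector elements.  */
  assert (nunits > 0 && vf % nunits == 0);
  return vf / nunits;
}

/* Hypothetical helper mirroring the quoted check for a condition
   reduction: analysis succeeds only when a single copy is needed.  */
bool
cond_reduction_ok (unsigned vf, unsigned nunits)
{
  return ncopies_for (vf, nunits) <= 1;
}
```

With these, cond_reduction_ok (8, 4) is false (V8HI: ncopies = 2) and
cond_reduction_ok (4, 4) is true (V4HI: ncopies = 1).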

For x86_64, the vectorizer doesn't seem to try V4HI mode.
If I "force" the vectorizer to use V4HI mode, we get the following dump:
foo.c:9:21: note:   === vect_analyze_loop_operations ===
foo.c:9:21: note:   examining phi: last_19 = PHI <last_8(7), 108(15)>
foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of
def: internal
foo.c:9:21: note:   vect_is_simple_use: vectype vector(2) int
foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI
<last_8(7), 108(15)>, type of def: reduction
foo.c:9:21: note:   vect_is_simple_use: vectype vector(2) int
foo.c:9:21: missed:   multiple types in double reduction or condition
reduction or fold-left reduction.

Not sure though if this is the only reason the test fails to
vectorize on x86_64.
Will investigate in more detail next week.

Thanks,
Prathamesh
>
> Richard.