From: Richard Sandiford
To: Richard Biener
Cc: GCC Patches
Subject: Re: [11a/n] Avoid retrying with the same vector modes
Date: Wed, 06 Nov 2019 11:02:00 -0000

Richard Biener writes:
> On Wed, Nov 6, 2019 at 11:21 AM Richard Sandiford
> wrote:
>>
>> Richard Biener writes:
>> > On Tue, Nov 5, 2019 at 9:25 PM Richard Sandiford
>> > wrote:
>> >>
>> >> Patch 12/n makes the AArch64 port add four entries to
>> >> autovectorize_vector_modes.  Each entry describes a different
>> >> vector mode assignment for vector code that mixes 8-bit, 16-bit,
>> >> 32-bit and 64-bit elements.  But if (as usual) the vector code has
>> >> fewer element sizes than that, we could end up trying the same
>> >> combination of vector modes multiple times.  This patch adds a
>> >> check to prevent that.
>> >>
>> >> As before: each patch tested individually on aarch64-linux-gnu and the
>> >> series as a whole on x86_64-linux-gnu.
>> >>
>> >>
>> >> 2019-11-04  Richard Sandiford
>> >>
>> >> gcc/
>> >> 	* tree-vectorizer.h (vec_info::mode_set): New typedef.
>> >> 	(vec_info::used_vector_modes): New member variable.
>> >> 	(vect_chooses_same_modes_p): Declare.
>> >> 	* tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
>> >> 	chosen vector mode in vec_info::used_vector_modes.
>> >> 	(vect_chooses_same_modes_p): New function.
>> >> 	* tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
>> >> 	the same vector statements multiple times.
>> >> 	* tree-vect-slp.c (vect_slp_bb_region): Likewise.
>> >>
>> >> Index: gcc/tree-vectorizer.h
>> >> ===================================================================
>> >> --- gcc/tree-vectorizer.h	2019-11-05 10:48:11.246092351 +0000
>> >> +++ gcc/tree-vectorizer.h	2019-11-05 10:57:41.662071145 +0000
>> >> @@ -298,6 +298,7 @@ typedef std::pair vec_object
>> >>  /* Vectorizer state common between loop and basic-block vectorization.  */
>> >>  class vec_info {
>> >>  public:
>> >> +  typedef hash_set<int_hash<machine_mode, E_VOIDmode, E_BLKmode> > mode_set;
>> >>    enum vec_kind { bb, loop };
>> >>
>> >>    vec_info (vec_kind, void *, vec_info_shared *);
>> >> @@ -335,6 +336,9 @@ typedef std::pair vec_object
>> >>    /* Cost data used by the target cost model.  */
>> >>    void *target_cost_data;
>> >>
>> >> +  /* The set of vector modes used in the vectorized region.  */
>> >> +  mode_set used_vector_modes;
>> >> +
>> >>    /* The argument we should pass to related_vector_mode when looking up
>> >>       the vector mode for a scalar mode, or VOIDmode if we haven't yet
>> >>       made any decisions about which vector modes to use.  */
>> >> @@ -1615,6 +1619,7 @@ extern tree get_related_vectype_for_scal
>> >>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
>> >>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
>> >>  extern tree get_same_sized_vectype (tree, tree);
>> >> +extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
>> >>  extern bool vect_get_loop_mask_type (loop_vec_info);
>> >>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
>> >> 				  stmt_vec_info * = NULL, gimple ** = NULL);
>> >> Index: gcc/tree-vect-stmts.c
>> >> ===================================================================
>> >> --- gcc/tree-vect-stmts.c	2019-11-05 10:48:11.242092379 +0000
>> >> +++ gcc/tree-vect-stmts.c	2019-11-05 10:57:41.662071145 +0000
>> >> @@ -11235,6 +11235,10 @@ get_vectype_for_scalar_type (vec_info *v
>> >>  						  scalar_type);
>> >>    if (vectype && vinfo->vector_mode == VOIDmode)
>> >>      vinfo->vector_mode = TYPE_MODE (vectype);
>> >> +
>> >> +  if (vectype)
>> >> +    vinfo->used_vector_modes.add (TYPE_MODE (vectype));
>> >> +
>> >
>> > Do we actually end up _using_ all types returned by this function?
>>
>> No, not all of them, so it's a bit crude.  E.g. some types might end up
>> not being relevant after pattern recognition, or after we've made a
>> final decision about which parts of an address calculation to include
>> in a gather or scatter op.  So we can still end up retrying the same
>> thing even after the patch.
>>
>> The problem is that we're trying to avoid pointless retries on failure
>> as well as success, so we could end up stopping at arbitrary points.
>> I wasn't sure where else to handle this.
>
> Yeah, I think this "iterating" is somewhat bogus (crude) now.

I think it was crude even before the series though. :-)  Not sure the
series is making things worse.

The problem is that there's a chicken-and-egg problem between how we
decide to vectorise and which vector subarchitecture and VF we use.
E.g. if we have:

  unsigned char *x, *y;
  ...
  x[i] = (unsigned short) (x[i] + y[i] + 1) >> 1;

do we build the SLP graph on the assumption that we need to use short
elements, or on the assumption that we can use IFN_AVG_CEIL?  This
affects the VF we get out: using IFN_AVG_CEIL gives double the VF
relative to doing unsigned short arithmetic.

And we need to know which vector subarchitecture we're targeting when
making that decision: e.g. Advanced SIMD and SVE2 have IFN_AVG_CEIL,
but SVE doesn't.  On the other hand, SVE supports things that Advanced
SIMD doesn't.  It's a similar story, of course, for the x86 vector
subarchs.

For one pattern like this, we could simply try both ways.  But that
becomes untenable if there are multiple potential patterns.  Iterating
over the vector subarchs gives us a sensible way of reducing the search
space by only applying patterns that the subarch supports.
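A minimal, self-contained form of that example, for concreteness (the
function name, signature and use of "restrict" are illustrative only and
not taken from the patch or the thread):

  void
  avg_ceil (unsigned char *restrict x, unsigned char *restrict y, int n)
  {
    for (int i = 0; i < n; ++i)
      /* Rounding ("ceiling") average.  A target with IFN_AVG_CEIL can
	 keep this in 8-bit lanes; without it, the arithmetic has to be
	 done on unsigned short elements, halving the effective VF.  */
      x[i] = (unsigned short) (x[i] + y[i] + 1) >> 1;
  }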
So...

> What we'd like to collect is for all defs the vector types we could
> use and then vectorizable_ defines constraints between input and
> output vector types.  From that we'd arrive at a (possibly quite
> large) set of "SLP graphs with vector types" we'd choose from.  I
> believe we'll never want to truly explore the whole space but guess we
> want to greedily compute those "SLP graphs with vector types" starting
> from what (grouped) datarefs tell us is possible (which is kind of
> what we do now).

...I don't think we can/should use the same SLP graph to pick vector
types for all subarchs, since the ideal graph depends on the subarch.

I'm also not sure the vectorizable_* routines could say anything that
isn't obvious from the input and output scalar types.  Won't it still
be the case that, within an SLP instance, all scalars of type T will
become the same vector(N) T?

Thanks,
Richard