From: Richard Biener
Date: Wed, 06 Nov 2019 11:22:00 -0000
Subject: Re: [11a/n] Avoid retrying with the same vector modes
To: Richard Sandiford
Cc: GCC Patches

On Wed, Nov 6, 2019 at 12:02 PM Richard Sandiford wrote:
>
> Richard Biener writes:
> > On Wed, Nov 6, 2019 at 11:21 AM Richard Sandiford wrote:
> >>
> >> Richard Biener writes:
> >> > On Tue, Nov 5, 2019 at 9:25 PM Richard Sandiford wrote:
> >> >>
> >> >> Patch 12/n makes the AArch64 port add four entries to
> >> >> autovectorize_vector_modes.  Each entry describes a different
> >> >> vector mode assignment for vector code that mixes 8-bit, 16-bit,
> >> >> 32-bit and 64-bit elements.  But if (as usual) the vector code has
> >> >> fewer element sizes than that, we could end up trying the same
> >> >> combination of vector modes multiple times.  This patch adds a
> >> >> check to prevent that.
> >> >>
> >> >> As before: each patch tested individually on aarch64-linux-gnu and the
> >> >> series as a whole on x86_64-linux-gnu.
> >> >>
> >> >>
> >> >> 2019-11-04  Richard Sandiford
> >> >>
> >> >> gcc/
> >> >>     * tree-vectorizer.h (vec_info::mode_set): New typedef.
> >> >>     (vec_info::used_vector_modes): New member variable.
> >> >>     (vect_chooses_same_modes_p): Declare.
> >> >>     * tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
> >> >>     chosen vector mode in vec_info::used_vector_modes.
> >> >>     (vect_chooses_same_modes_p): New function.
> >> >>     * tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
> >> >>     the same vector statements multiple times.
> >> >>     * tree-vect-slp.c (vect_slp_bb_region): Likewise.
> >> >>
> >> >> Index: gcc/tree-vectorizer.h
> >> >> ===================================================================
> >> >> --- gcc/tree-vectorizer.h   2019-11-05 10:48:11.246092351 +0000
> >> >> +++ gcc/tree-vectorizer.h   2019-11-05 10:57:41.662071145 +0000
> >> >> @@ -298,6 +298,7 @@ typedef std::pair<tree, tree> vec_object
> >> >>  /* Vectorizer state common between loop and basic-block vectorization.  */
> >> >>  class vec_info {
> >> >>  public:
> >> >> +  typedef hash_set<int_hash<machine_mode, E_VOIDmode, E_BLKmode> > mode_set;
> >> >>    enum vec_kind { bb, loop };
> >> >>
> >> >>    vec_info (vec_kind, void *, vec_info_shared *);
> >> >> @@ -335,6 +336,9 @@ typedef std::pair<tree, tree> vec_object
> >> >>    /* Cost data used by the target cost model.  */
> >> >>    void *target_cost_data;
> >> >>
> >> >> +  /* The set of vector modes used in the vectorized region.  */
> >> >> +  mode_set used_vector_modes;
> >> >> +
> >> >>    /* The argument we should pass to related_vector_mode when looking up
> >> >>       the vector mode for a scalar mode, or VOIDmode if we haven't yet
> >> >>       made any decisions about which vector modes to use.  */
> >> >> @@ -1615,6 +1619,7 @@ extern tree get_related_vectype_for_scal
> >> >>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
> >> >>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
> >> >>  extern tree get_same_sized_vectype (tree, tree);
> >> >> +extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
> >> >>  extern bool vect_get_loop_mask_type (loop_vec_info);
> >> >>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
> >> >>                                  stmt_vec_info * = NULL, gimple ** = NULL);
> >> >> Index: gcc/tree-vect-stmts.c
> >> >> ===================================================================
> >> >> --- gcc/tree-vect-stmts.c   2019-11-05 10:48:11.242092379 +0000
> >> >> +++ gcc/tree-vect-stmts.c   2019-11-05 10:57:41.662071145 +0000
> >> >> @@ -11235,6 +11235,10 @@ get_vectype_for_scalar_type (vec_info *v
> >> >>                                                        scalar_type);
> >> >>    if (vectype && vinfo->vector_mode == VOIDmode)
> >> >>      vinfo->vector_mode = TYPE_MODE (vectype);
> >> >> +
> >> >> +  if (vectype)
> >> >> +    vinfo->used_vector_modes.add (TYPE_MODE (vectype));
> >> >> +
> >> >
> >> > Do we actually end up _using_ all types returned by this function?
> >>
> >> No, not all of them, so it's a bit crude.  E.g. some types might end up
> >> not being relevant after pattern recognition, or after we've made a
> >> final decision about which parts of an address calculation to include
> >> in a gather or scatter op.  So we can still end up retrying the same
> >> thing even after the patch.
> >>
> >> The problem is that we're trying to avoid pointless retries on failure
> >> as well as success, so we could end up stopping at arbitrary points.
> >> I wasn't sure where else to handle this.
> >
> > Yeah, I think this "iterating" is somewhat bogus (crude) now.
>
> I think it was crude even before the series though. :-)  Not sure the
> series is making things worse.
>
> The problem is that there's a chicken-and-egg relationship between how
> we decide to vectorise and which vector subarchitecture and VF we use.
> E.g. if we have:
>
>   unsigned char *x, *y;
>   ...
>   x[i] = (unsigned short) (x[i] + y[i] + 1) >> 1;
>
> do we build the SLP graph on the assumption that we need to use short
> elements, or on the assumption that we can use IFN_AVG_CEIL?  This
> affects the VF we get out: using IFN_AVG_CEIL gives double the VF
> relative to doing unsigned short arithmetic.
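
[For concreteness, a minimal hand-written sketch of the two strategies
discussed above, using Advanced SIMD intrinsics.  It is not part of the
patch or the thread; the function names are invented, and it assumes n
is a multiple of 16 and that x and y do not alias.]

#include <arm_neon.h>
#include <stddef.h>

/* Strategy 1: recognise the averaging idiom (what IFN_AVG_CEIL models).
   All arithmetic stays in 8-bit lanes, so one 128-bit vector covers
   16 elements: VF = 16.  */
void
avg_ceil_vf16 (uint8_t *x, const uint8_t *y, size_t n)
{
  for (size_t i = 0; i < n; i += 16)
    {
      uint8x16_t a = vld1q_u8 (x + i);
      uint8x16_t b = vld1q_u8 (y + i);
      /* vrhaddq_u8 computes (a + b + 1) >> 1 per lane without overflow.  */
      vst1q_u8 (x + i, vrhaddq_u8 (a, b));
    }
}

/* Strategy 2: do what the scalar code literally says, widening to
   16-bit lanes.  The widest element is now 16 bits, so one 128-bit
   vector covers only 8 elements: VF = 8, half of strategy 1.  */
void
widen_vf8 (uint8_t *x, const uint8_t *y, size_t n)
{
  for (size_t i = 0; i < n; i += 8)
    {
      uint16x8_t sum = vaddl_u8 (vld1_u8 (x + i), vld1_u8 (y + i));
      sum = vshrq_n_u16 (vaddq_u16 (sum, vdupq_n_u16 (1)), 1);
      vst1_u8 (x + i, vmovn_u16 (sum));
    }
}

The point of the sketch is only to show why the choice changes the VF:
which of the two is available (and cheaper) depends on the vector
subarchitecture being targeted.
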
> And we need to know which vector subarchitecture we're targeting when
> making that decision: e.g. Advanced SIMD and SVE2 have IFN_AVG_CEIL,
> but SVE doesn't.  On the other hand, SVE supports things that Advanced
> SIMD doesn't.  It's a similar story of course for the x86 vector subarchs.
>
> For one pattern like this, we could simply try both ways.
> But that becomes untenable if there are multiple potential patterns.
> Iterating over the vector subarchs gives us a sensible way of reducing
> the search space by only applying patterns that the subarch supports.
>
> So...
>
> > What we'd like to collect is, for all defs, the vector types we could
> > use, and then vectorizable_* defines constraints between input and
> > output vector types.  From that we'd arrive at a (possibly quite
> > large) set of "SLP graphs with vector types" we'd choose from.  I
> > believe we'll never want to truly explore the whole space, but I guess
> > we want to greedily compute those "SLP graphs with vector types"
> > starting from what the (grouped) datarefs tell us is possible (which
> > is kind of what we do now).
>
> ...I don't think we can/should use the same SLP graph to pick vector
> types for all subarchs, since the ideal graph depends on the subarch.
> I'm also not sure the vectorizable_* routines could say anything that
> isn't obvious from the input and output scalar types.  Won't it still be
> the case that within an SLP instance, all scalars of type T will become
> the same vector(N) T?

Not necessarily.  I can see the SLP graph containing "reductions" (like
those complex patterns proposed).  But yes, at the moment there's a
single group size per SLP instance.  Now, for the SLP _graph_ we may
have, for example, one instance with group size 4 and one with group
size 8, both sharing the same grouped load.  It may make sense to
vectorize the load with v8si and, for the group-size-4 SLP instance,
have a "reduction operation" that selects the appropriate part of the
loaded vector.

Now, vectorizable_* for, say, a conversion from int to double may be
able to vectorize a v4si input directly to v4df or to two v2df.  With
the size restriction relaxed, vectorizable_* could opt to choose
smaller or larger vectors specifically, and thus also have different
vector types for the same scalar type in the SLP graph.  I do expect
this to be profitable on x86 for some loops, due to asymmetries in the
ISA (and the extra cost of lane-crossing operations for, say, AVX,
where using SSE is cheaper for some ops even though you then need two
or more instructions).

Richard.

> Thanks,
> Richard
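
[Purely illustrative, not from the thread: the int-to-double conversion
mentioned above written with x86 intrinsics, showing how one v4si input
can become either one v4df or two v2df, i.e. how the same scalar type
can end up in different vector types.  The function names are invented.]

#include <immintrin.h>

/* One v4si -> one v4df: needs AVX and produces a single 256-bit result.  */
__m256d
convert_v4si_to_v4df (__m128i v)
{
  return _mm256_cvtepi32_pd (v);
}

/* One v4si -> two v2df: stays within 128-bit SSE2; two converts instead
   of one, but no 256-bit operation is required.  */
void
convert_v4si_to_2x_v2df (__m128i v, __m128d *lo, __m128d *hi)
{
  *lo = _mm_cvtepi32_pd (v);                          /* elements 0,1 */
  *hi = _mm_cvtepi32_pd (_mm_unpackhi_epi64 (v, v));  /* elements 2,3 */
}
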