From: Richard Sandiford
To: Richard Biener
Cc: GCC Patches
Subject: Re: [11a/n] Avoid retrying with the same vector modes
Date: Wed, 06 Nov 2019 11:02:00 -0000

Richard Biener writes:
> On Wed, Nov 6, 2019 at 11:21 AM Richard Sandiford
> wrote:
>>
>> Richard Biener writes:
>> > On Tue, Nov 5, 2019 at 9:25 PM Richard Sandiford
>> > wrote:
>> >>
>> >> Patch 12/n makes the AArch64 port add four entries to
>> >> autovectorize_vector_modes.  Each entry describes a different
>> >> vector mode assignment for vector code that mixes 8-bit, 16-bit,
>> >> 32-bit and 64-bit elements.  But if (as usual) the vector code has
>> >> fewer element sizes than that, we could end up trying the same
>> >> combination of vector modes multiple times.  This patch adds a
>> >> check to prevent that.
>> >>
>> >> As before: each patch tested individually on aarch64-linux-gnu and the
>> >> series as a whole on x86_64-linux-gnu.
>> >>
>> >>
>> >> 2019-11-04  Richard Sandiford
>> >>
>> >> gcc/
>> >> 	* tree-vectorizer.h (vec_info::mode_set): New typedef.
>> >> 	(vec_info::used_vector_modes): New member variable.
>> >> 	(vect_chooses_same_modes_p): Declare.
>> >> 	* tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
>> >> 	chosen vector mode in vec_info::used_vector_modes.
>> >> 	(vect_chooses_same_modes_p): New function.
>> >> 	* tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
>> >> 	the same vector statements multiple times.
>> >> 	* tree-vect-slp.c (vect_slp_bb_region): Likewise.
>> >>
>> >> Index: gcc/tree-vectorizer.h
>> >> ===================================================================
>> >> --- gcc/tree-vectorizer.h	2019-11-05 10:48:11.246092351 +0000
>> >> +++ gcc/tree-vectorizer.h	2019-11-05 10:57:41.662071145 +0000
>> >> @@ -298,6 +298,7 @@ typedef std::pair vec_object
>> >>  /* Vectorizer state common between loop and basic-block vectorization.  */
>> >>  class vec_info {
>> >>  public:
>> >> +  typedef hash_set<int_hash<machine_mode, E_VOIDmode, E_BLKmode> > mode_set;
>> >>    enum vec_kind { bb, loop };
>> >>
>> >>    vec_info (vec_kind, void *, vec_info_shared *);
>> >> @@ -335,6 +336,9 @@ typedef std::pair vec_object
>> >>    /* Cost data used by the target cost model.  */
>> >>    void *target_cost_data;
>> >>
>> >> +  /* The set of vector modes used in the vectorized region.  */
>> >> +  mode_set used_vector_modes;
>> >> +
>> >>    /* The argument we should pass to related_vector_mode when looking up
>> >>       the vector mode for a scalar mode, or VOIDmode if we haven't yet
>> >>       made any decisions about which vector modes to use.  */
>> >> @@ -1615,6 +1619,7 @@ extern tree get_related_vectype_for_scal
>> >>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
>> >>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
>> >>  extern tree get_same_sized_vectype (tree, tree);
>> >> +extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
>> >>  extern bool vect_get_loop_mask_type (loop_vec_info);
>> >>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
>> >> 				  stmt_vec_info * = NULL, gimple ** = NULL);
>> >> Index: gcc/tree-vect-stmts.c
>> >> ===================================================================
>> >> --- gcc/tree-vect-stmts.c	2019-11-05 10:48:11.242092379 +0000
>> >> +++ gcc/tree-vect-stmts.c	2019-11-05 10:57:41.662071145 +0000
>> >> @@ -11235,6 +11235,10 @@ get_vectype_for_scalar_type (vec_info *v
>> >>  						  scalar_type);
>> >>    if (vectype && vinfo->vector_mode == VOIDmode)
>> >>      vinfo->vector_mode = TYPE_MODE (vectype);
>> >> +
>> >> +  if (vectype)
>> >> +    vinfo->used_vector_modes.add (TYPE_MODE (vectype));
>> >> +
>> >
>> > Do we actually end up _using_ all types returned by this function?
>>
>> No, not all of them, so it's a bit crude.  E.g. some types might end up
>> not being relevant after pattern recognition, or after we've made a
>> final decision about which parts of an address calculation to include
>> in a gather or scatter op.  So we can still end up retrying the same
>> thing even after the patch.
>>
>> The problem is that we're trying to avoid pointless retries on failure
>> as well as success, so we could end up stopping at arbitrary points.
>> I wasn't sure where else to handle this.
>
> Yeah, I think this "iterating" is somewhat bogus (crude) now.

I think it was crude even before the series though. :-)  Not sure the
series is making things worse.

The problem is that there's a chicken-and-egg problem between how we
decide to vectorise and which vector subarchitecture and VF we use.
E.g. if we have:

  unsigned char *x, *y;
  ...
  x[i] = (unsigned short) (x[i] + y[i] + 1) >> 1;

do we build the SLP graph on the assumption that we need to use short
elements, or on the assumption that we can use IFN_AVG_CEIL?  This
affects the VF we get out: using IFN_AVG_CEIL gives double the VF
relative to doing unsigned short arithmetic.

And we need to know which vector subarchitecture we're targeting when
making that decision: e.g. Advanced SIMD and SVE2 have IFN_AVG_CEIL,
but SVE doesn't.  On the other hand, SVE supports things that Advanced
SIMD doesn't.  It's a similar story, of course, for the x86 vector
subarchs.

For one pattern like this, we could simply try both ways.  But that
becomes untenable if there are multiple potential patterns.  Iterating
over the vector subarchs gives us a sensible way of reducing the search
space by only applying patterns that the subarch supports.
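A minimal, self-contained form of that example, for concreteness (the
function name, signature and use of "restrict" are illustrative only and
not taken from the patch or the thread):

  void
  avg_ceil (unsigned char *restrict x, unsigned char *restrict y, int n)
  {
    for (int i = 0; i < n; ++i)
      /* Rounding ("ceiling") average.  A target with IFN_AVG_CEIL can
	 keep this in 8-bit lanes; without it, the arithmetic has to be
	 done on unsigned short elements, halving the effective VF.  */
      x[i] = (unsigned short) (x[i] + y[i] + 1) >> 1;
  }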
So...

> What we'd like to collect is for all defs the vector types we could
> use and then vectorizable_ defines constraints between input and
> output vector types.  From that we'd arrive at a (possibly quite
> large) set of "SLP graphs with vector types" we'd choose from.  I
> believe we'll never want to truly explore the whole space but guess we
> want to greedily compute those "SLP graphs with vector types" starting
> from what (grouped) datarefs tell us is possible (which is kind of
> what we do now).

...I don't think we can/should use the same SLP graph to pick vector
types for all subarchs, since the ideal graph depends on the subarch.

I'm also not sure the vectorizable_* routines could say anything that
isn't obvious from the input and output scalar types.  Won't it still
be the case that, within an SLP instance, all scalars of type T will
become the same vector(N) T?

Thanks,
Richard