From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by sourceware.org (Postfix) with ESMTPS id 73B8E385803C for ; Tue, 5 Jan 2021 09:04:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 73B8E385803C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rguenther@suse.de X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 62ACDAD29; Tue, 5 Jan 2021 09:04:28 +0000 (UTC) Date: Tue, 5 Jan 2021 10:04:28 +0100 (CET) From: Richard Biener To: Richard Sandiford cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] vect: Fix missing alias checks for 128-bit SVE [PR98371] In-Reply-To: Message-ID: References: User-Agent: Alpine 2.21 (LSU 202 2017-01-01) MIME-Version: 1.0 X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 8BIT X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jan 2021 09:04:32 -0000 On Fri, 18 Dec 2020, Richard Sandiford wrote: > On AArch64, the vectoriser tries various ways of vectorising with both > SVE and Advanced SIMD and picks the best one. All other things being > equal, it prefers earlier attempts over later attempts. > > The way this works currently is that, once it has a successful > vectorisation attempt A, it analyses all other attempts as epilogue > loops of A: > > /* When pick_lowest_cost_p is true, we should in principle iterate > over all the loop_vec_infos that LOOP_VINFO could replace and > try to vectorize LOOP_VINFO under the same conditions. > E.g. when trying to replace an epilogue loop, we should vectorize > LOOP_VINFO as an epilogue loop with the same VF limit. When trying > to replace the main loop, we should vectorize LOOP_VINFO as a main > loop too. > > However, autovectorize_vector_modes is usually sorted as follows: > > - Modes that naturally produce lower VFs usually follow modes that > naturally produce higher VFs. > > - When modes naturally produce the same VF, maskable modes > usually follow unmaskable ones, so that the maskable mode > can be used to vectorize the epilogue of the unmaskable mode. > > This order is preferred because it leads to the maximum > epilogue vectorization opportunities. Targets should only use > a different order if they want to make wide modes available while > disparaging them relative to earlier, smaller modes. The assumption > in that case is that the wider modes are more expensive in some > way that isn't reflected directly in the costs. > > There should therefore be few interesting cases in which > LOOP_VINFO fails when treated as an epilogue loop, succeeds when > treated as a standalone loop, and ends up being genuinely cheaper > than FIRST_LOOP_VINFO. */ > > However, the vectoriser can normally elide alias checks for epilogue > loops, on the basis that the main loop should do them instead. > Converting an epilogue loop to a main loop can therefore cause the alias > checks to be skipped. (It probably also unfairly penalises the original > loop in the cost comparison, given that one loop will have alias checks > and the other won't.) > > As the comment says, we should in principle analyse each vector mode > twice: once as a main loop and once as an epilogue. However, doing > that up-front would be quite expensive. This patch instead goes for a > compromise: if an epilogue loop for mode M2 seems better than a main > loop for mode M1, re-analyse with M2 as the main loop. > > The patch fixes dg.torture.exp=pr69719.c when testing with > -msve-vector-bits=128. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? OK. Thanks, Richard. > Richard > > > gcc/ > PR tree-optimization/98371 > * tree-vect-loop.c (vect_reanalyze_as_main_loop): New function. > (vect_analyze_loop): If an epilogue loop appears to be cheaper > than the main loop, re-analyze it as a main loop before adopting > it as a main loop. > --- > gcc/tree-vect-loop.c | 61 +++++++++++++++++++++++++++++++++++++++++--- > 1 file changed, 58 insertions(+), 3 deletions(-) > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index 688538a4521..221541f4a38 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -2850,6 +2850,45 @@ vect_joust_loop_vinfos (loop_vec_info new_loop_vinfo, > return true; > } > > +/* If LOOP_VINFO is already a main loop, return it unmodified. Otherwise > + try to reanalyze it as a main loop. Return the loop_vinfo on success > + and null on failure. */ > + > +static loop_vec_info > +vect_reanalyze_as_main_loop (loop_vec_info loop_vinfo, unsigned int *n_stmts) > +{ > + if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo)) > + return loop_vinfo; > + > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, > + "***** Reanalyzing as a main loop with vector mode %s\n", > + GET_MODE_NAME (loop_vinfo->vector_mode)); > + > + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); > + vec_info_shared *shared = loop_vinfo->shared; > + opt_loop_vec_info main_loop_vinfo = vect_analyze_loop_form (loop, shared); > + gcc_assert (main_loop_vinfo); > + > + main_loop_vinfo->vector_mode = loop_vinfo->vector_mode; > + > + bool fatal = false; > + bool res = vect_analyze_loop_2 (main_loop_vinfo, fatal, n_stmts); > + loop->aux = NULL; > + if (!res) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, > + "***** Failed to analyze main loop with vector" > + " mode %s\n", > + GET_MODE_NAME (loop_vinfo->vector_mode)); > + delete main_loop_vinfo; > + return NULL; > + } > + LOOP_VINFO_VECTORIZABLE_P (main_loop_vinfo) = 1; > + return main_loop_vinfo; > +} > + > /* Function vect_analyze_loop. > > Apply a set of analyses on LOOP, and create a loop_vec_info struct > @@ -2997,9 +3036,25 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared) > if (vinfos.is_empty () > && vect_joust_loop_vinfos (loop_vinfo, first_loop_vinfo)) > { > - delete first_loop_vinfo; > - first_loop_vinfo = opt_loop_vec_info::success (NULL); > - LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = NULL; > + loop_vec_info main_loop_vinfo > + = vect_reanalyze_as_main_loop (loop_vinfo, &n_stmts); > + if (main_loop_vinfo == loop_vinfo) > + { > + delete first_loop_vinfo; > + first_loop_vinfo = opt_loop_vec_info::success (NULL); > + } > + else if (main_loop_vinfo > + && vect_joust_loop_vinfos (main_loop_vinfo, > + first_loop_vinfo)) > + { > + delete first_loop_vinfo; > + first_loop_vinfo = opt_loop_vec_info::success (NULL); > + delete loop_vinfo; > + loop_vinfo > + = opt_loop_vec_info::success (main_loop_vinfo); > + } > + else > + delete main_loop_vinfo; > } > } > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)