From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id E92023858C2C for ; Fri, 1 Oct 2021 08:19:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E92023858C2C Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id A13A122677; Fri, 1 Oct 2021 08:19:23 +0000 (UTC) Received: from murzim.suse.de (murzim.suse.de [10.160.4.192]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 84467A3B83; Fri, 1 Oct 2021 08:19:23 +0000 (UTC) Date: Fri, 1 Oct 2021 10:19:23 +0200 (CEST) From: Richard Biener To: "Andre Vieira (lists)" cc: "gcc-patches@gcc.gnu.org" , Richard Sandiford Subject: Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling In-Reply-To: Message-ID: <4272814n-8538-p793-157q-5n6q16r48n51@fhfr.qr> References: <4a2e6dde-cc5c-97fe-7a43-bd59d542c2ce@arm.com> <27777876-4201-5e86-bf9a-063143d38641@arm.com> MIME-Version: 1.0 X-Spam-Status: No, score=-3.5 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8BIT X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2021 08:19:27 -0000 On Thu, 30 Sep 2021, Andre Vieira (lists) wrote: > Hi, > > > >> That just forces trying the vector modes we've tried before. Though I might > >> need to revisit this now I think about it. I'm afraid it might be possible > >> for > >> this to generate an epilogue with a vf that is not lower than that of the > >> main > >> loop, but I'd need to think about this again. > >> > >> Either way I don't think this changes the vector modes used for the > >> epilogue. > >> But maybe I'm just missing your point here. > > Yes, I was refering to the above which suggests that when we vectorize > > the main loop with V4SF but unroll then we try vectorizing the > > epilogue with V4SF as well (but not unrolled). I think that's > > premature (not sure if you try V8SF if the main loop was V4SF but > > unrolled 4 times). > > My main motivation for this was because I had a SVE loop that vectorized with > both VNx8HI, then V8HI which beat VNx8HI on cost, then it decided to unroll > V8HI by two and skipped using VNx8HI as a predicated epilogue which would've > been the best choice. I see, yes - for fully predicated epilogues it makes sense to consider the same vector mode as for the main loop anyways (independent on whether we're unrolling or not). One could argue that with an unrolled V4SImode main loop a predicated V8SImode epilogue would also be a good match (but then somehow costing favored the unrolled V4SI over the V8SI for the main loop...). > So that is why I decided to just 'reset' the vector_mode selection. In a > scenario where you only have the traditional vector modes it might make less > sense. > > Just realized I still didn't add any check to make sure the epilogue has a > lower VF than the previous loop, though I'm still not sure that could happen. > I'll go look at where to add that if you agree with this. As said above, it only needs a lower VF in case the epilogue is not fully masked - otherwise the same VF would be OK. > >> I can move it there, it would indeed remove the need for the change to > >> vect_update_vf_for_slp, the change to > >> vect_determine_partial_vectors_and_peeling would still be required I think. > >> It > >> is meant to disable using partial vectors in an unrolled loop. > > Why would we disable the use of partial vectors in an unrolled loop? > The motivation behind that is that the overhead caused by generating > predicates for each iteration will likely be too much for it to be profitable > to unroll. On top of that, when dealing with low iteration count loops, if > executing one predicated iteration would be enough we now still need to > execute all other unrolled predicated iterations, whereas if we keep them > unrolled we skip the unrolled loops. OK, I guess we're not factoring in costs when deciding on predication but go for it if it's gernally enabled and possible. With the proposed scheme we'd then cost the predicated not unrolled loop against a not predicated unrolled loop which might be a bit apples vs. oranges also because the target made the unroll decision based on the data it collected for the predicated loop. > > Sure but I'm suggesting you keep the not unrolled body as one way of > > costed vectorization but then if the target says "try unrolling" > > re-do the analysis with the same mode but a larger VF. Just like > > we iterate over vector modes you'll now iterate over pairs of > > vector mode + VF (unroll factor). It's not about re-using the costing > > it's about using costing that is actually relevant and also to avoid > > targets inventing two distinct separate costings - a target (powerpc) > > might already compute load/store density and other stuff for the main > > costing so it should have an idea whether doubling or triplicating is OK. > > > > Richard. > Sounds good! I changed the patch to determine the unrolling factor later, > after all analysis has been done and retry analysis if an unrolling factor > larger than 1 has been chosen for this loop and vector_mode. > > gcc/ChangeLog: > >         * doc/tm.texi: Document TARGET_VECTORIZE_UNROLL_FACTOR. >         * doc/tm.texi.in: Add entries for TARGET_VECTORIZE_UNROLL_FACTOR. >         * params.opt: Add vect-unroll and vect-unroll-reductions > parameters. What's the reason to add the --params? It looks like this makes us unroll with a static number short-cutting the target. IMHO that's never going to be a great thing - but what we could do is look at loop->unroll and try to honor that (factoring in that the vectorization factor is already the times we unroll). So I'd leave those params out for now, the user would have a much more fine-grained way to control this with the unroll pragma. Adding a max-vect-unroll parameter would be another thing but that would apply after the targets or pragma decision. >         * target.def: Define hook TARGET_VECTORIZE_UNROLL_FACTOR. I still do not like the new target hook - as said I'd like to make you have the finis_cost hook allow the target to specify a suggested unroll factor instead because that's the point where it has all the info. Thanks, Richard. >         * targhooks.c (default_unroll_factor): New. >         * targhooks.h (default_unroll_factor): Likewise. >         * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize >         par_unrolling_factor. >         (vect_determine_partial_vectors_and_peeling): Account for > unrolling. >         (vect_determine_unroll_factor): New. >         (vect_try_unrolling): New. >         (vect_reanalyze_as_main_loop): Call vect_try_unrolling when >         retrying a loop_vinfo as a main loop. >         (vect_analyze_loop): Call vect_try_unrolling when vectorizing > main loops. >         (vect_analyze_loop): Allow for epilogue vectorization when unrolling >         and rewalk vector_mode warray for the epilogues. >         (vectorizable_reduction): Disable single_defuse_cycle when > unrolling. >         * tree-vectorizer.h (vect_unroll_value): Declare par_unrolling_factor >         as a member of loop_vec_info. > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)