From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id EE0693858402 for ; Tue, 21 Sep 2021 16:34:03 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org EE0693858402 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8F803113E; Tue, 21 Sep 2021 09:34:03 -0700 (PDT) Received: from [10.57.73.81] (unknown [10.57.73.81]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 060C13F718; Tue, 21 Sep 2021 09:34:02 -0700 (PDT) Subject: Re: [PATCH 1/3][vect] Add main vectorized loop unrolling To: Richard Biener Cc: "gcc-patches@gcc.gnu.org" , Richard Sandiford References: <4a2e6dde-cc5c-97fe-7a43-bd59d542c2ce@arm.com> <27777876-4201-5e86-bf9a-063143d38641@arm.com> From: "Andre Vieira (lists)" Message-ID: Date: Tue, 21 Sep 2021 17:34:05 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00, BODY_8BITS, KAM_DMARC_STATUS, NICE_REPLY_A, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Sep 2021 16:34:06 -0000 Hi Richi, Thanks for the review, see below some questions. On 21/09/2021 13:30, Richard Biener wrote: > On Fri, 17 Sep 2021, Andre Vieira (lists) wrote: > >> Hi all, >> >> This patch adds the ability to define a target hook to unroll the main >> vectorized loop. It also introduces --param's vect-unroll and >> vect-unroll-reductions to control this through a command-line. I found this >> useful to experiment and believe can help when tuning, so I decided to leave >> it in. >> We only unroll the main loop and have disabled unrolling epilogues for now. We >> also do not support unrolling of any loop that has a negative step and we do >> not support unrolling a loop with any reduction other than a >> TREE_CODE_REDUCTION. >> >> Bootstrapped and regression tested on aarch64-linux-gnu as part of the series. > I wonder why we want to change the vector modes used for the epilogue, > we're either making it more likely to need to fall through to the > scalar epilogue or require another vectorized epilogue. I don't quite understand what you mean by change the vector modes for the epilogue. I don't think we do. If you are referring to:       /* If we are unrolling, try all VECTOR_MODES for the epilogue.  */       if (loop_vinfo->par_unrolling_factor > 1)         {           next_vector_mode = vector_modes[0];           mode_i = 1;           if (dump_enabled_p ())         dump_printf_loc (MSG_NOTE, vect_location,                  "***** Re-trying analysis with vector mode"                  " %s for epilogue with partial vectors.\n",                  GET_MODE_NAME (next_vector_mode));           continue;         } That just forces trying the vector modes we've tried before. Though I might need to revisit this now I think about it. I'm afraid it might be possible for this to generate an epilogue with a vf that is not lower than that of the main loop, but I'd need to think about this again. Either way I don't think this changes the vector modes used for the epilogue. But maybe I'm just missing your point here. > That said, for simplicity I'd only change the VF of the main loop. > > There I wonder why you need to change vect_update_vf_for_slp or > vect_determine_partial_vectors_and_peeling and why it's not enough > to adjust the VF in a single place, I'd do that here: > > /* This is the point where we can re-start analysis with SLP forced off. > */ > start_over: > > /* Now the vectorization factor is final. */ > poly_uint64 vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > gcc_assert (known_ne (vectorization_factor, 0U)); > > ----> call vect_update_vf_for_unroll () I can move it there, it would indeed remove the need for the change to vect_update_vf_for_slp, the change to vect_determine_partial_vectors_and_peeling would still be required I think. It is meant to disable using partial vectors in an unrolled loop. > note there's also loop->unroll (from #pragma GCC unroll) which we > could include in what you look at in vect_unroll_value. > > I don't like add_stmt_cost_for_unroll - how should a target go > and decide based on what it is fed? You could as well feed it > the scalar body or the vinfo so it can get a shot at all > the vectorizers meta data - but feeding it individual stmt_infos > does not add any meaningful abstraction and thus what's the > point? I am still working on tuning our backend hook, but the way it works is it estimates how many load, store and general ops are required for the vectorized loop based on these. > I _think_ what would make some sense is when we actually cost > the vector body (with the not unrolled VF) ask the target > "well, how about unrolling this?" because there it has the > chance to look at the actual vector stmts produced (in "cost form"). > And if the target answers "yeah - go ahead and try x4" we signal > that to the iteration and have "mode N with x4 unroll" validated and > costed. > > So instead of a new target hook amend the finish_cost hook to > produce a suggested unroll value and cost both the unrolled and > not unrolled body. > > Sorry for steering in a different direction ;) The reason we decided to do this early and not after cost is because 'vect_prune_runtime_alias_test_list' and 'vect_enhance_data_refs_alignment' require the VF and if you suddenly raise that the alias analysis could become invalid. An initial implementation did do it later for that very reason that we could reuse the cost calculations and AArch64 already computed these 'ops' after Richard Sandiford's patches. But yeah ... the above kinda led me to rewrite it this way. > > Thanks, > Richard. > > > >> gcc/ChangeLog: >> >>         * doc/tm.texi: Document TARGET_VECTORIZE_UNROLL_FACTOR >>         and TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL. >>         * doc/tm.texi.in: Add entries for target hooks above. >>         * params.opt: Add vect-unroll and vect-unroll-reductions >> parameters. >>         * target.def: Define hooks TARGET_VECTORIZE_UNROLL_FACTOR >>         and TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL. >>         * targhooks.c (default_add_stmt_cost_for_unroll): New. >>         (default_unroll_factor): Likewise. >>         * targhooks.h (default_add_stmt_cost_for_unroll): Likewise. >>         (default_unroll_factor): Likewise. >>         * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize >>         par_unrolling_factor. >>         (vect_update_vf_for_slp): Use unrolling factor to update >> vectorization >>         factor. >>         (vect_determine_partial_vectors_and_peeling): Account for >> unrolling. >>         (vect_determine_unroll_factor): Determine how much to unroll >> vectorized >>         main loop. >>         (vect_analyze_loop_2): Call vect_determine_unroll_factor. >>         (vect_analyze_loop): Allow for epilogue vectorization when >> unrolling >>         and rewalk vector_mode array for the epilogues. >>         (vectorizable_reduction): Disable single_defuse_cycle when >> unrolling. >>         * tree-vectorizer.h (vect_unroll_value): Declare par_unrolling_factor >>         as a member of loop_vec_info. >>