From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 38770 invoked by alias); 9 Nov 2016 12:52:48 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 38752 invoked by uid 89); 9 Nov 2016 12:52:47 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-4.8 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD,SPF_PASS autolearn=ham version=3.3.2 spammy=ysrumyangmailcom, ysrumyan@gmail.com, amkerchenggmailcom, amker.cheng@gmail.com X-HELO: mx2.suse.de Received: from mx2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 09 Nov 2016 12:52:37 +0000 Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id BB93EADC3; Wed, 9 Nov 2016 12:52:34 +0000 (UTC) Date: Wed, 09 Nov 2016 12:52:00 -0000 From: Richard Biener To: Yuri Rumyantsev cc: "Bin.Cheng" , Jeff Law , gcc-patches , Ilya Enkovich Subject: Re: [PATCH, vec-tails] Support loop epilogue vectorization In-Reply-To: Message-ID: References: User-Agent: Alpine 2.11 (LSU 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-SW-Source: 2016-11/txt/msg00829.txt.bz2 On Wed, 9 Nov 2016, Yuri Rumyantsev wrote: > Thanks Richard for your comments. > Your proposed to handle epilogue loop just like normal short-trip loop > but this is not exactly truth since e.g. epilogue must not be peeled > for alignment. But if we know the epilogue data-refs are aligned we should have reflected that in the code so the vectorizer wouldn't even try to peel for alignment. OTOH peeling for alignment already knows that peeling a short-trip loop is not likely profitable (maybe the condition needs to be hardened to also work for VF/2). > It is not clear for me what are my next steps? Should I re-design the > patch completely or simply decompose the whole patch to different > parts? But it means that we must start review process from beginning > but release is closed to its end. What I disliked about the series from the beginning is that it does everything at once rather than first introducing vectorizing of epilogues as an independent patch. Lumping all in together makes it hard to decipher the conditions each feature is enabled. I'm mostly concerned about the predication part and thus if we can get the other parts separated and committed that would be a much smaller patch to look at and experiment. Note that only stage1 is at its end, we usually still accept patches that were posted before stage1 end during stage3. > Note also that i work for Intel till the end of year and have not idea > who will continue working on this project. Noted. Richard. > Any help will be appreciated. > > Thanks. > Yuri. > > 2016-11-09 13:37 GMT+03:00 Bin.Cheng : > > On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev wrote: > >> Hi All, > >> > >> I re-send all patches sent by Ilya earlier for review which support > >> vectorization of loop epilogues and loops with low trip count. We > >> assume that the only patch - vec-tails-07-combine-tail.patch - was not > >> approved by Jeff. > >> > >> I did re-base of all patches and performed bootstrapping and > >> regression testing that did not show any new failures. Also all > >> changes related to new vect_do_peeling algorithm have been changed > >> accordingly. > >> > >> Is it OK for trunk? > > > > Hi, > > I can't approve patches, but had some comments after going through the > > implementation. > > > > One confusing part is cost model change, as well as the way it's used > > to decide how epilogue loop should be vectorized. Given vect-tail is > > disabled at the moment and the cost change needs further tuning, is it > > reasonable to split this part out and get vectorization part > > reviewed/submitted independently? For example, let user specified > > parameters make the decision for now. Cost and target dependent > > changes should go in at last, this could make the patch easier to > > read. > > > > The implementation computes/shares quite amount information from main > > loop to epilogue loop vectorization. Furthermore, variables/fields > > for such information are somehow named in a misleading way. For > > example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the > > flag controlling whether epilogue loop should be vectorized with > > masking. However, it's actually controlled by exactly the same flag > > as whether epilogue loop should be combined into the main loop with > > masking: > > @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) > > > > slpeel_make_loop_iterate_ntimes (loop, niters_vector); > > > > + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) > > + vect_combine_loop_epilogue (loop_vinfo); > > + > > /* Reduce loop iterations by the vectorization factor. */ > > scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), > > expected_iterations / vf); > > > > IMHO, we should decouple main loop vectorization and epilogue > > vectorization as much as possible by sharing as few information as we > > can. The general idea is to handle epilogue loop just like normal > > short-trip loop. For example, we can rename > > LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something > > else), and we don't differentiate its meaning between main and > > epilogue(short-trip) loop. It only indicates the current loop should > > be vectorized with masking no matter it's a main loop or epilogue > > loop, and it works just like the current implementation. > > > > After this change, we can refine vectorization and make it more > > general for normal loop and epilogue(short trip) loop. For example, > > this implementation sets LOOP_VINFO_PEELING_FOR_NITER for epilogue > > loop and use it to control how it should be vectorized: > > + if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) > > + { > > + LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false; > > + LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false; > > + } > > + else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) > > + && min_profitable_combine_iters >= 0) > > + { > > > > This works, but not that good for understanding or maintaining. > > > > Thanks, > > bin > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)