From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 17379 invoked by alias); 24 Oct 2007 04:56:04 -0000 Received: (qmail 17370 invoked by uid 22791); 24 Oct 2007 04:56:02 -0000 X-Spam-Check-By: sourceware.org Received: from mtagate4.de.ibm.com (HELO mtagate4.de.ibm.com) (195.212.29.153) by sourceware.org (qpsmtpd/0.31) with ESMTP; Wed, 24 Oct 2007 04:55:58 +0000 Received: from d12nrmr1607.megacenter.de.ibm.com (d12nrmr1607.megacenter.de.ibm.com [9.149.167.49]) by mtagate4.de.ibm.com (8.13.8/8.13.8) with ESMTP id l9O4ttJZ126918 for ; Wed, 24 Oct 2007 04:55:55 GMT Received: from d12av04.megacenter.de.ibm.com (d12av04.megacenter.de.ibm.com [9.149.165.229]) by d12nrmr1607.megacenter.de.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l9O4tt2X2269188 for ; Wed, 24 Oct 2007 06:55:55 +0200 Received: from d12av04.megacenter.de.ibm.com (loopback [127.0.0.1]) by d12av04.megacenter.de.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l9O4tseK010302 for ; Wed, 24 Oct 2007 06:55:54 +0200 Received: from d12mc102.megacenter.de.ibm.com (d12mc102.megacenter.de.ibm.com [9.149.167.114]) by d12av04.megacenter.de.ibm.com (8.12.11.20060308/8.12.11) with ESMTP id l9O4tsVZ010298; Wed, 24 Oct 2007 06:55:54 +0200 In-Reply-To: Subject: RE: [PATCH,vect] ppc cost model options To: "Jagasia, Harsha" Cc: "David Edelsohn" , gcc-patches@gcc.gnu.org, janis187@us.ibm.com X-Mailer: Lotus Notes Release 7.0 HF277 June 21, 2006 Message-ID: From: Dorit Nuzman Date: Wed, 24 Oct 2007 08:43:00 -0000 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2007-10/txt/msg01393.txt.bz2 "Jagasia, Harsha" wrote on 19/10/2007 16:38:45: ... > >vectorized and such) because the vectorizer cost-model is pending some > >fixes from Harsha > (http://gcc.gnu.org/ml/gcc-patches/2007-09/msg00916.html > >), to be followed by target-specific tuning of costs.) > > Here is the patch implementing the early cost model check. > great! > The check is added as follows: > - If we do versioning, the threshold comparison is and'ed to the guard > that controls the versioning, > - Otherwise, if we do peeling for alignment, we can determine the > loop-count of the prolog loop according to the threshold test > - Otherwise, the threshold comparison is or'ed with the guard that > controls whether to bypass the vector code and go straight to the scalar > epilogue. > The only technical issue I have is with the fact that I think we're not taking into account that because of the presence of the runtime-test, there are costs that should be added also to the scalar side of the equasion (branch costs), and these costs are different in each of the above 3 cases. (This is something we don't need to worry about when we make the decision at compile time). For example, in case 2, even if we choose to execute the scalar loop we pay the cost of 2 branches that did not exist in the original code; in case 3 we pay the cost of 1-3 branches (depending on whether we also peel for alignment and if so, if we also enter the prolog loop). (We also pay for these branches on the vector side of the equasion, but this is already taken into account). So we actually have 4 slighlty different equasions: equasion for compile-time test: scalar_costs >= vector_costs equasion for run-time test, case 1: scalar_costs + versioning_guard_cost >= vector_costs equasion for run-time test, case 2: scalar_costs + cost_of_guard_for_prolog_peel >= vector_costs equasion for run-time test, case 3: scalar_costs + cost_of_guard_for_prolog_peel + cost_of_guard_for_epilog_peel >= vector_costs Do you think we can add these 3 different fixups somehow, depending on where the run-time test is actually placed? (this can be important for platforms like the SPU, where branches can be very costly, such that we may choose to vectorize the less efficient vector version and not pay the branch cost involved with branching to skip the vector code). The rest are just various style issues: > * tree-vect-transform.c (vect_do_peeling_for_loop_bound): Computes > the treshold if the cost model check has not been done already. treshold -> threshold > Index: tree-vectorizer.c > =================================================================== > --- tree-vectorizer.c (revision 129356) > +++ tree-vectorizer.c (working copy) ... > slpeel_tree_peel_loop_to_edge (struct loop *loop, > edge e, tree first_niters, > tree niters, bool update_first_loop_count, > - unsigned int th) > + unsigned int th, bool check) Please document the new function argument "check" (I see that a documentation for "th" is also missing; if you feel like it, please add that too while you're at it...) Actually, maybe we can come up with something more descriptive than "check"? e.g. "check_profitability", "add_cost_model_check"... > - /* 2. Add the guard that controls whether the first loop is executed. > - Resulting CFG would be: > + /* 2.a Add the guard that controls whether the first loop is executed. > + Resulting CFG would be: > > - bb_before_first_loop: > - if (FIRST_NITERS == 0) GOTO bb_before_second_loop > - GOTO first-loop > + bb_before_first_loop: > + if (FIRST_NITERS == 0) GOTO bb_before_second_loop > + GOTO first-loop > > - first_loop: > - do { > - } while ... > + first_loop: > + do { > + } while ... > > - bb_before_second_loop: > + bb_before_second_loop: > > - second_loop: > - do { > - } while ... > + second_loop: > + do { > + } while ... > > - orig_exit_bb: > - */ > + orig_exit_bb: > + */ > + > + > + /* 2.b Add the cost model check that allows the prologue > + to iterate for the entire unchanged scalar > + iterations of the loop in the event that the cost > + model indicates that the scalar loop is more > + profitable than the vector one. > + > + Resulting CFG after prologue peeling would be: > + > + if (scalar_loop_iterations <= th) > + FIRST_NITERS = scalar_loop_iterations > + > + bb_before_first_loop: > + if (FIRST_NITERS == 0) GOTO bb_before_second_loop > + GOTO first-loop > + > + first_loop: > + do { > + } while ... > + > + bb_before_second_loop: > + > + second_loop: > + do { > + } while ... > + > + orig_exit_bb: > + */ > + > + /* 2.c Add the cost model check that allows the epilogue > + to iterate for the entire unchanged scalar > + iterations of the loop in the event that the cost > + model indicates that the scalar loop is more > + profitable than the vector one. > + > + Resulting CFG after prologue peeling would be: > + > + bb_before_first_loop: > + if ((scalar_loop_iterations <= th) > + || > + FIRST_NITERS == 0) GOTO bb_before_second_loop > + GOTO first-loop > + > + first_loop: > + do { > + } while ... > + > + bb_before_second_loop: > + > + second_loop: > + do { > + } while ... > + > + orig_exit_bb: > + */ > Just so it's clear that this function does only one of the above per invocation, maybe write: "2. Add the control/guard code in one of the following ways: 2.a. ..." Also, I was wondering if it would be clearer if the description was more compact - i.e. instead of duplicating 3 times the pseudo-code that remains the same in all 3 alternatives, do something along these lines: " 2. Add the control/guard code. Resulting CFG would be: bb_before_first_loop: if (COND) GOTO bb_before_second_loop GOTO first-loop first_loop: do { } while ... bb_before_second_loop: second_loop: do { } while ... orig_exit_bb: The following 3 variants are supported: 2a. The guard controls whether the first loop is executed. COND: FIRST_NITERS == 0 2b. Add the cost model check that allows the prologue to iterate.... if (scalar_loop_iterations <= th) FIRST_NITERS = scalar_loop_iterations COND: FIRST_NITERS == 0 2c. Add the cost model check that allows the epilogue to iterate... COND: (scalar_loop_iterations <= th) || FIRST_NITERS == 0 " ... but in the end I'm not sure which way is clearer, so I'll leave the decision to you. A general comment about the changes to slpeel_tree_peel_loop_to_edge: I wonder if this function would be taken out of the vectorizer one day and merged with the generic peeling utils. In any case, it would be desirable if instead of manipulating the condition inside the function, the function would get the condition as an input? (even if it ends up remaining a vectorizer specific function - I think this will be a good thing to do). If this looks like too much work, then please consider factoring out the new code that builds the condition in 3 different ways, or at least just the part that creates the if-stmt for the "Peologue peeling case: > + /* Epilogue peeling. */ > + if (!update_first_loop_count) > + { ... > + } > + /* Prologue peeling. */ > + else > + { > + if (check) > + first_niters = conditionally_reset_first_niters (...) > + ... > + } > Index: tree-vect-transform.c > =================================================================== > --- tree-vect-transform.c (revision 129356) > +++ tree-vect-transform.c (working copy) ... > @@ -6482,28 +6483,35 @@ vect_do_peeling_for_loop_bound (loop_vec > > loop_num = loop->num; > > - /* Analyze cost to set threshhold for vectorized loop. */ threshhold --> threshold ... > + /* Analyze cost to set threshhold for vectorized loop. */ > + min_profitable_iters = LOOP_VINFO_COST_MODEL_MIN_ITERS (loop_vinfo); > about the comment - we don't "Analyze cost" here, just use the already precalculcated threshold. (it also repeats in other places). > - th = (unsigned) min_scalar_loop_bound; > - if (min_profitable_iters > - && (!min_scalar_loop_bound > - || min_profitable_iters > min_scalar_loop_bound)) > - th = (unsigned) min_profitable_iters; > + min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND) > + * LOOP_VINFO_VECT_FACTOR (loop_vinfo)) - 1); > > - if (((LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo) < 0) > - || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)) > - && vect_print_dump_info (REPORT_DETAILS)) > - fprintf (vect_dump, "vectorization may not be profitable."); > + /* Use the cost model only if it is more conservative than > user specified > + threshold. */ > + th = (unsigned) min_scalar_loop_bound; > + if (min_profitable_iters > + && (!min_scalar_loop_bound > + || min_profitable_iters > min_scalar_loop_bound)) > + th = (unsigned) min_profitable_iters; > + > + if (th && vect_print_dump_info (REPORT_DETAILS)) > + fprintf (vect_dump, "loop bound peeling: vectorization may not be > profitable."); > + } The above code repeats almost identically 3 times: here, in vect_do_peeling_for_alignment and in vect_loop_versioning. Please it factor out... > @@ -7031,7 +7078,11 @@ vect_create_cond_for_alias_checks (loop_ > data references that may or may not be aligned. An additional > sequence of runtime tests is generated for each pairs of DDRs whose > independence was not proven. The vectorized version of loop is > - executed only if both alias and alignment tests are passed. */ > + executed only if both alias and alignment tests are passed. > + > + The test generated to check which version of loop is executed > + is modified to check for profitability as indicated by the --> "is modified to *also* check for profitability", right? thanks, dorit > The patch bootstrapped and passed test on amd64-linux. > > Ok for trunk? > > > > >> Thanks, David > >> > >> * gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp: Use > >> -mcpu=970 instead of 7400. > >> > >> Index: ppc-costmodel-vect.exp > >> =================================================================== > >> --- ppc-costmodel-vect.exp (revision 129469) > >> +++ ppc-costmodel-vect.exp (working copy) > >> @@ -49,7 +49,7 @@ > >> } else { > >> if [is-effective-target ilp32] { > >> # Specify a cpu that supports VMX for compile-only tests. > >> - lappend DEFAULT_VECTCFLAGS "-mcpu=7400" > >> + lappend DEFAULT_VECTCFLAGS "-mcpu=970" > >> } > >> set dg-do-what-default compile > >> } > > > > > > [attachment "0008-costmodel-early-scalar.diff" deleted by Dorit > Nuzman/Haifa/IBM]