From: "Jagasia, Harsha" <harsha.jagasia@amd.com>
To: "Dorit Nuzman" <DORIT@il.ibm.com>, gcc-patches@gcc.gnu.org
Subject: FW: [PATCH,vect] ppc cost model options
Date: Tue, 13 Nov 2007 17:22:00 -0000 [thread overview]
Message-ID: <D5B24B5251882048AD03DDFA431BB790012F04E1@SAUSEXMB3.amd.com> (raw)
Sorry for not being able to respond earlier to this thread.
>> >vectorized and such) because the vectorizer cost-model is pending
some
>> >fixes from Harsha
>> (http://gcc.gnu.org/ml/gcc-patches/2007-09/msg00916.html
>> >), to be followed by target-specific tuning of costs.)
>>
>> Here is the patch implementing the early cost model check.
>>
>
>great!
>
>> The check is added as follows:
>> - If we do versioning, the threshold comparison is and'ed to the
guard
>> that controls the versioning,
>> - Otherwise, if we do peeling for alignment, we can determine the
>> loop-count of the prolog loop according to the threshold test
>> - Otherwise, the threshold comparison is or'ed with the guard that
>> controls whether to bypass the vector code and go straight to the
scalar
>> epilogue.
>>
>
>The only technical issue I have is with the fact that I think we're not
>taking into account that because of the presence of the runtime-test,
there
>are costs that should be added also to the scalar side of the equasion
>(branch costs), and these costs are different in each of the above 3
cases.
>(This is something we don't need to worry about when we make the
decision
>at compile time). For example, in case 2, even if we choose to execute
the
>scalar loop we pay the cost of 2 branches that did not exist in the
>original code; in case 3 we pay the cost of 1-3 branches (depending on
>whether we also peel for alignment and if so, if we also enter the
prolog
>loop). (We also pay for these branches on the vector side of the
equasion,
>but this is already taken into account). So we actually have 4 slighlty
>different equasions:
>
>equasion for compile-time test:
> scalar_costs >= vector_costs
>
>equasion for run-time test, case 1:
> scalar_costs
> + versioning_guard_cost >= vector_costs
>
>equasion for run-time test, case 2:
> scalar_costs
> + cost_of_guard_for_prolog_peel >= vector_costs
>
>equasion for run-time test, case 3:
> scalar_costs
> + cost_of_guard_for_prolog_peel
> + cost_of_guard_for_epilog_peel >= vector_costs
Just to clarify
In case 1, the check is:
if (cost > th & ...)
jmp to vector code
Hence the run-time scalar cost should be incremented by a NOT-TAKEN
branch.
In case 2, the check is:
if (cost <= th)
prologue=scalar_iters
if (prologue == 0)
jmp to vector code
else
execute prologue
if (prologue == num_iters)
go to exit
Hence the run-time scalar cost should be incremented by a taken branch,
which on architectures like x86-64 should be a conditional move, plus a
NOT-TAKEN branch, plus a TAKEN BRANCH.
In case 3, the check is:
if (prologue == 0)
jmp to vector code
else
execute prologue
if prologue=num_iters
go to exit
vector code:
if (cost <=th | (scalar_iters-prologue-epilogue == 0))
jmp to epilogue
Hence the run-time scalar cost should be incremented by a (taken branch
or 2 not-taken branches for prologue) plus taken branch.
>Do you think we can add these 3 different fixups somehow, depending on
>where the run-time test is actually placed? (this can be important for
>platforms like the SPU, where branches can be very costly, such that we
may
>choose to vectorize the less efficient vector version and not pay the
>branch cost involved with branching to skip the vector code).
Indeed, I will add the costs.
>
>The rest are just various style issues:
>
>> * tree-vect-transform.c (vect_do_peeling_for_loop_bound): Computes
>> the treshold if the cost model check has not been done already.
>
>treshold -> threshold
>
>> Index: tree-vectorizer.c
>> ===================================================================
>> --- tree-vectorizer.c (revision 129356)
>> +++ tree-vectorizer.c (working copy)
>...
>> slpeel_tree_peel_loop_to_edge (struct loop *loop,
>> edge e, tree first_niters,
>> tree niters, bool update_first_loop_count,
>> - unsigned int th)
>> + unsigned int th, bool check)
>
>Please document the new function argument "check" (I see that a
>documentation for "th" is also missing; if you feel like it, please add
>that too while you're at it...)
>
>Actually, maybe we can come up with something more descriptive than
>"check"? e.g. "check_profitability", "add_cost_model_check"...
Ok.
>
>> - /* 2. Add the guard that controls whether the first loop is
executed.
>> - Resulting CFG would be:
>> + /* 2.a Add the guard that controls whether the first loop is
executed.
>> + Resulting CFG would be:
>>
>> - bb_before_first_loop:
>> - if (FIRST_NITERS == 0) GOTO bb_before_second_loop
>> - GOTO first-loop
>> + bb_before_first_loop:
>> + if (FIRST_NITERS == 0) GOTO bb_before_second_loop
>> + GOTO first-loop
>>
>> - first_loop:
>> - do {
>> - } while ...
>> + first_loop:
>> + do {
>> + } while ...
>>
>> - bb_before_second_loop:
>> + bb_before_second_loop:
>>
>> - second_loop:
>> - do {
>> - } while ...
>> + second_loop:
>> + do {
>> + } while ...
>>
>> - orig_exit_bb:
>> - */
>> + orig_exit_bb:
>> + */
>> +
>> +
>> + /* 2.b Add the cost model check that allows the prologue
>> + to iterate for the entire unchanged scalar
>> + iterations of the loop in the event that the cost
>> + model indicates that the scalar loop is more
>> + profitable than the vector one.
>> +
>> + Resulting CFG after prologue peeling would be:
>> +
>> + if (scalar_loop_iterations <= th)
>> + FIRST_NITERS = scalar_loop_iterations
>> +
>> + bb_before_first_loop:
>> + if (FIRST_NITERS == 0) GOTO bb_before_second_loop
>> + GOTO first-loop
>> +
>> + first_loop:
>> + do {
>> + } while ...
>> +
>> + bb_before_second_loop:
>> +
>> + second_loop:
>> + do {
>> + } while ...
>> +
>> + orig_exit_bb:
>> + */
>> +
>> + /* 2.c Add the cost model check that allows the epilogue
>> + to iterate for the entire unchanged scalar
>> + iterations of the loop in the event that the cost
>> + model indicates that the scalar loop is more
>> + profitable than the vector one.
>> +
>> + Resulting CFG after prologue peeling would be:
>> +
>> + bb_before_first_loop:
>> + if ((scalar_loop_iterations <= th)
>> + ||
>> + FIRST_NITERS == 0) GOTO bb_before_second_loop
>> + GOTO first-loop
>> +
>> + first_loop:
>> + do {
>> + } while ...
>> +
>> + bb_before_second_loop:
>> +
>> + second_loop:
>> + do {
>> + } while ...
>> +
>> + orig_exit_bb:
>> + */
>>
>
>Just so it's clear that this function does only one of the above per
>invocation, maybe write: "2. Add the control/guard code in one of the
>following ways: 2.a. ..."
Ok.
>Also, I was wondering if it would be clearer if the description was
more
>compact - i.e. instead of duplicating 3 times the pseudo-code that
remains
>the same in all 3 alternatives, do something along these lines:
>"
>2. Add the control/guard code. Resulting CFG would be:
>
> bb_before_first_loop:
> if (COND) GOTO bb_before_second_loop
> GOTO first-loop
>
> first_loop:
> do {
> } while ...
>
> bb_before_second_loop:
>
> second_loop:
> do {
> } while ...
>
> orig_exit_bb:
>
>The following 3 variants are supported:
>2a. The guard controls whether the first loop is executed.
> COND: FIRST_NITERS == 0
>2b. Add the cost model check that allows the prologue to iterate....
> if (scalar_loop_iterations <= th)
> FIRST_NITERS = scalar_loop_iterations
> COND: FIRST_NITERS == 0
>2c. Add the cost model check that allows the epilogue to iterate...
> COND: (scalar_loop_iterations <= th) || FIRST_NITERS == 0
>"
>
>... but in the end I'm not sure which way is clearer, so I'll leave the
>decision to you.
>
Ok, I will try to make it clearer. But its hard to say which is clearer
because the same function is used to do different things for prologue
and epilogue.
>A general comment about the changes to slpeel_tree_peel_loop_to_edge: I
>wonder if this function would be taken out of the vectorizer one day
and
>merged with the generic peeling utils. In any case, it would be
desirable
>if instead of manipulating the condition inside the function, the
function
>would get the condition as an input? (even if it ends up remaining a
>vectorizer specific function - I think this will be a good thing to
do). If
>this looks like too much work, then please consider factoring out the
new
>code that builds the condition in 3 different ways, or at least just
the
>part that creates the if-stmt for the "Peologue peeling case:
Ok.
>
>> + /* Epilogue peeling. */
>> + if (!update_first_loop_count)
>> + {
>...
>> + }
>> + /* Prologue peeling. */
>> + else
>> + {
>> + if (check)
>> + first_niters = conditionally_reset_first_niters (...)
>> +
>...
>> + }
>
>
>> Index: tree-vect-transform.c
>> ===================================================================
>> --- tree-vect-transform.c (revision 129356)
>> +++ tree-vect-transform.c (working copy)
>...
>> @@ -6482,28 +6483,35 @@ vect_do_peeling_for_loop_bound (loop_vec
>>
>> loop_num = loop->num;
>>
>> - /* Analyze cost to set threshhold for vectorized loop. */
>
>threshhold --> threshold
>
>...
Ok.
>> + /* Analyze cost to set threshhold for vectorized loop. */
>> + min_profitable_iters = LOOP_VINFO_COST_MODEL_MIN_ITERS
>(loop_vinfo);
>>
>
>about the comment - we don't "Analyze cost" here, just use the already
>precalculcated threshold. (it also repeats in other places).
>
>> - th = (unsigned) min_scalar_loop_bound;
>> - if (min_profitable_iters
>> - && (!min_scalar_loop_bound
>> - || min_profitable_iters > min_scalar_loop_bound))
>> - th = (unsigned) min_profitable_iters;
>> + min_scalar_loop_bound = ((PARAM_VALUE
(PARAM_MIN_VECT_LOOP_BOUND)
>> + * LOOP_VINFO_VECT_FACTOR (loop_vinfo)) - 1);
>>
>> - if (((LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo) < 0)
>> - || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>> - && vect_print_dump_info (REPORT_DETAILS))
>> - fprintf (vect_dump, "vectorization may not be profitable.");
>> + /* Use the cost model only if it is more conservative than
>> user specified
>> + threshold. */
>> + th = (unsigned) min_scalar_loop_bound;
>> + if (min_profitable_iters
>> + && (!min_scalar_loop_bound
>> + || min_profitable_iters > min_scalar_loop_bound))
>> + th = (unsigned) min_profitable_iters;
>> +
>> + if (th && vect_print_dump_info (REPORT_DETAILS))
>> + fprintf (vect_dump, "loop bound peeling: vectorization may not be
>> profitable.");
>> + }
>
>The above code repeats almost identically 3 times: here, in
>vect_do_peeling_for_alignment and in vect_loop_versioning. Please it
factor
>out...
>
Ok.
>> @@ -7031,7 +7078,11 @@ vect_create_cond_for_alias_checks (loop_
>> data references that may or may not be aligned. An additional
>> sequence of runtime tests is generated for each pairs of DDRs
whose
>> independence was not proven. The vectorized version of loop is
>> - executed only if both alias and alignment tests are passed. */
>> + executed only if both alias and alignment tests are passed.
>> +
>> + The test generated to check which version of loop is executed
>> + is modified to check for profitability as indicated by the
>
>--> "is modified to *also* check for profitability", right?
>
Yes.
Thanks,
Harsha
>thanks,
>dorit
>
>
>> The patch bootstrapped and passed test on amd64-linux.
>>
>> Ok for trunk?
>>
>> >
>> >> Thanks, David
>> >>
>> >> * gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp: Use
>> >> -mcpu=970 instead of 7400.
>> >>
>> >> Index: ppc-costmodel-vect.exp
>> >>
===================================================================
>> >> --- ppc-costmodel-vect.exp (revision 129469)
>> >> +++ ppc-costmodel-vect.exp (working copy)
>> >> @@ -49,7 +49,7 @@
>> >> } else {
>> >> if [is-effective-target ilp32] {
>> >> # Specify a cpu that supports VMX for compile-only tests.
>> >> - lappend DEFAULT_VECTCFLAGS "-mcpu=7400"
>> >> + lappend DEFAULT_VECTCFLAGS "-mcpu=970"
>> >> }
>> >> set dg-do-what-default compile
>> >> }
>> >
>> >
>>
>> [attachment "0008-costmodel-early-scalar.diff" deleted by Dorit
>> Nuzman/Haifa/IBM]
>
>
next reply other threads:[~2007-11-13 16:35 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-11-13 17:22 Jagasia, Harsha [this message]
2007-11-26 2:24 ` Jagasia, Harsha
2007-12-02 10:30 ` Dorit Nuzman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=D5B24B5251882048AD03DDFA431BB790012F04E1@SAUSEXMB3.amd.com \
--to=harsha.jagasia@amd.com \
--cc=DORIT@il.ibm.com \
--cc=gcc-patches@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).