From: Richard Sandiford <richard.sandiford@arm.com>
To: "Kewen.Lin" <linkw@linux.ibm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
Bill Schmidt <wschmidt@linux.ibm.com>,
Segher Boessenkool <segher@kernel.crashing.org>,
Richard Biener <richard.guenther@gmail.com>
Subject: Re: [PATCH v4] vect/rs6000: Support vector with length cost modeling
Date: Mon, 27 Jul 2020 14:40:53 +0100 [thread overview]
Message-ID: <mptime9f53u.fsf@arm.com> (raw)
In-Reply-To: <f9ca7491-d8f3-c8e3-5a55-1fbbe71a0d51@linux.ibm.com> (Kewen Lin's message of "Mon, 27 Jul 2020 11:58:52 +0800")
"Kewen.Lin" <linkw@linux.ibm.com> writes:
> Hi Richard,
>
> Thanks for the review again!
>
> on 2020/7/25 上午12:21, Richard Sandiford wrote:
>> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>>
>> Thanks, the rearrangement of the existing code looks good. Could you
>> split that and the new LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo) stuff
>> out into separate patches?
>>
>
> Splitted to https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550691.html.
>
> errr... that subject should be with prefix "[PATCH] vect:".
>
> [snip ...]
> (Some comments in the snipped content will be done in v4)
>
>>> + here. */
>>> +
>>> + /* For now we only operate length-based partial vectors on Power,
>>> + which has constant VF all the time, we need some tweakings below
>>> + if it doesn't hold in future. */
>>> + gcc_assert (LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ());
>>
>> Where do you rely on this? There didn't seem to be any obvious
>> to_constant uses. Since this is “only” a cost calculation, we should
>> be using assumed_vf.
>
> Sorry for the confusion. This was intended for the poly things like
> VF or nitems_per_ctrl which isn't constant during compilation time,
> then get people's attention on the possible runtime cost on things like
> scaling up for nitems_step etc. But I just realized that the computations
> like the multiply with another constant can operate on the coefficient,
> it looks there is no runtime cost then? If so, I think I thought too
> much before. ;-)
>
>>> - prologue_cost_vec.release ();
>>> - epilogue_cost_vec.release ();
>>> + (void) add_stmt_cost (loop_vinfo, target_cost_data, prol_cnt, scalar_stmt,
>>> + NULL, NULL_TREE, 0, vect_prologue);
>>> + (void) add_stmt_cost (loop_vinfo, target_cost_data, body_cnt, scalar_stmt,
>>> + NULL, NULL_TREE, 0, vect_body);
>>
>> IMO this seems to be reproducing too much of the functions that you
>> referred to. And the danger with that is that they could easily
>> get out of sync later.
>
> Good point! The original intention was to model as possible as we can,
> to avoid some bad decision due to some unmodeled pieces, like the case
> the loop body is small and some computation become nonnegligible.
> The unsync risks seems also applied for other codes. How about adding
> some "note" comments in those functions?
>
> The updated v4 is attached by addressing your comments as well as Segher's
> comments.
>
> BR,
> Kewen
> -----
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.c (rs6000_adjust_vect_cost_per_loop): New
> function.
> (rs6000_finish_cost): Call rs6000_adjust_vect_cost_per_loop.
> * tree-vect-loop.c (vect_estimate_min_profitable_iters): Add cost
> modeling for vector with length.
> * tree-vect-loop-manip.c (vect_set_loop_controls_directly): Update
> function comment.
> * tree-vect-stmts.c (vect_gen_len): Likewise.
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 009afc5f894..86ef584e09b 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -5177,6 +5177,34 @@ rs6000_add_stmt_cost (class vec_info *vinfo, void *data, int count,
> return retval;
> }
>
> +/* For some target specific vectorization cost which can't be handled per stmt,
> + we check the requisite conditions and adjust the vectorization cost
> + accordingly if satisfied. One typical example is to model shift cost for
> + vector with length by counting number of required lengths under condition
> + LOOP_VINFO_FULLY_WITH_LENGTH_P. */
> +
> +static void
> +rs6000_adjust_vect_cost_per_loop (rs6000_cost_data *data)
> +{
> + struct loop *loop = data->loop_info;
> + gcc_assert (loop);
> + loop_vec_info loop_vinfo = loop_vec_info_for_loop (loop);
> +
> + if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> + {
> + rgroup_controls *rgc;
> + unsigned int num_vectors_m1;
> + unsigned int shift_cnt = 0;
> + FOR_EACH_VEC_ELT (LOOP_VINFO_LENS (loop_vinfo), num_vectors_m1, rgc)
> + if (rgc->type)
> + /* Each length needs one shift to fill into bits 0-7. */
> + shift_cnt += num_vectors_m1 + 1;
> +
> + rs6000_add_stmt_cost (loop_vinfo, (void *) data, shift_cnt, scalar_stmt,
> + NULL, NULL_TREE, 0, vect_body);
> + }
> +}
> +
> /* Implement targetm.vectorize.finish_cost. */
>
> static void
> @@ -5186,7 +5214,10 @@ rs6000_finish_cost (void *data, unsigned *prologue_cost,
> rs6000_cost_data *cost_data = (rs6000_cost_data*) data;
>
> if (cost_data->loop_info)
> - rs6000_density_test (cost_data);
> + {
> + rs6000_adjust_vect_cost_per_loop (cost_data);
> + rs6000_density_test (cost_data);
> + }
>
> /* Don't vectorize minimum-vectorization-factor, simple copy loops
> that require versioning for any reason. The vectorization is at
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index 490e7befea3..9d0e3fc525e 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -412,7 +412,10 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, rgroup_controls *dest_rgm,
>
> This means that we cannot guarantee that such an induction variable
> would ever hit a value that produces a set of all-false masks or zero
> - lengths for RGC. */
> + lengths for RGC.
> +
> + Note that please check cost modeling whether needed to be updated in
> + function vect_estimate_min_profitable_iters if any changes. */
>
> static tree
> vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 06cde4b1da3..a00160a7f2d 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -3798,6 +3798,70 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
> (void) add_stmt_cost (loop_vinfo, target_cost_data, num_masks - 1,
> vector_stmt, NULL, NULL_TREE, 0, vect_body);
> }
> + else if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> + {
> + /* Referring to the functions vect_set_loop_condition_partial_vectors
> + and vect_set_loop_controls_directly, we need to generate each
> + length in the prologue and in the loop body if required. Although
> + there are some possible optimizations, we consider the worst case
> + here. */
> +
> + /* For wrap around checking. */
> + tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
> + unsigned int compare_precision = TYPE_PRECISION (compare_type);
> + widest_int iv_limit = vect_iv_limit_for_partial_vectors (loop_vinfo);
> +
> + bool niters_known_p = LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo);
> + bool need_iterate_p
> + = (!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> + && !vect_known_niters_smaller_than_vf (loop_vinfo));
> +
> + /* Init min/max, shift and minus cost relative to single
> + scalar_stmt. For now we only use length-based partial vectors on
> + Power, target specific cost tweaking may be needed for other
> + ports in future. */
> + unsigned int min_max_cost = 2;
> + unsigned int shift_cost = 1, minus_cost = 1;
Please instead add a scalar_min_max to vect_cost_for_stmt, and use
scalar_stmt for shift and minus. There shouldn't be any Power things
hard-coded into target-independent code.
> + /* Init cost relative to single scalar_stmt. */
> + unsigned int prologue_cnt = 0;
> + unsigned int body_cnt = 0;
Maybe _stmts instead of _cnt, to make it clearer what we're counting.
At first it looked like this was actually the raw cost. I guess with
the above change we'd also want separate counters for min_max.
> + rgroup_controls *rgc;
> + unsigned int num_vectors_m1;
> + FOR_EACH_VEC_ELT (LOOP_VINFO_LENS (loop_vinfo), num_vectors_m1, rgc)
> + if (rgc->type)
> + {
> + unsigned nitems = rgc->max_nscalars_per_iter * rgc->factor;
> +
> + /* May need one shift for nitems_total computation. */
> + if (nitems != 1 && !niters_known_p)
> + prologue_cnt += shift_cost;
> +
> + /* Need to handle wrap around. */
> + if (iv_limit == -1
> + || (wi::min_precision (iv_limit * nitems, UNSIGNED)
> + > compare_precision))
> + prologue_cnt += (min_max_cost + minus_cost);
I think we should instead split this condition out into a subroutine
that both this function and vect_set_loop_condition_partial_vectors
can use. It would take the loop_vec_info and rgroup_controls as
arguments.
That means that we might recompute things a few more times, but this
shouldn't be performance-critical.
Thanks,
Richard
next prev parent reply other threads:[~2020-07-27 13:40 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-21 5:51 [PATCH] vect: " Kewen.Lin
2020-07-21 7:57 ` Richard Biener
2020-07-22 1:26 ` [PATCH v2] vect/rs6000: " Kewen.Lin
2020-07-22 6:38 ` Richard Biener
2020-07-22 7:08 ` Kewen.Lin
2020-07-22 9:11 ` Richard Sandiford
2020-07-22 15:48 ` [PATCH v3] " Kewen.Lin
2020-07-22 16:25 ` Kewen.Lin
2020-07-24 16:21 ` Richard Sandiford
2020-07-27 3:58 ` [PATCH v4] " Kewen.Lin
2020-07-27 13:40 ` Richard Sandiford [this message]
2020-07-28 8:36 ` Kewen.Lin
2020-07-31 11:03 ` Richard Sandiford
2020-07-31 11:20 ` Richard Biener
2020-07-31 12:37 ` Kewen.Lin
2020-07-31 13:01 ` Richard Biener
2020-07-31 13:21 ` Kewen.Lin
2020-07-31 14:51 ` [PATCH v5] " Kewen.Lin
2020-08-05 7:27 ` Richard Sandiford
2020-08-05 14:06 ` Segher Boessenkool
2020-08-06 6:47 ` Kewen.Lin
2020-07-22 17:49 ` [PATCH v2] " Segher Boessenkool
2020-07-27 3:44 ` Kewen.Lin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=mptime9f53u.fsf@arm.com \
--to=richard.sandiford@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=linkw@linux.ibm.com \
--cc=richard.guenther@gmail.com \
--cc=segher@kernel.crashing.org \
--cc=wschmidt@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).