Hi Richi,

Thanks for the comments!

on 2021/5/7 下午5:43, Richard Biener wrote:
> On Fri, May 7, 2021 at 5:30 AM Kewen.Lin via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> Hi,
>>
>> When I was investigating density_test heuristics, I noticed that
>> the current rs6000_density_test could be used for single scalar
>> iteration cost calculation, through the call trace:
>>   vect_compute_single_scalar_iteration_cost
>>     -> rs6000_finish_cost
>>          -> rs6000_density_test
>>
>> It looks unexpected as its desriptive function comments and Bill
>> helped to confirm this needs to be fixed (thanks!).
>>
>> So this patch is to check the passed data, if it's the same as
>> the one in loop_vinfo, it indicates it's working on vector version
>> cost calculation, otherwise just early return.
>>
>> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
>>
>> Nothing remarkable was observed with SPEC2017 Power9 full run.
>>
>> Is it ok for trunk?
> 
> +  /* Only care about cost of vector version, so exclude scalar
> version here.  */
> +  if (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo) != (void *) data)
> +    return;
> 
> Hmm, looks like a quite "random" test to me.  What about adding a
> parameter to finish_cost () (or init_cost?) indicating the cost kind?
> 

I originally wanted to change the hook interface, but noticed that
the finish_cost in function vect_estimate_min_profitable_iters is
the only invocation with LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
it looks enough to differentiate the scalar version costing or
vector version costing for loop.  Do you mean this observation/
assumption easy to be broken sometime later?

The attached patch to add one new parameter to indicate the
costing kind explicitly as you suggested.

Does it look better?

gcc/ChangeLog:

	* doc/tm.texi: Regenerated.
	* target.def (init_cost): Add new parameter costing_for_scalar.
	* targhooks.c (default_init_cost): Adjust for new parameter.
	* targhooks.h (default_init_cost): Likewise.
	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Likewise.
	(vect_compute_single_scalar_iteration_cost): Likewise.
	(vect_analyze_loop_2): Likewise.
	* tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Likewise.
	(vect_bb_vectorization_profitable_p): Likewise.
	* tree-vectorizer.h (init_cost): Likewise.
	* config/aarch64/aarch64.c (aarch64_init_cost): Likewise.
	* config/i386/i386.c (ix86_init_cost): Likewise.
	* config/rs6000/rs6000.c (rs6000_init_cost): Likewise.  

> OTOH we already pass scalar_stmt to individual add_stmt_cost,
> so not sure whether the context really matters.  That said,
> the density test looks "interesting" ... the intent was that finish_cost
> might look at gathered data from add_stmt, not that it looks at
> the GIMPLE IL ... so why are you not counting vector_stmt vs.
> scalar_stmt entries in vect_body and using that for this metric?
> 

Good to know the intention behind finish_cost, thanks!

I'm afraid that the check on vector_stmt and scalar_stmt entries
from add_stmt_cost doesn't work for the density test here.  The
density test focuses on the vector version itself, there are some
stmts whose relevants are marked as vect_unused_in_scope, IIUC
they won't be passed down when costing for both versions.  But the
existing density check would like to know the cost for the
non-vectorized part.  The current implementation does:
 
 vec_cost = data->cost[vect_body]

	  if (!STMT_VINFO_RELEVANT_P (stmt_info)
	      && !STMT_VINFO_IN_PATTERN_P (stmt_info))
	    not_vec_cost++

 density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);

it takes those unrelevant stmts into account, and then has
both costs from the non-vectorized part (not_vec_cost)
and vectorized part (cost[vect_body]), it can calculate the
vectorization code density ratio.

BR,
Kewen