From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-501290-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 63442 invoked by alias); 21 May 2019 03:48:52 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 63404 invoked by uid 89); 21 May 2019 03:48:50 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-17.6 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_SHORT,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.1 spammy=this!
X-HELO: mx0a-001b2d01.pphosted.com
Received: from mx0a-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.156.1) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 21 May 2019 03:48:44 +0000
Received: from pps.filterd (m0098410.ppops.net [127.0.0.1])	by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x4L3l9m8110598	for <gcc-patches@gcc.gnu.org>; Mon, 20 May 2019 23:48:42 -0400
Received: from e06smtp07.uk.ibm.com (e06smtp07.uk.ibm.com [195.75.94.103])	by mx0a-001b2d01.pphosted.com with ESMTP id 2sm87fapbv-1	(version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT)	for <gcc-patches@gcc.gnu.org>; Mon, 20 May 2019 23:48:41 -0400
Received: from localhost	by e06smtp07.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted	for <gcc-patches@gcc.gnu.org> from <linkw@linux.ibm.com>;	Tue, 21 May 2019 04:48:39 +0100
Received: from b06cxnps4075.portsmouth.uk.ibm.com (9.149.109.197)	by e06smtp07.uk.ibm.com (192.168.101.137) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted;	(version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256)	Tue, 21 May 2019 04:48:36 +0100
Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61])	by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x4L3mZ7k60817568	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);	Tue, 21 May 2019 03:48:36 GMT
Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1])	by IMSVA (Postfix) with ESMTP id D357011C054;	Tue, 21 May 2019 03:48:35 +0000 (GMT)
Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1])	by IMSVA (Postfix) with ESMTP id 5210C11C050;	Tue, 21 May 2019 03:48:33 +0000 (GMT)
Received: from kewenlins-mbp.cn.ibm.com (unknown [9.200.147.25])	by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTP;	Tue, 21 May 2019 03:48:33 +0000 (GMT)
Subject: Re: [PATCH v3 2/3] Add predict_doloop_p target hook
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org,        wschmidt@linux.ibm.com, bin.cheng@linux.alibaba.com, jakub@redhat.com
References: <1558064130-111037-1-git-send-email-linkw@linux.ibm.com> <alpine.LSU.2.20.1905201122310.10704@zhemvz.fhfr.qr>
From: "Kewen.Lin" <linkw@linux.ibm.com>
Date: Tue, 21 May 2019 03:48:00 -0000
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.6.1
MIME-Version: 1.0
In-Reply-To: <alpine.LSU.2.20.1905201122310.10704@zhemvz.fhfr.qr>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
x-cbid: 19052103-0028-0000-0000-0000036FCDC3
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 19052103-0029-0000-0000-0000242F75AC
Message-Id: <e8c1e96f-b811-5d5c-0bb1-395bdef93850@linux.ibm.com>
X-IsSubscribed: yes
X-SW-Source: 2019-05/txt/msg01343.txt.bz2

on 2019/5/20 ä¸å5:28, Richard Biener wrote:
> On Thu, 16 May 2019, linkw@linux.ibm.com wrote:
> 
>> From: Kewen Lin <linkw@linux.ibm.com>
>>
>> Hi,
>>
>> Previous version link:
>> https://gcc.gnu.org/ml/gcc-patches/2019-05/msg00654.html
>>
>> Comparing with the previous version, I moved the generic
>> parts of rs6000 target hook to IVOPTs.  But I still kept
>> the target hook as previous which checks some target
>> specific criteria like innermost, max iteration counts
>> etc, and checks for invalid stmt in loop.  The reason
>> I decided not to move this part to generic is they are
>> not generic enough.
>>
>> 1) For the part of target specific criteria, if we want
>> to put it in generic, we can call the hook
>> targetm.can_use_doloop_p, which requires us to
>> prepare those four parameters, but each target only needs
>> one or two parameters, it means we will evaluate some
>> things which aren't required for that target.  So I'd like
>> to leave this part to target hook.
>> 2) For the other part of target invalid stmt check, as the
>> hook invalid_within_doloop grep data shows, no all targets
>> need to check whether invalid instructions exist in doloop.
>> If we scan all stmts as generic, it can waste time for those
>> targets which don't need to check.  Besides, the scope of
>> the current check on SWITCH in rs6000 hook is wide, later
>> if we want it more exact, we may need to check more stmts
>> instead of single.  To let target hook scan the BBs/stmts
>> by itself is also more flexible.
>>
>> Bootstrapped and regression testing ongoing on powerpc64le.
>>
>> Any more comments?
>>
>>  
>> -
>> +/* Check whether there are some instructions preventing doloop transformation
>> +   inside loop body, mainly for instructions which are possible to kill CTR.
>> +
>> +   Return true if some invalid insn exits, otherwise return false.  */
>> +
>> +static bool
>> +invalid_insn_for_doloop_p (struct loop *loop)
>> +{
>> +  basic_block *body = get_loop_body (loop);
> 
> You leak memory here.  I still think all of this function should
> be in generic code for now.
> 

Thanks a lot for catching this!  I didn't notice I should free it by myself.

>> +  gimple_stmt_iterator gsi;
>> +
>> +  for (unsigned i = 0; i < loop->num_nodes; i++)
>> +    for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi); gsi_next (&gsi))
>> +      {
>> +	gimple *stmt = gsi_stmt (gsi);
>> +	if (is_gimple_call (stmt) && !gimple_call_internal_p (stmt)
>> +	    && !is_inexpensive_builtin (gimple_call_fndecl (stmt)))
>> +	  {
>> +	    if (dump_file && (dump_flags & TDF_DETAILS))
>> +	      fprintf (dump_file,
>> +		       "predict doloop failure due to finding call.\n");
>> +	    return true;
>> +	  }
>> +	if (computed_goto_p (stmt))
>> +	  {
>> +	    if (dump_file && (dump_flags & TDF_DETAILS))
>> +	      fprintf (dump_file, "predict doloop failure due to"
>> +				  "finding computed jump.\n");
>> +	    return true;
>> +	  }
>> +
>> +	/* TODO: Now this hook is expected to be called in ivopts, which is
>> +	   before switchlower1/switchlower2.  Checking for SWITCH at this point
>> +	   will eliminate some good candidates.  But since there are only a few
>> +	   cases having _a_ switch statement in the innermost loop, it can be a
>> +	   low priority enhancement.  */
>> +	if (gimple_code (stmt) == GIMPLE_SWITCH)
>> +	  {
>> +	    if (dump_file && (dump_flags & TDF_DETAILS))
>> +	      fprintf (dump_file,
>> +		       "predict doloop failure due to finding switch.\n");
>> +	    return true;
>> +	  }
>> +      }
>> +
>> +  return false;
>> +}
>> +
>> +/* Predict whether the given loop in gimple will be transformed in the RTL
>> +   doloop_optimize pass.  This is for rs6000 target specific.  */
>> +
>> +static bool
>> +rs6000_predict_doloop_p (struct loop *loop)
>> +{
>> +  gcc_assert (loop);
>> +
>> +  /* On rs6000, targetm.can_use_doloop_p is actually
>> +     can_use_doloop_if_innermost.  Just ensure it's innermost.  */
> 
> So the point is that ppc only has a single counter register, right?
> 
> So the better way would be to expose that via a target hook somehow.
> Or simply restrict IVOPTs processing to innermost loops for now.
> 
It's to answer whether the target supports doloop or not.  I guess
to check innermost is stricter and simpler.  There is another target 
hook invalid_within_doloop to check counter register clobber.

>> +  if (loop->inner != NULL)
>> +    {
>> +      if (dump_file && (dump_flags & TDF_DETAILS))
>> +	fprintf (dump_file, "predict doloop failure due to"
>> +			    "no innermost.\n");
>> +      return false;
>> +    }
>> +
>> +  /* Similar to doloop_optimize targetm.invalid_within_doloop, check for
>> +     CALL, tablejump, computed_jump.  */
>> +  if (invalid_insn_for_doloop_p (loop))
>> +    return false;
>> +
>> +  return true;
>> +}
>> +
>>  struct gcc_target targetm = TARGET_INITIALIZER;
>>  
>>  #include "gt-rs6000.h"
>> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> index 8c8978b..e595b8b 100644
>> --- a/gcc/doc/tm.texi
>> +++ b/gcc/doc/tm.texi
>> @@ -11603,6 +11603,14 @@ function version at run-time for a given set of function versions.
>>  body must be generated.
>>  @end deftypefn
>>  
>> +@deftypefn {Target Hook} bool TARGET_PREDICT_DOLOOP_P (struct loop *@var{loop})
>> +Return true if we can predict it is possible to use low-overhead loops
>> +for a particular loop.  The parameter @var{loop} is a pointer to the loop
>> +which is going to be checked.  This target hook is required only when the
>> +target supports low-overhead loops, and will help some earlier middle-end
>> +passes to make some decisions.
>> +@end deftypefn
>> +
>>  @deftypefn {Target Hook} bool TARGET_CAN_USE_DOLOOP_P (const widest_int @var{&iterations}, const widest_int @var{&iterations_max}, unsigned int @var{loop_depth}, bool @var{entered_at_top})
>>  Return true if it is possible to use low-overhead loops (@code{doloop_end}
>>  and @code{doloop_begin}) for a particular loop.  @var{iterations} gives the
>> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>> index fe1194e..e047734 100644
>> --- a/gcc/doc/tm.texi.in
>> +++ b/gcc/doc/tm.texi.in
>> @@ -7942,6 +7942,8 @@ to by @var{ce_info}.
>>  
>>  @hook TARGET_GENERATE_VERSION_DISPATCHER_BODY
>>  
>> +@hook TARGET_PREDICT_DOLOOP_P
>> +
>>  @hook TARGET_CAN_USE_DOLOOP_P
>>  
>>  @hook TARGET_INVALID_WITHIN_DOLOOP
>> diff --git a/gcc/target.def b/gcc/target.def
>> index 66cee07..0a80de3 100644
>> --- a/gcc/target.def
>> +++ b/gcc/target.def
>> @@ -4227,6 +4227,15 @@ DEFHOOK
>>  rtx, (machine_mode mode, rtx result, rtx val, rtx failval),
>>   default_speculation_safe_value)
>>   
>> +DEFHOOK
>> +(predict_doloop_p,
>> + "Return true if we can predict it is possible to use low-overhead loops\n\
>> +for a particular loop.  The parameter @var{loop} is a pointer to the loop\n\
>> +which is going to be checked.  This target hook is required only when the\n\
>> +target supports low-overhead loops, and will help some earlier middle-end\n\
>> +passes to make some decisions.",
>> + bool, (struct loop *loop),
>> + default_predict_doloop_p)
>>  
>>  DEFHOOK
>>  (can_use_doloop_p,
>> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
>> index 318f7e9..56ed4cc 100644
>> --- a/gcc/targhooks.c
>> +++ b/gcc/targhooks.c
>> @@ -643,6 +643,19 @@ default_has_ifunc_p (void)
>>    return HAVE_GNU_INDIRECT_FUNCTION;
>>  }
>>  
>> +/* True if we can predict this loop is possible to be transformed to a
>> +   low-overhead loop, otherwise returns false.
>> +
>> +   By default, false is returned, as this hook's applicability should be
>> +   verified for each target.  Target maintainers should re-define the hook
>> +   if the target can take advantage of it.  */
>> +
>> +bool
>> +default_predict_doloop_p (struct loop *loop ATTRIBUTE_UNUSED)
>> +{
>> +  return false;
>> +}
>> +
>>  /* NULL if INSN insn is valid within a low-overhead loop, otherwise returns
>>     an error message.
>>  
>> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
>> index 5943627..70bfb17 100644
>> --- a/gcc/targhooks.h
>> +++ b/gcc/targhooks.h
>> @@ -85,6 +85,7 @@ extern bool default_fixed_point_supported_p (void);
>>  
>>  extern bool default_has_ifunc_p (void);
>>  
>> +extern bool default_predict_doloop_p (struct loop *);
>>  extern const char * default_invalid_within_doloop (const rtx_insn *);
>>  
>>  extern tree default_builtin_vectorized_function (unsigned int, tree, tree);
>> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
>> index a44b4cb..ed7f2a5 100644
>> --- a/gcc/tree-ssa-loop-ivopts.c
>> +++ b/gcc/tree-ssa-loop-ivopts.c
>> @@ -3776,6 +3776,111 @@ prepare_decl_rtl (tree *expr_p, int *ws, void *data)
>>    return NULL_TREE;
>>  }
>>  
>> +/* Check whether number of iteration computation is too costly for doloop
>> +   transformation.  It expands the gimple sequence to equivalent RTL insn
>> +   sequence, then evaluate the cost.
>> +
>> +   Return true if it's costly, otherwise return false.  */
>> +
>> +static bool
>> +costly_iter_for_doloop_p (struct loop *loop, tree niters)
>> +{
> 
> I already said this but I expect IVOPTs to already cost initial
> value computation of new IVs.  I also doubt it's worth
> going into RTL details here, estimate_num_insns should be
> enough, like via using expression_expensive_p from SCEV analysis
> which is already used by IVOPTs.
> 
> You try to be perfect here but end up adding more
> magic numbers (PARAM_MAX_ITERATIONS_COMPUTATION_COST).
> 

Yes, I noticed there is something used for cost computation.  But it can not 
be used for this niter cost directly.  I tried to make similar thing like 
what we have in doloop_optimize.  The magic number is also from 
doloop_optimize.  I'm fine to use those existing ways, I don't have data for
whether it's not important to be exact.


Thanks
Kewen

>> +  tree type = TREE_TYPE (niters);
>> +  unsigned cost = 0;
>> +  bool speed = optimize_loop_for_speed_p (loop);
>> +  int regno = LAST_VIRTUAL_REGISTER + 1;
>> +  walk_tree (&niters, prepare_decl_rtl, &regno, NULL);
>> +  start_sequence ();
>> +  expand_expr (niters, NULL_RTX, TYPE_MODE (type), EXPAND_NORMAL);
>> +  rtx_insn *seq = get_insns ();
>> +  end_sequence ();
>> +
>> +  for (; seq; seq = NEXT_INSN (seq))
>> +    {
>> +      if (!INSN_P (seq))
>> +	continue;
>> +      rtx body = PATTERN (seq);
>> +      if (GET_CODE (body) == SET)
>> +	{
>> +	  rtx set_val = XEXP (body, 1);
>> +	  enum rtx_code code = GET_CODE (set_val);
>> +	  enum rtx_class cls = GET_RTX_CLASS (code);
>> +	  /* For now, we only consider these two RTX classes, to match what we
>> +	 get in doloop_optimize, excluding operations like zero/sign extend.  */
>> +	  if (cls == RTX_BIN_ARITH || cls == RTX_COMM_ARITH)
>> +	    cost += set_src_cost (set_val, GET_MODE (set_val), speed);
>> +	}
>> +    }
>> +  unsigned max_cost
>> +    = COSTS_N_INSNS (PARAM_VALUE (PARAM_MAX_ITERATIONS_COMPUTATION_COST));
>> +  if (cost > max_cost)
>> +    return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Predict whether the given loop will be transformed in the RTL
>> +   doloop_optimize pass.  Attempt to duplicate as many doloop_optimize checks
>> +   as possible.  This is only for target independent checks, see
>> +   targetm.predict_doloop_p for the target dependent ones.
>> +
>> +   Some RTL specific checks seems unable to be checked in gimple, if any new
>> +   checks or easy checks _are_ missing here, please add them.  */
>> +
>> +static bool
>> +generic_predict_doloop_p (struct ivopts_data *data)
>> +{
>> +  struct loop *loop = data->current_loop;
>> +
>> +  /* Call target hook for target dependent checks.  */
>> +  if (!targetm.predict_doloop_p (loop))
>> +    {
>> +      if (dump_file && (dump_flags & TDF_DETAILS))
>> +	fprintf (dump_file, "predict doloop failure due to"
>> +			    "target specific checks.\n");
>> +      return false;
>> +    }
>> +
>> +  /* Similar to doloop_optimize, check iteration description to know it's
>> +     suitable or not.  */
>> +  edge exit = loop_latch_edge (loop);
>> +  struct tree_niter_desc *niter_desc = niter_for_exit (data, exit);
>> +  if (niter_desc == NULL)
>> +    {
>> +      if (dump_file && (dump_flags & TDF_DETAILS))
>> +	fprintf (dump_file, "predict doloop failure due to"
>> +			    "unexpected niters.\n");
>> +      return false;
>> +    }
>> +
>> +  /* Similar to doloop_optimize, check whether iteration count too small
>> +     and not profitable.  */
>> +  HOST_WIDE_INT est_niter = get_estimated_loop_iterations_int (loop);
>> +  if (est_niter == -1)
>> +    est_niter = get_likely_max_loop_iterations_int (loop);
>> +  if (est_niter >= 0 && est_niter < 3)
>> +    {
>> +      if (dump_file && (dump_flags & TDF_DETAILS))
>> +	fprintf (dump_file,
>> +		 "predict doloop failure due to"
>> +		 "too few iterations (%u).\n",
>> +		 (unsigned int) est_niter);
>> +      return false;
>> +    }
>> +
>> +  /* Similar to doloop_optimize, check whether number of iterations too costly
>> +     to compute.  */
>> +  if (costly_iter_for_doloop_p (loop, niter_desc->niter))
>> +    {
>> +      if (dump_file && (dump_flags & TDF_DETAILS))
>> +	fprintf (dump_file, "predict doloop failure due to"
>> +			    "costly niter computation.\n");
>> +      return false;
>> +    }
>> +
>> +  return true;
>> +}
>> +
>>  /* Determines cost of the computation of EXPR.  */
>>  
>>  static unsigned
>>
>