public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Optimize divmod expansion (PR middle-end/79665)
@ 2017-02-22 21:50 Jakub Jelinek
  2017-02-23  6:00 ` Jeff Law
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Jelinek @ 2017-02-22 21:50 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Hi!

If both arguments of integer division or modulo are known to be non-negative
in corresponding signed type, then signed as well as unsigned division/modulo
shall have the exact same result and therefore we can choose between those
two depending on which one is faster (or shorter for -Os), which varries
a lot depending on target and especially for constant divisors on the exact
divisor.  expand_divmod itself is too complicated and we don't even have
the ability to ask about costs e.g. for highpart multiplication without
actually expanding it, so this patch just in that case tries both sequences,
computes their costs and uses the cheaper (and for equal cost honors the
actual original signedness of the operation).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-02-22  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/79665
	* internal-fn.c (get_range_pos_neg): Moved to ...
	* tree.c (get_range_pos_neg): ... here.  No longer static.
	* tree.h (get_range_pos_neg): New prototype.
	* expr.c (expand_expr_real_2) <case TRUNC_DIV_EXPR>: If both arguments
	are known to be in between 0 and signed maximum inclusive, try to
	expand both unsigned and signed divmod and use the cheaper one from
	those.

--- gcc/internal-fn.c.jj	2017-02-11 09:14:56.000000000 +0100
+++ gcc/internal-fn.c	2017-02-22 10:09:03.503254909 +0100
@@ -413,86 +413,6 @@ expand_FALLTHROUGH (internal_fn, gcall *
 	    "invalid use of attribute %<fallthrough%>");
 }
 
-/* Helper function for expand_addsub_overflow.  Return 1
-   if ARG interpreted as signed in its precision is known to be always
-   positive or 2 if ARG is known to be always negative, or 3 if ARG may
-   be positive or negative.  */
-
-static int
-get_range_pos_neg (tree arg)
-{
-  if (arg == error_mark_node)
-    return 3;
-
-  int prec = TYPE_PRECISION (TREE_TYPE (arg));
-  int cnt = 0;
-  if (TREE_CODE (arg) == INTEGER_CST)
-    {
-      wide_int w = wi::sext (arg, prec);
-      if (wi::neg_p (w))
-	return 2;
-      else
-	return 1;
-    }
-  while (CONVERT_EXPR_P (arg)
-	 && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (arg, 0)))
-	 && TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) <= prec)
-    {
-      arg = TREE_OPERAND (arg, 0);
-      /* Narrower value zero extended into wider type
-	 will always result in positive values.  */
-      if (TYPE_UNSIGNED (TREE_TYPE (arg))
-	  && TYPE_PRECISION (TREE_TYPE (arg)) < prec)
-	return 1;
-      prec = TYPE_PRECISION (TREE_TYPE (arg));
-      if (++cnt > 30)
-	return 3;
-    }
-
-  if (TREE_CODE (arg) != SSA_NAME)
-    return 3;
-  wide_int arg_min, arg_max;
-  while (get_range_info (arg, &arg_min, &arg_max) != VR_RANGE)
-    {
-      gimple *g = SSA_NAME_DEF_STMT (arg);
-      if (is_gimple_assign (g)
-	  && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (g)))
-	{
-	  tree t = gimple_assign_rhs1 (g);
-	  if (INTEGRAL_TYPE_P (TREE_TYPE (t))
-	      && TYPE_PRECISION (TREE_TYPE (t)) <= prec)
-	    {
-	      if (TYPE_UNSIGNED (TREE_TYPE (t))
-		  && TYPE_PRECISION (TREE_TYPE (t)) < prec)
-		return 1;
-	      prec = TYPE_PRECISION (TREE_TYPE (t));
-	      arg = t;
-	      if (++cnt > 30)
-		return 3;
-	      continue;
-	    }
-	}
-      return 3;
-    }
-  if (TYPE_UNSIGNED (TREE_TYPE (arg)))
-    {
-      /* For unsigned values, the "positive" range comes
-	 below the "negative" range.  */
-      if (!wi::neg_p (wi::sext (arg_max, prec), SIGNED))
-	return 1;
-      if (wi::neg_p (wi::sext (arg_min, prec), SIGNED))
-	return 2;
-    }
-  else
-    {
-      if (!wi::neg_p (wi::sext (arg_min, prec), SIGNED))
-	return 1;
-      if (wi::neg_p (wi::sext (arg_max, prec), SIGNED))
-	return 2;
-    }
-  return 3;
-}
-
 /* Return minimum precision needed to represent all values
    of ARG in SIGNed integral type.  */
 
--- gcc/tree.c.jj	2017-02-09 14:55:34.000000000 +0100
+++ gcc/tree.c	2017-02-22 10:10:36.525062066 +0100
@@ -14205,6 +14205,88 @@ verify_type (const_tree t)
 }
 
 
+/* Return 1 if ARG interpreted as signed in its precision is known to be
+   always positive or 2 if ARG is known to be always negative, or 3 if
+   ARG may be positive or negative.  */
+
+int
+get_range_pos_neg (tree arg)
+{
+  if (arg == error_mark_node)
+    return 3;
+
+  int prec = TYPE_PRECISION (TREE_TYPE (arg));
+  int cnt = 0;
+  if (TREE_CODE (arg) == INTEGER_CST)
+    {
+      wide_int w = wi::sext (arg, prec);
+      if (wi::neg_p (w))
+	return 2;
+      else
+	return 1;
+    }
+  while (CONVERT_EXPR_P (arg)
+	 && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (arg, 0)))
+	 && TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) <= prec)
+    {
+      arg = TREE_OPERAND (arg, 0);
+      /* Narrower value zero extended into wider type
+	 will always result in positive values.  */
+      if (TYPE_UNSIGNED (TREE_TYPE (arg))
+	  && TYPE_PRECISION (TREE_TYPE (arg)) < prec)
+	return 1;
+      prec = TYPE_PRECISION (TREE_TYPE (arg));
+      if (++cnt > 30)
+	return 3;
+    }
+
+  if (TREE_CODE (arg) != SSA_NAME)
+    return 3;
+  wide_int arg_min, arg_max;
+  while (get_range_info (arg, &arg_min, &arg_max) != VR_RANGE)
+    {
+      gimple *g = SSA_NAME_DEF_STMT (arg);
+      if (is_gimple_assign (g)
+	  && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (g)))
+	{
+	  tree t = gimple_assign_rhs1 (g);
+	  if (INTEGRAL_TYPE_P (TREE_TYPE (t))
+	      && TYPE_PRECISION (TREE_TYPE (t)) <= prec)
+	    {
+	      if (TYPE_UNSIGNED (TREE_TYPE (t))
+		  && TYPE_PRECISION (TREE_TYPE (t)) < prec)
+		return 1;
+	      prec = TYPE_PRECISION (TREE_TYPE (t));
+	      arg = t;
+	      if (++cnt > 30)
+		return 3;
+	      continue;
+	    }
+	}
+      return 3;
+    }
+  if (TYPE_UNSIGNED (TREE_TYPE (arg)))
+    {
+      /* For unsigned values, the "positive" range comes
+	 below the "negative" range.  */
+      if (!wi::neg_p (wi::sext (arg_max, prec), SIGNED))
+	return 1;
+      if (wi::neg_p (wi::sext (arg_min, prec), SIGNED))
+	return 2;
+    }
+  else
+    {
+      if (!wi::neg_p (wi::sext (arg_min, prec), SIGNED))
+	return 1;
+      if (wi::neg_p (wi::sext (arg_max, prec), SIGNED))
+	return 2;
+    }
+  return 3;
+}
+
+
+
+
 /* Return true if ARG is marked with the nonnull attribute in the
    current function signature.  */
 
--- gcc/tree.h.jj	2017-02-09 14:55:34.000000000 +0100
+++ gcc/tree.h	2017-02-22 10:11:05.448691171 +0100
@@ -4876,6 +4876,7 @@ extern bool gimple_canonical_types_compa
 						 bool trust_type_canonical = true);
 extern bool type_with_interoperable_signedness (const_tree);
 extern bitmap get_nonnull_args (const_tree);
+extern int get_range_pos_neg (tree);
 
 /* Return simplified tree code of type that is used for canonical type
    merging.  */
--- gcc/expr.c.jj	2017-01-19 16:58:23.000000000 +0100
+++ gcc/expr.c	2017-02-22 10:22:20.290996636 +0100
@@ -8809,6 +8809,34 @@ expand_expr_real_2 (sepops ops, rtx targ
 	 where some terms of the dividend have coeffs divisible by it.  */
       expand_operands (treeop0, treeop1,
 		       subtarget, &op0, &op1, EXPAND_NORMAL);
+      if (SCALAR_INT_MODE_P (mode)
+	  && optimize >= 2
+	  && get_range_pos_neg (treeop0) == 1
+	  && get_range_pos_neg (treeop1) == 1)
+	{
+	  /* If both arguments are known to be positive when interpreted
+	     as signed, we can expand it as both signed and unsigned
+	     division or modulo.  Choose the cheaper sequence in that case.  */
+	  bool speed_p = optimize_insn_for_speed_p ();
+	  do_pending_stack_adjust ();
+	  start_sequence ();
+	  rtx uns_ret = expand_divmod (0, code, mode, op0, op1, target, 1);
+	  rtx_insn *uns_insns = get_insns ();
+	  end_sequence ();
+	  start_sequence ();
+	  rtx sgn_ret = expand_divmod (0, code, mode, op0, op1, target, 0);
+	  rtx_insn *sgn_insns = get_insns ();
+	  end_sequence ();
+	  unsigned uns_cost = seq_cost (uns_insns, speed_p);
+	  unsigned sgn_cost = seq_cost (sgn_insns, speed_p);
+	  if (uns_cost < sgn_cost || (uns_cost == sgn_cost && unsignedp))
+	    {
+	      emit_insn (uns_insns);
+	      return uns_ret;
+	    }
+	  emit_insn (sgn_insns);
+	  return sgn_ret;
+	}
       return expand_divmod (0, code, mode, op0, op1, target, unsignedp);
 
     case RDIV_EXPR:

	Jakub

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Optimize divmod expansion (PR middle-end/79665)
  2017-02-22 21:50 [PATCH] Optimize divmod expansion (PR middle-end/79665) Jakub Jelinek
@ 2017-02-23  6:00 ` Jeff Law
  2017-05-31  8:15   ` Georg-Johann Lay
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Law @ 2017-02-23  6:00 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

On 02/22/2017 02:40 PM, Jakub Jelinek wrote:
> Hi!
>
> If both arguments of integer division or modulo are known to be non-negative
> in corresponding signed type, then signed as well as unsigned division/modulo
> shall have the exact same result and therefore we can choose between those
> two depending on which one is faster (or shorter for -Os), which varries
> a lot depending on target and especially for constant divisors on the exact
> divisor.  expand_divmod itself is too complicated and we don't even have
> the ability to ask about costs e.g. for highpart multiplication without
> actually expanding it, so this patch just in that case tries both sequences,
> computes their costs and uses the cheaper (and for equal cost honors the
> actual original signedness of the operation).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2017-02-22  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR middle-end/79665
> 	* internal-fn.c (get_range_pos_neg): Moved to ...
> 	* tree.c (get_range_pos_neg): ... here.  No longer static.
> 	* tree.h (get_range_pos_neg): New prototype.
> 	* expr.c (expand_expr_real_2) <case TRUNC_DIV_EXPR>: If both arguments
> 	are known to be in between 0 and signed maximum inclusive, try to
> 	expand both unsigned and signed divmod and use the cheaper one from
> 	those.
OK.
jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Optimize divmod expansion (PR middle-end/79665)
  2017-02-23  6:00 ` Jeff Law
@ 2017-05-31  8:15   ` Georg-Johann Lay
  2017-05-31  8:19     ` Jakub Jelinek
  0 siblings, 1 reply; 7+ messages in thread
From: Georg-Johann Lay @ 2017-05-31  8:15 UTC (permalink / raw)
  To: Jeff Law, Jakub Jelinek; +Cc: gcc-patches

On 23.02.2017 06:59, Jeff Law wrote:
> On 02/22/2017 02:40 PM, Jakub Jelinek wrote:
>> Hi!
>>
>> If both arguments of integer division or modulo are known to be
>> non-negative
>> in corresponding signed type, then signed as well as unsigned
>> division/modulo
>> shall have the exact same result and therefore we can choose between
>> those
>> two depending on which one is faster (or shorter for -Os), which varries
>> a lot depending on target and especially for constant divisors on the
>> exact
>> divisor.  expand_divmod itself is too complicated and we don't even have
>> the ability to ask about costs e.g. for highpart multiplication without
>> actually expanding it, so this patch just in that case tries both
>> sequences,
>> computes their costs and uses the cheaper (and for equal cost honors the
>> actual original signedness of the operation).
>>
>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>
>> 2017-02-22  Jakub Jelinek  <jakub@redhat.com>
>>
>>     PR middle-end/79665
>>     * internal-fn.c (get_range_pos_neg): Moved to ...
>>     * tree.c (get_range_pos_neg): ... here.  No longer static.
>>     * tree.h (get_range_pos_neg): New prototype.
>>     * expr.c (expand_expr_real_2) <case TRUNC_DIV_EXPR>: If both
>> arguments
>>     are known to be in between 0 and signed maximum inclusive, try to
>>     expand both unsigned and signed divmod and use the cheaper one from
>>     those.
> OK.
> jeff

Hi, this causes a performance degradation for avr.

When optimizing for speed, and with a known denominatior, then v6 uses
s/umulMM3_highpart insn to avoid division because no div instruction is
available.

unsigned scale256 (unsigned val)
{
     return value / 255;
}

With this patch, v7 now uses __divmodhi4 which is very expensive but
the costs are not computed because rtlanal.c:seq_cost assumes a cost of
ONE:

   for (; seq; seq = NEXT_INSN (seq))
     {
       set = single_set (seq);
       if (set)
         cost += set_rtx_cost (set, speed);
       else
         cost++;
     }

because divmod in not a single_set:
(gdb) p seq
$10 = (const rtx_insn *) 0x7ffff730d500
(gdb) pr
warning: Expression is not an assignment (and might have no effect)
(insn 14 13 0 (parallel [
             (set (reg:HI 52)
                 (div:HI (reg:HI 47)
                     (reg:HI 54)))
             (set (reg:HI 53)
                 (mod:HI (reg:HI 47)
                     (reg:HI 54)))
             (clobber (reg:QI 21 r21))
             (clobber (reg:HI 22 r22))
             (clobber (reg:HI 24 r24))
             (clobber (reg:HI 26 r26))
         ]) "scale.c":7 -1
      (nil))
(gdb)

Hence the divmod appears to be much less expensive than the unsigned
variant that computed the costs for mult_highpart.


Johann





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Optimize divmod expansion (PR middle-end/79665)
  2017-05-31  8:15   ` Georg-Johann Lay
@ 2017-05-31  8:19     ` Jakub Jelinek
  2017-05-31  9:00       ` Georg-Johann Lay
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Jelinek @ 2017-05-31  8:19 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: Jeff Law, gcc-patches

On Wed, May 31, 2017 at 10:06:34AM +0200, Georg-Johann Lay wrote:
> Hi, this causes a performance degradation for avr.
> 
> When optimizing for speed, and with a known denominatior, then v6 uses
> s/umulMM3_highpart insn to avoid division because no div instruction is
> available.
> 
> unsigned scale256 (unsigned val)
> {
>     return value / 255;
> }
> 
> With this patch, v7 now uses __divmodhi4 which is very expensive but
> the costs are not computed because rtlanal.c:seq_cost assumes a cost of
> ONE:
> 
>   for (; seq; seq = NEXT_INSN (seq))
>     {
>       set = single_set (seq);
>       if (set)
>         cost += set_rtx_cost (set, speed);
>       else
>         cost++;
>     }
> 
> because divmod in not a single_set:
> (gdb) p seq
> $10 = (const rtx_insn *) 0x7ffff730d500
> (gdb) pr
> warning: Expression is not an assignment (and might have no effect)
> (insn 14 13 0 (parallel [
>             (set (reg:HI 52)
>                 (div:HI (reg:HI 47)
>                     (reg:HI 54)))
>             (set (reg:HI 53)
>                 (mod:HI (reg:HI 47)
>                     (reg:HI 54)))
>             (clobber (reg:QI 21 r21))
>             (clobber (reg:HI 22 r22))
>             (clobber (reg:HI 24 r24))
>             (clobber (reg:HI 26 r26))
>         ]) "scale.c":7 -1
>      (nil))
> (gdb)
> 
> Hence the divmod appears to be much less expensive than the unsigned
> variant that computed the costs for mult_highpart.

Then you should fix the cost computation - be able to use a target hook
on insns that are not a single set or something similar.

	Jakub

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Optimize divmod expansion (PR middle-end/79665)
  2017-05-31  8:19     ` Jakub Jelinek
@ 2017-05-31  9:00       ` Georg-Johann Lay
  2017-05-31  9:08         ` Jakub Jelinek
  0 siblings, 1 reply; 7+ messages in thread
From: Georg-Johann Lay @ 2017-05-31  9:00 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jeff Law, gcc-patches

On 31.05.2017 10:15, Jakub Jelinek wrote:
> On Wed, May 31, 2017 at 10:06:34AM +0200, Georg-Johann Lay wrote:
>> Hi, this causes a performance degradation for avr.
>>
>> When optimizing for speed, and with a known denominatior, then v6 uses
>> s/umulMM3_highpart insn to avoid division because no div instruction is
>> available.
>>
>> unsigned scale256 (unsigned val)
>> {
>>     return value / 255;
>> }
>>
>> With this patch, v7 now uses __divmodhi4 which is very expensive but
>> the costs are not computed because rtlanal.c:seq_cost assumes a cost of
>> ONE:
>>
>>   for (; seq; seq = NEXT_INSN (seq))
>>     {
>>       set = single_set (seq);
>>       if (set)
>>         cost += set_rtx_cost (set, speed);
>>       else
>>         cost++;
>>     }
>>
>> because divmod in not a single_set:
>> (gdb) p seq
>> $10 = (const rtx_insn *) 0x7ffff730d500
>> (gdb) pr
>> warning: Expression is not an assignment (and might have no effect)
>> (insn 14 13 0 (parallel [
>>             (set (reg:HI 52)
>>                 (div:HI (reg:HI 47)
>>                     (reg:HI 54)))
>>             (set (reg:HI 53)
>>                 (mod:HI (reg:HI 47)
>>                     (reg:HI 54)))
>>             (clobber (reg:QI 21 r21))
>>             (clobber (reg:HI 22 r22))
>>             (clobber (reg:HI 24 r24))
>>             (clobber (reg:HI 26 r26))
>>         ]) "scale.c":7 -1
>>      (nil))
>> (gdb)
>>
>> Hence the divmod appears to be much less expensive than the unsigned
>> variant that computed the costs for mult_highpart.
>
> Then you should fix the cost computation - be able to use a target hook
> on insns that are not a single set or something similar.
>
> 	Jakub
>

Are you saying that cost computation in GCC is fundamentally flawed
for anything that it not a single_set?

Johann

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Optimize divmod expansion (PR middle-end/79665)
  2017-05-31  9:00       ` Georg-Johann Lay
@ 2017-05-31  9:08         ` Jakub Jelinek
  2017-05-31 14:04           ` Georg-Johann Lay
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Jelinek @ 2017-05-31  9:08 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: Jeff Law, gcc-patches

On Wed, May 31, 2017 at 10:48:07AM +0200, Georg-Johann Lay wrote:
> > > because divmod in not a single_set:
> > > (gdb) p seq
> > > $10 = (const rtx_insn *) 0x7ffff730d500
> > > (gdb) pr
> > > warning: Expression is not an assignment (and might have no effect)
> > > (insn 14 13 0 (parallel [
> > >             (set (reg:HI 52)
> > >                 (div:HI (reg:HI 47)
> > >                     (reg:HI 54)))
> > >             (set (reg:HI 53)
> > >                 (mod:HI (reg:HI 47)
> > >                     (reg:HI 54)))
> > >             (clobber (reg:QI 21 r21))
> > >             (clobber (reg:HI 22 r22))
> > >             (clobber (reg:HI 24 r24))
> > >             (clobber (reg:HI 26 r26))
> > >         ]) "scale.c":7 -1
> > >      (nil))
> > > (gdb)
> > > 
> > > Hence the divmod appears to be much less expensive than the unsigned
> > > variant that computed the costs for mult_highpart.
> > 
> > Then you should fix the cost computation - be able to use a target hook
> > on insns that are not a single set or something similar.
> 
> Are you saying that cost computation in GCC is fundamentally flawed
> for anything that it not a single_set?

The division/modulo optimization I've added as well as many other spots
in GCC rely on reasonable cost, just grep e.g. all places that call
seq_cost.  So, if it returns something that is a very wrong estimate,
it won't affect just that single optimization, but all others.  Therefore,
you should fix the cost computation, rather than disabling all the places
that use the costs.  Many targets have instructions with multiple sets,
so I'm surprised assuming cost of 1 for them doesn't break many more things.
I think either we should have a separate target hook for multiple sets
instructions, or just call the targetm.rtx_costs on the PARALLEL in that
case and see if the targets compute something reasonable for it, otherwise
either use the cost of the first set, or maximum of all sets (that might be
best) or something similar.

	Jakub

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Optimize divmod expansion (PR middle-end/79665)
  2017-05-31  9:08         ` Jakub Jelinek
@ 2017-05-31 14:04           ` Georg-Johann Lay
  0 siblings, 0 replies; 7+ messages in thread
From: Georg-Johann Lay @ 2017-05-31 14:04 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jeff Law, gcc-patches

On 31.05.2017 11:00, Jakub Jelinek wrote:
> On Wed, May 31, 2017 at 10:48:07AM +0200, Georg-Johann Lay wrote:
>>>> because divmod in not a single_set:
>>>> (gdb) p seq
>>>> $10 = (const rtx_insn *) 0x7ffff730d500
>>>> (gdb) pr
>>>> warning: Expression is not an assignment (and might have no effect)
>>>> (insn 14 13 0 (parallel [
>>>>             (set (reg:HI 52)
>>>>                 (div:HI (reg:HI 47)
>>>>                     (reg:HI 54)))
>>>>             (set (reg:HI 53)
>>>>                 (mod:HI (reg:HI 47)
>>>>                     (reg:HI 54)))
>>>>             (clobber (reg:QI 21 r21))
>>>>             (clobber (reg:HI 22 r22))
>>>>             (clobber (reg:HI 24 r24))
>>>>             (clobber (reg:HI 26 r26))
>>>>         ]) "scale.c":7 -1
>>>>      (nil))
>>>> (gdb)
>>>>
>>>> Hence the divmod appears to be much less expensive than the unsigned
>>>> variant that computed the costs for mult_highpart.
>>>
>>> Then you should fix the cost computation - be able to use a target hook
>>> on insns that are not a single set or something similar.
>>
>> Are you saying that cost computation in GCC is fundamentally flawed
>> for anything that it not a single_set?
>
> The division/modulo optimization I've added as well as many other spots
> in GCC rely on reasonable cost, just grep e.g. all places that call
> seq_cost.  So, if it returns something that is a very wrong estimate,
> it won't affect just that single optimization, but all others.  Therefore,
> you should fix the cost computation, rather than disabling all the places
> that use the costs.  Many targets have instructions with multiple sets,

I didn't intend to disable anything...

Would the following addition be in order?

gcc/
	PR middle-end/80929
	* rtlanal.c (seq_cost) [PARALLEL]: Get cost from insn_rtx_cost
	instead of assuming cost of 1.

Index: rtlanal.c
===================================================================
--- rtlanal.c	(revision 248737)
+++ rtlanal.c	(working copy)
@@ -5300,6 +5300,8 @@ seq_cost (const rtx_insn *seq, bool spee
        set = single_set (seq);
        if (set)
          cost += set_rtx_cost (set, speed);
+      else if (PARALLEL == GET_CODE (PATTERN (seq)))
+	cost += insn_rtx_cost (PATTERN (seq), speed);
        else
          cost++;
      }


> so I'm surprised assuming cost of 1 for them doesn't break many more things.

Maybe because PARALLEL is not common, and when expand tests for costs of
DIV or MOD, it passes respective RTXes to the RTL cost functions, and
*not* what the target expands in divmod insns.

> I think either we should have a separate target hook for multiple sets
> instructions, or just call the targetm.rtx_costs on the PARALLEL in that
> case and see if the targets compute something reasonable for it, otherwise
> either use the cost of the first set, or maximum of all sets (that might be
> best) or something similar.
>
> 	Jakub


The patch uses whatever insn_rtx_costs comes up with.  For PARALLEL,
it's the cost of the 1st SET which is reasonable imo (at least for the
divmod case).

Johann


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-05-31 14:04 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-22 21:50 [PATCH] Optimize divmod expansion (PR middle-end/79665) Jakub Jelinek
2017-02-23  6:00 ` Jeff Law
2017-05-31  8:15   ` Georg-Johann Lay
2017-05-31  8:19     ` Jakub Jelinek
2017-05-31  9:00       ` Georg-Johann Lay
2017-05-31  9:08         ` Jakub Jelinek
2017-05-31 14:04           ` Georg-Johann Lay

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).