[PATCH (0/7)] Improve use of Widening Multiplies

* [PATCH (0/7)] Improve use of Widening Multiplies
@ 2011-06-23 14:38 Andrew Stubbs
  2011-06-23 14:39 ` [PATCH (1/7)] New optab framework for widening multiplies Andrew Stubbs
                   ` (9 more replies)
  0 siblings, 10 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-06-23 14:38 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1895 bytes --]

Hi all,

This patch series is intended to improve use of widening multiply, and 
widening multiply-and-accumulate instructions. This is primarily for the 
benefit of ARM targets, but should give some improvements to other 
targets also.

The patches provide a number of improvements:

  * Support for instructions that widen by more than one mode
    (e.g. from HImode to DImode).

  * Use of widening multiplies even when the input mode is narrower than
    the instruction uses. (e.g. Use HI->DI to do QI->DI).

  * Use of signed widening multiplies (of a larger mode) where unsigned
    multiplies are not available.

  * Support for input operands with mis-matched signedness, with or
    without usmul_widen_optab.

  * Support for input operands with mis-matched mode [1].

  * Improved pattern matching in the widening_mult pass.
    * Recognition of true types, even if obscured by a cast.
    * Insertion of extra gimple statements where the existing code was
      incompatible with widening multiplies.
    * Recognition of widening multiply-and-accumulate even where the
      multiply expression was not widening.

The end result is that, on ARM, many of the cases where the 
compiler would fall back to regular multiplies, extensions, and add 
instructions can now be handled with just one instruction.
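
To make the target concrete, here is a minimal sketch of the kind of source the series is aimed at. The function names are made up for illustration and do not come from the patches; whether a given compiler turns each function into a single instruction depends on the target and the options used.

```c
#include <stdint.h>

/* 32x32 -> 64 signed widening multiply: both operands are extended
   before the multiply, so no bits of the product are lost.  */
int64_t
widen_mul (int32_t b, int32_t c)
{
  return (int64_t) b * (int64_t) c;
}

/* 16x16 + 64 -> 64 widening multiply-and-accumulate.  The inputs are
   HImode-sized and the result is DImode-sized, so this widens by more
   than one mode.  */
int64_t
widen_madd (int64_t a, int16_t b, int16_t c)
{
  return a + (int64_t) b * (int64_t) c;
}
```

The attached scripts generate many variations on these two shapes, with and without the explicit casts.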

For those interested in the before and after states, I have attached a 
couple of shell scripts. These generate test cases with many 
permutations of types and signedness.

[1] Operands of mis-matched mode are multiplied by extending the smaller 
one to match the larger one. Although this does not support mis-matched 
mode instructions directly, this ought to improve the chances of the 
combine pass doing The Right Thing. (Although this does depend on there 
being a suitable matched-mode instruction for widen_mult/expand to use.)
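
As a rough illustration of footnote [1], the two hypothetical functions below compute the same value. The first has operands of mis-matched mode; the second shows the narrower operand extended to match the wider one, leaving a matched-mode widening multiply. This is a source-level sketch only, not the gimple the pass produces.

```c
#include <stdint.h>

/* Mis-matched input modes: b is 16 bits wide, c is 32 bits wide, and
   the result is 64 bits wide.  */
int64_t
mul_mixed (int16_t b, int32_t c)
{
  return (int64_t) b * (int64_t) c;
}

/* The same computation after extending the narrower operand to the
   width of the wider one.  */
int64_t
mul_matched (int16_t b, int32_t c)
{
  int32_t b_ext = b;	/* sign-extend 16 bits to 32 bits */
  return (int64_t) b_ext * (int64_t) c;
}
```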

So, on to the patches ....

Andrew

[-- Attachment #2: script2 --]
[-- Type: text/plain, Size: 733 bytes --]

#!/bin/bash

for op in madd mul; do
  for i1 in char short int "long long"; do
    for i2 in char short int "long long"; do
      for o in char short int "long long"; do
	for x in unsigned signed; do
	  for y in unsigned signed; do
	    for z in unsigned signed; do
	      for c in cast nocast; do
		echo "$x $o"
		echo "${op}_${x}_${o/ /}_${y}_${i1/ /}_${z}_${i2/ /}_$c ($x $o a, $y $i1 *b, $z $i2 *c)"
		echo "{"
		case $op+$c in
		madd+cast)   echo "  return a + ($x $o)*b * ($x $o)*c;" ;;
		madd+nocast) echo "  return a + *b * *c;" ;;
		mul+cast)    echo "  return ($x $o)*b * ($x $o)*c;" ;;
		mul+nocast)  echo "  return *b * *c;" ;;
		esac
		echo "}"
		echo
	      done
	    done
	  done
	done
      done
    done
  done
done

[-- Attachment #3: script3 --]
[-- Type: text/plain, Size: 723 bytes --]

#!/bin/bash

for op in madd mul; do
  for i1 in char short int "long long"; do
    for i2 in char short int "long long"; do
      for o in char short int "long long"; do
	for x in unsigned signed; do
	  for y in unsigned signed; do
	    for z in unsigned signed; do
	      for c in cast nocast; do
		echo "$x $o"
		echo "${op}_${x}_${o/ /}_${y}_${i1/ /}_${z}_${i2/ /}_$c ($x $o a, $y $i1 b, $z $i2 c)"
		echo "{"
		case $op+$c in
		madd+cast)   echo "  return a + ($x $o)b * ($x $o)c;" ;;
		madd+nocast) echo "  return a + b * c;" ;;
		mul+cast)    echo "  return ($x $o)b * ($x $o)c;" ;;
		mul+nocast)  echo "  return b * c;" ;;
		esac
		echo "}"
		echo
	      done
	    done
	  done
	done
      done
    done
  done
done


* [PATCH (1/7)] New optab framework for widening multiplies
  2011-06-23 14:38 [PATCH (0/7)] Improve use of Widening Multiplies Andrew Stubbs
@ 2011-06-23 14:39 ` Andrew Stubbs
  2011-07-09 15:38   ` Andrew Stubbs
  2011-06-23 14:41 ` [PATCH (2/7)] Widening multiplies by more than one mode Andrew Stubbs
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-06-23 14:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

[-- Attachment #1: Type: text/plain, Size: 2996 bytes --]

This patch should have no effect on the compiler output. It merely 
replaces one way to represent widening operations with another, and 
refactors the other parts of the compiler to match. The rest of the 
patch set uses this new framework to implement the optimization 
improvements.

I considered and discarded many approaches to this patch before arriving 
at this solution, and I feel sure that there'll be somebody out there 
who will think I chose the wrong one, so let me first explain how I got 
here ....

The aim is to be able to encode and query optabs that have any given 
input mode, and any given output mode. This is similar to the 
convert_optab, but not compatible with that optab since it is handled 
completely differently in the code.

(Just to be clear, the existing widening multiply support only covers 
instructions that widen by *one* mode, so it's only ever been necessary 
to know the output mode, up to now.)

Option 1 was to add a second dimension to the handlers table in optab_d, 
but I discarded this option because it would increase the memory usage 
by the square of the number of modes, which is a bit much.

Option 2 was to add a whole new optab, similar to optab_d, but with a 
second dimension like convert_optab_d. However, this turned out to cause 
way too many pointer type mismatches in the code, and would have been 
very difficult to fix up.

Option 3 was to add new optab entries for widening by two modes, by 
three modes, and so on. True, I would only need to add one extra set for 
what I need, but there would be so many places in the code that compare 
against smul_widen_optab, for example, that would need to be taught 
about these, that it seemed like a bad idea.

Option 4 was to have a separate table that contained the widening 
operations, and refer to that whenever a widening entry in the main 
optab is referenced, but I found that there was no easy way to do the 
mapping without putting some sort of switch table in 
widening_optab_handler, and that negates the other advantages.

So, what I've done in the end is add a new pointer entry "widening" into 
optab_d, and dynamically build the widening operations table for each 
optab that needs it. I've then added new accessor functions that take 
both input and output modes, and altered the code to use them where 
appropriate.
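
The shape of that design can be sketched in a standalone toy model. Everything named toy_* below is hypothetical and exists only for illustration; the real accessors are widening_optab_handler and set_widening_optab_handler in the attached patch, and they store insn codes as an offset from CODE_FOR_nothing rather than directly.

```c
#include <stdlib.h>

/* Toy stand-ins for machine modes and insn codes (hypothetical).  */
enum toy_mode { TOY_QI, TOY_HI, TOY_SI, TOY_DI, TOY_NUM_MODES };
enum { TOY_CODE_FOR_NOTHING = 0 };

struct toy_optab
{
  /* One-dimensional table, always present.  */
  int handlers[TOY_NUM_MODES];
  /* Two-dimensional table indexed [to_mode][from_mode]; allocated only
     when the first widening entry is recorded.  */
  int (*widening)[TOY_NUM_MODES];
};

/* Look up a handler by output mode and input mode.  */
int
toy_handler (const struct toy_optab *op, enum toy_mode to,
	     enum toy_mode from)
{
  if (to == from)
    return op->handlers[to];
  if (op->widening)
    return op->widening[to][from];
  return TOY_CODE_FOR_NOTHING;
}

/* Record a handler; returns 0 on success, -1 if allocation fails.  */
int
toy_set_handler (struct toy_optab *op, enum toy_mode to,
		 enum toy_mode from, int code)
{
  if (to == from)
    {
      op->handlers[to] = code;
      return 0;
    }
  if (op->widening == NULL)
    {
      /* Zero-filled, so every entry starts as TOY_CODE_FOR_NOTHING.  */
      op->widening = calloc (TOY_NUM_MODES, sizeof *op->widening);
      if (op->widening == NULL)
	return -1;
    }
  op->widening[to][from] = code;
  return 0;
}
```

An optab that never records a widening entry pays only for the extra pointer, which is the point of allocating the second table dynamically.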

The down-side of this approach is that the optab entries for widening 
operations now have two "handlers" tables, one of which is redundant. 
That said, those cases are in the minority, and it is the smaller table 
which is unused.

If people find that very distasteful, it might be possible to remove the 
*_widen_optab entries and unify smul_optab with smul_widen_optab, and so 
on, and save space that way. I've not done so yet, but I expect I could 
if people feel strongly about it.

As a side-effect, it's now possible for any optab to be "widening", 
should some target happen to have a widening add, shift, or whatever.

Is this patch OK?

Andrew

[-- Attachment #2: widening-multiplies-1.patch --]
[-- Type: text/x-patch, Size: 14510 bytes --]

2011-06-23  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* expr.c (expand_expr_real_2): Use widening_optab_handler.
	* genopinit.c (optabs): Use set_widening_optab_handler for $N.
	(gen_insn): $N now means $b must be wider than $a, not consecutive.
	* optabs.c (expand_widen_pattern_expr): Use widening_optab_handler.
	(expand_binop_directly): Likewise.
	(expand_binop): Likewise.
	* optabs.h (widening_optab_handlers): New struct.
	(optab_d): New member, 'widening'.
	(widening_optab_handler): New function.
	(set_widening_optab_handler): New function.
	* tree-ssa-math-opts.c (convert_mult_to_widen): Use
	widening_optab_handler.
	(convert_plusminus_to_widen): Likewise.

--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7634,7 +7634,8 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 	  this_optab = usmul_widen_optab;
 	  if (mode == GET_MODE_2XWIDER_MODE (innermode))
 	    {
-	      if (optab_handler (this_optab, mode) != CODE_FOR_nothing)
+	      if (widening_optab_handler (this_optab, mode, innermode)
+		    != CODE_FOR_nothing)
 		{
 		  if (TYPE_UNSIGNED (TREE_TYPE (treeop0)))
 		    expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
@@ -7661,7 +7662,8 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 	  if (mode == GET_MODE_2XWIDER_MODE (innermode)
 	      && TREE_CODE (treeop0) != INTEGER_CST)
 	    {
-	      if (optab_handler (this_optab, mode) != CODE_FOR_nothing)
+	      if (widening_optab_handler (this_optab, mode, innermode)
+		    != CODE_FOR_nothing)
 		{
 		  expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
 				   EXPAND_NORMAL);
@@ -7669,7 +7671,8 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 					       unsignedp, this_optab);
 		  return REDUCE_BIT_FIELD (temp);
 		}
-	      if (optab_handler (other_optab, mode) != CODE_FOR_nothing
+	      if (widening_optab_handler (other_optab, mode, innermode)
+		    != CODE_FOR_nothing
 		  && innermode == word_mode)
 		{
 		  rtx htem, hipart;
--- a/gcc/genopinit.c
+++ b/gcc/genopinit.c
@@ -46,10 +46,12 @@ along with GCC; see the file COPYING3.  If not see
    used.  $A and $B are replaced with the full name of the mode; $a and $b
    are replaced with the short form of the name, as above.
 
-   If $N is present in the pattern, it means the two modes must be consecutive
-   widths in the same mode class (e.g, QImode and HImode).  $I means that
-   only full integer modes should be considered for the next mode, and $F
-   means that only float modes should be considered.
+   If $N is present in the pattern, it means the two modes must be in
+   the same mode class, and $b must be greater than $a (e.g, QImode
+   and HImode).
+
+   $I means that only full integer modes should be considered for the
+   next mode, and $F means that only float modes should be considered.
    $P means that both full and partial integer modes should be considered.
    $Q means that only fixed-point modes should be considered.
 
@@ -99,17 +101,17 @@ static const char * const optabs[] =
   "set_optab_handler (smulv_optab, $A, CODE_FOR_$(mulv$I$a3$))",
   "set_optab_handler (umul_highpart_optab, $A, CODE_FOR_$(umul$a3_highpart$))",
   "set_optab_handler (smul_highpart_optab, $A, CODE_FOR_$(smul$a3_highpart$))",
-  "set_optab_handler (smul_widen_optab, $B, CODE_FOR_$(mul$a$b3$)$N)",
-  "set_optab_handler (umul_widen_optab, $B, CODE_FOR_$(umul$a$b3$)$N)",
-  "set_optab_handler (usmul_widen_optab, $B, CODE_FOR_$(usmul$a$b3$)$N)",
-  "set_optab_handler (smadd_widen_optab, $B, CODE_FOR_$(madd$a$b4$)$N)",
-  "set_optab_handler (umadd_widen_optab, $B, CODE_FOR_$(umadd$a$b4$)$N)",
-  "set_optab_handler (ssmadd_widen_optab, $B, CODE_FOR_$(ssmadd$a$b4$)$N)",
-  "set_optab_handler (usmadd_widen_optab, $B, CODE_FOR_$(usmadd$a$b4$)$N)",
-  "set_optab_handler (smsub_widen_optab, $B, CODE_FOR_$(msub$a$b4$)$N)",
-  "set_optab_handler (umsub_widen_optab, $B, CODE_FOR_$(umsub$a$b4$)$N)",
-  "set_optab_handler (ssmsub_widen_optab, $B, CODE_FOR_$(ssmsub$a$b4$)$N)",
-  "set_optab_handler (usmsub_widen_optab, $B, CODE_FOR_$(usmsub$a$b4$)$N)",
+  "set_widening_optab_handler (smul_widen_optab, $B, $A, CODE_FOR_$(mul$a$b3$)$N)",
+  "set_widening_optab_handler (umul_widen_optab, $B, $A, CODE_FOR_$(umul$a$b3$)$N)",
+  "set_widening_optab_handler (usmul_widen_optab, $B, $A, CODE_FOR_$(usmul$a$b3$)$N)",
+  "set_widening_optab_handler (smadd_widen_optab, $B, $A, CODE_FOR_$(madd$a$b4$)$N)",
+  "set_widening_optab_handler (umadd_widen_optab, $B, $A, CODE_FOR_$(umadd$a$b4$)$N)",
+  "set_widening_optab_handler (ssmadd_widen_optab, $B, $A, CODE_FOR_$(ssmadd$a$b4$)$N)",
+  "set_widening_optab_handler (usmadd_widen_optab, $B, $A, CODE_FOR_$(usmadd$a$b4$)$N)",
+  "set_widening_optab_handler (smsub_widen_optab, $B, $A, CODE_FOR_$(msub$a$b4$)$N)",
+  "set_widening_optab_handler (umsub_widen_optab, $B, $A, CODE_FOR_$(umsub$a$b4$)$N)",
+  "set_widening_optab_handler (ssmsub_widen_optab, $B, $A, CODE_FOR_$(ssmsub$a$b4$)$N)",
+  "set_widening_optab_handler (usmsub_widen_optab, $B, $A, CODE_FOR_$(usmsub$a$b4$)$N)",
   "set_optab_handler (sdiv_optab, $A, CODE_FOR_$(div$a3$))",
   "set_optab_handler (ssdiv_optab, $A, CODE_FOR_$(ssdiv$Q$a3$))",
   "set_optab_handler (sdivv_optab, $A, CODE_FOR_$(div$V$I$a3$))",
@@ -305,7 +307,7 @@ gen_insn (rtx insn)
     {
       int force_float = 0, force_int = 0, force_partial_int = 0;
       int force_fixed = 0;
-      int force_consec = 0;
+      int force_wider = 0;
       int matches = 1;
 
       for (pp = optabs[pindex]; pp[0] != '$' || pp[1] != '('; pp++)
@@ -323,7 +325,7 @@ gen_insn (rtx insn)
 	    switch (*++pp)
 	      {
 	      case 'N':
-		force_consec = 1;
+		force_wider = 1;
 		break;
 	      case 'I':
 		force_int = 1;
@@ -392,7 +394,10 @@ gen_insn (rtx insn)
 			    || mode_class[i] == MODE_VECTOR_FRACT
 			    || mode_class[i] == MODE_VECTOR_UFRACT
 			    || mode_class[i] == MODE_VECTOR_ACCUM
-			    || mode_class[i] == MODE_VECTOR_UACCUM))
+			    || mode_class[i] == MODE_VECTOR_UACCUM)
+			&& (! force_wider
+			    || *pp == 'a'
+			    || m1 < i))
 		      break;
 		  }
 
@@ -412,8 +417,7 @@ gen_insn (rtx insn)
 	}
 
       if (matches && pp[0] == '$' && pp[1] == ')'
-	  && *np == 0
-	  && (! force_consec || (int) GET_MODE_WIDER_MODE(m1) == m2))
+	  && *np == 0)
 	break;
     }
 
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -515,8 +515,8 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
   if (ops->code == WIDEN_MULT_PLUS_EXPR
       || ops->code == WIDEN_MULT_MINUS_EXPR)
-    icode = optab_handler (widen_pattern_optab,
-			   TYPE_MODE (TREE_TYPE (ops->op2)));
+    icode = widening_optab_handler (widen_pattern_optab,
+				    TYPE_MODE (TREE_TYPE (ops->op2)), tmode0);
   else
     icode = optab_handler (widen_pattern_optab, tmode0);
   gcc_assert (icode != CODE_FOR_nothing);
@@ -1242,7 +1242,8 @@ expand_binop_directly (enum machine_mode mode, optab binoptab,
 		       rtx target, int unsignedp, enum optab_methods methods,
 		       rtx last)
 {
-  enum insn_code icode = optab_handler (binoptab, mode);
+  enum machine_mode from_mode = GET_MODE (op0);
+  enum insn_code icode = widening_optab_handler (binoptab, mode, from_mode);
   enum machine_mode xmode0 = insn_data[(int) icode].operand[1].mode;
   enum machine_mode xmode1 = insn_data[(int) icode].operand[2].mode;
   enum machine_mode mode0, mode1, tmp_mode;
@@ -1389,7 +1390,8 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
   /* If we can do it with a three-operand insn, do so.  */
 
   if (methods != OPTAB_MUST_WIDEN
-      && optab_handler (binoptab, mode) != CODE_FOR_nothing)
+      && widening_optab_handler (binoptab, mode, GET_MODE (op0))
+	    != CODE_FOR_nothing)
     {
       temp = expand_binop_directly (mode, binoptab, op0, op1, target,
 				    unsignedp, methods, last);
@@ -1429,8 +1431,9 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 
   if (binoptab == smul_optab
       && GET_MODE_WIDER_MODE (mode) != VOIDmode
-      && (optab_handler ((unsignedp ? umul_widen_optab : smul_widen_optab),
-			 GET_MODE_WIDER_MODE (mode))
+      && (widening_optab_handler ((unsignedp ? umul_widen_optab
+					     : smul_widen_optab),
+				  GET_MODE_WIDER_MODE (mode), mode)
 	  != CODE_FOR_nothing))
     {
       temp = expand_binop (GET_MODE_WIDER_MODE (mode),
@@ -1458,12 +1461,14 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	 wider_mode != VOIDmode;
 	 wider_mode = GET_MODE_WIDER_MODE (wider_mode))
       {
-	if (optab_handler (binoptab, wider_mode) != CODE_FOR_nothing
+	if (widening_optab_handler (binoptab, wider_mode, mode)
+		!= CODE_FOR_nothing
 	    || (binoptab == smul_optab
 		&& GET_MODE_WIDER_MODE (wider_mode) != VOIDmode
-		&& (optab_handler ((unsignedp ? umul_widen_optab
-				    : smul_widen_optab),
-				   GET_MODE_WIDER_MODE (wider_mode))
+		&& (widening_optab_handler ((unsignedp ? umul_widen_optab
+						       : smul_widen_optab),
+					    GET_MODE_WIDER_MODE (wider_mode),
+					    mode)
 		    != CODE_FOR_nothing)))
 	  {
 	    rtx xop0 = op0, xop1 = op1;
@@ -1896,8 +1901,8 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
       && optab_handler (add_optab, word_mode) != CODE_FOR_nothing)
     {
       rtx product = NULL_RTX;
-
-      if (optab_handler (umul_widen_optab, mode) != CODE_FOR_nothing)
+      if (widening_optab_handler (umul_widen_optab, mode, word_mode)
+	    != CODE_FOR_nothing)
 	{
 	  product = expand_doubleword_mult (mode, op0, op1, target,
 					    true, methods);
@@ -1906,7 +1911,8 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	}
 
       if (product == NULL_RTX
-	  && optab_handler (smul_widen_optab, mode) != CODE_FOR_nothing)
+	  && widening_optab_handler (smul_widen_optab, mode, word_mode)
+		!= CODE_FOR_nothing)
 	{
 	  product = expand_doubleword_mult (mode, op0, op1, target,
 					    false, methods);
@@ -1997,7 +2003,8 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	   wider_mode != VOIDmode;
 	   wider_mode = GET_MODE_WIDER_MODE (wider_mode))
 	{
-	  if (optab_handler (binoptab, wider_mode) != CODE_FOR_nothing
+	  if (widening_optab_handler (binoptab, wider_mode, mode)
+		  != CODE_FOR_nothing
 	      || (methods == OPTAB_LIB
 		  && optab_libfunc (binoptab, wider_mode)))
 	    {
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -42,6 +42,11 @@ struct optab_handlers
   int insn_code;
 };
 
+struct widening_optab_handlers
+{
+  struct optab_handlers handlers[NUM_MACHINE_MODES][NUM_MACHINE_MODES];
+};
+
 struct optab_d
 {
   enum rtx_code code;
@@ -50,6 +55,7 @@ struct optab_d
   void (*libcall_gen)(struct optab_d *, const char *name, char suffix,
 		      enum machine_mode);
   struct optab_handlers handlers[NUM_MACHINE_MODES];
+  struct widening_optab_handlers *widening;
 };
 typedef struct optab_d * optab;
 
@@ -876,6 +882,23 @@ optab_handler (optab op, enum machine_mode mode)
 			   + (int) CODE_FOR_nothing);
 }
 
+/* Like optab_handler, but for widening operations that have a TO_MODE and
+  a FROM_MODE.  */
+
+static inline enum insn_code
+widening_optab_handler (optab op, enum machine_mode to_mode,
+			enum machine_mode from_mode)
+{
+  if (to_mode == from_mode)
+    return optab_handler (op, to_mode);
+
+  if (op->widening)
+    return (enum insn_code) (op->widening->handlers[(int) to_mode][(int) from_mode].insn_code
+			     + (int) CODE_FOR_nothing);
+
+  return CODE_FOR_nothing;
+}
+
 /* Record that insn CODE should be used to implement mode MODE of OP.  */
 
 static inline void
@@ -884,6 +907,26 @@ set_optab_handler (optab op, enum machine_mode mode, enum insn_code code)
   op->handlers[(int) mode].insn_code = (int) code - (int) CODE_FOR_nothing;
 }
 
+/* Like set_optab_handler, but for widening operations that have a TO_MODE
+   and a FROM_MODE.  */
+
+static inline void
+set_widening_optab_handler (optab op, enum machine_mode to_mode,
+			    enum machine_mode from_mode, enum insn_code code)
+{
+  if (to_mode == from_mode)
+    set_optab_handler (op, to_mode, code);
+  else
+    {
+      if (op->widening == NULL)
+	op->widening = (struct widening_optab_handlers *)
+	      xcalloc (1, sizeof (struct widening_optab_handlers));
+
+      op->widening->handlers[(int) to_mode][(int) from_mode].insn_code
+	  = (int) code - (int) CODE_FOR_nothing;
+    }
+}
+
 /* Return the insn used to perform conversion OP from mode FROM_MODE
    to mode TO_MODE; return CODE_FOR_nothing if the target does not have
    such an insn.  */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2047,6 +2047,8 @@ convert_mult_to_widen (gimple stmt)
 {
   tree lhs, rhs1, rhs2, type, type1, type2;
   enum insn_code handler;
+  enum machine_mode to_mode, from_mode;
+  optab op;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2056,12 +2058,17 @@ convert_mult_to_widen (gimple stmt)
   if (!is_widening_mult_p (stmt, &type1, &rhs1, &type2, &rhs2))
     return false;
 
+  to_mode = TYPE_MODE (type);
+  from_mode = TYPE_MODE (type1);
+
   if (TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2))
-    handler = optab_handler (umul_widen_optab, TYPE_MODE (type));
+    op = umul_widen_optab;
   else if (!TYPE_UNSIGNED (type1) && !TYPE_UNSIGNED (type2))
-    handler = optab_handler (smul_widen_optab, TYPE_MODE (type));
+    op = smul_widen_optab;
   else
-    handler = optab_handler (usmul_widen_optab, TYPE_MODE (type));
+    op = usmul_widen_optab;
+
+  handler = widening_optab_handler (op, to_mode, from_mode);
 
   if (handler == CODE_FOR_nothing)
     return false;
@@ -2090,6 +2097,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK;
   optab this_optab;
   enum tree_code wmult_code;
+  enum insn_code handler;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2163,7 +2171,8 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */
   this_optab = optab_for_tree_code (wmult_code, type1, optab_default);
-  if (optab_handler (this_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
+  if (widening_optab_handler (this_optab, TYPE_MODE (type), TYPE_MODE (type1))
+	== CODE_FOR_nothing)
     return false;
 
   /* ??? May need some type verification here?  */


* [PATCH (2/7)] Widening multiplies by more than one mode
  2011-06-23 14:38 [PATCH (0/7)] Improve use of Widening Multiplies Andrew Stubbs
  2011-06-23 14:39 ` [PATCH (1/7)] New optab framework for widening multiplies Andrew Stubbs
@ 2011-06-23 14:41 ` Andrew Stubbs
  2011-07-12 10:15   ` Andrew Stubbs
  2011-06-23 14:42 ` [PATCH (3/7)] Widening multiply-and-accumulate pattern matching Andrew Stubbs
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-06-23 14:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

[-- Attachment #1: Type: text/plain, Size: 803 bytes --]

This patch has two effects:

1. It permits the use of widening multiply instructions that widen by 
more than one mode. E.g. HImode -> DImode.

2. It enables the use of widening multiply instructions for (extended) 
inputs of narrower mode than the instruction takes. E.g. QImode -> 
DImode where only HI->DI or SI->DI is available.
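
The second effect amounts to a search over progressively wider input modes. The toy model below sketches that search for the case where a non-widening match is not permitted. All toy_* names and the table contents are hypothetical; the real function is find_widening_optab_handler in the attached patch, which steps through modes with GET_MODE_WIDER_MODE rather than by incrementing an enum.

```c
#include <stddef.h>

/* Toy stand-ins for machine modes and insn codes (hypothetical).  */
enum toy_mode { TOY_QI, TOY_HI, TOY_SI, TOY_DI, TOY_NUM_MODES };
enum { TOY_CODE_FOR_NOTHING = 0 };

/* Hypothetical target table indexed [to_mode][from_mode].  This toy
   target has only an SI->DI widening multiply, with insn code 101.  */
static const int toy_widen_mult[TOY_NUM_MODES][TOY_NUM_MODES] = {
  [TOY_DI] = { [TOY_SI] = 101 },
};

/* Starting at FROM, try each wider input mode below TO and return the
   first handler found.  If FOUND_FROM is non-null, store the input mode
   the chosen instruction actually takes.  */
int
toy_find_widening_handler (enum toy_mode to, enum toy_mode from,
			   enum toy_mode *found_from)
{
  for (int m = from; m < to; m++)
    if (toy_widen_mult[to][m] != TOY_CODE_FOR_NOTHING)
      {
	if (found_from != NULL)
	  *found_from = (enum toy_mode) m;
	return toy_widen_mult[to][m];
      }

  return TOY_CODE_FOR_NOTHING;
}
```

With this table, a QI->DI request finds the SI->DI entry, so the caller would extend the QImode inputs to SImode before using it.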

Hopefully, most of the patch is self-explanatory, but here are a few notes:

The code introduces a temporary FIXME comment; this will be removed 
later in the patch series. In fact, this is not a new restriction; 
previously "type1" and "type2" were implicitly identical because they 
were required to be one mode smaller than "type".

I regard the ARM portion of this patch as obvious, so I don't think I 
need an ARM maintainer to read this.

Is the patch OK?

Andrew


[-- Attachment #2: widening-multiplies-2.patch --]
[-- Type: text/x-patch, Size: 10879 bytes --]

2011-06-23  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* config/arm/arm.md (maddhidi4): Remove '*' from name.
	* expr.c (expand_expr_real_2): Use find_widening_optab_handler.
	* optabs.c (find_widening_optab_handler): New function.
	(expand_widen_pattern_expr): Use find_widening_optab_handler.
	(expand_binop_directly): Likewise.
	(expand_binop): Likewise.
	* optabs.h (find_widening_optab_handler): New prototype.
	* tree-cfg.c (verify_gimple_assign_binary): Adjust WIDEN_MULT_EXPR
	type precision rules.
	(verify_gimple_assign_ternary): Likewise for WIDEN_MULT_PLUS_EXPR.
	* tree-ssa-math-opts.c (is_widening_mult_rhs_p): Allow widening by
	more than one mode.
	Explicitly disallow mis-matched input types.
	(convert_mult_to_widen): Use find_widening_optab_handler.
	(convert_plusminus_to_widen): Likewise.

--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -1857,7 +1857,7 @@
    (set_attr "predicable" "yes")]
 )
 
-(define_insn "*maddhidi4"
+(define_insn "maddhidi4"
   [(set (match_operand:DI 0 "s_register_operand" "=r")
 	(plus:DI
 	  (mult:DI (sign_extend:DI
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7632,19 +7632,16 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 	{
 	  enum machine_mode innermode = TYPE_MODE (TREE_TYPE (treeop0));
 	  this_optab = usmul_widen_optab;
-	  if (mode == GET_MODE_2XWIDER_MODE (innermode))
+	  if (find_widening_optab_handler (this_optab, mode, innermode, 0)
+		!= CODE_FOR_nothing)
 	    {
-	      if (widening_optab_handler (this_optab, mode, innermode)
-		    != CODE_FOR_nothing)
-		{
-		  if (TYPE_UNSIGNED (TREE_TYPE (treeop0)))
-		    expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
-				     EXPAND_NORMAL);
-		  else
-		    expand_operands (treeop0, treeop1, NULL_RTX, &op1, &op0,
-				     EXPAND_NORMAL);
-		  goto binop3;
-		}
+	      if (TYPE_UNSIGNED (TREE_TYPE (treeop0)))
+		expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
+				 EXPAND_NORMAL);
+	      else
+		expand_operands (treeop0, treeop1, NULL_RTX, &op1, &op0,
+				 EXPAND_NORMAL);
+	      goto binop3;
 	    }
 	}
       /* Check for a multiplication with matching signedness.  */
@@ -7659,10 +7656,9 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 	  optab other_optab = zextend_p ? smul_widen_optab : umul_widen_optab;
 	  this_optab = zextend_p ? umul_widen_optab : smul_widen_optab;
 
-	  if (mode == GET_MODE_2XWIDER_MODE (innermode)
-	      && TREE_CODE (treeop0) != INTEGER_CST)
+	  if (TREE_CODE (treeop0) != INTEGER_CST)
 	    {
-	      if (widening_optab_handler (this_optab, mode, innermode)
+	      if (find_widening_optab_handler (this_optab, mode, innermode, 0)
 		    != CODE_FOR_nothing)
 		{
 		  expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
@@ -7671,7 +7667,7 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 					       unsignedp, this_optab);
 		  return REDUCE_BIT_FIELD (temp);
 		}
-	      if (widening_optab_handler (other_optab, mode, innermode)
+	      if (find_widening_optab_handler (other_optab, mode, innermode, 0)
 		    != CODE_FOR_nothing
 		  && innermode == word_mode)
 		{
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -225,6 +225,32 @@ add_equal_note (rtx insns, rtx target, enum rtx_code code, rtx op0, rtx op1)
   return 1;
 }
 \f
+/* Find a widening optab even if it doesn't widen as much as we want.
+   E.g. if from_mode is HImode, and to_mode is DImode, and there is no
+   direct HI->DI insn, then return SI->DI, if that exists.
+   If PERMIT_NON_WIDENING is non-zero then this can be used with
+   non-widening optabs also.  */
+
+enum insn_code
+find_widening_optab_handler (optab op, enum machine_mode to_mode,
+			     enum machine_mode from_mode,
+			     int permit_non_widening)
+{
+  for (; (permit_non_widening || from_mode != to_mode)
+	 && GET_MODE_SIZE (from_mode) <= GET_MODE_SIZE (to_mode)
+	 && from_mode != VOIDmode;
+       from_mode = GET_MODE_WIDER_MODE (from_mode))
+    {
+      enum insn_code handler = widening_optab_handler (op, to_mode,
+						       from_mode);
+
+      if (handler != CODE_FOR_nothing)
+	return handler;
+    }
+
+  return CODE_FOR_nothing;
+}
+\f
 /* Widen OP to MODE and return the rtx for the widened operand.  UNSIGNEDP
    says whether OP is signed or unsigned.  NO_EXTEND is nonzero if we need
    not actually do a sign-extend or zero-extend, but can leave the
@@ -515,8 +541,9 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
   if (ops->code == WIDEN_MULT_PLUS_EXPR
       || ops->code == WIDEN_MULT_MINUS_EXPR)
-    icode = widening_optab_handler (widen_pattern_optab,
-				    TYPE_MODE (TREE_TYPE (ops->op2)), tmode0);
+    icode = find_widening_optab_handler (widen_pattern_optab,
+					 TYPE_MODE (TREE_TYPE (ops->op2)),
+					 tmode0, 0);
   else
     icode = optab_handler (widen_pattern_optab, tmode0);
   gcc_assert (icode != CODE_FOR_nothing);
@@ -1243,7 +1270,8 @@ expand_binop_directly (enum machine_mode mode, optab binoptab,
 		       rtx last)
 {
   enum machine_mode from_mode = GET_MODE (op0);
-  enum insn_code icode = widening_optab_handler (binoptab, mode, from_mode);
+  enum insn_code icode = find_widening_optab_handler (binoptab, mode,
+						      from_mode, 1);
   enum machine_mode xmode0 = insn_data[(int) icode].operand[1].mode;
   enum machine_mode xmode1 = insn_data[(int) icode].operand[2].mode;
   enum machine_mode mode0, mode1, tmp_mode;
@@ -1390,7 +1418,7 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
   /* If we can do it with a three-operand insn, do so.  */
 
   if (methods != OPTAB_MUST_WIDEN
-      && widening_optab_handler (binoptab, mode, GET_MODE (op0))
+      && find_widening_optab_handler (binoptab, mode, GET_MODE (op0), 1)
 	    != CODE_FOR_nothing)
     {
       temp = expand_binop_directly (mode, binoptab, op0, op1, target,
@@ -1461,14 +1489,15 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	 wider_mode != VOIDmode;
 	 wider_mode = GET_MODE_WIDER_MODE (wider_mode))
       {
-	if (widening_optab_handler (binoptab, wider_mode, mode)
+	if (find_widening_optab_handler (binoptab, wider_mode, mode, 1)
 		!= CODE_FOR_nothing
 	    || (binoptab == smul_optab
 		&& GET_MODE_WIDER_MODE (wider_mode) != VOIDmode
-		&& (widening_optab_handler ((unsignedp ? umul_widen_optab
-						       : smul_widen_optab),
-					    GET_MODE_WIDER_MODE (wider_mode),
-					    mode)
+		&& (find_widening_optab_handler ((unsignedp
+						  ? umul_widen_optab
+						  : smul_widen_optab),
+						 GET_MODE_WIDER_MODE (wider_mode),
+						 mode, 0)
 		    != CODE_FOR_nothing)))
 	  {
 	    rtx xop0 = op0, xop1 = op1;
@@ -2003,7 +2032,7 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	   wider_mode != VOIDmode;
 	   wider_mode = GET_MODE_WIDER_MODE (wider_mode))
 	{
-	  if (widening_optab_handler (binoptab, wider_mode, mode)
+	  if (find_widening_optab_handler (binoptab, wider_mode, mode, 1)
 		  != CODE_FOR_nothing
 	      || (methods == OPTAB_LIB
 		  && optab_libfunc (binoptab, wider_mode)))
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -807,6 +807,10 @@ extern rtx expand_copysign (rtx, rtx, rtx);
 extern void emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 extern bool maybe_emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 
+/* Find a widening optab even if it doesn't widen as much as we want.  */
+extern enum insn_code find_widening_optab_handler (optab, enum machine_mode,
+						   enum machine_mode, int);
+
 /* An extra flag to control optab_for_tree_code's behavior.  This is needed to
    distinguish between machines with a vector shift that takes a scalar for the
    shift amount vs. machines that take a vector for the shift amount.  */
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3577,7 +3577,7 @@ do_pointer_plus_expr_check:
     case WIDEN_MULT_EXPR:
       if (TREE_CODE (lhs_type) != INTEGER_TYPE)
 	return true;
-      return ((2 * TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (lhs_type))
+      return ((2 * TYPE_PRECISION (rhs1_type) > TYPE_PRECISION (lhs_type))
 	      || (TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type)));
 
     case WIDEN_SUM_EXPR:
@@ -3668,7 +3668,7 @@ verify_gimple_assign_ternary (gimple stmt)
 	   && !FIXED_POINT_TYPE_P (rhs1_type))
 	  || !useless_type_conversion_p (rhs1_type, rhs2_type)
 	  || !useless_type_conversion_p (lhs_type, rhs3_type)
-	  || 2 * TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (lhs_type)
+	  || 2 * TYPE_PRECISION (rhs1_type) > TYPE_PRECISION (lhs_type)
 	  || TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type))
 	{
 	  error ("type mismatch in widening multiply-accumulate expression");
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1950,8 +1950,8 @@ struct gimple_opt_pass pass_optimize_bswap =
 /* Return true if RHS is a suitable operand for a widening multiplication.
    There are two cases:
 
-     - RHS makes some value twice as wide.  Store that value in *NEW_RHS_OUT
-       if so, and store its type in *TYPE_OUT.
+     - RHS makes some value at least twice as wide.  Store that value
+       in *NEW_RHS_OUT if so, and store its type in *TYPE_OUT.
 
      - RHS is an integer constant.  Store that value in *NEW_RHS_OUT if so,
        but leave *TYPE_OUT untouched.  */
@@ -1979,7 +1979,7 @@ is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
       rhs1 = gimple_assign_rhs1 (stmt);
       type1 = TREE_TYPE (rhs1);
       if (TREE_CODE (type1) != TREE_CODE (type)
-	  || TYPE_PRECISION (type1) * 2 != TYPE_PRECISION (type))
+	  || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type))
 	return false;
 
       *new_rhs_out = rhs1;
@@ -2035,6 +2035,10 @@ is_widening_mult_p (gimple stmt,
       *type2_out = *type1_out;
     }
 
+  /* FIXME: remove this restriction.  */
+  if (TYPE_PRECISION (*type1_out) != TYPE_PRECISION (*type2_out))
+    return false;
+
   return true;
 }
 
@@ -2068,7 +2072,7 @@ convert_mult_to_widen (gimple stmt)
   else
     op = usmul_widen_optab;
 
-  handler = widening_optab_handler (op, to_mode, from_mode);
+  handler = find_widening_optab_handler (op, to_mode, from_mode, 0);
 
   if (handler == CODE_FOR_nothing)
     return false;
@@ -2171,8 +2175,10 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */
   this_optab = optab_for_tree_code (wmult_code, type1, optab_default);
-  if (widening_optab_handler (this_optab, TYPE_MODE (type), TYPE_MODE (type1))
-	== CODE_FOR_nothing)
+  handler = find_widening_optab_handler (this_optab, TYPE_MODE (type),
+					 TYPE_MODE (type1), 0);
+
+  if (handler == CODE_FOR_nothing)
     return false;
 
   /* ??? May need some type verification here?  */

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-06-23 14:38 [PATCH (0/7)] Improve use of Widening Multiplies Andrew Stubbs
  2011-06-23 14:39 ` [PATCH (1/7)] New optab framework for widening multiplies Andrew Stubbs
  2011-06-23 14:41 ` [PATCH (2/7)] Widening multiplies by more than one mode Andrew Stubbs
@ 2011-06-23 14:42 ` Andrew Stubbs
  2011-06-23 16:28   ` Richard Guenther
  2011-06-23 21:55   ` Janis Johnson
  2011-06-23 14:43 ` [PATCH (4/7)] Unsigned multiplies using wider signed multiplies Andrew Stubbs
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-06-23 14:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

[-- Attachment #1: Type: text/plain, Size: 288 bytes --]

There are many cases where the widening_mult pass does not recognise 
widening multiply-and-accumulate cases simply because there is a type 
conversion step between the multiply and add statements.

This patch should rectify that simply by looking beyond those conversions.
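
For illustration only, here is a standalone sketch of the kind of source that produces such a conversion step. It is not code from the patch; the function name mac_explicit is made up, and the operands are passed by value to keep it short. The multiply happens in int after integer promotion, and the result is then converted to long long before the add, so the add statement does not consume the multiply directly.

```c
/* Spell out the implicit steps in "a + b * c" when b and c are
   narrow and a is 64 bits wide.  */
long long
mac_explicit (long long a, unsigned char b, unsigned char c)
{
  /* Both operands are promoted to int, so the multiply is an int
     multiply.  */
  int product = (int) b * (int) c;

  /* This is the conversion that sits between the multiply and the
     add.  */
  long long widened = (long long) product;

  return a + widened;
}
```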

OK?

Andrew


[-- Attachment #2: widening-multiplies-3.patch --]
[-- Type: text/x-patch, Size: 1978 bytes --]

2011-06-23  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (convert_plusminus_to_widen): Look for
	multiply statement beyond NOP_EXPR statements.

	gcc/testsuite/
	* gcc.target/arm/umlal-1.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/umlal-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, char *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2114,26 +2114,39 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   else
     wmult_code = WIDEN_MULT_PLUS_EXPR;
 
-  rhs1 = gimple_assign_rhs1 (stmt);
-  rhs2 = gimple_assign_rhs2 (stmt);
-
-  if (TREE_CODE (rhs1) == SSA_NAME)
+  rhs1_stmt = stmt;
+  do
     {
-      rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
-      if (is_gimple_assign (rhs1_stmt))
-	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+      rhs1_code = ERROR_MARK;
+      rhs1 = gimple_assign_rhs1 (rhs1_stmt);
+
+      if (TREE_CODE (rhs1) == SSA_NAME)
+	{
+	  rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
+	  if (is_gimple_assign (rhs1_stmt))
+	    rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+	}
+      else
+	return false;
     }
-  else
-    return false;
+  while (rhs1_code == NOP_EXPR);
 
-  if (TREE_CODE (rhs2) == SSA_NAME)
+  rhs2_stmt = stmt;
+  do
     {
-      rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
-      if (is_gimple_assign (rhs2_stmt))
-	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
+      rhs2_code = ERROR_MARK;
+      rhs2 = gimple_assign_rhs2 (rhs2_stmt);
+
+      if (rhs2 && TREE_CODE (rhs2) == SSA_NAME)
+	{
+	  rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
+	  if (is_gimple_assign (rhs2_stmt))
+	    rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
+	}
+      else
+	return false;
     }
-  else
-    return false;
+  while (rhs2_code == NOP_EXPR);
 
   if (code == PLUS_EXPR && rhs1_code == MULT_EXPR)
     {

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
  2011-06-23 14:38 [PATCH (0/7)] Improve use of Widening Multiplies Andrew Stubbs
                   ` (2 preceding siblings ...)
  2011-06-23 14:42 ` [PATCH (3/7)] Widening multiply-and-accumulate pattern matching Andrew Stubbs
@ 2011-06-23 14:43 ` Andrew Stubbs
  2011-06-28 13:28   ` Andrew Stubbs
  2011-06-28 13:30   ` Paolo Bonzini
  2011-06-23 14:44 ` [PATCH (5/7)] Widening multiplies for mis-matched mode inputs Andrew Stubbs
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-06-23 14:43 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

[-- Attachment #1: Type: text/plain, Size: 713 bytes --]

If one or both of the inputs to a widening multiply are of unsigned type 
then the compiler will attempt to use usmul_widen_optab or 
umul_widen_optab, respectively.

That works fine, but only if the target supports those operations 
directly. Otherwise, it just bombs out and reverts to the normal 
inefficient non-widening multiply.

This patch attempts to catch these cases and use an alternative signed 
widening multiply instruction, if one of those is available.

I believe this should be legal as long as the top bit of both inputs is 
guaranteed to be zero. The code achieves this guarantee by 
zero-extending the inputs to a wider mode (which must still be narrower 
than the output mode).
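
As a rough illustration of why this should be safe, here is a standalone sketch. It is not code from the patch, and the function names are invented for the example. Zero-extending an unsigned 8-bit value into a signed 16-bit type leaves the top bit clear, so a signed 16 x 16 -> 32 multiply gives the same product as the unsigned 8 x 8 multiply.

```c
#include <stdint.h>

/* Reference: unsigned 8-bit inputs, product widened to 32 bits.  */
uint32_t
umul_reference (uint8_t a, uint8_t b)
{
  return (uint32_t) a * (uint32_t) b;
}

/* Same product via a signed widening multiply.  Each input is first
   zero-extended into a signed 16-bit value, whose top bit is then
   guaranteed to be zero.  */
uint32_t
umul_via_signed (uint8_t a, uint8_t b)
{
  int16_t wa = (int16_t) a;	/* 0..255 always fits.  */
  int16_t wb = (int16_t) b;
  int32_t product = (int32_t) wa * (int32_t) wb;
  return (uint32_t) product;
}
```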

OK?

Andrew


[-- Attachment #2: widening-multiplies-4.patch --]
[-- Type: text/x-patch, Size: 7324 bytes --]

2011-06-23  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* Makefile.in (tree-ssa-math-opts.o): Add langhooks.h dependency.
	* optabs.c (find_widening_optab_handler): Rename to ...
	(find_widening_optab_handler_and_mode): ... this, and add new
	argument 'found_mode'.
	* optabs.h (find_widening_optab_handler): Rename to ...
	(find_widening_optab_handler_and_mode): ... this.
	(find_widening_optab_handler): New macro.
	* tree-ssa-math-opts.c: Include langhooks.h
	(build_and_insert_cast): New function.
	(convert_mult_to_widen): Add new argument 'gsi'.
	Convert unsupported unsigned multiplies to signed.
	(convert_plusminus_to_widen): Likewise.
	(execute_optimize_widening_mul): Pass gsi to convert_mult_to_widen.

	gcc/testsuite/
	* gcc.target/arm/smlalbb-1.c: New file.

--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2672,7 +2672,8 @@ tree-ssa-loop-im.o : tree-ssa-loop-im.c $(TREE_FLOW_H) $(CONFIG_H) \
 tree-ssa-math-opts.o : tree-ssa-math-opts.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(TM_H) $(FLAGS_H) $(TREE_H) $(TREE_FLOW_H) $(TIMEVAR_H) \
    $(TREE_PASS_H) alloc-pool.h $(BASIC_BLOCK_H) $(TARGET_H) \
-   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h
+   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h \
+   langhooks.h
 tree-ssa-alias.o : tree-ssa-alias.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
    $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) $(TREE_INLINE_H) $(FLAGS_H) \
    $(FUNCTION_H) $(TIMEVAR_H) convert.h $(TM_H) coretypes.h langhooks.h \
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -232,9 +232,10 @@ add_equal_note (rtx insns, rtx target, enum rtx_code code, rtx op0, rtx op1)
    non-widening optabs also.  */
 
 enum insn_code
-find_widening_optab_handler (optab op, enum machine_mode to_mode,
-			     enum machine_mode from_mode,
-			     int permit_non_widening)
+find_widening_optab_handler_and_mode (optab op, enum machine_mode to_mode,
+				      enum machine_mode from_mode,
+				      int permit_non_widening,
+				      enum machine_mode *found_mode)
 {
   for (; (permit_non_widening || from_mode != to_mode)
 	 && GET_MODE_SIZE (from_mode) <= GET_MODE_SIZE (to_mode)
@@ -245,7 +246,11 @@ find_widening_optab_handler (optab op, enum machine_mode to_mode,
 						       from_mode);
 
       if (handler != CODE_FOR_nothing)
-	return handler;
+	{
+	  if (found_mode)
+	    *found_mode = from_mode;
+	  return handler;
+	}
     }
 
   return CODE_FOR_nothing;
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -808,8 +808,13 @@ extern void emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 extern bool maybe_emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 
 /* Find a widening optab even if it doesn't widen as much as we want.  */
-extern enum insn_code find_widening_optab_handler (optab, enum machine_mode,
-						   enum machine_mode, int);
+#define find_widening_optab_handler(A,B,C,D) \
+  find_widening_optab_handler_and_mode (A, B, C, D, NULL)
+extern enum insn_code find_widening_optab_handler_and_mode (optab,
+							    enum machine_mode,
+							    enum machine_mode,
+							    int,
+							    enum machine_mode *);
 
 /* An extra flag to control optab_for_tree_code's behavior.  This is needed to
    distinguish between machines with a vector shift that takes a scalar for the
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/smlalbb-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, unsigned char *b, signed char *c)
+{
+  return a + (long long)*b * (long long)*c;
+}
+
+/* { dg-final { scan-assembler "smlalbb" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -98,6 +98,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "basic-block.h"
 #include "target.h"
 #include "gimple-pretty-print.h"
+#include "langhooks.h"
 
 /* FIXME: RTL headers have to be included here for optabs.  */
 #include "rtl.h"		/* Because optabs.h wants enum rtx_code.  */
@@ -1086,6 +1087,21 @@ build_and_insert_ref (gimple_stmt_iterator *gsi, location_t loc, tree type,
   return result;
 }
 
+/* Build a gimple assignment to cast VAL to TYPE, and put the result in
+   TARGET.  Insert the statement prior to GSI's current position, and
+   return the from SSA name.  */
+
+static tree
+build_and_insert_cast (gimple_stmt_iterator *gsi, location_t loc,
+		       tree target, tree val, tree type)
+{
+  tree result = make_ssa_name (target, NULL);
+  gimple stmt = gimple_build_assign (result, fold_convert (type, val));
+  gimple_set_location (stmt, loc);
+  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+  return result;
+}
+
 /* ARG0 and ARG1 are the two arguments to a pow builtin call in GSI
    with location info LOC.  If possible, create an equivalent and
    less expensive sequence of statements prior to GSI, and return an
@@ -2047,7 +2063,7 @@ is_widening_mult_p (gimple stmt,
    value is true iff we converted the statement.  */
 
 static bool
-convert_mult_to_widen (gimple stmt)
+convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 {
   tree lhs, rhs1, rhs2, type, type1, type2;
   enum insn_code handler;
@@ -2075,7 +2091,31 @@ convert_mult_to_widen (gimple stmt)
   handler = find_widening_optab_handler (op, to_mode, from_mode, 0);
 
   if (handler == CODE_FOR_nothing)
-    return false;
+    {
+      if (op != smul_widen_optab)
+	{
+	  from_mode = GET_MODE_WIDER_MODE (from_mode);
+	  if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode))
+	    return false;
+
+	  op = smul_widen_optab;
+	  handler = find_widening_optab_handler_and_mode (op, to_mode,
+							  from_mode, 0,
+							  &from_mode);
+
+	  if (handler == CODE_FOR_nothing)
+	    return false;
+
+	  type1 = type2 = lang_hooks.types.type_for_mode (from_mode, 0);
+
+	  rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+					create_tmp_var (type1, NULL), rhs1, type1);
+	  rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+					create_tmp_var (type2, NULL), rhs2, type2);
+	}
+      else
+	return false;
+    }
 
   gimple_assign_set_rhs1 (stmt, fold_convert (type1, rhs1));
   gimple_assign_set_rhs2 (stmt, fold_convert (type2, rhs2));
@@ -2182,7 +2222,22 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
     return false;
 
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
-    return false;
+    {
+      enum machine_mode mode = TYPE_MODE (type1);
+      mode = GET_MODE_WIDER_MODE (mode);
+      if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (TYPE_MODE (type)))
+	{
+	  type1 = type2 = lang_hooks.types.type_for_mode (mode, 0);
+	  mult_rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+					     create_tmp_var (type1, NULL),
+					     mult_rhs1, type1);
+	  mult_rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+					     create_tmp_var (type2, NULL),
+					     mult_rhs2, type2);
+	}
+      else
+	return false;
+    }
 
   /* Verify that the machine can perform a widening multiply
      accumulate in this mode/signedness combination, otherwise
@@ -2410,7 +2465,7 @@ execute_optimize_widening_mul (void)
 	      switch (code)
 		{
 		case MULT_EXPR:
-		  if (!convert_mult_to_widen (stmt)
+		  if (!convert_mult_to_widen (stmt, &gsi)
 		      && convert_mult_to_fma (stmt,
 					      gimple_assign_rhs1 (stmt),
 					      gimple_assign_rhs2 (stmt)))

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH (5/7)] Widening multiplies for mis-matched mode inputs
  2011-06-23 14:38 [PATCH (0/7)] Improve use of Widening Multiplies Andrew Stubbs
                   ` (3 preceding siblings ...)
  2011-06-23 14:43 ` [PATCH (4/7)] Unsigned multiplies using wider signed multiplies Andrew Stubbs
@ 2011-06-23 14:44 ` Andrew Stubbs
  2011-06-28 15:44   ` Andrew Stubbs
  2011-06-23 14:51 ` [PATCH (6/7)] More widening multiply-and-accumulate pattern matching Andrew Stubbs
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-06-23 14:44 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

[-- Attachment #1: Type: text/plain, Size: 399 bytes --]

This patch removes the restriction that the inputs to a widening 
multiply must be of the same mode.

It does this by extending the smaller of the two inputs to match the 
larger; therefore, it remains the case that subsequent code (in the 
expand pass, for example) can rely on the type of rhs1 being the input 
type of the operation, and the gimple verification code is still valid.
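
A standalone sketch of the idea, with an invented function name and fixed-width types standing in for the modes (this is not code from the patch): the narrower input is extended to the width of the wider one before the widening multiply, so both multiply operands end up with the same type.

```c
#include <stdint.h>

/* 8-bit and 16-bit inputs accumulated into a 64-bit value.  */
uint64_t
mac_mixed_modes (uint64_t acc, uint8_t b, uint16_t c)
{
  /* Extend the smaller input to match the larger one.  */
  uint16_t wide_b = b;

  /* Both multiply operands are now 16 bits wide.  */
  return acc + (uint64_t) wide_b * (uint64_t) c;
}
```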

OK?

Andrew


[-- Attachment #2: widening-multiplies-5.patch --]
[-- Type: text/x-patch, Size: 4152 bytes --]

2011-06-23  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_p): Remove FIXME.
	Ensure the larger type is the first operand.
	(convert_mult_to_widen): Insert cast if type2 is smaller than type1.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/smlalbb-2.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/smlalbb-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+unsigned long long
+foo (unsigned long long a, unsigned char *b, unsigned short *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "smlalbb" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2051,9 +2051,17 @@ is_widening_mult_p (gimple stmt,
       *type2_out = *type1_out;
     }
 
-  /* FIXME: remove this restriction.  */
-  if (TYPE_PRECISION (*type1_out) != TYPE_PRECISION (*type2_out))
-    return false;
+  /* Ensure that the larger of the two operands comes first. */
+  if (TYPE_PRECISION (*type1_out) < TYPE_PRECISION (*type2_out))
+    {
+      tree tmp;
+      tmp = *type1_out;
+      *type1_out = *type2_out;
+      *type2_out = tmp;
+      tmp = *rhs1_out;
+      *rhs1_out = *rhs2_out;
+      *rhs2_out = tmp;
+    }
 
   return true;
 }
@@ -2069,6 +2077,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
   enum insn_code handler;
   enum machine_mode to_mode, from_mode;
   optab op;
+  int cast1 = false, cast2 = false;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2107,16 +2116,26 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 	    return false;
 
 	  type1 = type2 = lang_hooks.types.type_for_mode (from_mode, 0);
-
-	  rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
-					create_tmp_var (type1, NULL), rhs1, type1);
-	  rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
-					create_tmp_var (type2, NULL), rhs2, type2);
+	  cast1 = cast2 = true;
 	}
       else
 	return false;
     }
 
+  if (TYPE_MODE (type2) != from_mode)
+    {
+      type2 = lang_hooks.types.type_for_mode (from_mode,
+					      TYPE_UNSIGNED (type2));
+      cast2 = true;
+    }
+
+  if (cast1)
+    rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+				  create_tmp_var (type1, NULL), rhs1, type1);
+  if (cast2)
+    rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+				  create_tmp_var (type2, NULL), rhs2, type2);
+
   gimple_assign_set_rhs1 (stmt, fold_convert (type1, rhs1));
   gimple_assign_set_rhs2 (stmt, fold_convert (type2, rhs2));
   gimple_assign_set_rhs_code (stmt, WIDEN_MULT_EXPR);
@@ -2142,6 +2161,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   optab this_optab;
   enum tree_code wmult_code;
   enum insn_code handler;
+  int cast1 = false, cast2 = false;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2228,17 +2248,28 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (TYPE_MODE (type)))
 	{
 	  type1 = type2 = lang_hooks.types.type_for_mode (mode, 0);
-	  mult_rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
-					     create_tmp_var (type1, NULL),
-					     mult_rhs1, type1);
-	  mult_rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
-					     create_tmp_var (type2, NULL),
-					     mult_rhs2, type2);
+	  cast1 = cast2 = true;
 	}
       else
 	return false;
     }
 
+  if (TYPE_MODE (type2) != TYPE_MODE (type1))
+    {
+      type2 = lang_hooks.types.type_for_mode (TYPE_MODE (type1),
+					      TYPE_UNSIGNED (type2));
+      cast2 = true;
+    }
+
+  if (cast1)
+    mult_rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+				       create_tmp_var (type1, NULL),
+				       mult_rhs1, type1);
+  if (cast2)
+    mult_rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+				       create_tmp_var (type2, NULL),
+				       mult_rhs2, type2);
+
   /* Verify that the machine can perform a widening multiply
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH (6/7)] More widening multiply-and-accumulate pattern matching
  2011-06-23 14:38 [PATCH (0/7)] Improve use of Widening Multiplies Andrew Stubbs
                   ` (4 preceding siblings ...)
  2011-06-23 14:44 ` [PATCH (5/7)] Widening multiplies for mis-matched mode inputs Andrew Stubbs
@ 2011-06-23 14:51 ` Andrew Stubbs
  2011-06-28 15:49   ` Andrew Stubbs
  2011-06-23 14:54 ` [PATCH (7/7)] Mixed-sign multiplies using narrowest mode Andrew Stubbs
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-06-23 14:51 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

[-- Attachment #1: Type: text/plain, Size: 790 bytes --]

This patch fixes the case where widening multiply-and-accumulate 
operations were not recognised because the multiplication itself is not 
actually widening.

This can happen when you have "DI + SI * SI" - the multiplication will 
be done in SImode as a non-widening multiply, and it's only the final 
accumulate step that is widening.

This was not recognised for two reasons:

1. is_widening_mult_p inferred the output type from the multiply 
statement, which is not useful in this case.

2. The inputs to the multiply instruction may not have been converted at 
all (because they're not being widened), so the pattern match failed.

The patch fixes these issues by making the output type explicit, and by 
permitting unconverted inputs (the types are still checked, so this is 
safe).

OK?

Andrew


[-- Attachment #2: widening-multiplies-6.patch --]
[-- Type: text/x-patch, Size: 5025 bytes --]

2011-06-23  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_rhs_p): Add new argument
	'type'.
	Use 'type' from caller, not inferred from 'rhs'.
	Don't reject non-conversion statements. Do return lhs in this case.
	(is_widening_mult_p): Add new argument 'type'.
	Use 'type' from caller, not inferred from 'stmt'.
	Pass type to is_widening_mult_rhs_p.
	(convert_mult_to_widen): Pass type to is_widening_mult_p.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/smlal-1.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/smlal-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, int *b, int *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1963,7 +1963,8 @@ struct gimple_opt_pass pass_optimize_bswap =
  }
 };
 
-/* Return true if RHS is a suitable operand for a widening multiplication.
+/* Return true if RHS is a suitable operand for a widening multiplication,
+   assuming a target type of TYPE.
    There are two cases:
 
      - RHS makes some value at least twice as wide.  Store that value
@@ -1973,32 +1974,32 @@ struct gimple_opt_pass pass_optimize_bswap =
        but leave *TYPE_OUT untouched.  */
 
 static bool
-is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
+is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out,
+			tree *new_rhs_out)
 {
   gimple stmt;
-  tree type, type1, rhs1;
+  tree type1, rhs1;
   enum tree_code rhs_code;
 
   if (TREE_CODE (rhs) == SSA_NAME)
     {
-      type = TREE_TYPE (rhs);
       stmt = SSA_NAME_DEF_STMT (rhs);
       if (!is_gimple_assign (stmt))
 	return false;
 
-      rhs_code = gimple_assign_rhs_code (stmt);
-      if (TREE_CODE (type) == INTEGER_TYPE
-	  ? !CONVERT_EXPR_CODE_P (rhs_code)
-	  : rhs_code != FIXED_CONVERT_EXPR)
-	return false;
-
       rhs1 = gimple_assign_rhs1 (stmt);
       type1 = TREE_TYPE (rhs1);
       if (TREE_CODE (type1) != TREE_CODE (type)
 	  || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type))
 	return false;
 
-      *new_rhs_out = rhs1;
+      rhs_code = gimple_assign_rhs_code (stmt);
+      if (TREE_CODE (type) == INTEGER_TYPE
+	  ? !CONVERT_EXPR_CODE_P (rhs_code)
+	  : rhs_code != FIXED_CONVERT_EXPR)
+	*new_rhs_out = gimple_assign_lhs (stmt);
+      else
+	*new_rhs_out = rhs1;
       *type_out = type1;
       return true;
     }
@@ -2013,28 +2014,27 @@ is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
   return false;
 }
 
-/* Return true if STMT performs a widening multiplication.  If so,
-   store the unwidened types of the operands in *TYPE1_OUT and *TYPE2_OUT
-   respectively.  Also fill *RHS1_OUT and *RHS2_OUT such that converting
-   those operands to types *TYPE1_OUT and *TYPE2_OUT would give the
-   operands of the multiplication.  */
+/* Return true if STMT performs a widening multiplication, assuming the
+   output type is TYPE.  If so, store the unwidened types of the operands
+   in *TYPE1_OUT and *TYPE2_OUT respectively.  Also fill *RHS1_OUT and
+   *RHS2_OUT such that converting those operands to types *TYPE1_OUT
+   and *TYPE2_OUT would give the operands of the multiplication.  */
 
 static bool
-is_widening_mult_p (gimple stmt,
+is_widening_mult_p (tree type, gimple stmt,
 		    tree *type1_out, tree *rhs1_out,
 		    tree *type2_out, tree *rhs2_out)
 {
-  tree type;
-
-  type = TREE_TYPE (gimple_assign_lhs (stmt));
   if (TREE_CODE (type) != INTEGER_TYPE
       && TREE_CODE (type) != FIXED_POINT_TYPE)
     return false;
 
-  if (!is_widening_mult_rhs_p (gimple_assign_rhs1 (stmt), type1_out, rhs1_out))
+  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs1 (stmt), type1_out,
+			       rhs1_out))
     return false;
 
-  if (!is_widening_mult_rhs_p (gimple_assign_rhs2 (stmt), type2_out, rhs2_out))
+  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs2 (stmt), type2_out,
+			       rhs2_out))
     return false;
 
   if (*type1_out == NULL)
@@ -2084,7 +2084,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
   if (TREE_CODE (type) != INTEGER_TYPE)
     return false;
 
-  if (!is_widening_mult_p (stmt, &type1, &rhs1, &type2, &rhs2))
+  if (!is_widening_mult_p (type, stmt, &type1, &rhs1, &type2, &rhs2))
     return false;
 
   to_mode = TYPE_MODE (type);
@@ -2210,14 +2210,14 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 
   if (code == PLUS_EXPR && rhs1_code == MULT_EXPR)
     {
-      if (!is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
+      if (!is_widening_mult_p (type, rhs1_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs2;
     }
   else if (rhs2_code == MULT_EXPR)
     {
-      if (!is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
+      if (!is_widening_mult_p (type, rhs2_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs1;

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH (7/7)] Mixed-sign multiplies using narrowest mode
  2011-06-23 14:38 [PATCH (0/7)] Improve use of Widening Multiplies Andrew Stubbs
                   ` (5 preceding siblings ...)
  2011-06-23 14:51 ` [PATCH (6/7)] More widening multiply-and-accumulate pattern matching Andrew Stubbs
@ 2011-06-23 14:54 ` Andrew Stubbs
  2011-06-28 17:02   ` Andrew Stubbs
  2011-06-25 16:14 ` [PATCH (0/7)] Improve use of Widening Multiplies Bernd Schmidt
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-06-23 14:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

[-- Attachment #1: Type: text/plain, Size: 733 bytes --]

Patch 4 introduced support for using signed multiplies to code unsigned 
multiplies in a narrower mode. Patch 5 then introduced support for 
mis-matched input modes.

These two combined mean that there is a case where only the smaller of 
the two inputs is unsigned, and yet it still tries to use a mode wider 
than the larger, signed input. This is bad because it means unnecessary 
extends and because the wider operation might not exist.

This patch catches that case, and ensures that the smaller, unsigned 
input is zero-extended to match the mode of the larger, signed input.

Of course, both inputs may still have to be extended to fit the nearest 
available instruction, so it doesn't make a difference every time.
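
To illustrate with a standalone sketch (not code from the patch; the function name is invented and fixed-width types stand in for the modes): with a signed 16-bit input and an unsigned 8-bit input, zero-extending the 8-bit value to 16 bits is enough, because every value 0..255 is representable as a signed 16-bit number. No mode wider than the signed input is needed.

```c
#include <stdint.h>

/* Signed 16-bit times unsigned 8-bit, done as a signed
   16 x 16 -> 32 multiply.  */
int32_t
mul_mixed_sign (int16_t s, uint8_t u)
{
  /* Zero-extend the smaller, unsigned input to the mode of the
     larger, signed one.  */
  int16_t wu = (int16_t) u;

  return (int32_t) s * (int32_t) wu;
}
```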

OK?

Andrew


[-- Attachment #2: widening-multiplies-7.patch --]
[-- Type: text/x-patch, Size: 2437 bytes --]

2011-06-23  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (convert_mult_to_widen): Better handle
	unsigned inputs of different modes.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/smlalbb-3.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/smlalbb-3.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, short *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "smlalbb" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2103,9 +2103,17 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
     {
       if (op != smul_widen_optab)
 	{
-	  from_mode = GET_MODE_WIDER_MODE (from_mode);
-	  if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode))
-	    return false;
+	  /* We can use a signed multiply with unsigned types as long as
+	     there is a wider mode to use, or it is the smaller of the two
+	     types that is unsigned.  Note that type1 >= type2, always.  */
+	  if (TYPE_UNSIGNED (type1)
+	      || (TYPE_UNSIGNED (type2)
+		  && TYPE_MODE (type2) == from_mode))
+	    {
+	      from_mode = GET_MODE_WIDER_MODE (from_mode);
+	      if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode))
+		return false;
+	    }
 
 	  op = smul_widen_optab;
 	  handler = find_widening_optab_handler_and_mode (op, to_mode,
@@ -2244,14 +2252,21 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
     {
       enum machine_mode mode = TYPE_MODE (type1);
-      mode = GET_MODE_WIDER_MODE (mode);
-      if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (TYPE_MODE (type)))
+
+      /* We can use a signed multiply with unsigned types as long as
+	 there is a wider mode to use, or it is the smaller of the two
+	 types that is unsigned.  Note that type1 >= type2, always.  */
+      if (TYPE_UNSIGNED (type1)
+	  || (TYPE_UNSIGNED (type2)
+	      && TYPE_MODE (type2) == mode))
 	{
-	  type1 = type2 = lang_hooks.types.type_for_mode (mode, 0);
-	  cast1 = cast2 = true;
+	  mode = GET_MODE_WIDER_MODE (mode);
+	  if (GET_MODE_SIZE (mode) >= GET_MODE_SIZE (TYPE_MODE (type)))
+	    return false;
 	}
-      else
-	return false;
+
+      type1 = type2 = lang_hooks.types.type_for_mode (mode, 0);
+      cast1 = cast2 = true;
     }
 
   if (TYPE_MODE (type2) != TYPE_MODE (type1))

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-06-23 14:42 ` [PATCH (3/7)] Widening multiply-and-accumulate pattern matching Andrew Stubbs
@ 2011-06-23 16:28   ` Richard Guenther
  2011-06-24  8:14     ` Andrew Stubbs
  2011-06-23 21:55   ` Janis Johnson
  1 sibling, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-06-23 16:28 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On Thu, Jun 23, 2011 at 4:40 PM, Andrew Stubbs <andrew.stubbs@linaro.org> wrote:
> There are many cases where the widening_mult pass does not recognise
> widening multiply-and-accumulate cases simply because there is a type
> conversion step between the multiply and add statements.
>
> This patch should rectify that simply by looking beyond those conversions.

That's surely wrong for (int)(short)int_var.  You have to constrain the
conversions you look through properly.
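
A standalone sketch of the concern, assuming a 16-bit short (int16_t is used to make that explicit; the function name is made up): the inner conversion discards the upper bits, so the pair of conversions is not a no-op and cannot simply be skipped. Converting an out-of-range value to a signed type is implementation-defined in C, so the sketch only relies on the result differing from the input.

```c
#include <stdint.h>

/* Equivalent of (int)(short)int_var with a 16-bit short.  */
int
narrow_then_widen (int v)
{
  /* The inner cast truncates to 16 bits; the outer cast only
     sign-extends what is left.  */
  return (int) (int16_t) v;
}
```

Values that fit in 16 bits come back unchanged, while something like 65541 does not.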

Richard.

> OK?
>
> Andrew
>
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-06-23 14:42 ` [PATCH (3/7)] Widening multiply-and-accumulate pattern matching Andrew Stubbs
  2011-06-23 16:28   ` Richard Guenther
@ 2011-06-23 21:55   ` Janis Johnson
  1 sibling, 0 replies; 107+ messages in thread
From: Janis Johnson @ 2011-06-23 21:55 UTC (permalink / raw)
  To: gcc-patches, Andrew Stubbs

On 06/23/2011 07:40 AM, Andrew Stubbs wrote:

+++ b/gcc/testsuite/gcc.target/arm/umlal-1.c
+/* { dg-final { scan-assembler "umlal" } } */

Don't use the name of the instruction as the test name or the scan
will always pass, because the file name shows up in assembly output.
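
A rough illustration of the problem, assuming the scan behaves like a
substring search over the assembly text (the real scan-assembler uses a
regular expression, and the assembly strings below are invented):

```c
#include <string.h>

/* Return nonzero if PATTERN occurs anywhere in ASM_TEXT.  A plain
   substring search, used only as a stand-in for scan-assembler.  */
int
scan_matches (const char *asm_text, const char *pattern)
{
  return strstr (asm_text, pattern) != NULL;
}

/* Hypothetical output for umlal-1.c containing no umlal instruction:
   the .file directive alone is enough to satisfy the scan.  */
const char asm_umlal_1[] =
  "\t.file\t\"umlal-1.c\"\n"
  "\tmul\tr0, r1, r2\n";

/* The same hypothetical output for a file named wmul-5.c.  */
const char asm_wmul_5[] =
  "\t.file\t\"wmul-5.c\"\n"
  "\tmul\tr0, r1, r2\n";
```

With the neutral file name the scan fails as it should when the
instruction is missing.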

See http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01823.html for a
proposed effective target that can be used in this test.

Janis

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-06-23 16:28   ` Richard Guenther
@ 2011-06-24  8:14     ` Andrew Stubbs
  2011-06-24  9:31       ` Richard Guenther
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-06-24  8:14 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

On 23/06/11 17:26, Richard Guenther wrote:
> On Thu, Jun 23, 2011 at 4:40 PM, Andrew Stubbs<andrew.stubbs@linaro.org>  wrote:
>> There are many cases where the widening_mult pass does not recognise
>> widening multiply-and-accumulate cases simply because there is a type
>> conversion step between the multiply and add statements.
>>
>> This patch should rectify that simply by looking beyond those conversions.
>
> That's surely wrong for (int)(short)int_var.  You have to constrain
> the conversions
> you look through properly.

To be clear, it only skips past NOP_EXPR. Is it not the case that what 
you're describing would need a CONVERT_EXPR?

Andrew

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-06-24  8:14     ` Andrew Stubbs
@ 2011-06-24  9:31       ` Richard Guenther
  2011-06-24 14:08         ` Stubbs, Andrew
  0 siblings, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-06-24  9:31 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On Fri, Jun 24, 2011 at 10:05 AM, Andrew Stubbs
<andrew.stubbs@linaro.org> wrote:
> On 23/06/11 17:26, Richard Guenther wrote:
>>
>> On Thu, Jun 23, 2011 at 4:40 PM, Andrew Stubbs<andrew.stubbs@linaro.org>
>>  wrote:
>>>
>>> There are many cases where the widening_mult pass does not recognise
>>> widening multiply-and-accumulate cases simply because there is a type
>>> conversion step between the multiply and add statements.
>>>
>>> This patch should rectify that simply by looking beyond those
>>> conversions.
>>
>> That's surely wrong for (int)(short)int_var.  You have to constrain
>> the conversions
>> you look through properly.
>
> To be clear, it only skips past NOP_EXPR. Is it not the case that what
> you're describing would need a CONVERT_EXPR?

NOP_EXPR is the same as CONVERT_EXPR.

Richard.

> Andrew
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-06-24  9:31       ` Richard Guenther
@ 2011-06-24 14:08         ` Stubbs, Andrew
  2011-06-24 16:13           ` Richard Guenther
  0 siblings, 1 reply; 107+ messages in thread
From: Stubbs, Andrew @ 2011-06-24 14:08 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Andrew Stubbs, gcc-patches, patches

On 24/06/11 09:28, Richard Guenther wrote:
>> >  To be clear, it only skips past NOP_EXPR. Is it not the case that what
>> >  you're describing would need a CONVERT_EXPR?
> NOP_EXPR is the same as CONVERT_EXPR.

Are you sure?

I thought this was safe because the internals manual says:

   NOP_EXPR
   These nodes are used to represent conversions that do not require any
   code-generation ....

   CONVERT_EXPR
   These nodes are similar to NOP_EXPRs, but are used in those
   situations where code may need to be generated ....

So, I tried this example:

int
foo (int a, short b, short c)
{
   int bc = b * c;
   return a + (short)bc;
}

Both before and after my patch, GCC gives:

         mul     r2, r1, r2
         sxtah   r0, r0, r2

(Here SXTAH sign-extends the third operand from HImode to SImode and 
adds it to the second operand.)

The dump after the widening_mult pass is:

foo (int a, short int b, short int c)
{
   int bc;
   int D.2018;
   short int D.2017;
   int D.2016;
   int D.2015;
   int D.2014;

<bb 2>:
   D.2014_2 = (int) b_1(D);
   D.2015_4 = (int) c_3(D);
   bc_5 = b_1(D) w* c_3(D);
   D.2017_6 = (short int) bc_5;
   D.2018_7 = (int) D.2017_6;
   D.2016_9 = D.2018_7 + a_8(D);
   return D.2016_9;

}

Where you can clearly see that the addition has not been recognised as a 
multiply-and-accumulate.

When I step through convert_plusminus_to_widen, I can see that the 
reason it has not matched is because "D.2017_6 = (short int) bc_5" is 
encoded with a CONVERT_EXPR, just as the manual said it would be.

So, according to the manual, and my (admittedly limited) experiments, 
skipping over NOP_EXPR does appear to be safe.

But you say that it isn't safe. So now I'm confused. :(

I can certainly add checks to make sure that the skipped operations 
actually don't make any important changes to the value, but do I need to?

Andrew

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-06-24 14:08         ` Stubbs, Andrew
@ 2011-06-24 16:13           ` Richard Guenther
  2011-06-24 18:22             ` Stubbs, Andrew
  2011-06-28 11:32             ` Andrew Stubbs
  0 siblings, 2 replies; 107+ messages in thread
From: Richard Guenther @ 2011-06-24 16:13 UTC (permalink / raw)
  To: Stubbs, Andrew; +Cc: Andrew Stubbs, gcc-patches, patches

On Fri, Jun 24, 2011 at 3:46 PM, Stubbs, Andrew
<Andrew_Stubbs@mentor.com> wrote:
> On 24/06/11 09:28, Richard Guenther wrote:
>>> >  To be clear, it only skips past NOP_EXPR. Is it not the case that what
>>> >  you're describing would need a CONVERT_EXPR?
>> NOP_EXPR is the same as CONVERT_EXPR.
>
> Are you sure?

Yes, definitely.  They are synonyms of each other (an unfinished merging
process), the usual check for them is via CONVERT_EXPR_P.

> I thought this was safe because the internals manual says:
>
>   NOP_EXPR
>   These nodes are used to represent conversions that do not require any
>   code-generation ....
>
>   CONVERT_EXPR
>   These nodes are similar to NOP_EXPRs, but are used in those
>   situations where code may need to be generated ....

Which is wrong (sorry).

> So, I tried this example:
>
> int
> foo (int a, short b, short c)
> {
>   int bc = b * c;
>   return a + (short)bc;
> }
>
> Both before and after my patch, GCC gives:
>
>         mul     r2, r1, r2
>         sxtah   r0, r0, r2
>
> (where, SXTAH means sign-extend the third operand from HImode to SImode
> and add to the second operand.)
>
> The dump after the widening_mult pass is:
>
> foo (int a, short int b, short int c)
> {
>   int bc;
>   int D.2018;
>   short int D.2017;
>   int D.2016;
>   int D.2015;
>   int D.2014;
>
> <bb 2>:
>   D.2014_2 = (int) b_1(D);
>   D.2015_4 = (int) c_3(D);
>   bc_5 = b_1(D) w* c_3(D);
>   D.2017_6 = (short int) bc_5;
>   D.2018_7 = (int) D.2017_6;
>   D.2016_9 = D.2018_7 + a_8(D);
>   return D.2016_9;
>
> }
>
> Where you can clearly see that the addition has not been recognised as a
> multiply-and-accumulate.
>
> When I step through convert_plusminus_to_widen, I can see that the
> reason it has not matched is because "D.2017_6 = (short int) bc_5" is
> encoded with a CONVERT_EXPR, just as the manual said it would be.

A NOP_EXPR in this place would be valid as well.  The merging hasn't
been completed and at least the C frontend still generates CONVERT_EXPRs
in some cases.

> So, according to the manual, and my (admittedly limited) experiments,
> skipping over NOP_EXPR does appear to be safe.
>
> But you say that it isn't safe. So now I'm confused. :(
>
> I can certainly add checks to make sure that the skipped operations
> actually don't make any important changes to the value, but do I need to?

Yes.

Thanks,
Richard.

> Andrew
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-06-24 16:13           ` Richard Guenther
@ 2011-06-24 18:22             ` Stubbs, Andrew
  2011-06-25  9:58               ` Richard Guenther
  2011-06-28 11:32             ` Andrew Stubbs
  1 sibling, 1 reply; 107+ messages in thread
From: Stubbs, Andrew @ 2011-06-24 18:22 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

On 24/06/11 16:47, Richard Guenther wrote:
>> >  I can certainly add checks to make sure that the skipped operations
>> >  actually don't make any important changes to the value, but do I need to?
> Yes.

Ok, I'll go away and do that then.

BTW, I see useless_type_conversion_p, but that's not quite what I want. 
Is there an equivalent existing function to determine whether a 
conversion changes the logical/arithmetic meaning of a type?

I mean, conversion to a wider mode is not "useless", but it is harmless, 
whereas conversion to a narrower mode may truncate the value.

Andrew

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-06-24 18:22             ` Stubbs, Andrew
@ 2011-06-25  9:58               ` Richard Guenther
  0 siblings, 0 replies; 107+ messages in thread
From: Richard Guenther @ 2011-06-25  9:58 UTC (permalink / raw)
  To: Stubbs, Andrew; +Cc: gcc-patches, patches

On Fri, Jun 24, 2011 at 6:58 PM, Stubbs, Andrew
<Andrew_Stubbs@mentor.com> wrote:
> On 24/06/11 16:47, Richard Guenther wrote:
>>> >  I can certainly add checks to make sure that the skipped operations
>>> >  actually don't make any important changes to the value, but do I need to?
>> Yes.
>
> Ok, I'll go away and do that then.
>
> BTW, I see useless_type_conversion_p, but that's not quite what I want.
> Is there an equivalent existing function to determine whether a
> conversion changes the logical/arithmetic meaning of a type?
>
> I mean, conversion to a wider mode is not "useless", but it is harmless,
> whereas conversion to a narrower mode may truncate the value.

Well, you have to decide that for the concrete situation based on
the signedness and precision of the types involved.  All such
conversions change the logical/arithmetic meaning of a type if
seen in the right context.

Richard.

> Andrew
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (0/7)] Improve use of Widening Multiplies
  2011-06-23 14:38 [PATCH (0/7)] Improve use of Widening Multiplies Andrew Stubbs
                   ` (6 preceding siblings ...)
  2011-06-23 14:54 ` [PATCH (7/7)] Mixed-sign multiplies using narrowest mode Andrew Stubbs
@ 2011-06-25 16:14 ` Bernd Schmidt
  2011-06-27  9:16   ` Andrew Stubbs
  2011-07-18 14:34 ` [PATCH (8/7)] Fix a bug in multiply-and-accumulate Andrew Stubbs
  2011-07-21 13:14 ` [PATCH (9/7)] Widening multiplies with constant inputs Andrew Stubbs
  9 siblings, 1 reply; 107+ messages in thread
From: Bernd Schmidt @ 2011-06-25 16:14 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches

On 06/23/11 16:34, Andrew Stubbs wrote:

> The patches provide a number of improvements:
> 
>  * Support for instructions that widen by more than one mode
>    (e.g. from HImode to DImode).
> 
>  * Use of widening multiplies even when the input mode is narrower than
>    the instruction uses. (e.g. Use HI->DI to do QI->DI).
> 
>  * Use of signed widening multiplies (of a larger mode) where unsigned
>    multiplies are not available.
> 
>  * Support for input operands with mis-matched signedness, with or
>    without usmul_widen_optab.
> 
>  * Support for input operands with mis-matched mode [1].
> 
>  * Improved pattern matching in the widening_mult pass.
>    * Recognition of true types, even if obscured by a cast.
>    * Insertion of extra gimple statements where the existing code was
>      incompatible with widening multiplies.
>    * Recognition of widening multiply-and-accumulate even where the
>      multiply expression was not widening.

That all sounds good, but missing from this list is something that
occurs on many CPUs - widening from the high part of a register. The
current machinery only recognizes lowxlow widening multiplication, but
hardware often exists for highxlow and highxhigh. For example, Blackfin
has "<su_optab>hisi_lh"/hl/hh instruction patterns; C6X also has a full
set; ARM has mulhisi3tb/bt/tt.

Do you think it will be possible to extend your new framework to handle
this case as well?


Bernd

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (0/7)] Improve use of Widening Multiplies
  2011-06-25 16:14 ` [PATCH (0/7)] Improve use of Widening Multiplies Bernd Schmidt
@ 2011-06-27  9:16   ` Andrew Stubbs
  0 siblings, 0 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-06-27  9:16 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches

On 25/06/11 15:12, Bernd Schmidt wrote:
> That all sounds good, but missing from this list is something that
> occurs on many CPUs - widening from the high part of a register. The
> current machinery only recognizes lowxlow widening multiplication, but
> hardware often exists for highxlow and highxhigh. For example, Blackfin
> has "<su_optab>hisi_lh"/hl/hh instruction patterns; C6X also has a full
> set; ARM has mulhisi3tb/bt/tt.
>
> Do you think it will be possible to extend your new framework to handle
> this case as well?

No, I can't think of a way to implement widening from the high part 
using anything like my framework.

I mean, what I've done is add a new dimension to the optab table, but 
not changed the meaning of that optab. The expand pass has to look at 
the input types to know what insn to use, but it doesn't need to look 
any further than that. If I added yet another dimension to cover expand 
from high part, then we could detect that in convert_mult_to_widen (and 
maybe clean it up), but the expander would still have to re-detect it 
later on.

I would think that the best way to implement that would still be to add 
a new optab entry, new tree code, etc., etc., and then fix up all the 
"if (optab == smul_widen_optab)" and such that would need to consider it.

In any case, on ARM at least, the combine pass already combines shift 
and widening-mult patterns quite reliably (I committed a patch for 
that not so long ago).
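
For reference, the source-level shapes in question look roughly like
the sketch below.  The function names are invented, and whether a given
compiler maps them onto the mulhisi3tb/bt/tt patterns is not something
this sketch claims; it only shows the shift-plus-multiply form that
combine would have to recognise.

```c
#include <stdint.h>

/* low x low: both operands are the bottom 16 bits, sign-extended.
   Narrowing to int16_t is implementation-defined for out-of-range
   values; GCC keeps the low 16 bits.  */
int32_t
mul_bb (int32_t a, int32_t b)
{
  return (int32_t) (int16_t) a * (int32_t) (int16_t) b;
}

/* high x low: the first operand is the top 16 bits of A, obtained
   with a shift.  */
int32_t
mul_tb (int32_t a, int32_t b)
{
  return (a >> 16) * (int32_t) (int16_t) b;
}

/* high x high: both operands come from the top halves.  */
int32_t
mul_tt (int32_t a, int32_t b)
{
  return (a >> 16) * (b >> 16);
}
```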

Andrew

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-06-24 16:13           ` Richard Guenther
  2011-06-24 18:22             ` Stubbs, Andrew
@ 2011-06-28 11:32             ` Andrew Stubbs
  2011-06-28 12:48               ` Richard Guenther
  1 sibling, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-06-28 11:32 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 371 bytes --]

On 24/06/11 16:47, Richard Guenther wrote:
>> I can certainly add checks to make sure that the skipped operations
>> >  actually don't make any important changes to the value, but do I need to?
> Yes.

OK, how about this patch?

I've added checks to make sure the value is not truncated at any point.

I've also changed the test cases to address Janis' comments.

Andrew

[-- Attachment #2: widening-multiplies-3.patch --]
[-- Type: text/x-patch, Size: 5383 bytes --]

2011-06-28  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* gimple.h (tree_ssa_harmless_type_conversion): New prototype.
	(tree_ssa_strip_harmless_type_conversions): New prototype.
	(harmless_type_conversion_p): New prototype.
	* tree-ssa-math-opts.c (convert_plusminus_to_widen): Look for
	multiply statement beyond no-op conversion statements.
	* tree-ssa.c (harmless_type_conversion_p): New function.
	(tree_ssa_harmless_type_conversion): New function.
	(tree_ssa_strip_harmless_type_conversions): New function.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: New file.
	* gcc.target/arm/no-wmla-1.c: New file.

--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1090,8 +1090,11 @@ extern bool validate_gimple_arglist (const_gimple, ...);
 
 /* In tree-ssa.c  */
 extern bool tree_ssa_useless_type_conversion (tree);
+extern bool tree_ssa_harmless_type_conversion (tree);
 extern tree tree_ssa_strip_useless_type_conversions (tree);
+extern tree tree_ssa_strip_harmless_type_conversions (tree);
 extern bool useless_type_conversion_p (tree, tree);
+extern bool harmless_type_conversion_p (tree, tree);
 extern bool types_compatible_p (tree, tree);
 
 /* Return the code for GIMPLE statement G.  */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+int
+foo (int a, short b, short c)
+{
+  int bc = b * c;
+  return a + (short)bc;
+}
+
+/* { dg-final { scan-assembler "mul" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, char *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2117,23 +2117,19 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   rhs1 = gimple_assign_rhs1 (stmt);
   rhs2 = gimple_assign_rhs2 (stmt);
 
-  if (TREE_CODE (rhs1) == SSA_NAME)
-    {
-      rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
-      if (is_gimple_assign (rhs1_stmt))
-	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
-    }
-  else
+  if (TREE_CODE (rhs1) != SSA_NAME
+      || TREE_CODE (rhs2) != SSA_NAME)
     return false;
 
-  if (TREE_CODE (rhs2) == SSA_NAME)
-    {
-      rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
-      if (is_gimple_assign (rhs2_stmt))
-	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
-    }
-  else
-    return false;
+  rhs1 = tree_ssa_strip_harmless_type_conversions (rhs1);
+  rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
+  if (is_gimple_assign (rhs1_stmt))
+    rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+
+  rhs2 = tree_ssa_strip_harmless_type_conversions (rhs2);
+  rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
+  if (is_gimple_assign (rhs2_stmt))
+    rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
 
   if (code == PLUS_EXPR && rhs1_code == MULT_EXPR)
     {
--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -1484,6 +1484,33 @@ useless_type_conversion_p (tree outer_type, tree inner_type)
   return false;
 }
 
+/* Return true if the conversion from INNER_TYPE to OUTER_TYPE will
+   not alter the arithmetic meaning of a type, otherwise return false.
+
+   For example, widening an integer type leaves the value unchanged,
+   but narrowing an integer type can cause truncation.
+
+   Note that switching between signed and unsigned modes doesn't change
+   the underlying representation, and so is harmless.
+
+   This function is not yet a complete definition of what is harmless
+   but should reject everything that is not.  */
+
+bool
+harmless_type_conversion_p (tree outer_type, tree inner_type)
+{
+  /* If it's useless, it's also harmless.  */
+  if (useless_type_conversion_p (outer_type, inner_type))
+    return true;
+
+  if (INTEGRAL_TYPE_P (inner_type)
+      && INTEGRAL_TYPE_P (outer_type)
+      && TYPE_PRECISION (inner_type) <= TYPE_PRECISION (outer_type))
+    return true;
+
+  return false;
+}
+
 /* Return true if a conversion from either type of TYPE1 and TYPE2
    to the other is not required.  Otherwise return false.  */
 
@@ -1515,6 +1542,29 @@ tree_ssa_useless_type_conversion (tree expr)
   return false;
 }
 
+/* Return true if EXPR is a harmless type conversion, otherwise return
+   false.  */
+
+bool
+tree_ssa_harmless_type_conversion (tree expr)
+{
+  gimple stmt;
+
+  if (TREE_CODE (expr) != SSA_NAME)
+    return false;
+
+  stmt = SSA_NAME_DEF_STMT (expr);
+
+  if (!is_gimple_assign (stmt))
+    return false;
+
+  if (!CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)))
+    return false;
+
+  return harmless_type_conversion_p (TREE_TYPE (gimple_assign_lhs (stmt)),
+				     TREE_TYPE (gimple_assign_rhs1 (stmt)));
+}
+
 /* Strip conversions from EXP according to
    tree_ssa_useless_type_conversion and return the resulting
    expression.  */
@@ -1527,6 +1577,18 @@ tree_ssa_strip_useless_type_conversions (tree exp)
   return exp;
 }
 
+/* Strip conversions from EXP according to
+   tree_ssa_harmless_type_conversion and return the resulting
+   expression.  */
+
+tree
+tree_ssa_strip_harmless_type_conversions (tree exp)
+{
+  while (tree_ssa_harmless_type_conversion (exp))
+    exp = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (exp));
+  return exp;
+}
+
 
 /* Internal helper for walk_use_def_chains.  VAR, FN and DATA are as
    described in walk_use_def_chains.

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-06-28 11:32             ` Andrew Stubbs
@ 2011-06-28 12:48               ` Richard Guenther
  2011-06-28 16:37                 ` Michael Matz
  0 siblings, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-06-28 12:48 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On Tue, Jun 28, 2011 at 12:47 PM, Andrew Stubbs <andrew.stubbs@gmail.com> wrote:
> On 24/06/11 16:47, Richard Guenther wrote:
>>>
>>> I can certainly add checks to make sure that the skipped operations
>>> >  actually don't make any important changes to the value, but do I need
>>> > to?
>>
>> Yes.
>
> OK, how about this patch?

I'd name the predicate value_preserving_conversion_p which I think
is what you mean.  harmless isn't really descriptive.

Note that you include non-value-preserving conversions, namely
int -> unsigned int.  Don't dispatch to useless_type_conversion_p,
it's easy to enumerate which conversions are value-preserving.
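
As a sketch of what such an enumeration could look like for integer
types, written as standalone C rather than against GCC's tree API: the
struct is hypothetical, and the function borrows the suggested name
only for illustration.

```c
#include <stdbool.h>

/* A stand-in for an integral type: only precision and signedness
   matter here.  Hypothetical, not a GCC data structure.  */
struct int_type
{
  unsigned precision;
  bool is_unsigned;
};

/* Return true if every value of INNER is representable in OUTER, so
   that converting INNER -> OUTER never changes the value.  */
bool
value_preserving_conversion_p (struct int_type outer, struct int_type inner)
{
  /* Same signedness: any widening (or no change) is fine.  */
  if (inner.is_unsigned == outer.is_unsigned)
    return outer.precision >= inner.precision;

  /* unsigned -> signed needs at least one extra bit for the sign.  */
  if (inner.is_unsigned)
    return outer.precision > inner.precision;

  /* signed -> unsigned loses negative values at any precision.  */
  return false;
}
```

Under this definition int -> unsigned int is rejected, as is any
narrowing, while unsigned short -> int is accepted.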

Don't try to match the tree_ssa_useless_* set of functions, instead
put the value_preserving_conversion_p predicate in tree.[ch] and
a suitable function using it in tree-ssa-math-opts.c.

Thanks,
Richard.

> I've added checks to make sure the value is not truncated at any point.
>
> I've also changed the test cases to address Janis' comments.
>
> Andrew
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
  2011-06-23 14:43 ` [PATCH (4/7)] Unsigned multiplies using wider signed multiplies Andrew Stubbs
@ 2011-06-28 13:28   ` Andrew Stubbs
  2011-06-28 14:49     ` Andrew Stubbs
  2011-06-28 13:30   ` Paolo Bonzini
  1 sibling, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-06-28 13:28 UTC (permalink / raw)
  Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 832 bytes --]

On 23/06/11 15:41, Andrew Stubbs wrote:
> If one or both of the inputs to a widening multiply are of unsigned type
> then the compiler will attempt to use usmul_widen_optab or
> umul_widen_optab, respectively.
>
> That works fine, but only if the target supports those operations
> directly. Otherwise, it just bombs out and reverts to the normal
> inefficient non-widening multiply.
>
> This patch attempts to catch these cases and use an alternative signed
> widening multiply instruction, if one of those is available.
>
> I believe this should be legal as long as the top bit of both inputs is
> guaranteed to be zero. The code achieves this guarantee by
> zero-extending the inputs to a wider mode (which must still be narrower
> than the output mode).
>
> OK?

This update fixes the testsuite issue Janis pointed out.

Andrew

[-- Attachment #2: widening-multiplies-4.patch --]
[-- Type: text/x-patch, Size: 7316 bytes --]

2011-06-28  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* Makefile.in (tree-ssa-math-opts.o): Add langhooks.h dependency.
	* optabs.c (find_widening_optab_handler): Rename to ...
	(find_widening_optab_handler_and_mode): ... this, and add new
	argument 'found_mode'.
	* optabs.h (find_widening_optab_handler): Rename to ...
	(find_widening_optab_handler_and_mode): ... this.
	(find_widening_optab_handler): New macro.
	* tree-ssa-math-opts.c: Include langhooks.h
	(build_and_insert_cast): New function.
	(convert_mult_to_widen): Add new argument 'gsi'.
	Convert unsupported unsigned multiplies to signed.
	(convert_plusminus_to_widen): Likewise.
	(execute_optimize_widening_mul): Pass gsi to convert_mult_to_widen.

	gcc/testsuite/
	* gcc.target/arm/wmul-6.c: New file.

--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2672,7 +2672,8 @@ tree-ssa-loop-im.o : tree-ssa-loop-im.c $(TREE_FLOW_H) $(CONFIG_H) \
 tree-ssa-math-opts.o : tree-ssa-math-opts.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(TM_H) $(FLAGS_H) $(TREE_H) $(TREE_FLOW_H) $(TIMEVAR_H) \
    $(TREE_PASS_H) alloc-pool.h $(BASIC_BLOCK_H) $(TARGET_H) \
-   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h
+   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h \
+   langhooks.h
 tree-ssa-alias.o : tree-ssa-alias.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
    $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) $(TREE_INLINE_H) $(FLAGS_H) \
    $(FUNCTION_H) $(TIMEVAR_H) convert.h $(TM_H) coretypes.h langhooks.h \
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -232,9 +232,10 @@ add_equal_note (rtx insns, rtx target, enum rtx_code code, rtx op0, rtx op1)
    non-widening optabs also.  */
 
 enum insn_code
-find_widening_optab_handler (optab op, enum machine_mode to_mode,
-			     enum machine_mode from_mode,
-			     int permit_non_widening)
+find_widening_optab_handler_and_mode (optab op, enum machine_mode to_mode,
+				      enum machine_mode from_mode,
+				      int permit_non_widening,
+				      enum machine_mode *found_mode)
 {
   for (; (permit_non_widening || from_mode != to_mode)
 	 && GET_MODE_SIZE (from_mode) <= GET_MODE_SIZE (to_mode)
@@ -245,7 +246,11 @@ find_widening_optab_handler (optab op, enum machine_mode to_mode,
 						       from_mode);
 
       if (handler != CODE_FOR_nothing)
-	return handler;
+	{
+	  if (found_mode)
+	    *found_mode = from_mode;
+	  return handler;
+	}
     }
 
   return CODE_FOR_nothing;
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -808,8 +808,13 @@ extern void emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 extern bool maybe_emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 
 /* Find a widening optab even if it doesn't widen as much as we want.  */
-extern enum insn_code find_widening_optab_handler (optab, enum machine_mode,
-						   enum machine_mode, int);
+#define find_widening_optab_handler(A,B,C,D) \
+  find_widening_optab_handler_and_mode (A, B, C, D, NULL)
+extern enum insn_code find_widening_optab_handler_and_mode (optab,
+							    enum machine_mode,
+							    enum machine_mode,
+							    int,
+							    enum machine_mode *);
 
 /* An extra flag to control optab_for_tree_code's behavior.  This is needed to
    distinguish between machines with a vector shift that takes a scalar for the
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-6.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, unsigned char *b, signed char *c)
+{
+  return a + (long long)*b * (long long)*c;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -98,6 +98,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "basic-block.h"
 #include "target.h"
 #include "gimple-pretty-print.h"
+#include "langhooks.h"
 
 /* FIXME: RTL headers have to be included here for optabs.  */
 #include "rtl.h"		/* Because optabs.h wants enum rtx_code.  */
@@ -1086,6 +1087,21 @@ build_and_insert_ref (gimple_stmt_iterator *gsi, location_t loc, tree type,
   return result;
 }
 
+/* Build a gimple assignment to cast VAL to TYPE, and put the result in
+   TARGET.  Insert the statement prior to GSI's current position, and
+   return the new SSA name.  */
+
+static tree
+build_and_insert_cast (gimple_stmt_iterator *gsi, location_t loc,
+		       tree target, tree val, tree type)
+{
+  tree result = make_ssa_name (target, NULL);
+  gimple stmt = gimple_build_assign (result, fold_convert (type, val));
+  gimple_set_location (stmt, loc);
+  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+  return result;
+}
+
 /* ARG0 and ARG1 are the two arguments to a pow builtin call in GSI
    with location info LOC.  If possible, create an equivalent and
    less expensive sequence of statements prior to GSI, and return an
@@ -2047,7 +2063,7 @@ is_widening_mult_p (gimple stmt,
    value is true iff we converted the statement.  */
 
 static bool
-convert_mult_to_widen (gimple stmt)
+convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 {
   tree lhs, rhs1, rhs2, type, type1, type2;
   enum insn_code handler;
@@ -2075,7 +2091,31 @@ convert_mult_to_widen (gimple stmt)
   handler = find_widening_optab_handler (op, to_mode, from_mode, 0);
 
   if (handler == CODE_FOR_nothing)
-    return false;
+    {
+      if (op != smul_widen_optab)
+	{
+	  from_mode = GET_MODE_WIDER_MODE (from_mode);
+	  if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode))
+	    return false;
+
+	  op = smul_widen_optab;
+	  handler = find_widening_optab_handler_and_mode (op, to_mode,
+							  from_mode, 0,
+							  &from_mode);
+
+	  if (handler == CODE_FOR_nothing)
+	    return false;
+
+	  type1 = type2 = lang_hooks.types.type_for_mode (from_mode, 0);
+
+	  rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+					create_tmp_var (type1, NULL), rhs1, type1);
+	  rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+					create_tmp_var (type2, NULL), rhs2, type2);
+	}
+      else
+	return false;
+    }
 
   gimple_assign_set_rhs1 (stmt, fold_convert (type1, rhs1));
   gimple_assign_set_rhs2 (stmt, fold_convert (type2, rhs2));
@@ -2165,7 +2205,22 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
     return false;
 
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
-    return false;
+    {
+      enum machine_mode mode = TYPE_MODE (type1);
+      mode = GET_MODE_WIDER_MODE (mode);
+      if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (TYPE_MODE (type)))
+	{
+	  type1 = type2 = lang_hooks.types.type_for_mode (mode, 0);
+	  mult_rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+					     create_tmp_var (type1, NULL),
+					     mult_rhs1, type1);
+	  mult_rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+					     create_tmp_var (type2, NULL),
+					     mult_rhs2, type2);
+	}
+      else
+	return false;
+    }
 
   /* Verify that the machine can perform a widening multiply
      accumulate in this mode/signedness combination, otherwise
@@ -2393,7 +2448,7 @@ execute_optimize_widening_mul (void)
 	      switch (code)
 		{
 		case MULT_EXPR:
-		  if (!convert_mult_to_widen (stmt)
+		  if (!convert_mult_to_widen (stmt, &gsi)
 		      && convert_mult_to_fma (stmt,
 					      gimple_assign_rhs1 (stmt),
 					      gimple_assign_rhs2 (stmt)))

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
  2011-06-23 14:43 ` [PATCH (4/7)] Unsigned multiplies using wider signed multiplies Andrew Stubbs
  2011-06-28 13:28   ` Andrew Stubbs
@ 2011-06-28 13:30   ` Paolo Bonzini
  1 sibling, 0 replies; 107+ messages in thread
From: Paolo Bonzini @ 2011-06-28 13:30 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On 06/23/2011 04:41 PM, Andrew Stubbs wrote:
>
> I believe this should be legal as long as the top bit of both inputs is
> guaranteed to be zero. The code achieves this guarantee by
> zero-extending the inputs to a wider mode (which must still be narrower
> than the output mode).

Yes, that's correct.
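
A minimal sketch of that argument in plain C, assuming 8-bit inputs and a signed 16x16->32 multiply standing in for the instruction. The function names are made up for illustration and are not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Reference: zero-extend both 8-bit inputs to 32 bits, then multiply.  */
uint32_t
umul_qi_reference (uint8_t a, uint8_t b)
{
  return (uint32_t) a * (uint32_t) b;
}

/* Emulate a signed HI->SI widening multiply on unsigned QI inputs.
   Zero-extending to 16 bits guarantees the top bit of each operand
   is zero, so the signed product equals the unsigned one.  */
uint32_t
umul_qi_via_signed_hi (uint8_t a, uint8_t b)
{
  int16_t wide_a = (int16_t) a;  /* 0..255 always fits */
  int16_t wide_b = (int16_t) b;
  int32_t product = (int32_t) wide_a * (int32_t) wide_b;
  return (uint32_t) product;
}

/* Compare the two exhaustively over every pair of 8-bit inputs.  */
void
check_umul_qi (void)
{
  for (unsigned a = 0; a <= UINT8_MAX; a++)
    for (unsigned b = 0; b <= UINT8_MAX; b++)
      assert (umul_qi_reference ((uint8_t) a, (uint8_t) b)
              == umul_qi_via_signed_hi ((uint8_t) a, (uint8_t) b));
}
```

Calling check_umul_qi () exercises all 65536 input pairs. It only models the arithmetic, not what the widening_mul pass actually emits.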

Paolo

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
  2011-06-28 13:28   ` Andrew Stubbs
@ 2011-06-28 14:49     ` Andrew Stubbs
  2011-07-04 14:27       ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-06-28 14:49 UTC (permalink / raw)
  Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 988 bytes --]

On 28/06/11 13:33, Andrew Stubbs wrote:
> On 23/06/11 15:41, Andrew Stubbs wrote:
>> If one or both of the inputs to a widening multiply are of unsigned type
>> then the compiler will attempt to use usmul_widen_optab or
>> umul_widen_optab, respectively.
>>
>> That works fine, but only if the target supports those operations
>> directly. Otherwise, it just bombs out and reverts to the normal
>> inefficient non-widening multiply.
>>
>> This patch attempts to catch these cases and use an alternative signed
>> widening multiply instruction, if one of those is available.
>>
>> I believe this should be legal as long as the top bit of both inputs is
>> guaranteed to be zero. The code achieves this guarantee by
>> zero-extending the inputs to a wider mode (which must still be narrower
>> than the output mode).
>>
>> OK?
>
> This update fixes the testsuite issue Janis pointed out.

And this one also fixes up the wmul-5.c testcase, whose expected result 
the patch changes.

Andrew

[-- Attachment #2: widening-multiplies-4.patch --]
[-- Type: text/x-patch, Size: 7632 bytes --]

2011-06-28  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* Makefile.in (tree-ssa-math-opts.o): Add langhooks.h dependency.
	* optabs.c (find_widening_optab_handler): Rename to ...
	(find_widening_optab_handler_and_mode): ... this, and add new
	argument 'found_mode'.
	* optabs.h (find_widening_optab_handler): Rename to ...
	(find_widening_optab_handler_and_mode): ... this.
	(find_widening_optab_handler): New macro.
	* tree-ssa-math-opts.c: Include langhooks.h
	(build_and_insert_cast): New function.
	(convert_mult_to_widen): Add new argument 'gsi'.
	Convert unsupported unsigned multiplies to signed.
	(convert_plusminus_to_widen): Likewise.
	(execute_optimize_widening_mul): Pass gsi to convert_mult_to_widen.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: Update expected result.
	* gcc.target/arm/wmul-6.c: New file.

--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2672,7 +2672,8 @@ tree-ssa-loop-im.o : tree-ssa-loop-im.c $(TREE_FLOW_H) $(CONFIG_H) \
 tree-ssa-math-opts.o : tree-ssa-math-opts.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(TM_H) $(FLAGS_H) $(TREE_H) $(TREE_FLOW_H) $(TIMEVAR_H) \
    $(TREE_PASS_H) alloc-pool.h $(BASIC_BLOCK_H) $(TARGET_H) \
-   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h
+   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h \
+   langhooks.h
 tree-ssa-alias.o : tree-ssa-alias.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
    $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) $(TREE_INLINE_H) $(FLAGS_H) \
    $(FUNCTION_H) $(TIMEVAR_H) convert.h $(TM_H) coretypes.h langhooks.h \
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -232,9 +232,10 @@ add_equal_note (rtx insns, rtx target, enum rtx_code code, rtx op0, rtx op1)
    non-widening optabs also.  */
 
 enum insn_code
-find_widening_optab_handler (optab op, enum machine_mode to_mode,
-			     enum machine_mode from_mode,
-			     int permit_non_widening)
+find_widening_optab_handler_and_mode (optab op, enum machine_mode to_mode,
+				      enum machine_mode from_mode,
+				      int permit_non_widening,
+				      enum machine_mode *found_mode)
 {
   for (; (permit_non_widening || from_mode != to_mode)
 	 && GET_MODE_SIZE (from_mode) <= GET_MODE_SIZE (to_mode)
@@ -245,7 +246,11 @@ find_widening_optab_handler (optab op, enum machine_mode to_mode,
 						       from_mode);
 
       if (handler != CODE_FOR_nothing)
-	return handler;
+	{
+	  if (found_mode)
+	    *found_mode = from_mode;
+	  return handler;
+	}
     }
 
   return CODE_FOR_nothing;
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -808,8 +808,13 @@ extern void emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 extern bool maybe_emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 
 /* Find a widening optab even if it doesn't widen as much as we want.  */
-extern enum insn_code find_widening_optab_handler (optab, enum machine_mode,
-						   enum machine_mode, int);
+#define find_widening_optab_handler(A,B,C,D) \
+  find_widening_optab_handler_and_mode (A, B, C, D, NULL)
+extern enum insn_code find_widening_optab_handler_and_mode (optab,
+							    enum machine_mode,
+							    enum machine_mode,
+							    int,
+							    enum machine_mode *);
 
 /* An extra flag to control optab_for_tree_code's behavior.  This is needed to
    distinguish between machines with a vector shift that takes a scalar for the
--- a/gcc/testsuite/gcc.target/arm/wmul-5.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -7,4 +7,4 @@ foo (long long a, char *b, char *c)
   return a + *b * *c;
 }
 
-/* { dg-final { scan-assembler "umlal" } } */
+/* { dg-final { scan-assembler "smlalbb" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-6.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, unsigned char *b, signed char *c)
+{
+  return a + (long long)*b * (long long)*c;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -98,6 +98,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "basic-block.h"
 #include "target.h"
 #include "gimple-pretty-print.h"
+#include "langhooks.h"
 
 /* FIXME: RTL headers have to be included here for optabs.  */
 #include "rtl.h"		/* Because optabs.h wants enum rtx_code.  */
@@ -1086,6 +1087,21 @@ build_and_insert_ref (gimple_stmt_iterator *gsi, location_t loc, tree type,
   return result;
 }
 
+/* Build a gimple assignment to cast VAL to TYPE, and put the result in
+   TARGET.  Insert the statement prior to GSI's current position, and
+   return the new SSA name.  */
+
+static tree
+build_and_insert_cast (gimple_stmt_iterator *gsi, location_t loc,
+		       tree target, tree val, tree type)
+{
+  tree result = make_ssa_name (target, NULL);
+  gimple stmt = gimple_build_assign (result, fold_convert (type, val));
+  gimple_set_location (stmt, loc);
+  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+  return result;
+}
+
 /* ARG0 and ARG1 are the two arguments to a pow builtin call in GSI
    with location info LOC.  If possible, create an equivalent and
    less expensive sequence of statements prior to GSI, and return an
@@ -2047,7 +2063,7 @@ is_widening_mult_p (gimple stmt,
    value is true iff we converted the statement.  */
 
 static bool
-convert_mult_to_widen (gimple stmt)
+convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 {
   tree lhs, rhs1, rhs2, type, type1, type2;
   enum insn_code handler;
@@ -2075,7 +2091,31 @@ convert_mult_to_widen (gimple stmt)
   handler = find_widening_optab_handler (op, to_mode, from_mode, 0);
 
   if (handler == CODE_FOR_nothing)
-    return false;
+    {
+      if (op != smul_widen_optab)
+	{
+	  from_mode = GET_MODE_WIDER_MODE (from_mode);
+	  if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode))
+	    return false;
+
+	  op = smul_widen_optab;
+	  handler = find_widening_optab_handler_and_mode (op, to_mode,
+							  from_mode, 0,
+							  &from_mode);
+
+	  if (handler == CODE_FOR_nothing)
+	    return false;
+
+	  type1 = type2 = lang_hooks.types.type_for_mode (from_mode, 0);
+
+	  rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+					create_tmp_var (type1, NULL), rhs1, type1);
+	  rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+					create_tmp_var (type2, NULL), rhs2, type2);
+	}
+      else
+	return false;
+    }
 
   gimple_assign_set_rhs1 (stmt, fold_convert (type1, rhs1));
   gimple_assign_set_rhs2 (stmt, fold_convert (type2, rhs2));
@@ -2165,7 +2205,22 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
     return false;
 
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
-    return false;
+    {
+      enum machine_mode mode = TYPE_MODE (type1);
+      mode = GET_MODE_WIDER_MODE (mode);
+      if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (TYPE_MODE (type)))
+	{
+	  type1 = type2 = lang_hooks.types.type_for_mode (mode, 0);
+	  mult_rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+					     create_tmp_var (type1, NULL),
+					     mult_rhs1, type1);
+	  mult_rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+					     create_tmp_var (type2, NULL),
+					     mult_rhs2, type2);
+	}
+      else
+	return false;
+    }
 
   /* Verify that the machine can perform a widening multiply
      accumulate in this mode/signedness combination, otherwise
@@ -2393,7 +2448,7 @@ execute_optimize_widening_mul (void)
 	      switch (code)
 		{
 		case MULT_EXPR:
-		  if (!convert_mult_to_widen (stmt)
+		  if (!convert_mult_to_widen (stmt, &gsi)
 		      && convert_mult_to_fma (stmt,
 					      gimple_assign_rhs1 (stmt),
 					      gimple_assign_rhs2 (stmt)))

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (5/7)] Widening multiplies for mis-matched mode inputs
  2011-06-23 14:44 ` [PATCH (5/7)] Widening multiplies for mis-matched mode inputs Andrew Stubbs
@ 2011-06-28 15:44   ` Andrew Stubbs
  2011-07-04 14:29     ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-06-28 15:44 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

[-- Attachment #1: Type: text/plain, Size: 507 bytes --]

On 23/06/11 15:41, Andrew Stubbs wrote:
> This patch removes the restriction that the inputs to a widening
> multiply must be of the same mode.
>
> It does this by extending the smaller of the two inputs to match the
> larger; therefore, it remains the case that subsequent code (in the
> expand pass, for example) can rely on the type of rhs1 being the input
> type of the operation, and the gimple verification code is still valid.
>
> OK?

This update fixes the testcase issue Janis highlighted.

Andrew

[-- Attachment #2: widening-multiplies-5.patch --]
[-- Type: text/x-patch, Size: 4144 bytes --]

2011-06-28  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_p): Remove FIXME.
	Ensure the larger type is the first operand.
	(convert_mult_to_widen): Insert cast if type2 is smaller than type1.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-7.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-7.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+unsigned long long
+foo (unsigned long long a, unsigned char *b, unsigned short *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2051,9 +2051,17 @@ is_widening_mult_p (gimple stmt,
       *type2_out = *type1_out;
     }
 
-  /* FIXME: remove this restriction.  */
-  if (TYPE_PRECISION (*type1_out) != TYPE_PRECISION (*type2_out))
-    return false;
+  /* Ensure that the larger of the two operands comes first. */
+  if (TYPE_PRECISION (*type1_out) < TYPE_PRECISION (*type2_out))
+    {
+      tree tmp;
+      tmp = *type1_out;
+      *type1_out = *type2_out;
+      *type2_out = tmp;
+      tmp = *rhs1_out;
+      *rhs1_out = *rhs2_out;
+      *rhs2_out = tmp;
+    }
 
   return true;
 }
@@ -2069,6 +2077,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
   enum insn_code handler;
   enum machine_mode to_mode, from_mode;
   optab op;
+  int cast1 = false, cast2 = false;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2107,16 +2116,26 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 	    return false;
 
 	  type1 = type2 = lang_hooks.types.type_for_mode (from_mode, 0);
-
-	  rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
-					create_tmp_var (type1, NULL), rhs1, type1);
-	  rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
-					create_tmp_var (type2, NULL), rhs2, type2);
+	  cast1 = cast2 = true;
 	}
       else
 	return false;
     }
 
+  if (TYPE_MODE (type2) != from_mode)
+    {
+      type2 = lang_hooks.types.type_for_mode (from_mode,
+					      TYPE_UNSIGNED (type2));
+      cast2 = true;
+    }
+
+  if (cast1)
+    rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+				  create_tmp_var (type1, NULL), rhs1, type1);
+  if (cast2)
+    rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+				  create_tmp_var (type2, NULL), rhs2, type2);
+
   gimple_assign_set_rhs1 (stmt, fold_convert (type1, rhs1));
   gimple_assign_set_rhs2 (stmt, fold_convert (type2, rhs2));
   gimple_assign_set_rhs_code (stmt, WIDEN_MULT_EXPR);
@@ -2142,6 +2161,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   optab this_optab;
   enum tree_code wmult_code;
   enum insn_code handler;
+  int cast1 = false, cast2 = false;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2211,17 +2231,28 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (TYPE_MODE (type)))
 	{
 	  type1 = type2 = lang_hooks.types.type_for_mode (mode, 0);
-	  mult_rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
-					     create_tmp_var (type1, NULL),
-					     mult_rhs1, type1);
-	  mult_rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
-					     create_tmp_var (type2, NULL),
-					     mult_rhs2, type2);
+	  cast1 = cast2 = true;
 	}
       else
 	return false;
     }
 
+  if (TYPE_MODE (type2) != TYPE_MODE (type1))
+    {
+      type2 = lang_hooks.types.type_for_mode (TYPE_MODE (type1),
+					      TYPE_UNSIGNED (type2));
+      cast2 = true;
+    }
+
+  if (cast1)
+    mult_rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+				       create_tmp_var (type1, NULL),
+				       mult_rhs1, type1);
+  if (cast2)
+    mult_rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+				       create_tmp_var (type2, NULL),
+				       mult_rhs2, type2);
+
   /* Verify that the machine can perform a widening multiply
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching
  2011-06-23 14:51 ` [PATCH (6/7)] More widening multiply-and-accumulate pattern matching Andrew Stubbs
@ 2011-06-28 15:49   ` Andrew Stubbs
  2011-07-04 14:32     ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-06-28 15:49 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

[-- Attachment #1: Type: text/plain, Size: 899 bytes --]

On 23/06/11 15:42, Andrew Stubbs wrote:
> This patch fixes the case where widening multiply-and-accumulate operations
> were not recognised because the multiplication itself is not actually
> widening.
>
> This can happen when you have "DI + SI * SI" - the multiplication will
> be done in SImode as a non-widening multiply, and it's only the final
> accumulate step that is widening.
>
> This was not recognised for two reasons:
>
> 1. is_widening_mult_p inferred the output type from the multiply
> statement, which is not useful in this case.
>
> 2. The inputs to the multiply instruction may not have been converted at
> all (because they're not being widened), so the pattern match failed.
>
> The patch fixes these issues by making the output type explicit, and by
> permitting unconverted inputs (the types are still checked, so this is
> safe).
>
> OK?

This update fixes Janis' testsuite issue.
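
For reference, a sketch of the "DI + SI * SI" shape in plain C, assuming a 32-bit int. The function names are invented here. The two forms agree only while the 32-bit product does not overflow. For signed int that overflow is undefined behaviour in C, so the check sticks to in-range inputs:

```c
#include <assert.h>
#include <stdint.h>

/* "DI + SI * SI" as in the test case: the multiply is done in 32 bits
   and only the final add is 64 bits wide.  */
int64_t
narrow_mul_wide_add (int64_t a, int32_t b, int32_t c)
{
  return a + b * c;
}

/* What a signed widening multiply-and-accumulate computes: the product
   is formed directly in 64 bits.  */
int64_t
widening_mac_s32 (int64_t a, int32_t b, int32_t c)
{
  return a + (int64_t) b * (int64_t) c;
}

/* Compare the two on inputs whose product fits in 32 bits.  */
void
check_si_mac (void)
{
  assert (narrow_mul_wide_add (1, 1000, -2000)
          == widening_mac_s32 (1, 1000, -2000));
  assert (narrow_mul_wide_add (-5, 46340, 46340)
          == widening_mac_s32 (-5, 46340, 46340));
  assert (narrow_mul_wide_add (0, -46340, 46340)
          == widening_mac_s32 (0, -46340, 46340));
}
```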

Andrew

[-- Attachment #2: widening-multiplies-6.patch --]
[-- Type: text/x-patch, Size: 5023 bytes --]

2011-06-28  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_rhs_p): Add new argument
	'type'.
	Use 'type' from caller, not inferred from 'rhs'.
	Don't reject non-conversion statements. Do return lhs in this case.
	(is_widening_mult_p): Add new argument 'type'.
	Use 'type' from caller, not inferred from 'stmt'.
	Pass type to is_widening_mult_rhs_p.
	(convert_mult_to_widen): Pass type to is_widening_mult_p.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-8.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-8.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, int *b, int *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1963,7 +1963,8 @@ struct gimple_opt_pass pass_optimize_bswap =
  }
 };
 
-/* Return true if RHS is a suitable operand for a widening multiplication.
+/* Return true if RHS is a suitable operand for a widening multiplication,
+   assuming a target type of TYPE.
    There are two cases:
 
      - RHS makes some value at least twice as wide.  Store that value
@@ -1973,32 +1974,32 @@ struct gimple_opt_pass pass_optimize_bswap =
        but leave *TYPE_OUT untouched.  */
 
 static bool
-is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
+is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out,
+			tree *new_rhs_out)
 {
   gimple stmt;
-  tree type, type1, rhs1;
+  tree type1, rhs1;
   enum tree_code rhs_code;
 
   if (TREE_CODE (rhs) == SSA_NAME)
     {
-      type = TREE_TYPE (rhs);
       stmt = SSA_NAME_DEF_STMT (rhs);
       if (!is_gimple_assign (stmt))
 	return false;
 
-      rhs_code = gimple_assign_rhs_code (stmt);
-      if (TREE_CODE (type) == INTEGER_TYPE
-	  ? !CONVERT_EXPR_CODE_P (rhs_code)
-	  : rhs_code != FIXED_CONVERT_EXPR)
-	return false;
-
       rhs1 = gimple_assign_rhs1 (stmt);
       type1 = TREE_TYPE (rhs1);
       if (TREE_CODE (type1) != TREE_CODE (type)
 	  || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type))
 	return false;
 
-      *new_rhs_out = rhs1;
+      rhs_code = gimple_assign_rhs_code (stmt);
+      if (TREE_CODE (type) == INTEGER_TYPE
+	  ? !CONVERT_EXPR_CODE_P (rhs_code)
+	  : rhs_code != FIXED_CONVERT_EXPR)
+	*new_rhs_out = gimple_assign_lhs (stmt);
+      else
+	*new_rhs_out = rhs1;
       *type_out = type1;
       return true;
     }
@@ -2013,28 +2014,27 @@ is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
   return false;
 }
 
-/* Return true if STMT performs a widening multiplication.  If so,
-   store the unwidened types of the operands in *TYPE1_OUT and *TYPE2_OUT
-   respectively.  Also fill *RHS1_OUT and *RHS2_OUT such that converting
-   those operands to types *TYPE1_OUT and *TYPE2_OUT would give the
-   operands of the multiplication.  */
+/* Return true if STMT performs a widening multiplication, assuming the
+   output type is TYPE.  If so, store the unwidened types of the operands
+   in *TYPE1_OUT and *TYPE2_OUT respectively.  Also fill *RHS1_OUT and
+   *RHS2_OUT such that converting those operands to types *TYPE1_OUT
+   and *TYPE2_OUT would give the operands of the multiplication.  */
 
 static bool
-is_widening_mult_p (gimple stmt,
+is_widening_mult_p (tree type, gimple stmt,
 		    tree *type1_out, tree *rhs1_out,
 		    tree *type2_out, tree *rhs2_out)
 {
-  tree type;
-
-  type = TREE_TYPE (gimple_assign_lhs (stmt));
   if (TREE_CODE (type) != INTEGER_TYPE
       && TREE_CODE (type) != FIXED_POINT_TYPE)
     return false;
 
-  if (!is_widening_mult_rhs_p (gimple_assign_rhs1 (stmt), type1_out, rhs1_out))
+  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs1 (stmt), type1_out,
+			       rhs1_out))
     return false;
 
-  if (!is_widening_mult_rhs_p (gimple_assign_rhs2 (stmt), type2_out, rhs2_out))
+  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs2 (stmt), type2_out,
+			       rhs2_out))
     return false;
 
   if (*type1_out == NULL)
@@ -2084,7 +2084,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
   if (TREE_CODE (type) != INTEGER_TYPE)
     return false;
 
-  if (!is_widening_mult_p (stmt, &type1, &rhs1, &type2, &rhs2))
+  if (!is_widening_mult_p (type, stmt, &type1, &rhs1, &type2, &rhs2))
     return false;
 
   to_mode = TYPE_MODE (type);
@@ -2193,14 +2193,14 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 
   if (code == PLUS_EXPR && rhs1_code == MULT_EXPR)
     {
-      if (!is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
+      if (!is_widening_mult_p (type, rhs1_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs2;
     }
   else if (rhs2_code == MULT_EXPR)
     {
-      if (!is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
+      if (!is_widening_mult_p (type, rhs2_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs1;

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-06-28 12:48               ` Richard Guenther
@ 2011-06-28 16:37                 ` Michael Matz
  2011-06-28 16:48                   ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Michael Matz @ 2011-06-28 16:37 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Andrew Stubbs, gcc-patches, patches

Hi,

On Tue, 28 Jun 2011, Richard Guenther wrote:

> I'd name the predicate value_preserving_conversion_p which I think is 
> what you mean.  harmless isn't really descriptive.
> 
> Note that you include non-value-preserving conversions, namely int -> 
> unsigned int.

It seems that Andrew really does want to accept them.  If so 
value_preserving_conversion_p would be the wrong name.  It seems to me he 
wants to accept those conversions that make it possible to retrieve the 
old value, i.e. when "T1 x; (T1)(T2)x == x", then T1->T2 has the 
to-be-named property.  bits_preserving?  Hmm.
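
As a rough illustration of that round-trip property, assuming 8-bit char and GCC's documented modulo behaviour for out-of-range conversions to signed types (ISO C leaves that implementation-defined). The helper names are invented for this sketch:

```c
#include <assert.h>
#include <limits.h>

/* Round trip int -> unsigned int -> int.  The conversion back is
   implementation-defined above INT_MAX; GCC reduces modulo 2^N.  */
int
round_trip_through_unsigned (int x)
{
  return (int) (unsigned int) x;
}

/* Round trip int -> signed char -> int: a truncation, which loses bits.  */
int
round_trip_through_schar (int x)
{
  return (int) (signed char) x;
}

void
check_round_trips (void)
{
  /* The value changes in the middle ...  */
  assert ((unsigned int) -1 == UINT_MAX);
  /* ... but the original value can still be retrieved.  */
  assert (round_trip_through_unsigned (-1) == -1);
  assert (round_trip_through_unsigned (INT_MIN) == INT_MIN);
  /* A narrowing conversion does not have the property.  */
  assert (round_trip_through_schar (300) != 300);
}
```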

Ciao,
Michael.

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-06-28 16:37                 ` Michael Matz
@ 2011-06-28 16:48                   ` Andrew Stubbs
  2011-06-28 17:09                     ` Michael Matz
  2011-07-01 16:40                     ` Bernd Schmidt
  0 siblings, 2 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-06-28 16:48 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Guenther, gcc-patches, patches

On 28/06/11 16:53, Michael Matz wrote:
> On Tue, 28 Jun 2011, Richard Guenther wrote:
>> I'd name the predicate value_preserving_conversion_p which I think is
>> what you mean.  harmless isn't really descriptive.
>>
>> Note that you include non-value-preserving conversions, namely int ->
>> unsigned int.
>
> It seems that Andrew really does want to accept them.  If so
> value_preserving_conversion_p would be the wrong name.  It seems to me he
> wants to accept those conversions that make it possible to retrieve the
> old value, i.e. when "T1 x; (T1)(T2)x == x", then T1->T2 has the
> to-be-named property.  bits_preserving?  Hmm.

What I want (and I'm not totally clear on what this actually means) is 
to be able to optimize all the cases where the end result will be the 
same as the compiler produces now (using multiple multiply, shift, and 
add operations).

Ok, so that's an obvious statement, but the point is that, right now, 
the compiler does nothing special when you cast from int -> unsigned 
int, or vice-versa, and I want to capture that somehow. There are some 
exceptions, I'm sure, but what are they?

What is clear is that I don't want to just assume that casting from one 
signedness to the other is a show-stopper.

For example:

   unsigned long long
   foo (unsigned long long a, unsigned char b, unsigned char c)
   {
     return a + b * c;
   }

This appears to be entirely unsigned maths with plenty of spare 
precision, and therefore a dead cert for any SI->DI 
multiply-and-accumulate instruction, but not so - it is represented 
internally as:

   signed int tmp = (signed int)b * (signed int)c;
   unsigned long long result = a + (unsigned long long)tmp;

Notice the unexpected signed int in the middle! I need to be able to get 
past that to optimize this properly.
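
A small sketch of where that signed int comes from, assuming a C11 compiler and a 32-bit int. IS_INT and check_promotion are names invented for this example:

```c
#include <assert.h>

/* Evaluates to 1 if the expression has type int, 0 otherwise (C11).  */
#define IS_INT(expr) _Generic ((expr), int: 1, default: 0)

unsigned long long
foo (unsigned long long a, unsigned char b, unsigned char c)
{
  /* b and c are promoted to int, so the product has type int; only
     then is it converted to unsigned long long for the addition.  */
  return a + b * c;
}

void
check_promotion (void)
{
  unsigned char b = 255, c = 255;
  assert (IS_INT (b * c));
  assert (foo (1, b, c) == 65026ULL);
}
```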

I've tried various test cases in which I cast signedness and mode around 
a bit, and so far it appears to perform safely, but I'm probably not being 
cunning enough.

Andrew

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (7/7)] Mixed-sign multiplies using narrowest mode
  2011-06-23 14:54 ` [PATCH (7/7)] Mixed-sign multiplies using narrowest mode Andrew Stubbs
@ 2011-06-28 17:02   ` Andrew Stubbs
  2011-07-14 14:44     ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-06-28 17:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

[-- Attachment #1: Type: text/plain, Size: 845 bytes --]

On 23/06/11 15:43, Andrew Stubbs wrote:
> Patch 4 introduced support for using signed multiplies to code unsigned
> multiplies in a narrower mode. Patch 5 then introduced support for
> mis-matched input modes.
>
> These two combined mean that there is a case where only the smaller of two
> inputs is unsigned, and yet it still tries to use a mode wider than the
> larger, signed input. This is bad because it means unnecessary extends
> and because the wider operation might not exist.
>
> This patch catches that case, and ensures that the smaller, unsigned
> input, is zero-extended to match the mode of the larger, signed input.
>
> Of course, both inputs may still have to be extended to fit the nearest
> available instruction, so it doesn't make a difference every time.
>
> OK?

This update fixes Janis' issue with the testsuite.
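
A sketch of the case described above, using fixed-width C types to stand in for HImode and QImode. All names are hypothetical and only the arithmetic is modelled:

```c
#include <assert.h>
#include <stdint.h>

/* Signed HI x HI -> DI multiply, standing in for the instruction.  */
int64_t
smul_hi_widen (int16_t x, int16_t y)
{
  return (int64_t) x * (int64_t) y;
}

/* Larger input signed (short), smaller input unsigned (char): the
   unsigned value fits in the larger signed mode once zero-extended,
   so no mode wider than the signed input is needed.  */
int64_t
mul_s16_u8 (int16_t b, uint8_t c)
{
  return smul_hi_widen (b, (int16_t) c);
}

/* Larger input unsigned: values above INT16_MAX cannot be passed as a
   signed HImode operand, which is when a wider mode is required.  */
int
fits_signed_hi (uint16_t b)
{
  return b <= INT16_MAX;
}

void
check_mixed_sign (void)
{
  assert (mul_s16_u8 (-3, 255) == -765);
  assert (fits_signed_hi (255));
  assert (!fits_signed_hi (65535));
}
```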

Andrew

[-- Attachment #2: widening-multiplies-7.patch --]
[-- Type: text/x-patch, Size: 2431 bytes --]

2011-06-24  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (convert_mult_to_widen): Better handle
	unsigned inputs of different modes.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-9.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-9.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, short *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "smlalbb" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2103,9 +2103,17 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
     {
       if (op != smul_widen_optab)
 	{
-	  from_mode = GET_MODE_WIDER_MODE (from_mode);
-	  if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode))
-	    return false;
+	  /* We can use a signed multiply with unsigned types as long as
+	     there is a wider mode to use, or it is the smaller of the two
+	     types that is unsigned.  Note that type1 >= type2, always.  */
+	  if (TYPE_UNSIGNED (type1)
+	      || (TYPE_UNSIGNED (type2)
+		  && TYPE_MODE (type2) == from_mode))
+	    {
+	      from_mode = GET_MODE_WIDER_MODE (from_mode);
+	      if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode))
+		return false;
+	    }
 
 	  op = smul_widen_optab;
 	  handler = find_widening_optab_handler_and_mode (op, to_mode,
@@ -2227,14 +2235,21 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
     {
       enum machine_mode mode = TYPE_MODE (type1);
-      mode = GET_MODE_WIDER_MODE (mode);
-      if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (TYPE_MODE (type)))
+
+      /* We can use a signed multiply with unsigned types as long as
+	 there is a wider mode to use, or it is the smaller of the two
+	 types that is unsigned.  Note that type1 >= type2, always.  */
+      if (TYPE_UNSIGNED (type1)
+	  || (TYPE_UNSIGNED (type2)
+	      && TYPE_MODE (type2) == mode))
 	{
-	  type1 = type2 = lang_hooks.types.type_for_mode (mode, 0);
-	  cast1 = cast2 = true;
+	  mode = GET_MODE_WIDER_MODE (mode);
+	  if (GET_MODE_SIZE (mode) >= GET_MODE_SIZE (TYPE_MODE (type)))
+	    return false;
 	}
-      else
-	return false;
+
+      type1 = type2 = lang_hooks.types.type_for_mode (mode, 0);
+      cast1 = cast2 = true;
     }
 
   if (TYPE_MODE (type2) != TYPE_MODE (type1))

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-06-28 16:48                   ` Andrew Stubbs
@ 2011-06-28 17:09                     ` Michael Matz
  2011-07-01 11:58                       ` Stubbs, Andrew
  2011-07-01 16:40                     ` Bernd Schmidt
  1 sibling, 1 reply; 107+ messages in thread
From: Michael Matz @ 2011-06-28 17:09 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Richard Guenther, gcc-patches, patches

Hi,

On Tue, 28 Jun 2011, Andrew Stubbs wrote:

> What I want (and I'm not totally clear on what this actually means) is 
> to be able to optimize all the cases where the end result will be the 
> same as the compiler produces now (using multiple multiply, shift, and 
> add operations).

Okay, then you really want to look through value-preserving conversions.

> Ok, so that's an obvious statement, but the point is that, right now, 
> the compiler does nothing special when you cast from int -> unsigned 
> int, or vice-versa, and I want to capture that somehow. There are some 
> exceptions, I'm sure, but what are they?

Same-sized signed <-> unsigned conversions aren't value preserving:
  unsigned char c = 255; (signed char)c == -1; 255 != -1
unsigned -> larger sized signed is value preserving
  unsigned char c = 255; (signed short)c == 255;
signed -> unsigned never is value preserving

> multiply-and-accumulate instruction, but not so - it is represented 
> internally as:
> 
>   signed int tmp = (signed int)b * (signed int)c;
>   unsigned long long result = a + (unsigned long long)tmp;
> 
> Notice the unexpected signed int in the middle!

Yeah, the C standard requires this.

> I need to be able to get past that to optimize this properly.

Then you're lucky because unsigned char -> signed int is an embedding, 
hence value preserving.  I thought we had a predicate for such conversions 
already, but seems I was wrong.  So, create it as Richi said, but 
enumerate explicitly the cases you want to handle, and include only those 
that really are value preserving.
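
One way the rules listed earlier in this message could be enumerated, as a sketch only. struct int_type is a hypothetical stand-in for the precision and signedness that a real predicate would read from a GCC tree type, and the same-signedness widening case is added on the assumption that it is also wanted:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the information held in a tree type.  */
struct int_type
{
  unsigned precision;   /* width in bits */
  bool is_unsigned;
};

/* True if every value of FROM is representable in TO.  */
bool
value_preserving_conversion_p (struct int_type from, struct int_type to)
{
  if (from.is_unsigned == to.is_unsigned)
    return to.precision >= from.precision;  /* widening, or no change */
  if (from.is_unsigned)
    return to.precision > from.precision;   /* unsigned -> larger signed */
  return false;                             /* signed -> unsigned: never */
}

void
check_predicate (void)
{
  struct int_type u8 = { 8, true }, s8 = { 8, false };
  struct int_type s16 = { 16, false }, s32 = { 32, false };
  struct int_type u32 = { 32, true };

  assert (!value_preserving_conversion_p (u8, s8));   /* same size */
  assert (value_preserving_conversion_p (u8, s16));   /* 255 stays 255 */
  assert (value_preserving_conversion_p (u8, s32));   /* the case above */
  assert (!value_preserving_conversion_p (s8, u32));  /* -1 is lost */
}
```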


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-06-28 17:09                     ` Michael Matz
@ 2011-07-01 11:58                       ` Stubbs, Andrew
  2011-07-01 12:25                         ` Richard Guenther
  2011-07-01 12:33                         ` Paolo Bonzini
  0 siblings, 2 replies; 107+ messages in thread
From: Stubbs, Andrew @ 2011-07-01 11:58 UTC (permalink / raw)
  To: Michael Matz; +Cc: Andrew Stubbs, Richard Guenther, gcc-patches, patches

On 28/06/11 17:37, Michael Matz wrote:
>> What I want (and I'm not totally clear on what this actually means) is
>> to be able to optimize all the cases where the end result will be the
>> same as the compiler produces now (using multiple multiply, shift, and
>> add operations).
> Okay, then you really want to look through value-preserving conversions.
>
>> Ok, so that's an obvious statement, but the point is that, right now,
>> the compiler does nothing special when you cast from int -> unsigned
>> int, or vice-versa, and I want to capture that somehow. There are some
>> exceptions, I'm sure, but what are they?
> Same-sized signed <-> unsigned conversions aren't value preserving:
>   unsigned char c = 255; (signed char)c == -1; 255 != -1
> unsigned -> larger sized signed is value preserving
>   unsigned char c = 255; (signed short)c == 255;
> signed -> unsigned never is value preserving

OK, so I've tried implementing this, and I find I hit against a problem:

Given this test case:

   unsigned long long
   foo (unsigned long long a, signed char *b, signed char *c)
   {
     return a + *b * *c;
   }

Those rules say that it should not be suitable for optimization because 
there's an implicit cast from signed int to unsigned long long.

Without any widening multiplications allowed, GCC gives this code (for ARM):

   ldrsb   r2, [r2, #0]
   ldrsb   r3, [r3, #0]
   mul     r2, r2, r3
   adds    r0, r0, r2
   adc     r1, r1, r2, asr #31

This is exactly what a signed widening multiply-and-accumulate with 
smlalbb would have done!
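
The equivalence claimed above can be checked exhaustively in plain C. This is 
only a sketch: the function names are invented for illustration, and 
madd_widening is a C model of a signed widening multiply-and-accumulate, not 
the smlalbb instruction itself.

```c
#include <assert.h>
#include <stdint.h>

/* What the test case means in C: *b * *c is computed in int, and the int
   result is then converted to unsigned long long for the addition.  */
static uint64_t madd_as_written(uint64_t a, int8_t b, int8_t c)
{
    int32_t prod = (int32_t)b * (int32_t)c;
    return a + (uint64_t)prod;
}

/* Model of a signed widening multiply-and-accumulate: the product is formed
   directly as a signed 64-bit value and added to the accumulator.  */
static uint64_t madd_widening(uint64_t a, int8_t b, int8_t c)
{
    int64_t prod = (int64_t)b * (int64_t)c;
    return a + (uint64_t)prod;
}

/* Count the (b, c) pairs for which the two forms disagree for a given
   accumulator value.  */
static unsigned count_mismatches(uint64_t a)
{
    unsigned mismatches = 0;
    for (int b = -128; b <= 127; b++)
        for (int c = -128; c <= 127; c++)
            if (madd_as_written(a, (int8_t)b, (int8_t)c)
                != madd_widening(a, (int8_t)b, (int8_t)c))
                mismatches++;
    return mismatches;
}

/* Call this from main to run the check.  */
static void check_madd(void)
{
    assert(count_mismatches(0) == 0);
    assert(count_mismatches(UINT64_MAX) == 0);
}
```

The unsigned destination type does not change the bits that are added; only 
the signedness of the value being extended matters.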

OK, so the types in the testcase are a bit contrived, but my point is 
that I want to be able to use the widening-mult instructions everywhere 
that they would produce the same output as gcc would otherwise, and gcc 
just doesn't seem that interested in signed<->unsigned conversions.

So, I'm happy to put in checks to ensure that truncations are not 
ignored, but I'm really not sure what's the right thing to do with the 
extends and signedness switches.

Any suggestions?

Andrew

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-01 11:58                       ` Stubbs, Andrew
@ 2011-07-01 12:25                         ` Richard Guenther
  2011-07-04 14:23                           ` Andrew Stubbs
  2011-07-01 12:33                         ` Paolo Bonzini
  1 sibling, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-07-01 12:25 UTC (permalink / raw)
  To: Stubbs, Andrew; +Cc: Michael Matz, Andrew Stubbs, gcc-patches, patches

On Fri, Jul 1, 2011 at 1:58 PM, Stubbs, Andrew <Andrew_Stubbs@mentor.com> wrote:
> On 28/06/11 17:37, Michael Matz wrote:
>>> What I want (and I'm not totally clear on what this actually means) is
>>> >  to be able to optimize all the cases where the end result will be the
>>> >  same as the compiler produces now (using multiple multiply, shift, and
>>> >  add operations).
>> Okay, then you really want to look through value-preserving conversions.
>>
>>> >  Ok, so that's an obvious statement, but the point is that, right now,
>>> >  the compiler does nothing special when you cast from int ->  unsigned
>>> >  int, or vice-versa, and I want to capture that somehow. There are some
>>> >  exceptions, I'm sure, but what are they?
>> Same-sized signed<->  unsigned conversions aren't value preserving:
>>    unsigned char c = 255; (signed char)c == -1; 255 != -1
>> unsigned ->  larger sized signed is value preserving
>>    unsigned char c = 255; (signed short)c == 255;
>> signed ->  unsigned never is value preserving
>
> OK, so I've tried implementing this, and I find I hit against a problem:
>
> Given this test case:
>
>   unsigned long long
>   foo (unsigned long long a, signed char *b, signed char *c)
>   {
>     return a + *b * *c;
>   }
>
> Those rules say that it should not be suitable for optimization because
> there's an implicit cast from signed int to unsigned long long.
>
> Without any widening multiplications allowed, GCC gives this code (for ARM):
>
>   ldrsb   r2, [r2, #0]
>   ldrsb   r3, [r3, #0]
>   mul     r2, r2, r3
>   adds    r0, r0, r2
>   adc     r1, r1, r2, asr #31
>
> This is exactly what a signed widening multiply-and-accumulate with
> smlalbb would have done!
>
> OK, so the types in the testcase are a bit contrived, but my point is
> that I want to be able to use the widening-mult instructions everywhere
> that they would produce the same output and gcc would otherwise, and gcc
> just doesn't seem that interested in signed<->unsigned conversions.
>
> So, I'm happy to put in checks to ensure that truncations are not
> ignore, but I'm really not sure what's the right thing to do with the
> extends and signedness switches.
>
> Any suggestions?

Well - some operations work the same on both signedness if you
just care about the twos-complement result.  This includes
multiplication (but not for example division).  For this special
case I suggest to not bother trying to invent a generic predicate
but do something local in tree-ssa-math-opts.c.
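
To illustrate the point about signedness, here is a small sketch (helper
names are made up; it assumes the usual modular behaviour when converting an
out-of-range unsigned value to a signed type, which GCC documents).  The low
half of a product is the same whichever signedness is used, but a quotient is
not.  Note this only covers the same-width result; the upper half of a
widened product does depend on signedness.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of the product when both bit patterns are read as signed.  */
static uint32_t mul_low_signed(uint32_t x, uint32_t y)
{
    int64_t p = (int64_t)(int32_t)x * (int64_t)(int32_t)y;
    return (uint32_t)p;
}

/* Low 32 bits of the product when both bit patterns are read as unsigned.  */
static uint32_t mul_low_unsigned(uint32_t x, uint32_t y)
{
    return (uint32_t)((uint64_t)x * (uint64_t)y);
}

/* Quotient with both bit patterns read as signed.  The caller must avoid
   y == 0 and INT32_MIN / -1.  */
static uint32_t div_signed(uint32_t x, uint32_t y)
{
    return (uint32_t)((int32_t)x / (int32_t)y);
}

/* Quotient with both bit patterns read as unsigned (y must be nonzero).  */
static uint32_t div_unsigned(uint32_t x, uint32_t y)
{
    return x / y;
}

/* Call this from main to run the check.  */
static void check_signedness(void)
{
    /* 0xFFFFFFF8 is -8 when read as signed.  */
    assert(mul_low_signed(0xFFFFFFF8u, 3u) == mul_low_unsigned(0xFFFFFFF8u, 3u));
    assert(div_signed(0xFFFFFFF8u, 2u) != div_unsigned(0xFFFFFFF8u, 2u));
}
```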

Richard.

> Andrew
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-01 11:58                       ` Stubbs, Andrew
  2011-07-01 12:25                         ` Richard Guenther
@ 2011-07-01 12:33                         ` Paolo Bonzini
  2011-07-01 13:31                           ` Stubbs, Andrew
  1 sibling, 1 reply; 107+ messages in thread
From: Paolo Bonzini @ 2011-07-01 12:33 UTC (permalink / raw)
  To: Stubbs, Andrew
  Cc: Michael Matz, Andrew Stubbs, Richard Guenther, gcc-patches, patches

On 07/01/2011 01:58 PM, Stubbs, Andrew wrote:
> Given this test case:
>
>     unsigned long long
>     foo (unsigned long long a, signed char *b, signed char *c)
>     {
>       return a + *b * *c;
>     }
>
> Those rules say that it should not be suitable for optimization because
> there's an implicit cast from signed int to unsigned long long.

Got it now!  Casts from signed to unsigned are not value-preserving, but 
they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the 
same result bit-by-bit as the s64 result.  The fact that s64 has an 
implicit 1111... in front, while a u64 has an implicit 0000... does not 
matter.

Is this the meaning of the predicate you want?  I think so, based on the 
discussion, but it's hard to say without seeing the cases enumerated 
(i.e. a patch).

However, perhaps there is a catch.  We can do the following thought 
experiment.  What would happen if you had multiple widening multiplies? 
  Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to 
128-bit unsigned?  I believe in this case you couldn't optimize 8-bit 
signed to 128-bit unsigned.  Would your code do it?

Paolo

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-01 12:33                         ` Paolo Bonzini
@ 2011-07-01 13:31                           ` Stubbs, Andrew
  2011-07-01 14:41                             ` Paolo Bonzini
  2011-07-01 15:10                             ` Stubbs, Andrew
  0 siblings, 2 replies; 107+ messages in thread
From: Stubbs, Andrew @ 2011-07-01 13:31 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Michael Matz, Richard Guenther, gcc-patches, patches

On 01/07/11 13:33, Paolo Bonzini wrote:
> Got it now! Casts from signed to unsigned are not value-preserving, but
> they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the
> same result bit-by-bit as the s64 result. The fact that s64 has an
> implicit 1111... in front, while an u64 has an implicit 0000... does not
> matter.

But, the 1111... and 0000... are not implicit. They are very real, and 
if applied incorrectly will change the result, I think.

> Is this the meaning of the predicate you want? I think so, based on the
> discussion, but it's hard to say without seeing the cases enumerated
> (i.e. a patch).

The purpose of this predicate is to determine whether any type 
conversions that occur between the output of a widening multiply, and 
the input of an addition have any bearing on the end result.

We know what the effective output type of the multiply is (the size is 
2x the input type, and it is signed if either one of the inputs is 
signed), and we know what the input type of the addition is, but any 
amount of junk can lie in between. The problem is determining if it *is* 
junk.

In an ideal world there would only be two cases to consider:

   1. No conversion needed.

   2. A single sign-extend or zero-extend (according to the type of the 
inputs) to match the input size of the addition.

Anything else would be unsuitable for optimization. Of course, it's 
never that simple, but it should still be possible to boil down a list 
of conversions to one of these cases, if it's valid.

The signedness of the input to the addition is not significant - the 
code would be the same either way. But it is important not to try to 
zero-extend something that started out signed, and not to sign-extend 
something that started out unsigned.

> However, perhaps there is a catch. We can do the following thought
> experiment. What would happen if you had multiple widening multiplies?
> Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to 128-bit
> unsigned? I believe in this case you couldn't optimize 8-bit signed to
> 128-bit unsigned. Would your code do it?

My code does not attempt to combine multiple multiplies. In any case, if 
you have two multiplications, surely you have at least three input 
values, so they can't be combined?

It does attempt to combine a multiply and an addition, where a suitable 
madd* insn is available. (This is not new; I'm just trying to do it in 
more cases.)

I have considered the case where you have "(a * b) + (c * d)", but have 
not yet coded anything for it. At present, the code will simply choose 
whichever multiply happens to find itself the first input operand of the 
plus, and ignores the other, even if the first turns out not to be a 
suitable candidate.

Andrew

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-01 13:31                           ` Stubbs, Andrew
@ 2011-07-01 14:41                             ` Paolo Bonzini
  2011-07-01 14:55                               ` Stubbs, Andrew
  2011-07-01 15:10                             ` Stubbs, Andrew
  1 sibling, 1 reply; 107+ messages in thread
From: Paolo Bonzini @ 2011-07-01 14:41 UTC (permalink / raw)
  To: Stubbs, Andrew; +Cc: Michael Matz, Richard Guenther, gcc-patches, patches

On 07/01/2011 03:30 PM, Stubbs, Andrew wrote:
>> >  However, perhaps there is a catch. We can do the following thought
>> >  experiment. What would happen if you had multiple widening multiplies?
>> >  Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to 128-bit
>> >  unsigned? I believe in this case you couldn't optimize 8-bit signed to
>> >  128-bit unsigned. Would your code do it?
> My code does not attempt to combine multiple multiplies. In any case, if
> you have two multiplications, surely you have at least three input
> values, so they can't be combined?

What about (u128)c + (u64)((s8)a * (s8)b)?  You cannot convert this to 
(u128)c + (u128)((s8)a * (s8)b).
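
The same problem can be shown at half the size, so that it runs with standard 
C types: u64 stands in for u128 and u32 for u64.  This is a sketch with 
made-up function names, assuming int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* The product passes through a 32-bit unsigned type before being widened,
   so the widening step is a zero-extend.  */
static uint64_t add_via_u32(uint64_t c, int8_t a, int8_t b)
{
    uint32_t narrow = (uint32_t)((int32_t)a * (int32_t)b);
    return c + (uint64_t)narrow;
}

/* The product is widened straight from its signed type, so the widening
   step is a sign-extend.  */
static uint64_t add_direct(uint64_t c, int8_t a, int8_t b)
{
    return c + (uint64_t)((int32_t)a * (int32_t)b);
}

/* Call this from main to run the check.  */
static void check_intermediate_unsigned(void)
{
    /* Non-negative product: both forms agree.  */
    assert(add_via_u32(5, 3, 4) == add_direct(5, 3, 4));
    /* Negative product: the upper 32 bits differ.  */
    assert(add_via_u32(0, -1, 1) != add_direct(0, -1, 1));
}
```

So the intermediate unsigned type cannot simply be skipped when the product 
may be negative.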

Paolo

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-01 14:41                             ` Paolo Bonzini
@ 2011-07-01 14:55                               ` Stubbs, Andrew
  2011-07-01 15:54                                 ` Paolo Bonzini
  0 siblings, 1 reply; 107+ messages in thread
From: Stubbs, Andrew @ 2011-07-01 14:55 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Michael Matz, Richard Guenther, gcc-patches, patches

On 01/07/11 15:40, Paolo Bonzini wrote:
> On 07/01/2011 03:30 PM, Stubbs, Andrew wrote:
>>> > However, perhaps there is a catch. We can do the following thought
>>> > experiment. What would happen if you had multiple widening multiplies?
>>> > Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to 128-bit
>>> > unsigned? I believe in this case you couldn't optimize 8-bit signed to
>>> > 128-bit unsigned. Would your code do it?
>> My code does not attempt to combine multiple multiplies. In any case, if
>> you have two multiplications, surely you have at least three input
>> values, so they can't be combined?
>
> What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to
> (u128)c + (u128)((s8)a * (s8)b).

Oh I see, sorry. Yes, that's exactly what I'm trying to do here.

No, wait, I don't see. Where are these multiple widening multiplies 
you're talking about? I only see one multiply?

Andrew

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-01 13:31                           ` Stubbs, Andrew
  2011-07-01 14:41                             ` Paolo Bonzini
@ 2011-07-01 15:10                             ` Stubbs, Andrew
  1 sibling, 0 replies; 107+ messages in thread
From: Stubbs, Andrew @ 2011-07-01 15:10 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Michael Matz, Richard Guenther, gcc-patches, patches

On 01/07/11 14:30, Stubbs, Andrew wrote:
>> Got it now! Casts from signed to unsigned are not value-preserving, but
>> >  they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the
>> >  same result bit-by-bit as the s64 result. The fact that s64 has an
>> >  implicit 1111... in front, while an u64 has an implicit 0000... does not
>> >  matter.
> But, the 1111... and 0000... are not implicit. They are very real, and
> if applied incorrectly will change the result, I think.

Wait, I'm clearly confused ....

When I try a s32->u64 conversion, the expand pass generates a 
sign_extend insn.

Clearly it's the source type that determines the extension type, not the 
destination type ... and I'm a dunce!
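
For the record, the same thing at the C level, as a minimal sketch (the 
function names are only for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Signed source, unsigned destination: the value is sign-extended.  */
static uint64_t widen_from_s32(int32_t x)
{
    return (uint64_t)x;
}

/* Unsigned source, signed destination: the value is zero-extended.  */
static int64_t widen_from_u32(uint32_t x)
{
    return (int64_t)x;
}

/* Call this from main to run the check.  */
static void check_extension_kind(void)
{
    assert(widen_from_s32(-1) == UINT64_MAX);
    assert(widen_from_u32(0xFFFFFFFFu) == INT64_C(0xFFFFFFFF));
}
```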

Thanks :)

Andrew

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-01 14:55                               ` Stubbs, Andrew
@ 2011-07-01 15:54                                 ` Paolo Bonzini
  2011-07-01 18:18                                   ` Stubbs, Andrew
  0 siblings, 1 reply; 107+ messages in thread
From: Paolo Bonzini @ 2011-07-01 15:54 UTC (permalink / raw)
  To: Stubbs, Andrew; +Cc: Michael Matz, Richard Guenther, gcc-patches, patches

On 07/01/2011 04:55 PM, Stubbs, Andrew wrote:
>> >
>> >  What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to
>> >  (u128)c + (u128)((s8)a * (s8)b).
> Oh I see, sorry. Yes, that's exactly what I'm trying to do here.
>
> No, wait, I don't see. Where are these multiple widening multiplies
> you're talking about? I only see one multiply?

I meant one multiplication with multiple widening steps.  Not clear at 
all, sorry.

Paolo

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-06-28 16:48                   ` Andrew Stubbs
  2011-06-28 17:09                     ` Michael Matz
@ 2011-07-01 16:40                     ` Bernd Schmidt
  1 sibling, 0 replies; 107+ messages in thread
From: Bernd Schmidt @ 2011-07-01 16:40 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Michael Matz, Richard Guenther, gcc-patches, patches

On 06/28/11 18:14, Andrew Stubbs wrote:

>   unsigned long long
>   foo (unsigned long long a, unsigned char b, unsigned char c)
>   {
>     return a + b * c;
>   }
> 
> This appears to be entirely unsigned maths with plenty of spare
> precision, and therefore a dead cert for any SI->DI
> multiply-and-accumulate instruction, but not so - it is represented
> internally as:
> 
>   signed int tmp = (signed int)b * (signed int)c;
>   unsigned long long result = a + (unsigned long long)tmp;
> 
> Notice the unexpected signed int in the middle! I need to be able to get
> past that to optimize this properly.

Since both inputs are positive in a signed int (they must be, being cast
from a smaller unsigned value), you can infer that it does not matter
whether you treat the result of the multiplication as a signed or an
unsigned value. It is positive in any case.

So, I think the thing to test is: if the accumulate step requires
widening the result of the multiplication, either the cast must be value
preserving (widening unsigned to signed), or you must be able to prove
that the multiplication produces a positive result.

If the accumulate step just casts the multiplication result from signed
to unsigned, keeping the precision the same, you can ignore the cast
since the addition is unaffected by it.
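
The first point can be checked exhaustively for the unsigned char case.  The 
sketch below uses invented helper names and assumes int is 32 bits; it counts 
the input pairs for which sign-extending and zero-extending the product would 
give different 64-bit values.

```c
#include <assert.h>
#include <stdint.h>

/* Product of two unsigned char values, computed in (signed) int as C
   requires.  The maximum is 255 * 255 = 65025, so it is never negative.  */
static int32_t product_in_int(uint8_t b, uint8_t c)
{
    return (int32_t)b * (int32_t)c;
}

/* Count the (b, c) pairs for which sign-extending and zero-extending the
   product to 64 bits give different results.  */
static unsigned count_extension_mismatches(void)
{
    unsigned mismatches = 0;
    for (unsigned b = 0; b <= 255; b++)
        for (unsigned c = 0; c <= 255; c++) {
            int32_t prod = product_in_int((uint8_t)b, (uint8_t)c);
            uint64_t sign_extended = (uint64_t)(int64_t)prod;
            uint64_t zero_extended = (uint64_t)(uint32_t)prod;
            if (sign_extended != zero_extended)
                mismatches++;
        }
    return mismatches;
}

/* Call this from main to run the check.  */
static void check_positive_product(void)
{
    assert(count_extension_mismatches() == 0);
}
```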

Bernd

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-01 15:54                                 ` Paolo Bonzini
@ 2011-07-01 18:18                                   ` Stubbs, Andrew
  0 siblings, 0 replies; 107+ messages in thread
From: Stubbs, Andrew @ 2011-07-01 18:18 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Michael Matz, Richard Guenther, gcc-patches, patches

On 01/07/11 16:54, Paolo Bonzini wrote:
> On 07/01/2011 04:55 PM, Stubbs, Andrew wrote:
>>> >
>>> > What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to
>>> > (u128)c + (u128)((s8)a * (s8)b).
>> Oh I see, sorry. Yes, that's exactly what I'm trying to do here.
>>
>> No, wait, I don't see. Where are these multiple widening multiplies
>> you're talking about? I only see one multiply?
>
> I meant one multiplication with multiple widening steps. Not clear at
> all, sorry.

Yes, I see now, the whole purpose of my patch set is widening by more 
than one mode.

The case of the multiply-and-accumulate is the only way there can be 
more than one step though. Widening multiplies themselves are always 
handled as one unit.

Andrew

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-01 12:25                         ` Richard Guenther
@ 2011-07-04 14:23                           ` Andrew Stubbs
  2011-07-07 10:00                             ` Richard Guenther
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-04 14:23 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Michael Matz, gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 1430 bytes --]

On 01/07/11 13:25, Richard Guenther wrote:
> Well - some operations work the same on both signedness if you
> just care about the twos-complement result.  This includes
> multiplication (but not for example division).  For this special
> case I suggest to not bother trying to invent a generic predicate
> but do something local in tree-ssa-math-opts.c.

OK, here's my updated patch.

I've taken the view that we *know* what size and signedness the result 
of the multiplication is, and we know what size the input to the 
addition must be, so all the check has to do is make sure it does that 
same conversion, even if by a roundabout means.

What I hadn't grasped before is that when extending a value it's the 
source type that is significant, not the destination, so the checks are 
not as complex as I had thought.

So, this patch adds a test to ensure that:

  1. the type is not truncated so far that we lose any information; and

  2. the type is only ever extended in the proper signedness.
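
As an illustration of the first check, here is a C sketch modelled on the 
no-wmla-1.c test below (function names are made up; the narrowing cast 
assumes the modular behaviour GCC documents).  A truncate to less than the 
width of the product changes the value, so such a sequence must not become a 
multiply-and-accumulate.

```c
#include <assert.h>
#include <stdint.h>

/* The 32-bit product is truncated to 16 bits before the addition.  */
static int32_t madd_truncated(int32_t a, int16_t b, int16_t c)
{
    int32_t bc = (int32_t)b * (int32_t)c;
    return a + (int16_t)(uint16_t)bc;
}

/* What a 16x16->32 multiply-and-accumulate would compute.  The caller keeps
   the operands small enough that the addition does not overflow.  */
static int32_t madd_full(int32_t a, int16_t b, int16_t c)
{
    return a + (int32_t)b * (int32_t)c;
}

/* Call this from main to run the check.  */
static void check_truncation(void)
{
    /* Small product: the truncate is harmless.  */
    assert(madd_truncated(1, 3, 4) == madd_full(1, 3, 4));
    /* 256 * 256 = 65536 does not fit in 16 bits: the results differ.  */
    assert(madd_truncated(1, 256, 256) != madd_full(1, 256, 256));
}
```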

Also, just to be absolutely sure, I've also added a little bit of logic 
to permit extends that are then undone by a truncate. I'm really not 
sure what guarantees there are about what sort of cast sequences can 
exist. Is this necessary? I haven't managed to coax it to generate any 
examples of extends followed by truncates myself, but in any case, it's 
hardly any code and it'll make sure it's future proofed.

OK?

Andrew

[-- Attachment #2: widening-multiplies-3.patch --]
[-- Type: text/x-patch, Size: 8045 bytes --]

2011-06-28  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (valid_types_for_madd_p): New function.
	(convert_plusminus_to_widen): Use valid_types_for_madd_p to
	identify optimization candidates.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: New file.
	* gcc.target/arm/no-wmla-1.c: New file.

---
 .../gcc/testsuite/gcc.target/arm/no-wmla-1.c       |   11 ++
 .../gcc/testsuite/gcc.target/arm/wmul-5.c          |   10 ++
 src/gcc-mainline/gcc/tree-ssa-math-opts.c          |  112 ++++++++++++++++++--
 3 files changed, 123 insertions(+), 10 deletions(-)
 create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c
 create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c

diff --git a/src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c
new file mode 100644
index 0000000..17f7427
--- /dev/null
+++ b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+int
+foo (int a, short b, short c)
+{
+     int bc = b * c;
+        return a + (short)bc;
+}
+
+/* { dg-final { scan-assembler "mul" } } */
diff --git a/src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c
new file mode 100644
index 0000000..65c43e3
--- /dev/null
+++ b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, char *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
diff --git a/src/gcc-mainline/gcc/tree-ssa-math-opts.c b/src/gcc-mainline/gcc/tree-ssa-math-opts.c
index d55ba57..5ef7bb4 100644
--- a/src/gcc-mainline/gcc/tree-ssa-math-opts.c
+++ b/src/gcc-mainline/gcc/tree-ssa-math-opts.c
@@ -2085,6 +2085,78 @@ convert_mult_to_widen (gimple stmt)
   return true;
 }
 
+/* Check the input types, TYPE1 and TYPE2 to a widening multiply,
+   and then the conversions between the output of the multiply, and
+   the input to an addition EXPR, to ensure that they are compatible with
+   a widening multiply-and-accumulate.
+
+   This function assumes that expr is a valid string of conversion expressions
+   terminated by a multiplication.
+
+   This function tries NOT to make any (fragile) assumptions about what
+   sequence of conversions can exist in the input.  */
+
+static bool
+valid_types_for_madd_p (tree type1, tree type2, tree expr)
+{
+  gimple stmt, prev_stmt;
+  enum tree_code code, prev_code;
+  tree prev_expr, type, prev_type;
+  int bitsize, prev_bitsize, initial_bitsize, min_bitsize;
+  bool initial_unsigned;
+
+  initial_bitsize = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);
+  initial_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2);
+
+  stmt = SSA_NAME_DEF_STMT (expr);
+  code = gimple_assign_rhs_code (stmt);
+  type = TREE_TYPE (expr);
+  bitsize = TYPE_PRECISION (type);
+  min_bitsize = bitsize;
+
+  if (code == MULT_EXPR || code == WIDEN_MULT_EXPR)
+    return true;
+
+  if (!INTEGRAL_TYPE_P (type)
+      || TYPE_PRECISION (type) < initial_bitsize)
+    return false;
+
+  /* Step through the conversions backwards.  */
+  while (true)
+    {
+      prev_expr = gimple_assign_rhs1 (stmt);
+      prev_stmt = SSA_NAME_DEF_STMT (prev_expr);
+      prev_code = gimple_assign_rhs_code (prev_stmt);
+      prev_type = TREE_TYPE (prev_expr);
+      prev_bitsize = TYPE_PRECISION (prev_type);
+
+      if (prev_code == MULT_EXPR || prev_code == WIDEN_MULT_EXPR)
+	break;
+
+      /* If it's an unsuitable type or a truncate that damages the
+	 original value, then we're done.  */
+      if (!INTEGRAL_TYPE_P (prev_type)
+	  || TYPE_PRECISION (prev_type) < initial_bitsize)
+	return false;
+
+      /* If we have the wrong sort of extend for the value, then it
+	 could still be ok if we already saw a truncate that reverses
+	 the effect.  */
+      if (bitsize > prev_bitsize
+	  && TYPE_UNSIGNED (prev_type) != initial_unsigned
+	  && min_bitsize > prev_bitsize)
+	return false;
+
+      stmt = prev_stmt;
+      code = prev_code;
+      type = prev_type;
+      bitsize = prev_bitsize;
+      min_bitsize = bitsize < min_bitsize ? bitsize : min_bitsize;
+    }
+
+  return true;
+}
+
 /* Process a single gimple statement STMT, which is found at the
    iterator GSI and has a either a PLUS_EXPR or a MINUS_EXPR as its
    rhs (given by CODE), and try to convert it into a
@@ -2098,6 +2170,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   gimple rhs1_stmt = NULL, rhs2_stmt = NULL;
   tree type, type1, type2;
   tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs;
+  tree tmp, mult_rhs;
   enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK;
   optab this_optab;
   enum tree_code wmult_code;
@@ -2117,22 +2190,32 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   rhs1 = gimple_assign_rhs1 (stmt);
   rhs2 = gimple_assign_rhs2 (stmt);
 
-  if (TREE_CODE (rhs1) == SSA_NAME)
+  for (tmp = rhs1, rhs1_code = ERROR_MARK;
+       TREE_CODE (tmp) == SSA_NAME
+       && (CONVERT_EXPR_CODE_P (rhs1_code) || rhs1_code == ERROR_MARK);
+       tmp = gimple_assign_rhs1 (rhs1_stmt))
     {
-      rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
-      if (is_gimple_assign (rhs1_stmt))
-	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+      rhs1_stmt = SSA_NAME_DEF_STMT (tmp);
+      if (!is_gimple_assign (rhs1_stmt))
+	break;
+      rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
     }
-  else
+
+  if (TREE_CODE (tmp) != SSA_NAME)
     return false;
 
-  if (TREE_CODE (rhs2) == SSA_NAME)
+  for (tmp = rhs2, rhs2_code = ERROR_MARK;
+       TREE_CODE (tmp) == SSA_NAME
+       && (CONVERT_EXPR_CODE_P (rhs2_code) || rhs2_code == ERROR_MARK);
+       tmp = gimple_assign_rhs1 (rhs2_stmt))
     {
-      rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
-      if (is_gimple_assign (rhs2_stmt))
-	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
+      rhs2_stmt = SSA_NAME_DEF_STMT (tmp);
+      if (!is_gimple_assign (rhs2_stmt))
+	break;
+      rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
     }
-  else
+
+  if (TREE_CODE (tmp) != SSA_NAME)
     return false;
 
   if (code == PLUS_EXPR && rhs1_code == MULT_EXPR)
@@ -2140,6 +2223,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       if (!is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
+      mult_rhs = rhs1;
       add_rhs = rhs2;
     }
   else if (rhs2_code == MULT_EXPR)
@@ -2147,6 +2231,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       if (!is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
+      mult_rhs = rhs2;
       add_rhs = rhs1;
     }
   else if (code == PLUS_EXPR && rhs1_code == WIDEN_MULT_EXPR)
@@ -2155,6 +2240,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       mult_rhs2 = gimple_assign_rhs2 (rhs1_stmt);
       type1 = TREE_TYPE (mult_rhs1);
       type2 = TREE_TYPE (mult_rhs2);
+      mult_rhs = rhs1;
       add_rhs = rhs2;
     }
   else if (rhs2_code == WIDEN_MULT_EXPR)
@@ -2163,6 +2249,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       mult_rhs2 = gimple_assign_rhs2 (rhs2_stmt);
       type1 = TREE_TYPE (mult_rhs1);
       type2 = TREE_TYPE (mult_rhs2);
+      mult_rhs = rhs2;
       add_rhs = rhs1;
     }
   else
@@ -2171,6 +2258,11 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
     return false;
 
+  /* Verify that the conversions between the mult and the add don't do
+     anything unexpected.  */
+  if (!valid_types_for_madd_p (type1, type2, mult_rhs))
+    return false;
+
   /* Verify that the machine can perform a widening multiply
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
  2011-06-28 14:49     ` Andrew Stubbs
@ 2011-07-04 14:27       ` Andrew Stubbs
  2011-07-07 10:10         ` Richard Guenther
  2011-07-12 14:10         ` Andrew Stubbs
  0 siblings, 2 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-04 14:27 UTC (permalink / raw)
  Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 1165 bytes --]

On 28/06/11 15:14, Andrew Stubbs wrote:
> On 28/06/11 13:33, Andrew Stubbs wrote:
>> On 23/06/11 15:41, Andrew Stubbs wrote:
>>> If one or both of the inputs to a widening multiply are of unsigned type
>>> then the compiler will attempt to use usmul_widen_optab or
>>> umul_widen_optab, respectively.
>>>
>>> That works fine, but only if the target supports those operations
>>> directly. Otherwise, it just bombs out and reverts to the normal
>>> inefficient non-widening multiply.
>>>
>>> This patch attempts to catch these cases and use an alternative signed
>>> widening multiply instruction, if one of those is available.
>>>
>>> I believe this should be legal as long as the top bit of both inputs is
>>> guaranteed to be zero. The code achieves this guarantee by
>>> zero-extending the inputs to a wider mode (which must still be narrower
>>> than the output mode).
>>>
>>> OK?
>>
>> This update fixes the testsuite issue Janis pointed out.
>
> And this one fixes up the wmul-5.c testcase also. The patch has changed
> the correct result.

Here's an update for the context changed by the update to patch 3.

The content of the patch has not changed.
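
For anyone wanting to check the reasoning quoted above, here is a scaled-down 
sketch in C.  The names are invented, smul_widen_16 is only a stand-in for a 
signed widening multiply instruction, and the "wrong" variant relies on GCC's 
modular conversion to int8_t.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a signed 16x16->32 widening multiply instruction.  */
static int32_t smul_widen_16(int16_t x, int16_t y)
{
    return (int32_t)x * (int32_t)y;
}

/* Reference result: the unsigned 8x8 product.  */
static uint32_t umul_8_reference(uint8_t x, uint8_t y)
{
    return (uint32_t)x * (uint32_t)y;
}

/* Zero-extend the inputs to 16 bits (so the top bit is clear), then use the
   signed widening multiply.  */
static uint32_t umul_8_via_signed(uint8_t x, uint8_t y)
{
    int16_t wide_x = (int16_t)x;
    int16_t wide_y = (int16_t)y;
    return (uint32_t)smul_widen_16(wide_x, wide_y);
}

/* For contrast: treat the 8-bit patterns as signed without zero-extending
   first.  */
static uint32_t umul_8_wrong(uint8_t x, uint8_t y)
{
    return (uint32_t)((int32_t)(int8_t)x * (int32_t)(int8_t)y);
}

/* Count the input pairs for which the zero-extending form disagrees with
   the reference.  */
static unsigned count_umul_mismatches(void)
{
    unsigned mismatches = 0;
    for (unsigned x = 0; x <= 255; x++)
        for (unsigned y = 0; y <= 255; y++)
            if (umul_8_via_signed((uint8_t)x, (uint8_t)y)
                != umul_8_reference((uint8_t)x, (uint8_t)y))
                mismatches++;
    return mismatches;
}

/* Call this from main to run the check.  */
static void check_umul(void)
{
    assert(count_umul_mismatches() == 0);
    assert(umul_8_wrong(255, 255) != umul_8_reference(255, 255));
}
```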

Andrew

[-- Attachment #2: widening-multiplies-4.patch --]
[-- Type: text/x-patch, Size: 7611 bytes --]

2011-07-04  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* Makefile.in (tree-ssa-math-opts.o): Add langhooks.h dependency.
	* optabs.c (find_widening_optab_handler): Rename to ...
	(find_widening_optab_handler_and_mode): ... this, and add new
	argument 'found_mode'.
	* optabs.h (find_widening_optab_handler): Rename to ...
	(find_widening_optab_handler_and_mode): ... this.
	(find_widening_optab_handler): New macro.
	* tree-ssa-math-opts.c: Include langhooks.h
	(build_and_insert_cast): New function.
	(convert_mult_to_widen): Add new argument 'gsi'.
	Convert unsupported unsigned multiplies to signed.
	(convert_plusminus_to_widen): Likewise.
	(execute_optimize_widening_mul): Pass gsi to convert_mult_to_widen.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: Update expected result.
	* gcc.target/arm/wmul-6.c: New file.

--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2672,7 +2672,8 @@ tree-ssa-loop-im.o : tree-ssa-loop-im.c $(TREE_FLOW_H) $(CONFIG_H) \
 tree-ssa-math-opts.o : tree-ssa-math-opts.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(TM_H) $(FLAGS_H) $(TREE_H) $(TREE_FLOW_H) $(TIMEVAR_H) \
    $(TREE_PASS_H) alloc-pool.h $(BASIC_BLOCK_H) $(TARGET_H) \
-   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h
+   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h \
+   langhooks.h
 tree-ssa-alias.o : tree-ssa-alias.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
    $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) $(TREE_INLINE_H) $(FLAGS_H) \
    $(FUNCTION_H) $(TIMEVAR_H) convert.h $(TM_H) coretypes.h langhooks.h \
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -232,9 +232,10 @@ add_equal_note (rtx insns, rtx target, enum rtx_code code, rtx op0, rtx op1)
    non-widening optabs also.  */
 
 enum insn_code
-find_widening_optab_handler (optab op, enum machine_mode to_mode,
-			     enum machine_mode from_mode,
-			     int permit_non_widening)
+find_widening_optab_handler_and_mode (optab op, enum machine_mode to_mode,
+				      enum machine_mode from_mode,
+				      int permit_non_widening,
+				      enum machine_mode *found_mode)
 {
   for (; (permit_non_widening || from_mode != to_mode)
 	 && GET_MODE_SIZE (from_mode) <= GET_MODE_SIZE (to_mode)
@@ -245,7 +246,11 @@ find_widening_optab_handler (optab op, enum machine_mode to_mode,
 						       from_mode);
 
       if (handler != CODE_FOR_nothing)
-	return handler;
+	{
+	  if (found_mode)
+	    *found_mode = from_mode;
+	  return handler;
+	}
     }
 
   return CODE_FOR_nothing;
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -808,8 +808,13 @@ extern void emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 extern bool maybe_emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 
 /* Find a widening optab even if it doesn't widen as much as we want.  */
-extern enum insn_code find_widening_optab_handler (optab, enum machine_mode,
-						   enum machine_mode, int);
+#define find_widening_optab_handler(A,B,C,D) \
+  find_widening_optab_handler_and_mode (A, B, C, D, NULL)
+extern enum insn_code find_widening_optab_handler_and_mode (optab,
+							    enum machine_mode,
+							    enum machine_mode,
+							    int,
+							    enum machine_mode *);
 
 /* An extra flag to control optab_for_tree_code's behavior.  This is needed to
    distinguish between machines with a vector shift that takes a scalar for the
--- a/gcc/testsuite/gcc.target/arm/wmul-5.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -7,4 +7,4 @@ foo (long long a, char *b, char *c)
   return a + *b * *c;
 }
 
-/* { dg-final { scan-assembler "umlal" } } */
+/* { dg-final { scan-assembler "smlalbb" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-6.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, unsigned char *b, signed char *c)
+{
+  return a + (long long)*b * (long long)*c;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -98,6 +98,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "basic-block.h"
 #include "target.h"
 #include "gimple-pretty-print.h"
+#include "langhooks.h"
 
 /* FIXME: RTL headers have to be included here for optabs.  */
 #include "rtl.h"		/* Because optabs.h wants enum rtx_code.  */
@@ -1086,6 +1087,21 @@ build_and_insert_ref (gimple_stmt_iterator *gsi, location_t loc, tree type,
   return result;
 }
 
+/* Build a gimple assignment to cast VAL to TYPE, and put the result in
+   TARGET.  Insert the statement prior to GSI's current position, and
+   return the new SSA name.  */
+
+static tree
+build_and_insert_cast (gimple_stmt_iterator *gsi, location_t loc,
+		       tree target, tree val, tree type)
+{
+  tree result = make_ssa_name (target, NULL);
+  gimple stmt = gimple_build_assign (result, fold_convert (type, val));
+  gimple_set_location (stmt, loc);
+  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+  return result;
+}
+
 /* ARG0 and ARG1 are the two arguments to a pow builtin call in GSI
    with location info LOC.  If possible, create an equivalent and
    less expensive sequence of statements prior to GSI, and return an
@@ -2047,7 +2063,7 @@ is_widening_mult_p (gimple stmt,
    value is true iff we converted the statement.  */
 
 static bool
-convert_mult_to_widen (gimple stmt)
+convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 {
   tree lhs, rhs1, rhs2, type, type1, type2;
   enum insn_code handler;
@@ -2075,7 +2091,31 @@ convert_mult_to_widen (gimple stmt)
   handler = find_widening_optab_handler (op, to_mode, from_mode, 0);
 
   if (handler == CODE_FOR_nothing)
-    return false;
+    {
+      if (op != smul_widen_optab)
+	{
+	  from_mode = GET_MODE_WIDER_MODE (from_mode);
+	  if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode))
+	    return false;
+
+	  op = smul_widen_optab;
+	  handler = find_widening_optab_handler_and_mode (op, to_mode,
+							  from_mode, 0,
+							  &from_mode);
+
+	  if (handler == CODE_FOR_nothing)
+	    return false;
+
+	  type1 = type2 = lang_hooks.types.type_for_mode (from_mode, 0);
+
+	  rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+					create_tmp_var (type1, NULL), rhs1, type1);
+	  rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+					create_tmp_var (type2, NULL), rhs2, type2);
+	}
+      else
+	return false;
+    }
 
   gimple_assign_set_rhs1 (stmt, fold_convert (type1, rhs1));
   gimple_assign_set_rhs2 (stmt, fold_convert (type2, rhs2));
@@ -2256,7 +2296,22 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
     return false;
 
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
-    return false;
+    {
+      enum machine_mode mode = TYPE_MODE (type1);
+      mode = GET_MODE_WIDER_MODE (mode);
+      if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (TYPE_MODE (type)))
+	{
+	  type1 = type2 = lang_hooks.types.type_for_mode (mode, 0);
+	  mult_rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+					     create_tmp_var (type1, NULL),
+					     mult_rhs1, type1);
+	  mult_rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+					     create_tmp_var (type2, NULL),
+					     mult_rhs2, type2);
+	}
+      else
+	return false;
+    }
 
   /* Verify that the convertions between the mult and the add doesn't do
      anything unexpected.  */
@@ -2489,7 +2544,7 @@ execute_optimize_widening_mul (void)
 	      switch (code)
 		{
 		case MULT_EXPR:
-		  if (!convert_mult_to_widen (stmt)
+		  if (!convert_mult_to_widen (stmt, &gsi)
 		      && convert_mult_to_fma (stmt,
 					      gimple_assign_rhs1 (stmt),
 					      gimple_assign_rhs2 (stmt)))

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (5/7)] Widening multiplies for mis-matched mode inputs
  2011-06-28 15:44   ` Andrew Stubbs
@ 2011-07-04 14:29     ` Andrew Stubbs
  2011-07-07 10:11       ` Richard Guenther
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-04 14:29 UTC (permalink / raw)
  Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 671 bytes --]

On 28/06/11 16:08, Andrew Stubbs wrote:
> On 23/06/11 15:41, Andrew Stubbs wrote:
>> This patch removes the restriction that the inputs to a widening
>> multiply must be of the same mode.
>>
>> It does this by extending the smaller of the two inputs to match the
>> larger; therefore, it remains the case that subsequent code (in the
>> expand pass, for example) can rely on the type of rhs1 being the input
>> type of the operation, and the gimple verification code is still valid.
>>
>> OK?
>
> This update fixes the testcase issue Janis highlighted.

And this one updates the context changed by my update to patch 3.

The content of the patch has not changed.
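
As a rough illustration of the idea (plain C rather than gimple; the function names below are hypothetical and not part of the patch), extending the narrower input to the mode of the wider one does not change the value of the multiply-and-accumulate, which is why the smaller operand can be cast up before the widening operation is emitted:

```c
#include <assert.h>
#include <stdint.h>

/* As written by the user: an 8-bit input times a 16-bit input,
   accumulated into a 64-bit value (compare wmul-7.c).  */
static uint64_t
madd_mixed (uint64_t acc, uint8_t b, uint16_t c)
{
  return acc + (uint64_t) b * c;
}

/* Sketch of the transformed form: the narrower input is first extended
   to the width of the wider one, so both multiply operands are 16 bits.  */
static uint64_t
madd_extended (uint64_t acc, uint8_t b, uint16_t c)
{
  uint16_t b_wide = b;  /* Stands in for the inserted cast.  */
  return acc + (uint64_t) b_wide * (uint64_t) c;
}

/* Compare the two forms over every 8-bit value and a sample of
   16-bit values, including both extremes.  */
static void
self_check (void)
{
  unsigned b, c;

  for (b = 0; b <= UINT8_MAX; b++)
    for (c = 0; c <= UINT16_MAX; c += 257)
      assert (madd_mixed (1000, (uint8_t) b, (uint16_t) c)
              == madd_extended (1000, (uint8_t) b, (uint16_t) c));
}
```

Calling self_check () from a test driver exercises both forms; this only demonstrates the arithmetic equivalence and says nothing about which instruction a given target will select.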

Andrew

[-- Attachment #2: widening-multiplies-5.patch --]
[-- Type: text/x-patch, Size: 4121 bytes --]

2011-06-28  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_p): Remove FIXME.
	Ensure the larger type is the first operand.
	(convert_mult_to_widen): Insert cast if type2 is smaller than type1.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-7.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-7.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+unsigned long long
+foo (unsigned long long a, unsigned char *b, unsigned short *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2051,9 +2051,17 @@ is_widening_mult_p (gimple stmt,
       *type2_out = *type1_out;
     }
 
-  /* FIXME: remove this restriction.  */
-  if (TYPE_PRECISION (*type1_out) != TYPE_PRECISION (*type2_out))
-    return false;
+  /* Ensure that the larger of the two operands comes first. */
+  if (TYPE_PRECISION (*type1_out) < TYPE_PRECISION (*type2_out))
+    {
+      tree tmp;
+      tmp = *type1_out;
+      *type1_out = *type2_out;
+      *type2_out = tmp;
+      tmp = *rhs1_out;
+      *rhs1_out = *rhs2_out;
+      *rhs2_out = tmp;
+    }
 
   return true;
 }
@@ -2069,6 +2077,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
   enum insn_code handler;
   enum machine_mode to_mode, from_mode;
   optab op;
+  int cast1 = false, cast2 = false;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2107,16 +2116,26 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 	    return false;
 
 	  type1 = type2 = lang_hooks.types.type_for_mode (from_mode, 0);
-
-	  rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
-					create_tmp_var (type1, NULL), rhs1, type1);
-	  rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
-					create_tmp_var (type2, NULL), rhs2, type2);
+	  cast1 = cast2 = true;
 	}
       else
 	return false;
     }
 
+  if (TYPE_MODE (type2) != from_mode)
+    {
+      type2 = lang_hooks.types.type_for_mode (from_mode,
+					      TYPE_UNSIGNED (type2));
+      cast2 = true;
+    }
+
+  if (cast1)
+    rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+				  create_tmp_var (type1, NULL), rhs1, type1);
+  if (cast2)
+    rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+				  create_tmp_var (type2, NULL), rhs2, type2);
+
   gimple_assign_set_rhs1 (stmt, fold_convert (type1, rhs1));
   gimple_assign_set_rhs2 (stmt, fold_convert (type2, rhs2));
   gimple_assign_set_rhs_code (stmt, WIDEN_MULT_EXPR);
@@ -2215,6 +2234,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   optab this_optab;
   enum tree_code wmult_code;
   enum insn_code handler;
+  int cast1 = false, cast2 = false;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2302,17 +2322,28 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (TYPE_MODE (type)))
 	{
 	  type1 = type2 = lang_hooks.types.type_for_mode (mode, 0);
-	  mult_rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
-					     create_tmp_var (type1, NULL),
-					     mult_rhs1, type1);
-	  mult_rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
-					     create_tmp_var (type2, NULL),
-					     mult_rhs2, type2);
+	  cast1 = cast2 = true;
 	}
       else
 	return false;
     }
 
+  if (TYPE_MODE (type2) != TYPE_MODE (type1))
+    {
+      type2 = lang_hooks.types.type_for_mode (TYPE_MODE (type1),
+					      TYPE_UNSIGNED (type2));
+      cast2 = true;
+    }
+
+  if (cast1)
+    mult_rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+				       create_tmp_var (type1, NULL),
+				       mult_rhs1, type1);
+  if (cast2)
+    mult_rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+				       create_tmp_var (type2, NULL),
+				       mult_rhs2, type2);
+
   /* Verify that the convertions between the mult and the add doesn't do
      anything unexpected.  */
   if (!valid_types_for_madd_p (type1, type2, mult_rhs))

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching
  2011-06-28 15:49   ` Andrew Stubbs
@ 2011-07-04 14:32     ` Andrew Stubbs
  2011-07-07 10:20       ` Richard Guenther
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-04 14:32 UTC (permalink / raw)
  Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 1070 bytes --]

On 28/06/11 16:30, Andrew Stubbs wrote:
> On 23/06/11 15:42, Andrew Stubbs wrote:
>> This patch fixes the case where widening multiply-and-accumulate were
>> not recognised because the multiplication itself is not actually
>> widening.
>>
>> This can happen when you have "DI + SI * SI" - the multiplication will
>> be done in SImode as a non-widening multiply, and it's only the final
>> accumulate step that is widening.
>>
>> This was not recognised for two reasons:
>>
>> 1. is_widening_mult_p inferred the output type from the multiply
>> statement, which is not useful in this case.
>>
>> 2. The inputs to the multiply instruction may not have been converted at
>> all (because they're not being widened), so the pattern match failed.
>>
>> The patch fixes these issues by making the output type explicit, and by
>> permitting unconverted inputs (the types are still checked, so this is
>> safe).
>>
>> OK?
>
> This update fixes Janis' testsuite issue.

This updates the context changed by my update to patch 3.

The content of this patch has not changed.
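
To make the "DI + SI * SI" case concrete, here is a small sketch in plain C (the function names are hypothetical, and it assumes the 32-bit product does not overflow, which signed arithmetic in C requires anyway). The first function mirrors the source as written, with a non-widening multiply followed by a widening add; the second is the shape a widening multiply-and-accumulate computes:

```c
#include <assert.h>
#include <stdint.h>

/* "DI + SI * SI" as written: the product is computed in 32 bits and
   only then widened for the addition.  */
static int64_t
madd_narrow_mult (int64_t acc, int32_t b, int32_t c)
{
  int32_t prod = b * c;  /* Non-widening multiply; must not overflow.  */
  return acc + (int64_t) prod;
}

/* The widening multiply-and-accumulate form: the full 64-bit product
   is formed directly from the 32-bit inputs.  */
static int64_t
madd_widening (int64_t acc, int32_t b, int32_t c)
{
  return acc + (int64_t) b * (int64_t) c;
}

/* The two forms agree whenever the 32-bit product is representable.  */
static void
self_check (void)
{
  assert (madd_narrow_mult (10, 3, -7) == madd_widening (10, 3, -7));
  assert (madd_narrow_mult (-1, 46340, 46340)
          == madd_widening (-1, 46340, 46340));
  assert (madd_widening (10, 3, -7) == -11);
}
```

The widening form stays well defined for inputs whose product needs more than 32 bits; the narrow form does not, which is why the two are only compared on non-overflowing inputs here.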

Andrew

[-- Attachment #2: widening-multiplies-6.patch --]
[-- Type: text/x-patch, Size: 5113 bytes --]

2011-07-04  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_rhs_p): Add new argument
	'type'.
	Use 'type' from caller, not inferred from 'rhs'.
	Don't reject non-conversion statements. Do return lhs in this case.
	(is_widening_mult_p): Add new argument 'type'.
	Use 'type' from caller, not inferred from 'stmt'.
	Pass type to is_widening_mult_rhs_p.
	(convert_mult_to_widen): Pass type to is_widening_mult_p.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-8.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-8.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, int *b, int *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1963,7 +1963,8 @@ struct gimple_opt_pass pass_optimize_bswap =
  }
 };
 
-/* Return true if RHS is a suitable operand for a widening multiplication.
+/* Return true if RHS is a suitable operand for a widening multiplication,
+   assuming a target type of TYPE.
    There are two cases:
 
      - RHS makes some value at least twice as wide.  Store that value
@@ -1973,32 +1974,32 @@ struct gimple_opt_pass pass_optimize_bswap =
        but leave *TYPE_OUT untouched.  */
 
 static bool
-is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
+is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out,
+			tree *new_rhs_out)
 {
   gimple stmt;
-  tree type, type1, rhs1;
+  tree type1, rhs1;
   enum tree_code rhs_code;
 
   if (TREE_CODE (rhs) == SSA_NAME)
     {
-      type = TREE_TYPE (rhs);
       stmt = SSA_NAME_DEF_STMT (rhs);
       if (!is_gimple_assign (stmt))
 	return false;
 
-      rhs_code = gimple_assign_rhs_code (stmt);
-      if (TREE_CODE (type) == INTEGER_TYPE
-	  ? !CONVERT_EXPR_CODE_P (rhs_code)
-	  : rhs_code != FIXED_CONVERT_EXPR)
-	return false;
-
       rhs1 = gimple_assign_rhs1 (stmt);
       type1 = TREE_TYPE (rhs1);
       if (TREE_CODE (type1) != TREE_CODE (type)
 	  || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type))
 	return false;
 
-      *new_rhs_out = rhs1;
+      rhs_code = gimple_assign_rhs_code (stmt);
+      if (TREE_CODE (type) == INTEGER_TYPE
+	  ? !CONVERT_EXPR_CODE_P (rhs_code)
+	  : rhs_code != FIXED_CONVERT_EXPR)
+	*new_rhs_out = gimple_assign_lhs (stmt);
+      else
+	*new_rhs_out = rhs1;
       *type_out = type1;
       return true;
     }
@@ -2013,28 +2014,27 @@ is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
   return false;
 }
 
-/* Return true if STMT performs a widening multiplication.  If so,
-   store the unwidened types of the operands in *TYPE1_OUT and *TYPE2_OUT
-   respectively.  Also fill *RHS1_OUT and *RHS2_OUT such that converting
-   those operands to types *TYPE1_OUT and *TYPE2_OUT would give the
-   operands of the multiplication.  */
+/* Return true if STMT performs a widening multiplication, assuming the
+   output type is TYPE.  If so, store the unwidened types of the operands
+   in *TYPE1_OUT and *TYPE2_OUT respectively.  Also fill *RHS1_OUT and
+   *RHS2_OUT such that converting those operands to types *TYPE1_OUT
+   and *TYPE2_OUT would give the operands of the multiplication.  */
 
 static bool
-is_widening_mult_p (gimple stmt,
+is_widening_mult_p (tree type, gimple stmt,
 		    tree *type1_out, tree *rhs1_out,
 		    tree *type2_out, tree *rhs2_out)
 {
-  tree type;
-
-  type = TREE_TYPE (gimple_assign_lhs (stmt));
   if (TREE_CODE (type) != INTEGER_TYPE
       && TREE_CODE (type) != FIXED_POINT_TYPE)
     return false;
 
-  if (!is_widening_mult_rhs_p (gimple_assign_rhs1 (stmt), type1_out, rhs1_out))
+  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs1 (stmt), type1_out,
+			       rhs1_out))
     return false;
 
-  if (!is_widening_mult_rhs_p (gimple_assign_rhs2 (stmt), type2_out, rhs2_out))
+  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs2 (stmt), type2_out,
+			       rhs2_out))
     return false;
 
   if (*type1_out == NULL)
@@ -2084,7 +2084,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
   if (TREE_CODE (type) != INTEGER_TYPE)
     return false;
 
-  if (!is_widening_mult_p (stmt, &type1, &rhs1, &type2, &rhs2))
+  if (!is_widening_mult_p (type, stmt, &type1, &rhs1, &type2, &rhs2))
     return false;
 
   to_mode = TYPE_MODE (type);
@@ -2280,7 +2280,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 
   if (code == PLUS_EXPR && rhs1_code == MULT_EXPR)
     {
-      if (!is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
+      if (!is_widening_mult_p (type, rhs1_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
       mult_rhs = rhs1;
@@ -2288,7 +2288,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
     }
   else if (rhs2_code == MULT_EXPR)
     {
-      if (!is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
+      if (!is_widening_mult_p (type, rhs2_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
       mult_rhs = rhs2;

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-04 14:23                           ` Andrew Stubbs
@ 2011-07-07 10:00                             ` Richard Guenther
  2011-07-07 10:27                               ` Andrew Stubbs
  2011-07-11 17:01                               ` Andrew Stubbs
  0 siblings, 2 replies; 107+ messages in thread
From: Richard Guenther @ 2011-07-07 10:00 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Michael Matz, gcc-patches, patches

On Mon, Jul 4, 2011 at 4:23 PM, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 01/07/11 13:25, Richard Guenther wrote:
>>
>> Well - some operations work the same on both signedness if you
>> just care about the twos-complement result.  This includes
>> multiplication (but not for example division).  For this special
>> case I suggest to not bother trying to invent a generic predicate
>> but do something local in tree-ssa-math-opts.c.
>
> OK, here's my updated patch.
>
> I've taken the view that we *know* what size and signedness the result of
> the multiplication is, and we know what size the input to the addition must
> be, so all the check has to do is make sure it does that same conversion,
> even if by a roundabout means.
>
> What I hadn't grasped before is that when extending a value it's the source
> type that is significant, not the destination, so the checks are not as
> complex as I had thought.
>
> So, this patch adds a test to ensure that:
>
>  1. the type is not truncated so far that we lose any information; and
>
>  2. the type is only ever extended in the proper signedness.
>
> Also, just to be absolutely sure, I've also added a little bit of logic to
> permit extends that are then undone by a truncate. I'm really not sure what
> guarantees there are about what sort of cast sequences can exist? Is this
> necessary? I haven't managed to coax it to generated any examples of extends
> followed by truncates myself, but in any case, it's hardly any code and
> it'll make sure it's future proofed.
>
> OK?

I think you should assume that series of widenings, (int)(short)char_variable
are already combined.  Thus I believe you only need to consider a single
conversion in valid_types_for_madd_p.

+/* Check the input types, TYPE1 and TYPE2 to a widening multiply,

what are those types?  Is TYPE1 the result type and TYPE2 the
operand type?  If so why

+  initial_bitsize = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);

this?!

+  initial_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2);

that also looks odd.  So probably TYPE1 isn't the result type.  If they
are the types of the operands, then what operand is EXPR for?

I didn't look at the actual implementation of the function because of the
lack of understanding of the inputs.

-  if (TREE_CODE (rhs1) == SSA_NAME)
+  for (tmp = rhs1, rhs1_code = ERROR_MARK;
+       TREE_CODE (tmp) == SSA_NAME
+       && (CONVERT_EXPR_CODE_P (rhs1_code) || rhs1_code == ERROR_MARK);
+       tmp = gimple_assign_rhs1 (rhs1_stmt))
     {
-      rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
-      if (is_gimple_assign (rhs1_stmt))
-       rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+      rhs1_stmt = SSA_NAME_DEF_STMT (tmp);
+      if (!is_gimple_assign (rhs1_stmt))
+       break;
+      rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
     }

the result looks a bit like spaghetti code ... and lacks a comment
on what it is trying to do.  It looks like it sees through an arbitrary
number of conversions - possibly ones that will make the
macc invalid, as for (short)int-var * short-var + int-var.  So you'll
be pessimizing code by doing that unconditionally.  As I said
above you should at most consider one intermediate conversion.

I believe the code should be arranged such that only valid
conversions are looked through in the first place.  Valid, in
that the resulting types should still match the macc constraints.

Richard.

> Andrew
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
  2011-07-04 14:27       ` Andrew Stubbs
@ 2011-07-07 10:10         ` Richard Guenther
  2011-07-07 10:42           ` Andrew Stubbs
  2011-07-12 14:10         ` Andrew Stubbs
  1 sibling, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-07-07 10:10 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On Mon, Jul 4, 2011 at 4:26 PM, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 28/06/11 15:14, Andrew Stubbs wrote:
>>
>> On 28/06/11 13:33, Andrew Stubbs wrote:
>>>
>>> On 23/06/11 15:41, Andrew Stubbs wrote:
>>>>
>>>> If one or both of the inputs to a widening multiply are of unsigned type
>>>> then the compiler will attempt to use usmul_widen_optab or
>>>> umul_widen_optab, respectively.
>>>>
>>>> That works fine, but only if the target supports those operations
>>>> directly. Otherwise, it just bombs out and reverts to the normal
>>>> inefficient non-widening multiply.
>>>>
>>>> This patch attempts to catch these cases and use an alternative signed
>>>> widening multiply instruction, if one of those is available.
>>>>
>>>> I believe this should be legal as long as the top bit of both inputs is
>>>> guaranteed to be zero. The code achieves this guarantee by
>>>> zero-extending the inputs to a wider mode (which must still be narrower
>>>> than the output mode).
>>>>
>>>> OK?
>>>
>>> This update fixes the testsuite issue Janis pointed out.
>>
>> And this one fixes up the wmul-5.c testcase also. The patch has changed
>> the correct result.
>
> Here's an update for the context changed by the update to patch 3.
>
> The content of the patch has not changed.

+  gimple stmt = gimple_build_assign (result, fold_convert (type, val));

please use gimple_build_assign_with_ops

-convert_mult_to_widen (gimple stmt)
+convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)

The comment needs updating for the new parameter.

+         type1 = type2 = lang_hooks.types.type_for_mode (from_mode, 0);

don't use type_for_mode, use build_nonstandard_integer_type
(GET_MODE_PRECISION (from_mode), 0) instead.

Both types are equal, so please share the temporary variable you
create

+         rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+                                       create_tmp_var (type1, NULL), rhs1, type1);
+         rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+                                       create_tmp_var (type2, NULL), rhs2, type2);

here (CSE create_tmp_var).

+         type1 = type2 = lang_hooks.types.type_for_mode (mode, 0);
+         mult_rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+                                            create_tmp_var (type1, NULL),
+                                            mult_rhs1, type1);
+         mult_rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+                                            create_tmp_var (type2, NULL),
+                                            mult_rhs2, type2);

Likewise.

Thanks,
Richard.

> Andrew
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (5/7)] Widening multiplies for mis-matched mode inputs
  2011-07-04 14:29     ` Andrew Stubbs
@ 2011-07-07 10:11       ` Richard Guenther
  2011-07-14 14:34         ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-07-07 10:11 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On Mon, Jul 4, 2011 at 4:29 PM, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 28/06/11 16:08, Andrew Stubbs wrote:
>>
>> On 23/06/11 15:41, Andrew Stubbs wrote:
>>>
>>> This patch removes the restriction that the inputs to a widening
>>> multiply must be of the same mode.
>>>
>>> It does this by extending the smaller of the two inputs to match the
>>> larger; therefore, it remains the case that subsequent code (in the
>>> expand pass, for example) can rely on the type of rhs1 being the input
>>> type of the operation, and the gimple verification code is still valid.
>>>
>>> OK?
>>
>> This update fixes the testcase issue Janis highlighted.
>
> And this one updates the context changed by my update to patch 3.
>
> The content of the patch has not changed.

Similar to the previous patch

+  if (TYPE_MODE (type2) != from_mode)
+    {
+      type2 = lang_hooks.types.type_for_mode (from_mode,
+                                             TYPE_UNSIGNED (type2));

use build_nonstandard_integer_type.

+  if (cast1)
+    rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+                                 create_tmp_var (type1, NULL), rhs1, type1);
+  if (cast2)
+    rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+                                 create_tmp_var (type2, NULL), rhs2, type2);

and CSE create_tmp_var - at this point type1 and type2 should be
the same, right?  So I guess it would be a good place to assert
types_compatible_p (type1, type2).

   gimple_assign_set_rhs1 (stmt, fold_convert (type1, rhs1));
   gimple_assign_set_rhs2 (stmt, fold_convert (type2, rhs2));

and that's now seemingly redundant ... it should probably be
gimple_assign_set_rhs1 (stmt, rhs1);, no?  A conversion isn't
a valid rhs1/2.  Similar oddity in convert_plusminus_to_widen.

+  if (TYPE_MODE (type2) != TYPE_MODE (type1))
+    {
+      type2 = lang_hooks.types.type_for_mode (TYPE_MODE (type1),
+                                             TYPE_UNSIGNED (type2));
+      cast2 = true;
+    }
+
+  if (cast1)
+    mult_rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+                                      create_tmp_var (type1, NULL),
+                                      mult_rhs1, type1);
+  if (cast2)
+    mult_rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+                                      create_tmp_var (type2, NULL),
+                                      mult_rhs2, type2);

see above.

Thanks,
Richard.

> Andrew
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching
  2011-07-04 14:32     ` Andrew Stubbs
@ 2011-07-07 10:20       ` Richard Guenther
  2011-07-14 14:35         ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-07-07 10:20 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On Mon, Jul 4, 2011 at 4:31 PM, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 28/06/11 16:30, Andrew Stubbs wrote:
>>
>> On 23/06/11 15:42, Andrew Stubbs wrote:
>>>
>>> This patch fixes the case where widening multiply-and-accumulate were
>>> not recognised because the multiplication itself is not actually
>>> widening.
>>>
>>> This can happen when you have "DI + SI * SI" - the multiplication will
>>> be done in SImode as a non-widening multiply, and it's only the final
>>> accumulate step that is widening.
>>>
>>> This was not recognised for two reasons:
>>>
>>> 1. is_widening_mult_p inferred the output type from the multiply
>>> statement, which is not useful in this case.
>>>
>>> 2. The inputs to the multiply instruction may not have been converted at
>>> all (because they're not being widened), so the pattern match failed.
>>>
>>> The patch fixes these issues by making the output type explicit, and by
>>> permitting unconverted inputs (the types are still checked, so this is
>>> safe).
>>>
>>> OK?
>>
>> This update fixes Janis' testsuite issue.
>
> This updates the context changed by my update to patch 3.
>
> The content of this patch has not changed.

Ok.

Thanks,
Richard.

> Andrew
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-07 10:00                             ` Richard Guenther
@ 2011-07-07 10:27                               ` Andrew Stubbs
  2011-07-07 12:18                                 ` Andrew Stubbs
  2011-07-11 17:01                               ` Andrew Stubbs
  1 sibling, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-07 10:27 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Michael Matz, gcc-patches, patches

On 07/07/11 10:58, Richard Guenther wrote:
> I think you should assume that series of widenings, (int)(short)char_variable
> are already combined.  Thus I believe you only need to consider a single
> conversion in valid_types_for_madd_p.

Hmm, I'm not so sure. I'll look into it a bit further.

> +/* Check the input types, TYPE1 and TYPE2 to a widening multiply,
>
> what are those types?  Is TYPE1 the result type and TYPE2 the
> operand type?  If so why

TYPE1 and TYPE2 are the inputs to the multiply. I thought I explained 
that in the comment before the function.

> +  initial_bitsize = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);
>
> this?!

The result of the multiply will be this many bits wide. This may be 
narrower than the type that holds it.

E.g., 16-bit * 8-bit gives a result at most 24-bits wide, which will 
usually be held in a 32- or 64-bit variable.
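
A quick way to check that arithmetic for the unsigned case is sketched below in plain C. The helper names are hypothetical and this is not code from the patch; it just computes the largest possible product of two unsigned values of given precisions and counts the bits it needs:

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent VALUE (0 needs 0 bits).  */
static unsigned
bits_needed (uint64_t value)
{
  unsigned bits = 0;

  while (value != 0)
    {
      bits++;
      value >>= 1;
    }
  return bits;
}

/* Largest product of an unsigned PREC1-bit value and an unsigned
   PREC2-bit value.  Requires PREC1 + PREC2 <= 64 and each below 64.  */
static uint64_t
max_unsigned_product (unsigned prec1, unsigned prec2)
{
  uint64_t max1 = (UINT64_C (1) << prec1) - 1;
  uint64_t max2 = (UINT64_C (1) << prec2) - 1;

  return max1 * max2;
}

/* 16-bit * 8-bit never needs more than 16 + 8 = 24 bits.  */
static void
self_check (void)
{
  assert (max_unsigned_product (16, 8) == UINT64_C (16711425));
  assert (bits_needed (max_unsigned_product (16, 8)) == 24);
}
```

So the sum of the two input precisions is an upper bound on the width of the product, whatever the width of the variable that ends up holding it.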

> +  initial_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2);
>
> that also looks odd.  So probably TYPE1 isn't the result type.  If they
> are the types of the operands, then what operand is EXPR for?

EXPR, as the comment says, is the addition that follows the multiply.

> -  if (TREE_CODE (rhs1) == SSA_NAME)
> +  for (tmp = rhs1, rhs1_code = ERROR_MARK;
> +       TREE_CODE (tmp) == SSA_NAME
> +       && (CONVERT_EXPR_CODE_P (rhs1_code) || rhs1_code == ERROR_MARK);
> +       tmp = gimple_assign_rhs1 (rhs1_stmt))
>       {
> -      rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
> -      if (is_gimple_assign (rhs1_stmt))
> -       rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
> +      rhs1_stmt = SSA_NAME_DEF_STMT (tmp);
> +      if (!is_gimple_assign (rhs1_stmt))
> +       break;
> +      rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
>       }
>
> the result looks a bit like spaghetti code ... and lacks a comment
> on what it is trying to do.  It looks like it sees through an arbitrary
> number of conversions - possibly ones that will make the
> macc invalid, as for (short)int-var * short-var + int-var.  So you'll
> be pessimizing code by doing that unconditionally.  As I said
> above you should at most consider one intermediate conversion.

Ok, I need to add a comment here. The code does indeed look back through 
an arbitrary number of conversions. It is searching for the last real 
operation before the addition, hoping to find a multiply.

> I believe the code should be arranged such that only valid
> conversions are looked through in the first place.  Valid, in
> that the resulting types should still match the macc constraints.

Well, it might be possible to discard some conversions initially, but 
until the multiply is found, and its input types are known, we can't
know for certain what conversions are valid.

I think I need to explain what's going on here more clearly.

   1. It finds an addition statement. It's not known yet whether it is 
part of a multiply-and-accumulate, or not.

   2. It follows the conversion chain back from each operand to see if 
it finds a multiply, or widening multiply statement.

   3. If it finds a non-widening multiply, it checks it to see if it 
could be a widening multiply-and-accumulate (it will already have been 
rejected as a widening multiply on its own, but the addition might be 
in a wider mode, or the target might provide multiply-and-accumulate 
insns that don't have corresponding widening multiply insns).

   4. (This is the new bit!) It looks to see if there are any 
conversions between the multiply and addition that can safely be ignored.

   5. If we get here, then emit any necessary conversion statements, and 
convert the addition to a WIDEN_MULT_PLUS_EXPR.

Before these changes, any conversion between the multiply and addition 
statements would prevent optimization, even though there are many cases 
where the conversions are valid, and even inserted automatically.
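
The walk back through the conversion chain (steps 1, 2 and 4 above) can be 
pictured with a small toy model. This is only a sketch: the toy_* types and 
functions are made up for illustration and are not the GCC data structures 
or the code in the patch, and it deliberately skips the type checks that 
decide whether the conversions are safe to ignore.

```c
#include <stddef.h>

/* Toy statement kinds; illustrative only, not GCC's tree codes.  */
enum toy_code { TOY_PARAM, TOY_CONVERT, TOY_MULT, TOY_WIDEN_MULT, TOY_PLUS };

/* A toy SSA definition: the operation that defines a value, plus the
   definitions of its operands (NULL where there is none).  */
struct toy_stmt
{
  enum toy_code code;
  const struct toy_stmt *rhs1;
  const struct toy_stmt *rhs2;
};

/* Walk back from OPERAND through any number of conversions and return
   the first definition that is not a conversion.  *N_CONVERSIONS
   receives the number of conversions that were skipped.  */
const struct toy_stmt *
toy_skip_conversions (const struct toy_stmt *operand, int *n_conversions)
{
  *n_conversions = 0;
  while (operand != NULL && operand->code == TOY_CONVERT)
    {
      ++*n_conversions;
      operand = operand->rhs1;
    }
  return operand;
}

/* Return nonzero if DEF is a plain or widening multiply.  */
int
toy_is_multiply (const struct toy_stmt *def)
{
  return def != NULL
         && (def->code == TOY_MULT || def->code == TOY_WIDEN_MULT);
}

/* Return nonzero if ADD is an addition and either operand leads back,
   through conversions only, to a multiply.  A real pass would still
   have to validate the types involved before transforming anything.  */
int
toy_is_macc_candidate (const struct toy_stmt *add)
{
  int n;

  if (add == NULL || add->code != TOY_PLUS)
    return 0;
  return toy_is_multiply (toy_skip_conversions (add->rhs1, &n))
         || toy_is_multiply (toy_skip_conversions (add->rhs2, &n));
}
```

The conversion count is returned separately so that a caller could still 
reject chains it does not want to look through.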

I'm going to go away and find out whether there are really any cases 
where there can legitimately be more than one conversion, and at least 
update my patch with better commenting.

Thanks for your review.

Andrew


* Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
  2011-07-07 10:10         ` Richard Guenther
@ 2011-07-07 10:42           ` Andrew Stubbs
  2011-07-07 11:08             ` Richard Guenther
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-07 10:42 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

On 07/07/11 11:04, Richard Guenther wrote:
> Both types are equal, so please share the temporary variable you
> create
>
> +         rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
> +                                       create_tmp_var (type1, NULL),
> rhs1, type1);
> +         rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
> +                                       create_tmp_var (type2, NULL),
> rhs2, type2);
>
> here (CSE create_tmp_var).

I'm sorry, I don't understand this?

This takes code like this:

   r1 = a;
   r2 = b;
   result = r1 + r2;

And transforms it to this:

   r1 = a;
   r2 = b;
   t1 = (type1) r1;
   t2 = (type2) r2;
   result = t1 + t2;

Yes, type1 == type2, but r1 != r2, so t1 != t2.

I don't see where the common expression is here? But then, I am 
something of a newbie to tree optimizations.

Andrew


* Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
  2011-07-07 10:42           ` Andrew Stubbs
@ 2011-07-07 11:08             ` Richard Guenther
  0 siblings, 0 replies; 107+ messages in thread
From: Richard Guenther @ 2011-07-07 11:08 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On Thu, Jul 7, 2011 at 12:41 PM, Andrew Stubbs <andrew.stubbs@gmail.com> wrote:
> On 07/07/11 11:04, Richard Guenther wrote:
>>
>> Both types are equal, so please share the temporary variable you
>> create
>>
>> +         rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
>> +                                       create_tmp_var (type1, NULL),
>> rhs1, type1);
>> +         rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
>> +                                       create_tmp_var (type2, NULL),
>> rhs2, type2);
>>
>> here (CSE create_tmp_var).
>
> I'm sorry, I don't understand this?
>
> This takes code like this:
>
>  r1 = a;
>  r2 = b;
>  result = r1 + r2;
>
> And transforms it to this:
>
>  r1 = a;
>  r2 = b;
>  t1 = (type1) r1;
>  t2 = (type2) r2;
>  result = t1 + t2;
>
> Yes, type1 == type2, but r1 != r2, so t1 != t2.
>
> I don't see where the common expression is here? But then, I am something of
> a newbie to tree optimizations.

create_tmp_var creates a var-decl, build_and_insert_cast builds an
SSA name from it.  You can build multiple SSA names from a single
VAR_DECL, so no need to waste two VAR_DECLs for temporaries
of the same type.
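
As a rough picture of that relationship, here is a toy model in which one 
declaration hands out several distinct names. The toy_* names are invented 
for illustration and are not GCC's API. In GCC itself version numbers are 
allocated per function rather than per variable; the per-variable counter 
here is only a simplification.

```c
/* A toy variable declaration that can back many SSA names.  */
struct toy_var_decl
{
  const char *name;
  unsigned next_version;
};

/* A toy SSA name: a (declaration, version) pair.  */
struct toy_ssa_name
{
  const struct toy_var_decl *var;
  unsigned version;
};

/* Create a fresh SSA name based on VAR.  Each call yields a new
   version, so one declaration is enough for any number of
   temporaries of the same type.  */
struct toy_ssa_name
toy_make_ssa_name (struct toy_var_decl *var)
{
  struct toy_ssa_name name = { var, var->next_version++ };
  return name;
}
```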

Richard.

> Andrew
>


* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-07 10:27                               ` Andrew Stubbs
@ 2011-07-07 12:18                                 ` Andrew Stubbs
  2011-07-07 12:34                                   ` Richard Guenther
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-07 12:18 UTC (permalink / raw)
  Cc: Richard Guenther, Michael Matz, gcc-patches, patches

On 07/07/11 11:26, Andrew Stubbs wrote:
> On 07/07/11 10:58, Richard Guenther wrote:
>> I think you should assume that series of widenings,
>> (int)(short)char_variable
>> are already combined.  Thus I believe you only need to consider a single
>> conversion in valid_types_for_madd_p.
>
> Hmm, I'm not so sure. I'll look into it a bit further.

OK, here's a test case that gives multiple conversions:

   long long
   foo (long long a, signed char b, signed char c)
   {
     int bc = b * c;
     return a + (short)bc;
   }

The dump right before the widen_mult pass gives:

   foo (long long int a, signed char b, signed char c)
   {
     int bc;
     long long int D.2018;
     short int D.2017;
     long long int D.2016;
     int D.2015;
     int D.2014;

   <bb 2>:
     D.2014_2 = (int) b_1(D);
     D.2015_4 = (int) c_3(D);
     bc_5 = D.2014_2 * D.2015_4;
     D.2017_6 = (short int) bc_5;
     D.2018_7 = (long long int) D.2017_6;
     D.2016_9 = D.2018_7 + a_8(D);
     return D.2016_9;

   }

Here we have a multiply and accumulate done the long way. The 8-bit 
inputs are widened to 32-bit, multiplied to give a 32-bit result (of 
which only the lower 16-bits contain meaningful data), then truncated to 
16-bits, and sign-extended up to 64-bits ready for the 64-bit addition.

This is slightly contrived, perhaps, but not unlike the sort of thing that 
might occur when you have inline functions and macros, and most 
importantly - it is mathematically valid!
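
The "mathematically valid" claim can be checked by brute force. The sketch 
below assumes a 16-bit short and models the (short int) truncation with an 
explicit wrap so that the C is well defined; the function names are made up 
for this illustration.

```c
#include <stdint.h>

/* Keep the low 16 bits of X and reinterpret them as a signed value.
   This models the (short int) truncation in the dump without relying
   on implementation-defined narrowing conversions.  */
int32_t
wrap_to_i16 (int32_t x)
{
  uint32_t low = (uint32_t) x & 0xFFFFu;
  return low >= 0x8000u ? (int32_t) low - 0x10000 : (int32_t) low;
}

/* Count the pairs of signed 8-bit inputs whose product is changed by
   truncating it to 16 bits.  */
int
count_lossy_products_i8 (void)
{
  int lossy = 0;

  for (int b = INT8_MIN; b <= INT8_MAX; b++)
    for (int c = INT8_MIN; c <= INT8_MAX; c++)
      {
        int32_t bc = b * c;
        if (wrap_to_i16 (bc) != bc)
          lossy++;
      }
  return lossy;
}
```

The largest magnitude is (-128) * (-128) = 16384, which is still within the 
16-bit signed range, so the count comes out as zero.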

So, here's the output from my patched widen_mult pass:

   foo (long long int a, signed char b, signed char c)
   {
     int bc;
     long long int D.2018;
     short int D.2017;
     long long int D.2016;
     int D.2015;
     int D.2014;

   <bb 2>:
     D.2014_2 = (int) b_1(D);
     D.2015_4 = (int) c_3(D);
     bc_5 = b_1(D) w* c_3(D);
     D.2017_6 = (short int) bc_5;
     D.2018_7 = (long long int) D.2017_6;
     D.2016_9 = WIDEN_MULT_PLUS_EXPR <b_1(D), c_3(D), a_8(D)>;
     return D.2016_9;

   }

As you can see, everything except the WIDEN_MULT_PLUS_EXPR statement is 
now redundant. (Ideally, the redundant statements would be removed now, 
but in fact they don't get eliminated until the RTL into_cfglayout pass. 
This is not new behaviour.)

My point is that it's possible to have at least two conversions to 
examine. Is it possible to have more? I don't know, but once I'm dealing 
with two I might as well deal with an arbitrary number.

Andrew


* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-07 12:18                                 ` Andrew Stubbs
@ 2011-07-07 12:34                                   ` Richard Guenther
  2011-07-07 12:49                                     ` Richard Guenther
  0 siblings, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-07-07 12:34 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Michael Matz, gcc-patches, patches

On Thu, Jul 7, 2011 at 1:43 PM, Andrew Stubbs <andrew.stubbs@gmail.com> wrote:
> On 07/07/11 11:26, Andrew Stubbs wrote:
>>
>> On 07/07/11 10:58, Richard Guenther wrote:
>>>
>>> I think you should assume that series of widenings,
>>> (int)(short)char_variable
>>> are already combined.  Thus I believe you only need to consider a single
>>> conversion in valid_types_for_madd_p.
>>
>> Hmm, I'm not so sure. I'll look into it a bit further.
>
> OK, here's a test case that gives multiple conversions:
>
>  long long
>  foo (long long a, signed char b, signed char c)
>  {
>    int bc = b * c;
>    return a + (short)bc;
>  }
>
> The dump right before the widen_mult pass gives:
>
>  foo (long long int a, signed char b, signed char c)
>  {
>    int bc;
>    long long int D.2018;
>    short int D.2017;
>    long long int D.2016;
>    int D.2015;
>    int D.2014;
>
>  <bb 2>:
>    D.2014_2 = (int) b_1(D);
>    D.2015_4 = (int) c_3(D);
>    bc_5 = D.2014_2 * D.2015_4;
>    D.2017_6 = (short int) bc_5;

Ok, so you have a truncation that is a no-op value-wise.  I would
argue that this truncation should be removed independent of
whether we have a widening multiply instruction or not.

The technically most capable place to remove non-value-changing
truncations (and combine them with a successive conversion)
would be value-range propagation.  Which already knows:

Value ranges after VRP:

b_1(D): VARYING
D.2698_2: [-128, 127]
c_3(D): VARYING
D.2699_4: [-128, 127]
bc_5: [-16256, 16384]
D.2701_6: [-16256, 16384]
D.2702_7: [-16256, 16384]
a_8(D): VARYING
D.2700_9: VARYING

thus truncating bc_5 to short does not change the value.

The simplification could be made when looking at the
statement

>    D.2018_7 = (long long int) D.2017_6;

in vrp_fold_stmt, based on the fact that this conversion
converts from a value-preserving intermediate conversion.
Thus the transform would replace the D.2017_6 operand
with bc_5.
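
The range reasoning can be reproduced with a small interval sketch. This 
is not the tree-vrp.c code; the toy_range type and helpers are invented 
for illustration, and they assume ranges small enough that the products 
fit in 64 bits.

```c
#include <stdint.h>

/* A toy closed interval [min, max].  */
struct toy_range
{
  int64_t min;
  int64_t max;
};

/* Range of X * Y: the extremes are always among the four products of
   the end points.  */
struct toy_range
toy_range_mult (struct toy_range x, struct toy_range y)
{
  int64_t p[4] = { x.min * y.min, x.min * y.max,
                   x.max * y.min, x.max * y.max };
  struct toy_range r = { p[0], p[0] };

  for (int i = 1; i < 4; i++)
    {
      if (p[i] < r.min)
        r.min = p[i];
      if (p[i] > r.max)
        r.max = p[i];
    }
  return r;
}

/* Return nonzero if every value in R is representable in a type whose
   limits are TYPE_MIN and TYPE_MAX, i.e. converting to that type
   cannot change the value.  */
int
toy_range_fits_p (struct toy_range r, int64_t type_min, int64_t type_max)
{
  return r.min >= type_min && r.max <= type_max;
}
```

For two [-128, 127] inputs this gives [-16256, 16384], matching the range 
shown for bc_5, and that range fits a 16-bit short.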

So yes, the case appears - but it shouldn't ;)

I'll cook up a quick patch for VRP.

Thanks,
Richard.

>    D.2016_9 = D.2018_7 + a_8(D);
>    return D.2016_9;
>
>  }
>
> Here we have a multiply and accumulate done the long way. The 8-bit inputs
> are widened to 32-bit, multiplied to give a 32-bit result (of which only the
> lower 16-bits contain meaningful data), then truncated to 16-bits, and
> sign-extended up to 64-bits ready for the 64-bit addition.
>
> This is slightly contrived, perhaps, but not unlike the sort of thing that
> might occur when you have inline functions and macros, and most importantly
> - it is mathematically valid!
>
>
> So, here's the output from my patched widen_mult pass:
>
>  foo (long long int a, signed char b, signed char c)
>  {
>    int bc;
>    long long int D.2018;
>    short int D.2017;
>    long long int D.2016;
>    int D.2015;
>    int D.2014;
>
>  <bb 2>:
>    D.2014_2 = (int) b_1(D);
>    D.2015_4 = (int) c_3(D);
>    bc_5 = b_1(D) w* c_3(D);
>    D.2017_6 = (short int) bc_5;
>    D.2018_7 = (long long int) D.2017_6;
>    D.2016_9 = WIDEN_MULT_PLUS_EXPR <b_1(D), c_3(D), a_8(D)>;
>    return D.2016_9;
>
>  }
>
> As you can see, everything except the WIDEN_MULT_PLUS_EXPR statement is now
> redundant. (Ideally, this would be removed now, but in fact it doesn't get
> eliminated until the RTL into_cfglayout pass. This is not new behaviour.)
>
>
> My point is that it's possible to have at least two conversions to examine.
> Is it possible to have more? I don't know, but once I'm dealing with two I
> might as well deal with an arbitrary number.
>
> Andrew
>


* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-07 12:34                                   ` Richard Guenther
@ 2011-07-07 12:49                                     ` Richard Guenther
  2011-07-08 12:55                                       ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-07-07 12:49 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Michael Matz, gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 2336 bytes --]

On Thu, Jul 7, 2011 at 2:28 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Thu, Jul 7, 2011 at 1:43 PM, Andrew Stubbs <andrew.stubbs@gmail.com> wrote:
>> On 07/07/11 11:26, Andrew Stubbs wrote:
>>>
>>> On 07/07/11 10:58, Richard Guenther wrote:
>>>>
>>>> I think you should assume that series of widenings,
>>>> (int)(short)char_variable
>>>> are already combined.  Thus I believe you only need to consider a single
>>>> conversion in valid_types_for_madd_p.
>>>
>>> Hmm, I'm not so sure. I'll look into it a bit further.
>>
>> OK, here's a test case that gives multiple conversions:
>>
>>  long long
>>  foo (long long a, signed char b, signed char c)
>>  {
>>    int bc = b * c;
>>    return a + (short)bc;
>>  }
>>
>> The dump right before the widen_mult pass gives:
>>
>>  foo (long long int a, signed char b, signed char c)
>>  {
>>    int bc;
>>    long long int D.2018;
>>    short int D.2017;
>>    long long int D.2016;
>>    int D.2015;
>>    int D.2014;
>>
>>  <bb 2>:
>>    D.2014_2 = (int) b_1(D);
>>    D.2015_4 = (int) c_3(D);
>>    bc_5 = D.2014_2 * D.2015_4;
>>    D.2017_6 = (short int) bc_5;
>
> Ok, so you have a truncation that is a no-op value-wise.  I would
> argue that this truncation should be removed independent of
> whether we have a widening multiply instruction or not.
>
> The technically most capable place to remove non-value-changing
> truncations (and combine them with a successive conversion)
> would be value-range propagation.  Which already knows:
>
> Value ranges after VRP:
>
> b_1(D): VARYING
> D.2698_2: [-128, 127]
> c_3(D): VARYING
> D.2699_4: [-128, 127]
> bc_5: [-16256, 16384]
> D.2701_6: [-16256, 16384]
> D.2702_7: [-16256, 16384]
> a_8(D): VARYING
> D.2700_9: VARYING
>
> thus truncating bc_5 to short does not change the value.
>
> The simplification could be made when looking at the
> statement
>
>>    D.2018_7 = (long long int) D.2017_6;
>
> in vrp_fold_stmt, based on the fact that this conversion
> converts from a value-preserving intermediate conversion.
> Thus the transform would replace the D.2017_6 operand
> with bc_5.
>
> So yes, the case appears - but it shouldn't ;)
>
> I'll cook up a quick patch for VRP.

Like the attached.  I'll finish and properly test it.

Richard.

[-- Attachment #2: p --]
[-- Type: application/octet-stream, Size: 10194 bytes --]

Index: gcc/tree-vrp.c
===================================================================
--- gcc/tree-vrp.c	(revision 175962)
+++ gcc/tree-vrp.c	(working copy)
@@ -161,10 +161,10 @@ static VEC (switch_update, heap) *to_upd
 static inline tree
 vrp_val_max (const_tree type)
 {
-  if (!INTEGRAL_TYPE_P (type))
-    return NULL_TREE;
+  if (INTEGRAL_TYPE_P (type))
+    return upper_bound_in_type (CONST_CAST_TREE (type), CONST_CAST_TREE (type));
 
-  return TYPE_MAX_VALUE (type);
+  return NULL_TREE;
 }
 
 /* Return the minimum value for TYPE.  */
@@ -172,10 +172,10 @@ vrp_val_max (const_tree type)
 static inline tree
 vrp_val_min (const_tree type)
 {
-  if (!INTEGRAL_TYPE_P (type))
-    return NULL_TREE;
+  if (INTEGRAL_TYPE_P (type))
+    return lower_bound_in_type (CONST_CAST_TREE (type), CONST_CAST_TREE (type));
 
-  return TYPE_MIN_VALUE (type);
+  return NULL_TREE;
 }
 
 /* Return whether VAL is equal to the maximum value of its type.  This
@@ -565,7 +565,7 @@ set_value_range_to_nonnegative (value_ra
   set_value_range (vr, VR_RANGE, zero,
 		   (overflow_infinity
 		    ? positive_overflow_infinity (type)
-		    : TYPE_MAX_VALUE (type)),
+		    : vrp_val_max (type)),
 		   vr->equiv);
 }
 
@@ -1627,7 +1627,7 @@ extract_range_from_assert (value_range_t
     }
   else if (cond_code == LE_EXPR || cond_code == LT_EXPR)
     {
-      min = TYPE_MIN_VALUE (type);
+      min = vrp_val_min (type);
 
       if (limit_vr == NULL || limit_vr->type == VR_ANTI_RANGE)
 	max = limit;
@@ -1662,7 +1662,7 @@ extract_range_from_assert (value_range_t
     }
   else if (cond_code == GE_EXPR || cond_code == GT_EXPR)
     {
-      max = TYPE_MAX_VALUE (type);
+      max = vrp_val_max (type);
 
       if (limit_vr == NULL || limit_vr->type == VR_ANTI_RANGE)
 	min = limit;
@@ -2079,11 +2079,11 @@ vrp_int_const_binop (enum tree_code code
 	  || code == ROUND_DIV_EXPR)
 	return (needs_overflow_infinity (TREE_TYPE (res))
 		? positive_overflow_infinity (TREE_TYPE (res))
-		: TYPE_MAX_VALUE (TREE_TYPE (res)));
+		: vrp_val_max (TREE_TYPE (res)));
       else
 	return (needs_overflow_infinity (TREE_TYPE (res))
 		? negative_overflow_infinity (TREE_TYPE (res))
-		: TYPE_MIN_VALUE (TREE_TYPE (res)));
+		: vrp_val_min (TREE_TYPE (res)));
     }
 
   return res;
@@ -2888,8 +2888,8 @@ extract_range_from_unary_expr (value_ran
 	  && TYPE_PRECISION (inner_type) < TYPE_PRECISION (outer_type))
 	{
 	  vr0.type = VR_RANGE;
-	  vr0.min = TYPE_MIN_VALUE (inner_type);
-	  vr0.max = TYPE_MAX_VALUE (inner_type);
+	  vr0.min = vrp_val_min (inner_type);
+	  vr0.max = vrp_val_max (inner_type);
 	}
 
       /* If VR0 is a constant range or anti-range and the conversion is
@@ -2974,7 +2974,7 @@ extract_range_from_unary_expr (value_ran
 	    }
 	}
       else
-	min = TYPE_MIN_VALUE (type);
+	min = vrp_val_min (type);
 
       if (is_positive_overflow_infinity (vr0.min))
 	max = negative_overflow_infinity (type);
@@ -2993,7 +2993,7 @@ extract_range_from_unary_expr (value_ran
 	    }
 	}
       else
-	max = TYPE_MIN_VALUE (type);
+	max = vrp_val_min (type);
     }
   else if (code == NEGATE_EXPR
 	   && TYPE_UNSIGNED (type))
@@ -3035,7 +3035,7 @@ extract_range_from_unary_expr (value_ran
       else if (!vrp_val_is_min (vr0.min))
 	min = fold_unary_to_constant (code, type, vr0.min);
       else if (!needs_overflow_infinity (type))
-	min = TYPE_MAX_VALUE (type);
+	min = vrp_val_max (type);
       else if (supports_overflow_infinity (type))
 	min = positive_overflow_infinity (type);
       else
@@ -3049,7 +3049,7 @@ extract_range_from_unary_expr (value_ran
       else if (!vrp_val_is_min (vr0.max))
 	max = fold_unary_to_constant (code, type, vr0.max);
       else if (!needs_overflow_infinity (type))
-	max = TYPE_MAX_VALUE (type);
+	max = vrp_val_max (type);
       else if (supports_overflow_infinity (type)
 	       /* We shouldn't generate [+INF, +INF] as set_value_range
 		  doesn't like this and ICEs.  */
@@ -3079,7 +3079,7 @@ extract_range_from_unary_expr (value_ran
 	         TYPE_MIN_VALUE, remember -TYPE_MIN_VALUE = TYPE_MIN_VALUE.  */
 	      if (TYPE_OVERFLOW_WRAPS (type))
 		{
-		  tree type_min_value = TYPE_MIN_VALUE (type);
+		  tree type_min_value = vrp_val_min (type);
 
 		  min = (vr0.min != type_min_value
 			 ? int_const_binop (PLUS_EXPR, type_min_value,
@@ -3091,7 +3091,7 @@ extract_range_from_unary_expr (value_ran
 		  if (overflow_infinity_range_p (&vr0))
 		    min = negative_overflow_infinity (type);
 		  else
-		    min = TYPE_MIN_VALUE (type);
+		    min = vrp_val_min (type);
 		}
 	    }
 	  else
@@ -3112,7 +3112,7 @@ extract_range_from_unary_expr (value_ran
 		    }
 		}
 	      else
-		max = TYPE_MAX_VALUE (type);
+		max = vrp_val_max (type);
 	    }
 	}
 
@@ -3396,11 +3396,11 @@ adjust_range_with_scev (value_range_t *v
   if (POINTER_TYPE_P (type) || !TYPE_MIN_VALUE (type))
     tmin = lower_bound_in_type (type, type);
   else
-    tmin = TYPE_MIN_VALUE (type);
+    tmin = vrp_val_min (type);
   if (POINTER_TYPE_P (type) || !TYPE_MAX_VALUE (type))
     tmax = upper_bound_in_type (type, type);
   else
-    tmax = TYPE_MAX_VALUE (type);
+    tmax = vrp_val_max (type);
 
   /* Try to use estimated number of iterations for the loop to constrain the
      final value in the evolution.  */
@@ -4318,8 +4318,8 @@ extract_code_and_val_from_cond_with_ops
   if ((comp_code == GT_EXPR || comp_code == LT_EXPR)
       && INTEGRAL_TYPE_P (TREE_TYPE (val)))
     {
-      tree min = TYPE_MIN_VALUE (TREE_TYPE (val));
-      tree max = TYPE_MAX_VALUE (TREE_TYPE (val));
+      tree min = vrp_val_min (TREE_TYPE (val));
+      tree max = vrp_val_max (TREE_TYPE (val));
 
       if (comp_code == GT_EXPR
 	  && (!max
@@ -6685,7 +6685,7 @@ vrp_visit_phi_node (gimple phi)
 	{
 	  if (!needs_overflow_infinity (TREE_TYPE (vr_result.min))
 	      || !vrp_var_may_overflow (lhs, phi))
-	    vr_result.min = TYPE_MIN_VALUE (TREE_TYPE (vr_result.min));
+	    vr_result.min = vrp_val_min (TREE_TYPE (vr_result.min));
 	  else if (supports_overflow_infinity (TREE_TYPE (vr_result.min)))
 	    vr_result.min =
 		negative_overflow_infinity (TREE_TYPE (vr_result.min));
@@ -6697,7 +6697,7 @@ vrp_visit_phi_node (gimple phi)
 	{
 	  if (!needs_overflow_infinity (TREE_TYPE (vr_result.max))
 	      || !vrp_var_may_overflow (lhs, phi))
-	    vr_result.max = TYPE_MAX_VALUE (TREE_TYPE (vr_result.max));
+	    vr_result.max = vrp_val_max (TREE_TYPE (vr_result.max));
 	  else if (supports_overflow_infinity (TREE_TYPE (vr_result.max)))
 	    vr_result.max =
 		positive_overflow_infinity (TREE_TYPE (vr_result.max));
@@ -7119,7 +7119,7 @@ test_for_singularity (enum tree_code con
     {
       /* This should not be negative infinity; there is no overflow
 	 here.  */
-      min = TYPE_MIN_VALUE (TREE_TYPE (op0));
+      min = vrp_val_min (TREE_TYPE (op0));
 
       max = op1;
       if (cond_code == LT_EXPR && !is_overflow_infinity (max))
@@ -7134,7 +7134,7 @@ test_for_singularity (enum tree_code con
     {
       /* This should not be positive infinity; there is no overflow
 	 here.  */
-      max = TYPE_MAX_VALUE (TREE_TYPE (op0));
+      max = vrp_val_max (TREE_TYPE (op0));
 
       min = op1;
       if (cond_code == GT_EXPR && !is_overflow_infinity (min))
@@ -7342,6 +7342,33 @@ simplify_switch_using_ranges (gimple stm
   return false;
 }
 
+/* Simplify an integral conversion from an SSA name in STMT.  */
+
+static bool
+simplify_conversion_using_ranges (gimple stmt)
+{
+  tree rhs1 = gimple_assign_rhs1 (stmt);
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+  gimple def_stmt = SSA_NAME_DEF_STMT (rhs1);
+  value_range_t *vr;
+
+  if (!is_gimple_assign (def_stmt)
+      || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt)))
+    return false;
+  rhs1 = gimple_assign_rhs1 (def_stmt);
+  if (TREE_CODE (rhs1) != SSA_NAME)
+    return false;
+  vr = get_value_range (rhs1);
+  if (vr->type != VR_RANGE)
+    return false;
+  if (!int_fits_type_p (vr->min, type)
+      || !int_fits_type_p (vr->max, type))
+    return false;
+  gimple_assign_set_rhs1 (stmt, rhs1);
+  update_stmt (stmt);
+  return true;
+}
+
 /* Simplify STMT using ranges if possible.  */
 
 static bool
@@ -7351,6 +7378,7 @@ simplify_stmt_using_ranges (gimple_stmt_
   if (is_gimple_assign (stmt))
     {
       enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
+      tree rhs1 = gimple_assign_rhs1 (stmt);
 
       switch (rhs_code)
 	{
@@ -7364,7 +7392,7 @@ simplify_stmt_using_ranges (gimple_stmt_
 	     or identity if the RHS is zero or one, and the LHS are known
 	     to be boolean values.  Transform all TRUTH_*_EXPR into
              BIT_*_EXPR if both arguments are known to be boolean values.  */
-	  if (INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))))
+	  if (INTEGRAL_TYPE_P (TREE_TYPE (rhs1)))
 	    return simplify_truth_ops_using_ranges (gsi, stmt);
 	  break;
 
@@ -7373,15 +7401,15 @@ simplify_stmt_using_ranges (gimple_stmt_
 	 than zero and the second operand is an exact power of two.  */
 	case TRUNC_DIV_EXPR:
 	case TRUNC_MOD_EXPR:
-	  if (INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+	  if (INTEGRAL_TYPE_P (TREE_TYPE (rhs1))
 	      && integer_pow2p (gimple_assign_rhs2 (stmt)))
 	    return simplify_div_or_mod_using_ranges (stmt);
 	  break;
 
       /* Transform ABS (X) into X or -X as appropriate.  */
 	case ABS_EXPR:
-	  if (TREE_CODE (gimple_assign_rhs1 (stmt)) == SSA_NAME
-	      && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))))
+	  if (TREE_CODE (rhs1) == SSA_NAME
+	      && INTEGRAL_TYPE_P (TREE_TYPE (rhs1)))
 	    return simplify_abs_using_ranges (stmt);
 	  break;
 
@@ -7390,10 +7418,16 @@ simplify_stmt_using_ranges (gimple_stmt_
 	  /* Optimize away BIT_AND_EXPR and BIT_IOR_EXPR
 	     if all the bits being cleared are already cleared or
 	     all the bits being set are already set.  */
-	  if (INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))))
+	  if (INTEGRAL_TYPE_P (TREE_TYPE (rhs1)))
 	    return simplify_bit_ops_using_ranges (gsi, stmt);
 	  break;
 
+	CASE_CONVERT:
+	  if (TREE_CODE (rhs1) == SSA_NAME
+	      && INTEGRAL_TYPE_P (TREE_TYPE (rhs1)))
+	    return simplify_conversion_using_ranges (stmt);
+	  break;
+
 	default:
 	  break;
 	}


* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-07 12:49                                     ` Richard Guenther
@ 2011-07-08 12:55                                       ` Andrew Stubbs
  2011-07-08 13:22                                         ` Richard Guenther
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-08 12:55 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Michael Matz, gcc-patches, patches

On 07/07/11 13:37, Richard Guenther wrote:
>> I'll cook up a quick patch for VRP.
>
> Like the attached.  I'll finish and properly test it.

Your patch appears to do the wrong thing for this test case:

int
foo (int a, short b, short c)
{
   int bc = b * c;
   return a + (short)bc;
}

With your patch, the input to the widening-mult pass now looks like this:

foo (int a, short int b, short int c)
{
   int bc;
   int D.2016;
   int D.2015;
   int D.2014;

<bb 2>:
   D.2014_2 = (int) b_1(D);
   D.2015_4 = (int) c_3(D);
   bc_5 = D.2014_2 * D.2015_4;
   D.2016_9 = bc_5 + a_8(D);
   return D.2016_9;

}

It looks like when the user deliberately tries to break the maths, your 
patch unbreaks it.
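
To make the difference concrete, here is a sketch of the two behaviours. 
It assumes a 16-bit short and a 32-bit int, and models the truncation with 
an explicit wrap; the function names are invented for illustration. My 
reading of the posted diff is that the range is compared against the type 
of the final result rather than the intermediate short, but that is an 
inference, not something confirmed in this thread.

```c
#include <stdint.h>

/* Keep the low 16 bits of X and reinterpret them as a signed value,
   modelling the (short) cast in well-defined C.  */
int32_t
wrap_to_i16 (int32_t x)
{
  uint32_t low = (uint32_t) x & 0xFFFFu;
  return low >= 0x8000u ? (int32_t) low - 0x10000 : (int32_t) low;
}

/* The test case as written: a + (short)(b * c).  */
int32_t
foo_as_written (int32_t a, int16_t b, int16_t c)
{
  int32_t bc = (int32_t) b * c;
  return a + wrap_to_i16 (bc);
}

/* The dump above: the truncation has disappeared, so the full 32-bit
   product is added.  */
int32_t
foo_without_truncation (int32_t a, int16_t b, int16_t c)
{
  int32_t bc = (int32_t) b * c;
  return a + bc;
}
```

With b = c = 300 the product is 90000, which does not fit in 16 bits, so 
the two functions disagree; for small inputs such as 3 and 4 they agree.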

Andrew


* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-08 12:55                                       ` Andrew Stubbs
@ 2011-07-08 13:22                                         ` Richard Guenther
  0 siblings, 0 replies; 107+ messages in thread
From: Richard Guenther @ 2011-07-08 13:22 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Michael Matz, gcc-patches, patches

On Fri, Jul 8, 2011 at 2:44 PM, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 07/07/11 13:37, Richard Guenther wrote:
>>>
>>> I'll cook up a quick patch for VRP.
>>
>> Like the attached.  I'll finish and properly test it.
>
> Your patch appears to do the wrong thing for this test case:
>
> int
> foo (int a, short b, short c)
> {
>  int bc = b * c;
>  return a + (short)bc;
> }
>
> With your patch, the input to the widening-mult pass now looks like this:
>
> foo (int a, short int b, short int c)
> {
>  int bc;
>  int D.2016;
>  int D.2015;
>  int D.2014;
>
> <bb 2>:
>  D.2014_2 = (int) b_1(D);
>  D.2015_4 = (int) c_3(D);
>  bc_5 = D.2014_2 * D.2015_4;
>  D.2016_9 = bc_5 + a_8(D);
>  return D.2016_9;
>
> }
>
> It looks like when the user deliberately tries to break the maths, your
> patch unbreaks it.

Yeah, I fixed that in the checked-in version.

Richard.

> Andrew
>


* Re: [PATCH (1/7)] New optab framework for widening multiplies
  2011-06-23 14:39 ` [PATCH (1/7)] New optab framework for widening multiplies Andrew Stubbs
@ 2011-07-09 15:38   ` Andrew Stubbs
  2011-07-14 15:29     ` Andrew Stubbs
  2011-07-22 13:01     ` Bernd Schmidt
  0 siblings, 2 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-09 15:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

[-- Attachment #1: Type: text/plain, Size: 3533 bytes --]

On 23/06/11 15:37, Andrew Stubbs wrote:
> This patch should have no effect on the compiler output. It merely
> replaces one way to represent widening operations with another, and
> refactors the other parts of the compiler to match. The rest of the
> patch set uses this new framework to implement the optimization
> improvements.
>
> I considered and discarded many approaches to this patch before arriving
> at this solution, and I feel sure that there'll be somebody out there
> who will think I chose the wrong one, so let me first explain how I got
> here ....
>
> The aim is to be able to encode and query optabs that have any given
> input mode, and any given output mode. This is similar to the
> convert_optab, but not compatible with that optab since it is handled
> completely differently in the code.
>
> (Just to be clear, the existing widening multiply support only covers
> instructions that widen by *one* mode, so it's only ever been necessary
> to know the output mode, up to now.)
>
> Option 1 was to add a second dimension to the handlers table in optab_d,
> but I discarded this option because it would increase the memory usage
> by the square of the number of modes, which is a bit much.
>
> Option 2 was to add a whole new optab, similar to optab_d, but with a
> second dimension like convert_optab_d, however this turned out to cause
> way too many pointer type mismatches in the code, and would have been
> very difficult to fix up.
>
> Option 3 was to add new optab entries for widening by two modes, by
> three modes, and so on. True, I would only need to add one extra set for
> what I need, but there would be so many places in the code that compare
> against smul_widen_optab, for example, that would need to be taught
> about these, that it seemed like a bad idea.
>
> Option 4 was to have a separate table that contained the widening
> operations, and refer to that whenever a widening entry in the main
> optab is referenced, but I found that there was no easy way to do the
> mapping without putting some sort of switch table in
> widening_optab_handler, and that negates the other advantages.
>
> So, what I've done in the end is add a new pointer entry "widening" into
> optab_d, and dynamically build the widening operations table for each
> optab that needs it. I've then added new accessor functions that take
> both input and output modes, and altered the code to use them where
> appropriate.
>
> The down-side of this approach is that the optab entries for widening
> operations now have two "handlers" tables, one of which is redundant.
> That said, those cases are in the minority, and it is the smaller table
> which is unused.
>
> If people find that very distasteful, it might be possible to remove the
> *_widen_optab entries and unify smul_optab with smul_widen_optab, and so
> on, and save space that way. I've not done so yet, but I expect I could
> if people feel strongly about it.
>
> As a side-effect, it's now possible for any optab to be "widening",
> should some target happen to have a widening add, shift, or whatever.
>
> Is this patch OK?

This update has been rebaselined to fix some conflicts with other recent 
commits in this area.

I also identified a small bug which resulted in the operands to some 
commutative operations being reversed. I don't believe the bug did any 
harm, logically speaking, but I suppose there could be a testcase that 
resulted in worse code being generated. With this fix, I now see exactly 
matching output in all my testcases.
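
For anyone skimming the design notes quoted above, the shape of the chosen 
approach is roughly as follows. This is a toy model only: the toy_* names, 
the four-mode enum, and the equal-modes fallback are assumptions made for 
illustration and are not copied from the attached patch.

```c
#include <stdlib.h>

/* A handful of toy modes; the real mode list is much longer.  */
enum toy_mode { TOY_QI, TOY_HI, TOY_SI, TOY_DI, TOY_NUM_MODES };

/* Zero means "no instruction", so zero-filled tables start out empty.  */
enum { TOY_CODE_FOR_NOTHING = 0 };

/* Two-dimensional table, indexed [to_mode][from_mode].  */
struct toy_widening_handlers
{
  int insn_code[TOY_NUM_MODES][TOY_NUM_MODES];
};

/* A toy optab: the usual one-dimensional table, plus a pointer to a
   widening table that exists only for optabs that need one.  */
struct toy_optab
{
  int insn_code[TOY_NUM_MODES];
  struct toy_widening_handlers *widening;
};

/* Look up the handler that takes FROM_MODE inputs and produces a
   TO_MODE result.  Equal modes fall back to the ordinary table.  */
int
toy_widening_handler (const struct toy_optab *op,
                      enum toy_mode to_mode, enum toy_mode from_mode)
{
  if (to_mode == from_mode)
    return op->insn_code[to_mode];
  if (op->widening == NULL)
    return TOY_CODE_FOR_NOTHING;
  return op->widening->insn_code[to_mode][from_mode];
}

/* Record CODE as the handler for FROM_MODE -> TO_MODE, building the
   widening table on first use.  Returns 0 on success, -1 if the
   allocation fails.  */
int
toy_set_widening_handler (struct toy_optab *op, enum toy_mode to_mode,
                          enum toy_mode from_mode, int code)
{
  if (to_mode == from_mode)
    {
      op->insn_code[to_mode] = code;
      return 0;
    }
  if (op->widening == NULL)
    {
      op->widening = calloc (1, sizeof *op->widening);
      if (op->widening == NULL)
        return -1;
    }
  op->widening->insn_code[to_mode][from_mode] = code;
  return 0;
}
```

Only optabs that register a widening handler pay for the square table, 
which is the memory trade-off weighed against option 1.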

Andrew

[-- Attachment #2: widening-multiplies-1.patch --]
[-- Type: text/x-patch, Size: 14205 bytes --]

2011-07-09  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* expr.c (expand_expr_real_2): Use widening_optab_handler.
	* genopinit.c (optabs): Use set_widening_optab_handler for $N.
	(gen_insn): $N now means $a must be wider than $b, not consecutive.
	* optabs.c (expand_widen_pattern_expr): Use widening_optab_handler.
	(expand_binop_directly): Likewise.
	(expand_binop): Likewise.
	* optabs.h (widening_optab_handlers): New struct.
	(optab_d): New member, 'widening'.
	(widening_optab_handler): New function.
	(set_widening_optab_handler): New function.
	* tree-ssa-math-opts.c (convert_mult_to_widen): Use
	widening_optab_handler.
	(convert_plusminus_to_widen): Likewise.

--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7640,7 +7640,8 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 	  this_optab = usmul_widen_optab;
 	  if (mode == GET_MODE_2XWIDER_MODE (innermode))
 	    {
-	      if (optab_handler (this_optab, mode) != CODE_FOR_nothing)
+	      if (widening_optab_handler (this_optab, mode, innermode)
+		    != CODE_FOR_nothing)
 		{
 		  if (TYPE_UNSIGNED (TREE_TYPE (treeop0)))
 		    expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
@@ -7667,7 +7668,8 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 	  if (mode == GET_MODE_2XWIDER_MODE (innermode)
 	      && TREE_CODE (treeop0) != INTEGER_CST)
 	    {
-	      if (optab_handler (this_optab, mode) != CODE_FOR_nothing)
+	      if (widening_optab_handler (this_optab, mode, innermode)
+		    != CODE_FOR_nothing)
 		{
 		  expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
 				   EXPAND_NORMAL);
@@ -7675,7 +7677,8 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 					       unsignedp, this_optab);
 		  return REDUCE_BIT_FIELD (temp);
 		}
-	      if (optab_handler (other_optab, mode) != CODE_FOR_nothing
+	      if (widening_optab_handler (other_optab, mode, innermode)
+		    != CODE_FOR_nothing
 		  && innermode == word_mode)
 		{
 		  rtx htem, hipart;
--- a/gcc/genopinit.c
+++ b/gcc/genopinit.c
@@ -46,10 +46,12 @@ along with GCC; see the file COPYING3.  If not see
    used.  $A and $B are replaced with the full name of the mode; $a and $b
    are replaced with the short form of the name, as above.
 
-   If $N is present in the pattern, it means the two modes must be consecutive
-   widths in the same mode class (e.g, QImode and HImode).  $I means that
-   only full integer modes should be considered for the next mode, and $F
-   means that only float modes should be considered.
+   If $N is present in the pattern, it means the two modes must be in
+   the same mode class, and $b must be greater than $a (e.g, QImode
+   and HImode).
+
+   $I means that only full integer modes should be considered for the
+   next mode, and $F means that only float modes should be considered.
    $P means that both full and partial integer modes should be considered.
    $Q means that only fixed-point modes should be considered.
 
@@ -99,17 +101,17 @@ static const char * const optabs[] =
   "set_optab_handler (smulv_optab, $A, CODE_FOR_$(mulv$I$a3$))",
   "set_optab_handler (umul_highpart_optab, $A, CODE_FOR_$(umul$a3_highpart$))",
   "set_optab_handler (smul_highpart_optab, $A, CODE_FOR_$(smul$a3_highpart$))",
-  "set_optab_handler (smul_widen_optab, $B, CODE_FOR_$(mul$a$b3$)$N)",
-  "set_optab_handler (umul_widen_optab, $B, CODE_FOR_$(umul$a$b3$)$N)",
-  "set_optab_handler (usmul_widen_optab, $B, CODE_FOR_$(usmul$a$b3$)$N)",
-  "set_optab_handler (smadd_widen_optab, $B, CODE_FOR_$(madd$a$b4$)$N)",
-  "set_optab_handler (umadd_widen_optab, $B, CODE_FOR_$(umadd$a$b4$)$N)",
-  "set_optab_handler (ssmadd_widen_optab, $B, CODE_FOR_$(ssmadd$a$b4$)$N)",
-  "set_optab_handler (usmadd_widen_optab, $B, CODE_FOR_$(usmadd$a$b4$)$N)",
-  "set_optab_handler (smsub_widen_optab, $B, CODE_FOR_$(msub$a$b4$)$N)",
-  "set_optab_handler (umsub_widen_optab, $B, CODE_FOR_$(umsub$a$b4$)$N)",
-  "set_optab_handler (ssmsub_widen_optab, $B, CODE_FOR_$(ssmsub$a$b4$)$N)",
-  "set_optab_handler (usmsub_widen_optab, $B, CODE_FOR_$(usmsub$a$b4$)$N)",
+  "set_widening_optab_handler (smul_widen_optab, $B, $A, CODE_FOR_$(mul$a$b3$)$N)",
+  "set_widening_optab_handler (umul_widen_optab, $B, $A, CODE_FOR_$(umul$a$b3$)$N)",
+  "set_widening_optab_handler (usmul_widen_optab, $B, $A, CODE_FOR_$(usmul$a$b3$)$N)",
+  "set_widening_optab_handler (smadd_widen_optab, $B, $A, CODE_FOR_$(madd$a$b4$)$N)",
+  "set_widening_optab_handler (umadd_widen_optab, $B, $A, CODE_FOR_$(umadd$a$b4$)$N)",
+  "set_widening_optab_handler (ssmadd_widen_optab, $B, $A, CODE_FOR_$(ssmadd$a$b4$)$N)",
+  "set_widening_optab_handler (usmadd_widen_optab, $B, $A, CODE_FOR_$(usmadd$a$b4$)$N)",
+  "set_widening_optab_handler (smsub_widen_optab, $B, $A, CODE_FOR_$(msub$a$b4$)$N)",
+  "set_widening_optab_handler (umsub_widen_optab, $B, $A, CODE_FOR_$(umsub$a$b4$)$N)",
+  "set_widening_optab_handler (ssmsub_widen_optab, $B, $A, CODE_FOR_$(ssmsub$a$b4$)$N)",
+  "set_widening_optab_handler (usmsub_widen_optab, $B, $A, CODE_FOR_$(usmsub$a$b4$)$N)",
   "set_optab_handler (sdiv_optab, $A, CODE_FOR_$(div$a3$))",
   "set_optab_handler (ssdiv_optab, $A, CODE_FOR_$(ssdiv$Q$a3$))",
   "set_optab_handler (sdivv_optab, $A, CODE_FOR_$(div$V$I$a3$))",
@@ -305,7 +307,7 @@ gen_insn (rtx insn)
     {
       int force_float = 0, force_int = 0, force_partial_int = 0;
       int force_fixed = 0;
-      int force_consec = 0;
+      int force_wider = 0;
       int matches = 1;
 
       for (pp = optabs[pindex]; pp[0] != '$' || pp[1] != '('; pp++)
@@ -323,7 +325,7 @@ gen_insn (rtx insn)
 	    switch (*++pp)
 	      {
 	      case 'N':
-		force_consec = 1;
+		force_wider = 1;
 		break;
 	      case 'I':
 		force_int = 1;
@@ -392,7 +394,10 @@ gen_insn (rtx insn)
 			    || mode_class[i] == MODE_VECTOR_FRACT
 			    || mode_class[i] == MODE_VECTOR_UFRACT
 			    || mode_class[i] == MODE_VECTOR_ACCUM
-			    || mode_class[i] == MODE_VECTOR_UACCUM))
+			    || mode_class[i] == MODE_VECTOR_UACCUM)
+			&& (! force_wider
+			    || *pp == 'a'
+			    || m1 < i))
 		      break;
 		  }
 
@@ -412,8 +417,7 @@ gen_insn (rtx insn)
 	}
 
       if (matches && pp[0] == '$' && pp[1] == ')'
-	  && *np == 0
-	  && (! force_consec || (int) GET_MODE_WIDER_MODE(m1) == m2))
+	  && *np == 0)
 	break;
     }
 
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -515,8 +515,8 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
   if (ops->code == WIDEN_MULT_PLUS_EXPR
       || ops->code == WIDEN_MULT_MINUS_EXPR)
-    icode = optab_handler (widen_pattern_optab,
-			   TYPE_MODE (TREE_TYPE (ops->op2)));
+    icode = widening_optab_handler (widen_pattern_optab,
+				    TYPE_MODE (TREE_TYPE (ops->op2)), tmode0);
   else
     icode = optab_handler (widen_pattern_optab, tmode0);
   gcc_assert (icode != CODE_FOR_nothing);
@@ -1242,7 +1242,8 @@ expand_binop_directly (enum machine_mode mode, optab binoptab,
 		       rtx target, int unsignedp, enum optab_methods methods,
 		       rtx last)
 {
-  enum insn_code icode = optab_handler (binoptab, mode);
+  enum machine_mode from_mode = GET_MODE (op0);
+  enum insn_code icode = widening_optab_handler (binoptab, mode, from_mode);
   enum machine_mode xmode0 = insn_data[(int) icode].operand[1].mode;
   enum machine_mode xmode1 = insn_data[(int) icode].operand[2].mode;
   enum machine_mode mode0, mode1, tmp_mode;
@@ -1389,7 +1390,8 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
   /* If we can do it with a three-operand insn, do so.  */
 
   if (methods != OPTAB_MUST_WIDEN
-      && optab_handler (binoptab, mode) != CODE_FOR_nothing)
+      && widening_optab_handler (binoptab, mode, GET_MODE (op0))
+	    != CODE_FOR_nothing)
     {
       temp = expand_binop_directly (mode, binoptab, op0, op1, target,
 				    unsignedp, methods, last);
@@ -1429,8 +1431,9 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 
   if (binoptab == smul_optab
       && GET_MODE_2XWIDER_MODE (mode) != VOIDmode
-      && (optab_handler ((unsignedp ? umul_widen_optab : smul_widen_optab),
-			 GET_MODE_2XWIDER_MODE (mode))
+      && (widening_optab_handler ((unsignedp ? umul_widen_optab
+					     : smul_widen_optab),
+				  GET_MODE_2XWIDER_MODE (mode), mode)
 	  != CODE_FOR_nothing))
     {
       temp = expand_binop (GET_MODE_2XWIDER_MODE (mode),
@@ -1457,12 +1460,14 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	 wider_mode != VOIDmode;
 	 wider_mode = GET_MODE_WIDER_MODE (wider_mode))
       {
-	if (optab_handler (binoptab, wider_mode) != CODE_FOR_nothing
+	if (optab_handler (binoptab, wider_mode)
+		!= CODE_FOR_nothing
 	    || (binoptab == smul_optab
 		&& GET_MODE_WIDER_MODE (wider_mode) != VOIDmode
-		&& (optab_handler ((unsignedp ? umul_widen_optab
-				    : smul_widen_optab),
-				   GET_MODE_WIDER_MODE (wider_mode))
+		&& (widening_optab_handler ((unsignedp ? umul_widen_optab
+						       : smul_widen_optab),
+					    GET_MODE_WIDER_MODE (wider_mode),
+					    mode)
 		    != CODE_FOR_nothing)))
 	  {
 	    rtx xop0 = op0, xop1 = op1;
@@ -1895,8 +1900,8 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
       && optab_handler (add_optab, word_mode) != CODE_FOR_nothing)
     {
       rtx product = NULL_RTX;
-
-      if (optab_handler (umul_widen_optab, mode) != CODE_FOR_nothing)
+      if (widening_optab_handler (umul_widen_optab, mode, word_mode)
+	    != CODE_FOR_nothing)
 	{
 	  product = expand_doubleword_mult (mode, op0, op1, target,
 					    true, methods);
@@ -1905,7 +1910,8 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	}
 
       if (product == NULL_RTX
-	  && optab_handler (smul_widen_optab, mode) != CODE_FOR_nothing)
+	  && widening_optab_handler (smul_widen_optab, mode, word_mode)
+		!= CODE_FOR_nothing)
 	{
 	  product = expand_doubleword_mult (mode, op0, op1, target,
 					    false, methods);
@@ -1996,7 +2002,8 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	   wider_mode != VOIDmode;
 	   wider_mode = GET_MODE_WIDER_MODE (wider_mode))
 	{
-	  if (optab_handler (binoptab, wider_mode) != CODE_FOR_nothing
+	  if (widening_optab_handler (binoptab, wider_mode, mode)
+		  != CODE_FOR_nothing
 	      || (methods == OPTAB_LIB
 		  && optab_libfunc (binoptab, wider_mode)))
 	    {
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -42,6 +42,11 @@ struct optab_handlers
   int insn_code;
 };
 
+struct widening_optab_handlers
+{
+  struct optab_handlers handlers[NUM_MACHINE_MODES][NUM_MACHINE_MODES];
+};
+
 struct optab_d
 {
   enum rtx_code code;
@@ -50,6 +55,7 @@ struct optab_d
   void (*libcall_gen)(struct optab_d *, const char *name, char suffix,
 		      enum machine_mode);
   struct optab_handlers handlers[NUM_MACHINE_MODES];
+  struct widening_optab_handlers *widening;
 };
 typedef struct optab_d * optab;
 
@@ -876,6 +882,23 @@ optab_handler (optab op, enum machine_mode mode)
 			   + (int) CODE_FOR_nothing);
 }
 
+/* Like optab_handler, but for widening operations that have a TO_MODE and
+   a FROM_MODE.  */
+
+static inline enum insn_code
+widening_optab_handler (optab op, enum machine_mode to_mode,
+			enum machine_mode from_mode)
+{
+  if (to_mode == from_mode)
+    return optab_handler (op, to_mode);
+
+  if (op->widening)
+    return (enum insn_code) (op->widening->handlers[(int) to_mode][(int) from_mode].insn_code
+			     + (int) CODE_FOR_nothing);
+
+  return CODE_FOR_nothing;
+}
+
 /* Record that insn CODE should be used to implement mode MODE of OP.  */
 
 static inline void
@@ -884,6 +907,26 @@ set_optab_handler (optab op, enum machine_mode mode, enum insn_code code)
   op->handlers[(int) mode].insn_code = (int) code - (int) CODE_FOR_nothing;
 }
 
+/* Like set_optab_handler, but for widening operations that have a TO_MODE
+   and a FROM_MODE.  */
+
+static inline void
+set_widening_optab_handler (optab op, enum machine_mode to_mode,
+			    enum machine_mode from_mode, enum insn_code code)
+{
+  if (to_mode == from_mode)
+    set_optab_handler (op, to_mode, code);
+  else
+    {
+      if (op->widening == NULL)
+	op->widening = (struct widening_optab_handlers *)
+	      xcalloc (1, sizeof (struct widening_optab_handlers));
+
+      op->widening->handlers[(int) to_mode][(int) from_mode].insn_code
+	  = (int) code - (int) CODE_FOR_nothing;
+    }
+}
+
 /* Return the insn used to perform conversion OP from mode FROM_MODE
    to mode TO_MODE; return CODE_FOR_nothing if the target does not have
    such an insn.  */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2055,6 +2055,8 @@ convert_mult_to_widen (gimple stmt)
 {
   tree lhs, rhs1, rhs2, type, type1, type2;
   enum insn_code handler;
+  enum machine_mode to_mode, from_mode;
+  optab op;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2064,12 +2066,17 @@ convert_mult_to_widen (gimple stmt)
   if (!is_widening_mult_p (stmt, &type1, &rhs1, &type2, &rhs2))
     return false;
 
+  to_mode = TYPE_MODE (type);
+  from_mode = TYPE_MODE (type1);
+
   if (TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2))
-    handler = optab_handler (umul_widen_optab, TYPE_MODE (type));
+    op = umul_widen_optab;
   else if (!TYPE_UNSIGNED (type1) && !TYPE_UNSIGNED (type2))
-    handler = optab_handler (smul_widen_optab, TYPE_MODE (type));
+    op = smul_widen_optab;
   else
-    handler = optab_handler (usmul_widen_optab, TYPE_MODE (type));
+    op = usmul_widen_optab;
+
+  handler = widening_optab_handler (op, to_mode, from_mode);
 
   if (handler == CODE_FOR_nothing)
     return false;
@@ -2171,7 +2178,8 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */
   this_optab = optab_for_tree_code (wmult_code, type1, optab_default);
-  if (optab_handler (this_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
+  if (widening_optab_handler (this_optab, TYPE_MODE (type), TYPE_MODE (type1))
+	== CODE_FOR_nothing)
     return false;
 
   /* ??? May need some type verification here?  */


* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-07 10:00                             ` Richard Guenther
  2011-07-07 10:27                               ` Andrew Stubbs
@ 2011-07-11 17:01                               ` Andrew Stubbs
  2011-07-12 11:05                                 ` Richard Guenther
  2011-07-14 14:26                                 ` Andrew Stubbs
  1 sibling, 2 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-11 17:01 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Michael Matz, gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 1017 bytes --]

On 07/07/11 10:58, Richard Guenther wrote:
> I think you should assume that series of widenings, (int)(short)char_variable
> are already combined.  Thus I believe you only need to consider a single
> conversion in valid_types_for_madd_p.

Ok, here's my new patch.

This version only allows one conversion between the multiply and 
addition, so assumes that VRP has eliminated any needless ones.

That one conversion may either be a truncate, if the mode was too large 
for the meaningful data, or an extend, which must be of the right flavour.

This means that this patch now has the same effect as the last patch, 
for all valid cases (following your VRP patch), but rejects the cases 
where the C language (unhelpfully) requires an intermediate temporary to 
be of the 'wrong' signedness.

Hopefully the output will now be the same between both -O0 and -O2, and 
programmers will continue to have to be careful about casting unsigned 
variables whenever they expect purely unsigned math. :(
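
The truncate/extend rule described above can be sketched in isolation
as follows. This is a standalone model of the check in the attached
patch, not the GCC code itself: struct model_type and
model_conversion_ok are invented names, and a type is reduced to just
its precision and signedness.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduction of a GCC integer type to the two properties
   the check looks at.  */
struct model_type
{
  unsigned precision;
  bool is_unsigned;
};

/* Return true if a single conversion FROM -> TO, sitting between a
   multiply of TYPE1 by TYPE2 and the following add, leaves the
   product's value intact.  */
bool
model_conversion_ok (struct model_type type1, struct model_type type2,
                     struct model_type from, struct model_type to)
{
  /* The product needs at most this many bits.  */
  unsigned data_size = type1.precision + type2.precision;
  bool is_unsigned = type1.is_unsigned && type2.is_unsigned;

  assert (type1.precision > 0 && type2.precision > 0);

  if (from.precision > to.precision)
    /* Truncate: only safe if every meaningful bit survives.  */
    return to.precision >= data_size;

  if (from.precision < to.precision)
    /* Extend: must match the multiply's signedness, unless the product
       is unsigned and already has spare high bits, in which case the
       flavour of the extension cannot change the value.  */
    return from.is_unsigned == is_unsigned
           || (is_unsigned && from.precision > data_size);

  /* Same precision: a no-op for our purposes.  */
  return true;
}
```

Applied to the two new testcases: no-wmla-1.c truncates a 32-bit
short*short product to 16 bits, so the check fails and a plain "mul" is
expected; wmul-5.c (assuming plain char is unsigned, as on ARM) extends
a 16-bit product held in a 32-bit int, so the check passes.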

Is this one ok?

Andrew

[-- Attachment #2: widening-multiplies-3.patch --]
[-- Type: text/x-patch, Size: 4415 bytes --]

2011-07-11  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (convert_plusminus_to_widen): Permit a single
	conversion statement separating multiply-and-accumulate.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: New file.
	* gcc.target/arm/no-wmla-1.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+int
+foo (int a, short b, short c)
+{
+     int bc = b * c;
+        return a + (short)bc;
+}
+
+/* { dg-final { scan-assembler "mul" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, char *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2135,6 +2135,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			    enum tree_code code)
 {
   gimple rhs1_stmt = NULL, rhs2_stmt = NULL;
+  gimple conv1_stmt = NULL, conv2_stmt = NULL, conv_stmt;
   tree type, type1, type2;
   tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs;
   enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK;
@@ -2175,6 +2176,38 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   else
     return false;
 
+  /* Allow for one conversion statement between the multiply
+     and addition/subtraction statement.  If there is more than
+     one conversion then we assume they would invalidate this
+     transformation.  If that's not the case then they should have
+     been folded before now.  */
+  if (CONVERT_EXPR_CODE_P (rhs1_code))
+    {
+      conv1_stmt = rhs1_stmt;
+      rhs1 = gimple_assign_rhs1 (rhs1_stmt);
+      if (TREE_CODE (rhs1) == SSA_NAME)
+	{
+	  rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
+	  if (is_gimple_assign (rhs1_stmt))
+	    rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+	}
+      else
+	return false;
+    }
+  if (CONVERT_EXPR_CODE_P (rhs2_code))
+    {
+      conv2_stmt = rhs2_stmt;
+      rhs2 = gimple_assign_rhs1 (rhs2_stmt);
+      if (TREE_CODE (rhs2) == SSA_NAME)
+	{
+	  rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
+	  if (is_gimple_assign (rhs2_stmt))
+	    rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
+	}
+      else
+	return false;
+    }
+
   /* If code is WIDEN_MULT_EXPR then it would seem unnecessary to call
      is_widening_mult_p, but we still need the rhs returns.
 
@@ -2188,6 +2221,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs2;
+      conv_stmt = conv1_stmt;
     }
   else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR)
     {
@@ -2195,6 +2229,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs1;
+      conv_stmt = conv2_stmt;
     }
   else
     return false;
@@ -2202,6 +2237,33 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
     return false;
 
+  /* If there was a conversion between the multiply and addition
+     then we need to make sure it fits a multiply-and-accumulate.
+     There should be a single mode change which does not change the
+     value.  */
+  if (conv_stmt)
+    {
+      tree from_type = TREE_TYPE (gimple_assign_rhs1 (conv_stmt));
+      tree to_type = TREE_TYPE (gimple_assign_lhs (conv_stmt));
+      int data_size = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);
+      bool is_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2);
+
+      if (TYPE_PRECISION (from_type) > TYPE_PRECISION (to_type))
+	{
+	  /* Conversion is a truncate.  */
+	  if (TYPE_PRECISION (to_type) < data_size)
+	    return false;
+	}
+      else if (TYPE_PRECISION (from_type) < TYPE_PRECISION (to_type))
+	{
+	  /* Conversion is an extend.  Check it's the right sort.  */
+	  if (TYPE_UNSIGNED (from_type) != is_unsigned
+	      && !(is_unsigned && TYPE_PRECISION (from_type) > data_size))
+	    return false;
+	}
+      /* else convert is a no-op for our purposes.  */
+    }
+
   /* Verify that the machine can perform a widening multiply
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */


* Re: [PATCH (2/7)] Widening multiplies by more than one mode
  2011-06-23 14:41 ` [PATCH (2/7)] Widening multiplies by more than one mode Andrew Stubbs
@ 2011-07-12 10:15   ` Andrew Stubbs
  2011-07-12 11:05     ` Richard Guenther
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-12 10:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

[-- Attachment #1: Type: text/plain, Size: 1368 bytes --]

On 23/06/11 15:39, Andrew Stubbs wrote:
> This patch has two effects:
>
> 1. It permits the use of widening multiply instructions that widen by
> more than one mode. E.g. HImode -> DImode.
>
> 2. It enables the use of widening multiply instructions for (extended)
> inputs of narrower mode than the instruction takes. E.g. QImode ->
> DImode where only HI->DI or SI->DI is available.
>
> Hopefully, most of the patch is self-explanatory, but here are a few notes:
>
> The code introduces a temporary FIXME comment; this will be removed
> later in the patch series. In fact, this is not a new restriction;
> previously "type1" and "type2" were implicitly identical because they
> were required to be one mode smaller than "type".
>
> I regard the ARM portion of this patch as obvious, so I don't think I
> need an ARM maintainer to read this.
>
> Is the patch OK?

I found a bug in this patch. It seems I do need to add casts for the 
inputs to widening multiplies (even though I know the registers are 
already fine), because otherwise something is insisting on truncating 
the values to the minimum width, which isn't helpful when it's actually 
an instruction with wider inputs.

The mode changing bits from patch 4 have therefore been moved here. I've 
made the changes Richard Guenther requested there, I think.

Otherwise, the patch is the same as before.
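
To illustrate why the inputs may need casting: the search for a handler
can settle on a wider input mode than the operands actually have. A
rough model of that search follows. It is a sketch only, with invented
model_* names, a four-entry mode enum ordered narrowest first, and 0
standing in for CODE_FOR_nothing; the real function walks
GET_MODE_WIDER_MODE rather than incrementing an integer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's machine modes, narrowest first.  */
enum model_mode { MODEL_QI, MODEL_HI, MODEL_SI, MODEL_DI, MODEL_NUM_MODES };
enum { MODEL_CODE_FOR_NOTHING = 0 };

/* Search TABLE[to_mode][m] for m = from_mode, then each wider mode,
   and return the first handler found.  If FOUND_MODE is non-NULL it
   receives the input mode the handler really takes.  Unless
   PERMIT_NON_WIDENING is set, the search stops before m == to_mode.  */
int
model_find_widening_handler (int table[MODEL_NUM_MODES][MODEL_NUM_MODES],
                             enum model_mode to_mode,
                             enum model_mode from_mode,
                             int permit_non_widening,
                             enum model_mode *found_mode)
{
  int m;

  assert (to_mode < MODEL_NUM_MODES && from_mode < MODEL_NUM_MODES);

  for (m = (int) from_mode;
       m <= (int) to_mode && (permit_non_widening || m != (int) to_mode);
       m++)
    {
      int handler = table[to_mode][m];

      if (handler != MODEL_CODE_FOR_NOTHING)
        {
          if (found_mode)
            *found_mode = (enum model_mode) m;
          return handler;
        }
    }

  return MODEL_CODE_FOR_NOTHING;
}
```

With only an SI->DI entry registered, a request for HI->DI returns that
entry and reports MODEL_SI as the found mode; the caller then has to
extend its HImode operands to SImode before using the instruction,
which is the job the new casts do.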

Andrew


[-- Attachment #2: widening-multiplies-2.patch --]
[-- Type: text/x-patch, Size: 15998 bytes --]

2011-07-11  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* config/arm/arm.md (maddhidi4): Remove '*' from name.
	* expr.c (expand_expr_real_2): Use find_widening_optab_handler.
	* optabs.c (find_widening_optab_handler_and_mode): New function.
	(expand_widen_pattern_expr): Use find_widening_optab_handler.
	(expand_binop_directly): Likewise.
	(expand_binop): Likewise.
	* optabs.h (find_widening_optab_handler): New macro define.
	(find_widening_optab_handler_and_mode): New prototype.
	* tree-cfg.c (verify_gimple_assign_binary): Adjust WIDEN_MULT_EXPR
	type precision rules.
	(verify_gimple_assign_ternary): Likewise for WIDEN_MULT_PLUS_EXPR.
	* tree-ssa-math-opts.c (build_and_insert_cast): New function.
	(is_widening_mult_rhs_p): Allow widening by more than one mode.
	Explicitly disallow mis-matched input types.
	(convert_mult_to_widen): Use find_widening_optab_handler, and cast
	input types to fit the new handler.
	(convert_plusminus_to_widen): Likewise.

--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -1857,7 +1857,7 @@
    (set_attr "predicable" "yes")]
 )
 
-(define_insn "*maddhidi4"
+(define_insn "maddhidi4"
   [(set (match_operand:DI 0 "s_register_operand" "=r")
 	(plus:DI
 	  (mult:DI (sign_extend:DI
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7638,19 +7638,16 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 	{
 	  enum machine_mode innermode = TYPE_MODE (TREE_TYPE (treeop0));
 	  this_optab = usmul_widen_optab;
-	  if (mode == GET_MODE_2XWIDER_MODE (innermode))
+	  if (find_widening_optab_handler (this_optab, mode, innermode, 0)
+		!= CODE_FOR_nothing)
 	    {
-	      if (widening_optab_handler (this_optab, mode, innermode)
-		    != CODE_FOR_nothing)
-		{
-		  if (TYPE_UNSIGNED (TREE_TYPE (treeop0)))
-		    expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
-				     EXPAND_NORMAL);
-		  else
-		    expand_operands (treeop0, treeop1, NULL_RTX, &op1, &op0,
-				     EXPAND_NORMAL);
-		  goto binop3;
-		}
+	      if (TYPE_UNSIGNED (TREE_TYPE (treeop0)))
+		expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
+				 EXPAND_NORMAL);
+	      else
+		expand_operands (treeop0, treeop1, NULL_RTX, &op1, &op0,
+				 EXPAND_NORMAL);
+	      goto binop3;
 	    }
 	}
       /* Check for a multiplication with matching signedness.  */
@@ -7665,10 +7662,9 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 	  optab other_optab = zextend_p ? smul_widen_optab : umul_widen_optab;
 	  this_optab = zextend_p ? umul_widen_optab : smul_widen_optab;
 
-	  if (mode == GET_MODE_2XWIDER_MODE (innermode)
-	      && TREE_CODE (treeop0) != INTEGER_CST)
+	  if (TREE_CODE (treeop0) != INTEGER_CST)
 	    {
-	      if (widening_optab_handler (this_optab, mode, innermode)
+	      if (find_widening_optab_handler (this_optab, mode, innermode, 0)
 		    != CODE_FOR_nothing)
 		{
 		  expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
@@ -7677,7 +7673,7 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 					       unsignedp, this_optab);
 		  return REDUCE_BIT_FIELD (temp);
 		}
-	      if (widening_optab_handler (other_optab, mode, innermode)
+	      if (find_widening_optab_handler (other_optab, mode, innermode, 0)
 		    != CODE_FOR_nothing
 		  && innermode == word_mode)
 		{
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -225,6 +225,37 @@ add_equal_note (rtx insns, rtx target, enum rtx_code code, rtx op0, rtx op1)
   return 1;
 }
 \f
+/* Find a widening optab even if it doesn't widen as much as we want.
+   E.g. if from_mode is HImode, and to_mode is DImode, and there is no
+   direct HI->DI insn, then return SI->DI, if that exists.
+   If PERMIT_NON_WIDENING is non-zero then this can be used with
+   non-widening optabs also.  */
+
+enum insn_code
+find_widening_optab_handler_and_mode (optab op, enum machine_mode to_mode,
+				      enum machine_mode from_mode,
+				      int permit_non_widening,
+				      enum machine_mode *found_mode)
+{
+  for (; (permit_non_widening || from_mode != to_mode)
+	 && GET_MODE_SIZE (from_mode) <= GET_MODE_SIZE (to_mode)
+	 && from_mode != VOIDmode;
+       from_mode = GET_MODE_WIDER_MODE (from_mode))
+    {
+      enum insn_code handler = widening_optab_handler (op, to_mode,
+						       from_mode);
+
+      if (handler != CODE_FOR_nothing)
+	{
+	  if (found_mode)
+	    *found_mode = from_mode;
+	  return handler;
+	}
+    }
+
+  return CODE_FOR_nothing;
+}
+\f
 /* Widen OP to MODE and return the rtx for the widened operand.  UNSIGNEDP
    says whether OP is signed or unsigned.  NO_EXTEND is nonzero if we need
    not actually do a sign-extend or zero-extend, but can leave the
@@ -515,8 +546,9 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
   if (ops->code == WIDEN_MULT_PLUS_EXPR
       || ops->code == WIDEN_MULT_MINUS_EXPR)
-    icode = widening_optab_handler (widen_pattern_optab,
-				    TYPE_MODE (TREE_TYPE (ops->op2)), tmode0);
+    icode = find_widening_optab_handler (widen_pattern_optab,
+					 TYPE_MODE (TREE_TYPE (ops->op2)),
+					 tmode0, 0);
   else
     icode = optab_handler (widen_pattern_optab, tmode0);
   gcc_assert (icode != CODE_FOR_nothing);
@@ -1243,7 +1275,8 @@ expand_binop_directly (enum machine_mode mode, optab binoptab,
 		       rtx last)
 {
   enum machine_mode from_mode = GET_MODE (op0);
-  enum insn_code icode = widening_optab_handler (binoptab, mode, from_mode);
+  enum insn_code icode = find_widening_optab_handler (binoptab, mode,
+						      from_mode, 1);
   enum machine_mode xmode0 = insn_data[(int) icode].operand[1].mode;
   enum machine_mode xmode1 = insn_data[(int) icode].operand[2].mode;
   enum machine_mode mode0, mode1, tmp_mode;
@@ -1390,7 +1423,7 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
   /* If we can do it with a three-operand insn, do so.  */
 
   if (methods != OPTAB_MUST_WIDEN
-      && widening_optab_handler (binoptab, mode, GET_MODE (op0))
+      && find_widening_optab_handler (binoptab, mode, GET_MODE (op0), 1)
 	    != CODE_FOR_nothing)
     {
       temp = expand_binop_directly (mode, binoptab, op0, op1, target,
@@ -1464,10 +1497,11 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 		!= CODE_FOR_nothing
 	    || (binoptab == smul_optab
 		&& GET_MODE_WIDER_MODE (wider_mode) != VOIDmode
-		&& (widening_optab_handler ((unsignedp ? umul_widen_optab
-						       : smul_widen_optab),
-					    GET_MODE_WIDER_MODE (wider_mode),
-					    mode)
+		&& (find_widening_optab_handler ((unsignedp
+						  ? umul_widen_optab
+						  : smul_widen_optab),
+						 GET_MODE_WIDER_MODE (wider_mode),
+						 mode, 0)
 		    != CODE_FOR_nothing)))
 	  {
 	    rtx xop0 = op0, xop1 = op1;
@@ -2002,7 +2036,7 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	   wider_mode != VOIDmode;
 	   wider_mode = GET_MODE_WIDER_MODE (wider_mode))
 	{
-	  if (widening_optab_handler (binoptab, wider_mode, mode)
+	  if (find_widening_optab_handler (binoptab, wider_mode, mode, 1)
 		  != CODE_FOR_nothing
 	      || (methods == OPTAB_LIB
 		  && optab_libfunc (binoptab, wider_mode)))
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -807,6 +807,15 @@ extern rtx expand_copysign (rtx, rtx, rtx);
 extern void emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 extern bool maybe_emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 
+/* Find a widening optab even if it doesn't widen as much as we want.  */
+#define find_widening_optab_handler(A,B,C,D) \
+  find_widening_optab_handler_and_mode (A, B, C, D, NULL)
+extern enum insn_code find_widening_optab_handler_and_mode (optab,
+							    enum machine_mode,
+							    enum machine_mode,
+							    int,
+							    enum machine_mode *);
+
 /* An extra flag to control optab_for_tree_code's behavior.  This is needed to
    distinguish between machines with a vector shift that takes a scalar for the
    shift amount vs. machines that take a vector for the shift amount.  */
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3577,7 +3577,7 @@ do_pointer_plus_expr_check:
     case WIDEN_MULT_EXPR:
       if (TREE_CODE (lhs_type) != INTEGER_TYPE)
 	return true;
-      return ((2 * TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (lhs_type))
+      return ((2 * TYPE_PRECISION (rhs1_type) > TYPE_PRECISION (lhs_type))
 	      || (TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type)));
 
     case WIDEN_SUM_EXPR:
@@ -3668,7 +3668,7 @@ verify_gimple_assign_ternary (gimple stmt)
 	   && !FIXED_POINT_TYPE_P (rhs1_type))
 	  || !useless_type_conversion_p (rhs1_type, rhs2_type)
 	  || !useless_type_conversion_p (lhs_type, rhs3_type)
-	  || 2 * TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (lhs_type)
+	  || 2 * TYPE_PRECISION (rhs1_type) > TYPE_PRECISION (lhs_type)
 	  || TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type))
 	{
 	  error ("type mismatch in widening multiply-accumulate expression");
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1086,6 +1086,16 @@ build_and_insert_ref (gimple_stmt_iterator *gsi, location_t loc, tree type,
   return result;
 }
 
+/* Build a gimple assignment to cast VAL to TARGET.  Insert the statement
+   prior to GSI's current position, and return the fresh SSA name.  */
+
+static tree
+build_and_insert_cast (gimple_stmt_iterator *gsi, location_t loc,
+		       tree target, tree val)
+{
+  return build_and_insert_binop (gsi, loc, target, CONVERT_EXPR, val, NULL);
+}
+
 /* ARG0 and ARG1 are the two arguments to a pow builtin call in GSI
    with location info LOC.  If possible, create an equivalent and
    less expensive sequence of statements prior to GSI, and return an
@@ -1958,8 +1968,8 @@ struct gimple_opt_pass pass_optimize_bswap =
 /* Return true if RHS is a suitable operand for a widening multiplication.
    There are two cases:
 
-     - RHS makes some value twice as wide.  Store that value in *NEW_RHS_OUT
-       if so, and store its type in *TYPE_OUT.
+     - RHS makes some value at least twice as wide.  Store that value
+       in *NEW_RHS_OUT if so, and store its type in *TYPE_OUT.
 
      - RHS is an integer constant.  Store that value in *NEW_RHS_OUT if so,
        but leave *TYPE_OUT untouched.  */
@@ -1987,7 +1997,7 @@ is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
       rhs1 = gimple_assign_rhs1 (stmt);
       type1 = TREE_TYPE (rhs1);
       if (TREE_CODE (type1) != TREE_CODE (type)
-	  || TYPE_PRECISION (type1) * 2 != TYPE_PRECISION (type))
+	  || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type))
 	return false;
 
       *new_rhs_out = rhs1;
@@ -2043,6 +2053,10 @@ is_widening_mult_p (gimple stmt,
       *type2_out = *type1_out;
     }
 
+  /* FIXME: remove this restriction.  */
+  if (TYPE_PRECISION (*type1_out) != TYPE_PRECISION (*type2_out))
+    return false;
+
   return true;
 }
 
@@ -2051,7 +2065,7 @@ is_widening_mult_p (gimple stmt,
    value is true iff we converted the statement.  */
 
 static bool
-convert_mult_to_widen (gimple stmt)
+convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 {
   tree lhs, rhs1, rhs2, type, type1, type2;
   enum insn_code handler;
@@ -2076,13 +2090,34 @@ convert_mult_to_widen (gimple stmt)
   else
     op = usmul_widen_optab;
 
-  handler = widening_optab_handler (op, to_mode, from_mode);
+  handler = find_widening_optab_handler_and_mode (op, to_mode, from_mode,
+						  0, &from_mode);
 
   if (handler == CODE_FOR_nothing)
     return false;
 
-  gimple_assign_set_rhs1 (stmt, fold_convert (type1, rhs1));
-  gimple_assign_set_rhs2 (stmt, fold_convert (type2, rhs2));
+  if (from_mode != TYPE_MODE (type1))
+    {
+      location_t loc = gimple_location (stmt);
+      tree tmp1, tmp2;
+
+      tmp1 = create_tmp_var (
+		build_nonstandard_integer_type (
+		  GET_MODE_PRECISION (from_mode), TYPE_UNSIGNED (type1)),
+		NULL);
+      tmp2 = TYPE_UNSIGNED (type1) == TYPE_UNSIGNED (type2)
+	     ? tmp1
+	     : create_tmp_var (
+		  build_nonstandard_integer_type (
+		    GET_MODE_PRECISION (from_mode), TYPE_UNSIGNED (type1)),
+		  NULL);
+
+      rhs1 = build_and_insert_cast (gsi, loc, tmp1, rhs1);
+      rhs2 = build_and_insert_cast (gsi, loc, tmp2, rhs2);
+    }
+
+  gimple_assign_set_rhs1 (stmt, rhs1);
+  gimple_assign_set_rhs2 (stmt, rhs2);
   gimple_assign_set_rhs_code (stmt, WIDEN_MULT_EXPR);
   update_stmt (stmt);
   widen_mul_stats.widen_mults_inserted++;
@@ -2105,6 +2140,8 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK;
   optab this_optab;
   enum tree_code wmult_code;
+  enum insn_code handler;
+  enum machine_mode from_mode;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2138,36 +2175,27 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   else
     return false;
 
-  if (code == PLUS_EXPR && rhs1_code == MULT_EXPR)
+  /* If code is WIDEN_MULT_EXPR then it would seem unnecessary to call
+     is_widening_mult_p, but we still need the rhs returns.
+
+     It might also appear that it would be sufficient to use the existing
+     operands of the widening multiply, but that would limit the choice of
+     multiply-and-accumulate instructions.  */
+  if (code == PLUS_EXPR
+      && (rhs1_code == MULT_EXPR || rhs1_code == WIDEN_MULT_EXPR))
     {
       if (!is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs2;
     }
-  else if (rhs2_code == MULT_EXPR)
+  else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR)
     {
       if (!is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs1;
     }
-  else if (code == PLUS_EXPR && rhs1_code == WIDEN_MULT_EXPR)
-    {
-      mult_rhs1 = gimple_assign_rhs1 (rhs1_stmt);
-      mult_rhs2 = gimple_assign_rhs2 (rhs1_stmt);
-      type1 = TREE_TYPE (mult_rhs1);
-      type2 = TREE_TYPE (mult_rhs2);
-      add_rhs = rhs2;
-    }
-  else if (rhs2_code == WIDEN_MULT_EXPR)
-    {
-      mult_rhs1 = gimple_assign_rhs1 (rhs2_stmt);
-      mult_rhs2 = gimple_assign_rhs2 (rhs2_stmt);
-      type1 = TREE_TYPE (mult_rhs1);
-      type2 = TREE_TYPE (mult_rhs2);
-      add_rhs = rhs1;
-    }
   else
     return false;
 
@@ -2178,15 +2206,29 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */
   this_optab = optab_for_tree_code (wmult_code, type1, optab_default);
-  if (widening_optab_handler (this_optab, TYPE_MODE (type), TYPE_MODE (type1))
-	== CODE_FOR_nothing)
+  handler = find_widening_optab_handler_and_mode (this_optab,
+						  TYPE_MODE (type),
+						  TYPE_MODE (type1), 0,
+						  &from_mode);
+
+  if (handler == CODE_FOR_nothing)
     return false;
 
-  /* ??? May need some type verification here?  */
+  if (TYPE_MODE (type1) != from_mode)
+    {
+      location_t loc = gimple_location (stmt);
+      tree tmp;
+
+      tmp = create_tmp_var (
+		build_nonstandard_integer_type (
+		  GET_MODE_PRECISION (from_mode), TYPE_UNSIGNED (type1)),
+		NULL);
+
+      mult_rhs1 = build_and_insert_cast (gsi, loc, tmp, mult_rhs1);
+      mult_rhs2 = build_and_insert_cast (gsi, loc, tmp, mult_rhs2);
+    }
 
-  gimple_assign_set_rhs_with_ops_1 (gsi, wmult_code,
-				    fold_convert (type1, mult_rhs1),
-				    fold_convert (type2, mult_rhs2),
+  gimple_assign_set_rhs_with_ops_1 (gsi, wmult_code, mult_rhs1, mult_rhs2,
 				    add_rhs);
   update_stmt (gsi_stmt (*gsi));
   widen_mul_stats.maccs_inserted++;
@@ -2398,7 +2440,7 @@ execute_optimize_widening_mul (void)
 	      switch (code)
 		{
 		case MULT_EXPR:
-		  if (!convert_mult_to_widen (stmt)
+		  if (!convert_mult_to_widen (stmt, &gsi)
 		      && convert_mult_to_fma (stmt,
 					      gimple_assign_rhs1 (stmt),
 					      gimple_assign_rhs2 (stmt)))


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (2/7)] Widening multiplies by more than one mode
  2011-07-12 10:15   ` Andrew Stubbs
@ 2011-07-12 11:05     ` Richard Guenther
  2011-07-12 11:14       ` Richard Guenther
  2011-07-14 14:17       ` Andrew Stubbs
  0 siblings, 2 replies; 107+ messages in thread
From: Richard Guenther @ 2011-07-12 11:05 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On Tue, Jul 12, 2011 at 11:50 AM, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 23/06/11 15:39, Andrew Stubbs wrote:
>>
>> This patch has two effects:
>>
>> 1. It permits the use of widening multiply instructions that widen by
>> more than one mode. E.g. HImode -> DImode.
>>
>> 2. It enables the use of widening multiply instructions for (extended)
>> inputs of narrower mode than the instruction takes. E.g. QImode ->
>> DImode where only HI->DI or SI->DI is available.
>>
>> Hopefully, most of the patch is self-explanatory, but here are a few notes:
>>
>> The code introduces a temporary FIXME comment; this will be removed
>> later in the patch series. In fact, this is not a new restriction;
>> previously "type1" and "type2" were implicitly identical because they
>> were required to be one mode smaller than "type".
>>
>> I regard the ARM portion of this patch as obvious, so I don't think I
>> need an ARM maintainer to read this.
>>
>> Is the patch OK?
>
> I found a bug in this patch. It seems I do need to add casts for the inputs
> to widening multiplies (even though I know the registers are already fine),
> because otherwise something is insisting on truncating the values to the
> minimum width, which isn't helpful when it's actually an instruction with
> wider inputs.
>
> The mode changing bits from patch 4 have therefore been moved here. I've
> made the changes Richard Guenther requested there, I think.
>
> Otherwise, the patch is the same as before.

I wonder if we want to restrict the WIDEN_* operations to operate
on types that have matching type/mode precision(**).  Consider

struct {
  int a : 7;
  int b : 7;
} x;

short c = x.a * x.b;

which will be represented as (short)((int)<7-bit-type-with-QImode> *
(int)<7-bit-type-with-QImode>).

I wonder if you can do some experiments with bitfield types and see
if your patch series handles them correctly.
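
A minimal sketch of such an experiment, not taken from the patch
series. It assumes the fields are declared "signed int" (the signedness
of a plain "int" bit-field is implementation-defined), and the names
"struct bits" and "mul_fields" are hypothetical. Both operands are
promoted to int before the multiply, so the product of two 7-bit values
always fits in a short.

```c
/* Two signed 7-bit fields, each with range -64..63.  */
struct bits
{
  signed int a : 7;
  signed int b : 7;
};

/* Multiply the two fields.  The integer promotions convert both
   operands to int first, so the multiply itself is done in int and
   the result (at most 4096 in magnitude) is then narrowed to short.  */
static short
mul_fields (struct bits x)
{
  return x.a * x.b;
}
```

Comparing the assembly for this function with and without the series
applied would be one way to see whether a widening multiply is selected
for the 7-bit operands.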

As for the patch, please update tree.def with the new requirements
for the WIDEN_* codes.

As for the bitfield precisions, we probably want to reject types that
do not have TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE
(type)).  Or maybe we can allow them if we generate
correct and good code for them?
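
The rejection rule can be pictured with a toy model. This is only a
sketch under stated assumptions: "struct toy_type" and
"toy_has_mode_precision_p" are hypothetical stand-ins and are not GCC
data structures or functions; the two fields merely model the values
that TYPE_PRECISION and GET_MODE_PRECISION (TYPE_MODE (type)) would
return.

```c
#include <stdbool.h>

/* Toy stand-in for a tree type; NOT GCC's real representation.  */
struct toy_type
{
  unsigned precision;      /* models TYPE_PRECISION (type) */
  unsigned mode_precision; /* models GET_MODE_PRECISION (TYPE_MODE (type)) */
};

/* Return true if the type fills its machine mode exactly, i.e. it is
   not a bit-field-like type such as a 7-bit integer held in QImode.  */
static bool
toy_has_mode_precision_p (struct toy_type t)
{
  return t.precision == t.mode_precision;
}
```

Under this model the 7-bit type from the example above (precision 7,
mode precision 8) would be rejected, while an ordinary 16-bit short
would be accepted.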

+      tmp2 = TYPE_UNSIGNED (type1) == TYPE_UNSIGNED (type2)
+            ? tmp1
+            : create_tmp_var (
+                 build_nonstandard_integer_type (
+                   GET_MODE_PRECISION (from_mode), TYPE_UNSIGNED (type1)),
+                 NULL);

Please use an if () statement to avoid gross formatting.

+  if (TYPE_MODE (type1) != from_mode)

These kinds of checks are unsafe if type1 does not have a TYPE_PRECISION
equal to its mode precision.

Thanks,
Richard.


> Andrew
>
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-11 17:01                               ` Andrew Stubbs
@ 2011-07-12 11:05                                 ` Richard Guenther
  2011-08-19 14:50                                   ` Andrew Stubbs
  2011-07-14 14:26                                 ` Andrew Stubbs
  1 sibling, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-07-12 11:05 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Michael Matz, gcc-patches, patches

On Mon, Jul 11, 2011 at 6:55 PM, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 07/07/11 10:58, Richard Guenther wrote:
>>
>> I think you should assume that series of widenings,
>> (int)(short)char_variable
>> are already combined.  Thus I believe you only need to consider a single
>> conversion in valid_types_for_madd_p.
>
> Ok, here's my new patch.
>
> This version only allows one conversion between the multiply and addition,
> so assumes that VRP has eliminated any needless ones.
>
> That one conversion may either be a truncate, if the mode was too large for
> the meaningful data, or an extend, which must be of the right flavour.
>
> This means that this patch now has the same effect as the last patch, for
> all valid cases (following your VRP patch), but rejects the cases where the C
> language (unhelpfully) requires an intermediate temporary to be of the
> 'wrong' signedness.
>
> Hopefully the output will now be the same between both -O0 and -O2, and
> programmers will continue to have to be careful about casting unsigned
> variables whenever they expect purely unsigned math. :(
>
> Is this one ok?

Ok.

Thanks,
Richard.

> Andrew
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (2/7)] Widening multiplies by more than one mode
  2011-07-12 11:05     ` Richard Guenther
@ 2011-07-12 11:14       ` Richard Guenther
  2011-07-12 11:38         ` Andrew Stubbs
  2011-07-21 19:51         ` Joseph S. Myers
  2011-07-14 14:17       ` Andrew Stubbs
  1 sibling, 2 replies; 107+ messages in thread
From: Richard Guenther @ 2011-07-12 11:14 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On Tue, Jul 12, 2011 at 1:04 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Jul 12, 2011 at 11:50 AM, Andrew Stubbs <ams@codesourcery.com> wrote:
>> On 23/06/11 15:39, Andrew Stubbs wrote:
>>>
>>> This patch has two effects:
>>>
>>> 1. It permits the use of widening multiply instructions that widen by
>>> more than one mode. E.g. HImode -> DImode.
>>>
>>> 2. It enables the use of widening multiply instructions for (extended)
>>> inputs of narrower mode than the instruction takes. E.g. QImode ->
>>> DImode where only HI->DI or SI->DI is available.
>>>
>>> Hopefully, most of the patch is self-explanatory, but here are a few notes:
>>>
>>> The code introduces a temporary FIXME comment; this will be removed
>>> later in the patch series. In fact, this is not a new restriction;
>>> previously "type1" and "type2" were implicitly identical because they
>>> were required to be one mode smaller than "type".
>>>
>>> I regard the ARM portion of this patch as obvious, so I don't think I
>>> need an ARM maintainer to read this.
>>>
>>> Is the patch OK?
>>
>> I found a bug in this patch. It seems I do need to add casts for the inputs
>> to widening multiplies (even though I know the registers are already fine),
>> because otherwise something is insisting on truncating the values to the
>> minimum width, which isn't helpful when it's actually an instruction with
>> wider inputs.
>>
>> The mode changing bits from patch 4 have therefore been moved here. I've
>> made the changes Richard Guenther requested there, I think.
>>
>> Otherwise, the patch is the same as before.
>
> I wonder if we want to restrict the WIDEN_* operations to operate
> on types that have matching type/mode precision(**).  Consider
>
> struct {
>  int a : 7;
>  int b : 7;
> } x;
>
> short c = x.a * x.b;
>
> which will be represented as (short)((int)<7-bit-type-with-QImode> *
> (int)<7-bit-type-with-QImode>).
>
> I wonder if you can do some experiments with bitfield types and see
> if your patch series handles them correctly.
>
> As for the patch, please update tree.def with the new requirements
> for the WIDEN_* codes.
>
> As for the bitfield precisions, we probably want to reject types that
> do not have TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE
> (type)).  Or maybe we can allow them if we generate
> correct and good code for them?
>
> +      tmp2 = TYPE_UNSIGNED (type1) == TYPE_UNSIGNED (type2)
> +            ? tmp1
> +            : create_tmp_var (
> +                 build_nonstandard_integer_type (
> +                   GET_MODE_PRECISION (from_mode), TYPE_UNSIGNED (type1)),
> +                 NULL);
>
> please use an if () stmt to avoid gross formatting.
>
> +  if (TYPE_MODE (type1) != from_mode)
>
> these kind of checks are unsafe if type1 does not have a TYPE_PRECISION
> equal to its mode precision.

(**) We really ought to forbid any arithmetic on types that have non-mode
precision and only allow conversions to/from such types.

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (2/7)] Widening multiplies by more than one mode
  2011-07-12 11:14       ` Richard Guenther
@ 2011-07-12 11:38         ` Andrew Stubbs
  2011-07-12 11:51           ` Richard Guenther
  2011-07-21 19:51         ` Joseph S. Myers
  1 sibling, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-12 11:38 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

On 12/07/11 12:05, Richard Guenther wrote:
> (**) We really ought to forbid any arithmetic on types that have non-mode
> precision and only allow conversions to/from such types.

Hmmm, presumably the problem is that we might have a compatible 
precision, but the backends actually work with purely mode-sized types?

That does sound problematic. :(

Does the recent bitfield lowering activity have any effect on this? I.e. 
does it make it a moot point by the time we get to the widen_mult pass?

Andrew

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (2/7)] Widening multiplies by more than one mode
  2011-07-12 11:38         ` Andrew Stubbs
@ 2011-07-12 11:51           ` Richard Guenther
  0 siblings, 0 replies; 107+ messages in thread
From: Richard Guenther @ 2011-07-12 11:51 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On Tue, Jul 12, 2011 at 1:26 PM, Andrew Stubbs <andrew.stubbs@gmail.com> wrote:
> On 12/07/11 12:05, Richard Guenther wrote:
>>
>> (**) We really ought to forbid any arithmetic on types that have non-mode
>> precision and only allow conversions to/from such types.
>
> Hmmm, presumably the problem is that we might have a compatible precision,
> but the backends actually work with purely mode-sized types?
>
> That does sound problematic. :(
>
> Does the recent bitfield lowering activity have any effect on this? I.e.
> does it make it a moot point by the time we get to the widen_mult pass?

No, the bitfield lowering will only change the types of memory loads,
not the types of the quantities we eventually see in the IL.  Thus for
my example we'd still see the casts from 7-bit types.

Richard.

> Andrew
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
  2011-07-04 14:27       ` Andrew Stubbs
  2011-07-07 10:10         ` Richard Guenther
@ 2011-07-12 14:10         ` Andrew Stubbs
  2011-07-14 14:28           ` Andrew Stubbs
  1 sibling, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-12 14:10 UTC (permalink / raw)
  Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 1481 bytes --]

On 04/07/11 15:26, Andrew Stubbs wrote:
> On 28/06/11 15:14, Andrew Stubbs wrote:
>> On 28/06/11 13:33, Andrew Stubbs wrote:
>>> On 23/06/11 15:41, Andrew Stubbs wrote:
>>>> If one or both of the inputs to a widening multiply are of unsigned
>>>> type
>>>> then the compiler will attempt to use usmul_widen_optab or
>>>> umul_widen_optab, respectively.
>>>>
>>>> That works fine, but only if the target supports those operations
>>>> directly. Otherwise, it just bombs out and reverts to the normal
>>>> inefficient non-widening multiply.
>>>>
>>>> This patch attempts to catch these cases and use an alternative signed
>>>> widening multiply instruction, if one of those is available.
>>>>
>>>> I believe this should be legal as long as the top bit of both inputs is
>>>> guaranteed to be zero. The code achieves this guarantee by
>>>> zero-extending the inputs to a wider mode (which must still be narrower
>>>> than the output mode).
>>>>
>>>> OK?
>>>
>>> This update fixes the testsuite issue Janis pointed out.
>>
>> And this one fixes up the wmul-5.c testcase also. The patch has changed
>> the correct result.
>
> Here's an update for the context changed by the update to patch 3.
>
> The content of the patch has not changed.

This update does the same thing as before, but is updated for the changes 
earlier in the patch series. In particular, the build_and_insert_cast 
function and find_widening_optab_handler_and_mode changes have been 
moved up to patch 2.
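
The arithmetic behind the transformation can be checked in plain C.
The following is only a sketch with hypothetical function names, not
code from the patch: it zero-extends 8-bit unsigned inputs into a
signed 16-bit type, so the top bit of each operand is known to be
zero, and then uses a signed 16x16->32 multiply. The mixed-signedness
case mirrors the operand types in the wmul-6.c testcase.

```c
#include <stdint.h>

/* Reference: a true unsigned 8x8->32 widening multiply.  */
static uint32_t
umul_widen_ref (uint8_t a, uint8_t b)
{
  return (uint32_t) a * (uint32_t) b;
}

/* Emulation: zero-extend both inputs to 16 bits (values 0..255, so the
   sign bit is clear), then do a signed 16x16->32 multiply.  */
static uint32_t
umul_via_signed (uint8_t a, uint8_t b)
{
  int16_t wa = (int16_t) (uint16_t) a;
  int16_t wb = (int16_t) (uint16_t) b;
  int32_t product = (int32_t) wa * (int32_t) wb;
  return (uint32_t) product;
}

/* Mixed signedness: zero-extend the unsigned input, sign-extend the
   signed one, then use the same signed 16x16->32 multiply.  */
static int32_t
usmul_via_signed (uint8_t a, int8_t b)
{
  int16_t wa = (int16_t) (uint16_t) a;
  int16_t wb = (int16_t) b;
  return (int32_t) wa * (int32_t) wb;
}

/* Exhaustively compare the emulation with the reference.
   Returns 1 if every pair of inputs agrees, 0 otherwise.  */
static int
all_inputs_agree (void)
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      if (umul_widen_ref ((uint8_t) a, (uint8_t) b)
          != umul_via_signed ((uint8_t) a, (uint8_t) b))
        return 0;
  return 1;
}
```

The extra width is what makes this safe: interpreting 8-bit unsigned
values directly as signed 8-bit values would change the product for any
input of 128 or more.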

OK?

Andrew

[-- Attachment #2: widening-multiplies-4.patch --]
[-- Type: text/x-patch, Size: 3075 bytes --]

2011-07-12  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (convert_mult_to_widen): Convert
	unsupported unsigned multiplies to signed.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-6.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-6.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, unsigned char *b, signed char *c)
+{
+  return a + (long long)*b * (long long)*c;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2071,6 +2071,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
   enum insn_code handler;
   enum machine_mode to_mode, from_mode;
   optab op;
+  bool do_cast = false;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2094,9 +2095,32 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 						  0, &from_mode);
 
   if (handler == CODE_FOR_nothing)
-    return false;
+    {
+      if (op != smul_widen_optab)
+	{
+	  from_mode = GET_MODE_WIDER_MODE (from_mode);
+	  if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode))
+	    return false;
+
+	  op = smul_widen_optab;
+	  handler = find_widening_optab_handler_and_mode (op, to_mode,
+							  from_mode, 0,
+							  &from_mode);
 
-  if (from_mode != TYPE_MODE (type1))
+	  if (handler == CODE_FOR_nothing)
+	    return false;
+
+	  type1 = build_nonstandard_integer_type (
+					GET_MODE_PRECISION (from_mode),
+					0);
+	  type2 = type1;
+	  do_cast = true;
+	}
+      else
+	return false;
+    }
+
+  if (from_mode != TYPE_MODE (type1) || do_cast)
     {
       location_t loc = gimple_location (stmt);
       tree tmp1, tmp2;
@@ -2143,6 +2167,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   enum tree_code wmult_code;
   enum insn_code handler;
   enum machine_mode from_mode;
+  bool do_cast = false;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2234,8 +2259,21 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   else
     return false;
 
+  /* We don't support usmadd yet, so try a wider signed mode.  */
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
-    return false;
+    {
+      enum machine_mode mode = TYPE_MODE (type1);
+      mode = GET_MODE_WIDER_MODE (mode);
+      if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (TYPE_MODE (type)))
+	{
+	  type1 = build_nonstandard_integer_type (GET_MODE_PRECISION (mode),
+						  0);
+	  type2 = type1;
+	  do_cast = true;
+	}
+      else
+	return false;
+    }
 
   /* If there was a conversion between the multiply and addition
      then we need to make sure it fits a multiply-and-accumulate.
@@ -2276,7 +2314,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   if (handler == CODE_FOR_nothing)
     return false;
 
-  if (TYPE_MODE (type1) != from_mode)
+  if (TYPE_MODE (type1) != from_mode || do_cast)
     {
       location_t loc = gimple_location (stmt);
       tree tmp;

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (2/7)] Widening multiplies by more than one mode
  2011-07-12 11:05     ` Richard Guenther
  2011-07-12 11:14       ` Richard Guenther
@ 2011-07-14 14:17       ` Andrew Stubbs
  2011-07-14 14:24         ` Richard Guenther
  1 sibling, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-14 14:17 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 943 bytes --]

On 12/07/11 12:04, Richard Guenther wrote:
> I wonder if we want to restrict the WIDEN_* operations to operate
> on types that have matching type/mode precision(**).

I've now modified the patch to allow bitfields, or other cases where the 
precision is smaller than the mode-size. I've also addressed the 
formatting issues you pointed out (and in fact reorganised the code 
slightly to make the rest of the series a bit cleaner).

As in the previous version of this patch, it's necessary to convert the 
input values to the proper mode for the machine instruction, so the 
basic tools for supporting the bitfields were already there - I just had 
to tweak the conditionals to take bitfields into account.
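
As a source-level illustration of that conversion, here is a sketch
modelled on the wmul-bitfield-1.c testcase. It is not the code the
compiler generates; the function names are hypothetical, and the
fields are declared "signed int" so the example does not depend on the
implementation-defined signedness of plain "int" bit-fields. A 15-bit
signed value always fits in a 16-bit short, so casting each operand to
short first and then doing a halfword multiply-accumulate gives the
same answer as the original expression.

```c
struct bf
{
  signed int a : 3;
  signed int b : 15;
  signed int c : 3;
};

/* The expression as written in the testcase: both fields are promoted
   to int, multiplied, and the product is widened for the add.  */
static long long
mac_plain (long long acc, struct bf x, struct bf y)
{
  return acc + x.b * y.b;
}

/* The same computation with the inputs explicitly converted to the
   16-bit type that a halfword multiply-accumulate would consume.  */
static long long
mac_halfword (long long acc, struct bf x, struct bf y)
{
  short xs = (short) x.b;   /* 15-bit value, fits in 16 bits */
  short ys = (short) y.b;
  return acc + (long long) ((int) xs * (int) ys);
}
```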

The only thing I haven't done is modify tree.def. Looking at it, though, 
I don't think anything needs changing. The code is still valid, and the 
comments are correct (in fact, they may have been wrong before).

Is this version OK?

Andrew

[-- Attachment #2: widening-multiplies-2.patch --]
[-- Type: text/x-patch, Size: 17355 bytes --]

2011-07-14  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* config/arm/arm.md (maddhidi4): Remove '*' from name.
	* expr.c (expand_expr_real_2): Use find_widening_optab_handler.
	* optabs.c (find_widening_optab_handler_and_mode): New function.
	(expand_widen_pattern_expr): Use find_widening_optab_handler.
	(expand_binop_directly): Likewise.
	(expand_binop): Likewise.
	* optabs.h (find_widening_optab_handler): New macro define.
	(find_widening_optab_handler_and_mode): New prototype.
	* tree-cfg.c (verify_gimple_assign_binary): Adjust WIDEN_MULT_EXPR
	type precision rules.
	(verify_gimple_assign_ternary): Likewise for WIDEN_MULT_PLUS_EXPR.
	* tree-ssa-math-opts.c (build_and_insert_cast): New function.
	(is_widening_mult_rhs_p): Allow widening by more than one mode.
	Explicitly disallow mis-matched input types.
	(convert_mult_to_widen): Use find_widening_optab_handler, and cast
	input types to fit the new handler.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-bitfield-1.c: New file.

--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -1857,7 +1857,7 @@
    (set_attr "predicable" "yes")]
 )
 
-(define_insn "*maddhidi4"
+(define_insn "maddhidi4"
   [(set (match_operand:DI 0 "s_register_operand" "=r")
 	(plus:DI
 	  (mult:DI (sign_extend:DI
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7638,19 +7638,16 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 	{
 	  enum machine_mode innermode = TYPE_MODE (TREE_TYPE (treeop0));
 	  this_optab = usmul_widen_optab;
-	  if (mode == GET_MODE_2XWIDER_MODE (innermode))
+	  if (find_widening_optab_handler (this_optab, mode, innermode, 0)
+		!= CODE_FOR_nothing)
 	    {
-	      if (widening_optab_handler (this_optab, mode, innermode)
-		    != CODE_FOR_nothing)
-		{
-		  if (TYPE_UNSIGNED (TREE_TYPE (treeop0)))
-		    expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
-				     EXPAND_NORMAL);
-		  else
-		    expand_operands (treeop0, treeop1, NULL_RTX, &op1, &op0,
-				     EXPAND_NORMAL);
-		  goto binop3;
-		}
+	      if (TYPE_UNSIGNED (TREE_TYPE (treeop0)))
+		expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
+				 EXPAND_NORMAL);
+	      else
+		expand_operands (treeop0, treeop1, NULL_RTX, &op1, &op0,
+				 EXPAND_NORMAL);
+	      goto binop3;
 	    }
 	}
       /* Check for a multiplication with matching signedness.  */
@@ -7665,10 +7662,9 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 	  optab other_optab = zextend_p ? smul_widen_optab : umul_widen_optab;
 	  this_optab = zextend_p ? umul_widen_optab : smul_widen_optab;
 
-	  if (mode == GET_MODE_2XWIDER_MODE (innermode)
-	      && TREE_CODE (treeop0) != INTEGER_CST)
+	  if (TREE_CODE (treeop0) != INTEGER_CST)
 	    {
-	      if (widening_optab_handler (this_optab, mode, innermode)
+	      if (find_widening_optab_handler (this_optab, mode, innermode, 0)
 		    != CODE_FOR_nothing)
 		{
 		  expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
@@ -7677,7 +7673,7 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 					       unsignedp, this_optab);
 		  return REDUCE_BIT_FIELD (temp);
 		}
-	      if (widening_optab_handler (other_optab, mode, innermode)
+	      if (find_widening_optab_handler (other_optab, mode, innermode, 0)
 		    != CODE_FOR_nothing
 		  && innermode == word_mode)
 		{
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -225,6 +225,37 @@ add_equal_note (rtx insns, rtx target, enum rtx_code code, rtx op0, rtx op1)
   return 1;
 }
 \f
+/* Find a widening optab even if it doesn't widen as much as we want.
+   E.g. if from_mode is HImode, and to_mode is DImode, and there is no
+   direct HI->DI insn, then return SI->DI, if that exists.
+   If PERMIT_NON_WIDENING is non-zero then this can be used with
+   non-widening optabs also.  */
+
+enum insn_code
+find_widening_optab_handler_and_mode (optab op, enum machine_mode to_mode,
+				      enum machine_mode from_mode,
+				      int permit_non_widening,
+				      enum machine_mode *found_mode)
+{
+  for (; (permit_non_widening || from_mode != to_mode)
+	 && GET_MODE_SIZE (from_mode) <= GET_MODE_SIZE (to_mode)
+	 && from_mode != VOIDmode;
+       from_mode = GET_MODE_WIDER_MODE (from_mode))
+    {
+      enum insn_code handler = widening_optab_handler (op, to_mode,
+						       from_mode);
+
+      if (handler != CODE_FOR_nothing)
+	{
+	  if (found_mode)
+	    *found_mode = from_mode;
+	  return handler;
+	}
+    }
+
+  return CODE_FOR_nothing;
+}
+\f
 /* Widen OP to MODE and return the rtx for the widened operand.  UNSIGNEDP
    says whether OP is signed or unsigned.  NO_EXTEND is nonzero if we need
    not actually do a sign-extend or zero-extend, but can leave the
@@ -515,8 +546,9 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
   if (ops->code == WIDEN_MULT_PLUS_EXPR
       || ops->code == WIDEN_MULT_MINUS_EXPR)
-    icode = widening_optab_handler (widen_pattern_optab,
-				    TYPE_MODE (TREE_TYPE (ops->op2)), tmode0);
+    icode = find_widening_optab_handler (widen_pattern_optab,
+					 TYPE_MODE (TREE_TYPE (ops->op2)),
+					 tmode0, 0);
   else
     icode = optab_handler (widen_pattern_optab, tmode0);
   gcc_assert (icode != CODE_FOR_nothing);
@@ -1243,7 +1275,8 @@ expand_binop_directly (enum machine_mode mode, optab binoptab,
 		       rtx last)
 {
   enum machine_mode from_mode = GET_MODE (op0);
-  enum insn_code icode = widening_optab_handler (binoptab, mode, from_mode);
+  enum insn_code icode = find_widening_optab_handler (binoptab, mode,
+						      from_mode, 1);
   enum machine_mode xmode0 = insn_data[(int) icode].operand[1].mode;
   enum machine_mode xmode1 = insn_data[(int) icode].operand[2].mode;
   enum machine_mode mode0, mode1, tmp_mode;
@@ -1390,7 +1423,7 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
   /* If we can do it with a three-operand insn, do so.  */
 
   if (methods != OPTAB_MUST_WIDEN
-      && widening_optab_handler (binoptab, mode, GET_MODE (op0))
+      && find_widening_optab_handler (binoptab, mode, GET_MODE (op0), 1)
 	    != CODE_FOR_nothing)
     {
       temp = expand_binop_directly (mode, binoptab, op0, op1, target,
@@ -1464,10 +1497,11 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 		!= CODE_FOR_nothing
 	    || (binoptab == smul_optab
 		&& GET_MODE_WIDER_MODE (wider_mode) != VOIDmode
-		&& (widening_optab_handler ((unsignedp ? umul_widen_optab
-						       : smul_widen_optab),
-					    GET_MODE_WIDER_MODE (wider_mode),
-					    mode)
+		&& (find_widening_optab_handler ((unsignedp
+						  ? umul_widen_optab
+						  : smul_widen_optab),
+						 GET_MODE_WIDER_MODE (wider_mode),
+						 mode, 0)
 		    != CODE_FOR_nothing)))
 	  {
 	    rtx xop0 = op0, xop1 = op1;
@@ -2002,7 +2036,7 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	   wider_mode != VOIDmode;
 	   wider_mode = GET_MODE_WIDER_MODE (wider_mode))
 	{
-	  if (widening_optab_handler (binoptab, wider_mode, mode)
+	  if (find_widening_optab_handler (binoptab, wider_mode, mode, 1)
 		  != CODE_FOR_nothing
 	      || (methods == OPTAB_LIB
 		  && optab_libfunc (binoptab, wider_mode)))
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -807,6 +807,15 @@ extern rtx expand_copysign (rtx, rtx, rtx);
 extern void emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 extern bool maybe_emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 
+/* Find a widening optab even if it doesn't widen as much as we want.  */
+#define find_widening_optab_handler(A,B,C,D) \
+  find_widening_optab_handler_and_mode (A, B, C, D, NULL)
+extern enum insn_code find_widening_optab_handler_and_mode (optab,
+							    enum machine_mode,
+							    enum machine_mode,
+							    int,
+							    enum machine_mode *);
+
 /* An extra flag to control optab_for_tree_code's behavior.  This is needed to
    distinguish between machines with a vector shift that takes a scalar for the
    shift amount vs. machines that take a vector for the shift amount.  */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-bitfield-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+struct bf
+{
+  int a : 3;
+  int b : 15;
+  int c : 3;
+};
+
+long long
+foo (long long a, struct bf b, struct bf c)
+{
+  return a + b.b * c.b;
+}
+
+/* { dg-final { scan-assembler "smlalbb" } } */
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3577,7 +3577,7 @@ do_pointer_plus_expr_check:
     case WIDEN_MULT_EXPR:
       if (TREE_CODE (lhs_type) != INTEGER_TYPE)
 	return true;
-      return ((2 * TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (lhs_type))
+      return ((2 * TYPE_PRECISION (rhs1_type) > TYPE_PRECISION (lhs_type))
 	      || (TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type)));
 
     case WIDEN_SUM_EXPR:
@@ -3668,7 +3668,7 @@ verify_gimple_assign_ternary (gimple stmt)
 	   && !FIXED_POINT_TYPE_P (rhs1_type))
 	  || !useless_type_conversion_p (rhs1_type, rhs2_type)
 	  || !useless_type_conversion_p (lhs_type, rhs3_type)
-	  || 2 * TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (lhs_type)
+	  || 2 * TYPE_PRECISION (rhs1_type) > TYPE_PRECISION (lhs_type)
 	  || TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type))
 	{
 	  error ("type mismatch in widening multiply-accumulate expression");
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1086,6 +1086,16 @@ build_and_insert_ref (gimple_stmt_iterator *gsi, location_t loc, tree type,
   return result;
 }
 
+/* Build a gimple assignment to cast VAL to TARGET.  Insert the statement
+   prior to GSI's current position, and return the fresh SSA name.  */
+
+static tree
+build_and_insert_cast (gimple_stmt_iterator *gsi, location_t loc,
+		       tree target, tree val)
+{
+  return build_and_insert_binop (gsi, loc, target, CONVERT_EXPR, val, NULL);
+}
+
 /* ARG0 and ARG1 are the two arguments to a pow builtin call in GSI
    with location info LOC.  If possible, create an equivalent and
    less expensive sequence of statements prior to GSI, and return an
@@ -1958,8 +1968,8 @@ struct gimple_opt_pass pass_optimize_bswap =
 /* Return true if RHS is a suitable operand for a widening multiplication.
    There are two cases:
 
-     - RHS makes some value twice as wide.  Store that value in *NEW_RHS_OUT
-       if so, and store its type in *TYPE_OUT.
+     - RHS makes some value at least twice as wide.  Store that value
+       in *NEW_RHS_OUT if so, and store its type in *TYPE_OUT.
 
      - RHS is an integer constant.  Store that value in *NEW_RHS_OUT if so,
        but leave *TYPE_OUT untouched.  */
@@ -1987,7 +1997,7 @@ is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
       rhs1 = gimple_assign_rhs1 (stmt);
       type1 = TREE_TYPE (rhs1);
       if (TREE_CODE (type1) != TREE_CODE (type)
-	  || TYPE_PRECISION (type1) * 2 != TYPE_PRECISION (type))
+	  || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type))
 	return false;
 
       *new_rhs_out = rhs1;
@@ -2043,6 +2053,10 @@ is_widening_mult_p (gimple stmt,
       *type2_out = *type1_out;
     }
 
+  /* FIXME: remove this restriction.  */
+  if (TYPE_PRECISION (*type1_out) != TYPE_PRECISION (*type2_out))
+    return false;
+
   return true;
 }
 
@@ -2051,12 +2065,14 @@ is_widening_mult_p (gimple stmt,
    value is true iff we converted the statement.  */
 
 static bool
-convert_mult_to_widen (gimple stmt)
+convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 {
-  tree lhs, rhs1, rhs2, type, type1, type2;
+  tree lhs, rhs1, rhs2, type, type1, type2, tmp;
   enum insn_code handler;
-  enum machine_mode to_mode, from_mode;
+  enum machine_mode to_mode, from_mode, actual_mode;
   optab op;
+  int actual_precision;
+  location_t loc = gimple_location (stmt);
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2076,13 +2092,32 @@ convert_mult_to_widen (gimple stmt)
   else
     op = usmul_widen_optab;
 
-  handler = widening_optab_handler (op, to_mode, from_mode);
+  handler = find_widening_optab_handler_and_mode (op, to_mode, from_mode,
+						  0, &actual_mode);
 
   if (handler == CODE_FOR_nothing)
     return false;
 
-  gimple_assign_set_rhs1 (stmt, fold_convert (type1, rhs1));
-  gimple_assign_set_rhs2 (stmt, fold_convert (type2, rhs2));
+  /* Ensure that the inputs to the handler are in the correct precision
+     for the opcode.  This will be the full mode size.  */
+  actual_precision = GET_MODE_PRECISION (actual_mode);
+  if (actual_precision != TYPE_PRECISION (type1))
+    {
+      tmp = create_tmp_var (build_nonstandard_integer_type
+				(actual_precision, TYPE_UNSIGNED (type1)),
+			    NULL);
+      rhs1 = build_and_insert_cast (gsi, loc, tmp, rhs1);
+
+      /* Reuse the same type info, if possible.  */
+      if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
+	tmp = create_tmp_var (build_nonstandard_integer_type
+				(actual_precision, TYPE_UNSIGNED (type2)),
+			      NULL);
+      rhs2 = build_and_insert_cast (gsi, loc, tmp, rhs2);
+    }
+
+  gimple_assign_set_rhs1 (stmt, rhs1);
+  gimple_assign_set_rhs2 (stmt, rhs2);
   gimple_assign_set_rhs_code (stmt, WIDEN_MULT_EXPR);
   update_stmt (stmt);
   widen_mul_stats.widen_mults_inserted++;
@@ -2100,11 +2135,15 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			    enum tree_code code)
 {
   gimple rhs1_stmt = NULL, rhs2_stmt = NULL;
-  tree type, type1, type2;
+  tree type, type1, type2, tmp;
   tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs;
   enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK;
   optab this_optab;
   enum tree_code wmult_code;
+  enum insn_code handler;
+  enum machine_mode to_mode, from_mode, actual_mode;
+  location_t loc = gimple_location (stmt);
+  int actual_precision;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2138,39 +2177,33 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   else
     return false;
 
-  if (code == PLUS_EXPR && rhs1_code == MULT_EXPR)
+  /* If code is WIDEN_MULT_EXPR then it would seem unnecessary to call
+     is_widening_mult_p, but we still need the rhs returns.
+
+     It might also appear that it would be sufficient to use the existing
+     operands of the widening multiply, but that would limit the choice of
+     multiply-and-accumulate instructions.  */
+  if (code == PLUS_EXPR
+      && (rhs1_code == MULT_EXPR || rhs1_code == WIDEN_MULT_EXPR))
     {
       if (!is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs2;
     }
-  else if (rhs2_code == MULT_EXPR)
+  else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR)
     {
       if (!is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs1;
     }
-  else if (code == PLUS_EXPR && rhs1_code == WIDEN_MULT_EXPR)
-    {
-      mult_rhs1 = gimple_assign_rhs1 (rhs1_stmt);
-      mult_rhs2 = gimple_assign_rhs2 (rhs1_stmt);
-      type1 = TREE_TYPE (mult_rhs1);
-      type2 = TREE_TYPE (mult_rhs2);
-      add_rhs = rhs2;
-    }
-  else if (rhs2_code == WIDEN_MULT_EXPR)
-    {
-      mult_rhs1 = gimple_assign_rhs1 (rhs2_stmt);
-      mult_rhs2 = gimple_assign_rhs2 (rhs2_stmt);
-      type1 = TREE_TYPE (mult_rhs1);
-      type2 = TREE_TYPE (mult_rhs2);
-      add_rhs = rhs1;
-    }
   else
     return false;
 
+  to_mode = TYPE_MODE (type);
+  from_mode = TYPE_MODE (type1);
+
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
     return false;
 
@@ -2178,15 +2211,26 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */
   this_optab = optab_for_tree_code (wmult_code, type1, optab_default);
-  if (widening_optab_handler (this_optab, TYPE_MODE (type), TYPE_MODE (type1))
-	== CODE_FOR_nothing)
+  handler = find_widening_optab_handler_and_mode (this_optab, to_mode,
+						  from_mode, 0, &actual_mode);
+
+  if (handler == CODE_FOR_nothing)
     return false;
 
-  /* ??? May need some type verification here?  */
+  /* Ensure that the inputs to the handler are in the correct precision
+     for the opcode.  This will be the full mode size.  */
+  actual_precision = GET_MODE_PRECISION (actual_mode);
+  if (actual_precision != TYPE_PRECISION (type1))
+    {
+      tmp = create_tmp_var (build_nonstandard_integer_type
+				(actual_precision, TYPE_UNSIGNED (type1)),
+			    NULL);
+
+      mult_rhs1 = build_and_insert_cast (gsi, loc, tmp, mult_rhs1);
+      mult_rhs2 = build_and_insert_cast (gsi, loc, tmp, mult_rhs2);
+    }
 
-  gimple_assign_set_rhs_with_ops_1 (gsi, wmult_code,
-				    fold_convert (type1, mult_rhs1),
-				    fold_convert (type2, mult_rhs2),
+  gimple_assign_set_rhs_with_ops_1 (gsi, wmult_code, mult_rhs1, mult_rhs2,
 				    add_rhs);
   update_stmt (gsi_stmt (*gsi));
   widen_mul_stats.maccs_inserted++;
@@ -2398,7 +2442,7 @@ execute_optimize_widening_mul (void)
 	      switch (code)
 		{
 		case MULT_EXPR:
-		  if (!convert_mult_to_widen (stmt)
+		  if (!convert_mult_to_widen (stmt, &gsi)
 		      && convert_mult_to_fma (stmt,
 					      gimple_assign_rhs1 (stmt),
 					      gimple_assign_rhs2 (stmt)))


* Re: [PATCH (2/7)] Widening multiplies by more than one mode
  2011-07-14 14:17       ` Andrew Stubbs
@ 2011-07-14 14:24         ` Richard Guenther
  2011-08-19 14:45           ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-07-14 14:24 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On Thu, Jul 14, 2011 at 4:10 PM, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 12/07/11 12:04, Richard Guenther wrote:
>>
>> I wonder if we want to restrict the WIDEN_* operations to operate
>> on types that have matching type/mode precision(**).
>
> I've now modified the patch to allow bitfields, or other cases where the
> precision is smaller than the mode-size. I've also addressed the formatting
> issues you pointed out (and in fact reorganised the code slightly to make
> the rest of the series a bit cleaner).
>
> As in the previous version of this patch, it's necessary to convert the
> input values to the proper mode for the machine instruction, so the basic
> tools for supporting the bitfields were already there - I just had to tweak
> the conditionals to take bitfields into account.
>
> The only thing I haven't done is modify tree.def. Looking at it though, I
> don't think anything needs changing? The code is still valid, and the comments
> are correct (in fact, they may have been wrong before).

Ah, it indeed talks about at least twice the precision already.

> Is this version OK?

Ok.

Thanks,
Richard.

> Andrew
>
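
As a rough illustration of the relaxed check approved here, the tree-cfg.c
hunks replace an exact "twice as wide" test with an "at least twice as wide"
test. This is only a sketch: the two helper names are hypothetical, and they
take plain bit precisions instead of GCC tree types.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the WIDEN_MULT_EXPR check in tree-cfg.c.
   Each returns true when the expression would be rejected.  */

/* Old rule: the result must be exactly twice as wide as the operands.  */
static bool
old_widen_mult_invalid (int rhs1_prec, int rhs2_prec, int lhs_prec)
{
  return 2 * rhs1_prec != lhs_prec || rhs1_prec != rhs2_prec;
}

/* New rule: the result must be at least twice as wide as the operands.  */
static bool
new_widen_mult_invalid (int rhs1_prec, int rhs2_prec, int lhs_prec)
{
  return 2 * rhs1_prec > lhs_prec || rhs1_prec != rhs2_prec;
}
```

Under this model a 16-bit by 16-bit multiply into a 64-bit result, or a pair
of 15-bit bitfield operands feeding a 64-bit result, is rejected by the old
rule and accepted by the new one. Both rules still require the two operand
precisions to match.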


* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-11 17:01                               ` Andrew Stubbs
  2011-07-12 11:05                                 ` Richard Guenther
@ 2011-07-14 14:26                                 ` Andrew Stubbs
  2011-07-19  0:36                                   ` Janis Johnson
  1 sibling, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-14 14:26 UTC (permalink / raw)
  Cc: Richard Guenther, Michael Matz, gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 153 bytes --]

This update changes only the context modified by changes to patch 2. The 
patch has already been approved. I'm just posting it for completeness.

Andrew
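
For readers skimming the attachment, the conversion check added in its last
hunk can be modelled roughly as below. This is a sketch under assumptions:
conversion_fits_macc and its integer/bool parameters are hypothetical
stand-ins for the tree types the real code inspects, and the logic simply
mirrors the conditions in the patch.

```c
#include <stdbool.h>

/* Return true if a single conversion sitting between the multiply and the
   add/subtract still allows a widening multiply-and-accumulate.
   FROM_* and TO_PREC describe the conversion; TYPE1_* and TYPE2_* describe
   the unwidened multiply operands.  */
static bool
conversion_fits_macc (int from_prec, bool from_unsigned, int to_prec,
                      int type1_prec, bool type1_unsigned,
                      int type2_prec, bool type2_unsigned)
{
  int data_size = type1_prec + type2_prec;
  bool is_unsigned = type1_unsigned && type2_unsigned;

  /* A truncation is only acceptable if the full product still fits.  */
  if (from_prec > to_prec)
    return to_prec >= data_size;

  /* An extension must be the right sort for the operands' signedness,
     unless an unsigned product already has spare high bits.  */
  if (from_prec < to_prec
      && from_unsigned != is_unsigned
      && !(is_unsigned && from_prec > data_size))
    return false;

  /* Otherwise the conversion is a no-op for our purposes.  */
  return true;
}
```

In this model the (short) cast in no-wmla-1.c is a truncation to 16 bits of a
32-bit product, so it is rejected. The int to long long extension in wmul-5.c
is accepted, assuming char is unsigned there, as the test's umlal expectation
suggests.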

[-- Attachment #2: widening-multiplies-3.patch --]
[-- Type: text/x-patch, Size: 4420 bytes --]

2011-07-14  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (convert_plusminus_to_widen): Permit a single
	conversion statement separating multiply-and-accumulate.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: New file.
	* gcc.target/arm/no-wmla-1.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+int
+foo (int a, short b, short c)
+{
+  int bc = b * c;
+  return a + (short)bc;
+}
+
+/* { dg-final { scan-assembler "mul" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, char *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2135,6 +2135,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			    enum tree_code code)
 {
   gimple rhs1_stmt = NULL, rhs2_stmt = NULL;
+  gimple conv1_stmt = NULL, conv2_stmt = NULL, conv_stmt;
   tree type, type1, type2, tmp;
   tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs;
   enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK;
@@ -2177,6 +2178,38 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   else
     return false;
 
+  /* Allow for one conversion statement between the multiply
+     and addition/subtraction statement.  If there is more than
+     one conversion then we assume they would invalidate this
+     transformation.  If that's not the case then they should have
+     been folded before now.  */
+  if (CONVERT_EXPR_CODE_P (rhs1_code))
+    {
+      conv1_stmt = rhs1_stmt;
+      rhs1 = gimple_assign_rhs1 (rhs1_stmt);
+      if (TREE_CODE (rhs1) == SSA_NAME)
+	{
+	  rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
+	  if (is_gimple_assign (rhs1_stmt))
+	    rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+	}
+      else
+	return false;
+    }
+  if (CONVERT_EXPR_CODE_P (rhs2_code))
+    {
+      conv2_stmt = rhs2_stmt;
+      rhs2 = gimple_assign_rhs1 (rhs2_stmt);
+      if (TREE_CODE (rhs2) == SSA_NAME)
+	{
+	  rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
+	  if (is_gimple_assign (rhs2_stmt))
+	    rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
+	}
+      else
+	return false;
+    }
+
   /* If code is WIDEN_MULT_EXPR then it would seem unnecessary to call
      is_widening_mult_p, but we still need the rhs returns.
 
@@ -2190,6 +2223,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs2;
+      conv_stmt = conv1_stmt;
     }
   else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR)
     {
@@ -2197,6 +2231,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs1;
+      conv_stmt = conv2_stmt;
     }
   else
     return false;
@@ -2207,6 +2242,33 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
     return false;
 
+  /* If there was a conversion between the multiply and addition
+     then we need to make sure it fits a multiply-and-accumulate.
+     There should be a single mode change which does not change the
+     value.  */
+  if (conv_stmt)
+    {
+      tree from_type = TREE_TYPE (gimple_assign_rhs1 (conv_stmt));
+      tree to_type = TREE_TYPE (gimple_assign_lhs (conv_stmt));
+      int data_size = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);
+      bool is_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2);
+
+      if (TYPE_PRECISION (from_type) > TYPE_PRECISION (to_type))
+	{
+	  /* Conversion is a truncate.  */
+	  if (TYPE_PRECISION (to_type) < data_size)
+	    return false;
+	}
+      else if (TYPE_PRECISION (from_type) < TYPE_PRECISION (to_type))
+	{
+	  /* Conversion is an extend.  Check it's the right sort.  */
+	  if (TYPE_UNSIGNED (from_type) != is_unsigned
+	      && !(is_unsigned && TYPE_PRECISION (from_type) > data_size))
+	    return false;
+	}
+      /* else convert is a no-op for our purposes.  */
+    }
+
   /* Verify that the machine can perform a widening multiply
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */


* Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
  2011-07-12 14:10         ` Andrew Stubbs
@ 2011-07-14 14:28           ` Andrew Stubbs
  2011-07-14 14:31             ` Richard Guenther
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-14 14:28 UTC (permalink / raw)
  Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 581 bytes --]

On 12/07/11 15:07, Andrew Stubbs wrote:
> This update does the same thing as before, but updated for the changes
> earlier in the patch series. In particular, the build_and_insert_cast
> function and find_widening_optab_handler_and_mode changes have been
> moved up to patch 2.

And this update changes the way the casts are handled, partly because it 
got unwieldy towards the end of the patch series, and partly because I 
found a few bugs.

I've also ensured that it checks the precision of the types, rather than 
the mode size to ensure that it is bitfield safe.

OK?

Andrew
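
To show why falling back to a wider signed multiply is safe, here is a small
C sketch of the idea behind GET_MODE_WIDER_MODE plus smul_widen_optab. The
function name and the choice of 8-bit inputs widened to 16 bits are
assumptions for illustration only. Every unsigned 8-bit value is
representable as a signed 16-bit value, so a signed multiply of the widened
operands gives the same product as the mixed-sign multiply.

```c
#include <stdint.h>

/* Mixed-sign multiply-accumulate done with a signed multiply of operands
   that were first extended to the next wider type.  */
static int64_t
macc_mixed_sign (int64_t acc, uint8_t b, int8_t c)
{
  int16_t wide_b = (int16_t) b;   /* zero-extended; always fits in 16 bits */
  int16_t wide_c = (int16_t) c;   /* sign-extended */
  int32_t product = (int32_t) wide_b * (int32_t) wide_c;

  return acc + product;
}
```

The result matches acc + (long long) b * (long long) c for all inputs, which
is the expression wmul-6.c exercises.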

[-- Attachment #2: widening-multiplies-4.patch --]
[-- Type: text/x-patch, Size: 7001 bytes --]

2011-07-14  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (convert_mult_to_widen): Convert
	unsupported unsigned multiplies to signed.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-6.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-6.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, unsigned char *b, signed char *c)
+{
+  return a + (long long)*b * (long long)*c;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2067,12 +2067,13 @@ is_widening_mult_p (gimple stmt,
 static bool
 convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 {
-  tree lhs, rhs1, rhs2, type, type1, type2, tmp;
+  tree lhs, rhs1, rhs2, type, type1, type2, tmp = NULL;
   enum insn_code handler;
   enum machine_mode to_mode, from_mode, actual_mode;
   optab op;
   int actual_precision;
   location_t loc = gimple_location (stmt);
+  bool from_unsigned1, from_unsigned2;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2084,10 +2085,12 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 
   to_mode = TYPE_MODE (type);
   from_mode = TYPE_MODE (type1);
+  from_unsigned1 = TYPE_UNSIGNED (type1);
+  from_unsigned2 = TYPE_UNSIGNED (type2);
 
-  if (TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2))
+  if (from_unsigned1 && from_unsigned2)
     op = umul_widen_optab;
-  else if (!TYPE_UNSIGNED (type1) && !TYPE_UNSIGNED (type2))
+  else if (!from_unsigned1 && !from_unsigned2)
     op = smul_widen_optab;
   else
     op = usmul_widen_optab;
@@ -2096,22 +2099,45 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 						  0, &actual_mode);
 
   if (handler == CODE_FOR_nothing)
-    return false;
+    {
+      if (op != smul_widen_optab)
+	{
+	  from_mode = GET_MODE_WIDER_MODE (from_mode);
+	  if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode))
+	    return false;
+
+	  op = smul_widen_optab;
+	  handler = find_widening_optab_handler_and_mode (op, to_mode,
+							  from_mode, 0,
+							  &actual_mode);
+
+	  if (handler == CODE_FOR_nothing)
+	    return false;
+
+	  from_unsigned1 = from_unsigned2 = false;
+	}
+      else
+	return false;
+    }
 
   /* Ensure that the inputs to the handler are in the correct precision
      for the opcode.  This will be the full mode size.  */
   actual_precision = GET_MODE_PRECISION (actual_mode);
-  if (actual_precision != TYPE_PRECISION (type1))
+  if (actual_precision != TYPE_PRECISION (type1)
+      || from_unsigned1 != TYPE_UNSIGNED (type1))
     {
       tmp = create_tmp_var (build_nonstandard_integer_type
-				(actual_precision, TYPE_UNSIGNED (type1)),
+				(actual_precision, from_unsigned1),
 			    NULL);
       rhs1 = build_and_insert_cast (gsi, loc, tmp, rhs1);
-
+    }
+  if (actual_precision != TYPE_PRECISION (type2)
+      || from_unsigned2 != TYPE_UNSIGNED (type2))
+    {
       /* Reuse the same type info, if possible.  */
-      if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
+      if (!tmp || from_unsigned1 != from_unsigned2)
 	tmp = create_tmp_var (build_nonstandard_integer_type
-				(actual_precision, TYPE_UNSIGNED (type2)),
+				(actual_precision, from_unsigned2),
 			      NULL);
       rhs2 = build_and_insert_cast (gsi, loc, tmp, rhs2);
     }
@@ -2136,7 +2162,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 {
   gimple rhs1_stmt = NULL, rhs2_stmt = NULL;
   gimple conv1_stmt = NULL, conv2_stmt = NULL, conv_stmt;
-  tree type, type1, type2, tmp;
+  tree type, type1, type2, optype, tmp = NULL;
   tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs;
   enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK;
   optab this_optab;
@@ -2145,6 +2171,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   enum machine_mode to_mode, from_mode, actual_mode;
   location_t loc = gimple_location (stmt);
   int actual_precision;
+  bool from_unsigned1, from_unsigned2;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2238,9 +2265,21 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 
   to_mode = TYPE_MODE (type);
   from_mode = TYPE_MODE (type1);
+  from_unsigned1 = TYPE_UNSIGNED (type1);
+  from_unsigned2 = TYPE_UNSIGNED (type2);
 
-  if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
-    return false;
+  /* There's no such thing as a mixed sign madd yet, so use a wider mode.  */
+  if (from_unsigned1 != from_unsigned2)
+    {
+      enum machine_mode mode = GET_MODE_WIDER_MODE (from_mode);
+      if (GET_MODE_PRECISION (mode) < GET_MODE_PRECISION (to_mode))
+	{
+	  from_mode = mode;
+	  from_unsigned1 = from_unsigned2 = false;
+	}
+      else
+	return false;
+    }
 
   /* If there was a conversion between the multiply and addition
      then we need to make sure it fits a multiply-and-accumulate.
@@ -2248,6 +2287,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
      value.  */
   if (conv_stmt)
     {
+      /* We use the original, unmodified data types for this.  */
       tree from_type = TREE_TYPE (gimple_assign_rhs1 (conv_stmt));
       tree to_type = TREE_TYPE (gimple_assign_lhs (conv_stmt));
       int data_size = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);
@@ -2272,7 +2312,8 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   /* Verify that the machine can perform a widening multiply
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */
-  this_optab = optab_for_tree_code (wmult_code, type1, optab_default);
+  optype = build_nonstandard_integer_type (from_mode, from_unsigned1);
+  this_optab = optab_for_tree_code (wmult_code, optype, optab_default);
   handler = find_widening_optab_handler_and_mode (this_optab, to_mode,
 						  from_mode, 0, &actual_mode);
 
@@ -2282,13 +2323,21 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   /* Ensure that the inputs to the handler are in the correct precision
      for the opcode.  This will be the full mode size.  */
   actual_precision = GET_MODE_PRECISION (actual_mode);
-  if (actual_precision != TYPE_PRECISION (type1))
+  if (actual_precision != TYPE_PRECISION (type1)
+      || from_unsigned1 != TYPE_UNSIGNED (type1))
     {
       tmp = create_tmp_var (build_nonstandard_integer_type
-				(actual_precision, TYPE_UNSIGNED (type1)),
+				(actual_precision, from_unsigned1),
 			    NULL);
-
       mult_rhs1 = build_and_insert_cast (gsi, loc, tmp, mult_rhs1);
+    }
+  if (actual_precision != TYPE_PRECISION (type2)
+      || from_unsigned2 != TYPE_UNSIGNED (type2))
+    {
+      if (!tmp || from_unsigned1 != from_unsigned2)
+	tmp = create_tmp_var (build_nonstandard_integer_type
+				(actual_precision, from_unsigned2),
+			      NULL);
       mult_rhs2 = build_and_insert_cast (gsi, loc, tmp, mult_rhs2);
     }
 


* Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
  2011-07-14 14:28           ` Andrew Stubbs
@ 2011-07-14 14:31             ` Richard Guenther
  2011-08-19 14:51               ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-07-14 14:31 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On Thu, Jul 14, 2011 at 4:23 PM, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 12/07/11 15:07, Andrew Stubbs wrote:
>>
>> This update does the same thing as before, but updated for the changes
>> earlier in the patch series. In particular, the build_and_insert_cast
>> function and find_widening_optab_handler_and_mode changes have been
>> moved up to patch 2.
>
> And this update changes the way the casts are handled, partly because it got
> unwieldy towards the end of the patch series, and partly because I found a
> few bugs.
>
> I've also ensured that it checks the precision of the types, rather than the
> mode size to ensure that it is bitfield safe.
>
> OK?

Ok.

Thanks,
Richard.

> Andrew
>


* Re: [PATCH (5/7)] Widening multiplies for mis-matched mode inputs
  2011-07-07 10:11       ` Richard Guenther
@ 2011-07-14 14:34         ` Andrew Stubbs
  2011-07-14 14:35           ` Richard Guenther
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-14 14:34 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 161 bytes --]

I've updated this patch following the changes earlier in the patch 
series. There isn't much left.

This should obviate all the review comments. :)

OK?

Andrew
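
The operand swap in this patch can be sketched as follows. The struct and
function names are hypothetical; the real code swaps tree types and tree
operands through the *type1_out / *rhs1_out pointers. The point is that the
type and its value move together, so the wider operand always ends up first,
where the callers take from_mode from type1.

```c
/* Hypothetical stand-in for one multiply operand: its unwidened precision
   in bits plus an identifier for the value itself.  */
struct mult_operand
{
  int precision;
  int value_id;
};

/* Make sure the wider operand comes first, swapping type and value
   together so they stay paired.  */
static void
put_wider_operand_first (struct mult_operand *op1, struct mult_operand *op2)
{
  if (op1->precision < op2->precision)
    {
      struct mult_operand tmp = *op1;
      *op1 = *op2;
      *op2 = tmp;
    }
}
```

For wmul-7.c this would put the unsigned short operand ahead of the unsigned
char one; operands of equal precision are left in their original order.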

[-- Attachment #2: widening-multiplies-5.patch --]
[-- Type: text/x-patch, Size: 1161 bytes --]

2011-07-14  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_p): Remove FIXME.
	Ensure the larger type is the first operand.

	gcc/testsuite/
	* gcc.target/arm/wmul-7.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-7.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+unsigned long long
+foo (unsigned long long a, unsigned char *b, unsigned short *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2053,9 +2053,17 @@ is_widening_mult_p (gimple stmt,
       *type2_out = *type1_out;
     }
 
-  /* FIXME: remove this restriction.  */
-  if (TYPE_PRECISION (*type1_out) != TYPE_PRECISION (*type2_out))
-    return false;
+  /* Ensure that the larger of the two operands comes first.  */
+  if (TYPE_PRECISION (*type1_out) < TYPE_PRECISION (*type2_out))
+    {
+      tree tmp;
+      tmp = *type1_out;
+      *type1_out = *type2_out;
+      *type2_out = tmp;
+      tmp = *rhs1_out;
+      *rhs1_out = *rhs2_out;
+      *rhs2_out = tmp;
+    }
 
   return true;
 }


* Re: [PATCH (5/7)] Widening multiplies for mis-matched mode inputs
  2011-07-14 14:34         ` Andrew Stubbs
@ 2011-07-14 14:35           ` Richard Guenther
  2011-08-19 14:54             ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-07-14 14:35 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On Thu, Jul 14, 2011 at 4:28 PM, Andrew Stubbs <ams@codesourcery.com> wrote:
> I've updated this patch following the changes earlier in the patch series.
> There isn't much left.
>
> This should obviate all the review comments. :)

Indeed ;)

> OK?

Ok.

Thanks,
Richard.

> Andrew
>


* Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching
  2011-07-07 10:20       ` Richard Guenther
@ 2011-07-14 14:35         ` Andrew Stubbs
  2011-07-14 14:41           ` Richard Guenther
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-14 14:35 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 527 bytes --]

On 07/07/11 11:13, Richard Guenther wrote:
>> This updates the context changed by my update to patch 3.
>> >
>> >  The content of this patch has not changed.
> Ok.

I know this patch was already approved, but I discovered a bug in it: it 
missed optimizing the case where the input to the multiply did not come 
from an assign statement (this can happen when the value comes from a 
function parameter).

This patch fixes that case, and updates the context changed by my 
updates earlier in the patch series.

OK?

Andrew
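
For illustration, the two source-level shapes involved look like this. The
first function is the wmul-8.c test; the second is a hypothetical variant in
which the multiply operands are function parameters, so they are not defined
by any assign statement. In both, the product is computed in int and then
converted to long long for the addition, which is the single conversion
patch 3 permits. The sketch only shows that the two forms compute the same
value; it makes no claim about the instructions a given compiler emits.

```c
/* Operands loaded through pointers, as in wmul-8.c.  */
long long
macc_loads (long long a, int *b, int *c)
{
  return a + *b * *c;
}

/* Hypothetical variant: the operands are parameters, so their SSA names
   have no defining assign statement.  */
long long
macc_params (long long a, int b, int c)
{
  return a + b * c;
}
```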

[-- Attachment #2: widening-multiplies-6.patch --]
[-- Type: text/x-patch, Size: 5369 bytes --]

2011-07-14  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_rhs_p): Add new argument
	'type'.
	Use 'type' from caller, not inferred from 'rhs'.
	Don't reject non-conversion statements. Do return lhs in this case.
	(is_widening_mult_p): Add new argument 'type'.
	Use 'type' from caller, not inferred from 'stmt'.
	Pass type to is_widening_mult_rhs_p.
	(convert_mult_to_widen): Pass type to is_widening_mult_p.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-8.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-8.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, int *b, int *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1965,7 +1965,8 @@ struct gimple_opt_pass pass_optimize_bswap =
  }
 };
 
-/* Return true if RHS is a suitable operand for a widening multiplication.
+/* Return true if RHS is a suitable operand for a widening multiplication,
+   assuming a target type of TYPE.
    There are two cases:
 
      - RHS makes some value at least twice as wide.  Store that value
@@ -1975,32 +1976,43 @@ struct gimple_opt_pass pass_optimize_bswap =
        but leave *TYPE_OUT untouched.  */
 
 static bool
-is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
+is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out,
+			tree *new_rhs_out)
 {
   gimple stmt;
-  tree type, type1, rhs1;
+  tree type1, rhs1;
   enum tree_code rhs_code;
 
   if (TREE_CODE (rhs) == SSA_NAME)
     {
-      type = TREE_TYPE (rhs);
       stmt = SSA_NAME_DEF_STMT (rhs);
       if (!is_gimple_assign (stmt))
-	return false;
-
-      rhs_code = gimple_assign_rhs_code (stmt);
-      if (TREE_CODE (type) == INTEGER_TYPE
-	  ? !CONVERT_EXPR_CODE_P (rhs_code)
-	  : rhs_code != FIXED_CONVERT_EXPR)
-	return false;
+	{
+	  rhs1 = NULL;
+	  type1 = TREE_TYPE (rhs);
+	}
+      else
+	{
+	  rhs1 = gimple_assign_rhs1 (stmt);
+	  type1 = TREE_TYPE (rhs1);
+	}
 
-      rhs1 = gimple_assign_rhs1 (stmt);
-      type1 = TREE_TYPE (rhs1);
       if (TREE_CODE (type1) != TREE_CODE (type)
 	  || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type))
 	return false;
 
-      *new_rhs_out = rhs1;
+      if (rhs1)
+	{
+	  rhs_code = gimple_assign_rhs_code (stmt);
+	  if (TREE_CODE (type) == INTEGER_TYPE
+	      ? !CONVERT_EXPR_CODE_P (rhs_code)
+	      : rhs_code != FIXED_CONVERT_EXPR)
+	    *new_rhs_out = rhs;
+	  else
+	    *new_rhs_out = rhs1;
+	}
+      else
+	*new_rhs_out = rhs;
       *type_out = type1;
       return true;
     }
@@ -2015,28 +2027,27 @@ is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
   return false;
 }
 
-/* Return true if STMT performs a widening multiplication.  If so,
-   store the unwidened types of the operands in *TYPE1_OUT and *TYPE2_OUT
-   respectively.  Also fill *RHS1_OUT and *RHS2_OUT such that converting
-   those operands to types *TYPE1_OUT and *TYPE2_OUT would give the
-   operands of the multiplication.  */
+/* Return true if STMT performs a widening multiplication, assuming the
+   output type is TYPE.  If so, store the unwidened types of the operands
+   in *TYPE1_OUT and *TYPE2_OUT respectively.  Also fill *RHS1_OUT and
+   *RHS2_OUT such that converting those operands to types *TYPE1_OUT
+   and *TYPE2_OUT would give the operands of the multiplication.  */
 
 static bool
-is_widening_mult_p (gimple stmt,
+is_widening_mult_p (tree type, gimple stmt,
 		    tree *type1_out, tree *rhs1_out,
 		    tree *type2_out, tree *rhs2_out)
 {
-  tree type;
-
-  type = TREE_TYPE (gimple_assign_lhs (stmt));
   if (TREE_CODE (type) != INTEGER_TYPE
       && TREE_CODE (type) != FIXED_POINT_TYPE)
     return false;
 
-  if (!is_widening_mult_rhs_p (gimple_assign_rhs1 (stmt), type1_out, rhs1_out))
+  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs1 (stmt), type1_out,
+			       rhs1_out))
     return false;
 
-  if (!is_widening_mult_rhs_p (gimple_assign_rhs2 (stmt), type2_out, rhs2_out))
+  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs2 (stmt), type2_out,
+			       rhs2_out))
     return false;
 
   if (*type1_out == NULL)
@@ -2088,7 +2099,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
   if (TREE_CODE (type) != INTEGER_TYPE)
     return false;
 
-  if (!is_widening_mult_p (stmt, &type1, &rhs1, &type2, &rhs2))
+  if (!is_widening_mult_p (type, stmt, &type1, &rhs1, &type2, &rhs2))
     return false;
 
   to_mode = TYPE_MODE (type);
@@ -2254,7 +2265,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   if (code == PLUS_EXPR
       && (rhs1_code == MULT_EXPR || rhs1_code == WIDEN_MULT_EXPR))
     {
-      if (!is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
+      if (!is_widening_mult_p (type, rhs1_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs2;
@@ -2262,7 +2273,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
     }
   else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR)
     {
-      if (!is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
+      if (!is_widening_mult_p (type, rhs2_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs1;

* Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching
  2011-07-14 14:35         ` Andrew Stubbs
@ 2011-07-14 14:41           ` Richard Guenther
  2011-08-19 15:03             ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-07-14 14:41 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On Thu, Jul 14, 2011 at 4:34 PM, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 07/07/11 11:13, Richard Guenther wrote:
>>>
>>> This updates the context changed by my update to patch 3.
>>> >
>>> >  The content of this patch has not changed.
>>
>> Ok.
>
> I know this patch was already approved, but I discovered a bug in it: it
> failed to optimize the case where an input to the multiply did not come
> from an assign statement (this can happen when the value comes from a
> function parameter).
>
> This patch fixes that case, and updates the context changed by my updates
> earlier in the patch series.
>
> OK?

Ok.

Thanks,
Richard.

> Andrew
>

* Re: [PATCH (7/7)] Mixed-sign multiplies using narrowest mode
  2011-06-28 17:02   ` Andrew Stubbs
@ 2011-07-14 14:44     ` Andrew Stubbs
  2011-07-14 14:48       ` Richard Guenther
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-14 14:44 UTC (permalink / raw)
  Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 1085 bytes --]

On 28/06/11 17:23, Andrew Stubbs wrote:
> On 23/06/11 15:43, Andrew Stubbs wrote:
>> Patch 4 introduced support for using signed multiplies to code unsigned
>> multiplies in a narrower mode. Patch 5 then introduced support for
>> mis-matched input modes.
>>
>> These two combined mean that there is a case where only the smaller of two
>> inputs is unsigned, and yet it still tries to use a mode wider than the
>> larger, signed input. This is bad because it means unnecessary extends
>> and because the wider operation might not exist.
>>
>> This patch catches that case, and ensures that the smaller, unsigned
>> input, is zero-extended to match the mode of the larger, signed input.
>>
>> Of course, both inputs may still have to be extended to fit the nearest
>> available instruction, so it doesn't make a difference every time.
>>
>> OK?
>
> This update fixes Janis' issue with the testsuite.

And this version is updated to fit the changes made earlier in the 
series, and also to use the precision, instead of the mode-size, in 
order to better optimize bitfields.
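
To illustrate the arithmetic behind this, here is a minimal sketch in plain C
(not the compiler code; the function names are made up for the example). When
only the narrower input is unsigned, zero-extending it to the width of the
wider, signed input always yields a non-negative signed value, so a signed
widening multiply of that width gives the right result without stepping up to
a wider mode:

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the mixed-sign product computed directly at full width.  */
int64_t
mixed_reference (int16_t wide_signed, uint8_t narrow_unsigned)
{
  return (int64_t) wide_signed * (int64_t) narrow_unsigned;
}

/* Zero-extend the narrow unsigned input to the width of the wider signed
   input (8 -> 16 bits), then do a signed 16x16 -> 32 multiply.  */
int64_t
mixed_via_signed (int16_t wide_signed, uint8_t narrow_unsigned)
{
  int16_t extended = (int16_t) narrow_unsigned;  /* 0..255 always fits.  */
  int32_t product = (int32_t) wide_signed * (int32_t) extended;
  return (int64_t) product;
}

/* Usage example: both routes agree, including at the extremes.  */
void
check_mixed_sign (void)
{
  assert (mixed_via_signed (-32768, 255) == mixed_reference (-32768, 255));
  assert (mixed_via_signed (-3, 200) == -600);
}
```

If the unsigned input were instead as wide as the signed one, its largest
values would not fit in the signed type, which is the case that still needs
the next wider mode.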

OK?

Andrew

[-- Attachment #2: widening-multiplies-7.patch --]
[-- Type: text/x-patch, Size: 3017 bytes --]

2011-06-24  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (convert_mult_to_widen): Better handle
	unsigned inputs of different modes.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-9.c: New file.
	* gcc.target/arm/wmul-bitfield-2.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-9.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, short *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "smlalbb" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-bitfield-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+struct bf
+{
+  int a : 3;
+  unsigned int b : 15;
+  int c : 3;
+};
+
+long long
+foo (long long a, struct bf b, struct bf c)
+{
+  return a + b.b * c.c;
+}
+
+/* { dg-final { scan-assembler "smlalbb" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2121,9 +2121,18 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
     {
       if (op != smul_widen_optab)
 	{
-	  from_mode = GET_MODE_WIDER_MODE (from_mode);
-	  if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode))
-	    return false;
+	  /* We can use a signed multiply with unsigned types as long as
+	     there is a wider mode to use, or it is the smaller of the two
+	     types that is unsigned.  Note that type1 >= type2, always.  */
+	  if ((TYPE_UNSIGNED (type1)
+	       && TYPE_PRECISION (type1) == GET_MODE_PRECISION (from_mode))
+	      || (TYPE_UNSIGNED (type2)
+		  && TYPE_PRECISION (type2) == GET_MODE_PRECISION (from_mode)))
+	    {
+	      from_mode = GET_MODE_WIDER_MODE (from_mode);
+	      if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode))
+		return false;
+	    }
 
 	  op = smul_widen_optab;
 	  handler = find_widening_optab_handler_and_mode (op, to_mode,
@@ -2290,14 +2299,20 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   /* There's no such thing as a mixed sign madd yet, so use a wider mode.  */
   if (from_unsigned1 != from_unsigned2)
     {
-      enum machine_mode mode = GET_MODE_WIDER_MODE (from_mode);
-      if (GET_MODE_PRECISION (mode) < GET_MODE_PRECISION (to_mode))
+      /* We can use a signed multiply with unsigned types as long as
+	 there is a wider mode to use, or it is the smaller of the two
+	 types that is unsigned.  Note that type1 >= type2, always.  */
+      if ((from_unsigned1
+	   && TYPE_PRECISION (type1) == GET_MODE_PRECISION (from_mode))
+	  || (from_unsigned2
+	      && TYPE_PRECISION (type2) == GET_MODE_PRECISION (from_mode)))
 	{
-	  from_mode = mode;
-	  from_unsigned1 = from_unsigned2 = false;
+	  from_mode = GET_MODE_WIDER_MODE (from_mode);
+	  if (GET_MODE_SIZE (from_mode) >= GET_MODE_SIZE (to_mode))
+	    return false;
 	}
-      else
-	return false;
+
+      from_unsigned1 = from_unsigned2 = false;
     }
 
   /* If there was a conversion between the multiply and addition

* Re: [PATCH (7/7)] Mixed-sign multiplies using narrowest mode
  2011-07-14 14:44     ` Andrew Stubbs
@ 2011-07-14 14:48       ` Richard Guenther
  2011-08-19 15:56         ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-07-14 14:48 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On Thu, Jul 14, 2011 at 4:38 PM, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 28/06/11 17:23, Andrew Stubbs wrote:
>>
>> On 23/06/11 15:43, Andrew Stubbs wrote:
>>>
>>> Patch 4 introduced support for using signed multiplies to code unsigned
>>> multiplies in a narrower mode. Patch 5 then introduced support for
>>> mis-matched input modes.
>>>
>>> These two combined mean that there is a case where only the smaller of two
>>> inputs is unsigned, and yet it still tries to use a mode wider than the
>>> larger, signed input. This is bad because it means unnecessary extends
>>> and because the wider operation might not exist.
>>>
>>> This patch catches that case, and ensures that the smaller, unsigned
>>> input, is zero-extended to match the mode of the larger, signed input.
>>>
>>> Of course, both inputs may still have to be extended to fit the nearest
>>> available instruction, so it doesn't make a difference every time.
>>>
>>> OK?
>>
>> This update fixes Janis' issue with the testsuite.
>
> And this version is updated to fit the changes made earlier in the series,
> and also to use the precision, instead of the mode-size, in order to better
> optimize bitfields.
>
> OK?

Ok.

Thanks,
Richard.

> Andrew
>

* Re: [PATCH (1/7)] New optab framework for widening multiplies
  2011-07-09 15:38   ` Andrew Stubbs
@ 2011-07-14 15:29     ` Andrew Stubbs
  2011-07-22 13:01     ` Bernd Schmidt
  1 sibling, 0 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-14 15:29 UTC (permalink / raw)
  Cc: gcc-patches, patches

Ping. This is the last unreviewed patch in this series ...

Thanks

Andrew

On 09/07/11 15:43, Andrew Stubbs wrote:
> On 23/06/11 15:37, Andrew Stubbs wrote:
>> This patch should have no effect on the compiler output. It merely
>> replaces one way to represent widening operations with another, and
>> refactors the other parts of the compiler to match. The rest of the
>> patch set uses this new framework to implement the optimization
>> improvements.
>>
>> I considered and discarded many approaches to this patch before arriving
>> at this solution, and I feel sure that there'll be somebody out there
>> who will think I chose the wrong one, so let me first explain how I got
>> here ....
>>
>> The aim is to be able to encode and query optabs that have any given
>> input mode, and any given output mode. This is similar to the
>> convert_optab, but not compatible with that optab since it is handled
>> completely differently in the code.
>>
>> (Just to be clear, the existing widening multiply support only covers
>> instructions that widen by *one* mode, so it's only ever been necessary
>> to know the output mode, up to now.)
>>
>> Option 1 was to add a second dimension to the handlers table in optab_d,
>> but I discarded this option because it would increase the memory usage
>> by the square of the number of modes, which is a bit much.
>>
>> Option 2 was to add a whole new optab, similar to optab_d, but with a
>> second dimension like convert_optab_d, however this turned out to cause
>> way too many pointer type mismatches in the code, and would have been
>> very difficult to fix up.
>>
>> Option 3 was to add new optab entries for widening by two modes, by
>> three modes, and so on. True, I would only need to add one extra set for
>> what I need, but there would be so many places in the code that compare
>> against smul_widen_optab, for example, that would need to be taught
>> about these, that it seemed like a bad idea.
>>
>> Option 4 was to have a separate table that contained the widening
>> operations, and refer to that whenever a widening entry in the main
>> optab is referenced, but I found that there was no easy way to do the
>> mapping without putting some sort of switch table in
>> widening_optab_handler, and that negates the other advantages.
>>
>> So, what I've done in the end is add a new pointer entry "widening" into
>> optab_d, and dynamically build the widening operations table for each
>> optab that needs it. I've then added new accessor functions that take
>> both input and output modes, and altered the code to use them where
>> appropriate.
>>
>> The down-side of this approach is that the optab entries for widening
>> operations now have two "handlers" tables, one of which is redundant.
>> That said, those cases are in the minority, and it is the smaller table
>> which is unused.
>>
>> If people find that very distasteful, it might be possible to remove the
>> *_widen_optab entries and unify smul_optab with smul_widen_optab, and so
>> on, and save space that way. I've not done so yet, but I expect I could
>> if people feel strongly about it.
>>
>> As a side-effect, it's now possible for any optab to be "widening",
>> should some target happen to have a widening add, shift, or whatever.
>>
>> Is this patch OK?
>
> This update has been rebaselined to fix some conflicts with other recent
> commits in this area.
>
> I also identified a small bug which resulted in the operands to some
> commutative operations being reversed. I don't believe the bug did any
> harm, logically speaking, but I suppose there could be a testcase that
> resulted in worse code being generated. With this fix, I now see exactly
> matching output in all my testcases.
>
> Andrew

* [PATCH (8/7)] Fix a bug in multiply-and-accumulate
  2011-06-23 14:38 [PATCH (0/7)] Improve use of Widening Multiplies Andrew Stubbs
                   ` (7 preceding siblings ...)
  2011-06-25 16:14 ` [PATCH (0/7)] Improve use of Widening Multiplies Bernd Schmidt
@ 2011-07-18 14:34 ` Andrew Stubbs
  2011-07-18 16:09   ` Richard Guenther
  2011-07-21 13:14 ` [PATCH (9/7)] Widening multiplies with constant inputs Andrew Stubbs
  9 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-18 14:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 479 bytes --]

As far as I can tell, the patch series so far works great as long as the 
input type of the accumulate value is the same as the output type. 
Unfortunately you get an ICE otherwise .... doh!

This patch should fix the problem.

I could have inserted this fix into the correct spot in the existing 
series, but I've already regenerated the whole lot several times, it's 
getting confusing, and they're all approved already, so I'm just going 
to tack this one on the end.
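
For reference, the shape of the problem can be sketched in plain C (an
illustration only, not GIMPLE; the function names are hypothetical). The
accumulate input is narrower than the 64-bit result, so it has to be
converted to the result type before it can feed a multiply-and-accumulate:

```c
#include <assert.h>
#include <stdint.h>

/* The accumulator arrives as a 32-bit value, but the widening
   multiply-and-accumulate produces 64 bits.  The explicit conversion of
   ACC below stands in for the cast the fix inserts.  */
uint64_t
macc_widened_acc (uint32_t acc, uint16_t b, uint16_t c)
{
  uint64_t acc64 = (uint64_t) acc;
  return acc64 + (uint64_t) b * (uint64_t) c;
}

/* Usage example with the largest 16-bit inputs.  */
void
check_macc_widened_acc (void)
{
  assert (macc_widened_acc (65535u, 65535u, 65535u) == 4294901760ull);
}
```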

Andrew

[-- Attachment #2: widening-multiplies-8.patch --]
[-- Type: text/x-patch, Size: 1091 bytes --]

2011-07-18  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (convert_plusminus_to_widen): Convert add_rhs
	to the correct type.

	gcc/testsuite/
	* gcc.target/arm/wmul-10.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-10.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+unsigned long long
+foo (unsigned short a, unsigned short *b, unsigned short *c)
+{
+  return (unsigned)a + (unsigned long long)*b * (unsigned long long)*c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2375,6 +2375,10 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       mult_rhs2 = build_and_insert_cast (gsi, loc, tmp, mult_rhs2);
     }
 
+  if (TYPE_PRECISION (type) != TYPE_PRECISION (TREE_TYPE (add_rhs)))
+    add_rhs = build_and_insert_cast (gsi, loc, create_tmp_var (type, NULL),
+				     add_rhs);
+
   gimple_assign_set_rhs_with_ops_1 (gsi, wmult_code, mult_rhs1, mult_rhs2,
 				    add_rhs);
   update_stmt (gsi_stmt (*gsi));

* Re: [PATCH (8/7)] Fix a bug in multiply-and-accumulate
  2011-07-18 14:34 ` [PATCH (8/7)] Fix a bug in multiply-and-accumulate Andrew Stubbs
@ 2011-07-18 16:09   ` Richard Guenther
  2011-07-21 13:48     ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-07-18 16:09 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches

On Mon, Jul 18, 2011 at 3:14 PM, Andrew Stubbs <ams@codesourcery.com> wrote:
> As far as I can tell, the patch series so far works great as long as the
> input type of the accumulate value is the same as the output type.
> Unfortunately you get an ICE otherwise .... doh!
>
> This patch should fix the problem.
>
> I could have inserted this fix into the correct spot in the existing series,
> but I've already regenerated the whole lot several times, it's getting
> confusing, and they're all approved already, so I'm just going to tack this
> one on the end.

Will signedness be always the same?  Usually the canonical check to
use would be !useless_type_conversion_p (type, TREE_TYPE (add_rhs)).

Ok if you use that.

Thanks,
Richard.

> Andrew
>

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-14 14:26                                 ` Andrew Stubbs
@ 2011-07-19  0:36                                   ` Janis Johnson
  2011-07-19  9:01                                     ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Janis Johnson @ 2011-07-19  0:36 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Richard Guenther, Michael Matz, gcc-patches, patches

On 07/14/2011 07:16 AM, Andrew Stubbs wrote:
> { dg-options "-O2 -march=armv7-a" }

The tests use "{ dg-options "-O2 -march=armv7-a" }" but -march will be
overridden for multilibs that specify -march, and might conflict with
other multilib options.  If you really need that particular -march value
then use dg-skip-if to skip multilibs with conflicting or overriding
options, or else "dg-require-effective-target arm_dsp" to only run the
tests when the multilib already supports it; that has the advantage of
testing a wider range of arch values.

Janis

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-19  0:36                                   ` Janis Johnson
@ 2011-07-19  9:01                                     ` Andrew Stubbs
  0 siblings, 0 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-19  9:01 UTC (permalink / raw)
  To: Janis Johnson; +Cc: Richard Guenther, Michael Matz, gcc-patches, patches

On 19/07/11 00:33, Janis Johnson wrote:
> On 07/14/2011 07:16 AM, Andrew Stubbs wrote:
>> { dg-options "-O2 -march=armv7-a" }
>
> The tests use "{ dg-options "-O2 -march=armv7-a" }" but -march will be
> overridden for multilibs that specify -march, and might conflict with
> other multilib options.  If you really need that particular -march value
> then use dg-skip-if to skip multilibs with conflicting or overriding
> options, or else "dg-require-effective-target arm_dsp" to only run the
> tests when the multilib already supports it; that has the advantage of
> testing a wider range of arch values.

Yes, I know about this one. You committed that feature since I first 
posted this, I think? I plan to make that change when I do the final commit.

Andrew

* [PATCH (9/7)] Widening multiplies with constant inputs
  2011-06-23 14:38 [PATCH (0/7)] Improve use of Widening Multiplies Andrew Stubbs
                   ` (8 preceding siblings ...)
  2011-07-18 14:34 ` [PATCH (8/7)] Fix a bug in multiply-and-accumulate Andrew Stubbs
@ 2011-07-21 13:14 ` Andrew Stubbs
  2011-07-21 14:34   ` Richard Guenther
  9 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-21 13:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

[-- Attachment #1: Type: text/plain, Size: 871 bytes --]

This patch is part bug fix, part better optimization.

Firstly, my initial patch series introduced a bug that caused an 
internal compiler error when the input to a multiply was a constant. 
This was caused by the gimple verification rejecting such things. I'm 
not totally clear how this ever worked, but I've corrected it by 
inserting a temporary SSA_NAME between the constant and the multiply.

I also discovered that widening multiply-and-accumulate operations were 
not recognised if any one of the three inputs was a constant. I've 
corrected this by adjusting the pattern matching. This also required 
inserting new SSA_NAMEs to make it work.

In order to insert the new SSA_NAME, I've simply reused the existing 
type conversion code - the only difference is that the conversion may be 
a no-op, so it just generates a straightforward assignment.
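
As a rough illustration of the source-level shapes involved (plain C with
made-up function names, not the pass itself): a constant that fits in the
narrow type can be treated like any other narrow operand, so both of these
are candidates for a 32x32->64 widening multiply or multiply-and-accumulate:

```c
#include <assert.h>
#include <stdint.h>

/* Widening multiply with a constant input: 10 fits in 32 bits.  */
int64_t
times_ten (int32_t b)
{
  return (int64_t) 10 * (int64_t) b;
}

/* Widening multiply-and-accumulate with a constant multiplicand.  */
int64_t
acc_plus_times_ten (int32_t a, int32_t b)
{
  return (int64_t) a + (int64_t) b * 10;
}

/* Usage example: the product does not wrap at 32 bits.  */
void
check_constant_inputs (void)
{
  assert (times_ten (2147483647) == 21474836470ll);
  assert (acc_plus_times_ten (5, -7) == -65);
}
```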

OK?

Andrew

[-- Attachment #2: widening-multiplies-9.patch --]
[-- Type: text/x-patch, Size: 4353 bytes --]

2011-07-21  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_rhs_p): Handle constants
	beyond conversions.
	(convert_mult_to_widen): Create SSA_NAME for constant inputs.
	(convert_plusminus_to_widen): Don't automatically reject inputs that are
	not an SSA_NAME.
	Create SSA_NAME for constant inputs.

	gcc/testsuite/
	* gcc.target/arm/wmul-11.c: New file.
	* gcc.target/arm/wmul-12.c: New file.
	* gcc.target/arm/wmul-13.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-11.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (int *b)
+{
+  return 10 * (long long)*b;
+}
+
+/* { dg-final { scan-assembler "smull" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-12.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (int *b, int *c)
+{
+  int tmp = *b * *c;
+  return 10 + (long long)tmp;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-13.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (int *a, int *b)
+{
+  return *a + (long long)*b * 10;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1997,6 +1997,13 @@ is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out,
 	  type1 = TREE_TYPE (rhs1);
 	}
 
+      if (TREE_CODE (rhs1) == INTEGER_CST)
+	{
+	  *new_rhs_out = rhs1;
+	  *type_out = NULL;
+	  return true;
+	}
+
       if (TREE_CODE (type1) != TREE_CODE (type)
 	  || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type))
 	return false;
@@ -2152,7 +2159,8 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
      for the opcode.  This will be the full mode size.  */
   actual_precision = GET_MODE_PRECISION (actual_mode);
   if (actual_precision != TYPE_PRECISION (type1)
-      || from_unsigned1 != TYPE_UNSIGNED (type1))
+      || from_unsigned1 != TYPE_UNSIGNED (type1)
+      || TREE_CODE (rhs1) != SSA_NAME)
     {
       tmp = create_tmp_var (build_nonstandard_integer_type
 				(actual_precision, from_unsigned1),
@@ -2160,7 +2168,8 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
       rhs1 = build_and_insert_cast (gsi, loc, tmp, rhs1);
     }
   if (actual_precision != TYPE_PRECISION (type2)
-      || from_unsigned2 != TYPE_UNSIGNED (type2))
+      || from_unsigned2 != TYPE_UNSIGNED (type2)
+      || TREE_CODE (rhs2) != SSA_NAME)
     {
       /* Reuse the same type info, if possible.  */
       if (!tmp || from_unsigned1 != from_unsigned2)
@@ -2221,8 +2230,6 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       if (is_gimple_assign (rhs1_stmt))
 	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
     }
-  else
-    return false;
 
   if (TREE_CODE (rhs2) == SSA_NAME)
     {
@@ -2230,8 +2237,6 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       if (is_gimple_assign (rhs2_stmt))
 	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
     }
-  else
-    return false;
 
   /* Allow for one conversion statement between the multiply
      and addition/subtraction statement.  If there are more than
@@ -2358,7 +2363,8 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
      for the opcode.  This will be the full mode size.  */
   actual_precision = GET_MODE_PRECISION (actual_mode);
   if (actual_precision != TYPE_PRECISION (type1)
-      || from_unsigned1 != TYPE_UNSIGNED (type1))
+      || from_unsigned1 != TYPE_UNSIGNED (type1)
+      || TREE_CODE (mult_rhs1) != SSA_NAME)
     {
       tmp = create_tmp_var (build_nonstandard_integer_type
 				(actual_precision, from_unsigned1),
@@ -2366,7 +2372,8 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       mult_rhs1 = build_and_insert_cast (gsi, loc, tmp, mult_rhs1);
     }
   if (actual_precision != TYPE_PRECISION (type2)
-      || from_unsigned2 != TYPE_UNSIGNED (type2))
+      || from_unsigned2 != TYPE_UNSIGNED (type2)
+      || TREE_CODE (mult_rhs2) != SSA_NAME)
     {
       if (!tmp || from_unsigned1 != from_unsigned2)
 	tmp = create_tmp_var (build_nonstandard_integer_type

* Re: [PATCH (8/7)] Fix a bug in multiply-and-accumulate
  2011-07-18 16:09   ` Richard Guenther
@ 2011-07-21 13:48     ` Andrew Stubbs
  2011-08-19 16:22       ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-21 13:48 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 540 bytes --]

On 18/07/11 15:46, Richard Guenther wrote:
> Will signedness be always the same?  Usually the canonical check to
> use would be !useless_type_conversion_p (type, TREE_TYPE (add_rhs)).

The signedness ought to be unimportant - any extend will be based on the 
source type, and the signedness should not affect the addition 
operation. That said, it really ought to remain correct or else bad 
things could happen in later optimizations ....

Here is the patch I plan to commit, when patch 1 is approved, and my 
testing is complete.

Andrew

[-- Attachment #2: widening-multiplies-8.patch --]
[-- Type: text/x-patch, Size: 1118 bytes --]

2011-07-21  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (convert_plusminus_to_widen): Convert add_rhs
	to the correct type.

	gcc/testsuite/
	* gcc.target/arm/wmul-10.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-10.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+
+unsigned long long
+foo (unsigned short a, unsigned short *b, unsigned short *c)
+{
+  return (unsigned)a + (unsigned long long)*b * (unsigned long long)*c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2375,6 +2375,10 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       mult_rhs2 = build_and_insert_cast (gsi, loc, tmp, mult_rhs2);
     }
 
+  if (!useless_type_conversion_p (type, TREE_TYPE (add_rhs)))
+    add_rhs = build_and_insert_cast (gsi, loc, create_tmp_var (type, NULL),
+				     add_rhs);
+
   gimple_assign_set_rhs_with_ops_1 (gsi, wmult_code, mult_rhs1, mult_rhs2,
 				    add_rhs);
   update_stmt (gsi_stmt (*gsi));

* Re: [PATCH (9/7)] Widening multiplies with constant inputs
  2011-07-21 13:14 ` [PATCH (9/7)] Widening multiplies with constant inputs Andrew Stubbs
@ 2011-07-21 14:34   ` Richard Guenther
  2011-07-22 12:28     ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-07-21 14:34 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On Thu, Jul 21, 2011 at 2:53 PM, Andrew Stubbs <ams@codesourcery.com> wrote:
> This patch is part bug fix, part better optimization.
>
> Firstly, my initial patch series introduced a bug that caused an internal
> compiler error when the input to a multiply was a constant. This was caused
> by the gimple verification rejecting such things. I'm not totally clear how
> this ever worked, but I've corrected it by inserting a temporary SSA_NAME
> between the constant and the multiply.

Huh?  Constant operands should be perfectly fine.  What was the error
you got?

> I also discovered that widening multiply-and-accumulate operations were not
> recognised if any one of the three inputs was a constant. I've corrected
> this by adjusting the pattern matching. This also required inserting new
> SSA_NAMEs to make it work.

See above.

> In order to insert the new SSA_NAME, I've simply reused the existing type
> conversion code - the only difference is that the conversion may be a no-op,
> so it just generates a straightforward assignment.
>
> OK?

Nope.  I suppose you forgot to adjust the constant's type?  Just
fold-convert it before using it as input to a macc.

Richard.

> Andrew
>

* Re: [PATCH (2/7)] Widening multiplies by more than one mode
  2011-07-12 11:14       ` Richard Guenther
  2011-07-12 11:38         ` Andrew Stubbs
@ 2011-07-21 19:51         ` Joseph S. Myers
  2011-07-22  8:58           ` Andrew Stubbs
  1 sibling, 1 reply; 107+ messages in thread
From: Joseph S. Myers @ 2011-07-21 19:51 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Andrew Stubbs, gcc-patches, patches

On Tue, 12 Jul 2011, Richard Guenther wrote:

> (**) We really ought to forbid any arithmetic on types that have non-mode
> precision and only allow conversions to/from such types.

Arithmetic on such types is a perfectly reasonable notion to have in 
language-independent code and carry out language-independent optimizations 
on.  There may well be a case for lowering such arithmetic earlier than 
the present point at which it's lowered (expand), but it isn't obvious 
that gimplification is the right point for that lowering either.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH (2/7)] Widening multiplies by more than one mode
  2011-07-21 19:51         ` Joseph S. Myers
@ 2011-07-22  8:58           ` Andrew Stubbs
  0 siblings, 0 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-22  8:58 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Richard Guenther, gcc-patches, patches

On 21/07/11 20:29, Joseph S. Myers wrote:
> On Tue, 12 Jul 2011, Richard Guenther wrote:
>
>> (**) We really ought to forbid any arithmetic on types that have non-mode
>> precision and only allow conversions to/from such types.
>
> Arithmetic on such types is a perfectly reasonable notion to have in
> language-independent code and carry out language-independent optimizations
> on.  There may well be a case for lowering such arithmetic earlier than
> the present point at which it's lowered (expand), but it isn't obvious
> that gimplification is the right point for that lowering either.

This optimization deals with real machine instructions, and so the 
inputs must always be in whole-mode sizes. With my patch, this pass 
inserts conversions to ensure this is the case.

However, the code takes into account the true precision of each input 
when selecting the best machine instruction to use, so I think it 
satisfies both goals.

Andrew

* Re: [PATCH (9/7)] Widening multiplies with constant inputs
  2011-07-21 14:34   ` Richard Guenther
@ 2011-07-22 12:28     ` Andrew Stubbs
  2011-07-22 12:32       ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-22 12:28 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

On 21/07/11 14:22, Richard Guenther wrote:
> On Thu, Jul 21, 2011 at 2:53 PM, Andrew Stubbs<ams@codesourcery.com>  wrote:
>> This patch is part bug fix, part better optimization.
>>
>> Firstly, my initial patch series introduced a bug that caused an internal
>> compiler error when the input to a multiply was a constant. This was caused
>> by the gimple verification rejecting such things. I'm not totally clear how
>> this ever worked, but I've corrected it by inserting a temporary SSA_NAME
>> between the constant and the multiply.
>
> Huh?  Constant operands should be perfectly fine.  What was the error
> you got?

Ok, so it seems that the fold_convert we thought was redundant in patch 
5 (now moved to patch 2) was in fact responsible for making constants 
the right type.

I've rewritten it to use fold_convert to change the constant.

>> I also discovered that widening multiply-and-accumulate operations were not
>> recognised if any one of the three inputs was a constant. I've corrected
>> this by adjusting the pattern matching. This also required inserting new
>> SSA_NAMEs to make it work.
>
> See above.

The pattern matching stuff remains the same, but the constant 
conversions have been updated.

>> In order to insert the new SSA_NAME, I've simply reused the existing type
>> conversion code - the only difference is that the conversion may be a no-op,
>> so it just generates a straightforward assignment.
>>
>> OK?
>
> Nope.  I suppose you forgot to adjust the constant's type?  Just
> fold-convert it before using it as input to a macc.

Done.

OK?

Andrew

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (9/7)] Widening multiplies with constant inputs
  2011-07-22 12:28     ` Andrew Stubbs
@ 2011-07-22 12:32       ` Andrew Stubbs
  2011-07-22 12:34         ` Richard Guenther
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-22 12:32 UTC (permalink / raw)
  Cc: Richard Guenther, gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 1721 bytes --]

ENOPATCH ....

On 22/07/11 12:57, Andrew Stubbs wrote:
> On 21/07/11 14:22, Richard Guenther wrote:
>> On Thu, Jul 21, 2011 at 2:53 PM, Andrew Stubbs<ams@codesourcery.com>
>> wrote:
>>> This patch is part bug fix, part better optimization.
>>>
>>> Firstly, my initial patch series introduced a bug that caused an
>>> internal
>>> compiler error when the input to a multiply was a constant. This was
>>> caused
>>> by the gimple verification rejecting such things. I'm not totally
>>> clear how
>>> this ever worked, but I've corrected it by inserting a temporary
>>> SSA_NAME
>>> between the constant and the multiply.
>>
>> Huh? Constant operands should be perfectly fine. What was the error
>> you got?
>
> Ok, so it seems that the fold_convert we thought was redundant in patch
> 5 (now moved to patch 2) was in fact responsible for making constants
> the right type.
>
> I've rewritten it to use fold_convert to change the constant.
>
>>> I also discovered that widening multiply-and-accumulate operations
>>> were not
>>> recognised if any one of the three inputs were a constant. I've
>>> corrected
>>> this by adjusting the pattern matching. This also required inserting new
>>> SSA_NAMEs to make it work.
>>
>> See above.
>
> The pattern matching stuff remains the same, but the constant
> conversions have been updated.
>
>>> In order to insert the new SSA_NAME, I've simply reused the existing
>>> type
>>> conversion code - the only difference is that the conversion may be a
>>> no-op,
>>> so it just generates a straightforward assignment.
>>>
>>> OK?
>>
>> Nope. I suppose you forgot to adjust the constant's type? Just
>> fold-convert it before using it as input to a macc.
>
> Done.
>
> OK?
>
> Andrew
>


[-- Attachment #2: widening-multiplies-9.patch --]
[-- Type: text/x-patch, Size: 3431 bytes --]

2011-07-22  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_rhs_p): Handle constants
	beyond conversions.
	(convert_mult_to_widen): Convert constant inputs to the right type.
	(convert_plusminus_to_widen): Don't automatically reject inputs that
	are not an SSA_NAME.
	Convert constant inputs to the right type.

	gcc/testsuite/
	* gcc.target/arm/wmul-11.c: New file.
	* gcc.target/arm/wmul-12.c: New file.
	* gcc.target/arm/wmul-13.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-11.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (int *b)
+{
+  return 10 * (long long)*b;
+}
+
+/* { dg-final { scan-assembler "smull" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-12.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (int *b, int *c)
+{
+  int tmp = *b * *c;
+  return 10 + (long long)tmp;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-13.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (int *a, int *b)
+{
+  return *a + (long long)*b * 10;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1997,6 +1997,13 @@ is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out,
 	  type1 = TREE_TYPE (rhs1);
 	}
 
+      if (TREE_CODE (rhs1) == INTEGER_CST)
+	{
+	  *new_rhs_out = rhs1;
+	  *type_out = NULL;
+	  return true;
+	}
+
       if (TREE_CODE (type1) != TREE_CODE (type)
 	  || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type))
 	return false;
@@ -2170,6 +2177,12 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
       rhs2 = build_and_insert_cast (gsi, loc, tmp, rhs2);
     }
 
+  /* Handle constants.  */
+  if (TREE_CODE (rhs1) == INTEGER_CST)
+    rhs1 = fold_convert (type1, rhs1);
+  if (TREE_CODE (rhs2) == INTEGER_CST)
+    rhs2 = fold_convert (type2, rhs2);
+
   gimple_assign_set_rhs1 (stmt, rhs1);
   gimple_assign_set_rhs2 (stmt, rhs2);
   gimple_assign_set_rhs_code (stmt, WIDEN_MULT_EXPR);
@@ -2221,8 +2234,6 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       if (is_gimple_assign (rhs1_stmt))
 	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
     }
-  else
-    return false;
 
   if (TREE_CODE (rhs2) == SSA_NAME)
     {
@@ -2230,8 +2241,6 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       if (is_gimple_assign (rhs2_stmt))
 	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
     }
-  else
-    return false;
 
   /* Allow for one conversion statement between the multiply
      and addition/subtraction statement.  If there are more than
@@ -2379,6 +2388,12 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
     add_rhs = build_and_insert_cast (gsi, loc, create_tmp_var (type, NULL),
 				     add_rhs);
 
+  /* Handle constants.  */
+  if (TREE_CODE (mult_rhs1) == INTEGER_CST)
+    mult_rhs1 = fold_convert (type1, mult_rhs1);
+  if (TREE_CODE (mult_rhs2) == INTEGER_CST)
+    mult_rhs2 = fold_convert (type2, mult_rhs2);
+
   gimple_assign_set_rhs_with_ops_1 (gsi, wmult_code, mult_rhs1, mult_rhs2,
 				    add_rhs);
   update_stmt (gsi_stmt (*gsi));

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (9/7)] Widening multiplies with constant inputs
  2011-07-22 12:32       ` Andrew Stubbs
@ 2011-07-22 12:34         ` Richard Guenther
  2011-07-22 16:06           ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-07-22 12:34 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On Fri, Jul 22, 2011 at 2:07 PM, Andrew Stubbs <ams@codesourcery.com> wrote:
> ENOPATCH ....
>
> On 22/07/11 12:57, Andrew Stubbs wrote:
>>
>> On 21/07/11 14:22, Richard Guenther wrote:
>>>
>>> On Thu, Jul 21, 2011 at 2:53 PM, Andrew Stubbs<ams@codesourcery.com>
>>> wrote:
>>>>
>>>> This patch is part bug fix, part better optimization.
>>>>
>>>> Firstly, my initial patch series introduced a bug that caused an
>>>> internal
>>>> compiler error when the input to a multiply was a constant. This was
>>>> caused
>>>> by the gimple verification rejecting such things. I'm not totally
>>>> clear how
>>>> this ever worked, but I've corrected it by inserting a temporary
>>>> SSA_NAME
>>>> between the constant and the multiply.
>>>
>>> Huh? Constant operands should be perfectly fine. What was the error
>>> you got?
>>
>> Ok, so it seems that the fold_convert we thought was redundant in patch
>> 5 (now moved to patch 2) was in fact responsible for making constants
>> the right type.
>>
>> I've rewritten it to use fold_convert to change the constant.
>>
>>>> I also discovered that widening multiply-and-accumulate operations
>>>> were not
>>>> recognised if any one of the three inputs were a constant. I've
>>>> corrected
>>>> this by adjusting the pattern matching. This also required inserting new
>>>> SSA_NAMEs to make it work.
>>>
>>> See above.
>>
>> The pattern matching stuff remains the same, but the constant
>> conversions have been updated.
>>
>>>> In order to insert the new SSA_NAME, I've simply reused the existing
>>>> type
>>>> conversion code - the only difference is that the conversion may be a
>>>> no-op,
>>>> so it just generates a straightforward assignment.
>>>>
>>>> OK?
>>>
>>> Nope. I suppose you forgot to adjust the constant's type? Just
>>> fold-convert it before using it as input to a macc.
>>
>> Done.
>>
>> OK?

Ok.

Thanks,
Richard.

>> Andrew
>>
>
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (1/7)] New optab framework for widening multiplies
  2011-07-09 15:38   ` Andrew Stubbs
  2011-07-14 15:29     ` Andrew Stubbs
@ 2011-07-22 13:01     ` Bernd Schmidt
  2011-07-22 13:50       ` Andrew Stubbs
  1 sibling, 1 reply; 107+ messages in thread
From: Bernd Schmidt @ 2011-07-22 13:01 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On 07/09/11 16:43, Andrew Stubbs wrote:
>> So, what I've done in the end is add a new pointer entry "widening" into
>> optab_d, and dynamically build the widening operations table for each
>> optab that needs it. I've then added new accessor functions that take
>> both input and output modes, and altered the code to use them where
>> appropriate.

I think this is a reasonable approach given the way our code is structured.
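
For anyone following along, the idea can be sketched in plain C roughly 
as follows. This is a toy model under stated assumptions: the toy_* 
names, the four-entry mode enum and the integer handler codes are all 
invented here and are not GCC's actual types.

```c
#include <stdlib.h>

/* Toy stand-ins for machine modes and for "no instruction".  */
enum toy_mode { TOY_QI, TOY_HI, TOY_SI, TOY_DI, TOY_NUM_MODES };
#define TOY_CODE_NOTHING 0

struct toy_optab
{
  /* One handler per mode, as for ordinary operations.  */
  int handlers[TOY_NUM_MODES];
  /* Optional [to_mode][from_mode] table; stays NULL until the first
     widening handler is registered.  */
  int (*widening)[TOY_NUM_MODES];
};

/* Record CODE as the handler for widening FROM into TO.
   Returns 0 on success, -1 if the table could not be allocated.  */
int
toy_set_widening_handler (struct toy_optab *op, enum toy_mode to,
			  enum toy_mode from, int code)
{
  if (to == from)
    {
      op->handlers[to] = code;
      return 0;
    }

  if (op->widening == NULL)
    {
      /* Zero-filled, so every entry starts as TOY_CODE_NOTHING.  */
      op->widening = calloc (TOY_NUM_MODES, sizeof *op->widening);
      if (op->widening == NULL)
	return -1;
    }

  op->widening[to][from] = code;
  return 0;
}

/* Look up the handler for widening FROM into TO.  */
int
toy_widening_handler (const struct toy_optab *op, enum toy_mode to,
		      enum toy_mode from)
{
  if (to == from)
    return op->handlers[to];
  if (op->widening)
    return op->widening[to][from];
  return TOY_CODE_NOTHING;
}
```

Optabs that never register a widening pattern pay only for one extra 
pointer, which is the attraction of building the table on demand.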

> @@ -1242,7 +1242,8 @@ expand_binop_directly (enum machine_mode mode, optab binoptab,
>  		       rtx target, int unsignedp, enum optab_methods methods,
>  		       rtx last)
>  {
> -  enum insn_code icode = optab_handler (binoptab, mode);
> +  enum machine_mode from_mode = GET_MODE (op0);
> +  enum insn_code icode = widening_optab_handler (binoptab, mode, from_mode);

Please add a new function along the lines of

enum machine_mode
widened_mode (enum machine_mode to_mode, rtx op0, rtx op1)
{
  if (GET_MODE (op1) == VOIDmode)
    return GET_MODE (op0);
  gcc_assert (GET_MODE (op0) == GET_MODE (op1));
  return GET_MODE (op0);
}

I'll want to extend this at some point to allow widening multiplies
where only one operand is widened (with a new set of optabs).
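
A self-contained model of that helper, for illustration only: the sk_* 
types below are hypothetical stand-ins for machine modes and rtx 
operands, and the to_mode parameter is left out because nothing uses it 
yet. The void-mode case covers a constant operand, which carries no 
mode of its own.

```c
#include <assert.h>

/* Toy stand-ins; SK_VOID plays the role of VOIDmode.  */
enum sk_mode { SK_VOID, SK_QI, SK_HI, SK_SI, SK_DI };
struct sk_rtx { enum sk_mode mode; };

/* Return the mode the operands are being widened from.  If OP1 has no
   mode, OP0 decides; otherwise both operands must agree.  */
enum sk_mode
sk_widened_mode (const struct sk_rtx *op0, const struct sk_rtx *op1)
{
  if (op1->mode == SK_VOID)
    return op0->mode;
  assert (op0->mode == op1->mode);
  return op0->mode;
}
```

Allowing only one operand to be widened would mean relaxing the 
assertion and choosing between the two modes instead.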

> -	if (optab_handler (binoptab, wider_mode) != CODE_FOR_nothing
> +	if (optab_handler (binoptab, wider_mode)
> +		!= CODE_FOR_nothing

Spurious formatting change.

Otherwise ok.


Bernd

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (1/7)] New optab framework for widening multiplies
  2011-07-22 13:01     ` Bernd Schmidt
@ 2011-07-22 13:50       ` Andrew Stubbs
  2011-07-22 14:01         ` Bernd Schmidt
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-22 13:50 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, patches

On 22/07/11 13:34, Bernd Schmidt wrote:
>> @@ -1242,7 +1242,8 @@ expand_binop_directly (enum machine_mode mode, optab binoptab,
>> >    		rtx target, int unsignedp, enum optab_methods methods,
>> >    		rtx last)
>> >    {
>> >  -  enum insn_code icode = optab_handler (binoptab, mode);
>> >  +  enum machine_mode from_mode = GET_MODE (op0);
>> >  +  enum insn_code icode = widening_optab_handler (binoptab, mode, from_mode);
> Please add a new function along the lines of
>
> enum machine_mode
> widened_mode (enum machine_mode to_mode, rtx op0, rtx op1)
> {
>    if (GET_MODE (op1) == VOIDmode)
>      return GET_MODE (op0);
>    gcc_assert (GET_MODE (op0) == GET_MODE (op1));
>    return GET_MODE (op0);
> }
>
> I'll want to extend this at some point to allow widening multiplies
> where only one operand is widened (with a new set of optabs).

Sorry, I don't quite understand what you're getting at here?

expand_binop_directly is only ever used, I think, when the tree 
optimizer has already identified what insn to use. Both before and after 
my patch, the tree-cfg gimple verification requires that both op0 and 
op1 are the same mode, and non-widening operations are always the same 
mode, so I think my code is perfectly adequate. Is that not so?

If you want to add support for machine instructions that only widen one 
input, then that's surely a separate problem? If the target mode is 
smaller than the combined size of the inputs, then the changes to the 
widening_mul pass would be non-trivial.

If the point is just to be absolutely certain that the inputs are valid 
then I'm happy to add the function. BTW, did you mean to have the 
unused parameter?

Andrew

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (1/7)] New optab framework for widening multiplies
  2011-07-22 13:50       ` Andrew Stubbs
@ 2011-07-22 14:01         ` Bernd Schmidt
  2011-07-22 15:52           ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Bernd Schmidt @ 2011-07-22 14:01 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

On 07/22/11 15:27, Andrew Stubbs wrote:
> On 22/07/11 13:34, Bernd Schmidt wrote:
>>> @@ -1242,7 +1242,8 @@ expand_binop_directly (enum machine_mode mode,
>>> optab binoptab,
>>> >            rtx target, int unsignedp, enum optab_methods methods,
>>> >            rtx last)
>>> >    {
>>> >  -  enum insn_code icode = optab_handler (binoptab, mode);
>>> >  +  enum machine_mode from_mode = GET_MODE (op0);
>>> >  +  enum insn_code icode = widening_optab_handler (binoptab, mode,
>>> from_mode);
>> Please add a new function along the lines of
>>
>> enum machine_mode
>> widened_mode (enum machine_mode to_mode, rtx op0, rtx op1)
>> {
>>    if (GET_MODE (op1) == VOIDmode)
>>      return GET_MODE (op0);
>>    gcc_assert (GET_MODE (op0) == GET_MODE (op1));
>>    return GET_MODE (op0);
>> }
>>
>> I'll want to extend this at some point to allow widening multiplies
>> where only one operand is widened (with a new set of optabs).
> 
> Sorry, I don't quite understand what you're getting at here?
> 
> expand_binop_directly is only ever used, I think, when the tree
> optimizer has already identified what insn to use. Both before and after
> my patch, the tree-cfg gimple verification requires that both op0 and
> op1 are the same mode, and non-widening operations are always the same
> mode, so I think my code is perfectly adequate. Is that not so?

For the moment, yes.

Oh well, let's shelve it and do it later.


Bernd

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (1/7)] New optab framework for widening multiplies
  2011-07-22 14:01         ` Bernd Schmidt
@ 2011-07-22 15:52           ` Andrew Stubbs
  2011-08-19 14:41             ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-22 15:52 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 167 bytes --]

On 22/07/11 14:28, Bernd Schmidt wrote:
> Oh well, let's shelve it and do it later.

Here's an updated patch with the formatting problem you found fixed.

OK?

Andrew

[-- Attachment #2: widening-multiplies-1.patch --]
[-- Type: text/x-patch, Size: 14052 bytes --]

2011-07-22  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* expr.c (expand_expr_real_2): Use widening_optab_handler.
	* genopinit.c (optabs): Use set_widening_optab_handler for $N.
	(gen_insn): $N now means $b must be wider than $a, not consecutive.
	* optabs.c (expand_widen_pattern_expr): Use widening_optab_handler.
	(expand_binop_directly): Likewise.
	(expand_binop): Likewise.
	* optabs.h (widening_optab_handlers): New struct.
	(optab_d): New member, 'widening'.
	(widening_optab_handler): New function.
	(set_widening_optab_handler): New function.
	* tree-ssa-math-opts.c (convert_mult_to_widen): Use
	widening_optab_handler.
	(convert_plusminus_to_widen): Likewise.

--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7662,7 +7662,8 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 	  this_optab = usmul_widen_optab;
 	  if (mode == GET_MODE_2XWIDER_MODE (innermode))
 	    {
-	      if (optab_handler (this_optab, mode) != CODE_FOR_nothing)
+	      if (widening_optab_handler (this_optab, mode, innermode)
+		    != CODE_FOR_nothing)
 		{
 		  if (TYPE_UNSIGNED (TREE_TYPE (treeop0)))
 		    expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
@@ -7689,7 +7690,8 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 	  if (mode == GET_MODE_2XWIDER_MODE (innermode)
 	      && TREE_CODE (treeop0) != INTEGER_CST)
 	    {
-	      if (optab_handler (this_optab, mode) != CODE_FOR_nothing)
+	      if (widening_optab_handler (this_optab, mode, innermode)
+		    != CODE_FOR_nothing)
 		{
 		  expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
 				   EXPAND_NORMAL);
@@ -7697,7 +7699,8 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 					       unsignedp, this_optab);
 		  return REDUCE_BIT_FIELD (temp);
 		}
-	      if (optab_handler (other_optab, mode) != CODE_FOR_nothing
+	      if (widening_optab_handler (other_optab, mode, innermode)
+		    != CODE_FOR_nothing
 		  && innermode == word_mode)
 		{
 		  rtx htem, hipart;
--- a/gcc/genopinit.c
+++ b/gcc/genopinit.c
@@ -46,10 +46,12 @@ along with GCC; see the file COPYING3.  If not see
    used.  $A and $B are replaced with the full name of the mode; $a and $b
    are replaced with the short form of the name, as above.
 
-   If $N is present in the pattern, it means the two modes must be consecutive
-   widths in the same mode class (e.g, QImode and HImode).  $I means that
-   only full integer modes should be considered for the next mode, and $F
-   means that only float modes should be considered.
+   If $N is present in the pattern, it means the two modes must be in
+   the same mode class, and $b must be greater than $a (e.g, QImode
+   and HImode).
+
+   $I means that only full integer modes should be considered for the
+   next mode, and $F means that only float modes should be considered.
    $P means that both full and partial integer modes should be considered.
    $Q means that only fixed-point modes should be considered.
 
@@ -99,17 +101,17 @@ static const char * const optabs[] =
   "set_optab_handler (smulv_optab, $A, CODE_FOR_$(mulv$I$a3$))",
   "set_optab_handler (umul_highpart_optab, $A, CODE_FOR_$(umul$a3_highpart$))",
   "set_optab_handler (smul_highpart_optab, $A, CODE_FOR_$(smul$a3_highpart$))",
-  "set_optab_handler (smul_widen_optab, $B, CODE_FOR_$(mul$a$b3$)$N)",
-  "set_optab_handler (umul_widen_optab, $B, CODE_FOR_$(umul$a$b3$)$N)",
-  "set_optab_handler (usmul_widen_optab, $B, CODE_FOR_$(usmul$a$b3$)$N)",
-  "set_optab_handler (smadd_widen_optab, $B, CODE_FOR_$(madd$a$b4$)$N)",
-  "set_optab_handler (umadd_widen_optab, $B, CODE_FOR_$(umadd$a$b4$)$N)",
-  "set_optab_handler (ssmadd_widen_optab, $B, CODE_FOR_$(ssmadd$a$b4$)$N)",
-  "set_optab_handler (usmadd_widen_optab, $B, CODE_FOR_$(usmadd$a$b4$)$N)",
-  "set_optab_handler (smsub_widen_optab, $B, CODE_FOR_$(msub$a$b4$)$N)",
-  "set_optab_handler (umsub_widen_optab, $B, CODE_FOR_$(umsub$a$b4$)$N)",
-  "set_optab_handler (ssmsub_widen_optab, $B, CODE_FOR_$(ssmsub$a$b4$)$N)",
-  "set_optab_handler (usmsub_widen_optab, $B, CODE_FOR_$(usmsub$a$b4$)$N)",
+  "set_widening_optab_handler (smul_widen_optab, $B, $A, CODE_FOR_$(mul$a$b3$)$N)",
+  "set_widening_optab_handler (umul_widen_optab, $B, $A, CODE_FOR_$(umul$a$b3$)$N)",
+  "set_widening_optab_handler (usmul_widen_optab, $B, $A, CODE_FOR_$(usmul$a$b3$)$N)",
+  "set_widening_optab_handler (smadd_widen_optab, $B, $A, CODE_FOR_$(madd$a$b4$)$N)",
+  "set_widening_optab_handler (umadd_widen_optab, $B, $A, CODE_FOR_$(umadd$a$b4$)$N)",
+  "set_widening_optab_handler (ssmadd_widen_optab, $B, $A, CODE_FOR_$(ssmadd$a$b4$)$N)",
+  "set_widening_optab_handler (usmadd_widen_optab, $B, $A, CODE_FOR_$(usmadd$a$b4$)$N)",
+  "set_widening_optab_handler (smsub_widen_optab, $B, $A, CODE_FOR_$(msub$a$b4$)$N)",
+  "set_widening_optab_handler (umsub_widen_optab, $B, $A, CODE_FOR_$(umsub$a$b4$)$N)",
+  "set_widening_optab_handler (ssmsub_widen_optab, $B, $A, CODE_FOR_$(ssmsub$a$b4$)$N)",
+  "set_widening_optab_handler (usmsub_widen_optab, $B, $A, CODE_FOR_$(usmsub$a$b4$)$N)",
   "set_optab_handler (sdiv_optab, $A, CODE_FOR_$(div$a3$))",
   "set_optab_handler (ssdiv_optab, $A, CODE_FOR_$(ssdiv$Q$a3$))",
   "set_optab_handler (sdivv_optab, $A, CODE_FOR_$(div$V$I$a3$))",
@@ -305,7 +307,7 @@ gen_insn (rtx insn)
     {
       int force_float = 0, force_int = 0, force_partial_int = 0;
       int force_fixed = 0;
-      int force_consec = 0;
+      int force_wider = 0;
       int matches = 1;
 
       for (pp = optabs[pindex]; pp[0] != '$' || pp[1] != '('; pp++)
@@ -323,7 +325,7 @@ gen_insn (rtx insn)
 	    switch (*++pp)
 	      {
 	      case 'N':
-		force_consec = 1;
+		force_wider = 1;
 		break;
 	      case 'I':
 		force_int = 1;
@@ -392,7 +394,10 @@ gen_insn (rtx insn)
 			    || mode_class[i] == MODE_VECTOR_FRACT
 			    || mode_class[i] == MODE_VECTOR_UFRACT
 			    || mode_class[i] == MODE_VECTOR_ACCUM
-			    || mode_class[i] == MODE_VECTOR_UACCUM))
+			    || mode_class[i] == MODE_VECTOR_UACCUM)
+			&& (! force_wider
+			    || *pp == 'a'
+			    || m1 < i))
 		      break;
 		  }
 
@@ -412,8 +417,7 @@ gen_insn (rtx insn)
 	}
 
       if (matches && pp[0] == '$' && pp[1] == ')'
-	  && *np == 0
-	  && (! force_consec || (int) GET_MODE_WIDER_MODE(m1) == m2))
+	  && *np == 0)
 	break;
     }
 
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -515,8 +515,8 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
   if (ops->code == WIDEN_MULT_PLUS_EXPR
       || ops->code == WIDEN_MULT_MINUS_EXPR)
-    icode = optab_handler (widen_pattern_optab,
-			   TYPE_MODE (TREE_TYPE (ops->op2)));
+    icode = widening_optab_handler (widen_pattern_optab,
+				    TYPE_MODE (TREE_TYPE (ops->op2)), tmode0);
   else
     icode = optab_handler (widen_pattern_optab, tmode0);
   gcc_assert (icode != CODE_FOR_nothing);
@@ -1242,7 +1242,8 @@ expand_binop_directly (enum machine_mode mode, optab binoptab,
 		       rtx target, int unsignedp, enum optab_methods methods,
 		       rtx last)
 {
-  enum insn_code icode = optab_handler (binoptab, mode);
+  enum machine_mode from_mode = GET_MODE (op0);
+  enum insn_code icode = widening_optab_handler (binoptab, mode, from_mode);
   enum machine_mode xmode0 = insn_data[(int) icode].operand[1].mode;
   enum machine_mode xmode1 = insn_data[(int) icode].operand[2].mode;
   enum machine_mode mode0, mode1, tmp_mode;
@@ -1389,7 +1390,8 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
   /* If we can do it with a three-operand insn, do so.  */
 
   if (methods != OPTAB_MUST_WIDEN
-      && optab_handler (binoptab, mode) != CODE_FOR_nothing)
+      && widening_optab_handler (binoptab, mode, GET_MODE (op0))
+	    != CODE_FOR_nothing)
     {
       temp = expand_binop_directly (mode, binoptab, op0, op1, target,
 				    unsignedp, methods, last);
@@ -1429,8 +1431,9 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 
   if (binoptab == smul_optab
       && GET_MODE_2XWIDER_MODE (mode) != VOIDmode
-      && (optab_handler ((unsignedp ? umul_widen_optab : smul_widen_optab),
-			 GET_MODE_2XWIDER_MODE (mode))
+      && (widening_optab_handler ((unsignedp ? umul_widen_optab
+					     : smul_widen_optab),
+				  GET_MODE_2XWIDER_MODE (mode), mode)
 	  != CODE_FOR_nothing))
     {
       temp = expand_binop (GET_MODE_2XWIDER_MODE (mode),
@@ -1460,9 +1463,10 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	if (optab_handler (binoptab, wider_mode) != CODE_FOR_nothing
 	    || (binoptab == smul_optab
 		&& GET_MODE_WIDER_MODE (wider_mode) != VOIDmode
-		&& (optab_handler ((unsignedp ? umul_widen_optab
-				    : smul_widen_optab),
-				   GET_MODE_WIDER_MODE (wider_mode))
+		&& (widening_optab_handler ((unsignedp ? umul_widen_optab
+						       : smul_widen_optab),
+					    GET_MODE_WIDER_MODE (wider_mode),
+					    mode)
 		    != CODE_FOR_nothing)))
 	  {
 	    rtx xop0 = op0, xop1 = op1;
@@ -1895,8 +1899,8 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
       && optab_handler (add_optab, word_mode) != CODE_FOR_nothing)
     {
       rtx product = NULL_RTX;
-
-      if (optab_handler (umul_widen_optab, mode) != CODE_FOR_nothing)
+      if (widening_optab_handler (umul_widen_optab, mode, word_mode)
+	    != CODE_FOR_nothing)
 	{
 	  product = expand_doubleword_mult (mode, op0, op1, target,
 					    true, methods);
@@ -1905,7 +1909,8 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	}
 
       if (product == NULL_RTX
-	  && optab_handler (smul_widen_optab, mode) != CODE_FOR_nothing)
+	  && widening_optab_handler (smul_widen_optab, mode, word_mode)
+		!= CODE_FOR_nothing)
 	{
 	  product = expand_doubleword_mult (mode, op0, op1, target,
 					    false, methods);
@@ -1996,7 +2001,8 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	   wider_mode != VOIDmode;
 	   wider_mode = GET_MODE_WIDER_MODE (wider_mode))
 	{
-	  if (optab_handler (binoptab, wider_mode) != CODE_FOR_nothing
+	  if (widening_optab_handler (binoptab, wider_mode, mode)
+		  != CODE_FOR_nothing
 	      || (methods == OPTAB_LIB
 		  && optab_libfunc (binoptab, wider_mode)))
 	    {
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -42,6 +42,11 @@ struct optab_handlers
   int insn_code;
 };
 
+struct widening_optab_handlers
+{
+  struct optab_handlers handlers[NUM_MACHINE_MODES][NUM_MACHINE_MODES];
+};
+
 struct optab_d
 {
   enum rtx_code code;
@@ -50,6 +55,7 @@ struct optab_d
   void (*libcall_gen)(struct optab_d *, const char *name, char suffix,
 		      enum machine_mode);
   struct optab_handlers handlers[NUM_MACHINE_MODES];
+  struct widening_optab_handlers *widening;
 };
 typedef struct optab_d * optab;
 
@@ -876,6 +882,23 @@ optab_handler (optab op, enum machine_mode mode)
 			   + (int) CODE_FOR_nothing);
 }
 
+/* Like optab_handler, but for widening operations that have a TO_MODE and
+   a FROM_MODE.  */
+
+static inline enum insn_code
+widening_optab_handler (optab op, enum machine_mode to_mode,
+			enum machine_mode from_mode)
+{
+  if (to_mode == from_mode)
+    return optab_handler (op, to_mode);
+
+  if (op->widening)
+    return (enum insn_code) (op->widening->handlers[(int) to_mode][(int) from_mode].insn_code
+			     + (int) CODE_FOR_nothing);
+
+  return CODE_FOR_nothing;
+}
+
 /* Record that insn CODE should be used to implement mode MODE of OP.  */
 
 static inline void
@@ -884,6 +907,26 @@ set_optab_handler (optab op, enum machine_mode mode, enum insn_code code)
   op->handlers[(int) mode].insn_code = (int) code - (int) CODE_FOR_nothing;
 }
 
+/* Like set_optab_handler, but for widening operations that have a TO_MODE
+   and a FROM_MODE.  */
+
+static inline void
+set_widening_optab_handler (optab op, enum machine_mode to_mode,
+			    enum machine_mode from_mode, enum insn_code code)
+{
+  if (to_mode == from_mode)
+    set_optab_handler (op, to_mode, code);
+  else
+    {
+      if (op->widening == NULL)
+	op->widening = (struct widening_optab_handlers *)
+	      xcalloc (1, sizeof (struct widening_optab_handlers));
+
+      op->widening->handlers[(int) to_mode][(int) from_mode].insn_code
+	  = (int) code - (int) CODE_FOR_nothing;
+    }
+}
+
 /* Return the insn used to perform conversion OP from mode FROM_MODE
    to mode TO_MODE; return CODE_FOR_nothing if the target does not have
    such an insn.  */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2055,6 +2055,8 @@ convert_mult_to_widen (gimple stmt)
 {
   tree lhs, rhs1, rhs2, type, type1, type2;
   enum insn_code handler;
+  enum machine_mode to_mode, from_mode;
+  optab op;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2064,12 +2066,17 @@ convert_mult_to_widen (gimple stmt)
   if (!is_widening_mult_p (stmt, &type1, &rhs1, &type2, &rhs2))
     return false;
 
+  to_mode = TYPE_MODE (type);
+  from_mode = TYPE_MODE (type1);
+
   if (TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2))
-    handler = optab_handler (umul_widen_optab, TYPE_MODE (type));
+    op = umul_widen_optab;
   else if (!TYPE_UNSIGNED (type1) && !TYPE_UNSIGNED (type2))
-    handler = optab_handler (smul_widen_optab, TYPE_MODE (type));
+    op = smul_widen_optab;
   else
-    handler = optab_handler (usmul_widen_optab, TYPE_MODE (type));
+    op = usmul_widen_optab;
+
+  handler = widening_optab_handler (op, to_mode, from_mode);
 
   if (handler == CODE_FOR_nothing)
     return false;
@@ -2171,7 +2178,8 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */
   this_optab = optab_for_tree_code (wmult_code, type1, optab_default);
-  if (optab_handler (this_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
+  if (widening_optab_handler (this_optab, TYPE_MODE (type), TYPE_MODE (type1))
+	== CODE_FOR_nothing)
     return false;
 
   /* ??? May need some type verification here?  */

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (9/7)] Widening multiplies with constant inputs
  2011-07-22 12:34         ` Richard Guenther
@ 2011-07-22 16:06           ` Andrew Stubbs
  2011-08-19 16:24             ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-07-22 16:06 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 197 bytes --]

On 22/07/11 13:17, Richard Guenther wrote:
> Ok.

I found a NULL-pointer dereference bug.

Fixed in the attached. I'll commit this version when the rest of my 
testing is complete.

Thanks

Andrew

[-- Attachment #2: widening-multiplies-9.patch --]
[-- Type: text/x-patch, Size: 3439 bytes --]

2011-07-22  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_rhs_p): Handle constants
	beyond conversions.
	(convert_mult_to_widen): Convert constant inputs to the right type.
	(convert_plusminus_to_widen): Don't automatically reject inputs that
	are not an SSA_NAME.
	Convert constant inputs to the right type.

	gcc/testsuite/
	* gcc.target/arm/wmul-11.c: New file.
	* gcc.target/arm/wmul-12.c: New file.
	* gcc.target/arm/wmul-13.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-11.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (int *b)
+{
+  return 10 * (long long)*b;
+}
+
+/* { dg-final { scan-assembler "smull" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-12.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (int *b, int *c)
+{
+  int tmp = *b * *c;
+  return 10 + (long long)tmp;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-13.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (int *a, int *b)
+{
+  return *a + (long long)*b * 10;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1997,6 +1997,13 @@ is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out,
 	  type1 = TREE_TYPE (rhs1);
 	}
 
+      if (rhs1 && TREE_CODE (rhs1) == INTEGER_CST)
+	{
+	  *new_rhs_out = rhs1;
+	  *type_out = NULL;
+	  return true;
+	}
+
       if (TREE_CODE (type1) != TREE_CODE (type)
 	  || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type))
 	return false;
@@ -2170,6 +2177,12 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
       rhs2 = build_and_insert_cast (gsi, loc, tmp, rhs2);
     }
 
+  /* Handle constants.  */
+  if (TREE_CODE (rhs1) == INTEGER_CST)
+    rhs1 = fold_convert (type1, rhs1);
+  if (TREE_CODE (rhs2) == INTEGER_CST)
+    rhs2 = fold_convert (type2, rhs2);
+
   gimple_assign_set_rhs1 (stmt, rhs1);
   gimple_assign_set_rhs2 (stmt, rhs2);
   gimple_assign_set_rhs_code (stmt, WIDEN_MULT_EXPR);
@@ -2221,8 +2234,6 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       if (is_gimple_assign (rhs1_stmt))
 	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
     }
-  else
-    return false;
 
   if (TREE_CODE (rhs2) == SSA_NAME)
     {
@@ -2230,8 +2241,6 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       if (is_gimple_assign (rhs2_stmt))
 	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
     }
-  else
-    return false;
 
   /* Allow for one conversion statement between the multiply
      and addition/subtraction statement.  If there are more than
@@ -2379,6 +2388,12 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
     add_rhs = build_and_insert_cast (gsi, loc, create_tmp_var (type, NULL),
 				     add_rhs);
 
+  /* Handle constants.  */
+  if (TREE_CODE (mult_rhs1) == INTEGER_CST)
+    mult_rhs1 = fold_convert (type1, mult_rhs1);
+  if (TREE_CODE (mult_rhs2) == INTEGER_CST)
+    mult_rhs2 = fold_convert (type2, mult_rhs2);
+
   gimple_assign_set_rhs_with_ops_1 (gsi, wmult_code, mult_rhs1, mult_rhs2,
 				    add_rhs);
   update_stmt (gsi_stmt (*gsi));

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (1/7)] New optab framework for widening multiplies
  2011-07-22 15:52           ` Andrew Stubbs
@ 2011-08-19 14:41             ` Andrew Stubbs
  2011-08-19 14:55               ` Richard Guenther
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-08-19 14:41 UTC (permalink / raw)
  Cc: Bernd Schmidt, gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 702 bytes --]

On 22/07/11 16:34, Andrew Stubbs wrote:
> On 22/07/11 14:28, Bernd Schmidt wrote:
>> Oh well, let's shelve it and do it later.
>
> Here's an updated patch with the formatting problem you found fixed.

I've just committed an updated version of this patch (attached).

I found a number of subtle bugs while I was testing, and these have now
been corrected. In particular, I found that VOIDmode constants were not
handled correctly; I've added a function "widened_mode", along the lines
originally suggested by Bernd, to deal with this. I also found one case
where the generated code differed from what was produced previously. It
was actually corrected later in the patch series, but I've fixed it here
now.

Andrew
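
For readers following along, the VOIDmode handling described above can be
sketched outside of GCC. This is only an illustration of the selection rule
in the attached widened_mode function: the enum, the size table, and the
name pick_from_mode are hypothetical stand-ins, not GCC's real machine-mode
definitions.

```c
/* Hypothetical stand-ins for GCC machine modes (assumption: this is
   not the real enum, just enough to show the rule).  */
enum mode { VOIDmode, QImode, HImode, SImode, DImode, NUM_MODES };

/* Size in bytes of each toy mode; VOIDmode (a constant) has none.  */
static const int mode_size[NUM_MODES] = { 0, 1, 2, 4, 8 };

/* Choose the from_mode for a widening operation that produces TO_MODE
   from operands whose modes are M0 and M1.  A constant operand has
   VOIDmode, so the other operand's mode is used instead.  */
static enum mode
pick_from_mode (enum mode to_mode, enum mode m0, enum mode m1)
{
  enum mode result;

  /* Two constants: nothing to widen from, fall back to TO_MODE.  */
  if (m0 == VOIDmode && m1 == VOIDmode)
    return to_mode;
  else if (m0 == VOIDmode || mode_size[m0] < mode_size[m1])
    result = m1;		/* M0 is a constant, or M1 is wider.  */
  else
    result = m0;

  /* Never report a from_mode wider than the destination.  */
  if (mode_size[result] > mode_size[to_mode])
    return to_mode;

  return result;
}
```

With this rule, a constant multiplied by an SImode register and widened to
DImode is looked up as SImode->DImode rather than VOIDmode->DImode.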


[-- Attachment #2: widening-multiplies-1.patch --]
[-- Type: text/x-patch, Size: 15169 bytes --]

2011-08-19  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* expr.c (expand_expr_real_2): Use widening_optab_handler.
	* genopinit.c (optabs): Use set_widening_optab_handler for $N.
	(gen_insn): $N now means $a must be wider than $b, not consecutive.
	* optabs.c (widened_mode): New function.
	(expand_widen_pattern_expr): Use widening_optab_handler.
	(expand_binop_directly): Likewise.
	(expand_binop): Likewise.
	* optabs.h (widening_optab_handlers): New struct.
	(optab_d): New member, 'widening'.
	(widening_optab_handler): New function.
	(set_widening_optab_handler): New function.
	* tree-ssa-math-opts.c (convert_mult_to_widen): Use
	widening_optab_handler.
	(convert_plusminus_to_widen): Likewise.

--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8005,7 +8005,8 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 	  this_optab = usmul_widen_optab;
 	  if (mode == GET_MODE_2XWIDER_MODE (innermode))
 	    {
-	      if (optab_handler (this_optab, mode) != CODE_FOR_nothing)
+	      if (widening_optab_handler (this_optab, mode, innermode)
+		    != CODE_FOR_nothing)
 		{
 		  if (TYPE_UNSIGNED (TREE_TYPE (treeop0)))
 		    expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
@@ -8032,7 +8033,8 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 	  if (mode == GET_MODE_2XWIDER_MODE (innermode)
 	      && TREE_CODE (treeop0) != INTEGER_CST)
 	    {
-	      if (optab_handler (this_optab, mode) != CODE_FOR_nothing)
+	      if (widening_optab_handler (this_optab, mode, innermode)
+		    != CODE_FOR_nothing)
 		{
 		  expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
 				   EXPAND_NORMAL);
@@ -8040,7 +8042,8 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 					       unsignedp, this_optab);
 		  return REDUCE_BIT_FIELD (temp);
 		}
-	      if (optab_handler (other_optab, mode) != CODE_FOR_nothing
+	      if (widening_optab_handler (other_optab, mode, innermode)
+		    != CODE_FOR_nothing
 		  && innermode == word_mode)
 		{
 		  rtx htem, hipart;
--- a/gcc/genopinit.c
+++ b/gcc/genopinit.c
@@ -46,10 +46,12 @@ along with GCC; see the file COPYING3.  If not see
    used.  $A and $B are replaced with the full name of the mode; $a and $b
    are replaced with the short form of the name, as above.
 
-   If $N is present in the pattern, it means the two modes must be consecutive
-   widths in the same mode class (e.g, QImode and HImode).  $I means that
-   only full integer modes should be considered for the next mode, and $F
-   means that only float modes should be considered.
+   If $N is present in the pattern, it means the two modes must be in
+   the same mode class, and $b must be greater than $a (e.g, QImode
+   and HImode).
+
+   $I means that only full integer modes should be considered for the
+   next mode, and $F means that only float modes should be considered.
    $P means that both full and partial integer modes should be considered.
    $Q means that only fixed-point modes should be considered.
 
@@ -99,17 +101,17 @@ static const char * const optabs[] =
   "set_optab_handler (smulv_optab, $A, CODE_FOR_$(mulv$I$a3$))",
   "set_optab_handler (umul_highpart_optab, $A, CODE_FOR_$(umul$a3_highpart$))",
   "set_optab_handler (smul_highpart_optab, $A, CODE_FOR_$(smul$a3_highpart$))",
-  "set_optab_handler (smul_widen_optab, $B, CODE_FOR_$(mul$a$b3$)$N)",
-  "set_optab_handler (umul_widen_optab, $B, CODE_FOR_$(umul$a$b3$)$N)",
-  "set_optab_handler (usmul_widen_optab, $B, CODE_FOR_$(usmul$a$b3$)$N)",
-  "set_optab_handler (smadd_widen_optab, $B, CODE_FOR_$(madd$a$b4$)$N)",
-  "set_optab_handler (umadd_widen_optab, $B, CODE_FOR_$(umadd$a$b4$)$N)",
-  "set_optab_handler (ssmadd_widen_optab, $B, CODE_FOR_$(ssmadd$a$b4$)$N)",
-  "set_optab_handler (usmadd_widen_optab, $B, CODE_FOR_$(usmadd$a$b4$)$N)",
-  "set_optab_handler (smsub_widen_optab, $B, CODE_FOR_$(msub$a$b4$)$N)",
-  "set_optab_handler (umsub_widen_optab, $B, CODE_FOR_$(umsub$a$b4$)$N)",
-  "set_optab_handler (ssmsub_widen_optab, $B, CODE_FOR_$(ssmsub$a$b4$)$N)",
-  "set_optab_handler (usmsub_widen_optab, $B, CODE_FOR_$(usmsub$a$b4$)$N)",
+  "set_widening_optab_handler (smul_widen_optab, $B, $A, CODE_FOR_$(mul$a$b3$)$N)",
+  "set_widening_optab_handler (umul_widen_optab, $B, $A, CODE_FOR_$(umul$a$b3$)$N)",
+  "set_widening_optab_handler (usmul_widen_optab, $B, $A, CODE_FOR_$(usmul$a$b3$)$N)",
+  "set_widening_optab_handler (smadd_widen_optab, $B, $A, CODE_FOR_$(madd$a$b4$)$N)",
+  "set_widening_optab_handler (umadd_widen_optab, $B, $A, CODE_FOR_$(umadd$a$b4$)$N)",
+  "set_widening_optab_handler (ssmadd_widen_optab, $B, $A, CODE_FOR_$(ssmadd$a$b4$)$N)",
+  "set_widening_optab_handler (usmadd_widen_optab, $B, $A, CODE_FOR_$(usmadd$a$b4$)$N)",
+  "set_widening_optab_handler (smsub_widen_optab, $B, $A, CODE_FOR_$(msub$a$b4$)$N)",
+  "set_widening_optab_handler (umsub_widen_optab, $B, $A, CODE_FOR_$(umsub$a$b4$)$N)",
+  "set_widening_optab_handler (ssmsub_widen_optab, $B, $A, CODE_FOR_$(ssmsub$a$b4$)$N)",
+  "set_widening_optab_handler (usmsub_widen_optab, $B, $A, CODE_FOR_$(usmsub$a$b4$)$N)",
   "set_optab_handler (sdiv_optab, $A, CODE_FOR_$(div$a3$))",
   "set_optab_handler (ssdiv_optab, $A, CODE_FOR_$(ssdiv$Q$a3$))",
   "set_optab_handler (sdivv_optab, $A, CODE_FOR_$(div$V$I$a3$))",
@@ -305,7 +307,7 @@ gen_insn (rtx insn)
     {
       int force_float = 0, force_int = 0, force_partial_int = 0;
       int force_fixed = 0;
-      int force_consec = 0;
+      int force_wider = 0;
       int matches = 1;
 
       for (pp = optabs[pindex]; pp[0] != '$' || pp[1] != '('; pp++)
@@ -323,7 +325,7 @@ gen_insn (rtx insn)
 	    switch (*++pp)
 	      {
 	      case 'N':
-		force_consec = 1;
+		force_wider = 1;
 		break;
 	      case 'I':
 		force_int = 1;
@@ -392,7 +394,10 @@ gen_insn (rtx insn)
 			    || mode_class[i] == MODE_VECTOR_FRACT
 			    || mode_class[i] == MODE_VECTOR_UFRACT
 			    || mode_class[i] == MODE_VECTOR_ACCUM
-			    || mode_class[i] == MODE_VECTOR_UACCUM))
+			    || mode_class[i] == MODE_VECTOR_UACCUM)
+			&& (! force_wider
+			    || *pp == 'a'
+			    || m1 < i))
 		      break;
 		  }
 
@@ -412,8 +417,7 @@ gen_insn (rtx insn)
 	}
 
       if (matches && pp[0] == '$' && pp[1] == ')'
-	  && *np == 0
-	  && (! force_consec || (int) GET_MODE_WIDER_MODE(m1) == m2))
+	  && *np == 0)
 	break;
     }
 
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -225,6 +225,30 @@ add_equal_note (rtx insns, rtx target, enum rtx_code code, rtx op0, rtx op1)
   return 1;
 }
 \f
+/* Given two input operands, OP0 and OP1, determine what the correct from_mode
+   for a widening operation would be.  In most cases this would be OP0, but if
+   that's a constant it'll be VOIDmode, which isn't useful.  */
+
+static enum machine_mode
+widened_mode (enum machine_mode to_mode, rtx op0, rtx op1)
+{
+  enum machine_mode m0 = GET_MODE (op0);
+  enum machine_mode m1 = GET_MODE (op1);
+  enum machine_mode result;
+
+  if (m0 == VOIDmode && m1 == VOIDmode)
+    return to_mode;
+  else if (m0 == VOIDmode || GET_MODE_SIZE (m0) < GET_MODE_SIZE (m1))
+    result = m1;
+  else
+    result = m0;
+
+  if (GET_MODE_SIZE (result) > GET_MODE_SIZE (to_mode))
+    return to_mode;
+
+  return result;
+}
+\f
 /* Widen OP to MODE and return the rtx for the widened operand.  UNSIGNEDP
    says whether OP is signed or unsigned.  NO_EXTEND is nonzero if we need
    not actually do a sign-extend or zero-extend, but can leave the
@@ -515,8 +539,8 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
   if (ops->code == WIDEN_MULT_PLUS_EXPR
       || ops->code == WIDEN_MULT_MINUS_EXPR)
-    icode = optab_handler (widen_pattern_optab,
-			   TYPE_MODE (TREE_TYPE (ops->op2)));
+    icode = widening_optab_handler (widen_pattern_optab,
+				    TYPE_MODE (TREE_TYPE (ops->op2)), tmode0);
   else
     icode = optab_handler (widen_pattern_optab, tmode0);
   gcc_assert (icode != CODE_FOR_nothing);
@@ -1242,7 +1266,8 @@ expand_binop_directly (enum machine_mode mode, optab binoptab,
 		       rtx target, int unsignedp, enum optab_methods methods,
 		       rtx last)
 {
-  enum insn_code icode = optab_handler (binoptab, mode);
+  enum machine_mode from_mode = widened_mode (mode, op0, op1);
+  enum insn_code icode = widening_optab_handler (binoptab, mode, from_mode);
   enum machine_mode xmode0 = insn_data[(int) icode].operand[1].mode;
   enum machine_mode xmode1 = insn_data[(int) icode].operand[2].mode;
   enum machine_mode mode0, mode1, tmp_mode;
@@ -1389,7 +1414,9 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
   /* If we can do it with a three-operand insn, do so.  */
 
   if (methods != OPTAB_MUST_WIDEN
-      && optab_handler (binoptab, mode) != CODE_FOR_nothing)
+      && widening_optab_handler (binoptab, mode,
+				 widened_mode (mode, op0, op1))
+	    != CODE_FOR_nothing)
     {
       temp = expand_binop_directly (mode, binoptab, op0, op1, target,
 				    unsignedp, methods, last);
@@ -1429,8 +1456,9 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 
   if (binoptab == smul_optab
       && GET_MODE_2XWIDER_MODE (mode) != VOIDmode
-      && (optab_handler ((unsignedp ? umul_widen_optab : smul_widen_optab),
-			 GET_MODE_2XWIDER_MODE (mode))
+      && (widening_optab_handler ((unsignedp ? umul_widen_optab
+					     : smul_widen_optab),
+				  GET_MODE_2XWIDER_MODE (mode), mode)
 	  != CODE_FOR_nothing))
     {
       temp = expand_binop (GET_MODE_2XWIDER_MODE (mode),
@@ -1460,9 +1488,10 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	if (optab_handler (binoptab, wider_mode) != CODE_FOR_nothing
 	    || (binoptab == smul_optab
 		&& GET_MODE_WIDER_MODE (wider_mode) != VOIDmode
-		&& (optab_handler ((unsignedp ? umul_widen_optab
-				    : smul_widen_optab),
-				   GET_MODE_WIDER_MODE (wider_mode))
+		&& (widening_optab_handler ((unsignedp ? umul_widen_optab
+						       : smul_widen_optab),
+					    GET_MODE_WIDER_MODE (wider_mode),
+					    mode)
 		    != CODE_FOR_nothing)))
 	  {
 	    rtx xop0 = op0, xop1 = op1;
@@ -1895,8 +1924,8 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
       && optab_handler (add_optab, word_mode) != CODE_FOR_nothing)
     {
       rtx product = NULL_RTX;
-
-      if (optab_handler (umul_widen_optab, mode) != CODE_FOR_nothing)
+      if (widening_optab_handler (umul_widen_optab, mode, word_mode)
+	    != CODE_FOR_nothing)
 	{
 	  product = expand_doubleword_mult (mode, op0, op1, target,
 					    true, methods);
@@ -1905,7 +1934,8 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	}
 
       if (product == NULL_RTX
-	  && optab_handler (smul_widen_optab, mode) != CODE_FOR_nothing)
+	  && widening_optab_handler (smul_widen_optab, mode, word_mode)
+		!= CODE_FOR_nothing)
 	{
 	  product = expand_doubleword_mult (mode, op0, op1, target,
 					    false, methods);
@@ -1997,6 +2027,8 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	   wider_mode = GET_MODE_WIDER_MODE (wider_mode))
 	{
 	  if (optab_handler (binoptab, wider_mode) != CODE_FOR_nothing
+	      || widening_optab_handler (binoptab, wider_mode, mode)
+		  != CODE_FOR_nothing
 	      || (methods == OPTAB_LIB
 		  && optab_libfunc (binoptab, wider_mode)))
 	    {
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -42,6 +42,11 @@ struct optab_handlers
   int insn_code;
 };
 
+struct widening_optab_handlers
+{
+  struct optab_handlers handlers[NUM_MACHINE_MODES][NUM_MACHINE_MODES];
+};
+
 struct optab_d
 {
   enum rtx_code code;
@@ -50,6 +55,7 @@ struct optab_d
   void (*libcall_gen)(struct optab_d *, const char *name, char suffix,
 		      enum machine_mode);
   struct optab_handlers handlers[NUM_MACHINE_MODES];
+  struct widening_optab_handlers *widening;
 };
 typedef struct optab_d * optab;
 
@@ -879,6 +885,23 @@ optab_handler (optab op, enum machine_mode mode)
 			   + (int) CODE_FOR_nothing);
 }
 
+/* Like optab_handler, but for widening operations that have a TO_MODE and
+  a FROM_MODE.  */
+
+static inline enum insn_code
+widening_optab_handler (optab op, enum machine_mode to_mode,
+			enum machine_mode from_mode)
+{
+  if (to_mode == from_mode || from_mode == VOIDmode)
+    return optab_handler (op, to_mode);
+
+  if (op->widening)
+    return (enum insn_code) (op->widening->handlers[(int) to_mode][(int) from_mode].insn_code
+			     + (int) CODE_FOR_nothing);
+
+  return CODE_FOR_nothing;
+}
+
 /* Record that insn CODE should be used to implement mode MODE of OP.  */
 
 static inline void
@@ -887,6 +910,26 @@ set_optab_handler (optab op, enum machine_mode mode, enum insn_code code)
   op->handlers[(int) mode].insn_code = (int) code - (int) CODE_FOR_nothing;
 }
 
+/* Like set_optab_handler, but for widening operations that have a TO_MODE
+   and a FROM_MODE.  */
+
+static inline void
+set_widening_optab_handler (optab op, enum machine_mode to_mode,
+			    enum machine_mode from_mode, enum insn_code code)
+{
+  if (to_mode == from_mode)
+    set_optab_handler (op, to_mode, code);
+  else
+    {
+      if (op->widening == NULL)
+	op->widening = (struct widening_optab_handlers *)
+	      xcalloc (1, sizeof (struct widening_optab_handlers));
+
+      op->widening->handlers[(int) to_mode][(int) from_mode].insn_code
+	  = (int) code - (int) CODE_FOR_nothing;
+    }
+}
+
 /* Return the insn used to perform conversion OP from mode FROM_MODE
    to mode TO_MODE; return CODE_FOR_nothing if the target does not have
    such an insn.  */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2056,6 +2056,8 @@ convert_mult_to_widen (gimple stmt)
 {
   tree lhs, rhs1, rhs2, type, type1, type2;
   enum insn_code handler;
+  enum machine_mode to_mode, from_mode;
+  optab op;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2065,12 +2067,17 @@ convert_mult_to_widen (gimple stmt)
   if (!is_widening_mult_p (stmt, &type1, &rhs1, &type2, &rhs2))
     return false;
 
+  to_mode = TYPE_MODE (type);
+  from_mode = TYPE_MODE (type1);
+
   if (TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2))
-    handler = optab_handler (umul_widen_optab, TYPE_MODE (type));
+    op = umul_widen_optab;
   else if (!TYPE_UNSIGNED (type1) && !TYPE_UNSIGNED (type2))
-    handler = optab_handler (smul_widen_optab, TYPE_MODE (type));
+    op = smul_widen_optab;
   else
-    handler = optab_handler (usmul_widen_optab, TYPE_MODE (type));
+    op = usmul_widen_optab;
+
+  handler = widening_optab_handler (op, to_mode, from_mode);
 
   if (handler == CODE_FOR_nothing)
     return false;
@@ -2172,7 +2179,8 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */
   this_optab = optab_for_tree_code (wmult_code, type1, optab_default);
-  if (optab_handler (this_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
+  if (widening_optab_handler (this_optab, TYPE_MODE (type), TYPE_MODE (type1))
+	== CODE_FOR_nothing)
     return false;
 
   /* ??? May need some type verification here?  */
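
The handler table added to optabs.h above stores each entry as an offset
from CODE_FOR_nothing, so a freshly zeroed table reads back as "no
instruction" everywhere. A minimal standalone sketch of that idea follows.
The enum values and the names set_handler and get_handler are hypothetical,
and the to_mode == from_mode fallback to the ordinary handler is
deliberately left out.

```c
#include <stdlib.h>

/* Hypothetical modes and insn codes, for illustration only.  The point
   is that CODE_FOR_nothing does not have to be zero.  */
enum mode { VOIDmode, QImode, HImode, SImode, DImode, NUM_MODES };
enum insn_code { CODE_FOR_mulsidi3 = 0, CODE_FOR_nothing = 7 };

/* Two-dimensional table indexed by [to_mode][from_mode].  */
struct widening_table
{
  int offset[NUM_MODES][NUM_MODES];
};

struct op
{
  struct widening_table *widening;	/* Allocated on first use.  */
};

/* Record CODE for the TO <- FROM widening operation.  Entries are
   stored relative to CODE_FOR_nothing, so zero means "nothing".  */
static void
set_handler (struct op *op, enum mode to, enum mode from,
	     enum insn_code code)
{
  if (op->widening == NULL)
    op->widening = calloc (1, sizeof *op->widening);
  if (op->widening == NULL)
    abort ();

  op->widening->offset[to][from] = (int) code - (int) CODE_FOR_nothing;
}

/* Look up the TO <- FROM handler; an unallocated table or an untouched
   entry both yield CODE_FOR_nothing.  */
static enum insn_code
get_handler (const struct op *op, enum mode to, enum mode from)
{
  if (op->widening == NULL)
    return CODE_FOR_nothing;

  return (enum insn_code) (op->widening->offset[to][from]
			   + (int) CODE_FOR_nothing);
}
```

The lazy allocation means optabs that never register a widening pattern pay
only for one pointer, not for a NUM_MODES by NUM_MODES array.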

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (2/7)] Widening multiplies by more than one mode
  2011-07-14 14:24         ` Richard Guenther
@ 2011-08-19 14:45           ` Andrew Stubbs
  0 siblings, 0 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-08-19 14:45 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 285 bytes --]

On 14/07/11 15:15, Richard Guenther wrote:
>> Is this version OK?
> Ok.

I've just committed this slightly updated patch.

I found some bugs while testing, now fixed. Most of the changes in this
patch are context changes, plus the use of widened_mode to handle
VOIDmode constants.

Andrew
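
As a rough illustration of the search this patch introduces, the sketch
below walks from the requested from_mode through successively wider modes
until a handler for the destination mode is found. It models only the
strictly-widening case (permit_non_widening == 0). The mode enum, the
table contents, and the name find_widening_insn are hypothetical; the real
function is find_widening_optab_handler_and_mode in the attached patch.

```c
#include <stddef.h>

/* Hypothetical toy modes; not GCC's real machine-mode enum.  */
enum mode { VOIDmode, QImode, HImode, SImode, DImode, NUM_MODES };
enum { NO_INSN = 0 };

static const int mode_size[NUM_MODES] = { 0, 1, 2, 4, 8 };

/* Toy handler table indexed by [to_mode][from_mode].  Assumption: the
   target only has an SI->DI widening multiply, given the made-up
   insn number 42.  */
static const int insn_table[NUM_MODES][NUM_MODES] = {
  [DImode][SImode] = 42
};

/* Next wider toy mode, or VOIDmode when there is none.  */
static enum mode
wider_mode (enum mode m)
{
  return (m == VOIDmode || m == DImode) ? VOIDmode : (enum mode) (m + 1);
}

/* Find an insn that widens to TO_MODE from FROM_MODE or from any mode
   between the two.  On success, store the mode actually used in
   *FOUND_MODE (if non-null) so the caller can extend its inputs.  */
static int
find_widening_insn (enum mode to_mode, enum mode from_mode,
		    enum mode *found_mode)
{
  for (; from_mode != to_mode
	 && from_mode != VOIDmode
	 && mode_size[from_mode] <= mode_size[to_mode];
       from_mode = wider_mode (from_mode))
    {
      int insn = insn_table[to_mode][from_mode];

      if (insn != NO_INSN)
	{
	  if (found_mode)
	    *found_mode = from_mode;
	  return insn;
	}
    }

  return NO_INSN;
}
```

A QI->DI request on this toy target resolves to the SI->DI instruction and
reports SImode as the found mode, which is why convert_mult_to_widen then
has to cast its operands up to that mode's precision.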

[-- Attachment #2: widening-multiplies-2.patch --]
[-- Type: text/x-patch, Size: 17544 bytes --]

2011-08-19  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* config/arm/arm.md (maddhidi4): Remove '*' from name.
	* expr.c (expand_expr_real_2): Use find_widening_optab_handler.
	* optabs.c (find_widening_optab_handler_and_mode): New function.
	(expand_widen_pattern_expr): Use find_widening_optab_handler.
	(expand_binop_directly): Likewise.
	(expand_binop): Likewise.
	* optabs.h (find_widening_optab_handler): New macro define.
	(find_widening_optab_handler_and_mode): New prototype.
	* tree-cfg.c (verify_gimple_assign_binary): Adjust WIDEN_MULT_EXPR
	type precision rules.
	(verify_gimple_assign_ternary): Likewise for WIDEN_MULT_PLUS_EXPR.
	* tree-ssa-math-opts.c (build_and_insert_cast): New function.
	(is_widening_mult_rhs_p): Allow widening by more than one mode.
	Explicitly disallow mis-matched input types.
	(convert_mult_to_widen): Use find_widening_optab_handler, and cast
	input types to fit the new handler.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-bitfield-1.c: New file.

--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -1857,7 +1857,7 @@
    (set_attr "predicable" "yes")]
 )
 
-(define_insn "*maddhidi4"
+(define_insn "maddhidi4"
   [(set (match_operand:DI 0 "s_register_operand" "=r")
 	(plus:DI
 	  (mult:DI (sign_extend:DI
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8003,19 +8003,16 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 	{
 	  enum machine_mode innermode = TYPE_MODE (TREE_TYPE (treeop0));
 	  this_optab = usmul_widen_optab;
-	  if (mode == GET_MODE_2XWIDER_MODE (innermode))
+	  if (find_widening_optab_handler (this_optab, mode, innermode, 0)
+		!= CODE_FOR_nothing)
 	    {
-	      if (widening_optab_handler (this_optab, mode, innermode)
-		    != CODE_FOR_nothing)
-		{
-		  if (TYPE_UNSIGNED (TREE_TYPE (treeop0)))
-		    expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
-				     EXPAND_NORMAL);
-		  else
-		    expand_operands (treeop0, treeop1, NULL_RTX, &op1, &op0,
-				     EXPAND_NORMAL);
-		  goto binop3;
-		}
+	      if (TYPE_UNSIGNED (TREE_TYPE (treeop0)))
+		expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
+				 EXPAND_NORMAL);
+	      else
+		expand_operands (treeop0, treeop1, NULL_RTX, &op1, &op0,
+				 EXPAND_NORMAL);
+	      goto binop3;
 	    }
 	}
       /* Check for a multiplication with matching signedness.  */
@@ -8030,10 +8027,9 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 	  optab other_optab = zextend_p ? smul_widen_optab : umul_widen_optab;
 	  this_optab = zextend_p ? umul_widen_optab : smul_widen_optab;
 
-	  if (mode == GET_MODE_2XWIDER_MODE (innermode)
-	      && TREE_CODE (treeop0) != INTEGER_CST)
+	  if (TREE_CODE (treeop0) != INTEGER_CST)
 	    {
-	      if (widening_optab_handler (this_optab, mode, innermode)
+	      if (find_widening_optab_handler (this_optab, mode, innermode, 0)
 		    != CODE_FOR_nothing)
 		{
 		  expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
@@ -8042,7 +8038,7 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 					       unsignedp, this_optab);
 		  return REDUCE_BIT_FIELD (temp);
 		}
-	      if (widening_optab_handler (other_optab, mode, innermode)
+	      if (find_widening_optab_handler (other_optab, mode, innermode, 0)
 		    != CODE_FOR_nothing
 		  && innermode == word_mode)
 		{
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -249,6 +249,37 @@ widened_mode (enum machine_mode to_mode, rtx op0, rtx op1)
   return result;
 }
 \f
+/* Find a widening optab even if it doesn't widen as much as we want.
+   E.g. if from_mode is HImode, and to_mode is DImode, and there is no
+   direct HI->SI insn, then return SI->DI, if that exists.
+   If PERMIT_NON_WIDENING is non-zero then this can be used with
+   non-widening optabs also.  */
+
+enum insn_code
+find_widening_optab_handler_and_mode (optab op, enum machine_mode to_mode,
+				      enum machine_mode from_mode,
+				      int permit_non_widening,
+				      enum machine_mode *found_mode)
+{
+  for (; (permit_non_widening || from_mode != to_mode)
+	 && GET_MODE_SIZE (from_mode) <= GET_MODE_SIZE (to_mode)
+	 && from_mode != VOIDmode;
+       from_mode = GET_MODE_WIDER_MODE (from_mode))
+    {
+      enum insn_code handler = widening_optab_handler (op, to_mode,
+						       from_mode);
+
+      if (handler != CODE_FOR_nothing)
+	{
+	  if (found_mode)
+	    *found_mode = from_mode;
+	  return handler;
+	}
+    }
+
+  return CODE_FOR_nothing;
+}
+\f
 /* Widen OP to MODE and return the rtx for the widened operand.  UNSIGNEDP
    says whether OP is signed or unsigned.  NO_EXTEND is nonzero if we need
    not actually do a sign-extend or zero-extend, but can leave the
@@ -539,8 +570,9 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
   if (ops->code == WIDEN_MULT_PLUS_EXPR
       || ops->code == WIDEN_MULT_MINUS_EXPR)
-    icode = widening_optab_handler (widen_pattern_optab,
-				    TYPE_MODE (TREE_TYPE (ops->op2)), tmode0);
+    icode = find_widening_optab_handler (widen_pattern_optab,
+					 TYPE_MODE (TREE_TYPE (ops->op2)),
+					 tmode0, 0);
   else
     icode = optab_handler (widen_pattern_optab, tmode0);
   gcc_assert (icode != CODE_FOR_nothing);
@@ -1267,7 +1299,8 @@ expand_binop_directly (enum machine_mode mode, optab binoptab,
 		       rtx last)
 {
   enum machine_mode from_mode = widened_mode (mode, op0, op1);
-  enum insn_code icode = widening_optab_handler (binoptab, mode, from_mode);
+  enum insn_code icode = find_widening_optab_handler (binoptab, mode,
+						      from_mode, 1);
   enum machine_mode xmode0 = insn_data[(int) icode].operand[1].mode;
   enum machine_mode xmode1 = insn_data[(int) icode].operand[2].mode;
   enum machine_mode mode0, mode1, tmp_mode;
@@ -1414,8 +1447,8 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
   /* If we can do it with a three-operand insn, do so.  */
 
   if (methods != OPTAB_MUST_WIDEN
-      && widening_optab_handler (binoptab, mode,
-				 widened_mode (mode, op0, op1))
+      && find_widening_optab_handler (binoptab, mode,
+				      widened_mode (mode, op0, op1), 1)
 	    != CODE_FOR_nothing)
     {
       temp = expand_binop_directly (mode, binoptab, op0, op1, target,
@@ -1488,10 +1521,11 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	if (optab_handler (binoptab, wider_mode) != CODE_FOR_nothing
 	    || (binoptab == smul_optab
 		&& GET_MODE_WIDER_MODE (wider_mode) != VOIDmode
-		&& (widening_optab_handler ((unsignedp ? umul_widen_optab
-						       : smul_widen_optab),
-					    GET_MODE_WIDER_MODE (wider_mode),
-					    mode)
+		&& (find_widening_optab_handler ((unsignedp
+						  ? umul_widen_optab
+						  : smul_widen_optab),
+						 GET_MODE_WIDER_MODE (wider_mode),
+						 mode, 0)
 		    != CODE_FOR_nothing)))
 	  {
 	    rtx xop0 = op0, xop1 = op1;
@@ -2026,8 +2060,7 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	   wider_mode != VOIDmode;
 	   wider_mode = GET_MODE_WIDER_MODE (wider_mode))
 	{
-	  if (optab_handler (binoptab, wider_mode) != CODE_FOR_nothing
-	      || widening_optab_handler (binoptab, wider_mode, mode)
+	  if (find_widening_optab_handler (binoptab, wider_mode, mode, 1)
 		  != CODE_FOR_nothing
 	      || (methods == OPTAB_LIB
 		  && optab_libfunc (binoptab, wider_mode)))
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -807,6 +807,15 @@ extern rtx expand_copysign (rtx, rtx, rtx);
 extern void emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 extern bool maybe_emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 
+/* Find a widening optab even if it doesn't widen as much as we want.  */
+#define find_widening_optab_handler(A,B,C,D) \
+  find_widening_optab_handler_and_mode (A, B, C, D, NULL)
+extern enum insn_code find_widening_optab_handler_and_mode (optab,
+							    enum machine_mode,
+							    enum machine_mode,
+							    int,
+							    enum machine_mode *);
+
 /* An extra flag to control optab_for_tree_code's behavior.  This is needed to
    distinguish between machines with a vector shift that takes a scalar for the
    shift amount vs. machines that take a vector for the shift amount.  */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-bitfield-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+struct bf
+{
+  int a : 3;
+  int b : 15;
+  int c : 3;
+};
+
+long long
+foo (long long a, struct bf b, struct bf c)
+{
+  return a + b.b * c.b;
+}
+
+/* { dg-final { scan-assembler "smlalbb" } } */
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3564,7 +3564,7 @@ do_pointer_plus_expr_check:
     case WIDEN_MULT_EXPR:
       if (TREE_CODE (lhs_type) != INTEGER_TYPE)
 	return true;
-      return ((2 * TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (lhs_type))
+      return ((2 * TYPE_PRECISION (rhs1_type) > TYPE_PRECISION (lhs_type))
 	      || (TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type)));
 
     case WIDEN_SUM_EXPR:
@@ -3655,7 +3655,7 @@ verify_gimple_assign_ternary (gimple stmt)
 	   && !FIXED_POINT_TYPE_P (rhs1_type))
 	  || !useless_type_conversion_p (rhs1_type, rhs2_type)
 	  || !useless_type_conversion_p (lhs_type, rhs3_type)
-	  || 2 * TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (lhs_type)
+	  || 2 * TYPE_PRECISION (rhs1_type) > TYPE_PRECISION (lhs_type)
 	  || TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type))
 	{
 	  error ("type mismatch in widening multiply-accumulate expression");
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1086,6 +1086,16 @@ build_and_insert_ref (gimple_stmt_iterator *gsi, location_t loc, tree type,
   return result;
 }
 
+/* Build a gimple assignment to cast VAL to TARGET.  Insert the statement
+   prior to GSI's current position, and return the fresh SSA name.  */
+
+static tree
+build_and_insert_cast (gimple_stmt_iterator *gsi, location_t loc,
+		       tree target, tree val)
+{
+  return build_and_insert_binop (gsi, loc, target, CONVERT_EXPR, val, NULL);
+}
+
 /* ARG0 and ARG1 are the two arguments to a pow builtin call in GSI
    with location info LOC.  If possible, create an equivalent and
    less expensive sequence of statements prior to GSI, and return an
@@ -1959,8 +1969,8 @@ struct gimple_opt_pass pass_optimize_bswap =
 /* Return true if RHS is a suitable operand for a widening multiplication.
    There are two cases:
 
-     - RHS makes some value twice as wide.  Store that value in *NEW_RHS_OUT
-       if so, and store its type in *TYPE_OUT.
+     - RHS makes some value at least twice as wide.  Store that value
+       in *NEW_RHS_OUT if so, and store its type in *TYPE_OUT.
 
      - RHS is an integer constant.  Store that value in *NEW_RHS_OUT if so,
        but leave *TYPE_OUT untouched.  */
@@ -1988,7 +1998,7 @@ is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
       rhs1 = gimple_assign_rhs1 (stmt);
       type1 = TREE_TYPE (rhs1);
       if (TREE_CODE (type1) != TREE_CODE (type)
-	  || TYPE_PRECISION (type1) * 2 != TYPE_PRECISION (type))
+	  || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type))
 	return false;
 
       *new_rhs_out = rhs1;
@@ -2044,6 +2054,10 @@ is_widening_mult_p (gimple stmt,
       *type2_out = *type1_out;
     }
 
+  /* FIXME: remove this restriction.  */
+  if (TYPE_PRECISION (*type1_out) != TYPE_PRECISION (*type2_out))
+    return false;
+
   return true;
 }
 
@@ -2052,12 +2066,14 @@ is_widening_mult_p (gimple stmt,
    value is true iff we converted the statement.  */
 
 static bool
-convert_mult_to_widen (gimple stmt)
+convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 {
-  tree lhs, rhs1, rhs2, type, type1, type2;
+  tree lhs, rhs1, rhs2, type, type1, type2, tmp;
   enum insn_code handler;
-  enum machine_mode to_mode, from_mode;
+  enum machine_mode to_mode, from_mode, actual_mode;
   optab op;
+  int actual_precision;
+  location_t loc = gimple_location (stmt);
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2077,13 +2093,32 @@ convert_mult_to_widen (gimple stmt)
   else
     op = usmul_widen_optab;
 
-  handler = widening_optab_handler (op, to_mode, from_mode);
+  handler = find_widening_optab_handler_and_mode (op, to_mode, from_mode,
+						  0, &actual_mode);
 
   if (handler == CODE_FOR_nothing)
     return false;
 
-  gimple_assign_set_rhs1 (stmt, fold_convert (type1, rhs1));
-  gimple_assign_set_rhs2 (stmt, fold_convert (type2, rhs2));
+  /* Ensure that the inputs to the handler are in the correct precision
+     for the opcode.  This will be the full mode size.  */
+  actual_precision = GET_MODE_PRECISION (actual_mode);
+  if (actual_precision != TYPE_PRECISION (type1))
+    {
+      tmp = create_tmp_var (build_nonstandard_integer_type
+				(actual_precision, TYPE_UNSIGNED (type1)),
+			    NULL);
+      rhs1 = build_and_insert_cast (gsi, loc, tmp, rhs1);
+
+      /* Reuse the same type info, if possible.  */
+      if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
+	tmp = create_tmp_var (build_nonstandard_integer_type
+				(actual_precision, TYPE_UNSIGNED (type2)),
+			      NULL);
+      rhs2 = build_and_insert_cast (gsi, loc, tmp, rhs2);
+    }
+
+  gimple_assign_set_rhs1 (stmt, rhs1);
+  gimple_assign_set_rhs2 (stmt, rhs2);
   gimple_assign_set_rhs_code (stmt, WIDEN_MULT_EXPR);
   update_stmt (stmt);
   widen_mul_stats.widen_mults_inserted++;
@@ -2101,11 +2136,15 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			    enum tree_code code)
 {
   gimple rhs1_stmt = NULL, rhs2_stmt = NULL;
-  tree type, type1, type2;
+  tree type, type1, type2, tmp;
   tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs;
   enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK;
   optab this_optab;
   enum tree_code wmult_code;
+  enum insn_code handler;
+  enum machine_mode to_mode, from_mode, actual_mode;
+  location_t loc = gimple_location (stmt);
+  int actual_precision;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2139,39 +2178,33 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   else
     return false;
 
-  if (code == PLUS_EXPR && rhs1_code == MULT_EXPR)
+  /* If code is WIDEN_MULT_EXPR then it would seem unnecessary to call
+     is_widening_mult_p, but we still need the rhs returns.
+
+     It might also appear that it would be sufficient to use the existing
+     operands of the widening multiply, but that would limit the choice of
+     multiply-and-accumulate instructions.  */
+  if (code == PLUS_EXPR
+      && (rhs1_code == MULT_EXPR || rhs1_code == WIDEN_MULT_EXPR))
     {
       if (!is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs2;
     }
-  else if (rhs2_code == MULT_EXPR)
+  else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR)
     {
       if (!is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs1;
     }
-  else if (code == PLUS_EXPR && rhs1_code == WIDEN_MULT_EXPR)
-    {
-      mult_rhs1 = gimple_assign_rhs1 (rhs1_stmt);
-      mult_rhs2 = gimple_assign_rhs2 (rhs1_stmt);
-      type1 = TREE_TYPE (mult_rhs1);
-      type2 = TREE_TYPE (mult_rhs2);
-      add_rhs = rhs2;
-    }
-  else if (rhs2_code == WIDEN_MULT_EXPR)
-    {
-      mult_rhs1 = gimple_assign_rhs1 (rhs2_stmt);
-      mult_rhs2 = gimple_assign_rhs2 (rhs2_stmt);
-      type1 = TREE_TYPE (mult_rhs1);
-      type2 = TREE_TYPE (mult_rhs2);
-      add_rhs = rhs1;
-    }
   else
     return false;
 
+  to_mode = TYPE_MODE (type);
+  from_mode = TYPE_MODE (type1);
+
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
     return false;
 
@@ -2179,15 +2212,26 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */
   this_optab = optab_for_tree_code (wmult_code, type1, optab_default);
-  if (widening_optab_handler (this_optab, TYPE_MODE (type), TYPE_MODE (type1))
-	== CODE_FOR_nothing)
+  handler = find_widening_optab_handler_and_mode (this_optab, to_mode,
+						  from_mode, 0, &actual_mode);
+
+  if (handler == CODE_FOR_nothing)
     return false;
 
-  /* ??? May need some type verification here?  */
+  /* Ensure that the inputs to the handler are in the correct precision
+     for the opcode.  This will be the full mode size.  */
+  actual_precision = GET_MODE_PRECISION (actual_mode);
+  if (actual_precision != TYPE_PRECISION (type1))
+    {
+      tmp = create_tmp_var (build_nonstandard_integer_type
+				(actual_precision, TYPE_UNSIGNED (type1)),
+			    NULL);
+
+      mult_rhs1 = build_and_insert_cast (gsi, loc, tmp, mult_rhs1);
+      mult_rhs2 = build_and_insert_cast (gsi, loc, tmp, mult_rhs2);
+    }
 
-  gimple_assign_set_rhs_with_ops_1 (gsi, wmult_code,
-				    fold_convert (type1, mult_rhs1),
-				    fold_convert (type2, mult_rhs2),
+  gimple_assign_set_rhs_with_ops_1 (gsi, wmult_code, mult_rhs1, mult_rhs2,
 				    add_rhs);
   update_stmt (gsi_stmt (*gsi));
   widen_mul_stats.maccs_inserted++;
@@ -2399,7 +2443,7 @@ execute_optimize_widening_mul (void)
 	      switch (code)
 		{
 		case MULT_EXPR:
-		  if (!convert_mult_to_widen (stmt)
+		  if (!convert_mult_to_widen (stmt, &gsi)
 		      && convert_mult_to_fma (stmt,
 					      gimple_assign_rhs1 (stmt),
 					      gimple_assign_rhs2 (stmt)))
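
The casts inserted by this patch can be pictured outside GCC with a small standalone C sketch. It assumes a target whose only widening multiply takes 16-bit inputs and produces a 64-bit result, and uses it for 8-bit inputs by extending them to the instruction's input width first. All names here are hypothetical; `wmul_hi_to_di` only stands in for the machine instruction and is not a GCC function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a HImode -> DImode widening multiply
   instruction: both inputs must already be 16 bits wide.  */
static int64_t
wmul_hi_to_di (int16_t x, int16_t y)
{
  return (int64_t) x * (int64_t) y;
}

/* Multiply two QImode-sized values using the wider instruction.  The
   explicit widening mirrors the casts the patch inserts when the
   operand precision differs from the precision the handler expects.  */
static int64_t
wmul_qi_via_hi (int8_t x, int8_t y)
{
  int16_t wide_x = x;	/* Sign-extend to the instruction's input width.  */
  int16_t wide_y = y;
  return wmul_hi_to_di (wide_x, wide_y);
}

/* Spot checks; call from a test driver.  */
static void
check_wmul_qi_via_hi (void)
{
  assert (wmul_qi_via_hi (-128, 127) == -16256);
  assert (wmul_qi_via_hi (3, 4) == 12);
}
```

Sign extension preserves the value of each operand, so the wider instruction yields the same product as a direct 8-bit widening multiply would.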

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
  2011-07-12 11:05                                 ` Richard Guenther
@ 2011-08-19 14:50                                   ` Andrew Stubbs
  0 siblings, 0 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-08-19 14:50 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Michael Matz, gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 184 bytes --]

On 12/07/11 11:52, Richard Guenther wrote:
>> Is this one ok?
> Ok.

I've just committed this slightly modified patch.

The changes are mainly in the context and the testcase.

Andrew

[-- Attachment #2: widening-multiplies-3.patch --]
[-- Type: text/x-patch, Size: 4488 bytes --]

2011-08-19  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (convert_plusminus_to_widen): Permit a single
	conversion statement separating multiply-and-accumulate.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: New file.
	* gcc.target/arm/no-wmla-1.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+int
+foo (int a, short b, short c)
+{
+  int bc = b * c;
+  return a + (short)bc;
+}
+
+/* { dg-final { scan-assembler "\tmul\t" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (long long a, char *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2136,6 +2136,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			    enum tree_code code)
 {
   gimple rhs1_stmt = NULL, rhs2_stmt = NULL;
+  gimple conv1_stmt = NULL, conv2_stmt = NULL, conv_stmt;
   tree type, type1, type2, tmp;
   tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs;
   enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK;
@@ -2178,6 +2179,38 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   else
     return false;
 
+  /* Allow for one conversion statement between the multiply
+     and addition/subtraction statement.  If there is more than
+     one conversion then we assume they would invalidate this
+     transformation.  If that's not the case then they should have
+     been folded before now.  */
+  if (CONVERT_EXPR_CODE_P (rhs1_code))
+    {
+      conv1_stmt = rhs1_stmt;
+      rhs1 = gimple_assign_rhs1 (rhs1_stmt);
+      if (TREE_CODE (rhs1) == SSA_NAME)
+	{
+	  rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
+	  if (is_gimple_assign (rhs1_stmt))
+	    rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+	}
+      else
+	return false;
+    }
+  if (CONVERT_EXPR_CODE_P (rhs2_code))
+    {
+      conv2_stmt = rhs2_stmt;
+      rhs2 = gimple_assign_rhs1 (rhs2_stmt);
+      if (TREE_CODE (rhs2) == SSA_NAME)
+	{
+	  rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
+	  if (is_gimple_assign (rhs2_stmt))
+	    rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
+	}
+      else
+	return false;
+    }
+
   /* If code is WIDEN_MULT_EXPR then it would seem unnecessary to call
      is_widening_mult_p, but we still need the rhs returns.
 
@@ -2191,6 +2224,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs2;
+      conv_stmt = conv1_stmt;
     }
   else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR)
     {
@@ -2198,6 +2232,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs1;
+      conv_stmt = conv2_stmt;
     }
   else
     return false;
@@ -2208,6 +2243,33 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
     return false;
 
+  /* If there was a conversion between the multiply and addition
+     then we need to make sure it fits a multiply-and-accumulate.
+     There should be a single mode change which does not change the
+     value.  */
+  if (conv_stmt)
+    {
+      tree from_type = TREE_TYPE (gimple_assign_rhs1 (conv_stmt));
+      tree to_type = TREE_TYPE (gimple_assign_lhs (conv_stmt));
+      int data_size = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);
+      bool is_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2);
+
+      if (TYPE_PRECISION (from_type) > TYPE_PRECISION (to_type))
+	{
+	  /* Conversion is a truncate.  */
+	  if (TYPE_PRECISION (to_type) < data_size)
+	    return false;
+	}
+      else if (TYPE_PRECISION (from_type) < TYPE_PRECISION (to_type))
+	{
+	  /* Conversion is an extend.  Check it's the right sort.  */
+	  if (TYPE_UNSIGNED (from_type) != is_unsigned
+	      && !(is_unsigned && TYPE_PRECISION (from_type) > data_size))
+	    return false;
+	}
+      /* else convert is a no-op for our purposes.  */
+    }
+
   /* Verify that the machine can perform a widening multiply
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */
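
The conversion check added above can be modelled as a standalone predicate. The sketch below is only an illustration: `struct int_type`, `int_type_of` and `conversion_fits_macc` are hypothetical names that reduce a GCC tree type to its precision and signedness, and the logic is copied from the hunk rather than taken from any GCC API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of an integer type.  */
struct int_type
{
  int precision;	/* Width in bits.  */
  bool is_unsigned;
};

static struct int_type
int_type_of (int precision, bool is_unsigned)
{
  struct int_type t = { precision, is_unsigned };
  return t;
}

/* Return true if a conversion from FROM_TYPE to TO_TYPE, sitting between
   a multiply of TYPE1 by TYPE2 and the following add, keeps the value
   intact.  */
static bool
conversion_fits_macc (struct int_type from_type, struct int_type to_type,
		      struct int_type type1, struct int_type type2)
{
  int data_size = type1.precision + type2.precision;
  bool is_unsigned = type1.is_unsigned && type2.is_unsigned;

  if (from_type.precision > to_type.precision)
    {
      /* Truncation: the full product must still fit.  */
      if (to_type.precision < data_size)
	return false;
    }
  else if (from_type.precision < to_type.precision)
    {
      /* Extension: it must be the right sort of extend.  */
      if (from_type.is_unsigned != is_unsigned
	  && !(is_unsigned && from_type.precision > data_size))
	return false;
    }
  return true;
}

/* Spot checks; call from a test driver.  */
static void
check_conversion_fits_macc (void)
{
  /* As in no-wmla-1.c: a short * short product truncated to short.  */
  assert (!conversion_fits_macc (int_type_of (32, false),
				 int_type_of (16, false),
				 int_type_of (16, false),
				 int_type_of (16, false)));
}
```

With two unsigned 8-bit operands (as in wmul-5.c, assuming `char` is unsigned on the target), the 16-bit product always fits in a signed 32-bit intermediate, so extending that intermediate to 64 bits is accepted.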

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
  2011-07-14 14:31             ` Richard Guenther
@ 2011-08-19 14:51               ` Andrew Stubbs
  0 siblings, 0 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-08-19 14:51 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 129 bytes --]

On 14/07/11 15:25, Richard Guenther wrote:
> Ok.

Committed, with no real changes. I just updated the testcase a little.

Andrew

[-- Attachment #2: widening-multiplies-4.patch --]
[-- Type: text/x-patch, Size: 7035 bytes --]

2011-08-19  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (convert_mult_to_widen): Convert
	unsupported unsigned multiplies to signed.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-6.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-6.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (long long a, unsigned char *b, signed char *c)
+{
+  return a + (long long)*b * (long long)*c;
+}
+
+/* { dg-final { scan-assembler "smlalbb" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2068,12 +2068,13 @@ is_widening_mult_p (gimple stmt,
 static bool
 convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 {
-  tree lhs, rhs1, rhs2, type, type1, type2, tmp;
+  tree lhs, rhs1, rhs2, type, type1, type2, tmp = NULL;
   enum insn_code handler;
   enum machine_mode to_mode, from_mode, actual_mode;
   optab op;
   int actual_precision;
   location_t loc = gimple_location (stmt);
+  bool from_unsigned1, from_unsigned2;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2085,10 +2086,12 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 
   to_mode = TYPE_MODE (type);
   from_mode = TYPE_MODE (type1);
+  from_unsigned1 = TYPE_UNSIGNED (type1);
+  from_unsigned2 = TYPE_UNSIGNED (type2);
 
-  if (TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2))
+  if (from_unsigned1 && from_unsigned2)
     op = umul_widen_optab;
-  else if (!TYPE_UNSIGNED (type1) && !TYPE_UNSIGNED (type2))
+  else if (!from_unsigned1 && !from_unsigned2)
     op = smul_widen_optab;
   else
     op = usmul_widen_optab;
@@ -2097,22 +2100,45 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 						  0, &actual_mode);
 
   if (handler == CODE_FOR_nothing)
-    return false;
+    {
+      if (op != smul_widen_optab)
+	{
+	  from_mode = GET_MODE_WIDER_MODE (from_mode);
+	  if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode))
+	    return false;
+
+	  op = smul_widen_optab;
+	  handler = find_widening_optab_handler_and_mode (op, to_mode,
+							  from_mode, 0,
+							  &actual_mode);
+
+	  if (handler == CODE_FOR_nothing)
+	    return false;
+
+	  from_unsigned1 = from_unsigned2 = false;
+	}
+      else
+	return false;
+    }
 
   /* Ensure that the inputs to the handler are in the correct precision
      for the opcode.  This will be the full mode size.  */
   actual_precision = GET_MODE_PRECISION (actual_mode);
-  if (actual_precision != TYPE_PRECISION (type1))
+  if (actual_precision != TYPE_PRECISION (type1)
+      || from_unsigned1 != TYPE_UNSIGNED (type1))
     {
       tmp = create_tmp_var (build_nonstandard_integer_type
-				(actual_precision, TYPE_UNSIGNED (type1)),
+				(actual_precision, from_unsigned1),
 			    NULL);
       rhs1 = build_and_insert_cast (gsi, loc, tmp, rhs1);
-
+    }
+  if (actual_precision != TYPE_PRECISION (type2)
+      || from_unsigned2 != TYPE_UNSIGNED (type2))
+    {
       /* Reuse the same type info, if possible.  */
-      if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
+      if (!tmp || from_unsigned1 != from_unsigned2)
 	tmp = create_tmp_var (build_nonstandard_integer_type
-				(actual_precision, TYPE_UNSIGNED (type2)),
+				(actual_precision, from_unsigned2),
 			      NULL);
       rhs2 = build_and_insert_cast (gsi, loc, tmp, rhs2);
     }
@@ -2137,7 +2163,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 {
   gimple rhs1_stmt = NULL, rhs2_stmt = NULL;
   gimple conv1_stmt = NULL, conv2_stmt = NULL, conv_stmt;
-  tree type, type1, type2, tmp;
+  tree type, type1, type2, optype, tmp = NULL;
   tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs;
   enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK;
   optab this_optab;
@@ -2146,6 +2172,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   enum machine_mode to_mode, from_mode, actual_mode;
   location_t loc = gimple_location (stmt);
   int actual_precision;
+  bool from_unsigned1, from_unsigned2;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2239,9 +2266,21 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 
   to_mode = TYPE_MODE (type);
   from_mode = TYPE_MODE (type1);
+  from_unsigned1 = TYPE_UNSIGNED (type1);
+  from_unsigned2 = TYPE_UNSIGNED (type2);
 
-  if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
-    return false;
+  /* There's no such thing as a mixed sign madd yet, so use a wider mode.  */
+  if (from_unsigned1 != from_unsigned2)
+    {
+      enum machine_mode mode = GET_MODE_WIDER_MODE (from_mode);
+      if (GET_MODE_PRECISION (mode) < GET_MODE_PRECISION (to_mode))
+	{
+	  from_mode = mode;
+	  from_unsigned1 = from_unsigned2 = false;
+	}
+      else
+	return false;
+    }
 
   /* If there was a conversion between the multiply and addition
      then we need to make sure it fits a multiply-and-accumulate.
@@ -2249,6 +2288,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
      value.  */
   if (conv_stmt)
     {
+      /* We use the original, unmodified data types for this.  */
       tree from_type = TREE_TYPE (gimple_assign_rhs1 (conv_stmt));
       tree to_type = TREE_TYPE (gimple_assign_lhs (conv_stmt));
       int data_size = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);
@@ -2273,7 +2313,8 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   /* Verify that the machine can perform a widening multiply
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */
-  this_optab = optab_for_tree_code (wmult_code, type1, optab_default);
+  optype = build_nonstandard_integer_type (from_mode, from_unsigned1);
+  this_optab = optab_for_tree_code (wmult_code, optype, optab_default);
   handler = find_widening_optab_handler_and_mode (this_optab, to_mode,
 						  from_mode, 0, &actual_mode);
 
@@ -2283,13 +2324,21 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   /* Ensure that the inputs to the handler are in the correct precision
      for the opcode.  This will be the full mode size.  */
   actual_precision = GET_MODE_PRECISION (actual_mode);
-  if (actual_precision != TYPE_PRECISION (type1))
+  if (actual_precision != TYPE_PRECISION (type1)
+      || from_unsigned1 != TYPE_UNSIGNED (type1))
     {
       tmp = create_tmp_var (build_nonstandard_integer_type
-				(actual_precision, TYPE_UNSIGNED (type1)),
+				(actual_precision, from_unsigned1),
 			    NULL);
-
       mult_rhs1 = build_and_insert_cast (gsi, loc, tmp, mult_rhs1);
+    }
+  if (actual_precision != TYPE_PRECISION (type2)
+      || from_unsigned2 != TYPE_UNSIGNED (type2))
+    {
+      if (!tmp || from_unsigned1 != from_unsigned2)
+	tmp = create_tmp_var (build_nonstandard_integer_type
+				(actual_precision, from_unsigned2),
+			      NULL);
       mult_rhs2 = build_and_insert_cast (gsi, loc, tmp, mult_rhs2);
     }
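
The idea behind this patch, using a signed multiply of a wider mode where no unsigned or mixed-sign multiply exists, can be checked with plain C arithmetic. This is a sketch under stated assumptions: `smul_hi_to_si` is a hypothetical stand-in for a signed 16x16->32 widening multiply instruction, and the other function names are likewise invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a signed HImode -> SImode widening
   multiply instruction.  */
static int32_t
smul_hi_to_si (int16_t x, int16_t y)
{
  return (int32_t) x * (int32_t) y;
}

/* Unsigned 8-bit product via the signed instruction.  Zero-extending
   to 16 bits leaves both values non-negative (at most 255), so the
   signed multiply gives the same result as an unsigned one.  */
static uint32_t
umul_qi_via_signed_hi (uint8_t x, uint8_t y)
{
  int16_t wide_x = (int16_t) x;
  int16_t wide_y = (int16_t) y;
  return (uint32_t) smul_hi_to_si (wide_x, wide_y);
}

/* Mixed-sign 8-bit product: zero-extend the unsigned operand and
   sign-extend the signed one, then multiply as signed.  */
static int32_t
usmul_qi_via_signed_hi (uint8_t x, int8_t y)
{
  int16_t wide_x = (int16_t) x;
  int16_t wide_y = (int16_t) y;
  return smul_hi_to_si (wide_x, wide_y);
}

/* Spot checks; call from a test driver.  */
static void
check_signed_substitution (void)
{
  assert (umul_qi_via_signed_hi (255, 255) == 65025u);
  assert (usmul_qi_via_signed_hi (255, -128) == -32640);
}
```

The substitution is only valid while a wider input mode exists that is still narrower than the result mode, which is what the `GET_MODE_WIDER_MODE` checks in the patch guard against.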
 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (5/7)] Widening multiplies for mis-matched mode inputs
  2011-07-14 14:35           ` Richard Guenther
@ 2011-08-19 14:54             ` Andrew Stubbs
  0 siblings, 0 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-08-19 14:54 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 144 bytes --]

On 14/07/11 15:31, Richard Guenther wrote:
> Ok.

I've just committed this patch with no real changes. I've just updated 
the testcase.

Andrew

[-- Attachment #2: widening-multiplies-5.patch --]
[-- Type: text/x-patch, Size: 1193 bytes --]

2011-08-19  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_p): Remove FIXME.
	Ensure the larger type is the first operand.

	gcc/testsuite/
	* gcc.target/arm/wmul-7.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-7.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+unsigned long long
+foo (unsigned long long a, unsigned char *b, unsigned short *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2054,9 +2054,17 @@ is_widening_mult_p (gimple stmt,
       *type2_out = *type1_out;
     }
 
-  /* FIXME: remove this restriction.  */
-  if (TYPE_PRECISION (*type1_out) != TYPE_PRECISION (*type2_out))
-    return false;
+  /* Ensure that the larger of the two operands comes first. */
+  if (TYPE_PRECISION (*type1_out) < TYPE_PRECISION (*type2_out))
+    {
+      tree tmp;
+      tmp = *type1_out;
+      *type1_out = *type2_out;
+      *type2_out = tmp;
+      tmp = *rhs1_out;
+      *rhs1_out = *rhs2_out;
+      *rhs2_out = tmp;
+    }
 
   return true;
 }
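
The operand reordering above can be sketched in isolation. The structure below is a hypothetical simplification that bundles each operand's precision with a label; in the patch itself the type and the rhs are swapped as two separate pairs of trees.

```c
#include <assert.h>

/* Hypothetical summary of one multiply operand.  */
struct mult_operand
{
  int precision;	/* Width of the operand's type in bits.  */
  const char *name;	/* Label used only for illustration.  */
};

/* Ensure that the operand with the larger precision comes first, so
   later code can assume op1->precision >= op2->precision.  */
static void
put_larger_operand_first (struct mult_operand *op1, struct mult_operand *op2)
{
  assert (op1 != 0 && op2 != 0);

  if (op1->precision < op2->precision)
    {
      struct mult_operand tmp = *op1;
      *op1 = *op2;
      *op2 = tmp;
    }
}
```

For the wmul-7.c testcase this would place the `unsigned short` operand ahead of the `unsigned char` one, so the mode of the first operand determines the multiply's input mode.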

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (1/7)] New optab framework for widening multiplies
  2011-08-19 14:41             ` Andrew Stubbs
@ 2011-08-19 14:55               ` Richard Guenther
  2011-08-19 15:07                 ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Richard Guenther @ 2011-08-19 14:55 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Bernd Schmidt, gcc-patches, patches

On Fri, Aug 19, 2011 at 4:18 PM, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 22/07/11 16:34, Andrew Stubbs wrote:
>>
>> On 22/07/11 14:28, Bernd Schmidt wrote:
>>>
>>> Oh well, let's shelve it and do it later.
>>
>> Here's an updated patch with the formatting problem you found fixed.
>
> I've just committed an updated version of this patch (attached).
>
> I found a number of subtle bugs while I was testing, and these have now been
> corrected. In particular, I found that VOIDmode constants were not handled
> correctly; I've added a function "widened_mode" along the lines originally
> suggested by Bernd to deal with this. I also found one case where the code
> produced differed from before; although that was corrected later in the
> patch series, I've now fixed it here.

Seems one in the series has broken bootstrap on x86_64 when building
the 32bit libgcc multilib in stage1.

Richard.

> Andrew
>
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching
  2011-07-14 14:41           ` Richard Guenther
@ 2011-08-19 15:03             ` Andrew Stubbs
  2011-10-13 16:25               ` Matthew Gretton-Dann
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-08-19 15:03 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 366 bytes --]

On 14/07/11 15:35, Richard Guenther wrote:
> Ok.

I've just committed this updated patch.

I found bugs with VOIDmode constants that have caused me to recast my 
patches to is_widening_mult_rhs_p. They should be logically the same for 
non-VOIDmode cases, but work correctly for constants. I think the new 
version is a bit easier to understand in any case.

Andrew

[-- Attachment #2: widening-multiplies-6.patch --]
[-- Type: text/x-patch, Size: 5194 bytes --]

2011-08-19  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_rhs_p): Add new argument
	'type'.
	Use 'type' from caller, not inferred from 'rhs'.
	Don't reject non-conversion statements. Do return lhs in this case.
	(is_widening_mult_p): Add new argument 'type'.
	Use 'type' from caller, not inferred from 'stmt'.
	Pass type to is_widening_mult_rhs_p.
	(convert_mult_to_widen): Pass type to is_widening_mult_p.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-8.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-8.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (long long a, int *b, int *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1966,7 +1966,8 @@ struct gimple_opt_pass pass_optimize_bswap =
  }
 };
 
-/* Return true if RHS is a suitable operand for a widening multiplication.
+/* Return true if RHS is a suitable operand for a widening multiplication,
+   assuming a target type of TYPE.
    There are two cases:
 
      - RHS makes some value at least twice as wide.  Store that value
@@ -1976,27 +1977,31 @@ struct gimple_opt_pass pass_optimize_bswap =
        but leave *TYPE_OUT untouched.  */
 
 static bool
-is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
+is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out,
+			tree *new_rhs_out)
 {
   gimple stmt;
-  tree type, type1, rhs1;
+  tree type1, rhs1;
   enum tree_code rhs_code;
 
   if (TREE_CODE (rhs) == SSA_NAME)
     {
-      type = TREE_TYPE (rhs);
       stmt = SSA_NAME_DEF_STMT (rhs);
-      if (!is_gimple_assign (stmt))
-	return false;
-
-      rhs_code = gimple_assign_rhs_code (stmt);
-      if (TREE_CODE (type) == INTEGER_TYPE
-	  ? !CONVERT_EXPR_CODE_P (rhs_code)
-	  : rhs_code != FIXED_CONVERT_EXPR)
-	return false;
+      if (is_gimple_assign (stmt))
+	{
+	  rhs_code = gimple_assign_rhs_code (stmt);
+	  if (TREE_CODE (type) == INTEGER_TYPE
+	      ? !CONVERT_EXPR_CODE_P (rhs_code)
+	      : rhs_code != FIXED_CONVERT_EXPR)
+	    rhs1 = rhs;
+	  else
+	    rhs1 = gimple_assign_rhs1 (stmt);
+	}
+      else
+	rhs1 = rhs;
 
-      rhs1 = gimple_assign_rhs1 (stmt);
       type1 = TREE_TYPE (rhs1);
+
       if (TREE_CODE (type1) != TREE_CODE (type)
 	  || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type))
 	return false;
@@ -2016,28 +2021,27 @@ is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
   return false;
 }
 
-/* Return true if STMT performs a widening multiplication.  If so,
-   store the unwidened types of the operands in *TYPE1_OUT and *TYPE2_OUT
-   respectively.  Also fill *RHS1_OUT and *RHS2_OUT such that converting
-   those operands to types *TYPE1_OUT and *TYPE2_OUT would give the
-   operands of the multiplication.  */
+/* Return true if STMT performs a widening multiplication, assuming the
+   output type is TYPE.  If so, store the unwidened types of the operands
+   in *TYPE1_OUT and *TYPE2_OUT respectively.  Also fill *RHS1_OUT and
+   *RHS2_OUT such that converting those operands to types *TYPE1_OUT
+   and *TYPE2_OUT would give the operands of the multiplication.  */
 
 static bool
-is_widening_mult_p (gimple stmt,
+is_widening_mult_p (tree type, gimple stmt,
 		    tree *type1_out, tree *rhs1_out,
 		    tree *type2_out, tree *rhs2_out)
 {
-  tree type;
-
-  type = TREE_TYPE (gimple_assign_lhs (stmt));
   if (TREE_CODE (type) != INTEGER_TYPE
       && TREE_CODE (type) != FIXED_POINT_TYPE)
     return false;
 
-  if (!is_widening_mult_rhs_p (gimple_assign_rhs1 (stmt), type1_out, rhs1_out))
+  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs1 (stmt), type1_out,
+			       rhs1_out))
     return false;
 
-  if (!is_widening_mult_rhs_p (gimple_assign_rhs2 (stmt), type2_out, rhs2_out))
+  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs2 (stmt), type2_out,
+			       rhs2_out))
     return false;
 
   if (*type1_out == NULL)
@@ -2089,7 +2093,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
   if (TREE_CODE (type) != INTEGER_TYPE)
     return false;
 
-  if (!is_widening_mult_p (stmt, &type1, &rhs1, &type2, &rhs2))
+  if (!is_widening_mult_p (type, stmt, &type1, &rhs1, &type2, &rhs2))
     return false;
 
   to_mode = TYPE_MODE (type);
@@ -2255,7 +2259,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   if (code == PLUS_EXPR
       && (rhs1_code == MULT_EXPR || rhs1_code == WIDEN_MULT_EXPR))
     {
-      if (!is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
+      if (!is_widening_mult_p (type, rhs1_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs2;
@@ -2263,7 +2267,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
     }
   else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR)
     {
-      if (!is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
+      if (!is_widening_mult_p (type, rhs2_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
       add_rhs = rhs1;
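
The effect of passing the target type in from the caller can be illustrated with the precision test alone. In this sketch `operand_is_narrow_enough` is a hypothetical reduction of the `TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type)` check, and `macc_si_to_di` is a hypothetical stand-in for a 32x32+64 multiply-and-accumulate instruction such as the one wmul-8.c expects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An operand qualifies for a widening multiply when the target type is
   at least twice as wide as the operand.  */
static bool
operand_is_narrow_enough (int target_precision, int operand_precision)
{
  return operand_precision * 2 <= target_precision;
}

/* Hypothetical stand-in for a 32x32+64 multiply-and-accumulate.  */
static int64_t
macc_si_to_di (int64_t acc, int32_t b, int32_t c)
{
  return acc + (int64_t) b * (int64_t) c;
}

/* Spot checks; call from a test driver.  */
static void
check_target_type_choice (void)
{
  /* Judged against the multiply's own 32-bit result, a 32-bit operand
     is too wide...  */
  assert (!operand_is_narrow_enough (32, 32));
  /* ...but judged against the 64-bit accumulator it qualifies.  */
  assert (operand_is_narrow_enough (64, 32));
  assert (macc_si_to_di (10, 3, -4) == -2);
}
```

Because signed overflow of the original `int` multiply is undefined in C, computing the product at full width gives the same answer in every case where the source program had defined behaviour.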

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (1/7)] New optab framework for widening multiplies
  2011-08-19 14:55               ` Richard Guenther
@ 2011-08-19 15:07                 ` Andrew Stubbs
  2011-08-19 16:40                   ` Andrew Stubbs
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-08-19 15:07 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Bernd Schmidt, gcc-patches, patches

On 19/08/11 15:45, Richard Guenther wrote:
> Seems one in the series has broken bootstrap on x86_64 when building
> the 32bit libgcc multilib in stage1.

Oh? Hopefully that'll be fixed when I complete the patchset. Patches 8 
and 9 (of 7) did fix issues with the earlier patches.

Andrew

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (7/7)] Mixed-sign multiplies using narrowest mode
  2011-07-14 14:48       ` Richard Guenther
@ 2011-08-19 15:56         ` Andrew Stubbs
  0 siblings, 0 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-08-19 15:56 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 105 bytes --]

On 14/07/11 15:41, Richard Guenther wrote:
> Ok.

Committed, unchanged apart from the test case.

Andrew

[-- Attachment #2: widening-multiplies-7.patch --]
[-- Type: text/x-patch, Size: 3081 bytes --]

2011-08-19  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (convert_mult_to_widen): Better handle
	unsigned inputs of different modes.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-9.c: New file.
	* gcc.target/arm/wmul-bitfield-2.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-9.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (long long a, short *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "smlalbb" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-bitfield-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+struct bf
+{
+  int a : 3;
+  unsigned int b : 15;
+  int c : 3;
+};
+
+long long
+foo (long long a, struct bf b, struct bf c)
+{
+  return a + b.b * c.c;
+}
+
+/* { dg-final { scan-assembler "smlalbb" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2115,9 +2115,18 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
     {
       if (op != smul_widen_optab)
 	{
-	  from_mode = GET_MODE_WIDER_MODE (from_mode);
-	  if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode))
-	    return false;
+	  /* We can use a signed multiply with unsigned types as long as
+	     there is a wider mode to use, or it is the smaller of the two
+	     types that is unsigned.  Note that type1 >= type2, always.  */
+	  if ((TYPE_UNSIGNED (type1)
+	       && TYPE_PRECISION (type1) == GET_MODE_PRECISION (from_mode))
+	      || (TYPE_UNSIGNED (type2)
+		  && TYPE_PRECISION (type2) == GET_MODE_PRECISION (from_mode)))
+	    {
+	      from_mode = GET_MODE_WIDER_MODE (from_mode);
+	      if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode))
+		return false;
+	    }
 
 	  op = smul_widen_optab;
 	  handler = find_widening_optab_handler_and_mode (op, to_mode,
@@ -2284,14 +2293,20 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   /* There's no such thing as a mixed sign madd yet, so use a wider mode.  */
   if (from_unsigned1 != from_unsigned2)
     {
-      enum machine_mode mode = GET_MODE_WIDER_MODE (from_mode);
-      if (GET_MODE_PRECISION (mode) < GET_MODE_PRECISION (to_mode))
+      /* We can use a signed multiply with unsigned types as long as
+	 there is a wider mode to use, or it is the smaller of the two
+	 types that is unsigned.  Note that type1 >= type2, always.  */
+      if ((from_unsigned1
+	   && TYPE_PRECISION (type1) == GET_MODE_PRECISION (from_mode))
+	  || (from_unsigned2
+	      && TYPE_PRECISION (type2) == GET_MODE_PRECISION (from_mode)))
 	{
-	  from_mode = mode;
-	  from_unsigned1 = from_unsigned2 = false;
+	  from_mode = GET_MODE_WIDER_MODE (from_mode);
+	  if (GET_MODE_SIZE (from_mode) >= GET_MODE_SIZE (to_mode))
+	    return false;
 	}
-      else
-	return false;
+
+      from_unsigned1 = from_unsigned2 = false;
     }
 
   /* If there was a conversion between the multiply and addition
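
The condition introduced by this patch can be written as a small predicate. This is a sketch only: `needs_wider_mode` is a hypothetical name, types are reduced to a precision and a signedness flag, and the examples assume that `from_mode` is 16 bits wide for the bit-field case and that `char` is unsigned on the target.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if a mixed-sign multiply must move to a wider input mode
   before a signed multiply can be used.  That is only necessary when an
   unsigned operand fills the whole mode, leaving no spare bit for the
   sign.  */
static bool
needs_wider_mode (bool unsigned1, int precision1,
		  bool unsigned2, int precision2,
		  int mode_precision)
{
  return ((unsigned1 && precision1 == mode_precision)
	  || (unsigned2 && precision2 == mode_precision));
}

/* Spot checks; call from a test driver.  */
static void
check_needs_wider_mode (void)
{
  /* wmul-bitfield-2.c: a 15-bit unsigned field fits in a signed 16-bit
     mode, so no wider mode is needed.  */
  assert (!needs_wider_mode (true, 15, false, 3, 16));
  /* An unsigned 8-bit value in an 8-bit mode needs the wider mode.  */
  assert (needs_wider_mode (true, 8, false, 8, 8));
}
```

In the wmul-9.c case (a signed `short` times an unsigned 8-bit `char`) the unsigned operand is narrower than the 16-bit mode, so the predicate is false and the narrowest signed multiply can be used directly.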

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (8/7)] Fix a bug in multiply-and-accumulate
  2011-07-21 13:48     ` Andrew Stubbs
@ 2011-08-19 16:22       ` Andrew Stubbs
  0 siblings, 0 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-08-19 16:22 UTC (permalink / raw)
  Cc: Richard Guenther, gcc-patches

On 21/07/11 14:14, Andrew Stubbs wrote:
> Here is the patch I plan to commit, when patch 1 is approved, and my
> testing is complete.

Committed, unchanged.

Andrew

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH (9/7)] Widening multiplies with constant inputs
  2011-07-22 16:06           ` Andrew Stubbs
@ 2011-08-19 16:24             ` Andrew Stubbs
  2011-08-19 16:52               ` H.J. Lu
  0 siblings, 1 reply; 107+ messages in thread
From: Andrew Stubbs @ 2011-08-19 16:24 UTC (permalink / raw)
  Cc: Richard Guenther, gcc-patches, patches

[-- Attachment #1: Type: text/plain, Size: 196 bytes --]

On 22/07/11 16:38, Andrew Stubbs wrote:
> Fixed in the attached. I'll commit this version when the rest of my
> testing is complete.

Now committed. Here's the patch with updated context.

Andrew

[-- Attachment #2: widening-multiplies-9.patch --]
[-- Type: text/x-patch, Size: 3474 bytes --]

2011-08-19  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_rhs_p): Handle constants
	beyond conversions.
	(convert_mult_to_widen): Convert constant inputs to the right type.
	(convert_plusminus_to_widen): Don't automatically reject inputs that
	are not an SSA_NAME.
	Convert constant inputs to the right type.

	gcc/testsuite/
	* gcc.target/arm/wmul-11.c: New file.
	* gcc.target/arm/wmul-12.c: New file.
	* gcc.target/arm/wmul-13.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-11.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (int *b)
+{
+  return 10 * (long long)*b;
+}
+
+/* { dg-final { scan-assembler "smull" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-12.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (int *b, int *c)
+{
+  int tmp = *b * *c;
+  return 10 + (long long)tmp;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-13.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (int *a, int *b)
+{
+  return *a + (long long)*b * 10;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1995,7 +1995,16 @@ is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out,
 	      : rhs_code != FIXED_CONVERT_EXPR)
 	    rhs1 = rhs;
 	  else
-	    rhs1 = gimple_assign_rhs1 (stmt);
+	    {
+	      rhs1 = gimple_assign_rhs1 (stmt);
+
+	      if (TREE_CODE (rhs1) == INTEGER_CST)
+		{
+		  *new_rhs_out = rhs1;
+		  *type_out = NULL;
+		  return true;
+		}
+	    }
 	}
       else
 	rhs1 = rhs;
@@ -2164,6 +2173,12 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
       rhs2 = build_and_insert_cast (gsi, loc, tmp, rhs2);
     }
 
+  /* Handle constants.  */
+  if (TREE_CODE (rhs1) == INTEGER_CST)
+    rhs1 = fold_convert (type1, rhs1);
+  if (TREE_CODE (rhs2) == INTEGER_CST)
+    rhs2 = fold_convert (type2, rhs2);
+
   gimple_assign_set_rhs1 (stmt, rhs1);
   gimple_assign_set_rhs2 (stmt, rhs2);
   gimple_assign_set_rhs_code (stmt, WIDEN_MULT_EXPR);
@@ -2215,8 +2230,6 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       if (is_gimple_assign (rhs1_stmt))
 	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
     }
-  else
-    return false;
 
   if (TREE_CODE (rhs2) == SSA_NAME)
     {
@@ -2224,8 +2237,6 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       if (is_gimple_assign (rhs2_stmt))
 	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
     }
-  else
-    return false;
 
   /* Allow for one conversion statement between the multiply
      and addition/subtraction statement.  If there are more than
@@ -2373,6 +2384,12 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
     add_rhs = build_and_insert_cast (gsi, loc, create_tmp_var (type, NULL),
 				     add_rhs);
 
+  /* Handle constants.  */
+  if (TREE_CODE (mult_rhs1) == INTEGER_CST)
+    mult_rhs1 = fold_convert (type1, mult_rhs1);
+  if (TREE_CODE (mult_rhs2) == INTEGER_CST)
+    mult_rhs2 = fold_convert (type2, mult_rhs2);
+
   gimple_assign_set_rhs_with_ops_1 (gsi, wmult_code, mult_rhs1, mult_rhs2,
 				    add_rhs);
   update_stmt (gsi_stmt (*gsi));
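
For readers who want the idea behind "Convert constant inputs to the right type" without the GIMPLE details, a rough sketch in plain C follows. It mirrors the shape of wmul-11.c: a constant that fits the narrow operand type can be narrowed, after which the multiply is an ordinary 32x32->64 widening multiply. The helper names are hypothetical, and the fits-in-type check is an assumption of this sketch, not something shown in the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a 64-bit constant is representable in
   the 32-bit signed operand type.  */
bool
const_fits_si (int64_t c)
{
  return c >= INT32_MIN && c <= INT32_MAX;
}

/* The expression from wmul-11.c, evaluated as a full 64-bit multiply.  */
int64_t
mul_const_reference (const int32_t *b)
{
  return 10 * (int64_t) *b;
}

/* The same value with the constant narrowed to the operand type first,
   so that both inputs are 32 bits wide and only the result is 64.  */
int64_t
mul_const_widening (const int32_t *b)
{
  const int64_t c = 10;
  /* A compiler pass would decline the transformation here rather than
     assert; the assert just documents the assumption.  */
  assert (const_fits_si (c));
  int32_t narrow_c = (int32_t) c;	/* stands in for fold_convert */
  return (int64_t) narrow_c * (int64_t) *b;
}
```

Both functions return the same value for every input, which is why a single smull can implement the expression.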

* Re: [PATCH (1/7)] New optab framework for widening multiplies
  2011-08-19 15:07                 ` Andrew Stubbs
@ 2011-08-19 16:40                   ` Andrew Stubbs
  0 siblings, 0 replies; 107+ messages in thread
From: Andrew Stubbs @ 2011-08-19 16:40 UTC (permalink / raw)
  Cc: Richard Guenther, Bernd Schmidt, gcc-patches, patches

On 19/08/11 15:51, Andrew Stubbs wrote:
> On 19/08/11 15:45, Richard Guenther wrote:
>> Seems one in the series has broken bootstrap on x86_64 when building
>> the 32bit libgcc multilib in stage1.
>
> Oh? Hopefully that'll be fixed when I complete the patchset. Patches 8
> and 9 (of 7) did fix issues with the earlier patches.


Seems fine now. Sorry for the trouble.

Andrew

* Re: [PATCH (9/7)] Widening multiplies with constant inputs
  2011-08-19 16:24             ` Andrew Stubbs
@ 2011-08-19 16:52               ` H.J. Lu
  0 siblings, 0 replies; 107+ messages in thread
From: H.J. Lu @ 2011-08-19 16:52 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Richard Guenther, gcc-patches, patches

On Fri, Aug 19, 2011 at 8:07 AM, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 22/07/11 16:38, Andrew Stubbs wrote:
>>
>> Fixed in the attached. I'll commit this version when the rest of my
>> testing is complete.
>
> Now committed. Here's the patch with updated context.
>

I think one of your patches caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50128

-- 
H.J.

* Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching
  2011-08-19 15:03             ` Andrew Stubbs
@ 2011-10-13 16:25               ` Matthew Gretton-Dann
  0 siblings, 0 replies; 107+ messages in thread
From: Matthew Gretton-Dann @ 2011-10-13 16:25 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches, patches

This patch seems to have caused PR50717:
   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50717

Thanks,

Matt

On 19/08/11 15:49, Andrew Stubbs wrote:
> On 14/07/11 15:35, Richard Guenther wrote:
>> Ok.
>
> I've just committed this updated patch.
>
> I found bugs with VOIDmode constants that have caused me to recast my
> patches to is_widening_mult_rhs_p. They should be logically the same for
> non VOIDmode cases, but work correctly for constants. I think the new
> version is a bit easier to understand in any case.
>
> Andrew
>
>
> widening-multiplies-6.patch
>
>
> 2011-08-19  Andrew Stubbs  <ams@codesourcery.com>
>
> 	gcc/
> 	* tree-ssa-math-opts.c (is_widening_mult_rhs_p): Add new argument
> 	'type'.
> 	Use 'type' from caller, not inferred from 'rhs'.
> 	Don't reject non-conversion statements. Do return lhs in this case.
> 	(is_widening_mult_p): Add new argument 'type'.
> 	Use 'type' from caller, not inferred from 'stmt'.
> 	Pass type to is_widening_mult_rhs_p.
> 	(convert_mult_to_widen): Pass type to is_widening_mult_p.
> 	(convert_plusminus_to_widen): Likewise.
>
> 	gcc/testsuite/
> 	* gcc.target/arm/wmul-8.c: New file.
>
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/wmul-8.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-require-effective-target arm_dsp } */
> +
> +long long
> +foo (long long a, int *b, int *c)
> +{
> +  return a + *b * *c;
> +}
> +
> +/* { dg-final { scan-assembler "smlal" } } */
> --- a/gcc/tree-ssa-math-opts.c
> +++ b/gcc/tree-ssa-math-opts.c
> @@ -1966,7 +1966,8 @@ struct gimple_opt_pass pass_optimize_bswap =
>    }
>   };
>
> -/* Return true if RHS is a suitable operand for a widening multiplication.
> +/* Return true if RHS is a suitable operand for a widening multiplication,
> +   assuming a target type of TYPE.
>      There are two cases:
>
>        - RHS makes some value at least twice as wide.  Store that value
> @@ -1976,27 +1977,31 @@ struct gimple_opt_pass pass_optimize_bswap =
>          but leave *TYPE_OUT untouched.  */
>
>   static bool
> -is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
> +is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out,
> +			tree *new_rhs_out)
>   {
>     gimple stmt;
> -  tree type, type1, rhs1;
> +  tree type1, rhs1;
>     enum tree_code rhs_code;
>
>     if (TREE_CODE (rhs) == SSA_NAME)
>       {
> -      type = TREE_TYPE (rhs);
>         stmt = SSA_NAME_DEF_STMT (rhs);
> -      if (!is_gimple_assign (stmt))
> -	return false;
> -
> -      rhs_code = gimple_assign_rhs_code (stmt);
> -      if (TREE_CODE (type) == INTEGER_TYPE
> -	  ? !CONVERT_EXPR_CODE_P (rhs_code)
> -	  : rhs_code != FIXED_CONVERT_EXPR)
> -	return false;
> +      if (is_gimple_assign (stmt))
> +	{
> +	  rhs_code = gimple_assign_rhs_code (stmt);
> +	  if (TREE_CODE (type) == INTEGER_TYPE
> +	      ? !CONVERT_EXPR_CODE_P (rhs_code)
> +	      : rhs_code != FIXED_CONVERT_EXPR)
> +	    rhs1 = rhs;
> +	  else
> +	    rhs1 = gimple_assign_rhs1 (stmt);
> +	}
> +      else
> +	rhs1 = rhs;
>
> -      rhs1 = gimple_assign_rhs1 (stmt);
>         type1 = TREE_TYPE (rhs1);
> +
>         if (TREE_CODE (type1) != TREE_CODE (type)
>   	  || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type))
>   	return false;
> @@ -2016,28 +2021,27 @@ is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
>     return false;
>   }
>
> -/* Return true if STMT performs a widening multiplication.  If so,
> -   store the unwidened types of the operands in *TYPE1_OUT and *TYPE2_OUT
> -   respectively.  Also fill *RHS1_OUT and *RHS2_OUT such that converting
> -   those operands to types *TYPE1_OUT and *TYPE2_OUT would give the
> -   operands of the multiplication.  */
> +/* Return true if STMT performs a widening multiplication, assuming the
> +   output type is TYPE.  If so, store the unwidened types of the operands
> +   in *TYPE1_OUT and *TYPE2_OUT respectively.  Also fill *RHS1_OUT and
> +   *RHS2_OUT such that converting those operands to types *TYPE1_OUT
> +   and *TYPE2_OUT would give the operands of the multiplication.  */
>
>   static bool
> -is_widening_mult_p (gimple stmt,
> +is_widening_mult_p (tree type, gimple stmt,
>   		    tree *type1_out, tree *rhs1_out,
>   		    tree *type2_out, tree *rhs2_out)
>   {
> -  tree type;
> -
> -  type = TREE_TYPE (gimple_assign_lhs (stmt));
>     if (TREE_CODE (type) != INTEGER_TYPE
>         && TREE_CODE (type) != FIXED_POINT_TYPE)
>       return false;
>
> -  if (!is_widening_mult_rhs_p (gimple_assign_rhs1 (stmt), type1_out, rhs1_out))
> +  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs1 (stmt), type1_out,
> +			       rhs1_out))
>       return false;
>
> -  if (!is_widening_mult_rhs_p (gimple_assign_rhs2 (stmt), type2_out, rhs2_out))
> +  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs2 (stmt), type2_out,
> +			       rhs2_out))
>       return false;
>
>     if (*type1_out == NULL)
> @@ -2089,7 +2093,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
>     if (TREE_CODE (type) != INTEGER_TYPE)
>       return false;
>
> -  if (!is_widening_mult_p (stmt, &type1, &rhs1, &type2, &rhs2))
> +  if (!is_widening_mult_p (type, stmt, &type1, &rhs1, &type2, &rhs2))
>       return false;
>
>     to_mode = TYPE_MODE (type);
> @@ -2255,7 +2259,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
>     if (code == PLUS_EXPR
>         && (rhs1_code == MULT_EXPR || rhs1_code == WIDEN_MULT_EXPR))
>       {
> -      if (!is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
> +      if (!is_widening_mult_p (type, rhs1_stmt, &type1, &mult_rhs1,
>   			&type2, &mult_rhs2))
>   	return false;
>         add_rhs = rhs2;
> @@ -2263,7 +2267,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
>       }
>     else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR)
>       {
> -      if (!is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
> +      if (!is_widening_mult_p (type, rhs2_stmt, &type1, &mult_rhs1,
>   			&type2, &mult_rhs2))
>   	return false;
>         add_rhs = rhs1;
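
As background for anyone reading the wmul-8.c test quoted above, here is a small plain-C sketch of the two ways a multiply-and-accumulate can be read when the multiply itself is not written as widening. The function names are hypothetical, and I have used unsigned operands so that wraparound is well defined; the test itself uses signed int, where overflow is undefined. This only illustrates the C semantics and makes no claim about what the PR above actually reports.

```c
#include <assert.h>
#include <stdint.h>

/* As written in the source: the multiply happens in 32 bits and wraps
   modulo 2^32, and only the wrapped result is widened and added.  */
uint64_t
macc_narrow_multiply (uint64_t a, uint32_t b, uint32_t c)
{
  uint32_t tmp = b * c;
  return a + (uint64_t) tmp;
}

/* What a widening multiply-and-accumulate instruction computes: the
   full 64-bit product is added to the accumulator.  */
uint64_t
macc_widening_multiply (uint64_t a, uint32_t b, uint32_t c)
{
  return a + (uint64_t) b * (uint64_t) c;
}

/* The two agree while the product fits in 32 bits and differ once it
   does not.  */
void
check_macc (void)
{
  assert (macc_narrow_multiply (1, 3, 4) == macc_widening_multiply (1, 3, 4));
  assert (macc_narrow_multiply (1, 0x10000u, 0x10000u)
	  != macc_widening_multiply (1, 0x10000u, 0x10000u));
}
```

So replacing the first form with the second is only safe when the compiler may assume that the narrow product does not overflow.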


-- 
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd

end of thread, other threads:[~2011-10-13 15:59 UTC | newest]

Thread overview: 107+ messages
2011-06-23 14:38 [PATCH (0/7)] Improve use of Widening Multiplies Andrew Stubbs
2011-06-23 14:39 ` [PATCH (1/7)] New optab framework for widening multiplies Andrew Stubbs
2011-07-09 15:38   ` Andrew Stubbs
2011-07-14 15:29     ` Andrew Stubbs
2011-07-22 13:01     ` Bernd Schmidt
2011-07-22 13:50       ` Andrew Stubbs
2011-07-22 14:01         ` Bernd Schmidt
2011-07-22 15:52           ` Andrew Stubbs
2011-08-19 14:41             ` Andrew Stubbs
2011-08-19 14:55               ` Richard Guenther
2011-08-19 15:07                 ` Andrew Stubbs
2011-08-19 16:40                   ` Andrew Stubbs
2011-06-23 14:41 ` [PATCH (2/7)] Widening multiplies by more than one mode Andrew Stubbs
2011-07-12 10:15   ` Andrew Stubbs
2011-07-12 11:05     ` Richard Guenther
2011-07-12 11:14       ` Richard Guenther
2011-07-12 11:38         ` Andrew Stubbs
2011-07-12 11:51           ` Richard Guenther
2011-07-21 19:51         ` Joseph S. Myers
2011-07-22  8:58           ` Andrew Stubbs
2011-07-14 14:17       ` Andrew Stubbs
2011-07-14 14:24         ` Richard Guenther
2011-08-19 14:45           ` Andrew Stubbs
2011-06-23 14:42 ` [PATCH (3/7)] Widening multiply-and-accumulate pattern matching Andrew Stubbs
2011-06-23 16:28   ` Richard Guenther
2011-06-24  8:14     ` Andrew Stubbs
2011-06-24  9:31       ` Richard Guenther
2011-06-24 14:08         ` Stubbs, Andrew
2011-06-24 16:13           ` Richard Guenther
2011-06-24 18:22             ` Stubbs, Andrew
2011-06-25  9:58               ` Richard Guenther
2011-06-28 11:32             ` Andrew Stubbs
2011-06-28 12:48               ` Richard Guenther
2011-06-28 16:37                 ` Michael Matz
2011-06-28 16:48                   ` Andrew Stubbs
2011-06-28 17:09                     ` Michael Matz
2011-07-01 11:58                       ` Stubbs, Andrew
2011-07-01 12:25                         ` Richard Guenther
2011-07-04 14:23                           ` Andrew Stubbs
2011-07-07 10:00                             ` Richard Guenther
2011-07-07 10:27                               ` Andrew Stubbs
2011-07-07 12:18                                 ` Andrew Stubbs
2011-07-07 12:34                                   ` Richard Guenther
2011-07-07 12:49                                     ` Richard Guenther
2011-07-08 12:55                                       ` Andrew Stubbs
2011-07-08 13:22                                         ` Richard Guenther
2011-07-11 17:01                               ` Andrew Stubbs
2011-07-12 11:05                                 ` Richard Guenther
2011-08-19 14:50                                   ` Andrew Stubbs
2011-07-14 14:26                                 ` Andrew Stubbs
2011-07-19  0:36                                   ` Janis Johnson
2011-07-19  9:01                                     ` Andrew Stubbs
2011-07-01 12:33                         ` Paolo Bonzini
2011-07-01 13:31                           ` Stubbs, Andrew
2011-07-01 14:41                             ` Paolo Bonzini
2011-07-01 14:55                               ` Stubbs, Andrew
2011-07-01 15:54                                 ` Paolo Bonzini
2011-07-01 18:18                                   ` Stubbs, Andrew
2011-07-01 15:10                             ` Stubbs, Andrew
2011-07-01 16:40                     ` Bernd Schmidt
2011-06-23 21:55   ` Janis Johnson
2011-06-23 14:43 ` [PATCH (4/7)] Unsigned multiplies using wider signed multiplies Andrew Stubbs
2011-06-28 13:28   ` Andrew Stubbs
2011-06-28 14:49     ` Andrew Stubbs
2011-07-04 14:27       ` Andrew Stubbs
2011-07-07 10:10         ` Richard Guenther
2011-07-07 10:42           ` Andrew Stubbs
2011-07-07 11:08             ` Richard Guenther
2011-07-12 14:10         ` Andrew Stubbs
2011-07-14 14:28           ` Andrew Stubbs
2011-07-14 14:31             ` Richard Guenther
2011-08-19 14:51               ` Andrew Stubbs
2011-06-28 13:30   ` Paolo Bonzini
2011-06-23 14:44 ` [PATCH (5/7)] Widening multiplies for mis-matched mode inputs Andrew Stubbs
2011-06-28 15:44   ` Andrew Stubbs
2011-07-04 14:29     ` Andrew Stubbs
2011-07-07 10:11       ` Richard Guenther
2011-07-14 14:34         ` Andrew Stubbs
2011-07-14 14:35           ` Richard Guenther
2011-08-19 14:54             ` Andrew Stubbs
2011-06-23 14:51 ` [PATCH (6/7)] More widening multiply-and-accumulate pattern matching Andrew Stubbs
2011-06-28 15:49   ` Andrew Stubbs
2011-07-04 14:32     ` Andrew Stubbs
2011-07-07 10:20       ` Richard Guenther
2011-07-14 14:35         ` Andrew Stubbs
2011-07-14 14:41           ` Richard Guenther
2011-08-19 15:03             ` Andrew Stubbs
2011-10-13 16:25               ` Matthew Gretton-Dann
2011-06-23 14:54 ` [PATCH (7/7)] Mixed-sign multiplies using narrowest mode Andrew Stubbs
2011-06-28 17:02   ` Andrew Stubbs
2011-07-14 14:44     ` Andrew Stubbs
2011-07-14 14:48       ` Richard Guenther
2011-08-19 15:56         ` Andrew Stubbs
2011-06-25 16:14 ` [PATCH (0/7)] Improve use of Widening Multiplies Bernd Schmidt
2011-06-27  9:16   ` Andrew Stubbs
2011-07-18 14:34 ` [PATCH (8/7)] Fix a bug in multiply-and-accumulate Andrew Stubbs
2011-07-18 16:09   ` Richard Guenther
2011-07-21 13:48     ` Andrew Stubbs
2011-08-19 16:22       ` Andrew Stubbs
2011-07-21 13:14 ` [PATCH (9/7)] Widening multiplies with constant inputs Andrew Stubbs
2011-07-21 14:34   ` Richard Guenther
2011-07-22 12:28     ` Andrew Stubbs
2011-07-22 12:32       ` Andrew Stubbs
2011-07-22 12:34         ` Richard Guenther
2011-07-22 16:06           ` Andrew Stubbs
2011-08-19 16:24             ` Andrew Stubbs
2011-08-19 16:52               ` H.J. Lu
