public inbox for gcc-patches@gcc.gnu.org
* [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
@ 2016-06-15 13:24 Kyrill Tkachov
  2016-06-15 21:53 ` Marc Glisse
  0 siblings, 1 reply; 18+ messages in thread
From: Kyrill Tkachov @ 2016-06-15 13:24 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Marc Glisse

[-- Attachment #1: Type: text/plain, Size: 1747 bytes --]

Hi all,

This is a respin of https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html following feedback.
I've changed the code to cast the operand to an unsigned type before applying the multiplication algorithm
and cast it back to the signed type at the end.
Whether to perform the cast is now determined by the function cast_mult_synth_to_unsigned in which I've implemented
the cases that Marc mentioned in [1]. Please do let me know
if there are any other cases that need to be handled.

I've added a couple of TODO notes in the places that need to be extended to handle shifts as a series of additions
for targets that support vector addition but not vector shifts [2].

tree-vect-patterns.c already includes expmed.h (must have been added in the time since I first wrote the patch
last November...) so it picks up the definition of choose_mult_variant moved there in patch 1/2.

Bootstrapped and tested on aarch64, arm, x86_64.
Does this look better?

Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01023.html
[2] https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00967.html

2016-06-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/65951
     * tree-vect-patterns.c: Include mult-synthesis.h
     (target_supports_mult_synth_alg): New function.
     (apply_binop_and_append_stmt): Likewise.
     (vect_synth_mult_by_constant): Likewise.
     (target_has_vecop_for_code): Likewise.
     (cast_mult_synth_to_unsigned): Likewise.
     (vect_recog_mult_pattern): Use above functions to synthesize vector
     multiplication by integer constants.

2016-06-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.dg/vect/vect-mult-const-pattern-1.c: New test.
     * gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.

[-- Attachment #2: vect-2.patch --]
[-- Type: text/x-patch, Size: 15165 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..e5dba82d7fa955a6a37a0eabf980127e464ac77b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 256
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= 123;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  long long data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / 123 != i)
+      __builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect"  { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..83019c96910b866e364a7c2e00261a1ded13cb53
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 256
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= -19594LL;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  long long data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / -19594LL != i)
+      __builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect"  { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target aarch64*-*-* } } } */
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 8a2221f935063002ecd02d2b20af5cb4bd7d9fee..698a022d4b80795a0b5a22a9c433fe482fa1bdaa 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -2131,32 +2131,281 @@ vect_recog_vector_vector_shift_pattern (vec<gimple *> *stmts,
   return pattern_stmt;
 }
 
-/* Detect multiplication by constant which are postive or negatives of power 2,
-   and convert them to shift patterns.
+/* Return true iff the target has a vector optab implementing the operation
+   CODE on type VECTYPE.  */
 
-   Mult with constants that are postive power of two.
-   type a_t;
-   type b_t
-   S1: b_t = a_t * n
+static bool
+target_has_vecop_for_code (tree_code code, tree vectype)
+{
+  optab voptab = optab_for_tree_code (code, vectype, optab_vector);
+  return voptab
+	 && optab_handler (voptab, TYPE_MODE (vectype)) != CODE_FOR_nothing;
+}
 
-   or
+/* Return true iff we need to cast the operand of the
+   multiplication by a constant CST described by the algorithm ALG and VAR
+   on type SCALTYPE to an unsigned type to avoid overflow.  */
+
+static bool
+cast_mult_synth_to_unsigned (struct algorithm *alg, mult_variant var,
+			      tree scaltype, tree cst)
+{
+  if (TYPE_OVERFLOW_WRAPS (scaltype))
+    return false;
+
+  widest_int val = wi::to_widest (cst);
+  /* If multiplying by a negative divisor of the minimum value we risk
+     signed overflow.  */
+  if (wi::neg_p (val)
+      && wi::multiple_of_p (wi::to_widest (TYPE_MIN_VALUE (scaltype)),
+			     val, SIGNED))
+    return true;
+
+  /* If the final step is a subtraction we should cast.  */
+  if (var == negate_variant)
+    return true;
+
+  if (var == add_variant)
+    return false;
+
+  alg_code acode = alg->op[alg->ops - 1];
+  if (acode == alg_sub_t_m2
+      || acode == alg_sub_t2_m
+      || acode == alg_sub_factor)
+    return true;
+
+  return false;
+}
+
+/* Verify that the target has optabs of the vector form of SCALTYPE to perform
+   all the steps needed by the multiplication-by-immediate synthesis algorithm
+   described by ALG and VAR.  Return true iff that is the case.  */
+
+static bool
+target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
+				 tree scaltype)
+{
+  if (alg->op[0] != alg_zero && alg->op[0] != alg_m)
+    return false;
+
+  tree vectype = get_vectype_for_scalar_type (scaltype);
+
+  if (!vectype)
+    return false;
+
+  /* All synthesis algorithms require shifts, so bail out early if
+     target cannot vectorize them.
+     TODO: If the target doesn't support vector shifts but supports vector
+     addition we could synthesize shifts that way.  vect_synth_mult_by_constant
+     would need to be updated to do that.  */
+  if (!vect_supportable_shift (LSHIFT_EXPR, scaltype))
+    return false;
+
+  bool supports_vminus = target_has_vecop_for_code (MINUS_EXPR, vectype);
+  bool supports_vplus = target_has_vecop_for_code (PLUS_EXPR, vectype);
+
+  if (var == negate_variant
+      && !target_has_vecop_for_code (NEGATE_EXPR, vectype))
+    return false;
+
+  if (var == add_variant && !supports_vplus)
+    return false;
+
+  for (int i = 1; i < alg->ops; i++)
+    {
+      switch (alg->op[i])
+	{
+	case alg_shift:
+	  break;
+	case alg_add_t_m2:
+	case alg_add_t2_m:
+	case alg_add_factor:
+	  if (!supports_vplus)
+	    return false;
+	  break;
+	case alg_sub_t_m2:
+	case alg_sub_t2_m:
+	case alg_sub_factor:
+	  if (!supports_vminus)
+	    return false;
+	  break;
+	case alg_unknown:
+	case alg_m:
+	case alg_zero:
+	case alg_impossible:
+	  return false;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+
+  return true;
+}
+
+/* Helper for vect_synth_mult_by_constant.  Apply a binary operation
+   CODE to operands OP1 and OP2, creating a new temporary SSA var in
+   the process.  Append the resulting assignment statement to the sequence
+   in STMT_VINFO.  Return the newly created SSA variable which is the result
+   of the binary operation.  */
+
+static tree
+apply_binop_and_append_stmt (tree_code code, tree op1, tree op2,
+			     stmt_vec_info stmt_vinfo)
+{
+  tree itype = TREE_TYPE (op1);
+  tree tmp_var = vect_recog_temp_ssa_var (itype, NULL);
+  gimple *stmt = gimple_build_assign (tmp_var, code, op1, op2);
+  append_pattern_def_seq (stmt_vinfo, stmt);
+  return tmp_var;
+}
+
+/* Synthesize a multiplication of OP by an INTEGER_CST VAL using shifts
+   and simple arithmetic operations to be vectorized.  Record the statements
+   produced in STMT_VINFO and return the last statement in the sequence or
+   NULL if it's not possible to synthesize such a multiplication.
+   This function mirrors the behavior of expand_mult_const in expmed.c but
+   works on tree-ssa form.  */
+
+static gimple *
+vect_synth_mult_by_constant (tree op, tree val,
+			     stmt_vec_info stmt_vinfo)
+{
+  tree itype = TREE_TYPE (op);
+  machine_mode mode = TYPE_MODE (itype);
+  struct algorithm alg;
+  mult_variant variant;
+  if (!tree_fits_shwi_p (val))
+    return NULL;
+
+  HOST_WIDE_INT hwval = tree_to_shwi (val);
+  /* Use MAX_COST here as we don't want to limit the sequence on rtx costs.
+     The vectorizer's benefit analysis will decide whether it's beneficial
+     to do this.  */
+  bool possible = choose_mult_variant (mode, hwval, &alg,
+					&variant, MAX_COST);
+  if (!possible)
+    return NULL;
+
+  bool cast_to_unsigned_p
+    = cast_mult_synth_to_unsigned (&alg, variant, itype, val);
+
+  tree multtype = cast_to_unsigned_p ? unsigned_type_for (itype) : itype;
+
+  if (!target_supports_mult_synth_alg (&alg, variant, multtype))
+    return NULL;
+
+  tree accumulator;
+
+  /* Clear out the sequence of statements so we can populate it below.  */
+  STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo) = NULL;
+  gimple *stmt = NULL;
 
-   Mult with constants that are negative power of two.
-   S2: b_t = a_t * -n
+  if (alg.op[0] == alg_zero)
+    accumulator = build_int_cst (multtype, 0);
+  else if (cast_to_unsigned_p)
+    {
+      accumulator = vect_recog_temp_ssa_var (multtype, NULL);
+      stmt = gimple_build_assign (accumulator, CONVERT_EXPR, op);
+      append_pattern_def_seq (stmt_vinfo, stmt);
+    }
+  else
+    accumulator = op;
+
+  bool needs_fixup = (variant == negate_variant)
+		      || (variant == add_variant);
+
+  for (int i = 1; i < alg.ops; i++)
+    {
+      tree shft_log = build_int_cst (multtype, alg.log[i]);
+      tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
+      tree tmp_var = NULL_TREE;
+
+      /* TODO: Targets that don't support vector shifts could synthesize
+	 them using vector additions.  target_supports_mult_synth_alg would
+	 need to be updated to allow this.  */
+      switch (alg.op[i])
+	{
+	case alg_shift:
+	  stmt = gimple_build_assign (accum_tmp, LSHIFT_EXPR, accumulator,
+				       shft_log);
+	  break;
+	case alg_add_t_m2:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, op,
+						  shft_log, stmt_vinfo);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator,
+				       tmp_var);
+	  break;
+	case alg_sub_t_m2:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, op,
+						  shft_log, stmt_vinfo);
+	  stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, accumulator,
+				       tmp_var);
+	  break;
+	case alg_add_t2_m:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator,
+						  shft_log, stmt_vinfo);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, tmp_var, op);
+	  break;
+	case alg_sub_t2_m:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator,
+						  shft_log, stmt_vinfo);
+	  stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, tmp_var, op);
+	  break;
+	case alg_add_factor:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator,
+						  shft_log, stmt_vinfo);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator,
+				       tmp_var);
+	  break;
+	case alg_sub_factor:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator,
+						  shft_log, stmt_vinfo);
+	  stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, tmp_var,
+				      accumulator);
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      /* We don't want to append the last stmt in the sequence to stmt_vinfo
+	 but rather return it directly.  */
+      if ((i < alg.ops - 1) || needs_fixup || cast_to_unsigned_p)
+	append_pattern_def_seq (stmt_vinfo, stmt);
+      accumulator = accum_tmp;
+    }
+  if (variant == negate_variant)
+    {
+      tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
+      stmt = gimple_build_assign (accum_tmp, NEGATE_EXPR, accumulator);
+      accumulator = accum_tmp;
+      if (cast_to_unsigned_p)
+	append_pattern_def_seq (stmt_vinfo, stmt);
+    }
+  else if (variant == add_variant)
+    {
+      tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
+      stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator, op);
+      accumulator = accum_tmp;
+      if (cast_to_unsigned_p)
+	append_pattern_def_seq (stmt_vinfo, stmt);
+    }
+
+  /* Move back to the signed type if needed.  */
+  if (cast_to_unsigned_p)
+    {
+      tree accum_tmp = vect_recog_temp_ssa_var (itype, NULL);
+      stmt = gimple_build_assign (accum_tmp, CONVERT_EXPR, accumulator);
+    }
+  return stmt;
+}
+
+/* Detect multiplication by a constant and convert it into a sequence of
+   shifts, additions, subtractions and negations.  We reuse the
+   choose_mult_variant algorithms from expmed.c.
 
    Input/Output:
 
    STMTS: Contains a stmt from which the pattern search begins,
-   i.e. the mult stmt.  Convert the mult operation to LSHIFT if
-   constant operand is a power of 2.
-   type a_t, b_t
-   S1': b_t = a_t << log2 (n)
-
-   Convert the mult operation to LSHIFT and followed by a NEGATE
-   if constant operand is a negative power of 2.
-   type a_t, b_t, res_T;
-   S2': b_t = a_t << log2 (n)
-   S3': res_T  = - (b_t)
+   i.e. the mult stmt.
 
  Output:
 
@@ -2164,8 +2413,8 @@ vect_recog_vector_vector_shift_pattern (vec<gimple *> *stmts,
 
   * TYPE_OUT: The type of the output of this pattern.
 
-  * Return value: A new stmt that will be used to replace the multiplication
-    S1 or S2 stmt.  */
+  * Return value: A new stmt that will be used to replace
+    the multiplication.  */
 
 static gimple *
 vect_recog_mult_pattern (vec<gimple *> *stmts,
@@ -2173,11 +2422,8 @@ vect_recog_mult_pattern (vec<gimple *> *stmts,
 {
   gimple *last_stmt = stmts->pop ();
   tree oprnd0, oprnd1, vectype, itype;
-  gimple *pattern_stmt, *def_stmt;
-  optab optab;
+  gimple *pattern_stmt;
   stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
-  int power2_val, power2_neg_val;
-  tree shift;
 
   if (!is_gimple_assign (last_stmt))
     return NULL;
@@ -2201,52 +2447,17 @@ vect_recog_mult_pattern (vec<gimple *> *stmts,
 
   /* If the target can handle vectorized multiplication natively,
      don't attempt to optimize this.  */
-  optab = optab_for_tree_code (MULT_EXPR, vectype, optab_default);
-  if (optab != unknown_optab)
+  optab mul_optab = optab_for_tree_code (MULT_EXPR, vectype, optab_default);
+  if (mul_optab != unknown_optab)
     {
       machine_mode vec_mode = TYPE_MODE (vectype);
-      int icode = (int) optab_handler (optab, vec_mode);
+      int icode = (int) optab_handler (mul_optab, vec_mode);
       if (icode != CODE_FOR_nothing)
-	return NULL;
-    }
-
-  /* If target cannot handle vector left shift then we cannot
-     optimize and bail out.  */
-  optab = optab_for_tree_code (LSHIFT_EXPR, vectype, optab_vector);
-  if (!optab
-      || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
-    return NULL;
-
-  power2_val = wi::exact_log2 (oprnd1);
-  power2_neg_val = wi::exact_log2 (wi::neg (oprnd1));
-
-  /* Handle constant operands that are postive or negative powers of 2.  */
-  if (power2_val != -1)
-    {
-      shift = build_int_cst (itype, power2_val);
-      pattern_stmt
-	= gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-			       LSHIFT_EXPR, oprnd0, shift);
+       return NULL;
     }
-  else if (power2_neg_val != -1)
-    {
-      /* If the target cannot handle vector NEGATE then we cannot
-	 do the optimization.  */
-      optab = optab_for_tree_code (NEGATE_EXPR, vectype, optab_vector);
-      if (!optab
-	  || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
-	return NULL;
 
-      shift = build_int_cst (itype, power2_neg_val);
-      def_stmt
-	= gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-			       LSHIFT_EXPR, oprnd0, shift);
-      new_pattern_def_seq (stmt_vinfo, def_stmt);
-      pattern_stmt
-	 = gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-				NEGATE_EXPR, gimple_assign_lhs (def_stmt));
-    }
-  else
+  pattern_stmt = vect_synth_mult_by_constant (oprnd0, oprnd1, stmt_vinfo);
+  if (!pattern_stmt)
     return NULL;
 
   /* Pattern detected.  */

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
  2016-06-15 13:24 [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant Kyrill Tkachov
@ 2016-06-15 21:53 ` Marc Glisse
  2016-06-16  9:06   ` Kyrill Tkachov
  0 siblings, 1 reply; 18+ messages in thread
From: Marc Glisse @ 2016-06-15 21:53 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches, Richard Biener

On Wed, 15 Jun 2016, Kyrill Tkachov wrote:

> This is a respin of https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html 
> following feedback.
> I've changed the code to cast the operand to an unsigned type before applying 
> the multiplication algorithm
> and cast it back to the signed type at the end.
> Whether to perform the cast is now determined by the function 
> cast_mult_synth_to_unsigned in which I've implemented
> the cases that Marc mentioned in [1]. Please do let me know
> if there are any other cases that need to be handled.

Ah, I never meant those cases as an exhaustive list, I was just looking 
for examples showing that the transformation was unsafe, and those 2 came 
to mind:

- x*15 -> x*16-x the second one can overflow even when the first one 
doesn't.

- x*-2 -> -(x*2) can overflow when the result is INT_MIN (maybe that's 
redundant with the negate_variant check?)

On the other hand, as long as we remain in the 'positive' operations, 
turning x*3 to x<<1+x seems perfectly safe. And even x*30 to (x*16-x)*2 
cannot cause spurious overflows. But I didn't look at the algorithm 
closely enough to characterize the safe cases. Now if you have done it, 
that's good :-) Otherwise, we might want to err on the side of caution.

-- 
Marc Glisse


* Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
  2016-06-15 21:53 ` Marc Glisse
@ 2016-06-16  9:06   ` Kyrill Tkachov
  2016-06-28  8:08     ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Kyrill Tkachov @ 2016-06-16  9:06 UTC (permalink / raw)
  To: Marc Glisse; +Cc: GCC Patches, Richard Biener


On 15/06/16 22:53, Marc Glisse wrote:
> On Wed, 15 Jun 2016, Kyrill Tkachov wrote:
>
>> This is a respin of https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html following feedback.
>> I've changed the code to cast the operand to an unsigned type before applying the multiplication algorithm
>> and cast it back to the signed type at the end.
>> Whether to perform the cast is now determined by the function cast_mult_synth_to_unsigned in which I've implemented
>> the cases that Marc mentioned in [1]. Please do let me know
>> if there are any other cases that need to be handled.
>
> Ah, I never meant those cases as an exhaustive list, I was just looking for examples showing that the transformation was unsafe, and those 2 came to mind:
>
> - x*15 -> x*16-x the second one can overflow even when the first one doesn't.
>
> - x*-2 -> -(x*2) can overflow when the result is INT_MIN (maybe that's redundant with the negate_variant check?)
>
> On the other hand, as long as we remain in the 'positive' operations, turning x*3 to x<<1+x seems perfectly safe. And even x*30 to (x*16-x)*2 cannot cause spurious overflows. But I didn't look at the algorithm closely enough to 
> characterize the safe cases. Now if you have done it, that's good :-) Otherwise, we might want to err on the side of caution.
>

I'll be honest, I didn't give it much thought beyond convincing myself that the two cases you listed are legitimate.
Looking at expand_mult_const in expmed.c can be helpful (where it updates val_so_far for checking purposes) to see
the different algorithm cases. I think the only steps that could cause overflow are alg_sub_t_m2, alg_sub_t2_m and alg_sub_factor or when the final step is negate_variant, which are what you listed (and covered in this patch).

richi is away on PTO for the time being though, so we have some time to convince ourselves :)

Thanks,
Kyrill


* Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
  2016-06-16  9:06   ` Kyrill Tkachov
@ 2016-06-28  8:08     ` Richard Biener
  2016-06-30  9:00       ` Kyrill Tkachov
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2016-06-28  8:08 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Marc Glisse, GCC Patches

On Thu, 16 Jun 2016, Kyrill Tkachov wrote:

> 
> On 15/06/16 22:53, Marc Glisse wrote:
> > On Wed, 15 Jun 2016, Kyrill Tkachov wrote:
> > 
> > > This is a respin of
> > > https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html following
> > > feedback.
> > > I've changed the code to cast the operand to an unsigned type before
> > > applying the multiplication algorithm
> > > and cast it back to the signed type at the end.
> > > Whether to perform the cast is now determined by the function
> > > cast_mult_synth_to_unsigned in which I've implemented
> > > the cases that Marc mentioned in [1]. Please do let me know
> > > if there are any other cases that need to be handled.
> > 
> > Ah, I never meant those cases as an exhaustive list, I was just looking for
> > examples showing that the transformation was unsafe, and those 2 came to
> > mind:
> > 
> > - x*15 -> x*16-x the second one can overflow even when the first one
> > doesn't.
> > 
> > - x*-2 -> -(x*2) can overflow when the result is INT_MIN (maybe that's
> > redundant with the negate_variant check?)
> > 
> > On the other hand, as long as we remain in the 'positive' operations,
> > turning x*3 to x<<1+x seems perfectly safe. And even x*30 to (x*16-x)*2
> > cannot cause spurious overflows. But I didn't look at the algorithm closely
> > enough to characterize the safe cases. Now if you have done it, that's good
> > :-) Otherwise, we might want to err on the side of caution.
> > 
> 
> I'll be honest, I didn't give it much thought beyond convincing myself that
> the two cases you listed are legitimate.
> Looking at expand_mult_const in expmed.c can be helpful (where it updates
> val_so_far for checking purposes) to see
> the different algorithm cases. I think the only steps that could cause
> overflow are alg_sub_t_m2, alg_sub_t2_m and alg_sub_factor or when the final
> step is negate_variant, which are what you listed (and covered in this patch).
> 
> richi is away on PTO for the time being though, so we have some time to
> convince ourselves :)

;)  I think the easiest way would be to always use unsigned arithmetic.

While VRP doesn't do anything on vector operations we still have some
match.pd patterns that rely on correct overflow behavior and those
may be enabled for vector types as well.

Thanks,
Richard.


* Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
  2016-06-28  8:08     ` Richard Biener
@ 2016-06-30  9:00       ` Kyrill Tkachov
  2016-07-01 12:02         ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Kyrill Tkachov @ 2016-06-30  9:00 UTC (permalink / raw)
  To: Richard Biener; +Cc: Marc Glisse, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3135 bytes --]


On 28/06/16 08:54, Richard Biener wrote:
> On Thu, 16 Jun 2016, Kyrill Tkachov wrote:
>
>> On 15/06/16 22:53, Marc Glisse wrote:
>>> On Wed, 15 Jun 2016, Kyrill Tkachov wrote:
>>>
>>>> This is a respin of
>>>> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html following
>>>> feedback.
>>>> I've changed the code to cast the operand to an unsigned type before
>>>> applying the multiplication algorithm
>>>> and cast it back to the signed type at the end.
>>>> Whether to perform the cast is now determined by the function
>>>> cast_mult_synth_to_unsigned in which I've implemented
>>>> the cases that Marc mentioned in [1]. Please do let me know
>>>> if there are any other cases that need to be handled.
>>> Ah, I never meant those cases as an exhaustive list, I was just looking for
>>> examples showing that the transformation was unsafe, and those 2 came to
>>> mind:
>>>
>>> - x*15 -> x*16-x the second one can overflow even when the first one
>>> doesn't.
>>>
>>> - x*-2 -> -(x*2) can overflow when the result is INT_MIN (maybe that's
>>> redundant with the negate_variant check?)
>>>
>>> On the other hand, as long as we remain in the 'positive' operations,
>>> turning x*3 to x<<1+x seems perfectly safe. And even x*30 to (x*16-x)*2
>>> cannot cause spurious overflows. But I didn't look at the algorithm closely
>>> enough to characterize the safe cases. Now if you have done it, that's good
>>> :-) Otherwise, we might want to err on the side of caution.
>>>
>> I'll be honest, I didn't give it much thought beyond convincing myself that
>> the two cases you listed are legitimate.
>> Looking at expand_mult_const in expmed.c can be helpful (where it updates
>> val_so_far for checking purposes) to see
>> the different algorithm cases. I think the only steps that could cause
>> overflow are alg_sub_t_m2, alg_sub_t2_m and alg_sub_factor or when the final
>> step is negate_variant, which are what you listed (and covered in this patch).
>>
>> richi is away on PTO for the time being though, so we have some time to
>> convince ourselves :)
> ;)  I think the easiest way would be to always use unsigned arithmetic.
>
> While VRP doesn't do anything on vector operations we still have some
> match.pd patterns that rely on correct overflow behavior and those
> may be enabled for vector types as well.

That's fine with me.
Here's the patch that performs the casts to unsigned and back when the input
type doesn't wrap on overflow.

Bootstrapped and tested on arm, aarch64, x86_64.

How's this?

Thanks,
Kyrill

2016-06-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/65951
     * tree-vect-patterns.c: Include mult-synthesis.h
     (target_supports_mult_synth_alg): New function.
     (apply_binop_and_append_stmt): Likewise.
     (vect_synth_mult_by_constant): Likewise.
     (target_has_vecop_for_code): Likewise.
     (vect_recog_mult_pattern): Use above functions to synthesize vector
     multiplication by integer constants.

2016-06-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.dg/vect/vect-mult-const-pattern-1.c: New test.
     * gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.


[-- Attachment #2: vect-mult.patch --]
[-- Type: text/x-patch, Size: 15004 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..e5dba82d7fa955a6a37a0eabf980127e464ac77b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 256
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= 123;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  long long data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / 123 != i)
+      __builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect"  { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..83019c96910b866e364a7c2e00261a1ded13cb53
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 256
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= -19594LL;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  long long data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / -19594LL != i)
+      __builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect"  { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target aarch64*-*-* } } } */
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index f0c515daaa22f59e4bcefe1fcf3e44bf29fe9a40..96f8bc5954f2944d6ce7fa2bb51743b516be36dc 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -2131,32 +2131,268 @@ vect_recog_vector_vector_shift_pattern (vec<gimple *> *stmts,
   return pattern_stmt;
 }
 
-/* Detect multiplication by constant which are postive or negatives of power 2,
-   and convert them to shift patterns.
+/* Return true iff the target has a vector optab implementing the operation
+   CODE on type VECTYPE.  */
 
-   Mult with constants that are postive power of two.
-   type a_t;
-   type b_t
-   S1: b_t = a_t * n
+static bool
+target_has_vecop_for_code (tree_code code, tree vectype)
+{
+  optab voptab = optab_for_tree_code (code, vectype, optab_vector);
+  return voptab
+	 && optab_handler (voptab, TYPE_MODE (vectype)) != CODE_FOR_nothing;
+}
 
-   or
+/* Verify that the target has optabs of the vector form of SCALTYPE to perform
+   all the steps needed by the multiplication-by-immediate synthesis algorithm
+   described by ALG and VAR.  Return true iff that is the case.  */
+
+static bool
+target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
+				 tree scaltype)
+{
+  if (alg->op[0] != alg_zero && alg->op[0] != alg_m)
+    return false;
+
+  tree vectype = get_vectype_for_scalar_type (scaltype);
+
+  if (!vectype)
+    return false;
+
+  /* All synthesis algorithms require shifts, so bail out early if
+     target cannot vectorize them.
+     TODO: If the target doesn't support vector shifts but supports vector
+     addition we could synthesize shifts that way.  vect_synth_mult_by_constant
+     would need to be updated to do that.  */
+  if (!vect_supportable_shift (LSHIFT_EXPR, scaltype))
+    return false;
+
+  bool supports_vminus = target_has_vecop_for_code (MINUS_EXPR, vectype);
+  bool supports_vplus = target_has_vecop_for_code (PLUS_EXPR, vectype);
+
+  if (var == negate_variant
+      && !target_has_vecop_for_code (NEGATE_EXPR, vectype))
+    return false;
+
+  if (var == add_variant && !supports_vplus)
+    return false;
+
+  for (int i = 1; i < alg->ops; i++)
+    {
+      switch (alg->op[i])
+	{
+	case alg_shift:
+	  break;
+	case alg_add_t_m2:
+	case alg_add_t2_m:
+	case alg_add_factor:
+	  if (!supports_vplus)
+	    return false;
+	  break;
+	case alg_sub_t_m2:
+	case alg_sub_t2_m:
+	case alg_sub_factor:
+	  if (!supports_vminus)
+	    return false;
+	  break;
+	case alg_unknown:
+	case alg_m:
+	case alg_zero:
+	case alg_impossible:
+	  return false;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+
+  return true;
+}
+
+/* Helper for vect_synth_mult_by_constant.  Apply a binary operation
+   CODE to operands OP1 and OP2, creating a new temporary SSA var in
+   the process if necessary.  Append the resulting assignment statement
+   to the sequence in STMT_VINFO.  Return the SSA variable that holds the
+   result of the binary operation.  */
+
+static tree
+apply_binop_and_append_stmt (tree_code code, tree op1, tree op2,
+			     stmt_vec_info stmt_vinfo)
+{
+  if (TREE_INT_CST_LOW (op2) == 0
+      && (code == LSHIFT_EXPR
+	  || code == PLUS_EXPR))
+    {
+      gcc_assert (TREE_CODE (op1) == SSA_NAME);
+      return op1;
+    }
+
+  tree itype = TREE_TYPE (op2);
+  tree tmp_var = vect_recog_temp_ssa_var (itype, NULL);
+  gimple *stmt = gimple_build_assign (tmp_var, code, op1, op2);
+  append_pattern_def_seq (stmt_vinfo, stmt);
+  return tmp_var;
+}
+
+/* Synthesize a multiplication of OP by an INTEGER_CST VAL using shifts
+   and simple arithmetic operations to be vectorized.  Record the statements
+   produced in STMT_VINFO and return the last statement in the sequence or
+   NULL if it's not possible to synthesize such a multiplication.
+   This function mirrors the behavior of expand_mult_const in expmed.c but
+   works on tree-ssa form.  */
+
+static gimple *
+vect_synth_mult_by_constant (tree op, tree val,
+			     stmt_vec_info stmt_vinfo)
+{
+  tree itype = TREE_TYPE (op);
+  machine_mode mode = TYPE_MODE (itype);
+  struct algorithm alg;
+  mult_variant variant;
+  if (!tree_fits_shwi_p (val))
+    return NULL;
+
+  HOST_WIDE_INT hwval = tree_to_shwi (val);
+  /* Use MAX_COST here as we don't want to limit the sequence on rtx costs.
+     The vectorizer's benefit analysis will decide whether it's beneficial
+     to do this.  */
+  bool possible = choose_mult_variant (mode, hwval, &alg,
+					&variant, MAX_COST);
+  if (!possible)
+    return NULL;
+
+  /* Multiplication synthesis by shifts, adds and subs can introduce
+     signed overflow where the original operation didn't.  Perform the
+     operations on an unsigned type and cast back to avoid this.
+     In the future we may want to relax this for synthesis algorithms
+     that we can prove do not cause unexpected overflow.  */
+  bool cast_to_unsigned_p = !TYPE_OVERFLOW_WRAPS (itype);
+
+  tree multtype = cast_to_unsigned_p ? unsigned_type_for (itype) : itype;
+
+  if (!target_supports_mult_synth_alg (&alg, variant, multtype))
+    return NULL;
 
-   Mult with constants that are negative power of two.
-   S2: b_t = a_t * -n
+  tree accumulator;
+
+  /* Clear out the sequence of statements so we can populate it below.  */
+  STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo) = NULL;
+  gimple *stmt = NULL;
+
+  if (cast_to_unsigned_p)
+    {
+      tree tmp_op = vect_recog_temp_ssa_var (multtype, NULL);
+      stmt = gimple_build_assign (tmp_op, CONVERT_EXPR, op);
+      append_pattern_def_seq (stmt_vinfo, stmt);
+      op = tmp_op;
+    }
+
+  if (alg.op[0] == alg_zero)
+    accumulator = build_int_cst (multtype, 0);
+  else
+    accumulator = op;
+
+  bool needs_fixup = (variant == negate_variant)
+		      || (variant == add_variant);
+
+  for (int i = 1; i < alg.ops; i++)
+    {
+      tree shft_log = build_int_cst (multtype, alg.log[i]);
+      tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
+      tree tmp_var = NULL_TREE;
+
+      /* TODO: Targets that don't support vector shifts could synthesize
+	 them using vector additions.  target_supports_mult_synth_alg would
+	 need to be updated to allow this.  */
+      switch (alg.op[i])
+	{
+	case alg_shift:
+	  stmt = gimple_build_assign (accum_tmp, LSHIFT_EXPR, accumulator,
+				       shft_log);
+	  break;
+	case alg_add_t_m2:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, op,
+						  shft_log, stmt_vinfo);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator,
+				       tmp_var);
+	  break;
+	case alg_sub_t_m2:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, op,
+						  shft_log, stmt_vinfo);
+	  /* In some algorithms the first step involves zeroing the
+	     accumulator.  If subtracting from such an accumulator
+	     just emit the negation directly.  */
+	  if (TREE_CODE (accumulator) == INTEGER_CST
+	      && TREE_INT_CST_LOW (accumulator) == 0)
+	    stmt = gimple_build_assign (accum_tmp, NEGATE_EXPR, tmp_var);
+	  else
+	    stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, accumulator,
+					tmp_var);
+	  break;
+	case alg_add_t2_m:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator,
+						  shft_log, stmt_vinfo);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, tmp_var, op);
+	  break;
+	case alg_sub_t2_m:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator,
+						  shft_log, stmt_vinfo);
+	  stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, tmp_var, op);
+	  break;
+	case alg_add_factor:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator,
+						  shft_log, stmt_vinfo);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator,
+				       tmp_var);
+	  break;
+	case alg_sub_factor:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator,
+						  shft_log, stmt_vinfo);
+	  stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, tmp_var,
+				      accumulator);
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      /* We don't want to append the last stmt in the sequence to stmt_vinfo
+	 but rather return it directly.  */
+
+      if ((i < alg.ops - 1) || needs_fixup || cast_to_unsigned_p)
+	append_pattern_def_seq (stmt_vinfo, stmt);
+      accumulator = accum_tmp;
+    }
+  if (variant == negate_variant)
+    {
+      tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
+      stmt = gimple_build_assign (accum_tmp, NEGATE_EXPR, accumulator);
+      accumulator = accum_tmp;
+      if (cast_to_unsigned_p)
+	append_pattern_def_seq (stmt_vinfo, stmt);
+    }
+  else if (variant == add_variant)
+    {
+      tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
+      stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator, op);
+      accumulator = accum_tmp;
+      if (cast_to_unsigned_p)
+	append_pattern_def_seq (stmt_vinfo, stmt);
+    }
+  /* Move back to a signed type if needed.  */
+  if (cast_to_unsigned_p)
+    {
+      tree accum_tmp = vect_recog_temp_ssa_var (itype, NULL);
+      stmt = gimple_build_assign (accum_tmp, CONVERT_EXPR, accumulator);
+    }
+
+  return stmt;
+}
+
+/* Detect multiplication by constant and convert it into a sequence of
+   shifts, additions, subtractions and negations.  We reuse the
+   choose_mult_variant algorithms from expmed.c.
 
    Input/Output:
 
    STMTS: Contains a stmt from which the pattern search begins,
-   i.e. the mult stmt.  Convert the mult operation to LSHIFT if
-   constant operand is a power of 2.
-   type a_t, b_t
-   S1': b_t = a_t << log2 (n)
-
-   Convert the mult operation to LSHIFT and followed by a NEGATE
-   if constant operand is a negative power of 2.
-   type a_t, b_t, res_T;
-   S2': b_t = a_t << log2 (n)
-   S3': res_T  = - (b_t)
+   i.e. the mult stmt.
 
  Output:
 
@@ -2164,8 +2400,8 @@ vect_recog_vector_vector_shift_pattern (vec<gimple *> *stmts,
 
   * TYPE_OUT: The type of the output of this pattern.
 
-  * Return value: A new stmt that will be used to replace the multiplication
-    S1 or S2 stmt.  */
+  * Return value: A new stmt that will be used to replace
+    the multiplication.  */
 
 static gimple *
 vect_recog_mult_pattern (vec<gimple *> *stmts,
@@ -2173,11 +2409,8 @@ vect_recog_mult_pattern (vec<gimple *> *stmts,
 {
   gimple *last_stmt = stmts->pop ();
   tree oprnd0, oprnd1, vectype, itype;
-  gimple *pattern_stmt, *def_stmt;
-  optab optab;
+  gimple *pattern_stmt;
   stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
-  int power2_val, power2_neg_val;
-  tree shift;
 
   if (!is_gimple_assign (last_stmt))
     return NULL;
@@ -2201,52 +2434,17 @@ vect_recog_mult_pattern (vec<gimple *> *stmts,
 
   /* If the target can handle vectorized multiplication natively,
      don't attempt to optimize this.  */
-  optab = optab_for_tree_code (MULT_EXPR, vectype, optab_default);
-  if (optab != unknown_optab)
+  optab mul_optab = optab_for_tree_code (MULT_EXPR, vectype, optab_default);
+  if (mul_optab != unknown_optab)
     {
       machine_mode vec_mode = TYPE_MODE (vectype);
-      int icode = (int) optab_handler (optab, vec_mode);
+      int icode = (int) optab_handler (mul_optab, vec_mode);
       if (icode != CODE_FOR_nothing)
-	return NULL;
-    }
-
-  /* If target cannot handle vector left shift then we cannot
-     optimize and bail out.  */
-  optab = optab_for_tree_code (LSHIFT_EXPR, vectype, optab_vector);
-  if (!optab
-      || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
-    return NULL;
-
-  power2_val = wi::exact_log2 (oprnd1);
-  power2_neg_val = wi::exact_log2 (wi::neg (oprnd1));
-
-  /* Handle constant operands that are postive or negative powers of 2.  */
-  if (power2_val != -1)
-    {
-      shift = build_int_cst (itype, power2_val);
-      pattern_stmt
-	= gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-			       LSHIFT_EXPR, oprnd0, shift);
+       return NULL;
     }
-  else if (power2_neg_val != -1)
-    {
-      /* If the target cannot handle vector NEGATE then we cannot
-	 do the optimization.  */
-      optab = optab_for_tree_code (NEGATE_EXPR, vectype, optab_vector);
-      if (!optab
-	  || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
-	return NULL;
 
-      shift = build_int_cst (itype, power2_neg_val);
-      def_stmt
-	= gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-			       LSHIFT_EXPR, oprnd0, shift);
-      new_pattern_def_seq (stmt_vinfo, def_stmt);
-      pattern_stmt
-	 = gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-				NEGATE_EXPR, gimple_assign_lhs (def_stmt));
-    }
-  else
+  pattern_stmt = vect_synth_mult_by_constant (oprnd0, oprnd1, stmt_vinfo);
+  if (!pattern_stmt)
     return NULL;
 
   /* Pattern detected.  */

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
  2016-06-30  9:00       ` Kyrill Tkachov
@ 2016-07-01 12:02         ` Richard Biener
  2016-07-05  8:38           ` Kyrill Tkachov
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2016-07-01 12:02 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Marc Glisse, GCC Patches

On Thu, 30 Jun 2016, Kyrill Tkachov wrote:

> 
> On 28/06/16 08:54, Richard Biener wrote:
> > On Thu, 16 Jun 2016, Kyrill Tkachov wrote:
> > 
> > > On 15/06/16 22:53, Marc Glisse wrote:
> > > > On Wed, 15 Jun 2016, Kyrill Tkachov wrote:
> > > > 
> > > > > This is a respin of
> > > > > https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html following
> > > > > feedback.
> > > > > I've changed the code to cast the operand to an unsigned type before
> > > > > applying the multiplication algorithm
> > > > > and cast it back to the signed type at the end.
> > > > > Whether to perform the cast is now determined by the function
> > > > > cast_mult_synth_to_unsigned in which I've implemented
> > > > > the cases that Marc mentioned in [1]. Please do let me know
> > > > > if there are any other cases that need to be handled.
> > > > Ah, I never meant those cases as an exhaustive list, I was just looking
> > > > for
> > > > examples showing that the transformation was unsafe, and those 2 came to
> > > > mind:
> > > > 
> > > > - x*15 -> x*16-x the second one can overflow even when the first one
> > > > doesn't.
> > > > 
> > > > - x*-2 -> -(x*2) can overflow when the result is INT_MIN (maybe that's
> > > > redundant with the negate_variant check?)
> > > > 
> > > > On the other hand, as long as we remain in the 'positive' operations,
> > > > turning x*3 to x<<1+x seems perfectly safe. And even x*30 to (x*16-x)*2
> > > > cannot cause spurious overflows. But I didn't look at the algorithm
> > > > closely
> > > > enough to characterize the safe cases. Now if you have done it, that's
> > > > good
> > > > :-) Otherwise, we might want to err on the side of caution.
> > > > 
> > > I'll be honest, I didn't give it much thought beyond convincing myself
> > > that
> > > the two cases you listed are legitimate.
> > > Looking at expand_mult_const in expmed.c can be helpful (where it updates
> > > val_so_far for checking purposes) to see
> > > the different algorithm cases. I think the only steps that could cause
> > > overflow are alg_sub_t_m2, alg_sub_t2_m and alg_sub_factor or when the
> > > final
> > > step is negate_variant, which are what you listed (and covered in this
> > > patch).
> > > 
> > > richi is away on PTO for the time being though, so we have some time to
> > > convince ourselves :)
> > ;)  I think the easiest way would be to always use unsigned arithmetic.
> > 
> > While VRP doesn't do anything on vector operations we still have some
> > match.pd patterns that rely on correct overflow behavior and those
> > may be enabled for vector types as well.
> 
> That's fine with me.
> Here's the patch that performs the casts to unsigned and back when the input
> type doesn't wrap on overflow.
> 
> Bootstrapped and tested on arm, aarch64, x86_64.
> 
> How's this?

+static bool
+target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
+                                tree scaltype)
+{
...
+  tree vectype = get_vectype_for_scalar_type (scaltype);
+
+  if (!vectype)
+    return false;
+
+  /* All synthesis algorithms require shifts, so bail out early if
+     target cannot vectorize them.
+     TODO: If the target doesn't support vector shifts but supports vector
+     addition we could synthesize shifts that way.  vect_synth_mult_by_constant
+     would need to be updated to do that.  */
+  if (!vect_supportable_shift (LSHIFT_EXPR, scaltype))
+    return false;

I think these tests should be done in the caller before calling
synth_mult (and vectype be passed down accordingly).  Also I wonder
if synth_mult for a * 2 produces a << 1 or a + a - the case of
a * 2 -> a + a was what Marcs patch handled.  Guarding off LSHIFT_EXPR
support that early will make that fail on targets w/o vector shift.

+  bool supports_vminus = target_has_vecop_for_code (MINUS_EXPR, vectype);
+  bool supports_vplus = target_has_vecop_for_code (PLUS_EXPR, vectype);
+
+  if (var == negate_variant
+      && !target_has_vecop_for_code (NEGATE_EXPR, vectype))
+    return false;

I think we don't have any targets that support PLUS but not MINUS
or targets that do not support NEGATE.  OTOH double-checking doesn't matter.

+apply_binop_and_append_stmt (tree_code code, tree op1, tree op2,
+                            stmt_vec_info stmt_vinfo)
+{
+  if (TREE_INT_CST_LOW (op2) == 0

integer_zerop (op2)

+      && (code == LSHIFT_EXPR
+         || code == PLUS_EXPR))

+  tree itype = TREE_TYPE (op2);

I think it's dangerous to use the type of op2 given you synthesize
shifts as well.  Why not use the type of op1?

+      /* TODO: Targets that don't support vector shifts could synthesize
+        them using vector additions.  target_supports_mult_synth_alg would
+        need to be updated to allow this.  */
+      switch (alg.op[i])
+       {

so I suppose one could at least special-case << 1 and always use
PLUS for that.

Otherwise looks ok to me.  There is a PR with a testcase for
the x * 2 issue so please add that if it is fixed or amend the fix if not ;)

Thanks,
Richard.

> Thanks,
> Kyrill
> 
> 2016-06-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     PR target/65951
>     * tree-vect-patterns.c: Include mult-synthesis.h
>     (target_supports_mult_synth_alg): New function.
>     (apply_binop_and_append_stmt): Likewise.
>     (vect_synth_mult_by_constant): Likewise.
>     (target_has_vecop_for_code): Likewise.
>     (vect_recog_mult_pattern): Use above functions to synthesize vector
>     multiplication by integer constants.
> 
> 2016-06-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     * gcc.dg/vect/vect-mult-const-pattern-1.c: New test.
>     * gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
  2016-07-01 12:02         ` Richard Biener
@ 2016-07-05  8:38           ` Kyrill Tkachov
  2016-07-05 10:13             ` Richard Biener
  2016-07-05 10:14             ` Marc Glisse
  0 siblings, 2 replies; 18+ messages in thread
From: Kyrill Tkachov @ 2016-07-05  8:38 UTC (permalink / raw)
  To: Richard Biener; +Cc: Marc Glisse, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 6810 bytes --]


On 01/07/16 13:02, Richard Biener wrote:
> On Thu, 30 Jun 2016, Kyrill Tkachov wrote:
>
>> On 28/06/16 08:54, Richard Biener wrote:
>>> On Thu, 16 Jun 2016, Kyrill Tkachov wrote:
>>>
>>>> On 15/06/16 22:53, Marc Glisse wrote:
>>>>> On Wed, 15 Jun 2016, Kyrill Tkachov wrote:
>>>>>
>>>>>> This is a respin of
>>>>>> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html following
>>>>>> feedback.
>>>>>> I've changed the code to cast the operand to an unsigned type before
>>>>>> applying the multiplication algorithm
>>>>>> and cast it back to the signed type at the end.
>>>>>> Whether to perform the cast is now determined by the function
>>>>>> cast_mult_synth_to_unsigned in which I've implemented
>>>>>> the cases that Marc mentioned in [1]. Please do let me know
>>>>>> if there are any other cases that need to be handled.
>>>>> Ah, I never meant those cases as an exhaustive list, I was just looking
>>>>> for
>>>>> examples showing that the transformation was unsafe, and those 2 came to
>>>>> mind:
>>>>>
>>>>> - x*15 -> x*16-x the second one can overflow even when the first one
>>>>> doesn't.
>>>>>
>>>>> - x*-2 -> -(x*2) can overflow when the result is INT_MIN (maybe that's
>>>>> redundant with the negate_variant check?)
>>>>>
>>>>> On the other hand, as long as we remain in the 'positive' operations,
>>>>> turning x*3 to x<<1+x seems perfectly safe. And even x*30 to (x*16-x)*2
>>>>> cannot cause spurious overflows. But I didn't look at the algorithm
>>>>> closely
>>>>> enough to characterize the safe cases. Now if you have done it, that's
>>>>> good
>>>>> :-) Otherwise, we might want to err on the side of caution.
>>>>>
>>>> I'll be honest, I didn't give it much thought beyond convincing myself
>>>> that
>>>> the two cases you listed are legitimate.
>>>> Looking at expand_mult_const in expmed.c can be helpful (where it updates
>>>> val_so_far for checking purposes) to see
>>>> the different algorithm cases. I think the only steps that could cause
>>>> overflow are alg_sub_t_m2, alg_sub_t2_m and alg_sub_factor or when the
>>>> final
>>>> step is negate_variant, which are what you listed (and covered in this
>>>> patch).
>>>>
>>>> richi is away on PTO for the time being though, so we have some time to
>>>> convince ourselves :)
>>> ;)  I think the easiest way would be to always use unsigned arithmetic.
>>>
>>> While VRP doesn't do anything on vector operations we still have some
>>> match.pd patterns that rely on correct overflow behavior and those
>>> may be enabled for vector types as well.
>> That's fine with me.
>> Here's the patch that performs the casts to unsigned and back when the input
>> type doesn't wrap on overflow.
>>
>> Bootstrapped and tested on arm, aarch64, x86_64.
>>
>> How's this?
> +static bool
> +target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
> +                                tree scaltype)
> +{
> ...
> +  tree vectype = get_vectype_for_scalar_type (scaltype);
> +
> +  if (!vectype)
> +    return false;
> +
> +  /* All synthesis algorithms require shifts, so bail out early if
> +     target cannot vectorize them.
> +     TODO: If the target doesn't support vector shifts but supports vector
> +     addition we could synthesize shifts that way.  vect_synth_mult_by_constant
> +     would need to be updated to do that.  */
> +  if (!vect_supportable_shift (LSHIFT_EXPR, scaltype))
> +    return false;
>
> I think these tests should be done in the caller before calling
> synth_mult (and vectype be passed down accordingly).  Also I wonder
> if synth_mult for a * 2 produces a << 1 or a + a - the case of
> a * 2 -> a + a was what Marcs patch handled.  Guarding off LSHIFT_EXPR
> support that early will make that fail on targets w/o vector shift.

I believe it depends on the relative rtx costs (which of course are not relevant
at the vector gimple level).  Anyway, I've moved the check outside.

I've also added code to synthesise the shifts by additions when vector shift
is not available (the new function is synth_lshift_by_additions).

> +  bool supports_vminus = target_has_vecop_for_code (MINUS_EXPR, vectype);
> +  bool supports_vplus = target_has_vecop_for_code (PLUS_EXPR, vectype);
> +
> +  if (var == negate_variant
> +      && !target_has_vecop_for_code (NEGATE_EXPR, vectype))
> +    return false;
>
> I think we don't have any targets that support PLUS but not MINUS
> or targets that do not support NEGATE.  OTOH double-checking doesn't matter.
>
> +apply_binop_and_append_stmt (tree_code code, tree op1, tree op2,
> +                            stmt_vec_info stmt_vinfo)
> +{
> +  if (TREE_INT_CST_LOW (op2) == 0
>
> integer_zerop (op2)

Ok

> +      && (code == LSHIFT_EXPR
> +         || code == PLUS_EXPR))
>
> +  tree itype = TREE_TYPE (op2);
>
> I think it's dangerous to use the type of op2 given you synthesize
> shifts as well.  Why not use the type of op1?

Yeah, we should be taking the type of op1. Fixed.

> +      /* TODO: Targets that don't support vector shifts could synthesize
> +        them using vector additions.  target_supports_mult_synth_alg would
> +        need to be updated to allow this.  */
> +      switch (alg.op[i])
> +       {
>
> so I suppose one could at least special-case << 1 and always use
> PLUS for that.

As said above, I added code to synthesize all constant shifts using additions
if the target doesn't support vector shifts. Note that on targets that do support
vector shifts we will still generate a shift. I think that's fair enough but if you
really want to special-case this to always generate an addition I suppose I can do that.

> Otherwise looks ok to me.  There is a PR with a testcase for
> the x * 2 issue so please add that if it is fixed or amend the fix if not ;)

Thanks for the review. I've added the * 2 and * 4 case.
As for testing I've bootstrapped and tested the patch on aarch64 and x86_64
with synth_shift_p in vect_synth_mult_by_constant hacked to be always true to
exercise the paths that synthesize the shift by additions. Marc, could you test this
on the sparc targets you care about?

Thanks,
Kyrill

2016-07-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/65951
     * tree-vect-patterns.c: Include mult-synthesis.h.
     (target_supports_mult_synth_alg): New function.
     (synth_lshift_by_additions): Likewise.
     (apply_binop_and_append_stmt): Likewise.
     (vect_synth_mult_by_constant): Likewise.
     (target_has_vecop_for_code): Likewise.
     (vect_recog_mult_pattern): Use above functions to synthesize vector
     multiplication by integer constants.

2016-07-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/65951
     * gcc.dg/vect/vect-mult-const-pattern-1.c: New test.
     * gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.
     * gcc.dg/vect/pr65951.c: Likewise.


[-- Attachment #2: vect-mul-int.patch --]
[-- Type: text/x-patch, Size: 17508 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/pr65951.c b/gcc/testsuite/gcc.dg/vect/pr65951.c
new file mode 100644
index 0000000000000000000000000000000000000000..97ec1661d3f1220158dbc30bc2617e127dde47ae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65951.c
@@ -0,0 +1,63 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 512
+
+/* These multiplications should be vectorizable with additions when
+   no vector shift is available.  */
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= 2;
+}
+
+__attribute__ ((noinline)) void
+foo2 (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= 4;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  long long data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / 2 != i)
+      __builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo2 (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / 4 != i)
+      __builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..e5dba82d7fa955a6a37a0eabf980127e464ac77b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 256
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= 123;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  long long data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / 123 != i)
+      __builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect"  { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..c5beabaa97425cc1e644d37a69eba65036eeaf4a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 256
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= -19594LL;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  long long data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / -19594LL != i)
+      __builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect"  { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target aarch64*-*-* } } } */
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index f0c515daaa22f59e4bcefe1fcf3e44bf29fe9a40..3be1b89d5842955a4ef56f4a0e5e6f47046c5150 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -2131,32 +2131,313 @@ vect_recog_vector_vector_shift_pattern (vec<gimple *> *stmts,
   return pattern_stmt;
 }
 
-/* Detect multiplication by constant which are postive or negatives of power 2,
-   and convert them to shift patterns.
+/* Return true iff the target has a vector optab implementing the operation
+   CODE on type VECTYPE.  */
 
-   Mult with constants that are postive power of two.
-   type a_t;
-   type b_t
-   S1: b_t = a_t * n
+static bool
+target_has_vecop_for_code (tree_code code, tree vectype)
+{
+  optab voptab = optab_for_tree_code (code, vectype, optab_vector);
+  return voptab
+	 && optab_handler (voptab, TYPE_MODE (vectype)) != CODE_FOR_nothing;
+}
 
-   or
+/* Verify that the target has optabs of VECTYPE to perform all the steps
+   needed by the multiplication-by-immediate synthesis algorithm described by
+   ALG and VAR.  If SYNTH_SHIFT_P is true ensure that vector addition is
+   present.  Return true iff the target supports all the steps.  */
+
+static bool
+target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
+				 tree vectype, bool synth_shift_p)
+{
+  if (alg->op[0] != alg_zero && alg->op[0] != alg_m)
+    return false;
+
+  bool supports_vminus = target_has_vecop_for_code (MINUS_EXPR, vectype);
+  bool supports_vplus = target_has_vecop_for_code (PLUS_EXPR, vectype);
+
+  if (var == negate_variant
+      && !target_has_vecop_for_code (NEGATE_EXPR, vectype))
+    return false;
+
+  /* If we must synthesize shifts with additions make sure that vector
+     addition is available.  */
+  if ((var == add_variant || synth_shift_p) && !supports_vplus)
+    return false;
+
+  for (int i = 1; i < alg->ops; i++)
+    {
+      switch (alg->op[i])
+	{
+	case alg_shift:
+	  break;
+	case alg_add_t_m2:
+	case alg_add_t2_m:
+	case alg_add_factor:
+	  if (!supports_vplus)
+	    return false;
+	  break;
+	case alg_sub_t_m2:
+	case alg_sub_t2_m:
+	case alg_sub_factor:
+	  if (!supports_vminus)
+	    return false;
+	  break;
+	case alg_unknown:
+	case alg_m:
+	case alg_zero:
+	case alg_impossible:
+	  return false;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+
+  return true;
+}
+
+/* Synthesize a left shift of OP by AMNT bits using a series of additions and
+   putting the final result in DEST.  Append all statements but the last into
+   VINFO.  Return the last statement.  */
+
+static gimple *
+synth_lshift_by_additions (tree dest, tree op, HOST_WIDE_INT amnt,
+			   stmt_vec_info vinfo)
+{
+  HOST_WIDE_INT i;
+  tree itype = TREE_TYPE (op);
+  tree prev_res = op;
+  gcc_assert (amnt >= 0);
+  for (i = 0; i < amnt; i++)
+    {
+      tree tmp_var = (i < amnt - 1) ? vect_recog_temp_ssa_var (itype, NULL)
+		      : dest;
+      gimple *stmt
+        = gimple_build_assign (tmp_var, PLUS_EXPR, prev_res, prev_res);
+      prev_res = tmp_var;
+      if (i < amnt - 1)
+	append_pattern_def_seq (vinfo, stmt);
+      else
+	return stmt;
+    }
+  gcc_unreachable ();
+  return NULL;
+}
+
+/* Helper for vect_synth_mult_by_constant.  Apply a binary operation
+   CODE to operands OP1 and OP2, creating a new temporary SSA var in
+   the process if necessary.  Append the resulting assignment statements
+   to the sequence in STMT_VINFO.  Return the SSA variable that holds the
+   result of the binary operation.  If SYNTH_SHIFT_P is true synthesize
+   left shifts using additions.  */
+
+static tree
+apply_binop_and_append_stmt (tree_code code, tree op1, tree op2,
+			     stmt_vec_info stmt_vinfo, bool synth_shift_p)
+{
+  if (integer_zerop (op2)
+      && (code == LSHIFT_EXPR
+	  || code == PLUS_EXPR))
+    {
+      gcc_assert (TREE_CODE (op1) == SSA_NAME);
+      return op1;
+    }
+
+  gimple *stmt;
+  tree itype = TREE_TYPE (op1);
+  tree tmp_var = vect_recog_temp_ssa_var (itype, NULL);
+
+  if (code == LSHIFT_EXPR
+      && synth_shift_p)
+    {
+      stmt = synth_lshift_by_additions (tmp_var, op1, TREE_INT_CST_LOW (op2),
+					 stmt_vinfo);
+      append_pattern_def_seq (stmt_vinfo, stmt);
+      return tmp_var;
+    }
+
+  stmt = gimple_build_assign (tmp_var, code, op1, op2);
+  append_pattern_def_seq (stmt_vinfo, stmt);
+  return tmp_var;
+}
+
+/* Synthesize a multiplication of OP by an INTEGER_CST VAL using shifts
+   and simple arithmetic operations to be vectorized.  Record the statements
+   produced in STMT_VINFO and return the last statement in the sequence or
+   NULL if it's not possible to synthesize such a multiplication.
+   This function mirrors the behavior of expand_mult_const in expmed.c but
+   works on tree-ssa form.  */
+
+static gimple *
+vect_synth_mult_by_constant (tree op, tree val,
+			     stmt_vec_info stmt_vinfo)
+{
+  tree itype = TREE_TYPE (op);
+  machine_mode mode = TYPE_MODE (itype);
+  struct algorithm alg;
+  mult_variant variant;
+  if (!tree_fits_shwi_p (val))
+    return NULL;
+
+  /* Multiplication synthesis by shifts, adds and subs can introduce
+     signed overflow where the original operation didn't.  Perform the
+     operations on an unsigned type and cast back to avoid this.
+     In the future we may want to relax this for synthesis algorithms
+     that we can prove do not cause unexpected overflow.  */
+  bool cast_to_unsigned_p = !TYPE_OVERFLOW_WRAPS (itype);
+
+  tree multtype = cast_to_unsigned_p ? unsigned_type_for (itype) : itype;
+
+  /* Targets that don't support vector shifts but support vector additions
+     can synthesize shifts that way.  */
+  bool synth_shift_p = !vect_supportable_shift (LSHIFT_EXPR, multtype);
+
+  HOST_WIDE_INT hwval = tree_to_shwi (val);
+  /* Use MAX_COST here as we don't want to limit the sequence on rtx costs.
+     The vectorizer's benefit analysis will decide whether it's beneficial
+     to do this.  */
+  bool possible = choose_mult_variant (mode, hwval, &alg,
+					&variant, MAX_COST);
+  if (!possible)
+    return NULL;
 
-   Mult with constants that are negative power of two.
-   S2: b_t = a_t * -n
+  tree vectype = get_vectype_for_scalar_type (multtype);
+
+  if (!vectype
+      || !target_supports_mult_synth_alg (&alg, variant,
+					   vectype, synth_shift_p))
+    return NULL;
+
+  tree accumulator;
+
+  /* Clear out the sequence of statements so we can populate it below.  */
+  STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo) = NULL;
+  gimple *stmt = NULL;
+
+  if (cast_to_unsigned_p)
+    {
+      tree tmp_op = vect_recog_temp_ssa_var (multtype, NULL);
+      stmt = gimple_build_assign (tmp_op, CONVERT_EXPR, op);
+      append_pattern_def_seq (stmt_vinfo, stmt);
+      op = tmp_op;
+    }
+
+  if (alg.op[0] == alg_zero)
+    accumulator = build_int_cst (multtype, 0);
+  else
+    accumulator = op;
+
+  bool needs_fixup = (variant == negate_variant)
+		      || (variant == add_variant);
+
+  for (int i = 1; i < alg.ops; i++)
+    {
+      tree shft_log = build_int_cst (multtype, alg.log[i]);
+      tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
+      tree tmp_var = NULL_TREE;
+
+      switch (alg.op[i])
+	{
+	case alg_shift:
+	  if (synth_shift_p)
+	    stmt
+	      = synth_lshift_by_additions (accum_tmp, accumulator, alg.log[i],
+					    stmt_vinfo);
+	  else
+	    stmt = gimple_build_assign (accum_tmp, LSHIFT_EXPR, accumulator,
+					 shft_log);
+	  break;
+	case alg_add_t_m2:
+	  tmp_var
+	    = apply_binop_and_append_stmt (LSHIFT_EXPR, op, shft_log,
+					    stmt_vinfo, synth_shift_p);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator,
+				       tmp_var);
+	  break;
+	case alg_sub_t_m2:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, op,
+						  shft_log, stmt_vinfo,
+						  synth_shift_p);
+	  /* In some algorithms the first step involves zeroing the
+	     accumulator.  If subtracting from such an accumulator
+	     just emit the negation directly.  */
+	  if (integer_zerop (accumulator))
+	    stmt = gimple_build_assign (accum_tmp, NEGATE_EXPR, tmp_var);
+	  else
+	    stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, accumulator,
+					tmp_var);
+	  break;
+	case alg_add_t2_m:
+	  tmp_var
+	    = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator, shft_log,
+					   stmt_vinfo, synth_shift_p);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, tmp_var, op);
+	  break;
+	case alg_sub_t2_m:
+	  tmp_var
+	    = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator, shft_log,
+					   stmt_vinfo, synth_shift_p);
+	  stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, tmp_var, op);
+	  break;
+	case alg_add_factor:
+	  tmp_var
+	    = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator, shft_log,
+					    stmt_vinfo, synth_shift_p);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator,
+				       tmp_var);
+	  break;
+	case alg_sub_factor:
+	  tmp_var
+	    = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator, shft_log,
+					   stmt_vinfo, synth_shift_p);
+	  stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, tmp_var,
+				      accumulator);
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      /* We don't want to append the last stmt in the sequence to stmt_vinfo
+	 but rather return it directly.  */
+
+      if ((i < alg.ops - 1) || needs_fixup || cast_to_unsigned_p)
+	append_pattern_def_seq (stmt_vinfo, stmt);
+      accumulator = accum_tmp;
+    }
+  if (variant == negate_variant)
+    {
+      tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
+      stmt = gimple_build_assign (accum_tmp, NEGATE_EXPR, accumulator);
+      accumulator = accum_tmp;
+      if (cast_to_unsigned_p)
+	append_pattern_def_seq (stmt_vinfo, stmt);
+    }
+  else if (variant == add_variant)
+    {
+      tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
+      stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator, op);
+      accumulator = accum_tmp;
+      if (cast_to_unsigned_p)
+	append_pattern_def_seq (stmt_vinfo, stmt);
+    }
+  /* Move back to the signed type if needed.  */
+  if (cast_to_unsigned_p)
+    {
+      tree accum_tmp = vect_recog_temp_ssa_var (itype, NULL);
+      stmt = gimple_build_assign (accum_tmp, CONVERT_EXPR, accumulator);
+    }
+
+  return stmt;
+}
+
+/* Detect multiplication by a constant and convert it into a sequence of
+   shifts, additions, subtractions and negations.  We reuse the
+   choose_mult_variant algorithms from expmed.c.
 
    Input/Output:
 
    STMTS: Contains a stmt from which the pattern search begins,
-   i.e. the mult stmt.  Convert the mult operation to LSHIFT if
-   constant operand is a power of 2.
-   type a_t, b_t
-   S1': b_t = a_t << log2 (n)
-
-   Convert the mult operation to LSHIFT and followed by a NEGATE
-   if constant operand is a negative power of 2.
-   type a_t, b_t, res_T;
-   S2': b_t = a_t << log2 (n)
-   S3': res_T  = - (b_t)
+   i.e. the mult stmt.
 
  Output:
 
@@ -2164,8 +2445,8 @@ vect_recog_vector_vector_shift_pattern (vec<gimple *> *stmts,
 
   * TYPE_OUT: The type of the output of this pattern.
 
-  * Return value: A new stmt that will be used to replace the multiplication
-    S1 or S2 stmt.  */
+  * Return value: A new stmt that will be used to replace
+    the multiplication.  */
 
 static gimple *
 vect_recog_mult_pattern (vec<gimple *> *stmts,
@@ -2173,11 +2454,8 @@ vect_recog_mult_pattern (vec<gimple *> *stmts,
 {
   gimple *last_stmt = stmts->pop ();
   tree oprnd0, oprnd1, vectype, itype;
-  gimple *pattern_stmt, *def_stmt;
-  optab optab;
+  gimple *pattern_stmt;
   stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
-  int power2_val, power2_neg_val;
-  tree shift;
 
   if (!is_gimple_assign (last_stmt))
     return NULL;
@@ -2201,52 +2479,17 @@ vect_recog_mult_pattern (vec<gimple *> *stmts,
 
   /* If the target can handle vectorized multiplication natively,
      don't attempt to optimize this.  */
-  optab = optab_for_tree_code (MULT_EXPR, vectype, optab_default);
-  if (optab != unknown_optab)
+  optab mul_optab = optab_for_tree_code (MULT_EXPR, vectype, optab_default);
+  if (mul_optab != unknown_optab)
     {
       machine_mode vec_mode = TYPE_MODE (vectype);
-      int icode = (int) optab_handler (optab, vec_mode);
+      int icode = (int) optab_handler (mul_optab, vec_mode);
       if (icode != CODE_FOR_nothing)
-	return NULL;
+	return NULL;
     }
 
-  /* If target cannot handle vector left shift then we cannot
-     optimize and bail out.  */
-  optab = optab_for_tree_code (LSHIFT_EXPR, vectype, optab_vector);
-  if (!optab
-      || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
-    return NULL;
-
-  power2_val = wi::exact_log2 (oprnd1);
-  power2_neg_val = wi::exact_log2 (wi::neg (oprnd1));
-
-  /* Handle constant operands that are postive or negative powers of 2.  */
-  if (power2_val != -1)
-    {
-      shift = build_int_cst (itype, power2_val);
-      pattern_stmt
-	= gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-			       LSHIFT_EXPR, oprnd0, shift);
-    }
-  else if (power2_neg_val != -1)
-    {
-      /* If the target cannot handle vector NEGATE then we cannot
-	 do the optimization.  */
-      optab = optab_for_tree_code (NEGATE_EXPR, vectype, optab_vector);
-      if (!optab
-	  || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
-	return NULL;
-
-      shift = build_int_cst (itype, power2_neg_val);
-      def_stmt
-	= gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-			       LSHIFT_EXPR, oprnd0, shift);
-      new_pattern_def_seq (stmt_vinfo, def_stmt);
-      pattern_stmt
-	 = gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-				NEGATE_EXPR, gimple_assign_lhs (def_stmt));
-    }
-  else
+  pattern_stmt = vect_synth_mult_by_constant (oprnd0, oprnd1, stmt_vinfo);
+  if (!pattern_stmt)
     return NULL;
 
   /* Pattern detected.  */

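The cast-to-unsigned logic in vect_synth_mult_by_constant can be illustrated with a small standalone sketch (the function name and constants below are illustrative, not part of the patch): multiplying by 15 never overflows when the product fits, but the synthesized form computes x * 16 as an intermediate, which can overflow in signed arithmetic. Doing the steps on an unsigned, wrapping type and converting back gives the same bit pattern without undefined behavior.

```c
/* Illustrative sketch only -- not the patch's GIMPLE-building code.
   Synthesizing x * 15 as (x << 4) - x computes x * 16 first, which can
   overflow a signed 32-bit int even when x * 15 would not.  Performing
   the shift and subtraction on an unsigned (wrapping) value and casting
   back yields the correct product, mirroring what the patch does.  */
int
mult_by_15 (int x)
{
  unsigned int ux = (unsigned int) x;
  return (int) ((ux << 4) - ux);
}
```

The same pattern applies to the other subtractive steps (alg_sub_t_m2, alg_sub_t2_m, alg_sub_factor) and to negate_variant, which is why the patch casts unconditionally for non-wrapping types.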
^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
  2016-07-05  8:38           ` Kyrill Tkachov
@ 2016-07-05 10:13             ` Richard Biener
  2016-07-05 10:14             ` Marc Glisse
  1 sibling, 0 replies; 18+ messages in thread
From: Richard Biener @ 2016-07-05 10:13 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Marc Glisse, GCC Patches

On Tue, 5 Jul 2016, Kyrill Tkachov wrote:

> 
> On 01/07/16 13:02, Richard Biener wrote:
> > On Thu, 30 Jun 2016, Kyrill Tkachov wrote:
> > 
> > > On 28/06/16 08:54, Richard Biener wrote:
> > > > On Thu, 16 Jun 2016, Kyrill Tkachov wrote:
> > > > 
> > > > > On 15/06/16 22:53, Marc Glisse wrote:
> > > > > > On Wed, 15 Jun 2016, Kyrill Tkachov wrote:
> > > > > > 
> > > > > > > This is a respin of
> > > > > > > https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html following
> > > > > > > feedback.
> > > > > > > I've changed the code to cast the operand to an unsigned type
> > > > > > > before
> > > > > > > applying the multiplication algorithm
> > > > > > > and cast it back to the signed type at the end.
> > > > > > > Whether to perform the cast is now determined by the function
> > > > > > > cast_mult_synth_to_unsigned in which I've implemented
> > > > > > > the cases that Marc mentioned in [1]. Please do let me know
> > > > > > > if there are any other cases that need to be handled.
> > > > > > Ah, I never meant those cases as an exhaustive list, I was just
> > > > > > looking
> > > > > > for
> > > > > > examples showing that the transformation was unsafe, and those 2
> > > > > > came to
> > > > > > mind:
> > > > > > 
> > > > > > - x*15 -> x*16-x the second one can overflow even when the first one
> > > > > > doesn't.
> > > > > > 
> > > > > > - x*-2 -> -(x*2) can overflow when the result is INT_MIN (maybe
> > > > > > that's
> > > > > > redundant with the negate_variant check?)
> > > > > > 
> > > > > > On the other hand, as long as we remain in the 'positive'
> > > > > > operations,
> > > > > > turning x*3 to x<<1+x seems perfectly safe. And even x*30 to
> > > > > > (x*16-x)*2
> > > > > > cannot cause spurious overflows. But I didn't look at the algorithm
> > > > > > closely
> > > > > > enough to characterize the safe cases. Now if you have done it,
> > > > > > that's
> > > > > > good
> > > > > > :-) Otherwise, we might want to err on the side of caution.
> > > > > > 
> > > > > I'll be honest, I didn't give it much thought beyond convincing myself
> > > > > that
> > > > > the two cases you listed are legitimate.
> > > > > Looking at expand_mult_const in expmed.c can be helpful (where it
> > > > > updates
> > > > > val_so_far for checking purposes) to see
> > > > > the different algorithm cases. I think the only steps that could cause
> > > > > overflow are alg_sub_t_m2, alg_sub_t2_m and alg_sub_factor or when the
> > > > > final
> > > > > step is negate_variant, which are what you listed (and covered in this
> > > > > patch).
> > > > > 
> > > > > richi is away on PTO for the time being though, so we have some time
> > > > > to
> > > > > convince ourselves :)
> > > > ;)  I think the easiest way would be to always use unsigned arithmetic.
> > > > 
> > > > While VRP doesn't do anything on vector operations we still have some
> > > > match.pd patterns that rely on correct overflow behavior and those
> > > > may be enabled for vector types as well.
> > > That's fine with me.
> > > Here's the patch that performs the casts to unsigned and back when the
> > > input
> > > type doesn't wrap on overflow.
> > > 
> > > Bootstrapped and tested on arm, aarch64, x86_64.
> > > 
> > > How's this?
> > +static bool
> > +target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
> > +                                tree scaltype)
> > +{
> > ...
> > +  tree vectype = get_vectype_for_scalar_type (scaltype);
> > +
> > +  if (!vectype)
> > +    return false;
> > +
> > +  /* All synthesis algorithms require shifts, so bail out early if
> > +     target cannot vectorize them.
> > +     TODO: If the target doesn't support vector shifts but supports
> > vector
> > +     addition we could synthesize shifts that way.
> > vect_synth_mult_by_constant
> > +     would need to be updated to do that.  */
> > +  if (!vect_supportable_shift (LSHIFT_EXPR, scaltype))
> > +    return false;
> > 
> > I think these tests should be done in the caller before calling
> > synth_mult (and vectype be passed down accordingly).  Also I wonder
> > if synth_mult for a * 2 produces a << 1 or a + a - the case of
> > a * 2 -> a + a was what Marcs patch handled.  Guarding off LSHIFT_EXPR
> > support that early will make that fail on targets w/o vector shift.
> 
> I believe it depends on the relative rtx costs (which of course are not
> relevant
> at vector gimple level). Anyway, I've moved the check outside.
> 
> I've also added code to synthesise the shifts by additions when vector shift
> is not available (the new function is synth_lshift_by_additions).
> 
> > +  bool supports_vminus = target_has_vecop_for_code (MINUS_EXPR, vectype);
> > +  bool supports_vplus = target_has_vecop_for_code (PLUS_EXPR, vectype);
> > +
> > +  if (var == negate_variant
> > +      && !target_has_vecop_for_code (NEGATE_EXPR, vectype))
> > +    return false;
> > 
> > I think we don't have any targets that support PLUS but not MINUS
> > or targets that do not support NEGATE.  OTOH double-checking doesn't
> > matter.
> > 
> > +apply_binop_and_append_stmt (tree_code code, tree op1, tree op2,
> > +                            stmt_vec_info stmt_vinfo)
> > +{
> > +  if (TREE_INT_CST_LOW (op2) == 0
> > 
> > integer_zerop (op2)
> 
> Ok
> 
> > +      && (code == LSHIFT_EXPR
> > +         || code == PLUS_EXPR))
> > 
> > +  tree itype = TREE_TYPE (op2);
> > 
> > I think it's dangerous to use the type of op2 given you synthesize
> > shifts as well.  Why not use the type of op1?
> 
> Yeah, we should be taking the type of op1. Fixed.
> 
> > +      /* TODO: Targets that don't support vector shifts could synthesize
> > +        them using vector additions.  target_supports_mult_synth_alg
> > would
> > +        need to be updated to allow this.  */
> > +      switch (alg.op[i])
> > +       {
> > 
> > so I suppose one could at least special-case << 1 and always use
> > PLUS for that.
> 
> As said above, I added code to synthesize all constant shifts using additions
> if the target doesn't support vector shifts. Note that on targets that do
> support
> vector shifts we will still generate a shift. I think that's fair enough but
> if you
> really want to special-case this to always generate an addition I suppose I
> can do that.
> 
> > Otherwise looks ok to me.  There is a PR with a testcase for
> > the x * 2 issue so please add that if it is fixed or amend the fix if not
> > ;)
> 
> Thanks for the review. I've added the * 2 and * 4 case.
> As for testing I've bootstrapped and tested the patch on aarch64 and x86_64
> with synth_shift_p in vect_synth_mult_by_constant hacked to be always true to
> exercise the paths that synthesize the shift by additions. Marc, could you
> test this
> on the sparc targets you care about?

Looks good.

Thanks,
Richard.

> Thanks,
> Kyrill
> 
> 2016-07-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     PR target/65951
>     * tree-vect-patterns.c: Include mult-synthesis.h.
>     (target_supports_mult_synth_alg): New function.
>     (synth_lshift_by_additions): Likewise.
>     (apply_binop_and_append_stmt): Likewise.
>     (vect_synth_mult_by_constant): Likewise.
>     (target_has_vecop_for_code): Likewise.
>     (vect_recog_mult_pattern): Use above functions to synthesize vector
>     multiplication by integer constants.
> 
> 2016-07-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     PR target/65951
>     * gcc.dg/vect/vect-mult-const-pattern-1.c: New test.
>     * gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.
>     * gcc.dg/vect/pr65951.c: Likewise.
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


* Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
  2016-07-05  8:38           ` Kyrill Tkachov
  2016-07-05 10:13             ` Richard Biener
@ 2016-07-05 10:14             ` Marc Glisse
  2016-07-05 11:24               ` Rainer Orth
  1 sibling, 1 reply; 18+ messages in thread
From: Marc Glisse @ 2016-07-05 10:14 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Richard Biener, GCC Patches, Rainer Orth

On Tue, 5 Jul 2016, Kyrill Tkachov wrote:

> As for testing I've bootstrapped and tested the patch on aarch64 and 
> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be 
> always true to exercise the paths that synthesize the shift by 
> additions. Marc, could you test this on the sparc targets you care 
> about?

I don't have access to Sparc, you want Rainer here (added in Cc:).

(thanks for completing the patch!)

-- 
Marc Glisse


* Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
  2016-07-05 10:14             ` Marc Glisse
@ 2016-07-05 11:24               ` Rainer Orth
  2016-07-05 11:31                 ` Kyrill Tkachov
  0 siblings, 1 reply; 18+ messages in thread
From: Rainer Orth @ 2016-07-05 11:24 UTC (permalink / raw)
  To: Marc Glisse; +Cc: Kyrill Tkachov, gcc-patches, Richard Biener

Marc Glisse <marc.glisse@inria.fr> writes:

> On Tue, 5 Jul 2016, Kyrill Tkachov wrote:
>
>> As for testing I've bootstrapped and tested the patch on aarch64 and
>> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
>> always true to exercise the paths that synthesize the shift by
>> additions. Marc, could you test this on the sparc targets you care about?
>
> I don't have access to Sparc, you want Rainer here (added in Cc:).

As is, the patch doesn't even build on current mainline:

/vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error: 'mult_variant' has not been declared
 target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
                                                        ^

enum mult_variant is only declared in expmed.c.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University


* Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
  2016-07-05 11:24               ` Rainer Orth
@ 2016-07-05 11:31                 ` Kyrill Tkachov
  2016-07-06 12:31                   ` Rainer Orth
  0 siblings, 1 reply; 18+ messages in thread
From: Kyrill Tkachov @ 2016-07-05 11:31 UTC (permalink / raw)
  To: Rainer Orth, Marc Glisse; +Cc: gcc-patches, Richard Biener


On 05/07/16 12:24, Rainer Orth wrote:
> Marc Glisse <marc.glisse@inria.fr> writes:
>
>> On Tue, 5 Jul 2016, Kyrill Tkachov wrote:
>>
>>> As for testing I've bootstrapped and tested the patch on aarch64 and
>>> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
>>> always true to exercise the paths that synthesize the shift by
>>> additions. Marc, could you test this on the sparc targets you care about?
>> I don't have access to Sparc, you want Rainer here (added in Cc:).
> As is, the patch doesn't even build on current mainline:
>
> /vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error: 'mult_variant' has not been declared
>   target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
>                                                          ^
>
> enum mult_variant is only declared in expmed.c.

Ah, sorry I forgot to mention that this is patch 2/2.
It requires https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html which is already approved
but I was planning to commit it together with this one.
Can you please try applying https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
as well as this?

Thanks,
Kyrill

> 	Rainer
>


* Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
  2016-07-05 11:31                 ` Kyrill Tkachov
@ 2016-07-06 12:31                   ` Rainer Orth
  2016-07-06 12:41                     ` Kyrill Tkachov
  0 siblings, 1 reply; 18+ messages in thread
From: Rainer Orth @ 2016-07-06 12:31 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Marc Glisse, gcc-patches, Richard Biener

Hi Kyrill,

> On 05/07/16 12:24, Rainer Orth wrote:
>> Marc Glisse <marc.glisse@inria.fr> writes:
>>
>>> On Tue, 5 Jul 2016, Kyrill Tkachov wrote:
>>>
>>>> As for testing I've bootstrapped and tested the patch on aarch64 and
>>>> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
>>>> always true to exercise the paths that synthesize the shift by
>>>> additions. Marc, could you test this on the sparc targets you care about?
>>> I don't have access to Sparc, you want Rainer here (added in Cc:).
>> As is, the patch doesn't even build on current mainline:
>>
>> /vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error: 'mult_variant' has not been declared
>>   target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
>>                                                          ^
>>
>> enum mult_variant is only declared in expmed.c.
>
> Ah, sorry I forgot to mention that this is patch 2/2.
> It requires https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html which
> is already approved
> but I was planning to commit it together with this one.
> Can you please try applying
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
> as well as this?

sure, that did the trick.  A sparc-sun-solaris2.12 bootstrap revealed
that the patch fixes PR tree-optimization/70923 (you should mention that
in the ChangeLog or close it as a duplicate), with the same caveat as
about Marc's latest patch for that:

+FAIL: gcc.dg/vect/vect-iv-9.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-iv-9.c scan-tree-dump-times vect "vectorized 1 loops" 1

The message appears twice, not once, on sparc, so the testcase should be
updated to accommodate that.

Besides, the new testcase FAILs:

+FAIL: gcc.dg/vect/pr65951.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 2
+FAIL: gcc.dg/vect/pr65951.c scan-tree-dump-times vect "vectorized 1 loops" 2

The dump contains

gcc.dg/vect/pr65951.c:14:3: note: not vectorized: no vectype for stmt: _4 = *_3;
gcc.dg/vect/pr65951.c:12:1: note: vectorized 0 loops in function.
gcc.dg/vect/pr65951.c:21:3: note: not vectorized: no vectype for stmt: _4 = *_3;
gcc.dg/vect/pr65951.c:19:1: note: vectorized 0 loops in function.
gcc.dg/vect/pr65951.c:55:15: note: not vectorized: control flow in loop.
gcc.dg/vect/pr65951.c:46:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
gcc.dg/vect/pr65951.c:41:15: note: not vectorized: control flow in loop.
gcc.dg/vect/pr65951.c:32:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
gcc.dg/vect/pr65951.c:26:1: note: vectorized 0 loops in function.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University


* Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
  2016-07-06 12:31                   ` Rainer Orth
@ 2016-07-06 12:41                     ` Kyrill Tkachov
  2016-07-07 16:16                       ` Kyrill Tkachov
  0 siblings, 1 reply; 18+ messages in thread
From: Kyrill Tkachov @ 2016-07-06 12:41 UTC (permalink / raw)
  To: Rainer Orth; +Cc: Marc Glisse, gcc-patches, Richard Biener


On 06/07/16 13:31, Rainer Orth wrote:
> Hi Kyrill,
>
>> On 05/07/16 12:24, Rainer Orth wrote:
>>> Marc Glisse <marc.glisse@inria.fr> writes:
>>>
>>>> On Tue, 5 Jul 2016, Kyrill Tkachov wrote:
>>>>
>>>>> As for testing I've bootstrapped and tested the patch on aarch64 and
>>>>> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
>>>>> always true to exercise the paths that synthesize the shift by
>>>>> additions. Marc, could you test this on the sparc targets you care about?
>>>> I don't have access to Sparc, you want Rainer here (added in Cc:).
>>> As is, the patch doesn't even build on current mainline:
>>>
>>> /vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error: 'mult_variant' has not been declared
>>>    target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
>>>                                                           ^
>>>
>>> enum mult_variant is only declared in expmed.c.
>> Ah, sorry I forgot to mention that this is patch 2/2.
>> It requires https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html which
>> is already approved
>> but I was planning to commit it together with this one.
>> Can you please try applying
>> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
>> as well as this?
> sure, that did the trick.  A sparc-sun-solaris2.12 bootstrap revealed
> that the patch fixes PR tree-optimization/70923 (you should mention that
> in the ChangeLog or close it as a duplicate), with the same caveat as
> about Marc's latest patch for that:

Thanks! Much appreciated.

> +FAIL: gcc.dg/vect/vect-iv-9.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 1
> +FAIL: gcc.dg/vect/vect-iv-9.c scan-tree-dump-times vect "vectorized 1 loops" 1
>
> The message appears twice, not once, on sparc, so the testcase should be
> updated to accommodate that.

Ok.

> Besides, the new testcase FAILs:
>
> +FAIL: gcc.dg/vect/pr65951.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 2
> +FAIL: gcc.dg/vect/pr65951.c scan-tree-dump-times vect "vectorized 1 loops" 2
>
> The dump contains
>
> gcc.dg/vect/pr65951.c:14:3: note: not vectorized: no vectype for stmt: _4 = *_3;
> gcc.dg/vect/pr65951.c:12:1: note: vectorized 0 loops in function.
> gcc.dg/vect/pr65951.c:21:3: note: not vectorized: no vectype for stmt: _4 = *_3;
> gcc.dg/vect/pr65951.c:19:1: note: vectorized 0 loops in function.
> gcc.dg/vect/pr65951.c:55:15: note: not vectorized: control flow in loop.
> gcc.dg/vect/pr65951.c:46:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
> gcc.dg/vect/pr65951.c:41:15: note: not vectorized: control flow in loop.
> gcc.dg/vect/pr65951.c:32:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
> gcc.dg/vect/pr65951.c:26:1: note: vectorized 0 loops in function.

I see. I suppose SPARC doesn't have vector shifts operating on 64-bit data?
I believe the testcase should be updated to use just "int" arrays rather than "long long".
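A minimal sketch of the kind of int-based loop this change implies (names and the constant are illustrative; the actual pr65951.c contents may differ):

```c
/* Illustrative shape of an int-based multiply-by-constant loop that
   the vectorizer can handle via synthesized shifts and additions on
   targets without 64-bit vector shifts.  Not the committed testcase.  */
#define N 512
void
mult_loop (int *a, int *b)
{
  for (int i = 0; i < N; i++)
    a[i] = b[i] * 123;
}
```

With int elements the vectype exists on SPARC/VIS, so the "no vectype for stmt" failures above should disappear.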

I'll respin the testcases
Thanks again,
Kyrill

> 	Rainer
>


* Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
  2016-07-06 12:41                     ` Kyrill Tkachov
@ 2016-07-07 16:16                       ` Kyrill Tkachov
  2016-07-14  8:22                         ` Kyrill Tkachov
  0 siblings, 1 reply; 18+ messages in thread
From: Kyrill Tkachov @ 2016-07-07 16:16 UTC (permalink / raw)
  To: Rainer Orth; +Cc: Marc Glisse, gcc-patches, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 4380 bytes --]


On 06/07/16 13:40, Kyrill Tkachov wrote:
>
> On 06/07/16 13:31, Rainer Orth wrote:
>> Hi Kyrill,
>>
>>> On 05/07/16 12:24, Rainer Orth wrote:
>>>> Marc Glisse <marc.glisse@inria.fr> writes:
>>>>
>>>>> On Tue, 5 Jul 2016, Kyrill Tkachov wrote:
>>>>>
>>>>>> As for testing I've bootstrapped and tested the patch on aarch64 and
>>>>>> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
>>>>>> always true to exercise the paths that synthesize the shift by
>>>>>> additions. Marc, could you test this on the sparc targets you care about?
>>>>> I don't have access to Sparc, you want Rainer here (added in Cc:).
>>>> As is, the patch doesn't even build on current mainline:
>>>>
>>>> /vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error: 'mult_variant' has not been declared
>>>>    target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
>>>>                                                           ^
>>>>
>>>> enum mult_variant is only declared in expmed.c.
>>> Ah, sorry I forgot to mention that this is patch 2/2.
>>> It requires https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html which
>>> is already approved
>>> but I was planning to commit it together with this one.
>>> Can you please try applying
>>> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
>>> as well as this?
>> sure, that did the trick.  A sparc-sun-solaris2.12 bootstrap revealed
>> that the patch fixes PR tree-optimization/70923 (you should mention that
>> in the ChangeLog or close it as a duplicate), with the same caveat as
>> about Marc's latest patch for that:
>
> Thanks! Much appreciated.
>
>> +FAIL: gcc.dg/vect/vect-iv-9.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 1
>> +FAIL: gcc.dg/vect/vect-iv-9.c scan-tree-dump-times vect "vectorized 1 loops" 1
>>
>> The message appears twice, not once, on sparc, so the testcase should be
>> updated to accommodate that.
>
> Ok.
>
>> Besides, the new testcase FAILs:
>>
>> +FAIL: gcc.dg/vect/pr65951.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 2
>> +FAIL: gcc.dg/vect/pr65951.c scan-tree-dump-times vect "vectorized 1 loops" 2
>>
>> The dump contains
>>
>> gcc.dg/vect/pr65951.c:14:3: note: not vectorized: no vectype for stmt: _4 = *_3;
>> gcc.dg/vect/pr65951.c:12:1: note: vectorized 0 loops in function.
>> gcc.dg/vect/pr65951.c:21:3: note: not vectorized: no vectype for stmt: _4 = *_3;
>> gcc.dg/vect/pr65951.c:19:1: note: vectorized 0 loops in function.
>> gcc.dg/vect/pr65951.c:55:15: note: not vectorized: control flow in loop.
>> gcc.dg/vect/pr65951.c:46:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
>> gcc.dg/vect/pr65951.c:41:15: note: not vectorized: control flow in loop.
>> gcc.dg/vect/pr65951.c:32:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
>> gcc.dg/vect/pr65951.c:26:1: note: vectorized 0 loops in function.
>
> I see. I suppose SPARC doesn't have vector shifts operating on 64-bit data?
> I believe the testcase should be updated to use just "int" arrays rather than "long long".
>
> I'll respin the testcases

Ok, here's the patch with pr65951.c updated to use int rather than long long arrays, and vect-iv-9.c updated to scan for the
"vectorized 1 loops" string twice. I've verified this by hand: building a sparc-sun-solaris2.10 cc1 and compiling with
-mcpu=ultrasparc -mvis gives the right strings in the vectoriser dump.

So is this version ok?

Thanks,
Kyrill


2016-07-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/65951
     PR tree-optimization/70923
     * tree-vect-patterns.c: Include mult-synthesis.h.
     (target_supports_mult_synth_alg): New function.
     (synth_lshift_by_additions): Likewise.
     (apply_binop_and_append_stmt): Likewise.
     (vect_synth_mult_by_constant): Likewise.
     (target_has_vecop_for_code): Likewise.
     (vect_recog_mult_pattern): Use above functions to synthesize vector
     multiplication by integer constants.

2016-07-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/65951
     PR tree-optimization/70923
     * gcc.dg/vect/vect-mult-const-pattern-1.c: New test.
     * gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.
     * gcc.dg/vect/pr65951.c: Likewise.
     * gcc.dg/vect/vect-iv-9.c: Remove vect_int_mult-specific scans;
     expect 2 vectorized loops unconditionally.


[-- Attachment #2: vect-mult.patch --]
[-- Type: text/x-patch, Size: 18098 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/pr65951.c b/gcc/testsuite/gcc.dg/vect/pr65951.c
new file mode 100644
index 0000000000000000000000000000000000000000..cfd32373181fc4fc1e37677ccec6505321a81b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65951.c
@@ -0,0 +1,63 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 512
+
+/* These multiplications should be vectorizable with additions when
+   no vector shift is available.  */
+
+__attribute__ ((noinline)) void
+foo (int *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= 2;
+}
+
+__attribute__ ((noinline)) void
+foo2 (int *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= 4;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  int data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / 2 != i)
+      __builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo2 (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / 4 != i)
+      __builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-9.c b/gcc/testsuite/gcc.dg/vect/vect-iv-9.c
index 99a7e18965ef5c0746272864462b839541e26d43..e548b81b922fccc738f217a37a67042ee8e1856f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-9.c
@@ -33,5 +33,4 @@ int main (void)
   return 0;
 } 
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_int_mult } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target {! vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..e5dba82d7fa955a6a37a0eabf980127e464ac77b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 256
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= 123;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  long long data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / 123 != i)
+      __builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect"  { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..c5beabaa97425cc1e644d37a69eba65036eeaf4a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 256
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= -19594LL;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  long long data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / -19594LL != i)
+      __builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect"  { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target aarch64*-*-* } } } */
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index f0c515daaa22f59e4bcefe1fcf3e44bf29fe9a40..3be1b89d5842955a4ef56f4a0e5e6f47046c5150 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -2131,32 +2131,313 @@ vect_recog_vector_vector_shift_pattern (vec<gimple *> *stmts,
   return pattern_stmt;
 }
 
-/* Detect multiplication by constant which are postive or negatives of power 2,
-   and convert them to shift patterns.
+/* Return true iff the target has a vector optab implementing the operation
+   CODE on type VECTYPE.  */
 
-   Mult with constants that are postive power of two.
-   type a_t;
-   type b_t
-   S1: b_t = a_t * n
+static bool
+target_has_vecop_for_code (tree_code code, tree vectype)
+{
+  optab voptab = optab_for_tree_code (code, vectype, optab_vector);
+  return voptab
+	 && optab_handler (voptab, TYPE_MODE (vectype)) != CODE_FOR_nothing;
+}
 
-   or
+/* Verify that the target has optabs of VECTYPE to perform all the steps
+   needed by the multiplication-by-immediate synthesis algorithm described by
+   ALG and VAR.  If SYNTH_SHIFT_P is true, ensure that vector addition is
+   present.  Return true iff the target supports all the steps.  */
+
+static bool
+target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
+				 tree vectype, bool synth_shift_p)
+{
+  if (alg->op[0] != alg_zero && alg->op[0] != alg_m)
+    return false;
+
+  bool supports_vminus = target_has_vecop_for_code (MINUS_EXPR, vectype);
+  bool supports_vplus = target_has_vecop_for_code (PLUS_EXPR, vectype);
+
+  if (var == negate_variant
+      && !target_has_vecop_for_code (NEGATE_EXPR, vectype))
+    return false;
+
+  /* If we must synthesize shifts with additions make sure that vector
+     addition is available.  */
+  if ((var == add_variant || synth_shift_p) && !supports_vplus)
+    return false;
+
+  for (int i = 1; i < alg->ops; i++)
+    {
+      switch (alg->op[i])
+	{
+	case alg_shift:
+	  break;
+	case alg_add_t_m2:
+	case alg_add_t2_m:
+	case alg_add_factor:
+	  if (!supports_vplus)
+	    return false;
+	  break;
+	case alg_sub_t_m2:
+	case alg_sub_t2_m:
+	case alg_sub_factor:
+	  if (!supports_vminus)
+	    return false;
+	  break;
+	case alg_unknown:
+	case alg_m:
+	case alg_zero:
+	case alg_impossible:
+	  return false;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+
+  return true;
+}
+
+/* Synthesize a left shift of OP by AMNT bits using a series of additions and
+   putting the final result in DEST.  Append all statements but the last into
+   VINFO.  Return the last statement.  */
+
+static gimple *
+synth_lshift_by_additions (tree dest, tree op, HOST_WIDE_INT amnt,
+			   stmt_vec_info vinfo)
+{
+  HOST_WIDE_INT i;
+  tree itype = TREE_TYPE (op);
+  tree prev_res = op;
+  gcc_assert (amnt >= 0);
+  for (i = 0; i < amnt; i++)
+    {
+      tree tmp_var = (i < amnt - 1) ? vect_recog_temp_ssa_var (itype, NULL)
+		      : dest;
+      gimple *stmt
+        = gimple_build_assign (tmp_var, PLUS_EXPR, prev_res, prev_res);
+      prev_res = tmp_var;
+      if (i < amnt - 1)
+	append_pattern_def_seq (vinfo, stmt);
+      else
+	return stmt;
+    }
+  gcc_unreachable ();
+  return NULL;
+}
+
+/* Helper for vect_synth_mult_by_constant.  Apply a binary operation
+   CODE to operands OP1 and OP2, creating a new temporary SSA var in
+   the process if necessary.  Append the resulting assignment statements
+   to the sequence in STMT_VINFO.  Return the SSA variable that holds the
+   result of the binary operation.  If SYNTH_SHIFT_P is true, synthesize
+   left shifts using additions.  */
+
+static tree
+apply_binop_and_append_stmt (tree_code code, tree op1, tree op2,
+			     stmt_vec_info stmt_vinfo, bool synth_shift_p)
+{
+  if (integer_zerop (op2)
+      && (code == LSHIFT_EXPR
+	  || code == PLUS_EXPR))
+    {
+      gcc_assert (TREE_CODE (op1) == SSA_NAME);
+      return op1;
+    }
+
+  gimple *stmt;
+  tree itype = TREE_TYPE (op1);
+  tree tmp_var = vect_recog_temp_ssa_var (itype, NULL);
+
+  if (code == LSHIFT_EXPR
+      && synth_shift_p)
+    {
+      stmt = synth_lshift_by_additions (tmp_var, op1, TREE_INT_CST_LOW (op2),
+					 stmt_vinfo);
+      append_pattern_def_seq (stmt_vinfo, stmt);
+      return tmp_var;
+    }
+
+  stmt = gimple_build_assign (tmp_var, code, op1, op2);
+  append_pattern_def_seq (stmt_vinfo, stmt);
+  return tmp_var;
+}
+
+/* Synthesize a multiplication of OP by an INTEGER_CST VAL using shifts
+   and simple arithmetic operations to be vectorized.  Record the statements
+   produced in STMT_VINFO and return the last statement in the sequence or
+   NULL if it's not possible to synthesize such a multiplication.
+   This function mirrors the behavior of expand_mult_const in expmed.c but
+   works on tree-ssa form.  */
+
+static gimple *
+vect_synth_mult_by_constant (tree op, tree val,
+			     stmt_vec_info stmt_vinfo)
+{
+  tree itype = TREE_TYPE (op);
+  machine_mode mode = TYPE_MODE (itype);
+  struct algorithm alg;
+  mult_variant variant;
+  if (!tree_fits_shwi_p (val))
+    return NULL;
+
+  /* Multiplication synthesis by shifts, adds and subs can introduce
+     signed overflow where the original operation didn't.  Perform the
+     operations on an unsigned type and cast back to avoid this.
+     In the future we may want to relax this for synthesis algorithms
+     that we can prove do not cause unexpected overflow.  */
+  bool cast_to_unsigned_p = !TYPE_OVERFLOW_WRAPS (itype);
+
+  tree multtype = cast_to_unsigned_p ? unsigned_type_for (itype) : itype;
+
+  /* Targets that don't support vector shifts but support vector additions
+     can synthesize shifts that way.  */
+  bool synth_shift_p = !vect_supportable_shift (LSHIFT_EXPR, multtype);
+
+  HOST_WIDE_INT hwval = tree_to_shwi (val);
+  /* Use MAX_COST here as we don't want to limit the sequence on rtx costs.
+     The vectorizer's benefit analysis will decide whether it's beneficial
+     to do this.  */
+  bool possible = choose_mult_variant (mode, hwval, &alg,
+					&variant, MAX_COST);
+  if (!possible)
+    return NULL;
 
-   Mult with constants that are negative power of two.
-   S2: b_t = a_t * -n
+  tree vectype = get_vectype_for_scalar_type (multtype);
+
+  if (!vectype
+      || !target_supports_mult_synth_alg (&alg, variant,
+					   vectype, synth_shift_p))
+    return NULL;
+
+  tree accumulator;
+
+  /* Clear out the sequence of statements so we can populate it below.  */
+  STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo) = NULL;
+  gimple *stmt = NULL;
+
+  if (cast_to_unsigned_p)
+    {
+      tree tmp_op = vect_recog_temp_ssa_var (multtype, NULL);
+      stmt = gimple_build_assign (tmp_op, CONVERT_EXPR, op);
+      append_pattern_def_seq (stmt_vinfo, stmt);
+      op = tmp_op;
+    }
+
+  if (alg.op[0] == alg_zero)
+    accumulator = build_int_cst (multtype, 0);
+  else
+    accumulator = op;
+
+  bool needs_fixup = (variant == negate_variant)
+		      || (variant == add_variant);
+
+  for (int i = 1; i < alg.ops; i++)
+    {
+      tree shft_log = build_int_cst (multtype, alg.log[i]);
+      tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
+      tree tmp_var = NULL_TREE;
+
+      switch (alg.op[i])
+	{
+	case alg_shift:
+	  if (synth_shift_p)
+	    stmt
+	      = synth_lshift_by_additions (accum_tmp, accumulator, alg.log[i],
+					    stmt_vinfo);
+	  else
+	    stmt = gimple_build_assign (accum_tmp, LSHIFT_EXPR, accumulator,
+					 shft_log);
+	  break;
+	case alg_add_t_m2:
+	  tmp_var
+	    = apply_binop_and_append_stmt (LSHIFT_EXPR, op, shft_log,
+					    stmt_vinfo, synth_shift_p);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator,
+				       tmp_var);
+	  break;
+	case alg_sub_t_m2:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, op,
+						  shft_log, stmt_vinfo,
+						  synth_shift_p);
+	  /* In some algorithms the first step involves zeroing the
+	     accumulator.  If subtracting from such an accumulator,
+	     just emit the negation directly.  */
+	  if (integer_zerop (accumulator))
+	    stmt = gimple_build_assign (accum_tmp, NEGATE_EXPR, tmp_var);
+	  else
+	    stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, accumulator,
+					tmp_var);
+	  break;
+	case alg_add_t2_m:
+	  tmp_var
+	    = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator, shft_log,
+					   stmt_vinfo, synth_shift_p);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, tmp_var, op);
+	  break;
+	case alg_sub_t2_m:
+	  tmp_var
+	    = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator, shft_log,
+					   stmt_vinfo, synth_shift_p);
+	  stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, tmp_var, op);
+	  break;
+	case alg_add_factor:
+	  tmp_var
+	    = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator, shft_log,
+					    stmt_vinfo, synth_shift_p);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator,
+				       tmp_var);
+	  break;
+	case alg_sub_factor:
+	  tmp_var
+	    = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator, shft_log,
+					   stmt_vinfo, synth_shift_p);
+	  stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, tmp_var,
+				      accumulator);
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      /* We don't want to append the last stmt in the sequence to stmt_vinfo
+	 but rather return it directly.  */
+
+      if ((i < alg.ops - 1) || needs_fixup || cast_to_unsigned_p)
+	append_pattern_def_seq (stmt_vinfo, stmt);
+      accumulator = accum_tmp;
+    }
+  if (variant == negate_variant)
+    {
+      tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
+      stmt = gimple_build_assign (accum_tmp, NEGATE_EXPR, accumulator);
+      accumulator = accum_tmp;
+      if (cast_to_unsigned_p)
+	append_pattern_def_seq (stmt_vinfo, stmt);
+    }
+  else if (variant == add_variant)
+    {
+      tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
+      stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator, op);
+      accumulator = accum_tmp;
+      if (cast_to_unsigned_p)
+	append_pattern_def_seq (stmt_vinfo, stmt);
+    }
+  /* Move back to the signed type if needed.  */
+  if (cast_to_unsigned_p)
+    {
+      tree accum_tmp = vect_recog_temp_ssa_var (itype, NULL);
+      stmt = gimple_build_assign (accum_tmp, CONVERT_EXPR, accumulator);
+    }
+
+  return stmt;
+}
+
+/* Detect multiplication by constant and convert it into a sequence of
+   shifts, additions, subtractions and negations.  We reuse the
+   choose_mult_variant algorithms from expmed.c.
 
    Input/Output:
 
    STMTS: Contains a stmt from which the pattern search begins,
-   i.e. the mult stmt.  Convert the mult operation to LSHIFT if
-   constant operand is a power of 2.
-   type a_t, b_t
-   S1': b_t = a_t << log2 (n)
-
-   Convert the mult operation to LSHIFT and followed by a NEGATE
-   if constant operand is a negative power of 2.
-   type a_t, b_t, res_T;
-   S2': b_t = a_t << log2 (n)
-   S3': res_T  = - (b_t)
+   i.e. the mult stmt.
 
  Output:
 
@@ -2164,8 +2445,8 @@ vect_recog_vector_vector_shift_pattern (vec<gimple *> *stmts,
 
   * TYPE_OUT: The type of the output of this pattern.
 
-  * Return value: A new stmt that will be used to replace the multiplication
-    S1 or S2 stmt.  */
+  * Return value: A new stmt that will be used to replace
+    the multiplication.  */
 
 static gimple *
 vect_recog_mult_pattern (vec<gimple *> *stmts,
@@ -2173,11 +2454,8 @@ vect_recog_mult_pattern (vec<gimple *> *stmts,
 {
   gimple *last_stmt = stmts->pop ();
   tree oprnd0, oprnd1, vectype, itype;
-  gimple *pattern_stmt, *def_stmt;
-  optab optab;
+  gimple *pattern_stmt;
   stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
-  int power2_val, power2_neg_val;
-  tree shift;
 
   if (!is_gimple_assign (last_stmt))
     return NULL;
@@ -2201,52 +2479,17 @@ vect_recog_mult_pattern (vec<gimple *> *stmts,
 
   /* If the target can handle vectorized multiplication natively,
      don't attempt to optimize this.  */
-  optab = optab_for_tree_code (MULT_EXPR, vectype, optab_default);
-  if (optab != unknown_optab)
+  optab mul_optab = optab_for_tree_code (MULT_EXPR, vectype, optab_default);
+  if (mul_optab != unknown_optab)
     {
       machine_mode vec_mode = TYPE_MODE (vectype);
-      int icode = (int) optab_handler (optab, vec_mode);
+      int icode = (int) optab_handler (mul_optab, vec_mode);
       if (icode != CODE_FOR_nothing)
-	return NULL;
+       return NULL;
     }
 
-  /* If target cannot handle vector left shift then we cannot
-     optimize and bail out.  */
-  optab = optab_for_tree_code (LSHIFT_EXPR, vectype, optab_vector);
-  if (!optab
-      || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
-    return NULL;
-
-  power2_val = wi::exact_log2 (oprnd1);
-  power2_neg_val = wi::exact_log2 (wi::neg (oprnd1));
-
-  /* Handle constant operands that are postive or negative powers of 2.  */
-  if (power2_val != -1)
-    {
-      shift = build_int_cst (itype, power2_val);
-      pattern_stmt
-	= gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-			       LSHIFT_EXPR, oprnd0, shift);
-    }
-  else if (power2_neg_val != -1)
-    {
-      /* If the target cannot handle vector NEGATE then we cannot
-	 do the optimization.  */
-      optab = optab_for_tree_code (NEGATE_EXPR, vectype, optab_vector);
-      if (!optab
-	  || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
-	return NULL;
-
-      shift = build_int_cst (itype, power2_neg_val);
-      def_stmt
-	= gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-			       LSHIFT_EXPR, oprnd0, shift);
-      new_pattern_def_seq (stmt_vinfo, def_stmt);
-      pattern_stmt
-	 = gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-				NEGATE_EXPR, gimple_assign_lhs (def_stmt));
-    }
-  else
+  pattern_stmt = vect_synth_mult_by_constant (oprnd0, oprnd1, stmt_vinfo);
+  if (!pattern_stmt)
     return NULL;
 
   /* Pattern detected.  */

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
  2016-07-07 16:16                       ` Kyrill Tkachov
@ 2016-07-14  8:22                         ` Kyrill Tkachov
  2016-07-14 10:04                           ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Kyrill Tkachov @ 2016-07-14  8:22 UTC (permalink / raw)
  To: Rainer Orth; +Cc: Marc Glisse, gcc-patches, Richard Biener

Hi all,

On 07/07/16 17:16, Kyrill Tkachov wrote:
>
> On 06/07/16 13:40, Kyrill Tkachov wrote:
>>
>> On 06/07/16 13:31, Rainer Orth wrote:
>>> Hi Kyrill,
>>>
>>>> On 05/07/16 12:24, Rainer Orth wrote:
>>>>> Marc Glisse <marc.glisse@inria.fr> writes:
>>>>>
>>>>>> On Tue, 5 Jul 2016, Kyrill Tkachov wrote:
>>>>>>
>>>>>>> As for testing I've bootstrapped and tested the patch on aarch64 and
>>>>>>> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
>>>>>>> always true to exercise the paths that synthesize the shift by
>>>>>>> additions. Marc, could you test this on the sparc targets you care about?
>>>>>> I don't have access to Sparc, you want Rainer here (added in Cc:).
>>>>> As is, the patch doesn't even build on current mainline:
>>>>>
>>>>> /vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error: 'mult_variant' has not been declared
>>>>>    target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
>>>>> ^
>>>>>
>>>>> enum mult_variant is only declared in expmed.c.
>>>> Ah, sorry I forgot to mention that this is patch 2/2.
>>>> It requires https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html which
>>>> is already approved
>>>> but I was planning to commit it together with this one.
>>>> Can you please try applying
>>>> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
>>>> as well as this?
>>> sure, that did the trick.  A sparc-sun-solaris2.12 bootstrap revealed
>>> that the patch fixes PR tree-optimization/70923 (you should mention that
>>> in the ChangeLog or close it as a duplicate), with the same caveat as
>>> about Marc's latest patch for that:
>>
>> Thanks! Much appreciated.
>>
>>> +FAIL: gcc.dg/vect/vect-iv-9.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 1
>>> +FAIL: gcc.dg/vect/vect-iv-9.c scan-tree-dump-times vect "vectorized 1 loops" 1
>>>
>>> The message appears twice, not once, on sparc, so the testcase should be
>>> updated to accommodate that.
>>
>> Ok.
>>
>>> Besides, the new testcase FAILs:
>>>
>>> +FAIL: gcc.dg/vect/pr65951.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 2
>>> +FAIL: gcc.dg/vect/pr65951.c scan-tree-dump-times vect "vectorized 1 loops" 2
>>>
>>> The dump contains
>>>
>>> gcc.dg/vect/pr65951.c:14:3: note: not vectorized: no vectype for stmt: _4 = *_3;
>>> gcc.dg/vect/pr65951.c:12:1: note: vectorized 0 loops in function.
>>> gcc.dg/vect/pr65951.c:21:3: note: not vectorized: no vectype for stmt: _4 = *_3;
>>> gcc.dg/vect/pr65951.c:19:1: note: vectorized 0 loops in function.
>>> gcc.dg/vect/pr65951.c:55:15: note: not vectorized: control flow in loop.
>>> gcc.dg/vect/pr65951.c:46:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
>>> gcc.dg/vect/pr65951.c:41:15: note: not vectorized: control flow in loop.
>>> gcc.dg/vect/pr65951.c:32:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
>>> gcc.dg/vect/pr65951.c:26:1: note: vectorized 0 loops in function.
>>
>> I see. I suppose SPARC doesn't have vector shifts operating on 64-bit data?
>> I believe the testcase should be updated to use just "int" arrays rather than "long long".
>>
>> I'll respin the testcases
>
> Ok, here's the patch with pr65951.c updated to use int rather than long long arrays and vect-iv-9.c updated to scan for the
> "Vectorized 1 loops" string twice. I've verified by hand by building a sparc-sun-solaris2.10 cc1 that compiling with
> -mcpu=ultrasparc -mvis gives the right strings in the vectoriser dump.
>
> So is this version ok?
>

So Richi was ok with the implementation part of the patch [1].
Are the testsuite changes made since [2] ok?

[1] https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00177.html
[2] https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00331.html


Thanks,
Kyrill

>
>
> 2016-07-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR target/65951
>     PR tree-optimization/70923
>     * tree-vect-patterns.c: Include mult-synthesis.h.
>     (target_supports_mult_synth_alg): New function.
>     (synth_lshift_by_additions): Likewise.
>     (apply_binop_and_append_stmt): Likewise.
>     (vect_synth_mult_by_constant): Likewise.
>     (target_has_vecop_for_code): Likewise.
>     (vect_recog_mult_pattern): Use above functions to synthesize vector
>     multiplication by integer constants.
>
> 2016-07-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR target/65951
>     PR tree-optimization/70923
>     * gcc.dg/vect/vect-mult-const-pattern-1.c: New test.
>     * gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.
>     * gcc.dg/vect/pr65951.c: Likewise.
>     * gcc.dg/vect/vect-iv-9.c: Remove ! vect_int_mult-specific scan.
>


* Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
  2016-07-14  8:22                         ` Kyrill Tkachov
@ 2016-07-14 10:04                           ` Richard Biener
  2016-07-14 13:31                             ` Rainer Orth
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2016-07-14 10:04 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Rainer Orth, Marc Glisse, gcc-patches

On Thu, 14 Jul 2016, Kyrill Tkachov wrote:

> Hi all,
> 
> On 07/07/16 17:16, Kyrill Tkachov wrote:
> > 
> > On 06/07/16 13:40, Kyrill Tkachov wrote:
> > > 
> > > On 06/07/16 13:31, Rainer Orth wrote:
> > > > Hi Kyrill,
> > > > 
> > > > > On 05/07/16 12:24, Rainer Orth wrote:
> > > > > > Marc Glisse <marc.glisse@inria.fr> writes:
> > > > > > 
> > > > > > > On Tue, 5 Jul 2016, Kyrill Tkachov wrote:
> > > > > > > 
> > > > > > > > As for testing I've bootstrapped and tested the patch on aarch64
> > > > > > > > and
> > > > > > > > x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked
> > > > > > > > to be
> > > > > > > > always true to exercise the paths that synthesize the shift by
> > > > > > > > additions. Marc, could you test this on the sparc targets you
> > > > > > > > care about?
> > > > > > > I don't have access to Sparc, you want Rainer here (added in Cc:).
> > > > > > As is, the patch doesn't even build on current mainline:
> > > > > > 
> > > > > > /vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error:
> > > > > > 'mult_variant' has not been declared
> > > > > >    target_supports_mult_synth_alg (struct algorithm *alg,
> > > > > > mult_variant var,
> > > > > > ^
> > > > > > 
> > > > > > enum mult_variant is only declared in expmed.c.
> > > > > Ah, sorry I forgot to mention that this is patch 2/2.
> > > > > It requires https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
> > > > > which
> > > > > is already approved
> > > > > but I was planning to commit it together with this one.
> > > > > Can you please try applying
> > > > > https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
> > > > > as well as this?
> > > > sure, that did the trick.  A sparc-sun-solaris2.12 bootstrap revealed
> > > > that the patch fixes PR tree-optimization/70923 (you should mention that
> > > > in the ChangeLog or close it as a duplicate), with the same caveat as
> > > > about Marc's latest patch for that:
> > > 
> > > Thanks! Much appreciated.
> > > 
> > > > +FAIL: gcc.dg/vect/vect-iv-9.c -flto -ffat-lto-objects
> > > > scan-tree-dump-times vect "vectorized 1 loops" 1
> > > > +FAIL: gcc.dg/vect/vect-iv-9.c scan-tree-dump-times vect "vectorized 1
> > > > loops" 1
> > > > 
> > > > The message appears twice, not once, on sparc, so the testcase should be
> > > > updated to accommodate that.
> > > 
> > > Ok.
> > > 
> > > > Besides, the new testcase FAILs:
> > > > 
> > > > +FAIL: gcc.dg/vect/pr65951.c -flto -ffat-lto-objects
> > > > scan-tree-dump-times vect "vectorized 1 loops" 2
> > > > +FAIL: gcc.dg/vect/pr65951.c scan-tree-dump-times vect "vectorized 1
> > > > loops" 2
> > > > 
> > > > The dump contains
> > > > 
> > > > gcc.dg/vect/pr65951.c:14:3: note: not vectorized: no vectype for stmt:
> > > > _4 = *_3;
> > > > gcc.dg/vect/pr65951.c:12:1: note: vectorized 0 loops in function.
> > > > gcc.dg/vect/pr65951.c:21:3: note: not vectorized: no vectype for stmt:
> > > > _4 = *_3;
> > > > gcc.dg/vect/pr65951.c:19:1: note: vectorized 0 loops in function.
> > > > gcc.dg/vect/pr65951.c:55:15: note: not vectorized: control flow in loop.
> > > > gcc.dg/vect/pr65951.c:46:3: note: not vectorized: loop contains function
> > > > calls or data references that cannot be analyzed
> > > > gcc.dg/vect/pr65951.c:41:15: note: not vectorized: control flow in loop.
> > > > gcc.dg/vect/pr65951.c:32:3: note: not vectorized: loop contains function
> > > > calls or data references that cannot be analyzed
> > > > gcc.dg/vect/pr65951.c:26:1: note: vectorized 0 loops in function.
> > > 
> > > I see. I suppose SPARC doesn't have vector shifts operating on 64-bit
> > > data?
> > > I believe the testcase should be updated to use just "int" arrays rather
> > > than "long long".
> > > 
> > > I'll respin the testcases
> > 
> > Ok, here's the patch with pr65951.c updated to use int rather than long long
> > arrays and vect-iv-9.c updated to scan for the
> > "Vectorized 1 loops" string twice. I've verified by hand by building a
> > sparc-sun-solaris2.10 cc1 that compiling with
> > -mcpu=ultrasparc -mvis gives the right strings in the vectoriser dump.
> > 
> > So is this version ok?
> > 
> 
> So richi was ok with the implementation part of the patch [1]
> Are the testuite changes made since ok?
> 
> [1] https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00177.html
> [2] https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00331.html

Ok with me - I wondered if Rainer wanted to double-check.

Richard.


* Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
  2016-07-14 10:04                           ` Richard Biener
@ 2016-07-14 13:31                             ` Rainer Orth
  2016-07-15 11:14                               ` Kyrill Tkachov
  0 siblings, 1 reply; 18+ messages in thread
From: Rainer Orth @ 2016-07-14 13:31 UTC (permalink / raw)
  To: Richard Biener; +Cc: Kyrill Tkachov, Marc Glisse, gcc-patches

Hi Richard,

> On Thu, 14 Jul 2016, Kyrill Tkachov wrote:
>
>> Hi all,
>> 
>> On 07/07/16 17:16, Kyrill Tkachov wrote:
>> > 
>> > On 06/07/16 13:40, Kyrill Tkachov wrote:
>> > > 
>> > > On 06/07/16 13:31, Rainer Orth wrote:
>> > > > Hi Kyrill,
>> > > > 
>> > > > > On 05/07/16 12:24, Rainer Orth wrote:
>> > > > > > Marc Glisse <marc.glisse@inria.fr> writes:
>> > > > > > 
>> > > > > > > On Tue, 5 Jul 2016, Kyrill Tkachov wrote:
>> > > > > > > 
>> > > > > > > > As for testing I've bootstrapped and tested the patch on aarch64
>> > > > > > > > and
>> > > > > > > > x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked
>> > > > > > > > to be
>> > > > > > > > always true to exercise the paths that synthesize the shift by
>> > > > > > > > additions. Marc, could you test this on the sparc targets you
>> > > > > > > > care about?
>> > > > > > > I don't have access to Sparc, you want Rainer here (added in Cc:).
>> > > > > > As is, the patch doesn't even build on current mainline:
>> > > > > > 
>> > > > > > /vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error:
>> > > > > > 'mult_variant' has not been declared
>> > > > > >    target_supports_mult_synth_alg (struct algorithm *alg,
>> > > > > > mult_variant var,
>> > > > > > ^
>> > > > > > 
>> > > > > > enum mult_variant is only declared in expmed.c.
>> > > > > Ah, sorry I forgot to mention that this is patch 2/2.
>> > > > > It requires https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
>> > > > > which
>> > > > > is already approved
>> > > > > but I was planning to commit it together with this one.
>> > > > > Can you please try applying
>> > > > > https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
>> > > > > as well as this?
>> > > > sure, that did the trick.  A sparc-sun-solaris2.12 bootstrap revealed
>> > > > that the patch fixes PR tree-optimization/70923 (you should mention that
>> > > > in the ChangeLog or close it as a duplicate), with the same caveat as
>> > > > about Marc's latest patch for that:
>> > > 
>> > > Thanks! Much appreciated.
>> > > 
>> > > > +FAIL: gcc.dg/vect/vect-iv-9.c -flto -ffat-lto-objects
>> > > > scan-tree-dump-times vect "vectorized 1 loops" 1
>> > > > +FAIL: gcc.dg/vect/vect-iv-9.c scan-tree-dump-times vect "vectorized 1
>> > > > loops" 1
>> > > > 
>> > > > The message appears twice, not once, on sparc, so the testcase should be
> > > > > updated to accommodate that.
>> > > 
>> > > Ok.
>> > > 
>> > > > Besides, the new testcase FAILs:
>> > > > 
>> > > > +FAIL: gcc.dg/vect/pr65951.c -flto -ffat-lto-objects
>> > > > scan-tree-dump-times vect "vectorized 1 loops" 2
>> > > > +FAIL: gcc.dg/vect/pr65951.c scan-tree-dump-times vect "vectorized 1
>> > > > loops" 2
>> > > > 
>> > > > The dump contains
>> > > > 
>> > > > gcc.dg/vect/pr65951.c:14:3: note: not vectorized: no vectype for stmt:
>> > > > _4 = *_3;
>> > > > gcc.dg/vect/pr65951.c:12:1: note: vectorized 0 loops in function.
>> > > > gcc.dg/vect/pr65951.c:21:3: note: not vectorized: no vectype for stmt:
>> > > > _4 = *_3;
>> > > > gcc.dg/vect/pr65951.c:19:1: note: vectorized 0 loops in function.
>> > > > gcc.dg/vect/pr65951.c:55:15: note: not vectorized: control flow in loop.
>> > > > gcc.dg/vect/pr65951.c:46:3: note: not vectorized: loop contains function
>> > > > calls or data references that cannot be analyzed
>> > > > gcc.dg/vect/pr65951.c:41:15: note: not vectorized: control flow in loop.
>> > > > gcc.dg/vect/pr65951.c:32:3: note: not vectorized: loop contains function
>> > > > calls or data references that cannot be analyzed
>> > > > gcc.dg/vect/pr65951.c:26:1: note: vectorized 0 loops in function.
>> > > 
>> > > I see. I suppose SPARC doesn't have vector shifts operating on 64-bit
>> > > data?
>> > > I believe the testcase should be updated to use just "int" arrays rather
>> > > than "long long".
>> > > 
>> > > I'll respin the testcases
>> > 
>> > Ok, here's the patch with pr65951.c updated to use int rather than long long
>> > arrays and vect-iv-9.c updated to scan for the
>> > "vectorized 1 loops" string twice. I've verified by hand by building a
>> > sparc-sun-solaris2.10 cc1 that compiling with
>> > -mcpu=ultrasparc -mvis gives the right strings in the vectoriser dump.
>> > 
>> > So is this version ok?
>> > 
>> 
>> So richi was ok with the implementation part of the patch [1].
>> Are the testsuite changes made since then ok?
>> 
>> [1] https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00177.html
>> [2] https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00331.html
>
> Ok with me - I wondered if Rainer wanted to double-check.

I just ran a sparc-sun-solaris2.12 bootstrap with the patch(es) applied
and results look good: the remaining failures are unrelated and have
already been reported separately.

Thanks.
        Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University
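
[As an aside for readers following the shift-by-additions fallback
discussed in this subthread (for targets such as SPARC/VIS whose vector
ISA lacks the needed shifts), the scalar idea can be sketched as below.
This is a hypothetical illustration of the technique, not the code in
tree-vect-patterns.c:]

```c
#include <stdint.h>

/* Multiply x by a constant using only additions: each left shift by
   one is synthesized as a self-addition, mirroring the vectorizer's
   fallback for targets that support vector add but not vector shift.
   Hypothetical sketch, not GCC's internal implementation.  */
static int32_t
mult_by_synth (int32_t x, uint32_t c)
{
  /* Work in an unsigned type to avoid signed-overflow UB and cast
     back at the end, echoing the cast-to-unsigned step in the patch.  */
  uint32_t addend = (uint32_t) x;   /* addend == x << i after i steps */
  uint32_t result = 0;
  while (c)
    {
      if (c & 1)
        result += addend;           /* accumulate this power-of-two term */
      addend += addend;             /* x << 1 expressed as x + x */
      c >>= 1;
    }
  return (int32_t) result;
}
```

[For example, c = 10 (binary 1010) yields (x << 1) + (x << 3), with both
shifts built entirely from repeated additions.]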


* Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
  2016-07-14 13:31                             ` Rainer Orth
@ 2016-07-15 11:14                               ` Kyrill Tkachov
  0 siblings, 0 replies; 18+ messages in thread
From: Kyrill Tkachov @ 2016-07-15 11:14 UTC (permalink / raw)
  To: Rainer Orth, Richard Biener; +Cc: Marc Glisse, gcc-patches


On 14/07/16 14:31, Rainer Orth wrote:
> Hi Richard,
>
>> On Thu, 14 Jul 2016, Kyrill Tkachov wrote:
>>
>>> Hi all,
>>>
>>> On 07/07/16 17:16, Kyrill Tkachov wrote:
>>>> On 06/07/16 13:40, Kyrill Tkachov wrote:
>>>>> On 06/07/16 13:31, Rainer Orth wrote:
>>>>>> Hi Kyrill,
>>>>>>
>>>>>>> On 05/07/16 12:24, Rainer Orth wrote:
>>>>>>>> Marc Glisse <marc.glisse@inria.fr> writes:
>>>>>>>>
>>>>>>>>> On Tue, 5 Jul 2016, Kyrill Tkachov wrote:
>>>>>>>>>
>>>>>>>>>> As for testing I've bootstrapped and tested the patch on aarch64
>>>>>>>>>> and
>>>>>>>>>> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked
>>>>>>>>>> to be
>>>>>>>>>> always true to exercise the paths that synthesize the shift by
>>>>>>>>>> additions. Marc, could you test this on the sparc targets you
>>>>>>>>>> care about?
>>>>>>>>> I don't have access to Sparc, you want Rainer here (added in Cc:).
>>>>>>>> As is, the patch doesn't even build on current mainline:
>>>>>>>>
>>>>>>>> /vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error:
>>>>>>>> 'mult_variant' has not been declared
>>>>>>>>     target_supports_mult_synth_alg (struct algorithm *alg,
>>>>>>>> mult_variant var,
>>>>>>>> ^
>>>>>>>>
>>>>>>>> enum mult_variant is only declared in expmed.c.
>>>>>>> Ah, sorry I forgot to mention that this is patch 2/2.
>>>>>>> It requires https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
>>>>>>> which
>>>>>>> is already approved
>>>>>>> but I was planning to commit it together with this one.
>>>>>>> Can you please try applying
>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
>>>>>>> as well as this?
>>>>>> sure, that did the trick.  A sparc-sun-solaris2.12 bootstrap revealed
>>>>>> that the patch fixes PR tree-optimization/70923 (you should mention that
>>>>>> in the ChangeLog or close it as a duplicate), with the same caveat as
>>>>>> about Marc's latest patch for that:
>>>>> Thanks! Much appreciated.
>>>>>
>>>>>> +FAIL: gcc.dg/vect/vect-iv-9.c -flto -ffat-lto-objects
>>>>>> scan-tree-dump-times vect "vectorized 1 loops" 1
>>>>>> +FAIL: gcc.dg/vect/vect-iv-9.c scan-tree-dump-times vect "vectorized 1
>>>>>> loops" 1
>>>>>>
>>>>>> The message appears twice, not once, on sparc, so the testcase should be
>>>>>> updated to accommodate that.
>>>>> Ok.
>>>>>
>>>>>> Besides, the new testcase FAILs:
>>>>>>
>>>>>> +FAIL: gcc.dg/vect/pr65951.c -flto -ffat-lto-objects
>>>>>> scan-tree-dump-times vect "vectorized 1 loops" 2
>>>>>> +FAIL: gcc.dg/vect/pr65951.c scan-tree-dump-times vect "vectorized 1
>>>>>> loops" 2
>>>>>>
>>>>>> The dump contains
>>>>>>
>>>>>> gcc.dg/vect/pr65951.c:14:3: note: not vectorized: no vectype for stmt:
>>>>>> _4 = *_3;
>>>>>> gcc.dg/vect/pr65951.c:12:1: note: vectorized 0 loops in function.
>>>>>> gcc.dg/vect/pr65951.c:21:3: note: not vectorized: no vectype for stmt:
>>>>>> _4 = *_3;
>>>>>> gcc.dg/vect/pr65951.c:19:1: note: vectorized 0 loops in function.
>>>>>> gcc.dg/vect/pr65951.c:55:15: note: not vectorized: control flow in loop.
>>>>>> gcc.dg/vect/pr65951.c:46:3: note: not vectorized: loop contains function
>>>>>> calls or data references that cannot be analyzed
>>>>>> gcc.dg/vect/pr65951.c:41:15: note: not vectorized: control flow in loop.
>>>>>> gcc.dg/vect/pr65951.c:32:3: note: not vectorized: loop contains function
>>>>>> calls or data references that cannot be analyzed
>>>>>> gcc.dg/vect/pr65951.c:26:1: note: vectorized 0 loops in function.
>>>>> I see. I suppose SPARC doesn't have vector shifts operating on 64-bit
>>>>> data?
>>>>> I believe the testcase should be updated to use just "int" arrays rather
>>>>> than "long long".
>>>>>
>>>>> I'll respin the testcases
>>>> Ok, here's the patch with pr65951.c updated to use int rather than long long
>>>> arrays and vect-iv-9.c updated to scan for the
>>>> "vectorized 1 loops" string twice. I've verified by hand by building a
>>>> sparc-sun-solaris2.10 cc1 that compiling with
>>>> -mcpu=ultrasparc -mvis gives the right strings in the vectoriser dump.
>>>>
>>>> So is this version ok?
>>>>
>>> So richi was ok with the implementation part of the patch [1].
>>> Are the testsuite changes made since then ok?
>>>
>>> [1] https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00177.html
>>> [2] https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00331.html
>> Ok with me - I wondered if Rainer wanted to double-check.
> I just ran a sparc-sun-solaris2.12 bootstrap with the patch(es) applied
> and results look good: the remaining failures are unrelated and have
> already been reported separately.

Thanks for confirming.
I've committed the patches to trunk.

Kyrill


> Thanks.
>          Rainer
>



Thread overview: 18+ messages
2016-06-15 13:24 [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant Kyrill Tkachov
2016-06-15 21:53 ` Marc Glisse
2016-06-16  9:06   ` Kyrill Tkachov
2016-06-28  8:08     ` Richard Biener
2016-06-30  9:00       ` Kyrill Tkachov
2016-07-01 12:02         ` Richard Biener
2016-07-05  8:38           ` Kyrill Tkachov
2016-07-05 10:13             ` Richard Biener
2016-07-05 10:14             ` Marc Glisse
2016-07-05 11:24               ` Rainer Orth
2016-07-05 11:31                 ` Kyrill Tkachov
2016-07-06 12:31                   ` Rainer Orth
2016-07-06 12:41                     ` Kyrill Tkachov
2016-07-07 16:16                       ` Kyrill Tkachov
2016-07-14  8:22                         ` Kyrill Tkachov
2016-07-14 10:04                           ` Richard Biener
2016-07-14 13:31                             ` Rainer Orth
2016-07-15 11:14                               ` Kyrill Tkachov
