[PATCH 3/14] Add new optabs for reducing vectors to scalars

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Alan Lawrence <alan.lawrence@arm.com>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: [PATCH 3/14] Add new optabs for reducing vectors to scalars
Date: Thu, 18 Sep 2014 11:54:00 -0000	[thread overview]
Message-ID: <541AC800.2060907@arm.com> (raw)
In-Reply-To: <541AC4D2.9040901@arm.com>

[-- Attachment #1: Type: text/plain, Size: 1396 bytes --]

These match their corresponding tree codes, by taking a vector and returning a 
scalar; this is more architecturally neutral than the (somewhat loosely defined) 
previous optab that took a vector and returned a vector with the result in the 
least significant bits (i.e. element 0 for little-endian or N-1 for bigendian). 
However, the old optabs are preserved so as not to break existing backends, so 
clients check for both old + new optabs.

Bootstrap, check-gcc and check-g++ on x86_64-none-linux-gnu.
aarch64.exp + vect.exp on aarch64{,_be}-none-elf.
(of course at this point in the series all these are using the old optab + 
migration path.)

gcc/ChangeLog:

	* doc/md.texi (Standard Names): Add reduc_(plus,[us](min|max))|scal
	optabs, and note in reduc_[us](plus|min|max) to prefer the former.

	* expr.c (expand_expr_real_2): Use reduc_..._scal if available, fall
	back to old reduc_... + BIT_FIELD_REF only if not.

	* optabs.c (optab_for_tree_code): for REDUC_(MAX,MIN,PLUS)_EXPR,
	return the reduce-to-scalar (reduc_..._scal) optab.
	(scalar_reduc_to_vector): New.

	* optabs.def (reduc_smax_scal_optab, reduc_smin_scal_optab,
	reduc_plus_scal_optab, reduc_umax_scal_optab, reduc_umin_scal_optab):
	New.

	* optabs.h (scalar_reduc_to_vector): Declare.

	* tree-vect-loop.c (vectorizable_reduction): Look for optabs reducing
	to either scalar or vector.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 3_add_new_optabs.patch --]
[-- Type: text/x-patch; name=3_add_new_optabs.patch, Size: 10874 bytes --]

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index dd7861188afb8afd01971f9f75f0e32da9f9f826..3f5fd6f0e3ac3fcc30f6c961e3e2709a35f4d413 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4811,29 +4811,48 @@ it is unspecified which of the two operands is returned as the result.
 @cindex @code{reduc_smax_@var{m}} instruction pattern
 @item @samp{reduc_smin_@var{m}}, @samp{reduc_smax_@var{m}}
 Find the signed minimum/maximum of the elements of a vector. The vector is
-operand 1, and the scalar result is stored in the least significant bits of
+operand 1, and the result is stored in the least significant bits of
 operand 0 (also a vector). The output and input vector should have the same
-modes.
+modes. These are legacy optabs, and platforms should prefer to implement
+@samp{reduc_smin_scal_@var{m}} and @samp{reduc_smax_scal_@var{m}}.
 
 @cindex @code{reduc_umin_@var{m}} instruction pattern
 @cindex @code{reduc_umax_@var{m}} instruction pattern
 @item @samp{reduc_umin_@var{m}}, @samp{reduc_umax_@var{m}}
 Find the unsigned minimum/maximum of the elements of a vector. The vector is
-operand 1, and the scalar result is stored in the least significant bits of
+operand 1, and the result is stored in the least significant bits of
 operand 0 (also a vector). The output and input vector should have the same
-modes.
+modes. These are legacy optabs, and platforms should prefer to implement
+@samp{reduc_umin_scal_@var{m}} and @samp{reduc_umax_scal_@var{m}}.
 
 @cindex @code{reduc_splus_@var{m}} instruction pattern
-@item @samp{reduc_splus_@var{m}}
-Compute the sum of the signed elements of a vector. The vector is operand 1,
-and the scalar result is stored in the least significant bits of operand 0
-(also a vector). The output and input vector should have the same modes.
-
 @cindex @code{reduc_uplus_@var{m}} instruction pattern
-@item @samp{reduc_uplus_@var{m}}
-Compute the sum of the unsigned elements of a vector. The vector is operand 1,
-and the scalar result is stored in the least significant bits of operand 0
+@item @samp{reduc_splus_@var{m}}, @samp{reduc_uplus_@var{m}}
+Compute the sum of the signed/unsigned elements of a vector. The vector is
+operand 1, and the result is stored in the least significant bits of operand 0
 (also a vector). The output and input vector should have the same modes.
+These are legacy optabs, and platforms should prefer to implement
+@samp{reduc_plus_scal_@var{m}}.
+
+@cindex @code{reduc_smin_scal_@var{m}} instruction pattern
+@cindex @code{reduc_smax_scal_@var{m}} instruction pattern
+@item @samp{reduc_smin_scal_@var{m}}, @samp{reduc_smax_scal_@var{m}}
+Find the signed minimum/maximum of the elements of a vector. The vector is
+operand 1, and operand 0 is the scalar result, with mode equal to the mode of
+the elements of the input vector.
+
+@cindex @code{reduc_umin_scal_@var{m}} instruction pattern
+@cindex @code{reduc_umax_scal_@var{m}} instruction pattern
+@item @samp{reduc_umin_scal_@var{m}}, @samp{reduc_umax_scal_@var{m}}
+Find the unsigned minimum/maximum of the elements of a vector. The vector is
+operand 1, and operand 0 is the scalar result, with mode equal to the mode of
+the elements of the input vector.
+
+@cindex @code{reduc_plus_scal_@var{m}} instruction pattern
+@item @samp{reduc_plus_scal_@var{m}}
+Compute the sum of the elements of a vector. The vector is operand 1, and
+operand 0 is the scalar result, with mode equal to the mode of the elements of
+the input vector.
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
diff --git a/gcc/expr.c b/gcc/expr.c
index a293c06489f09586ed56dff1381467401687be45..11930ca121e4e1f3807261a2e5b0ca4f6723176d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9018,21 +9018,39 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
     case REDUC_MIN_EXPR:
     case REDUC_PLUS_EXPR:
       {
-        op0 = expand_normal (treeop0);
-        this_optab = optab_for_tree_code (code, type, optab_default);
-        enum machine_mode vec_mode = TYPE_MODE (TREE_TYPE (treeop0));
-        temp = expand_unop (vec_mode, this_optab, op0, NULL_RTX, unsignedp);
-        gcc_assert (temp);
-        /* The tree code produces a scalar result, but (somewhat by convention)
-           the optab produces a vector with the result in element 0 if
-           little-endian, or element N-1 if big-endian.  So pull the scalar
-           result out of that element.  */
-        int index = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (vec_mode) - 1 : 0;
-        int bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode));
-        temp = extract_bit_field (temp, bitsize, bitsize * index, unsignedp,
-				  target, mode, mode);
-        gcc_assert (temp);
-        return temp;
+	op0 = expand_normal (treeop0);
+	enum machine_mode vec_mode = GET_MODE (op0);
+	this_optab = optab_for_tree_code (code, type, optab_default);
+
+	if (optab_handler (this_optab, vec_mode) != CODE_FOR_nothing)
+	  {
+	    struct expand_operand ops[2];
+	    enum insn_code icode = optab_handler (this_optab, vec_mode);
+
+	    create_output_operand (&ops[0], target, mode);
+	    create_input_operand (&ops[1], op0, vec_mode);
+	    if (maybe_expand_insn (icode, 2, ops))
+	      {
+		target = ops[0].value;
+		if (GET_MODE (target) != mode)
+		  return gen_lowpart (tmode, target);
+		return target;
+	      }
+	  }
+	/* Fall back to optab with vector result, and then extract scalar.  */
+	this_optab = scalar_reduc_to_vector (this_optab, type);
+	temp = expand_unop (vec_mode, this_optab, op0, NULL_RTX, unsignedp);
+	gcc_assert (temp);
+	/* The tree code produces a scalar result, but (somewhat by convention)
+	   the optab produces a vector with the result in element 0 if
+	   little-endian, or element N-1 if big-endian.  So pull the scalar
+	   result out of that element.  */
+	int index = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (vec_mode) - 1 : 0;
+	int bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode));
+	temp = extract_bit_field (temp, bitsize, bitsize * index, unsignedp,
+	                          target, mode, mode);
+	gcc_assert (temp);
+	return temp;
       }
 
     case VEC_LSHIFT_EXPR:
diff --git a/gcc/optabs.c b/gcc/optabs.c
index d6412ec42d7c4908a74d098e80d7038e068ca557..e422bcce18d06a39b26547b510c35858efc2303e 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -505,13 +505,15 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return fma_optab;
 
     case REDUC_MAX_EXPR:
-      return TYPE_UNSIGNED (type) ? reduc_umax_optab : reduc_smax_optab;
+      return TYPE_UNSIGNED (type)
+	     ? reduc_umax_scal_optab : reduc_smax_scal_optab;
 
     case REDUC_MIN_EXPR:
-      return TYPE_UNSIGNED (type) ? reduc_umin_optab : reduc_smin_optab;
+      return TYPE_UNSIGNED (type)
+	     ? reduc_umin_scal_optab : reduc_smin_scal_optab;
 
     case REDUC_PLUS_EXPR:
-      return TYPE_UNSIGNED (type) ? reduc_uplus_optab : reduc_splus_optab;
+      return reduc_plus_scal_optab;
 
     case VEC_LSHIFT_EXPR:
       return vec_shl_optab;
@@ -607,7 +609,22 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return unknown_optab;
     }
 }
-\f
+
+optab
+scalar_reduc_to_vector (optab unoptab, const_tree type)
+{
+  switch (unoptab)
+    {
+    case reduc_plus_scal_optab:
+      return TYPE_UNSIGNED (type) ? reduc_uplus_optab : reduc_splus_optab;
+
+    case reduc_smin_scal_optab: return reduc_smin_optab;
+    case reduc_umin_scal_optab: return reduc_umin_optab;
+    case reduc_smax_scal_optab: return reduc_smax_optab;
+    case reduc_umax_scal_optab: return reduc_umax_optab;
+    default: return unknown_optab;
+    }
+}
 
 /* Expand vector widening operations.
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b75547006585267d9f5b4f17ba972ba388852cf5..131ea048b012b073345be3b426d4ac8f33061809 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -243,12 +243,20 @@ OPTAB_D (sin_optab, "sin$a2")
 OPTAB_D (sincos_optab, "sincos$a3")
 OPTAB_D (tan_optab, "tan$a2")
 
+/* Vector reduction to a scalar.  */
+OPTAB_D (reduc_smax_scal_optab, "reduc_smax_scal_$a")
+OPTAB_D (reduc_smin_scal_optab, "reduc_smin_scal_$a")
+OPTAB_D (reduc_plus_scal_optab, "reduc_plus_scal_$a")
+OPTAB_D (reduc_umax_scal_optab, "reduc_umax_scal_$a")
+OPTAB_D (reduc_umin_scal_optab, "reduc_umin_scal_$a")
+/* (Old) Vector reduction, returning a vector with the result in one lane.  */
 OPTAB_D (reduc_smax_optab, "reduc_smax_$a")
 OPTAB_D (reduc_smin_optab, "reduc_smin_$a")
 OPTAB_D (reduc_splus_optab, "reduc_splus_$a")
 OPTAB_D (reduc_umax_optab, "reduc_umax_$a")
 OPTAB_D (reduc_umin_optab, "reduc_umin_$a")
 OPTAB_D (reduc_uplus_optab, "reduc_uplus_$a")
+
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 089b15a6fcd261bb15c898f185a157f1257284ba..d9f4900620a13d74fc3dfb1bac9bcb34416012de 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -162,6 +162,11 @@ enum optab_subtype
    vector shifts and rotates */
 extern optab optab_for_tree_code (enum tree_code, const_tree, enum optab_subtype);
 
+/* Given optab UNOPTAB that reduces a vector to a scalar, find instead the old
+   optab that produces a vector with the reduction result in one element,
+   for a tree with type TYPE.  */
+extern optab scalar_reduc_to_vector (optab unoptab, const_tree type);
+
 /* The various uses that a comparison can have; used by can_compare_p:
    jumps, conditional moves, store flag operations.  */
 enum can_compare_purpose
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 36f51977845bf5ce451564ccd1eb8ad5f39ac8de..d0a29d312bfd9a7eb552d937e3c64cf9b30d558a 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -5101,15 +5101,17 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
 
           epilog_reduc_code = ERROR_MARK;
         }
-
-      if (reduc_optab
-          && optab_handler (reduc_optab, vec_mode) == CODE_FOR_nothing)
+      else if (optab_handler (reduc_optab, vec_mode) == CODE_FOR_nothing)
         {
-          if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "reduc op not supported by target.\n");
+          optab = scalar_reduc_to_vector (reduc_optab, vectype_out);
+          if (optab_handler (optab, vec_mode) == CODE_FOR_nothing)
+            {
+              if (dump_enabled_p ())
+	        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "reduc op not supported by target.\n");
 
-          epilog_reduc_code = ERROR_MARK;
+	      epilog_reduc_code = ERROR_MARK;
+	    }
         }
     }
   else

next prev parent reply	other threads:[~2014-09-18 11:54 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-18 11:41 [PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114 Alan Lawrence
2014-09-18 11:45 ` [PATCH 1/14][AArch64] Temporarily remove aarch64_gimple_fold_builtin code for reduction operations Alan Lawrence
2014-09-24  9:41   ` Marcus Shawcroft
2014-09-18 11:51 ` [PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result Alan Lawrence
2014-09-22 10:34   ` Richard Biener
2014-09-22 13:23     ` Alan Lawrence
2014-09-24 15:02     ` Alan Lawrence
2014-09-24 18:08       ` Segher Boessenkool
2014-09-25 16:07         ` Alan Lawrence
2014-09-18 11:54 ` Alan Lawrence [this message]
2014-09-22 10:40   ` [PATCH 3/14] Add new optabs for reducing vectors to scalars Richard Biener
2014-09-22 13:26     ` Alan Lawrence
2014-09-22 13:38       ` Richard Biener
2014-09-25 14:33         ` [PATCH/RFC v2 " Alan Lawrence
2014-09-25 15:31           ` Richard Biener
2014-09-25 16:12             ` Alan Lawrence
2014-09-25 19:20               ` Segher Boessenkool
2014-09-18 11:59 ` [PATCH 4/14][AArch64] Use new reduc_plus_scal optabs, inc. for __builtins Alan Lawrence
2014-09-24  9:44   ` Marcus Shawcroft
2014-09-18 12:02 ` [PATCH 5/14][AArch64] Use new reduc_[us](min|max)_scal optabs, inc. for builtins Alan Lawrence
2014-09-24  9:47   ` Marcus Shawcroft
2014-09-18 12:05 ` [PATCH 6/14][AArch64] Restore gimple_folding of reduction intrinsics Alan Lawrence
2014-09-24  9:48   ` Marcus Shawcroft
2014-09-18 12:19 ` [PATCH 7/14][Testsuite] Add tests of reductions using whole-vector-shifts (multiplication) Alan Lawrence
2014-09-22 10:41   ` Richard Biener
2014-09-18 12:25 ` [PATCH 8/14][Testsuite] Add tests of reductions using whole-vector-shifts (ior) Alan Lawrence
2014-09-22 10:42   ` Richard Biener
2014-09-18 12:27 ` [PATCH 9/14] Enforce whole-vector-shifts to always be by a whole number of elements Alan Lawrence
2014-09-22 10:50   ` Richard Biener
2014-09-18 12:34 ` [PATCH 10/14][AArch64] Implement vec_shr optab Alan Lawrence
2014-09-18 12:35 ` [PATCH 11/14] Remove VEC_LSHIFT_EXPR and vec_shl_optab Alan Lawrence
2014-09-22 10:52   ` Richard Biener
2014-10-27 18:45     ` Alan Lawrence
2014-10-27 20:24       ` Richard Biener
2014-09-18 12:43 ` [PATCH 12/14][Vectorizer] Redefine VEC_RSHIFT_EXPR and vec_shr_optab as endianness-neutral Alan Lawrence
2014-09-18 13:12   ` David Edelsohn
2014-09-22 13:27     ` Bill Schmidt
2014-09-22 10:58   ` Richard Biener
2014-09-18 12:45 ` [PATCH 13/14][AArch64_be] Fix vec_shr pattern to correctly implement endianness-neutral optab Alan Lawrence
2014-09-22 10:52   ` Richard Biener
2014-09-18 12:48 ` [PATCH 14/14][Vectorizer] Tidy up vect_create_epilog / use_scalar_result Alan Lawrence
2014-09-22 10:53   ` Richard Biener
2014-11-14 17:29     ` PUSHED: " Alan Lawrence
2014-09-18 12:58 ` [PATCH/RFC 15 / 14+2][RS6000] Remove vec_shl and (hopefully) fix vec_shr Alan Lawrence
2014-09-23 12:50   ` David Edelsohn
2014-09-18 13:02 ` [PATCH 16 / 14+2][MIPS] " Alan Lawrence
2014-09-22 11:21 ` [PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114 Richard Biener
2014-09-22 11:26   ` Richard Biener
2014-10-06 17:31   ` Alan Lawrence
     [not found]   ` <5432D1A5.6080208@arm.com>
2014-10-07  7:45     ` Richard Biener
2014-10-07  7:46       ` Richard Biener
     [not found]       ` <5436C138.50208@arm.com>
2014-10-09 17:13         ` Alan Lawrence

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=541AC800.2060907@arm.com \
    --to=alan.lawrence@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).