From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-320771-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 6033 invoked by alias); 13 Jun 2012 02:19:21 -0000
Received: (qmail 5680 invoked by uid 22791); 13 Jun 2012 02:19:18 -0000
X-SWARE-Spam-Status: No, hits=-6.4 required=5.0	tests=AWL,BAYES_00,KHOP_RCVD_UNTRUST,RCVD_IN_DNSWL_HI,RCVD_IN_HOSTKARMA_W,TW_TM,T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from e39.co.us.ibm.com (HELO e39.co.us.ibm.com) (32.97.110.160)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 13 Jun 2012 02:19:04 +0000
Received: from /spool/local	by e39.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted	for <gcc-patches@gcc.gnu.org> from <wschmidt@linux.vnet.ibm.com>;	Tue, 12 Jun 2012 20:19:03 -0600
Received: from d01dlp03.pok.ibm.com (9.56.224.17)	by e39.co.us.ibm.com (192.168.1.139) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted;	Tue, 12 Jun 2012 20:19:00 -0600
Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236])	by d01dlp03.pok.ibm.com (Postfix) with ESMTP id ED37DC90050	for <gcc-patches@gcc.gnu.org>; Tue, 12 Jun 2012 22:18:58 -0400 (EDT)
Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168])	by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q5D2Ixsw218682	for <gcc-patches@gcc.gnu.org>; Tue, 12 Jun 2012 22:18:59 -0400
Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1])	by d03av02.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q5D2IwOe011330	for <gcc-patches@gcc.gnu.org>; Tue, 12 Jun 2012 20:18:58 -0600
Received: from [9.76.42.35] (sig-9-76-42-35.mts.ibm.com [9.76.42.35])	by d03av02.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id q5D2IuTR011226;	Tue, 12 Jun 2012 20:18:57 -0600
Message-ID: <1339553936.18291.15.camel@gnopaine>
Subject: [PATCH, RFC] First cut at using vec_construct for strided loads
From: "William J. Schmidt" <wschmidt@linux.vnet.ibm.com>
To: gcc-patches@gcc.gnu.org
Cc: rguenther@suse.de, bergner@vnet.ibm.com
Date: Wed, 13 Jun 2012 04:32:00 -0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 12061302-4242-0000-0000-000001F35277
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2012-06/txt/msg00776.txt.bz2

This patch is a follow-up to the discussion generated by
http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00546.html.  I've added
vec_construct to the cost model for use in vect_model_load_cost, and
implemented a cost calculation that makes sense to me for PowerPC.  I'm
less certain about the default, i386, and spu implementations.  I took a
guess at i386 from the discussions we had, and used the same calculation
for the default and for spu.  I'm hoping you or others can fill in the
blanks if I guessed badly.
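
To make the arithmetic concrete, here is a stand-alone sketch of the
strided-load cost as the patch computes it.  It is a model only, not
GCC code: the function names and the is_float32 flag are made up for
illustration, and the scalar load cost is passed in as a plain integer
rather than looked up through the target hook.

```c
#include <assert.h>

/* Stand-alone model of the costs in this patch; not GCC code.
   All names here are hypothetical.  */

/* Cost used by the default, i386, and spu hooks: elements / 2 + 1.  */
static int
generic_vec_construct_cost (unsigned elements)
{
  assert (elements > 1);
  return (int) (elements / 2) + 1;
}

/* rs6000 variant: 32-bit float elements cost elements + 1,
   everything else falls back to elements / 2 + 1.  */
static int
rs6000_vec_construct_cost (unsigned elements, int is_float32)
{
  assert (elements > 1);
  if (is_float32)
    return (int) elements + 1;
  return (int) (elements / 2) + 1;
}

/* Inside-loop cost of a strided load, following the shape of the
   computation in vect_model_load_cost: one scalar load per element
   per copy, plus one vector construction per copy.  */
static int
strided_load_cost (int scalar_load_cost, int ncopies,
                   unsigned elements, int construct_cost)
{
  return scalar_load_cost * ncopies * (int) elements
         + ncopies * construct_cost;
}
```

For a four-element vector with a scalar load cost of 1 and one copy,
the generic construction cost is 3 and the total is 4 + 3 = 7.  On
rs6000 a four-element vector of 32-bit floats has a construction cost
of 5, giving a total of 9 under the same assumptions.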

The i386 cost for vec_construct differs from the other i386 costs,
which are parameterized for each processor description.  This one
should probably be parameterized in some way as well, but I thought
you'd know better than I how that should be done.  Perhaps instead of

	elements / 2 + 1

it should be

	(elements / 2) * X + Y

where X and Y are taken from the processor description, and represent
the cost of a merge and a permute, respectively.  Let me know what you
think.
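
A minimal sketch of that parameterized form follows, purely for
discussion.  The struct and its merge_cost and permute_cost fields are
hypothetical; nothing like them exists in the i386 processor cost
tables today.

```c
#include <assert.h>

/* Hypothetical per-processor parameters for vec_construct.
   These fields are assumptions, not existing i386 cost entries.  */
struct vec_construct_params
{
  int merge_cost;    /* X: cost of one merge.  */
  int permute_cost;  /* Y: cost of one permute.  */
};

/* (elements / 2) * X + Y, with X and Y taken from the parameters.  */
static int
parameterized_vec_construct_cost (const struct vec_construct_params *p,
                                  unsigned elements)
{
  assert (elements > 1);
  return (int) (elements / 2) * p->merge_cost + p->permute_cost;
}
```

With X = Y = 1 this reduces to the elements / 2 + 1 used in the patch,
so existing behavior could be kept as the default while individual
processor descriptions override the two values.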

Thanks,
Bill


2012-06-12  Bill Schmidt  <wschmidt@linux.ibm.com>

	* targhooks.c (default_builtin_vectorization_cost): Handle
	vec_construct, using vectype to base cost on subparts.
	* target.h (enum vect_cost_for_stmt): Add vec_construct.
	* tree-vect-stmts.c (vect_model_load_cost): Use vec_construct
	instead of scalar_to_vec.
	* config/spu/spu.c (spu_builtin_vectorization_cost): Handle
	vec_construct in same way as default for now.
	* config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
	Handle vec_construct, including special case for 32-bit loads.
	

Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	(revision 188482)
+++ gcc/targhooks.c	(working copy)
@@ -499,9 +499,11 @@ default_builtin_vectorized_conversion (unsigned in
 
 int
 default_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
-                                    tree vectype ATTRIBUTE_UNUSED,
+                                    tree vectype,
                                     int misalign ATTRIBUTE_UNUSED)
 {
+  unsigned elements;
+
   switch (type_of_cost)
     {
       case scalar_stmt:
@@ -524,6 +526,11 @@ default_builtin_vectorization_cost (enum vect_cost
       case cond_branch_taken:
         return 3;
 
+      case vec_construct:
+	elements = TYPE_VECTOR_SUBPARTS (vectype);
+	gcc_assert (elements > 1);
+	return elements / 2 + 1;
+
       default:
         gcc_unreachable ();
     }
Index: gcc/target.h
===================================================================
--- gcc/target.h	(revision 188482)
+++ gcc/target.h	(working copy)
@@ -146,7 +146,8 @@ enum vect_cost_for_stmt
   cond_branch_not_taken,
   cond_branch_taken,
   vec_perm,
-  vec_promote_demote
+  vec_promote_demote,
+  vec_construct
 };
 
 /* The target structure.  This holds all the backend hooks.  */
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	(revision 188482)
+++ gcc/tree-vect-stmts.c	(working copy)
@@ -1031,11 +1031,13 @@ vect_model_load_cost (stmt_vec_info stmt_info, int
   /* The loads themselves.  */
   if (STMT_VINFO_STRIDE_LOAD_P (stmt_info))
     {
-      /* N scalar loads plus gathering them into a vector.
-         ???  scalar_to_vec isn't the cost for that.  */
+      /* N scalar loads plus gathering them into a vector.  */
+      tree vectype = STMT_VINFO_VECTYPE (stmt_info);
       inside_cost += (vect_get_stmt_cost (scalar_load) * ncopies
-		      * TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info)));
-      inside_cost += ncopies * vect_get_stmt_cost (scalar_to_vec);
+		      * TYPE_VECTOR_SUBPARTS (vectype));
+      inside_cost += ncopies
+	* targetm.vectorize.builtin_vectorization_cost (vec_construct,
+							vectype, 0);
     }
   else
     vect_get_load_cost (first_dr, ncopies,
Index: gcc/config/spu/spu.c
===================================================================
--- gcc/config/spu/spu.c	(revision 188482)
+++ gcc/config/spu/spu.c	(working copy)
@@ -6908,9 +6908,11 @@ spu_builtin_mask_for_load (void)
 /* Implement targetm.vectorize.builtin_vectorization_cost.  */
 static int 
 spu_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
-                                tree vectype ATTRIBUTE_UNUSED,
+                                tree vectype,
                                 int misalign ATTRIBUTE_UNUSED)
 {
+  unsigned elements;
+
   switch (type_of_cost)
     {
       case scalar_stmt:
@@ -6937,6 +6939,11 @@ spu_builtin_vectorization_cost (enum vect_cost_for
       case cond_branch_taken:
         return 6;
 
+      case vec_construct:
+	elements = TYPE_VECTOR_SUBPARTS (vectype);
+	gcc_assert (elements > 1);
+	return elements / 2 + 1;
+
       default:
         gcc_unreachable ();
     }
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 188482)
+++ gcc/config/i386/i386.c	(working copy)
@@ -36072,9 +36072,11 @@ static const struct attribute_spec ix86_attribute_
 /* Implement targetm.vectorize.builtin_vectorization_cost.  */
 static int
 ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
-                                 tree vectype ATTRIBUTE_UNUSED,
+                                 tree vectype,
                                  int misalign ATTRIBUTE_UNUSED)
 {
+  unsigned elements;
+
   switch (type_of_cost)
     {
       case scalar_stmt:
@@ -36115,6 +36117,11 @@ ix86_builtin_vectorization_cost (enum vect_cost_fo
       case vec_promote_demote:
         return ix86_cost->vec_stmt_cost;
 
+      case vec_construct:
+	elements = TYPE_VECTOR_SUBPARTS (vectype);
+	gcc_assert (elements > 1);
+	return elements / 2 + 1;
+
       default:
         gcc_unreachable ();
     }
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 188482)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -3405,6 +3405,7 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
                                    tree vectype, int misalign)
 {
   unsigned elements;
+  tree elem_type;
 
   switch (type_of_cost)
     {
@@ -3504,6 +3505,19 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
 
         return 2;
 
+      case vec_construct:
+	elements = TYPE_VECTOR_SUBPARTS (vectype);
+	elem_type = TREE_TYPE (vectype);
+	gcc_assert (elements > 1);
+	/* 32-bit vectors loaded into registers are stored as double
+	   precision, so we need n/2 converts in addition to the usual
+	   n/2 merges to construct a vector of short floats from them.  */
+	if (SCALAR_FLOAT_TYPE_P (elem_type)
+	    && TYPE_PRECISION (elem_type) == 32)
+	  return elements + 1;
+	else
+	  return elements / 2 + 1;
+
       default:
         gcc_unreachable ();
     }