Date: Wed, 13 Jun 2012 09:32:00 -0000
From: Richard Guenther <rguenther@suse.de>
To: "William J. Schmidt" <wschmidt@linux.vnet.ibm.com>
Cc: gcc-patches@gcc.gnu.org, bergner@vnet.ibm.com
Subject: Re: [PATCH, RFC] First cut at using vec_construct for strided loads
In-Reply-To: <1339553936.18291.15.camel@gnopaine>
Message-ID: <Pine.LNX.4.64.1206131124440.29541@jbgna.fhfr.qr>
References: <1339553936.18291.15.camel@gnopaine>

On Tue, 12 Jun 2012, William J. Schmidt wrote:

> This patch is a follow-up to the discussion generated by
> http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00546.html.  I've added
> vec_construct to the cost model for use in vect_model_load_cost, and
> implemented a cost calculation that makes sense to me for PowerPC.  I'm
> less certain about the default, i386, and spu implementations.  I took a
> guess at i386 from the discussions we had, and used the same calculation
> for the default and for spu.  I'm hoping you or others can fill in the
> blanks if I guessed badly.
>
> The i386 cost for vec_construct is different from all the others, which
> are parameterized for each processor description.  This should probably
> be parameterized in some way as well, but thought you'd know better than
> I how that should be.  Perhaps instead of
>
> 	elements / 2 + 1
>
> it should be
>
> 	(elements / 2) * X + Y
>
> where X and Y are taken from the processor description, and represent
> the cost of a merge and a permute, respectively.  Let me know what you
> think.

Looks good to me with the gcc_asserts removed - TYPE_VECTOR_SUBPARTS
might be 1 for V1TImode for example (heh, not that the vectorizer would
vectorize to that).  But I don't see any possible breakage with
elements == 1, do you?

Target maintainers can improve on the cost calculation if they wish,
the default looks sensible to me.
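
A minimal standalone sketch of the two formulas under discussion, in plain C rather than GCC internals: `default_vec_construct_cost` mirrors the `elements / 2 + 1` return value in the patch, and `param_vec_construct_cost` is a hypothetical version of the `(elements / 2) * X + Y` suggestion, where `merge_cost` (X) and `permute_cost` (Y) are assumed per-processor inputs, not fields of any existing processor description. With `elements == 1` both simply return the fixed term, which is consistent with the view that dropping the `gcc_assert` breaks nothing.

```c
/* Sketch only: these helpers are illustrative and are not part of GCC.  */

/* Formula from the patch: about one merge per pair of elements,
   plus one extra operation.  Integer division truncates, so an odd
   ELEMENTS count rounds the merge term down.  */
int
default_vec_construct_cost (unsigned elements)
{
  return elements / 2 + 1;
}

/* Hypothetical parameterized variant: MERGE_COST plays the role of X
   and PERMUTE_COST the role of Y.  Both are assumed inputs that a
   processor cost table would have to supply.  */
int
param_vec_construct_cost (unsigned elements, int merge_cost,
                          int permute_cost)
{
  return (int) (elements / 2) * merge_cost + permute_cost;
}
```

For example, `default_vec_construct_cost (4)` is 3, `default_vec_construct_cost (1)` is 1, and `param_vec_construct_cost (8, 2, 3)` is 11; choosing X = Y = 1 reproduces the default formula exactly.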

Thanks,
Richard.

> Thanks,
> Bill
>
>
> 2012-06-12  Bill Schmidt  <wschmidt@linux.ibm.com>
>
> 	* targhooks.c (default_builtin_vectorization_cost): Handle
> 	vec_construct, using vectype to base cost on subparts.
> 	* target.h (enum vect_cost_for_stmt): Add vec_construct.
> 	* tree-vect-stmts.c (vect_model_load_cost): Use vec_construct
> 	instead of scalar_to_vec.
> 	* config/spu/spu.c (spu_builtin_vectorization_cost): Handle
> 	vec_construct in same way as default for now.
> 	* config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
> 	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
> 	Handle vec_construct, including special case for 32-bit loads.
>
>
> Index: gcc/targhooks.c
> ===================================================================
> --- gcc/targhooks.c	(revision 188482)
> +++ gcc/targhooks.c	(working copy)
> @@ -499,9 +499,11 @@ default_builtin_vectorized_conversion (unsigned in
>  
>  int
>  default_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
> -                                    tree vectype ATTRIBUTE_UNUSED,
> +                                    tree vectype,
>                                      int misalign ATTRIBUTE_UNUSED)
>  {
> +  unsigned elements;
> +
>    switch (type_of_cost)
>      {
>        case scalar_stmt:
> @@ -524,6 +526,11 @@ default_builtin_vectorization_cost (enum vect_cost
>        case cond_branch_taken:
>          return 3;
>  
> +      case vec_construct:
> +	elements = TYPE_VECTOR_SUBPARTS (vectype);
> +	gcc_assert (elements > 1);
> +	return elements / 2 + 1;
> +
>        default:
>          gcc_unreachable ();
>      }
> Index: gcc/target.h
> ===================================================================
> --- gcc/target.h	(revision 188482)
> +++ gcc/target.h	(working copy)
> @@ -146,7 +146,8 @@ enum vect_cost_for_stmt
>    cond_branch_not_taken,
>    cond_branch_taken,
>    vec_perm,
> -  vec_promote_demote
> +  vec_promote_demote,
> +  vec_construct
>  };
>  
>  /* The target structure.  This holds all the backend hooks.  */
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c	(revision 188482)
> +++ gcc/tree-vect-stmts.c	(working copy)
> @@ -1031,11 +1031,13 @@ vect_model_load_cost (stmt_vec_info stmt_info, int
>    /* The loads themselves.  */
>    if (STMT_VINFO_STRIDE_LOAD_P (stmt_info))
>      {
> -      /* N scalar loads plus gathering them into a vector.
> -         ???  scalar_to_vec isn't the cost for that.  */
> +      /* N scalar loads plus gathering them into a vector.  */
> +      tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>        inside_cost += (vect_get_stmt_cost (scalar_load) * ncopies
> -		      * TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info)));
> -      inside_cost += ncopies * vect_get_stmt_cost (scalar_to_vec);
> +		      * TYPE_VECTOR_SUBPARTS (vectype));
> +      inside_cost += ncopies
> +	* targetm.vectorize.builtin_vectorization_cost (vec_construct,
> +							vectype, 0);
>      }
>    else
>      vect_get_load_cost (first_dr, ncopies,
> Index: gcc/config/spu/spu.c
> ===================================================================
> --- gcc/config/spu/spu.c	(revision 188482)
> +++ gcc/config/spu/spu.c	(working copy)
> @@ -6908,9 +6908,11 @@ spu_builtin_mask_for_load (void)
>  /* Implement targetm.vectorize.builtin_vectorization_cost.  */
>  static int 
>  spu_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
> -                                tree vectype ATTRIBUTE_UNUSED,
> +                                tree vectype,
>                                  int misalign ATTRIBUTE_UNUSED)
>  {
> +  unsigned elements;
> +
>    switch (type_of_cost)
>      {
>        case scalar_stmt:
> @@ -6937,6 +6939,11 @@ spu_builtin_vectorization_cost (enum vect_cost_for
>        case cond_branch_taken:
>          return 6;
>  
> +      case vec_construct:
> +	elements = TYPE_VECTOR_SUBPARTS (vectype);
> +	gcc_assert (elements > 1);
> +	return elements / 2 + 1;
> +
>        default:
>          gcc_unreachable ();
>      }
> Index: gcc/config/i386/i386.c
> ===================================================================
> --- gcc/config/i386/i386.c	(revision 188482)
> +++ gcc/config/i386/i386.c	(working copy)
> @@ -36072,9 +36072,11 @@ static const struct attribute_spec ix86_attribute_
>  /* Implement targetm.vectorize.builtin_vectorization_cost.  */
>  static int
>  ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
> -                                 tree vectype ATTRIBUTE_UNUSED,
> +                                 tree vectype,
>                                   int misalign ATTRIBUTE_UNUSED)
>  {
> +  unsigned elements;
> +
>    switch (type_of_cost)
>      {
>        case scalar_stmt:
> @@ -36115,6 +36117,11 @@ ix86_builtin_vectorization_cost (enum vect_cost_fo
>        case vec_promote_demote:
>          return ix86_cost->vec_stmt_cost;
>  
> +      case vec_construct:
> +	elements = TYPE_VECTOR_SUBPARTS (vectype);
> +	gcc_assert (elements > 1);
> +	return elements / 2 + 1;
> +
>        default:
>          gcc_unreachable ();
>      }
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c	(revision 188482)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -3405,6 +3405,7 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
>                                     tree vectype, int misalign)
>  {
>    unsigned elements;
> +  tree elem_type;
>  
>    switch (type_of_cost)
>      {
> @@ -3504,6 +3505,19 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
>  
>          return 2;
>  
> +      case vec_construct:
> +	elements = TYPE_VECTOR_SUBPARTS (vectype);
> +	elem_type = TREE_TYPE (vectype);
> +	gcc_assert (elements > 1);
> +	/* 32-bit vectors loaded into registers are stored as double
> +	   precision, so we need n/2 converts in addition to the usual
> +	   n/2 merges to construct a vector of short floats from them.  */
> +	if (SCALAR_FLOAT_TYPE_P (elem_type)
> +	    && TYPE_PRECISION (elem_type) == 32)
> +	  return elements + 1;
> +	else
> +	  return elements / 2 + 1;
> +
>        default:
>          gcc_unreachable ();
>      }
>
>
>
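
To see how the tree-vect-stmts.c and rs6000.c hunks combine, here is a standalone C model of the arithmetic (a sketch, not GCC code). `struct vectype_info` is a hypothetical stand-in for the `TYPE_VECTOR_SUBPARTS`, `SCALAR_FLOAT_TYPE_P` and `TYPE_PRECISION` queries the hook makes, and the scalar load cost is passed in as a parameter rather than obtained from `vect_get_stmt_cost (scalar_load)`.

```c
#include <stdbool.h>

/* Hypothetical summary of a vector type: only the properties the
   rs6000 vec_construct case in the patch looks at.  */
struct vectype_info
{
  unsigned subparts;       /* like TYPE_VECTOR_SUBPARTS (vectype) */
  bool elem_is_float;      /* like SCALAR_FLOAT_TYPE_P (TREE_TYPE (vectype)) */
  unsigned elem_precision; /* like TYPE_PRECISION (TREE_TYPE (vectype)) */
};

/* Mirrors the rs6000 vec_construct case: 32-bit floats need n/2
   converts on top of the n/2 merges, giving n + 1; everything else
   uses the default n/2 + 1.  */
int
rs6000_like_vec_construct_cost (const struct vectype_info *v)
{
  unsigned elements = v->subparts;

  if (v->elem_is_float && v->elem_precision == 32)
    return elements + 1;
  return elements / 2 + 1;
}

/* Mirrors the new strided-load branch of vect_model_load_cost:
   NCOPIES * SUBPARTS scalar loads, plus one vec_construct per copy.
   SCALAR_LOAD_COST is an assumed input.  */
int
strided_load_inside_cost (int ncopies, int scalar_load_cost,
                          const struct vectype_info *v)
{
  int inside_cost = 0;

  inside_cost += scalar_load_cost * ncopies * (int) v->subparts;
  inside_cost += ncopies * rs6000_like_vec_construct_cost (v);
  return inside_cost;
}
```

With one copy and an assumed scalar load cost of 1, a four-element single-precision vector costs 4 loads + 5 = 9, while a four-element 32-bit integer vector costs 4 + 3 = 7; the difference is the n/2 converts the rs6000 comment describes.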

-- 
Richard Guenther <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
