From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Sender: gcc-patches-owner@gcc.gnu.org
From: "Jon Beniston"
Subject: [RFC, vectorizer] Allow single element vector types for vector reduction operations
Date: Mon, 28 Aug 2017 08:22:00 -0000
Message-ID: <015b01d31f6b$c3651620$4a2f4260$@beniston.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_015C_01D31F74.252A1A60"
X-SW-Source: 2017-08/txt/msg01556.txt.bz2

This is a multipart message in MIME format.

------=_NextPart_000_015C_01D31F74.252A1A60
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi,

I have an out-of-tree GCC port that is struggling to support
auto-vectorization with some dot product instructions.  For example, I
have an instruction that takes three operands, all of which are 32-bit
general registers.  The second and third operands are treated as V2HI
and their dot product is computed, producing an SI result that is then
added to the first operand, which is also SI.

I can see there is a dot product recognizer in tree-vect-patterns.c;
however, the following testcase still can't be auto-vectorized on my
port, which implements all the necessary dot product standard patterns.
The testcase can't be auto-vectorized on other targets that have similar
V2HI dot product instructions either, for example ARC.

=== test.c ===
#define K 4
#define M 4
#define N 256
int in[N*K][M];
int out[K];
int coeff[N][M];

void
foo (void)
{
  int i, j, k;
  int sum;

  for (k = 0; k < K; k++)
    {
      sum = 0;
      for (j = 0; j < M; j++)
        for (i = 0; i < N; i++)
          sum += in[i+k][j] * coeff[i][j];
      out[k] = sum;
    }
}
===

The reason the auto-vectorizer doesn't work seems to be that GCC doesn't
support single-element vector types in
get_vectype_for_scalar_type_and_size.

tree-vect-stmts.c: get_vectype_for_scalar_type_and_size
  ...
  if (nunits <= 1)
    return NULL_TREE;

So I think this check should be relaxed to support more cases.
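To make the operation and the rejected case concrete, here is a minimal C sketch. It is only an illustrative model, not GCC code: dot_v2hi, nunits_for, accepted_before and accepted_after are hypothetical helper names, and the 4-byte vector size used in the note afterwards is an assumption matching a 32-bit general register. The two predicates mirror the nunits <= 1 check quoted above and the relaxed condition with the reduct_p flag proposed in the attached patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the instruction described above: A and B are
   each two packed 16-bit lanes (V2HI); the result is the 32-bit (SI)
   accumulator plus their dot product.  */
static int32_t
dot_v2hi (int32_t acc, const int16_t a[2], const int16_t b[2])
{
  return acc + (int32_t) a[0] * b[0] + (int32_t) a[1] * b[1];
}

/* Number of ELEM_BYTES-sized elements that fit in a VEC_BYTES vector.  */
static unsigned
nunits_for (unsigned vec_bytes, unsigned elem_bytes)
{
  return vec_bytes / elem_bytes;
}

/* Models the quoted check: "if (nunits <= 1) return NULL_TREE;".  */
static bool
accepted_before (unsigned nunits)
{
  return nunits > 1;
}

/* Models the proposed check: a single-element vector type is allowed
   only when the caller says it is for a reduction (REDUCT_P).  */
static bool
accepted_after (unsigned nunits, bool reduct_p)
{
  return !(nunits < 1 || (nunits == 1 && !reduct_p));
}
```

With a 4-byte vector, the HI inputs give nunits_for (4, 2) == 2 and are accepted by both checks, while the SI accumulator gives nunits_for (4, 4) == 1 and is accepted only by accepted_after with reduct_p set.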
At the very least, the check should be relaxed for vector reduction
operations, which normally produce a scalar result of a wider type than
the element type of the input operands.

I have tried to make the auto-vectorizer work for my V2HI dot product
case with the attached patch.  Is this the correct approach?

Cheers,
Jon

gcc/
2017-08-27  Jon Beniston

        * tree-vectorizer.h (get_vectype_for_scalar_type): New optional
        parameter declaration.
        * tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Add new
        optional parameter "reduct_p".  Support single element vector types
        if it is true.
        (get_vectype_for_scalar_type): Add new parameter "reduct_p".
        * tree-vect-patterns.c (vect_pattern_recog_1): Pass new parameter
        "reduct_p".
        * tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
        (vect_model_reduction_cost): Likewise.
        (get_initial_def_for_induction): Likewise.
        (vect_create_epilog_for_reduction): Likewise.

------=_NextPart_000_015C_01D31F74.252A1A60
Content-Type: application/octet-stream; name="fix-vec-reduct.patch"
Content-Disposition: attachment; filename="fix-vec-reduct.patch"

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c
+++ gcc/tree-vect-loop.c
@@ -232,7 +232,7 @@
                   dump_printf (MSG_NOTE, "\n");
                 }
 
-              vectype = get_vectype_for_scalar_type (scalar_type);
+              vectype = get_vectype_for_scalar_type (scalar_type, true);
               if (!vectype)
                 {
                   if (dump_enabled_p ())
@@ -465,7 +465,7 @@
                   dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
                   dump_printf (MSG_NOTE, "\n");
                 }
-              vectype = get_vectype_for_scalar_type (scalar_type);
+              vectype = get_vectype_for_scalar_type (scalar_type, true);
               if (!vectype)
                 {
                   if (dump_enabled_p ())
@@ -510,7 +510,7 @@
                       dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
                       dump_printf (MSG_NOTE, "\n");
                     }
-                  vf_vectype = get_vectype_for_scalar_type (scalar_type);
+                  vf_vectype = get_vectype_for_scalar_type (scalar_type, true);
                 }
               if (!vf_vectype)
                 {
@@ -3673,7 +3673,7 @@
 
   reduction_op = get_reduction_op (stmt, reduc_index);
 
-  vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op));
+  vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op), true);
   if (!vectype)
     {
       if (dump_enabled_p ())
@@ -4202,7 +4202,7 @@
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree scalar_type = TREE_TYPE (init_val);
-  tree vectype = get_vectype_for_scalar_type (scalar_type);
+  tree vectype = get_vectype_for_scalar_type (scalar_type, true);
   int nunits;
   enum tree_code code = gimple_assign_rhs_code (stmt);
   tree def_for_init;
@@ -4455,7 +4455,7 @@
 
   reduction_op = get_reduction_op (stmt, reduc_index);
 
-  vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op));
+  vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op), true);
   gcc_assert (vectype);
   mode = TYPE_MODE (vectype);
 
Index: gcc/tree-vect-patterns.c
===================================================================
--- gcc/tree-vect-patterns.c
+++ gcc/tree-vect-patterns.c
@@ -4176,12 +4176,22 @@
       type_in = get_vectype_for_scalar_type (type_in);
       if (!type_in)
         return false;
+
+      tree type_out_backup = type_out;
       if (type_out)
         type_out = get_vectype_for_scalar_type (type_out);
       else
         type_out = type_in;
       if (!type_out)
-        return false;
+        {
+          /* dot_prod is a vector reduction operation for which we want
+             to allow single element vector types. */
+          if (!strcmp (recog_func->name, "dot_prod"))
+            type_out = get_vectype_for_scalar_type (type_out_backup, true);
+
+          if (!type_out)
+            return false;
+        }
       pattern_vectype = type_out;
 
       if (is_gimple_assign (pattern_stmt))
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c
+++ gcc/tree-vect-stmts.c
@@ -8985,7 +8985,8 @@
    by the target. */
 
 static tree
-get_vectype_for_scalar_type_and_size (tree scalar_type, unsigned size)
+get_vectype_for_scalar_type_and_size (tree scalar_type, unsigned size,
+                                      bool reduct_p = false)
 {
   tree orig_scalar_type = scalar_type;
   machine_mode inner_mode = TYPE_MODE (scalar_type);
@@ -9039,7 +9040,7 @@
   else
     simd_mode = mode_for_vector (inner_mode, size / nbytes);
   nunits = GET_MODE_SIZE (simd_mode) / nbytes;
-  if (nunits < 1) /* Support V1SI. */
+  if (nunits < 1 || (nunits == 1 && !reduct_p))
     return NULL_TREE;
 
   vectype = build_vector_type (scalar_type, nunits);
@@ -9065,11 +9066,12 @@
    by the target. */
 
 tree
-get_vectype_for_scalar_type (tree scalar_type)
+get_vectype_for_scalar_type (tree scalar_type, bool reduct_p)
 {
   tree vectype;
   vectype = get_vectype_for_scalar_type_and_size (scalar_type,
-                                                  current_vector_size);
+                                                  current_vector_size,
+                                                  reduct_p);
   if (vectype
       && current_vector_size == 0)
     current_vector_size = GET_MODE_SIZE (TYPE_MODE (vectype));
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h
+++ gcc/tree-vectorizer.h
@@ -1062,7 +1062,7 @@
 
 /* In tree-vect-stmts.c. */
 extern unsigned int current_vector_size;
-extern tree get_vectype_for_scalar_type (tree);
+extern tree get_vectype_for_scalar_type (tree, bool = false);
 extern tree get_mask_type_for_scalar_type (tree);
 extern tree get_same_sized_vectype (tree, tree);
 extern bool vect_is_simple_use (tree, vec_info *, gimple **,

------=_NextPart_000_015C_01D31F74.252A1A60--