Subject: Re: [RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR.
From: Jeff Law
To: Ilya Enkovich, gcc-patches@gcc.gnu.org
Date: Thu, 20 Aug 2015 18:46:00 -0000
Message-ID: <55D61F54.1090802@redhat.com>
In-Reply-To: <20150817162247.GA12565@msticlxl57.ims.intel.com>

On 08/17/2015 10:22 AM, Ilya Enkovich wrote:
> Hi,
>
> This patch starts a series introducing scalar mask support in the
> vectorizer.  It was discussed at the recent Cauldron, and an overview of
> the changes is available here:
> https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=view&target=Vectorization+for+Intel+AVX-512.pdf.
> In short, the series introduces the following changes:
>
> - Add a new tree expr to produce scalar masks in vectorized code
> - Fix up if-conversion to use bool predicates instead of integer masks
> - Disable some bool patterns to avoid bool to int conversion where masks can be used
> - Support bool operands in vectorization factor computation
> - Support scalar masks in MASK_LOAD, MASK_STORE and VEC_COND_EXPR by adding new optabs
> - Support vectorization for statements which are now not transformed by bool patterns
> - Add target support (hooks, optabs, expands)
>
> This patch introduces the GEN_MASK_EXPR code.  Initially I wanted to use
> a comparison as its operand, mapping it directly onto an AVX-512
> comparison instruction.  But the feedback was to simplify the new code's
> semantics and use it for converting vectors into scalar masks.
> Therefore, if we want to compare two vectors into a scalar mask, we use
> two statements:
>
> vect.18_87 = vect__5.13_81 > vect__6.16_86;
> mask__ifc__23.17_88 = GEN_MASK <vect.18_87>;
>
> Trying it in practice, I found it produces worse code.  The problem is
> that on the target the first comparison is expanded into two
> instructions: a compare producing a mask plus a masked move to get a
> vector.  GEN_MASK is then expanded into another comparison with a zero
> vector.  Thus I get two comparisons plus a move instead of a single
> comparison, and I have to optimize this away on the target side (the
> current optimizers can't handle it).
> That's actually what I wanted to avoid.  For now I changed
> GEN_MASK_EXPR to take a vector value as its operand, but didn't change
> the expand pattern, which has four operands: two vectors to compare, a
> comparison operator, and a result.  On expand I try to detect whether
> GEN_MASK uses the result of a comparison and thus avoid generating a
> double comparison.
>
> The patch series is not fully finished yet.  I still have several type
> conversion tests that are not vectorized, and it hasn't been widely
> tested.  That's what I'm working on now.
>
> I will be glad to get any comments.
>
> Thanks,
> Ilya
> --
> 2015-08-17  Ilya Enkovich
>
> 	* expr.c (expand_expr_real_2): Support GEN_MASK_EXPR.
> 	* gimple-pretty-print.c (dump_unary_rhs): Likewise.
> 	* gimple.c (get_gimple_rhs_num_ops): Likewise.
> 	* optabs.c: Include gimple.h.
> 	(vector_compare_rtx): Add OPNO arg.
> 	(get_gen_mask_icode): New.
> 	(expand_gen_mask_expr_p): New.
> 	(expand_gen_mask_expr): New.
> 	(expand_vec_cond_expr): Adjust vector_compare_rtx call.
> 	* optabs.def (gen_mask_optab): New.
> 	(gen_masku_optab): New.
> 	* optabs.h (expand_gen_mask_expr_p): New.
> 	(expand_gen_mask_expr): New.
> 	* tree-cfg.c (verify_gimple_assign_unary): Support GEN_MASK_EXPR.
> 	* tree-inline.c (estimate_operator_cost): Likewise.
> 	* tree-pretty-print.c (dump_generic_node): Likewise.
> 	* tree-ssa-operands.c (get_expr_operands): Likewise.
> 	* tree.def (GEN_MASK_EXPR): New.

A general question: would any of this likely help Yuri's work to
optimize MASK_STORES?

>
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index a6ca706..bf466ca 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "recog.h"
>  #include "reload.h"
>  #include "target.h"
> +#include "gimple.h"

Hmm, part of me doesn't want to see optabs.c depending on gimple.h.
How painful would it be to have this stuff live in expr.c?

> +
> +/* Generate insns for a GEN_MASK_EXPR, given its TYPE and operand.  */
> +
> +rtx
> +expand_gen_mask_expr (tree type, tree op0, rtx target)
> +{
> +  struct expand_operand ops[4];
> +  enum insn_code icode;
> +  rtx comparison;
> +  machine_mode mode = TYPE_MODE (type);
> +  machine_mode cmp_op_mode;
> +  bool unsignedp;
> +  tree op0a, op0b;
> +  enum tree_code tcode;
> +  gimple def_stmt;
> +
> +  /* Avoid double comparison.  */
> +  if (TREE_CODE (op0) == SSA_NAME
> +      && (def_stmt = SSA_NAME_DEF_STMT (op0))
> +      && is_gimple_assign (def_stmt)
> +      && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt)) == tcc_comparison)
> +    {
> +      op0a = gimple_assign_rhs1 (def_stmt);
> +      op0b = gimple_assign_rhs2 (def_stmt);
> +      tcode = gimple_assign_rhs_code (def_stmt);
> +    }
> +  else
> +    {
> +      op0a = op0;
> +      op0b = build_zero_cst (TREE_TYPE (op0));
> +      tcode = NE_EXPR;
> +    }
> +
> +  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
> +  cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
> +
> +  gcc_assert (GET_MODE_BITSIZE (mode) >= GET_MODE_NUNITS (cmp_op_mode));
> +
> +  icode = get_gen_mask_icode (cmp_op_mode, unsignedp);
> +  if (icode == CODE_FOR_nothing)
> +    return 0;

So if the target doesn't have suitable insns, what happens?  I suspect
the answer is nothing useful.  In which case the question becomes what
prevents the optimizers from generating a GEN_MASK_EXPR?  Maybe that'll
become clear as I read the rest of the patches.
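(For context: the ChangeLog above adds an expand_gen_mask_expr_p
predicate to optabs.h, so presumably the producer of GEN_MASK_EXPR is
expected to query target support up front.  A minimal sketch of such a
guard is below; the helper name and the predicate's exact signature are
assumptions here, not taken from the patch.)

  /* Sketch only: gate GEN_MASK_EXPR creation on target support.  The
     argument of expand_gen_mask_expr_p is assumed to be the vector type
     being compared; the ChangeLog only lists the declaration.  */

  static bool
  scalar_mask_supported_p (tree vec_type)
  {
    /* Without a gen_mask<mode>/gen_masku<mode> pattern for this vector
       mode, keep the existing vector-mask code path instead of emitting
       GEN_MASK_EXPR.  */
    return expand_gen_mask_expr_p (vec_type);
  }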
>
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index e1ceea4..052c055 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -3850,6 +3850,7 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
>      case COMPLEX_EXPR:
>      case PAREN_EXPR:
>      case VIEW_CONVERT_EXPR:
> +    case GEN_MASK_EXPR:
>        return 0;

That seems wrong to me :-)

Jeff
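For illustration, a sketch of the non-free alternative hinted at above:
since GEN_MASK_EXPR is expected to expand to at least a vector compare,
it would be costed like an ordinary operation rather than sit in the
zero-cost group.  The placement in the switch and the weight of 1 are
assumptions, not part of the patch.

      /* GEN_MASK_EXPR expands to at least a compare instruction, so give
         it the default single-instruction weight instead of treating it
         as free.  (Sketch only; not from the patch.)  */
      case GEN_MASK_EXPR:
        return 1;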