From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 124434 invoked by alias); 17 Aug 2015 16:23:13 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 124424 invoked by uid 89); 17 Aug 2015 16:23:12 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.6 required=5.0 tests=AWL,BAYES_40,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-wi0-f173.google.com Received: from mail-wi0-f173.google.com (HELO mail-wi0-f173.google.com) (209.85.212.173) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Mon, 17 Aug 2015 16:23:09 +0000 Received: by wibhh20 with SMTP id hh20so85706146wib.0 for ; Mon, 17 Aug 2015 09:23:06 -0700 (PDT) X-Received: by 10.194.92.236 with SMTP id cp12mr4091465wjb.139.1439828586677; Mon, 17 Aug 2015 09:23:06 -0700 (PDT) Received: from msticlxl57.ims.intel.com ([192.55.54.40]) by smtp.gmail.com with ESMTPSA id c7sm22546686wjb.19.2015.08.17.09.23.04 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 17 Aug 2015 09:23:06 -0700 (PDT) Date: Mon, 17 Aug 2015 16:25:00 -0000 From: Ilya Enkovich To: gcc-patches@gcc.gnu.org Subject: [RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR. Message-ID: <20150817162247.GA12565@msticlxl57.ims.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-IsSubscribed: yes X-SW-Source: 2015-08/txt/msg00910.txt.bz2 Hi, This patch starts a series introducing scalar masks support in the vectorizer. It was discussed on the recent Cauldron and changes overiew is available here: https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=view&target=Vectorization+for+Intel+AVX-512.pdf. Here is shortly a list of changes introduced by this series: - Add new tree expr to produce scalar masks in a vectorized code - Fix-up if-conversion to use bool predicates instead of integer masks - Disable some bool patterns to avoid bool to int conversion where masks can be used - Support bool operands in vectorization factor computation - Support scalar masks in MASK_LOAD, MASK_STORE and VEC_COND_EXPR by adding new optabs - Support vectorization for statements which are now not transformed by bool patterns - Add target support (hooks, optabs, expands) This patch introduces GEN_MASK_EXPR code. Intitially I wanted to use a comparison as an operand for it directly mapping it into AVX-512 comparison instruction. But a feedback was to simplify new code's semantics and use it for converting vectors into scalar masks. Therefore if we want to compare two vectors into a scalar masks we use two statements: vect.18_87 = vect__5.13_81 > vect__6.16_86; mask__ifc__23.17_88 = GEN_MASK ; Trying it in practice I found it producing worse code. The problem is that on target first comparison is expanded into two instructions: cmp with mask result + masked move to get a vector. GEN_MASK is then expanded into another comparison with zero vector. Thus I get two comparisons + move instead of a single comparison and have to optimize this out on a target side (current optimizers can't handle it). That's actually what I wanted to avoid. For now I changed GEN_MASK_EXPR to get a vector value as an operand but didn't change expand pattern which has four opernads: two vectors to compare + cmp operator + result. On expand I try to detect GEN_MASK uses a result of comparison and thus avoid double comparison generation. Patch series is not actually fully finished yet. I still have several type conversion tests not being vectorized and it wasn't widely tested. That's what I'm working on now. Will be glad to any comments. Thanks, Ilya -- 2015-08-17 Ilya Enkovich * expr.c (expand_expr_real_2): Support GEN_MASK_EXPR. * gimple-pretty-print.c (dump_unary_rhs): Likewise. * gimple.c (get_gimple_rhs_num_ops): Likewise. * optabs.c: Include gimple.h. (vector_compare_rtx): Add OPNO arg. (get_gen_mask_icode): New. (expand_gen_mask_expr_p): New. (expand_gen_mask_expr): New. (expand_vec_cond_expr): Adjust vector_compare_rtx call. * optabs.def (gen_mask_optab): New. (gen_masku_optab): New. * optabs.h (expand_gen_mask_expr_p): New. (expand_gen_mask_expr): New. * tree-cfg.c (verify_gimple_assign_unary): Support GEN_MASK_EXPR. * tree-inline.c (estimate_operator_cost): Likewise. * tree-pretty-print.c (dump_generic_node): Likewise. * tree-ssa-operands.c (get_expr_operands): Likewise. * tree.def (GEN_MASK_EXPR): New. diff --git a/gcc/expr.c b/gcc/expr.c index 31b4573..8af5926 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -9180,6 +9180,10 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode, return temp; } + case GEN_MASK_EXPR: + target = expand_gen_mask_expr (type, treeop0, target); + return target; + case VEC_COND_EXPR: target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target); return target; diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c index 53900dd..ac25b79 100644 --- a/gcc/gimple-pretty-print.c +++ b/gcc/gimple-pretty-print.c @@ -300,6 +300,12 @@ dump_unary_rhs (pretty_printer *buffer, gassign *gs, int spc, int flags) pp_greater (buffer); break; + case GEN_MASK_EXPR: + pp_string (buffer, "GEN_MASK <"); + dump_generic_node (buffer, rhs, spc, flags, false); + pp_greater (buffer); + break; + default: if (TREE_CODE_CLASS (rhs_code) == tcc_declaration || TREE_CODE_CLASS (rhs_code) == tcc_constant diff --git a/gcc/gimple.c b/gcc/gimple.c index cca328a..93caf01 100644 --- a/gcc/gimple.c +++ b/gcc/gimple.c @@ -2005,7 +2005,8 @@ get_gimple_rhs_num_ops (enum tree_code code) : ((SYM) == TRUTH_AND_EXPR \ || (SYM) == TRUTH_OR_EXPR \ || (SYM) == TRUTH_XOR_EXPR) ? GIMPLE_BINARY_RHS \ - : (SYM) == TRUTH_NOT_EXPR ? GIMPLE_UNARY_RHS \ + : ((SYM) == TRUTH_NOT_EXPR \ + || (SYM) == GEN_MASK_EXPR) ? GIMPLE_UNARY_RHS \ : ((SYM) == COND_EXPR \ || (SYM) == WIDEN_MULT_PLUS_EXPR \ || (SYM) == WIDEN_MULT_MINUS_EXPR \ diff --git a/gcc/optabs.c b/gcc/optabs.c index a6ca706..bf466ca 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -51,6 +51,7 @@ along with GCC; see the file COPYING3. If not see #include "recog.h" #include "reload.h" #include "target.h" +#include "gimple.h" struct target_optabs default_target_optabs; struct target_libfuncs default_target_libfuncs; @@ -6488,11 +6489,13 @@ get_rtx_code (enum tree_code tcode, bool unsignedp) } /* Return comparison rtx for COND. Use UNSIGNEDP to select signed or - unsigned operators. Do not generate compare instruction. */ + unsigned operators. OPNO holds an index of the first comparison + operand in insn with code ICODE. Do not generate compare instruction. */ static rtx vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1, - bool unsignedp, enum insn_code icode) + bool unsignedp, enum insn_code icode, + unsigned int opno) { struct expand_operand ops[2]; rtx rtx_op0, rtx_op1; @@ -6518,7 +6521,7 @@ vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1, create_input_operand (&ops[0], rtx_op0, m0); create_input_operand (&ops[1], rtx_op1, m1); - if (!maybe_legitimize_operands (icode, 4, 2, ops)) + if (!maybe_legitimize_operands (icode, opno, 2, ops)) gcc_unreachable (); return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value); } @@ -6789,6 +6792,77 @@ expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target) return tmp; } +/* Return insn code for a comparison operator with MODE, + unsigned if UNS is true. */ + +static inline enum insn_code +get_gen_mask_icode (machine_mode mode, bool uns) +{ + optab tab = uns ? gen_masku_optab : gen_mask_optab; + return optab_handler (tab, mode); +} + +/* Return TRUE if appropriate vector insn is available + for vector comparison expr with vector type VALUE_TYPE. */ + +bool +expand_gen_mask_expr_p (tree value_type) +{ + enum insn_code icode = get_gen_mask_icode (TYPE_MODE (value_type), + TYPE_UNSIGNED (value_type)); + return (icode != CODE_FOR_nothing); +} + +/* Generate insns for a GEN_MASK_EXPR, given its TYPE and operand. */ + +rtx +expand_gen_mask_expr (tree type, tree op0, rtx target) +{ + struct expand_operand ops[4]; + enum insn_code icode; + rtx comparison; + machine_mode mode = TYPE_MODE (type); + machine_mode cmp_op_mode; + bool unsignedp; + tree op0a, op0b; + enum tree_code tcode; + gimple def_stmt; + + /* Avoid double comparison. */ + if (TREE_CODE (op0) == SSA_NAME + && (def_stmt = SSA_NAME_DEF_STMT (op0)) + && is_gimple_assign (def_stmt) + && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt)) == tcc_comparison) + { + op0a = gimple_assign_rhs1 (def_stmt); + op0b = gimple_assign_rhs2 (def_stmt); + tcode = gimple_assign_rhs_code (def_stmt); + } + else + { + op0a = op0; + op0b = build_zero_cst (TREE_TYPE (op0)); + tcode = NE_EXPR; + } + + unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a)); + cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a)); + + gcc_assert (GET_MODE_BITSIZE (mode) >= GET_MODE_NUNITS (cmp_op_mode)); + + icode = get_gen_mask_icode (cmp_op_mode, unsignedp); + if (icode == CODE_FOR_nothing) + return 0; + + comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 2); + create_output_operand (&ops[0], target, mode); + create_fixed_operand (&ops[1], comparison); + create_fixed_operand (&ops[2], XEXP (comparison, 0)); + create_fixed_operand (&ops[3], XEXP (comparison, 1)); + expand_insn (icode, 4, ops); + return ops[0].value; +} + /* Return insn code for a conditional operator with a comparison in mode CMODE, unsigned if UNS is true, resulting in a value of mode VMODE. */ @@ -6861,7 +6935,7 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2, if (icode == CODE_FOR_nothing) return 0; - comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode); + comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 4); rtx_op1 = expand_normal (op1); rtx_op2 = expand_normal (op2); diff --git a/gcc/optabs.def b/gcc/optabs.def index 888b21c..dc664d1 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -266,6 +266,8 @@ OPTAB_D (usad_optab, "usad$I$a") OPTAB_D (ssad_optab, "ssad$I$a") OPTAB_D (maskload_optab, "maskload$a") OPTAB_D (maskstore_optab, "maskstore$a") +OPTAB_D (gen_mask_optab, "gen_mask$a") +OPTAB_D (gen_masku_optab, "gen_masku$a") OPTAB_D (vec_extract_optab, "vec_extract$a") OPTAB_D (vec_init_optab, "vec_init$a") OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a") diff --git a/gcc/optabs.h b/gcc/optabs.h index 95f5cbc..d7b7fb0 100644 --- a/gcc/optabs.h +++ b/gcc/optabs.h @@ -495,6 +495,12 @@ extern bool can_vec_perm_p (machine_mode, bool, const unsigned char *); /* Generate code for VEC_PERM_EXPR. */ extern rtx expand_vec_perm (machine_mode, rtx, rtx, rtx, rtx); +/* Return tree if target supports vector operations for comparison. */ +extern bool expand_gen_mask_expr_p (tree); + +/* Generate code for GEN_MASK_EXPR. */ +extern rtx expand_gen_mask_expr (tree, tree, rtx); + /* Return tree if target supports vector operations for COND_EXPR. */ bool expand_vec_cond_expr_p (tree, tree); diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c index 588ab69..5bae411 100644 --- a/gcc/tree-cfg.c +++ b/gcc/tree-cfg.c @@ -3646,6 +3646,16 @@ verify_gimple_assign_unary (gassign *stmt) case CONJ_EXPR: break; + case GEN_MASK_EXPR: + if (!INTEGRAL_TYPE_P (lhs_type) || !VECTOR_TYPE_P (rhs1_type)) + { + error ("invalid types in gen_mask"); + debug_generic_expr (lhs_type); + debug_generic_expr (rhs1_type); + return true; + } + return false; + default: gcc_unreachable (); } diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c index e1ceea4..052c055 100644 --- a/gcc/tree-inline.c +++ b/gcc/tree-inline.c @@ -3850,6 +3850,7 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights, case COMPLEX_EXPR: case PAREN_EXPR: case VIEW_CONVERT_EXPR: + case GEN_MASK_EXPR: return 0; /* Assign cost of 1 to usual operations. diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c index 7cd1fe7..5b6cdf0 100644 --- a/gcc/tree-pretty-print.c +++ b/gcc/tree-pretty-print.c @@ -2485,6 +2485,12 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, int flags, pp_greater (pp); break; + case GEN_MASK_EXPR: + pp_string (pp, " GEN_MASK_EXPR < "); + dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); + pp_string (pp, " > "); + break; + case VEC_COND_EXPR: pp_string (pp, " VEC_COND_EXPR < "); dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); diff --git a/gcc/tree-ssa-operands.c b/gcc/tree-ssa-operands.c index b1e3f99..7151ad1 100644 --- a/gcc/tree-ssa-operands.c +++ b/gcc/tree-ssa-operands.c @@ -837,6 +837,7 @@ get_expr_operands (struct function *fn, gimple stmt, tree *expr_p, int flags) gimple_set_has_volatile_ops (stmt, true); /* FALLTHRU */ + case GEN_MASK_EXPR: case VIEW_CONVERT_EXPR: do_unary: get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 0), flags); diff --git a/gcc/tree.def b/gcc/tree.def index 56580af..42984ca 100644 --- a/gcc/tree.def +++ b/gcc/tree.def @@ -536,6 +536,10 @@ DEFTREECODE (TARGET_EXPR, "target_expr", tcc_expression, 4) 1 and 2 are NULL. The operands are then taken from the cfg edges. */ DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3) +/* Generate a scalar bitmask for a vector value. Mask gets bits set + for nonzero vector elements. */ +DEFTREECODE (GEN_MASK_EXPR, "gen_mask_expr", tcc_expression, 1) + /* Vector conditional expression. It is like COND_EXPR, but with vector operands.