Date: Wed, 6 Dec 2023 10:37:11 +0100 (CET)
From: Richard Biener
To: Tamar Christina
Cc: gcc-patches@gcc.gnu.org, nd, jlaw@ventanamicro.com
Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
Message-ID: <5r3p7378-q309-ooqo-7o76-q9r567ns1890@fhfr.qr>
References: <85570n66-1540-0r07-7q80-269p3o133585@fhfr.qr>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Wed, 6 Dec 2023, Tamar Christina wrote:

> > > > +
> > > > +  tree truth_type = truth_type_for (vectype_op);
> > > > +  machine_mode mode = TYPE_MODE (truth_type);
> > > > +  int ncopies;
> > > > +
> > 
> > more line break issues ... (also below, check yourself)
> > 
> > shouldn't STMT_VINFO_VECTYPE already match truth_type here?  If not
> > it looks to be set wrongly (or shouldn't be set at all)
> 
> Fixed, I now leverage the existing vect_recog_bool_pattern to update the
> types if needed and determine the initial type in
> vect_get_vector_types_for_stmt.
> 
> > > > +  if (slp_node)
> > > > +    ncopies = 1;
> > > > +  else
> > > > +    ncopies = vect_get_num_copies (loop_vinfo, truth_type);
> > > > +
> > > > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > > > +
> > 
> > what about with_len?
> 
> Should be easy to add, but don't know how it works.
> 
> > > > +  /* Analyze only.  */
> > > > +  if (!vec_stmt)
> > > > +    {
> > > > +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > > > +        {
> > > > +          if (dump_enabled_p ())
> > > > +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +                             "can't vectorize early exit because the "
> > > > +                             "target doesn't support flag setting vector "
> > > > +                             "comparisons.\n");
> > > > +          return false;
> > > > +        }
> > > > +
> > > > +      if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))
> > 
> > Why NE_EXPR?  This looks wrong.  Or vectype_op is wrong if you're
> > emitting
> > 
> >  mask = op0 CMP op1;
> >  if (mask != 0)
> > 
> > I think you need to check for CMP, not NE_EXPR.
> 
> Well CMP is checked by vectorizable_comparison_1, but I realized this
> check is not checking what I wanted and the cbranch requirements
> already do.  So removed.
> > > > > > > + { > > > > + if (dump_enabled_p ()) > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > > > + "can't vectorize early exit because the " > > > > + "target does not support boolean vector " > > > > + "comparisons for type %T.\n", truth_type); > > > > + return false; > > > > + } > > > > + > > > > + if (ncopies > 1 > > > > + && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing) > > > > + { > > > > + if (dump_enabled_p ()) > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > > > + "can't vectorize early exit because the " > > > > + "target does not support boolean vector OR for " > > > > + "type %T.\n", truth_type); > > > > + return false; > > > > + } > > > > + > > > > + if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi, > > > > + vec_stmt, slp_node, cost_vec)) > > > > + return false; > > > > I suppose vectorizable_comparison_1 will check this again, so the above > > is redundant? > > > > The IOR? No, vectorizable_comparison_1 doesn't reduce so may not check it > depending on the condition. > > > > > + /* Determine if we need to reduce the final value. */ > > > > + if (stmts.length () > 1) > > > > + { > > > > + /* We build the reductions in a way to maintain as much parallelism as > > > > + possible. 
*/ > > > > + auto_vec workset (stmts.length ()); > > > > + workset.splice (stmts); > > > > + while (workset.length () > 1) > > > > + { > > > > + new_temp = make_temp_ssa_name (truth_type, NULL, > > > > "vexit_reduc"); > > > > + tree arg0 = workset.pop (); > > > > + tree arg1 = workset.pop (); > > > > + new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, > > > > arg1); > > > > + vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt, > > > > + &cond_gsi); > > > > + if (slp_node) > > > > + slp_node->push_vec_def (new_stmt); > > > > + else > > > > + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); > > > > + workset.quick_insert (0, new_temp); > > > > Reduction epilogue handling has similar code to reduce a set of vectors > > to a single one with an operation. I think we want to share that code. > > > > I've taken a look but that code isn't suitable here since they have different > constraints. I don't require an in-order reduction since for the comparison > all we care about is whether in a lane any bit is set or not. This means: > > 1. we can reduce using a fast operation like IOR. > 2. we can reduce in as much parallelism as possible. > > The comparison is on the critical path for the loop now, unlike live reductions > which are always at the end, so using the live reduction code resulted in a > slow down since it creates a longer dependency chain. OK. > > > > + } > > > > + } > > > > + else > > > > + new_temp = stmts[0]; > > > > + > > > > + gcc_assert (new_temp); > > > > + > > > > + tree cond = new_temp; > > > > + if (masked_loop_p) > > > > + { > > > > + tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, > > > > truth_type, 0); > > > > + cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond, > > > > + &cond_gsi); > > > > I don't think this is correct when 'stmts' had more than one vector? > > > > It is, because even when VLA, since we only support counted loops partial vectors > are disabled. 
> And it looks like --param vect-partial-vector-usage=1 cannot force it on.

--param vect-partial-vector-usage=2 would, no?

> In principle I suppose I could mask the individual stmts, that should
> handle the future case when this is relaxed to support non-fixed length
> buffers?

Well, it looks wrong - either put in an assert that we start with a
single stmt or assert !masked_loop_p instead?  Better ICE than
generate wrong code.

That said, I think you need to apply the masking on the original
stmts[], before reducing them, no?

Thanks,
Richard.

> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> 	(check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
> 	vect_recog_bool_pattern): Support gconds type analysis.
> 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> 	lhs.
> 	(vectorizable_early_exit): New.
> 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..c6cedf4fe7c1f1e1126ce166a059a4b2a2b49cbd 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
>    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>      {
>        gcc_assert (!vectype
> +		  || is_a <gcond *> (pattern_stmt)
>  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
>  		      == vect_use_mask_type_p (orig_stmt_info)));
>        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> @@ -5210,19 +5211,27 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
>     true if bool VAR can and should be optimized that way.  Assume it shouldn't
>     in case it's a result of a comparison which can be directly vectorized into
>     a vector comparison.  Fills in STMTS with all stmts visited during the
> -   walk.  */
> +   walk.  if COND then a gcond is being inspected instead of a normal COND,  */
>  
>  static bool
> -check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> +check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
> +		    gcond *cond)
>  {
>    tree rhs1;
>    enum tree_code rhs_code;
> +  gassign *def_stmt = NULL;
>  
>    stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
> -  if (!def_stmt_info)
> +  if (!def_stmt_info && !cond)
>      return false;
> +  else if (!def_stmt_info)
> +    /* If we're a gcond we won't be codegen-ing the statements and are only
> +       after if the types match.  In that case we can accept loop invariant
> +       values.  */
> +    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
> +  else
> +    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
>  
> -  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
>    if (!def_stmt)
>      return false;
>  
> @@ -5234,27 +5243,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
>    switch (rhs_code)
>      {
>      case SSA_NAME:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
>  	return false;
>        break;
>  
>      CASE_CONVERT:
>        if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
>  	return false;
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
>  	return false;
>        break;
>  
>      case BIT_NOT_EXPR:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
>  	return false;
>        break;
>  
>      case BIT_AND_EXPR:
>      case BIT_IOR_EXPR:
>      case BIT_XOR_EXPR:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts)
> -	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, cond)
> +	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
> +				   cond))
>  	return false;
>        break;
>  
> @@ -5275,6 +5285,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
>  	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
>  							  TREE_TYPE (rhs1));
>  	  if (mask_type
> +	      && !cond
>  	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
>  	    return false;
>  
> @@ -5324,11 +5335,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
>     VAR is an SSA_NAME that should be transformed from bool to a wider integer
>     type, OUT_TYPE is the desired final integer type of the whole pattern.
>     STMT_INFO is the info of the pattern root and is where pattern stmts should
> -   be associated with.  DEFS is a map of pattern defs.  */
> +   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
> +   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
>  
>  static void
>  adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> -		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
> +		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
> +		     gimple *&last_stmt, bool type_only)
>  {
>    gimple *stmt = SSA_NAME_DEF_STMT (var);
>    enum tree_code rhs_code, def_rhs_code;
> @@ -5492,8 +5505,10 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
>      }
>  
>    gimple_set_location (pattern_stmt, loc);
> -  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> -			  get_vectype_for_scalar_type (vinfo, itype));
> +  if (!type_only)
> +    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> +			    get_vectype_for_scalar_type (vinfo, itype));
> +  last_stmt = pattern_stmt;
>    defs.put (var, gimple_assign_lhs (pattern_stmt));
>  }
>  
> @@ -5509,11 +5524,14 @@ sort_after_uid (const void *p1, const void *p2)
>  
>  /* Create pattern stmts for all stmts participating in the bool pattern
>     specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
> -   OUT_TYPE.  Return the def of the pattern root.  */
> +   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY the new
> +   statements are not emitted as pattern statements and the tree returned is
> +   only useful for type queries.  */
>  
>  static tree
>  adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> -		   tree out_type, stmt_vec_info stmt_info)
> +		   tree out_type, stmt_vec_info stmt_info,
> +		   bool type_only = false)
>  {
>    /* Gather original stmts in the bool pattern in their order of appearance
>       in the IL.  */
> @@ -5523,16 +5541,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
>      bool_stmts.quick_push (*i);
>    bool_stmts.qsort (sort_after_uid);
>  
> +  gimple *last_stmt = NULL;
> +
>    /* Now process them in that order, producing pattern stmts.  */
>    hash_map <tree, tree> defs;
>    for (unsigned i = 0; i < bool_stmts.length (); ++i)
>      adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
> -			 out_type, stmt_info, defs);
> +			 out_type, stmt_info, defs, last_stmt, type_only);
>  
>    /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
> -  gimple *pattern_stmt
> -    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
> -  return gimple_assign_lhs (pattern_stmt);
> +  return gimple_assign_lhs (last_stmt);
>  }
>  
>  /* Return the proper type for converting bool VAR into
> @@ -5608,13 +5626,22 @@ vect_recog_bool_pattern (vec_info *vinfo,
>    enum tree_code rhs_code;
>    tree var, lhs, rhs, vectype;
>    gimple *pattern_stmt;
> -
> -  if (!is_gimple_assign (last_stmt))
> +  gcond* cond = NULL;
> +  if (!is_gimple_assign (last_stmt)
> +      && !(cond = dyn_cast <gcond *> (last_stmt)))
>      return NULL;
>  
> -  var = gimple_assign_rhs1 (last_stmt);
> -  lhs = gimple_assign_lhs (last_stmt);
> -  rhs_code = gimple_assign_rhs_code (last_stmt);
> +  if (is_gimple_assign (last_stmt))
> +    {
> +      var = gimple_assign_rhs1 (last_stmt);
> +      lhs = gimple_assign_lhs (last_stmt);
> +      rhs_code = gimple_assign_rhs_code (last_stmt);
> +    }
> +  else
> +    {
> +      lhs = var = gimple_cond_lhs (last_stmt);
> +      rhs_code = gimple_cond_code (last_stmt);
> +    }
>  
>    if (rhs_code == VIEW_CONVERT_EXPR)
>      var = TREE_OPERAND (var, 0);
> @@ -5632,7 +5659,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>  	return NULL;
>        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
>  	{
>  	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
>  				   TREE_TYPE (lhs), stmt_vinfo);
> @@ -5680,7 +5707,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>  
>        return pattern_stmt;
>      }
> -  else if (rhs_code == COND_EXPR
> +  else if ((rhs_code == COND_EXPR || cond)
>  	   && TREE_CODE (var) == SSA_NAME)
>      {
>        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> @@ -5700,18 +5727,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
>        if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
>  	return NULL;
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
>  	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
>        else if (integer_type_for_mask (var, vinfo))
>  	return NULL;
>  
> -      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> -      pattern_stmt
> -	= gimple_build_assign (lhs, COND_EXPR,
> -			       build2 (NE_EXPR, boolean_type_node,
> -				       var, build_int_cst (TREE_TYPE (var), 0)),
> -			       gimple_assign_rhs2 (last_stmt),
> -			       gimple_assign_rhs3 (last_stmt));
> +      if (!cond)
> +	{
> +	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> +	  pattern_stmt
> +	    = gimple_build_assign (lhs, COND_EXPR,
> +				   build2 (NE_EXPR, boolean_type_node, var,
> +					   build_int_cst (TREE_TYPE (var), 0)),
> +				   gimple_assign_rhs2 (last_stmt),
> +				   gimple_assign_rhs3 (last_stmt));
> +	}
> +      else
> +	{
> +	  pattern_stmt
> +	    = gimple_build_cond (gimple_cond_code (cond), gimple_cond_lhs (cond),
> +				 gimple_cond_rhs (cond),
> +				 gimple_cond_true_label (cond),
> +				 gimple_cond_false_label (cond));
> +	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
> +	  vectype = truth_type_for (vectype);
> +	}
>        *type_out = vectype;
>        vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
>  
> @@ -5725,7 +5765,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>        if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
>  	return NULL;
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
>  	rhs = adjust_bool_stmts (vinfo, bool_stmts,
>  				 TREE_TYPE (vectype), stmt_vinfo);
>        else
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 582c5e678fad802d6e76300fe3c939b9f2978f17..d801b72a149ebe6aa4d1f2942324b042d07be530 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    vec<tree> vec_oprnds0 = vNULL;
>    vec<tree> vec_oprnds1 = vNULL;
>    tree mask_type;
> -  tree mask;
> +  tree mask = NULL_TREE;
>  
>    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>      return false;
> @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    /* Transform.  */
>  
>    /* Handle def.  */
> -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> -  mask = vect_create_destination_var (lhs, mask_type);
> +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> +  if (lhs)
> +    mask = vect_create_destination_var (lhs, mask_type);
>  
>    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
>  		     rhs1, &vec_oprnds0, vectype,
> @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>        gimple *new_stmt;
>        vec_rhs2 = vec_oprnds1[i];
>  
> -      new_temp = make_ssa_name (mask);
> +      if (lhs)
> +	new_temp = make_ssa_name (mask);
> +      else
> +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
>        if (bitop1 == NOP_EXPR)
>  	{
>  	  new_stmt = gimple_build_assign (new_temp, code,
> @@ -12723,6 +12727,176 @@ vectorizable_comparison (vec_info *vinfo,
>    return true;
>  }
>  
> +/* Check to see if the current early break given in STMT_INFO is valid for
> +   vectorization.  */
> +
> +static bool
> +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +{
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (!loop_vinfo
> +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> +    return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> +    return false;
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +    return false;
> +
> +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  gcc_assert (vectype);
> +
> +  tree vectype_op0 = NULL_TREE;
> +  slp_tree slp_op0;
> +  tree op0;
> +  enum vect_def_type dt0;
> +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> +			   &vectype_op0))
> +    {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			 "use not simple.\n");
> +      return false;
> +    }
> +
> +  machine_mode mode = TYPE_MODE (vectype);
> +  int ncopies;
> +
> +  if (slp_node)
> +    ncopies = 1;
> +  else
> +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> +
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> +
> +  /* Analyze only.  */
> +  if (!vec_stmt)
> +    {
> +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			     "can't vectorize early exit because the "
> +			     "target doesn't support flag setting vector "
> +			     "comparisons.\n");
> +	  return false;
> +	}
> +
> +      if (ncopies > 1
> +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			     "can't vectorize early exit because the "
> +			     "target does not support boolean vector OR for "
> +			     "type %T.\n", vectype);
> +	  return false;
> +	}
> +
> +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				      vec_stmt, slp_node, cost_vec))
> +	return false;
> +
> +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +	{
> +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> +					      OPTIMIZE_FOR_SPEED))
> +	    return false;
> +	  else
> +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> +	}
> +
> +
> +      return true;
> +    }
> +
> +  /* Tranform.  */
> +
> +  tree new_temp = NULL_TREE;
> +  gimple *new_stmt = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> +
> +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				  vec_stmt, slp_node, cost_vec))
> +    gcc_unreachable ();
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  basic_block cond_bb = gimple_bb (stmt);
> +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> +
> +  auto_vec<tree> stmts;
> +
> +  if (slp_node)
> +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> +  else
> +    {
> +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> +      stmts.reserve_exact (vec_stmts.length ());
> +      for (auto stmt : vec_stmts)
> +	stmts.quick_push (gimple_assign_lhs (stmt));
> +    }
> +
> +  /* Determine if we need to reduce the final value.  */
> +  if (stmts.length () > 1)
> +    {
> +      /* We build the reductions in a way to maintain as much parallelism as
> +	 possible.  */
> +      auto_vec<tree> workset (stmts.length ());
> +      workset.splice (stmts);
> +      while (workset.length () > 1)
> +	{
> +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> +	  tree arg0 = workset.pop ();
> +	  tree arg1 = workset.pop ();
> +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> +				       &cond_gsi);
> +	  workset.quick_insert (0, new_temp);
> +	}
> +    }
> +  else
> +    new_temp = stmts[0];
> +
> +  gcc_assert (new_temp);
> +
> +  tree cond = new_temp;
> +  /* If we have multiple statements after reduction we should check all the
> +     lanes and treat it as a full vector.  */
> +  if (masked_loop_p)
> +    {
> +      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
> +				      vectype, 0);
> +      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> +			       &cond_gsi);
> +    }
> +
> +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> +     codegen so we must replace the original insn.  */
> +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> +  gcond *cond_stmt = as_a <gcond *> (stmt);
> +  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
> +			     build_zero_cst (vectype));
> +  update_stmt (stmt);
> +
> +  if (slp_node)
> +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> +  else
> +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> +
> +
> +  if (!slp_node)
> +    *vec_stmt = stmt;
> +
> +  return true;
> +}
> +
>  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
>     can handle all live statements in the node.  Otherwise return true
>     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> @@ -12949,7 +13123,9 @@ vect_analyze_stmt (vec_info *vinfo,
>  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
>  				  stmt_info, NULL, node)
>  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> -				  stmt_info, NULL, node, cost_vec));
> +				  stmt_info, NULL, node, cost_vec)
> +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +				      cost_vec));
>    else
>      {
>        if (bb_vinfo)
> @@ -12972,7 +13148,10 @@ vect_analyze_stmt (vec_info *vinfo,
>  					 NULL, NULL, node, cost_vec)
>  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
>  					  cost_vec)
> -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +					  cost_vec));
> +
>      }
>  
>    if (node)
> @@ -13131,6 +13310,12 @@ vect_transform_stmt (vec_info *vinfo,
>        gcc_assert (done);
>        break;
>  
> +    case loop_exit_ctrl_vec_info_type:
> +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> +				      slp_node, NULL);
> +      gcc_assert (done);
> +      break;
> +
>      default:
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> @@ -14321,10 +14506,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>      }
>    else
>      {
> +      gcond *cond = NULL;
>        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
>  	scalar_type = TREE_TYPE (DR_REF (dr));
>        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
>  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> +      else if ((cond = dyn_cast <gcond *> (stmt)))
> +	{
> +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> +	     single bit precision and we need the vector boolean to be a
> +	     representation of the integer mask.  So set the correct integer type and
> +	     convert to boolean vector once we have a vectype.  */
> +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
> +	}
>        else
>  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
>  
> @@ -14339,12 +14533,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>  			     "get vectype for scalar type: %T\n", scalar_type);
>  	}
>        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> +
>        if (!vectype)
>  	return opt_result::failure_at (stmt,
>  				       "not vectorized:"
>  				       " unsupported data-type %T\n",
>  				       scalar_type);
>  
> +      /* If we were a gcond, convert the resulting type to a vector boolean type now
> +	 that we have the correct integer mask type.  */
> +      if (cond)
> +	vectype = truth_type_for (vectype);
> +
>        if (dump_enabled_p ())
>  	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
>      }
> 

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)