From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 12 Dec 2023 11:10:11 +0100 (CET)
From: Richard Biener
To: Tamar Christina
Cc: gcc-patches@gcc.gnu.org, nd, jlaw@ventanamicro.com,
 richard.sandiford@arm.com
Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
Message-ID: <3o102so4-34pp-3o01-o002-0q245oo10303@fhfr.qr>
References: <85570n66-1540-0r07-7q80-269p3o133585@fhfr.qr> <5r3p7378-q309-ooqo-7o76-q9r567ns1890@fhfr.qr>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Mon, 11 Dec 2023, Tamar Christina wrote:

> > > +  vectype = truth_type_for (comp_type);
> >
> > so this leaves the producer of the mask in the GIMPLE_COND and we
> > vectorize the GIMPLE_COND as
> >
> >   mask_1 = ...;
> >   if (mask_1 != {-1,-1...})
> >     ..
> >
> > ?  In principle only the mask producer needs a vector type and that
> > adjusted by bool handling, the branch itself doesn't need any
> > STMT_VINFO_VECTYPE.
> >
> > As said I believe if you recognize a GIMPLE_COND pattern for conds
> > that aren't bool != 0 producing the mask stmt this should be picked
> > up by bool handling correctly already.
> >
> > Also as said piggy-backing on the COND_EXPR handling in this function
> > which has the condition split out into a separate stmt(!) might not
> > completely handle things correctly and you are likely missing
> > the tcc_comparison handling of the embedded compare.
>
> Ok, I've stopped piggy-backing on the COND_EXPR handling and created
> vect_recog_gcond_pattern.  As you said in the previous email I've also
> stopped setting the vectype for the gcond and instead use the type of the
> operand.
>
> Note that because the pattern doesn't apply if you were already an NE_EXPR
> I do need the extra truth_type_for for that case.  Because in the case of e.g.
>
>   a = b > 4;
>   if (a != 0)
>
> the producer of the mask is already outside of the cond but will not trigger
> boolean recognition.

It should trigger because we have a mask use of 'a'; I always forget
where we do that - it might be where we compute mask precision stuff
or it might be bool pattern recognition itself ...

That said, a GIMPLE_COND (be it pattern or not) should be recognized
as mask use.
> That means that while the integral type is correct it won't be a boolean
> one, and vectorizable_comparison expects a boolean vector.  Alternatively,
> we can remove that assert?  But that seems worse.
>
> Additionally in the previous email you mention "adjusted Boolean statement".
>
> I'm guessing you were referring to generating a COND_EXPR from the gcond,
> so that vect_recog_bool_pattern detects it?  The problem with that is it
> gets folded to x & 1 and doesn't trigger.  It also then blocks
> vectorization.  So instead I've not forced it.

Not sure what you are referring to, but no - we shouldn't generate a
COND_EXPR from the gcond.  Pattern recog generates COND_EXPRs for _data_
uses of masks (if we need a 'bool' data type for storing).  We then get

  mask != 0 ? true : false;

> > > +  /* Determine if we need to reduce the final value.  */
> > > +  if (stmts.length () > 1)
> > > +    {
> > > +      /* We build the reductions in a way to maintain as much parallelism as
> > > +	 possible.  */
> > > +      auto_vec<tree> workset (stmts.length ());
> > > +
> > > +      /* Mask the statements as we queue them up.  */
> > > +      if (masked_loop_p)
> > > +	for (auto stmt : stmts)
> > > +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> > > +						mask, stmt, &cond_gsi));
> > > +      else
> > > +	workset.splice (stmts);
> > > +
> > > +      while (workset.length () > 1)
> > > +	{
> > > +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > > +	  tree arg0 = workset.pop ();
> > > +	  tree arg1 = workset.pop ();
> > > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > +				       &cond_gsi);
> > > +	  workset.quick_insert (0, new_temp);
> > > +	}
> > > +    }
> > > +  else
> > > +    new_temp = stmts[0];
> > > +
> > > +  gcc_assert (new_temp);
> > > +
> > > +  tree cond = new_temp;
> > > +  /* If we have multiple statements after reduction we should check all the
> > > +     lanes and treat it as a full vector.  */
> > > +  if (masked_loop_p)
> > > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > > +			     &cond_gsi);
> >
> > You didn't fix any of the code above it seems, it's still wrong.
> >
>
> Apologies, I hadn't realized that the last argument to get_loop_mask was the index.
>
> Should be fixed now.  Is this closer to what you wanted?
> The individual ops are now masked with separate masks.  (See testcase when N=865).
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> 	(vect_recog_gcond_pattern): New.
> 	(vect_vect_recog_func_ptrs): Use it.
> 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> 	lhs.
> 	(vectorizable_early_exit): New.
> 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/vect/vect-early-break_88.c: New test.
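As an aside for readers of the thread: a stand-alone model of the pairwise
reduction done by the workset loop quoted above, with scalar words standing
in for the vector masks (purely illustrative, not the patch code):

  #include <stdio.h>

  /* Model of the workset reduction: pop the last two entries, OR them,
     and queue the result at the front.  For m0..m3 this pairs (m3|m2)
     and (m1|m0) first and then ORs the two partial results, i.e. a
     reduction tree of depth 2 rather than a linear chain of depth 3.  */
  unsigned
  reduce_masks (unsigned *workset, int n)
  {
    while (n > 1)
      {
	unsigned arg0 = workset[--n];	/* workset.pop ()  */
	unsigned arg1 = workset[--n];	/* workset.pop ()  */
	unsigned tmp = arg0 | arg1;
	/* workset.quick_insert (0, tmp)  */
	for (int i = n; i > 0; i--)
	  workset[i] = workset[i - 1];
	workset[0] = tmp;
	n++;
      }
    return workset[0];
  }

  int
  main (void)
  {
    unsigned masks[4] = { 0x1, 0x2, 0x4, 0x8 };
    printf ("%#x\n", reduce_masks (masks, 4));	/* prints 0xf  */
    return 0;
  }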
>
> --- inline copy of patch ---
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b64becd588973f58601196bfcb15afbe4bab60f2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> @@ -0,0 +1,36 @@
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 5
> +#endif
> +float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
> +unsigned vect_b[N] = { 0 };
> +
> +__attribute__ ((noinline, noipa))
> +unsigned test4(double x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> +
> + }
> + return ret;
> +}
> +
> +extern void abort ();
> +
> +int main ()
> +{
> +  if (test4 (7.0) != 0)
> +    abort ();
> +
> +  if (vect_b[2] != 0 && vect_b[1] == 0)
> +    abort ();
> +}
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..359d30b5991a50717c269df577c08adffa44e71b 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
>    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>      {
>        gcc_assert (!vectype
> +		  || is_a <gcond *> (pattern_stmt)
>  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
>  		      == vect_use_mask_type_p (orig_stmt_info)));
>        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> @@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo)
>    return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
>  }
>  
> +/* Function vect_recog_gcond_pattern
> +
> +   Try to find pattern like following:
> +
> +     if (a op b)
> +
> +   where operator 'op' is not != and convert it to an adjusted boolean pattern
> +
> +     mask = a op b
> +     if (mask != 0)
> +
> +   and set the mask type on MASK.
> +
> +   Input:
> +
> +   * STMT_VINFO: The stmt at the end from which the pattern
> +		 search begins, i.e. cast of a bool to
> +		 an integer type.
> +
> +   Output:
> +
> +   * TYPE_OUT: The type of the output of this pattern.
> +
> +   * Return value: A new stmt that will be used to replace the pattern.  */
> +
> +static gimple *
> +vect_recog_gcond_pattern (vec_info *vinfo,
> +			  stmt_vec_info stmt_vinfo, tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> +  gcond* cond = NULL;
> +  if (!(cond = dyn_cast <gcond *> (last_stmt)))
> +    return NULL;
> +
> +  auto lhs = gimple_cond_lhs (cond);
> +  auto rhs = gimple_cond_rhs (cond);
> +  auto code = gimple_cond_code (cond);
> +
> +  tree scalar_type = TREE_TYPE (lhs);
> +  if (VECTOR_TYPE_P (scalar_type))
> +    return NULL;
> +
> +  if (code == NE_EXPR && zerop (rhs))

I think you need

  && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)

here, an integer != 0 would not be an appropriate mask.  I guess two
relevant testcases would have an early exit like

  if (here[i] != 0)
    break;

once with a 'bool here[]' and once with a 'int here[]'.
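Something like the following pair, perhaps (untested sketch; the array
names, sizes and return values are invented, only the two != 0 early
exits matter):

  #include <stdbool.h>

  #define M 1024
  bool bhere[M];
  int ihere[M];

  /* Early exit on a boolean array: bhere[i] != 0 is already a mask.  */
  int
  first_set_bool (void)
  {
    for (int i = 0; i < M; i++)
      if (bhere[i] != 0)
	break;
    return 0;
  }

  /* Early exit on an integer array: here the != 0 compare itself has to
     become the mask producer, the integer value is not a valid mask.  */
  int
  first_set_int (void)
  {
    for (int i = 0; i < M; i++)
      if (ihere[i] != 0)
	break;
    return 0;
  }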
> +    return NULL;
> +
> +  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +  if (vecitype == NULL_TREE)
> +    return NULL;
> +
> +  /* Build a scalar type for the boolean result that when vectorized matches the
> +     vector type of the result in size and number of elements.  */
> +  unsigned prec
> +    = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vecitype)),
> +			    TYPE_VECTOR_SUBPARTS (vecitype));
> +
> +  scalar_type
> +    = build_nonstandard_integer_type (prec, TYPE_UNSIGNED (scalar_type));
> +
> +  vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +  if (vecitype == NULL_TREE)
> +    return NULL;
> +
> +  tree vectype = truth_type_for (vecitype);

That looks awfully complicated.  I guess one complication is that we
compute mask_precision & friends before this pattern gets recognized.
See vect_determine_mask_precision and its handling of tcc_comparison,
see also integer_type_for_mask.

For comparisons properly handled during pattern recog the vector type
is determined in vect_get_vector_types_for_stmt via

  else if (vect_use_mask_type_p (stmt_info))
    {
      unsigned int precision = stmt_info->mask_precision;
      scalar_type = build_nonstandard_integer_type (precision, 1);
      vectype = get_mask_type_for_scalar_type (vinfo, scalar_type,
					       group_size);
      if (!vectype)
	return opt_result::failure_at (stmt, "not vectorized: unsupported"
				       " data-type %T\n", scalar_type);

Richard, do you have any advice here?  I suppose vect_determine_precisions
needs to handle the gcond case with bool != 0 somehow and for the extra
mask producer we add here we have to emulate what it would have done,
right?

> +  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
> +  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
> +  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
> +
> +  gimple *pattern_stmt
> +    = gimple_build_cond (NE_EXPR, new_lhs,
> +			 build_int_cst (TREE_TYPE (new_lhs), 0),
> +			 NULL_TREE, NULL_TREE);
> +  *type_out = vectype;
> +  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
> +  return pattern_stmt;
> +}
> +
>  /* Function vect_recog_bool_pattern
>  
>     Try to find pattern like following:
> @@ -6860,6 +6938,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>    { vect_recog_divmod_pattern, "divmod" },
>    { vect_recog_mult_pattern, "mult" },
>    { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
> +  { vect_recog_gcond_pattern, "gcond" },
>    { vect_recog_bool_pattern, "bool" },
>    /* This must come before mask conversion, and includes the parts
>       of mask conversion that are needed for gather and scatter
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 582c5e678fad802d6e76300fe3c939b9f2978f17..7c50ee37f2ade24eccf7a7d1ea2e00b4450023f9 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    vec<tree> vec_oprnds0 = vNULL;
>    vec<tree> vec_oprnds1 = vNULL;
>    tree mask_type;
> -  tree mask;
> +  tree mask = NULL_TREE;
>  
>    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>      return false;
> @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    /* Transform.  */
>  
>    /* Handle def.  */
> -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> -  mask = vect_create_destination_var (lhs, mask_type);
> +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> +  if (lhs)
> +    mask = vect_create_destination_var (lhs, mask_type);
>  
>    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
>  		     rhs1, &vec_oprnds0, vectype,
> @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>        gimple *new_stmt;
>        vec_rhs2 = vec_oprnds1[i];
>  
> -      new_temp = make_ssa_name (mask);
> +      if (lhs)
> +	new_temp = make_ssa_name (mask);
> +      else
> +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
>        if (bitop1 == NOP_EXPR)
>  	{
>  	  new_stmt = gimple_build_assign (new_temp, code,
> @@ -12723,6 +12727,211 @@ vectorizable_comparison (vec_info *vinfo,
>    return true;
>  }
>  
> +/* Check to see if the current early break given in STMT_INFO is valid for
> +   vectorization.  */
> +
> +static bool
> +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +{
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (!loop_vinfo
> +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> +    return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> +    return false;
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +    return false;
> +
> +  DUMP_VECT_SCOPE ("vectorizable_early_exit");
> +
> +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> +
> +  tree vectype_op0 = NULL_TREE;
> +  slp_tree slp_op0;
> +  tree op0;
> +  enum vect_def_type dt0;
> +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> +			   &vectype_op0))
> +    {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			 "use not simple.\n");
> +      return false;
> +    }
> +
> +  stmt_vec_info op0_info = vinfo->lookup_def (op0);
> +  tree vectype = truth_type_for (STMT_VINFO_VECTYPE (op0_info));
> +  gcc_assert (vectype);
> +
> +  machine_mode mode = TYPE_MODE (vectype);
> +  int ncopies;
> +
> +  if (slp_node)
> +    ncopies = 1;
> +  else
> +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> +
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> +
> +  /* Analyze only.  */
> +  if (!vec_stmt)
> +    {
> +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			     "can't vectorize early exit because the "
> +			     "target doesn't support flag setting vector "
> +			     "comparisons.\n");
> +	  return false;
> +	}
> +
> +      if (ncopies > 1
> +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			     "can't vectorize early exit because the "
> +			     "target does not support boolean vector OR for "
> +			     "type %T.\n", vectype);
> +	  return false;
> +	}
> +
> +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				      vec_stmt, slp_node, cost_vec))
> +	return false;
> +
> +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +	{
> +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> +					      OPTIMIZE_FOR_SPEED))
> +	    return false;
> +	  else
> +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> +	}
> +
> +
> +      return true;
> +    }
> +
> +  /* Transform.  */
> +  tree new_temp = NULL_TREE;
> +  gimple *new_stmt = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> +
> +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				  vec_stmt, slp_node, cost_vec))
> +    gcc_unreachable ();
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  basic_block cond_bb = gimple_bb (stmt);
> +  gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
> +
> +  auto_vec<tree> stmts;
> +
> +  tree mask = NULL_TREE;
> +  if (masked_loop_p)
> +    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> +
> +  if (slp_node)
> +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> +  else
> +    {
> +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> +      stmts.reserve_exact (vec_stmts.length ());
> +      for (auto stmt : vec_stmts)
> +	stmts.quick_push (gimple_assign_lhs (stmt));
> +    }
> +
> +  /* Determine if we need to reduce the final value.  */
> +  if (stmts.length () > 1)
> +    {
> +      /* We build the reductions in a way to maintain as much parallelism as
> +	 possible.  */
> +      auto_vec<tree> workset (stmts.length ());
> +
> +      /* Mask the statements as we queue them up.  Normally we loop over
> +	 vec_num, but since we inspect the exact results of vectorization
> +	 we don't need to and instead can just use the stmts themselves.  */
> +      if (masked_loop_p)
> +	for (unsigned i = 0; i < stmts.length (); i++)
> +	  {
> +	    tree stmt_mask
> +	      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype,
> +				    i);
> +	    stmt_mask
> +	      = prepare_vec_mask (loop_vinfo, TREE_TYPE (stmt_mask), stmt_mask,
> +				  stmts[i], &cond_gsi);
> +	    workset.quick_push (stmt_mask);
> +	  }
> +      else
> +	workset.splice (stmts);
> +
> +      while (workset.length () > 1)
> +	{
> +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> +	  tree arg0 = workset.pop ();
> +	  tree arg1 = workset.pop ();
> +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> +				       &cond_gsi);
> +	  workset.quick_insert (0, new_temp);
> +	}
> +    }
> +  else
> +    new_temp = stmts[0];
> +
> +  gcc_assert (new_temp);
> +
> +  tree cond = new_temp;
> +  /* If we have multiple statements after reduction we should check all the
> +     lanes and treat it as a full vector.  */
> +  if (masked_loop_p)
> +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> +			     &cond_gsi);

This is still wrong, you are applying mask[0] on the IOR reduced result.
As suggested do that in the else { new_temp = stmts[0] } clause instead
(or simply elide the optimization of a single vector).
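I.e., to spell the suggestion out (completely untested, just the
placement; prepare_vec_mask and the variables are the ones from the
hunk above):

  else
    {
      new_temp = stmts[0];
      /* A single vector: apply the (only) loop mask here rather than
	 masking the already IOR-reduced value further below.  */
      if (masked_loop_p)
	new_temp = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
				     new_temp, &cond_gsi);
    }

with the later prepare_vec_mask call on the reduced value dropped.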
> +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> +     codegen so we must replace the original insn.  */
> +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> +  gcond *cond_stmt = as_a <gcond *> (stmt);
> +  /* When vectorizing we assume that if the branch edge is taken that we're
> +     exiting the loop.  This is not however always the case as the compiler will
> +     rewrite conditions to always be a comparison against 0.  To do this it
> +     sometimes flips the edges.  This is fine for scalar, but for vector we
> +     then have to flip the test, as we're still assuming that if you take the
> +     branch edge that we found the exit condition.  */
> +  auto new_code = NE_EXPR;
> +  tree cst = build_zero_cst (vectype);
> +  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> +			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
> +    {
> +      new_code = EQ_EXPR;
> +      cst = build_minus_one_cst (vectype);
> +    }
> +
> +  gimple_cond_set_condition (cond_stmt, new_code, cond, cst);
> +  update_stmt (stmt);
> +
> +  if (slp_node)
> +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> +  else
> +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> +
> +
> +  if (!slp_node)
> +    *vec_stmt = stmt;
> +
> +  return true;
> +}
> +
>  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
>     can handle all live statements in the node.  Otherwise return true
>     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> @@ -12949,7 +13158,9 @@ vect_analyze_stmt (vec_info *vinfo,
>  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
>  				  stmt_info, NULL, node)
>  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> -				  stmt_info, NULL, node, cost_vec));
> +				  stmt_info, NULL, node, cost_vec)
> +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +				      cost_vec));
>    else
>      {
>        if (bb_vinfo)
> @@ -12972,7 +13183,10 @@ vect_analyze_stmt (vec_info *vinfo,
>  				       NULL, NULL, node, cost_vec)
>  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
>  					  cost_vec)
> -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +					  cost_vec));
> +
>      }
>  
>    if (node)
> @@ -13131,6 +13345,12 @@ vect_transform_stmt (vec_info *vinfo,
>        gcc_assert (done);
>        break;
>  
> +    case loop_exit_ctrl_vec_info_type:
> +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> +				      slp_node, NULL);
> +      gcc_assert (done);
> +      break;
> +
>      default:
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> @@ -14321,10 +14541,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>      }
>    else
>      {
> +      gcond *cond = NULL;
>        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
>  	scalar_type = TREE_TYPE (DR_REF (dr));
>        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
>  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> +      else if ((cond = dyn_cast <gcond *> (stmt)))
> +	{
> +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> +	     single bit precision and we need the vector boolean to be a
> +	     representation of the integer mask.  So set the correct integer type and
> +	     convert to boolean vector once we have a vectype.  */
> +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));

You should get into the vect_use_mask_type_p (stmt_info) path for
early exit conditions (see above with regard to mask_precision).

> +	}
>        else
>  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
>  
> @@ -14339,12 +14568,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>  				 "get vectype for scalar type: %T\n", scalar_type);
>      }
>    vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> +
>    if (!vectype)
>      return opt_result::failure_at (stmt,
>  				   "not vectorized:"
>  				   " unsupported data-type %T\n",
>  				   scalar_type);
>  
> +  /* If we were a gcond, convert the resulting type to a vector boolean type now
> +     that we have the correct integer mask type.  */
> +  if (cond)
> +    vectype = truth_type_for (vectype);
> +

which makes this moot.

Richard.
>    if (dump_enabled_p ())
>      dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
>  }
> 

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)