From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 12 Dec 2023 11:10:11 +0100 (CET)
From: Richard Biener
To: Tamar Christina
Cc: gcc-patches@gcc.gnu.org, nd, jlaw@ventanamicro.com,
 richard.sandiford@arm.com
Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
Message-ID: <3o102so4-34pp-3o01-o002-0q245oo10303@fhfr.qr>
References: <85570n66-1540-0r07-7q80-269p3o133585@fhfr.qr> <5r3p7378-q309-ooqo-7o76-q9r567ns1890@fhfr.qr>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Mon, 11 Dec 2023, Tamar Christina wrote:

> > > +  vectype = truth_type_for (comp_type);
> >
> > so this leaves the producer of the mask in the GIMPLE_COND and we
> > vectorize the GIMPLE_COND as
> >
> >   mask_1 = ...;
> >   if (mask_1 != {-1,-1...})
> >     ..
> >
> > ?  In principle only the mask producer needs a vector type and that
> > adjusted by bool handling, the branch itself doesn't need any
> > STMT_VINFO_VECTYPE.
> >
> > As said I believe if you recognize a GIMPLE_COND pattern for conds
> > that aren't bool != 0 producing the mask stmt this should be picked
> > up by bool handling correctly already.
> >
> > Also as said piggy-backing on the COND_EXPR handling in this function
> > which has the condition split out into a separate stmt(!) might not
> > completely handle things correctly and you are likely missing
> > the tcc_comparison handling of the embedded compare.
>
> Ok, I've stopped piggy-backing on the COND_EXPR handling and created
> vect_recog_gcond_pattern.  As you said in the previous email I've also
> stopped setting the vectype for the gcond and instead use the type of the
> operand.
>
> Note that because the pattern doesn't apply if you were already an NE_EXPR
> I do need the extra truth_type_for for that case.  Because in the case of e.g.
>
>   a = b > 4;
>   if (a != 0)
>
> the producer of the mask is already outside of the cond but will not trigger
> boolean recognition.

It should trigger because we have a mask use of 'a'; I always forget
where we do that - it might be where we compute mask precision stuff
or it might be bool pattern recognition itself ...

That said, a GIMPLE_COND (be it pattern or not) should be recognized
as mask use.
> That means that while the integral type is correct it won't be a boolean
> one, and vectorizable_comparison expects a boolean vector.  Alternatively,
> we can remove that assert?  But that seems worse.
>
> Additionally in the previous email you mention "adjusted Boolean statement".
>
> I'm guessing you were referring to generating a COND_EXPR from the gcond,
> so that vect_recog_bool_pattern detects it?  The problem with that is it
> gets folded to x & 1 and doesn't trigger.  It also then blocks
> vectorization.  So instead I've not forced it.

Not sure what you are referring to, but no - we shouldn't generate a
COND_EXPR from the gcond.  Pattern recog generates COND_EXPRs for _data_
uses of masks (if we need a 'bool' data type for storing).  We then get

  mask != 0 ? true : false;

> > > +  /* Determine if we need to reduce the final value.  */
> > > +  if (stmts.length () > 1)
> > > +    {
> > > +      /* We build the reductions in a way to maintain as much parallelism as
> > > +	 possible.  */
> > > +      auto_vec<tree> workset (stmts.length ());
> > > +
> > > +      /* Mask the statements as we queue them up.  */
> > > +      if (masked_loop_p)
> > > +	for (auto stmt : stmts)
> > > +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> > > +						mask, stmt, &cond_gsi));
> > > +      else
> > > +	workset.splice (stmts);
> > > +
> > > +      while (workset.length () > 1)
> > > +	{
> > > +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > > +	  tree arg0 = workset.pop ();
> > > +	  tree arg1 = workset.pop ();
> > > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > +				       &cond_gsi);
> > > +	  workset.quick_insert (0, new_temp);
> > > +	}
> > > +    }
> > > +  else
> > > +    new_temp = stmts[0];
> > > +
> > > +  gcc_assert (new_temp);
> > > +
> > > +  tree cond = new_temp;
> > > +  /* If we have multiple statements after reduction we should check all the
> > > +     lanes and treat it as a full vector.  */
> > > +  if (masked_loop_p)
> > > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > > +			     &cond_gsi);
> >
> > You didn't fix any of the code above it seems, it's still wrong.
> >
>
> Apologies, I hadn't realized that the last argument to get_loop_mask was the index.
>
> Should be fixed now.  Is this closer to what you wanted?
> The individual ops are now masked with separate masks.  (See testcase when N=865).
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> 	(vect_recog_gcond_pattern): New.
> 	(vect_vect_recog_func_ptrs): Use it.
> 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> 	lhs.
> 	(vectorizable_early_exit): New.
> 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/vect/vect-early-break_88.c: New test.
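As an aside for readers of the thread: a stand-alone model of the pairwise
reduction done by the workset loop quoted above, with scalar words standing
in for the vector masks (purely illustrative, not the patch code):

  #include <stdio.h>

  /* Model of the workset reduction: pop the last two entries, OR them,
     and queue the result at the front.  For m0..m3 this pairs (m3|m2)
     and (m1|m0) first and then ORs the two partial results, i.e. a
     reduction tree of depth 2 rather than a linear chain of depth 3.  */
  unsigned
  reduce_masks (unsigned *workset, int n)
  {
    while (n > 1)
      {
	unsigned arg0 = workset[--n];	/* workset.pop ()  */
	unsigned arg1 = workset[--n];	/* workset.pop ()  */
	unsigned tmp = arg0 | arg1;
	/* workset.quick_insert (0, tmp)  */
	for (int i = n; i > 0; i--)
	  workset[i] = workset[i - 1];
	workset[0] = tmp;
	n++;
      }
    return workset[0];
  }

  int
  main (void)
  {
    unsigned masks[4] = { 0x1, 0x2, 0x4, 0x8 };
    printf ("%#x\n", reduce_masks (masks, 4));	/* prints 0xf  */
    return 0;
  }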
>
> --- inline copy of patch ---
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b64becd588973f58601196bfcb15afbe4bab60f2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> @@ -0,0 +1,36 @@
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 5
> +#endif
> +float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
> +unsigned vect_b[N] = { 0 };
> +
> +__attribute__ ((noinline, noipa))
> +unsigned test4(double x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> +
> + }
> + return ret;
> +}
> +
> +extern void abort ();
> +
> +int main ()
> +{
> +  if (test4 (7.0) != 0)
> +    abort ();
> +
> +  if (vect_b[2] != 0 && vect_b[1] == 0)
> +    abort ();
> +}
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..359d30b5991a50717c269df577c08adffa44e71b 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
>    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>      {
>        gcc_assert (!vectype
> +		  || is_a <gcond *> (pattern_stmt)
>  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
>  		      == vect_use_mask_type_p (orig_stmt_info)));
>        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> @@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo)
>    return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
>  }
>  
> +/* Function vect_recog_gcond_pattern
> +
> +   Try to find pattern like following:
> +
> +     if (a op b)
> +
> +   where operator 'op' is not != and convert it to an adjusted boolean pattern
> +
> +     mask = a op b
> +     if (mask != 0)
> +
> +   and set the mask type on MASK.
> +
> +   Input:
> +
> +   * STMT_VINFO: The stmt at the end from which the pattern
> +		 search begins, i.e. cast of a bool to
> +		 an integer type.
> +
> +   Output:
> +
> +   * TYPE_OUT: The type of the output of this pattern.
> +
> +   * Return value: A new stmt that will be used to replace the pattern.  */
> +
> +static gimple *
> +vect_recog_gcond_pattern (vec_info *vinfo,
> +			  stmt_vec_info stmt_vinfo, tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> +  gcond* cond = NULL;
> +  if (!(cond = dyn_cast <gcond *> (last_stmt)))
> +    return NULL;
> +
> +  auto lhs = gimple_cond_lhs (cond);
> +  auto rhs = gimple_cond_rhs (cond);
> +  auto code = gimple_cond_code (cond);
> +
> +  tree scalar_type = TREE_TYPE (lhs);
> +  if (VECTOR_TYPE_P (scalar_type))
> +    return NULL;
> +
> +  if (code == NE_EXPR && zerop (rhs))

I think you need

  && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)

here, an integer != 0 would not be an appropriate mask.  I guess two
relevant testcases would have an early exit like

  if (here[i] != 0)
    break;

once with a 'bool here[]' and once with a 'int here[]'.
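Something like the following pair, perhaps (untested sketch; the array
names, sizes and return values are invented, only the two != 0 early
exits matter):

  #include <stdbool.h>

  #define M 1024
  bool bhere[M];
  int ihere[M];

  /* Early exit on a boolean array: bhere[i] != 0 is already a mask.  */
  int
  first_set_bool (void)
  {
    for (int i = 0; i < M; i++)
      if (bhere[i] != 0)
	break;
    return 0;
  }

  /* Early exit on an integer array: here the != 0 compare itself has to
     become the mask producer, the integer value is not a valid mask.  */
  int
  first_set_int (void)
  {
    for (int i = 0; i < M; i++)
      if (ihere[i] != 0)
	break;
    return 0;
  }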
> +    return NULL;
> +
> +  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +  if (vecitype == NULL_TREE)
> +    return NULL;
> +
> +  /* Build a scalar type for the boolean result that when vectorized matches the
> +     vector type of the result in size and number of elements.  */
> +  unsigned prec
> +    = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vecitype)),
> +			    TYPE_VECTOR_SUBPARTS (vecitype));
> +
> +  scalar_type
> +    = build_nonstandard_integer_type (prec, TYPE_UNSIGNED (scalar_type));
> +
> +  vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +  if (vecitype == NULL_TREE)
> +    return NULL;
> +
> +  tree vectype = truth_type_for (vecitype);

That looks awfully complicated.  I guess one complication is that we
compute mask_precision & friends before this pattern gets recognized.
See vect_determine_mask_precision and its handling of tcc_comparison,
see also integer_type_for_mask.

For comparisons properly handled during pattern recog the vector type
is determined in vect_get_vector_types_for_stmt via

  else if (vect_use_mask_type_p (stmt_info))
    {
      unsigned int precision = stmt_info->mask_precision;
      scalar_type = build_nonstandard_integer_type (precision, 1);
      vectype = get_mask_type_for_scalar_type (vinfo, scalar_type,
					       group_size);
      if (!vectype)
	return opt_result::failure_at (stmt, "not vectorized: unsupported"
				       " data-type %T\n", scalar_type);

Richard, do you have any advice here?  I suppose vect_determine_precisions
needs to handle the gcond case with bool != 0 somehow and for the extra
mask producer we add here we have to emulate what it would have done,
right?

> +  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
> +  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
> +  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
> +
> +  gimple *pattern_stmt
> +    = gimple_build_cond (NE_EXPR, new_lhs,
> +			 build_int_cst (TREE_TYPE (new_lhs), 0),
> +			 NULL_TREE, NULL_TREE);
> +  *type_out = vectype;
> +  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
> +  return pattern_stmt;
> +}
> +
>  /* Function vect_recog_bool_pattern
>  
>     Try to find pattern like following:
> @@ -6860,6 +6938,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>    { vect_recog_divmod_pattern, "divmod" },
>    { vect_recog_mult_pattern, "mult" },
>    { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
> +  { vect_recog_gcond_pattern, "gcond" },
>    { vect_recog_bool_pattern, "bool" },
>    /* This must come before mask conversion, and includes the parts
>       of mask conversion that are needed for gather and scatter
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 582c5e678fad802d6e76300fe3c939b9f2978f17..7c50ee37f2ade24eccf7a7d1ea2e00b4450023f9 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    vec<tree> vec_oprnds0 = vNULL;
>    vec<tree> vec_oprnds1 = vNULL;
>    tree mask_type;
> -  tree mask;
> +  tree mask = NULL_TREE;
>  
>    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>      return false;
> @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    /* Transform.  */
>  
>    /* Handle def.  */
> -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> -  mask = vect_create_destination_var (lhs, mask_type);
> +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> +  if (lhs)
> +    mask = vect_create_destination_var (lhs, mask_type);
>  
>    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
>  		     rhs1, &vec_oprnds0, vectype,
> @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>        gimple *new_stmt;
>        vec_rhs2 = vec_oprnds1[i];
>  
> -      new_temp = make_ssa_name (mask);
> +      if (lhs)
> +	new_temp = make_ssa_name (mask);
> +      else
> +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
>        if (bitop1 == NOP_EXPR)
>  	{
>  	  new_stmt = gimple_build_assign (new_temp, code,
> @@ -12723,6 +12727,211 @@ vectorizable_comparison (vec_info *vinfo,
>    return true;
>  }
>  
> +/* Check to see if the current early break given in STMT_INFO is valid for
> +   vectorization.  */
> +
> +static bool
> +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +{
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (!loop_vinfo
> +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> +    return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> +    return false;
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +    return false;
> +
> +  DUMP_VECT_SCOPE ("vectorizable_early_exit");
> +
> +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> +
> +  tree vectype_op0 = NULL_TREE;
> +  slp_tree slp_op0;
> +  tree op0;
> +  enum vect_def_type dt0;
> +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> +			   &vectype_op0))
> +    {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			 "use not simple.\n");
> +      return false;
> +    }
> +
> +  stmt_vec_info op0_info = vinfo->lookup_def (op0);
> +  tree vectype = truth_type_for (STMT_VINFO_VECTYPE (op0_info));
> +  gcc_assert (vectype);
> +
> +  machine_mode mode = TYPE_MODE (vectype);
> +  int ncopies;
> +
> +  if (slp_node)
> +    ncopies = 1;
> +  else
> +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> +
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> +
> +  /* Analyze only.  */
> +  if (!vec_stmt)
> +    {
> +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			     "can't vectorize early exit because the "
> +			     "target doesn't support flag setting vector "
> +			     "comparisons.\n");
> +	  return false;
> +	}
> +
> +      if (ncopies > 1
> +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			     "can't vectorize early exit because the "
> +			     "target does not support boolean vector OR for "
> +			     "type %T.\n", vectype);
> +	  return false;
> +	}
> +
> +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				      vec_stmt, slp_node, cost_vec))
> +	return false;
> +
> +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +	{
> +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> +					      OPTIMIZE_FOR_SPEED))
> +	    return false;
> +	  else
> +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> +	}
> +
> +
> +      return true;
> +    }
> +
> +  /* Transform.  */
> +  tree new_temp = NULL_TREE;
> +  gimple *new_stmt = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> +
> +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				  vec_stmt, slp_node, cost_vec))
> +    gcc_unreachable ();
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  basic_block cond_bb = gimple_bb (stmt);
> +  gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
> +
> +  auto_vec<tree> stmts;
> +
> +  tree mask = NULL_TREE;
> +  if (masked_loop_p)
> +    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> +
> +  if (slp_node)
> +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> +  else
> +    {
> +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> +      stmts.reserve_exact (vec_stmts.length ());
> +      for (auto stmt : vec_stmts)
> +	stmts.quick_push (gimple_assign_lhs (stmt));
> +    }
> +
> +  /* Determine if we need to reduce the final value.  */
> +  if (stmts.length () > 1)
> +    {
> +      /* We build the reductions in a way to maintain as much parallelism as
> +	 possible.  */
> +      auto_vec<tree> workset (stmts.length ());
> +
> +      /* Mask the statements as we queue them up.  Normally we loop over
> +	 vec_num, but since we inspect the exact results of vectorization
> +	 we don't need to and instead can just use the stmts themselves.  */
> +      if (masked_loop_p)
> +	for (unsigned i = 0; i < stmts.length (); i++)
> +	  {
> +	    tree stmt_mask
> +	      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype,
> +				    i);
> +	    stmt_mask
> +	      = prepare_vec_mask (loop_vinfo, TREE_TYPE (stmt_mask), stmt_mask,
> +				  stmts[i], &cond_gsi);
> +	    workset.quick_push (stmt_mask);
> +	  }
> +      else
> +	workset.splice (stmts);
> +
> +      while (workset.length () > 1)
> +	{
> +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> +	  tree arg0 = workset.pop ();
> +	  tree arg1 = workset.pop ();
> +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> +				       &cond_gsi);
> +	  workset.quick_insert (0, new_temp);
> +	}
> +    }
> +  else
> +    new_temp = stmts[0];
> +
> +  gcc_assert (new_temp);
> +
> +  tree cond = new_temp;
> +  /* If we have multiple statements after reduction we should check all the
> +     lanes and treat it as a full vector.  */
> +  if (masked_loop_p)
> +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> +			     &cond_gsi);

This is still wrong, you are applying mask[0] on the IOR reduced result.
As suggested do that in the else { new_temp = stmts[0] } clause instead
(or simply elide the optimization of a single vector).
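I.e., to spell the suggestion out (completely untested, just the
placement; prepare_vec_mask and the variables are the ones from the
hunk above):

  else
    {
      new_temp = stmts[0];
      /* A single vector: apply the (only) loop mask here rather than
	 masking the already IOR-reduced value further below.  */
      if (masked_loop_p)
	new_temp = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
				     new_temp, &cond_gsi);
    }

with the later prepare_vec_mask call on the reduced value dropped.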
> +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> +     codegen so we must replace the original insn.  */
> +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> +  gcond *cond_stmt = as_a <gcond *> (stmt);
> +  /* When vectorizing we assume that if the branch edge is taken that we're
> +     exiting the loop.  This is not however always the case as the compiler will
> +     rewrite conditions to always be a comparison against 0.  To do this it
> +     sometimes flips the edges.  This is fine for scalar, but for vector we
> +     then have to flip the test, as we're still assuming that if you take the
> +     branch edge that we found the exit condition.  */
> +  auto new_code = NE_EXPR;
> +  tree cst = build_zero_cst (vectype);
> +  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> +			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
> +    {
> +      new_code = EQ_EXPR;
> +      cst = build_minus_one_cst (vectype);
> +    }
> +
> +  gimple_cond_set_condition (cond_stmt, new_code, cond, cst);
> +  update_stmt (stmt);
> +
> +  if (slp_node)
> +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> +  else
> +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> +
> +
> +  if (!slp_node)
> +    *vec_stmt = stmt;
> +
> +  return true;
> +}
> +
>  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
>     can handle all live statements in the node.  Otherwise return true
>     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> @@ -12949,7 +13158,9 @@ vect_analyze_stmt (vec_info *vinfo,
>  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
>  				  stmt_info, NULL, node)
>  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> -				  stmt_info, NULL, node, cost_vec));
> +				  stmt_info, NULL, node, cost_vec)
> +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +				      cost_vec));
>    else
>      {
>        if (bb_vinfo)
> @@ -12972,7 +13183,10 @@ vect_analyze_stmt (vec_info *vinfo,
>  				       NULL, NULL, node, cost_vec)
>  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
>  					  cost_vec)
> -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +					  cost_vec));
> +
>      }
>  
>    if (node)
> @@ -13131,6 +13345,12 @@ vect_transform_stmt (vec_info *vinfo,
>        gcc_assert (done);
>        break;
>  
> +    case loop_exit_ctrl_vec_info_type:
> +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> +				      slp_node, NULL);
> +      gcc_assert (done);
> +      break;
> +
>      default:
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> @@ -14321,10 +14541,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>      }
>    else
>      {
> +      gcond *cond = NULL;
>        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
>  	scalar_type = TREE_TYPE (DR_REF (dr));
>        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
>  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> +      else if ((cond = dyn_cast <gcond *> (stmt)))
> +	{
> +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> +	     single bit precision and we need the vector boolean to be a
> +	     representation of the integer mask.  So set the correct integer type and
> +	     convert to boolean vector once we have a vectype.  */
> +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));

You should get into the vect_use_mask_type_p (stmt_info) path for
early exit conditions (see above with regard to mask_precision).

> +	}
>        else
>  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
>  
> @@ -14339,12 +14568,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>  				 "get vectype for scalar type: %T\n", scalar_type);
>      }
>    vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> +
>    if (!vectype)
>      return opt_result::failure_at (stmt,
>  				   "not vectorized:"
>  				   " unsupported data-type %T\n",
>  				   scalar_type);
>  
> +  /* If we were a gcond, convert the resulting type to a vector boolean type now
> +     that we have the correct integer mask type.  */
> +  if (cond)
> +    vectype = truth_type_for (vectype);
> +

which makes this moot.

Richard.
>    if (dump_enabled_p ())
>      dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
>  }
> 

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)