public inbox for gcc-patches@gcc.gnu.org
* [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
@ 2023-11-30  3:47 juzhe.zhong
  2023-11-30 10:39 ` Tamar Christina
  0 siblings, 1 reply; 24+ messages in thread
From: juzhe.zhong @ 2023-11-30  3:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Biener, tamar.christina


Hi, Richard and Tamar.

Sorry to bother you.
I hope you don't mind me offering some comments:

Can we support partial vectors via lengths?

IMHO, we can do that as follows:

bool length_loop_p = LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);

if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
  {
    if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
                                        OPTIMIZE_FOR_SPEED))
      vect_record_loop_len (loop_vinfo, lens, ncopies, vectype, 1);
    else
      vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type, NULL);
  }

if (length_loop_p)
  {
    tree len = vect_get_loop_len (loop_vinfo, gsi, loop_lens, 1, vectype, 0, 0);
    /* Use VCOND_MASK_LEN (all true, cond, all false, len, bias) to generate
       final mask = i < len + bias ? cond[i] : false.  */
    cond = gimple_build (&cond_gsi, IFN_VCOND_MASK_LEN, truth_type,
                         all true mask, cond, all false mask, len, bias);
  }
else if (masked_loop_p)
  {
    tree mask
      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, truth_type, 0);
    cond
      = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond, &cond_gsi);
  }

This is a prototype. Richi, does this idea seem reasonable?

Thanks.



juzhe.zhong@rivai.ai

^ permalink raw reply [flat|nested] 24+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-11-30  3:47 [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code juzhe.zhong
@ 2023-11-30 10:39 ` Tamar Christina
  2023-11-30 10:48   ` juzhe.zhong
  0 siblings, 1 reply; 24+ messages in thread
From: Tamar Christina @ 2023-11-30 10:39 UTC (permalink / raw)
  To: juzhe.zhong, gcc-patches; +Cc: Richard Biener


Hi Juzhe,

I'm happy to take the hunks, just that I can't test them and don't know the specifics of how lens work.
I still need to read up on them.

I tried adding that chunk in, but in the first bit `lens` seems undefined, and in the second bit `bias` seems undefined.

I'll also need to know what to do in vectorizable_live_operation to get the first element rather than the last.

Thanks,
Tamar

From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
Sent: Thursday, November 30, 2023 4:48 AM
To: gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Richard Biener <richard.guenther@gmail.com>; Tamar Christina <Tamar.Christina@arm.com>
Subject: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

Hi, Richard and Tamar.

I am sorry for bothering you.
Hope you don't mind I give some comments:

Can we support partial vector for length ?

IMHO, we can do that as follows:

bool length_loop_p = LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);

if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
  {
    if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
                                        OPTIMIZE_FOR_SPEED))
      vect_record_loop_len (loop_vinfo, lens, ncopies, vectype, 1);
    else
      vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type, NULL);
  }

if (length_loop_p)
  {
    tree len = vect_get_loop_len (loop_vinfo, gsi, loop_lens, 1, vectype, 0, 0);
    /* Use VCOND_MASK_LEN (all true, cond, all false, len, bias) to generate
       final mask = i < len + bias ? cond[i] : false.  */
    cond = gimple_build (&cond_gsi, IFN_VCOND_MASK_LEN, truth_type,
                         all true mask, cond, all false mask, len, bias);
  }
else if (masked_loop_p)
  {
    tree mask
      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, truth_type, 0);
    cond
      = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond, &cond_gsi);
  }

This is a prototype. Is this idea reasonable to Richi ?

Thanks.

________________________________
juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai>


* Re: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-11-30 10:39 ` Tamar Christina
@ 2023-11-30 10:48   ` juzhe.zhong
  2023-11-30 10:58     ` Tamar Christina
  0 siblings, 1 reply; 24+ messages in thread
From: juzhe.zhong @ 2023-11-30 10:48 UTC (permalink / raw)
  To: tamar.christina, gcc-patches; +Cc: Richard Biener


Thanks Tamar.

I am not sure whether we are on the same page.

IMHO, ARM SVE uses final mask = loop mask (generated by WHILE_ULT) & conditional mask,
and uses that final mask to do the cbranch. Am I right?

If so, I can leverage that for lengths and avoid too many code changes in your patch.

So, for RVV, the length plays much the same role as the loop mask does in ARM SVE.
For example, suppose n = 4: in ARM SVE, WHILE_ULT (whilelo) generates mask = 0b11110000000....
That mask is then used to control the operations.

For RVV it is the same: the length will be 4, and we will only process the elements with index < 4.

For bias, I think that won't be an issue. Currently, BIAS is not used by RVV; it is only used on len_load/len_store for IBM targets.
So the bias value defaults to 0 in all situations other than len_load/len_store specifically for IBM.



juzhe.zhong@rivai.ai
 
From: Tamar Christina
Date: 2023-11-30 18:39
To: juzhe.zhong@rivai.ai; gcc-patches
CC: Richard Biener
Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
Hi Juzhe,
 
I’m happy to take the hunks, just that I can’t test it and don’t know the specifics of how it lens work.
I still need to read up on it.
 
I tried adding that chunk in, but for the first bit `lens` seems undefined, and the second bit it seems `bias` is undefined.
 
I’ll also need what to do for vectorizable_live_operations how to get the first element rather than the last.
 
Thanks,
Tamar
 
From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai> 
Sent: Thursday, November 30, 2023 4:48 AM
To: gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Richard Biener <richard.guenther@gmail.com>; Tamar Christina <Tamar.Christina@arm.com>
Subject: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
 
Hi, Richard and Tamar.
 
I am sorry for bothering you.
Hope you don't mind I give some comments:
 
Can we support partial vector for length ?
 
IMHO, we can do that as follows:
 
bool length_loop_p = LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);
 
if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
  {
    if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
                                        OPTIMIZE_FOR_SPEED))
      vect_record_loop_len (loop_vinfo, lens, ncopies, vectype, 1);
    else
      vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type, NULL);
  }
 
if (length_loop_p)
  {
    tree len = vect_get_loop_len (loop_vinfo, gsi, loop_lens, 1, vectype, 0, 0);
    /* Use VCOND_MASK_LEN (all true, cond, all false, len, bias) to generate
       final mask = i < len + bias ? cond[i] : false.  */
    cond = gimple_build (&cond_gsi, IFN_VCOND_MASK_LEN, truth_type,
                         all true mask, cond, all false mask, len, bias);
  }
else if (masked_loop_p)
  {
    tree mask
      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, truth_type, 0);
    cond
      = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond, &cond_gsi);
  }
 
This is a prototype. Is this idea reasonable to Richi ?
 
Thanks.
 


juzhe.zhong@rivai.ai


* RE: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-11-30 10:48   ` juzhe.zhong
@ 2023-11-30 10:58     ` Tamar Christina
  0 siblings, 0 replies; 24+ messages in thread
From: Tamar Christina @ 2023-11-30 10:58 UTC (permalink / raw)
  To: juzhe.zhong, gcc-patches; +Cc: Richard Biener


Hi Juzhe,

I meant that `lens` is undefined; from looking around, I guess it needs to be

  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);

for `bias` I meant

    cond = gimple_build (&cond_gsi, IFN_VCOND_MASK_LEN, truth_type,
                         all true mask, cond, all false mask, len, bias);

the variable `bias` isn't defined, and I can't find any other place that creates an IFN_VCOND_MASK_LEN to figure out what it's supposed to be 😊

Is it just an SImode 0?

Thanks,
Tamar


From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
Sent: Thursday, November 30, 2023 11:49 AM
To: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Richard Biener <richard.guenther@gmail.com>
Subject: Re: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

Thanks Tamar.

I am not sure whether I am not on the same page with you.

IMHO, ARM SVE will use the final mask = loop mask (generate by WHILE_ULT) & conditional mask.
Use that final mask to do the cbranch. Am I right ?

If yes, I leverage that for length and avoid too much codes change in your patch.

So, for RVV, the length is pretty same as loop mask in ARM SVE.
For example, suppose n = 4, in ARM SVE, WHILE_ULT (whilelo) generate mask = 0b11110000000....
Then use that mask to control the operations.

For RVV, is the same, length will be 4, then we will only process the elements with index < 4.

For bias, I think that won't be the issue. Currently, BIAS is not used by RVV and only used on len_load/len_store for IBM targets.
So, the bias value by default is 0 in all other situations except len_load/len_store specifically for IBM.

________________________________
juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai>

From: Tamar Christina<mailto:Tamar.Christina@arm.com>
Date: 2023-11-30 18:39
To: juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai>; gcc-patches<mailto:gcc-patches@gcc.gnu.org>
CC: Richard Biener<mailto:richard.guenther@gmail.com>
Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
Hi Juzhe,

I’m happy to take the hunks, just that I can’t test it and don’t know the specifics of how it lens work.
I still need to read up on it.

I tried adding that chunk in, but for the first bit `lens` seems undefined, and the second bit it seems `bias` is undefined.

I’ll also need what to do for vectorizable_live_operations how to get the first element rather than the last.

Thanks,
Tamar

From: juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai> <juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai>>
Sent: Thursday, November 30, 2023 4:48 AM
To: gcc-patches <gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>>
Cc: Richard Biener <richard.guenther@gmail.com<mailto:richard.guenther@gmail.com>>; Tamar Christina <Tamar.Christina@arm.com<mailto:Tamar.Christina@arm.com>>
Subject: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

Hi, Richard and Tamar.

I am sorry for bothering you.
Hope you don't mind I give some comments:

Can we support partial vector for length ?

IMHO, we can do that as follows:

bool length_loop_p = LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);

if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
  {
    if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
                                        OPTIMIZE_FOR_SPEED))
      vect_record_loop_len (loop_vinfo, lens, ncopies, vectype, 1);
    else
      vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type, NULL);
  }

if (length_loop_p)
  {
    tree len = vect_get_loop_len (loop_vinfo, gsi, loop_lens, 1, vectype, 0, 0);
    /* Use VCOND_MASK_LEN (all true, cond, all false, len, bias) to generate
       final mask = i < len + bias ? cond[i] : false.  */
    cond = gimple_build (&cond_gsi, IFN_VCOND_MASK_LEN, truth_type,
                         all true mask, cond, all false mask, len, bias);
  }
else if (masked_loop_p)
  {
    tree mask
      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, truth_type, 0);
    cond
      = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond, &cond_gsi);
  }

This is a prototype. Is this idea reasonable to Richi ?

Thanks.

________________________________
juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai>


* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-14 13:12                                 ` Richard Biener
@ 2023-12-14 18:44                                   ` Tamar Christina
  0 siblings, 0 replies; 24+ messages in thread
From: Tamar Christina @ 2023-12-14 18:44 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Thursday, December 14, 2023 1:13 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; gcc-patches@gcc.gnu.org;
> nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> codegen of exit code
> 
> On Wed, 13 Dec 2023, Tamar Christina wrote:
> 
> > > > >   else if (vect_use_mask_type_p (stmt_info))
> > > > >     {
> > > > >       unsigned int precision = stmt_info->mask_precision;
> > > > >       scalar_type = build_nonstandard_integer_type (precision, 1);
> > > > >       vectype = get_mask_type_for_scalar_type (vinfo, scalar_type,
> > > > > group_size);
> > > > >       if (!vectype)
> > > > >         return opt_result::failure_at (stmt, "not vectorized: unsupported"
> > > > >                                        " data-type %T\n", scalar_type);
> > > > >
> > > > > Richard, do you have any advice here?  I suppose
> vect_determine_precisions
> > > > > needs to handle the gcond case with bool != 0 somehow and for the
> > > > > extra mask producer we add here we have to emulate what it would have
> > > > > done, right?
> > > >
> > > > How about handling gconds directly in vect_determine_mask_precision?
> > > > In a sense it's not needed, since gconds are always roots, and so we
> > > > could calculate their precision on the fly instead.  But handling it in
> > > > vect_determine_mask_precision feels like it should reduce the number
> > > > of special cases.
> > >
> > > Yeah, that sounds worth trying.
> > >
> > > Richard.
> >
> > So here's a respin with this suggestion and the other issues fixed.
> > Note that the testcases still need to be updated with the right stanzas.
> >
> > The patch is much smaller, I still have a small change to
> > vect_get_vector_types_for_stmt  in case we get there on a gcond where
> > vect_recog_gcond_pattern couldn't apply due to the target missing an
> > appropriate vectype.  The change only gracefully rejects the gcond.
> >
> > Since patterns cannot apply to the same root twice I've had to also do
> > the split of the condition out of the gcond in bitfield lowering.
> 
> Bah.  Guess we want to fix that (next stage1).  Can you please add
> a comment to the split out done in vect_recog_bitfield_ref_pattern?

Done.

> 
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and
> no issues.
> >
> > Ok for master?
> 
> OK with the above change.
> 

Thanks!

That leaves one patch left; I'll have that for you Tuesday morning. I'm currently going over it
to see if I can clean it up a bit more (a day or two usually helps) to minimize respins.

I'll then also send the final testsuite patches.

Thanks for all the reviews!

Cheers,
Tamar


* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-13 14:13                               ` Tamar Christina
@ 2023-12-14 13:12                                 ` Richard Biener
  2023-12-14 18:44                                   ` Tamar Christina
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2023-12-14 13:12 UTC (permalink / raw)
  To: Tamar Christina; +Cc: Richard Sandiford, gcc-patches, nd, jlaw

On Wed, 13 Dec 2023, Tamar Christina wrote:

> > > >   else if (vect_use_mask_type_p (stmt_info))
> > > >     {
> > > >       unsigned int precision = stmt_info->mask_precision;
> > > >       scalar_type = build_nonstandard_integer_type (precision, 1);
> > > >       vectype = get_mask_type_for_scalar_type (vinfo, scalar_type,
> > > > group_size);
> > > >       if (!vectype)
> > > >         return opt_result::failure_at (stmt, "not vectorized: unsupported"
> > > >                                        " data-type %T\n", scalar_type);
> > > >
> > > > Richard, do you have any advice here?  I suppose vect_determine_precisions
> > > > needs to handle the gcond case with bool != 0 somehow and for the
> > > > extra mask producer we add here we have to emulate what it would have
> > > > done, right?
> > >
> > > How about handling gconds directly in vect_determine_mask_precision?
> > > In a sense it's not needed, since gconds are always roots, and so we
> > > could calculate their precision on the fly instead.  But handling it in
> > > vect_determine_mask_precision feels like it should reduce the number
> > > of special cases.
> > 
> > Yeah, that sounds worth trying.
> > 
> > Richard.
> 
> So here's a respin with this suggestion and the other issues fixed.
> Note that the testcases still need to be updated with the right stanzas.
> 
> The patch is much smaller, I still have a small change to
> vect_get_vector_types_for_stmt  in case we get there on a gcond where
> vect_recog_gcond_pattern couldn't apply due to the target missing an
> appropriate vectype.  The change only gracefully rejects the gcond.
> 
> Since patterns cannot apply to the same root twice I've had to also do
> the split of the condition out of the gcond in bitfield lowering.

Bah.  Guess we want to fix that (next stage1).  Can you please add
a comment to the split out done in vect_recog_bitfield_ref_pattern?

> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues.
> 
> Ok for master?

OK with the above change.

Thanks,
Richard.

> Thanks,
> Tamar
> gcc/ChangeLog:
> 
> 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gcond
> 	(vect_recog_bitfield_ref_pattern): Update to split out bool.
> 	(vect_recog_gcond_pattern): New.
> 	(possible_vector_mask_operation_p): Support gcond.
> 	(vect_determine_mask_precision): Likewise.
> 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> 	lhs.
> 	(vectorizable_early_exit): New.
> 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> 	(vect_get_vector_types_for_stmt): Rejects gcond if not lowered by
> 	vect_recog_gcond_pattern.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/vect/vect-early-break_84.c: New test.
> 	* gcc.dg/vect/vect-early-break_85.c: New test.
> 	* gcc.dg/vect/vect-early-break_86.c: New test.
> 	* gcc.dg/vect/vect-early-break_87.c: New test.
> 	* gcc.dg/vect/vect-early-break_88.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..0622339491d333b07c2ce895785b5216713097a9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c
> @@ -0,0 +1,39 @@
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#include <stdbool.h>
> +
> +#ifndef N
> +#define N 17
> +#endif
> +bool vect_a[N] = { false, false, true, false, false, false,
> +                   false, false, false, false, false, false,
> +                   false, false, false, false, false };
> +unsigned vect_b[N] = { 0 };
> +
> +__attribute__ ((noinline, noipa))
> +unsigned test4(bool x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   if (vect_a[i] == x)
> +     return 1;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +extern void abort ();
> +
> +int main ()
> +{
> +  if (test4 (true) != 1)
> +    abort ();
> +
> +  if (vect_b[2] != 0 && vect_b[1] == 0)
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..39b3d9bad8681a2d15d7fc7de86bdd3ce0f0bd4e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c
> @@ -0,0 +1,35 @@
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 5
> +#endif
> +int vect_a[N] = { 5, 4, 8, 4, 6 };
> +unsigned vect_b[N] = { 0 };
> +
> +__attribute__ ((noinline, noipa))
> +unsigned test4(int x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   if (vect_a[i] > x)
> +     return 1;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +extern void abort ();
> +
> +int main ()
> +{
> +  if (test4 (7) != 1)
> +    abort ();
> +
> +  if (vect_b[2] != 0 && vect_b[1] == 0)
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..66eb570f4028bca4b631329d7af50c646d3c0cb3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c
> @@ -0,0 +1,21 @@
> +/* { dg-additional-options "-std=gnu89" } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +extern void abort ();
> +extern void exit (int);
> +
> +__attribute__((noinline, noipa))
> +int f(x) {
> +  int i;
> +  for (i = 0; i < 8 && (x & 1) == 1; i++)
> +    x >>= 1;
> +  return i;
> +}
> +main() {
> +  if (f(4) != 0)
> +    abort();
> +  exit(0);
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..67be67da0583ba7feda3bed09c42fa735da9b98e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c
> @@ -0,0 +1,21 @@
> +/* { dg-additional-options "-std=gnu89" } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +extern void abort ();
> +extern void exit (int);
> +
> +__attribute__((noinline, noipa))
> +int f(x) {
> +  int i;
> +  for (i = 0; i < 8 && (x & 1) == 0; i++)
> +    x >>= 1;
> +  return i;
> +}
> +main() {
> +  if (f(4) != 2)
> +    abort();
> +  exit(0);
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b64becd588973f58601196bfcb15afbe4bab60f2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> @@ -0,0 +1,36 @@
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 5
> +#endif
> +float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
> +unsigned vect_b[N] = { 0 };
> +
> +__attribute__ ((noinline, noipa))
> +unsigned test4(double x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +extern void abort ();
> +
> +int main ()
> +{
> +  if (test4 (7.0) != 0)
> +    abort ();
> +
> +  if (vect_b[2] != 0 && vect_b[1] == 0)
> +    abort ();
> +}
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..f6ce27a7c45aa6ce72c402987958ee395c045a14 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
>    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>      {
>        gcc_assert (!vectype
> +		  || is_a <gcond *> (pattern_stmt)
>  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
>  		      == vect_use_mask_type_p (orig_stmt_info)));
>        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> @@ -2786,15 +2787,24 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
>  
>    if (!lhs)
>      {
> +      if (!vectype)
> +	return NULL;
> +
>        append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
> +      vectype = truth_type_for (vectype);
> +
> +      tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
>        gcond *cond_stmt = dyn_cast <gcond *> (stmt_info->stmt);
>        tree cond_cst = gimple_cond_rhs (cond_stmt);
> +      gimple *new_stmt
> +	= gimple_build_assign (new_lhs, gimple_cond_code (cond_stmt),
> +			       gimple_get_lhs (pattern_stmt),
> +			       fold_convert (container_type, cond_cst));
> +      append_pattern_def_seq (vinfo, stmt_info, new_stmt, vectype, container_type);
>        pattern_stmt
> -	= gimple_build_cond (gimple_cond_code (cond_stmt),
> -			     gimple_get_lhs (pattern_stmt),
> -			     fold_convert (ret_type, cond_cst),
> -			     gimple_cond_true_label (cond_stmt),
> -			     gimple_cond_false_label (cond_stmt));
> +	= gimple_build_cond (NE_EXPR, new_lhs,
> +			     build_zero_cst (TREE_TYPE (new_lhs)),
> +			     NULL_TREE, NULL_TREE);
>      }
>  
>    *type_out = STMT_VINFO_VECTYPE (stmt_info);
> @@ -5553,6 +5563,72 @@ integer_type_for_mask (tree var, vec_info *vinfo)
>    return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
>  }
>  
> +/* Function vect_recog_gcond_pattern
> +
> +   Try to find pattern like following:
> +
> +     if (a op b)
> +
> +   where operator 'op' is not != and convert it to an adjusted boolean pattern
> +
> +     mask = a op b
> +     if (mask != 0)
> +
> +   and set the mask type on MASK.
> +
> +   Input:
> +
> +   * STMT_VINFO: The stmt at the end from which the pattern
> +		 search begins, i.e. cast of a bool to
> +		 an integer type.
> +
> +   Output:
> +
> +   * TYPE_OUT: The type of the output of this pattern.
> +
> +   * Return value: A new stmt that will be used to replace the pattern.  */
> +
> +static gimple *
> +vect_recog_gcond_pattern (vec_info *vinfo,
> +			 stmt_vec_info stmt_vinfo, tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> +  gcond* cond = NULL;
> +  if (!(cond = dyn_cast <gcond *> (last_stmt)))
> +    return NULL;
> +
> +  auto lhs = gimple_cond_lhs (cond);
> +  auto rhs = gimple_cond_rhs (cond);
> +  auto code = gimple_cond_code (cond);
> +
> +  tree scalar_type = TREE_TYPE (lhs);
> +  if (VECTOR_TYPE_P (scalar_type))
> +    return NULL;
> +
> +  if (code == NE_EXPR
> +      && zerop (rhs)
> +      && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type))
> +    return NULL;
> +
> +  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +  if (vecitype == NULL_TREE)
> +    return NULL;
> +
> +  tree vectype = truth_type_for (vecitype);
> +
> +  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
> +  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
> +  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
> +
> +  gimple *pattern_stmt
> +    = gimple_build_cond (NE_EXPR, new_lhs,
> +			 build_int_cst (TREE_TYPE (new_lhs), 0),
> +			 NULL_TREE, NULL_TREE);
> +  *type_out = vectype;
> +  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
> +  return pattern_stmt;
> +}
> +
>  /* Function vect_recog_bool_pattern
>  
>     Try to find pattern like following:
> @@ -6581,15 +6657,26 @@ static bool
>  possible_vector_mask_operation_p (stmt_vec_info stmt_info)
>  {
>    tree lhs = gimple_get_lhs (stmt_info->stmt);
> +  tree_code code = ERROR_MARK;
> +  gassign *assign = NULL;
> +  gcond *cond = NULL;
> +
> +  if ((assign = dyn_cast <gassign *> (stmt_info->stmt)))
> +    code = gimple_assign_rhs_code (assign);
> +  else if ((cond = dyn_cast <gcond *> (stmt_info->stmt)))
> +    {
> +      lhs = gimple_cond_lhs (cond);
> +      code = gimple_cond_code (cond);
> +    }
> +
>    if (!lhs
>        || TREE_CODE (lhs) != SSA_NAME
>        || !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (lhs)))
>      return false;
>  
> -  if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
> +  if (code != ERROR_MARK)
>      {
> -      tree_code rhs_code = gimple_assign_rhs_code (assign);
> -      switch (rhs_code)
> +      switch (code)
>  	{
>  	CASE_CONVERT:
>  	case SSA_NAME:
> @@ -6600,7 +6687,7 @@ possible_vector_mask_operation_p (stmt_vec_info stmt_info)
>  	  return true;
>  
>  	default:
> -	  return TREE_CODE_CLASS (rhs_code) == tcc_comparison;
> +	  return TREE_CODE_CLASS (code) == tcc_comparison;
>  	}
>      }
>    else if (is_a <gphi *> (stmt_info->stmt))
> @@ -6647,12 +6734,35 @@ vect_determine_mask_precision (vec_info *vinfo, stmt_vec_info stmt_info)
>       The number of operations are equal, but M16 would have given
>       a shorter dependency chain and allowed more ILP.  */
>    unsigned int precision = ~0U;
> -  if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +
> +  /* If the statement compares two values that shouldn't use vector masks,
> +     try comparing the values as normal scalars instead.  */
> +  tree_code code = ERROR_MARK;
> +  tree op0_type;
> +  unsigned int nops = -1;
> +  unsigned int ops_start = 0;
> +
> +  if (gassign *assign = dyn_cast <gassign *> (stmt))
> +    {
> +      code = gimple_assign_rhs_code (assign);
> +      op0_type = TREE_TYPE (gimple_assign_rhs1 (assign));
> +      nops = gimple_num_ops (assign);
> +      ops_start = 1;
> +    }
> +  else if (gcond *cond = dyn_cast <gcond *> (stmt))
> +    {
> +      code = gimple_cond_code (cond);
> +      op0_type = TREE_TYPE (gimple_cond_lhs (cond));
> +      nops = 2;
> +      ops_start = 0;
> +    }
> +
> +  if (code != ERROR_MARK)
>      {
> -      unsigned int nops = gimple_num_ops (assign);
> -      for (unsigned int i = 1; i < nops; ++i)
> +      for (unsigned int i = ops_start; i < nops; ++i)
>  	{
> -	  tree rhs = gimple_op (assign, i);
> +	  tree rhs = gimple_op (stmt, i);
>  	  if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs)))
>  	    continue;
>  
> @@ -6669,19 +6779,15 @@ vect_determine_mask_precision (vec_info *vinfo, stmt_vec_info stmt_info)
>  	    }
>  	}
>  
> -      /* If the statement compares two values that shouldn't use vector masks,
> -	 try comparing the values as normal scalars instead.  */
> -      tree_code rhs_code = gimple_assign_rhs_code (assign);
>        if (precision == ~0U
> -	  && TREE_CODE_CLASS (rhs_code) == tcc_comparison)
> +	  && TREE_CODE_CLASS (code) == tcc_comparison)
>  	{
> -	  tree rhs1_type = TREE_TYPE (gimple_assign_rhs1 (assign));
>  	  scalar_mode mode;
>  	  tree vectype, mask_type;
> -	  if (is_a <scalar_mode> (TYPE_MODE (rhs1_type), &mode)
> -	      && (vectype = get_vectype_for_scalar_type (vinfo, rhs1_type))
> -	      && (mask_type = get_mask_type_for_scalar_type (vinfo, rhs1_type))
> -	      && expand_vec_cmp_expr_p (vectype, mask_type, rhs_code))
> +	  if (is_a <scalar_mode> (TYPE_MODE (op0_type), &mode)
> +	      && (vectype = get_vectype_for_scalar_type (vinfo, op0_type))
> +	      && (mask_type = get_mask_type_for_scalar_type (vinfo, op0_type))
> +	      && expand_vec_cmp_expr_p (vectype, mask_type, code))
>  	    precision = GET_MODE_BITSIZE (mode);
>  	}
>      }
> @@ -6860,6 +6966,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>    { vect_recog_divmod_pattern, "divmod" },
>    { vect_recog_mult_pattern, "mult" },
>    { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
> +  { vect_recog_gcond_pattern, "gcond" },
>    { vect_recog_bool_pattern, "bool" },
>    /* This must come before mask conversion, and includes the parts
>       of mask conversion that are needed for gather and scatter
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 582c5e678fad802d6e76300fe3c939b9f2978f17..766450cd85b55ce4dfd45878c5dc44cd09c68681 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    vec<tree> vec_oprnds0 = vNULL;
>    vec<tree> vec_oprnds1 = vNULL;
>    tree mask_type;
> -  tree mask;
> +  tree mask = NULL_TREE;
>  
>    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>      return false;
> @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    /* Transform.  */
>  
>    /* Handle def.  */
> -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> -  mask = vect_create_destination_var (lhs, mask_type);
> +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> +  if (lhs)
> +    mask = vect_create_destination_var (lhs, mask_type);
>  
>    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
>  		     rhs1, &vec_oprnds0, vectype,
> @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>        gimple *new_stmt;
>        vec_rhs2 = vec_oprnds1[i];
>  
> -      new_temp = make_ssa_name (mask);
> +      if (lhs)
> +	new_temp = make_ssa_name (mask);
> +      else
> +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
>        if (bitop1 == NOP_EXPR)
>  	{
>  	  new_stmt = gimple_build_assign (new_temp, code,
> @@ -12723,6 +12727,207 @@ vectorizable_comparison (vec_info *vinfo,
>    return true;
>  }
>  
> +/* Check to see if the current early break given in STMT_INFO is valid for
> +   vectorization.  */
> +
> +static bool
> +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +{
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (!loop_vinfo
> +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> +    return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> +    return false;
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +    return false;
> +
> +  DUMP_VECT_SCOPE ("vectorizable_early_exit");
> +
> +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> +
> +  tree vectype = NULL_TREE;
> +  slp_tree slp_op0;
> +  tree op0;
> +  enum vect_def_type dt0;
> +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> +			   &vectype))
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			   "use not simple.\n");
> +	return false;
> +    }
> +
> +  if (!vectype)
> +    return false;
> +
> +  machine_mode mode = TYPE_MODE (vectype);
> +  int ncopies;
> +
> +  if (slp_node)
> +    ncopies = 1;
> +  else
> +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> +
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> +
> +  /* Analyze only.  */
> +  if (!vec_stmt)
> +    {
> +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target doesn't support flag setting vector "
> +			       "comparisons.\n");
> +	  return false;
> +	}
> +
> +      if (ncopies > 1
> +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target does not support boolean vector OR for "
> +			       "type %T.\n", vectype);
> +	  return false;
> +	}
> +
> +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				      vec_stmt, slp_node, cost_vec))
> +	return false;
> +
> +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +	{
> +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> +					      OPTIMIZE_FOR_SPEED))
> +	    return false;
> +	  else
> +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> +	}
> +
> +
> +      return true;
> +    }
> +
> +  /* Tranform.  */
> +
> +  tree new_temp = NULL_TREE;
> +  gimple *new_stmt = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> +
> +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				  vec_stmt, slp_node, cost_vec))
> +    gcc_unreachable ();
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  basic_block cond_bb = gimple_bb (stmt);
> +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> +
> +  auto_vec<tree> stmts;
> +
> +  if (slp_node)
> +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> +  else
> +    {
> +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> +      stmts.reserve_exact (vec_stmts.length ());
> +      for (auto stmt : vec_stmts)
> +	stmts.quick_push (gimple_assign_lhs (stmt));
> +    }
> +
> +  /* Determine if we need to reduce the final value.  */
> +  if (stmts.length () > 1)
> +    {
> +      /* We build the reductions in a way to maintain as much parallelism as
> +	 possible.  */
> +      auto_vec<tree> workset (stmts.length ());
> +
> +      /* Mask the statements as we queue them up.  Normally we loop over
> +	 vec_num,  but since we inspect the exact results of vectorization
> +	 we don't need to and instead can just use the stmts themselves.  */
> +      if (masked_loop_p)
> +	for (unsigned i = 0; i < stmts.length (); i++)
> +	  {
> +	    tree stmt_mask
> +	      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype,
> +				    i);
> +	    stmt_mask
> +	      = prepare_vec_mask (loop_vinfo, TREE_TYPE (stmt_mask), stmt_mask,
> +				  stmts[i], &cond_gsi);
> +	    workset.quick_push (stmt_mask);
> +	  }
> +      else
> +	workset.splice (stmts);
> +
> +      while (workset.length () > 1)
> +	{
> +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> +	  tree arg0 = workset.pop ();
> +	  tree arg1 = workset.pop ();
> +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> +				       &cond_gsi);
> +	  workset.quick_insert (0, new_temp);
> +	}
> +    }
> +  else
> +    {
> +      new_temp = stmts[0];
> +      if (masked_loop_p)
> +	{
> +	  tree mask
> +	    = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> +	  new_temp = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
> +				       new_temp, &cond_gsi);
> +	}
> +    }
> +
> +  gcc_assert (new_temp);
> +
> +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> +     codegen so we must replace the original insn.  */
> +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> +  gcond *cond_stmt = as_a <gcond *>(stmt);
> +  /* When vectorizing we assume that if the branch edge is taken that we're
> +     exiting the loop.  This is not however always the case as the compiler will
> +     rewrite conditions to always be a comparison against 0.  To do this it
> +     sometimes flips the edges.  This is fine for scalar,  but for vector we
> +     then have to flip the test, as we're still assuming that if you take the
> +     branch edge that we found the exit condition.  */
> +  auto new_code = NE_EXPR;
> +  tree cst = build_zero_cst (vectype);
> +  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> +			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
> +    {
> +      new_code = EQ_EXPR;
> +      cst = build_minus_one_cst (vectype);
> +    }
> +
> +  gimple_cond_set_condition (cond_stmt, new_code, new_temp, cst);
> +  update_stmt (stmt);
> +
> +  if (slp_node)
> +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> +   else
> +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> +
> +  if (!slp_node)
> +    *vec_stmt = stmt;
> +
> +  return true;
> +}
> +
>  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
>     can handle all live statements in the node.  Otherwise return true
>     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> @@ -12949,7 +13154,9 @@ vect_analyze_stmt (vec_info *vinfo,
>  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
>  				  stmt_info, NULL, node)
>  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> -				   stmt_info, NULL, node, cost_vec));
> +				   stmt_info, NULL, node, cost_vec)
> +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +				      cost_vec));
>    else
>      {
>        if (bb_vinfo)
> @@ -12972,7 +13179,10 @@ vect_analyze_stmt (vec_info *vinfo,
>  					 NULL, NULL, node, cost_vec)
>  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
>  					  cost_vec)
> -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +					  cost_vec));
> +
>      }
>  
>    if (node)
> @@ -13131,6 +13341,12 @@ vect_transform_stmt (vec_info *vinfo,
>        gcc_assert (done);
>        break;
>  
> +    case loop_exit_ctrl_vec_info_type:
> +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> +				      slp_node, NULL);
> +      gcc_assert (done);
> +      break;
> +
>      default:
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> @@ -14321,6 +14537,14 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>      }
>    else
>      {
> +      /* If we got here with a gcond it means that the target had no available vector
> +	 mode for the scalar type.  We can't vectorize so abort.  */
> +      if (is_a <gcond *> (stmt))
> +	return opt_result::failure_at (stmt,
> +				       "not vectorized:"
> +				       " unsupported data-type for gcond %T\n",
> +				       scalar_type);
> +
>        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
>  	scalar_type = TREE_TYPE (DR_REF (dr));
>        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-12 11:30                             ` Richard Biener
@ 2023-12-13 14:13                               ` Tamar Christina
  2023-12-14 13:12                                 ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: Tamar Christina @ 2023-12-13 14:13 UTC (permalink / raw)
  To: Richard Biener, Richard Sandiford; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 25396 bytes --]

> > >   else if (vect_use_mask_type_p (stmt_info))
> > >     {
> > >       unsigned int precision = stmt_info->mask_precision;
> > >       scalar_type = build_nonstandard_integer_type (precision, 1);
> > >       vectype = get_mask_type_for_scalar_type (vinfo, scalar_type,
> > > group_size);
> > >       if (!vectype)
> > >         return opt_result::failure_at (stmt, "not vectorized: unsupported"
> > >                                        " data-type %T\n", scalar_type);
> > >
> > > Richard, do you have any advice here?  I suppose vect_determine_precisions
> > > needs to handle the gcond case with bool != 0 somehow and for the
> > > extra mask producer we add here we have to emulate what it would have
> > > done, right?
> >
> > How about handling gconds directly in vect_determine_mask_precision?
> > In a sense it's not needed, since gconds are always roots, and so we
> > could calculate their precision on the fly instead.  But handling it in
> > vect_determine_mask_precision feels like it should reduce the number
> > of special cases.
> 
> Yeah, that sounds worth trying.
> 
> Richard.

So here's a respin with this suggestion and the other issues fixed.
Note that the testcases still need to be updated with the right stanzas.

The patch is much smaller; I still have a small change to
vect_get_vector_types_for_stmt in case we get there on a gcond where
vect_recog_gcond_pattern couldn't apply due to the target missing an
appropriate vectype.  The change only gracefully rejects the gcond.

Since patterns cannot apply to the same root twice I've had to also do
the split of the condition out of the gcond in bitfield lowering.
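
To make the split concrete: the exit condition is pulled out of the gcond
into a separate maskable comparison, and the gcond itself becomes a plain
test of that mask (an illustrative GIMPLE-like sketch, not an actual dump;
SSA names are made up):

```
  before:                       after:
    if (a_1 > b_2)                mask_3 = a_1 > b_2;
      goto exit;                  if (mask_3 != 0)
                                    goto exit;
```

This is the same shape that vect_recog_gcond_pattern produces for the
general (non-bitfield) case below.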

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar
gcc/ChangeLog:

	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gcond.
	(vect_recog_bitfield_ref_pattern): Update to split out bool.
	(vect_recog_gcond_pattern): New.
	(possible_vector_mask_operation_p): Support gcond.
	(vect_determine_mask_precision): Likewise.
	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
	lhs.
	(vectorizable_early_exit): New.
	(vect_analyze_stmt, vect_transform_stmt): Use it.
	(vect_get_vector_types_for_stmt): Reject gcond if not lowered by
	vect_recog_gcond_pattern.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-early-break_84.c: New test.
	* gcc.dg/vect/vect-early-break_85.c: New test.
	* gcc.dg/vect/vect-early-break_86.c: New test.
	* gcc.dg/vect/vect-early-break_87.c: New test.
	* gcc.dg/vect/vect-early-break_88.c: New test.

--- inline copy of patch ---

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c
new file mode 100644
index 0000000000000000000000000000000000000000..0622339491d333b07c2ce895785b5216713097a9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <stdbool.h>
+
+#ifndef N
+#define N 17
+#endif
+bool vect_a[N] = { false, false, true, false, false, false,
+                   false, false, false, false, false, false,
+                   false, false, false, false, false };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(bool x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] == x)
+     return 1;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (true) != 1)
+    abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c
new file mode 100644
index 0000000000000000000000000000000000000000..39b3d9bad8681a2d15d7fc7de86bdd3ce0f0bd4e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c
@@ -0,0 +1,35 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 5
+#endif
+int vect_a[N] = { 5, 4, 8, 4, 6 };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(int x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] > x)
+     return 1;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (7) != 1)
+    abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c
new file mode 100644
index 0000000000000000000000000000000000000000..66eb570f4028bca4b631329d7af50c646d3c0cb3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c
@@ -0,0 +1,21 @@
+/* { dg-additional-options "-std=gnu89" } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+extern void abort ();
+extern void exit (int);
+
+__attribute__((noinline, noipa))
+int f(x) {
+  int i;
+  for (i = 0; i < 8 && (x & 1) == 1; i++)
+    x >>= 1;
+  return i;
+}
+main() {
+  if (f(4) != 0)
+    abort();
+  exit(0);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c
new file mode 100644
index 0000000000000000000000000000000000000000..67be67da0583ba7feda3bed09c42fa735da9b98e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c
@@ -0,0 +1,21 @@
+/* { dg-additional-options "-std=gnu89" } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+extern void abort ();
+extern void exit (int);
+
+__attribute__((noinline, noipa))
+int f(x) {
+  int i;
+  for (i = 0; i < 8 && (x & 1) == 0; i++)
+    x >>= 1;
+  return i;
+}
+main() {
+  if (f(4) != 2)
+    abort();
+  exit(0);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
new file mode 100644
index 0000000000000000000000000000000000000000..b64becd588973f58601196bfcb15afbe4bab60f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
@@ -0,0 +1,36 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 5
+#endif
+float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(double x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (7.0) != 0)
+    abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+    abort ();
+}
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..f6ce27a7c45aa6ce72c402987958ee395c045a14 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -2786,15 +2787,24 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 
   if (!lhs)
     {
+      if (!vectype)
+	return NULL;
+
       append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      vectype = truth_type_for (vectype);
+
+      tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
       gcond *cond_stmt = dyn_cast <gcond *> (stmt_info->stmt);
       tree cond_cst = gimple_cond_rhs (cond_stmt);
+      gimple *new_stmt
+	= gimple_build_assign (new_lhs, gimple_cond_code (cond_stmt),
+			       gimple_get_lhs (pattern_stmt),
+			       fold_convert (container_type, cond_cst));
+      append_pattern_def_seq (vinfo, stmt_info, new_stmt, vectype, container_type);
       pattern_stmt
-	= gimple_build_cond (gimple_cond_code (cond_stmt),
-			     gimple_get_lhs (pattern_stmt),
-			     fold_convert (ret_type, cond_cst),
-			     gimple_cond_true_label (cond_stmt),
-			     gimple_cond_false_label (cond_stmt));
+	= gimple_build_cond (NE_EXPR, new_lhs,
+			     build_zero_cst (TREE_TYPE (new_lhs)),
+			     NULL_TREE, NULL_TREE);
     }
 
   *type_out = STMT_VINFO_VECTYPE (stmt_info);
@@ -5553,6 +5563,72 @@ integer_type_for_mask (tree var, vec_info *vinfo)
   return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
 }
 
+/* Function vect_recog_gcond_pattern
+
+   Try to find pattern like following:
+
+     if (a op b)
+
+   where operator 'op' is not != and convert it to an adjusted boolean pattern
+
+     mask = a op b
+     if (mask != 0)
+
+   and set the mask type on MASK.
+
+   Input:
+
+   * STMT_VINFO: The stmt at the end from which the pattern
+		 search begins, i.e. the gcond whose condition
+		 is to be lowered.
+
+   Output:
+
+   * TYPE_OUT: The type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the pattern.  */
+
+static gimple *
+vect_recog_gcond_pattern (vec_info *vinfo,
+			 stmt_vec_info stmt_vinfo, tree *type_out)
+{
+  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
+  gcond *cond = NULL;
+  if (!(cond = dyn_cast <gcond *> (last_stmt)))
+    return NULL;
+
+  auto lhs = gimple_cond_lhs (cond);
+  auto rhs = gimple_cond_rhs (cond);
+  auto code = gimple_cond_code (cond);
+
+  tree scalar_type = TREE_TYPE (lhs);
+  if (VECTOR_TYPE_P (scalar_type))
+    return NULL;
+
+  if (code == NE_EXPR
+      && zerop (rhs)
+      && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type))
+    return NULL;
+
+  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
+  if (vecitype == NULL_TREE)
+    return NULL;
+
+  tree vectype = truth_type_for (vecitype);
+
+  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
+  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
+  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
+
+  gimple *pattern_stmt
+    = gimple_build_cond (NE_EXPR, new_lhs,
+			 build_int_cst (TREE_TYPE (new_lhs), 0),
+			 NULL_TREE, NULL_TREE);
+  *type_out = vectype;
+  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
+  return pattern_stmt;
+}
+
 /* Function vect_recog_bool_pattern
 
    Try to find pattern like following:
@@ -6581,15 +6657,26 @@ static bool
 possible_vector_mask_operation_p (stmt_vec_info stmt_info)
 {
   tree lhs = gimple_get_lhs (stmt_info->stmt);
+  tree_code code = ERROR_MARK;
+  gassign *assign = NULL;
+  gcond *cond = NULL;
+
+  if ((assign = dyn_cast <gassign *> (stmt_info->stmt)))
+    code = gimple_assign_rhs_code (assign);
+  else if ((cond = dyn_cast <gcond *> (stmt_info->stmt)))
+    {
+      lhs = gimple_cond_lhs (cond);
+      code = gimple_cond_code (cond);
+    }
+
   if (!lhs
       || TREE_CODE (lhs) != SSA_NAME
       || !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (lhs)))
     return false;
 
-  if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
+  if (code != ERROR_MARK)
     {
-      tree_code rhs_code = gimple_assign_rhs_code (assign);
-      switch (rhs_code)
+      switch (code)
 	{
 	CASE_CONVERT:
 	case SSA_NAME:
@@ -6600,7 +6687,7 @@ possible_vector_mask_operation_p (stmt_vec_info stmt_info)
 	  return true;
 
 	default:
-	  return TREE_CODE_CLASS (rhs_code) == tcc_comparison;
+	  return TREE_CODE_CLASS (code) == tcc_comparison;
 	}
     }
   else if (is_a <gphi *> (stmt_info->stmt))
@@ -6647,12 +6734,35 @@ vect_determine_mask_precision (vec_info *vinfo, stmt_vec_info stmt_info)
      The number of operations are equal, but M16 would have given
      a shorter dependency chain and allowed more ILP.  */
   unsigned int precision = ~0U;
-  if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+
+  /* If the statement compares two values that shouldn't use vector masks,
+     try comparing the values as normal scalars instead.  */
+  tree_code code = ERROR_MARK;
+  tree op0_type;
+  unsigned int nops = -1;
+  unsigned int ops_start = 0;
+
+  if (gassign *assign = dyn_cast <gassign *> (stmt))
+    {
+      code = gimple_assign_rhs_code (assign);
+      op0_type = TREE_TYPE (gimple_assign_rhs1 (assign));
+      nops = gimple_num_ops (assign);
+      ops_start = 1;
+    }
+  else if (gcond *cond = dyn_cast <gcond *> (stmt))
+    {
+      code = gimple_cond_code (cond);
+      op0_type = TREE_TYPE (gimple_cond_lhs (cond));
+      nops = 2;
+      ops_start = 0;
+    }
+
+  if (code != ERROR_MARK)
     {
-      unsigned int nops = gimple_num_ops (assign);
-      for (unsigned int i = 1; i < nops; ++i)
+      for (unsigned int i = ops_start; i < nops; ++i)
 	{
-	  tree rhs = gimple_op (assign, i);
+	  tree rhs = gimple_op (stmt, i);
 	  if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs)))
 	    continue;
 
@@ -6669,19 +6779,15 @@ vect_determine_mask_precision (vec_info *vinfo, stmt_vec_info stmt_info)
 	    }
 	}
 
-      /* If the statement compares two values that shouldn't use vector masks,
-	 try comparing the values as normal scalars instead.  */
-      tree_code rhs_code = gimple_assign_rhs_code (assign);
       if (precision == ~0U
-	  && TREE_CODE_CLASS (rhs_code) == tcc_comparison)
+	  && TREE_CODE_CLASS (code) == tcc_comparison)
 	{
-	  tree rhs1_type = TREE_TYPE (gimple_assign_rhs1 (assign));
 	  scalar_mode mode;
 	  tree vectype, mask_type;
-	  if (is_a <scalar_mode> (TYPE_MODE (rhs1_type), &mode)
-	      && (vectype = get_vectype_for_scalar_type (vinfo, rhs1_type))
-	      && (mask_type = get_mask_type_for_scalar_type (vinfo, rhs1_type))
-	      && expand_vec_cmp_expr_p (vectype, mask_type, rhs_code))
+	  if (is_a <scalar_mode> (TYPE_MODE (op0_type), &mode)
+	      && (vectype = get_vectype_for_scalar_type (vinfo, op0_type))
+	      && (mask_type = get_mask_type_for_scalar_type (vinfo, op0_type))
+	      && expand_vec_cmp_expr_p (vectype, mask_type, code))
 	    precision = GET_MODE_BITSIZE (mode);
 	}
     }
@@ -6860,6 +6966,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_divmod_pattern, "divmod" },
   { vect_recog_mult_pattern, "mult" },
   { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
+  { vect_recog_gcond_pattern, "gcond" },
   { vect_recog_bool_pattern, "bool" },
   /* This must come before mask conversion, and includes the parts
      of mask conversion that are needed for gather and scatter
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..766450cd85b55ce4dfd45878c5dc44cd09c68681 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,207 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  DUMP_VECT_SCOPE ("vectorizable_early_exit");
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+
+  tree vectype = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+			   &vectype))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "use not simple.\n");
+      return false;
+    }
+
+  if (!vectype)
+    return false;
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "can't vectorize early exit because the "
+			     "target doesn't support flag setting vector "
+			     "comparisons.\n");
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "can't vectorize early exit because the "
+			     "target does not support boolean vector OR for "
+			     "type %T.\n", vectype);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	{
+	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+					      OPTIMIZE_FOR_SPEED))
+	    return false;
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+	}
+
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+
+      /* Mask the statements as we queue them up.  Normally we loop over
+	 vec_num, but since we inspect the exact results of vectorization
+	 we don't need to and instead can just use the stmts themselves.  */
+      if (masked_loop_p)
+	for (unsigned i = 0; i < stmts.length (); i++)
+	  {
+	    tree stmt_mask
+	      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype,
+				    i);
+	    stmt_mask
+	      = prepare_vec_mask (loop_vinfo, TREE_TYPE (stmt_mask), stmt_mask,
+				  stmts[i], &cond_gsi);
+	    workset.quick_push (stmt_mask);
+	  }
+      else
+	workset.splice (stmts);
+
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    {
+      new_temp = stmts[0];
+      if (masked_loop_p)
+	{
+	  tree mask
+	    = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
+	  new_temp = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
+				       new_temp, &cond_gsi);
+	}
+    }
+
+  gcc_assert (new_temp);
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *>(stmt);
+  /* When vectorizing we assume that if the branch edge is taken that we're
+     exiting the loop.  This is not however always the case as the compiler will
+     rewrite conditions to always be a comparison against 0.  To do this it
+     sometimes flips the edges.  This is fine for scalar,  but for vector we
+     then have to flip the test, as we're still assuming that if you take the
+     branch edge that we found the exit condition.  */
+  auto new_code = NE_EXPR;
+  tree cst = build_zero_cst (vectype);
+  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
+			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
+    {
+      new_code = EQ_EXPR;
+      cst = build_minus_one_cst (vectype);
+    }
+
+  gimple_cond_set_condition (cond_stmt, new_code, new_temp, cst);
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+   else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13154,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13179,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13131,6 +13341,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -14321,6 +14537,14 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      /* If we got here with a gcond it means that the target had no available vector
+	 mode for the scalar type.  We can't vectorize so abort.  */
+      if (is_a <gcond *> (stmt))
+	return opt_result::failure_at (stmt,
+				       "not vectorized:"
+				       " unsupported data-type for gcond %T\n",
+				       scalar_type);
+
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))

[-- Attachment #2: rb17969 (2).patch --]
[-- Type: application/octet-stream, Size: 22031 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c
new file mode 100644
index 0000000000000000000000000000000000000000..0622339491d333b07c2ce895785b5216713097a9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <stdbool.h>
+
+#ifndef N
+#define N 17
+#endif
+bool vect_a[N] = { false, false, true, false, false, false,
+                   false, false, false, false, false, false,
+                   false, false, false, false, false };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(bool x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] == x)
+     return 1;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (true) != 1)
+    abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c
new file mode 100644
index 0000000000000000000000000000000000000000..39b3d9bad8681a2d15d7fc7de86bdd3ce0f0bd4e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c
@@ -0,0 +1,35 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 5
+#endif
+int vect_a[N] = { 5, 4, 8, 4, 6 };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(int x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] > x)
+     return 1;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (7) != 1)
+    abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c
new file mode 100644
index 0000000000000000000000000000000000000000..66eb570f4028bca4b631329d7af50c646d3c0cb3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c
@@ -0,0 +1,21 @@
+/* { dg-additional-options "-std=gnu89" } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+extern void abort ();
+extern void exit (int);
+
+__attribute__((noinline, noipa))
+int f(x) {
+  int i;
+  for (i = 0; i < 8 && (x & 1) == 1; i++)
+    x >>= 1;
+  return i;
+}
+main() {
+  if (f(4) != 0)
+    abort();
+  exit(0);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c
new file mode 100644
index 0000000000000000000000000000000000000000..67be67da0583ba7feda3bed09c42fa735da9b98e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c
@@ -0,0 +1,21 @@
+/* { dg-additional-options "-std=gnu89" } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+extern void abort ();
+extern void exit (int);
+
+__attribute__((noinline, noipa))
+int f(x) {
+  int i;
+  for (i = 0; i < 8 && (x & 1) == 0; i++)
+    x >>= 1;
+  return i;
+}
+main() {
+  if (f(4) != 2)
+    abort();
+  exit(0);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
new file mode 100644
index 0000000000000000000000000000000000000000..b64becd588973f58601196bfcb15afbe4bab60f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
@@ -0,0 +1,36 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 5
+#endif
+float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(double x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (7.0) != 0)
+    abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+    abort ();
+}
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..f6ce27a7c45aa6ce72c402987958ee395c045a14 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -2786,15 +2787,24 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 
   if (!lhs)
     {
+      if (!vectype)
+	return NULL;
+
       append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      vectype = truth_type_for (vectype);
+
+      tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
       gcond *cond_stmt = dyn_cast <gcond *> (stmt_info->stmt);
       tree cond_cst = gimple_cond_rhs (cond_stmt);
+      gimple *new_stmt
+	= gimple_build_assign (new_lhs, gimple_cond_code (cond_stmt),
+			       gimple_get_lhs (pattern_stmt),
+			       fold_convert (container_type, cond_cst));
+      append_pattern_def_seq (vinfo, stmt_info, new_stmt, vectype, container_type);
       pattern_stmt
-	= gimple_build_cond (gimple_cond_code (cond_stmt),
-			     gimple_get_lhs (pattern_stmt),
-			     fold_convert (ret_type, cond_cst),
-			     gimple_cond_true_label (cond_stmt),
-			     gimple_cond_false_label (cond_stmt));
+	= gimple_build_cond (NE_EXPR, new_lhs,
+			     build_zero_cst (TREE_TYPE (new_lhs)),
+			     NULL_TREE, NULL_TREE);
     }
 
   *type_out = STMT_VINFO_VECTYPE (stmt_info);
@@ -5553,6 +5563,72 @@ integer_type_for_mask (tree var, vec_info *vinfo)
   return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
 }
 
+/* Function vect_recog_gcond_pattern
+
+   Try to find pattern like following:
+
+     if (a op b)
+
+   where operator 'op' is not != and convert it to an adjusted boolean pattern
+
+     mask = a op b
+     if (mask != 0)
+
+   and set the mask type on MASK.
+
+   Input:
+
+   * STMT_VINFO: The stmt at the end from which the pattern
+		 search begins, i.e. cast of a bool to
+		 an integer type.
+
+   Output:
+
+   * TYPE_OUT: The type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the pattern.  */
+
+static gimple *
+vect_recog_gcond_pattern (vec_info *vinfo,
+			 stmt_vec_info stmt_vinfo, tree *type_out)
+{
+  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
+  gcond* cond = NULL;
+  if (!(cond = dyn_cast <gcond *> (last_stmt)))
+    return NULL;
+
+  auto lhs = gimple_cond_lhs (cond);
+  auto rhs = gimple_cond_rhs (cond);
+  auto code = gimple_cond_code (cond);
+
+  tree scalar_type = TREE_TYPE (lhs);
+  if (VECTOR_TYPE_P (scalar_type))
+    return NULL;
+
+  if (code == NE_EXPR
+      && zerop (rhs)
+      && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type))
+    return NULL;
+
+  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
+  if (vecitype == NULL_TREE)
+    return NULL;
+
+  tree vectype = truth_type_for (vecitype);
+
+  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
+  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
+  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
+
+  gimple *pattern_stmt
+    = gimple_build_cond (NE_EXPR, new_lhs,
+			 build_int_cst (TREE_TYPE (new_lhs), 0),
+			 NULL_TREE, NULL_TREE);
+  *type_out = vectype;
+  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
+  return pattern_stmt;
+}
+
 /* Function vect_recog_bool_pattern
 
    Try to find pattern like following:
@@ -6581,15 +6657,26 @@ static bool
 possible_vector_mask_operation_p (stmt_vec_info stmt_info)
 {
   tree lhs = gimple_get_lhs (stmt_info->stmt);
+  tree_code code = ERROR_MARK;
+  gassign *assign = NULL;
+  gcond *cond = NULL;
+
+  if ((assign = dyn_cast <gassign *> (stmt_info->stmt)))
+    code = gimple_assign_rhs_code (assign);
+  else if ((cond = dyn_cast <gcond *> (stmt_info->stmt)))
+    {
+      lhs = gimple_cond_lhs (cond);
+      code = gimple_cond_code (cond);
+    }
+
   if (!lhs
       || TREE_CODE (lhs) != SSA_NAME
       || !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (lhs)))
     return false;
 
-  if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
+  if (code != ERROR_MARK)
     {
-      tree_code rhs_code = gimple_assign_rhs_code (assign);
-      switch (rhs_code)
+      switch (code)
 	{
 	CASE_CONVERT:
 	case SSA_NAME:
@@ -6600,7 +6687,7 @@ possible_vector_mask_operation_p (stmt_vec_info stmt_info)
 	  return true;
 
 	default:
-	  return TREE_CODE_CLASS (rhs_code) == tcc_comparison;
+	  return TREE_CODE_CLASS (code) == tcc_comparison;
 	}
     }
   else if (is_a <gphi *> (stmt_info->stmt))
@@ -6647,12 +6734,35 @@ vect_determine_mask_precision (vec_info *vinfo, stmt_vec_info stmt_info)
      The number of operations are equal, but M16 would have given
      a shorter dependency chain and allowed more ILP.  */
   unsigned int precision = ~0U;
-  if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+
+  /* If the statement compares two values that shouldn't use vector masks,
+     try comparing the values as normal scalars instead.  */
+  tree_code code = ERROR_MARK;
+  tree op0_type;
+  unsigned int nops = -1;
+  unsigned int ops_start = 0;
+
+  if (gassign *assign = dyn_cast <gassign *> (stmt))
+    {
+      code = gimple_assign_rhs_code (assign);
+      op0_type = TREE_TYPE (gimple_assign_rhs1 (assign));
+      nops = gimple_num_ops (assign);
+      ops_start = 1;
+    }
+  else if (gcond *cond = dyn_cast <gcond *> (stmt))
+    {
+      code = gimple_cond_code (cond);
+      op0_type = TREE_TYPE (gimple_cond_lhs (cond));
+      nops = 2;
+      ops_start = 0;
+    }
+
+  if (code != ERROR_MARK)
     {
-      unsigned int nops = gimple_num_ops (assign);
-      for (unsigned int i = 1; i < nops; ++i)
+      for (unsigned int i = ops_start; i < nops; ++i)
 	{
-	  tree rhs = gimple_op (assign, i);
+	  tree rhs = gimple_op (stmt, i);
 	  if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs)))
 	    continue;
 
@@ -6669,19 +6779,15 @@ vect_determine_mask_precision (vec_info *vinfo, stmt_vec_info stmt_info)
 	    }
 	}
 
-      /* If the statement compares two values that shouldn't use vector masks,
-	 try comparing the values as normal scalars instead.  */
-      tree_code rhs_code = gimple_assign_rhs_code (assign);
       if (precision == ~0U
-	  && TREE_CODE_CLASS (rhs_code) == tcc_comparison)
+	  && TREE_CODE_CLASS (code) == tcc_comparison)
 	{
-	  tree rhs1_type = TREE_TYPE (gimple_assign_rhs1 (assign));
 	  scalar_mode mode;
 	  tree vectype, mask_type;
-	  if (is_a <scalar_mode> (TYPE_MODE (rhs1_type), &mode)
-	      && (vectype = get_vectype_for_scalar_type (vinfo, rhs1_type))
-	      && (mask_type = get_mask_type_for_scalar_type (vinfo, rhs1_type))
-	      && expand_vec_cmp_expr_p (vectype, mask_type, rhs_code))
+	  if (is_a <scalar_mode> (TYPE_MODE (op0_type), &mode)
+	      && (vectype = get_vectype_for_scalar_type (vinfo, op0_type))
+	      && (mask_type = get_mask_type_for_scalar_type (vinfo, op0_type))
+	      && expand_vec_cmp_expr_p (vectype, mask_type, code))
 	    precision = GET_MODE_BITSIZE (mode);
 	}
     }
@@ -6860,6 +6966,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_divmod_pattern, "divmod" },
   { vect_recog_mult_pattern, "mult" },
   { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
+  { vect_recog_gcond_pattern, "gcond" },
   { vect_recog_bool_pattern, "bool" },
   /* This must come before mask conversion, and includes the parts
      of mask conversion that are needed for gather and scatter
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..766450cd85b55ce4dfd45878c5dc44cd09c68681 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,207 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  DUMP_VECT_SCOPE ("vectorizable_early_exit");
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+
+  tree vectype = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+			   &vectype))
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "use not simple.\n");
+	return false;
+    }
+
+  if (!vectype)
+    return false;
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target doesn't support flag setting vector "
+			       "comparisons.\n");
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector OR for "
+			       "type %T.\n", vectype);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	{
+	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+					      OPTIMIZE_FOR_SPEED))
+	    return false;
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+	}
+
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+
+      /* Mask the statements as we queue them up.  Normally we loop over
+	 vec_num,  but since we inspect the exact results of vectorization
+	 we don't need to and instead can just use the stmts themselves.  */
+      if (masked_loop_p)
+	for (unsigned i = 0; i < stmts.length (); i++)
+	  {
+	    tree stmt_mask
+	      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype,
+				    i);
+	    stmt_mask
+	      = prepare_vec_mask (loop_vinfo, TREE_TYPE (stmt_mask), stmt_mask,
+				  stmts[i], &cond_gsi);
+	    workset.quick_push (stmt_mask);
+	  }
+      else
+	workset.splice (stmts);
+
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    {
+      new_temp = stmts[0];
+      if (masked_loop_p)
+	{
+	  tree mask
+	    = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
+	  new_temp = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
+				       new_temp, &cond_gsi);
+	}
+    }
+
+  gcc_assert (new_temp);
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *>(stmt);
+  /* When vectorizing we assume that if the branch edge is taken that we're
+     exiting the loop.  This is not however always the case as the compiler will
+     rewrite conditions to always be a comparison against 0.  To do this it
+     sometimes flips the edges.  This is fine for scalar,  but for vector we
+     then have to flip the test, as we're still assuming that if you take the
+     branch edge that we found the exit condition.  */
+  auto new_code = NE_EXPR;
+  tree cst = build_zero_cst (vectype);
+  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
+			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
+    {
+      new_code = EQ_EXPR;
+      cst = build_minus_one_cst (vectype);
+    }
+
+  gimple_cond_set_condition (cond_stmt, new_code, new_temp, cst);
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+   else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13154,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13179,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13131,6 +13341,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -14321,6 +14537,14 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      /* If we got here with a gcond it means that the target had no available vector
+	 mode for the scalar type.  We can't vectorize so abort.  */
+      if (is_a <gcond *> (stmt))
+	return opt_result::failure_at (stmt,
+				       "not vectorized:"
+				       " unsupported data-type for gcond %T\n",
+				       scalar_type);
+
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-12 10:59                           ` Richard Sandiford
@ 2023-12-12 11:30                             ` Richard Biener
  2023-12-13 14:13                               ` Tamar Christina
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2023-12-12 11:30 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Tamar Christina, gcc-patches, nd, jlaw

On Tue, 12 Dec 2023, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Mon, 11 Dec 2023, Tamar Christina wrote:
> >> @@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo)
> >>    return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
> >>  }
> >>  
> >> +/* Function vect_recog_gcond_pattern
> >> +
> >> +   Try to find pattern like following:
> >> +
> >> +     if (a op b)
> >> +
> >> +   where operator 'op' is not != and convert it to an adjusted boolean pattern
> >> +
> >> +     mask = a op b
> >> +     if (mask != 0)
> >> +
> >> +   and set the mask type on MASK.
> >> +
> >> +   Input:
> >> +
> >> +   * STMT_VINFO: The stmt at the end from which the pattern
> >> +		 search begins, i.e. cast of a bool to
> >> +		 an integer type.
> >> +
> >> +   Output:
> >> +
> >> +   * TYPE_OUT: The type of the output of this pattern.
> >> +
> >> +   * Return value: A new stmt that will be used to replace the pattern.  */
> >> +
> >> +static gimple *
> >> +vect_recog_gcond_pattern (vec_info *vinfo,
> >> +			 stmt_vec_info stmt_vinfo, tree *type_out)
> >> +{
> >> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> >> +  gcond* cond = NULL;
> >> +  if (!(cond = dyn_cast <gcond *> (last_stmt)))
> >> +    return NULL;
> >> +
> >> +  auto lhs = gimple_cond_lhs (cond);
> >> +  auto rhs = gimple_cond_rhs (cond);
> >> +  auto code = gimple_cond_code (cond);
> >> +
> >> +  tree scalar_type = TREE_TYPE (lhs);
> >> +  if (VECTOR_TYPE_P (scalar_type))
> >> +    return NULL;
> >> +
> >> +  if (code == NE_EXPR && zerop (rhs))
> >
> > I think you need && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type) here,
> > an integer != 0 would not be an appropriate mask.  I guess two
> > relevant testcases would have an early exit like
> >
> >    if (here[i] != 0)
> >      break;
> >
> > once with a 'bool here[]' and once with a 'int here[]'.
> >
> >> +    return NULL;
> >> +
> >> +  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> >> +  if (vecitype == NULL_TREE)
> >> +    return NULL;
> >> +
> >> +  /* Build a scalar type for the boolean result that when vectorized matches the
> >> +     vector type of the result in size and number of elements.  */
> >> +  unsigned prec
> >> +    = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vecitype)),
> >> +			   TYPE_VECTOR_SUBPARTS (vecitype));
> >> +
> >> +  scalar_type
> >> +    = build_nonstandard_integer_type (prec, TYPE_UNSIGNED (scalar_type));
> >> +
> >> +  vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> >> +  if (vecitype == NULL_TREE)
> >> +    return NULL;
> >> +
> >> +  tree vectype = truth_type_for (vecitype);
> >
> > That looks awfully complicated.  I guess one complication is that
> > we compute mask_precision & friends before this pattern gets
> > recognized.  See vect_determine_mask_precision and its handling
> > of tcc_comparison, see also integer_type_for_mask.  For comparisons
> > properly handled during pattern recog the vector type is determined
> > in vect_get_vector_types_for_stmt via
> >
> >   else if (vect_use_mask_type_p (stmt_info))
> >     {
> >       unsigned int precision = stmt_info->mask_precision;
> >       scalar_type = build_nonstandard_integer_type (precision, 1);
> >       vectype = get_mask_type_for_scalar_type (vinfo, scalar_type, 
> > group_size);
> >       if (!vectype)
> >         return opt_result::failure_at (stmt, "not vectorized: unsupported"
> >                                        " data-type %T\n", scalar_type);
> >
> > Richard, do you have any advice here?  I suppose vect_determine_precisions
> > needs to handle the gcond case with bool != 0 somehow and for the
> > extra mask producer we add here we have to emulate what it would have 
> > done, right?
> 
> How about handling gconds directly in vect_determine_mask_precision?
> In a sense it's not needed, since gconds are always roots, and so we
> could calculate their precision on the fly instead.  But handling it in
> vect_determine_mask_precision feels like it should reduce the number
> of special cases.

Yeah, that sounds worth trying.

Richard.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-12 10:10                         ` Richard Biener
  2023-12-12 10:27                           ` Tamar Christina
@ 2023-12-12 10:59                           ` Richard Sandiford
  2023-12-12 11:30                             ` Richard Biener
  1 sibling, 1 reply; 24+ messages in thread
From: Richard Sandiford @ 2023-12-12 10:59 UTC (permalink / raw)
  To: Richard Biener; +Cc: Tamar Christina, gcc-patches, nd, jlaw

Richard Biener <rguenther@suse.de> writes:
> On Mon, 11 Dec 2023, Tamar Christina wrote:
>> @@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo)
>>    return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
>>  }
>>  
>> +/* Function vect_recog_gcond_pattern
>> +
>> +   Try to find pattern like following:
>> +
>> +     if (a op b)
>> +
>> +   where operator 'op' is not != and convert it to an adjusted boolean pattern
>> +
>> +     mask = a op b
>> +     if (mask != 0)
>> +
>> +   and set the mask type on MASK.
>> +
>> +   Input:
>> +
>> +   * STMT_VINFO: The stmt at the end from which the pattern
>> +		 search begins, i.e. cast of a bool to
>> +		 an integer type.
>> +
>> +   Output:
>> +
>> +   * TYPE_OUT: The type of the output of this pattern.
>> +
>> +   * Return value: A new stmt that will be used to replace the pattern.  */
>> +
>> +static gimple *
>> +vect_recog_gcond_pattern (vec_info *vinfo,
>> +			 stmt_vec_info stmt_vinfo, tree *type_out)
>> +{
>> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
>> +  gcond* cond = NULL;
>> +  if (!(cond = dyn_cast <gcond *> (last_stmt)))
>> +    return NULL;
>> +
>> +  auto lhs = gimple_cond_lhs (cond);
>> +  auto rhs = gimple_cond_rhs (cond);
>> +  auto code = gimple_cond_code (cond);
>> +
>> +  tree scalar_type = TREE_TYPE (lhs);
>> +  if (VECTOR_TYPE_P (scalar_type))
>> +    return NULL;
>> +
>> +  if (code == NE_EXPR && zerop (rhs))
>
> I think you need && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type) here,
> an integer != 0 would not be an appropriate mask.  I guess two
> relevant testcases would have an early exit like
>
>    if (here[i] != 0)
>      break;
>
> once with a 'bool here[]' and once with a 'int here[]'.
>
>> +    return NULL;
>> +
>> +  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
>> +  if (vecitype == NULL_TREE)
>> +    return NULL;
>> +
>> +  /* Build a scalar type for the boolean result that when vectorized matches the
>> +     vector type of the result in size and number of elements.  */
>> +  unsigned prec
>> +    = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vecitype)),
>> +			   TYPE_VECTOR_SUBPARTS (vecitype));
>> +
>> +  scalar_type
>> +    = build_nonstandard_integer_type (prec, TYPE_UNSIGNED (scalar_type));
>> +
>> +  vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
>> +  if (vecitype == NULL_TREE)
>> +    return NULL;
>> +
>> +  tree vectype = truth_type_for (vecitype);
>
> That looks awfully complicated.  I guess one complication is that
> we compute mask_precision & friends before this pattern gets
> recognized.  See vect_determine_mask_precision and its handling
> of tcc_comparison, see also integer_type_for_mask.  For comparisons
> properly handled during pattern recog the vector type is determined
> in vect_get_vector_types_for_stmt via
>
>   else if (vect_use_mask_type_p (stmt_info))
>     {
>       unsigned int precision = stmt_info->mask_precision;
>       scalar_type = build_nonstandard_integer_type (precision, 1);
>       vectype = get_mask_type_for_scalar_type (vinfo, scalar_type, 
> group_size);
>       if (!vectype)
>         return opt_result::failure_at (stmt, "not vectorized: unsupported"
>                                        " data-type %T\n", scalar_type);
>
> Richard, do you have any advice here?  I suppose vect_determine_precisions
> needs to handle the gcond case with bool != 0 somehow and for the
> extra mask producer we add here we have to emulate what it would have 
> done, right?

How about handling gconds directly in vect_determine_mask_precision?
In a sense it's not needed, since gconds are always roots, and so we
could calculate their precision on the fly instead.  But handling it in
vect_determine_mask_precision feels like it should reduce the number
of special cases.

Thanks,
Richard


* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-12 10:10                         ` Richard Biener
@ 2023-12-12 10:27                           ` Tamar Christina
  2023-12-12 10:59                           ` Richard Sandiford
  1 sibling, 0 replies; 24+ messages in thread
From: Tamar Christina @ 2023-12-12 10:27 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw, Richard Sandiford

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Tuesday, December 12, 2023 10:10 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com;
> Richard Sandiford <Richard.Sandiford@arm.com>
> Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> codegen of exit code
> 
> On Mon, 11 Dec 2023, Tamar Christina wrote:
> 
> > > > +	  vectype = truth_type_for (comp_type);
> > >
> > > so this leaves the producer of the mask in the GIMPLE_COND and we
> > > vectorize the GIMPLE_COND as
> > >
> > >   mask_1 = ...;
> > >   if (mask_1 != {-1,-1...})
> > >     ..
> > >
> > > ?  In principle only the mask producer needs a vector type and that
> > > adjusted by bool handling, the branch itself doesn't need any
> > > STMT_VINFO_VECTYPE.
> > >
> > > As said I believe if you recognize a GIMPLE_COND pattern for conds
> > > that aren't bool != 0 producing the mask stmt this should be picked
> > > up by bool handling correctly already.
> > >
> > > Also as said piggy-backing on the COND_EXPR handling in this function
> > > which has the condition split out into a separate stmt(!) might not
> > > completely handle things correctly and you are likely missing
> > > the tcc_comparison handling of the embedded compare.
> > >
> >
> > Ok, I've stopped piggy-backing on the COND_EXPR handling and created
> > vect_recog_gcond_pattern.  As you said in the previous email I've also
> > stopped setting the vectype for the gcond and instead use the type of the
> > operand.
> >
> > Note that because the pattern doesn't apply if the condition was already an NE_EXPR,
> > I do need the extra truth_type_for for that case.  Because in the case of e.g.
> >
> > a = b > 4;
> > if (a != 0)
> >
> > The producer of the mask is already outside of the cond but will not trigger
> > Boolean recognition.
> 
> It should trigger because we have a mask use of 'a', I always forget
> where we do that - it might be where we compute mask precision stuff
> or it might be bool pattern recognition itself ...
> 
> That said, a GIMPLE_COND (be it pattern or not) should be recognized
> as mask use.
> 
> >  That means that while the integral type is correct it
> > won't be a Boolean one and vectorizable_comparison expects a Boolean
> > vector.  Alternatively, we can remove that assert?  But that seems worse.
> >
> > Additionally in the previous email you mention "adjusted Boolean statement".
> >
> > I'm guessing you were referring to generating a COND_EXPR from the gcond.
> > So vect_recog_bool_pattern detects it?  The problem is that this gets folded
> > to x & 1 and doesn't trigger.  It also then blocks vectorization.  So instead I've
> > not forced it.
> 
> Not sure what you are referring to, but no - we shouldn't generate a
> COND_EXPR from the gcond.  Pattern recog generates COND_EXPRs for
> _data_ uses of masks (if we need a 'bool' data type for storing).
> We then get mask != 0 ? true : false;
> 

Thought so... but there happens to be a function called adjust_bool_stmts which
I thought you wanted me to call.  This is where the confusion came from: I couldn't
tell whether "adjusted Boolean statement" meant just the new modified statement or
one from adjust_bool_stmts.  The latter didn't make much sense, hence the
question above.

> > > > +  /* Determine if we need to reduce the final value.  */
> > > > +  if (stmts.length () > 1)
> > > > +    {
> > > > +      /* We build the reductions in a way to maintain as much parallelism as
> > > > +	 possible.  */
> > > > +      auto_vec<tree> workset (stmts.length ());
> > > > +
> > > > +      /* Mask the statements as we queue them up.  */
> > > > +      if (masked_loop_p)
> > > > +	for (auto stmt : stmts)
> > > > +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> > > > +						mask, stmt, &cond_gsi));
> > > > +      else
> > > > +	workset.splice (stmts);
> > > > +
> > > > +      while (workset.length () > 1)
> > > > +	{
> > > > +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > > > +	  tree arg0 = workset.pop ();
> > > > +	  tree arg1 = workset.pop ();
> > > > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > > +				       &cond_gsi);
> > > > +	  workset.quick_insert (0, new_temp);
> > > > +	}
> > > > +    }
> > > > +  else
> > > > +    new_temp = stmts[0];
> > > > +
> > > > +  gcc_assert (new_temp);
> > > > +
> > > > +  tree cond = new_temp;
> > > > +  /* If we have multiple statements after reduction we should check all the
> > > > +     lanes and treat it as a full vector.  */
> > > > +  if (masked_loop_p)
> > > > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > > > +			     &cond_gsi);
> > >
> > > You didn't fix any of the code above it seems, it's still wrong.
> > >
> >
> > Apologies, I hadn't realized that the last argument to get_loop_mask was the
> index.
> >
> > Should be fixed now. Is this closer to what you wanted?
> > The individual ops are now masked with separate masks. (See testcase when
> N=865).
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> > 	(vect_recog_gcond_pattern): New.
> > 	(vect_vect_recog_func_ptrs): Use it.
> > 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> > 	lhs.
> > 	(vectorizable_early_exit): New.
> > 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> > 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* gcc.dg/vect/vect-early-break_88.c: New test.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..b64becd588973f5860119
> 6bfcb15afbe4bab60f2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> > @@ -0,0 +1,36 @@
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#ifndef N
> > +#define N 5
> > +#endif
> > +float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
> > +unsigned vect_b[N] = { 0 };
> > +
> > +__attribute__ ((noinline, noipa))
> > +unsigned test4(double x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   if (vect_a[i] > x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > +
> > +extern void abort ();
> > +
> > +int main ()
> > +{
> > +  if (test4 (7.0) != 0)
> > +    abort ();
> > +
> > +  if (vect_b[2] != 0 && vect_b[1] == 0)
> > +    abort ();
> > +}
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index
> 7debe7f0731673cd1bf25cd39d55e23990a73d0e..359d30b5991a50717c269df
> 577c08adffa44e71b 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple
> *pattern_stmt,
> >    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
> >      {
> >        gcc_assert (!vectype
> > +		  || is_a <gcond *> (pattern_stmt)
> >  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
> >  		      == vect_use_mask_type_p (orig_stmt_info)));
> >        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> > @@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo)
> >    return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
> >  }
> >
> > +/* Function vect_recog_gcond_pattern
> > +
> > +   Try to find pattern like following:
> > +
> > +     if (a op b)
> > +
> > +   where operator 'op' is not != and convert it to an adjusted boolean pattern
> > +
> > +     mask = a op b
> > +     if (mask != 0)
> > +
> > +   and set the mask type on MASK.
> > +
> > +   Input:
> > +
> > +   * STMT_VINFO: The stmt at the end from which the pattern
> > +		 search begins, i.e. cast of a bool to
> > +		 an integer type.
> > +
> > +   Output:
> > +
> > +   * TYPE_OUT: The type of the output of this pattern.
> > +
> > +   * Return value: A new stmt that will be used to replace the pattern.  */
> > +
> > +static gimple *
> > +vect_recog_gcond_pattern (vec_info *vinfo,
> > +			 stmt_vec_info stmt_vinfo, tree *type_out)
> > +{
> > +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> > +  gcond* cond = NULL;
> > +  if (!(cond = dyn_cast <gcond *> (last_stmt)))
> > +    return NULL;
> > +
> > +  auto lhs = gimple_cond_lhs (cond);
> > +  auto rhs = gimple_cond_rhs (cond);
> > +  auto code = gimple_cond_code (cond);
> > +
> > +  tree scalar_type = TREE_TYPE (lhs);
> > +  if (VECTOR_TYPE_P (scalar_type))
> > +    return NULL;
> > +
> > +  if (code == NE_EXPR && zerop (rhs))
> 
> I think you need && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type) here,
> an integer != 0 would not be an appropriate mask.  I guess two
> relevant testcases would have an early exit like
> 
>    if (here[i] != 0)
>      break;
> 
> once with a 'bool here[]' and once with a 'int here[]'.
> 
> > +    return NULL;
> > +
> > +  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> > +  if (vecitype == NULL_TREE)
> > +    return NULL;
> > +
> > +  /* Build a scalar type for the boolean result that when vectorized matches the
> > +     vector type of the result in size and number of elements.  */
> > +  unsigned prec
> > +    = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vecitype)),
> > +			   TYPE_VECTOR_SUBPARTS (vecitype));
> > +
> > +  scalar_type
> > +    = build_nonstandard_integer_type (prec, TYPE_UNSIGNED (scalar_type));
> > +
> > +  vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> > +  if (vecitype == NULL_TREE)
> > +    return NULL;
> > +
> > +  tree vectype = truth_type_for (vecitype);
> 
> That looks awfully complicated.  I guess one complication is that
> we compute mask_precision & friends before this pattern gets
> recognized.  See vect_determine_mask_precision and its handling
> of tcc_comparison, see also integer_type_for_mask.  For comparisons
> properly handled during pattern recog the vector type is determined
> in vect_get_vector_types_for_stmt via
> 
>   else if (vect_use_mask_type_p (stmt_info))
>     {
>       unsigned int precision = stmt_info->mask_precision;
>       scalar_type = build_nonstandard_integer_type (precision, 1);
>       vectype = get_mask_type_for_scalar_type (vinfo, scalar_type,
> group_size);
>       if (!vectype)
>         return opt_result::failure_at (stmt, "not vectorized: unsupported"
>                                        " data-type %T\n", scalar_type);
> 
> Richard, do you have any advice here?  I suppose vect_determine_precisions
> needs to handle the gcond case with bool != 0 somehow and for the
> extra mask producer we add here we have to emulate what it would have
> done, right?
> 

There seem to be an awful lot of places that determine types and precision 😊
It's quite hard to figure out which part is used where... and Boolean handling
seems to be especially complicated.

> > +  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
> > +  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
> > +  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
> > +
> > +  gimple *pattern_stmt
> > +    = gimple_build_cond (NE_EXPR, new_lhs,
> > +			 build_int_cst (TREE_TYPE (new_lhs), 0),
> > +			 NULL_TREE, NULL_TREE);
> > +  *type_out = vectype;
> > +  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
> > +  return pattern_stmt;
> > +}
> > +
> >  /* Function vect_recog_bool_pattern
> >
> >     Try to find pattern like following:
> > @@ -6860,6 +6938,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] =
> {
> >    { vect_recog_divmod_pattern, "divmod" },
> >    { vect_recog_mult_pattern, "mult" },
> >    { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
> > +  { vect_recog_gcond_pattern, "gcond" },
> >    { vect_recog_bool_pattern, "bool" },
> >    /* This must come before mask conversion, and includes the parts
> >       of mask conversion that are needed for gather and scatter
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index
> 582c5e678fad802d6e76300fe3c939b9f2978f17..7c50ee37f2ade24eccf7a7d1ea
> 2e00b4450023f9 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree
> vectype,
> >    vec<tree> vec_oprnds0 = vNULL;
> >    vec<tree> vec_oprnds1 = vNULL;
> >    tree mask_type;
> > -  tree mask;
> > +  tree mask = NULL_TREE;
> >
> >    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> >      return false;
> > @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree
> vectype,
> >    /* Transform.  */
> >
> >    /* Handle def.  */
> > -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> > -  mask = vect_create_destination_var (lhs, mask_type);
> > +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> > +  if (lhs)
> > +    mask = vect_create_destination_var (lhs, mask_type);
> >
> >    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
> >  		     rhs1, &vec_oprnds0, vectype,
> > @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree
> vectype,
> >        gimple *new_stmt;
> >        vec_rhs2 = vec_oprnds1[i];
> >
> > -      new_temp = make_ssa_name (mask);
> > +      if (lhs)
> > +	new_temp = make_ssa_name (mask);
> > +      else
> > +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
> >        if (bitop1 == NOP_EXPR)
> >  	{
> >  	  new_stmt = gimple_build_assign (new_temp, code,
> > @@ -12723,6 +12727,211 @@ vectorizable_comparison (vec_info *vinfo,
> >    return true;
> >  }
> >
> > +/* Check to see if the current early break given in STMT_INFO is valid for
> > +   vectorization.  */
> > +
> > +static bool
> > +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> > +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> > +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> > +{
> > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > +  if (!loop_vinfo
> > +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> > +    return false;
> > +
> > +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> > +    return false;
> > +
> > +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> > +    return false;
> > +
> > +  DUMP_VECT_SCOPE ("vectorizable_early_exit");
> > +
> > +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> > +
> > +  tree vectype_op0 = NULL_TREE;
> > +  slp_tree slp_op0;
> > +  tree op0;
> > +  enum vect_def_type dt0;
> > +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> > +			   &vectype_op0))
> > +    {
> > +      if (dump_enabled_p ())
> > +	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			   "use not simple.\n");
> > +	return false;
> > +    }
> > +
> > +  stmt_vec_info op0_info = vinfo->lookup_def (op0);
> > +  tree vectype = truth_type_for (STMT_VINFO_VECTYPE (op0_info));
> > +  gcc_assert (vectype);
> > +
> > +  machine_mode mode = TYPE_MODE (vectype);
> > +  int ncopies;
> > +
> > +  if (slp_node)
> > +    ncopies = 1;
> > +  else
> > +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> > +
> > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > +
> > +  /* Analyze only.  */
> > +  if (!vec_stmt)
> > +    {
> > +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > +	{
> > +	  if (dump_enabled_p ())
> > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			       "can't vectorize early exit because the "
> > +			       "target doesn't support flag setting vector "
> > +			       "comparisons.\n");
> > +	  return false;
> > +	}
> > +
> > +      if (ncopies > 1
> > +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > +	{
> > +	  if (dump_enabled_p ())
> > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			       "can't vectorize early exit because the "
> > +			       "target does not support boolean vector OR for "
> > +			       "type %T.\n", vectype);
> > +	  return false;
> > +	}
> > +
> > +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > +				      vec_stmt, slp_node, cost_vec))
> > +	return false;
> > +
> > +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> > +	{
> > +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> > +					      OPTIMIZE_FOR_SPEED))
> > +	    return false;
> > +	  else
> > +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> > +	}
> > +
> > +
> > +      return true;
> > +    }
> > +
> > +  /* Transform.  */
> > +
> > +  tree new_temp = NULL_TREE;
> > +  gimple *new_stmt = NULL;
> > +
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> > +
> > +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > +				  vec_stmt, slp_node, cost_vec))
> > +    gcc_unreachable ();
> > +
> > +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> > +  basic_block cond_bb = gimple_bb (stmt);
> > +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> > +
> > +  auto_vec<tree> stmts;
> > +
> > +  tree mask = NULL_TREE;
> > +  if (masked_loop_p)
> > +    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> > +
> > +  if (slp_node)
> > +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> > +  else
> > +    {
> > +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> > +      stmts.reserve_exact (vec_stmts.length ());
> > +      for (auto stmt : vec_stmts)
> > +	stmts.quick_push (gimple_assign_lhs (stmt));
> > +    }
> > +
> > +  /* Determine if we need to reduce the final value.  */
> > +  if (stmts.length () > 1)
> > +    {
> > +      /* We build the reductions in a way to maintain as much parallelism as
> > +	 possible.  */
> > +      auto_vec<tree> workset (stmts.length ());
> > +
> > +      /* Mask the statements as we queue them up.  Normally we loop over
> > +	 vec_num,  but since we inspect the exact results of vectorization
> > +	 we don't need to and instead can just use the stmts themselves.  */
> > +      if (masked_loop_p)
> > +	for (unsigned i = 0; i < stmts.length (); i++)
> > +	  {
> > +	    tree stmt_mask
> > +	      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype,
> > +				    i);
> > +	    stmt_mask
> > +	      = prepare_vec_mask (loop_vinfo, TREE_TYPE (stmt_mask), stmt_mask,
> > +				  stmts[i], &cond_gsi);
> > +	    workset.quick_push (stmt_mask);
> > +	  }
> > +      else
> > +	workset.splice (stmts);
> > +
> > +      while (workset.length () > 1)
> > +	{
> > +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > +	  tree arg0 = workset.pop ();
> > +	  tree arg1 = workset.pop ();
> > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > +				       &cond_gsi);
> > +	  workset.quick_insert (0, new_temp);
> > +	}
> > +    }
> > +  else
> > +    new_temp = stmts[0];
> > +
> > +  gcc_assert (new_temp);
> > +
> > +  tree cond = new_temp;
> > +  /* If we have multiple statements after reduction we should check all the
> > +     lanes and treat it as a full vector.  */
> > +  if (masked_loop_p)
> > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > +			     &cond_gsi);
> 
> This is still wrong, you are applying mask[0] on the IOR reduced result.
> As suggested do that in the else { new_temp = stmts[0] } clause instead
> (or simply elide the optimization of a single vector)

PEBKAC... I had looked at it and thought it didn't seem right (why would
mask[0] be used for both the elements and the final result?), but left it ☹

I'll wait for Richard's thoughts on the precision before re-spinning.

Thanks,
Tamar
> 
> > +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> > +     codegen so we must replace the original insn.  */
> > +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> > +  gcond *cond_stmt = as_a <gcond *>(stmt);
> > +  /* When vectorizing we assume that if the branch edge is taken that we're
> > +     exiting the loop.  This is not however always the case as the compiler will
> > +     rewrite conditions to always be a comparison against 0.  To do this it
> > +     sometimes flips the edges.  This is fine for scalar,  but for vector we
> > +     then have to flip the test, as we're still assuming that if you take the
> > +     branch edge that we found the exit condition.  */
> > +  auto new_code = NE_EXPR;
> > +  tree cst = build_zero_cst (vectype);
> > +  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> > +			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
> > +    {
> > +      new_code = EQ_EXPR;
> > +      cst = build_minus_one_cst (vectype);
> > +    }
> > +
> > +  gimple_cond_set_condition (cond_stmt, new_code, cond, cst);
> > +  update_stmt (stmt);
> > +
> > +  if (slp_node)
> > +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> > +   else
> > +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> > +
> > +
> > +  if (!slp_node)
> > +    *vec_stmt = stmt;
> > +
> > +  return true;
> > +}
> > +
> >  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
> >     can handle all live statements in the node.  Otherwise return true
> >     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> > @@ -12949,7 +13158,9 @@ vect_analyze_stmt (vec_info *vinfo,
> >  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
> >  				  stmt_info, NULL, node)
> >  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> > -				   stmt_info, NULL, node, cost_vec));
> > +				   stmt_info, NULL, node, cost_vec)
> > +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > +				      cost_vec));
> >    else
> >      {
> >        if (bb_vinfo)
> > @@ -12972,7 +13183,10 @@ vect_analyze_stmt (vec_info *vinfo,
> >  					 NULL, NULL, node, cost_vec)
> >  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
> >  					  cost_vec)
> > -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> > +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> > +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > +					  cost_vec));
> > +
> >      }
> >
> >    if (node)
> > @@ -13131,6 +13345,12 @@ vect_transform_stmt (vec_info *vinfo,
> >        gcc_assert (done);
> >        break;
> >
> > +    case loop_exit_ctrl_vec_info_type:
> > +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> > +				      slp_node, NULL);
> > +      gcc_assert (done);
> > +      break;
> > +
> >      default:
> >        if (!STMT_VINFO_LIVE_P (stmt_info))
> >  	{
> > @@ -14321,10 +14541,19 @@ vect_get_vector_types_for_stmt (vec_info
> *vinfo, stmt_vec_info stmt_info,
> >      }
> >    else
> >      {
> > +      gcond *cond = NULL;
> >        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
> >  	scalar_type = TREE_TYPE (DR_REF (dr));
> >        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> >  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> > +      else if ((cond = dyn_cast <gcond *> (stmt)))
> > +	{
> > +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> > +	     single bit precision and we need the vector boolean to be a
> > +	     representation of the integer mask.  So set the correct integer type and
> > +	     convert to boolean vector once we have a vectype.  */
> > +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
> 
> You should get into the vect_use_mask_type_p (stmt_info) path for
> early exit conditions (see above with regard to mask_precision).
> 
> > +	}
> >        else
> >  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> >
> > @@ -14339,12 +14568,18 @@ vect_get_vector_types_for_stmt (vec_info
> *vinfo, stmt_vec_info stmt_info,
> >  			     "get vectype for scalar type: %T\n", scalar_type);
> >  	}
> >        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> > +
> >        if (!vectype)
> >  	return opt_result::failure_at (stmt,
> >  				       "not vectorized:"
> >  				       " unsupported data-type %T\n",
> >  				       scalar_type);
> >
> > +      /* If we were a gcond, convert the resulting type to a vector boolean type
> now
> > +	 that we have the correct integer mask type.  */
> > +      if (cond)
> > +	vectype = truth_type_for (vectype);
> > +
> 
> which makes this moot.
> 
> Richard.
> 
> >        if (dump_enabled_p ())
> >  	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
> >      }
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-11 23:12                       ` Tamar Christina
@ 2023-12-12 10:10                         ` Richard Biener
  2023-12-12 10:27                           ` Tamar Christina
  2023-12-12 10:59                           ` Richard Sandiford
  0 siblings, 2 replies; 24+ messages in thread
From: Richard Biener @ 2023-12-12 10:10 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw, richard.sandiford

On Mon, 11 Dec 2023, Tamar Christina wrote:

> > > +	  vectype = truth_type_for (comp_type);
> > 
> > so this leaves the producer of the mask in the GIMPLE_COND and we
> > vectorize the GIMPLE_COND as
> > 
> >   mask_1 = ...;
> >   if (mask_1 != {-1,-1...})
> >     ..
> > 
> > ?  In principle only the mask producer needs a vector type and that
> > adjusted by bool handling, the branch itself doesn't need any
> > STMT_VINFO_VECTYPE.
> > 
> > As said I believe if you recognize a GIMPLE_COND pattern for conds
> > that aren't bool != 0 producing the mask stmt this should be picked
> > up by bool handling correctly already.
> > 
> > Also as said piggy-backing on the COND_EXPR handling in this function
> > which has the condition split out into a separate stmt(!) might not
> > completely handle things correctly and you are likely missing
> > the tcc_comparison handling of the embedded compare.
> > 
> 
> Ok, I've stopped piggy-backing on the COND_EXPR handling and created
> vect_recog_gcond_pattern.  As you said in the previous email I've also
> stopped setting the vectype for the gcond and instead use the type of the
> operand.
> 
> Note that because the pattern doesn't apply if the condition was already an NE_EXPR,
> I do need the extra truth_type_for for that case.  Because in the case of e.g.
> 
> a = b > 4;
> if (a != 0)
> 
> The producer of the mask is already outside of the cond but will not trigger
> Boolean recognition.

It should trigger because we have a mask use of 'a', I always forget
where we do that - it might be where we compute mask precision stuff
or it might be bool pattern recognition itself ...

That said, a GIMPLE_COND (be it pattern or not) should be recognized
as mask use.

>  That means that while the integral type is correct it
> won't be a Boolean one and vectorizable_comparison expects a Boolean
> vector.  Alternatively, we can remove that assert?  But that seems worse.
> 
> Additionally in the previous email you mention "adjusted Boolean statement".
> 
> I'm guessing you were referring to generating a COND_EXPR from the gcond.
> So vect_recog_bool_pattern detects it?  The problem is that this gets folded
> to x & 1 and doesn't trigger.  It also then blocks vectorization.  So instead I've
> not forced it.

Not sure what you are referring to, but no - we shouldn't generate a
COND_EXPR from the gcond.  Pattern recog generates COND_EXPRs for
_data_ uses of masks (if we need a 'bool' data type for storing).
We then get mask != 0 ? true : false;
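
For illustration only (a scalar C sketch with made-up names, not vectorizer
code), the distinction between a data use and a branch use of a mask:

```c
#include <assert.h>
#include <stdbool.h>

#define N 8

/* Data use of the mask: the bool result is stored to memory, so pattern
   recog materializes it as  mask != 0 ? true : false  (a COND_EXPR).  */
static void mask_data_use (const int *a, bool *out)
{
  for (int i = 0; i < N; i++)
    out[i] = a[i] > 4;
}

/* Branch use of the mask: the compare result is only consumed by the
   conditional branch (the GIMPLE_COND), so no COND_EXPR is needed.  */
static int mask_branch_use (const int *a)
{
  for (int i = 0; i < N; i++)
    if (a[i] > 4)
      return i;
  return -1;
}
```

Only the first form needs the COND_EXPR materialization; the second is the
early-exit shape discussed here.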

> > > +  /* Determine if we need to reduce the final value.  */
> > > +  if (stmts.length () > 1)
> > > +    {
> > > +      /* We build the reductions in a way to maintain as much parallelism as
> > > +	 possible.  */
> > > +      auto_vec<tree> workset (stmts.length ());
> > > +
> > > +      /* Mask the statements as we queue them up.  */
> > > +      if (masked_loop_p)
> > > +	for (auto stmt : stmts)
> > > +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> > > +						mask, stmt, &cond_gsi));
> > > +      else
> > > +	workset.splice (stmts);
> > > +
> > > +      while (workset.length () > 1)
> > > +	{
> > > +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > > +	  tree arg0 = workset.pop ();
> > > +	  tree arg1 = workset.pop ();
> > > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > +				       &cond_gsi);
> > > +	  workset.quick_insert (0, new_temp);
> > > +	}
> > > +    }
> > > +  else
> > > +    new_temp = stmts[0];
> > > +
> > > +  gcc_assert (new_temp);
> > > +
> > > +  tree cond = new_temp;
> > > +  /* If we have multiple statements after reduction we should check all the
> > > +     lanes and treat it as a full vector.  */
> > > +  if (masked_loop_p)
> > > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > > +			     &cond_gsi);
> > 
> > You didn't fix any of the code above it seems, it's still wrong.
> > 
> 
> Apologies, I hadn't realized that the last argument to get_loop_mask was the index.
> 
> Should be fixed now. Is this closer to what you wanted?
> The individual ops are now masked with separate masks. (See testcase when N=865).
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> 	(vect_recog_gcond_pattern): New.
> 	(vect_vect_recog_func_ptrs): Use it.
> 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> 	lhs.
> 	(vectorizable_early_exit): New.
> 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/vect/vect-early-break_88.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b64becd588973f58601196bfcb15afbe4bab60f2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> @@ -0,0 +1,36 @@
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 5
> +#endif
> +float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
> +unsigned vect_b[N] = { 0 };
> +
> +__attribute__ ((noinline, noipa))
> +unsigned test4(double x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +extern void abort ();
> +
> +int main ()
> +{
> +  if (test4 (7.0) != 0)
> +    abort ();
> +
> +  if (vect_b[2] != 0 && vect_b[1] == 0)
> +    abort ();
> +}
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..359d30b5991a50717c269df577c08adffa44e71b 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
>    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>      {
>        gcc_assert (!vectype
> +		  || is_a <gcond *> (pattern_stmt)
>  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
>  		      == vect_use_mask_type_p (orig_stmt_info)));
>        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> @@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo)
>    return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
>  }
>  
> +/* Function vect_recog_gcond_pattern
> +
> +   Try to find pattern like following:
> +
> +     if (a op b)
> +
> +   where operator 'op' is not a != against zero and convert it to an adjusted boolean pattern
> +
> +     mask = a op b
> +     if (mask != 0)
> +
> +   and set the mask type on MASK.
> +
> +   Input:
> +
> +   * STMT_VINFO: The stmt at the end from which the pattern
> +		 search begins, i.e. the GIMPLE_COND
> +		 terminating the block.
> +
> +   Output:
> +
> +   * TYPE_OUT: The type of the output of this pattern.
> +
> +   * Return value: A new stmt that will be used to replace the pattern.  */
> +
> +static gimple *
> +vect_recog_gcond_pattern (vec_info *vinfo,
> +			 stmt_vec_info stmt_vinfo, tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> +  gcond* cond = NULL;
> +  if (!(cond = dyn_cast <gcond *> (last_stmt)))
> +    return NULL;
> +
> +  auto lhs = gimple_cond_lhs (cond);
> +  auto rhs = gimple_cond_rhs (cond);
> +  auto code = gimple_cond_code (cond);
> +
> +  tree scalar_type = TREE_TYPE (lhs);
> +  if (VECTOR_TYPE_P (scalar_type))
> +    return NULL;
> +
> +  if (code == NE_EXPR && zerop (rhs))

I think you need && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type) here,
an integer != 0 would not be an appropriate mask.  I guess two
relevant testcases would have an early exit like

   if (here[i] != 0)
     break;

once with a 'bool here[]' and once with a 'int here[]'.
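
Concretely, the two testcases could look like this (a hedged sketch; the
names and driver are made up, not part of the patch):

```c
#include <assert.h>
#include <stdbool.h>

#define N 16

static bool here_b[N];  /* bool element: 'here_b[i] != 0' is already a mask */
static int  here_i[N];  /* int element: '!= 0' must still produce a proper mask */

static int first_hit_bool (void)
{
  for (int i = 0; i < N; i++)
    if (here_b[i] != 0)
      return i;
  return -1;
}

static int first_hit_int (void)
{
  for (int i = 0; i < N; i++)
    if (here_i[i] != 0)
      return i;
  return -1;
}
```

The bool variant should already be recognized as a mask use, while the int
variant exercises the missing VECT_SCALAR_BOOLEAN_TYPE_P check.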

> +    return NULL;
> +
> +  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +  if (vecitype == NULL_TREE)
> +    return NULL;
> +
> +  /* Build a scalar type for the boolean result that when vectorized matches the
> +     vector type of the result in size and number of elements.  */
> +  unsigned prec
> +    = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vecitype)),
> +			   TYPE_VECTOR_SUBPARTS (vecitype));
> +
> +  scalar_type
> +    = build_nonstandard_integer_type (prec, TYPE_UNSIGNED (scalar_type));
> +
> +  vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +  if (vecitype == NULL_TREE)
> +    return NULL;
> +
> +  tree vectype = truth_type_for (vecitype);

That looks awfully complicated.  I guess one complication is that
we compute mask_precision & friends before this pattern gets
recognized.  See vect_determine_mask_precision and its handling
of tcc_comparison, see also integer_type_for_mask.  For comparisons
properly handled during pattern recog the vector type is determined
in vect_get_vector_types_for_stmt via

  else if (vect_use_mask_type_p (stmt_info))
    {
      unsigned int precision = stmt_info->mask_precision;
      scalar_type = build_nonstandard_integer_type (precision, 1);
      vectype = get_mask_type_for_scalar_type (vinfo, scalar_type, 
group_size);
      if (!vectype)
        return opt_result::failure_at (stmt, "not vectorized: unsupported"
                                       " data-type %T\n", scalar_type);

Richard, do you have any advice here?  I suppose vect_determine_precisions
needs to handle the gcond case with bool != 0 somehow and for the
extra mask producer we add here we have to emulate what it would have 
done, right?

> +  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
> +  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
> +  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
> +
> +  gimple *pattern_stmt
> +    = gimple_build_cond (NE_EXPR, new_lhs,
> +			 build_int_cst (TREE_TYPE (new_lhs), 0),
> +			 NULL_TREE, NULL_TREE);
> +  *type_out = vectype;
> +  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
> +  return pattern_stmt;
> +}
> +
>  /* Function vect_recog_bool_pattern
>  
>     Try to find pattern like following:
> @@ -6860,6 +6938,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>    { vect_recog_divmod_pattern, "divmod" },
>    { vect_recog_mult_pattern, "mult" },
>    { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
> +  { vect_recog_gcond_pattern, "gcond" },
>    { vect_recog_bool_pattern, "bool" },
>    /* This must come before mask conversion, and includes the parts
>       of mask conversion that are needed for gather and scatter
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 582c5e678fad802d6e76300fe3c939b9f2978f17..7c50ee37f2ade24eccf7a7d1ea2e00b4450023f9 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    vec<tree> vec_oprnds0 = vNULL;
>    vec<tree> vec_oprnds1 = vNULL;
>    tree mask_type;
> -  tree mask;
> +  tree mask = NULL_TREE;
>  
>    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>      return false;
> @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    /* Transform.  */
>  
>    /* Handle def.  */
> -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> -  mask = vect_create_destination_var (lhs, mask_type);
> +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> +  if (lhs)
> +    mask = vect_create_destination_var (lhs, mask_type);
>  
>    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
>  		     rhs1, &vec_oprnds0, vectype,
> @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>        gimple *new_stmt;
>        vec_rhs2 = vec_oprnds1[i];
>  
> -      new_temp = make_ssa_name (mask);
> +      if (lhs)
> +	new_temp = make_ssa_name (mask);
> +      else
> +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
>        if (bitop1 == NOP_EXPR)
>  	{
>  	  new_stmt = gimple_build_assign (new_temp, code,
> @@ -12723,6 +12727,211 @@ vectorizable_comparison (vec_info *vinfo,
>    return true;
>  }
>  
> +/* Check to see if the current early break given in STMT_INFO is valid for
> +   vectorization.  */
> +
> +static bool
> +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +{
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (!loop_vinfo
> +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> +    return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> +    return false;
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +    return false;
> +
> +  DUMP_VECT_SCOPE ("vectorizable_early_exit");
> +
> +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> +
> +  tree vectype_op0 = NULL_TREE;
> +  slp_tree slp_op0;
> +  tree op0;
> +  enum vect_def_type dt0;
> +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> +			   &vectype_op0))
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			   "use not simple.\n");
> +	return false;
> +    }
> +
> +  stmt_vec_info op0_info = vinfo->lookup_def (op0);
> +  tree vectype = truth_type_for (STMT_VINFO_VECTYPE (op0_info));
> +  gcc_assert (vectype);
> +
> +  machine_mode mode = TYPE_MODE (vectype);
> +  int ncopies;
> +
> +  if (slp_node)
> +    ncopies = 1;
> +  else
> +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> +
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> +
> +  /* Analyze only.  */
> +  if (!vec_stmt)
> +    {
> +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target doesn't support flag setting vector "
> +			       "comparisons.\n");
> +	  return false;
> +	}
> +
> +      if (ncopies > 1
> +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target does not support boolean vector OR for "
> +			       "type %T.\n", vectype);
> +	  return false;
> +	}
> +
> +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				      vec_stmt, slp_node, cost_vec))
> +	return false;
> +
> +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +	{
> +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> +					      OPTIMIZE_FOR_SPEED))
> +	    return false;
> +	  else
> +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> +	}
> +
> +
> +      return true;
> +    }
> +
> +  /* Transform.  */
> +
> +  tree new_temp = NULL_TREE;
> +  gimple *new_stmt = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> +
> +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				  vec_stmt, slp_node, cost_vec))
> +    gcc_unreachable ();
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  basic_block cond_bb = gimple_bb (stmt);
> +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> +
> +  auto_vec<tree> stmts;
> +
> +  tree mask = NULL_TREE;
> +  if (masked_loop_p)
> +    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> +
> +  if (slp_node)
> +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> +  else
> +    {
> +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> +      stmts.reserve_exact (vec_stmts.length ());
> +      for (auto stmt : vec_stmts)
> +	stmts.quick_push (gimple_assign_lhs (stmt));
> +    }
> +
> +  /* Determine if we need to reduce the final value.  */
> +  if (stmts.length () > 1)
> +    {
> +      /* We build the reductions in a way to maintain as much parallelism as
> +	 possible.  */
> +      auto_vec<tree> workset (stmts.length ());
> +
> +      /* Mask the statements as we queue them up.  Normally we loop over
> +	 vec_num,  but since we inspect the exact results of vectorization
> +	 we don't need to and instead can just use the stmts themselves.  */
> +      if (masked_loop_p)
> +	for (unsigned i = 0; i < stmts.length (); i++)
> +	  {
> +	    tree stmt_mask
> +	      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype,
> +				    i);
> +	    stmt_mask
> +	      = prepare_vec_mask (loop_vinfo, TREE_TYPE (stmt_mask), stmt_mask,
> +				  stmts[i], &cond_gsi);
> +	    workset.quick_push (stmt_mask);
> +	  }
> +      else
> +	workset.splice (stmts);
> +
> +      while (workset.length () > 1)
> +	{
> +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> +	  tree arg0 = workset.pop ();
> +	  tree arg1 = workset.pop ();
> +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> +				       &cond_gsi);
> +	  workset.quick_insert (0, new_temp);
> +	}
> +    }
> +  else
> +    new_temp = stmts[0];
> +
> +  gcc_assert (new_temp);
> +
> +  tree cond = new_temp;
> +  /* If we have multiple statements after reduction we should check all the
> +     lanes and treat it as a full vector.  */
> +  if (masked_loop_p)
> +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> +			     &cond_gsi);

This is still wrong, you are applying mask[0] on the IOR reduced result.
As suggested do that in the else { new_temp = stmts[0] } clause instead
(or simply elide the optimization of a single vector)
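
In other words (a scalar C++ emulation of the intended dataflow, not
vectorizer code; names are made up): each copy's compare result is ANDed
with that copy's own loop mask before the BIT_IOR reduction tree, rather
than applying only mask 0 to the reduced value:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar emulation: 2 vector copies of 4 lanes each.  cmp[i] is the
// compare result of copy i, mask[i] that copy's loop mask.  Masking only
// the IOR-reduced value with mask 0 would let inactive lanes of copy 1
// leak through.
constexpr int LANES = 4;
using Vec = std::array<uint8_t, LANES>;

static Vec vec_and (const Vec &a, const Vec &b)
{
  Vec r{};
  for (int i = 0; i < LANES; ++i)
    r[i] = a[i] & b[i];
  return r;
}

static Vec vec_ior (const Vec &a, const Vec &b)
{
  Vec r{};
  for (int i = 0; i < LANES; ++i)
    r[i] = a[i] | b[i];
  return r;
}

static bool any_lane (const Vec &v)
{
  for (int i = 0; i < LANES; ++i)
    if (v[i])
      return true;
  return false;
}

static bool exit_taken (const std::array<Vec, 2> &cmp,
                        const std::array<Vec, 2> &mask)
{
  Vec w0 = vec_and (cmp[0], mask[0]);   // mask each copy individually
  Vec w1 = vec_and (cmp[1], mask[1]);
  return any_lane (vec_ior (w0, w1));   // then reduce with BIT_IOR
}
```

With masks {1,1,1,1} and {1,0,0,0}, a true lane in copy 1's inactive tail
is correctly discarded, which masking the reduced value with mask 0 alone
would get wrong.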

> +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> +     codegen so we must replace the original insn.  */
> +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> +  gcond *cond_stmt = as_a <gcond *>(stmt);
> +  /* When vectorizing we assume that if the branch edge is taken that we're
> +     exiting the loop.  This is not however always the case as the compiler will
> +     rewrite conditions to always be a comparison against 0.  To do this it
> +     sometimes flips the edges.  This is fine for scalar,  but for vector we
> +     then have to flip the test, as we're still assuming that if you take the
> +     branch edge that we found the exit condition.  */
> +  auto new_code = NE_EXPR;
> +  tree cst = build_zero_cst (vectype);
> +  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> +			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
> +    {
> +      new_code = EQ_EXPR;
> +      cst = build_minus_one_cst (vectype);
> +    }
> +
> +  gimple_cond_set_condition (cond_stmt, new_code, cond, cst);
> +  update_stmt (stmt);
> +
> +  if (slp_node)
> +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> +   else
> +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> +
> +
> +  if (!slp_node)
> +    *vec_stmt = stmt;
> +
> +  return true;
> +}
> +
>  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
>     can handle all live statements in the node.  Otherwise return true
>     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> @@ -12949,7 +13158,9 @@ vect_analyze_stmt (vec_info *vinfo,
>  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
>  				  stmt_info, NULL, node)
>  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> -				   stmt_info, NULL, node, cost_vec));
> +				   stmt_info, NULL, node, cost_vec)
> +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +				      cost_vec));
>    else
>      {
>        if (bb_vinfo)
> @@ -12972,7 +13183,10 @@ vect_analyze_stmt (vec_info *vinfo,
>  					 NULL, NULL, node, cost_vec)
>  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
>  					  cost_vec)
> -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +					  cost_vec));
> +
>      }
>  
>    if (node)
> @@ -13131,6 +13345,12 @@ vect_transform_stmt (vec_info *vinfo,
>        gcc_assert (done);
>        break;
>  
> +    case loop_exit_ctrl_vec_info_type:
> +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> +				      slp_node, NULL);
> +      gcc_assert (done);
> +      break;
> +
>      default:
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> @@ -14321,10 +14541,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>      }
>    else
>      {
> +      gcond *cond = NULL;
>        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
>  	scalar_type = TREE_TYPE (DR_REF (dr));
>        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
>  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> +      else if ((cond = dyn_cast <gcond *> (stmt)))
> +	{
> +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> +	     single bit precision and we need the vector boolean to be a
> +	     representation of the integer mask.  So set the correct integer type and
> +	     convert to boolean vector once we have a vectype.  */
> +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));

You should get into the vect_use_mask_type_p (stmt_info) path for
early exit conditions (see above with regard to mask_precision).

> +	}
>        else
>  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
>  
> @@ -14339,12 +14568,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>  			     "get vectype for scalar type: %T\n", scalar_type);
>  	}
>        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> +
>        if (!vectype)
>  	return opt_result::failure_at (stmt,
>  				       "not vectorized:"
>  				       " unsupported data-type %T\n",
>  				       scalar_type);
>  
> +      /* If we were a gcond, convert the resulting type to a vector boolean type now
> +	 that we have the correct integer mask type.  */
> +      if (cond)
> +	vectype = truth_type_for (vectype);
> +

which makes this moot.

Richard.

>        if (dump_enabled_p ())
>  	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
>      }
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-11  9:36                     ` Richard Biener
@ 2023-12-11 23:12                       ` Tamar Christina
  2023-12-12 10:10                         ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: Tamar Christina @ 2023-12-11 23:12 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 19900 bytes --]

> > +	  vectype = truth_type_for (comp_type);
> 
> so this leaves the producer of the mask in the GIMPLE_COND and we
> vectorize the GIMPLE_COND as
> 
>   mask_1 = ...;
>   if (mask_1 != {-1,-1...})
>     ..
> 
> ?  In principle only the mask producer needs a vector type and that
> adjusted by bool handling, the branch itself doesn't need any
> STMT_VINFO_VECTYPE.
> 
> As said I believe if you recognize a GIMPLE_COND pattern for conds
> that aren't bool != 0 producing the mask stmt this should be picked
> up by bool handling correctly already.
> 
> Also as said piggy-backing on the COND_EXPR handling in this function
> which has the condition split out into a separate stmt(!) might not
> completely handle things correctly and you are likely missing
> the tcc_comparison handling of the embedded compare.
> 

Ok, I've stopped piggy-backing on the COND_EXPR handling and created
vect_recog_gcond_pattern.  As you said in the previous email I've also
stopped setting the vectype for the gcond and instead use the type of the
operand.

Note that because the pattern doesn't apply if the condition was already an NE_EXPR,
I do need the extra truth_type_for for that case.  Because in the case of e.g.

a = b > 4;
if (a != 0)

The producer of the mask is already outside of the cond but will not trigger
Boolean recognition.  That means that while the integral type is correct it
won't be a Boolean one and vectorizable_comparison expects a Boolean
vector.  Alternatively, we can remove that assert?  But that seems worse.

Additionally in the previous email you mention "adjusted Boolean statement".

I'm guessing you were referring to generating a COND_EXPR from the gcond.
So vect_recog_bool_pattern detects it?  The problem with that is this gets folded
to x & 1 and doesn't trigger.  It also then blocks vectorization.  So instead I've
not forced it.

> > +  /* Determine if we need to reduce the final value.  */
> > +  if (stmts.length () > 1)
> > +    {
> > +      /* We build the reductions in a way to maintain as much parallelism as
> > +	 possible.  */
> > +      auto_vec<tree> workset (stmts.length ());
> > +
> > +      /* Mask the statements as we queue them up.  */
> > +      if (masked_loop_p)
> > +	for (auto stmt : stmts)
> > +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> > +						mask, stmt, &cond_gsi));
> > +      else
> > +	workset.splice (stmts);
> > +
> > +      while (workset.length () > 1)
> > +	{
> > +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > +	  tree arg0 = workset.pop ();
> > +	  tree arg1 = workset.pop ();
> > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > +				       &cond_gsi);
> > +	  workset.quick_insert (0, new_temp);
> > +	}
> > +    }
> > +  else
> > +    new_temp = stmts[0];
> > +
> > +  gcc_assert (new_temp);
> > +
> > +  tree cond = new_temp;
> > +  /* If we have multiple statements after reduction we should check all the
> > +     lanes and treat it as a full vector.  */
> > +  if (masked_loop_p)
> > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > +			     &cond_gsi);
> 
> You didn't fix any of the code above it seems, it's still wrong.
> 

Apologies, I hadn't realized that the last argument to get_loop_mask was the index.

Should be fixed now. Is this closer to what you wanted?
The individual ops are now masked with separate masks. (See testcase when N=865).

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
	(vect_recog_gcond_pattern): New.
	(vect_vect_recog_func_ptrs): Use it.
	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
	lhs.
	(vectorizable_early_exit): New.
	(vect_analyze_stmt, vect_transform_stmt): Use it.
	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-early-break_88.c: New test.

--- inline copy of patch ---

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
new file mode 100644
index 0000000000000000000000000000000000000000..b64becd588973f58601196bfcb15afbe4bab60f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
@@ -0,0 +1,36 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 5
+#endif
+float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(double x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (7.0) != 0)
+    abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+    abort ();
+}
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..359d30b5991a50717c269df577c08adffa44e71b 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo)
   return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
 }
 
+/* Function vect_recog_gcond_pattern
+
+   Try to find pattern like following:
+
+     if (a op b)
+
+   where operator 'op' is not a != against zero and convert it to an adjusted boolean pattern
+
+     mask = a op b
+     if (mask != 0)
+
+   and set the mask type on MASK.
+
+   Input:
+
+   * STMT_VINFO: The stmt at the end from which the pattern
+		 search begins, i.e. the GIMPLE_COND
+		 terminating the block.
+
+   Output:
+
+   * TYPE_OUT: The type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the pattern.  */
+
+static gimple *
+vect_recog_gcond_pattern (vec_info *vinfo,
+			 stmt_vec_info stmt_vinfo, tree *type_out)
+{
+  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
+  gcond* cond = NULL;
+  if (!(cond = dyn_cast <gcond *> (last_stmt)))
+    return NULL;
+
+  auto lhs = gimple_cond_lhs (cond);
+  auto rhs = gimple_cond_rhs (cond);
+  auto code = gimple_cond_code (cond);
+
+  tree scalar_type = TREE_TYPE (lhs);
+  if (VECTOR_TYPE_P (scalar_type))
+    return NULL;
+
+  if (code == NE_EXPR && zerop (rhs))
+    return NULL;
+
+  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
+  if (vecitype == NULL_TREE)
+    return NULL;
+
+  /* Build a scalar type for the boolean result that when vectorized matches the
+     vector type of the result in size and number of elements.  */
+  unsigned prec
+    = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vecitype)),
+			   TYPE_VECTOR_SUBPARTS (vecitype));
+
+  scalar_type
+    = build_nonstandard_integer_type (prec, TYPE_UNSIGNED (scalar_type));
+
+  vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
+  if (vecitype == NULL_TREE)
+    return NULL;
+
+  tree vectype = truth_type_for (vecitype);
+
+  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
+  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
+  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
+
+  gimple *pattern_stmt
+    = gimple_build_cond (NE_EXPR, new_lhs,
+			 build_int_cst (TREE_TYPE (new_lhs), 0),
+			 NULL_TREE, NULL_TREE);
+  *type_out = vectype;
+  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
+  return pattern_stmt;
+}
+
 /* Function vect_recog_bool_pattern
 
    Try to find pattern like following:
@@ -6860,6 +6938,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_divmod_pattern, "divmod" },
   { vect_recog_mult_pattern, "mult" },
   { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
+  { vect_recog_gcond_pattern, "gcond" },
   { vect_recog_bool_pattern, "bool" },
   /* This must come before mask conversion, and includes the parts
      of mask conversion that are needed for gather and scatter
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..7c50ee37f2ade24eccf7a7d1ea2e00b4450023f9 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
[-- Attachment #2: rb17969.patch --]
[-- Type: application/octet-stream, Size: 15296 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
new file mode 100644
index 0000000000000000000000000000000000000000..b64becd588973f58601196bfcb15afbe4bab60f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
@@ -0,0 +1,36 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 5
+#endif
+float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(double x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (7.0) != 0)
+    abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+    abort ();
+}
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..359d30b5991a50717c269df577c08adffa44e71b 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo)
   return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
 }
 
+/* Function vect_recog_gcond_pattern
+
+   Try to find pattern like following:
+
+     if (a op b)
+
+   where operator 'op' is not != and convert it to an adjusted boolean pattern
+
+     mask = a op b
+     if (mask != 0)
+
+   and set the mask type on MASK.
+
+   Input:
+
+   * STMT_VINFO: The stmt at the end from which the pattern
+		 search begins, i.e. cast of a bool to
+		 an integer type.
+
+   Output:
+
+   * TYPE_OUT: The type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the pattern.  */
+
+static gimple *
+vect_recog_gcond_pattern (vec_info *vinfo,
+			 stmt_vec_info stmt_vinfo, tree *type_out)
+{
+  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
+  gcond* cond = NULL;
+  if (!(cond = dyn_cast <gcond *> (last_stmt)))
+    return NULL;
+
+  auto lhs = gimple_cond_lhs (cond);
+  auto rhs = gimple_cond_rhs (cond);
+  auto code = gimple_cond_code (cond);
+
+  tree scalar_type = TREE_TYPE (lhs);
+  if (VECTOR_TYPE_P (scalar_type))
+    return NULL;
+
+  if (code == NE_EXPR && zerop (rhs))
+    return NULL;
+
+  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
+  if (vecitype == NULL_TREE)
+    return NULL;
+
+  /* Build a scalar type for the boolean result that when vectorized matches the
+     vector type of the result in size and number of elements.  */
+  unsigned prec
+    = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vecitype)),
+			   TYPE_VECTOR_SUBPARTS (vecitype));
+
+  scalar_type
+    = build_nonstandard_integer_type (prec, TYPE_UNSIGNED (scalar_type));
+
+  vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
+  if (vecitype == NULL_TREE)
+    return NULL;
+
+  tree vectype = truth_type_for (vecitype);
+
+  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
+  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
+  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
+
+  gimple *pattern_stmt
+    = gimple_build_cond (NE_EXPR, new_lhs,
+			 build_int_cst (TREE_TYPE (new_lhs), 0),
+			 NULL_TREE, NULL_TREE);
+  *type_out = vectype;
+  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
+  return pattern_stmt;
+}
+
 /* Function vect_recog_bool_pattern
 
    Try to find pattern like following:
@@ -6860,6 +6938,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_divmod_pattern, "divmod" },
   { vect_recog_mult_pattern, "mult" },
   { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
+  { vect_recog_gcond_pattern, "gcond" },
   { vect_recog_bool_pattern, "bool" },
   /* This must come before mask conversion, and includes the parts
      of mask conversion that are needed for gather and scatter
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..7c50ee37f2ade24eccf7a7d1ea2e00b4450023f9 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,211 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  DUMP_VECT_SCOPE ("vectorizable_early_exit");
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+
+  tree vectype_op0 = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+			   &vectype_op0))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "use not simple.\n");
+      return false;
+    }
+
+  stmt_vec_info op0_info = vinfo->lookup_def (op0);
+  tree vectype = truth_type_for (STMT_VINFO_VECTYPE (op0_info));
+  gcc_assert (vectype);
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "can't vectorize early exit because the "
+			     "target doesn't support flag setting vector "
+			     "comparisons.\n");
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "can't vectorize early exit because the "
+			     "target does not support boolean vector OR for "
+			     "type %T.\n", vectype);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	{
+	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+					      OPTIMIZE_FOR_SPEED))
+	    return false;
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+	}
+
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  tree mask = NULL_TREE;
+  if (masked_loop_p)
+    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+
+      /* Mask the statements as we queue them up.  Normally we loop over
+	 vec_num,  but since we inspect the exact results of vectorization
+	 we don't need to and instead can just use the stmts themselves.  */
+      if (masked_loop_p)
+	for (unsigned i = 0; i < stmts.length (); i++)
+	  {
+	    tree stmt_mask
+	      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype,
+				    i);
+	    stmt_mask
+	      = prepare_vec_mask (loop_vinfo, TREE_TYPE (stmt_mask), stmt_mask,
+				  stmts[i], &cond_gsi);
+	    workset.quick_push (stmt_mask);
+	  }
+      else
+	workset.splice (stmts);
+
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    new_temp = stmts[0];
+
+  gcc_assert (new_temp);
+
+  tree cond = new_temp;
+  /* If we have multiple statements after reduction we should check all the
+     lanes and treat it as a full vector.  */
+  if (masked_loop_p)
+    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			     &cond_gsi);
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *> (stmt);
+  /* When vectorizing we assume that if the branch edge is taken that we're
+     exiting the loop.  This is not however always the case as the compiler will
+     rewrite conditions to always be a comparison against 0.  To do this it
+     sometimes flips the edges.  This is fine for scalar,  but for vector we
+     then have to flip the test, as we're still assuming that if you take the
+     branch edge that we found the exit condition.  */
+  auto new_code = NE_EXPR;
+  tree cst = build_zero_cst (vectype);
+  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
+			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
+    {
+      new_code = EQ_EXPR;
+      cst = build_minus_one_cst (vectype);
+    }
+
+  gimple_cond_set_condition (cond_stmt, new_code, cond, cst);
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+  else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13158,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13183,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13131,6 +13345,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -14321,10 +14541,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      gcond *cond = NULL;
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if ((cond = dyn_cast <gcond *> (stmt)))
+	{
+	  /* We can't convert the scalar type to boolean yet, since booleans have a
+	     single bit precision and we need the vector boolean to be a
+	     representation of the integer mask.  So set the correct integer type and
+	     convert to boolean vector once we have a vectype.  */
+	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
+	}
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 
@@ -14339,12 +14568,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 			     "get vectype for scalar type: %T\n", scalar_type);
 	}
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+
       if (!vectype)
 	return opt_result::failure_at (stmt,
 				       "not vectorized:"
 				       " unsupported data-type %T\n",
 				       scalar_type);
 
+      /* If we were a gcond, convert the resulting type to a vector boolean type now
+	 that we have the correct integer mask type.  */
+      if (cond)
+	vectype = truth_type_for (vectype);
+
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
     }

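To make the generated exit test concrete, here is a small standalone C model of what the transform above arranges at runtime (an editor's sketch, not GCC code: the VF of 4, the bitmask representation of vector masks, and the helper names are all assumptions of the model).  Each copy of the vector comparison yields a lane mask, multiple copies are OR-reduced pairwise like the BIT_IOR_EXPR loop over 'workset', and the final gcond tests the reduced mask against zero, or, when the branch edge stays inside the loop, for equality with all-ones.

```c
#include <assert.h>

#define VF 4	/* lanes per vector copy (model assumption) */

/* One vector comparison a[i] > x, as one copy produced by
   vectorizable_comparison_1: a bitmask with one bit per lane.  */
static unsigned
cmp_mask (const float *a, float x)
{
  unsigned mask = 0;
  for (int lane = 0; lane < VF; lane++)
    if (a[lane] > x)
      mask |= 1u << lane;
  return mask;
}

/* Pairwise OR-reduction of the per-copy masks, mirroring the
   BIT_IOR_EXPR tree built over 'workset': pop two entries, OR them,
   and re-insert the combined value at the front.  */
static unsigned
reduce_masks (unsigned *masks, int n)
{
  while (n > 1)
    {
      unsigned arg0 = masks[--n];
      unsigned arg1 = masks[--n];
      for (int i = n; i > 0; i--)	/* models quick_insert (0, ...) */
	masks[i] = masks[i - 1];
      masks[0] = arg0 | arg1;
      n++;
    }
  return masks[0];
}

/* The final gcond: the NE_EXPR form takes the exit edge when any lane
   matched; the flipped EQ_EXPR form (branch edge staying inside the
   loop) keeps iterating only while every lane of the inverted
   condition holds.  */
static int
branch_taken (unsigned cond, int flipped)
{
  unsigned all_ones = (1u << VF) - 1;
  return flipped ? cond == all_ones : cond != 0;
}
```

For example, with two copies over { 1, 2, 3, 4 } and { 9, 6, 7, 8 } compared against x = 8.0f, the second copy yields mask 0x1, the reduction yields 0x1, and the NE_EXPR form takes the exit edge.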

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-11  7:09                   ` Tamar Christina
@ 2023-12-11  9:36                     ` Richard Biener
  2023-12-11 23:12                       ` Tamar Christina
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2023-12-11  9:36 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 11 Dec 2023, Tamar Christina wrote:

> > > >
> > > > Hmm, but we're visiting them then?  I wonder how you get along
> > > > without doing adjustmens on the uses if you consider
> > > >
> > > >     _1 = a < b;
> > > >     _2 = c != d;
> > > >     _3 = _1 | _2;
> > > >     if (_3 != 0)
> > > >       exit loop;
> > > >
> > > > thus a combined condition like
> > > >
> > > >     if (a < b || c != d)
> > > >
> > > > that we if-converted.  We need to recognize that _1, _2 and _3 have
> > > > mask uses and thus possibly adjust them.
> > > >
> > > > What bad happens if you drop 'analyze_only'?  We're not really
> > > > rewriting anything there.
> > >
> > > You mean drop it only in the above? We then fail to update the type for
> > > the gcond.  So in certain circumstances like with
> > >
> > > int a, c, d;
> > > short b;
> > >
> > > int
> > > main ()
> > > {
> > >   int e[1];
> > >   for (; b < 2; b++)
> > >     {
> > >       a = 0;
> > >       if (b == 28378)
> > >         a = e[b];
> > >       if (!(d || b))
> > >         for (; c;)
> > >           ;
> > >     }
> > >   return 0;
> > > }
> > >
> > > Unless we walk the statements regardless of whether they come from inside the
> > loop or not.
> > 
> > What do you mean by "fail to update the type for the gcond"?  If
> > I understood correctly the 'analyze_only' short-cuts some
> > checks, it doesn't add some?
> > 
> > But it's hard to follow what's actually done for a gcond ...
> > 
> 
> Yes so I had realized I had misunderstood what this pattern was doing and once
> I had made the first wrong change it snowballed.
> 
> This is an updated patch where the only modification made is to check_bool_pattern
> to also return the type of the overall expression even if we are going to handle the
> conditional through an optab expansion.  I'm piggybacking on the fact that this function
> has seen enough of the operands to be able to tell the precision needed when vectorizing.
> 
> This is needed because in the cases where the condition to the gcond was already a bool
> the precision would be 1 bit; to find the actual mask precision we have to dig through
> the operands, which this function already does.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> 	(check_bool_pattern, vect_recog_bool_pattern): Support gconds type
> 	analysis.
> 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> 	lhs.
> 	(vectorizable_early_exit): New.
> 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..6bf1c0aba8ce94f70ce4e952efd1c5695b189690 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
>    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>      {
>        gcc_assert (!vectype
> +		  || is_a <gcond *> (pattern_stmt)
>  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
>  		      == vect_use_mask_type_p (orig_stmt_info)));
>        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> @@ -5210,10 +5211,12 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
>     true if bool VAR can and should be optimized that way.  Assume it shouldn't
>     in case it's a result of a comparison which can be directly vectorized into
>     a vector comparison.  Fills in STMTS with all stmts visited during the
> -   walk.  */
> +   walk.  If VECTYPE is nonnull then it will contain the common type of the
> +   operations making up the comparisons.  */
>  
>  static bool
> -check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> +check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
> +		    tree *vectype)
>  {
>    tree rhs1;
>    enum tree_code rhs_code;
> @@ -5234,27 +5237,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
>    switch (rhs_code)
>      {
>      case SSA_NAME:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
>  	return false;
>        break;
>  
>      CASE_CONVERT:
>        if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
>  	return false;
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
>  	return false;
>        break;
>  
>      case BIT_NOT_EXPR:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
>  	return false;
>        break;
>  
>      case BIT_AND_EXPR:
>      case BIT_IOR_EXPR:
>      case BIT_XOR_EXPR:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts)
> -	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype)
> +	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
> +				   vectype))
>  	return false;
>        break;
>  
> @@ -5272,6 +5276,8 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
>  	  if (comp_vectype == NULL_TREE)
>  	    return false;
>  
> +	  if (vectype)
> +	    *vectype = comp_vectype;
>  	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
>  							  TREE_TYPE (rhs1));
>  	  if (mask_type
> @@ -5608,13 +5614,28 @@ vect_recog_bool_pattern (vec_info *vinfo,
>    enum tree_code rhs_code;
>    tree var, lhs, rhs, vectype;
>    gimple *pattern_stmt;
> -
> -  if (!is_gimple_assign (last_stmt))
> +  gcond* cond = NULL;
> +  if (!is_gimple_assign (last_stmt)
> +      && !(cond = dyn_cast <gcond *> (last_stmt)))
>      return NULL;

I still think the code will be much easier to follow if you add

     if (gcond *cond = dyn_cast <gcond *> (last_stmt))
       {
         thread to all branches
         return;
       }

     if (!is_gimple_assign (last_stmt))
       return NULL;

     .. original code unchanged ..

you can then also choose better names for the local variables.
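Spelled out, the suggested shape would be something like the following (untested editor's sketch, not code from the thread; the gcond analysis itself is elided, and all helper calls are as used in the quoted patch):

```cpp
static gimple *
vect_recog_bool_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
			 tree *type_out)
{
  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);

  if (gcond *cond = dyn_cast <gcond *> (last_stmt))
    {
      loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
      if (!loop_vinfo || !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
	return NULL;

      tree cond_var = gimple_cond_lhs (cond);
      /* ... analyze COND_VAR, compute the mask vectype, set *TYPE_OUT,
	 and build the NE_EXPR-against-zero gcond pattern here ...  */
      return gimple_build_cond (NE_EXPR, cond_var,
				build_int_cst (TREE_TYPE (cond_var), 0),
				NULL_TREE, NULL_TREE);
    }

  if (!is_gimple_assign (last_stmt))
    return NULL;

  tree var = gimple_assign_rhs1 (last_stmt);
  tree lhs = gimple_assign_lhs (last_stmt);
  tree_code rhs_code = gimple_assign_rhs_code (last_stmt);
  /* ... original assign-based code unchanged ...  */
}
```

That keeps the branch handling in one early-return block, so the assign-path locals (var, lhs, rhs_code) never have to do double duty for the gcond case.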

> -  var = gimple_assign_rhs1 (last_stmt);
> -  lhs = gimple_assign_lhs (last_stmt);
> -  rhs_code = gimple_assign_rhs_code (last_stmt);
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (is_gimple_assign (last_stmt))
> +    {
> +      var = gimple_assign_rhs1 (last_stmt);
> +      lhs = gimple_assign_lhs (last_stmt);
> +      rhs_code = gimple_assign_rhs_code (last_stmt);
> +    }
> +  else if (loop_vinfo && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    {
> +      /* Only bother analyzing the gcond for loops with multiple exits,
> +	 as we don't support SLP for it today.  */
> +      lhs = gimple_cond_lhs (last_stmt);
> +      var = gimple_cond_lhs (last_stmt);
> +      rhs_code = gimple_cond_code (last_stmt);
> +    }
> +  else
> +    return NULL;
>  
>    if (rhs_code == VIEW_CONVERT_EXPR)
>      var = TREE_OPERAND (var, 0);
> @@ -5632,7 +5653,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>  	return NULL;
>        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      if (check_bool_pattern (var, vinfo, bool_stmts, NULL))
>  	{
>  	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
>  				   TREE_TYPE (lhs), stmt_vinfo);
> @@ -5680,7 +5701,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>  
>        return pattern_stmt;
>      }
> -  else if (rhs_code == COND_EXPR
> +  else if ((rhs_code == COND_EXPR || cond)
>  	   && TREE_CODE (var) == SSA_NAME)
>      {
>        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> @@ -5700,18 +5721,33 @@ vect_recog_bool_pattern (vec_info *vinfo,
>        if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
>  	return NULL;
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      tree comp_type = NULL_TREE;
> +      if (check_bool_pattern (var, vinfo, bool_stmts, &comp_type))
>  	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
> -      else if (integer_type_for_mask (var, vinfo))
> +      else if (!cond && integer_type_for_mask (var, vinfo))
> +	return NULL;
> +      else if (cond && !comp_type)
>  	return NULL;
>  
> -      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> -      pattern_stmt 
> -	= gimple_build_assign (lhs, COND_EXPR,
> -			       build2 (NE_EXPR, boolean_type_node,
> -				       var, build_int_cst (TREE_TYPE (var), 0)),
> -			       gimple_assign_rhs2 (last_stmt),
> -			       gimple_assign_rhs3 (last_stmt));
> +      if (!cond)
> +	{
> +	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> +	  pattern_stmt
> +	    = gimple_build_assign (lhs, COND_EXPR,
> +				   build2 (NE_EXPR, boolean_type_node, var,
> +					   build_int_cst (TREE_TYPE (var), 0)),
> +				   gimple_assign_rhs2 (last_stmt),
> +				   gimple_assign_rhs3 (last_stmt));
> +	}
> +      else
> +	{
> +	  pattern_stmt
> +	    = gimple_build_cond (NE_EXPR,
> +				 var, build_int_cst (TREE_TYPE (var), 0),
> +				 gimple_cond_true_label (cond),
> +				 gimple_cond_false_label (cond));

the labels are always NULL, so just use NULL_TREE for them.

> +	  vectype = truth_type_for (comp_type);

so this leaves the producer of the mask in the GIMPLE_COND and we
vectorize the GIMPLE_COND as

  mask_1 = ...;
  if (mask_1 != {-1,-1...})
    ..

?  In principle only the mask producer needs a vector type, and that is
adjusted by bool handling; the branch itself doesn't need any
STMT_VINFO_VECTYPE.

As said, I believe that if you recognize a GIMPLE_COND pattern for
conds that aren't bool != 0, producing the mask stmt, then this should
already be picked up correctly by the bool handling.

Also, as said, piggy-backing on the COND_EXPR handling in this function
(which has the condition split out into a separate stmt!) might not
completely handle things correctly, and you are likely missing
the tcc_comparison handling of the embedded compare.
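
For illustration (a GIMPLE sketch, not taken from the patch), the
split-out form the bool != 0 path expects versus the embedded-compare
form a gcond can also carry:

```
  /* split-out mask, handled by the bool != 0 path */
  mask_1 = a_2 < b_3;
  if (mask_1 != 0) goto exit; else goto latch;

  /* embedded tcc_comparison, needs its own handling */
  if (a_2 < b_3) goto exit; else goto latch;
```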

> +	}
>        *type_out = vectype;
>        vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
>  
> @@ -5725,7 +5761,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>        if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
>  	return NULL;
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      if (check_bool_pattern (var, vinfo, bool_stmts, NULL))
>  	rhs = adjust_bool_stmts (vinfo, bool_stmts,
>  				 TREE_TYPE (vectype), stmt_vinfo);
>        else
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 582c5e678fad802d6e76300fe3c939b9f2978f17..d0878250f6fb9de4d6e6a39d16956ca147be4b80 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    vec<tree> vec_oprnds0 = vNULL;
>    vec<tree> vec_oprnds1 = vNULL;
>    tree mask_type;
> -  tree mask;
> +  tree mask = NULL_TREE;
>  
>    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>      return false;
> @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    /* Transform.  */
>  
>    /* Handle def.  */
> -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> -  mask = vect_create_destination_var (lhs, mask_type);
> +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> +  if (lhs)
> +    mask = vect_create_destination_var (lhs, mask_type);
>  
>    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
>  		     rhs1, &vec_oprnds0, vectype,
> @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>        gimple *new_stmt;
>        vec_rhs2 = vec_oprnds1[i];
>  
> -      new_temp = make_ssa_name (mask);
> +      if (lhs)
> +	new_temp = make_ssa_name (mask);
> +      else
> +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
>        if (bitop1 == NOP_EXPR)
>  	{
>  	  new_stmt = gimple_build_assign (new_temp, code,
> @@ -12723,6 +12727,198 @@ vectorizable_comparison (vec_info *vinfo,
>    return true;
>  }
>  
> +/* Check to see if the current early break given in STMT_INFO is valid for
> +   vectorization.  */
> +
> +static bool
> +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +{
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (!loop_vinfo
> +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> +    return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> +    return false;
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +    return false;
> +
> +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  gcc_assert (vectype);
> +
> +  tree vectype_op0 = NULL_TREE;
> +  slp_tree slp_op0;
> +  tree op0;
> +  enum vect_def_type dt0;
> +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> +			   &vectype_op0))
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			   "use not simple.\n");
> +	return false;
> +    }
> +
> +  machine_mode mode = TYPE_MODE (vectype);
> +  int ncopies;
> +
> +  if (slp_node)
> +    ncopies = 1;
> +  else
> +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> +
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> +
> +  /* Analyze only.  */
> +  if (!vec_stmt)
> +    {
> +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target doesn't support flag setting vector "
> +			       "comparisons.\n");
> +	  return false;
> +	}
> +
> +      if (ncopies > 1
> +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target does not support boolean vector OR for "
> +			       "type %T.\n", vectype);
> +	  return false;
> +	}
> +
> +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				      vec_stmt, slp_node, cost_vec))
> +	return false;
> +
> +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +	{
> +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> +					      OPTIMIZE_FOR_SPEED))
> +	    return false;
> +	  else
> +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> +	}
> +
> +
> +      return true;
> +    }
> +
> +  /* Transform.  */
> +
> +  tree new_temp = NULL_TREE;
> +  gimple *new_stmt = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> +
> +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				  vec_stmt, slp_node, cost_vec))
> +    gcc_unreachable ();
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  basic_block cond_bb = gimple_bb (stmt);
> +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> +
> +  auto_vec<tree> stmts;
> +
> +  tree mask = NULL_TREE;
> +  if (masked_loop_p)
> +    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> +
> +  if (slp_node)
> +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> +  else
> +    {
> +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> +      stmts.reserve_exact (vec_stmts.length ());
> +      for (auto stmt : vec_stmts)
> +	stmts.quick_push (gimple_assign_lhs (stmt));
> +    }
> +
> +  /* Determine if we need to reduce the final value.  */
> +  if (stmts.length () > 1)
> +    {
> +      /* We build the reductions in a way to maintain as much parallelism as
> +	 possible.  */
> +      auto_vec<tree> workset (stmts.length ());
> +
> +      /* Mask the statements as we queue them up.  */
> +      if (masked_loop_p)
> +	for (auto stmt : stmts)
> +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> +						mask, stmt, &cond_gsi));
> +      else
> +	workset.splice (stmts);
> +
> +      while (workset.length () > 1)
> +	{
> +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> +	  tree arg0 = workset.pop ();
> +	  tree arg1 = workset.pop ();
> +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> +				       &cond_gsi);
> +	  workset.quick_insert (0, new_temp);
> +	}
> +    }
> +  else
> +    new_temp = stmts[0];
> +
> +  gcc_assert (new_temp);
> +
> +  tree cond = new_temp;
> +  /* If we have multiple statements after reduction we should check all the
> +     lanes and treat it as a full vector.  */
> +  if (masked_loop_p)
> +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> +			     &cond_gsi);

You didn't fix any of the code above it seems, it's still wrong.

Richard.

> +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> +     codegen so we must replace the original insn.  */
> +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> +  gcond *cond_stmt = as_a <gcond *>(stmt);
> +  /* When vectorizing we assume that if the branch edge is taken that we're
> +     exiting the loop.  This is not however always the case as the compiler will
> +     rewrite conditions to always be a comparison against 0.  To do this it
> +     sometimes flips the edges.  This is fine for scalar,  but for vector we
> +     then have to flip the test, as we're still assuming that if you take the
> +     branch edge that we found the exit condition.  */
> +  auto new_code = NE_EXPR;
> +  tree cst = build_zero_cst (vectype);
> +  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> +			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
> +    {
> +      new_code = EQ_EXPR;
> +      cst = build_minus_one_cst (vectype);
> +    }
> +
> +  gimple_cond_set_condition (cond_stmt, new_code, cond, cst);
> +  update_stmt (stmt);
> +
> +  if (slp_node)
> +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> +   else
> +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> +
> +
> +  if (!slp_node)
> +    *vec_stmt = stmt;
> +
> +  return true;
> +}
> +
>  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
>     can handle all live statements in the node.  Otherwise return true
>     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> @@ -12949,7 +13145,9 @@ vect_analyze_stmt (vec_info *vinfo,
>  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
>  				  stmt_info, NULL, node)
>  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> -				   stmt_info, NULL, node, cost_vec));
> +				   stmt_info, NULL, node, cost_vec)
> +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +				      cost_vec));
>    else
>      {
>        if (bb_vinfo)
> @@ -12972,7 +13170,10 @@ vect_analyze_stmt (vec_info *vinfo,
>  					 NULL, NULL, node, cost_vec)
>  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
>  					  cost_vec)
> -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +					  cost_vec));
> +
>      }
>  
>    if (node)
> @@ -13131,6 +13332,12 @@ vect_transform_stmt (vec_info *vinfo,
>        gcc_assert (done);
>        break;
>  
> +    case loop_exit_ctrl_vec_info_type:
> +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> +				      slp_node, NULL);
> +      gcc_assert (done);
> +      break;
> +
>      default:
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> @@ -14321,10 +14528,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>      }
>    else
>      {
> +      gcond *cond = NULL;
>        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
>  	scalar_type = TREE_TYPE (DR_REF (dr));
>        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
>  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> +      else if ((cond = dyn_cast <gcond *> (stmt)))
> +	{
> +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> +	     single bit precision and we need the vector boolean to be a
> +	     representation of the integer mask.  So set the correct integer type and
> +	     convert to boolean vector once we have a vectype.  */
> +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
> +	}
>        else
>  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
>  
> @@ -14339,12 +14555,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>  			     "get vectype for scalar type: %T\n", scalar_type);
>  	}
>        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> +
>        if (!vectype)
>  	return opt_result::failure_at (stmt,
>  				       "not vectorized:"
>  				       " unsupported data-type %T\n",
>  				       scalar_type);
>  
> +      /* If we were a gcond, convert the resulting type to a vector boolean type now
> +	 that we have the correct integer mask type.  */
> +      if (cond)
> +	vectype = truth_type_for (vectype);
> +
>        if (dump_enabled_p ())
>  	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
>      }
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-08 13:59                 ` Richard Biener
  2023-12-08 15:01                   ` Tamar Christina
@ 2023-12-11  7:09                   ` Tamar Christina
  2023-12-11  9:36                     ` Richard Biener
  1 sibling, 1 reply; 24+ messages in thread
From: Tamar Christina @ 2023-12-11  7:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 19809 bytes --]

> > >
> > > Hmm, but we're visiting them then?  I wonder how you get along
> > > without doing adjustmens on the uses if you consider
> > >
> > >     _1 = a < b;
> > >     _2 = c != d;
> > >     _3 = _1 | _2;
> > >     if (_3 != 0)
> > >       exit loop;
> > >
> > > thus a combined condition like
> > >
> > >     if (a < b || c != d)
> > >
> > > that we if-converted.  We need to recognize that _1, _2 and _3 have
> > > mask uses and thus possibly adjust them.
> > >
> > > What bad happens if you drop 'analyze_only'?  We're not really
> > > rewriting anything there.
> >
> > You mean drop it only in the above? We then fail to update the type for
> > the gcond.  So in certain circumstances like with
> >
> > int a, c, d;
> > short b;
> >
> > int
> > main ()
> > {
> >   int e[1];
> >   for (; b < 2; b++)
> >     {
> >       a = 0;
> >       if (b == 28378)
> >         a = e[b];
> >       if (!(d || b))
> >         for (; c;)
> >           ;
> >     }
> >   return 0;
> > }
> >
> > Unless we walk the statements regardless of whether they come from inside the
> loop or not.
> 
> What do you mean by "fail to update the type for the gcond"?  If
> I understood correctly the 'analyze_only' short-cuts some
> checks, it doesn't add some?
> 
> But it's hard to follow what's actually done for a gcond ...
> 

Yes, I had realized that I had misunderstood what this pattern was doing, and
once I had made the first wrong change it snowballed.

This is an updated patch where the only modification made is to check_bool_pattern
to also return the type of the overall expression even if we are going to handle the
conditional through an optab expansion.  I'm piggybacking on the fact that this function
has seen enough of the operands to be able to tell the precision needed when vectorizing.

This is needed because in the cases where the condition of the gcond was already
a bool, the precision would be 1 bit; to find the actual mask type we have to dig
through the operands, which this function already does.

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
	(check_bool_pattern, vect_recog_bool_pattern): Support gconds type
	analysis.
	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
	lhs.
	(vectorizable_early_exit): New.
	(vect_analyze_stmt, vect_transform_stmt): Use it.
	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.

--- inline copy of patch ---

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..6bf1c0aba8ce94f70ce4e952efd1c5695b189690 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5210,10 +5211,12 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
    true if bool VAR can and should be optimized that way.  Assume it shouldn't
    in case it's a result of a comparison which can be directly vectorized into
    a vector comparison.  Fills in STMTS with all stmts visited during the
-   walk.  */
+   walk.  if VECTYPE then this value will contain the common type of the
+   operations making up the comparisons.  */
 
 static bool
-check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
+check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
+		    tree *vectype)
 {
   tree rhs1;
   enum tree_code rhs_code;
@@ -5234,27 +5237,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
   switch (rhs_code)
     {
     case SSA_NAME:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
 	return false;
       break;
 
     CASE_CONVERT:
       if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
 	return false;
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
 	return false;
       break;
 
     case BIT_NOT_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
 	return false;
       break;
 
     case BIT_AND_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts)
-	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype)
+	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
+				   vectype))
 	return false;
       break;
 
@@ -5272,6 +5276,8 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
 	  if (comp_vectype == NULL_TREE)
 	    return false;
 
+	  if (vectype)
+	    *vectype = comp_vectype;
 	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
 							  TREE_TYPE (rhs1));
 	  if (mask_type
@@ -5608,13 +5614,28 @@ vect_recog_bool_pattern (vec_info *vinfo,
   enum tree_code rhs_code;
   tree var, lhs, rhs, vectype;
   gimple *pattern_stmt;
-
-  if (!is_gimple_assign (last_stmt))
+  gcond* cond = NULL;
+  if (!is_gimple_assign (last_stmt)
+      && !(cond = dyn_cast <gcond *> (last_stmt)))
     return NULL;
 
-  var = gimple_assign_rhs1 (last_stmt);
-  lhs = gimple_assign_lhs (last_stmt);
-  rhs_code = gimple_assign_rhs_code (last_stmt);
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (is_gimple_assign (last_stmt))
+    {
+      var = gimple_assign_rhs1 (last_stmt);
+      lhs = gimple_assign_lhs (last_stmt);
+      rhs_code = gimple_assign_rhs_code (last_stmt);
+    }
+  else if (loop_vinfo && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    {
+      /* Only analyze the gcond when loop-vectorizing a loop with multiple
+	 exits; we don't support SLP today.  */
+      lhs = gimple_cond_lhs (last_stmt);
+      var = gimple_cond_lhs (last_stmt);
+      rhs_code = gimple_cond_code (last_stmt);
+    }
+  else
+    return NULL;
 
   if (rhs_code == VIEW_CONVERT_EXPR)
     var = TREE_OPERAND (var, 0);
@@ -5632,7 +5653,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 	return NULL;
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, NULL))
 	{
 	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				   TREE_TYPE (lhs), stmt_vinfo);
@@ -5680,7 +5701,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 
       return pattern_stmt;
     }
-  else if (rhs_code == COND_EXPR
+  else if ((rhs_code == COND_EXPR || cond)
 	   && TREE_CODE (var) == SSA_NAME)
     {
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
@@ -5700,18 +5721,33 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      tree comp_type = NULL_TREE;
+      if (check_bool_pattern (var, vinfo, bool_stmts, &comp_type))
 	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
-      else if (integer_type_for_mask (var, vinfo))
+      else if (!cond && integer_type_for_mask (var, vinfo))
+	return NULL;
+      else if (cond && !comp_type)
 	return NULL;
 
-      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
-      pattern_stmt 
-	= gimple_build_assign (lhs, COND_EXPR,
-			       build2 (NE_EXPR, boolean_type_node,
-				       var, build_int_cst (TREE_TYPE (var), 0)),
-			       gimple_assign_rhs2 (last_stmt),
-			       gimple_assign_rhs3 (last_stmt));
+      if (!cond)
+	{
+	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+	  pattern_stmt
+	    = gimple_build_assign (lhs, COND_EXPR,
+				   build2 (NE_EXPR, boolean_type_node, var,
+					   build_int_cst (TREE_TYPE (var), 0)),
+				   gimple_assign_rhs2 (last_stmt),
+				   gimple_assign_rhs3 (last_stmt));
+	}
+      else
+	{
+	  pattern_stmt
+	    = gimple_build_cond (NE_EXPR,
+				 var, build_int_cst (TREE_TYPE (var), 0),
+				 gimple_cond_true_label (cond),
+				 gimple_cond_false_label (cond));
+	  vectype = truth_type_for (comp_type);
+	}
       *type_out = vectype;
       vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
 
@@ -5725,7 +5761,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, NULL))
 	rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				 TREE_TYPE (vectype), stmt_vinfo);
       else
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..d0878250f6fb9de4d6e6a39d16956ca147be4b80 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,198 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype);
+
+  tree vectype_op0 = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+			   &vectype_op0))
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "use not simple.\n");
+	return false;
+    }
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target doesn't support flag setting vector "
+			       "comparisons.\n");
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector OR for "
+			       "type %T.\n", vectype);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	{
+	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+					      OPTIMIZE_FOR_SPEED))
+	    return false;
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+	}
+
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  tree mask = NULL_TREE;
+  if (masked_loop_p)
+    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+
+      /* Mask the statements as we queue them up.  */
+      if (masked_loop_p)
+	for (auto stmt : stmts)
+	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
+						mask, stmt, &cond_gsi));
+      else
+	workset.splice (stmts);
+
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    new_temp = stmts[0];
+
+  gcc_assert (new_temp);
+
+  tree cond = new_temp;
+  /* If we have multiple statements after reduction we should check all the
+     lanes and treat it as a full vector.  */
+  if (masked_loop_p)
+    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			     &cond_gsi);
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *>(stmt);
+  /* When vectorizing we assume that if the branch edge is taken that we're
+     exiting the loop.  This is not however always the case as the compiler will
+     rewrite conditions to always be a comparison against 0.  To do this it
+     sometimes flips the edges.  This is fine for scalar,  but for vector we
+     then have to flip the test, as we're still assuming that if you take the
+     branch edge that we found the exit condition.  */
+  auto new_code = NE_EXPR;
+  tree cst = build_zero_cst (vectype);
+  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
+			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
+    {
+      new_code = EQ_EXPR;
+      cst = build_minus_one_cst (vectype);
+    }
+
+  gimple_cond_set_condition (cond_stmt, new_code, cond, cst);
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+   else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13145,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13170,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13131,6 +13332,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -14321,10 +14528,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      gcond *cond = NULL;
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if ((cond = dyn_cast <gcond *> (stmt)))
+	{
+	  /* We can't convert the scalar type to boolean yet, since booleans have a
+	     single bit precision and we need the vector boolean to be a
+	     representation of the integer mask.  So set the correct integer type and
+	     convert to boolean vector once we have a vectype.  */
+	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
+	}
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 
@@ -14339,12 +14555,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 			     "get vectype for scalar type: %T\n", scalar_type);
 	}
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+
       if (!vectype)
 	return opt_result::failure_at (stmt,
 				       "not vectorized:"
 				       " unsupported data-type %T\n",
 				       scalar_type);
 
+      /* If we were a gcond, convert the resulting type to a vector boolean type now
+	 that we have the correct integer mask type.  */
+      if (cond)
+	vectype = truth_type_for (vectype);
+
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
     }
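To make the edge-flip handling in the hunk above concrete, here is a small scalar model of the two rewritten gcond forms (hypothetical C, not part of the patch; it assumes the vector lane mask is packed into an integer, with all_ones standing in for build_minus_one_cst of the vector type):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Normal case: the branch edge leaves the loop, so the rewritten gcond
   is "mask != 0": take the exit as soon as any lane saw the break
   condition.  */
static bool
take_branch_exit_edge (uint64_t mask)
{
  return mask != 0;
}

/* Flipped case: the compiler rewrote the scalar test against 0 and the
   branch edge stays inside the loop.  The vector test is inverted too:
   the branch (staying in the loop) is taken only when every lane
   satisfies the inverted condition, i.e. "mask == all_ones".  */
static bool
take_branch_loop_edge (uint64_t mask, uint64_t all_ones)
{
  return mask == all_ones;
}
```

In both cases the fall-through edge keeps its original meaning; only the comparison code and the constant change, which is what gimple_cond_set_condition is handed in the patch.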

[-- Attachment #2: rb17969 (2).patch --]
[-- Type: application/octet-stream, Size: 16690 bytes --]

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..6bf1c0aba8ce94f70ce4e952efd1c5695b189690 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5210,10 +5211,12 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
    true if bool VAR can and should be optimized that way.  Assume it shouldn't
    in case it's a result of a comparison which can be directly vectorized into
    a vector comparison.  Fills in STMTS with all stmts visited during the
-   walk.  */
+   walk.  If VECTYPE is nonnull, it is set to the common type of the
+   operations making up the comparisons.  */
 
 static bool
-check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
+check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
+		    tree *vectype)
 {
   tree rhs1;
   enum tree_code rhs_code;
@@ -5234,27 +5237,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
   switch (rhs_code)
     {
     case SSA_NAME:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
 	return false;
       break;
 
     CASE_CONVERT:
       if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
 	return false;
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
 	return false;
       break;
 
     case BIT_NOT_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
 	return false;
       break;
 
     case BIT_AND_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts)
-	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype)
+	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
+				   vectype))
 	return false;
       break;
 
@@ -5272,6 +5276,8 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
 	  if (comp_vectype == NULL_TREE)
 	    return false;
 
+	  if (vectype)
+	    *vectype = comp_vectype;
 	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
 							  TREE_TYPE (rhs1));
 	  if (mask_type
@@ -5608,13 +5614,28 @@ vect_recog_bool_pattern (vec_info *vinfo,
   enum tree_code rhs_code;
   tree var, lhs, rhs, vectype;
   gimple *pattern_stmt;
-
-  if (!is_gimple_assign (last_stmt))
+  gcond* cond = NULL;
+  if (!is_gimple_assign (last_stmt)
+      && !(cond = dyn_cast <gcond *> (last_stmt)))
     return NULL;
 
-  var = gimple_assign_rhs1 (last_stmt);
-  lhs = gimple_assign_lhs (last_stmt);
-  rhs_code = gimple_assign_rhs_code (last_stmt);
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (is_gimple_assign (last_stmt))
+    {
+      var = gimple_assign_rhs1 (last_stmt);
+      lhs = gimple_assign_lhs (last_stmt);
+      rhs_code = gimple_assign_rhs_code (last_stmt);
+    }
+  else if (loop_vinfo && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    {
+      /* If not loop vectorization with multiple exits, don't bother
+	 analyzing the gcond as we don't support SLP today.  */
+      lhs = gimple_cond_lhs (last_stmt);
+      var = gimple_cond_lhs (last_stmt);
+      rhs_code = gimple_cond_code (last_stmt);
+    }
+  else
+    return NULL;
 
   if (rhs_code == VIEW_CONVERT_EXPR)
     var = TREE_OPERAND (var, 0);
@@ -5632,7 +5653,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 	return NULL;
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, NULL))
 	{
 	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				   TREE_TYPE (lhs), stmt_vinfo);
@@ -5680,7 +5701,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 
       return pattern_stmt;
     }
-  else if (rhs_code == COND_EXPR
+  else if ((rhs_code == COND_EXPR || cond)
 	   && TREE_CODE (var) == SSA_NAME)
     {
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
@@ -5700,18 +5721,33 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      tree comp_type = NULL_TREE;
+      if (check_bool_pattern (var, vinfo, bool_stmts, &comp_type))
 	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
-      else if (integer_type_for_mask (var, vinfo))
+      else if (!cond && integer_type_for_mask (var, vinfo))
+	return NULL;
+      else if (cond && !comp_type)
 	return NULL;
 
-      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
-      pattern_stmt 
-	= gimple_build_assign (lhs, COND_EXPR,
-			       build2 (NE_EXPR, boolean_type_node,
-				       var, build_int_cst (TREE_TYPE (var), 0)),
-			       gimple_assign_rhs2 (last_stmt),
-			       gimple_assign_rhs3 (last_stmt));
+      if (!cond)
+	{
+	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+	  pattern_stmt
+	    = gimple_build_assign (lhs, COND_EXPR,
+				   build2 (NE_EXPR, boolean_type_node, var,
+					   build_int_cst (TREE_TYPE (var), 0)),
+				   gimple_assign_rhs2 (last_stmt),
+				   gimple_assign_rhs3 (last_stmt));
+	}
+      else
+	{
+	  pattern_stmt
+	    = gimple_build_cond (NE_EXPR,
+				 var, build_int_cst (TREE_TYPE (var), 0),
+				 gimple_cond_true_label (cond),
+				 gimple_cond_false_label (cond));
+	  vectype = truth_type_for (comp_type);
+	}
       *type_out = vectype;
       vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
 
@@ -5725,7 +5761,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, NULL))
 	rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				 TREE_TYPE (vectype), stmt_vinfo);
       else
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..d0878250f6fb9de4d6e6a39d16956ca147be4b80 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,198 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype);
+
+  tree vectype_op0 = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+			   &vectype_op0))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "use not simple.\n");
+      return false;
+    }
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target doesn't support flag setting vector "
+			       "comparisons.\n");
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector OR for "
+			       "type %T.\n", vectype);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	{
+	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+					      OPTIMIZE_FOR_SPEED))
+	    return false;
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+	}
+
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  tree mask = NULL_TREE;
+  if (masked_loop_p)
+    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+
+      /* Mask the statements as we queue them up.  */
+      if (masked_loop_p)
+	for (auto stmt : stmts)
+	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
+						mask, stmt, &cond_gsi));
+      else
+	workset.splice (stmts);
+
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    new_temp = stmts[0];
+
+  gcc_assert (new_temp);
+
+  tree cond = new_temp;
+  /* If we have multiple statements after reduction we should check all the
+     lanes and treat it as a full vector.  */
+  if (masked_loop_p)
+    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			     &cond_gsi);
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *>(stmt);
+  /* When vectorizing we assume that if the branch edge is taken we're
+     exiting the loop.  This is, however, not always the case as the compiler
+     will rewrite conditions to always be a comparison against 0.  To do this
+     it sometimes flips the edges.  This is fine for scalar, but for vector we
+     then have to flip the test, as we're still assuming that taking the
+     branch edge means we found the exit condition.  */
+  auto new_code = NE_EXPR;
+  tree cst = build_zero_cst (vectype);
+  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
+			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
+    {
+      new_code = EQ_EXPR;
+      cst = build_minus_one_cst (vectype);
+    }
+
+  gimple_cond_set_condition (cond_stmt, new_code, cond, cst);
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+  else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13145,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13170,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13131,6 +13332,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -14321,10 +14528,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      gcond *cond = NULL;
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if ((cond = dyn_cast <gcond *> (stmt)))
+	{
+	  /* We can't convert the scalar type to boolean yet, since booleans have a
+	     single bit precision and we need the vector boolean to be a
+	     representation of the integer mask.  So set the correct integer type and
+	     convert to boolean vector once we have a vectype.  */
+	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
+	}
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 
@@ -14339,12 +14555,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 			     "get vectype for scalar type: %T\n", scalar_type);
 	}
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+
       if (!vectype)
 	return opt_result::failure_at (stmt,
 				       "not vectorized:"
 				       " unsupported data-type %T\n",
 				       scalar_type);
 
+      /* If we were a gcond, convert the resulting type to a vector boolean type now
+	 that we have the correct integer mask type.  */
+      if (cond)
+	vectype = truth_type_for (vectype);
+
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
     }
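As a side note for readers tracing the mask reduction in vectorizable_early_exit above: when there are multiple vector statements, their exit masks are OR-ed pairwise, popping two entries and re-inserting the result at the front of the workset, which keeps the dependency tree shallow. A rough scalar model of that loop (hypothetical C, not the patch's code; each uint64_t stands in for one vector mask):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model: each element is one vector exit mask; OR-reduce pairwise,
   popping two entries and pushing the result to the front until one
   remains (mirroring the workset.pop () / quick_insert (0, ...) loop
   in the patch).  */
static uint64_t
reduce_exit_masks (uint64_t *workset, size_t n)
{
  assert (n > 0);
  while (n > 1)
    {
      uint64_t a = workset[n - 1];	/* workset.pop ()  */
      uint64_t b = workset[n - 2];	/* workset.pop ()  */
      n -= 2;
      /* quick_insert (0, a | b): shift and place at the front.  */
      for (size_t i = n; i > 0; i--)
	workset[i] = workset[i - 1];
      workset[0] = a | b;
      n++;
    }
  return workset[0];
}
```

The loop exits early iff the final combined mask is non-zero, which is exactly what the rewritten gcond then tests.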

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-08 13:59                 ` Richard Biener
@ 2023-12-08 15:01                   ` Tamar Christina
  2023-12-11  7:09                   ` Tamar Christina
  1 sibling, 0 replies; 24+ messages in thread
From: Tamar Christina @ 2023-12-08 15:01 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, December 8, 2023 2:00 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> codegen of exit code
> 
> On Fri, 8 Dec 2023, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Friday, December 8, 2023 10:28 AM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > > Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> > > codegen of exit code
> > >
> > > On Fri, 8 Dec 2023, Tamar Christina wrote:
> > >
> > > > > --param vect-partial-vector-usage=2 would, no?
> > > > >
> > > > I.. didn't even know it went to 2!
> > > >
> > > > > > In principle I suppose I could mask the individual stmts; that should
> > > > > > handle the future case when this is relaxed to support non-fixed
> > > > > > length buffers?
> > > > >
> > > > > Well, it looks wrong - either put in an assert that we start with a
> > > > > single stmt or assert !masked_loop_p instead?  Better ICE than
> > > > > generate wrong code.
> > > > >
> > > > > That said, I think you need to apply the masking on the original
> > > > > stmts[], before reducing them, no?
> > > >
> > > > Yeah, I've done so now.  For simplicity I've just kept the final masking
> > > > always as well and just leave it up to the optimizers to drop it when it's
> > > > superfluous.
> > > >
> > > > Simple testcase:
> > > >
> > > > #ifndef N
> > > > #define N 837
> > > > #endif
> > > > float vect_a[N];
> > > > unsigned vect_b[N];
> > > >
> > > > unsigned test4(double x)
> > > > {
> > > >  unsigned ret = 0;
> > > >  for (int i = 0; i < N; i++)
> > > >  {
> > > >    if (vect_a[i] > x)
> > > >      break;
> > > >    vect_a[i] = x;
> > > >
> > > >  }
> > > >  return ret;
> > > > }
> > > >
> > > > Looks good now. After this one there's only one patch left, the dependency
> > > > analysis.
> > > > I'm almost done with the cleanup/respin, but want to take the weekend to
> > > > double check and will post it first thing Monday morning.
> > > >
> > > > Did you want to see the testsuite changes as well again? I've basically just
> > > > added the right dg-requires-effective and add-options etc.
> > >
> > > Yes please.
> > >
> > > > Thanks for all the reviews!
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> > > > 	(check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
> > > > 	vect_recog_bool_pattern, sort_after_uid): Support gconds type analysis.
> > > > 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> > > > 	lhs.
> > > > 	(vectorizable_early_exit): New.
> > > > 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> > > > 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> > > >
> > > >
> > > > --- inline copy of patch ---
> > > >
> > > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > > index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..8865cde9f3481a474d31848ae12523576d29744d 100644
> > > > --- a/gcc/tree-vect-patterns.cc
> > > > +++ b/gcc/tree-vect-patterns.cc
> > > > @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
> > > >    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
> > > >      {
> > > >        gcc_assert (!vectype
> > > > +		  || is_a <gcond *> (pattern_stmt)
> > > >  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
> > > >  		      == vect_use_mask_type_p (orig_stmt_info)));
> > > >        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> > > > @@ -5210,19 +5211,28 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
> > > >     true if bool VAR can and should be optimized that way.  Assume it shouldn't
> > > >     in case it's a result of a comparison which can be directly vectorized into
> > > >     a vector comparison.  Fills in STMTS with all stmts visited during the
> > > > -   walk.  */
> > > > +   walk.  If ANALYZE_ONLY then only analyze the booleans but do not perform any
> > > > +   codegen associated with the boolean condition.  */
> > > >
> > > >  static bool
> > > > -check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> > > > +check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
> > > > +		    bool analyze_only)
> > > >  {
> > > >    tree rhs1;
> > > >    enum tree_code rhs_code;
> > > > +  gassign *def_stmt = NULL;
> > > >
> > > >    stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
> > > > -  if (!def_stmt_info)
> > > > +  if (!def_stmt_info && !analyze_only)
> > > >      return false;
> > > > +  else if (!def_stmt_info)
> > > > +    /* If we're only analyzing we won't be codegen-ing the statements and are
> > > > +       only checking whether the types match.  In that case we can accept loop
> > > > +       invariant values.  */
> > > > +    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
> > > > +  else
> > > > +    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
> > > >
> > >
> > > Hmm, but we're visiting them then?  I wonder how you get along
> > > without doing adjustments on the uses if you consider
> > >
> > >     _1 = a < b;
> > >     _2 = c != d;
> > >     _3 = _1 | _2;
> > >     if (_3 != 0)
> > >       exit loop;
> > >
> > > thus a combined condition like
> > >
> > >     if (a < b || c != d)
> > >
> > > that we if-converted.  We need to recognize that _1, _2 and _3 have
> > > mask uses and thus possibly adjust them.
> > >
> > > What bad happens if you drop 'analyze_only'?  We're not really
> > > rewriting anything there.
> >
> > You mean drop it only in the above? We then fail to update the type for
> > the gcond.  So in certain circumstances like with
> >
> > int a, c, d;
> > short b;
> >
> > int
> > main ()
> > {
> >   int e[1];
> >   for (; b < 2; b++)
> >     {
> >       a = 0;
> >       if (b == 28378)
> >         a = e[b];
> >       if (!(d || b))
> >         for (; c;)
> >           ;
> >     }
> >   return 0;
> > }
> >
> > Unless we walk the statements regardless of whether they come from inside the
> > loop or not.
> 
> What do you mean by "fail to update the type for the gcond"?  If
> I understood correctly the 'analyze_only' short-cuts some
> checks, it doesn't add some?

analyze_only made it skip the vector compare check because I wasn't rewriting the
condition and still kept it in the gcond.  And the gcond check happens later anyway,
during vectorizable_early_exit.

But more on it below.

> 
> But it's hard to follow what's actually done for a gcond ...
> 
> > >
> > > > -  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
> > > >    if (!def_stmt)
> > > >      return false;
> > > >
> > > > @@ -5234,27 +5244,28 @@ check_bool_pattern (tree var, vec_info *vinfo,
> > > hash_set<gimple *> &stmts)
> > > >    switch (rhs_code)
> > > >      {
> > > >      case SSA_NAME:
> > > > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > > > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> > > >  	return false;
> > > >        break;
> > > >
> > > >      CASE_CONVERT:
> > > >        if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
> > > >  	return false;
> > > > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > > > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> > > >  	return false;
> > > >        break;
> > > >
> > > >      case BIT_NOT_EXPR:
> > > > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > > > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> > > >  	return false;
> > > >        break;
> > > >
> > > >      case BIT_AND_EXPR:
> > > >      case BIT_IOR_EXPR:
> > > >      case BIT_XOR_EXPR:
> > > > -      if (! check_bool_pattern (rhs1, vinfo, stmts)
> > > > -	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
> > > > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only)
> > > > +	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
> > > > +				   analyze_only))
> > > >  	return false;
> > > >        break;
> > > >
> > > > @@ -5275,6 +5286,7 @@ check_bool_pattern (tree var, vec_info *vinfo,
> > > hash_set<gimple *> &stmts)
> > > >  	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
> > > >  							  TREE_TYPE (rhs1));
> > > >  	  if (mask_type
> > > > +	      && !analyze_only
> > > >  	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
> > > >  	    return false;
> > > >
> > > > @@ -5289,7 +5301,8 @@ check_bool_pattern (tree var, vec_info *vinfo,
> > > hash_set<gimple *> &stmts)
> > > >  	    }
> > > >  	  else
> > > >  	    vecitype = comp_vectype;
> > > > -	  if (! expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
> > > > +	  if (!analyze_only
> > > > +	      && !expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
> > > >  	    return false;
> > > >  	}
> > > >        else
> > > > @@ -5324,11 +5337,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
> > > >     VAR is an SSA_NAME that should be transformed from bool to a wider integer
> > > >     type, OUT_TYPE is the desired final integer type of the whole pattern.
> > > >     STMT_INFO is the info of the pattern root and is where pattern stmts should
> > > > -   be associated with.  DEFS is a map of pattern defs.  */
> > > > +   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
> > > > +   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
> > > >
> > > >  static void
> > > >  adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> > > > -		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
> > > > +		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
> > > > +		     gimple *&last_stmt, bool type_only)
> > > >  {
> > > >    gimple *stmt = SSA_NAME_DEF_STMT (var);
> > > >    enum tree_code rhs_code, def_rhs_code;
> > > > @@ -5492,28 +5507,38 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> > > >      }
> > > >
> > > >    gimple_set_location (pattern_stmt, loc);
> > > > -  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> > > > -			  get_vectype_for_scalar_type (vinfo, itype));
> > > > +  if (!type_only)
> > > > +    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> > > > +			    get_vectype_for_scalar_type (vinfo, itype));
> > > > +  last_stmt = pattern_stmt;
> > > >    defs.put (var, gimple_assign_lhs (pattern_stmt));
> > > >  }
> > > >
> > > > -/* Comparison function to qsort a vector of gimple stmts after UID.  */
> > > > +/* Comparison function to qsort a vector of gimple stmts after BB and UID.
> > > > +   the def of one statement can be in an earlier block than the use, so if
> > > > +   the BB are different, first compare by BB.  */
> > > >
> > > >  static int
> > > >  sort_after_uid (const void *p1, const void *p2)
> > > >  {
> > > >    const gimple *stmt1 = *(const gimple * const *)p1;
> > > >    const gimple *stmt2 = *(const gimple * const *)p2;
> > > > +  if (gimple_bb (stmt1)->index != gimple_bb (stmt2)->index)
> > > > +    return gimple_bb (stmt1)->index - gimple_bb (stmt2)->index;
> > > > +
> > >
> > > is this because you eventually get out-of-loop stmts (without UID)?
> > >
> >
> > No the problem I was having is that with an early exit the statement of
> > one branch of the compare can be in a different BB than the other.
> >
> > The testcase specifically was this:
> >
> > int a, c, d;
> > short b;
> >
> > int
> > main ()
> > {
> >   int e[1];
> >   for (; b < 2; b++)
> >     {
> >       a = 0;
> >       if (b == 28378)
> >         a = e[b];
> >       if (!(d || b))
> >         for (; c;)
> >           ;
> >     }
> >   return 0;
> > }
> >
> > Without debug info it happened to work:
> >
> > >>> p gimple_uid (bool_stmts[0])
> > $1 = 3
> > >>> p gimple_uid (bool_stmts[1])
> > $2 = 3
> > >>> p gimple_uid (bool_stmts[2])
> > $3 = 4
> >
> > The first two statements got the same uid, but are in different BB in the loop.
> > When we add debug, it looks like 1 bb got more debug state than the other:
> >
> > >>> p gimple_uid (bool_stmts[0])
> > $1 = 3
> > >>> p gimple_uid (bool_stmts[1])
> > $2 = 4
> > >>> p gimple_uid (bool_stmts[2])
> > $3 = 6
> >
> > That last statement, which now has a UID of 6, used to be 3.
> 
> ?  gimple_uid is used to map to stmt_vec_info and initially all UIDs
> are zero.  It should never happen that two stmts belonging to the
> same analyzed loop have the same UID.  In particular debug stmts
> never get stmt_vec_info and thus no UID.
> 
> If you run into stmts not within the loop or that have no stmt_info
> then all bets are off and you can't use UID at all.
> 
> As said, I didn't get why you look at those.

Right, it was hard to tell from the gimple dumps, but the graph made me realize
that your initial statement was right: one of the uses is out of loop.

https://gist.github.com/Mistuke/2460471529e6e42d34d5db0b307ff3cf

where _12 is out of loop.

I don't particularly care about the out-of-loop use itself, only about its type in
the loop.  But check_bool_pattern needs to see all the uses or it stops and
returns NULL.  In that case the Boolean pattern isn't generated and we end
up vectorizing with the wrong types.

That said, I think the loop above should be valid for vectorization.

Part of the reason for doing this isn't just the mask uses; it's also to figure
out what the types of the arguments are.  But you're right in that I need to actually
replace the statements...  I may have misunderstood how the pattern was supposed to
work, as I was mainly focused on determining the correct type of the gcond in the case
where the input types differ, like:

#define N 1024
complex double vect_a[N];
complex double vect_b[N];

complex double test4(complex double x)
{
 complex double ret = 0;
 for (int i = 0; i < N; i++)
 {
   vect_b[i] += x + i;
   if (vect_a[i] == x)
     return i;
   vect_a[i] += x * vect_b[i];

 }
 return ret;
}

Reverting the analyze_stmt change we fail to vectorize, but that could be something
else; I'll investigate.

However...

> 
> > > >    return gimple_uid (stmt1) - gimple_uid (stmt2);
> > > >  }
> > > >
> > > >  /* Create pattern stmts for all stmts participating in the bool pattern
> > > >     specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
> > > > -   OUT_TYPE.  Return the def of the pattern root.  */
> > > > +   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY the new
> > > > +   statements are not emitted as pattern statements and the tree returned is
> > > > +   only useful for type queries.  */
> > > >
> > > >  static tree
> > > >  adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> > > > -		   tree out_type, stmt_vec_info stmt_info)
> > > > +		   tree out_type, stmt_vec_info stmt_info,
> > > > +		   bool type_only = false)
> > > >  {
> > > >    /* Gather original stmts in the bool pattern in their order of appearance
> > > >       in the IL.  */
> > > > @@ -5523,16 +5548,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set
> > > <gimple *> &bool_stmt_set,
> > > >      bool_stmts.quick_push (*i);
> > > >    bool_stmts.qsort (sort_after_uid);
> > > >
> > > > +  gimple *last_stmt = NULL;
> > > > +
> > > >    /* Now process them in that order, producing pattern stmts.  */
> > > >    hash_map <tree, tree> defs;
> > > > -  for (unsigned i = 0; i < bool_stmts.length (); ++i)
> > > > -    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
> > > > -			 out_type, stmt_info, defs);
> > > > +  for (auto bool_stmt : bool_stmts)
> > > > +    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmt),
> > > > +			 out_type, stmt_info, defs, last_stmt, type_only);
> > > >
> > > >    /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
> > > > -  gimple *pattern_stmt
> > > > -    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
> > > > -  return gimple_assign_lhs (pattern_stmt);
> > > > +  return gimple_assign_lhs (last_stmt);
> > > >  }
> > > >
> > > >  /* Return the proper type for converting bool VAR into
> > > > @@ -5608,13 +5633,27 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > > >    enum tree_code rhs_code;
> > > >    tree var, lhs, rhs, vectype;
> > > >    gimple *pattern_stmt;
> > > > -
> > > > -  if (!is_gimple_assign (last_stmt))
> > > > +  gcond* cond = NULL;
> > > > +  if (!is_gimple_assign (last_stmt)
> > > > +      && !(cond = dyn_cast <gcond *> (last_stmt)))
> > > >      return NULL;
> > > >
> > > > -  var = gimple_assign_rhs1 (last_stmt);
> > > > -  lhs = gimple_assign_lhs (last_stmt);
> > > > -  rhs_code = gimple_assign_rhs_code (last_stmt);
> > > > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > > > +  if (is_gimple_assign (last_stmt))
> > > > +    {
> > > > +      var = gimple_assign_rhs1 (last_stmt);
> > > > +      lhs = gimple_assign_lhs (last_stmt);
> > > > +      rhs_code = gimple_assign_rhs_code (last_stmt);
> > > > +    }
> > > > +  else if (loop_vinfo && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    {
> > > > +      /* If not multiple exits, and loop vectorization don't bother analyzing
> > > > +	 the gcond as we don't support SLP today.  */
> > > > +      lhs = var = gimple_cond_lhs (last_stmt);
> > > > +      rhs_code = gimple_cond_code (last_stmt);
> > > > +    }
> > > > +  else
> > > > +    return NULL;
> > > >
> > > >    if (rhs_code == VIEW_CONVERT_EXPR)
> > > >      var = TREE_OPERAND (var, 0);
> > > > @@ -5632,7 +5671,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > > >  	return NULL;
> > > >        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> > > >
> > > > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > > > +      if (check_bool_pattern (var, vinfo, bool_stmts, false))
> > > >  	{
> > > >  	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
> > > >  				   TREE_TYPE (lhs), stmt_vinfo);
> > > > @@ -5680,7 +5719,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > > >
> > > >        return pattern_stmt;
> > > >      }
> > > > -  else if (rhs_code == COND_EXPR
> > > > +  else if ((rhs_code == COND_EXPR || cond)
> > > >  	   && TREE_CODE (var) == SSA_NAME)
> > > >      {
> > > >        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> > > > @@ -5700,18 +5739,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > > >        if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
> > > >  	return NULL;
> > > >
> > > > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > > > -	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
> > > > +      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
> > > > +	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo, cond);
> > > >        else if (integer_type_for_mask (var, vinfo))
> > > >  	return NULL;
> > > >
> > > > -      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> > > > -      pattern_stmt
> > > > -	= gimple_build_assign (lhs, COND_EXPR,
> > > > -			       build2 (NE_EXPR, boolean_type_node,
> > > > -				       var, build_int_cst (TREE_TYPE (var), 0)),
> > > > -			       gimple_assign_rhs2 (last_stmt),
> > > > -			       gimple_assign_rhs3 (last_stmt));
> > > > +      if (!cond)
> > > > +	{
> > > > +	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> > > > +	  pattern_stmt
> > > > +	    = gimple_build_assign (lhs, COND_EXPR,
> > > > +				   build2 (NE_EXPR, boolean_type_node, var,
> > > > +					   build_int_cst (TREE_TYPE (var), 0)),
> > > > +				   gimple_assign_rhs2 (last_stmt),
> > > > +				   gimple_assign_rhs3 (last_stmt));
> > > > +	}
> > > > +      else
> > > > +	{
> > > > +	  pattern_stmt
> > > > +	    = gimple_build_cond (gimple_cond_code (cond),
> > > > +				 gimple_cond_lhs (cond), gimple_cond_rhs (cond),
> > > > +				 gimple_cond_true_label (cond),
> > > > +				 gimple_cond_false_label (cond));
> > > > +	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
> > > > +	  vectype = truth_type_for (vectype);
> > > > +	}
> > > >        *type_out = vectype;
> > > >        vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
> > > >
> > >
> > > So this is also quite odd.  You're hooking into COND_EXPR handling
> > > but only look at the LHS of the GIMPLE_COND compare.
> > >
> >
> > Hmm, not sure I follow, GIMPLE_CONDs don't have an LHS no? we look at the
> LHS
> > For the COND_EXPR but a GCOND we just recreate the statement and set
> vectype
> > based on the updated var. I guess this is related to:
> 
> a GIMPLE_COND has "lhs" and "rhs", the two operands of the embedded
> compare.  You seem to look at only "lhs" for analyzing bool patterns.
> 
> > > that we if-converted.  We need to recognize that _1, _2 and _3 have
> > > mask uses and thus possibly adjust them.
> >
> > Which I did think about somewhat, so what you're saying is that I need to create
> > a new GIMPLE_COND here with an NE to 0 compare against var like the
> COND_EXPR
> > case?
> 
> Well, it depends how you wire everything up.  But since we later want
> a mask def and vectorize the GIMPLE_COND as cbranch it seemed to me
> it's easiest to pattern
> 
>   if (a > b)
> 
> as
> 
>   mask.patt = a > b;
>   if (mask.patt != 0)
> 
> I thought you were doing this.  And yes, COND_EXPRs are now
> effectively doing that since we no longer embed a comparison
> in the first operand (only the pattern recognizer still does that
> as I was lazy).

I did initially do that, but reverted it since I didn't quite understand the
full point of the pattern.  Added it back in.

> 
> >
> > > Please refactor the changes to separate the GIMPLE_COND path
> > > completely.
> > >
> >
> > Ok, then it seems better to make two patterns?
> 
> Maybe.
> 
> > > Is there test coverage for such "complex" condition?  I think
> > > you'll need adjustments to vect_recog_mask_conversion_pattern
> > > as well similar as to how COND_EXPR is handled there.
> >
> > Yes, the existing testsuite has many cases which fail, including gcc/testsuite/gcc.c-
> torture/execute/20150611-1.c
> 
> Fail in which way?  Fail to vectorize because we don't handle such
> condition?

No, it vectorizes, but crashes later in

gcc/testsuite/gcc.c-torture/execute/20150611-1.c: In function 'main':
gcc/testsuite/gcc.c-torture/execute/20150611-1.c:5:1: internal compiler error: in eliminate_stmt, at tree-ssa-sccvn.cc:6959
0x19f43cc eliminate_dom_walker::eliminate_stmt(basic_block_def*, gimple_stmt_iterator*)
        /data/tamchr01/gnu-work-b1/src/gcc/gcc/tree-ssa-sccvn.cc:6959
0x19f929f process_bb
        /data/tamchr01/gnu-work-b1/src/gcc/gcc/tree-ssa-sccvn.cc:8171
0x19fb393 do_rpo_vn_1
        /data/tamchr01/gnu-work-b1/src/gcc/gcc/tree-ssa-sccvn.cc:8621
0x19fbad4 do_rpo_vn(function*, edge_def*, bitmap_head*, bool, bool, vn_lookup_kind)
        /data/tamchr01/gnu-work-b1/src/gcc/gcc/tree-ssa-sccvn.cc:8723
0x1b1bd82 execute
        /data/tamchr01/gnu-work-b1/src/gcc/gcc/tree-vectorizer.cc:1389

Because

>>> p debug_tree (lhs)
 <ssa_name 0x7fbfcbf20750
    type <vector_type 0x7fbfcbf27690
        type <boolean_type 0x7fbfcbf27bd0 public QI
            size <integer_cst 0x7fbfcbfdcf60 constant 8>
            unit-size <integer_cst 0x7fbfcbfdcf78 constant 1>
            align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7fbfcbf27bd0 precision:8 min <integer_cst 0x7fbfcbf26048 -128> max <integer_cst 0x7fbfcbf26060 127>>
        V8QI
        size <integer_cst 0x7fbfcbfdce70 constant 64>
        unit-size <integer_cst 0x7fbfcbfdce88 constant 8>
        align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7fbfcbf27690 nunits:8>

    def_stmt cmp_56 = mask__13.21_54 ^ vect_cst__55;
    version:56>
$1 = void

Because the out-of-loop operand made the pattern not apply, so we didn't vectorize using V8HI as we should have.

So not sure what to do for those cases.

Regards,
Tamar

> 
> Thanks,
> Richard.
> 
> > Cheers,
> > Tamar
> >
> > >
> > > > @@ -5725,7 +5777,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > > >        if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
> > > >  	return NULL;
> > > >
> > > > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > > > +      if (check_bool_pattern (var, vinfo, bool_stmts, false))
> > > >  	rhs = adjust_bool_stmts (vinfo, bool_stmts,
> > > >  				 TREE_TYPE (vectype), stmt_vinfo);
> > > >        else
> > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > index
> > >
> 582c5e678fad802d6e76300fe3c939b9f2978f17..e9116d184149826ba436b0f5
> > > 62721c140d586c94 100644
> > > > --- a/gcc/tree-vect-stmts.cc
> > > > +++ b/gcc/tree-vect-stmts.cc
> > > > @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo,
> tree
> > > vectype,
> > > >    vec<tree> vec_oprnds0 = vNULL;
> > > >    vec<tree> vec_oprnds1 = vNULL;
> > > >    tree mask_type;
> > > > -  tree mask;
> > > > +  tree mask = NULL_TREE;
> > > >
> > > >    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> > > >      return false;
> > > > @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo,
> tree
> > > vectype,
> > > >    /* Transform.  */
> > > >
> > > >    /* Handle def.  */
> > > > -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> > > > -  mask = vect_create_destination_var (lhs, mask_type);
> > > > +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> > > > +  if (lhs)
> > > > +    mask = vect_create_destination_var (lhs, mask_type);
> > > >
> > > >    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
> > > >  		     rhs1, &vec_oprnds0, vectype,
> > > > @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo,
> tree
> > > vectype,
> > > >        gimple *new_stmt;
> > > >        vec_rhs2 = vec_oprnds1[i];
> > > >
> > > > -      new_temp = make_ssa_name (mask);
> > > > +      if (lhs)
> > > > +	new_temp = make_ssa_name (mask);
> > > > +      else
> > > > +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
> > > >        if (bitop1 == NOP_EXPR)
> > > >  	{
> > > >  	  new_stmt = gimple_build_assign (new_temp, code,
> > > > @@ -12723,6 +12727,184 @@ vectorizable_comparison (vec_info *vinfo,
> > > >    return true;
> > > >  }
> > > >
> > > > +/* Check to see if the current early break given in STMT_INFO is valid for
> > > > +   vectorization.  */
> > > > +
> > > > +static bool
> > > > +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> > > > +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> > > > +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> > > > +{
> > > > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > > > +  if (!loop_vinfo
> > > > +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> > > > +    return false;
> > > > +
> > > > +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> > > > +    return false;
> > > > +
> > > > +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> > > > +    return false;
> > > > +
> > > > +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> > > > +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> > > > +  gcc_assert (vectype);
> > > > +
> > > > +  tree vectype_op0 = NULL_TREE;
> > > > +  slp_tree slp_op0;
> > > > +  tree op0;
> > > > +  enum vect_def_type dt0;
> > > > +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0,
> &dt0,
> > > > +			   &vectype_op0))
> > > > +    {
> > > > +      if (dump_enabled_p ())
> > > > +	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +			   "use not simple.\n");
> > > > +	return false;
> > > > +    }
> > >
> > > I think you rely on patterns transforming this into canonical form
> > > mask != 0, so I suggest to check this here.
> > >
> > > > +  machine_mode mode = TYPE_MODE (vectype);
> > > > +  int ncopies;
> > > > +
> > > > +  if (slp_node)
> > > > +    ncopies = 1;
> > > > +  else
> > > > +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> > > > +
> > > > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > > > +
> > > > +  /* Analyze only.  */
> > > > +  if (!vec_stmt)
> > > > +    {
> > > > +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > > > +	{
> > > > +	  if (dump_enabled_p ())
> > > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +			       "can't vectorize early exit because the "
> > > > +			       "target doesn't support flag setting vector "
> > > > +			       "comparisons.\n");
> > > > +	  return false;
> > > > +	}
> > > > +
> > > > +      if (ncopies > 1
> > >
> > > Also required for vec_num > 1 with SLP
> > > (SLP_TREE_NUMBER_OF_VEC_STMTS)
> > >
> > > > +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > > > +	{
> > > > +	  if (dump_enabled_p ())
> > > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +			       "can't vectorize early exit because the "
> > > > +			       "target does not support boolean vector OR for "
> > > > +			       "type %T.\n", vectype);
> > > > +	  return false;
> > > > +	}
> > > > +
> > > > +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > > > +				      vec_stmt, slp_node, cost_vec))
> > > > +	return false;
> > > > +
> > > > +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> > > > +	{
> > > > +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> > > > +					      OPTIMIZE_FOR_SPEED))
> > > > +	    return false;
> > > > +	  else
> > > > +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> > > > +	}
> > > > +
> > > > +
> > > > +      return true;
> > > > +    }
> > > > +
> > > > +  /* Transform.  */
> > > > +
> > > > +  tree new_temp = NULL_TREE;
> > > > +  gimple *new_stmt = NULL;
> > > > +
> > > > +  if (dump_enabled_p ())
> > > > +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> > > > +
> > > > +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > > > +				  vec_stmt, slp_node, cost_vec))
> > > > +    gcc_unreachable ();
> > > > +
> > > > +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> > > > +  basic_block cond_bb = gimple_bb (stmt);
> > > > +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> > > > +
> > > > +  auto_vec<tree> stmts;
> > > > +
> > > > +  tree mask = NULL_TREE;
> > > > +  if (masked_loop_p)
> > > > +    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> > > > +
> > > > +  if (slp_node)
> > > > +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> > > > +  else
> > > > +    {
> > > > +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> > > > +      stmts.reserve_exact (vec_stmts.length ());
> > > > +      for (auto stmt : vec_stmts)
> > > > +	stmts.quick_push (gimple_assign_lhs (stmt));
> > > > +    }
> > > > +
> > > > +  /* Determine if we need to reduce the final value.  */
> > > > +  if (stmts.length () > 1)
> > > > +    {
> > > > +      /* We build the reductions in a way to maintain as much parallelism as
> > > > +	 possible.  */
> > > > +      auto_vec<tree> workset (stmts.length ());
> > > > +
> > > > +      /* Mask the statements as we queue them up.  */
> > > > +      if (masked_loop_p)
> > > > +	for (auto stmt : stmts)
> > > > +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> > > > +						mask, stmt, &cond_gsi));
> > >
> > > I think this still uses the wrong mask, you need to use
> > >
> > >   vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, <cnt>)
> > >
> > > replacing <cnt> with the vector def index to mask I think.  For this
> > > reason keeping the "final" mask below is also wrong.
> > >
> > > Or am I missing something?
> > >
> > > > +      else
> > > > +	workset.splice (stmts);
> > > > +
> > > > +      while (workset.length () > 1)
> > > > +	{
> > > > +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > > > +	  tree arg0 = workset.pop ();
> > > > +	  tree arg1 = workset.pop ();
> > > > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > > +				       &cond_gsi);
> > > > +	  workset.quick_insert (0, new_temp);
> > > > +	}
> > > > +    }
> > > > +  else
> > > > +    new_temp = stmts[0];
> > > > +
> > > > +  gcc_assert (new_temp);
> > > > +
> > > > +  tree cond = new_temp;
> > > > +  /* If we have multiple statements after reduction we should check all the
> > > > +     lanes and treat it as a full vector.  */
> > > > +  if (masked_loop_p)
> > > > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > > > +			     &cond_gsi);
> > >
> > > so just do this in the else path above
> > >
> > > Otherwise looks OK.
> > >
> > > Richard.
> > >
> > > > +  /* Now build the new conditional.  Pattern gimple_conds get dropped
> during
> > > > +     codegen so we must replace the original insn.  */
> > > > +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> > > > +  gcond *cond_stmt = as_a <gcond *>(stmt);
> > > > +  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
> > > > +			     build_zero_cst (vectype));
> > > > +  update_stmt (stmt);
> > > > +
> > > > +  if (slp_node)
> > > > +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> > > > +   else
> > > > +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> > > > +
> > > > +
> > > > +  if (!slp_node)
> > > > +    *vec_stmt = stmt;
> > > > +
> > > > +  return true;
> > > > +}
> > > > +
> > > >  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
> > > >     can handle all live statements in the node.  Otherwise return true
> > > >     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> > > > @@ -12949,7 +13131,9 @@ vect_analyze_stmt (vec_info *vinfo,
> > > >  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
> > > >  				  stmt_info, NULL, node)
> > > >  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> > > > -				   stmt_info, NULL, node, cost_vec));
> > > > +				   stmt_info, NULL, node, cost_vec)
> > > > +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > > > +				      cost_vec));
> > > >    else
> > > >      {
> > > >        if (bb_vinfo)
> > > > @@ -12972,7 +13156,10 @@ vect_analyze_stmt (vec_info *vinfo,
> > > >  					 NULL, NULL, node, cost_vec)
> > > >  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
> > > >  					  cost_vec)
> > > > -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> > > > +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> > > > +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > > > +					  cost_vec));
> > > > +
> > > >      }
> > > >
> > > >    if (node)
> > > > @@ -13131,6 +13318,12 @@ vect_transform_stmt (vec_info *vinfo,
> > > >        gcc_assert (done);
> > > >        break;
> > > >
> > > > +    case loop_exit_ctrl_vec_info_type:
> > > > +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> > > > +				      slp_node, NULL);
> > > > +      gcc_assert (done);
> > > > +      break;
> > > > +
> > > >      default:
> > > >        if (!STMT_VINFO_LIVE_P (stmt_info))
> > > >  	{
> > > > @@ -14321,10 +14514,19 @@ vect_get_vector_types_for_stmt (vec_info
> > > *vinfo, stmt_vec_info stmt_info,
> > > >      }
> > > >    else
> > > >      {
> > > > +      gcond *cond = NULL;
> > > >        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
> > > >  	scalar_type = TREE_TYPE (DR_REF (dr));
> > > >        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> > > >  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> > > > +      else if ((cond = dyn_cast <gcond *> (stmt)))
> > > > +	{
> > > > +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> > > > +	     single bit precision and we need the vector boolean to be a
> > > > +	     representation of the integer mask.  So set the correct integer type and
> > > > +	     convert to boolean vector once we have a vectype.  */
> > > > +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
> > > > +	}
> > > >        else
> > > >  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> > > >
> > > > @@ -14339,12 +14541,18 @@ vect_get_vector_types_for_stmt (vec_info
> > > *vinfo, stmt_vec_info stmt_info,
> > > >  			     "get vectype for scalar type: %T\n", scalar_type);
> > > >  	}
> > > >        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> > > > +
> > > >        if (!vectype)
> > > >  	return opt_result::failure_at (stmt,
> > > >  				       "not vectorized:"
> > > >  				       " unsupported data-type %T\n",
> > > >  				       scalar_type);
> > > >
> > > > +      /* If we were a gcond, convert the resulting type to a vector boolean type
> > > now
> > > > +	 that we have the correct integer mask type.  */
> > > > +      if (cond)
> > > > +	vectype = truth_type_for (vectype);
> > > > +
> > > >        if (dump_enabled_p ())
> > > >  	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
> > > >      }
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-08 13:45               ` Tamar Christina
@ 2023-12-08 13:59                 ` Richard Biener
  2023-12-08 15:01                   ` Tamar Christina
  2023-12-11  7:09                   ` Tamar Christina
  0 siblings, 2 replies; 24+ messages in thread
From: Richard Biener @ 2023-12-08 13:59 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Fri, 8 Dec 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Friday, December 8, 2023 10:28 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> > codegen of exit code
> > 
> > On Fri, 8 Dec 2023, Tamar Christina wrote:
> > 
> > > > --param vect-partial-vector-usage=2 would, no?
> > > >
> > > I.. didn't even know it went to 2!
> > >
> > > > > In principle I suppose I could mask the individual stmts, that should
> > > > > handle the future case when this is relaxed to support non-fixed
> > > > > length buffers?
> > > >
> > > > Well, it looks wrong - either put in an assert that we start with a
> > > > single stmt or assert !masked_loop_p instead?  Better ICE than
> > > > generate wrong code.
> > > >
> > > > That said, I think you need to apply the masking on the original
> > > > stmts[], before reducing them, no?
> > >
> > > Yeah, I've done so now.  For simplicity I've just kept the final masking always as
> > well
> > > and just leave it up to the optimizers to drop it when it's superfluous.
> > >
> > > Simple testcase:
> > >
> > > #ifndef N
> > > #define N 837
> > > #endif
> > > float vect_a[N];
> > > unsigned vect_b[N];
> > >
> > > unsigned test4(double x)
> > > {
> > >  unsigned ret = 0;
> > >  for (int i = 0; i < N; i++)
> > >  {
> > >    if (vect_a[i] > x)
> > >      break;
> > >    vect_a[i] = x;
> > >
> > >  }
> > >  return ret;
> > > }
> > >
> > > Looks good now. After this one there's only one patch left, the dependency
> > analysis.
> > > I'm almost done with the cleanup/respin, but want to take the weekend to
> > double check and will post it first thing Monday morning.
> > >
> > > Did you want to see the testsuite changes as well again? I've basically just added
> > the right dg-requires-effective and add-options etc.
> > 
> > Yes please.
> > 
> > > Thanks for all the reviews!
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> > > 	(check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
> > > 	vect_recog_bool_pattern, sort_after_uid): Support gconds type analysis.
> > > 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> > > 	lhs.
> > > 	(vectorizable_early_exit): New.
> > > 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> > > 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> > >
> > >
> > > --- inline copy of patch ---
> > >
> > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > index
> > 7debe7f0731673cd1bf25cd39d55e23990a73d0e..8865cde9f3481a474d31848
> > ae12523576d29744d 100644
> > > --- a/gcc/tree-vect-patterns.cc
> > > +++ b/gcc/tree-vect-patterns.cc
> > > @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple
> > *pattern_stmt,
> > >    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
> > >      {
> > >        gcc_assert (!vectype
> > > +		  || is_a <gcond *> (pattern_stmt)
> > >  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
> > >  		      == vect_use_mask_type_p (orig_stmt_info)));
> > >        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> > > @@ -5210,19 +5211,28 @@ vect_recog_mixed_size_cond_pattern (vec_info
> > *vinfo,
> > >     true if bool VAR can and should be optimized that way.  Assume it shouldn't
> > >     in case it's a result of a comparison which can be directly vectorized into
> > >     a vector comparison.  Fills in STMTS with all stmts visited during the
> > > -   walk.  */
> > > +   walk.  if ANALYZE_ONLY then only analyze the booleans but do not perform
> > any
> > > +   codegen associated with the boolean condition.  */
> > >
> > >  static bool
> > > -check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> > > +check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
> > > +		    bool analyze_only)
> > >  {
> > >    tree rhs1;
> > >    enum tree_code rhs_code;
> > > +  gassign *def_stmt = NULL;
> > >
> > >    stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
> > > -  if (!def_stmt_info)
> > > +  if (!def_stmt_info && !analyze_only)
> > >      return false;
> > > +  else if (!def_stmt_info)
> > > +    /* If we're a only analyzing we won't be codegen-ing the statements and are
> > > +       only after if the types match.  In that case we can accept loop invariant
> > > +       values.  */
> > > +    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
> > > +  else
> > > +    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
> > >
> > 
> > Hmm, but we're visiting them then?  I wonder how you get along
> > without doing adjustmens on the uses if you consider
> > 
> >     _1 = a < b;
> >     _2 = c != d;
> >     _3 = _1 | _2;
> >     if (_3 != 0)
> >       exit loop;
> > 
> > thus a combined condition like
> > 
> >     if (a < b || c != d)
> > 
> > that we if-converted.  We need to recognize that _1, _2 and _3 have
> > mask uses and thus possibly adjust them.
> > 
> > What bad happens if you drop 'analyze_only'?  We're not really
> > rewriting anything there.
> 
> You mean drop it only in the above? We then fail to update the type for
> the gcond.  So in certain circumstances like with
> 
> int a, c, d;
> short b;
> 
> int
> main ()
> {
>   int e[1];
>   for (; b < 2; b++)
>     {
>       a = 0;
>       if (b == 28378)
>         a = e[b];
>       if (!(d || b))
>         for (; c;)
>           ;
>     }
>   return 0;
> }
> 
> Unless we walk the statements regardless of whether they come from inside the loop or not.

What do you mean by "fail to update the type for the gcond"?  If
I understood correctly, 'analyze_only' short-cuts some
checks, it doesn't add any?

But it's hard to follow what's actually done for a gcond ...
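[Editor's illustration of what the thread converges on for a gcond, in GIMPLE-like pseudocode; the SSA names are made up.  The compare embedded in the gcond is split out into a mask-producing pattern statement, leaving the gcond itself as a degenerate != 0 test of the mask, which is what later gets vectorized as a cbranch.]

```
  # before pattern recognition
  if (a_1 > b_2)
    goto exit;

  # after pattern recognition (sketch)
  mask__patt_3 = a_1 > b_2;
  if (mask__patt_3 != 0)
    goto exit;
```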

> > 
> > > -  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
> > >    if (!def_stmt)
> > >      return false;
> > >
> > > @@ -5234,27 +5244,28 @@ check_bool_pattern (tree var, vec_info *vinfo,
> > hash_set<gimple *> &stmts)
> > >    switch (rhs_code)
> > >      {
> > >      case SSA_NAME:
> > > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> > >  	return false;
> > >        break;
> > >
> > >      CASE_CONVERT:
> > >        if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
> > >  	return false;
> > > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> > >  	return false;
> > >        break;
> > >
> > >      case BIT_NOT_EXPR:
> > > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> > >  	return false;
> > >        break;
> > >
> > >      case BIT_AND_EXPR:
> > >      case BIT_IOR_EXPR:
> > >      case BIT_XOR_EXPR:
> > > -      if (! check_bool_pattern (rhs1, vinfo, stmts)
> > > -	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
> > > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only)
> > > +	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
> > > +				   analyze_only))
> > >  	return false;
> > >        break;
> > >
> > > @@ -5275,6 +5286,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> > >  	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
> > >  							  TREE_TYPE (rhs1));
> > >  	  if (mask_type
> > > +	      && !analyze_only
> > >  	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
> > >  	    return false;
> > >
> > > @@ -5289,7 +5301,8 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> > >  	    }
> > >  	  else
> > >  	    vecitype = comp_vectype;
> > > -	  if (! expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
> > > +	  if (!analyze_only
> > > +	      && !expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
> > >  	    return false;
> > >  	}
> > >        else
> > > @@ -5324,11 +5337,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
> > >     VAR is an SSA_NAME that should be transformed from bool to a wider integer
> > >     type, OUT_TYPE is the desired final integer type of the whole pattern.
> > >     STMT_INFO is the info of the pattern root and is where pattern stmts should
> > > -   be associated with.  DEFS is a map of pattern defs.  */
> > > +   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
> > > +   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
> > >
> > >  static void
> > >  adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> > > -		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
> > > +		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
> > > +		     gimple *&last_stmt, bool type_only)
> > >  {
> > >    gimple *stmt = SSA_NAME_DEF_STMT (var);
> > >    enum tree_code rhs_code, def_rhs_code;
> > > @@ -5492,28 +5507,38 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> > >      }
> > >
> > >    gimple_set_location (pattern_stmt, loc);
> > > -  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> > > -			  get_vectype_for_scalar_type (vinfo, itype));
> > > +  if (!type_only)
> > > +    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> > > +			    get_vectype_for_scalar_type (vinfo, itype));
> > > +  last_stmt = pattern_stmt;
> > >    defs.put (var, gimple_assign_lhs (pattern_stmt));
> > >  }
> > >
> > > -/* Comparison function to qsort a vector of gimple stmts after UID.  */
> > > +/* Comparison function to qsort a vector of gimple stmts after BB and UID.
> > > +   The def of one statement can be in an earlier block than the use, so if
> > > +   the BBs are different, compare by BB first.  */
> > >
> > >  static int
> > >  sort_after_uid (const void *p1, const void *p2)
> > >  {
> > >    const gimple *stmt1 = *(const gimple * const *)p1;
> > >    const gimple *stmt2 = *(const gimple * const *)p2;
> > > +  if (gimple_bb (stmt1)->index != gimple_bb (stmt2)->index)
> > > +    return gimple_bb (stmt1)->index - gimple_bb (stmt2)->index;
> > > +
> > 
> > is this because you eventually get out-of-loop stmts (without UID)?
> > 
> 
> No the problem I was having is that with an early exit the statement of
> one branch of the compare can be in a different BB than the other.
> 
> The testcase specifically was this:
> 
> int a, c, d;
> short b;
> 
> int
> main ()
> {
>   int e[1];
>   for (; b < 2; b++)
>     {
>       a = 0;
>       if (b == 28378)
>         a = e[b];
>       if (!(d || b))
>         for (; c;)
>           ;
>     }
>   return 0;
> }
> 
> Without debug info it happened to work:
> 
> >>> p gimple_uid (bool_stmts[0])
> $1 = 3
> >>> p gimple_uid (bool_stmts[1])
> $2 = 3
> >>> p gimple_uid (bool_stmts[2])
> $3 = 4
> 
> The first two statements got the same uid, but are in different BB in the loop.
> When we add debug, it looks like 1 bb got more debug state than the other:
> 
> >>> p gimple_uid (bool_stmts[0])
> $1 = 3
> >>> p gimple_uid (bool_stmts[1])
> $2 = 4
> >>> p gimple_uid (bool_stmts[2])
> $3 = 6
> 
> That last statement, which now has a UID of 6 used to be 3.

?  gimple_uid is used to map to stmt_vec_info and initially all UIDs
are zero.  It should never happen that two stmts belonging to the
same analyzed loop have the same UID.  In particular debug stmts
never get stmt_vec_info and thus no UID.

If you run into stmts not within the loop or that have no stmt_info
then all bets are off and you can't use UID at all.

As said, I didn't get why you look at those.
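For what it's worth, the ordering the patch's comparator aims for can be modeled in Python (statements reduced to hypothetical (bb_index, uid) pairs; illustrative only, not GCC code):

```python
def sort_after_bb_and_uid(stmts):
    """Order statements primarily by basic-block index, so a def in an
    earlier block sorts before a use in a later one even when UIDs
    collide, and only then by UID."""
    return sorted(stmts, key=lambda s: (s[0], s[1]))
```
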

> > >    return gimple_uid (stmt1) - gimple_uid (stmt2);
> > >  }
> > >
> > >  /* Create pattern stmts for all stmts participating in the bool pattern
> > >     specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
> > > -   OUT_TYPE.  Return the def of the pattern root.  */
> > > +   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY the new
> > > +   statements are not emitted as pattern statements and the tree returned is
> > > +   only useful for type queries.  */
> > >
> > >  static tree
> > >  adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> > > -		   tree out_type, stmt_vec_info stmt_info)
> > > +		   tree out_type, stmt_vec_info stmt_info,
> > > +		   bool type_only = false)
> > >  {
> > >    /* Gather original stmts in the bool pattern in their order of appearance
> > >       in the IL.  */
> > > @@ -5523,16 +5548,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> > >      bool_stmts.quick_push (*i);
> > >    bool_stmts.qsort (sort_after_uid);
> > >
> > > +  gimple *last_stmt = NULL;
> > > +
> > >    /* Now process them in that order, producing pattern stmts.  */
> > >    hash_map <tree, tree> defs;
> > > -  for (unsigned i = 0; i < bool_stmts.length (); ++i)
> > > -    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
> > > -			 out_type, stmt_info, defs);
> > > +  for (auto bool_stmt : bool_stmts)
> > > +    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmt),
> > > +			 out_type, stmt_info, defs, last_stmt, type_only);
> > >
> > >    /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
> > > -  gimple *pattern_stmt
> > > -    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
> > > -  return gimple_assign_lhs (pattern_stmt);
> > > +  return gimple_assign_lhs (last_stmt);
> > >  }
> > >
> > >  /* Return the proper type for converting bool VAR into
> > > @@ -5608,13 +5633,27 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > >    enum tree_code rhs_code;
> > >    tree var, lhs, rhs, vectype;
> > >    gimple *pattern_stmt;
> > > -
> > > -  if (!is_gimple_assign (last_stmt))
> > > +  gcond* cond = NULL;
> > > +  if (!is_gimple_assign (last_stmt)
> > > +      && !(cond = dyn_cast <gcond *> (last_stmt)))
> > >      return NULL;
> > >
> > > -  var = gimple_assign_rhs1 (last_stmt);
> > > -  lhs = gimple_assign_lhs (last_stmt);
> > > -  rhs_code = gimple_assign_rhs_code (last_stmt);
> > > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > > +  if (is_gimple_assign (last_stmt))
> > > +    {
> > > +      var = gimple_assign_rhs1 (last_stmt);
> > > +      lhs = gimple_assign_lhs (last_stmt);
> > > +      rhs_code = gimple_assign_rhs_code (last_stmt);
> > > +    }
> > > +  else if (loop_vinfo && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +    {
> > > +      /* Only analyze the gcond for loop vectorization with multiple
> > > +	 exits; we don't support SLP today.  */
> > > +      lhs = var = gimple_cond_lhs (last_stmt);
> > > +      rhs_code = gimple_cond_code (last_stmt);
> > > +    }
> > > +  else
> > > +    return NULL;
> > >
> > >    if (rhs_code == VIEW_CONVERT_EXPR)
> > >      var = TREE_OPERAND (var, 0);
> > > @@ -5632,7 +5671,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > >  	return NULL;
> > >        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> > >
> > > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > > +      if (check_bool_pattern (var, vinfo, bool_stmts, false))
> > >  	{
> > >  	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
> > >  				   TREE_TYPE (lhs), stmt_vinfo);
> > > @@ -5680,7 +5719,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > >
> > >        return pattern_stmt;
> > >      }
> > > -  else if (rhs_code == COND_EXPR
> > > +  else if ((rhs_code == COND_EXPR || cond)
> > >  	   && TREE_CODE (var) == SSA_NAME)
> > >      {
> > >        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> > > @@ -5700,18 +5739,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > >        if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
> > >  	return NULL;
> > >
> > > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > > -	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
> > > +      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
> > > +	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo, cond);
> > >        else if (integer_type_for_mask (var, vinfo))
> > >  	return NULL;
> > >
> > > -      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> > > -      pattern_stmt
> > > -	= gimple_build_assign (lhs, COND_EXPR,
> > > -			       build2 (NE_EXPR, boolean_type_node,
> > > -				       var, build_int_cst (TREE_TYPE (var), 0)),
> > > -			       gimple_assign_rhs2 (last_stmt),
> > > -			       gimple_assign_rhs3 (last_stmt));
> > > +      if (!cond)
> > > +	{
> > > +	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> > > +	  pattern_stmt
> > > +	    = gimple_build_assign (lhs, COND_EXPR,
> > > +				   build2 (NE_EXPR, boolean_type_node, var,
> > > +					   build_int_cst (TREE_TYPE (var), 0)),
> > > +				   gimple_assign_rhs2 (last_stmt),
> > > +				   gimple_assign_rhs3 (last_stmt));
> > > +	}
> > > +      else
> > > +	{
> > > +	  pattern_stmt
> > > +	    = gimple_build_cond (gimple_cond_code (cond),
> > > +				 gimple_cond_lhs (cond), gimple_cond_rhs (cond),
> > > +				 gimple_cond_true_label (cond),
> > > +				 gimple_cond_false_label (cond));
> > > +	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
> > > +	  vectype = truth_type_for (vectype);
> > > +	}
> > >        *type_out = vectype;
> > >        vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
> > >
> > 
> > So this is also quite odd.  You're hooking into COND_EXPR handling
> > but only look at the LHS of the GIMPLE_COND compare.
> > 
> 
> Hmm, not sure I follow, GIMPLE_CONDs don't have an LHS, no?  We look at the LHS
> for the COND_EXPR, but for a GCOND we just recreate the statement and set vectype
> based on the updated var. I guess this is related to:

a GIMPLE_COND has "lhs" and "rhs", the two operands of the embedded
compare.  You seem to look at only "lhs" for analyzing bool patterns.

> > that we if-converted.  We need to recognize that _1, _2 and _3 have
> > mask uses and thus possibly adjust them.
> 
> Which I did think about somewhat, so what you're saying is that I need to create
> a new GIMPLE_COND here with an NE to 0 compare against var like the COND_EXPR
> case?

Well, it depends how you wire everything up.  But since we later want
a mask def and vectorize the GIMPLE_COND as cbranch it seemed to me
it's easiest to pattern

  if (a > b)

as

  mask.patt = a > b;
  if (mask.patt != 0)

I thought you were doing this.  And yes, COND_EXPRs are now
effectively doing that since we no longer embed a comparison
in the first operand (only the pattern recognizer still does that
as I was lazy).
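As a rough, language-neutral illustration of this rewrite (a Python sketch, not GCC internals; the function name is made up), the vectorized early exit materializes the lane-wise mask and then branches on it being non-zero:

```python
def early_exit_taken(a, b):
    """Model of rewriting `if (a > b)` into `mask.patt = a > b;
    if (mask.patt != 0)` for one vector chunk."""
    mask = [x > y for x, y in zip(a, b)]  # mask.patt = a > b
    return any(mask)                      # cbranch on mask.patt != 0
```

Here the `any` test plays the role of the flag-setting vector comparison that the cbranch optab check guards.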

> 
> > Please refactor the changes to separate the GIMPLE_COND path
> > completely.
> > 
> 
> Ok, then it seems better to make two patterns?

Maybe.

> > Is there test coverage for such "complex" condition?  I think
> > you'll need adjustments to vect_recog_mask_conversion_pattern
> > as well similar as to how COND_EXPR is handled there.
> 
> Yes, the existing testsuite has many cases which fail, including gcc/testsuite/gcc.c-torture/execute/20150611-1.c

Fail in which way?  Fail to vectorize because we don't handle such
condition?

Thanks,
Richard.

> Cheers,
> Tamar
> 
> > 
> > > @@ -5725,7 +5777,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > >        if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
> > >  	return NULL;
> > >
> > > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > > +      if (check_bool_pattern (var, vinfo, bool_stmts, false))
> > >  	rhs = adjust_bool_stmts (vinfo, bool_stmts,
> > >  				 TREE_TYPE (vectype), stmt_vinfo);
> > >        else
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index 582c5e678fad802d6e76300fe3c939b9f2978f17..e9116d184149826ba436b0f562721c140d586c94 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> > >    vec<tree> vec_oprnds0 = vNULL;
> > >    vec<tree> vec_oprnds1 = vNULL;
> > >    tree mask_type;
> > > -  tree mask;
> > > +  tree mask = NULL_TREE;
> > >
> > >    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> > >      return false;
> > > @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> > >    /* Transform.  */
> > >
> > >    /* Handle def.  */
> > > -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> > > -  mask = vect_create_destination_var (lhs, mask_type);
> > > +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> > > +  if (lhs)
> > > +    mask = vect_create_destination_var (lhs, mask_type);
> > >
> > >    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
> > >  		     rhs1, &vec_oprnds0, vectype,
> > > @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> > >        gimple *new_stmt;
> > >        vec_rhs2 = vec_oprnds1[i];
> > >
> > > -      new_temp = make_ssa_name (mask);
> > > +      if (lhs)
> > > +	new_temp = make_ssa_name (mask);
> > > +      else
> > > +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
> > >        if (bitop1 == NOP_EXPR)
> > >  	{
> > >  	  new_stmt = gimple_build_assign (new_temp, code,
> > > @@ -12723,6 +12727,184 @@ vectorizable_comparison (vec_info *vinfo,
> > >    return true;
> > >  }
> > >
> > > +/* Check to see if the current early break given in STMT_INFO is valid for
> > > +   vectorization.  */
> > > +
> > > +static bool
> > > +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> > > +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> > > +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> > > +{
> > > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > > +  if (!loop_vinfo
> > > +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> > > +    return false;
> > > +
> > > +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> > > +    return false;
> > > +
> > > +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> > > +    return false;
> > > +
> > > +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> > > +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> > > +  gcc_assert (vectype);
> > > +
> > > +  tree vectype_op0 = NULL_TREE;
> > > +  slp_tree slp_op0;
> > > +  tree op0;
> > > +  enum vect_def_type dt0;
> > > +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> > > +			   &vectype_op0))
> > > +    {
> > > +      if (dump_enabled_p ())
> > > +	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +			   "use not simple.\n");
> > > +	return false;
> > > +    }
> > 
> > I think you rely on patterns transforming this into canonical form
> > mask != 0, so I suggest to check this here.
> > 
> > > +  machine_mode mode = TYPE_MODE (vectype);
> > > +  int ncopies;
> > > +
> > > +  if (slp_node)
> > > +    ncopies = 1;
> > > +  else
> > > +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> > > +
> > > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > > +
> > > +  /* Analyze only.  */
> > > +  if (!vec_stmt)
> > > +    {
> > > +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > > +	{
> > > +	  if (dump_enabled_p ())
> > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +			       "can't vectorize early exit because the "
> > > +			       "target doesn't support flag setting vector "
> > > +			       "comparisons.\n");
> > > +	  return false;
> > > +	}
> > > +
> > > +      if (ncopies > 1
> > 
> > Also required for vec_num > 1 with SLP
> > (SLP_TREE_NUMBER_OF_VEC_STMTS)
> > 
> > > +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > > +	{
> > > +	  if (dump_enabled_p ())
> > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +			       "can't vectorize early exit because the "
> > > +			       "target does not support boolean vector OR for "
> > > +			       "type %T.\n", vectype);
> > > +	  return false;
> > > +	}
> > > +
> > > +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > > +				      vec_stmt, slp_node, cost_vec))
> > > +	return false;
> > > +
> > > +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> > > +	{
> > > +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> > > +					      OPTIMIZE_FOR_SPEED))
> > > +	    return false;
> > > +	  else
> > > +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> > > +	}
> > > +
> > > +
> > > +      return true;
> > > +    }
> > > +
> > > +  /* Transform.  */
> > > +
> > > +  tree new_temp = NULL_TREE;
> > > +  gimple *new_stmt = NULL;
> > > +
> > > +  if (dump_enabled_p ())
> > > +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> > > +
> > > +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > > +				  vec_stmt, slp_node, cost_vec))
> > > +    gcc_unreachable ();
> > > +
> > > +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> > > +  basic_block cond_bb = gimple_bb (stmt);
> > > +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> > > +
> > > +  auto_vec<tree> stmts;
> > > +
> > > +  tree mask = NULL_TREE;
> > > +  if (masked_loop_p)
> > > +    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> > > +
> > > +  if (slp_node)
> > > +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> > > +  else
> > > +    {
> > > +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> > > +      stmts.reserve_exact (vec_stmts.length ());
> > > +      for (auto stmt : vec_stmts)
> > > +	stmts.quick_push (gimple_assign_lhs (stmt));
> > > +    }
> > > +
> > > +  /* Determine if we need to reduce the final value.  */
> > > +  if (stmts.length () > 1)
> > > +    {
> > > +      /* We build the reductions in a way to maintain as much parallelism as
> > > +	 possible.  */
> > > +      auto_vec<tree> workset (stmts.length ());
> > > +
> > > +      /* Mask the statements as we queue them up.  */
> > > +      if (masked_loop_p)
> > > +	for (auto stmt : stmts)
> > > +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> > > +						mask, stmt, &cond_gsi));
> > 
> > I think this still uses the wrong mask, you need to use
> > 
> >   vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, <cnt>)
> > 
> > replacing <cnt> with the vector def index to mask I think.  For this
> > reason keeping the "final" mask below is also wrong.
> > 
> > Or am I missing something?
> > 
> > > +      else
> > > +	workset.splice (stmts);
> > > +
> > > +      while (workset.length () > 1)
> > > +	{
> > > +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > > +	  tree arg0 = workset.pop ();
> > > +	  tree arg1 = workset.pop ();
> > > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > +				       &cond_gsi);
> > > +	  workset.quick_insert (0, new_temp);
> > > +	}
> > > +    }
> > > +  else
> > > +    new_temp = stmts[0];
> > > +
> > > +  gcc_assert (new_temp);
> > > +
> > > +  tree cond = new_temp;
> > > +  /* If we have multiple statements after reduction we should check all the
> > > +     lanes and treat it as a full vector.  */
> > > +  if (masked_loop_p)
> > > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > > +			     &cond_gsi);
> > 
> > so just do this in the else path above
> > 
> > Otherwise looks OK.
> > 
> > Richard.
> > 
> > > +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> > > +     codegen so we must replace the original insn.  */
> > > +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> > > +  gcond *cond_stmt = as_a <gcond *>(stmt);
> > > +  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
> > > +			     build_zero_cst (vectype));
> > > +  update_stmt (stmt);
> > > +
> > > +  if (slp_node)
> > > +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> > > +   else
> > > +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> > > +
> > > +
> > > +  if (!slp_node)
> > > +    *vec_stmt = stmt;
> > > +
> > > +  return true;
> > > +}
> > > +
> > >  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
> > >     can handle all live statements in the node.  Otherwise return true
> > >     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> > > @@ -12949,7 +13131,9 @@ vect_analyze_stmt (vec_info *vinfo,
> > >  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
> > >  				  stmt_info, NULL, node)
> > >  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> > > -				   stmt_info, NULL, node, cost_vec));
> > > +				   stmt_info, NULL, node, cost_vec)
> > > +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > > +				      cost_vec));
> > >    else
> > >      {
> > >        if (bb_vinfo)
> > > @@ -12972,7 +13156,10 @@ vect_analyze_stmt (vec_info *vinfo,
> > >  					 NULL, NULL, node, cost_vec)
> > >  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
> > >  					  cost_vec)
> > > -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> > > +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> > > +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > > +					  cost_vec));
> > > +
> > >      }
> > >
> > >    if (node)
> > > @@ -13131,6 +13318,12 @@ vect_transform_stmt (vec_info *vinfo,
> > >        gcc_assert (done);
> > >        break;
> > >
> > > +    case loop_exit_ctrl_vec_info_type:
> > > +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> > > +				      slp_node, NULL);
> > > +      gcc_assert (done);
> > > +      break;
> > > +
> > >      default:
> > >        if (!STMT_VINFO_LIVE_P (stmt_info))
> > >  	{
> > > @@ -14321,10 +14514,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> > >      }
> > >    else
> > >      {
> > > +      gcond *cond = NULL;
> > >        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
> > >  	scalar_type = TREE_TYPE (DR_REF (dr));
> > >        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> > >  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> > > +      else if ((cond = dyn_cast <gcond *> (stmt)))
> > > +	{
> > > +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> > > +	     single bit precision and we need the vector boolean to be a
> > > +	     representation of the integer mask.  So set the correct integer type and
> > > +	     convert to boolean vector once we have a vectype.  */
> > > +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
> > > +	}
> > >        else
> > >  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> > >
> > > @@ -14339,12 +14541,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> > >  			     "get vectype for scalar type: %T\n", scalar_type);
> > >  	}
> > >        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> > > +
> > >        if (!vectype)
> > >  	return opt_result::failure_at (stmt,
> > >  				       "not vectorized:"
> > >  				       " unsupported data-type %T\n",
> > >  				       scalar_type);
> > >
> > > +      /* If we were a gcond, convert the resulting type to a vector boolean
> > > +	 type now that we have the correct integer mask type.  */
> > > +      if (cond)
> > > +	vectype = truth_type_for (vectype);
> > > +
> > >        if (dump_enabled_p ())
> > >  	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
> > >      }
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-08 10:28             ` Richard Biener
@ 2023-12-08 13:45               ` Tamar Christina
  2023-12-08 13:59                 ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: Tamar Christina @ 2023-12-08 13:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, December 8, 2023 10:28 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> codegen of exit code
> 
> On Fri, 8 Dec 2023, Tamar Christina wrote:
> 
> > > --param vect-partial-vector-usage=2 would, no?
> > >
> > I.. didn't even know it went to 2!
> >
> > > > In principle I suppose I could mask the individual stmts; that should
> > > > handle the future case when this is relaxed to support non-fixed-length
> > > > buffers?
> > >
> > > Well, it looks wrong - either put in an assert that we start with a
> > > single stmt or assert !masked_loop_p instead?  Better ICE than
> > > generate wrong code.
> > >
> > > That said, I think you need to apply the masking on the original
> > > stmts[], before reducing them, no?
> >
> > Yeah, I've done so now.  For simplicity I've just kept the final masking
> > always as well and just leave it up to the optimizers to drop it when it's
> > superfluous.
> >
> > Simple testcase:
> >
> > #ifndef N
> > #define N 837
> > #endif
> > float vect_a[N];
> > unsigned vect_b[N];
> >
> > unsigned test4(double x)
> > {
> >  unsigned ret = 0;
> >  for (int i = 0; i < N; i++)
> >  {
> >    if (vect_a[i] > x)
> >      break;
> >    vect_a[i] = x;
> >
> >  }
> >  return ret;
> > }
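A hedged Python model (not GCC code; names are made up) of the masking order being applied here: each vector compare result is ANDed with its per-copy loop mask before the copies are OR-reduced into the branch condition:

```python
def build_exit_cond(cmp_results, loop_masks):
    """Mask each compare result with its loop mask first, then
    OR-reduce across copies and branch if any lane remains set.
    (GCC builds the reduction pairwise; a linear fold is used here.)"""
    masked = [[c & m for c, m in zip(cm, lm)]
              for cm, lm in zip(cmp_results, loop_masks)]
    acc = [0] * len(masked[0])
    for vec in masked:
        acc = [a | b for a, b in zip(acc, vec)]
    return any(acc)
```
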
> >
> > Looks good now. After this one there's only one patch left, the dependency
> > analysis.
> > I'm almost done with the cleanup/respin, but want to take the weekend to
> > double check and will post it first thing Monday morning.
> >
> > Did you want to see the testsuite changes as well again? I've basically just
> > added the right dg-requires-effective and add-options etc.
> 
> Yes please.
> 
> > Thanks for all the reviews!
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> > 	(check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
> > 	vect_recog_bool_pattern, sort_after_uid): Support gconds type analysis.
> > 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> > 	lhs.
> > 	(vectorizable_early_exit): New.
> > 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> > 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> >
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..8865cde9f3481a474d31848ae12523576d29744d 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
> >    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
> >      {
> >        gcc_assert (!vectype
> > +		  || is_a <gcond *> (pattern_stmt)
> >  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
> >  		      == vect_use_mask_type_p (orig_stmt_info)));
> >        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> > @@ -5210,19 +5211,28 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
> >     true if bool VAR can and should be optimized that way.  Assume it shouldn't
> >     in case it's a result of a comparison which can be directly vectorized into
> >     a vector comparison.  Fills in STMTS with all stmts visited during the
> > -   walk.  */
> > +   walk.  if ANALYZE_ONLY then only analyze the booleans but do not perform
> any
> > +   codegen associated with the boolean condition.  */
> >
> >  static bool
> > -check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> > +check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
> > +		    bool analyze_only)
> >  {
> >    tree rhs1;
> >    enum tree_code rhs_code;
> > +  gassign *def_stmt = NULL;
> >
> >    stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
> > -  if (!def_stmt_info)
> > +  if (!def_stmt_info && !analyze_only)
> >      return false;
> > +  else if (!def_stmt_info)
> > +    /* If we're only analyzing we won't be codegen-ing the statements and
> > +       only care whether the types match.  In that case we can accept loop
> > +       invariant values.  */
> > +    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
> > +  else
> > +    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
> >
> 
> Hmm, but we're visiting them then?  I wonder how you get along
> without doing adjustments on the uses if you consider
> 
>     _1 = a < b;
>     _2 = c != d;
>     _3 = _1 | _2;
>     if (_3 != 0)
>       exit loop;
> 
> thus a combined condition like
> 
>     if (a < b || c != d)
> 
> that we if-converted.  We need to recognize that _1, _2 and _3 have
> mask uses and thus possibly adjust them.
> 
> What bad happens if you drop 'analyze_only'?  We're not really
> rewriting anything there.

You mean drop it only in the above? We then fail to update the type for
the gcond.  So in certain circumstances like with

int a, c, d;
short b;

int
main ()
{
  int e[1];
  for (; b < 2; b++)
    {
      a = 0;
      if (b == 28378)
        a = e[b];
      if (!(d || b))
        for (; c;)
          ;
    }
  return 0;
}

Unless we walk the statements regardless of whether they come from inside the loop or not.

> 
> > -  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
> >    if (!def_stmt)
> >      return false;
> >
> > @@ -5234,27 +5244,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> >    switch (rhs_code)
> >      {
> >      case SSA_NAME:
> > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> >  	return false;
> >        break;
> >
> >      CASE_CONVERT:
> >        if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
> >  	return false;
> > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> >  	return false;
> >        break;
> >
> >      case BIT_NOT_EXPR:
> > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> >  	return false;
> >        break;
> >
> >      case BIT_AND_EXPR:
> >      case BIT_IOR_EXPR:
> >      case BIT_XOR_EXPR:
> > -      if (! check_bool_pattern (rhs1, vinfo, stmts)
> > -	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
> > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only)
> > +	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
> > +				   analyze_only))
> >  	return false;
> >        break;
> >
> > @@ -5275,6 +5286,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> >  	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
> >  							  TREE_TYPE (rhs1));
> >  	  if (mask_type
> > +	      && !analyze_only
> >  	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
> >  	    return false;
> >
> > @@ -5289,7 +5301,8 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> >  	    }
> >  	  else
> >  	    vecitype = comp_vectype;
> > -	  if (! expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
> > +	  if (!analyze_only
> > +	      && !expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
> >  	    return false;
> >  	}
> >        else
> > @@ -5324,11 +5337,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
> >     VAR is an SSA_NAME that should be transformed from bool to a wider integer
> >     type, OUT_TYPE is the desired final integer type of the whole pattern.
> >     STMT_INFO is the info of the pattern root and is where pattern stmts should
> > -   be associated with.  DEFS is a map of pattern defs.  */
> > +   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
> > +   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
> >
> >  static void
> >  adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> > -		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
> > +		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
> > +		     gimple *&last_stmt, bool type_only)
> >  {
> >    gimple *stmt = SSA_NAME_DEF_STMT (var);
> >    enum tree_code rhs_code, def_rhs_code;
> > @@ -5492,28 +5507,38 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> >      }
> >
> >    gimple_set_location (pattern_stmt, loc);
> > -  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> > -			  get_vectype_for_scalar_type (vinfo, itype));
> > +  if (!type_only)
> > +    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> > +			    get_vectype_for_scalar_type (vinfo, itype));
> > +  last_stmt = pattern_stmt;
> >    defs.put (var, gimple_assign_lhs (pattern_stmt));
> >  }
> >
> > -/* Comparison function to qsort a vector of gimple stmts after UID.  */
> > +/* Comparison function to qsort a vector of gimple stmts after BB and UID.
> > +   The def of one statement can be in an earlier block than the use, so if
> > +   the BBs differ, compare by BB first.  */
> >
> >  static int
> >  sort_after_uid (const void *p1, const void *p2)
> >  {
> >    const gimple *stmt1 = *(const gimple * const *)p1;
> >    const gimple *stmt2 = *(const gimple * const *)p2;
> > +  if (gimple_bb (stmt1)->index != gimple_bb (stmt2)->index)
> > +    return gimple_bb (stmt1)->index - gimple_bb (stmt2)->index;
> > +
> 
> is this because you eventually get out-of-loop stmts (without UID)?
> 

No, the problem I was having is that with an early exit the statement of
one branch of the compare can be in a different BB than the other.

The testcase specifically was this:

int a, c, d;
short b;

int
main ()
{
  int e[1];
  for (; b < 2; b++)
    {
      a = 0;
      if (b == 28378)
        a = e[b];
      if (!(d || b))
        for (; c;)
          ;
    }
  return 0;
}

Without debug info it happened to work:

>>> p gimple_uid (bool_stmts[0])
$1 = 3
>>> p gimple_uid (bool_stmts[1])
$2 = 3
>>> p gimple_uid (bool_stmts[2])
$3 = 4

The first two statements got the same UID but are in different BBs in the loop.
When we add debug info, it looks like one BB got more debug statements than the
other:

>>> p gimple_uid (bool_stmts[0])
$1 = 3
>>> p gimple_uid (bool_stmts[1])
$2 = 4
>>> p gimple_uid (bool_stmts[2])
$3 = 6

That last statement, which now has a UID of 6, used to have UID 3.

> >    return gimple_uid (stmt1) - gimple_uid (stmt2);
> >  }
> >
> >  /* Create pattern stmts for all stmts participating in the bool pattern
> >     specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
> > -   OUT_TYPE.  Return the def of the pattern root.  */
> > +   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY the new
> > +   statements are not emitted as pattern statements and the tree returned is
> > +   only useful for type queries.  */
> >
> >  static tree
> >  adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> > -		   tree out_type, stmt_vec_info stmt_info)
> > +		   tree out_type, stmt_vec_info stmt_info,
> > +		   bool type_only = false)
> >  {
> >    /* Gather original stmts in the bool pattern in their order of appearance
> >       in the IL.  */
> > @@ -5523,16 +5548,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> >      bool_stmts.quick_push (*i);
> >    bool_stmts.qsort (sort_after_uid);
> >
> > +  gimple *last_stmt = NULL;
> > +
> >    /* Now process them in that order, producing pattern stmts.  */
> >    hash_map <tree, tree> defs;
> > -  for (unsigned i = 0; i < bool_stmts.length (); ++i)
> > -    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
> > -			 out_type, stmt_info, defs);
> > +  for (auto bool_stmt : bool_stmts)
> > +    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmt),
> > +			 out_type, stmt_info, defs, last_stmt, type_only);
> >
> >    /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
> > -  gimple *pattern_stmt
> > -    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
> > -  return gimple_assign_lhs (pattern_stmt);
> > +  return gimple_assign_lhs (last_stmt);
> >  }
> >
> >  /* Return the proper type for converting bool VAR into
> > @@ -5608,13 +5633,27 @@ vect_recog_bool_pattern (vec_info *vinfo,
> >    enum tree_code rhs_code;
> >    tree var, lhs, rhs, vectype;
> >    gimple *pattern_stmt;
> > -
> > -  if (!is_gimple_assign (last_stmt))
> > +  gcond* cond = NULL;
> > +  if (!is_gimple_assign (last_stmt)
> > +      && !(cond = dyn_cast <gcond *> (last_stmt)))
> >      return NULL;
> >
> > -  var = gimple_assign_rhs1 (last_stmt);
> > -  lhs = gimple_assign_lhs (last_stmt);
> > -  rhs_code = gimple_assign_rhs_code (last_stmt);
> > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > +  if (is_gimple_assign (last_stmt))
> > +    {
> > +      var = gimple_assign_rhs1 (last_stmt);
> > +      lhs = gimple_assign_lhs (last_stmt);
> > +      rhs_code = gimple_assign_rhs_code (last_stmt);
> > +    }
> > +  else if (loop_vinfo && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +    {
> > +      /* If we're not loop vectorizing with multiple exits, don't bother
> > +	 analyzing the gcond as we don't support SLP today.  */
> > +      lhs = var = gimple_cond_lhs (last_stmt);
> > +      rhs_code = gimple_cond_code (last_stmt);
> > +    }
> > +  else
> > +    return NULL;
> >
> >    if (rhs_code == VIEW_CONVERT_EXPR)
> >      var = TREE_OPERAND (var, 0);
> > @@ -5632,7 +5671,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> >  	return NULL;
> >        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> >
> > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > +      if (check_bool_pattern (var, vinfo, bool_stmts, false))
> >  	{
> >  	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
> >  				   TREE_TYPE (lhs), stmt_vinfo);
> > @@ -5680,7 +5719,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> >
> >        return pattern_stmt;
> >      }
> > -  else if (rhs_code == COND_EXPR
> > +  else if ((rhs_code == COND_EXPR || cond)
> >  	   && TREE_CODE (var) == SSA_NAME)
> >      {
> >        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> > @@ -5700,18 +5739,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
> >        if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
> >  	return NULL;
> >
> > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > -	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
> > +      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
> > +	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo, cond);
> >        else if (integer_type_for_mask (var, vinfo))
> >  	return NULL;
> >
> > -      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> > -      pattern_stmt
> > -	= gimple_build_assign (lhs, COND_EXPR,
> > -			       build2 (NE_EXPR, boolean_type_node,
> > -				       var, build_int_cst (TREE_TYPE (var), 0)),
> > -			       gimple_assign_rhs2 (last_stmt),
> > -			       gimple_assign_rhs3 (last_stmt));
> > +      if (!cond)
> > +	{
> > +	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> > +	  pattern_stmt
> > +	    = gimple_build_assign (lhs, COND_EXPR,
> > +				   build2 (NE_EXPR, boolean_type_node, var,
> > +					   build_int_cst (TREE_TYPE (var), 0)),
> > +				   gimple_assign_rhs2 (last_stmt),
> > +				   gimple_assign_rhs3 (last_stmt));
> > +	}
> > +      else
> > +	{
> > +	  pattern_stmt
> > +	    = gimple_build_cond (gimple_cond_code (cond),
> > +				 gimple_cond_lhs (cond), gimple_cond_rhs (cond),
> > +				 gimple_cond_true_label (cond),
> > +				 gimple_cond_false_label (cond));
> > +	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
> > +	  vectype = truth_type_for (vectype);
> > +	}
> >        *type_out = vectype;
> >        vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
> >
> 
> So this is also quite odd.  You're hooking into COND_EXPR handling
> but only look at the LHS of the GIMPLE_COND compare.
> 

Hmm, not sure I follow. GIMPLE_CONDs don't have an LHS, no? We look at the LHS
for the COND_EXPR, but for a GCOND we just recreate the statement and set
vectype based on the updated var.  I guess this is related to:

> that we if-converted.  We need to recognize that _1, _2 and _3 have
> mask uses and thus possibly adjust them.

Which I did think about somewhat.  So what you're saying is that I need to
create a new GIMPLE_COND here with an NE-to-0 compare against var, like the
COND_EXPR case?

> Please refactor the changes to separate the GIMPLE_COND path
> completely.
> 

Ok, then it seems better to make two patterns?

> Is there test coverage for such "complex" condition?  I think
> you'll need adjustments to vect_recog_mask_conversion_pattern
> as well similar as to how COND_EXPR is handled there.

Yes, the existing testsuite has many cases which fail, including gcc/testsuite/gcc.c-torture/execute/20150611-1.c

Cheers,
Tamar

> 
> > @@ -5725,7 +5777,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> >        if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
> >  	return NULL;
> >
> > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > +      if (check_bool_pattern (var, vinfo, bool_stmts, false))
> >  	rhs = adjust_bool_stmts (vinfo, bool_stmts,
> >  				 TREE_TYPE (vectype), stmt_vinfo);
> >        else
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 582c5e678fad802d6e76300fe3c939b9f2978f17..e9116d184149826ba436b0f562721c140d586c94 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> >    vec<tree> vec_oprnds0 = vNULL;
> >    vec<tree> vec_oprnds1 = vNULL;
> >    tree mask_type;
> > -  tree mask;
> > +  tree mask = NULL_TREE;
> >
> >    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> >      return false;
> > @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> >    /* Transform.  */
> >
> >    /* Handle def.  */
> > -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> > -  mask = vect_create_destination_var (lhs, mask_type);
> > +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> > +  if (lhs)
> > +    mask = vect_create_destination_var (lhs, mask_type);
> >
> >    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
> >  		     rhs1, &vec_oprnds0, vectype,
> > @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> >        gimple *new_stmt;
> >        vec_rhs2 = vec_oprnds1[i];
> >
> > -      new_temp = make_ssa_name (mask);
> > +      if (lhs)
> > +	new_temp = make_ssa_name (mask);
> > +      else
> > +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
> >        if (bitop1 == NOP_EXPR)
> >  	{
> >  	  new_stmt = gimple_build_assign (new_temp, code,
> > @@ -12723,6 +12727,184 @@ vectorizable_comparison (vec_info *vinfo,
> >    return true;
> >  }
> >
> > +/* Check to see if the current early break given in STMT_INFO is valid for
> > +   vectorization.  */
> > +
> > +static bool
> > +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> > +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> > +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> > +{
> > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > +  if (!loop_vinfo
> > +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> > +    return false;
> > +
> > +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> > +    return false;
> > +
> > +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> > +    return false;
> > +
> > +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> > +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> > +  gcc_assert (vectype);
> > +
> > +  tree vectype_op0 = NULL_TREE;
> > +  slp_tree slp_op0;
> > +  tree op0;
> > +  enum vect_def_type dt0;
> > +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> > +			   &vectype_op0))
> > +    {
> > +      if (dump_enabled_p ())
> > +	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			   "use not simple.\n");
> > +	return false;
> > +    }
> 
> I think you rely on patterns transforming this into canonical form
> mask != 0, so I suggest to check this here.
> 
> > +  machine_mode mode = TYPE_MODE (vectype);
> > +  int ncopies;
> > +
> > +  if (slp_node)
> > +    ncopies = 1;
> > +  else
> > +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> > +
> > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > +
> > +  /* Analyze only.  */
> > +  if (!vec_stmt)
> > +    {
> > +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > +	{
> > +	  if (dump_enabled_p ())
> > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			       "can't vectorize early exit because the "
> > +			       "target doesn't support flag setting vector "
> > +			       "comparisons.\n");
> > +	  return false;
> > +	}
> > +
> > +      if (ncopies > 1
> 
> Also required for vec_num > 1 with SLP
> (SLP_TREE_NUMBER_OF_VEC_STMTS)
> 
> > +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > +	{
> > +	  if (dump_enabled_p ())
> > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			       "can't vectorize early exit because the "
> > +			       "target does not support boolean vector OR for "
> > +			       "type %T.\n", vectype);
> > +	  return false;
> > +	}
> > +
> > +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > +				      vec_stmt, slp_node, cost_vec))
> > +	return false;
> > +
> > +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> > +	{
> > +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> > +					      OPTIMIZE_FOR_SPEED))
> > +	    return false;
> > +	  else
> > +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> > +	}
> > +
> > +
> > +      return true;
> > +    }
> > +
> > +  /* Transform.  */
> > +
> > +  tree new_temp = NULL_TREE;
> > +  gimple *new_stmt = NULL;
> > +
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> > +
> > +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > +				  vec_stmt, slp_node, cost_vec))
> > +    gcc_unreachable ();
> > +
> > +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> > +  basic_block cond_bb = gimple_bb (stmt);
> > +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> > +
> > +  auto_vec<tree> stmts;
> > +
> > +  tree mask = NULL_TREE;
> > +  if (masked_loop_p)
> > +    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> > +
> > +  if (slp_node)
> > +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> > +  else
> > +    {
> > +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> > +      stmts.reserve_exact (vec_stmts.length ());
> > +      for (auto stmt : vec_stmts)
> > +	stmts.quick_push (gimple_assign_lhs (stmt));
> > +    }
> > +
> > +  /* Determine if we need to reduce the final value.  */
> > +  if (stmts.length () > 1)
> > +    {
> > +      /* We build the reductions in a way to maintain as much parallelism as
> > +	 possible.  */
> > +      auto_vec<tree> workset (stmts.length ());
> > +
> > +      /* Mask the statements as we queue them up.  */
> > +      if (masked_loop_p)
> > +	for (auto stmt : stmts)
> > +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> > +						mask, stmt, &cond_gsi));
> 
> I think this still uses the wrong mask, you need to use
> 
>   vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, <cnt>)
> 
> replacing <cnt> with the vector def index to mask I think.  For this
> reason keeping the "final" mask below is also wrong.
> 
> Or am I missing something?
> 
> > +      else
> > +	workset.splice (stmts);
> > +
> > +      while (workset.length () > 1)
> > +	{
> > +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > +	  tree arg0 = workset.pop ();
> > +	  tree arg1 = workset.pop ();
> > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > +				       &cond_gsi);
> > +	  workset.quick_insert (0, new_temp);
> > +	}
> > +    }
> > +  else
> > +    new_temp = stmts[0];
> > +
> > +  gcc_assert (new_temp);
> > +
> > +  tree cond = new_temp;
> > +  /* If we have multiple statements after reduction we should check all the
> > +     lanes and treat it as a full vector.  */
> > +  if (masked_loop_p)
> > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > +			     &cond_gsi);
> 
> so just do this in the else path above
> 
> Otherwise looks OK.
> 
> Richard.
> 
> > +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> > +     codegen so we must replace the original insn.  */
> > +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> > +  gcond *cond_stmt = as_a <gcond *>(stmt);
> > +  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
> > +			     build_zero_cst (vectype));
> > +  update_stmt (stmt);
> > +
> > +  if (slp_node)
> > +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> > +   else
> > +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> > +
> > +
> > +  if (!slp_node)
> > +    *vec_stmt = stmt;
> > +
> > +  return true;
> > +}
> > +
> >  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
> >     can handle all live statements in the node.  Otherwise return true
> >     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> > @@ -12949,7 +13131,9 @@ vect_analyze_stmt (vec_info *vinfo,
> >  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
> >  				  stmt_info, NULL, node)
> >  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> > -				   stmt_info, NULL, node, cost_vec));
> > +				   stmt_info, NULL, node, cost_vec)
> > +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > +				      cost_vec));
> >    else
> >      {
> >        if (bb_vinfo)
> > @@ -12972,7 +13156,10 @@ vect_analyze_stmt (vec_info *vinfo,
> >  					 NULL, NULL, node, cost_vec)
> >  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
> >  					  cost_vec)
> > -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> > +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> > +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > +					  cost_vec));
> > +
> >      }
> >
> >    if (node)
> > @@ -13131,6 +13318,12 @@ vect_transform_stmt (vec_info *vinfo,
> >        gcc_assert (done);
> >        break;
> >
> > +    case loop_exit_ctrl_vec_info_type:
> > +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> > +				      slp_node, NULL);
> > +      gcc_assert (done);
> > +      break;
> > +
> >      default:
> >        if (!STMT_VINFO_LIVE_P (stmt_info))
> >  	{
> > @@ -14321,10 +14514,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> >      }
> >    else
> >      {
> > +      gcond *cond = NULL;
> >        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
> >  	scalar_type = TREE_TYPE (DR_REF (dr));
> >        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> >  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> > +      else if ((cond = dyn_cast <gcond *> (stmt)))
> > +	{
> > +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> > +	     single bit precision and we need the vector boolean to be a
> > +	     representation of the integer mask.  So set the correct integer type and
> > +	     convert to boolean vector once we have a vectype.  */
> > +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
> > +	}
> >        else
> >  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> >
> > @@ -14339,12 +14541,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> >  			     "get vectype for scalar type: %T\n", scalar_type);
> >  	}
> >        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> > +
> >        if (!vectype)
> >  	return opt_result::failure_at (stmt,
> >  				       "not vectorized:"
> >  				       " unsupported data-type %T\n",
> >  				       scalar_type);
> >
> > +      /* If we were a gcond, convert the resulting type to a vector boolean
> > +	 type now that we have the correct integer mask type.  */
> > +      if (cond)
> > +	vectype = truth_type_for (vectype);
> > +
> >        if (dump_enabled_p ())
> >  	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
> >      }
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-08  8:58           ` Tamar Christina
@ 2023-12-08 10:28             ` Richard Biener
  2023-12-08 13:45               ` Tamar Christina
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2023-12-08 10:28 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Fri, 8 Dec 2023, Tamar Christina wrote:

> > --param vect-partial-vector-usage=2 would, no?
> > 
> I.. didn't even know it went to 2!
> 
> > > In principle I suppose I could mask the individual stmts, that should
> > > handle the future case when this is relaxed to support non-fixed length
> > > buffers?
> > 
> > Well, it looks wrong - either put in an assert that we start with a
> > single stmt or assert !masked_loop_p instead?  Better ICE than
> > generate wrong code.
> > 
> > That said, I think you need to apply the masking on the original
> > stmts[], before reducing them, no?
> 
> Yeah, I've done so now.  For simplicity I've just kept the final masking always as well
> and just leave it up to the optimizers to drop it when it's superfluous.
> 
> Simple testcase:
> 
> #ifndef N
> #define N 837
> #endif
> float vect_a[N];
> unsigned vect_b[N];
> 
> unsigned test4(double x)
> {
>  unsigned ret = 0;
>  for (int i = 0; i < N; i++)
>  {
>    if (vect_a[i] > x)
>      break;
>    vect_a[i] = x;
> 
>  }
>  return ret;
> }
> 
> Looks good now. After this one there's only one patch left, the dependency analysis.
> I'm almost done with the cleanup/respin, but want to take the weekend to double check and will post it first thing Monday morning.
> 
> Did you want to see the testsuite changes as well again? I've basically just added the right dg-requires-effective and add-options etc.

Yes please.

> Thanks for all the reviews!
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
>
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> 	(check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
> 	vect_recog_bool_pattern, sort_after_uid): Support gconds type analysis.
> 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> 	lhs.
> 	(vectorizable_early_exit): New.
> 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> 
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..8865cde9f3481a474d31848ae12523576d29744d 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
>    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>      {
>        gcc_assert (!vectype
> +		  || is_a <gcond *> (pattern_stmt)
>  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
>  		      == vect_use_mask_type_p (orig_stmt_info)));
>        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> @@ -5210,19 +5211,28 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
>     true if bool VAR can and should be optimized that way.  Assume it shouldn't
>     in case it's a result of a comparison which can be directly vectorized into
>     a vector comparison.  Fills in STMTS with all stmts visited during the
> -   walk.  */
> +   walk.  if ANALYZE_ONLY then only analyze the booleans but do not perform any
> +   codegen associated with the boolean condition.  */
>  
>  static bool
> -check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> +check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
> +		    bool analyze_only)
>  {
>    tree rhs1;
>    enum tree_code rhs_code;
> +  gassign *def_stmt = NULL;
>  
>    stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
> -  if (!def_stmt_info)
> +  if (!def_stmt_info && !analyze_only)
>      return false;
> +  else if (!def_stmt_info)
> +    /* If we're only analyzing we won't be codegen-ing the statements and are
> +       only interested in whether the types match.  In that case we can accept
> +       loop invariant values.  */
> +    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
> +  else
> +    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
>  

Hmm, but we're visiting them then?  I wonder how you get along
without doing adjustments on the uses if you consider

    _1 = a < b;
    _2 = c != d;
    _3 = _1 | _2;
    if (_3 != 0)
      exit loop;

thus a combined condition like

    if (a < b || c != d)

that we if-converted.  We need to recognize that _1, _2 and _3 have
mask uses and thus possibly adjust them.

What bad happens if you drop 'analyze_only'?  We're not really
rewriting anything there.

> -  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
>    if (!def_stmt)
>      return false;
>  
> @@ -5234,27 +5244,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
>    switch (rhs_code)
>      {
>      case SSA_NAME:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
>  	return false;
>        break;
>  
>      CASE_CONVERT:
>        if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
>  	return false;
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
>  	return false;
>        break;
>  
>      case BIT_NOT_EXPR:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
>  	return false;
>        break;
>  
>      case BIT_AND_EXPR:
>      case BIT_IOR_EXPR:
>      case BIT_XOR_EXPR:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts)
> -	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only)
> +	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
> +				   analyze_only))
>  	return false;
>        break;
>  
> @@ -5275,6 +5286,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
>  	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
>  							  TREE_TYPE (rhs1));
>  	  if (mask_type
> +	      && !analyze_only
>  	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
>  	    return false;
>  
> @@ -5289,7 +5301,8 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
>  	    }
>  	  else
>  	    vecitype = comp_vectype;
> -	  if (! expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
> +	  if (!analyze_only
> +	      && !expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
>  	    return false;
>  	}
>        else
> @@ -5324,11 +5337,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
>     VAR is an SSA_NAME that should be transformed from bool to a wider integer
>     type, OUT_TYPE is the desired final integer type of the whole pattern.
>     STMT_INFO is the info of the pattern root and is where pattern stmts should
> -   be associated with.  DEFS is a map of pattern defs.  */
> +   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
> +   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
>  
>  static void
>  adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> -		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
> +		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
> +		     gimple *&last_stmt, bool type_only)
>  {
>    gimple *stmt = SSA_NAME_DEF_STMT (var);
>    enum tree_code rhs_code, def_rhs_code;
> @@ -5492,28 +5507,38 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
>      }
>  
>    gimple_set_location (pattern_stmt, loc);
> -  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> -			  get_vectype_for_scalar_type (vinfo, itype));
> +  if (!type_only)
> +    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> +			    get_vectype_for_scalar_type (vinfo, itype));
> +  last_stmt = pattern_stmt;
>    defs.put (var, gimple_assign_lhs (pattern_stmt));
>  }
>  
> -/* Comparison function to qsort a vector of gimple stmts after UID.  */
> +/* Comparison function to qsort a vector of gimple stmts after BB and UID.
> +   The def of one statement can be in an earlier block than the use, so if
> +   the BBs differ, compare by BB first.  */
>  
>  static int
>  sort_after_uid (const void *p1, const void *p2)
>  {
>    const gimple *stmt1 = *(const gimple * const *)p1;
>    const gimple *stmt2 = *(const gimple * const *)p2;
> +  if (gimple_bb (stmt1)->index != gimple_bb (stmt2)->index)
> +    return gimple_bb (stmt1)->index - gimple_bb (stmt2)->index;
> +

Is this because you eventually get out-of-loop stmts (without UID)?

>    return gimple_uid (stmt1) - gimple_uid (stmt2);
>  }
>  
>  /* Create pattern stmts for all stmts participating in the bool pattern
>     specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
> -   OUT_TYPE.  Return the def of the pattern root.  */
> +   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY the new
> +   statements are not emitted as pattern statements and the tree returned is
> +   only useful for type queries.  */
>  
>  static tree
>  adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> -		   tree out_type, stmt_vec_info stmt_info)
> +		   tree out_type, stmt_vec_info stmt_info,
> +		   bool type_only = false)
>  {
>    /* Gather original stmts in the bool pattern in their order of appearance
>       in the IL.  */
> @@ -5523,16 +5548,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
>      bool_stmts.quick_push (*i);
>    bool_stmts.qsort (sort_after_uid);
>  
> +  gimple *last_stmt = NULL;
> +
>    /* Now process them in that order, producing pattern stmts.  */
>    hash_map <tree, tree> defs;
> -  for (unsigned i = 0; i < bool_stmts.length (); ++i)
> -    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
> -			 out_type, stmt_info, defs);
> +  for (auto bool_stmt : bool_stmts)
> +    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmt),
> +			 out_type, stmt_info, defs, last_stmt, type_only);
>  
>    /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
> -  gimple *pattern_stmt
> -    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
> -  return gimple_assign_lhs (pattern_stmt);
> +  return gimple_assign_lhs (last_stmt);
>  }
>  
>  /* Return the proper type for converting bool VAR into
> @@ -5608,13 +5633,27 @@ vect_recog_bool_pattern (vec_info *vinfo,
>    enum tree_code rhs_code;
>    tree var, lhs, rhs, vectype;
>    gimple *pattern_stmt;
> -
> -  if (!is_gimple_assign (last_stmt))
> +  gcond* cond = NULL;
> +  if (!is_gimple_assign (last_stmt)
> +      && !(cond = dyn_cast <gcond *> (last_stmt)))
>      return NULL;
>  
> -  var = gimple_assign_rhs1 (last_stmt);
> -  lhs = gimple_assign_lhs (last_stmt);
> -  rhs_code = gimple_assign_rhs_code (last_stmt);
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (is_gimple_assign (last_stmt))
> +    {
> +      var = gimple_assign_rhs1 (last_stmt);
> +      lhs = gimple_assign_lhs (last_stmt);
> +      rhs_code = gimple_assign_rhs_code (last_stmt);
> +    }
> +  else if (loop_vinfo && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    {
> +      /* Only analyze the gcond for loop vectorization of a loop with
> +	 multiple exits; don't bother otherwise as we don't support SLP today.  */
> +      lhs = var = gimple_cond_lhs (last_stmt);
> +      rhs_code = gimple_cond_code (last_stmt);
> +    }
> +  else
> +    return NULL;
>  
>    if (rhs_code == VIEW_CONVERT_EXPR)
>      var = TREE_OPERAND (var, 0);
> @@ -5632,7 +5671,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>  	return NULL;
>        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      if (check_bool_pattern (var, vinfo, bool_stmts, false))
>  	{
>  	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
>  				   TREE_TYPE (lhs), stmt_vinfo);
> @@ -5680,7 +5719,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>  
>        return pattern_stmt;
>      }
> -  else if (rhs_code == COND_EXPR
> +  else if ((rhs_code == COND_EXPR || cond)
>  	   && TREE_CODE (var) == SSA_NAME)
>      {
>        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> @@ -5700,18 +5739,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
>        if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
>  	return NULL;
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> -	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
> +      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
> +	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo, cond);
>        else if (integer_type_for_mask (var, vinfo))
>  	return NULL;
>  
> -      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> -      pattern_stmt 
> -	= gimple_build_assign (lhs, COND_EXPR,
> -			       build2 (NE_EXPR, boolean_type_node,
> -				       var, build_int_cst (TREE_TYPE (var), 0)),
> -			       gimple_assign_rhs2 (last_stmt),
> -			       gimple_assign_rhs3 (last_stmt));
> +      if (!cond)
> +	{
> +	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> +	  pattern_stmt
> +	    = gimple_build_assign (lhs, COND_EXPR,
> +				   build2 (NE_EXPR, boolean_type_node, var,
> +					   build_int_cst (TREE_TYPE (var), 0)),
> +				   gimple_assign_rhs2 (last_stmt),
> +				   gimple_assign_rhs3 (last_stmt));
> +	}
> +      else
> +	{
> +	  pattern_stmt
> +	    = gimple_build_cond (gimple_cond_code (cond),
> +				 gimple_cond_lhs (cond), gimple_cond_rhs (cond),
> +				 gimple_cond_true_label (cond),
> +				 gimple_cond_false_label (cond));
> +	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
> +	  vectype = truth_type_for (vectype);
> +	}
>        *type_out = vectype;
>        vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
>  

So this is also quite odd.  You're hooking into COND_EXPR handling
but only looking at the LHS of the GIMPLE_COND compare.

Please refactor the changes to separate the GIMPLE_COND path
completely.

Is there test coverage for such a "complex" condition?  I think
you'll need adjustments to vect_recog_mask_conversion_pattern
as well, similar to how COND_EXPR is handled there.

> @@ -5725,7 +5777,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>        if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
>  	return NULL;
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      if (check_bool_pattern (var, vinfo, bool_stmts, false))
>  	rhs = adjust_bool_stmts (vinfo, bool_stmts,
>  				 TREE_TYPE (vectype), stmt_vinfo);
>        else
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 582c5e678fad802d6e76300fe3c939b9f2978f17..e9116d184149826ba436b0f562721c140d586c94 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    vec<tree> vec_oprnds0 = vNULL;
>    vec<tree> vec_oprnds1 = vNULL;
>    tree mask_type;
> -  tree mask;
> +  tree mask = NULL_TREE;
>  
>    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>      return false;
> @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    /* Transform.  */
>  
>    /* Handle def.  */
> -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> -  mask = vect_create_destination_var (lhs, mask_type);
> +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> +  if (lhs)
> +    mask = vect_create_destination_var (lhs, mask_type);
>  
>    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
>  		     rhs1, &vec_oprnds0, vectype,
> @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>        gimple *new_stmt;
>        vec_rhs2 = vec_oprnds1[i];
>  
> -      new_temp = make_ssa_name (mask);
> +      if (lhs)
> +	new_temp = make_ssa_name (mask);
> +      else
> +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
>        if (bitop1 == NOP_EXPR)
>  	{
>  	  new_stmt = gimple_build_assign (new_temp, code,
> @@ -12723,6 +12727,184 @@ vectorizable_comparison (vec_info *vinfo,
>    return true;
>  }
>  
> +/* Check to see if the current early break given in STMT_INFO is valid for
> +   vectorization.  */
> +
> +static bool
> +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +{
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (!loop_vinfo
> +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> +    return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> +    return false;
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +    return false;
> +
> +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  gcc_assert (vectype);
> +
> +  tree vectype_op0 = NULL_TREE;
> +  slp_tree slp_op0;
> +  tree op0;
> +  enum vect_def_type dt0;
> +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> +			   &vectype_op0))
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			   "use not simple.\n");
> +	return false;
> +    }

I think you rely on patterns transforming this into the canonical form
mask != 0, so I suggest checking for this here.

> +  machine_mode mode = TYPE_MODE (vectype);
> +  int ncopies;
> +
> +  if (slp_node)
> +    ncopies = 1;
> +  else
> +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> +
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> +
> +  /* Analyze only.  */
> +  if (!vec_stmt)
> +    {
> +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target doesn't support flag setting vector "
> +			       "comparisons.\n");
> +	  return false;
> +	}
> +
> +      if (ncopies > 1

Also required for vec_num > 1 with SLP
(SLP_TREE_NUMBER_OF_VEC_STMTS)

> +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target does not support boolean vector OR for "
> +			       "type %T.\n", vectype);
> +	  return false;
> +	}
> +
> +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				      vec_stmt, slp_node, cost_vec))
> +	return false;
> +
> +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +	{
> +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> +					      OPTIMIZE_FOR_SPEED))
> +	    return false;
> +	  else
> +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> +	}
> +
> +
> +      return true;
> +    }
> +
> +  /* Transform.  */
> +
> +  tree new_temp = NULL_TREE;
> +  gimple *new_stmt = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> +
> +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				  vec_stmt, slp_node, cost_vec))
> +    gcc_unreachable ();
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  basic_block cond_bb = gimple_bb (stmt);
> +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> +
> +  auto_vec<tree> stmts;
> +
> +  tree mask = NULL_TREE;
> +  if (masked_loop_p)
> +    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> +
> +  if (slp_node)
> +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> +  else
> +    {
> +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> +      stmts.reserve_exact (vec_stmts.length ());
> +      for (auto stmt : vec_stmts)
> +	stmts.quick_push (gimple_assign_lhs (stmt));
> +    }
> +
> +  /* Determine if we need to reduce the final value.  */
> +  if (stmts.length () > 1)
> +    {
> +      /* We build the reductions in a way to maintain as much parallelism as
> +	 possible.  */
> +      auto_vec<tree> workset (stmts.length ());
> +
> +      /* Mask the statements as we queue them up.  */
> +      if (masked_loop_p)
> +	for (auto stmt : stmts)
> +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> +						mask, stmt, &cond_gsi));

I think this still uses the wrong mask; you need to use

  vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, <cnt>)

replacing <cnt> with the index of the vector def to mask, I think.  For
this reason keeping the "final" mask below is also wrong.

Or am I missing something?

> +      else
> +	workset.splice (stmts);
> +
> +      while (workset.length () > 1)
> +	{
> +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> +	  tree arg0 = workset.pop ();
> +	  tree arg1 = workset.pop ();
> +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> +				       &cond_gsi);
> +	  workset.quick_insert (0, new_temp);
> +	}
> +    }
> +  else
> +    new_temp = stmts[0];
> +
> +  gcc_assert (new_temp);
> +
> +  tree cond = new_temp;
> +  /* If we have multiple statements after reduction we should check all the
> +     lanes and treat it as a full vector.  */
> +  if (masked_loop_p)
> +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> +			     &cond_gsi);

so just do this in the else path above

Otherwise looks OK.

Richard.

> +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> +     codegen so we must replace the original insn.  */
> +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> +  gcond *cond_stmt = as_a <gcond *>(stmt);
> +  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
> +			     build_zero_cst (vectype));
> +  update_stmt (stmt);
> +
> +  if (slp_node)
> +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> +   else
> +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> +
> +
> +  if (!slp_node)
> +    *vec_stmt = stmt;
> +
> +  return true;
> +}
> +
>  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
>     can handle all live statements in the node.  Otherwise return true
>     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> @@ -12949,7 +13131,9 @@ vect_analyze_stmt (vec_info *vinfo,
>  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
>  				  stmt_info, NULL, node)
>  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> -				   stmt_info, NULL, node, cost_vec));
> +				   stmt_info, NULL, node, cost_vec)
> +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +				      cost_vec));
>    else
>      {
>        if (bb_vinfo)
> @@ -12972,7 +13156,10 @@ vect_analyze_stmt (vec_info *vinfo,
>  					 NULL, NULL, node, cost_vec)
>  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
>  					  cost_vec)
> -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +					  cost_vec));
> +
>      }
>  
>    if (node)
> @@ -13131,6 +13318,12 @@ vect_transform_stmt (vec_info *vinfo,
>        gcc_assert (done);
>        break;
>  
> +    case loop_exit_ctrl_vec_info_type:
> +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> +				      slp_node, NULL);
> +      gcc_assert (done);
> +      break;
> +
>      default:
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> @@ -14321,10 +14514,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>      }
>    else
>      {
> +      gcond *cond = NULL;
>        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
>  	scalar_type = TREE_TYPE (DR_REF (dr));
>        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
>  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> +      else if ((cond = dyn_cast <gcond *> (stmt)))
> +	{
> +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> +	     single bit precision and we need the vector boolean to be a
> +	     representation of the integer mask.  So set the correct integer type and
> +	     convert to boolean vector once we have a vectype.  */
> +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
> +	}
>        else
>  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
>  
> @@ -14339,12 +14541,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>  			     "get vectype for scalar type: %T\n", scalar_type);
>  	}
>        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> +
>        if (!vectype)
>  	return opt_result::failure_at (stmt,
>  				       "not vectorized:"
>  				       " unsupported data-type %T\n",
>  				       scalar_type);
>  
> +      /* If we were a gcond, convert the resulting type to a vector boolean type now
> +	 that we have the correct integer mask type.  */
> +      if (cond)
> +	vectype = truth_type_for (vectype);
> +
>        if (dump_enabled_p ())
>  	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
>      }
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-06  9:37         ` Richard Biener
@ 2023-12-08  8:58           ` Tamar Christina
  2023-12-08 10:28             ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: Tamar Christina @ 2023-12-08  8:58 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 23128 bytes --]

> --param vect-partial-vector-usage=2 would, no?
> 
I.. didn't even know it went to 2!

> > In principle I suppose I could mask the individual stmts, that should handle
> > the future case when this is relaxed to support non-fixed-length buffers?
> 
> Well, it looks wrong - either put in an assert that we start with a
> single stmt or assert !masked_loop_p instead?  Better ICE than
> generate wrong code.
> 
> That said, I think you need to apply the masking on the original
> stmts[], before reducing them, no?

Yeah, I've done so now.  For simplicity I've kept the final masking unconditionally
as well and leave it up to the optimizers to drop it when it's superfluous.

Simple testcase:

#ifndef N
#define N 837
#endif
float vect_a[N];
unsigned vect_b[N];

unsigned test4(double x)
{
 unsigned ret = 0;
 for (int i = 0; i < N; i++)
 {
   if (vect_a[i] > x)
     break;
   vect_a[i] = x;

 }
 return ret;
}

Looks good now. After this one there's only one patch left, the dependency analysis.
I'm almost done with the cleanup/respin, but want to take the weekend to double check and will post it first thing Monday morning.

Did you want to see the testsuite changes as well again? I've basically just added the right dg-requires-effective and add-options etc.

Thanks for all the reviews!

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
	(check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
	vect_recog_bool_pattern, sort_after_uid): Support gconds type analysis.
	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
	lhs.
	(vectorizable_early_exit): New.
	(vect_analyze_stmt, vect_transform_stmt): Use it.
	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.


--- inline copy of patch ---

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..8865cde9f3481a474d31848ae12523576d29744d 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5210,19 +5211,28 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
    true if bool VAR can and should be optimized that way.  Assume it shouldn't
    in case it's a result of a comparison which can be directly vectorized into
    a vector comparison.  Fills in STMTS with all stmts visited during the
-   walk.  */
+   walk.  if ANALYZE_ONLY then only analyze the booleans but do not perform any
+   codegen associated with the boolean condition.  */
 
 static bool
-check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
+check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
+		    bool analyze_only)
 {
   tree rhs1;
   enum tree_code rhs_code;
+  gassign *def_stmt = NULL;
 
   stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
-  if (!def_stmt_info)
+  if (!def_stmt_info && !analyze_only)
     return false;
+  else if (!def_stmt_info)
+    /* If we're only analyzing we won't be codegen-ing the statements and are
+       only checking that the types match.  In that case we can accept loop
+       invariant values.  */
+    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
+  else
+    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
 
-  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
   if (!def_stmt)
     return false;
 
@@ -5234,27 +5244,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
   switch (rhs_code)
     {
     case SSA_NAME:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
 	return false;
       break;
 
     CASE_CONVERT:
       if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
 	return false;
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
 	return false;
       break;
 
     case BIT_NOT_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
 	return false;
       break;
 
     case BIT_AND_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts)
-	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only)
+	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
+				   analyze_only))
 	return false;
       break;
 
@@ -5275,6 +5286,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
 	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
 							  TREE_TYPE (rhs1));
 	  if (mask_type
+	      && !analyze_only
 	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
 	    return false;
 
@@ -5289,7 +5301,8 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
 	    }
 	  else
 	    vecitype = comp_vectype;
-	  if (! expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
+	  if (!analyze_only
+	      && !expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
 	    return false;
 	}
       else
@@ -5324,11 +5337,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
    VAR is an SSA_NAME that should be transformed from bool to a wider integer
    type, OUT_TYPE is the desired final integer type of the whole pattern.
    STMT_INFO is the info of the pattern root and is where pattern stmts should
-   be associated with.  DEFS is a map of pattern defs.  */
+   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
+   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
 
 static void
 adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
-		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
+		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
+		     gimple *&last_stmt, bool type_only)
 {
   gimple *stmt = SSA_NAME_DEF_STMT (var);
   enum tree_code rhs_code, def_rhs_code;
@@ -5492,28 +5507,38 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
     }
 
   gimple_set_location (pattern_stmt, loc);
-  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
-			  get_vectype_for_scalar_type (vinfo, itype));
+  if (!type_only)
+    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
+			    get_vectype_for_scalar_type (vinfo, itype));
+  last_stmt = pattern_stmt;
   defs.put (var, gimple_assign_lhs (pattern_stmt));
 }
 
-/* Comparison function to qsort a vector of gimple stmts after UID.  */
+/* Comparison function to qsort a vector of gimple stmts after BB and UID.
+   The def of one statement can be in an earlier block than the use, so if
+   the BBs are different, first compare by BB.  */
 
 static int
 sort_after_uid (const void *p1, const void *p2)
 {
   const gimple *stmt1 = *(const gimple * const *)p1;
   const gimple *stmt2 = *(const gimple * const *)p2;
+  if (gimple_bb (stmt1)->index != gimple_bb (stmt2)->index)
+    return gimple_bb (stmt1)->index - gimple_bb (stmt2)->index;
+
   return gimple_uid (stmt1) - gimple_uid (stmt2);
 }
 
 /* Create pattern stmts for all stmts participating in the bool pattern
    specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
-   OUT_TYPE.  Return the def of the pattern root.  */
+   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY the new
+   statements are not emitted as pattern statements and the tree returned is
+   only useful for type queries.  */
 
 static tree
 adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
-		   tree out_type, stmt_vec_info stmt_info)
+		   tree out_type, stmt_vec_info stmt_info,
+		   bool type_only = false)
 {
   /* Gather original stmts in the bool pattern in their order of appearance
      in the IL.  */
@@ -5523,16 +5548,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
     bool_stmts.quick_push (*i);
   bool_stmts.qsort (sort_after_uid);
 
+  gimple *last_stmt = NULL;
+
   /* Now process them in that order, producing pattern stmts.  */
   hash_map <tree, tree> defs;
-  for (unsigned i = 0; i < bool_stmts.length (); ++i)
-    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
-			 out_type, stmt_info, defs);
+  for (auto bool_stmt : bool_stmts)
+    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmt),
+			 out_type, stmt_info, defs, last_stmt, type_only);
 
   /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
-  gimple *pattern_stmt
-    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
-  return gimple_assign_lhs (pattern_stmt);
+  return gimple_assign_lhs (last_stmt);
 }
 
 /* Return the proper type for converting bool VAR into
@@ -5608,13 +5633,27 @@ vect_recog_bool_pattern (vec_info *vinfo,
   enum tree_code rhs_code;
   tree var, lhs, rhs, vectype;
   gimple *pattern_stmt;
-
-  if (!is_gimple_assign (last_stmt))
+  gcond* cond = NULL;
+  if (!is_gimple_assign (last_stmt)
+      && !(cond = dyn_cast <gcond *> (last_stmt)))
     return NULL;
 
-  var = gimple_assign_rhs1 (last_stmt);
-  lhs = gimple_assign_lhs (last_stmt);
-  rhs_code = gimple_assign_rhs_code (last_stmt);
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (is_gimple_assign (last_stmt))
+    {
+      var = gimple_assign_rhs1 (last_stmt);
+      lhs = gimple_assign_lhs (last_stmt);
+      rhs_code = gimple_assign_rhs_code (last_stmt);
+    }
+  else if (loop_vinfo && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    {
+      /* Only analyze the gcond for loop vectorization of a loop with
+	 multiple exits; don't bother otherwise as we don't support SLP today.  */
+      lhs = var = gimple_cond_lhs (last_stmt);
+      rhs_code = gimple_cond_code (last_stmt);
+    }
+  else
+    return NULL;
 
   if (rhs_code == VIEW_CONVERT_EXPR)
     var = TREE_OPERAND (var, 0);
@@ -5632,7 +5671,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 	return NULL;
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, false))
 	{
 	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				   TREE_TYPE (lhs), stmt_vinfo);
@@ -5680,7 +5719,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 
       return pattern_stmt;
     }
-  else if (rhs_code == COND_EXPR
+  else if ((rhs_code == COND_EXPR || cond)
 	   && TREE_CODE (var) == SSA_NAME)
     {
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
@@ -5700,18 +5739,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
-	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
+      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
+	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo, cond);
       else if (integer_type_for_mask (var, vinfo))
 	return NULL;
 
-      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
-      pattern_stmt 
-	= gimple_build_assign (lhs, COND_EXPR,
-			       build2 (NE_EXPR, boolean_type_node,
-				       var, build_int_cst (TREE_TYPE (var), 0)),
-			       gimple_assign_rhs2 (last_stmt),
-			       gimple_assign_rhs3 (last_stmt));
+      if (!cond)
+	{
+	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+	  pattern_stmt
+	    = gimple_build_assign (lhs, COND_EXPR,
+				   build2 (NE_EXPR, boolean_type_node, var,
+					   build_int_cst (TREE_TYPE (var), 0)),
+				   gimple_assign_rhs2 (last_stmt),
+				   gimple_assign_rhs3 (last_stmt));
+	}
+      else
+	{
+	  pattern_stmt
+	    = gimple_build_cond (gimple_cond_code (cond),
+				 gimple_cond_lhs (cond), gimple_cond_rhs (cond),
+				 gimple_cond_true_label (cond),
+				 gimple_cond_false_label (cond));
+	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
+	  vectype = truth_type_for (vectype);
+	}
       *type_out = vectype;
       vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
 
@@ -5725,7 +5777,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, false))
 	rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				 TREE_TYPE (vectype), stmt_vinfo);
       else
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..e9116d184149826ba436b0f562721c140d586c94 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,184 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype);
+
+  tree vectype_op0 = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+			   &vectype_op0))
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "use not simple.\n");
+	return false;
+    }
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target doesn't support flag setting vector "
+			       "comparisons.\n");
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector OR for "
+			       "type %T.\n", vectype);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	{
+	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+					      OPTIMIZE_FOR_SPEED))
+	    return false;
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+	}
+
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  tree mask = NULL_TREE;
+  if (masked_loop_p)
+    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+
+      /* Mask the statements as we queue them up.  */
+      if (masked_loop_p)
+	for (auto stmt : stmts)
+	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
+						mask, stmt, &cond_gsi));
+      else
+	workset.splice (stmts);
+
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    new_temp = stmts[0];
+
+  gcc_assert (new_temp);
+
+  tree cond = new_temp;
+  /* If we have multiple statements after reduction we should check all the
+     lanes and treat it as a full vector.  */
+  if (masked_loop_p)
+    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			     &cond_gsi);
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *>(stmt);
+  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
+			     build_zero_cst (vectype));
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+   else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13131,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13156,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13131,6 +13318,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -14321,10 +14514,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      gcond *cond = NULL;
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if ((cond = dyn_cast <gcond *> (stmt)))
+	{
+	  /* We can't convert the scalar type to boolean yet, since booleans have a
+	     single bit precision and we need the vector boolean to be a
+	     representation of the integer mask.  So set the correct integer type and
+	     convert to boolean vector once we have a vectype.  */
+	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
+	}
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 
@@ -14339,12 +14541,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 			     "get vectype for scalar type: %T\n", scalar_type);
 	}
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+
       if (!vectype)
 	return opt_result::failure_at (stmt,
 				       "not vectorized:"
 				       " unsupported data-type %T\n",
 				       scalar_type);
 
+      /* If we were a gcond, convert the resulting type to a vector boolean type now
+	 that we have the correct integer mask type.  */
+      if (cond)
+	vectype = truth_type_for (vectype);
+
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
     }

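To make the intent of vectorizable_early_exit concrete, here is a rough scalar sketch (hypothetical C with a fixed 4-lane "vector"; not the GIMPLE the patch actually emits): the vector compare produces a per-lane mask, the lanes are IOR-reduced, and the branch out of the loop is taken when any lane matched, with a scalar tail recovering the exact exit point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vectorization factor; stands in for the real vector mode.  */
#define VF 4

/* Scalar reference: early-exit linear search.  */
static int
find_scalar (const int *a, int n, int key)
{
  for (int i = 0; i < n; i++)
    if (a[i] == key)
      return i;
  return -1;
}

/* Sketch of the vectorized early exit: compare a whole "vector" of
   lanes at once, IOR the lane results into a single mask, and branch
   out of the vector loop when the mask is nonzero (the cbranch on
   mask != 0 that the patch emits).  A scalar tail then finds the
   matching lane and handles the remainder.  */
static int
find_vec_sketch (const int *a, int n, int key)
{
  int i = 0;
  for (; i + VF <= n; i += VF)
    {
      bool mask = false;
      for (int l = 0; l < VF; l++)	/* vector compare + IOR reduce  */
	mask |= (a[i + l] == key);
      if (mask)				/* cbranch on mask != 0  */
	break;
    }
  for (; i < n; i++)			/* find exact lane / epilogue  */
    if (a[i] == key)
      return i;
  return -1;
}
```

Both functions return the same index for any input, which is the invariant the transform has to preserve.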
[-- Attachment #2: rb17969 (1).patch --]
[-- Type: application/octet-stream, Size: 20605 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-06  4:37       ` Tamar Christina
@ 2023-12-06  9:37         ` Richard Biener
  2023-12-08  8:58           ` Tamar Christina
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2023-12-06  9:37 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 6 Dec 2023, Tamar Christina wrote:

> > > > +
> > > > +  tree truth_type = truth_type_for (vectype_op);
> > > > +  machine_mode mode = TYPE_MODE (truth_type);
> > > > +  int ncopies;
> > > > +
> > 
> > more line break issues ... (also below, check yourself)
> > 
> > shouldn't STMT_VINFO_VECTYPE already match truth_type here?  If not
> > it looks to be set wrongly (or shouldn't be set at all)
> > 
> 
> Fixed; I now leverage the existing vect_recog_bool_pattern to update the types
> if needed, and determine the initial type in vect_get_vector_types_for_stmt.
> 
> > > > +  if (slp_node)
> > > > +    ncopies = 1;
> > > > +  else
> > > > +    ncopies = vect_get_num_copies (loop_vinfo, truth_type);
> > > > +
> > > > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > > > +
> > 
> > what about with_len?
> 
> It should be easy to add, but I don't know how it works.
> 
> > 
> > > > +  /* Analyze only.  */
> > > > +  if (!vec_stmt)
> > > > +    {
> > > > +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > > > +	{
> > > > +	  if (dump_enabled_p ())
> > > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +			       "can't vectorize early exit because the "
> > > > +			       "target doesn't support flag setting vector "
> > > > +			       "comparisons.\n");
> > > > +	  return false;
> > > > +	}
> > > > +
> > > > +      if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))
> > 
> > Why NE_EXPR?  This looks wrong.  Or vectype_op is wrong if you're
> > emitting
> > 
> >  mask = op0 CMP op1;
> >  if (mask != 0)
> > 
> > I think you need to check for CMP, not NE_EXPR.
> 
> Well, CMP is checked by vectorizable_comparison_1, but I realized this
> check wasn't testing what I wanted, and the cbranch requirements
> already cover it.  So removed.
> 
> > 
> > > > +	{
> > > > +	  if (dump_enabled_p ())
> > > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +			       "can't vectorize early exit because the "
> > > > +			       "target does not support boolean vector "
> > > > +			       "comparisons for type %T.\n", truth_type);
> > > > +	  return false;
> > > > +	}
> > > > +
> > > > +      if (ncopies > 1
> > > > +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > > > +	{
> > > > +	  if (dump_enabled_p ())
> > > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +			       "can't vectorize early exit because the "
> > > > +			       "target does not support boolean vector OR for "
> > > > +			       "type %T.\n", truth_type);
> > > > +	  return false;
> > > > +	}
> > > > +
> > > > +      if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
> > > > +				      vec_stmt, slp_node, cost_vec))
> > > > +	return false;
> > 
> > I suppose vectorizable_comparison_1 will check this again, so the above
> > is redundant?
> > 
> 
> The IOR? No, vectorizable_comparison_1 doesn't reduce so may not check it
> depending on the condition.
> 
> > > > +  /* Determine if we need to reduce the final value.  */
> > > > +  if (stmts.length () > 1)
> > > > +    {
> > > > +      /* We build the reductions in a way to maintain as much parallelism as
> > > > +	 possible.  */
> > > > +      auto_vec<tree> workset (stmts.length ());
> > > > +      workset.splice (stmts);
> > > > +      while (workset.length () > 1)
> > > > +	{
> > > > +	  new_temp = make_temp_ssa_name (truth_type, NULL,
> > > > "vexit_reduc");
> > > > +	  tree arg0 = workset.pop ();
> > > > +	  tree arg1 = workset.pop ();
> > > > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0,
> > > > arg1);
> > > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > > +				       &cond_gsi);
> > > > +	  if (slp_node)
> > > > +	    slp_node->push_vec_def (new_stmt);
> > > > +	  else
> > > > +	    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> > > > +	  workset.quick_insert (0, new_temp);
> > 
> > Reduction epilogue handling has similar code to reduce a set of vectors
> > to a single one with an operation.  I think we want to share that code.
> > 
> 
> I've taken a look but that code isn't suitable here since they have different
> constraints.  I don't require an in-order reduction since for the comparison
> all we care about is whether in a lane any bit is set or not.  This means:
> 
> 1. we can reduce using a fast operation like IOR.
> 2. we can reduce in as much parallelism as possible.
> 
> The comparison is on the critical path for the loop now, unlike live reductions
> which are always at the end, so using the live reduction code resulted in a
> slow down since it creates a longer dependency chain.

OK.

> > > > +	}
> > > > +    }
> > > > +  else
> > > > +    new_temp = stmts[0];
> > > > +
> > > > +  gcc_assert (new_temp);
> > > > +
> > > > +  tree cond = new_temp;
> > > > +  if (masked_loop_p)
> > > > +    {
> > > > +      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
> > > > truth_type, 0);
> > > > +      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > > > +			       &cond_gsi);
> > 
> > I don't think this is correct when 'stmts' had more than one vector?
> > 
> 
> It is, because even when VLA, since we only support counted loops, partial vectors
> are disabled. And it looks like --param vect-partial-vector-usage=1 cannot force it on.

--param vect-partial-vector-usage=2 would, no?

> In principle I suppose I could mask the individual stmts; that should handle the future case when
> this is relaxed to support non-fixed-length buffers?

Well, it looks wrong - either put in an assert that we start with a
single stmt or assert !masked_loop_p instead?  Better ICE than
generate wrong code.

That said, I think you need to apply the masking on the original
stmts[], before reducing them, no?

Thanks,
Richard.

> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> 	(check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
> 	vect_recog_bool_pattern): Support gconds type analysis.
> 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> 	lhs.
> 	(vectorizable_early_exit): New.
> 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..c6cedf4fe7c1f1e1126ce166a059a4b2a2b49cbd 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
>    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>      {
>        gcc_assert (!vectype
> +		  || is_a <gcond *> (pattern_stmt)
>  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
>  		      == vect_use_mask_type_p (orig_stmt_info)));
>        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> @@ -5210,19 +5211,27 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
>     true if bool VAR can and should be optimized that way.  Assume it shouldn't
>     in case it's a result of a comparison which can be directly vectorized into
>     a vector comparison.  Fills in STMTS with all stmts visited during the
> -   walk.  */
> +   walk.  If COND then a gcond is being inspected instead of a normal COND.  */
>  
>  static bool
> -check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> +check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
> +		    gcond *cond)
>  {
>    tree rhs1;
>    enum tree_code rhs_code;
> +  gassign *def_stmt = NULL;
>  
>    stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
> -  if (!def_stmt_info)
> +  if (!def_stmt_info && !cond)
>      return false;
> +  else if (!def_stmt_info)
> +    /* If we're a gcond we won't be codegen-ing the statements and are only
> +       after whether the types match.  In that case we can accept loop invariant
> +       values.  */
> +    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
> +  else
> +    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
>  
> -  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
>    if (!def_stmt)
>      return false;
>  
> @@ -5234,27 +5243,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
>    switch (rhs_code)
>      {
>      case SSA_NAME:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
>  	return false;
>        break;
>  
>      CASE_CONVERT:
>        if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
>  	return false;
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
>  	return false;
>        break;
>  
>      case BIT_NOT_EXPR:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
>  	return false;
>        break;
>  
>      case BIT_AND_EXPR:
>      case BIT_IOR_EXPR:
>      case BIT_XOR_EXPR:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts)
> -	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, cond)
> +	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
> +				   cond))
>  	return false;
>        break;
>  
> @@ -5275,6 +5285,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
>  	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
>  							  TREE_TYPE (rhs1));
>  	  if (mask_type
> +	      && !cond
>  	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
>  	    return false;
>  
> @@ -5324,11 +5335,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
>     VAR is an SSA_NAME that should be transformed from bool to a wider integer
>     type, OUT_TYPE is the desired final integer type of the whole pattern.
>     STMT_INFO is the info of the pattern root and is where pattern stmts should
> -   be associated with.  DEFS is a map of pattern defs.  */
> +   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
> +   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
>  
>  static void
>  adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> -		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
> +		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
> +		     gimple *&last_stmt, bool type_only)
>  {
>    gimple *stmt = SSA_NAME_DEF_STMT (var);
>    enum tree_code rhs_code, def_rhs_code;
> @@ -5492,8 +5505,10 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
>      }
>  
>    gimple_set_location (pattern_stmt, loc);
> -  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> -			  get_vectype_for_scalar_type (vinfo, itype));
> +  if (!type_only)
> +    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> +			    get_vectype_for_scalar_type (vinfo, itype));
> +  last_stmt = pattern_stmt;
>    defs.put (var, gimple_assign_lhs (pattern_stmt));
>  }
>  
> @@ -5509,11 +5524,14 @@ sort_after_uid (const void *p1, const void *p2)
>  
>  /* Create pattern stmts for all stmts participating in the bool pattern
>     specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
> -   OUT_TYPE.  Return the def of the pattern root.  */
> +   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY the new
> +   statements are not emitted as pattern statements and the tree returned is
> +   only useful for type queries.  */
>  
>  static tree
>  adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> -		   tree out_type, stmt_vec_info stmt_info)
> +		   tree out_type, stmt_vec_info stmt_info,
> +		   bool type_only = false)
>  {
>    /* Gather original stmts in the bool pattern in their order of appearance
>       in the IL.  */
> @@ -5523,16 +5541,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
>      bool_stmts.quick_push (*i);
>    bool_stmts.qsort (sort_after_uid);
>  
> +  gimple *last_stmt = NULL;
> +
>    /* Now process them in that order, producing pattern stmts.  */
>    hash_map <tree, tree> defs;
>    for (unsigned i = 0; i < bool_stmts.length (); ++i)
>      adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
> -			 out_type, stmt_info, defs);
> +			 out_type, stmt_info, defs, last_stmt, type_only);
>  
>    /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
> -  gimple *pattern_stmt
> -    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
> -  return gimple_assign_lhs (pattern_stmt);
> +  return gimple_assign_lhs (last_stmt);
>  }
>  
>  /* Return the proper type for converting bool VAR into
> @@ -5608,13 +5626,22 @@ vect_recog_bool_pattern (vec_info *vinfo,
>    enum tree_code rhs_code;
>    tree var, lhs, rhs, vectype;
>    gimple *pattern_stmt;
> -
> -  if (!is_gimple_assign (last_stmt))
> +  gcond* cond = NULL;
> +  if (!is_gimple_assign (last_stmt)
> +      && !(cond = dyn_cast <gcond *> (last_stmt)))
>      return NULL;
>  
> -  var = gimple_assign_rhs1 (last_stmt);
> -  lhs = gimple_assign_lhs (last_stmt);
> -  rhs_code = gimple_assign_rhs_code (last_stmt);
> +  if (is_gimple_assign (last_stmt))
> +    {
> +      var = gimple_assign_rhs1 (last_stmt);
> +      lhs = gimple_assign_lhs (last_stmt);
> +      rhs_code = gimple_assign_rhs_code (last_stmt);
> +    }
> +  else
> +    {
> +      lhs = var = gimple_cond_lhs (last_stmt);
> +      rhs_code = gimple_cond_code (last_stmt);
> +    }
>  
>    if (rhs_code == VIEW_CONVERT_EXPR)
>      var = TREE_OPERAND (var, 0);
> @@ -5632,7 +5659,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>  	return NULL;
>        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
>  	{
>  	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
>  				   TREE_TYPE (lhs), stmt_vinfo);
> @@ -5680,7 +5707,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>  
>        return pattern_stmt;
>      }
> -  else if (rhs_code == COND_EXPR
> +  else if ((rhs_code == COND_EXPR || cond)
>  	   && TREE_CODE (var) == SSA_NAME)
>      {
>        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> @@ -5700,18 +5727,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
>        if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
>  	return NULL;
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
>  	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
>        else if (integer_type_for_mask (var, vinfo))
>  	return NULL;
>  
> -      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> -      pattern_stmt 
> -	= gimple_build_assign (lhs, COND_EXPR,
> -			       build2 (NE_EXPR, boolean_type_node,
> -				       var, build_int_cst (TREE_TYPE (var), 0)),
> -			       gimple_assign_rhs2 (last_stmt),
> -			       gimple_assign_rhs3 (last_stmt));
> +      if (!cond)
> +	{
> +	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> +	  pattern_stmt
> +	    = gimple_build_assign (lhs, COND_EXPR,
> +				   build2 (NE_EXPR, boolean_type_node, var,
> +					   build_int_cst (TREE_TYPE (var), 0)),
> +				   gimple_assign_rhs2 (last_stmt),
> +				   gimple_assign_rhs3 (last_stmt));
> +	}
> +      else
> +	{
> +	  pattern_stmt
> +	    = gimple_build_cond (gimple_cond_code (cond), gimple_cond_lhs (cond),
> +				 gimple_cond_rhs (cond),
> +				 gimple_cond_true_label (cond),
> +				 gimple_cond_false_label (cond));
> +	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
> +	  vectype = truth_type_for (vectype);
> +	}
>        *type_out = vectype;
>        vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
>  
> @@ -5725,7 +5765,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>        if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
>  	return NULL;
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
>  	rhs = adjust_bool_stmts (vinfo, bool_stmts,
>  				 TREE_TYPE (vectype), stmt_vinfo);
>        else
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 582c5e678fad802d6e76300fe3c939b9f2978f17..d801b72a149ebe6aa4d1f2942324b042d07be530 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    vec<tree> vec_oprnds0 = vNULL;
>    vec<tree> vec_oprnds1 = vNULL;
>    tree mask_type;
> -  tree mask;
> +  tree mask = NULL_TREE;
>  
>    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>      return false;
> @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    /* Transform.  */
>  
>    /* Handle def.  */
> -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> -  mask = vect_create_destination_var (lhs, mask_type);
> +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> +  if (lhs)
> +    mask = vect_create_destination_var (lhs, mask_type);
>  
>    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
>  		     rhs1, &vec_oprnds0, vectype,
> @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>        gimple *new_stmt;
>        vec_rhs2 = vec_oprnds1[i];
>  
> -      new_temp = make_ssa_name (mask);
> +      if (lhs)
> +	new_temp = make_ssa_name (mask);
> +      else
> +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
>        if (bitop1 == NOP_EXPR)
>  	{
>  	  new_stmt = gimple_build_assign (new_temp, code,
> @@ -12723,6 +12727,176 @@ vectorizable_comparison (vec_info *vinfo,
>    return true;
>  }
>  
> +/* Check to see if the current early break given in STMT_INFO is valid for
> +   vectorization.  */
> +
> +static bool
> +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +{
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (!loop_vinfo
> +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> +    return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> +    return false;
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +    return false;
> +
> +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  gcc_assert (vectype);
> +
> +  tree vectype_op0 = NULL_TREE;
> +  slp_tree slp_op0;
> +  tree op0;
> +  enum vect_def_type dt0;
> +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> +			   &vectype_op0))
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			   "use not simple.\n");
> +	return false;
> +    }
> +
> +  machine_mode mode = TYPE_MODE (vectype);
> +  int ncopies;
> +
> +  if (slp_node)
> +    ncopies = 1;
> +  else
> +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> +
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> +
> +  /* Analyze only.  */
> +  if (!vec_stmt)
> +    {
> +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target doesn't support flag setting vector "
> +			       "comparisons.\n");
> +	  return false;
> +	}
> +
> +      if (ncopies > 1
> +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target does not support boolean vector OR for "
> +			       "type %T.\n", vectype);
> +	  return false;
> +	}
> +
> +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				      vec_stmt, slp_node, cost_vec))
> +	return false;
> +
> +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +	{
> +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> +					      OPTIMIZE_FOR_SPEED))
> +	    return false;
> +	  else
> +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> +	}
> +
> +
> +      return true;
> +    }
> +
> +  /* Transform.  */
> +
> +  tree new_temp = NULL_TREE;
> +  gimple *new_stmt = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> +
> +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				  vec_stmt, slp_node, cost_vec))
> +    gcc_unreachable ();
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  basic_block cond_bb = gimple_bb (stmt);
> +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> +
> +  auto_vec<tree> stmts;
> +
> +  if (slp_node)
> +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> +  else
> +    {
> +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> +      stmts.reserve_exact (vec_stmts.length ());
> +      for (auto stmt : vec_stmts)
> +	stmts.quick_push (gimple_assign_lhs (stmt));
> +    }
> +
> +  /* Determine if we need to reduce the final value.  */
> +  if (stmts.length () > 1)
> +    {
> +      /* We build the reductions in a way to maintain as much parallelism as
> +	 possible.  */
> +      auto_vec<tree> workset (stmts.length ());
> +      workset.splice (stmts);
> +      while (workset.length () > 1)
> +	{
> +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> +	  tree arg0 = workset.pop ();
> +	  tree arg1 = workset.pop ();
> +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> +				       &cond_gsi);
> +	  workset.quick_insert (0, new_temp);
> +	}
> +    }
> +  else
> +    new_temp = stmts[0];
> +
> +  gcc_assert (new_temp);
> +
> +  tree cond = new_temp;
> +  /* If we have multiple statements after reduction we should check all the
> +     lanes and treat it as a full vector.  */
> +  if (masked_loop_p)
> +    {
> +      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
> +				      vectype, 0);
> +      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> +			       &cond_gsi);
> +    }
> +
> +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> +     codegen so we must replace the original insn.  */
> +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> +  gcond *cond_stmt = as_a <gcond *>(stmt);
> +  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
> +			     build_zero_cst (vectype));
> +  update_stmt (stmt);
> +
> +  if (slp_node)
> +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> +   else
> +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> +
> +
> +  if (!slp_node)
> +    *vec_stmt = stmt;
> +
> +  return true;
> +}
> +
>  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
>     can handle all live statements in the node.  Otherwise return true
>     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> @@ -12949,7 +13123,9 @@ vect_analyze_stmt (vec_info *vinfo,
>  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
>  				  stmt_info, NULL, node)
>  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> -				   stmt_info, NULL, node, cost_vec));
> +				   stmt_info, NULL, node, cost_vec)
> +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +				      cost_vec));
>    else
>      {
>        if (bb_vinfo)
> @@ -12972,7 +13148,10 @@ vect_analyze_stmt (vec_info *vinfo,
>  					 NULL, NULL, node, cost_vec)
>  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
>  					  cost_vec)
> -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +					  cost_vec));
> +
>      }
>  
>    if (node)
> @@ -13131,6 +13310,12 @@ vect_transform_stmt (vec_info *vinfo,
>        gcc_assert (done);
>        break;
>  
> +    case loop_exit_ctrl_vec_info_type:
> +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> +				      slp_node, NULL);
> +      gcc_assert (done);
> +      break;
> +
>      default:
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> @@ -14321,10 +14506,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>      }
>    else
>      {
> +      gcond *cond = NULL;
>        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
>  	scalar_type = TREE_TYPE (DR_REF (dr));
>        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
>  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> +      else if ((cond = dyn_cast <gcond *> (stmt)))
> +	{
> +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> +	     single bit precision and we need the vector boolean to be a
> +	     representation of the integer mask.  So set the correct integer type and
> +	     convert to boolean vector once we have a vectype.  */
> +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
> +	}
>        else
>  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
>  
> @@ -14339,12 +14533,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>  			     "get vectype for scalar type: %T\n", scalar_type);
>  	}
>        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> +
>        if (!vectype)
>  	return opt_result::failure_at (stmt,
>  				       "not vectorized:"
>  				       " unsupported data-type %T\n",
>  				       scalar_type);
>  
> +      /* If we were a gcond, convert the resulting type to a vector boolean type now
> +	 that we have the correct integer mask type.  */
> +      if (cond)
> +	vectype = truth_type_for (vectype);
> +
>        if (dump_enabled_p ())
>  	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
>      }
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-11-29 13:50     ` Richard Biener
@ 2023-12-06  4:37       ` Tamar Christina
  2023-12-06  9:37         ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: Tamar Christina @ 2023-12-06  4:37 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw


> > > +
> > > +  tree truth_type = truth_type_for (vectype_op);  machine_mode mode =
> > > + TYPE_MODE (truth_type);  int ncopies;
> > > +
> 
> more line break issues ... (also below, check yourself)
> 
> shouldn't STMT_VINFO_VECTYPE already match truth_type here?  If not
> it looks to be set wrongly (or shouldn't be set at all)
> 

Fixed, I now leverage the existing vect_recog_bool_pattern to update the types
if needed and determine the initial type in vect_get_vector_types_for_stmt.

> > > +  if (slp_node)
> > > +    ncopies = 1;
> > > +  else
> > > +    ncopies = vect_get_num_copies (loop_vinfo, truth_type);
> > > +
> > > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);  bool
> > > + masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > > +
> 
> what about with_len?

Should be easy to add, but I don't know how it works.

> 
> > > +  /* Analyze only.  */
> > > +  if (!vec_stmt)
> > > +    {
> > > +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > > +	{
> > > +	  if (dump_enabled_p ())
> > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +			       "can't vectorize early exit because the "
> > > +			       "target doesn't support flag setting vector "
> > > +			       "comparisons.\n");
> > > +	  return false;
> > > +	}
> > > +
> > > +      if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))
> 
> Why NE_EXPR?  This looks wrong.  Or vectype_op is wrong if you're
> emitting
> 
>  mask = op0 CMP op1;
>  if (mask != 0)
> 
> I think you need to check for CMP, not NE_EXPR.

Well CMP is checked by vectorizable_comparison_1, but I realized this
check is not checking what I wanted and the cbranch requirements
already do.  So removed.

> 
> > > +	{
> > > +	  if (dump_enabled_p ())
> > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +			       "can't vectorize early exit because the "
> > > +			       "target does not support boolean vector "
> > > +			       "comparisons for type %T.\n", truth_type);
> > > +	  return false;
> > > +	}
> > > +
> > > +      if (ncopies > 1
> > > +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > > +	{
> > > +	  if (dump_enabled_p ())
> > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +			       "can't vectorize early exit because the "
> > > +			       "target does not support boolean vector OR for "
> > > +			       "type %T.\n", truth_type);
> > > +	  return false;
> > > +	}
> > > +
> > > +      if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
> > > +				      vec_stmt, slp_node, cost_vec))
> > > +	return false;
> 
> I suppose vectorizable_comparison_1 will check this again, so the above
> is redundant?
> 

The IOR? No, vectorizable_comparison_1 doesn't reduce so may not check it
depending on the condition.

> > > +  /* Determine if we need to reduce the final value.  */
> > > +  if (stmts.length () > 1)
> > > +    {
> > > +      /* We build the reductions in a way to maintain as much parallelism as
> > > +	 possible.  */
> > > +      auto_vec<tree> workset (stmts.length ());
> > > +      workset.splice (stmts);
> > > +      while (workset.length () > 1)
> > > +	{
> > > +	  new_temp = make_temp_ssa_name (truth_type, NULL,
> > > "vexit_reduc");
> > > +	  tree arg0 = workset.pop ();
> > > +	  tree arg1 = workset.pop ();
> > > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0,
> > > arg1);
> > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > +				       &cond_gsi);
> > > +	  if (slp_node)
> > > +	    slp_node->push_vec_def (new_stmt);
> > > +	  else
> > > +	    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> > > +	  workset.quick_insert (0, new_temp);
> 
> Reduction epilogue handling has similar code to reduce a set of vectors
> to a single one with an operation.  I think we want to share that code.
> 

I've taken a look but that code isn't suitable here since they have different
constraints.  I don't require an in-order reduction since for the comparison
all we care about is whether in a lane any bit is set or not.  This means:

1. we can reduce using a fast operation like IOR.
2. we can reduce in as much parallelism as possible.

The comparison is on the critical path for the loop now, unlike live reductions
which are always at the end, so using the live reduction code resulted in a
slow down since it creates a longer dependency chain.
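The pairwise scheme can be sketched outside the vectorizer like so (a hypothetical scalar model: each `unsigned` stands in for one vector compare mask and `|` for the BIT_IOR_EXPR the patch emits; this is not GCC code):

```cpp
#include <cassert>
#include <deque>

// Pairwise (tree-shaped) OR-reduction over a workset of compare masks.
// Two operands are popped from the back and the partial result is
// re-inserted at the front, mirroring workset.quick_insert (0, new_temp)
// in the patch.  The dependency chain is O(log n) deep instead of the
// O(n) chain a left-to-right fold would create.
unsigned
reduce_masks (std::deque<unsigned> workset)
{
  while (workset.size () > 1)
    {
      unsigned arg0 = workset.back (); workset.pop_back ();
      unsigned arg1 = workset.back (); workset.pop_back ();
      workset.push_front (arg0 | arg1);
    }
  return workset.front ();
}
```

Any set bit in any input survives to the final value, which is all the early-exit branch needs to test.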

> > > +	}
> > > +    }
> > > +  else
> > > +    new_temp = stmts[0];
> > > +
> > > +  gcc_assert (new_temp);
> > > +
> > > +  tree cond = new_temp;
> > > +  if (masked_loop_p)
> > > +    {
> > > +      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
> > > truth_type, 0);
> > > +      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > > +			       &cond_gsi);
> 
> I don't think this is correct when 'stmts' had more than one vector?
> 

It is, because even when VLA, since we only support counted loops, partial vectors
are disabled. And it looks like --param vect-partial-vector-usage=1 cannot force it on.

In principle I suppose I could mask the individual stmts; that should handle the future case when
this is relaxed to support non-fixed-length buffers?
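That per-stmt masking idea can be sketched with scalar bitmasks as stand-ins (a hypothetical model of what prepare_vec_mask applied per statement would do, not the patch's code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// With partial vectors, each compare result must be AND-ed with its
// loop mask *before* the IOR reduction, so that inactive lanes can
// never trigger the early exit.
unsigned
reduce_masked (const std::vector<unsigned> &cmps,
	       const std::vector<unsigned> &loop_masks)
{
  unsigned acc = 0;
  for (std::size_t i = 0; i < cmps.size (); i++)
    acc |= cmps[i] & loop_masks[i];  // mask each stmt, then reduce
  return acc;
}
```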

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
	(check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
	vect_recog_bool_pattern): Support gconds type analysis.
	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
	lhs.
	(vectorizable_early_exit): New.
	(vect_analyze_stmt, vect_transform_stmt): Use it.
	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
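For context, the kind of loop this patch enables vectorizing is a counted loop with a data-dependent break, e.g. (an assumed illustration, not taken from the patch's testsuite):

```cpp
// The patch lets the vectorizer evaluate a[i] == key across a whole
// vector, IOR-reduce the compare masks, and branch once per vector
// iteration instead of once per scalar lane.
int
find (const int *a, int n, int key)
{
  for (int i = 0; i < n; i++)
    if (a[i] == key)  // early exit
      return i;
  return -1;
}
```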

--- inline copy of patch ---

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..c6cedf4fe7c1f1e1126ce166a059a4b2a2b49cbd 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5210,19 +5211,27 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
    true if bool VAR can and should be optimized that way.  Assume it shouldn't
    in case it's a result of a comparison which can be directly vectorized into
    a vector comparison.  Fills in STMTS with all stmts visited during the
-   walk.  */
+   walk.  If COND then a gcond is being inspected instead of a normal COND.  */
 
 static bool
-check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
+check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
+		    gcond *cond)
 {
   tree rhs1;
   enum tree_code rhs_code;
+  gassign *def_stmt = NULL;
 
   stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
-  if (!def_stmt_info)
+  if (!def_stmt_info && !cond)
     return false;
+  else if (!def_stmt_info)
+    /* If we're a gcond we won't be codegen-ing the statements and are only
+       after whether the types match.  In that case we can accept loop invariant
+       values.  */
+    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
+  else
+    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
 
-  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
   if (!def_stmt)
     return false;
 
@@ -5234,27 +5243,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
   switch (rhs_code)
     {
     case SSA_NAME:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
 	return false;
       break;
 
     CASE_CONVERT:
       if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
 	return false;
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
 	return false;
       break;
 
     case BIT_NOT_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
 	return false;
       break;
 
     case BIT_AND_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts)
-	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, cond)
+	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
+				   cond))
 	return false;
       break;
 
@@ -5275,6 +5285,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
 	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
 							  TREE_TYPE (rhs1));
 	  if (mask_type
+	      && !cond
 	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
 	    return false;
 
@@ -5324,11 +5335,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
    VAR is an SSA_NAME that should be transformed from bool to a wider integer
    type, OUT_TYPE is the desired final integer type of the whole pattern.
    STMT_INFO is the info of the pattern root and is where pattern stmts should
-   be associated with.  DEFS is a map of pattern defs.  */
+   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
+   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
 
 static void
 adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
-		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
+		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
+		     gimple *&last_stmt, bool type_only)
 {
   gimple *stmt = SSA_NAME_DEF_STMT (var);
   enum tree_code rhs_code, def_rhs_code;
@@ -5492,8 +5505,10 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
     }
 
   gimple_set_location (pattern_stmt, loc);
-  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
-			  get_vectype_for_scalar_type (vinfo, itype));
+  if (!type_only)
+    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
+			    get_vectype_for_scalar_type (vinfo, itype));
+  last_stmt = pattern_stmt;
   defs.put (var, gimple_assign_lhs (pattern_stmt));
 }
 
@@ -5509,11 +5524,14 @@ sort_after_uid (const void *p1, const void *p2)
 
 /* Create pattern stmts for all stmts participating in the bool pattern
    specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
-   OUT_TYPE.  Return the def of the pattern root.  */
+   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY the new
+   statements are not emitted as pattern statements and the tree returned is
+   only useful for type queries.  */
 
 static tree
 adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
-		   tree out_type, stmt_vec_info stmt_info)
+		   tree out_type, stmt_vec_info stmt_info,
+		   bool type_only = false)
 {
   /* Gather original stmts in the bool pattern in their order of appearance
      in the IL.  */
@@ -5523,16 +5541,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
     bool_stmts.quick_push (*i);
   bool_stmts.qsort (sort_after_uid);
 
+  gimple *last_stmt = NULL;
+
   /* Now process them in that order, producing pattern stmts.  */
   hash_map <tree, tree> defs;
   for (unsigned i = 0; i < bool_stmts.length (); ++i)
     adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
-			 out_type, stmt_info, defs);
+			 out_type, stmt_info, defs, last_stmt, type_only);
 
   /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
-  gimple *pattern_stmt
-    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
-  return gimple_assign_lhs (pattern_stmt);
+  return gimple_assign_lhs (last_stmt);
 }
 
 /* Return the proper type for converting bool VAR into
@@ -5608,13 +5626,22 @@ vect_recog_bool_pattern (vec_info *vinfo,
   enum tree_code rhs_code;
   tree var, lhs, rhs, vectype;
   gimple *pattern_stmt;
-
-  if (!is_gimple_assign (last_stmt))
+  gcond* cond = NULL;
+  if (!is_gimple_assign (last_stmt)
+      && !(cond = dyn_cast <gcond *> (last_stmt)))
     return NULL;
 
-  var = gimple_assign_rhs1 (last_stmt);
-  lhs = gimple_assign_lhs (last_stmt);
-  rhs_code = gimple_assign_rhs_code (last_stmt);
+  if (is_gimple_assign (last_stmt))
+    {
+      var = gimple_assign_rhs1 (last_stmt);
+      lhs = gimple_assign_lhs (last_stmt);
+      rhs_code = gimple_assign_rhs_code (last_stmt);
+    }
+  else
+    {
+      lhs = var = gimple_cond_lhs (last_stmt);
+      rhs_code = gimple_cond_code (last_stmt);
+    }
 
   if (rhs_code == VIEW_CONVERT_EXPR)
     var = TREE_OPERAND (var, 0);
@@ -5632,7 +5659,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 	return NULL;
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
 	{
 	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				   TREE_TYPE (lhs), stmt_vinfo);
@@ -5680,7 +5707,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 
       return pattern_stmt;
     }
-  else if (rhs_code == COND_EXPR
+  else if ((rhs_code == COND_EXPR || cond)
 	   && TREE_CODE (var) == SSA_NAME)
     {
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
@@ -5700,18 +5727,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
 	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
       else if (integer_type_for_mask (var, vinfo))
 	return NULL;
 
-      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
-      pattern_stmt 
-	= gimple_build_assign (lhs, COND_EXPR,
-			       build2 (NE_EXPR, boolean_type_node,
-				       var, build_int_cst (TREE_TYPE (var), 0)),
-			       gimple_assign_rhs2 (last_stmt),
-			       gimple_assign_rhs3 (last_stmt));
+      if (!cond)
+	{
+	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+	  pattern_stmt
+	    = gimple_build_assign (lhs, COND_EXPR,
+				   build2 (NE_EXPR, boolean_type_node, var,
+					   build_int_cst (TREE_TYPE (var), 0)),
+				   gimple_assign_rhs2 (last_stmt),
+				   gimple_assign_rhs3 (last_stmt));
+	}
+      else
+	{
+	  pattern_stmt
+	    = gimple_build_cond (gimple_cond_code (cond), gimple_cond_lhs (cond),
+				 gimple_cond_rhs (cond),
+				 gimple_cond_true_label (cond),
+				 gimple_cond_false_label (cond));
+	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
+	  vectype = truth_type_for (vectype);
+	}
       *type_out = vectype;
       vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
 
@@ -5725,7 +5765,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
 	rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				 TREE_TYPE (vectype), stmt_vinfo);
       else
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..d801b72a149ebe6aa4d1f2942324b042d07be530 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,176 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype);
+
+  tree vectype_op0 = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+			   &vectype_op0))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "use not simple.\n");
+      return false;
+    }
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "can't vectorize early exit because the "
+			     "target doesn't support flag setting vector "
+			     "comparisons.\n");
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "can't vectorize early exit because the "
+			     "target does not support boolean vector OR for "
+			     "type %T.\n", vectype);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	{
+	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+					      OPTIMIZE_FOR_SPEED))
+	    return false;
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+	}
+
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+      workset.splice (stmts);
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    new_temp = stmts[0];
+
+  gcc_assert (new_temp);
+
+  tree cond = new_temp;
+  /* If we have multiple statements after reduction we should check all the
+     lanes and treat it as a full vector.  */
+  if (masked_loop_p)
+    {
+      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
+				      vectype, 0);
+      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			       &cond_gsi);
+    }
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *> (stmt);
+  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
+			     build_zero_cst (vectype));
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+  else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13123,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13148,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13131,6 +13310,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -14321,10 +14506,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      gcond *cond = NULL;
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if ((cond = dyn_cast <gcond *> (stmt)))
+	{
+	  /* We can't convert the scalar type to boolean yet, since booleans
+	     have a single bit precision and we need the vector boolean to be
+	     a representation of the integer mask.  So set the correct integer
+	     type and convert to boolean vector once we have a vectype.  */
+	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
+	}
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 
@@ -14339,12 +14533,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 			     "get vectype for scalar type: %T\n", scalar_type);
 	}
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+
       if (!vectype)
 	return opt_result::failure_at (stmt,
 				       "not vectorized:"
 				       " unsupported data-type %T\n",
 				       scalar_type);
 
+      /* If we were a gcond, convert the resulting type to a vector boolean
+	 type now that we have the correct integer mask type.  */
+      if (cond)
+	vectype = truth_type_for (vectype);
+
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
     }

[-- Attachment #2: rb17969.patch --]
[-- Type: application/octet-stream, Size: 18893 bytes --]

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..c6cedf4fe7c1f1e1126ce166a059a4b2a2b49cbd 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5210,19 +5211,27 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
    true if bool VAR can and should be optimized that way.  Assume it shouldn't
    in case it's a result of a comparison which can be directly vectorized into
    a vector comparison.  Fills in STMTS with all stmts visited during the
-   walk.  */
+   walk.  if COND then a gcond is being inspected instead of a normal COND,  */
 
 static bool
-check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
+check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
+		    gcond *cond)
 {
   tree rhs1;
   enum tree_code rhs_code;
+  gassign *def_stmt = NULL;
 
   stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
-  if (!def_stmt_info)
+  if (!def_stmt_info && !cond)
     return false;
+  else if (!def_stmt_info)
+    /* If we're a gcond we won't be codegen-ing the statements and are only
+       after if the types match.  In that case we can accept loop invariant
+       values.  */
+    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
+  else
+    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
 
-  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
   if (!def_stmt)
     return false;
 
@@ -5234,27 +5243,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
   switch (rhs_code)
     {
     case SSA_NAME:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
 	return false;
       break;
 
     CASE_CONVERT:
       if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
 	return false;
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
 	return false;
       break;
 
     case BIT_NOT_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
 	return false;
       break;
 
     case BIT_AND_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts)
-	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, cond)
+	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
+				   cond))
 	return false;
       break;
 
@@ -5275,6 +5285,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
 	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
 							  TREE_TYPE (rhs1));
 	  if (mask_type
+	      && !cond
 	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
 	    return false;
 
@@ -5324,11 +5335,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
    VAR is an SSA_NAME that should be transformed from bool to a wider integer
    type, OUT_TYPE is the desired final integer type of the whole pattern.
    STMT_INFO is the info of the pattern root and is where pattern stmts should
-   be associated with.  DEFS is a map of pattern defs.  */
+   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
+   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
 
 static void
 adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
-		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
+		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
+		     gimple *&last_stmt, bool type_only)
 {
   gimple *stmt = SSA_NAME_DEF_STMT (var);
   enum tree_code rhs_code, def_rhs_code;
@@ -5492,8 +5505,10 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
     }
 
   gimple_set_location (pattern_stmt, loc);
-  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
-			  get_vectype_for_scalar_type (vinfo, itype));
+  if (!type_only)
+    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
+			    get_vectype_for_scalar_type (vinfo, itype));
+  last_stmt = pattern_stmt;
   defs.put (var, gimple_assign_lhs (pattern_stmt));
 }
 
@@ -5509,11 +5524,14 @@ sort_after_uid (const void *p1, const void *p2)
 
 /* Create pattern stmts for all stmts participating in the bool pattern
    specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
-   OUT_TYPE.  Return the def of the pattern root.  */
+   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY the new
+   statements are not emitted as pattern statements and the tree returned is
+   only useful for type queries.  */
 
 static tree
 adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
-		   tree out_type, stmt_vec_info stmt_info)
+		   tree out_type, stmt_vec_info stmt_info,
+		   bool type_only = false)
 {
   /* Gather original stmts in the bool pattern in their order of appearance
      in the IL.  */
@@ -5523,16 +5541,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
     bool_stmts.quick_push (*i);
   bool_stmts.qsort (sort_after_uid);
 
+  gimple *last_stmt = NULL;
+
   /* Now process them in that order, producing pattern stmts.  */
   hash_map <tree, tree> defs;
   for (unsigned i = 0; i < bool_stmts.length (); ++i)
     adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
-			 out_type, stmt_info, defs);
+			 out_type, stmt_info, defs, last_stmt, type_only);
 
   /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
-  gimple *pattern_stmt
-    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
-  return gimple_assign_lhs (pattern_stmt);
+  return gimple_assign_lhs (last_stmt);
 }
 
 /* Return the proper type for converting bool VAR into
@@ -5608,13 +5626,22 @@ vect_recog_bool_pattern (vec_info *vinfo,
   enum tree_code rhs_code;
   tree var, lhs, rhs, vectype;
   gimple *pattern_stmt;
-
-  if (!is_gimple_assign (last_stmt))
+  gcond* cond = NULL;
+  if (!is_gimple_assign (last_stmt)
+      && !(cond = dyn_cast <gcond *> (last_stmt)))
     return NULL;
 
-  var = gimple_assign_rhs1 (last_stmt);
-  lhs = gimple_assign_lhs (last_stmt);
-  rhs_code = gimple_assign_rhs_code (last_stmt);
+  if (is_gimple_assign (last_stmt))
+    {
+      var = gimple_assign_rhs1 (last_stmt);
+      lhs = gimple_assign_lhs (last_stmt);
+      rhs_code = gimple_assign_rhs_code (last_stmt);
+    }
+  else
+    {
+      lhs = var = gimple_cond_lhs (last_stmt);
+      rhs_code = gimple_cond_code (last_stmt);
+    }
 
   if (rhs_code == VIEW_CONVERT_EXPR)
     var = TREE_OPERAND (var, 0);
@@ -5632,7 +5659,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 	return NULL;
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
 	{
 	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				   TREE_TYPE (lhs), stmt_vinfo);
@@ -5680,7 +5707,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 
       return pattern_stmt;
     }
-  else if (rhs_code == COND_EXPR
+  else if ((rhs_code == COND_EXPR || cond)
 	   && TREE_CODE (var) == SSA_NAME)
     {
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
@@ -5700,18 +5727,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
 	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
       else if (integer_type_for_mask (var, vinfo))
 	return NULL;
 
-      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
-      pattern_stmt 
-	= gimple_build_assign (lhs, COND_EXPR,
-			       build2 (NE_EXPR, boolean_type_node,
-				       var, build_int_cst (TREE_TYPE (var), 0)),
-			       gimple_assign_rhs2 (last_stmt),
-			       gimple_assign_rhs3 (last_stmt));
+      if (!cond)
+	{
+	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+	  pattern_stmt
+	    = gimple_build_assign (lhs, COND_EXPR,
+				   build2 (NE_EXPR, boolean_type_node, var,
+					   build_int_cst (TREE_TYPE (var), 0)),
+				   gimple_assign_rhs2 (last_stmt),
+				   gimple_assign_rhs3 (last_stmt));
+	}
+      else
+	{
+	  pattern_stmt
+	    = gimple_build_cond (gimple_cond_code (cond), gimple_cond_lhs (cond),
+				 gimple_cond_rhs (cond),
+				 gimple_cond_true_label (cond),
+				 gimple_cond_false_label (cond));
+	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
+	  vectype = truth_type_for (vectype);
+	}
       *type_out = vectype;
       vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
 
@@ -5725,7 +5765,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
 	rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				 TREE_TYPE (vectype), stmt_vinfo);
       else
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..d801b72a149ebe6aa4d1f2942324b042d07be530 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,176 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype);
+
+  tree vectype_op0 = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+			   &vectype_op0))
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "use not simple.\n");
+	return false;
+    }
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target doesn't support flag setting vector "
+			       "comparisons.\n");
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector OR for "
+			       "type %T.\n", vectype);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	{
+	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+					      OPTIMIZE_FOR_SPEED))
+	    return false;
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+	}
+
+
+      return true;
+    }
+
+  /* Tranform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+      workset.splice (stmts);
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    new_temp = stmts[0];
+
+  gcc_assert (new_temp);
+
+  tree cond = new_temp;
+  /* If we have multiple statements after reduction we should check all the
+     lanes and treat it as a full vector.  */
+  if (masked_loop_p)
+    {
+      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
+				      vectype, 0);
+      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			       &cond_gsi);
+    }
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *>(stmt);
+  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
+			     build_zero_cst (vectype));
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+  else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13123,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13148,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13131,6 +13310,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -14321,10 +14506,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      gcond *cond = NULL;
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if ((cond = dyn_cast <gcond *> (stmt)))
+	{
+	  /* We can't convert the scalar type to boolean yet, since booleans have a
+	     single bit precision and we need the vector boolean to be a
+	     representation of the integer mask.  So set the correct integer type and
+	     convert to boolean vector once we have a vectype.  */
+	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
+	}
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 
@@ -14339,12 +14533,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 			     "get vectype for scalar type: %T\n", scalar_type);
 	}
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+
       if (!vectype)
 	return opt_result::failure_at (stmt,
 				       "not vectorized:"
 				       " unsupported data-type %T\n",
 				       scalar_type);
 
+      /* If we were a gcond, convert the resulting type to a vector boolean type now
+	 that we have the correct integer mask type.  */
+      if (cond)
+	vectype = truth_type_for (vectype);
+
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
     }
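
For reference, the overall effect of this transform on a scalar early break such as "if (a[i] == key) break;" is one vector compare producing a boolean mask, followed by a branch on whether any lane of the mask is set (the "cond != 0" gcond installed above). The following is a scalar model of that semantics, illustrative only: real codegen uses the target's mask modes rather than arrays, and the names here are not GCC internals.

```cpp
#include <array>
#include <cassert>

constexpr int VF = 4;  // assumed vectorization factor for this sketch

// Model of one vector iteration of an early-exit loop: compare all
// lanes at once, then test whether any lane of the mask is set.
bool
any_lane_matches (const std::array<int, VF> &vec, int key)
{
  std::array<bool, VF> mask{};
  for (int i = 0; i < VF; ++i)   // vector compare: mask = (vec == key)
    mask[i] = (vec[i] == key);
  bool any = false;
  for (int i = 0; i < VF; ++i)   // flag-setting test: mask != 0
    any |= mask[i];
  return any;
}
```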


* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-11-27 22:49   ` Tamar Christina
@ 2023-11-29 13:50     ` Richard Biener
  2023-12-06  4:37       ` Tamar Christina
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2023-11-29 13:50 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 27 Nov 2023, Tamar Christina wrote:

> Ping
> 
> > -----Original Message-----
> > From: Tamar Christina <tamar.christina@arm.com>
> > Sent: Monday, November 6, 2023 7:40 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> > Subject: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> > codegen of exit code
> > 
> > Hi All,
> > 
> > This implements vectorizable_early_exit which is used as the codegen part of
> > vectorizing a gcond.
> > 
> > For the most part it shares the majority of the code with
> > vectorizable_comparison with the addition that it needs to be able to reduce
> > multiple resulting statements into a single one for use in the gcond, and also
> > needs to be able to perform masking on the comparisons.
> > 
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > 
> > Ok for master?
> > 
> > Thanks,
> > Tamar
> > 
> > gcc/ChangeLog:
> > 
> > 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts
> > without
> > 	lhs.
> > 	(vectorizable_early_exit): New.
> > 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> > 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support
> > gcond.
> > 
> > --- inline copy of patch --
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index
> > 36aeca60a22cfaea8d3b43348000d75de1d525c7..4809b822632279493a84
> > 3d402a833c9267bb315e 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -12475,7 +12475,7 @@ vectorizable_comparison_1 (vec_info *vinfo,
> > tree vectype,
> >    vec<tree> vec_oprnds0 = vNULL;
> >    vec<tree> vec_oprnds1 = vNULL;
> >    tree mask_type;
> > -  tree mask;
> > +  tree mask = NULL_TREE;
> > 
> >    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> >      return false;
> > @@ -12615,8 +12615,9 @@ vectorizable_comparison_1 (vec_info *vinfo,
> > tree vectype,
> >    /* Transform.  */
> > 
> >    /* Handle def.  */
> > -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> > -  mask = vect_create_destination_var (lhs, mask_type);
> > +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));  if (lhs)
> > +    mask = vect_create_destination_var (lhs, mask_type);

wrecked line-break / white-space

> > 
> >    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
> >  		     rhs1, &vec_oprnds0, vectype,
> > @@ -12630,7 +12631,10 @@ vectorizable_comparison_1 (vec_info *vinfo,
> > tree vectype,
> >        gimple *new_stmt;
> >        vec_rhs2 = vec_oprnds1[i];
> > 
> > -      new_temp = make_ssa_name (mask);
> > +      if (lhs)
> > +	new_temp = make_ssa_name (mask);
> > +      else
> > +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
> >        if (bitop1 == NOP_EXPR)
> >  	{
> >  	  new_stmt = gimple_build_assign (new_temp, code, @@ -12709,6
> > +12713,196 @@ vectorizable_comparison (vec_info *vinfo,
> >    return true;
> >  }
> > 
> > +/* Check to see if the current early break given in STMT_INFO is valid for
> > +   vectorization.  */
> > +
> > +static bool
> > +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> > +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> > +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec) {

{ goes to the next line

> > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > +  if (!loop_vinfo
> > +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> > +    return false;
> > +
> > +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_early_exit_def)
> > +    return false;
> > +
> > +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> > +    return false;
> > +
> > +  gimple_match_op op;
> > +  if (!gimple_extract_op (stmt_info->stmt, &op))
> > +    gcc_unreachable ();
> > +  gcc_assert (op.code.is_tree_code ());  auto code = tree_code
> > + (op.code);

missed line break

> > +
> > +  tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);  gcc_assert
> > + (vectype_out);

likewise.

> > +  tree var_op = op.ops[0];
> > +
> > +  /* When vectorizing things like pointer comparisons we will assume that
> > +     the VF of both operands are the same. e.g. a pointer must be compared
> > +     to a pointer.  We'll leave this up to vectorizable_comparison_1 to
> > +     check further.  */
> > +  tree vectype_op = vectype_out;
> > +  if (SSA_VAR_P (var_op))

TREE_CODE (var_op) == SSA_NAME

> > +    {
> > +      stmt_vec_info operand0_info
> > +	= loop_vinfo->lookup_stmt (SSA_NAME_DEF_STMT (var_op));

lookup_def (var_op)

> > +      if (!operand0_info)
> > +	return false;
> > +
> > +      /* If we're in a pattern get the type of the original statement.  */
> > +      if (STMT_VINFO_IN_PATTERN_P (operand0_info))
> > +	operand0_info = STMT_VINFO_RELATED_STMT (operand0_info);
> > +      vectype_op = STMT_VINFO_VECTYPE (operand0_info);
> > +    }

I think you want to use vect_is_simple_use on var_op instead, that's
the canonical way for querying operands.

> > +
> > +  tree truth_type = truth_type_for (vectype_op);  machine_mode mode =
> > + TYPE_MODE (truth_type);  int ncopies;
> > +

more line break issues ... (also below, check yourself)

shouldn't STMT_VINFO_VECTYPE already match truth_type here?  If not
it looks to be set wrongly (or shouldn't be set at all)

> > +  if (slp_node)
> > +    ncopies = 1;
> > +  else
> > +    ncopies = vect_get_num_copies (loop_vinfo, truth_type);
> > +
> > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);  bool
> > + masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > +

what about with_len?

> > +  /* Analyze only.  */
> > +  if (!vec_stmt)
> > +    {
> > +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > +	{
> > +	  if (dump_enabled_p ())
> > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			       "can't vectorize early exit because the "
> > +			       "target doesn't support flag setting vector "
> > +			       "comparisons.\n");
> > +	  return false;
> > +	}
> > +
> > +      if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))

Why NE_EXPR?  This looks wrong.  Or vectype_op is wrong if you're
emitting

 mask = op0 CMP op1;
 if (mask != 0)

I think you need to check for CMP, not NE_EXPR.

> > +	{
> > +	  if (dump_enabled_p ())
> > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			       "can't vectorize early exit because the "
> > +			       "target does not support boolean vector "
> > +			       "comparisons for type %T.\n", truth_type);
> > +	  return false;
> > +	}
> > +
> > +      if (ncopies > 1
> > +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > +	{
> > +	  if (dump_enabled_p ())
> > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			       "can't vectorize early exit because the "
> > +			       "target does not support boolean vector OR for "
> > +			       "type %T.\n", truth_type);
> > +	  return false;
> > +	}
> > +
> > +      if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
> > +				      vec_stmt, slp_node, cost_vec))
> > +	return false;

I suppose vectorizable_comparison_1 will check this again, so the above
is redundant?

> > +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> > +	vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type,
> > NULL);

LENs missing (or disabling partial vectors).
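
A scalar sketch of the requirement being pointed out here: with length-based partial vectors, lanes at or beyond the active length must not be allowed to trigger the exit, i.e. final_mask[i] = i < len ? cond[i] : false. This is a model of the constraint under assumed names, not GCC code.

```cpp
#include <array>
#include <cassert>

constexpr int VF = 4;  // assumed vectorization factor for this sketch

// Mask the early-exit condition by the active vector length so that
// inactive lanes (i >= len) never cause the exit branch to be taken.
bool
exit_taken (const std::array<bool, VF> &cond, int len)
{
  bool any = false;
  for (int i = 0; i < VF; ++i)
    any |= (i < len) && cond[i];
  return any;
}
```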

> > +      return true;
> > +    }
> > +
> > +  /* Tranform.  */
> > +
> > +  tree new_temp = NULL_TREE;
> > +  gimple *new_stmt = NULL;
> > +
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location, "transform
> > + early-exit.\n");
> > +
> > +  if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
> > +				  vec_stmt, slp_node, cost_vec))
> > +    gcc_unreachable ();
> > +
> > +  gimple *stmt = STMT_VINFO_STMT (stmt_info);  basic_block cond_bb =
> > + gimple_bb (stmt);  gimple_stmt_iterator  cond_gsi = gsi_last_bb
> > + (cond_bb);
> > +
> > +  vec<tree> stmts;
> > +
> > +  if (slp_node)
> > +    stmts = SLP_TREE_VEC_DEFS (slp_node);
> > +  else
> > +    {
> > +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> > +      stmts.create (vec_stmts.length ());
> > +      for (auto stmt : vec_stmts)
> > +	stmts.quick_push (gimple_assign_lhs (stmt));
> > +    }
>
> > +  /* Determine if we need to reduce the final value.  */
> > +  if (stmts.length () > 1)
> > +    {
> > +      /* We build the reductions in a way to maintain as much parallelism as
> > +	 possible.  */
> > +      auto_vec<tree> workset (stmts.length ());
> > +      workset.splice (stmts);
> > +      while (workset.length () > 1)
> > +	{
> > +	  new_temp = make_temp_ssa_name (truth_type, NULL,
> > "vexit_reduc");
> > +	  tree arg0 = workset.pop ();
> > +	  tree arg1 = workset.pop ();
> > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0,
> > arg1);
> > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > +				       &cond_gsi);
> > +	  if (slp_node)
> > +	    slp_node->push_vec_def (new_stmt);
> > +	  else
> > +	    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> > +	  workset.quick_insert (0, new_temp);

Reduction epilogue handling has similar code to reduce a set of vectors
to a single one with an operation.  I think we want to share that code.

> > +	}
> > +    }
> > +  else
> > +    new_temp = stmts[0];
> > +
> > +  gcc_assert (new_temp);
> > +
> > +  tree cond = new_temp;
> > +  if (masked_loop_p)
> > +    {
> > +      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
> > truth_type, 0);
> > +      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > +			       &cond_gsi);

I don't think this is correct when 'stmts' had more than one vector?

> > +    }
> > +
> > +  /* Now build the new conditional.  Pattern gimple_conds get dropped
> > during
> > +     codegen so we must replace the original insn.  */  if
> > + (is_pattern_stmt_p (stmt_info))
> > +    stmt = STMT_VINFO_STMT (STMT_VINFO_RELATED_STMT (stmt_info));

vect_original_stmt?

> > +
> > +  tree t = fold_build2 (NE_EXPR, boolean_type_node, cond,
> > +			build_zero_cst (truth_type));
> > +  t = canonicalize_cond_expr_cond (t);
> > +  gimple_cond_set_condition_from_tree ((gcond*)stmt, t);

Please use gimple_cond_set_{lhs,rhs,code} instead of going through
GENERIC.

> > +  update_stmt (stmt);
> > +
> > +  if (slp_node)
> > +    slp_node->push_vec_def (stmt);
> > +   else
> > +    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (stmt);

I don't think we need those, in fact we still have the original defs
from vectorizable_comparison_1 here?  I'd just truncate both vectors.

> > +
> > +
> > +  if (!slp_node)
> > +    *vec_stmt = stmt;

I think you leak 'stmts' for !slp

Otherwise looks good.

Richard.


> > +  return true;
> > +}
> > +
> >  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
> >     can handle all live statements in the node.  Otherwise return true
> >     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> > @@ -12928,7 +13122,9 @@ vect_analyze_stmt (vec_info *vinfo,
> >  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
> >  				  stmt_info, NULL, node)
> >  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> > -				   stmt_info, NULL, node, cost_vec));
> > +				   stmt_info, NULL, node, cost_vec)
> > +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > +				      cost_vec));
> >    else
> >      {
> >        if (bb_vinfo)
> > @@ -12951,7 +13147,10 @@ vect_analyze_stmt (vec_info *vinfo,
> >  					 NULL, NULL, node, cost_vec)
> >  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
> >  					  cost_vec)
> > -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> > +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> > +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > +					  cost_vec));
> > +
> >      }
> > 
> >    if (node)
> > @@ -13110,6 +13309,12 @@ vect_transform_stmt (vec_info *vinfo,
> >        gcc_assert (done);
> >        break;
> > 
> > +    case loop_exit_ctrl_vec_info_type:
> > +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> > +				      slp_node, NULL);
> > +      gcc_assert (done);
> > +      break;
> > +
> >      default:
> >        if (!STMT_VINFO_LIVE_P (stmt_info))
> >  	{
> > @@ -13511,7 +13716,7 @@ vect_is_simple_use (tree operand, vec_info
> > *vinfo, enum vect_def_type *dt,
> >  	case vect_first_order_recurrence:
> >  	  dump_printf (MSG_NOTE, "first order recurrence\n");
> >  	  break;
> > -       case vect_early_exit_def:
> > +	case vect_early_exit_def:
> >  	  dump_printf (MSG_NOTE, "early exit\n");
> >  	  break;
> >  	case vect_unknown_def_type:
> > 
> > 
> > 
> > 
> > --
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-11-06  7:39 ` [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code Tamar Christina
@ 2023-11-27 22:49   ` Tamar Christina
  2023-11-29 13:50     ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: Tamar Christina @ 2023-11-27 22:49 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches; +Cc: nd, rguenther, jlaw

Ping

> -----Original Message-----
> From: Tamar Christina <tamar.christina@arm.com>
> Sent: Monday, November 6, 2023 7:40 AM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> Subject: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> codegen of exit code
> 
> Hi All,
> 
> This implements vectorizable_early_exit which is used as the codegen part of
> vectorizing a gcond.
> 
> For the most part it shares the majority of the code with
> vectorizable_comparison with the addition that it needs to be able to reduce
> multiple resulting statements into a single one for use in the gcond, and also
> needs to be able to perform masking on the comparisons.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts
> without
> 	lhs.
> 	(vectorizable_early_exit): New.
> 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support
> gcond.
> 
> --- inline copy of patch --
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index
> 36aeca60a22cfaea8d3b43348000d75de1d525c7..4809b822632279493a84
> 3d402a833c9267bb315e 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12475,7 +12475,7 @@ vectorizable_comparison_1 (vec_info *vinfo,
> tree vectype,
>    vec<tree> vec_oprnds0 = vNULL;
>    vec<tree> vec_oprnds1 = vNULL;
>    tree mask_type;
> -  tree mask;
> +  tree mask = NULL_TREE;
> 
>    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>      return false;
> @@ -12615,8 +12615,9 @@ vectorizable_comparison_1 (vec_info *vinfo,
> tree vectype,
>    /* Transform.  */
> 
>    /* Handle def.  */
> -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> -  mask = vect_create_destination_var (lhs, mask_type);
> +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));  if (lhs)
> +    mask = vect_create_destination_var (lhs, mask_type);
> 
>    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
>  		     rhs1, &vec_oprnds0, vectype,
> @@ -12630,7 +12631,10 @@ vectorizable_comparison_1 (vec_info *vinfo,
> tree vectype,
>        gimple *new_stmt;
>        vec_rhs2 = vec_oprnds1[i];
> 
> -      new_temp = make_ssa_name (mask);
> +      if (lhs)
> +	new_temp = make_ssa_name (mask);
> +      else
> +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
>        if (bitop1 == NOP_EXPR)
>  	{
>  	  new_stmt = gimple_build_assign (new_temp, code, @@ -12709,6
> +12713,196 @@ vectorizable_comparison (vec_info *vinfo,
>    return true;
>  }
> 
> +/* Check to see if the current early break given in STMT_INFO is valid for
> +   vectorization.  */
> +
> +static bool
> +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec) {
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (!loop_vinfo
> +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> +    return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_early_exit_def)
> +    return false;
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +    return false;
> +
> +  gimple_match_op op;
> +  if (!gimple_extract_op (stmt_info->stmt, &op))
> +    gcc_unreachable ();
> +  gcc_assert (op.code.is_tree_code ());  auto code = tree_code
> + (op.code);
> +
> +  tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);  gcc_assert
> + (vectype_out);
> +
> +  tree var_op = op.ops[0];
> +
> +  /* When vectorizing things like pointer comparisons we will assume that
> +     the VF of both operands are the same. e.g. a pointer must be compared
> +     to a pointer.  We'll leave this up to vectorizable_comparison_1 to
> +     check further.  */
> +  tree vectype_op = vectype_out;
> +  if (SSA_VAR_P (var_op))
> +    {
> +      stmt_vec_info operand0_info
> +	= loop_vinfo->lookup_stmt (SSA_NAME_DEF_STMT (var_op));
> +      if (!operand0_info)
> +	return false;
> +
> +      /* If we're in a pattern get the type of the original statement.  */
> +      if (STMT_VINFO_IN_PATTERN_P (operand0_info))
> +	operand0_info = STMT_VINFO_RELATED_STMT (operand0_info);
> +      vectype_op = STMT_VINFO_VECTYPE (operand0_info);
> +    }
> +
> +  tree truth_type = truth_type_for (vectype_op);  machine_mode mode =
> + TYPE_MODE (truth_type);  int ncopies;
> +
> +  if (slp_node)
> +    ncopies = 1;
> +  else
> +    ncopies = vect_get_num_copies (loop_vinfo, truth_type);
> +
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);  bool
> + masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> +
> +  /* Analyze only.  */
> +  if (!vec_stmt)
> +    {
> +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target doesn't support flag setting vector "
> +			       "comparisons.\n");
> +	  return false;
> +	}
> +
> +      if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target does not support boolean vector "
> +			       "comparisons for type %T.\n", truth_type);
> +	  return false;
> +	}
> +
> +      if (ncopies > 1
> +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target does not support boolean vector OR for "
> +			       "type %T.\n", truth_type);
> +	  return false;
> +	}
> +
> +      if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
> +				      vec_stmt, slp_node, cost_vec))
> +	return false;
> +
> +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +	vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type,
> NULL);
> +
> +      return true;
> +    }
> +
> +  /* Tranform.  */
> +
> +  tree new_temp = NULL_TREE;
> +  gimple *new_stmt = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "transform
> + early-exit.\n");
> +
> +  if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
> +				  vec_stmt, slp_node, cost_vec))
> +    gcc_unreachable ();
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);  basic_block cond_bb =
> + gimple_bb (stmt);  gimple_stmt_iterator  cond_gsi = gsi_last_bb
> + (cond_bb);
> +
> +  vec<tree> stmts;
> +
> +  if (slp_node)
> +    stmts = SLP_TREE_VEC_DEFS (slp_node);
> +  else
> +    {
> +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> +      stmts.create (vec_stmts.length ());
> +      for (auto stmt : vec_stmts)
> +	stmts.quick_push (gimple_assign_lhs (stmt));
> +    }
> +
> +  /* Determine if we need to reduce the final value.  */
> +  if (stmts.length () > 1)
> +    {
> +      /* We build the reductions in a way to maintain as much parallelism as
> +	 possible.  */
> +      auto_vec<tree> workset (stmts.length ());
> +      workset.splice (stmts);
> +      while (workset.length () > 1)
> +	{
> +	  new_temp = make_temp_ssa_name (truth_type, NULL,
> "vexit_reduc");
> +	  tree arg0 = workset.pop ();
> +	  tree arg1 = workset.pop ();
> +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0,
> arg1);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> +				       &cond_gsi);
> +	  if (slp_node)
> +	    slp_node->push_vec_def (new_stmt);
> +	  else
> +	    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> +	  workset.quick_insert (0, new_temp);
> +	}
> +    }
> +  else
> +    new_temp = stmts[0];
> +
> +  gcc_assert (new_temp);
> +
> +  tree cond = new_temp;
> +  if (masked_loop_p)
> +    {
> +      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
> truth_type, 0);
> +      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> +			       &cond_gsi);
> +    }
> +
> +  /* Now build the new conditional.  Pattern gimple_conds get dropped
> during
> +     codegen so we must replace the original insn.  */  if
> + (is_pattern_stmt_p (stmt_info))
> +    stmt = STMT_VINFO_STMT (STMT_VINFO_RELATED_STMT (stmt_info));
> +
> +  tree t = fold_build2 (NE_EXPR, boolean_type_node, cond,
> +			build_zero_cst (truth_type));
> +  t = canonicalize_cond_expr_cond (t);
> +  gimple_cond_set_condition_from_tree ((gcond*)stmt, t);
> +  update_stmt (stmt);
> +
> +  if (slp_node)
> +    slp_node->push_vec_def (stmt);
> +   else
> +    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (stmt);
> +
> +
> +  if (!slp_node)
> +    *vec_stmt = stmt;
> +
> +  return true;
> +}
> +
>  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
>     can handle all live statements in the node.  Otherwise return true
>     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> @@ -12928,7 +13122,9 @@ vect_analyze_stmt (vec_info *vinfo,
>  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
>  				  stmt_info, NULL, node)
>  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> -				   stmt_info, NULL, node, cost_vec));
> +				   stmt_info, NULL, node, cost_vec)
> +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +				      cost_vec));
>    else
>      {
>        if (bb_vinfo)
> @@ -12951,7 +13147,10 @@ vect_analyze_stmt (vec_info *vinfo,
>  					 NULL, NULL, node, cost_vec)
>  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
>  					  cost_vec)
> -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +					  cost_vec));
> +
>      }
> 
>    if (node)
> @@ -13110,6 +13309,12 @@ vect_transform_stmt (vec_info *vinfo,
>        gcc_assert (done);
>        break;
> 
> +    case loop_exit_ctrl_vec_info_type:
> +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> +				      slp_node, NULL);
> +      gcc_assert (done);
> +      break;
> +
>      default:
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> @@ -13511,7 +13716,7 @@ vect_is_simple_use (tree operand, vec_info
> *vinfo, enum vect_def_type *dt,
>  	case vect_first_order_recurrence:
>  	  dump_printf (MSG_NOTE, "first order recurrence\n");
>  	  break;
> -       case vect_early_exit_def:
> +	case vect_early_exit_def:
>  	  dump_printf (MSG_NOTE, "early exit\n");
>  	  break;
>  	case vect_unknown_def_type:
> 
> 
> 
> 
> --


* [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
@ 2023-11-06  7:39 ` Tamar Christina
  2023-11-27 22:49   ` Tamar Christina
  0 siblings, 1 reply; 24+ messages in thread
From: Tamar Christina @ 2023-11-06  7:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 9830 bytes --]

Hi All,

This implements vectorizable_early_exit which is used as the codegen part of
vectorizing a gcond.

For the most part it shares the majority of the code with
vectorizable_comparison with the addition that it needs to be able to reduce
multiple resulting statements into a single one for use in the gcond, and also
needs to be able to perform masking on the comparisons.
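
The OR-reduction of the per-copy comparison results can be modelled with plain integers standing in for vector masks. This is an illustrative sketch of the "workset" scheme in the patch (pop two results, OR them, push the combination back to the front), not GCC-internal code:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Pairwise OR-reduction mirroring the workset loop in
// vectorizable_early_exit: repeatedly pop two masks off the back,
// OR them, and insert the result at the front until one remains.
// Re-queuing results at the front keeps the reduction tree balanced,
// which maintains parallelism between the OR operations.
uint64_t
reduce_exit_masks (std::deque<uint64_t> workset)
{
  while (workset.size () > 1)
    {
      uint64_t arg0 = workset.back (); workset.pop_back ();
      uint64_t arg1 = workset.back (); workset.pop_back ();
      workset.push_front (arg0 | arg1);
    }
  return workset.front ();
}
```

The single surviving mask then feeds the "cond != 0" test in the rewritten gcond.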

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
	lhs.
	(vectorizable_early_exit): New.
	(vect_analyze_stmt, vect_transform_stmt): Use it.
	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 36aeca60a22cfaea8d3b43348000d75de1d525c7..4809b822632279493a843d402a833c9267bb315e 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12475,7 +12475,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12615,8 +12615,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12630,7 +12631,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12709,6 +12713,196 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_early_exit_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  gimple_match_op op;
+  if (!gimple_extract_op (stmt_info->stmt, &op))
+    gcc_unreachable ();
+  gcc_assert (op.code.is_tree_code ());
+  auto code = tree_code (op.code);
+
+  tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype_out);
+
+  tree var_op = op.ops[0];
+
+  /* When vectorizing things like pointer comparisons we will assume that
+     the VF of both operands are the same. e.g. a pointer must be compared
+     to a pointer.  We'll leave this up to vectorizable_comparison_1 to
+     check further.  */
+  tree vectype_op = vectype_out;
+  if (SSA_VAR_P (var_op))
+    {
+      stmt_vec_info operand0_info
+	= loop_vinfo->lookup_stmt (SSA_NAME_DEF_STMT (var_op));
+      if (!operand0_info)
+	return false;
+
+      /* If we're in a pattern get the type of the original statement.  */
+      if (STMT_VINFO_IN_PATTERN_P (operand0_info))
+	operand0_info = STMT_VINFO_RELATED_STMT (operand0_info);
+      vectype_op = STMT_VINFO_VECTYPE (operand0_info);
+    }
+
+  tree truth_type = truth_type_for (vectype_op);
+  machine_mode mode = TYPE_MODE (truth_type);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, truth_type);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target doesn't support flag setting vector "
+			       "comparisons.\n");
+	  return false;
+	}
+
+      if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector "
+			       "comparisons for type %T.\n", truth_type);
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector OR for "
+			       "type %T.\n", truth_type);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type, NULL);
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
+
+  vec<tree> stmts;
+
+  if (slp_node)
+    stmts = SLP_TREE_VEC_DEFS (slp_node);
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.create (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+      workset.splice (stmts);
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (truth_type, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  if (slp_node)
+	    slp_node->push_vec_def (new_stmt);
+	  else
+	    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    new_temp = stmts[0];
+
+  gcc_assert (new_temp);
+
+  tree cond = new_temp;
+  if (masked_loop_p)
+    {
+      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, truth_type, 0);
+      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			       &cond_gsi);
+    }
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  if (is_pattern_stmt_p (stmt_info))
+    stmt = STMT_VINFO_STMT (STMT_VINFO_RELATED_STMT (stmt_info));
+
+  tree t = fold_build2 (NE_EXPR, boolean_type_node, cond,
+			build_zero_cst (truth_type));
+  t = canonicalize_cond_expr_cond (t);
+  gimple_cond_set_condition_from_tree ((gcond*)stmt, t);
+  update_stmt (stmt);
+
+  if (slp_node)
+    slp_node->push_vec_def (stmt);
+  else
+    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (stmt);
+
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12928,7 +13122,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12951,7 +13147,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13110,6 +13309,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -13511,7 +13716,7 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
 	case vect_first_order_recurrence:
 	  dump_printf (MSG_NOTE, "first order recurrence\n");
 	  break;
-       case vect_early_exit_def:
+	case vect_early_exit_def:
 	  dump_printf (MSG_NOTE, "early exit\n");
 	  break;
 	case vect_unknown_def_type:


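A note on the reduction loop above: when ncopies > 1 the lane masks are
combined with a balanced OR tree rather than a linear chain.  The workset
scheme can be sketched standalone like this (plain C++, no GCC types;
BIT_IOR_EXPR becomes `|`):

```cpp
#include <deque>
#include <vector>

// Pairwise OR-reduction mirroring the workset loop in the patch: pop
// two masks off the back, combine them, and re-insert the result at
// the front (the quick_insert (0, ...) step).
static unsigned reduce_masks (std::vector<unsigned> masks)
{
  std::deque<unsigned> workset (masks.begin (), masks.end ());
  while (workset.size () > 1)
    {
      unsigned arg0 = workset.back (); workset.pop_back ();
      unsigned arg1 = workset.back (); workset.pop_back ();
      workset.push_front (arg0 | arg1);
    }
  return workset.front ();
}
```

Re-inserting at the front keeps independent ORs adjacent, so the combine
depth stays roughly logarithmic in the number of copies instead of linear.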

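Similarly, for fully-masked loops the comparison result is ANDed with the
loop control mask (via prepare_vec_mask) before feeding the branch;
otherwise inactive tail lanes could spuriously trigger the exit.  A toy
bitmask illustration of that invariant, with made-up lane values:

```cpp
// With VF = 8 and only the low 5 lanes active in the final iteration,
// lanes 5..7 hold junk.  ANDing the compare mask with the loop control
// mask (what prepare_vec_mask arranges) keeps junk lanes from firing
// the early exit.
static unsigned safe_exit_mask (unsigned cmp_mask, unsigned loop_mask)
{
  return cmp_mask & loop_mask;
}
```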

-- 

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-12-14 18:45 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-30  3:47 [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code juzhe.zhong
2023-11-30 10:39 ` Tamar Christina
2023-11-30 10:48   ` juzhe.zhong
2023-11-30 10:58     ` Tamar Christina
  -- strict thread matches above, loose matches on Subject: below --
2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
2023-11-06  7:39 ` [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code Tamar Christina
2023-11-27 22:49   ` Tamar Christina
2023-11-29 13:50     ` Richard Biener
2023-12-06  4:37       ` Tamar Christina
2023-12-06  9:37         ` Richard Biener
2023-12-08  8:58           ` Tamar Christina
2023-12-08 10:28             ` Richard Biener
2023-12-08 13:45               ` Tamar Christina
2023-12-08 13:59                 ` Richard Biener
2023-12-08 15:01                   ` Tamar Christina
2023-12-11  7:09                   ` Tamar Christina
2023-12-11  9:36                     ` Richard Biener
2023-12-11 23:12                       ` Tamar Christina
2023-12-12 10:10                         ` Richard Biener
2023-12-12 10:27                           ` Tamar Christina
2023-12-12 10:59                           ` Richard Sandiford
2023-12-12 11:30                             ` Richard Biener
2023-12-13 14:13                               ` Tamar Christina
2023-12-14 13:12                                 ` Richard Biener
2023-12-14 18:44                                   ` Tamar Christina
