[Scalar masks 2/x] Use bool masks in if-conversion

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [Scalar masks 2/x] Use bool masks in if-conversion
@ 2015-08-17 16:27 Ilya Enkovich
  2015-08-20 19:26 ` Jeff Law
  0 siblings, 1 reply; 48+ messages in thread
From: Ilya Enkovich @ 2015-08-17 16:27 UTC (permalink / raw)
  To: gcc-patches

Hi,

This patch intoriduces a new vectorizer hook use_scalar_mask_p which affects code generated by if-conversion pass (and affects patterns in later patches).

Thanks,
Ilya
--
2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* doc/tm.texi (TARGET_VECTORIZE_USE_SCALAR_MASK_P): New.
	* doc/tm.texi.in: Regenerated.
	* target.def (use_scalar_mask_p): New.
	* tree-if-conv.c: Include target.h.
	(predicate_mem_writes): Don't convert boolean predicates into
	integer when scalar masks are used.


diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 2383fb9..a124489 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4233,6 +4233,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_DESTROY_COST_DATA
 
+@hook TARGET_VECTORIZE_USE_SCALAR_MASK_P
+
 @hook TARGET_VECTORIZE_BUILTIN_TM_LOAD
 
 @hook TARGET_VECTORIZE_BUILTIN_TM_STORE
diff --git a/gcc/target.def b/gcc/target.def
index 4edc209..0975bf3 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1855,6 +1855,15 @@ DEFHOOK
  (void *data),
  default_destroy_cost_data)
 
+/* Target function to check scalar masks support.  */
+DEFHOOK
+(use_scalar_mask_p,
+ "This hook returns 1 if vectorizer should use scalar masks instead of "
+ "vector ones for MASK_LOAD, MASK_STORE and VEC_COND_EXPR.",
+ bool,
+ (void),
+ hook_bool_void_false)
+
 HOOK_VECTOR_END (vectorize)
 
 #undef HOOK_PREFIX
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 291e602..73dcecd 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -122,6 +122,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "insn-codes.h"
 #include "optabs.h"
 #include "tree-hash-traits.h"
+#include "target.h"
 
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
@@ -2082,15 +2083,24 @@ predicate_mem_writes (loop_p loop)
 	      mask = vect_masks[index];
 	    else
 	      {
-		masktype = build_nonstandard_integer_type (bitsize, 1);
-		mask_op0 = build_int_cst (masktype, swap ? 0 : -1);
-		mask_op1 = build_int_cst (masktype, swap ? -1 : 0);
-		cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
-						   is_gimple_condexpr,
-						   NULL_TREE,
-						   true, GSI_SAME_STMT);
-		mask = fold_build_cond_expr (masktype, unshare_expr (cond),
-					     mask_op0, mask_op1);
+		if (targetm.vectorize.use_scalar_mask_p ())
+		  {
+		    masktype = boolean_type_node;
+		    mask = unshare_expr (cond);
+		  }
+		else
+		  {
+		    masktype = build_nonstandard_integer_type (bitsize, 1);
+		    mask_op0 = build_int_cst (masktype, swap ? 0 : -1);
+		    mask_op1 = build_int_cst (masktype, swap ? -1 : 0);
+		    cond = force_gimple_operand_gsi_1 (&gsi,
+						       unshare_expr (cond),
+						       is_gimple_condexpr,
+						       NULL_TREE,
+						       true, GSI_SAME_STMT);
+		    mask = fold_build_cond_expr (masktype, unshare_expr (cond),
+						 mask_op0, mask_op1);
+		  }
 		mask = ifc_temp_var (masktype, mask, &gsi);
 		/* Save mask and its size for further use.  */
 	        vect_sizes.safe_push (bitsize);

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-17 16:27 [Scalar masks 2/x] Use bool masks in if-conversion Ilya Enkovich
@ 2015-08-20 19:26 ` Jeff Law
  2015-08-21  8:32   ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Jeff Law @ 2015-08-20 19:26 UTC (permalink / raw)
  To: Ilya Enkovich, gcc-patches

On 08/17/2015 10:25 AM, Ilya Enkovich wrote:
> Hi,
>
> This patch intoriduces a new vectorizer hook use_scalar_mask_p which affects code generated by if-conversion pass (and affects patterns in later patches).
>
> Thanks,
> Ilya
> --
> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>
> 	* doc/tm.texi (TARGET_VECTORIZE_USE_SCALAR_MASK_P): New.
> 	* doc/tm.texi.in: Regenerated.
> 	* target.def (use_scalar_mask_p): New.
> 	* tree-if-conv.c: Include target.h.
> 	(predicate_mem_writes): Don't convert boolean predicates into
> 	integer when scalar masks are used.
Presumably this is how you prevent the generation of scalar masks rather 
than boolean masks on targets which don't have the former?

I hate to ask, but how painful would it be to go from a boolean to 
integer masks later such as during expansion?  Or vice-versa.

WIthout a deep knowledge of the entire patchkit, it feels like we're 
introducing target stuff in a place where we don't want it and that we'd 
be better served with a canonical representation through gimple, then 
dropping into something more target specific during gimple->rtl expansion.


Jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-20 19:26 ` Jeff Law
@ 2015-08-21  8:32   ` Richard Biener
  2015-08-21 10:52     ` Ilya Enkovich
  2015-08-21 15:57     ` Jeff Law
  0 siblings, 2 replies; 48+ messages in thread
From: Richard Biener @ 2015-08-21  8:32 UTC (permalink / raw)
  To: Jeff Law; +Cc: Ilya Enkovich, GCC Patches

On Thu, Aug 20, 2015 at 8:46 PM, Jeff Law <law@redhat.com> wrote:
> On 08/17/2015 10:25 AM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> This patch intoriduces a new vectorizer hook use_scalar_mask_p which
>> affects code generated by if-conversion pass (and affects patterns in later
>> patches).
>>
>> Thanks,
>> Ilya
>> --
>> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>>
>>         * doc/tm.texi (TARGET_VECTORIZE_USE_SCALAR_MASK_P): New.
>>         * doc/tm.texi.in: Regenerated.
>>         * target.def (use_scalar_mask_p): New.
>>         * tree-if-conv.c: Include target.h.
>>         (predicate_mem_writes): Don't convert boolean predicates into
>>         integer when scalar masks are used.
>
> Presumably this is how you prevent the generation of scalar masks rather
> than boolean masks on targets which don't have the former?
>
> I hate to ask, but how painful would it be to go from a boolean to integer
> masks later such as during expansion?  Or vice-versa.
>
> WIthout a deep knowledge of the entire patchkit, it feels like we're
> introducing target stuff in a place where we don't want it and that we'd be
> better served with a canonical representation through gimple, then dropping
> into something more target specific during gimple->rtl expansion.

Indeed.  I don't remember my exact comments during the talk at the Cauldron
but the scheme used there was sth like

  mask = GEN_MASK <vec1 < vec2>;
  b = a + 1;
  x = VEC_COND <mask, a, b>

to model conditional execution already at the if-conversion stage (for
all scalar
stmts made executed unconditionally rather than just the PHI results).  I was
asking for the condition to be removed from GEN_MASK (patch 1 has this
fixed now AFAICS).  And I also asked why it was necessary to do this "lowering"
here and not simply do

mask = vec1 < vec2;  // regular vector mask!
b = a + 1;
x = VEC_COND <mask, a, b>

and have the lowering to an integer mask done later.  You'd still
change if-conversion
to predicate _all_ statements, not just those with side-effects.  So I
think there
still needs to be a new target hook to trigger this, similar to how
the target capabilities
trigger the masked load/store path in if-conversion.

But I don't like changing our IL so much as to allow 'integer' masks everywhere.

Richard.

>
> Jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-21  8:32   ` Richard Biener
@ 2015-08-21 10:52     ` Ilya Enkovich
  2015-08-21 11:15       ` Richard Biener
  2015-08-25 21:42       ` [Scalar masks 2/x] Use bool masks in if-conversion Jeff Law
  2015-08-21 15:57     ` Jeff Law
  1 sibling, 2 replies; 48+ messages in thread
From: Ilya Enkovich @ 2015-08-21 10:52 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, GCC Patches

2015-08-21 11:15 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Thu, Aug 20, 2015 at 8:46 PM, Jeff Law <law@redhat.com> wrote:
>> On 08/17/2015 10:25 AM, Ilya Enkovich wrote:
>>>
>>> Hi,
>>>
>>> This patch intoriduces a new vectorizer hook use_scalar_mask_p which
>>> affects code generated by if-conversion pass (and affects patterns in later
>>> patches).
>>>
>>> Thanks,
>>> Ilya
>>> --
>>> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>>>
>>>         * doc/tm.texi (TARGET_VECTORIZE_USE_SCALAR_MASK_P): New.
>>>         * doc/tm.texi.in: Regenerated.
>>>         * target.def (use_scalar_mask_p): New.
>>>         * tree-if-conv.c: Include target.h.
>>>         (predicate_mem_writes): Don't convert boolean predicates into
>>>         integer when scalar masks are used.
>>
>> Presumably this is how you prevent the generation of scalar masks rather
>> than boolean masks on targets which don't have the former?
>>
>> I hate to ask, but how painful would it be to go from a boolean to integer
>> masks later such as during expansion?  Or vice-versa.
>>
>> WIthout a deep knowledge of the entire patchkit, it feels like we're
>> introducing target stuff in a place where we don't want it and that we'd be
>> better served with a canonical representation through gimple, then dropping
>> into something more target specific during gimple->rtl expansion.

I want a work with bitmasks to be expressed in a natural way using
regular integer operations. Currently all masks manipulations are
emulated via vector statements (mostly using a bunch of vec_cond). For
complex predicates it may be nontrivial to transform it back to scalar
masks and get an efficient code. Also the same vector may be used as
both a mask and an integer vector. Things become more complex if you
additionally have broadcasts and vector pack/unpack code. It also
should be transformed into a scalar masks manipulations somehow.

Also according to vector ABI integer mask should be used for mask
operand in case of masked vector call.

Current implementation of masked loads, masked stores and bool
patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
really call it a canonical representation for all targets? Using
scalar masks everywhere should probably cause the same conversion
problem for SSE I listed above though.

Talking about a canonical representation, shouldn't we use some
special masks representation and not mixing it with integer and vector
of integers then? Only in this case target would be able to
efficiently expand it into a corresponding rtl.

>
> Indeed.  I don't remember my exact comments during the talk at the Cauldron
> but the scheme used there was sth like
>
>   mask = GEN_MASK <vec1 < vec2>;
>   b = a + 1;
>   x = VEC_COND <mask, a, b>
>
> to model conditional execution already at the if-conversion stage (for
> all scalar
> stmts made executed unconditionally rather than just the PHI results).  I was
> asking for the condition to be removed from GEN_MASK (patch 1 has this
> fixed now AFAICS).  And I also asked why it was necessary to do this "lowering"
> here and not simply do
>
> mask = vec1 < vec2;  // regular vector mask!
> b = a + 1;
> x = VEC_COND <mask, a, b>
>
> and have the lowering to an integer mask done later.  You'd still
> change if-conversion
> to predicate _all_ statements, not just those with side-effects.  So I
> think there
> still needs to be a new target hook to trigger this, similar to how
> the target capabilities
> trigger the masked load/store path in if-conversion.

I think you mix scalar masks with a loop reminders optimization. I'm
not going to do other changes in if-conversion other then in this
posted patch to support scalar masks. Statements predication will be
used to vectorize loop reminders. And not all of them, only reduction
definitions. This will be independent from scalar masks and will work
for vector masks also. And these changes are not going to be in
if-conversion.

Thanks,
Ilya

>
> But I don't like changing our IL so much as to allow 'integer' masks everywhere.
>
> Richard.
>
>
>>
>> Jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-21 10:52     ` Ilya Enkovich
@ 2015-08-21 11:15       ` Richard Biener
  2015-08-21 12:19         ` Ilya Enkovich
  2015-08-25 21:42       ` [Scalar masks 2/x] Use bool masks in if-conversion Jeff Law
  1 sibling, 1 reply; 48+ messages in thread
From: Richard Biener @ 2015-08-21 11:15 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: Jeff Law, GCC Patches

On Fri, Aug 21, 2015 at 12:49 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2015-08-21 11:15 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Thu, Aug 20, 2015 at 8:46 PM, Jeff Law <law@redhat.com> wrote:
>>> On 08/17/2015 10:25 AM, Ilya Enkovich wrote:
>>>>
>>>> Hi,
>>>>
>>>> This patch intoriduces a new vectorizer hook use_scalar_mask_p which
>>>> affects code generated by if-conversion pass (and affects patterns in later
>>>> patches).
>>>>
>>>> Thanks,
>>>> Ilya
>>>> --
>>>> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>>>>
>>>>         * doc/tm.texi (TARGET_VECTORIZE_USE_SCALAR_MASK_P): New.
>>>>         * doc/tm.texi.in: Regenerated.
>>>>         * target.def (use_scalar_mask_p): New.
>>>>         * tree-if-conv.c: Include target.h.
>>>>         (predicate_mem_writes): Don't convert boolean predicates into
>>>>         integer when scalar masks are used.
>>>
>>> Presumably this is how you prevent the generation of scalar masks rather
>>> than boolean masks on targets which don't have the former?
>>>
>>> I hate to ask, but how painful would it be to go from a boolean to integer
>>> masks later such as during expansion?  Or vice-versa.
>>>
>>> WIthout a deep knowledge of the entire patchkit, it feels like we're
>>> introducing target stuff in a place where we don't want it and that we'd be
>>> better served with a canonical representation through gimple, then dropping
>>> into something more target specific during gimple->rtl expansion.
>
> I want a work with bitmasks to be expressed in a natural way using
> regular integer operations. Currently all masks manipulations are
> emulated via vector statements (mostly using a bunch of vec_cond). For
> complex predicates it may be nontrivial to transform it back to scalar
> masks and get an efficient code. Also the same vector may be used as
> both a mask and an integer vector. Things become more complex if you
> additionally have broadcasts and vector pack/unpack code. It also
> should be transformed into a scalar masks manipulations somehow.

Hmm, I don't see how vector masks are more difficult to operate with.

> Also according to vector ABI integer mask should be used for mask
> operand in case of masked vector call.

What ABI?  The function signature of the intrinsics?  How would that
come into play here?

> Current implementation of masked loads, masked stores and bool
> patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
> really call it a canonical representation for all targets?

No idea - we'll revisit when another targets adds a similar capability.

> Using scalar masks everywhere should probably cause the same conversion
> problem for SSE I listed above though.
>
> Talking about a canonical representation, shouldn't we use some
> special masks representation and not mixing it with integer and vector
> of integers then? Only in this case target would be able to
> efficiently expand it into a corresponding rtl.

That was my idea of vector<bool> ... but I didn't explore it and see where
it will cause issues.

Fact is GCC already copes with vector masks generated by vector compares
just fine everywhere and I'd rather leave it as that.

>>
>> Indeed.  I don't remember my exact comments during the talk at the Cauldron
>> but the scheme used there was sth like
>>
>>   mask = GEN_MASK <vec1 < vec2>;
>>   b = a + 1;
>>   x = VEC_COND <mask, a, b>
>>
>> to model conditional execution already at the if-conversion stage (for
>> all scalar
>> stmts made executed unconditionally rather than just the PHI results).  I was
>> asking for the condition to be removed from GEN_MASK (patch 1 has this
>> fixed now AFAICS).  And I also asked why it was necessary to do this "lowering"
>> here and not simply do
>>
>> mask = vec1 < vec2;  // regular vector mask!
>> b = a + 1;
>> x = VEC_COND <mask, a, b>
>>
>> and have the lowering to an integer mask done later.  You'd still
>> change if-conversion
>> to predicate _all_ statements, not just those with side-effects.  So I
>> think there
>> still needs to be a new target hook to trigger this, similar to how
>> the target capabilities
>> trigger the masked load/store path in if-conversion.
>
> I think you mix scalar masks with a loop reminders optimization. I'm
> not going to do other changes in if-conversion other then in this
> posted patch to support scalar masks. Statements predication will be
> used to vectorize loop reminders. And not all of them, only reduction
> definitions. This will be independent from scalar masks and will work
> for vector masks also. And these changes are not going to be in
> if-conversion.

Maybe I misremember.  Didn't look at the patch in detail yet.

Richard.

>
> Thanks,
> Ilya
>
>>
>> But I don't like changing our IL so much as to allow 'integer' masks everywhere.
>>
>> Richard.
>>
>>
>>>
>>> Jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-21 11:15       ` Richard Biener
@ 2015-08-21 12:19         ` Ilya Enkovich
  2015-08-25 21:40           ` Jeff Law
  2015-08-26 13:09           ` Richard Biener
  0 siblings, 2 replies; 48+ messages in thread
From: Ilya Enkovich @ 2015-08-21 12:19 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, GCC Patches

2015-08-21 14:00 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Fri, Aug 21, 2015 at 12:49 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> 2015-08-21 11:15 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Thu, Aug 20, 2015 at 8:46 PM, Jeff Law <law@redhat.com> wrote:
>>>> On 08/17/2015 10:25 AM, Ilya Enkovich wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> This patch intoriduces a new vectorizer hook use_scalar_mask_p which
>>>>> affects code generated by if-conversion pass (and affects patterns in later
>>>>> patches).
>>>>>
>>>>> Thanks,
>>>>> Ilya
>>>>> --
>>>>> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>>>>>
>>>>>         * doc/tm.texi (TARGET_VECTORIZE_USE_SCALAR_MASK_P): New.
>>>>>         * doc/tm.texi.in: Regenerated.
>>>>>         * target.def (use_scalar_mask_p): New.
>>>>>         * tree-if-conv.c: Include target.h.
>>>>>         (predicate_mem_writes): Don't convert boolean predicates into
>>>>>         integer when scalar masks are used.
>>>>
>>>> Presumably this is how you prevent the generation of scalar masks rather
>>>> than boolean masks on targets which don't have the former?
>>>>
>>>> I hate to ask, but how painful would it be to go from a boolean to integer
>>>> masks later such as during expansion?  Or vice-versa.
>>>>
>>>> WIthout a deep knowledge of the entire patchkit, it feels like we're
>>>> introducing target stuff in a place where we don't want it and that we'd be
>>>> better served with a canonical representation through gimple, then dropping
>>>> into something more target specific during gimple->rtl expansion.
>>
>> I want a work with bitmasks to be expressed in a natural way using
>> regular integer operations. Currently all masks manipulations are
>> emulated via vector statements (mostly using a bunch of vec_cond). For
>> complex predicates it may be nontrivial to transform it back to scalar
>> masks and get an efficient code. Also the same vector may be used as
>> both a mask and an integer vector. Things become more complex if you
>> additionally have broadcasts and vector pack/unpack code. It also
>> should be transformed into a scalar masks manipulations somehow.
>
> Hmm, I don't see how vector masks are more difficult to operate with.

There are just no instructions for that but you have to pretend you
have to get code vectorized.

>
>> Also according to vector ABI integer mask should be used for mask
>> operand in case of masked vector call.
>
> What ABI?  The function signature of the intrinsics?  How would that
> come into play here?

Not intrinsics. I mean OpenMP vector functions which require integer
arg for a mask in case of 512-bit vector.

>
>> Current implementation of masked loads, masked stores and bool
>> patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
>> really call it a canonical representation for all targets?
>
> No idea - we'll revisit when another targets adds a similar capability.

AVX-512 is such target. Current representation forces multiple scalar
mask -> vector mask and back transformations which are artificially
introduced by current bool patterns and are hard to optimize out.

>
>> Using scalar masks everywhere should probably cause the same conversion
>> problem for SSE I listed above though.
>>
>> Talking about a canonical representation, shouldn't we use some
>> special masks representation and not mixing it with integer and vector
>> of integers then? Only in this case target would be able to
>> efficiently expand it into a corresponding rtl.
>
> That was my idea of vector<bool> ... but I didn't explore it and see where
> it will cause issues.
>
> Fact is GCC already copes with vector masks generated by vector compares
> just fine everywhere and I'd rather leave it as that.

Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 ..
0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
additional vec_cond. I don't think vectorizer ever generates vector
comparison.

And I wouldn't say it's fine 'everywhere' because there is a single
target utilizing them. Masked loads and stored for AVX-512 just don't
work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
512-bit vector then we get an ugly inefficient code. The question is
where to fight with this inefficiency: in RTL or in GIMPLE. I want to
fight with it where it appears, i.e. in GIMPLE by preventing bool ->
int conversions applied everywhere even if target doesn't need it.

If we don't want to support both types of masks in GIMPLE then it's
more reasonable to make bool -> int conversion in expand for targets
requiring it, rather than do it for everyone and then leave it to
target to transform it back and try to get rid of all those redundant
transformations. I'd give vector<bool> a chance to become a canonical
mask representation for that.


Thanks,
Ilya

>
>>>
>>> Indeed.  I don't remember my exact comments during the talk at the Cauldron
>>> but the scheme used there was sth like
>>>
>>>   mask = GEN_MASK <vec1 < vec2>;
>>>   b = a + 1;
>>>   x = VEC_COND <mask, a, b>
>>>
>>> to model conditional execution already at the if-conversion stage (for
>>> all scalar
>>> stmts made executed unconditionally rather than just the PHI results).  I was
>>> asking for the condition to be removed from GEN_MASK (patch 1 has this
>>> fixed now AFAICS).  And I also asked why it was necessary to do this "lowering"
>>> here and not simply do
>>>
>>> mask = vec1 < vec2;  // regular vector mask!
>>> b = a + 1;
>>> x = VEC_COND <mask, a, b>
>>>
>>> and have the lowering to an integer mask done later.  You'd still
>>> change if-conversion
>>> to predicate _all_ statements, not just those with side-effects.  So I
>>> think there
>>> still needs to be a new target hook to trigger this, similar to how
>>> the target capabilities
>>> trigger the masked load/store path in if-conversion.
>>
>> I think you mix scalar masks with a loop reminders optimization. I'm
>> not going to do other changes in if-conversion other then in this
>> posted patch to support scalar masks. Statements predication will be
>> used to vectorize loop reminders. And not all of them, only reduction
>> definitions. This will be independent from scalar masks and will work
>> for vector masks also. And these changes are not going to be in
>> if-conversion.
>
> Maybe I misremember.  Didn't look at the patch in detail yet.
>
> Richard.
>
>>
>> Thanks,
>> Ilya
>>
>>>
>>> But I don't like changing our IL so much as to allow 'integer' masks everywhere.
>>>
>>> Richard.
>>>
>>>
>>>>
>>>> Jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-21  8:32   ` Richard Biener
  2015-08-21 10:52     ` Ilya Enkovich
@ 2015-08-21 15:57     ` Jeff Law
  1 sibling, 0 replies; 48+ messages in thread
From: Jeff Law @ 2015-08-21 15:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: Ilya Enkovich, GCC Patches

On 08/21/2015 02:15 AM, Richard Biener wrote:
>
> Indeed.  I don't remember my exact comments during the talk at the
> Cauldron but the scheme used there was sth like
>
> mask = GEN_MASK <vec1 < vec2>; b = a + 1; x = VEC_COND <mask, a, b>
>
> to model conditional execution already at the if-conversion stage
> (for all scalar stmts made executed unconditionally rather than just
> the PHI results).  I was asking for the condition to be removed from
> GEN_MASK (patch 1 has this fixed now AFAICS).  And I also asked why
> it was necessary to do this "lowering" here and not simply do
>
> mask = vec1 < vec2;  // regular vector mask! b = a + 1; x = VEC_COND
> <mask, a, b>
>
> and have the lowering to an integer mask done later.  You'd still
> change if-conversion to predicate _all_ statements, not just those
> with side-effects.  So I think there still needs to be a new target
> hook to trigger this, similar to how the target capabilities trigger
> the masked load/store path in if-conversion.
>
> But I don't like changing our IL so much as to allow 'integer' masks
> everywhere.
Right.  I'd be *much* less concerned with a hook that tells the 
if-converter to predicate everything and for the expander to do the 
lowering.

We'd still have the same representation through gimple, what changes is 
how many statements have predication and the lowering from gimple to RTL.

Contrast to a introducing a totally new way to represent predication in 
if-conversion and beyond, which just seems like a nightmare.

jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-21 12:19         ` Ilya Enkovich
@ 2015-08-25 21:40           ` Jeff Law
  2015-08-26 11:13             ` Ilya Enkovich
  2015-08-26 13:09           ` Richard Biener
  1 sibling, 1 reply; 48+ messages in thread
From: Jeff Law @ 2015-08-25 21:40 UTC (permalink / raw)
  To: Ilya Enkovich, Richard Biener; +Cc: GCC Patches

On 08/21/2015 06:17 AM, Ilya Enkovich wrote:
>>
>> Hmm, I don't see how vector masks are more difficult to operate with.
>
> There are just no instructions for that but you have to pretend you
> have to get code vectorized.
>
>>
>>> Also according to vector ABI integer mask should be used for mask
>>> operand in case of masked vector call.
>>
>> What ABI?  The function signature of the intrinsics?  How would that
>> come into play here?
>
> Not intrinsics. I mean OpenMP vector functions which require integer
> arg for a mask in case of 512-bit vector.
That's what I assumed -- you can pass in a mask as an argument and it's 
supposed to be a simple integer, right?


>
>>
>>> Current implementation of masked loads, masked stores and bool
>>> patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
>>> really call it a canonical representation for all targets?
>>
>> No idea - we'll revisit when another targets adds a similar capability.
>
> AVX-512 is such target. Current representation forces multiple scalar
> mask -> vector mask and back transformations which are artificially
> introduced by current bool patterns and are hard to optimize out.
I'm a bit surprised they're so prevalent and hard to optimize away. 
ISTM PRE ought to handle this kind of thing with relative ease.


>> Fact is GCC already copes with vector masks generated by vector compares
>> just fine everywhere and I'd rather leave it as that.
>
> Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 ..
> 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
> additional vec_cond. I don't think vectorizer ever generates vector
> comparison.
>
> And I wouldn't say it's fine 'everywhere' because there is a single
> target utilizing them. Masked loads and stored for AVX-512 just don't
> work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
> 512-bit vector then we get an ugly inefficient code. The question is
> where to fight with this inefficiency: in RTL or in GIMPLE. I want to
> fight with it where it appears, i.e. in GIMPLE by preventing bool ->
> int conversions applied everywhere even if target doesn't need it.
You should expect pushback anytime target dependencies are added to 
gimple, even if it's stuff in the vectorizer, which is infested with 
target dependencies.

>
> If we don't want to support both types of masks in GIMPLE then it's
> more reasonable to make bool -> int conversion in expand for targets
> requiring it, rather than do it for everyone and then leave it to
> target to transform it back and try to get rid of all those redundant
> transformations. I'd give vector<bool> a chance to become a canonical
> mask representation for that.
Might be worth some experimentation.

Jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-21 10:52     ` Ilya Enkovich
  2015-08-21 11:15       ` Richard Biener
@ 2015-08-25 21:42       ` Jeff Law
  2015-08-26 11:14         ` Ilya Enkovich
  1 sibling, 1 reply; 48+ messages in thread
From: Jeff Law @ 2015-08-25 21:42 UTC (permalink / raw)
  To: Ilya Enkovich, Richard Biener; +Cc: GCC Patches

On 08/21/2015 04:49 AM, Ilya Enkovich wrote:
>
> I want a work with bitmasks to be expressed in a natural way using
> regular integer operations. Currently all masks manipulations are
> emulated via vector statements (mostly using a bunch of vec_cond). For
> complex predicates it may be nontrivial to transform it back to scalar
> masks and get an efficient code. Also the same vector may be used as
> both a mask and an integer vector. Things become more complex if you
> additionally have broadcasts and vector pack/unpack code. It also
> should be transformed into a scalar masks manipulations somehow.
Or why not model the conversion at the gimple level using a 
CONVERT_EXPR?   In fact, the more I think about it, that seems to make 
more sense to me.

We pick a canonical form for the mask, whatever it may be.  We use that 
canonical form and model conversions between it and the other form via 
CONVERT_EXPR.  We then let DOM/PRE find/eliminate the redundant 
conversions.  If it's not up to the task, we should really look into why 
and resolve.

Yes, that does mean we have two forms which I'm not terribly happy about 
and it means some target dependencies on what the masked vector 
operation looks like (ie, does it accept a simple integer or vector 
mask), but I'm starting to wonder if, as distasteful as I find it, it's 
the right thing to do.

>>
>> But I don't like changing our IL so much as to allow 'integer' masks everywhere.
I'm warming up to that idea...

jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-25 21:40           ` Jeff Law
@ 2015-08-26 11:13             ` Ilya Enkovich
  0 siblings, 0 replies; 48+ messages in thread
From: Ilya Enkovich @ 2015-08-26 11:13 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, GCC Patches

2015-08-26 0:26 GMT+03:00 Jeff Law <law@redhat.com>:
> On 08/21/2015 06:17 AM, Ilya Enkovich wrote:
>>>
>>>
>>> Hmm, I don't see how vector masks are more difficult to operate with.
>>
>>
>> There are just no instructions for that but you have to pretend you
>> have to get code vectorized.
>>
>>>
>>>> Also according to vector ABI integer mask should be used for mask
>>>> operand in case of masked vector call.
>>>
>>>
>>> What ABI?  The function signature of the intrinsics?  How would that
>>> come into play here?
>>
>>
>> Not intrinsics. I mean OpenMP vector functions which require integer
>> arg for a mask in case of 512-bit vector.
>
> That's what I assumed -- you can pass in a mask as an argument and it's
> supposed to be a simple integer, right?

Depending on target ABI requires either vector mask or a simple integer value.

>
>
>>
>>>
>>>> Current implementation of masked loads, masked stores and bool
>>>> patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
>>>> really call it a canonical representation for all targets?
>>>
>>>
>>> No idea - we'll revisit when another targets adds a similar capability.
>>
>>
>> AVX-512 is such target. Current representation forces multiple scalar
>> mask -> vector mask and back transformations which are artificially
>> introduced by current bool patterns and are hard to optimize out.
>
> I'm a bit surprised they're so prevalent and hard to optimize away. ISTM PRE
> ought to handle this kind of thing with relative ease.

Most of vector comparisons are UNSPEC. And I doubt PRE may actually
help much even if get rid of UNSPEC somehow. Is there really a
redundancy in:

if ((v1 cmp v2) && (v3 cmp v4))
  load

v1 cmp v2 -> mask1
select mask1 vec_cst_-1 vec_cst_0 -> vec_mask1
v3 cmp v4 -> mask2
select mask2 vec_mask1 vec_cst_0 -> vec_mask2
vec_mask2 NE vec_cst_0 -> mask3
load by mask3

It looks to me more like a i386 specific instruction selection problem.

Ilya

>
>
>>> Fact is GCC already copes with vector masks generated by vector compares
>>> just fine everywhere and I'd rather leave it as that.
>>
>>
>> Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 ..
>> 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
>> additional vec_cond. I don't think vectorizer ever generates vector
>> comparison.
>>
>> And I wouldn't say it's fine 'everywhere' because there is a single
>> target utilizing them. Masked loads and stored for AVX-512 just don't
>> work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
>> 512-bit vector then we get an ugly inefficient code. The question is
>> where to fight with this inefficiency: in RTL or in GIMPLE. I want to
>> fight with it where it appears, i.e. in GIMPLE by preventing bool ->
>> int conversions applied everywhere even if target doesn't need it.
>
> You should expect pushback anytime target dependencies are added to gimple,
> even if it's stuff in the vectorizer, which is infested with target
> dependencies.
>
>>
>> If we don't want to support both types of masks in GIMPLE then it's
>> more reasonable to make bool -> int conversion in expand for targets
>> requiring it, rather than do it for everyone and then leave it to
>> target to transform it back and try to get rid of all those redundant
>> transformations. I'd give vector<bool> a chance to become a canonical
>> mask representation for that.
>
> Might be worth some experimentation.
>
> Jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-25 21:42       ` [Scalar masks 2/x] Use bool masks in if-conversion Jeff Law
@ 2015-08-26 11:14         ` Ilya Enkovich
  2015-08-26 13:12           ` Richard Biener
  2015-08-26 16:58           ` Jeff Law
  0 siblings, 2 replies; 48+ messages in thread
From: Ilya Enkovich @ 2015-08-26 11:14 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, GCC Patches

2015-08-26 0:42 GMT+03:00 Jeff Law <law@redhat.com>:
> On 08/21/2015 04:49 AM, Ilya Enkovich wrote:
>>
>>
>> I want a work with bitmasks to be expressed in a natural way using
>> regular integer operations. Currently all masks manipulations are
>> emulated via vector statements (mostly using a bunch of vec_cond). For
>> complex predicates it may be nontrivial to transform it back to scalar
>> masks and get an efficient code. Also the same vector may be used as
>> both a mask and an integer vector. Things become more complex if you
>> additionally have broadcasts and vector pack/unpack code. It also
>> should be transformed into a scalar masks manipulations somehow.
>
> Or why not model the conversion at the gimple level using a CONVERT_EXPR?
> In fact, the more I think about it, that seems to make more sense to me.
>
> We pick a canonical form for the mask, whatever it may be.  We use that
> canonical form and model conversions between it and the other form via
> CONVERT_EXPR.  We then let DOM/PRE find/eliminate the redundant conversions.
> If it's not up to the task, we should really look into why and resolve.
>
> Yes, that does mean we have two forms which I'm not terribly happy about and
> it means some target dependencies on what the masked vector operation looks
> like (ie, does it accept a simple integer or vector mask), but I'm starting
> to wonder if, as distasteful as I find it, it's the right thing to do.

If we have some special representation for masks in GIMPLE then we
might not need any conversions. We could ask a target to define a MODE
for this type and use it directly everywhere: directly compare into
it, use it directly for masked loads and stores, AND, IOR, EQ etc. If
that type is reserved for masks usage then you previous suggestion to
transform masks into target specific form at GIMPLE->RTL phase should
work fine. This would allow to support only a single masks
representation in GIMPLE.

Thanks,
Ilya

>
>>>
>>> But I don't like changing our IL so much as to allow 'integer' masks
>>> everywhere.
>
> I'm warming up to that idea...
>
> jeff
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-21 12:19         ` Ilya Enkovich
  2015-08-25 21:40           ` Jeff Law
@ 2015-08-26 13:09           ` Richard Biener
  2015-08-26 13:21             ` Jakub Jelinek
  2015-08-26 14:51             ` Ilya Enkovich
  1 sibling, 2 replies; 48+ messages in thread
From: Richard Biener @ 2015-08-26 13:09 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: Jeff Law, GCC Patches

On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2015-08-21 14:00 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Fri, Aug 21, 2015 at 12:49 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> 2015-08-21 11:15 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Thu, Aug 20, 2015 at 8:46 PM, Jeff Law <law@redhat.com> wrote:
>>>>> On 08/17/2015 10:25 AM, Ilya Enkovich wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This patch intoriduces a new vectorizer hook use_scalar_mask_p which
>>>>>> affects code generated by if-conversion pass (and affects patterns in later
>>>>>> patches).
>>>>>>
>>>>>> Thanks,
>>>>>> Ilya
>>>>>> --
>>>>>> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>>>>>>
>>>>>>         * doc/tm.texi (TARGET_VECTORIZE_USE_SCALAR_MASK_P): New.
>>>>>>         * doc/tm.texi.in: Regenerated.
>>>>>>         * target.def (use_scalar_mask_p): New.
>>>>>>         * tree-if-conv.c: Include target.h.
>>>>>>         (predicate_mem_writes): Don't convert boolean predicates into
>>>>>>         integer when scalar masks are used.
>>>>>
>>>>> Presumably this is how you prevent the generation of scalar masks rather
>>>>> than boolean masks on targets which don't have the former?
>>>>>
>>>>> I hate to ask, but how painful would it be to go from a boolean to integer
>>>>> masks later such as during expansion?  Or vice-versa.
>>>>>
>>>>> WIthout a deep knowledge of the entire patchkit, it feels like we're
>>>>> introducing target stuff in a place where we don't want it and that we'd be
>>>>> better served with a canonical representation through gimple, then dropping
>>>>> into something more target specific during gimple->rtl expansion.
>>>
>>> I want a work with bitmasks to be expressed in a natural way using
>>> regular integer operations. Currently all masks manipulations are
>>> emulated via vector statements (mostly using a bunch of vec_cond). For
>>> complex predicates it may be nontrivial to transform it back to scalar
>>> masks and get an efficient code. Also the same vector may be used as
>>> both a mask and an integer vector. Things become more complex if you
>>> additionally have broadcasts and vector pack/unpack code. It also
>>> should be transformed into a scalar masks manipulations somehow.
>>
>> Hmm, I don't see how vector masks are more difficult to operate with.
>
> There are just no instructions for that but you have to pretend you
> have to get code vectorized.

Huh?  Bitwise ops should be readily available.

>>
>>> Also according to vector ABI integer mask should be used for mask
>>> operand in case of masked vector call.
>>
>> What ABI?  The function signature of the intrinsics?  How would that
>> come into play here?
>
> Not intrinsics. I mean OpenMP vector functions which require integer
> arg for a mask in case of 512-bit vector.

How do you declare those?

>>
>>> Current implementation of masked loads, masked stores and bool
>>> patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
>>> really call it a canonical representation for all targets?
>>
>> No idea - we'll revisit when another targets adds a similar capability.
>
> AVX-512 is such target. Current representation forces multiple scalar
> mask -> vector mask and back transformations which are artificially
> introduced by current bool patterns and are hard to optimize out.

I dislike the bool patterns anyway and we should try to remove those
and make the vectorizer handle them in other ways (they have single-use
issues anyway).  I don't remember exactly what caused us to add them
but one reason was there wasn't a vector type for 'bool' (but I don't see how
it should be necessary to ask "get me a vector type for 'bool'").

>>
>>> Using scalar masks everywhere should probably cause the same conversion
>>> problem for SSE I listed above though.
>>>
>>> Talking about a canonical representation, shouldn't we use some
>>> special masks representation and not mixing it with integer and vector
>>> of integers then? Only in this case target would be able to
>>> efficiently expand it into a corresponding rtl.
>>
>> That was my idea of vector<bool> ... but I didn't explore it and see where
>> it will cause issues.
>>
>> Fact is GCC already copes with vector masks generated by vector compares
>> just fine everywhere and I'd rather leave it as that.
>
> Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 ..
> 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
> additional vec_cond. I don't think vectorizer ever generates vector
> comparison.

Ok, well that's an implementation detail then.  Are you sure about AND and IOR?
The comment above vect_recog_bool_pattern says

        Assuming size of TYPE is the same as size of all comparisons
        (otherwise some casts would be added where needed), the above
        sequence we create related pattern stmts:
        S1'  a_T = x1 CMP1 y1 ? 1 : 0;
        S3'  c_T = x2 CMP2 y2 ? a_T : 0;
        S4'  d_T = x3 CMP3 y3 ? 1 : 0;
        S5'  e_T = c_T | d_T;
        S6'  f_T = e_T;

thus has vector mask |

> And I wouldn't say it's fine 'everywhere' because there is a single
> target utilizing them. Masked loads and stored for AVX-512 just don't
> work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
> 512-bit vector then we get an ugly inefficient code. The question is
> where to fight with this inefficiency: in RTL or in GIMPLE. I want to
> fight with it where it appears, i.e. in GIMPLE by preventing bool ->
> int conversions applied everywhere even if target doesn't need it.
>
> If we don't want to support both types of masks in GIMPLE then it's
> more reasonable to make bool -> int conversion in expand for targets
> requiring it, rather than do it for everyone and then leave it to
> target to transform it back and try to get rid of all those redundant
> transformations. I'd give vector<bool> a chance to become a canonical
> mask representation for that.

Well, you are missing the case of

   bool b = a < b;
   int x = (int)b;

where the bool is used as integer (and thus an integer mask would have to be
"expanded").  When the bool is a mask in itself the integer use is either free
or a matter of a widening/shortening operation.

Richard.

>
> Thanks,
> Ilya
>
>>
>>>>
>>>> Indeed.  I don't remember my exact comments during the talk at the Cauldron
>>>> but the scheme used there was sth like
>>>>
>>>>   mask = GEN_MASK <vec1 < vec2>;
>>>>   b = a + 1;
>>>>   x = VEC_COND <mask, a, b>
>>>>
>>>> to model conditional execution already at the if-conversion stage (for
>>>> all scalar
>>>> stmts made executed unconditionally rather than just the PHI results).  I was
>>>> asking for the condition to be removed from GEN_MASK (patch 1 has this
>>>> fixed now AFAICS).  And I also asked why it was necessary to do this "lowering"
>>>> here and not simply do
>>>>
>>>> mask = vec1 < vec2;  // regular vector mask!
>>>> b = a + 1;
>>>> x = VEC_COND <mask, a, b>
>>>>
>>>> and have the lowering to an integer mask done later.  You'd still
>>>> change if-conversion
>>>> to predicate _all_ statements, not just those with side-effects.  So I
>>>> think there
>>>> still needs to be a new target hook to trigger this, similar to how
>>>> the target capabilities
>>>> trigger the masked load/store path in if-conversion.
>>>
>>> I think you mix scalar masks with a loop reminders optimization. I'm
>>> not going to do other changes in if-conversion other then in this
>>> posted patch to support scalar masks. Statements predication will be
>>> used to vectorize loop reminders. And not all of them, only reduction
>>> definitions. This will be independent from scalar masks and will work
>>> for vector masks also. And these changes are not going to be in
>>> if-conversion.
>>
>> Maybe I misremember.  Didn't look at the patch in detail yet.
>>
>> Richard.
>>
>>>
>>> Thanks,
>>> Ilya
>>>
>>>>
>>>> But I don't like changing our IL so much as to allow 'integer' masks everywhere.
>>>>
>>>> Richard.
>>>>
>>>>
>>>>>
>>>>> Jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-26 11:14         ` Ilya Enkovich
@ 2015-08-26 13:12           ` Richard Biener
  2015-08-26 16:58           ` Jeff Law
  1 sibling, 0 replies; 48+ messages in thread
From: Richard Biener @ 2015-08-26 13:12 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: Jeff Law, GCC Patches

On Wed, Aug 26, 2015 at 1:13 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2015-08-26 0:42 GMT+03:00 Jeff Law <law@redhat.com>:
>> On 08/21/2015 04:49 AM, Ilya Enkovich wrote:
>>>
>>>
>>> I want a work with bitmasks to be expressed in a natural way using
>>> regular integer operations. Currently all masks manipulations are
>>> emulated via vector statements (mostly using a bunch of vec_cond). For
>>> complex predicates it may be nontrivial to transform it back to scalar
>>> masks and get an efficient code. Also the same vector may be used as
>>> both a mask and an integer vector. Things become more complex if you
>>> additionally have broadcasts and vector pack/unpack code. It also
>>> should be transformed into a scalar masks manipulations somehow.
>>
>> Or why not model the conversion at the gimple level using a CONVERT_EXPR?
>> In fact, the more I think about it, that seems to make more sense to me.
>>
>> We pick a canonical form for the mask, whatever it may be.  We use that
>> canonical form and model conversions between it and the other form via
>> CONVERT_EXPR.  We then let DOM/PRE find/eliminate the redundant conversions.
>> If it's not up to the task, we should really look into why and resolve.
>>
>> Yes, that does mean we have two forms which I'm not terribly happy about and
>> it means some target dependencies on what the masked vector operation looks
>> like (ie, does it accept a simple integer or vector mask), but I'm starting
>> to wonder if, as distasteful as I find it, it's the right thing to do.
>
> If we have some special representation for masks in GIMPLE then we
> might not need any conversions. We could ask a target to define a MODE
> for this type and use it directly everywhere: directly compare into
> it, use it directly for masked loads and stores, AND, IOR, EQ etc. If
> that type is reserved for masks usage then you previous suggestion to
> transform masks into target specific form at GIMPLE->RTL phase should
> work fine. This would allow to support only a single masks
> representation in GIMPLE.

But we can already do all this with the integer vector masks we have.
If you think that the vectorizer generated

  mask = VEC_COND <v1 < v2 ? { -1,...} : { 0, ...} >

is ugly then we can remove that implementation detail and use

  mask = v1 < v2;

directly.  Note that the VEC_COND form was invented to avoid
the need to touch RTL expansion for vector compares (IIRC).
Or it pre-dated specifying what compares generate on GIMPLE.

Richard.

> Thanks,
> Ilya
>
>>
>>>>
>>>> But I don't like changing our IL so much as to allow 'integer' masks
>>>> everywhere.
>>
>> I'm warming up to that idea...
>>
>> jeff
>>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-26 13:09           ` Richard Biener
@ 2015-08-26 13:21             ` Jakub Jelinek
  2015-08-26 13:27               ` Richard Biener
  2015-08-26 14:51             ` Ilya Enkovich
  1 sibling, 1 reply; 48+ messages in thread
From: Jakub Jelinek @ 2015-08-26 13:21 UTC (permalink / raw)
  To: Richard Biener; +Cc: Ilya Enkovich, Jeff Law, GCC Patches

On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote:
> > AVX-512 is such target. Current representation forces multiple scalar
> > mask -> vector mask and back transformations which are artificially
> > introduced by current bool patterns and are hard to optimize out.
> 
> I dislike the bool patterns anyway and we should try to remove those
> and make the vectorizer handle them in other ways (they have single-use
> issues anyway).  I don't remember exactly what caused us to add them
> but one reason was there wasn't a vector type for 'bool' (but I don't see how
> it should be necessary to ask "get me a vector type for 'bool'").

That was just one of the reasons.  The other reason is that even if we would
choose some vector of integer type as vector of bool, the question is what
type.  E.g. if you use vector of chars, you almost always get terrible
vectorized code, except for the AVX-512 you really want an integral type
that has the size of the types you are comparing.
And I'd say this is very much related to the need to do some type promotions
or demotions on the scalar code meant to be vectorized (but only the copy
for vectorizations), so that we have as few different scalar type sizes in
the loop as possible, because widening / narrowing vector conversions aren't
exactly cheap and a single char operation in a loop otherwise full of long
long operations might unnecessarily turn a vf=2 (or 4 or 8) loop into
vf=16 (or 32 or 64), increasing it a lot.

	Jakub

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-26 13:21             ` Jakub Jelinek
@ 2015-08-26 13:27               ` Richard Biener
  2015-08-26 13:47                 ` Jakub Jelinek
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2015-08-26 13:27 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Ilya Enkovich, Jeff Law, GCC Patches

On Wed, Aug 26, 2015 at 3:16 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote:
>> > AVX-512 is such target. Current representation forces multiple scalar
>> > mask -> vector mask and back transformations which are artificially
>> > introduced by current bool patterns and are hard to optimize out.
>>
>> I dislike the bool patterns anyway and we should try to remove those
>> and make the vectorizer handle them in other ways (they have single-use
>> issues anyway).  I don't remember exactly what caused us to add them
>> but one reason was there wasn't a vector type for 'bool' (but I don't see how
>> it should be necessary to ask "get me a vector type for 'bool'").
>
> That was just one of the reasons.  The other reason is that even if we would
> choose some vector of integer type as vector of bool, the question is what
> type.  E.g. if you use vector of chars, you almost always get terrible
> vectorized code, except for the AVX-512 you really want an integral type
> that has the size of the types you are comparing.

Yeah, but the way STMT_VINFO_VECTYPE is computed is that we always
first compute the vector type for the comparison itself (which is "fixed") and
thus we can compute the vector type of any bitwise op on it as well.

> And I'd say this is very much related to the need to do some type promotions
> or demotions on the scalar code meant to be vectorized (but only the copy
> for vectorizations), so that we have as few different scalar type sizes in
> the loop as possible, because widening / narrowing vector conversions aren't
> exactly cheap and a single char operation in a loop otherwise full of long
> long operations might unnecessarily turn a vf=2 (or 4 or 8) loop into
> vf=16 (or 32 or 64), increasing it a lot.

That's true but unrelated.  With conditions this gets to optimizing where
the promotion/demotion happens (which depends on how the result is used).

The current pattern approach has the issue that it doesn't work for multiple
uses in the condition bitops which is bad as well.

But it couldn't have been _only_ the vector type computation that made us
invent the patterns, no?  Do you remember anything else?

Thanks,
Richard.


>
>         Jakub

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-26 13:27               ` Richard Biener
@ 2015-08-26 13:47                 ` Jakub Jelinek
  2015-08-26 14:36                   ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Jakub Jelinek @ 2015-08-26 13:47 UTC (permalink / raw)
  To: Richard Biener; +Cc: Ilya Enkovich, Jeff Law, GCC Patches

On Wed, Aug 26, 2015 at 03:21:52PM +0200, Richard Biener wrote:
> On Wed, Aug 26, 2015 at 3:16 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote:
> >> > AVX-512 is such target. Current representation forces multiple scalar
> >> > mask -> vector mask and back transformations which are artificially
> >> > introduced by current bool patterns and are hard to optimize out.
> >>
> >> I dislike the bool patterns anyway and we should try to remove those
> >> and make the vectorizer handle them in other ways (they have single-use
> >> issues anyway).  I don't remember exactly what caused us to add them
> >> but one reason was there wasn't a vector type for 'bool' (but I don't see how
> >> it should be necessary to ask "get me a vector type for 'bool'").
> >
> > That was just one of the reasons.  The other reason is that even if we would
> > choose some vector of integer type as vector of bool, the question is what
> > type.  E.g. if you use vector of chars, you almost always get terrible
> > vectorized code, except for the AVX-512 you really want an integral type
> > that has the size of the types you are comparing.
> 
> Yeah, but the way STMT_VINFO_VECTYPE is computed is that we always
> first compute the vector type for the comparison itself (which is "fixed") and
> thus we can compute the vector type of any bitwise op on it as well.

Sure, but if you then immediately vector narrow it to a V*QI vector because
it is stored originally into a bool/_Bool variable, and then again when it
is used in say a COND_EXPR widen it again, you get really poor code.
So, what the bool pattern code does is kind of poor man's type
promotion/demotion pass for bool only, at least for the common cases.

PR50596 has been the primary reason to introduce the bool patterns.
If there is a better type promotion/demotion pass on a copy of the loop,
sure, we can get rid of it (but figure out also what to do for SLP).

	Jakub

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-26 13:47                 ` Jakub Jelinek
@ 2015-08-26 14:36                   ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2015-08-26 14:36 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Ilya Enkovich, Jeff Law, GCC Patches

On Wed, Aug 26, 2015 at 3:35 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Aug 26, 2015 at 03:21:52PM +0200, Richard Biener wrote:
>> On Wed, Aug 26, 2015 at 3:16 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> > On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote:
>> >> > AVX-512 is such target. Current representation forces multiple scalar
>> >> > mask -> vector mask and back transformations which are artificially
>> >> > introduced by current bool patterns and are hard to optimize out.
>> >>
>> >> I dislike the bool patterns anyway and we should try to remove those
>> >> and make the vectorizer handle them in other ways (they have single-use
>> >> issues anyway).  I don't remember exactly what caused us to add them
>> >> but one reason was there wasn't a vector type for 'bool' (but I don't see how
>> >> it should be necessary to ask "get me a vector type for 'bool'").
>> >
>> > That was just one of the reasons.  The other reason is that even if we would
>> > choose some vector of integer type as vector of bool, the question is what
>> > type.  E.g. if you use vector of chars, you almost always get terrible
>> > vectorized code, except for the AVX-512 you really want an integral type
>> > that has the size of the types you are comparing.
>>
>> Yeah, but the way STMT_VINFO_VECTYPE is computed is that we always
>> first compute the vector type for the comparison itself (which is "fixed") and
>> thus we can compute the vector type of any bitwise op on it as well.
>
> Sure, but if you then immediately vector narrow it to a V*QI vector because
> it is stored originally into a bool/_Bool variable, and then again when it
> is used in say a COND_EXPR widen it again, you get really poor code.
> So, what the bool pattern code does is kind of poor man's type
> promotion/demotion pass for bool only, at least for the common cases.

Yeah, I just looked at the code but in the end everything should be fixable
in the place we compute STMT_VINFO_VECTYPE.  The code just
looks at the LHS type plus at the narrowest type (for vectorization factor).
It should get re-structured to get the vector types from the operands
(much like code-generation will eventually fall back to).

> PR50596 has been the primary reason to introduce the bool patterns.
> If there is a better type promotion/demotion pass on a copy of the loop,
> sure, we can get rid of it (but figure out also what to do for SLP).

Yeah, of course.  Basic-block SLP just asks for the vectype during SLP
analysis AFAIK.

I suppose we want sth like get_result_vectype (gimple) which can look
at operands as well and can be used from both places.

After all we do want to fix the non-single-use issue somehow and getting
rid of the patterns sounds good to me anyway...

Not sure if I can get to the above for GCC 6, but at least putting it on my
TODO...

Richard.

>         Jakub

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-26 13:09           ` Richard Biener
  2015-08-26 13:21             ` Jakub Jelinek
@ 2015-08-26 14:51             ` Ilya Enkovich
  2015-08-26 15:02               ` Richard Biener
  1 sibling, 1 reply; 48+ messages in thread
From: Ilya Enkovich @ 2015-08-26 14:51 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, GCC Patches

2015-08-26 16:02 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> 2015-08-21 14:00 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>
>>> Hmm, I don't see how vector masks are more difficult to operate with.
>>
>> There are just no instructions for that but you have to pretend you
>> have to get code vectorized.
>
> Huh?  Bitwise ops should be readily available.

Right bitwise ops are available, but there is no comparison into a
vector and no masked loads and stores using vector masks (when we
speak about 512-bit vectors).

>
>>>
>>>> Also according to vector ABI integer mask should be used for mask
>>>> operand in case of masked vector call.
>>>
>>> What ABI?  The function signature of the intrinsics?  How would that
>>> come into play here?
>>
>> Not intrinsics. I mean OpenMP vector functions which require integer
>> arg for a mask in case of 512-bit vector.
>
> How do you declare those?

Something like this:

#pragma omp declare simd inbranch
int foo(int*);

>
>>>
>>>> Current implementation of masked loads, masked stores and bool
>>>> patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
>>>> really call it a canonical representation for all targets?
>>>
>>> No idea - we'll revisit when another targets adds a similar capability.
>>
>> AVX-512 is such target. Current representation forces multiple scalar
>> mask -> vector mask and back transformations which are artificially
>> introduced by current bool patterns and are hard to optimize out.
>
> I dislike the bool patterns anyway and we should try to remove those
> and make the vectorizer handle them in other ways (they have single-use
> issues anyway).  I don't remember exactly what caused us to add them
> but one reason was there wasn't a vector type for 'bool' (but I don't see how
> it should be necessary to ask "get me a vector type for 'bool'").
>
>>>
>>>> Using scalar masks everywhere should probably cause the same conversion
>>>> problem for SSE I listed above though.
>>>>
>>>> Talking about a canonical representation, shouldn't we use some
>>>> special masks representation and not mixing it with integer and vector
>>>> of integers then? Only in this case target would be able to
>>>> efficiently expand it into a corresponding rtl.
>>>
>>> That was my idea of vector<bool> ... but I didn't explore it and see where
>>> it will cause issues.
>>>
>>> Fact is GCC already copes with vector masks generated by vector compares
>>> just fine everywhere and I'd rather leave it as that.
>>
>> Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 ..
>> 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
>> additional vec_cond. I don't think vectorizer ever generates vector
>> comparison.
>
> Ok, well that's an implementation detail then.  Are you sure about AND and IOR?
> The comment above vect_recog_bool_pattern says
>
>         Assuming size of TYPE is the same as size of all comparisons
>         (otherwise some casts would be added where needed), the above
>         sequence we create related pattern stmts:
>         S1'  a_T = x1 CMP1 y1 ? 1 : 0;
>         S3'  c_T = x2 CMP2 y2 ? a_T : 0;
>         S4'  d_T = x3 CMP3 y3 ? 1 : 0;
>         S5'  e_T = c_T | d_T;
>         S6'  f_T = e_T;
>
> thus has vector mask |

I think in practice it would look like:

S4'  d_T = x3 CMP3 y3 ? 1 : c_T;

Thus everything is usually hidden in vec_cond. But my concern is
mostly about types used for that.

>
>> And I wouldn't say it's fine 'everywhere' because there is a single
>> target utilizing them. Masked loads and stored for AVX-512 just don't
>> work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
>> 512-bit vector then we get an ugly inefficient code. The question is
>> where to fight with this inefficiency: in RTL or in GIMPLE. I want to
>> fight with it where it appears, i.e. in GIMPLE by preventing bool ->
>> int conversions applied everywhere even if target doesn't need it.
>>
>> If we don't want to support both types of masks in GIMPLE then it's
>> more reasonable to make bool -> int conversion in expand for targets
>> requiring it, rather than do it for everyone and then leave it to
>> target to transform it back and try to get rid of all those redundant
>> transformations. I'd give vector<bool> a chance to become a canonical
>> mask representation for that.
>
> Well, you are missing the case of
>
>    bool b = a < b;
>    int x = (int)b;

This case seems to require no changes and just be transformed into vec_cond.

Thanks,
Ilya

>
> where the bool is used as integer (and thus an integer mask would have to be
> "expanded").  When the bool is a mask in itself the integer use is either free
> or a matter of a widening/shortening operation.
>
> Richard.
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-26 14:51             ` Ilya Enkovich
@ 2015-08-26 15:02               ` Richard Biener
  2015-08-26 15:15                 ` Jakub Jelinek
  2015-08-26 16:09                 ` Ilya Enkovich
  0 siblings, 2 replies; 48+ messages in thread
From: Richard Biener @ 2015-08-26 15:02 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: Jeff Law, GCC Patches

On Wed, Aug 26, 2015 at 4:38 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2015-08-26 16:02 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> 2015-08-21 14:00 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>
>>>> Hmm, I don't see how vector masks are more difficult to operate with.
>>>
>>> There are just no instructions for that but you have to pretend you
>>> have to get code vectorized.
>>
>> Huh?  Bitwise ops should be readily available.
>
> Right bitwise ops are available, but there is no comparison into a
> vector and no masked loads and stores using vector masks (when we
> speak about 512-bit vectors).
>
>>
>>>>
>>>>> Also according to vector ABI integer mask should be used for mask
>>>>> operand in case of masked vector call.
>>>>
>>>> What ABI?  The function signature of the intrinsics?  How would that
>>>> come into play here?
>>>
>>> Not intrinsics. I mean OpenMP vector functions which require integer
>>> arg for a mask in case of 512-bit vector.
>>
>> How do you declare those?
>
> Something like this:
>
> #pragma omp declare simd inbranch
> int foo(int*);

The 'inbranch' is the thing that matters?  And all of foo is then
implicitely predicated?

>>
>>>>
>>>>> Current implementation of masked loads, masked stores and bool
>>>>> patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
>>>>> really call it a canonical representation for all targets?
>>>>
>>>> No idea - we'll revisit when another targets adds a similar capability.
>>>
>>> AVX-512 is such target. Current representation forces multiple scalar
>>> mask -> vector mask and back transformations which are artificially
>>> introduced by current bool patterns and are hard to optimize out.
>>
>> I dislike the bool patterns anyway and we should try to remove those
>> and make the vectorizer handle them in other ways (they have single-use
>> issues anyway).  I don't remember exactly what caused us to add them
>> but one reason was there wasn't a vector type for 'bool' (but I don't see how
>> it should be necessary to ask "get me a vector type for 'bool'").
>>
>>>>
>>>>> Using scalar masks everywhere should probably cause the same conversion
>>>>> problem for SSE I listed above though.
>>>>>
>>>>> Talking about a canonical representation, shouldn't we use some
>>>>> special masks representation and not mixing it with integer and vector
>>>>> of integers then? Only in this case target would be able to
>>>>> efficiently expand it into a corresponding rtl.
>>>>
>>>> That was my idea of vector<bool> ... but I didn't explore it and see where
>>>> it will cause issues.
>>>>
>>>> Fact is GCC already copes with vector masks generated by vector compares
>>>> just fine everywhere and I'd rather leave it as that.
>>>
>>> Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 ..
>>> 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
>>> additional vec_cond. I don't think vectorizer ever generates vector
>>> comparison.
>>
>> Ok, well that's an implementation detail then.  Are you sure about AND and IOR?
>> The comment above vect_recog_bool_pattern says
>>
>>         Assuming size of TYPE is the same as size of all comparisons
>>         (otherwise some casts would be added where needed), the above
>>         sequence we create related pattern stmts:
>>         S1'  a_T = x1 CMP1 y1 ? 1 : 0;
>>         S3'  c_T = x2 CMP2 y2 ? a_T : 0;
>>         S4'  d_T = x3 CMP3 y3 ? 1 : 0;
>>         S5'  e_T = c_T | d_T;
>>         S6'  f_T = e_T;
>>
>> thus has vector mask |
>
> I think in practice it would look like:
>
> S4'  d_T = x3 CMP3 y3 ? 1 : c_T;
>
> Thus everything is usually hidden in vec_cond. But my concern is
> mostly about types used for that.
>
>>
>>> And I wouldn't say it's fine 'everywhere' because there is a single
>>> target utilizing them. Masked loads and stored for AVX-512 just don't
>>> work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
>>> 512-bit vector then we get an ugly inefficient code. The question is
>>> where to fight with this inefficiency: in RTL or in GIMPLE. I want to
>>> fight with it where it appears, i.e. in GIMPLE by preventing bool ->
>>> int conversions applied everywhere even if target doesn't need it.
>>>
>>> If we don't want to support both types of masks in GIMPLE then it's
>>> more reasonable to make bool -> int conversion in expand for targets
>>> requiring it, rather than do it for everyone and then leave it to
>>> target to transform it back and try to get rid of all those redundant
>>> transformations. I'd give vector<bool> a chance to become a canonical
>>> mask representation for that.
>>
>> Well, you are missing the case of
>>
>>    bool b = a < b;
>>    int x = (int)b;
>
> This case seems to require no changes and just be transformed into vec_cond.

Ok, the example was too simple but I meant that a bool has a non-conditional
use.

Ok, so I still believe we don't want two ways to express things on GIMPLE if
possible.  Yes, the vectorizer already creates only vector stmts that
are supported
by the hardware.  So it's a matter of deciding on the GIMPLE representation
for the "mask".  I'd rather use vector<bool> (and the target assigning
an integer
mode to it) than an 'int' in GIMPLE statements.  Because that makes the
type constraints on GIMPLE very weak and exposes those 'ints' to all kind
of optimization passes.

Thus if we change the result type requirement of vector comparisons from
signed integer vectors to bool vectors the vectorizer can still go for
promoting that bool vector to a vector of ints via a VEC_COND_EXPR
and the expander can special-case that if the target has a vector comparison
producing a vector mask.

So, can you give that vector<bool> some thought?  Note that to assign
sth else than a vector mode to it needs adjustments in stor-layout.c.
I'm pretty sure we don't want vector BImodes.

Richard.

> Thanks,
> Ilya
>
>>
>> where the bool is used as integer (and thus an integer mask would have to be
>> "expanded").  When the bool is a mask in itself the integer use is either free
>> or a matter of a widening/shortening operation.
>>
>> Richard.
>>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-26 15:02               ` Richard Biener
@ 2015-08-26 15:15                 ` Jakub Jelinek
  2015-08-26 16:09                 ` Ilya Enkovich
  1 sibling, 0 replies; 48+ messages in thread
From: Jakub Jelinek @ 2015-08-26 15:15 UTC (permalink / raw)
  To: Richard Biener; +Cc: Ilya Enkovich, Jeff Law, GCC Patches

On Wed, Aug 26, 2015 at 04:56:23PM +0200, Richard Biener wrote:
> >> How do you declare those?
> >
> > Something like this:
> >
> > #pragma omp declare simd inbranch
> > int foo(int*);
> 
> The 'inbranch' is the thing that matters?  And all of foo is then
> implicitely predicated?

If it is
#pragma omp declare simd notinbranch,
then only the non-predicated version is emitted and thus it is usable only
in vectorized loops inside of non-conditional contexts.
If it is
#pragma omp declare simd inbranch,
then only the predicated version is emitted, there is an extra argument
(either V*QI if I remember well, or for AVX-512 short/int/long bitmask),
if the caller wants to use it in non-conditional contexts, it just passes
all ones mask.  For
#pragma omp declare simd
(neither inbranch nor notinbranch), two versions are emitted, one predicated
and one non-predicated.

	Jakub

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-26 15:02               ` Richard Biener
  2015-08-26 15:15                 ` Jakub Jelinek
@ 2015-08-26 16:09                 ` Ilya Enkovich
  2015-08-27  7:58                   ` Richard Biener
  1 sibling, 1 reply; 48+ messages in thread
From: Ilya Enkovich @ 2015-08-26 16:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, GCC Patches

2015-08-26 17:56 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Wed, Aug 26, 2015 at 4:38 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> 2015-08-26 16:02 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>> 2015-08-21 14:00 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>
>>>>> Hmm, I don't see how vector masks are more difficult to operate with.
>>>>
>>>> There are just no instructions for that but you have to pretend you
>>>> have to get code vectorized.
>>>
>>> Huh?  Bitwise ops should be readily available.
>>
>> Right bitwise ops are available, but there is no comparison into a
>> vector and no masked loads and stores using vector masks (when we
>> speak about 512-bit vectors).
>>
>>>
>>>>>
>>>>>> Also according to vector ABI integer mask should be used for mask
>>>>>> operand in case of masked vector call.
>>>>>
>>>>> What ABI?  The function signature of the intrinsics?  How would that
>>>>> come into play here?
>>>>
>>>> Not intrinsics. I mean OpenMP vector functions which require integer
>>>> arg for a mask in case of 512-bit vector.
>>>
>>> How do you declare those?
>>
>> Something like this:
>>
>> #pragma omp declare simd inbranch
>> int foo(int*);
>
> The 'inbranch' is the thing that matters?  And all of foo is then
> implicitely predicated?

That's right. And a vector version of foo gets a mask as an additional arg.

>
>>>
>>> Well, you are missing the case of
>>>
>>>    bool b = a < b;
>>>    int x = (int)b;
>>
>> This case seems to require no changes and just be transformed into vec_cond.
>
> Ok, the example was too simple but I meant that a bool has a non-conditional
> use.

Right. In such cases I think it's reasonable to replace it with a
select similar to what we now have but without whole bool tree
transformed.

>
> Ok, so I still believe we don't want two ways to express things on GIMPLE if
> possible.  Yes, the vectorizer already creates only vector stmts that
> are supported
> by the hardware.  So it's a matter of deciding on the GIMPLE representation
> for the "mask".  I'd rather use vector<bool> (and the target assigning
> an integer
> mode to it) than an 'int' in GIMPLE statements.  Because that makes the
> type constraints on GIMPLE very weak and exposes those 'ints' to all kind
> of optimization passes.
>
> Thus if we change the result type requirement of vector comparisons from
> signed integer vectors to bool vectors the vectorizer can still go for
> promoting that bool vector to a vector of ints via a VEC_COND_EXPR
> and the expander can special-case that if the target has a vector comparison
> producing a vector mask.
>
> So, can you give that vector<bool> some thought?

Yes, I want to try it. But getting rid of bool patterns would mean
support for all targets currently supporting vec_cond. Would it be OK
to have vector<bool> mask co-exist with bool patterns for some time?
Thus first step would be to require vector<bool> for MASK_LOAD and
MASK_STORE and support it for i386 (the only user of MASK_LOAD and
MASK_STORE).

>Note that to assign
> sth else than a vector mode to it needs adjustments in stor-layout.c.
> I'm pretty sure we don't want vector BImodes.

I can directly build a vector type with specified mode to avoid it. Smth. like:

mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
mask_type = make_vector_type (bool_type_node, nunits, mask_mode);

Thanks,
Ilya

>
> Richard.
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-26 11:14         ` Ilya Enkovich
  2015-08-26 13:12           ` Richard Biener
@ 2015-08-26 16:58           ` Jeff Law
  1 sibling, 0 replies; 48+ messages in thread
From: Jeff Law @ 2015-08-26 16:58 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: Richard Biener, GCC Patches

On 08/26/2015 05:13 AM, Ilya Enkovich wrote:
> 2015-08-26 0:42 GMT+03:00 Jeff Law <law@redhat.com>:
>> On 08/21/2015 04:49 AM, Ilya Enkovich wrote:
>>>
>>>
>>> I want a work with bitmasks to be expressed in a natural way using
>>> regular integer operations. Currently all masks manipulations are
>>> emulated via vector statements (mostly using a bunch of vec_cond). For
>>> complex predicates it may be nontrivial to transform it back to scalar
>>> masks and get an efficient code. Also the same vector may be used as
>>> both a mask and an integer vector. Things become more complex if you
>>> additionally have broadcasts and vector pack/unpack code. It also
>>> should be transformed into a scalar masks manipulations somehow.
>>
>> Or why not model the conversion at the gimple level using a CONVERT_EXPR?
>> In fact, the more I think about it, that seems to make more sense to me.
>>
>> We pick a canonical form for the mask, whatever it may be.  We use that
>> canonical form and model conversions between it and the other form via
>> CONVERT_EXPR.  We then let DOM/PRE find/eliminate the redundant conversions.
>> If it's not up to the task, we should really look into why and resolve.
>>
>> Yes, that does mean we have two forms which I'm not terribly happy about and
>> it means some target dependencies on what the masked vector operation looks
>> like (ie, does it accept a simple integer or vector mask), but I'm starting
>> to wonder if, as distasteful as I find it, it's the right thing to do.
>
> If we have some special representation for masks in GIMPLE then we
> might not need any conversions. We could ask a target to define a MODE
> for this type and use it directly everywhere: directly compare into
> it, use it directly for masked loads and stores, AND, IOR, EQ etc. If
> that type is reserved for masks usage then you previous suggestion to
> transform masks into target specific form at GIMPLE->RTL phase should
> work fine. This would allow to support only a single masks
> representation in GIMPLE.
Possibly, but you mentioned that you may need to use the masks in both 
forms depending on the exact context.  If so, then I think we need to 
model a conversion between the two forms.


Jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Scalar masks 2/x] Use bool masks in if-conversion
  2015-08-26 16:09                 ` Ilya Enkovich
@ 2015-08-27  7:58                   ` Richard Biener
  2015-09-01 13:13                     ` [RFC] Try vector<bool> as a new representation for vector masks Ilya Enkovich
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2015-08-27  7:58 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: Jeff Law, GCC Patches

On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2015-08-26 17:56 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Wed, Aug 26, 2015 at 4:38 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> 2015-08-26 16:02 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>> 2015-08-21 14:00 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>
>>>>>> Hmm, I don't see how vector masks are more difficult to operate with.
>>>>>
>>>>> There are just no instructions for that but you have to pretend you
>>>>> have to get code vectorized.
>>>>
>>>> Huh?  Bitwise ops should be readily available.
>>>
>>> Right bitwise ops are available, but there is no comparison into a
>>> vector and no masked loads and stores using vector masks (when we
>>> speak about 512-bit vectors).
>>>
>>>>
>>>>>>
>>>>>>> Also according to vector ABI integer mask should be used for mask
>>>>>>> operand in case of masked vector call.
>>>>>>
>>>>>> What ABI?  The function signature of the intrinsics?  How would that
>>>>>> come into play here?
>>>>>
>>>>> Not intrinsics. I mean OpenMP vector functions which require integer
>>>>> arg for a mask in case of 512-bit vector.
>>>>
>>>> How do you declare those?
>>>
>>> Something like this:
>>>
>>> #pragma omp declare simd inbranch
>>> int foo(int*);
>>
>> The 'inbranch' is the thing that matters?  And all of foo is then
>> implicitely predicated?
>
> That's right. And a vector version of foo gets a mask as an additional arg.
>
>>
>>>>
>>>> Well, you are missing the case of
>>>>
>>>>    bool b = a < b;
>>>>    int x = (int)b;
>>>
>>> This case seems to require no changes and just be transformed into vec_cond.
>>
>> Ok, the example was too simple but I meant that a bool has a non-conditional
>> use.
>
> Right. In such cases I think it's reasonable to replace it with a
> select similar to what we now have but without whole bool tree
> transformed.
>
>>
>> Ok, so I still believe we don't want two ways to express things on GIMPLE if
>> possible.  Yes, the vectorizer already creates only vector stmts that
>> are supported
>> by the hardware.  So it's a matter of deciding on the GIMPLE representation
>> for the "mask".  I'd rather use vector<bool> (and the target assigning
>> an integer
>> mode to it) than an 'int' in GIMPLE statements.  Because that makes the
>> type constraints on GIMPLE very weak and exposes those 'ints' to all kind
>> of optimization passes.
>>
>> Thus if we change the result type requirement of vector comparisons from
>> signed integer vectors to bool vectors the vectorizer can still go for
>> promoting that bool vector to a vector of ints via a VEC_COND_EXPR
>> and the expander can special-case that if the target has a vector comparison
>> producing a vector mask.
>>
>> So, can you give that vector<bool> some thought?
>
> Yes, I want to try it. But getting rid of bool patterns would mean
> support for all targets currently supporting vec_cond. Would it be OK
> to have vector<bool> mask co-exist with bool patterns for some time?

No, I'd like to remove the bool patterns anyhow - the vectorizer should be able
to figure out the correct vector type (or mask type) from the uses.  Currently
it simply looks at the stmts LHS type but as all stmt operands already have
vector types it can as well compute the result type from those.  We'd want to
have a helper function that does this result type computation as I figure it
will be needed in multiple places.

This is now on my personal TODO list (but that's already quite long for GCC 6),
so if you manage to get to that...  see
tree-vect-loop.c:vect_determine_vectorization_factor
which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
vector type set from data-ref analysis already - there 'bool' loads
correctly get
VNQImode).  There is a basic-block / SLP part as well that would need to use
the helper function (eventually with some SLP discovery order issue).

> Thus first step would be to require vector<bool> for MASK_LOAD and
> MASK_STORE and support it for i386 (the only user of MASK_LOAD and
> MASK_STORE).

You can certainly try that first, but as soon as you hit complications with
needing to adjust bool patterns then I'd rather get rid of them.

>>Note that to assign
>> sth else than a vector mode to it needs adjustments in stor-layout.c.
>> I'm pretty sure we don't want vector BImodes.
>
> I can directly build a vector type with specified mode to avoid it. Smth. like:
>
> mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
> mask_type = make_vector_type (bool_type_node, nunits, mask_mode);

Hmm, indeed, that might be a (good) solution.  Btw, in this case
target attribute
boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending
on the active target).  There would also be no way for the user to
declare vector<bool>
in source (which is good because of that target attribute issue...).

So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits)
and adjusting
truth_type_for is the way to go.

I suggest you try modifying those parts first according to this scheme
that will most
likely uncover issues we missed.

Thanks,
Richard.

> Thanks,
> Ilya
>
>>
>> Richard.
>>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [RFC] Try vector<bool> as a new representation for vector masks
  2015-08-27  7:58                   ` Richard Biener
@ 2015-09-01 13:13                     ` Ilya Enkovich
  2015-09-01 14:25                       ` Richard Biener
  2015-09-04 20:47                       ` Jeff Law
  0 siblings, 2 replies; 48+ messages in thread
From: Ilya Enkovich @ 2015-09-01 13:13 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 9148 bytes --]

On 27 Aug 09:55, Richard Biener wrote:
> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> >
> > Yes, I want to try it. But getting rid of bool patterns would mean
> > support for all targets currently supporting vec_cond. Would it be OK
> > to have vector<bool> mask co-exist with bool patterns for some time?
> 
> No, I'd like to remove the bool patterns anyhow - the vectorizer should be able
> to figure out the correct vector type (or mask type) from the uses.  Currently
> it simply looks at the stmts LHS type but as all stmt operands already have
> vector types it can as well compute the result type from those.  We'd want to
> have a helper function that does this result type computation as I figure it
> will be needed in multiple places.
> 
> This is now on my personal TODO list (but that's already quite long for GCC 6),
> so if you manage to get to that...  see
> tree-vect-loop.c:vect_determine_vectorization_factor
> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
> vector type set from data-ref analysis already - there 'bool' loads
> correctly get
> VNQImode).  There is a basic-block / SLP part as well that would need to use
> the helper function (eventually with some SLP discovery order issue).
> 
> > Thus first step would be to require vector<bool> for MASK_LOAD and
> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and
> > MASK_STORE).
> 
> You can certainly try that first, but as soon as you hit complications with
> needing to adjust bool patterns then I'd rather get rid of them.
> 
> >
> > I can directly build a vector type with specified mode to avoid it. Smth. like:
> >
> > mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode);
> 
> Hmm, indeed, that might be a (good) solution.  Btw, in this case
> target attribute
> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending
> on the active target).  There would also be no way for the user to
> declare vector<bool>
> in source (which is good because of that target attribute issue...).
> 
> So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits)
> and adjusting
> truth_type_for is the way to go.
> 
> I suggest you try modifying those parts first according to this scheme
> that will most
> likely uncover issues we missed.
> 
> Thanks,
> Richard.
> 

I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE.  There were no major issues (for now).

build_truth_vector_type and get_mask_type_for_scalar_type were added to build a mask type.  It is always a vector of bools but its mode is determined by a target using number of units and currently used vector length.

As previously I fixed if-conversion to apply boolean masks for loads and stores which automatically disables bool patterns for them and flow goes by a mask path.  Vectorization factor computation is fixed to have a separate computation for mask types.  Comparison is now handled separately by vectorizer and is vectorized into vector comparison.

Optabs for masked loads and stores were transformed into convert optabs.  Now it is checked using both value and mask modes.

Optabs for comparison were added.  These are also convert optabs checking value and result type.

I had to introduce significant number of new patterns in i386 target to support new optabs.  The reason was vector compare was never expanded separately and always was a part of a vec_cond expansion.

As a result it's possible to use the sage GIMPLE representation for both vector and scalar masks target type.  Here is an example I used as a simple test:

  for (i=0; i<N; i++)
  {
    float t = a[i];
    if (t > 0.0f && t < 1.0e+2f)
      if (c[i] != 0)
        c[i] = 1;
  }

Produced vector GIMPLE (before expand):

  vect_t_5.22_105 = MEM[base: _256, offset: 0B];
  mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
  mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 };
  mask__8.27_110 = mask__6.23_107 & mask__7.25_109;
  vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110);
  mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 };
  mask__37.35_120 = mask__8.27_110 & mask__36.33_119;
  MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 });

Produced assembler on AVX-512:

	vmovups	(%rdi), %zmm0
	vcmpps	$25, %zmm5, %zmm0, %k1
	vcmpps	$22, %zmm3, %zmm0, %k1{%k1}
	vmovdqa32	-64(%rdx), %zmm2{%k1}
	vpcmpd	$4, %zmm1, %zmm2, %k1{%k1}
	vmovdqa32	%zmm4, (%rcx){%k1}

Produced assembler on AVX-2:

	vmovups	(%rdx), %xmm1
	vinsertf128	$0x1, -16(%rdx), %ymm1, %ymm1
	vcmpltps	%ymm1, %ymm3, %ymm0
	vcmpltps	%ymm5, %ymm1, %ymm1
	vpand	%ymm0, %ymm1, %ymm0
	vpmaskmovd	-32(%rcx), %ymm0, %ymm1
	vpcmpeqd	%ymm2, %ymm1, %ymm1
	vpcmpeqd	%ymm2, %ymm1, %ymm1
	vpand	%ymm0, %ymm1, %ymm0
	vpmaskmovd	%ymm4, %ymm0, (%rax)

BTW AVX-2 code produced by trunk compiler is 4 insns longer:

	vmovups	(%rdx), %xmm0
	vinsertf128	$0x1, -16(%rdx), %ymm0, %ymm0
	vcmpltps	%ymm0, %ymm6, %ymm1
	vcmpltps	%ymm7, %ymm0, %ymm0
	vpand	%ymm1, %ymm5, %ymm2
	vpand	%ymm0, %ymm2, %ymm1
	vpcmpeqd	%ymm3, %ymm1, %ymm0
	vpandn	%ymm4, %ymm0, %ymm0
	vpmaskmovd	-32(%rcx), %ymm0, %ymm0
	vpcmpeqd	%ymm3, %ymm0, %ymm0
	vpandn	%ymm1, %ymm0, %ymm0
	vpcmpeqd	%ymm3, %ymm0, %ymm0
	vpandn	%ymm4, %ymm0, %ymm0
	vpmaskmovd	%ymm5, %ymm0, (%rax)


For now I still don't disable bool patterns, thus new masks apply to masked loads and stores only.  Patch is also not tested and tried on several small tests only.  Could you please look at what I currently have and say if it's in sync with your view on vector masking?

Thanks,
Ilya
--
gcc/

2015-09-01  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New.
	(ix86_expand_int_vec_cmp): New.
	(ix86_expand_fp_vec_cmp): New.
	* config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for
	op_true and op_false.
	(ix86_int_cmp_code_to_pcmp_immediate): New.
	(ix86_fp_cmp_code_to_pcmp_immediate): New.
	(ix86_cmp_code_to_pcmp_immediate): New.
	(ix86_expand_mask_vec_cmp): New.
	(ix86_expand_fp_vec_cmp): New.
	(ix86_expand_int_sse_cmp): New.
	(ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp.
	(ix86_expand_int_vec_cmp): New.
	(ix86_get_mask_mode): New.
	(TARGET_VECTORIZE_GET_MASK_MODE): New.
	* config/i386/sse.md (avx512fmaskmodelower): New.
	(vec_cmp<mode><avx512fmaskmodelower>): New.
	(vec_cmp<mode><sseintvecmodelower>): New.
	(vec_cmpv2div2di): New.
	(vec_cmpu<mode><avx512fmaskmodelower>): New.
	(vec_cmpu<mode><sseintvecmodelower>): New.
	(vec_cmpuv2div2di): New.
	(maskload<mode>): Rename to ...
	(maskload<mode><sseintvecmodelower>): ... this.
	(maskstore<mode>): Rename to ...
	(maskstore<mode><sseintvecmodelower>): ... this.
	(maskload<mode><avx512fmaskmodelower>): New.
	(maskstore<mode><avx512fmaskmodelower>): New.
	* doc/tm.texi: Regenerated.
	* doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New.
	* expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results.
	* internal-fn.c (expand_MASK_LOAD): Adjust to optab changes.
	(expand_MASK_STORE): Likewise.
	* optabs.c (vector_compare_rtx): Add OPNO arg.
	(expand_vec_cond_expr): Adjust to vector_compare_rtx change.
	(get_vec_cmp_icode): New.
	(expand_vec_cmp_expr_p): New.
	(expand_vec_cmp_expr): New.
	(can_vec_mask_load_store_p): Add MASK_MODE arg.
	* optabs.def (vec_cmp_optab): New.
	(vec_cmpu_optab): New.
	(maskload_optab): Transform into convert optab.
	(maskstore_optab): Likewise.
	* optabs.h (expand_vec_cmp_expr_p): New.
	(expand_vec_cmp_expr): New.
	(can_vec_mask_load_store_p): Add MASK_MODE arg.
	* target.def (get_mask_mode): New.
	* targhooks.c (default_vector_alignment): Use mode alignment
	for vector masks.
	(default_get_mask_mode): New.
	* targhooks.h (default_get_mask_mode): New.
	* tree-cfg.c (verify_gimple_comparison): Support vector mask.
	* tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to
	can_vec_mask_load_store_p signature change.
	(predicate_mem_writes): Use boolean mask.
	* tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var.
	(vect_create_destination_var): Likewise.
	* tree-vect-generic.c (expand_vector_comparison): Use
	expand_vec_cmp_expr_p for comparison availability.
	(expand_vector_operations_1): Ignore statements with scalar mode.
	* tree-vect-loop.c (vect_determine_vectorization_factor): Ignore mask
	operations for VF.  Add mask type computation.
	* tree-vect-stmts.c (vect_get_vec_def_for_operand): Support mask
	constant.
	(vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p
	signature change.
	(vectorizable_comparison): New.
	(vect_analyze_stmt): Add vectorizable_comparison.
	(vect_transform_stmt): Likewise.
	(get_mask_type_for_scalar_type): New.
	* tree-vectorizer.h (enum stmt_vec_info_type): Add vect_mask_var
	(enum stmt_vec_info_type): Add comparison_vec_info_type.
	(get_mask_type_for_scalar_type): New.
	* tree.c (build_truth_vector_type): New.
	(truth_type_for): Use build_truth_vector_type for vectors.
	* tree.h (build_truth_vector_type): New.

[-- Attachment #2: vec-bool-mask.patch --]
[-- Type: text/plain, Size: 59247 bytes --]

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 6a17ef4..e22aa57 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -129,6 +129,9 @@ extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
 extern void ix86_expand_vec_perm (rtx[]);
 extern bool ix86_expand_vec_perm_const (rtx[]);
+extern bool ix86_expand_mask_vec_cmp (rtx[]);
+extern bool ix86_expand_int_vec_cmp (rtx[]);
+extern bool ix86_expand_fp_vec_cmp (rtx[]);
 extern void ix86_expand_sse_unpack (rtx, rtx, bool, bool);
 extern bool ix86_expand_int_addcc (rtx[]);
 extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 070605f..e44cdb5 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21440,8 +21440,8 @@ ix86_expand_sse_cmp (rtx dest, enum rtx_code code, rtx cmp_op0, rtx cmp_op1,
     cmp_op1 = force_reg (cmp_ops_mode, cmp_op1);
 
   if (optimize
-      || reg_overlap_mentioned_p (dest, op_true)
-      || reg_overlap_mentioned_p (dest, op_false))
+      || (op_true && reg_overlap_mentioned_p (dest, op_true))
+      || (op_false && reg_overlap_mentioned_p (dest, op_false)))
     dest = gen_reg_rtx (maskcmp ? cmp_mode : mode);
 
   /* Compare patterns for int modes are unspec in AVX512F only.  */
@@ -21713,34 +21713,127 @@ ix86_expand_fp_movcc (rtx operands[])
   return true;
 }
 
-/* Expand a floating-point vector conditional move; a vcond operation
-   rather than a movcc operation.  */
+/* Helper for ix86_cmp_code_to_pcmp_immediate for int modes.  */
+
+static int
+ix86_int_cmp_code_to_pcmp_immediate (enum rtx_code code)
+{
+  switch (code)
+    {
+    case EQ:
+      return 0;
+    case LT:
+    case LTU:
+      return 1;
+    case LE:
+    case LEU:
+      return 2;
+    case NE:
+      return 4;
+    case GE:
+    case GEU:
+      return 5;
+    case GT:
+    case GTU:
+      return 6;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Helper for ix86_cmp_code_to_pcmp_immediate for fp modes.  */
+
+static int
+ix86_fp_cmp_code_to_pcmp_immediate (enum rtx_code code)
+{
+  switch (code)
+    {
+    case EQ:
+      return 0x08;
+    case NE:
+      return 0x04;
+    case GT:
+      return 0x16;
+    case LE:
+      return 0x1a;
+    case GE:
+      return 0x15;
+    case LT:
+      return 0x19;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return immediate value to be used in UNSPEC_PCMP
+   for comparison CODE in MODE.  */
+
+static int
+ix86_cmp_code_to_pcmp_immediate (enum rtx_code code, machine_mode mode)
+{
+  if (FLOAT_MODE_P (mode))
+    return ix86_fp_cmp_code_to_pcmp_immediate (code);
+  return ix86_int_cmp_code_to_pcmp_immediate (code);
+}
+
+/* Expand AVX-512 vector comparison.  */
 
 bool
-ix86_expand_fp_vcond (rtx operands[])
+ix86_expand_mask_vec_cmp (rtx operands[])
 {
-  enum rtx_code code = GET_CODE (operands[3]);
+  machine_mode mask_mode = GET_MODE (operands[0]);
+  machine_mode cmp_mode = GET_MODE (operands[2]);
+  enum rtx_code code = GET_CODE (operands[1]);
+  rtx imm = GEN_INT (ix86_cmp_code_to_pcmp_immediate (code, cmp_mode));
+  int unspec_code;
+  rtx unspec;
+
+  switch (code)
+    {
+    case LEU:
+    case GTU:
+    case GEU:
+    case LTU:
+      unspec_code = UNSPEC_UNSIGNED_PCMP;
+    default:
+      unspec_code = UNSPEC_PCMP;
+    }
+
+  unspec = gen_rtx_UNSPEC (mask_mode, gen_rtvec (3, operands[2],
+						 operands[3], imm),
+			   unspec_code);
+  emit_insn (gen_rtx_SET (operands[0], unspec));
+
+  return true;
+}
+
+/* Expand fp vector comparison.  */
+
+bool
+ix86_expand_fp_vec_cmp (rtx operands[])
+{
+  enum rtx_code code = GET_CODE (operands[1]);
   rtx cmp;
 
   code = ix86_prepare_sse_fp_compare_args (operands[0], code,
-					   &operands[4], &operands[5]);
+					   &operands[2], &operands[3]);
   if (code == UNKNOWN)
     {
       rtx temp;
-      switch (GET_CODE (operands[3]))
+      switch (GET_CODE (operands[1]))
 	{
 	case LTGT:
-	  temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[4],
-				      operands[5], operands[0], operands[0]);
-	  cmp = ix86_expand_sse_cmp (operands[0], NE, operands[4],
-				     operands[5], operands[1], operands[2]);
+	  temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[2],
+				      operands[3], NULL, NULL);
+	  cmp = ix86_expand_sse_cmp (operands[0], NE, operands[2],
+				     operands[3], NULL, NULL);
 	  code = AND;
 	  break;
 	case UNEQ:
-	  temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[4],
-				      operands[5], operands[0], operands[0]);
-	  cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[4],
-				     operands[5], operands[1], operands[2]);
+	  temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[2],
+				      operands[3], NULL, NULL);
+	  cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[2],
+				     operands[3], NULL, NULL);
 	  code = IOR;
 	  break;
 	default:
@@ -21748,72 +21841,26 @@ ix86_expand_fp_vcond (rtx operands[])
 	}
       cmp = expand_simple_binop (GET_MODE (cmp), code, temp, cmp, cmp, 1,
 				 OPTAB_DIRECT);
-      ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
-      return true;
     }
+  else
+    cmp = ix86_expand_sse_cmp (operands[0], code, operands[2], operands[3],
+			       operands[1], operands[2]);
 
-  if (ix86_expand_sse_fp_minmax (operands[0], code, operands[4],
-				 operands[5], operands[1], operands[2]))
-    return true;
+  if (operands[0] != cmp)
+    emit_move_insn (operands[0], cmp);
 
-  cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5],
-			     operands[1], operands[2]);
-  ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
   return true;
 }
 
-/* Expand a signed/unsigned integral vector conditional move.  */
-
-bool
-ix86_expand_int_vcond (rtx operands[])
+static rtx
+ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, rtx cop0, rtx cop1,
+			 rtx op_true, rtx op_false, bool *negate)
 {
-  machine_mode data_mode = GET_MODE (operands[0]);
-  machine_mode mode = GET_MODE (operands[4]);
-  enum rtx_code code = GET_CODE (operands[3]);
-  bool negate = false;
-  rtx x, cop0, cop1;
-
-  cop0 = operands[4];
-  cop1 = operands[5];
+  machine_mode data_mode = GET_MODE (dest);
+  machine_mode mode = GET_MODE (cop0);
+  rtx x;
 
-  /* Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
-     and x < 0 ? 1 : 0 into (unsigned) x >> 31.  */
-  if ((code == LT || code == GE)
-      && data_mode == mode
-      && cop1 == CONST0_RTX (mode)
-      && operands[1 + (code == LT)] == CONST0_RTX (data_mode)
-      && GET_MODE_UNIT_SIZE (data_mode) > 1
-      && GET_MODE_UNIT_SIZE (data_mode) <= 8
-      && (GET_MODE_SIZE (data_mode) == 16
-	  || (TARGET_AVX2 && GET_MODE_SIZE (data_mode) == 32)))
-    {
-      rtx negop = operands[2 - (code == LT)];
-      int shift = GET_MODE_UNIT_BITSIZE (data_mode) - 1;
-      if (negop == CONST1_RTX (data_mode))
-	{
-	  rtx res = expand_simple_binop (mode, LSHIFTRT, cop0, GEN_INT (shift),
-					 operands[0], 1, OPTAB_DIRECT);
-	  if (res != operands[0])
-	    emit_move_insn (operands[0], res);
-	  return true;
-	}
-      else if (GET_MODE_INNER (data_mode) != DImode
-	       && vector_all_ones_operand (negop, data_mode))
-	{
-	  rtx res = expand_simple_binop (mode, ASHIFTRT, cop0, GEN_INT (shift),
-					 operands[0], 0, OPTAB_DIRECT);
-	  if (res != operands[0])
-	    emit_move_insn (operands[0], res);
-	  return true;
-	}
-    }
-
-  if (!nonimmediate_operand (cop1, mode))
-    cop1 = force_reg (mode, cop1);
-  if (!general_operand (operands[1], data_mode))
-    operands[1] = force_reg (data_mode, operands[1]);
-  if (!general_operand (operands[2], data_mode))
-    operands[2] = force_reg (data_mode, operands[2]);
+  *negate = false;
 
   /* XOP supports all of the comparisons on all 128-bit vector int types.  */
   if (TARGET_XOP
@@ -21834,13 +21881,13 @@ ix86_expand_int_vcond (rtx operands[])
 	case LE:
 	case LEU:
 	  code = reverse_condition (code);
-	  negate = true;
+	  *negate = true;
 	  break;
 
 	case GE:
 	case GEU:
 	  code = reverse_condition (code);
-	  negate = true;
+	  *negate = true;
 	  /* FALLTHRU */
 
 	case LT:
@@ -21861,14 +21908,14 @@ ix86_expand_int_vcond (rtx operands[])
 	    case EQ:
 	      /* SSE4.1 supports EQ.  */
 	      if (!TARGET_SSE4_1)
-		return false;
+		return NULL;
 	      break;
 
 	    case GT:
 	    case GTU:
 	      /* SSE4.2 supports GT/GTU.  */
 	      if (!TARGET_SSE4_2)
-		return false;
+		return NULL;
 	      break;
 
 	    default:
@@ -21929,12 +21976,13 @@ ix86_expand_int_vcond (rtx operands[])
 	    case V8HImode:
 	      /* Perform a parallel unsigned saturating subtraction.  */
 	      x = gen_reg_rtx (mode);
-	      emit_insn (gen_rtx_SET (x, gen_rtx_US_MINUS (mode, cop0, cop1)));
+	      emit_insn (gen_rtx_SET (x, gen_rtx_US_MINUS (mode, cop0,
+							   cop1)));
 
 	      cop0 = x;
 	      cop1 = CONST0_RTX (mode);
 	      code = EQ;
-	      negate = !negate;
+	      *negate = !*negate;
 	      break;
 
 	    default:
@@ -21943,22 +21991,162 @@ ix86_expand_int_vcond (rtx operands[])
 	}
     }
 
+  if (*negate)
+    std::swap (op_true, op_false);
+
   /* Allow the comparison to be done in one mode, but the movcc to
      happen in another mode.  */
   if (data_mode == mode)
     {
-      x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
-			       operands[1+negate], operands[2-negate]);
+      x = ix86_expand_sse_cmp (dest, code, cop0, cop1,
+			       op_true, op_false);
     }
   else
     {
       gcc_assert (GET_MODE_SIZE (data_mode) == GET_MODE_SIZE (mode));
       x = ix86_expand_sse_cmp (gen_reg_rtx (mode), code, cop0, cop1,
-			       operands[1+negate], operands[2-negate]);
+			       op_true, op_false);
       if (GET_MODE (x) == mode)
 	x = gen_lowpart (data_mode, x);
     }
 
+  return x;
+}
+
+/* Expand integer vector comparison.  */
+
+bool
+ix86_expand_int_vec_cmp (rtx operands[])
+{
+  rtx_code code = GET_CODE (operands[1]);
+  bool negate = false;
+  rtx cmp = ix86_expand_int_sse_cmp (operands[0], code, operands[2],
+				     operands[3], NULL, NULL, &negate);
+
+  if (!cmp)
+    return false;
+
+  if (negate)
+    cmp = ix86_expand_int_sse_cmp (operands[0], EQ, cmp,
+				   CONST0_RTX (GET_MODE (cmp)),
+				   NULL, NULL, &negate);
+
+  gcc_assert (!negate);
+
+  if (operands[0] != cmp)
+    emit_move_insn (operands[0], cmp);
+
+  return true;
+}
+
+/* Expand a floating-point vector conditional move; a vcond operation
+   rather than a movcc operation.  */
+
+bool
+ix86_expand_fp_vcond (rtx operands[])
+{
+  enum rtx_code code = GET_CODE (operands[3]);
+  rtx cmp;
+
+  code = ix86_prepare_sse_fp_compare_args (operands[0], code,
+					   &operands[4], &operands[5]);
+  if (code == UNKNOWN)
+    {
+      rtx temp;
+      switch (GET_CODE (operands[3]))
+	{
+	case LTGT:
+	  temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[4],
+				      operands[5], operands[0], operands[0]);
+	  cmp = ix86_expand_sse_cmp (operands[0], NE, operands[4],
+				     operands[5], operands[1], operands[2]);
+	  code = AND;
+	  break;
+	case UNEQ:
+	  temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[4],
+				      operands[5], operands[0], operands[0]);
+	  cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[4],
+				     operands[5], operands[1], operands[2]);
+	  code = IOR;
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      cmp = expand_simple_binop (GET_MODE (cmp), code, temp, cmp, cmp, 1,
+				 OPTAB_DIRECT);
+      ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
+      return true;
+    }
+
+  if (ix86_expand_sse_fp_minmax (operands[0], code, operands[4],
+				 operands[5], operands[1], operands[2]))
+    return true;
+
+  cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5],
+			     operands[1], operands[2]);
+  ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
+  return true;
+}
+
+/* Expand a signed/unsigned integral vector conditional move.  */
+
+bool
+ix86_expand_int_vcond (rtx operands[])
+{
+  machine_mode data_mode = GET_MODE (operands[0]);
+  machine_mode mode = GET_MODE (operands[4]);
+  enum rtx_code code = GET_CODE (operands[3]);
+  bool negate = false;
+  rtx x, cop0, cop1;
+
+  cop0 = operands[4];
+  cop1 = operands[5];
+
+  /* Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
+     and x < 0 ? 1 : 0 into (unsigned) x >> 31.  */
+  if ((code == LT || code == GE)
+      && data_mode == mode
+      && cop1 == CONST0_RTX (mode)
+      && operands[1 + (code == LT)] == CONST0_RTX (data_mode)
+      && GET_MODE_UNIT_SIZE (data_mode) > 1
+      && GET_MODE_UNIT_SIZE (data_mode) <= 8
+      && (GET_MODE_SIZE (data_mode) == 16
+	  || (TARGET_AVX2 && GET_MODE_SIZE (data_mode) == 32)))
+    {
+      rtx negop = operands[2 - (code == LT)];
+      int shift = GET_MODE_UNIT_BITSIZE (data_mode) - 1;
+      if (negop == CONST1_RTX (data_mode))
+	{
+	  rtx res = expand_simple_binop (mode, LSHIFTRT, cop0, GEN_INT (shift),
+					 operands[0], 1, OPTAB_DIRECT);
+	  if (res != operands[0])
+	    emit_move_insn (operands[0], res);
+	  return true;
+	}
+      else if (GET_MODE_INNER (data_mode) != DImode
+	       && vector_all_ones_operand (negop, data_mode))
+	{
+	  rtx res = expand_simple_binop (mode, ASHIFTRT, cop0, GEN_INT (shift),
+					 operands[0], 0, OPTAB_DIRECT);
+	  if (res != operands[0])
+	    emit_move_insn (operands[0], res);
+	  return true;
+	}
+    }
+
+  if (!nonimmediate_operand (cop1, mode))
+    cop1 = force_reg (mode, cop1);
+  if (!general_operand (operands[1], data_mode))
+    operands[1] = force_reg (data_mode, operands[1]);
+  if (!general_operand (operands[2], data_mode))
+    operands[2] = force_reg (data_mode, operands[2]);
+
+  x = ix86_expand_int_sse_cmp (operands[0], code, cop0, cop1,
+			       operands[1], operands[2], &negate);
+
+  if (!x)
+    return false;
+
   ix86_expand_sse_movcc (operands[0], x, operands[1+negate],
 			 operands[2-negate]);
   return true;
@@ -51678,6 +51866,25 @@ ix86_autovectorize_vector_sizes (void)
     (TARGET_AVX && !TARGET_PREFER_AVX128) ? 32 | 16 : 0;
 }
 
+/* Implemenation of targetm.vectorize.get_mask_mode.  */
+
+static machine_mode
+ix86_get_mask_mode (unsigned nunits, unsigned vector_size)
+{
+  /* Scalar mask case.  */
+  if ((TARGET_AVX512F && vector_size == 64)
+      || TARGET_AVX512VL)
+    return smallest_mode_for_size (nunits, MODE_INT);
+
+  unsigned elem_size = vector_size / nunits;
+  machine_mode elem_mode
+    = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
+
+  gcc_assert (elem_size * nunits == vector_size);
+
+  return mode_for_vector (elem_mode, nunits);
+}
+
 \f
 
 /* Return class of registers which could be used for pseudo of MODE
@@ -52612,6 +52819,8 @@ ix86_operands_ok_for_move_multiple (rtx *operands, bool load,
 #undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
 #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
   ix86_autovectorize_vector_sizes
+#undef TARGET_VECTORIZE_GET_MASK_MODE
+#define TARGET_VECTORIZE_GET_MASK_MODE ix86_get_mask_mode
 #undef TARGET_VECTORIZE_INIT_COST
 #define TARGET_VECTORIZE_INIT_COST ix86_init_cost
 #undef TARGET_VECTORIZE_ADD_STMT_COST
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4535570..a8d55cc 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -605,6 +605,15 @@
    (V16SF "HI") (V8SF  "QI") (V4SF  "QI")
    (V8DF  "QI") (V4DF  "QI") (V2DF  "QI")])
 
+;; Mapping of vector modes to corresponding mask size
+(define_mode_attr avx512fmaskmodelower
+  [(V64QI "di") (V32QI "si") (V16QI "hi")
+   (V32HI "si") (V16HI "hi") (V8HI  "qi") (V4HI "qi")
+   (V16SI "hi") (V8SI  "qi") (V4SI  "qi")
+   (V8DI  "qi") (V4DI  "qi") (V2DI  "qi")
+   (V16SF "hi") (V8SF  "qi") (V4SF  "qi")
+   (V8DF  "qi") (V4DF  "qi") (V2DF  "qi")])
+
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr sseintvecmode
   [(V16SF "V16SI") (V8DF  "V8DI")
@@ -2803,6 +2812,150 @@
 		      (const_string "0")))
    (set_attr "mode" "<MODE>")])
 
+(define_expand "vec_cmp<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:V48_AVX512VL 2 "register_operand")
+	   (match_operand:V48_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512F"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:VI12_AVX512VL 2 "register_operand")
+	   (match_operand:VI12_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512BW"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI_256 2 "register_operand")
+	   (match_operand:VI_256 3 "nonimmediate_operand")]))]
+  "TARGET_AVX2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI124_128 2 "register_operand")
+	   (match_operand:VI124_128 3 "nonimmediate_operand")]))]
+  "TARGET_SSE2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpv2div2di"
+  [(set (match_operand:V2DI 0 "register_operand")
+	(match_operator:V2DI 1 ""
+	  [(match_operand:V2DI 2 "register_operand")
+	   (match_operand:V2DI 3 "nonimmediate_operand")]))]
+  "TARGET_SSE4_2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VF_256 2 "register_operand")
+	   (match_operand:VF_256 3 "nonimmediate_operand")]))]
+  "TARGET_AVX"
+{
+  bool ok = ix86_expand_fp_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VF_128 2 "register_operand")
+	   (match_operand:VF_128 3 "nonimmediate_operand")]))]
+  "TARGET_SSE"
+{
+  bool ok = ix86_expand_fp_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:VI48_AVX512VL 2 "register_operand")
+	   (match_operand:VI48_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512F"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:VI12_AVX512VL 2 "register_operand")
+	   (match_operand:VI12_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512BW"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI_256 2 "register_operand")
+	   (match_operand:VI_256 3 "nonimmediate_operand")]))]
+  "TARGET_AVX2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI124_128 2 "register_operand")
+	   (match_operand:VI124_128 3 "nonimmediate_operand")]))]
+  "TARGET_SSE2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpuv2div2di"
+  [(set (match_operand:V2DI 0 "register_operand")
+	(match_operator:V2DI 1 ""
+	  [(match_operand:V2DI 2 "register_operand")
+	   (match_operand:V2DI 3 "nonimmediate_operand")]))]
+  "TARGET_SSE4_2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
 (define_expand "vcond<V_512:mode><VF_512:mode>"
   [(set (match_operand:V_512 0 "register_operand")
 	(if_then_else:V_512
@@ -17895,7 +18048,7 @@
    (set_attr "btver2_decode" "vector") 
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_expand "maskload<mode>"
+(define_expand "maskload<mode><sseintvecmodelower>"
   [(set (match_operand:V48_AVX2 0 "register_operand")
 	(unspec:V48_AVX2
 	  [(match_operand:<sseintvecmode> 2 "register_operand")
@@ -17903,7 +18056,23 @@
 	  UNSPEC_MASKMOV))]
   "TARGET_AVX")
 
-(define_expand "maskstore<mode>"
+(define_expand "maskload<mode><avx512fmaskmodelower>"
+  [(set (match_operand:V48_AVX512VL 0 "register_operand")
+	(vec_merge:V48_AVX512VL
+	  (match_operand:V48_AVX512VL 1 "memory_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512F")
+
+(define_expand "maskload<mode><avx512fmaskmodelower>"
+  [(set (match_operand:VI12_AVX512VL 0 "register_operand")
+	(vec_merge:VI12_AVX512VL
+	  (match_operand:VI12_AVX512VL 1 "memory_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512BW")
+
+(define_expand "maskstore<mode><sseintvecmodelower>"
   [(set (match_operand:V48_AVX2 0 "memory_operand")
 	(unspec:V48_AVX2
 	  [(match_operand:<sseintvecmode> 2 "register_operand")
@@ -17912,6 +18081,22 @@
 	  UNSPEC_MASKMOV))]
   "TARGET_AVX")
 
+(define_expand "maskstore<mode><avx512fmaskmodelower>"
+  [(set (match_operand:V48_AVX512VL 0 "memory_operand")
+	(vec_merge:V48_AVX512VL
+	  (match_operand:V48_AVX512VL 1 "register_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512F")
+
+(define_expand "maskstore<mode><avx512fmaskmodelower>"
+  [(set (match_operand:VI12_AVX512VL 0 "memory_operand")
+	(vec_merge:VI12_AVX512VL
+	  (match_operand:VI12_AVX512VL 1 "register_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512BW")
+
 (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
   [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
 	(unspec:AVX256MODE2P
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f5a1f84..acdfcd5 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5688,6 +5688,11 @@ mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.
 The default is zero which means to not iterate over other vector sizes.
 @end deftypefn
 
+@deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_GET_MASK_MODE (unsigned @var{nunits}, unsigned @var{length})
+This hook returns mode to be used for a mask to be used for a vector
+of specified @var{length} with @var{nunits} elements.
+@end deftypefn
+
 @deftypefn {Target Hook} {void *} TARGET_VECTORIZE_INIT_COST (struct loop *@var{loop_info})
 This hook should initialize target-specific data structures in preparation for modeling the costs of vectorizing a loop or basic block.  The default allocates three unsigned integers for accumulating costs for the prologue, body, and epilogue of the loop or basic block.  If @var{loop_info} is non-NULL, it identifies the loop being vectorized; otherwise a single block is being vectorized.
 @end deftypefn
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 9d5ac0a..52e912a 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4225,6 +4225,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
 
+@hook TARGET_VECTORIZE_GET_MASK_MODE
+
 @hook TARGET_VECTORIZE_INIT_COST
 
 @hook TARGET_VECTORIZE_ADD_STMT_COST
diff --git a/gcc/expr.c b/gcc/expr.c
index 1e820b4..fa48484 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11000,9 +11000,15 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
   if (TREE_CODE (ops->type) == VECTOR_TYPE)
     {
       tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
-      tree if_true = constant_boolean_node (true, ops->type);
-      tree if_false = constant_boolean_node (false, ops->type);
-      return expand_vec_cond_expr (ops->type, ifexp, if_true, if_false, target);
+      if (TREE_TYPE (ops->type) == boolean_type_node)
+	return expand_vec_cmp_expr (ops->type, ifexp, target);
+      else
+	{
+	  tree if_true = constant_boolean_node (true, ops->type);
+	  tree if_false = constant_boolean_node (false, ops->type);
+	  return expand_vec_cond_expr (ops->type, ifexp, if_true,
+				       if_false, target);
+	}
     }
 
   /* Get the rtx comparison code to use.  We know that EXP is a comparison
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index e785946..4ca0a40 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt)
   create_output_operand (&ops[0], target, TYPE_MODE (type));
   create_fixed_operand (&ops[1], mem);
   create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
+  expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type),
+				      TYPE_MODE (TREE_TYPE (maskt))),
+	       3, ops);
 }
 
 static void
@@ -1908,7 +1910,9 @@ expand_MASK_STORE (gcall *stmt)
   create_fixed_operand (&ops[0], mem);
   create_input_operand (&ops[1], reg, TYPE_MODE (type));
   create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (optab_handler (maskstore_optab, TYPE_MODE (type)), 3, ops);
+  expand_insn (convert_optab_handler (maskstore_optab, TYPE_MODE (type),
+				      TYPE_MODE (TREE_TYPE (maskt))),
+	       3, ops);
 }
 
 static void
diff --git a/gcc/optabs.c b/gcc/optabs.c
index e533e6e..48f7914 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6490,11 +6490,13 @@ get_rtx_code (enum tree_code tcode, bool unsignedp)
 }
 
 /* Return comparison rtx for COND. Use UNSIGNEDP to select signed or
-   unsigned operators. Do not generate compare instruction.  */
+   unsigned operators.  OPNO holds an index of the first comparison
+   operand in insn with code ICODE.  Do not generate compare instruction.  */
 
 static rtx
 vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
-		    bool unsignedp, enum insn_code icode)
+		    bool unsignedp, enum insn_code icode,
+		    unsigned int opno)
 {
   struct expand_operand ops[2];
   rtx rtx_op0, rtx_op1;
@@ -6520,7 +6522,7 @@ vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
 
   create_input_operand (&ops[0], rtx_op0, m0);
   create_input_operand (&ops[1], rtx_op1, m1);
-  if (!maybe_legitimize_operands (icode, 4, 2, ops))
+  if (!maybe_legitimize_operands (icode, opno, 2, ops))
     gcc_unreachable ();
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
@@ -6863,7 +6865,7 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
   if (icode == CODE_FOR_nothing)
     return 0;
 
-  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode);
+  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 4);
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
 
@@ -6877,6 +6879,63 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
   return ops[0].value;
 }
 
+/* Return insn code for a comparison operator with VMODE
+   resultin MASK_MODE, unsigned if UNS is true.  */
+
+static inline enum insn_code
+get_vec_cmp_icode (machine_mode vmode, machine_mode mask_mode, bool uns)
+{
+  optab tab = uns ? vec_cmpu_optab : vec_cmp_optab;
+  return convert_optab_handler (tab, vmode, mask_mode);
+}
+
+/* Return TRUE if appropriate vector insn is available
+   for vector comparison expr with vector type VALUE_TYPE
+   and resulting mask with MASK_TYPE.  */
+
+bool
+expand_vec_cmp_expr_p (tree value_type, tree mask_type)
+{
+  enum insn_code icode = get_vec_cmp_icode (TYPE_MODE (value_type),
+					    TYPE_MODE (mask_type),
+					    TYPE_UNSIGNED (value_type));
+  return (icode != CODE_FOR_nothing);
+}
+
+/* Generate insns for a vector comparison into a mask.  */
+
+rtx
+expand_vec_cmp_expr (tree type, tree exp, rtx target)
+{
+  struct expand_operand ops[4];
+  enum insn_code icode;
+  rtx comparison;
+  machine_mode mask_mode = TYPE_MODE (type);
+  machine_mode vmode;
+  bool unsignedp;
+  tree op0a, op0b;
+  enum tree_code tcode;
+
+  op0a = TREE_OPERAND (exp, 0);
+  op0b = TREE_OPERAND (exp, 1);
+  tcode = TREE_CODE (exp);
+
+  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
+  vmode = TYPE_MODE (TREE_TYPE (op0a));
+
+  icode = get_vec_cmp_icode (vmode, mask_mode, unsignedp);
+  if (icode == CODE_FOR_nothing)
+    return 0;
+
+  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 2);
+  create_output_operand (&ops[0], target, mask_mode);
+  create_fixed_operand (&ops[1], comparison);
+  create_fixed_operand (&ops[2], XEXP (comparison, 0));
+  create_fixed_operand (&ops[3], XEXP (comparison, 1));
+  expand_insn (icode, 4, ops);
+  return ops[0].value;
+}
+
 /* Return non-zero if a highpart multiply is supported of can be synthisized.
    For the benefit of expand_mult_highpart, the return value is 1 for direct,
    2 for even/odd widening, and 3 for hi/lo widening.  */
@@ -7002,26 +7061,32 @@ expand_mult_highpart (machine_mode mode, rtx op0, rtx op1,
 
 /* Return true if target supports vector masked load/store for mode.  */
 bool
-can_vec_mask_load_store_p (machine_mode mode, bool is_load)
+can_vec_mask_load_store_p (machine_mode mode,
+			   machine_mode mask_mode,
+			   bool is_load)
 {
   optab op = is_load ? maskload_optab : maskstore_optab;
-  machine_mode vmode;
   unsigned int vector_sizes;
 
   /* If mode is vector mode, check it directly.  */
   if (VECTOR_MODE_P (mode))
-    return optab_handler (op, mode) != CODE_FOR_nothing;
+    return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing;
 
   /* Otherwise, return true if there is some vector mode with
      the mask load/store supported.  */
 
   /* See if there is any chance the mask load or store might be
      vectorized.  If not, punt.  */
-  vmode = targetm.vectorize.preferred_simd_mode (mode);
-  if (!VECTOR_MODE_P (vmode))
+  mode = targetm.vectorize.preferred_simd_mode (mode);
+  if (!VECTOR_MODE_P (mode))
+    return false;
+
+  mask_mode = targetm.vectorize.get_mask_mode (GET_MODE_NUNITS (mode),
+					       GET_MODE_SIZE (mode));
+  if (mask_mode == VOIDmode)
     return false;
 
-  if (optab_handler (op, vmode) != CODE_FOR_nothing)
+  if (convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing)
     return true;
 
   vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
@@ -7031,9 +7096,12 @@ can_vec_mask_load_store_p (machine_mode mode, bool is_load)
       vector_sizes &= ~cur;
       if (cur <= GET_MODE_SIZE (mode))
 	continue;
-      vmode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode));
-      if (VECTOR_MODE_P (vmode)
-	  && optab_handler (op, vmode) != CODE_FOR_nothing)
+      mode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode));
+      mask_mode = targetm.vectorize.get_mask_mode (GET_MODE_NUNITS (mode),
+						   cur);
+      if (VECTOR_MODE_P (mode)
+	  && mask_mode != VOIDmode
+	  && convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing)
 	return true;
     }
   return false;
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 888b21c..9804378 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -61,6 +61,10 @@ OPTAB_CD(vec_load_lanes_optab, "vec_load_lanes$a$b")
 OPTAB_CD(vec_store_lanes_optab, "vec_store_lanes$a$b")
 OPTAB_CD(vcond_optab, "vcond$a$b")
 OPTAB_CD(vcondu_optab, "vcondu$a$b")
+OPTAB_CD(vec_cmp_optab, "vec_cmp$a$b")
+OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
+OPTAB_CD(maskload_optab, "maskload$a$b")
+OPTAB_CD(maskstore_optab, "maskstore$a$b")
 
 OPTAB_NL(add_optab, "add$P$a3", PLUS, "add", '3', gen_int_fp_fixed_libfunc)
 OPTAB_NX(add_optab, "add$F$a3")
@@ -264,8 +268,6 @@ OPTAB_D (udot_prod_optab, "udot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
-OPTAB_D (maskload_optab, "maskload$a")
-OPTAB_D (maskstore_optab, "maskstore$a")
 OPTAB_D (vec_extract_optab, "vec_extract$a")
 OPTAB_D (vec_init_optab, "vec_init$a")
 OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 95f5cbc..dfe9ebf 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -496,6 +496,12 @@ extern bool can_vec_perm_p (machine_mode, bool, const unsigned char *);
 extern rtx expand_vec_perm (machine_mode, rtx, rtx, rtx, rtx);
 
 /* Return tree if target supports vector operations for COND_EXPR.  */
+bool expand_vec_cmp_expr_p (tree, tree);
+
+/* Generate code for VEC_COND_EXPR.  */
+extern rtx expand_vec_cmp_expr (tree, tree, rtx);
+
+/* Return true if target supports vector comparison.  */
 bool expand_vec_cond_expr_p (tree, tree);
 
 /* Generate code for VEC_COND_EXPR.  */
@@ -508,7 +514,7 @@ extern int can_mult_highpart_p (machine_mode, bool);
 extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool);
 
 /* Return true if target supports vector masked load/store for mode.  */
-extern bool can_vec_mask_load_store_p (machine_mode, bool);
+extern bool can_vec_mask_load_store_p (machine_mode, machine_mode, bool);
 
 /* Return true if there is an inline compare and swap pattern.  */
 extern bool can_compare_and_swap_p (machine_mode, bool);
diff --git a/gcc/target.def b/gcc/target.def
index 4edc209..c5b8ed9 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1789,6 +1789,15 @@ The default is zero which means to not iterate over other vector sizes.",
  (void),
  default_autovectorize_vector_sizes)
 
+/* Function to get a target mode for a vector mask.  */
+DEFHOOK
+(get_mask_mode,
+ "This hook returns mode to be used for a mask to be used for a vector\n\
+of specified @var{length} with @var{nunits} elements.",
+ machine_mode,
+ (unsigned nunits, unsigned length),
+ default_get_mask_mode)
+
 /* Target builtin that implements vector gather operation.  */
 DEFHOOK
 (builtin_gather,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 7238c8f..61fb97d 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1033,6 +1033,8 @@ tree default_mangle_decl_assembler_name (tree decl ATTRIBUTE_UNUSED,
 HOST_WIDE_INT
 default_vector_alignment (const_tree type)
 {
+  if (TREE_TYPE (type) == boolean_type_node)
+    return GET_MODE_ALIGNMENT (TYPE_MODE (type));
   return tree_to_shwi (TYPE_SIZE (type));
 }
 
@@ -1087,6 +1089,20 @@ default_autovectorize_vector_sizes (void)
   return 0;
 }
 
+/* By defaults a vector of integers is used as a mask.  */
+
+machine_mode
+default_get_mask_mode (unsigned nunits, unsigned vector_size)
+{
+  unsigned elem_size = vector_size / nunits;
+  machine_mode elem_mode
+    = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
+
+  gcc_assert (elem_size * nunits == vector_size);
+
+  return mode_for_vector (elem_mode, nunits);
+}
+
 /* By default, the cost model accumulates three separate costs (prologue,
    loop body, and epilogue) for a vectorized loop or block.  So allocate an
    array of three unsigned ints, set it to zero, and return its address.  */
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 5ae991d..cc7263f 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -100,6 +100,7 @@ default_builtin_support_vector_misalignment (machine_mode mode,
 					     int, bool);
 extern machine_mode default_preferred_simd_mode (machine_mode mode);
 extern unsigned int default_autovectorize_vector_sizes (void);
+extern machine_mode default_get_mask_mode (unsigned, unsigned);
 extern void *default_init_cost (struct loop *);
 extern unsigned default_add_stmt_cost (void *, int, enum vect_cost_for_stmt,
 				       struct _stmt_vec_info *, int,
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 5ac73b3..1ee8f93 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3490,6 +3490,27 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
           return true;
         }
     }
+  /* Or a boolean vector type with the same element count
+     as the comparison operand types.  */
+  else if (TREE_CODE (type) == VECTOR_TYPE
+	   && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
+    {
+      if (TREE_CODE (op0_type) != VECTOR_TYPE
+	  || TREE_CODE (op1_type) != VECTOR_TYPE)
+        {
+          error ("non-vector operands in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+
+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type))
+        {
+          error ("invalid vector comparison resulting type");
+          debug_generic_expr (type);
+          return true;
+        }
+    }
   else
     {
       error ("bogus comparison result type");
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 291e602..1c9242a 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -811,7 +811,7 @@ ifcvt_can_use_mask_load_store (gimple stmt)
       || VECTOR_MODE_P (mode))
     return false;
 
-  if (can_vec_mask_load_store_p (mode, is_load))
+  if (can_vec_mask_load_store_p (mode, VOIDmode, is_load))
     return true;
 
   return false;
@@ -2082,15 +2082,14 @@ predicate_mem_writes (loop_p loop)
 	      mask = vect_masks[index];
 	    else
 	      {
-		masktype = build_nonstandard_integer_type (bitsize, 1);
-		mask_op0 = build_int_cst (masktype, swap ? 0 : -1);
-		mask_op1 = build_int_cst (masktype, swap ? -1 : 0);
-		cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
-						   is_gimple_condexpr,
-						   NULL_TREE,
-						   true, GSI_SAME_STMT);
-		mask = fold_build_cond_expr (masktype, unshare_expr (cond),
-					     mask_op0, mask_op1);
+		masktype = boolean_type_node;
+		if (TREE_CODE (cond) == NE_EXPR
+		    && integer_zerop (TREE_OPERAND (cond, 1))
+		    && TREE_CODE (TREE_TYPE (TREE_OPERAND (cond, 0)))
+		       == BOOLEAN_TYPE)
+		  mask = TREE_OPERAND (cond, 0);
+		else
+		  mask = unshare_expr (cond);
 		mask = ifc_temp_var (masktype, mask, &gsi);
 		/* Save mask and its size for further use.  */
 	        vect_sizes.safe_push (bitsize);
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index f1eaef4..b3b970a 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -3849,6 +3849,9 @@ vect_get_new_vect_var (tree type, enum vect_var_kind var_kind, const char *name)
   case vect_scalar_var:
     prefix = "stmp";
     break;
+  case vect_mask_var:
+    prefix = "mask";
+    break;
   case vect_pointer_var:
     prefix = "vectp";
     break;
@@ -4403,7 +4406,11 @@ vect_create_destination_var (tree scalar_dest, tree vectype)
   tree type;
   enum vect_var_kind kind;
 
-  kind = vectype ? vect_simple_var : vect_scalar_var;
+  kind = vectype
+    ? TREE_TYPE (vectype) == boolean_type_node
+    ? vect_mask_var
+    : vect_simple_var
+    : vect_scalar_var;
   type = vectype ? vectype : TREE_TYPE (scalar_dest);
 
   gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME);
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index be3d27f..3e4fabc 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -350,7 +350,8 @@ expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
                           tree op1, enum tree_code code)
 {
   tree t;
-  if (! expand_vec_cond_expr_p (type, TREE_TYPE (op0)))
+  if (!expand_vec_cmp_expr_p (TREE_TYPE (op0), type)
+      && !expand_vec_cond_expr_p (type, TREE_TYPE (op0)))
     t = expand_vector_piecewise (gsi, do_compare, type,
 				 TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
   else
@@ -1506,6 +1507,11 @@ expand_vector_operations_1 (gimple_stmt_iterator *gsi)
   if (TREE_CODE (type) != VECTOR_TYPE)
     return;
 
+  /* A scalar operation pretending to be a vector one.  */
+  if (!VECTOR_MODE_P (TYPE_MODE (type))
+      && TYPE_MODE (type) != BLKmode)
+    return;
+
   if (CONVERT_EXPR_CODE_P (code)
       || code == FLOAT_EXPR
       || code == FIX_TRUNC_EXPR
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 59c75af..f2dbc4e 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -193,19 +193,21 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 {
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
-  int nbbs = loop->num_nodes;
+  unsigned nbbs = loop->num_nodes;
   unsigned int vectorization_factor = 0;
   tree scalar_type;
   gphi *phi;
   tree vectype;
   unsigned int nunits;
   stmt_vec_info stmt_info;
-  int i;
+  unsigned i;
   HOST_WIDE_INT dummy;
   gimple stmt, pattern_stmt = NULL;
   gimple_seq pattern_def_seq = NULL;
   gimple_stmt_iterator pattern_def_si = gsi_none ();
   bool analyze_pattern_stmt = false;
+  bool bool_result;
+  auto_vec<stmt_vec_info> mask_producers;
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
@@ -424,6 +426,8 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 	      return false;
 	    }
 
+	  bool_result = false;
+
 	  if (STMT_VINFO_VECTYPE (stmt_info))
 	    {
 	      /* The only case when a vectype had been already set is for stmts
@@ -444,6 +448,30 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 		scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
 	      else
 		scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
+
+	      /* Bool ops don't participate in vectorization factor
+		 computation.  For comparison use compared types to
+		 compute a factor.  */
+	      if (scalar_type == boolean_type_node)
+		{
+		  mask_producers.safe_push (stmt_info);
+		  bool_result = true;
+
+		  if (gimple_code (stmt) == GIMPLE_ASSIGN
+		      && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
+		      && TREE_TYPE (gimple_assign_rhs1 (stmt)) != boolean_type_node)
+		    scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
+		  else
+		    {
+		      if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
+			{
+			  pattern_def_seq = NULL;
+			  gsi_next (&si);
+			}
+		      continue;
+		    }
+		}
+
 	      if (dump_enabled_p ())
 		{
 		  dump_printf_loc (MSG_NOTE, vect_location,
@@ -466,7 +494,8 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 		  return false;
 		}
 
-	      STMT_VINFO_VECTYPE (stmt_info) = vectype;
+	      if (!bool_result)
+		STMT_VINFO_VECTYPE (stmt_info) = vectype;
 
 	      if (dump_enabled_p ())
 		{
@@ -479,8 +508,9 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 	  /* The vectorization factor is according to the smallest
 	     scalar type (or the largest vector size, but we only
 	     support one vector size per loop).  */
-	  scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,
-						       &dummy);
+	  if (!bool_result)
+	    scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,
+							 &dummy);
 	  if (dump_enabled_p ())
 	    {
 	      dump_printf_loc (MSG_NOTE, vect_location,
@@ -555,6 +585,100 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
     }
   LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor;
 
+  for (i = 0; i < mask_producers.length (); i++)
+    {
+      tree mask_type = NULL;
+      bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (mask_producers[i]);
+
+      stmt = STMT_VINFO_STMT (mask_producers[i]);
+
+      if (gimple_code (stmt) == GIMPLE_ASSIGN
+	  && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
+	  && TREE_TYPE (gimple_assign_rhs1 (stmt)) != boolean_type_node)
+	{
+	  scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
+	  mask_type = get_mask_type_for_scalar_type (scalar_type);
+
+	  if (!mask_type)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "not vectorized: unsupported mask\n");
+	      return false;
+	    }
+	}
+      else
+	{
+	  tree rhs, def;
+	  ssa_op_iter iter;
+	  gimple def_stmt;
+	  enum vect_def_type dt;
+
+	  FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE)
+	    {
+	      if (!vect_is_simple_use_1 (rhs, stmt, loop_vinfo, bb_vinfo,
+					 &def_stmt, &def, &dt, &vectype))
+		{
+		  if (dump_enabled_p ())
+		    {
+		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				       "not vectorized: can't compute mask type "
+				       "for statement, ");
+		      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,  TDF_SLIM, stmt,
+					0);
+		      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+		    }
+		  return false;
+		}
+
+	      /* No vectype probably means external definition.
+		 Allow it in case there is another operand which
+		 allows to determine mask type.  */
+	      if (!vectype)
+		continue;
+
+	      if (!mask_type)
+		mask_type = vectype;
+	      else if (TYPE_VECTOR_SUBPARTS (mask_type)
+		       != TYPE_VECTOR_SUBPARTS (vectype))
+		{
+		  if (dump_enabled_p ())
+		    {
+		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				       "not vectorized: different sized masks "
+				       "types in statement, ");
+		      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+					 mask_type);
+		      dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
+		      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+					 vectype);
+		      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+		    }
+		  return false;
+		}
+	    }
+	}
+
+      /* No mask_type should mean loop invariant predicate.
+	 This is probably a subject for optimization in
+	 if-conversion.  */
+      if (!mask_type)
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "not vectorized: can't compute mask type "
+			       "for statement, ");
+	      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,  TDF_SLIM, stmt,
+				0);
+	      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+	    }
+	  return false;
+	}
+
+      STMT_VINFO_VECTYPE (mask_producers[i]) = mask_type;
+    }
+
   return true;
 }
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index f87c066..41fb401 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1411,7 +1411,15 @@ vect_get_vec_def_for_operand (tree op, gimple stmt, tree *scalar_def)
     /* Case 1: operand is a constant.  */
     case vect_constant_def:
       {
-	vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
+	if (TREE_TYPE (op) == boolean_type_node)
+	  {
+	    vector_type = STMT_VINFO_VECTYPE (stmt_vinfo);
+	    nunits = TYPE_VECTOR_SUBPARTS (vector_type);
+	    vector_type = build_truth_vector_type (nunits, current_vector_size);
+	  }
+	else
+	  vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
+
 	gcc_assert (vector_type);
 	nunits = TYPE_VECTOR_SUBPARTS (vector_type);
 
@@ -1758,6 +1766,7 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
   bool nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt);
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree mask_vectype;
   tree elem_type;
   gimple new_stmt;
   tree dummy;
@@ -1785,8 +1794,8 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
 
   is_store = gimple_call_internal_fn (stmt) == IFN_MASK_STORE;
   mask = gimple_call_arg (stmt, 2);
-  if (TYPE_PRECISION (TREE_TYPE (mask))
-      != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype))))
+
+  if (TREE_TYPE (mask) != boolean_type_node)
     return false;
 
   /* FORNOW. This restriction should be relaxed.  */
@@ -1815,6 +1824,19 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
   if (STMT_VINFO_STRIDED_P (stmt_info))
     return false;
 
+  if (TREE_CODE (mask) != SSA_NAME)
+    return false;
+
+  if (!vect_is_simple_use_1 (mask, stmt, loop_vinfo, NULL,
+			     &def_stmt, &def, &dt, &mask_vectype))
+    return false;
+
+  if (!mask_vectype)
+    mask_vectype = get_mask_type_for_scalar_type (TREE_TYPE (vectype));
+
+  if (!mask_vectype)
+    return false;
+
   if (STMT_VINFO_GATHER_P (stmt_info))
     {
       gimple def_stmt;
@@ -1848,14 +1870,9 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
 				 : DR_STEP (dr), size_zero_node) <= 0)
     return false;
   else if (!VECTOR_MODE_P (TYPE_MODE (vectype))
-	   || !can_vec_mask_load_store_p (TYPE_MODE (vectype), !is_store))
-    return false;
-
-  if (TREE_CODE (mask) != SSA_NAME)
-    return false;
-
-  if (!vect_is_simple_use (mask, stmt, loop_vinfo, NULL,
-			   &def_stmt, &def, &dt))
+	   || !can_vec_mask_load_store_p (TYPE_MODE (vectype),
+					  TYPE_MODE (mask_vectype),
+					  !is_store))
     return false;
 
   if (is_store)
@@ -7373,6 +7390,201 @@ vectorizable_condition (gimple stmt, gimple_stmt_iterator *gsi,
   return true;
 }
 
+/* vectorizable_comparison.
+
+   Check if STMT is comparison expression that can be vectorized.
+   If VEC_STMT is also passed, vectorize the STMT: create a vectorized
+   comparison, put it in VEC_STMT, and insert it at GSI.
+
+   Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
+
+bool
+vectorizable_comparison (gimple stmt, gimple_stmt_iterator *gsi,
+			 gimple *vec_stmt, tree reduc_def,
+			 slp_tree slp_node)
+{
+  tree lhs, rhs1, rhs2;
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  tree vectype1, vectype2;
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
+  tree vec_compare;
+  tree new_temp;
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  tree def;
+  enum vect_def_type dt, dts[4];
+  unsigned nunits;
+  int ncopies;
+  enum tree_code code;
+  stmt_vec_info prev_stmt_info = NULL;
+  int i, j;
+  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
+  vec<tree> vec_oprnds0 = vNULL;
+  vec<tree> vec_oprnds1 = vNULL;
+  tree mask_type;
+  tree mask;
+
+  if (TREE_TYPE (vectype) != boolean_type_node)
+    return false;
+
+  mask_type = vectype;
+  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+  if (slp_node || PURE_SLP_STMT (stmt_info))
+    ncopies = 1;
+  else
+    ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+
+  gcc_assert (ncopies >= 1);
+  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
+      && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
+	   && reduc_def))
+    return false;
+
+  if (STMT_VINFO_LIVE_P (stmt_info))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "value used after loop.\n");
+      return false;
+    }
+
+  if (!is_gimple_assign (stmt))
+    return false;
+
+  code = gimple_assign_rhs_code (stmt);
+
+  if (TREE_CODE_CLASS (code) != tcc_comparison)
+    return false;
+
+  rhs1 = gimple_assign_rhs1 (stmt);
+  rhs2 = gimple_assign_rhs2 (stmt);
+
+  if (TREE_CODE (rhs1) == SSA_NAME)
+    {
+      gimple rhs1_def_stmt = SSA_NAME_DEF_STMT (rhs1);
+      if (!vect_is_simple_use_1 (rhs1, stmt, loop_vinfo, bb_vinfo,
+				 &rhs1_def_stmt, &def, &dt, &vectype1))
+	return false;
+    }
+  else if (TREE_CODE (rhs1) != INTEGER_CST && TREE_CODE (rhs1) != REAL_CST
+	   && TREE_CODE (rhs1) != FIXED_CST)
+    return false;
+
+  if (TREE_CODE (rhs2) == SSA_NAME)
+    {
+      gimple rhs2_def_stmt = SSA_NAME_DEF_STMT (rhs2);
+      if (!vect_is_simple_use_1 (rhs2, stmt, loop_vinfo, bb_vinfo,
+				 &rhs2_def_stmt, &def, &dt, &vectype2))
+	return false;
+    }
+  else if (TREE_CODE (rhs2) != INTEGER_CST && TREE_CODE (rhs2) != REAL_CST
+	   && TREE_CODE (rhs2) != FIXED_CST)
+    return false;
+
+  vectype = vectype1 ? vectype1 : vectype2;
+
+  if (!vectype
+      || nunits != TYPE_VECTOR_SUBPARTS (vectype))
+    return false;
+
+  if (!vec_stmt)
+    {
+      STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
+      return expand_vec_cmp_expr_p (vectype, mask_type);
+    }
+
+  /* Transform.  */
+  if (!slp_node)
+    {
+      vec_oprnds0.create (1);
+      vec_oprnds1.create (1);
+    }
+
+  /* Handle def.  */
+  lhs = gimple_assign_lhs (stmt);
+  mask = vect_create_destination_var (lhs, mask_type);
+
+  /* Handle cmp expr.  */
+  for (j = 0; j < ncopies; j++)
+    {
+      gassign *new_stmt = NULL;
+      if (j == 0)
+	{
+	  if (slp_node)
+	    {
+	      auto_vec<tree, 2> ops;
+	      auto_vec<vec<tree>, 2> vec_defs;
+
+	      ops.safe_push (rhs1);
+	      ops.safe_push (rhs2);
+	      vect_get_slp_defs (ops, slp_node, &vec_defs, -1);
+	      vec_oprnds1 = vec_defs.pop ();
+	      vec_oprnds0 = vec_defs.pop ();
+
+	      ops.release ();
+	      vec_defs.release ();
+	    }
+	  else
+	    {
+	      gimple gtemp;
+	      vec_rhs1
+		= vect_get_vec_def_for_operand (rhs1, stmt, NULL);
+	      vect_is_simple_use (rhs1, stmt, loop_vinfo, NULL,
+				  &gtemp, &def, &dts[0]);
+	      vec_rhs2 =
+		vect_get_vec_def_for_operand (rhs2, stmt, NULL);
+	      vect_is_simple_use (rhs2, stmt, loop_vinfo, NULL,
+				  &gtemp, &def, &dts[1]);
+	    }
+	}
+      else
+	{
+	  vec_rhs1 = vect_get_vec_def_for_stmt_copy (dts[0],
+						     vec_oprnds0.pop ());
+	  vec_rhs2 = vect_get_vec_def_for_stmt_copy (dts[1],
+						     vec_oprnds1.pop ());
+	}
+
+      if (!slp_node)
+	{
+	  vec_oprnds0.quick_push (vec_rhs1);
+	  vec_oprnds1.quick_push (vec_rhs2);
+	}
+
+      /* Arguments are ready.  Create the new vector stmt.  */
+      FOR_EACH_VEC_ELT (vec_oprnds0, i, vec_rhs1)
+	{
+	  vec_rhs2 = vec_oprnds1[i];
+
+	  vec_compare = build2 (code, mask_type, vec_rhs1, vec_rhs2);
+	  new_stmt = gimple_build_assign (mask, vec_compare);
+	  new_temp = make_ssa_name (mask, new_stmt);
+	  gimple_assign_set_lhs (new_stmt, new_temp);
+	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	  if (slp_node)
+	    SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
+	}
+
+      if (slp_node)
+	continue;
+
+      if (j == 0)
+	STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+      else
+	STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+
+      prev_stmt_info = vinfo_for_stmt (new_stmt);
+    }
+
+  vec_oprnds0.release ();
+  vec_oprnds1.release ();
+
+  return true;
+}
 
 /* Make sure the statement is vectorizable.  */
 
@@ -7576,7 +7788,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize, slp_tree node)
 	  || vectorizable_call (stmt, NULL, NULL, node)
 	  || vectorizable_store (stmt, NULL, NULL, node)
 	  || vectorizable_reduction (stmt, NULL, NULL, node)
-	  || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
+	  || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node)
+	  || vectorizable_comparison (stmt, NULL, NULL, NULL, node));
   else
     {
       if (bb_vinfo)
@@ -7588,7 +7801,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize, slp_tree node)
 	      || vectorizable_load (stmt, NULL, NULL, node, NULL)
 	      || vectorizable_call (stmt, NULL, NULL, node)
 	      || vectorizable_store (stmt, NULL, NULL, node)
-	      || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
+	      || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node)
+	      || vectorizable_comparison (stmt, NULL, NULL, NULL, node));
     }
 
   if (!ok)
@@ -7704,6 +7918,11 @@ vect_transform_stmt (gimple stmt, gimple_stmt_iterator *gsi,
       gcc_assert (done);
       break;
 
+    case comparison_vec_info_type:
+      done = vectorizable_comparison (stmt, gsi, &vec_stmt, NULL, slp_node);
+      gcc_assert (done);
+      break;
+
     case call_vec_info_type:
       done = vectorizable_call (stmt, gsi, &vec_stmt, slp_node);
       stmt = gsi_stmt (*gsi);
@@ -8038,6 +8257,23 @@ get_vectype_for_scalar_type (tree scalar_type)
   return vectype;
 }
 
+/* Function get_mask_type_for_scalar_type.
+
+   Returns the mask type corresponding to a result of comparison
+   of vectors of specified SCALAR_TYPE as supported by target.  */
+
+tree
+get_mask_type_for_scalar_type (tree scalar_type)
+{
+  tree vectype = get_vectype_for_scalar_type (scalar_type);
+
+  if (!vectype)
+    return NULL;
+
+  return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (vectype),
+				  current_vector_size);
+}
+
 /* Function get_same_sized_vectype
 
    Returns a vector type corresponding to SCALAR_TYPE of size
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 58e8f10..94aea1a 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -28,7 +28,8 @@ along with GCC; see the file COPYING3.  If not see
 enum vect_var_kind {
   vect_simple_var,
   vect_pointer_var,
-  vect_scalar_var
+  vect_scalar_var,
+  vect_mask_var
 };
 
 /* Defines type of operation.  */
@@ -482,6 +483,7 @@ enum stmt_vec_info_type {
   call_simd_clone_vec_info_type,
   assignment_vec_info_type,
   condition_vec_info_type,
+  comparison_vec_info_type,
   reduc_vec_info_type,
   induc_vec_info_type,
   type_promotion_vec_info_type,
@@ -995,6 +997,7 @@ extern bool vect_can_advance_ivs_p (loop_vec_info);
 /* In tree-vect-stmts.c.  */
 extern unsigned int current_vector_size;
 extern tree get_vectype_for_scalar_type (tree);
+extern tree get_mask_type_for_scalar_type (tree);
 extern tree get_same_sized_vectype (tree, tree);
 extern bool vect_is_simple_use (tree, gimple, loop_vec_info,
 			        bb_vec_info, gimple *,
diff --git a/gcc/tree.c b/gcc/tree.c
index af3a6a3..30398e5 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -10568,6 +10568,20 @@ build_vector_type (tree innertype, int nunits)
   return make_vector_type (innertype, nunits, VOIDmode);
 }
 
+/* Build truth vector with specified length and number of units.  */
+
+tree
+build_truth_vector_type (unsigned nunits, unsigned vector_size)
+{
+  machine_mode mask_mode = targetm.vectorize.get_mask_mode (nunits,
+							    vector_size);
+
+  if (mask_mode == VOIDmode)
+    return NULL;
+
+  return make_vector_type (boolean_type_node, nunits, mask_mode);
+}
+
 /* Similarly, but builds a variant type with TYPE_VECTOR_OPAQUE set.  */
 
 tree
@@ -11054,9 +11068,10 @@ truth_type_for (tree type)
 {
   if (TREE_CODE (type) == VECTOR_TYPE)
     {
-      tree elem = lang_hooks.types.type_for_size
-        (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (type))), 0);
-      return build_opaque_vector_type (elem, TYPE_VECTOR_SUBPARTS (type));
+      if (TREE_TYPE (type) == boolean_type_node)
+	return type;
+      return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (type),
+				      GET_MODE_SIZE (TYPE_MODE (type)));
     }
   else
     return boolean_type_node;
diff --git a/gcc/tree.h b/gcc/tree.h
index 2cd6ec4..1657e06 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -3820,6 +3820,7 @@ extern tree build_reference_type_for_mode (tree, machine_mode, bool);
 extern tree build_reference_type (tree);
 extern tree build_vector_type_for_mode (tree, machine_mode);
 extern tree build_vector_type (tree innertype, int nunits);
+extern tree build_truth_vector_type (unsigned, unsigned);
 extern tree build_opaque_vector_type (tree innertype, int nunits);
 extern tree build_index_type (tree);
 extern tree build_array_type (tree, tree);

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-01 13:13                     ` [RFC] Try vector<bool> as a new representation for vector masks Ilya Enkovich
@ 2015-09-01 14:25                       ` Richard Biener
       [not found]                         ` <CAMbmDYafMuqzmRwRQfFHpLORFFGmFpfSRTR0QKx+LRFm6z75JQ@mail.gmail.com>
  2015-09-04 20:47                       ` Jeff Law
  1 sibling, 1 reply; 48+ messages in thread
From: Richard Biener @ 2015-09-01 14:25 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: Jeff Law, GCC Patches

On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> On 27 Aug 09:55, Richard Biener wrote:
>> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> >
>> > Yes, I want to try it. But getting rid of bool patterns would mean
>> > support for all targets currently supporting vec_cond. Would it be OK
>> > to have vector<bool> mask co-exist with bool patterns for some time?
>>
>> No, I'd like to remove the bool patterns anyhow - the vectorizer should be able
>> to figure out the correct vector type (or mask type) from the uses.  Currently
>> it simply looks at the stmts LHS type but as all stmt operands already have
>> vector types it can as well compute the result type from those.  We'd want to
>> have a helper function that does this result type computation as I figure it
>> will be needed in multiple places.
>>
>> This is now on my personal TODO list (but that's already quite long for GCC 6),
>> so if you manage to get to that...  see
>> tree-vect-loop.c:vect_determine_vectorization_factor
>> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
>> vector type set from data-ref analysis already - there 'bool' loads
>> correctly get
>> VNQImode).  There is a basic-block / SLP part as well that would need to use
>> the helper function (eventually with some SLP discovery order issue).
>>
>> > Thus first step would be to require vector<bool> for MASK_LOAD and
>> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and
>> > MASK_STORE).
>>
>> You can certainly try that first, but as soon as you hit complications with
>> needing to adjust bool patterns then I'd rather get rid of them.
>>
>> >
>> > I can directly build a vector type with specified mode to avoid it. Smth. like:
>> >
>> > mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
>> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode);
>>
>> Hmm, indeed, that might be a (good) solution.  Btw, in this case
>> target attribute
>> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending
>> on the active target).  There would also be no way for the user to
>> declare vector<bool>
>> in source (which is good because of that target attribute issue...).
>>
>> So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits)
>> and adjusting
>> truth_type_for is the way to go.
>>
>> I suggest you try modifying those parts first according to this scheme
>> that will most
>> likely uncover issues we missed.
>>
>> Thanks,
>> Richard.
>>
>
> I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE.  There were no major issues (for now).
>
> build_truth_vector_type and get_mask_type_for_scalar_type were added to build a mask type.  It is always a vector of bools but its mode is determined by a target using number of units and currently used vector length.
>
> As previously I fixed if-conversion to apply boolean masks for loads and stores which automatically disables bool patterns for them and flow goes by a mask path.  Vectorization factor computation is fixed to have a separate computation for mask types.  Comparison is now handled separately by vectorizer and is vectorized into vector comparison.
>
> Optabs for masked loads and stores were transformed into convert optabs.  Now it is checked using both value and mask modes.
>
> Optabs for comparison were added.  These are also convert optabs checking value and result type.
>
> I had to introduce significant number of new patterns in i386 target to support new optabs.  The reason was vector compare was never expanded separately and always was a part of a vec_cond expansion.

Indeed.

> As a result it's possible to use the sage GIMPLE representation for both vector and scalar masks target type.  Here is an example I used as a simple test:
>
>   for (i=0; i<N; i++)
>   {
>     float t = a[i];
>     if (t > 0.0f && t < 1.0e+2f)
>       if (c[i] != 0)
>         c[i] = 1;
>   }
>
> Produced vector GIMPLE (before expand):
>
>   vect_t_5.22_105 = MEM[base: _256, offset: 0B];
>   mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
>   mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 };
>   mask__8.27_110 = mask__6.23_107 & mask__7.25_109;
>   vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110);
>   mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 };
>   mask__37.35_120 = mask__8.27_110 & mask__36.33_119;
>   MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 });

Looks good to me.

> Produced assembler on AVX-512:
>
>         vmovups (%rdi), %zmm0
>         vcmpps  $25, %zmm5, %zmm0, %k1
>         vcmpps  $22, %zmm3, %zmm0, %k1{%k1}
>         vmovdqa32       -64(%rdx), %zmm2{%k1}
>         vpcmpd  $4, %zmm1, %zmm2, %k1{%k1}
>         vmovdqa32       %zmm4, (%rcx){%k1}
>
> Produced assembler on AVX-2:
>
>         vmovups (%rdx), %xmm1
>         vinsertf128     $0x1, -16(%rdx), %ymm1, %ymm1
>         vcmpltps        %ymm1, %ymm3, %ymm0
>         vcmpltps        %ymm5, %ymm1, %ymm1
>         vpand   %ymm0, %ymm1, %ymm0
>         vpmaskmovd      -32(%rcx), %ymm0, %ymm1
>         vpcmpeqd        %ymm2, %ymm1, %ymm1
>         vpcmpeqd        %ymm2, %ymm1, %ymm1
>         vpand   %ymm0, %ymm1, %ymm0
>         vpmaskmovd      %ymm4, %ymm0, (%rax)
>
> BTW AVX-2 code produced by trunk compiler is 4 insns longer:
>
>         vmovups (%rdx), %xmm0
>         vinsertf128     $0x1, -16(%rdx), %ymm0, %ymm0
>         vcmpltps        %ymm0, %ymm6, %ymm1
>         vcmpltps        %ymm7, %ymm0, %ymm0
>         vpand   %ymm1, %ymm5, %ymm2
>         vpand   %ymm0, %ymm2, %ymm1
>         vpcmpeqd        %ymm3, %ymm1, %ymm0
>         vpandn  %ymm4, %ymm0, %ymm0
>         vpmaskmovd      -32(%rcx), %ymm0, %ymm0
>         vpcmpeqd        %ymm3, %ymm0, %ymm0
>         vpandn  %ymm1, %ymm0, %ymm0
>         vpcmpeqd        %ymm3, %ymm0, %ymm0
>         vpandn  %ymm4, %ymm0, %ymm0
>         vpmaskmovd      %ymm5, %ymm0, (%rax)
>
>
> For now I still don't disable bool patterns, thus new masks apply to masked loads and stores only.  Patch is also not tested and tried on several small tests only.  Could you please look at what I currently have and say if it's in sync with your view on vector masking?

So apart from bool patterns and maybe implementation details (didn't
look too closely at the patch yet, maybe tomorrow), there is

+  /* Or a boolean vector type with the same element count
+     as the comparison operand types.  */
+  else if (TREE_CODE (type) == VECTOR_TYPE
+          && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
+    {

so we now allow both, integer typed and boolean typed comparison
results?  I was hoping that on GIMPLE
we can canonicalize to a single form, the boolean one and for the
"old" style force the use of VEC_COND exprs
(which we did anyway, AFAIK).  The comparison in the VEC_COND would
still have vector bool result type.

I expect the vectorization factor changes to "vanish" if we remove
bool patterns and re-org vector type deduction

Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2015-09-01  Ilya Enkovich  <enkovich.gnu@gmail.com>
>
>         * config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New.
>         (ix86_expand_int_vec_cmp): New.
>         (ix86_expand_fp_vec_cmp): New.
>         * config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for
>         op_true and op_false.
>         (ix86_int_cmp_code_to_pcmp_immediate): New.
>         (ix86_fp_cmp_code_to_pcmp_immediate): New.
>         (ix86_cmp_code_to_pcmp_immediate): New.
>         (ix86_expand_mask_vec_cmp): New.
>         (ix86_expand_fp_vec_cmp): New.
>         (ix86_expand_int_sse_cmp): New.
>         (ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp.
>         (ix86_expand_int_vec_cmp): New.
>         (ix86_get_mask_mode): New.
>         (TARGET_VECTORIZE_GET_MASK_MODE): New.
>         * config/i386/sse.md (avx512fmaskmodelower): New.
>         (vec_cmp<mode><avx512fmaskmodelower>): New.
>         (vec_cmp<mode><sseintvecmodelower>): New.
>         (vec_cmpv2div2di): New.
>         (vec_cmpu<mode><avx512fmaskmodelower>): New.
>         (vec_cmpu<mode><sseintvecmodelower>): New.
>         (vec_cmpuv2div2di): New.
>         (maskload<mode>): Rename to ...
>         (maskload<mode><sseintvecmodelower>): ... this.
>         (maskstore<mode>): Rename to ...
>         (maskstore<mode><sseintvecmodelower>): ... this.
>         (maskload<mode><avx512fmaskmodelower>): New.
>         (maskstore<mode><avx512fmaskmodelower>): New.
>         * doc/tm.texi: Regenerated.
>         * doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New.
>         * expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results.
>         * internal-fn.c (expand_MASK_LOAD): Adjust to optab changes.
>         (expand_MASK_STORE): Likewise.
>         * optabs.c (vector_compare_rtx): Add OPNO arg.
>         (expand_vec_cond_expr): Adjust to vector_compare_rtx change.
>         (get_vec_cmp_icode): New.
>         (expand_vec_cmp_expr_p): New.
>         (expand_vec_cmp_expr): New.
>         (can_vec_mask_load_store_p): Add MASK_MODE arg.
>         * optabs.def (vec_cmp_optab): New.
>         (vec_cmpu_optab): New.
>         (maskload_optab): Transform into convert optab.
>         (maskstore_optab): Likewise.
>         * optabs.h (expand_vec_cmp_expr_p): New.
>         (expand_vec_cmp_expr): New.
>         (can_vec_mask_load_store_p): Add MASK_MODE arg.
>         * target.def (get_mask_mode): New.
>         * targhooks.c (default_vector_alignment): Use mode alignment
>         for vector masks.
>         (default_get_mask_mode): New.
>         * targhooks.h (default_get_mask_mode): New.
>         * tree-cfg.c (verify_gimple_comparison): Support vector mask.
>         * tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to
>         can_vec_mask_load_store_p signature change.
>         (predicate_mem_writes): Use boolean mask.
>         * tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var.
>         (vect_create_destination_var): Likewise.
>         * tree-vect-generic.c (expand_vector_comparison): Use
>         expand_vec_cmp_expr_p for comparison availability.
>         (expand_vector_operations_1): Ignore statements with scalar mode.
>         * tree-vect-loop.c (vect_determine_vectorization_factor): Ignore mask
>         operations for VF.  Add mask type computation.
>         * tree-vect-stmts.c (vect_get_vec_def_for_operand): Support mask
>         constant.
>         (vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p
>         signature change.
>         (vectorizable_comparison): New.
>         (vect_analyze_stmt): Add vectorizable_comparison.
>         (vect_transform_stmt): Likewise.
>         (get_mask_type_for_scalar_type): New.
>         * tree-vectorizer.h (enum stmt_vec_info_type): Add vect_mask_var
>         (enum stmt_vec_info_type): Add comparison_vec_info_type.
>         (get_mask_type_for_scalar_type): New.
>         * tree.c (build_truth_vector_type): New.
>         (truth_type_for): Use build_truth_vector_type for vectors.
>         * tree.h (build_truth_vector_type): New.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
       [not found]                         ` <CAMbmDYafMuqzmRwRQfFHpLORFFGmFpfSRTR0QKx+LRFm6z75JQ@mail.gmail.com>
@ 2015-09-03 12:12                           ` Ilya Enkovich
  2015-09-03 12:42                             ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Ilya Enkovich @ 2015-09-03 12:12 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, gcc-patches

Adding CCs.

2015-09-03 15:03 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
> 2015-09-01 17:25 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> On 27 Aug 09:55, Richard Biener wrote:
>>>> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>> >
>>>> > Yes, I want to try it. But getting rid of bool patterns would mean
>>>> > support for all targets currently supporting vec_cond. Would it be OK
>>>> > to have vector<bool> mask co-exist with bool patterns for some time?
>>>>
>>>> No, I'd like to remove the bool patterns anyhow - the vectorizer should be able
>>>> to figure out the correct vector type (or mask type) from the uses.  Currently
>>>> it simply looks at the stmts LHS type but as all stmt operands already have
>>>> vector types it can as well compute the result type from those.  We'd want to
>>>> have a helper function that does this result type computation as I figure it
>>>> will be needed in multiple places.
>>>>
>>>> This is now on my personal TODO list (but that's already quite long for GCC 6),
>>>> so if you manage to get to that...  see
>>>> tree-vect-loop.c:vect_determine_vectorization_factor
>>>> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
>>>> vector type set from data-ref analysis already - there 'bool' loads
>>>> correctly get
>>>> VNQImode).  There is a basic-block / SLP part as well that would need to use
>>>> the helper function (eventually with some SLP discovery order issue).
>>>>
>>>> > Thus first step would be to require vector<bool> for MASK_LOAD and
>>>> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and
>>>> > MASK_STORE).
>>>>
>>>> You can certainly try that first, but as soon as you hit complications with
>>>> needing to adjust bool patterns then I'd rather get rid of them.
>>>>
>>>> >
>>>> > I can directly build a vector type with specified mode to avoid it. Smth. like:
>>>> >
>>>> > mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
>>>> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode);
>>>>
>>>> Hmm, indeed, that might be a (good) solution.  Btw, in this case
>>>> target attribute
>>>> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending
>>>> on the active target).  There would also be no way for the user to
>>>> declare vector<bool>
>>>> in source (which is good because of that target attribute issue...).
>>>>
>>>> So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits)
>>>> and adjusting
>>>> truth_type_for is the way to go.
>>>>
>>>> I suggest you try modifying those parts first according to this scheme
>>>> that will most
>>>> likely uncover issues we missed.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>
>>> I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE.  There were no major issues (for now).
>>>
>>> build_truth_vector_type and get_mask_type_for_scalar_type were added to build a mask type.  It is always a vector of bools but its mode is determined by a target using number of units and currently used vector length.
>>>
>>> As previously I fixed if-conversion to apply boolean masks for loads and stores which automatically disables bool patterns for them and flow goes by a mask path.  Vectorization factor computation is fixed to have a separate computation for mask types.  Comparison is now handled separately by vectorizer and is vectorized into vector comparison.
>>>
>>> Optabs for masked loads and stores were transformed into convert optabs.  Now it is checked using both value and mask modes.
>>>
>>> Optabs for comparison were added.  These are also convert optabs checking value and result type.
>>>
>>> I had to introduce significant number of new patterns in i386 target to support new optabs.  The reason was vector compare was never expanded separately and always was a part of a vec_cond expansion.
>>
>> Indeed.
>>
>>> As a result it's possible to use the sage GIMPLE representation for both vector and scalar masks target type.  Here is an example I used as a simple test:
>>>
>>>   for (i=0; i<N; i++)
>>>   {
>>>     float t = a[i];
>>>     if (t > 0.0f && t < 1.0e+2f)
>>>       if (c[i] != 0)
>>>         c[i] = 1;
>>>   }
>>>
>>> Produced vector GIMPLE (before expand):
>>>
>>>   vect_t_5.22_105 = MEM[base: _256, offset: 0B];
>>>   mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
>>>   mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 };
>>>   mask__8.27_110 = mask__6.23_107 & mask__7.25_109;
>>>   vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110);
>>>   mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 };
>>>   mask__37.35_120 = mask__8.27_110 & mask__36.33_119;
>>>   MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 });
>>
>> Looks good to me.
>>
>>> Produced assembler on AVX-512:
>>>
>>>         vmovups (%rdi), %zmm0
>>>         vcmpps  $25, %zmm5, %zmm0, %k1
>>>         vcmpps  $22, %zmm3, %zmm0, %k1{%k1}
>>>         vmovdqa32       -64(%rdx), %zmm2{%k1}
>>>         vpcmpd  $4, %zmm1, %zmm2, %k1{%k1}
>>>         vmovdqa32       %zmm4, (%rcx){%k1}
>>>
>>> Produced assembler on AVX-2:
>>>
>>>         vmovups (%rdx), %xmm1
>>>         vinsertf128     $0x1, -16(%rdx), %ymm1, %ymm1
>>>         vcmpltps        %ymm1, %ymm3, %ymm0
>>>         vcmpltps        %ymm5, %ymm1, %ymm1
>>>         vpand   %ymm0, %ymm1, %ymm0
>>>         vpmaskmovd      -32(%rcx), %ymm0, %ymm1
>>>         vpcmpeqd        %ymm2, %ymm1, %ymm1
>>>         vpcmpeqd        %ymm2, %ymm1, %ymm1
>>>         vpand   %ymm0, %ymm1, %ymm0
>>>         vpmaskmovd      %ymm4, %ymm0, (%rax)
>>>
>>> BTW AVX-2 code produced by trunk compiler is 4 insns longer:
>>>
>>>         vmovups (%rdx), %xmm0
>>>         vinsertf128     $0x1, -16(%rdx), %ymm0, %ymm0
>>>         vcmpltps        %ymm0, %ymm6, %ymm1
>>>         vcmpltps        %ymm7, %ymm0, %ymm0
>>>         vpand   %ymm1, %ymm5, %ymm2
>>>         vpand   %ymm0, %ymm2, %ymm1
>>>         vpcmpeqd        %ymm3, %ymm1, %ymm0
>>>         vpandn  %ymm4, %ymm0, %ymm0
>>>         vpmaskmovd      -32(%rcx), %ymm0, %ymm0
>>>         vpcmpeqd        %ymm3, %ymm0, %ymm0
>>>         vpandn  %ymm1, %ymm0, %ymm0
>>>         vpcmpeqd        %ymm3, %ymm0, %ymm0
>>>         vpandn  %ymm4, %ymm0, %ymm0
>>>         vpmaskmovd      %ymm5, %ymm0, (%rax)
>>>
>>>
>>> For now I still don't disable bool patterns, thus new masks apply to masked loads and stores only.  Patch is also not tested and tried on several small tests only.  Could you please look at what I currently have and say if it's in sync with your view on vector masking?
>>
>> So apart from bool patterns and maybe implementation details (didn't
>> look too closely at the patch yet, maybe tomorrow), there is
>>
>> +  /* Or a boolean vector type with the same element count
>> +     as the comparison operand types.  */
>> +  else if (TREE_CODE (type) == VECTOR_TYPE
>> +          && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
>> +    {
>>
>> so we now allow both, integer typed and boolean typed comparison
>> results?  I was hoping that on GIMPLE
>> we can canonicalize to a single form, the boolean one and for the
>> "old" style force the use of VEC_COND exprs
>> (which we did anyway, AFAIK).  The comparison in the VEC_COND would
>> still have vector bool result type.
>>
>> I expect the vectorization factor changes to "vanish" if we remove
>> bool patterns and re-org vector type deduction
>>
>> Richard.
>>
>
> Totally disabling old style vector comparison and bool pattern is a
> goal but doing hat would mean a lot of regressions for many targets.
> Do you want to it to be tried to estimate amount of changes required
> and reveal possible issues? What would be integration plan for these
> changes? Do you want to just introduce new vector<bool> in GIMPLE
> disabling bool patterns and then resolving vectorization regression on
> all targets or allow them live together with following target switch
> one by one from bool patterns with finally removing them? Not all
> targets are likely to be adopted fast I suppose.
>
> Ilya

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-03 12:12                           ` Ilya Enkovich
@ 2015-09-03 12:42                             ` Richard Biener
  2015-09-03 14:12                               ` Ilya Enkovich
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2015-09-03 12:42 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: Jeff Law, gcc-patches

On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> Adding CCs.
>
> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
>> 2015-09-01 17:25 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>> On 27 Aug 09:55, Richard Biener wrote:
>>>>> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>> >
>>>>> > Yes, I want to try it. But getting rid of bool patterns would mean
>>>>> > support for all targets currently supporting vec_cond. Would it be OK
>>>>> > to have vector<bool> mask co-exist with bool patterns for some time?
>>>>>
>>>>> No, I'd like to remove the bool patterns anyhow - the vectorizer should be able
>>>>> to figure out the correct vector type (or mask type) from the uses.  Currently
>>>>> it simply looks at the stmts LHS type but as all stmt operands already have
>>>>> vector types it can as well compute the result type from those.  We'd want to
>>>>> have a helper function that does this result type computation as I figure it
>>>>> will be needed in multiple places.
>>>>>
>>>>> This is now on my personal TODO list (but that's already quite long for GCC 6),
>>>>> so if you manage to get to that...  see
>>>>> tree-vect-loop.c:vect_determine_vectorization_factor
>>>>> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
>>>>> vector type set from data-ref analysis already - there 'bool' loads
>>>>> correctly get
>>>>> VNQImode).  There is a basic-block / SLP part as well that would need to use
>>>>> the helper function (eventually with some SLP discovery order issue).
>>>>>
>>>>> > Thus first step would be to require vector<bool> for MASK_LOAD and
>>>>> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and
>>>>> > MASK_STORE).
>>>>>
>>>>> You can certainly try that first, but as soon as you hit complications with
>>>>> needing to adjust bool patterns then I'd rather get rid of them.
>>>>>
>>>>> >
>>>>> > I can directly build a vector type with specified mode to avoid it. Smth. like:
>>>>> >
>>>>> > mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
>>>>> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode);
>>>>>
>>>>> Hmm, indeed, that might be a (good) solution.  Btw, in this case
>>>>> target attribute
>>>>> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending
>>>>> on the active target).  There would also be no way for the user to
>>>>> declare vector<bool>
>>>>> in source (which is good because of that target attribute issue...).
>>>>>
>>>>> So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits)
>>>>> and adjusting
>>>>> truth_type_for is the way to go.
>>>>>
>>>>> I suggest you try modifying those parts first according to this scheme
>>>>> that will most
>>>>> likely uncover issues we missed.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>
>>>> I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE.  There were no major issues (for now).
>>>>
>>>> build_truth_vector_type and get_mask_type_for_scalar_type were added to build a mask type.  It is always a vector of bools but its mode is determined by a target using number of units and currently used vector length.
>>>>
>>>> As previously I fixed if-conversion to apply boolean masks for loads and stores which automatically disables bool patterns for them and flow goes by a mask path.  Vectorization factor computation is fixed to have a separate computation for mask types.  Comparison is now handled separately by vectorizer and is vectorized into vector comparison.
>>>>
>>>> Optabs for masked loads and stores were transformed into convert optabs.  Now it is checked using both value and mask modes.
>>>>
>>>> Optabs for comparison were added.  These are also convert optabs checking value and result type.
>>>>
>>>> I had to introduce significant number of new patterns in i386 target to support new optabs.  The reason was vector compare was never expanded separately and always was a part of a vec_cond expansion.
>>>
>>> Indeed.
>>>
>>>> As a result it's possible to use the sage GIMPLE representation for both vector and scalar masks target type.  Here is an example I used as a simple test:
>>>>
>>>>   for (i=0; i<N; i++)
>>>>   {
>>>>     float t = a[i];
>>>>     if (t > 0.0f && t < 1.0e+2f)
>>>>       if (c[i] != 0)
>>>>         c[i] = 1;
>>>>   }
>>>>
>>>> Produced vector GIMPLE (before expand):
>>>>
>>>>   vect_t_5.22_105 = MEM[base: _256, offset: 0B];
>>>>   mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
>>>>   mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 };
>>>>   mask__8.27_110 = mask__6.23_107 & mask__7.25_109;
>>>>   vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110);
>>>>   mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 };
>>>>   mask__37.35_120 = mask__8.27_110 & mask__36.33_119;
>>>>   MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 });
>>>
>>> Looks good to me.
>>>
>>>> Produced assembler on AVX-512:
>>>>
>>>>         vmovups (%rdi), %zmm0
>>>>         vcmpps  $25, %zmm5, %zmm0, %k1
>>>>         vcmpps  $22, %zmm3, %zmm0, %k1{%k1}
>>>>         vmovdqa32       -64(%rdx), %zmm2{%k1}
>>>>         vpcmpd  $4, %zmm1, %zmm2, %k1{%k1}
>>>>         vmovdqa32       %zmm4, (%rcx){%k1}
>>>>
>>>> Produced assembler on AVX-2:
>>>>
>>>>         vmovups (%rdx), %xmm1
>>>>         vinsertf128     $0x1, -16(%rdx), %ymm1, %ymm1
>>>>         vcmpltps        %ymm1, %ymm3, %ymm0
>>>>         vcmpltps        %ymm5, %ymm1, %ymm1
>>>>         vpand   %ymm0, %ymm1, %ymm0
>>>>         vpmaskmovd      -32(%rcx), %ymm0, %ymm1
>>>>         vpcmpeqd        %ymm2, %ymm1, %ymm1
>>>>         vpcmpeqd        %ymm2, %ymm1, %ymm1
>>>>         vpand   %ymm0, %ymm1, %ymm0
>>>>         vpmaskmovd      %ymm4, %ymm0, (%rax)
>>>>
>>>> BTW AVX-2 code produced by trunk compiler is 4 insns longer:
>>>>
>>>>         vmovups (%rdx), %xmm0
>>>>         vinsertf128     $0x1, -16(%rdx), %ymm0, %ymm0
>>>>         vcmpltps        %ymm0, %ymm6, %ymm1
>>>>         vcmpltps        %ymm7, %ymm0, %ymm0
>>>>         vpand   %ymm1, %ymm5, %ymm2
>>>>         vpand   %ymm0, %ymm2, %ymm1
>>>>         vpcmpeqd        %ymm3, %ymm1, %ymm0
>>>>         vpandn  %ymm4, %ymm0, %ymm0
>>>>         vpmaskmovd      -32(%rcx), %ymm0, %ymm0
>>>>         vpcmpeqd        %ymm3, %ymm0, %ymm0
>>>>         vpandn  %ymm1, %ymm0, %ymm0
>>>>         vpcmpeqd        %ymm3, %ymm0, %ymm0
>>>>         vpandn  %ymm4, %ymm0, %ymm0
>>>>         vpmaskmovd      %ymm5, %ymm0, (%rax)
>>>>
>>>>
>>>> For now I still don't disable bool patterns, thus new masks apply to masked loads and stores only.  Patch is also not tested and tried on several small tests only.  Could you please look at what I currently have and say if it's in sync with your view on vector masking?
>>>
>>> So apart from bool patterns and maybe implementation details (didn't
>>> look too closely at the patch yet, maybe tomorrow), there is
>>>
>>> +  /* Or a boolean vector type with the same element count
>>> +     as the comparison operand types.  */
>>> +  else if (TREE_CODE (type) == VECTOR_TYPE
>>> +          && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
>>> +    {
>>>
>>> so we now allow both, integer typed and boolean typed comparison
>>> results?  I was hoping that on GIMPLE
>>> we can canonicalize to a single form, the boolean one and for the
>>> "old" style force the use of VEC_COND exprs
>>> (which we did anyway, AFAIK).  The comparison in the VEC_COND would
>>> still have vector bool result type.
>>>
>>> I expect the vectorization factor changes to "vanish" if we remove
>>> bool patterns and re-org vector type deduction
>>>
>>> Richard.
>>>
>>
>> Totally disabling old style vector comparison and bool pattern is a
>> goal but doing hat would mean a lot of regressions for many targets.
>> Do you want to it to be tried to estimate amount of changes required
>> and reveal possible issues? What would be integration plan for these
>> changes? Do you want to just introduce new vector<bool> in GIMPLE
>> disabling bool patterns and then resolving vectorization regression on
>> all targets or allow them live together with following target switch
>> one by one from bool patterns with finally removing them? Not all
>> targets are likely to be adopted fast I suppose.

Well, the frontends already create vec_cond exprs I believe.  So for
bool patterns the vectorizer would have to do the same, but the
comparison result in there would still use vec<bool>.  Thus the scalar

 _Bool a = b < c;
 _Bool c = a || d;
 if (c)

would become

 vec<int> a = VEC_COND <a < b ? -1 : 0>;
 vec<int> c = a | d;

when the target does not have vec<bool>s directly and otherwise
vec<boo> directly (dropping the VEC_COND).

Just the vector comparison inside the VEC_COND would always
have vec<bool> type.

And the "bool patterns" I am talking about are those in
tree-vect-patterns.c, not any targets instruction patterns.

Richard.

>>
>> Ilya

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-03 12:42                             ` Richard Biener
@ 2015-09-03 14:12                               ` Ilya Enkovich
  2015-09-18 12:29                                 ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Ilya Enkovich @ 2015-09-03 14:12 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, gcc-patches

2015-09-03 15:11 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> Adding CCs.
>>
>> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
>>> 2015-09-01 17:25 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>
>>> Totally disabling old style vector comparison and bool pattern is a
>>> goal but doing hat would mean a lot of regressions for many targets.
>>> Do you want to it to be tried to estimate amount of changes required
>>> and reveal possible issues? What would be integration plan for these
>>> changes? Do you want to just introduce new vector<bool> in GIMPLE
>>> disabling bool patterns and then resolving vectorization regression on
>>> all targets or allow them live together with following target switch
>>> one by one from bool patterns with finally removing them? Not all
>>> targets are likely to be adopted fast I suppose.
>
> Well, the frontends already create vec_cond exprs I believe.  So for
> bool patterns the vectorizer would have to do the same, but the
> comparison result in there would still use vec<bool>.  Thus the scalar
>
>  _Bool a = b < c;
>  _Bool c = a || d;
>  if (c)
>
> would become
>
>  vec<int> a = VEC_COND <a < b ? -1 : 0>;
>  vec<int> c = a | d;

This should be identical to

vec<_Bool> a = a < b;
vec<_Bool> c = a | d;

where vec<_Bool> has VxSI mode. And we should prefer it in case target
supports vector comparison into vec<bool>, right?

>
> when the target does not have vec<bool>s directly and otherwise
> vec<boo> directly (dropping the VEC_COND).
>
> Just the vector comparison inside the VEC_COND would always
> have vec<bool> type.

I don't really understand what you mean by 'doesn't have vec<bool>s
dirrectly' here. Currently I have a hook to ask for a vec<bool> mode
and assume target doesn't support it in case it returns VOIDmode. But
in such case I have no mode to use for vec<bool> inside VEC_COND
either.

In default implementation of the new target hook I always return
integer vector mode (to have default behavior similar to the current
one). It should allow me to use vec<bool> for conditions in all
vec_cond. But we'd need some other trigger for bool patterns to apply.
Probably check vec_cmp optab in check_bool_pattern and don't convert
in case comparison is supported by target? Or control it via
additional hook.

>
> And the "bool patterns" I am talking about are those in
> tree-vect-patterns.c, not any targets instruction patterns.

I refer to them also. BTW bool patterns also pull comparison into
vec_cond. Thus we cannot have SSA_NAME in vec_cond as a condition. I
think with vector comparisons in place we should allow SSA_NAME as
conditions in VEC_COND for better CSE. That should require new vcond
optabs though.

Ilya

>
> Richard.
>
>>>
>>> Ilya

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-01 13:13                     ` [RFC] Try vector<bool> as a new representation for vector masks Ilya Enkovich
  2015-09-01 14:25                       ` Richard Biener
@ 2015-09-04 20:47                       ` Jeff Law
  2015-09-08 12:43                         ` Ilya Enkovich
  1 sibling, 1 reply; 48+ messages in thread
From: Jeff Law @ 2015-09-04 20:47 UTC (permalink / raw)
  To: Ilya Enkovich, Richard Biener; +Cc: GCC Patches

On 09/01/2015 07:08 AM, Ilya Enkovich wrote:
> On 27 Aug 09:55, Richard Biener wrote:
>> I suggest you try modifying those parts first according to this
>> scheme that will most likely uncover issues we missed.
>>
>> Thanks, Richard.
>>
>
> I tried to implement this scheme and apply it for MASK_LOAD and
> MASK_STORE.  There were no major issues (for now).
So do we have enough confidence in this representation that we want to 
go ahead and commit to it?

>
> I had to introduce significant number of new patterns in i386 target
> to support new optabs.  The reason was vector compare was never
> expanded separately and always was a part of a vec_cond expansion.
One could argue we should have fixed this already, so I don't see the 
new patterns as a bad thing, but instead they're addressing a long term 
mis-design.

>
>
> For now I still don't disable bool patterns, thus new masks apply to
> masked loads and stores only.  Patch is also not tested and tried on
> several small tests only.  Could you please look at what I currently
> have and say if it's in sync with your view on vector masking?
I'm going to let Richi run with this for the most part -- but I will 
chime in with a thank you for being willing to bounce this around a bit 
while we figure out the representational issues.


jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-04 20:47                       ` Jeff Law
@ 2015-09-08 12:43                         ` Ilya Enkovich
  2015-09-15 13:55                           ` Ilya Enkovich
  0 siblings, 1 reply; 48+ messages in thread
From: Ilya Enkovich @ 2015-09-08 12:43 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 5513 bytes --]

2015-09-04 23:42 GMT+03:00 Jeff Law <law@redhat.com>:
> On 09/01/2015 07:08 AM, Ilya Enkovich wrote:
>>
>> On 27 Aug 09:55, Richard Biener wrote:
>>>
>>> I suggest you try modifying those parts first according to this
>>> scheme that will most likely uncover issues we missed.
>>>
>>> Thanks, Richard.
>>>
>>
>> I tried to implement this scheme and apply it for MASK_LOAD and
>> MASK_STORE.  There were no major issues (for now).
>
> So do we have enough confidence in this representation that we want to go
> ahead and commit to it?

I think new representation fits nice mostly. There are some places
where I have to make some exceptions for vector of bools to make it
work. This is mostly to avoid target modifications. I'd like to avoid
necessity to change all targets currently supporting vec_cond. It
makes me add some special handling of vec<bool> in GIMPLE, e.g. I add
a special code in vect_init_vector to build vec<bool> invariants with
proper casting to int. Otherwise I'd need to do it on a target side.

I made several fixes and current patch (still allowing integer vector
result for vector comparison and applying bool patterns) passes
bootstrap and regression testing on x86_64. Now I'll try to fully
switch to vec<bool> and see how it goes.

Thanks,
Ilya

>
>>
>> I had to introduce significant number of new patterns in i386 target
>> to support new optabs.  The reason was vector compare was never
>> expanded separately and always was a part of a vec_cond expansion.
>
> One could argue we should have fixed this already, so I don't see the new
> patterns as a bad thing, but instead they're addressing a long term
> mis-design.
>
>>
>>
>> For now I still don't disable bool patterns, thus new masks apply to
>> masked loads and stores only.  Patch is also not tested and tried on
>> several small tests only.  Could you please look at what I currently
>> have and say if it's in sync with your view on vector masking?
>
> I'm going to let Richi run with this for the most part -- but I will chime
> in with a thank you for being willing to bounce this around a bit while we
> figure out the representational issues.
>
>
> jeff



gcc/

2015-09-08  Ilya Enkovich  <enkovich.gnu@gmail.com>

* config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New.
(ix86_expand_int_vec_cmp): New.
(ix86_expand_fp_vec_cmp): New.
* config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for
op_true and op_false.
(ix86_int_cmp_code_to_pcmp_immediate): New.
(ix86_fp_cmp_code_to_pcmp_immediate): New.
(ix86_cmp_code_to_pcmp_immediate): New.
(ix86_expand_mask_vec_cmp): New.
(ix86_expand_fp_vec_cmp): New.
(ix86_expand_int_sse_cmp): New.
(ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp.
(ix86_expand_int_vec_cmp): New.
(ix86_get_mask_mode): New.
(TARGET_VECTORIZE_GET_MASK_MODE): New.
* config/i386/sse.md (avx512fmaskmodelower): New.
(vec_cmp<mode><avx512fmaskmodelower>): New.
(vec_cmp<mode><sseintvecmodelower>): New.
(vec_cmpv2div2di): New.
(vec_cmpu<mode><avx512fmaskmodelower>): New.
(vec_cmpu<mode><sseintvecmodelower>): New.
(vec_cmpuv2div2di): New.
(maskload<mode>): Rename to ...
(maskload<mode><sseintvecmodelower>): ... this.
(maskstore<mode>): Rename to ...
(maskstore<mode><sseintvecmodelower>): ... this.
(maskload<mode><avx512fmaskmodelower>): New.
(maskstore<mode><avx512fmaskmodelower>): New.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New.
* expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results.
* internal-fn.c (expand_MASK_LOAD): Adjust to optab changes.
(expand_MASK_STORE): Likewise.
* optabs.c (vector_compare_rtx): Add OPNO arg.
(expand_vec_cond_expr): Adjust to vector_compare_rtx change.
(get_vec_cmp_icode): New.
(expand_vec_cmp_expr_p): New.
(expand_vec_cmp_expr): New.
(can_vec_mask_load_store_p): Add MASK_MODE arg.
* optabs.def (vec_cmp_optab): New.
(vec_cmpu_optab): New.
(maskload_optab): Transform into convert optab.
(maskstore_optab): Likewise.
* optabs.h (expand_vec_cmp_expr_p): New.
(expand_vec_cmp_expr): New.
(can_vec_mask_load_store_p): Add MASK_MODE arg.
* target.def (get_mask_mode): New.
* targhooks.c (default_vector_alignment): Use mode alignment
for vector masks.
(default_get_mask_mode): New.
* targhooks.h (default_get_mask_mode): New.
* tree-cfg.c (verify_gimple_comparison): Support vector mask.
* tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to
can_vec_mask_load_store_p signature change.
(predicate_mem_writes): Use boolean mask.
* tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var.
(vect_create_destination_var): Likewise.
* tree-vect-generic.c (expand_vector_comparison): Use
expand_vec_cmp_expr_p for comparison availability.
(expand_vector_operations_1): Ignore mask statements with scalar mode.
* tree-vect-loop.c (vect_determine_vectorization_factor): Ignore mask
operations for VF.  Add mask type computation.
* tree-vect-stmts.c (vect_init_vector): Support mask invariants.
(vect_get_vec_def_for_operand): Support mask constant.
(vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p
signature change.
(vectorizable_comparison): New.
(vect_analyze_stmt): Add vectorizable_comparison.
(vect_transform_stmt): Likewise.
(get_mask_type_for_scalar_type): New.
* tree-vectorizer.h (enum stmt_vec_info_type): Add vect_mask_var
(enum stmt_vec_info_type): Add comparison_vec_info_type.
(get_mask_type_for_scalar_type): New.
* tree.c (build_truth_vector_type): New.
(truth_type_for): Use build_truth_vector_type for vectors.
* tree.h (build_truth_vector_type): New.

[-- Attachment #2: avx512-vec-bool.patch --]
[-- Type: application/octet-stream, Size: 63973 bytes --]

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 6a17ef4..e22aa57 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -129,6 +129,9 @@ extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
 extern void ix86_expand_vec_perm (rtx[]);
 extern bool ix86_expand_vec_perm_const (rtx[]);
+extern bool ix86_expand_mask_vec_cmp (rtx[]);
+extern bool ix86_expand_int_vec_cmp (rtx[]);
+extern bool ix86_expand_fp_vec_cmp (rtx[]);
 extern void ix86_expand_sse_unpack (rtx, rtx, bool, bool);
 extern bool ix86_expand_int_addcc (rtx[]);
 extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 070605f..e44cdb5 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21440,8 +21440,8 @@ ix86_expand_sse_cmp (rtx dest, enum rtx_code code, rtx cmp_op0, rtx cmp_op1,
     cmp_op1 = force_reg (cmp_ops_mode, cmp_op1);
 
   if (optimize
-      || reg_overlap_mentioned_p (dest, op_true)
-      || reg_overlap_mentioned_p (dest, op_false))
+      || (op_true && reg_overlap_mentioned_p (dest, op_true))
+      || (op_false && reg_overlap_mentioned_p (dest, op_false)))
     dest = gen_reg_rtx (maskcmp ? cmp_mode : mode);
 
   /* Compare patterns for int modes are unspec in AVX512F only.  */
@@ -21713,34 +21713,127 @@ ix86_expand_fp_movcc (rtx operands[])
   return true;
 }
 
-/* Expand a floating-point vector conditional move; a vcond operation
-   rather than a movcc operation.  */
+/* Helper for ix86_cmp_code_to_pcmp_immediate for int modes.  */
+
+static int
+ix86_int_cmp_code_to_pcmp_immediate (enum rtx_code code)
+{
+  switch (code)
+    {
+    case EQ:
+      return 0;
+    case LT:
+    case LTU:
+      return 1;
+    case LE:
+    case LEU:
+      return 2;
+    case NE:
+      return 4;
+    case GE:
+    case GEU:
+      return 5;
+    case GT:
+    case GTU:
+      return 6;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Helper for ix86_cmp_code_to_pcmp_immediate for fp modes.  */
+
+static int
+ix86_fp_cmp_code_to_pcmp_immediate (enum rtx_code code)
+{
+  switch (code)
+    {
+    case EQ:
+      return 0x08;
+    case NE:
+      return 0x04;
+    case GT:
+      return 0x16;
+    case LE:
+      return 0x1a;
+    case GE:
+      return 0x15;
+    case LT:
+      return 0x19;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return immediate value to be used in UNSPEC_PCMP
+   for comparison CODE in MODE.  */
+
+static int
+ix86_cmp_code_to_pcmp_immediate (enum rtx_code code, machine_mode mode)
+{
+  if (FLOAT_MODE_P (mode))
+    return ix86_fp_cmp_code_to_pcmp_immediate (code);
+  return ix86_int_cmp_code_to_pcmp_immediate (code);
+}
+
+/* Expand AVX-512 vector comparison.  */
 
 bool
-ix86_expand_fp_vcond (rtx operands[])
+ix86_expand_mask_vec_cmp (rtx operands[])
 {
-  enum rtx_code code = GET_CODE (operands[3]);
+  machine_mode mask_mode = GET_MODE (operands[0]);
+  machine_mode cmp_mode = GET_MODE (operands[2]);
+  enum rtx_code code = GET_CODE (operands[1]);
+  rtx imm = GEN_INT (ix86_cmp_code_to_pcmp_immediate (code, cmp_mode));
+  int unspec_code;
+  rtx unspec;
+
+  switch (code)
+    {
+    case LEU:
+    case GTU:
+    case GEU:
+    case LTU:
+      unspec_code = UNSPEC_UNSIGNED_PCMP;
+    default:
+      unspec_code = UNSPEC_PCMP;
+    }
+
+  unspec = gen_rtx_UNSPEC (mask_mode, gen_rtvec (3, operands[2],
+						 operands[3], imm),
+			   unspec_code);
+  emit_insn (gen_rtx_SET (operands[0], unspec));
+
+  return true;
+}
+
+/* Expand fp vector comparison.  */
+
+bool
+ix86_expand_fp_vec_cmp (rtx operands[])
+{
+  enum rtx_code code = GET_CODE (operands[1]);
   rtx cmp;
 
   code = ix86_prepare_sse_fp_compare_args (operands[0], code,
-					   &operands[4], &operands[5]);
+					   &operands[2], &operands[3]);
   if (code == UNKNOWN)
     {
       rtx temp;
-      switch (GET_CODE (operands[3]))
+      switch (GET_CODE (operands[1]))
 	{
 	case LTGT:
-	  temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[4],
-				      operands[5], operands[0], operands[0]);
-	  cmp = ix86_expand_sse_cmp (operands[0], NE, operands[4],
-				     operands[5], operands[1], operands[2]);
+	  temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[2],
+				      operands[3], NULL, NULL);
+	  cmp = ix86_expand_sse_cmp (operands[0], NE, operands[2],
+				     operands[3], NULL, NULL);
 	  code = AND;
 	  break;
 	case UNEQ:
-	  temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[4],
-				      operands[5], operands[0], operands[0]);
-	  cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[4],
-				     operands[5], operands[1], operands[2]);
+	  temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[2],
+				      operands[3], NULL, NULL);
+	  cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[2],
+				     operands[3], NULL, NULL);
 	  code = IOR;
 	  break;
 	default:
@@ -21748,72 +21841,26 @@ ix86_expand_fp_vcond (rtx operands[])
 	}
       cmp = expand_simple_binop (GET_MODE (cmp), code, temp, cmp, cmp, 1,
 				 OPTAB_DIRECT);
-      ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
-      return true;
     }
+  else
+    cmp = ix86_expand_sse_cmp (operands[0], code, operands[2], operands[3],
+			       operands[1], operands[2]);
 
-  if (ix86_expand_sse_fp_minmax (operands[0], code, operands[4],
-				 operands[5], operands[1], operands[2]))
-    return true;
+  if (operands[0] != cmp)
+    emit_move_insn (operands[0], cmp);
 
-  cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5],
-			     operands[1], operands[2]);
-  ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
   return true;
 }
 
-/* Expand a signed/unsigned integral vector conditional move.  */
-
-bool
-ix86_expand_int_vcond (rtx operands[])
+static rtx
+ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, rtx cop0, rtx cop1,
+			 rtx op_true, rtx op_false, bool *negate)
 {
-  machine_mode data_mode = GET_MODE (operands[0]);
-  machine_mode mode = GET_MODE (operands[4]);
-  enum rtx_code code = GET_CODE (operands[3]);
-  bool negate = false;
-  rtx x, cop0, cop1;
-
-  cop0 = operands[4];
-  cop1 = operands[5];
+  machine_mode data_mode = GET_MODE (dest);
+  machine_mode mode = GET_MODE (cop0);
+  rtx x;
 
-  /* Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
-     and x < 0 ? 1 : 0 into (unsigned) x >> 31.  */
-  if ((code == LT || code == GE)
-      && data_mode == mode
-      && cop1 == CONST0_RTX (mode)
-      && operands[1 + (code == LT)] == CONST0_RTX (data_mode)
-      && GET_MODE_UNIT_SIZE (data_mode) > 1
-      && GET_MODE_UNIT_SIZE (data_mode) <= 8
-      && (GET_MODE_SIZE (data_mode) == 16
-	  || (TARGET_AVX2 && GET_MODE_SIZE (data_mode) == 32)))
-    {
-      rtx negop = operands[2 - (code == LT)];
-      int shift = GET_MODE_UNIT_BITSIZE (data_mode) - 1;
-      if (negop == CONST1_RTX (data_mode))
-	{
-	  rtx res = expand_simple_binop (mode, LSHIFTRT, cop0, GEN_INT (shift),
-					 operands[0], 1, OPTAB_DIRECT);
-	  if (res != operands[0])
-	    emit_move_insn (operands[0], res);
-	  return true;
-	}
-      else if (GET_MODE_INNER (data_mode) != DImode
-	       && vector_all_ones_operand (negop, data_mode))
-	{
-	  rtx res = expand_simple_binop (mode, ASHIFTRT, cop0, GEN_INT (shift),
-					 operands[0], 0, OPTAB_DIRECT);
-	  if (res != operands[0])
-	    emit_move_insn (operands[0], res);
-	  return true;
-	}
-    }
-
-  if (!nonimmediate_operand (cop1, mode))
-    cop1 = force_reg (mode, cop1);
-  if (!general_operand (operands[1], data_mode))
-    operands[1] = force_reg (data_mode, operands[1]);
-  if (!general_operand (operands[2], data_mode))
-    operands[2] = force_reg (data_mode, operands[2]);
+  *negate = false;
 
   /* XOP supports all of the comparisons on all 128-bit vector int types.  */
   if (TARGET_XOP
@@ -21834,13 +21881,13 @@ ix86_expand_int_vcond (rtx operands[])
 	case LE:
 	case LEU:
 	  code = reverse_condition (code);
-	  negate = true;
+	  *negate = true;
 	  break;
 
 	case GE:
 	case GEU:
 	  code = reverse_condition (code);
-	  negate = true;
+	  *negate = true;
 	  /* FALLTHRU */
 
 	case LT:
@@ -21861,14 +21908,14 @@ ix86_expand_int_vcond (rtx operands[])
 	    case EQ:
 	      /* SSE4.1 supports EQ.  */
 	      if (!TARGET_SSE4_1)
-		return false;
+		return NULL;
 	      break;
 
 	    case GT:
 	    case GTU:
 	      /* SSE4.2 supports GT/GTU.  */
 	      if (!TARGET_SSE4_2)
-		return false;
+		return NULL;
 	      break;
 
 	    default:
@@ -21929,12 +21976,13 @@ ix86_expand_int_vcond (rtx operands[])
 	    case V8HImode:
 	      /* Perform a parallel unsigned saturating subtraction.  */
 	      x = gen_reg_rtx (mode);
-	      emit_insn (gen_rtx_SET (x, gen_rtx_US_MINUS (mode, cop0, cop1)));
+	      emit_insn (gen_rtx_SET (x, gen_rtx_US_MINUS (mode, cop0,
+							   cop1)));
 
 	      cop0 = x;
 	      cop1 = CONST0_RTX (mode);
 	      code = EQ;
-	      negate = !negate;
+	      *negate = !*negate;
 	      break;
 
 	    default:
@@ -21943,22 +21991,162 @@ ix86_expand_int_vcond (rtx operands[])
 	}
     }
 
+  if (*negate)
+    std::swap (op_true, op_false);
+
   /* Allow the comparison to be done in one mode, but the movcc to
      happen in another mode.  */
   if (data_mode == mode)
     {
-      x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
-			       operands[1+negate], operands[2-negate]);
+      x = ix86_expand_sse_cmp (dest, code, cop0, cop1,
+			       op_true, op_false);
     }
   else
     {
       gcc_assert (GET_MODE_SIZE (data_mode) == GET_MODE_SIZE (mode));
       x = ix86_expand_sse_cmp (gen_reg_rtx (mode), code, cop0, cop1,
-			       operands[1+negate], operands[2-negate]);
+			       op_true, op_false);
       if (GET_MODE (x) == mode)
 	x = gen_lowpart (data_mode, x);
     }
 
+  return x;
+}
+
+/* Expand integer vector comparison.  */
+
+bool
+ix86_expand_int_vec_cmp (rtx operands[])
+{
+  rtx_code code = GET_CODE (operands[1]);
+  bool negate = false;
+  rtx cmp = ix86_expand_int_sse_cmp (operands[0], code, operands[2],
+				     operands[3], NULL, NULL, &negate);
+
+  if (!cmp)
+    return false;
+
+  if (negate)
+    cmp = ix86_expand_int_sse_cmp (operands[0], EQ, cmp,
+				   CONST0_RTX (GET_MODE (cmp)),
+				   NULL, NULL, &negate);
+
+  gcc_assert (!negate);
+
+  if (operands[0] != cmp)
+    emit_move_insn (operands[0], cmp);
+
+  return true;
+}
+
+/* Expand a floating-point vector conditional move; a vcond operation
+   rather than a movcc operation.  */
+
+bool
+ix86_expand_fp_vcond (rtx operands[])
+{
+  enum rtx_code code = GET_CODE (operands[3]);
+  rtx cmp;
+
+  code = ix86_prepare_sse_fp_compare_args (operands[0], code,
+					   &operands[4], &operands[5]);
+  if (code == UNKNOWN)
+    {
+      rtx temp;
+      switch (GET_CODE (operands[3]))
+	{
+	case LTGT:
+	  temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[4],
+				      operands[5], operands[0], operands[0]);
+	  cmp = ix86_expand_sse_cmp (operands[0], NE, operands[4],
+				     operands[5], operands[1], operands[2]);
+	  code = AND;
+	  break;
+	case UNEQ:
+	  temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[4],
+				      operands[5], operands[0], operands[0]);
+	  cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[4],
+				     operands[5], operands[1], operands[2]);
+	  code = IOR;
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      cmp = expand_simple_binop (GET_MODE (cmp), code, temp, cmp, cmp, 1,
+				 OPTAB_DIRECT);
+      ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
+      return true;
+    }
+
+  if (ix86_expand_sse_fp_minmax (operands[0], code, operands[4],
+				 operands[5], operands[1], operands[2]))
+    return true;
+
+  cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5],
+			     operands[1], operands[2]);
+  ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
+  return true;
+}
+
+/* Expand a signed/unsigned integral vector conditional move.  */
+
+bool
+ix86_expand_int_vcond (rtx operands[])
+{
+  machine_mode data_mode = GET_MODE (operands[0]);
+  machine_mode mode = GET_MODE (operands[4]);
+  enum rtx_code code = GET_CODE (operands[3]);
+  bool negate = false;
+  rtx x, cop0, cop1;
+
+  cop0 = operands[4];
+  cop1 = operands[5];
+
+  /* Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
+     and x < 0 ? 1 : 0 into (unsigned) x >> 31.  */
+  if ((code == LT || code == GE)
+      && data_mode == mode
+      && cop1 == CONST0_RTX (mode)
+      && operands[1 + (code == LT)] == CONST0_RTX (data_mode)
+      && GET_MODE_UNIT_SIZE (data_mode) > 1
+      && GET_MODE_UNIT_SIZE (data_mode) <= 8
+      && (GET_MODE_SIZE (data_mode) == 16
+	  || (TARGET_AVX2 && GET_MODE_SIZE (data_mode) == 32)))
+    {
+      rtx negop = operands[2 - (code == LT)];
+      int shift = GET_MODE_UNIT_BITSIZE (data_mode) - 1;
+      if (negop == CONST1_RTX (data_mode))
+	{
+	  rtx res = expand_simple_binop (mode, LSHIFTRT, cop0, GEN_INT (shift),
+					 operands[0], 1, OPTAB_DIRECT);
+	  if (res != operands[0])
+	    emit_move_insn (operands[0], res);
+	  return true;
+	}
+      else if (GET_MODE_INNER (data_mode) != DImode
+	       && vector_all_ones_operand (negop, data_mode))
+	{
+	  rtx res = expand_simple_binop (mode, ASHIFTRT, cop0, GEN_INT (shift),
+					 operands[0], 0, OPTAB_DIRECT);
+	  if (res != operands[0])
+	    emit_move_insn (operands[0], res);
+	  return true;
+	}
+    }
+
+  if (!nonimmediate_operand (cop1, mode))
+    cop1 = force_reg (mode, cop1);
+  if (!general_operand (operands[1], data_mode))
+    operands[1] = force_reg (data_mode, operands[1]);
+  if (!general_operand (operands[2], data_mode))
+    operands[2] = force_reg (data_mode, operands[2]);
+
+  x = ix86_expand_int_sse_cmp (operands[0], code, cop0, cop1,
+			       operands[1], operands[2], &negate);
+
+  if (!x)
+    return false;
+
   ix86_expand_sse_movcc (operands[0], x, operands[1+negate],
 			 operands[2-negate]);
   return true;
@@ -51678,6 +51866,25 @@ ix86_autovectorize_vector_sizes (void)
     (TARGET_AVX && !TARGET_PREFER_AVX128) ? 32 | 16 : 0;
 }
 
+/* Implemenation of targetm.vectorize.get_mask_mode.  */
+
+static machine_mode
+ix86_get_mask_mode (unsigned nunits, unsigned vector_size)
+{
+  /* Scalar mask case.  */
+  if ((TARGET_AVX512F && vector_size == 64)
+      || TARGET_AVX512VL)
+    return smallest_mode_for_size (nunits, MODE_INT);
+
+  unsigned elem_size = vector_size / nunits;
+  machine_mode elem_mode
+    = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
+
+  gcc_assert (elem_size * nunits == vector_size);
+
+  return mode_for_vector (elem_mode, nunits);
+}
+
 \f
 
 /* Return class of registers which could be used for pseudo of MODE
@@ -52612,6 +52819,8 @@ ix86_operands_ok_for_move_multiple (rtx *operands, bool load,
 #undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
 #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
   ix86_autovectorize_vector_sizes
+#undef TARGET_VECTORIZE_GET_MASK_MODE
+#define TARGET_VECTORIZE_GET_MASK_MODE ix86_get_mask_mode
 #undef TARGET_VECTORIZE_INIT_COST
 #define TARGET_VECTORIZE_INIT_COST ix86_init_cost
 #undef TARGET_VECTORIZE_ADD_STMT_COST
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4535570..a8d55cc 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -605,6 +605,15 @@
    (V16SF "HI") (V8SF  "QI") (V4SF  "QI")
    (V8DF  "QI") (V4DF  "QI") (V2DF  "QI")])
 
+;; Mapping of vector modes to corresponding mask size
+(define_mode_attr avx512fmaskmodelower
+  [(V64QI "di") (V32QI "si") (V16QI "hi")
+   (V32HI "si") (V16HI "hi") (V8HI  "qi") (V4HI "qi")
+   (V16SI "hi") (V8SI  "qi") (V4SI  "qi")
+   (V8DI  "qi") (V4DI  "qi") (V2DI  "qi")
+   (V16SF "hi") (V8SF  "qi") (V4SF  "qi")
+   (V8DF  "qi") (V4DF  "qi") (V2DF  "qi")])
+
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr sseintvecmode
   [(V16SF "V16SI") (V8DF  "V8DI")
@@ -2803,6 +2812,150 @@
 		      (const_string "0")))
    (set_attr "mode" "<MODE>")])
 
+(define_expand "vec_cmp<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:V48_AVX512VL 2 "register_operand")
+	   (match_operand:V48_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512F"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:VI12_AVX512VL 2 "register_operand")
+	   (match_operand:VI12_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512BW"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI_256 2 "register_operand")
+	   (match_operand:VI_256 3 "nonimmediate_operand")]))]
+  "TARGET_AVX2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI124_128 2 "register_operand")
+	   (match_operand:VI124_128 3 "nonimmediate_operand")]))]
+  "TARGET_SSE2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpv2div2di"
+  [(set (match_operand:V2DI 0 "register_operand")
+	(match_operator:V2DI 1 ""
+	  [(match_operand:V2DI 2 "register_operand")
+	   (match_operand:V2DI 3 "nonimmediate_operand")]))]
+  "TARGET_SSE4_2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VF_256 2 "register_operand")
+	   (match_operand:VF_256 3 "nonimmediate_operand")]))]
+  "TARGET_AVX"
+{
+  bool ok = ix86_expand_fp_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VF_128 2 "register_operand")
+	   (match_operand:VF_128 3 "nonimmediate_operand")]))]
+  "TARGET_SSE"
+{
+  bool ok = ix86_expand_fp_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:VI48_AVX512VL 2 "register_operand")
+	   (match_operand:VI48_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512F"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:VI12_AVX512VL 2 "register_operand")
+	   (match_operand:VI12_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512BW"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI_256 2 "register_operand")
+	   (match_operand:VI_256 3 "nonimmediate_operand")]))]
+  "TARGET_AVX2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI124_128 2 "register_operand")
+	   (match_operand:VI124_128 3 "nonimmediate_operand")]))]
+  "TARGET_SSE2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpuv2div2di"
+  [(set (match_operand:V2DI 0 "register_operand")
+	(match_operator:V2DI 1 ""
+	  [(match_operand:V2DI 2 "register_operand")
+	   (match_operand:V2DI 3 "nonimmediate_operand")]))]
+  "TARGET_SSE4_2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
 (define_expand "vcond<V_512:mode><VF_512:mode>"
   [(set (match_operand:V_512 0 "register_operand")
 	(if_then_else:V_512
@@ -17895,7 +18048,7 @@
    (set_attr "btver2_decode" "vector") 
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_expand "maskload<mode>"
+(define_expand "maskload<mode><sseintvecmodelower>"
   [(set (match_operand:V48_AVX2 0 "register_operand")
 	(unspec:V48_AVX2
 	  [(match_operand:<sseintvecmode> 2 "register_operand")
@@ -17903,7 +18056,23 @@
 	  UNSPEC_MASKMOV))]
   "TARGET_AVX")
 
-(define_expand "maskstore<mode>"
+(define_expand "maskload<mode><avx512fmaskmodelower>"
+  [(set (match_operand:V48_AVX512VL 0 "register_operand")
+	(vec_merge:V48_AVX512VL
+	  (match_operand:V48_AVX512VL 1 "memory_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512F")
+
+(define_expand "maskload<mode><avx512fmaskmodelower>"
+  [(set (match_operand:VI12_AVX512VL 0 "register_operand")
+	(vec_merge:VI12_AVX512VL
+	  (match_operand:VI12_AVX512VL 1 "memory_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512BW")
+
+(define_expand "maskstore<mode><sseintvecmodelower>"
   [(set (match_operand:V48_AVX2 0 "memory_operand")
 	(unspec:V48_AVX2
 	  [(match_operand:<sseintvecmode> 2 "register_operand")
@@ -17912,6 +18081,22 @@
 	  UNSPEC_MASKMOV))]
   "TARGET_AVX")
 
+(define_expand "maskstore<mode><avx512fmaskmodelower>"
+  [(set (match_operand:V48_AVX512VL 0 "memory_operand")
+	(vec_merge:V48_AVX512VL
+	  (match_operand:V48_AVX512VL 1 "register_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512F")
+
+(define_expand "maskstore<mode><avx512fmaskmodelower>"
+  [(set (match_operand:VI12_AVX512VL 0 "memory_operand")
+	(vec_merge:VI12_AVX512VL
+	  (match_operand:VI12_AVX512VL 1 "register_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512BW")
+
 (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
   [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
 	(unspec:AVX256MODE2P
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f5a1f84..acdfcd5 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5688,6 +5688,11 @@ mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.
 The default is zero which means to not iterate over other vector sizes.
 @end deftypefn
 
+@deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_GET_MASK_MODE (unsigned @var{nunits}, unsigned @var{length})
+This hook returns mode to be used for a mask to be used for a vector
+of specified @var{length} with @var{nunits} elements.
+@end deftypefn
+
 @deftypefn {Target Hook} {void *} TARGET_VECTORIZE_INIT_COST (struct loop *@var{loop_info})
 This hook should initialize target-specific data structures in preparation for modeling the costs of vectorizing a loop or basic block.  The default allocates three unsigned integers for accumulating costs for the prologue, body, and epilogue of the loop or basic block.  If @var{loop_info} is non-NULL, it identifies the loop being vectorized; otherwise a single block is being vectorized.
 @end deftypefn
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 9d5ac0a..52e912a 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4225,6 +4225,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
 
+@hook TARGET_VECTORIZE_GET_MASK_MODE
+
 @hook TARGET_VECTORIZE_INIT_COST
 
 @hook TARGET_VECTORIZE_ADD_STMT_COST
diff --git a/gcc/expr.c b/gcc/expr.c
index 1e820b4..fa48484 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11000,9 +11000,15 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
   if (TREE_CODE (ops->type) == VECTOR_TYPE)
     {
       tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
-      tree if_true = constant_boolean_node (true, ops->type);
-      tree if_false = constant_boolean_node (false, ops->type);
-      return expand_vec_cond_expr (ops->type, ifexp, if_true, if_false, target);
+      if (TREE_TYPE (ops->type) == boolean_type_node)
+	return expand_vec_cmp_expr (ops->type, ifexp, target);
+      else
+	{
+	  tree if_true = constant_boolean_node (true, ops->type);
+	  tree if_false = constant_boolean_node (false, ops->type);
+	  return expand_vec_cond_expr (ops->type, ifexp, if_true,
+				       if_false, target);
+	}
     }
 
   /* Get the rtx comparison code to use.  We know that EXP is a comparison
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index e785946..4ca0a40 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt)
   create_output_operand (&ops[0], target, TYPE_MODE (type));
   create_fixed_operand (&ops[1], mem);
   create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
+  expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type),
+				      TYPE_MODE (TREE_TYPE (maskt))),
+	       3, ops);
 }
 
 static void
@@ -1908,7 +1910,9 @@ expand_MASK_STORE (gcall *stmt)
   create_fixed_operand (&ops[0], mem);
   create_input_operand (&ops[1], reg, TYPE_MODE (type));
   create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (optab_handler (maskstore_optab, TYPE_MODE (type)), 3, ops);
+  expand_insn (convert_optab_handler (maskstore_optab, TYPE_MODE (type),
+				      TYPE_MODE (TREE_TYPE (maskt))),
+	       3, ops);
 }
 
 static void
diff --git a/gcc/optabs.c b/gcc/optabs.c
index e533e6e..48f7914 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6490,11 +6490,13 @@ get_rtx_code (enum tree_code tcode, bool unsignedp)
 }
 
 /* Return comparison rtx for COND. Use UNSIGNEDP to select signed or
-   unsigned operators. Do not generate compare instruction.  */
+   unsigned operators.  OPNO holds an index of the first comparison
+   operand in insn with code ICODE.  Do not generate compare instruction.  */
 
 static rtx
 vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
-		    bool unsignedp, enum insn_code icode)
+		    bool unsignedp, enum insn_code icode,
+		    unsigned int opno)
 {
   struct expand_operand ops[2];
   rtx rtx_op0, rtx_op1;
@@ -6520,7 +6522,7 @@ vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
 
   create_input_operand (&ops[0], rtx_op0, m0);
   create_input_operand (&ops[1], rtx_op1, m1);
-  if (!maybe_legitimize_operands (icode, 4, 2, ops))
+  if (!maybe_legitimize_operands (icode, opno, 2, ops))
     gcc_unreachable ();
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
@@ -6863,7 +6865,7 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
   if (icode == CODE_FOR_nothing)
     return 0;
 
-  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode);
+  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 4);
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
 
@@ -6877,6 +6879,63 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
   return ops[0].value;
 }
 
+/* Return insn code for a comparison operator with VMODE
+   resultin MASK_MODE, unsigned if UNS is true.  */
+
+static inline enum insn_code
+get_vec_cmp_icode (machine_mode vmode, machine_mode mask_mode, bool uns)
+{
+  optab tab = uns ? vec_cmpu_optab : vec_cmp_optab;
+  return convert_optab_handler (tab, vmode, mask_mode);
+}
+
+/* Return TRUE if appropriate vector insn is available
+   for vector comparison expr with vector type VALUE_TYPE
+   and resulting mask with MASK_TYPE.  */
+
+bool
+expand_vec_cmp_expr_p (tree value_type, tree mask_type)
+{
+  enum insn_code icode = get_vec_cmp_icode (TYPE_MODE (value_type),
+					    TYPE_MODE (mask_type),
+					    TYPE_UNSIGNED (value_type));
+  return (icode != CODE_FOR_nothing);
+}
+
+/* Generate insns for a vector comparison into a mask.  */
+
+rtx
+expand_vec_cmp_expr (tree type, tree exp, rtx target)
+{
+  struct expand_operand ops[4];
+  enum insn_code icode;
+  rtx comparison;
+  machine_mode mask_mode = TYPE_MODE (type);
+  machine_mode vmode;
+  bool unsignedp;
+  tree op0a, op0b;
+  enum tree_code tcode;
+
+  op0a = TREE_OPERAND (exp, 0);
+  op0b = TREE_OPERAND (exp, 1);
+  tcode = TREE_CODE (exp);
+
+  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
+  vmode = TYPE_MODE (TREE_TYPE (op0a));
+
+  icode = get_vec_cmp_icode (vmode, mask_mode, unsignedp);
+  if (icode == CODE_FOR_nothing)
+    return 0;
+
+  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 2);
+  create_output_operand (&ops[0], target, mask_mode);
+  create_fixed_operand (&ops[1], comparison);
+  create_fixed_operand (&ops[2], XEXP (comparison, 0));
+  create_fixed_operand (&ops[3], XEXP (comparison, 1));
+  expand_insn (icode, 4, ops);
+  return ops[0].value;
+}
+
 /* Return non-zero if a highpart multiply is supported of can be synthisized.
    For the benefit of expand_mult_highpart, the return value is 1 for direct,
    2 for even/odd widening, and 3 for hi/lo widening.  */
@@ -7002,26 +7061,32 @@ expand_mult_highpart (machine_mode mode, rtx op0, rtx op1,
 
 /* Return true if target supports vector masked load/store for mode.  */
 bool
-can_vec_mask_load_store_p (machine_mode mode, bool is_load)
+can_vec_mask_load_store_p (machine_mode mode,
+			   machine_mode mask_mode,
+			   bool is_load)
 {
   optab op = is_load ? maskload_optab : maskstore_optab;
-  machine_mode vmode;
   unsigned int vector_sizes;
 
   /* If mode is vector mode, check it directly.  */
   if (VECTOR_MODE_P (mode))
-    return optab_handler (op, mode) != CODE_FOR_nothing;
+    return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing;
 
   /* Otherwise, return true if there is some vector mode with
      the mask load/store supported.  */
 
   /* See if there is any chance the mask load or store might be
      vectorized.  If not, punt.  */
-  vmode = targetm.vectorize.preferred_simd_mode (mode);
-  if (!VECTOR_MODE_P (vmode))
+  mode = targetm.vectorize.preferred_simd_mode (mode);
+  if (!VECTOR_MODE_P (mode))
+    return false;
+
+  mask_mode = targetm.vectorize.get_mask_mode (GET_MODE_NUNITS (mode),
+					       GET_MODE_SIZE (mode));
+  if (mask_mode == VOIDmode)
     return false;
 
-  if (optab_handler (op, vmode) != CODE_FOR_nothing)
+  if (convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing)
     return true;
 
   vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
@@ -7031,9 +7096,12 @@ can_vec_mask_load_store_p (machine_mode mode, bool is_load)
       vector_sizes &= ~cur;
       if (cur <= GET_MODE_SIZE (mode))
 	continue;
-      vmode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode));
-      if (VECTOR_MODE_P (vmode)
-	  && optab_handler (op, vmode) != CODE_FOR_nothing)
+      mode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode));
+      mask_mode = targetm.vectorize.get_mask_mode (GET_MODE_NUNITS (mode),
+						   cur);
+      if (VECTOR_MODE_P (mode)
+	  && mask_mode != VOIDmode
+	  && convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing)
 	return true;
     }
   return false;
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 888b21c..9804378 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -61,6 +61,10 @@ OPTAB_CD(vec_load_lanes_optab, "vec_load_lanes$a$b")
 OPTAB_CD(vec_store_lanes_optab, "vec_store_lanes$a$b")
 OPTAB_CD(vcond_optab, "vcond$a$b")
 OPTAB_CD(vcondu_optab, "vcondu$a$b")
+OPTAB_CD(vec_cmp_optab, "vec_cmp$a$b")
+OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
+OPTAB_CD(maskload_optab, "maskload$a$b")
+OPTAB_CD(maskstore_optab, "maskstore$a$b")
 
 OPTAB_NL(add_optab, "add$P$a3", PLUS, "add", '3', gen_int_fp_fixed_libfunc)
 OPTAB_NX(add_optab, "add$F$a3")
@@ -264,8 +268,6 @@ OPTAB_D (udot_prod_optab, "udot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
-OPTAB_D (maskload_optab, "maskload$a")
-OPTAB_D (maskstore_optab, "maskstore$a")
 OPTAB_D (vec_extract_optab, "vec_extract$a")
 OPTAB_D (vec_init_optab, "vec_init$a")
 OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 95f5cbc..dfe9ebf 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -496,6 +496,12 @@ extern bool can_vec_perm_p (machine_mode, bool, const unsigned char *);
 extern rtx expand_vec_perm (machine_mode, rtx, rtx, rtx, rtx);
 
 /* Return tree if target supports vector operations for COND_EXPR.  */
+bool expand_vec_cmp_expr_p (tree, tree);
+
+/* Generate code for VEC_COND_EXPR.  */
+extern rtx expand_vec_cmp_expr (tree, tree, rtx);
+
+/* Return true if target supports vector comparison.  */
 bool expand_vec_cond_expr_p (tree, tree);
 
 /* Generate code for VEC_COND_EXPR.  */
@@ -508,7 +514,7 @@ extern int can_mult_highpart_p (machine_mode, bool);
 extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool);
 
 /* Return true if target supports vector masked load/store for mode.  */
-extern bool can_vec_mask_load_store_p (machine_mode, bool);
+extern bool can_vec_mask_load_store_p (machine_mode, machine_mode, bool);
 
 /* Return true if there is an inline compare and swap pattern.  */
 extern bool can_compare_and_swap_p (machine_mode, bool);
diff --git a/gcc/target.def b/gcc/target.def
index 4edc209..c5b8ed9 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1789,6 +1789,15 @@ The default is zero which means to not iterate over other vector sizes.",
  (void),
  default_autovectorize_vector_sizes)
 
+/* Function to get a target mode for a vector mask.  */
+DEFHOOK
+(get_mask_mode,
+ "This hook returns mode to be used for a mask to be used for a vector\n\
+of specified @var{length} with @var{nunits} elements.",
+ machine_mode,
+ (unsigned nunits, unsigned length),
+ default_get_mask_mode)
+
 /* Target builtin that implements vector gather operation.  */
 DEFHOOK
 (builtin_gather,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 7238c8f..61fb97d 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1033,6 +1033,8 @@ tree default_mangle_decl_assembler_name (tree decl ATTRIBUTE_UNUSED,
 HOST_WIDE_INT
 default_vector_alignment (const_tree type)
 {
+  if (TREE_TYPE (type) == boolean_type_node)
+    return GET_MODE_ALIGNMENT (TYPE_MODE (type));
   return tree_to_shwi (TYPE_SIZE (type));
 }
 
@@ -1087,6 +1089,20 @@ default_autovectorize_vector_sizes (void)
   return 0;
 }
 
+/* By defaults a vector of integers is used as a mask.  */
+
+machine_mode
+default_get_mask_mode (unsigned nunits, unsigned vector_size)
+{
+  unsigned elem_size = vector_size / nunits;
+  machine_mode elem_mode
+    = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
+
+  gcc_assert (elem_size * nunits == vector_size);
+
+  return mode_for_vector (elem_mode, nunits);
+}
+
 /* By default, the cost model accumulates three separate costs (prologue,
    loop body, and epilogue) for a vectorized loop or block.  So allocate an
    array of three unsigned ints, set it to zero, and return its address.  */
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 5ae991d..cc7263f 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -100,6 +100,7 @@ default_builtin_support_vector_misalignment (machine_mode mode,
 					     int, bool);
 extern machine_mode default_preferred_simd_mode (machine_mode mode);
 extern unsigned int default_autovectorize_vector_sizes (void);
+extern machine_mode default_get_mask_mode (unsigned, unsigned);
 extern void *default_init_cost (struct loop *);
 extern unsigned default_add_stmt_cost (void *, int, enum vect_cost_for_stmt,
 				       struct _stmt_vec_info *, int,
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 5ac73b3..1ee8f93 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3490,6 +3490,27 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
           return true;
         }
     }
+  /* Or a boolean vector type with the same element count
+     as the comparison operand types.  */
+  else if (TREE_CODE (type) == VECTOR_TYPE
+	   && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
+    {
+      if (TREE_CODE (op0_type) != VECTOR_TYPE
+	  || TREE_CODE (op1_type) != VECTOR_TYPE)
+        {
+          error ("non-vector operands in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+
+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type))
+        {
+          error ("invalid vector comparison resulting type");
+          debug_generic_expr (type);
+          return true;
+        }
+    }
   else
     {
       error ("bogus comparison result type");
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 291e602..d66517d 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -811,7 +811,7 @@ ifcvt_can_use_mask_load_store (gimple stmt)
       || VECTOR_MODE_P (mode))
     return false;
 
-  if (can_vec_mask_load_store_p (mode, is_load))
+  if (can_vec_mask_load_store_p (mode, VOIDmode, is_load))
     return true;
 
   return false;
@@ -2068,7 +2068,7 @@ predicate_mem_writes (loop_p loop)
 	  {
 	    tree lhs = gimple_assign_lhs (stmt);
 	    tree rhs = gimple_assign_rhs1 (stmt);
-	    tree ref, addr, ptr, masktype, mask_op0, mask_op1, mask;
+	    tree ref, addr, ptr, masktype, mask;
 	    gimple new_stmt;
 	    int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs)));
 	    ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs;
@@ -2082,15 +2082,47 @@ predicate_mem_writes (loop_p loop)
 	      mask = vect_masks[index];
 	    else
 	      {
-		masktype = build_nonstandard_integer_type (bitsize, 1);
-		mask_op0 = build_int_cst (masktype, swap ? 0 : -1);
-		mask_op1 = build_int_cst (masktype, swap ? -1 : 0);
-		cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
-						   is_gimple_condexpr,
-						   NULL_TREE,
-						   true, GSI_SAME_STMT);
-		mask = fold_build_cond_expr (masktype, unshare_expr (cond),
-					     mask_op0, mask_op1);
+		masktype = boolean_type_node;
+		if ((TREE_CODE (cond) == NE_EXPR
+		     || TREE_CODE (cond) == EQ_EXPR)
+		    && (integer_zerop (TREE_OPERAND (cond, 1))
+			|| integer_onep (TREE_OPERAND (cond, 1)))
+		    && TREE_CODE (TREE_TYPE (TREE_OPERAND (cond, 0)))
+		       == BOOLEAN_TYPE)
+		  {
+		    bool negate = (TREE_CODE (cond) == EQ_EXPR);
+		    if (integer_onep (TREE_OPERAND (cond, 1)))
+		      negate = !negate;
+		    if (swap)
+		      negate = !negate;
+		    mask = TREE_OPERAND (cond, 0);
+		    if (negate)
+		      {
+			mask = ifc_temp_var (masktype, unshare_expr (cond),
+					     &gsi);
+			mask = build1 (TRUTH_NOT_EXPR, masktype, mask);
+		      }
+		  }
+		else if (swap &&
+			 TREE_CODE_CLASS (TREE_CODE (cond)) == tcc_comparison)
+		  {
+		    tree op_type = TREE_TYPE (TREE_OPERAND (cond, 0));
+		    tree_code code
+		      = invert_tree_comparison (TREE_CODE (cond),
+						HONOR_NANS (op_type));
+		    if (code != ERROR_MARK)
+			mask = build2 (code, TREE_TYPE (cond),
+				       TREE_OPERAND (cond, 0),
+				       TREE_OPERAND (cond, 1));
+		    else
+		      {
+			mask = ifc_temp_var (masktype, unshare_expr (cond),
+					     &gsi);
+			mask = build1 (TRUTH_NOT_EXPR, masktype, mask);
+		      }
+		  }
+		else
+		  mask = unshare_expr (cond);
 		mask = ifc_temp_var (masktype, mask, &gsi);
 		/* Save mask and its size for further use.  */
 	        vect_sizes.safe_push (bitsize);
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index f1eaef4..b3b970a 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -3849,6 +3849,9 @@ vect_get_new_vect_var (tree type, enum vect_var_kind var_kind, const char *name)
   case vect_scalar_var:
     prefix = "stmp";
     break;
+  case vect_mask_var:
+    prefix = "mask";
+    break;
   case vect_pointer_var:
     prefix = "vectp";
     break;
@@ -4403,7 +4406,11 @@ vect_create_destination_var (tree scalar_dest, tree vectype)
   tree type;
   enum vect_var_kind kind;
 
-  kind = vectype ? vect_simple_var : vect_scalar_var;
+  kind = vectype
+    ? TREE_TYPE (vectype) == boolean_type_node
+    ? vect_mask_var
+    : vect_simple_var
+    : vect_scalar_var;
   type = vectype ? vectype : TREE_TYPE (scalar_dest);
 
   gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME);
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index be3d27f..9218ca5 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -350,7 +350,8 @@ expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
                           tree op1, enum tree_code code)
 {
   tree t;
-  if (! expand_vec_cond_expr_p (type, TREE_TYPE (op0)))
+  if (!expand_vec_cmp_expr_p (TREE_TYPE (op0), type)
+      && !expand_vec_cond_expr_p (type, TREE_TYPE (op0)))
     t = expand_vector_piecewise (gsi, do_compare, type,
 				 TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
   else
@@ -1506,6 +1507,11 @@ expand_vector_operations_1 (gimple_stmt_iterator *gsi)
   if (TREE_CODE (type) != VECTOR_TYPE)
     return;
 
+  /* A scalar operation pretending to be a vector one.  */
+  if (TREE_TYPE (type) == boolean_type_node
+      && !VECTOR_MODE_P (TYPE_MODE (type)))
+    return;
+
   if (CONVERT_EXPR_CODE_P (code)
       || code == FLOAT_EXPR
       || code == FIX_TRUNC_EXPR
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 59c75af..f2dbc4e 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -193,19 +193,21 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 {
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
-  int nbbs = loop->num_nodes;
+  unsigned nbbs = loop->num_nodes;
   unsigned int vectorization_factor = 0;
   tree scalar_type;
   gphi *phi;
   tree vectype;
   unsigned int nunits;
   stmt_vec_info stmt_info;
-  int i;
+  unsigned i;
   HOST_WIDE_INT dummy;
   gimple stmt, pattern_stmt = NULL;
   gimple_seq pattern_def_seq = NULL;
   gimple_stmt_iterator pattern_def_si = gsi_none ();
   bool analyze_pattern_stmt = false;
+  bool bool_result;
+  auto_vec<stmt_vec_info> mask_producers;
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
@@ -424,6 +426,8 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 	      return false;
 	    }
 
+	  bool_result = false;
+
 	  if (STMT_VINFO_VECTYPE (stmt_info))
 	    {
 	      /* The only case when a vectype had been already set is for stmts
@@ -444,6 +448,30 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 		scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
 	      else
 		scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
+
+	      /* Bool ops don't participate in vectorization factor
+		 computation.  For comparison use compared types to
+		 compute a factor.  */
+	      if (scalar_type == boolean_type_node)
+		{
+		  mask_producers.safe_push (stmt_info);
+		  bool_result = true;
+
+		  if (gimple_code (stmt) == GIMPLE_ASSIGN
+		      && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
+		      && TREE_TYPE (gimple_assign_rhs1 (stmt)) != boolean_type_node)
+		    scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
+		  else
+		    {
+		      if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
+			{
+			  pattern_def_seq = NULL;
+			  gsi_next (&si);
+			}
+		      continue;
+		    }
+		}
+
 	      if (dump_enabled_p ())
 		{
 		  dump_printf_loc (MSG_NOTE, vect_location,
@@ -466,7 +494,8 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 		  return false;
 		}
 
-	      STMT_VINFO_VECTYPE (stmt_info) = vectype;
+	      if (!bool_result)
+		STMT_VINFO_VECTYPE (stmt_info) = vectype;
 
 	      if (dump_enabled_p ())
 		{
@@ -479,8 +508,9 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 	  /* The vectorization factor is according to the smallest
 	     scalar type (or the largest vector size, but we only
 	     support one vector size per loop).  */
-	  scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,
-						       &dummy);
+	  if (!bool_result)
+	    scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,
+							 &dummy);
 	  if (dump_enabled_p ())
 	    {
 	      dump_printf_loc (MSG_NOTE, vect_location,
@@ -555,6 +585,100 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
     }
   LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor;
 
+  for (i = 0; i < mask_producers.length (); i++)
+    {
+      tree mask_type = NULL;
+      bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (mask_producers[i]);
+
+      stmt = STMT_VINFO_STMT (mask_producers[i]);
+
+      if (gimple_code (stmt) == GIMPLE_ASSIGN
+	  && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
+	  && TREE_TYPE (gimple_assign_rhs1 (stmt)) != boolean_type_node)
+	{
+	  scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
+	  mask_type = get_mask_type_for_scalar_type (scalar_type);
+
+	  if (!mask_type)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "not vectorized: unsupported mask\n");
+	      return false;
+	    }
+	}
+      else
+	{
+	  tree rhs, def;
+	  ssa_op_iter iter;
+	  gimple def_stmt;
+	  enum vect_def_type dt;
+
+	  FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE)
+	    {
+	      if (!vect_is_simple_use_1 (rhs, stmt, loop_vinfo, bb_vinfo,
+					 &def_stmt, &def, &dt, &vectype))
+		{
+		  if (dump_enabled_p ())
+		    {
+		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				       "not vectorized: can't compute mask type "
+				       "for statement, ");
+		      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,  TDF_SLIM, stmt,
+					0);
+		      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+		    }
+		  return false;
+		}
+
+	      /* No vectype probably means external definition.
+		 Allow it in case there is another operand which
+		 allows to determine mask type.  */
+	      if (!vectype)
+		continue;
+
+	      if (!mask_type)
+		mask_type = vectype;
+	      else if (TYPE_VECTOR_SUBPARTS (mask_type)
+		       != TYPE_VECTOR_SUBPARTS (vectype))
+		{
+		  if (dump_enabled_p ())
+		    {
+		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				       "not vectorized: different sized masks "
+				       "types in statement, ");
+		      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+					 mask_type);
+		      dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
+		      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+					 vectype);
+		      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+		    }
+		  return false;
+		}
+	    }
+	}
+
+      /* No mask_type should mean loop invariant predicate.
+	 This is probably a subject for optimization in
+	 if-conversion.  */
+      if (!mask_type)
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "not vectorized: can't compute mask type "
+			       "for statement, ");
+	      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,  TDF_SLIM, stmt,
+				0);
+	      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+	    }
+	  return false;
+	}
+
+      STMT_VINFO_VECTYPE (mask_producers[i]) = mask_type;
+    }
+
   return true;
 }
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index f87c066..85259d9 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1316,27 +1316,61 @@ vect_init_vector_1 (gimple stmt, gimple new_stmt, gimple_stmt_iterator *gsi)
 tree
 vect_init_vector (gimple stmt, tree val, tree type, gimple_stmt_iterator *gsi)
 {
+  tree val_type = TREE_TYPE (val);
+  machine_mode mode = TYPE_MODE (type);
+  machine_mode val_mode = TYPE_MODE(val_type);
   tree new_var;
   gimple init_stmt;
   tree vec_oprnd;
   tree new_temp;
 
   if (TREE_CODE (type) == VECTOR_TYPE
-      && TREE_CODE (TREE_TYPE (val)) != VECTOR_TYPE)
-    {
-      if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val)))
+      && TREE_CODE (val_type) != VECTOR_TYPE)
+    {
+      /* Handle vector of bool represented as a vector of
+	 integers here rather than on expand because it is
+	 a default mask type for targets.  Vector mask is
+	 built in a following way:
+
+	 tmp = (int)val
+	 vec_tmp = {tmp, ..., tmp}
+	 vec_cst = VIEW_CONVERT_EXPR<vector(N) _Bool>(vec_tmp);  */
+      if (val_type == boolean_type_node
+	  && VECTOR_MODE_P (mode)
+	  && SCALAR_INT_MODE_P (GET_MODE_INNER (mode))
+	  && GET_MODE_INNER (mode) != val_mode)
 	{
-	  if (CONSTANT_CLASS_P (val))
-	    val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val);
-	  else
+	  unsigned size = GET_MODE_BITSIZE (GET_MODE_INNER (mode));
+	  tree stype = build_nonstandard_integer_type (size, 1);
+	  tree vectype = get_vectype_for_scalar_type (stype);
+
+	  new_temp = make_ssa_name (stype);
+	  init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val);
+	  vect_init_vector_1 (stmt, init_stmt, gsi);
+
+	  val = make_ssa_name (vectype);
+	  new_temp = build_vector_from_val (vectype, new_temp);
+	  init_stmt = gimple_build_assign (val, new_temp);
+	  vect_init_vector_1 (stmt, init_stmt, gsi);
+
+	  val = build1 (VIEW_CONVERT_EXPR, type, val);
+	}
+      else
+	{
+	  if (!types_compatible_p (TREE_TYPE (type), val_type))
 	    {
-	      new_temp = make_ssa_name (TREE_TYPE (type));
-	      init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val);
-	      vect_init_vector_1 (stmt, init_stmt, gsi);
-	      val = new_temp;
+	      if (CONSTANT_CLASS_P (val))
+		val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val);
+	      else
+		{
+		  new_temp = make_ssa_name (TREE_TYPE (type));
+		  init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val);
+		  vect_init_vector_1 (stmt, init_stmt, gsi);
+		  val = new_temp;
+		}
 	    }
+	  val = build_vector_from_val (type, val);
 	}
-      val = build_vector_from_val (type, val);
     }
 
   new_var = vect_get_new_vect_var (type, vect_simple_var, "cst_");
@@ -1368,6 +1402,7 @@ vect_get_vec_def_for_operand (tree op, gimple stmt, tree *scalar_def)
   gimple def_stmt;
   stmt_vec_info def_stmt_info = NULL;
   stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
+  tree stmt_vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
   unsigned int nunits;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
   tree def;
@@ -1411,7 +1446,12 @@ vect_get_vec_def_for_operand (tree op, gimple stmt, tree *scalar_def)
     /* Case 1: operand is a constant.  */
     case vect_constant_def:
       {
-	vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
+	if (TREE_TYPE (op) == boolean_type_node
+	    && TREE_TYPE (stmt_vectype) == boolean_type_node)
+	  vector_type = stmt_vectype;
+	else
+	  vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
+
 	gcc_assert (vector_type);
 	nunits = TYPE_VECTOR_SUBPARTS (vector_type);
 
@@ -1429,7 +1469,11 @@ vect_get_vec_def_for_operand (tree op, gimple stmt, tree *scalar_def)
     /* Case 2: operand is defined outside the loop - loop invariant.  */
     case vect_external_def:
       {
-	vector_type = get_vectype_for_scalar_type (TREE_TYPE (def));
+	if (TREE_TYPE (op) == boolean_type_node
+	    && TREE_TYPE (stmt_vectype) == boolean_type_node)
+	  vector_type = stmt_vectype;
+	else
+	  vector_type = get_vectype_for_scalar_type (TREE_TYPE (def));
 	gcc_assert (vector_type);
 
 	if (scalar_def)
@@ -1758,6 +1802,7 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
   bool nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt);
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree mask_vectype;
   tree elem_type;
   gimple new_stmt;
   tree dummy;
@@ -1785,8 +1830,8 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
 
   is_store = gimple_call_internal_fn (stmt) == IFN_MASK_STORE;
   mask = gimple_call_arg (stmt, 2);
-  if (TYPE_PRECISION (TREE_TYPE (mask))
-      != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype))))
+
+  if (TREE_TYPE (mask) != boolean_type_node)
     return false;
 
   /* FORNOW. This restriction should be relaxed.  */
@@ -1815,6 +1860,19 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
   if (STMT_VINFO_STRIDED_P (stmt_info))
     return false;
 
+  if (TREE_CODE (mask) != SSA_NAME)
+    return false;
+
+  if (!vect_is_simple_use_1 (mask, stmt, loop_vinfo, NULL,
+			     &def_stmt, &def, &dt, &mask_vectype))
+    return false;
+
+  if (!mask_vectype)
+    mask_vectype = get_mask_type_for_scalar_type (TREE_TYPE (vectype));
+
+  if (!mask_vectype)
+    return false;
+
   if (STMT_VINFO_GATHER_P (stmt_info))
     {
       gimple def_stmt;
@@ -1848,14 +1906,9 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
 				 : DR_STEP (dr), size_zero_node) <= 0)
     return false;
   else if (!VECTOR_MODE_P (TYPE_MODE (vectype))
-	   || !can_vec_mask_load_store_p (TYPE_MODE (vectype), !is_store))
-    return false;
-
-  if (TREE_CODE (mask) != SSA_NAME)
-    return false;
-
-  if (!vect_is_simple_use (mask, stmt, loop_vinfo, NULL,
-			   &def_stmt, &def, &dt))
+	   || !can_vec_mask_load_store_p (TYPE_MODE (vectype),
+					  TYPE_MODE (mask_vectype),
+					  !is_store))
     return false;
 
   if (is_store)
@@ -7373,6 +7426,201 @@ vectorizable_condition (gimple stmt, gimple_stmt_iterator *gsi,
   return true;
 }
 
+/* vectorizable_comparison.
+
+   Check if STMT is comparison expression that can be vectorized.
+   If VEC_STMT is also passed, vectorize the STMT: create a vectorized
+   comparison, put it in VEC_STMT, and insert it at GSI.
+
+   Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
+
+bool
+vectorizable_comparison (gimple stmt, gimple_stmt_iterator *gsi,
+			 gimple *vec_stmt, tree reduc_def,
+			 slp_tree slp_node)
+{
+  tree lhs, rhs1, rhs2;
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  tree vectype1, vectype2;
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
+  tree vec_compare;
+  tree new_temp;
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  tree def;
+  enum vect_def_type dt, dts[4];
+  unsigned nunits;
+  int ncopies;
+  enum tree_code code;
+  stmt_vec_info prev_stmt_info = NULL;
+  int i, j;
+  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
+  vec<tree> vec_oprnds0 = vNULL;
+  vec<tree> vec_oprnds1 = vNULL;
+  tree mask_type;
+  tree mask;
+
+  if (TREE_TYPE (vectype) != boolean_type_node)
+    return false;
+
+  mask_type = vectype;
+  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+  if (slp_node || PURE_SLP_STMT (stmt_info))
+    ncopies = 1;
+  else
+    ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+
+  gcc_assert (ncopies >= 1);
+  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
+      && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
+	   && reduc_def))
+    return false;
+
+  if (STMT_VINFO_LIVE_P (stmt_info))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "value used after loop.\n");
+      return false;
+    }
+
+  if (!is_gimple_assign (stmt))
+    return false;
+
+  code = gimple_assign_rhs_code (stmt);
+
+  if (TREE_CODE_CLASS (code) != tcc_comparison)
+    return false;
+
+  rhs1 = gimple_assign_rhs1 (stmt);
+  rhs2 = gimple_assign_rhs2 (stmt);
+
+  if (TREE_CODE (rhs1) == SSA_NAME)
+    {
+      gimple rhs1_def_stmt = SSA_NAME_DEF_STMT (rhs1);
+      if (!vect_is_simple_use_1 (rhs1, stmt, loop_vinfo, bb_vinfo,
+				 &rhs1_def_stmt, &def, &dt, &vectype1))
+	return false;
+    }
+  else if (TREE_CODE (rhs1) != INTEGER_CST && TREE_CODE (rhs1) != REAL_CST
+	   && TREE_CODE (rhs1) != FIXED_CST)
+    return false;
+
+  if (TREE_CODE (rhs2) == SSA_NAME)
+    {
+      gimple rhs2_def_stmt = SSA_NAME_DEF_STMT (rhs2);
+      if (!vect_is_simple_use_1 (rhs2, stmt, loop_vinfo, bb_vinfo,
+				 &rhs2_def_stmt, &def, &dt, &vectype2))
+	return false;
+    }
+  else if (TREE_CODE (rhs2) != INTEGER_CST && TREE_CODE (rhs2) != REAL_CST
+	   && TREE_CODE (rhs2) != FIXED_CST)
+    return false;
+
+  vectype = vectype1 ? vectype1 : vectype2;
+
+  if (!vectype
+      || nunits != TYPE_VECTOR_SUBPARTS (vectype))
+    return false;
+
+  if (!vec_stmt)
+    {
+      STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
+      return expand_vec_cmp_expr_p (vectype, mask_type);
+    }
+
+  /* Transform.  */
+  if (!slp_node)
+    {
+      vec_oprnds0.create (1);
+      vec_oprnds1.create (1);
+    }
+
+  /* Handle def.  */
+  lhs = gimple_assign_lhs (stmt);
+  mask = vect_create_destination_var (lhs, mask_type);
+
+  /* Handle cmp expr.  */
+  for (j = 0; j < ncopies; j++)
+    {
+      gassign *new_stmt = NULL;
+      if (j == 0)
+	{
+	  if (slp_node)
+	    {
+	      auto_vec<tree, 2> ops;
+	      auto_vec<vec<tree>, 2> vec_defs;
+
+	      ops.safe_push (rhs1);
+	      ops.safe_push (rhs2);
+	      vect_get_slp_defs (ops, slp_node, &vec_defs, -1);
+	      vec_oprnds1 = vec_defs.pop ();
+	      vec_oprnds0 = vec_defs.pop ();
+
+	      ops.release ();
+	      vec_defs.release ();
+	    }
+	  else
+	    {
+	      gimple gtemp;
+	      vec_rhs1
+		= vect_get_vec_def_for_operand (rhs1, stmt, NULL);
+	      vect_is_simple_use (rhs1, stmt, loop_vinfo, NULL,
+				  &gtemp, &def, &dts[0]);
+	      vec_rhs2 =
+		vect_get_vec_def_for_operand (rhs2, stmt, NULL);
+	      vect_is_simple_use (rhs2, stmt, loop_vinfo, NULL,
+				  &gtemp, &def, &dts[1]);
+	    }
+	}
+      else
+	{
+	  vec_rhs1 = vect_get_vec_def_for_stmt_copy (dts[0],
+						     vec_oprnds0.pop ());
+	  vec_rhs2 = vect_get_vec_def_for_stmt_copy (dts[1],
+						     vec_oprnds1.pop ());
+	}
+
+      if (!slp_node)
+	{
+	  vec_oprnds0.quick_push (vec_rhs1);
+	  vec_oprnds1.quick_push (vec_rhs2);
+	}
+
+      /* Arguments are ready.  Create the new vector stmt.  */
+      FOR_EACH_VEC_ELT (vec_oprnds0, i, vec_rhs1)
+	{
+	  vec_rhs2 = vec_oprnds1[i];
+
+	  vec_compare = build2 (code, mask_type, vec_rhs1, vec_rhs2);
+	  new_stmt = gimple_build_assign (mask, vec_compare);
+	  new_temp = make_ssa_name (mask, new_stmt);
+	  gimple_assign_set_lhs (new_stmt, new_temp);
+	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	  if (slp_node)
+	    SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
+	}
+
+      if (slp_node)
+	continue;
+
+      if (j == 0)
+	STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+      else
+	STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+
+      prev_stmt_info = vinfo_for_stmt (new_stmt);
+    }
+
+  vec_oprnds0.release ();
+  vec_oprnds1.release ();
+
+  return true;
+}
 
 /* Make sure the statement is vectorizable.  */
 
@@ -7576,7 +7824,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize, slp_tree node)
 	  || vectorizable_call (stmt, NULL, NULL, node)
 	  || vectorizable_store (stmt, NULL, NULL, node)
 	  || vectorizable_reduction (stmt, NULL, NULL, node)
-	  || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
+	  || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node)
+	  || vectorizable_comparison (stmt, NULL, NULL, NULL, node));
   else
     {
       if (bb_vinfo)
@@ -7588,7 +7837,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize, slp_tree node)
 	      || vectorizable_load (stmt, NULL, NULL, node, NULL)
 	      || vectorizable_call (stmt, NULL, NULL, node)
 	      || vectorizable_store (stmt, NULL, NULL, node)
-	      || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
+	      || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node)
+	      || vectorizable_comparison (stmt, NULL, NULL, NULL, node));
     }
 
   if (!ok)
@@ -7704,6 +7954,11 @@ vect_transform_stmt (gimple stmt, gimple_stmt_iterator *gsi,
       gcc_assert (done);
       break;
 
+    case comparison_vec_info_type:
+      done = vectorizable_comparison (stmt, gsi, &vec_stmt, NULL, slp_node);
+      gcc_assert (done);
+      break;
+
     case call_vec_info_type:
       done = vectorizable_call (stmt, gsi, &vec_stmt, slp_node);
       stmt = gsi_stmt (*gsi);
@@ -8038,6 +8293,23 @@ get_vectype_for_scalar_type (tree scalar_type)
   return vectype;
 }
 
+/* Function get_mask_type_for_scalar_type.
+
+   Returns the mask type corresponding to a result of comparison
+   of vectors of specified SCALAR_TYPE as supported by target.  */
+
+tree
+get_mask_type_for_scalar_type (tree scalar_type)
+{
+  tree vectype = get_vectype_for_scalar_type (scalar_type);
+
+  if (!vectype)
+    return NULL;
+
+  return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (vectype),
+				  current_vector_size);
+}
+
 /* Function get_same_sized_vectype
 
    Returns a vector type corresponding to SCALAR_TYPE of size
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 58e8f10..94aea1a 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -28,7 +28,8 @@ along with GCC; see the file COPYING3.  If not see
 enum vect_var_kind {
   vect_simple_var,
   vect_pointer_var,
-  vect_scalar_var
+  vect_scalar_var,
+  vect_mask_var
 };
 
 /* Defines type of operation.  */
@@ -482,6 +483,7 @@ enum stmt_vec_info_type {
   call_simd_clone_vec_info_type,
   assignment_vec_info_type,
   condition_vec_info_type,
+  comparison_vec_info_type,
   reduc_vec_info_type,
   induc_vec_info_type,
   type_promotion_vec_info_type,
@@ -995,6 +997,7 @@ extern bool vect_can_advance_ivs_p (loop_vec_info);
 /* In tree-vect-stmts.c.  */
 extern unsigned int current_vector_size;
 extern tree get_vectype_for_scalar_type (tree);
+extern tree get_mask_type_for_scalar_type (tree);
 extern tree get_same_sized_vectype (tree, tree);
 extern bool vect_is_simple_use (tree, gimple, loop_vec_info,
 			        bb_vec_info, gimple *,
diff --git a/gcc/tree.c b/gcc/tree.c
index af3a6a3..30398e5 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -10568,6 +10568,20 @@ build_vector_type (tree innertype, int nunits)
   return make_vector_type (innertype, nunits, VOIDmode);
 }
 
+/* Build truth vector with specified length and number of units.  */
+
+tree
+build_truth_vector_type (unsigned nunits, unsigned vector_size)
+{
+  machine_mode mask_mode = targetm.vectorize.get_mask_mode (nunits,
+							    vector_size);
+
+  if (mask_mode == VOIDmode)
+    return NULL;
+
+  return make_vector_type (boolean_type_node, nunits, mask_mode);
+}
+
 /* Similarly, but builds a variant type with TYPE_VECTOR_OPAQUE set.  */
 
 tree
@@ -11054,9 +11068,10 @@ truth_type_for (tree type)
 {
   if (TREE_CODE (type) == VECTOR_TYPE)
     {
-      tree elem = lang_hooks.types.type_for_size
-        (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (type))), 0);
-      return build_opaque_vector_type (elem, TYPE_VECTOR_SUBPARTS (type));
+      if (TREE_TYPE (type) == boolean_type_node)
+	return type;
+      return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (type),
+				      GET_MODE_SIZE (TYPE_MODE (type)));
     }
   else
     return boolean_type_node;
diff --git a/gcc/tree.h b/gcc/tree.h
index 2cd6ec4..1657e06 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -3820,6 +3820,7 @@ extern tree build_reference_type_for_mode (tree, machine_mode, bool);
 extern tree build_reference_type (tree);
 extern tree build_vector_type_for_mode (tree, machine_mode);
 extern tree build_vector_type (tree innertype, int nunits);
+extern tree build_truth_vector_type (unsigned, unsigned);
 extern tree build_opaque_vector_type (tree innertype, int nunits);
 extern tree build_index_type (tree);
 extern tree build_array_type (tree, tree);

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-08 12:43                         ` Ilya Enkovich
@ 2015-09-15 13:55                           ` Ilya Enkovich
  2015-09-17 17:54                             ` Richard Henderson
  2015-09-18 12:45                             ` Richard Biener
  0 siblings, 2 replies; 48+ messages in thread
From: Ilya Enkovich @ 2015-09-15 13:55 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 8197 bytes --]

On 08 Sep 15:37, Ilya Enkovich wrote:
> 2015-09-04 23:42 GMT+03:00 Jeff Law <law@redhat.com>:
> >
> > So do we have enough confidence in this representation that we want to go
> > ahead and commit to it?
> 
> I think new representation fits nice mostly. There are some places
> where I have to make some exceptions for vector of bools to make it
> work. This is mostly to avoid target modifications. I'd like to avoid
> necessity to change all targets currently supporting vec_cond. It
> makes me add some special handling of vec<bool> in GIMPLE, e.g. I add
> a special code in vect_init_vector to build vec<bool> invariants with
> proper casting to int. Otherwise I'd need to do it on a target side.
> 
> I made several fixes and current patch (still allowing integer vector
> result for vector comparison and applying bool patterns) passes
> bootstrap and regression testing on x86_64. Now I'll try to fully
> switch to vec<bool> and see how it goes.
> 
> Thanks,
> Ilya
> 

Hi,

I made a step forward forcing vector comparisons have a mask (vec<bool>) result and disabling bool patterns in case vector comparison is supported by target.  Several issues were met.

 - c/c++ front-ends generate vector comparison with integer vector result.  I had to make some modifications to use vec_cond instead.  Don't know if there are other front-ends producing vector comparisons.
 - vector lowering fails to expand vector masks due to mismatch of type and mode sizes.  I fixed vector type size computation to match mode size and added a special handling of mask expand.
 - I disabled canonical type creation for vector mask because we can't layout it with VOID mode. I don't know why we may need a canonical type here.  But get_mask_mode call may be moved into type layout to get it.
 - Expand of vec<bool> constants/contstructors requires special handling.  Common case should require target hooks/optabs to expand vector into required mode.  But I suppose we want to have a generic code to handle vector of int mode case to avoid modification of multiple targets which use default vec<bool> modes.

Currently 'make check' shows two types of regression.
  - missed vector expression pattern recongnition (MIN, MAX, ABX, VEC_COND).  This must be due to my front-end changes.  Hope it will be easy to fix.
  - missed vectorization. All of them appear due to bool patterns disabling.  I didn't look into all of them but it seems the main problem is in mixed type sizes.  With bool patterns and integer vector masks we just put int->(other sized int) conversion for masks and it gives us required mask transformation.  With boolean mask we don't have a proper scalar statements to do that.  I think mask widening/narrowing may be directly supported in masked statements vectorization.  Going to look into it.

I attach what I currently have for a prototype.  It grows bigger so I split into several parts.

Thanks,
Ilya
--
* avx512-vec-bool-01-add-truth-vector.ChangeLog

2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* doc/tm.texi: Regenerated.
	* doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New.
	* stor-layout.c (layout_type): Use mode to get vector mask size.
	(vector_type_mode): Likewise.
	* target.def (get_mask_mode): New.
	* targhooks.c (default_vector_alignment): Use mode alignment
	for vector masks.
	(default_get_mask_mode): New.
	* targhooks.h (default_get_mask_mode): New.
	* tree.c (make_vector_type): Vector mask has no canonical type.
	(build_truth_vector_type): New.
	(build_same_sized_truth_vector_type): New.
	(truth_type_for): Support vector masks.
	* tree.h (VECTOR_MASK_TYPE_P): New.
	(build_truth_vector_type): New.
	(build_same_sized_truth_vector_type): New.

* avx512-vec-bool-02-no-int-vec-cmp.ChangeLog

gcc/

2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* tree-cfg.c (verify_gimple_comparison) Require vector mask
	type for vector comparison.
	(verify_gimple_assign_ternary): Likewise.

gcc/c

2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* c-typeck.c (build_conditional_expr): Use vector mask
	type for vector comparison.
	(build_vec_cmp): New.
	(build_binary_op): Use build_vec_cmp for comparison.

gcc/cp

2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* call.c (build_conditional_expr_1): Use vector mask
	type for vector comparison.
	* typeck.c (build_vec_cmp): New.
	(cp_build_binary_op): Use build_vec_cmp for comparison.

* avx512-vec-bool-03-vec-lower.ChangeLog

2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* tree-vect-generic.c (tree_vec_extract): Use additional
	comparison when extracting boolean value.
	(do_bool_compare): New.
	(expand_vector_comparison): Add casts for vector mask.
	(expand_vector_divmod): Use vector mask type for vector
	comparison.
	(expand_vector_operations_1) Skip scalar mode mask statements.

* avx512-vec-bool-04-vectorize.ChangeLog

gcc/

2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results.
	(const_vector_mask_from_tree): New.
	(const_vector_from_tree): Use const_vector_mask_from_tree for vector
	masks.
	* internal-fn.c (expand_MASK_LOAD): Adjust to optab changes.
	(expand_MASK_STORE): Likewise.
	* optabs.c (vector_compare_rtx): Add OPNO arg.
	(expand_vec_cond_expr): Adjust to vector_compare_rtx change.
	(get_vec_cmp_icode): New.
	(expand_vec_cmp_expr_p): New.
	(expand_vec_cmp_expr): New.
	(can_vec_mask_load_store_p): Add MASK_MODE arg.
	* optabs.def (vec_cmp_optab): New.
	(vec_cmpu_optab): New.
	(maskload_optab): Transform into convert optab.
	(maskstore_optab): Likewise.
	* optabs.h (expand_vec_cmp_expr_p): New.
	(expand_vec_cmp_expr): New.
	(can_vec_mask_load_store_p): Add MASK_MODE arg.
	* tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to
	can_vec_mask_load_store_p signature change.
	(predicate_mem_writes): Use boolean mask.
	* tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var.
	(vect_create_destination_var): Likewise.
	* tree-vect-loop.c (vect_determine_vectorization_factor): Ignore mask
	operations for VF.  Add mask type computation.
	* tree-vect-stmts.c (vect_init_vector): Support mask invariants.
	(vect_get_vec_def_for_operand): Support mask constant.
	(vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p
	signature change.
	(vectorizable_condition): Use vector mask type for vector comparison.
	(vectorizable_comparison): New.
	(vect_analyze_stmt): Add vectorizable_comparison.
	(vect_transform_stmt): Likewise.
	(get_mask_type_for_scalar_type): New.
	* tree-vectorizer.h (enum stmt_vec_info_type): Add vect_mask_var
	(enum stmt_vec_info_type): Add comparison_vec_info_type.
	(get_mask_type_for_scalar_type): New.

* avx512-vec-bool-05-bool-patterns.ChangeLog

2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* tree-vect-patterns.c (check_bool_pattern): Check fails
	if we can vectorize comparison directly.
	(search_type_for_mask): New.
	(vect_recog_bool_pattern): Support cases when bool pattern
	check fails.

* avx512-vec-bool-06-i386.ChangeLog

2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New.
	(ix86_expand_int_vec_cmp): New.
	(ix86_expand_fp_vec_cmp): New.
	* config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for
	op_true and op_false.
	(ix86_int_cmp_code_to_pcmp_immediate): New.
	(ix86_fp_cmp_code_to_pcmp_immediate): New.
	(ix86_cmp_code_to_pcmp_immediate): New.
	(ix86_expand_mask_vec_cmp): New.
	(ix86_expand_fp_vec_cmp): New.
	(ix86_expand_int_sse_cmp): New.
	(ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp.
	(ix86_expand_int_vec_cmp): New.
	(ix86_get_mask_mode): New.
	(TARGET_VECTORIZE_GET_MASK_MODE): New.
	* config/i386/sse.md (avx512fmaskmodelower): New.
	(vec_cmp<mode><avx512fmaskmodelower>): New.
	(vec_cmp<mode><sseintvecmodelower>): New.
	(vec_cmpv2div2di): New.
	(vec_cmpu<mode><avx512fmaskmodelower>): New.
	(vec_cmpu<mode><sseintvecmodelower>): New.
	(vec_cmpuv2div2di): New.
	(maskload<mode>): Rename to ...
	(maskload<mode><sseintvecmodelower>): ... this.
	(maskstore<mode>): Rename to ...
	(maskstore<mode><sseintvecmodelower>): ... this.
	(maskload<mode><avx512fmaskmodelower>): New.
	(maskstore<mode><avx512fmaskmodelower>): New.

[-- Attachment #2: avx512-vec-bool-01-add-truth-vector.patch --]
[-- Type: text/plain, Size: 8491 bytes --]

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f5a1f84..acdfcd5 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5688,6 +5688,11 @@ mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.
 The default is zero which means to not iterate over other vector sizes.
 @end deftypefn
 
+@deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_GET_MASK_MODE (unsigned @var{nunits}, unsigned @var{length})
+This hook returns mode to be used for a mask to be used for a vector
+of specified @var{length} with @var{nunits} elements.
+@end deftypefn
+
 @deftypefn {Target Hook} {void *} TARGET_VECTORIZE_INIT_COST (struct loop *@var{loop_info})
 This hook should initialize target-specific data structures in preparation for modeling the costs of vectorizing a loop or basic block.  The default allocates three unsigned integers for accumulating costs for the prologue, body, and epilogue of the loop or basic block.  If @var{loop_info} is non-NULL, it identifies the loop being vectorized; otherwise a single block is being vectorized.
 @end deftypefn
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 9d5ac0a..52e912a 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4225,6 +4225,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
 
+@hook TARGET_VECTORIZE_GET_MASK_MODE
+
 @hook TARGET_VECTORIZE_INIT_COST
 
 @hook TARGET_VECTORIZE_ADD_STMT_COST
diff --git a/gcc/stor-layout.c b/gcc/stor-layout.c
index 938e54b..f24a0c4 100644
--- a/gcc/stor-layout.c
+++ b/gcc/stor-layout.c
@@ -2184,11 +2184,22 @@ layout_type (tree type)
 
 	TYPE_SATURATING (type) = TYPE_SATURATING (TREE_TYPE (type));
         TYPE_UNSIGNED (type) = TYPE_UNSIGNED (TREE_TYPE (type));
-	TYPE_SIZE_UNIT (type) = int_const_binop (MULT_EXPR,
-					         TYPE_SIZE_UNIT (innertype),
-					         size_int (nunits));
-	TYPE_SIZE (type) = int_const_binop (MULT_EXPR, TYPE_SIZE (innertype),
-					    bitsize_int (nunits));
+	if (VECTOR_MASK_TYPE_P (type))
+	  {
+	    TYPE_SIZE_UNIT (type)
+	      = size_int (GET_MODE_SIZE (type->type_common.mode));
+	    TYPE_SIZE (type)
+	      = bitsize_int (GET_MODE_BITSIZE (type->type_common.mode));
+	  }
+	else
+	  {
+	    TYPE_SIZE_UNIT (type) = int_const_binop (MULT_EXPR,
+						     TYPE_SIZE_UNIT (innertype),
+						     size_int (nunits));
+	    TYPE_SIZE (type) = int_const_binop (MULT_EXPR,
+						TYPE_SIZE (innertype),
+						bitsize_int (nunits));
+	  }
 
 	/* For vector types, we do not default to the mode's alignment.
 	   Instead, query a target hook, defaulting to natural alignment.
@@ -2455,7 +2466,14 @@ vector_type_mode (const_tree t)
       machine_mode innermode = TREE_TYPE (t)->type_common.mode;
 
       /* For integers, try mapping it to a same-sized scalar mode.  */
-      if (GET_MODE_CLASS (innermode) == MODE_INT)
+      if (VECTOR_MASK_TYPE_P (t))
+	{
+	  mode = mode_for_size (GET_MODE_BITSIZE (mode), MODE_INT, 0);
+
+	  if (mode != VOIDmode && have_regs_of_mode[mode])
+	    return mode;
+	}
+      else if (GET_MODE_CLASS (innermode) == MODE_INT)
 	{
 	  mode = mode_for_size (TYPE_VECTOR_SUBPARTS (t)
 				* GET_MODE_BITSIZE (innermode), MODE_INT, 0);
diff --git a/gcc/target.def b/gcc/target.def
index 4edc209..c5b8ed9 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1789,6 +1789,15 @@ The default is zero which means to not iterate over other vector sizes.",
  (void),
  default_autovectorize_vector_sizes)
 
+/* Function to get a target mode for a vector mask.  */
+DEFHOOK
+(get_mask_mode,
+ "This hook returns mode to be used for a mask to be used for a vector\n\
+of specified @var{length} with @var{nunits} elements.",
+ machine_mode,
+ (unsigned nunits, unsigned length),
+ default_get_mask_mode)
+
 /* Target builtin that implements vector gather operation.  */
 DEFHOOK
 (builtin_gather,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 7238c8f..ac01d57 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1087,6 +1087,20 @@ default_autovectorize_vector_sizes (void)
   return 0;
 }
 
+/* By defaults a vector of integers is used as a mask.  */
+
+machine_mode
+default_get_mask_mode (unsigned nunits, unsigned vector_size)
+{
+  unsigned elem_size = vector_size / nunits;
+  machine_mode elem_mode
+    = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
+
+  gcc_assert (elem_size * nunits == vector_size);
+
+  return mode_for_vector (elem_mode, nunits);
+}
+
 /* By default, the cost model accumulates three separate costs (prologue,
    loop body, and epilogue) for a vectorized loop or block.  So allocate an
    array of three unsigned ints, set it to zero, and return its address.  */
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 5ae991d..cc7263f 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -100,6 +100,7 @@ default_builtin_support_vector_misalignment (machine_mode mode,
 					     int, bool);
 extern machine_mode default_preferred_simd_mode (machine_mode mode);
 extern unsigned int default_autovectorize_vector_sizes (void);
+extern machine_mode default_get_mask_mode (unsigned, unsigned);
 extern void *default_init_cost (struct loop *);
 extern unsigned default_add_stmt_cost (void *, int, enum vect_cost_for_stmt,
 				       struct _stmt_vec_info *, int,
diff --git a/gcc/tree.c b/gcc/tree.c
index af3a6a3..946d2ad 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -9742,8 +9742,9 @@ make_vector_type (tree innertype, int nunits, machine_mode mode)
 
   if (TYPE_STRUCTURAL_EQUALITY_P (innertype))
     SET_TYPE_STRUCTURAL_EQUALITY (t);
-  else if (TYPE_CANONICAL (innertype) != innertype
-	   || mode != VOIDmode)
+  else if ((TYPE_CANONICAL (innertype) != innertype
+	    || mode != VOIDmode)
+	   && !VECTOR_MASK_TYPE_P (t))
     TYPE_CANONICAL (t)
       = make_vector_type (TYPE_CANONICAL (innertype), nunits, VOIDmode);
 
@@ -10568,6 +10569,36 @@ build_vector_type (tree innertype, int nunits)
   return make_vector_type (innertype, nunits, VOIDmode);
 }
 
+/* Build truth vector with specified length and number of units.  */
+
+tree
+build_truth_vector_type (unsigned nunits, unsigned vector_size)
+{
+  machine_mode mask_mode = targetm.vectorize.get_mask_mode (nunits,
+							    vector_size);
+
+  if (mask_mode == VOIDmode)
+    return NULL;
+
+  return make_vector_type (boolean_type_node, nunits, mask_mode);
+}
+
+/* Returns a vector type corresponding to a comparison of VECTYPE.  */
+
+tree
+build_same_sized_truth_vector_type (tree vectype)
+{
+  if (VECTOR_MASK_TYPE_P (vectype))
+    return vectype;
+
+  unsigned HOST_WIDE_INT size = GET_MODE_SIZE (TYPE_MODE (vectype));
+
+  if (!size)
+    size = tree_to_uhwi (TYPE_SIZE_UNIT (vectype));
+
+  return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (vectype), size);
+}
+
 /* Similarly, but builds a variant type with TYPE_VECTOR_OPAQUE set.  */
 
 tree
@@ -11054,9 +11085,10 @@ truth_type_for (tree type)
 {
   if (TREE_CODE (type) == VECTOR_TYPE)
     {
-      tree elem = lang_hooks.types.type_for_size
-        (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (type))), 0);
-      return build_opaque_vector_type (elem, TYPE_VECTOR_SUBPARTS (type));
+      if (VECTOR_MASK_TYPE_P (type))
+	return type;
+      return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (type),
+				      GET_MODE_SIZE (TYPE_MODE (type)));
     }
   else
     return boolean_type_node;
diff --git a/gcc/tree.h b/gcc/tree.h
index 2cd6ec4..09fb26d 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -469,6 +469,12 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 
 #define VECTOR_TYPE_P(TYPE) (TREE_CODE (TYPE) == VECTOR_TYPE)
 
+/* Nonzero if TYPE represents a vector of booleans.  */
+
+#define VECTOR_MASK_TYPE_P(TYPE)				\
+  (TREE_CODE (TYPE) == VECTOR_TYPE			\
+   && TREE_CODE (TREE_TYPE (TYPE)) == BOOLEAN_TYPE)
+
 /* Nonzero if TYPE represents an integral type.  Note that we do not
    include COMPLEX types here.  Keep these checks in ascending code
    order.  */
@@ -3820,6 +3826,8 @@ extern tree build_reference_type_for_mode (tree, machine_mode, bool);
 extern tree build_reference_type (tree);
 extern tree build_vector_type_for_mode (tree, machine_mode);
 extern tree build_vector_type (tree innertype, int nunits);
+extern tree build_truth_vector_type (unsigned, unsigned);
+extern tree build_same_sized_truth_vector_type (tree vectype);
 extern tree build_opaque_vector_type (tree innertype, int nunits);
 extern tree build_index_type (tree);
 extern tree build_array_type (tree, tree);

[-- Attachment #3: avx512-vec-bool-02-no-int-vec-cmp.patch --]
[-- Type: text/plain, Size: 6313 bytes --]

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index e8c8189..6ea4f19 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -4753,6 +4753,18 @@ build_conditional_expr (location_t colon_loc, tree ifexp, bool ifexp_bcp,
 		       && TREE_CODE (orig_op2) == INTEGER_CST
 		       && !TREE_OVERFLOW (orig_op2)));
     }
+
+  /* Need to convert condition operand into a vector mask.  */
+  if (VECTOR_TYPE_P (TREE_TYPE (ifexp)))
+    {
+      tree vectype = TREE_TYPE (ifexp);
+      tree elem_type = TREE_TYPE (vectype);
+      tree zero = build_int_cst (elem_type, 0);
+      tree zero_vec = build_vector_from_val (vectype, zero);
+      tree cmp_type = build_same_sized_truth_vector_type (vectype);
+      ifexp = build2 (NE_EXPR, cmp_type, ifexp, zero_vec);
+    }
+
   if (int_const || (ifexp_bcp && TREE_CODE (ifexp) == INTEGER_CST))
     ret = fold_build3_loc (colon_loc, COND_EXPR, result_type, ifexp, op1, op2);
   else
@@ -10195,6 +10207,19 @@ push_cleanup (tree decl, tree cleanup, bool eh_only)
   STATEMENT_LIST_STMT_EXPR (list) = stmt_expr;
 }
 \f
+/* Build a vector comparison using VEC_COND_EXPR.  */
+
+static tree
+build_vec_cmp (tree_code code, tree type,
+	       tree arg0, tree arg1)
+{
+  tree zero_vec = build_zero_cst (type);
+  tree minus_one_vec = build_minus_one_cst (type);
+  tree cmp_type = build_same_sized_truth_vector_type (type);
+  tree cmp = build2 (code, cmp_type, arg0, arg1);
+  return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec);
+}
+
 /* Build a binary-operation expression without default conversions.
    CODE is the kind of expression to build.
    LOCATION is the operator's location.
@@ -10753,7 +10778,8 @@ build_binary_op (location_t location, enum tree_code code,
           result_type = build_opaque_vector_type (intt,
 						  TYPE_VECTOR_SUBPARTS (type0));
           converted = 1;
-          break;
+	  ret = build_vec_cmp (resultcode, result_type, op0, op1);
+          goto return_build_binary_op;
         }
       if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1))
 	warning_at (location,
@@ -10895,7 +10921,8 @@ build_binary_op (location_t location, enum tree_code code,
           result_type = build_opaque_vector_type (intt,
 						  TYPE_VECTOR_SUBPARTS (type0));
           converted = 1;
-          break;
+	  ret = build_vec_cmp (resultcode, result_type, op0, op1);
+          goto return_build_binary_op;
         }
       build_type = integer_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 8d4a9e2..7f16e84 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -4727,8 +4727,10 @@ build_conditional_expr_1 (location_t loc, tree arg1, tree arg2, tree arg3,
 	}
 
       if (!COMPARISON_CLASS_P (arg1))
-	arg1 = cp_build_binary_op (loc, NE_EXPR, arg1,
-				   build_zero_cst (arg1_type), complain);
+	{
+	  tree cmp_type = build_same_sized_truth_vector_type (arg1_type);
+	  arg1 = build2 (NE_EXPR, cmp_type, arg1, build_zero_cst (arg1_type));
+	}
       return fold_build3 (VEC_COND_EXPR, arg2_type, arg1, arg2, arg3);
     }
 
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 83fd34c..89bacc2 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -3898,6 +3898,18 @@ build_binary_op (location_t location, enum tree_code code, tree op0, tree op1,
   return cp_build_binary_op (location, code, op0, op1, tf_warning_or_error);
 }
 
+/* Build a vector comparison using VEC_COND_EXPR.  */
+
+static tree
+build_vec_cmp (tree_code code, tree type,
+	       tree arg0, tree arg1)
+{
+  tree zero_vec = build_zero_cst (type);
+  tree minus_one_vec = build_minus_one_cst (type);
+  tree cmp_type = build_same_sized_truth_vector_type(type);
+  tree cmp = build2 (code, cmp_type, arg0, arg1);
+  return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec);
+}
 
 /* Build a binary-operation expression without default conversions.
    CODE is the kind of expression to build.
@@ -4757,7 +4769,7 @@ cp_build_binary_op (location_t location,
 	  result_type = build_opaque_vector_type (intt,
 						  TYPE_VECTOR_SUBPARTS (type0));
 	  converted = 1;
-	  break;
+	  return build_vec_cmp (resultcode, result_type, op0, op1);
 	}
       build_type = boolean_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 5ac73b3..2ce5a84 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3464,10 +3464,10 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
           return true;
         }
     }
-  /* Or an integer vector type with the same size and element count
+  /* Or a boolean vector type with the same element count
      as the comparison operand types.  */
   else if (TREE_CODE (type) == VECTOR_TYPE
-	   && TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
+	   && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
     {
       if (TREE_CODE (op0_type) != VECTOR_TYPE
 	  || TREE_CODE (op1_type) != VECTOR_TYPE)
@@ -3478,12 +3478,7 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
           return true;
         }
 
-      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
-	  || (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (type)))
-	      != GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op0_type))))
-	  /* The result of a vector comparison is of signed
-	     integral type.  */
-	  || TYPE_UNSIGNED (TREE_TYPE (type)))
+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type))
         {
           error ("invalid vector comparison resulting type");
           debug_generic_expr (type);
@@ -3970,15 +3965,13 @@ verify_gimple_assign_ternary (gassign *stmt)
       break;
 
     case VEC_COND_EXPR:
-      if (!VECTOR_INTEGER_TYPE_P (rhs1_type)
-	  || TYPE_SIGN (rhs1_type) != SIGNED
-	  || TYPE_SIZE (rhs1_type) != TYPE_SIZE (lhs_type)
+      if (!VECTOR_MASK_TYPE_P (rhs1_type)
 	  || TYPE_VECTOR_SUBPARTS (rhs1_type)
 	     != TYPE_VECTOR_SUBPARTS (lhs_type))
 	{
-	  error ("the first argument of a VEC_COND_EXPR must be of a signed "
-		 "integral vector type of the same size and number of "
-		 "elements as the result");
+	  error ("the first argument of a VEC_COND_EXPR must be of a "
+		 "boolean vector type of the same number of elements "
+		 "as the result");
 	  debug_generic_expr (lhs_type);
 	  debug_generic_expr (rhs1_type);
 	  return true;

[-- Attachment #4: avx512-vec-bool-03-vec-lower.patch --]
[-- Type: text/plain, Size: 3901 bytes --]

diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index be3d27f..a89b08c 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -122,7 +122,19 @@ tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
 		  tree t, tree bitsize, tree bitpos)
 {
   if (bitpos)
-    return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
+    {
+      if (TREE_CODE (type) == BOOLEAN_TYPE)
+	{
+	  tree itype
+	    = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 0);
+	  tree field = gimplify_build3 (gsi, BIT_FIELD_REF, itype, t,
+					bitsize, bitpos);
+	  return gimplify_build2 (gsi, NE_EXPR, type, field,
+				  build_zero_cst (itype));
+	}
+      else
+	return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
+    }
   else
     return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
 }
@@ -171,6 +183,21 @@ do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
 			  build_int_cst (comp_type, 0));
 }
 
+/* Construct expression (A[BITPOS] code B[BITPOS])
+
+   INNER_TYPE is the type of A and B elements
+
+   returned expression is of boolean type.  */
+static tree
+do_bool_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
+		 tree bitpos, tree bitsize, enum tree_code code)
+{
+  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+
+  return gimplify_build2 (gsi, code, boolean_type_node, a, b);
+}
+
 /* Expand vector addition to scalars.  This does bit twiddling
    in order to increase parallelism:
 
@@ -350,9 +377,31 @@ expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
                           tree op1, enum tree_code code)
 {
   tree t;
-  if (! expand_vec_cond_expr_p (type, TREE_TYPE (op0)))
-    t = expand_vector_piecewise (gsi, do_compare, type,
-				 TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
+  if (!expand_vec_cmp_expr_p (TREE_TYPE (op0), type)
+      && !expand_vec_cond_expr_p (type, TREE_TYPE (op0)))
+    {
+      if (VECTOR_MODE_P (TYPE_MODE (type)))
+	{
+	  tree inner_type = TREE_TYPE (TREE_TYPE (op0));
+	  tree elem_type = build_nonstandard_integer_type
+	    (GET_MODE_BITSIZE (TYPE_MODE (inner_type)), 0);
+	  tree int_vec_type = build_vector_type (elem_type,
+						 TYPE_VECTOR_SUBPARTS (type));
+	  tree vec = expand_vector_piecewise (gsi, do_compare, int_vec_type,
+					      TREE_TYPE (TREE_TYPE (op0)),
+					      op0, op1, code);
+	  gimple stmt;
+
+	  return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, vec);
+	  t = make_ssa_name (type);
+	  stmt = gimple_build_assign (t, build1 (VIEW_CONVERT_EXPR, type, vec));
+	  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+	}
+      else
+	t = expand_vector_piecewise (gsi, do_bool_compare, type,
+				     TREE_TYPE (TREE_TYPE (op0)),
+				     op0, op1, code);
+    }
   else
     t = NULL_TREE;
 
@@ -625,11 +674,12 @@ expand_vector_divmod (gimple_stmt_iterator *gsi, tree type, tree op0,
 	  if (addend == NULL_TREE
 	      && expand_vec_cond_expr_p (type, type))
 	    {
-	      tree zero, cst, cond;
+	      tree zero, cst, cond, mask_type;
 	      gimple stmt;
 
+	      mask_type = build_same_sized_truth_vector_type (type);
 	      zero = build_zero_cst (type);
-	      cond = build2 (LT_EXPR, type, op0, zero);
+	      cond = build2 (LT_EXPR, mask_type, op0, zero);
 	      for (i = 0; i < nunits; i++)
 		vec[i] = build_int_cst (TREE_TYPE (type),
 					((unsigned HOST_WIDE_INT) 1
@@ -1506,6 +1556,12 @@ expand_vector_operations_1 (gimple_stmt_iterator *gsi)
   if (TREE_CODE (type) != VECTOR_TYPE)
     return;
 
+  /* A scalar operation pretending to be a vector one.  */
+  if (VECTOR_MASK_TYPE_P (type)
+      && !VECTOR_MODE_P (TYPE_MODE (type))
+      && TYPE_MODE (type) != BLKmode)
+    return;
+
   if (CONVERT_EXPR_CODE_P (code)
       || code == FLOAT_EXPR
       || code == FIX_TRUNC_EXPR

[-- Attachment #5: avx512-vec-bool-04-vectorize.patch --]
[-- Type: text/plain, Size: 36486 bytes --]

diff --git a/gcc/expr.c b/gcc/expr.c
index 1e820b4..6ae0c4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11000,9 +11000,15 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
   if (TREE_CODE (ops->type) == VECTOR_TYPE)
     {
       tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
-      tree if_true = constant_boolean_node (true, ops->type);
-      tree if_false = constant_boolean_node (false, ops->type);
-      return expand_vec_cond_expr (ops->type, ifexp, if_true, if_false, target);
+      if (VECTOR_MASK_TYPE_P (ops->type))
+	return expand_vec_cmp_expr (ops->type, ifexp, target);
+      else
+	{
+	  tree if_true = constant_boolean_node (true, ops->type);
+	  tree if_false = constant_boolean_node (false, ops->type);
+	  return expand_vec_cond_expr (ops->type, ifexp, if_true,
+				       if_false, target);
+	}
     }
 
   /* Get the rtx comparison code to use.  We know that EXP is a comparison
@@ -11289,6 +11295,39 @@ try_tablejump (tree index_type, tree index_expr, tree minval, tree range,
   return 1;
 }
 
+/* Return a CONST_VECTOR rtx representing vector mask for
+   a VECTOR_CST of booleans.  */
+static rtx
+const_vector_mask_from_tree (tree exp)
+{
+  rtvec v;
+  unsigned i;
+  int units;
+  tree elt;
+  machine_mode inner, mode;
+
+  mode = TYPE_MODE (TREE_TYPE (exp));
+  units = GET_MODE_NUNITS (mode);
+  inner = GET_MODE_INNER (mode);
+
+  v = rtvec_alloc (units);
+
+  for (i = 0; i < VECTOR_CST_NELTS (exp); ++i)
+    {
+      elt = VECTOR_CST_ELT (exp, i);
+
+      gcc_assert (TREE_CODE (elt) == INTEGER_CST);
+      if (integer_zerop (elt))
+	RTVEC_ELT (v, i) = CONST0_RTX (inner);
+      else if (integer_onep (elt))
+	RTVEC_ELT (v, i) = CONSTM1_RTX (inner);
+      else
+	gcc_unreachable ();
+    }
+
+  return gen_rtx_CONST_VECTOR (mode, v);
+}
+
 /* Return a CONST_VECTOR rtx for a VECTOR_CST tree.  */
 static rtx
 const_vector_from_tree (tree exp)
@@ -11304,6 +11343,9 @@ const_vector_from_tree (tree exp)
   if (initializer_zerop (exp))
     return CONST0_RTX (mode);
 
+  if (VECTOR_MASK_TYPE_P (TREE_TYPE (exp)))
+      return const_vector_mask_from_tree (exp);
+
   units = GET_MODE_NUNITS (mode);
   inner = GET_MODE_INNER (mode);
 
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index e785946..4ca0a40 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt)
   create_output_operand (&ops[0], target, TYPE_MODE (type));
   create_fixed_operand (&ops[1], mem);
   create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
+  expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type),
+				      TYPE_MODE (TREE_TYPE (maskt))),
+	       3, ops);
 }
 
 static void
@@ -1908,7 +1910,9 @@ expand_MASK_STORE (gcall *stmt)
   create_fixed_operand (&ops[0], mem);
   create_input_operand (&ops[1], reg, TYPE_MODE (type));
   create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (optab_handler (maskstore_optab, TYPE_MODE (type)), 3, ops);
+  expand_insn (convert_optab_handler (maskstore_optab, TYPE_MODE (type),
+				      TYPE_MODE (TREE_TYPE (maskt))),
+	       3, ops);
 }
 
 static void
diff --git a/gcc/optabs.c b/gcc/optabs.c
index e533e6e..fd9932f 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6490,11 +6490,13 @@ get_rtx_code (enum tree_code tcode, bool unsignedp)
 }
 
 /* Return comparison rtx for COND. Use UNSIGNEDP to select signed or
-   unsigned operators. Do not generate compare instruction.  */
+   unsigned operators.  OPNO holds an index of the first comparison
+   operand in insn with code ICODE.  Do not generate compare instruction.  */
 
 static rtx
 vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
-		    bool unsignedp, enum insn_code icode)
+		    bool unsignedp, enum insn_code icode,
+		    unsigned int opno)
 {
   struct expand_operand ops[2];
   rtx rtx_op0, rtx_op1;
@@ -6520,7 +6522,7 @@ vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
 
   create_input_operand (&ops[0], rtx_op0, m0);
   create_input_operand (&ops[1], rtx_op1, m1);
-  if (!maybe_legitimize_operands (icode, 4, 2, ops))
+  if (!maybe_legitimize_operands (icode, opno, 2, ops))
     gcc_unreachable ();
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
@@ -6843,16 +6845,25 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
       op0a = TREE_OPERAND (op0, 0);
       op0b = TREE_OPERAND (op0, 1);
       tcode = TREE_CODE (op0);
+      unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
     }
   else
     {
+      gcc_assert (VECTOR_MASK_TYPE_P (TREE_TYPE (op0)));
+      if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE ((op0)))) != MODE_VECTOR_INT)
+	{
+	  /* This is a vcond with mask.  To be supported soon...  */
+	  gcc_unreachable ();
+	}
       /* Fake op0 < 0.  */
-      gcc_assert (!TYPE_UNSIGNED (TREE_TYPE (op0)));
-      op0a = op0;
-      op0b = build_zero_cst (TREE_TYPE (op0));
-      tcode = LT_EXPR;
+      else
+	{
+	  op0a = op0;
+	  op0b = build_zero_cst (TREE_TYPE (op0));
+	  tcode = LT_EXPR;
+	  unsignedp = false;
+	}
     }
-  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
   cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
 
 
@@ -6863,7 +6874,7 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
   if (icode == CODE_FOR_nothing)
     return 0;
 
-  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode);
+  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 4);
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
 
@@ -6877,6 +6888,63 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
   return ops[0].value;
 }
 
+/* Return insn code for a comparison operator with VMODE
+   resultin MASK_MODE, unsigned if UNS is true.  */
+
+static inline enum insn_code
+get_vec_cmp_icode (machine_mode vmode, machine_mode mask_mode, bool uns)
+{
+  optab tab = uns ? vec_cmpu_optab : vec_cmp_optab;
+  return convert_optab_handler (tab, vmode, mask_mode);
+}
+
+/* Return TRUE if appropriate vector insn is available
+   for vector comparison expr with vector type VALUE_TYPE
+   and resulting mask with MASK_TYPE.  */
+
+bool
+expand_vec_cmp_expr_p (tree value_type, tree mask_type)
+{
+  enum insn_code icode = get_vec_cmp_icode (TYPE_MODE (value_type),
+					    TYPE_MODE (mask_type),
+					    TYPE_UNSIGNED (value_type));
+  return (icode != CODE_FOR_nothing);
+}
+
+/* Generate insns for a vector comparison into a mask.  */
+
+rtx
+expand_vec_cmp_expr (tree type, tree exp, rtx target)
+{
+  struct expand_operand ops[4];
+  enum insn_code icode;
+  rtx comparison;
+  machine_mode mask_mode = TYPE_MODE (type);
+  machine_mode vmode;
+  bool unsignedp;
+  tree op0a, op0b;
+  enum tree_code tcode;
+
+  op0a = TREE_OPERAND (exp, 0);
+  op0b = TREE_OPERAND (exp, 1);
+  tcode = TREE_CODE (exp);
+
+  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
+  vmode = TYPE_MODE (TREE_TYPE (op0a));
+
+  icode = get_vec_cmp_icode (vmode, mask_mode, unsignedp);
+  if (icode == CODE_FOR_nothing)
+    return 0;
+
+  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 2);
+  create_output_operand (&ops[0], target, mask_mode);
+  create_fixed_operand (&ops[1], comparison);
+  create_fixed_operand (&ops[2], XEXP (comparison, 0));
+  create_fixed_operand (&ops[3], XEXP (comparison, 1));
+  expand_insn (icode, 4, ops);
+  return ops[0].value;
+}
+
 /* Return non-zero if a highpart multiply is supported of can be synthisized.
    For the benefit of expand_mult_highpart, the return value is 1 for direct,
    2 for even/odd widening, and 3 for hi/lo widening.  */
@@ -7002,26 +7070,32 @@ expand_mult_highpart (machine_mode mode, rtx op0, rtx op1,
 
 /* Return true if target supports vector masked load/store for mode.  */
 bool
-can_vec_mask_load_store_p (machine_mode mode, bool is_load)
+can_vec_mask_load_store_p (machine_mode mode,
+			   machine_mode mask_mode,
+			   bool is_load)
 {
   optab op = is_load ? maskload_optab : maskstore_optab;
-  machine_mode vmode;
   unsigned int vector_sizes;
 
   /* If mode is vector mode, check it directly.  */
   if (VECTOR_MODE_P (mode))
-    return optab_handler (op, mode) != CODE_FOR_nothing;
+    return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing;
 
   /* Otherwise, return true if there is some vector mode with
      the mask load/store supported.  */
 
   /* See if there is any chance the mask load or store might be
      vectorized.  If not, punt.  */
-  vmode = targetm.vectorize.preferred_simd_mode (mode);
-  if (!VECTOR_MODE_P (vmode))
+  mode = targetm.vectorize.preferred_simd_mode (mode);
+  if (!VECTOR_MODE_P (mode))
+    return false;
+
+  mask_mode = targetm.vectorize.get_mask_mode (GET_MODE_NUNITS (mode),
+					       GET_MODE_SIZE (mode));
+  if (mask_mode == VOIDmode)
     return false;
 
-  if (optab_handler (op, vmode) != CODE_FOR_nothing)
+  if (convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing)
     return true;
 
   vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
@@ -7031,9 +7105,12 @@ can_vec_mask_load_store_p (machine_mode mode, bool is_load)
       vector_sizes &= ~cur;
       if (cur <= GET_MODE_SIZE (mode))
 	continue;
-      vmode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode));
-      if (VECTOR_MODE_P (vmode)
-	  && optab_handler (op, vmode) != CODE_FOR_nothing)
+      mode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode));
+      mask_mode = targetm.vectorize.get_mask_mode (GET_MODE_NUNITS (mode),
+						   cur);
+      if (VECTOR_MODE_P (mode)
+	  && mask_mode != VOIDmode
+	  && convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing)
 	return true;
     }
   return false;
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 888b21c..9804378 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -61,6 +61,10 @@ OPTAB_CD(vec_load_lanes_optab, "vec_load_lanes$a$b")
 OPTAB_CD(vec_store_lanes_optab, "vec_store_lanes$a$b")
 OPTAB_CD(vcond_optab, "vcond$a$b")
 OPTAB_CD(vcondu_optab, "vcondu$a$b")
+OPTAB_CD(vec_cmp_optab, "vec_cmp$a$b")
+OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
+OPTAB_CD(maskload_optab, "maskload$a$b")
+OPTAB_CD(maskstore_optab, "maskstore$a$b")
 
 OPTAB_NL(add_optab, "add$P$a3", PLUS, "add", '3', gen_int_fp_fixed_libfunc)
 OPTAB_NX(add_optab, "add$F$a3")
@@ -264,8 +268,6 @@ OPTAB_D (udot_prod_optab, "udot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
-OPTAB_D (maskload_optab, "maskload$a")
-OPTAB_D (maskstore_optab, "maskstore$a")
 OPTAB_D (vec_extract_optab, "vec_extract$a")
 OPTAB_D (vec_init_optab, "vec_init$a")
 OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 95f5cbc..dfe9ebf 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -496,6 +496,12 @@ extern bool can_vec_perm_p (machine_mode, bool, const unsigned char *);
 extern rtx expand_vec_perm (machine_mode, rtx, rtx, rtx, rtx);
 
 /* Return tree if target supports vector operations for COND_EXPR.  */
+bool expand_vec_cmp_expr_p (tree, tree);
+
+/* Generate code for VEC_COND_EXPR.  */
+extern rtx expand_vec_cmp_expr (tree, tree, rtx);
+
+/* Return true if target supports vector comparison.  */
 bool expand_vec_cond_expr_p (tree, tree);
 
 /* Generate code for VEC_COND_EXPR.  */
@@ -508,7 +514,7 @@ extern int can_mult_highpart_p (machine_mode, bool);
 extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool);
 
 /* Return true if target supports vector masked load/store for mode.  */
-extern bool can_vec_mask_load_store_p (machine_mode, bool);
+extern bool can_vec_mask_load_store_p (machine_mode, machine_mode, bool);
 
 /* Return true if there is an inline compare and swap pattern.  */
 extern bool can_compare_and_swap_p (machine_mode, bool);
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 291e602..d66517d 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -811,7 +811,7 @@ ifcvt_can_use_mask_load_store (gimple stmt)
       || VECTOR_MODE_P (mode))
     return false;
 
-  if (can_vec_mask_load_store_p (mode, is_load))
+  if (can_vec_mask_load_store_p (mode, VOIDmode, is_load))
     return true;
 
   return false;
@@ -2068,7 +2068,7 @@ predicate_mem_writes (loop_p loop)
 	  {
 	    tree lhs = gimple_assign_lhs (stmt);
 	    tree rhs = gimple_assign_rhs1 (stmt);
-	    tree ref, addr, ptr, masktype, mask_op0, mask_op1, mask;
+	    tree ref, addr, ptr, masktype, mask;
 	    gimple new_stmt;
 	    int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs)));
 	    ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs;
@@ -2082,15 +2082,47 @@ predicate_mem_writes (loop_p loop)
 	      mask = vect_masks[index];
 	    else
 	      {
-		masktype = build_nonstandard_integer_type (bitsize, 1);
-		mask_op0 = build_int_cst (masktype, swap ? 0 : -1);
-		mask_op1 = build_int_cst (masktype, swap ? -1 : 0);
-		cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
-						   is_gimple_condexpr,
-						   NULL_TREE,
-						   true, GSI_SAME_STMT);
-		mask = fold_build_cond_expr (masktype, unshare_expr (cond),
-					     mask_op0, mask_op1);
+		masktype = boolean_type_node;
+		if ((TREE_CODE (cond) == NE_EXPR
+		     || TREE_CODE (cond) == EQ_EXPR)
+		    && (integer_zerop (TREE_OPERAND (cond, 1))
+			|| integer_onep (TREE_OPERAND (cond, 1)))
+		    && TREE_CODE (TREE_TYPE (TREE_OPERAND (cond, 0)))
+		       == BOOLEAN_TYPE)
+		  {
+		    bool negate = (TREE_CODE (cond) == EQ_EXPR);
+		    if (integer_onep (TREE_OPERAND (cond, 1)))
+		      negate = !negate;
+		    if (swap)
+		      negate = !negate;
+		    mask = TREE_OPERAND (cond, 0);
+		    if (negate)
+		      {
+			mask = ifc_temp_var (masktype, unshare_expr (cond),
+					     &gsi);
+			mask = build1 (TRUTH_NOT_EXPR, masktype, mask);
+		      }
+		  }
+		else if (swap &&
+			 TREE_CODE_CLASS (TREE_CODE (cond)) == tcc_comparison)
+		  {
+		    tree op_type = TREE_TYPE (TREE_OPERAND (cond, 0));
+		    tree_code code
+		      = invert_tree_comparison (TREE_CODE (cond),
+						HONOR_NANS (op_type));
+		    if (code != ERROR_MARK)
+			mask = build2 (code, TREE_TYPE (cond),
+				       TREE_OPERAND (cond, 0),
+				       TREE_OPERAND (cond, 1));
+		    else
+		      {
+			mask = ifc_temp_var (masktype, unshare_expr (cond),
+					     &gsi);
+			mask = build1 (TRUTH_NOT_EXPR, masktype, mask);
+		      }
+		  }
+		else
+		  mask = unshare_expr (cond);
 		mask = ifc_temp_var (masktype, mask, &gsi);
 		/* Save mask and its size for further use.  */
 	        vect_sizes.safe_push (bitsize);
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index f1eaef4..0a39825 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -3849,6 +3849,9 @@ vect_get_new_vect_var (tree type, enum vect_var_kind var_kind, const char *name)
   case vect_scalar_var:
     prefix = "stmp";
     break;
+  case vect_mask_var:
+    prefix = "mask";
+    break;
   case vect_pointer_var:
     prefix = "vectp";
     break;
@@ -4403,7 +4406,11 @@ vect_create_destination_var (tree scalar_dest, tree vectype)
   tree type;
   enum vect_var_kind kind;
 
-  kind = vectype ? vect_simple_var : vect_scalar_var;
+  kind = vectype
+    ? VECTOR_MASK_TYPE_P (vectype)
+    ? vect_mask_var
+    : vect_simple_var
+    : vect_scalar_var;
   type = vectype ? vectype : TREE_TYPE (scalar_dest);
 
   gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME);
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 59c75af..1810f78 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -193,19 +193,21 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 {
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
-  int nbbs = loop->num_nodes;
+  unsigned nbbs = loop->num_nodes;
   unsigned int vectorization_factor = 0;
   tree scalar_type;
   gphi *phi;
   tree vectype;
   unsigned int nunits;
   stmt_vec_info stmt_info;
-  int i;
+  unsigned i;
   HOST_WIDE_INT dummy;
   gimple stmt, pattern_stmt = NULL;
   gimple_seq pattern_def_seq = NULL;
   gimple_stmt_iterator pattern_def_si = gsi_none ();
   bool analyze_pattern_stmt = false;
+  bool bool_result;
+  auto_vec<stmt_vec_info> mask_producers;
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
@@ -424,6 +426,8 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 	      return false;
 	    }
 
+	  bool_result = false;
+
 	  if (STMT_VINFO_VECTYPE (stmt_info))
 	    {
 	      /* The only case when a vectype had been already set is for stmts
@@ -444,6 +448,32 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 		scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
 	      else
 		scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
+
+	      /* Bool ops don't participate in vectorization factor
+		 computation.  For comparison use compared types to
+		 compute a factor.  */
+	      if (TREE_CODE (scalar_type) == BOOLEAN_TYPE)
+		{
+		  mask_producers.safe_push (stmt_info);
+		  bool_result = true;
+
+		  if (gimple_code (stmt) == GIMPLE_ASSIGN
+		      && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
+			 == tcc_comparison
+		      && TREE_CODE (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+			 != BOOLEAN_TYPE)
+		    scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
+		  else
+		    {
+		      if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
+			{
+			  pattern_def_seq = NULL;
+			  gsi_next (&si);
+			}
+		      continue;
+		    }
+		}
+
 	      if (dump_enabled_p ())
 		{
 		  dump_printf_loc (MSG_NOTE, vect_location,
@@ -466,7 +496,8 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 		  return false;
 		}
 
-	      STMT_VINFO_VECTYPE (stmt_info) = vectype;
+	      if (!bool_result)
+		STMT_VINFO_VECTYPE (stmt_info) = vectype;
 
 	      if (dump_enabled_p ())
 		{
@@ -479,8 +510,9 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 	  /* The vectorization factor is according to the smallest
 	     scalar type (or the largest vector size, but we only
 	     support one vector size per loop).  */
-	  scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,
-						       &dummy);
+	  if (!bool_result)
+	    scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,
+							 &dummy);
 	  if (dump_enabled_p ())
 	    {
 	      dump_printf_loc (MSG_NOTE, vect_location,
@@ -555,6 +587,100 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
     }
   LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor;
 
+  for (i = 0; i < mask_producers.length (); i++)
+    {
+      tree mask_type = NULL;
+      bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (mask_producers[i]);
+
+      stmt = STMT_VINFO_STMT (mask_producers[i]);
+
+      if (gimple_code (stmt) == GIMPLE_ASSIGN
+	  && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
+	  && TREE_CODE (TREE_TYPE (gimple_assign_rhs1 (stmt))) != BOOLEAN_TYPE)
+	{
+	  scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
+	  mask_type = get_mask_type_for_scalar_type (scalar_type);
+
+	  if (!mask_type)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "not vectorized: unsupported mask\n");
+	      return false;
+	    }
+	}
+      else
+	{
+	  tree rhs, def;
+	  ssa_op_iter iter;
+	  gimple def_stmt;
+	  enum vect_def_type dt;
+
+	  FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE)
+	    {
+	      if (!vect_is_simple_use_1 (rhs, stmt, loop_vinfo, bb_vinfo,
+					 &def_stmt, &def, &dt, &vectype))
+		{
+		  if (dump_enabled_p ())
+		    {
+		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				       "not vectorized: can't compute mask type "
+				       "for statement, ");
+		      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,  TDF_SLIM, stmt,
+					0);
+		      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+		    }
+		  return false;
+		}
+
+	      /* No vectype probably means external definition.
+		 Allow it in case there is another operand which
+		 allows to determine mask type.  */
+	      if (!vectype)
+		continue;
+
+	      if (!mask_type)
+		mask_type = vectype;
+	      else if (TYPE_VECTOR_SUBPARTS (mask_type)
+		       != TYPE_VECTOR_SUBPARTS (vectype))
+		{
+		  if (dump_enabled_p ())
+		    {
+		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				       "not vectorized: different sized masks "
+				       "types in statement, ");
+		      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+					 mask_type);
+		      dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
+		      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+					 vectype);
+		      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+		    }
+		  return false;
+		}
+	    }
+	}
+
+      /* No mask_type should mean loop invariant predicate.
+	 This is probably a subject for optimization in
+	 if-conversion.  */
+      if (!mask_type)
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "not vectorized: can't compute mask type "
+			       "for statement, ");
+	      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,  TDF_SLIM, stmt,
+				0);
+	      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+	    }
+	  return false;
+	}
+
+      STMT_VINFO_VECTYPE (mask_producers[i]) = mask_type;
+    }
+
   return true;
 }
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index f87c066..f3887be 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1316,27 +1316,61 @@ vect_init_vector_1 (gimple stmt, gimple new_stmt, gimple_stmt_iterator *gsi)
 tree
 vect_init_vector (gimple stmt, tree val, tree type, gimple_stmt_iterator *gsi)
 {
+  tree val_type = TREE_TYPE (val);
+  machine_mode mode = TYPE_MODE (type);
+  machine_mode val_mode = TYPE_MODE(val_type);
   tree new_var;
   gimple init_stmt;
   tree vec_oprnd;
   tree new_temp;
 
   if (TREE_CODE (type) == VECTOR_TYPE
-      && TREE_CODE (TREE_TYPE (val)) != VECTOR_TYPE)
-    {
-      if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val)))
+      && TREE_CODE (val_type) != VECTOR_TYPE)
+    {
+      /* Handle vector of bool represented as a vector of
+	 integers here rather than on expand because it is
+	 a default mask type for targets.  Vector mask is
+	 built in a following way:
+
+	 tmp = (int)val
+	 vec_tmp = {tmp, ..., tmp}
+	 vec_cst = VIEW_CONVERT_EXPR<vector(N) _Bool>(vec_tmp);  */
+      if (TREE_CODE (val_type) == BOOLEAN_TYPE
+	  && VECTOR_MODE_P (mode)
+	  && SCALAR_INT_MODE_P (GET_MODE_INNER (mode))
+	  && GET_MODE_INNER (mode) != val_mode)
 	{
-	  if (CONSTANT_CLASS_P (val))
-	    val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val);
-	  else
+	  unsigned size = GET_MODE_BITSIZE (GET_MODE_INNER (mode));
+	  tree stype = build_nonstandard_integer_type (size, 1);
+	  tree vectype = get_vectype_for_scalar_type (stype);
+
+	  new_temp = make_ssa_name (stype);
+	  init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val);
+	  vect_init_vector_1 (stmt, init_stmt, gsi);
+
+	  val = make_ssa_name (vectype);
+	  new_temp = build_vector_from_val (vectype, new_temp);
+	  init_stmt = gimple_build_assign (val, new_temp);
+	  vect_init_vector_1 (stmt, init_stmt, gsi);
+
+	  val = build1 (VIEW_CONVERT_EXPR, type, val);
+	}
+      else
+	{
+	  if (!types_compatible_p (TREE_TYPE (type), val_type))
 	    {
-	      new_temp = make_ssa_name (TREE_TYPE (type));
-	      init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val);
-	      vect_init_vector_1 (stmt, init_stmt, gsi);
-	      val = new_temp;
+	      if (CONSTANT_CLASS_P (val))
+		val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val);
+	      else
+		{
+		  new_temp = make_ssa_name (TREE_TYPE (type));
+		  init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val);
+		  vect_init_vector_1 (stmt, init_stmt, gsi);
+		  val = new_temp;
+		}
 	    }
+	  val = build_vector_from_val (type, val);
 	}
-      val = build_vector_from_val (type, val);
     }
 
   new_var = vect_get_new_vect_var (type, vect_simple_var, "cst_");
@@ -1368,6 +1402,7 @@ vect_get_vec_def_for_operand (tree op, gimple stmt, tree *scalar_def)
   gimple def_stmt;
   stmt_vec_info def_stmt_info = NULL;
   stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
+  tree stmt_vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
   unsigned int nunits;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
   tree def;
@@ -1411,7 +1446,12 @@ vect_get_vec_def_for_operand (tree op, gimple stmt, tree *scalar_def)
     /* Case 1: operand is a constant.  */
     case vect_constant_def:
       {
-	vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
+	if (TREE_CODE (TREE_TYPE (op)) == BOOLEAN_TYPE
+	    && VECTOR_MASK_TYPE_P (stmt_vectype))
+	  vector_type = stmt_vectype;
+	else
+	  vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
+
 	gcc_assert (vector_type);
 	nunits = TYPE_VECTOR_SUBPARTS (vector_type);
 
@@ -1429,7 +1469,11 @@ vect_get_vec_def_for_operand (tree op, gimple stmt, tree *scalar_def)
     /* Case 2: operand is defined outside the loop - loop invariant.  */
     case vect_external_def:
       {
-	vector_type = get_vectype_for_scalar_type (TREE_TYPE (def));
+	if (TREE_CODE (TREE_TYPE (op)) == BOOLEAN_TYPE
+	    && VECTOR_MASK_TYPE_P (stmt_vectype))
+	  vector_type = stmt_vectype;
+	else
+	  vector_type = get_vectype_for_scalar_type (TREE_TYPE (def));
 	gcc_assert (vector_type);
 
 	if (scalar_def)
@@ -1758,6 +1802,7 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
   bool nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt);
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree mask_vectype;
   tree elem_type;
   gimple new_stmt;
   tree dummy;
@@ -1785,8 +1830,8 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
 
   is_store = gimple_call_internal_fn (stmt) == IFN_MASK_STORE;
   mask = gimple_call_arg (stmt, 2);
-  if (TYPE_PRECISION (TREE_TYPE (mask))
-      != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype))))
+
+  if (TREE_CODE (TREE_TYPE (mask)) != BOOLEAN_TYPE)
     return false;
 
   /* FORNOW. This restriction should be relaxed.  */
@@ -1815,6 +1860,19 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
   if (STMT_VINFO_STRIDED_P (stmt_info))
     return false;
 
+  if (TREE_CODE (mask) != SSA_NAME)
+    return false;
+
+  if (!vect_is_simple_use_1 (mask, stmt, loop_vinfo, NULL,
+			     &def_stmt, &def, &dt, &mask_vectype))
+    return false;
+
+  if (!mask_vectype)
+    mask_vectype = get_mask_type_for_scalar_type (TREE_TYPE (vectype));
+
+  if (!mask_vectype)
+    return false;
+
   if (STMT_VINFO_GATHER_P (stmt_info))
     {
       gimple def_stmt;
@@ -1848,14 +1906,9 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
 				 : DR_STEP (dr), size_zero_node) <= 0)
     return false;
   else if (!VECTOR_MODE_P (TYPE_MODE (vectype))
-	   || !can_vec_mask_load_store_p (TYPE_MODE (vectype), !is_store))
-    return false;
-
-  if (TREE_CODE (mask) != SSA_NAME)
-    return false;
-
-  if (!vect_is_simple_use (mask, stmt, loop_vinfo, NULL,
-			   &def_stmt, &def, &dt))
+	   || !can_vec_mask_load_store_p (TYPE_MODE (vectype),
+					  TYPE_MODE (mask_vectype),
+					  !is_store))
     return false;
 
   if (is_store)
@@ -7229,10 +7282,7 @@ vectorizable_condition (gimple stmt, gimple_stmt_iterator *gsi,
 	   && TREE_CODE (else_clause) != FIXED_CST)
     return false;
 
-  unsigned int prec = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype)));
-  /* The result of a vector comparison should be signed type.  */
-  tree cmp_type = build_nonstandard_integer_type (prec, 0);
-  vec_cmp_type = get_same_sized_vectype (cmp_type, vectype);
+  vec_cmp_type = build_same_sized_truth_vector_type (comp_vectype);
   if (vec_cmp_type == NULL_TREE)
     return false;
 
@@ -7373,6 +7423,201 @@ vectorizable_condition (gimple stmt, gimple_stmt_iterator *gsi,
   return true;
 }
 
+/* vectorizable_comparison.
+
+   Check if STMT is comparison expression that can be vectorized.
+   If VEC_STMT is also passed, vectorize the STMT: create a vectorized
+   comparison, put it in VEC_STMT, and insert it at GSI.
+
+   Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
+
+bool
+vectorizable_comparison (gimple stmt, gimple_stmt_iterator *gsi,
+			 gimple *vec_stmt, tree reduc_def,
+			 slp_tree slp_node)
+{
+  tree lhs, rhs1, rhs2;
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  tree vectype1, vectype2;
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
+  tree vec_compare;
+  tree new_temp;
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  tree def;
+  enum vect_def_type dt, dts[4];
+  unsigned nunits;
+  int ncopies;
+  enum tree_code code;
+  stmt_vec_info prev_stmt_info = NULL;
+  int i, j;
+  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
+  vec<tree> vec_oprnds0 = vNULL;
+  vec<tree> vec_oprnds1 = vNULL;
+  tree mask_type;
+  tree mask;
+
+  if (!VECTOR_MASK_TYPE_P (vectype))
+    return false;
+
+  mask_type = vectype;
+  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+  if (slp_node || PURE_SLP_STMT (stmt_info))
+    ncopies = 1;
+  else
+    ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+
+  gcc_assert (ncopies >= 1);
+  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
+      && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
+	   && reduc_def))
+    return false;
+
+  if (STMT_VINFO_LIVE_P (stmt_info))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "value used after loop.\n");
+      return false;
+    }
+
+  if (!is_gimple_assign (stmt))
+    return false;
+
+  code = gimple_assign_rhs_code (stmt);
+
+  if (TREE_CODE_CLASS (code) != tcc_comparison)
+    return false;
+
+  rhs1 = gimple_assign_rhs1 (stmt);
+  rhs2 = gimple_assign_rhs2 (stmt);
+
+  if (TREE_CODE (rhs1) == SSA_NAME)
+    {
+      gimple rhs1_def_stmt = SSA_NAME_DEF_STMT (rhs1);
+      if (!vect_is_simple_use_1 (rhs1, stmt, loop_vinfo, bb_vinfo,
+				 &rhs1_def_stmt, &def, &dt, &vectype1))
+	return false;
+    }
+  else if (TREE_CODE (rhs1) != INTEGER_CST && TREE_CODE (rhs1) != REAL_CST
+	   && TREE_CODE (rhs1) != FIXED_CST)
+    return false;
+
+  if (TREE_CODE (rhs2) == SSA_NAME)
+    {
+      gimple rhs2_def_stmt = SSA_NAME_DEF_STMT (rhs2);
+      if (!vect_is_simple_use_1 (rhs2, stmt, loop_vinfo, bb_vinfo,
+				 &rhs2_def_stmt, &def, &dt, &vectype2))
+	return false;
+    }
+  else if (TREE_CODE (rhs2) != INTEGER_CST && TREE_CODE (rhs2) != REAL_CST
+	   && TREE_CODE (rhs2) != FIXED_CST)
+    return false;
+
+  vectype = vectype1 ? vectype1 : vectype2;
+
+  if (!vectype
+      || nunits != TYPE_VECTOR_SUBPARTS (vectype))
+    return false;
+
+  if (!vec_stmt)
+    {
+      STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
+      return expand_vec_cmp_expr_p (vectype, mask_type);
+    }
+
+  /* Transform.  */
+  if (!slp_node)
+    {
+      vec_oprnds0.create (1);
+      vec_oprnds1.create (1);
+    }
+
+  /* Handle def.  */
+  lhs = gimple_assign_lhs (stmt);
+  mask = vect_create_destination_var (lhs, mask_type);
+
+  /* Handle cmp expr.  */
+  for (j = 0; j < ncopies; j++)
+    {
+      gassign *new_stmt = NULL;
+      if (j == 0)
+	{
+	  if (slp_node)
+	    {
+	      auto_vec<tree, 2> ops;
+	      auto_vec<vec<tree>, 2> vec_defs;
+
+	      ops.safe_push (rhs1);
+	      ops.safe_push (rhs2);
+	      vect_get_slp_defs (ops, slp_node, &vec_defs, -1);
+	      vec_oprnds1 = vec_defs.pop ();
+	      vec_oprnds0 = vec_defs.pop ();
+
+	      ops.release ();
+	      vec_defs.release ();
+	    }
+	  else
+	    {
+	      gimple gtemp;
+	      vec_rhs1
+		= vect_get_vec_def_for_operand (rhs1, stmt, NULL);
+	      vect_is_simple_use (rhs1, stmt, loop_vinfo, NULL,
+				  &gtemp, &def, &dts[0]);
+	      vec_rhs2 =
+		vect_get_vec_def_for_operand (rhs2, stmt, NULL);
+	      vect_is_simple_use (rhs2, stmt, loop_vinfo, NULL,
+				  &gtemp, &def, &dts[1]);
+	    }
+	}
+      else
+	{
+	  vec_rhs1 = vect_get_vec_def_for_stmt_copy (dts[0],
+						     vec_oprnds0.pop ());
+	  vec_rhs2 = vect_get_vec_def_for_stmt_copy (dts[1],
+						     vec_oprnds1.pop ());
+	}
+
+      if (!slp_node)
+	{
+	  vec_oprnds0.quick_push (vec_rhs1);
+	  vec_oprnds1.quick_push (vec_rhs2);
+	}
+
+      /* Arguments are ready.  Create the new vector stmt.  */
+      FOR_EACH_VEC_ELT (vec_oprnds0, i, vec_rhs1)
+	{
+	  vec_rhs2 = vec_oprnds1[i];
+
+	  vec_compare = build2 (code, mask_type, vec_rhs1, vec_rhs2);
+	  new_stmt = gimple_build_assign (mask, vec_compare);
+	  new_temp = make_ssa_name (mask, new_stmt);
+	  gimple_assign_set_lhs (new_stmt, new_temp);
+	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	  if (slp_node)
+	    SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
+	}
+
+      if (slp_node)
+	continue;
+
+      if (j == 0)
+	STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+      else
+	STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+
+      prev_stmt_info = vinfo_for_stmt (new_stmt);
+    }
+
+  vec_oprnds0.release ();
+  vec_oprnds1.release ();
+
+  return true;
+}
 
 /* Make sure the statement is vectorizable.  */
 
@@ -7576,7 +7821,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize, slp_tree node)
 	  || vectorizable_call (stmt, NULL, NULL, node)
 	  || vectorizable_store (stmt, NULL, NULL, node)
 	  || vectorizable_reduction (stmt, NULL, NULL, node)
-	  || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
+	  || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node)
+	  || vectorizable_comparison (stmt, NULL, NULL, NULL, node));
   else
     {
       if (bb_vinfo)
@@ -7588,7 +7834,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize, slp_tree node)
 	      || vectorizable_load (stmt, NULL, NULL, node, NULL)
 	      || vectorizable_call (stmt, NULL, NULL, node)
 	      || vectorizable_store (stmt, NULL, NULL, node)
-	      || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
+	      || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node)
+	      || vectorizable_comparison (stmt, NULL, NULL, NULL, node));
     }
 
   if (!ok)
@@ -7704,6 +7951,11 @@ vect_transform_stmt (gimple stmt, gimple_stmt_iterator *gsi,
       gcc_assert (done);
       break;
 
+    case comparison_vec_info_type:
+      done = vectorizable_comparison (stmt, gsi, &vec_stmt, NULL, slp_node);
+      gcc_assert (done);
+      break;
+
     case call_vec_info_type:
       done = vectorizable_call (stmt, gsi, &vec_stmt, slp_node);
       stmt = gsi_stmt (*gsi);
@@ -8038,6 +8290,23 @@ get_vectype_for_scalar_type (tree scalar_type)
   return vectype;
 }
 
+/* Function get_mask_type_for_scalar_type.
+
+   Returns the mask type corresponding to a result of comparison
+   of vectors of specified SCALAR_TYPE as supported by target.  */
+
+tree
+get_mask_type_for_scalar_type (tree scalar_type)
+{
+  tree vectype = get_vectype_for_scalar_type (scalar_type);
+
+  if (!vectype)
+    return NULL;
+
+  return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (vectype),
+				  current_vector_size);
+}
+
 /* Function get_same_sized_vectype
 
    Returns a vector type corresponding to SCALAR_TYPE of size
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 58e8f10..94aea1a 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -28,7 +28,8 @@ along with GCC; see the file COPYING3.  If not see
 enum vect_var_kind {
   vect_simple_var,
   vect_pointer_var,
-  vect_scalar_var
+  vect_scalar_var,
+  vect_mask_var
 };
 
 /* Defines type of operation.  */
@@ -482,6 +483,7 @@ enum stmt_vec_info_type {
   call_simd_clone_vec_info_type,
   assignment_vec_info_type,
   condition_vec_info_type,
+  comparison_vec_info_type,
   reduc_vec_info_type,
   induc_vec_info_type,
   type_promotion_vec_info_type,
@@ -995,6 +997,7 @@ extern bool vect_can_advance_ivs_p (loop_vec_info);
 /* In tree-vect-stmts.c.  */
 extern unsigned int current_vector_size;
 extern tree get_vectype_for_scalar_type (tree);
+extern tree get_mask_type_for_scalar_type (tree);
 extern tree get_same_sized_vectype (tree, tree);
 extern bool vect_is_simple_use (tree, gimple, loop_vec_info,
 			        bb_vec_info, gimple *,

[-- Attachment #6: avx512-vec-bool-05-bool-patterns.patch --]
[-- Type: text/plain, Size: 7561 bytes --]

diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 758ca38..cffacaa 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -2957,7 +2957,7 @@ check_bool_pattern (tree var, loop_vec_info loop_vinfo, bb_vec_info bb_vinfo)
     default:
       if (TREE_CODE_CLASS (rhs_code) == tcc_comparison)
 	{
-	  tree vecitype, comp_vectype;
+	  tree vecitype, comp_vectype, mask_type;
 
 	  /* If the comparison can throw, then is_gimple_condexpr will be
 	     false and we can't make a COND_EXPR/VEC_COND_EXPR out of it.  */
@@ -2968,6 +2968,11 @@ check_bool_pattern (tree var, loop_vec_info loop_vinfo, bb_vec_info bb_vinfo)
 	  if (comp_vectype == NULL_TREE)
 	    return false;
 
+	  mask_type = get_mask_type_for_scalar_type (TREE_TYPE (rhs1));
+	  if (mask_type
+	      && expand_vec_cmp_expr_p (comp_vectype, mask_type))
+	    return false;
+
 	  if (TREE_CODE (TREE_TYPE (rhs1)) != INTEGER_TYPE)
 	    {
 	      machine_mode mode = TYPE_MODE (TREE_TYPE (rhs1));
@@ -3192,6 +3197,75 @@ adjust_bool_pattern (tree var, tree out_type, tree trueval,
 }
 
 
+/* Try to determine a proper type for converting bool VAR
+   into an integer value.  The type is chosen so that
+   conversion has the same number of elements as a mask
+   producer.  */
+
+static tree
+search_type_for_mask (tree var, loop_vec_info loop_vinfo, bb_vec_info bb_vinfo)
+{
+  gimple def_stmt;
+  enum vect_def_type dt;
+  tree def, rhs1;
+  enum tree_code rhs_code;
+  tree res = NULL;
+
+  if (TREE_CODE (var) != SSA_NAME)
+    return NULL;
+
+  if ((TYPE_PRECISION (TREE_TYPE (var)) != 1
+       || !TYPE_UNSIGNED (TREE_TYPE (var)))
+      && TREE_CODE (TREE_TYPE (var)) != BOOLEAN_TYPE)
+    return NULL;
+
+  if (!vect_is_simple_use (var, NULL, loop_vinfo, bb_vinfo, &def_stmt, &def,
+			   &dt))
+    return NULL;
+
+  if (dt != vect_internal_def)
+    return NULL;
+
+  if (!is_gimple_assign (def_stmt))
+    return NULL;
+
+  rhs_code = gimple_assign_rhs_code (def_stmt);
+  rhs1 = gimple_assign_rhs1 (def_stmt);
+
+  switch (rhs_code)
+    {
+    case SSA_NAME:
+    case BIT_NOT_EXPR:
+    CASE_CONVERT:
+      res = search_type_for_mask (rhs1, loop_vinfo, bb_vinfo);
+      break;
+
+    case BIT_AND_EXPR:
+    case BIT_IOR_EXPR:
+    case BIT_XOR_EXPR:
+      if (!(res = search_type_for_mask (rhs1, loop_vinfo, bb_vinfo)))
+	res = search_type_for_mask (gimple_assign_rhs2 (def_stmt),
+					loop_vinfo, bb_vinfo);
+      break;
+
+    default:
+      if (TREE_CODE_CLASS (rhs_code) == tcc_comparison)
+	{
+	  if (TREE_CODE (TREE_TYPE (rhs1)) != INTEGER_TYPE
+	      || !TYPE_UNSIGNED (TREE_TYPE (rhs1)))
+	    {
+	      machine_mode mode = TYPE_MODE (TREE_TYPE (rhs1));
+	      res = build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), 1);
+	    }
+	  else
+	    res = TREE_TYPE (rhs1);
+	}
+    }
+
+  return res;
+}
+
+
 /* Function vect_recog_bool_pattern
 
    Try to find pattern like following:
@@ -3249,6 +3323,7 @@ vect_recog_bool_pattern (vec<gimple> *stmts, tree *type_in,
   enum tree_code rhs_code;
   tree var, lhs, rhs, vectype;
   stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
+  stmt_vec_info new_stmt_info;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
   gimple pattern_stmt;
@@ -3274,16 +3349,43 @@ vect_recog_bool_pattern (vec<gimple> *stmts, tree *type_in,
       if (vectype == NULL_TREE)
 	return NULL;
 
-      if (!check_bool_pattern (var, loop_vinfo, bb_vinfo))
-	return NULL;
-
-      rhs = adjust_bool_pattern (var, TREE_TYPE (lhs), NULL_TREE, stmts);
-      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
-      if (useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
-	pattern_stmt = gimple_build_assign (lhs, SSA_NAME, rhs);
+      if (check_bool_pattern (var, loop_vinfo, bb_vinfo))
+	{
+	  rhs = adjust_bool_pattern (var, TREE_TYPE (lhs), NULL_TREE, stmts);
+	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+	  if (useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
+	    pattern_stmt = gimple_build_assign (lhs, SSA_NAME, rhs);
+	  else
+	    pattern_stmt
+	      = gimple_build_assign (lhs, NOP_EXPR, rhs);
+	}
       else
-	pattern_stmt
-	  = gimple_build_assign (lhs, NOP_EXPR, rhs);
+	{
+	  tree type = search_type_for_mask (var, loop_vinfo, bb_vinfo);
+	  tree cst0, cst1;
+
+	  if (!type || TYPE_MODE (type) == TYPE_MODE (TREE_TYPE (lhs)))
+	    type = TREE_TYPE (lhs);
+	  cst0 = build_int_cst (type, 0);
+	  cst1 = build_int_cst (type, 1);
+	  lhs = vect_recog_temp_ssa_var (type, NULL);
+	  pattern_stmt = gimple_build_assign (lhs, COND_EXPR, var, cst0, cst1);
+
+	  if (!useless_type_conversion_p (type, TREE_TYPE (lhs)))
+	    {
+	      tree new_vectype = get_vectype_for_scalar_type (type);
+	      new_stmt_info = new_stmt_vec_info (pattern_stmt, loop_vinfo,
+						 bb_vinfo);
+	      set_vinfo_for_stmt (pattern_stmt, new_stmt_info);
+	      STMT_VINFO_VECTYPE (new_stmt_info) = new_vectype;
+	      new_pattern_def_seq (stmt_vinfo, pattern_stmt);
+
+	      rhs = lhs;
+	      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+	      pattern_stmt = gimple_build_assign (lhs, CONVERT_EXPR, rhs);
+	    }
+	}
+
       *type_out = vectype;
       *type_in = vectype;
       stmts->safe_push (last_stmt);
@@ -3312,10 +3414,11 @@ vect_recog_bool_pattern (vec<gimple> *stmts, tree *type_in,
       if (get_vectype_for_scalar_type (type) == NULL_TREE)
 	return NULL;
 
-      if (!check_bool_pattern (var, loop_vinfo, bb_vinfo))
-	return NULL;
+      if (check_bool_pattern (var, loop_vinfo, bb_vinfo))
+	rhs = adjust_bool_pattern (var, type, NULL_TREE, stmts);
+      else
+	rhs = var;
 
-      rhs = adjust_bool_pattern (var, type, NULL_TREE, stmts);
       lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
       pattern_stmt 
 	  = gimple_build_assign (lhs, COND_EXPR,
@@ -3340,16 +3443,38 @@ vect_recog_bool_pattern (vec<gimple> *stmts, tree *type_in,
       gcc_assert (vectype != NULL_TREE);
       if (!VECTOR_MODE_P (TYPE_MODE (vectype)))
 	return NULL;
-      if (!check_bool_pattern (var, loop_vinfo, bb_vinfo))
-	return NULL;
 
-      rhs = adjust_bool_pattern (var, TREE_TYPE (vectype), NULL_TREE, stmts);
+      if (check_bool_pattern (var, loop_vinfo, bb_vinfo))
+	rhs = adjust_bool_pattern (var, TREE_TYPE (vectype),
+				   NULL_TREE, stmts);
+      else
+	{
+	  tree type = search_type_for_mask (var, loop_vinfo, bb_vinfo);
+	  tree cst0, cst1, new_vectype;
+
+	  if (!type || TYPE_MODE (type) == TYPE_MODE (TREE_TYPE (vectype)))
+	    type = TREE_TYPE (vectype);
+
+	  cst0 = build_int_cst (type, 0);
+	  cst1 = build_int_cst (type, 1);
+	  new_vectype = get_vectype_for_scalar_type (type);
+
+	  rhs = vect_recog_temp_ssa_var (type, NULL);
+	  pattern_stmt = gimple_build_assign (rhs, COND_EXPR, var, cst0, cst1);
+
+	  pattern_stmt_info = new_stmt_vec_info (pattern_stmt, loop_vinfo,
+						 bb_vinfo);
+	  set_vinfo_for_stmt (pattern_stmt, pattern_stmt_info);
+	  STMT_VINFO_VECTYPE (pattern_stmt_info) = new_vectype;
+	  append_pattern_def_seq (stmt_vinfo, pattern_stmt);
+	}
+
       lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vectype), lhs);
       if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
 	{
 	  tree rhs2 = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
 	  gimple cast_stmt = gimple_build_assign (rhs2, NOP_EXPR, rhs);
-	  new_pattern_def_seq (stmt_vinfo, cast_stmt);
+	  append_pattern_def_seq (stmt_vinfo, cast_stmt);
 	  rhs = rhs2;
 	}
       pattern_stmt = gimple_build_assign (lhs, SSA_NAME, rhs);

[-- Attachment #7: avx512-vec-bool-06-i386.patch --]
[-- Type: text/plain, Size: 22755 bytes --]

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 6a17ef4..e22aa57 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -129,6 +129,9 @@ extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
 extern void ix86_expand_vec_perm (rtx[]);
 extern bool ix86_expand_vec_perm_const (rtx[]);
+extern bool ix86_expand_mask_vec_cmp (rtx[]);
+extern bool ix86_expand_int_vec_cmp (rtx[]);
+extern bool ix86_expand_fp_vec_cmp (rtx[]);
 extern void ix86_expand_sse_unpack (rtx, rtx, bool, bool);
 extern bool ix86_expand_int_addcc (rtx[]);
 extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 070605f..d17c350 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21440,8 +21440,8 @@ ix86_expand_sse_cmp (rtx dest, enum rtx_code code, rtx cmp_op0, rtx cmp_op1,
     cmp_op1 = force_reg (cmp_ops_mode, cmp_op1);
 
   if (optimize
-      || reg_overlap_mentioned_p (dest, op_true)
-      || reg_overlap_mentioned_p (dest, op_false))
+      || (op_true && reg_overlap_mentioned_p (dest, op_true))
+      || (op_false && reg_overlap_mentioned_p (dest, op_false)))
     dest = gen_reg_rtx (maskcmp ? cmp_mode : mode);
 
   /* Compare patterns for int modes are unspec in AVX512F only.  */
@@ -21713,34 +21713,127 @@ ix86_expand_fp_movcc (rtx operands[])
   return true;
 }
 
-/* Expand a floating-point vector conditional move; a vcond operation
-   rather than a movcc operation.  */
+/* Helper for ix86_cmp_code_to_pcmp_immediate for int modes.  */
+
+static int
+ix86_int_cmp_code_to_pcmp_immediate (enum rtx_code code)
+{
+  switch (code)
+    {
+    case EQ:
+      return 0;
+    case LT:
+    case LTU:
+      return 1;
+    case LE:
+    case LEU:
+      return 2;
+    case NE:
+      return 4;
+    case GE:
+    case GEU:
+      return 5;
+    case GT:
+    case GTU:
+      return 6;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Helper for ix86_cmp_code_to_pcmp_immediate for fp modes.  */
+
+static int
+ix86_fp_cmp_code_to_pcmp_immediate (enum rtx_code code)
+{
+  switch (code)
+    {
+    case EQ:
+      return 0x08;
+    case NE:
+      return 0x04;
+    case GT:
+      return 0x16;
+    case LE:
+      return 0x1a;
+    case GE:
+      return 0x15;
+    case LT:
+      return 0x19;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return immediate value to be used in UNSPEC_PCMP
+   for comparison CODE in MODE.  */
+
+static int
+ix86_cmp_code_to_pcmp_immediate (enum rtx_code code, machine_mode mode)
+{
+  if (FLOAT_MODE_P (mode))
+    return ix86_fp_cmp_code_to_pcmp_immediate (code);
+  return ix86_int_cmp_code_to_pcmp_immediate (code);
+}
+
+/* Expand AVX-512 vector comparison.  */
 
 bool
-ix86_expand_fp_vcond (rtx operands[])
+ix86_expand_mask_vec_cmp (rtx operands[])
 {
-  enum rtx_code code = GET_CODE (operands[3]);
+  machine_mode mask_mode = GET_MODE (operands[0]);
+  machine_mode cmp_mode = GET_MODE (operands[2]);
+  enum rtx_code code = GET_CODE (operands[1]);
+  rtx imm = GEN_INT (ix86_cmp_code_to_pcmp_immediate (code, cmp_mode));
+  int unspec_code;
+  rtx unspec;
+
+  switch (code)
+    {
+    case LEU:
+    case GTU:
+    case GEU:
+    case LTU:
+      unspec_code = UNSPEC_UNSIGNED_PCMP;
+    default:
+      unspec_code = UNSPEC_PCMP;
+    }
+
+  unspec = gen_rtx_UNSPEC (mask_mode, gen_rtvec (3, operands[2],
+						 operands[3], imm),
+			   unspec_code);
+  emit_insn (gen_rtx_SET (operands[0], unspec));
+
+  return true;
+}
+
+/* Expand fp vector comparison.  */
+
+bool
+ix86_expand_fp_vec_cmp (rtx operands[])
+{
+  enum rtx_code code = GET_CODE (operands[1]);
   rtx cmp;
 
   code = ix86_prepare_sse_fp_compare_args (operands[0], code,
-					   &operands[4], &operands[5]);
+					   &operands[2], &operands[3]);
   if (code == UNKNOWN)
     {
       rtx temp;
-      switch (GET_CODE (operands[3]))
+      switch (GET_CODE (operands[1]))
 	{
 	case LTGT:
-	  temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[4],
-				      operands[5], operands[0], operands[0]);
-	  cmp = ix86_expand_sse_cmp (operands[0], NE, operands[4],
-				     operands[5], operands[1], operands[2]);
+	  temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[2],
+				      operands[3], NULL, NULL);
+	  cmp = ix86_expand_sse_cmp (operands[0], NE, operands[2],
+				     operands[3], NULL, NULL);
 	  code = AND;
 	  break;
 	case UNEQ:
-	  temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[4],
-				      operands[5], operands[0], operands[0]);
-	  cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[4],
-				     operands[5], operands[1], operands[2]);
+	  temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[2],
+				      operands[3], NULL, NULL);
+	  cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[2],
+				     operands[3], NULL, NULL);
 	  code = IOR;
 	  break;
 	default:
@@ -21748,72 +21841,26 @@ ix86_expand_fp_vcond (rtx operands[])
 	}
       cmp = expand_simple_binop (GET_MODE (cmp), code, temp, cmp, cmp, 1,
 				 OPTAB_DIRECT);
-      ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
-      return true;
     }
+  else
+    cmp = ix86_expand_sse_cmp (operands[0], code, operands[2], operands[3],
+			       operands[1], operands[2]);
 
-  if (ix86_expand_sse_fp_minmax (operands[0], code, operands[4],
-				 operands[5], operands[1], operands[2]))
-    return true;
+  if (operands[0] != cmp)
+    emit_move_insn (operands[0], cmp);
 
-  cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5],
-			     operands[1], operands[2]);
-  ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
   return true;
 }
 
-/* Expand a signed/unsigned integral vector conditional move.  */
-
-bool
-ix86_expand_int_vcond (rtx operands[])
+static rtx
+ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, rtx cop0, rtx cop1,
+			 rtx op_true, rtx op_false, bool *negate)
 {
-  machine_mode data_mode = GET_MODE (operands[0]);
-  machine_mode mode = GET_MODE (operands[4]);
-  enum rtx_code code = GET_CODE (operands[3]);
-  bool negate = false;
-  rtx x, cop0, cop1;
-
-  cop0 = operands[4];
-  cop1 = operands[5];
+  machine_mode data_mode = GET_MODE (dest);
+  machine_mode mode = GET_MODE (cop0);
+  rtx x;
 
-  /* Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
-     and x < 0 ? 1 : 0 into (unsigned) x >> 31.  */
-  if ((code == LT || code == GE)
-      && data_mode == mode
-      && cop1 == CONST0_RTX (mode)
-      && operands[1 + (code == LT)] == CONST0_RTX (data_mode)
-      && GET_MODE_UNIT_SIZE (data_mode) > 1
-      && GET_MODE_UNIT_SIZE (data_mode) <= 8
-      && (GET_MODE_SIZE (data_mode) == 16
-	  || (TARGET_AVX2 && GET_MODE_SIZE (data_mode) == 32)))
-    {
-      rtx negop = operands[2 - (code == LT)];
-      int shift = GET_MODE_UNIT_BITSIZE (data_mode) - 1;
-      if (negop == CONST1_RTX (data_mode))
-	{
-	  rtx res = expand_simple_binop (mode, LSHIFTRT, cop0, GEN_INT (shift),
-					 operands[0], 1, OPTAB_DIRECT);
-	  if (res != operands[0])
-	    emit_move_insn (operands[0], res);
-	  return true;
-	}
-      else if (GET_MODE_INNER (data_mode) != DImode
-	       && vector_all_ones_operand (negop, data_mode))
-	{
-	  rtx res = expand_simple_binop (mode, ASHIFTRT, cop0, GEN_INT (shift),
-					 operands[0], 0, OPTAB_DIRECT);
-	  if (res != operands[0])
-	    emit_move_insn (operands[0], res);
-	  return true;
-	}
-    }
-
-  if (!nonimmediate_operand (cop1, mode))
-    cop1 = force_reg (mode, cop1);
-  if (!general_operand (operands[1], data_mode))
-    operands[1] = force_reg (data_mode, operands[1]);
-  if (!general_operand (operands[2], data_mode))
-    operands[2] = force_reg (data_mode, operands[2]);
+  *negate = false;
 
   /* XOP supports all of the comparisons on all 128-bit vector int types.  */
   if (TARGET_XOP
@@ -21834,13 +21881,13 @@ ix86_expand_int_vcond (rtx operands[])
 	case LE:
 	case LEU:
 	  code = reverse_condition (code);
-	  negate = true;
+	  *negate = true;
 	  break;
 
 	case GE:
 	case GEU:
 	  code = reverse_condition (code);
-	  negate = true;
+	  *negate = true;
 	  /* FALLTHRU */
 
 	case LT:
@@ -21861,14 +21908,14 @@ ix86_expand_int_vcond (rtx operands[])
 	    case EQ:
 	      /* SSE4.1 supports EQ.  */
 	      if (!TARGET_SSE4_1)
-		return false;
+		return NULL;
 	      break;
 
 	    case GT:
 	    case GTU:
 	      /* SSE4.2 supports GT/GTU.  */
 	      if (!TARGET_SSE4_2)
-		return false;
+		return NULL;
 	      break;
 
 	    default:
@@ -21929,12 +21976,13 @@ ix86_expand_int_vcond (rtx operands[])
 	    case V8HImode:
 	      /* Perform a parallel unsigned saturating subtraction.  */
 	      x = gen_reg_rtx (mode);
-	      emit_insn (gen_rtx_SET (x, gen_rtx_US_MINUS (mode, cop0, cop1)));
+	      emit_insn (gen_rtx_SET (x, gen_rtx_US_MINUS (mode, cop0,
+							   cop1)));
 
 	      cop0 = x;
 	      cop1 = CONST0_RTX (mode);
 	      code = EQ;
-	      negate = !negate;
+	      *negate = !*negate;
 	      break;
 
 	    default:
@@ -21943,22 +21991,162 @@ ix86_expand_int_vcond (rtx operands[])
 	}
     }
 
+  if (*negate)
+    std::swap (op_true, op_false);
+
   /* Allow the comparison to be done in one mode, but the movcc to
      happen in another mode.  */
   if (data_mode == mode)
     {
-      x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
-			       operands[1+negate], operands[2-negate]);
+      x = ix86_expand_sse_cmp (dest, code, cop0, cop1,
+			       op_true, op_false);
     }
   else
     {
       gcc_assert (GET_MODE_SIZE (data_mode) == GET_MODE_SIZE (mode));
       x = ix86_expand_sse_cmp (gen_reg_rtx (mode), code, cop0, cop1,
-			       operands[1+negate], operands[2-negate]);
+			       op_true, op_false);
       if (GET_MODE (x) == mode)
 	x = gen_lowpart (data_mode, x);
     }
 
+  return x;
+}
+
+/* Expand integer vector comparison.  */
+
+bool
+ix86_expand_int_vec_cmp (rtx operands[])
+{
+  rtx_code code = GET_CODE (operands[1]);
+  bool negate = false;
+  rtx cmp = ix86_expand_int_sse_cmp (operands[0], code, operands[2],
+				     operands[3], NULL, NULL, &negate);
+
+  if (!cmp)
+    return false;
+
+  if (negate)
+    cmp = ix86_expand_int_sse_cmp (operands[0], EQ, cmp,
+				   CONST0_RTX (GET_MODE (cmp)),
+				   NULL, NULL, &negate);
+
+  gcc_assert (!negate);
+
+  if (operands[0] != cmp)
+    emit_move_insn (operands[0], cmp);
+
+  return true;
+}
+
+/* Expand a floating-point vector conditional move; a vcond operation
+   rather than a movcc operation.  */
+
+bool
+ix86_expand_fp_vcond (rtx operands[])
+{
+  enum rtx_code code = GET_CODE (operands[3]);
+  rtx cmp;
+
+  code = ix86_prepare_sse_fp_compare_args (operands[0], code,
+					   &operands[4], &operands[5]);
+  if (code == UNKNOWN)
+    {
+      rtx temp;
+      switch (GET_CODE (operands[3]))
+	{
+	case LTGT:
+	  temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[4],
+				      operands[5], operands[0], operands[0]);
+	  cmp = ix86_expand_sse_cmp (operands[0], NE, operands[4],
+				     operands[5], operands[1], operands[2]);
+	  code = AND;
+	  break;
+	case UNEQ:
+	  temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[4],
+				      operands[5], operands[0], operands[0]);
+	  cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[4],
+				     operands[5], operands[1], operands[2]);
+	  code = IOR;
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      cmp = expand_simple_binop (GET_MODE (cmp), code, temp, cmp, cmp, 1,
+				 OPTAB_DIRECT);
+      ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
+      return true;
+    }
+
+  if (ix86_expand_sse_fp_minmax (operands[0], code, operands[4],
+				 operands[5], operands[1], operands[2]))
+    return true;
+
+  cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5],
+			     operands[1], operands[2]);
+  ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
+  return true;
+}
+
+/* Expand a signed/unsigned integral vector conditional move.  */
+
+bool
+ix86_expand_int_vcond (rtx operands[])
+{
+  machine_mode data_mode = GET_MODE (operands[0]);
+  machine_mode mode = GET_MODE (operands[4]);
+  enum rtx_code code = GET_CODE (operands[3]);
+  bool negate = false;
+  rtx x, cop0, cop1;
+
+  cop0 = operands[4];
+  cop1 = operands[5];
+
+  /* Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
+     and x < 0 ? 1 : 0 into (unsigned) x >> 31.  */
+  if ((code == LT || code == GE)
+      && data_mode == mode
+      && cop1 == CONST0_RTX (mode)
+      && operands[1 + (code == LT)] == CONST0_RTX (data_mode)
+      && GET_MODE_UNIT_SIZE (data_mode) > 1
+      && GET_MODE_UNIT_SIZE (data_mode) <= 8
+      && (GET_MODE_SIZE (data_mode) == 16
+	  || (TARGET_AVX2 && GET_MODE_SIZE (data_mode) == 32)))
+    {
+      rtx negop = operands[2 - (code == LT)];
+      int shift = GET_MODE_UNIT_BITSIZE (data_mode) - 1;
+      if (negop == CONST1_RTX (data_mode))
+	{
+	  rtx res = expand_simple_binop (mode, LSHIFTRT, cop0, GEN_INT (shift),
+					 operands[0], 1, OPTAB_DIRECT);
+	  if (res != operands[0])
+	    emit_move_insn (operands[0], res);
+	  return true;
+	}
+      else if (GET_MODE_INNER (data_mode) != DImode
+	       && vector_all_ones_operand (negop, data_mode))
+	{
+	  rtx res = expand_simple_binop (mode, ASHIFTRT, cop0, GEN_INT (shift),
+					 operands[0], 0, OPTAB_DIRECT);
+	  if (res != operands[0])
+	    emit_move_insn (operands[0], res);
+	  return true;
+	}
+    }
+
+  if (!nonimmediate_operand (cop1, mode))
+    cop1 = force_reg (mode, cop1);
+  if (!general_operand (operands[1], data_mode))
+    operands[1] = force_reg (data_mode, operands[1]);
+  if (!general_operand (operands[2], data_mode))
+    operands[2] = force_reg (data_mode, operands[2]);
+
+  x = ix86_expand_int_sse_cmp (operands[0], code, cop0, cop1,
+			       operands[1], operands[2], &negate);
+
+  if (!x)
+    return false;
+
   ix86_expand_sse_movcc (operands[0], x, operands[1+negate],
 			 operands[2-negate]);
   return true;
@@ -51678,6 +51866,30 @@ ix86_autovectorize_vector_sizes (void)
     (TARGET_AVX && !TARGET_PREFER_AVX128) ? 32 | 16 : 0;
 }
 
+/* Implemenation of targetm.vectorize.get_mask_mode.  */
+
+static machine_mode
+ix86_get_mask_mode (unsigned nunits, unsigned vector_size)
+{
+  /* Scalar mask case.  */
+  if (TARGET_AVX512F && vector_size == 64)
+    {
+      unsigned elem_size = vector_size / nunits;
+      if ((vector_size == 64 || TARGET_AVX512VL)
+	  && ((elem_size == 4 || elem_size == 8)
+	      || TARGET_AVX512BW))
+	return smallest_mode_for_size (nunits, MODE_INT);
+    }
+
+  unsigned elem_size = vector_size / nunits;
+  machine_mode elem_mode
+    = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
+
+  gcc_assert (elem_size * nunits == vector_size);
+
+  return mode_for_vector (elem_mode, nunits);
+}
+
 \f
 
 /* Return class of registers which could be used for pseudo of MODE
@@ -52612,6 +52824,8 @@ ix86_operands_ok_for_move_multiple (rtx *operands, bool load,
 #undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
 #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
   ix86_autovectorize_vector_sizes
+#undef TARGET_VECTORIZE_GET_MASK_MODE
+#define TARGET_VECTORIZE_GET_MASK_MODE ix86_get_mask_mode
 #undef TARGET_VECTORIZE_INIT_COST
 #define TARGET_VECTORIZE_INIT_COST ix86_init_cost
 #undef TARGET_VECTORIZE_ADD_STMT_COST
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4535570..a8d55cc 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -605,6 +605,15 @@
    (V16SF "HI") (V8SF  "QI") (V4SF  "QI")
    (V8DF  "QI") (V4DF  "QI") (V2DF  "QI")])
 
+;; Mapping of vector modes to corresponding mask size
+(define_mode_attr avx512fmaskmodelower
+  [(V64QI "di") (V32QI "si") (V16QI "hi")
+   (V32HI "si") (V16HI "hi") (V8HI  "qi") (V4HI "qi")
+   (V16SI "hi") (V8SI  "qi") (V4SI  "qi")
+   (V8DI  "qi") (V4DI  "qi") (V2DI  "qi")
+   (V16SF "hi") (V8SF  "qi") (V4SF  "qi")
+   (V8DF  "qi") (V4DF  "qi") (V2DF  "qi")])
+
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr sseintvecmode
   [(V16SF "V16SI") (V8DF  "V8DI")
@@ -2803,6 +2812,150 @@
 		      (const_string "0")))
    (set_attr "mode" "<MODE>")])
 
+(define_expand "vec_cmp<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:V48_AVX512VL 2 "register_operand")
+	   (match_operand:V48_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512F"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:VI12_AVX512VL 2 "register_operand")
+	   (match_operand:VI12_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512BW"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI_256 2 "register_operand")
+	   (match_operand:VI_256 3 "nonimmediate_operand")]))]
+  "TARGET_AVX2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI124_128 2 "register_operand")
+	   (match_operand:VI124_128 3 "nonimmediate_operand")]))]
+  "TARGET_SSE2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpv2div2di"
+  [(set (match_operand:V2DI 0 "register_operand")
+	(match_operator:V2DI 1 ""
+	  [(match_operand:V2DI 2 "register_operand")
+	   (match_operand:V2DI 3 "nonimmediate_operand")]))]
+  "TARGET_SSE4_2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VF_256 2 "register_operand")
+	   (match_operand:VF_256 3 "nonimmediate_operand")]))]
+  "TARGET_AVX"
+{
+  bool ok = ix86_expand_fp_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VF_128 2 "register_operand")
+	   (match_operand:VF_128 3 "nonimmediate_operand")]))]
+  "TARGET_SSE"
+{
+  bool ok = ix86_expand_fp_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:VI48_AVX512VL 2 "register_operand")
+	   (match_operand:VI48_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512F"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:VI12_AVX512VL 2 "register_operand")
+	   (match_operand:VI12_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512BW"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI_256 2 "register_operand")
+	   (match_operand:VI_256 3 "nonimmediate_operand")]))]
+  "TARGET_AVX2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI124_128 2 "register_operand")
+	   (match_operand:VI124_128 3 "nonimmediate_operand")]))]
+  "TARGET_SSE2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpuv2div2di"
+  [(set (match_operand:V2DI 0 "register_operand")
+	(match_operator:V2DI 1 ""
+	  [(match_operand:V2DI 2 "register_operand")
+	   (match_operand:V2DI 3 "nonimmediate_operand")]))]
+  "TARGET_SSE4_2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
 (define_expand "vcond<V_512:mode><VF_512:mode>"
   [(set (match_operand:V_512 0 "register_operand")
 	(if_then_else:V_512
@@ -17895,7 +18048,7 @@
    (set_attr "btver2_decode" "vector") 
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_expand "maskload<mode>"
+(define_expand "maskload<mode><sseintvecmodelower>"
   [(set (match_operand:V48_AVX2 0 "register_operand")
 	(unspec:V48_AVX2
 	  [(match_operand:<sseintvecmode> 2 "register_operand")
@@ -17903,7 +18056,23 @@
 	  UNSPEC_MASKMOV))]
   "TARGET_AVX")
 
-(define_expand "maskstore<mode>"
+(define_expand "maskload<mode><avx512fmaskmodelower>"
+  [(set (match_operand:V48_AVX512VL 0 "register_operand")
+	(vec_merge:V48_AVX512VL
+	  (match_operand:V48_AVX512VL 1 "memory_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512F")
+
+(define_expand "maskload<mode><avx512fmaskmodelower>"
+  [(set (match_operand:VI12_AVX512VL 0 "register_operand")
+	(vec_merge:VI12_AVX512VL
+	  (match_operand:VI12_AVX512VL 1 "memory_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512BW")
+
+(define_expand "maskstore<mode><sseintvecmodelower>"
   [(set (match_operand:V48_AVX2 0 "memory_operand")
 	(unspec:V48_AVX2
 	  [(match_operand:<sseintvecmode> 2 "register_operand")
@@ -17912,6 +18081,22 @@
 	  UNSPEC_MASKMOV))]
   "TARGET_AVX")
 
+(define_expand "maskstore<mode><avx512fmaskmodelower>"
+  [(set (match_operand:V48_AVX512VL 0 "memory_operand")
+	(vec_merge:V48_AVX512VL
+	  (match_operand:V48_AVX512VL 1 "register_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512F")
+
+(define_expand "maskstore<mode><avx512fmaskmodelower>"
+  [(set (match_operand:VI12_AVX512VL 0 "memory_operand")
+	(vec_merge:VI12_AVX512VL
+	  (match_operand:VI12_AVX512VL 1 "register_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512BW")
+
 (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
   [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
 	(unspec:AVX256MODE2P

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-15 13:55                           ` Ilya Enkovich
@ 2015-09-17 17:54                             ` Richard Henderson
  2015-09-18 13:26                               ` Ilya Enkovich
  2015-09-18 12:45                             ` Richard Biener
  1 sibling, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2015-09-17 17:54 UTC (permalink / raw)
  To: Ilya Enkovich, Jeff Law; +Cc: Richard Biener, GCC Patches

On 09/15/2015 06:52 AM, Ilya Enkovich wrote:
> I made a step forward forcing vector comparisons have a mask (vec<bool>) result and disabling bool patterns in case vector comparison is supported by target.  Several issues were met.
> 
>  - c/c++ front-ends generate vector comparison with integer vector result.  I had to make some modifications to use vec_cond instead.  Don't know if there are other front-ends producing vector comparisons.
>  - vector lowering fails to expand vector masks due to mismatch of type and mode sizes.  I fixed vector type size computation to match mode size and added a special handling of mask expand.
>  - I disabled canonical type creation for vector mask because we can't layout it with VOID mode. I don't know why we may need a canonical type here.  But get_mask_mode call may be moved into type layout to get it.
>  - Expand of vec<bool> constants/contstructors requires special handling.  Common case should require target hooks/optabs to expand vector into required mode.  But I suppose we want to have a generic code to handle vector of int mode case to avoid modification of multiple targets which use default vec<bool> modes.
> 
> Currently 'make check' shows two types of regression.
>   - missed vector expression pattern recongnition (MIN, MAX, ABX, VEC_COND).  This must be due to my front-end changes.  Hope it will be easy to fix.
>   - missed vectorization. All of them appear due to bool patterns disabling.  I didn't look into all of them but it seems the main problem is in mixed type sizes.  With bool patterns and integer vector masks we just put int->(other sized int) conversion for masks and it gives us required mask transformation.  With boolean mask we don't have a proper scalar statements to do that.  I think mask widening/narrowing may be directly supported in masked statements vectorization.  Going to look into it.
> 
> I attach what I currently have for a prototype.  It grows bigger so I split into several parts.

The general approach looks good.


> +/* By defaults a vector of integers is used as a mask.  */
> +
> +machine_mode
> +default_get_mask_mode (unsigned nunits, unsigned vector_size)
> +{
> +  unsigned elem_size = vector_size / nunits;
> +  machine_mode elem_mode
> +    = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);

Why these arguments as opposed to passing elem_size?  It seems that every hook
is going to have to do this division...

> +#define VECTOR_MASK_TYPE_P(TYPE)				\
> +  (TREE_CODE (TYPE) == VECTOR_TYPE			\
> +   && TREE_CODE (TREE_TYPE (TYPE)) == BOOLEAN_TYPE)

Perhaps better as VECTOR_BOOLEAN_TYPE_P, since that's exactly what's being tested?

> @@ -3464,10 +3464,10 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>            return true;
>          }
>      }
> -  /* Or an integer vector type with the same size and element count
> +  /* Or a boolean vector type with the same element count
>       as the comparison operand types.  */
>    else if (TREE_CODE (type) == VECTOR_TYPE
> -	   && TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
> +	   && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)

VECTOR_BOOLEAN_TYPE_P.

> @@ -122,7 +122,19 @@ tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
>  		  tree t, tree bitsize, tree bitpos)
>  {
>    if (bitpos)
> -    return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
> +    {
> +      if (TREE_CODE (type) == BOOLEAN_TYPE)
> +	{
> +	  tree itype
> +	    = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 0);
> +	  tree field = gimplify_build3 (gsi, BIT_FIELD_REF, itype, t,
> +					bitsize, bitpos);
> +	  return gimplify_build2 (gsi, NE_EXPR, type, field,
> +				  build_zero_cst (itype));
> +	}
> +      else
> +	return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
> +    }
>    else
>      return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
>  }

So... this is us lowering vector operations on a target that doesn't support
them.  Which means that we get to decide what value is produced for a
comparison?  Which means that we can settle on the "normal" -1, correct?

Which means that we ought not need to extract the entire element and then
compare for non-zero, but rather simply extract a single bit from the element,
and directly call that a boolean result, correct?

I assume you tested all this code with -mno-sse or equivalent arch default?

> @@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt)
>    create_output_operand (&ops[0], target, TYPE_MODE (type));
>    create_fixed_operand (&ops[1], mem);
>    create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
> -  expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
> +  expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type),
> +				      TYPE_MODE (TREE_TYPE (maskt))),
> +	       3, ops);

Why do we now need a conversion here?

> +      if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE ((op0)))) != MODE_VECTOR_INT)
> +	{
> +	  /* This is a vcond with mask.  To be supported soon...  */
> +	  gcc_unreachable ();
> +	}

Leave this out til we need it?  I can't see that you replace this later in the
patch series...


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-03 14:12                               ` Ilya Enkovich
@ 2015-09-18 12:29                                 ` Richard Biener
  2015-09-18 13:44                                   ` Ilya Enkovich
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2015-09-18 12:29 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: Jeff Law, gcc-patches

On Thu, Sep 3, 2015 at 3:57 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2015-09-03 15:11 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> Adding CCs.
>>>
>>> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
>>>> 2015-09-01 17:25 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>
>>>> Totally disabling old style vector comparison and bool pattern is a
>>>> goal but doing hat would mean a lot of regressions for many targets.
>>>> Do you want to it to be tried to estimate amount of changes required
>>>> and reveal possible issues? What would be integration plan for these
>>>> changes? Do you want to just introduce new vector<bool> in GIMPLE
>>>> disabling bool patterns and then resolving vectorization regression on
>>>> all targets or allow them live together with following target switch
>>>> one by one from bool patterns with finally removing them? Not all
>>>> targets are likely to be adopted fast I suppose.
>>
>> Well, the frontends already create vec_cond exprs I believe.  So for
>> bool patterns the vectorizer would have to do the same, but the
>> comparison result in there would still use vec<bool>.  Thus the scalar
>>
>>  _Bool a = b < c;
>>  _Bool c = a || d;
>>  if (c)
>>
>> would become
>>
>>  vec<int> a = VEC_COND <a < b ? -1 : 0>;
>>  vec<int> c = a | d;
>
> This should be identical to
>
> vec<_Bool> a = a < b;
> vec<_Bool> c = a | d;
>
> where vec<_Bool> has VxSI mode. And we should prefer it in case target
> supports vector comparison into vec<bool>, right?
>
>>
>> when the target does not have vec<bool>s directly and otherwise
>> vec<boo> directly (dropping the VEC_COND).
>>
>> Just the vector comparison inside the VEC_COND would always
>> have vec<bool> type.
>
> I don't really understand what you mean by 'doesn't have vec<bool>s
> dirrectly' here. Currently I have a hook to ask for a vec<bool> mode
> and assume target doesn't support it in case it returns VOIDmode. But
> in such case I have no mode to use for vec<bool> inside VEC_COND
> either.

I was thinking about targets not supporting generating vec<bool>
(of whatever mode) from a comparison directly but only via
a COND_EXPR.

> In default implementation of the new target hook I always return
> integer vector mode (to have default behavior similar to the current
> one). It should allow me to use vec<bool> for conditions in all
> vec_cond. But we'd need some other trigger for bool patterns to apply.
> Probably check vec_cmp optab in check_bool_pattern and don't convert
> in case comparison is supported by target? Or control it via
> additional hook.

Not sure if we are always talking about the same thing for
"bool patterns".  I'd remove bool patterns completely, IMHO
they are not necessary at all.

>>
>> And the "bool patterns" I am talking about are those in
>> tree-vect-patterns.c, not any targets instruction patterns.
>
> I refer to them also. BTW bool patterns also pull comparison into
> vec_cond. Thus we cannot have SSA_NAME in vec_cond as a condition. I
> think with vector comparisons in place we should allow SSA_NAME as
> conditions in VEC_COND for better CSE. That should require new vcond
> optabs though.

I think we do allow this, just the vectorizer doesn't expect it.  In the long
run I want to get rid of the GENERIC exprs in both COND_EXPR and
VEC_COND_EXPR.  Just didn't have the time to do this...

Richard.

> Ilya
>
>>
>> Richard.
>>
>>>>
>>>> Ilya

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-15 13:55                           ` Ilya Enkovich
  2015-09-17 17:54                             ` Richard Henderson
@ 2015-09-18 12:45                             ` Richard Biener
  2015-09-18 13:55                               ` Ilya Enkovich
  1 sibling, 1 reply; 48+ messages in thread
From: Richard Biener @ 2015-09-18 12:45 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: Jeff Law, GCC Patches

On Tue, Sep 15, 2015 at 3:52 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> On 08 Sep 15:37, Ilya Enkovich wrote:
>> 2015-09-04 23:42 GMT+03:00 Jeff Law <law@redhat.com>:
>> >
>> > So do we have enough confidence in this representation that we want to go
>> > ahead and commit to it?
>>
>> I think new representation fits nice mostly. There are some places
>> where I have to make some exceptions for vector of bools to make it
>> work. This is mostly to avoid target modifications. I'd like to avoid
>> necessity to change all targets currently supporting vec_cond. It
>> makes me add some special handling of vec<bool> in GIMPLE, e.g. I add
>> a special code in vect_init_vector to build vec<bool> invariants with
>> proper casting to int. Otherwise I'd need to do it on a target side.
>>
>> I made several fixes and current patch (still allowing integer vector
>> result for vector comparison and applying bool patterns) passes
>> bootstrap and regression testing on x86_64. Now I'll try to fully
>> switch to vec<bool> and see how it goes.
>>
>> Thanks,
>> Ilya
>>
>
> Hi,
>
> I made a step forward forcing vector comparisons have a mask (vec<bool>) result and disabling bool patterns in case vector comparison is supported by target.  Several issues were met.
>
>  - c/c++ front-ends generate vector comparison with integer vector result.  I had to make some modifications to use vec_cond instead.  Don't know if there are other front-ends producing vector comparisons.
>  - vector lowering fails to expand vector masks due to mismatch of type and mode sizes.  I fixed vector type size computation to match mode size and added a special handling of mask expand.
>  - I disabled canonical type creation for vector mask because we can't layout it with VOID mode. I don't know why we may need a canonical type here.  But get_mask_mode call may be moved into type layout to get it.
>  - Expand of vec<bool> constants/contstructors requires special handling.  Common case should require target hooks/optabs to expand vector into required mode.  But I suppose we want to have a generic code to handle vector of int mode case to avoid modification of multiple targets which use default vec<bool> modes.

One complication you might run into currently is that at the moment we
require the comparison result to be
of the same size as the comparison operands.  This means that
vector<bool> with, say, 4 elements has
to support different modes for v4si < v4si vs. v4df < v4df (if you
think of x86 with its multiple vector sizes).
That's for the "fallback" non-mask vector<bool> only of course.  Does
that mean we have to use different
bool types with different modes here?

So the other possibility is to never expose the fallback vector<bool>
anywhere but make sure to lower to
vector<int> via VEC_COND_EXPRs.  After all it's only the vectorizer
that should create stmts with
vector<bool> LHS and the vectorizer is already careful to only
generate code supported by the target.

> Currently 'make check' shows two types of regression.
>   - missed vector expression pattern recongnition (MIN, MAX, ABX, VEC_COND).  This must be due to my front-end changes.  Hope it will be easy to fix.
>   - missed vectorization. All of them appear due to bool patterns disabling.  I didn't look into all of them but it seems the main problem is in mixed type sizes.  With bool patterns and integer vector masks we just put int->(other sized int) conversion for masks and it gives us required mask transformation.  With boolean mask we don't have a proper scalar statements to do that.  I think mask widening/narrowing may be directly supported in masked statements vectorization.  Going to look into it.
>
> I attach what I currently have for a prototype.  It grows bigger so I split into several parts.
>
> Thanks,
> Ilya
> --
> * avx512-vec-bool-01-add-truth-vector.ChangeLog
>
> 2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>
>
>         * doc/tm.texi: Regenerated.
>         * doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New.
>         * stor-layout.c (layout_type): Use mode to get vector mask size.
>         (vector_type_mode): Likewise.
>         * target.def (get_mask_mode): New.
>         * targhooks.c (default_vector_alignment): Use mode alignment
>         for vector masks.
>         (default_get_mask_mode): New.
>         * targhooks.h (default_get_mask_mode): New.
>         * tree.c (make_vector_type): Vector mask has no canonical type.
>         (build_truth_vector_type): New.
>         (build_same_sized_truth_vector_type): New.
>         (truth_type_for): Support vector masks.
>         * tree.h (VECTOR_MASK_TYPE_P): New.
>         (build_truth_vector_type): New.
>         (build_same_sized_truth_vector_type): New.
>
> * avx512-vec-bool-02-no-int-vec-cmp.ChangeLog
>
> gcc/
>
> 2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>
>
>         * tree-cfg.c (verify_gimple_comparison) Require vector mask
>         type for vector comparison.
>         (verify_gimple_assign_ternary): Likewise.
>
> gcc/c
>
> 2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>
>
>         * c-typeck.c (build_conditional_expr): Use vector mask
>         type for vector comparison.
>         (build_vec_cmp): New.
>         (build_binary_op): Use build_vec_cmp for comparison.
>
> gcc/cp
>
> 2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>
>
>         * call.c (build_conditional_expr_1): Use vector mask
>         type for vector comparison.
>         * typeck.c (build_vec_cmp): New.
>         (cp_build_binary_op): Use build_vec_cmp for comparison.
>
> * avx512-vec-bool-03-vec-lower.ChangeLog
>
> 2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>
>
>         * tree-vect-generic.c (tree_vec_extract): Use additional
>         comparison when extracting boolean value.
>         (do_bool_compare): New.
>         (expand_vector_comparison): Add casts for vector mask.
>         (expand_vector_divmod): Use vector mask type for vector
>         comparison.
>         (expand_vector_operations_1) Skip scalar mode mask statements.
>
> * avx512-vec-bool-04-vectorize.ChangeLog
>
> gcc/
>
> 2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>
>
>         * expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results.
>         (const_vector_mask_from_tree): New.
>         (const_vector_from_tree): Use const_vector_mask_from_tree for vector
>         masks.
>         * internal-fn.c (expand_MASK_LOAD): Adjust to optab changes.
>         (expand_MASK_STORE): Likewise.
>         * optabs.c (vector_compare_rtx): Add OPNO arg.
>         (expand_vec_cond_expr): Adjust to vector_compare_rtx change.
>         (get_vec_cmp_icode): New.
>         (expand_vec_cmp_expr_p): New.
>         (expand_vec_cmp_expr): New.
>         (can_vec_mask_load_store_p): Add MASK_MODE arg.
>         * optabs.def (vec_cmp_optab): New.
>         (vec_cmpu_optab): New.
>         (maskload_optab): Transform into convert optab.
>         (maskstore_optab): Likewise.
>         * optabs.h (expand_vec_cmp_expr_p): New.
>         (expand_vec_cmp_expr): New.
>         (can_vec_mask_load_store_p): Add MASK_MODE arg.
>         * tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to
>         can_vec_mask_load_store_p signature change.
>         (predicate_mem_writes): Use boolean mask.
>         * tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var.
>         (vect_create_destination_var): Likewise.
>         * tree-vect-loop.c (vect_determine_vectorization_factor): Ignore mask
>         operations for VF.  Add mask type computation.
>         * tree-vect-stmts.c (vect_init_vector): Support mask invariants.
>         (vect_get_vec_def_for_operand): Support mask constant.
>         (vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p
>         signature change.
>         (vectorizable_condition): Use vector mask type for vector comparison.
>         (vectorizable_comparison): New.
>         (vect_analyze_stmt): Add vectorizable_comparison.
>         (vect_transform_stmt): Likewise.
>         (get_mask_type_for_scalar_type): New.
>         * tree-vectorizer.h (enum stmt_vec_info_type): Add vect_mask_var
>         (enum stmt_vec_info_type): Add comparison_vec_info_type.
>         (get_mask_type_for_scalar_type): New.
>
> * avx512-vec-bool-05-bool-patterns.ChangeLog
>
> 2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>
>
>         * tree-vect-patterns.c (check_bool_pattern): Check fails
>         if we can vectorize comparison directly.
>         (search_type_for_mask): New.
>         (vect_recog_bool_pattern): Support cases when bool pattern
>         check fails.
>
> * avx512-vec-bool-06-i386.ChangeLog
>
> 2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>
>
>         * config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New.
>         (ix86_expand_int_vec_cmp): New.
>         (ix86_expand_fp_vec_cmp): New.
>         * config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for
>         op_true and op_false.
>         (ix86_int_cmp_code_to_pcmp_immediate): New.
>         (ix86_fp_cmp_code_to_pcmp_immediate): New.
>         (ix86_cmp_code_to_pcmp_immediate): New.
>         (ix86_expand_mask_vec_cmp): New.
>         (ix86_expand_fp_vec_cmp): New.
>         (ix86_expand_int_sse_cmp): New.
>         (ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp.
>         (ix86_expand_int_vec_cmp): New.
>         (ix86_get_mask_mode): New.
>         (TARGET_VECTORIZE_GET_MASK_MODE): New.
>         * config/i386/sse.md (avx512fmaskmodelower): New.
>         (vec_cmp<mode><avx512fmaskmodelower>): New.
>         (vec_cmp<mode><sseintvecmodelower>): New.
>         (vec_cmpv2div2di): New.
>         (vec_cmpu<mode><avx512fmaskmodelower>): New.
>         (vec_cmpu<mode><sseintvecmodelower>): New.
>         (vec_cmpuv2div2di): New.
>         (maskload<mode>): Rename to ...
>         (maskload<mode><sseintvecmodelower>): ... this.
>         (maskstore<mode>): Rename to ...
>         (maskstore<mode><sseintvecmodelower>): ... this.
>         (maskload<mode><avx512fmaskmodelower>): New.
>         (maskstore<mode><avx512fmaskmodelower>): New.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-17 17:54                             ` Richard Henderson
@ 2015-09-18 13:26                               ` Ilya Enkovich
  2015-09-18 16:58                                 ` Richard Henderson
  0 siblings, 1 reply; 48+ messages in thread
From: Ilya Enkovich @ 2015-09-18 13:26 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jeff Law, Richard Biener, GCC Patches

2015-09-17 20:35 GMT+03:00 Richard Henderson <rth@redhat.com>:
> On 09/15/2015 06:52 AM, Ilya Enkovich wrote:
>> I made a step forward forcing vector comparisons have a mask (vec<bool>) result and disabling bool patterns in case vector comparison is supported by target.  Several issues were met.
>>
>>  - c/c++ front-ends generate vector comparison with integer vector result.  I had to make some modifications to use vec_cond instead.  Don't know if there are other front-ends producing vector comparisons.
>>  - vector lowering fails to expand vector masks due to mismatch of type and mode sizes.  I fixed vector type size computation to match mode size and added a special handling of mask expand.
>>  - I disabled canonical type creation for vector mask because we can't layout it with VOID mode. I don't know why we may need a canonical type here.  But get_mask_mode call may be moved into type layout to get it.
>>  - Expand of vec<bool> constants/contstructors requires special handling.  Common case should require target hooks/optabs to expand vector into required mode.  But I suppose we want to have a generic code to handle vector of int mode case to avoid modification of multiple targets which use default vec<bool> modes.
>>
>> Currently 'make check' shows two types of regression.
>>   - missed vector expression pattern recongnition (MIN, MAX, ABX, VEC_COND).  This must be due to my front-end changes.  Hope it will be easy to fix.
>>   - missed vectorization. All of them appear due to bool patterns disabling.  I didn't look into all of them but it seems the main problem is in mixed type sizes.  With bool patterns and integer vector masks we just put int->(other sized int) conversion for masks and it gives us required mask transformation.  With boolean mask we don't have a proper scalar statements to do that.  I think mask widening/narrowing may be directly supported in masked statements vectorization.  Going to look into it.
>>
>> I attach what I currently have for a prototype.  It grows bigger so I split into several parts.
>
> The general approach looks good.
>

Great!

>
>> +/* By defaults a vector of integers is used as a mask.  */
>> +
>> +machine_mode
>> +default_get_mask_mode (unsigned nunits, unsigned vector_size)
>> +{
>> +  unsigned elem_size = vector_size / nunits;
>> +  machine_mode elem_mode
>> +    = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
>
> Why these arguments as opposed to passing elem_size?  It seems that every hook
> is going to have to do this division...

Every target would have nunits = vector_size / elem_size because
nunits is used to create a vector mode. Thus no difference.

>
>> +#define VECTOR_MASK_TYPE_P(TYPE)                             \
>> +  (TREE_CODE (TYPE) == VECTOR_TYPE                   \
>> +   && TREE_CODE (TREE_TYPE (TYPE)) == BOOLEAN_TYPE)
>
> Perhaps better as VECTOR_BOOLEAN_TYPE_P, since that's exactly what's being tested?

OK

>
>> @@ -3464,10 +3464,10 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>>            return true;
>>          }
>>      }
>> -  /* Or an integer vector type with the same size and element count
>> +  /* Or a boolean vector type with the same element count
>>       as the comparison operand types.  */
>>    else if (TREE_CODE (type) == VECTOR_TYPE
>> -        && TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
>> +        && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
>
> VECTOR_BOOLEAN_TYPE_P.
>
>> @@ -122,7 +122,19 @@ tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
>>                 tree t, tree bitsize, tree bitpos)
>>  {
>>    if (bitpos)
>> -    return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
>> +    {
>> +      if (TREE_CODE (type) == BOOLEAN_TYPE)
>> +     {
>> +       tree itype
>> +         = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 0);
>> +       tree field = gimplify_build3 (gsi, BIT_FIELD_REF, itype, t,
>> +                                     bitsize, bitpos);
>> +       return gimplify_build2 (gsi, NE_EXPR, type, field,
>> +                               build_zero_cst (itype));
>> +     }
>> +      else
>> +     return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
>> +    }
>>    else
>>      return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
>>  }
>
> So... this is us lowering vector operations on a target that doesn't support
> them.  Which means that we get to decide what value is produced for a
> comparison?  Which means that we can settle on the "normal" -1, correct?
>
> Which means that we ought not need to extract the entire element and then
> compare for non-zero, but rather simply extract a single bit from the element,
> and directly call that a boolean result, correct?

Didn't think about that. I'll give it a try.

>
> I assume you tested all this code with -mno-sse or equivalent arch default?

I didn't make some special runs for that. Just used regression testing
which seems to have such tests.

>
>> @@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt)
>>    create_output_operand (&ops[0], target, TYPE_MODE (type));
>>    create_fixed_operand (&ops[1], mem);
>>    create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
>> -  expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
>> +  expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type),
>> +                                   TYPE_MODE (TREE_TYPE (maskt))),
>> +            3, ops);
>
> Why do we now need a conversion here?

Mask mode was implicit for masked loads and stores. Now it becomes
explicit because we may load the same value using different masks.
E.g. for i386 we may load 256bit vector using both vector and scalar
masks.

>
>> +      if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE ((op0)))) != MODE_VECTOR_INT)
>> +     {
>> +       /* This is a vcond with mask.  To be supported soon...  */
>> +       gcc_unreachable ();
>> +     }
>
> Leave this out til we need it?  I can't see that you replace this later in the
> patch series...

Currently we just shouldn't generate such vec_cond. But later I want
to enable such statements generation and this will need to handle non
vector masks. I don't have it in my patches set yet. But it is not
finished and have lots of unfinished work. I don't expect detailed
review for these patches. Once we make a decision that this
representation works fine and we want to have, I'll address opened
issues and send it a separate series.

Thanks,
Ilya

>
>
> r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-18 12:29                                 ` Richard Biener
@ 2015-09-18 13:44                                   ` Ilya Enkovich
  2015-09-23 13:46                                     ` Ilya Enkovich
  2015-09-23 13:50                                     ` Richard Biener
  0 siblings, 2 replies; 48+ messages in thread
From: Ilya Enkovich @ 2015-09-18 13:44 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, gcc-patches

2015-09-18 15:22 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Thu, Sep 3, 2015 at 3:57 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> 2015-09-03 15:11 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>> Adding CCs.
>>>>
>>>> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
>>>>> 2015-09-01 17:25 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>
>>>>> Totally disabling old style vector comparison and bool pattern is a
>>>>> goal but doing hat would mean a lot of regressions for many targets.
>>>>> Do you want to it to be tried to estimate amount of changes required
>>>>> and reveal possible issues? What would be integration plan for these
>>>>> changes? Do you want to just introduce new vector<bool> in GIMPLE
>>>>> disabling bool patterns and then resolving vectorization regression on
>>>>> all targets or allow them live together with following target switch
>>>>> one by one from bool patterns with finally removing them? Not all
>>>>> targets are likely to be adopted fast I suppose.
>>>
>>> Well, the frontends already create vec_cond exprs I believe.  So for
>>> bool patterns the vectorizer would have to do the same, but the
>>> comparison result in there would still use vec<bool>.  Thus the scalar
>>>
>>>  _Bool a = b < c;
>>>  _Bool c = a || d;
>>>  if (c)
>>>
>>> would become
>>>
>>>  vec<int> a = VEC_COND <a < b ? -1 : 0>;
>>>  vec<int> c = a | d;
>>
>> This should be identical to
>>
>> vec<_Bool> a = a < b;
>> vec<_Bool> c = a | d;
>>
>> where vec<_Bool> has VxSI mode. And we should prefer it in case target
>> supports vector comparison into vec<bool>, right?
>>
>>>
>>> when the target does not have vec<bool>s directly and otherwise
>>> vec<boo> directly (dropping the VEC_COND).
>>>
>>> Just the vector comparison inside the VEC_COND would always
>>> have vec<bool> type.
>>
>> I don't really understand what you mean by 'doesn't have vec<bool>s
>> dirrectly' here. Currently I have a hook to ask for a vec<bool> mode
>> and assume target doesn't support it in case it returns VOIDmode. But
>> in such case I have no mode to use for vec<bool> inside VEC_COND
>> either.
>
> I was thinking about targets not supporting generating vec<bool>
> (of whatever mode) from a comparison directly but only via
> a COND_EXPR.

Where may these direct comparisons come from? Vectorizer never
generates unsupported statements. It means we get them from
gimplifier? So touch optabs in gimplifier to avoid direct comparisons?
Actually vect lowering checks if we are able to make comparison and
expand also uses vec_cond to expand vector comparison, so probably we
may live with them.

>
>> In default implementation of the new target hook I always return
>> integer vector mode (to have default behavior similar to the current
>> one). It should allow me to use vec<bool> for conditions in all
>> vec_cond. But we'd need some other trigger for bool patterns to apply.
>> Probably check vec_cmp optab in check_bool_pattern and don't convert
>> in case comparison is supported by target? Or control it via
>> additional hook.
>
> Not sure if we are always talking about the same thing for
> "bool patterns".  I'd remove bool patterns completely, IMHO
> they are not necessary at all.

I refer to transformations made by vect_recog_bool_pattern. Don't see
how to remove them completely for targets not supporting comparison
vectorization.

>
>>>
>>> And the "bool patterns" I am talking about are those in
>>> tree-vect-patterns.c, not any targets instruction patterns.
>>
>> I refer to them also. BTW bool patterns also pull comparison into
>> vec_cond. Thus we cannot have SSA_NAME in vec_cond as a condition. I
>> think with vector comparisons in place we should allow SSA_NAME as
>> conditions in VEC_COND for better CSE. That should require new vcond
>> optabs though.
>
> I think we do allow this, just the vectorizer doesn't expect it.  In the long
> run I want to get rid of the GENERIC exprs in both COND_EXPR and
> VEC_COND_EXPR.  Just didn't have the time to do this...

That would be nice. As a first step I'd like to support optabs for
VEC_COND_EXPR directly using vec<bool>.

Thanks,
Ilya

>
> Richard.
>
>> Ilya
>>
>>>
>>> Richard.
>>>
>>>>>
>>>>> Ilya

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-18 12:45                             ` Richard Biener
@ 2015-09-18 13:55                               ` Ilya Enkovich
  0 siblings, 0 replies; 48+ messages in thread
From: Ilya Enkovich @ 2015-09-18 13:55 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, GCC Patches

2015-09-18 15:29 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Sep 15, 2015 at 3:52 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> On 08 Sep 15:37, Ilya Enkovich wrote:
>>> 2015-09-04 23:42 GMT+03:00 Jeff Law <law@redhat.com>:
>>> >
>>> > So do we have enough confidence in this representation that we want to go
>>> > ahead and commit to it?
>>>
>>> I think new representation fits nice mostly. There are some places
>>> where I have to make some exceptions for vector of bools to make it
>>> work. This is mostly to avoid target modifications. I'd like to avoid
>>> necessity to change all targets currently supporting vec_cond. It
>>> makes me add some special handling of vec<bool> in GIMPLE, e.g. I add
>>> a special code in vect_init_vector to build vec<bool> invariants with
>>> proper casting to int. Otherwise I'd need to do it on a target side.
>>>
>>> I made several fixes and current patch (still allowing integer vector
>>> result for vector comparison and applying bool patterns) passes
>>> bootstrap and regression testing on x86_64. Now I'll try to fully
>>> switch to vec<bool> and see how it goes.
>>>
>>> Thanks,
>>> Ilya
>>>
>>
>> Hi,
>>
>> I made a step forward forcing vector comparisons have a mask (vec<bool>) result and disabling bool patterns in case vector comparison is supported by target.  Several issues were met.
>>
>>  - c/c++ front-ends generate vector comparison with integer vector result.  I had to make some modifications to use vec_cond instead.  Don't know if there are other front-ends producing vector comparisons.
>>  - vector lowering fails to expand vector masks due to mismatch of type and mode sizes.  I fixed vector type size computation to match mode size and added a special handling of mask expand.
>>  - I disabled canonical type creation for vector mask because we can't layout it with VOID mode. I don't know why we may need a canonical type here.  But get_mask_mode call may be moved into type layout to get it.
>>  - Expand of vec<bool> constants/contstructors requires special handling.  Common case should require target hooks/optabs to expand vector into required mode.  But I suppose we want to have a generic code to handle vector of int mode case to avoid modification of multiple targets which use default vec<bool> modes.
>
> One complication you might run into currently is that at the moment we
> require the comparison result to be
> of the same size as the comparison operands.  This means that
> vector<bool> with, say, 4 elements has
> to support different modes for v4si < v4si vs. v4df < v4df (if you
> think of x86 with its multiple vector sizes).
> That's for the "fallback" non-mask vector<bool> only of course.  Does
> that mean we have to use different
> bool types with different modes here?

I though about boolean types with different sizes/modes. I still avoid
them but it causes some ugliness. E.g. sizeof(innertype)*nelems !=
sizeof(vectortype) for vec<bool>. I causes some special handling in
type layout and problems in lowering because BIT_FIELD_REF uses more
bits than resulting type has. I use additional comparison to handle
it. Richard also proposed to extract one bit only for bools. Don't
know if differently sized boolean types may help to resolve this issue
or create more problems.

>
> So the other possibility is to never expose the fallback vector<bool>
> anywhere but make sure to lower to
> vector<int> via VEC_COND_EXPRs.  After all it's only the vectorizer
> that should create stmts with
> vector<bool> LHS and the vectorizer is already careful to only
> generate code supported by the target.

In case vec<bool> has integer vector mode, comparison should be
handled similar to VEC_COND_EXPR by vect lowering and expand which
should be enough to have it properly handled on targets with no
vec<bool> support.

Thanks,
Ilya

>
>> Currently 'make check' shows two types of regression.
>>   - missed vector expression pattern recongnition (MIN, MAX, ABX, VEC_COND).  This must be due to my front-end changes.  Hope it will be easy to fix.
>>   - missed vectorization. All of them appear due to bool patterns disabling.  I didn't look into all of them but it seems the main problem is in mixed type sizes.  With bool patterns and integer vector masks we just put int->(other sized int) conversion for masks and it gives us required mask transformation.  With boolean mask we don't have a proper scalar statements to do that.  I think mask widening/narrowing may be directly supported in masked statements vectorization.  Going to look into it.
>>
>> I attach what I currently have for a prototype.  It grows bigger so I split into several parts.
>>
>> Thanks,
>> Ilya

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-18 13:26                               ` Ilya Enkovich
@ 2015-09-18 16:58                                 ` Richard Henderson
  2015-09-21 12:21                                   ` Ilya Enkovich
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2015-09-18 16:58 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: Jeff Law, Richard Biener, GCC Patches

On 09/18/2015 06:21 AM, Ilya Enkovich wrote:
>>> +machine_mode
>>> +default_get_mask_mode (unsigned nunits, unsigned vector_size)
>>> +{
>>> +  unsigned elem_size = vector_size / nunits;
>>> +  machine_mode elem_mode
>>> +    = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
>>
>> Why these arguments as opposed to passing elem_size?  It seems that every hook
>> is going to have to do this division...
> 
> Every target would have nunits = vector_size / elem_size because
> nunits is used to create a vector mode. Thus no difference.

I meant passing nunits and elem_size, but not vector_size.  Thus no division
required.  If the target does require the vector size, it could be obtained by
multiplication, which is cheaper.  But in cases like this we'd not require
either mult or div.

>>> @@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt)
>>>    create_output_operand (&ops[0], target, TYPE_MODE (type));
>>>    create_fixed_operand (&ops[1], mem);
>>>    create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
>>> -  expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
>>> +  expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type),
>>> +                                   TYPE_MODE (TREE_TYPE (maskt))),
>>> +            3, ops);
>>
>> Why do we now need a conversion here?
> 
> Mask mode was implicit for masked loads and stores. Now it becomes
> explicit because we may load the same value using different masks.
> E.g. for i386 we may load 256bit vector using both vector and scalar
> masks.

Ok, sure, the mask mode is needed, I get that.  But that doesn't answer the
question regarding conversion.  Why would convert_optab_handler be needed to
*change* the mode of the mask.  I assume that's not actually possible, with the
target hook already having chosen the proper mode for the mask.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-18 16:58                                 ` Richard Henderson
@ 2015-09-21 12:21                                   ` Ilya Enkovich
  2015-09-21 17:40                                     ` Richard Henderson
  0 siblings, 1 reply; 48+ messages in thread
From: Ilya Enkovich @ 2015-09-21 12:21 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jeff Law, Richard Biener, GCC Patches

2015-09-18 19:50 GMT+03:00 Richard Henderson <rth@redhat.com>:
> On 09/18/2015 06:21 AM, Ilya Enkovich wrote:
>>>> +machine_mode
>>>> +default_get_mask_mode (unsigned nunits, unsigned vector_size)
>>>> +{
>>>> +  unsigned elem_size = vector_size / nunits;
>>>> +  machine_mode elem_mode
>>>> +    = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
>>>
>>> Why these arguments as opposed to passing elem_size?  It seems that every hook
>>> is going to have to do this division...
>>
>> Every target would have nunits = vector_size / elem_size because
>> nunits is used to create a vector mode. Thus no difference.
>
> I meant passing nunits and elem_size, but not vector_size.  Thus no division
> required.  If the target does require the vector size, it could be obtained by
> multiplication, which is cheaper.  But in cases like this we'd not require
> either mult or div.

OK

>
>>>> @@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt)
>>>>    create_output_operand (&ops[0], target, TYPE_MODE (type));
>>>>    create_fixed_operand (&ops[1], mem);
>>>>    create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
>>>> -  expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
>>>> +  expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type),
>>>> +                                   TYPE_MODE (TREE_TYPE (maskt))),
>>>> +            3, ops);
>>>
>>> Why do we now need a conversion here?
>>
>> Mask mode was implicit for masked loads and stores. Now it becomes
>> explicit because we may load the same value using different masks.
>> E.g. for i386 we may load 256bit vector using both vector and scalar
>> masks.
>
> Ok, sure, the mask mode is needed, I get that.  But that doesn't answer the
> question regarding conversion.  Why would convert_optab_handler be needed to
> *change* the mode of the mask.  I assume that's not actually possible, with the
> target hook already having chosen the proper mode for the mask.

There is no any conversion here, maskload_optab is a convert_optab
because it uses two modes, one for value and the other one for mask.

Ilya

>
>
> r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-21 12:21                                   ` Ilya Enkovich
@ 2015-09-21 17:40                                     ` Richard Henderson
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2015-09-21 17:40 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: Jeff Law, Richard Biener, GCC Patches

On 09/21/2015 05:08 AM, Ilya Enkovich wrote:
> There is no any conversion here, maskload_optab is a convert_optab
> because it uses two modes, one for value and the other one for mask.

Ah, I see.  In which case I think we ought to come up with a different name.
C.f. get_vcond_icode.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-18 13:44                                   ` Ilya Enkovich
@ 2015-09-23 13:46                                     ` Ilya Enkovich
  2015-09-23 14:10                                       ` Richard Biener
  2015-09-23 13:50                                     ` Richard Biener
  1 sibling, 1 reply; 48+ messages in thread
From: Ilya Enkovich @ 2015-09-23 13:46 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, gcc-patches

2015-09-18 16:40 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
> 2015-09-18 15:22 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>
>> I was thinking about targets not supporting generating vec<bool>
>> (of whatever mode) from a comparison directly but only via
>> a COND_EXPR.
>
> Where may these direct comparisons come from? Vectorizer never
> generates unsupported statements. It means we get them from
> gimplifier? So touch optabs in gimplifier to avoid direct comparisons?
> Actually vect lowering checks if we are able to make comparison and
> expand also uses vec_cond to expand vector comparison, so probably we
> may live with them.
>
>>
>> Not sure if we are always talking about the same thing for
>> "bool patterns".  I'd remove bool patterns completely, IMHO
>> they are not necessary at all.
>
> I refer to transformations made by vect_recog_bool_pattern. Don't see
> how to remove them completely for targets not supporting comparison
> vectorization.
>
>>
>> I think we do allow this, just the vectorizer doesn't expect it.  In the long
>> run I want to get rid of the GENERIC exprs in both COND_EXPR and
>> VEC_COND_EXPR.  Just didn't have the time to do this...
>
> That would be nice. As a first step I'd like to support optabs for
> VEC_COND_EXPR directly using vec<bool>.
>
> Thanks,
> Ilya
>
>>
>> Richard.

Hi Richard,

Do you think we have enough confidence approach is working and we may
start integrating it into trunk? What would be integration plan then?

Thanks,
Ilya

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-18 13:44                                   ` Ilya Enkovich
  2015-09-23 13:46                                     ` Ilya Enkovich
@ 2015-09-23 13:50                                     ` Richard Biener
  1 sibling, 0 replies; 48+ messages in thread
From: Richard Biener @ 2015-09-23 13:50 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: Jeff Law, gcc-patches

On Fri, Sep 18, 2015 at 3:40 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2015-09-18 15:22 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Thu, Sep 3, 2015 at 3:57 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> 2015-09-03 15:11 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>> Adding CCs.
>>>>>
>>>>> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
>>>>>> 2015-09-01 17:25 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>
>>>>>> Totally disabling old style vector comparison and bool pattern is a
>>>>>> goal but doing hat would mean a lot of regressions for many targets.
>>>>>> Do you want to it to be tried to estimate amount of changes required
>>>>>> and reveal possible issues? What would be integration plan for these
>>>>>> changes? Do you want to just introduce new vector<bool> in GIMPLE
>>>>>> disabling bool patterns and then resolving vectorization regression on
>>>>>> all targets or allow them live together with following target switch
>>>>>> one by one from bool patterns with finally removing them? Not all
>>>>>> targets are likely to be adopted fast I suppose.
>>>>
>>>> Well, the frontends already create vec_cond exprs I believe.  So for
>>>> bool patterns the vectorizer would have to do the same, but the
>>>> comparison result in there would still use vec<bool>.  Thus the scalar
>>>>
>>>>  _Bool a = b < c;
>>>>  _Bool c = a || d;
>>>>  if (c)
>>>>
>>>> would become
>>>>
>>>>  vec<int> a = VEC_COND <a < b ? -1 : 0>;
>>>>  vec<int> c = a | d;
>>>
>>> This should be identical to
>>>
>>> vec<_Bool> a = a < b;
>>> vec<_Bool> c = a | d;
>>>
>>> where vec<_Bool> has VxSI mode. And we should prefer it in case target
>>> supports vector comparison into vec<bool>, right?
>>>
>>>>
>>>> when the target does not have vec<bool>s directly and otherwise
>>>> vec<boo> directly (dropping the VEC_COND).
>>>>
>>>> Just the vector comparison inside the VEC_COND would always
>>>> have vec<bool> type.
>>>
>>> I don't really understand what you mean by 'doesn't have vec<bool>s
>>> dirrectly' here. Currently I have a hook to ask for a vec<bool> mode
>>> and assume target doesn't support it in case it returns VOIDmode. But
>>> in such case I have no mode to use for vec<bool> inside VEC_COND
>>> either.
>>
>> I was thinking about targets not supporting generating vec<bool>
>> (of whatever mode) from a comparison directly but only via
>> a COND_EXPR.
>
> Where may these direct comparisons come from? Vectorizer never
> generates unsupported statements. It means we get them from
> gimplifier?

That's what I say - the vecotirzer wouldn't generate them.

> So touch optabs in gimplifier to avoid direct comparisons?
> Actually vect lowering checks if we are able to make comparison and
> expand also uses vec_cond to expand vector comparison, so probably we
> may live with them.
>
>>
>>> In default implementation of the new target hook I always return
>>> integer vector mode (to have default behavior similar to the current
>>> one). It should allow me to use vec<bool> for conditions in all
>>> vec_cond. But we'd need some other trigger for bool patterns to apply.
>>> Probably check vec_cmp optab in check_bool_pattern and don't convert
>>> in case comparison is supported by target? Or control it via
>>> additional hook.
>>
>> Not sure if we are always talking about the same thing for
>> "bool patterns".  I'd remove bool patterns completely, IMHO
>> they are not necessary at all.
>
> I refer to transformations made by vect_recog_bool_pattern. Don't see
> how to remove them completely for targets not supporting comparison
> vectorization.

The vectorizer can vectorize comparisons by emitting a VEC_COND_EXPR
(the bool pattern would turn the comparison into a COND_EXPR).  I don't
see how the pattern intermediate step is necessary.  The important part
is to get the desired vector type of the comparison determined.

>>
>>>>
>>>> And the "bool patterns" I am talking about are those in
>>>> tree-vect-patterns.c, not any targets instruction patterns.
>>>
>>> I refer to them also. BTW bool patterns also pull comparison into
>>> vec_cond. Thus we cannot have SSA_NAME in vec_cond as a condition. I
>>> think with vector comparisons in place we should allow SSA_NAME as
>>> conditions in VEC_COND for better CSE. That should require new vcond
>>> optabs though.
>>
>> I think we do allow this, just the vectorizer doesn't expect it.  In the long
>> run I want to get rid of the GENERIC exprs in both COND_EXPR and
>> VEC_COND_EXPR.  Just didn't have the time to do this...
>
> That would be nice. As a first step I'd like to support optabs for
> VEC_COND_EXPR directly using vec<bool>.
>
> Thanks,
> Ilya
>
>>
>> Richard.
>>
>>> Ilya
>>>
>>>>
>>>> Richard.
>>>>
>>>>>>
>>>>>> Ilya

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-23 13:46                                     ` Ilya Enkovich
@ 2015-09-23 14:10                                       ` Richard Biener
  2015-09-23 18:51                                         ` Richard Henderson
  2015-09-25 14:57                                         ` Ilya Enkovich
  0 siblings, 2 replies; 48+ messages in thread
From: Richard Biener @ 2015-09-23 14:10 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: Jeff Law, gcc-patches

On Wed, Sep 23, 2015 at 3:41 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2015-09-18 16:40 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
>> 2015-09-18 15:22 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>
>>> I was thinking about targets not supporting generating vec<bool>
>>> (of whatever mode) from a comparison directly but only via
>>> a COND_EXPR.
>>
>> Where may these direct comparisons come from? Vectorizer never
>> generates unsupported statements. It means we get them from
>> gimplifier? So touch optabs in gimplifier to avoid direct comparisons?
>> Actually vect lowering checks if we are able to make comparison and
>> expand also uses vec_cond to expand vector comparison, so probably we
>> may live with them.
>>
>>>
>>> Not sure if we are always talking about the same thing for
>>> "bool patterns".  I'd remove bool patterns completely, IMHO
>>> they are not necessary at all.
>>
>> I refer to transformations made by vect_recog_bool_pattern. Don't see
>> how to remove them completely for targets not supporting comparison
>> vectorization.
>>
>>>
>>> I think we do allow this, just the vectorizer doesn't expect it.  In the long
>>> run I want to get rid of the GENERIC exprs in both COND_EXPR and
>>> VEC_COND_EXPR.  Just didn't have the time to do this...
>>
>> That would be nice. As a first step I'd like to support optabs for
>> VEC_COND_EXPR directly using vec<bool>.
>>
>> Thanks,
>> Ilya
>>
>>>
>>> Richard.
>
> Hi Richard,
>
> Do you think we have enough confidence approach is working and we may
> start integrating it into trunk? What would be integration plan then?

I'm still worried about the vec<bool> vector size vs. element size
issue (well, somewhat).

Otherwise the integration plan would be

 1) put in the vector<bool> GIMPLE type support and change the vector
comparison type IL requirement to be vector<bool>,
fixing all fallout

 2) get support for directly expanding vector comparisons to
vector<bool> and make use of that from the x86 backend

 3) make the vectorizer generate the above if supported

I think independent improvements are

 1) remove (most) of the bool patterns from the vectorizer

 2) make VEC_COND_EXPR not have a GENERIC comparison embedded

(same for COND_EXPR?)

Richard.

> Thanks,
> Ilya

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-23 14:10                                       ` Richard Biener
@ 2015-09-23 18:51                                         ` Richard Henderson
  2015-09-24  8:40                                           ` Richard Biener
  2015-09-25 14:57                                         ` Ilya Enkovich
  1 sibling, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2015-09-23 18:51 UTC (permalink / raw)
  To: Richard Biener, Ilya Enkovich; +Cc: Jeff Law, gcc-patches

On 09/23/2015 06:53 AM, Richard Biener wrote:
> I think independent improvements are
> 
>  1) remove (most) of the bool patterns from the vectorizer
> 
>  2) make VEC_COND_EXPR not have a GENERIC comparison embedded
> 
> (same for COND_EXPR?)

Careful.

The reason that COND_EXPR have embedded comparisons is to handle flags
registers.  You can't separate the setting of the flags from the using of the
flags on most targets, because there's only one flags register.

The same is true for VEC_COND_EXPR with respect to MIPS.  The base architecture
has 8 floating-point comparison result flags, and the vector compare
instructions are fixed to set fcc[0:width-1].  So again there's only one
possible output location for the result of the compare.

MIPS is going to present a problem if we attempt to generalize logical
combinations of these vector<bool>, since one has to use several instructions
(or one insn and pre-load constants into two registers) to get the fcc bits out
into a form we can manipulate.

r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-23 18:51                                         ` Richard Henderson
@ 2015-09-24  8:40                                           ` Richard Biener
  2015-09-24 16:55                                             ` Richard Henderson
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2015-09-24  8:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Ilya Enkovich, Jeff Law, gcc-patches

On Wed, Sep 23, 2015 at 8:44 PM, Richard Henderson <rth@redhat.com> wrote:
> On 09/23/2015 06:53 AM, Richard Biener wrote:
>> I think independent improvements are
>>
>>  1) remove (most) of the bool patterns from the vectorizer
>>
>>  2) make VEC_COND_EXPR not have a GENERIC comparison embedded
>>
>> (same for COND_EXPR?)
>
> Careful.
>
> The reason that COND_EXPR have embedded comparisons is to handle flags
> registers.  You can't separate the setting of the flags from the using of the
> flags on most targets, because there's only one flags register.
>
> The same is true for VEC_COND_EXPR with respect to MIPS.  The base architecture
> has 8 floating-point comparison result flags, and the vector compare
> instructions are fixed to set fcc[0:width-1].  So again there's only one
> possible output location for the result of the compare.
>
> MIPS is going to present a problem if we attempt to generalize logical
> combinations of these vector<bool>, since one has to use several instructions
> (or one insn and pre-load constants into two registers) to get the fcc bits out
> into a form we can manipulate.

Both are basically a (target) restriction on how we should expand a conditional
move (and its condition).  It's techincally convenient to tie both together by
having them in the same statement but it's also techincally very incovenient
in other places.  I'd say for targets where

tem_1 = a_2 < b_3;
res_4 = tem_1 ? c_5 : d_6;
res_7 = tem_1 ? x_8 : z_9;

presents a serious issue ("re-using" the flags register) out-of-SSA
should duplicate
the conditionals so that TER can do its job (and RTL expansion should use TER
to get at the flags setter).  I imagine that if we expand the above to
adjacent statements
the CPUs can re-use the condition code.

To me where the condition is in GIMPLE is an implementation detail and the
inconveniences outweight the benefits.

Maybe we should make the effects of TER on the statement schedule explicitely
visible to make debugging that easier and remove the implicit scheduling from
the SSA name expansion code (basically require SSA names do have expanded defs).
That way we have the chance to perform pre-expansion "scheduling" in a more
predictable way leaving only the parts of the expansion using TER that want to
see a bigger expression (like [VEC_]COND_EXPR expansion eventually).

Richard.

>
> r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-24  8:40                                           ` Richard Biener
@ 2015-09-24 16:55                                             ` Richard Henderson
  2015-09-25  8:51                                               ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2015-09-24 16:55 UTC (permalink / raw)
  To: Richard Biener; +Cc: Ilya Enkovich, Jeff Law, gcc-patches

On 09/24/2015 01:09 AM, Richard Biener wrote:
> Both are basically a (target) restriction on how we should expand a conditional
> move (and its condition).  It's techincally convenient to tie both together by
> having them in the same statement but it's also techincally very incovenient
> in other places.  I'd say for targets where
> 
> tem_1 = a_2 < b_3;
> res_4 = tem_1 ? c_5 : d_6;
> res_7 = tem_1 ? x_8 : z_9;
> 
> presents a serious issue ("re-using" the flags register) out-of-SSA should
> duplicate the conditionals so that TER can do its job (and RTL expansion
> should use TER to get at the flags setter).

Sure it's a target restriction, but it's an extremely common one.  Essentially
all of our production platforms have it.  What do we gain by adding some sort
of target hook for this?

> I imagine that if we expand the above to adjacent statements the CPUs can
> re-use the condition code.

Sure, but IMO it should be the job of RTL CSE to make that decision, after all
of the uses (and clobbers) of the flags register have been exposed.

> To me where the condition is in GIMPLE is an implementation detail and the
> inconveniences outweight the benefits.

Why is a 3-operand gimple statement fine, but a 4-operand gimple statement
inconvenient?


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-24 16:55                                             ` Richard Henderson
@ 2015-09-25  8:51                                               ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2015-09-25  8:51 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Ilya Enkovich, Jeff Law, gcc-patches

On Thu, Sep 24, 2015 at 6:37 PM, Richard Henderson <rth@redhat.com> wrote:
> On 09/24/2015 01:09 AM, Richard Biener wrote:
>> Both are basically a (target) restriction on how we should expand a conditional
>> move (and its condition).  It's techincally convenient to tie both together by
>> having them in the same statement but it's also techincally very incovenient
>> in other places.  I'd say for targets where
>>
>> tem_1 = a_2 < b_3;
>> res_4 = tem_1 ? c_5 : d_6;
>> res_7 = tem_1 ? x_8 : z_9;
>>
>> presents a serious issue ("re-using" the flags register) out-of-SSA should
>> duplicate the conditionals so that TER can do its job (and RTL expansion
>> should use TER to get at the flags setter).
>
> Sure it's a target restriction, but it's an extremely common one.  Essentially
> all of our production platforms have it.  What do we gain by adding some sort
> of target hook for this?

A cleaner IL, no GENERIC expression tree building in GIMPLE (I guess that's
sth Andrew needs for his GIMPLE types project as well), less awkward
special-casing of comparisons based on context in code like genmatch.c
or in value-numbering.

>> I imagine that if we expand the above to adjacent statements the CPUs can
>> re-use the condition code.
>
> Sure, but IMO it should be the job of RTL CSE to make that decision, after all
> of the uses (and clobbers) of the flags register have been exposed.
>
>> To me where the condition is in GIMPLE is an implementation detail and the
>> inconveniences outweight the benefits.
>
> Why is a 3-operand gimple statement fine, but a 4-operand gimple statement
> inconvenient?

The inconvenience is not the number of operands but that we have two operation
codes and that we compute two values but only have an SSA name def for one
of them.  Oh, and did I mention that second operation is GENERIC?

So one way to clean things up would be to no longer use a GIMPLE_ASSIGN for
x = a < b ? c : d but instead use a GIMPLE_COND and give that a SSA def
for the result, using the true/false label operand places for 'c' and 'd'.

That still wouldn't get the compare a SSA def but at least it would get rid of
the 2nd operator code and the GENERIC expression operand.

From the GIMPLE side forcing out the comparison to a separate stmt looks
more obvious and if we're considering doing a different thing then we may as
well think of how to represent predicating arbitrary stmts or how to explicitely
model condition codes in GIMPLE.

It kind of looks like we want a GIMPLE PARALLEL ... (we already have
a GIMPLE stmt with multiple defs - GIMPLE_ASM)

Richard.

>
> r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] Try vector<bool> as a new representation for vector masks
  2015-09-23 14:10                                       ` Richard Biener
  2015-09-23 18:51                                         ` Richard Henderson
@ 2015-09-25 14:57                                         ` Ilya Enkovich
  1 sibling, 0 replies; 48+ messages in thread
From: Ilya Enkovich @ 2015-09-25 14:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, gcc-patches

2015-09-23 16:53 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Wed, Sep 23, 2015 at 3:41 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> 2015-09-18 16:40 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
>>> 2015-09-18 15:22 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>
>>>> I was thinking about targets not supporting generating vec<bool>
>>>> (of whatever mode) from a comparison directly but only via
>>>> a COND_EXPR.
>>>
>>> Where may these direct comparisons come from? Vectorizer never
>>> generates unsupported statements. It means we get them from
>>> gimplifier? So touch optabs in gimplifier to avoid direct comparisons?
>>> Actually vect lowering checks if we are able to make comparison and
>>> expand also uses vec_cond to expand vector comparison, so probably we
>>> may live with them.
>>>
>>>>
>>>> Not sure if we are always talking about the same thing for
>>>> "bool patterns".  I'd remove bool patterns completely, IMHO
>>>> they are not necessary at all.
>>>
>>> I refer to transformations made by vect_recog_bool_pattern. Don't see
>>> how to remove them completely for targets not supporting comparison
>>> vectorization.
>>>
>>>>
>>>> I think we do allow this, just the vectorizer doesn't expect it.  In the long
>>>> run I want to get rid of the GENERIC exprs in both COND_EXPR and
>>>> VEC_COND_EXPR.  Just didn't have the time to do this...
>>>
>>> That would be nice. As a first step I'd like to support optabs for
>>> VEC_COND_EXPR directly using vec<bool>.
>>>
>>> Thanks,
>>> Ilya
>>>
>>>>
>>>> Richard.
>>
>> Hi Richard,
>>
>> Do you think we have enough confidence approach is working and we may
>> start integrating it into trunk? What would be integration plan then?
>
> I'm still worried about the vec<bool> vector size vs. element size
> issue (well, somewhat).

Yeah, I hit another problem related to element size in vec lowering.
It uses inner type sizes in expand_vector_piecewise and bool vector
expand goes in a wrong way. There were also other places with similar
problems and therefore I want to try to use bools of different sizes
and see how it goes. Also having different sized bools may be useful
to represent masks pack/unpack in scalar code.

>
> Otherwise the integration plan would be
>
>  1) put in the vector<bool> GIMPLE type support and change the vector
> comparison type IL requirement to be vector<bool>,
> fixing all fallout
>
>  2) get support for directly expanding vector comparisons to
> vector<bool> and make use of that from the x86 backend
>
>  3) make the vectorizer generate the above if supported
>
> I think independent improvements are
>
>  1) remove (most) of the bool patterns from the vectorizer
>
>  2) make VEC_COND_EXPR not have a GENERIC comparison embedded

Sounds great!

Ilya

>
> (same for COND_EXPR?)
>
> Richard.
>
>> Thanks,
>> Ilya

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2015-09-25 14:39 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-17 16:27 [Scalar masks 2/x] Use bool masks in if-conversion Ilya Enkovich
2015-08-20 19:26 ` Jeff Law
2015-08-21  8:32   ` Richard Biener
2015-08-21 10:52     ` Ilya Enkovich
2015-08-21 11:15       ` Richard Biener
2015-08-21 12:19         ` Ilya Enkovich
2015-08-25 21:40           ` Jeff Law
2015-08-26 11:13             ` Ilya Enkovich
2015-08-26 13:09           ` Richard Biener
2015-08-26 13:21             ` Jakub Jelinek
2015-08-26 13:27               ` Richard Biener
2015-08-26 13:47                 ` Jakub Jelinek
2015-08-26 14:36                   ` Richard Biener
2015-08-26 14:51             ` Ilya Enkovich
2015-08-26 15:02               ` Richard Biener
2015-08-26 15:15                 ` Jakub Jelinek
2015-08-26 16:09                 ` Ilya Enkovich
2015-08-27  7:58                   ` Richard Biener
2015-09-01 13:13                     ` [RFC] Try vector<bool> as a new representation for vector masks Ilya Enkovich
2015-09-01 14:25                       ` Richard Biener
     [not found]                         ` <CAMbmDYafMuqzmRwRQfFHpLORFFGmFpfSRTR0QKx+LRFm6z75JQ@mail.gmail.com>
2015-09-03 12:12                           ` Ilya Enkovich
2015-09-03 12:42                             ` Richard Biener
2015-09-03 14:12                               ` Ilya Enkovich
2015-09-18 12:29                                 ` Richard Biener
2015-09-18 13:44                                   ` Ilya Enkovich
2015-09-23 13:46                                     ` Ilya Enkovich
2015-09-23 14:10                                       ` Richard Biener
2015-09-23 18:51                                         ` Richard Henderson
2015-09-24  8:40                                           ` Richard Biener
2015-09-24 16:55                                             ` Richard Henderson
2015-09-25  8:51                                               ` Richard Biener
2015-09-25 14:57                                         ` Ilya Enkovich
2015-09-23 13:50                                     ` Richard Biener
2015-09-04 20:47                       ` Jeff Law
2015-09-08 12:43                         ` Ilya Enkovich
2015-09-15 13:55                           ` Ilya Enkovich
2015-09-17 17:54                             ` Richard Henderson
2015-09-18 13:26                               ` Ilya Enkovich
2015-09-18 16:58                                 ` Richard Henderson
2015-09-21 12:21                                   ` Ilya Enkovich
2015-09-21 17:40                                     ` Richard Henderson
2015-09-18 12:45                             ` Richard Biener
2015-09-18 13:55                               ` Ilya Enkovich
2015-08-25 21:42       ` [Scalar masks 2/x] Use bool masks in if-conversion Jeff Law
2015-08-26 11:14         ` Ilya Enkovich
2015-08-26 13:12           ` Richard Biener
2015-08-26 16:58           ` Jeff Law
2015-08-21 15:57     ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).