public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489
@ 2021-12-20  8:59 wwwhhhyyy333 at gmail dot com
  2021-12-20 11:45 ` [Bug target/103771] [12 Regression] " pinskia at gcc dot gnu.org
                   ` (44 more replies)
  0 siblings, 45 replies; 46+ messages in thread
From: wwwhhhyyy333 at gmail dot com @ 2021-12-20  8:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

            Bug ID: 103771
           Summary: Missed vectorization under -mavx512f -mavx512vl after
                    r12-5489
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wwwhhhyyy333 at gmail dot com
  Target Milestone: ---

cat vect.c

typedef unsigned char uint8_t;                                                  

static uint8_t x264_clip_uint8( int x )                                         
{                                                                               
    return x&(~255) ? (-x)>>31 : x;                                             
}                                                                               

void mc_weight( uint8_t * __restrict dst, uint8_t * __restrict src, int
i_width, int i_scale)
{                                                                               
  for( int x = 0; x < i_width; x++ )                                            
    dst[x] = x264_clip_uint8(src[x] * i_scale);                                 
}                                                                               

It can not be vectorized with -mavx512f -mavx512vl, but can be vectorized with
-mavx2, See https://godbolt.org/z/M1jx161f6

The commit https://gcc.gnu.org/cgi-bin/gcc-gitref.cgi?r=r12-5489 converts  (x &
(~255)) == 0 to x <= 255, which may trigger some missing pattern with
-mavx512vl. 

Also an 1.5% regression was found on -march=cascadelake due to missing 128bit
epilogue for this loop.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
@ 2021-12-20 11:45 ` pinskia at gcc dot gnu.org
  2022-01-04 15:31 ` tnfchris at gcc dot gnu.org
                   ` (43 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-20 11:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Missed vectorization under  |[12 Regression] Missed
                   |-mavx512f -mavx512vl after  |vectorization under
                   |r12-5489                    |-mavx512f -mavx512vl after
                   |                            |r12-5489
   Target Milestone|---                         |12.0
           Keywords|                            |missed-optimization

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
  2021-12-20 11:45 ` [Bug target/103771] [12 Regression] " pinskia at gcc dot gnu.org
@ 2022-01-04 15:31 ` tnfchris at gcc dot gnu.org
  2022-01-05  9:00 ` crazylht at gmail dot com
                   ` (42 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2022-01-04 15:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #1 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Looks like the change causes the simpler conditional to be detected by the
vectorizer as a masked operation, which in principle makes sense:

note:   vect_recog_mask_conversion_pattern: detected: iftmp.0_21 = x.1_14 > 255
? iftmp.0_19 : iftmp.0_20;
note:   mask_conversion pattern recognized: patt_43 = patt_42 ? iftmp.0_19 :
iftmp.0_20;
note:   extra pattern stmt: patt_40 = x.1_14 > 255;
note:   extra pattern stmt: patt_42 = (<signed-boolean:8>) patt_40;

However not quite sure how the masking works on x86.  The additional statement
generated for patt_42 causes it to fail during vectorization:

note:   ==> examining pattern def statement: patt_42 = (<signed-boolean:8>)
patt_40;
note:   ==> examining statement: patt_42 = (<signed-boolean:8>) patt_40;
note:   vect_is_simple_use: operand x.1_14 > 255, type of def: internal
note:   vect_is_simple_use: vectype vector(8) <signed-boolean:1>
missed:   conversion not supported by target.
note:   vect_is_simple_use: operand x.1_14 > 255, type of def: internal
note:   vect_is_simple_use: vectype vector(8) <signed-boolean:1>
note:   vect_is_simple_use: operand x.1_14 > 255, type of def: internal
note:   vect_is_simple_use: vectype vector(8) <signed-boolean:1>
missed:   not vectorized: relevant stmt not supported: patt_42 =
(<signed-boolean:8>) patt_40;
missed:  bad operation or unsupported loop bound.
note:  ***** Analysis  failed with vector mode V32QI

as there's no conversion patterns for `VEC_UNPACK_LO_EXPR` between bool and a
mask.

which explains why it works for AVX2 and AVX512BW. AVX512F doesn't seem to
allow any QI mode conversions [1] so it fails..

Not sure why it's doing the replacement without checking to see that the target
is able to vectorize the statements it generates later. Specifically it doesn't
check if what's returned by build_mask_conversion is supported or not.

My guess is because vectorizable_condition will fail anyway without the type of
the conditional being a vector boolean.

With -mavx512vl V32QI seems to generate in the pattern mask conversions between
vector (8) <signed-boolean:1> and without it vector(32) <signed-boolean:8>. I
think some x86 person needs to give a hint here :)

[1] https://www.felixcloutier.com/x86/kunpckbw:kunpckwd:kunpckdq

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
  2021-12-20 11:45 ` [Bug target/103771] [12 Regression] " pinskia at gcc dot gnu.org
  2022-01-04 15:31 ` tnfchris at gcc dot gnu.org
@ 2022-01-05  9:00 ` crazylht at gmail dot com
  2022-01-13  8:43 ` crazylht at gmail dot com
                   ` (41 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-01-05  9:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Tamar Christina from comment #1)
> Looks like the change causes the simpler conditional to be detected by the
> vectorizer as a masked operation, which in principle makes sense:
> 
> note:   vect_recog_mask_conversion_pattern: detected: iftmp.0_21 = x.1_14 >
> 255 ? iftmp.0_19 : iftmp.0_20;
> note:   mask_conversion pattern recognized: patt_43 = patt_42 ? iftmp.0_19 :
> iftmp.0_20;
> note:   extra pattern stmt: patt_40 = x.1_14 > 255;
> note:   extra pattern stmt: patt_42 = (<signed-boolean:8>) patt_40;
> 
> However not quite sure how the masking works on x86.  The additional
> statement generated for patt_42 causes it to fail during vectorization:
> 
> note:   ==> examining pattern def statement: patt_42 = (<signed-boolean:8>)
> patt_40;
> note:   ==> examining statement: patt_42 = (<signed-boolean:8>) patt_40;
> note:   vect_is_simple_use: operand x.1_14 > 255, type of def: internal
> note:   vect_is_simple_use: vectype vector(8) <signed-boolean:1>
> missed:   conversion not supported by target.
> note:   vect_is_simple_use: operand x.1_14 > 255, type of def: internal
> note:   vect_is_simple_use: vectype vector(8) <signed-boolean:1>
> note:   vect_is_simple_use: operand x.1_14 > 255, type of def: internal
> note:   vect_is_simple_use: vectype vector(8) <signed-boolean:1>
> missed:   not vectorized: relevant stmt not supported: patt_42 =
> (<signed-boolean:8>) patt_40;
> missed:  bad operation or unsupported loop bound.
> note:  ***** Analysis  failed with vector mode V32QI
> 
> as there's no conversion patterns for `VEC_UNPACK_LO_EXPR` between bool and
> a mask.
W/ avx512, we're using scalar mode for mask, can we use VEC_UNPACKS_SBOOL_LO_
here?
Since we have vec_unpsack_sbool_lo/hi_qi which should be used for conversion
from vector<8> <signed-boolean:1> to vector<4> <signed-boolean:1>.
> 
> which explains why it works for AVX2 and AVX512BW. AVX512F doesn't seem to
> allow any QI mode conversions [1] so it fails..
> 
> Not sure why it's doing the replacement without checking to see that the
> target is able to vectorize the statements it generates later. Specifically
> it doesn't check if what's returned by build_mask_conversion is supported or
> not.
> 
> My guess is because vectorizable_condition will fail anyway without the type
> of the conditional being a vector boolean.
> 
> With -mavx512vl V32QI seems to generate in the pattern mask conversions
> between vector (8) <signed-boolean:1> and without it vector(32)
> <signed-boolean:8>. I think some x86 person needs to give a hint here :)
> 
> [1] https://www.felixcloutier.com/x86/kunpckbw:kunpckwd:kunpckdq

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (2 preceding siblings ...)
  2022-01-05  9:00 ` crazylht at gmail dot com
@ 2022-01-13  8:43 ` crazylht at gmail dot com
  2022-01-13  9:02 ` crazylht at gmail dot com
                   ` (40 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-01-13  8:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
for patt_42 = (<signed-boolean:1>) patt_40;

vectype_in (QImode:nunits 4)

 <vector_type 0x7fffea18de70
    type <boolean_type 0x7fffea18ddc8 public QI
        size <integer_cst 0x7fffea2e2e40 constant 8>
        unit-size <integer_cst 0x7fffea2e2e58 constant 1>
        align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7fffea18ddc8 precision:1 min <integer_cst 0x7fffea196ab0 -1> max <integer_cst
0x7fffea196b70 0>>
    QI size <integer_cst 0x7fffea2e2e40 8> unit-size <integer_cst
0x7fffea2e2e58 1>
    align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7fffea18de70 nunits:4>

vectype_out(HImode)

 <vector_type 0x7fffea18df18
    type <boolean_type 0x7fffea18ddc8 public QI
        size <integer_cst 0x7fffea2e2e40 constant 8>
        unit-size <integer_cst 0x7fffea2e2e58 constant 1>
        align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7fffea18ddc8 precision:1 min <integer_cst 0x7fffea196ab0 -1> max <integer_cst
0x7fffea196b70 0>>
    HI
    size <integer_cst 0x7fffea2e2f00 type <integer_type 0x7fffea2fd0a8
bitsizetype> constant 16>
    unit-size <integer_cst 0x7fffea2e2f18 type <integer_type 0x7fffea2fd000
sizetype> constant 2>
    align:16 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7fffea18df18 nunits:16>

And ‘vec_pack_sbool_trunc_m’ only handle situation when input and output have
same mode.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (3 preceding siblings ...)
  2022-01-13  8:43 ` crazylht at gmail dot com
@ 2022-01-13  9:02 ` crazylht at gmail dot com
  2022-01-13  9:04 ` crazylht at gmail dot com
                   ` (39 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-01-13  9:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---

(In reply to Hongtao.liu from comment #3)
> for patt_42 = (<signed-boolean:1>) patt_40;
> 
> vectype_in (QImode:nunits 4)
> 
>  <vector_type 0x7fffea18de70
>     type <boolean_type 0x7fffea18ddc8 public QI
>         size <integer_cst 0x7fffea2e2e40 constant 8>
>         unit-size <integer_cst 0x7fffea2e2e58 constant 1>
>         align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x7fffea18ddc8 precision:1 min <integer_cst 0x7fffea196ab0 -1> max
> <integer_cst 0x7fffea196b70 0>>
>     QI size <integer_cst 0x7fffea2e2e40 8> unit-size <integer_cst
> 0x7fffea2e2e58 1>
>     align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x7fffea18de70 nunits:4>
> 
> vectype_out(HImode)
> 
>  <vector_type 0x7fffea18df18
>     type <boolean_type 0x7fffea18ddc8 public QI
>         size <integer_cst 0x7fffea2e2e40 constant 8>
>         unit-size <integer_cst 0x7fffea2e2e58 constant 1>
>         align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x7fffea18ddc8 precision:1 min <integer_cst 0x7fffea196ab0 -1> max
> <integer_cst 0x7fffea196b70 0>>
>     HI
>     size <integer_cst 0x7fffea2e2f00 type <integer_type 0x7fffea2fd0a8
> bitsizetype> constant 16>
>     unit-size <integer_cst 0x7fffea2e2f18 type <integer_type 0x7fffea2fd000
> sizetype> constant 2>
>     align:16 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x7fffea18df18 nunits:16>
> 
> And ‘vec_pack_sbool_trunc_m’ only handle situation when input and output
> have same mode.

And GCC vectorizer only handle 2X elements, but not 4X,8X,...

      /* For scalar masks we may have different boolean
         vector types having the same QImode.  Thus we
         add additional check for elements number.  */
     if (known_eq (TYPE_VECTOR_SUBPARTS (vectype) * 2,              
          TYPE_VECTOR_SUBPARTS (narrow_vectype)))

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (4 preceding siblings ...)
  2022-01-13  9:02 ` crazylht at gmail dot com
@ 2022-01-13  9:04 ` crazylht at gmail dot com
  2022-01-13  9:28 ` rguenth at gcc dot gnu.org
                   ` (38 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-01-13  9:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---

> And GCC vectorizer only handle 2X elements, but not 4X,8X,...
> 
>       /* For scalar masks we may have different boolean
> 	 vector types having the same QImode.  Thus we
> 	 add additional check for elements number.  */
>      if (known_eq (TYPE_VECTOR_SUBPARTS (vectype) * 2,		    
>           TYPE_VECTOR_SUBPARTS (narrow_vectype)))


We can have vec_pack_trunc_qi + vec_pack_sbool_trunc_qi by extending
vec_pack_sbool_trunc_qi to handle 8/4, 4/2, 2/1 vectype_out/vectype_in
elements.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (5 preceding siblings ...)
  2022-01-13  9:04 ` crazylht at gmail dot com
@ 2022-01-13  9:28 ` rguenth at gcc dot gnu.org
  2022-01-13  9:50 ` rguenth at gcc dot gnu.org
                   ` (37 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-13  9:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-01-13
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #1)
> Looks like the change causes the simpler conditional to be detected by the
> vectorizer as a masked operation, which in principle makes sense:
> 
> note:   vect_recog_mask_conversion_pattern: detected: iftmp.0_21 = x.1_14 >
> 255 ? iftmp.0_19 : iftmp.0_20;
> note:   mask_conversion pattern recognized: patt_43 = patt_42 ? iftmp.0_19 :
> iftmp.0_20;
> note:   extra pattern stmt: patt_40 = x.1_14 > 255;
> note:   extra pattern stmt: patt_42 = (<signed-boolean:8>) patt_40;

if we look at the ifcvt result we see

  iftmp.0_19 = (unsigned char) _18;
  iftmp.0_20 = (unsigned char) _5;
  iftmp.0_21 = x.1_14 > 255 ? iftmp.0_19 : iftmp.0_20;
  *_6 = iftmp.0_21;

I suspect we intended to carry the fact that x.1_14 > 255 will produce
a mask from a SImode vector element compare but we need a mask suitable
for a QImode vector element select (iftmp.0_19 and iftmp.0_20).  That's
not something we can express in scalar code and thus a pattern I think.

So the pattern as generated doesn't make very much sense to me.  I suppose
it might try to convert a AVX512 mask to a AVX2 style mask but that needs
to be done with

  patt_42 = patt_40 ? (<signed-boolean:8>)-1 : 0;

not with a conversion.  But it's also somewhat pointless since it will
simply cause the same issue again - the COND_EXPR vectorization will need
to cobble up 4 AVX512 masks to produce the desired result.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (6 preceding siblings ...)
  2022-01-13  9:28 ` rguenth at gcc dot gnu.org
@ 2022-01-13  9:50 ` rguenth at gcc dot gnu.org
  2022-01-13 10:04 ` crazylht at gmail dot com
                   ` (36 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-13  9:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Forcing the pattern to not trigger produces the expected

t.c:8:6: missed:   not vectorized: relevant stmt not supported: iftmp.0_21 =
x.1_14 > 255 ? iftmp.0_19 : iftmp.0_20;

since condition vectorization itself doesn't know how to handle this, we end
up at

  if (vectype1 && !useless_type_conversion_p (vectype, vectype1))
    return false;

with vectype V32QI and vectype1 V8SI.

Splitting out the compare from the COND_EXPR in the pattern but leaving out
the attempt to "widen" it reveals the same fact that vectorizable_condition
doesn't support packing of multiple vector defs for the mask operand.

I think that is what we need to add.  We also don't have a good representation
for "packing" of masks.

diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 3ea905538e1..729a1d32612 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -4679,8 +4679,10 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
                                  rhs1_type);
        }

-      if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype1),
-                   TYPE_VECTOR_SUBPARTS (vectype2)))
+      /* AVX512 style masks cannot be packed/unpacked.  */
+      if (TYPE_PRECISION (TREE_TYPE (vectype2)) != 1
+         && maybe_ne (TYPE_VECTOR_SUBPARTS (vectype1),
+                      TYPE_VECTOR_SUBPARTS (vectype2)))
        tmp = build_mask_conversion (vinfo, rhs1, vectype1, stmt_vinfo);
       else
        tmp = rhs1;

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (7 preceding siblings ...)
  2022-01-13  9:50 ` rguenth at gcc dot gnu.org
@ 2022-01-13 10:04 ` crazylht at gmail dot com
  2022-01-13 10:11 ` rsandifo at gcc dot gnu.org
                   ` (35 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-01-13 10:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #7)
> Forcing the pattern to not trigger produces the expected
> 
> t.c:8:6: missed:   not vectorized: relevant stmt not supported: iftmp.0_21 =
> x.1_14 > 255 ? iftmp.0_19 : iftmp.0_20;
> 
> since condition vectorization itself doesn't know how to handle this, we end
> up at
> 
>   if (vectype1 && !useless_type_conversion_p (vectype, vectype1))
>     return false;
> 
> with vectype V32QI and vectype1 V8SI.
> 
> Splitting out the compare from the COND_EXPR in the pattern but leaving out
> the attempt to "widen" it reveals the same fact that vectorizable_condition
> doesn't support packing of multiple vector defs for the mask operand.
> 
> I think that is what we need to add.  We also don't have a good
> representation
> for "packing" of masks.
> 
shouldn't multi_step_cvt be supposed to handle this, just a little ambiguous
between ‘vec_pack_sbool_trunc_m’ and ‘vec_pack_trunc_m’, need to make
vectorizer use vec_pack_sbool_trunc_qi + vec_pack_trunc_qi to get a HI mask,
just like how we pack 4 QImode to 1 SImode mask by vec_pack_trunc_qi +
vec_pack_trunc_hi (in the main loop with -mprefer-vector-width=256)

 mask_patt_40.26_118 = vect_x.18_95 > { 255, 255, 255, 255, 255, 255, 255, 255
};
  mask_patt_40.26_119 = vect_x.18_96 > { 255, 255, 255, 255, 255, 255, 255, 255
};
  mask_patt_40.26_120 = vect_x.18_97 > { 255, 255, 255, 255, 255, 255, 255, 255
};
  mask_patt_40.26_121 = vect_x.18_98 > { 255, 255, 255, 255, 255, 255, 255, 255
};
  mask_patt_42.28_122 = VEC_PACK_TRUNC_EXPR <mask_patt_40.26_118,
mask_patt_40.26_119>;
  mask_patt_42.28_123 = VEC_PACK_TRUNC_EXPR <mask_patt_40.26_120,
mask_patt_40.26_121>;
  mask_patt_42.27_124 = VEC_PACK_TRUNC_EXPR <mask_patt_42.28_122,
mask_patt_42.28_123>;
  vect_patt_43.29_125 = VEC_COND_EXPR <mask_patt_42.27_124, vect_iftmp.22_113,
vect_iftmp.24_116>;
  iftmp.0_21 = x.1_14 > 255 ? iftmp.0_19 : iftmp.0_20;

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (8 preceding siblings ...)
  2022-01-13 10:04 ` crazylht at gmail dot com
@ 2022-01-13 10:11 ` rsandifo at gcc dot gnu.org
  2022-01-13 10:22 ` crazylht at gmail dot com
                   ` (34 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2022-01-13 10:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #9 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> I think that is what we need to add.  We also don't have a good
> representation
> for "packing" of masks.
> 
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index 3ea905538e1..729a1d32612 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -4679,8 +4679,10 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
>                                   rhs1_type);
>         }
>  
> -      if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype1),
> -                   TYPE_VECTOR_SUBPARTS (vectype2)))
> +      /* AVX512 style masks cannot be packed/unpacked.  */
> +      if (TYPE_PRECISION (TREE_TYPE (vectype2)) != 1
> +         && maybe_ne (TYPE_VECTOR_SUBPARTS (vectype1),
> +                      TYPE_VECTOR_SUBPARTS (vectype2)))
>         tmp = build_mask_conversion (vinfo, rhs1, vectype1, stmt_vinfo);
>        else
>         tmp = rhs1;
Haven't had time to look at it properly yet, but my first impression
is that that's likely to regress SVE.  Packing and unpacking are
natural operations on boolean vector modes.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (9 preceding siblings ...)
  2022-01-13 10:11 ` rsandifo at gcc dot gnu.org
@ 2022-01-13 10:22 ` crazylht at gmail dot com
  2022-01-13 10:43 ` crazylht at gmail dot com
                   ` (33 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-01-13 10:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
with
@@ -12120,7 +12120,8 @@ supportable_narrowing_operation (enum tree_code code,
       c1 = VEC_PACK_TRUNC_EXPR;
       if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
          && VECTOR_BOOLEAN_TYPE_P (vectype)
-         && TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
+         && (TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
+             || known_lt (TYPE_VECTOR_SUBPARTS (vectype), BITS_PER_UNIT))
          && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
        optab1 = vec_pack_sbool_trunc_optab;
       else
@@ -12213,6 +12214,7 @@ supportable_narrowing_operation (enum tree_code code,
       if (VECTOR_BOOLEAN_TYPE_P (intermediate_type)
          && VECTOR_BOOLEAN_TYPE_P (prev_type)
          && intermediate_mode == prev_mode
+         && known_lt (TYPE_VECTOR_SUBPARTS (intermediate_type), BITS_PER_UNIT)
          && SCALAR_INT_MODE_P (prev_mode))
        interm_optab = vec_pack_sbool_trunc_optab;
       else

-march=icelake-server -O3 -mprefer-vector-width=128 now can get vectorized
loop.


        vmovdqu8        (%rsi,%rax), %xmm0
        vpmovzxbw       %xmm0, %xmm2
        vpmovzxwd       %xmm2, %xmm1
        vpsrldq $8, %xmm0, %xmm0
        vpsrldq $8, %xmm2, %xmm2
        vpmovzxbw       %xmm0, %xmm0
        vpmovzxwd       %xmm2, %xmm2
        vpmulld %xmm9, %xmm1, %xmm1
        vpmulld %xmm9, %xmm2, %xmm2
        vpmovzxwd       %xmm0, %xmm4
        vpsrldq $8, %xmm0, %xmm0
        vpmovzxwd       %xmm0, %xmm0
        vpmulld %xmm9, %xmm4, %xmm4
        vpmulld %xmm9, %xmm0, %xmm0
        vpcmpud $6, %xmm6, %xmm1, %k0
        vpsubd  %xmm1, %xmm7, %xmm3
        vpcmpud $6, %xmm6, %xmm2, %k1
        vpsubd  %xmm2, %xmm7, %xmm5
        vpsrad  $31, %xmm5, %xmm5
        vpsrad  $31, %xmm3, %xmm3
        vpermt2w        %xmm5, %xmm8, %xmm3
        vpsubd  %xmm0, %xmm7, %xmm10
        vpsubd  %xmm4, %xmm7, %xmm5
        kshiftlb        $4, %k1, %k1
        vpcmpud $6, %xmm6, %xmm0, %k2
        vpsrad  $31, %xmm5, %xmm5
        vpsrad  $31, %xmm10, %xmm10
        kandb   %k3, %k0, %k0
        korb    %k1, %k0, %k0
        vpcmpud $6, %xmm6, %xmm4, %k1
        vpermt2w        %xmm10, %xmm8, %xmm5
        vpermt2w        %xmm2, %xmm8, %xmm1
        vpermt2w        %xmm0, %xmm8, %xmm4
        vpermt2b        %xmm5, %xmm11, %xmm3
        vpermt2b        %xmm4, %xmm11, %xmm1
        kandb   %k3, %k1, %k1
        kshiftlb        $4, %k2, %k2
        korb    %k2, %k1, %k1
        kunpckbw        %k0, %k1, %k1
        vmovdqu8        %xmm3, %xmm1{%k1}
        vmovdqu8        %xmm1, (%rdi,%rax)
        addq    $16, %rax
        cmpq    %rax, %r8
        jne     .L4

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (10 preceding siblings ...)
  2022-01-13 10:22 ` crazylht at gmail dot com
@ 2022-01-13 10:43 ` crazylht at gmail dot com
  2022-01-13 10:44 ` rguenther at suse dot de
                   ` (32 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-01-13 10:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #11 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #10)
> with
> @@ -12120,7 +12120,8 @@ supportable_narrowing_operation (enum tree_code code,
>        c1 = VEC_PACK_TRUNC_EXPR;
>        if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
>  	  && VECTOR_BOOLEAN_TYPE_P (vectype)
> -	  && TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> +	  && (TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> +	      || known_lt (TYPE_VECTOR_SUBPARTS (vectype), BITS_PER_UNIT))
>  	  && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
>  	optab1 = vec_pack_sbool_trunc_optab;
>        else
> @@ -12213,6 +12214,7 @@ supportable_narrowing_operation (enum tree_code code,
>        if (VECTOR_BOOLEAN_TYPE_P (intermediate_type)
>  	  && VECTOR_BOOLEAN_TYPE_P (prev_type)
>  	  && intermediate_mode == prev_mode
> +	  && known_lt (TYPE_VECTOR_SUBPARTS (intermediate_type), BITS_PER_UNIT)
>  	  && SCALAR_INT_MODE_P (prev_mode))
>  	interm_optab = vec_pack_sbool_trunc_optab;
>        else
> 
> -march=icelake-server -O3 -mprefer-vector-width=128 now can get vectorized
> loop.
> 
> 
> 	vmovdqu8	(%rsi,%rax), %xmm0
> 	vpmovzxbw	%xmm0, %xmm2
> 	vpmovzxwd	%xmm2, %xmm1
> 	vpsrldq	$8, %xmm0, %xmm0
> 	vpsrldq	$8, %xmm2, %xmm2
> 	vpmovzxbw	%xmm0, %xmm0
> 	vpmovzxwd	%xmm2, %xmm2
> 	vpmulld	%xmm9, %xmm1, %xmm1
> 	vpmulld	%xmm9, %xmm2, %xmm2
> 	vpmovzxwd	%xmm0, %xmm4
> 	vpsrldq	$8, %xmm0, %xmm0
> 	vpmovzxwd	%xmm0, %xmm0
> 	vpmulld	%xmm9, %xmm4, %xmm4
> 	vpmulld	%xmm9, %xmm0, %xmm0
> 	vpcmpud	$6, %xmm6, %xmm1, %k0
> 	vpsubd	%xmm1, %xmm7, %xmm3
> 	vpcmpud	$6, %xmm6, %xmm2, %k1
> 	vpsubd	%xmm2, %xmm7, %xmm5
> 	vpsrad	$31, %xmm5, %xmm5
> 	vpsrad	$31, %xmm3, %xmm3
> 	vpermt2w	%xmm5, %xmm8, %xmm3
> 	vpsubd	%xmm0, %xmm7, %xmm10
> 	vpsubd	%xmm4, %xmm7, %xmm5
> 	kshiftlb	$4, %k1, %k1
> 	vpcmpud	$6, %xmm6, %xmm0, %k2
> 	vpsrad	$31, %xmm5, %xmm5
> 	vpsrad	$31, %xmm10, %xmm10
> 	kandb	%k3, %k0, %k0
> 	korb	%k1, %k0, %k0
> 	vpcmpud	$6, %xmm6, %xmm4, %k1
> 	vpermt2w	%xmm10, %xmm8, %xmm5
> 	vpermt2w	%xmm2, %xmm8, %xmm1
> 	vpermt2w	%xmm0, %xmm8, %xmm4
> 	vpermt2b	%xmm5, %xmm11, %xmm3
> 	vpermt2b	%xmm4, %xmm11, %xmm1
> 	kandb	%k3, %k1, %k1
> 	kshiftlb	$4, %k2, %k2
> 	korb	%k2, %k1, %k1
> 	kunpckbw	%k0, %k1, %k1
> 	vmovdqu8	%xmm3, %xmm1{%k1}
> 	vmovdqu8	%xmm1, (%rdi,%rax)
> 	addq	$16, %rax
> 	cmpq	%rax, %r8
> 	jne	.L4

But still not as good as before, since original version we only need to pack
data which is produced by vec_cond_expr, but now need to extraly pack mask.

before

  # x_24 = PHI <x_16(9), 0(21)>
  # vectp_src.11_73 = PHI <vectp_src.11_74(9), src_11(D)(21)>
  # vectp_dst.23_112 = PHI <vectp_dst.23_113(9), dst_13(D)(21)>
  # ivtmp_115 = PHI <ivtmp_116(9), 0(21)>
  # DEBUG x => NULL
  # DEBUG BEGIN_STMT
  _1 = (sizetype) x_24;
  _2 = src_11(D) + _1;
  vect__3.13_75 = MEM <vector(16) unsigned char> [(uint8_t *)vectp_src.11_73];
  _3 = *_2;
  vect__4.15_76 = [vec_unpack_lo_expr] vect__3.13_75;
  vect__4.15_77 = [vec_unpack_hi_expr] vect__3.13_75;
  vect__4.14_78 = [vec_unpack_lo_expr] vect__4.15_76;
  vect__4.14_79 = [vec_unpack_hi_expr] vect__4.15_76;
  vect__4.14_80 = [vec_unpack_lo_expr] vect__4.15_77;
  vect__4.14_81 = [vec_unpack_hi_expr] vect__4.15_77;
  _4 = (int) _3;
  vect__5.16_83 = vect__4.14_78 * vect_cst__82;
  vect__5.16_84 = vect__4.14_79 * vect_cst__82;
  vect__5.16_85 = vect__4.14_80 * vect_cst__82;
  vect__5.16_86 = vect__4.14_81 * vect_cst__82;
  _5 = _4 * i_scale_12(D);
  _6 = dst_13(D) + _1;
  # DEBUG x => NULL
  # DEBUG INLINE_ENTRY x264_clip_uint8
  # DEBUG BEGIN_STMT
  vect__14.17_88 = vect__5.16_83 & vect_cst__87;
  vect__14.17_89 = vect__5.16_84 & vect_cst__87;
  vect__14.17_90 = vect__5.16_85 & vect_cst__87;
  vect__14.17_91 = vect__5.16_86 & vect_cst__87;
  _14 = _5 & -256;
  vect__17.18_92 = -vect__5.16_83;
  vect__17.18_93 = -vect__5.16_84;
  vect__17.18_94 = -vect__5.16_85;
  vect__17.18_95 = -vect__5.16_86;
  _17 = -_5;
  vect__18.19_96 = vect__17.18_92 >> 31;
  vect__18.19_97 = vect__17.18_93 >> 31;
  vect__18.19_98 = vect__17.18_94 >> 31;
  vect__18.19_99 = vect__17.18_95 >> 31;
  _18 = _17 >> 31;
  iftmp.0_19 = (unsigned char) _18;
  iftmp.0_20 = (unsigned char) _5;
  _101 = vect__14.17_88 != vect_cst__100;
  vect_patt_40.20_102 = VEC_COND_EXPR <_101, vect__18.19_96, vect__5.16_83>;
  _103 = vect__14.17_89 != vect_cst__100;
  vect_patt_40.20_104 = VEC_COND_EXPR <_103, vect__18.19_97, vect__5.16_84>;
  _105 = vect__14.17_90 != vect_cst__100;
  vect_patt_40.20_106 = VEC_COND_EXPR <_105, vect__18.19_98, vect__5.16_85>;
  _107 = vect__14.17_91 != vect_cst__100;
  vect_patt_40.20_108 = VEC_COND_EXPR <_107, vect__18.19_99, vect__5.16_86>;
  vect_patt_41.22_109 = VEC_PACK_TRUNC_EXPR <vect_patt_40.20_102,
vect_patt_40.20_104>;
  vect_patt_41.22_110 = VEC_PACK_TRUNC_EXPR <vect_patt_40.20_106,
vect_patt_40.20_108>;
  vect_patt_41.21_111 = VEC_PACK_TRUNC_EXPR <vect_patt_41.22_109,
vect_patt_41.22_110>;
  iftmp.0_21 = _14 != 0 ? iftmp.0_19 : iftmp.0_20;
  # DEBUG x => NULL
  MEM <vector(16) unsigned char> [(uint8_t *)vectp_dst.23_112] =
vect_patt_41.21_111;
  # DEBUG BEGIN_STMT
  x_16 = x_24 + 1;
  # DEBUG x => x_16
  # DEBUG BEGIN_STMT
  vectp_src.11_74 = vectp_src.11_73 + 16;
  vectp_dst.23_113 = vectp_dst.23_112 + 16;
  ivtmp_116 = ivtmp_115 + 1;

after

  # x_24 = PHI <x_16(9), 0(21)>
  # vectp_src.12_78 = PHI <vectp_src.12_79(9), src_11(D)(21)>
  # vectp_dst.30_123 = PHI <vectp_dst.30_124(9), dst_13(D)(21)>
  # ivtmp_126 = PHI <ivtmp_127(9), 0(21)>
  _1 = (sizetype) x_24;
  _2 = src_11(D) + _1;
  vect__3.14_80 = MEM <vector(16) unsigned char> [(uint8_t *)vectp_src.12_78];
  _3 = *_2;
  vect__4.16_81 = [vec_unpack_lo_expr] vect__3.14_80;
  vect__4.16_82 = [vec_unpack_hi_expr] vect__3.14_80;
  vect__4.15_83 = [vec_unpack_lo_expr] vect__4.16_81;
  vect__4.15_84 = [vec_unpack_hi_expr] vect__4.16_81;
  vect__4.15_85 = [vec_unpack_lo_expr] vect__4.16_82;
  vect__4.15_86 = [vec_unpack_hi_expr] vect__4.16_82;
  _4 = (int) _3;
  vect__5.17_88 = vect__4.15_83 * vect_cst__87;
  vect__5.17_89 = vect__4.15_84 * vect_cst__87;
  vect__5.17_90 = vect__4.15_85 * vect_cst__87;
  vect__5.17_91 = vect__4.15_86 * vect_cst__87;
  _5 = _4 * i_scale_12(D);
  _6 = dst_13(D) + _1;
  vect_x.18_92 = VIEW_CONVERT_EXPR<vector(4) unsigned int>(vect__5.17_88);
  vect_x.18_93 = VIEW_CONVERT_EXPR<vector(4) unsigned int>(vect__5.17_89);
  vect_x.18_94 = VIEW_CONVERT_EXPR<vector(4) unsigned int>(vect__5.17_90);
  vect_x.18_95 = VIEW_CONVERT_EXPR<vector(4) unsigned int>(vect__5.17_91);
  x.1_14 = (unsigned int) _5;
  vect__41.19_96 = -vect_x.18_92;
  vect__41.19_97 = -vect_x.18_93;
  vect__41.19_98 = -vect_x.18_94;
  vect__41.19_99 = -vect_x.18_95;
  _41 = -x.1_14;
  vect__17.20_100 = VIEW_CONVERT_EXPR<vector(4) int>(vect__41.19_96);
  vect__17.20_101 = VIEW_CONVERT_EXPR<vector(4) int>(vect__41.19_97);
  vect__17.20_102 = VIEW_CONVERT_EXPR<vector(4) int>(vect__41.19_98);
  vect__17.20_103 = VIEW_CONVERT_EXPR<vector(4) int>(vect__41.19_99);
  _17 = (int) _41;
  vect__18.21_104 = vect__17.20_100 >> 31;
  vect__18.21_105 = vect__17.20_101 >> 31;
  vect__18.21_106 = vect__17.20_102 >> 31;
  vect__18.21_107 = vect__17.20_103 >> 31;
  _18 = _17 >> 31;
  vect_iftmp.23_108 = VEC_PACK_TRUNC_EXPR <vect__18.21_104, vect__18.21_105>;
  vect_iftmp.23_109 = VEC_PACK_TRUNC_EXPR <vect__18.21_106, vect__18.21_107>;
  vect_iftmp.22_110 = VEC_PACK_TRUNC_EXPR <vect_iftmp.23_108,
vect_iftmp.23_109>;
  iftmp.0_19 = (unsigned char) _18;
  vect_iftmp.25_111 = VEC_PACK_TRUNC_EXPR <vect__5.17_88, vect__5.17_89>;
  vect_iftmp.25_112 = VEC_PACK_TRUNC_EXPR <vect__5.17_90, vect__5.17_91>;
  vect_iftmp.24_113 = VEC_PACK_TRUNC_EXPR <vect_iftmp.25_111,
vect_iftmp.25_112>;
  iftmp.0_20 = (unsigned char) _5;
  mask_patt_40.26_115 = vect_x.18_92 > { 255, 255, 255, 255 };
  mask_patt_40.26_116 = vect_x.18_93 > { 255, 255, 255, 255 };
  mask_patt_40.26_117 = vect_x.18_94 > { 255, 255, 255, 255 };
  mask_patt_40.26_118 = vect_x.18_95 > { 255, 255, 255, 255 };
  mask_patt_42.28_119 = VEC_PACK_TRUNC_EXPR <mask_patt_40.26_115,
mask_patt_40.26_116>;
  mask_patt_42.28_120 = VEC_PACK_TRUNC_EXPR <mask_patt_40.26_117,
mask_patt_40.26_118>;
  mask_patt_42.27_121 = VEC_PACK_TRUNC_EXPR <mask_patt_42.28_119,
mask_patt_42.28_120>;
  vect_patt_43.29_122 = VEC_COND_EXPR <mask_patt_42.27_121, vect_iftmp.22_110,
vect_iftmp.24_113>;
  iftmp.0_21 = x.1_14 > 255 ? iftmp.0_19 : iftmp.0_20;
  MEM <vector(16) unsigned char> [(uint8_t *)vectp_dst.30_123] =
vect_patt_43.29_122;
  x_16 = x_24 + 1;
  vectp_src.12_79 = vectp_src.12_78 + 16;
  vectp_dst.30_124 = vectp_dst.30_123 + 16;
  ivtmp_127 = ivtmp_126 + 1;

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (11 preceding siblings ...)
  2022-01-13 10:43 ` crazylht at gmail dot com
@ 2022-01-13 10:44 ` rguenther at suse dot de
  2022-01-13 10:49 ` rguenther at suse dot de
                   ` (31 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: rguenther at suse dot de @ 2022-01-13 10:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #12 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 13 Jan 2022, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771
> 
> --- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
> with
> @@ -12120,7 +12120,8 @@ supportable_narrowing_operation (enum tree_code code,
>        c1 = VEC_PACK_TRUNC_EXPR;
>        if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
>           && VECTOR_BOOLEAN_TYPE_P (vectype)
> -         && TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> +         && (TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> +             || known_lt (TYPE_VECTOR_SUBPARTS (vectype), BITS_PER_UNIT))
>           && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))

I think we instead simply want

         if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
             && TYPE_PRECISION (TREE_TYPE (narrow_vectype)) == 1
             && VECTOR_BOOLEAN_TYPE_P (vectype)
             && TYPE_PRECISION (TREE_TYPE (vectype)) == 1)

note the docs of vec_pack_sbool_trunc say

This instruction pattern is used when all the vector input and output
operands have the same scalar mode @var{m} and thus using
@code{vec_pack_trunc_@var{m}} would be ambiguous.

It also says "_Narrow_ and merge the elements of two vectors.", I think
"narrow" is misleading here, _trunc in the optab name as well.  So
with the above it suggests we could have used vect_pack_trunc_hi here?

To avoid breaking things for the VnBImode using targets we probably
want to retain the SCALAR_INT_MODE_P (prev_mode) check.  And we
probably want to adjust the documentation a bit.

This all is with my pasted pattern patch or is this with the weird
inserted conversion still?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (12 preceding siblings ...)
  2022-01-13 10:44 ` rguenther at suse dot de
@ 2022-01-13 10:49 ` rguenther at suse dot de
  2022-01-13 10:50 ` crazylht at gmail dot com
                   ` (30 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: rguenther at suse dot de @ 2022-01-13 10:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #13 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 13 Jan 2022, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771
> 
> --- Comment #9 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #7)
> > I think that is what we need to add.  We also don't have a good
> > representation
> > for "packing" of masks.
> > 
> > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > index 3ea905538e1..729a1d32612 100644
> > --- a/gcc/tree-vect-patterns.c
> > +++ b/gcc/tree-vect-patterns.c
> > @@ -4679,8 +4679,10 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
> >                                   rhs1_type);
> >         }
> >  
> > -      if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype1),
> > -                   TYPE_VECTOR_SUBPARTS (vectype2)))
> > +      /* AVX512 style masks cannot be packed/unpacked.  */
> > +      if (TYPE_PRECISION (TREE_TYPE (vectype2)) != 1
> > +         && maybe_ne (TYPE_VECTOR_SUBPARTS (vectype1),
> > +                      TYPE_VECTOR_SUBPARTS (vectype2)))
> >         tmp = build_mask_conversion (vinfo, rhs1, vectype1, stmt_vinfo);
> >        else
> >         tmp = rhs1;
> Haven't had time to look at it properly yet, but my first impression
> is that that's likely to regress SVE.  Packing and unpacking are
> natural operations on boolean vector modes.

Sure, but we can't produce scalar code mimicking this for
1 bit element vectors.  It "works" for the others because based
on the width of the elements we choose a vector with different
number of elements.  But here the pattern produces a 8 bit element
which isn't want is desired here.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (13 preceding siblings ...)
  2022-01-13 10:49 ` rguenther at suse dot de
@ 2022-01-13 10:50 ` crazylht at gmail dot com
  2022-01-13 12:40 ` crazylht at gmail dot com
                   ` (29 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-01-13 10:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #14 from Hongtao.liu <crazylht at gmail dot com> ---

> 
> But still not as good as before, since original version we only need to pack
> data which is produced by vec_cond_expr, but now need to extraly pack mask.
> 
>

Also for non-avx512 target, it looks like a regression for non-sve target.

https://godbolt.org/z/4aavMjjG4

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (14 preceding siblings ...)
  2022-01-13 10:50 ` crazylht at gmail dot com
@ 2022-01-13 12:40 ` crazylht at gmail dot com
  2022-01-13 13:42 ` rguenther at suse dot de
                   ` (28 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-01-13 12:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #15 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to rguenther@suse.de from comment #12)
> On Thu, 13 Jan 2022, crazylht at gmail dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771
> > 
> > --- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
> > with
> > @@ -12120,7 +12120,8 @@ supportable_narrowing_operation (enum tree_code code,
> >        c1 = VEC_PACK_TRUNC_EXPR;
> >        if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
> >           && VECTOR_BOOLEAN_TYPE_P (vectype)
> > -         && TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> > +         && (TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> > +             || known_lt (TYPE_VECTOR_SUBPARTS (vectype), BITS_PER_UNIT))
> >           && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
> 
> I think we instead simply want
> 
>          if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
>              && TYPE_PRECISION (TREE_TYPE (narrow_vectype)) == 1
>              && VECTOR_BOOLEAN_TYPE_P (vectype)
>              && TYPE_PRECISION (TREE_TYPE (vectype)) == 1)
> 
> note the docs of vec_pack_sbool_trunc say
> 
> This instruction pattern is used when all the vector input and output
> operands have the same scalar mode @var{m} and thus using
> @code{vec_pack_trunc_@var{m}} would be ambiguous.
> 
> It also says "_Narrow_ and merge the elements of two vectors.", I think
> "narrow" is misleading here, _trunc in the optab name as well.  So
> with the above it suggests we could have used vect_pack_trunc_hi here?
> 
> To avoid breaking things for the VnBImode using targets we probably
> want to retain the SCALAR_INT_MODE_P (prev_mode) check.  And we
> probably want to adjust the documentation a bit.
> 
> This all is with my pasted pattern patch or is this with the weird
> inserted conversion still?

It's w/o your patch, I'm try to handle the weird conversion with multi
steps(first pack QI:4 -> QI:8 through vec_pack_sbool_trunc_qi, then pack QI:8
-> HI:16 through vec_pack_sbool_trunc_hi). But on the othersize the weird
inserted conversion shouldn't be existed.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (15 preceding siblings ...)
  2022-01-13 12:40 ` crazylht at gmail dot com
@ 2022-01-13 13:42 ` rguenther at suse dot de
  2022-01-13 14:42 ` crazylht at gmail dot com
                   ` (27 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: rguenther at suse dot de @ 2022-01-13 13:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #16 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 13 Jan 2022, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771
> 
> --- Comment #15 from Hongtao.liu <crazylht at gmail dot com> ---
> (In reply to rguenther@suse.de from comment #12)
> > On Thu, 13 Jan 2022, crazylht at gmail dot com wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771
> > > 
> > > --- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
> > > with
> > > @@ -12120,7 +12120,8 @@ supportable_narrowing_operation (enum tree_code code,
> > >        c1 = VEC_PACK_TRUNC_EXPR;
> > >        if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
> > >           && VECTOR_BOOLEAN_TYPE_P (vectype)
> > > -         && TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> > > +         && (TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> > > +             || known_lt (TYPE_VECTOR_SUBPARTS (vectype), BITS_PER_UNIT))
> > >           && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
> > 
> > I think we instead simply want
> > 
> >          if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
> >              && TYPE_PRECISION (TREE_TYPE (narrow_vectype)) == 1
> >              && VECTOR_BOOLEAN_TYPE_P (vectype)
> >              && TYPE_PRECISION (TREE_TYPE (vectype)) == 1)
> > 
> > note the docs of vec_pack_sbool_trunc say
> > 
> > This instruction pattern is used when all the vector input and output
> > operands have the same scalar mode @var{m} and thus using
> > @code{vec_pack_trunc_@var{m}} would be ambiguous.
> > 
> > It also says "_Narrow_ and merge the elements of two vectors.", I think
> > "narrow" is misleading here, _trunc in the optab name as well.  So
> > with the above it suggests we could have used vect_pack_trunc_hi here?
> > 
> > To avoid breaking things for the VnBImode using targets we probably
> > want to retain the SCALAR_INT_MODE_P (prev_mode) check.  And we
> > probably want to adjust the documentation a bit.
> > 
> > This all is with my pasted pattern patch or is this with the weird
> > inserted conversion still?
> 
> It's w/o your patch, I'm try to handle the weird conversion with multi
> steps(first pack QI:4 -> QI:8 through vec_pack_sbool_trunc_qi, then pack QI:8
> -> HI:16 through vec_pack_sbool_trunc_hi). But on the othersize the weird
> inserted conversion shouldn't be existed.

But the weird conversion suggests packing { 0, 1, 0, 1 } as
{ 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, ... }
thus expanding each bit to 8 bits.  So it's rather an unpacking :/
As said, the scalar conversion does not make any sense...

But maybe I'm missing something very obvious?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (16 preceding siblings ...)
  2022-01-13 13:42 ` rguenther at suse dot de
@ 2022-01-13 14:42 ` crazylht at gmail dot com
  2022-01-14  9:40 ` rsandifo at gcc dot gnu.org
                   ` (26 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-01-13 14:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #17 from Hongtao.liu <crazylht at gmail dot com> ---
> As said, the scalar conversion does not make any sense...
Agree.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (17 preceding siblings ...)
  2022-01-13 14:42 ` crazylht at gmail dot com
@ 2022-01-14  9:40 ` rsandifo at gcc dot gnu.org
  2022-01-14 10:18 ` rguenth at gcc dot gnu.org
                   ` (25 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2022-01-14  9:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #18 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
Again haven't really looked at this in detail, so could be wrong, but:

(In reply to rguenther@suse.de from comment #13)
> On Thu, 13 Jan 2022, rsandifo at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771
> > 
> > --- Comment #9 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
> > (In reply to Richard Biener from comment #7)
> > > I think that is what we need to add.  We also don't have a good
> > > representation
> > > for "packing" of masks.
> > > 
> > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > > index 3ea905538e1..729a1d32612 100644
> > > --- a/gcc/tree-vect-patterns.c
> > > +++ b/gcc/tree-vect-patterns.c
> > > @@ -4679,8 +4679,10 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
> > >                                   rhs1_type);
> > >         }
> > >  
> > > -      if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype1),
> > > -                   TYPE_VECTOR_SUBPARTS (vectype2)))
> > > +      /* AVX512 style masks cannot be packed/unpacked.  */
> > > +      if (TYPE_PRECISION (TREE_TYPE (vectype2)) != 1
> > > +         && maybe_ne (TYPE_VECTOR_SUBPARTS (vectype1),
> > > +                      TYPE_VECTOR_SUBPARTS (vectype2)))
> > >         tmp = build_mask_conversion (vinfo, rhs1, vectype1, stmt_vinfo);
> > >        else
> > >         tmp = rhs1;
> > Haven't had time to look at it properly yet, but my first impression
> > is that that's likely to regress SVE.  Packing and unpacking are
> > natural operations on boolean vector modes.
> 
> Sure, but we can't produce scalar code mimicking this for
> 1 bit element vectors.
Yeah, but at least in the SVE case, we're not aiming to do that.
The vector boolean type that we want to use is recorded in the
STMT_VINFO_VECTYPE of the conversion instead, and doesn't get
recomputed later.

Like you said in the earlier comment, the fact that we can't
do that on scalar code is why this needs to be a vector pattern.

(As discussed elsewhere, it would be good if we didn't commit
vector types early like this.  But in this case I don't think
we can rely on scalar code to model the conversion either.)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (18 preceding siblings ...)
  2022-01-14  9:40 ` rsandifo at gcc dot gnu.org
@ 2022-01-14 10:18 ` rguenth at gcc dot gnu.org
  2022-01-17  8:22 ` crazylht at gmail dot com
                   ` (24 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-14 10:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ah, so the issue is missing -mavx512bw which means we end up with a AVX2 style
mask for V32QImode.  With -mavx512bw the code vectorizes fine.

I guess ix86_get_mask_mode is too restrictive here?

And indeed to build a AVX2 style mask vector from a AVX512 style mask vector
we'd need to use a ?:, not a simple conversion.  Unfortunately we don't support
vectorizing that:

t.c:10:21: note:   ==> examining statement: patt_42 = patt_40 ? -1 : 0;
t.c:10:21: note:   vect_is_simple_use: operand x.1_14 > 255, type of def:
internal
t.c:10:21: note:   vect_is_simple_use: vectype vector(8) <signed-boolean:1>
t.c:10:21: note:   vect_is_simple_use: operand -1, type of def: constant
t.c:10:21: note:   vect_is_simple_use: operand 0, type of def: constant
t.c:8:6: missed:   not vectorized: relevant stmt not supported: patt_42 =
patt_40 ? -1 : 0;

or maybe we need to tweak the conversion vectorization for better support
of this case as noted in comment#10 ... but the code generated from that
AFAIU is to produce a HImode mask, not a V32QImode one which puts the fault
on ix86_get_mask_mode again.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (19 preceding siblings ...)
  2022-01-14 10:18 ` rguenth at gcc dot gnu.org
@ 2022-01-17  8:22 ` crazylht at gmail dot com
  2022-01-18  8:31 ` crazylht at gmail dot com
                   ` (23 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-01-17  8:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #20 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #19)
> Ah, so the issue is missing -mavx512bw which means we end up with a AVX2
> style
> mask for V32QImode.  With -mavx512bw the code vectorizes fine.

Vectorization code is worse than before, now we need to pack vectorized mask
which takes extra 3 instructions.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (20 preceding siblings ...)
  2022-01-17  8:22 ` crazylht at gmail dot com
@ 2022-01-18  8:31 ` crazylht at gmail dot com
  2022-01-18  8:36 ` crazylht at gmail dot com
                   ` (22 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-01-18  8:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #21 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #20)
> (In reply to Richard Biener from comment #19)
> > Ah, so the issue is missing -mavx512bw which means we end up with a AVX2
> > style
> > mask for V32QImode.  With -mavx512bw the code vectorizes fine.
> 
> Vectorization code is worse than before, now we need to pack vectorized mask
> which takes extra 3 instructions.

Current ifcvt convert

---------dump of .ch_vect-------
  if (x.1_14 > 255)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 4> [local count: 477815112]:
  _17 = -_5;
  _18 = _17 >> 31;
  iftmp.0_19 = (unsigned char) _18;
  goto <bb 6>; [100.00%]

  <bb 5> [local count: 477815112]:
  iftmp.0_20 = (unsigned char) _5;

  <bb 6> [local count: 955630225]:
  # iftmp.0_21 = PHI <iftmp.0_19(4), iftmp.0_20(5)>
-------dump end---------


to 
---- dump of .ifcvt---------
  _41 = -x.1_14;
  _17 = (int) _41;
  _18 = _17 >> 31;
  iftmp.0_19 = (unsigned char) _18; -- vec_pack_trunc
  iftmp.0_20 = (unsigned char) _5; -- vec_pack_trunc
  iftmp.0_21 = x.1_14 > 255 ? iftmp.0_19 : iftmp.0_20; -- vec_pack_trunc
  *_6 = iftmp.0_21;
  x_16 = x_24 + 1;
-----dump end----------


if ifcvt output things like
------------optimal .ifcvt------
  _41 = -x.1_14;
  _17 = (int) _41;
  _18 = _17 >> 31;
  iftmp.0_21 = x.1_14 > 255 ? _18 : _5;
  iftmp.0_22 = (unsigned char) iftmp.0_21; --- vec_pack_trunc
  *_6 = iftmp.0_22;
  x_16 = x_24 + 1;
------------end------------

we can save operations for packing mask(3 vec_pack_trunc vs 1 vec_pack_trunc?).

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (21 preceding siblings ...)
  2022-01-18  8:31 ` crazylht at gmail dot com
@ 2022-01-18  8:36 ` crazylht at gmail dot com
  2022-01-18 10:44 ` rguenther at suse dot de
                   ` (21 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-01-18  8:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #22 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #21)
> (In reply to Hongtao.liu from comment #20)
> > (In reply to Richard Biener from comment #19)
> > > Ah, so the issue is missing -mavx512bw which means we end up with a AVX2
> > > style
> > > mask for V32QImode.  With -mavx512bw the code vectorizes fine.
> > 
> > Vectorization code is worse than before, now we need to pack vectorized mask
> > which takes extra 3 instructions.
> 
> Current ifcvt convert
> 
> ---------dump of .ch_vect-------
>   if (x.1_14 > 255)
>     goto <bb 4>; [50.00%]
>   else
>     goto <bb 5>; [50.00%]
> 
>   <bb 4> [local count: 477815112]:
>   _17 = -_5;
>   _18 = _17 >> 31;
>   iftmp.0_19 = (unsigned char) _18;
>   goto <bb 6>; [100.00%]
> 
>   <bb 5> [local count: 477815112]:
>   iftmp.0_20 = (unsigned char) _5;
> 
>   <bb 6> [local count: 955630225]:
>   # iftmp.0_21 = PHI <iftmp.0_19(4), iftmp.0_20(5)>
> -------dump end---------
> 
> 
> to 
> ---- dump of .ifcvt---------
>   _41 = -x.1_14;
>   _17 = (int) _41;
>   _18 = _17 >> 31;
>   iftmp.0_19 = (unsigned char) _18; -- vec_pack_trunc
>   iftmp.0_20 = (unsigned char) _5; -- vec_pack_trunc
>   iftmp.0_21 = x.1_14 > 255 ? iftmp.0_19 : iftmp.0_20; -- vec_pack_trunc
>   *_6 = iftmp.0_21;
>   x_16 = x_24 + 1;
> -----dump end----------
> 
> 
> if ifcvt output things like
> ------------optimal .ifcvt------
>   _41 = -x.1_14;
>   _17 = (int) _41;
>   _18 = _17 >> 31;
>   iftmp.0_21 = x.1_14 > 255 ? _18 : _5;
>   iftmp.0_22 = (unsigned char) iftmp.0_21; --- vec_pack_trunc
>   *_6 = iftmp.0_22;
>   x_16 = x_24 + 1;
> ------------end------------
> 
> we can save operations for packing mask(3 vec_pack_trunc vs 1
> vec_pack_trunc?).

Or maybe a gimple simplification for it?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (22 preceding siblings ...)
  2022-01-18  8:36 ` crazylht at gmail dot com
@ 2022-01-18 10:44 ` rguenther at suse dot de
  2022-01-18 10:49 ` pinskia at gcc dot gnu.org
                   ` (20 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: rguenther at suse dot de @ 2022-01-18 10:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #23 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 18 Jan 2022, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771
> 
> --- Comment #22 from Hongtao.liu <crazylht at gmail dot com> ---
> (In reply to Hongtao.liu from comment #21)
> > (In reply to Hongtao.liu from comment #20)
> > > (In reply to Richard Biener from comment #19)
> > > > Ah, so the issue is missing -mavx512bw which means we end up with a AVX2
> > > > style
> > > > mask for V32QImode.  With -mavx512bw the code vectorizes fine.
> > > 
> > > Vectorization code is worse than before, now we need to pack vectorized mask
> > > which takes extra 3 instructions.
> > 
> > Current ifcvt convert
> > 
> > ---------dump of .ch_vect-------
> >   if (x.1_14 > 255)
> >     goto <bb 4>; [50.00%]
> >   else
> >     goto <bb 5>; [50.00%]
> > 
> >   <bb 4> [local count: 477815112]:
> >   _17 = -_5;
> >   _18 = _17 >> 31;
> >   iftmp.0_19 = (unsigned char) _18;
> >   goto <bb 6>; [100.00%]
> > 
> >   <bb 5> [local count: 477815112]:
> >   iftmp.0_20 = (unsigned char) _5;
> > 
> >   <bb 6> [local count: 955630225]:
> >   # iftmp.0_21 = PHI <iftmp.0_19(4), iftmp.0_20(5)>
> > -------dump end---------
> > 
> > 
> > to 
> > ---- dump of .ifcvt---------
> >   _41 = -x.1_14;
> >   _17 = (int) _41;
> >   _18 = _17 >> 31;
> >   iftmp.0_19 = (unsigned char) _18; -- vec_pack_trunc
> >   iftmp.0_20 = (unsigned char) _5; -- vec_pack_trunc
> >   iftmp.0_21 = x.1_14 > 255 ? iftmp.0_19 : iftmp.0_20; -- vec_pack_trunc
> >   *_6 = iftmp.0_21;
> >   x_16 = x_24 + 1;
> > -----dump end----------
> > 
> > 
> > if ifcvt output things like
> > ------------optimal .ifcvt------
> >   _41 = -x.1_14;
> >   _17 = (int) _41;
> >   _18 = _17 >> 31;
> >   iftmp.0_21 = x.1_14 > 255 ? _18 : _5;
> >   iftmp.0_22 = (unsigned char) iftmp.0_21; --- vec_pack_trunc
> >   *_6 = iftmp.0_22;
> >   x_16 = x_24 + 1;
> > ------------end------------
> > 
> > we can save operations for packing mask(3 vec_pack_trunc vs 1
> > vec_pack_trunc?).
> 
> Or maybe a gimple simplification for it?

Yes, I think that's a candidate for a match.pd simplification.
Fortunately if-conversion already folds the built stmts.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (23 preceding siblings ...)
  2022-01-18 10:44 ` rguenther at suse dot de
@ 2022-01-18 10:49 ` pinskia at gcc dot gnu.org
  2022-01-19  6:03 ` crazylht at gmail dot com
                   ` (19 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-01-18 10:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu.org

--- Comment #24 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note phiopt has code which is supposed to factor out the cast but many reasons,
I have noticed it almost never does it (I still don't understand the cases when
it does and when it does not yet). This was something which I was going to fix
for GCC 13.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (24 preceding siblings ...)
  2022-01-18 10:49 ` pinskia at gcc dot gnu.org
@ 2022-01-19  6:03 ` crazylht at gmail dot com
  2022-01-19  6:10 ` pinskia at gcc dot gnu.org
                   ` (18 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-01-19  6:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #25 from Hongtao.liu <crazylht at gmail dot com> ---
in fold_unary_loc
---cut from fold-const.cc-----
 9276      else if (TREE_CODE (arg0) == COND_EXPR)
 9277        {
 9278          tree arg01 = TREE_OPERAND (arg0, 1);
 9279          tree arg02 = TREE_OPERAND (arg0, 2);
 9280          if (! VOID_TYPE_P (TREE_TYPE (arg01)))
 9281            arg01 = fold_build1_loc (loc, code, type,
 9282                                 fold_convert_loc (loc,
 9283                                                   TREE_TYPE (op0),
arg01));
 9284          if (! VOID_TYPE_P (TREE_TYPE (arg02)))
 9285            arg02 = fold_build1_loc (loc, code, type,
 9286                                 fold_convert_loc (loc,
 9287                                                   TREE_TYPE (op0),
arg02));
 9288=>        tem = fold_build3_loc (loc, COND_EXPR, type, TREE_OPERAND (arg0,
0),
 9289                             arg01, arg02);

-----------end---------------

gcc always tries to simplify (convert (cond (cmp a b) c d) ---- > (cond (cmp a
b) (convert c) (convert d)), exactly the opposite of what this case wants.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (25 preceding siblings ...)
  2022-01-19  6:03 ` crazylht at gmail dot com
@ 2022-01-19  6:10 ` pinskia at gcc dot gnu.org
  2022-01-19  6:25 ` crazylht at gmail dot com
                   ` (17 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-01-19  6:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #26 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #25)
> 
> gcc always tries to simplify (convert (cond (cmp a b) c d) ---- > (cond (cmp
> a b) (convert c) (convert d)), exactly the opposite of what this case wants.

Right that is for generic. For the gimple level the function
factor_out_conditional_conversion in phiopt is supposed to undo this but as I
mentioned it does not always.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (26 preceding siblings ...)
  2022-01-19  6:10 ` pinskia at gcc dot gnu.org
@ 2022-01-19  6:25 ` crazylht at gmail dot com
  2022-01-19  7:44 ` rguenth at gcc dot gnu.org
                   ` (16 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-01-19  6:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #27 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Andrew Pinski from comment #26)
> (In reply to Hongtao.liu from comment #25)
> > 
> > gcc always tries to simplify (convert (cond (cmp a b) c d) ---- > (cond (cmp
> > a b) (convert c) (convert d)), exactly the opposite of what this case wants.
> 
> Right that is for generic. For the gimple level the function
> factor_out_conditional_conversion in phiopt is supposed to undo this but as
Oh, thanks for the info. I'll leave it to you in GCC13.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (27 preceding siblings ...)
  2022-01-19  6:25 ` crazylht at gmail dot com
@ 2022-01-19  7:44 ` rguenth at gcc dot gnu.org
  2022-01-19  8:33 ` crazylht at gmail dot com
                   ` (15 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-19  7:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #28 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #25)
> in fold_unary_loc
> ---cut from fold-const.cc-----
>  9276      else if (TREE_CODE (arg0) == COND_EXPR)
>  9277        {
>  9278          tree arg01 = TREE_OPERAND (arg0, 1);
>  9279          tree arg02 = TREE_OPERAND (arg0, 2);
>  9280          if (! VOID_TYPE_P (TREE_TYPE (arg01)))
>  9281            arg01 = fold_build1_loc (loc, code, type,
>  9282                                 fold_convert_loc (loc,
>  9283                                                   TREE_TYPE (op0),
> arg01));
>  9284          if (! VOID_TYPE_P (TREE_TYPE (arg02)))
>  9285            arg02 = fold_build1_loc (loc, code, type,
>  9286                                 fold_convert_loc (loc,
>  9287                                                   TREE_TYPE (op0),
> arg02));
>  9288=>        tem = fold_build3_loc (loc, COND_EXPR, type, TREE_OPERAND
> (arg0, 0),
>  9289                             arg01, arg02);
> 
> -----------end---------------
> 
> gcc always tries to simplify (convert (cond (cmp a b) c d) ---- > (cond (cmp
> a b) (convert c) (convert d)), exactly the opposite of what this case wants.

It also then undos this if the result didn't simplify and plays trick to avoid
recursions.

I think this particular transform ought to be specialized, maybe to
(T)p?(T')a:(T')b or maybe done during gimplification or RTL expansion only.

The "cheap" way of avoiding a conflict is to wrap the match.pd pattern
with opposite logic in

#if GIMPLE
#endif

(with a comment explaining this)

Note that we can move a conversion out only if the sources of the conversions
have compatible types but we always can move a conversion in.

Alternatively this transform can also be done in a vectorizer pattern based
on vector compatibility of the ?: predicate with the data.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (28 preceding siblings ...)
  2022-01-19  7:44 ` rguenth at gcc dot gnu.org
@ 2022-01-19  8:33 ` crazylht at gmail dot com
  2022-01-19  8:48 ` rguenther at suse dot de
                   ` (14 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-01-19  8:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #29 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #28)
> (In reply to Hongtao.liu from comment #25)
> > in fold_unary_loc
> > ---cut from fold-const.cc-----
> >  9276      else if (TREE_CODE (arg0) == COND_EXPR)
> >  9277        {
> >  9278          tree arg01 = TREE_OPERAND (arg0, 1);
> >  9279          tree arg02 = TREE_OPERAND (arg0, 2);
> >  9280          if (! VOID_TYPE_P (TREE_TYPE (arg01)))
> >  9281            arg01 = fold_build1_loc (loc, code, type,
> >  9282                                 fold_convert_loc (loc,
> >  9283                                                   TREE_TYPE (op0),
> > arg01));
> >  9284          if (! VOID_TYPE_P (TREE_TYPE (arg02)))
> >  9285            arg02 = fold_build1_loc (loc, code, type,
> >  9286                                 fold_convert_loc (loc,
> >  9287                                                   TREE_TYPE (op0),
> > arg02));
> >  9288=>        tem = fold_build3_loc (loc, COND_EXPR, type, TREE_OPERAND
> > (arg0, 0),
> >  9289                             arg01, arg02);
> > 
> > -----------end---------------
> > 
> > gcc always tries to simplify (convert (cond (cmp a b) c d) ---- > (cond (cmp
> > a b) (convert c) (convert d)), exactly the opposite of what this case wants.
> 
> It also then undos this if the result didn't simplify and plays trick to
> avoid
> recursions.
> 
> I think this particular transform ought to be specialized, maybe to
> (T)p?(T')a:(T')b or maybe done during gimplification or RTL expansion only.
> 
> The "cheap" way of avoiding a conflict is to wrap the match.pd pattern
> with opposite logic in
> 
> #if GIMPLE
> #endif
> 
It doesn't work, 
> (with a comment explaining this)
> 
> Note that we can move a conversion out only if the sources of the conversions
> have compatible types but we always can move a conversion in.
> 
> Alternatively this transform can also be done in a vectorizer pattern based
> on vector compatibility of the ?: predicate with the data.
yes, I'm thinking of doing this in fold_build_cond_expr which is only used by
pass_ifcvt to generate cond_expr.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (29 preceding siblings ...)
  2022-01-19  8:33 ` crazylht at gmail dot com
@ 2022-01-19  8:48 ` rguenther at suse dot de
  2022-01-20  8:52 ` cvs-commit at gcc dot gnu.org
                   ` (13 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: rguenther at suse dot de @ 2022-01-19  8:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #30 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 19 Jan 2022, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771
> 
> --- Comment #29 from Hongtao.liu <crazylht at gmail dot com> ---
> (In reply to Richard Biener from comment #28)
> > (In reply to Hongtao.liu from comment #25)
> > > in fold_unary_loc
> > > ---cut from fold-const.cc-----
> > >  9276      else if (TREE_CODE (arg0) == COND_EXPR)
> > >  9277        {
> > >  9278          tree arg01 = TREE_OPERAND (arg0, 1);
> > >  9279          tree arg02 = TREE_OPERAND (arg0, 2);
> > >  9280          if (! VOID_TYPE_P (TREE_TYPE (arg01)))
> > >  9281            arg01 = fold_build1_loc (loc, code, type,
> > >  9282                                 fold_convert_loc (loc,
> > >  9283                                                   TREE_TYPE (op0),
> > > arg01));
> > >  9284          if (! VOID_TYPE_P (TREE_TYPE (arg02)))
> > >  9285            arg02 = fold_build1_loc (loc, code, type,
> > >  9286                                 fold_convert_loc (loc,
> > >  9287                                                   TREE_TYPE (op0),
> > > arg02));
> > >  9288=>        tem = fold_build3_loc (loc, COND_EXPR, type, TREE_OPERAND
> > > (arg0, 0),
> > >  9289                             arg01, arg02);
> > > 
> > > -----------end---------------
> > > 
> > > gcc always tries to simplify (convert (cond (cmp a b) c d) ---- > (cond (cmp
> > > a b) (convert c) (convert d)), exactly the opposite of what this case wants.
> > 
> > It also then undos this if the result didn't simplify and plays trick to
> > avoid
> > recursions.
> > 
> > I think this particular transform ought to be specialized, maybe to
> > (T)p?(T')a:(T')b or maybe done during gimplification or RTL expansion only.
> > 
> > The "cheap" way of avoiding a conflict is to wrap the match.pd pattern
> > with opposite logic in
> > 
> > #if GIMPLE
> > #endif
> > 
> It doesn't work, 
> > (with a comment explaining this)
> > 
> > Note that we can move a conversion out only if the sources of the conversions
> > have compatible types but we always can move a conversion in.
> > 
> > Alternatively this transform can also be done in a vectorizer pattern based
> > on vector compatibility of the ?: predicate with the data.
> yes, I'm thinking of doing this in fold_build_cond_expr which is only used by
> pass_ifcvt to generate cond_expr.

That's also a reasonable place but the vectorizer pattern recog phase
might have more contextual information to determine the best types
to use.   fold_build_cond_expr is probably easiest to adjust though.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (30 preceding siblings ...)
  2022-01-19  8:48 ` rguenther at suse dot de
@ 2022-01-20  8:52 ` cvs-commit at gcc dot gnu.org
  2022-01-27  6:03 ` crazylht at gmail dot com
                   ` (12 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-01-20  8:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #31 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:8bc700f4c3fbe405413db02281ef2918bfa831fc

commit r12-6756-g8bc700f4c3fbe405413db02281ef2918bfa831fc
Author: liuhongt <hongtao.liu@intel.com>
Date:   Mon Jan 17 10:47:46 2022 +0800

    Enhance vec_pack_trunc for integral mode mask.

    For testcase in PR, the patch supports QI:4 -> HI:16 pack with
    multi steps(first pack QI:4 -> QI:8 through vec_pack_sbool_trunc_qi,
    then pack QI:8 -> HI:16 through vec_pack_trunc_hi).
    Similar for QI:2 -> HI:16 which is test4 in mask-pack-prefer-128.c.

    gcc/ChangeLog:

            PR target/103771
            * tree-vect-stmts.cc (supportable_narrowing_operation): Enhance
            integral mode mask pack by multi steps which takes
            vec_pack_sbool_trunc_optab as start when elements number is
            less than BITS_PER_UNITS.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/mask-pack-prefer128.c: New test.
            * gcc.target/i386/mask-pack-prefer256.c: New test.
            * gcc.target/i386/pr103771.c: New test.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (31 preceding siblings ...)
  2022-01-20  8:52 ` cvs-commit at gcc dot gnu.org
@ 2022-01-27  6:03 ` crazylht at gmail dot com
  2022-02-01  5:21 ` pinskia at gcc dot gnu.org
                   ` (11 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-01-27  6:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #32 from Hongtao.liu <crazylht at gmail dot com> ---
A patch is posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589168.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (32 preceding siblings ...)
  2022-01-27  6:03 ` crazylht at gmail dot com
@ 2022-02-01  5:21 ` pinskia at gcc dot gnu.org
  2022-02-01 16:13 ` law at gcc dot gnu.org
                   ` (10 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-02-01  5:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |pinskia at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #33 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Oh yes I remember why factor_out_conditional_conversion is not doing it. it is
because we have two bb but the current code does not handle that case.
That is we have:
  if (x.1_1 > 255)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  _2 = -x_5(D);
  _3 = _2 >> 31;
  iftmp.0_7 = (uint8_t) _3;
  goto <bb 5>; [INV]

  <bb 4> :
  iftmp.0_6 = (uint8_t) x_5(D);

  <bb 5> :
  # iftmp.0_4 = PHI <iftmp.0_7(3), iftmp.0_6(4)>

While the correct code only handles the case where the one of the bb4 does not
exist. Yes I am working on this for GCC 13.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (33 preceding siblings ...)
  2022-02-01  5:21 ` pinskia at gcc dot gnu.org
@ 2022-02-01 16:13 ` law at gcc dot gnu.org
  2022-02-01 16:18 ` pinskia at gcc dot gnu.org
                   ` (9 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: law at gcc dot gnu.org @ 2022-02-01 16:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #34 from Jeffrey A. Law <law at gcc dot gnu.org> ---
I've always wanted to see us be able to push something like those matched
conversions down through the PHI.

That would make the code look like:

  if (x.1_1 > 255)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  _2 = -x_5(D);
  _3 = _2 >> 31;
  goto <bb 5>; [INV]

  <bb 4> :

  <bb 5> :
  # tmp = PHI <_3, x_5(4)>
  iftmp.0_4 = (uint8_t) tmp

And presumably we'd clean up the empty bb4 which would in turn unblock other
optimizations.

Is that what you're working on?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (34 preceding siblings ...)
  2022-02-01 16:13 ` law at gcc dot gnu.org
@ 2022-02-01 16:18 ` pinskia at gcc dot gnu.org
  2022-02-13  9:58 ` cvs-commit at gcc dot gnu.org
                   ` (8 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-02-01 16:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #35 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #34)
> I've always wanted to see us be able to push something like those matched
> conversions down through the PHI.
> 
....

> Is that what you're working on?

Yes.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (35 preceding siblings ...)
  2022-02-01 16:18 ` pinskia at gcc dot gnu.org
@ 2022-02-13  9:58 ` cvs-commit at gcc dot gnu.org
  2022-02-14  7:36 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-02-13  9:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #36 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:7e204bd2f189850cb940677c99d8d93eb7dd40cd

commit r12-7216-g7e204bd2f189850cb940677c99d8d93eb7dd40cd
Author: liuhongt <hongtao.liu@intel.com>
Date:   Mon Jan 24 11:05:47 2022 +0800

    Add vect_recog_cond_expr_convert_pattern.

    The pattern converts (cond (cmp a b) (convert c) (convert d))
    to (convert (cond (cmp a b) c d)) when
    1) types_match (c, d)
    2) single_use for (convert c) and (convert d)
    3) TYPE_PRECISION (TREE_TYPE (c)) == TYPE_PRECISION (TREE_TYPE (a))
    4) INTEGERAL_TYPE_P (TREE_TYPE (c))

    The pattern can save packing of mask and data(partial for data, 2 vs
    1).

    gcc/ChangeLog:

            PR target/103771
            * match.pd (cond_expr_convert_p): New match.
            * tree-vect-patterns.cc (gimple_cond_expr_convert_p): Declare.
            (vect_recog_cond_expr_convert_pattern): New.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr103771-2.c: New test.
            * gcc.target/i386/pr103771-3.c: New test.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (36 preceding siblings ...)
  2022-02-13  9:58 ` cvs-commit at gcc dot gnu.org
@ 2022-02-14  7:36 ` rguenth at gcc dot gnu.org
  2022-02-14  7:45 ` crazylht at gmail dot com
                   ` (6 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-02-14  7:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #37 from Richard Biener <rguenth at gcc dot gnu.org> ---
Is this now "good enough" for GCC 12?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (37 preceding siblings ...)
  2022-02-14  7:36 ` rguenth at gcc dot gnu.org
@ 2022-02-14  7:45 ` crazylht at gmail dot com
  2022-02-14  7:53 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: crazylht at gmail dot com @ 2022-02-14  7:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #38 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #37)
> Is this now "good enough" for GCC 12?

Yes.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (38 preceding siblings ...)
  2022-02-14  7:45 ` crazylht at gmail dot com
@ 2022-02-14  7:53 ` rguenth at gcc dot gnu.org
  2022-02-17 11:01 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-02-14  7:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
   Target Milestone|12.0                        |13.0

--- Comment #39 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed for the target regression part, Andrew is working on a more generic fix
for GCC 13.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (39 preceding siblings ...)
  2022-02-14  7:53 ` rguenth at gcc dot gnu.org
@ 2022-02-17 11:01 ` cvs-commit at gcc dot gnu.org
  2023-04-26  6:55 ` [Bug target/103771] [12/13/14 " rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-02-17 11:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #40 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:754dce903ca28c4c2f2bc8614a8de5e631655f2e

commit r12-7276-g754dce903ca28c4c2f2bc8614a8de5e631655f2e
Author: liuhongt <hongtao.liu@intel.com>
Date:   Wed Feb 16 12:15:18 2022 +0800

    Restrict the two sources of vect_recog_cond_expr_convert_pattern to be of
the same type when convert is extension.

    It's not equal to transform

     (cond (cmp @1 @2) (convert@3 @4) (convert@5 @6))

     to

     (convert (cmp @1 @2) (convert)@4 @6)

    when(convert@3 @4) is extension because it's zero_extend vs sign_extend.

    gcc/ChangeLog:

            PR tree-optimization/104551
            PR tree-optimization/103771
            * match.pd (cond_expr_convert_p): Add types_match check when
            convert is extension.
            * tree-vect-patterns.cc
            (gimple_cond_expr_convert_p): Adjust comments.
            (vect_recog_cond_expr_convert_pattern): Ditto.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr104551.c: New test.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12/13/14 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (40 preceding siblings ...)
  2022-02-17 11:01 ` cvs-commit at gcc dot gnu.org
@ 2023-04-26  6:55 ` rguenth at gcc dot gnu.org
  2023-05-06 23:21 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  44 siblings, 0 replies; 46+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-04-26  6:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|13.0                        |13.2

--- Comment #41 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 13.1 is being released, retargeting bugs to GCC 13.2.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12/13/14 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (41 preceding siblings ...)
  2023-04-26  6:55 ` [Bug target/103771] [12/13/14 " rguenth at gcc dot gnu.org
@ 2023-05-06 23:21 ` pinskia at gcc dot gnu.org
  2023-05-08  7:38 ` cvs-commit at gcc dot gnu.org
  2023-05-08  7:39 ` pinskia at gcc dot gnu.org
  44 siblings, 0 replies; 46+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-06 23:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #42 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The patch is simple after recent cleanups:
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index f14b7e8b7e6..41fea78dc8d 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -4087,7 +4087,7 @@ pass_phiopt::execute (function *)

       gphi *newphi;
       if (single_pred_p (bb1)
-         && !diamond_p
+         && EDGE_COUNT (merge->preds) == 2
          && (newphi = factor_out_conditional_conversion (e1, e2, phi,
                                                          arg0, arg1,
                                                          cond_stmt)))


I am putting together testcases for this right now.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12/13/14 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (42 preceding siblings ...)
  2023-05-06 23:21 ` pinskia at gcc dot gnu.org
@ 2023-05-08  7:38 ` cvs-commit at gcc dot gnu.org
  2023-05-08  7:39 ` pinskia at gcc dot gnu.org
  44 siblings, 0 replies; 46+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-05-08  7:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #43 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Andrew Pinski <pinskia@gcc.gnu.org>:

https://gcc.gnu.org/g:69f1a8af45d8a42003c21489019ddfb01d88d30d

commit r14-573-g69f1a8af45d8a42003c21489019ddfb01d88d30d
Author: Andrew Pinski <apinski@marvell.com>
Date:   Wed Apr 26 14:55:46 2023 -0700

    PHIOPT: Add diamond bb form to factor_out_conditional_conversion

    So the function factor_out_conditional_conversion already supports
    diamond shaped bb forms, just need to be called for such a thing.

    harden-cond-comp.c needed to be changed as we would optimize out the
    conversion now and that causes the compare hardening not needing to
    split the block which it was testing. So change it such that there
    would be no chance of optimization.

    Also add two testcases that showed the improvement. PR 103771 is
    solved in ifconvert also for the vectorizer but now it is solved
    in a general sense.

    OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

            PR tree-optimization/49959
            PR tree-optimization/103771

    gcc/ChangeLog:

            * tree-ssa-phiopt.cc (pass_phiopt::execute): Support
            Diamond shapped bb form for factor_out_conditional_conversion.

    gcc/testsuite/ChangeLog:

            * c-c++-common/torture/harden-cond-comp.c: Change testcase
            slightly to avoid the new phiopt optimization.
            * gcc.dg/tree-ssa/abs-2.c: New test.
            * gcc.dg/tree-ssa/pr103771.c: New test.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Bug target/103771] [12/13/14 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489
  2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
                   ` (43 preceding siblings ...)
  2023-05-08  7:38 ` cvs-commit at gcc dot gnu.org
@ 2023-05-08  7:39 ` pinskia at gcc dot gnu.org
  44 siblings, 0 replies; 46+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-08  7:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
   Target Milestone|13.2                        |14.0
             Status|ASSIGNED                    |RESOLVED

--- Comment #44 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Fixed.

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2023-05-08  7:39 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-20  8:59 [Bug target/103771] New: Missed vectorization under -mavx512f -mavx512vl after r12-5489 wwwhhhyyy333 at gmail dot com
2021-12-20 11:45 ` [Bug target/103771] [12 Regression] " pinskia at gcc dot gnu.org
2022-01-04 15:31 ` tnfchris at gcc dot gnu.org
2022-01-05  9:00 ` crazylht at gmail dot com
2022-01-13  8:43 ` crazylht at gmail dot com
2022-01-13  9:02 ` crazylht at gmail dot com
2022-01-13  9:04 ` crazylht at gmail dot com
2022-01-13  9:28 ` rguenth at gcc dot gnu.org
2022-01-13  9:50 ` rguenth at gcc dot gnu.org
2022-01-13 10:04 ` crazylht at gmail dot com
2022-01-13 10:11 ` rsandifo at gcc dot gnu.org
2022-01-13 10:22 ` crazylht at gmail dot com
2022-01-13 10:43 ` crazylht at gmail dot com
2022-01-13 10:44 ` rguenther at suse dot de
2022-01-13 10:49 ` rguenther at suse dot de
2022-01-13 10:50 ` crazylht at gmail dot com
2022-01-13 12:40 ` crazylht at gmail dot com
2022-01-13 13:42 ` rguenther at suse dot de
2022-01-13 14:42 ` crazylht at gmail dot com
2022-01-14  9:40 ` rsandifo at gcc dot gnu.org
2022-01-14 10:18 ` rguenth at gcc dot gnu.org
2022-01-17  8:22 ` crazylht at gmail dot com
2022-01-18  8:31 ` crazylht at gmail dot com
2022-01-18  8:36 ` crazylht at gmail dot com
2022-01-18 10:44 ` rguenther at suse dot de
2022-01-18 10:49 ` pinskia at gcc dot gnu.org
2022-01-19  6:03 ` crazylht at gmail dot com
2022-01-19  6:10 ` pinskia at gcc dot gnu.org
2022-01-19  6:25 ` crazylht at gmail dot com
2022-01-19  7:44 ` rguenth at gcc dot gnu.org
2022-01-19  8:33 ` crazylht at gmail dot com
2022-01-19  8:48 ` rguenther at suse dot de
2022-01-20  8:52 ` cvs-commit at gcc dot gnu.org
2022-01-27  6:03 ` crazylht at gmail dot com
2022-02-01  5:21 ` pinskia at gcc dot gnu.org
2022-02-01 16:13 ` law at gcc dot gnu.org
2022-02-01 16:18 ` pinskia at gcc dot gnu.org
2022-02-13  9:58 ` cvs-commit at gcc dot gnu.org
2022-02-14  7:36 ` rguenth at gcc dot gnu.org
2022-02-14  7:45 ` crazylht at gmail dot com
2022-02-14  7:53 ` rguenth at gcc dot gnu.org
2022-02-17 11:01 ` cvs-commit at gcc dot gnu.org
2023-04-26  6:55 ` [Bug target/103771] [12/13/14 " rguenth at gcc dot gnu.org
2023-05-06 23:21 ` pinskia at gcc dot gnu.org
2023-05-08  7:38 ` cvs-commit at gcc dot gnu.org
2023-05-08  7:39 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).