public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [patch, vectorizer] Fix PR33953 ICE in vectorizable_operation at  tree-vect-transform.c:4017
@ 2007-11-01 15:08 Ira Rosen
  2007-11-06  8:18 ` Dorit Nuzman
  0 siblings, 1 reply; 7+ messages in thread
From: Ira Rosen @ 2007-11-01 15:08 UTC (permalink / raw)
  To: gcc-patches


Hi,

In case of vector shift with scalar shift argument only one scalar operand
is currently created, while in case of SLP a sequence of statements can be
grouped into more than one vector statement. This patch replicates such
operand for every vector statement that will be created.

As pointed out by Richard, there may also be a problem with VEC
allocations. I will look into this.

Bootstrapped with vectorization enabled and tested on x86_64-linux.
O.K. for mainline?

Thanks,
Ira


ChangeLog:

      PR tree-optimization/33953
      * tree-vect-transform.c (vectorizable_operation): In case of SLP,
allocate
      vec_oprnds1 according to the number of created vector statements. In
case
      of shift with scalar argument, store scalar operand for every vector
      statement to be created for the SLP node. Fix a comment.

testsuite/ChangeLog:

      PR tree-optimization/33953
      * gcc.dg/vect/pr33953.c: New testcase.


Index: tree-vect-transform.c
===================================================================
--- tree-vect-transform.c       (revision 129627)
+++ tree-vect-transform.c       (working copy)
@@ -3772,6 +3772,7 @@ vectorizable_operation (tree stmt, block
   int j, i;
   VEC(tree,heap) *vec_oprnds0 = NULL, *vec_oprnds1 = NULL;
   tree vop0, vop1;
+  unsigned int k;

   /* FORNOW: SLP with multiple types is not supported. The SLP analysis
verifies
      this, so we can safely override NCOPIES with 1 here.  */
@@ -3915,10 +3916,14 @@ vectorizable_operation (tree stmt, block
   /* Handle def.  */
   vec_dest = vect_create_destination_var (scalar_dest, vectype);

-  if (!slp_node)
-    vec_oprnds0 = VEC_alloc (tree, heap, 1);
-  if (op_type == binary_op)
-    vec_oprnds1 = VEC_alloc (tree, heap, 1);
+  if (slp_node && op_type == binary_op)
+    vec_oprnds1 = VEC_alloc (tree, heap, slp_node->vec_stmts_size);
+  else
+    {
+      vec_oprnds0 = VEC_alloc (tree, heap, 1);
+      if (op_type == binary_op)
+        vec_oprnds1 = VEC_alloc (tree, heap, 1);
+    }

   /* In case the vectorization factor (VF) is bigger than the number
      of elements that we can fit in a vectype (nunits), we have to
generate
@@ -3993,10 +3998,17 @@ vectorizable_operation (tree stmt, block
                    fprintf (vect_dump, "operand 1 using scalar mode.");
                  vec_oprnd1 = op1;
                  VEC_quick_push (tree, vec_oprnds1, vec_oprnd1);
+                 if (slp_node)
+                   {
+                     /* Store vec_oprnd1 for every vector stmt to be
created
+                        for SLP_NODE.  */
+                     for (k = 0; k < slp_node->vec_stmts_size - 1; k++)
+                       VEC_quick_push (tree, vec_oprnds1, vec_oprnd1);
+                   }
                }
            }

-          /* vec_oprnd is available if operand 1 should be of a
scalar-type
+          /* vec_oprnd1 is available if operand 1 should be of a
scalar-type
              (a special case for certain kind of vector shifts);
otherwise,
              operand 1 should be of a vector type (the usual case).  */
          if (op_type == binary_op && !vec_oprnd1)

Index: testsuite/gcc.dg/vect/pr33953.c
===================================================================
--- testsuite/gcc.dg/vect/pr33953.c     (revision 0)
+++ testsuite/gcc.dg/vect/pr33953.c     (revision 0)
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+typedef unsigned int UINT32;
+
+void blockmove_NtoN_blend_noremap32 (const UINT32 *srcdata, int srcwidth,
+                                     int srcheight, int srcmodulo,
+                                     UINT32 *dstdata, int dstmodulo,
+                                     int srcshift)
+{
+ UINT32 *end;
+
+ while (srcheight)
+   {
+     while (dstdata <= end - 8)
+       {
+         dstdata[0] |= srcdata[0] << srcshift;
+         dstdata[1] |= srcdata[1] << srcshift;
+         dstdata[2] |= srcdata[2] << srcshift;
+         dstdata[3] |= srcdata[3] << srcshift;
+         dstdata[4] |= srcdata[4] << srcshift;
+         dstdata[5] |= srcdata[5] << srcshift;
+         dstdata[6] |= srcdata[6] << srcshift;
+         dstdata[7] |= srcdata[7] << srcshift;
+         dstdata += 8;
+         srcdata += 8;
+       }
+   }
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1
"vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch, vectorizer] Fix PR33953 ICE in vectorizable_operation at   tree-vect-transform.c:4017
  2007-11-01 15:08 [patch, vectorizer] Fix PR33953 ICE in vectorizable_operation at tree-vect-transform.c:4017 Ira Rosen
@ 2007-11-06  8:18 ` Dorit Nuzman
  0 siblings, 0 replies; 7+ messages in thread
From: Dorit Nuzman @ 2007-11-06  8:18 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches

>
> Hi,
>
> In case of vector shift with scalar shift argument only one scalar
operand
> is currently created, while in case of SLP a sequence of statements can
be
> grouped into more than one vector statement. This patch replicates such
> operand for every vector statement that will be created.
>
> As pointed out by Richard, there may also be a problem with VEC
> allocations. I will look into this.
>
> Bootstrapped with vectorization enabled and tested on x86_64-linux.
> O.K. for mainline?
>
> Thanks,
> Ira
>
>
> ChangeLog:
>
>       PR tree-optimization/33953
>       * tree-vect-transform.c (vectorizable_operation): In case of SLP,
> allocate
>       vec_oprnds1 according to the number of created vector statements.
In
> case
>       of shift with scalar argument, store scalar operand for every
vector
>       statement to be created for the SLP node. Fix a comment.
>
> testsuite/ChangeLog:
>
>       PR tree-optimization/33953
>       * gcc.dg/vect/pr33953.c: New testcase.
>
>
> Index: tree-vect-transform.c
> ===================================================================
> --- tree-vect-transform.c       (revision 129627)
> +++ tree-vect-transform.c       (working copy)
> @@ -3772,6 +3772,7 @@ vectorizable_operation (tree stmt, block
>    int j, i;
>    VEC(tree,heap) *vec_oprnds0 = NULL, *vec_oprnds1 = NULL;
>    tree vop0, vop1;
> +  unsigned int k;
>
>    /* FORNOW: SLP with multiple types is not supported. The SLP analysis
> verifies
>       this, so we can safely override NCOPIES with 1 here.  */
> @@ -3915,10 +3916,14 @@ vectorizable_operation (tree stmt, block
>    /* Handle def.  */
>    vec_dest = vect_create_destination_var (scalar_dest, vectype);
>
> -  if (!slp_node)
> -    vec_oprnds0 = VEC_alloc (tree, heap, 1);
> -  if (op_type == binary_op)
> -    vec_oprnds1 = VEC_alloc (tree, heap, 1);
> +  if (slp_node && op_type == binary_op)
> +    vec_oprnds1 = VEC_alloc (tree, heap, slp_node->vec_stmts_size);
> +  else
> +    {
> +      vec_oprnds0 = VEC_alloc (tree, heap, 1);
> +      if (op_type == binary_op)
> +        vec_oprnds1 = VEC_alloc (tree, heap, 1);
> +    }
>

can you please add a comment explaining why in the case of SLP-ing a
binary-op we need to allocate a VEC only for oprnd1 and not oprnd0? and why
in the case of SLP-ing an unary-op we allocate a VEC of size 1 whereas in
the case of SLP-ing a binary-op we allocate a VEC of size "vec_stmts_size"?
(I think a high level description of how these VECs are used would help)

>    /* In case the vectorization factor (VF) is bigger than the number
>       of elements that we can fit in a vectype (nunits), we have to
> generate
> @@ -3993,10 +3998,17 @@ vectorizable_operation (tree stmt, block
>                     fprintf (vect_dump, "operand 1 using scalar mode.");
>                   vec_oprnd1 = op1;
>                   VEC_quick_push (tree, vec_oprnds1, vec_oprnd1);
> +                 if (slp_node)
> +                   {
> +                     /* Store vec_oprnd1 for every vector stmt to be
> created
> +                        for SLP_NODE.  */
> +                     for (k = 0; k < slp_node->vec_stmts_size - 1; k++)
> +                       VEC_quick_push (tree, vec_oprnds1, vec_oprnd1);
> +                   }

would this work also if it was a different (scalar) shift-amounts for the
generated vector stmts? i.e. if the testcase was:

> +         dstdata[0] |= srcdata[0] << srcshift1;
> +         dstdata[1] |= srcdata[1] << srcshift1;
> +         dstdata[2] |= srcdata[2] << srcshift1;
> +         dstdata[3] |= srcdata[3] << srcshift1;
> +         dstdata[4] |= srcdata[4] << srcshift2;
> +         dstdata[5] |= srcdata[5] << srcshift2;
> +         dstdata[6] |= srcdata[6] << srcshift2;
> +         dstdata[7] |= srcdata[7] << srcshift2;

?

thanks,
dorit

>                 }
>             }
>
> -          /* vec_oprnd is available if operand 1 should be of a
> scalar-type
> +          /* vec_oprnd1 is available if operand 1 should be of a
> scalar-type
>               (a special case for certain kind of vector shifts);
> otherwise,
>               operand 1 should be of a vector type (the usual case).  */
>           if (op_type == binary_op && !vec_oprnd1)
>
> Index: testsuite/gcc.dg/vect/pr33953.c
> ===================================================================
> --- testsuite/gcc.dg/vect/pr33953.c     (revision 0)
> +++ testsuite/gcc.dg/vect/pr33953.c     (revision 0)
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +
> +typedef unsigned int UINT32;
> +
> +void blockmove_NtoN_blend_noremap32 (const UINT32 *srcdata, int
srcwidth,
> +                                     int srcheight, int srcmodulo,
> +                                     UINT32 *dstdata, int dstmodulo,
> +                                     int srcshift)
> +{
> + UINT32 *end;
> +
> + while (srcheight)
> +   {
> +     while (dstdata <= end - 8)
> +       {
> +         dstdata[0] |= srcdata[0] << srcshift;
> +         dstdata[1] |= srcdata[1] << srcshift;
> +         dstdata[2] |= srcdata[2] << srcshift;
> +         dstdata[3] |= srcdata[3] << srcshift;
> +         dstdata[4] |= srcdata[4] << srcshift;
> +         dstdata[5] |= srcdata[5] << srcshift;
> +         dstdata[6] |= srcdata[6] << srcshift;
> +         dstdata[7] |= srcdata[7] << srcshift;
> +         dstdata += 8;
> +         srcdata += 8;
> +       }
> +   }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } }
*/
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1
> "vect"  } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
>
>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch, vectorizer] Fix PR33953 ICE in vectorizable_operation at   tree-vect-transform.c:4017
       [not found] <OFC03D72B6.96853328-ON8825738B.005EC36B-8825738B.0062ED55@LocalDomain>
@ 2007-11-07  7:33 ` Ira Rosen
  0 siblings, 0 replies; 7+ messages in thread
From: Ira Rosen @ 2007-11-07  7:33 UTC (permalink / raw)
  To: Dorit Nuzman; +Cc: gcc-patches



Dorit Nuzman/Haifa/IBM wrote on 06/11/2007 20:00:32:

> Ira Rosen/Haifa/IBM wrote on 06/11/2007 00:25:02:
>
> > Dorit Nuzman/Haifa/IBM wrote on 06/11/2007 10:20:40:
> >
> ...
> > > > +                     for (k = 0; k < slp_node->vec_stmts_size- 1;
k++)
> > > > +                       VEC_quick_push (tree, vec_oprnds1,
vec_oprnd1);
> > > > +                   }
> > >
> > > would this work also if it was a different (scalar) shift-amounts
> > > for the generated vector stmts? i.e. if the testcase was:
> > >
> > > > +         dstdata[0] |= srcdata[0] << srcshift1;
> > > > +         dstdata[1] |= srcdata[1] << srcshift1;
> > > > +         dstdata[2] |= srcdata[2] << srcshift1;
> > > > +         dstdata[3] |= srcdata[3] << srcshift1;
> > > > +         dstdata[4] |= srcdata[4] << srcshift2;
> > > > +         dstdata[5] |= srcdata[5] << srcshift2;
> > > > +         dstdata[6] |= srcdata[6] << srcshift2;
> > > > +         dstdata[7] |= srcdata[7] << srcshift2;
> > >
> > > ?
> >
> > We check during the analysis that all the shift arguments in case of
> > scalar shift arguments are equal. I will add a comment here to explain
this.
> >
>
> shall we open a missed-optimization PR for this (or add it to the
> todo list in http://gcc.gnu.org/wiki/VectorizationTasks?)

I think I'll add it to the (SLP) todo list.

Ira


>
> thanks,
> dorit
>
> > Thanks,
> > Ira
> >
> > >
> > > thanks,
> > > dorit
> > >

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch, vectorizer] Fix PR33953 ICE in vectorizable_operation at    tree-vect-transform.c:4017
       [not found] <OF00864FAF.73B7282D-ONC225738B.00432A71-C225738B.0043B1DF@LocalDomain>
@ 2007-11-07  5:45 ` Dorit Nuzman
  0 siblings, 0 replies; 7+ messages in thread
From: Dorit Nuzman @ 2007-11-07  5:45 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches

Ira Rosen/Haifa/IBM wrote on 06/11/2007 04:19:24:

> Resubmitting.
> Bootstrapped and tested on x86_64. O.K. for mainline?
>

ok.
(just see minor comment about the documentation)

thanks,
dorit

> Thanks,
> Ira
>
> ChangeLog:
>
>  PR tree-optimization/33953
>  * tree-vect-transform.c (vectorizable_operation): In case of SLP,
allocate
>  vec_oprnds1 according to the number of created vector statements. In
case
>  of shift with scalar argument, store scalar operand for every vector
>  statement to be created for the SLP node. Fix a comment.
>
> testsuite/ChangeLog:
>
>  PR tree-optimization/33953
>  * gcc.dg/vect/pr33953.c: New testcase.
>
> Index: tree-vect-transform.c
> ===================================================================
> --- tree-vect-transform.c       (revision 129918)
> +++ tree-vect-transform.c       (working copy)
> @@ -3775,6 +3775,8 @@ vectorizable_operation (tree stmt, block
>    int j, i;
>    VEC(tree,heap) *vec_oprnds0 = NULL, *vec_oprnds1 = NULL;
>    tree vop0, vop1;
> +  unsigned int k;
> +  bool scalar_shift_arg = false;
>
>    /* FORNOW: SLP with multiple types is not supported. The SLP
> analysis verifies
>       this, so we can safely override NCOPIES with 1 here.  */
> @@ -3891,14 +3893,18 @@ vectorizable_operation (tree stmt, block
>        /* Invariant argument is needed for a vector shift
>          by a scalar shift operand.  */
>        optab_op2_mode = insn_data[icode].operand[2].mode;
> -      if (! (VECTOR_MODE_P (optab_op2_mode)
> -            || dt[1] == vect_constant_def
> -            || dt[1] == vect_invariant_def))
> +      if (!VECTOR_MODE_P (optab_op2_mode))
>         {
> -         if (vect_print_dump_info (REPORT_DETAILS))
> -           fprintf (vect_dump, "operand mode requires invariant
argument.");
> -         return false;
> -       }
> +         if (dt[1] != vect_constant_def && dt[1] != vect_invariant_def)
> +           {
> +             if (vect_print_dump_info (REPORT_DETAILS))
> +               fprintf (vect_dump, "operand mode requires invariant"
> +                                    " argument.");
> +             return false;
> +           }
> +
> +          scalar_shift_arg = true;
> +        }
>      }
>
>    if (!vec_stmt) /* transformation not required.  */
> @@ -3918,10 +3924,21 @@ vectorizable_operation (tree stmt, block
>    /* Handle def.  */
>    vec_dest = vect_create_destination_var (scalar_dest, vectype);
>
> +  /* Allocate VECs for vector operands. In case of SLP, vector operands
are
> +     created in the previous stages of the recursion, so no allocation
is
> +     needed, except for the case of shift with scalar shift argument. In
that
> +     case we store the scalar operand in VEC_OPRNDS1 for every vector
stmt to
> +     be created to vectorize the SLP group, i.e.,
SLP_NODE->VEC_STMTS_SIZE.
> +     In case of loop-based vectorization we allocate VECs of size 1. We
> +     allocate VEC_OPRNDS1 only in case of binary operation.  */
>    if (!slp_node)
> -    vec_oprnds0 = VEC_alloc (tree, heap, 1);
> -  if (op_type == binary_op)
> -    vec_oprnds1 = VEC_alloc (tree, heap, 1);
> +    {
> +      vec_oprnds0 = VEC_alloc (tree, heap, 1);
> +      if (op_type == binary_op)
> +        vec_oprnds1 = VEC_alloc (tree, heap, 1);
> +    }
> +  else if (scalar_shift_arg)
> +    vec_oprnds1 = VEC_alloc (tree, heap, slp_node->vec_stmts_size);
>
>    /* In case the vectorization factor (VF) is bigger than the number
>       of elements that we can fit in a vectype (nunits), we have to
generate
> @@ -3996,10 +4013,18 @@ vectorizable_operation (tree stmt, block
>                     fprintf (vect_dump, "operand 1 using scalar mode.");
>                   vec_oprnd1 = op1;
>                   VEC_quick_push (tree, vec_oprnds1, vec_oprnd1);
> +                 if (slp_node)
> +                   {
> +                     /* Store vec_oprnd1 for every vector stmt to be
created
> +                        for SLP_NODE. We check during the analysis
> that all the
> +                         shift arguments are the same.  */

please add to the comment: "TODO: Allow different constants for different
vector stmts generated for an slp pack" (or something like that).

thanks,
dorit

> +                     for (k = 0; k < slp_node->vec_stmts_size - 1; k++)
> +                       VEC_quick_push (tree, vec_oprnds1, vec_oprnd1);
> +                   }
>                 }
>             }
>
> -          /* vec_oprnd is available if operand 1 should be of a
scalar-type
> +          /* vec_oprnd1 is available if operand 1 should be of a
scalar-type
>               (a special case for certain kind of vector shifts);
otherwise,
>               operand 1 should be of a vector type (the usual case).  */
>           if (op_type == binary_op && !vec_oprnd1)
>
> Index: testsuite/gcc.dg/vect/pr33953.c
> ===================================================================
> --- testsuite/gcc.dg/vect/pr33953.c     (revision 0)
> +++ testsuite/gcc.dg/vect/pr33953.c     (revision 0)
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +
> +typedef unsigned int UINT32;
> +
> +void blockmove_NtoN_blend_noremap32 (const UINT32 *srcdata, int
srcwidth,
> +                                     int srcheight, int srcmodulo,
> +                                     UINT32 *dstdata, int dstmodulo,
> +                                     int srcshift)
> +{
> + UINT32 *end;
> +
> + while (srcheight)
> +   {
> +     while (dstdata <= end - 8)
> +       {
> +         dstdata[0] |= srcdata[0] << srcshift;
> +         dstdata[1] |= srcdata[1] << srcshift;
> +         dstdata[2] |= srcdata[2] << srcshift;
> +         dstdata[3] |= srcdata[3] << srcshift;
> +         dstdata[4] |= srcdata[4] << srcshift;
> +         dstdata[5] |= srcdata[5] << srcshift;
> +         dstdata[6] |= srcdata[6] << srcshift;
> +         dstdata[7] |= srcdata[7] << srcshift;
> +         dstdata += 8;
> +         srcdata += 8;
> +       }
> +   }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } }
*/
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP"
> 1 "vect"  } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch, vectorizer] Fix PR33953 ICE in vectorizable_operation at   tree-vect-transform.c:4017
       [not found] <OFD9E70188.C982BA8F-ONC225738B.002DDE5C-C225738B.002E3CDC@LocalDomain>
@ 2007-11-06 17:57 ` Dorit Nuzman
  0 siblings, 0 replies; 7+ messages in thread
From: Dorit Nuzman @ 2007-11-06 17:57 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches

Ira Rosen/Haifa/IBM wrote on 06/11/2007 00:25:02:

> Dorit Nuzman/Haifa/IBM wrote on 06/11/2007 10:20:40:
>
...
> > > +                     for (k = 0; k < slp_node->vec_stmts_size - 1;
k++)
> > > +                       VEC_quick_push (tree, vec_oprnds1,
vec_oprnd1);
> > > +                   }
> >
> > would this work also if it was a different (scalar) shift-amounts
> > for the generated vector stmts? i.e. if the testcase was:
> >
> > > +         dstdata[0] |= srcdata[0] << srcshift1;
> > > +         dstdata[1] |= srcdata[1] << srcshift1;
> > > +         dstdata[2] |= srcdata[2] << srcshift1;
> > > +         dstdata[3] |= srcdata[3] << srcshift1;
> > > +         dstdata[4] |= srcdata[4] << srcshift2;
> > > +         dstdata[5] |= srcdata[5] << srcshift2;
> > > +         dstdata[6] |= srcdata[6] << srcshift2;
> > > +         dstdata[7] |= srcdata[7] << srcshift2;
> >
> > ?
>
> We check during the analysis that all the shift arguments in case of
> scalar shift arguments are equal. I will add a comment here to explain
this.
>

shall we open a missed-optimization PR for this (or add it to the todo list
in http://gcc.gnu.org/wiki/VectorizationTasks?)

thanks,
dorit

> Thanks,
> Ira
>
> >
> > thanks,
> > dorit
> >

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch, vectorizer] Fix PR33953 ICE in vectorizable_operation at    tree-vect-transform.c:4017
  2007-11-06  8:25 ` Ira Rosen
@ 2007-11-06 12:20   ` Ira Rosen
  0 siblings, 0 replies; 7+ messages in thread
From: Ira Rosen @ 2007-11-06 12:20 UTC (permalink / raw)
  To: Ira Rosen; +Cc: Dorit Nuzman, gcc-patches

Resubmitting.
Bootstrapped and tested on x86_64. O.K. for mainline?

Thanks,
Ira

ChangeLog:

      PR tree-optimization/33953
      * tree-vect-transform.c (vectorizable_operation): In case of SLP,
allocate
      vec_oprnds1 according to the number of created vector statements. In
case
      of shift with scalar argument, store scalar operand for every vector
      statement to be created for the SLP node. Fix a comment.

testsuite/ChangeLog:

      PR tree-optimization/33953
      * gcc.dg/vect/pr33953.c: New testcase.


Index: tree-vect-transform.c
===================================================================
--- tree-vect-transform.c       (revision 129918)
+++ tree-vect-transform.c       (working copy)
@@ -3775,6 +3775,8 @@ vectorizable_operation (tree stmt, block
   int j, i;
   VEC(tree,heap) *vec_oprnds0 = NULL, *vec_oprnds1 = NULL;
   tree vop0, vop1;
+  unsigned int k;
+  bool scalar_shift_arg = false;

   /* FORNOW: SLP with multiple types is not supported. The SLP analysis
verifies
      this, so we can safely override NCOPIES with 1 here.  */
@@ -3891,14 +3893,18 @@ vectorizable_operation (tree stmt, block
       /* Invariant argument is needed for a vector shift
         by a scalar shift operand.  */
       optab_op2_mode = insn_data[icode].operand[2].mode;
-      if (! (VECTOR_MODE_P (optab_op2_mode)
-            || dt[1] == vect_constant_def
-            || dt[1] == vect_invariant_def))
+      if (!VECTOR_MODE_P (optab_op2_mode))
        {
-         if (vect_print_dump_info (REPORT_DETAILS))
-           fprintf (vect_dump, "operand mode requires invariant
argument.");
-         return false;
-       }
+         if (dt[1] != vect_constant_def && dt[1] != vect_invariant_def)
+           {
+             if (vect_print_dump_info (REPORT_DETAILS))
+               fprintf (vect_dump, "operand mode requires invariant"
+                                    " argument.");
+             return false;
+           }
+
+          scalar_shift_arg = true;
+        }
     }

   if (!vec_stmt) /* transformation not required.  */
@@ -3918,10 +3924,21 @@ vectorizable_operation (tree stmt, block
   /* Handle def.  */
   vec_dest = vect_create_destination_var (scalar_dest, vectype);

+  /* Allocate VECs for vector operands. In case of SLP, vector operands
are
+     created in the previous stages of the recursion, so no allocation is
+     needed, except for the case of shift with scalar shift argument. In
that
+     case we store the scalar operand in VEC_OPRNDS1 for every vector stmt
to
+     be created to vectorize the SLP group, i.e.,
SLP_NODE->VEC_STMTS_SIZE.
+     In case of loop-based vectorization we allocate VECs of size 1. We
+     allocate VEC_OPRNDS1 only in case of binary operation.  */
   if (!slp_node)
-    vec_oprnds0 = VEC_alloc (tree, heap, 1);
-  if (op_type == binary_op)
-    vec_oprnds1 = VEC_alloc (tree, heap, 1);
+    {
+      vec_oprnds0 = VEC_alloc (tree, heap, 1);
+      if (op_type == binary_op)
+        vec_oprnds1 = VEC_alloc (tree, heap, 1);
+    }
+  else if (scalar_shift_arg)
+    vec_oprnds1 = VEC_alloc (tree, heap, slp_node->vec_stmts_size);

   /* In case the vectorization factor (VF) is bigger than the number
      of elements that we can fit in a vectype (nunits), we have to
generate
@@ -3996,10 +4013,18 @@ vectorizable_operation (tree stmt, block
                    fprintf (vect_dump, "operand 1 using scalar mode.");
                  vec_oprnd1 = op1;
                  VEC_quick_push (tree, vec_oprnds1, vec_oprnd1);
+                 if (slp_node)
+                   {
+                     /* Store vec_oprnd1 for every vector stmt to be
created
+                        for SLP_NODE. We check during the analysis that
all the
+                         shift arguments are the same.  */
+                     for (k = 0; k < slp_node->vec_stmts_size - 1; k++)
+                       VEC_quick_push (tree, vec_oprnds1, vec_oprnd1);
+                   }
                }
            }

-          /* vec_oprnd is available if operand 1 should be of a
scalar-type
+          /* vec_oprnd1 is available if operand 1 should be of a
scalar-type
              (a special case for certain kind of vector shifts);
otherwise,
              operand 1 should be of a vector type (the usual case).  */
          if (op_type == binary_op && !vec_oprnd1)

Index: testsuite/gcc.dg/vect/pr33953.c
===================================================================
--- testsuite/gcc.dg/vect/pr33953.c     (revision 0)
+++ testsuite/gcc.dg/vect/pr33953.c     (revision 0)
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+typedef unsigned int UINT32;
+
+void blockmove_NtoN_blend_noremap32 (const UINT32 *srcdata, int srcwidth,
+                                     int srcheight, int srcmodulo,
+                                     UINT32 *dstdata, int dstmodulo,
+                                     int srcshift)
+{
+ UINT32 *end;
+
+ while (srcheight)
+   {
+     while (dstdata <= end - 8)
+       {
+         dstdata[0] |= srcdata[0] << srcshift;
+         dstdata[1] |= srcdata[1] << srcshift;
+         dstdata[2] |= srcdata[2] << srcshift;
+         dstdata[3] |= srcdata[3] << srcshift;
+         dstdata[4] |= srcdata[4] << srcshift;
+         dstdata[5] |= srcdata[5] << srcshift;
+         dstdata[6] |= srcdata[6] << srcshift;
+         dstdata[7] |= srcdata[7] << srcshift;
+         dstdata += 8;
+         srcdata += 8;
+       }
+   }
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1
"vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch, vectorizer] Fix PR33953 ICE in vectorizable_operation at   tree-vect-transform.c:4017
       [not found] <OF2E0D4587.186A24A6-ON8825738B.002DAFCD-8825738B.002DD6AF@LocalDomain>
@ 2007-11-06  8:25 ` Ira Rosen
  2007-11-06 12:20   ` Ira Rosen
  0 siblings, 1 reply; 7+ messages in thread
From: Ira Rosen @ 2007-11-06  8:25 UTC (permalink / raw)
  To: Dorit Nuzman; +Cc: gcc-patches


Dorit Nuzman/Haifa/IBM wrote on 06/11/2007 10:20:40:

> > @@ -3915,10 +3916,14 @@ vectorizable_operation (tree stmt, block
> >    /* Handle def.  */
> >    vec_dest = vect_create_destination_var (scalar_dest, vectype);
> >
> > -  if (!slp_node)
> > -    vec_oprnds0 = VEC_alloc (tree, heap, 1);
> > -  if (op_type == binary_op)
> > -    vec_oprnds1 = VEC_alloc (tree, heap, 1);
> > +  if (slp_node && op_type == binary_op)
> > +    vec_oprnds1 = VEC_alloc (tree, heap, slp_node->vec_stmts_size);
> > +  else
> > +    {
> > +      vec_oprnds0 = VEC_alloc (tree, heap, 1);
> > +      if (op_type == binary_op)
> > +        vec_oprnds1 = VEC_alloc (tree, heap, 1);
> > +    }
> >
>
> can you please add a comment explaining why in the case of SLP-ing a
> binary-op we need to allocate a VEC only for oprnd1 and not oprnd0?
> and why in the case of SLP-ing an unary-op we allocate a VEC of size
> 1 whereas in the case of SLP-ing a binary-op we allocate a VEC of
> size "vec_stmts_size"? (I think a high level description of how
> these VECs are used would help)

We actually need a VEC of size "vec_stmts_size" only in case of SLP of
shift with scalar argument. I will rewrite the condition here and add a
comment.

>
> >    /* In case the vectorization factor (VF) is bigger than the number
> >       of elements that we can fit in a vectype (nunits), we have to
> > generate
> > @@ -3993,10 +3998,17 @@ vectorizable_operation (tree stmt, block
> >                     fprintf (vect_dump, "operand 1 using scalar
mode.");
> >                   vec_oprnd1 = op1;
> >                   VEC_quick_push (tree, vec_oprnds1, vec_oprnd1);
> > +                 if (slp_node)
> > +                   {
> > +                     /* Store vec_oprnd1 for every vector stmt to be
> > created
> > +                        for SLP_NODE.  */
> > +                     for (k = 0; k < slp_node->vec_stmts_size - 1;
k++)
> > +                       VEC_quick_push (tree, vec_oprnds1, vec_oprnd1);
> > +                   }
>
> would this work also if it was a different (scalar) shift-amounts
> for the generated vector stmts? i.e. if the testcase was:
>
> > +         dstdata[0] |= srcdata[0] << srcshift1;
> > +         dstdata[1] |= srcdata[1] << srcshift1;
> > +         dstdata[2] |= srcdata[2] << srcshift1;
> > +         dstdata[3] |= srcdata[3] << srcshift1;
> > +         dstdata[4] |= srcdata[4] << srcshift2;
> > +         dstdata[5] |= srcdata[5] << srcshift2;
> > +         dstdata[6] |= srcdata[6] << srcshift2;
> > +         dstdata[7] |= srcdata[7] << srcshift2;
>
> ?

We check during the analysis that all the shift arguments in case of scalar
shift arguments are equal. I will add a comment here to explain this.

Thanks,
Ira

>
> thanks,
> dorit
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2007-11-07  7:33 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-11-01 15:08 [patch, vectorizer] Fix PR33953 ICE in vectorizable_operation at tree-vect-transform.c:4017 Ira Rosen
2007-11-06  8:18 ` Dorit Nuzman
     [not found] <OF2E0D4587.186A24A6-ON8825738B.002DAFCD-8825738B.002DD6AF@LocalDomain>
2007-11-06  8:25 ` Ira Rosen
2007-11-06 12:20   ` Ira Rosen
     [not found] <OFD9E70188.C982BA8F-ONC225738B.002DDE5C-C225738B.002E3CDC@LocalDomain>
2007-11-06 17:57 ` Dorit Nuzman
     [not found] <OF00864FAF.73B7282D-ONC225738B.00432A71-C225738B.0043B1DF@LocalDomain>
2007-11-07  5:45 ` Dorit Nuzman
     [not found] <OFC03D72B6.96853328-ON8825738B.005EC36B-8825738B.0062ED55@LocalDomain>
2007-11-07  7:33 ` Ira Rosen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).