public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [patch] [4.3 projects] outer-loop vectorization patch 3/n
@ 2007-08-11 17:10 Dorit Nuzman
  0 siblings, 0 replies; only message in thread
From: Dorit Nuzman @ 2007-08-11 17:10 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4471 bytes --]


Hi,

This part is not specific to outer-loop vectorization, but may needed even
more when doing outer-loop vectorization. This back to the issue raised
here: http://gcc.gnu.org/ml/gcc-patches/2007-06/msg00136.html. In cases
that the compiler can't prove that the loop-count is non-zero, it returns
"scev_not_known", and vectorization fails cause it thinks that the loop is
not countable. Instead, with this patch, we have the number-of-iterations
analysis return both the loop-bound expression and the maybe_zero
expression, and from them build a COND_EXPR that represents the loop count.
i.e. : "lb = may_be_zero ? zero : lb".
The one limitation is that currently the compiler does not let us generate
COND_EXPR as a rhs (at least until this patch is committed:
http://gcc.gnu.org/ml/gcc-patches/2007-07/msg02052.html), so we have to
prevent situations in which the vectorizer would actually generate stmts
using this expression (e.g. peeling the loop).

Bootstrapped on powerpc64-linux,
bootstrapped with vectorization enabled on i386-linux,
passed full regression testing on both platforms.

This part needs approval.

thanks,
dorit

ChangeLog:

        * tree-scalar-evolutions.c (number_of_latch_executions_1): New.
        Contains code that was factored out from
number_of_latch_executions.
        (number_of_latch_executions): Call number_of_latch_executions_1
        instead of code that was factored out.
        (number_of_exit_cond_executions): Takes additional argument
        may_be_zero. Call  number_of_latch_executions_1 instead of
        number_of_latch_executions.
        * tree-scalar-evolutions.h (number_of_exit_cond_executions): Takes
        additional argument.
        * tree-vect-analyze.c (vect_analyze_operations): Avoid
vectorization
        in case the loop-bound is a COND_EXPR and peeling needs to be done.
        (vect_get_loop_niters): Call number_of_exit_cond_executions with
        additional argument may_be_zero. Use it to build a (potentially)
        conditional expression for the loop niters.

testsuite/ChangeLog:

      * gcc.dg/vect/vect-outer-fir-lb.c: First loop now gets vectorized.

P.S.1 One example where this helps is in this case:

 for (k = 0; k < 4; k++) {
  for (i = 0; i < N; i++) {
    diff = 0;
    j = k;
    do {
      diff += in[j+i]*coeff[j];
      j+=4;
    } while (j < M);
    out[i] += diff;
  }
 }

(taken from vect-outer-fir-lb.c, as posted here:
http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00742.html).

Without the patch we get:
"
Analyzing # of iterations of loop 4
  exit condition [k_45 + 4, + , 4](no_overflow) <= 63
  bounds on difference of bases: -2147483584 ... 2147483707
  result:
    zero if k_45 > 63
    # of iterations (63 - (unsigned int) k_45) /[fl] 4, bounded by
536870927
  (set_nb_iterations_in_loop = scev_not_known))
(get_loop_exit_condition
  if (j_19 <= 63))

vect-outer-fir-lb.c:33: note: not vectorized: number of iterations cannot
be computed.
"

With the patch, the loop gets vectorized.

Of-course, what would really be good is if range propoagation could help
the number-of-iterations analysis realize that since k<4 it holds that
k<127 and so the loop-bound is never zero.

P.S.2 When the above loop is written this way:

 for (k = 0; k < 4; k++) {
  for (i = 0; i < N; i++) {
    diff = 0;
    for (j = k; j < M; j+=4) {
      diff += in[j+i]*coeff[j];
    }
    out[i] += diff;
  }
 }

(i.e. the inner-loop is a for-loop instead of a do-while loop; taken from
vect-outer-fir.c, as posted here:
http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00729.html), the compiler
returns the following expression:
"
Analyzing # of iterations of loop 4
  exit condition [k_45 + 4, + , 4](no_overflow) <= 127
  bounds on difference of bases: -4 ... 2147483771
  result:
    # of iterations (127 - (unsigned int) k_45) /[fl] 4, bounded by
536870943
  (set_nb_iterations_in_loop = (127 - (unsigned int) k_45) /[fl] 4))

vect-outer-fir.c:29: note: ==> get_loop_niters:(127 - (unsigned int) k_45)
/[fl] 4 + 1
"
...but although there isn't a maybe_zero part, the compiler still thinks
that the inner-loop may iterate 0 times, and so it generates a guard code
around it to skip the inner-loop in case it's loop-bound it 0 (which the
outer-loop vectorizer doesn't like...):

outer-loop-header:
      ...
      if (k_45 <= 127)
            then goto inner-loop
      else
            goto outer-loop tail
inner-loop:
      ...
outer-loop tail:
      ...


(See attached file: p4.txt)

[-- Attachment #2: p4.txt --]
[-- Type: text/plain, Size: 8246 bytes --]

*** tree-vect-analyze.c.p3	2007-08-07 09:32:38.000000000 +0300
--- tree-vect-analyze.c	2007-08-09 14:34:58.000000000 +0300
*************** vect_analyze_operations (loop_vec_info l
*** 551,561 ****
                       "not vectorized: can't create epilog loop 1.");
            return false;
          }
        if (!slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
          {
            if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
              fprintf (vect_dump,
!                      "not vectorized: can't create epilog loop 2.");
            return false;
          }
      }
--- 551,570 ----
                       "not vectorized: can't create epilog loop 1.");
            return false;
          }
+       if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+ 	  && LOOP_VINFO_NITERS (loop_vinfo)
+ 	  && TREE_CODE (LOOP_VINFO_NITERS (loop_vinfo)) == COND_EXPR)
+ 	{
+ 	  if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
+ 	    fprintf (vect_dump,
+ 	             "not vectorized: can't create epilog loop 2.");
+ 	    return false;
+ 	}
        if (!slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
          {
            if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
              fprintf (vect_dump,
!                      "not vectorized: can't create epilog loop 3.");
            return false;
          }
      }
*************** static tree
*** 2911,2927 ****
  vect_get_loop_niters (struct loop *loop, tree *number_of_iterations)
  {
    tree niters;
  
    if (vect_print_dump_info (REPORT_DETAILS))
      fprintf (vect_dump, "=== get_loop_niters ===");
  
!   niters = number_of_exit_cond_executions (loop);
  
!   if (niters != NULL_TREE
!       && niters != chrec_dont_know)
      {
!       *number_of_iterations = niters;
  
        if (vect_print_dump_info (REPORT_DETAILS))
  	{
  	  fprintf (vect_dump, "==> get_loop_niters:" );
--- 2920,2955 ----
  vect_get_loop_niters (struct loop *loop, tree *number_of_iterations)
  {
    tree niters;
+   tree may_be_zero;
  
    if (vect_print_dump_info (REPORT_DETAILS))
      fprintf (vect_dump, "=== get_loop_niters ===");
  
!   niters = number_of_exit_cond_executions (loop, &may_be_zero);
  
!   if (niters != NULL_TREE && niters != chrec_dont_know)
      {
!       tree type = TREE_TYPE (niters);
!       tree zero = build_int_cst (type, 0);
  
+       *number_of_iterations = niters;
+       if (may_be_zero)
+ 	{
+ 	  if (integer_nonzerop (may_be_zero))
+ 	    *number_of_iterations = zero;
+ 	  else if (integer_zerop (may_be_zero))
+ 	    ;
+ 	  else
+ 	    {
+ 	      /* CHECKME: other things to check? */
+ 	      if (COMPARISON_CLASS_P (may_be_zero))
+ 		*number_of_iterations = 
+ 			build3 (COND_EXPR, type, may_be_zero, zero, niters);
+ 	      else
+ 		*number_of_iterations = chrec_dont_know;
+ 	    }
+ 	}
+  
        if (vect_print_dump_info (REPORT_DETAILS))
  	{
  	  fprintf (vect_dump, "==> get_loop_niters:" );
Index: tree-scalar-evolution.c
===================================================================
*** tree-scalar-evolution.c	(revision 127356)
--- tree-scalar-evolution.c	(working copy)
*************** resolve_mixers (struct loop *loop, tree 
*** 2282,2299 ****
     and EF (6) = 53.  In this case the value of 'b' on exit is '53' and
     the loop body has been executed 6 times.  */
  
! tree 
! number_of_latch_executions (struct loop *loop)
  {
!   tree res, type;
    edge exit;
    struct tree_niter_desc niter_desc;
  
    /* Determine whether the number_of_iterations_in_loop has already
       been computed.  */
    res = loop->nb_iterations;
    if (res)
      return res;
    res = chrec_dont_know;
  
    if (dump_file && (dump_flags & TDF_DETAILS))
--- 2282,2302 ----
     and EF (6) = 53.  In this case the value of 'b' on exit is '53' and
     the loop body has been executed 6 times.  */
  
! static tree 
! number_of_latch_executions_1 (struct loop *loop, tree *may_be_zero)
  {
!   tree res;
    edge exit;
    struct tree_niter_desc niter_desc;
  
+   *may_be_zero = NULL_TREE;
+ 
    /* Determine whether the number_of_iterations_in_loop has already
       been computed.  */
    res = loop->nb_iterations;
    if (res)
      return res;
+ 
    res = chrec_dont_know;
  
    if (dump_file && (dump_flags & TDF_DETAILS))
*************** number_of_latch_executions (struct loop 
*** 2306,2323 ****
    if (!number_of_iterations_exit (loop, exit, &niter_desc, false))
      goto end;
  
!   type = TREE_TYPE (niter_desc.niter);
!   if (integer_nonzerop (niter_desc.may_be_zero))
!     res = build_int_cst (type, 0);
!   else if (integer_zerop (niter_desc.may_be_zero))
!     res = niter_desc.niter;
!   else
!     res = chrec_dont_know;
  
  end:
    return set_nb_iterations_in_loop (loop, res);
  }
  
  /* Returns the number of executions of the exit condition of LOOP,
     i.e., the number by one higher than number_of_latch_executions.
     Note that unline number_of_latch_executions, this number does
--- 2309,2344 ----
    if (!number_of_iterations_exit (loop, exit, &niter_desc, false))
      goto end;
  
!   *may_be_zero = niter_desc.may_be_zero;
!   res = niter_desc.niter;
  
  end:
    return set_nb_iterations_in_loop (loop, res);
  }
  
+ 
+ tree
+ number_of_latch_executions (struct loop *loop)
+ {
+   tree res, type;
+   tree may_be_zero;
+ 
+   res = number_of_latch_executions_1 (loop, &may_be_zero);
+ 
+   if (may_be_zero)
+     {
+       type = TREE_TYPE (res);
+       if (integer_nonzerop (may_be_zero))
+ 	res = build_int_cst (type, 0);
+       else if (integer_zerop (may_be_zero))
+ 	;
+       else
+ 	res = chrec_dont_know;
+     }
+ 
+   return res;
+ }
+ 
  /* Returns the number of executions of the exit condition of LOOP,
     i.e., the number by one higher than number_of_latch_executions.
     Note that unline number_of_latch_executions, this number does
*************** end:
*** 2329,2337 ****
     the possible overflow.  */
  
  tree 
! number_of_exit_cond_executions (struct loop *loop)
  {
!   tree ret = number_of_latch_executions (loop);
    tree type = chrec_type (ret);
  
    if (chrec_contains_undetermined (ret))
--- 2350,2358 ----
     the possible overflow.  */
  
  tree 
! number_of_exit_cond_executions (struct loop *loop, tree *may_be_zero)
  {
!   tree ret = number_of_latch_executions_1 (loop, may_be_zero);
    tree type = chrec_type (ret);
  
    if (chrec_contains_undetermined (ret))
Index: tree-scalar-evolution.h
===================================================================
*** tree-scalar-evolution.h	(revision 127356)
--- tree-scalar-evolution.h	(working copy)
*************** along with GCC; see the file COPYING3.  
*** 22,28 ****
  #define GCC_TREE_SCALAR_EVOLUTION_H
  
  extern tree number_of_latch_executions (struct loop *);
! extern tree number_of_exit_cond_executions (struct loop *);
  extern tree get_loop_exit_condition (struct loop *);
  
  extern void scev_initialize (void);
--- 22,28 ----
  #define GCC_TREE_SCALAR_EVOLUTION_H
  
  extern tree number_of_latch_executions (struct loop *);
! extern tree number_of_exit_cond_executions (struct loop *, tree *);
  extern tree get_loop_exit_condition (struct loop *);
  
  extern void scev_initialize (void);
*** testsuite/gcc.dg/vect/vect-outer-fir-lb.c	2007-08-11 19:48:20.000000000 +0300
--- testsuite/gcc.dg/vect/vect-outer-fir-lb.c.p4	2007-08-11 19:48:36.000000000 +0300
*************** float out[N];
*** 11,19 ****
  float fir_out[N];
  
  /* Should be vectorized. Fixed misaligment in the inner-loop.  */
- /* Currently not vectorized because the loop-count for the inner-loop
-    has a maybe_zero component. Will be fixed when we incorporate the
-    "cond_expr in rhs" patch.  */
  void foo (){
   int i,j,k;
   float diff;
--- 11,16 ----
*************** int main (void)
*** 75,80 ****
    return 0;
  }
  
! /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 2 "vect" { xfail *-*-* } } } */
! /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
  /* { dg-final { cleanup-tree-dump "vect" } } */
--- 72,76 ----
    return 0;
  }
  
! /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 2 "vect" { xfail vect_no_align } } } */
  /* { dg-final { cleanup-tree-dump "vect" } } */

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2007-08-11 17:10 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-08-11 17:10 [patch] [4.3 projects] outer-loop vectorization patch 3/n Dorit Nuzman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).