[Patch for suggestions]: How do we know a loop is the peeled version?

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [Patch for suggestions]: How do we know a loop is the peeled version?
@ 2010-07-01 18:42 Fang, Changpeng
  2010-07-01 19:38 ` Zdenek Dvorak
  2010-07-02 16:31 ` Fang, Changpeng
  0 siblings, 2 replies; 7+ messages in thread
From: Fang, Changpeng @ 2010-07-01 18:42 UTC (permalink / raw)
  To: Richard Guenther, Sebastian Pop
  Cc: Zdenek Dvorak, Christian Borntraeger, gcc-patches, uweigand

[-- Attachment #1: Type: text/plain, Size: 1395 bytes --]

Hi,

Just found that many optimizations (prefetch, loop unrolling) are performed on the peeled loops.
This causes code size and compilation time increase without benefit.

MODULE kinds
   INTEGER, PARAMETER :: RK8 = SELECTED_REAL_KIND(15, 300)
END MODULE kinds
! --------------------------------------------------------------------
PROGRAM TEST_FPU  ! A number-crunching benchmark using matrix inversion.
USE kinds         ! Implemented by:    David Frank  Dave_Frank@hotmail.com
IMPLICIT NONE     ! Gauss  routine by: Tim Prince   N8TM@aol.com
                  ! Crout  routine by: James Van Buskirk  torsop@ix.netcom.com
                  ! Lapack routine by: Jos Bergervoet bergervo@IAEhv.nl

REAL(RK8) :: pool(101, 101,1000), a(101, 101)
INTEGER :: i

      DO i = 1,1000
         a = pool(:,:,i)         ! get next matrix to invert
      END DO

END PROGRAM TEST_FPU

For this example (-O3 -fprefetch-loop-arrays -funroll-loops), the vectorizer peels the loop.
And the prefetching and loop unrolling are performed on the peeled loops.

In the attached patch, the vectorizer marked the loop as peeled, and the prefetching
gives up. However, the RTL unroller could not get this information and still unroll the peeled
loop.

I need suggestion: How the optimizer recognizes that the loop is the peeled version (preloop or postloop)?

Thanks,

Changpeng

 

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Marking-peeled-loop.patch --]
[-- Type: text/x-patch; name="0001-Marking-peeled-loop.patch", Size: 1972 bytes --]

From 17fa894c3056bfdf76af19e7ca1d685b9aa4a881 Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@houghton.(none)>
Date: Thu, 1 Jul 2010 11:27:21 -0700
Subject: [PATCH] Marking peeled loop

	* cfgloop.h : add peeled filed for loop structure.

	* tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): mark
	new_loop as peeled.

	* tree-ssa-loop-prefetch.c (loop_prefetch_arrays): Do not do
	loop prefetch for peeled loop.
---
 gcc/cfgloop.h                |    2 ++
 gcc/tree-ssa-loop-prefetch.c |    4 ++++
 gcc/tree-vect-loop-manip.c   |    1 +
 3 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 3821ee6..f43d6df 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -166,6 +166,8 @@ struct GTY ((chain_next ("%h.next"))) loop {
   /* Head of the cyclic list of the exits of the loop.  */
   struct loop_exit *exits;
 
+  bool peeled;
+
   /* The single induction variable of the loop when the loop is in
      normal form.  */
   tree single_iv;
diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index cde5e18..8a219c3 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -1724,6 +1724,10 @@ loop_prefetch_arrays (struct loop *loop)
       return false;
     }
 
+  /* Don't do prefetching for peeled loop.  */
+  if (loop->peeled)
+    return false;
+
   /* Step 1: gather the memory references.  */
   refs = gather_memory_references (loop, &no_other_refs, &mem_ref_count);
 
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 8289b36..6bf54f5 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1905,6 +1905,7 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio,
 					    cond_expr, cond_expr_stmt_list);
   gcc_assert (new_loop);
   gcc_assert (loop_num == loop->num);
+  new_loop->peeled = true;
 #ifdef ENABLE_CHECKING
   slpeel_verify_cfg_after_peeling (loop, new_loop);
 #endif
-- 
1.6.3.3


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Patch for suggestions]: How do we know a loop is the peeled version?
  2010-07-01 18:42 [Patch for suggestions]: How do we know a loop is the peeled version? Fang, Changpeng
@ 2010-07-01 19:38 ` Zdenek Dvorak
  2010-07-02 18:40   ` Fang, Changpeng
  2010-07-02 16:31 ` Fang, Changpeng
  1 sibling, 1 reply; 7+ messages in thread
From: Zdenek Dvorak @ 2010-07-01 19:38 UTC (permalink / raw)
  To: Fang, Changpeng
  Cc: Richard Guenther, Sebastian Pop, Christian Borntraeger,
	gcc-patches, uweigand

Hi,

> Just found that many optimizations (prefetch, loop unrolling) are performed on the peeled loops.
> This causes code size and compilation time increase without benefit.
> 
> MODULE kinds
>    INTEGER, PARAMETER :: RK8 = SELECTED_REAL_KIND(15, 300)
> END MODULE kinds
> ! --------------------------------------------------------------------
> PROGRAM TEST_FPU  ! A number-crunching benchmark using matrix inversion.
> USE kinds         ! Implemented by:    David Frank  Dave_Frank@hotmail.com
> IMPLICIT NONE     ! Gauss  routine by: Tim Prince   N8TM@aol.com
>                   ! Crout  routine by: James Van Buskirk  torsop@ix.netcom.com
>                   ! Lapack routine by: Jos Bergervoet bergervo@IAEhv.nl
> 
> REAL(RK8) :: pool(101, 101,1000), a(101, 101)
> INTEGER :: i
> 
>       DO i = 1,1000
>          a = pool(:,:,i)         ! get next matrix to invert
>       END DO
> 
> END PROGRAM TEST_FPU
> 
> For this example (-O3 -fprefetch-loop-arrays -funroll-loops), the vectorizer peels the loop.
> And the prefetching and loop unrolling are performed on the peeled loops.
> 
> In the attached patch, the vectorizer marked the loop as peeled, and the prefetching
> gives up. However, the RTL unroller could not get this information and still unroll the peeled
> loop.
> 
> I need suggestion: How the optimizer recognizes that the loop is the peeled version (preloop or postloop)?

instead of the "peeled" flag, it might be better to make sure that the latter optimizers know that
the peeled loop does not roll enough (set nb_iterations_upper_bound/nb_iterations_estimate).

Passing the information from gimple to rtl loop optimizer is a long-standing problem.  One possibility is
to keep the information about the loops up-to-date between the optimizers (few years ago, I made
sure that this is possible up to tree->rtl expansion, so making this work again should not be too hard;
handling the expansion and rtl passes might be a little more challenging, but not terribly so).
A less involved solution is to drop some notes in the instruction stream at the end of gimple loop
optimizer, and pick them up in the rtl one,

Zdenek

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [Patch for suggestions]: How do we know a loop is the peeled version?
  2010-07-01 18:42 [Patch for suggestions]: How do we know a loop is the peeled version? Fang, Changpeng
  2010-07-01 19:38 ` Zdenek Dvorak
@ 2010-07-02 16:31 ` Fang, Changpeng
  2010-07-02 18:40   ` Zdenek Dvorak
  1 sibling, 1 reply; 7+ messages in thread
From: Fang, Changpeng @ 2010-07-02 16:31 UTC (permalink / raw)
  To: Fang, Changpeng, Richard Guenther, Sebastian Pop
  Cc: Zdenek Dvorak, Christian Borntraeger, gcc-patches, uweigand

Hi,

An additional problem is that after a loop is completely-unrolled, the loop structure
seems not destroyed. And thus later optimization passes may still be performed 
on these non-existence loops.

Changpeng

________________________________________
From: Fang, Changpeng
Sent: Thursday, July 01, 2010 1:41 PM
To: Richard Guenther; Sebastian Pop
Cc: Zdenek Dvorak; Christian Borntraeger; gcc-patches@gcc.gnu.org; uweigand@de.ibm.com
Subject: [Patch for suggestions]: How do we know a loop is the peeled version?

Hi,

Just found that many optimizations (prefetch, loop unrolling) are performed on the peeled loops.
This causes code size and compilation time increase without benefit.

MODULE kinds
   INTEGER, PARAMETER :: RK8 = SELECTED_REAL_KIND(15, 300)
END MODULE kinds
! --------------------------------------------------------------------
PROGRAM TEST_FPU  ! A number-crunching benchmark using matrix inversion.
USE kinds         ! Implemented by:    David Frank  Dave_Frank@hotmail.com
IMPLICIT NONE     ! Gauss  routine by: Tim Prince   N8TM@aol.com
                  ! Crout  routine by: James Van Buskirk  torsop@ix.netcom.com
                  ! Lapack routine by: Jos Bergervoet bergervo@IAEhv.nl

REAL(RK8) :: pool(101, 101,1000), a(101, 101)
INTEGER :: i

      DO i = 1,1000
         a = pool(:,:,i)         ! get next matrix to invert
      END DO

END PROGRAM TEST_FPU

For this example (-O3 -fprefetch-loop-arrays -funroll-loops), the vectorizer peels the loop.
And the prefetching and loop unrolling are performed on the peeled loops.

In the attached patch, the vectorizer marked the loop as peeled, and the prefetching
gives up. However, the RTL unroller could not get this information and still unroll the peeled
loop.

I need suggestion: How the optimizer recognizes that the loop is the peeled version (preloop or postloop)?

Thanks,

Changpeng

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Patch for suggestions]: How do we know a loop is the peeled version?
  2010-07-02 16:31 ` Fang, Changpeng
@ 2010-07-02 18:40   ` Zdenek Dvorak
  0 siblings, 0 replies; 7+ messages in thread
From: Zdenek Dvorak @ 2010-07-02 18:40 UTC (permalink / raw)
  To: Fang, Changpeng
  Cc: Richard Guenther, Sebastian Pop, Christian Borntraeger,
	gcc-patches, uweigand

Hi,

> An additional problem is that after a loop is completely-unrolled, the loop structure
> seems not destroyed. And thus later optimization passes may still be performed 
> on these non-existence loops.

if that is so, that would be a bug; but I am fairly sure that is not the case
(verify_loop_structure catches this kind of problems, and it is being run all over
the place).  IIRC, complete loop unrolling only replaces the exit condition of the
loop by if (0), and the loop gets physically removed in the following cfg cleanup,

Zdenek

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [Patch for suggestions]: How do we know a loop is the peeled version?
  2010-07-01 19:38 ` Zdenek Dvorak
@ 2010-07-02 18:40   ` Fang, Changpeng
  2010-07-02 18:57     ` Zdenek Dvorak
  0 siblings, 1 reply; 7+ messages in thread
From: Fang, Changpeng @ 2010-07-02 18:40 UTC (permalink / raw)
  To: Zdenek Dvorak
  Cc: Richard Guenther, Sebastian Pop, Christian Borntraeger,
	gcc-patches, uweigand


>>instead of the "peeled" flag, it might be better to make sure that the latter optimizers know that
>>the peeled loop does not roll enough (set nb_iterations_upper_bound/nb_iterations_estimate).

The problem is that we don't know exactly how many iterations are peeled. 
For example, in vect_do_peeling_for_alignment:

niters_of_prolog_loop = vect_gen_niters_for_prolog_loop (loop_vinfo, ni_name,
							   &wide_prolog_niters);

We could not give a constant estimation of the upper-bound.

Also, any bounds you set will be erased after dce.

>>Passing the information from gimple to rtl loop optimizer is a long-standing problem.  One possibility is
>>to keep the information about the loops up-to-date between the optimizers (few years ago, I made
>>sure that this is possible up to tree->rtl expansion, so making this work again should not be too hard;
>>handling the expansion and rtl passes might be a little more challenging, but not terribly so).
>>A less involved solution is to drop some notes in the instruction stream at the end of gimple loop
>>optimizer, and pick them up in the rtl one,

Could you please give some details.

Thnaks,

Changpeng

>Zdenek


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Patch for suggestions]: How do we know a loop is the peeled version?
  2010-07-02 18:40   ` Fang, Changpeng
@ 2010-07-02 18:57     ` Zdenek Dvorak
  2010-07-03  0:06       ` Fang, Changpeng
  0 siblings, 1 reply; 7+ messages in thread
From: Zdenek Dvorak @ 2010-07-02 18:57 UTC (permalink / raw)
  To: Fang, Changpeng
  Cc: Richard Guenther, Sebastian Pop, Christian Borntraeger,
	gcc-patches, uweigand

Hi,

> 
> >>instead of the "peeled" flag, it might be better to make sure that the latter optimizers know that
> >>the peeled loop does not roll enough (set nb_iterations_upper_bound/nb_iterations_estimate).
> 
> The problem is that we don't know exactly how many iterations are peeled. 
> For example, in vect_do_peeling_for_alignment:
> 
> niters_of_prolog_loop = vect_gen_niters_for_prolog_loop (loop_vinfo, ni_name,
> 							   &wide_prolog_niters);
> 
> We could not give a constant estimation of the upper-bound.

surely there is some upper bound on the number of iterations of the loop (if it
is used to ensure alignment, it cannot be more than the value of the alignment
divided by the step of the memory reference?).  For nb_iterations_estimate, you do
not even have to be sure you are correct, it just needs to be a reasonable guess.

> Also, any bounds you set will be erased after dce.

So is anything else that you attach to loop structure, at the moment, so this still
is not an argument for having "peeled" flag.

> >>Passing the information from gimple to rtl loop optimizer is a long-standing problem.  One possibility is
> >>to keep the information about the loops up-to-date between the optimizers (few years ago, I made
> >>sure that this is possible up to tree->rtl expansion, so making this work again should not be too hard;
> >>handling the expansion and rtl passes might be a little more challenging, but not terribly so).
> >>A less involved solution is to drop some notes in the instruction stream at the end of gimple loop
> >>optimizer, and pick them up in the rtl one,
> 
> Could you please give some details.

I recall having a proof-of-concept patch for this; I will try to find it,

Zdenek

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [Patch for suggestions]: How do we know a loop is the peeled version?
  2010-07-02 18:57     ` Zdenek Dvorak
@ 2010-07-03  0:06       ` Fang, Changpeng
  0 siblings, 0 replies; 7+ messages in thread
From: Fang, Changpeng @ 2010-07-03  0:06 UTC (permalink / raw)
  To: Zdenek Dvorak
  Cc: Richard Guenther, Sebastian Pop, Christian Borntraeger,
	gcc-patches, uweigand

Thanks!

 I am not defending "peeled" flag proposal. Instead, I used the patch to illustrate
the problem.

Anyway, I opened bug 44794 for this problem.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794

Have a wonderful weekend.

Changpeng

________________________________________
From: Zdenek Dvorak [rakdver@kam.mff.cuni.cz]
Sent: Friday, July 02, 2010 1:57 PM
To: Fang, Changpeng
Cc: Richard Guenther; Sebastian Pop; Christian Borntraeger; gcc-patches@gcc.gnu.org; uweigand@de.ibm.com
Subject: Re: [Patch for suggestions]: How do we know a loop is the peeled       version?

Hi,

>
> >>instead of the "peeled" flag, it might be better to make sure that the latter optimizers know that
> >>the peeled loop does not roll enough (set nb_iterations_upper_bound/nb_iterations_estimate).
>
> The problem is that we don't know exactly how many iterations are peeled.
> For example, in vect_do_peeling_for_alignment:
>
> niters_of_prolog_loop = vect_gen_niters_for_prolog_loop (loop_vinfo, ni_name,
>                                                          &wide_prolog_niters);
>
> We could not give a constant estimation of the upper-bound.

surely there is some upper bound on the number of iterations of the loop (if it
is used to ensure alignment, it cannot be more than the value of the alignment
divided by the step of the memory reference?).  For nb_iterations_estimate, you do
not even have to be sure you are correct, it just needs to be a reasonable guess.

> Also, any bounds you set will be erased after dce.

So is anything else that you attach to loop structure, at the moment, so this still
is not an argument for having "peeled" flag.

> >>Passing the information from gimple to rtl loop optimizer is a long-standing problem.  One possibility is
> >>to keep the information about the loops up-to-date between the optimizers (few years ago, I made
> >>sure that this is possible up to tree->rtl expansion, so making this work again should not be too hard;
> >>handling the expansion and rtl passes might be a little more challenging, but not terribly so).
> >>A less involved solution is to drop some notes in the instruction stream at the end of gimple loop
> >>optimizer, and pick them up in the rtl one,
>
> Could you please give some details.

I recall having a proof-of-concept patch for this; I will try to find it,

Zdenek

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-07-03  0:06 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-01 18:42 [Patch for suggestions]: How do we know a loop is the peeled version? Fang, Changpeng
2010-07-01 19:38 ` Zdenek Dvorak
2010-07-02 18:40   ` Fang, Changpeng
2010-07-02 18:57     ` Zdenek Dvorak
2010-07-03  0:06       ` Fang, Changpeng
2010-07-02 16:31 ` Fang, Changpeng
2010-07-02 18:40   ` Zdenek Dvorak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).