public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* comments on loop changes
@ 1999-01-31 14:58 Richard Henderson
  1999-01-31 16:58 ` Michael Hayes
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Richard Henderson @ 1999-01-31 14:58 UTC (permalink / raw)
  To: Jorn Wolfgang Rennecke; +Cc: egcs

While diffing assembly output to find out where things are
bombing out on Alpha, I noticed some peculiarities:

-       lda $2,1528($30)
+       lda $2,1688($30)
        .align 4
 $L1898:
        ldt $f10,0($3)
-       stt $f10,160($2)
+       stt $f10,160($12)
        ldt $f10,0($4)
-       stt $f10,80($2)
+       stt $f10,80($12)
        ldt $f10,0($5)
-       stt $f10,-240($2)
+       stt $f10,-240($12)
        ldt $f10,0($6)
-       stt $f10,-160($2)
+       stt $f10,-160($12)
        ldt $f10,0($7)
-       stt $f10,-80($2)
+       stt $f10,-80($12)
+       subq $2,160,$12
        ldt $f10,0($8)
-       stt $f10,0($2)
+       stt $f10,-160($2)
        addq $8,8,$8
-       addq $2,8,$2
        addq $7,8,$7
        addq $6,8,$6
        addq $5,8,$5
        addq $4,8,$4
        addq $3,8,$3
+       addq $2,8,$2
        addl $11,1,$11
        cmplt $11,$22,$1
        bne $1,$L1898

In this loop, there are two problems, (1) we use $12 without
initializing it (haven't tracked that down yet), (2) $12 is
wholy unnecessary -- we have in fact increased register pressure
rather than reducing it.

Having added dump output for the changes made by the new code
(appended), I don't see any recombinations that aren't redundant
or worse than combinations done by existing code.  E.g.

Insn 1223: giv reg 627 src reg 380 benefit 57 lifetime 1 replaceable mult 24
	   add (plus:DI (plus:DI (reg:DI 620)
			         (const_int 184))
		        (reg:DI 63 FP))
Insn 1225: dest address src reg 380 benefit 57 lifetime 1 replaceable mult 24
	   add (plus:DI (plus:DI (reg:DI 620)
			         (const_int 184))
		        (reg:DI 63 FP))
Insn 1245: giv reg 642 src reg 380 benefit 57 lifetime 1 replaceable mult 24
	   add (plus:DI (plus:DI (reg:DI 620)
			         (const_int 424))
		        (reg:DI 63 FP))
Insn 1247: dest address src reg 380 benefit 57 lifetime 1 replaceable mult 24
	   add (plus:DI (plus:DI (reg:DI 620)
			         (const_int 424))
		        (reg:DI 63 FP))
Insn 1267: giv reg 657 src reg 380 benefit 57 lifetime 1 replaceable mult 24
	   add (plus:DI (plus:DI (reg:DI 620)
			         (const_int 664))
			(reg:DI 63 FP))
Insn 1269: dest address src reg 380 benefit 57 lifetime 1 replaceable mult 24
	   add (plus:DI (plus:DI (reg:DI 620)
			         (const_int 664))
		        (reg:DI 63 FP))
Insn 1289: giv reg 672 src reg 380 benefit 57 lifetime 1 replaceable mult 24
	   add (plus:DI (plus:DI (reg:DI 620)
			         (const_int 904))
		        (reg:DI 63 FP))
Insn 1291: dest address src reg 380 benefit 57 lifetime 1 replaceable mult 24
	   add (plus:DI (plus:DI (reg:DI 620)
			         (const_int 904))
		        (reg:DI 63 FP))
...
giv at 1291 combined with giv at 1289
giv at 1269 combined with giv at 1289
giv at 1247 combined with giv at 1289
giv at 1225 combined with giv at 1289
...
>> giv at 1225 recombined with giv at 1223 as (reg:DI 627)
>> giv at 1247 recombined with giv at 1245 as (reg:DI 642)
>> giv at 1269 recombined with giv at 1267 as (reg:DI 657)
>> giv at 1291 recombined with giv at 1289 as (reg:DI 672)
>> giv at 1220 derived from 1219 as (reg:DI 625)
>> giv at 1245 derived from 1223 as (reg:DI 642)
>> giv at 1267 derived from 1223 as (reg:DI 657)
>> giv at 1289 derived from 1223 as (reg:DI 672)

So what happens is that all the work done by combine_givs to
unify all of the addr givs with a single reg giv has been undone.

It seems like this optimization might work out better if it was
restricted to reg givs.  Or perhaps AUTO_INC_DEC targets combined
with checking that the displacement between two addr givs is
something that is usable in a {pre,post}_{inc,dec,modify}.


r~


@@ -7119,6 +7117,15 @@ recombine_givs (bl, loop_start, loop_end
              last_giv->combined_with++;
              /* No need to update lifetimes / benefits here since we have
                 already decided what to reduce.  */
+
+             if (loop_dump_stream)
+               {
+                 fprintf (loop_dump_stream,
+                          ">> giv at %d recombined with giv at %d as ",
+                          INSN_UID (v->insn), INSN_UID (last_giv->insn));
+                 print_rtl (loop_dump_stream, v->new_reg);
+                 fprintf (loop_dump_stream, "\n");
+               }
              continue;
            }
          v = v->same;
@@ -7321,9 +7328,18 @@ recombine_givs (bl, loop_start, loop_end
                                  gen_rtx_SET (GET_MODE (v->dest_reg),
                                               v->dest_reg, sum), 0))
            {
              v->derived_from = last_giv;
              v->new_reg = v->dest_reg;
              life_end = stats[i].end_luid;
+
+             if (loop_dump_stream)
+               {
+                 fprintf (loop_dump_stream,
+                          ">> giv at %d derived from %d as ",
+                          INSN_UID (v->insn), INSN_UID (last_giv->insn));
+                 print_rtl (loop_dump_stream, v->new_reg);
+                 fprintf (loop_dump_stream, "\n");
+               }
            }
          else if (rescan < 0)
            rescan = i;


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
  1999-01-31 14:58 comments on loop changes Richard Henderson
@ 1999-01-31 16:58 ` Michael Hayes
  1999-01-31 17:16   ` Richard Henderson
  1999-01-31 20:13 ` Jeffrey A Law
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Michael Hayes @ 1999-01-31 16:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jorn Wolfgang Rennecke, m.hayes, egcs

Richard Henderson writes:
 > It seems like this optimization might work out better if it was
 > restricted to reg givs.  Or perhaps AUTO_INC_DEC targets combined
 > with checking that the displacement between two addr givs is
 > something that is usable in a {pre,post}_{inc,dec,modify}.

On a related note, I was hoping that these changes would fix the
problem we have when choosing GIVs to combine that were generated for
accesses to structure arrays (in particular complex arrays).  The
current behaviour is very poor for machines with autoincrement
addressing.

For example, the simple copy of a complex array

void fcvcopy1(const __complex__ float *b, __complex__ float *a, int size)
{
    int     i;

    for (i = 0; i < size; i++)
	b[i] = a[i];
}

generates an inner loop body on the C4x of

L8:
	ldfu	*ar2++,f0
	stf	f0,*ar1++(2)
	ldfu	*ar2++,f1
	stf	f1,*+ar0(1)
	addi3	2,ar0,ar0

compared to what gcc-2.8.1 generates

	ldfu	*ar2++,f0
	ldf	*ar2++,f1	
||	stf	f0,*ar0++
	stf	f1,*ar0++

The latter only requires two address registers, 3 instructions, and
doesn't stall the pipeline by explicity incrementing an address
register.

After some experimentation, I found that the following patch seems to
give the best results.  If you're interested, I can supply a detailed
analysis.  In a nutshell, we give a DEST_REG GIV of the form
(set (reg 2) (mult (reg 1) (const_int x))
too high a benefit since we can often express a DEST_ADDR GIV in terms
of this one using a (mem (plus (reg 2) (reg 3))) addressing mode.
However, this is very undesirable if we support autoincrement
addressing since it leads to a poor combination of GIVS.

Michael.


Index: loop.c
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/loop.c,v
retrieving revision 1.123
diff -c -3 -p -r1.123 loop.c
*** loop.c	1999/01/30 07:32:56	1.123
--- loop.c	1999/02/01 00:33:40
*************** combine_givs_p (g1, g2)
*** 6713,6718 ****
--- 6713,6723 ----
    if (tem != NULL_RTX
        && g2->giv_type == DEST_ADDR
        && memory_address_p (g2->mem_mode, tem)
+ #ifdef AUTO_INC_DEC
+       && ! (GET_CODE (tem) == PLUS
+ 	    && REG_P (XEXP (tem, 0))
+ 	    && REG_P (XEXP (tem, 1)))
+ #endif
        /* ??? Looses, especially with -fforce-addr, where *g2->location
  	 will always be a register, and so anything more complicated
  	 gets discarded.  */


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
  1999-01-31 16:58 ` Michael Hayes
@ 1999-01-31 17:16   ` Richard Henderson
  1999-01-31 18:00     ` Michael Hayes
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 1999-01-31 17:16 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Jorn Wolfgang Rennecke, egcs

On Mon, Feb 01, 1999 at 02:01:10PM +1300, Michael Hayes wrote:
> After some experimentation, I found that the following patch seems to
> give the best results.  If you're interested, I can supply a detailed
> analysis. 

I would be interested in seeing one.

> *************** combine_givs_p (g1, g2)
> *** 6713,6718 ****
> --- 6713,6723 ----
>     if (tem != NULL_RTX
>         && g2->giv_type == DEST_ADDR
>         && memory_address_p (g2->mem_mode, tem)
> + #ifdef AUTO_INC_DEC
> +       && ! (GET_CODE (tem) == PLUS
> + 	    && REG_P (XEXP (tem, 0))
> + 	    && REG_P (XEXP (tem, 1)))
> + #endif

I think this is the wrong place to solve this problem.  IMO, combine_givs_p
should only answer the question "is a combination possible", and that
policy on "is a combination worthwhile" should be answered elsewhere.

Note that I have not spent a good deal of time examining losage occurring
in auto_inc_dec code, so I don't know exactly how to fix the problem.


r~

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
  1999-01-31 17:16   ` Richard Henderson
@ 1999-01-31 18:00     ` Michael Hayes
  1999-01-31 21:41       ` Richard Henderson
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Hayes @ 1999-01-31 18:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Michael Hayes, Jorn Wolfgang Rennecke, egcs

Richard Henderson writes:
 > On Mon, Feb 01, 1999 at 02:01:10PM +1300, Michael Hayes wrote:

 > I think this is the wrong place to solve this problem.  IMO, combine_givs_p
 > should only answer the question "is a combination possible", and that
 > policy on "is a combination worthwhile" should be answered elsewhere.

Good point.  The alternatives as I see it are that
combine_givs_benefit_from is modified to reduce the benefit for this
GIV or that we scan the list of GIVS adjusting the benefits.

 > > After some experimentation, I found that the following patch seems to
 > > give the best results.  If you're interested, I can supply a detailed
 > > analysis. 
 > 
 > I would be interested in seeing one.

Here's the .cse dump for the loop body:


(insn 25 22 27 (parallel[ 
            (set (reg:QI 41)
                (mult:QI (reg/v:QI 40)
                    (const_int 2)))
            (clobber (reg:CC_NOOV 21 st))
        ] ) 52 {*mulqi3_clobber} (nil)
    (nil))

(insn 27 25 31 (parallel[ 
            (set (reg:QI 42)
                (plus:QI (reg:QI 41)
                    (reg/v:QI 38)))
            (clobber (reg:CC_NOOV 21 st))
        ] ) 36 {*addqi3_clobber} (nil)
    (nil))

(insn 31 27 32 (parallel[ 
            (set (reg:QI 44)
                (plus:QI (reg:QI 41)
                    (reg/v:QI 37)))
            (clobber (reg:CC_NOOV 21 st))
        ] ) 36 {*addqi3_clobber} (nil)
    (nil))

(insn 32 31 34 (clobber (mem/s:QC (reg:QI 42) 4)) -1 (nil)
    (nil))

(insn 34 32 35 (set (reg:QF 45)
        (mem/s:QF (reg:QI 44) 0)) 112 {movqf_noclobber} (nil)
    (nil))

(insn 35 34 37 (set (mem/s:QF (reg:QI 42) 0)
        (reg:QF 45)) 112 {movqf_noclobber} (nil)
    (nil))

(insn 37 35 38 (set (reg:QF 46)
        (mem/s:QF (plus:QI (reg:QI 44)
                (const_int 1)) 0)) 112 {movqf_noclobber} (nil)
    (nil))

(insn 38 37 40 (set (mem/s:QF (plus:QI (reg:QI 42)
                (const_int 1)) 0)
        (reg:QF 46)) 112 {movqf_noclobber} (nil)
    (nil))

and here's the interesting bits from the .loop dump:

Loop from 16 to 49: 11 real insns.
Continue at insn 40.
Insn 43: possible biv, reg 40, const =1
Reg 40: biv verified
Biv 40 initialized at insn 15: initial value 0
Insn 25: giv reg 41 src reg 40 benefit 2 lifetime 2 replaceable ncav mult 2 add 0
Insn 27: giv reg 42 src reg 40 benefit 4 lifetime 6 replaceable ncav mult 2 add (reg/v:QI 38)
Insn 31: giv reg 44 src reg 40 benefit 4 lifetime 4 replaceable ncav mult 2 add (reg/v:QI 37)
Insn 34: dest address src reg 40 benefit 4 lifetime 1 replaceable ncav mult 2 add (reg/v:QI 37)
Insn 35: dest address src reg 40 benefit 4 lifetime 1 replaceable ncav mult 2 add (reg/v:QI 38)
Insn 37: dest address src reg 40 benefit 4 lifetime 1 replaceable mult 2 add (plus:QI (reg/v:QI 37)
    (const_int 1))
Insn 38: dest address src reg 40 benefit 4 lifetime 1 replaceable mult 2 add (plus:QI (reg/v:QI 38)
    (const_int 1))
Loop iterations: Final value not constant (reg/v:QI 39).
Cannot eliminate biv 40: biv used in insn 18.
Sorted combine statistics:
 {25, 17} {35, 12} {34, 12} {38, 11} {37, 11} {31, 11} {27, 11}
giv at 35 combined with giv at 25
giv at 34 combined with giv at 25
Sorted combine statistics:
 {31, 8} {27, 8} {38, 4} {37, 4}
giv at 37 combined with giv at 31
Sorted combine statistics:
 {27, 8} {38, 4}
giv at 38 combined with giv at 27
Sorted combine statistics:

giv at 38 reduced to (plus:QI (reg:QI 49)
    (const_int 1))
giv at 37 reduced to (plus:QI (reg:QI 48)
    (const_int 1))
giv at 35 reduced to (plus:QI (reg:QI 50)
    (reg/v:QI 38))
giv at 34 reduced to (reg:QI 48)
giv at 31 reduced to (reg:QI 48)
giv at 27 reduced to (reg:QI 49)
giv at 25 reduced to (reg:QI 50)


Now here's the matrix showing how GIV2 can be expressed in terms of
GIV1 (note that r47 is the addr_placeholder).

GIV1/GIV2     38        37      35      34      31      27      25 
38                           r47-1                   r47-1 
37                                   r47-1   r47-1
35         r47+1                                       r47
34                   r47+1             r47
31                   r44+1             r44         
27         r42+1               r42     
25     r41+r38+1 r41+r37+1 r41+r38 r41+r37 r41+r37 r41+r38

On the c4x all the above derivations are legal addresses except for
REG+REG+CONST.  

Here is the GIV combination benefit matrix (where `-' denotes
that a GIV2 could not be expressed in terms of GIV1):

GIV1/GIV2     38       37      35       34       31      27     25 Benefit
38             4        -     4+3        -        -       0      -      11
37             -        4       -      4+3        0       -      -      11
35           4+3        -     4+1        -        -       0      -      12
34             -      4+3       -      4+1        0       -      -      12
31             -      0+3       -      0+3      4+1       -      -      12
27           0+3        -     0+3        -        -     4+1      -      11
25             0        0     4+3      4+3        0       0     2+1     17

Note that the pesky DEST_REG GIV at insn 25 comes with the greatest
benefit since the DEST_REG GIVS at insns 34 and 35 can be expressed in
terms of REG+REG.

Now my simple minded approach was to penalise the REG+REG addresses
which in this case reduces the benefit of the GIV at insn 25 to 3.

The resulting .loop dump is now:


Loop from 16 to 49: 11 real insns.
Continue at insn 40.
Insn 43: possible biv, reg 40, const =1
Reg 40: biv verified
Biv 40 initialized at insn 15: initial value 0
Insn 25: giv reg 41 src reg 40 benefit 2 lifetime 2 replaceable ncav mult 2 add 0
Insn 27: giv reg 42 src reg 40 benefit 4 lifetime 6 replaceable ncav mult 2 add (reg/v:QI 38)
Insn 31: giv reg 44 src reg 40 benefit 4 lifetime 4 replaceable ncav mult 2 add (reg/v:QI 37)
Insn 34: dest address src reg 40 benefit 4 lifetime 1 replaceable ncav mult 2 add (reg/v:QI 37)
Insn 35: dest address src reg 40 benefit 4 lifetime 1 replaceable ncav mult 2 add (reg/v:QI 38)
Insn 37: dest address src reg 40 benefit 4 lifetime 1 replaceable mult 2 add (plus:QI (reg/v:QI 37)
    (const_int 1))
Insn 38: dest address src reg 40 benefit 4 lifetime 1 replaceable mult 2 add (plus:QI (reg/v:QI 38)
    (const_int 1))
Loop iterations: Final value not constant (reg/v:QI 39).
Cannot eliminate biv 40: biv used in insn 18.
Sorted combine statistics:
 {35, 12} {34, 12} {38, 11} {37, 11} {31, 11} {27, 11} {25, 3}
giv at 38 combined with giv at 35
Sorted combine statistics:
 {31, 11} {27, 8} {34, 5} {37, 4} {25, 3}
giv at 37 combined with giv at 31
giv at 34 combined with giv at 31
Sorted combine statistics:
 {27, 8} {25, 3}
giv of insn 25 not worth while, 0 vs 11.
giv at 38 reduced to (plus:QI (reg:QI 48)
    (const_int 1))
giv at 37 reduced to (plus:QI (reg:QI 49)
    (const_int 1))
giv at 35 reduced to (reg:QI 48)
giv at 34 reduced to (reg:QI 49)
giv at 31 reduced to (reg:QI 49)
giv at 27 reduced to (reg:QI 50)

This results in a much better combination of the GIVs.

Michael.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
  1999-01-31 14:58 comments on loop changes Richard Henderson
  1999-01-31 16:58 ` Michael Hayes
@ 1999-01-31 20:13 ` Jeffrey A Law
  1999-01-31 22:05   ` Richard Henderson
  1999-02-01  9:53 ` Joern Rennecke
  1999-02-01 10:53 ` Joern Rennecke
  3 siblings, 1 reply; 23+ messages in thread
From: Jeffrey A Law @ 1999-01-31 20:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jorn Wolfgang Rennecke, egcs

  In message < 19990131145827.A17537@cygnus.com >you write:
[ ... ]
I'd highly recommend we install your code to dump info about recombination
and derivation.  It'll make debugging this code (both for correctness and
performance issues) a hell of a lot easier.

jeff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
  1999-01-31 18:00     ` Michael Hayes
@ 1999-01-31 21:41       ` Richard Henderson
  1999-02-01  2:29         ` Michael Hayes
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 1999-01-31 21:41 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Jorn Wolfgang Rennecke, egcs

On Mon, Feb 01, 1999 at 03:02:56PM +1300, Michael Hayes wrote:
> Now my simple minded approach was to penalise the REG+REG addresses
> which in this case reduces the benefit of the GIV at insn 25 to 3.

Hmm.  Would you refresh my memory on what sorts of sequences are
likely to be recognized for auto inc/dec?  Or better yet, what
sorts of sequences are recognized that patch I seem to recall you
saying you had.

I'm thinking the best way to solve the problem is

 (1) To specifically identify -- before any combinations are made --
     which pairs are likely to be turned into auto inc/dec.  This may
     include some of the work Joern has done on derivation of givs. 

 (2) With what's left we go through combination as it stands now.

 (3) To that we apply Joern's derivation work (again).

Rationale

 * Auto inc/dec is important enough to deserve special attention.
   At the moment we don't have a cost model suitable for describing
   it within loop, and it's probably not worthwhile to try and 
   cobble one together.  The impression I've gotten is that it
   is so beneficial that you want to use it whenever possible.
   Examples to the contrary welcomed.

 * The combination code I developed early last year is quite good
   at finding what relationships it can -- at least on reg+disp and
   reg+index+disp machines.  (I can't remember what losage, if any,
   I had remaining on Sparc, which has reg+disp and reg+reg.)
   
 * The final derivation pass takes care of adjusting for related
   givs in which the displacement is too large.  Consider a target
   that has a 4 bit displacement (SH or M*Core) (and ignore auto
   inc/dec for the moment so that I don't have to twist my head to
   find a sequence that defeats it): 

	struct { float x, y; int large[40]; } *p;
	for (i = 0; i < n-1; ++i) {
		float x, y;
		x = p[i].x;
		y = p[i].y;
		diffs[i*2+0] = x - p[i+1].x;
		diffs[i*2+1] = y - p[i+1].y;
	}

   At the moment this might be resolved into two givs

	p[i].x = (reg 100)
	p[i].y = (plus (reg 100) 4)
	p[i+1].x = (reg 101)
	p[i+1].y = (plus (reg 101) 4)

   both with increments of 168.  If Joern's code were to avoid
   recombining the .y addr references, this could be derived as

	p[i].x = (reg 100)
	p[i].y = (plus (reg 100) 4)
	(set (reg 100) (plus (reg 100) 168))
	p[i+1].x = (reg 100)
	p[i+1].y = (plus (reg 100) 4)

   Not a great example, but you get the idea. 

   This shows up on machines with non-trivial displacements as well. 
   Arrays in Fortran common blocks are often multiple megabytes, and
   their locations within the block are resolved by the front end quite
   early.

Thoughts?

---

While I'm thinking about it, one final giv-related optimization I'd 
like to see is recognizing when the final value of an inner loop's
giv is the same as an outer loop's increment:

	for (i = 0; i < N; ++i)
	  for (j = 0; j < N; ++j)
	    a[i][j] += 1;

    becomes

	pi = a;
	ci = N-1;
	do {
	  pj = pi;
	  cj = N-1;
	  do {
	    *pj += 1;
	    pj += 4;
	  } while (--cj >= 0);
	  pi += 4*N;
	} while (--ci >= 0);

I believe the first step to recognizing (Hi Toon) that these loops
are collapsible is recognizing that at the end of the inner loop,
pj == pi + 4*N.  Given the delta on pj across the inner loop, and
the manner of its initialization, lets us unify pi and pj into a
common varible.  I've not got a handle on what additional fact lets
us unify ci and cj and collapse the loops.

I do not have enough of loop in my head to babble on about how we
might go about getting this.  I have a feeling we don't keep the
information live across optimizing the different loops, and that 
doing so would be a semi-major restructuring of loop.c. 

Something that I unfortunately don't have time for at just this moment.


r~

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
  1999-01-31 20:13 ` Jeffrey A Law
@ 1999-01-31 22:05   ` Richard Henderson
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 1999-01-31 22:05 UTC (permalink / raw)
  To: law; +Cc: Jorn Wolfgang Rennecke, egcs

On Sun, Jan 31, 1999 at 04:29:04PM -0700, Jeffrey A Law wrote:
> I'd highly recommend we install your code to dump info about recombination
> and derivation.  It'll make debugging this code (both for correctness and
> performance issues) a hell of a lot easier.

Done.


r~

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
  1999-01-31 21:41       ` Richard Henderson
@ 1999-02-01  2:29         ` Michael Hayes
       [not found]           ` < 14005.30140.529348.966766@ongaonga.elec.canterbury.ac.nz >
  1999-02-28 22:53           ` Michael Hayes
  0 siblings, 2 replies; 23+ messages in thread
From: Michael Hayes @ 1999-02-01  2:29 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Michael Hayes, Jorn Wolfgang Rennecke, egcs

Richard Henderson writes:
 > Hmm.  Would you refresh my memory on what sorts of sequences are
 > likely to be recognized for auto inc/dec?  Or better yet, what
 > sorts of sequences are recognized that patch I seem to recall you
 > saying you had.

The most common case combines the following sub-expressions,
where size is the size of the memref:

1: (mem (plus (reg) (offset)))
2: (plus (reg) (inc))

type           inc    offset           
post-inc      size         0
pre-inc       size      size
post-dec     -size         0 
pre-dec      -size     -size

We also handle the two sub-expression in the other order:

1: (plus (reg) (inc))
2: (mem (plus (reg) (offset)))

type           inc    offset           
post-inc      size     -size
pre-inc       size         0
post-dec     -size      size 
pre-dec      -size         0

The autoinc mods I wrote only extended the first case where the
increments followed the memrefs.  They looked to see if a sequence
of memrefs could be converted to auto-{inc, dec, modify} to satisfy
the desired increment.  I also added support for auto-modify
which handled larger constant increments and register increments.

A while ago I looked at a more generic approach to convert memory
references into auto-{inc, dec, modify} which would also handle
increments before the memref.  This required greater changes to flow.c
so I stopped when I saw that others were working on flow.c [I'm
currently waiting to see what effect your changes will have.]


 > I'm thinking the best way to solve the problem is
 > 
 >  (1) To specifically identify -- before any combinations are made --
 >      which pairs are likely to be turned into auto inc/dec.  This may
 >      include some of the work Joern has done on derivation of givs. 

Yes, currently this ad hoc at best.

 >  (2) With what's left we go through combination as it stands now.

So we only combine givs not earmarked for autoinc?

 >  (3) To that we apply Joern's derivation work (again).

 >  * Auto inc/dec is important enough to deserve special attention.

I agree, especially when looking through my DSP view of the world.
This is a crucial optimisation for digital signal processors.

<example snipped> 

 >    At the moment this might be resolved into two givs
 > 
 > 	p[i].x = (reg 100)
 > 	p[i].y = (plus (reg 100) 4)
 > 	p[i+1].x = (reg 101)
 > 	p[i+1].y = (plus (reg 101) 4)

 >    both with increments of 168. 

Unfortunately, after giv combination we get nothing like this at the
moment :-( I would be happy if we achieved this since the adds could
be converted to auto-modifys on the c4x.

 >    If Joern's code were to avoid
 >    recombining the .y addr references, this could be derived as
 > 
 > 	p[i].x = (reg 100)
 > 	p[i].y = (plus (reg 100) 4)
 > 	(set (reg 100) (plus (reg 100) 168))
 > 	p[i+1].x = (reg 100)
 > 	p[i+1].y = (plus (reg 100) 4)

This would be great.  Only 1 auto-modify for the c4x.

 >    Not a great example, but you get the idea. 

It does show up how poorly we combine givs at the moment.  Probably a
better example to concentrate on (to keep the complex arithmetic folk
happy) is:

struct bar { float x, y;};

void foo(struct bar *p, struct bar *d, int n)
{
  int i;
  
  for (i = 0; i < n-1; ++i)
    {
      d[i].x = p[i].x - p[i+1].x;
      d[i].y = p[i].y - p[i+1].y;
    }
}

 > While I'm thinking about it, one final giv-related optimization I'd 
 > like to see is recognizing when the final value of an inner loop's
 > giv is the same as an outer loop's increment:
 > 
 > 	for (i = 0; i < N; ++i)
 > 	  for (j = 0; j < N; ++j)
 > 	    a[i][j] += 1;

Yeah, we make a mess of nested loops like this :-(

 > I do not have enough of loop in my head to babble on about how we
 > might go about getting this.  I have a feeling we don't keep the
 > information live across optimizing the different loops, and that 
 > doing so would be a semi-major restructuring of loop.c. 

The patches to loop.c I submitted a couple of weeks ago would help in
this respect.  We would probably only need to keep the loop iteration
info for one depth of nesting.  It might get a little trickier if
we looked to fuse loops as well.  

Michael.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
       [not found]           ` < 14005.30140.529348.966766@ongaonga.elec.canterbury.ac.nz >
@ 1999-02-01  2:59             ` Richard Henderson
       [not found]               ` < 19990201025925.A17750@cygnus.com >
  1999-02-28 22:53               ` Richard Henderson
  0 siblings, 2 replies; 23+ messages in thread
From: Richard Henderson @ 1999-02-01  2:59 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Jorn Wolfgang Rennecke, egcs

On Mon, Feb 01, 1999 at 11:32:06PM +1300, Michael Hayes wrote:
> This required greater changes to flow.c
> so I stopped when I saw that others were working on flow.c [I'm
> currently waiting to see what effect your changes will have.]

Shouldn't be _too_ much in that area.  I've been concentrating
on the CFG for now.

>  >  (2) With what's left we go through combination as it stands now.
> 
> So we only combine givs not earmarked for autoinc?

Yes.  In effect we'll consider them already combined.

>  > 	p[i].x = (reg 100)
>  > 	p[i].y = (plus (reg 100) 4)
>  > 	p[i+1].x = (reg 101)
>  > 	p[i+1].y = (plus (reg 101) 4)
> 
>  >    both with increments of 168. 
> 
> Unfortunately, after giv combination we get nothing like this at the
> moment :-(

Hurm.. I guess that reg+reg thing is really screwing you.  I get

        lds $f10,-172($2)
        lds $f11,-168($2)
        lds $f12,-4($2)
        lds $f13,0($2)
	...
        sts $f10,0($3)
        sts $f11,4($3)

for the code snippet.

> struct bar { float x, y;};
> 
> void foo(struct bar *p, struct bar *d, int n)
> {
>   int i;
>   
>   for (i = 0; i < n-1; ++i)
>     {
>       d[i].x = p[i].x - p[i+1].x;
>       d[i].y = p[i].y - p[i+1].y;
>     }
> }

Again:

        lds $f11,-8($2)
        lds $f12,0($2)
        lds $f10,-4($2)
        lds $f13,4($2)
	addq $2,8,$2
	...
        sts $f11,0($3)
        sts $f10,4($3)
	addq $3,8,$3

I'll take a closer look at your reg+reg problem and see if I can
replicate it on Sparc.  There's got to be a way to convince the
combiner to try reg+reg last.


r~

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
  1999-01-31 14:58 comments on loop changes Richard Henderson
  1999-01-31 16:58 ` Michael Hayes
  1999-01-31 20:13 ` Jeffrey A Law
@ 1999-02-01  9:53 ` Joern Rennecke
  1999-02-28 22:53   ` Joern Rennecke
  1999-02-01 10:53 ` Joern Rennecke
  3 siblings, 1 reply; 23+ messages in thread
From: Joern Rennecke @ 1999-02-01  9:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: amylaar, egcs

> So what happens is that all the work done by combine_givs to
> unify all of the addr givs with a single reg giv has been undone.

There is already some logic to avoid recombinations that use givs that
might be dead.

          /* combine_givs_p actually says if we can make this transformation.
             The other tests are here only to avoid keeping a giv alive
             that could otherwise be eliminated.  */
          if (last_giv
              && ((old_same->maybe_dead && ! old_same->combined_with)
                  || ! last_giv->maybe_dead
                  || last_giv->combined_with)
              && (new_combine = combine_givs_p (last_giv, v)))


So the real question is, is maybe_dead incorrectly set or cleared for
some givs?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
  1999-01-31 14:58 comments on loop changes Richard Henderson
                   ` (2 preceding siblings ...)
  1999-02-01  9:53 ` Joern Rennecke
@ 1999-02-01 10:53 ` Joern Rennecke
       [not found]   ` < 199902011851.SAA27551@phal.cygnus.co.uk >
  1999-02-28 22:53   ` Joern Rennecke
  3 siblings, 2 replies; 23+ messages in thread
From: Joern Rennecke @ 1999-02-01 10:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: amylaar, egcs

> >> giv at 1225 recombined with giv at 1223 as (reg:DI 627)
> >> giv at 1247 recombined with giv at 1245 as (reg:DI 642)
> >> giv at 1269 recombined with giv at 1267 as (reg:DI 657)
> >> giv at 1291 recombined with giv at 1289 as (reg:DI 672)
> >> giv at 1220 derived from 1219 as (reg:DI 625)
> >> giv at 1245 derived from 1223 as (reg:DI 642)
> >> giv at 1267 derived from 1223 as (reg:DI 657)
> >> giv at 1289 derived from 1223 as (reg:DI 672)

Is the giv at 1219 the one that is superflous?  If so, this patch should help:

*************** find_splittable_givs (bl, unroll_type, l
*** 2998,3003 ****
--- 3016,3030 ----
  				 INSN_UID (v->insn));
  		      continue;
  		    }
+ 		  if (v->same && v->same->derived_from)
+ 		    {
+ 		      /* Handle V as if the giv from which V->SAME has
+ 			 been derived has been combined with V.  */
+ 
+ 		      v->same = v->same->derived_from;
+ 		      v->new_reg = express_from (v->same, v);
+ 		    }
+ 
  		}
  	      
  	      /* Store the value of dest_reg into the insn.  This sharing

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
       [not found]   ` < 199902011851.SAA27551@phal.cygnus.co.uk >
@ 1999-02-01 12:36     ` Richard Henderson
  1999-02-28 22:53       ` Richard Henderson
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 1999-02-01 12:36 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: egcs

On Mon, Feb 01, 1999 at 06:51:52PM +0000, Joern Rennecke wrote:
> > >> giv at 1225 recombined with giv at 1223 as (reg:DI 627)
> > >> giv at 1247 recombined with giv at 1245 as (reg:DI 642)
> > >> giv at 1269 recombined with giv at 1267 as (reg:DI 657)
> > >> giv at 1291 recombined with giv at 1289 as (reg:DI 672)
> > >> giv at 1220 derived from 1219 as (reg:DI 625)
> > >> giv at 1245 derived from 1223 as (reg:DI 642)
> > >> giv at 1267 derived from 1223 as (reg:DI 657)
> > >> giv at 1289 derived from 1223 as (reg:DI 672)
> 
> Is the giv at 1219 the one that is superflous?

I'm sorry, that was probably confusing.  Those were two
separate examples.  That dump isn't from the $12 example.

I'll try the patch anyway.


r~

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
       [not found]               ` < 19990201025925.A17750@cygnus.com >
@ 1999-02-03 14:12                 ` Michael Hayes
       [not found]                   ` < 14008.51524.336200.906670@ongaonga.elec.canterbury.ac.nz >
  1999-02-28 22:53                   ` Michael Hayes
  0 siblings, 2 replies; 23+ messages in thread
From: Michael Hayes @ 1999-02-03 14:12 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Michael Hayes, Jorn Wolfgang Rennecke, egcs

Richard Henderson writes:
 > I'll take a closer look at your reg+reg problem and see if I can
 > replicate it on Sparc.  There's got to be a way to convince the
 > combiner to try reg+reg last.

I've found I can emulate the behaviour of the Sparc by setting
ADDRESS_COST to 1.  I then get essentially the same pattern of GIVs
and the GIV combination works well.

So it appears that the problem I see is due to a subtle interaction
between CSE and GIV combination.  Somehow we need the GIV combiner to
be a bit smarter but I'm unsure how to progress.

Michael.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
       [not found]                   ` < 14008.51524.336200.906670@ongaonga.elec.canterbury.ac.nz >
@ 1999-02-05 14:50                     ` Michael Hayes
  1999-02-28 22:53                       ` Michael Hayes
  1999-02-12 16:34                     ` Michael Hayes
  1 sibling, 1 reply; 23+ messages in thread
From: Michael Hayes @ 1999-02-05 14:50 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Richard Henderson, Jorn Wolfgang Rennecke, egcs

Michael Hayes writes:
 > I've found I can emulate the behaviour of the Sparc by setting
 > ADDRESS_COST to 1.  I then get essentially the same pattern of GIVs
 > and the GIV combination works well.

With the Sparc we get the following expressions which the giv combiner
does the right thing:

ra = biv * 8
*(ra + rb)
*(ra + rc)
rd = ra + rb
re = ra + rc
*(rd + 4)
*(re + 4)

However, if a reg+reg addressing cost is higher than a reg addressing
cost (as on the c4x), CSE turns the above into:

ra = biv * 8
rd = ra + rb
re = ra + rc
*rd
*re
*(rd + 4)
*(re + 4)

Unfortunately, giv combination does the wrong thing with this.
The upshot is, if the address costs try to discourage reg+reg
addressing modes, there is a greater chance of generating them!

Michael.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
       [not found]                   ` < 14008.51524.336200.906670@ongaonga.elec.canterbury.ac.nz >
  1999-02-05 14:50                     ` Michael Hayes
@ 1999-02-12 16:34                     ` Michael Hayes
  1999-02-28 22:53                       ` Michael Hayes
  1 sibling, 1 reply; 23+ messages in thread
From: Michael Hayes @ 1999-02-12 16:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Michael Hayes, Jorn Wolfgang Rennecke, egcs

Michael Hayes writes:
 > Richard Henderson writes:
 >  > I'll take a closer look at your reg+reg problem and see if I can
 >  > replicate it on Sparc.  There's got to be a way to convince the
 >  > combiner to try reg+reg last.
 > 
 > I've found I can emulate the behaviour of the Sparc by setting
 > ADDRESS_COST to 1.  I then get essentially the same pattern of GIVs
 > and the GIV combination works well.

Unfortunately, setting ADDRESS_COST to 1 on the c4x kills generation
of autoincrement addressing modes for all the common cases (REG+REG
addresses need a higher cost than REG addresses for CSE TDTRT.  So I
am back to square 1.

Now gcc-2.8.1 still generates much better code for the c4x since it
has the test in combine_givs_p to reject GIVs for combination based on
the address cost.  With your understandable reluctance to revert your
patch to combine_givs_p, the alternative is to modify
combine_givs_benefit_from.  Richard, could you supply a comment for
this function to specify how it should behave?

Michael.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
  1999-02-12 16:34                     ` Michael Hayes
@ 1999-02-28 22:53                       ` Michael Hayes
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Hayes @ 1999-02-28 22:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Michael Hayes, Jorn Wolfgang Rennecke, egcs

Michael Hayes writes:
 > Richard Henderson writes:
 >  > I'll take a closer look at your reg+reg problem and see if I can
 >  > replicate it on Sparc.  There's got to be a way to convince the
 >  > combiner to try reg+reg last.
 > 
 > I've found I can emulate the behaviour of the Sparc by setting
 > ADDRESS_COST to 1.  I then get essentially the same pattern of GIVs
 > and the GIV combination works well.

Unfortunately, setting ADDRESS_COST to 1 on the c4x kills generation
of autoincrement addressing modes for all the common cases (REG+REG
addresses need a higher cost than REG addresses for CSE TDTRT.  So I
am back to square 1.

Now gcc-2.8.1 still generates much better code for the c4x since it
has the test in combine_givs_p to reject GIVs for combination based on
the address cost.  With your understandable reluctance to revert your
patch to combine_givs_p, the alternative is to modify
combine_givs_benefit_from.  Richard, could you supply a comment for
this function to specify how it should behave?

Michael.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
  1999-02-01 12:36     ` Richard Henderson
@ 1999-02-28 22:53       ` Richard Henderson
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 1999-02-28 22:53 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: egcs

On Mon, Feb 01, 1999 at 06:51:52PM +0000, Joern Rennecke wrote:
> > >> giv at 1225 recombined with giv at 1223 as (reg:DI 627)
> > >> giv at 1247 recombined with giv at 1245 as (reg:DI 642)
> > >> giv at 1269 recombined with giv at 1267 as (reg:DI 657)
> > >> giv at 1291 recombined with giv at 1289 as (reg:DI 672)
> > >> giv at 1220 derived from 1219 as (reg:DI 625)
> > >> giv at 1245 derived from 1223 as (reg:DI 642)
> > >> giv at 1267 derived from 1223 as (reg:DI 657)
> > >> giv at 1289 derived from 1223 as (reg:DI 672)
> 
> Is the giv at 1219 the one that is superflous?

I'm sorry, that was probably confusing.  Those were two
separate examples.  That dump isn't from the $12 example.

I'll try the patch anyway.


r~

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
  1999-02-01  2:59             ` Richard Henderson
       [not found]               ` < 19990201025925.A17750@cygnus.com >
@ 1999-02-28 22:53               ` Richard Henderson
  1 sibling, 0 replies; 23+ messages in thread
From: Richard Henderson @ 1999-02-28 22:53 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Jorn Wolfgang Rennecke, egcs

On Mon, Feb 01, 1999 at 11:32:06PM +1300, Michael Hayes wrote:
> This required greater changes to flow.c
> so I stopped when I saw that others were working on flow.c [I'm
> currently waiting to see what effect your changes will have.]

Shouldn't be _too_ much in that area.  I've been concentrating
on the CFG for now.

>  >  (2) With what's left we go through combination as it stands now.
> 
> So we only combine givs not earmarked for autoinc?

Yes.  In effect we'll consider them already combined.

>  > 	p[i].x = (reg 100)
>  > 	p[i].y = (plus (reg 100) 4)
>  > 	p[i+1].x = (reg 101)
>  > 	p[i+1].y = (plus (reg 101) 4)
> 
>  >    both with increments of 168. 
> 
> Unfortunately, after giv combination we get nothing like this at the
> moment :-(

Hurm.. I guess that reg+reg thing is really screwing you.  I get

        lds $f10,-172($2)
        lds $f11,-168($2)
        lds $f12,-4($2)
        lds $f13,0($2)
	...
        sts $f10,0($3)
        sts $f11,4($3)

for the code snippet.

> struct bar { float x, y;};
> 
> void foo(struct bar *p, struct bar *d, int n)
> {
>   int i;
>   
>   for (i = 0; i < n-1; ++i)
>     {
>       d[i].x = p[i].x - p[i+1].x;
>       d[i].y = p[i].y - p[i+1].y;
>     }
> }

Again:

        lds $f11,-8($2)
        lds $f12,0($2)
        lds $f10,-4($2)
        lds $f13,4($2)
	addq $2,8,$2
	...
        sts $f11,0($3)
        sts $f10,4($3)
	addq $3,8,$3

I'll take a closer look at your reg+reg problem and see if I can
replicate it on Sparc.  There's got to be a way to convince the
combiner to try reg+reg last.


r~

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
  1999-02-03 14:12                 ` Michael Hayes
       [not found]                   ` < 14008.51524.336200.906670@ongaonga.elec.canterbury.ac.nz >
@ 1999-02-28 22:53                   ` Michael Hayes
  1 sibling, 0 replies; 23+ messages in thread
From: Michael Hayes @ 1999-02-28 22:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Michael Hayes, Jorn Wolfgang Rennecke, egcs

Richard Henderson writes:
 > I'll take a closer look at your reg+reg problem and see if I can
 > replicate it on Sparc.  There's got to be a way to convince the
 > combiner to try reg+reg last.

I've found I can emulate the behaviour of the Sparc by setting
ADDRESS_COST to 1.  I then get essentially the same pattern of GIVs
and the GIV combination works well.

So it appears that the problem I see is due to a subtle interaction
between CSE and GIV combination.  Somehow we need the GIV combiner to
be a bit smarter but I'm unsure how to progress.

Michael.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
  1999-02-01  2:29         ` Michael Hayes
       [not found]           ` < 14005.30140.529348.966766@ongaonga.elec.canterbury.ac.nz >
@ 1999-02-28 22:53           ` Michael Hayes
  1 sibling, 0 replies; 23+ messages in thread
From: Michael Hayes @ 1999-02-28 22:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Michael Hayes, Jorn Wolfgang Rennecke, egcs

Richard Henderson writes:
 > Hmm.  Would you refresh my memory on what sorts of sequences are
 > likely to be recognized for auto inc/dec?  Or better yet, what
 > sorts of sequences are recognized that patch I seem to recall you
 > saying you had.

The most common case combines the following sub-expressions,
where size is the size of the memref:

1: (mem (plus (reg) (offset)))
2: (plus (reg) (inc))

type           inc    offset           
post-inc      size         0
pre-inc       size      size
post-dec     -size         0 
pre-dec      -size     -size

We also handle the two sub-expression in the other order:

1: (plus (reg) (inc))
2: (mem (plus (reg) (offset)))

type           inc    offset           
post-inc      size     -size
pre-inc       size         0
post-dec     -size      size 
pre-dec      -size         0

The autoinc mods I wrote only extended the first case where the
increments followed the memrefs.  They looked to see if a sequence
of memrefs could be converted to auto-{inc, dec, modify} to satisfy
the desired increment.  I also added support for auto-modify
which handled larger constant increments and register increments.

A while ago I looked at a more generic approach to convert memory
references into auto-{inc, dec, modify} which would also handle
increments before the memref.  This required greater changes to flow.c
so I stopped when I saw that others were working on flow.c [I'm
currently waiting to see what effect your changes will have.]


 > I'm thinking the best way to solve the problem is
 > 
 >  (1) To specifically identify -- before any combinations are made --
 >      which pairs are likely to be turned into auto inc/dec.  This may
 >      include some of the work Joern has done on derivation of givs. 

Yes, currently this ad hoc at best.

 >  (2) With what's left we go through combination as it stands now.

So we only combine givs not earmarked for autoinc?

 >  (3) To that we apply Joern's derivation work (again).

 >  * Auto inc/dec is important enough to deserve special attention.

I agree, especially when looking through my DSP view of the world.
This is a crucial optimisation for digital signal processors.

<example snipped> 

 >    At the moment this might be resolved into two givs
 > 
 > 	p[i].x = (reg 100)
 > 	p[i].y = (plus (reg 100) 4)
 > 	p[i+1].x = (reg 101)
 > 	p[i+1].y = (plus (reg 101) 4)

 >    both with increments of 168. 

Unfortunately, after giv combination we get nothing like this at the
moment :-( I would be happy if we achieved this since the adds could
be converted to auto-modifys on the c4x.

 >    If Joern's code were to avoid
 >    recombining the .y addr references, this could be derived as
 > 
 > 	p[i].x = (reg 100)
 > 	p[i].y = (plus (reg 100) 4)
 > 	(set (reg 100) (plus (reg 100) 168))
 > 	p[i+1].x = (reg 100)
 > 	p[i+1].y = (plus (reg 100) 4)

This would be great.  Only 1 auto-modify for the c4x.

 >    Not a great example, but you get the idea. 

It does show up how poorly we combine givs at the moment.  Probably a
better example to concentrate on (to keep the complex arithmetic folk
happy) is:

struct bar { float x, y;};

void foo(struct bar *p, struct bar *d, int n)
{
  int i;
  
  for (i = 0; i < n-1; ++i)
    {
      d[i].x = p[i].x - p[i+1].x;
      d[i].y = p[i].y - p[i+1].y;
    }
}

 > While I'm thinking about it, one final giv-related optimization I'd 
 > like to see is recognizing when the final value of an inner loop's
 > giv is the same as an outer loop's increment:
 > 
 > 	for (i = 0; i < N; ++i)
 > 	  for (j = 0; j < N; ++j)
 > 	    a[i][j] += 1;

Yeah, we make a mess of nested loops like this :-(

 > I do not have enough of loop in my head to babble on about how we
 > might go about getting this.  I have a feeling we don't keep the
 > information live across optimizing the different loops, and that 
 > doing so would be a semi-major restructuring of loop.c. 

The patches to loop.c I submitted a couple of weeks ago would help in
this respect.  We would probably only need to keep the loop iteration
info for one depth of nesting.  It might get a little trickier if
we looked to fuse loops as well.  

Michael.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
  1999-02-01 10:53 ` Joern Rennecke
       [not found]   ` < 199902011851.SAA27551@phal.cygnus.co.uk >
@ 1999-02-28 22:53   ` Joern Rennecke
  1 sibling, 0 replies; 23+ messages in thread
From: Joern Rennecke @ 1999-02-28 22:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: amylaar, egcs

> >> giv at 1225 recombined with giv at 1223 as (reg:DI 627)
> >> giv at 1247 recombined with giv at 1245 as (reg:DI 642)
> >> giv at 1269 recombined with giv at 1267 as (reg:DI 657)
> >> giv at 1291 recombined with giv at 1289 as (reg:DI 672)
> >> giv at 1220 derived from 1219 as (reg:DI 625)
> >> giv at 1245 derived from 1223 as (reg:DI 642)
> >> giv at 1267 derived from 1223 as (reg:DI 657)
> >> giv at 1289 derived from 1223 as (reg:DI 672)

Is the giv at 1219 the one that is superflous?  If so, this patch should help:

*************** find_splittable_givs (bl, unroll_type, l
*** 2998,3003 ****
--- 3016,3030 ----
  				 INSN_UID (v->insn));
  		      continue;
  		    }
+ 		  if (v->same && v->same->derived_from)
+ 		    {
+ 		      /* Handle V as if the giv from which V->SAME has
+ 			 been derived has been combined with V.  */
+ 
+ 		      v->same = v->same->derived_from;
+ 		      v->new_reg = express_from (v->same, v);
+ 		    }
+ 
  		}
  	      
  	      /* Store the value of dest_reg into the insn.  This sharing

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
  1999-02-05 14:50                     ` Michael Hayes
@ 1999-02-28 22:53                       ` Michael Hayes
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Hayes @ 1999-02-28 22:53 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Richard Henderson, Jorn Wolfgang Rennecke, egcs

Michael Hayes writes:
 > I've found I can emulate the behaviour of the Sparc by setting
 > ADDRESS_COST to 1.  I then get essentially the same pattern of GIVs
 > and the GIV combination works well.

With the Sparc we get the following expressions which the giv combiner
does the right thing:

ra = biv * 8
*(ra + rb)
*(ra + rc)
rd = ra + rb
re = ra + rc
*(rd + 4)
*(re + 4)

However, if a reg+reg addressing cost is higher than a reg addressing
cost (as on the c4x), CSE turns the above into:

ra = biv * 8
rd = ra + rb
re = ra + rc
*rd
*re
*(rd + 4)
*(re + 4)

Unfortunately, giv combination does the wrong thing with this.
The upshot is, if the address costs try to discourage reg+reg
addressing modes, there is a greater chance of generating them!

Michael.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: comments on loop changes
  1999-02-01  9:53 ` Joern Rennecke
@ 1999-02-28 22:53   ` Joern Rennecke
  0 siblings, 0 replies; 23+ messages in thread
From: Joern Rennecke @ 1999-02-28 22:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: amylaar, egcs

> So what happens is that all the work done by combine_givs to
> unify all of the addr givs with a single reg giv has been undone.

There is already some logic to avoid recombinations that use givs that
might be dead.

          /* combine_givs_p actually says if we can make this transformation.
             The other tests are here only to avoid keeping a giv alive
             that could otherwise be eliminated.  */
          if (last_giv
              && ((old_same->maybe_dead && ! old_same->combined_with)
                  || ! last_giv->maybe_dead
                  || last_giv->combined_with)
              && (new_combine = combine_givs_p (last_giv, v)))


So the real question is, is maybe_dead incorrectly set or cleared for
some givs?

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~1999-02-28 22:53 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1999-01-31 14:58 comments on loop changes Richard Henderson
1999-01-31 16:58 ` Michael Hayes
1999-01-31 17:16   ` Richard Henderson
1999-01-31 18:00     ` Michael Hayes
1999-01-31 21:41       ` Richard Henderson
1999-02-01  2:29         ` Michael Hayes
     [not found]           ` < 14005.30140.529348.966766@ongaonga.elec.canterbury.ac.nz >
1999-02-01  2:59             ` Richard Henderson
     [not found]               ` < 19990201025925.A17750@cygnus.com >
1999-02-03 14:12                 ` Michael Hayes
     [not found]                   ` < 14008.51524.336200.906670@ongaonga.elec.canterbury.ac.nz >
1999-02-05 14:50                     ` Michael Hayes
1999-02-28 22:53                       ` Michael Hayes
1999-02-12 16:34                     ` Michael Hayes
1999-02-28 22:53                       ` Michael Hayes
1999-02-28 22:53                   ` Michael Hayes
1999-02-28 22:53               ` Richard Henderson
1999-02-28 22:53           ` Michael Hayes
1999-01-31 20:13 ` Jeffrey A Law
1999-01-31 22:05   ` Richard Henderson
1999-02-01  9:53 ` Joern Rennecke
1999-02-28 22:53   ` Joern Rennecke
1999-02-01 10:53 ` Joern Rennecke
     [not found]   ` < 199902011851.SAA27551@phal.cygnus.co.uk >
1999-02-01 12:36     ` Richard Henderson
1999-02-28 22:53       ` Richard Henderson
1999-02-28 22:53   ` Joern Rennecke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).