public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* strength reduction example
@ 1999-02-18  6:51 Zack Weinberg
       [not found] ` < 199902181451.JAA11891@blastula.phys.columbia.edu >
  1999-02-28 22:53 ` Zack Weinberg
  0 siblings, 2 replies; 20+ messages in thread
From: Zack Weinberg @ 1999-02-18  6:51 UTC (permalink / raw)
  To: egcs

I've been looking at a strength reduction bug for the last week or
so.  Until very recently we generated incorrect code when Joern's giv
combiner was active for this case.  Now the code is correct, but
decidedly suboptimal due to increased register pressure.

Here's the source:

typedef unsigned long size_t;
char *
copy(s1, s2, n)
     char *s1;
     const char *s2;
     size_t n;
{
  char c;
  char *s = s1;
  size_t n4 = n >> 2;
  --s;

  do
  {
    c = *s2++;
    *++s1 = c;
    if(c == '\0') break;
    c = *s2++;
    *++s1 = c;
    if(c == '\0') break;
    c = *s2++;
    *++s1 = c;
    if(c == '\0') break;
    c = *s2++;
    *++s1 = c;
    if(c == '\0') break;
  }
  while(--n4);

  return s;
}

Without Joern's stuff, we get this assembly for the loop:

.L5:
        decl %ebx
        jz .L4
.L3:
        movb (%eax),%dl
        incl %eax
        incl %ecx
        movb %dl,(%ecx)
        testb %dl,%dl
        je .L4
        movb (%eax),%dl
        incl %eax
        incl %ecx
        movb %dl,(%ecx)
        testb %dl,%dl
        je .L4
        movb (%eax),%dl
        incl %eax
        incl %ecx
        movb %dl,(%ecx)
        testb %dl,%dl
        je .L4
        movb (%eax),%dl
        incl %eax
        incl %ecx
        movb %dl,(%ecx)
        testb %dl,%dl
        jne .L5
.L4:

This code is fine.  It may in fact be optimal for this case, due to
pipelining; I'm not enough of an x86 expert to say.  If I were to bang
on it by hand, I'd produce something like this:

.L5:
        decl %ebx
        jz .L4
	addl $4,%eax
	addl $4,%ecx
.L3:
        movb (%eax),%dl
        movb %dl,1(%ecx)
        testb %dl,%dl
        je .L4
        movb 1(%eax),%dl
        movb %dl,2(%ecx)
        testb %dl,%dl
        je .L4
        movb 2(%eax),%dl
        movb %dl,3(%ecx)
        testb %dl,%dl
        je .L4
        movb 3(%eax),%dl
        movb %dl,4(%ecx)
        testb %dl,%dl
        jne .L5
.L4:

which might be worse due to pipeline stalls, but has the same register
demands and roughly the same code size.

With Joern's code, we get instead:

.L5:
        decl -8(%ebp)
        jz .L4
.L3:
        movb -3(%edi),%dl
        movb %dl,(%eax)
        testb %dl,%dl
        je .L4
        movb (%ebx),%dl
        leal 1(%eax),%esi
        movb %dl,1(%eax)
        testb %dl,%dl
        je .L4
        movl -12(%ebp),%ebx
        movb (%ebx),%dl
        movb %dl,2(%eax)
        testb %dl,%dl
        je .L4
        movb (%edi),%dl
        addl $4,%ebx
        movl %ebx,-12(%ebp)
        addl $4,%ecx
        leal 3(%ecx),%edi
        leal 1(%ecx),%ebx
        addl $4,%eax
        movb %dl,-2(%esi)
        testb %dl,%dl
        jne .L5
.L4:

Register pressure has forced things onto the stack.  We have many more
address generations.  We have pointer recalculations inside the loop.
We have significantly increased code size.

Looking at the loop dump, the difference seems to be that Joern's code
converts the incremented pointers to their own givs, and then
preserves those givs at the expense of the original bivs.  We should
prefer to hang on to the original bivs and express things in terms of
them, at least in this context.

zw

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
       [not found] ` < 199902181451.JAA11891@blastula.phys.columbia.edu >
@ 1999-02-18 17:27   ` Joern Rennecke
       [not found]     ` < 199902190127.BAA27142@phal.cygnus.co.uk >
  1999-02-28 22:53     ` Joern Rennecke
  0 siblings, 2 replies; 20+ messages in thread
From: Joern Rennecke @ 1999-02-18 17:27 UTC (permalink / raw)
  To: Zack Weinberg, wilson, law; +Cc: egcs

> I've been looking at a strength reduction bug for the last week or
> so.  Until very recently we generated incorrect code when Joern's giv
> combiner was active for this case.  Now the code is correct, but
> decidedly suboptimal due to increased register pressure.

Please try / review this patch:

Fri Feb 19 01:25:17 1999  J"orn Rennecke <amylaar@cygnus.co.uk>

	* rtl.h (subst, subst_insn): Declare.
	* loop.c (strength_reduce): Clean up code to set maybe_multiple.
	In code to convert biv increments to givs, try to create address givs.
	* combine.c (subst, subst_insn): No longer static.
	(combine_instructions): Before returning, clear reg_last_set_value.
	(nonzero_bits): handle case when reg_last_set_value is not set.

===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/rtl.h,v
retrieving revision 1.105.2.1
diff -p -r1.105.2.1 rtl.h
*** rtl.h	1999/01/28 00:26:15	1.105.2.1
--- rtl.h	1999/02/19 01:18:53
*************** extern void add_clobbers		PROTO ((rtx, i
*** 1370,1375 ****
--- 1370,1377 ----
  extern void combine_instructions	PROTO ((rtx, int));
  extern int extended_count		PROTO ((rtx, enum machine_mode, int));
  extern rtx remove_death			PROTO ((int, rtx));
+ extern rtx subst			PROTO((rtx, rtx, rtx, int, int));
+ extern rtx subst_insn;
  #ifdef BUFSIZ
  extern void dump_combine_stats		PROTO ((FILE *));
  extern void dump_combine_total_stats	PROTO ((FILE *));
Index: loop.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/loop.c,v
retrieving revision 1.166.2.31
diff -p -r1.166.2.31 loop.c
*** loop.c	1999/02/18 19:41:23	1.166.2.31
--- loop.c	1999/02/19 01:18:55
*************** strength_reduce (scan_start, end, loop_t
*** 3782,3794 ****
  		  && (! condjump_p (insn)
  		      || (JUMP_LABEL (insn) != 0
  			  && JUMP_LABEL (insn) != scan_start
! 			  && (INSN_UID (JUMP_LABEL (insn)) >= max_uid_for_loop
! 			      || (INSN_UID (p) < max_uid_for_loop
! 				  ? (INSN_LUID (JUMP_LABEL (insn))
! 				     <= INSN_LUID (p))
! 				  : (INSN_UID (insn) >= max_uid_for_loop
! 				     || (INSN_LUID (JUMP_LABEL (insn))
! 					 < INSN_LUID (insn))))))))
  		{
  		  maybe_multiple = 1;
  		  break;
--- 3813,3819 ----
  		  && (! condjump_p (insn)
  		      || (JUMP_LABEL (insn) != 0
  			  && JUMP_LABEL (insn) != scan_start
! 			  && ! loop_insn_first_p (p, JUMP_LABEL (insn)))))
  		{
  		  maybe_multiple = 1;
  		  break;
*************** strength_reduce (scan_start, end, loop_t
*** 4157,4164 ****
  	  for (vp = &bl->biv, next = *vp; v = next, next = v->next_iv;)
  	    {
  	      HOST_WIDE_INT offset;
! 	      rtx set, add_val, old_reg, dest_reg, last_use_insn;
  	      int old_regno, new_regno;
      
  	      if (! v->always_executed
  		  || v->maybe_multiple
--- 4182,4190 ----
  	  for (vp = &bl->biv, next = *vp; v = next, next = v->next_iv;)
  	    {
  	      HOST_WIDE_INT offset;
! 	      rtx set, src, add_val, old_reg, dest_reg, last_use_insn;
  	      int old_regno, new_regno;
+ 	      int all_eliminated;
      
  	      if (! v->always_executed
  		  || v->maybe_multiple
*************** strength_reduce (scan_start, end, loop_t
*** 4198,4203 ****
--- 4224,4268 ----
  		  continue;
  		}
  	      next->add_val = add_val;
+ 
+ 	      /* Remove the increment from the list of biv increments.  */
+ 	      *vp = next;
+ 	      bl->biv_count--;
+ 	      old_regno = REGNO (old_reg);
+ 	      new_regno = REGNO (dest_reg);
+ 	      VARRAY_INT (set_in_loop, old_regno)--;
+ 	      VARRAY_INT (n_times_set, old_regno)--;
+ 
+ 	      src = SET_SRC (set);
+ 	      all_eliminated = 1;
+ 	      for (p = NEXT_INSN (v->insn); p != next->insn; p = NEXT_INSN (p))
+ 		{
+ 		  rtx newpat;
+ 
+ 		  if (GET_RTX_CLASS (GET_CODE (p)) != 'i')
+ 		    continue;
+ 		  subst_insn = p;
+ 		  newpat = subst (PATTERN (p), old_reg, src, 0, 0);
+ 		  if (validate_change (p, &PATTERN (p), newpat, 0))
+ 		    replace_rtx (REG_NOTES (p), old_reg, src);
+ 		  else
+ 		    all_eliminated = 0;
+ 		}
+ 	      if (all_eliminated)
+ 		{
+ 		  if (loop_dump_stream)
+ 		    fprintf (loop_dump_stream,
+ 			     "Increment %d of biv %d eliminated.\n\n",
+ 			 INSN_UID (v->insn), old_regno);
+ 		  PUT_CODE (v->insn, NOTE);
+ 		  NOTE_LINE_NUMBER (v->insn) = NOTE_INSN_DELETED;
+ 		  NOTE_SOURCE_FILE (v->insn) = 0;
+ 		  VARRAY_CHAR (may_not_optimize, new_regno) = 0;
+ 		  VARRAY_INT (set_in_loop, new_regno) = 0;
+ 		  VARRAY_INT (n_times_set, new_regno) = 0;
+ 		  continue;
+ 		}
+ 
  	      v->dest_reg = dest_reg;
  	      v->giv_type = DEST_REG;
  	      v->location = &SET_SRC (set);
*************** strength_reduce (scan_start, end, loop_t
*** 4219,4239 ****
  	      v->replaceable = 1;
  	      v->no_const_addval = 0;
      
- 	      old_regno = REGNO (old_reg);
- 	      new_regno = REGNO (dest_reg);
- 	      VARRAY_INT (set_in_loop, old_regno)--;
  	      VARRAY_INT (set_in_loop, new_regno) = 1;
- 	      VARRAY_INT (n_times_set, old_regno)--;
  	      VARRAY_INT (n_times_set, new_regno) = 1;
- 	      VARRAY_CHAR (may_not_optimize, new_regno) = 0;
      
  	      REG_IV_TYPE (new_regno) = GENERAL_INDUCT;
  	      REG_IV_INFO (new_regno) = v;
      
! 	      /* Remove the increment from the list of biv increments,
! 		 and record it as a giv.  */
! 	      *vp = next;
! 	      bl->biv_count--;
  	      v->next_iv = bl->giv;
  	      bl->giv = v;
  	      bl->giv_count++;
--- 4284,4296 ----
  	      v->replaceable = 1;
  	      v->no_const_addval = 0;
      
  	      VARRAY_INT (set_in_loop, new_regno) = 1;
  	      VARRAY_INT (n_times_set, new_regno) = 1;
      
  	      REG_IV_TYPE (new_regno) = GENERAL_INDUCT;
  	      REG_IV_INFO (new_regno) = v;
      
! 	      /* Record V as a giv.  */
  	      v->next_iv = bl->giv;
  	      bl->giv = v;
  	      bl->giv_count++;
Index: combine.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/combine.c,v
retrieving revision 1.193
diff -p -r1.193 combine.c
*** combine.c	1999/01/20 17:50:18	1.193
--- combine.c	1999/02/19 01:18:59
*************** static int last_call_cuid;
*** 165,171 ****
     looked at, but this may be used to examine the successors of the insn
     to judge whether a simplification is valid.  */
  
! static rtx subst_insn;
  
  /* This is an insn that belongs before subst_insn, but is not currently
     on the insn chain.  */
--- 165,171 ----
     looked at, but this may be used to examine the successors of the insn
     to judge whether a simplification is valid.  */
  
! rtx subst_insn;
  
  /* This is an insn that belongs before subst_insn, but is not currently
     on the insn chain.  */
*************** static int combinable_i3pat	PROTO((rtx, 
*** 398,404 ****
  static rtx try_combine		PROTO((rtx, rtx, rtx));
  static void undo_all		PROTO((void));
  static rtx *find_split_point	PROTO((rtx *, rtx));
- static rtx subst		PROTO((rtx, rtx, rtx, int, int));
  static rtx simplify_rtx		PROTO((rtx, enum machine_mode, int, int));
  static rtx simplify_if_then_else  PROTO((rtx));
  static rtx simplify_set		PROTO((rtx));
--- 398,403 ----
*************** combine_instructions (f, nregs)
*** 674,679 ****
--- 673,679 ----
    total_successes += combine_successes;
  
    nonzero_sign_valid = 0;
+   reg_last_set_value = 0;
  
    /* Make recognizer allow volatile MEMs again.  */
    init_recog ();
*************** find_split_point (loc, insn)
*** 3104,3110 ****
     UNIQUE_COPY is non-zero if each substitution must be unique.  We do this
     by copying if `n_occurrences' is non-zero.  */
  
! static rtx
  subst (x, from, to, in_dest, unique_copy)
       register rtx x, from, to;
       int in_dest;
--- 3147,3153 ----
     UNIQUE_COPY is non-zero if each substitution must be unique.  We do this
     by copying if `n_occurrences' is non-zero.  */
  
! rtx
  subst (x, from, to, in_dest, unique_copy)
       register rtx x, from, to;
       int in_dest;
*************** nonzero_bits (x, mode)
*** 7622,7627 ****
--- 7665,7674 ----
  	  return nonzero &= ~ (sp_alignment - 1);
  	}
  #endif
+ 
+       /* If called from loop, the reg_last_* arrays are not set.  */
+       if (! reg_last_set_value)
+ 	return nonzero;
  
        /* If X is a register whose nonzero bits value is current, use it.
  	 Otherwise, if X is a register whose value we can find, use that

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
       [not found]     ` < 199902190127.BAA27142@phal.cygnus.co.uk >
@ 1999-02-18 18:36       ` Joern Rennecke
       [not found]         ` < 199902190235.CAA32082@phal.cygnus.co.uk >
  1999-02-28 22:53         ` Joern Rennecke
  0 siblings, 2 replies; 20+ messages in thread
From: Joern Rennecke @ 1999-02-18 18:36 UTC (permalink / raw)
  To: zack, wilson, law; +Cc: egcs

It turned out that I also have to change num_sign_bit_copies and
reg_last_value.  Here is an updated patch:

Fri Feb 19 02:34:23 1999  J"orn Rennecke <amylaar@cygnus.co.uk>

	* rtl.h (subst, subst_insn): Declare.
	* loop.c (strength_reduce): Clean up code to set maybe_multiple.
	In code to convert biv increments to givs, try to create address givs.
	* combine.c (subst, subst_insn): No longer static.
	(combine_instructions): Before returning, clear reg_last_set_value.
	(nonzero_bits, num_sign_bit_copies, reg_last_value):
	Handle case when reg_last_set_value is not set.

===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/rtl.h,v
retrieving revision 1.105.2.1
diff -p -r1.105.2.1 rtl.h
*** rtl.h	1999/01/28 00:26:15	1.105.2.1
--- rtl.h	1999/02/19 01:18:53
*************** extern void add_clobbers		PROTO ((rtx, i
*** 1370,1375 ****
--- 1370,1377 ----
  extern void combine_instructions	PROTO ((rtx, int));
  extern int extended_count		PROTO ((rtx, enum machine_mode, int));
  extern rtx remove_death			PROTO ((int, rtx));
+ extern rtx subst			PROTO((rtx, rtx, rtx, int, int));
+ extern rtx subst_insn;
  #ifdef BUFSIZ
  extern void dump_combine_stats		PROTO ((FILE *));
  extern void dump_combine_total_stats	PROTO ((FILE *));
Index: loop.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/loop.c,v
retrieving revision 1.166.2.31
diff -p -r1.166.2.31 loop.c
*** loop.c	1999/02/18 19:41:23	1.166.2.31
--- loop.c	1999/02/19 01:18:55
*************** strength_reduce (scan_start, end, loop_t
*** 3782,3794 ****
  		  && (! condjump_p (insn)
  		      || (JUMP_LABEL (insn) != 0
  			  && JUMP_LABEL (insn) != scan_start
! 			  && (INSN_UID (JUMP_LABEL (insn)) >= max_uid_for_loop
! 			      || (INSN_UID (p) < max_uid_for_loop
! 				  ? (INSN_LUID (JUMP_LABEL (insn))
! 				     <= INSN_LUID (p))
! 				  : (INSN_UID (insn) >= max_uid_for_loop
! 				     || (INSN_LUID (JUMP_LABEL (insn))
! 					 < INSN_LUID (insn))))))))
  		{
  		  maybe_multiple = 1;
  		  break;
--- 3813,3819 ----
  		  && (! condjump_p (insn)
  		      || (JUMP_LABEL (insn) != 0
  			  && JUMP_LABEL (insn) != scan_start
! 			  && ! loop_insn_first_p (p, JUMP_LABEL (insn)))))
  		{
  		  maybe_multiple = 1;
  		  break;
*************** strength_reduce (scan_start, end, loop_t
*** 4157,4164 ****
  	  for (vp = &bl->biv, next = *vp; v = next, next = v->next_iv;)
  	    {
  	      HOST_WIDE_INT offset;
! 	      rtx set, add_val, old_reg, dest_reg, last_use_insn;
  	      int old_regno, new_regno;
      
  	      if (! v->always_executed
  		  || v->maybe_multiple
--- 4182,4190 ----
  	  for (vp = &bl->biv, next = *vp; v = next, next = v->next_iv;)
  	    {
  	      HOST_WIDE_INT offset;
! 	      rtx set, src, add_val, old_reg, dest_reg, last_use_insn;
  	      int old_regno, new_regno;
+ 	      int all_eliminated;
      
  	      if (! v->always_executed
  		  || v->maybe_multiple
*************** strength_reduce (scan_start, end, loop_t
*** 4198,4203 ****
--- 4224,4268 ----
  		  continue;
  		}
  	      next->add_val = add_val;
+ 
+ 	      /* Remove the increment from the list of biv increments.  */
+ 	      *vp = next;
+ 	      bl->biv_count--;
+ 	      old_regno = REGNO (old_reg);
+ 	      new_regno = REGNO (dest_reg);
+ 	      VARRAY_INT (set_in_loop, old_regno)--;
+ 	      VARRAY_INT (n_times_set, old_regno)--;
+ 
+ 	      src = SET_SRC (set);
+ 	      all_eliminated = 1;
+ 	      for (p = NEXT_INSN (v->insn); p != next->insn; p = NEXT_INSN (p))
+ 		{
+ 		  rtx newpat;
+ 
+ 		  if (GET_RTX_CLASS (GET_CODE (p)) != 'i')
+ 		    continue;
+ 		  subst_insn = p;
+ 		  newpat = subst (PATTERN (p), old_reg, src, 0, 0);
+ 		  if (validate_change (p, &PATTERN (p), newpat, 0))
+ 		    replace_rtx (REG_NOTES (p), old_reg, src);
+ 		  else
+ 		    all_eliminated = 0;
+ 		}
+ 	      if (all_eliminated)
+ 		{
+ 		  if (loop_dump_stream)
+ 		    fprintf (loop_dump_stream,
+ 			     "Increment %d of biv %d eliminated.\n\n",
+ 			 INSN_UID (v->insn), old_regno);
+ 		  PUT_CODE (v->insn, NOTE);
+ 		  NOTE_LINE_NUMBER (v->insn) = NOTE_INSN_DELETED;
+ 		  NOTE_SOURCE_FILE (v->insn) = 0;
+ 		  VARRAY_CHAR (may_not_optimize, new_regno) = 0;
+ 		  VARRAY_INT (set_in_loop, new_regno) = 0;
+ 		  VARRAY_INT (n_times_set, new_regno) = 0;
+ 		  continue;
+ 		}
+ 
  	      v->dest_reg = dest_reg;
  	      v->giv_type = DEST_REG;
  	      v->location = &SET_SRC (set);
*************** strength_reduce (scan_start, end, loop_t
*** 4219,4239 ****
  	      v->replaceable = 1;
  	      v->no_const_addval = 0;
      
- 	      old_regno = REGNO (old_reg);
- 	      new_regno = REGNO (dest_reg);
- 	      VARRAY_INT (set_in_loop, old_regno)--;
  	      VARRAY_INT (set_in_loop, new_regno) = 1;
- 	      VARRAY_INT (n_times_set, old_regno)--;
  	      VARRAY_INT (n_times_set, new_regno) = 1;
- 	      VARRAY_CHAR (may_not_optimize, new_regno) = 0;
      
  	      REG_IV_TYPE (new_regno) = GENERAL_INDUCT;
  	      REG_IV_INFO (new_regno) = v;
      
! 	      /* Remove the increment from the list of biv increments,
! 		 and record it as a giv.  */
! 	      *vp = next;
! 	      bl->biv_count--;
  	      v->next_iv = bl->giv;
  	      bl->giv = v;
  	      bl->giv_count++;
--- 4284,4296 ----
  	      v->replaceable = 1;
  	      v->no_const_addval = 0;
      
  	      VARRAY_INT (set_in_loop, new_regno) = 1;
  	      VARRAY_INT (n_times_set, new_regno) = 1;
      
  	      REG_IV_TYPE (new_regno) = GENERAL_INDUCT;
  	      REG_IV_INFO (new_regno) = v;
      
! 	      /* Record V as a giv.  */
  	      v->next_iv = bl->giv;
  	      bl->giv = v;
  	      bl->giv_count++;
Index: combine.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/combine.c,v
retrieving revision 1.193
diff -p -r1.193 combine.c
*** combine.c	1999/01/20 17:50:18	1.193
--- combine.c	1999/02/19 02:32:53
*************** static int last_call_cuid;
*** 165,171 ****
     looked at, but this may be used to examine the successors of the insn
     to judge whether a simplification is valid.  */
  
! static rtx subst_insn;
  
  /* This is an insn that belongs before subst_insn, but is not currently
     on the insn chain.  */
--- 165,171 ----
     looked at, but this may be used to examine the successors of the insn
     to judge whether a simplification is valid.  */
  
! rtx subst_insn;
  
  /* This is an insn that belongs before subst_insn, but is not currently
     on the insn chain.  */
*************** static int combinable_i3pat	PROTO((rtx, 
*** 398,404 ****
  static rtx try_combine		PROTO((rtx, rtx, rtx));
  static void undo_all		PROTO((void));
  static rtx *find_split_point	PROTO((rtx *, rtx));
- static rtx subst		PROTO((rtx, rtx, rtx, int, int));
  static rtx simplify_rtx		PROTO((rtx, enum machine_mode, int, int));
  static rtx simplify_if_then_else  PROTO((rtx));
  static rtx simplify_set		PROTO((rtx));
--- 398,403 ----
*************** combine_instructions (f, nregs)
*** 674,679 ****
--- 673,679 ----
    total_successes += combine_successes;
  
    nonzero_sign_valid = 0;
+   reg_last_set_value = 0;
  
    /* Make recognizer allow volatile MEMs again.  */
    init_recog ();
*************** find_split_point (loc, insn)
*** 3104,3110 ****
     UNIQUE_COPY is non-zero if each substitution must be unique.  We do this
     by copying if `n_occurrences' is non-zero.  */
  
! static rtx
  subst (x, from, to, in_dest, unique_copy)
       register rtx x, from, to;
       int in_dest;
--- 3147,3153 ----
     UNIQUE_COPY is non-zero if each substitution must be unique.  We do this
     by copying if `n_occurrences' is non-zero.  */
  
! rtx
  subst (x, from, to, in_dest, unique_copy)
       register rtx x, from, to;
       int in_dest;
*************** nonzero_bits (x, mode)
*** 7623,7628 ****
--- 7666,7675 ----
  	}
  #endif
  
+       /* If called from loop, the reg_last_* arrays are not set.  */
+       if (! reg_last_set_value)
+ 	return nonzero;
+ 
        /* If X is a register whose nonzero bits value is current, use it.
  	 Otherwise, if X is a register whose value we can find, use that
  	 value.  Otherwise, use the previously-computed global nonzero bits
*************** num_sign_bit_copies (x, mode)
*** 8018,8023 ****
--- 8065,8074 ----
  	return GET_MODE_BITSIZE (Pmode) - GET_MODE_BITSIZE (ptr_mode) + 1;
  #endif
  
+       /* If called from loop, the reg_last_* arrays are not set.  */
+       if (! reg_last_set_value)
+ 	return 1;
+ 
        if (reg_last_set_value[REGNO (x)] != 0
  	  && reg_last_set_mode[REGNO (x)] == mode
  	  && (REG_N_SETS (REGNO (x)) == 1
*************** get_last_value (x)
*** 10924,10930 ****
        && (value = get_last_value (SUBREG_REG (x))) != 0)
      return gen_lowpart_for_combine (GET_MODE (x), value);
  
!   if (GET_CODE (x) != REG)
      return 0;
  
    regno = REGNO (x);
--- 10975,10982 ----
        && (value = get_last_value (SUBREG_REG (x))) != 0)
      return gen_lowpart_for_combine (GET_MODE (x), value);
  
!   /* If called from loop, the reg_last_* arrays are not set.  */
!   if (GET_CODE (x) != REG || ! reg_last_set_value)
      return 0;
  
    regno = REGNO (x);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
       [not found]         ` < 199902190235.CAA32082@phal.cygnus.co.uk >
@ 1999-02-18 20:09           ` Zack Weinberg
       [not found]             ` < 199902190406.XAA05899@blastula.phys.columbia.edu >
  1999-02-28 22:53             ` Zack Weinberg
  0 siblings, 2 replies; 20+ messages in thread
From: Zack Weinberg @ 1999-02-18 20:09 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: wilson, law, egcs

On Fri, 19 Feb 1999 02:35:54 +0000 (GMT), Joern Rennecke wrote:
>It turned out that I also have to change num_sign_bit_copies and
>reg_last_value.  Here is an updated patch:

Looks good on my test case: loop.c with combination enabled and this
patch applied produces almost exactly the code I said I'd have written
by hand.  I'm going to run a full bootstrap now and make sure nothing
broke.

zw

;; Function strncpy

Loop from 22 to 109: 27 real insns.
Continue at insn 94.
Insn 29: possible biv, reg 23, const =1
Insn 32: possible biv, reg 22, const =1
Insn 46: possible biv, reg 23, const =1
Insn 49: possible biv, reg 22, const =1
Insn 63: possible biv, reg 23, const =1
Insn 66: possible biv, reg 22, const =1
Insn 80: possible biv, reg 23, const =1
Insn 83: possible biv, reg 22, const =1
Insn 98: possible biv, reg 27, const =-1
Reg 27: biv verified
Reg 22: biv verified
Reg 23: biv verified
Biv 27 initialized at insn 17: initial value is complex
Biv 22 initialized at insn 4: initial value is complex
Biv 23 initialized at insn 6: initial value is complex
Increment 32 of biv 22 eliminated.
Increment 49 of biv 22 eliminated.
Increment 66 of biv 22 eliminated.
Increment 29 of biv 23 eliminated.
Increment 46 of biv 23 eliminated.
Increment 63 of biv 23 eliminated.

Insn 28: dest address src reg 23 benefit 0 lifetime 1 replaceable ncav mult 1 add 0
Insn 34: dest address src reg 22 benefit -1 lifetime 1 replaceable mult 1 add 1
Insn 45: dest address src reg 23 benefit -1 lifetime 1 replaceable mult 1 add 1
Insn 51: dest address src reg 22 benefit -1 lifetime 1 replaceable mult 1 add 2
Insn 62: dest address src reg 23 benefit -1 lifetime 1 replaceable mult 1 add 2
Insn 68: dest address src reg 22 benefit -1 lifetime 1 replaceable mult 1 add 3
Insn 79: dest address src reg 23 benefit -1 lifetime 1 replaceable mult 1 add 3
Insn 85: dest address src reg 22 benefit 0 lifetime 1 replaceable ncav mult 1 add 0
Loop unrolling: Not basic or general induction var.
Cannot eliminate biv 27: biv used in insn 99.
Sorted combine statistics:

biv 22 can be eliminated.
Sorted combine statistics:
 {85, 7} {68, 6} {51, 6} {34, 6}
giv at 68 combined with giv at 85
giv at 51 combined with giv at 85
giv at 34 combined with giv at 85
Sorted combine statistics:

giv of insn 85 not worth while, -1008 vs 27.
giv at 51 recombined with giv at 85 as (plus:SI (reg:SI 29)
    (const_int 2))
giv at 68 recombined with giv at 85 as (plus:SI (reg:SI 29)
    (const_int 3))
biv 23 can be eliminated.
Sorted combine statistics:
 {28, 7} {79, 6} {62, 6} {45, 6}
giv at 79 combined with giv at 28
giv at 62 combined with giv at 28
giv at 45 combined with giv at 28
Sorted combine statistics:

giv at 62 recombined with giv at 28 as (plus:SI (reg:SI 29)
    (const_int 2))
giv at 79 recombined with giv at 28 as (plus:SI (reg:SI 29)
    (const_int 3))


Loop from 22 to 109: 21 real insns.
Continue at insn 94.
Insn 80: possible biv, reg 23, const =4
Insn 83: possible biv, reg 22, const =4
Insn 98: possible biv, reg 27, const =-1
Reg 27: biv verified
Reg 22: biv verified
Reg 23: biv verified
Biv 27 initialized at insn 17: initial value is complex
Biv 22 initialized at insn 4: initial value is complex
Biv 23 initialized at insn 6: initial value is complex
Insn 28: dest address src reg 23 benefit 0 lifetime 1 replaceable ncav mult 1 add 0
Insn 34: dest address src reg 22 benefit -1 lifetime 1 replaceable mult 1 add 1
Insn 45: dest address src reg 23 benefit -1 lifetime 1 replaceable mult 1 add 1
Insn 51: dest address src reg 22 benefit -1 lifetime 1 replaceable mult 1 add 2
Insn 62: dest address src reg 23 benefit -1 lifetime 1 replaceable mult 1 add 2
Insn 68: dest address src reg 22 benefit -1 lifetime 1 replaceable mult 1 add 3
Insn 79: dest address src reg 23 benefit -1 lifetime 1 replaceable mult 1 add 3
Insn 85: dest address src reg 22 benefit 0 lifetime 1 replaceable ncav mult 1 add 0
Loop unrolling: Not basic or general induction var.
Cannot eliminate biv 27: biv used in insn 99.
Sorted combine statistics:

biv 22 can be eliminated.
Sorted combine statistics:
 {85, 7} {68, 6} {51, 6} {34, 6}
giv at 68 combined with giv at 85
giv at 51 combined with giv at 85
giv at 34 combined with giv at 85
Sorted combine statistics:

giv of insn 85 not worth while, -1008 vs 21.
giv at 51 recombined with giv at 85 as (plus:SI (reg:SI 36)
    (const_int 2))
giv at 68 recombined with giv at 85 as (plus:SI (reg:SI 36)
    (const_int 3))
biv 23 can be eliminated.
Sorted combine statistics:
 {28, 7} {79, 6} {62, 6} {45, 6}
giv at 79 combined with giv at 28
giv at 62 combined with giv at 28
giv at 45 combined with giv at 28
Sorted combine statistics:

giv of insn 28 not worth while, -1008 vs 21.
giv at 62 recombined with giv at 28 as (plus:SI (reg:SI 36)
    (const_int 2))
giv at 79 recombined with giv at 28 as (plus:SI (reg:SI 36)
    (const_int 3))

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
       [not found]             ` < 199902190406.XAA05899@blastula.phys.columbia.edu >
@ 1999-02-18 20:18               ` Joern Rennecke
       [not found]                 ` < 199902190417.EAA25841@phal.cygnus.co.uk >
  1999-02-28 22:53                 ` Joern Rennecke
  0 siblings, 2 replies; 20+ messages in thread
From: Joern Rennecke @ 1999-02-18 20:18 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: wilson, law, egcs

> by hand.  I'm going to run a full bootstrap now and make sure nothing
> broke.

Don't bother, it's broken :-( .  There are problems with shared RTL,
and with the handling of undobuf.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
       [not found]                 ` < 199902190417.EAA25841@phal.cygnus.co.uk >
@ 1999-02-18 21:15                   ` Joern Rennecke
       [not found]                     ` < 199902190514.FAA10029@phal.cygnus.co.uk >
  1999-02-28 22:53                     ` Joern Rennecke
  0 siblings, 2 replies; 20+ messages in thread
From: Joern Rennecke @ 1999-02-18 21:15 UTC (permalink / raw)
  To: zack, wilson, law; +Cc: egcs

> Don't bother, it's broken :-( .  There are problems with shared RTL,
> and with the handling of undobuf.

I hope that's fixed now:

Fri Feb 19 02:34:23 1999  J"orn Rennecke <amylaar@cygnus.co.uk>

	* rtl.h (validate_subst): Declare.
	* loop.c (strength_reduce): Clean up code to set maybe_multiple.
	In code to convert biv increments to givs, try to create address givs.
	* combine.c (valiadte_subst): New function.
	(combine_instructions): Before returning, clear reg_last_set_value.
	(nonzero_bits, num_sign_bit_copies, reg_last_value):
	Handle case when reg_last_set_value is not set.

Index: rtl.h
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/rtl.h,v
retrieving revision 1.105.2.1
diff -p -r1.105.2.1 rtl.h
*** rtl.h	1999/01/28 00:26:15	1.105.2.1
--- rtl.h	1999/02/19 05:09:15
*************** extern void add_clobbers		PROTO ((rtx, i
*** 1370,1375 ****
--- 1370,1376 ----
  extern void combine_instructions	PROTO ((rtx, int));
  extern int extended_count		PROTO ((rtx, enum machine_mode, int));
  extern rtx remove_death			PROTO ((int, rtx));
+ extern int validate_subst		PROTO((rtx, rtx, rtx));
  #ifdef BUFSIZ
  extern void dump_combine_stats		PROTO ((FILE *));
  extern void dump_combine_total_stats	PROTO ((FILE *));
Index: loop.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/loop.c,v
retrieving revision 1.166.2.31
diff -p -r1.166.2.31 loop.c
*** loop.c	1999/02/18 19:41:23	1.166.2.31
--- loop.c	1999/02/19 05:09:18
*************** strength_reduce (scan_start, end, loop_t
*** 3782,3794 ****
  		  && (! condjump_p (insn)
  		      || (JUMP_LABEL (insn) != 0
  			  && JUMP_LABEL (insn) != scan_start
! 			  && (INSN_UID (JUMP_LABEL (insn)) >= max_uid_for_loop
! 			      || (INSN_UID (p) < max_uid_for_loop
! 				  ? (INSN_LUID (JUMP_LABEL (insn))
! 				     <= INSN_LUID (p))
! 				  : (INSN_UID (insn) >= max_uid_for_loop
! 				     || (INSN_LUID (JUMP_LABEL (insn))
! 					 < INSN_LUID (insn))))))))
  		{
  		  maybe_multiple = 1;
  		  break;
--- 3813,3819 ----
  		  && (! condjump_p (insn)
  		      || (JUMP_LABEL (insn) != 0
  			  && JUMP_LABEL (insn) != scan_start
! 			  && ! loop_insn_first_p (p, JUMP_LABEL (insn)))))
  		{
  		  maybe_multiple = 1;
  		  break;
*************** strength_reduce (scan_start, end, loop_t
*** 4157,4164 ****
  	  for (vp = &bl->biv, next = *vp; v = next, next = v->next_iv;)
  	    {
  	      HOST_WIDE_INT offset;
! 	      rtx set, add_val, old_reg, dest_reg, last_use_insn;
  	      int old_regno, new_regno;
      
  	      if (! v->always_executed
  		  || v->maybe_multiple
--- 4196,4204 ----
  	  for (vp = &bl->biv, next = *vp; v = next, next = v->next_iv;)
  	    {
  	      HOST_WIDE_INT offset;
! 	      rtx set, src, add_val, old_reg, dest_reg, last_use_insn;
  	      int old_regno, new_regno;
+ 	      int all_eliminated;
      
  	      if (! v->always_executed
  		  || v->maybe_multiple
*************** strength_reduce (scan_start, end, loop_t
*** 4198,4203 ****
--- 4238,4278 ----
  		  continue;
  		}
  	      next->add_val = add_val;
+ 
+ 	      /* Remove the increment from the list of biv increments.  */
+ 	      *vp = next;
+ 	      bl->biv_count--;
+ 	      old_regno = REGNO (old_reg);
+ 	      new_regno = REGNO (dest_reg);
+ 	      VARRAY_INT (set_in_loop, old_regno)--;
+ 	      VARRAY_INT (n_times_set, old_regno)--;
+ 
+ 	      src = SET_SRC (set);
+ 	      all_eliminated = 1;
+ 	      for (p = NEXT_INSN (v->insn); p != next->insn; p = NEXT_INSN (p))
+ 		{
+ 		  rtx newpat;
+ 
+ 		  if (GET_RTX_CLASS (GET_CODE (p)) != 'i')
+ 		    continue;
+ 		  if (! validate_subst (p, old_reg, src))
+ 		    all_eliminated = 0;
+ 		}
+ 	      if (all_eliminated)
+ 		{
+ 		  if (loop_dump_stream)
+ 		    fprintf (loop_dump_stream,
+ 			     "Increment %d of biv %d eliminated.\n\n",
+ 			 INSN_UID (v->insn), old_regno);
+ 		  PUT_CODE (v->insn, NOTE);
+ 		  NOTE_LINE_NUMBER (v->insn) = NOTE_INSN_DELETED;
+ 		  NOTE_SOURCE_FILE (v->insn) = 0;
+ 		  VARRAY_CHAR (may_not_optimize, new_regno) = 0;
+ 		  VARRAY_INT (set_in_loop, new_regno) = 0;
+ 		  VARRAY_INT (n_times_set, new_regno) = 0;
+ 		  continue;
+ 		}
+ 
  	      v->dest_reg = dest_reg;
  	      v->giv_type = DEST_REG;
  	      v->location = &SET_SRC (set);
*************** strength_reduce (scan_start, end, loop_t
*** 4219,4239 ****
  	      v->replaceable = 1;
  	      v->no_const_addval = 0;
      
- 	      old_regno = REGNO (old_reg);
- 	      new_regno = REGNO (dest_reg);
- 	      VARRAY_INT (set_in_loop, old_regno)--;
  	      VARRAY_INT (set_in_loop, new_regno) = 1;
- 	      VARRAY_INT (n_times_set, old_regno)--;
  	      VARRAY_INT (n_times_set, new_regno) = 1;
- 	      VARRAY_CHAR (may_not_optimize, new_regno) = 0;
      
  	      REG_IV_TYPE (new_regno) = GENERAL_INDUCT;
  	      REG_IV_INFO (new_regno) = v;
      
! 	      /* Remove the increment from the list of biv increments,
! 		 and record it as a giv.  */
! 	      *vp = next;
! 	      bl->biv_count--;
  	      v->next_iv = bl->giv;
  	      bl->giv = v;
  	      bl->giv_count++;
--- 4294,4306 ----
  	      v->replaceable = 1;
  	      v->no_const_addval = 0;
      
  	      VARRAY_INT (set_in_loop, new_regno) = 1;
  	      VARRAY_INT (n_times_set, new_regno) = 1;
      
  	      REG_IV_TYPE (new_regno) = GENERAL_INDUCT;
  	      REG_IV_INFO (new_regno) = v;
      
! 	      /* Record V as a giv.  */
  	      v->next_iv = bl->giv;
  	      bl->giv = v;
  	      bl->giv_count++;
Index: combine.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/combine.c,v
retrieving revision 1.193
diff -p -r1.193 combine.c
*** combine.c	1999/01/20 17:50:18	1.193
--- combine.c	1999/02/19 05:09:22
*************** combine_instructions (f, nregs)
*** 674,679 ****
--- 674,680 ----
    total_successes += combine_successes;
  
    nonzero_sign_valid = 0;
+   reg_last_set_value = 0;
  
    /* Make recognizer allow volatile MEMs again.  */
    init_recog ();
*************** nonzero_bits (x, mode)
*** 7623,7628 ****
--- 7667,7676 ----
  	}
  #endif
  
+       /* If called from loop, the reg_last_* arrays are not set.  */
+       if (! reg_last_set_value)
+ 	return nonzero;
+ 
        /* If X is a register whose nonzero bits value is current, use it.
  	 Otherwise, if X is a register whose value we can find, use that
  	 value.  Otherwise, use the previously-computed global nonzero bits
*************** num_sign_bit_copies (x, mode)
*** 8018,8023 ****
--- 8066,8075 ----
  	return GET_MODE_BITSIZE (Pmode) - GET_MODE_BITSIZE (ptr_mode) + 1;
  #endif
  
+       /* If called from loop, the reg_last_* arrays are not set.  */
+       if (! reg_last_set_value)
+ 	return 1;
+ 
        if (reg_last_set_value[REGNO (x)] != 0
  	  && reg_last_set_mode[REGNO (x)] == mode
  	  && (REG_N_SETS (REGNO (x)) == 1
*************** get_last_value (x)
*** 10924,10930 ****
        && (value = get_last_value (SUBREG_REG (x))) != 0)
      return gen_lowpart_for_combine (GET_MODE (x), value);
  
!   if (GET_CODE (x) != REG)
      return 0;
  
    regno = REGNO (x);
--- 10976,10983 ----
        && (value = get_last_value (SUBREG_REG (x))) != 0)
      return gen_lowpart_for_combine (GET_MODE (x), value);
  
!   /* If called from loop, the reg_last_* arrays are not set.  */
!   if (GET_CODE (x) != REG || ! reg_last_set_value)
      return 0;
  
    regno = REGNO (x);
*************** insn_cuid (insn)
*** 12086,12091 ****
--- 12139,12180 ----
      abort ();
  
    return INSN_CUID (insn);
+ }
+ \f
+ /* Replace FROM with to throughout in INSN, and make simplifications.
+    Return nonzero for sucess.  */
+ int
+ validate_subst (insn, from, to)
+      rtx insn, from, to;
+ {
+   rtx pat, new_pat;
+ 
+   undobuf.previous_undos = undobuf.undos;
+ 
+   /* Save the current high-water-mark so we can free storage if we didn't
+      accept this combination.  */
+   undobuf.storage = (char *) oballoc (0);
+ 
+   subst_insn = insn;
+ 
+   pat = PATTERN (insn);
+   new_pat = subst (pat, from, to, 0, 1);
+   /* Change INSN to a nop so that validate_change is forced to re-recognize.  */
+   PATTERN (insn) = const0_rtx;
+   if (validate_change (insn, &PATTERN (insn), new_pat, 0))
+     {
+       rtx note = find_reg_note (insn, REG_EQUAL, NULL_RTX);
+ 
+       if (note)
+ 	XEXP (note, 0) = subst (XEXP (note, 0), from, to, 0, 1);
+       return 1;
+     }
+   else
+     {
+       PATTERN (insn) = pat;
+       undo_all ();
+       return 0;
+     }
  }
  \f
  void

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
       [not found]                     ` < 199902190514.FAA10029@phal.cygnus.co.uk >
@ 1999-02-19 14:19                       ` Jim Wilson
       [not found]                         ` < 199902192218.OAA21328@rtl.cygnus.com >
  1999-02-28 22:53                         ` Jim Wilson
  0 siblings, 2 replies; 20+ messages in thread
From: Jim Wilson @ 1999-02-19 14:19 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: zack, law, egcs

What does this patch do?  There are no comments explaining what it does,
and you didn't offer any explanation.  All I know is that it gives better
code for Zack's testcase, but I am not going to approve it unless I can
understand why.

Actually, I see one clue in the changelog entry, it mentions creating
address givs, but nowhere do I see DEST_ADDR used in your patch.

Why do you want to call subst?  subst is a combiner internal routine.
I'm skeptical that making it callable from elsewhere is a good idea.
That could just result in a lot of bugs and maintenance problems.
You will have to have a good reason for wanting to do this before I will
seriously consider it.

The loop_insn_first_p change looks like a carry-over from another pending
patch.  I'll take care of that patch.

Jim

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
       [not found]                         ` < 199902192218.OAA21328@rtl.cygnus.com >
@ 1999-02-19 14:27                           ` Joern Rennecke
  1999-02-28 22:53                             ` Joern Rennecke
  0 siblings, 1 reply; 20+ messages in thread
From: Joern Rennecke @ 1999-02-19 14:27 UTC (permalink / raw)
  To: Jim Wilson; +Cc: amylaar, zack, law, egcs

> What does this patch do?  There are no comments explaining what it does,
> and you didn't offer any explanation.  All I know is that it gives better
> code for Zack's testcase, but I am not going to approve it unless I can
> understand why.

When I implemented the biv increment -> giv conversion, I made DEST_REG givs.
the giv combination is not really good at handling cases when there are
DEST_REG givs that are only used for a DEST_ADDR giv that could be changed
to make the DEST_REG giv dead.

> Actually, I see one clue in the changelog entry, it mentions creating
> address givs, but nowhere do I see DEST_ADDR used in your patch.

The address givs are not created explicitly, but by combining the register
set into its use, I create DEST_ADDR givs if the use happens to be in
a MEM.  If we can combine the set into some other uses, so much the better.

> Why do you want to call subst?  subst is a combiner internal routine.
> I'm skeptical that making it callable from elsewhere is a good idea.
> That could just result in a lot of bugs and maintenance problems.
> You will have to have a good reason for wanting to do this before I will
> seriously consider it.

I have found out in the meantime that calling subst from outside is
indeed not a good idea.  Look at my follow-up-patch, which defines
a function validate_subst in combine to handle the combiner internals.

In effect, I want to combine the setting of an register with its use -
possibly with multiple uses - outside of combine.  Re-using code from
combine seemed the natural thing to do.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
  1999-02-18 21:15                   ` Joern Rennecke
       [not found]                     ` < 199902190514.FAA10029@phal.cygnus.co.uk >
@ 1999-02-28 22:53                     ` Joern Rennecke
  1 sibling, 0 replies; 20+ messages in thread
From: Joern Rennecke @ 1999-02-28 22:53 UTC (permalink / raw)
  To: zack, wilson, law; +Cc: egcs

> Don't bother, it's broken :-( .  There are problems with shared RTL,
> and with the handling of undobuf.

I hope that's fixed now:

Fri Feb 19 02:34:23 1999  J"orn Rennecke <amylaar@cygnus.co.uk>

	* rtl.h (validate_subst): Declare.
	* loop.c (strength_reduce): Clean up code to set maybe_multiple.
	In code to convert biv increments to givs, try to create address givs.
	* combine.c (valiadte_subst): New function.
	(combine_instructions): Before returning, clear reg_last_set_value.
	(nonzero_bits, num_sign_bit_copies, reg_last_value):
	Handle case when reg_last_set_value is not set.

Index: rtl.h
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/rtl.h,v
retrieving revision 1.105.2.1
diff -p -r1.105.2.1 rtl.h
*** rtl.h	1999/01/28 00:26:15	1.105.2.1
--- rtl.h	1999/02/19 05:09:15
*************** extern void add_clobbers		PROTO ((rtx, i
*** 1370,1375 ****
--- 1370,1376 ----
  extern void combine_instructions	PROTO ((rtx, int));
  extern int extended_count		PROTO ((rtx, enum machine_mode, int));
  extern rtx remove_death			PROTO ((int, rtx));
+ extern int validate_subst		PROTO((rtx, rtx, rtx));
  #ifdef BUFSIZ
  extern void dump_combine_stats		PROTO ((FILE *));
  extern void dump_combine_total_stats	PROTO ((FILE *));
Index: loop.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/loop.c,v
retrieving revision 1.166.2.31
diff -p -r1.166.2.31 loop.c
*** loop.c	1999/02/18 19:41:23	1.166.2.31
--- loop.c	1999/02/19 05:09:18
*************** strength_reduce (scan_start, end, loop_t
*** 3782,3794 ****
  		  && (! condjump_p (insn)
  		      || (JUMP_LABEL (insn) != 0
  			  && JUMP_LABEL (insn) != scan_start
! 			  && (INSN_UID (JUMP_LABEL (insn)) >= max_uid_for_loop
! 			      || (INSN_UID (p) < max_uid_for_loop
! 				  ? (INSN_LUID (JUMP_LABEL (insn))
! 				     <= INSN_LUID (p))
! 				  : (INSN_UID (insn) >= max_uid_for_loop
! 				     || (INSN_LUID (JUMP_LABEL (insn))
! 					 < INSN_LUID (insn))))))))
  		{
  		  maybe_multiple = 1;
  		  break;
--- 3813,3819 ----
  		  && (! condjump_p (insn)
  		      || (JUMP_LABEL (insn) != 0
  			  && JUMP_LABEL (insn) != scan_start
! 			  && ! loop_insn_first_p (p, JUMP_LABEL (insn)))))
  		{
  		  maybe_multiple = 1;
  		  break;
*************** strength_reduce (scan_start, end, loop_t
*** 4157,4164 ****
  	  for (vp = &bl->biv, next = *vp; v = next, next = v->next_iv;)
  	    {
  	      HOST_WIDE_INT offset;
! 	      rtx set, add_val, old_reg, dest_reg, last_use_insn;
  	      int old_regno, new_regno;
      
  	      if (! v->always_executed
  		  || v->maybe_multiple
--- 4196,4204 ----
  	  for (vp = &bl->biv, next = *vp; v = next, next = v->next_iv;)
  	    {
  	      HOST_WIDE_INT offset;
! 	      rtx set, src, add_val, old_reg, dest_reg, last_use_insn;
  	      int old_regno, new_regno;
+ 	      int all_eliminated;
      
  	      if (! v->always_executed
  		  || v->maybe_multiple
*************** strength_reduce (scan_start, end, loop_t
*** 4198,4203 ****
--- 4238,4278 ----
  		  continue;
  		}
  	      next->add_val = add_val;
+ 
+ 	      /* Remove the increment from the list of biv increments.  */
+ 	      *vp = next;
+ 	      bl->biv_count--;
+ 	      old_regno = REGNO (old_reg);
+ 	      new_regno = REGNO (dest_reg);
+ 	      VARRAY_INT (set_in_loop, old_regno)--;
+ 	      VARRAY_INT (n_times_set, old_regno)--;
+ 
+ 	      src = SET_SRC (set);
+ 	      all_eliminated = 1;
+ 	      for (p = NEXT_INSN (v->insn); p != next->insn; p = NEXT_INSN (p))
+ 		{
+ 		  rtx newpat;
+ 
+ 		  if (GET_RTX_CLASS (GET_CODE (p)) != 'i')
+ 		    continue;
+ 		  if (! validate_subst (p, old_reg, src))
+ 		    all_eliminated = 0;
+ 		}
+ 	      if (all_eliminated)
+ 		{
+ 		  if (loop_dump_stream)
+ 		    fprintf (loop_dump_stream,
+ 			     "Increment %d of biv %d eliminated.\n\n",
+ 			 INSN_UID (v->insn), old_regno);
+ 		  PUT_CODE (v->insn, NOTE);
+ 		  NOTE_LINE_NUMBER (v->insn) = NOTE_INSN_DELETED;
+ 		  NOTE_SOURCE_FILE (v->insn) = 0;
+ 		  VARRAY_CHAR (may_not_optimize, new_regno) = 0;
+ 		  VARRAY_INT (set_in_loop, new_regno) = 0;
+ 		  VARRAY_INT (n_times_set, new_regno) = 0;
+ 		  continue;
+ 		}
+ 
  	      v->dest_reg = dest_reg;
  	      v->giv_type = DEST_REG;
  	      v->location = &SET_SRC (set);
*************** strength_reduce (scan_start, end, loop_t
*** 4219,4239 ****
  	      v->replaceable = 1;
  	      v->no_const_addval = 0;
      
- 	      old_regno = REGNO (old_reg);
- 	      new_regno = REGNO (dest_reg);
- 	      VARRAY_INT (set_in_loop, old_regno)--;
  	      VARRAY_INT (set_in_loop, new_regno) = 1;
- 	      VARRAY_INT (n_times_set, old_regno)--;
  	      VARRAY_INT (n_times_set, new_regno) = 1;
- 	      VARRAY_CHAR (may_not_optimize, new_regno) = 0;
      
  	      REG_IV_TYPE (new_regno) = GENERAL_INDUCT;
  	      REG_IV_INFO (new_regno) = v;
      
! 	      /* Remove the increment from the list of biv increments,
! 		 and record it as a giv.  */
! 	      *vp = next;
! 	      bl->biv_count--;
  	      v->next_iv = bl->giv;
  	      bl->giv = v;
  	      bl->giv_count++;
--- 4294,4306 ----
  	      v->replaceable = 1;
  	      v->no_const_addval = 0;
      
  	      VARRAY_INT (set_in_loop, new_regno) = 1;
  	      VARRAY_INT (n_times_set, new_regno) = 1;
      
  	      REG_IV_TYPE (new_regno) = GENERAL_INDUCT;
  	      REG_IV_INFO (new_regno) = v;
      
! 	      /* Record V as a giv.  */
  	      v->next_iv = bl->giv;
  	      bl->giv = v;
  	      bl->giv_count++;
Index: combine.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/combine.c,v
retrieving revision 1.193
diff -p -r1.193 combine.c
*** combine.c	1999/01/20 17:50:18	1.193
--- combine.c	1999/02/19 05:09:22
*************** combine_instructions (f, nregs)
*** 674,679 ****
--- 674,680 ----
    total_successes += combine_successes;
  
    nonzero_sign_valid = 0;
+   reg_last_set_value = 0;
  
    /* Make recognizer allow volatile MEMs again.  */
    init_recog ();
*************** nonzero_bits (x, mode)
*** 7623,7628 ****
--- 7667,7676 ----
  	}
  #endif
  
+       /* If called from loop, the reg_last_* arrays are not set.  */
+       if (! reg_last_set_value)
+ 	return nonzero;
+ 
        /* If X is a register whose nonzero bits value is current, use it.
  	 Otherwise, if X is a register whose value we can find, use that
  	 value.  Otherwise, use the previously-computed global nonzero bits
*************** num_sign_bit_copies (x, mode)
*** 8018,8023 ****
--- 8066,8075 ----
  	return GET_MODE_BITSIZE (Pmode) - GET_MODE_BITSIZE (ptr_mode) + 1;
  #endif
  
+       /* If called from loop, the reg_last_* arrays are not set.  */
+       if (! reg_last_set_value)
+ 	return 1;
+ 
        if (reg_last_set_value[REGNO (x)] != 0
  	  && reg_last_set_mode[REGNO (x)] == mode
  	  && (REG_N_SETS (REGNO (x)) == 1
*************** get_last_value (x)
*** 10924,10930 ****
        && (value = get_last_value (SUBREG_REG (x))) != 0)
      return gen_lowpart_for_combine (GET_MODE (x), value);
  
!   if (GET_CODE (x) != REG)
      return 0;
  
    regno = REGNO (x);
--- 10976,10983 ----
        && (value = get_last_value (SUBREG_REG (x))) != 0)
      return gen_lowpart_for_combine (GET_MODE (x), value);
  
!   /* If called from loop, the reg_last_* arrays are not set.  */
!   if (GET_CODE (x) != REG || ! reg_last_set_value)
      return 0;
  
    regno = REGNO (x);
*************** insn_cuid (insn)
*** 12086,12091 ****
--- 12139,12180 ----
      abort ();
  
    return INSN_CUID (insn);
+ }
+ \f
+ /* Replace FROM with to throughout in INSN, and make simplifications.
+    Return nonzero for sucess.  */
+ int
+ validate_subst (insn, from, to)
+      rtx insn, from, to;
+ {
+   rtx pat, new_pat;
+ 
+   undobuf.previous_undos = undobuf.undos;
+ 
+   /* Save the current high-water-mark so we can free storage if we didn't
+      accept this combination.  */
+   undobuf.storage = (char *) oballoc (0);
+ 
+   subst_insn = insn;
+ 
+   pat = PATTERN (insn);
+   new_pat = subst (pat, from, to, 0, 1);
+   /* Change INSN to a nop so that validate_change is forced to re-recognize.  */
+   PATTERN (insn) = const0_rtx;
+   if (validate_change (insn, &PATTERN (insn), new_pat, 0))
+     {
+       rtx note = find_reg_note (insn, REG_EQUAL, NULL_RTX);
+ 
+       if (note)
+ 	XEXP (note, 0) = subst (XEXP (note, 0), from, to, 0, 1);
+       return 1;
+     }
+   else
+     {
+       PATTERN (insn) = pat;
+       undo_all ();
+       return 0;
+     }
  }
  \f
  void

^ permalink raw reply	[flat|nested] 20+ messages in thread

* strength reduction example
  1999-02-18  6:51 strength reduction example Zack Weinberg
       [not found] ` < 199902181451.JAA11891@blastula.phys.columbia.edu >
@ 1999-02-28 22:53 ` Zack Weinberg
  1 sibling, 0 replies; 20+ messages in thread
From: Zack Weinberg @ 1999-02-28 22:53 UTC (permalink / raw)
  To: egcs

I've been looking at a strength reduction bug for the last week or
so.  Until very recently we generated incorrect code when Joern's giv
combiner was active for this case.  Now the code is correct, but
decidedly suboptimal due to increased register pressure.

Here's the source:

typedef unsigned long size_t;
char *
copy(s1, s2, n)
     char *s1;
     const char *s2;
     size_t n;
{
  char c;
  char *s = s1;
  size_t n4 = n >> 2;
  --s;

  do
  {
    c = *s2++;
    *++s1 = c;
    if(c == '\0') break;
    c = *s2++;
    *++s1 = c;
    if(c == '\0') break;
    c = *s2++;
    *++s1 = c;
    if(c == '\0') break;
    c = *s2++;
    *++s1 = c;
    if(c == '\0') break;
  }
  while(--n4);

  return s;
}

Without Joern's stuff, we get this assembly for the loop:

.L5:
        decl %ebx
        jz .L4
.L3:
        movb (%eax),%dl
        incl %eax
        incl %ecx
        movb %dl,(%ecx)
        testb %dl,%dl
        je .L4
        movb (%eax),%dl
        incl %eax
        incl %ecx
        movb %dl,(%ecx)
        testb %dl,%dl
        je .L4
        movb (%eax),%dl
        incl %eax
        incl %ecx
        movb %dl,(%ecx)
        testb %dl,%dl
        je .L4
        movb (%eax),%dl
        incl %eax
        incl %ecx
        movb %dl,(%ecx)
        testb %dl,%dl
        jne .L5
.L4:

This code is fine.  It may in fact be optimal for this case, due to
pipelining; I'm not enough of an x86 expert to say.  If I were to bang
on it by hand, I'd produce something like this:

.L5:
        decl %ebx
        jz .L4
	addl $4,%eax
	addl $4,%ecx
.L3:
        movb (%eax),%dl
        movb %dl,1(%ecx)
        testb %dl,%dl
        je .L4
        movb 1(%eax),%dl
        movb %dl,2(%ecx)
        testb %dl,%dl
        je .L4
        movb 2(%eax),%dl
        movb %dl,3(%ecx)
        testb %dl,%dl
        je .L4
        movb 3(%eax),%dl
        movb %dl,4(%ecx)
        testb %dl,%dl
        jne .L5
.L4:

which might be worse due to pipeline stalls, but has the same register
demands and roughly the same code size.

With Joern's code, we get instead:

.L5:
        decl -8(%ebp)
        jz .L4
.L3:
        movb -3(%edi),%dl
        movb %dl,(%eax)
        testb %dl,%dl
        je .L4
        movb (%ebx),%dl
        leal 1(%eax),%esi
        movb %dl,1(%eax)
        testb %dl,%dl
        je .L4
        movl -12(%ebp),%ebx
        movb (%ebx),%dl
        movb %dl,2(%eax)
        testb %dl,%dl
        je .L4
        movb (%edi),%dl
        addl $4,%ebx
        movl %ebx,-12(%ebp)
        addl $4,%ecx
        leal 3(%ecx),%edi
        leal 1(%ecx),%ebx
        addl $4,%eax
        movb %dl,-2(%esi)
        testb %dl,%dl
        jne .L5
.L4:

Register pressure has forced things onto the stack.  We have many more
address generations.  We have pointer recalculations inside the loop.
We have significantly increased code size.

Looking at the loop dump, the difference seems to be that Joern's code
converts the incremented pointers to their own givs, and then
preserves those givs at the expense of the original bivs.  We should
prefer to hang on to the original bivs and express things in terms of
them, at least in this context.

zw

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
  1999-02-18 18:36       ` Joern Rennecke
       [not found]         ` < 199902190235.CAA32082@phal.cygnus.co.uk >
@ 1999-02-28 22:53         ` Joern Rennecke
  1 sibling, 0 replies; 20+ messages in thread
From: Joern Rennecke @ 1999-02-28 22:53 UTC (permalink / raw)
  To: zack, wilson, law; +Cc: egcs

It turned out that I also have to change num_sign_bit_copies and
reg_last_value.  Here is an updated patch:

Fri Feb 19 02:34:23 1999  J"orn Rennecke <amylaar@cygnus.co.uk>

	* rtl.h (subst, subst_insn): Declare.
	* loop.c (strength_reduce): Clean up code to set maybe_multiple.
	In code to convert biv increments to givs, try to create address givs.
	* combine.c (subst, subst_insn): No longer static.
	(combine_instructions): Before returning, clear reg_last_set_value.
	(nonzero_bits, num_sign_bit_copies, reg_last_value):
	Handle case when reg_last_set_value is not set.

===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/rtl.h,v
retrieving revision 1.105.2.1
diff -p -r1.105.2.1 rtl.h
*** rtl.h	1999/01/28 00:26:15	1.105.2.1
--- rtl.h	1999/02/19 01:18:53
*************** extern void add_clobbers		PROTO ((rtx, i
*** 1370,1375 ****
--- 1370,1377 ----
  extern void combine_instructions	PROTO ((rtx, int));
  extern int extended_count		PROTO ((rtx, enum machine_mode, int));
  extern rtx remove_death			PROTO ((int, rtx));
+ extern rtx subst			PROTO((rtx, rtx, rtx, int, int));
+ extern rtx subst_insn;
  #ifdef BUFSIZ
  extern void dump_combine_stats		PROTO ((FILE *));
  extern void dump_combine_total_stats	PROTO ((FILE *));
Index: loop.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/loop.c,v
retrieving revision 1.166.2.31
diff -p -r1.166.2.31 loop.c
*** loop.c	1999/02/18 19:41:23	1.166.2.31
--- loop.c	1999/02/19 01:18:55
*************** strength_reduce (scan_start, end, loop_t
*** 3782,3794 ****
  		  && (! condjump_p (insn)
  		      || (JUMP_LABEL (insn) != 0
  			  && JUMP_LABEL (insn) != scan_start
! 			  && (INSN_UID (JUMP_LABEL (insn)) >= max_uid_for_loop
! 			      || (INSN_UID (p) < max_uid_for_loop
! 				  ? (INSN_LUID (JUMP_LABEL (insn))
! 				     <= INSN_LUID (p))
! 				  : (INSN_UID (insn) >= max_uid_for_loop
! 				     || (INSN_LUID (JUMP_LABEL (insn))
! 					 < INSN_LUID (insn))))))))
  		{
  		  maybe_multiple = 1;
  		  break;
--- 3813,3819 ----
  		  && (! condjump_p (insn)
  		      || (JUMP_LABEL (insn) != 0
  			  && JUMP_LABEL (insn) != scan_start
! 			  && ! loop_insn_first_p (p, JUMP_LABEL (insn)))))
  		{
  		  maybe_multiple = 1;
  		  break;
*************** strength_reduce (scan_start, end, loop_t
*** 4157,4164 ****
  	  for (vp = &bl->biv, next = *vp; v = next, next = v->next_iv;)
  	    {
  	      HOST_WIDE_INT offset;
! 	      rtx set, add_val, old_reg, dest_reg, last_use_insn;
  	      int old_regno, new_regno;
      
  	      if (! v->always_executed
  		  || v->maybe_multiple
--- 4182,4190 ----
  	  for (vp = &bl->biv, next = *vp; v = next, next = v->next_iv;)
  	    {
  	      HOST_WIDE_INT offset;
! 	      rtx set, src, add_val, old_reg, dest_reg, last_use_insn;
  	      int old_regno, new_regno;
+ 	      int all_eliminated;
      
  	      if (! v->always_executed
  		  || v->maybe_multiple
*************** strength_reduce (scan_start, end, loop_t
*** 4198,4203 ****
--- 4224,4268 ----
  		  continue;
  		}
  	      next->add_val = add_val;
+ 
+ 	      /* Remove the increment from the list of biv increments.  */
+ 	      *vp = next;
+ 	      bl->biv_count--;
+ 	      old_regno = REGNO (old_reg);
+ 	      new_regno = REGNO (dest_reg);
+ 	      VARRAY_INT (set_in_loop, old_regno)--;
+ 	      VARRAY_INT (n_times_set, old_regno)--;
+ 
+ 	      src = SET_SRC (set);
+ 	      all_eliminated = 1;
+ 	      for (p = NEXT_INSN (v->insn); p != next->insn; p = NEXT_INSN (p))
+ 		{
+ 		  rtx newpat;
+ 
+ 		  if (GET_RTX_CLASS (GET_CODE (p)) != 'i')
+ 		    continue;
+ 		  subst_insn = p;
+ 		  newpat = subst (PATTERN (p), old_reg, src, 0, 0);
+ 		  if (validate_change (p, &PATTERN (p), newpat, 0))
+ 		    replace_rtx (REG_NOTES (p), old_reg, src);
+ 		  else
+ 		    all_eliminated = 0;
+ 		}
+ 	      if (all_eliminated)
+ 		{
+ 		  if (loop_dump_stream)
+ 		    fprintf (loop_dump_stream,
+ 			     "Increment %d of biv %d eliminated.\n\n",
+ 			 INSN_UID (v->insn), old_regno);
+ 		  PUT_CODE (v->insn, NOTE);
+ 		  NOTE_LINE_NUMBER (v->insn) = NOTE_INSN_DELETED;
+ 		  NOTE_SOURCE_FILE (v->insn) = 0;
+ 		  VARRAY_CHAR (may_not_optimize, new_regno) = 0;
+ 		  VARRAY_INT (set_in_loop, new_regno) = 0;
+ 		  VARRAY_INT (n_times_set, new_regno) = 0;
+ 		  continue;
+ 		}
+ 
  	      v->dest_reg = dest_reg;
  	      v->giv_type = DEST_REG;
  	      v->location = &SET_SRC (set);
*************** strength_reduce (scan_start, end, loop_t
*** 4219,4239 ****
  	      v->replaceable = 1;
  	      v->no_const_addval = 0;
      
- 	      old_regno = REGNO (old_reg);
- 	      new_regno = REGNO (dest_reg);
- 	      VARRAY_INT (set_in_loop, old_regno)--;
  	      VARRAY_INT (set_in_loop, new_regno) = 1;
- 	      VARRAY_INT (n_times_set, old_regno)--;
  	      VARRAY_INT (n_times_set, new_regno) = 1;
- 	      VARRAY_CHAR (may_not_optimize, new_regno) = 0;
      
  	      REG_IV_TYPE (new_regno) = GENERAL_INDUCT;
  	      REG_IV_INFO (new_regno) = v;
      
! 	      /* Remove the increment from the list of biv increments,
! 		 and record it as a giv.  */
! 	      *vp = next;
! 	      bl->biv_count--;
  	      v->next_iv = bl->giv;
  	      bl->giv = v;
  	      bl->giv_count++;
--- 4284,4296 ----
  	      v->replaceable = 1;
  	      v->no_const_addval = 0;
      
  	      VARRAY_INT (set_in_loop, new_regno) = 1;
  	      VARRAY_INT (n_times_set, new_regno) = 1;
      
  	      REG_IV_TYPE (new_regno) = GENERAL_INDUCT;
  	      REG_IV_INFO (new_regno) = v;
      
! 	      /* Record V as a giv.  */
  	      v->next_iv = bl->giv;
  	      bl->giv = v;
  	      bl->giv_count++;
Index: combine.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/combine.c,v
retrieving revision 1.193
diff -p -r1.193 combine.c
*** combine.c	1999/01/20 17:50:18	1.193
--- combine.c	1999/02/19 02:32:53
*************** static int last_call_cuid;
*** 165,171 ****
     looked at, but this may be used to examine the successors of the insn
     to judge whether a simplification is valid.  */
  
! static rtx subst_insn;
  
  /* This is an insn that belongs before subst_insn, but is not currently
     on the insn chain.  */
--- 165,171 ----
     looked at, but this may be used to examine the successors of the insn
     to judge whether a simplification is valid.  */
  
! rtx subst_insn;
  
  /* This is an insn that belongs before subst_insn, but is not currently
     on the insn chain.  */
*************** static int combinable_i3pat	PROTO((rtx, 
*** 398,404 ****
  static rtx try_combine		PROTO((rtx, rtx, rtx));
  static void undo_all		PROTO((void));
  static rtx *find_split_point	PROTO((rtx *, rtx));
- static rtx subst		PROTO((rtx, rtx, rtx, int, int));
  static rtx simplify_rtx		PROTO((rtx, enum machine_mode, int, int));
  static rtx simplify_if_then_else  PROTO((rtx));
  static rtx simplify_set		PROTO((rtx));
--- 398,403 ----
*************** combine_instructions (f, nregs)
*** 674,679 ****
--- 673,679 ----
    total_successes += combine_successes;
  
    nonzero_sign_valid = 0;
+   reg_last_set_value = 0;
  
    /* Make recognizer allow volatile MEMs again.  */
    init_recog ();
*************** find_split_point (loc, insn)
*** 3104,3110 ****
     UNIQUE_COPY is non-zero if each substitution must be unique.  We do this
     by copying if `n_occurrences' is non-zero.  */
  
! static rtx
  subst (x, from, to, in_dest, unique_copy)
       register rtx x, from, to;
       int in_dest;
--- 3147,3153 ----
     UNIQUE_COPY is non-zero if each substitution must be unique.  We do this
     by copying if `n_occurrences' is non-zero.  */
  
! rtx
  subst (x, from, to, in_dest, unique_copy)
       register rtx x, from, to;
       int in_dest;
*************** nonzero_bits (x, mode)
*** 7623,7628 ****
--- 7666,7675 ----
  	}
  #endif
  
+       /* If called from loop, the reg_last_* arrays are not set.  */
+       if (! reg_last_set_value)
+ 	return nonzero;
+ 
        /* If X is a register whose nonzero bits value is current, use it.
  	 Otherwise, if X is a register whose value we can find, use that
  	 value.  Otherwise, use the previously-computed global nonzero bits
*************** num_sign_bit_copies (x, mode)
*** 8018,8023 ****
--- 8065,8074 ----
  	return GET_MODE_BITSIZE (Pmode) - GET_MODE_BITSIZE (ptr_mode) + 1;
  #endif
  
+       /* If called from loop, the reg_last_* arrays are not set.  */
+       if (! reg_last_set_value)
+ 	return 1;
+ 
        if (reg_last_set_value[REGNO (x)] != 0
  	  && reg_last_set_mode[REGNO (x)] == mode
  	  && (REG_N_SETS (REGNO (x)) == 1
*************** get_last_value (x)
*** 10924,10930 ****
        && (value = get_last_value (SUBREG_REG (x))) != 0)
      return gen_lowpart_for_combine (GET_MODE (x), value);
  
!   if (GET_CODE (x) != REG)
      return 0;
  
    regno = REGNO (x);
--- 10975,10982 ----
        && (value = get_last_value (SUBREG_REG (x))) != 0)
      return gen_lowpart_for_combine (GET_MODE (x), value);
  
!   /* If called from loop, the reg_last_* arrays are not set.  */
!   if (GET_CODE (x) != REG || ! reg_last_set_value)
      return 0;
  
    regno = REGNO (x);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
  1999-02-18 20:09           ` Zack Weinberg
       [not found]             ` < 199902190406.XAA05899@blastula.phys.columbia.edu >
@ 1999-02-28 22:53             ` Zack Weinberg
  1 sibling, 0 replies; 20+ messages in thread
From: Zack Weinberg @ 1999-02-28 22:53 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: wilson, law, egcs

On Fri, 19 Feb 1999 02:35:54 +0000 (GMT), Joern Rennecke wrote:
>It turned out that I also have to change num_sign_bit_copies and
>reg_last_value.  Here is an updated patch:

Looks good on my test case: loop.c with combination enabled and this
patch applied produces almost exactly the code I said I'd have written
by hand.  I'm going to run a full bootstrap now and make sure nothing
broke.

zw

;; Function strncpy

Loop from 22 to 109: 27 real insns.
Continue at insn 94.
Insn 29: possible biv, reg 23, const =1
Insn 32: possible biv, reg 22, const =1
Insn 46: possible biv, reg 23, const =1
Insn 49: possible biv, reg 22, const =1
Insn 63: possible biv, reg 23, const =1
Insn 66: possible biv, reg 22, const =1
Insn 80: possible biv, reg 23, const =1
Insn 83: possible biv, reg 22, const =1
Insn 98: possible biv, reg 27, const =-1
Reg 27: biv verified
Reg 22: biv verified
Reg 23: biv verified
Biv 27 initialized at insn 17: initial value is complex
Biv 22 initialized at insn 4: initial value is complex
Biv 23 initialized at insn 6: initial value is complex
Increment 32 of biv 22 eliminated.
Increment 49 of biv 22 eliminated.
Increment 66 of biv 22 eliminated.
Increment 29 of biv 23 eliminated.
Increment 46 of biv 23 eliminated.
Increment 63 of biv 23 eliminated.

Insn 28: dest address src reg 23 benefit 0 lifetime 1 replaceable ncav mult 1 add 0
Insn 34: dest address src reg 22 benefit -1 lifetime 1 replaceable mult 1 add 1
Insn 45: dest address src reg 23 benefit -1 lifetime 1 replaceable mult 1 add 1
Insn 51: dest address src reg 22 benefit -1 lifetime 1 replaceable mult 1 add 2
Insn 62: dest address src reg 23 benefit -1 lifetime 1 replaceable mult 1 add 2
Insn 68: dest address src reg 22 benefit -1 lifetime 1 replaceable mult 1 add 3
Insn 79: dest address src reg 23 benefit -1 lifetime 1 replaceable mult 1 add 3
Insn 85: dest address src reg 22 benefit 0 lifetime 1 replaceable ncav mult 1 add 0
Loop unrolling: Not basic or general induction var.
Cannot eliminate biv 27: biv used in insn 99.
Sorted combine statistics:

biv 22 can be eliminated.
Sorted combine statistics:
 {85, 7} {68, 6} {51, 6} {34, 6}
giv at 68 combined with giv at 85
giv at 51 combined with giv at 85
giv at 34 combined with giv at 85
Sorted combine statistics:

giv of insn 85 not worth while, -1008 vs 27.
giv at 51 recombined with giv at 85 as (plus:SI (reg:SI 29)
    (const_int 2))
giv at 68 recombined with giv at 85 as (plus:SI (reg:SI 29)
    (const_int 3))
biv 23 can be eliminated.
Sorted combine statistics:
 {28, 7} {79, 6} {62, 6} {45, 6}
giv at 79 combined with giv at 28
giv at 62 combined with giv at 28
giv at 45 combined with giv at 28
Sorted combine statistics:

giv at 62 recombined with giv at 28 as (plus:SI (reg:SI 29)
    (const_int 2))
giv at 79 recombined with giv at 28 as (plus:SI (reg:SI 29)
    (const_int 3))


Loop from 22 to 109: 21 real insns.
Continue at insn 94.
Insn 80: possible biv, reg 23, const =4
Insn 83: possible biv, reg 22, const =4
Insn 98: possible biv, reg 27, const =-1
Reg 27: biv verified
Reg 22: biv verified
Reg 23: biv verified
Biv 27 initialized at insn 17: initial value is complex
Biv 22 initialized at insn 4: initial value is complex
Biv 23 initialized at insn 6: initial value is complex
Insn 28: dest address src reg 23 benefit 0 lifetime 1 replaceable ncav mult 1 add 0
Insn 34: dest address src reg 22 benefit -1 lifetime 1 replaceable mult 1 add 1
Insn 45: dest address src reg 23 benefit -1 lifetime 1 replaceable mult 1 add 1
Insn 51: dest address src reg 22 benefit -1 lifetime 1 replaceable mult 1 add 2
Insn 62: dest address src reg 23 benefit -1 lifetime 1 replaceable mult 1 add 2
Insn 68: dest address src reg 22 benefit -1 lifetime 1 replaceable mult 1 add 3
Insn 79: dest address src reg 23 benefit -1 lifetime 1 replaceable mult 1 add 3
Insn 85: dest address src reg 22 benefit 0 lifetime 1 replaceable ncav mult 1 add 0
Loop unrolling: Not basic or general induction var.
Cannot eliminate biv 27: biv used in insn 99.
Sorted combine statistics:

biv 22 can be eliminated.
Sorted combine statistics:
 {85, 7} {68, 6} {51, 6} {34, 6}
giv at 68 combined with giv at 85
giv at 51 combined with giv at 85
giv at 34 combined with giv at 85
Sorted combine statistics:

giv of insn 85 not worth while, -1008 vs 21.
giv at 51 recombined with giv at 85 as (plus:SI (reg:SI 36)
    (const_int 2))
giv at 68 recombined with giv at 85 as (plus:SI (reg:SI 36)
    (const_int 3))
biv 23 can be eliminated.
Sorted combine statistics:
 {28, 7} {79, 6} {62, 6} {45, 6}
giv at 79 combined with giv at 28
giv at 62 combined with giv at 28
giv at 45 combined with giv at 28
Sorted combine statistics:

giv of insn 28 not worth while, -1008 vs 21.
giv at 62 recombined with giv at 28 as (plus:SI (reg:SI 36)
    (const_int 2))
giv at 79 recombined with giv at 28 as (plus:SI (reg:SI 36)
    (const_int 3))


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
  1999-02-18 20:18               ` Joern Rennecke
       [not found]                 ` < 199902190417.EAA25841@phal.cygnus.co.uk >
@ 1999-02-28 22:53                 ` Joern Rennecke
  1 sibling, 0 replies; 20+ messages in thread
From: Joern Rennecke @ 1999-02-28 22:53 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: wilson, law, egcs

> by hand.  I'm going to run a full bootstrap now and make sure nothing
> broke.

Don't bother, it's broken :-( .  There are problems with shared RTL,
and with the handling of undobuf.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
  1999-02-19 14:19                       ` Jim Wilson
       [not found]                         ` < 199902192218.OAA21328@rtl.cygnus.com >
@ 1999-02-28 22:53                         ` Jim Wilson
  1 sibling, 0 replies; 20+ messages in thread
From: Jim Wilson @ 1999-02-28 22:53 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: zack, law, egcs

What does this patch do?  There are no comments explaining what it does,
and you didn't offer any explanation.  All I know is that it gives better
code for Zack's testcase, but I am not going to approve it unless I can
understand why.

Actually, I see one clue in the changelog entry, it mentions creating
address givs, but nowhere do I see DEST_ADDR used in your patch.

Why do you want to call subst?  subst is a combiner internal routine.
I'm skeptical that making it callable from elsewhere is a good idea.
That could just result in a lot of bugs and maintenance problems.
You will have to have a good reason for wanting to do this before I will
seriously consider it.

The loop_insn_first_p change looks like a carry-over from another pending
patch.  I'll take care of that patch.

Jim

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
  1999-02-18 17:27   ` Joern Rennecke
       [not found]     ` < 199902190127.BAA27142@phal.cygnus.co.uk >
@ 1999-02-28 22:53     ` Joern Rennecke
  1 sibling, 0 replies; 20+ messages in thread
From: Joern Rennecke @ 1999-02-28 22:53 UTC (permalink / raw)
  To: Zack Weinberg, wilson, law; +Cc: egcs

> I've been looking at a strength reduction bug for the last week or
> so.  Until very recently we generated incorrect code when Joern's giv
> combiner was active for this case.  Now the code is correct, but
> decidedly suboptimal due to increased register pressure.

Please try / review this patch:

Fri Feb 19 01:25:17 1999  J"orn Rennecke <amylaar@cygnus.co.uk>

	* rtl.h (subst, subst_insn): Declare.
	* loop.c (strength_reduce): Clean up code to set maybe_multiple.
	In code to convert biv increments to givs, try to create address givs.
	* combine.c (subst, subst_insn): No longer static.
	(combine_instructions): Before returning, clear reg_last_set_value.
	(nonzero_bits): handle case when reg_last_set_value is not set.

===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/rtl.h,v
retrieving revision 1.105.2.1
diff -p -r1.105.2.1 rtl.h
*** rtl.h	1999/01/28 00:26:15	1.105.2.1
--- rtl.h	1999/02/19 01:18:53
*************** extern void add_clobbers		PROTO ((rtx, i
*** 1370,1375 ****
--- 1370,1377 ----
  extern void combine_instructions	PROTO ((rtx, int));
  extern int extended_count		PROTO ((rtx, enum machine_mode, int));
  extern rtx remove_death			PROTO ((int, rtx));
+ extern rtx subst			PROTO((rtx, rtx, rtx, int, int));
+ extern rtx subst_insn;
  #ifdef BUFSIZ
  extern void dump_combine_stats		PROTO ((FILE *));
  extern void dump_combine_total_stats	PROTO ((FILE *));
Index: loop.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/loop.c,v
retrieving revision 1.166.2.31
diff -p -r1.166.2.31 loop.c
*** loop.c	1999/02/18 19:41:23	1.166.2.31
--- loop.c	1999/02/19 01:18:55
*************** strength_reduce (scan_start, end, loop_t
*** 3782,3794 ****
  		  && (! condjump_p (insn)
  		      || (JUMP_LABEL (insn) != 0
  			  && JUMP_LABEL (insn) != scan_start
! 			  && (INSN_UID (JUMP_LABEL (insn)) >= max_uid_for_loop
! 			      || (INSN_UID (p) < max_uid_for_loop
! 				  ? (INSN_LUID (JUMP_LABEL (insn))
! 				     <= INSN_LUID (p))
! 				  : (INSN_UID (insn) >= max_uid_for_loop
! 				     || (INSN_LUID (JUMP_LABEL (insn))
! 					 < INSN_LUID (insn))))))))
  		{
  		  maybe_multiple = 1;
  		  break;
--- 3813,3819 ----
  		  && (! condjump_p (insn)
  		      || (JUMP_LABEL (insn) != 0
  			  && JUMP_LABEL (insn) != scan_start
! 			  && ! loop_insn_first_p (p, JUMP_LABEL (insn)))))
  		{
  		  maybe_multiple = 1;
  		  break;
*************** strength_reduce (scan_start, end, loop_t
*** 4157,4164 ****
  	  for (vp = &bl->biv, next = *vp; v = next, next = v->next_iv;)
  	    {
  	      HOST_WIDE_INT offset;
! 	      rtx set, add_val, old_reg, dest_reg, last_use_insn;
  	      int old_regno, new_regno;
      
  	      if (! v->always_executed
  		  || v->maybe_multiple
--- 4182,4190 ----
  	  for (vp = &bl->biv, next = *vp; v = next, next = v->next_iv;)
  	    {
  	      HOST_WIDE_INT offset;
! 	      rtx set, src, add_val, old_reg, dest_reg, last_use_insn;
  	      int old_regno, new_regno;
+ 	      int all_eliminated;
      
  	      if (! v->always_executed
  		  || v->maybe_multiple
*************** strength_reduce (scan_start, end, loop_t
*** 4198,4203 ****
--- 4224,4268 ----
  		  continue;
  		}
  	      next->add_val = add_val;
+ 
+ 	      /* Remove the increment from the list of biv increments.  */
+ 	      *vp = next;
+ 	      bl->biv_count--;
+ 	      old_regno = REGNO (old_reg);
+ 	      new_regno = REGNO (dest_reg);
+ 	      VARRAY_INT (set_in_loop, old_regno)--;
+ 	      VARRAY_INT (n_times_set, old_regno)--;
+ 
+ 	      src = SET_SRC (set);
+ 	      all_eliminated = 1;
+ 	      for (p = NEXT_INSN (v->insn); p != next->insn; p = NEXT_INSN (p))
+ 		{
+ 		  rtx newpat;
+ 
+ 		  if (GET_RTX_CLASS (GET_CODE (p)) != 'i')
+ 		    continue;
+ 		  subst_insn = p;
+ 		  newpat = subst (PATTERN (p), old_reg, src, 0, 0);
+ 		  if (validate_change (p, &PATTERN (p), newpat, 0))
+ 		    replace_rtx (REG_NOTES (p), old_reg, src);
+ 		  else
+ 		    all_eliminated = 0;
+ 		}
+ 	      if (all_eliminated)
+ 		{
+ 		  if (loop_dump_stream)
+ 		    fprintf (loop_dump_stream,
+ 			     "Increment %d of biv %d eliminated.\n\n",
+ 			 INSN_UID (v->insn), old_regno);
+ 		  PUT_CODE (v->insn, NOTE);
+ 		  NOTE_LINE_NUMBER (v->insn) = NOTE_INSN_DELETED;
+ 		  NOTE_SOURCE_FILE (v->insn) = 0;
+ 		  VARRAY_CHAR (may_not_optimize, new_regno) = 0;
+ 		  VARRAY_INT (set_in_loop, new_regno) = 0;
+ 		  VARRAY_INT (n_times_set, new_regno) = 0;
+ 		  continue;
+ 		}
+ 
  	      v->dest_reg = dest_reg;
  	      v->giv_type = DEST_REG;
  	      v->location = &SET_SRC (set);
*************** strength_reduce (scan_start, end, loop_t
*** 4219,4239 ****
  	      v->replaceable = 1;
  	      v->no_const_addval = 0;
      
- 	      old_regno = REGNO (old_reg);
- 	      new_regno = REGNO (dest_reg);
- 	      VARRAY_INT (set_in_loop, old_regno)--;
  	      VARRAY_INT (set_in_loop, new_regno) = 1;
- 	      VARRAY_INT (n_times_set, old_regno)--;
  	      VARRAY_INT (n_times_set, new_regno) = 1;
- 	      VARRAY_CHAR (may_not_optimize, new_regno) = 0;
      
  	      REG_IV_TYPE (new_regno) = GENERAL_INDUCT;
  	      REG_IV_INFO (new_regno) = v;
      
! 	      /* Remove the increment from the list of biv increments,
! 		 and record it as a giv.  */
! 	      *vp = next;
! 	      bl->biv_count--;
  	      v->next_iv = bl->giv;
  	      bl->giv = v;
  	      bl->giv_count++;
--- 4284,4296 ----
  	      v->replaceable = 1;
  	      v->no_const_addval = 0;
      
  	      VARRAY_INT (set_in_loop, new_regno) = 1;
  	      VARRAY_INT (n_times_set, new_regno) = 1;
      
  	      REG_IV_TYPE (new_regno) = GENERAL_INDUCT;
  	      REG_IV_INFO (new_regno) = v;
      
! 	      /* Record V as a giv.  */
  	      v->next_iv = bl->giv;
  	      bl->giv = v;
  	      bl->giv_count++;
Index: combine.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/combine.c,v
retrieving revision 1.193
diff -p -r1.193 combine.c
*** combine.c	1999/01/20 17:50:18	1.193
--- combine.c	1999/02/19 01:18:59
*************** static int last_call_cuid;
*** 165,171 ****
     looked at, but this may be used to examine the successors of the insn
     to judge whether a simplification is valid.  */
  
! static rtx subst_insn;
  
  /* This is an insn that belongs before subst_insn, but is not currently
     on the insn chain.  */
--- 165,171 ----
     looked at, but this may be used to examine the successors of the insn
     to judge whether a simplification is valid.  */
  
! rtx subst_insn;
  
  /* This is an insn that belongs before subst_insn, but is not currently
     on the insn chain.  */
*************** static int combinable_i3pat	PROTO((rtx, 
*** 398,404 ****
  static rtx try_combine		PROTO((rtx, rtx, rtx));
  static void undo_all		PROTO((void));
  static rtx *find_split_point	PROTO((rtx *, rtx));
- static rtx subst		PROTO((rtx, rtx, rtx, int, int));
  static rtx simplify_rtx		PROTO((rtx, enum machine_mode, int, int));
  static rtx simplify_if_then_else  PROTO((rtx));
  static rtx simplify_set		PROTO((rtx));
--- 398,403 ----
*************** combine_instructions (f, nregs)
*** 674,679 ****
--- 673,679 ----
    total_successes += combine_successes;
  
    nonzero_sign_valid = 0;
+   reg_last_set_value = 0;
  
    /* Make recognizer allow volatile MEMs again.  */
    init_recog ();
*************** find_split_point (loc, insn)
*** 3104,3110 ****
     UNIQUE_COPY is non-zero if each substitution must be unique.  We do this
     by copying if `n_occurrences' is non-zero.  */
  
! static rtx
  subst (x, from, to, in_dest, unique_copy)
       register rtx x, from, to;
       int in_dest;
--- 3147,3153 ----
     UNIQUE_COPY is non-zero if each substitution must be unique.  We do this
     by copying if `n_occurrences' is non-zero.  */
  
! rtx
  subst (x, from, to, in_dest, unique_copy)
       register rtx x, from, to;
       int in_dest;
*************** nonzero_bits (x, mode)
*** 7622,7627 ****
--- 7665,7674 ----
  	  return nonzero &= ~ (sp_alignment - 1);
  	}
  #endif
+ 
+       /* If called from loop, the reg_last_* arrays are not set.  */
+       if (! reg_last_set_value)
+ 	return nonzero;
  
        /* If X is a register whose nonzero bits value is current, use it.
  	 Otherwise, if X is a register whose value we can find, use that

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
  1999-02-19 14:27                           ` Joern Rennecke
@ 1999-02-28 22:53                             ` Joern Rennecke
  0 siblings, 0 replies; 20+ messages in thread
From: Joern Rennecke @ 1999-02-28 22:53 UTC (permalink / raw)
  To: Jim Wilson; +Cc: amylaar, zack, law, egcs

> What does this patch do?  There are no comments explaining what it does,
> and you didn't offer any explanation.  All I know is that it gives better
> code for Zack's testcase, but I am not going to approve it unless I can
> understand why.

When I implemented the biv increment -> giv conversion, I made DEST_REG givs.
the giv combination is not really good at handling cases when there are
DEST_REG givs that are only used for a DEST_ADDR giv that could be changed
to make the DEST_REG giv dead.

> Actually, I see one clue in the changelog entry, it mentions creating
> address givs, but nowhere do I see DEST_ADDR used in your patch.

The address givs are not created explicitly, but by combining the register
set into its use, I create DEST_ADDR givs if the use happens to be in
a MEM.  If we can combine the set into some other uses, so much the better.

> Why do you want to call subst?  subst is a combiner internal routine.
> I'm skeptical that making it callable from elsewhere is a good idea.
> That could just result in a lot of bugs and maintenance problems.
> You will have to have a good reason for wanting to do this before I will
> seriously consider it.

I have found out in the meantime that calling subst from outside is
indeed not a good idea.  Look at my follow-up-patch, which defines
a function validate_subst in combine to handle the combiner internals.

In effect, I want to combine the setting of an register with its use -
possibly with multiple uses - outside of combine.  Re-using code from
combine seemed the natural thing to do.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
  1999-02-25 22:01 ` Jim Wilson
       [not found]   ` < 199902260600.WAA24238@rtl.cygnus.com >
@ 1999-02-28 22:53   ` Jim Wilson
  1 sibling, 0 replies; 20+ messages in thread
From: Jim Wilson @ 1999-02-28 22:53 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: zack, law, egcs

I understand what the patch is doing now.

I am still concerned that you are calling subst from loop.  You are doing
it indirectly through validate_subst which is much better, but you are still
calling it from loop.  This has implications for long term maintenance,
since it means anyone modifying combine has to remember that the code must
work called from loop also.  It is likely that people will forget this, and
we will see occasional bugs as a result.  I don't see any easy convenient
alternative though, and the combine changes to make this work seem pretty
simple, so it appears to be worth trying.

The current code always does this:
            VARRAY_CHAR (may_not_optimize, new_regno) = 0;
The patched code only does this if an increment is eliminated.  I don't
understand how that can be right unless the current code is actually wrong.

Before the for loop that calls validate_subst, you should add a comment
explaining that we try to create address givs by replacing reg uses
with (PLUS reg increment).

The patch calls validate_subst for every instruction, regardless of whether
it uses old_reg.  validate_subst calls subst and recog for every instruction
regardless of whether any change was made.  This seems wasteful.  Shouldn't
we call reg_mentioned_p first, and if it returns true, only then call
validate_subst?  That might not work if there is a REG_EQUAL note that
refers to old_reg but the insn doesn't.  Perhaps the reg_mentioned_p check
should be in validate_subst?

Since we really want to create address givs, maybe we should only do this
substitution inside MEMs?  Will something useful happen if it gets replaced
elsewhere?  I guess that will result in creation of a DEST_REG giv, so it
should still work.

Typo in combine.c: sucess -> success.

The code in validate_subst to set previous_undos from undos looks unnecessary.
previous_undos is only used if we call subst multiple times.  If we want
to undo the effect of the last subst call, then we can restore undos from
previous_undos.  This doesn't apply to validate_subst.

validate_subst assumes that undobuf.undos has some reasonable value at
the beginning.  I don't think that is a safe assumption.  You should set
undobuf.undos to zero at the beginning.

Why do you call validate_change in validate_subst?  It would be simpler to
call recog_insn.  You could then get rid of 3 other statements that fiddle
wtih PATTERN (insn).  The only reason I see for this is because validate_change
tries adding/removing clobbers.  In that case, you should call
recog_for_combine.  It looks safe, and will probably fix a bug, since
recog_for_combine looks for the (clobber (const_int 0)) stuff that
combine sometimes creates for invalid substitutions.  These clobbers would
just be ignored by validate_change.

validate_subst only tries to fix a REG_EQUAL note if validate_change
succeeded.  But what if we have an insn that doesn't use the register
but has a REG_EQUAL note that does?  This could happen for a libcall
sequence.  It might be possible in other cases.  In loop, the
DEST_REG giv case immediately after the validate_subst call handles
reg notes independently.  Either the code in validate_subst is wrong,
or the code in loop is unnecessarily inefficient.  I think the validate_subst
code is wrong.

Jim

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
  1999-02-26 17:28     ` Joern Rennecke
@ 1999-02-28 22:53       ` Joern Rennecke
  0 siblings, 0 replies; 20+ messages in thread
From: Joern Rennecke @ 1999-02-28 22:53 UTC (permalink / raw)
  To: Jim Wilson; +Cc: zack, law, egcs

> The current code always does this:
>             VARRAY_CHAR (may_not_optimize, new_regno) = 0;
> The patched code only does this if an increment is eliminated.  I don't
> understand how that can be right unless the current code is actually wrong.

Yes, this is really only important when the increment is converted but
not eliminated (because only then is the register used at all), but best
done unconditionally.  I changed the code accordingly.

> The patch calls validate_subst for every instruction, regardless of whether
> it uses old_reg.  validate_subst calls subst and recog for every instruction
> regardless of whether any change was made.  This seems wasteful.  Shouldn't
> we call reg_mentioned_p first, and if it returns true, only then call
> validate_subst?  That might not work if there is a REG_EQUAL note that
> refers to old_reg but the insn doesn't.  Perhaps the reg_mentioned_p check
> should be in validate_subst?

Ok.

> Since we really want to create address givs, maybe we should only do this
> substitution inside MEMs?  Will something useful happen if it gets replaced
> elsewhere?  I guess that will result in creation of a DEST_REG giv, so it
> should still work.

It can change DEST_REG givs using add or lea, and on processors like the
m68k it might even create the odd pea.

> The code in validate_subst to set previous_undos from undos looks unnecessary.
> previous_undos is only used if we call subst multiple times.  If we want
> to undo the effect of the last subst call, then we can restore undos from
> previous_undos.  This doesn't apply to validate_subst.

It is necessary because gen_rtx_combine looks at previous_undos.

> validate_subst assumes that undobuf.undos has some reasonable value at
> the beginning.  I don't think that is a safe assumption.  You should set
> undobuf.undos to zero at the beginning.

Yes.

> Why do you call validate_change in validate_subst?  It would be simpler to
> call recog_insn.  You could then get rid of 3 other statements that fiddle
> wtih PATTERN (insn).  The only reason I see for this is because validate_change
> tries adding/removing clobbers.  In that case, you should call
> recog_for_combine.  It looks safe, and will probably fix a bug, since
> recog_for_combine looks for the (clobber (const_int 0)) stuff that
> combine sometimes creates for invalid substitutions.  These clobbers would
> just be ignored by validate_change.

Originally the ida was that you can use the in_group mechanism of
validate_change - but that wouldn't actually work since subst does
destructive changes.  But yes, the clobber handling is better than calling
recog directly.
I can't use recog_for_combine because it uses data flow information.
In oparticular, reg_dead_at_p looks at basic_block_live_at_start[block].

> validate_subst only tries to fix a REG_EQUAL note if validate_change
> succeeded.  But what if we have an insn that doesn't use the register
> but has a REG_EQUAL note that does?  This could happen for a libcall
> sequence.  It might be possible in other cases.  In loop, the
> DEST_REG giv case immediately after the validate_subst call handles
> reg notes independently.  Either the code in validate_subst is wrong,
> or the code in loop is unnecessarily inefficient.  I think the validate_subst
> code is wrong.

validate_change will succeed also if the register is not mentioned,
since the pattern was forced to be different.

The code in loop handles a different substitution, namely substituting
the biv with a DEST_REG giv.

In the meantime Jeffrey Law has sent me a testcase that pointed out a bug
in the code that I was patching: if a DEST_REG giv is created, we need
v->insn and last_use_insn to have valid luids.  There was no provision to
guarantee this, so that we can get failures when an inner loop has been
unrolled, and we then process the outer loop which contains a number of
insns with uids > max_uid_for_loop.

This has made a change of plans necessary: it must be possible to undo
all substitutions if we'd otherwise end up with a DEST_REG giv without
valid LUIDs.

Sat Feb 27 00:56:34 1999  J"orn Rennecke <amylaar@cygnus.co.uk>

        * rtl.h (validate_subst, validate_subst_start, validate_subst_undo):
	Declare.
        * loop.c (strength_reduce): In code to convert biv increments to givs,
	try to create address givs.
        * combine.c (undo_last): New functions.
        (validate_subst, validate_subst_start, validate_subst_undo): Likewise.
        (combine_instructions): Before returning, clear reg_last_set_value.
        (nonzero_bits, num_sign_bit_copies, reg_last_value):
        Handle case when reg_last_set_value is not set.

Index: rtl.h
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/rtl.h,v
retrieving revision 1.105.2.3
diff -p -r1.105.2.3 rtl.h
*** rtl.h	1999/02/25 19:28:24	1.105.2.3
--- rtl.h	1999/02/27 00:56:26
*************** extern void add_clobbers		PROTO ((rtx, i
*** 1369,1374 ****
--- 1369,1377 ----
  extern void combine_instructions	PROTO ((rtx, int));
  extern int extended_count		PROTO ((rtx, enum machine_mode, int));
  extern rtx remove_death			PROTO ((int, rtx));
+ extern int validate_subst		PROTO((rtx, rtx, rtx));
+ extern void validate_subst_start	PROTO((void));
+ extern void validate_subst_undo		PROTO((void));
  #ifdef BUFSIZ
  extern void dump_combine_stats		PROTO ((FILE *));
  extern void dump_combine_total_stats	PROTO ((FILE *));
Index: loop.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/loop.c,v
retrieving revision 1.166.2.37
diff -p -r1.166.2.37 loop.c
*** loop.c	1999/02/25 19:28:25	1.166.2.37
--- loop.c	1999/02/27 00:56:28
*************** strength_reduce (scan_start, end, loop_t
*** 4162,4168 ****
  	  for (vp = &bl->biv, next = *vp; v = next, next = v->next_iv;)
  	    {
  	      HOST_WIDE_INT offset;
! 	      rtx set, add_val, old_reg, dest_reg, last_use_insn;
  	      int old_regno, new_regno;
  
  	      if (! v->always_executed
--- 4195,4201 ----
  	  for (vp = &bl->biv, next = *vp; v = next, next = v->next_iv;)
  	    {
  	      HOST_WIDE_INT offset;
! 	      rtx set, src, add_val, old_reg, dest_reg, last_use_insn;
  	      int old_regno, new_regno;
  
  	      if (! v->always_executed
*************** strength_reduce (scan_start, end, loop_t
*** 4182,4187 ****
--- 4215,4222 ----
  	      add_val = plus_constant (next->add_val, offset);
  	      old_reg = v->dest_reg;
  	      dest_reg = gen_reg_rtx (v->mode);
+ 	      old_regno = REGNO (old_reg);
+ 	      new_regno = REGNO (dest_reg);
      
  	      /* Unlike reg_iv_type / reg_iv_info, the other three arrays
  		 have been allocated with some slop space, so we may not
*************** strength_reduce (scan_start, end, loop_t
*** 4194,4208 ****
  		  VARRAY_GROW (n_times_set, nregs);
  		  VARRAY_GROW (may_not_optimize, nregs);
  		}
      
! 	      validate_change (v->insn, &SET_DEST (set), dest_reg, 1);
! 	      validate_change (next->insn, next->location, add_val, 1);
! 	      if (! apply_change_group ())
  		{
  		  vp = &v->next_iv;
  		  continue;
  		}
  	      next->add_val = add_val;
  	      v->dest_reg = dest_reg;
  	      v->giv_type = DEST_REG;
  	      v->location = &SET_SRC (set);
--- 4229,4306 ----
  		  VARRAY_GROW (n_times_set, nregs);
  		  VARRAY_GROW (may_not_optimize, nregs);
  		}
+ 	      VARRAY_CHAR (may_not_optimize, new_regno) = 0;
      
! 	      if (! validate_change (next->insn, next->location, add_val, 0))
  		{
  		  vp = &v->next_iv;
  		  continue;
  		}
+ 
+ 	      src = SET_SRC (set);
+ 	      /* Try to replace all uses of OLD_REG with SRC.  This will
+ 		 mostly win when it generates / changes address givs, but it
+ 		 might also change some DEST_REG givs or create the odd
+ 		 PEA on an 68k.  */
+ 	      last_use_insn = NULL_RTX;
+ 	      validate_subst_start ();
+ 	      for (p = NEXT_INSN (v->insn); p != next->insn; p = NEXT_INSN (p))
+ 		{
+ 		  rtx newpat;
+ 
+ 		  if (GET_RTX_CLASS (GET_CODE (p)) != 'i')
+ 		    continue;
+ 		  if (! validate_subst (p, old_reg, src))
+ 		    last_use_insn = p;
+ 		}
+ 	      /* If some uses remain, we'd like to make this a DEST_REG
+ 		 giv.  However, after loop unrolling, V->INSN or LAST_USE_INSN
+ 		 might have no valid luid.  We need these not only for
+ 		 calculating the lifetime now, but also in recombine_givs when
+ 		 doing giv derivation, to find givs with non-overlapping
+ 		 lifetimes.  So if we don't have LUIDs available, or if we
+ 		 can't calculate the giv, leave the biv increment alone.  */
+ 	      if (last_use_insn
+ 		  && (INSN_UID (v->insn) >= max_uid_for_loop
+ 		      || INSN_UID (last_use_insn) >= max_uid_for_loop
+ 		      || ! validate_change (v->insn, &SET_DEST (set),
+ 					    dest_reg, 0)))
+ 		{
+ 		  /* Change the increment at NEXT back to what it was.  */
+ 		  if (! validate_change (next->insn, next->location,
+ 		      next->add_val, 0))
+ 		    abort ();
+ 
+ 		  /* Undo all the substitutions made by validate_subst above,
+ 		     since the biv does hold the incremented value after
+ 		     all.  */
+ 		  validate_subst_undo ();
+ 
+ 		  vp = &v->next_iv;
+ 		  continue;
+ 		}
+ 
+ 	      /* Remove the increment from the list of biv increments.  */
+ 	      *vp = next;
+ 	      bl->biv_count--;
+ 	      VARRAY_INT (set_in_loop, old_regno)--;
+ 	      VARRAY_INT (n_times_set, old_regno)--;
  	      next->add_val = add_val;
+ 
+ 	      if (! last_use_insn)
+ 		{
+ 		  if (loop_dump_stream)
+ 		    fprintf (loop_dump_stream,
+ 			     "Increment %d of biv %d eliminated.\n\n",
+ 			 INSN_UID (v->insn), old_regno);
+ 		  PUT_CODE (v->insn, NOTE);
+ 		  NOTE_LINE_NUMBER (v->insn) = NOTE_INSN_DELETED;
+ 		  NOTE_SOURCE_FILE (v->insn) = 0;
+ 		  VARRAY_INT (set_in_loop, new_regno) = 0;
+ 		  VARRAY_INT (n_times_set, new_regno) = 0;
+ 		  continue;
+ 		}
+ 
  	      v->dest_reg = dest_reg;
  	      v->giv_type = DEST_REG;
  	      v->location = &SET_SRC (set);
*************** strength_reduce (scan_start, end, loop_t
*** 4224,4244 ****
  	      v->replaceable = 1;
  	      v->no_const_addval = 0;
      
- 	      old_regno = REGNO (old_reg);
- 	      new_regno = REGNO (dest_reg);
- 	      VARRAY_INT (set_in_loop, old_regno)--;
  	      VARRAY_INT (set_in_loop, new_regno) = 1;
- 	      VARRAY_INT (n_times_set, old_regno)--;
  	      VARRAY_INT (n_times_set, new_regno) = 1;
- 	      VARRAY_CHAR (may_not_optimize, new_regno) = 0;
      
  	      REG_IV_TYPE (new_regno) = GENERAL_INDUCT;
  	      REG_IV_INFO (new_regno) = v;
      
! 	      /* Remove the increment from the list of biv increments,
! 		 and record it as a giv.  */
! 	      *vp = next;
! 	      bl->biv_count--;
  	      v->next_iv = bl->giv;
  	      bl->giv = v;
  	      bl->giv_count++;
--- 4322,4334 ----
  	      v->replaceable = 1;
  	      v->no_const_addval = 0;
      
  	      VARRAY_INT (set_in_loop, new_regno) = 1;
  	      VARRAY_INT (n_times_set, new_regno) = 1;
      
  	      REG_IV_TYPE (new_regno) = GENERAL_INDUCT;
  	      REG_IV_INFO (new_regno) = v;
      
! 	      /* Record V as a giv.  */
  	      v->next_iv = bl->giv;
  	      bl->giv = v;
  	      bl->giv_count++;
Index: combine.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/combine.c,v
retrieving revision 1.193
diff -p -r1.193 combine.c
*** combine.c	1999/01/20 17:50:18	1.193
--- combine.c	1999/02/27 00:56:32
*************** static int sets_function_arg_p	PROTO((rt
*** 397,402 ****
--- 397,403 ----
  static int combinable_i3pat	PROTO((rtx, rtx *, rtx, rtx, int, rtx *));
  static rtx try_combine		PROTO((rtx, rtx, rtx));
  static void undo_all		PROTO((void));
+ static void undo_last		PROTO((void));
  static rtx *find_split_point	PROTO((rtx *, rtx));
  static rtx subst		PROTO((rtx, rtx, rtx, int, int));
  static rtx simplify_rtx		PROTO((rtx, enum machine_mode, int, int));
*************** combine_instructions (f, nregs)
*** 674,679 ****
--- 675,681 ----
    total_successes += combine_successes;
  
    nonzero_sign_valid = 0;
+   reg_last_set_value = 0;
  
    /* Make recognizer allow volatile MEMs again.  */
    init_recog ();
*************** undo_all ()
*** 2682,2687 ****
--- 2727,2758 ----
       affected.  */
    subst_prev_insn = NULL_RTX;
  }
+ 
+ /* Undo all the modifications recorded in undobuf after previous_undos.  */
+ 
+ static void
+ undo_last ()
+ {
+   struct undo *undo, *next;
+ 
+   for (undo = undobuf.undos; undo != undobuf.previous_undos; undo = next)
+     {
+       next = undo->next;
+       if (undo->is_int)
+ 	*undo->where.i = undo->old_contents.i;
+       else
+ 	*undo->where.r = undo->old_contents.r;
+ 
+       undo->next = undobuf.frees;
+       undobuf.frees = undo;
+     }
+ 
+   undobuf.undos = undobuf.previous_undos;
+ 
+   /* Clear this here, so that subsequent get_last_value calls are not
+      affected.  */
+   subst_prev_insn = NULL_RTX;
+ }
  \f
  /* Find the innermost point within the rtx at LOC, possibly LOC itself,
     where we have an arithmetic expression and return that point.  LOC will
*************** nonzero_bits (x, mode)
*** 7623,7628 ****
--- 7694,7703 ----
  	}
  #endif
  
+       /* If called from loop, the reg_last_* arrays are not set.  */
+       if (! reg_last_set_value)
+ 	return nonzero;
+ 
        /* If X is a register whose nonzero bits value is current, use it.
  	 Otherwise, if X is a register whose value we can find, use that
  	 value.  Otherwise, use the previously-computed global nonzero bits
*************** num_sign_bit_copies (x, mode)
*** 8018,8023 ****
--- 8093,8102 ----
  	return GET_MODE_BITSIZE (Pmode) - GET_MODE_BITSIZE (ptr_mode) + 1;
  #endif
  
+       /* If called from loop, the reg_last_* arrays are not set.  */
+       if (! reg_last_set_value)
+ 	return 1;
+ 
        if (reg_last_set_value[REGNO (x)] != 0
  	  && reg_last_set_mode[REGNO (x)] == mode
  	  && (REG_N_SETS (REGNO (x)) == 1
*************** get_last_value (x)
*** 10924,10930 ****
        && (value = get_last_value (SUBREG_REG (x))) != 0)
      return gen_lowpart_for_combine (GET_MODE (x), value);
  
!   if (GET_CODE (x) != REG)
      return 0;
  
    regno = REGNO (x);
--- 11003,11010 ----
        && (value = get_last_value (SUBREG_REG (x))) != 0)
      return gen_lowpart_for_combine (GET_MODE (x), value);
  
!   /* If called from loop, the reg_last_* arrays are not set.  */
!   if (GET_CODE (x) != REG || ! reg_last_set_value)
      return 0;
  
    regno = REGNO (x);
*************** insn_cuid (insn)
*** 12086,12091 ****
--- 12166,12251 ----
      abort ();
  
    return INSN_CUID (insn);
+ }
+ \f
+ void
+ validate_subst_start ()
+ {
+   undobuf.undos = 0;
+ 
+   /* Save the current high-water-mark so we can free storage if we didn't
+      accept this set of combinations.  */
+   undobuf.storage = (char *) oballoc (0);
+ }
+ 
+ void
+ validate_subst_undo ()
+ {
+   undo_all ();
+ }
+ 
+ /* Replace FROM with to throughout in INSN, and make simplifications.
+    Return nonzero for success.  */
+ int
+ validate_subst (insn, from, to)
+      rtx insn, from, to;
+ {
+   rtx pat, new_pat;
+   int i;
+ 
+   pat = PATTERN (insn);
+ 
+   /* If from is not mentioned in PAT, we don't need to grind it through
+      subst.  This can safe some time, and also avoids suprious failures
+      when a simplification is not recognized as a valid insn.  */
+   if (! reg_mentioned_p (from, pat))
+     {
+       /* But we still have to make sure a REG_EQUAL note gets updated.  */
+       rtx note = find_reg_note (insn, REG_EQUAL, NULL_RTX);
+ 
+       if (note)
+ 	XEXP (note, 0) = subst (XEXP (note, 0), from, to, 0, 1);
+       return 1;
+     }
+ 
+   /* We have to set previous_undos to prevent gen_rtx_combine from re-using
+      some piece of shared rtl.  */
+   undobuf.previous_undos = undobuf.undos;
+ 
+   subst_insn = insn;
+ 
+   new_pat = subst (pat, from, to, 0, 1);
+ 
+   /* If PAT is a PARALLEL, check to see if it contains the CLOBBER
+      we use to indicate that something didn't match.  If we find such a
+      thing, force rejection.
+      This is the same test as in recog_for_combine; we can't use the that
+      function here because it tries to use data flow information.  */
+   if (GET_CODE (pat) == PARALLEL)
+     for (i = XVECLEN (pat, 0) - 1; i >= 0; i--)
+       if (GET_CODE (XVECEXP (pat, 0, i)) == CLOBBER
+ 	  && XEXP (XVECEXP (pat, 0, i), 0) == const0_rtx)
+     {
+       undo_last ();
+       return 0;
+     }
+ 
+   /* Change INSN to a nop so that validate_change is forced to re-recognize.  */
+   PATTERN (insn) = const0_rtx;
+   if (validate_change (insn, &PATTERN (insn), new_pat, 0))
+     {
+       rtx note = find_reg_note (insn, REG_EQUAL, NULL_RTX);
+ 
+       if (note)
+ 	XEXP (note, 0) = subst (XEXP (note, 0), from, to, 0, 1);
+       return 1;
+     }
+   else
+     {
+       PATTERN (insn) = pat;
+       undo_last ();
+       return 0;
+     }
  }
  \f
  void

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
       [not found]   ` < 199902260600.WAA24238@rtl.cygnus.com >
@ 1999-02-26 17:28     ` Joern Rennecke
  1999-02-28 22:53       ` Joern Rennecke
  0 siblings, 1 reply; 20+ messages in thread
From: Joern Rennecke @ 1999-02-26 17:28 UTC (permalink / raw)
  To: Jim Wilson; +Cc: zack, law, egcs

> The current code always does this:
>             VARRAY_CHAR (may_not_optimize, new_regno) = 0;
> The patched code only does this if an increment is eliminated.  I don't
> understand how that can be right unless the current code is actually wrong.

Yes, this is really only important when the increment is converted but
not eliminated (because only then is the register used at all), but best
done unconditionally.  I changed the code accordingly.

> The patch calls validate_subst for every instruction, regardless of whether
> it uses old_reg.  validate_subst calls subst and recog for every instruction
> regardless of whether any change was made.  This seems wasteful.  Shouldn't
> we call reg_mentioned_p first, and if it returns true, only then call
> validate_subst?  That might not work if there is a REG_EQUAL note that
> refers to old_reg but the insn doesn't.  Perhaps the reg_mentioned_p check
> should be in validate_subst?

Ok.

> Since we really want to create address givs, maybe we should only do this
> substitution inside MEMs?  Will something useful happen if it gets replaced
> elsewhere?  I guess that will result in creation of a DEST_REG giv, so it
> should still work.

It can change DEST_REG givs using add or lea, and on processors like the
m68k it might even create the odd pea.

> The code in validate_subst to set previous_undos from undos looks unnecessary.
> previous_undos is only used if we call subst multiple times.  If we want
> to undo the effect of the last subst call, then we can restore undos from
> previous_undos.  This doesn't apply to validate_subst.

It is necessary because gen_rtx_combine looks at previous_undos.

> validate_subst assumes that undobuf.undos has some reasonable value at
> the beginning.  I don't think that is a safe assumption.  You should set
> undobuf.undos to zero at the beginning.

Yes.

> Why do you call validate_change in validate_subst?  It would be simpler to
> call recog_insn.  You could then get rid of 3 other statements that fiddle
> wtih PATTERN (insn).  The only reason I see for this is because validate_change
> tries adding/removing clobbers.  In that case, you should call
> recog_for_combine.  It looks safe, and will probably fix a bug, since
> recog_for_combine looks for the (clobber (const_int 0)) stuff that
> combine sometimes creates for invalid substitutions.  These clobbers would
> just be ignored by validate_change.

Originally the ida was that you can use the in_group mechanism of
validate_change - but that wouldn't actually work since subst does
destructive changes.  But yes, the clobber handling is better than calling
recog directly.
I can't use recog_for_combine because it uses data flow information.
In oparticular, reg_dead_at_p looks at basic_block_live_at_start[block].

> validate_subst only tries to fix a REG_EQUAL note if validate_change
> succeeded.  But what if we have an insn that doesn't use the register
> but has a REG_EQUAL note that does?  This could happen for a libcall
> sequence.  It might be possible in other cases.  In loop, the
> DEST_REG giv case immediately after the validate_subst call handles
> reg notes independently.  Either the code in validate_subst is wrong,
> or the code in loop is unnecessarily inefficient.  I think the validate_subst
> code is wrong.

validate_change will succeed also if the register is not mentioned,
since the pattern was forced to be different.

The code in loop handles a different substitution, namely substituting
the biv with a DEST_REG giv.

In the meantime Jeffrey Law has sent me a testcase that pointed out a bug
in the code that I was patching: if a DEST_REG giv is created, we need
v->insn and last_use_insn to have valid luids.  There was no provision to
guarantee this, so that we can get failures when an inner loop has been
unrolled, and we then process the outer loop which contains a number of
insns with uids > max_uid_for_loop.

This has made a change of plans necessary: it must be possible to undo
all substitutions if we'd otherwise end up with a DEST_REG giv without
valid LUIDs.

Sat Feb 27 00:56:34 1999  J"orn Rennecke <amylaar@cygnus.co.uk>

        * rtl.h (validate_subst, validate_subst_start, validate_subst_undo):
	Declare.
        * loop.c (strength_reduce): In code to convert biv increments to givs,
	try to create address givs.
        * combine.c (undo_last): New functions.
        (validate_subst, validate_subst_start, validate_subst_undo): Likewise.
        (combine_instructions): Before returning, clear reg_last_set_value.
        (nonzero_bits, num_sign_bit_copies, reg_last_value):
        Handle case when reg_last_set_value is not set.

Index: rtl.h
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/rtl.h,v
retrieving revision 1.105.2.3
diff -p -r1.105.2.3 rtl.h
*** rtl.h	1999/02/25 19:28:24	1.105.2.3
--- rtl.h	1999/02/27 00:56:26
*************** extern void add_clobbers		PROTO ((rtx, i
*** 1369,1374 ****
--- 1369,1377 ----
  extern void combine_instructions	PROTO ((rtx, int));
  extern int extended_count		PROTO ((rtx, enum machine_mode, int));
  extern rtx remove_death			PROTO ((int, rtx));
+ extern int validate_subst		PROTO((rtx, rtx, rtx));
+ extern void validate_subst_start	PROTO((void));
+ extern void validate_subst_undo		PROTO((void));
  #ifdef BUFSIZ
  extern void dump_combine_stats		PROTO ((FILE *));
  extern void dump_combine_total_stats	PROTO ((FILE *));
Index: loop.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/loop.c,v
retrieving revision 1.166.2.37
diff -p -r1.166.2.37 loop.c
*** loop.c	1999/02/25 19:28:25	1.166.2.37
--- loop.c	1999/02/27 00:56:28
*************** strength_reduce (scan_start, end, loop_t
*** 4162,4168 ****
  	  for (vp = &bl->biv, next = *vp; v = next, next = v->next_iv;)
  	    {
  	      HOST_WIDE_INT offset;
! 	      rtx set, add_val, old_reg, dest_reg, last_use_insn;
  	      int old_regno, new_regno;
  
  	      if (! v->always_executed
--- 4195,4201 ----
  	  for (vp = &bl->biv, next = *vp; v = next, next = v->next_iv;)
  	    {
  	      HOST_WIDE_INT offset;
! 	      rtx set, src, add_val, old_reg, dest_reg, last_use_insn;
  	      int old_regno, new_regno;
  
  	      if (! v->always_executed
*************** strength_reduce (scan_start, end, loop_t
*** 4182,4187 ****
--- 4215,4222 ----
  	      add_val = plus_constant (next->add_val, offset);
  	      old_reg = v->dest_reg;
  	      dest_reg = gen_reg_rtx (v->mode);
+ 	      old_regno = REGNO (old_reg);
+ 	      new_regno = REGNO (dest_reg);
      
  	      /* Unlike reg_iv_type / reg_iv_info, the other three arrays
  		 have been allocated with some slop space, so we may not
*************** strength_reduce (scan_start, end, loop_t
*** 4194,4208 ****
  		  VARRAY_GROW (n_times_set, nregs);
  		  VARRAY_GROW (may_not_optimize, nregs);
  		}
      
! 	      validate_change (v->insn, &SET_DEST (set), dest_reg, 1);
! 	      validate_change (next->insn, next->location, add_val, 1);
! 	      if (! apply_change_group ())
  		{
  		  vp = &v->next_iv;
  		  continue;
  		}
  	      next->add_val = add_val;
  	      v->dest_reg = dest_reg;
  	      v->giv_type = DEST_REG;
  	      v->location = &SET_SRC (set);
--- 4229,4306 ----
  		  VARRAY_GROW (n_times_set, nregs);
  		  VARRAY_GROW (may_not_optimize, nregs);
  		}
+ 	      VARRAY_CHAR (may_not_optimize, new_regno) = 0;
      
! 	      if (! validate_change (next->insn, next->location, add_val, 0))
  		{
  		  vp = &v->next_iv;
  		  continue;
  		}
+ 
+ 	      src = SET_SRC (set);
+ 	      /* Try to replace all uses of OLD_REG with SRC.  This will
+ 		 mostly win when it generates / changes address givs, but it
+ 		 might also change some DEST_REG givs or create the odd
+ 		 PEA on an 68k.  */
+ 	      last_use_insn = NULL_RTX;
+ 	      validate_subst_start ();
+ 	      for (p = NEXT_INSN (v->insn); p != next->insn; p = NEXT_INSN (p))
+ 		{
+ 		  rtx newpat;
+ 
+ 		  if (GET_RTX_CLASS (GET_CODE (p)) != 'i')
+ 		    continue;
+ 		  if (! validate_subst (p, old_reg, src))
+ 		    last_use_insn = p;
+ 		}
+ 	      /* If some uses remain, we'd like to make this a DEST_REG
+ 		 giv.  However, after loop unrolling, V->INSN or LAST_USE_INSN
+ 		 might have no valid luid.  We need these not only for
+ 		 calculating the lifetime now, but also in recombine_givs when
+ 		 doing giv derivation, to find givs with non-overlapping
+ 		 lifetimes.  So if we don't have LUIDs available, or if we
+ 		 can't calculate the giv, leave the biv increment alone.  */
+ 	      if (last_use_insn
+ 		  && (INSN_UID (v->insn) >= max_uid_for_loop
+ 		      || INSN_UID (last_use_insn) >= max_uid_for_loop
+ 		      || ! validate_change (v->insn, &SET_DEST (set),
+ 					    dest_reg, 0)))
+ 		{
+ 		  /* Change the increment at NEXT back to what it was.  */
+ 		  if (! validate_change (next->insn, next->location,
+ 		      next->add_val, 0))
+ 		    abort ();
+ 
+ 		  /* Undo all the substitutions made by validate_subst above,
+ 		     since the biv does hold the incremented value after
+ 		     all.  */
+ 		  validate_subst_undo ();
+ 
+ 		  vp = &v->next_iv;
+ 		  continue;
+ 		}
+ 
+ 	      /* Remove the increment from the list of biv increments.  */
+ 	      *vp = next;
+ 	      bl->biv_count--;
+ 	      VARRAY_INT (set_in_loop, old_regno)--;
+ 	      VARRAY_INT (n_times_set, old_regno)--;
  	      next->add_val = add_val;
+ 
+ 	      if (! last_use_insn)
+ 		{
+ 		  if (loop_dump_stream)
+ 		    fprintf (loop_dump_stream,
+ 			     "Increment %d of biv %d eliminated.\n\n",
+ 			 INSN_UID (v->insn), old_regno);
+ 		  PUT_CODE (v->insn, NOTE);
+ 		  NOTE_LINE_NUMBER (v->insn) = NOTE_INSN_DELETED;
+ 		  NOTE_SOURCE_FILE (v->insn) = 0;
+ 		  VARRAY_INT (set_in_loop, new_regno) = 0;
+ 		  VARRAY_INT (n_times_set, new_regno) = 0;
+ 		  continue;
+ 		}
+ 
  	      v->dest_reg = dest_reg;
  	      v->giv_type = DEST_REG;
  	      v->location = &SET_SRC (set);
*************** strength_reduce (scan_start, end, loop_t
*** 4224,4244 ****
  	      v->replaceable = 1;
  	      v->no_const_addval = 0;
      
- 	      old_regno = REGNO (old_reg);
- 	      new_regno = REGNO (dest_reg);
- 	      VARRAY_INT (set_in_loop, old_regno)--;
  	      VARRAY_INT (set_in_loop, new_regno) = 1;
- 	      VARRAY_INT (n_times_set, old_regno)--;
  	      VARRAY_INT (n_times_set, new_regno) = 1;
- 	      VARRAY_CHAR (may_not_optimize, new_regno) = 0;
      
  	      REG_IV_TYPE (new_regno) = GENERAL_INDUCT;
  	      REG_IV_INFO (new_regno) = v;
      
! 	      /* Remove the increment from the list of biv increments,
! 		 and record it as a giv.  */
! 	      *vp = next;
! 	      bl->biv_count--;
  	      v->next_iv = bl->giv;
  	      bl->giv = v;
  	      bl->giv_count++;
--- 4322,4334 ----
  	      v->replaceable = 1;
  	      v->no_const_addval = 0;
      
  	      VARRAY_INT (set_in_loop, new_regno) = 1;
  	      VARRAY_INT (n_times_set, new_regno) = 1;
      
  	      REG_IV_TYPE (new_regno) = GENERAL_INDUCT;
  	      REG_IV_INFO (new_regno) = v;
      
! 	      /* Record V as a giv.  */
  	      v->next_iv = bl->giv;
  	      bl->giv = v;
  	      bl->giv_count++;
Index: combine.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/combine.c,v
retrieving revision 1.193
diff -p -r1.193 combine.c
*** combine.c	1999/01/20 17:50:18	1.193
--- combine.c	1999/02/27 00:56:32
*************** static int sets_function_arg_p	PROTO((rt
*** 397,402 ****
--- 397,403 ----
  static int combinable_i3pat	PROTO((rtx, rtx *, rtx, rtx, int, rtx *));
  static rtx try_combine		PROTO((rtx, rtx, rtx));
  static void undo_all		PROTO((void));
+ static void undo_last		PROTO((void));
  static rtx *find_split_point	PROTO((rtx *, rtx));
  static rtx subst		PROTO((rtx, rtx, rtx, int, int));
  static rtx simplify_rtx		PROTO((rtx, enum machine_mode, int, int));
*************** combine_instructions (f, nregs)
*** 674,679 ****
--- 675,681 ----
    total_successes += combine_successes;
  
    nonzero_sign_valid = 0;
+   reg_last_set_value = 0;
  
    /* Make recognizer allow volatile MEMs again.  */
    init_recog ();
*************** undo_all ()
*** 2682,2687 ****
--- 2727,2758 ----
       affected.  */
    subst_prev_insn = NULL_RTX;
  }
+ 
+ /* Undo all the modifications recorded in undobuf after previous_undos.  */
+ 
+ static void
+ undo_last ()
+ {
+   struct undo *undo, *next;
+ 
+   for (undo = undobuf.undos; undo != undobuf.previous_undos; undo = next)
+     {
+       next = undo->next;
+       if (undo->is_int)
+ 	*undo->where.i = undo->old_contents.i;
+       else
+ 	*undo->where.r = undo->old_contents.r;
+ 
+       undo->next = undobuf.frees;
+       undobuf.frees = undo;
+     }
+ 
+   undobuf.undos = undobuf.previous_undos;
+ 
+   /* Clear this here, so that subsequent get_last_value calls are not
+      affected.  */
+   subst_prev_insn = NULL_RTX;
+ }
  \f
  /* Find the innermost point within the rtx at LOC, possibly LOC itself,
     where we have an arithmetic expression and return that point.  LOC will
*************** nonzero_bits (x, mode)
*** 7623,7628 ****
--- 7694,7703 ----
  	}
  #endif
  
+       /* If called from loop, the reg_last_* arrays are not set.  */
+       if (! reg_last_set_value)
+ 	return nonzero;
+ 
        /* If X is a register whose nonzero bits value is current, use it.
  	 Otherwise, if X is a register whose value we can find, use that
  	 value.  Otherwise, use the previously-computed global nonzero bits
*************** num_sign_bit_copies (x, mode)
*** 8018,8023 ****
--- 8093,8102 ----
  	return GET_MODE_BITSIZE (Pmode) - GET_MODE_BITSIZE (ptr_mode) + 1;
  #endif
  
+       /* If called from loop, the reg_last_* arrays are not set.  */
+       if (! reg_last_set_value)
+ 	return 1;
+ 
        if (reg_last_set_value[REGNO (x)] != 0
  	  && reg_last_set_mode[REGNO (x)] == mode
  	  && (REG_N_SETS (REGNO (x)) == 1
*************** get_last_value (x)
*** 10924,10930 ****
        && (value = get_last_value (SUBREG_REG (x))) != 0)
      return gen_lowpart_for_combine (GET_MODE (x), value);
  
!   if (GET_CODE (x) != REG)
      return 0;
  
    regno = REGNO (x);
--- 11003,11010 ----
        && (value = get_last_value (SUBREG_REG (x))) != 0)
      return gen_lowpart_for_combine (GET_MODE (x), value);
  
!   /* If called from loop, the reg_last_* arrays are not set.  */
!   if (GET_CODE (x) != REG || ! reg_last_set_value)
      return 0;
  
    regno = REGNO (x);
*************** insn_cuid (insn)
*** 12086,12091 ****
--- 12166,12251 ----
      abort ();
  
    return INSN_CUID (insn);
+ }
+ \f
+ void
+ validate_subst_start ()
+ {
+   undobuf.undos = 0;
+ 
+   /* Save the current high-water-mark so we can free storage if we didn't
+      accept this set of combinations.  */
+   undobuf.storage = (char *) oballoc (0);
+ }
+ 
+ void
+ validate_subst_undo ()
+ {
+   undo_all ();
+ }
+ 
+ /* Replace FROM with to throughout in INSN, and make simplifications.
+    Return nonzero for success.  */
+ int
+ validate_subst (insn, from, to)
+      rtx insn, from, to;
+ {
+   rtx pat, new_pat;
+   int i;
+ 
+   pat = PATTERN (insn);
+ 
+   /* If from is not mentioned in PAT, we don't need to grind it through
+      subst.  This can safe some time, and also avoids suprious failures
+      when a simplification is not recognized as a valid insn.  */
+   if (! reg_mentioned_p (from, pat))
+     {
+       /* But we still have to make sure a REG_EQUAL note gets updated.  */
+       rtx note = find_reg_note (insn, REG_EQUAL, NULL_RTX);
+ 
+       if (note)
+ 	XEXP (note, 0) = subst (XEXP (note, 0), from, to, 0, 1);
+       return 1;
+     }
+ 
+   /* We have to set previous_undos to prevent gen_rtx_combine from re-using
+      some piece of shared rtl.  */
+   undobuf.previous_undos = undobuf.undos;
+ 
+   subst_insn = insn;
+ 
+   new_pat = subst (pat, from, to, 0, 1);
+ 
+   /* If PAT is a PARALLEL, check to see if it contains the CLOBBER
+      we use to indicate that something didn't match.  If we find such a
+      thing, force rejection.
+      This is the same test as in recog_for_combine; we can't use the that
+      function here because it tries to use data flow information.  */
+   if (GET_CODE (pat) == PARALLEL)
+     for (i = XVECLEN (pat, 0) - 1; i >= 0; i--)
+       if (GET_CODE (XVECEXP (pat, 0, i)) == CLOBBER
+ 	  && XEXP (XVECEXP (pat, 0, i), 0) == const0_rtx)
+     {
+       undo_last ();
+       return 0;
+     }
+ 
+   /* Change INSN to a nop so that validate_change is forced to re-recognize.  */
+   PATTERN (insn) = const0_rtx;
+   if (validate_change (insn, &PATTERN (insn), new_pat, 0))
+     {
+       rtx note = find_reg_note (insn, REG_EQUAL, NULL_RTX);
+ 
+       if (note)
+ 	XEXP (note, 0) = subst (XEXP (note, 0), from, to, 0, 1);
+       return 1;
+     }
+   else
+     {
+       PATTERN (insn) = pat;
+       undo_last ();
+       return 0;
+     }
  }
  \f
  void

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: strength reduction example
       [not found] <199902192227.WAA15295.cygnus.egcs@phal.cygnus.co.uk>
@ 1999-02-25 22:01 ` Jim Wilson
       [not found]   ` < 199902260600.WAA24238@rtl.cygnus.com >
  1999-02-28 22:53   ` Jim Wilson
  0 siblings, 2 replies; 20+ messages in thread
From: Jim Wilson @ 1999-02-25 22:01 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: zack, law, egcs

I understand what the patch is doing now.

I am still concerned that you are calling subst from loop.  You are doing
it indirectly through validate_subst which is much better, but you are still
calling it from loop.  This has implications for long term maintenance,
since it means anyone modifying combine has to remember that the code must
work called from loop also.  It is likely that people will forget this, and
we will see occasional bugs as a result.  I don't see any easy convenient
alternative though, and the combine changes to make this work seem pretty
simple, so it appears to be worth trying.

The current code always does this:
            VARRAY_CHAR (may_not_optimize, new_regno) = 0;
The patched code only does this if an increment is eliminated.  I don't
understand how that can be right unless the current code is actually wrong.

Before the for loop that calls validate_subst, you should add a comment
explaining that we try to create address givs by replacing reg uses
with (PLUS reg increment).

The patch calls validate_subst for every instruction, regardless of whether
it uses old_reg.  validate_subst calls subst and recog for every instruction
regardless of whether any change was made.  This seems wasteful.  Shouldn't
we call reg_mentioned_p first, and if it returns true, only then call
validate_subst?  That might not work if there is a REG_EQUAL note that
refers to old_reg but the insn doesn't.  Perhaps the reg_mentioned_p check
should be in validate_subst?

Since we really want to create address givs, maybe we should only do this
substitution inside MEMs?  Will something useful happen if it gets replaced
elsewhere?  I guess that will result in creation of a DEST_REG giv, so it
should still work.

Typo in combine.c: sucess -> success.

The code in validate_subst to set previous_undos from undos looks unnecessary.
previous_undos is only used if we call subst multiple times.  If we want
to undo the effect of the last subst call, then we can restore undos from
previous_undos.  This doesn't apply to validate_subst.

validate_subst assumes that undobuf.undos has some reasonable value at
the beginning.  I don't think that is a safe assumption.  You should set
undobuf.undos to zero at the beginning.

Why do you call validate_change in validate_subst?  It would be simpler to
call recog_insn.  You could then get rid of 3 other statements that fiddle
wtih PATTERN (insn).  The only reason I see for this is because validate_change
tries adding/removing clobbers.  In that case, you should call
recog_for_combine.  It looks safe, and will probably fix a bug, since
recog_for_combine looks for the (clobber (const_int 0)) stuff that
combine sometimes creates for invalid substitutions.  These clobbers would
just be ignored by validate_change.

validate_subst only tries to fix a REG_EQUAL note if validate_change
succeeded.  But what if we have an insn that doesn't use the register
but has a REG_EQUAL note that does?  This could happen for a libcall
sequence.  It might be possible in other cases.  In loop, the
DEST_REG giv case immediately after the validate_subst call handles
reg notes independently.  Either the code in validate_subst is wrong,
or the code in loop is unnecessarily inefficient.  I think the validate_subst
code is wrong.

Jim

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~1999-02-28 22:53 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1999-02-18  6:51 strength reduction example Zack Weinberg
     [not found] ` < 199902181451.JAA11891@blastula.phys.columbia.edu >
1999-02-18 17:27   ` Joern Rennecke
     [not found]     ` < 199902190127.BAA27142@phal.cygnus.co.uk >
1999-02-18 18:36       ` Joern Rennecke
     [not found]         ` < 199902190235.CAA32082@phal.cygnus.co.uk >
1999-02-18 20:09           ` Zack Weinberg
     [not found]             ` < 199902190406.XAA05899@blastula.phys.columbia.edu >
1999-02-18 20:18               ` Joern Rennecke
     [not found]                 ` < 199902190417.EAA25841@phal.cygnus.co.uk >
1999-02-18 21:15                   ` Joern Rennecke
     [not found]                     ` < 199902190514.FAA10029@phal.cygnus.co.uk >
1999-02-19 14:19                       ` Jim Wilson
     [not found]                         ` < 199902192218.OAA21328@rtl.cygnus.com >
1999-02-19 14:27                           ` Joern Rennecke
1999-02-28 22:53                             ` Joern Rennecke
1999-02-28 22:53                         ` Jim Wilson
1999-02-28 22:53                     ` Joern Rennecke
1999-02-28 22:53                 ` Joern Rennecke
1999-02-28 22:53             ` Zack Weinberg
1999-02-28 22:53         ` Joern Rennecke
1999-02-28 22:53     ` Joern Rennecke
1999-02-28 22:53 ` Zack Weinberg
     [not found] <199902192227.WAA15295.cygnus.egcs@phal.cygnus.co.uk>
1999-02-25 22:01 ` Jim Wilson
     [not found]   ` < 199902260600.WAA24238@rtl.cygnus.com >
1999-02-26 17:28     ` Joern Rennecke
1999-02-28 22:53       ` Joern Rennecke
1999-02-28 22:53   ` Jim Wilson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).