array reference optimization

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* array reference optimization
@ 1998-05-13 22:33 Richard Henderson
  1998-05-16 19:47 ` Richard Henderson
  1998-05-17 17:58 ` Jim Wilson
  0 siblings, 2 replies; 4+ messages in thread
From: Richard Henderson @ 1998-05-13 22:33 UTC (permalink / raw)
  To: egcs

I believe that I've found the reason that Fortran loop induction
variables have not been being recognized recently.  This bit from
get_inner_reference

           index = fold (build (MULT_EXPR, sbitsizetype, index,
                                convert (sbitsizetype,
                                         TYPE_SIZE (TREE_TYPE (exp)))));

produces and conversion to TImode, which is later fed through a 
division to produce the offset.  While the individual instructions
that are produced from this aren't so bad, the convolutions are too
much for loop.

But given that I've complained about the division producing less than
optimal code in the past, I think that the proper way to fix the problem
is to do the array index calculation in units rather than bits.

The following patch implements this.


r~



Wed May 13 17:14:56 1998  Richard Henderson  <rth@cygnus.com>

	* tree.h (TYPE_SIZE_UNIT): New.
	(struct tree_type): Add size_unit member.
	* stor-layout.c (layout_type): Initialize it.
	* expr.c (get_inner_reference) [ARRAY_REF]: Use it.


Index: expr.c
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/expr.c,v
retrieving revision 1.56
diff -c -p -d -r1.56 expr.c
*** expr.c	1998/05/13 12:40:13	1.56
--- expr.c	1998/05/14 00:13:10
*************** get_inner_reference (exp, pbitsize, pbit
*** 4374,4379 ****
--- 4374,4380 ----
  	  tree low_bound
  	    = domain ? TYPE_MIN_VALUE (domain) : integer_zero_node;
  	  tree index_type = TREE_TYPE (index);
+ 	  tree xindex;
  
  	  if (TYPE_PRECISION (index_type) != TYPE_PRECISION (sizetype))
  	    {
*************** get_inner_reference (exp, pbitsize, pbit
*** 4391,4411 ****
  	      index_type = TREE_TYPE (index);
  	    }
  
! 	  index = fold (build (MULT_EXPR, sbitsizetype, index,
! 			       convert (sbitsizetype,
! 					TYPE_SIZE (TREE_TYPE (exp)))));
  
! 	  if (TREE_CODE (index) == INTEGER_CST
! 	      && TREE_INT_CST_HIGH (index) == 0)
! 	    *pbitpos += TREE_INT_CST_LOW (index);
  	  else
  	    {
! 	      if (contains_placeholder_p (index))
! 		index = build (WITH_RECORD_EXPR, sizetype, index, exp);
  
! 	      offset = size_binop (PLUS_EXPR, offset,
! 				   size_binop (FLOOR_DIV_EXPR, index,
! 					       size_int (BITS_PER_UNIT)));
  	    }
  	}
        else if (TREE_CODE (exp) != NON_LVALUE_EXPR
--- 4392,4426 ----
  	      index_type = TREE_TYPE (index);
  	    }
  
! 	  xindex = fold (build (MULT_EXPR, sbitsizetype, index,
! 			        convert (sbitsizetype,
! 					 TYPE_SIZE (TREE_TYPE (exp)))));
  
! 	  if (TREE_CODE (xindex) == INTEGER_CST
! 	      && TREE_INT_CST_HIGH (xindex) == 0)
! 	    *pbitpos += TREE_INT_CST_LOW (xindex);
  	  else
  	    {
! 	      if (TYPE_SIZE_UNIT (TREE_TYPE (exp)) != 0)
! 		{
! 		  xindex = fold (build (MULT_EXPR, ssizetype, index,
!                                         convert (ssizetype,
!                                          TYPE_SIZE_UNIT (TREE_TYPE (exp)))));
  
! 		  if (contains_placeholder_p (xindex))
! 		    xindex = build (WITH_RECORD_EXPR, sizetype, xindex, exp);
! 
! 	          offset = size_binop (PLUS_EXPR, offset, xindex);
! 		}
! 	      else
! 		{
! 		  if (contains_placeholder_p (xindex))
! 		    xindex = build (WITH_RECORD_EXPR, sizetype, xindex, exp);
! 
! 	          offset = size_binop (PLUS_EXPR, offset,
! 				       size_binop (FLOOR_DIV_EXPR, xindex,
! 					           size_int (BITS_PER_UNIT)));
! 		}
  	    }
  	}
        else if (TREE_CODE (exp) != NON_LVALUE_EXPR
Index: stor-layout.c
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/stor-layout.c,v
retrieving revision 1.12
diff -c -p -d -r1.12 stor-layout.c
*** stor-layout.c	1998/05/06 04:53:58	1.12
--- stor-layout.c	1998/05/14 00:13:10
*************** layout_type (type)
*** 715,725 ****
--- 715,727 ----
        TYPE_MODE (type) = smallest_mode_for_size (TYPE_PRECISION (type),
  						 MODE_INT);
        TYPE_SIZE (type) = bitsize_int (GET_MODE_BITSIZE (TYPE_MODE (type)), 0L);
+       TYPE_SIZE_UNIT (type) = size_int (GET_MODE_SIZE (TYPE_MODE (type)));
        break;
  
      case REAL_TYPE:
        TYPE_MODE (type) = mode_for_size (TYPE_PRECISION (type), MODE_FLOAT, 0);
        TYPE_SIZE (type) = bitsize_int (GET_MODE_BITSIZE (TYPE_MODE (type)), 0L);
+       TYPE_SIZE_UNIT (type) = size_int (GET_MODE_SIZE (TYPE_MODE (type)));
        break;
  
      case COMPLEX_TYPE:
*************** layout_type (type)
*** 730,758 ****
  			  ? MODE_COMPLEX_INT : MODE_COMPLEX_FLOAT),
  			 0);
        TYPE_SIZE (type) = bitsize_int (GET_MODE_BITSIZE (TYPE_MODE (type)), 0L);
        break;
  
      case VOID_TYPE:
        TYPE_SIZE (type) = size_zero_node;
        TYPE_ALIGN (type) = 1;
        TYPE_MODE (type) = VOIDmode;
        break;
  
      case OFFSET_TYPE:
        TYPE_SIZE (type) = bitsize_int (POINTER_SIZE, 0L);
        TYPE_MODE (type) = ptr_mode;
        break;
  
      case FUNCTION_TYPE:
      case METHOD_TYPE:
        TYPE_MODE (type) = mode_for_size (2 * POINTER_SIZE, MODE_INT, 0);
!       TYPE_SIZE (type) = size_int (2 * POINTER_SIZE);
        break;
  
      case POINTER_TYPE:
      case REFERENCE_TYPE:
        TYPE_MODE (type) = ptr_mode;
        TYPE_SIZE (type) = bitsize_int (POINTER_SIZE, 0L);
        TREE_UNSIGNED (type) = 1;
        TYPE_PRECISION (type) = POINTER_SIZE;
        break;
--- 732,765 ----
  			  ? MODE_COMPLEX_INT : MODE_COMPLEX_FLOAT),
  			 0);
        TYPE_SIZE (type) = bitsize_int (GET_MODE_BITSIZE (TYPE_MODE (type)), 0L);
+       TYPE_SIZE_UNIT (type) = size_int (GET_MODE_SIZE (TYPE_MODE (type)));
        break;
  
      case VOID_TYPE:
        TYPE_SIZE (type) = size_zero_node;
+       TYPE_SIZE_UNIT (type) = size_zero_node;
        TYPE_ALIGN (type) = 1;
        TYPE_MODE (type) = VOIDmode;
        break;
  
      case OFFSET_TYPE:
        TYPE_SIZE (type) = bitsize_int (POINTER_SIZE, 0L);
+       TYPE_SIZE_UNIT (type) = size_int (POINTER_SIZE / BITS_PER_UNIT);
        TYPE_MODE (type) = ptr_mode;
        break;
  
      case FUNCTION_TYPE:
      case METHOD_TYPE:
        TYPE_MODE (type) = mode_for_size (2 * POINTER_SIZE, MODE_INT, 0);
!       TYPE_SIZE (type) = bitsize_int (2 * POINTER_SIZE, 0);
!       TYPE_SIZE_UNIT (type) = size_int ((2 * POINTER_SIZE) / BITS_PER_UNIT);
        break;
  
      case POINTER_TYPE:
      case REFERENCE_TYPE:
        TYPE_MODE (type) = ptr_mode;
        TYPE_SIZE (type) = bitsize_int (POINTER_SIZE, 0L);
+       TYPE_SIZE_UNIT (type) = size_int (POINTER_SIZE / BITS_PER_UNIT);
        TREE_UNSIGNED (type) = 1;
        TYPE_PRECISION (type) = POINTER_SIZE;
        break;
*************** layout_type (type)
*** 805,810 ****
--- 812,828 ----
  
  	    TYPE_SIZE (type) = size_binop (MULT_EXPR, TYPE_SIZE (element),
  					   length);
+ 
+ 	    /* If we know the size of the element, calculate the total
+ 	       size directly, rather than do some division thing below.
+ 	       This optimization helps Fortran assumed-size arrays
+ 	       (where the size of the array is determined at runtime)
+ 	       substantially.  */
+ 	    if (TYPE_SIZE_UNIT (element) != 0)
+ 	      {
+ 	        TYPE_SIZE_UNIT (type)
+ 		  = size_binop (MULT_EXPR, TYPE_SIZE_UNIT (element), length);
+ 	      }
  	  }
  
  	/* Now round the alignment and size,
*************** layout_type (type)
*** 819,826 ****
  
  #ifdef ROUND_TYPE_SIZE
  	if (TYPE_SIZE (type) != 0)
! 	  TYPE_SIZE (type)
! 	    = ROUND_TYPE_SIZE (type, TYPE_SIZE (type), TYPE_ALIGN (type));
  #endif
  
  	TYPE_MODE (type) = BLKmode;
--- 837,849 ----
  
  #ifdef ROUND_TYPE_SIZE
  	if (TYPE_SIZE (type) != 0)
! 	  {
! 	    tree tmp;
! 	    tmp = ROUND_TYPE_SIZE (type, TYPE_SIZE (type), TYPE_ALIGN (type));
! 	    if (TYPE_SIZE (type) != tmp)
! 	      TYPE_SIZE_UNIT (type) = NULL;
! 	    TYPE_SIZE (type) = tmp;
! 	  }
  #endif
  
  	TYPE_MODE (type) = BLKmode;
*************** layout_type (type)
*** 978,983 ****
--- 1001,1007 ----
  	  else
  	    TYPE_MODE (type) = mode_for_size (alignment, MODE_INT, 1);
  	  TYPE_SIZE (type) = bitsize_int (rounded_size, 0L);
+ 	  TYPE_SIZE_UNIT (type) = size_int (rounded_size / BITS_PER_UNIT);
  	  TYPE_ALIGN (type) = alignment;
  	  TYPE_PRECISION (type) = size_in_bits;
  	}
*************** layout_type (type)
*** 1010,1015 ****
--- 1034,1052 ----
    if (TYPE_SIZE (type) != 0 && TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST)
      TYPE_SIZE (type) = variable_size (TYPE_SIZE (type));
  
+   /* If we failed to find a simple way to calculate the unit size
+      of the type above, find it by division.  */
+   if (TYPE_SIZE_UNIT (type) == 0 && TYPE_SIZE (type) != 0)
+     {
+       TYPE_SIZE_UNIT (type) = size_binop (FLOOR_DIV_EXPR, TYPE_SIZE (type),
+ 				          size_int (BITS_PER_UNIT));
+     }
+ 
+   /* Once again, evaluate only once, either now or as soon as safe.  */
+   if (TYPE_SIZE_UNIT (type) != 0
+       && TREE_CODE (TYPE_SIZE_UNIT (type)) != INTEGER_CST)
+     TYPE_SIZE_UNIT (type) = variable_size (TYPE_SIZE_UNIT (type));
+ 
    /* Also layout any other variants of the type.  */
    if (TYPE_NEXT_VARIANT (type)
        || type != TYPE_MAIN_VARIANT (type))
*************** layout_type (type)
*** 1017,1022 ****
--- 1054,1060 ----
        tree variant;
        /* Record layout info of this variant.  */
        tree size = TYPE_SIZE (type);
+       tree size_unit = TYPE_SIZE_UNIT (type);
        int align = TYPE_ALIGN (type);
        enum machine_mode mode = TYPE_MODE (type);
  
*************** layout_type (type)
*** 1026,1031 ****
--- 1064,1070 ----
  	   variant = TYPE_NEXT_VARIANT (variant))
  	{
  	  TYPE_SIZE (variant) = size;
+ 	  TYPE_SIZE_UNIT (variant) = size_unit;
  	  TYPE_ALIGN (variant) = align;
  	  TYPE_MODE (variant) = mode;
  	}
Index: tree.h
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/tree.h,v
retrieving revision 1.28
diff -c -p -d -r1.28 tree.h
*** tree.h	1998/05/06 08:36:04	1.28
--- tree.h	1998/05/14 00:13:10
*************** struct tree_block
*** 698,703 ****
--- 698,704 ----
  
  #define TYPE_UID(NODE) ((NODE)->type.uid)
  #define TYPE_SIZE(NODE) ((NODE)->type.size)
+ #define TYPE_SIZE_UNIT(NODE) ((NODE)->type.size_unit)
  #define TYPE_MODE(NODE) ((NODE)->type.mode)
  #define TYPE_VALUES(NODE) ((NODE)->type.values)
  #define TYPE_DOMAIN(NODE) ((NODE)->type.values)
*************** struct tree_type
*** 779,784 ****
--- 780,786 ----
    char common[sizeof (struct tree_common)];
    union tree_node *values;
    union tree_node *size;
+   union tree_node *size_unit;
    union tree_node *attributes;
    unsigned uid;
  

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: array reference optimization
  1998-05-13 22:33 array reference optimization Richard Henderson
@ 1998-05-16 19:47 ` Richard Henderson
  1998-05-17 17:58 ` Jim Wilson
  1 sibling, 0 replies; 4+ messages in thread
From: Richard Henderson @ 1998-05-16 19:47 UTC (permalink / raw)
  To: wilson; +Cc: egcs

Here's a revised version of the patch that changes a few of the
things mentioned yesterday afternoon.  To wit: 

  (1) Since layout_type should always be initializing TYPE_SIZE_UNIT,
      get_inner_reference does not check it for NULL.

  (2) size_in_bytes and int_size_in_bytes use TYPE_SIZE_UNIT.

In addition, I added a bit more commentary.

r~

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: array reference optimization
  1998-05-13 22:33 array reference optimization Richard Henderson
  1998-05-16 19:47 ` Richard Henderson
@ 1998-05-17 17:58 ` Jim Wilson
  1998-05-17 22:03   ` Richard Henderson
  1 sibling, 1 reply; 4+ messages in thread
From: Jim Wilson @ 1998-05-17 17:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: egcs

This looks OK to me.

However, I don't think that adding a new field to the tree types is
strictly necessary to solve this.  One could just as well do the divide
at the point we need it instead of saving it in all tree type nodes.

E.g. instead of what the old code does:
! 	  index = fold (build (MULT_EXPR, sbitsizetype, index,
! 			       convert (sbitsizetype,
! 					TYPE_SIZE (TREE_TYPE (exp)))));
	...
! 	offset = ...		   size_binop (FLOOR_DIV_EXPR, index,
! 					       size_int (BITS_PER_UNIT)));
which requires doing the multiply in bitsizetype, we could do the divide
first and then the multiple, something like this:
	oset = ... fold (build (MULT_EXPR, ssizetype, index,
			size_binop (FLOOR_DIV_EXPR, TYPE_SIZE (...)
				    size_int (BITS_PER_UNIT))))
which keeps everything in range of sizetype.

Jim

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: array reference optimization
  1998-05-17 17:58 ` Jim Wilson
@ 1998-05-17 22:03   ` Richard Henderson
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Henderson @ 1998-05-17 22:03 UTC (permalink / raw)
  To: Jim Wilson, Richard Henderson; +Cc: egcs

On Sun, May 17, 1998 at 02:19:00PM -0700, Jim Wilson wrote:
> which requires doing the multiply in bitsizetype, we could do the divide
> first and then the multiple, something like this:
> 	oset = ... fold (build (MULT_EXPR, ssizetype, index,
> 			size_binop (FLOOR_DIV_EXPR, TYPE_SIZE (...)
> 				    size_int (BITS_PER_UNIT))))
> which keeps everything in range of sizetype.

It keeps it in the range of sizetype, yes, but my patch had a
secondary goal: that of eliminating a divide at runtime.

When the size of a type is not constant, as for the variables `res'
`rhs' and `u' in this simplified version of a function from spec95:

      subroutine resid(res,u,rhs,n,h2i)
      implicit none
      integer n
      double precision res(n,n),rhs(n,n),u(n,n)
      integer i,j
      double precision h2i
      do j = 2, n-1
         do i = 2, n-1
            res(i,j)=-h2i*(u(i+1,j)+u(i-1,j)+u(i,j+1)+u(i,j-1)-
     &          4.d0*u(i,j))+rhs(i,j)
         enddo
      enddo
      end

We get, in part,

	.prologue 1
	bis $20,$20,$9
	ldl $1,0($19)		; Load N
	sll $1,6,$20		; Multiply by 64
[...]
$L9:
	subq $23,1,$8
	s8addq $8,0,$8
	bis $20,$20,$6
	bis $31,$31,$7
	srl $6,3,$6		; Divide by 8

In my mind there is no excuse for this -- we should have only done
the multiply by 8 in the first place.


r~

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~1998-05-17 22:03 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1998-05-13 22:33 array reference optimization Richard Henderson
1998-05-16 19:47 ` Richard Henderson
1998-05-17 17:58 ` Jim Wilson
1998-05-17 22:03   ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).