public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: multiple_of_p change
@ 1997-08-23  0:01 Torbjorn Granlund
  0 siblings, 0 replies; 4+ messages in thread
From: Torbjorn Granlund @ 1997-08-23  0:01 UTC (permalink / raw)
  To: egcs

  As the comments in the patch suggest, there are other cases which
  potentially we might want to optimize, but would need further work
  in expand_divmod (ie cases where code generated for EXACT_DIV_EXPR
  is slower than than TRUNC_DIV or FLOOR_DIV).

  [ BTW can someone provide code samples where TRUNC_DIV or FLOOR_DIV
    produces better code than EXACT_DIV that we can examine? ]

I suppose that could happen for powers-of-two.  If there are other cases
that give worse code for EXACT_DIV_EXPR, please let me know.

Here is an untested patch:

	* expmed.c (expand_divmod): Make op1_is_pow2 depend on unsignedp
	for negative constants.  Promote EXACT_DIV_EXPR to TRUNC_DIV_EXPR
	when op1_is_pow2.

*** /u/gcc/ss/970803/expmed.c	Mon May 26 00:54:03 1997
--- expmed.c	Sat Aug 23 01:49:18 1997
***************
*** 2719,2723 ****
    op1_is_pow2 = (op1_is_constant
  		 && ((EXACT_POWER_OF_2_OR_ZERO_P (INTVAL (op1))
! 		      || EXACT_POWER_OF_2_OR_ZERO_P (-INTVAL (op1)))));
  
    /*
--- 2719,2723 ----
    op1_is_pow2 = (op1_is_constant
  		 && ((EXACT_POWER_OF_2_OR_ZERO_P (INTVAL (op1))
! 		      || (! unsignedp && EXACT_POWER_OF_2_OR_ZERO_P (-INTVAL (op1))))));
  
    /*
***************
*** 2855,2858 ****
--- 2855,2860 ----
        if (code == FLOOR_MOD_EXPR)
  	code = TRUNC_MOD_EXPR;
+       if (code == EXACT_DIV_EXPR && op1_is_pow2)
+ 	code = TRUNC_DIV_EXPR;
      }
  

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: multiple_of_p change
  1997-10-24 16:16 ` multiple_of_p change Christian Iseli
@ 1997-11-02 21:29   ` Jeffrey A Law
  0 siblings, 0 replies; 4+ messages in thread
From: Jeffrey A Law @ 1997-11-02 21:29 UTC (permalink / raw)
  To: Christian Iseli; +Cc: egcs

  In message <199710242313.BAA08816@Rivendell.MiddleEarth.net>you write:
  > During the processing, the subreg case around line 7390 of cse.c is entered.
  > a classp is found, and its exp field is (subreg:SI (reg/v:SF 35) 0).  At the 
  > next loop iteration, new_src is set from gen_lowpart_if_possible to the same
  > expression (subreg:SI (reg/v:SF 35) 0).
  > The *bad* thing is that HASH produces a different value than it did when 
  > classp->exp was hashed.
OK.  This is where I'd suggest you concentrate your effort -- figure
out how/why the same RTL hashes differently from the two different
call sites.

jeff

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: multiple_of_p change
  1997-10-20  9:43 Small problem in cse Jeffrey A Law
@ 1997-10-24 16:16 ` Christian Iseli
  1997-11-02 21:29   ` Jeffrey A Law
  0 siblings, 1 reply; 4+ messages in thread
From: Christian Iseli @ 1997-10-24 16:16 UTC (permalink / raw)
  To: law; +Cc: egcs

>> Sorry, I meant to say why is classp->first_same_value NULL?

>> From my review of the code I don't see that classp->first_same_value
>> should ever be NULL -- thus I suspect something has gone wrong
>> elsewhere that needs to be investigated.

>> But I could be wrong, since you've got a target & testcase which
>> triggers this problem you'll need to do some of the analysis.

>Ok, I'll do my best and keep you posted...

Well, it seems you were right about the fact that first_same_value should never
be null... but here is what I managed to observe so far...

cse_insn is called with the following insn
(insn 4708 4707 4709 (set (subreg:SF (reg/v:SI 47) 0)
        (const_double:SF (const_int 0) 0 1076953088)) 4 {movsf} (nil)
    (expr_list:REG_EQUAL (minus:SF (const_double:SF (const_int 0) 0 1077018624)
            (const_double:SF (cc0) 0 1072693248))
        (nil)))
This is for an 8-bit target, where the source is attempting to do a 
pre-decrement of a long double
number.  The target defines float as TQF (24 bits), double and long double are 
both SF (32 bits).

During the processing, the subreg case around line 7390 of cse.c is entered.
a classp is found, and its exp field is (subreg:SI (reg/v:SF 35) 0).  At the 
next loop iteration,
new_src is set from gen_lowpart_if_possible to the same expression (subreg:SI 
(reg/v:SF 35) 0).
The *bad* thing is that HASH produces a different value than it did when 
classp->exp was hashed.
So, when insert_regs is called, the element pointed to by classp is deleted 
from the hash table
and thus classp->first_same_value becomes 0...

But now I'm stuck (and tired... ;-).  What would be the right thing(tm) to do 
now?

					Christian



^ permalink raw reply	[flat|nested] 4+ messages in thread

* multiple_of_p change
  1997-08-22 19:47 fixing the c++/f77 circular dependency H.J. Lu
@ 1997-08-22 23:04 ` Jeffrey A Law
  0 siblings, 0 replies; 4+ messages in thread
From: Jeffrey A Law @ 1997-08-22 23:04 UTC (permalink / raw)
  To: egcs

I _finally_ got around to looking at the multiple_of_p change.


The canonical C example from Toon is:

void copy3(int n; float a[n][n][n], float b[n][n][n], int n)
{
        int i, j, k;

        for (i = 0; i < n; i++)
                for (j = 0; j < n; j++)
                        for (k = 0; k < n; k++)
                                b[i][j][k] = a[i][j][k];
}


On my PA, the inner loop expands to about 25 insns; with the
multiple_of_p patch it's only 5 insns.

I don't know how often this kind of thing normally happens in C,
but apparently it's quite common in Fortran.  So it's worth trying
to optimize.

As the comments in the patch suggest, there are other cases which
potentially we might want to optimize, but would need further work
in expand_divmod (ie cases where code generated for EXACT_DIV_EXPR
is slower than than TRUNC_DIV or FLOOR_DIV).

[ BTW can someone provide code samples where TRUNC_DIV or FLOOR_DIV
  produces better code than EXACT_DIV that we can examine? ]

However, since one of things we want to do with this project is
be more pragmatic about this kind of stuff, does anyone see a
reason why this patch couldn't go in (after some comment updates
of course).

Index: fold-const.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/fold-const.c,v
retrieving revision 1.105
diff -c -3 -p -r1.105 fold-const.c
*** fold-const.c	1997/08/08 17:58:39	1.105
--- fold-const.c	1997/08/22 23:00:33
*************** static tree fold_range_test	PROTO((tree)
*** 107,112 ****
--- 107,113 ----
  static tree unextend		PROTO((tree, int, int, tree));
  static tree fold_truthop	PROTO((enum tree_code, tree, tree, tree));
  static tree strip_compound_expr PROTO((tree, tree));
+ static int multiple_of_p		PROTO((tree, tree, tree));
  
  #ifndef BRANCH_COST
  #define BRANCH_COST 1
*************** fold (expr) 
*** 4571,4576 ****
--- 4572,4589 ----
        if (integer_zerop (arg1))
  	return t;
  
+       /* If arg0 is a multiple of arg1, then rewrite to the fastest div
+ 	 operation, EXACT_DIV_EXPR.  Otherwise, handle folding of
+ 	 general divide.  Note that only CEIL_DIV_EXPR is rewritten now,
+ 	 only because the others seem to be faster in some cases, e.g. the
+ 	 nonoptimized TRUNC_DIV_EXPR or FLOOR_DIV_EXPR on DEC Alpha.  This
+ 	 is probably just due to more work being done on it in expmed.c than
+ 	 on EXACT_DIV_EXPR, and could presumably be fixed, since
+ 	 EXACT_DIV_EXPR should _never_ be slower than *_DIV_EXPR.  */
+       if (code == CEIL_DIV_EXPR
+ 	  && multiple_of_p (type, arg0, arg1))
+ 	return fold (build (EXACT_DIV_EXPR, type, arg0, arg1));
+ 
        /* If we have ((a / C1) / C2) where both division are the same type, try
  	 to simplify.  First see if C1 * C2 overflows or not.  */
        if (TREE_CODE (arg0) == code && TREE_CODE (arg1) == INTEGER_CST
*************** fold (expr) 
*** 5746,5748 ****
--- 5759,5862 ----
        return t;
      } /* switch (code) */
  }
+ 
+ /* Determine if first argument is a multiple of second argument.
+    Return 0 if it is not, or is not easily determined to so be.
+ 
+    An example of the sort of thing we care about (at this point --
+    this routine could surely be made more general, and expanded
+    to do what the *_DIV_EXPR's fold() cases do now) is discovering
+    that
+ 
+      SAVE_EXPR (I) * SAVE_EXPR (J * 8)
+ 
+    is a multiple of
+ 
+      SAVE_EXPR (J * 8)
+ 
+    when we know that the two `SAVE_EXPR (J * 8)' nodes are the
+    same node (which means they will have the same value at run
+    time, even though we don't know when they'll be assigned).
+ 
+    This code also handles discovering that
+ 
+      SAVE_EXPR (I) * SAVE_EXPR (J * 8)
+ 
+    is a multiple of
+ 
+      8
+ 
+    (of course) so we don't have to worry about dealing with a
+    possible remainder.
+ 
+    Note that we _look_ inside a SAVE_EXPR only to determine
+    how it was calculated; it is not safe for fold() to do much
+    of anything else with the internals of a SAVE_EXPR, since
+    fold() cannot know when it will be evaluated at run time.
+    For example, the latter example above _cannot_ be implemented
+    as
+ 
+      SAVE_EXPR (I) * J
+ 
+    or any variant thereof, since the value of J at evaluation time
+    of the original SAVE_EXPR is not necessarily the same at the time
+    the new expression is evaluated.  The only optimization of this
+    sort that would be valid is changing
+ 
+      SAVE_EXPR (I) * SAVE_EXPR (SAVE_EXPR (J) * 8)
+    divided by
+      8
+ 
+    to
+ 
+      SAVE_EXPR (I) * SAVE_EXPR (J)
+ 
+    (where the same SAVE_EXPR (J) is used in the original and the
+    transformed version).  */
+ 
+ static int
+ multiple_of_p (type, top, bottom)
+      tree type;
+      tree top;
+      tree bottom;
+ {
+   if (operand_equal_p (top, bottom, 0))
+     return 1;
+ 
+   if (TREE_CODE (type) != INTEGER_TYPE)
+     return 0;
+ 
+   switch (TREE_CODE (top))
+     {
+     case MULT_EXPR:
+       return (multiple_of_p (type, TREE_OPERAND (top, 0), bottom)
+ 	      || multiple_of_p (type, TREE_OPERAND (top, 1), bottom));
+ 
+     case PLUS_EXPR:
+     case MINUS_EXPR:
+       return (multiple_of_p (type, TREE_OPERAND (top, 0), bottom)
+ 	      && multiple_of_p (type, TREE_OPERAND (top, 1), bottom));
+ 
+     case NOP_EXPR:
+       /* Punt if conversion from non-integral or wider integral type.  */
+       if ((TREE_CODE (TREE_TYPE (TREE_OPERAND (top, 0))) != INTEGER_TYPE)
+ 	  || (TYPE_PRECISION (type)
+ 	      < TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (top, 0)))))
+ 	return 0;
+       /* Fall through. */
+     case SAVE_EXPR:
+       return multiple_of_p (type, TREE_OPERAND (top, 0), bottom);
+ 
+     case INTEGER_CST:
+       if ((TREE_CODE (bottom) != INTEGER_CST)
+ 	  || (tree_int_cst_sgn (top) < 0)
+ 	  || (tree_int_cst_sgn (bottom) < 0))
+ 	return 0;
+       return integer_zerop (const_binop (TRUNC_MOD_EXPR,
+ 					 top, bottom, 0));
+ 
+     default:
+       return 0;
+     }
+ }
+ \f

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~1997-11-02 21:29 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1997-08-23  0:01 multiple_of_p change Torbjorn Granlund
  -- strict thread matches above, loose matches on Subject: below --
1997-10-20  9:43 Small problem in cse Jeffrey A Law
1997-10-24 16:16 ` multiple_of_p change Christian Iseli
1997-11-02 21:29   ` Jeffrey A Law
1997-08-22 19:47 fixing the c++/f77 circular dependency H.J. Lu
1997-08-22 23:04 ` multiple_of_p change Jeffrey A Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).