* Re: multiple_of_p change
@ 1997-08-23 0:01 Torbjorn Granlund
0 siblings, 0 replies; 4+ messages in thread
From: Torbjorn Granlund @ 1997-08-23 0:01 UTC (permalink / raw)
To: egcs
As the comments in the patch suggest, there are other cases which
potentially we might want to optimize, but would need further work
in expand_divmod (ie cases where code generated for EXACT_DIV_EXPR
is slower than than TRUNC_DIV or FLOOR_DIV).
[ BTW can someone provide code samples where TRUNC_DIV or FLOOR_DIV
produces better code than EXACT_DIV that we can examine? ]
I suppose that could happen for powers-of-two. If there are other cases
that give worse code for EXACT_DIV_EXPR, please let me know.
Here is an untested patch:
* expmed.c (expand_divmod): Make op1_is_pow2 depend on unsignedp
for negative constants. Promote EXACT_DIV_EXPR to TRUNC_DIV_EXPR
when op1_is_pow2.
*** /u/gcc/ss/970803/expmed.c Mon May 26 00:54:03 1997
--- expmed.c Sat Aug 23 01:49:18 1997
***************
*** 2719,2723 ****
op1_is_pow2 = (op1_is_constant
&& ((EXACT_POWER_OF_2_OR_ZERO_P (INTVAL (op1))
! || EXACT_POWER_OF_2_OR_ZERO_P (-INTVAL (op1)))));
/*
--- 2719,2723 ----
op1_is_pow2 = (op1_is_constant
&& ((EXACT_POWER_OF_2_OR_ZERO_P (INTVAL (op1))
! || (! unsignedp && EXACT_POWER_OF_2_OR_ZERO_P (-INTVAL (op1))))));
/*
***************
*** 2855,2858 ****
--- 2855,2860 ----
if (code == FLOOR_MOD_EXPR)
code = TRUNC_MOD_EXPR;
+ if (code == EXACT_DIV_EXPR && op1_is_pow2)
+ code = TRUNC_DIV_EXPR;
}
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: multiple_of_p change
1997-10-24 16:16 ` multiple_of_p change Christian Iseli
@ 1997-11-02 21:29 ` Jeffrey A Law
0 siblings, 0 replies; 4+ messages in thread
From: Jeffrey A Law @ 1997-11-02 21:29 UTC (permalink / raw)
To: Christian Iseli; +Cc: egcs
In message <199710242313.BAA08816@Rivendell.MiddleEarth.net>you write:
> During the processing, the subreg case around line 7390 of cse.c is entered.
> a classp is found, and its exp field is (subreg:SI (reg/v:SF 35) 0). At the
> next loop iteration, new_src is set from gen_lowpart_if_possible to the same
> expression (subreg:SI (reg/v:SF 35) 0).
> The *bad* thing is that HASH produces a different value than it did when
> classp->exp was hashed.
OK. This is where I'd suggest you concentrate your effort -- figure
out how/why the same RTL hashes differently from the two different
call sites.
jeff
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: multiple_of_p change
1997-10-20 9:43 Small problem in cse Jeffrey A Law
@ 1997-10-24 16:16 ` Christian Iseli
1997-11-02 21:29 ` Jeffrey A Law
0 siblings, 1 reply; 4+ messages in thread
From: Christian Iseli @ 1997-10-24 16:16 UTC (permalink / raw)
To: law; +Cc: egcs
>> Sorry, I meant to say why is classp->first_same_value NULL?
>> From my review of the code I don't see that classp->first_same_value
>> should ever be NULL -- thus I suspect something has gone wrong
>> elsewhere that needs to be investigated.
>> But I could be wrong, since you've got a target & testcase which
>> triggers this problem you'll need to do some of the analysis.
>Ok, I'll do my best and keep you posted...
Well, it seems you were right about the fact that first_same_value should never
be null... but here is what I managed to observe so far...
cse_insn is called with the following insn
(insn 4708 4707 4709 (set (subreg:SF (reg/v:SI 47) 0)
(const_double:SF (const_int 0) 0 1076953088)) 4 {movsf} (nil)
(expr_list:REG_EQUAL (minus:SF (const_double:SF (const_int 0) 0 1077018624)
(const_double:SF (cc0) 0 1072693248))
(nil)))
This is for an 8-bit target, where the source is attempting to do a
pre-decrement of a long double
number. The target defines float as TQF (24 bits), double and long double are
both SF (32 bits).
During the processing, the subreg case around line 7390 of cse.c is entered.
a classp is found, and its exp field is (subreg:SI (reg/v:SF 35) 0). At the
next loop iteration,
new_src is set from gen_lowpart_if_possible to the same expression (subreg:SI
(reg/v:SF 35) 0).
The *bad* thing is that HASH produces a different value than it did when
classp->exp was hashed.
So, when insert_regs is called, the element pointed to by classp is deleted
from the hash table
and thus classp->first_same_value becomes 0...
But now I'm stuck (and tired... ;-). What would be the right thing(tm) to do
now?
Christian
^ permalink raw reply [flat|nested] 4+ messages in thread
* multiple_of_p change
1997-08-22 19:47 fixing the c++/f77 circular dependency H.J. Lu
@ 1997-08-22 23:04 ` Jeffrey A Law
0 siblings, 0 replies; 4+ messages in thread
From: Jeffrey A Law @ 1997-08-22 23:04 UTC (permalink / raw)
To: egcs
I _finally_ got around to looking at the multiple_of_p change.
The canonical C example from Toon is:
void copy3(int n; float a[n][n][n], float b[n][n][n], int n)
{
int i, j, k;
for (i = 0; i < n; i++)
for (j = 0; j < n; j++)
for (k = 0; k < n; k++)
b[i][j][k] = a[i][j][k];
}
On my PA, the inner loop expands to about 25 insns; with the
multiple_of_p patch it's only 5 insns.
I don't know how often this kind of thing normally happens in C,
but apparently it's quite common in Fortran. So it's worth trying
to optimize.
As the comments in the patch suggest, there are other cases which
potentially we might want to optimize, but would need further work
in expand_divmod (ie cases where code generated for EXACT_DIV_EXPR
is slower than than TRUNC_DIV or FLOOR_DIV).
[ BTW can someone provide code samples where TRUNC_DIV or FLOOR_DIV
produces better code than EXACT_DIV that we can examine? ]
However, since one of things we want to do with this project is
be more pragmatic about this kind of stuff, does anyone see a
reason why this patch couldn't go in (after some comment updates
of course).
Index: fold-const.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/fold-const.c,v
retrieving revision 1.105
diff -c -3 -p -r1.105 fold-const.c
*** fold-const.c 1997/08/08 17:58:39 1.105
--- fold-const.c 1997/08/22 23:00:33
*************** static tree fold_range_test PROTO((tree)
*** 107,112 ****
--- 107,113 ----
static tree unextend PROTO((tree, int, int, tree));
static tree fold_truthop PROTO((enum tree_code, tree, tree, tree));
static tree strip_compound_expr PROTO((tree, tree));
+ static int multiple_of_p PROTO((tree, tree, tree));
#ifndef BRANCH_COST
#define BRANCH_COST 1
*************** fold (expr)
*** 4571,4576 ****
--- 4572,4589 ----
if (integer_zerop (arg1))
return t;
+ /* If arg0 is a multiple of arg1, then rewrite to the fastest div
+ operation, EXACT_DIV_EXPR. Otherwise, handle folding of
+ general divide. Note that only CEIL_DIV_EXPR is rewritten now,
+ only because the others seem to be faster in some cases, e.g. the
+ nonoptimized TRUNC_DIV_EXPR or FLOOR_DIV_EXPR on DEC Alpha. This
+ is probably just due to more work being done on it in expmed.c than
+ on EXACT_DIV_EXPR, and could presumably be fixed, since
+ EXACT_DIV_EXPR should _never_ be slower than *_DIV_EXPR. */
+ if (code == CEIL_DIV_EXPR
+ && multiple_of_p (type, arg0, arg1))
+ return fold (build (EXACT_DIV_EXPR, type, arg0, arg1));
+
/* If we have ((a / C1) / C2) where both division are the same type, try
to simplify. First see if C1 * C2 overflows or not. */
if (TREE_CODE (arg0) == code && TREE_CODE (arg1) == INTEGER_CST
*************** fold (expr)
*** 5746,5748 ****
--- 5759,5862 ----
return t;
} /* switch (code) */
}
+
+ /* Determine if first argument is a multiple of second argument.
+ Return 0 if it is not, or is not easily determined to so be.
+
+ An example of the sort of thing we care about (at this point --
+ this routine could surely be made more general, and expanded
+ to do what the *_DIV_EXPR's fold() cases do now) is discovering
+ that
+
+ SAVE_EXPR (I) * SAVE_EXPR (J * 8)
+
+ is a multiple of
+
+ SAVE_EXPR (J * 8)
+
+ when we know that the two `SAVE_EXPR (J * 8)' nodes are the
+ same node (which means they will have the same value at run
+ time, even though we don't know when they'll be assigned).
+
+ This code also handles discovering that
+
+ SAVE_EXPR (I) * SAVE_EXPR (J * 8)
+
+ is a multiple of
+
+ 8
+
+ (of course) so we don't have to worry about dealing with a
+ possible remainder.
+
+ Note that we _look_ inside a SAVE_EXPR only to determine
+ how it was calculated; it is not safe for fold() to do much
+ of anything else with the internals of a SAVE_EXPR, since
+ fold() cannot know when it will be evaluated at run time.
+ For example, the latter example above _cannot_ be implemented
+ as
+
+ SAVE_EXPR (I) * J
+
+ or any variant thereof, since the value of J at evaluation time
+ of the original SAVE_EXPR is not necessarily the same at the time
+ the new expression is evaluated. The only optimization of this
+ sort that would be valid is changing
+
+ SAVE_EXPR (I) * SAVE_EXPR (SAVE_EXPR (J) * 8)
+ divided by
+ 8
+
+ to
+
+ SAVE_EXPR (I) * SAVE_EXPR (J)
+
+ (where the same SAVE_EXPR (J) is used in the original and the
+ transformed version). */
+
+ static int
+ multiple_of_p (type, top, bottom)
+ tree type;
+ tree top;
+ tree bottom;
+ {
+ if (operand_equal_p (top, bottom, 0))
+ return 1;
+
+ if (TREE_CODE (type) != INTEGER_TYPE)
+ return 0;
+
+ switch (TREE_CODE (top))
+ {
+ case MULT_EXPR:
+ return (multiple_of_p (type, TREE_OPERAND (top, 0), bottom)
+ || multiple_of_p (type, TREE_OPERAND (top, 1), bottom));
+
+ case PLUS_EXPR:
+ case MINUS_EXPR:
+ return (multiple_of_p (type, TREE_OPERAND (top, 0), bottom)
+ && multiple_of_p (type, TREE_OPERAND (top, 1), bottom));
+
+ case NOP_EXPR:
+ /* Punt if conversion from non-integral or wider integral type. */
+ if ((TREE_CODE (TREE_TYPE (TREE_OPERAND (top, 0))) != INTEGER_TYPE)
+ || (TYPE_PRECISION (type)
+ < TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (top, 0)))))
+ return 0;
+ /* Fall through. */
+ case SAVE_EXPR:
+ return multiple_of_p (type, TREE_OPERAND (top, 0), bottom);
+
+ case INTEGER_CST:
+ if ((TREE_CODE (bottom) != INTEGER_CST)
+ || (tree_int_cst_sgn (top) < 0)
+ || (tree_int_cst_sgn (bottom) < 0))
+ return 0;
+ return integer_zerop (const_binop (TRUNC_MOD_EXPR,
+ top, bottom, 0));
+
+ default:
+ return 0;
+ }
+ }
+ \f
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~1997-11-02 21:29 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1997-08-23 0:01 multiple_of_p change Torbjorn Granlund
-- strict thread matches above, loose matches on Subject: below --
1997-10-20 9:43 Small problem in cse Jeffrey A Law
1997-10-24 16:16 ` multiple_of_p change Christian Iseli
1997-11-02 21:29 ` Jeffrey A Law
1997-08-22 19:47 fixing the c++/f77 circular dependency H.J. Lu
1997-08-22 23:04 ` multiple_of_p change Jeffrey A Law
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).