* More than you ever wanted to know about Fortran array indexing ;-)
@ 1997-10-05 5:02 Toon Moene
1997-10-05 17:49 ` Richard Henderson
0 siblings, 1 reply; 7+ messages in thread
From: Toon Moene @ 1997-10-05 5:02 UTC (permalink / raw)
To: egcs
[ The secret hope behind all this is that someone gets so fed up
with my mails that s/he solves the problem, saving the documentation
for later ... ]
All this pertains to the Alpha platform, dealing with array index
calculations in a different mode than the final address expressions.
Consider the following two examples:
subroutine a(x, y, n)
dimension x(n), y(n)
do i = 2, n-1
x(i) = (y(i-1) + 2 * y(i) + y(i+1))/3
enddo
end
and
subroutine b(x, y, n, m)
dimension x(n, m), y(n, m)
do j = 1, n
do i = 1, m
x(i, j) = y(i, j)
enddo
enddo
end
Up til now, it looked like rank 1 examples were compiled to
efficient code (using Richard Henderson's "expr.c" patch) and rank >
1 examples inefficiently.
These new examples show that that is not the correct way of looking
at this poblem.
The loop of the first looks like this:
$37:
lds $f1,0($4)
addl $3,$31,$1
adds $f1,$f1,$f1
s4addq $1,$17,$1
lds $f10,-4($1)
adds $f10,$f1,$f10
s4addq $5,$17,$1
lds $f1,0($1)
adds $f10,$f1,$f10
divs $f10,$f11,$f10
addq $4,4,$4
addq $3,1,$3
subl $2,1,$2
addl $5,1,$5
sts $f10,0($16)
addq $16,4,$16
bge $2,$37
and the inner loop of the second:
$41:
lds $f1,0($1)
subl $3,1,$3
addq $1,4,$1
sts $f1,0($4)
addq $4,4,$4
bge $3,$41
Apparently, in the last case, the backend is able to get rid of the
multiplications and divisions entirely. In the fist case, although
from first principles one knows that there are only three loads,
one store and four addq's necessary, we still see two s4adds and
several {add.sub}l's which means that not all integer computations
have been converted to `sizetype' mode (on an Alpha: 64 bits, i.e.
*q*uadword).
The conclusion here is very simple: This cannot be solved in the
backend completely. The ....l instructions result fom the
frontend's i-1 and i+1 calculations, which cannot be relegated to
the backend.
So, we find ourselves in a nice cave, surrounded by a twisty maze
of little passages, all inpenetrable ...
"Wie 't weet, mag 't zeggen"
Cheers,
Toon.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: More than you ever wanted to know about Fortran array indexing ;-)
1997-10-05 5:02 More than you ever wanted to know about Fortran array indexing ;-) Toon Moene
@ 1997-10-05 17:49 ` Richard Henderson
1997-10-06 7:02 ` More than you ever wanted to know about Fortran array indexing;-) Toon Moene
1997-10-07 23:14 ` Toon Moene
0 siblings, 2 replies; 7+ messages in thread
From: Richard Henderson @ 1997-10-05 17:49 UTC (permalink / raw)
To: Toon Moene; +Cc: egcs
On Sun, Oct 05, 1997 at 01:53:36PM +0200, Toon Moene wrote:
> The conclusion here is very simple: This cannot be solved in the
> backend completely. The ....l instructions result fom the
> frontend's i-1 and i+1 calculations, which cannot be relegated to
> the backend.
The appended patch, I believe, does this. The result is relatively
good code for all of the examples that have been posted so far.
Only, I would like to see loop take y[i-1], y[i], y[i+1], make one
induction variable, and use -4(R), 0(R), 4(R) in the loop. Hmm...
> "Wie 't weet, mag 't zeggen"
Mande?
r~
Index: com.c
===================================================================
RCS file: /cvs/cvsfiles/egcs/gcc/f/com.c,v
retrieving revision 1.5
diff -u -p -d -r1.5 com.c
--- com.c 1997/09/18 23:30:08 1.5
+++ com.c 1997/10/06 00:28:29
@@ -433,7 +433,7 @@ static ffecomConcatList_ ffecom_concat_l
static void ffecom_debug_kludge_ (tree aggr, char *aggr_type, ffesymbol member,
tree member_type, ffetargetOffset offset);
static void ffecom_do_entry_ (ffesymbol fn, int entrynum);
-static tree ffecom_expr_ (ffebld expr, tree dest_tree,
+static tree ffecom_expr_ (ffebld expr, tree type_tree, tree dest_tree,
ffebld dest, bool *dest_used,
bool assignp);
static tree ffecom_expr_intrinsic_ (ffebld expr, tree dest_tree,
@@ -2666,11 +2666,15 @@ ffecom_do_entry_ (ffesymbol fn, int entr
Recursive descent on expr while making corresponding tree nodes and
attaching type info and such. If destination supplied and compatible
with temporary that would be made in certain cases, temporary isn't
- made, destination used instead, and dest_used flag set TRUE. */
+ made, destination used instead, and dest_used flag set TRUE.
+
+ If TREE_TYPE is non-null, it overrides the type that the expression
+ would normally be computed in. This is most useful for array indices
+ which should be done in sizetype for efficiency. */
#if FFECOM_targetCURRENT == FFECOM_targetGCC
static tree
-ffecom_expr_ (ffebld expr, tree dest_tree,
+ffecom_expr_ (ffebld expr, tree tree_type, tree dest_tree,
ffebld dest, bool *dest_used,
bool assignp)
{
@@ -2680,7 +2684,6 @@ ffecom_expr_ (ffebld expr, tree dest_tre
ffeinfoBasictype bt;
ffeinfoKindtype kt;
tree t;
- tree tree_type;
tree dt; /* decl_tree for an ffesymbol. */
ffesymbol s;
enum tree_code code;
@@ -2692,11 +2695,12 @@ ffecom_expr_ (ffebld expr, tree dest_tre
bt = ffeinfo_basictype (ffebld_info (expr));
kt = ffeinfo_kindtype (ffebld_info (expr));
+ if (!tree_type)
+ tree_type = ffecom_tree_type[bt][kt];
switch (ffebld_op (expr))
{
case FFEBLD_opACCTER:
- tree_type = ffecom_tree_type[bt][kt];
{
ffebitCount i;
ffebit bits = ffebld_accter_bits (expr);
@@ -2760,7 +2764,6 @@ ffecom_expr_ (ffebld expr, tree dest_tre
return list;
case FFEBLD_opARRTER:
- tree_type = ffecom_tree_type[bt][kt];
{
ffetargetOffset i;
@@ -2796,7 +2799,6 @@ ffecom_expr_ (ffebld expr, tree dest_tre
return list;
case FFEBLD_opCONTER:
- tree_type = ffecom_tree_type[bt][kt];
item
= ffecom_constantunion (&ffebld_constant_union (ffebld_conter (expr)),
bt, kt, tree_type);
@@ -2930,54 +2932,46 @@ ffecom_expr_ (ffebld expr, tree dest_tre
t = ffecom_2 (ARRAY_REF,
TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (t))),
t,
- ffecom_expr (dims[--i]));
+ ffecom_expr_ (dims[--i], sizetype, NULL, NULL,
+ NULL, FALSE));
#endif
return t;
}
case FFEBLD_opUPLUS:
- tree_type = ffecom_tree_type[bt][kt];
return ffecom_1 (NOP_EXPR, tree_type, ffecom_expr (ffebld_left (expr)));
case FFEBLD_opPAREN: /* ~~~Make sure Fortran rules respected here */
- tree_type = ffecom_tree_type[bt][kt];
return ffecom_1 (NOP_EXPR, tree_type, ffecom_expr (ffebld_left (expr)));
case FFEBLD_opUMINUS:
- tree_type = ffecom_tree_type[bt][kt];
return ffecom_1 (NEGATE_EXPR, tree_type,
ffecom_expr (ffebld_left (expr)));
case FFEBLD_opADD:
- tree_type = ffecom_tree_type[bt][kt];
return ffecom_2 (PLUS_EXPR, tree_type,
ffecom_expr (ffebld_left (expr)),
ffecom_expr (ffebld_right (expr)));
break;
case FFEBLD_opSUBTRACT:
- tree_type = ffecom_tree_type[bt][kt];
return ffecom_2 (MINUS_EXPR, tree_type,
ffecom_expr (ffebld_left (expr)),
ffecom_expr (ffebld_right (expr)));
case FFEBLD_opMULTIPLY:
- tree_type = ffecom_tree_type[bt][kt];
return ffecom_2 (MULT_EXPR, tree_type,
ffecom_expr (ffebld_left (expr)),
ffecom_expr (ffebld_right (expr)));
case FFEBLD_opDIVIDE:
- tree_type = ffecom_tree_type[bt][kt];
- return
- ffecom_tree_divide_ (tree_type,
- ffecom_expr (ffebld_left (expr)),
- ffecom_expr (ffebld_right (expr)),
- dest_tree, dest, dest_used);
+ return ffecom_tree_divide_ (tree_type,
+ ffecom_expr (ffebld_left (expr)),
+ ffecom_expr (ffebld_right (expr)),
+ dest_tree, dest, dest_used);
case FFEBLD_opPOWER:
- tree_type = ffecom_tree_type[bt][kt];
{
ffebld left = ffebld_left (expr);
ffebld right = ffebld_right (expr);
@@ -3093,12 +3087,10 @@ ffecom_expr_ (ffebld expr, tree dest_tre
}
case FFEBLD_opNOT:
- tree_type = ffecom_tree_type[bt][kt];
switch (bt)
{
case FFEINFO_basictypeLOGICAL:
- item
- = ffecom_truth_value_invert (ffecom_expr (ffebld_left (expr)));
+ item = ffecom_truth_value_invert (ffecom_expr (ffebld_left (expr)));
return convert (tree_type, item);
case FFEINFO_basictypeINTEGER:
@@ -3118,7 +3110,6 @@ ffecom_expr_ (ffebld expr, tree dest_tre
!= FFEINFO_basictypeCHARACTER);
/* Fall through. */
case FFEBLD_opSUBRREF:
- tree_type = ffecom_tree_type[bt][kt];
if (ffeinfo_where (ffebld_info (ffebld_left (expr)))
== FFEINFO_whereINTRINSIC)
{ /* Invocation of an intrinsic. */
@@ -3161,7 +3152,6 @@ ffecom_expr_ (ffebld expr, tree dest_tre
return item;
case FFEBLD_opAND:
- tree_type = ffecom_tree_type[bt][kt];
switch (bt)
{
case FFEINFO_basictypeLOGICAL:
@@ -3185,7 +3175,6 @@ ffecom_expr_ (ffebld expr, tree dest_tre
break;
case FFEBLD_opOR:
- tree_type = ffecom_tree_type[bt][kt];
switch (bt)
{
case FFEINFO_basictypeLOGICAL:
@@ -3210,7 +3199,6 @@ ffecom_expr_ (ffebld expr, tree dest_tre
case FFEBLD_opXOR:
case FFEBLD_opNEQV:
- tree_type = ffecom_tree_type[bt][kt];
switch (bt)
{
case FFEINFO_basictypeLOGICAL:
@@ -3234,7 +3222,6 @@ ffecom_expr_ (ffebld expr, tree dest_tre
break;
case FFEBLD_opEQV:
- tree_type = ffecom_tree_type[bt][kt];
switch (bt)
{
case FFEINFO_basictypeLOGICAL:
@@ -3263,7 +3250,6 @@ ffecom_expr_ (ffebld expr, tree dest_tre
if (ffebld_op (ffebld_left (expr)) == FFEBLD_opANY)
return error_mark_node;
- tree_type = ffecom_tree_type[bt][kt];
switch (bt)
{
case FFEINFO_basictypeLOGICAL:
@@ -3328,8 +3314,6 @@ ffecom_expr_ (ffebld expr, tree dest_tre
code = GE_EXPR;
relational: /* :::::::::::::::::::: */
-
- tree_type = ffecom_tree_type[bt][kt];
switch (ffeinfo_basictype (ffebld_info (ffebld_left (expr))))
{
case FFEINFO_basictypeLOGICAL:
@@ -3471,7 +3455,6 @@ ffecom_expr_ (ffebld expr, tree dest_tre
break;
case FFEBLD_opPERCENT_LOC:
- tree_type = ffecom_tree_type[bt][kt];
item = ffecom_arg_ptr_to_expr (ffebld_left (expr), &list);
return convert (tree_type, item);
@@ -11436,7 +11419,7 @@ ffecom_expand_let_stmt (ffebld dest, ffe
if ((TREE_CODE (dest_tree) != VAR_DECL)
|| TREE_ADDRESSABLE (dest_tree))
- source_tree = ffecom_expr_ (source, dest_tree, dest,
+ source_tree = ffecom_expr_ (source, NULL_TREE, dest_tree, dest,
&dest_used, FALSE);
else
{
@@ -11478,7 +11461,7 @@ ffecom_expand_let_stmt (ffebld dest, ffe
tree
ffecom_expr (ffebld expr)
{
- return ffecom_expr_ (expr, NULL_TREE, NULL, NULL,
+ return ffecom_expr_ (expr, NULL_TREE, NULL_TREE, NULL, NULL,
FALSE);
}
@@ -11489,7 +11472,7 @@ ffecom_expr (ffebld expr)
tree
ffecom_expr_assign (ffebld expr)
{
- return ffecom_expr_ (expr, NULL_TREE, NULL, NULL,
+ return ffecom_expr_ (expr, NULL_TREE, NULL_TREE, NULL, NULL,
TRUE);
}
@@ -11500,7 +11483,7 @@ ffecom_expr_assign (ffebld expr)
tree
ffecom_expr_assign_w (ffebld expr)
{
- return ffecom_expr_ (expr, NULL_TREE, NULL, NULL,
+ return ffecom_expr_ (expr, NULL_TREE, NULL_TREE, NULL, NULL,
TRUE);
}
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: More than you ever wanted to know about Fortran array indexing;-)
1997-10-05 17:49 ` Richard Henderson
@ 1997-10-06 7:02 ` Toon Moene
1997-10-07 23:14 ` Toon Moene
1 sibling, 0 replies; 7+ messages in thread
From: Toon Moene @ 1997-10-06 7:02 UTC (permalink / raw)
To: Richard Henderson; +Cc: egcs
> The appended patch, I believe, does this. The result is
> relatively good code for all of the examples that have
> been posted so far.
Yep, the timing for bs3dvw.f also is around the same value as with
the FFECOM_FASTER_ARRAY_REFS "solution". BTW, it seems to be very
hard to get consistent timings on a Linux Alpha - in five runs I got
anywhere between 3.6 and 4.3 seconds, whereas on my machine it is
consistent up to the tenth of seconds (and it takes 90 seconds !).
> Only, I would like to see loop take y[i-1], y[i], y[i+1],
> make one induction variable, and use -4(R), 0(R), 4(R)
> in the loop. Hmm...
This is a nice follow-on project :-) Note that lots of
computational physics code is based on finite difference or finite
element algorithms, which all have these opportunities for
optimisation.
Or recognise that, on loop unrolling, you've got the following
equivalences:
1 y(i-1) y(i) y(i+1)
2 y(i-1) y(i) y(i+1)
3 y(i-1) y(i) y(i+1)
4 y(i-1) y(i) y(i+1)
Anyway, we can consider this problem "solved" (apply Jeff's `use'
patch, your `expr' patch, my `fold' patch and your `frontend'
patch), unless Craig objects against the frontend changes.
There is namely a pedantic-standards-interpretation question hiding
in your frontend patch: Should the compiler evaluate each and
every expression in the mode it is written in ? Note that
currently, g77+backend as a whole already violate that rule because
of the `sizetype' promotion `expr.c' does. What your frontend patch
does is lift this uncleanliness up into the frontend ...
We'll see - Craig has the last word on this.
Cheers,
Toon.
PS: I'm starting a test now with all those patches applied to
egcs-970929, to see if it still works on my Fortran code ;-)
PS2: My `fold' patch has a typo; somewhere it says:
+ && multiple_of_p (Ttype, REE_OPERAND (xarg0, 1), arg1))
Obviously, this should be:
+ && multiple_of_p (type, TREE_OPERAND (xarg0, 1),
arg1))
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: More than you ever wanted to know about Fortran array indexing;-)
1997-10-05 17:49 ` Richard Henderson
1997-10-06 7:02 ` More than you ever wanted to know about Fortran array indexing;-) Toon Moene
@ 1997-10-07 23:14 ` Toon Moene
1997-10-08 21:19 ` More than you ever wanted to know about Fortran array indexing ;-) Jeffrey A Law
1 sibling, 1 reply; 7+ messages in thread
From: Toon Moene @ 1997-10-07 23:14 UTC (permalink / raw)
To: Richard Henderson; +Cc: egcs
> Only, I would like to see loop take y[i-1], y[i], y[i+1],
> make one induction variable, and use -4(R), 0(R), 4(R)
> in the loop. Hmm...
I wrote that that would be a nice follow on project, but it turns
out that gcc already knows how to do that; unfortunately, the
function that decides about this optimisation almost never applies
it.
I am talking about `combine_givs_p' in loop.c. Basically it says:
If a giv G2 can be expressed as a function of giv G1 and that
function is a valid address expression, and the resulting addressing
mode is cheaper than the original, then do not reduce G2.
I think the cost/benefit reasoning here is flawed. First of all,
if G2 is reduced, you need to include the increment it needs at the
end of the loop in its cost. More important, however, is that you
need an extra register ! Therefore, it is almost always cheaper to
combine those givs than to keep them separate.
Another flaw of combine_givs_p is that it uses ADDRESS_COST, a
macro that's not defined for every architecture gcc supports, but it
doesn't use a backup like RTX_COST if it isn't. This already lead
to one hard to find bug.
I'll try out tonight what happens if I simply bypass its cost
calculation ...
Cheers,
Toon.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: More than you ever wanted to know about Fortran array indexing ;-)
1997-10-07 23:14 ` Toon Moene
@ 1997-10-08 21:19 ` Jeffrey A Law
1997-10-09 15:14 ` More than you ever wanted to know about Fortran array indexing;-) Toon Moene
0 siblings, 1 reply; 7+ messages in thread
From: Jeffrey A Law @ 1997-10-08 21:19 UTC (permalink / raw)
To: Toon Moene; +Cc: Richard Henderson, egcs
In message < 9710080552.AA13313@moene.indiv.nluug.nl >you write:
> I wrote that that would be a nice follow on project, but it turns
> out that gcc already knows how to do that; unfortunately, the
> function that decides about this optimisation almost never applies it.
>
> I am talking about `combine_givs_p' in loop.c. Basically it says:
> If a giv G2 can be expressed as a function of giv G1 and that
> function is a valid address expression, and the resulting addressing
> mode is cheaper than the original, then do not reduce G2.
Yes. On a somewhat related note, as I've mentioned, with my patch to
handle more complex givs (aka the USE patch) I generally get worse code
on my PA. I _believe_ this is due to loop not realizing that many givs
are related and due to the giv cost/benefit analysis not handling more
complicated givs.
jeff
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: More than you ever wanted to know about Fortran array indexing;-)
1997-10-08 21:19 ` More than you ever wanted to know about Fortran array indexing ;-) Jeffrey A Law
@ 1997-10-09 15:14 ` Toon Moene
1997-10-13 9:27 ` More than you ever wanted to know about Fortran array indexing ;-) Jeffrey A Law
0 siblings, 1 reply; 7+ messages in thread
From: Toon Moene @ 1997-10-09 15:14 UTC (permalink / raw)
To: egcs
Jeff wrote:
> On a somewhat related note, as I've mentioned, with my
> patch to handle more complex givs (aka the USE patch) I
> generally get worse code on my PA. I _believe_ this is
> due to loop not realizing that many givs are related and
> due to the giv cost/benefit analysis not handling more
> complicated givs.
Pah, it probably means the PA is broken as an architecture :-) -
I've tried it on machines as far apart as DEC Alpha and Motorola
m68k - all Fortran code I throw at it gets compiled to better code
*with* your `USE' patch; certainly all others (with possibly the x86
as the sole exception) must fall between these extremes ...
Oh, BTW, making combine_givs_p accept all possible candidate givs
for combining does not result in much improvement, because the
return value of this function is but one of the (half a dozen)
conditions that have to be fulfilled before giv G2 can be combined
with giv G1 (see the routine combine_givs, i.e. without `_p').
Obviously, this could work correctly if every giv could have a
_list_ of other givs that could be combined with it, and
combine_givs computed the transitive closure of the
G2-combines-with-G1 relation (checking, in the mean time, that the
computational relationship remained a valid addressing mode).
But that requires `some' more work ;-)
Cheers,
Toon.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: More than you ever wanted to know about Fortran array indexing ;-)
1997-10-09 15:14 ` More than you ever wanted to know about Fortran array indexing;-) Toon Moene
@ 1997-10-13 9:27 ` Jeffrey A Law
0 siblings, 0 replies; 7+ messages in thread
From: Jeffrey A Law @ 1997-10-13 9:27 UTC (permalink / raw)
To: Toon Moene; +Cc: egcs
In message < 9710091754.AA05699@moene.indiv.nluug.nl >you write:
> Pah, it probably means the PA is broken as an architecture :-) -
:-) :-) I've said this more than once myself.
> I've tried it on machines as far apart as DEC Alpha and Motorola
> m68k - all Fortran code I throw at it gets compiled to better code
> *with* your `USE' patch; certainly all others (with possibly the x86
> as the sole exception) must fall between these extremes ...
Hmmm. It's always possible I goof'd something during my benchmark runs.
Basically what I saw was inner loops improving, but one level out there
was _severe_ code explosion -- enough that improvements in the inner loops
were completely negated.
There's also the possibility the slowdown is PA specific due to quirks
in the PA's addressing modes.
> Oh, BTW, making combine_givs_p accept all possible candidate givs
> for combining does not result in much improvement, because the
> return value of this function is but one of the (half a dozen)
> conditions that have to be fulfilled before giv G2 can be combined
> with giv G1 (see the routine combine_givs, i.e. without `_p').
Yup. Found that out myself when I briefly looked at the PA slowdowns.
> Obviously, this could work correctly if every giv could have a
> _list_ of other givs that could be combined with it, and
> combine_givs computed the transitive closure of the
> G2-combines-with-G1 relation (checking, in the mean time, that the
> computational relationship remained a valid addressing mode).
>
> But that requires `some' more work ;-)
A "simple" matter of hacking :-) :-)
jeff
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~1997-10-13 9:27 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1997-10-05 5:02 More than you ever wanted to know about Fortran array indexing ;-) Toon Moene
1997-10-05 17:49 ` Richard Henderson
1997-10-06 7:02 ` More than you ever wanted to know about Fortran array indexing;-) Toon Moene
1997-10-07 23:14 ` Toon Moene
1997-10-08 21:19 ` More than you ever wanted to know about Fortran array indexing ;-) Jeffrey A Law
1997-10-09 15:14 ` More than you ever wanted to know about Fortran array indexing;-) Toon Moene
1997-10-13 9:27 ` More than you ever wanted to know about Fortran array indexing ;-) Jeffrey A Law
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).