From: Craig Burley <burley@gnu.org>
To: davem@dm.cobaltmicro.com
Cc: law@cygnus.com, d.love@dl.ac.uk, egcs@cygnus.com
Subject: ix86 double alignment (was Re: egcs-1.1 release schedule)
Date: Mon, 22 Jun 1998 18:20:00 -0000
Message-id: <199806221829.OAA07477@melange.gnu.org>
References: <199806221217.FAA20123@dm.cobaltmicro.com>
X-SW-Source: 1998-06/msg00769.html

>   Date: Sun, 21 Jun 1998 22:31:31 -0600
>   From: Jeffrey A Law <law@cygnus.com>
>
>	   * The ABI is still going to mandate that some doubles in
>	     argument lists are going to be mis-aligned.  We'd have
>	     to arrange to copy them from the arglist into a suitable
>	     stack slot.  This may be more trouble than it's worth.
>
>And there are still going to be issues with equivalence statements.

Well, I'm willing to not try to do any special aligning for
EQUIVALENCE and COMMON for now.  If we can just get 64-bit
alignment for stack-allocated VAR_DECLs -- which generally
won't include EQUIVALENCE (and certainly not COMMON) -- we'll
have made a *huge* improvement in g77 performance, especially
its *repeatability* of performance measurements.

(Without this improvement, egcs 1.1 will often appear *substantially*
worse than the combination of g77 0.5.22 and gcc 2.7.2.3 on lots of
widely used Fortran code, assuming users are using -malign-double.)

I hope to have a fairly thorough sample program put together soon
(tomorrow?) to illustrate this, but the simple cases we want
to align for now are like

	subroutine x
	double precision a
	...
	end

and:

	subroutine y(n)
	double precision a(n)
	...
	end

The latter uses automatic arrays (which gcc and g77 support); it'd
be great to get those 64-bit aligned as well.  The former is the
most important thing we *aren't* aligning, currently, even with
`-malign-double'.  (It should be aligned especially if `a' is an
array, of course.)

A case we can't 64-bit align is:

	real r(2)
	double precision d1, d2
	equivalence (r(1),d1)
	equivalence (r(2),d2)

Regardless of whether this is stack, static, or even part of a common
block, we can't 64-bit align both d1 and d2.  (Well, not without
an option to completely change the way we implement Fortran; I wonder
if Sun does that to support weird-but-conforming code on SPARCs,
such as the above.)

What we *can* do is *implement* the above, perhaps warning about
the suboptimal alignment.  That is, there's no reason we can't
go ahead and 32-bit align d1 and d2, so one of them is not
64-bit aligned.  The programmer asked for it, after all!
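
To spell out the arithmetic behind that conflict, here is a small sketch
(in Python, purely for illustration).  It assumes 4-byte REAL and 8-byte
DOUBLE PRECISION, as on ix86, and `aligning_residues' is a made-up
helper name, not anything in gcc or g77:

```python
def aligning_residues(double_offsets, align=8):
    """Return the base-address residues (mod align) at which every double,
    given by its byte offset from the start of the storage area, is aligned."""
    return [base for base in range(align)
            if all((base + off) % align == 0 for off in double_offsets)]

# equivalence (r(1),d1) puts d1 at byte offset 0 of the area;
# equivalence (r(2),d2) puts d2 at byte offset 4.
both_64 = aligning_residues([0, 4])           # 64-bit alignment for both
both_32 = aligning_residues([0, 4], align=4)  # 32-bit alignment for both
print(both_64, both_32)
```

The first list comes back empty (no starting address 64-bit aligns both
d1 and d2), while the second does not, which is why 32-bit aligning both
is always available as the fallback.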

What we can also 64-bit align is this:

	real r(2)
	double precision d
	equivalence (r(2),d)

We can do that because we can see that there are no actual *conflicts*
of alignment.  We can implement this by either inserting a dummy
unused 32-bit variable before r(1) and aligning *that* to a 64-bit
boundary (stack or static, doesn't matter), or, if we have a
smart-enough back end (or linker, for static memory I guess), by simply
using a directive that means "align to a 64-bit boundary on bit 32".
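
The dummy-variable approach amounts to the following calculation, shown
as a rough Python sketch.  Again this assumes 4-byte REAL, and
`leading_pad' is a hypothetical name used only here:

```python
def leading_pad(double_offset, align=8):
    """Bytes of dummy padding to put in front of the storage area so that
    the double at `double_offset` lands on an `align`-byte boundary,
    assuming the padding itself starts on such a boundary."""
    return (-double_offset) % align

# equivalence (r(2),d): d starts 4 bytes after r(1).
pad = leading_pad(4)
d_address_mod_8 = (pad + 4) % 8
print(pad, d_address_mod_8)
```

With the pad in place, r(1) sits 32 bits past a 64-bit boundary and d
sits exactly on the next one, which is what the "on bit 32" directive
would ask for directly.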

But it's not *important* to 64-bit align the above EQUIVALENCE case,
certainly not for egcs 1.1.

And what we also need to continue to support is stuff like

	real r1, r2
	real s(6)
	double precision d1, d2
	common r1, d1, r2, d2
	equivalence (r1,s)

which requires that s(1) overlays r1, s(2) and s(3) overlay d1, s(4)
overlays r2, and s(5) and s(6) overlay d2.

Again, we can do this by seeing that there are no "hard" conflicts
(at the machine or ABI level), and punting (and warning?) over the
fact that the "soft" conflicts (the ideal 64-bit alignment of
double for performance reasons) prevent "ideal" alignment.  Again,
"so what", the programmer has specified no 64-bit alignment, so
we don't give it to him in cases like that -- but we can still
compile correct, and fairly fast, ABI-compatible, code.
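
For the COMMON example, the layout and the resulting "soft" conflict can
be worked out mechanically.  This is only a sketch in Python; the sizes
are the assumed ix86 ones and `common_offsets' is an invented name:

```python
SIZES = {"real": 4, "double": 8}   # assumed ix86 sizes in bytes

def common_offsets(members):
    """Lay out COMMON members back to back, with no padding, as Fortran
    storage association requires; return {name: byte offset}."""
    offsets, here = {}, 0
    for name, kind in members:
        offsets[name] = here
        here += SIZES[kind]
    return offsets

layout = common_offsets([("r1", "real"), ("d1", "double"),
                         ("r2", "real"), ("d2", "double")])

# equivalence (r1,s) pins s(1) to r1, so s(i) sits at byte 4*(i-1).
s_index = {name: off // 4 + 1 for name, off in layout.items()}

# d1 and d2 can both be 64-bit aligned only if their offsets agree mod 8.
soft_conflict = layout["d1"] % 8 != layout["d2"] % 8
print(layout, s_index, soft_conflict)
```

Here d1 starts at s(2) and d2 at s(5), matching the overlay described
above, and their offsets differ mod 8, so whichever way the block is
aligned, one of the two doubles ends up only 32-bit aligned.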

I once suggested that the gcc architecture (machine descriptions,
etc.) be modified to include a more fine-grained expression of
alignment requirements, e.g. distinguishing hardware requirements
(even instruction requirements, such as `ld' vs. `ldd' on SPARCv8)
from ABI requirements and from ideal performance settings.  But this
suggestion was turned down at the time -- some seven years ago!
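
As a rough picture of what such a fine-grained description might carry,
here is a Python sketch.  It is not how gcc's machine descriptions work;
the `Alignment' record, the `classify' helper, and the ix86 numbers are
all illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alignment:
    hardware: int    # below this, the access itself is invalid
    abi: int         # what separately compiled code may assume
    preferred: int   # what runs fastest

def classify(address, req):
    """Grade an address against the three levels, strictest failure first."""
    if address % req.hardware:
        return "invalid"
    if address % req.abi:
        return "breaks ABI"
    if address % req.preferred:
        return "correct but slow"
    return "ideal"

# Illustrative values for a double on ix86 (assumptions, not taken from gcc).
IX86_DOUBLE = Alignment(hardware=1, abi=4, preferred=8)

for addr in (6, 12, 16):
    print(addr, classify(addr, IX86_DOUBLE))
```

The point of keeping the three numbers separate is that the EQUIVALENCE
and COMMON cases above only ever violate the last one, so they could be
compiled (with a warning, perhaps) rather than rejected.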

Maybe it's time we finally got this all "right", and I'm sure
willing to help.  But I think we can only manage to get a bit of
it "right" to improve x86 performance for egcs 1.1.

        tq vm, (burley)