From mboxrd@z Thu Jan  1 00:00:00 1970
From: Craig Burley <burley@gnu.org>
To: law@cygnus.com
Cc: davem@dm.cobaltmicro.com, d.love@dl.ac.uk, egcs@cygnus.com
Subject: Re: ix86 double alignment (was Re: egcs-1.1 release schedule)
Date: Tue, 23 Jun 1998 05:13:00 -0000
Message-id: <199806230651.CAA02192@melange.gnu.org>
References: <17700.898578696@hurl.cygnus.com>
X-SW-Source: 1998-06/msg00790.html

[Need to study your later email with more technical issues more
carefully next, but here are some quick points.]

>  In message <199806221829.OAA07477@melange.gnu.org> you write:
>  > Well, I'm willing to not try to do any special aligning for
>  > EQUIVALENCE and COMMON for now.  If we can just get 64-bit
>  > alignment for stack-allocated VAR_DECLs -- which generally
>  > won't include EQUIVALENCE (and certainly not COMMON) -- we'll
>  > have made a *huge* improvement in g77 performance, especially
>  > its *repeatability* of performance measurements.
>Yup.  But considering the release schedule, I'd be happy if we could
>just get the stack aligned properly without breaking the ABI, then
>iterate to getting automatic variables aligned relative to the stack.

I'd say getting the stack aligned properly without breaking the ABI
and *also* getting VAR_DECLs that are type `double' aligned within
those frames (whether arrays or scalars) is not only the most
important combination, but solves ~95% of the problems the g77
user community sees.

At least getting this to work when -malign-double is specified
would be kind of a "minimum" for making egcs 1.1 not noticeably
worse than g77 0.5.21 and 0.5.22 were.

>If we can get more done before the release, then great, but I wouldn't
>want to hold things up on this issue if we can avoid it.

I, hesitantly, agree.  Dave rightly points out that he and others
have been yelling about this for, well, it seems like years now.
I still have email from him (weeks ago) asking me to make a more
noticeable push for this on the egcs list, and am now sorry I didn't
take his advice sooner.  I can only plead insanity; I guess it's
all those soccer balls I hit with my head playing for Scotland that
make my brain mushy (just kidding, that's another Craig Burley ;-).

>  > (Without this improvement, egcs 1.1 will often appear *substantially*
>  > worse than the combination of g77 0.5.22 and gcc 2.7.2.3 on lots of
>  > widely used Fortran code, assuming users are using -malign-double.)
>Well, we still have -malign-double as an option for the x86 port, so
>if they use it they presumably would see comparable performance, right?

No.  I haven't yet finished up my totally-over-engineered diagnostic
program to expose all this, but the preliminary results are:

g77 0.5.21, 0.5.22:

  -  Without -malign-double, basically no doubles get aligned properly,
     with the exception of doubles that can be lazily aligned in COMMON
     (without conflicts).  I'm not quite sure why this is; maybe my
     program isn't working quite right, and it could easily be an
     accident that these appear to be aligned, the sort of accident
     the new version of my program should make much less likely.

  -  With -malign-double, stack-based doubles still not aligned
     properly, but static/COMMON ones are (even if it breaks the
     COMMON ABI).

  -  With -malign-double -O, all doubles are aligned properly.  This is
     surprising; I didn't realize one needed -O to get this.  (And, yes,
     this breaks the COMMON ABI, etc.)

g77 0.5.23, egcs 1.0.3:

  -  Without -malign-double, no doubles get aligned properly.

  -  With -malign-double, static and automatic doubles get aligned,
     but not stack ones.  (Automatics are stack-based with dynamic
     size.)  -O makes no difference.

egcs 19980615:

  -  Without -malign-double, only static doubles (and non-conflicting
     COMMON doubles) get aligned properly, the rest don't.

  -  With -malign-double, same as g77 0.5.23 and egcs 1.0.3.

So, the one huge improvement we should try to make for 1.1 is, IMO,
to achieve this:

  -  Without -malign-double, static, stack, and automatic doubles
     get aligned properly, but not if they're in EQUIVALENCE or
     COMMON blocks.  (Basically, any VAR_DECL the back end sees.)

  -  With -malign-double, same as g77 0.5.21 and 0.5.22 when -O is
     specified, except -O wouldn't be needed here.

The ideal situation would be to align all doubles that aren't
involved in ABI issues, but I think just doing the non-aggregate
ones handles ~95% of the important performance cases (as I've said,
and that's a *real* seat-of-the-pants guess).

Then, the only reason to use -malign-double is when the user knows
ABI issues are consistent across all pertinent modules (e.g. they're
all compiled with -malign-double) and the last ounce of performance
is needed.

>Actually, I'd expect generally better performance because we do handle
>alignments for static store items in a reasonable manner (which is a
>significant improvement by itself).

Right, which makes egcs *default* to better performance than the *default*
for 0.5.22, but to *worse* performance than 0.5.22 with -malign-double,
*even if -malign-double is used for egcs*.

The reason is that neither egcs nor gcc 2.8 will align stack-based
doubles (except automatics, i.e. dynamically-sized stack doubles).

So that's the main performance "regression" we currently have, which
means that at least getting -malign-double to align all doubles,
including stack-based ones, would seem to be worth making a
"required" item for egcs 1.1.

But I still would prefer it if we wouldn't effectively persuade most
users to risk using -malign-double just to get stack-based doubles
aligned for reasonable performance, and the way to do that is to make
alignment of stack-based (and automatic) doubles the default, again,
as long as the ABI isn't broken.

So not needing to use -malign-double to align stack-based doubles would
be a *huge* win, making egcs 1.1 *obviously* better than g77 0.5.21,
0.5.22, or 0.5.23 for most g77 users on x86.  That's because lots of
g77 users probably benchmark without reading up on (or feeling safe about
using) -malign-double.

>  > Note that I suggested the gcc architecture (machine descriptions,
>  > etc.) be modified to include a more fine-grained expression of
>  > alignment requirements.  E.g. distinguishing hardware requirements
>  > (even instruction requirements, such as `ld' vs. `ldd' on SPARCv8)
>  > from ABI requirements from ideal performance settings.  But this
>  > suggestion was turned down at the time -- some seven years ago!
>Sigh.  Yea, it really seems like something we should have -- then
>again, there's been little gcc emphasis on the x86 in the past and
>it's the most likely beneficiary of such stuff.

When I asked for it, I thought it would have helped with the SPARC
system I was working on at the time.  Though, I might have been
wrong.  The x86 surely seems to have the most variety of alignment
flavors I've ever seen for any given type: e.g. `double' alone has
at least three alignments, 1 byte (minimum hardware alignment),
4 bytes (ABI alignment), and 8 bytes (ideal performance alignment)!

If you really want to see how sick I am about this "let's architect
the thing right so programmers have distinct things to specify,
instead of one-size-fits-all straitjackets", take a look at the
(now-ancient) g77 internals in this area.  E.g. egcs/gcc/f/target.h.

You'll find it not only tracks the alignment for each type, but
the "modulo".  That is, g77 can be taught (if given a suitable
back end ;-) that a given type is to be aligned such that it begins
on byte M of an N-byte-aligned block.  gcc and other tools
are architected to always align on byte 0 of an N-byte-aligned
block, but before I had worked on g77, I had some awareness of
the possibility of, e.g., a 10-byte type whose *last* 8 bytes
had to be 8-byte aligned, so the object as a whole would have
to be aligned to byte 6 of an 8-byte block.

Of course, this proved useful when cutting the code to handle
oddly-alignable aggregates, though never yet for fundamental
types.  Specifically, aligning an EQUIVALENCE block such that
REAL R must be immediately followed by DOUBLE PRECISION D fits
neatly into this scheme.  E.g. "EQUIVALENCE (R,S(1)), (D,S(2))"
is something g77 can and does handle by intuiting that the
entire EQUIVALENCE block must be aligned to byte 4 of an 8-byte-
aligned block (assuming compiling for a SPARC, or with -malign-double
on an x86, or any machine with 8-byte-aligned doubles and 4-byte-long
floats).  That means g77 can also handle "COMMON C, D",
where C is CHARACTER*1 and D is DOUBLE PRECISION, though it warns
that it has to insert pre-padding (of 7 bytes, usually) between
the linker's idea of where the common block starts and where
it actually starts (where C starts).  (The warning is in case
the same common area is declared with a different type layout,
resulting in a different pre-padding being used, e.g. none.)

And, by "sick" I mean I designed it this way back around 1988,
having seen barely a shred of GNU code!  My usual philosophy
is: if I can't think of *one* clear-cut meaning for a value,
that usually means I shouldn't be using just *one* value (or
constant or macro or whatever).  gcc's DECL_ALIGN field, and
the relevant muck, violates that principle in spades, of course.

        tq vm, (burley)