From mboxrd@z Thu Jan  1 00:00:00 1970
From: Craig Burley
To: law@cygnus.com
Cc: davem@dm.cobaltmicro.com, d.love@dl.ac.uk, egcs@cygnus.com
Subject: Re: ix86 double alignment (was Re: egcs-1.1 release schedule)
Date: Tue, 23 Jun 1998 05:13:00 -0000
Message-id: <199806230651.CAA02192@melange.gnu.org>
References: <17700.898578696@hurl.cygnus.com>
X-SW-Source: 1998-06/msg00790.html

[Need to study your later email with more technical issues more carefully
next, but here are some quick points.]

> In message <199806221829.OAA07477@melange.gnu.org> you write:
> > Well, I'm willing to not try to do any special aligning for
> > EQUIVALENCE and COMMON for now.  If we can just get 64-bit
> > alignment for stack-allocated VAR_DECLs -- which generally
> > won't include EQUIVALENCE (and certainly not COMMON) -- we'll
> > have made a *huge* improvement in g77 performance, especially
> > its *repeatability* of performance measurements.
>Yup.  But considering the release schedule, I'd be happy if we could
>just get the stack aligned properly without breaking the ABI, then
>iterate to getting automatic variables aligned relative to the stack.

I'd say getting the stack aligned properly without breaking the ABI
and *also* getting VAR_DECLs that are type `double' aligned within
those frames (whether arrays or scalars) is not only the most
important combination, but solves ~95% of the problems the g77
user community sees.

At least getting this to work when -malign-double is specified would
be kind of a "minimum" for making egcs 1.1 not noticeably much worse
than g77 0.5.21 and 0.5.22 were.

>If we can get more done before the release, then great, but I wouldn't
>want to hold things up on this issue if we can avoid it.

I, hesitantly, agree.  Dave rightly points out that he and others have
been yelling about this for, well, it seems like years now.
I still have email from him (weeks ago) asking me to make a more
noticeable push for this on the egcs list, and am now sorry I didn't
take his advice sooner.  I can only plead insanity; I guess it's all
those soccer balls I hit with my head playing for Scotland that make
my brain mushy (just kidding, that's another Craig Burley ;-).

> > (Without this improvement, egcs 1.1 will often appear *substantially*
> > worse than the combination of g77 0.5.22 and gcc 2.7.2.3 on lots of
> > widely used Fortran code, assuming users are using -malign-double.)
>Well, we still have -malign-double as an option for the x86 port, so
>if they use it they presumably would see comparable performance, right?

No.  I haven't yet finished up my totally-over-engineered diagnostic
program to expose all this, but the preliminary results are:

g77 0.5.21, 0.5.22:

  -  Without -malign-double, basically no doubles get aligned
     properly, with the exception of doubles that can be lazily
     aligned in COMMON (without conflicts; and I'm not quite sure
     why this is, maybe my program isn't working quite right, it
     could easily be an accident that these appear to be aligned,
     the sort of accident the new version of my program should
     make much less likely).

  -  With -malign-double, stack-based doubles still not aligned
     properly, but static/COMMON ones are (even if it breaks the
     COMMON ABI).

  -  With -malign-double -O, all doubles are aligned properly.
     This is surprising; I didn't realize one needed -O to get
     this.  (And, yes, this breaks the COMMON ABI, etc.)

g77 0.5.23, egcs 1.0.3:

  -  Without -malign-double, no doubles get aligned properly.

  -  With -malign-double, static and automatic doubles get aligned,
     but not stack.  (Automatic are stack-based with dynamic size.)
     -O makes no difference.

egcs 19980615:

  -  Without -malign-double, only static doubles (and non-conflicting
     COMMON doubles) get aligned properly, the rest don't.

  -  With -malign-double, same as g77 0.5.23 and egcs 1.0.3.
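
For readers who want to reproduce this kind of measurement, here is a
minimal sketch of the basic check such a diagnostic would rest on.  It is
not the diagnostic program mentioned above; the names is_aligned and
double_alignment_class are made up for illustration.  It only tests
whether an address falls on a given byte boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return 1 if address p is a multiple of n bytes,
   else 0.  A boundary of 0 is treated as "never aligned". */
int is_aligned(const void *p, size_t n)
{
    return n != 0 && (uintptr_t)p % n == 0;
}

/* Hypothetical helper: report the largest boundary of interest that an
   address sits on: 8 if it is on an 8-byte boundary, 4 if it is only on
   a 4-byte boundary, otherwise 1. */
int double_alignment_class(const void *addr)
{
    if (is_aligned(addr, 8))
        return 8;
    if (is_aligned(addr, 4))
        return 4;
    return 1;
}
```

Passing the address of a static, stack, or dynamically sized double to
double_alignment_class, under each compiler and option combination, would
presumably give a table like the one above; the real program surely does
more than this.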

So, the one huge improvement we should try to make for 1.1 is, IMO,
to achieve this:

  -  Without -malign-double, static, stack, and automatic doubles
     get aligned properly, but not if they're in EQUIVALENCE or
     COMMON blocks.  (Basically, any VAR_DECL the back end sees.)

  -  With -malign-double, same as g77 0.5.21 and 0.5.22 when -O
     is specified, except -O wouldn't be needed here.

The ideal situation would be to align all doubles that aren't involved
in ABI issues, but I think just doing the non-aggregate ones handles
~95% of the important performance cases (as I've said, and that's a
*real* seat-of-the-pants guess).

Then, the only reason to use -malign-double is when the user knows
ABI issues are consistent across all pertinent modules (e.g. they're
all compiled with -malign-double) and the last ounce of performance
is needed.

>Actually, I'd expect generally better performance because we do handle
>alignments for static store items in a reasonable manner, which is a
>significant improvement by itself).

Right, which makes egcs *default* to better performance than the
*default* for 0.5.22, but to *worse* performance than 0.5.22 with
-malign-double, *even if -malign-double is used for egcs*.

The reason is that neither egcs nor gcc 2.8 will align stack-based
doubles (except automatics, i.e. dynamically-sized stack doubles).

So that's the main performance "regression" we currently have, which
means that at least getting -malign-double to align all doubles,
including stack-based ones, would seem to be worth making a "required"
item for egcs 1.1.

But I still would prefer it if we wouldn't effectively persuade most
users to risk using -malign-double just to get stack-based doubles
aligned for reasonable performance, and the way to do that is to
make alignment of stack-based (and automatic) doubles the default,
again, as long as the ABI isn't broken.
So not needing to use -malign-double to align stack-based doubles
would be a *huge* win, making egcs 1.1 *obviously* better than g77
0.5.21, 0.5.22, or 0.5.23 for most g77 users on x86.  That's because
lots of g77 users probably benchmark without reading up on (or feeling
safe about using) -malign-double.

> > Note that I suggested the gcc architecture (machine descriptions,
> > etc.) be modified to include a more fine-grained expression of
> > alignment requirements.  E.g. distinguishing hardware requirements
> > (even instruction requirements, such as `ld' vs. `ldd' on SPARCv8)
> > from ABI requirements from ideal performance settings.  But this
> > suggestion was turned down at the time -- some seven years ago!
>Sigh.  Yea, it really seems like something we should have -- then
>again, there's been little gcc emphasis on the x86 in the past and
>it's the most likely benefactor of such stuff.

When I asked for it, I thought it would have helped with the SPARC
system I was working on at the time.  Though, I might have been wrong.

The x86 surely seems to have the most variety of alignment flavors
I've ever seen for any given type: e.g. `double' alone has at least
three alignments, 1 byte (minimum hardware alignment), 4 bytes (ABI
alignment), and 8 bytes (ideal performance alignment)!

If you really want to see how sick I am about this "let's architect
the thing right so programmers have distinct things to specify,
instead of one-size-fits-all straitjackets", take a look at the
(now-ancient) g77 internals in this area.  E.g. egcs/gcc/f/target.h.

You'll find it not only tracks the alignment for each type, but the
"modulo".  That is, g77 can be taught (if given a suitable back
end ;-) that a given type is to be aligned such that it begins on
byte M of an N-byte-aligned block.
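
To make the "byte M of an N-byte-aligned block" idea concrete, here is a
minimal sketch of the arithmetic involved.  It is not the code in
egcs/gcc/f/target.h; the function names start_modulo and pad_to_modulo
are invented for illustration, and the sketch assumes each member's own
requirement is the ordinary "byte 0 of an N-byte block" kind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: a member sits member_offset bytes into an
   aggregate and must itself start on byte 0 of a member_align-byte
   block.  Return the modulo M that the aggregate's start must satisfy,
   i.e. the aggregate must begin on byte M of a member_align-byte block. */
size_t start_modulo(size_t member_offset, size_t member_align)
{
    assert(member_align > 0);
    return (member_align - member_offset % member_align) % member_align;
}

/* Hypothetical helper: number of padding bytes to add to offset so that
   offset + padding lands on byte modulo of an align-byte block. */
size_t pad_to_modulo(size_t offset, size_t align, size_t modulo)
{
    assert(align > 0 && modulo < align);
    return (modulo + align - offset % align) % align;
}
```

For instance, a member that sits 4 bytes into an aggregate and needs
8-byte alignment forces the aggregate onto byte 4 of an 8-byte block
(start_modulo(4, 8) is 4), while a member 1 byte in forces it onto byte 7
(start_modulo(1, 8) is 7), so reaching that point from an 8-byte boundary
takes pad_to_modulo(0, 8, 7), or 7, bytes of padding.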

gcc and other tools are architected to always align on byte 0 of
an N-byte-aligned block, but before I had worked on g77, I had some
awareness of the possibility of, e.g., a 10-byte type whose *last*
8 bytes had to be 8-byte aligned, so the object as a whole would
have to be aligned to byte 6 of an 8-byte block.

Of course, this proved useful when cutting the code to handle
oddly-alignable aggregates, though never yet for fundamental types.
Specifically, aligning an EQUIVALENCE block such that REAL R must be
immediately followed by DOUBLE PRECISION D fits neatly into this
scheme.  E.g. "EQUIVALENCE (R,S(1)), (D,S(2))" is something g77 can
and does handle by intuiting that the entire EQUIVALENCE block must
be aligned to byte 4 of an 8-byte-aligned block (assuming compiling
for a SPARC, or with -malign-double on an x86, or any machine with
8-byte-aligned doubles and 4-byte long floats).

That means g77 can also handle "COMMON C, D", where C is CHARACTER*1
and D is DOUBLE PRECISION, though it warns that it has to insert
pre-padding (of 7 bytes, usually) between the linker's idea of where
the common block starts and where it actually starts (where C starts).
(The warning is in case the same common area is declared with a
different type layout, resulting in a different pre-padding being
used, e.g. none.)

And, by "sick" I mean I designed it this way back around 1988, before
seeing barely a shred of GNU code!  My usual philosophy is: if I can't
think of *one* clear-cut meaning for a value, that usually means I
shouldn't be using just *one* value (or constant or macro or whatever).
gcc's DECL_ALIGN field, and the relevant muck, violates that principle
in spades, of course.

        tq vm, (burley)