Date: Mon, 06 Jan 2003 19:50:00 -0000
From: Tom Lord
Message-Id: <200301051224.EAA22286@emf.net>
To: dewar@gnat.com
CC: denisc@overta.ru, dewar@gnat.com, ja_walker@earthlink.net, gcc@gcc.gnu.org
In-reply-to: <20030105113840.BF53CF28C4@nile.gnat.com> (dewar@gnat.com)
Subject: Re: An unusual Performance approach using Synthetic registers
References: <20030105113840.BF53CF28C4@nile.gnat.com>

       dewar:

       This is a bit of an odd statement. In practice on a machine like the x86, the current stack frame will typically be resident in L1 cache, and that's where the register allocator spills to. What some of us still don't see is the difference in final resulting code between your "synthetic registers" and normal spill locations from the register allocator.

Register spills clearly don't equal synthetic registers.

Presumably, the number of locations dedicated to register spills never exceeds (approximately) the maximum number of simultaneously live _intermediate_ values minus the number of general purpose registers. Any non-intermediate value (i.e., one that has a main memory location), rather than being spilled, will be written to its location. If that value is later re-used, it will be retrieved from memory.

The number of synthetic registers can be much larger than the number of simultaneously live intermediate values.
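
To put rough numbers on that bound, here is a minimal sketch in C. It is only the arithmetic of the estimate above, not anything GCC computes; the function names and the example figures in the note below are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: approximate upper bound on spill slots, i.e. the
 * maximum number of simultaneously live intermediate values minus the
 * number of general purpose registers (never below zero). */
static unsigned spill_slot_bound(unsigned max_live_intermediates, unsigned gprs)
{
    assert(gprs > 0); /* assumption: the target has at least one GPR */
    return max_live_intermediates > gprs ? max_live_intermediates - gprs : 0;
}

/* Hypothetical helper: synthregs left over for values that are NOT
 * intermediates, once the spill-like uses are accounted for. */
static unsigned spare_synthregs(unsigned total_synthregs, unsigned spill_bound)
{
    return total_synthregs > spill_bound ? total_synthregs - spill_bound : 0;
}
```

With, say, 10 live intermediates and 6 general purpose registers, spill_slot_bound(10, 6) is 4. If 32 synthregs were set aside (an illustrative figure, not a measurement), spare_synthregs(32, 4) leaves 28 for values that a plain spill area would never hold.
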
So, with synthetic registers, some values that are not intermediates can be retained (in synthetic registers). Without synthetic registers, the next time those values are used, they have to be fetched from (non-special) memory.

In other words, with synthregs, the CPU can ship some value off to memory and not care how long it takes to get there or to get back from there -- because it also ships it off to the synthreg, which it hypothetically has faster access to.

In practice, that means that synthregs will store some values in memory twice: once in the location the program text says they go in; again in the synthetic register. If the synthetic register is indeed cache-favored, maybe there's a performance win there -- and if so, a register allocator is the right algorithm to decide which values to keep duplicated in synthetic registers (so the proposed implementation strategy is sensible). (Another weird interaction is intermediate values that can be recalculated -- I don't know if GCC ever makes that trade-off -- if it does, it needs to be tuned for synthregs.)

So, does that hypothesis (that synthreg access is faster than general memory access) hold? Quite possibly. For example, a re-used synthreg inherits cache-presence (at all levels, not just L1) from the previous uses. Synthregs may win for some apps for more than just L1 reasons.

This brings in new alignment issues, too. If you can, you might want to make sure that your allocator locates its metadata where it will cache-collide with the synthregs, to help push allocated memory out of those locations (presuming here that allocator metadata is relatively infrequently accessed). It's probably not all that hard to do this "by accident".

Just in general: do things to protect the cache-presence of the synthregs. It might eventually lead to some hw advances: give synthregs with absolute locations cache preference.
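
On the cache-collision idea above, a minimal sketch of how one might check whether two addresses compete for the same cache set. It assumes a simple set-indexed cache where the set is (address / line size) mod (number of sets), and it assumes the addresses passed in are the ones the cache actually indexes on (physical vs. virtual indexing is CPU-specific). The struct, the function names, and the geometry in the note below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache geometry; real values are CPU-specific. */
struct cache_geom {
    size_t line_size; /* bytes per cache line */
    size_t num_sets;  /* number of sets in the cache */
};

/* Set index that an address maps to in a simple set-indexed cache. */
static size_t cache_set(uintptr_t addr, struct cache_geom g)
{
    assert(g.line_size > 0 && g.num_sets > 0);
    return (size_t)((addr / g.line_size) % g.num_sets);
}

/* Nonzero if the two addresses fall in the same set, i.e. they can
 * evict each other once the set's ways are full. */
static int cache_collides(uintptr_t a, uintptr_t b, struct cache_geom g)
{
    return cache_set(a, g) == cache_set(b, g);
}
```

For example, with 64-byte lines and 64 sets, addresses 0x1000 and 0x2000 land in the same set while 0x1000 and 0x1040 do not. An allocator could use a check like this to decide where to put its metadata relative to the synthreg block.
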
If synthregs are on the stack instead, the hardware could give locations near the frame pointer cache preference (or is that done already?).

I'd therefore guess it will be a very system-specific optimization -- but that it will win often enough to be useful. And given what I understand about trends in architecture, the cases in which it will win will sharply increase over time.

No?

-t

p.s.: arch foo thinking about non-disruptive ways to improve gcc's rev ctl practices:

http://lists.fifthvision.net/pipermail/arch-users/2003-January/001856.html

and some of the follow-ups. It's a pretty "noisy" list, though.