From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 32253 invoked by alias); 30 Dec 2002 16:58:29 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive:
List-Post:
List-Help:
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 32246 invoked from network); 30 Dec 2002 16:58:28 -0000
Received: from unknown (HELO mail.goquest.com) (12.18.108.6) by 209.249.29.67 with SMTP; 30 Dec 2002 16:58:28 -0000
Received: (qmail 7642 invoked by uid 0); 30 Dec 2002 16:58:11 -0000
Received: from mszick@goquest.com by mail.goquest.com by uid 502 with qmail-scanner-1.12 (spamassassin: 2.31. . Clear:. Processed in 1.146485 secs); 30 Dec 2002 16:58:11 -0000
Received: from unknown (HELO localhost.localdomain) (66.90.217.183) by mail.goquest.com with SMTP; 30 Dec 2002 16:58:09 -0000
Content-Type: text/plain; charset="iso-8859-1"
From: Michael S. Zick
To: Daniel Egger
Subject: Re: An unusual Performance approach using Synthetic registers
Date: Mon, 30 Dec 2002 10:25:00 -0000
Cc: Andy Walker, gcc@gcc.gnu.org
References: <02122917451200.00862@localhost.localdomain> <1041214697.26664.57.camel@sonja>
In-Reply-To: <1041214697.26664.57.camel@sonja>
MIME-Version: 1.0
Message-Id: <02123010533901.00764@localhost.localdomain>
Content-Transfer-Encoding: 8bit
X-SW-Source: 2002-12/txt/msg01584.txt.bz2

On Sunday 29 December 2002 08:18 pm, Daniel Egger wrote:
> On Mon, 2002-12-30 at 00.45, Michael S.Zick wrote:
> > Daniel, have you considered that the "16 used registers" observation
> > could be an artifact?
>
> Yes.

No offense intended, I really did presume that you had.

> > Presume an arbitrarily sophisticated optimization algorithm working
> > on a symbolic machine with an infinite number of registers...
> >
> > Wouldn't the "16 used registers" observation fail as the size of the
> > source file approached infinite size?
>
> Depends on the code. Numerical applications tend to need far more
> temporary values then say a notepad.
>
> > Similarly with the size and complexity of a single expression statement.
> > Those things are usually limited in size and complexity to what the
> > human mind can comprehend.
>
> The problem here is that you can surely recursively inline the full
> application into the main function, compile it as one chunk and then
> be happy about the maximum use of registers; however at the same time
> you absolutely blew code reuse and performance because of cache abuse.
>

Indeed.  Every one of those things.

In fact, I would like to quote your entire, clear description of the
result of recursively in-lining the entire application (including libc,
libgcc, libstdc++, etc.) into the main function.

BUT not as an argument against doing it where the optimizations on the
symbolic machine with an infinite set of registers can get hold of it.

INSTEAD, as support for the claim that an entire pass is missing from
the compiler design.  That pass belongs just prior to hard register /
synthetic register / stack slot assignment (with spill & reload of
whatever is left over).

It is at this point that the compiler should address (no pun intended)
things like cache line utilization and locality, memory footprint (and
the implied code reuse), etc.

For the purpose of this thread, let's skip over the point that there are
much better ways of exposing the entire application to the optimization
pass(es) than simply, recursively, in-lining the entire application into
the main function.

Just pretend we did something realistic with a similar result.  I'll
refer to that as "effectively exposing the entire application" to the
optimization pass(es).

Presume further that the optimization pass(es) on the symbolic machine
with an infinite number of registers have done their thing.

At which point we have arrived at the front door of the port-specific "back end".
There, the task (currently) is to translate the symbolic machine, with
its fully usage-optimized, infinite register set, into whatever our
silicon really has, plus any artificial restrictions, such as ABI
definitions.

At this point we insert the compiler pass that I feel is missing.

For ease of discussion and visualization, presume that this symbolic
machine arrives in the form of a tree.

This pass traverses the tree, looking for leaves, twigs, small branches,
and major limbs having the same (or similar after transformation)
patterns.  A leaf, twig, etc. occurring within a loop counts as that
many occurrences of the leaf, twig, etc., one per trip through the loop.

At some point the pass "decides" that a certain pattern in the tree
occurs often enough that it should be UN-INLINED and turned into a
function call.  The cost of all the prologue, epilogue, register bashing
needed to meet an ABI specification, and other havoc that has to be done
to the beautiful symbolic machine code is considered in that "decision"
process.

All of that takes care of code reuse and cache line assignment.

With this design, any "inline function (.....)" and any common functions
that the programmer factored out of the body of his code are only a HINT
to this new pass of what the common instruction groupings should be.*
In source form they still meet their PRIMARY, intended goal of making
the source human comprehensible.

Then you assign hard registers, synthetic registers, and stack slots,
with spill & reload of whatever is left over.

*Other than the use of the new "never-inline" attribute.

Mike
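
For illustration only, the un-inlining "decision" described above can be
sketched roughly as follows.  This is a toy model and not GCC code: the
Node type, its trips field, CALL_COST, min_size and the savings formula
are all hypothetical stand-ins, and overlap between nested patterns is
ignored.  It only shows the shape of the idea: count identical subtrees,
weight those inside a loop by the trip count, and un-inline a pattern
when the duplicated work it removes outweighs the assumed call overhead.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple

# Assumed cost (in tree nodes) of prologue, epilogue and ABI register
# shuffling for one call.  A real port would compute this per target.
CALL_COST = 3


@dataclass(frozen=True)
class Node:
    """Hypothetical node of the symbolic-machine tree."""
    op: str
    kids: Tuple["Node", ...] = ()
    trips: int = 1  # > 1 marks a loop whose body runs `trips` times


def size(node):
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(kid) for kid in node.kids)


def count_patterns(node, weight=1, counts=None):
    """Count each distinct subtree; a subtree inside a loop counts
    once per trip through that loop."""
    counts = Counter() if counts is None else counts
    counts[node] += weight
    for kid in node.kids:
        count_patterns(kid, weight * node.trips, counts)
    return counts


def uninline_candidates(root, min_size=3):
    """Return (pattern, occurrences, saving) for patterns worth
    turning into a function call, best saving first."""
    result = []
    for pattern, n in count_patterns(root).items():
        s = size(pattern)
        # Toy cost model: keep one copy of the body, pay CALL_COST
        # at each of the n occurrences.
        saving = (n - 1) * s - n * CALL_COST
        if s >= min_size and saving > 0:
            result.append((pattern, n, saving))
    return sorted(result, key=lambda r: -r[2])


# a*b + c appears once in straight-line code and once in a 10-trip loop.
mul_add = Node("add", (Node("mul", (Node("a"), Node("b"))), Node("c")))
root = Node("seq", (
    mul_add,
    Node("loop", (mul_add,), trips=10),
    Node("store", (Node("d"),)),
))

candidates = uninline_candidates(root)
for pattern, n, saving in candidates:
    print(pattern.op, n, saving)  # prints: add 11 17
```

In this toy tree only the whole "a*b + c" limb is selected.  The smaller
"a*b" twig occurs just as often, but under the assumed CALL_COST it is
too small to pay for its own call.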