Date: Mon, 29 Jun 2009 16:27:00 -0000
From: Vladimir Makarov
To: reply@meinersbur.de
CC: gcc@gcc.gnu.org
Subject: Re: Register Pressure in Instruction Level Parallelism
Message-ID: <4A48EB07.3040602@redhat.com>
In-Reply-To: <4A47462E.1080402@uni-paderborn.de>

Michael Kruse wrote:
> Hello GCC developers,
>
> I am going to write a Master's thesis about instruction scheduling
> based on the technique presented in [1]. This will also include an
> implementation.
>
> The basic idea is to have an additional pass on the data dependency
> graph before instruction scheduling and register allocation take
> place. It estimates the minimal (register sufficiency) and maximal
> (register saturation) number of registers required by schedules
> based on that graph.
>
> If the register sufficiency is higher than the physical number of
> registers, spill code is added to the graph.

For me, that is the most interesting part; unfortunately, Touati (as
I remember) says practically nothing about this in his thesis.

> If the register saturation is higher than the physical number of
> registers, artificial dependencies are added to the graph, such
> that the instruction scheduler is forced to generate a schedule
> with fewer registers.
>
> The aim is that the instruction scheduler can focus on the optimal
> arrangement of instructions to exploit a maximal amount of
> instruction level parallelism. [1] also suggests heuristics for all
> these problems. According to the author, these are "nearly optimal"
> in practice. The heuristics for estimating register sufficiency and
> saturation are both optimistic, meaning that it might still be
> necessary to add more spill code to the final code. Hence, this
> technique is just an optimization pass and doesn't replace existing
> register allocation or instruction scheduling passes.
>
> [1] also has a part about loop scheduling, but my thesis will be
> about basic blocks only. See [2] for a presentation of this
> technique.
>
> So, now my questions: How much do you think this could improve
> compiled code speed? Would the current GCC/YARA benefit from such
> an optimization pass at all?

I think nobody can answer these questions until it is implemented.

I have been working on register-pressure-sensitive insn scheduling
too for the last few months.
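To make the two bounds concrete, here is a small sketch (the function
name and the DDG encoding are mine, not from Touati's thesis or from
GCC) of the quantity both bounds are defined over: the peak register
pressure of one linear schedule of a basic block. Register sufficiency
is the minimum, and register saturation the maximum, of this value
over all schedules compatible with the dependence graph:

```python
def max_pressure(schedule, defs, uses):
    """Peak number of simultaneously live values for one schedule.

    schedule: list of insn ids in issue order.
    defs[i]:  set of values defined by insn i.
    uses[i]:  set of values read by insn i.
    For simplicity, every used value is assumed to be defined inside
    the block (live-in values are ignored).
    """
    # Position of the last use of each value in this schedule.
    last_use = {}
    for pos, insn in enumerate(schedule):
        for v in uses[insn]:
            last_use[v] = pos

    live, peak = set(), 0
    for pos, insn in enumerate(schedule):
        live |= defs[insn]           # results become live here
        peak = max(peak, len(live))  # pressure while this insn issues
        # Kill values whose last use has now passed (a value never
        # used dies immediately after its definition).
        live = {v for v in live if last_use.get(v, -1) > pos}
    return peak
```

Enumerating all legal schedules to get the exact minimum or maximum is
intractable in general, which is why [1] resorts to heuristics for
both bounds.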
I started with the simple heuristic described in Muchnick's book:
when the current register pressure is high, choose an insn that
decreases the pressure, and when the pressure is low, use the usual
scheduling heuristic. This did not work at all (it increased code
size by about 3-5% on x86). The problem is that even if the current
register pressure is low at a given point, issuing an insn at that
point can still increase register pressure at later points. So I am
now working on evaluating register pressure at later points too (some
form of look-ahead). This approach is similar to what Touati proposed
(but without actually adding additional DD arcs).

If you are going to work on this project, some small advice about
evaluating register sufficiency: I found that register pressure is
already practically minimized before insn scheduling (I suspect this
is mostly done in TER). By the way, this also means that tree
balancing (which is actually a much more complicated issue because it
is not a tree but a DAG) is not necessary for the backend, as was
done in Preston Briggs' project (and mentioned in Ken Zadeck's
proposed pass stack).

> What are the chances that this could get into the main GCC tree if
> it turns out to be an improvement?
>
I don't see any problem getting the code into the main GCC tree if
you get even a 1-2% improvement, although there are some technical
questions (like the code fitting GCC coding standards) and a
commitment to maintain the code. These problems can definitely be
overcome.

> There has been a short discussion on this mailing list already [3]
> some years ago (note if you read this: intLP has been used only to
> compare the heuristic to the optimal result). To my knowledge,
> there hasn't been any more work on this topic yet (anywhere).
>
> I'd prefer to implement this for GCC, but my advisor wants me to do
> it for the university's own compiler. Therefore I could also use
> arguments for why to do it for GCC.
>
Personally, I'd love to see this work done in GCC.
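The Muchnick-style heuristic I described above can be sketched as a
toy list scheduler. All names here are mine and the pressure model is
deliberately naive; it is an illustration of the rule, not of the
actual GCC scheduler:

```python
def schedule_bb(insns, deps, prio, pressure_delta, num_regs):
    """Greedy list scheduling of one basic block.

    insns:             iterable of insn ids (acyclic dependences).
    deps[i]:           set of insns that must issue before i.
    prio[i]:           normal scheduling priority (higher first).
    pressure_delta[i]: net change in live values when i issues.
    num_regs:          pressure threshold, e.g. the number of
                       physical registers.
    """
    done, order, pressure = set(), [], 0
    remaining = set(insns)
    while remaining:
        # Insns whose dependences are all satisfied.
        ready = [i for i in remaining if deps[i] <= done]
        if pressure >= num_regs:
            # High pressure: pick the insn that lowers pressure most.
            best = min(ready, key=lambda i: pressure_delta[i])
        else:
            # Low pressure: use the usual heuristic.
            best = max(ready, key=lambda i: prio[i])
        order.append(best)
        pressure += pressure_delta[best]
        done.add(best)
        remaining.discard(best)
    return order
```

The weakness of this greedy rule is exactly what I described: the
pressure delta of the chosen insn says nothing about pressure at
later points, which is what the look-ahead is meant to fix.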
I believe research work in the compiler field should be done in real
industrial compilers, because that is the final target of the
research. I have seen many times that researchers report big
improvements for their algorithms because they are using toy
compilers, but when the same algorithms are implemented in GCC they
give practically nothing. For me, the criterion for a toy compiler is
how it performs on commonly used compiler benchmarks such as SPEC
(the better the existing compiler's performance, the harder it is to
get the same percentage improvement). So if your university
compiler's performance is close to a real compiler's on, for example,
SPECFP2000, I think you could use it to measure a real optimization
impact. On the other hand, you will definitely get better numbers
(while spending less time) using a toy compiler than GCC, and your
research could probably look more successful from the academic point
of view.

>
> [1] http://www.prism.uvsq.fr/~touati/thesis.html
> [2] http://tel.archives-ouvertes.fr/docs/00/04/72/88/ANNEX/tel-00007405.ppt
> [3] http://gcc.gnu.org/ml/gcc/2002-07/msg00565.html
>