Date: Mon, 29 Jun 2009 16:27:00 -0000
From: Vladimir Makarov
To: reply@meinersbur.de
CC: gcc@gcc.gnu.org
Subject: Re: Register Pressure in Instruction Level Parallelism
Message-ID: <4A48EB07.3040602@redhat.com>
In-Reply-To: <4A47462E.1080402@uni-paderborn.de>

Michael Kruse wrote:
> Hello GCC developers,
>
> I am going to write a Master's thesis about instruction scheduling
> based on the technique presented in [1]. This will also include an
> implementation.
>
> The basic idea is to have an additional pass on the data dependency
> graph before instruction scheduling and register allocation take
> place. It estimates the minimal (register sufficiency) and maximal
> (register saturation) number of registers required by schedules
> based on that graph.
>
> If the register sufficiency is higher than the physical number of
> registers, spill code is added to the graph.

For me, that is the most interesting part; unfortunately, Touati (as
I remember) says practically nothing about this in his thesis.

> If the register saturation is higher than the physical number of
> registers, artificial dependencies are added to the graph, such
> that the instruction scheduler is forced to generate a schedule
> with fewer registers.
>
> The aim is that the instruction scheduler can focus on the optimal
> arrangement of instructions to exploit a maximal amount of
> instruction level parallelism. [1] also suggests heuristics for all
> these problems. According to the author, these are "nearly optimal"
> in practice. The heuristics for estimating register sufficiency and
> saturation are both optimistic, meaning that it might still be
> necessary to add more spill code to the final code. Hence, this
> technique is just an optimization pass and doesn't replace existing
> register allocation or instruction scheduling passes.
>
> [1] also has a part about loop scheduling, but my thesis will be
> about basic blocks only. See [2] for a presentation of this
> technique.
>
> So, now my questions: How much do you think this could improve
> compiled code speed? Would the current GCC/YARA benefit from such
> an optimization pass at all?

I think nobody can answer these questions until it is implemented.

I have been working on register-pressure-sensitive insn scheduling
too for the last few months.
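To make the two bounds concrete, here is a small sketch (the function
name and the DDG encoding are mine, not from Touati's thesis or from
GCC) of the quantity both bounds are defined over: the peak register
pressure of one linear schedule of a basic block. Register sufficiency
is the minimum, and register saturation the maximum, of this value
over all schedules compatible with the dependence graph:

```python
def max_pressure(schedule, defs, uses):
    """Peak number of simultaneously live values for one schedule.

    schedule: list of insn ids in issue order.
    defs[i]:  set of values defined by insn i.
    uses[i]:  set of values read by insn i.
    For simplicity, every used value is assumed to be defined inside
    the block (live-in values are ignored).
    """
    # Position of the last use of each value in this schedule.
    last_use = {}
    for pos, insn in enumerate(schedule):
        for v in uses[insn]:
            last_use[v] = pos

    live, peak = set(), 0
    for pos, insn in enumerate(schedule):
        live |= defs[insn]           # results become live here
        peak = max(peak, len(live))  # pressure while this insn issues
        # Kill values whose last use has now passed (a value never
        # used dies immediately after its definition).
        live = {v for v in live if last_use.get(v, -1) > pos}
    return peak
```

Enumerating all legal schedules to get the exact minimum or maximum is
intractable in general, which is why [1] resorts to heuristics for
both bounds.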
I started with the simple heuristic described in Muchnick's book:
when the current register pressure is high, choose an insn that
decreases the pressure, and when the pressure is low, use the usual
scheduling heuristic. This did not work at all (it increased code
size by about 3-5% on x86). The problem is that even if the current
register pressure is low at a given point, issuing an insn at that
point can still increase register pressure at later points. So I am
now working on evaluating register pressure at later points too (some
form of look-ahead). This approach is similar to what Touati proposed
(but without actually adding additional DD arcs).

If you are going to work on this project, some small advice about
evaluating register sufficiency: I found that register pressure is
already practically minimized before insn scheduling (I suspect this
is mostly done in TER). By the way, this also means that tree
balancing (which is actually a much more complicated issue because it
is not a tree but a DAG) is not necessary for the backend, as was
done in Preston Briggs' project (and mentioned in Ken Zadeck's
proposed pass stack).

> What are the chances that this could get into the main GCC tree if
> it turns out to be an improvement?
>
I don't see any problem getting the code into the main GCC tree if
you get even a 1-2% improvement, although there are some technical
questions (like the code fitting GCC coding standards) and a
commitment to maintain the code. These problems can definitely be
overcome.

> There has been a short discussion on this mailing list already [3]
> some years ago (note if you read this: intLP has been used only to
> compare the heuristic to the optimal result). To my knowledge,
> there hasn't been any more work on this topic yet (anywhere).
>
> I'd prefer to implement this for GCC, but my advisor wants me to do
> it for the university's own compiler. Therefore I could also use
> arguments for why to do it for GCC.
>
Personally, I'd love to see this work done in GCC.
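The Muchnick-style heuristic I described above can be sketched as a
toy list scheduler. All names here are mine and the pressure model is
deliberately naive; it is an illustration of the rule, not of the
actual GCC scheduler:

```python
def schedule_bb(insns, deps, prio, pressure_delta, num_regs):
    """Greedy list scheduling of one basic block.

    insns:             iterable of insn ids (acyclic dependences).
    deps[i]:           set of insns that must issue before i.
    prio[i]:           normal scheduling priority (higher first).
    pressure_delta[i]: net change in live values when i issues.
    num_regs:          pressure threshold, e.g. the number of
                       physical registers.
    """
    done, order, pressure = set(), [], 0
    remaining = set(insns)
    while remaining:
        # Insns whose dependences are all satisfied.
        ready = [i for i in remaining if deps[i] <= done]
        if pressure >= num_regs:
            # High pressure: pick the insn that lowers pressure most.
            best = min(ready, key=lambda i: pressure_delta[i])
        else:
            # Low pressure: use the usual heuristic.
            best = max(ready, key=lambda i: prio[i])
        order.append(best)
        pressure += pressure_delta[best]
        done.add(best)
        remaining.discard(best)
    return order
```

The weakness of this greedy rule is exactly what I described: the
pressure delta of the chosen insn says nothing about pressure at
later points, which is what the look-ahead is meant to fix.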
I believe research work in the compiler field should be done in real
industrial compilers, because that is the final target of the
research. I have seen many times that researchers report big
improvements for their algorithms because they are using toy
compilers, but when the same algorithms are implemented in GCC they
give practically nothing. For me, the criterion for a toy compiler is
how it performs on commonly used compiler benchmarks such as SPEC
(the better the existing compiler's performance, the harder it is to
get the same percentage improvement). So if your university
compiler's performance is close to a real compiler's on, for example,
SPECFP2000, I think you could use it to measure a real optimization
impact. On the other hand, you will definitely get better numbers
(while spending less time) using a toy compiler than GCC, and your
research could probably look more successful from the academic point
of view.

>
> [1] http://www.prism.uvsq.fr/~touati/thesis.html
> [2] http://tel.archives-ouvertes.fr/docs/00/04/72/88/ANNEX/tel-00007405.ppt
> [3] http://gcc.gnu.org/ml/gcc/2002-07/msg00565.html
>