From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4846FFFF.8020402@ispras.ru>
Date: Wed, 04 Jun 2008 20:50:00 -0000
From: Andrey Belevantsev
To: Mark Mitchell
CC: GCC Patches, Jim Wilson, Vladimir Makarov
Subject: Re: [RFC] Selective scheduling pass
References: <4845522C.3010006@ispras.ru> <4846C8F2.2080508@codesourcery.com>
In-Reply-To: <4846C8F2.2080508@codesourcery.com>
List-Id: gcc-patches
X-SW-Source: 2008-06/txt/msg00209.txt.bz2

Mark Mitchell wrote:
> That's a very good result. Congratulations!

Thank you!

> I know that this scheduler is aimed at CPUs like the ones you mention
> above. However, would it function correctly on other CPUs with more
> "traditional" characteristics, like older ARM, MIPS, or x86 cores? And,
> would it be reasonably possible to tune it for those CPUs as well?
When a target doesn't do anything "fancy" in its scheduler hooks, everything
should just work (modulo bugs, of course; we have tried only ppc64 and
x86-64). If a target saves some information describing the scheduler's
state, it should implement simple hooks that manipulate this data, as we
did for the rs6000 port.

> As with the IRA allocator, I'd like to avoid having multiple schedulers
> in GCC. (I know we've done that for a while, but I still think it's
> undesirable.) So, I'd like to see if we can get this to work well
> across all of the Primary and Secondary CPUs, and then just make it "the
> GCC scheduler" rather than an optional thing enabled at some
> optimization levels on some CPUs.

This is our goal as well, and I think it can be done incrementally. We are
now working on ppc performance. Then we need to tune the scheduler so that
on traditional targets its results are no worse and the compile-time
slowdown is reasonable, e.g. by disabling pipelining and decreasing the
scheduling window. The last step is to speed up the implementation so that
for scheduling-eager targets with pipelining enabled the slowdown is
acceptable at -O2.

Note that the selective scheduler does not subsume SMS but complements it:
SMS does a better job on countable loops, but it cannot handle loops with
control flow or an unknown number of iterations. So in any case there will
be two schedulers.

> Do you think that's feasible? Or do you think that there are inherent
> aspects of the algorithm that mean that we need to have this new
> scheduler for one class of CPUs and the old scheduler for the other
> class? Is there any way to make the new scheduler do a reasonable job
> with the existing descriptions in GCC, so that port maintainers can tune
> later, or is a level of effort like that for Itanium required?

The ia64 backend is very complex, and we put a lot of effort into tuning
it by itself -- you can see that in my other mail about target changes.
So I think that tuning for other targets will be simpler. The Cell results
I mentioned in that mail came from a developer who did the tuning
internally at Samsung, and AFAIR he didn't mention any target-independent
changes he had to make; basically, he just made it work.

>> Compile time slowdown measured with --enable-checking=assert is quite
>> significant -- about 12% on spec int and about 18% on spec fp and
>> cc1-i-files collection. For this reason, we have enabled selective
>> scheduler by default at -O3 on ia64 and disabled by default on other
>> targets.
>
> Do you understand what's causing the compile-time slowdown?

The part that takes the most time is the update of availability sets, as
this is the central part of the algorithm. Renaming is quite expensive
too, but we have tackled that by limiting it to the few insns with the
highest priority.

To make the updates faster, you need to build the data dependence graph
and keep it up to date while scheduling. Unfortunately, we didn't manage
to do this during this project. The first step towards this goal will be
to make the dependence graph classify dependencies as control/data,
lhs/rhs, register/memory, etc. Then we can devise a mechanism for updating
the graph, which would not be trivial -- e.g. when an insn gets renamed,
we introduce a register-register copy that can generate completely new
register dependencies which cannot be derived from the existing ones.
Such a project is likely to make it to trunk in the next release cycle,
and that would correspond to the last step of the incremental approach
outlined above.

Yours, Andrey