From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4846FFFF.8020402@ispras.ru>
Date: Wed, 04 Jun 2008 20:50:00 -0000
From: Andrey Belevantsev
To: Mark Mitchell
CC: GCC Patches, Jim Wilson, Vladimir Makarov
Subject: Re: [RFC] Selective scheduling pass
References: <4845522C.3010006@ispras.ru> <4846C8F2.2080508@codesourcery.com>
In-Reply-To: <4846C8F2.2080508@codesourcery.com>
List-Id: gcc-patches
X-SW-Source: 2008-06/txt/msg00209.txt.bz2

Mark Mitchell wrote:
> That's a very good result. Congratulations!

Thank you!

> I know that this scheduler is aimed at CPUs like the ones you mention
> above. However, would it function correctly on other CPUs with more
> "traditional" characteristics, like older ARM, MIPS, or x86 cores? And,
> would it be reasonably possible to tune it for those CPUs as well?
When a target doesn't do anything "fancy" in its scheduler hooks, everything
should just work (modulo bugs, of course; we have tried only ppc64 and
x86-64). If a target saves some information describing the scheduler's
state, it should implement simple hooks that manipulate this data, as we
did for the rs6000 port.

> As with the IRA allocator, I'd like to avoid having multiple schedulers
> in GCC. (I know we've done that for a while, but I still think it's
> undesirable.) So, I'd like to see if we can get this to work well
> across all of the Primary and Secondary CPUs, and then just make it "the
> GCC scheduler" rather than an optional thing enabled at some
> optimization levels on some CPUs.

This is our goal as well, and I think it can be done incrementally. We are
now working on ppc performance. Then we need to tune the scheduler so that
on traditional targets its results are no worse and the compile-time
slowdown is reasonable, e.g. by disabling pipelining and decreasing the
scheduling window. The last step is to speed up the implementation so that
for scheduling-eager targets with pipelining enabled the slowdown is
acceptable at -O2.

Note that the selective scheduler does not subsume SMS but complements it:
SMS does a better job on countable loops, but it cannot handle loops with
control flow or an unknown number of iterations. So in any case there will
be two schedulers.

> Do you think that's feasible? Or do you think that there are inherent
> aspects of the algorithm that mean that we need to have this new
> scheduler for one class of CPUs and the old scheduler for the other
> class? Is there any way to make the new scheduler do a reasonable job
> with the existing descriptions in GCC, so that port maintainers can tune
> later, or is a level of effort like that for Itanium required?

The ia64 backend is very complex, and we put a lot of effort into tuning
it by itself -- you can see that in my other mail about target changes.
So I think that tuning for other targets will be simpler. The Cell results
I mentioned in that mail came from a developer who did the tuning
internally at Samsung, and AFAIR he didn't mention any target-independent
changes he had to make; basically, he just made it work.

>> Compile time slowdown measured with --enable-checking=assert is quite
>> significant -- about 12% on spec int and about 18% on spec fp and
>> cc1-i-files collection. For this reason, we have enabled selective
>> scheduler by default at -O3 on ia64 and disabled by default on other
>> targets.
>
> Do you understand what's causing the compile-time slowdown?

The part that takes the most time is the update of availability sets, as
this is the central part of the algorithm. Renaming is quite expensive
too, but we have tackled that by limiting it to the few insns with the
highest priority.

To make the updates faster, you need to build the data dependence graph
and keep it up to date while scheduling. Unfortunately, we didn't manage
to do this during this project. The first step towards this goal will be
to make the dependence graph classify dependencies as control/data,
lhs/rhs, register/memory, etc. Then we can devise a mechanism for updating
the graph, which would not be trivial -- e.g. when an insn gets renamed,
we introduce a register-register copy that can generate completely new
register dependencies which cannot be derived from the existing ones.
Such a project is likely to make it to trunk in the next release cycle,
and that would correspond to the last step of the incremental approach
outlined above.

Yours, Andrey