From: "Ian Bolton"
Subject: Understanding Scheduling
Date: Fri, 19 Mar 2010 16:12:00 -0000
Message-ID: <4D60B0700D1DB54A8C0C6E9BE69163700E08F38E@EXCHANGEVS.IceraSemi.local>

Hi folks!

I've moved on from register allocation (see the Understanding IRA thread)
and onto scheduling.  In particular, I am investigating the effectiveness
of the sched1 pass on our architecture, together with the associated
interblock-scheduling optimisation.

Let's start with sched1 ...

For our architecture at least, it seems Richard Earnshaw is right that
sched1 is generally bad when you are compiling with -Os, because it can
increase register pressure and cause extra spill/fill code when it moves
independent instructions in between dependent ones.  For example:

    LOAD c2,c1[0]
    LOAD c3,c1[1]
    ADD  c2,c2,c3    # depends on LOAD above it (might stall)
    LOAD c3,c1[2]
    ADD  c2,c2,c3    # depends on LOAD above it (might stall)
    LOAD c3,c1[3]
    ADD  c2,c2,c3    # depends on LOAD above it (might stall)
    LOAD c3,c1[4]
    ADD  c2,c2,c3    # depends on LOAD above it (might stall)

might become:

    LOAD c2,c1[0]
    LOAD c3,c1[1]
    LOAD c4,c1[2]    # independent of first two LOADs
    LOAD c5,c1[3]    # independent of first two LOADs
    ADD  c2,c2,c3    # not dependent on preceding two insns (avoids stall)
    LOAD c3,c1[4]
    ADD  c2,c2,c4    # not dependent on preceding three insns (avoids stall)
    ...

This is a nice effect if your LOAD instructions have a latency of 3, so it
should lead to performance increases, and indeed that is what I see for
some low-register-pressure Nullstone cases; turning sched1 off therefore
causes a regression on those cases.  However, this pipelining effect can
also increase register pressure to the point where caller-save registers
are needed and extra spill/fill code has to be generated.
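To make the trade-off concrete, here is roughly the shape of C source I
have in mind (an illustrative sketch of my own, not an actual Nullstone
test; the function name is made up):

    /* An unrolled sum like this gives the dependent LOAD/ADD chain shown
       above: each ADD waits on the LOAD just before it unless sched1
       hoists the loads, which hides the load latency but keeps more
       values live at once.  */
    int
    sum5 (const int *c1)        /* c1 holds the array base, as above */
    {
      int sum;
      sum  = c1[0];             /* LOAD c2,c1[0]               */
      sum += c1[1];             /* LOAD c3,c1[1]; ADD c2,c2,c3 */
      sum += c1[2];
      sum += c1[3];
      sum += c1[4];
      return sum;
    }

Scale that pattern up (more elements, or values that stay live for later
use) and the hoisted loads start costing you caller-save registers.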
That increase in register pressure is exactly what happens for some other
Nullstone cases, and so it is good to have sched1 turned off for them!
It's therefore looking like some kind of clever hybrid is required.

I mention all this because I was wondering which other architectures have
turned off sched1 for -Os?  More importantly, I was wondering if anyone
else has considered creating some kind of clever hybrid that only uses
sched1 when it will increase performance without increasing register
pressure?  Or perhaps I could make a heuristic based on the balanced-ness
of the tree?  (I see sched1 does a lot better if the tree is balanced,
since it has more options to play with.)

Now onto interblock-scheduling ...

As we all know, you can't have interblock-scheduling enabled unless you
use the sched1 pass, so if sched1 is off then interblock-scheduling is
irrelevant.  For now, let's assume we are going to make some clever
hybrid that allows sched1 when we think it will increase performance at
-Os, and that we are going to keep sched1 on for -O2 and -O3.

As I understand it, interblock-scheduling enlarges the scope of sched1,
so that independent insns from a completely different block can be
inserted in between dependent insns in this block.  As well as
potentially amortizing stalls on high-latency insns, we also get the
chance to do "meatier" work in the destination block and leave less to do
in the source block.  I don't know whether this is a deliberate effect of
interblock-scheduling or just a happy side-effect.

Anyway, the reason I mention interblock-scheduling is that I see it doing
seemingly intelligent moves, but then the later BB-reorder pass juggles
blocks around such that we end up with extra code inside hot loops!  I
assume this is because the scheduler and BB-reorder are largely ignorant
of each other, so good intentions on the part of the former can be
scuppered by the latter.  I was wondering if anyone else has witnessed
this madness on their architecture?  Maybe it is a bug in BB-reorder?  Or
maybe BB-reorder should only be enabled when profiling information (e.g.
from gcov) is available?  Or maybe it is not a high-priority thing for
anyone to think about because no one uses interblock-scheduling?

If anyone can shed some light on the above, I'd greatly appreciate it.
For now, I will continue my experiments with selective enabling of sched1
for -Os.

Best regards,
Ian
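P.S. In case it is useful to anyone wanting to repeat the experiment: from
the command line, -fno-schedule-insns turns sched1 off (sched2 is
controlled separately by -fschedule-insns2).  Inside a port, the starting
point I have in mind is something along these lines - only a sketch, the
function name is made up, and a real hybrid would want a register-pressure
estimate rather than this blanket test:

    /* Sketch only: in the backend's OPTIMIZATION_OPTIONS/override-options
       handling, clear flag_schedule_insns (sched1) when optimizing for
       size; flag_schedule_insns_after_reload (sched2) is left alone.  */
    static void
    myport_optimization_options (int level ATTRIBUTE_UNUSED, int size)
    {
      if (size)
        flag_schedule_insns = 0;   /* no sched1 at -Os */
    }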