From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-291080-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 24410 invoked by alias); 4 May 2011 08:15:24 -0000
Received: (qmail 24395 invoked by uid 22791); 4 May 2011 08:15:21 -0000
X-SWARE-Spam-Status: No, hits=-2.3 required=5.0	tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW
X-Spam-Check-By: sourceware.org
Received: from mail-ey0-f175.google.com (HELO mail-ey0-f175.google.com) (209.85.215.175)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 04 May 2011 08:15:06 +0000
Received: by eye27 with SMTP id 27so297717eye.20        for <gcc-patches@gcc.gnu.org>; Wed, 04 May 2011 01:15:05 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.213.33.67 with SMTP id g3mr1832763ebd.13.1304496904632; Wed, 04 May 2011 01:15:04 -0700 (PDT)
Received: by 10.213.108.203 with HTTP; Wed, 4 May 2011 01:15:04 -0700 (PDT)
In-Reply-To: <BANLkTimKTPwg3kM+rUsH7SDQOBX2iF-e5Q@mail.gmail.com>
References: <BANLkTimKTPwg3kM+rUsH7SDQOBX2iF-e5Q@mail.gmail.com>
Date: Wed, 04 May 2011 08:19:00 -0000
Message-ID: <BANLkTimKLK3sA3X0P2wHy4PiV82i9iP8uQ@mail.gmail.com>
Subject: Re: [PATCH, SMS] Avoid considering debug_insn when calculating SCCs
From: Revital Eres <revital.eres@linaro.org>
To: zaks@il.ibm.com, gcc-patches@gcc.gnu.org
Cc: Patch Tracking <patches@linaro.org>
Content-Type: text/plain; charset=ISO-8859-1
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2011-05/txt/msg00251.txt.bz2

Hello,

The following is a summary of discussion I had with Ayal regarding the patch:

Some background: currently, SMS supports only targets where the doloop
pattern is decoupled from the rest of the loop's instructions (for example
PowerPC); we'll call it 'case decoupled' for simplicity. In this case,
the branch is not scheduled with the rest of the instructions but is
rather placed in row ii-1 at the end of the scheduling procedure, after
all the other instructions have been scheduled. The resulting kernel
is optimal with respect to the Stage Count (SC) because min_cycle is
placed in row 0 after normalizing the cycles to start from cycle zero.

This patch tries to extend SMS to support targets where the doloop
pattern is not decoupled from the rest of the loop's instructions (call
it 'case NEW' for simplicity). In this case the branch cannot be placed
wherever we want, because it must honor dependencies; we therefore
schedule the branch instruction with the rest of the loop's instructions
and rotate it to be in row ii-1 at the end of the scheduling procedure
to make sure it's the last instruction in the iteration.
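
To make the difference between the two cases concrete, here is a minimal
sketch (not the GCC implementation). It assumes a schedule is just a map
from node to cycle and that stage boundaries fall on multiples of ii. The
function names and the four-node schedule are hypothetical and only
illustrate normalizing versus rotating the branch into row ii-1.

```python
def normalize(schedule):
    """Shift all cycles so that the earliest instruction lands on cycle 0."""
    min_cycle = min(schedule.values())
    return {node: cycle - min_cycle for node, cycle in schedule.items()}


def rotate_branch_last(schedule, branch, ii):
    """Shift all cycles so that the branch lands in row ii-1."""
    shift = (ii - 1) - (schedule[branch] % ii)
    return {node: cycle + shift for node, cycle in schedule.items()}


def stage_count(schedule, ii):
    """Number of ii-wide stages touched, with boundaries at multiples of ii."""
    cycles = list(schedule.values())
    return max(cycles) // ii - min(cycles) // ii + 1


# Hypothetical schedule: node "c" ends up one cycle after the branch "br".
ii = 4
sched = {"a": 0, "b": 1, "br": 2, "c": 3}

print(stage_count(normalize(sched), ii))                 # 1
print(stage_count(rotate_branch_last(sched, "br", ii), ii))  # 2
```

In this toy schedule, normalizing keeps everything in one stage, while
forcing the branch into row ii-1 pushes node "c" into a second stage.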

The suggestion was to simplify the patch by always scheduling the branch
with the rest of the instructions.
For case decoupled (where we have the alternative of normalizing the
cycles and achieving the optimal SC), this should not affect performance
but only code size: the SC increases by at most 1, which means adding
instructions from at most one iteration to the prologue and epilogue.

The following is my attempt to prove that the SC can increase by
at most one:
Consider the same loop with a decoupled branch, scheduled once with the
branch instruction together with the rest of the loop's instructions and
once ignoring it. If the distance between min_cycle and max_cycle remains
the same in both schedules, then the SC is at most +1 for the first case.
This holds in one direction, since the branch instruction should not affect
the scheduling window of any other instruction, which is what we expect
for case decoupled. The question is whether there are cases where the
branch can be scheduled outside the range of min_cycle and max_cycle. I
think there is no such case, because the branch will be scheduled at
asap = 0, which means that it will fall within the range of min_cycle
and max_cycle.
In practice, there is an edge with latency zero between memory references
and the branch instruction, which is inserted by haifa sched. Also,
the branch might be scheduled outside the range of min_cycle and
max_cycle due to resource constraints. For example, on PowerPC the issue
rate in SMS is always 1, which forces the branch to be scheduled in a
new cycle (and might also influence ii in an artificial way).
Examples of the resulting SMS kernels for the same loop:

The SMS kernel for case NEW, resulting in an SC of 3 and an ii of 5:

cycle   node
------------------------------------
-3      0
-2
-1      1
<- start of SC 2
 0
 1
 2      3
 3
 4      5 (the branch)
<- start of SC 3
 5      4

The SMS kernel for case decoupled, resulting in an SC of 2 and an ii of 5:

cycle   node
------------------------------------
0       0
1
2       1
3       3
4       2
<- start of SC 2
5
6       3
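
The stage markers in the two kernels above can be recomputed from the
cycle ranges alone. This is a small sketch under the assumption that a
new stage starts at every multiple of ii; the helper name is
hypothetical.

```python
def stages(min_cycle, max_cycle, ii):
    """Map each cycle to a 1-based stage, with boundaries at multiples of ii."""
    first = min_cycle // ii
    return {c: c // ii - first + 1 for c in range(min_cycle, max_cycle + 1)}


ii = 5
case_new = stages(-3, 5, ii)        # cycles -3..5 from the first kernel
case_decoupled = stages(0, 6, ii)   # cycles 0..6 from the second kernel

print(max(case_new.values()))        # 3
print(max(case_decoupled.values()))  # 2

# Normalizing case NEW (shifting cycle -3 to cycle 0) would give cycles 0..8.
print(max(stages(0, 8, ii).values()))  # 2
```

Under this assumption, case NEW spans three stages only because its rows
are fixed by the branch in row ii-1; the same nine cycles fit in two
stages once normalized.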


Thanks,
Revital