From: Revital Eres
To: zaks@il.ibm.com, gcc-patches@gcc.gnu.org
Cc: Patch Tracking
Date: Wed, 04 May 2011 08:19:00 -0000
Subject: Re: [PATCH, SMS] Avoid considering debug_insn when calculating SCCs

Hello,

The following is a summary of the discussion I had with Ayal regarding the patch:

Some background: currently, SMS supports only targets where the doloop pattern is decoupled from the rest of the loop's instructions (for example PowerPC); we'll call this 'case decoupled' for simplicity. In this case, the branch is not scheduled with the rest of the instructions but is instead placed in row ii-1 at the end of the scheduling procedure, after all the other instructions have been scheduled. The resulting kernel is optimal with respect to the Stage Count (SC) because min_cycle is placed in row 0 after normalizing the cycles to start from cycle zero.
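
To illustrate the normalization step for case decoupled, here is a minimal sketch. It assumes that a node's row is its cycle modulo ii and that its stage is its cycle divided by ii, rounded down. The function names and the four-node schedule are hypothetical and are not taken from the SMS implementation.

```python
def normalize(schedule):
    """Shift all cycles so that the earliest scheduled cycle becomes 0."""
    min_cycle = min(schedule.values())
    return {node: cycle - min_cycle for node, cycle in schedule.items()}


def row(cycle, ii):
    """Row of the kernel that a cycle maps to (assumed: cycle mod ii)."""
    return cycle % ii


def stage(cycle, ii):
    """Stage that a cycle belongs to (assumed: floor of cycle / ii)."""
    return cycle // ii


def stage_count(schedule, ii):
    """Number of stages spanned between the earliest and latest cycle."""
    cycles = list(schedule.values())
    return stage(max(cycles), ii) - stage(min(cycles), ii) + 1


# Hypothetical schedule (node -> cycle) with ii = 4; branch not included.
ii = 4
schedule = {"load": -1, "mul": 1, "add": 2, "store": 6}
normalized = normalize(schedule)

print(stage_count(schedule, ii))           # 3: cycles -1..6 straddle three stages
print(stage_count(normalized, ii))         # 2: cycles 0..7 fit in two stages
print(row(min(normalized.values()), ii))   # 0: min_cycle lands in row 0

# In case decoupled the branch is simply assigned the last row afterwards.
branch_row = ii - 1
```

In this sketch the schedule spans 8 cycles, so with ii = 4 no alignment can need fewer than 2 stages; normalizing reaches that minimum, while the unnormalized alignment needs one more.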

This patch tries to extend SMS to support targets where the doloop pattern is not decoupled from the rest of the loop's instructions (call it 'case NEW' for simplicity). In this case the branch cannot be placed wherever we want, because it must honor dependencies. We therefore schedule the branch instruction with the rest of the loop's instructions and, at the end of the scheduling procedure, rotate it to row ii-1 to make sure it is the last instruction in the iteration.

The suggestion was to simplify the patch by always scheduling the branch with the rest of the instructions. For case decoupled (where we have the alternative of normalizing the cycles and achieving the optimal SC), this should not affect performance but only code size, by increasing the SC by at most 1, which means adding instructions from at most one iteration to the prologue and epilogue.

The following is my attempt to prove that the SC can increase by at most one:

Consider the same loop with a decoupled branch, scheduled once with the branch instruction included among the rest of the loop's instructions and once ignoring it. If the distance between min_cycle and max_cycle remains the same in both schedules, then the SC of the first schedule is at most one more than that of the second, since the same span of cycles, shifted so that the branch lands in row ii-1, can cross at most one additional stage boundary. This holds in one direction, as the branch instruction should not affect the scheduling window of any other instruction, which is what we expect for case decoupled.

The question is whether there are cases where the branch can be scheduled outside the range of min_cycle and max_cycle. I think there is no such case, because the branch will be scheduled at asap = 0, which means that it will fall within the range of min_cycle and max_cycle. In practice, there is an edge with latency zero between memory references and the branch instruction, which is inserted by haifa sched. Also, the branch might be scheduled outside the range of min_cycle and max_cycle due to resource constraints.
For example, on PowerPC the issue rate in SMS is always 1, which forces the branch to be scheduled in a new cycle (and might also influence ii artificially).

Example of the resulting SMS kernels for the same loop:

The SMS kernel for case NEW, resulting in SC of 3 and ii of 5:

cycle node
------------------------------------
-3 0 -2 -1 1
<- start of SC 2
0 1 2 3 3 4 5 the branch
<- start of SC 3
5 4

The SMS kernel for case decoupled, resulting in SC of 2 and ii of 5:

cycle node
------------------------------------
0 0 1 2 1 3 3 4 2
<- start of SC 2
5 6 3

Thanks,
Revital