From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 32479 invoked by alias); 26 Sep 2002 19:01:34 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive:
List-Post:
List-Help:
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 32439 invoked from network); 26 Sep 2002 19:01:34 -0000
Received: from unknown (HELO mail-out2.apple.com) (17.254.0.51)
  by sources.redhat.com with SMTP; 26 Sep 2002 19:01:34 -0000
Received: from mailgate1.apple.com (A17-128-100-225.apple.com [17.128.100.225])
  by mail-out2.apple.com (8.11.3/8.11.3) with ESMTP id g8QJ1X914763;
  Thu, 26 Sep 2002 12:01:33 -0700 (PDT)
Received: from scv3.apple.com (scv3.apple.com)
  by mailgate1.apple.com (Content Technologies SMTPRS 4.2.5) with ESMTP id ;
  Thu, 26 Sep 2002 12:01:26 -0700
Received: from johada.apple.com (johada.apple.com [17.201.20.167])
  by scv3.apple.com (8.11.3/8.11.3) with ESMTP id g8QJ1W313857;
  Thu, 26 Sep 2002 12:01:32 -0700 (PDT)
Date: Thu, 26 Sep 2002 12:16:00 -0000
Subject: Another performance regression
Content-Type: text/plain; charset=US-ASCII; format=flowed
Mime-Version: 1.0 (Apple Message framework v543)
Cc: Dale Johannesen
To: gcc-patches@gcc.gnu.org, gcc@gcc.gnu.org
From: Dale Johannesen
Content-Transfer-Encoding: 7bit
Message-Id: <62C4CADA-D182-11D6-A88A-003065C86F94@apple.com>
X-SW-Source: 2002-09/txt/msg01078.txt.bz2

Try the program at the bottom with -O2 -funroll-loops.  Don't worry
about the body of the loops; that's only important insofar as it has
the right amount of code to cause the inner loop to be unrolled the
right number of times, namely 2, with 1 left over.

The unroller generates some rather stupid code here:

    /* Calculate the difference between the final and initial values.
       Final value may be a (plus (reg x) (const_int 1)) rtx.
       Let the following cse pass simplify this if initial value is
       a constant.

(there's more to it besides the expression described above) with the
expectation that cse will clean it up.
However, the second pass of loop optimization pulls some, but not all,
of this code out of the outer loop, with the effect that cse can't
eliminate it.  On ppc, for example, the beginning of the function looks
like this:

        bge- cr0,L18        ; zero-trip check for outer loop
        li r0,1             ; unnecessary
        cmpwi cr1,r0,0      ; unnecessary
        cmpwi cr6,r0,25     ; unnecessary
L16:                        ; top of outer loop
        slwi r0,r6,2
        li r8,0
        add r7,r0,r28
        mr r10,r29
        bge+ cr6,L22        ; always false
        beq- cr1,L15        ; always false
L22:
        ... single copy of inner loop body...
L15:
        ... two copies of inner loop body, executed 12 times...
        ble L15
        ...
        blt L16
L18:

I'm not entirely sure, but I think this patch was the culprit:

2002-07-21  Richard Henderson

        * loop.h (LOOP_AUTO_UNROLL): Rename from LOOP_FIRST_PASS.
        * loop.c (strength_reduce): Update.
        * toplev.c (rest_of_compilation): Do unrolling in the first
        loop pass, not the second.

This didn't happen when unrolling was done last.

So should I fix this by making the unrolling code smarter, in effect
doing cse's job?  It seems likely Roger Sayle's approach of running
gcse after loop opts would Just Work.  Is that going to go in?

int foo(char *abcd00, int abcd01, char *abcd02, int *abcd03, int*abcd04)
{
  int abcd05, abcd06, abcd07=0, abcd08=0, abcd09, abcd10, abcd11=0;
  for (abcd05=0;abcd05
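
For readers who want the shape of the transformation in source form, here is a rough C sketch of an inner loop unrolled by 2 with a leftover iteration handled up front, which is the layout visible in the ppc listing (a single copy of the body, then two copies per pass). This is not GCC output. The function names, the array-sum body, and the trip count of 25 are all assumptions made for illustration; 25 is chosen only because it matches one single copy plus two copies executed 12 times.

```c
/* Hypothetical reference loop: the rolled form, n iterations. */
static int sum_rolled(const int *a, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical hand-unrolled form (factor 2, leftover handled first). */
static int sum_unrolled_by_2(const int *a, int n)
{
    int s = 0;
    int i = 0;

    /* Difference between the final and initial counter values.  When both
       are constants, this and the two checks below are foldable, which is
       the cleanup the message expects cse to perform. */
    int trips = n - 0;

    if (trips <= 0)          /* zero-trip check */
        return 0;

    if (trips % 2 != 0) {    /* leftover: a single copy of the body */
        s += a[i];
        i++;
    }

    for (; i < n; i += 2) {  /* two copies of the body per pass */
        s += a[i];
        s += a[i + 1];
    }
    return s;
}
```

With n = 25 the leftover test is taken once and the two-copy loop runs 12 times. When n is a known constant, the trips computation and both guards are loop-invariant and could be removed entirely; the complaint above is about equivalent instructions surviving in the generated code.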