From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-181614-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 14733 invoked by alias); 30 Mar 2006 16:15:00 -0000
Received: (qmail 14723 invoked by uid 48); 30 Mar 2006 16:14:57 -0000
Date: Thu, 30 Mar 2006 16:15:00 -0000
Subject: [Bug tree-optimization/26944]  New: -ftree-ch generates worse code
X-Bugzilla-Reason: CC
Message-ID: <bug-26944-1008@http.gcc.gnu.org/bugzilla/>
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "dann at godzilla dot ics dot uci dot edu" <gcc-bugzilla@gcc.gnu.org>
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2006-03/txt/msg02973.txt.bz2
List-Id: <gcc-bugs.sourceware.org>

The loop in the code below is compiled to this when using gcc-4.1 -O2:
.L5:
        movl    16(%ebp), %eax
        addl    %ecx, %eax
        addl    $1, %ecx
        movl    %edx, 20(%ebx,%eax,4)
        leal    (%edx,%ecx), %eax
        cmpl    %edi, %eax
        jle     .L5
but the code is much better when using gcc -fno-tree-ch -O2:
.L3:
        addl    $1, %ecx
        movl    %ebx, -4(%edx)
        addl    $4, %edx
        cmpl    %eax, %ecx
        jle     .L3
This is a regression, as gcc-3.4.3 generates code similar to the
-fno-tree-ch version.

The code is from Dhrystone, as included in Unixbench.

The regression is quite important, as embedded processor people still use
Dhrystone to benchmark compiler and processor speed.

It's strange that tree-ch messes up; the loop is about as simple as loops can
get.

typedef int One_Fifty;
typedef int Arr_1_Dim [50];
typedef int Arr_2_Dim [50] [50];
extern int Int_Glob;

void Proc_8 (Arr_1_Par_Ref, Arr_2_Par_Ref, Int_1_Par_Val, Int_2_Par_Val)
     Arr_1_Dim Arr_1_Par_Ref;
     Arr_2_Dim Arr_2_Par_Ref;
     int Int_1_Par_Val;
     int Int_2_Par_Val;
{
  register One_Fifty Int_Index;
  register One_Fifty Int_Loc;

  Int_Loc = Int_1_Par_Val + 5;
  Arr_1_Par_Ref [Int_Loc] = Int_2_Par_Val;
  Arr_1_Par_Ref [Int_Loc+1] = Arr_1_Par_Ref [Int_Loc];
  Arr_1_Par_Ref [Int_Loc+30] = Int_Loc;
  for (Int_Index = Int_Loc; Int_Index <= Int_Loc+1; ++Int_Index)
    Arr_2_Par_Ref [Int_Loc] [Int_Index] = Int_Loc;
  Arr_2_Par_Ref [Int_Loc] [Int_Loc-1] += 1;
  Arr_2_Par_Ref [Int_Loc+20] [Int_Loc] = Arr_1_Par_Ref [Int_Loc];
  Int_Glob = 5;
}
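
For reference, the loop can be isolated from the rest of Proc_8. The
following is only a sketch under stated assumptions: fill_row and
fill_row_ptr are hypothetical names that are not part of Dhrystone, ANSI
prototypes replace the K&R declarations, and it has not been checked that
this reduction shows the same difference in the generated code. fill_row
keeps the loop as written above; fill_row_ptr spells out at the source
level the pointer-increment form that the -fno-tree-ch assembly resembles.

```c
typedef int Arr_2_Dim [50] [50];

/* Hypothetical reduction of Proc_8: only the loop from the report.
   Stores Int_Loc into two consecutive elements of row Int_Loc.  */
void fill_row (Arr_2_Dim arr, int int_1_par_val)
{
  int int_loc = int_1_par_val + 5;
  int int_index;

  for (int_index = int_loc; int_index <= int_loc + 1; ++int_index)
    arr [int_loc] [int_index] = int_loc;
}

/* Same stores, written with an explicit pointer that advances by one
   element per iteration (assumed equivalent, for comparison only).  */
void fill_row_ptr (Arr_2_Dim arr, int int_1_par_val)
{
  int int_loc = int_1_par_val + 5;
  int *p = &arr [int_loc] [int_loc];
  int int_index;

  for (int_index = int_loc; int_index <= int_loc + 1; ++int_index)
    *p++ = int_loc;
}
```

Both functions should write the same two elements, so comparing their
assembly with and without -fno-tree-ch may be a quicker check than
building the whole benchmark.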


Intel's compiler generates even tighter code:

..B1.7:                         # Preds ..B1.10 ..B1.7
        movl      %ebx, (%ecx,%edx,4)                           #20.5
        addl      $1, %edx                                      #19.55
        cmpl      %eax, %edx                                    #19.3
        jle       ..B1.7        # Prob 80%                      #19.3


-- 
           Summary: -ftree-ch generates worse code
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944