From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-135889-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 16426 invoked by alias); 25 Mar 2005 22:21:55 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 16387 invoked by uid 48); 25 Mar 2005 22:21:49 -0000
Date: Fri, 25 Mar 2005 22:21:00 -0000
From: "sje at cup dot hp dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Message-ID: <20050325222146.20643.sje@cup.hp.com>
Reply-To: gcc-bugzilla@gcc.gnu.org
Subject: [Bug tree-optimization/20643] New: Tree loop optimizer does worse job than RTL loop optimizer
X-Bugzilla-Reason: CC
X-SW-Source: 2005-03/txt/msg03044.txt.bz2
List-Id: <gcc-bugs.sourceware.org>

In the attached test case, GCC 3.4.* generates better code than 4.0 (or 4.1)
because it moves more loop-invariant code out of the inner loop of P7Viterbi.
The problem seems to be in the alias analysis that determines what can be moved
out of that loop.  If you change the field M, which is unused, from int to float,
then GCC 4.* generates better code.  I tried the structure-alias branch to
see if that helped, and it did not.  See the email thread starting at
http://gcc.gnu.org/ml/gcc/2005-03/msg00835.html for more information.

Test case:

#define L_CONST 500

void *malloc(long size);

struct plan7_s {
  int M;
  int **tsc;                   /* transition scores     [0.6][1.M-1]        */
};

struct dpmatrix_s {
  int **mmx;
};
struct dpmatrix_s *mx;


void
AllocPlan7Body(struct plan7_s *hmm, int M) 
{
  int i;

  hmm->tsc    = malloc (7 * sizeof(int *));
  hmm->tsc[0] = malloc ((M+16) * sizeof(int));
  mx->mmx = (int **) malloc(sizeof(int *) * (L_CONST+1));
  for (i = 0; i <= L_CONST; i++) {
    mx->mmx[i] = malloc ((M+2+16) * sizeof(int));
  }
  return;
}  

void
P7Viterbi(int L, int M, struct plan7_s *hmm, int **mmx)
{
  int   i,k;
  
  for (i = 1; i <= L; i++) {
    for (k = 1; k <= M; k++) {
      mmx[i][k] = mmx[i-1][k-1] + hmm->tsc[0][k-1];
    }
  }
}

int
main (void)
{
  struct plan7_s *hmm;
  char dsq[L_CONST];
  int i;

  hmm = (struct plan7_s *) malloc (sizeof (struct plan7_s));
  mx = (struct dpmatrix_s *) malloc (sizeof (struct dpmatrix_s));
  AllocPlan7Body(hmm, 10);
  for (i = 0; i < 600000; i++) {
    P7Viterbi(500, 10, hmm, mx->mmx);
  }
  return 0;
}
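
For illustration, the loop-invariant motion described above can be written out
by hand.  The sketch below is not compiler output.  It is a hand-written
approximation that assumes the store to mmx[i][k] never overlaps hmm->tsc,
hmm->tsc[0], or the row pointers mmx[i] and mmx[i-1].  The names
P7Viterbi_orig, P7Viterbi_hoisted and hoisting_preserves_result are
hypothetical and are not part of the test case.

```c
#include <stdlib.h>

/* Same layout as struct plan7_s in the test case. */
struct plan7_s {
  int M;
  int **tsc;
};

/* Loop nest as written in the test case. */
void
P7Viterbi_orig(int L, int M, struct plan7_s *hmm, int **mmx)
{
  int i, k;

  for (i = 1; i <= L; i++) {
    for (k = 1; k <= M; k++) {
      mmx[i][k] = mmx[i-1][k-1] + hmm->tsc[0][k-1];
    }
  }
}

/* Hand-hoisted variant (assumption: no store in the loop aliases the
   pointers being hoisted).  hmm->tsc[0] is loaded once per call, and
   mmx[i] and mmx[i-1] once per outer iteration. */
void
P7Viterbi_hoisted(int L, int M, struct plan7_s *hmm, int **mmx)
{
  int i, k;
  int *tsc0 = hmm->tsc[0];

  for (i = 1; i <= L; i++) {
    int *cur = mmx[i];
    int *prev = mmx[i-1];
    for (k = 1; k <= M; k++) {
      cur[k] = prev[k-1] + tsc0[k-1];
    }
  }
}

/* Hypothetical helper: runs both variants on separately allocated,
   identically initialized matrices and returns 1 if the results match. */
int
hoisting_preserves_result(int L, int M)
{
  struct plan7_s hmm;
  int *tsc_rows[1];
  int **a, **b;
  int i, k, same = 1;

  hmm.M = M;
  hmm.tsc = tsc_rows;
  hmm.tsc[0] = malloc(M * sizeof(int));
  for (k = 0; k < M; k++)
    hmm.tsc[0][k] = 3 * k + 1;

  a = malloc((L + 1) * sizeof(int *));
  b = malloc((L + 1) * sizeof(int *));
  for (i = 0; i <= L; i++) {
    a[i] = calloc(M + 1, sizeof(int));
    b[i] = calloc(M + 1, sizeof(int));
  }
  for (k = 0; k <= M; k++)
    a[0][k] = b[0][k] = k;

  P7Viterbi_orig(L, M, &hmm, a);
  P7Viterbi_hoisted(L, M, &hmm, b);

  for (i = 0; i <= L; i++)
    for (k = 0; k <= M; k++)
      if (a[i][k] != b[i][k])
        same = 0;

  for (i = 0; i <= L; i++) {
    free(a[i]);
    free(b[i]);
  }
  free(a);
  free(b);
  free(hmm.tsc[0]);
  return same;
}
```

The two variants are interchangeable only when the no-overlap assumption
holds.  If the alias analysis cannot rule out that the int store modifies
hmm->tsc, the loads have to stay in the inner loop.  That would be consistent
with the int-to-float observation above, but it is a guess rather than a
confirmed diagnosis.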

-- 
           Summary: Tree loop optimizer does worse job than RTL loop
                    optimizer
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sje at cup dot hp dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: ia64-*-*
  GCC host triplet: ia64-*-*
GCC target triplet: ia64-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20643