From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 16426 invoked by alias); 25 Mar 2005 22:21:55 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 16387 invoked by uid 48); 25 Mar 2005 22:21:49 -0000
Date: Fri, 25 Mar 2005 22:21:00 -0000
From: "sje at cup dot hp dot com"
To: gcc-bugs@gcc.gnu.org
Message-ID: <20050325222146.20643.sje@cup.hp.com>
Reply-To: gcc-bugzilla@gcc.gnu.org
Subject: [Bug tree-optimization/20643] New: Tree loop optimizer does worse job than RTL loop optimizer
X-Bugzilla-Reason: CC
X-SW-Source: 2005-03/txt/msg03044.txt.bz2
List-Id:

In the attached test case, 3.4.* GCC generates better code than 4.0 (or
4.1) because it moves more loop invariant code out of the inner loop of
P7Viterbi.  The problem seems to be in the alias analysis which
determines what can be moved out of that loop.  If you change the field
M, which is unused, from int to float then the 4.* GCC generates better
code.  I tried the structure-alias branch to see if that helped and it
didn't.

See the email string starting at
http://gcc.gnu.org/ml/gcc/2005-03/msg00835.html for some more info.
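To make the missed optimization concrete, here is a hand-written sketch
of the kind of hoisting in question.  It is not compiler output.  The
names viterbi_plain and viterbi_hoisted are hypothetical, and the
hoisted version is only equivalent under the assumption that the stores
to mmx[i][k] never overwrite hmm->tsc, hmm->tsc[0], mmx[i] or mmx[i-1]
themselves, which is what the alias analysis would have to establish.

```c
struct plan7_s {
  int M;
  int **tsc;
};

/* Inner loop as written in the test case: every iteration nominally
   reloads hmm->tsc[0], mmx[i] and mmx[i-1].  */
void viterbi_plain(int L, int M, struct plan7_s *hmm, int **mmx)
{
  int i, k;
  for (i = 1; i <= L; i++) {
    for (k = 1; k <= M; k++) {
      mmx[i][k] = mmx[i-1][k-1] + hmm->tsc[0][k-1];
    }
  }
}

/* Hand-hoisted sketch: the three pointer loads that do not depend on k
   are done once per outer iteration.  ASSUMPTION: the int stores in the
   inner loop do not alias any of these pointers.  */
void viterbi_hoisted(int L, int M, struct plan7_s *hmm, int **mmx)
{
  int i, k;
  for (i = 1; i <= L; i++) {
    int *cur  = mmx[i];       /* invariant in k */
    int *prev = mmx[i-1];     /* invariant in k */
    int *tsc0 = hmm->tsc[0];  /* invariant in k */
    for (k = 1; k <= M; k++) {
      cur[k] = prev[k-1] + tsc0[k-1];
    }
  }
}
```

On non-overlapping buffers both functions should produce the same
matrix; comparing the generated inner loops for the two is one way to
see how much the compiler hoisted on its own.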
Test case:

#define L_CONST 500
void *malloc(long size);
struct plan7_s {
  int M;
  int **tsc;   /* transition scores [0.6][1.M-1] */
};
struct dpmatrix_s {
  int **mmx;
};
struct dpmatrix_s *mx;

void AllocPlan7Body(struct plan7_s *hmm, int M)
{
  int i;

  hmm->tsc = malloc (7 * sizeof(int *));
  hmm->tsc[0] = malloc ((M+16) * sizeof(int));
  mx->mmx = (int **) malloc(sizeof(int *) * (L_CONST+1));
  for (i = 0; i <= L_CONST; i++) {
    mx->mmx[i] = malloc (M+2+16);
  }
  return;
}

void P7Viterbi(int L, int M, struct plan7_s *hmm, int **mmx)
{
  int i,k;

  for (i = 1; i <= L; i++) {
    for (k = 1; k <= M; k++) {
      mmx[i][k] = mmx[i-1][k-1] + hmm->tsc[0][k-1];
    }
  }
}

main ()
{
  struct plan7_s *hmm;
  char dsq[L_CONST];
  int i;

  hmm = (struct plan7_s *) malloc (sizeof (struct plan7_s));
  mx = (struct dpmatrix_s *) malloc (sizeof (struct dpmatrix_s));
  AllocPlan7Body(hmm, 10);
  for (i = 0; i < 600000; i++) {
    P7Viterbi(500, 10, hmm, mx->mmx);
  }
}

-- 
           Summary: Tree loop optimizer does worse job than RTL loop optimizer
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sje at cup dot hp dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: ia64-*-*
  GCC host triplet: ia64-*-*
GCC target triplet: ia64-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20643