From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 23605 invoked by alias); 14 Jul 2011 16:51:40 -0000 Received: (qmail 23594 invoked by uid 22791); 14 Jul 2011 16:51:38 -0000 X-SWARE-Spam-Status: No, hits=-2.8 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 X-Spam-Check-By: sourceware.org Received: from localhost (HELO gcc.gnu.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 14 Jul 2011 16:51:24 +0000 From: "wschmidt at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug tree-optimization/49749] New: Reassociation rank algorithm does not include all non-NULL operands X-Bugzilla-Reason: CC X-Bugzilla-Type: new X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: tree-optimization X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: wschmidt at gcc dot gnu.org X-Bugzilla-Status: UNCONFIRMED X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Changed-Fields: Message-ID: X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated Content-Type: text/plain; charset="UTF-8" MIME-Version: 1.0 Date: Thu, 14 Jul 2011 16:51:00 -0000 Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-bugs-owner@gcc.gnu.org X-SW-Source: 2011-07/txt/msg01134.txt.bz2 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49749 Summary: Reassociation rank algorithm does not include all non-NULL operands Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned@gcc.gnu.org ReportedBy: wschmidt@gcc.gnu.org CC: dje@gcc.gnu.org, richard.guenther@gmail.com, bergner@gcc.gnu.org, pthaugen@gcc.gnu.org, meissner@gcc.gnu.org Host: powerpc64-linux Target: powerpc64-linux Build: powerpc64-linux Created attachment 24754 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24754 Excerpt from bwaves showing the problem on r161839 and before tree-ssa-reassoc.c: get_rank () contains the following code: /* Otherwise, find the maximum rank for the operands, or the bb rank, whichever is less. */ rank = 0; maxrank = bb_rank[gimple_bb(stmt)->index]; if (gimple_assign_single_p (stmt)) { tree rhs = gimple_assign_rhs1 (stmt); n = TREE_OPERAND_LENGTH (rhs); if (n == 0) rank = MAX (rank, get_rank (rhs)); else { for (i = 0; i < n && TREE_OPERAND (rhs, i) && rank != maxrank; i++) rank = MAX(rank, get_rank (TREE_OPERAND (rhs, i))); } } else { n = gimple_num_ops (stmt); for (i = 1; i < n && rank != maxrank; i++) { gcc_assert (gimple_op (stmt, i)); rank = MAX(rank, get_rank (gimple_op (stmt, i))); } } The loop condition: i < n && TREE_OPERAND (rhs, i) && rank != maxrank; i++) erroneously stops processing operands as soon as a NULL operand is detected. As an example, a TARGET_MEM_REF without a symbol operand (operand 0) will be assigned a rank of zero by that loop. The condition "rank != maxrank" in both loops seems to indicate that an expression should never get a rank larger than its block rank. I am seeing cases where the rank is slightly larger; for example, a block rank of 1638400 and an operand rank of 1638404. If the intent is to limit the value to 1638400 in such a case, this also needs to be fixed. The existence of this bug has contributed in an odd way to a degradation of the CPU2006 bwaves benchmark on powerpc64 (affected procedure attached). In trunk revision 161840 (7/5/2010), Richard Guenther changed the ivopts code to generate MEM_REFs in preference to TARGET_MEM_REFs. Prior to this change, SSA names from assignments to TARGET_MEM_REFs were given a rank of 1. Following this change, SSA names from assignments to MEM_REFs were given a rank of 1638404 (for example), while SSA names from PHI assignments in the same block were given a rank of 1638400. Thus, prior to this change, PHI assignments had higher rank than memory references in the same block, but after this change, they had lower rank. Giving preference to the PHI expressions over the memory references provided better reassociation for the bwaves example. In particular, prior to 161840 a prephitmp was reassociated so that copyrename could generate something like prephitmp.160_552 = D.1095_522 + prephitmp.160_161; thus making prephitmp.160 independent of other calculations in the loop iteration. When the PHI expression was given lowest rank, this did not occur, causing prephitmp.160 to be dependent upon the result of the previous loop iteration's calculations. For powerpc64, we unroll the loop by a factor of 2. When the two iterations are independent, FMAs from the two iterations can be interleaved leading to good performance. When the two iterations are not independent, we end up with a stack of linearly dependent FMAs at the end of the loop that chokes the floating-point unit. This leads to the question of whether PHIs at the top of a block should generally be given preference over memory references in the interior of that block. It may be that if the PHIs and the memory references have the same rank, this is exactly what will happen. That might just be serendipity, though; thoughts? I will be experimenting with capping the rank of an expression at the rank of the containing block, and including all operands in the ranking. But I am very interested in thoughts on this problem from those who are familiar with the reassociator. Please let me know if you would like further materials to examine. The bug goes back a long way; the degradation affects us on 4.6 and 4.7.