From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 1658 invoked by alias); 20 Jul 2011 16:30:01 -0000
Received: (qmail 1647 invoked by uid 22791); 20 Jul 2011 16:30:00 -0000
X-SWARE-Spam-Status: No, hits=-2.8 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 20 Jul 2011 16:29:46 +0000
From: "wschmidt at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/49749] Reassociation rank algorithm does not include all non-NULL operands
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: wschmidt at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: wschmidt at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields: Status AssignedTo
Message-ID:
In-Reply-To:
References:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Date: Wed, 20 Jul 2011 16:30:00 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2011-07/txt/msg01680.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49749

William J. Schmidt changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |wschmidt at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #9 from William J. Schmidt 2011-07-20 16:28:49 UTC ---
Created attachment 24798
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24798
Proposed patch

I'm attaching a patch that solves this issue.
The patch was produced against the ibm/gcc-4_6-branch, but should apply OK to trunk -- I'll verify that later if the direction of this patch is acceptable.

This regains the 20% performance loss we had experienced in 410.bwaves, and also gives a 5% boost to 444.namd. No significant degradations were observed on SPEC cpu2006.

In addition to fixing the operand access problems, the patch looks for loop-carried dependencies in innermost loops, and biases the reassociation so that the phi target is summed last. The purpose of this is to identify accumulator variables in inner loops and make each iteration of their accumulations independent. When these loops are unrolled, multiple independent iterations can be interleaved for improved performance. The optimization is restricted to innermost loops to avoid unnecessarily raising register pressure.

There may be a better way to achieve the bias than what I've chosen here, so please comment. If the general approach is acceptable, I'll apply comments and submit the patch for approval.
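
To illustrate the accumulator case described above, here is a minimal hand-written sketch in C. It is not taken from the attached patch or from either benchmark; the function names sum3_chain and sum3_biased and the three-array shape are hypothetical, chosen only to show the two association orders at source level. In both functions 'sum' is the loop-carried value (the phi target in SSA form). Note that for floating-point types GCC only reassociates when -fassociative-math (implied by -ffast-math) is in effect.

```c
#include <stddef.h>

/* Hypothetical illustration, not code from the patch.
   Accumulator summed first: all three adds in an iteration sit on the
   loop-carried dependence chain through 'sum'. */
double sum3_chain(const double *a, const double *b, const double *c, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum = ((sum + a[i]) + b[i]) + c[i];
    return sum;
}

/* Accumulator summed last: (a[i] + b[i]) + c[i] does not depend on 'sum',
   so only the final add is on the loop-carried chain.  After unrolling,
   the partial sums of different iterations can be computed independently
   and interleaved. */
double sum3_biased(const double *a, const double *b, const double *c, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum = ((a[i] + b[i]) + c[i]) + sum;
    return sum;
}
```

With the accumulator first, each iteration adds three dependent operations to the critical path; with the accumulator last, it adds one. The two orders give the same result whenever the arithmetic is exact, but in general they can round differently for floating point, which is why the transformation is gated on the associative-math setting.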