From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 15128 invoked by alias); 15 Oct 2014 08:38:49 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 15090 invoked by uid 48); 15 Oct 2014 08:38:46 -0000
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/63537] [4.9/5 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
Date: Wed, 15 Oct 2014 08:38:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 4.9.1
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.9.2
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: keywords bug_status cf_reconfirmed_on cf_known_to_work target_milestone short_desc everconfirmed
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-10/txt/msg01127.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-10-15
      Known to work|                            |4.7.3
   Target Milestone|---                         |4.9.2
            Summary|Missed optimization: Loop   |[4.9/5 Regression] Missed
                   |unrolling adds extra copy   |optimization: Loop
                   |when returning aggregate    |unrolling adds extra copy
                   |                            |when returning aggregate
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener ---
This is because the outer loop is unrolled only after SRA
gets a chance to scalarize away the local aggregate. With GCC 4.7 we
unroll the loop during early unrolling even at -O2. With 4.9 we conclude:

Estimating sizes for loop 1
 BB: 4, after_exit: 0
  size:   2 if (i_1 <= 2)
   Exit condition will be eliminated in peeled copies.
 BB: 3, after_exit: 1
  size:   1 _4 = lhs.n[i_1];
  size:   1 _6 = _4 * rhs_5(D);
  size:   1 ret.n[i_1] = _6;
  size:   1 i_8 = i_1 + 1;
   Induction variable computation will be folded away.
size: 6-3, last_iteration: 2-0
  Loop size: 6
  Estimated size after unrolling: 7
Not unrolling loop 1: size would grow.

while 4.7 had:

Estimating sizes for loop 1
 BB: 4, after_exit: 0
  size:   2 if (i_1 <= 2)
   Exit condition will be eliminated.
 BB: 3, after_exit: 1
  size:   1 D.1593_3 = lhs.n[i_1];
  size:   1 D.1594_5 = D.1593_3 * rhs_4(D);
  size:   1 ret.n[i_1] = D.1594_5;
  size:   1 i_6 = i_1 + 1;
   Induction variable computation will be folded away.
size: 6-3, last_iteration: 2-2
  Loop size: 6
  Estimated size after unrolling: 6

so the difference is in last_iteration handling. Honza?

Otherwise this is an optimization pass ordering issue. Eventually a simple
pass could handle

  <retval> = ret;
  ret ={v} {CLOBBER};
  return <retval>;

and back-propagate <retval> into all stores/loads of ret.
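
To make the last_iteration difference concrete, here is a minimal sketch of the size arithmetic behind the two dumps. It is not a copy of the GCC source. The formula (unrolled copies of the non-eliminated body, plus the non-eliminated part of the last iteration, scaled by 2/3) is an assumption modeled on what estimated_unrolled_size in tree-ssa-loop-ivcanon.c appears to do. The struct and function names are hypothetical, and n_unroll = 3 is inferred from the exit test i_1 <= 2, assuming i starts at 0. Under those assumptions the sketch reproduces both estimates printed above.

```c
/* Hypothetical mirror of the numbers printed in the dump line
   "size: A-B, last_iteration: C-D".  */
struct loop_size_sketch
{
  int overall;                              /* A: insns in the loop body */
  int eliminated_by_peeling;                /* B: of those, folded away */
  int last_iteration;                       /* C: insns in last iteration */
  int last_iteration_eliminated_by_peeling; /* D: of those, folded away */
};

/* Assumed estimate (not verified against the GCC source):
   n_unroll copies of the surviving body insns, plus the surviving
   insns of the last iteration, scaled by 2/3 to account for later
   cleanups.  The result is clamped to at least 1.  */
static long
estimated_unrolled_size_sketch (const struct loop_size_sketch *s,
                                long n_unroll)
{
  long insns = n_unroll * (s->overall - s->eliminated_by_peeling);
  insns += s->last_iteration - s->last_iteration_eliminated_by_peeling;
  insns = insns * 2 / 3;
  return insns <= 0 ? 1 : insns;
}
```

With the 4.9 numbers (6-3, last_iteration 2-0) and n_unroll = 3 this gives (3 * 3 + 2) * 2 / 3 = 7, which exceeds the loop size of 6, hence "size would grow". With the 4.7 numbers (6-3, last_iteration 2-2) it gives (3 * 3 + 0) * 2 / 3 = 6, which does not exceed the loop size, so the loop is unrolled.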