From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-464106-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 15128 invoked by alias); 15 Oct 2014 08:38:49 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 15090 invoked by uid 48); 15 Oct 2014 08:38:46 -0000
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/63537] [4.9/5 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
Date: Wed, 15 Oct 2014 08:38:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 4.9.1
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.9.2
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: keywords bug_status cf_reconfirmed_on cf_known_to_work target_milestone short_desc everconfirmed
Message-ID: <bug-63537-4-b6Tpbz5s0t@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-63537-4@http.gcc.gnu.org/bugzilla/>
References: <bug-63537-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-10/txt/msg01127.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-10-15
      Known to work|                            |4.7.3
   Target Milestone|---                         |4.9.2
            Summary|Missed optimization: Loop   |[4.9/5 Regression] Missed
                   |unrolling adds extra copy   |optimization: Loop
                   |when returning aggregate    |unrolling adds extra copy
                   |                            |when returning aggregate
     Ever confirmed|0                           |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
This is because the outer loop is unrolled only after SRA has already run,
so SRA misses its chance to scalarize away the local aggregate.  With GCC 4.7
we unroll the loop during early unrolling even at -O2.
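
For orientation, the kind of function under discussion probably looks
something like the sketch below.  It is a hypothetical reconstruction from the
statement names in the dumps (lhs.n[i], rhs, ret.n[i], and the i <= 2 exit
test), not the reporter's testcase; the struct name, function name and element
type are assumptions.

```c
/* Hypothetical reconstruction; names and element type are assumed. */
struct vec3 { int n[3]; };

struct vec3 scale(struct vec3 lhs, int rhs)
{
    struct vec3 ret;                  /* the local aggregate SRA would scalarize */
    for (int i = 0; i < 3; i++)       /* the small loop that should be unrolled */
        ret.n[i] = lhs.n[i] * rhs;
    return ret;                       /* copy of ret into the return slot */
}
```

If the loop is fully unrolled before SRA runs, all accesses to ret use
constant indices and the aggregate can be split into scalars.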

With 4.9 we conclude:

Estimating sizes for loop 1
 BB: 4, after_exit: 0
  size:   2 if (i_1 <= 2)
   Exit condition will be eliminated in peeled copies.
 BB: 3, after_exit: 1
  size:   1 _4 = lhs.n[i_1];
  size:   1 _6 = _4 * rhs_5(D);
  size:   1 ret.n[i_1] = _6;
  size:   1 i_8 = i_1 + 1;
   Induction variable computation will be folded away.

size: 6-3, last_iteration: 2-0
  Loop size: 6
  Estimated size after unrolling: 7
Not unrolling loop 1: size would grow.

while 4.7 had:

Estimating sizes for loop 1
 BB: 4, after_exit: 0
  size:   2 if (i_1 <= 2)
   Exit condition will be eliminated.
 BB: 3, after_exit: 1
  size:   1 D.1593_3 = lhs.n[i_1];
  size:   1 D.1594_5 = D.1593_3 * rhs_4(D);
  size:   1 ret.n[i_1] = D.1594_5;
  size:   1 i_6 = i_1 + 1;
   Induction variable computation will be folded away.
size: 6-3, last_iteration: 2-2
  Loop size: 6
  Estimated size after unrolling: 6

so the difference is in last_iteration handling.
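
To see how the two last_iteration values lead to 7 versus 6, here is a rough
sketch of the size estimate.  The formula is an assumption that happens to
reproduce both dumps; the function name, the parameter names and the 2/3
folding factor are illustrative and not quoted from the GCC sources.  The
unroll count of 3 is likewise inferred from the i_1 <= 2 exit test.

```c
/* Sketch only: name, parameters and the 2/3 factor are assumptions
   chosen to match the dumps above, not quoted from GCC. */
long estimated_unrolled_size(long overall, long eliminated_by_peeling,
                             long last_iteration,
                             long last_iteration_eliminated,
                             long n_unroll)
{
    /* Each peeled copy keeps whatever peeling cannot eliminate. */
    long insns = n_unroll * (overall - eliminated_by_peeling);

    /* The final, partial iteration is accounted for separately. */
    insns += last_iteration - last_iteration_eliminated;

    /* Assume roughly a third of the result folds away afterwards. */
    insns = insns * 2 / 3;

    return insns > 0 ? insns : 1;
}
```

With the 4.9 numbers (6-3, 2-0) and three copies this gives
(3 * 3 + 2) * 2 / 3 = 7, which exceeds the loop size of 6.  With the 4.7
numbers (6-3, 2-2) it gives (3 * 3 + 0) * 2 / 3 = 6, which does not.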

Honza?

Otherwise this is an optimization pass ordering issue.

Eventually a simple pass could handle

  <retval> = ret;
  ret ={v} {CLOBBER};
  return <retval>;

and back-propagate <retval> into all stores/loads of ret.
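
At the source level, the effect of such back-propagation can be pictured as
writing straight into the return slot instead of into a local that is copied
afterwards.  The sketch below models the return slot as an explicit
out-pointer; scale_into and struct vec3 are hypothetical names, and this is
an analogy rather than what such a pass would actually emit.

```c
/* Hypothetical illustration; names are assumed. */
struct vec3 { int n[3]; };

/* Every store that used to go to the local 'ret' now goes directly
   to the return slot, so no trailing aggregate copy is needed. */
void scale_into(struct vec3 *retval, const struct vec3 *lhs, int rhs)
{
    for (int i = 0; i < 3; i++)
        retval->n[i] = lhs->n[i] * rhs;
}
```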