From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-494607-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 84107 invoked by alias); 12 Aug 2015 07:12:36 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 84021 invoked by uid 55); 12 Aug 2015 07:12:32 -0000
From: "rguenther at suse dot de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/29256] [4.9/5/6 regression] loop performance regression
Date: Wed, 12 Aug 2015 07:12:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 4.2.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenther at suse dot de
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.9.4
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID: <bug-29256-4-maPsMd0asl@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-29256-4@http.gcc.gnu.org/bugzilla/>
References: <bug-29256-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-08/txt/msg00749.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
--- Comment #57 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 11 Aug 2015, wschmidt at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
> 
> --- Comment #56 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
> (In reply to Bill Schmidt from comment #53)
> > I'm not a fan of a tree-level unroller.  It's impossible to make good
> > decisions about unroll factors that early.  But your second approach sounds
> > quite promising to me.
> 
> I would be willing to soften this statement.  I think that an early unroller
> might well be a profitable approach for most systems with large caches and so
> forth, where if the unrolling heuristics are not completely accurate we are
> still likely to make a reasonably good decision.  However, I would expect to
> see ports with limited caches/memory to want more accurate control over
> unrolling decisions.  So I could see allowing ports to select between a GIMPLE
> unroller and an RTL unroller (I doubt anybody would want both).
> 
> In general it seems like PowerPC could benefit from more aggressive unrolling
> much of the time, provided we can also solve the related IVOPTS problems that
> cause too much register spill.
> 
> I may have an interest in working on a GIMPLE unroller, depending on how
> quickly I can complete or shed some other projects...

I think that a separate unrolling pass on GIMPLE would be a hard sell
due to the lack of a good cost model.  _But_ doing unrolling as part
of another transform like we are doing now makes sense.  So does
eventually moving parts of an RTL pass involving unrolling to
GIMPLE, like modulo scheduling or SMS (leaving the scheduling part
to RTL).

Note that the RTL unroller is not enabled by default at any optimization
level, and that unfortunately it shares flags with the GIMPLE-level
complete peeling (where the flags mainly control cost modeling).  Oh,
but it is enabled with -fprofile-use.

It's been a long time since I've done SPEC measurements with/without
-funroll-loops (and/or -fpeel-loops).  Note that these flags have
secondary effects as well:

toplev.c:    flag_web = flag_unroll_loops || flag_peel_loops;
toplev.c:    flag_rename_registers = flag_unroll_loops || flag_peel_loops;
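
To make the secondary effect concrete, here is a minimal standalone
sketch of what those two assignments amount to.  It is not the actual
toplev.c code: the struct, the function name and the FLAG_UNSET
sentinel are hypothetical, and the "only when the user did not set the
flag explicitly" guard is an assumption about the surrounding code that
the grep output above does not show.

```c
/* Hypothetical sentinel meaning "neither -fX nor -fno-X was given".
   Assumption for this sketch, not taken from the grep output above.  */
#define FLAG_UNSET (-1)

/* Hypothetical container for the four flags involved.  */
struct opt_flags
{
  int unroll_loops;      /* -funroll-loops */
  int peel_loops;        /* -fpeel-loops */
  int web;               /* -fweb, or FLAG_UNSET */
  int rename_registers;  /* -frename-registers, or FLAG_UNSET */
};

/* Derive the secondary flags: web and register renaming follow
   unrolling/peeling unless the user chose a value explicitly.  */
static void
derive_secondary_flags (struct opt_flags *f)
{
  int implied = f->unroll_loops || f->peel_loops;

  if (f->web == FLAG_UNSET)
    f->web = implied;
  if (f->rename_registers == FLAG_UNSET)
    f->rename_registers = implied;
}
```

So any comparison with and without -funroll-loops (or -fpeel-loops)
also compares with and without web and register renaming, unless those
two are pinned explicitly on both sides of the measurement.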