From mboxrd@z Thu Jan  1 00:00:00 1970
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
Date: Wed, 05 Sep 2012 11:01:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Keywords: compile-time-hog, memory-hog
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.6.4
X-Bugzilla-Changed-Fields:
Message-ID: <bug-46590-4-PBbTwc5n4A@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-46590-4@http.gcc.gnu.org/bugzilla/>
References: <bug-46590-4@http.gcc.gnu.org/bugzilla/>
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2012-09/txt/msg00346.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #36 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-05 10:59:52 UTC ---
If I fix that (PR54489) by iterating over immediate dominators when querying
AVAIL_OUT instead of accumulating it, then other loop optimizations quickly
take over in compile time, but memory usage stays reasonable at -O1.  LIM is
now the pass that pushes memory usage to 1.8GB; all other optimization passes
are happy with just ~800MB.  The issue with LIM is that it analyzes the whole
function instead of working on one outermost loop at a time (PR54488).  Then,
of course, IRA comes along and wrecks memory usage again (in
create_loop_tree_nodes).  One can tame IRA somewhat with
-fno-ira-loop-pressure -fira-region=one.  We then arrive at a roughly constant
900MB memory usage for the full(!) testcase at -O1, with these timings:

Execution times (seconds)
 phase opt and generate  : 495.90 (99%) usr   1.98 (98%) sys 499.91 (99%) wall  870508 kB (92%) ggc
 df reaching defs        :  19.16 ( 4%) usr   0.06 ( 3%) sys  19.18 ( 4%) wall       0 kB ( 0%) ggc
 alias stmt walking      :  28.75 ( 6%) usr   0.21 (10%) sys  29.12 ( 6%) wall    2336 kB ( 0%) ggc
 tree SSA rewrite        :  63.42 (13%) usr   0.02 ( 1%) sys  63.77 (13%) wall   18830 kB ( 2%) ggc
 tree SSA incremental    :  74.64 (15%) usr   0.03 ( 1%) sys  74.44 (15%) wall   25886 kB ( 3%) ggc
 dominance frontiers     : 101.71 (20%) usr   0.09 ( 4%) sys 102.17 (20%) wall       0 kB ( 0%) ggc
 dominance computation   :  52.56 (11%) usr   0.09 ( 4%) sys  53.35 (11%) wall       0 kB ( 0%) ggc
 loop invariant motion   : 101.20 (20%) usr   0.10 ( 5%) sys 101.75 (20%) wall    2700 kB ( 0%) ggc
 TOTAL                   : 498.79             2.03           502.87              947764 kB

(only entries above 10s are shown)

The incremental SSA time comes from complete loop unrolling / IV
canonicalization, which does an SSA update once per loop (similar to what
loop header copying formerly did).  Fixing that leads to

Execution times (seconds)
 phase opt and generate  : 214.62 (99%) usr   1.53 (96%) sys 217.41 (99%) wall  870508 kB (92%) ggc
 df reaching defs        :  23.07 (11%) usr   0.01 ( 1%) sys  23.10 (10%) wall       0 kB ( 0%) ggc
 alias stmt walking      :  28.51 (13%) usr   0.23 (14%) sys  28.93 (13%) wall    2336 kB ( 0%) ggc
 loop invariant motion   : 105.43 (48%) usr   0.01 ( 1%) sys 106.22 (48%) wall    2700 kB ( 0%) ggc
 TOTAL                   : 217.56             1.59           220.44              947764 kB

so RTL invariant motion is now the main offender ;)