From: Brad Lucier <lucier@math.purdue.edu>
Message-Id: <200204041456.g34Eunh29854@banach.math.purdue.edu>
Subject: Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
To: jh@suse.cz (Jan Hubicka)
Date: Thu, 04 Apr 2002 07:02:00 -0000
Cc: lucier@math.purdue.edu (Brad Lucier), jh@suse.cz (Jan Hubicka), dje@watson.ibm.com (David Edelsohn), gcc@gcc.gnu.org, mark@codesourcery.com, feeley@iro.umontreal.ca
In-Reply-To: <20020329162904.GK2886@atrey.karlin.mff.cuni.cz> from "Jan Hubicka" at Mar 29, 2002 05:29:04 PM

Honza and I have exchanged some off-list e-mail about this problem.
Real life has intervened, and I doubt that he will have time to work
on it before the scheduled release of April 15, so I'll attempt to
summarize what's been happening.  I think I need some help.

All of this is with the compile options -O1 -fschedule-insns2.

At first, this was the profile on my all.i test:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds      calls  s/call   s/call  name
 37.31   1106.44  1106.44 2123667007    0.00     0.00  try_crossjump_to_edge
 11.27   1440.70   334.26                              internal_mcount
  6.85   1643.67   202.97     395788    0.00     0.00  cselib_invalidate_regno
  6.53   1837.39   193.72                              htab_traverse
  4.67   1975.95   138.56       4987    0.03     0.03  propagate_freq
  2.87   2061.08    85.13         29    2.94     2.94  find_unreachable_blocks
  2.50   2135.09    74.01         15    4.93     4.94  calc_idoms
  2.48   2208.53    73.44     468802    0.00     0.00  try_forward_edges
  2.46   2281.48    72.95  173160573    0.00     0.00  cached_make_edge
  2.41   2353.01    71.53  175996207    0.00     0.00  bitmap_operation
 ...

Here cleanup_cfg took > 98% of the 18 hours of CPU time on my
UltraSPARC.

Honza sent me a patch on March 28 that disabled try_crossjump_bb when
a block has more than 100 outgoing edges.  That changed the profile on
a slightly smaller example (denoise3.i) to:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds      calls  s/call   s/call  name
 63.29   1219.40  1219.40      33525    0.04     0.04  try_crossjump_to_edge
  8.19   1377.13   157.73                              internal_mcount
  5.08   1474.99    97.86                              htab_traverse
  2.52   1523.56    48.57     206655    0.00     0.00  try_forward_edges
  2.40   1569.81    46.25         31    1.49     1.49  find_unreachable_blocks
  2.16   1611.38    41.57       2905    0.01     0.01  propagate_freq
  1.40   1638.30    26.91          9    2.99     5.48  calculate_global_regs_live
  1.33   1663.98    25.68         15    1.71     1.71  calc_idoms
  1.20   1687.02    23.04         15    1.54     1.54  calc_dfs_tree_nonrec

Basically, try_crossjump_bb was no longer a problem, but
try_crossjump_to_edge still is; cleanup_cfg still took > 87% of the
CPU time.

Honza suggested (several times) that one way to deal with this is to
disable try_crossjump_to_edge and try_crossjump_bb unless we use -O2
or higher.  These algorithms are O(N^3) in the number of edges, and in
my programs the number of edges is quadratic in the program size,
since I use a lot of computed gotos and label addresses.
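To make the quadratic-edge claim concrete: as far as GCC knows, each
computed goto can jump to every label whose address is taken, so code
shaped like the following toy fragment (an illustration I made up for
this message, not something from all.i) gets one CFG edge per
(goto, label) pair; with N gotos and N labels, that is N^2 edges:

/* Toy illustration: with L address-taken labels and G computed
   gotos, the CFG has about G * L edges, since each "goto *x" may
   reach any label in OPS.  Generated code in which both G and L
   grow with program size N therefore has O(N^2) edges.  */
void
interpret (int *pc)
{
  static void *ops[] = { &&op0, &&op1, &&op2 };   /* 3 labels  */
  goto *ops[*pc++];                               /* goto #1   */
op0:
  goto *ops[*pc++];                               /* goto #2   */
op1:
  goto *ops[*pc++];                               /* goto #3   */
op2:
  return;
}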
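And in case it helps to have something concrete to point at, here is
the shape I imagine the guard plus the -O2 gate taking in
cfgcleanup.c.  This is only a sketch reconstructed from Honza's
description, not his actual patch: the function signature, the edge
walk, and where the cutoff sits are my guesses; only the function
name and the 100-edge cutoff come from what he told me.

/* Sketch only: gate crossjumping on the optimization level and give
   up early on blocks with very many outgoing edges.  */

#define MAX_CROSSJUMP_EDGES 100      /* the cutoff Honza mentioned */

static bool
try_crossjump_bb (int mode, basic_block bb)
{
  edge e;
  int n_edges = 0;

  /* Crossjumping is cubic in the number of edges, so only pay for
     it at -O2 and above.  */
  if (optimize < 2)
    return false;

  /* Bail out when BB has too many outgoing edges; computed gotos
     make this count quadratic in program size in code like mine.  */
  for (e = bb->succ; e; e = e->succ_next)
    if (++n_edges > MAX_CROSSJUMP_EDGES)
      return false;

  /* ... the existing crossjumping logic would go here ...  */
  return false;
}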
I've looked at cfgcleanup.c and don't really know how to proceed.
Can someone suggest a reasonable way to fix this?

Brad