From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-prs-return-22570-listarch-gcc-prs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 18266 invoked by alias); 5 Oct 2002 10:06:05 -0000
Mailing-List: contact gcc-prs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc-prs/>
List-Post: <mailto:gcc-prs@gcc.gnu.org>
List-Help: <mailto:gcc-prs-help@gcc.gnu.org>
Sender: gcc-prs-owner@gcc.gnu.org
Received: (qmail 18251 invoked by uid 71); 5 Oct 2002 10:06:04 -0000
Date: Sat, 05 Oct 2002 03:06:00 -0000
Message-ID: <20021005100604.18250.qmail@sources.redhat.com>
To: nobody@gcc.gnu.org
Cc: gcc-prs@gcc.gnu.org,
From: Anton Ertl <anton@a0.complang.tuwien.ac.at>
Subject: Re: optimization/8092: cross-jump triggers too often
Reply-To: Anton Ertl <anton@a0.complang.tuwien.ac.at>
X-SW-Source: 2002-10/txt/msg00177.txt.bz2
List-Id: <gcc-prs.sourceware.org>

The following reply was made to PR optimization/8092; it has been noted by GNATS.

From: Anton Ertl <anton@a0.complang.tuwien.ac.at>
To: bernd.paysan@gmx.de, rth@gcc.gnu.org, gcc-bugs@gcc.gnu.org,
        gcc-prs@gcc.gnu.org, nobody@gcc.gnu.org, gcc-gnats@gcc.gnu.org
Cc:  
Subject: Re: optimization/8092: cross-jump triggers too often
Date: Sat, 5 Oct 2002 11:57:34 +0200 (MET DST)

 PR 8092 reports essentially the same problem as PR 7953:
 
 http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&pr=7953
 
 Here's some additional data to give you an idea how far the thumb is
 sticking out:
 
 Here you see user times of four benchmarks in seconds (i.e. smaller is
 better) on a Pentium 4 2.26GHz:
 
 Overall slowdown from gcc-3.2 is around a factor of 5:
 
 0.26    0.29    0.32    0.37 gcc-2.95.3 with explicit reg vars
 1.20    1.46    1.79    1.96 gcc-3.2 with explicit reg vars
 
 Indirect slowdown from disabling Piumarta-style interpreter inlining
 is around a factor of 2.5 (interpreter inlining does not work with
 gcc-3.2 thanks to cross-jumping):
 
 0.26    0.29    0.32    0.37 gcc-2.95.3 with reg vars
 0.62    0.76    1.15    0.89 gcc-2.95.3 with reg vars, --no-dynamic
 
 Direct slowdown from gcc-3.2 pessimisations is around a factor of 2:
 
 0.62    0.76    1.15    0.89 gcc-2.95.3 with reg vars, --no-dynamic
 1.20    1.46    1.79    1.96 gcc-3.2 with explicit reg vars
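
 For anyone who wants to check the factors quoted above, here is a
 small sketch that recomputes them from the timings. It assumes the
 factors are plain arithmetic means of the per-benchmark ratios, which
 the message does not state; the function and array names are made up
 for illustration.

```c
#include <stddef.h>

/* User times in seconds for the four benchmarks, copied from the tables. */
static const double gcc295_dynamic[]    = {0.26, 0.29, 0.32, 0.37};
static const double gcc295_no_dynamic[] = {0.62, 0.76, 1.15, 0.89};
static const double gcc32[]             = {1.20, 1.46, 1.79, 1.96};

/* Arithmetic mean of slow[i] / fast[i]; the choice of mean is an assumption. */
double mean_ratio(const double *slow, const double *fast, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += slow[i] / fast[i];
    return sum / (double)n;
}

/* gcc-3.2 versus gcc-2.95.3 with interpreter inlining. */
double overall_slowdown(void)
{
    return mean_ratio(gcc32, gcc295_dynamic, 4);
}

/* gcc-2.95.3 --no-dynamic versus gcc-2.95.3 with interpreter inlining. */
double indirect_slowdown(void)
{
    return mean_ratio(gcc295_no_dynamic, gcc295_dynamic, 4);
}

/* gcc-3.2 versus gcc-2.95.3 --no-dynamic. */
double direct_slowdown(void)
{
    return mean_ratio(gcc32, gcc295_no_dynamic, 4);
}
```

 Under that assumption the three functions return roughly 5.1, 2.8 and
 1.9. For each individual benchmark the indirect and direct ratios
 multiply to the overall ratio; the means need not.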
 
 Note that at least the direct slowdowns will apply to all threaded
 code interpreters (e.g. Gforth, Ocaml bytecode, various Prolog
 implementations, possibly Perl6), and maybe to other interpreters as
 well.
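
 To make clear what kind of code is affected, here is a minimal sketch
 of a direct-threaded interpreter using GNU C labels-as-values. It is
 not Gforth's engine; the names (Cell, engine, run_demo, the OP_*
 constants) are hypothetical. The point is that every virtual machine
 instruction ends in its own indirect jump (NEXT); a pass that merges
 those identical tails into one shared jump leaves a single dispatch
 branch for the whole interpreter.

```c
#include <assert.h>
#include <stddef.h>

/* A threaded-code cell: either the address of a VM instruction's
   label or an inline literal operand. */
typedef union {
    void *code;
    long  lit;
} Cell;

enum { OP_LIT, OP_ADD, OP_MUL, OP_HALT, OP_COUNT };

/* Called with ip == NULL, stores the label table in *labels_out and
   returns 0; otherwise executes the threaded code starting at ip and
   returns the top of stack when it reaches HALT. */
static long engine(const Cell *ip, void *const **labels_out)
{
    static void *const labels[OP_COUNT] = { &&lit, &&add, &&mul, &&halt };
    long stack[16];
    long *sp = stack;                 /* points at the next free slot */

    if (ip == NULL) {
        *labels_out = labels;
        return 0;
    }

/* Each VM instruction ends with its own copy of this indirect jump. */
#define NEXT goto *(ip++)->code

    NEXT;
lit:  assert(sp < stack + 16);  *sp++ = (ip++)->lit;       NEXT;
add:  assert(sp - stack >= 2);  sp[-2] += sp[-1];  sp--;   NEXT;
mul:  assert(sp - stack >= 2);  sp[-2] *= sp[-1];  sp--;   NEXT;
halt: assert(sp - stack >= 1);  return sp[-1];
#undef NEXT
}

/* Builds and runs the threaded code for (2 + 3) * 4. */
long run_demo(void)
{
    void *const *op;
    Cell prog[9];

    engine(NULL, &op);
    prog[0].code = op[OP_LIT];  prog[1].lit = 2;
    prog[2].code = op[OP_LIT];  prog[3].lit = 3;
    prog[4].code = op[OP_ADD];
    prog[5].code = op[OP_LIT];  prog[6].lit = 4;
    prog[7].code = op[OP_MUL];
    prog[8].code = op[OP_HALT];
    return engine(prog, NULL);
}
```

 run_demo() returns 20. Whether the compiler keeps one indirect jump
 per instruction or merges them can be checked in the assembly output
 (gcc -S).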
 
 As for the explicit reg vars, these are optional and used only because
 they produce faster code than gcc's own register allocation; for
 platforms where gcc does well on its own, we don't define explicit
 register variables; hopefully one day the 386 platform will be among
 those.
 Giving an internal compiler error when explicit register allocation
 does not work is fine with me; in some cases I have also found wrong
 code generated, but did not consider it important enough to report a
 bug.
 
 On the Pentium 4, explicit register variables seem to matter little:
 
 0.26    0.29    0.32    0.37 gcc-2.95.3 with explicit reg vars
 0.24    0.31    0.28    0.40 gcc-2.95.3 without explicit reg vars
 
 1.20    1.46    1.79    1.96 gcc-3.2 with explicit reg vars
 1.33    1.59    1.89    1.76 gcc-3.2 without explicit reg vars
 
 However, on the Athlon and the Pentium III register allocation has a
 large influence on performance (timings from an Athlon 1200):
 
 0.37    0.55    0.25    0.61 gcc-2.95.1, reg vars
 0.77    1.06    1.34    1.31 gcc-2.95.1, no reg vars
 
 And here's my wishlist:
 
 1) Add a -fno-cross-jump flag or similar, as in Bernd's patch.
 
 2) Fix the bug that moves unrelated code into virtual machine
 instructions even with -fno-gcse; we can work around that in the
 present case (not yet done in the timings above), but so far I have
 not found the source of this code, and thus have no workaround yet.
 
 3) Make register allocation good enough that explicit reg vars don't
 pay off even on the Athlon. :-)
 
 - anton