From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 18266 invoked by alias); 5 Oct 2002 10:06:05 -0000
Mailing-List: contact gcc-prs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive:
List-Post:
List-Help:
Sender: gcc-prs-owner@gcc.gnu.org
Received: (qmail 18251 invoked by uid 71); 5 Oct 2002 10:06:04 -0000
Date: Sat, 05 Oct 2002 03:06:00 -0000
Message-ID: <20021005100604.18250.qmail@sources.redhat.com>
To: nobody@gcc.gnu.org
Cc: gcc-prs@gcc.gnu.org,
From: Anton Ertl
Subject: Re: optimization/8092: cross-jump triggers too often
Reply-To: Anton Ertl
X-SW-Source: 2002-10/txt/msg00177.txt.bz2
List-Id:

The following reply was made to PR optimization/8092; it has been noted by GNATS.

From: Anton Ertl
To: bernd.paysan@gmx.de, rth@gcc.gnu.org, gcc-bugs@gcc.gnu.org,
    gcc-prs@gcc.gnu.org, obody@gcc.gnu.org, gcc-gnats@gcc.gnu.org
Cc:
Subject: Re: optimization/8092: cross-jump triggers too often
Date: Sat, 5 Oct 2002 11:57:34 +0200 (MET DST)

PR 8092 reports essentially the same problems as
http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&pr=7953

Here's some additional data to give you an idea how far the thumb is
sticking out:

Here you see user times of four benchmarks in seconds (i.e. smaller is
better) on a Pentium 4 2.26GHz:

Overall slowdown from gcc-3.2 is around a factor of 5:

0.26 0.29 0.32 0.37 gcc-2.95.3 with explicit reg vars
1.20 1.46 1.79 1.96 gcc-3.2 with explicit reg vars

Indirect slowdown from disabling Piumarta-style interpreter inlining
is around a factor 2.5 (interpreter inlining does not work with
gcc-3.2 thanks to cross-jumping):

0.26 0.29 0.32 0.37 gcc-2.95.3 with reg vars
0.62 0.76 1.15 0.89 gcc-2.95.3 with reg vars, --no-dynamic

Direct slowdown from gcc-3.2 pessimisations is around a factor of 2:

0.62 0.76 1.15 0.89 gcc-2.95.3 with reg vars, --no-dynamic
1.20 1.46 1.79 1.96 gcc-3.2 with explicit reg vars

Note that at least the direct slowdowns will apply to all threaded code
interpreters (e.g.
Gforth, Ocaml bytecode, various Prolog implementations, possibly Perl6),
and maybe to other interpreters as well.

As for the explicit reg vars, these are optional and used just because
they produce faster code than gcc's register allocation; for platforms
where gcc does well on its own, we don't define explicit register
variables; hopefully one day the 386 platform will be among those.
Giving an internal compiler error when explicit register allocation
does not work is fine with me; in some cases I have also found wrong
code generated, but did not consider it important enough to report a
bug.

On the Pentium 4 register allocation seems to be unimportant:

0.26 0.29 0.32 0.37 gcc-2.95.3 with explicit reg vars
0.24 0.31 0.28 0.40 gcc-2.95.3 without explicit reg vars
1.20 1.46 1.79 1.96 gcc-3.2 with explicit reg vars
1.33 1.59 1.89 1.76 gcc-3.2 without explicit reg vars

However, on the Athlon and the Pentium III register allocation has a
large influence on performance (timings from an Athlon 1200):

0.37 0.55 0.25 0.61 gcc-2.95.1, reg vars
0.77 1.06 1.34 1.31 gcc-2.95.1, no reg vars

And here's my wishlist:

1) Add a -fno-cross-jump flag or similar, as in Bernd's patch.

2) Fix the bug that moves unrelated code into virtual machine
instructions even with -fno-gcse; we can work around that in the
present case (not yet done in the timings above), but at least I did
not find the source of this code and thus the workaround.

3) Make register allocation good enough that explicit reg vars don't
pay off even on the Athlon. :-)

- anton
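
For readers unfamiliar with the term, here is a minimal sketch of the
kind of threaded-code dispatch the message is about. It is not taken
from Gforth or any other interpreter named above: the three-instruction
VM, the opcode names and run_threaded() are hypothetical, and the code
relies on the GNU C "labels as values" extension. The point is only
that each VM instruction ends in its own indirect jump; as far as I
understand the report, cross-jumping notices that these tails are
identical and merges them into one shared jump, which is what the
timings above are measuring.

```c
/* Hypothetical three-instruction VM; names are illustrative only. */
enum { OP_INC, OP_DEC, OP_HALT };

/* Runs a program given as an array of opcodes and returns the
   accumulator.  The program must end with OP_HALT. */
long run_threaded(const unsigned char *ip)
{
    /* GNU C "labels as values": one code address per VM instruction. */
    static void *const dispatch[] = { &&do_inc, &&do_dec, &&do_halt };
    long acc = 0;

    /* Fetch the next opcode and jump straight to its code.  Every
       instruction body below ends with its own copy of this jump. */
#define NEXT goto *dispatch[*ip++]

    NEXT;

do_inc:
    acc++;
    NEXT;

do_dec:
    acc--;
    NEXT;

do_halt:
    return acc;

#undef NEXT
}
```

Calling run_threaded() on the opcode sequence OP_INC, OP_INC, OP_INC,
OP_DEC, OP_HALT returns 2. With a separate indirect jump per
instruction, the branch predictor sees one jump per VM instruction;
once the tails are merged, all dispatches go through a single jump.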