public inbox for gcc@gcc.gnu.org
* Extremely bad performance compiling bzip2
@ 1998-12-07  8:18 Osku Salerma
  1998-12-07  8:28 ` Jeffrey A Law
  1998-12-07 14:30 ` H.J. Lu
  0 siblings, 2 replies; 11+ messages in thread
From: Osku Salerma @ 1998-12-07  8:18 UTC (permalink / raw)
  To: egcs

I upgraded my system from libc5 and gcc 2.7.2.1 to glibc2 2.0.6 and
egcs 1.1.1 over the weekend. Everything seems to be working, but the
first thing that I did to test to see if egcs was any better at
optimizing than gcc was to recompile bzip2, version 0.1pl2 to be
exact.

I use the latest netscape distribution as the testfile, which is a 12M
.bz2 file. Using the command "time bzip2 -dc nav.tar.bz2 > /dev/null"
as the benchmark command, my old bzip2 binary, compiled with gcc
2.7.2.1 and linked to libc5, does it in 33-35 seconds.

The flags used to compile with the old gcc were "-O3
-fomit-frame-pointer -funroll-loops", which are the default ones in the
bzip2 distribution.

To make a long story short, compiling with egcs 1.1.1 and those same
flags and linking against glibc2, I get a running time of 58-63
seconds. Adding "-mcpu=pentiumpro" to the flags I get down to 52-53
seconds. (My machine is a K6/233 with 96M of memory). That's still 50%
slower than plain old gcc.

Is there something I'm doing wrong, or should I just accept that bzip2
compiled with egcs is dog slow? I did some tests with gzip too
afterwards, and gzip compiled with egcs is in the +-1% range with my
old gzip.

--
Osku Salerma - osku@iki.fi - http://www.iki.fi/osku/
I'd give my right arm to be ambidextrous.


* Re: Extremely bad performance compiling bzip2
  1998-12-07  8:18 Extremely bad performance compiling bzip2 Osku Salerma
@ 1998-12-07  8:28 ` Jeffrey A Law
  1998-12-08 12:42   ` Osku Salerma
  1998-12-07 14:30 ` H.J. Lu
  1 sibling, 1 reply; 11+ messages in thread
From: Jeffrey A Law @ 1998-12-07  8:28 UTC (permalink / raw)
  To: Osku Salerma; +Cc: egcs

  In message <Pine.LNX.3.91.981207180935.1228A-100000@in58.in.helsinki.fi> you write:
  > I upgraded my system from libc5 and gcc 2.7.2.1 to glibc2 2.0.6 and
  > egcs 1.1.1 over the weekend. Everything seems to be working, but the
  > first thing that I did to test to see if egcs was any better at
  > optimizing than gcc was to recompile bzip2, version 0.1pl2 to be
  > exact.
  > 
  > I use the latest netscape distribution as the testfile, which is a 12M
  > .bz2 file. Using the command "time bzip2 -dc nav.tar.bz2 > /dev/null"
  > as the benchmark command, my old bzip2 binary, compiled with gcc
  > 2.7.2.1 and linked to libc5, does it in 33-35 seconds.
  > 
  > The flags used to compile with the old gcc were "-O3
  > -fomit-frame-pointer -funroll-loops", which are the default ones in the
  > bzip2 distribution.
  > 
  > To make a long story short, compiling with egcs 1.1.1 and those same
  > flags and linking against glibc2, I get a running time of 58-63
  > seconds. Adding "-mcpu=pentiumpro" to the flags I get down to 52-53
  > seconds. (My machine is a K6/233 with 96M of memory). That's still 50%
  > slower than plain old gcc.
  > 
  > Is there something I'm doing wrong, or should I just accept that bzip2
  > compiled with egcs is dog slow? I did some tests with gzip too
  > afterwards, and gzip compiled with egcs is in the +-1% range with my
  > old gzip.
You'd have to look at the hot loops.  It may also be the case that these
problems are fixed by current snapshots.  We've done a lot of work to improve
spill code and register allocation, which can have a big impact on x86
performance.

In general, I don't recommend loop unrolling on the x86 -- unrolling loops
takes more registers, and the x86 is painfully short of registers.
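
To illustrate the register-pressure point, here is a minimal sketch that
compares a rolled loop with a hand-unrolled one. The function names
sum_rolled and sum_unrolled4 are hypothetical, and the code a compiler
emits for -funroll-loops will differ in detail. The sketch only shows
that an unrolled body tends to keep more values live at once.

```c
#include <stddef.h>

/* Rolled loop: the values live across an iteration are roughly
   the array pointer, the bound, the index and one running sum. */
unsigned long sum_rolled(const unsigned char *a, size_t n)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hand-unrolled by four with separate partial sums. The loop now
   carries four accumulators instead of one, so it wants more
   registers; when they run out, values are spilled to the stack. */
unsigned long sum_unrolled4(const unsigned char *a, size_t n)
{
    unsigned long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)      /* leftover elements */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

Both functions return the same result. Comparing the assembly produced
for each (for example with the -S option) is one way to see whether the
unrolled version introduces extra spills on a given target.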

jeff


* Re: Extremely bad performance compiling bzip2
  1998-12-07  8:18 Extremely bad performance compiling bzip2 Osku Salerma
  1998-12-07  8:28 ` Jeffrey A Law
@ 1998-12-07 14:30 ` H.J. Lu
  1998-12-08 12:43   ` Osku Salerma
  1998-12-08 22:10   ` Andi Kleen
  1 sibling, 2 replies; 11+ messages in thread
From: H.J. Lu @ 1998-12-07 14:30 UTC (permalink / raw)
  To: Osku Salerma; +Cc: egcs, GNU C Library

> I use the latest netscape distribution as the testfile, which is a 12M
> .bz2 file. Using the command "time bzip2 -dc nav.tar.bz2 > /dev/null"
> as the benchmark command, my old bzip2 binary, compiled with gcc
> 2.7.2.1 and linked to libc5, does it in 33-35 seconds.
> 
> The flags used to compile with the old gcc were "-O3
> -fomit-frame-pointer -funroll-loops", which are the default ones in the
> bzip2 distribution.
> 
> To make a long story short, compiling with egcs 1.1.1 and those same
> flags and linking against glibc2, I get a running time of 58-63
> seconds. Adding "-mcpu=pentiumpro" to the flags I get down to 52-53
> seconds. (My machine is a K6/233 with 96M of memory). That's still 50%
> slower than plain old gcc.
> 

Have you tried bzip2 compiled with gcc 2.7.2.3 and linked to glibc 2?
I know if there are many getc/putc's in the code, glibc 2 can be much
slower than libc 5.

-- 
H.J. Lu (hjl@gnu.org)


* Re: Extremely bad performance compiling bzip2
  1998-12-07  8:28 ` Jeffrey A Law
@ 1998-12-08 12:42   ` Osku Salerma
  1998-12-08 12:55     ` Jeffrey A Law
  0 siblings, 1 reply; 11+ messages in thread
From: Osku Salerma @ 1998-12-08 12:42 UTC (permalink / raw)
  To: law; +Cc: egcs

> You'd have to look at the hot loops. It may also be the case that
> these problems are fixed by current snapshots. We've done a lot of
> work to improve spill code and register allocation, which can have a
> big impact on x86 performance.

Egcs 1.1.1 was released five days ago. The newest .c file in the
sources is six days old. Am I misunderstanding something about the
egcs development policy or what do you mean by a "current snapshot"?

--
Osku Salerma - osku@iki.fi - http://www.iki.fi/osku/
I'd give my right arm to be ambidextrous.



* Re: Extremely bad performance compiling bzip2
  1998-12-07 14:30 ` H.J. Lu
@ 1998-12-08 12:43   ` Osku Salerma
  1998-12-08 13:17     ` Ulrich Drepper
  1998-12-08 22:10   ` Andi Kleen
  1 sibling, 1 reply; 11+ messages in thread
From: Osku Salerma @ 1998-12-08 12:43 UTC (permalink / raw)
  To: H.J. Lu; +Cc: egcs, GNU C Library

> Have you tried bzip2 compiled with gcc 2.7.2.3 and linked to glibc 2?
> I know if there are many getc/putc's in the code, glibc 2 can be much
> slower than libc 5.

I would, but I didn't keep my old gcc around. Could someone out there
test bzip2 with some compiler with both libc5 and glibc2 and post some
figures?

BTW, why is it slower? Is glibc 2.1 any faster?

--
Osku Salerma - osku@iki.fi - http://www.iki.fi/osku/
I'd give my right arm to be ambidextrous.



* Re: Extremely bad performance compiling bzip2
  1998-12-08 12:42   ` Osku Salerma
@ 1998-12-08 12:55     ` Jeffrey A Law
  0 siblings, 0 replies; 11+ messages in thread
From: Jeffrey A Law @ 1998-12-08 12:55 UTC (permalink / raw)
  To: Osku Salerma; +Cc: egcs

  In message <Pine.LNX.3.91.981208223730.3436A-100000@oskula> you write:
  > > You'd have to look at the hot loops. It may also be the case that
  > > these problems are fixed by current snapshots. We've done a lot of
  > > work to improve spill code and register allocation, which can have a
  > > big impact on x86 performance.
  > 
  > Egcs 1.1.1 was released five days ago. The newest .c file in the
  > sources is six days old. Am I misunderstanding something about the
  > egcs development policy or what do you mean by a "current snapshot"?
The easiest way to look at it is there's a stable release tree and an
unstable development tree.

Snapshots happen from the development tree.

jeff


* Re: Extremely bad performance compiling bzip2
  1998-12-08 12:43   ` Osku Salerma
@ 1998-12-08 13:17     ` Ulrich Drepper
  1998-12-09  7:18       ` Osku Salerma
  0 siblings, 1 reply; 11+ messages in thread
From: Ulrich Drepper @ 1998-12-08 13:17 UTC (permalink / raw)
  To: Osku Salerma; +Cc: egcs

Osku Salerma <osku@iki.fi> writes:

> BTW, why is it slower? Is glibc 2.1 any faster?

If a program uses putc on a stream which has no problems with multiple
threads it should use putc_unlocked.  Similar function variants exist
for fputc, fputs, etc.
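
A minimal sketch of that pattern, assuming a POSIX system. The helper
name copy_stream_unlocked is hypothetical. The flockfile/funlockfile
calls are not mentioned above. They are added here so that each stream
lock is taken once around the loop instead of once per character, which
keeps the copy safe even if another thread touches the streams.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for flockfile, getc_unlocked, ... */
#endif
#include <stdio.h>

/* Copy every byte from `in` to `out` using the unlocked variants.
   Returns the number of bytes copied, or -1 on a write error.
   copy_stream_unlocked is a hypothetical helper name. */
long copy_stream_unlocked(FILE *in, FILE *out)
{
    long n = 0;
    int c;

    flockfile(in);              /* take each lock once, up front */
    flockfile(out);
    while ((c = getc_unlocked(in)) != EOF) {
        if (putc_unlocked(c, out) == EOF) {
            n = -1;             /* write error */
            break;
        }
        n++;
    }
    funlockfile(out);
    funlockfile(in);
    return n;
}
```

Calling copy_stream_unlocked(stdin, stdout) copies standard input to
standard output. In a single-threaded program the flockfile pair can be
dropped, at the cost of relying on that assumption staying true.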

-- 
---------------.      drepper at gnu.org  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Cygnus Solutions `--' drepper at cygnus.com   `------------------------


* Re: Extremely bad performance compiling bzip2
  1998-12-07 14:30 ` H.J. Lu
  1998-12-08 12:43   ` Osku Salerma
@ 1998-12-08 22:10   ` Andi Kleen
  1998-12-09  0:53     ` Tobias Ringstrom
  1998-12-09  1:57     ` Andreas Schwab
  1 sibling, 2 replies; 11+ messages in thread
From: Andi Kleen @ 1998-12-08 22:10 UTC (permalink / raw)
  To: hjl; +Cc: egcs

In muc.lists.egcs.misc, you wrote:
>Have you tried bzip2 compiled with gcc 2.7.2.3 and linked to glibc 2?
>I know if there are many getc/putc's in the code, glibc 2 can be much
>slower than libc 5.

I am curious why. Does checking an uncongested lock really involve that
much overhead? 

-Andi


* Re: Extremely bad performance compiling bzip2
  1998-12-08 22:10   ` Andi Kleen
@ 1998-12-09  0:53     ` Tobias Ringstrom
  1998-12-09  1:57     ` Andreas Schwab
  1 sibling, 0 replies; 11+ messages in thread
From: Tobias Ringstrom @ 1998-12-09  0:53 UTC (permalink / raw)
  To: Andi Kleen; +Cc: egcs

On 09-Dec-98 Andi Kleen wrote:
> In muc.lists.egcs.misc, you wrote:
>>Have you tried bzip2 compiled with gcc 2.7.2.3 and linked to glibc 2?
>>I know if there are many getc/putc's in the code, glibc 2 can be much
>>slower than libc 5.
> 
> I am curious why. Does checking an uncongested lock really involve that
> much overhead? 

I ran a quick test on my i686-pc-linux-gnu with glibc-2.0.7-13.rpm and got:

> time ./putchar_unlocked 100000000 > /dev/null
3.830u 0.010s 0:03.83 100.2%    0+0k 0+0io 65pf+0w

> time ./putchar 100000000 > /dev/null
16.160u 0.040s 0:16.20 100.0%   0+0k 0+0io 66pf+0w

The test program is running in a tight loop, printing spaces. Notice the
impressive CPU utilization of 100.2% :-)
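
The source of the test program was not posted. The following is a
hypothetical reconstruction of such a test, assuming a POSIX system. The
names spaces_locked, spaces_unlocked and cpu_seconds are made up for
this sketch, and it writes to a caller-supplied stream with putc rather
than calling putchar, so it can be pointed at stdout or at a scratch
file.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for flockfile, putc_unlocked */
#endif
#include <stdio.h>
#include <time.h>

/* Hypothetical reconstruction, not the program used for the
   timings quoted in this message. */

/* Write n spaces with the locking putc. Returns the count written. */
long spaces_locked(FILE *out, long n)
{
    long i;

    for (i = 0; i < n; i++)
        if (putc(' ', out) == EOF)
            break;
    return i;
}

/* Write n spaces with putc_unlocked, locking the stream once. */
long spaces_unlocked(FILE *out, long n)
{
    long i;

    flockfile(out);
    for (i = 0; i < n; i++)
        if (putc_unlocked(' ', out) == EOF)
            break;
    funlockfile(out);
    return i;
}

/* CPU seconds consumed by one run of fn(out, n). */
double cpu_seconds(long (*fn)(FILE *, long), FILE *out, long n)
{
    clock_t start = clock();

    fn(out, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling cpu_seconds(spaces_locked, stdout, 100000000L) and then the same
with spaces_unlocked, with stdout redirected to /dev/null, approximates
the comparison above. The ratio will depend on the libc version and on
how putc is implemented there.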

/ Tobias


* Re: Extremely bad performance compiling bzip2
  1998-12-08 22:10   ` Andi Kleen
  1998-12-09  0:53     ` Tobias Ringstrom
@ 1998-12-09  1:57     ` Andreas Schwab
  1 sibling, 0 replies; 11+ messages in thread
From: Andreas Schwab @ 1998-12-09  1:57 UTC (permalink / raw)
  To: Andi Kleen; +Cc: hjl, egcs

Andi Kleen <ak@muc.de> writes:

|> In muc.lists.egcs.misc, you wrote:
|> >Have you tried bzip2 compiled with gcc 2.7.2.3 and linked to glibc 2?
|> >I know if there are many getc/putc's in the code, glibc 2 can be much
|> >slower than libc 5.
|> 
|> I am curious why. Does checking an uncongested lock really involve that
|> much overhead? 

The overhead is most likely in the function call.  getc/putc in glibc 2
are real functions, whereas in libc 5 they are macros that only call a
real function on under/overflow.
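
The difference can be sketched with a toy input buffer. Everything here
is hypothetical: struct inbuf, INBUF_GETC, inbuf_underflow and
inbuf_getc_fn are illustration names, not the actual FILE layout or
macros of libc 5 or glibc 2.

```c
#include <stddef.h>

/* Hypothetical input buffer, standing in for the internals of FILE. */
struct inbuf {
    const unsigned char *ptr;   /* next unread byte */
    size_t left;                /* unread bytes remaining */
};

/* Slow path, a real function: reached only when the buffer is empty.
   A real library would refill the buffer from the file here; this
   sketch just reports end of input. */
int inbuf_underflow(struct inbuf *b)
{
    (void)b;
    return -1;
}

/* Macro style: the common case is a compare, a decrement and a load,
   expanded inline at the call site. Note that the argument is
   evaluated more than once, so it must have no side effects. */
#define INBUF_GETC(b) \
    ((b)->left > 0 ? ((b)->left--, (int)*(b)->ptr++) \
                   : inbuf_underflow(b))

/* Function style: the same logic, but every character costs a call
   unless the compiler inlines it. */
int inbuf_getc_fn(struct inbuf *b)
{
    return INBUF_GETC(b);
}
```

Both forms return the same bytes. The multiple evaluation of the macro
argument is the usual reason to prefer the function form when the
stream expression has side effects.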

-- 
Andreas Schwab                                      "And now for something
schwab@issan.cs.uni-dortmund.de                      completely different"
schwab@gnu.org


* Re: Extremely bad performance compiling bzip2
  1998-12-08 13:17     ` Ulrich Drepper
@ 1998-12-09  7:18       ` Osku Salerma
  0 siblings, 0 replies; 11+ messages in thread
From: Osku Salerma @ 1998-12-09  7:18 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: egcs

> > BTW, why is it slower? Is glibc 2.1 any faster?
> 
> If a program uses putc on a stream which has no problems with multiple
> threads it should use putc_unlocked.  Similar function variants exist
> for fputc, fputs, etc.

I put these at the start of bzip2.c, but the running time only went
down about 5 seconds, to 47 seconds. (There weren't any places where
putc/getc instead of fputc/fgetc seemed likely to cause troubles.)
     
#define ferror(fp) ferror_unlocked(fp)
#define getc(fp) getc_unlocked(fp)
#define putc(a,fp) putc_unlocked(a,fp)
#define fputc(a,fp) putc_unlocked(a,fp)
#define fgetc(fp) getc_unlocked(fp)

--
Osku Salerma - osku@iki.fi - http://www.iki.fi/osku/
I'd give my right arm to be ambidextrous.

