public inbox for gcc@gcc.gnu.org
* 19980707 built on win95/i686-pc-cygwin32
@ 1998-07-10  0:33 N8TM
  1998-07-10  7:01 ` Hannu Koivisto
  1998-07-10 15:56 ` Joern Rennecke
  0 siblings, 2 replies; 10+ messages in thread
From: N8TM @ 1998-07-10  0:33 UTC
  To: egcs

configure suggests i586-pc-cygwin32, would that be better?  Under linux and NT
it selects i686.

I added #undef HAVE_INTTYPES_H and copied <wchar.h> to <wctype.h>.

gcc and g77 appear to run OK but:

there were additional warnings which I don't see in builds for other systems,
including winnt.

There were stage2 to stage3 bootstrap comparison failures.

Unlike the PPro, egcs is not as fast on Pentium II as gcc-2.8.1/g77-0.5.23. 

Build is reasonably fast on win95, even though the win95/cygwin combination
slows everything down.  NT takes several times as long, both with 64MB, which
must not be enough on NT.

Let me know if there is interest in any of the build gripes, and what details
you want.


* Re: 19980707 built on win95/i686-pc-cygwin32
  1998-07-10  0:33 19980707 built on win95/i686-pc-cygwin32 N8TM
@ 1998-07-10  7:01 ` Hannu Koivisto
  1998-07-10 15:56 ` Joern Rennecke
  1 sibling, 0 replies; 10+ messages in thread
From: Hannu Koivisto @ 1998-07-10  7:01 UTC
  To: N8TM; +Cc: egcs

N8TM@aol.com writes:

| Build is reasonably fast on win95, even though the win95/cygwin combination
| slows everything down.  NT takes several times as long, both with 64MB, which
| must not be enough on NT.

It should be enough. The problem may be related to cygwin
operating on an NTFS filesystem (if this is the case in your
setup; if it's not, then just ignore this). Browse the cygwin
mailing list archive; the issue has been discussed there IIRC.

| Let me know if there is interest in any of the build gripes, and what details
| you want.

If the build process required something out of the ordinary,
what about making patches of the required changes and posting them
here so that future snapshots could be compiled out of the box
(regarding those problems)?

//Hannu


* Re: 19980707 built on win95/i686-pc-cygwin32
  1998-07-10  0:33 19980707 built on win95/i686-pc-cygwin32 N8TM
  1998-07-10  7:01 ` Hannu Koivisto
@ 1998-07-10 15:56 ` Joern Rennecke
  1 sibling, 0 replies; 10+ messages in thread
From: Joern Rennecke @ 1998-07-10 15:56 UTC
  To: N8TM; +Cc: egcs

> 
> Unlike the PPro, egcs is not as fast on Pentium II as gcc-2.8.1/g77-0.5.23. 

Hmm, I think the main problem with the Pentium II is the slower level 2
cache.  So if the code is larger, you might see some slowdown due to
more code fetches from L2 cache into icache.

Did you try egcs with -fno-gcse ?


* Re: 19980707 built on win95/i686-pc-cygwin32
@ 1998-07-12  9:18 N8TM
  0 siblings, 0 replies; 10+ messages in thread
From: N8TM @ 1998-07-12  9:18 UTC
  To: law; +Cc: amylaar, egcs

In a message dated 7/12/98 12:44:04 AM Pacific Daylight Time,
law@hurl.cygnus.com writes:

> Look for this code in gcse.c:
>  
>        if (optimize_size)
>          changed |= one_classic_gcse_pass (f, pass + 1);
>        else
>          changed |= one_pre_gcse_pass (f, pass + 1);
>  
>  Change "optimize_size" to "1" to try running the classic gcse pass
>  instead of the pre based gcse pass.


This cures those performance deficits on Livermore Kernels 9, 11, and 16,
without any significant effects on the other Kernels.


* Re: 19980707 built on win95/i686-pc-cygwin32
@ 1998-07-12  8:15 N8TM
  0 siblings, 0 replies; 10+ messages in thread
From: N8TM @ 1998-07-12  8:15 UTC
  To: law; +Cc: amylaar, egcs

In a message dated 7/12/98 12:44:04 AM Pacific Daylight Time,
law@hurl.cygnus.com writes:
>icache or dcache?
> Either could be adversely affected I suppose.
>  
>  The partial redundancy elimination based version of gcse tends to trade
>  code size for code speed.  So if your code is icache sensitive it could
>  be a lose.

I certainly don't know all there is to know about cache, but I believe that
icache and dcache are separate at level 1, not at level 2.  The performance
loss is occurring when both code size and data size increase, but not unless
both occur, and not in Linux.  So it looks to me like a level 2 cache miss
issue, with Linux apparently allowing a longer life for level 2 cache data.
Most of these Livermore loops are small enough that they ought to stick in
level 1 cache, with the aid of the p2align scheme.  Kernels 9 and 16 are
bigger than average, but Kernel 11, which Linux also improves on, is (or ought
to be) small even with unrolling.

<One experiment you might consider trying is to run the older classic
<gcse pass instead of the pre based gcse pass.

I'll do that.

<The majority of the cases where I've seen pre lose in the past have
<been register pressure issues.

That certainly must make it tough to make a compiler which optimizes both on
Intel and on architectures with 4 times as many registers available.  I would
think that an "optimization" which loses on account of register pressure in
Livermore Kernels is not one I would want active in general.  I think there is
a correlation here, too, with the relative performance of gcc-2.8.1/g77-0.5.23
and egcs.  Egcs tends to achieve top performance on cases which don't need
many registers but falls down on ones which want to use more registers than are
available.


* Re: 19980707 built on win95/i686-pc-cygwin32
@ 1998-07-12  0:41 N8TM
  1998-07-12  0:41 ` Jeffrey A Law
  0 siblings, 1 reply; 10+ messages in thread
From: N8TM @ 1998-07-12  0:41 UTC
  To: law; +Cc: amylaar, egcs

In a message dated 7/11/98 11:04:08 PM Pacific Daylight Time,
law@hurl.cygnus.com writes:

> Any chance you could analyze this code in more detail?  I'm quite
>  interested in cases where gcse makes code slower.


Thanks for the suggestion.  The differences in performance turn out to be
confined to small parts of my benchmark codes.  In the Livermore Kernels
double precision, -fno-gcse makes a significant difference in just 2 of the 24
kernel tests.  By significant I mean a difference greater than the
"experimental timing error" assessed by the benchmark code.  I ran these tests
on Pentium II 233 MHz, with -funroll-loops -malign-double -march=pentiumpro
-O2, with binutils-2.9.1 installed with the p2align hooks.  I quote numbers
from win95/cygwin32 first, then mention the comparison with Linux.  win95
timings were done with sys_clock() rewritten with QueryPerformance WinAPI
calls; Linux with cpu_time() from libU77.

Kernel 9 performance at vector length 101 drops from 58 to 52 Mflops with
gcse.  At vector length 15, it is 57 Mflops either way.  This is definitely
abnormal, for performance to drop with increasing vector length, and, to me,
this would indicate an increase in cache miss rate.

Kernel 16 drops from 39 to 34 Mflops with gcse, at all vector lengths
(15,40,75).  I don't see anything to tell whether this is a code size or a
jump target alignment effect.  It could easily be the latter.

Kernels 9, 11, and 16 are the only ones where Linux (i686-pc-linux-gnulibc1)
performance is significantly better than win95/cygwin32.  Linux does not
exhibit any reduced performance with increased vector length.  Again, I think
this supports the supposition of an adverse cache effect under win95.

Maybe tomorrow I will get a chance to look at the .s code for these cases.  I
would look for a correlation with code size or order of data access.  I don't
know that I'm likely to see anything, or to figure out how to see the results
obtained by p2align.

I do see consistent increases in run time for number-crunching codes going
from Linux to win95.  On real codes, some of that evidently is in the slowness
of disk file access under win95.

Let's see if I can paste in source code for Kernels 9 and 16:
C***********************************************************************
C***  KERNEL 9      INTEGRATE PREDICTORS
C***********************************************************************
C
C
	  do k= 1,n
	    px(1,k)= dm28*px(13,k)+dm27*px(12,k)+dm26*px(11,k)+dm25*px(1
     &0,k)+dm24*px(9,k)+dm23*px(8,k)+dm22*px(7,k)+c0*(px(5,k)+px(6,k))+p
     &x(3,k)
	    enddo

C***********************************************************************
C***  KERNEL 16     MONTE CARLO SEARCH LOOP
C***********************************************************************
C
	  do m= 1,zone(1)
	      j2= (n+n)*(m-1)+1
	      do k= 1,n
		  k2= k2+1
		  j4= j2+k+k
		  j5= zone(j4)
		  if(j5 >= n)then
		      if(j5 == n)then
			exit
			endif
		      k3= k3+1
		      if(d(j5) <  d(j5-1)*(t-d(j5-2))**2+(s-d(j5-3))**2+
     &(r-d(j5-4))**2)then
			goto200
			endif
		      if(d(j5) == d(j5-1)*(t-d(j5-2))**2+(s-d(j5-3))**2+
     &(r-d(j5-4))**2)then
			exit
			endif
		    else
		      if(j5-n+lb <  0)then
			  if(plan(j5) <  t)then
			    goto200
			    endif
			  if(plan(j5) == t)then
			    exit
			    endif
			else
			  if(j5-n+ii <  0)then
			      if(plan(j5) <  s)then
				goto200
				endif
			      if(plan(j5) == s)then
				exit
				endif
			    else
				if(plan(j5) <  r)then
				  goto200
				  endif
				if(plan(j5) == r)then
				  exit
				  endif
			    endif
			endif
		    endif
		  if(zone(j4-1) <= 0)then
		    goto200
		    endif
		enddo
	      exit
200             if(zone(j4-1) == 0)then
		  exit
		  endif
	    enddo


* Re: 19980707 built on win95/i686-pc-cygwin32
  1998-07-12  0:41 N8TM
@ 1998-07-12  0:41 ` Jeffrey A Law
  0 siblings, 0 replies; 10+ messages in thread
From: Jeffrey A Law @ 1998-07-12  0:41 UTC
  To: N8TM; +Cc: amylaar, egcs

  In message < 404f573f.35a864cf@aol.com >you write:
  > Kernel 9 performance at vector length 101 drops from 58 to 52 Mflops with
  > gcse.  At vector length 15, it is 57 Mflops either way.  This is definitely
  > abnormal, for performance to drop with increasing vector length, and, to me,
  > this would indicate an increase in cache miss rate.
icache or dcache?

Either could be adversely affected I suppose.

The partial redundancy elimination based version of gcse tends to trade
code size for code speed.  So if your code is icache sensitive it could
be a lose.

The dcache effects could possibly occur due to additional register
pressure causing more spills and access into stack slots.

One experiment you might consider trying is to run the older classic
gcse pass instead of the pre based gcse pass.  It doesn't tend to
expand code as much and finds fewer redundancies (and thus needs fewer
regs with cross-block lifetimes).


Look for this code in gcse.c:

      if (optimize_size)
        changed |= one_classic_gcse_pass (f, pass + 1);
      else
        changed |= one_pre_gcse_pass (f, pass + 1);

Change "optimize_size" to "1" to try running the classic gcse pass
instead of the pre based gcse pass.


The majority of the cases where I've seen pre lose in the past have
been register pressure issues.  Addressing these concerns is one of
the primary motivations behind the "lazy code motion" based version
of pre/gcse that Cygnus will donate later this year.

Other cases have been secondary effects related to register lifetimes
and the like.  For example, a register created by gcse may be set in
multiple places, which inhibits certain optimizations -- the old
code would compute the same value in multiple locations into multiple
regs (usually each being set only once).  That kind of stuff.


jeff


* Re: 19980707 built on win95/i686-pc-cygwin32
  1998-07-10 23:05 N8TM
@ 1998-07-11 22:54 ` Jeffrey A Law
  0 siblings, 0 replies; 10+ messages in thread
From: Jeffrey A Law @ 1998-07-11 22:54 UTC
  To: N8TM; +Cc: amylaar, egcs

  In message < 57600ee5.35a7006b@aol.com >you write:
  > I wasn't aware of that option.  It certainly accentuates the differences in
  > performance between egcs/g77 and g77-0.5.23 on Linux/P II.  Livermore Kernel
  > (double precision) loops vary from 50% faster for egcs with this option to 50%
  > faster for g77-0.5.23.  On win95, however, the effects of -fno-gcse seem to be
  > all favorable, bringing the performance of egcs on win95 almost up to where it
  > is on Linux, and well above g77-0.5.23 on win95.  win95 must be poisoning the
  > cache, if I take your suggestion.
Any chance you could analyze this code in more detail?  I'm quite
interested in cases where gcse makes code slower.

Though given the variation in your results, this may be yet another
"double alignment" issue.

jeff


* Re: 19980707 built on win95/i686-pc-cygwin32
@ 1998-07-10 23:05 N8TM
  1998-07-11 22:54 ` Jeffrey A Law
  0 siblings, 1 reply; 10+ messages in thread
From: N8TM @ 1998-07-10 23:05 UTC
  To: amylaar; +Cc: egcs

In a message dated 7/10/98 3:57:03 PM Pacific Daylight Time,
amylaar@cygnus.co.uk writes:

>  if the code is larger, you might see some slowdown due to
>  more code fetches from L2 cache into icache.
>  
>  Did you try egcs with -fno-gcse ?

I wasn't aware of that option.  It certainly accentuates the differences in
performance between egcs/g77 and g77-0.5.23 on Linux/P II.  Livermore Kernel
(double precision) loops vary from 50% faster for egcs with this option to 50%
faster for g77-0.5.23.  On win95, however, the effects of -fno-gcse seem to be
all favorable, bringing the performance of egcs on win95 almost up to where it
is on Linux, and well above g77-0.5.23 on win95.  win95 must be poisoning the
cache, if I take your suggestion.

On my CFD benchmark (all single precision), the one subroutine which spends
most of its time copying data, with a relatively low cache hit rate, runs 20%
faster with egcs -fno-gcse, while all the other subroutines which spend
significant time are faster with g77-0.5.23.


* Re: 19980707 built on win95/i686-pc-cygwin32
@ 1998-07-10 20:04 N8TM
  0 siblings, 0 replies; 10+ messages in thread
From: N8TM @ 1998-07-10 20:04 UTC
  To: azure; +Cc: egcs

In a message dated 7/10/98 7:01:40 AM Pacific Daylight Time, azure@iki.fi
writes:

> If the build process required something out of the ordinary,
>  what about making patches of the required changes and posting them
>  here so that future snapshots could be compiled out of the box

On win95/cygwin32, the search path through gcc/config was not always followed
from "auto-config.h", so the .h files in config/i386 were not being picked up.
That problem does not occur on NT, but misconfiguration re <inttypes.h> does.
Also, there are minor unnecessary annoyances due to failure to #undef macros
before changing them or unnecessary redefining of POSIX.  Those and the
including of <wctype.h> instead of <wchar.h>

No, 64MB is not enough for satisfactory build performance on NT on a Novell
network with a virus checker active.



Thread overview: 10+ messages
1998-07-10  0:33 19980707 built on win95/i686-pc-cygwin32 N8TM
1998-07-10  7:01 ` Hannu Koivisto
1998-07-10 15:56 ` Joern Rennecke
1998-07-10 20:04 N8TM
1998-07-10 23:05 N8TM
1998-07-11 22:54 ` Jeffrey A Law
1998-07-12  0:41 N8TM
1998-07-12  0:41 ` Jeffrey A Law
1998-07-12  8:15 N8TM
1998-07-12  9:18 N8TM
