public inbox for gcc@gcc.gnu.org
* march question on pentium4
@ 2002-08-29 23:04 Martin Kahlert
  2002-08-30  6:52 ` Tim Prince
  0 siblings, 1 reply; 4+ messages in thread
From: Martin Kahlert @ 2002-08-29 23:04 UTC (permalink / raw)
  To: gcc

Hi!
I use gcc-3.2 release on a FreeBSD system running on a Pentium 4 machine.

My (integer based, no floating point at all) code shows a strange behaviour:
It is faster when compiled with -march=pentium3 than
with -march=pentium4. Is this a known issue or a problem that should be 
investigated further?

Thanks in advance.

Regards
Martin.

-- 
The early bird catches the worm. If you want something else for       
breakfast, get up later.


* Re: march question on pentium4
  2002-08-29 23:04 march question on pentium4 Martin Kahlert
@ 2002-08-30  6:52 ` Tim Prince
  2002-08-30 12:08   ` Joe Buck
  0 siblings, 1 reply; 4+ messages in thread
From: Tim Prince @ 2002-08-30  6:52 UTC (permalink / raw)
  To: martin.kahlert, gcc

On Thursday 29 August 2002 23:04, Martin Kahlert wrote:
> Hi!
> I use gcc-3.2 release on a FreeBSD system running on a Pentium 4 machine.
>
> My (integer based, no floating point at all) code shows a strange
> behaviour: It is faster when compiled with -march=pentium3 than
> with -march=pentium4. Is this a known issue or a problem that should be
> investigated further?
>
If I am to speculate without an example, the pentium4 costs for shift and 
multiply are set so high that the compiler will always use the alternative of 
add sequences.  Intel's P4 Optimization Guide suggests the costs be set only
high enough that a shift costs more than 3 adds, for example.  You could be
getting excessive code expansion for multiplication by a constant.
While gcc expects the costs to reflect the performance of the instructions
accurately, high costs that produce long code may not be best in practice.
Evidently not speaking for anyone.
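
To make the code-expansion concern concrete, here is a minimal sketch of what
replacing a multiplication by a constant with an add sequence looks like.  It
is an illustration only: the function names and the constant 10 are
hypothetical, and the add sequence is written by hand rather than taken from
gcc-3.2 output.

```c
#include <assert.h>

/* Direct form: the compiler is free to pick a multiply, shifts, or adds. */
static unsigned int mul10_direct(unsigned int x)
{
    return x * 10u;
}

/* Hand-written add-only expansion (hypothetical, not actual gcc output):
   four dependent adds stand in for one multiply by 10. */
static unsigned int mul10_adds(unsigned int x)
{
    unsigned int x2 = x + x;    /* 2x  */
    unsigned int x4 = x2 + x2;  /* 4x  */
    unsigned int x5 = x4 + x;   /* 5x  */
    return x5 + x5;             /* 10x */
}

/* Sanity check that both forms agree, including on unsigned wraparound. */
static void check_mul10(void)
{
    unsigned int x;
    for (x = 0; x < 1000u; ++x)
        assert(mul10_direct(x) == mul10_adds(x));
    assert(mul10_direct(0x80000000u) == mul10_adds(0x80000000u));
}
```

Compiling a small file like this with -S under each -march setting and
comparing the assembly is one way to see which form the compiler actually
chooses.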
-- 
Tim Prince


* Re: march question on pentium4
  2002-08-30  6:52 ` Tim Prince
@ 2002-08-30 12:08   ` Joe Buck
  2002-09-01 22:26     ` Tim Prince
  0 siblings, 1 reply; 4+ messages in thread
From: Joe Buck @ 2002-08-30 12:08 UTC (permalink / raw)
  To: tprince; +Cc: martin.kahlert, gcc

On Thursday 29 August 2002 23:04, Martin Kahlert wrote:
> > Hi!
> > I use gcc-3.2 release on a FreeBSD system running on a Pentium 4 machine.
> >
> > My (integer based, no floating point at all) code shows a strange
> > behaviour: It is faster when compiled with -march=pentium3 than
> > with -march=pentium4. Is this a known issue or a problem that should be
> > investigated further?

Tim Prince writes:

> If I am to speculate without an example, the pentium4 costs for shift
> and multiply are set so high that the compiler will always use the
> alternative of add sequences...

It would be better, I think, to look at the actual example and determine
if there are any outright errors in costs for the P4 or P3.


* Re: march question on pentium4
  2002-08-30 12:08   ` Joe Buck
@ 2002-09-01 22:26     ` Tim Prince
  0 siblings, 0 replies; 4+ messages in thread
From: Tim Prince @ 2002-09-01 22:26 UTC (permalink / raw)
  To: Joe Buck; +Cc: martin.kahlert, gcc

On Friday 30 August 2002 12:08, Joe Buck wrote:
> On Thursday 29 August 2002 23:04, Martin Kahlert wrote:
> > > Hi!
> > > I use gcc-3.2 release on a FreeBSD system running on a Pentium 4
> > > machine.
> > >
> > > My (integer based, no floating point at all) code shows a strange
> > > behaviour: It is faster when compiled with -march=pentium3 than
> > > with -march=pentium4. Is this a known issue or a problem that should be
> > > investigated further?
>
> Tim Prince writes:
> > If I am to speculate without an example, the pentium4 costs for shift
> > and multiply are set so high that the compiler will always use the
> > alternative of add sequences...
>
> It would be better, I think, to look at the actual example and determine
> if there are any outright errors in costs for the P4 or P3.
This is not a question of outright error.  The point at which to switch
between shift (or multiply) and add sequences depends on more factors than
the basic costs of the operations, and on the specific model of the CPU.
Extremely large code expansion into add sequences, even when it agrees with
the basic costs, will not produce the expected performance benefit in a
practical context.  Even though a P4 shift, in isolation, costs much more
than 4 adds, in practice it is seldom useful to replace a shift with more
than 3 adds.  Using the actual costs of the operations may be the right
approach on a processor which has no out-of-order facility.
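
The difference between the two policies can be sketched as a pair of decision
functions.  This is only a toy model under stated assumptions: the struct,
the function names, and the cost numbers used with it are hypothetical and are
not taken from gcc's i386 cost tables, and a shift by n is modelled naively as
n self-additions.

```c
#include <assert.h>

/* Hypothetical per-instruction costs in arbitrary units. */
struct op_costs {
    int add;    /* cost of one add                 */
    int shift;  /* cost of one shift by a constant */
};

/* Naive model: x << n is done with n self-additions (x += x, n times). */
static int adds_for_shift(int n)
{
    return n;
}

/* Pure cost comparison: prefer adds whenever they are cheaper in isolation. */
static int use_adds_by_cost(struct op_costs c, int n)
{
    return adds_for_shift(n) * c.add < c.shift;
}

/* Capped policy: also refuse to emit more than max_adds adds per shift. */
static int use_adds_capped(struct op_costs c, int n, int max_adds)
{
    return use_adds_by_cost(c, n) && adds_for_shift(n) <= max_adds;
}

/* With a made-up shift cost of 8 adds, the two policies disagree at n = 6. */
static void check_policies(void)
{
    struct op_costs c = { 1, 8 };
    assert(use_adds_by_cost(c, 6));
    assert(!use_adds_capped(c, 6, 3));
}
```

With these made-up numbers, the pure cost comparison expands a shift by 6
into six adds, while the capped policy keeps the shift and only expands shifts
of up to 3 bits.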
-- 
Tim Prince
