* march question on pentium4
From: Martin Kahlert @ 2002-08-29 23:04 UTC
To: gcc
Hi!
I use gcc-3.2 release on a FreeBSD system running on a Pentium 4 machine.
My (integer based, no floating point at all) code shows a strange behaviour:
It is faster when compiled with -march=pentium3 than
with -march=pentium4. Is this a known issue or a problem that should be
investigated further?
Thanks in advance.
Regards
Martin.
--
The early bird catches the worm. If you want something else for
breakfast, get up later.
* Re: march question on pentium4
From: Tim Prince @ 2002-08-30 6:52 UTC
To: martin.kahlert, gcc
On Thursday 29 August 2002 23:04, Martin Kahlert wrote:
> Hi!
> I use gcc-3.2 release on a FreeBSD system running on a Pentium 4 machine.
>
> My (integer based, no floating point at all) code shows a strange
> behaviour: It is faster when compiled with -march=pentium3 than
> with -march=pentium4. Is this a known issue or a problem that should be
> investigated further?
>
If I am to speculate without an example, the pentium4 costs for shift and
multiply are set so high that the compiler will always use the alternative of
add sequences. Intel's P4 Optimization Guide suggests they be set only high
enough that, for example, a shift costs more than 3 adds. You could be
getting excessive code expansion for multiplication by a constant.
While gcc expects the costs to reflect accurately the performance of the
instructions, high costs that produce long code may not be best in
practice.
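To make the trade-off concrete, here is a minimal C sketch (not taken from the thread) of one multiplication by a constant written three ways: as a plain multiply, as shifts plus an add, and as adds only. The function names and the constant 10 are illustrative assumptions; which form gcc actually emits depends on its cost tables for the selected -march.

```c
#include <assert.h>
#include <stdint.h>

/* Plain multiply: the compiler is free to emit a multiply instruction. */
static uint32_t mul10_imul(uint32_t x)
{
    return x * 10u;
}

/* Shift-and-add form: 10*x = 8*x + 2*x. */
static uint32_t mul10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Adds-only form: every doubling is written as an add (4 adds in total). */
static uint32_t mul10_adds(uint32_t x)
{
    uint32_t x2 = x + x;    /* 2*x  */
    uint32_t x4 = x2 + x2;  /* 4*x  */
    uint32_t x5 = x4 + x;   /* 5*x  */
    return x5 + x5;         /* 10*x */
}

/* Sanity check: all three forms agree for a given input. */
static void check_mul10(uint32_t x)
{
    assert(mul10_shift_add(x) == mul10_imul(x));
    assert(mul10_adds(x) == mul10_imul(x));
}
```

All three forms return the same value for any 32-bit unsigned input, since unsigned arithmetic wraps. For larger constants or shift counts the adds-only form grows longer, which is the code expansion described above.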
Evidently not speaking for anyone.
--
Tim Prince
* Re: march question on pentium4
From: Joe Buck @ 2002-08-30 12:08 UTC
To: tprince; +Cc: martin.kahlert, gcc
On Thursday 29 August 2002 23:04, Martin Kahlert wrote:
> > Hi!
> > I use gcc-3.2 release on a FreeBSD system running on a Pentium 4 machine.
> >
> > My (integer based, no floating point at all) code shows a strange
> > behaviour: It is faster when compiled with -march=pentium3 than
> > with -march=pentium4. Is this a known issue or a problem that should be
> > investigated further?
Tim Prince writes:
> If I am to speculate without an example, the pentium4 costs for shift
> and multiply are set so high that the compiler will always use the
> alternative of add sequences...
It would be better, I think, to look at the actual example and determine
if there are any outright errors in costs for the P4 or P3.
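One way to follow this suggestion, sketched here under assumptions: the original code was not posted, so `kernel` below is a hypothetical integer-only stand-in that mixes a multiply by a constant with a shift, and `time_kernel` is an illustrative helper that times it with the standard `clock()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical integer-only kernel: a constant multiply and a shift
 * per iteration, with no floating point in the loop. */
static uint32_t kernel(uint32_t n)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; i++) {
        acc += i * 10u;   /* multiply by a constant */
        acc ^= acc << 3;  /* shift by a constant */
    }
    return acc;
}

/* Run the kernel once, store its result, and return the CPU time
 * consumed in seconds as measured by clock(). */
static double time_kernel(uint32_t n, uint32_t *result)
{
    assert(result != NULL);
    clock_t start = clock();
    *result = kernel(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Building the same file once with -march=pentium3 and once with -march=pentium4 (adding -S to keep the assembly) would let the two instruction sequences and timings be compared side by side.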
* Re: march question on pentium4
From: Tim Prince @ 2002-09-01 22:26 UTC
To: Joe Buck; +Cc: martin.kahlert, gcc
On Friday 30 August 2002 12:08, Joe Buck wrote:
> On Thursday 29 August 2002 23:04, Martin Kahlert wrote:
> > > Hi!
> > > I use gcc-3.2 release on a FreeBSD system running on a Pentium 4
> > > machine.
> > >
> > > My (integer based, no floating point at all) code shows a strange
> > > behaviour: It is faster when compiled with -march=pentium3 than
> > > with -march=pentium4. Is this a known issue or a problem that should be
> > > investigated further?
>
> Tim Prince writes:
> > If I am to speculate without an example, the pentium4 costs for shift
> > and multiply are set so high that the compiler will always use the
> > alternative of add sequences...
>
> It would be better, I think, to look at the actual example and determine
> if there are any outright errors in costs for the P4 or P3.
This is not a question of outright error. The point at which to switch
between a shift (or multiply) and an add sequence depends on more factors
than the basic costs of the operations, and on the specific model of the
CPU. Extremely large code expansion into add sequences, even when it agrees
with the basic costs, will not produce the expected performance benefit in
practice. Even though a P4 shift, in isolation, costs much more than 4 adds,
it is seldom useful to replace a shift with more than 3 adds. Using the
actual costs of the operations may be the right approach on a processor
that has no out-of-order facility.
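The distinction drawn here can be sketched as a small decision rule in C. Everything in it is an assumption for illustration: the struct, the function name and the cost numbers are hypothetical and are not gcc's actual cost tables. It models replacing x << n with n doubling adds, picks the adds only when they are cheaper by the basic costs, and additionally refuses to expand beyond a fixed number of adds.

```c
#include <assert.h>

enum strategy { USE_SHIFT, USE_ADDS };

/* Hypothetical per-CPU parameters; the values are supplied by the caller. */
struct op_costs {
    int add_cost;    /* cost of one add */
    int shift_cost;  /* cost of one shift */
    int max_adds;    /* cap on add-sequence length; <= 0 means no cap */
};

/* Decide how to implement x << shift_count. */
static enum strategy choose_strategy(const struct op_costs *c, int shift_count)
{
    assert(shift_count >= 0);

    /* x << n can be written as n successive doublings (x + x). */
    int adds_needed = shift_count;

    /* Code-size guard: never expand past the cap, whatever the costs say. */
    if (c->max_adds > 0 && adds_needed > c->max_adds)
        return USE_SHIFT;

    /* Otherwise compare the basic costs. */
    return (adds_needed * c->add_cost < c->shift_cost) ? USE_ADDS : USE_SHIFT;
}
```

With hypothetical costs of 1 per add and 8 per shift, the uncapped rule replaces a shift by 5 with five adds, while a cap of 3 keeps the shift.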
--
Tim Prince