* Performance regression of generated numerical code
@ 2009-12-14 23:50 Martin Reinecke
2009-12-14 23:59 ` H.J. Lu
0 siblings, 1 reply; 3+ messages in thread
From: Martin Reinecke @ 2009-12-14 23:50 UTC
To: gcc
[-- Attachment #1: Type: text/plain, Size: 649 bytes --]
Hi,

I have noticed a big performance decrease in one of my numerical codes
when switching from gcc 4.4 to gcc 4.5. A small test case is attached.
When compiling this test case with "gcc -O3 perf.c -lm -std=c99"
and executing the resulting binary, the CPU time with the head of
the 4.4 branch is about 1.1s, with the head of the trunk it is 2.1s.

This is on a Pentium D CPU. I have verified that both binaries produce
identical results.

If I can do anything to help locate the reason for this slowdown, I'd be
glad to help, but I must admit that I'm no good at interpreting assembler.

Any insight would be greatly appreciated.

Thanks,
Martin
[-- Attachment #2: perf.c --]
[-- Type: text/plain, Size: 1748 bytes --]
#include <math.h>
#include <stdlib.h>

static inline double max (double a, double b)
  { return (a>=b) ? a : b; }

static inline int nearest_int (double arg)
  {
  arg += 0.5;
  return (arg>=0) ? (int)arg : (int)arg-1;
  }

void wrec3jj (double l2, double l3, double m2, double m3, double *res, int sz)
  {
  const int expo=250;
  const double srhuge=ldexp(1.,expo),
               tiny=ldexp(1.,-2*expo), srtiny=ldexp(1.,-expo);

  const double m1 = -m2 -m3;
  const double l1min = max(fabs(l2-l3),fabs(m1)),
               l1max = l2 + l3;
  const int ncoef = nearest_int(l1max-l1min)+1;

  const double l2ml3sq = (l2-l3)*(l2-l3),
               pre1 = (l2+l3+1.)*(l2+l3+1.),
               m1sq = m1*m1,
               pre2 = m1*(l2*(l2+1.)-l3*(l3+1.)),
               m3mm2 = m3-m2;

  int i=0;
  res[i] = srtiny;
  double sumfor = (2.*l1min+1.) * res[i]*res[i];

  double c1=1e300;
  double oldfac=0.;

  do
    {
    if (i==ncoef-1) break; // all done
    ++i;

    const double l1 = l1min+i,
                 l1sq = l1*l1;

    const double c1old=fabs(c1);
    const double newfac = sqrt((l1sq-l2ml3sq)*(pre1-l1sq)*(l1sq-m1sq));

    if (i>1)
      {
      const double tmp1 = 1./((l1-1.)*newfac);
      c1 = (2.*l1-1.)*(pre2-(l1sq-l1)*m3mm2) * tmp1;
      res[i] = res[i-1]*c1 - res[i-2]*l1*oldfac*tmp1;
      }
    else
      {
      c1 = (l1>1.000001) ? (2.*l1-1.)*(pre2-(l1sq-l1)*m3mm2)/((l1-1.)*newfac)
                         : (2.*l1-1.)*l1*(m3mm2)/newfac;
      res[i] = res[i-1]*c1;
      }

    oldfac=newfac;

    if (c1old<=fabs(c1)) break;
    }
  while (1);
  }

int main(void)
  {
  double *res = (double *)malloc(1000*sizeof(double));
  for (int m=0; m<1000000; ++m)
    wrec3jj (100, 60, 60, -50, res, 1000);
  return 0;
  }
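
For readers who want to reproduce the timing without an external tool,
the following is a minimal sketch of measuring CPU time from inside a C
program with the standard clock() function. The names cpu_time_of and
sample_work are hypothetical and not part of perf.c; sample_work merely
stands in for the repeated wrec3jj call in main().

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in workload; in the test case this would be the
   wrec3jj call. The volatile sink keeps the loop from being optimized
   away at -O3. */
static volatile double sink;

static void sample_work(void)
{
    double s = 0.;
    for (int i = 1; i <= 1000; ++i)
        s += 1. / i;
    sink = s;
}

/* Calls fn() reps times and returns the CPU time consumed, in seconds,
   as reported by clock(). */
static double cpu_time_of(void (*fn)(void), long reps)
{
    assert(fn != NULL && reps >= 0);
    const clock_t start = clock();
    for (long i = 0; i < reps; ++i)
        fn();
    const clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Printing cpu_time_of(sample_work, 1000000) from binaries built by each
compiler yields one CPU-time figure per compiler that can be compared
directly, assuming the machine is otherwise idle.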
* Re: Performance regression of generated numerical code
From: H.J. Lu @ 2009-12-14 23:59 UTC
To: Martin Reinecke; +Cc: gcc
On Mon, Dec 14, 2009 at 3:50 PM, Martin Reinecke
<martin@mpa-garching.mpg.de> wrote:
> Hi,
>
> I have noticed a big performance decrease in one of my numerical codes
> when switching from gcc 4.4 to gcc 4.5. A small test case is attached.
> When compiling this test case with "gcc -O3 perf.c -lm -std=c99"
> and executing the resulting binary, the CPU time with the head of
> the 4.4 branch is about 1.1s, with the head of the trunk it is 2.1s.
>
> This is on a Pentium D CPU. I have verified that both binaries produce
> identical results.
>
> If I can do anything to help locate the reason for this slowdown, I'd be
> glad to help, but I must admit that I'm no good at interpreting assembler.
>
> Any insight would be greatly appreciated.
>
You didn't say what target you are using. A Pentium D can run both
32-bit and 64-bit code.
--
H.J.
* Re: Performance regression of generated numerical code
From: Martin Reinecke @ 2009-12-15 9:20 UTC
To: H.J. Lu; +Cc: gcc
Hi!

> You didn't say what target you are using. A Pentium D can run both
> 32-bit and 64-bit code.

This was done with 32-bit code. I have opened PR 42376 describing
the issue and added some more information.

Cheers,
Martin
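
Since the thread turns on whether the binary was built for a 32-bit or a
64-bit target, here is a small sketch for checking this from within a
program. It relies on the __i386__ and __x86_64__ macros that GCC
predefines for x86 targets; target_name, target_pointer_bits and
describe_target are hypothetical helper names, not something from perf.c.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Name of the x86 target this file was compiled for, based on GCC's
   predefined macros; "unknown" on any other architecture. */
static const char *target_name(void)
{
#if defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "i386";
#else
    return "unknown";
#endif
}

/* Width of a data pointer in bits (32 for i386, 64 for x86-64). */
static int target_pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Writes a one-line description into buf and returns snprintf's result. */
static int describe_target(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s, %d-bit pointers",
                    target_name(), target_pointer_bits());
}
```

On x86, passing -m32 or -m64 to gcc selects the target, provided the
matching runtime libraries are installed, so the same source can be
timed both ways.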