public inbox for gcc-bugs@sourceware.org
* [Bug c/21550] New: i686 floating point performance 33% slower than gcc 3.4.3
@ 2005-05-13 15:22 trt at acm dot org
From: trt at acm dot org @ 2005-05-13 15:22 UTC
  To: gcc-bugs

gcc 4.0.0 generates slower code than gcc 3.4.3 for the BLAS "axpy" operation.
(This is no doubt specific to IA32, and perhaps also to the processor version.)
The program is below; here are the timing results:

                   gcc 3.4.3    gcc 4.0.0
Method              cpu secs     cpu secs
z[]=x[]+alpha*y[]     1.45         1.72
z[]=z[]+alpha*y[]     1.47         2.03
z[]=z[]+y[]           1.44         1.57
                                                                                
The second method is a common special case of the first,
so it is unfortunate that gcc 4 does poorly on it.
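
To make the "special case" relation concrete, here is a minimal sketch.
It is not part of the test program, and the names axpy3, axpy2 and
same_result are made up for illustration. It checks that the first loop,
called with x and z pointing at the same array, leaves the same values
as the second loop.

```c
/* Method 1: z[] = x[] + alpha*y[] (three-operand form). */
void axpy3(double *z, const double *x, const double *y, int n, double alpha)
{
    int i;
    for (i = 0; i < n; i++) z[i] = x[i] + alpha * y[i];
}

/* Method 2: z[] = z[] + alpha*y[] (in-place update). */
void axpy2(double *z, const double *y, int n, double alpha)
{
    int i;
    for (i = 0; i < n; i++) z[i] = z[i] + alpha * y[i];
}

/* Returns 1 if axpy3(z, z, y) and axpy2(z, y) leave identical arrays. */
int same_result(void)
{
    double p[4] = {1, 2, 3, 4}, q[4] = {1, 2, 3, 4};
    double y[4] = {0.5, 0.25, 2, 8};
    int i;
    axpy3(p, p, y, 4, 1.1);   /* x aliases z: no restrict, so this is legal */
    axpy2(q, y, 4, 1.1);
    for (i = 0; i < 4; i++)
        if (p[i] != q[i]) return 0;
    return 1;
}
```

Note that the driver further down calls zvaxpy(a, a, b, ...) for every
method, so x and z alias in all three timings.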

========
The program is split into two files, rzvaxpy.c and zvaxpy.c, to defeat
inlining. Here is the script I used to compile and run them:

for m in METH1 METH2 METH3
do
   for cc in gcc343 gcc400
   do
      $cc -march=i686 -O3 -D$m rzvaxpy.c zvaxpy.c
      echo $cc $m `(time ./a.out)2>&1`
   done
done
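
The same measurement could probably also be taken inside a single C file.
The following is only a sketch under assumptions: it uses clock() from
<time.h> instead of the shell's time, and GCC's noinline attribute as a
stand-in for the two-file split. The names za, zb, zvaxpy2 and cpu_seconds
are hypothetical and do not appear in the test program.

```c
#include <time.h>

#define N 100

double za[N], zb[N];   /* scratch arrays, names made up for this sketch */

/* noinline is a GCC extension; it replaces the separate source file. */
__attribute__((noinline))
void zvaxpy2(double *z, double *y, int n, double alpha)
{
    int i;
    for (i = 0; i < n; i++) z[i] = z[i] + alpha * y[i];
}

/* CPU seconds used by `reps` calls of the kernel, measured with clock(). */
double cpu_seconds(long reps)
{
    clock_t t0;
    long r;
    int i;
    for (i = 0; i < N; i++) { za[i] = 0; zb[i] = 1; }
    t0 = clock();
    for (r = 0; r < reps; r++) zvaxpy2(za, zb, N, 1.1);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Printing cpu_seconds(3000000) from main would correspond to one row of the
table above, though the figures need not match what time reports.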

==== zvaxpy.c

void
zvaxpy(double *z, double *x, double *y, int n, double alpha)
{
   int i;
                                                                                
#if defined(METH1)
   for (i = 0; i < n; i++) z[i] = x[i] + alpha * y[i];
#elif defined(METH2)
   for (i = 0; i < n; i++) z[i] = z[i] + alpha * y[i];
#else
   for (i = 0; i < n; i++) z[i] = z[i] +  y[i];
#endif
}

==== rzvaxpy.c

#include <stdio.h>
                                                                                
#define N 100
#define NITER ((300*1000*1000)/N)
double a[N], b[N];
                                                                                
extern void zvaxpy(double *, double *, double *, int, double);
                                                                                
int
main()
{
   int i;
   double sum;
   for (i = 0; i < N; i++) { a[i] = 0; b[i] = 1; }
   for (i = 0; i < NITER; i++) zvaxpy(a,a, b, N, 1.1);
   sum = 0; for (i = 0; i < N; i++) sum += a[i];
   printf("sum %g\n", sum);
   return 0;
}
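
As a sanity check on the printed sum: the driver passes a as both z and x,
and every b[i] is 1, so each call adds alpha (METH1, METH2) or 1 (METH3) to
every element. With N = 100 and NITER = 3,000,000 the sum should therefore
come out at roughly 3.3e8 or 3e8 respectively, up to rounding. A small
sketch of that arithmetic follows; expected_sum and simulated_sum are
illustrative names, not part of the test program.

```c
/* Closed form: every call adds `step` to each of the n elements. */
double expected_sum(int n, long niter, double step)
{
    return (double)n * (double)niter * step;
}

/* Brute force, mirroring the driver: start at 0, add `step` niter times. */
double simulated_sum(int n, long niter, double step)
{
    double sum = 0.0;
    int i;
    long k;
    for (i = 0; i < n; i++) {
        double a = 0.0;
        for (k = 0; k < niter; k++) a = a + step;
        sum += a;
    }
    return sum;
}
```

The two agree only approximately, since the repeated additions round at
every step while the closed form rounds once.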

-- 
           Summary: i686 floating point performance 33% slower than gcc
                    3.4.3
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: trt at acm dot org
                CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21550



Thread overview: 6+ messages
2005-05-13 15:22 [Bug c/21550] New: i686 floating point performance 33% slower than gcc 3.4.3 trt at acm dot org
2005-05-13 18:03 ` [Bug tree-optimization/21550] [4.0/4.1 Regression] " pinskia at gcc dot gnu dot org
2005-07-08  1:35 ` mmitchel at gcc dot gnu dot org
2005-09-27 15:57 ` mmitchel at gcc dot gnu dot org
2005-09-29  3:28 ` pinskia at gcc dot gnu dot org
     [not found] <bug-21550-4397@http.gcc.gnu.org/bugzilla/>
2005-10-16 22:25 ` pinskia at gcc dot gnu dot org
