From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter van Hoof
To: gcc-gnats@gcc.gnu.org
Subject: optimization/4487: -ffast-math fails to disable gradual underflow on Ultrasparc
Date: Fri, 05 Oct 2001 15:46:00 -0000
Message-id:
X-SW-Source: 2001-10/msg00112.html
List-Id:

>Number: 4487
>Category: optimization
>Synopsis: -ffast-math fails to disable gradual underflow for Ultrasparc
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: unassigned
>State: open
>Class: pessimizes-code
>Submitter-Id: net
>Arrival-Date: Fri Oct 05 15:46:01 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator: Peter van Hoof
>Release: 3.0.1
>Organization: Canadian Institute for Theoretical Astrophysics
>Environment:
System: SunOS scooby 5.8 Generic_108528-10 sun4u sparc SUNW,Sun-Blade-100
Architecture: sun4
host: sparc-sun-solaris2.8
build: sparc-sun-solaris2.8
target: sparc-sun-solaris2.8
configured with: ../gcc-3.0.1/configure --prefix=/opt/local --enable-threads --enable-gcj
>Description:
Ultrasparc chips do not support gradual underflow in hardware, so
operations that involve denormalized numbers have to be emulated in
software. Since -ffast-math allows deviations from the IEEE-754 standard
for the sake of increasing performance, in my opinion -ffast-math should
flush denormalized numbers to zero (or at least there should be some
option for enabling this; to the best of my knowledge no such flag exists
for Sparc hardware). Needless to say, software emulation can lead to
substantial performance degradation for certain programs. My machine has
a 500MHz Ultrasparc IIe processor, but I think the problem is the same
for all v9 hardware.
>How-To-Repeat:
To illustrate the degradation, here is a little program that generates
oodles of underflows:

scooby> gcc -O3 -ffast-math test.c -lm
scooby> time a.out
16.02u 135.95s 2:35.33 97.8%

The -fast option on the SunWorks compiler does flush denormalized numbers
to zero.
I do not have a SunWorks compiler myself, so I used somebody else's
(running on a Sun Ultra 1):

chinook> cc -fast test.c -lm
scooby> time a.out
0.23u 0.01s 0:00.21 114.2%

There is obviously a dramatic improvement in performance.

This is test.c:

#include <math.h>

int main()
{
    long i,j;
    double x[5000],y[5000],fac;

    /* fac sits close to the bottom of the normalized double range */
    fac = 1.e-305;
    for( i=0; i < 5000; i++ ) {
        x[i] = pow((double)(i+1),5.);
        y[i] = 0.;
    }
    /* dividing fac by x[i] underflows into the denormalized range */
    for( j=0; j < 1000; j++ ) {
        for( i=0; i < 5000; i++ ) {
            y[i] += fac/x[i];
        }
    }
    return 0;
}

>Fix:
The SunWorks compiler evidently avoids the problem, so a workaround must
exist. However, I haven't found it yet. If somebody knows how to do it,
I would be happy to hear about it!
>Release-Note:
>Audit-Trail:
>Unformatted:
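
As a cross-check on the How-To-Repeat claim that test.c generates many underflows, here is a minimal sketch that counts how many of the quotients fac/x[i] land in the denormalized range. It is an illustration only and was not part of the original report: the helper names term_is_subnormal and count_subnormal_terms are hypothetical, the fifth power is computed by repeated multiplication instead of pow() so that the sketch needs no -lm, and it assumes IEEE-754 doubles with gradual underflow enabled (that is, not built with a flush-to-zero mode).

```c
#include <float.h>

/* Hypothetical helper, not part of the original test.c.
   Returns 1 if fac / (i+1)^5 is a denormalized (subnormal) double:
   nonzero, but smaller in magnitude than DBL_MIN, the smallest
   normalized double. Returns 0 otherwise. */
int term_is_subnormal(long i, double fac)
{
    double b = (double)(i + 1);
    /* volatile forces the quotient to be rounded to double before the test */
    volatile double q = fac / (b * b * b * b * b);
    double mag = q < 0.0 ? -q : q;

    return mag > 0.0 && mag < DBL_MIN;
}

/* Hypothetical helper: counts the subnormal terms among indices 0..n-1. */
long count_subnormal_terms(long n, double fac)
{
    long i, count = 0;

    for (i = 0; i < n; i++)
        count += term_is_subnormal(i, fac);
    return count;
}
```

Calling count_subnormal_terms(5000, 1.e-305) mirrors the inner loop of test.c. Under the assumptions above, only the first three quotients stay normalized (1e-305/243 is still above DBL_MIN, 1e-305/1024 is not), so nearly every division in the timed loop produces a denormalized result.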