From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 1987 invoked by alias); 14 Jul 2012 20:52:36 -0000
Received: (qmail 1976 invoked by uid 22791); 14 Jul 2012 20:52:35 -0000
X-SWARE-Spam-Status: No, hits=-3.6 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1)
    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sat, 14 Jul 2012 20:52:23 +0000
From: "bfriesen at simple dot dallas.tx.us"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
Date: Sat, 14 Jul 2012 20:52:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: bfriesen at simple dot dallas.tx.us
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2012-07/txt/msg01133.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

             Bug #: 53967
           Summary: GCC produces slow code for convolution algorithm with
                    -mfpmath=sse (the AMD_64 default)
    Classification: Unclassified
           Product: gcc
           Version: 4.6.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: bfriesen@simple.dallas.tx.us

Created attachment 27792
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27792
Convolution example C file, pre-processed version, build log, assembler output

The classic convolution algorithm (as implemented in GraphicsMagick) is
observed to run 2X slower with -mfpmath=sse than with -mfpmath=387.
Unfortunately, -mfpmath=sse is the default for -m64 builds on AMD_64, so this
has a large impact on users.

Even with -mfpmath=387, other compilers (LLVM, Open64, and Oracle Studio)
produce faster code by default. Some of these compilers deliver up to 3X
better overall run-time performance, and all of them are at least 2X faster
than the GCC default for x86-64.

This issue has been verified under Solaris 10, OpenIndiana, and Ubuntu Linux
on Opteron and several modern Xeon CPUs. Please note that AMD Opteron 6200
family CPUs were not observed to suffer from this issue.
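
For readers without access to the attachment, the kind of loop under discussion can be sketched roughly as follows. This is not the GraphicsMagick code or the attached test case; the function names (convolve2d, clamp_index), the single-channel float image layout, the double-precision kernel, and the edge-clamping policy are all assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the bug attachment): clamp a possibly
   out-of-range coordinate into [0, n-1] so edge pixels are replicated. */
static size_t clamp_index(long i, size_t n)
{
    if (i < 0)
        return 0;
    if ((size_t)i >= n)
        return n - 1;
    return (size_t)i;
}

/* Hypothetical sketch of a classic 2D convolution over a single-channel
   float image. The kernel is ksize x ksize (ksize must be odd), stored
   row-major as doubles. Each output pixel is a weighted sum of its
   neighbourhood, accumulated in double precision. */
void convolve2d(const float *src, float *dst, size_t width, size_t height,
                const double *kernel, size_t ksize)
{
    assert(ksize % 2 == 1);
    assert(width > 0 && height > 0);

    const long half = (long)(ksize / 2);

    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            double sum = 0.0;
            for (long ky = -half; ky <= half; ky++) {
                size_t sy = clamp_index((long)y + ky, height);
                for (long kx = -half; kx <= half; kx++) {
                    size_t sx = clamp_index((long)x + kx, width);
                    size_t ki = (size_t)(ky + half) * ksize
                              + (size_t)(kx + half);
                    /* float-to-double conversion plus multiply-add in the
                       innermost loop */
                    sum += kernel[ki] * (double)src[sy * width + sx];
                }
            }
            dst[y * width + x] = (float)sum;
        }
    }
}
```

To compare the two floating-point code generation modes on a loop like this, the same file could be built twice, for example with "gcc -O2 -mfpmath=sse" and "gcc -O2 -mfpmath=387", and timed under a driver of your own. Whether this simplified sketch shows the slowdown reported above is not something verified here; the attached test case remains the reference.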