From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-245152-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 2942 invoked by alias); 16 Feb 2008 18:50:58 -0000
Received: (qmail 1417 invoked by uid 48); 16 Feb 2008 18:50:09 -0000
Date: Sat, 16 Feb 2008 18:50:00 -0000
Message-ID: <20080216185009.1416.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References: <bug-29549-1719@http.gcc.gnu.org/bugzilla/>
Subject: [Bug fortran/29549] matmul slow for complex matrices
In-Reply-To: <bug-29549-1719@http.gcc.gnu.org/bugzilla/>
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "fxcoudert at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org>
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2008-02/txt/msg01806.txt.bz2


------- Comment #7 from fxcoudert at gcc dot gnu dot org  2008-02-16 18:50 -------
Thomas is right: -fcx-limited-range sets flag_complex_method to 0, but
flag_complex_method == 1 is already enough to get rather good figures. Here are
the execution times of a 300x300 matmul on my MacBook Pro (i386-apple-darwin8.11.1):

  - a home-made triple do loop in Fortran (Janne's comment #2) is 0.1876 sec
  - unpatched matmul is 0.5499 sec
  - matmul compiled with flag_complex_method == 1 is 0.1448 sec

The following patch is what I used to benchmark: it creates a
-fcx-fortran-rules (of course, we do know that Fortran actually rules, but
hiding it in an option name is a clever way for people to slowly start
realizing it) option that sets flag_complex_method to 1, and uses it to compile
libgfortran's matmul routines.
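
For readers wondering why the complex method matters for matmul, here is a
minimal sketch in C of the kind of inner product that benefits. It assumes
that, with flag_complex_method below 2, a complex multiply can be expanded
inline as four real multiplications and two additions, with no check for a
NaN + i*NaN result and no attempt to recover from one. The names cmul_cheap and
cmatmul_cheap are hypothetical and exist only for this illustration; this is
not the libgfortran code and not what GCC literally generates.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Cheap complex multiply: four real multiplications and two additions.
   There is no test for a NaN + i*NaN result and no rescue step, so the
   result is only meaningful for finite, well-scaled operands.
   Hypothetical helper, for illustration only.  */
static double complex
cmul_cheap (double complex x, double complex y)
{
  double a = creal (x), b = cimag (x);
  double c = creal (y), d = cimag (y);

  return (a * c - b * d) + (a * d + b * c) * I;
}

/* Naive n x n complex matrix product C = A * B using the cheap multiply.
   Matrices are stored column-major, as Fortran does, so element (i,j)
   lives at index i + j*n.  Hypothetical helper, for illustration only.  */
static void
cmatmul_cheap (size_t n, const double complex *a, const double complex *b,
               double complex *c)
{
  assert (n > 0);

  for (size_t j = 0; j < n; j++)
    for (size_t i = 0; i < n; i++)
      {
        double complex sum = 0.0;

        for (size_t k = 0; k < n; k++)
          sum += cmul_cheap (a[i + k * n], b[k + j * n]);

        c[i + j * n] = sum;
      }
}
```

The inner loop here contains only plain floating-point arithmetic, which is
what gives the unroller and the vectorizer something to work with; a multiply
that has to go through the full C99 rules carries extra checks on every
iteration.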


Index: gcc/toplev.c
===================================================================
--- gcc/toplev.c        (revision 132353)
+++ gcc/toplev.c        (working copy)
@@ -2001,6 +2001,10 @@
   if (flag_cx_limited_range)
     flag_complex_method = 0;

+  /* With -fcx-fortran-rules, we do something in-between cheap and C99.  */
+  if (flag_cx_fortran_rules)
+    flag_complex_method = 1;
+
   /* Targets must be able to place spill slots at lower addresses.  If the
      target already uses a soft frame pointer, the transition is trivial.  */
   if (!FRAME_GROWS_DOWNWARD && flag_stack_protect)
Index: gcc/common.opt
===================================================================
--- gcc/common.opt      (revision 132353)
+++ gcc/common.opt      (working copy)
@@ -390,6 +390,10 @@
 Common Report Var(flag_cx_limited_range) Optimization
 Omit range reduction step when performing complex division

+fcx-fortran-rules
+Common Report Var(flag_cx_fortran_rules) Optimization
+Complex multiplication and division follow Fortran rules
+
 fdata-sections
 Common Report Var(flag_data_sections) Optimization
 Place data items into their own section
Index: libgfortran/Makefile.am
===================================================================
--- libgfortran/Makefile.am     (revision 132353)
+++ libgfortran/Makefile.am     (working copy)
@@ -636,7 +636,7 @@
 install-pdf:

 # Turn on vectorization and loop unrolling for matmul.
-$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ftree-vectorize
-fs
+$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ftree-vectorize
-fs
 # Logical matmul doesn't vectorize.
 $(patsubst %.c,%.lo,$(notdir $(i_matmull_c))): AM_CFLAGS += -funroll-loops


-- 

fxcoudert at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fxcoudert at gcc dot gnu dot
                   |                            |org
           Keywords|                            |patch


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29549