From: Robert Dewar
To: Joost VandeVondele
Cc: gcc@gcc.gnu.org
Date: Thu, 25 Mar 2004 14:45:00 -0000
Subject: Re: (a+b)+c should be replaced by a+(b+c)

Joost VandeVondele wrote:
> good compilers (e.g. xlf90) will (at -O4) do higher order transforms of
> the loop to introduce blocking, independent FMAs, ... that makes this
> little piece of code about 100 times faster at O4 than O2 (what about
> LNO/SSA?). This can only be done if you allow (a+b)+c -> a+(b+c). It is
> basically what any optimized blas routine will do. Matrix multiply is a
> trivial example, if you want blas performance, call blas. There are many
> other kernels like this in e.g. scientific code that are not blas.
> You can't expect a scientist to hand unroll and block any kernel to the
> appropriate depth for any machine. There needs to be a compiler option to
> do this. This can only be done if you allow (a+b)+c -> a+(b+c).

Can you really deduce this freedom from later versions of the Fortran standard?
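Why the freedom matters can be shown in a few lines. The sketch below is illustrative only: it uses Python rather than Fortran, assumes Python's float is an IEEE 754 double with round-to-nearest-even (true on virtually all current platforms), and the function names are hypothetical. It is not how xlf90 or GCC actually transform loops. It shows that (a+b)+c and a+(b+c) can give different answers, and that splitting a sum into independent partial sums (the shape of loop that unrolling into independent adds or FMAs produces) is itself a reassociation that can change the result.

```python
from typing import Sequence

# Floating-point addition is commutative but not associative, so rewriting
# (a+b)+c as a+(b+c) can change the computed value.

def left_assoc(a: float, b: float, c: float) -> float:
    return (a + b) + c

def right_assoc(a: float, b: float, c: float) -> float:
    return a + (b + c)

def sum_sequential(xs: Sequence[float]) -> float:
    """Strict left-to-right sum ((x0 + x1) + x2) + ..., the order the source implies."""
    total = 0.0
    for x in xs:
        total += x
    return total

def sum_interleaved(xs: Sequence[float], lanes: int = 2) -> float:
    """Hypothetical reassociated sum using `lanes` independent partial sums,
    as an unrolled loop with independent adds/FMAs would compute it."""
    partial = [0.0] * lanes
    for i, x in enumerate(xs):
        partial[i % lanes] += x
    total = 0.0
    for p in partial:
        total += p
    return total

if __name__ == "__main__":
    a, b, c = 1e16, -1e16, 1.0
    print(left_assoc(a, b, c))   # 1.0
    print(right_assoc(a, b, c))  # 0.0: b + c rounds back to -1e16

    xs = [1e16, 1.0, -1e16, 1.0]
    print(sum_sequential(xs))    # 1.0: 1e16 + 1.0 rounds back to 1e16
    print(sum_interleaved(xs))   # 2.0: the two 1.0 terms land in the same lane
```

Here the reassociated loop happens to match the exact sum (2.0), but in general neither ordering is uniformly more accurate; the two simply differ, which is why it matters whether the language standard actually grants the compiler this freedom.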