Subject: Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
From: Richard Guenther
To: Steven Bosscher
Cc: Andreas Krebbel, gcc-patches@gcc.gnu.org, Richard Henderson
Date: Thu, 14 Jul 2011 09:43:00 -0000

On Wed, Jul 13, 2011 at 11:49 PM, Steven Bosscher wrote:
> On Wed, Jul 13, 2011 at 4:34 PM, Richard Guenther wrote:
>> On Wed, Jul 13, 2011 at 3:13 PM, Andreas Krebbel wrote:
>>> Hi,
>>>
>>> the widening_mul pass might increase the number of multiplications in
>>> the code by transforming
>>>
>>> a = b * c
>>> d = a + 2
>>> e = a + 3
>>>
>>> into:
>>>
>>> d = b * c + 2
>>> e = b * c + 3
>>>
>>> under the assumption that an FMA instruction is not more expensive
>>> than a simple add.  This certainly isn't always true.  While e.g. on
>>> s390 an fma is indeed not slower than an add execution-wise, it has
>>> disadvantages regarding instruction grouping.  It doesn't group with
>>> any other instruction, which has a major impact on the instruction
>>> dispatch bandwidth.
>>>
>>> The following patch tries to figure out the costs for adds, mults and
>>> fmas by building an RTX and asking the backend's cost function in order
>>> to estimate whether it is worthwhile doing the transformation.
>>>
>>> With that patch the 436.cactus hot loop contains 28 fewer
>>> multiplications than before, increasing performance slightly (~2%).
>>>
>>> Bootstrapped and regtested on x86_64 and s390x.
>>
>> Ick ;)
>
> +1
>
>> Maybe this is finally the time to introduce target hook(s) to
>> get us back costs for trees?  For this case we'd need two
>> actually, or just one - depending on what fine-grained information
>> we pass.  Choices:
>>
>>   tree_code_cost (enum tree_code)
>>   tree_code_cost (enum tree_code, enum machine_mode mode)
>>   unary_cost (enum tree_code, tree actual_arg0) // args will be mostly
>> SSA names or constants, but at least they are typed - works for
>> mixed-typed operations
>>   binary_cost (...)
>>   ...
>>   unary_cost (enum tree_code, enum tree_code arg0_kind) // constant
>> vs. non-constant arg, but lacks type/mode
>
> Or maybe add a cost function for all named insns (i.e.
> http://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html#Standard-Names)?
> I think that any form of lower GIMPLE will not be so low level that
> more combinations will exist than the available named patterns. It
> should be possible to write a gen* tool using rtx_costs to compute
> some useful cost metric for all named patterns. How complicated that
> could be (modes, reg vs. mem, etc.), I don't know... But at least that
> way we don't end up with multiple target costs depending on the IR in
> use.

Yeah, it occurred to me as well that when we look for supportable
operations via optabs the same mechanism should also be able to provide
a cost, maybe as simple as attaching one to the named expanders.

Generating RTL from GIMPLE passes just to be able to use rtx_cost is,
well ... gross.  Yes, we do it in IVOPTs (and that case is even more
complex), but I don't think we want to start doing it elsewhere (look
how the vectorizer for example uses new target hooks instead of
generating vectorized RTL and then using rtx_cost).

Richard.

> Ciao!
> Steven
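
For illustration, here is a rough, self-contained sketch of the kind of
cost query discussed above and the profitability check widening_mul
would perform with it.  The names (target_op_cost, fma_fusion_profitable_p,
op_code) and the cost numbers are invented for this example and are not
existing GCC interfaces; a real hook would be filled in by each backend.

  /* Illustrative sketch only - a made-up per-operation cost interface in
     the spirit of the tree_code_cost/binary_cost hooks proposed above,
     plus the check widening_mul would do with it.  */

  #include <stdio.h>

  enum op_code { OP_MULT, OP_PLUS, OP_FMA };

  /* Hypothetical per-target costs.  A backend like s390, where an FMA
     executes as fast as an add but hurts dispatch grouping, could fold
     that penalty into the OP_FMA number.  */
  static int
  target_op_cost (enum op_code code)
  {
    switch (code)
      {
      case OP_MULT: return 4;
      case OP_PLUS: return 1;
      case OP_FMA:  return 4;
      default:      return 1;
      }
  }

  /* Decide whether replacing

       a = b * c;  d = a + x;  e = a + y;   (one mult, n_uses adds)

     with n_uses FMAs is a win, using only the cost hook above.  */
  static int
  fma_fusion_profitable_p (int n_uses)
  {
    int before = target_op_cost (OP_MULT) + n_uses * target_op_cost (OP_PLUS);
    int after = n_uses * target_op_cost (OP_FMA);
    return after <= before;
  }

  int
  main (void)
  {
    for (int n = 1; n <= 4; n++)
      printf ("%d add use(s) of the mult: fuse into FMAs? %s\n",
              n, fma_fusion_profitable_p (n) ? "yes" : "no");
    return 0;
  }

With these assumed numbers, fusing a multiply that feeds a single add is
a win, while duplicating it into several FMAs is not - which is the
situation Andreas describes for the 436.cactus hot loop.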