From: Georg-Johann Lay
Date: Wed, 13 Jul 2011 15:27:00 -0000
To: Richard Guenther
CC: Andreas Krebbel, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
Message-ID: <4E1DB2C3.7070900@gjlay.de>
References: <20110713131305.GA5348@bart>

Richard Guenther wrote:
> On Wed, Jul 13, 2011 at 3:13 PM, Andreas Krebbel wrote:
>> Hi,
>>
>> the widening_mul pass might increase the number of multiplications in
>> the code by transforming
>>
>> a = b * c
>> d = a + 2
>> e = a + 3
>>
>> into:
>>
>> d = b * c + 2
>> e = b * c + 3
>>
>> under the assumption that an FMA instruction is not more expensive
>> than a simple add. This certainly isn't always true. While e.g. on
>> s390 an fma is indeed not slower than an add execution-wise, it has
>> disadvantages regarding instruction grouping. It doesn't group with
>> any other instruction, which has a major impact on the instruction
>> dispatch bandwidth.
>>
>> The following patch tries to figure out the costs for adds, mults and
>> fmas by building an RTX and asking the backend's cost function in order
>> to estimate whether it is worthwhile doing the transformation.
>>
>> With that patch the 436.cactus hotloop contains 28 fewer
>> multiplications than before, increasing performance slightly (~2%).
>>
>> Bootstrapped and regtested on x86_64 and s390x.
>
> Ick ;)
>
> Maybe this is finally the time to introduce target hook(s) to
> get us back costs for trees? For this case we'd need two
> actually, or just one - dependent on what finegrained information
> we pass. Choices:
>
> tree_code_cost (enum tree_code)
> tree_code_cost (enum tree_code, enum machine_mode mode)
> unary_cost (enum tree_code, tree actual_arg0) // args will be mostly
> SSA names or constants, but at least they are typed - works for
> mixed-typed operations
> binary_cost (...)
> ...
> unary_cost (enum tree_code, enum tree_code arg0_kind) // constant
> vs. non-constant arg, but lacks type/mode
>
> Richard.

What's wrong with rtx_costs?

Yet another cost function might duplicate cost computation in a
backend: once on trees and once on RTXs.

BTW: For one port I read the rtx_costs from insn attributes, which helped
me clean up the code in rtx_costs to a great extent. In particular, for a
target with complex instructions that are synthesized by insn combine,
rtx_costs is mostly a mechanical, brain-dead retyping of a bulk of code
that is already present, almost identically, in insn-recog.c.

Johann
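
For illustration only, the trade-off discussed in this thread can be
sketched as a simple cost comparison. This is not the code from the patch:
the struct op_costs type, the function name fma_conversion_profitable and
any cost numbers fed to it are hypothetical, and in GCC the costs would
come from the backend (for example through rtx_costs) rather than from a
hand-filled struct. The sketch assumes the multiplication result has
n_uses additive uses and that all of them would be turned into FMAs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-operation costs.  In GCC these would be queried from
   the backend, not stored in a hand-written struct like this one.  */
struct op_costs
{
  int add;  /* cost of a plain addition */
  int mul;  /* cost of a plain multiplication */
  int fma;  /* cost of a fused multiply-add */
};

/* Return true if replacing one multiplication that feeds N_USES additions
   by N_USES fused multiply-adds does not increase the estimated cost.

   Before:  a = b * c;  d = a + 2;  e = a + 3;   ->  mul + n_uses * add
   After:   d = b * c + 2;  e = b * c + 3;       ->  n_uses * fma  */
bool
fma_conversion_profitable (const struct op_costs *costs, int n_uses)
{
  assert (costs != 0 && n_uses > 0);

  int separate = costs->mul + n_uses * costs->add;
  int fused = n_uses * costs->fma;

  return fused <= separate;
}
```

With an FMA that costs the same as an add, fusing always wins under this
model. Once the FMA is noticeably more expensive than an add (as the
grouping restrictions on s390 suggest it effectively is), the duplicated
multiplications stop paying off as the number of uses grows.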