From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by sourceware.org (Postfix) with ESMTPS id 503E73858C60 for ; Thu, 7 Oct 2021 14:59:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 503E73858C60 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.cz Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 24846200D0; Thu, 7 Oct 2021 14:59:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1633618791; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=IYLnao4M705kr+5hHif4Fstta1pxKXIYQlQOegTbu2Y=; b=0H+Veq/XB+v6JiDlkeA7NDT9wkiw7K3nrUFFevtXxvUY8HU6dwJICZ6r9LlOA1s6guoYfF LidOgyVa5+qLIuORVprT7rEBDso2modg/kAlxkXG71VPlCiwuZM2OaT/xZxI/9wZZ9O8TM uzfB2ZtWL35uz+xepikgyug4w1wDzTc= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1633618791; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=IYLnao4M705kr+5hHif4Fstta1pxKXIYQlQOegTbu2Y=; b=nk5/Cma3B6qSWXeIc8hIn3cL+w0fovUqhqrM7gu2X7bSiKzRX0H//w/5yQqod4k3WCxFCV wivomvaZMMwHKmBg== Received: from suse.cz (mjambor.udp.ovpn1.nue.suse.de [10.163.29.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 10F11A3B83; Thu, 7 Oct 2021 14:59:51 +0000 (UTC) From: Martin Jambor To: Jan Hubicka Cc: GCC Patches , Xionghu Luo Subject: Re: [PATCH 2/4] ipa-cp: Propagation boost for recursion generated values In-Reply-To: <20211006154937.GB64649@kam.mff.cuni.cz> References: <3068c6c4ee451244031d8198d663de6e614f28f9.1629805719.git.mjambor@suse.cz> <20211006154937.GB64649@kam.mff.cuni.cz> User-Agent: Notmuch/0.33.1 (https://notmuchmail.org) Emacs/27.2 (x86_64-suse-linux-gnu) Date: Thu, 07 Oct 2021 16:59:48 +0200 Message-ID: MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Oct 2021 14:59:54 -0000 Hi, On Wed, Oct 06 2021, Jan Hubicka wrote: >> Recursive call graph edges, even when they are hot and important for >> the compiled program, can never have frequency bigger than one, even >> when the actual time savings in the next recursion call are not >> realized just once but depend on the depth of recursion. The current >> IPA-CP effect propagation code did not take that into account and just >> used the frequency, thus severely underestimating the effect. >> >> This patch artificially boosts values taking part in such calls. If a >> value feeds into itself through a recursive call, the frequency of the >> edge is multiplied by a parameter with default value of 6, basically >> assuming that the recursion will take place 6 times. This value can >> of course be subject to change. >> >> Moreover, values which do not feed into themselves but which were >> generated for a self-recursive call with an arithmetic >> pass-function (aka the 548.exchange "hack" which however is generally >> applicable for recursive functions which count the recursion depth in >> a parameter) have the edge frequency multiplied as many times as there >> are generated values in the chain. In essence, we will assume they >> are all useful. >> >> This patch partially fixes the current situation when we fail to >> optimize 548.exchange with PGO. In the benchmark one recursive edge >> count overwhelmingly dominates all other counts in the program and so >> we fail to perform the first cloning (for the nonrecursive entry call) >> because it looks totally insignificant. >> >> gcc/ChangeLog: >> >> 2021-07-16 Martin Jambor >> >> * params.opt (ipa-cp-recursive-freq-factor): New. >> * ipa-cp.c (ipcp_value): Switch to inline initialization. New members >> scc_no, self_recursion_generated_level, same_scc and >> self_recursion_generated_p. >> (ipcp_lattice::add_value): Replaced parameter unlimited with >> same_lat_gen_level, usit it determine limit of values and store it to >> the value. >> (ipcp_lattice::print): Dump the new fileds. >> (allocate_and_init_ipcp_value): Take same_lat_gen_level as a new >> parameter and store it to the new value. >> (self_recursively_generated_p): Removed. >> (propagate_vals_across_arith_jfunc): Use self_recursion_generated_p >> instead of self_recursively_generated_p, store self generation level >> to such values. >> (value_topo_info::add_val): Set scc_no. >> (value_topo_info::propagate_effects): Multiply frequencies of >> recursively feeding values and self generated values by appropriate >> new factors. > > If you boost every self fed value by factor of 6, I wonder how quickly > we run into exponential explosion of the cost (since the frequency > should be close to 1 and 6^9=10077696.... The factor of six is applied once for an entire SCC, so we'd reach this huge number only if there was a chain of nine different recursive functions - with this patch we assume each one will recurse six times, so the result is indeed a huge execution count estimate. The constant is not used for the "self generated" values like those in exchange, those are handled by the else branch below. For those we expect the recursion happens 8 times, because that is how many values we generate, but the boost is different depending on the recursion depth. > > I think it would be more robust to simply assume that the job will >distribute evenly across the clones. How hard is to implement that? This is not an update of counters. The code tries to estimate execution time improvement that is will be possible in callees if we clone for this particular value and so is based on call graph edge frequencies (so that if in a callee we can save 5 units of time and the frequency is 5, we estimate we will save 25). The code has the advantage that it is universal for both situations when profile feedback is and is not available. Martin