From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2a07:de40:b251:101:10:150:64:1]) by sourceware.org (Postfix) with ESMTPS id D110E3858C78 for ; Wed, 13 Dec 2023 11:36:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D110E3858C78 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de ARC-Filter: OpenARC Filter v1.0.0 sourceware.org D110E3858C78 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a07:de40:b251:101:10:150:64:1 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702467381; cv=none; b=go7Z5k2qlA3cEuI3fncEK6KRhx6FgkIb5T+wHcBoxM3foPfKe/qBBwUG0q0cH8g68OTD4uZ0xOCJ7lrKZkLifmr0YmGcxoWiqfRCRblxvaloz1TtW3DWU2UTgW1UqBGRGzMwM+WfexVCGjvvav6f2cbc658wmhX3MLSM/4A3iYU= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702467381; c=relaxed/simple; bh=Siwvpo6ZAfuFx0HP4phcl868klpyUNkFU9oBWdhIymI=; h=DKIM-Signature:DKIM-Signature:DKIM-Signature:DKIM-Signature:Date: From:To:Subject:Message-ID:MIME-Version; b=XWYlzOHRaN0j/t8Kw2C/h4LeWVwll7m3UmaLQfyCVHN6sX+5ILDuwHdNA+KwKgLAk4ZRrS6rCrqcNFUTnCBBX4F9L7HaUxnIcZ5iJAZ/hGT/Ev208aAJhhGWy2m8sWOVtW8kO6HcQwv7Vhz3UY6In0a1sacPe+O0J2Hdbi1DEGk= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from [10.168.4.150] (unknown [10.168.4.150]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 834BB22580; Wed, 13 Dec 2023 11:36:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1702467375; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=FDbvVHn/I81DeVFWTZv73W/I/wf8KX2mQqXssJ6cVjc=; b=qjON7HK6CotIDg2g2ihdJoKjA4Gj538FiQMIVJrncK5YGCf8edfgYwdO20INag/AagoKSb Fg7UdR66wS4yYUlBbjdx2pnPN0AMfnq0cL4BrQnSnKou3rFoBv2U+/BzyakxBNS9YfERBa frJ4ozrvF/v+7JOCniBLmTWDieF5pXQ= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1702467375; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=FDbvVHn/I81DeVFWTZv73W/I/wf8KX2mQqXssJ6cVjc=; b=RcfyYqDTnjion2HEE+3KgsdGLDs5euT97N+Q39EpkegYtp4cq2HdktIKqTEiKT8p22ZIxE qyKAQH9UU8vxIIBQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1702467375; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=FDbvVHn/I81DeVFWTZv73W/I/wf8KX2mQqXssJ6cVjc=; b=qjON7HK6CotIDg2g2ihdJoKjA4Gj538FiQMIVJrncK5YGCf8edfgYwdO20INag/AagoKSb Fg7UdR66wS4yYUlBbjdx2pnPN0AMfnq0cL4BrQnSnKou3rFoBv2U+/BzyakxBNS9YfERBa frJ4ozrvF/v+7JOCniBLmTWDieF5pXQ= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1702467375; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=FDbvVHn/I81DeVFWTZv73W/I/wf8KX2mQqXssJ6cVjc=; b=RcfyYqDTnjion2HEE+3KgsdGLDs5euT97N+Q39EpkegYtp4cq2HdktIKqTEiKT8p22ZIxE qyKAQH9UU8vxIIBQ== Date: Wed, 13 Dec 2023 12:35:11 +0100 (CET) From: Richard Biener To: Filip Kastl cc: gcc-patches@gcc.gnu.org, hubicka@ucw.cz Subject: Re: [PATCH v3] A new copy propagation and PHI elimination pass In-Reply-To: Message-ID: References: <571578n9-pn64-qqqq-q59s-866548669143@fhfr.qr> <39489or3-7659-5306-4133-6p3n2pr58p99@fhfr.qr> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Level: X-Spam-Score: -3.10 Authentication-Results: smtp-out1.suse.de; none X-Spam-Level: X-Spam-Score: -3.30 X-Spamd-Result: default: False [-3.30 / 50.00]; ARC_NA(0.00)[]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3]; TO_DN_SOME(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; MIME_GOOD(-0.10)[text/plain]; DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519]; NEURAL_HAM_SHORT(-0.20)[-0.978]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.de:email,suse.cz:email]; FUZZY_BLOCKED(0.00)[rspamd.com]; RCVD_COUNT_ZERO(0.00)[0]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; BAYES_HAM(-3.00)[100.00%] X-Spam-Flag: NO X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, 8 Dec 2023, Filip Kastl wrote: > > > Hi, > > > > > > this is a patch that I submitted two months ago as an RFC. I added some polish > > > since. > > > > > > It is a new lightweight pass that removes redundant PHI functions and as a > > > bonus does basic copy propagation. With Jan Hubi?ka we measured that it is able > > > to remove usually more than 5% of all PHI functions when run among early passes > > > (sometimes even 13% or more). Those are mostly PHI functions that would be > > > later optimized away but with this pass it is possible to remove them early > > > enough so that they don't get streamed when runing LTO (and also potentially > > > inlined at multiple places). It is also able to remove some redundant PHIs > > > that otherwise would still be present during RTL expansion. > > > > > > Jakub Jel?nek was concerned about debug info coverage so I compiled cc1plus > > > with and without this patch. These are the sizes of .debug_info and > > > .debug_loclists > > > > > > .debug_info without patch 181694311 > > > .debug_info with patch 181692320 > > > +0.0011% change > > > > > > .debug_loclists without patch 47934753 > > > .debug_loclists with patch 47934966 > > > -0.0004% change > > > > > > I wanted to use dwlocstat to compare debug coverages but didn't manage to get > > > the program working on my machine sadly. Hope this suffices. Seems to me that > > > my patch doesn't have a significant impact on debug info. > > > > > > Bootstraped and tested* on x86_64-pc-linux-gnu. > > > > > > * One testcase (pr79691.c) did regress. However that is because the test is > > > dependent on a certain variable not being copy propagated. I will go into more > > > detail about this in a reply to this mail. > > > > > > Ok to commit? > > > > This is a second version of the patch. In this version, I modified the > > pr79691.c testcase so that it works as intended with other changes from the > > patch. > > > > The pr79691.c testcase checks that we get constants from snprintf calls and > > that they simplify into a single constant. The testcase doesn't account for > > the fact that this constant may be further copy propagated which is exactly > > what happens with this patch applied. > > > > Bootstrapped and tested on x86_64-pc-linux-gnu. > > > > Ok to commit? > > This is the third version of the patch. In this version, I adressed most of > Richards remarks about the second version. Here is a summary of changes I made: > > - Rename the pass from tree-ssa-sccopy.cc to gimple-ssa-sccopy.cc > - Use simple_dce_from_worklist to remove propagated statements > - Use existing replace_uses API instead of reinventing it > - This allowed me to get rid of some now redundant cleanup code > - Encapsulate the SCC finding into a class > - Rework stmt_may_generate_copy to get rid of redundant checks > - Add check that PHI doesn't contain two non-SSA-name values to > stmt_may_generate_copy > - Regarding alignment and value ranges in stmt_may_generate_copy: For now use > the conservative check that Richard suggested > - Index array of vertices that SCC discovery uses by SSA name version numbers > instead of numbering statements myself. > > > I didn't make any changes based on these remarks: > > 1 It might be nice to optimize SCCs of size 1 somehow, not sure how > many times these appear - possibly prevent them from even entering > the SCC discovery? > > It would be nice. But the only way to do this that I see right now is to first > propagate SCCs of size 1 and then the rest. This would mean adding a new copy > propagation procedure. It wouldn't be a trivial procedure. Efficiency of the > pass relies on having SCCs topogically sorted so this procedure would have to > implement some topological sort algorithm. > > This could be done. It could save allocating some vec<>s (right now, SCCs of > size 1 are represented by a vec<> with a single element). But is it worth it to > optimize the pass this way right now? If possible, I'd like to see that the > pass works and sort out any problems people encounter with it before I start > optimizing it. > > 2 Instead of collecting all stmts that may generate a copy at the beginning of > the pass into a vec<>, let the SCC discovery check that statements may > generate a copy on the fly. > > This would be a big change to the pass, it would require a lot of reworking. > I'm also not sure if this would help reduce the number of allocated vec<>s that > much because I'll still want to represent SCCs by vec<>s. > > Again - its possible I'll want to rework the pass in this way in the future but > I'd like to leave it as it is for now. > > 3 Add a comment saying that the pass is doing optimistic copy propagation > > I don't think the pass works in an optimistic way. It doesn't assume that all > variables are copies of each other at any point. It instead identifies copy > statements (or PHI SCCs that act as copy statements) and then for each of these > it propagates: By that I mean if a copy statement says that _3 is a copy of _2, > then it replaces all uses of _3 by _2. > > But its possible that I still misinterpret what 'optimistic' means. If > mentioning that it works in an optimistic way would help readability of the > pass, I'm happy to add that comment. I understood it handles copy cycles? Consider _1 = _2; bb3: # _3 = PHI <_1, _4> _4 = _3; if (...) break; else goto bb3; It should see that _4 and _3 are a copy of _1 (and thus _2)? When processing the cycle comprising the defs of _3 and _4 don't you then optimistically assume that _1 == _4 when processing the PHI? I realize this might be "trivial" given the SCCs are constructed from copies. But still with say _3 = PHI <_1, _4> _4 = PHI <_3, _2> the cycle would have two entries (it's irreducible) and neither _3 nor _4 are a copy of _1 (or _2) unless _1 is a copy of _2 (or vice versa). > Richard, is the patch ok as it is now? Or do you think that making changes 1, 2 > or 3 is necessary? Or are there any other issues which should be adressed? It's OK to not do the changes you skipped. > Filip > > > -- >8 -- > > This patch adds the strongly-connected copy propagation (SCCOPY) pass. > It is a lightweight GIMPLE copy propagation pass that also removes some > redundant PHI statements. It handles degenerate PHIs, e.g.: > > _5 = PHI <_1>; > _6 = PHI <_6, _6, _1, _1>; > _7 = PHI <16, _7>; > // Replaces occurences of _5 and _6 by _1 and _7 by 16 > > It also handles more complicated situations, e.g.: > > _8 = PHI <_9, _10>; > _9 = PHI <_8, _10>; > _10 = PHI <_8, _9, _1>; > // Replaces occurences of _8, _9 and _10 by _1 > > gcc/ChangeLog: > > * Makefile.in: Added sccopy pass. > * passes.def: Added sccopy pass before LTO streaming and before > RTL expanstion. > * tree-pass.h (make_pass_sccopy): Added sccopy pass. > * gimple-ssa-sccopy.cc: New file. > > gcc/testsuite/ChangeLog: > > * gcc.dg/tree-ssa/pr79691.c: Updated scan-tree-dump to account > for additional copy propagation this patch adds. > * gcc.dg/sccopy-1.c: New test. > > Signed-off-by: Filip Kastl > --- > gcc/Makefile.in | 1 + > gcc/gimple-ssa-sccopy.cc | 687 ++++++++++++++++++++++++ > gcc/passes.def | 3 + > gcc/testsuite/gcc.dg/sccopy-1.c | 78 +++ > gcc/testsuite/gcc.dg/tree-ssa/pr79691.c | 2 +- > gcc/tree-pass.h | 1 + > 6 files changed, 771 insertions(+), 1 deletion(-) > create mode 100644 gcc/gimple-ssa-sccopy.cc > create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > index 753f2f36618..2e6f2fef1ba 100644 > --- a/gcc/Makefile.in > +++ b/gcc/Makefile.in > @@ -1497,6 +1497,7 @@ OBJS = \ > gimple-ssa-backprop.o \ > gimple-ssa-isolate-paths.o \ > gimple-ssa-nonnull-compare.o \ > + gimple-ssa-sccopy.o \ > gimple-ssa-split-paths.o \ > gimple-ssa-store-merging.o \ > gimple-ssa-strength-reduction.o \ > diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc > new file mode 100644 > index 00000000000..b74f7a62569 > --- /dev/null > +++ b/gcc/gimple-ssa-sccopy.cc > @@ -0,0 +1,687 @@ > +/* Strongly-connected copy propagation pass for the GNU compiler. > + Copyright (C) 2023 Free Software Foundation, Inc. > + Contributed by Filip Kastl > + > +This file is part of GCC. > + > +GCC is free software; you can redistribute it and/or modify it under > +the terms of the GNU General Public License as published by the Free > +Software Foundation; either version 3, or (at your option) any later > +version. > + > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY > +WARRANTY; without even the implied warranty of MERCHANTABILITY or > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License > +for more details. > + > +You should have received a copy of the GNU General Public License > +along with GCC; see the file COPYING3. If not see > +. */ > + > +#include "config.h" > +#include "system.h" > +#include "coretypes.h" > +#include "backend.h" > +#include "tree.h" > +#include "gimple.h" > +#include "tree-pass.h" > +#include "ssa.h" > +#include "gimple-iterator.h" > +#include "vec.h" > +#include "hash-set.h" > +#include > +#include "ssa-iterators.h" > +#include "gimple-fold.h" > +#include "gimplify.h" > +#include "tree-cfg.h" > +#include "tree-eh.h" > +#include "builtins.h" > +#include "tree-ssa-dce.h" > +#include "fold-const.h" > + > +/* Strongly connected copy propagation pass. > + > + This is a lightweight copy propagation pass that is also able to eliminate > + redundant PHI statements. The pass considers the following types of copy > + statements: > + > + 1 An assignment statement with a single argument. > + > + _3 = _2; > + _4 = 5; > + > + 2 A degenerate PHI statement. A degenerate PHI is a PHI that only refers to > + itself or one other value. > + > + _5 = PHI <_1>; > + _6 = PHI <_6, _6, _1, _1>; > + _7 = PHI <16, _7>; > + > + 3 A set of PHI statements that only refer to each other or to one other > + value. > + > + _8 = PHI <_9, _10>; > + _9 = PHI <_8, _10>; > + _10 = PHI <_8, _9, _1>; > + > + All of these statements produce copies and can be eliminated from the > + program. For a copy statement we identify the value it creates a copy of > + and replace references to the statement with the value -- we propagate the > + copy. > + > + _3 = _2; // Replace all occurences of _3 by _2 > + > + _8 = PHI <_9, _10>; > + _9 = PHI <_8, _10>; > + _10 = PHI <_8, _9, _1>; // Replace all occurences of _8, _9 and _10 by _1 > + > + To find all three types of copy statements we use an algorithm based on > + strongly-connected components (SCCs) in dataflow graph. The algorithm was > + introduced in an article from 2013[1]. We describe the algorithm bellow. > + > + To identify SCCs we implement the Robert Tarjan's SCC algorithm. For the > + SCC computation we wrap potential copy statements in the 'vertex' struct. > + To each of these statements we also assign a vertex number ('vxnum'). Since > + the main algorithm has to be able to compute SCCs of subgraphs of the whole > + dataflow graph we use GIMPLE stmt flags to prevent Tarjan's algorithm from > + leaving the subgraph. > + > + References: > + > + [1] Simple and Efficient Construction of Static Single Assignmemnt Form, > + Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791, > + Section 3.2. */ > + > +/* Bitmap tracking statements which were propagated to be removed at the end of > + the pass. */ > + > +static bitmap dead_stmts; > + > +/* State of vertex during SCC discovery. > + > + unvisited Vertex hasn't yet been popped from worklist. > + vopen DFS has visited vertex for the first time. Vertex has been put > + on Tarjan stack. > + closed DFS has backtracked through vertex. At this point, vertex > + doesn't have any unvisited neighbors. > + in_scc Vertex has been popped from Tarjan stack. */ > + > +enum vstate > +{ > + unvisited, > + vopen, > + closed, > + in_scc > +}; > + > +/* Information about a vertex. Used by SCC discovery. */ > + > +struct vertex > +{ > + bool active; /* scc_discovery::compute_sccs () only considers a subgraph of > + the whole dataflow graph. It uses this flag so that it knows > + which vertices are part of this subgraph. */ > + vstate state; > + unsigned index; > + unsigned lowlink; > +}; > + > +/* SCC discovery. > + > + Used to find SCCs in a dataflow graph. Implements Tarjan's SCC > + algorithm. */ > + > +class scc_discovery > +{ > +public: > + scc_discovery (); > + ~scc_discovery (); > + auto_vec> compute_sccs (auto_vec &stmts); > + > +private: > + unsigned curr_generation = 0; > + vertex* vertices; /* Indexed by SSA_NAME_VERSION. */ > + auto_vec worklist; /* DFS stack. */ > + auto_vec stack; /* Tarjan stack. */ > + > + void visit_neighbor (tree neigh_tree, unsigned parent_vxnum); > +}; > + > +scc_discovery::scc_discovery () > +{ > + /* Create vertex struct for each SSA name. */ > + vertices = XNEWVEC (struct vertex, num_ssa_names); > + unsigned i = 0; > + for (i = 0; i < num_ssa_names; i++) > + vertices[i].active = false; > +} > + > +scc_discovery::~scc_discovery () > +{ > + XDELETEVEC (vertices); > +} > + > +/* Part of 'scc_discovery::compute_sccs()'. */ > + > +void > +scc_discovery::visit_neighbor (tree neigh_tree, unsigned parent_version) > +{ > + if (TREE_CODE (neigh_tree) != SSA_NAME) > + return; /* Skip any neighbor that isn't an SSA name. */ > + unsigned neigh_version = SSA_NAME_VERSION (neigh_tree); > + > + /* Skip neighbors outside the subgraph that Tarjan currently works > + with. */ > + if (!vertices[neigh_version].active) > + return; > + > + vstate neigh_state = vertices[neigh_version].state; > + vstate parent_state = vertices[parent_version].state; > + if (parent_state == vopen) /* We're currently opening parent. */ > + { > + /* Put unvisited neighbors on worklist. Update lowlink of parent > + vertex according to indices of neighbors present on stack. */ > + switch (neigh_state) > + { > + case unvisited: > + worklist.safe_push (neigh_version); > + break; > + case vopen: > + case closed: > + vertices[parent_version].lowlink = > + std::min (vertices[parent_version].lowlink, > + vertices[neigh_version].index); > + break; > + case in_scc: > + /* Ignore these edges. */ > + break; > + } > + } > + else if (parent_state == closed) /* We're currently closing parent. */ > + { > + /* Update lowlink of parent vertex according to lowlinks of > + children of parent (in terms of DFS tree). */ > + if (neigh_state == closed) > + { > + vertices[parent_version].lowlink = > + std::min (vertices[parent_version].lowlink, > + vertices[neigh_version].lowlink); > + } > + } > +} > + > +/* Compute SCCs in dataflow graph on given statements 'stmts'. Ignore > + statements outside 'stmts'. Return the SCCs in a reverse topological > + order. > + > + stmt_may_generate_copy () must be true for all statements from 'stmts'! */ > + > +auto_vec> > +scc_discovery::compute_sccs (auto_vec &stmts) Please use vec &stmts, not auto_vec<..>. > +{ > + auto_vec> sccs; > + > + for (gimple *stmt : stmts) > + { > + unsigned i; > + switch (gimple_code (stmt)) > + { > + case GIMPLE_ASSIGN: > + i = SSA_NAME_VERSION (gimple_assign_lhs (stmt)); > + break; > + case GIMPLE_PHI: > + i = SSA_NAME_VERSION (gimple_phi_result (stmt)); > + break; > + default: > + gcc_unreachable (); > + } > + > + vertices[i].index = 0; > + vertices[i].lowlink = 0; > + vertices[i].state = unvisited; > + vertices[i].active = true; /* Mark the subgraph we'll be working on so > + that we don't leave it. */ > + > + worklist.safe_push (i); > + } > + > + /* Worklist loop. */ > + unsigned curr_index = 0; > + while (!worklist.is_empty ()) > + { > + unsigned i = worklist.pop (); > + gimple *stmt = SSA_NAME_DEF_STMT (ssa_name (i)); > + vstate state = vertices[i].state; > + > + if (state == unvisited) > + { > + vertices[i].state = vopen; > + > + /* Assign index to this vertex. */ > + vertices[i].index = curr_index; > + vertices[i].lowlink = curr_index; > + curr_index++; > + > + /* Put vertex on stack and also on worklist to be closed later. */ > + stack.safe_push (i); > + worklist.safe_push (i); > + } > + else if (state == vopen) > + vertices[i].state = closed; > + > + /* Visit neighbors of this vertex. */ > + tree op; > + gphi *phi; > + switch (gimple_code (stmt)) > + { > + case GIMPLE_PHI: > + phi = as_a (stmt); > + unsigned j; > + for (j = 0; j < gimple_phi_num_args (phi); j++) > + { > + op = gimple_phi_arg_def (phi, j); > + visit_neighbor (op, i); > + } > + break; > + case GIMPLE_ASSIGN: > + op = gimple_assign_rhs1 (stmt); > + visit_neighbor (op, i); > + break; > + default: > + gcc_unreachable (); > + } > + > + /* If we've just closed a root vertex of an scc, pop scc from stack. */ > + if (state == vopen && vertices[i].lowlink == vertices[i].index) > + { > + vec scc = vNULL; > + > + unsigned j; > + do > + { > + j = stack.pop (); > + scc.safe_push (SSA_NAME_DEF_STMT (ssa_name (j))); > + vertices[j].state = in_scc; > + } > + while (j != i); > + > + sccs.safe_push (scc); > + } > + } > + > + if (!stack.is_empty ()) > + gcc_unreachable (); > + > + /* Clear 'active' flags. */ > + for (gimple *stmt : stmts) > + { > + unsigned i; > + switch (gimple_code (stmt)) > + { > + case GIMPLE_ASSIGN: > + i = SSA_NAME_VERSION (gimple_assign_lhs (stmt)); > + break; > + case GIMPLE_PHI: > + i = SSA_NAME_VERSION (gimple_phi_result (stmt)); > + break; > + default: > + gcc_unreachable (); > + } > + > + vertices[i].active = false; > + } > + > + return sccs; I think you need to use return std::move (sccs); here? But maybe I'm confusing C++ here and this auto-magically works. Please try to run valgrind leak check on a testcase you know the pass performs some optimization. > +} > + > +/* Could this statement potentially be a copy statement? > + > + This pass only considers statements for which this function returns 'true'. > + Those are basically PHI functions and assignment statements similar to > + > + _2 = _1; > + or > + _2 = 5; */ > + > +static bool > +stmt_may_generate_copy (gimple *stmt) > +{ > + /* A PHI may generate a copy. */ > + if (gimple_code (stmt) == GIMPLE_PHI) > + { > + gphi *phi = as_a (stmt); > + > + /* No OCCURS_IN_ABNORMAL_PHI SSA names in lhs nor rhs. */ > + if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (phi))) > + return false; > + > + unsigned i; > + for (i = 0; i < gimple_phi_num_args (phi); i++) > + { > + tree op = gimple_phi_arg_def (phi, i); > + if (TREE_CODE (op) == SSA_NAME > + && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op)) > + return false; > + } > + > + /* If PHI has more than one unique non-SSA arguments, it won't generate a > + copy. */ > + tree const_op = NULL_TREE; > + for (i = 0; i < gimple_phi_num_args (phi); i++) > + { > + tree op = gimple_phi_arg_def (phi, i); > + if (TREE_CODE (op) != SSA_NAME) > + { > + if (const_op && !operand_equal_p (op, const_op)) > + return false; > + const_op = op; > + } > + } > + > + return true; > + } > + > + /* Or a statement of type _2 = _1; OR _2 = 5; may generate a copy. */ > + > + if (!gimple_assign_single_p (stmt)) > + return false; > + > + tree lhs = gimple_assign_lhs (stmt); > + tree rhs = gimple_assign_rhs1 (stmt); > + > + if (TREE_CODE (lhs) != SSA_NAME) > + return false; > + > + /* lhs shouldn't flow through any abnormal edges. */ > + if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs)) > + return false; > + > + if (is_gimple_min_invariant (rhs)) > + return true; /* A statement of type _2 = 5;. */ > + > + if (TREE_CODE (rhs) != SSA_NAME) > + return false; > + > + /* rhs shouldn't flow through any abnormal edges. */ > + if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs)) > + return false; > + > + /* It is possible that lhs has more alignment or value range information. By > + propagating we would lose this information. So in the case that alignment > + or value range information differs, we are conservative and do not > + propagate. > + > + FIXME: Propagate alignment and value range info the same way copy-prop > + does. */ > + if (POINTER_TYPE_P (TREE_TYPE (lhs)) > + && POINTER_TYPE_P (TREE_TYPE (rhs)) > + && SSA_NAME_PTR_INFO (lhs) != SSA_NAME_PTR_INFO (rhs)) > + return false; > + if (!POINTER_TYPE_P (TREE_TYPE (lhs)) > + && !POINTER_TYPE_P (TREE_TYPE (rhs)) > + && SSA_NAME_RANGE_INFO (lhs) != SSA_NAME_RANGE_INFO (rhs)) > + return false; > + > + return true; /* A statement of type _2 = _1;. */ > +} > + > +/* Return all statements in cfun that could generate copies. All statements > + for which stmt_may_generate_copy returns 'true'. */ > + > +static auto_vec > +get_all_stmt_may_generate_copy (void) > +{ > + auto_vec result; > + > + basic_block bb; > + FOR_EACH_BB_FN (bb, cfun) > + { > + gimple_stmt_iterator gsi; > + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) > + { > + gimple *s = gsi_stmt (gsi); > + if (stmt_may_generate_copy (s)) > + { > + result.safe_push (s); > + } > + } > + > + gphi_iterator pi; > + for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi)) > + { > + gimple *s = pi.phi (); > + if (stmt_may_generate_copy (s)) > + { > + result.safe_push (s); > + } > + } > + } > + > + return result; > +} > + > +/* For each statement from given SCC, replace its usages by value > + VAL. */ > + > +static void > +replace_scc_by_value (vec scc, tree val) > +{ > + for (gimple *stmt : scc) > + { > + tree name = gimple_get_lhs (stmt); > + replace_uses_by (name, val); > + bitmap_set_bit (dead_stmts, SSA_NAME_VERSION (name)); > + } > + > + if (dump_file) > + fprintf (dump_file, "Replacing SCC of size %d\n", scc.length ()); > +} > + > +/* Part of 'sccopy_propagate ()'. */ > + > +static void > +sccopy_visit_op (tree op, hash_set &outer_ops, > + hash_set &scc_set, bool &is_inner, > + tree &last_outer_op) > +{ > + bool op_in_scc = false; > + > + if (TREE_CODE (op) == SSA_NAME) > + { > + gimple *op_stmt = SSA_NAME_DEF_STMT (op); > + if (scc_set.contains (op_stmt)) > + op_in_scc = true; As optimization this could be a [s]bitmap indexed by SSA_NAME_VERSION (if you have many instances of scc_set a bitmap is better, a sbitmap has O(n) initialization overhead). > + } > + > + if (!op_in_scc) > + { > + outer_ops.add (op); I do wonder if it's relevant to track non-SSA_NAME 'op' here? Otherwise the same as above would apply. This can be done as followup (if possible) if you like. > + last_outer_op = op; > + is_inner = false; > + } > +} > + > +/* Main function of this pass. Find and propagate all three types of copy > + statements (see pass description above). > + > + This is an implementation of an algorithm from the paper Simple and > + Efficient Construction of Static Single Assignmemnt Form[1]. It is based > + on strongly-connected components (SCCs) in dataflow graph. The original > + algorithm only considers PHI statements. We extend it to also consider > + assignment statements of type _2 = _1;. > + > + The algorithm is based on this definition of a set of redundant PHIs[1]: > + > + A non-empty set P of PHI functions is redundant iff the PHI functions just > + reference each other or one other value > + > + It uses this lemma[1]: > + > + Let P be a redundant set of PHI functions. Then there is a > + strongly-connected component S subset of P that is also redundant. > + > + The algorithm works in this way: > + > + 1 Find SCCs > + 2 For each SCC S in topological order: > + 3 Construct set 'inner' of statements that only have other statements > + from S on their right hand side > + 4 Construct set 'outer' of values that originate outside S and appear on > + right hand side of some statement from S > + 5 If |outer| = 1, outer only contains a value v. Statements in S only > + refer to each other or to v -- they are redundant. Propagate v. > + Else, recurse on statements in inner. > + > + The implementation is non-recursive. > + > + References: > + > + [1] Simple and Efficient Construction of Static Single Assignmemnt Form, > + Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791, > + Section 3.2. */ > + > +static void > +sccopy_propagate () > +{ > + auto_vec useful_stmts = get_all_stmt_may_generate_copy (); > + scc_discovery discovery; > + > + auto_vec> worklist; > + worklist = discovery.compute_sccs (useful_stmts); using initialization might be cheaper than first default-constructing > + > + while (!worklist.is_empty ()) > + { > + vec scc = worklist.pop (); > + > + auto_vec inner; > + hash_set outer_ops; > + tree last_outer_op = NULL_TREE; > + > + /* Prepare hash set of PHIs in scc to query later. */ ah, that was to identify in-SCC vs. out-of-SCC stmts ... > + hash_set scc_set; > + for (gimple *stmt : scc) > + scc_set.add (stmt); > + > + for (gimple *stmt : scc) > + { > + bool is_inner = true; > + > + gphi *phi; > + tree op; > + > + switch (gimple_code (stmt)) > + { > + case GIMPLE_PHI: > + phi = as_a (stmt); > + unsigned j; > + for (j = 0; j < gimple_phi_num_args (phi); j++) > + { > + op = gimple_phi_arg_def (phi, j); > + sccopy_visit_op (op, outer_ops, scc_set, is_inner, > + last_outer_op); > + } > + break; > + case GIMPLE_ASSIGN: > + op = gimple_assign_rhs1 (stmt); > + sccopy_visit_op (op, outer_ops, scc_set, is_inner, > + last_outer_op); > + break; > + default: > + gcc_unreachable (); > + } > + > + if (is_inner) > + { > + inner.safe_push (stmt); > + } general coding-style comment, we don't put braces around single stmts. > + } > + > + if (outer_ops.elements () == 1) > + { > + /* The only operand in outer_ops. */ > + tree outer_op = last_outer_op; > + replace_scc_by_value (scc, outer_op); > + } > + else if (outer_ops.elements () > 1) > + { > + /* Add inner sccs to worklist. */ > + auto_vec> inner_sccs = > + discovery.compute_sccs (inner); > + for (vec inner_scc : inner_sccs) > + worklist.safe_push (inner_scc); > + } > + else > + gcc_unreachable (); > + > + scc.release (); > + } > +} > + > +/* Called when pass execution starts. */ > + > +static void > +init_sccopy (void) > +{ > + /* For propagated statements. */ > + dead_stmts = BITMAP_ALLOC (NULL); > +} > + > +/* Called before pass execution ends. */ > + > +static void > +finalize_sccopy (void) > +{ > + /* Remove all propagated statements. */ > + simple_dce_from_worklist (dead_stmts); > + BITMAP_FREE (dead_stmts); > + > + /* Propagating a constant may create dead eh edges. */ > + basic_block bb; > + FOR_EACH_BB_FN (bb, cfun) > + gimple_purge_dead_eh_edges (bb); > +} > + > +namespace { > + > +const pass_data pass_data_sccopy = > +{ > + GIMPLE_PASS, /* type */ > + "sccopy", /* name */ > + OPTGROUP_NONE, /* optinfo_flags */ > + TV_NONE, /* tv_id */ > + ( PROP_cfg | PROP_ssa ), /* properties_required */ > + 0, /* properties_provided */ > + 0, /* properties_destroyed */ > + 0, /* todo_flags_start */ > + TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */ > +}; > + > +class pass_sccopy : public gimple_opt_pass > +{ > +public: > + pass_sccopy (gcc::context *ctxt) > + : gimple_opt_pass (pass_data_sccopy, ctxt) > + {} > + > + /* opt_pass methods: */ > + virtual bool gate (function *) { return true; } > + virtual unsigned int execute (function *); > + opt_pass * clone () final override { return new pass_sccopy (m_ctxt); } > +}; // class pass_sccopy > + > +unsigned > +pass_sccopy::execute (function *) > +{ > + init_sccopy (); > + sccopy_propagate (); > + finalize_sccopy (); > + return 0; > +} > + > +} // anon namespace > + > +gimple_opt_pass * > +make_pass_sccopy (gcc::context *ctxt) > +{ > + return new pass_sccopy (ctxt); > +} > diff --git a/gcc/passes.def b/gcc/passes.def > index 1e1950bdb39..fa6c5a2c9fa 100644 > --- a/gcc/passes.def > +++ b/gcc/passes.def > @@ -100,6 +100,7 @@ along with GCC; see the file COPYING3. If not see > NEXT_PASS (pass_if_to_switch); > NEXT_PASS (pass_convert_switch); > NEXT_PASS (pass_cleanup_eh); > + NEXT_PASS (pass_sccopy); > NEXT_PASS (pass_profile); > NEXT_PASS (pass_local_pure_const); > NEXT_PASS (pass_modref); > @@ -368,6 +369,7 @@ along with GCC; see the file COPYING3. If not see > However, this also causes us to misdiagnose cases that should be > real warnings (e.g., testsuite/gcc.dg/pr18501.c). */ > NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); > + NEXT_PASS (pass_sccopy); > NEXT_PASS (pass_tail_calls); > /* Split critical edges before late uninit warning to reduce the > number of false positives from it. */ > @@ -409,6 +411,7 @@ along with GCC; see the file COPYING3. If not see > NEXT_PASS (pass_sancov); > NEXT_PASS (pass_asan); > NEXT_PASS (pass_tsan); > + NEXT_PASS (pass_sccopy); I don't think we want to run the pass again with -Og. OK with those changes. Thanks, Richard. > /* ??? We do want some kind of loop invariant motion, but we possibly > need to adjust LIM to be more friendly towards preserving accurate > debug information here. */ > diff --git a/gcc/testsuite/gcc.dg/sccopy-1.c b/gcc/testsuite/gcc.dg/sccopy-1.c > new file mode 100644 > index 00000000000..1e61a6b320e > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/sccopy-1.c > @@ -0,0 +1,78 @@ > +/* { dg-do compile } */ > +/* { dg-options "-fgimple -fdump-tree-sccopy -O2" } */ > +/* { dg-final { scan-tree-dump "Replacing SCC of size 2" "sccopy1" } } */ > + > +int __GIMPLE (ssa, startwith ("sccopy")) > +main () > +{ > + int a; > + int y; > + int x; > + int _1; > + int _2; > + int _13; > + > + __BB(2): > + if (x_7(D) == 5) > + goto __BB3; > + else > + goto __BB4; > + > + __BB(3): > + a_10 = x_7(D); > + goto __BB5; > + > + __BB(4): > + a_9 = y_8(D); > + goto __BB5; > + > + __BB(5): > + a_3 = __PHI (__BB3: a_10, __BB4: a_9); > + if (x_7(D) == y_8(D)) > + goto __BB6; > + else > + goto __BB11; > + > + __BB(6): > + a_11 = a_3 + 1; > + goto __BB7; > + > + __BB(7): > + a_4 = __PHI (__BB6: a_11, __BB11: a_6); > +label1: > + if (x_7(D) != y_8(D)) > + goto __BB8; > + else > + goto __BB10; > + > + __BB(8): > + goto __BB9; > + > + __BB(9): > + a_12 = __PHI (__BB8: a_4, __BB10: a_5); > + goto __BB10; > + > + __BB(10,loop_header(1)): > + a_5 = __PHI (__BB7: a_4, __BB9: a_12); > +label2: > + _1 = y_8(D) * 2; > + if (x_7(D) == _1) > + goto __BB9; > + else > + goto __BB11; > + > + __BB(11): > + a_6 = __PHI (__BB5: a_3, __BB10: a_5); > + _2 = x_7(D) * 3; > + if (y_8(D) == _2) > + goto __BB7; > + else > + goto __BB12; > + > + __BB(12): > + _13 = 0; > + return _13; > + > +} > + > + > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c b/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c > index bf889318c06..43770c95bca 100644 > --- a/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c > @@ -34,4 +34,4 @@ int f4 (int i) > > /* { dg-final { scan-tree-dump-times "sprintf" 1 "optimized" } } > { dg-final { scan-tree-dump-times "snprintf" 1 "optimized" } } > - { dg-final { scan-tree-dump " = 9;" "optimized" } } */ > + { dg-final { scan-tree-dump "return 9;" "optimized" } } */ > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h > index 09e6ada5b2f..94a48461590 100644 > --- a/gcc/tree-pass.h > +++ b/gcc/tree-pass.h > @@ -399,6 +399,7 @@ extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt); > +extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt); > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)