public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] gimple UIDs, LTO and -fanalyzer [PR98599]
@ 2021-01-13 22:03 David Malcolm
  2021-01-14  7:30 ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: David Malcolm @ 2021-01-13 22:03 UTC (permalink / raw)
  To: gcc-patches

gimple.h has this comment for gimple's uid field:

  /* UID of this statement.  This is used by passes that want to
     assign IDs to statements.  It must be assigned and used by each
     pass.  By default it should be assumed to contain garbage.  */
  unsigned uid;

and gimple_set_uid has:

   Please note that this UID property is supposed to be undefined at
   pass boundaries.  This means that a given pass should not assume it
   contains any useful value when the pass starts and thus can set it
   to any value it sees fit.

which suggests that any pass can use the uid field as an arbitrary
scratch space.

PR analyzer/98599 reports a case where this error occurs in LTO mode:
  fatal error: Cgraph edge statement index out of range
on certain inputs with -fanalyzer.

The error occurs in the LTRANS phase after -fanalyzer runs in the
WPA phase.  The analyzer pass writes to the uid fields of all stmts.

The error occurs when LTRANS is streaming callgraph edges back in.
If I'm reading things correctly, the LTO format uses stmt uids to
associate call stmts with callgraph edges between WPA and LTRANS.
For example, in lto-cgraph.c, lto_output_edge writes out the
gimple_uid, and input_edge reads it back in.

Hence IPA passes that touch the uids in WPA need to restore them,
or the stream-in at LTRANS will fail.

Is it intended that the LTO machinery relies on the value of the uid
field being preserved during WPA (or, at least, needs to be saved and
restored by passes that touch it)?

On the assumption that this is the case, this patch updates the comments
in gimple.h referring to passes being able to set uid to any value to
note the caveat for IPA passes, and it updates the analyzer to save
and restore the UIDs, fixing the error.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
OK for master?

gcc/analyzer/ChangeLog:
	PR analyzer/98599
	* supergraph.cc (saved_uids::make_uid_unique): New.
	(saved_uids::restore_uids): New.
	(supergraph::supergraph): Replace assignments to stmt->uid with
	calls to m_stmt_uids.make_uid_unique.
	(supergraph::~supergraph): New.
	* supergraph.h (class saved_uids): New.
	(supergraph::~supergraph): New decl.
	(supergraph::m_stmt_uids): New field.

gcc/ChangeLog:
	PR analyzer/98599
	* doc/gimple.texi: Document that UIDs must not change during IPA
	passes.
	* gimple.h (gimple::uid): Likewise.
	(gimple_set_uid): Likewise.

gcc/testsuite/ChangeLog:
	PR analyzer/98599
	* gcc.dg/analyzer/pr98599-a.c: New test.
	* gcc.dg/analyzer/pr98599-b.c: New test.
---
 gcc/analyzer/supergraph.cc                | 53 +++++++++++++++++++++--
 gcc/analyzer/supergraph.h                 | 15 +++++++
 gcc/doc/gimple.texi                       |  6 +++
 gcc/gimple.h                              | 13 +++++-
 gcc/testsuite/gcc.dg/analyzer/pr98599-a.c |  8 ++++
 gcc/testsuite/gcc.dg/analyzer/pr98599-b.c |  1 +
 6 files changed, 90 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-b.c

diff --git a/gcc/analyzer/supergraph.cc b/gcc/analyzer/supergraph.cc
index 419f6424f76..40acfbd16a8 100644
--- a/gcc/analyzer/supergraph.cc
+++ b/gcc/analyzer/supergraph.cc
@@ -87,6 +87,46 @@ supergraph_call_edge (function *fun, gimple *stmt)
   return edge;
 }
 
+/* class saved_uids.
+
+   In order to ensure consistent results without relying on the ordering
+   of pointer values we assign a uid to each gimple stmt, globally unique
+   across all functions.
+
+   Normally, the stmt uids are a scratch space that each pass can freely
+   assign its own values to.  However, in the case of LTO, the uids are
+   used to associate call stmts with callgraph edges between the WPA phase
+   (where the analyzer runs in LTO mode) and the LTRANS phase; if the
+   analyzer changes them in the WPA phase, it leads to errors when
+   streaming the code back in at LTRANS.
+
+   Hence this class has responsibility for tracking the original uids
+   and restoring them once the pass is complete (in the supergraph dtor).  */
+
+/* Give STMT a globally unique uid, storing its original uid so it can
+   later be restored.  */
+
+void
+saved_uids::make_uid_unique (gimple *stmt)
+{
+  unsigned next_uid = m_old_stmt_uids.length ();
+  unsigned old_stmt_uid = stmt->uid;
+  stmt->uid = next_uid;
+  m_old_stmt_uids.safe_push
+    (std::pair<gimple *, unsigned> (stmt, old_stmt_uid));
+}
+
+/* Restore the saved uids of all stmts.  */
+
+void
+saved_uids::restore_uids () const
+{
+  unsigned i;
+  std::pair<gimple *, unsigned> *pair;
+  FOR_EACH_VEC_ELT (m_old_stmt_uids, i, pair)
+    pair->first->uid = pair->second;
+}
+
 /* supergraph's ctor.  Walk the callgraph, building supernodes for each
    CFG basic block, splitting the basic blocks at callsites.  Join
    together the supernodes with interprocedural and intraprocedural
@@ -101,8 +141,6 @@ supergraph::supergraph (logger *logger)
 
   /* First pass: make supernodes (and assign UIDs to the gimple stmts).  */
   {
-    unsigned next_uid = 0;
-
     /* Sort the cgraph_nodes?  */
     cgraph_node *node;
     FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
@@ -127,7 +165,7 @@ supergraph::supergraph (logger *logger)
 	    {
 	      gimple *stmt = gsi_stmt (gpi);
 	      m_stmt_to_node_t.put (stmt, node_for_stmts);
-	      stmt->uid = next_uid++;
+	      m_stmt_uids.make_uid_unique (stmt);
 	    }
 
 	  /* Append statements from BB to the current supernode, splitting
@@ -139,7 +177,7 @@ supergraph::supergraph (logger *logger)
 	      gimple *stmt = gsi_stmt (gsi);
 	      node_for_stmts->m_stmts.safe_push (stmt);
 	      m_stmt_to_node_t.put (stmt, node_for_stmts);
-	      stmt->uid = next_uid++;
+	      m_stmt_uids.make_uid_unique (stmt);
 	      if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
 		{
 		  m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
@@ -257,6 +295,13 @@ supergraph::supergraph (logger *logger)
   }
 }
 
+/* supergraph's dtor.  Reset stmt uids.  */
+
+supergraph::~supergraph ()
+{
+  m_stmt_uids.restore_uids ();
+}
+
 /* Dump this graph in .dot format to PP, using DUMP_ARGS.
    Cluster the supernodes by function, then by BB from original CFG.  */
 
diff --git a/gcc/analyzer/supergraph.h b/gcc/analyzer/supergraph.h
index fc4a753c5a4..b2125a35410 100644
--- a/gcc/analyzer/supergraph.h
+++ b/gcc/analyzer/supergraph.h
@@ -79,6 +79,18 @@ struct supergraph_traits
   typedef supercluster cluster_t;
 };
 
+/* A class to manage the setting and restoring of statement uids.  */
+
+class saved_uids
+{
+public:
+  void make_uid_unique (gimple *stmt);
+  void restore_uids () const;
+
+private:
+  auto_vec<std::pair<gimple *, unsigned> > m_old_stmt_uids;
+};
+
 /* A "supergraph" is a directed graph formed by joining together all CFGs,
    linking them via interprocedural call and return edges.
 
@@ -90,6 +102,7 @@ class supergraph : public digraph<supergraph_traits>
 {
 public:
   supergraph (logger *logger);
+  ~supergraph ();
 
   supernode *get_node_for_function_entry (function *fun) const
   {
@@ -205,6 +218,8 @@ private:
 
   typedef hash_map<const function *, unsigned> function_to_num_snodes_t;
   function_to_num_snodes_t m_function_to_num_snodes;
+
+  saved_uids m_stmt_uids;
 };
 
 /* A node within a supergraph.  */
diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 4b3d7d7452e..417f37169ff 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -173,6 +173,12 @@ This is an unsigned integer used by passes that want to assign
 IDs to every statement. These IDs must be assigned and used by
 each pass.
 
+Normally the uid field is a scratch space for use by passes that
+need one.  However, in LTO mode the UIDs are used to associate call
+statements with callgraph edges between the WPA phase and the LTRANS
+phase; hence if they are changed by an IPA pass they should be restored
+at the end of that pass.
+
 @item @code{location}
 This is a @code{location_t} identifier to specify source code
 location for this statement. It is inherited from the front
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 3ec86f5f082..7d01bd4dd6a 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -260,7 +260,11 @@ struct GTY((desc ("gimple_statement_structure (&%h)"), tag ("GSS_BASE"),
 
   /* UID of this statement.  This is used by passes that want to
      assign IDs to statements.  It must be assigned and used by each
-     pass.  By default it should be assumed to contain garbage.  */
+     pass.  By default it should be assumed to contain garbage.
+     However, in LTO mode the UIDs are used to associate call stmts with
+     callgraph edges between the WPA phase and the LTRANS phase; hence
+     if they are changed by an IPA pass they should be restored at the
+     end of that pass.  */
   unsigned uid;
 
   /* [ WORD 2 ]
@@ -2043,7 +2047,12 @@ gimple_plf (gimple *stmt, enum plf_mask plf)
    Please note that this UID property is supposed to be undefined at
    pass boundaries.  This means that a given pass should not assume it
    contains any useful value when the pass starts and thus can set it
-   to any value it sees fit.  */
+   to any value it sees fit.
+
+   That said, in LTO mode the UIDs are used to associate call stmts with
+   callgraph edges between the WPA phase and the LTRANS phase; hence
+   if they are changed by an IPA pass they should be restored at the
+   end of that pass.  */
 
 static inline void
 gimple_set_uid (gimple *g, unsigned uid)
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
new file mode 100644
index 00000000000..2bbf37b0e6e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
@@ -0,0 +1,8 @@
+/* { dg-do link } */
+/* { dg-require-effective-target lto } */
+/* { dg-additional-options "-Os -flto" } */
+/* { dg-additional-sources pr98599-b.c } */
+
+int b(int x);
+int a() { b(5); }
+int main() { a(); }
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
new file mode 100644
index 00000000000..cfdeb3bf134
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
@@ -0,0 +1 @@
+int b(int x) { return x; }
-- 
2.26.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] gimple UIDs, LTO and -fanalyzer [PR98599]
  2021-01-13 22:03 [PATCH] gimple UIDs, LTO and -fanalyzer [PR98599] David Malcolm
@ 2021-01-14  7:30 ` Richard Biener
  2021-01-14 14:00   ` Jan Hubicka
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Biener @ 2021-01-14  7:30 UTC (permalink / raw)
  To: David Malcolm, Jan Hubicka; +Cc: GCC Patches

On Wed, Jan 13, 2021 at 11:04 PM David Malcolm via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> gimple.h has this comment for gimple's uid field:
>
>   /* UID of this statement.  This is used by passes that want to
>      assign IDs to statements.  It must be assigned and used by each
>      pass.  By default it should be assumed to contain garbage.  */
>   unsigned uid;
>
> and gimple_set_uid has:
>
>    Please note that this UID property is supposed to be undefined at
>    pass boundaries.  This means that a given pass should not assume it
>    contains any useful value when the pass starts and thus can set it
>    to any value it sees fit.
>
> which suggests that any pass can use the uid field as an arbitrary
> scratch space.
>
> PR analyzer/98599 reports a case where this error occurs in LTO mode:
>   fatal error: Cgraph edge statement index out of range
> on certain inputs with -fanalyzer.
>
> The error occurs in the LTRANS phase after -fanalyzer runs in the
> WPA phase.  The analyzer pass writes to the uid fields of all stmts.
>
> The error occurs when LTRANS is streaming callgraph edges back in.
> If I'm reading things correctly, the LTO format uses stmt uids to
> associate call stmts with callgraph edges between WPA and LTRANS.
> For example, in lto-cgraph.c, lto_output_edge writes out the
> gimple_uid, and input_edge reads it back in.
>
> Hence IPA passes that touch the uids in WPA need to restore them,
> or the stream-in at LTRANS will fail.
>
> Is it intended that the LTO machinery relies on the value of the uid
> field being preserved during WPA (or, at least, needs to be saved and
> restored by passes that touch it)?

I belive this is solely at the cgraph stream out to stream in boundary but
this may be a blurred area since while we materialize the whole cgraph
at once the function bodies are streamed in on demand.

Honza can probably clarify things.

Note LTO uses this exactly because of this comment to avoid allocating
extra memory for an 'index' but it could of course leave gimple_uid alone
at some extra expense (eventually paid for in generic cgraph data structures
and thus for not only the streaming time).

> On the assumption that this is the case, this patch updates the comments
> in gimple.h referring to passes being able to set uid to any value to
> note the caveat for IPA passes, and it updates the analyzer to save
> and restore the UIDs, fixing the error.
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> OK for master?

The analyzer bits are OK, let's see how Honza can clarify the situation.

Thanks,
Richard.

> gcc/analyzer/ChangeLog:
>         PR analyzer/98599
>         * supergraph.cc (saved_uids::make_uid_unique): New.
>         (saved_uids::restore_uids): New.
>         (supergraph::supergraph): Replace assignments to stmt->uid with
>         calls to m_stmt_uids.make_uid_unique.
>         (supergraph::~supergraph): New.
>         * supergraph.h (class saved_uids): New.
>         (supergraph::~supergraph): New decl.
>         (supergraph::m_stmt_uids): New field.
>
> gcc/ChangeLog:
>         PR analyzer/98599
>         * doc/gimple.texi: Document that UIDs must not change during IPA
>         passes.
>         * gimple.h (gimple::uid): Likewise.
>         (gimple_set_uid): Likewise.
>
> gcc/testsuite/ChangeLog:
>         PR analyzer/98599
>         * gcc.dg/analyzer/pr98599-a.c: New test.
>         * gcc.dg/analyzer/pr98599-b.c: New test.
> ---
>  gcc/analyzer/supergraph.cc                | 53 +++++++++++++++++++++--
>  gcc/analyzer/supergraph.h                 | 15 +++++++
>  gcc/doc/gimple.texi                       |  6 +++
>  gcc/gimple.h                              | 13 +++++-
>  gcc/testsuite/gcc.dg/analyzer/pr98599-a.c |  8 ++++
>  gcc/testsuite/gcc.dg/analyzer/pr98599-b.c |  1 +
>  6 files changed, 90 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
>
> diff --git a/gcc/analyzer/supergraph.cc b/gcc/analyzer/supergraph.cc
> index 419f6424f76..40acfbd16a8 100644
> --- a/gcc/analyzer/supergraph.cc
> +++ b/gcc/analyzer/supergraph.cc
> @@ -87,6 +87,46 @@ supergraph_call_edge (function *fun, gimple *stmt)
>    return edge;
>  }
>
> +/* class saved_uids.
> +
> +   In order to ensure consistent results without relying on the ordering
> +   of pointer values we assign a uid to each gimple stmt, globally unique
> +   across all functions.
> +
> +   Normally, the stmt uids are a scratch space that each pass can freely
> +   assign its own values to.  However, in the case of LTO, the uids are
> +   used to associate call stmts with callgraph edges between the WPA phase
> +   (where the analyzer runs in LTO mode) and the LTRANS phase; if the
> +   analyzer changes them in the WPA phase, it leads to errors when
> +   streaming the code back in at LTRANS.
> +
> +   Hence this class has responsibility for tracking the original uids
> +   and restoring them once the pass is complete (in the supergraph dtor).  */
> +
> +/* Give STMT a globally unique uid, storing its original uid so it can
> +   later be restored.  */
> +
> +void
> +saved_uids::make_uid_unique (gimple *stmt)
> +{
> +  unsigned next_uid = m_old_stmt_uids.length ();
> +  unsigned old_stmt_uid = stmt->uid;
> +  stmt->uid = next_uid;
> +  m_old_stmt_uids.safe_push
> +    (std::pair<gimple *, unsigned> (stmt, old_stmt_uid));
> +}
> +
> +/* Restore the saved uids of all stmts.  */
> +
> +void
> +saved_uids::restore_uids () const
> +{
> +  unsigned i;
> +  std::pair<gimple *, unsigned> *pair;
> +  FOR_EACH_VEC_ELT (m_old_stmt_uids, i, pair)
> +    pair->first->uid = pair->second;
> +}
> +
>  /* supergraph's ctor.  Walk the callgraph, building supernodes for each
>     CFG basic block, splitting the basic blocks at callsites.  Join
>     together the supernodes with interprocedural and intraprocedural
> @@ -101,8 +141,6 @@ supergraph::supergraph (logger *logger)
>
>    /* First pass: make supernodes (and assign UIDs to the gimple stmts).  */
>    {
> -    unsigned next_uid = 0;
> -
>      /* Sort the cgraph_nodes?  */
>      cgraph_node *node;
>      FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
> @@ -127,7 +165,7 @@ supergraph::supergraph (logger *logger)
>             {
>               gimple *stmt = gsi_stmt (gpi);
>               m_stmt_to_node_t.put (stmt, node_for_stmts);
> -             stmt->uid = next_uid++;
> +             m_stmt_uids.make_uid_unique (stmt);
>             }
>
>           /* Append statements from BB to the current supernode, splitting
> @@ -139,7 +177,7 @@ supergraph::supergraph (logger *logger)
>               gimple *stmt = gsi_stmt (gsi);
>               node_for_stmts->m_stmts.safe_push (stmt);
>               m_stmt_to_node_t.put (stmt, node_for_stmts);
> -             stmt->uid = next_uid++;
> +             m_stmt_uids.make_uid_unique (stmt);
>               if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
>                 {
>                   m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
> @@ -257,6 +295,13 @@ supergraph::supergraph (logger *logger)
>    }
>  }
>
> +/* supergraph's dtor.  Reset stmt uids.  */
> +
> +supergraph::~supergraph ()
> +{
> +  m_stmt_uids.restore_uids ();
> +}
> +
>  /* Dump this graph in .dot format to PP, using DUMP_ARGS.
>     Cluster the supernodes by function, then by BB from original CFG.  */
>
> diff --git a/gcc/analyzer/supergraph.h b/gcc/analyzer/supergraph.h
> index fc4a753c5a4..b2125a35410 100644
> --- a/gcc/analyzer/supergraph.h
> +++ b/gcc/analyzer/supergraph.h
> @@ -79,6 +79,18 @@ struct supergraph_traits
>    typedef supercluster cluster_t;
>  };
>
> +/* A class to manage the setting and restoring of statement uids.  */
> +
> +class saved_uids
> +{
> +public:
> +  void make_uid_unique (gimple *stmt);
> +  void restore_uids () const;
> +
> +private:
> +  auto_vec<std::pair<gimple *, unsigned> > m_old_stmt_uids;
> +};
> +
>  /* A "supergraph" is a directed graph formed by joining together all CFGs,
>     linking them via interprocedural call and return edges.
>
> @@ -90,6 +102,7 @@ class supergraph : public digraph<supergraph_traits>
>  {
>  public:
>    supergraph (logger *logger);
> +  ~supergraph ();
>
>    supernode *get_node_for_function_entry (function *fun) const
>    {
> @@ -205,6 +218,8 @@ private:
>
>    typedef hash_map<const function *, unsigned> function_to_num_snodes_t;
>    function_to_num_snodes_t m_function_to_num_snodes;
> +
> +  saved_uids m_stmt_uids;
>  };
>
>  /* A node within a supergraph.  */
> diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
> index 4b3d7d7452e..417f37169ff 100644
> --- a/gcc/doc/gimple.texi
> +++ b/gcc/doc/gimple.texi
> @@ -173,6 +173,12 @@ This is an unsigned integer used by passes that want to assign
>  IDs to every statement. These IDs must be assigned and used by
>  each pass.
>
> +Normally the uid field is a scratch space for use by passes that
> +need one.  However, in LTO mode the UIDs are used to associate call
> +statements with callgraph edges between the WPA phase and the LTRANS
> +phase; hence if they are changed by an IPA pass they should be restored
> +at the end of that pass.
> +
>  @item @code{location}
>  This is a @code{location_t} identifier to specify source code
>  location for this statement. It is inherited from the front
> diff --git a/gcc/gimple.h b/gcc/gimple.h
> index 3ec86f5f082..7d01bd4dd6a 100644
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -260,7 +260,11 @@ struct GTY((desc ("gimple_statement_structure (&%h)"), tag ("GSS_BASE"),
>
>    /* UID of this statement.  This is used by passes that want to
>       assign IDs to statements.  It must be assigned and used by each
> -     pass.  By default it should be assumed to contain garbage.  */
> +     pass.  By default it should be assumed to contain garbage.
> +     However, in LTO mode the UIDs are used to associate call stmts with
> +     callgraph edges between the WPA phase and the LTRANS phase; hence
> +     if they are changed by an IPA pass they should be restored at the
> +     end of that pass.  */
>    unsigned uid;
>
>    /* [ WORD 2 ]
> @@ -2043,7 +2047,12 @@ gimple_plf (gimple *stmt, enum plf_mask plf)
>     Please note that this UID property is supposed to be undefined at
>     pass boundaries.  This means that a given pass should not assume it
>     contains any useful value when the pass starts and thus can set it
> -   to any value it sees fit.  */
> +   to any value it sees fit.
> +
> +   That said, in LTO mode the UIDs are used to associate call stmts with
> +   callgraph edges between the WPA phase and the LTRANS phase; hence
> +   if they are changed by an IPA pass they should be restored at the
> +   end of that pass.  */
>
>  static inline void
>  gimple_set_uid (gimple *g, unsigned uid)
> diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> new file mode 100644
> index 00000000000..2bbf37b0e6e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> @@ -0,0 +1,8 @@
> +/* { dg-do link } */
> +/* { dg-require-effective-target lto } */
> +/* { dg-additional-options "-Os -flto" } */
> +/* { dg-additional-sources pr98599-b.c } */
> +
> +int b(int x);
> +int a() { b(5); }
> +int main() { a(); }
> diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> new file mode 100644
> index 00000000000..cfdeb3bf134
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> @@ -0,0 +1 @@
> +int b(int x) { return x; }
> --
> 2.26.2
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] gimple UIDs, LTO and -fanalyzer [PR98599]
  2021-01-14  7:30 ` Richard Biener
@ 2021-01-14 14:00   ` Jan Hubicka
  2021-01-21 17:15     ` David Malcolm
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Hubicka @ 2021-01-14 14:00 UTC (permalink / raw)
  To: Richard Biener; +Cc: David Malcolm, GCC Patches

> On Wed, Jan 13, 2021 at 11:04 PM David Malcolm via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > gimple.h has this comment for gimple's uid field:
> >
> >   /* UID of this statement.  This is used by passes that want to
> >      assign IDs to statements.  It must be assigned and used by each
> >      pass.  By default it should be assumed to contain garbage.  */
> >   unsigned uid;
> >
> > and gimple_set_uid has:
> >
> >    Please note that this UID property is supposed to be undefined at
> >    pass boundaries.  This means that a given pass should not assume it
> >    contains any useful value when the pass starts and thus can set it
> >    to any value it sees fit.
> >
> > which suggests that any pass can use the uid field as an arbitrary
> > scratch space.
> >
> > PR analyzer/98599 reports a case where this error occurs in LTO mode:
> >   fatal error: Cgraph edge statement index out of range
> > on certain inputs with -fanalyzer.
> >
> > The error occurs in the LTRANS phase after -fanalyzer runs in the
> > WPA phase.  The analyzer pass writes to the uid fields of all stmts.
> >
> > The error occurs when LTRANS is streaming callgraph edges back in.
> > If I'm reading things correctly, the LTO format uses stmt uids to
> > associate call stmts with callgraph edges between WPA and LTRANS.
> > For example, in lto-cgraph.c, lto_output_edge writes out the
> > gimple_uid, and input_edge reads it back in.
> >
> > Hence IPA passes that touch the uids in WPA need to restore them,
> > or the stream-in at LTRANS will fail.
> >
> > Is it intended that the LTO machinery relies on the value of the uid
> > field being preserved during WPA (or, at least, needs to be saved and
> > restored by passes that touch it)?
> 
> I belive this is solely at the cgraph stream out to stream in boundary but
> this may be a blurred area since while we materialize the whole cgraph
> at once the function bodies are streamed in on demand.
> 
> Honza can probably clarify things.

Well, the uids are used to associate cgraph edges with statements.  At
WPA stage you do not have function bodies and thus uids serves role of
pointers to the statement.  If you load the body in (via get_body) the
uids are replaced by pointers and when you stream out uids are
recomputed again.

When do you touch the uids? At WPA time or from small IPA pass in
ltrans?

hozna
> 
> Note LTO uses this exactly because of this comment to avoid allocating
> extra memory for an 'index' but it could of course leave gimple_uid alone
> at some extra expense (eventually paid for in generic cgraph data structures
> and thus for not only the streaming time).
> 
> > On the assumption that this is the case, this patch updates the comments
> > in gimple.h referring to passes being able to set uid to any value to
> > note the caveat for IPA passes, and it updates the analyzer to save
> > and restore the UIDs, fixing the error.
> >
> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > OK for master?
> 
> The analyzer bits are OK, let's see how Honza can clarify the situation.
> 
> Thanks,
> Richard.
> 
> > gcc/analyzer/ChangeLog:
> >         PR analyzer/98599
> >         * supergraph.cc (saved_uids::make_uid_unique): New.
> >         (saved_uids::restore_uids): New.
> >         (supergraph::supergraph): Replace assignments to stmt->uid with
> >         calls to m_stmt_uids.make_uid_unique.
> >         (supergraph::~supergraph): New.
> >         * supergraph.h (class saved_uids): New.
> >         (supergraph::~supergraph): New decl.
> >         (supergraph::m_stmt_uids): New field.
> >
> > gcc/ChangeLog:
> >         PR analyzer/98599
> >         * doc/gimple.texi: Document that UIDs must not change during IPA
> >         passes.
> >         * gimple.h (gimple::uid): Likewise.
> >         (gimple_set_uid): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >         PR analyzer/98599
> >         * gcc.dg/analyzer/pr98599-a.c: New test.
> >         * gcc.dg/analyzer/pr98599-b.c: New test.
> > ---
> >  gcc/analyzer/supergraph.cc                | 53 +++++++++++++++++++++--
> >  gcc/analyzer/supergraph.h                 | 15 +++++++
> >  gcc/doc/gimple.texi                       |  6 +++
> >  gcc/gimple.h                              | 13 +++++-
> >  gcc/testsuite/gcc.dg/analyzer/pr98599-a.c |  8 ++++
> >  gcc/testsuite/gcc.dg/analyzer/pr98599-b.c |  1 +
> >  6 files changed, 90 insertions(+), 6 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> >
> > diff --git a/gcc/analyzer/supergraph.cc b/gcc/analyzer/supergraph.cc
> > index 419f6424f76..40acfbd16a8 100644
> > --- a/gcc/analyzer/supergraph.cc
> > +++ b/gcc/analyzer/supergraph.cc
> > @@ -87,6 +87,46 @@ supergraph_call_edge (function *fun, gimple *stmt)
> >    return edge;
> >  }
> >
> > +/* class saved_uids.
> > +
> > +   In order to ensure consistent results without relying on the ordering
> > +   of pointer values we assign a uid to each gimple stmt, globally unique
> > +   across all functions.
> > +
> > +   Normally, the stmt uids are a scratch space that each pass can freely
> > +   assign its own values to.  However, in the case of LTO, the uids are
> > +   used to associate call stmts with callgraph edges between the WPA phase
> > +   (where the analyzer runs in LTO mode) and the LTRANS phase; if the
> > +   analyzer changes them in the WPA phase, it leads to errors when
> > +   streaming the code back in at LTRANS.
> > +
> > +   Hence this class has responsibility for tracking the original uids
> > +   and restoring them once the pass is complete (in the supergraph dtor).  */
> > +
> > +/* Give STMT a globally unique uid, storing its original uid so it can
> > +   later be restored.  */
> > +
> > +void
> > +saved_uids::make_uid_unique (gimple *stmt)
> > +{
> > +  unsigned next_uid = m_old_stmt_uids.length ();
> > +  unsigned old_stmt_uid = stmt->uid;
> > +  stmt->uid = next_uid;
> > +  m_old_stmt_uids.safe_push
> > +    (std::pair<gimple *, unsigned> (stmt, old_stmt_uid));
> > +}
> > +
> > +/* Restore the saved uids of all stmts.  */
> > +
> > +void
> > +saved_uids::restore_uids () const
> > +{
> > +  unsigned i;
> > +  std::pair<gimple *, unsigned> *pair;
> > +  FOR_EACH_VEC_ELT (m_old_stmt_uids, i, pair)
> > +    pair->first->uid = pair->second;
> > +}
> > +
> >  /* supergraph's ctor.  Walk the callgraph, building supernodes for each
> >     CFG basic block, splitting the basic blocks at callsites.  Join
> >     together the supernodes with interprocedural and intraprocedural
> > @@ -101,8 +141,6 @@ supergraph::supergraph (logger *logger)
> >
> >    /* First pass: make supernodes (and assign UIDs to the gimple stmts).  */
> >    {
> > -    unsigned next_uid = 0;
> > -
> >      /* Sort the cgraph_nodes?  */
> >      cgraph_node *node;
> >      FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
> > @@ -127,7 +165,7 @@ supergraph::supergraph (logger *logger)
> >             {
> >               gimple *stmt = gsi_stmt (gpi);
> >               m_stmt_to_node_t.put (stmt, node_for_stmts);
> > -             stmt->uid = next_uid++;
> > +             m_stmt_uids.make_uid_unique (stmt);
> >             }
> >
> >           /* Append statements from BB to the current supernode, splitting
> > @@ -139,7 +177,7 @@ supergraph::supergraph (logger *logger)
> >               gimple *stmt = gsi_stmt (gsi);
> >               node_for_stmts->m_stmts.safe_push (stmt);
> >               m_stmt_to_node_t.put (stmt, node_for_stmts);
> > -             stmt->uid = next_uid++;
> > +             m_stmt_uids.make_uid_unique (stmt);
> >               if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
> >                 {
> >                   m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
> > @@ -257,6 +295,13 @@ supergraph::supergraph (logger *logger)
> >    }
> >  }
> >
> > +/* supergraph's dtor.  Reset stmt uids.  */
> > +
> > +supergraph::~supergraph ()
> > +{
> > +  m_stmt_uids.restore_uids ();
> > +}
> > +
> >  /* Dump this graph in .dot format to PP, using DUMP_ARGS.
> >     Cluster the supernodes by function, then by BB from original CFG.  */
> >
> > diff --git a/gcc/analyzer/supergraph.h b/gcc/analyzer/supergraph.h
> > index fc4a753c5a4..b2125a35410 100644
> > --- a/gcc/analyzer/supergraph.h
> > +++ b/gcc/analyzer/supergraph.h
> > @@ -79,6 +79,18 @@ struct supergraph_traits
> >    typedef supercluster cluster_t;
> >  };
> >
> > +/* A class to manage the setting and restoring of statement uids.  */
> > +
> > +class saved_uids
> > +{
> > +public:
> > +  void make_uid_unique (gimple *stmt);
> > +  void restore_uids () const;
> > +
> > +private:
> > +  auto_vec<std::pair<gimple *, unsigned> > m_old_stmt_uids;
> > +};
> > +
> >  /* A "supergraph" is a directed graph formed by joining together all CFGs,
> >     linking them via interprocedural call and return edges.
> >
> > @@ -90,6 +102,7 @@ class supergraph : public digraph<supergraph_traits>
> >  {
> >  public:
> >    supergraph (logger *logger);
> > +  ~supergraph ();
> >
> >    supernode *get_node_for_function_entry (function *fun) const
> >    {
> > @@ -205,6 +218,8 @@ private:
> >
> >    typedef hash_map<const function *, unsigned> function_to_num_snodes_t;
> >    function_to_num_snodes_t m_function_to_num_snodes;
> > +
> > +  saved_uids m_stmt_uids;
> >  };
> >
> >  /* A node within a supergraph.  */
> > diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
> > index 4b3d7d7452e..417f37169ff 100644
> > --- a/gcc/doc/gimple.texi
> > +++ b/gcc/doc/gimple.texi
> > @@ -173,6 +173,12 @@ This is an unsigned integer used by passes that want to assign
> >  IDs to every statement. These IDs must be assigned and used by
> >  each pass.
> >
> > +Normally the uid field is a scratch space for use by passes that
> > +need one.  However, in LTO mode the UIDs are used to associate call
> > +statements with callgraph edges between the WPA phase and the LTRANS
> > +phase; hence if they are changed by an IPA pass they should be restored
> > +at the end of that pass.
> > +
> >  @item @code{location}
> >  This is a @code{location_t} identifier to specify source code
> >  location for this statement. It is inherited from the front
> > diff --git a/gcc/gimple.h b/gcc/gimple.h
> > index 3ec86f5f082..7d01bd4dd6a 100644
> > --- a/gcc/gimple.h
> > +++ b/gcc/gimple.h
> > @@ -260,7 +260,11 @@ struct GTY((desc ("gimple_statement_structure (&%h)"), tag ("GSS_BASE"),
> >
> >    /* UID of this statement.  This is used by passes that want to
> >       assign IDs to statements.  It must be assigned and used by each
> > -     pass.  By default it should be assumed to contain garbage.  */
> > +     pass.  By default it should be assumed to contain garbage.
> > +     However, in LTO mode the UIDs are used to associate call stmts with
> > +     callgraph edges between the WPA phase and the LTRANS phase; hence
> > +     if they are changed by an IPA pass they should be restored at the
> > +     end of that pass.  */
> >    unsigned uid;
> >
> >    /* [ WORD 2 ]
> > @@ -2043,7 +2047,12 @@ gimple_plf (gimple *stmt, enum plf_mask plf)
> >     Please note that this UID property is supposed to be undefined at
> >     pass boundaries.  This means that a given pass should not assume it
> >     contains any useful value when the pass starts and thus can set it
> > -   to any value it sees fit.  */
> > +   to any value it sees fit.
> > +
> > +   That said, in LTO mode the UIDs are used to associate call stmts with
> > +   callgraph edges between the WPA phase and the LTRANS phase; hence
> > +   if they are changed by an IPA pass they should be restored at the
> > +   end of that pass.  */
> >
> >  static inline void
> >  gimple_set_uid (gimple *g, unsigned uid)
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > new file mode 100644
> > index 00000000000..2bbf37b0e6e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > @@ -0,0 +1,8 @@
> > +/* { dg-do link } */
> > +/* { dg-require-effective-target lto } */
> > +/* { dg-additional-options "-Os -flto" } */
> > +/* { dg-additional-sources pr98599-b.c } */
> > +
> > +int b(int x);
> > +int a() { b(5); }
> > +int main() { a(); }
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > new file mode 100644
> > index 00000000000..cfdeb3bf134
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > @@ -0,0 +1 @@
> > +int b(int x) { return x; }
> > --
> > 2.26.2
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] gimple UIDs, LTO and -fanalyzer [PR98599]
  2021-01-14 14:00   ` Jan Hubicka
@ 2021-01-21 17:15     ` David Malcolm
  2021-01-21 19:09       ` Jan Hubicka
  0 siblings, 1 reply; 14+ messages in thread
From: David Malcolm @ 2021-01-21 17:15 UTC (permalink / raw)
  To: Jan Hubicka, Richard Biener; +Cc: GCC Patches

On Thu, 2021-01-14 at 15:00 +0100, Jan Hubicka wrote:
> > On Wed, Jan 13, 2021 at 11:04 PM David Malcolm via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > > gimple.h has this comment for gimple's uid field:
> > > 
> > >   /* UID of this statement.  This is used by passes that want to
> > >      assign IDs to statements.  It must be assigned and used by
> > > each
> > >      pass.  By default it should be assumed to contain
> > > garbage.  */
> > >   unsigned uid;
> > > 
> > > and gimple_set_uid has:
> > > 
> > >    Please note that this UID property is supposed to be undefined
> > > at
> > >    pass boundaries.  This means that a given pass should not
> > > assume it
> > >    contains any useful value when the pass starts and thus can
> > > set it
> > >    to any value it sees fit.
> > > 
> > > which suggests that any pass can use the uid field as an
> > > arbitrary
> > > scratch space.
> > > 
> > > PR analyzer/98599 reports a case where this error occurs in LTO
> > > mode:
> > >   fatal error: Cgraph edge statement index out of range
> > > on certain inputs with -fanalyzer.
> > > 
> > > The error occurs in the LTRANS phase after -fanalyzer runs in the
> > > WPA phase.  The analyzer pass writes to the uid fields of all
> > > stmts.
> > > 
> > > The error occurs when LTRANS is streaming callgraph edges back
> > > in.
> > > If I'm reading things correctly, the LTO format uses stmt uids to
> > > associate call stmts with callgraph edges between WPA and LTRANS.
> > > For example, in lto-cgraph.c, lto_output_edge writes out the
> > > gimple_uid, and input_edge reads it back in.
> > > 
> > > Hence IPA passes that touch the uids in WPA need to restore them,
> > > or the stream-in at LTRANS will fail.
> > > 
> > > Is it intended that the LTO machinery relies on the value of the
> > > uid
> > > field being preserved during WPA (or, at least, needs to be saved
> > > and
> > > restored by passes that touch it)?
> > 
> > I belive this is solely at the cgraph stream out to stream in
> > boundary but
> > this may be a blurred area since while we materialize the whole
> > cgraph
> > at once the function bodies are streamed in on demand.
> > 
> > Honza can probably clarify things.
> 
> Well, the uids are used to associate cgraph edges with
> statements.  At
> WPA stage you do not have function bodies and thus uids serves role
> of
> pointers to the statement.  If you load the body in (via get_body)
> the
> uids are replaced by pointers and when you stream out uids are
> recomputed again.
> 
> When do you touch the uids? At WPA time or from small IPA pass in
> ltrans?

The analyzer is here in passes.def:
  INSERT_PASSES_AFTER (all_regular_ipa_passes)
  NEXT_PASS (pass_analyzer);

and so in LTO runs as the first regular IPA pass at WPA time,
when do_whole_program_analysis calls:
  execute_ipa_pass_list (g->get_passes ()->all_regular_ipa_passes);

FWIW I hope to eventually have a way to summarize function bodies in
the analyzer, but I don't yet, so I'm currently brute forcing things by
loading all function bodies at the start of the analyzer (when
-fanalyzer is enabled).

I wonder if that's messing things up somehow?

Does the stream-out from WPA make any assumptions about the stmt uids?
For example, 
  #define STMT_UID_NOT_IN_RANGE(uid) \
    (gimple_stmt_max_uid (fn) < uid || uid == 0)
seems to assume that the UIDs are per-function ranges from
  [0-gimple_stmt_max_uid (fn)]
which isn't the case for the uids set by the analyzer.  Maybe that's
the issue here?

Sorry for not being more familiar with the IPA/LTO code
Dave


> hozna
> > Note LTO uses this exactly because of this comment to avoid
> > allocating
> > extra memory for an 'index' but it could of course leave gimple_uid
> > alone
> > at some extra expense (eventually paid for in generic cgraph data
> > structures
> > and thus for not only the streaming time).
> > 
> > > On the assumption that this is the case, this patch updates the
> > > comments
> > > in gimple.h referring to passes being able to set uid to any
> > > value to
> > > note the caveat for IPA passes, and it updates the analyzer to
> > > save
> > > and restore the UIDs, fixing the error.
> > > 
> > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > OK for master?
> > 
> > The analyzer bits are OK, let's see how Honza can clarify the
> > situation.
> > 
> > Thanks,
> > Richard.
> > 
> > > gcc/analyzer/ChangeLog:
> > >         PR analyzer/98599
> > >         * supergraph.cc (saved_uids::make_uid_unique): New.
> > >         (saved_uids::restore_uids): New.
> > >         (supergraph::supergraph): Replace assignments to stmt-
> > > >uid with
> > >         calls to m_stmt_uids.make_uid_unique.
> > >         (supergraph::~supergraph): New.
> > >         * supergraph.h (class saved_uids): New.
> > >         (supergraph::~supergraph): New decl.
> > >         (supergraph::m_stmt_uids): New field.
> > > 
> > > gcc/ChangeLog:
> > >         PR analyzer/98599
> > >         * doc/gimple.texi: Document that UIDs must not change
> > > during IPA
> > >         passes.
> > >         * gimple.h (gimple::uid): Likewise.
> > >         (gimple_set_uid): Likewise.
> > > 
> > > gcc/testsuite/ChangeLog:
> > >         PR analyzer/98599
> > >         * gcc.dg/analyzer/pr98599-a.c: New test.
> > >         * gcc.dg/analyzer/pr98599-b.c: New test.
> > > ---
> > >  gcc/analyzer/supergraph.cc                | 53
> > > +++++++++++++++++++++--
> > >  gcc/analyzer/supergraph.h                 | 15 +++++++
> > >  gcc/doc/gimple.texi                       |  6 +++
> > >  gcc/gimple.h                              | 13 +++++-
> > >  gcc/testsuite/gcc.dg/analyzer/pr98599-a.c |  8 ++++
> > >  gcc/testsuite/gcc.dg/analyzer/pr98599-b.c |  1 +
> > >  6 files changed, 90 insertions(+), 6 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > 
> > > diff --git a/gcc/analyzer/supergraph.cc
> > > b/gcc/analyzer/supergraph.cc
> > > index 419f6424f76..40acfbd16a8 100644
> > > --- a/gcc/analyzer/supergraph.cc
> > > +++ b/gcc/analyzer/supergraph.cc
> > > @@ -87,6 +87,46 @@ supergraph_call_edge (function *fun, gimple
> > > *stmt)
> > >    return edge;
> > >  }
> > > 
> > > +/* class saved_uids.
> > > +
> > > +   In order to ensure consistent results without relying on the
> > > ordering
> > > +   of pointer values we assign a uid to each gimple stmt,
> > > globally unique
> > > +   across all functions.
> > > +
> > > +   Normally, the stmt uids are a scratch space that each pass
> > > can freely
> > > +   assign its own values to.  However, in the case of LTO, the
> > > uids are
> > > +   used to associate call stmts with callgraph edges between the
> > > WPA phase
> > > +   (where the analyzer runs in LTO mode) and the LTRANS phase;
> > > if the
> > > +   analyzer changes them in the WPA phase, it leads to errors
> > > when
> > > +   streaming the code back in at LTRANS.
> > > +
> > > +   Hence this class has responsibility for tracking the original
> > > uids
> > > +   and restoring them once the pass is complete (in the
> > > supergraph dtor).  */
> > > +
> > > +/* Give STMT a globally unique uid, storing its original uid so
> > > it can
> > > +   later be restored.  */
> > > +
> > > +void
> > > +saved_uids::make_uid_unique (gimple *stmt)
> > > +{
> > > +  unsigned next_uid = m_old_stmt_uids.length ();
> > > +  unsigned old_stmt_uid = stmt->uid;
> > > +  stmt->uid = next_uid;
> > > +  m_old_stmt_uids.safe_push
> > > +    (std::pair<gimple *, unsigned> (stmt, old_stmt_uid));
> > > +}
> > > +
> > > +/* Restore the saved uids of all stmts.  */
> > > +
> > > +void
> > > +saved_uids::restore_uids () const
> > > +{
> > > +  unsigned i;
> > > +  std::pair<gimple *, unsigned> *pair;
> > > +  FOR_EACH_VEC_ELT (m_old_stmt_uids, i, pair)
> > > +    pair->first->uid = pair->second;
> > > +}
> > > +
> > >  /* supergraph's ctor.  Walk the callgraph, building supernodes
> > > for each
> > >     CFG basic block, splitting the basic blocks at
> > > callsites.  Join
> > >     together the supernodes with interprocedural and
> > > intraprocedural
> > > @@ -101,8 +141,6 @@ supergraph::supergraph (logger *logger)
> > > 
> > >    /* First pass: make supernodes (and assign UIDs to the gimple
> > > stmts).  */
> > >    {
> > > -    unsigned next_uid = 0;
> > > -
> > >      /* Sort the cgraph_nodes?  */
> > >      cgraph_node *node;
> > >      FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
> > > @@ -127,7 +165,7 @@ supergraph::supergraph (logger *logger)
> > >             {
> > >               gimple *stmt = gsi_stmt (gpi);
> > >               m_stmt_to_node_t.put (stmt, node_for_stmts);
> > > -             stmt->uid = next_uid++;
> > > +             m_stmt_uids.make_uid_unique (stmt);
> > >             }
> > > 
> > >           /* Append statements from BB to the current supernode,
> > > splitting
> > > @@ -139,7 +177,7 @@ supergraph::supergraph (logger *logger)
> > >               gimple *stmt = gsi_stmt (gsi);
> > >               node_for_stmts->m_stmts.safe_push (stmt);
> > >               m_stmt_to_node_t.put (stmt, node_for_stmts);
> > > -             stmt->uid = next_uid++;
> > > +             m_stmt_uids.make_uid_unique (stmt);
> > >               if (cgraph_edge *edge = supergraph_call_edge (fun,
> > > stmt))
> > >                 {
> > >                   m_cgraph_edge_to_caller_prev_node.put(edge,
> > > node_for_stmts);
> > > @@ -257,6 +295,13 @@ supergraph::supergraph (logger *logger)
> > >    }
> > >  }
> > > 
> > > +/* supergraph's dtor.  Reset stmt uids.  */
> > > +
> > > +supergraph::~supergraph ()
> > > +{
> > > +  m_stmt_uids.restore_uids ();
> > > +}
> > > +
> > >  /* Dump this graph in .dot format to PP, using DUMP_ARGS.
> > >     Cluster the supernodes by function, then by BB from original
> > > CFG.  */
> > > 
> > > diff --git a/gcc/analyzer/supergraph.h
> > > b/gcc/analyzer/supergraph.h
> > > index fc4a753c5a4..b2125a35410 100644
> > > --- a/gcc/analyzer/supergraph.h
> > > +++ b/gcc/analyzer/supergraph.h
> > > @@ -79,6 +79,18 @@ struct supergraph_traits
> > >    typedef supercluster cluster_t;
> > >  };
> > > 
> > > +/* A class to manage the setting and restoring of statement
> > > uids.  */
> > > +
> > > +class saved_uids
> > > +{
> > > +public:
> > > +  void make_uid_unique (gimple *stmt);
> > > +  void restore_uids () const;
> > > +
> > > +private:
> > > +  auto_vec<std::pair<gimple *, unsigned> > m_old_stmt_uids;
> > > +};
> > > +
> > >  /* A "supergraph" is a directed graph formed by joining together
> > > all CFGs,
> > >     linking them via interprocedural call and return edges.
> > > 
> > > @@ -90,6 +102,7 @@ class supergraph : public
> > > digraph<supergraph_traits>
> > >  {
> > >  public:
> > >    supergraph (logger *logger);
> > > +  ~supergraph ();
> > > 
> > >    supernode *get_node_for_function_entry (function *fun) const
> > >    {
> > > @@ -205,6 +218,8 @@ private:
> > > 
> > >    typedef hash_map<const function *, unsigned>
> > > function_to_num_snodes_t;
> > >    function_to_num_snodes_t m_function_to_num_snodes;
> > > +
> > > +  saved_uids m_stmt_uids;
> > >  };
> > > 
> > >  /* A node within a supergraph.  */
> > > diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
> > > index 4b3d7d7452e..417f37169ff 100644
> > > --- a/gcc/doc/gimple.texi
> > > +++ b/gcc/doc/gimple.texi
> > > @@ -173,6 +173,12 @@ This is an unsigned integer used by passes
> > > that want to assign
> > >  IDs to every statement. These IDs must be assigned and used by
> > >  each pass.
> > > 
> > > +Normally the uid field is a scratch space for use by passes that
> > > +need one.  However, in LTO mode the UIDs are used to associate
> > > call
> > > +statements with callgraph edges between the WPA phase and the
> > > LTRANS
> > > +phase; hence if they are changed by an IPA pass they should be
> > > restored
> > > +at the end of that pass.
> > > +
> > >  @item @code{location}
> > >  This is a @code{location_t} identifier to specify source code
> > >  location for this statement. It is inherited from the front
> > > diff --git a/gcc/gimple.h b/gcc/gimple.h
> > > index 3ec86f5f082..7d01bd4dd6a 100644
> > > --- a/gcc/gimple.h
> > > +++ b/gcc/gimple.h
> > > @@ -260,7 +260,11 @@ struct GTY((desc
> > > ("gimple_statement_structure (&%h)"), tag ("GSS_BASE"),
> > > 
> > >    /* UID of this statement.  This is used by passes that want to
> > >       assign IDs to statements.  It must be assigned and used by
> > > each
> > > -     pass.  By default it should be assumed to contain
> > > garbage.  */
> > > +     pass.  By default it should be assumed to contain garbage.
> > > +     However, in LTO mode the UIDs are used to associate call
> > > stmts with
> > > +     callgraph edges between the WPA phase and the LTRANS phase;
> > > hence
> > > +     if they are changed by an IPA pass they should be restored
> > > at the
> > > +     end of that pass.  */
> > >    unsigned uid;
> > > 
> > >    /* [ WORD 2 ]
> > > @@ -2043,7 +2047,12 @@ gimple_plf (gimple *stmt, enum plf_mask
> > > plf)
> > >     Please note that this UID property is supposed to be
> > > undefined at
> > >     pass boundaries.  This means that a given pass should not
> > > assume it
> > >     contains any useful value when the pass starts and thus can
> > > set it
> > > -   to any value it sees fit.  */
> > > +   to any value it sees fit.
> > > +
> > > +   That said, in LTO mode the UIDs are used to associate call
> > > stmts with
> > > +   callgraph edges between the WPA phase and the LTRANS phase;
> > > hence
> > > +   if they are changed by an IPA pass they should be restored at
> > > the
> > > +   end of that pass.  */
> > > 
> > >  static inline void
> > >  gimple_set_uid (gimple *g, unsigned uid)
> > > diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > new file mode 100644
> > > index 00000000000..2bbf37b0e6e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > @@ -0,0 +1,8 @@
> > > +/* { dg-do link } */
> > > +/* { dg-require-effective-target lto } */
> > > +/* { dg-additional-options "-Os -flto" } */
> > > +/* { dg-additional-sources pr98599-b.c } */
> > > +
> > > +int b(int x);
> > > +int a() { b(5); }
> > > +int main() { a(); }
> > > diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > new file mode 100644
> > > index 00000000000..cfdeb3bf134
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > @@ -0,0 +1 @@
> > > +int b(int x) { return x; }
> > > --
> > > 2.26.2
> > > 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] gimple UIDs, LTO and -fanalyzer [PR98599]
  2021-01-21 17:15     ` David Malcolm
@ 2021-01-21 19:09       ` Jan Hubicka
  2021-01-21 19:35         ` David Malcolm
  2021-01-22  3:07         ` David Malcolm
  0 siblings, 2 replies; 14+ messages in thread
From: Jan Hubicka @ 2021-01-21 19:09 UTC (permalink / raw)
  To: David Malcolm; +Cc: Richard Biener, GCC Patches

> On Thu, 2021-01-14 at 15:00 +0100, Jan Hubicka wrote:
> > > On Wed, Jan 13, 2021 at 11:04 PM David Malcolm via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > > gimple.h has this comment for gimple's uid field:
> > > > 
> > > >   /* UID of this statement.  This is used by passes that want to
> > > >      assign IDs to statements.  It must be assigned and used by
> > > > each
> > > >      pass.  By default it should be assumed to contain
> > > > garbage.  */
> > > >   unsigned uid;
> > > > 
> > > > and gimple_set_uid has:
> > > > 
> > > >    Please note that this UID property is supposed to be undefined
> > > > at
> > > >    pass boundaries.  This means that a given pass should not
> > > > assume it
> > > >    contains any useful value when the pass starts and thus can
> > > > set it
> > > >    to any value it sees fit.
> > > > 
> > > > which suggests that any pass can use the uid field as an
> > > > arbitrary
> > > > scratch space.
> > > > 
> > > > PR analyzer/98599 reports a case where this error occurs in LTO
> > > > mode:
> > > >   fatal error: Cgraph edge statement index out of range
> > > > on certain inputs with -fanalyzer.
> > > > 
> > > > The error occurs in the LTRANS phase after -fanalyzer runs in the
> > > > WPA phase.  The analyzer pass writes to the uid fields of all
> > > > stmts.
> > > > 
> > > > The error occurs when LTRANS is streaming callgraph edges back
> > > > in.
> > > > If I'm reading things correctly, the LTO format uses stmt uids to
> > > > associate call stmts with callgraph edges between WPA and LTRANS.
> > > > For example, in lto-cgraph.c, lto_output_edge writes out the
> > > > gimple_uid, and input_edge reads it back in.
> > > > 
> > > > Hence IPA passes that touch the uids in WPA need to restore them,
> > > > or the stream-in at LTRANS will fail.
> > > > 
> > > > Is it intended that the LTO machinery relies on the value of the
> > > > uid
> > > > field being preserved during WPA (or, at least, needs to be saved
> > > > and
> > > > restored by passes that touch it)?
> > > 
> > > I belive this is solely at the cgraph stream out to stream in
> > > boundary but
> > > this may be a blurred area since while we materialize the whole
> > > cgraph
> > > at once the function bodies are streamed in on demand.
> > > 
> > > Honza can probably clarify things.
> > 
> > Well, the uids are used to associate cgraph edges with
> > statements.  At
> > WPA stage you do not have function bodies and thus uids serves role
> > of
> > pointers to the statement.  If you load the body in (via get_body)
> > the
> > uids are replaced by pointers and when you stream out uids are
> > recomputed again.
> > 
> > When do you touch the uids? At WPA time or from small IPA pass in
> > ltrans?
> 
> The analyzer is here in passes.def:
>   INSERT_PASSES_AFTER (all_regular_ipa_passes)
>   NEXT_PASS (pass_analyzer);
> 
> and so in LTO runs as the first regular IPA pass at WPA time,
> when do_whole_program_analysis calls:
>   execute_ipa_pass_list (g->get_passes ()->all_regular_ipa_passes);
> 
> FWIW I hope to eventually have a way to summarize function bodies in
> the analyzer, but I don't yet, so I'm currently brute forcing things by
> loading all function bodies at the start of the analyzer (when
> -fanalyzer is enabled).
> 
> I wonder if that's messing things up somehow?

Actually I think it should work.  If you do get_body or
get_untransformed_body (that will be equal at that time) you ought to
get ids in symtab datastructure rewritten to pointers and at stream out
time we should assign new ids...
> 
> Does the stream-out from WPA make any assumptions about the stmt uids?
> For example, 
>   #define STMT_UID_NOT_IN_RANGE(uid) \
>     (gimple_stmt_max_uid (fn) < uid || uid == 0)
> seems to assume that the UIDs are per-function ranges from
>   [0-gimple_stmt_max_uid (fn)]
> which isn't the case for the uids set by the analyzer.  Maybe that's
> the issue here?
> 
> Sorry for not being more familiar with the IPA/LTO code

There is lto_prepare_function_for_streaming which assigns uids to be
incremental.   So I guess problem is that it is not called at WPA time
if function is in memory (since at moment we do not really modify bodies
at WPA time, however we do stream them in sometimes to icf compare them
or to update profile).

So i guess fix would be to arrange lto_prepare_function_for_streaming to
be called on all functions with body defined before WPA stream-out?

Honza
> Dave
> 
> 
> > hozna
> > > Note LTO uses this exactly because of this comment to avoid
> > > allocating
> > > extra memory for an 'index' but it could of course leave gimple_uid
> > > alone
> > > at some extra expense (eventually paid for in generic cgraph data
> > > structures
> > > and thus for not only the streaming time).
> > > 
> > > > On the assumption that this is the case, this patch updates the
> > > > comments
> > > > in gimple.h referring to passes being able to set uid to any
> > > > value to
> > > > note the caveat for IPA passes, and it updates the analyzer to
> > > > save
> > > > and restore the UIDs, fixing the error.
> > > > 
> > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > > OK for master?
> > > 
> > > The analyzer bits are OK, let's see how Honza can clarify the
> > > situation.
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > > > gcc/analyzer/ChangeLog:
> > > >         PR analyzer/98599
> > > >         * supergraph.cc (saved_uids::make_uid_unique): New.
> > > >         (saved_uids::restore_uids): New.
> > > >         (supergraph::supergraph): Replace assignments to stmt-
> > > > >uid with
> > > >         calls to m_stmt_uids.make_uid_unique.
> > > >         (supergraph::~supergraph): New.
> > > >         * supergraph.h (class saved_uids): New.
> > > >         (supergraph::~supergraph): New decl.
> > > >         (supergraph::m_stmt_uids): New field.
> > > > 
> > > > gcc/ChangeLog:
> > > >         PR analyzer/98599
> > > >         * doc/gimple.texi: Document that UIDs must not change
> > > > during IPA
> > > >         passes.
> > > >         * gimple.h (gimple::uid): Likewise.
> > > >         (gimple_set_uid): Likewise.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > >         PR analyzer/98599
> > > >         * gcc.dg/analyzer/pr98599-a.c: New test.
> > > >         * gcc.dg/analyzer/pr98599-b.c: New test.
> > > > ---
> > > >  gcc/analyzer/supergraph.cc                | 53
> > > > +++++++++++++++++++++--
> > > >  gcc/analyzer/supergraph.h                 | 15 +++++++
> > > >  gcc/doc/gimple.texi                       |  6 +++
> > > >  gcc/gimple.h                              | 13 +++++-
> > > >  gcc/testsuite/gcc.dg/analyzer/pr98599-a.c |  8 ++++
> > > >  gcc/testsuite/gcc.dg/analyzer/pr98599-b.c |  1 +
> > > >  6 files changed, 90 insertions(+), 6 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > > 
> > > > diff --git a/gcc/analyzer/supergraph.cc
> > > > b/gcc/analyzer/supergraph.cc
> > > > index 419f6424f76..40acfbd16a8 100644
> > > > --- a/gcc/analyzer/supergraph.cc
> > > > +++ b/gcc/analyzer/supergraph.cc
> > > > @@ -87,6 +87,46 @@ supergraph_call_edge (function *fun, gimple
> > > > *stmt)
> > > >    return edge;
> > > >  }
> > > > 
> > > > +/* class saved_uids.
> > > > +
> > > > +   In order to ensure consistent results without relying on the
> > > > ordering
> > > > +   of pointer values we assign a uid to each gimple stmt,
> > > > globally unique
> > > > +   across all functions.
> > > > +
> > > > +   Normally, the stmt uids are a scratch space that each pass
> > > > can freely
> > > > +   assign its own values to.  However, in the case of LTO, the
> > > > uids are
> > > > +   used to associate call stmts with callgraph edges between the
> > > > WPA phase
> > > > +   (where the analyzer runs in LTO mode) and the LTRANS phase;
> > > > if the
> > > > +   analyzer changes them in the WPA phase, it leads to errors
> > > > when
> > > > +   streaming the code back in at LTRANS.
> > > > +
> > > > +   Hence this class has responsibility for tracking the original
> > > > uids
> > > > +   and restoring them once the pass is complete (in the
> > > > supergraph dtor).  */
> > > > +
> > > > +/* Give STMT a globally unique uid, storing its original uid so
> > > > it can
> > > > +   later be restored.  */
> > > > +
> > > > +void
> > > > +saved_uids::make_uid_unique (gimple *stmt)
> > > > +{
> > > > +  unsigned next_uid = m_old_stmt_uids.length ();
> > > > +  unsigned old_stmt_uid = stmt->uid;
> > > > +  stmt->uid = next_uid;
> > > > +  m_old_stmt_uids.safe_push
> > > > +    (std::pair<gimple *, unsigned> (stmt, old_stmt_uid));
> > > > +}
> > > > +
> > > > +/* Restore the saved uids of all stmts.  */
> > > > +
> > > > +void
> > > > +saved_uids::restore_uids () const
> > > > +{
> > > > +  unsigned i;
> > > > +  std::pair<gimple *, unsigned> *pair;
> > > > +  FOR_EACH_VEC_ELT (m_old_stmt_uids, i, pair)
> > > > +    pair->first->uid = pair->second;
> > > > +}
> > > > +
> > > >  /* supergraph's ctor.  Walk the callgraph, building supernodes
> > > > for each
> > > >     CFG basic block, splitting the basic blocks at
> > > > callsites.  Join
> > > >     together the supernodes with interprocedural and
> > > > intraprocedural
> > > > @@ -101,8 +141,6 @@ supergraph::supergraph (logger *logger)
> > > > 
> > > >    /* First pass: make supernodes (and assign UIDs to the gimple
> > > > stmts).  */
> > > >    {
> > > > -    unsigned next_uid = 0;
> > > > -
> > > >      /* Sort the cgraph_nodes?  */
> > > >      cgraph_node *node;
> > > >      FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
> > > > @@ -127,7 +165,7 @@ supergraph::supergraph (logger *logger)
> > > >             {
> > > >               gimple *stmt = gsi_stmt (gpi);
> > > >               m_stmt_to_node_t.put (stmt, node_for_stmts);
> > > > -             stmt->uid = next_uid++;
> > > > +             m_stmt_uids.make_uid_unique (stmt);
> > > >             }
> > > > 
> > > >           /* Append statements from BB to the current supernode,
> > > > splitting
> > > > @@ -139,7 +177,7 @@ supergraph::supergraph (logger *logger)
> > > >               gimple *stmt = gsi_stmt (gsi);
> > > >               node_for_stmts->m_stmts.safe_push (stmt);
> > > >               m_stmt_to_node_t.put (stmt, node_for_stmts);
> > > > -             stmt->uid = next_uid++;
> > > > +             m_stmt_uids.make_uid_unique (stmt);
> > > >               if (cgraph_edge *edge = supergraph_call_edge (fun,
> > > > stmt))
> > > >                 {
> > > >                   m_cgraph_edge_to_caller_prev_node.put(edge,
> > > > node_for_stmts);
> > > > @@ -257,6 +295,13 @@ supergraph::supergraph (logger *logger)
> > > >    }
> > > >  }
> > > > 
> > > > +/* supergraph's dtor.  Reset stmt uids.  */
> > > > +
> > > > +supergraph::~supergraph ()
> > > > +{
> > > > +  m_stmt_uids.restore_uids ();
> > > > +}
> > > > +
> > > >  /* Dump this graph in .dot format to PP, using DUMP_ARGS.
> > > >     Cluster the supernodes by function, then by BB from original
> > > > CFG.  */
> > > > 
> > > > diff --git a/gcc/analyzer/supergraph.h
> > > > b/gcc/analyzer/supergraph.h
> > > > index fc4a753c5a4..b2125a35410 100644
> > > > --- a/gcc/analyzer/supergraph.h
> > > > +++ b/gcc/analyzer/supergraph.h
> > > > @@ -79,6 +79,18 @@ struct supergraph_traits
> > > >    typedef supercluster cluster_t;
> > > >  };
> > > > 
> > > > +/* A class to manage the setting and restoring of statement
> > > > uids.  */
> > > > +
> > > > +class saved_uids
> > > > +{
> > > > +public:
> > > > +  void make_uid_unique (gimple *stmt);
> > > > +  void restore_uids () const;
> > > > +
> > > > +private:
> > > > +  auto_vec<std::pair<gimple *, unsigned> > m_old_stmt_uids;
> > > > +};
> > > > +
> > > >  /* A "supergraph" is a directed graph formed by joining together
> > > > all CFGs,
> > > >     linking them via interprocedural call and return edges.
> > > > 
> > > > @@ -90,6 +102,7 @@ class supergraph : public
> > > > digraph<supergraph_traits>
> > > >  {
> > > >  public:
> > > >    supergraph (logger *logger);
> > > > +  ~supergraph ();
> > > > 
> > > >    supernode *get_node_for_function_entry (function *fun) const
> > > >    {
> > > > @@ -205,6 +218,8 @@ private:
> > > > 
> > > >    typedef hash_map<const function *, unsigned>
> > > > function_to_num_snodes_t;
> > > >    function_to_num_snodes_t m_function_to_num_snodes;
> > > > +
> > > > +  saved_uids m_stmt_uids;
> > > >  };
> > > > 
> > > >  /* A node within a supergraph.  */
> > > > diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
> > > > index 4b3d7d7452e..417f37169ff 100644
> > > > --- a/gcc/doc/gimple.texi
> > > > +++ b/gcc/doc/gimple.texi
> > > > @@ -173,6 +173,12 @@ This is an unsigned integer used by passes
> > > > that want to assign
> > > >  IDs to every statement. These IDs must be assigned and used by
> > > >  each pass.
> > > > 
> > > > +Normally the uid field is a scratch space for use by passes that
> > > > +need one.  However, in LTO mode the UIDs are used to associate
> > > > call
> > > > +statements with callgraph edges between the WPA phase and the
> > > > LTRANS
> > > > +phase; hence if they are changed by an IPA pass they should be
> > > > restored
> > > > +at the end of that pass.
> > > > +
> > > >  @item @code{location}
> > > >  This is a @code{location_t} identifier to specify source code
> > > >  location for this statement. It is inherited from the front
> > > > diff --git a/gcc/gimple.h b/gcc/gimple.h
> > > > index 3ec86f5f082..7d01bd4dd6a 100644
> > > > --- a/gcc/gimple.h
> > > > +++ b/gcc/gimple.h
> > > > @@ -260,7 +260,11 @@ struct GTY((desc
> > > > ("gimple_statement_structure (&%h)"), tag ("GSS_BASE"),
> > > > 
> > > >    /* UID of this statement.  This is used by passes that want to
> > > >       assign IDs to statements.  It must be assigned and used by
> > > > each
> > > > -     pass.  By default it should be assumed to contain
> > > > garbage.  */
> > > > +     pass.  By default it should be assumed to contain garbage.
> > > > +     However, in LTO mode the UIDs are used to associate call
> > > > stmts with
> > > > +     callgraph edges between the WPA phase and the LTRANS phase;
> > > > hence
> > > > +     if they are changed by an IPA pass they should be restored
> > > > at the
> > > > +     end of that pass.  */
> > > >    unsigned uid;
> > > > 
> > > >    /* [ WORD 2 ]
> > > > @@ -2043,7 +2047,12 @@ gimple_plf (gimple *stmt, enum plf_mask
> > > > plf)
> > > >     Please note that this UID property is supposed to be
> > > > undefined at
> > > >     pass boundaries.  This means that a given pass should not
> > > > assume it
> > > >     contains any useful value when the pass starts and thus can
> > > > set it
> > > > -   to any value it sees fit.  */
> > > > +   to any value it sees fit.
> > > > +
> > > > +   That said, in LTO mode the UIDs are used to associate call
> > > > stmts with
> > > > +   callgraph edges between the WPA phase and the LTRANS phase;
> > > > hence
> > > > +   if they are changed by an IPA pass they should be restored at
> > > > the
> > > > +   end of that pass.  */
> > > > 
> > > >  static inline void
> > > >  gimple_set_uid (gimple *g, unsigned uid)
> > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > > b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > > new file mode 100644
> > > > index 00000000000..2bbf37b0e6e
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > > @@ -0,0 +1,8 @@
> > > > +/* { dg-do link } */
> > > > +/* { dg-require-effective-target lto } */
> > > > +/* { dg-additional-options "-Os -flto" } */
> > > > +/* { dg-additional-sources pr98599-b.c } */
> > > > +
> > > > +int b(int x);
> > > > +int a() { b(5); }
> > > > +int main() { a(); }
> > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > > b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > > new file mode 100644
> > > > index 00000000000..cfdeb3bf134
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > > @@ -0,0 +1 @@
> > > > +int b(int x) { return x; }
> > > > --
> > > > 2.26.2
> > > > 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] gimple UIDs, LTO and -fanalyzer [PR98599]
  2021-01-21 19:09       ` Jan Hubicka
@ 2021-01-21 19:35         ` David Malcolm
  2021-01-22  3:07         ` David Malcolm
  1 sibling, 0 replies; 14+ messages in thread
From: David Malcolm @ 2021-01-21 19:35 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Richard Biener, GCC Patches

On Thu, 2021-01-21 at 20:09 +0100, Jan Hubicka wrote:
> > On Thu, 2021-01-14 at 15:00 +0100, Jan Hubicka wrote:
> > > > On Wed, Jan 13, 2021 at 11:04 PM David Malcolm via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > gimple.h has this comment for gimple's uid field:
> > > > > 
> > > > >   /* UID of this statement.  This is used by passes that want
> > > > > to
> > > > >      assign IDs to statements.  It must be assigned and used
> > > > > by
> > > > > each
> > > > >      pass.  By default it should be assumed to contain
> > > > > garbage.  */
> > > > >   unsigned uid;
> > > > > 
> > > > > and gimple_set_uid has:
> > > > > 
> > > > >    Please note that this UID property is supposed to be
> > > > > undefined
> > > > > at
> > > > >    pass boundaries.  This means that a given pass should not
> > > > > assume it
> > > > >    contains any useful value when the pass starts and thus
> > > > > can
> > > > > set it
> > > > >    to any value it sees fit.
> > > > > 
> > > > > which suggests that any pass can use the uid field as an
> > > > > arbitrary
> > > > > scratch space.
> > > > > 
> > > > > PR analyzer/98599 reports a case where this error occurs in
> > > > > LTO
> > > > > mode:
> > > > >   fatal error: Cgraph edge statement index out of range
> > > > > on certain inputs with -fanalyzer.
> > > > > 
> > > > > The error occurs in the LTRANS phase after -fanalyzer runs in
> > > > > the
> > > > > WPA phase.  The analyzer pass writes to the uid fields of all
> > > > > stmts.
> > > > > 
> > > > > The error occurs when LTRANS is streaming callgraph edges
> > > > > back
> > > > > in.
> > > > > If I'm reading things correctly, the LTO format uses stmt
> > > > > uids to
> > > > > associate call stmts with callgraph edges between WPA and
> > > > > LTRANS.
> > > > > For example, in lto-cgraph.c, lto_output_edge writes out the
> > > > > gimple_uid, and input_edge reads it back in.
> > > > > 
> > > > > Hence IPA passes that touch the uids in WPA need to restore
> > > > > them,
> > > > > or the stream-in at LTRANS will fail.
> > > > > 
> > > > > Is it intended that the LTO machinery relies on the value of
> > > > > the
> > > > > uid
> > > > > field being preserved during WPA (or, at least, needs to be
> > > > > saved
> > > > > and
> > > > > restored by passes that touch it)?
> > > > 
> > > > I belive this is solely at the cgraph stream out to stream in
> > > > boundary but
> > > > this may be a blurred area since while we materialize the whole
> > > > cgraph
> > > > at once the function bodies are streamed in on demand.
> > > > 
> > > > Honza can probably clarify things.
> > > 
> > > Well, the uids are used to associate cgraph edges with
> > > statements.  At
> > > WPA stage you do not have function bodies and thus uids serves
> > > role
> > > of
> > > pointers to the statement.  If you load the body in (via
> > > get_body)
> > > the
> > > uids are replaced by pointers and when you stream out uids are
> > > recomputed again.
> > > 
> > > When do you touch the uids? At WPA time or from small IPA pass in
> > > ltrans?
> > 
> > The analyzer is here in passes.def:
> >   INSERT_PASSES_AFTER (all_regular_ipa_passes)
> >   NEXT_PASS (pass_analyzer);
> > 
> > and so in LTO runs as the first regular IPA pass at WPA time,
> > when do_whole_program_analysis calls:
> >   execute_ipa_pass_list (g->get_passes ()->all_regular_ipa_passes);
> > 
> > FWIW I hope to eventually have a way to summarize function bodies
> > in
> > the analyzer, but I don't yet, so I'm currently brute forcing
> > things by
> > loading all function bodies at the start of the analyzer (when
> > -fanalyzer is enabled).
> > 
> > I wonder if that's messing things up somehow?
> 
> Actually I think it should work.  If you do get_body or
> get_untransformed_body (that will be equal at that time) you ought to
> get ids in symtab datastructure rewritten to pointers and at stream
> out
> time we should assign new ids...
> > Does the stream-out from WPA make any assumptions about the stmt
> > uids?
> > For example, 
> >   #define STMT_UID_NOT_IN_RANGE(uid) \
> >     (gimple_stmt_max_uid (fn) < uid || uid == 0)
> > seems to assume that the UIDs are per-function ranges from
> >   [0-gimple_stmt_max_uid (fn)]
> > which isn't the case for the uids set by the analyzer.  Maybe
> > that's
> > the issue here?
> > 
> > Sorry for not being more familiar with the IPA/LTO code
> 
> There is lto_prepare_function_for_streaming which assigns uids to be
> incremental.   So I guess problem is that it is not called at WPA
> time
> if function is in memory (since at moment we do not really modify
> bodies
> at WPA time, however we do stream them in sometimes to icf compare
> them
> or to update profile).
> 
> So i guess fix would be to arrange lto_prepare_function_for_streaming
> to
> be called on all functions with body defined before WPA stream-out?

Thanks; I'll have a go at implementing that.
Dave

> Honza
> > Dave
> > 
> > 
> > > hozna
> > > > Note LTO uses this exactly because of this comment to avoid
> > > > allocating
> > > > extra memory for an 'index' but it could of course leave
> > > > gimple_uid
> > > > alone
> > > > at some extra expense (eventually paid for in generic cgraph
> > > > data
> > > > structures
> > > > and thus for not only the streaming time).
> > > > 
> > > > > On the assumption that this is the case, this patch updates
> > > > > the
> > > > > comments
> > > > > in gimple.h referring to passes being able to set uid to any
> > > > > value to
> > > > > note the caveat for IPA passes, and it updates the analyzer
> > > > > to
> > > > > save
> > > > > and restore the UIDs, fixing the error.
> > > > > 
> > > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-
> > > > > gnu.
> > > > > OK for master?
> > > > 
> > > > The analyzer bits are OK, let's see how Honza can clarify the
> > > > situation.
> > > > 
> > > > Thanks,
> > > > Richard.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] gimple UIDs, LTO and -fanalyzer [PR98599]
  2021-01-21 19:09       ` Jan Hubicka
  2021-01-21 19:35         ` David Malcolm
@ 2021-01-22  3:07         ` David Malcolm
  2021-04-13  1:24           ` [committed] " David Malcolm
  1 sibling, 1 reply; 14+ messages in thread
From: David Malcolm @ 2021-01-22  3:07 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Richard Biener, GCC Patches

On Thu, 2021-01-21 at 20:09 +0100, Jan Hubicka wrote:
> > On Thu, 2021-01-14 at 15:00 +0100, Jan Hubicka wrote:
> > > > On Wed, Jan 13, 2021 at 11:04 PM David Malcolm via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > gimple.h has this comment for gimple's uid field:
> > > > > 
> > > > >   /* UID of this statement.  This is used by passes that want
> > > > > to
> > > > >      assign IDs to statements.  It must be assigned and used
> > > > > by
> > > > > each
> > > > >      pass.  By default it should be assumed to contain
> > > > > garbage.  */
> > > > >   unsigned uid;
> > > > > 
> > > > > and gimple_set_uid has:
> > > > > 
> > > > >    Please note that this UID property is supposed to be
> > > > > undefined
> > > > > at
> > > > >    pass boundaries.  This means that a given pass should not
> > > > > assume it
> > > > >    contains any useful value when the pass starts and thus
> > > > > can
> > > > > set it
> > > > >    to any value it sees fit.
> > > > > 
> > > > > which suggests that any pass can use the uid field as an
> > > > > arbitrary
> > > > > scratch space.
> > > > > 
> > > > > PR analyzer/98599 reports a case where this error occurs in
> > > > > LTO
> > > > > mode:
> > > > >   fatal error: Cgraph edge statement index out of range
> > > > > on certain inputs with -fanalyzer.
> > > > > 
> > > > > The error occurs in the LTRANS phase after -fanalyzer runs in
> > > > > the
> > > > > WPA phase.  The analyzer pass writes to the uid fields of all
> > > > > stmts.
> > > > > 
> > > > > The error occurs when LTRANS is streaming callgraph edges
> > > > > back
> > > > > in.
> > > > > If I'm reading things correctly, the LTO format uses stmt
> > > > > uids to
> > > > > associate call stmts with callgraph edges between WPA and
> > > > > LTRANS.
> > > > > For example, in lto-cgraph.c, lto_output_edge writes out the
> > > > > gimple_uid, and input_edge reads it back in.
> > > > > 
> > > > > Hence IPA passes that touch the uids in WPA need to restore
> > > > > them,
> > > > > or the stream-in at LTRANS will fail.
> > > > > 
> > > > > Is it intended that the LTO machinery relies on the value of
> > > > > the
> > > > > uid
> > > > > field being preserved during WPA (or, at least, needs to be
> > > > > saved
> > > > > and
> > > > > restored by passes that touch it)?
> > > > 
> > > > I belive this is solely at the cgraph stream out to stream in
> > > > boundary but
> > > > this may be a blurred area since while we materialize the whole
> > > > cgraph
> > > > at once the function bodies are streamed in on demand.
> > > > 
> > > > Honza can probably clarify things.
> > > 
> > > Well, the uids are used to associate cgraph edges with
> > > statements.  At
> > > WPA stage you do not have function bodies and thus uids serves
> > > role
> > > of
> > > pointers to the statement.  If you load the body in (via
> > > get_body)
> > > the
> > > uids are replaced by pointers and when you stream out uids are
> > > recomputed again.
> > > 
> > > When do you touch the uids? At WPA time or from small IPA pass in
> > > ltrans?
> > 
> > The analyzer is here in passes.def:
> >   INSERT_PASSES_AFTER (all_regular_ipa_passes)
> >   NEXT_PASS (pass_analyzer);
> > 
> > and so in LTO runs as the first regular IPA pass at WPA time,
> > when do_whole_program_analysis calls:
> >   execute_ipa_pass_list (g->get_passes ()->all_regular_ipa_passes);
> > 
> > FWIW I hope to eventually have a way to summarize function bodies
> > in
> > the analyzer, but I don't yet, so I'm currently brute forcing
> > things by
> > loading all function bodies at the start of the analyzer (when
> > -fanalyzer is enabled).
> > 
> > I wonder if that's messing things up somehow?
> 
> Actually I think it should work.  If you do get_body or
> get_untransformed_body (that will be equal at that time) you ought to
> get ids in symtab datastructure rewritten to pointers and at stream
> out
> time we should assign new ids...
> > Does the stream-out from WPA make any assumptions about the stmt
> > uids?
> > For example, 
> >   #define STMT_UID_NOT_IN_RANGE(uid) \
> >     (gimple_stmt_max_uid (fn) < uid || uid == 0)
> > seems to assume that the UIDs are per-function ranges from
> >   [0-gimple_stmt_max_uid (fn)]
> > which isn't the case for the uids set by the analyzer.  Maybe
> > that's
> > the issue here?
> > 
> > Sorry for not being more familiar with the IPA/LTO code
> 
> There is lto_prepare_function_for_streaming which assigns uids to be
> incremental.   So I guess problem is that it is not called at WPA
> time
> if function is in memory (since at moment we do not really modify
> bodies
> at WPA time, however we do stream them in sometimes to icf compare
> them
> or to update profile).
> 
> So i guess fix would be to arrange lto_prepare_function_for_streaming
> to
> be called on all functions with body defined before WPA stream-out?

Thanks.

I think my earlier analysis was wrong.

With the caveat that I'm not as familiar with the IPA code as other
parts of the compiler, what I think is going on is:

WPA streams in the input callgraph:

  main/1 -------------------> a/0  ----------------------> b/2
                 |                           |
          lto_stmt_uid == 1          lto_stmt_uid == 1

WPA generates a clone of a, and streams out this callgraph:

  main/1 -------------------> a.isra/3  ----------------------> b/2

Without -fanalyzer, the callgraph edges have NULL call_stmt and so the
existing lto_stmt_uid values are used when streaming out:

  main/1 -------------------> a.isra/3  ----------------------> b/2
                 |                           |
         lto_stmt_uid == 1          lto_stmt_uid == 1


With -fanalyzer, the functions were materialized, so the callgraph
edges have non-NULL call-stmt.

lto_prepare_function_for_streaming is called for "main/1", but not for
"a.isra/3", as it is a clone (see the condition at line 310):

    307	  /* Do body modifications needed for streaming before we fork out
    308	     worker processes.  */
    309	  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
    310	    if (!node->clone_of && gimple_has_body_p (node->decl))
    311	      lto_prepare_function_for_streaming (node);

Hence the uids are written to within main/1, but not within a.isra/3 (I
think).
Hence this callgraph is written out if the functions bodies were loaded:

  main/1 -------------------> a.isra/3  ----------------------> b/2
                 |                           |
         call_stmt->uid == 0           call_stmt->uid == 2
        so lto_stmt_uid == 1          so lto_stmt_uid == 3


At the ltrans stream-in, it reads in the edges, and then
calls fixup_call_stmt_edges_1 on both the original node *and* any
clones:

        1237	    fixup_call_stmt_edges_1 (orig, stmts, fn);
        1238	  if (orig->clones)
        1239	    for (node = orig->clones; node != orig;)
        1240	      {
        1241		if (!node->thunk)
        1242		  fixup_call_stmt_edges_1 (node, stmts, fn);

WPA has renumbered the stmts in non-clone functions, but has not
renumbered them for clones.

Note that fixup_call_stmt_edges passes the same array of stmts to both
calls to fixup_call_stmt_edges_1 (both the original and the clones)
i.e. it's trying to fixup the edges for "a.isra/3" with the stmts in
"a/0".   Should it be doing this?  "stmts" is constructed in
input_function by walking the BBs of cfun.  Is that going to find the
stmts of the clones?  (sorry, I'm not familiar with how clones work
within our IPA implementation).

fixup_call_stmt_edges_1 attempts to fixup the call edge from a.isra/3
to b/2, which has lto_stmt_uid == 3 and thus stmt->uid == 2, but there
are only 2 stmts within the function "a/0", hence it's one past the end
of the array, and fails with:
test.c:3:5: fatal error: Cgraph edge statement index out of range

I wonder if this can happen any time WPA loads a function body that
makes a call that then gets cloned; it just happens that -fanalyzer is
loading the function body.

Thoughts?  Hope the above makes sense.
Dave


> Honza
> > Dave
> > 
> > 
> > > hozna
> > > > Note LTO uses this exactly because of this comment to avoid
> > > > allocating
> > > > extra memory for an 'index' but it could of course leave
> > > > gimple_uid
> > > > alone
> > > > at some extra expense (eventually paid for in generic cgraph
> > > > data
> > > > structures
> > > > and thus for not only the streaming time).
> > > > 
> > > > > On the assumption that this is the case, this patch updates
> > > > > the
> > > > > comments
> > > > > in gimple.h referring to passes being able to set uid to any
> > > > > value to
> > > > > note the caveat for IPA passes, and it updates the analyzer
> > > > > to
> > > > > save
> > > > > and restore the UIDs, fixing the error.
> > > > > 
> > > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-
> > > > > gnu.
> > > > > OK for master?
> > > > 
> > > > The analyzer bits are OK, let's see how Honza can clarify the
> > > > situation.
> > > > 
> > > > Thanks,
> > > > Richard.
> > > > 
> > > > > gcc/analyzer/ChangeLog:
> > > > >         PR analyzer/98599
> > > > >         * supergraph.cc (saved_uids::make_uid_unique): New.
> > > > >         (saved_uids::restore_uids): New.
> > > > >         (supergraph::supergraph): Replace assignments to
> > > > > stmt-
> > > > > > uid with
> > > > >         calls to m_stmt_uids.make_uid_unique.
> > > > >         (supergraph::~supergraph): New.
> > > > >         * supergraph.h (class saved_uids): New.
> > > > >         (supergraph::~supergraph): New decl.
> > > > >         (supergraph::m_stmt_uids): New field.
> > > > > 
> > > > > gcc/ChangeLog:
> > > > >         PR analyzer/98599
> > > > >         * doc/gimple.texi: Document that UIDs must not change
> > > > > during IPA
> > > > >         passes.
> > > > >         * gimple.h (gimple::uid): Likewise.
> > > > >         (gimple_set_uid): Likewise.
> > > > > 
> > > > > gcc/testsuite/ChangeLog:
> > > > >         PR analyzer/98599
> > > > >         * gcc.dg/analyzer/pr98599-a.c: New test.
> > > > >         * gcc.dg/analyzer/pr98599-b.c: New test.
> > > > > ---
> > > > >  gcc/analyzer/supergraph.cc                | 53
> > > > > +++++++++++++++++++++--
> > > > >  gcc/analyzer/supergraph.h                 | 15 +++++++
> > > > >  gcc/doc/gimple.texi                       |  6 +++
> > > > >  gcc/gimple.h                              | 13 +++++-
> > > > >  gcc/testsuite/gcc.dg/analyzer/pr98599-a.c |  8 ++++
> > > > >  gcc/testsuite/gcc.dg/analyzer/pr98599-b.c |  1 +
> > > > >  6 files changed, 90 insertions(+), 6 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > > > 
> > > > > diff --git a/gcc/analyzer/supergraph.cc
> > > > > b/gcc/analyzer/supergraph.cc
> > > > > index 419f6424f76..40acfbd16a8 100644
> > > > > --- a/gcc/analyzer/supergraph.cc
> > > > > +++ b/gcc/analyzer/supergraph.cc
> > > > > @@ -87,6 +87,46 @@ supergraph_call_edge (function *fun,
> > > > > gimple
> > > > > *stmt)
> > > > >    return edge;
> > > > >  }
> > > > > 
> > > > > +/* class saved_uids.
> > > > > +
> > > > > +   In order to ensure consistent results without relying on
> > > > > the
> > > > > ordering
> > > > > +   of pointer values we assign a uid to each gimple stmt,
> > > > > globally unique
> > > > > +   across all functions.
> > > > > +
> > > > > +   Normally, the stmt uids are a scratch space that each
> > > > > pass
> > > > > can freely
> > > > > +   assign its own values to.  However, in the case of LTO,
> > > > > the
> > > > > uids are
> > > > > +   used to associate call stmts with callgraph edges between
> > > > > the
> > > > > WPA phase
> > > > > +   (where the analyzer runs in LTO mode) and the LTRANS
> > > > > phase;
> > > > > if the
> > > > > +   analyzer changes them in the WPA phase, it leads to
> > > > > errors
> > > > > when
> > > > > +   streaming the code back in at LTRANS.
> > > > > +
> > > > > +   Hence this class has responsibility for tracking the
> > > > > original
> > > > > uids
> > > > > +   and restoring them once the pass is complete (in the
> > > > > supergraph dtor).  */
> > > > > +
> > > > > +/* Give STMT a globally unique uid, storing its original uid
> > > > > so
> > > > > it can
> > > > > +   later be restored.  */
> > > > > +
> > > > > +void
> > > > > +saved_uids::make_uid_unique (gimple *stmt)
> > > > > +{
> > > > > +  unsigned next_uid = m_old_stmt_uids.length ();
> > > > > +  unsigned old_stmt_uid = stmt->uid;
> > > > > +  stmt->uid = next_uid;
> > > > > +  m_old_stmt_uids.safe_push
> > > > > +    (std::pair<gimple *, unsigned> (stmt, old_stmt_uid));
> > > > > +}
> > > > > +
> > > > > +/* Restore the saved uids of all stmts.  */
> > > > > +
> > > > > +void
> > > > > +saved_uids::restore_uids () const
> > > > > +{
> > > > > +  unsigned i;
> > > > > +  std::pair<gimple *, unsigned> *pair;
> > > > > +  FOR_EACH_VEC_ELT (m_old_stmt_uids, i, pair)
> > > > > +    pair->first->uid = pair->second;
> > > > > +}
> > > > > +
> > > > >  /* supergraph's ctor.  Walk the callgraph, building
> > > > > supernodes
> > > > > for each
> > > > >     CFG basic block, splitting the basic blocks at
> > > > > callsites.  Join
> > > > >     together the supernodes with interprocedural and
> > > > > intraprocedural
> > > > > @@ -101,8 +141,6 @@ supergraph::supergraph (logger *logger)
> > > > > 
> > > > >    /* First pass: make supernodes (and assign UIDs to the
> > > > > gimple
> > > > > stmts).  */
> > > > >    {
> > > > > -    unsigned next_uid = 0;
> > > > > -
> > > > >      /* Sort the cgraph_nodes?  */
> > > > >      cgraph_node *node;
> > > > >      FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
> > > > > @@ -127,7 +165,7 @@ supergraph::supergraph (logger *logger)
> > > > >             {
> > > > >               gimple *stmt = gsi_stmt (gpi);
> > > > >               m_stmt_to_node_t.put (stmt, node_for_stmts);
> > > > > -             stmt->uid = next_uid++;
> > > > > +             m_stmt_uids.make_uid_unique (stmt);
> > > > >             }
> > > > > 
> > > > >           /* Append statements from BB to the current
> > > > > supernode,
> > > > > splitting
> > > > > @@ -139,7 +177,7 @@ supergraph::supergraph (logger *logger)
> > > > >               gimple *stmt = gsi_stmt (gsi);
> > > > >               node_for_stmts->m_stmts.safe_push (stmt);
> > > > >               m_stmt_to_node_t.put (stmt, node_for_stmts);
> > > > > -             stmt->uid = next_uid++;
> > > > > +             m_stmt_uids.make_uid_unique (stmt);
> > > > >               if (cgraph_edge *edge = supergraph_call_edge
> > > > > (fun,
> > > > > stmt))
> > > > >                 {
> > > > >                   m_cgraph_edge_to_caller_prev_node.put(edge,
> > > > > node_for_stmts);
> > > > > @@ -257,6 +295,13 @@ supergraph::supergraph (logger *logger)
> > > > >    }
> > > > >  }
> > > > > 
> > > > > +/* supergraph's dtor.  Reset stmt uids.  */
> > > > > +
> > > > > +supergraph::~supergraph ()
> > > > > +{
> > > > > +  m_stmt_uids.restore_uids ();
> > > > > +}
> > > > > +
> > > > >  /* Dump this graph in .dot format to PP, using DUMP_ARGS.
> > > > >     Cluster the supernodes by function, then by BB from
> > > > > original
> > > > > CFG.  */
> > > > > 
> > > > > diff --git a/gcc/analyzer/supergraph.h
> > > > > b/gcc/analyzer/supergraph.h
> > > > > index fc4a753c5a4..b2125a35410 100644
> > > > > --- a/gcc/analyzer/supergraph.h
> > > > > +++ b/gcc/analyzer/supergraph.h
> > > > > @@ -79,6 +79,18 @@ struct supergraph_traits
> > > > >    typedef supercluster cluster_t;
> > > > >  };
> > > > > 
> > > > > +/* A class to manage the setting and restoring of statement
> > > > > uids.  */
> > > > > +
> > > > > +class saved_uids
> > > > > +{
> > > > > +public:
> > > > > +  void make_uid_unique (gimple *stmt);
> > > > > +  void restore_uids () const;
> > > > > +
> > > > > +private:
> > > > > +  auto_vec<std::pair<gimple *, unsigned> > m_old_stmt_uids;
> > > > > +};
> > > > > +
> > > > >  /* A "supergraph" is a directed graph formed by joining
> > > > > together
> > > > > all CFGs,
> > > > >     linking them via interprocedural call and return edges.
> > > > > 
> > > > > @@ -90,6 +102,7 @@ class supergraph : public
> > > > > digraph<supergraph_traits>
> > > > >  {
> > > > >  public:
> > > > >    supergraph (logger *logger);
> > > > > +  ~supergraph ();
> > > > > 
> > > > >    supernode *get_node_for_function_entry (function *fun)
> > > > > const
> > > > >    {
> > > > > @@ -205,6 +218,8 @@ private:
> > > > > 
> > > > >    typedef hash_map<const function *, unsigned>
> > > > > function_to_num_snodes_t;
> > > > >    function_to_num_snodes_t m_function_to_num_snodes;
> > > > > +
> > > > > +  saved_uids m_stmt_uids;
> > > > >  };
> > > > > 
> > > > >  /* A node within a supergraph.  */
> > > > > diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
> > > > > index 4b3d7d7452e..417f37169ff 100644
> > > > > --- a/gcc/doc/gimple.texi
> > > > > +++ b/gcc/doc/gimple.texi
> > > > > @@ -173,6 +173,12 @@ This is an unsigned integer used by
> > > > > passes
> > > > > that want to assign
> > > > >  IDs to every statement. These IDs must be assigned and used
> > > > > by
> > > > >  each pass.
> > > > > 
> > > > > +Normally the uid field is a scratch space for use by passes
> > > > > that
> > > > > +need one.  However, in LTO mode the UIDs are used to
> > > > > associate
> > > > > call
> > > > > +statements with callgraph edges between the WPA phase and
> > > > > the
> > > > > LTRANS
> > > > > +phase; hence if they are changed by an IPA pass they should
> > > > > be
> > > > > restored
> > > > > +at the end of that pass.
> > > > > +
> > > > >  @item @code{location}
> > > > >  This is a @code{location_t} identifier to specify source
> > > > > code
> > > > >  location for this statement. It is inherited from the front
> > > > > diff --git a/gcc/gimple.h b/gcc/gimple.h
> > > > > index 3ec86f5f082..7d01bd4dd6a 100644
> > > > > --- a/gcc/gimple.h
> > > > > +++ b/gcc/gimple.h
> > > > > @@ -260,7 +260,11 @@ struct GTY((desc
> > > > > ("gimple_statement_structure (&%h)"), tag ("GSS_BASE"),
> > > > > 
> > > > >    /* UID of this statement.  This is used by passes that
> > > > > want to
> > > > >       assign IDs to statements.  It must be assigned and used
> > > > > by
> > > > > each
> > > > > -     pass.  By default it should be assumed to contain
> > > > > garbage.  */
> > > > > +     pass.  By default it should be assumed to contain
> > > > > garbage.
> > > > > +     However, in LTO mode the UIDs are used to associate
> > > > > call
> > > > > stmts with
> > > > > +     callgraph edges between the WPA phase and the LTRANS
> > > > > phase;
> > > > > hence
> > > > > +     if they are changed by an IPA pass they should be
> > > > > restored
> > > > > at the
> > > > > +     end of that pass.  */
> > > > >    unsigned uid;
> > > > > 
> > > > >    /* [ WORD 2 ]
> > > > > @@ -2043,7 +2047,12 @@ gimple_plf (gimple *stmt, enum
> > > > > plf_mask
> > > > > plf)
> > > > >     Please note that this UID property is supposed to be
> > > > > undefined at
> > > > >     pass boundaries.  This means that a given pass should not
> > > > > assume it
> > > > >     contains any useful value when the pass starts and thus
> > > > > can
> > > > > set it
> > > > > -   to any value it sees fit.  */
> > > > > +   to any value it sees fit.
> > > > > +
> > > > > +   That said, in LTO mode the UIDs are used to associate
> > > > > call
> > > > > stmts with
> > > > > +   callgraph edges between the WPA phase and the LTRANS
> > > > > phase;
> > > > > hence
> > > > > +   if they are changed by an IPA pass they should be
> > > > > restored at
> > > > > the
> > > > > +   end of that pass.  */
> > > > > 
> > > > >  static inline void
> > > > >  gimple_set_uid (gimple *g, unsigned uid)
> > > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > > > b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > > > new file mode 100644
> > > > > index 00000000000..2bbf37b0e6e
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> > > > > @@ -0,0 +1,8 @@
> > > > > +/* { dg-do link } */
> > > > > +/* { dg-require-effective-target lto } */
> > > > > +/* { dg-additional-options "-Os -flto" } */
> > > > > +/* { dg-additional-sources pr98599-b.c } */
> > > > > +
> > > > > +int b(int x);
> > > > > +int a() { b(5); }
> > > > > +int main() { a(); }
> > > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > > > b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > > > new file mode 100644
> > > > > index 00000000000..cfdeb3bf134
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> > > > > @@ -0,0 +1 @@
> > > > > +int b(int x) { return x; }
> > > > > --
> > > > > 2.26.2
> > > > > 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [committed] gimple UIDs, LTO and -fanalyzer [PR98599]
  2021-01-22  3:07         ` David Malcolm
@ 2021-04-13  1:24           ` David Malcolm
  2021-04-13  5:46             ` Jan Hubicka
  2021-04-15  9:45             ` Jan Hubicka
  0 siblings, 2 replies; 14+ messages in thread
From: David Malcolm @ 2021-04-13  1:24 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, David Malcolm

On Thu, 2021-01-21 at 22:07 -0500, David Malcolm via Gcc-patches wrote:
> On Thu, 2021-01-21 at 20:09 +0100, Jan Hubicka wrote:
> > > On Thu, 2021-01-14 at 15:00 +0100, Jan Hubicka wrote:
> > > > > On Wed, Jan 13, 2021 at 11:04 PM David Malcolm via
> > > > > Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > gimple.h has this comment for gimple's uid field:
> > > > > > 
> > > > > >   /* UID of this statement.  This is used by passes that
> > > > > > want
> > > > > > to
> > > > > >      assign IDs to statements.  It must be assigned and
> > > > > > used
> > > > > > by
> > > > > > each
> > > > > >      pass.  By default it should be assumed to contain
> > > > > > garbage.  */
> > > > > >   unsigned uid;
> > > > > > 
> > > > > > and gimple_set_uid has:
> > > > > > 
> > > > > >    Please note that this UID property is supposed to be
> > > > > > undefined
> > > > > > at
> > > > > >    pass boundaries.  This means that a given pass should
> > > > > > not
> > > > > > assume it
> > > > > >    contains any useful value when the pass starts and thus
> > > > > > can
> > > > > > set it
> > > > > >    to any value it sees fit.
> > > > > > 
> > > > > > which suggests that any pass can use the uid field as an
> > > > > > arbitrary
> > > > > > scratch space.
> > > > > > 
> > > > > > PR analyzer/98599 reports a case where this error occurs in
> > > > > > LTO
> > > > > > mode:
> > > > > >   fatal error: Cgraph edge statement index out of range
> > > > > > on certain inputs with -fanalyzer.
> > > > > > 
> > > > > > The error occurs in the LTRANS phase after -fanalyzer runs
> > > > > > in
> > > > > > the
> > > > > > WPA phase.  The analyzer pass writes to the uid fields of
> > > > > > all
> > > > > > stmts.
> > > > > > 
> > > > > > The error occurs when LTRANS is streaming callgraph edges
> > > > > > back
> > > > > > in.
> > > > > > If I'm reading things correctly, the LTO format uses stmt
> > > > > > uids to
> > > > > > associate call stmts with callgraph edges between WPA and
> > > > > > LTRANS.
> > > > > > For example, in lto-cgraph.c, lto_output_edge writes out
> > > > > > the
> > > > > > gimple_uid, and input_edge reads it back in.
> > > > > > 
> > > > > > Hence IPA passes that touch the uids in WPA need to restore
> > > > > > them,
> > > > > > or the stream-in at LTRANS will fail.
> > > > > > 
> > > > > > Is it intended that the LTO machinery relies on the value
> > > > > > of
> > > > > > the
> > > > > > uid
> > > > > > field being preserved during WPA (or, at least, needs to be
> > > > > > saved
> > > > > > and
> > > > > > restored by passes that touch it)?
> > > > > 
> > > > > I belive this is solely at the cgraph stream out to stream in
> > > > > boundary but
> > > > > this may be a blurred area since while we materialize the
> > > > > whole
> > > > > cgraph
> > > > > at once the function bodies are streamed in on demand.
> > > > > 
> > > > > Honza can probably clarify things.
> > > > 
> > > > Well, the uids are used to associate cgraph edges with
> > > > statements.  At
> > > > WPA stage you do not have function bodies and thus uids serves
> > > > role
> > > > of
> > > > pointers to the statement.  If you load the body in (via
> > > > get_body)
> > > > the
> > > > uids are replaced by pointers and when you stream out uids are
> > > > recomputed again.
> > > > 
> > > > When do you touch the uids? At WPA time or from small IPA pass
> > > > in
> > > > ltrans?
> > > 
> > > The analyzer is here in passes.def:
> > >   INSERT_PASSES_AFTER (all_regular_ipa_passes)
> > >   NEXT_PASS (pass_analyzer);
> > > 
> > > and so in LTO runs as the first regular IPA pass at WPA time,
> > > when do_whole_program_analysis calls:
> > >   execute_ipa_pass_list (g->get_passes
> > > ()->all_regular_ipa_passes);
> > > 
> > > FWIW I hope to eventually have a way to summarize function bodies
> > > in
> > > the analyzer, but I don't yet, so I'm currently brute forcing
> > > things by
> > > loading all function bodies at the start of the analyzer (when
> > > -fanalyzer is enabled).
> > > 
> > > I wonder if that's messing things up somehow?
> > 
> > Actually I think it should work.  If you do get_body or
> > get_untransformed_body (that will be equal at that time) you ought
> > to
> > get ids in symtab datastructure rewritten to pointers and at stream
> > out
> > time we should assign new ids...
> > > Does the stream-out from WPA make any assumptions about the stmt
> > > uids?
> > > For example, 
> > >   #define STMT_UID_NOT_IN_RANGE(uid) \
> > >     (gimple_stmt_max_uid (fn) < uid || uid == 0)
> > > seems to assume that the UIDs are per-function ranges from
> > >   [0-gimple_stmt_max_uid (fn)]
> > > which isn't the case for the uids set by the analyzer.  Maybe
> > > that's
> > > the issue here?
> > > 
> > > Sorry for not being more familiar with the IPA/LTO code
> > 
> > There is lto_prepare_function_for_streaming which assigns uids to
> > be
> > incremental.   So I guess problem is that it is not called at WPA
> > time
> > if function is in memory (since at moment we do not really modify
> > bodies
> > at WPA time, however we do stream them in sometimes to icf compare
> > them
> > or to update profile).
> > 
> > So i guess fix would be to arrange
> > lto_prepare_function_for_streaming
> > to
> > be called on all functions with body defined before WPA stream-out?
> 
> Thanks.
> 
> I think my earlier analysis was wrong.
> 
> With the caveat that I'm not as familiar with the IPA code as other
> parts of the compiler, what I think is going on is:
> 
> WPA streams in the input callgraph:
> 
>   main/1 -------------------> a/0  ----------------------> b/2
>                  |                           |
>           lto_stmt_uid == 1          lto_stmt_uid == 1
> 
> WPA generates a clone of a, and streams out this callgraph:
> 
>   main/1 -------------------> a.isra/3  ----------------------> b/2
> 
> Without -fanalyzer, the callgraph edges have NULL call_stmt and so
> the
> existing lto_stmt_uid values are used when streaming out:
> 
>   main/1 -------------------> a.isra/3  ----------------------> b/2
>                  |                           |
>          lto_stmt_uid == 1          lto_stmt_uid == 1
> 
> 
> With -fanalyzer, the functions were materialized, so the callgraph
> edges have non-NULL call-stmt.
> 
> lto_prepare_function_for_streaming is called for "main/1", but not
> for
> "a.isra/3", as it is a clone (see the condition at line 310):
> 
>     307   /* Do body modifications needed for streaming before we
> fork out
>     308      worker processes.  */
>     309   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
>     310     if (!node->clone_of && gimple_has_body_p (node->decl))
>     311       lto_prepare_function_for_streaming (node);
> 
> Hence the uids are written to within main/1, but not within a.isra/3
> (I
> think).
> Hence this callgraph is written out if the functions bodies were
> loaded:
> 
>   main/1 -------------------> a.isra/3  ----------------------> b/2
>                  |                           |
>          call_stmt->uid == 0           call_stmt->uid == 2
>         so lto_stmt_uid == 1          so lto_stmt_uid == 3
> 
> 
> At the ltrans stream-in, it reads in the edges, and then
> calls fixup_call_stmt_edges_1 on both the original node *and* any
> clones:
> 
>         1237        fixup_call_stmt_edges_1 (orig, stmts, fn);
>         1238      if (orig->clones)
>         1239        for (node = orig->clones; node != orig;)
>         1240          {
>         1241            if (!node->thunk)
>         1242              fixup_call_stmt_edges_1 (node, stmts, fn);
> 
> WPA has renumbered the stmts in non-clone functions, but has not
> renumbered them for clones.
> 
> Note that fixup_call_stmt_edges passes the same array of stmts to
> both
> calls to fixup_call_stmt_edges_1 (both the original and the clones)
> i.e. it's trying to fixup the edges for "a.isra/3" with the stmts in
> "a/0".   Should it be doing this?  "stmts" is constructed in
> input_function by walking the BBs of cfun.  Is that going to find the
> stmts of the clones?  (sorry, I'm not familiar with how clones work
> within our IPA implementation).
> 
> fixup_call_stmt_edges_1 attempts to fixup the call edge from a.isra/3
> to b/2, which has lto_stmt_uid == 3 and thus stmt->uid == 2, but
> there
> are only 2 stmts within the function "a/0", hence it's one past the
> end
> of the array, and fails with:
> test.c:3:5: fatal error: Cgraph edge statement index out of range
> 
> I wonder if this can happen any time WPA loads a function body that
> makes a call that then gets cloned; it just happens that -fanalyzer
> is
> loading the function body.

It's still not clear to me what's going on with clones in the above,
e.g. why we don't fix up clones in lto_prepare_function_for_streaming.
However, given that the analyzer part of the patch fixes things and
seems low risk, I've gone ahead and committed the following simpler
version of the patch to trunk for gcc 11, to unbreak the combination
of -fanalyzer and -flto for the case where a function gets cloned.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r11-8142-g17f3c2b8ac477b07ca0aafbc7d74ba305dc1ee33.

Here's the commit message and patch:

gimple.h has this comment for gimple's uid field:

  /* UID of this statement.  This is used by passes that want to
     assign IDs to statements.  It must be assigned and used by each
     pass.  By default it should be assumed to contain garbage.  */
  unsigned uid;

and gimple_set_uid has:

   Please note that this UID property is supposed to be undefined at
   pass boundaries.  This means that a given pass should not assume it
   contains any useful value when the pass starts and thus can set it
   to any value it sees fit.

which suggests that any pass can use the uid field as an arbitrary
scratch space.

PR analyzer/98599 reports a case where this error occurs in LTO mode:
  fatal error: Cgraph edge statement index out of range
on certain inputs with -fanalyzer.

The error occurs in the LTRANS phase after -fanalyzer runs in the
WPA phase.  The analyzer pass writes to the uid fields of all stmts.

The error occurs when LTRANS is streaming callgraph edges back in.
The LTO format uses stmt uids to associate call stmts with callgraph
edges between WPA and LTRANS.
For example, in lto-cgraph.c, lto_output_edge writes out the
gimple_uid, and input_edge reads it back in.

lto_prepare_function_for_streaming has code to renumber the stmt UIDs
when the code is streamed back out, but for some reason this isn't
called for clones:
    307	  /* Do body modifications needed for streaming before we fork out
    308	     worker processes.  */
    309	  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
    310	    if (!node->clone_of && gimple_has_body_p (node->decl))
    311	      lto_prepare_function_for_streaming (node);

Hence the combination of -fanalyzer and -flto will fail in LTRANS's
stream-in if any function clones are encountered.

It's not fully clear to me why this isn't done for clones, and what the
correct fix should be to allow arbitrary changes to uids within WPA
passes.

In the meantime, this patch works around the issue by updating the
analyzer to save and restore the UIDs, fixing the error.

gcc/analyzer/ChangeLog:
	PR analyzer/98599
	* supergraph.cc (saved_uids::make_uid_unique): New.
	(saved_uids::restore_uids): New.
	(supergraph::supergraph): Replace assignments to stmt->uid with
	calls to m_stmt_uids.make_uid_unique.
	(supergraph::~supergraph): New.
	* supergraph.h (class saved_uids): New.
	(supergraph::~supergraph): New decl.
	(supergraph::m_stmt_uids): New field.

gcc/testsuite/ChangeLog:
	PR analyzer/98599
	* gcc.dg/analyzer/pr98599-a.c: New test.
	* gcc.dg/analyzer/pr98599-b.c: New test.
---
 gcc/analyzer/supergraph.cc                | 57 +++++++++++++++++++++--
 gcc/analyzer/supergraph.h                 | 15 ++++++
 gcc/testsuite/gcc.dg/analyzer/pr98599-a.c |  8 ++++
 gcc/testsuite/gcc.dg/analyzer/pr98599-b.c |  1 +
 4 files changed, 77 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-b.c

diff --git a/gcc/analyzer/supergraph.cc b/gcc/analyzer/supergraph.cc
index 4b934568db6..8611d0f8689 100644
--- a/gcc/analyzer/supergraph.cc
+++ b/gcc/analyzer/supergraph.cc
@@ -87,6 +87,50 @@ supergraph_call_edge (function *fun, gimple *stmt)
   return edge;
 }
 
+/* class saved_uids.
+
+   In order to ensure consistent results without relying on the ordering
+   of pointer values we assign a uid to each gimple stmt, globally unique
+   across all functions.
+
+   Normally, the stmt uids are a scratch space that each pass can freely
+   assign its own values to.  However, in the case of LTO, the uids are
+   used to associate call stmts with callgraph edges between the WPA phase
+   (where the analyzer runs in LTO mode) and the LTRANS phase; if the
+   analyzer changes them in the WPA phase, it leads to errors when
+   streaming the code back in at LTRANS.
+   lto_prepare_function_for_streaming has code to renumber the stmt UIDs
+   when the code is streamed back out, but for some reason this isn't
+   called for clones.
+
+   Hence, as a workaround, this class has responsibility for tracking
+   the original uids and restoring them once the pass is complete
+   (in the supergraph dtor).  */
+
+/* Give STMT a globally unique uid, storing its original uid so it can
+   later be restored.  */
+
+void
+saved_uids::make_uid_unique (gimple *stmt)
+{
+  unsigned next_uid = m_old_stmt_uids.length ();
+  unsigned old_stmt_uid = stmt->uid;
+  stmt->uid = next_uid;
+  m_old_stmt_uids.safe_push
+    (std::pair<gimple *, unsigned> (stmt, old_stmt_uid));
+}
+
+/* Restore the saved uids of all stmts.  */
+
+void
+saved_uids::restore_uids () const
+{
+  unsigned i;
+  std::pair<gimple *, unsigned> *pair;
+  FOR_EACH_VEC_ELT (m_old_stmt_uids, i, pair)
+    pair->first->uid = pair->second;
+}
+
 /* supergraph's ctor.  Walk the callgraph, building supernodes for each
    CFG basic block, splitting the basic blocks at callsites.  Join
    together the supernodes with interprocedural and intraprocedural
@@ -101,8 +145,6 @@ supergraph::supergraph (logger *logger)
 
   /* First pass: make supernodes (and assign UIDs to the gimple stmts).  */
   {
-    unsigned next_uid = 0;
-
     /* Sort the cgraph_nodes?  */
     cgraph_node *node;
     FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
@@ -127,7 +169,7 @@ supergraph::supergraph (logger *logger)
 	    {
 	      gimple *stmt = gsi_stmt (gpi);
 	      m_stmt_to_node_t.put (stmt, node_for_stmts);
-	      stmt->uid = next_uid++;
+	      m_stmt_uids.make_uid_unique (stmt);
 	    }
 
 	  /* Append statements from BB to the current supernode, splitting
@@ -139,7 +181,7 @@ supergraph::supergraph (logger *logger)
 	      gimple *stmt = gsi_stmt (gsi);
 	      node_for_stmts->m_stmts.safe_push (stmt);
 	      m_stmt_to_node_t.put (stmt, node_for_stmts);
-	      stmt->uid = next_uid++;
+	      m_stmt_uids.make_uid_unique (stmt);
 	      if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
 		{
 		  m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
@@ -257,6 +299,13 @@ supergraph::supergraph (logger *logger)
   }
 }
 
+/* supergraph's dtor.  Reset stmt uids.  */
+
+supergraph::~supergraph ()
+{
+  m_stmt_uids.restore_uids ();
+}
+
 /* Dump this graph in .dot format to PP, using DUMP_ARGS.
    Cluster the supernodes by function, then by BB from original CFG.  */
 
diff --git a/gcc/analyzer/supergraph.h b/gcc/analyzer/supergraph.h
index 5d1268e555f..f4090fd5e0e 100644
--- a/gcc/analyzer/supergraph.h
+++ b/gcc/analyzer/supergraph.h
@@ -79,6 +79,18 @@ struct supergraph_traits
   typedef supercluster cluster_t;
 };
 
+/* A class to manage the setting and restoring of statement uids.  */
+
+class saved_uids
+{
+public:
+  void make_uid_unique (gimple *stmt);
+  void restore_uids () const;
+
+private:
+  auto_vec<std::pair<gimple *, unsigned> > m_old_stmt_uids;
+};
+
 /* A "supergraph" is a directed graph formed by joining together all CFGs,
    linking them via interprocedural call and return edges.
 
@@ -90,6 +102,7 @@ class supergraph : public digraph<supergraph_traits>
 {
 public:
   supergraph (logger *logger);
+  ~supergraph ();
 
   supernode *get_node_for_function_entry (function *fun) const
   {
@@ -205,6 +218,8 @@ private:
 
   typedef hash_map<const function *, unsigned> function_to_num_snodes_t;
   function_to_num_snodes_t m_function_to_num_snodes;
+
+  saved_uids m_stmt_uids;
 };
 
 /* A node within a supergraph.  */
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
new file mode 100644
index 00000000000..2bbf37b0e6e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
@@ -0,0 +1,8 @@
+/* { dg-do link } */
+/* { dg-require-effective-target lto } */
+/* { dg-additional-options "-Os -flto" } */
+/* { dg-additional-sources pr98599-b.c } */
+
+int b(int x);
+int a() { b(5); }
+int main() { a(); }
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
new file mode 100644
index 00000000000..cfdeb3bf134
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
@@ -0,0 +1 @@
+int b(int x) { return x; }
-- 
2.26.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [committed] gimple UIDs, LTO and -fanalyzer [PR98599]
  2021-04-13  1:24           ` [committed] " David Malcolm
@ 2021-04-13  5:46             ` Jan Hubicka
  2021-04-13  6:08               ` Jan Hubicka
  2021-04-15  9:45             ` Jan Hubicka
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Hubicka @ 2021-04-13  5:46 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

Hello
> > Thanks.
> > 
> > I think my earlier analysis was wrong.
Sorry for late reply.  I was looking into it again yesterday but was bit
confused about what is goin gon here.
> > 
> > With the caveat that I'm not as familiar with the IPA code as other
> > parts of the compiler, what I think is going on is:
> > 
> > WPA streams in the input callgraph:
> > 
> >   main/1 -------------------> a/0  ----------------------> b/2
> >                  |                           |
> >           lto_stmt_uid == 1          lto_stmt_uid == 1
> > 
> > WPA generates a clone of a, and streams out this callgraph:
> > 
> >   main/1 -------------------> a.isra/3  ----------------------> b/2
> > 
> > Without -fanalyzer, the callgraph edges have NULL call_stmt and so
> > the
> > existing lto_stmt_uid values are used when streaming out:
> > 
> >   main/1 -------------------> a.isra/3  ----------------------> b/2
> >                  |                           |
> >          lto_stmt_uid == 1          lto_stmt_uid == 1
> > 
> > 
> > With -fanalyzer, the functions were materialized, so the callgraph
> > edges have non-NULL call-stmt.
> > 
> > lto_prepare_function_for_streaming is called for "main/1", but not
> > for
> > "a.isra/3", as it is a clone (see the condition at line 310):
> > 
> >     307   /* Do body modifications needed for streaming before we
> > fork out
> >     308      worker processes.  */
> >     309   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
> >     310     if (!node->clone_of && gimple_has_body_p (node->decl))
> >     311       lto_prepare_function_for_streaming (node);
> > 
> > Hence the uids are written to within main/1, but not within a.isra/3
> > (I
> > think).

This is still correct. The idea here is that clone does not have its own
body but shares it with the original function until it gets materialized
and then it is turned to real function itself.

At WPA time some functions are loaded into memory while others are just
rerefences to LTO data.
So either
  1) we have function referencing LTO data and then the function
     and all its clones should use lto_stmt_uid and stmt pointers should
     be NULL or 
  2) we have function loaded in memory and the function and all its
     clones should have lto_stmt_uid undefined and stmt points used.

Since you load function to memory we should end up using stmt pointers.
Since there is only one body, you want to call
lto_prepare_function_for_streaming just once - for the function and not
for its clones.
> > Hence this callgraph is written out if the functions bodies were
> > loaded:
> > 
> >   main/1 -------------------> a.isra/3  ----------------------> b/2
> >                  |                           |
> >          call_stmt->uid == 0           call_stmt->uid == 2
> >         so lto_stmt_uid == 1          so lto_stmt_uid == 3
> > 
> > 
> > At the ltrans stream-in, it reads in the edges, and then
> > calls fixup_call_stmt_edges_1 on both the original node *and* any
> > clones:
> > 
> >         1237        fixup_call_stmt_edges_1 (orig, stmts, fn);
> >         1238      if (orig->clones)
> >         1239        for (node = orig->clones; node != orig;)
> >         1240          {
> >         1241            if (!node->thunk)
> >         1242              fixup_call_stmt_edges_1 (node, stmts, fn);

This is also intended - if you load the body and to maintain the
invariant above, you want to change the representation of all edges
(including those in clones)
> > 
> > WPA has renumbered the stmts in non-clone functions, but has not
> > renumbered them for clones.

And this is where things seem to be wrong - since function body is in
memory at WPA time, the uids sould be cleared by fixup_call_stmt_edges
at WPA time and thus edge->call_stmt != 0 for all callgraph edges in the
function and all its clones.
Consequently while streaming out we should use:
uid = !edge->call_stmt ? edge->lto_stmt_uid
		       : gimple_uid (edge->call_stmt) + 1;
So all the uids should go from gimple_uid (edge->call_stmt) and those
all points to the call statements that has been renumbered
by lto_prepare_function.  So I do not see how these are going out of
sync.
> > 
> > Note that fixup_call_stmt_edges passes the same array of stmts to
> > both
> > calls to fixup_call_stmt_edges_1 (both the original and the clones)
> > i.e. it's trying to fixup the edges for "a.isra/3" with the stmts in
> > "a/0".   Should it be doing this?  "stmts" is constructed in
Yes, since these are same call statements (there is only one gimple body
around).
> > input_function by walking the BBs of cfun.  Is that going to find the
> > stmts of the clones?  (sorry, I'm not familiar with how clones work
> > within our IPA implementation).
> > 
> > fixup_call_stmt_edges_1 attempts to fixup the call edge from a.isra/3
> > to b/2, which has lto_stmt_uid == 3 and thus stmt->uid == 2, but
> > there
> > are only 2 stmts within the function "a/0", hence it's one past the
> > end
> > of the array, and fails with:
> > test.c:3:5: fatal error: Cgraph edge statement index out of range
> > 
> > I wonder if this can happen any time WPA loads a function body that
> > makes a call that then gets cloned; it just happens that -fanalyzer
> > is
> > loading the function body.

I guess one needs to figure out how WPA end up streaming different
indexes.  I will try to de bug that.

Honza
> 
> It's still not clear to me what's going on with clones in the above,
> e.g. why we don't fix up clones in lto_prepare_function_for_streaming.
> However, given that the analyzer part of the patch fixes things and
> seems low risk, I've gone ahead and committed the following simpler
> version of the patch to trunk for gcc 11, to unbreak the combination
> of -fanalyzer and -flto for the case where a function gets cloned.
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> Pushed to trunk as r11-8142-g17f3c2b8ac477b07ca0aafbc7d74ba305dc1ee33.
> 
> Here's the commit message and patch:
> 
> gimple.h has this comment for gimple's uid field:
> 
>   /* UID of this statement.  This is used by passes that want to
>      assign IDs to statements.  It must be assigned and used by each
>      pass.  By default it should be assumed to contain garbage.  */
>   unsigned uid;
> 
> and gimple_set_uid has:
> 
>    Please note that this UID property is supposed to be undefined at
>    pass boundaries.  This means that a given pass should not assume it
>    contains any useful value when the pass starts and thus can set it
>    to any value it sees fit.
> 
> which suggests that any pass can use the uid field as an arbitrary
> scratch space.
> 
> PR analyzer/98599 reports a case where this error occurs in LTO mode:
>   fatal error: Cgraph edge statement index out of range
> on certain inputs with -fanalyzer.
> 
> The error occurs in the LTRANS phase after -fanalyzer runs in the
> WPA phase.  The analyzer pass writes to the uid fields of all stmts.
> 
> The error occurs when LTRANS is streaming callgraph edges back in.
> The LTO format uses stmt uids to associate call stmts with callgraph
> edges between WPA and LTRANS.
> For example, in lto-cgraph.c, lto_output_edge writes out the
> gimple_uid, and input_edge reads it back in.
> 
> lto_prepare_function_for_streaming has code to renumber the stmt UIDs
> when the code is streamed back out, but for some reason this isn't
> called for clones:
>     307	  /* Do body modifications needed for streaming before we fork out
>     308	     worker processes.  */
>     309	  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
>     310	    if (!node->clone_of && gimple_has_body_p (node->decl))
>     311	      lto_prepare_function_for_streaming (node);
> 
> Hence the combination of -fanalyzer and -flto will fail in LTRANS's
> stream-in if any function clones are encountered.
> 
> It's not fully clear to me why this isn't done for clones, and what the
> correct fix should be to allow arbitrary changes to uids within WPA
> passes.
> 
> In the meantime, this patch works around the issue by updating the
> analyzer to save and restore the UIDs, fixing the error.
> 
> gcc/analyzer/ChangeLog:
> 	PR analyzer/98599
> 	* supergraph.cc (saved_uids::make_uid_unique): New.
> 	(saved_uids::restore_uids): New.
> 	(supergraph::supergraph): Replace assignments to stmt->uid with
> 	calls to m_stmt_uids.make_uid_unique.
> 	(supergraph::~supergraph): New.
> 	* supergraph.h (class saved_uids): New.
> 	(supergraph::~supergraph): New decl.
> 	(supergraph::m_stmt_uids): New field.
> 
> gcc/testsuite/ChangeLog:
> 	PR analyzer/98599
> 	* gcc.dg/analyzer/pr98599-a.c: New test.
> 	* gcc.dg/analyzer/pr98599-b.c: New test.
> ---
>  gcc/analyzer/supergraph.cc                | 57 +++++++++++++++++++++--
>  gcc/analyzer/supergraph.h                 | 15 ++++++
>  gcc/testsuite/gcc.dg/analyzer/pr98599-a.c |  8 ++++
>  gcc/testsuite/gcc.dg/analyzer/pr98599-b.c |  1 +
>  4 files changed, 77 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> 
> diff --git a/gcc/analyzer/supergraph.cc b/gcc/analyzer/supergraph.cc
> index 4b934568db6..8611d0f8689 100644
> --- a/gcc/analyzer/supergraph.cc
> +++ b/gcc/analyzer/supergraph.cc
> @@ -87,6 +87,50 @@ supergraph_call_edge (function *fun, gimple *stmt)
>    return edge;
>  }
>  
> +/* class saved_uids.
> +
> +   In order to ensure consistent results without relying on the ordering
> +   of pointer values we assign a uid to each gimple stmt, globally unique
> +   across all functions.
> +
> +   Normally, the stmt uids are a scratch space that each pass can freely
> +   assign its own values to.  However, in the case of LTO, the uids are
> +   used to associate call stmts with callgraph edges between the WPA phase
> +   (where the analyzer runs in LTO mode) and the LTRANS phase; if the
> +   analyzer changes them in the WPA phase, it leads to errors when
> +   streaming the code back in at LTRANS.
> +   lto_prepare_function_for_streaming has code to renumber the stmt UIDs
> +   when the code is streamed back out, but for some reason this isn't
> +   called for clones.
> +
> +   Hence, as a workaround, this class has responsibility for tracking
> +   the original uids and restoring them once the pass is complete
> +   (in the supergraph dtor).  */
> +
> +/* Give STMT a globally unique uid, storing its original uid so it can
> +   later be restored.  */
> +
> +void
> +saved_uids::make_uid_unique (gimple *stmt)
> +{
> +  unsigned next_uid = m_old_stmt_uids.length ();
> +  unsigned old_stmt_uid = stmt->uid;
> +  stmt->uid = next_uid;
> +  m_old_stmt_uids.safe_push
> +    (std::pair<gimple *, unsigned> (stmt, old_stmt_uid));
> +}
> +
> +/* Restore the saved uids of all stmts.  */
> +
> +void
> +saved_uids::restore_uids () const
> +{
> +  unsigned i;
> +  std::pair<gimple *, unsigned> *pair;
> +  FOR_EACH_VEC_ELT (m_old_stmt_uids, i, pair)
> +    pair->first->uid = pair->second;
> +}
> +
>  /* supergraph's ctor.  Walk the callgraph, building supernodes for each
>     CFG basic block, splitting the basic blocks at callsites.  Join
>     together the supernodes with interprocedural and intraprocedural
> @@ -101,8 +145,6 @@ supergraph::supergraph (logger *logger)
>  
>    /* First pass: make supernodes (and assign UIDs to the gimple stmts).  */
>    {
> -    unsigned next_uid = 0;
> -
>      /* Sort the cgraph_nodes?  */
>      cgraph_node *node;
>      FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
> @@ -127,7 +169,7 @@ supergraph::supergraph (logger *logger)
>  	    {
>  	      gimple *stmt = gsi_stmt (gpi);
>  	      m_stmt_to_node_t.put (stmt, node_for_stmts);
> -	      stmt->uid = next_uid++;
> +	      m_stmt_uids.make_uid_unique (stmt);
>  	    }
>  
>  	  /* Append statements from BB to the current supernode, splitting
> @@ -139,7 +181,7 @@ supergraph::supergraph (logger *logger)
>  	      gimple *stmt = gsi_stmt (gsi);
>  	      node_for_stmts->m_stmts.safe_push (stmt);
>  	      m_stmt_to_node_t.put (stmt, node_for_stmts);
> -	      stmt->uid = next_uid++;
> +	      m_stmt_uids.make_uid_unique (stmt);
>  	      if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
>  		{
>  		  m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
> @@ -257,6 +299,13 @@ supergraph::supergraph (logger *logger)
>    }
>  }
>  
> +/* supergraph's dtor.  Reset stmt uids.  */
> +
> +supergraph::~supergraph ()
> +{
> +  m_stmt_uids.restore_uids ();
> +}
> +
>  /* Dump this graph in .dot format to PP, using DUMP_ARGS.
>     Cluster the supernodes by function, then by BB from original CFG.  */
>  
> diff --git a/gcc/analyzer/supergraph.h b/gcc/analyzer/supergraph.h
> index 5d1268e555f..f4090fd5e0e 100644
> --- a/gcc/analyzer/supergraph.h
> +++ b/gcc/analyzer/supergraph.h
> @@ -79,6 +79,18 @@ struct supergraph_traits
>    typedef supercluster cluster_t;
>  };
>  
> +/* A class to manage the setting and restoring of statement uids.  */
> +
> +class saved_uids
> +{
> +public:
> +  void make_uid_unique (gimple *stmt);
> +  void restore_uids () const;
> +
> +private:
> +  auto_vec<std::pair<gimple *, unsigned> > m_old_stmt_uids;
> +};
> +
>  /* A "supergraph" is a directed graph formed by joining together all CFGs,
>     linking them via interprocedural call and return edges.
>  
> @@ -90,6 +102,7 @@ class supergraph : public digraph<supergraph_traits>
>  {
>  public:
>    supergraph (logger *logger);
> +  ~supergraph ();
>  
>    supernode *get_node_for_function_entry (function *fun) const
>    {
> @@ -205,6 +218,8 @@ private:
>  
>    typedef hash_map<const function *, unsigned> function_to_num_snodes_t;
>    function_to_num_snodes_t m_function_to_num_snodes;
> +
> +  saved_uids m_stmt_uids;
>  };
>  
>  /* A node within a supergraph.  */
> diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> new file mode 100644
> index 00000000000..2bbf37b0e6e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
> @@ -0,0 +1,8 @@
> +/* { dg-do link } */
> +/* { dg-require-effective-target lto } */
> +/* { dg-additional-options "-Os -flto" } */
> +/* { dg-additional-sources pr98599-b.c } */
> +
> +int b(int x);
> +int a() { b(5); }
> +int main() { a(); }
> diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> new file mode 100644
> index 00000000000..cfdeb3bf134
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
> @@ -0,0 +1 @@
> +int b(int x) { return x; }
> -- 
> 2.26.2
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [committed] gimple UIDs, LTO and -fanalyzer [PR98599]
  2021-04-13  5:46             ` Jan Hubicka
@ 2021-04-13  6:08               ` Jan Hubicka
  2021-04-13 18:21                 ` David Malcolm
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Hubicka @ 2021-04-13  6:08 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

Hi,
stepping through the streaming process it turns out to be funny
difference between gimple_has_body and node->has_gimple_body_p.
While the first tests whether gimple body really exists in memory (by
looking for DECL_STRUCT_FUNCTION) the second tests if gimple body can be
made available via node->get_body (so gimple_body_p returns 1).

Now what happens is that function clone is created and the original
function becomes dead and thus is marked as external declaration. At
this point we can not get the body via node->get_body, but body is still
around since analyzer read it. This makes the renumbering loop to miss
this particular body.  Since for node->clone_of we have
 gimple_body_p (node->clone_of->decl) == true
and
 node->has_gimple_body_p (node->clone_of) == false
While for node we have
 node->has_gimple_body_p (node) == true

This is both correct but next stage1 we really ought to rename one of
predicates. Sadly I can not think of much better names though.

I am testing. 

diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index ceb61bb300b..5903f75ac23 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -306,7 +306,7 @@ lto_wpa_write_files (void)
   cgraph_node *node;
   /* Do body modifications needed for streaming before we fork out
      worker processes.  */
-  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
+  FOR_EACH_FUNCTION (node)
     if (!node->clone_of && gimple_has_body_p (node->decl))
       lto_prepare_function_for_streaming (node);
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [committed] gimple UIDs, LTO and -fanalyzer [PR98599]
  2021-04-13  6:08               ` Jan Hubicka
@ 2021-04-13 18:21                 ` David Malcolm
  2021-04-13 18:36                   ` David Malcolm
  0 siblings, 1 reply; 14+ messages in thread
From: David Malcolm @ 2021-04-13 18:21 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches

On Tue, 2021-04-13 at 08:08 +0200, Jan Hubicka wrote:
> Hi,
> stepping through the streaming process it turns out to be funny
> difference between gimple_has_body and node->has_gimple_body_p.
> While the first tests whether gimple body really exists in memory (by
> looking for DECL_STRUCT_FUNCTION) the second tests if gimple body can
> be
> made available via node->get_body (so gimple_body_p returns 1).
> 
> Now what happens is that function clone is created and the original
> function becomes dead and thus is marked as external declaration. At
> this point we can not get the body via node->get_body, but body is
> still
> around since analyzer read it. This makes the renumbering loop to
> miss
> this particular body.  Since for node->clone_of we have
>  gimple_body_p (node->clone_of->decl) == true
> and
>  node->has_gimple_body_p (node->clone_of) == false
> While for node we have
>  node->has_gimple_body_p (node) == true
> 
> This is both correct but next stage1 we really ought to rename one of
> predicates. Sadly I can not think of much better names though.
> 
> I am testing. 
> 
> diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> index ceb61bb300b..5903f75ac23 100644
> --- a/gcc/lto/lto.c
> +++ b/gcc/lto/lto.c
> @@ -306,7 +306,7 @@ lto_wpa_write_files (void)
>    cgraph_node *node;
>    /* Do body modifications needed for streaming before we fork out
>       worker processes.  */
> -  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
> +  FOR_EACH_FUNCTION (node)
>      if (!node->clone_of && gimple_has_body_p (node->decl))
>        lto_prepare_function_for_streaming (node);
> 

Thanks for digging into it.


FWIW the above patch seems to fix my reproducer (if I remove the uid
restoration logic from my patch).  But maybe leave it to next stage 1,
and keep my workaround for now?

On the subject of the analyzer and LTO, I should note that the analyzer
somewhat abuses the framework by doing all of the analysis together in
WPA, rather than trying to storing summaries from WPA to allow sharding
of the analysis in LTRANS.  The analyzer currently only has a
placeholder implementation of call summarization; it's something I hope
to look at for GCC 12:

Updating it further to properly split things between WPA and LTRANS
would be additional work on top of that.

Dave


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [committed] gimple UIDs, LTO and -fanalyzer [PR98599]
  2021-04-13 18:21                 ` David Malcolm
@ 2021-04-13 18:36                   ` David Malcolm
  0 siblings, 0 replies; 14+ messages in thread
From: David Malcolm @ 2021-04-13 18:36 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches

On Tue, 2021-04-13 at 14:21 -0400, David Malcolm via Gcc-patches wrote:
> On Tue, 2021-04-13 at 08:08 +0200, Jan Hubicka wrote:
> > Hi,
> > stepping through the streaming process it turns out to be funny
> > difference between gimple_has_body and node->has_gimple_body_p.
> > While the first tests whether gimple body really exists in memory
> > (by
> > looking for DECL_STRUCT_FUNCTION) the second tests if gimple body
> > can
> > be
> > made available via node->get_body (so gimple_body_p returns 1).
> > 
> > Now what happens is that function clone is created and the original
> > function becomes dead and thus is marked as external declaration.
> > At
> > this point we can not get the body via node->get_body, but body is
> > still
> > around since analyzer read it. This makes the renumbering loop to
> > miss
> > this particular body.  Since for node->clone_of we have
> >  gimple_body_p (node->clone_of->decl) == true
> > and
> >  node->has_gimple_body_p (node->clone_of) == false
> > While for node we have
> >  node->has_gimple_body_p (node) == true
> > 
> > This is both correct but next stage1 we really ought to rename one
> > of
> > predicates. Sadly I can not think of much better names though.
> > 
> > I am testing. 
> > 
> > diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> > index ceb61bb300b..5903f75ac23 100644
> > --- a/gcc/lto/lto.c
> > +++ b/gcc/lto/lto.c
> > @@ -306,7 +306,7 @@ lto_wpa_write_files (void)
> >    cgraph_node *node;
> >    /* Do body modifications needed for streaming before we fork out
> >       worker processes.  */
> > -  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
> > +  FOR_EACH_FUNCTION (node)
> >      if (!node->clone_of && gimple_has_body_p (node->decl))
> >        lto_prepare_function_for_streaming (node);
> > 
> 
> Thanks for digging into it.
> 
> 
> FWIW the above patch seems to fix my reproducer (if I remove the uid
> restoration logic from my patch).  But maybe leave it to next stage
> 1,
> and keep my workaround for now?
> 
> On the subject of the analyzer and LTO, I should note that the
> analyzer
> somewhat abuses the framework by doing all of the analysis together
> in
> WPA, rather than trying to storing summaries from WPA to allow
> sharding
> of the analysis in LTRANS.  The analyzer currently only has a
> placeholder implementation of call summarization; it's something I
> hope
> to look at for GCC 12:

FWIW I meant to paste:
  https://gcc.gnu.org/bugzilla/showdependencytree.cgi?id=99390
here, ("[meta-bug] tracker bug for call summaries in -fanalyzer")

> Updating it further to properly split things between WPA and LTRANS
> would be additional work on top of that.
> 
> Dave
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [committed] gimple UIDs, LTO and -fanalyzer [PR98599]
  2021-04-13  1:24           ` [committed] " David Malcolm
  2021-04-13  5:46             ` Jan Hubicka
@ 2021-04-15  9:45             ` Jan Hubicka
  2021-04-15 12:17               ` David Malcolm
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Hubicka @ 2021-04-15  9:45 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

Hi,
this is patch fixing the underlying issue of function missing
lto_prepare_function_for_streaming because gimple_has_body_p is not the
same thing as node.has_gimple_body (which needs to be clarified next
stage1 by finding better names for this I suppose).

I commited it to gcc 11 even though we already have your workaround
since it is small and safe and it may save some pain when backporting
changes to the branch in future - basically all passes at WPA
renumbering statements would hit this issue which is not that obvious to
debug as we found :)

We may backport it to gcc10 too if you preffer it over your fix - I
think both are fine in general for release branches.

lto-bootstrapped/regtested x86_64-linux.

Honza

2021-04-15  Jan Hubicka  <hubicka@ucw.cz>

	PR lto/98599
	* lto.c (lto_wpa_write_files): Fix handling of clones.

diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index ceb61bb300b..5903f75ac23 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -306,7 +306,7 @@ lto_wpa_write_files (void)
   cgraph_node *node;
   /* Do body modifications needed for streaming before we fork out
      worker processes.  */
-  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
+  FOR_EACH_FUNCTION (node)
     if (!node->clone_of && gimple_has_body_p (node->decl))
       lto_prepare_function_for_streaming (node);
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [committed] gimple UIDs, LTO and -fanalyzer [PR98599]
  2021-04-15  9:45             ` Jan Hubicka
@ 2021-04-15 12:17               ` David Malcolm
  0 siblings, 0 replies; 14+ messages in thread
From: David Malcolm @ 2021-04-15 12:17 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches

On Thu, 2021-04-15 at 11:45 +0200, Jan Hubicka wrote:
> Hi,
> this is patch fixing the underlying issue of function missing
> lto_prepare_function_for_streaming because gimple_has_body_p is not
> the
> same thing as node.has_gimple_body (which needs to be clarified next
> stage1 by finding better names for this I suppose).
> 
> I commited it to gcc 11 even though we already have your workaround
> since it is small and safe and it may save some pain when backporting
> changes to the branch in future - basically all passes at WPA
> renumbering statements would hit this issue which is not that obvious
> to
> debug as we found :)
> 

I think it's just the analyzer that's affected in gcc 11 (and plugins,
I suppose), hence I went with the localized fix, but it's your call.


> We may backport it to gcc10 too if you preffer it over your fix - I
> think both are fine in general for release branches.
> 

The analyzer started changing stmt uids in gcc 11 (specifically in
b0702ac5588333e27d7ec43d21d704521f7a05c6, on 2020-10-27), so I think
the fix would only affect plugins in older releases.

Dave


> lto-bootstrapped/regtested x86_64-linux.
> 
> Honza
> 
> 2021-04-15  Jan Hubicka  <hubicka@ucw.cz>
> 
>         PR lto/98599
>         * lto.c (lto_wpa_write_files): Fix handling of clones.
> 
> diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> index ceb61bb300b..5903f75ac23 100644
> --- a/gcc/lto/lto.c
> +++ b/gcc/lto/lto.c
> @@ -306,7 +306,7 @@ lto_wpa_write_files (void)
>    cgraph_node *node;
>    /* Do body modifications needed for streaming before we fork out
>       worker processes.  */
> -  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
> +  FOR_EACH_FUNCTION (node)
>      if (!node->clone_of && gimple_has_body_p (node->decl))
>        lto_prepare_function_for_streaming (node);
>  
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-04-15 12:17 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-13 22:03 [PATCH] gimple UIDs, LTO and -fanalyzer [PR98599] David Malcolm
2021-01-14  7:30 ` Richard Biener
2021-01-14 14:00   ` Jan Hubicka
2021-01-21 17:15     ` David Malcolm
2021-01-21 19:09       ` Jan Hubicka
2021-01-21 19:35         ` David Malcolm
2021-01-22  3:07         ` David Malcolm
2021-04-13  1:24           ` [committed] " David Malcolm
2021-04-13  5:46             ` Jan Hubicka
2021-04-13  6:08               ` Jan Hubicka
2021-04-13 18:21                 ` David Malcolm
2021-04-13 18:36                   ` David Malcolm
2021-04-15  9:45             ` Jan Hubicka
2021-04-15 12:17               ` David Malcolm

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).