public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Fix PR c++/70452 (regression in C++ parsing performance)
@ 2016-04-01 19:13 Patrick Palka
  2016-04-01 19:25 ` Patrick Palka
  2016-04-02  1:28 ` Jason Merrill
  0 siblings, 2 replies; 11+ messages in thread
From: Patrick Palka @ 2016-04-01 19:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: jason, Patrick Palka

Currently during constexpr CALL_EXPR evaluation we create a new copy of
the callee function's body for each separate call with no attempt made
at reusing the function body.  So when a function ends up getting called
10s of thousands of times durnig constexpr evaluuation, we end up
creating 10s of thousands of copies of the function.

This patch is an attempt at reusing these copies instead of having a new
copy get created each call.  It implements a per-function freelist of
the unshared body/parm/res copies that were created by copy_fn().  When
a constexpr call is finished it pushes the body/parm/res trees to the
freelist, and before a call is evaluated it tries to reuse the trees
from the freelist, falling back to copy_fn() only if the freelist is
empty.

Consequently, instead of creating 10s of thousands of copies for 10s of
thousands of calls, the number of copies that're made corresponds to the
recursion depth of the function.  This makes sense because the reason we
create copies in the first place (IIUC) is to ensure that recursive
calls to a function do not share the same
PARM_DECLs/VAR_DECLs/RESULT_DECLs.

In order for this reuse to work properly in the presence of SAVE_EXPRs,
we have to track each SAVE_EXPR (belonging to the callee) that we
evaluate during the call and then forget its value after the call.
constexpr-vla4.C is a new test that would fail if this patch had missed
this SAVE_EXPR part.

With this patch, the compile time of the test case in the PR gets cut
from 3.5s to 2s and memory usage gets cut from 550M to 200M.

I bootstrapped + regtested this change on x86_64-pc-linux-gnu, and also
compile-tested it against boost with no regressions.

gcc/cp/ChangeLog:

	PR c++/70452
	* constexpr.c (struct bpr_entry): New struct.
	(struct constexpr_fundef): New field bpr_freelist.
	(retrieve_constexpr_fundef): Initialize field bpr_freelist to
	NULL.
	(register_constexpr_fundef): Likewise.
	(cxx_eval_call_expression): Check constexpr_fundef::bpr_freelist
	for a reusable copy of the function's body before creating a new
	copy.  Keep track of the callee's SAVE_EXPRs that we evaluate
	and forget their values afterwards. Push the function body we
	used onto the freelist for later reuse.

gcc/testsuite/ChangeLog:

	PR c++/70452
	* g++.dg/ext/constexpr-vla4.C: New test.
---
 gcc/cp/constexpr.c                        | 55 ++++++++++++++++++++++++++++---
 gcc/testsuite/g++.dg/ext/constexpr-vla4.C | 17 ++++++++++
 2 files changed, 67 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/constexpr-vla4.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index ea605dc..fc9b67c 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -112,11 +112,24 @@ ensure_literal_type_for_constexpr_object (tree decl)
   return decl;
 }
 
+/* The representation of a single node in the constexpr_fundef::bpr_freelist chain.  */
+
+struct GTY((chain_next ("%h.prev"))) bpr_entry
+{
+  tree body;
+  tree parms;
+  tree res;
+  struct bpr_entry *prev;
+};
+
 /* Representation of entries in the constexpr function definition table.  */
 
 struct GTY((for_user)) constexpr_fundef {
   tree decl;
   tree body;
+  /* A chain of unused copies of this function's body awaiting reuse for
+     CALL_EXPR evaluation.  */
+  struct bpr_entry *bpr_freelist;
 };
 
 struct constexpr_fundef_hasher : ggc_ptr_hash<constexpr_fundef>
@@ -154,7 +167,7 @@ constexpr_fundef_hasher::hash (constexpr_fundef *fundef)
 static constexpr_fundef *
 retrieve_constexpr_fundef (tree fun)
 {
-  constexpr_fundef fundef = { NULL, NULL };
+  constexpr_fundef fundef = { NULL, NULL, NULL };
   if (constexpr_fundef_table == NULL)
     return NULL;
 
@@ -806,6 +819,7 @@ register_constexpr_fundef (tree fun, tree body)
 
   entry.decl = fun;
   entry.body = body;
+  entry.bpr_freelist = NULL;
   slot = constexpr_fundef_table->find_slot (&entry, INSERT);
 
   gcc_assert (*slot == NULL);
@@ -1377,10 +1391,21 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
       if (!result || result == error_mark_node)
 	{
 	  gcc_assert (DECL_SAVED_TREE (fun));
-	  tree parms, res;
+	  tree body, parms, res;
 
-	  /* Unshare the whole function body.  */
-	  tree body = copy_fn (fun, parms, res);
+	  /* Reuse or create a new unshared copy of this function's body.  */
+	  if (bpr_entry *bpr = new_call.fundef->bpr_freelist)
+	    {
+	      body = bpr->body;
+	      parms = bpr->parms;
+	      res = bpr->res;
+	      new_call.fundef->bpr_freelist = bpr->prev;
+	    }
+	  else
+	    {
+	      /* Unshare the whole function body.  */
+	      body = copy_fn (fun, parms, res);
+	    }
 
 	  /* Associate the bindings with the remapped parms.  */
 	  tree bound = new_call.bindings;
@@ -1409,11 +1434,22 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
 	  else
 	    ctx->values->put (res, NULL_TREE);
 
+	  /* Track the SAVE_EXPRs of the callee that we evaluate so that
+	     we can later forget their values.  */
+	  constexpr_ctx ctx_with_save_exprs = *ctx;
+	  hash_set<tree> save_exprs;
+	  ctx_with_save_exprs.save_exprs = &save_exprs;
+
 	  tree jump_target = NULL_TREE;
-	  cxx_eval_constant_expression (ctx, body,
+	  cxx_eval_constant_expression (&ctx_with_save_exprs, body,
 					lval, non_constant_p, overflow_p,
 					&jump_target);
 
+	  /* Forget the saved values of the callee's SAVE_EXPRs.  */
+	  for (hash_set<tree>::iterator iter = save_exprs.begin();
+	       iter != save_exprs.end(); ++iter)
+	    ctx_with_save_exprs.values->remove (*iter);
+
 	  if (DECL_CONSTRUCTOR_P (fun))
 	    /* This can be null for a subobject constructor call, in
 	       which case what we care about is the initialization
@@ -1444,6 +1480,15 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
 	    ctx->values->remove (slot);
 	  for (tree parm = parms; parm; parm = TREE_CHAIN (parm))
 	    ctx->values->remove (parm);
+
+	  /* Make the unshared function body that we evaluated available for
+	     reuse by a later call to cxx_eval_call_expression.  */
+	  bpr_entry *bpr = ggc_alloc<bpr_entry> ();
+	  bpr->body = body;
+	  bpr->parms = parms;
+	  bpr->res = res;
+	  bpr->prev = new_call.fundef->bpr_freelist;
+	  new_call.fundef->bpr_freelist = bpr;
 	}
 
       if (result == error_mark_node)
diff --git a/gcc/testsuite/g++.dg/ext/constexpr-vla4.C b/gcc/testsuite/g++.dg/ext/constexpr-vla4.C
new file mode 100644
index 0000000..428a8fd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/constexpr-vla4.C
@@ -0,0 +1,17 @@
+// PR c++/70452
+// { dg-do compile { target c++14 } }
+
+constexpr int
+foo (int n, bool p)
+{
+  __extension__ int a [n] = { 0 };
+  if (n == 3)
+    foo (n - 2, false);
+  if (n == 3)
+    foo(n - 1, true);
+  if (p)
+    return a[1];
+  return 0;
+}
+
+constexpr int i2 = foo (3, false); // { dg-bogus "array subscript out of bound" }
-- 
2.8.0.rc3.27.gade0865

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix PR c++/70452 (regression in C++ parsing performance)
  2016-04-01 19:13 [PATCH] Fix PR c++/70452 (regression in C++ parsing performance) Patrick Palka
@ 2016-04-01 19:25 ` Patrick Palka
  2016-04-02  1:28 ` Jason Merrill
  1 sibling, 0 replies; 11+ messages in thread
From: Patrick Palka @ 2016-04-01 19:25 UTC (permalink / raw)
  To: GCC Patches; +Cc: Jason Merrill, Patrick Palka

On Fri, Apr 1, 2016 at 3:13 PM, Patrick Palka <patrick@parcs.ath.cx> wrote:
> Currently during constexpr CALL_EXPR evaluation we create a new copy of
> the callee function's body for each separate call with no attempt made
> at reusing the function body.  So when a function ends up getting called
> 10s of thousands of times durnig constexpr evaluuation, we end up
> creating 10s of thousands of copies of the function.
>
> This patch is an attempt at reusing these copies instead of having a new
> copy get created each call.  It implements a per-function freelist of
> the unshared body/parm/res copies that were created by copy_fn().  When
> a constexpr call is finished it pushes the body/parm/res trees to the
> freelist, and before a call is evaluated it tries to reuse the trees
> from the freelist, falling back to copy_fn() only if the freelist is
> empty.
>
> Consequently, instead of creating 10s of thousands of copies for 10s of
> thousands of calls, the number of copies that're made corresponds to the
> recursion depth of the function.  This makes sense because the reason we
> create copies in the first place (IIUC) is to ensure that recursive
> calls to a function do not share the same
> PARM_DECLs/VAR_DECLs/RESULT_DECLs.
>
> In order for this reuse to work properly in the presence of SAVE_EXPRs,
> we have to track each SAVE_EXPR (belonging to the callee) that we
> evaluate during the call and then forget its value after the call.
> constexpr-vla4.C is a new test that would fail if this patch had missed
> this SAVE_EXPR part.
>
> With this patch, the compile time of the test case in the PR gets cut
> from 3.5s to 2s and memory usage gets cut from 550M to 200M.
>
> I bootstrapped + regtested this change on x86_64-pc-linux-gnu, and also
> compile-tested it against boost with no regressions.
>
> gcc/cp/ChangeLog:
>
>         PR c++/70452
>         * constexpr.c (struct bpr_entry): New struct.
>         (struct constexpr_fundef): New field bpr_freelist.
>         (retrieve_constexpr_fundef): Initialize field bpr_freelist to
>         NULL.
>         (register_constexpr_fundef): Likewise.
>         (cxx_eval_call_expression): Check constexpr_fundef::bpr_freelist
>         for a reusable copy of the function's body before creating a new
>         copy.  Keep track of the callee's SAVE_EXPRs that we evaluate
>         and forget their values afterwards. Push the function body we
>         used onto the freelist for later reuse.
>
> gcc/testsuite/ChangeLog:
>
>         PR c++/70452
>         * g++.dg/ext/constexpr-vla4.C: New test.
> ---
>  gcc/cp/constexpr.c                        | 55 ++++++++++++++++++++++++++++---
>  gcc/testsuite/g++.dg/ext/constexpr-vla4.C | 17 ++++++++++
>  2 files changed, 67 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ext/constexpr-vla4.C
>
> diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
> index ea605dc..fc9b67c 100644
> --- a/gcc/cp/constexpr.c
> +++ b/gcc/cp/constexpr.c
> @@ -112,11 +112,24 @@ ensure_literal_type_for_constexpr_object (tree decl)
>    return decl;
>  }
>
> +/* The representation of a single node in the constexpr_fundef::bpr_freelist chain.  */
> +
> +struct GTY((chain_next ("%h.prev"))) bpr_entry
> +{
> +  tree body;
> +  tree parms;
> +  tree res;
> +  struct bpr_entry *prev;
> +};
> +
>  /* Representation of entries in the constexpr function definition table.  */
>
>  struct GTY((for_user)) constexpr_fundef {
>    tree decl;
>    tree body;
> +  /* A chain of unused copies of this function's body awaiting reuse for
> +     CALL_EXPR evaluation.  */
> +  struct bpr_entry *bpr_freelist;
>  };
>
>  struct constexpr_fundef_hasher : ggc_ptr_hash<constexpr_fundef>
> @@ -154,7 +167,7 @@ constexpr_fundef_hasher::hash (constexpr_fundef *fundef)
>  static constexpr_fundef *
>  retrieve_constexpr_fundef (tree fun)
>  {
> -  constexpr_fundef fundef = { NULL, NULL };
> +  constexpr_fundef fundef = { NULL, NULL, NULL };
>    if (constexpr_fundef_table == NULL)
>      return NULL;
>
> @@ -806,6 +819,7 @@ register_constexpr_fundef (tree fun, tree body)
>
>    entry.decl = fun;
>    entry.body = body;
> +  entry.bpr_freelist = NULL;
>    slot = constexpr_fundef_table->find_slot (&entry, INSERT);
>
>    gcc_assert (*slot == NULL);
> @@ -1377,10 +1391,21 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
>        if (!result || result == error_mark_node)
>         {
>           gcc_assert (DECL_SAVED_TREE (fun));
> -         tree parms, res;
> +         tree body, parms, res;
>
> -         /* Unshare the whole function body.  */
> -         tree body = copy_fn (fun, parms, res);
> +         /* Reuse or create a new unshared copy of this function's body.  */
> +         if (bpr_entry *bpr = new_call.fundef->bpr_freelist)
> +           {
> +             body = bpr->body;
> +             parms = bpr->parms;
> +             res = bpr->res;
> +             new_call.fundef->bpr_freelist = bpr->prev;
> +           }
> +         else
> +           {
> +             /* Unshare the whole function body.  */
> +             body = copy_fn (fun, parms, res);
> +           }
>
>           /* Associate the bindings with the remapped parms.  */
>           tree bound = new_call.bindings;
> @@ -1409,11 +1434,22 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
>           else
>             ctx->values->put (res, NULL_TREE);
>
> +         /* Track the SAVE_EXPRs of the callee that we evaluate so that
> +            we can later forget their values.  */
> +         constexpr_ctx ctx_with_save_exprs = *ctx;
> +         hash_set<tree> save_exprs;
> +         ctx_with_save_exprs.save_exprs = &save_exprs;

Didn't like the way this comment is worded so I changed it to

+         /* Track the callee's evaluated SAVE_EXPRs so that we can forget
+            their values after the call.  */

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix PR c++/70452 (regression in C++ parsing performance)
  2016-04-01 19:13 [PATCH] Fix PR c++/70452 (regression in C++ parsing performance) Patrick Palka
  2016-04-01 19:25 ` Patrick Palka
@ 2016-04-02  1:28 ` Jason Merrill
  2016-04-02  1:55   ` Patrick Palka
  2016-04-02 21:18   ` Patrick Palka
  1 sibling, 2 replies; 11+ messages in thread
From: Jason Merrill @ 2016-04-02  1:28 UTC (permalink / raw)
  To: Patrick Palka, gcc-patches

I like this approach a lot.  One thing, though:

On 04/01/2016 03:13 PM, Patrick Palka wrote:
> +struct GTY((chain_next ("%h.prev"))) bpr_entry
> +{
> +  tree body;
> +  tree parms;
> +  tree res;
> +  struct bpr_entry *prev;
> +};
> +
>   /* Representation of entries in the constexpr function definition table.  */
>
>   struct GTY((for_user)) constexpr_fundef {
>     tree decl;
>     tree body;
> +  /* A chain of unused copies of this function's body awaiting reuse for
> +     CALL_EXPR evaluation.  */
> +  struct bpr_entry *bpr_freelist;
>   };

The freelist should be discarded on GC.  I think the way to achieve that 
is to put all the freelists into a separate deletable hash table rather 
than hang them off the fundefs.

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix PR c++/70452 (regression in C++ parsing performance)
  2016-04-02  1:28 ` Jason Merrill
@ 2016-04-02  1:55   ` Patrick Palka
  2016-04-02  2:10     ` Patrick Palka
  2016-04-02 21:18   ` Patrick Palka
  1 sibling, 1 reply; 11+ messages in thread
From: Patrick Palka @ 2016-04-02  1:55 UTC (permalink / raw)
  To: Jason Merrill; +Cc: GCC Patches

On Fri, Apr 1, 2016 at 9:28 PM, Jason Merrill <jason@redhat.com> wrote:
> I like this approach a lot.  One thing, though:
>
> On 04/01/2016 03:13 PM, Patrick Palka wrote:
>>
>> +struct GTY((chain_next ("%h.prev"))) bpr_entry
>> +{
>> +  tree body;
>> +  tree parms;
>> +  tree res;
>> +  struct bpr_entry *prev;
>> +};
>> +
>>   /* Representation of entries in the constexpr function definition table.
>> */
>>
>>   struct GTY((for_user)) constexpr_fundef {
>>     tree decl;
>>     tree body;
>> +  /* A chain of unused copies of this function's body awaiting reuse for
>> +     CALL_EXPR evaluation.  */
>> +  struct bpr_entry *bpr_freelist;
>>   };
>
>
> The freelist should be discarded on GC.  I think the way to achieve that is
> to put all the freelists into a separate deletable hash table rather than
> hang them off the fundefs.
>
> Jason
>

Would instead marking the bpr_freelist field as GTY ((skip)) have the
same effect?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix PR c++/70452 (regression in C++ parsing performance)
  2016-04-02  1:55   ` Patrick Palka
@ 2016-04-02  2:10     ` Patrick Palka
  0 siblings, 0 replies; 11+ messages in thread
From: Patrick Palka @ 2016-04-02  2:10 UTC (permalink / raw)
  To: Jason Merrill; +Cc: GCC Patches

On Fri, Apr 1, 2016 at 9:54 PM, Patrick Palka <patrick@parcs.ath.cx> wrote:
> On Fri, Apr 1, 2016 at 9:28 PM, Jason Merrill <jason@redhat.com> wrote:
>> I like this approach a lot.  One thing, though:
>>
>> On 04/01/2016 03:13 PM, Patrick Palka wrote:
>>>
>>> +struct GTY((chain_next ("%h.prev"))) bpr_entry
>>> +{
>>> +  tree body;
>>> +  tree parms;
>>> +  tree res;
>>> +  struct bpr_entry *prev;
>>> +};
>>> +
>>>   /* Representation of entries in the constexpr function definition table.
>>> */
>>>
>>>   struct GTY((for_user)) constexpr_fundef {
>>>     tree decl;
>>>     tree body;
>>> +  /* A chain of unused copies of this function's body awaiting reuse for
>>> +     CALL_EXPR evaluation.  */
>>> +  struct bpr_entry *bpr_freelist;
>>>   };
>>
>>
>> The freelist should be discarded on GC.  I think the way to achieve that is
>> to put all the freelists into a separate deletable hash table rather than
>> hang them off the fundefs.
>>
>> Jason
>>
>
> Would instead marking the bpr_freelist field as GTY ((skip)) have the
> same effect?

Looks like not according to
https://gcc.gnu.org/onlinedocs/gccint/GTY-Options.html.  I should have
read that first :)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix PR c++/70452 (regression in C++ parsing performance)
  2016-04-02  1:28 ` Jason Merrill
  2016-04-02  1:55   ` Patrick Palka
@ 2016-04-02 21:18   ` Patrick Palka
  2016-04-03  3:52     ` Trevor Saunders
  2016-04-04 15:39     ` Jason Merrill
  1 sibling, 2 replies; 11+ messages in thread
From: Patrick Palka @ 2016-04-02 21:18 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Patrick Palka, gcc-patches

On Fri, 1 Apr 2016, Jason Merrill wrote:

> I like this approach a lot.  One thing, though:
> 
> On 04/01/2016 03:13 PM, Patrick Palka wrote:
> > +struct GTY((chain_next ("%h.prev"))) bpr_entry
> > +{
> > +  tree body;
> > +  tree parms;
> > +  tree res;
> > +  struct bpr_entry *prev;
> > +};
> > +
> >   /* Representation of entries in the constexpr function definition table.
> > */
> > 
> >   struct GTY((for_user)) constexpr_fundef {
> >     tree decl;
> >     tree body;
> > +  /* A chain of unused copies of this function's body awaiting reuse for
> > +     CALL_EXPR evaluation.  */
> > +  struct bpr_entry *bpr_freelist;
> >   };
> 
> The freelist should be discarded on GC.  I think the way to achieve that is to
> put all the freelists into a separate deletable hash table rather than hang
> them off the fundefs.
> 
> Jason
> 
> 

Here's a version that uses a separate deletable table to cache the
function copies.  For simplicity I used a hash_map instead of a
hash_table.  Does this look OK to commit after bootstrap + regtest?

On a related note, I noticed that the constexpr_call_table is not marked
deletable.  Marking it deletable speeds up the test case in the PR by
about 10% and saves about 10MB.  Do you think doing so is a good idea?

On another related note, I noticed that marking something as both
GTY((deletable, cache)) doesn't work as intended, because the
gt_cleare_cache functions run _after_ all deletable roots get
zeroed out.  So during GC the gt_cleare_cache function of a root
marked "deletable, cache" would always be a no-op.  Concretely I think
this means that our cv_cache and fold_cache leak memory during GC
because their underlying hash_map (allocated by operator new) is zeroed
before gc_cleare_cache could clear it.

gcc/cp/ChangeLog:

	PR c++/70452
	* constexpr.c (struct fun_copy): New struct.
	(struct fundef_copies_t): New struct.
	(fundef_copies_table): New static variable.
	(maybe_initialize_fundef_copies_table): New static function.
	(get_fun_copy): New static function.
	(save_fun_copy): New static function.
	(cxx_eval_call_expression): Use
	maybe_initialize_fundef_copies_table, get_fun_copy, and
	save_fun_copy.

gcc/testsuite/ChangeLog:

	PR c++/70452
	* g++.dg/ext/constexpr-vla4.C: New test.
---
 gcc/cp/constexpr.c                        | 96 +++++++++++++++++++++++++++++--
 gcc/testsuite/g++.dg/ext/constexpr-vla4.C | 17 ++++++
 2 files changed, 109 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/constexpr-vla4.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index ea605dc..e1a5a4e 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -965,6 +965,76 @@ maybe_initialize_constexpr_call_table (void)
     constexpr_call_table = hash_table<constexpr_call_hasher>::create_ggc (101);
 }
 
+/* The representation of a single node in the per-function freelist maintained
+   by FUNDEF_COPIES_TABLE.  */
+
+struct fun_copy
+{
+  tree body;
+  tree parms;
+  tree res;
+  fun_copy *prev;
+};
+
+/* During constexpr CALL_EXPR evaluation, to avoid issues with sharing when
+   a function happens to get called recursively, we unshare the callee
+   function's body and evaluate this unshared copy instead of evaluating the
+   original body.
+
+   FUNDEF_COPIES_TABLE is a per-function freelist of these unshared function
+   copies.  The underlying data structure of FUNDEF_COPIES_TABLE is a hash_map
+   that's keyed off of the original FUNCTION_DECL and whose value is the chain
+   of this function's unused copies awaiting reuse.  */
+
+struct fundef_copies_table_t
+{
+  hash_map<tree, fun_copy *> *map;
+};
+
+static GTY((deletable)) fundef_copies_table_t fundef_copies_table;
+
+/* Initialize FUNDEF_COPIES_TABLE if it's not initialized.  */
+
+static void
+maybe_initialize_fundef_copies_table ()
+{
+  if (fundef_copies_table.map == NULL)
+    fundef_copies_table.map = hash_map<tree, fun_copy *>::create_ggc (101);
+}
+
+/* Reuse or create a new unshared copy of the function FUN.
+   Return this copy.  */
+
+static fun_copy *
+get_fun_copy (tree fun)
+{
+  fun_copy *copy;
+  fun_copy **slot = &fundef_copies_table.map->get_or_insert (fun, NULL);
+  if (*slot == NULL)
+    {
+      copy = ggc_alloc<fun_copy> ();
+      copy->body = copy_fn (fun, copy->parms, copy->res);
+      copy->prev = NULL;
+    }
+  else
+    {
+      copy = *slot;
+      *slot = (*slot)->prev;
+    }
+
+  return copy;
+}
+
+/* Save the copy COPY of function FUN for later reuse by get_fun_copy().  */
+
+static void
+save_fun_copy (tree fun, fun_copy *copy)
+{
+  fun_copy **slot = &fundef_copies_table.map->get_or_insert (fun, NULL);
+  copy->prev = *slot;
+  *slot = copy;
+}
+
 /* We have an expression tree T that represents a call, either CALL_EXPR
    or AGGR_INIT_EXPR.  If the call is lexically to a named function,
    retrun the _DECL for that function.  */
@@ -1377,10 +1447,14 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
       if (!result || result == error_mark_node)
 	{
 	  gcc_assert (DECL_SAVED_TREE (fun));
-	  tree parms, res;
+	  tree body, parms, res;
 
-	  /* Unshare the whole function body.  */
-	  tree body = copy_fn (fun, parms, res);
+	  /* Reuse or create a new unshared copy of this function's body.  */
+	  maybe_initialize_fundef_copies_table ();
+	  fun_copy *copy = get_fun_copy (fun);
+	  body = copy->body;
+	  parms = copy->parms;
+	  res = copy->res;
 
 	  /* Associate the bindings with the remapped parms.  */
 	  tree bound = new_call.bindings;
@@ -1409,8 +1483,14 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
 	  else
 	    ctx->values->put (res, NULL_TREE);
 
+	  /* Track the callee's evaluated SAVE_EXPRs so that we can forget
+	     their values after the call.  */
+	  constexpr_ctx ctx_with_save_exprs = *ctx;
+	  hash_set<tree> save_exprs;
+	  ctx_with_save_exprs.save_exprs = &save_exprs;
+
 	  tree jump_target = NULL_TREE;
-	  cxx_eval_constant_expression (ctx, body,
+	  cxx_eval_constant_expression (&ctx_with_save_exprs, body,
 					lval, non_constant_p, overflow_p,
 					&jump_target);
 
@@ -1435,6 +1515,11 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
 		}
 	    }
 
+	  /* Forget the saved values of the callee's SAVE_EXPRs.  */
+	  for (hash_set<tree>::iterator iter = save_exprs.begin();
+	       iter != save_exprs.end(); ++iter)
+	    ctx_with_save_exprs.values->remove (*iter);
+
 	  /* Remove the parms/result from the values map.  Is it worth
 	     bothering to do this when the map itself is only live for
 	     one constexpr evaluation?  If so, maybe also clear out
@@ -1444,6 +1529,9 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
 	    ctx->values->remove (slot);
 	  for (tree parm = parms; parm; parm = TREE_CHAIN (parm))
 	    ctx->values->remove (parm);
+
+	  /* Make the unshared function copy we used available for re-use.  */
+	  save_fun_copy (fun, copy);
 	}
 
       if (result == error_mark_node)
diff --git a/gcc/testsuite/g++.dg/ext/constexpr-vla4.C b/gcc/testsuite/g++.dg/ext/constexpr-vla4.C
new file mode 100644
index 0000000..428a8fd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/constexpr-vla4.C
@@ -0,0 +1,17 @@
+// PR c++/70452
+// { dg-do compile { target c++14 } }
+
+constexpr int
+foo (int n, bool p)
+{
+  __extension__ int a [n] = { 0 };
+  if (n == 3)
+    foo (n - 2, false);
+  if (n == 3)
+    foo(n - 1, true);
+  if (p)
+    return a[1];
+  return 0;
+}
+
+constexpr int i2 = foo (3, false); // { dg-bogus "array subscript out of bound" }
-- 
2.8.0.rc3.27.gade0865

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix PR c++/70452 (regression in C++ parsing performance)
  2016-04-02 21:18   ` Patrick Palka
@ 2016-04-03  3:52     ` Trevor Saunders
  2016-04-03 22:06       ` Patrick Palka
  2016-04-04 15:39     ` Jason Merrill
  1 sibling, 1 reply; 11+ messages in thread
From: Trevor Saunders @ 2016-04-03  3:52 UTC (permalink / raw)
  To: Patrick Palka; +Cc: Jason Merrill, gcc-patches

On Sat, Apr 02, 2016 at 05:18:31PM -0400, Patrick Palka wrote:
> On Fri, 1 Apr 2016, Jason Merrill wrote:
> 
> > I like this approach a lot.  One thing, though:
> > 
> > On 04/01/2016 03:13 PM, Patrick Palka wrote:
> > > +struct GTY((chain_next ("%h.prev"))) bpr_entry
> > > +{
> > > +  tree body;
> > > +  tree parms;
> > > +  tree res;
> > > +  struct bpr_entry *prev;
> > > +};
> > > +
> > >   /* Representation of entries in the constexpr function definition table.
> > > */
> > > 
> > >   struct GTY((for_user)) constexpr_fundef {
> > >     tree decl;
> > >     tree body;
> > > +  /* A chain of unused copies of this function's body awaiting reuse for
> > > +     CALL_EXPR evaluation.  */
> > > +  struct bpr_entry *bpr_freelist;
> > >   };
> > 
> > The freelist should be discarded on GC.  I think the way to achieve that is to
> > put all the freelists into a separate deletable hash table rather than hang
> > them off the fundefs.
> > 
> > Jason
> > 
> > 
> 
> Here's a version that uses a separate deletable table to cache the
> function copies.  For simplicity I used a hash_map instead of a
> hash_table.  Does this look OK to commit after bootstrap + regtest?
> 
> On a related note, I noticed that the constexpr_call_table is not marked
> deletable.  Marking it deletable speeds up the test case in the PR by
> about 10% and saves about 10MB.  Do you think doing so is a good idea?
> 
> On another related note, I noticed that marking something as both
> GTY((deletable, cache)) doesn't work as intended, because the
> gt_cleare_cache functions run _after_ all deletable roots get
> zeroed out.  So during GC the gt_cleare_cache function of a root
> marked "deletable, cache" would always be a no-op.  Concretely I think
> this means that our cv_cache and fold_cache leak memory during GC
> because their underlying hash_map (allocated by operator new) is zeroed
> before gc_cleare_cache could clear it.

yeah, I think that's true.  I'm not really sure what the expected point
of making something both cache and deletable is maybe we should just ban
it?  The way this ties in with the gc is... wierd ;) but I'm not really
sure how I'd fix it.

Trev

> 
> gcc/cp/ChangeLog:
> 
> 	PR c++/70452
> 	* constexpr.c (struct fun_copy): New struct.
> 	(struct fundef_copies_t): New struct.
> 	(fundef_copies_table): New static variable.
> 	(maybe_initialize_fundef_copies_table): New static function.
> 	(get_fun_copy): New static function.
> 	(save_fun_copy): New static function.
> 	(cxx_eval_call_expression): Use
> 	maybe_initialize_fundef_copies_table, get_fun_copy, and
> 	save_fun_copy.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR c++/70452
> 	* g++.dg/ext/constexpr-vla4.C: New test.
> ---
>  gcc/cp/constexpr.c                        | 96 +++++++++++++++++++++++++++++--
>  gcc/testsuite/g++.dg/ext/constexpr-vla4.C | 17 ++++++
>  2 files changed, 109 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ext/constexpr-vla4.C
> 
> diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
> index ea605dc..e1a5a4e 100644
> --- a/gcc/cp/constexpr.c
> +++ b/gcc/cp/constexpr.c
> @@ -965,6 +965,76 @@ maybe_initialize_constexpr_call_table (void)
>      constexpr_call_table = hash_table<constexpr_call_hasher>::create_ggc (101);
>  }
>  
> +/* The representation of a single node in the per-function freelist maintained
> +   by FUNDEF_COPIES_TABLE.  */
> +
> +struct fun_copy
> +{
> +  tree body;
> +  tree parms;
> +  tree res;
> +  fun_copy *prev;
> +};
> +
> +/* During constexpr CALL_EXPR evaluation, to avoid issues with sharing when
> +   a function happens to get called recursively, we unshare the callee
> +   function's body and evaluate this unshared copy instead of evaluating the
> +   original body.
> +
> +   FUNDEF_COPIES_TABLE is a per-function freelist of these unshared function
> +   copies.  The underlying data structure of FUNDEF_COPIES_TABLE is a hash_map
> +   that's keyed off of the original FUNCTION_DECL and whose value is the chain
> +   of this function's unused copies awaiting reuse.  */
> +
> +struct fundef_copies_table_t
> +{
> +  hash_map<tree, fun_copy *> *map;
> +};
> +
> +static GTY((deletable)) fundef_copies_table_t fundef_copies_table;
> +
> +/* Initialize FUNDEF_COPIES_TABLE if it's not initialized.  */
> +
> +static void
> +maybe_initialize_fundef_copies_table ()
> +{
> +  if (fundef_copies_table.map == NULL)
> +    fundef_copies_table.map = hash_map<tree, fun_copy *>::create_ggc (101);
> +}
> +
> +/* Reuse or create a new unshared copy of the function FUN.
> +   Return this copy.  */
> +
> +static fun_copy *
> +get_fun_copy (tree fun)
> +{
> +  fun_copy *copy;
> +  fun_copy **slot = &fundef_copies_table.map->get_or_insert (fun, NULL);
> +  if (*slot == NULL)
> +    {
> +      copy = ggc_alloc<fun_copy> ();
> +      copy->body = copy_fn (fun, copy->parms, copy->res);
> +      copy->prev = NULL;
> +    }
> +  else
> +    {
> +      copy = *slot;
> +      *slot = (*slot)->prev;
> +    }
> +
> +  return copy;
> +}
> +
> +/* Save the copy COPY of function FUN for later reuse by get_fun_copy().  */
> +
> +static void
> +save_fun_copy (tree fun, fun_copy *copy)
> +{
> +  fun_copy **slot = &fundef_copies_table.map->get_or_insert (fun, NULL);
> +  copy->prev = *slot;
> +  *slot = copy;
> +}
> +
>  /* We have an expression tree T that represents a call, either CALL_EXPR
>     or AGGR_INIT_EXPR.  If the call is lexically to a named function,
>     retrun the _DECL for that function.  */
> @@ -1377,10 +1447,14 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
>        if (!result || result == error_mark_node)
>  	{
>  	  gcc_assert (DECL_SAVED_TREE (fun));
> -	  tree parms, res;
> +	  tree body, parms, res;
>  
> -	  /* Unshare the whole function body.  */
> -	  tree body = copy_fn (fun, parms, res);
> +	  /* Reuse or create a new unshared copy of this function's body.  */
> +	  maybe_initialize_fundef_copies_table ();
> +	  fun_copy *copy = get_fun_copy (fun);
> +	  body = copy->body;
> +	  parms = copy->parms;
> +	  res = copy->res;
>  
>  	  /* Associate the bindings with the remapped parms.  */
>  	  tree bound = new_call.bindings;
> @@ -1409,8 +1483,14 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
>  	  else
>  	    ctx->values->put (res, NULL_TREE);
>  
> +	  /* Track the callee's evaluated SAVE_EXPRs so that we can forget
> +	     their values after the call.  */
> +	  constexpr_ctx ctx_with_save_exprs = *ctx;
> +	  hash_set<tree> save_exprs;
> +	  ctx_with_save_exprs.save_exprs = &save_exprs;
> +
>  	  tree jump_target = NULL_TREE;
> -	  cxx_eval_constant_expression (ctx, body,
> +	  cxx_eval_constant_expression (&ctx_with_save_exprs, body,
>  					lval, non_constant_p, overflow_p,
>  					&jump_target);
>  
> @@ -1435,6 +1515,11 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
>  		}
>  	    }
>  
> +	  /* Forget the saved values of the callee's SAVE_EXPRs.  */
> +	  for (hash_set<tree>::iterator iter = save_exprs.begin();
> +	       iter != save_exprs.end(); ++iter)
> +	    ctx_with_save_exprs.values->remove (*iter);
> +
>  	  /* Remove the parms/result from the values map.  Is it worth
>  	     bothering to do this when the map itself is only live for
>  	     one constexpr evaluation?  If so, maybe also clear out
> @@ -1444,6 +1529,9 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
>  	    ctx->values->remove (slot);
>  	  for (tree parm = parms; parm; parm = TREE_CHAIN (parm))
>  	    ctx->values->remove (parm);
> +
> +	  /* Make the unshared function copy we used available for re-use.  */
> +	  save_fun_copy (fun, copy);
>  	}
>  
>        if (result == error_mark_node)
> diff --git a/gcc/testsuite/g++.dg/ext/constexpr-vla4.C b/gcc/testsuite/g++.dg/ext/constexpr-vla4.C
> new file mode 100644
> index 0000000..428a8fd
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ext/constexpr-vla4.C
> @@ -0,0 +1,17 @@
> +// PR c++/70452
> +// { dg-do compile { target c++14 } }
> +
> +constexpr int
> +foo (int n, bool p)
> +{
> +  __extension__ int a [n] = { 0 };
> +  if (n == 3)
> +    foo (n - 2, false);
> +  if (n == 3)
> +    foo(n - 1, true);
> +  if (p)
> +    return a[1];
> +  return 0;
> +}
> +
> +constexpr int i2 = foo (3, false); // { dg-bogus "array subscript out of bound" }
> -- 
> 2.8.0.rc3.27.gade0865
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix PR c++/70452 (regression in C++ parsing performance)
  2016-04-03  3:52     ` Trevor Saunders
@ 2016-04-03 22:06       ` Patrick Palka
  0 siblings, 0 replies; 11+ messages in thread
From: Patrick Palka @ 2016-04-03 22:06 UTC (permalink / raw)
  To: Trevor Saunders; +Cc: Jason Merrill, GCC Patches

On Sat, Apr 2, 2016 at 11:57 PM, Trevor Saunders <tbsaunde@tbsaunde.org> wrote:
> On Sat, Apr 02, 2016 at 05:18:31PM -0400, Patrick Palka wrote:
>> On Fri, 1 Apr 2016, Jason Merrill wrote:
>>
>> > I like this approach a lot.  One thing, though:
>> >
>> > On 04/01/2016 03:13 PM, Patrick Palka wrote:
>> > > +struct GTY((chain_next ("%h.prev"))) bpr_entry
>> > > +{
>> > > +  tree body;
>> > > +  tree parms;
>> > > +  tree res;
>> > > +  struct bpr_entry *prev;
>> > > +};
>> > > +
>> > >   /* Representation of entries in the constexpr function definition table.
>> > > */
>> > >
>> > >   struct GTY((for_user)) constexpr_fundef {
>> > >     tree decl;
>> > >     tree body;
>> > > +  /* A chain of unused copies of this function's body awaiting reuse for
>> > > +     CALL_EXPR evaluation.  */
>> > > +  struct bpr_entry *bpr_freelist;
>> > >   };
>> >
>> > The freelist should be discarded on GC.  I think the way to achieve that is to
>> > put all the freelists into a separate deletable hash table rather than hang
>> > them off the fundefs.
>> >
>> > Jason
>> >
>> >
>>
>> Here's a version that uses a separate deletable table to cache the
>> function copies.  For simplicity I used a hash_map instead of a
>> hash_table.  Does this look OK to commit after bootstrap + regtest?
>>
>> On a related note, I noticed that the constexpr_call_table is not marked
>> deletable.  Marking it deletable speeds up the test case in the PR by
>> about 10% and saves about 10MB.  Do you think doing so is a good idea?
>>
>> On another related note, I noticed that marking something as both
>> GTY((deletable, cache)) doesn't work as intended, because the
>> gt_cleare_cache functions run _after_ all deletable roots get
>> zeroed out.  So during GC the gt_cleare_cache function of a root
>> marked "deletable, cache" would always be a no-op.  Concretely I think
>> this means that our cv_cache and fold_cache leak memory during GC
>> because their underlying hash_map (allocated by operator new) is zeroed
>> before gc_cleare_cache could clear it.
>
> yeah, I think that's true.  I'm not really sure what the expected point
> of making something both cache and deletable is maybe we should just ban
> it?  The way this ties in with the gc is... wierd ;) but I'm not really
> sure how I'd fix it.
>
> Trev

The intended effect can be achieved I think by marking the cache_map
as GTY((cache)) (so that the table can be cleared during GC) and then
marking the cache_map::map field as GTY((skip)) (so that PCH ignores
it).

>
>>
>> gcc/cp/ChangeLog:
>>
>>       PR c++/70452
>>       * constexpr.c (struct fun_copy): New struct.
>>       (struct fundef_copies_t): New struct.
>>       (fundef_copies_table): New static variable.
>>       (maybe_initialize_fundef_copies_table): New static function.
>>       (get_fun_copy): New static function.
>>       (save_fun_copy): New static function.
>>       (cxx_eval_call_expression): Use
>>       maybe_initialize_fundef_copies_table, get_fun_copy, and
>>       save_fun_copy.
>>
>> gcc/testsuite/ChangeLog:
>>
>>       PR c++/70452
>>       * g++.dg/ext/constexpr-vla4.C: New test.
>> ---
>>  gcc/cp/constexpr.c                        | 96 +++++++++++++++++++++++++++++--
>>  gcc/testsuite/g++.dg/ext/constexpr-vla4.C | 17 ++++++
>>  2 files changed, 109 insertions(+), 4 deletions(-)
>>  create mode 100644 gcc/testsuite/g++.dg/ext/constexpr-vla4.C
>>
>> diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
>> index ea605dc..e1a5a4e 100644
>> --- a/gcc/cp/constexpr.c
>> +++ b/gcc/cp/constexpr.c
>> @@ -965,6 +965,76 @@ maybe_initialize_constexpr_call_table (void)
>>      constexpr_call_table = hash_table<constexpr_call_hasher>::create_ggc (101);
>>  }
>>
>> +/* The representation of a single node in the per-function freelist maintained
>> +   by FUNDEF_COPIES_TABLE.  */
>> +
>> +struct fun_copy
>> +{
>> +  tree body;
>> +  tree parms;
>> +  tree res;
>> +  fun_copy *prev;
>> +};
>> +
>> +/* During constexpr CALL_EXPR evaluation, to avoid issues with sharing when
>> +   a function happens to get called recursively, we unshare the callee
>> +   function's body and evaluate this unshared copy instead of evaluating the
>> +   original body.
>> +
>> +   FUNDEF_COPIES_TABLE is a per-function freelist of these unshared function
>> +   copies.  The underlying data structure of FUNDEF_COPIES_TABLE is a hash_map
>> +   that's keyed off of the original FUNCTION_DECL and whose value is the chain
>> +   of this function's unused copies awaiting reuse.  */
>> +
>> +struct fundef_copies_table_t
>> +{
>> +  hash_map<tree, fun_copy *> *map;
>> +};
>> +
>> +static GTY((deletable)) fundef_copies_table_t fundef_copies_table;
>> +
>> +/* Initialize FUNDEF_COPIES_TABLE if it's not initialized.  */
>> +
>> +static void
>> +maybe_initialize_fundef_copies_table ()
>> +{
>> +  if (fundef_copies_table.map == NULL)
>> +    fundef_copies_table.map = hash_map<tree, fun_copy *>::create_ggc (101);
>> +}
>> +
>> +/* Reuse or create a new unshared copy of the function FUN.
>> +   Return this copy.  */
>> +
>> +static fun_copy *
>> +get_fun_copy (tree fun)
>> +{
>> +  fun_copy *copy;
>> +  fun_copy **slot = &fundef_copies_table.map->get_or_insert (fun, NULL);
>> +  if (*slot == NULL)
>> +    {
>> +      copy = ggc_alloc<fun_copy> ();
>> +      copy->body = copy_fn (fun, copy->parms, copy->res);
>> +      copy->prev = NULL;
>> +    }
>> +  else
>> +    {
>> +      copy = *slot;
>> +      *slot = (*slot)->prev;
>> +    }
>> +
>> +  return copy;
>> +}
>> +
>> +/* Save the copy COPY of function FUN for later reuse by get_fun_copy().  */
>> +
>> +static void
>> +save_fun_copy (tree fun, fun_copy *copy)
>> +{
>> +  fun_copy **slot = &fundef_copies_table.map->get_or_insert (fun, NULL);
>> +  copy->prev = *slot;
>> +  *slot = copy;
>> +}
>> +
>>  /* We have an expression tree T that represents a call, either CALL_EXPR
>>     or AGGR_INIT_EXPR.  If the call is lexically to a named function,
>>     retrun the _DECL for that function.  */
>> @@ -1377,10 +1447,14 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
>>        if (!result || result == error_mark_node)
>>       {
>>         gcc_assert (DECL_SAVED_TREE (fun));
>> -       tree parms, res;
>> +       tree body, parms, res;
>>
>> -       /* Unshare the whole function body.  */
>> -       tree body = copy_fn (fun, parms, res);
>> +       /* Reuse or create a new unshared copy of this function's body.  */
>> +       maybe_initialize_fundef_copies_table ();
>> +       fun_copy *copy = get_fun_copy (fun);
>> +       body = copy->body;
>> +       parms = copy->parms;
>> +       res = copy->res;
>>
>>         /* Associate the bindings with the remapped parms.  */
>>         tree bound = new_call.bindings;
>> @@ -1409,8 +1483,14 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
>>         else
>>           ctx->values->put (res, NULL_TREE);
>>
>> +       /* Track the callee's evaluated SAVE_EXPRs so that we can forget
>> +          their values after the call.  */
>> +       constexpr_ctx ctx_with_save_exprs = *ctx;
>> +       hash_set<tree> save_exprs;
>> +       ctx_with_save_exprs.save_exprs = &save_exprs;
>> +
>>         tree jump_target = NULL_TREE;
>> -       cxx_eval_constant_expression (ctx, body,
>> +       cxx_eval_constant_expression (&ctx_with_save_exprs, body,
>>                                       lval, non_constant_p, overflow_p,
>>                                       &jump_target);
>>
>> @@ -1435,6 +1515,11 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
>>               }
>>           }
>>
>> +       /* Forget the saved values of the callee's SAVE_EXPRs.  */
>> +       for (hash_set<tree>::iterator iter = save_exprs.begin();
>> +            iter != save_exprs.end(); ++iter)
>> +         ctx_with_save_exprs.values->remove (*iter);
>> +
>>         /* Remove the parms/result from the values map.  Is it worth
>>            bothering to do this when the map itself is only live for
>>            one constexpr evaluation?  If so, maybe also clear out
>> @@ -1444,6 +1529,9 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
>>           ctx->values->remove (slot);
>>         for (tree parm = parms; parm; parm = TREE_CHAIN (parm))
>>           ctx->values->remove (parm);
>> +
>> +       /* Make the unshared function copy we used available for re-use.  */
>> +       save_fun_copy (fun, copy);
>>       }
>>
>>        if (result == error_mark_node)
>> diff --git a/gcc/testsuite/g++.dg/ext/constexpr-vla4.C b/gcc/testsuite/g++.dg/ext/constexpr-vla4.C
>> new file mode 100644
>> index 0000000..428a8fd
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/ext/constexpr-vla4.C
>> @@ -0,0 +1,17 @@
>> +// PR c++/70452
>> +// { dg-do compile { target c++14 } }
>> +
>> +constexpr int
>> +foo (int n, bool p)
>> +{
>> +  __extension__ int a [n] = { 0 };
>> +  if (n == 3)
>> +    foo (n - 2, false);
>> +  if (n == 3)
>> +    foo(n - 1, true);
>> +  if (p)
>> +    return a[1];
>> +  return 0;
>> +}
>> +
>> +constexpr int i2 = foo (3, false); // { dg-bogus "array subscript out of bound" }
>> --
>> 2.8.0.rc3.27.gade0865
>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix PR c++/70452 (regression in C++ parsing performance)
  2016-04-02 21:18   ` Patrick Palka
  2016-04-03  3:52     ` Trevor Saunders
@ 2016-04-04 15:39     ` Jason Merrill
  2016-04-04 16:55       ` Patrick Palka
  1 sibling, 1 reply; 11+ messages in thread
From: Jason Merrill @ 2016-04-04 15:39 UTC (permalink / raw)
  To: Patrick Palka; +Cc: gcc-patches

On 04/02/2016 05:18 PM, Patrick Palka wrote:
> Here's a version that uses a separate deletable table to cache the
> function copies.  For simplicity I used a hash_map instead of a
> hash_table.  Does this look OK to commit after bootstrap + regtest?

Thanks.  Minor nits:

 > +struct fundef_copies_table_t
 > +{
 > +  hash_map<tree, fun_copy *> *map;
 > +};

Why wrap the pointer in a struct?

> +	  maybe_initialize_fundef_copies_table ();
> +	  fun_copy *copy = get_fun_copy (fun);

Let's move the initialization call inside get_fun_copy.

> On a related note, I noticed that the constexpr_call_table is not marked
> deletable.  Marking it deletable speeds up the test case in the PR by
> about 10% and saves about 10MB.  Do you think doing so is a good idea?

Please.

> On another related note, I noticed that marking something as both
> GTY((deletable, cache)) doesn't work as intended, because the
> gt_cleare_cache functions run _after_ all deletable roots get
> zeroed out.  So during GC the gt_cleare_cache function of a root
> marked "deletable, cache" would always be a no-op.  Concretely I think
> this means that our cv_cache and fold_cache leak memory during GC
> because their underlying hash_map (allocated by operator new) is zeroed
> before gc_cleare_cache could clear it.

Hmm, I thought I remembered hitting the breakpoint in gt_cleare_cache 
and it being non-null.  But I guess we can get rid of the cache_map 
class and use the approach you have here, of a deletable gc-allocated 
hash_map pointer; I'd still use ->empty() for dumping the cache outside 
of GC, though.

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix PR c++/70452 (regression in C++ parsing performance)
  2016-04-04 15:39     ` Jason Merrill
@ 2016-04-04 16:55       ` Patrick Palka
  2016-04-05 13:06         ` Jason Merrill
  0 siblings, 1 reply; 11+ messages in thread
From: Patrick Palka @ 2016-04-04 16:55 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Patrick Palka, gcc-patches

On Mon, 4 Apr 2016, Jason Merrill wrote:

> On 04/02/2016 05:18 PM, Patrick Palka wrote:
> > Here's a version that uses a separate deletable table to cache the
> > function copies.  For simplicity I used a hash_map instead of a
> > hash_table.  Does this look OK to commit after bootstrap + regtest?
> 
> Thanks.  Minor nits:
> 
> > +struct fundef_copies_table_t
> > +{
> > +  hash_map<tree, fun_copy *> *map;
> > +};
> 
> Why wrap the pointer in a struct?

If I don't wrap the pointer then I get really weird link errors.  If the
following line is added to constexpr.c:

  static GTY((deletable)) hash_map<tree, fundef_copy *> *blah;

then the final link fails with dozens of these undefined reference errors:

  /home/patrick/code/gcc/gcc/hash-map.h:68: undefined reference to `gt_pch_nx(tree_node*&)'
  /home/patrick/code/gcc/gcc/hash-map.h:61: undefined reference to `gt_ggc_mx(tree_node*&)'
  /home/patrick/code/gcc/gcc/hash-map.h:62: undefined reference to `gt_ggc_mx(tree_node*&)'
  ...

Seems to only happen when the value type of the hash_map is something other
than "tree".  This strangeness can be conveniently avoided by wrapping the
hash_map.


> 
> > +	  maybe_initialize_fundef_copies_table ();
> > +	  fun_copy *copy = get_fun_copy (fun);
> 
> Let's move the initialization call inside get_fun_copy.

Done.

> 
> > On a related note, I noticed that the constexpr_call_table is not marked
> > deletable.  Marking it deletable speeds up the test case in the PR by
> > about 10% and saves about 10MB.  Do you think doing so is a good idea?
> 
> Please.

Done.

> 
> > On another related note, I noticed that marking something as both
> > GTY((deletable, cache)) doesn't work as intended, because the
> > gt_cleare_cache functions run _after_ all deletable roots get
> > zeroed out.  So during GC the gt_cleare_cache function of a root
> > marked "deletable, cache" would always be a no-op.  Concretely I think
> > this means that our cv_cache and fold_cache leak memory during GC
> > because their underlying hash_map (allocated by operator new) is zeroed
> > before gc_cleare_cache could clear it.
> 
> Hmm, I thought I remembered hitting the breakpoint in gt_cleare_cache and it
> being non-null.  But I guess we can get rid of the cache_map class and use the
> approach you have here, of a deletable gc-allocated hash_map pointer; I'd
> still use ->empty() for dumping the cache outside of GC, though.

Could do this too in a subsequent patch.

Here's an updated patch that that adjusts the initialization call and marks
constexpr_call_table as deletable.  (Also fun_copy is renamed to fundef_copy
for consistency.)

-- >8 --

gcc/cp/ChangeLog:

	PR c++/70452
	* constexpr.c (struct fundef_copy): New struct.
	(struct fundef_copies_table_t): New struct.
	(fundef_copies_table): New static variable.
	(maybe_initialize_fundef_copies_table): New static function.
	(get_fundef_copy): New static function.
	(save_fundef_copy): New static function.
	(cxx_eval_call_expression): Use get_fundef_copy, and
	save_fundef_copy.
	(constexpr_call_table): Add "deletable" GTY marker.

gcc/testsuite/ChangeLog:

	PR c++/70452
	* g++.dg/ext/constexpr-vla4.C: New test.
---
 gcc/cp/constexpr.c                        | 99 +++++++++++++++++++++++++++++--
 gcc/testsuite/g++.dg/ext/constexpr-vla4.C | 17 ++++++
 2 files changed, 111 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/constexpr-vla4.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index b94b346..bcbf9bd 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -915,7 +915,7 @@ struct constexpr_ctx {
 /* A table of all constexpr calls that have been evaluated by the
    compiler in this translation unit.  */
 
-static GTY (()) hash_table<constexpr_call_hasher> *constexpr_call_table;
+static GTY ((deletable)) hash_table<constexpr_call_hasher> *constexpr_call_table;
 
 static tree cxx_eval_constant_expression (const constexpr_ctx *, tree,
 					  bool, bool *, bool *, tree * = NULL);
@@ -965,6 +965,78 @@ maybe_initialize_constexpr_call_table (void)
     constexpr_call_table = hash_table<constexpr_call_hasher>::create_ggc (101);
 }
 
+/* The representation of a single node in the per-function freelist maintained
+   by FUNDEF_COPIES_TABLE.  */
+
+struct fundef_copy
+{
+  tree body;
+  tree parms;
+  tree res;
+  fundef_copy *prev;
+};
+
+/* During constexpr CALL_EXPR evaluation, to avoid issues with sharing when
+   a function happens to get called recursively, we unshare the callee
+   function's body and evaluate this unshared copy instead of evaluating the
+   original body.
+
+   FUNDEF_COPIES_TABLE is a per-function freelist of these unshared function
+   copies.  The underlying data structure of FUNDEF_COPIES_TABLE is a hash_map
+   that's keyed off of the original FUNCTION_DECL and whose value is the chain
+   of this function's unused copies awaiting reuse.  */
+
+struct fundef_copies_table_t
+{
+  hash_map<tree, fundef_copy *> *map;
+};
+
+static GTY((deletable)) fundef_copies_table_t fundef_copies_table;
+
+/* Initialize FUNDEF_COPIES_TABLE if it's not initialized.  */
+
+static void
+maybe_initialize_fundef_copies_table ()
+{
+  if (fundef_copies_table.map == NULL)
+    fundef_copies_table.map = hash_map<tree, fundef_copy *>::create_ggc (101);
+}
+
+/* Reuse a copy or create a new unshared copy of the function FUN.
+   Return this copy.  */
+
+static fundef_copy *
+get_fundef_copy (tree fun)
+{
+  maybe_initialize_fundef_copies_table ();
+
+  fundef_copy *copy;
+  fundef_copy **slot = &fundef_copies_table.map->get_or_insert (fun, NULL);
+  if (*slot == NULL)
+    {
+      copy = ggc_alloc<fundef_copy> ();
+      copy->body = copy_fn (fun, copy->parms, copy->res);
+      copy->prev = NULL;
+    }
+  else
+    {
+      copy = *slot;
+      *slot = (*slot)->prev;
+    }
+
+  return copy;
+}
+
+/* Save the copy COPY of function FUN for later reuse by get_fundef_copy().  */
+
+static void
+save_fundef_copy (tree fun, fundef_copy *copy)
+{
+  fundef_copy **slot = &fundef_copies_table.map->get_or_insert (fun, NULL);
+  copy->prev = *slot;
+  *slot = copy;
+}
+
 /* We have an expression tree T that represents a call, either CALL_EXPR
    or AGGR_INIT_EXPR.  If the call is lexically to a named function,
    retrun the _DECL for that function.  */
@@ -1365,10 +1437,13 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
       if (!result || result == error_mark_node)
 	{
 	  gcc_assert (DECL_SAVED_TREE (fun));
-	  tree parms, res;
+	  tree body, parms, res;
 
-	  /* Unshare the whole function body.  */
-	  tree body = copy_fn (fun, parms, res);
+	  /* Reuse or create a new unshared copy of this function's body.  */
+	  fundef_copy *copy = get_fundef_copy (fun);
+	  body = copy->body;
+	  parms = copy->parms;
+	  res = copy->res;
 
 	  /* Associate the bindings with the remapped parms.  */
 	  tree bound = new_call.bindings;
@@ -1397,8 +1472,14 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
 	  else
 	    ctx->values->put (res, NULL_TREE);
 
+	  /* Track the callee's evaluated SAVE_EXPRs so that we can forget
+	     their values after the call.  */
+	  constexpr_ctx ctx_with_save_exprs = *ctx;
+	  hash_set<tree> save_exprs;
+	  ctx_with_save_exprs.save_exprs = &save_exprs;
+
 	  tree jump_target = NULL_TREE;
-	  cxx_eval_constant_expression (ctx, body,
+	  cxx_eval_constant_expression (&ctx_with_save_exprs, body,
 					lval, non_constant_p, overflow_p,
 					&jump_target);
 
@@ -1423,6 +1504,11 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
 		}
 	    }
 
+	  /* Forget the saved values of the callee's SAVE_EXPRs.  */
+	  for (hash_set<tree>::iterator iter = save_exprs.begin();
+	       iter != save_exprs.end(); ++iter)
+	    ctx_with_save_exprs.values->remove (*iter);
+
 	  /* Remove the parms/result from the values map.  Is it worth
 	     bothering to do this when the map itself is only live for
 	     one constexpr evaluation?  If so, maybe also clear out
@@ -1432,6 +1518,9 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
 	    ctx->values->remove (slot);
 	  for (tree parm = parms; parm; parm = TREE_CHAIN (parm))
 	    ctx->values->remove (parm);
+
+	  /* Make the unshared function copy we used available for re-use.  */
+	  save_fundef_copy (fun, copy);
 	}
 
       if (result == error_mark_node)
diff --git a/gcc/testsuite/g++.dg/ext/constexpr-vla4.C b/gcc/testsuite/g++.dg/ext/constexpr-vla4.C
new file mode 100644
index 0000000..428a8fd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/constexpr-vla4.C
@@ -0,0 +1,17 @@
+// PR c++/70452
+// { dg-do compile { target c++14 } }
+
+constexpr int
+foo (int n, bool p)
+{
+  __extension__ int a [n] = { 0 };
+  if (n == 3)
+    foo (n - 2, false);
+  if (n == 3)
+    foo(n - 1, true);
+  if (p)
+    return a[1];
+  return 0;
+}
+
+constexpr int i2 = foo (3, false); // { dg-bogus "array subscript out of bound" }
-- 
2.8.0.rc3.27.gade0865

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix PR c++/70452 (regression in C++ parsing performance)
  2016-04-04 16:55       ` Patrick Palka
@ 2016-04-05 13:06         ` Jason Merrill
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Merrill @ 2016-04-05 13:06 UTC (permalink / raw)
  To: Patrick Palka; +Cc: gcc-patches

OK, thanks.

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-04-05 13:06 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-01 19:13 [PATCH] Fix PR c++/70452 (regression in C++ parsing performance) Patrick Palka
2016-04-01 19:25 ` Patrick Palka
2016-04-02  1:28 ` Jason Merrill
2016-04-02  1:55   ` Patrick Palka
2016-04-02  2:10     ` Patrick Palka
2016-04-02 21:18   ` Patrick Palka
2016-04-03  3:52     ` Trevor Saunders
2016-04-03 22:06       ` Patrick Palka
2016-04-04 15:39     ` Jason Merrill
2016-04-04 16:55       ` Patrick Palka
2016-04-05 13:06         ` Jason Merrill

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).