public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Make function clone name numbering independent.
@ 2018-07-16 19:38 Michael Ploujnikov
  2018-07-17  6:10 ` Bernhard Reutner-Fischer
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Ploujnikov @ 2018-07-16 19:38 UTC (permalink / raw)
  To: gcc-patches


[-- Attachment #1.1: Type: text/plain, Size: 4126 bytes --]

Hi,

This patch is a small part of the work I'm doing to make function codegen/assembly independent from one another as mentioned in: https://gcc.gnu.org/ml/gcc/2018-07/msg00210.html. It deals with clone_fn_id_num rather than object UIDs and I figured it's better to make my first submission with a smaller, simpler and self-contained patch.

This changes clone_function_name_1 such that clone names are based on a per-function rather than a global counter so that the number of clones of one function doesn't affect the numbering of clone names of other functions.

This should have minimal impact as the only user of the clone names that I found (https://gcc.gnu.org/ml/gcc/2013-03/msg00268.html) doesn't actually care about the specific numeric values.


Thanks
- Michael

gcc:
2018-07-16  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       Make function clone name numbering independent.
       * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids.
       (clone_function_name_1): Use it.

testsuite:
2018-07-16  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

	Clone id counters should be completely independent from one another.
	* gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test.


---
 gcc/cgraphclones.c                            | 11 ++++++--
 gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c

diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index 69572b9..c5a40bd 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -528,7 +528,7 @@ cgraph_node::create_clone (tree new_decl, gcov_type gcov_count, int freq,
   return new_node;
 }
 
-static GTY(()) unsigned int clone_fn_id_num;
+static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
 
 /* Return a new assembler name for a clone with SUFFIX of a decl named
    NAME.  */
@@ -543,7 +543,14 @@ clone_function_name_1 (const char *name, const char *suffix)
   memcpy (prefix, name, len);
   strcpy (prefix + len + 1, suffix);
   prefix[len] = symbol_table::symbol_suffix_separator ();
-  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
+  unsigned int *suffix_counter;
+  if (!clone_fn_ids) {
+    /* Initialize the per-function counter hash table if this is the first call */
+    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (1000);
+  }
+  suffix_counter = &clone_fn_ids->get_or_insert(name);
+  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, *suffix_counter);
+  *suffix_counter = *suffix_counter + 1;
   return get_identifier (tmp_name);
 }
 
diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c gcc/testsuite/gcc.dg/independent-cloneids-1.c
new file mode 100644
index 0000000..d723e20
--- /dev/null
+++ gcc/testsuite/gcc.dg/independent-cloneids-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp"  } */
+
+extern int printf (const char *, ...);
+
+static int __attribute__ ((noinline))
+foo (int arg)
+{
+  return 7 * arg;
+}
+
+static int __attribute__ ((noinline))
+bar (int arg)
+{
+  return arg * arg;
+}
+
+int
+baz (int arg)
+{
+  printf("%d\n", bar (3));
+  printf("%d\n", bar (4));
+  printf("%d\n", foo (5));
+  printf("%d\n", foo (6));
+  /* adding or removing the following call should not affect foo
+     function's clone numbering */
+  printf("%d\n", bar (7));
+  return foo (8);
+}
+
+/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */
+/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */
+/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */
-- 
2.7.4


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Make function clone name numbering independent.
  2018-07-16 19:38 [PATCH] Make function clone name numbering independent Michael Ploujnikov
@ 2018-07-17  6:10 ` Bernhard Reutner-Fischer
  2018-07-17 10:03   ` Richard Biener
  0 siblings, 1 reply; 30+ messages in thread
From: Bernhard Reutner-Fischer @ 2018-07-17  6:10 UTC (permalink / raw)
  To: gcc-patches, Michael Ploujnikov

On 16 July 2018 21:38:36 CEST, Michael Ploujnikov <michael.ploujnikov@oracle.com> wrote:
>Hi,
>

>+    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc
>(1000);

Isn't 1000 a bit excessive? What about 64 or thereabouts? 

thanks,

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Make function clone name numbering independent.
  2018-07-17  6:10 ` Bernhard Reutner-Fischer
@ 2018-07-17 10:03   ` Richard Biener
  2018-07-17 20:25     ` Michael Ploujnikov
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Biener @ 2018-07-17 10:03 UTC (permalink / raw)
  To: Bernhard Fischer; +Cc: GCC Patches, Michael Ploujnikov

On Tue, Jul 17, 2018 at 8:10 AM Bernhard Reutner-Fischer
<rep.dot.nop@gmail.com> wrote:
>
> On 16 July 2018 21:38:36 CEST, Michael Ploujnikov <michael.ploujnikov@oracle.com> wrote:
> >Hi,
> >
>
> >+    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc
> >(1000);
>
> Isn't 1000 a bit excessive? What about 64 or thereabouts?

I'm not sure we should throw memory at this "problem" in this way.
What specific issue
does this address?

Iff then I belive forgoing the automatic counter addidion is the way
to go and hand
control of that to the callers (for example the caller in
lto/lto-partition.c doesn't
even seem to need that counter.

You also assume the string you key on persists - luckily the
lto-partition.c caller
leaks it but this makes your approach quite fragile in my eye (put the logic
into clone_function_name instead, where you at least know you are dealing
with a string from an indentifier which are never collected).

Richard.

> thanks,

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] Make function clone name numbering independent.
  2018-07-17 10:03   ` Richard Biener
@ 2018-07-17 20:25     ` Michael Ploujnikov
  2018-07-20  2:49       ` Michael Ploujnikov
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Ploujnikov @ 2018-07-17 20:25 UTC (permalink / raw)
  To: Richard Biener, Bernhard Fischer; +Cc: GCC Patches


[-- Attachment #1.1: Type: text/plain, Size: 3629 bytes --]

On 2018-07-17 06:02 AM, Richard Biener wrote:
> On Tue, Jul 17, 2018 at 8:10 AM Bernhard Reutner-Fischer
> <rep.dot.nop@gmail.com> wrote:
>>
>> On 16 July 2018 21:38:36 CEST, Michael Ploujnikov <michael.ploujnikov@oracle.com> wrote:
>>> Hi,
>>>
>>
>>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc
>>> (1000);
>>
>> Isn't 1000 a bit excessive? What about 64 or thereabouts?
> 
> I'm not sure we should throw memory at this "problem" in this way.
> What specific issue
> does this address?

This goes along with the general theme of preventing changes to one function affecting codegen of others. What I'm seeing in this case is when a function bar is modified codegen decides to create more clones of it (eg: during the constprop pass). These extra clones cause the global counter to increment so the clones of the unchanged function foo are named differently only because of a source change to bar. I was hoping that the testcase would be a good illustration, but perhaps not; is there anything else I can do to make this clearer?


> 
> Iff then I belive forgoing the automatic counter addidion is the way
> to go and hand
> control of that to the callers (for example the caller in
> lto/lto-partition.c doesn't
> even seem to need that counter.

How can you tell that privatize_symbol_name_1 doesn't need the counter? I'm assuming it has a good reason to call clone_function_name_1 rather than appending ".lto_priv" itself.

> You also assume the string you key on persists - luckily the
> lto-partition.c caller
> leaks it but this makes your approach quite fragile in my eye (put the logic
> into clone_function_name instead, where you at least know you are dealing
> with a string from an indentifier which are never collected).
> 
> Richard.
> 

Is this what you had in mind?:

diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index 6e84a31..f000420 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
   return new_node;
 }
 
-static GTY(()) unsigned int clone_fn_id_num;
+static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
 
 /* Return a new assembler name for a clone with SUFFIX of a decl named
    NAME.  */
@@ -521,14 +521,13 @@ tree
 clone_function_name_1 (const char *name, const char *suffix)
 {
   size_t len = strlen (name);
-  char *tmp_name, *prefix;
+  char *prefix;
 
   prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
   memcpy (prefix, name, len);
   strcpy (prefix + len + 1, suffix);
   prefix[len] = symbol_table::symbol_suffix_separator ();
-  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
-  return get_identifier (tmp_name);
+  return get_identifier (prefix);
 }
 
 /* Return a new assembler name for a clone of DECL with SUFFIX.  */
@@ -537,7 +536,17 @@ tree
 clone_function_name (tree decl, const char *suffix)
 {
   tree name = DECL_ASSEMBLER_NAME (decl);
-  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
+  char *decl_name = IDENTIFIER_POINTER (name);
+  char *numbered_name;
+  unsigned int *suffix_counter;
+  if (!clone_fn_ids) {
+    /* Initialize the per-function counter hash table if this is the first call */
+    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
+  }
+  suffix_counter = &clone_fn_ids->get_or_insert(name);
+  ASM_FORMAT_PRIVATE_NAME (numbered_name, decl_name, *suffix_counter);
+  *suffix_counter = *suffix_counter + 1;
+  return clone_function_name_1 (numbered_name, suffix);
 }
 
- Michael




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Make function clone name numbering independent.
  2018-07-17 20:25     ` Michael Ploujnikov
@ 2018-07-20  2:49       ` Michael Ploujnikov
  2018-07-20 10:05         ` Richard Biener
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Ploujnikov @ 2018-07-20  2:49 UTC (permalink / raw)
  To: Richard Biener, Bernhard Fischer; +Cc: GCC Patches


[-- Attachment #1.1: Type: text/plain, Size: 7951 bytes --]

On 2018-07-17 04:25 PM, Michael Ploujnikov wrote:
> On 2018-07-17 06:02 AM, Richard Biener wrote:
>> On Tue, Jul 17, 2018 at 8:10 AM Bernhard Reutner-Fischer
>> <rep.dot.nop@gmail.com> wrote:
>>>
>>> On 16 July 2018 21:38:36 CEST, Michael Ploujnikov <michael.ploujnikov@oracle.com> wrote:
>>>> Hi,
>>>>
>>>
>>>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc
>>>> (1000);
>>>
>>> Isn't 1000 a bit excessive? What about 64 or thereabouts?
>>
>> I'm not sure we should throw memory at this "problem" in this way.
>> What specific issue
>> does this address?
> 
> This goes along with the general theme of preventing changes to one function affecting codegen of others. What I'm seeing in this case is when a function bar is modified codegen decides to create more clones of it (eg: during the constprop pass). These extra clones cause the global counter to increment so the clones of the unchanged function foo are named differently only because of a source change to bar. I was hoping that the testcase would be a good illustration, but perhaps not; is there anything else I can do to make this clearer?
> 
> 
>>
>> Iff then I belive forgoing the automatic counter addidion is the way
>> to go and hand
>> control of that to the callers (for example the caller in
>> lto/lto-partition.c doesn't
>> even seem to need that counter.
> 
> How can you tell that privatize_symbol_name_1 doesn't need the counter? I'm assuming it has a good reason to call clone_function_name_1 rather than appending ".lto_priv" itself.
> 
>> You also assume the string you key on persists - luckily the
>> lto-partition.c caller
>> leaks it but this makes your approach quite fragile in my eye (put the logic
>> into clone_function_name instead, where you at least know you are dealing
>> with a string from an indentifier which are never collected).
>>
>> Richard.
>>
> 
> Is this what you had in mind?:
> 
> diff --git gcc/cgraphclones.c gcc/cgraphclones.c
> index 6e84a31..f000420 100644
> --- gcc/cgraphclones.c
> +++ gcc/cgraphclones.c
> @@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
>    return new_node;
>  }
>  
> -static GTY(()) unsigned int clone_fn_id_num;
> +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
>  
>  /* Return a new assembler name for a clone with SUFFIX of a decl named
>     NAME.  */
> @@ -521,14 +521,13 @@ tree
>  clone_function_name_1 (const char *name, const char *suffix)
>  {
>    size_t len = strlen (name);
> -  char *tmp_name, *prefix;
> +  char *prefix;
>  
>    prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
>    memcpy (prefix, name, len);
>    strcpy (prefix + len + 1, suffix);
>    prefix[len] = symbol_table::symbol_suffix_separator ();
> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
> -  return get_identifier (tmp_name);
> +  return get_identifier (prefix);
>  }
>  
>  /* Return a new assembler name for a clone of DECL with SUFFIX.  */
> @@ -537,7 +536,17 @@ tree
>  clone_function_name (tree decl, const char *suffix)
>  {
>    tree name = DECL_ASSEMBLER_NAME (decl);
> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
> +  char *decl_name = IDENTIFIER_POINTER (name);
> +  char *numbered_name;
> +  unsigned int *suffix_counter;
> +  if (!clone_fn_ids) {
> +    /* Initialize the per-function counter hash table if this is the first call */
> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
> +  }
> +  suffix_counter = &clone_fn_ids->get_or_insert(name);
> +  ASM_FORMAT_PRIVATE_NAME (numbered_name, decl_name, *suffix_counter);
> +  *suffix_counter = *suffix_counter + 1;
> +  return clone_function_name_1 (numbered_name, suffix);
>  }
>  
> - Michael
> 
> 
> 

Ping, and below is the updated version of the full patch with changelogs:


gcc:
2018-07-16  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       Make function clone name numbering independent.
       * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids.
       (clone_function_name_1): Move suffixing to clone_function_name
       and change it to use per-function clone_fn_ids.

testsuite:
2018-07-16  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

	Clone id counters should be completely independent from one another.
	* gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test.

---
 gcc/cgraphclones.c                            | 19 ++++++++++----
 gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c

diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index 6e84a31..e1a77a2 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
   return new_node;
 }
 
-static GTY(()) unsigned int clone_fn_id_num;
+static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
 
 /* Return a new assembler name for a clone with SUFFIX of a decl named
    NAME.  */
@@ -521,14 +521,13 @@ tree
 clone_function_name_1 (const char *name, const char *suffix)
 {
   size_t len = strlen (name);
-  char *tmp_name, *prefix;
+  char *prefix;
 
   prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
   memcpy (prefix, name, len);
   strcpy (prefix + len + 1, suffix);
   prefix[len] = symbol_table::symbol_suffix_separator ();
-  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
-  return get_identifier (tmp_name);
+  return get_identifier (prefix);
 }
 
 /* Return a new assembler name for a clone of DECL with SUFFIX.  */
@@ -537,7 +536,17 @@ tree
 clone_function_name (tree decl, const char *suffix)
 {
   tree name = DECL_ASSEMBLER_NAME (decl);
-  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
+  const char *decl_name = IDENTIFIER_POINTER (name);
+  char *numbered_name;
+  unsigned int *suffix_counter;
+  if (!clone_fn_ids) {
+    /* Initialize the per-function counter hash table if this is the first call */
+    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
+  }
+  suffix_counter = &clone_fn_ids->get_or_insert(decl_name);
+  ASM_FORMAT_PRIVATE_NAME (numbered_name, decl_name, *suffix_counter);
+  *suffix_counter = *suffix_counter + 1;
+  return clone_function_name_1 (numbered_name, suffix);
 }
 
 
diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c gcc/testsuite/gcc.dg/independent-cloneids-1.c
new file mode 100644
index 0000000..d723e20
--- /dev/null
+++ gcc/testsuite/gcc.dg/independent-cloneids-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp"  } */
+
+extern int printf (const char *, ...);
+
+static int __attribute__ ((noinline))
+foo (int arg)
+{
+  return 7 * arg;
+}
+
+static int __attribute__ ((noinline))
+bar (int arg)
+{
+  return arg * arg;
+}
+
+int
+baz (int arg)
+{
+  printf("%d\n", bar (3));
+  printf("%d\n", bar (4));
+  printf("%d\n", foo (5));
+  printf("%d\n", foo (6));
+  /* adding or removing the following call should not affect foo
+     function's clone numbering */
+  printf("%d\n", bar (7));
+  return foo (8);
+}
+
+/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */
+/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */
+/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */
-- 
2.7.4


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Make function clone name numbering independent.
  2018-07-20  2:49       ` Michael Ploujnikov
@ 2018-07-20 10:05         ` Richard Biener
  2018-07-24 13:57           ` Michael Ploujnikov
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Biener @ 2018-07-20 10:05 UTC (permalink / raw)
  To: Michael Ploujnikov; +Cc: Bernhard Fischer, GCC Patches

On Fri, Jul 20, 2018 at 4:48 AM Michael Ploujnikov
<michael.ploujnikov@oracle.com> wrote:
>
> On 2018-07-17 04:25 PM, Michael Ploujnikov wrote:
> > On 2018-07-17 06:02 AM, Richard Biener wrote:
> >> On Tue, Jul 17, 2018 at 8:10 AM Bernhard Reutner-Fischer
> >> <rep.dot.nop@gmail.com> wrote:
> >>>
> >>> On 16 July 2018 21:38:36 CEST, Michael Ploujnikov <michael.ploujnikov@oracle.com> wrote:
> >>>> Hi,
> >>>>
> >>>
> >>>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc
> >>>> (1000);
> >>>
> >>> Isn't 1000 a bit excessive? What about 64 or thereabouts?
> >>
> >> I'm not sure we should throw memory at this "problem" in this way.
> >> What specific issue
> >> does this address?
> >
> > This goes along with the general theme of preventing changes to one function affecting codegen of others. What I'm seeing in this case is when a function bar is modified codegen decides to create more clones of it (eg: during the constprop pass). These extra clones cause the global counter to increment so the clones of the unchanged function foo are named differently only because of a source change to bar. I was hoping that the testcase would be a good illustration, but perhaps not; is there anything else I can do to make this clearer?
> >
> >
> >>
> >> Iff then I belive forgoing the automatic counter addidion is the way
> >> to go and hand
> >> control of that to the callers (for example the caller in
> >> lto/lto-partition.c doesn't
> >> even seem to need that counter.
> >
> > How can you tell that privatize_symbol_name_1 doesn't need the counter? I'm assuming it has a good reason to call clone_function_name_1 rather than appending ".lto_priv" itself.
> >
> >> You also assume the string you key on persists - luckily the
> >> lto-partition.c caller
> >> leaks it but this makes your approach quite fragile in my eye (put the logic
> >> into clone_function_name instead, where you at least know you are dealing
> >> with a string from an indentifier which are never collected).
> >>
> >> Richard.
> >>
> >
> > Is this what you had in mind?:
> >
> > diff --git gcc/cgraphclones.c gcc/cgraphclones.c
> > index 6e84a31..f000420 100644
> > --- gcc/cgraphclones.c
> > +++ gcc/cgraphclones.c
> > @@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
> >    return new_node;
> >  }
> >
> > -static GTY(()) unsigned int clone_fn_id_num;
> > +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
> >
> >  /* Return a new assembler name for a clone with SUFFIX of a decl named
> >     NAME.  */
> > @@ -521,14 +521,13 @@ tree
> >  clone_function_name_1 (const char *name, const char *suffix)
> >  {
> >    size_t len = strlen (name);
> > -  char *tmp_name, *prefix;
> > +  char *prefix;
> >
> >    prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
> >    memcpy (prefix, name, len);
> >    strcpy (prefix + len + 1, suffix);
> >    prefix[len] = symbol_table::symbol_suffix_separator ();
> > -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
> > -  return get_identifier (tmp_name);
> > +  return get_identifier (prefix);
> >  }
> >
> >  /* Return a new assembler name for a clone of DECL with SUFFIX.  */
> > @@ -537,7 +536,17 @@ tree
> >  clone_function_name (tree decl, const char *suffix)
> >  {
> >    tree name = DECL_ASSEMBLER_NAME (decl);
> > -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
> > +  char *decl_name = IDENTIFIER_POINTER (name);
> > +  char *numbered_name;
> > +  unsigned int *suffix_counter;
> > +  if (!clone_fn_ids) {
> > +    /* Initialize the per-function counter hash table if this is the first call */
> > +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
> > +  }
> > +  suffix_counter = &clone_fn_ids->get_or_insert(name);
> > +  ASM_FORMAT_PRIVATE_NAME (numbered_name, decl_name, *suffix_counter);
> > +  *suffix_counter = *suffix_counter + 1;
> > +  return clone_function_name_1 (numbered_name, suffix);
> >  }
> >
> > - Michael
> >
> >
> >
>
> Ping, and below is the updated version of the full patch with changelogs:
>
>
> gcc:
> 2018-07-16  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
>
>        Make function clone name numbering independent.
>        * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids.
>        (clone_function_name_1): Move suffixing to clone_function_name
>        and change it to use per-function clone_fn_ids.
>
> testsuite:
> 2018-07-16  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
>
>         Clone id counters should be completely independent from one another.
>         * gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test.
>
> ---
>  gcc/cgraphclones.c                            | 19 ++++++++++----
>  gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++
>  2 files changed, 52 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c
>
> diff --git gcc/cgraphclones.c gcc/cgraphclones.c
> index 6e84a31..e1a77a2 100644
> --- gcc/cgraphclones.c
> +++ gcc/cgraphclones.c
> @@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
>    return new_node;
>  }
>
> -static GTY(()) unsigned int clone_fn_id_num;
> +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
>
>  /* Return a new assembler name for a clone with SUFFIX of a decl named
>     NAME.  */
> @@ -521,14 +521,13 @@ tree
>  clone_function_name_1 (const char *name, const char *suffix)

pass this function the counter to use....

>  {
>    size_t len = strlen (name);
> -  char *tmp_name, *prefix;
> +  char *prefix;
>
>    prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
>    memcpy (prefix, name, len);
>    strcpy (prefix + len + 1, suffix);
>    prefix[len] = symbol_table::symbol_suffix_separator ();
> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);

and keep using ASM_FORMAT_PRIVATE_NAME here.  You need to change
the lto/lto-partition.c caller (just use zero as counter).

> -  return get_identifier (tmp_name);
> +  return get_identifier (prefix);
>  }
>
>  /* Return a new assembler name for a clone of DECL with SUFFIX.  */
> @@ -537,7 +536,17 @@ tree
>  clone_function_name (tree decl, const char *suffix)
>  {
>    tree name = DECL_ASSEMBLER_NAME (decl);
> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
> +  const char *decl_name = IDENTIFIER_POINTER (name);
> +  char *numbered_name;
> +  unsigned int *suffix_counter;
> +  if (!clone_fn_ids) {
> +    /* Initialize the per-function counter hash table if this is the first call */
> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
> +  }

I still do not like throwing memory at the problem in this way for the
little benefit
this change provides :/

So no approval from me at this point...

Richard.

> +  suffix_counter = &clone_fn_ids->get_or_insert(decl_name);
> +  ASM_FORMAT_PRIVATE_NAME (numbered_name, decl_name, *suffix_counter);
> +  *suffix_counter = *suffix_counter + 1;
> +  return clone_function_name_1 (numbered_name, suffix);
>  }
>
>
> diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c gcc/testsuite/gcc.dg/independent-cloneids-1.c
> new file mode 100644
> index 0000000..d723e20
> --- /dev/null
> +++ gcc/testsuite/gcc.dg/independent-cloneids-1.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp"  } */
> +
> +extern int printf (const char *, ...);
> +
> +static int __attribute__ ((noinline))
> +foo (int arg)
> +{
> +  return 7 * arg;
> +}
> +
> +static int __attribute__ ((noinline))
> +bar (int arg)
> +{
> +  return arg * arg;
> +}
> +
> +int
> +baz (int arg)
> +{
> +  printf("%d\n", bar (3));
> +  printf("%d\n", bar (4));
> +  printf("%d\n", foo (5));
> +  printf("%d\n", foo (6));
> +  /* adding or removing the following call should not affect foo
> +     function's clone numbering */
> +  printf("%d\n", bar (7));
> +  return foo (8);
> +}
> +
> +/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */
> +/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */
> +/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */
> +/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */
> +/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */
> +/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */
> +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */
> +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Make function clone name numbering independent.
  2018-07-20 10:05         ` Richard Biener
@ 2018-07-24 13:57           ` Michael Ploujnikov
  2018-07-26 17:27             ` Michael Ploujnikov
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Ploujnikov @ 2018-07-24 13:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: Bernhard Fischer, GCC Patches


[-- Attachment #1.1: Type: text/plain, Size: 27663 bytes --]

On 2018-07-20 06:05 AM, Richard Biener wrote:
> On Fri, Jul 20, 2018 at 4:48 AM Michael Ploujnikov
> <michael.ploujnikov@oracle.com> wrote:
>>
>> On 2018-07-17 04:25 PM, Michael Ploujnikov wrote:
>>> On 2018-07-17 06:02 AM, Richard Biener wrote:
>>>> On Tue, Jul 17, 2018 at 8:10 AM Bernhard Reutner-Fischer
>>>> <rep.dot.nop@gmail.com> wrote:
>>>>>
>>>>> On 16 July 2018 21:38:36 CEST, Michael Ploujnikov <michael.ploujnikov@oracle.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>
>>>>>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc
>>>>>> (1000);
>>>>>
>>>>> Isn't 1000 a bit excessive? What about 64 or thereabouts?
>>>>
>>>> I'm not sure we should throw memory at this "problem" in this way.
>>>> What specific issue
>>>> does this address?
>>>
>>> This goes along with the general theme of preventing changes to one function affecting codegen of others. What I'm seeing in this case is when a function bar is modified codegen decides to create more clones of it (eg: during the constprop pass). These extra clones cause the global counter to increment so the clones of the unchanged function foo are named differently only because of a source change to bar. I was hoping that the testcase would be a good illustration, but perhaps not; is there anything else I can do to make this clearer?
>>>
>>>
>>>>
>>>> Iff then I belive forgoing the automatic counter addidion is the way
>>>> to go and hand
>>>> control of that to the callers (for example the caller in
>>>> lto/lto-partition.c doesn't
>>>> even seem to need that counter.
>>>
>>> How can you tell that privatize_symbol_name_1 doesn't need the counter? I'm assuming it has a good reason to call clone_function_name_1 rather than appending ".lto_priv" itself.
>>>
>>>> You also assume the string you key on persists - luckily the
>>>> lto-partition.c caller
>>>> leaks it but this makes your approach quite fragile in my eye (put the logic
>>>> into clone_function_name instead, where you at least know you are dealing
>>>> with a string from an indentifier which are never collected).
>>>>
>>>> Richard.
>>>>
>>>
>>> Is this what you had in mind?:
>>>
>>> diff --git gcc/cgraphclones.c gcc/cgraphclones.c
>>> index 6e84a31..f000420 100644
>>> --- gcc/cgraphclones.c
>>> +++ gcc/cgraphclones.c
>>> @@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
>>>    return new_node;
>>>  }
>>>
>>> -static GTY(()) unsigned int clone_fn_id_num;
>>> +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
>>>
>>>  /* Return a new assembler name for a clone with SUFFIX of a decl named
>>>     NAME.  */
>>> @@ -521,14 +521,13 @@ tree
>>>  clone_function_name_1 (const char *name, const char *suffix)
>>>  {
>>>    size_t len = strlen (name);
>>> -  char *tmp_name, *prefix;
>>> +  char *prefix;
>>>
>>>    prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
>>>    memcpy (prefix, name, len);
>>>    strcpy (prefix + len + 1, suffix);
>>>    prefix[len] = symbol_table::symbol_suffix_separator ();
>>> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
>>> -  return get_identifier (tmp_name);
>>> +  return get_identifier (prefix);
>>>  }
>>>
>>>  /* Return a new assembler name for a clone of DECL with SUFFIX.  */
>>> @@ -537,7 +536,17 @@ tree
>>>  clone_function_name (tree decl, const char *suffix)
>>>  {
>>>    tree name = DECL_ASSEMBLER_NAME (decl);
>>> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
>>> +  char *decl_name = IDENTIFIER_POINTER (name);
>>> +  char *numbered_name;
>>> +  unsigned int *suffix_counter;
>>> +  if (!clone_fn_ids) {
>>> +    /* Initialize the per-function counter hash table if this is the first call */
>>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
>>> +  }
>>> +  suffix_counter = &clone_fn_ids->get_or_insert(name);
>>> +  ASM_FORMAT_PRIVATE_NAME (numbered_name, decl_name, *suffix_counter);
>>> +  *suffix_counter = *suffix_counter + 1;
>>> +  return clone_function_name_1 (numbered_name, suffix);
>>>  }
>>>
>>> - Michael
>>>
>>>
>>>
>>
>> Ping, and below is the updated version of the full patch with changelogs:
>>
>>
>> gcc:
>> 2018-07-16  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
>>
>>        Make function clone name numbering independent.
>>        * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids.
>>        (clone_function_name_1): Move suffixing to clone_function_name
>>        and change it to use per-function clone_fn_ids.
>>
>> testsuite:
>> 2018-07-16  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
>>
>>         Clone id counters should be completely independent from one another.
>>         * gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test.
>>
>> ---
>>  gcc/cgraphclones.c                            | 19 ++++++++++----
>>  gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++
>>  2 files changed, 52 insertions(+), 5 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c
>>
>> diff --git gcc/cgraphclones.c gcc/cgraphclones.c
>> index 6e84a31..e1a77a2 100644
>> --- gcc/cgraphclones.c
>> +++ gcc/cgraphclones.c
>> @@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
>>    return new_node;
>>  }
>>
>> -static GTY(()) unsigned int clone_fn_id_num;
>> +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
>>
>>  /* Return a new assembler name for a clone with SUFFIX of a decl named
>>     NAME.  */
>> @@ -521,14 +521,13 @@ tree
>>  clone_function_name_1 (const char *name, const char *suffix)
> 
> pass this function the counter to use....
> 
>>  {
>>    size_t len = strlen (name);
>> -  char *tmp_name, *prefix;
>> +  char *prefix;
>>
>>    prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
>>    memcpy (prefix, name, len);
>>    strcpy (prefix + len + 1, suffix);
>>    prefix[len] = symbol_table::symbol_suffix_separator ();
>> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
> 
> and keep using ASM_FORMAT_PRIVATE_NAME here.  You need to change
> the lto/lto-partition.c caller (just use zero as counter).
> 
>> -  return get_identifier (tmp_name);
>> +  return get_identifier (prefix);
>>  }
>>
>>  /* Return a new assembler name for a clone of DECL with SUFFIX.  */
>> @@ -537,7 +536,17 @@ tree
>>  clone_function_name (tree decl, const char *suffix)
>>  {
>>    tree name = DECL_ASSEMBLER_NAME (decl);
>> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
>> +  const char *decl_name = IDENTIFIER_POINTER (name);
>> +  char *numbered_name;
>> +  unsigned int *suffix_counter;
>> +  if (!clone_fn_ids) {
>> +    /* Initialize the per-function counter hash table if this is the first call */
>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
>> +  }
> 
> I still do not like throwing memory at the problem in this way for the
> little benefit
> this change provides :/
> 
> So no approval from me at this point...
> 
> Richard.

Can you give me an idea of the memory constraints that are involved?

The highest memory usage increase that I could find was when compiling
a source file (from Linux) with roughly 10,000 functions. It showed a 2kB
increase over the before-patch use of 6936kB which is barely 0.03%.

Using a single counter can result in more confusing namespacing when
you have .bar.clone.4 despite there only being 3 clones of .bar.

From a practical point of view this change is helpful to anyone
diffing binary output such as forensic analysts, Debian Reproducible
Builds or even someone validating compiler output (before and after an input
source patch). The extra changes that this patch alleviates are a
distraction and could even be misleading. For example, applying a
source patch to the same Linux source produces the following binary
diff before my change:

--- /tmp/output.o.objdump
+++ /tmp/patched-output.o.objdump
@@ -1,5 +1,5 @@

-/tmp/uverbs_cmd/output.o:     file format elf32-i386
+/tmp/uverbs_cmd/patched-output.o:     file format elf32-i386


 Disassembly of section .text.get_order:
@@ -265,12 +265,12 @@
    3:	e9 fc ff ff ff       	jmp    4 <put_cq_read+0x4>
 			4: R_386_PC32	.text.put_uobj_read

-Disassembly of section .text.trace_kmalloc.constprop.3:
+Disassembly of section .text.trace_kmalloc.constprop.4:

-00000000 <trace_kmalloc.constprop.3>:
+00000000 <trace_kmalloc.constprop.4>:
    0:	83 3d 04 00 00 00 00 	cmpl   $0x0,0x4
 			2: R_386_32	__tracepoint_kmalloc
-   7:	74 34                	je     3d <trace_kmalloc.constprop.3+0x3d>
+   7:	74 34                	je     3d <trace_kmalloc.constprop.4+0x3d>
    9:	55                   	push   %ebp
    a:	89 cd                	mov    %ecx,%ebp
    c:	57                   	push   %edi
@@ -281,7 +281,7 @@
   13:	8b 1d 10 00 00 00    	mov    0x10,%ebx
 			15: R_386_32	__tracepoint_kmalloc
   19:	85 db                	test   %ebx,%ebx
-  1b:	74 1b                	je     38 <trace_kmalloc.constprop.3+0x38>
+  1b:	74 1b                	je     38 <trace_kmalloc.constprop.4+0x38>
   1d:	68 d0 00 00 00       	push   $0xd0
   22:	89 fa                	mov    %edi,%edx
   24:	89 f0                	mov    %esi,%eax
@@ -292,7 +292,7 @@
   31:	58                   	pop    %eax
   32:	83 3b 00             	cmpl   $0x0,(%ebx)
   35:	5a                   	pop    %edx
-  36:	eb e3                	jmp    1b <trace_kmalloc.constprop.3+0x1b>
+  36:	eb e3                	jmp    1b <trace_kmalloc.constprop.4+0x1b>
   38:	5b                   	pop    %ebx
   39:	5e                   	pop    %esi
   3a:	5f                   	pop    %edi
@@ -846,7 +846,7 @@
   78:	b8 5f 00 00 00       	mov    $0x5f,%eax
 			79: R_386_32	.text.ib_uverbs_alloc_pd
   7d:	e8 fc ff ff ff       	call   7e <ib_uverbs_alloc_pd+0x7e>
-			7e: R_386_PC32	.text.trace_kmalloc.constprop.3
+			7e: R_386_PC32	.text.trace_kmalloc.constprop.4
   82:	c7 45 d4 f4 ff ff ff 	movl   $0xfffffff4,-0x2c(%ebp)
   89:	59                   	pop    %ecx
   8a:	85 db                	test   %ebx,%ebx
@@ -1068,7 +1068,7 @@
   9c:	b8 83 00 00 00       	mov    $0x83,%eax
 			9d: R_386_32	.text.ib_uverbs_reg_mr
   a1:	e8 fc ff ff ff       	call   a2 <ib_uverbs_reg_mr+0xa2>
-			a2: R_386_PC32	.text.trace_kmalloc.constprop.3
+			a2: R_386_PC32	.text.trace_kmalloc.constprop.4
   a6:	ba f4 ff ff ff       	mov    $0xfffffff4,%edx
   ab:	58                   	pop    %eax
   ac:	85 db                	test   %ebx,%ebx
@@ -1385,7 +1385,7 @@
   99:	b8 7b 00 00 00       	mov    $0x7b,%eax
 			9a: R_386_32	.text.ib_uverbs_create_cq
   9e:	e8 fc ff ff ff       	call   9f <ib_uverbs_create_cq+0x9f>
-			9f: R_386_PC32	.text.trace_kmalloc.constprop.3
+			9f: R_386_PC32	.text.trace_kmalloc.constprop.4
   a3:	58                   	pop    %eax
   a4:	85 db                	test   %ebx,%ebx
   a6:	75 0a                	jne    b2 <ib_uverbs_create_cq+0xb2>
@@ -1607,129 +1607,107 @@
 00000000 <ib_uverbs_poll_cq>:
    0:	55                   	push   %ebp
    1:	57                   	push   %edi
-   2:	89 c7                	mov    %eax,%edi
-   4:	56                   	push   %esi
-   5:	89 ce                	mov    %ecx,%esi
-   7:	b9 10 00 00 00       	mov    $0x10,%ecx
-   c:	53                   	push   %ebx
-   d:	83 ec 18             	sub    $0x18,%esp
-  10:	8d 44 24 08          	lea    0x8(%esp),%eax
-  14:	e8 fc ff ff ff       	call   15 <ib_uverbs_poll_cq+0x15>
-			15: R_386_PC32	copy_from_user
-  19:	85 c0                	test   %eax,%eax
-  1b:	0f 85 34 01 00 00    	jne    155 <ib_uverbs_poll_cq+0x155>
-  21:	6b 44 24 14 34       	imul   $0x34,0x14(%esp),%eax
-  26:	ba d0 00 00 00       	mov    $0xd0,%edx
-  2b:	e8 fc ff ff ff       	call   2c <ib_uverbs_poll_cq+0x2c>
-			2c: R_386_PC32	__kmalloc
-  30:	89 c5                	mov    %eax,%ebp
-  32:	85 c0                	test   %eax,%eax
-  34:	0f 84 22 01 00 00    	je     15c <ib_uverbs_poll_cq+0x15c>
-  3a:	6b 44 24 14 30       	imul   $0x30,0x14(%esp),%eax
-  3f:	ba d0 00 00 00       	mov    $0xd0,%edx
-  44:	83 c0 08             	add    $0x8,%eax
-  47:	89 44 24 04          	mov    %eax,0x4(%esp)
-  4b:	e8 fc ff ff ff       	call   4c <ib_uverbs_poll_cq+0x4c>
-			4c: R_386_PC32	__kmalloc
-  50:	ba f4 ff ff ff       	mov    $0xfffffff4,%edx
-  55:	89 04 24             	mov    %eax,(%esp)
-  58:	85 c0                	test   %eax,%eax
-  5a:	0f 84 e1 00 00 00    	je     141 <ib_uverbs_poll_cq+0x141>
-  60:	8b 4f 58             	mov    0x58(%edi),%ecx
-  63:	6a 00                	push   $0x0
-  65:	b8 00 00 00 00       	mov    $0x0,%eax
-			66: R_386_32	ib_uverbs_cq_idr
-  6a:	8b 54 24 14          	mov    0x14(%esp),%edx
-  6e:	e8 fc ff ff ff       	call   6f <ib_uverbs_poll_cq+0x6f>
-			6f: R_386_PC32	.text.idr_read_obj
-  73:	ba ea ff ff ff       	mov    $0xffffffea,%edx
-  78:	89 c7                	mov    %eax,%edi
-  7a:	58                   	pop    %eax
-  7b:	85 ff                	test   %edi,%edi
-  7d:	0f 84 ae 00 00 00    	je     131 <ib_uverbs_poll_cq+0x131>
-  83:	8b 1f                	mov    (%edi),%ebx
-  85:	8b 54 24 14          	mov    0x14(%esp),%edx
-  89:	89 e9                	mov    %ebp,%ecx
-  8b:	89 f8                	mov    %edi,%eax
-  8d:	ff 93 70 01 00 00    	call   *0x170(%ebx)
-  93:	8b 1c 24             	mov    (%esp),%ebx
-  96:	89 03                	mov    %eax,(%ebx)
-  98:	89 f8                	mov    %edi,%eax
-  9a:	e8 fc ff ff ff       	call   9b <ib_uverbs_poll_cq+0x9b>
-			9b: R_386_PC32	.text.put_cq_read
-  9f:	8b 1c 24             	mov    (%esp),%ebx
-  a2:	89 e8                	mov    %ebp,%eax
-  a4:	6b 3b 34             	imul   $0x34,(%ebx),%edi
-  a7:	8d 53 08             	lea    0x8(%ebx),%edx
-  aa:	01 ef                	add    %ebp,%edi
-  ac:	39 f8                	cmp    %edi,%eax
-  ae:	74 67                	je     117 <ib_uverbs_poll_cq+0x117>
-  b0:	8b 08                	mov    (%eax),%ecx
-  b2:	8b 58 04             	mov    0x4(%eax),%ebx
-  b5:	83 c2 30             	add    $0x30,%edx
-  b8:	83 c0 34             	add    $0x34,%eax
-  bb:	89 4a d0             	mov    %ecx,-0x30(%edx)
-  be:	89 5a d4             	mov    %ebx,-0x2c(%edx)
-  c1:	8b 48 d4             	mov    -0x2c(%eax),%ecx
-  c4:	89 4a d8             	mov    %ecx,-0x28(%edx)
-  c7:	8b 48 d8             	mov    -0x28(%eax),%ecx
-  ca:	89 4a dc             	mov    %ecx,-0x24(%edx)
-  cd:	8b 48 dc             	mov    -0x24(%eax),%ecx
-  d0:	89 4a e0             	mov    %ecx,-0x20(%edx)
-  d3:	8b 48 e0             	mov    -0x20(%eax),%ecx
-  d6:	89 4a e4             	mov    %ecx,-0x1c(%edx)
-  d9:	8b 48 e8             	mov    -0x18(%eax),%ecx
-  dc:	89 4a e8             	mov    %ecx,-0x18(%edx)
-  df:	8b 48 e4             	mov    -0x1c(%eax),%ecx
-  e2:	8b 49 20             	mov    0x20(%ecx),%ecx
-  e5:	89 4a ec             	mov    %ecx,-0x14(%edx)
-  e8:	8b 48 ec             	mov    -0x14(%eax),%ecx
-  eb:	89 4a f0             	mov    %ecx,-0x10(%edx)
-  ee:	8b 48 f0             	mov    -0x10(%eax),%ecx
-  f1:	89 4a f4             	mov    %ecx,-0xc(%edx)
-  f4:	8b 48 f4             	mov    -0xc(%eax),%ecx
-  f7:	66 89 4a f8          	mov    %cx,-0x8(%edx)
-  fb:	66 8b 48 f6          	mov    -0xa(%eax),%cx
-  ff:	66 89 4a fa          	mov    %cx,-0x6(%edx)
- 103:	8a 48 f8             	mov    -0x8(%eax),%cl
- 106:	88 4a fc             	mov    %cl,-0x4(%edx)
- 109:	8a 48 f9             	mov    -0x7(%eax),%cl
- 10c:	88 4a fd             	mov    %cl,-0x3(%edx)
- 10f:	8a 48 fa             	mov    -0x6(%eax),%cl
- 112:	88 4a fe             	mov    %cl,-0x2(%edx)
- 115:	eb 95                	jmp    ac <ib_uverbs_poll_cq+0xac>
- 117:	8b 14 24             	mov    (%esp),%edx
- 11a:	8b 4c 24 04          	mov    0x4(%esp),%ecx
- 11e:	8b 44 24 08          	mov    0x8(%esp),%eax
- 122:	e8 fc ff ff ff       	call   123 <ib_uverbs_poll_cq+0x123>
-			123: R_386_PC32	copy_to_user
- 127:	83 f8 01             	cmp    $0x1,%eax
- 12a:	19 d2                	sbb    %edx,%edx
- 12c:	f7 d2                	not    %edx
- 12e:	83 e2 f2             	and    $0xfffffff2,%edx
- 131:	8b 04 24             	mov    (%esp),%eax
- 134:	89 54 24 04          	mov    %edx,0x4(%esp)
- 138:	e8 fc ff ff ff       	call   139 <ib_uverbs_poll_cq+0x139>
-			139: R_386_PC32	kfree
- 13d:	8b 54 24 04          	mov    0x4(%esp),%edx
- 141:	89 e8                	mov    %ebp,%eax
- 143:	89 14 24             	mov    %edx,(%esp)
- 146:	e8 fc ff ff ff       	call   147 <ib_uverbs_poll_cq+0x147>
-			147: R_386_PC32	kfree
- 14b:	8b 14 24             	mov    (%esp),%edx
- 14e:	85 d2                	test   %edx,%edx
- 150:	0f 45 f2             	cmovne %edx,%esi
- 153:	eb 0c                	jmp    161 <ib_uverbs_poll_cq+0x161>
- 155:	be f2 ff ff ff       	mov    $0xfffffff2,%esi
- 15a:	eb 05                	jmp    161 <ib_uverbs_poll_cq+0x161>
- 15c:	be f4 ff ff ff       	mov    $0xfffffff4,%esi
- 161:	83 c4 18             	add    $0x18,%esp
- 164:	89 f0                	mov    %esi,%eax
- 166:	5b                   	pop    %ebx
- 167:	5e                   	pop    %esi
- 168:	5f                   	pop    %edi
- 169:	5d                   	pop    %ebp
- 16a:	c3                   	ret
+   2:	bf f2 ff ff ff       	mov    $0xfffffff2,%edi
+   7:	56                   	push   %esi
+   8:	53                   	push   %ebx
+   9:	89 c3                	mov    %eax,%ebx
+   b:	81 ec 84 00 00 00    	sub    $0x84,%esp
+  11:	89 4c 24 04          	mov    %ecx,0x4(%esp)
+  15:	8d 44 24 10          	lea    0x10(%esp),%eax
+  19:	b9 10 00 00 00       	mov    $0x10,%ecx
+  1e:	e8 fc ff ff ff       	call   1f <ib_uverbs_poll_cq+0x1f>
+			1f: R_386_PC32	copy_from_user
+  23:	89 04 24             	mov    %eax,(%esp)
+  26:	85 c0                	test   %eax,%eax
+  28:	0f 85 1f 01 00 00    	jne    14d <ib_uverbs_poll_cq+0x14d>
+  2e:	8b 4b 58             	mov    0x58(%ebx),%ecx
+  31:	6a 00                	push   $0x0
+  33:	b8 00 00 00 00       	mov    $0x0,%eax
+			34: R_386_32	ib_uverbs_cq_idr
+  38:	bf ea ff ff ff       	mov    $0xffffffea,%edi
+  3d:	8b 54 24 1c          	mov    0x1c(%esp),%edx
+  41:	e8 fc ff ff ff       	call   42 <ib_uverbs_poll_cq+0x42>
+			42: R_386_PC32	.text.idr_read_obj
+  46:	89 c3                	mov    %eax,%ebx
+  48:	58                   	pop    %eax
+  49:	85 db                	test   %ebx,%ebx
+  4b:	0f 84 fc 00 00 00    	je     14d <ib_uverbs_poll_cq+0x14d>
+  51:	8b 6c 24 10          	mov    0x10(%esp),%ebp
+  55:	b9 02 00 00 00       	mov    $0x2,%ecx
+  5a:	8d 7c 24 08          	lea    0x8(%esp),%edi
+  5e:	8b 04 24             	mov    (%esp),%eax
+  61:	8d 75 08             	lea    0x8(%ebp),%esi
+  64:	f3 ab                	rep stos %eax,%es:(%edi)
+  66:	8b 44 24 1c          	mov    0x1c(%esp),%eax
+  6a:	39 44 24 08          	cmp    %eax,0x8(%esp)
+  6e:	73 1f                	jae    8f <ib_uverbs_poll_cq+0x8f>
+  70:	8b 3b                	mov    (%ebx),%edi
+  72:	8d 4c 24 50          	lea    0x50(%esp),%ecx
+  76:	ba 01 00 00 00       	mov    $0x1,%edx
+  7b:	89 d8                	mov    %ebx,%eax
+  7d:	ff 97 70 01 00 00    	call   *0x170(%edi)
+  83:	89 c7                	mov    %eax,%edi
+  85:	85 c0                	test   %eax,%eax
+  87:	0f 88 b9 00 00 00    	js     146 <ib_uverbs_poll_cq+0x146>
+  8d:	75 21                	jne    b0 <ib_uverbs_poll_cq+0xb0>
+  8f:	b9 08 00 00 00       	mov    $0x8,%ecx
+  94:	8d 54 24 08          	lea    0x8(%esp),%edx
+  98:	89 e8                	mov    %ebp,%eax
+  9a:	bf f2 ff ff ff       	mov    $0xfffffff2,%edi
+  9f:	e8 fc ff ff ff       	call   a0 <ib_uverbs_poll_cq+0xa0>
+			a0: R_386_PC32	copy_to_user
+  a4:	85 c0                	test   %eax,%eax
+  a6:	0f 44 7c 24 04       	cmove  0x4(%esp),%edi
+  ab:	e9 96 00 00 00       	jmp    146 <ib_uverbs_poll_cq+0x146>
+  b0:	8b 44 24 50          	mov    0x50(%esp),%eax
+  b4:	8b 54 24 54          	mov    0x54(%esp),%edx
+  b8:	b9 30 00 00 00       	mov    $0x30,%ecx
+  bd:	89 44 24 20          	mov    %eax,0x20(%esp)
+  c1:	8b 44 24 58          	mov    0x58(%esp),%eax
+  c5:	89 54 24 24          	mov    %edx,0x24(%esp)
+  c9:	8b 54 24 74          	mov    0x74(%esp),%edx
+  cd:	89 44 24 28          	mov    %eax,0x28(%esp)
+  d1:	8b 44 24 5c          	mov    0x5c(%esp),%eax
+  d5:	89 44 24 2c          	mov    %eax,0x2c(%esp)
+  d9:	8b 44 24 60          	mov    0x60(%esp),%eax
+  dd:	89 44 24 30          	mov    %eax,0x30(%esp)
+  e1:	8b 44 24 64          	mov    0x64(%esp),%eax
+  e5:	89 44 24 34          	mov    %eax,0x34(%esp)
+  e9:	8b 44 24 6c          	mov    0x6c(%esp),%eax
+  ed:	89 44 24 38          	mov    %eax,0x38(%esp)
+  f1:	8b 44 24 68          	mov    0x68(%esp),%eax
+  f5:	8b 40 20             	mov    0x20(%eax),%eax
+  f8:	89 54 24 44          	mov    %edx,0x44(%esp)
+  fc:	8d 54 24 20          	lea    0x20(%esp),%edx
+ 100:	c6 44 24 4f 00       	movb   $0x0,0x4f(%esp)
+ 105:	89 44 24 3c          	mov    %eax,0x3c(%esp)
+ 109:	8b 44 24 70          	mov    0x70(%esp),%eax
+ 10d:	89 44 24 40          	mov    %eax,0x40(%esp)
+ 111:	8b 44 24 78          	mov    0x78(%esp),%eax
+ 115:	89 44 24 48          	mov    %eax,0x48(%esp)
+ 119:	8b 44 24 7c          	mov    0x7c(%esp),%eax
+ 11d:	66 89 44 24 4c       	mov    %ax,0x4c(%esp)
+ 122:	8a 44 24 7e          	mov    0x7e(%esp),%al
+ 126:	88 44 24 4e          	mov    %al,0x4e(%esp)
+ 12a:	89 f0                	mov    %esi,%eax
+ 12c:	e8 fc ff ff ff       	call   12d <ib_uverbs_poll_cq+0x12d>
+			12d: R_386_PC32	copy_to_user
+ 131:	85 c0                	test   %eax,%eax
+ 133:	75 0c                	jne    141 <ib_uverbs_poll_cq+0x141>
+ 135:	83 c6 30             	add    $0x30,%esi
+ 138:	ff 44 24 08          	incl   0x8(%esp)
+ 13c:	e9 25 ff ff ff       	jmp    66 <ib_uverbs_poll_cq+0x66>
+ 141:	bf f2 ff ff ff       	mov    $0xfffffff2,%edi
+ 146:	89 d8                	mov    %ebx,%eax
+ 148:	e8 fc ff ff ff       	call   149 <ib_uverbs_poll_cq+0x149>
+			149: R_386_PC32	.text.put_cq_read
+ 14d:	81 c4 84 00 00 00    	add    $0x84,%esp
+ 153:	89 f8                	mov    %edi,%eax
+ 155:	5b                   	pop    %ebx
+ 156:	5e                   	pop    %esi
+ 157:	5f                   	pop    %edi
+ 158:	5d                   	pop    %ebp
+ 159:	c3                   	ret

 Disassembly of section .text.ib_uverbs_req_notify_cq:

@@ -1915,7 +1893,7 @@
   94:	b8 7b 00 00 00       	mov    $0x7b,%eax
 			95: R_386_32	.text.ib_uverbs_create_qp
   99:	e8 fc ff ff ff       	call   9a <ib_uverbs_create_qp+0x9a>
-			9a: R_386_PC32	.text.trace_kmalloc.constprop.3
+			9a: R_386_PC32	.text.trace_kmalloc.constprop.4
   9e:	59                   	pop    %ecx
   9f:	c7 85 50 ff ff ff f4 	movl   $0xfffffff4,-0xb0(%ebp)
   a6:	ff ff ff
@@ -2241,7 +2219,7 @@
   68:	b8 4f 00 00 00       	mov    $0x4f,%eax
 			69: R_386_32	.text.ib_uverbs_query_qp
   6d:	e8 fc ff ff ff       	call   6e <ib_uverbs_query_qp+0x6e>
-			6e: R_386_PC32	.text.trace_kmalloc.constprop.3
+			6e: R_386_PC32	.text.trace_kmalloc.constprop.4
   72:	59                   	pop    %ecx
   73:	c7 85 5c ff ff ff 10 	movl   $0x10,-0xa4(%ebp)
   7a:	00 00 00
@@ -2260,7 +2238,7 @@
   a3:	b8 86 00 00 00       	mov    $0x86,%eax
 			a4: R_386_32	.text.ib_uverbs_query_qp
   a8:	e8 fc ff ff ff       	call   a9 <ib_uverbs_query_qp+0xa9>
-			a9: R_386_PC32	.text.trace_kmalloc.constprop.3
+			a9: R_386_PC32	.text.trace_kmalloc.constprop.4
   ad:	5a                   	pop    %edx
   ae:	85 db                	test   %ebx,%ebx
   b0:	0f 84 c1 01 00 00    	je     277 <ib_uverbs_query_qp+0x277>
@@ -2462,7 +2440,7 @@
   88:	b8 6f 00 00 00       	mov    $0x6f,%eax
 			89: R_386_32	.text.ib_uverbs_modify_qp
   8d:	e8 fc ff ff ff       	call   8e <ib_uverbs_modify_qp+0x8e>
-			8e: R_386_PC32	.text.trace_kmalloc.constprop.3
+			8e: R_386_PC32	.text.trace_kmalloc.constprop.4
   92:	5a                   	pop    %edx
   93:	ba f4 ff ff ff       	mov    $0xfffffff4,%edx
   98:	85 db                	test   %ebx,%ebx
@@ -3129,7 +3107,7 @@
   6a:	b8 4c 00 00 00       	mov    $0x4c,%eax
 			6b: R_386_32	.text.ib_uverbs_create_ah
   6f:	e8 fc ff ff ff       	call   70 <ib_uverbs_create_ah+0x70>
-			70: R_386_PC32	.text.trace_kmalloc.constprop.3
+			70: R_386_PC32	.text.trace_kmalloc.constprop.4
   74:	58                   	pop    %eax
   75:	85 db                	test   %ebx,%ebx
   77:	75 0a                	jne    83 <ib_uverbs_create_ah+0x83>
@@ -3396,7 +3374,7 @@
   af:	b8 91 00 00 00       	mov    $0x91,%eax
 			b0: R_386_32	.text.ib_uverbs_attach_mcast
   b4:	e8 fc ff ff ff       	call   b5 <ib_uverbs_attach_mcast+0xb5>
-			b5: R_386_PC32	.text.trace_kmalloc.constprop.3
+			b5: R_386_PC32	.text.trace_kmalloc.constprop.4
   b9:	58                   	pop    %eax
   ba:	85 db                	test   %ebx,%ebx
   bc:	75 07                	jne    c5 <ib_uverbs_attach_mcast+0xc5>
@@ -3572,7 +3550,7 @@
   77:	b8 5e 00 00 00       	mov    $0x5e,%eax
 			78: R_386_32	.text.ib_uverbs_create_srq
   7c:	e8 fc ff ff ff       	call   7d <ib_uverbs_create_srq+0x7d>
-			7d: R_386_PC32	.text.trace_kmalloc.constprop.3
+			7d: R_386_PC32	.text.trace_kmalloc.constprop.4
   81:	ba f4 ff ff ff       	mov    $0xfffffff4,%edx
   86:	58                   	pop    %eax
   87:	85 db                	test   %ebx,%ebx

Needless to say, after my change the diff only shows the truly changed
functions.

- Michael

> 
>> +  suffix_counter = &clone_fn_ids->get_or_insert(decl_name);
>> +  ASM_FORMAT_PRIVATE_NAME (numbered_name, decl_name, *suffix_counter);
>> +  *suffix_counter = *suffix_counter + 1;
>> +  return clone_function_name_1 (numbered_name, suffix);
>>  }
>>
>>
>> diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c gcc/testsuite/gcc.dg/independent-cloneids-1.c
>> new file mode 100644
>> index 0000000..d723e20
>> --- /dev/null
>> +++ gcc/testsuite/gcc.dg/independent-cloneids-1.c
>> @@ -0,0 +1,38 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp"  } */
>> +
>> +extern int printf (const char *, ...);
>> +
>> +static int __attribute__ ((noinline))
>> +foo (int arg)
>> +{
>> +  return 7 * arg;
>> +}
>> +
>> +static int __attribute__ ((noinline))
>> +bar (int arg)
>> +{
>> +  return arg * arg;
>> +}
>> +
>> +int
>> +baz (int arg)
>> +{
>> +  printf("%d\n", bar (3));
>> +  printf("%d\n", bar (4));
>> +  printf("%d\n", foo (5));
>> +  printf("%d\n", foo (6));
>> +  /* adding or removing the following call should not affect foo
>> +     function's clone numbering */
>> +  printf("%d\n", bar (7));
>> +  return foo (8);
>> +}
>> +
>> +/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */
>> +/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */
>> +/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */
>> +/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */
>> +/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */
>> +/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */
>> +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */
>> +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */
>> --
>> 2.7.4
>>



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Make function clone name numbering independent.
  2018-07-24 13:57           ` Michael Ploujnikov
@ 2018-07-26 17:27             ` Michael Ploujnikov
  2018-07-31 17:40               ` Michael Ploujnikov
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Ploujnikov @ 2018-07-26 17:27 UTC (permalink / raw)
  To: Richard Biener; +Cc: Bernhard Fischer, GCC Patches


[-- Attachment #1.1: Type: text/plain, Size: 26550 bytes --]

On 2018-07-24 09:57 AM, Michael Ploujnikov wrote:
> On 2018-07-20 06:05 AM, Richard Biener wrote:
>>>  /* Return a new assembler name for a clone with SUFFIX of a decl named
>>>     NAME.  */
>>> @@ -521,14 +521,13 @@ tree
>>>  clone_function_name_1 (const char *name, const char *suffix)
>>
>> pass this function the counter to use....
>>
>>>  {
>>>    size_t len = strlen (name);
>>> -  char *tmp_name, *prefix;
>>> +  char *prefix;
>>>
>>>    prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
>>>    memcpy (prefix, name, len);
>>>    strcpy (prefix + len + 1, suffix);
>>>    prefix[len] = symbol_table::symbol_suffix_separator ();
>>> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
>>
>> and keep using ASM_FORMAT_PRIVATE_NAME here.  You need to change
>> the lto/lto-partition.c caller (just use zero as counter).
>>
>>> -  return get_identifier (tmp_name);
>>> +  return get_identifier (prefix);
>>>  }
>>>
>>>  /* Return a new assembler name for a clone of DECL with SUFFIX.  */
>>> @@ -537,7 +536,17 @@ tree
>>>  clone_function_name (tree decl, const char *suffix)
>>>  {
>>>    tree name = DECL_ASSEMBLER_NAME (decl);
>>> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
>>> +  const char *decl_name = IDENTIFIER_POINTER (name);
>>> +  char *numbered_name;
>>> +  unsigned int *suffix_counter;
>>> +  if (!clone_fn_ids) {
>>> +    /* Initialize the per-function counter hash table if this is the first call */
>>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
>>> +  }
>>
>> I still do not like throwing memory at the problem in this way for the
>> little benefit
>> this change provides :/
>>
>> So no approval from me at this point...
>>
>> Richard.
> 
> Can you give me an idea of the memory constraints that are involved?
> 
> The highest memory usage increase that I could find was when compiling
> a source file (from Linux) with roughly 10,000 functions. It showed a 2kB
> increase over the before-patch use of 6936kB which is barely 0.03%.
> 
> Using a single counter can result in more confusing namespacing when
> you have .bar.clone.4 despite there only being 3 clones of .bar.
> 
> From a practical point of view this change is helpful to anyone
> diffing binary output such as forensic analysts, Debian Reproducible
> Builds or even someone validating compiler output (before and after an input
> source patch). The extra changes that this patch alleviates are a
> distraction and could even be misleading. For example, applying a
> source patch to the same Linux source produces the following binary
> diff before my change:
> 
> --- /tmp/output.o.objdump
> +++ /tmp/patched-output.o.objdump
> @@ -1,5 +1,5 @@
> 
> -/tmp/uverbs_cmd/output.o:     file format elf32-i386
> +/tmp/uverbs_cmd/patched-output.o:     file format elf32-i386
> 
> 
>  Disassembly of section .text.get_order:
> @@ -265,12 +265,12 @@
>     3:	e9 fc ff ff ff       	jmp    4 <put_cq_read+0x4>
>  			4: R_386_PC32	.text.put_uobj_read
> 
> -Disassembly of section .text.trace_kmalloc.constprop.3:
> +Disassembly of section .text.trace_kmalloc.constprop.4:
> 
> -00000000 <trace_kmalloc.constprop.3>:
> +00000000 <trace_kmalloc.constprop.4>:
>     0:	83 3d 04 00 00 00 00 	cmpl   $0x0,0x4
>  			2: R_386_32	__tracepoint_kmalloc
> -   7:	74 34                	je     3d <trace_kmalloc.constprop.3+0x3d>
> +   7:	74 34                	je     3d <trace_kmalloc.constprop.4+0x3d>
>     9:	55                   	push   %ebp
>     a:	89 cd                	mov    %ecx,%ebp
>     c:	57                   	push   %edi
> @@ -281,7 +281,7 @@
>    13:	8b 1d 10 00 00 00    	mov    0x10,%ebx
>  			15: R_386_32	__tracepoint_kmalloc
>    19:	85 db                	test   %ebx,%ebx
> -  1b:	74 1b                	je     38 <trace_kmalloc.constprop.3+0x38>
> +  1b:	74 1b                	je     38 <trace_kmalloc.constprop.4+0x38>
>    1d:	68 d0 00 00 00       	push   $0xd0
>    22:	89 fa                	mov    %edi,%edx
>    24:	89 f0                	mov    %esi,%eax
> @@ -292,7 +292,7 @@
>    31:	58                   	pop    %eax
>    32:	83 3b 00             	cmpl   $0x0,(%ebx)
>    35:	5a                   	pop    %edx
> -  36:	eb e3                	jmp    1b <trace_kmalloc.constprop.3+0x1b>
> +  36:	eb e3                	jmp    1b <trace_kmalloc.constprop.4+0x1b>
>    38:	5b                   	pop    %ebx
>    39:	5e                   	pop    %esi
>    3a:	5f                   	pop    %edi
> @@ -846,7 +846,7 @@
>    78:	b8 5f 00 00 00       	mov    $0x5f,%eax
>  			79: R_386_32	.text.ib_uverbs_alloc_pd
>    7d:	e8 fc ff ff ff       	call   7e <ib_uverbs_alloc_pd+0x7e>
> -			7e: R_386_PC32	.text.trace_kmalloc.constprop.3
> +			7e: R_386_PC32	.text.trace_kmalloc.constprop.4
>    82:	c7 45 d4 f4 ff ff ff 	movl   $0xfffffff4,-0x2c(%ebp)
>    89:	59                   	pop    %ecx
>    8a:	85 db                	test   %ebx,%ebx
> @@ -1068,7 +1068,7 @@
>    9c:	b8 83 00 00 00       	mov    $0x83,%eax
>  			9d: R_386_32	.text.ib_uverbs_reg_mr
>    a1:	e8 fc ff ff ff       	call   a2 <ib_uverbs_reg_mr+0xa2>
> -			a2: R_386_PC32	.text.trace_kmalloc.constprop.3
> +			a2: R_386_PC32	.text.trace_kmalloc.constprop.4
>    a6:	ba f4 ff ff ff       	mov    $0xfffffff4,%edx
>    ab:	58                   	pop    %eax
>    ac:	85 db                	test   %ebx,%ebx
> @@ -1385,7 +1385,7 @@
>    99:	b8 7b 00 00 00       	mov    $0x7b,%eax
>  			9a: R_386_32	.text.ib_uverbs_create_cq
>    9e:	e8 fc ff ff ff       	call   9f <ib_uverbs_create_cq+0x9f>
> -			9f: R_386_PC32	.text.trace_kmalloc.constprop.3
> +			9f: R_386_PC32	.text.trace_kmalloc.constprop.4
>    a3:	58                   	pop    %eax
>    a4:	85 db                	test   %ebx,%ebx
>    a6:	75 0a                	jne    b2 <ib_uverbs_create_cq+0xb2>
> @@ -1607,129 +1607,107 @@
>  00000000 <ib_uverbs_poll_cq>:
>     0:	55                   	push   %ebp
>     1:	57                   	push   %edi
> -   2:	89 c7                	mov    %eax,%edi
> -   4:	56                   	push   %esi
> -   5:	89 ce                	mov    %ecx,%esi
> -   7:	b9 10 00 00 00       	mov    $0x10,%ecx
> -   c:	53                   	push   %ebx
> -   d:	83 ec 18             	sub    $0x18,%esp
> -  10:	8d 44 24 08          	lea    0x8(%esp),%eax
> -  14:	e8 fc ff ff ff       	call   15 <ib_uverbs_poll_cq+0x15>
> -			15: R_386_PC32	copy_from_user
> -  19:	85 c0                	test   %eax,%eax
> -  1b:	0f 85 34 01 00 00    	jne    155 <ib_uverbs_poll_cq+0x155>
> -  21:	6b 44 24 14 34       	imul   $0x34,0x14(%esp),%eax
> -  26:	ba d0 00 00 00       	mov    $0xd0,%edx
> -  2b:	e8 fc ff ff ff       	call   2c <ib_uverbs_poll_cq+0x2c>
> -			2c: R_386_PC32	__kmalloc
> -  30:	89 c5                	mov    %eax,%ebp
> -  32:	85 c0                	test   %eax,%eax
> -  34:	0f 84 22 01 00 00    	je     15c <ib_uverbs_poll_cq+0x15c>
> -  3a:	6b 44 24 14 30       	imul   $0x30,0x14(%esp),%eax
> -  3f:	ba d0 00 00 00       	mov    $0xd0,%edx
> -  44:	83 c0 08             	add    $0x8,%eax
> -  47:	89 44 24 04          	mov    %eax,0x4(%esp)
> -  4b:	e8 fc ff ff ff       	call   4c <ib_uverbs_poll_cq+0x4c>
> -			4c: R_386_PC32	__kmalloc
> -  50:	ba f4 ff ff ff       	mov    $0xfffffff4,%edx
> -  55:	89 04 24             	mov    %eax,(%esp)
> -  58:	85 c0                	test   %eax,%eax
> -  5a:	0f 84 e1 00 00 00    	je     141 <ib_uverbs_poll_cq+0x141>
> -  60:	8b 4f 58             	mov    0x58(%edi),%ecx
> -  63:	6a 00                	push   $0x0
> -  65:	b8 00 00 00 00       	mov    $0x0,%eax
> -			66: R_386_32	ib_uverbs_cq_idr
> -  6a:	8b 54 24 14          	mov    0x14(%esp),%edx
> -  6e:	e8 fc ff ff ff       	call   6f <ib_uverbs_poll_cq+0x6f>
> -			6f: R_386_PC32	.text.idr_read_obj
> -  73:	ba ea ff ff ff       	mov    $0xffffffea,%edx
> -  78:	89 c7                	mov    %eax,%edi
> -  7a:	58                   	pop    %eax
> -  7b:	85 ff                	test   %edi,%edi
> -  7d:	0f 84 ae 00 00 00    	je     131 <ib_uverbs_poll_cq+0x131>
> -  83:	8b 1f                	mov    (%edi),%ebx
> -  85:	8b 54 24 14          	mov    0x14(%esp),%edx
> -  89:	89 e9                	mov    %ebp,%ecx
> -  8b:	89 f8                	mov    %edi,%eax
> -  8d:	ff 93 70 01 00 00    	call   *0x170(%ebx)
> -  93:	8b 1c 24             	mov    (%esp),%ebx
> -  96:	89 03                	mov    %eax,(%ebx)
> -  98:	89 f8                	mov    %edi,%eax
> -  9a:	e8 fc ff ff ff       	call   9b <ib_uverbs_poll_cq+0x9b>
> -			9b: R_386_PC32	.text.put_cq_read
> -  9f:	8b 1c 24             	mov    (%esp),%ebx
> -  a2:	89 e8                	mov    %ebp,%eax
> -  a4:	6b 3b 34             	imul   $0x34,(%ebx),%edi
> -  a7:	8d 53 08             	lea    0x8(%ebx),%edx
> -  aa:	01 ef                	add    %ebp,%edi
> -  ac:	39 f8                	cmp    %edi,%eax
> -  ae:	74 67                	je     117 <ib_uverbs_poll_cq+0x117>
> -  b0:	8b 08                	mov    (%eax),%ecx
> -  b2:	8b 58 04             	mov    0x4(%eax),%ebx
> -  b5:	83 c2 30             	add    $0x30,%edx
> -  b8:	83 c0 34             	add    $0x34,%eax
> -  bb:	89 4a d0             	mov    %ecx,-0x30(%edx)
> -  be:	89 5a d4             	mov    %ebx,-0x2c(%edx)
> -  c1:	8b 48 d4             	mov    -0x2c(%eax),%ecx
> -  c4:	89 4a d8             	mov    %ecx,-0x28(%edx)
> -  c7:	8b 48 d8             	mov    -0x28(%eax),%ecx
> -  ca:	89 4a dc             	mov    %ecx,-0x24(%edx)
> -  cd:	8b 48 dc             	mov    -0x24(%eax),%ecx
> -  d0:	89 4a e0             	mov    %ecx,-0x20(%edx)
> -  d3:	8b 48 e0             	mov    -0x20(%eax),%ecx
> -  d6:	89 4a e4             	mov    %ecx,-0x1c(%edx)
> -  d9:	8b 48 e8             	mov    -0x18(%eax),%ecx
> -  dc:	89 4a e8             	mov    %ecx,-0x18(%edx)
> -  df:	8b 48 e4             	mov    -0x1c(%eax),%ecx
> -  e2:	8b 49 20             	mov    0x20(%ecx),%ecx
> -  e5:	89 4a ec             	mov    %ecx,-0x14(%edx)
> -  e8:	8b 48 ec             	mov    -0x14(%eax),%ecx
> -  eb:	89 4a f0             	mov    %ecx,-0x10(%edx)
> -  ee:	8b 48 f0             	mov    -0x10(%eax),%ecx
> -  f1:	89 4a f4             	mov    %ecx,-0xc(%edx)
> -  f4:	8b 48 f4             	mov    -0xc(%eax),%ecx
> -  f7:	66 89 4a f8          	mov    %cx,-0x8(%edx)
> -  fb:	66 8b 48 f6          	mov    -0xa(%eax),%cx
> -  ff:	66 89 4a fa          	mov    %cx,-0x6(%edx)
> - 103:	8a 48 f8             	mov    -0x8(%eax),%cl
> - 106:	88 4a fc             	mov    %cl,-0x4(%edx)
> - 109:	8a 48 f9             	mov    -0x7(%eax),%cl
> - 10c:	88 4a fd             	mov    %cl,-0x3(%edx)
> - 10f:	8a 48 fa             	mov    -0x6(%eax),%cl
> - 112:	88 4a fe             	mov    %cl,-0x2(%edx)
> - 115:	eb 95                	jmp    ac <ib_uverbs_poll_cq+0xac>
> - 117:	8b 14 24             	mov    (%esp),%edx
> - 11a:	8b 4c 24 04          	mov    0x4(%esp),%ecx
> - 11e:	8b 44 24 08          	mov    0x8(%esp),%eax
> - 122:	e8 fc ff ff ff       	call   123 <ib_uverbs_poll_cq+0x123>
> -			123: R_386_PC32	copy_to_user
> - 127:	83 f8 01             	cmp    $0x1,%eax
> - 12a:	19 d2                	sbb    %edx,%edx
> - 12c:	f7 d2                	not    %edx
> - 12e:	83 e2 f2             	and    $0xfffffff2,%edx
> - 131:	8b 04 24             	mov    (%esp),%eax
> - 134:	89 54 24 04          	mov    %edx,0x4(%esp)
> - 138:	e8 fc ff ff ff       	call   139 <ib_uverbs_poll_cq+0x139>
> -			139: R_386_PC32	kfree
> - 13d:	8b 54 24 04          	mov    0x4(%esp),%edx
> - 141:	89 e8                	mov    %ebp,%eax
> - 143:	89 14 24             	mov    %edx,(%esp)
> - 146:	e8 fc ff ff ff       	call   147 <ib_uverbs_poll_cq+0x147>
> -			147: R_386_PC32	kfree
> - 14b:	8b 14 24             	mov    (%esp),%edx
> - 14e:	85 d2                	test   %edx,%edx
> - 150:	0f 45 f2             	cmovne %edx,%esi
> - 153:	eb 0c                	jmp    161 <ib_uverbs_poll_cq+0x161>
> - 155:	be f2 ff ff ff       	mov    $0xfffffff2,%esi
> - 15a:	eb 05                	jmp    161 <ib_uverbs_poll_cq+0x161>
> - 15c:	be f4 ff ff ff       	mov    $0xfffffff4,%esi
> - 161:	83 c4 18             	add    $0x18,%esp
> - 164:	89 f0                	mov    %esi,%eax
> - 166:	5b                   	pop    %ebx
> - 167:	5e                   	pop    %esi
> - 168:	5f                   	pop    %edi
> - 169:	5d                   	pop    %ebp
> - 16a:	c3                   	ret
> +   2:	bf f2 ff ff ff       	mov    $0xfffffff2,%edi
> +   7:	56                   	push   %esi
> +   8:	53                   	push   %ebx
> +   9:	89 c3                	mov    %eax,%ebx
> +   b:	81 ec 84 00 00 00    	sub    $0x84,%esp
> +  11:	89 4c 24 04          	mov    %ecx,0x4(%esp)
> +  15:	8d 44 24 10          	lea    0x10(%esp),%eax
> +  19:	b9 10 00 00 00       	mov    $0x10,%ecx
> +  1e:	e8 fc ff ff ff       	call   1f <ib_uverbs_poll_cq+0x1f>
> +			1f: R_386_PC32	copy_from_user
> +  23:	89 04 24             	mov    %eax,(%esp)
> +  26:	85 c0                	test   %eax,%eax
> +  28:	0f 85 1f 01 00 00    	jne    14d <ib_uverbs_poll_cq+0x14d>
> +  2e:	8b 4b 58             	mov    0x58(%ebx),%ecx
> +  31:	6a 00                	push   $0x0
> +  33:	b8 00 00 00 00       	mov    $0x0,%eax
> +			34: R_386_32	ib_uverbs_cq_idr
> +  38:	bf ea ff ff ff       	mov    $0xffffffea,%edi
> +  3d:	8b 54 24 1c          	mov    0x1c(%esp),%edx
> +  41:	e8 fc ff ff ff       	call   42 <ib_uverbs_poll_cq+0x42>
> +			42: R_386_PC32	.text.idr_read_obj
> +  46:	89 c3                	mov    %eax,%ebx
> +  48:	58                   	pop    %eax
> +  49:	85 db                	test   %ebx,%ebx
> +  4b:	0f 84 fc 00 00 00    	je     14d <ib_uverbs_poll_cq+0x14d>
> +  51:	8b 6c 24 10          	mov    0x10(%esp),%ebp
> +  55:	b9 02 00 00 00       	mov    $0x2,%ecx
> +  5a:	8d 7c 24 08          	lea    0x8(%esp),%edi
> +  5e:	8b 04 24             	mov    (%esp),%eax
> +  61:	8d 75 08             	lea    0x8(%ebp),%esi
> +  64:	f3 ab                	rep stos %eax,%es:(%edi)
> +  66:	8b 44 24 1c          	mov    0x1c(%esp),%eax
> +  6a:	39 44 24 08          	cmp    %eax,0x8(%esp)
> +  6e:	73 1f                	jae    8f <ib_uverbs_poll_cq+0x8f>
> +  70:	8b 3b                	mov    (%ebx),%edi
> +  72:	8d 4c 24 50          	lea    0x50(%esp),%ecx
> +  76:	ba 01 00 00 00       	mov    $0x1,%edx
> +  7b:	89 d8                	mov    %ebx,%eax
> +  7d:	ff 97 70 01 00 00    	call   *0x170(%edi)
> +  83:	89 c7                	mov    %eax,%edi
> +  85:	85 c0                	test   %eax,%eax
> +  87:	0f 88 b9 00 00 00    	js     146 <ib_uverbs_poll_cq+0x146>
> +  8d:	75 21                	jne    b0 <ib_uverbs_poll_cq+0xb0>
> +  8f:	b9 08 00 00 00       	mov    $0x8,%ecx
> +  94:	8d 54 24 08          	lea    0x8(%esp),%edx
> +  98:	89 e8                	mov    %ebp,%eax
> +  9a:	bf f2 ff ff ff       	mov    $0xfffffff2,%edi
> +  9f:	e8 fc ff ff ff       	call   a0 <ib_uverbs_poll_cq+0xa0>
> +			a0: R_386_PC32	copy_to_user
> +  a4:	85 c0                	test   %eax,%eax
> +  a6:	0f 44 7c 24 04       	cmove  0x4(%esp),%edi
> +  ab:	e9 96 00 00 00       	jmp    146 <ib_uverbs_poll_cq+0x146>
> +  b0:	8b 44 24 50          	mov    0x50(%esp),%eax
> +  b4:	8b 54 24 54          	mov    0x54(%esp),%edx
> +  b8:	b9 30 00 00 00       	mov    $0x30,%ecx
> +  bd:	89 44 24 20          	mov    %eax,0x20(%esp)
> +  c1:	8b 44 24 58          	mov    0x58(%esp),%eax
> +  c5:	89 54 24 24          	mov    %edx,0x24(%esp)
> +  c9:	8b 54 24 74          	mov    0x74(%esp),%edx
> +  cd:	89 44 24 28          	mov    %eax,0x28(%esp)
> +  d1:	8b 44 24 5c          	mov    0x5c(%esp),%eax
> +  d5:	89 44 24 2c          	mov    %eax,0x2c(%esp)
> +  d9:	8b 44 24 60          	mov    0x60(%esp),%eax
> +  dd:	89 44 24 30          	mov    %eax,0x30(%esp)
> +  e1:	8b 44 24 64          	mov    0x64(%esp),%eax
> +  e5:	89 44 24 34          	mov    %eax,0x34(%esp)
> +  e9:	8b 44 24 6c          	mov    0x6c(%esp),%eax
> +  ed:	89 44 24 38          	mov    %eax,0x38(%esp)
> +  f1:	8b 44 24 68          	mov    0x68(%esp),%eax
> +  f5:	8b 40 20             	mov    0x20(%eax),%eax
> +  f8:	89 54 24 44          	mov    %edx,0x44(%esp)
> +  fc:	8d 54 24 20          	lea    0x20(%esp),%edx
> + 100:	c6 44 24 4f 00       	movb   $0x0,0x4f(%esp)
> + 105:	89 44 24 3c          	mov    %eax,0x3c(%esp)
> + 109:	8b 44 24 70          	mov    0x70(%esp),%eax
> + 10d:	89 44 24 40          	mov    %eax,0x40(%esp)
> + 111:	8b 44 24 78          	mov    0x78(%esp),%eax
> + 115:	89 44 24 48          	mov    %eax,0x48(%esp)
> + 119:	8b 44 24 7c          	mov    0x7c(%esp),%eax
> + 11d:	66 89 44 24 4c       	mov    %ax,0x4c(%esp)
> + 122:	8a 44 24 7e          	mov    0x7e(%esp),%al
> + 126:	88 44 24 4e          	mov    %al,0x4e(%esp)
> + 12a:	89 f0                	mov    %esi,%eax
> + 12c:	e8 fc ff ff ff       	call   12d <ib_uverbs_poll_cq+0x12d>
> +			12d: R_386_PC32	copy_to_user
> + 131:	85 c0                	test   %eax,%eax
> + 133:	75 0c                	jne    141 <ib_uverbs_poll_cq+0x141>
> + 135:	83 c6 30             	add    $0x30,%esi
> + 138:	ff 44 24 08          	incl   0x8(%esp)
> + 13c:	e9 25 ff ff ff       	jmp    66 <ib_uverbs_poll_cq+0x66>
> + 141:	bf f2 ff ff ff       	mov    $0xfffffff2,%edi
> + 146:	89 d8                	mov    %ebx,%eax
> + 148:	e8 fc ff ff ff       	call   149 <ib_uverbs_poll_cq+0x149>
> +			149: R_386_PC32	.text.put_cq_read
> + 14d:	81 c4 84 00 00 00    	add    $0x84,%esp
> + 153:	89 f8                	mov    %edi,%eax
> + 155:	5b                   	pop    %ebx
> + 156:	5e                   	pop    %esi
> + 157:	5f                   	pop    %edi
> + 158:	5d                   	pop    %ebp
> + 159:	c3                   	ret
> 
>  Disassembly of section .text.ib_uverbs_req_notify_cq:
> 
> @@ -1915,7 +1893,7 @@
>    94:	b8 7b 00 00 00       	mov    $0x7b,%eax
>  			95: R_386_32	.text.ib_uverbs_create_qp
>    99:	e8 fc ff ff ff       	call   9a <ib_uverbs_create_qp+0x9a>
> -			9a: R_386_PC32	.text.trace_kmalloc.constprop.3
> +			9a: R_386_PC32	.text.trace_kmalloc.constprop.4
>    9e:	59                   	pop    %ecx
>    9f:	c7 85 50 ff ff ff f4 	movl   $0xfffffff4,-0xb0(%ebp)
>    a6:	ff ff ff
> @@ -2241,7 +2219,7 @@
>    68:	b8 4f 00 00 00       	mov    $0x4f,%eax
>  			69: R_386_32	.text.ib_uverbs_query_qp
>    6d:	e8 fc ff ff ff       	call   6e <ib_uverbs_query_qp+0x6e>
> -			6e: R_386_PC32	.text.trace_kmalloc.constprop.3
> +			6e: R_386_PC32	.text.trace_kmalloc.constprop.4
>    72:	59                   	pop    %ecx
>    73:	c7 85 5c ff ff ff 10 	movl   $0x10,-0xa4(%ebp)
>    7a:	00 00 00
> @@ -2260,7 +2238,7 @@
>    a3:	b8 86 00 00 00       	mov    $0x86,%eax
>  			a4: R_386_32	.text.ib_uverbs_query_qp
>    a8:	e8 fc ff ff ff       	call   a9 <ib_uverbs_query_qp+0xa9>
> -			a9: R_386_PC32	.text.trace_kmalloc.constprop.3
> +			a9: R_386_PC32	.text.trace_kmalloc.constprop.4
>    ad:	5a                   	pop    %edx
>    ae:	85 db                	test   %ebx,%ebx
>    b0:	0f 84 c1 01 00 00    	je     277 <ib_uverbs_query_qp+0x277>
> @@ -2462,7 +2440,7 @@
>    88:	b8 6f 00 00 00       	mov    $0x6f,%eax
>  			89: R_386_32	.text.ib_uverbs_modify_qp
>    8d:	e8 fc ff ff ff       	call   8e <ib_uverbs_modify_qp+0x8e>
> -			8e: R_386_PC32	.text.trace_kmalloc.constprop.3
> +			8e: R_386_PC32	.text.trace_kmalloc.constprop.4
>    92:	5a                   	pop    %edx
>    93:	ba f4 ff ff ff       	mov    $0xfffffff4,%edx
>    98:	85 db                	test   %ebx,%ebx
> @@ -3129,7 +3107,7 @@
>    6a:	b8 4c 00 00 00       	mov    $0x4c,%eax
>  			6b: R_386_32	.text.ib_uverbs_create_ah
>    6f:	e8 fc ff ff ff       	call   70 <ib_uverbs_create_ah+0x70>
> -			70: R_386_PC32	.text.trace_kmalloc.constprop.3
> +			70: R_386_PC32	.text.trace_kmalloc.constprop.4
>    74:	58                   	pop    %eax
>    75:	85 db                	test   %ebx,%ebx
>    77:	75 0a                	jne    83 <ib_uverbs_create_ah+0x83>
> @@ -3396,7 +3374,7 @@
>    af:	b8 91 00 00 00       	mov    $0x91,%eax
>  			b0: R_386_32	.text.ib_uverbs_attach_mcast
>    b4:	e8 fc ff ff ff       	call   b5 <ib_uverbs_attach_mcast+0xb5>
> -			b5: R_386_PC32	.text.trace_kmalloc.constprop.3
> +			b5: R_386_PC32	.text.trace_kmalloc.constprop.4
>    b9:	58                   	pop    %eax
>    ba:	85 db                	test   %ebx,%ebx
>    bc:	75 07                	jne    c5 <ib_uverbs_attach_mcast+0xc5>
> @@ -3572,7 +3550,7 @@
>    77:	b8 5e 00 00 00       	mov    $0x5e,%eax
>  			78: R_386_32	.text.ib_uverbs_create_srq
>    7c:	e8 fc ff ff ff       	call   7d <ib_uverbs_create_srq+0x7d>
> -			7d: R_386_PC32	.text.trace_kmalloc.constprop.3
> +			7d: R_386_PC32	.text.trace_kmalloc.constprop.4
>    81:	ba f4 ff ff ff       	mov    $0xfffffff4,%edx
>    86:	58                   	pop    %eax
>    87:	85 db                	test   %ebx,%ebx
> 
> Needless to say, after my change the diff only shows the truly changed
> functions.
> 
> - Michael
> 

Updated patch


gcc:
2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       Make function clone name numbering independent.
       * cgraph.h (clone_function_name_1): Add clone_num argument
       * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids.
       (clone_function_name): Use counters from the hashtable.

lto:
2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       * lto-partition.c (privatize_symbol_name_1): Pass 0 to
       clone_function_name_1.


testsuite:
2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

	Clone numbering should be independent for each function.
	* gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test.

---
 gcc/cgraph.h                                  |  2 +-
 gcc/cgraphclones.c                            | 16 ++++++++---
 gcc/lto/lto-partition.c                       |  2 +-
 gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++
 4 files changed, 52 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c

diff --git gcc/cgraph.h gcc/cgraph.h
index a8b1b4c..fe44bd0 100644
--- gcc/cgraph.h
+++ gcc/cgraph.h
@@ -2368,7 +2368,7 @@ basic_block init_lowered_empty_function (tree, bool,
profile_count);
 tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
 /* In cgraphclones.c  */

-tree clone_function_name_1 (const char *, const char *);
+tree clone_function_name_1 (const char *, const char *, unsigned int);
 tree clone_function_name (tree decl, const char *);

 void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index 6e84a31..7de7a65 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -512,13 +512,13 @@ cgraph_node::create_clone (tree new_decl, profile_count
prof_count,
   return new_node;
 }

-static GTY(()) unsigned int clone_fn_id_num;
+static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;

 /* Return a new assembler name for a clone with SUFFIX of a decl named
    NAME.  */

 tree
-clone_function_name_1 (const char *name, const char *suffix)
+clone_function_name_1 (const char *name, const char *suffix, unsigned int
clone_num)
 {
   size_t len = strlen (name);
   char *tmp_name, *prefix;
@@ -527,7 +527,7 @@ clone_function_name_1 (const char *name, const char *suffix)
   memcpy (prefix, name, len);
   strcpy (prefix + len + 1, suffix);
   prefix[len] = symbol_table::symbol_suffix_separator ();
-  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
+  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_num);
   return get_identifier (tmp_name);
 }

@@ -537,7 +537,15 @@ tree
 clone_function_name (tree decl, const char *suffix)
 {
   tree name = DECL_ASSEMBLER_NAME (decl);
-  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
+  const char *decl_name = IDENTIFIER_POINTER (name);
+  unsigned int *suffix_counter;
+  if (!clone_fn_ids) {
+    /* Initialize the per-function counter hash table on first call */
+    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
+  }
+  suffix_counter = &clone_fn_ids->get_or_insert(decl_name);
+  *suffix_counter = *suffix_counter + 1;
+  return clone_function_name_1 (decl_name, suffix, *suffix_counter);
 }


diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c
index c7a5710..4b15232 100644
--- gcc/lto/lto-partition.c
+++ gcc/lto/lto-partition.c
@@ -965,7 +965,7 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
   name = maybe_rewrite_identifier (name);
   symtab->change_decl_assembler_name (decl,
 				      clone_function_name_1 (name,
-							     "lto_priv"));
+							     "lto_priv", 0));

   if (node->lto_file_data)
     lto_record_renamed_decl (node->lto_file_data, name,
diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c
gcc/testsuite/gcc.dg/independent-cloneids-1.c
new file mode 100644
index 0000000..d723e20
--- /dev/null
+++ gcc/testsuite/gcc.dg/independent-cloneids-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp"  } */
+
+extern int printf (const char *, ...);
+
+static int __attribute__ ((noinline))
+foo (int arg)
+{
+  return 7 * arg;
+}
+
+static int __attribute__ ((noinline))
+bar (int arg)
+{
+  return arg * arg;
+}
+
+int
+baz (int arg)
+{
+  printf("%d\n", bar (3));
+  printf("%d\n", bar (4));
+  printf("%d\n", foo (5));
+  printf("%d\n", foo (6));
+  /* adding or removing the following call should not affect foo
+     function's clone numbering */
+  printf("%d\n", bar (7));
+  return foo (8);
+}
+
+/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */
+/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */
+/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */
-- 
2.7.4


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Make function clone name numbering independent.
  2018-07-26 17:27             ` Michael Ploujnikov
@ 2018-07-31 17:40               ` Michael Ploujnikov
  2018-08-01 10:38                 ` Richard Biener
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Ploujnikov @ 2018-07-31 17:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: Bernhard Fischer, GCC Patches


[-- Attachment #1.1: Type: text/plain, Size: 27456 bytes --]

On 2018-07-26 01:27 PM, Michael Ploujnikov wrote:
> On 2018-07-24 09:57 AM, Michael Ploujnikov wrote:
>> On 2018-07-20 06:05 AM, Richard Biener wrote:
>>>>  /* Return a new assembler name for a clone with SUFFIX of a decl named
>>>>     NAME.  */
>>>> @@ -521,14 +521,13 @@ tree
>>>>  clone_function_name_1 (const char *name, const char *suffix)
>>>
>>> pass this function the counter to use....
>>>
>>>>  {
>>>>    size_t len = strlen (name);
>>>> -  char *tmp_name, *prefix;
>>>> +  char *prefix;
>>>>
>>>>    prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
>>>>    memcpy (prefix, name, len);
>>>>    strcpy (prefix + len + 1, suffix);
>>>>    prefix[len] = symbol_table::symbol_suffix_separator ();
>>>> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
>>>
>>> and keep using ASM_FORMAT_PRIVATE_NAME here.  You need to change
>>> the lto/lto-partition.c caller (just use zero as counter).
>>>
>>>> -  return get_identifier (tmp_name);
>>>> +  return get_identifier (prefix);
>>>>  }
>>>>
>>>>  /* Return a new assembler name for a clone of DECL with SUFFIX.  */
>>>> @@ -537,7 +536,17 @@ tree
>>>>  clone_function_name (tree decl, const char *suffix)
>>>>  {
>>>>    tree name = DECL_ASSEMBLER_NAME (decl);
>>>> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
>>>> +  const char *decl_name = IDENTIFIER_POINTER (name);
>>>> +  char *numbered_name;
>>>> +  unsigned int *suffix_counter;
>>>> +  if (!clone_fn_ids) {
>>>> +    /* Initialize the per-function counter hash table if this is the first call */
>>>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
>>>> +  }
>>>
>>> I still do not like throwing memory at the problem in this way for the
>>> little benefit
>>> this change provides :/
>>>
>>> So no approval from me at this point...
>>>
>>> Richard.
>>
>> Can you give me an idea of the memory constraints that are involved?
>>
>> The highest memory usage increase that I could find was when compiling
>> a source file (from Linux) with roughly 10,000 functions. It showed a 2kB
>> increase over the before-patch use of 6936kB which is barely 0.03%.
>>
>> Using a single counter can result in more confusing namespacing when
>> you have .bar.clone.4 despite there only being 3 clones of .bar.
>>
>> From a practical point of view this change is helpful to anyone
>> diffing binary output such as forensic analysts, Debian Reproducible
>> Builds or even someone validating compiler output (before and after an input
>> source patch). The extra changes that this patch alleviates are a
>> distraction and could even be misleading. For example, applying a
>> source patch to the same Linux source produces the following binary
>> diff before my change:
>>
>> --- /tmp/output.o.objdump
>> +++ /tmp/patched-output.o.objdump
>> @@ -1,5 +1,5 @@
>>
>> -/tmp/uverbs_cmd/output.o:     file format elf32-i386
>> +/tmp/uverbs_cmd/patched-output.o:     file format elf32-i386
>>
>>
>>  Disassembly of section .text.get_order:
>> @@ -265,12 +265,12 @@
>>     3:	e9 fc ff ff ff       	jmp    4 <put_cq_read+0x4>
>>  			4: R_386_PC32	.text.put_uobj_read
>>
>> -Disassembly of section .text.trace_kmalloc.constprop.3:
>> +Disassembly of section .text.trace_kmalloc.constprop.4:
>>
>> -00000000 <trace_kmalloc.constprop.3>:
>> +00000000 <trace_kmalloc.constprop.4>:
>>     0:	83 3d 04 00 00 00 00 	cmpl   $0x0,0x4
>>  			2: R_386_32	__tracepoint_kmalloc
>> -   7:	74 34                	je     3d <trace_kmalloc.constprop.3+0x3d>
>> +   7:	74 34                	je     3d <trace_kmalloc.constprop.4+0x3d>
>>     9:	55                   	push   %ebp
>>     a:	89 cd                	mov    %ecx,%ebp
>>     c:	57                   	push   %edi
>> @@ -281,7 +281,7 @@
>>    13:	8b 1d 10 00 00 00    	mov    0x10,%ebx
>>  			15: R_386_32	__tracepoint_kmalloc
>>    19:	85 db                	test   %ebx,%ebx
>> -  1b:	74 1b                	je     38 <trace_kmalloc.constprop.3+0x38>
>> +  1b:	74 1b                	je     38 <trace_kmalloc.constprop.4+0x38>
>>    1d:	68 d0 00 00 00       	push   $0xd0
>>    22:	89 fa                	mov    %edi,%edx
>>    24:	89 f0                	mov    %esi,%eax
>> @@ -292,7 +292,7 @@
>>    31:	58                   	pop    %eax
>>    32:	83 3b 00             	cmpl   $0x0,(%ebx)
>>    35:	5a                   	pop    %edx
>> -  36:	eb e3                	jmp    1b <trace_kmalloc.constprop.3+0x1b>
>> +  36:	eb e3                	jmp    1b <trace_kmalloc.constprop.4+0x1b>
>>    38:	5b                   	pop    %ebx
>>    39:	5e                   	pop    %esi
>>    3a:	5f                   	pop    %edi
>> @@ -846,7 +846,7 @@
>>    78:	b8 5f 00 00 00       	mov    $0x5f,%eax
>>  			79: R_386_32	.text.ib_uverbs_alloc_pd
>>    7d:	e8 fc ff ff ff       	call   7e <ib_uverbs_alloc_pd+0x7e>
>> -			7e: R_386_PC32	.text.trace_kmalloc.constprop.3
>> +			7e: R_386_PC32	.text.trace_kmalloc.constprop.4
>>    82:	c7 45 d4 f4 ff ff ff 	movl   $0xfffffff4,-0x2c(%ebp)
>>    89:	59                   	pop    %ecx
>>    8a:	85 db                	test   %ebx,%ebx
>> @@ -1068,7 +1068,7 @@
>>    9c:	b8 83 00 00 00       	mov    $0x83,%eax
>>  			9d: R_386_32	.text.ib_uverbs_reg_mr
>>    a1:	e8 fc ff ff ff       	call   a2 <ib_uverbs_reg_mr+0xa2>
>> -			a2: R_386_PC32	.text.trace_kmalloc.constprop.3
>> +			a2: R_386_PC32	.text.trace_kmalloc.constprop.4
>>    a6:	ba f4 ff ff ff       	mov    $0xfffffff4,%edx
>>    ab:	58                   	pop    %eax
>>    ac:	85 db                	test   %ebx,%ebx
>> @@ -1385,7 +1385,7 @@
>>    99:	b8 7b 00 00 00       	mov    $0x7b,%eax
>>  			9a: R_386_32	.text.ib_uverbs_create_cq
>>    9e:	e8 fc ff ff ff       	call   9f <ib_uverbs_create_cq+0x9f>
>> -			9f: R_386_PC32	.text.trace_kmalloc.constprop.3
>> +			9f: R_386_PC32	.text.trace_kmalloc.constprop.4
>>    a3:	58                   	pop    %eax
>>    a4:	85 db                	test   %ebx,%ebx
>>    a6:	75 0a                	jne    b2 <ib_uverbs_create_cq+0xb2>
>> @@ -1607,129 +1607,107 @@
>>  00000000 <ib_uverbs_poll_cq>:
>>     0:	55                   	push   %ebp
>>     1:	57                   	push   %edi
>> -   2:	89 c7                	mov    %eax,%edi
>> -   4:	56                   	push   %esi
>> -   5:	89 ce                	mov    %ecx,%esi
>> -   7:	b9 10 00 00 00       	mov    $0x10,%ecx
>> -   c:	53                   	push   %ebx
>> -   d:	83 ec 18             	sub    $0x18,%esp
>> -  10:	8d 44 24 08          	lea    0x8(%esp),%eax
>> -  14:	e8 fc ff ff ff       	call   15 <ib_uverbs_poll_cq+0x15>
>> -			15: R_386_PC32	copy_from_user
>> -  19:	85 c0                	test   %eax,%eax
>> -  1b:	0f 85 34 01 00 00    	jne    155 <ib_uverbs_poll_cq+0x155>
>> -  21:	6b 44 24 14 34       	imul   $0x34,0x14(%esp),%eax
>> -  26:	ba d0 00 00 00       	mov    $0xd0,%edx
>> -  2b:	e8 fc ff ff ff       	call   2c <ib_uverbs_poll_cq+0x2c>
>> -			2c: R_386_PC32	__kmalloc
>> -  30:	89 c5                	mov    %eax,%ebp
>> -  32:	85 c0                	test   %eax,%eax
>> -  34:	0f 84 22 01 00 00    	je     15c <ib_uverbs_poll_cq+0x15c>
>> -  3a:	6b 44 24 14 30       	imul   $0x30,0x14(%esp),%eax
>> -  3f:	ba d0 00 00 00       	mov    $0xd0,%edx
>> -  44:	83 c0 08             	add    $0x8,%eax
>> -  47:	89 44 24 04          	mov    %eax,0x4(%esp)
>> -  4b:	e8 fc ff ff ff       	call   4c <ib_uverbs_poll_cq+0x4c>
>> -			4c: R_386_PC32	__kmalloc
>> -  50:	ba f4 ff ff ff       	mov    $0xfffffff4,%edx
>> -  55:	89 04 24             	mov    %eax,(%esp)
>> -  58:	85 c0                	test   %eax,%eax
>> -  5a:	0f 84 e1 00 00 00    	je     141 <ib_uverbs_poll_cq+0x141>
>> -  60:	8b 4f 58             	mov    0x58(%edi),%ecx
>> -  63:	6a 00                	push   $0x0
>> -  65:	b8 00 00 00 00       	mov    $0x0,%eax
>> -			66: R_386_32	ib_uverbs_cq_idr
>> -  6a:	8b 54 24 14          	mov    0x14(%esp),%edx
>> -  6e:	e8 fc ff ff ff       	call   6f <ib_uverbs_poll_cq+0x6f>
>> -			6f: R_386_PC32	.text.idr_read_obj
>> -  73:	ba ea ff ff ff       	mov    $0xffffffea,%edx
>> -  78:	89 c7                	mov    %eax,%edi
>> -  7a:	58                   	pop    %eax
>> -  7b:	85 ff                	test   %edi,%edi
>> -  7d:	0f 84 ae 00 00 00    	je     131 <ib_uverbs_poll_cq+0x131>
>> -  83:	8b 1f                	mov    (%edi),%ebx
>> -  85:	8b 54 24 14          	mov    0x14(%esp),%edx
>> -  89:	89 e9                	mov    %ebp,%ecx
>> -  8b:	89 f8                	mov    %edi,%eax
>> -  8d:	ff 93 70 01 00 00    	call   *0x170(%ebx)
>> -  93:	8b 1c 24             	mov    (%esp),%ebx
>> -  96:	89 03                	mov    %eax,(%ebx)
>> -  98:	89 f8                	mov    %edi,%eax
>> -  9a:	e8 fc ff ff ff       	call   9b <ib_uverbs_poll_cq+0x9b>
>> -			9b: R_386_PC32	.text.put_cq_read
>> -  9f:	8b 1c 24             	mov    (%esp),%ebx
>> -  a2:	89 e8                	mov    %ebp,%eax
>> -  a4:	6b 3b 34             	imul   $0x34,(%ebx),%edi
>> -  a7:	8d 53 08             	lea    0x8(%ebx),%edx
>> -  aa:	01 ef                	add    %ebp,%edi
>> -  ac:	39 f8                	cmp    %edi,%eax
>> -  ae:	74 67                	je     117 <ib_uverbs_poll_cq+0x117>
>> -  b0:	8b 08                	mov    (%eax),%ecx
>> -  b2:	8b 58 04             	mov    0x4(%eax),%ebx
>> -  b5:	83 c2 30             	add    $0x30,%edx
>> -  b8:	83 c0 34             	add    $0x34,%eax
>> -  bb:	89 4a d0             	mov    %ecx,-0x30(%edx)
>> -  be:	89 5a d4             	mov    %ebx,-0x2c(%edx)
>> -  c1:	8b 48 d4             	mov    -0x2c(%eax),%ecx
>> -  c4:	89 4a d8             	mov    %ecx,-0x28(%edx)
>> -  c7:	8b 48 d8             	mov    -0x28(%eax),%ecx
>> -  ca:	89 4a dc             	mov    %ecx,-0x24(%edx)
>> -  cd:	8b 48 dc             	mov    -0x24(%eax),%ecx
>> -  d0:	89 4a e0             	mov    %ecx,-0x20(%edx)
>> -  d3:	8b 48 e0             	mov    -0x20(%eax),%ecx
>> -  d6:	89 4a e4             	mov    %ecx,-0x1c(%edx)
>> -  d9:	8b 48 e8             	mov    -0x18(%eax),%ecx
>> -  dc:	89 4a e8             	mov    %ecx,-0x18(%edx)
>> -  df:	8b 48 e4             	mov    -0x1c(%eax),%ecx
>> -  e2:	8b 49 20             	mov    0x20(%ecx),%ecx
>> -  e5:	89 4a ec             	mov    %ecx,-0x14(%edx)
>> -  e8:	8b 48 ec             	mov    -0x14(%eax),%ecx
>> -  eb:	89 4a f0             	mov    %ecx,-0x10(%edx)
>> -  ee:	8b 48 f0             	mov    -0x10(%eax),%ecx
>> -  f1:	89 4a f4             	mov    %ecx,-0xc(%edx)
>> -  f4:	8b 48 f4             	mov    -0xc(%eax),%ecx
>> -  f7:	66 89 4a f8          	mov    %cx,-0x8(%edx)
>> -  fb:	66 8b 48 f6          	mov    -0xa(%eax),%cx
>> -  ff:	66 89 4a fa          	mov    %cx,-0x6(%edx)
>> - 103:	8a 48 f8             	mov    -0x8(%eax),%cl
>> - 106:	88 4a fc             	mov    %cl,-0x4(%edx)
>> - 109:	8a 48 f9             	mov    -0x7(%eax),%cl
>> - 10c:	88 4a fd             	mov    %cl,-0x3(%edx)
>> - 10f:	8a 48 fa             	mov    -0x6(%eax),%cl
>> - 112:	88 4a fe             	mov    %cl,-0x2(%edx)
>> - 115:	eb 95                	jmp    ac <ib_uverbs_poll_cq+0xac>
>> - 117:	8b 14 24             	mov    (%esp),%edx
>> - 11a:	8b 4c 24 04          	mov    0x4(%esp),%ecx
>> - 11e:	8b 44 24 08          	mov    0x8(%esp),%eax
>> - 122:	e8 fc ff ff ff       	call   123 <ib_uverbs_poll_cq+0x123>
>> -			123: R_386_PC32	copy_to_user
>> - 127:	83 f8 01             	cmp    $0x1,%eax
>> - 12a:	19 d2                	sbb    %edx,%edx
>> - 12c:	f7 d2                	not    %edx
>> - 12e:	83 e2 f2             	and    $0xfffffff2,%edx
>> - 131:	8b 04 24             	mov    (%esp),%eax
>> - 134:	89 54 24 04          	mov    %edx,0x4(%esp)
>> - 138:	e8 fc ff ff ff       	call   139 <ib_uverbs_poll_cq+0x139>
>> -			139: R_386_PC32	kfree
>> - 13d:	8b 54 24 04          	mov    0x4(%esp),%edx
>> - 141:	89 e8                	mov    %ebp,%eax
>> - 143:	89 14 24             	mov    %edx,(%esp)
>> - 146:	e8 fc ff ff ff       	call   147 <ib_uverbs_poll_cq+0x147>
>> -			147: R_386_PC32	kfree
>> - 14b:	8b 14 24             	mov    (%esp),%edx
>> - 14e:	85 d2                	test   %edx,%edx
>> - 150:	0f 45 f2             	cmovne %edx,%esi
>> - 153:	eb 0c                	jmp    161 <ib_uverbs_poll_cq+0x161>
>> - 155:	be f2 ff ff ff       	mov    $0xfffffff2,%esi
>> - 15a:	eb 05                	jmp    161 <ib_uverbs_poll_cq+0x161>
>> - 15c:	be f4 ff ff ff       	mov    $0xfffffff4,%esi
>> - 161:	83 c4 18             	add    $0x18,%esp
>> - 164:	89 f0                	mov    %esi,%eax
>> - 166:	5b                   	pop    %ebx
>> - 167:	5e                   	pop    %esi
>> - 168:	5f                   	pop    %edi
>> - 169:	5d                   	pop    %ebp
>> - 16a:	c3                   	ret
>> +   2:	bf f2 ff ff ff       	mov    $0xfffffff2,%edi
>> +   7:	56                   	push   %esi
>> +   8:	53                   	push   %ebx
>> +   9:	89 c3                	mov    %eax,%ebx
>> +   b:	81 ec 84 00 00 00    	sub    $0x84,%esp
>> +  11:	89 4c 24 04          	mov    %ecx,0x4(%esp)
>> +  15:	8d 44 24 10          	lea    0x10(%esp),%eax
>> +  19:	b9 10 00 00 00       	mov    $0x10,%ecx
>> +  1e:	e8 fc ff ff ff       	call   1f <ib_uverbs_poll_cq+0x1f>
>> +			1f: R_386_PC32	copy_from_user
>> +  23:	89 04 24             	mov    %eax,(%esp)
>> +  26:	85 c0                	test   %eax,%eax
>> +  28:	0f 85 1f 01 00 00    	jne    14d <ib_uverbs_poll_cq+0x14d>
>> +  2e:	8b 4b 58             	mov    0x58(%ebx),%ecx
>> +  31:	6a 00                	push   $0x0
>> +  33:	b8 00 00 00 00       	mov    $0x0,%eax
>> +			34: R_386_32	ib_uverbs_cq_idr
>> +  38:	bf ea ff ff ff       	mov    $0xffffffea,%edi
>> +  3d:	8b 54 24 1c          	mov    0x1c(%esp),%edx
>> +  41:	e8 fc ff ff ff       	call   42 <ib_uverbs_poll_cq+0x42>
>> +			42: R_386_PC32	.text.idr_read_obj
>> +  46:	89 c3                	mov    %eax,%ebx
>> +  48:	58                   	pop    %eax
>> +  49:	85 db                	test   %ebx,%ebx
>> +  4b:	0f 84 fc 00 00 00    	je     14d <ib_uverbs_poll_cq+0x14d>
>> +  51:	8b 6c 24 10          	mov    0x10(%esp),%ebp
>> +  55:	b9 02 00 00 00       	mov    $0x2,%ecx
>> +  5a:	8d 7c 24 08          	lea    0x8(%esp),%edi
>> +  5e:	8b 04 24             	mov    (%esp),%eax
>> +  61:	8d 75 08             	lea    0x8(%ebp),%esi
>> +  64:	f3 ab                	rep stos %eax,%es:(%edi)
>> +  66:	8b 44 24 1c          	mov    0x1c(%esp),%eax
>> +  6a:	39 44 24 08          	cmp    %eax,0x8(%esp)
>> +  6e:	73 1f                	jae    8f <ib_uverbs_poll_cq+0x8f>
>> +  70:	8b 3b                	mov    (%ebx),%edi
>> +  72:	8d 4c 24 50          	lea    0x50(%esp),%ecx
>> +  76:	ba 01 00 00 00       	mov    $0x1,%edx
>> +  7b:	89 d8                	mov    %ebx,%eax
>> +  7d:	ff 97 70 01 00 00    	call   *0x170(%edi)
>> +  83:	89 c7                	mov    %eax,%edi
>> +  85:	85 c0                	test   %eax,%eax
>> +  87:	0f 88 b9 00 00 00    	js     146 <ib_uverbs_poll_cq+0x146>
>> +  8d:	75 21                	jne    b0 <ib_uverbs_poll_cq+0xb0>
>> +  8f:	b9 08 00 00 00       	mov    $0x8,%ecx
>> +  94:	8d 54 24 08          	lea    0x8(%esp),%edx
>> +  98:	89 e8                	mov    %ebp,%eax
>> +  9a:	bf f2 ff ff ff       	mov    $0xfffffff2,%edi
>> +  9f:	e8 fc ff ff ff       	call   a0 <ib_uverbs_poll_cq+0xa0>
>> +			a0: R_386_PC32	copy_to_user
>> +  a4:	85 c0                	test   %eax,%eax
>> +  a6:	0f 44 7c 24 04       	cmove  0x4(%esp),%edi
>> +  ab:	e9 96 00 00 00       	jmp    146 <ib_uverbs_poll_cq+0x146>
>> +  b0:	8b 44 24 50          	mov    0x50(%esp),%eax
>> +  b4:	8b 54 24 54          	mov    0x54(%esp),%edx
>> +  b8:	b9 30 00 00 00       	mov    $0x30,%ecx
>> +  bd:	89 44 24 20          	mov    %eax,0x20(%esp)
>> +  c1:	8b 44 24 58          	mov    0x58(%esp),%eax
>> +  c5:	89 54 24 24          	mov    %edx,0x24(%esp)
>> +  c9:	8b 54 24 74          	mov    0x74(%esp),%edx
>> +  cd:	89 44 24 28          	mov    %eax,0x28(%esp)
>> +  d1:	8b 44 24 5c          	mov    0x5c(%esp),%eax
>> +  d5:	89 44 24 2c          	mov    %eax,0x2c(%esp)
>> +  d9:	8b 44 24 60          	mov    0x60(%esp),%eax
>> +  dd:	89 44 24 30          	mov    %eax,0x30(%esp)
>> +  e1:	8b 44 24 64          	mov    0x64(%esp),%eax
>> +  e5:	89 44 24 34          	mov    %eax,0x34(%esp)
>> +  e9:	8b 44 24 6c          	mov    0x6c(%esp),%eax
>> +  ed:	89 44 24 38          	mov    %eax,0x38(%esp)
>> +  f1:	8b 44 24 68          	mov    0x68(%esp),%eax
>> +  f5:	8b 40 20             	mov    0x20(%eax),%eax
>> +  f8:	89 54 24 44          	mov    %edx,0x44(%esp)
>> +  fc:	8d 54 24 20          	lea    0x20(%esp),%edx
>> + 100:	c6 44 24 4f 00       	movb   $0x0,0x4f(%esp)
>> + 105:	89 44 24 3c          	mov    %eax,0x3c(%esp)
>> + 109:	8b 44 24 70          	mov    0x70(%esp),%eax
>> + 10d:	89 44 24 40          	mov    %eax,0x40(%esp)
>> + 111:	8b 44 24 78          	mov    0x78(%esp),%eax
>> + 115:	89 44 24 48          	mov    %eax,0x48(%esp)
>> + 119:	8b 44 24 7c          	mov    0x7c(%esp),%eax
>> + 11d:	66 89 44 24 4c       	mov    %ax,0x4c(%esp)
>> + 122:	8a 44 24 7e          	mov    0x7e(%esp),%al
>> + 126:	88 44 24 4e          	mov    %al,0x4e(%esp)
>> + 12a:	89 f0                	mov    %esi,%eax
>> + 12c:	e8 fc ff ff ff       	call   12d <ib_uverbs_poll_cq+0x12d>
>> +			12d: R_386_PC32	copy_to_user
>> + 131:	85 c0                	test   %eax,%eax
>> + 133:	75 0c                	jne    141 <ib_uverbs_poll_cq+0x141>
>> + 135:	83 c6 30             	add    $0x30,%esi
>> + 138:	ff 44 24 08          	incl   0x8(%esp)
>> + 13c:	e9 25 ff ff ff       	jmp    66 <ib_uverbs_poll_cq+0x66>
>> + 141:	bf f2 ff ff ff       	mov    $0xfffffff2,%edi
>> + 146:	89 d8                	mov    %ebx,%eax
>> + 148:	e8 fc ff ff ff       	call   149 <ib_uverbs_poll_cq+0x149>
>> +			149: R_386_PC32	.text.put_cq_read
>> + 14d:	81 c4 84 00 00 00    	add    $0x84,%esp
>> + 153:	89 f8                	mov    %edi,%eax
>> + 155:	5b                   	pop    %ebx
>> + 156:	5e                   	pop    %esi
>> + 157:	5f                   	pop    %edi
>> + 158:	5d                   	pop    %ebp
>> + 159:	c3                   	ret
>>
>>  Disassembly of section .text.ib_uverbs_req_notify_cq:
>>
>> @@ -1915,7 +1893,7 @@
>>    94:	b8 7b 00 00 00       	mov    $0x7b,%eax
>>  			95: R_386_32	.text.ib_uverbs_create_qp
>>    99:	e8 fc ff ff ff       	call   9a <ib_uverbs_create_qp+0x9a>
>> -			9a: R_386_PC32	.text.trace_kmalloc.constprop.3
>> +			9a: R_386_PC32	.text.trace_kmalloc.constprop.4
>>    9e:	59                   	pop    %ecx
>>    9f:	c7 85 50 ff ff ff f4 	movl   $0xfffffff4,-0xb0(%ebp)
>>    a6:	ff ff ff
>> @@ -2241,7 +2219,7 @@
>>    68:	b8 4f 00 00 00       	mov    $0x4f,%eax
>>  			69: R_386_32	.text.ib_uverbs_query_qp
>>    6d:	e8 fc ff ff ff       	call   6e <ib_uverbs_query_qp+0x6e>
>> -			6e: R_386_PC32	.text.trace_kmalloc.constprop.3
>> +			6e: R_386_PC32	.text.trace_kmalloc.constprop.4
>>    72:	59                   	pop    %ecx
>>    73:	c7 85 5c ff ff ff 10 	movl   $0x10,-0xa4(%ebp)
>>    7a:	00 00 00
>> @@ -2260,7 +2238,7 @@
>>    a3:	b8 86 00 00 00       	mov    $0x86,%eax
>>  			a4: R_386_32	.text.ib_uverbs_query_qp
>>    a8:	e8 fc ff ff ff       	call   a9 <ib_uverbs_query_qp+0xa9>
>> -			a9: R_386_PC32	.text.trace_kmalloc.constprop.3
>> +			a9: R_386_PC32	.text.trace_kmalloc.constprop.4
>>    ad:	5a                   	pop    %edx
>>    ae:	85 db                	test   %ebx,%ebx
>>    b0:	0f 84 c1 01 00 00    	je     277 <ib_uverbs_query_qp+0x277>
>> @@ -2462,7 +2440,7 @@
>>    88:	b8 6f 00 00 00       	mov    $0x6f,%eax
>>  			89: R_386_32	.text.ib_uverbs_modify_qp
>>    8d:	e8 fc ff ff ff       	call   8e <ib_uverbs_modify_qp+0x8e>
>> -			8e: R_386_PC32	.text.trace_kmalloc.constprop.3
>> +			8e: R_386_PC32	.text.trace_kmalloc.constprop.4
>>    92:	5a                   	pop    %edx
>>    93:	ba f4 ff ff ff       	mov    $0xfffffff4,%edx
>>    98:	85 db                	test   %ebx,%ebx
>> @@ -3129,7 +3107,7 @@
>>    6a:	b8 4c 00 00 00       	mov    $0x4c,%eax
>>  			6b: R_386_32	.text.ib_uverbs_create_ah
>>    6f:	e8 fc ff ff ff       	call   70 <ib_uverbs_create_ah+0x70>
>> -			70: R_386_PC32	.text.trace_kmalloc.constprop.3
>> +			70: R_386_PC32	.text.trace_kmalloc.constprop.4
>>    74:	58                   	pop    %eax
>>    75:	85 db                	test   %ebx,%ebx
>>    77:	75 0a                	jne    83 <ib_uverbs_create_ah+0x83>
>> @@ -3396,7 +3374,7 @@
>>    af:	b8 91 00 00 00       	mov    $0x91,%eax
>>  			b0: R_386_32	.text.ib_uverbs_attach_mcast
>>    b4:	e8 fc ff ff ff       	call   b5 <ib_uverbs_attach_mcast+0xb5>
>> -			b5: R_386_PC32	.text.trace_kmalloc.constprop.3
>> +			b5: R_386_PC32	.text.trace_kmalloc.constprop.4
>>    b9:	58                   	pop    %eax
>>    ba:	85 db                	test   %ebx,%ebx
>>    bc:	75 07                	jne    c5 <ib_uverbs_attach_mcast+0xc5>
>> @@ -3572,7 +3550,7 @@
>>    77:	b8 5e 00 00 00       	mov    $0x5e,%eax
>>  			78: R_386_32	.text.ib_uverbs_create_srq
>>    7c:	e8 fc ff ff ff       	call   7d <ib_uverbs_create_srq+0x7d>
>> -			7d: R_386_PC32	.text.trace_kmalloc.constprop.3
>> +			7d: R_386_PC32	.text.trace_kmalloc.constprop.4
>>    81:	ba f4 ff ff ff       	mov    $0xfffffff4,%edx
>>    86:	58                   	pop    %eax
>>    87:	85 db                	test   %ebx,%ebx
>>
>> Needless to say, after my change the diff only shows the truly changed
>> functions.
>>
>> - Michael
>>
> 
> Updated patch
> 
> 
> gcc:
> 2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
> 
>        Make function clone name numbering independent.
>        * cgraph.h (clone_function_name_1): Add clone_num argument
>        * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids.
>        (clone_function_name): Use counters from the hashtable.
> 
> lto:
> 2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
> 
>        * lto-partition.c (privatize_symbol_name_1): Pass 0 to
>        clone_function_name_1.
> 
> 
> testsuite:
> 2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
> 
> 	Clone numbering should be independent for each function.
> 	* gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test.
> 
> ---
>  gcc/cgraph.h                                  |  2 +-
>  gcc/cgraphclones.c                            | 16 ++++++++---
>  gcc/lto/lto-partition.c                       |  2 +-
>  gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++
>  4 files changed, 52 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c
> 
> diff --git gcc/cgraph.h gcc/cgraph.h
> index a8b1b4c..fe44bd0 100644
> --- gcc/cgraph.h
> +++ gcc/cgraph.h
> @@ -2368,7 +2368,7 @@ basic_block init_lowered_empty_function (tree, bool,
> profile_count);
>  tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
>  /* In cgraphclones.c  */
> 
> -tree clone_function_name_1 (const char *, const char *);
> +tree clone_function_name_1 (const char *, const char *, unsigned int);
>  tree clone_function_name (tree decl, const char *);
> 
>  void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
> diff --git gcc/cgraphclones.c gcc/cgraphclones.c
> index 6e84a31..7de7a65 100644
> --- gcc/cgraphclones.c
> +++ gcc/cgraphclones.c
> @@ -512,13 +512,13 @@ cgraph_node::create_clone (tree new_decl, profile_count
> prof_count,
>    return new_node;
>  }
> 
> -static GTY(()) unsigned int clone_fn_id_num;
> +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
> 
>  /* Return a new assembler name for a clone with SUFFIX of a decl named
>     NAME.  */
> 
>  tree
> -clone_function_name_1 (const char *name, const char *suffix)
> +clone_function_name_1 (const char *name, const char *suffix, unsigned int
> clone_num)
>  {
>    size_t len = strlen (name);
>    char *tmp_name, *prefix;
> @@ -527,7 +527,7 @@ clone_function_name_1 (const char *name, const char *suffix)
>    memcpy (prefix, name, len);
>    strcpy (prefix + len + 1, suffix);
>    prefix[len] = symbol_table::symbol_suffix_separator ();
> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
> +  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_num);
>    return get_identifier (tmp_name);
>  }
> 
> @@ -537,7 +537,15 @@ tree
>  clone_function_name (tree decl, const char *suffix)
>  {
>    tree name = DECL_ASSEMBLER_NAME (decl);
> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
> +  const char *decl_name = IDENTIFIER_POINTER (name);
> +  unsigned int *suffix_counter;
> +  if (!clone_fn_ids) {
> +    /* Initialize the per-function counter hash table on first call */
> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
> +  }
> +  suffix_counter = &clone_fn_ids->get_or_insert(decl_name);
> +  *suffix_counter = *suffix_counter + 1;
> +  return clone_function_name_1 (decl_name, suffix, *suffix_counter);
>  }
> 
> 
> diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c
> index c7a5710..4b15232 100644
> --- gcc/lto/lto-partition.c
> +++ gcc/lto/lto-partition.c
> @@ -965,7 +965,7 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
>    name = maybe_rewrite_identifier (name);
>    symtab->change_decl_assembler_name (decl,
>  				      clone_function_name_1 (name,
> -							     "lto_priv"));
> +							     "lto_priv", 0));
> 
>    if (node->lto_file_data)
>      lto_record_renamed_decl (node->lto_file_data, name,
> diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c
> gcc/testsuite/gcc.dg/independent-cloneids-1.c
> new file mode 100644
> index 0000000..d723e20
> --- /dev/null
> +++ gcc/testsuite/gcc.dg/independent-cloneids-1.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp"  } */
> +
> +extern int printf (const char *, ...);
> +
> +static int __attribute__ ((noinline))
> +foo (int arg)
> +{
> +  return 7 * arg;
> +}
> +
> +static int __attribute__ ((noinline))
> +bar (int arg)
> +{
> +  return arg * arg;
> +}
> +
> +int
> +baz (int arg)
> +{
> +  printf("%d\n", bar (3));
> +  printf("%d\n", bar (4));
> +  printf("%d\n", foo (5));
> +  printf("%d\n", foo (6));
> +  /* adding or removing the following call should not affect foo
> +     function's clone numbering */
> +  printf("%d\n", bar (7));
> +  return foo (8);
> +}
> +
> +/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */
> +/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */
> +/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */
> +/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */
> +/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */
> +/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */
> +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */
> +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */
> 

Ping? Could someone else please take a look at this since Richard appears to be
overloaded at the moment?


Regards,
- Michael


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Make function clone name numbering independent.
  2018-07-31 17:40               ` Michael Ploujnikov
@ 2018-08-01 10:38                 ` Richard Biener
  2018-08-02 19:05                   ` Michael Ploujnikov
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Biener @ 2018-08-01 10:38 UTC (permalink / raw)
  To: Michael Ploujnikov; +Cc: Bernhard Fischer, GCC Patches

On Tue, Jul 31, 2018 at 7:40 PM Michael Ploujnikov
<michael.ploujnikov@oracle.com> wrote:
>
> On 2018-07-26 01:27 PM, Michael Ploujnikov wrote:
> > On 2018-07-24 09:57 AM, Michael Ploujnikov wrote:
> >> On 2018-07-20 06:05 AM, Richard Biener wrote:
> >>>>  /* Return a new assembler name for a clone with SUFFIX of a decl named
> >>>>     NAME.  */
> >>>> @@ -521,14 +521,13 @@ tree
> >>>>  clone_function_name_1 (const char *name, const char *suffix)
> >>>
> >>> pass this function the counter to use....
> >>>
> >>>>  {
> >>>>    size_t len = strlen (name);
> >>>> -  char *tmp_name, *prefix;
> >>>> +  char *prefix;
> >>>>
> >>>>    prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
> >>>>    memcpy (prefix, name, len);
> >>>>    strcpy (prefix + len + 1, suffix);
> >>>>    prefix[len] = symbol_table::symbol_suffix_separator ();
> >>>> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
> >>>
> >>> and keep using ASM_FORMAT_PRIVATE_NAME here.  You need to change
> >>> the lto/lto-partition.c caller (just use zero as counter).
> >>>
> >>>> -  return get_identifier (tmp_name);
> >>>> +  return get_identifier (prefix);
> >>>>  }
> >>>>
> >>>>  /* Return a new assembler name for a clone of DECL with SUFFIX.  */
> >>>> @@ -537,7 +536,17 @@ tree
> >>>>  clone_function_name (tree decl, const char *suffix)
> >>>>  {
> >>>>    tree name = DECL_ASSEMBLER_NAME (decl);
> >>>> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
> >>>> +  const char *decl_name = IDENTIFIER_POINTER (name);
> >>>> +  char *numbered_name;
> >>>> +  unsigned int *suffix_counter;
> >>>> +  if (!clone_fn_ids) {
> >>>> +    /* Initialize the per-function counter hash table if this is the first call */
> >>>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
> >>>> +  }
> >>>
> >>> I still do not like throwing memory at the problem in this way for the
> >>> little benefit
> >>> this change provides :/
> >>>
> >>> So no approval from me at this point...
> >>>
> >>> Richard.
> >>
> >> Can you give me an idea of the memory constraints that are involved?
> >>
> >> The highest memory usage increase that I could find was when compiling
> >> a source file (from Linux) with roughly 10,000 functions. It showed a 2kB
> >> increase over the before-patch use of 6936kB which is barely 0.03%.
> >>
> >> Using a single counter can result in more confusing namespacing when
> >> you have .bar.clone.4 despite there only being 3 clones of .bar.
> >>
> >> From a practical point of view this change is helpful to anyone
> >> diffing binary output such as forensic analysts, Debian Reproducible
> >> Builds or even someone validating compiler output (before and after an input
> >> source patch). The extra changes that this patch alleviates are a
> >> distraction and could even be misleading. For example, applying a
> >> source patch to the same Linux source produces the following binary
> >> diff before my change:
> >>
> >> --- /tmp/output.o.objdump
> >> +++ /tmp/patched-output.o.objdump
> >> @@ -1,5 +1,5 @@
> >>
> >> -/tmp/uverbs_cmd/output.o:     file format elf32-i386
> >> +/tmp/uverbs_cmd/patched-output.o:     file format elf32-i386
> >>
> >>
> >>  Disassembly of section .text.get_order:
> >> @@ -265,12 +265,12 @@
> >>     3:       e9 fc ff ff ff          jmp    4 <put_cq_read+0x4>
> >>                      4: R_386_PC32   .text.put_uobj_read
> >>
> >> -Disassembly of section .text.trace_kmalloc.constprop.3:
> >> +Disassembly of section .text.trace_kmalloc.constprop.4:
> >>
> >> -00000000 <trace_kmalloc.constprop.3>:
> >> +00000000 <trace_kmalloc.constprop.4>:
> >>     0:       83 3d 04 00 00 00 00    cmpl   $0x0,0x4
> >>                      2: R_386_32     __tracepoint_kmalloc
> >> -   7:       74 34                   je     3d <trace_kmalloc.constprop.3+0x3d>
> >> +   7:       74 34                   je     3d <trace_kmalloc.constprop.4+0x3d>
> >>     9:       55                      push   %ebp
> >>     a:       89 cd                   mov    %ecx,%ebp
> >>     c:       57                      push   %edi
> >> @@ -281,7 +281,7 @@
> >>    13:       8b 1d 10 00 00 00       mov    0x10,%ebx
> >>                      15: R_386_32    __tracepoint_kmalloc
> >>    19:       85 db                   test   %ebx,%ebx
> >> -  1b:       74 1b                   je     38 <trace_kmalloc.constprop.3+0x38>
> >> +  1b:       74 1b                   je     38 <trace_kmalloc.constprop.4+0x38>
> >>    1d:       68 d0 00 00 00          push   $0xd0
> >>    22:       89 fa                   mov    %edi,%edx
> >>    24:       89 f0                   mov    %esi,%eax
> >> @@ -292,7 +292,7 @@
> >>    31:       58                      pop    %eax
> >>    32:       83 3b 00                cmpl   $0x0,(%ebx)
> >>    35:       5a                      pop    %edx
> >> -  36:       eb e3                   jmp    1b <trace_kmalloc.constprop.3+0x1b>
> >> +  36:       eb e3                   jmp    1b <trace_kmalloc.constprop.4+0x1b>
> >>    38:       5b                      pop    %ebx
> >>    39:       5e                      pop    %esi
> >>    3a:       5f                      pop    %edi
> >> @@ -846,7 +846,7 @@
> >>    78:       b8 5f 00 00 00          mov    $0x5f,%eax
> >>                      79: R_386_32    .text.ib_uverbs_alloc_pd
> >>    7d:       e8 fc ff ff ff          call   7e <ib_uverbs_alloc_pd+0x7e>
> >> -                    7e: R_386_PC32  .text.trace_kmalloc.constprop.3
> >> +                    7e: R_386_PC32  .text.trace_kmalloc.constprop.4
> >>    82:       c7 45 d4 f4 ff ff ff    movl   $0xfffffff4,-0x2c(%ebp)
> >>    89:       59                      pop    %ecx
> >>    8a:       85 db                   test   %ebx,%ebx
> >> @@ -1068,7 +1068,7 @@
> >>    9c:       b8 83 00 00 00          mov    $0x83,%eax
> >>                      9d: R_386_32    .text.ib_uverbs_reg_mr
> >>    a1:       e8 fc ff ff ff          call   a2 <ib_uverbs_reg_mr+0xa2>
> >> -                    a2: R_386_PC32  .text.trace_kmalloc.constprop.3
> >> +                    a2: R_386_PC32  .text.trace_kmalloc.constprop.4
> >>    a6:       ba f4 ff ff ff          mov    $0xfffffff4,%edx
> >>    ab:       58                      pop    %eax
> >>    ac:       85 db                   test   %ebx,%ebx
> >> @@ -1385,7 +1385,7 @@
> >>    99:       b8 7b 00 00 00          mov    $0x7b,%eax
> >>                      9a: R_386_32    .text.ib_uverbs_create_cq
> >>    9e:       e8 fc ff ff ff          call   9f <ib_uverbs_create_cq+0x9f>
> >> -                    9f: R_386_PC32  .text.trace_kmalloc.constprop.3
> >> +                    9f: R_386_PC32  .text.trace_kmalloc.constprop.4
> >>    a3:       58                      pop    %eax
> >>    a4:       85 db                   test   %ebx,%ebx
> >>    a6:       75 0a                   jne    b2 <ib_uverbs_create_cq+0xb2>
> >> @@ -1607,129 +1607,107 @@
> >>  00000000 <ib_uverbs_poll_cq>:
> >>     0:       55                      push   %ebp
> >>     1:       57                      push   %edi
> >> -   2:       89 c7                   mov    %eax,%edi
> >> -   4:       56                      push   %esi
> >> -   5:       89 ce                   mov    %ecx,%esi
> >> -   7:       b9 10 00 00 00          mov    $0x10,%ecx
> >> -   c:       53                      push   %ebx
> >> -   d:       83 ec 18                sub    $0x18,%esp
> >> -  10:       8d 44 24 08             lea    0x8(%esp),%eax
> >> -  14:       e8 fc ff ff ff          call   15 <ib_uverbs_poll_cq+0x15>
> >> -                    15: R_386_PC32  copy_from_user
> >> -  19:       85 c0                   test   %eax,%eax
> >> -  1b:       0f 85 34 01 00 00       jne    155 <ib_uverbs_poll_cq+0x155>
> >> -  21:       6b 44 24 14 34          imul   $0x34,0x14(%esp),%eax
> >> -  26:       ba d0 00 00 00          mov    $0xd0,%edx
> >> -  2b:       e8 fc ff ff ff          call   2c <ib_uverbs_poll_cq+0x2c>
> >> -                    2c: R_386_PC32  __kmalloc
> >> -  30:       89 c5                   mov    %eax,%ebp
> >> -  32:       85 c0                   test   %eax,%eax
> >> -  34:       0f 84 22 01 00 00       je     15c <ib_uverbs_poll_cq+0x15c>
> >> -  3a:       6b 44 24 14 30          imul   $0x30,0x14(%esp),%eax
> >> -  3f:       ba d0 00 00 00          mov    $0xd0,%edx
> >> -  44:       83 c0 08                add    $0x8,%eax
> >> -  47:       89 44 24 04             mov    %eax,0x4(%esp)
> >> -  4b:       e8 fc ff ff ff          call   4c <ib_uverbs_poll_cq+0x4c>
> >> -                    4c: R_386_PC32  __kmalloc
> >> -  50:       ba f4 ff ff ff          mov    $0xfffffff4,%edx
> >> -  55:       89 04 24                mov    %eax,(%esp)
> >> -  58:       85 c0                   test   %eax,%eax
> >> -  5a:       0f 84 e1 00 00 00       je     141 <ib_uverbs_poll_cq+0x141>
> >> -  60:       8b 4f 58                mov    0x58(%edi),%ecx
> >> -  63:       6a 00                   push   $0x0
> >> -  65:       b8 00 00 00 00          mov    $0x0,%eax
> >> -                    66: R_386_32    ib_uverbs_cq_idr
> >> -  6a:       8b 54 24 14             mov    0x14(%esp),%edx
> >> -  6e:       e8 fc ff ff ff          call   6f <ib_uverbs_poll_cq+0x6f>
> >> -                    6f: R_386_PC32  .text.idr_read_obj
> >> -  73:       ba ea ff ff ff          mov    $0xffffffea,%edx
> >> -  78:       89 c7                   mov    %eax,%edi
> >> -  7a:       58                      pop    %eax
> >> -  7b:       85 ff                   test   %edi,%edi
> >> -  7d:       0f 84 ae 00 00 00       je     131 <ib_uverbs_poll_cq+0x131>
> >> -  83:       8b 1f                   mov    (%edi),%ebx
> >> -  85:       8b 54 24 14             mov    0x14(%esp),%edx
> >> -  89:       89 e9                   mov    %ebp,%ecx
> >> -  8b:       89 f8                   mov    %edi,%eax
> >> -  8d:       ff 93 70 01 00 00       call   *0x170(%ebx)
> >> -  93:       8b 1c 24                mov    (%esp),%ebx
> >> -  96:       89 03                   mov    %eax,(%ebx)
> >> -  98:       89 f8                   mov    %edi,%eax
> >> -  9a:       e8 fc ff ff ff          call   9b <ib_uverbs_poll_cq+0x9b>
> >> -                    9b: R_386_PC32  .text.put_cq_read
> >> -  9f:       8b 1c 24                mov    (%esp),%ebx
> >> -  a2:       89 e8                   mov    %ebp,%eax
> >> -  a4:       6b 3b 34                imul   $0x34,(%ebx),%edi
> >> -  a7:       8d 53 08                lea    0x8(%ebx),%edx
> >> -  aa:       01 ef                   add    %ebp,%edi
> >> -  ac:       39 f8                   cmp    %edi,%eax
> >> -  ae:       74 67                   je     117 <ib_uverbs_poll_cq+0x117>
> >> -  b0:       8b 08                   mov    (%eax),%ecx
> >> -  b2:       8b 58 04                mov    0x4(%eax),%ebx
> >> -  b5:       83 c2 30                add    $0x30,%edx
> >> -  b8:       83 c0 34                add    $0x34,%eax
> >> -  bb:       89 4a d0                mov    %ecx,-0x30(%edx)
> >> -  be:       89 5a d4                mov    %ebx,-0x2c(%edx)
> >> -  c1:       8b 48 d4                mov    -0x2c(%eax),%ecx
> >> -  c4:       89 4a d8                mov    %ecx,-0x28(%edx)
> >> -  c7:       8b 48 d8                mov    -0x28(%eax),%ecx
> >> -  ca:       89 4a dc                mov    %ecx,-0x24(%edx)
> >> -  cd:       8b 48 dc                mov    -0x24(%eax),%ecx
> >> -  d0:       89 4a e0                mov    %ecx,-0x20(%edx)
> >> -  d3:       8b 48 e0                mov    -0x20(%eax),%ecx
> >> -  d6:       89 4a e4                mov    %ecx,-0x1c(%edx)
> >> -  d9:       8b 48 e8                mov    -0x18(%eax),%ecx
> >> -  dc:       89 4a e8                mov    %ecx,-0x18(%edx)
> >> -  df:       8b 48 e4                mov    -0x1c(%eax),%ecx
> >> -  e2:       8b 49 20                mov    0x20(%ecx),%ecx
> >> -  e5:       89 4a ec                mov    %ecx,-0x14(%edx)
> >> -  e8:       8b 48 ec                mov    -0x14(%eax),%ecx
> >> -  eb:       89 4a f0                mov    %ecx,-0x10(%edx)
> >> -  ee:       8b 48 f0                mov    -0x10(%eax),%ecx
> >> -  f1:       89 4a f4                mov    %ecx,-0xc(%edx)
> >> -  f4:       8b 48 f4                mov    -0xc(%eax),%ecx
> >> -  f7:       66 89 4a f8             mov    %cx,-0x8(%edx)
> >> -  fb:       66 8b 48 f6             mov    -0xa(%eax),%cx
> >> -  ff:       66 89 4a fa             mov    %cx,-0x6(%edx)
> >> - 103:       8a 48 f8                mov    -0x8(%eax),%cl
> >> - 106:       88 4a fc                mov    %cl,-0x4(%edx)
> >> - 109:       8a 48 f9                mov    -0x7(%eax),%cl
> >> - 10c:       88 4a fd                mov    %cl,-0x3(%edx)
> >> - 10f:       8a 48 fa                mov    -0x6(%eax),%cl
> >> - 112:       88 4a fe                mov    %cl,-0x2(%edx)
> >> - 115:       eb 95                   jmp    ac <ib_uverbs_poll_cq+0xac>
> >> - 117:       8b 14 24                mov    (%esp),%edx
> >> - 11a:       8b 4c 24 04             mov    0x4(%esp),%ecx
> >> - 11e:       8b 44 24 08             mov    0x8(%esp),%eax
> >> - 122:       e8 fc ff ff ff          call   123 <ib_uverbs_poll_cq+0x123>
> >> -                    123: R_386_PC32 copy_to_user
> >> - 127:       83 f8 01                cmp    $0x1,%eax
> >> - 12a:       19 d2                   sbb    %edx,%edx
> >> - 12c:       f7 d2                   not    %edx
> >> - 12e:       83 e2 f2                and    $0xfffffff2,%edx
> >> - 131:       8b 04 24                mov    (%esp),%eax
> >> - 134:       89 54 24 04             mov    %edx,0x4(%esp)
> >> - 138:       e8 fc ff ff ff          call   139 <ib_uverbs_poll_cq+0x139>
> >> -                    139: R_386_PC32 kfree
> >> - 13d:       8b 54 24 04             mov    0x4(%esp),%edx
> >> - 141:       89 e8                   mov    %ebp,%eax
> >> - 143:       89 14 24                mov    %edx,(%esp)
> >> - 146:       e8 fc ff ff ff          call   147 <ib_uverbs_poll_cq+0x147>
> >> -                    147: R_386_PC32 kfree
> >> - 14b:       8b 14 24                mov    (%esp),%edx
> >> - 14e:       85 d2                   test   %edx,%edx
> >> - 150:       0f 45 f2                cmovne %edx,%esi
> >> - 153:       eb 0c                   jmp    161 <ib_uverbs_poll_cq+0x161>
> >> - 155:       be f2 ff ff ff          mov    $0xfffffff2,%esi
> >> - 15a:       eb 05                   jmp    161 <ib_uverbs_poll_cq+0x161>
> >> - 15c:       be f4 ff ff ff          mov    $0xfffffff4,%esi
> >> - 161:       83 c4 18                add    $0x18,%esp
> >> - 164:       89 f0                   mov    %esi,%eax
> >> - 166:       5b                      pop    %ebx
> >> - 167:       5e                      pop    %esi
> >> - 168:       5f                      pop    %edi
> >> - 169:       5d                      pop    %ebp
> >> - 16a:       c3                      ret
> >> +   2:       bf f2 ff ff ff          mov    $0xfffffff2,%edi
> >> +   7:       56                      push   %esi
> >> +   8:       53                      push   %ebx
> >> +   9:       89 c3                   mov    %eax,%ebx
> >> +   b:       81 ec 84 00 00 00       sub    $0x84,%esp
> >> +  11:       89 4c 24 04             mov    %ecx,0x4(%esp)
> >> +  15:       8d 44 24 10             lea    0x10(%esp),%eax
> >> +  19:       b9 10 00 00 00          mov    $0x10,%ecx
> >> +  1e:       e8 fc ff ff ff          call   1f <ib_uverbs_poll_cq+0x1f>
> >> +                    1f: R_386_PC32  copy_from_user
> >> +  23:       89 04 24                mov    %eax,(%esp)
> >> +  26:       85 c0                   test   %eax,%eax
> >> +  28:       0f 85 1f 01 00 00       jne    14d <ib_uverbs_poll_cq+0x14d>
> >> +  2e:       8b 4b 58                mov    0x58(%ebx),%ecx
> >> +  31:       6a 00                   push   $0x0
> >> +  33:       b8 00 00 00 00          mov    $0x0,%eax
> >> +                    34: R_386_32    ib_uverbs_cq_idr
> >> +  38:       bf ea ff ff ff          mov    $0xffffffea,%edi
> >> +  3d:       8b 54 24 1c             mov    0x1c(%esp),%edx
> >> +  41:       e8 fc ff ff ff          call   42 <ib_uverbs_poll_cq+0x42>
> >> +                    42: R_386_PC32  .text.idr_read_obj
> >> +  46:       89 c3                   mov    %eax,%ebx
> >> +  48:       58                      pop    %eax
> >> +  49:       85 db                   test   %ebx,%ebx
> >> +  4b:       0f 84 fc 00 00 00       je     14d <ib_uverbs_poll_cq+0x14d>
> >> +  51:       8b 6c 24 10             mov    0x10(%esp),%ebp
> >> +  55:       b9 02 00 00 00          mov    $0x2,%ecx
> >> +  5a:       8d 7c 24 08             lea    0x8(%esp),%edi
> >> +  5e:       8b 04 24                mov    (%esp),%eax
> >> +  61:       8d 75 08                lea    0x8(%ebp),%esi
> >> +  64:       f3 ab                   rep stos %eax,%es:(%edi)
> >> +  66:       8b 44 24 1c             mov    0x1c(%esp),%eax
> >> +  6a:       39 44 24 08             cmp    %eax,0x8(%esp)
> >> +  6e:       73 1f                   jae    8f <ib_uverbs_poll_cq+0x8f>
> >> +  70:       8b 3b                   mov    (%ebx),%edi
> >> +  72:       8d 4c 24 50             lea    0x50(%esp),%ecx
> >> +  76:       ba 01 00 00 00          mov    $0x1,%edx
> >> +  7b:       89 d8                   mov    %ebx,%eax
> >> +  7d:       ff 97 70 01 00 00       call   *0x170(%edi)
> >> +  83:       89 c7                   mov    %eax,%edi
> >> +  85:       85 c0                   test   %eax,%eax
> >> +  87:       0f 88 b9 00 00 00       js     146 <ib_uverbs_poll_cq+0x146>
> >> +  8d:       75 21                   jne    b0 <ib_uverbs_poll_cq+0xb0>
> >> +  8f:       b9 08 00 00 00          mov    $0x8,%ecx
> >> +  94:       8d 54 24 08             lea    0x8(%esp),%edx
> >> +  98:       89 e8                   mov    %ebp,%eax
> >> +  9a:       bf f2 ff ff ff          mov    $0xfffffff2,%edi
> >> +  9f:       e8 fc ff ff ff          call   a0 <ib_uverbs_poll_cq+0xa0>
> >> +                    a0: R_386_PC32  copy_to_user
> >> +  a4:       85 c0                   test   %eax,%eax
> >> +  a6:       0f 44 7c 24 04          cmove  0x4(%esp),%edi
> >> +  ab:       e9 96 00 00 00          jmp    146 <ib_uverbs_poll_cq+0x146>
> >> +  b0:       8b 44 24 50             mov    0x50(%esp),%eax
> >> +  b4:       8b 54 24 54             mov    0x54(%esp),%edx
> >> +  b8:       b9 30 00 00 00          mov    $0x30,%ecx
> >> +  bd:       89 44 24 20             mov    %eax,0x20(%esp)
> >> +  c1:       8b 44 24 58             mov    0x58(%esp),%eax
> >> +  c5:       89 54 24 24             mov    %edx,0x24(%esp)
> >> +  c9:       8b 54 24 74             mov    0x74(%esp),%edx
> >> +  cd:       89 44 24 28             mov    %eax,0x28(%esp)
> >> +  d1:       8b 44 24 5c             mov    0x5c(%esp),%eax
> >> +  d5:       89 44 24 2c             mov    %eax,0x2c(%esp)
> >> +  d9:       8b 44 24 60             mov    0x60(%esp),%eax
> >> +  dd:       89 44 24 30             mov    %eax,0x30(%esp)
> >> +  e1:       8b 44 24 64             mov    0x64(%esp),%eax
> >> +  e5:       89 44 24 34             mov    %eax,0x34(%esp)
> >> +  e9:       8b 44 24 6c             mov    0x6c(%esp),%eax
> >> +  ed:       89 44 24 38             mov    %eax,0x38(%esp)
> >> +  f1:       8b 44 24 68             mov    0x68(%esp),%eax
> >> +  f5:       8b 40 20                mov    0x20(%eax),%eax
> >> +  f8:       89 54 24 44             mov    %edx,0x44(%esp)
> >> +  fc:       8d 54 24 20             lea    0x20(%esp),%edx
> >> + 100:       c6 44 24 4f 00          movb   $0x0,0x4f(%esp)
> >> + 105:       89 44 24 3c             mov    %eax,0x3c(%esp)
> >> + 109:       8b 44 24 70             mov    0x70(%esp),%eax
> >> + 10d:       89 44 24 40             mov    %eax,0x40(%esp)
> >> + 111:       8b 44 24 78             mov    0x78(%esp),%eax
> >> + 115:       89 44 24 48             mov    %eax,0x48(%esp)
> >> + 119:       8b 44 24 7c             mov    0x7c(%esp),%eax
> >> + 11d:       66 89 44 24 4c          mov    %ax,0x4c(%esp)
> >> + 122:       8a 44 24 7e             mov    0x7e(%esp),%al
> >> + 126:       88 44 24 4e             mov    %al,0x4e(%esp)
> >> + 12a:       89 f0                   mov    %esi,%eax
> >> + 12c:       e8 fc ff ff ff          call   12d <ib_uverbs_poll_cq+0x12d>
> >> +                    12d: R_386_PC32 copy_to_user
> >> + 131:       85 c0                   test   %eax,%eax
> >> + 133:       75 0c                   jne    141 <ib_uverbs_poll_cq+0x141>
> >> + 135:       83 c6 30                add    $0x30,%esi
> >> + 138:       ff 44 24 08             incl   0x8(%esp)
> >> + 13c:       e9 25 ff ff ff          jmp    66 <ib_uverbs_poll_cq+0x66>
> >> + 141:       bf f2 ff ff ff          mov    $0xfffffff2,%edi
> >> + 146:       89 d8                   mov    %ebx,%eax
> >> + 148:       e8 fc ff ff ff          call   149 <ib_uverbs_poll_cq+0x149>
> >> +                    149: R_386_PC32 .text.put_cq_read
> >> + 14d:       81 c4 84 00 00 00       add    $0x84,%esp
> >> + 153:       89 f8                   mov    %edi,%eax
> >> + 155:       5b                      pop    %ebx
> >> + 156:       5e                      pop    %esi
> >> + 157:       5f                      pop    %edi
> >> + 158:       5d                      pop    %ebp
> >> + 159:       c3                      ret
> >>
> >>  Disassembly of section .text.ib_uverbs_req_notify_cq:
> >>
> >> @@ -1915,7 +1893,7 @@
> >>    94:       b8 7b 00 00 00          mov    $0x7b,%eax
> >>                      95: R_386_32    .text.ib_uverbs_create_qp
> >>    99:       e8 fc ff ff ff          call   9a <ib_uverbs_create_qp+0x9a>
> >> -                    9a: R_386_PC32  .text.trace_kmalloc.constprop.3
> >> +                    9a: R_386_PC32  .text.trace_kmalloc.constprop.4
> >>    9e:       59                      pop    %ecx
> >>    9f:       c7 85 50 ff ff ff f4    movl   $0xfffffff4,-0xb0(%ebp)
> >>    a6:       ff ff ff
> >> @@ -2241,7 +2219,7 @@
> >>    68:       b8 4f 00 00 00          mov    $0x4f,%eax
> >>                      69: R_386_32    .text.ib_uverbs_query_qp
> >>    6d:       e8 fc ff ff ff          call   6e <ib_uverbs_query_qp+0x6e>
> >> -                    6e: R_386_PC32  .text.trace_kmalloc.constprop.3
> >> +                    6e: R_386_PC32  .text.trace_kmalloc.constprop.4
> >>    72:       59                      pop    %ecx
> >>    73:       c7 85 5c ff ff ff 10    movl   $0x10,-0xa4(%ebp)
> >>    7a:       00 00 00
> >> @@ -2260,7 +2238,7 @@
> >>    a3:       b8 86 00 00 00          mov    $0x86,%eax
> >>                      a4: R_386_32    .text.ib_uverbs_query_qp
> >>    a8:       e8 fc ff ff ff          call   a9 <ib_uverbs_query_qp+0xa9>
> >> -                    a9: R_386_PC32  .text.trace_kmalloc.constprop.3
> >> +                    a9: R_386_PC32  .text.trace_kmalloc.constprop.4
> >>    ad:       5a                      pop    %edx
> >>    ae:       85 db                   test   %ebx,%ebx
> >>    b0:       0f 84 c1 01 00 00       je     277 <ib_uverbs_query_qp+0x277>
> >> @@ -2462,7 +2440,7 @@
> >>    88:       b8 6f 00 00 00          mov    $0x6f,%eax
> >>                      89: R_386_32    .text.ib_uverbs_modify_qp
> >>    8d:       e8 fc ff ff ff          call   8e <ib_uverbs_modify_qp+0x8e>
> >> -                    8e: R_386_PC32  .text.trace_kmalloc.constprop.3
> >> +                    8e: R_386_PC32  .text.trace_kmalloc.constprop.4
> >>    92:       5a                      pop    %edx
> >>    93:       ba f4 ff ff ff          mov    $0xfffffff4,%edx
> >>    98:       85 db                   test   %ebx,%ebx
> >> @@ -3129,7 +3107,7 @@
> >>    6a:       b8 4c 00 00 00          mov    $0x4c,%eax
> >>                      6b: R_386_32    .text.ib_uverbs_create_ah
> >>    6f:       e8 fc ff ff ff          call   70 <ib_uverbs_create_ah+0x70>
> >> -                    70: R_386_PC32  .text.trace_kmalloc.constprop.3
> >> +                    70: R_386_PC32  .text.trace_kmalloc.constprop.4
> >>    74:       58                      pop    %eax
> >>    75:       85 db                   test   %ebx,%ebx
> >>    77:       75 0a                   jne    83 <ib_uverbs_create_ah+0x83>
> >> @@ -3396,7 +3374,7 @@
> >>    af:       b8 91 00 00 00          mov    $0x91,%eax
> >>                      b0: R_386_32    .text.ib_uverbs_attach_mcast
> >>    b4:       e8 fc ff ff ff          call   b5 <ib_uverbs_attach_mcast+0xb5>
> >> -                    b5: R_386_PC32  .text.trace_kmalloc.constprop.3
> >> +                    b5: R_386_PC32  .text.trace_kmalloc.constprop.4
> >>    b9:       58                      pop    %eax
> >>    ba:       85 db                   test   %ebx,%ebx
> >>    bc:       75 07                   jne    c5 <ib_uverbs_attach_mcast+0xc5>
> >> @@ -3572,7 +3550,7 @@
> >>    77:       b8 5e 00 00 00          mov    $0x5e,%eax
> >>                      78: R_386_32    .text.ib_uverbs_create_srq
> >>    7c:       e8 fc ff ff ff          call   7d <ib_uverbs_create_srq+0x7d>
> >> -                    7d: R_386_PC32  .text.trace_kmalloc.constprop.3
> >> +                    7d: R_386_PC32  .text.trace_kmalloc.constprop.4
> >>    81:       ba f4 ff ff ff          mov    $0xfffffff4,%edx
> >>    86:       58                      pop    %eax
> >>    87:       85 db                   test   %ebx,%ebx
> >>
> >> Needless to say, after my change the diff only shows the truly changed
> >> functions.
> >>
> >> - Michael
> >>
> >
> > Updated patch
> >
> >
> > gcc:
> > 2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
> >
> >        Make function clone name numbering independent.
> >        * cgraph.h (clone_function_name_1): Add clone_num argument
> >        * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids.
> >        (clone_function_name): Use counters from the hashtable.
> >
> > lto:
> > 2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
> >
> >        * lto-partition.c (privatize_symbol_name_1): Pass 0 to
> >        clone_function_name_1.
> >
> >
> > testsuite:
> > 2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
> >
> >       Clone numbering should be independent for each function.
> >       * gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test.
> >
> > ---
> >  gcc/cgraph.h                                  |  2 +-
> >  gcc/cgraphclones.c                            | 16 ++++++++---
> >  gcc/lto/lto-partition.c                       |  2 +-
> >  gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++
> >  4 files changed, 52 insertions(+), 6 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c
> >
> > diff --git gcc/cgraph.h gcc/cgraph.h
> > index a8b1b4c..fe44bd0 100644
> > --- gcc/cgraph.h
> > +++ gcc/cgraph.h
> > @@ -2368,7 +2368,7 @@ basic_block init_lowered_empty_function (tree, bool,
> > profile_count);
> >  tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
> >  /* In cgraphclones.c  */
> >
> > -tree clone_function_name_1 (const char *, const char *);
> > +tree clone_function_name_1 (const char *, const char *, unsigned int);
> >  tree clone_function_name (tree decl, const char *);
> >
> >  void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
> > diff --git gcc/cgraphclones.c gcc/cgraphclones.c
> > index 6e84a31..7de7a65 100644
> > --- gcc/cgraphclones.c
> > +++ gcc/cgraphclones.c
> > @@ -512,13 +512,13 @@ cgraph_node::create_clone (tree new_decl, profile_count
> > prof_count,
> >    return new_node;
> >  }
> >
> > -static GTY(()) unsigned int clone_fn_id_num;
> > +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
> >
> >  /* Return a new assembler name for a clone with SUFFIX of a decl named
> >     NAME.  */
> >
> >  tree
> > -clone_function_name_1 (const char *name, const char *suffix)
> > +clone_function_name_1 (const char *name, const char *suffix, unsigned int
> > clone_num)
> >  {
> >    size_t len = strlen (name);
> >    char *tmp_name, *prefix;
> > @@ -527,7 +527,7 @@ clone_function_name_1 (const char *name, const char *suffix)
> >    memcpy (prefix, name, len);
> >    strcpy (prefix + len + 1, suffix);
> >    prefix[len] = symbol_table::symbol_suffix_separator ();
> > -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
> > +  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_num);
> >    return get_identifier (tmp_name);
> >  }
> >
> > @@ -537,7 +537,15 @@ tree
> >  clone_function_name (tree decl, const char *suffix)
> >  {
> >    tree name = DECL_ASSEMBLER_NAME (decl);
> > -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
> > +  const char *decl_name = IDENTIFIER_POINTER (name);
> > +  unsigned int *suffix_counter;
> > +  if (!clone_fn_ids) {
> > +    /* Initialize the per-function counter hash table on first call */
> > +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
> > +  }
> > +  suffix_counter = &clone_fn_ids->get_or_insert(decl_name);
> > +  *suffix_counter = *suffix_counter + 1;
> > +  return clone_function_name_1 (decl_name, suffix, *suffix_counter);
> >  }
> >
> >
> > diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c
> > index c7a5710..4b15232 100644
> > --- gcc/lto/lto-partition.c
> > +++ gcc/lto/lto-partition.c
> > @@ -965,7 +965,7 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
> >    name = maybe_rewrite_identifier (name);
> >    symtab->change_decl_assembler_name (decl,
> >                                     clone_function_name_1 (name,
> > -                                                          "lto_priv"));
> > +                                                          "lto_priv", 0));
> >
> >    if (node->lto_file_data)
> >      lto_record_renamed_decl (node->lto_file_data, name,
> > diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c
> > gcc/testsuite/gcc.dg/independent-cloneids-1.c
> > new file mode 100644
> > index 0000000..d723e20
> > --- /dev/null
> > +++ gcc/testsuite/gcc.dg/independent-cloneids-1.c
> > @@ -0,0 +1,38 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp"  } */
> > +
> > +extern int printf (const char *, ...);
> > +
> > +static int __attribute__ ((noinline))
> > +foo (int arg)
> > +{
> > +  return 7 * arg;
> > +}
> > +
> > +static int __attribute__ ((noinline))
> > +bar (int arg)
> > +{
> > +  return arg * arg;
> > +}
> > +
> > +int
> > +baz (int arg)
> > +{
> > +  printf("%d\n", bar (3));
> > +  printf("%d\n", bar (4));
> > +  printf("%d\n", foo (5));
> > +  printf("%d\n", foo (6));
> > +  /* adding or removing the following call should not affect foo
> > +     function's clone numbering */
> > +  printf("%d\n", bar (7));
> > +  return foo (8);
> > +}
> > +
> > +/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */
> > +/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */
> > +/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */
> > +/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */
> > +/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */
> > +/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */
> > +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */
> > +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */
> >
>
> Ping? Could someone else please take a look at this since Richard appears to be
> overloaded at the moment?

The implementation is now how I suggested.  I'm still not 100% agreeing to
the solution of using a hash-map here as I hoped that most callers have
a good idea themselves what "number" of clone they are creating.  Thus
most of them should be explicit in passing the number.  But I didn't investigate
any of them (but the LTO one which I think doesn't need the suffix at all).

So if anybody besides me is fine with this approach feel free to approve.

One minor nit:

> > +  if (!clone_fn_ids) {
> > +    /* Initialize the per-function counter hash table on first call */
> > +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
> > +  }

omit {} around single stmts, the comment doesn't add much information but
if you want to keep it move it before the if().

Thanks and sorry for the delay (slowly wading through older patches...),
Richard.

>
>
> Regards,
> - Michael
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Make function clone name numbering independent.
  2018-08-01 10:38                 ` Richard Biener
@ 2018-08-02 19:05                   ` Michael Ploujnikov
  2018-08-13 23:58                     ` [PING][PATCH] " Michael Ploujnikov
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Ploujnikov @ 2018-08-02 19:05 UTC (permalink / raw)
  To: Richard Biener; +Cc: Bernhard Fischer, GCC Patches


[-- Attachment #1.1.1: Type: text/plain, Size: 34010 bytes --]

On 2018-08-01 06:37 AM, Richard Biener wrote:
> On Tue, Jul 31, 2018 at 7:40 PM Michael Ploujnikov
> <michael.ploujnikov@oracle.com> wrote:
>>
>> On 2018-07-26 01:27 PM, Michael Ploujnikov wrote:
>>> On 2018-07-24 09:57 AM, Michael Ploujnikov wrote:
>>>> On 2018-07-20 06:05 AM, Richard Biener wrote:
>>>>>>  /* Return a new assembler name for a clone with SUFFIX of a decl named
>>>>>>     NAME.  */
>>>>>> @@ -521,14 +521,13 @@ tree
>>>>>>  clone_function_name_1 (const char *name, const char *suffix)
>>>>>
>>>>> pass this function the counter to use....
>>>>>
>>>>>>  {
>>>>>>    size_t len = strlen (name);
>>>>>> -  char *tmp_name, *prefix;
>>>>>> +  char *prefix;
>>>>>>
>>>>>>    prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
>>>>>>    memcpy (prefix, name, len);
>>>>>>    strcpy (prefix + len + 1, suffix);
>>>>>>    prefix[len] = symbol_table::symbol_suffix_separator ();
>>>>>> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
>>>>>
>>>>> and keep using ASM_FORMAT_PRIVATE_NAME here.  You need to change
>>>>> the lto/lto-partition.c caller (just use zero as counter).
>>>>>
>>>>>> -  return get_identifier (tmp_name);
>>>>>> +  return get_identifier (prefix);
>>>>>>  }
>>>>>>
>>>>>>  /* Return a new assembler name for a clone of DECL with SUFFIX.  */
>>>>>> @@ -537,7 +536,17 @@ tree
>>>>>>  clone_function_name (tree decl, const char *suffix)
>>>>>>  {
>>>>>>    tree name = DECL_ASSEMBLER_NAME (decl);
>>>>>> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
>>>>>> +  const char *decl_name = IDENTIFIER_POINTER (name);
>>>>>> +  char *numbered_name;
>>>>>> +  unsigned int *suffix_counter;
>>>>>> +  if (!clone_fn_ids) {
>>>>>> +    /* Initialize the per-function counter hash table if this is the first call */
>>>>>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
>>>>>> +  }
>>>>>
>>>>> I still do not like throwing memory at the problem in this way for the
>>>>> little benefit
>>>>> this change provides :/
>>>>>
>>>>> So no approval from me at this point...
>>>>>
>>>>> Richard.
>>>>
>>>> Can you give me an idea of the memory constraints that are involved?
>>>>
>>>> The highest memory usage increase that I could find was when compiling
>>>> a source file (from Linux) with roughly 10,000 functions. It showed a 2kB
>>>> increase over the before-patch use of 6936kB which is barely 0.03%.
>>>>
>>>> Using a single counter can result in more confusing namespacing when
>>>> you have .bar.clone.4 despite there only being 3 clones of .bar.
>>>>
>>>> From a practical point of view this change is helpful to anyone
>>>> diffing binary output such as forensic analysts, Debian Reproducible
>>>> Builds or even someone validating compiler output (before and after an input
>>>> source patch). The extra changes that this patch alleviates are a
>>>> distraction and could even be misleading. For example, applying a
>>>> source patch to the same Linux source produces the following binary
>>>> diff before my change:
>>>>
>>>> --- /tmp/output.o.objdump
>>>> +++ /tmp/patched-output.o.objdump
>>>> @@ -1,5 +1,5 @@
>>>>
>>>> -/tmp/uverbs_cmd/output.o:     file format elf32-i386
>>>> +/tmp/uverbs_cmd/patched-output.o:     file format elf32-i386
>>>>
>>>>
>>>>  Disassembly of section .text.get_order:
>>>> @@ -265,12 +265,12 @@
>>>>     3:       e9 fc ff ff ff          jmp    4 <put_cq_read+0x4>
>>>>                      4: R_386_PC32   .text.put_uobj_read
>>>>
>>>> -Disassembly of section .text.trace_kmalloc.constprop.3:
>>>> +Disassembly of section .text.trace_kmalloc.constprop.4:
>>>>
>>>> -00000000 <trace_kmalloc.constprop.3>:
>>>> +00000000 <trace_kmalloc.constprop.4>:
>>>>     0:       83 3d 04 00 00 00 00    cmpl   $0x0,0x4
>>>>                      2: R_386_32     __tracepoint_kmalloc
>>>> -   7:       74 34                   je     3d <trace_kmalloc.constprop.3+0x3d>
>>>> +   7:       74 34                   je     3d <trace_kmalloc.constprop.4+0x3d>
>>>>     9:       55                      push   %ebp
>>>>     a:       89 cd                   mov    %ecx,%ebp
>>>>     c:       57                      push   %edi
>>>> @@ -281,7 +281,7 @@
>>>>    13:       8b 1d 10 00 00 00       mov    0x10,%ebx
>>>>                      15: R_386_32    __tracepoint_kmalloc
>>>>    19:       85 db                   test   %ebx,%ebx
>>>> -  1b:       74 1b                   je     38 <trace_kmalloc.constprop.3+0x38>
>>>> +  1b:       74 1b                   je     38 <trace_kmalloc.constprop.4+0x38>
>>>>    1d:       68 d0 00 00 00          push   $0xd0
>>>>    22:       89 fa                   mov    %edi,%edx
>>>>    24:       89 f0                   mov    %esi,%eax
>>>> @@ -292,7 +292,7 @@
>>>>    31:       58                      pop    %eax
>>>>    32:       83 3b 00                cmpl   $0x0,(%ebx)
>>>>    35:       5a                      pop    %edx
>>>> -  36:       eb e3                   jmp    1b <trace_kmalloc.constprop.3+0x1b>
>>>> +  36:       eb e3                   jmp    1b <trace_kmalloc.constprop.4+0x1b>
>>>>    38:       5b                      pop    %ebx
>>>>    39:       5e                      pop    %esi
>>>>    3a:       5f                      pop    %edi
>>>> @@ -846,7 +846,7 @@
>>>>    78:       b8 5f 00 00 00          mov    $0x5f,%eax
>>>>                      79: R_386_32    .text.ib_uverbs_alloc_pd
>>>>    7d:       e8 fc ff ff ff          call   7e <ib_uverbs_alloc_pd+0x7e>
>>>> -                    7e: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>> +                    7e: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>    82:       c7 45 d4 f4 ff ff ff    movl   $0xfffffff4,-0x2c(%ebp)
>>>>    89:       59                      pop    %ecx
>>>>    8a:       85 db                   test   %ebx,%ebx
>>>> @@ -1068,7 +1068,7 @@
>>>>    9c:       b8 83 00 00 00          mov    $0x83,%eax
>>>>                      9d: R_386_32    .text.ib_uverbs_reg_mr
>>>>    a1:       e8 fc ff ff ff          call   a2 <ib_uverbs_reg_mr+0xa2>
>>>> -                    a2: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>> +                    a2: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>    a6:       ba f4 ff ff ff          mov    $0xfffffff4,%edx
>>>>    ab:       58                      pop    %eax
>>>>    ac:       85 db                   test   %ebx,%ebx
>>>> @@ -1385,7 +1385,7 @@
>>>>    99:       b8 7b 00 00 00          mov    $0x7b,%eax
>>>>                      9a: R_386_32    .text.ib_uverbs_create_cq
>>>>    9e:       e8 fc ff ff ff          call   9f <ib_uverbs_create_cq+0x9f>
>>>> -                    9f: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>> +                    9f: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>    a3:       58                      pop    %eax
>>>>    a4:       85 db                   test   %ebx,%ebx
>>>>    a6:       75 0a                   jne    b2 <ib_uverbs_create_cq+0xb2>
>>>> @@ -1607,129 +1607,107 @@
>>>>  00000000 <ib_uverbs_poll_cq>:
>>>>     0:       55                      push   %ebp
>>>>     1:       57                      push   %edi
>>>> -   2:       89 c7                   mov    %eax,%edi
>>>> -   4:       56                      push   %esi
>>>> -   5:       89 ce                   mov    %ecx,%esi
>>>> -   7:       b9 10 00 00 00          mov    $0x10,%ecx
>>>> -   c:       53                      push   %ebx
>>>> -   d:       83 ec 18                sub    $0x18,%esp
>>>> -  10:       8d 44 24 08             lea    0x8(%esp),%eax
>>>> -  14:       e8 fc ff ff ff          call   15 <ib_uverbs_poll_cq+0x15>
>>>> -                    15: R_386_PC32  copy_from_user
>>>> -  19:       85 c0                   test   %eax,%eax
>>>> -  1b:       0f 85 34 01 00 00       jne    155 <ib_uverbs_poll_cq+0x155>
>>>> -  21:       6b 44 24 14 34          imul   $0x34,0x14(%esp),%eax
>>>> -  26:       ba d0 00 00 00          mov    $0xd0,%edx
>>>> -  2b:       e8 fc ff ff ff          call   2c <ib_uverbs_poll_cq+0x2c>
>>>> -                    2c: R_386_PC32  __kmalloc
>>>> -  30:       89 c5                   mov    %eax,%ebp
>>>> -  32:       85 c0                   test   %eax,%eax
>>>> -  34:       0f 84 22 01 00 00       je     15c <ib_uverbs_poll_cq+0x15c>
>>>> -  3a:       6b 44 24 14 30          imul   $0x30,0x14(%esp),%eax
>>>> -  3f:       ba d0 00 00 00          mov    $0xd0,%edx
>>>> -  44:       83 c0 08                add    $0x8,%eax
>>>> -  47:       89 44 24 04             mov    %eax,0x4(%esp)
>>>> -  4b:       e8 fc ff ff ff          call   4c <ib_uverbs_poll_cq+0x4c>
>>>> -                    4c: R_386_PC32  __kmalloc
>>>> -  50:       ba f4 ff ff ff          mov    $0xfffffff4,%edx
>>>> -  55:       89 04 24                mov    %eax,(%esp)
>>>> -  58:       85 c0                   test   %eax,%eax
>>>> -  5a:       0f 84 e1 00 00 00       je     141 <ib_uverbs_poll_cq+0x141>
>>>> -  60:       8b 4f 58                mov    0x58(%edi),%ecx
>>>> -  63:       6a 00                   push   $0x0
>>>> -  65:       b8 00 00 00 00          mov    $0x0,%eax
>>>> -                    66: R_386_32    ib_uverbs_cq_idr
>>>> -  6a:       8b 54 24 14             mov    0x14(%esp),%edx
>>>> -  6e:       e8 fc ff ff ff          call   6f <ib_uverbs_poll_cq+0x6f>
>>>> -                    6f: R_386_PC32  .text.idr_read_obj
>>>> -  73:       ba ea ff ff ff          mov    $0xffffffea,%edx
>>>> -  78:       89 c7                   mov    %eax,%edi
>>>> -  7a:       58                      pop    %eax
>>>> -  7b:       85 ff                   test   %edi,%edi
>>>> -  7d:       0f 84 ae 00 00 00       je     131 <ib_uverbs_poll_cq+0x131>
>>>> -  83:       8b 1f                   mov    (%edi),%ebx
>>>> -  85:       8b 54 24 14             mov    0x14(%esp),%edx
>>>> -  89:       89 e9                   mov    %ebp,%ecx
>>>> -  8b:       89 f8                   mov    %edi,%eax
>>>> -  8d:       ff 93 70 01 00 00       call   *0x170(%ebx)
>>>> -  93:       8b 1c 24                mov    (%esp),%ebx
>>>> -  96:       89 03                   mov    %eax,(%ebx)
>>>> -  98:       89 f8                   mov    %edi,%eax
>>>> -  9a:       e8 fc ff ff ff          call   9b <ib_uverbs_poll_cq+0x9b>
>>>> -                    9b: R_386_PC32  .text.put_cq_read
>>>> -  9f:       8b 1c 24                mov    (%esp),%ebx
>>>> -  a2:       89 e8                   mov    %ebp,%eax
>>>> -  a4:       6b 3b 34                imul   $0x34,(%ebx),%edi
>>>> -  a7:       8d 53 08                lea    0x8(%ebx),%edx
>>>> -  aa:       01 ef                   add    %ebp,%edi
>>>> -  ac:       39 f8                   cmp    %edi,%eax
>>>> -  ae:       74 67                   je     117 <ib_uverbs_poll_cq+0x117>
>>>> -  b0:       8b 08                   mov    (%eax),%ecx
>>>> -  b2:       8b 58 04                mov    0x4(%eax),%ebx
>>>> -  b5:       83 c2 30                add    $0x30,%edx
>>>> -  b8:       83 c0 34                add    $0x34,%eax
>>>> -  bb:       89 4a d0                mov    %ecx,-0x30(%edx)
>>>> -  be:       89 5a d4                mov    %ebx,-0x2c(%edx)
>>>> -  c1:       8b 48 d4                mov    -0x2c(%eax),%ecx
>>>> -  c4:       89 4a d8                mov    %ecx,-0x28(%edx)
>>>> -  c7:       8b 48 d8                mov    -0x28(%eax),%ecx
>>>> -  ca:       89 4a dc                mov    %ecx,-0x24(%edx)
>>>> -  cd:       8b 48 dc                mov    -0x24(%eax),%ecx
>>>> -  d0:       89 4a e0                mov    %ecx,-0x20(%edx)
>>>> -  d3:       8b 48 e0                mov    -0x20(%eax),%ecx
>>>> -  d6:       89 4a e4                mov    %ecx,-0x1c(%edx)
>>>> -  d9:       8b 48 e8                mov    -0x18(%eax),%ecx
>>>> -  dc:       89 4a e8                mov    %ecx,-0x18(%edx)
>>>> -  df:       8b 48 e4                mov    -0x1c(%eax),%ecx
>>>> -  e2:       8b 49 20                mov    0x20(%ecx),%ecx
>>>> -  e5:       89 4a ec                mov    %ecx,-0x14(%edx)
>>>> -  e8:       8b 48 ec                mov    -0x14(%eax),%ecx
>>>> -  eb:       89 4a f0                mov    %ecx,-0x10(%edx)
>>>> -  ee:       8b 48 f0                mov    -0x10(%eax),%ecx
>>>> -  f1:       89 4a f4                mov    %ecx,-0xc(%edx)
>>>> -  f4:       8b 48 f4                mov    -0xc(%eax),%ecx
>>>> -  f7:       66 89 4a f8             mov    %cx,-0x8(%edx)
>>>> -  fb:       66 8b 48 f6             mov    -0xa(%eax),%cx
>>>> -  ff:       66 89 4a fa             mov    %cx,-0x6(%edx)
>>>> - 103:       8a 48 f8                mov    -0x8(%eax),%cl
>>>> - 106:       88 4a fc                mov    %cl,-0x4(%edx)
>>>> - 109:       8a 48 f9                mov    -0x7(%eax),%cl
>>>> - 10c:       88 4a fd                mov    %cl,-0x3(%edx)
>>>> - 10f:       8a 48 fa                mov    -0x6(%eax),%cl
>>>> - 112:       88 4a fe                mov    %cl,-0x2(%edx)
>>>> - 115:       eb 95                   jmp    ac <ib_uverbs_poll_cq+0xac>
>>>> - 117:       8b 14 24                mov    (%esp),%edx
>>>> - 11a:       8b 4c 24 04             mov    0x4(%esp),%ecx
>>>> - 11e:       8b 44 24 08             mov    0x8(%esp),%eax
>>>> - 122:       e8 fc ff ff ff          call   123 <ib_uverbs_poll_cq+0x123>
>>>> -                    123: R_386_PC32 copy_to_user
>>>> - 127:       83 f8 01                cmp    $0x1,%eax
>>>> - 12a:       19 d2                   sbb    %edx,%edx
>>>> - 12c:       f7 d2                   not    %edx
>>>> - 12e:       83 e2 f2                and    $0xfffffff2,%edx
>>>> - 131:       8b 04 24                mov    (%esp),%eax
>>>> - 134:       89 54 24 04             mov    %edx,0x4(%esp)
>>>> - 138:       e8 fc ff ff ff          call   139 <ib_uverbs_poll_cq+0x139>
>>>> -                    139: R_386_PC32 kfree
>>>> - 13d:       8b 54 24 04             mov    0x4(%esp),%edx
>>>> - 141:       89 e8                   mov    %ebp,%eax
>>>> - 143:       89 14 24                mov    %edx,(%esp)
>>>> - 146:       e8 fc ff ff ff          call   147 <ib_uverbs_poll_cq+0x147>
>>>> -                    147: R_386_PC32 kfree
>>>> - 14b:       8b 14 24                mov    (%esp),%edx
>>>> - 14e:       85 d2                   test   %edx,%edx
>>>> - 150:       0f 45 f2                cmovne %edx,%esi
>>>> - 153:       eb 0c                   jmp    161 <ib_uverbs_poll_cq+0x161>
>>>> - 155:       be f2 ff ff ff          mov    $0xfffffff2,%esi
>>>> - 15a:       eb 05                   jmp    161 <ib_uverbs_poll_cq+0x161>
>>>> - 15c:       be f4 ff ff ff          mov    $0xfffffff4,%esi
>>>> - 161:       83 c4 18                add    $0x18,%esp
>>>> - 164:       89 f0                   mov    %esi,%eax
>>>> - 166:       5b                      pop    %ebx
>>>> - 167:       5e                      pop    %esi
>>>> - 168:       5f                      pop    %edi
>>>> - 169:       5d                      pop    %ebp
>>>> - 16a:       c3                      ret
>>>> +   2:       bf f2 ff ff ff          mov    $0xfffffff2,%edi
>>>> +   7:       56                      push   %esi
>>>> +   8:       53                      push   %ebx
>>>> +   9:       89 c3                   mov    %eax,%ebx
>>>> +   b:       81 ec 84 00 00 00       sub    $0x84,%esp
>>>> +  11:       89 4c 24 04             mov    %ecx,0x4(%esp)
>>>> +  15:       8d 44 24 10             lea    0x10(%esp),%eax
>>>> +  19:       b9 10 00 00 00          mov    $0x10,%ecx
>>>> +  1e:       e8 fc ff ff ff          call   1f <ib_uverbs_poll_cq+0x1f>
>>>> +                    1f: R_386_PC32  copy_from_user
>>>> +  23:       89 04 24                mov    %eax,(%esp)
>>>> +  26:       85 c0                   test   %eax,%eax
>>>> +  28:       0f 85 1f 01 00 00       jne    14d <ib_uverbs_poll_cq+0x14d>
>>>> +  2e:       8b 4b 58                mov    0x58(%ebx),%ecx
>>>> +  31:       6a 00                   push   $0x0
>>>> +  33:       b8 00 00 00 00          mov    $0x0,%eax
>>>> +                    34: R_386_32    ib_uverbs_cq_idr
>>>> +  38:       bf ea ff ff ff          mov    $0xffffffea,%edi
>>>> +  3d:       8b 54 24 1c             mov    0x1c(%esp),%edx
>>>> +  41:       e8 fc ff ff ff          call   42 <ib_uverbs_poll_cq+0x42>
>>>> +                    42: R_386_PC32  .text.idr_read_obj
>>>> +  46:       89 c3                   mov    %eax,%ebx
>>>> +  48:       58                      pop    %eax
>>>> +  49:       85 db                   test   %ebx,%ebx
>>>> +  4b:       0f 84 fc 00 00 00       je     14d <ib_uverbs_poll_cq+0x14d>
>>>> +  51:       8b 6c 24 10             mov    0x10(%esp),%ebp
>>>> +  55:       b9 02 00 00 00          mov    $0x2,%ecx
>>>> +  5a:       8d 7c 24 08             lea    0x8(%esp),%edi
>>>> +  5e:       8b 04 24                mov    (%esp),%eax
>>>> +  61:       8d 75 08                lea    0x8(%ebp),%esi
>>>> +  64:       f3 ab                   rep stos %eax,%es:(%edi)
>>>> +  66:       8b 44 24 1c             mov    0x1c(%esp),%eax
>>>> +  6a:       39 44 24 08             cmp    %eax,0x8(%esp)
>>>> +  6e:       73 1f                   jae    8f <ib_uverbs_poll_cq+0x8f>
>>>> +  70:       8b 3b                   mov    (%ebx),%edi
>>>> +  72:       8d 4c 24 50             lea    0x50(%esp),%ecx
>>>> +  76:       ba 01 00 00 00          mov    $0x1,%edx
>>>> +  7b:       89 d8                   mov    %ebx,%eax
>>>> +  7d:       ff 97 70 01 00 00       call   *0x170(%edi)
>>>> +  83:       89 c7                   mov    %eax,%edi
>>>> +  85:       85 c0                   test   %eax,%eax
>>>> +  87:       0f 88 b9 00 00 00       js     146 <ib_uverbs_poll_cq+0x146>
>>>> +  8d:       75 21                   jne    b0 <ib_uverbs_poll_cq+0xb0>
>>>> +  8f:       b9 08 00 00 00          mov    $0x8,%ecx
>>>> +  94:       8d 54 24 08             lea    0x8(%esp),%edx
>>>> +  98:       89 e8                   mov    %ebp,%eax
>>>> +  9a:       bf f2 ff ff ff          mov    $0xfffffff2,%edi
>>>> +  9f:       e8 fc ff ff ff          call   a0 <ib_uverbs_poll_cq+0xa0>
>>>> +                    a0: R_386_PC32  copy_to_user
>>>> +  a4:       85 c0                   test   %eax,%eax
>>>> +  a6:       0f 44 7c 24 04          cmove  0x4(%esp),%edi
>>>> +  ab:       e9 96 00 00 00          jmp    146 <ib_uverbs_poll_cq+0x146>
>>>> +  b0:       8b 44 24 50             mov    0x50(%esp),%eax
>>>> +  b4:       8b 54 24 54             mov    0x54(%esp),%edx
>>>> +  b8:       b9 30 00 00 00          mov    $0x30,%ecx
>>>> +  bd:       89 44 24 20             mov    %eax,0x20(%esp)
>>>> +  c1:       8b 44 24 58             mov    0x58(%esp),%eax
>>>> +  c5:       89 54 24 24             mov    %edx,0x24(%esp)
>>>> +  c9:       8b 54 24 74             mov    0x74(%esp),%edx
>>>> +  cd:       89 44 24 28             mov    %eax,0x28(%esp)
>>>> +  d1:       8b 44 24 5c             mov    0x5c(%esp),%eax
>>>> +  d5:       89 44 24 2c             mov    %eax,0x2c(%esp)
>>>> +  d9:       8b 44 24 60             mov    0x60(%esp),%eax
>>>> +  dd:       89 44 24 30             mov    %eax,0x30(%esp)
>>>> +  e1:       8b 44 24 64             mov    0x64(%esp),%eax
>>>> +  e5:       89 44 24 34             mov    %eax,0x34(%esp)
>>>> +  e9:       8b 44 24 6c             mov    0x6c(%esp),%eax
>>>> +  ed:       89 44 24 38             mov    %eax,0x38(%esp)
>>>> +  f1:       8b 44 24 68             mov    0x68(%esp),%eax
>>>> +  f5:       8b 40 20                mov    0x20(%eax),%eax
>>>> +  f8:       89 54 24 44             mov    %edx,0x44(%esp)
>>>> +  fc:       8d 54 24 20             lea    0x20(%esp),%edx
>>>> + 100:       c6 44 24 4f 00          movb   $0x0,0x4f(%esp)
>>>> + 105:       89 44 24 3c             mov    %eax,0x3c(%esp)
>>>> + 109:       8b 44 24 70             mov    0x70(%esp),%eax
>>>> + 10d:       89 44 24 40             mov    %eax,0x40(%esp)
>>>> + 111:       8b 44 24 78             mov    0x78(%esp),%eax
>>>> + 115:       89 44 24 48             mov    %eax,0x48(%esp)
>>>> + 119:       8b 44 24 7c             mov    0x7c(%esp),%eax
>>>> + 11d:       66 89 44 24 4c          mov    %ax,0x4c(%esp)
>>>> + 122:       8a 44 24 7e             mov    0x7e(%esp),%al
>>>> + 126:       88 44 24 4e             mov    %al,0x4e(%esp)
>>>> + 12a:       89 f0                   mov    %esi,%eax
>>>> + 12c:       e8 fc ff ff ff          call   12d <ib_uverbs_poll_cq+0x12d>
>>>> +                    12d: R_386_PC32 copy_to_user
>>>> + 131:       85 c0                   test   %eax,%eax
>>>> + 133:       75 0c                   jne    141 <ib_uverbs_poll_cq+0x141>
>>>> + 135:       83 c6 30                add    $0x30,%esi
>>>> + 138:       ff 44 24 08             incl   0x8(%esp)
>>>> + 13c:       e9 25 ff ff ff          jmp    66 <ib_uverbs_poll_cq+0x66>
>>>> + 141:       bf f2 ff ff ff          mov    $0xfffffff2,%edi
>>>> + 146:       89 d8                   mov    %ebx,%eax
>>>> + 148:       e8 fc ff ff ff          call   149 <ib_uverbs_poll_cq+0x149>
>>>> +                    149: R_386_PC32 .text.put_cq_read
>>>> + 14d:       81 c4 84 00 00 00       add    $0x84,%esp
>>>> + 153:       89 f8                   mov    %edi,%eax
>>>> + 155:       5b                      pop    %ebx
>>>> + 156:       5e                      pop    %esi
>>>> + 157:       5f                      pop    %edi
>>>> + 158:       5d                      pop    %ebp
>>>> + 159:       c3                      ret
>>>>
>>>>  Disassembly of section .text.ib_uverbs_req_notify_cq:
>>>>
>>>> @@ -1915,7 +1893,7 @@
>>>>    94:       b8 7b 00 00 00          mov    $0x7b,%eax
>>>>                      95: R_386_32    .text.ib_uverbs_create_qp
>>>>    99:       e8 fc ff ff ff          call   9a <ib_uverbs_create_qp+0x9a>
>>>> -                    9a: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>> +                    9a: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>    9e:       59                      pop    %ecx
>>>>    9f:       c7 85 50 ff ff ff f4    movl   $0xfffffff4,-0xb0(%ebp)
>>>>    a6:       ff ff ff
>>>> @@ -2241,7 +2219,7 @@
>>>>    68:       b8 4f 00 00 00          mov    $0x4f,%eax
>>>>                      69: R_386_32    .text.ib_uverbs_query_qp
>>>>    6d:       e8 fc ff ff ff          call   6e <ib_uverbs_query_qp+0x6e>
>>>> -                    6e: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>> +                    6e: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>    72:       59                      pop    %ecx
>>>>    73:       c7 85 5c ff ff ff 10    movl   $0x10,-0xa4(%ebp)
>>>>    7a:       00 00 00
>>>> @@ -2260,7 +2238,7 @@
>>>>    a3:       b8 86 00 00 00          mov    $0x86,%eax
>>>>                      a4: R_386_32    .text.ib_uverbs_query_qp
>>>>    a8:       e8 fc ff ff ff          call   a9 <ib_uverbs_query_qp+0xa9>
>>>> -                    a9: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>> +                    a9: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>    ad:       5a                      pop    %edx
>>>>    ae:       85 db                   test   %ebx,%ebx
>>>>    b0:       0f 84 c1 01 00 00       je     277 <ib_uverbs_query_qp+0x277>
>>>> @@ -2462,7 +2440,7 @@
>>>>    88:       b8 6f 00 00 00          mov    $0x6f,%eax
>>>>                      89: R_386_32    .text.ib_uverbs_modify_qp
>>>>    8d:       e8 fc ff ff ff          call   8e <ib_uverbs_modify_qp+0x8e>
>>>> -                    8e: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>> +                    8e: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>    92:       5a                      pop    %edx
>>>>    93:       ba f4 ff ff ff          mov    $0xfffffff4,%edx
>>>>    98:       85 db                   test   %ebx,%ebx
>>>> @@ -3129,7 +3107,7 @@
>>>>    6a:       b8 4c 00 00 00          mov    $0x4c,%eax
>>>>                      6b: R_386_32    .text.ib_uverbs_create_ah
>>>>    6f:       e8 fc ff ff ff          call   70 <ib_uverbs_create_ah+0x70>
>>>> -                    70: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>> +                    70: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>    74:       58                      pop    %eax
>>>>    75:       85 db                   test   %ebx,%ebx
>>>>    77:       75 0a                   jne    83 <ib_uverbs_create_ah+0x83>
>>>> @@ -3396,7 +3374,7 @@
>>>>    af:       b8 91 00 00 00          mov    $0x91,%eax
>>>>                      b0: R_386_32    .text.ib_uverbs_attach_mcast
>>>>    b4:       e8 fc ff ff ff          call   b5 <ib_uverbs_attach_mcast+0xb5>
>>>> -                    b5: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>> +                    b5: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>    b9:       58                      pop    %eax
>>>>    ba:       85 db                   test   %ebx,%ebx
>>>>    bc:       75 07                   jne    c5 <ib_uverbs_attach_mcast+0xc5>
>>>> @@ -3572,7 +3550,7 @@
>>>>    77:       b8 5e 00 00 00          mov    $0x5e,%eax
>>>>                      78: R_386_32    .text.ib_uverbs_create_srq
>>>>    7c:       e8 fc ff ff ff          call   7d <ib_uverbs_create_srq+0x7d>
>>>> -                    7d: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>> +                    7d: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>    81:       ba f4 ff ff ff          mov    $0xfffffff4,%edx
>>>>    86:       58                      pop    %eax
>>>>    87:       85 db                   test   %ebx,%ebx
>>>>
>>>> Needless to say, after my change the diff only shows the truly changed
>>>> functions.
>>>>
>>>> - Michael
>>>>
>>>
>>> Updated patch
>>>
>>>
>>> gcc:
>>> 2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
>>>
>>>        Make function clone name numbering independent.
>>>        * cgraph.h (clone_function_name_1): Add clone_num argument
>>>        * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids.
>>>        (clone_function_name): Use counters from the hashtable.
>>>
>>> lto:
>>> 2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
>>>
>>>        * lto-partition.c (privatize_symbol_name_1): Pass 0 to
>>>        clone_function_name_1.
>>>
>>>
>>> testsuite:
>>> 2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
>>>
>>>       Clone numbering should be independent for each function.
>>>       * gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test.
>>>
>>> ---
>>>  gcc/cgraph.h                                  |  2 +-
>>>  gcc/cgraphclones.c                            | 16 ++++++++---
>>>  gcc/lto/lto-partition.c                       |  2 +-
>>>  gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++
>>>  4 files changed, 52 insertions(+), 6 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c
>>>
>>> diff --git gcc/cgraph.h gcc/cgraph.h
>>> index a8b1b4c..fe44bd0 100644
>>> --- gcc/cgraph.h
>>> +++ gcc/cgraph.h
>>> @@ -2368,7 +2368,7 @@ basic_block init_lowered_empty_function (tree, bool,
>>> profile_count);
>>>  tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
>>>  /* In cgraphclones.c  */
>>>
>>> -tree clone_function_name_1 (const char *, const char *);
>>> +tree clone_function_name_1 (const char *, const char *, unsigned int);
>>>  tree clone_function_name (tree decl, const char *);
>>>
>>>  void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
>>> diff --git gcc/cgraphclones.c gcc/cgraphclones.c
>>> index 6e84a31..7de7a65 100644
>>> --- gcc/cgraphclones.c
>>> +++ gcc/cgraphclones.c
>>> @@ -512,13 +512,13 @@ cgraph_node::create_clone (tree new_decl, profile_count
>>> prof_count,
>>>    return new_node;
>>>  }
>>>
>>> -static GTY(()) unsigned int clone_fn_id_num;
>>> +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
>>>
>>>  /* Return a new assembler name for a clone with SUFFIX of a decl named
>>>     NAME.  */
>>>
>>>  tree
>>> -clone_function_name_1 (const char *name, const char *suffix)
>>> +clone_function_name_1 (const char *name, const char *suffix, unsigned int
>>> clone_num)
>>>  {
>>>    size_t len = strlen (name);
>>>    char *tmp_name, *prefix;
>>> @@ -527,7 +527,7 @@ clone_function_name_1 (const char *name, const char *suffix)
>>>    memcpy (prefix, name, len);
>>>    strcpy (prefix + len + 1, suffix);
>>>    prefix[len] = symbol_table::symbol_suffix_separator ();
>>> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
>>> +  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_num);
>>>    return get_identifier (tmp_name);
>>>  }
>>>
>>> @@ -537,7 +537,15 @@ tree
>>>  clone_function_name (tree decl, const char *suffix)
>>>  {
>>>    tree name = DECL_ASSEMBLER_NAME (decl);
>>> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
>>> +  const char *decl_name = IDENTIFIER_POINTER (name);
>>> +  unsigned int *suffix_counter;
>>> +  if (!clone_fn_ids) {
>>> +    /* Initialize the per-function counter hash table on first call */
>>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
>>> +  }
>>> +  suffix_counter = &clone_fn_ids->get_or_insert(decl_name);
>>> +  *suffix_counter = *suffix_counter + 1;
>>> +  return clone_function_name_1 (decl_name, suffix, *suffix_counter);
>>>  }
>>>
>>>
>>> diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c
>>> index c7a5710..4b15232 100644
>>> --- gcc/lto/lto-partition.c
>>> +++ gcc/lto/lto-partition.c
>>> @@ -965,7 +965,7 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
>>>    name = maybe_rewrite_identifier (name);
>>>    symtab->change_decl_assembler_name (decl,
>>>                                     clone_function_name_1 (name,
>>> -                                                          "lto_priv"));
>>> +                                                          "lto_priv", 0));
>>>
>>>    if (node->lto_file_data)
>>>      lto_record_renamed_decl (node->lto_file_data, name,
>>> diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c
>>> gcc/testsuite/gcc.dg/independent-cloneids-1.c
>>> new file mode 100644
>>> index 0000000..d723e20
>>> --- /dev/null
>>> +++ gcc/testsuite/gcc.dg/independent-cloneids-1.c
>>> @@ -0,0 +1,38 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp"  } */
>>> +
>>> +extern int printf (const char *, ...);
>>> +
>>> +static int __attribute__ ((noinline))
>>> +foo (int arg)
>>> +{
>>> +  return 7 * arg;
>>> +}
>>> +
>>> +static int __attribute__ ((noinline))
>>> +bar (int arg)
>>> +{
>>> +  return arg * arg;
>>> +}
>>> +
>>> +int
>>> +baz (int arg)
>>> +{
>>> +  printf("%d\n", bar (3));
>>> +  printf("%d\n", bar (4));
>>> +  printf("%d\n", foo (5));
>>> +  printf("%d\n", foo (6));
>>> +  /* adding or removing the following call should not affect foo
>>> +     function's clone numbering */
>>> +  printf("%d\n", bar (7));
>>> +  return foo (8);
>>> +}
>>> +
>>> +/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */
>>> +/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */
>>> +/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */
>>> +/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */
>>> +/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */
>>> +/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */
>>> +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */
>>> +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */
>>>
>>
>> Ping? Could someone else please take a look at this since Richard appears to be
>> overloaded at the moment?
> 
> The implementation is now how I suggested.  I'm still not 100% agreeing to
> the solution of using a hash-map here as I hoped that most callers have
> a good idea themselves what "number" of clone they are creating.  Thus
> most of them should be explicit in passing the number.  But I didn't investigate
> any of them (but the LTO one which I think doesn't need the suffix at all).
> 
> So if anybody besides me is fine with this approach feel free to approve.
> 
> One minor nit:
> 
>>> +  if (!clone_fn_ids) {
>>> +    /* Initialize the per-function counter hash table on first call */
>>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
>>> +  }
> 
> omit {} around single stmts, the comment doesn't add much information but
> if you want to keep it move it before the if().
> 
> Thanks and sorry for the delay (slowly wading through older patches...),
> Richard.

No worries, and thank you for the review.

I dug a little deeper and found that LTO does need the numbered suffixes.
Otherwise I get errors in a lot of tests that look like:

...
spawn -ignore SIGHUP /gcc/build/gcc/xgcc -B/gcc/build/gcc/ c_lto_pr60820_0.o
c_lto_pr60820_1.o -fno-diagnostics-show-caret -fdiagnostics-color=never -flto -r
-nostdlib -O2 -flinker-output=nolto-rel -o gcc-dg-lto-pr60820-01.exe

pid is 52067 -52067
lto1: error: two or more sections for
.gnu.lto_local_in6addr_any.lto_priv.0.lto_priv.0.3f1a2975b4946e93

lto1: internal compiler error: cannot read LTO decls from /tmp/ccwo9PaB.ltrans0.o

...

which makes sense because lto-partition.c:rename_statics calls
privatize_symbol_name in a loop over symbols with the same asm name.

So I'm planning to go back to the version where clone_function_name_1 just takes
two strings and I'm wondering if I should do:

name = IDENTIFIER_POINTER (get_identifier (maybe_rewrite_identifier (name)));

in privatize_symbol_name_1 to guarantee that name always persists.


- Michael

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.2: 0001-Make-function-assembly-more-independent.patch --]
[-- Type: text/x-patch; name="0001-Make-function-assembly-more-independent.patch", Size: 4454 bytes --]

From 03e9191e18e3935171f0281b588b0c6191e46253 Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Mon, 16 Jul 2018 12:55:49 -0400
Subject: [PATCH] Make function assembly more independent.

gcc:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       Make function clone name numbering independent.
       * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids.
       (clone_function_name_1): Use it.

lto:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       * lto-partition.c (privatize_symbol_name_1): Pass a persistent
       identifier string to clone_function_name_1.

testsuite:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       Clone id counters should be completely independent from one another.
       * gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test.
---
 gcc/cgraphclones.c                            | 10 +++++--
 gcc/lto/lto-partition.c                       |  2 +-
 gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++
 3 files changed, 47 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c

diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index 6e84a31..dfed9e5 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
   return new_node;
 }
 
-static GTY(()) unsigned int clone_fn_id_num;
+static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
 
 /* Return a new assembler name for a clone with SUFFIX of a decl named
    NAME.  */
@@ -522,12 +522,18 @@ clone_function_name_1 (const char *name, const char *suffix)
 {
   size_t len = strlen (name);
   char *tmp_name, *prefix;
+  unsigned int *suffix_counter;
 
   prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
   memcpy (prefix, name, len);
   strcpy (prefix + len + 1, suffix);
   prefix[len] = symbol_table::symbol_suffix_separator ();
-  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
+  /* Initialize the per-function counter hash table on first call */
+  if (!clone_fn_ids)
+    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
+  suffix_counter = &clone_fn_ids->get_or_insert(name);
+  *suffix_counter = *suffix_counter + 1;
+  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, *suffix_counter);
   return get_identifier (tmp_name);
 }
 
diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c
index c7a5710..fc8514a 100644
--- gcc/lto/lto-partition.c
+++ gcc/lto/lto-partition.c
@@ -962,7 +962,7 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
   if (must_not_rename (node, name))
     return false;
 
-  name = maybe_rewrite_identifier (name);
+  name = IDENTIFIER_POINTER (get_identifier (maybe_rewrite_identifier (name)));
   symtab->change_decl_assembler_name (decl,
 				      clone_function_name_1 (name,
 							     "lto_priv"));
diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c gcc/testsuite/gcc.dg/independent-cloneids-1.c
new file mode 100644
index 0000000..d723e20
--- /dev/null
+++ gcc/testsuite/gcc.dg/independent-cloneids-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp"  } */
+
+extern int printf (const char *, ...);
+
+static int __attribute__ ((noinline))
+foo (int arg)
+{
+  return 7 * arg;
+}
+
+static int __attribute__ ((noinline))
+bar (int arg)
+{
+  return arg * arg;
+}
+
+int
+baz (int arg)
+{
+  printf("%d\n", bar (3));
+  printf("%d\n", bar (4));
+  printf("%d\n", foo (5));
+  printf("%d\n", foo (6));
+  /* adding or removing the following call should not affect foo
+     function's clone numbering */
+  printf("%d\n", bar (7));
+  return foo (8);
+}
+
+/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */
+/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */
+/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */
+/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */
-- 
2.7.4


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PING][PATCH] Make function clone name numbering independent.
  2018-08-02 19:05                   ` Michael Ploujnikov
@ 2018-08-13 23:58                     ` Michael Ploujnikov
  2018-08-20 19:47                       ` Jeff Law
  2018-08-31 18:49                       ` [PING v2][PATCH] " Michael Ploujnikov
  0 siblings, 2 replies; 30+ messages in thread
From: Michael Ploujnikov @ 2018-08-13 23:58 UTC (permalink / raw)
  To: Richard Biener; +Cc: Bernhard Fischer, GCC Patches, hubicka, Sriraman Tallam


[-- Attachment #1.1.1: Type: text/plain, Size: 35696 bytes --]

Ping and I've updated the patch since last time as follows:

  - unittest scans assembly rather than the constprop dump because its
    forward changed
  - unittests should handle different hosts where any of
    NO_DOT_IN_LABEL, NO_DOLLAR_IN_LABEL or __USER_LABEL_PREFIX__ may
    be defined
  - not 100% it's safe to change DECL_NAME to DECL_ASSEMBLER_NAME in
    cgraph_node::create_virtual_clone, but I've attempted to reduce
    some code duplication
  - lto-partition.c: privatize_symbol_name_1 *does* need numbered
    names
  - but cold sections don't
  - Expecting an IDENTIFIER_NODE in clone_function_name_1 avoids
    unreliable string pointer use as pointed out in the first review
  - renamed clone_function_name_1 and clone_function_name to
    numbered_clone_function_name_1 and numbered_clone_function_name to
    clarify purpose and discourage future unintended uses

Also bootstrapped and regtested.

- Michael

On 2018-08-02 03:05 PM, Michael Ploujnikov wrote:
> On 2018-08-01 06:37 AM, Richard Biener wrote:
>> On Tue, Jul 31, 2018 at 7:40 PM Michael Ploujnikov
>> <michael.ploujnikov@oracle.com> wrote:
>>>
>>> On 2018-07-26 01:27 PM, Michael Ploujnikov wrote:
>>>> On 2018-07-24 09:57 AM, Michael Ploujnikov wrote:
>>>>> On 2018-07-20 06:05 AM, Richard Biener wrote:
>>>>>>>  /* Return a new assembler name for a clone with SUFFIX of a decl named
>>>>>>>     NAME.  */
>>>>>>> @@ -521,14 +521,13 @@ tree
>>>>>>>  clone_function_name_1 (const char *name, const char *suffix)
>>>>>>
>>>>>> pass this function the counter to use....
>>>>>>
>>>>>>>  {
>>>>>>>    size_t len = strlen (name);
>>>>>>> -  char *tmp_name, *prefix;
>>>>>>> +  char *prefix;
>>>>>>>
>>>>>>>    prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
>>>>>>>    memcpy (prefix, name, len);
>>>>>>>    strcpy (prefix + len + 1, suffix);
>>>>>>>    prefix[len] = symbol_table::symbol_suffix_separator ();
>>>>>>> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
>>>>>>
>>>>>> and keep using ASM_FORMAT_PRIVATE_NAME here.  You need to change
>>>>>> the lto/lto-partition.c caller (just use zero as counter).
>>>>>>
>>>>>>> -  return get_identifier (tmp_name);
>>>>>>> +  return get_identifier (prefix);
>>>>>>>  }
>>>>>>>
>>>>>>>  /* Return a new assembler name for a clone of DECL with SUFFIX.  */
>>>>>>> @@ -537,7 +536,17 @@ tree
>>>>>>>  clone_function_name (tree decl, const char *suffix)
>>>>>>>  {
>>>>>>>    tree name = DECL_ASSEMBLER_NAME (decl);
>>>>>>> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
>>>>>>> +  const char *decl_name = IDENTIFIER_POINTER (name);
>>>>>>> +  char *numbered_name;
>>>>>>> +  unsigned int *suffix_counter;
>>>>>>> +  if (!clone_fn_ids) {
>>>>>>> +    /* Initialize the per-function counter hash table if this is the first call */
>>>>>>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
>>>>>>> +  }
>>>>>>
>>>>>> I still do not like throwing memory at the problem in this way for the
>>>>>> little benefit
>>>>>> this change provides :/
>>>>>>
>>>>>> So no approval from me at this point...
>>>>>>
>>>>>> Richard.
>>>>>
>>>>> Can you give me an idea of the memory constraints that are involved?
>>>>>
>>>>> The highest memory usage increase that I could find was when compiling
>>>>> a source file (from Linux) with roughly 10,000 functions. It showed a 2kB
>>>>> increase over the before-patch use of 6936kB which is barely 0.03%.
>>>>>
>>>>> Using a single counter can result in more confusing namespacing when
>>>>> you have .bar.clone.4 despite there only being 3 clones of .bar.
>>>>>
>>>>> From a practical point of view this change is helpful to anyone
>>>>> diffing binary output such as forensic analysts, Debian Reproducible
>>>>> Builds or even someone validating compiler output (before and after an input
>>>>> source patch). The extra changes that this patch alleviates are a
>>>>> distraction and could even be misleading. For example, applying a
>>>>> source patch to the same Linux source produces the following binary
>>>>> diff before my change:
>>>>>
>>>>> --- /tmp/output.o.objdump
>>>>> +++ /tmp/patched-output.o.objdump
>>>>> @@ -1,5 +1,5 @@
>>>>>
>>>>> -/tmp/uverbs_cmd/output.o:     file format elf32-i386
>>>>> +/tmp/uverbs_cmd/patched-output.o:     file format elf32-i386
>>>>>
>>>>>
>>>>>  Disassembly of section .text.get_order:
>>>>> @@ -265,12 +265,12 @@
>>>>>     3:       e9 fc ff ff ff          jmp    4 <put_cq_read+0x4>
>>>>>                      4: R_386_PC32   .text.put_uobj_read
>>>>>
>>>>> -Disassembly of section .text.trace_kmalloc.constprop.3:
>>>>> +Disassembly of section .text.trace_kmalloc.constprop.4:
>>>>>
>>>>> -00000000 <trace_kmalloc.constprop.3>:
>>>>> +00000000 <trace_kmalloc.constprop.4>:
>>>>>     0:       83 3d 04 00 00 00 00    cmpl   $0x0,0x4
>>>>>                      2: R_386_32     __tracepoint_kmalloc
>>>>> -   7:       74 34                   je     3d <trace_kmalloc.constprop.3+0x3d>
>>>>> +   7:       74 34                   je     3d <trace_kmalloc.constprop.4+0x3d>
>>>>>     9:       55                      push   %ebp
>>>>>     a:       89 cd                   mov    %ecx,%ebp
>>>>>     c:       57                      push   %edi
>>>>> @@ -281,7 +281,7 @@
>>>>>    13:       8b 1d 10 00 00 00       mov    0x10,%ebx
>>>>>                      15: R_386_32    __tracepoint_kmalloc
>>>>>    19:       85 db                   test   %ebx,%ebx
>>>>> -  1b:       74 1b                   je     38 <trace_kmalloc.constprop.3+0x38>
>>>>> +  1b:       74 1b                   je     38 <trace_kmalloc.constprop.4+0x38>
>>>>>    1d:       68 d0 00 00 00          push   $0xd0
>>>>>    22:       89 fa                   mov    %edi,%edx
>>>>>    24:       89 f0                   mov    %esi,%eax
>>>>> @@ -292,7 +292,7 @@
>>>>>    31:       58                      pop    %eax
>>>>>    32:       83 3b 00                cmpl   $0x0,(%ebx)
>>>>>    35:       5a                      pop    %edx
>>>>> -  36:       eb e3                   jmp    1b <trace_kmalloc.constprop.3+0x1b>
>>>>> +  36:       eb e3                   jmp    1b <trace_kmalloc.constprop.4+0x1b>
>>>>>    38:       5b                      pop    %ebx
>>>>>    39:       5e                      pop    %esi
>>>>>    3a:       5f                      pop    %edi
>>>>> @@ -846,7 +846,7 @@
>>>>>    78:       b8 5f 00 00 00          mov    $0x5f,%eax
>>>>>                      79: R_386_32    .text.ib_uverbs_alloc_pd
>>>>>    7d:       e8 fc ff ff ff          call   7e <ib_uverbs_alloc_pd+0x7e>
>>>>> -                    7e: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    7e: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    82:       c7 45 d4 f4 ff ff ff    movl   $0xfffffff4,-0x2c(%ebp)
>>>>>    89:       59                      pop    %ecx
>>>>>    8a:       85 db                   test   %ebx,%ebx
>>>>> @@ -1068,7 +1068,7 @@
>>>>>    9c:       b8 83 00 00 00          mov    $0x83,%eax
>>>>>                      9d: R_386_32    .text.ib_uverbs_reg_mr
>>>>>    a1:       e8 fc ff ff ff          call   a2 <ib_uverbs_reg_mr+0xa2>
>>>>> -                    a2: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    a2: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    a6:       ba f4 ff ff ff          mov    $0xfffffff4,%edx
>>>>>    ab:       58                      pop    %eax
>>>>>    ac:       85 db                   test   %ebx,%ebx
>>>>> @@ -1385,7 +1385,7 @@
>>>>>    99:       b8 7b 00 00 00          mov    $0x7b,%eax
>>>>>                      9a: R_386_32    .text.ib_uverbs_create_cq
>>>>>    9e:       e8 fc ff ff ff          call   9f <ib_uverbs_create_cq+0x9f>
>>>>> -                    9f: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    9f: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    a3:       58                      pop    %eax
>>>>>    a4:       85 db                   test   %ebx,%ebx
>>>>>    a6:       75 0a                   jne    b2 <ib_uverbs_create_cq+0xb2>
>>>>> @@ -1607,129 +1607,107 @@
>>>>>  00000000 <ib_uverbs_poll_cq>:
>>>>>     0:       55                      push   %ebp
>>>>>     1:       57                      push   %edi
>>>>> -   2:       89 c7                   mov    %eax,%edi
>>>>> -   4:       56                      push   %esi
>>>>> -   5:       89 ce                   mov    %ecx,%esi
>>>>> -   7:       b9 10 00 00 00          mov    $0x10,%ecx
>>>>> -   c:       53                      push   %ebx
>>>>> -   d:       83 ec 18                sub    $0x18,%esp
>>>>> -  10:       8d 44 24 08             lea    0x8(%esp),%eax
>>>>> -  14:       e8 fc ff ff ff          call   15 <ib_uverbs_poll_cq+0x15>
>>>>> -                    15: R_386_PC32  copy_from_user
>>>>> -  19:       85 c0                   test   %eax,%eax
>>>>> -  1b:       0f 85 34 01 00 00       jne    155 <ib_uverbs_poll_cq+0x155>
>>>>> -  21:       6b 44 24 14 34          imul   $0x34,0x14(%esp),%eax
>>>>> -  26:       ba d0 00 00 00          mov    $0xd0,%edx
>>>>> -  2b:       e8 fc ff ff ff          call   2c <ib_uverbs_poll_cq+0x2c>
>>>>> -                    2c: R_386_PC32  __kmalloc
>>>>> -  30:       89 c5                   mov    %eax,%ebp
>>>>> -  32:       85 c0                   test   %eax,%eax
>>>>> -  34:       0f 84 22 01 00 00       je     15c <ib_uverbs_poll_cq+0x15c>
>>>>> -  3a:       6b 44 24 14 30          imul   $0x30,0x14(%esp),%eax
>>>>> -  3f:       ba d0 00 00 00          mov    $0xd0,%edx
>>>>> -  44:       83 c0 08                add    $0x8,%eax
>>>>> -  47:       89 44 24 04             mov    %eax,0x4(%esp)
>>>>> -  4b:       e8 fc ff ff ff          call   4c <ib_uverbs_poll_cq+0x4c>
>>>>> -                    4c: R_386_PC32  __kmalloc
>>>>> -  50:       ba f4 ff ff ff          mov    $0xfffffff4,%edx
>>>>> -  55:       89 04 24                mov    %eax,(%esp)
>>>>> -  58:       85 c0                   test   %eax,%eax
>>>>> -  5a:       0f 84 e1 00 00 00       je     141 <ib_uverbs_poll_cq+0x141>
>>>>> -  60:       8b 4f 58                mov    0x58(%edi),%ecx
>>>>> -  63:       6a 00                   push   $0x0
>>>>> -  65:       b8 00 00 00 00          mov    $0x0,%eax
>>>>> -                    66: R_386_32    ib_uverbs_cq_idr
>>>>> -  6a:       8b 54 24 14             mov    0x14(%esp),%edx
>>>>> -  6e:       e8 fc ff ff ff          call   6f <ib_uverbs_poll_cq+0x6f>
>>>>> -                    6f: R_386_PC32  .text.idr_read_obj
>>>>> -  73:       ba ea ff ff ff          mov    $0xffffffea,%edx
>>>>> -  78:       89 c7                   mov    %eax,%edi
>>>>> -  7a:       58                      pop    %eax
>>>>> -  7b:       85 ff                   test   %edi,%edi
>>>>> -  7d:       0f 84 ae 00 00 00       je     131 <ib_uverbs_poll_cq+0x131>
>>>>> -  83:       8b 1f                   mov    (%edi),%ebx
>>>>> -  85:       8b 54 24 14             mov    0x14(%esp),%edx
>>>>> -  89:       89 e9                   mov    %ebp,%ecx
>>>>> -  8b:       89 f8                   mov    %edi,%eax
>>>>> -  8d:       ff 93 70 01 00 00       call   *0x170(%ebx)
>>>>> -  93:       8b 1c 24                mov    (%esp),%ebx
>>>>> -  96:       89 03                   mov    %eax,(%ebx)
>>>>> -  98:       89 f8                   mov    %edi,%eax
>>>>> -  9a:       e8 fc ff ff ff          call   9b <ib_uverbs_poll_cq+0x9b>
>>>>> -                    9b: R_386_PC32  .text.put_cq_read
>>>>> -  9f:       8b 1c 24                mov    (%esp),%ebx
>>>>> -  a2:       89 e8                   mov    %ebp,%eax
>>>>> -  a4:       6b 3b 34                imul   $0x34,(%ebx),%edi
>>>>> -  a7:       8d 53 08                lea    0x8(%ebx),%edx
>>>>> -  aa:       01 ef                   add    %ebp,%edi
>>>>> -  ac:       39 f8                   cmp    %edi,%eax
>>>>> -  ae:       74 67                   je     117 <ib_uverbs_poll_cq+0x117>
>>>>> -  b0:       8b 08                   mov    (%eax),%ecx
>>>>> -  b2:       8b 58 04                mov    0x4(%eax),%ebx
>>>>> -  b5:       83 c2 30                add    $0x30,%edx
>>>>> -  b8:       83 c0 34                add    $0x34,%eax
>>>>> -  bb:       89 4a d0                mov    %ecx,-0x30(%edx)
>>>>> -  be:       89 5a d4                mov    %ebx,-0x2c(%edx)
>>>>> -  c1:       8b 48 d4                mov    -0x2c(%eax),%ecx
>>>>> -  c4:       89 4a d8                mov    %ecx,-0x28(%edx)
>>>>> -  c7:       8b 48 d8                mov    -0x28(%eax),%ecx
>>>>> -  ca:       89 4a dc                mov    %ecx,-0x24(%edx)
>>>>> -  cd:       8b 48 dc                mov    -0x24(%eax),%ecx
>>>>> -  d0:       89 4a e0                mov    %ecx,-0x20(%edx)
>>>>> -  d3:       8b 48 e0                mov    -0x20(%eax),%ecx
>>>>> -  d6:       89 4a e4                mov    %ecx,-0x1c(%edx)
>>>>> -  d9:       8b 48 e8                mov    -0x18(%eax),%ecx
>>>>> -  dc:       89 4a e8                mov    %ecx,-0x18(%edx)
>>>>> -  df:       8b 48 e4                mov    -0x1c(%eax),%ecx
>>>>> -  e2:       8b 49 20                mov    0x20(%ecx),%ecx
>>>>> -  e5:       89 4a ec                mov    %ecx,-0x14(%edx)
>>>>> -  e8:       8b 48 ec                mov    -0x14(%eax),%ecx
>>>>> -  eb:       89 4a f0                mov    %ecx,-0x10(%edx)
>>>>> -  ee:       8b 48 f0                mov    -0x10(%eax),%ecx
>>>>> -  f1:       89 4a f4                mov    %ecx,-0xc(%edx)
>>>>> -  f4:       8b 48 f4                mov    -0xc(%eax),%ecx
>>>>> -  f7:       66 89 4a f8             mov    %cx,-0x8(%edx)
>>>>> -  fb:       66 8b 48 f6             mov    -0xa(%eax),%cx
>>>>> -  ff:       66 89 4a fa             mov    %cx,-0x6(%edx)
>>>>> - 103:       8a 48 f8                mov    -0x8(%eax),%cl
>>>>> - 106:       88 4a fc                mov    %cl,-0x4(%edx)
>>>>> - 109:       8a 48 f9                mov    -0x7(%eax),%cl
>>>>> - 10c:       88 4a fd                mov    %cl,-0x3(%edx)
>>>>> - 10f:       8a 48 fa                mov    -0x6(%eax),%cl
>>>>> - 112:       88 4a fe                mov    %cl,-0x2(%edx)
>>>>> - 115:       eb 95                   jmp    ac <ib_uverbs_poll_cq+0xac>
>>>>> - 117:       8b 14 24                mov    (%esp),%edx
>>>>> - 11a:       8b 4c 24 04             mov    0x4(%esp),%ecx
>>>>> - 11e:       8b 44 24 08             mov    0x8(%esp),%eax
>>>>> - 122:       e8 fc ff ff ff          call   123 <ib_uverbs_poll_cq+0x123>
>>>>> -                    123: R_386_PC32 copy_to_user
>>>>> - 127:       83 f8 01                cmp    $0x1,%eax
>>>>> - 12a:       19 d2                   sbb    %edx,%edx
>>>>> - 12c:       f7 d2                   not    %edx
>>>>> - 12e:       83 e2 f2                and    $0xfffffff2,%edx
>>>>> - 131:       8b 04 24                mov    (%esp),%eax
>>>>> - 134:       89 54 24 04             mov    %edx,0x4(%esp)
>>>>> - 138:       e8 fc ff ff ff          call   139 <ib_uverbs_poll_cq+0x139>
>>>>> -                    139: R_386_PC32 kfree
>>>>> - 13d:       8b 54 24 04             mov    0x4(%esp),%edx
>>>>> - 141:       89 e8                   mov    %ebp,%eax
>>>>> - 143:       89 14 24                mov    %edx,(%esp)
>>>>> - 146:       e8 fc ff ff ff          call   147 <ib_uverbs_poll_cq+0x147>
>>>>> -                    147: R_386_PC32 kfree
>>>>> - 14b:       8b 14 24                mov    (%esp),%edx
>>>>> - 14e:       85 d2                   test   %edx,%edx
>>>>> - 150:       0f 45 f2                cmovne %edx,%esi
>>>>> - 153:       eb 0c                   jmp    161 <ib_uverbs_poll_cq+0x161>
>>>>> - 155:       be f2 ff ff ff          mov    $0xfffffff2,%esi
>>>>> - 15a:       eb 05                   jmp    161 <ib_uverbs_poll_cq+0x161>
>>>>> - 15c:       be f4 ff ff ff          mov    $0xfffffff4,%esi
>>>>> - 161:       83 c4 18                add    $0x18,%esp
>>>>> - 164:       89 f0                   mov    %esi,%eax
>>>>> - 166:       5b                      pop    %ebx
>>>>> - 167:       5e                      pop    %esi
>>>>> - 168:       5f                      pop    %edi
>>>>> - 169:       5d                      pop    %ebp
>>>>> - 16a:       c3                      ret
>>>>> +   2:       bf f2 ff ff ff          mov    $0xfffffff2,%edi
>>>>> +   7:       56                      push   %esi
>>>>> +   8:       53                      push   %ebx
>>>>> +   9:       89 c3                   mov    %eax,%ebx
>>>>> +   b:       81 ec 84 00 00 00       sub    $0x84,%esp
>>>>> +  11:       89 4c 24 04             mov    %ecx,0x4(%esp)
>>>>> +  15:       8d 44 24 10             lea    0x10(%esp),%eax
>>>>> +  19:       b9 10 00 00 00          mov    $0x10,%ecx
>>>>> +  1e:       e8 fc ff ff ff          call   1f <ib_uverbs_poll_cq+0x1f>
>>>>> +                    1f: R_386_PC32  copy_from_user
>>>>> +  23:       89 04 24                mov    %eax,(%esp)
>>>>> +  26:       85 c0                   test   %eax,%eax
>>>>> +  28:       0f 85 1f 01 00 00       jne    14d <ib_uverbs_poll_cq+0x14d>
>>>>> +  2e:       8b 4b 58                mov    0x58(%ebx),%ecx
>>>>> +  31:       6a 00                   push   $0x0
>>>>> +  33:       b8 00 00 00 00          mov    $0x0,%eax
>>>>> +                    34: R_386_32    ib_uverbs_cq_idr
>>>>> +  38:       bf ea ff ff ff          mov    $0xffffffea,%edi
>>>>> +  3d:       8b 54 24 1c             mov    0x1c(%esp),%edx
>>>>> +  41:       e8 fc ff ff ff          call   42 <ib_uverbs_poll_cq+0x42>
>>>>> +                    42: R_386_PC32  .text.idr_read_obj
>>>>> +  46:       89 c3                   mov    %eax,%ebx
>>>>> +  48:       58                      pop    %eax
>>>>> +  49:       85 db                   test   %ebx,%ebx
>>>>> +  4b:       0f 84 fc 00 00 00       je     14d <ib_uverbs_poll_cq+0x14d>
>>>>> +  51:       8b 6c 24 10             mov    0x10(%esp),%ebp
>>>>> +  55:       b9 02 00 00 00          mov    $0x2,%ecx
>>>>> +  5a:       8d 7c 24 08             lea    0x8(%esp),%edi
>>>>> +  5e:       8b 04 24                mov    (%esp),%eax
>>>>> +  61:       8d 75 08                lea    0x8(%ebp),%esi
>>>>> +  64:       f3 ab                   rep stos %eax,%es:(%edi)
>>>>> +  66:       8b 44 24 1c             mov    0x1c(%esp),%eax
>>>>> +  6a:       39 44 24 08             cmp    %eax,0x8(%esp)
>>>>> +  6e:       73 1f                   jae    8f <ib_uverbs_poll_cq+0x8f>
>>>>> +  70:       8b 3b                   mov    (%ebx),%edi
>>>>> +  72:       8d 4c 24 50             lea    0x50(%esp),%ecx
>>>>> +  76:       ba 01 00 00 00          mov    $0x1,%edx
>>>>> +  7b:       89 d8                   mov    %ebx,%eax
>>>>> +  7d:       ff 97 70 01 00 00       call   *0x170(%edi)
>>>>> +  83:       89 c7                   mov    %eax,%edi
>>>>> +  85:       85 c0                   test   %eax,%eax
>>>>> +  87:       0f 88 b9 00 00 00       js     146 <ib_uverbs_poll_cq+0x146>
>>>>> +  8d:       75 21                   jne    b0 <ib_uverbs_poll_cq+0xb0>
>>>>> +  8f:       b9 08 00 00 00          mov    $0x8,%ecx
>>>>> +  94:       8d 54 24 08             lea    0x8(%esp),%edx
>>>>> +  98:       89 e8                   mov    %ebp,%eax
>>>>> +  9a:       bf f2 ff ff ff          mov    $0xfffffff2,%edi
>>>>> +  9f:       e8 fc ff ff ff          call   a0 <ib_uverbs_poll_cq+0xa0>
>>>>> +                    a0: R_386_PC32  copy_to_user
>>>>> +  a4:       85 c0                   test   %eax,%eax
>>>>> +  a6:       0f 44 7c 24 04          cmove  0x4(%esp),%edi
>>>>> +  ab:       e9 96 00 00 00          jmp    146 <ib_uverbs_poll_cq+0x146>
>>>>> +  b0:       8b 44 24 50             mov    0x50(%esp),%eax
>>>>> +  b4:       8b 54 24 54             mov    0x54(%esp),%edx
>>>>> +  b8:       b9 30 00 00 00          mov    $0x30,%ecx
>>>>> +  bd:       89 44 24 20             mov    %eax,0x20(%esp)
>>>>> +  c1:       8b 44 24 58             mov    0x58(%esp),%eax
>>>>> +  c5:       89 54 24 24             mov    %edx,0x24(%esp)
>>>>> +  c9:       8b 54 24 74             mov    0x74(%esp),%edx
>>>>> +  cd:       89 44 24 28             mov    %eax,0x28(%esp)
>>>>> +  d1:       8b 44 24 5c             mov    0x5c(%esp),%eax
>>>>> +  d5:       89 44 24 2c             mov    %eax,0x2c(%esp)
>>>>> +  d9:       8b 44 24 60             mov    0x60(%esp),%eax
>>>>> +  dd:       89 44 24 30             mov    %eax,0x30(%esp)
>>>>> +  e1:       8b 44 24 64             mov    0x64(%esp),%eax
>>>>> +  e5:       89 44 24 34             mov    %eax,0x34(%esp)
>>>>> +  e9:       8b 44 24 6c             mov    0x6c(%esp),%eax
>>>>> +  ed:       89 44 24 38             mov    %eax,0x38(%esp)
>>>>> +  f1:       8b 44 24 68             mov    0x68(%esp),%eax
>>>>> +  f5:       8b 40 20                mov    0x20(%eax),%eax
>>>>> +  f8:       89 54 24 44             mov    %edx,0x44(%esp)
>>>>> +  fc:       8d 54 24 20             lea    0x20(%esp),%edx
>>>>> + 100:       c6 44 24 4f 00          movb   $0x0,0x4f(%esp)
>>>>> + 105:       89 44 24 3c             mov    %eax,0x3c(%esp)
>>>>> + 109:       8b 44 24 70             mov    0x70(%esp),%eax
>>>>> + 10d:       89 44 24 40             mov    %eax,0x40(%esp)
>>>>> + 111:       8b 44 24 78             mov    0x78(%esp),%eax
>>>>> + 115:       89 44 24 48             mov    %eax,0x48(%esp)
>>>>> + 119:       8b 44 24 7c             mov    0x7c(%esp),%eax
>>>>> + 11d:       66 89 44 24 4c          mov    %ax,0x4c(%esp)
>>>>> + 122:       8a 44 24 7e             mov    0x7e(%esp),%al
>>>>> + 126:       88 44 24 4e             mov    %al,0x4e(%esp)
>>>>> + 12a:       89 f0                   mov    %esi,%eax
>>>>> + 12c:       e8 fc ff ff ff          call   12d <ib_uverbs_poll_cq+0x12d>
>>>>> +                    12d: R_386_PC32 copy_to_user
>>>>> + 131:       85 c0                   test   %eax,%eax
>>>>> + 133:       75 0c                   jne    141 <ib_uverbs_poll_cq+0x141>
>>>>> + 135:       83 c6 30                add    $0x30,%esi
>>>>> + 138:       ff 44 24 08             incl   0x8(%esp)
>>>>> + 13c:       e9 25 ff ff ff          jmp    66 <ib_uverbs_poll_cq+0x66>
>>>>> + 141:       bf f2 ff ff ff          mov    $0xfffffff2,%edi
>>>>> + 146:       89 d8                   mov    %ebx,%eax
>>>>> + 148:       e8 fc ff ff ff          call   149 <ib_uverbs_poll_cq+0x149>
>>>>> +                    149: R_386_PC32 .text.put_cq_read
>>>>> + 14d:       81 c4 84 00 00 00       add    $0x84,%esp
>>>>> + 153:       89 f8                   mov    %edi,%eax
>>>>> + 155:       5b                      pop    %ebx
>>>>> + 156:       5e                      pop    %esi
>>>>> + 157:       5f                      pop    %edi
>>>>> + 158:       5d                      pop    %ebp
>>>>> + 159:       c3                      ret
>>>>>
>>>>>  Disassembly of section .text.ib_uverbs_req_notify_cq:
>>>>>
>>>>> @@ -1915,7 +1893,7 @@
>>>>>    94:       b8 7b 00 00 00          mov    $0x7b,%eax
>>>>>                      95: R_386_32    .text.ib_uverbs_create_qp
>>>>>    99:       e8 fc ff ff ff          call   9a <ib_uverbs_create_qp+0x9a>
>>>>> -                    9a: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    9a: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    9e:       59                      pop    %ecx
>>>>>    9f:       c7 85 50 ff ff ff f4    movl   $0xfffffff4,-0xb0(%ebp)
>>>>>    a6:       ff ff ff
>>>>> @@ -2241,7 +2219,7 @@
>>>>>    68:       b8 4f 00 00 00          mov    $0x4f,%eax
>>>>>                      69: R_386_32    .text.ib_uverbs_query_qp
>>>>>    6d:       e8 fc ff ff ff          call   6e <ib_uverbs_query_qp+0x6e>
>>>>> -                    6e: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    6e: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    72:       59                      pop    %ecx
>>>>>    73:       c7 85 5c ff ff ff 10    movl   $0x10,-0xa4(%ebp)
>>>>>    7a:       00 00 00
>>>>> @@ -2260,7 +2238,7 @@
>>>>>    a3:       b8 86 00 00 00          mov    $0x86,%eax
>>>>>                      a4: R_386_32    .text.ib_uverbs_query_qp
>>>>>    a8:       e8 fc ff ff ff          call   a9 <ib_uverbs_query_qp+0xa9>
>>>>> -                    a9: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    a9: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    ad:       5a                      pop    %edx
>>>>>    ae:       85 db                   test   %ebx,%ebx
>>>>>    b0:       0f 84 c1 01 00 00       je     277 <ib_uverbs_query_qp+0x277>
>>>>> @@ -2462,7 +2440,7 @@
>>>>>    88:       b8 6f 00 00 00          mov    $0x6f,%eax
>>>>>                      89: R_386_32    .text.ib_uverbs_modify_qp
>>>>>    8d:       e8 fc ff ff ff          call   8e <ib_uverbs_modify_qp+0x8e>
>>>>> -                    8e: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    8e: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    92:       5a                      pop    %edx
>>>>>    93:       ba f4 ff ff ff          mov    $0xfffffff4,%edx
>>>>>    98:       85 db                   test   %ebx,%ebx
>>>>> @@ -3129,7 +3107,7 @@
>>>>>    6a:       b8 4c 00 00 00          mov    $0x4c,%eax
>>>>>                      6b: R_386_32    .text.ib_uverbs_create_ah
>>>>>    6f:       e8 fc ff ff ff          call   70 <ib_uverbs_create_ah+0x70>
>>>>> -                    70: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    70: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    74:       58                      pop    %eax
>>>>>    75:       85 db                   test   %ebx,%ebx
>>>>>    77:       75 0a                   jne    83 <ib_uverbs_create_ah+0x83>
>>>>> @@ -3396,7 +3374,7 @@
>>>>>    af:       b8 91 00 00 00          mov    $0x91,%eax
>>>>>                      b0: R_386_32    .text.ib_uverbs_attach_mcast
>>>>>    b4:       e8 fc ff ff ff          call   b5 <ib_uverbs_attach_mcast+0xb5>
>>>>> -                    b5: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    b5: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    b9:       58                      pop    %eax
>>>>>    ba:       85 db                   test   %ebx,%ebx
>>>>>    bc:       75 07                   jne    c5 <ib_uverbs_attach_mcast+0xc5>
>>>>> @@ -3572,7 +3550,7 @@
>>>>>    77:       b8 5e 00 00 00          mov    $0x5e,%eax
>>>>>                      78: R_386_32    .text.ib_uverbs_create_srq
>>>>>    7c:       e8 fc ff ff ff          call   7d <ib_uverbs_create_srq+0x7d>
>>>>> -                    7d: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    7d: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    81:       ba f4 ff ff ff          mov    $0xfffffff4,%edx
>>>>>    86:       58                      pop    %eax
>>>>>    87:       85 db                   test   %ebx,%ebx
>>>>>
>>>>> Needless to say, after my change the diff only shows the truly changed
>>>>> functions.
>>>>>
>>>>> - Michael
>>>>>
>>>>
>>>> Updated patch
>>>>
>>>>
>>>> gcc:
>>>> 2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
>>>>
>>>>        Make function clone name numbering independent.
>>>>        * cgraph.h (clone_function_name_1): Add clone_num argument
>>>>        * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids.
>>>>        (clone_function_name): Use counters from the hashtable.
>>>>
>>>> lto:
>>>> 2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
>>>>
>>>>        * lto-partition.c (privatize_symbol_name_1): Pass 0 to
>>>>        clone_function_name_1.
>>>>
>>>>
>>>> testsuite:
>>>> 2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
>>>>
>>>>       Clone numbering should be independent for each function.
>>>>       * gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test.
>>>>
>>>> ---
>>>>  gcc/cgraph.h                                  |  2 +-
>>>>  gcc/cgraphclones.c                            | 16 ++++++++---
>>>>  gcc/lto/lto-partition.c                       |  2 +-
>>>>  gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++
>>>>  4 files changed, 52 insertions(+), 6 deletions(-)
>>>>  create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c
>>>>
>>>> diff --git gcc/cgraph.h gcc/cgraph.h
>>>> index a8b1b4c..fe44bd0 100644
>>>> --- gcc/cgraph.h
>>>> +++ gcc/cgraph.h
>>>> @@ -2368,7 +2368,7 @@ basic_block init_lowered_empty_function (tree, bool,
>>>> profile_count);
>>>>  tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
>>>>  /* In cgraphclones.c  */
>>>>
>>>> -tree clone_function_name_1 (const char *, const char *);
>>>> +tree clone_function_name_1 (const char *, const char *, unsigned int);
>>>>  tree clone_function_name (tree decl, const char *);
>>>>
>>>>  void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
>>>> diff --git gcc/cgraphclones.c gcc/cgraphclones.c
>>>> index 6e84a31..7de7a65 100644
>>>> --- gcc/cgraphclones.c
>>>> +++ gcc/cgraphclones.c
>>>> @@ -512,13 +512,13 @@ cgraph_node::create_clone (tree new_decl, profile_count
>>>> prof_count,
>>>>    return new_node;
>>>>  }
>>>>
>>>> -static GTY(()) unsigned int clone_fn_id_num;
>>>> +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
>>>>
>>>>  /* Return a new assembler name for a clone with SUFFIX of a decl named
>>>>     NAME.  */
>>>>
>>>>  tree
>>>> -clone_function_name_1 (const char *name, const char *suffix)
>>>> +clone_function_name_1 (const char *name, const char *suffix, unsigned int
>>>> clone_num)
>>>>  {
>>>>    size_t len = strlen (name);
>>>>    char *tmp_name, *prefix;
>>>> @@ -527,7 +527,7 @@ clone_function_name_1 (const char *name, const char *suffix)
>>>>    memcpy (prefix, name, len);
>>>>    strcpy (prefix + len + 1, suffix);
>>>>    prefix[len] = symbol_table::symbol_suffix_separator ();
>>>> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
>>>> +  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_num);
>>>>    return get_identifier (tmp_name);
>>>>  }
>>>>
>>>> @@ -537,7 +537,15 @@ tree
>>>>  clone_function_name (tree decl, const char *suffix)
>>>>  {
>>>>    tree name = DECL_ASSEMBLER_NAME (decl);
>>>> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
>>>> +  const char *decl_name = IDENTIFIER_POINTER (name);
>>>> +  unsigned int *suffix_counter;
>>>> +  if (!clone_fn_ids) {
>>>> +    /* Initialize the per-function counter hash table on first call */
>>>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
>>>> +  }
>>>> +  suffix_counter = &clone_fn_ids->get_or_insert(decl_name);
>>>> +  *suffix_counter = *suffix_counter + 1;
>>>> +  return clone_function_name_1 (decl_name, suffix, *suffix_counter);
>>>>  }
>>>>
>>>>
>>>> diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c
>>>> index c7a5710..4b15232 100644
>>>> --- gcc/lto/lto-partition.c
>>>> +++ gcc/lto/lto-partition.c
>>>> @@ -965,7 +965,7 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
>>>>    name = maybe_rewrite_identifier (name);
>>>>    symtab->change_decl_assembler_name (decl,
>>>>                                     clone_function_name_1 (name,
>>>> -                                                          "lto_priv"));
>>>> +                                                          "lto_priv", 0));
>>>>
>>>>    if (node->lto_file_data)
>>>>      lto_record_renamed_decl (node->lto_file_data, name,
>>>> diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c
>>>> gcc/testsuite/gcc.dg/independent-cloneids-1.c
>>>> new file mode 100644
>>>> index 0000000..d723e20
>>>> --- /dev/null
>>>> +++ gcc/testsuite/gcc.dg/independent-cloneids-1.c
>>>> @@ -0,0 +1,38 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp"  } */
>>>> +
>>>> +extern int printf (const char *, ...);
>>>> +
>>>> +static int __attribute__ ((noinline))
>>>> +foo (int arg)
>>>> +{
>>>> +  return 7 * arg;
>>>> +}
>>>> +
>>>> +static int __attribute__ ((noinline))
>>>> +bar (int arg)
>>>> +{
>>>> +  return arg * arg;
>>>> +}
>>>> +
>>>> +int
>>>> +baz (int arg)
>>>> +{
>>>> +  printf("%d\n", bar (3));
>>>> +  printf("%d\n", bar (4));
>>>> +  printf("%d\n", foo (5));
>>>> +  printf("%d\n", foo (6));
>>>> +  /* adding or removing the following call should not affect foo
>>>> +     function's clone numbering */
>>>> +  printf("%d\n", bar (7));
>>>> +  return foo (8);
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */
>>>> +/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */
>>>> +/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */
>>>> +/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */
>>>> +/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */
>>>> +/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */
>>>> +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */
>>>> +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */
>>>>
>>>
>>> Ping? Could someone else please take a look at this since Richard appears to be
>>> overloaded at the moment?
>>
>> The implementation is now how I suggested.  I'm still not 100% agreeing to
>> the solution of using a hash-map here as I hoped that most callers have
>> a good idea themselves what "number" of clone they are creating.  Thus
>> most of them should be explicit in passing the number.  But I didn't investigate
>> any of them (but the LTO one which I think doesn't need the suffix at all).
>>
>> So if anybody besides me is fine with this approach feel free to approve.
>>
>> One minor nit:
>>
>>>> +  if (!clone_fn_ids) {
>>>> +    /* Initialize the per-function counter hash table on first call */
>>>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
>>>> +  }
>>
>> omit {} around single stmts, the comment doesn't add much information but
>> if you want to keep it move it before the if().
>>
>> Thanks and sorry for the delay (slowly wading through older patches...),
>> Richard.
> 
> No worries, and thank you for the review.
> 
> I dug a little deeper and found that LTO does need the numbered suffixes.
> Otherwise I get errors in a lot of tests that look like:
> 
> ...
> spawn -ignore SIGHUP /gcc/build/gcc/xgcc -B/gcc/build/gcc/ c_lto_pr60820_0.o
> c_lto_pr60820_1.o -fno-diagnostics-show-caret -fdiagnostics-color=never -flto -r
> -nostdlib -O2 -flinker-output=nolto-rel -o gcc-dg-lto-pr60820-01.exe
> 
> pid is 52067 -52067
> lto1: error: two or more sections for
> .gnu.lto_local_in6addr_any.lto_priv.0.lto_priv.0.3f1a2975b4946e93
> 
> lto1: internal compiler error: cannot read LTO decls from /tmp/ccwo9PaB.ltrans0.o
> 
> ...
> 
> which makes sense because lto-partition.c:rename_statics calls
> privatize_symbol_name in a loop over symbols with the same asm name.
> 
> So I'm planning to go back to the version where clone_function_name_1 just takes
> two strings and I'm wondering if I should do:
> 
> name = IDENTIFIER_POINTER (get_identifier (maybe_rewrite_identifier (name)));
> 
> in privatize_symbol_name_1 to guarantee that name always persists.
> 
> 
> - Michael
> 


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.2: 0001-Make-function-assembly-more-independent.patch --]
[-- Type: text/x-patch; name="0001-Make-function-assembly-more-independent.patch", Size: 6060 bytes --]

From adde0632266d3f1b0540e09b9db931df4302d2bc Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Mon, 16 Jul 2018 12:55:49 -0400
Subject: [PATCH 1/4] Make function assembly more independent.

This changes clone_function_name such that clone names are based on a
per-function counter rather than a global one.

gcc:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       Make function clone name numbering independent.
       * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids.
       (clone_function_name_1): Take an IDENTIFIER_NODE instead of a
       string. Use clone_fn_id_num.

lto:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       * lto-partition.c (privatize_symbol_name_1): Pass a persistent
       identifier node to clone_function_name_1.

testsuite:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       Clone id counters should be completely independent from one another.
       * gcc.dg/independent-cloneids-1.c: New test.
---
 gcc/cgraph.h                                  |  2 +-
 gcc/cgraphclones.c                            | 19 +++++++++-----
 gcc/lto/lto-partition.c                       |  5 ++--
 gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++
 4 files changed, 54 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c

diff --git gcc/cgraph.h gcc/cgraph.h
index a8b1b4c..fadc107 100644
--- gcc/cgraph.h
+++ gcc/cgraph.h
@@ -2368,7 +2368,7 @@ basic_block init_lowered_empty_function (tree, bool, profile_count);
 tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
 /* In cgraphclones.c  */
 
-tree clone_function_name_1 (const char *, const char *);
+tree clone_function_name_1 (tree identifier, const char *);
 tree clone_function_name (tree decl, const char *);
 
 void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index 6e84a31..fb81fbd 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -512,14 +512,15 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
   return new_node;
 }
 
-static GTY(()) unsigned int clone_fn_id_num;
+static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
 
-/* Return a new assembler name for a clone with SUFFIX of a decl named
-   NAME.  */
+/* Return a new assembler name for a numbered clone with SUFFIX of a
+   decl named IDENTIFIER.  */
 
 tree
-clone_function_name_1 (const char *name, const char *suffix)
+clone_function_name_1 (tree identifier, const char *suffix)
 {
+  const char *name = IDENTIFIER_POINTER (identifier);
   size_t len = strlen (name);
   char *tmp_name, *prefix;
 
@@ -527,7 +528,12 @@ clone_function_name_1 (const char *name, const char *suffix)
   memcpy (prefix, name, len);
   strcpy (prefix + len + 1, suffix);
   prefix[len] = symbol_table::symbol_suffix_separator ();
-  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
+  /* Initialize the per-function counter hash table on first call */
+  if (!clone_fn_ids)
+    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
+  unsigned int &suffix_counter = clone_fn_ids->get_or_insert(name);
+  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, suffix_counter);
+  suffix_counter++;
   return get_identifier (tmp_name);
 }
 
@@ -536,8 +542,7 @@ clone_function_name_1 (const char *name, const char *suffix)
 tree
 clone_function_name (tree decl, const char *suffix)
 {
-  tree name = DECL_ASSEMBLER_NAME (decl);
-  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
+  return clone_function_name_1 (DECL_ASSEMBLER_NAME (decl), suffix);
 }
 
 
diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c
index c7a5710..e3efa71 100644
--- gcc/lto/lto-partition.c
+++ gcc/lto/lto-partition.c
@@ -962,11 +962,12 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
   if (must_not_rename (node, name))
     return false;
 
-  name = maybe_rewrite_identifier (name);
+  tree identifier = get_identifier (maybe_rewrite_identifier (name));
   symtab->change_decl_assembler_name (decl,
-				      clone_function_name_1 (name,
+				      clone_function_name_1 (identifier,
 							     "lto_priv"));
 
+  name = IDENTIFIER_POINTER (identifier);
   if (node->lto_file_data)
     lto_record_renamed_decl (node->lto_file_data, name,
 			     IDENTIFIER_POINTER
diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c gcc/testsuite/gcc.dg/independent-cloneids-1.c
new file mode 100644
index 0000000..3203895
--- /dev/null
+++ gcc/testsuite/gcc.dg/independent-cloneids-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fipa-cp -fipa-cp-clone"  } */
+
+extern int printf (const char *, ...);
+
+static int __attribute__ ((noinline))
+foo (int arg)
+{
+  return 7 * arg;
+}
+
+static int __attribute__ ((noinline))
+bar (int arg)
+{
+  return arg * arg;
+}
+
+int
+baz (int arg)
+{
+  printf("%d\n", bar (3));
+  printf("%d\n", bar (4));
+  printf("%d\n", foo (5));
+  printf("%d\n", foo (6));
+  /* adding or removing the following call should not affect foo
+     function's clone numbering */
+  printf("%d\n", bar (7));
+  return foo (8);
+}
+
+/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]0:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]1:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]2:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]0:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]1:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]2:} 1 } } */
+/* { dg-final { scan-assembler-not {(?n)\m_*foo[.$_]constprop[.$_]3:} } } */
+/* { dg-final { scan-assembler-not {(?n)\m_*foo[.$_]constprop[.$_]4:} } } */
-- 
2.7.4


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.3: 0002-Rename-clone_function_name_1-and-clone_function_name.patch --]
[-- Type: text/x-patch; name="0002-Rename-clone_function_name_1-and-clone_function_name.patch", Size: 8895 bytes --]

From bbea3705e59e23289cdba72da143f89b3a08b78c Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Tue, 7 Aug 2018 20:36:53 -0400
Subject: [PATCH 2/4] Rename clone_function_name_1 and clone_function_name to
 clarify usage.

gcc:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       * gcc/cgraph.h: Rename clone_function_name_1 to
         numbered_clone_function_name_1. Rename clone_function_name to
         numbered_clone_function_name.
       * cgraphclones.c: Ditto.
       * config/rs6000/rs6000.c: Ditto.
       * lto/lto-partition.c: Ditto.
       * multiple_target.c: Ditto.
       * omp-expand.c: Ditto.
       * omp-low.c: Ditto.
       * omp-simd-clone.c: Ditto.
       * symtab.c: Ditto.
---
 gcc/cgraph.h               |  4 ++--
 gcc/cgraphclones.c         | 14 +++++++-------
 gcc/config/rs6000/rs6000.c |  2 +-
 gcc/lto/lto-partition.c    |  4 ++--
 gcc/multiple_target.c      |  8 ++++----
 gcc/omp-expand.c           |  2 +-
 gcc/omp-low.c              |  4 ++--
 gcc/omp-simd-clone.c       |  2 +-
 gcc/symtab.c               |  2 +-
 9 files changed, 21 insertions(+), 21 deletions(-)

diff --git gcc/cgraph.h gcc/cgraph.h
index fadc107..322df1e 100644
--- gcc/cgraph.h
+++ gcc/cgraph.h
@@ -2368,8 +2368,8 @@ basic_block init_lowered_empty_function (tree, bool, profile_count);
 tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
 /* In cgraphclones.c  */
 
-tree clone_function_name_1 (tree identifier, const char *);
-tree clone_function_name (tree decl, const char *);
+tree numbered_clone_function_name_1 (tree identifier, const char *);
+tree numbered_clone_function_name (tree decl, const char *);
 
 void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
 			       bool, bitmap, bool, bitmap, basic_block);
diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index fb81fbd..08fc922 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -316,7 +316,7 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node)
   gcc_checking_assert (!DECL_RESULT (new_decl));
   gcc_checking_assert (!DECL_RTL_SET_P (new_decl));
 
-  DECL_NAME (new_decl) = clone_function_name (thunk->decl, "artificial_thunk");
+  DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk");
   SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
 
   new_thunk = cgraph_node::create (new_decl);
@@ -518,7 +518,7 @@ static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
    decl named IDENTIFIER.  */
 
 tree
-clone_function_name_1 (tree identifier, const char *suffix)
+numbered_clone_function_name_1 (tree identifier, const char *suffix)
 {
   const char *name = IDENTIFIER_POINTER (identifier);
   size_t len = strlen (name);
@@ -537,12 +537,12 @@ clone_function_name_1 (tree identifier, const char *suffix)
   return get_identifier (tmp_name);
 }
 
-/* Return a new assembler name for a clone of DECL with SUFFIX.  */
+/* Return a new assembler name for a clone of DECL with numbered SUFFIX.  */
 
 tree
-clone_function_name (tree decl, const char *suffix)
+numbered_clone_function_name (tree decl, const char *suffix)
 {
-  return clone_function_name_1 (DECL_ASSEMBLER_NAME (decl), suffix);
+  return numbered_clone_function_name_1 (DECL_ASSEMBLER_NAME (decl), suffix);
 }
 
 
@@ -590,7 +590,7 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
   strcpy (name + len + 1, suffix);
   name[len] = '.';
   DECL_NAME (new_decl) = get_identifier (name);
-  SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name (old_decl, suffix));
+  SET_DECL_ASSEMBLER_NAME (new_decl, numbered_clone_function_name (old_decl, suffix));
   SET_DECL_RTL (new_decl, NULL);
 
   new_node = create_clone (new_decl, count, false,
@@ -969,7 +969,7 @@ cgraph_node::create_version_clone_with_body
       = build_function_decl_skip_args (old_decl, args_to_skip, skip_return);
 
   /* Generate a new name for the new version. */
-  DECL_NAME (new_decl) = clone_function_name (old_decl, suffix);
+  DECL_NAME (new_decl) = numbered_clone_function_name (old_decl, suffix);
   SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
   SET_DECL_RTL (new_decl, NULL);
 
diff --git gcc/config/rs6000/rs6000.c gcc/config/rs6000/rs6000.c
index c2af4b8..f9c9936 100644
--- gcc/config/rs6000/rs6000.c
+++ gcc/config/rs6000/rs6000.c
@@ -36512,7 +36512,7 @@ make_resolver_func (const tree default_decl,
 {
   /* Make the resolver function static.  The resolver function returns
      void *.  */
-  tree decl_name = clone_function_name (default_decl, "resolver");
+  tree decl_name = numbered_clone_function_name (default_decl, "resolver");
   const char *resolver_name = IDENTIFIER_POINTER (decl_name);
   tree type = build_function_type_list (ptr_type_node, NULL_TREE);
   tree decl = build_fn_decl (resolver_name, type);
diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c
index e3efa71..c872325 100644
--- gcc/lto/lto-partition.c
+++ gcc/lto/lto-partition.c
@@ -964,8 +964,8 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
 
   tree identifier = get_identifier (maybe_rewrite_identifier (name));
   symtab->change_decl_assembler_name (decl,
-				      clone_function_name_1 (identifier,
-							     "lto_priv"));
+				      numbered_clone_function_name_1 (identifier,
+								      "lto_priv"));
 
   name = IDENTIFIER_POINTER (identifier);
   if (node->lto_file_data)
diff --git gcc/multiple_target.c gcc/multiple_target.c
index a1fe09a..592e7b9 100644
--- gcc/multiple_target.c
+++ gcc/multiple_target.c
@@ -162,8 +162,8 @@ create_dispatcher_calls (struct cgraph_node *node)
     }
 
   symtab->change_decl_assembler_name (node->decl,
-				      clone_function_name (node->decl,
-							   "default"));
+				      numbered_clone_function_name (node->decl,
+								    "default"));
 
   /* FIXME: copy of cgraph_node::make_local that should be cleaned up
 	    in next stage1.  */
@@ -312,8 +312,8 @@ create_target_clone (cgraph_node *node, bool definition, char *name)
       new_node = cgraph_node::get_create (new_decl);
       /* Generate a new name for the new version.  */
       symtab->change_decl_assembler_name (new_node->decl,
-					  clone_function_name (node->decl,
-							       name));
+					  numbered_clone_function_name (node->decl,
+									name));
     }
   return new_node;
 }
diff --git gcc/omp-expand.c gcc/omp-expand.c
index d2a77c0..3aeaae0 100644
--- gcc/omp-expand.c
+++ gcc/omp-expand.c
@@ -7625,7 +7625,7 @@ grid_expand_target_grid_body (struct omp_region *target)
     expand_omp (gpukernel->inner);
 
   tree kern_fndecl = copy_node (orig_child_fndecl);
-  DECL_NAME (kern_fndecl) = clone_function_name (kern_fndecl, "kernel");
+  DECL_NAME (kern_fndecl) = numbered_clone_function_name (kern_fndecl, "kernel");
   SET_DECL_ASSEMBLER_NAME (kern_fndecl, DECL_NAME (kern_fndecl));
   tree tgtblock = gimple_block (tgt_stmt);
   tree fniniblock = make_node (BLOCK);
diff --git gcc/omp-low.c gcc/omp-low.c
index 843c66f..6748e05 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -1531,8 +1531,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 static tree
 create_omp_child_function_name (bool task_copy)
 {
-  return clone_function_name (current_function_decl,
-			      task_copy ? "_omp_cpyfn" : "_omp_fn");
+  return numbered_clone_function_name (current_function_decl,
+				       task_copy ? "_omp_cpyfn" : "_omp_fn");
 }
 
 /* Return true if CTX may belong to offloaded code: either if current function
diff --git gcc/omp-simd-clone.c gcc/omp-simd-clone.c
index b15adf0..806c486 100644
--- gcc/omp-simd-clone.c
+++ gcc/omp-simd-clone.c
@@ -444,7 +444,7 @@ simd_clone_create (struct cgraph_node *old_node)
     {
       tree old_decl = old_node->decl;
       tree new_decl = copy_node (old_node->decl);
-      DECL_NAME (new_decl) = clone_function_name (old_decl, "simdclone");
+      DECL_NAME (new_decl) = numbered_clone_function_name (old_decl, "simdclone");
       SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
       SET_DECL_RTL (new_decl, NULL);
       DECL_STATIC_CONSTRUCTOR (new_decl) = 0;
diff --git gcc/symtab.c gcc/symtab.c
index c5464cb..de65b65 100644
--- gcc/symtab.c
+++ gcc/symtab.c
@@ -1787,7 +1787,7 @@ symtab_node::noninterposable_alias (void)
   /* Otherwise create a new one.  */
   new_decl = copy_node (node->decl);
   DECL_DLLIMPORT_P (new_decl) = 0;
-  DECL_NAME (new_decl) = clone_function_name (node->decl, "localalias");
+  DECL_NAME (new_decl) = numbered_clone_function_name (node->decl, "localalias");
   if (TREE_CODE (new_decl) == FUNCTION_DECL)
     DECL_STRUCT_FUNCTION (new_decl) = NULL;
   DECL_INITIAL (new_decl) = NULL;
-- 
2.7.4


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.4: 0003-Cold-sections-don-t-need-to-be-numbered.patch --]
[-- Type: text/x-patch; name="0003-Cold-sections-don-t-need-to-be-numbered.patch", Size: 5781 bytes --]

From f6a5f30d8975214c564ec3740163b6156ca2ad97 Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Tue, 7 Aug 2018 20:47:12 -0400
Subject: [PATCH 3/4] Cold sections don't need to be numbered.

gcc:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       * cgraph.h: Add suffixed_function_name.
       * cgraphclones.c: Ditto.
       * final.c (final_scan_insn_1): Use it.

testsuite:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
       * gcc.dg/tree-prof/cold_partition_label.c: Update for new cold
         section names.
       * gcc.dg/tree-prof/section-attr-1.c: ditto.
       * gcc.dg/tree-prof/section-attr-2.c: ditto.
       * gcc.dg/tree-prof/section-attr-3.c: ditto.
---
 gcc/cgraph.h                                          |  1 +
 gcc/cgraphclones.c                                    | 16 ++++++++++++++++
 gcc/final.c                                           |  2 +-
 gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c |  4 ++--
 gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c       |  2 +-
 gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c       |  2 +-
 gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c       |  2 +-
 7 files changed, 23 insertions(+), 6 deletions(-)

diff --git gcc/cgraph.h gcc/cgraph.h
index 322df1e..836ceb1 100644
--- gcc/cgraph.h
+++ gcc/cgraph.h
@@ -2368,6 +2368,7 @@ basic_block init_lowered_empty_function (tree, bool, profile_count);
 tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
 /* In cgraphclones.c  */
 
+tree suffixed_function_name (tree decl, const char *);
 tree numbered_clone_function_name_1 (tree identifier, const char *);
 tree numbered_clone_function_name (tree decl, const char *);
 
diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index 08fc922..784f1df 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -514,6 +514,22 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
 
 static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
 
+/* Return a new assembler name for a clone of DECL with SUFFIX.  */
+
+tree
+suffixed_function_name (tree decl, const char *suffix)
+{
+  const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  size_t len = strlen (name);
+  char *prefix;
+
+  prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
+  memcpy (prefix, name, len);
+  prefix[len] = symbol_table::symbol_suffix_separator ();
+  strcpy (prefix + len + 1, suffix);
+  return get_identifier (prefix);
+}
+
 /* Return a new assembler name for a numbered clone with SUFFIX of a
    decl named IDENTIFIER.  */
 
diff --git gcc/final.c gcc/final.c
index 842e5e0..ac83417 100644
--- gcc/final.c
+++ gcc/final.c
@@ -2203,7 +2203,7 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file, int optimize_p ATTRIBUTE_UNUSED,
 
 	  if (in_cold_section_p)
 	    cold_function_name
-	      = clone_function_name (current_function_decl, "cold");
+	      = suffixed_function_name (current_function_decl, "cold");
 
 	  if (dwarf2out_do_frame ())
 	    {
diff --git gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
index 52518cf..450308d 100644
--- gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
+++ gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
@@ -37,6 +37,6 @@ main (int argc, char *argv[])
   return 0;
 }
 
-/* { dg-final-use { scan-assembler "foo\[._\]+cold\[\._\]+0" { target *-*-linux* *-*-gnu* } } } */
-/* { dg-final-use { scan-assembler "size\[ \ta-zA-Z0-0\]+foo\[._\]+cold\[\._\]+0" { target *-*-linux* *-*-gnu* } } } */
+/* { dg-final-use { scan-assembler "foo\[._\]+cold" { target *-*-linux* *-*-gnu* } } } */
+/* { dg-final-use { scan-assembler "size\[ \ta-zA-Z0-0\]+foo\[._\]+cold" { target *-*-linux* *-*-gnu* } } } */
 /* { dg-final-use { scan-tree-dump-not "Invalid sum" "optimized"} } */
diff --git gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c
index ee6662e..1bb8cd9 100644
--- gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c
+++ gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c
@@ -42,4 +42,4 @@ foo (int path)
     }
 }
 
-/* { dg-final-use { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold\.0" { target *-*-linux* *-*-gnu* } } } */
+/* { dg-final-use { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target *-*-linux* *-*-gnu* } } } */
diff --git gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c
index 898a395..31be7eb 100644
--- gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c
+++ gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c
@@ -41,4 +41,4 @@ foo (int path)
     }
 }
 
-/* { dg-final-use { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold\.0" { target *-*-linux* *-*-gnu* } } } */
+/* { dg-final-use { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target *-*-linux* *-*-gnu* } } } */
diff --git gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c
index 36829dc..0e64001 100644
--- gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c
+++ gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c
@@ -42,4 +42,4 @@ foo (int path)
     }
 }
 
-/* { dg-final-use { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold\.0" { target *-*-linux* *-*-gnu* } } } */
+/* { dg-final-use { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target *-*-linux* *-*-gnu* } } } */
-- 
2.7.4


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.5: 0004-Use-suffixed_function_name-in-cgraph_node-create_vir.patch --]
[-- Type: text/x-patch; name="0004-Use-suffixed_function_name-in-cgraph_node-create_vir.patch", Size: 1722 bytes --]

From d96bedcf51c289f3c80a12a681a7831d9e287874 Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Mon, 13 Aug 2018 13:59:46 -0400
Subject: [PATCH 4/4] Use suffixed_function_name in
 cgraph_node::create_virtual_clone.

gcc:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       * cgraphclones.c (cgraph_node::create_virtual_clone): Use
         suffixed_function_name.
---
 gcc/cgraphclones.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index 784f1df..876b7fa 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -576,9 +576,8 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
   tree old_decl = decl;
   cgraph_node *new_node = NULL;
   tree new_decl;
-  size_t len, i;
+  size_t i;
   ipa_replace_map *map;
-  char *name;
 
   gcc_checking_assert (local.versionable);
   gcc_assert (local.can_change_signature || !args_to_skip);
@@ -600,12 +599,7 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
      sometimes storing only clone decl instead of original.  */
 
   /* Generate a new name for the new version. */
-  len = IDENTIFIER_LENGTH (DECL_NAME (old_decl));
-  name = XALLOCAVEC (char, len + strlen (suffix) + 2);
-  memcpy (name, IDENTIFIER_POINTER (DECL_NAME (old_decl)), len);
-  strcpy (name + len + 1, suffix);
-  name[len] = '.';
-  DECL_NAME (new_decl) = get_identifier (name);
+  DECL_NAME (new_decl) = suffixed_function_name (old_decl, suffix);
   SET_DECL_ASSEMBLER_NAME (new_decl, numbered_clone_function_name (old_decl, suffix));
   SET_DECL_RTL (new_decl, NULL);
 
-- 
2.7.4


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PING][PATCH] Make function clone name numbering independent.
  2018-08-13 23:58                     ` [PING][PATCH] " Michael Ploujnikov
@ 2018-08-20 19:47                       ` Jeff Law
  2018-08-31 18:49                       ` [PING v2][PATCH] " Michael Ploujnikov
  1 sibling, 0 replies; 30+ messages in thread
From: Jeff Law @ 2018-08-20 19:47 UTC (permalink / raw)
  To: Michael Ploujnikov, Richard Biener
  Cc: Bernhard Fischer, GCC Patches, hubicka, Sriraman Tallam

On 08/13/2018 05:58 PM, Michael Ploujnikov wrote:
> Ping and I've updated the patch since last time as follows:
> 
>   - unittest scans assembly rather than the constprop dump because its
>     forward changed
>   - unittests should handle different hosts where any of
>     NO_DOT_IN_LABEL, NO_DOLLAR_IN_LABEL or __USER_LABEL_PREFIX__ may
>     be defined
>   - not 100% it's safe to change DECL_NAME to DECL_ASSEMBLER_NAME in
>     cgraph_node::create_virtual_clone, but I've attempted to reduce
>     some code duplication
>   - lto-partition.c: privatize_symbol_name_1 *does* need numbered
>     names
>   - but cold sections don't
>   - Expecting an IDENTIFIER_NODE in clone_function_name_1 avoids
>     unreliable string pointer use as pointed out in the first review
>   - renamed clone_function_name_1 and clone_function_name to
>     numbered_clone_function_name_1 and numbered_clone_function_name to
>     clarify purpose and discourage future unintended uses
Richi has more state here than I do, so I'm going to let him own it.  I
know he's just returning from PTO, so it's going to take him a bit of
time to catch up.

jeff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PING v2][PATCH] Make function clone name numbering independent.
  2018-08-13 23:58                     ` [PING][PATCH] " Michael Ploujnikov
  2018-08-20 19:47                       ` Jeff Law
@ 2018-08-31 18:49                       ` Michael Ploujnikov
  2018-09-03 10:02                         ` Martin Jambor
  1 sibling, 1 reply; 30+ messages in thread
From: Michael Ploujnikov @ 2018-08-31 18:49 UTC (permalink / raw)
  To: gcc-patches


[-- Attachment #1.1.1: Type: text/plain, Size: 4381 bytes --]

On 2018-08-13 07:58 PM, Michael Ploujnikov wrote:
> Ping and I've updated the patch since last time as follows:
> 
>   - unittest scans assembly rather than the constprop dump because its
>     forward changed
>   - unittests should handle different hosts where any of
>     NO_DOT_IN_LABEL, NO_DOLLAR_IN_LABEL or __USER_LABEL_PREFIX__ may
>     be defined
>   - not 100% it's safe to change DECL_NAME to DECL_ASSEMBLER_NAME in
>     cgraph_node::create_virtual_clone, but I've attempted to reduce
>     some code duplication
>   - lto-partition.c: privatize_symbol_name_1 *does* need numbered
>     names
>   - but cold sections don't
>   - Expecting an IDENTIFIER_NODE in clone_function_name_1 avoids
>     unreliable string pointer use as pointed out in the first review
>   - renamed clone_function_name_1 and clone_function_name to
>     numbered_clone_function_name_1 and numbered_clone_function_name to
>     clarify purpose and discourage future unintended uses
> 
> Also bootstrapped and regtested.

Ping.

I've done some more digging into the current uses of
numbered_clone_function_name and checked if any tests fail if I change
it to suffixed_function_name:

  - gcc/cgraphclones.c:  DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk");
    - no new tests fail, inconclusive
    - and despite the comment on redirect_callee_duplicating_thunks
      about "one or more" duplicates it doesn't seem like
      duplicate_thunk_for_node would be called more than once for each
      node, assuming each node is named uniquely, but I'm far from an
      expert in this area
  - gcc/cgraphclones.c:  SET_DECL_ASSEMBLER_NAME (new_decl, numbered_clone_function_name (old_decl, suffix));
    - called by cgraph_node::create_virtual_clone, definitely needs it
    - this is where clones like .constprop come from
  - gcc/cgraphclones.c:  DECL_NAME (new_decl) = numbered_clone_function_name (old_decl, suffix);
    - more tests fail
    - this is where clones like .simdclone come from
    - called by cgraph_node::create_version_clone_with_body, most likely needs it
  - gcc/config/rs6000/rs6000.c:  tree decl_name = numbered_clone_function_name (default_decl, "resolver");
    - doesn't look like this needs the numbering as there should only
      be one resolver per multi-version function, but need someone
      with an rs/6000 to confirm
  - gcc/lto/lto-partition.c:                                      numbered_clone_function_name_1 (identifier,
    - definitely needed for disambiguation, shown by unit tests failing
  - gcc/multiple_target.c:                                numbered_clone_function_name (node->decl,
    - create_dispatcher_calls says it only creates the dispatcher once
    - no new tests fail, inconclusive
  - gcc/multiple_target.c:                                    numbered_clone_function_name (node->decl,
    - I have a feeling this isn't necessary
    - no new tests fail, inconclusive
  - gcc/omp-expand.c:  DECL_NAME (kern_fndecl) = numbered_clone_function_name (kern_fndecl, "kernel");
    - no new tests fail, inconclusive
    - I didn't see (and couldn't figure out a way to get) any of the
      existing omp/acc tests actually exercise this codeptah
  - gcc/omp-low.c:  return numbered_clone_function_name (current_function_decl,
    - definitely needed based on
      gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c and others
  - gcc/omp-simd-clone.c:      DECL_NAME (new_decl) = numbered_clone_function_name (old_decl, "simdclone");
    - no tests fail, inconclusive
    - can definitely have more than one simdclone per function as above, but
      maybe not these external types
    - some tests, like declare-simd.c actually exercise this codepath,
      but I couldn't find the resulting .simdclone decl the the
      simdclone pass dump nor in any of the other dumps to verify
  - gcc/symtab.c:  DECL_NAME (new_decl) = numbered_clone_function_name (node->decl, "localalias");
    - no tests fail, inconclusive
    - my understanding of function_and_variable_visibility says that
      there can only be one per function so maybe this isn't; hubicka?

I'll add patches to switch these once someone with expertise in these
areas can confirm that the numbering isn't needed in the respective
decl names.


- Michael


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.2: 0001-Make-function-assembly-more-independent.patch --]
[-- Type: text/x-patch; name="0001-Make-function-assembly-more-independent.patch", Size: 6060 bytes --]

From adde0632266d3f1b0540e09b9db931df4302d2bc Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Mon, 16 Jul 2018 12:55:49 -0400
Subject: [PATCH 1/4] Make function assembly more independent.

This changes clone_function_name such that clone names are based on a
per-function counter rather than a global one.

gcc:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       Make function clone name numbering independent.
       * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids.
       (clone_function_name_1): Take an IDENTIFIER_NODE instead of a
       string. Use clone_fn_id_num.

lto:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       * lto-partition.c (privatize_symbol_name_1): Pass a persistent
       identifier node to clone_function_name_1.

testsuite:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       Clone id counters should be completely independent from one another.
       * gcc.dg/independent-cloneids-1.c: New test.
---
 gcc/cgraph.h                                  |  2 +-
 gcc/cgraphclones.c                            | 19 +++++++++-----
 gcc/lto/lto-partition.c                       |  5 ++--
 gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++
 4 files changed, 54 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c

diff --git gcc/cgraph.h gcc/cgraph.h
index a8b1b4c..fadc107 100644
--- gcc/cgraph.h
+++ gcc/cgraph.h
@@ -2368,7 +2368,7 @@ basic_block init_lowered_empty_function (tree, bool, profile_count);
 tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
 /* In cgraphclones.c  */
 
-tree clone_function_name_1 (const char *, const char *);
+tree clone_function_name_1 (tree identifier, const char *);
 tree clone_function_name (tree decl, const char *);
 
 void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index 6e84a31..fb81fbd 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -512,14 +512,15 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
   return new_node;
 }
 
-static GTY(()) unsigned int clone_fn_id_num;
+static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
 
-/* Return a new assembler name for a clone with SUFFIX of a decl named
-   NAME.  */
+/* Return a new assembler name for a numbered clone with SUFFIX of a
+   decl named IDENTIFIER.  */
 
 tree
-clone_function_name_1 (const char *name, const char *suffix)
+clone_function_name_1 (tree identifier, const char *suffix)
 {
+  const char *name = IDENTIFIER_POINTER (identifier);
   size_t len = strlen (name);
   char *tmp_name, *prefix;
 
@@ -527,7 +528,12 @@ clone_function_name_1 (const char *name, const char *suffix)
   memcpy (prefix, name, len);
   strcpy (prefix + len + 1, suffix);
   prefix[len] = symbol_table::symbol_suffix_separator ();
-  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
+  /* Initialize the per-function counter hash table on first call */
+  if (!clone_fn_ids)
+    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
+  unsigned int &suffix_counter = clone_fn_ids->get_or_insert(name);
+  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, suffix_counter);
+  suffix_counter++;
   return get_identifier (tmp_name);
 }
 
@@ -536,8 +542,7 @@ clone_function_name_1 (const char *name, const char *suffix)
 tree
 clone_function_name (tree decl, const char *suffix)
 {
-  tree name = DECL_ASSEMBLER_NAME (decl);
-  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
+  return clone_function_name_1 (DECL_ASSEMBLER_NAME (decl), suffix);
 }
 
 
diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c
index c7a5710..e3efa71 100644
--- gcc/lto/lto-partition.c
+++ gcc/lto/lto-partition.c
@@ -962,11 +962,12 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
   if (must_not_rename (node, name))
     return false;
 
-  name = maybe_rewrite_identifier (name);
+  tree identifier = get_identifier (maybe_rewrite_identifier (name));
   symtab->change_decl_assembler_name (decl,
-				      clone_function_name_1 (name,
+				      clone_function_name_1 (identifier,
 							     "lto_priv"));
 
+  name = IDENTIFIER_POINTER (identifier);
   if (node->lto_file_data)
     lto_record_renamed_decl (node->lto_file_data, name,
 			     IDENTIFIER_POINTER
diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c gcc/testsuite/gcc.dg/independent-cloneids-1.c
new file mode 100644
index 0000000..3203895
--- /dev/null
+++ gcc/testsuite/gcc.dg/independent-cloneids-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fipa-cp -fipa-cp-clone"  } */
+
+extern int printf (const char *, ...);
+
+static int __attribute__ ((noinline))
+foo (int arg)
+{
+  return 7 * arg;
+}
+
+static int __attribute__ ((noinline))
+bar (int arg)
+{
+  return arg * arg;
+}
+
+int
+baz (int arg)
+{
+  printf("%d\n", bar (3));
+  printf("%d\n", bar (4));
+  printf("%d\n", foo (5));
+  printf("%d\n", foo (6));
+  /* adding or removing the following call should not affect foo
+     function's clone numbering */
+  printf("%d\n", bar (7));
+  return foo (8);
+}
+
+/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]0:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]1:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]2:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]0:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]1:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]2:} 1 } } */
+/* { dg-final { scan-assembler-not {(?n)\m_*foo[.$_]constprop[.$_]3:} } } */
+/* { dg-final { scan-assembler-not {(?n)\m_*foo[.$_]constprop[.$_]4:} } } */
-- 
2.7.4


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.3: 0002-Rename-clone_function_name_1-and-clone_function_name.patch --]
[-- Type: text/x-patch; name="0002-Rename-clone_function_name_1-and-clone_function_name.patch", Size: 8895 bytes --]

From bbea3705e59e23289cdba72da143f89b3a08b78c Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Tue, 7 Aug 2018 20:36:53 -0400
Subject: [PATCH 2/4] Rename clone_function_name_1 and clone_function_name to
 clarify usage.

gcc:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       * gcc/cgraph.h: Rename clone_function_name_1 to
         numbered_clone_function_name_1. Rename clone_function_name to
         numbered_clone_function_name.
       * cgraphclones.c: Ditto.
       * config/rs6000/rs6000.c: Ditto.
       * lto/lto-partition.c: Ditto.
       * multiple_target.c: Ditto.
       * omp-expand.c: Ditto.
       * omp-low.c: Ditto.
       * omp-simd-clone.c: Ditto.
       * symtab.c: Ditto.
---
 gcc/cgraph.h               |  4 ++--
 gcc/cgraphclones.c         | 14 +++++++-------
 gcc/config/rs6000/rs6000.c |  2 +-
 gcc/lto/lto-partition.c    |  4 ++--
 gcc/multiple_target.c      |  8 ++++----
 gcc/omp-expand.c           |  2 +-
 gcc/omp-low.c              |  4 ++--
 gcc/omp-simd-clone.c       |  2 +-
 gcc/symtab.c               |  2 +-
 9 files changed, 21 insertions(+), 21 deletions(-)

diff --git gcc/cgraph.h gcc/cgraph.h
index fadc107..322df1e 100644
--- gcc/cgraph.h
+++ gcc/cgraph.h
@@ -2368,8 +2368,8 @@ basic_block init_lowered_empty_function (tree, bool, profile_count);
 tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
 /* In cgraphclones.c  */
 
-tree clone_function_name_1 (tree identifier, const char *);
-tree clone_function_name (tree decl, const char *);
+tree numbered_clone_function_name_1 (tree identifier, const char *);
+tree numbered_clone_function_name (tree decl, const char *);
 
 void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
 			       bool, bitmap, bool, bitmap, basic_block);
diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index fb81fbd..08fc922 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -316,7 +316,7 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node)
   gcc_checking_assert (!DECL_RESULT (new_decl));
   gcc_checking_assert (!DECL_RTL_SET_P (new_decl));
 
-  DECL_NAME (new_decl) = clone_function_name (thunk->decl, "artificial_thunk");
+  DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk");
   SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
 
   new_thunk = cgraph_node::create (new_decl);
@@ -518,7 +518,7 @@ static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
    decl named IDENTIFIER.  */
 
 tree
-clone_function_name_1 (tree identifier, const char *suffix)
+numbered_clone_function_name_1 (tree identifier, const char *suffix)
 {
   const char *name = IDENTIFIER_POINTER (identifier);
   size_t len = strlen (name);
@@ -537,12 +537,12 @@ clone_function_name_1 (tree identifier, const char *suffix)
   return get_identifier (tmp_name);
 }
 
-/* Return a new assembler name for a clone of DECL with SUFFIX.  */
+/* Return a new assembler name for a clone of DECL with numbered SUFFIX.  */
 
 tree
-clone_function_name (tree decl, const char *suffix)
+numbered_clone_function_name (tree decl, const char *suffix)
 {
-  return clone_function_name_1 (DECL_ASSEMBLER_NAME (decl), suffix);
+  return numbered_clone_function_name_1 (DECL_ASSEMBLER_NAME (decl), suffix);
 }
 
 
@@ -590,7 +590,7 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
   strcpy (name + len + 1, suffix);
   name[len] = '.';
   DECL_NAME (new_decl) = get_identifier (name);
-  SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name (old_decl, suffix));
+  SET_DECL_ASSEMBLER_NAME (new_decl, numbered_clone_function_name (old_decl, suffix));
   SET_DECL_RTL (new_decl, NULL);
 
   new_node = create_clone (new_decl, count, false,
@@ -969,7 +969,7 @@ cgraph_node::create_version_clone_with_body
       = build_function_decl_skip_args (old_decl, args_to_skip, skip_return);
 
   /* Generate a new name for the new version. */
-  DECL_NAME (new_decl) = clone_function_name (old_decl, suffix);
+  DECL_NAME (new_decl) = numbered_clone_function_name (old_decl, suffix);
   SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
   SET_DECL_RTL (new_decl, NULL);
 
diff --git gcc/config/rs6000/rs6000.c gcc/config/rs6000/rs6000.c
index c2af4b8..f9c9936 100644
--- gcc/config/rs6000/rs6000.c
+++ gcc/config/rs6000/rs6000.c
@@ -36512,7 +36512,7 @@ make_resolver_func (const tree default_decl,
 {
   /* Make the resolver function static.  The resolver function returns
      void *.  */
-  tree decl_name = clone_function_name (default_decl, "resolver");
+  tree decl_name = numbered_clone_function_name (default_decl, "resolver");
   const char *resolver_name = IDENTIFIER_POINTER (decl_name);
   tree type = build_function_type_list (ptr_type_node, NULL_TREE);
   tree decl = build_fn_decl (resolver_name, type);
diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c
index e3efa71..c872325 100644
--- gcc/lto/lto-partition.c
+++ gcc/lto/lto-partition.c
@@ -964,8 +964,8 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
 
   tree identifier = get_identifier (maybe_rewrite_identifier (name));
   symtab->change_decl_assembler_name (decl,
-				      clone_function_name_1 (identifier,
-							     "lto_priv"));
+				      numbered_clone_function_name_1 (identifier,
+								      "lto_priv"));
 
   name = IDENTIFIER_POINTER (identifier);
   if (node->lto_file_data)
diff --git gcc/multiple_target.c gcc/multiple_target.c
index a1fe09a..592e7b9 100644
--- gcc/multiple_target.c
+++ gcc/multiple_target.c
@@ -162,8 +162,8 @@ create_dispatcher_calls (struct cgraph_node *node)
     }
 
   symtab->change_decl_assembler_name (node->decl,
-				      clone_function_name (node->decl,
-							   "default"));
+				      numbered_clone_function_name (node->decl,
+								    "default"));
 
   /* FIXME: copy of cgraph_node::make_local that should be cleaned up
 	    in next stage1.  */
@@ -312,8 +312,8 @@ create_target_clone (cgraph_node *node, bool definition, char *name)
       new_node = cgraph_node::get_create (new_decl);
       /* Generate a new name for the new version.  */
       symtab->change_decl_assembler_name (new_node->decl,
-					  clone_function_name (node->decl,
-							       name));
+					  numbered_clone_function_name (node->decl,
+									name));
     }
   return new_node;
 }
diff --git gcc/omp-expand.c gcc/omp-expand.c
index d2a77c0..3aeaae0 100644
--- gcc/omp-expand.c
+++ gcc/omp-expand.c
@@ -7625,7 +7625,7 @@ grid_expand_target_grid_body (struct omp_region *target)
     expand_omp (gpukernel->inner);
 
   tree kern_fndecl = copy_node (orig_child_fndecl);
-  DECL_NAME (kern_fndecl) = clone_function_name (kern_fndecl, "kernel");
+  DECL_NAME (kern_fndecl) = numbered_clone_function_name (kern_fndecl, "kernel");
   SET_DECL_ASSEMBLER_NAME (kern_fndecl, DECL_NAME (kern_fndecl));
   tree tgtblock = gimple_block (tgt_stmt);
   tree fniniblock = make_node (BLOCK);
diff --git gcc/omp-low.c gcc/omp-low.c
index 843c66f..6748e05 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -1531,8 +1531,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 static tree
 create_omp_child_function_name (bool task_copy)
 {
-  return clone_function_name (current_function_decl,
-			      task_copy ? "_omp_cpyfn" : "_omp_fn");
+  return numbered_clone_function_name (current_function_decl,
+				       task_copy ? "_omp_cpyfn" : "_omp_fn");
 }
 
 /* Return true if CTX may belong to offloaded code: either if current function
diff --git gcc/omp-simd-clone.c gcc/omp-simd-clone.c
index b15adf0..806c486 100644
--- gcc/omp-simd-clone.c
+++ gcc/omp-simd-clone.c
@@ -444,7 +444,7 @@ simd_clone_create (struct cgraph_node *old_node)
     {
       tree old_decl = old_node->decl;
       tree new_decl = copy_node (old_node->decl);
-      DECL_NAME (new_decl) = clone_function_name (old_decl, "simdclone");
+      DECL_NAME (new_decl) = numbered_clone_function_name (old_decl, "simdclone");
       SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
       SET_DECL_RTL (new_decl, NULL);
       DECL_STATIC_CONSTRUCTOR (new_decl) = 0;
diff --git gcc/symtab.c gcc/symtab.c
index c5464cb..de65b65 100644
--- gcc/symtab.c
+++ gcc/symtab.c
@@ -1787,7 +1787,7 @@ symtab_node::noninterposable_alias (void)
   /* Otherwise create a new one.  */
   new_decl = copy_node (node->decl);
   DECL_DLLIMPORT_P (new_decl) = 0;
-  DECL_NAME (new_decl) = clone_function_name (node->decl, "localalias");
+  DECL_NAME (new_decl) = numbered_clone_function_name (node->decl, "localalias");
   if (TREE_CODE (new_decl) == FUNCTION_DECL)
     DECL_STRUCT_FUNCTION (new_decl) = NULL;
   DECL_INITIAL (new_decl) = NULL;
-- 
2.7.4


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.4: 0003-Cold-sections-don-t-need-to-be-numbered.patch --]
[-- Type: text/x-patch; name="0003-Cold-sections-don-t-need-to-be-numbered.patch", Size: 5781 bytes --]

From f6a5f30d8975214c564ec3740163b6156ca2ad97 Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Tue, 7 Aug 2018 20:47:12 -0400
Subject: [PATCH 3/4] Cold sections don't need to be numbered.

gcc:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       * cgraph.h: Add suffixed_function_name.
       * cgraphclones.c: Ditto.
       * final.c (final_scan_insn_1): Use it.

testsuite:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
       * gcc.dg/tree-prof/cold_partition_label.c: Update for new cold
         section names.
       * gcc.dg/tree-prof/section-attr-1.c: ditto.
       * gcc.dg/tree-prof/section-attr-2.c: ditto.
       * gcc.dg/tree-prof/section-attr-3.c: ditto.
---
 gcc/cgraph.h                                          |  1 +
 gcc/cgraphclones.c                                    | 16 ++++++++++++++++
 gcc/final.c                                           |  2 +-
 gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c |  4 ++--
 gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c       |  2 +-
 gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c       |  2 +-
 gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c       |  2 +-
 7 files changed, 23 insertions(+), 6 deletions(-)

diff --git gcc/cgraph.h gcc/cgraph.h
index 322df1e..836ceb1 100644
--- gcc/cgraph.h
+++ gcc/cgraph.h
@@ -2368,6 +2368,7 @@ basic_block init_lowered_empty_function (tree, bool, profile_count);
 tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
 /* In cgraphclones.c  */
 
+tree suffixed_function_name (tree decl, const char *);
 tree numbered_clone_function_name_1 (tree identifier, const char *);
 tree numbered_clone_function_name (tree decl, const char *);
 
diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index 08fc922..784f1df 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -514,6 +514,22 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
 
 static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
 
+/* Return a new assembler name for a clone of DECL with SUFFIX.  */
+
+tree
+suffixed_function_name (tree decl, const char *suffix)
+{
+  const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  size_t len = strlen (name);
+  char *prefix;
+
+  prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
+  memcpy (prefix, name, len);
+  prefix[len] = symbol_table::symbol_suffix_separator ();
+  strcpy (prefix + len + 1, suffix);
+  return get_identifier (prefix);
+}
+
 /* Return a new assembler name for a numbered clone with SUFFIX of a
    decl named IDENTIFIER.  */
 
diff --git gcc/final.c gcc/final.c
index 842e5e0..ac83417 100644
--- gcc/final.c
+++ gcc/final.c
@@ -2203,7 +2203,7 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file, int optimize_p ATTRIBUTE_UNUSED,
 
 	  if (in_cold_section_p)
 	    cold_function_name
-	      = clone_function_name (current_function_decl, "cold");
+	      = suffixed_function_name (current_function_decl, "cold");
 
 	  if (dwarf2out_do_frame ())
 	    {
diff --git gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
index 52518cf..450308d 100644
--- gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
+++ gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
@@ -37,6 +37,6 @@ main (int argc, char *argv[])
   return 0;
 }
 
-/* { dg-final-use { scan-assembler "foo\[._\]+cold\[\._\]+0" { target *-*-linux* *-*-gnu* } } } */
-/* { dg-final-use { scan-assembler "size\[ \ta-zA-Z0-0\]+foo\[._\]+cold\[\._\]+0" { target *-*-linux* *-*-gnu* } } } */
+/* { dg-final-use { scan-assembler "foo\[._\]+cold" { target *-*-linux* *-*-gnu* } } } */
+/* { dg-final-use { scan-assembler "size\[ \ta-zA-Z0-0\]+foo\[._\]+cold" { target *-*-linux* *-*-gnu* } } } */
 /* { dg-final-use { scan-tree-dump-not "Invalid sum" "optimized"} } */
diff --git gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c
index ee6662e..1bb8cd9 100644
--- gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c
+++ gcc/testsuite/gcc.dg/tree-prof/section-attr-1.c
@@ -42,4 +42,4 @@ foo (int path)
     }
 }
 
-/* { dg-final-use { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold\.0" { target *-*-linux* *-*-gnu* } } } */
+/* { dg-final-use { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target *-*-linux* *-*-gnu* } } } */
diff --git gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c
index 898a395..31be7eb 100644
--- gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c
+++ gcc/testsuite/gcc.dg/tree-prof/section-attr-2.c
@@ -41,4 +41,4 @@ foo (int path)
     }
 }
 
-/* { dg-final-use { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold\.0" { target *-*-linux* *-*-gnu* } } } */
+/* { dg-final-use { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target *-*-linux* *-*-gnu* } } } */
diff --git gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c
index 36829dc..0e64001 100644
--- gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c
+++ gcc/testsuite/gcc.dg/tree-prof/section-attr-3.c
@@ -42,4 +42,4 @@ foo (int path)
     }
 }
 
-/* { dg-final-use { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold\.0" { target *-*-linux* *-*-gnu* } } } */
+/* { dg-final-use { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold" { target *-*-linux* *-*-gnu* } } } */
-- 
2.7.4


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.5: 0004-Use-suffixed_function_name-in-cgraph_node-create_vir.patch --]
[-- Type: text/x-patch; name="0004-Use-suffixed_function_name-in-cgraph_node-create_vir.patch", Size: 1722 bytes --]

From d96bedcf51c289f3c80a12a681a7831d9e287874 Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Mon, 13 Aug 2018 13:59:46 -0400
Subject: [PATCH 4/4] Use suffixed_function_name in
 cgraph_node::create_virtual_clone.

gcc:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       * cgraphclones.c (cgraph_node::create_virtual_clone): Use
         suffixed_function_name.
---
 gcc/cgraphclones.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index 784f1df..876b7fa 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -576,9 +576,8 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
   tree old_decl = decl;
   cgraph_node *new_node = NULL;
   tree new_decl;
-  size_t len, i;
+  size_t i;
   ipa_replace_map *map;
-  char *name;
 
   gcc_checking_assert (local.versionable);
   gcc_assert (local.can_change_signature || !args_to_skip);
@@ -600,12 +599,7 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
      sometimes storing only clone decl instead of original.  */
 
   /* Generate a new name for the new version. */
-  len = IDENTIFIER_LENGTH (DECL_NAME (old_decl));
-  name = XALLOCAVEC (char, len + strlen (suffix) + 2);
-  memcpy (name, IDENTIFIER_POINTER (DECL_NAME (old_decl)), len);
-  strcpy (name + len + 1, suffix);
-  name[len] = '.';
-  DECL_NAME (new_decl) = get_identifier (name);
+  DECL_NAME (new_decl) = suffixed_function_name (old_decl, suffix);
   SET_DECL_ASSEMBLER_NAME (new_decl, numbered_clone_function_name (old_decl, suffix));
   SET_DECL_RTL (new_decl, NULL);
 
-- 
2.7.4


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PING v2][PATCH] Make function clone name numbering independent.
  2018-08-31 18:49                       ` [PING v2][PATCH] " Michael Ploujnikov
@ 2018-09-03 10:02                         ` Martin Jambor
  2018-09-03 11:08                           ` Richard Biener
  2018-09-05  3:32                           ` [PING v2][PATCH] " Michael Ploujnikov
  0 siblings, 2 replies; 30+ messages in thread
From: Martin Jambor @ 2018-09-03 10:02 UTC (permalink / raw)
  To: Michael Ploujnikov, gcc-patches

Hi,

On Fri, Aug 31 2018, Michael Ploujnikov wrote:
> I've done some more digging into the current uses of
> numbered_clone_function_name and checked if any tests fail if I change
> it to suffixed_function_name:
>
>   - gcc/cgraphclones.c:  DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk");
>     - no new tests fail, inconclusive
>     - and despite the comment on redirect_callee_duplicating_thunks
>       about "one or more" duplicates it doesn't seem like
>       duplicate_thunk_for_node would be called more than once for each
>       node, assuming each node is named uniquely, but I'm far from an
>       expert in this area

The comment means that if there is a chain of thunks, the method clones
all of them.  Nevertheless, you need name numbering here for the same
reason why you need them for constprop.


>   - gcc/omp-expand.c:  DECL_NAME (kern_fndecl) = numbered_clone_function_name (kern_fndecl, "kernel");
>     - no new tests fail, inconclusive
>     - I didn't see (and couldn't figure out a way to get) any of the
>       existing omp/acc tests actually exercise this codeptah

I guess this one should not need it.  Build with
--enable-offload-targets=hsa and run gomp.exp to try yourself.  I can
run run-time HSA tests for you if you want.

Martin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PING v2][PATCH] Make function clone name numbering independent.
  2018-09-03 10:02                         ` Martin Jambor
@ 2018-09-03 11:08                           ` Richard Biener
  2018-09-03 13:15                             ` Martin Jambor
  2018-09-05  3:32                           ` [PING v2][PATCH] " Michael Ploujnikov
  1 sibling, 1 reply; 30+ messages in thread
From: Richard Biener @ 2018-09-03 11:08 UTC (permalink / raw)
  To: Martin Jambor; +Cc: Michael Ploujnikov, GCC Patches

On Mon, Sep 3, 2018 at 12:02 PM Martin Jambor <mjambor@suse.cz> wrote:
>
> Hi,
>
> On Fri, Aug 31 2018, Michael Ploujnikov wrote:
> > I've done some more digging into the current uses of
> > numbered_clone_function_name and checked if any tests fail if I change
> > it to suffixed_function_name:
> >
> >   - gcc/cgraphclones.c:  DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk");
> >     - no new tests fail, inconclusive
> >     - and despite the comment on redirect_callee_duplicating_thunks
> >       about "one or more" duplicates it doesn't seem like
> >       duplicate_thunk_for_node would be called more than once for each
> >       node, assuming each node is named uniquely, but I'm far from an
> >       expert in this area
>
> The comment means that if there is a chain of thunks, the method clones
> all of them.  Nevertheless, you need name numbering here for the same
> reason why you need them for constprop.

The remaining question I had with the patch was if maybe all callers
can handle assigning
the numbering themselves, thus do sth like

   for-each-clone-of (i, fn)
      DECL_NAME (...) = numbered_clone_function_name (...,
"artificial_thunk", i);

which would make the map of name -> number unnecessary.

>
> >   - gcc/omp-expand.c:  DECL_NAME (kern_fndecl) = numbered_clone_function_name (kern_fndecl, "kernel");
> >     - no new tests fail, inconclusive
> >     - I didn't see (and couldn't figure out a way to get) any of the
> >       existing omp/acc tests actually exercise this codeptah
>
> I guess this one should not need it.  Build with
> --enable-offload-targets=hsa and run gomp.exp to try yourself.  I can
> run run-time HSA tests for you if you want.
>
> Martin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PING v2][PATCH] Make function clone name numbering independent.
  2018-09-03 11:08                           ` Richard Biener
@ 2018-09-03 13:15                             ` Martin Jambor
  2018-09-05  3:24                               ` Michael Ploujnikov
  0 siblings, 1 reply; 30+ messages in thread
From: Martin Jambor @ 2018-09-03 13:15 UTC (permalink / raw)
  To: Richard Biener; +Cc: Michael Ploujnikov, GCC Patches

Hi,

On Mon, Sep 03 2018, Richard Biener wrote:
> On Mon, Sep 3, 2018 at 12:02 PM Martin Jambor <mjambor@suse.cz> wrote:
>>
>> Hi,
>>
>> On Fri, Aug 31 2018, Michael Ploujnikov wrote:
>> > I've done some more digging into the current uses of
>> > numbered_clone_function_name and checked if any tests fail if I change
>> > it to suffixed_function_name:
>> >
>> >   - gcc/cgraphclones.c:  DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk");
>> >     - no new tests fail, inconclusive
>> >     - and despite the comment on redirect_callee_duplicating_thunks
>> >       about "one or more" duplicates it doesn't seem like
>> >       duplicate_thunk_for_node would be called more than once for each
>> >       node, assuming each node is named uniquely, but I'm far from an
>> >       expert in this area
>>
>> The comment means that if there is a chain of thunks, the method clones
>> all of them.  Nevertheless, you need name numbering here for the same
>> reason why you need them for constprop.
>
> The remaining question I had with the patch was if maybe all callers
> can handle assigning
> the numbering themselves, thus do sth like
>
>    for-each-clone-of (i, fn)
>       DECL_NAME (...) = numbered_clone_function_name (...,
> "artificial_thunk", i);
>
> which would make the map of name -> number unnecessary.
>

That is what I would prefer too but it also has downsides.  Callers
would have to do their book keeping and the various cloning interfaces
would get another parameter when already they have quite a few
(introducing some cloning context classes seems like an overkill to me
:-).

If we want to get rid of .constprop discrepancies, something like the
following (only very mildly tested) patch should be enough (note that
IPA-CP currently does not really need the new has because it creates
clones in only one pass through the call graph, but that is something
likely to change).  We could then adjust other cloners that we care
about.

However, if clones of thunks are one of them, then the optional
parameter will also proliferate to cgraph_node::create_clone(),
cgraph_edge::redirect_callee_duplicating_thunks() and
duplicate_thunk_for_node().  create_version_clone_with_body would almost
certainly need it because of IPA-SRA too (IPA-SRA can of course always
pass zero).

Martin



diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 2b00f0165fa..703c3cfea7b 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -964,11 +964,15 @@ public:
 			     cgraph_node *new_inlined_to,
 			     bitmap args_to_skip, const char *suffix = NULL);
 
-  /* Create callgraph node clone with new declaration.  The actual body will
-     be copied later at compilation stage.  */
+  /* Create callgraph node clone with new declaration.  The actual body will be
+     copied later at compilation stage.  The name of the new clone will be
+     constructed from the name of the original name, SUFFIX and a number which
+     can either be NUM_SUFFIX if non-negative or a unique incremented integer
+     otherwise.  */
   cgraph_node *create_virtual_clone (vec<cgraph_edge *> redirect_callers,
 				     vec<ipa_replace_map *, va_gc> *tree_map,
-				     bitmap args_to_skip, const char * suffix);
+				     bitmap args_to_skip, const char * suffix,
+				     int num_suffix = -1);
 
   /* cgraph node being removed from symbol table; see if its entry can be
    replaced by other inline clone.  */
@@ -2376,8 +2380,9 @@ basic_block init_lowered_empty_function (tree, bool, profile_count);
 tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
 /* In cgraphclones.c  */
 
-tree clone_function_name_1 (const char *, const char *);
-tree clone_function_name (tree decl, const char *);
+tree clone_function_name_1 (const char *name, const char *suffix,
+			    int num_suffix = -1);
+tree clone_function_name (tree decl, const char *suffix, int num_suffix = -1);
 
 void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
 			       bool, bitmap, bool, bitmap, basic_block);
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index 0c0a94b04a3..865c3fa1ad0 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -513,11 +513,13 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
 
 static GTY(()) unsigned int clone_fn_id_num;
 
-/* Return a new assembler name for a clone with SUFFIX of a decl named
-   NAME.  */
+/* Return a new assembler name for a clone with SUFFIX of a decl named NAME
+   Apart from, string SUFFIX, the new name will end with either NUM_SIFFIX, if
+   non-negative, or a unique unspecified number otherwise.  .  */
 
 tree
-clone_function_name_1 (const char *name, const char *suffix)
+clone_function_name_1 (const char *name, const char *suffix,
+		       int num_suffix)
 {
   size_t len = strlen (name);
   char *tmp_name, *prefix;
@@ -526,22 +528,32 @@ clone_function_name_1 (const char *name, const char *suffix)
   memcpy (prefix, name, len);
   strcpy (prefix + len + 1, suffix);
   prefix[len] = symbol_table::symbol_suffix_separator ();
-  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
+  unsigned int num;
+  if (num_suffix < 0)
+    num = clone_fn_id_num++;
+  else
+    num = (unsigned) num_suffix;
+  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, num);
   return get_identifier (tmp_name);
 }
 
-/* Return a new assembler name for a clone of DECL with SUFFIX.  */
+/* Return a new assembler name for a clone of DECL with SUFFIX. Apart from,
+   string SUFFIX, the new name will end with either NUM_SIFFIX, if
+   non-negative, or a unique unspecified number otherwise.  */
 
 tree
-clone_function_name (tree decl, const char *suffix)
+clone_function_name (tree decl, const char *suffix, int num_suffix)
 {
   tree name = DECL_ASSEMBLER_NAME (decl);
-  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
+  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix, num_suffix);
 }
 
 
-/* Create callgraph node clone with new declaration.  The actual body will
-   be copied later at compilation stage.
+/* Create callgraph node clone with new declaration.  The actual body will be
+   copied later at compilation stage.  The name of the new clone will be
+   constructed from the name of the original name, SUFFIX and a number which
+   can either be NUM_SUFFIX if non-negative or a unique incremented integer
+   otherwise.
 
    TODO: after merging in ipa-sra use function call notes instead of args_to_skip
    bitmap interface.
@@ -549,7 +561,8 @@ clone_function_name (tree decl, const char *suffix)
 cgraph_node *
 cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
 				   vec<ipa_replace_map *, va_gc> *tree_map,
-				   bitmap args_to_skip, const char * suffix)
+				   bitmap args_to_skip, const char * suffix,
+				   int num_suffix)
 {
   tree old_decl = decl;
   cgraph_node *new_node = NULL;
@@ -584,7 +597,8 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
   strcpy (name + len + 1, suffix);
   name[len] = '.';
   DECL_NAME (new_decl) = get_identifier (name);
-  SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name (old_decl, suffix));
+  SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name (old_decl, suffix,
+							  num_suffix));
   SET_DECL_RTL (new_decl, NULL);
 
   new_node = create_clone (new_decl, count, false,
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index afc45969558..52690996419 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -376,6 +376,9 @@ static profile_count max_count;
 
 static long overall_size, max_new_size;
 
+/* Hash to give different UID suffixes */
+static hash_map<cgraph_node *, int> *clone_num_suffixes;
+
 /* Return the param lattices structure corresponding to the Ith formal
    parameter of the function described by INFO.  */
 static inline struct ipcp_param_lattices *
@@ -3850,8 +3853,11 @@ create_specialized_node (struct cgraph_node *node,
 	}
     }
 
+  int &suffix_counter = clone_num_suffixes->get_or_insert(node);
   new_node = node->create_virtual_clone (callers, replace_trees,
-					 args_to_skip, "constprop");
+					 args_to_skip, "constprop",
+					 suffix_counter);
+  suffix_counter++;
 
   bool have_self_recursive_calls = !self_recursive_calls.is_empty ();
   for (unsigned j = 0; j < self_recursive_calls.length (); j++)
@@ -5065,6 +5071,7 @@ ipcp_driver (void)
 
   ipa_check_create_node_params ();
   ipa_check_create_edge_args ();
+  clone_num_suffixes = new hash_map<cgraph_node *, int>;
 
   if (dump_file)
     {
@@ -5086,6 +5093,7 @@ ipcp_driver (void)
   ipcp_store_vr_results ();
 
   /* Free all IPCP structures.  */
+  delete clone_num_suffixes;
   free_toporder_info (&topo);
   delete edge_clone_summaries;
   edge_clone_summaries = NULL;

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PING v2][PATCH] Make function clone name numbering independent.
  2018-09-03 13:15                             ` Martin Jambor
@ 2018-09-05  3:24                               ` Michael Ploujnikov
  2018-11-28 21:09                                 ` [PATCH v3] " Michael Ploujnikov
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Ploujnikov @ 2018-09-05  3:24 UTC (permalink / raw)
  To: Martin Jambor, Richard Biener; +Cc: GCC Patches


[-- Attachment #1.1: Type: text/plain, Size: 11535 bytes --]

Hi Martin, Richard,

Thanks for your responses.

On 2018-09-03 09:15 AM, Martin Jambor wrote:
> Hi,
> 
> On Mon, Sep 03 2018, Richard Biener wrote:
>> On Mon, Sep 3, 2018 at 12:02 PM Martin Jambor <mjambor@suse.cz> wrote:
>>>
>>> Hi,
>>>
>>> On Fri, Aug 31 2018, Michael Ploujnikov wrote:
>>>> I've done some more digging into the current uses of
>>>> numbered_clone_function_name and checked if any tests fail if I change
>>>> it to suffixed_function_name:
>>>>
>>>>   - gcc/cgraphclones.c:  DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk");
>>>>     - no new tests fail, inconclusive
>>>>     - and despite the comment on redirect_callee_duplicating_thunks
>>>>       about "one or more" duplicates it doesn't seem like
>>>>       duplicate_thunk_for_node would be called more than once for each
>>>>       node, assuming each node is named uniquely, but I'm far from an
>>>>       expert in this area
>>>
>>> The comment means that if there is a chain of thunks, the method clones
>>> all of them.  Nevertheless, you need name numbering here for the same
>>> reason why you need them for constprop.
>>
>> The remaining question I had with the patch was if maybe all callers
>> can handle assigning
>> the numbering themselves, thus do sth like
>>
>>    for-each-clone-of (i, fn)
>>       DECL_NAME (...) = numbered_clone_function_name (...,
>> "artificial_thunk", i);
>>
>> which would make the map of name -> number unnecessary.

Please see my comment below.

>>
> 
> That is what I would prefer too but it also has downsides.  Callers
> would have to do their book keeping and the various cloning interfaces
> would get another parameter when already they have quite a few
> (introducing some cloning context classes seems like an overkill to me
> :-).

I'm fine with doing it the way you guys are suggesting, but just to
explain my original thinking: The way I see it is that there will
always be some kind of name -> number mapping, even if it's managed by
each individual user and even if it's actually cgraph_node ->
number. Needless to say I'm far from an expert in all of the involved
areas (this is my first contribution to GCC) so the most effective
approach I could come up is doing it all in one place. Now with your
expert advice and as I get a better understanding of how things like
sra and other clones are created, I'm starting to see that a more
tailored approach is possible and probably the way it should have been
done in the first place.

> 
> If we want to get rid of .constprop discrepancies,

Could you please clarify to me what exactly you mean by the
".constprop discrepancies"?

> something like the
> following (only very mildly tested) patch should be enough (note that
> IPA-CP currently does not really need the new has because it creates

What do you mean "it doesn't need the new hash"? My
independent-cloneids-1.c test shows that a function can have more than
one .constprop clone, but I'm probably just not understanding
something.

> clones in only one pass through the call graph, but that is something
> likely to change).  We could then adjust other cloners that we care
> about.
> 
> However, if clones of thunks are one of them, then the optional
> parameter will also proliferate to cgraph_node::create_clone(),
> cgraph_edge::redirect_callee_duplicating_thunks() and
> duplicate_thunk_for_node().  create_version_clone_with_body would almost
> certainly need it because of IPA-SRA too (IPA-SRA can of course always
> pass zero).

I'll try to do this and to convert the other users in the next version
of the patch, but I might need some help with understanding how some
of those passes work!

> 
> Martin
> 
> 
> 
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 2b00f0165fa..703c3cfea7b 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -964,11 +964,15 @@ public:
>  			     cgraph_node *new_inlined_to,
>  			     bitmap args_to_skip, const char *suffix = NULL);
>  
> -  /* Create callgraph node clone with new declaration.  The actual body will
> -     be copied later at compilation stage.  */
> +  /* Create callgraph node clone with new declaration.  The actual body will be
> +     copied later at compilation stage.  The name of the new clone will be
> +     constructed from the name of the original name, SUFFIX and a number which
> +     can either be NUM_SUFFIX if non-negative or a unique incremented integer
> +     otherwise.  */
>    cgraph_node *create_virtual_clone (vec<cgraph_edge *> redirect_callers,
>  				     vec<ipa_replace_map *, va_gc> *tree_map,
> -				     bitmap args_to_skip, const char * suffix);
> +				     bitmap args_to_skip, const char * suffix,
> +				     int num_suffix = -1);
>  
>    /* cgraph node being removed from symbol table; see if its entry can be
>     replaced by other inline clone.  */
> @@ -2376,8 +2380,9 @@ basic_block init_lowered_empty_function (tree, bool, profile_count);
>  tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
>  /* In cgraphclones.c  */
>  
> -tree clone_function_name_1 (const char *, const char *);
> -tree clone_function_name (tree decl, const char *);
> +tree clone_function_name_1 (const char *name, const char *suffix,
> +			    int num_suffix = -1);
> +tree clone_function_name (tree decl, const char *suffix, int num_suffix = -1);

I can see how using the default -1 helped you keep this patch very
small and simple, but for my version of the patch I would actually
prefer to avoid the implicit fallback to the original global counter
and instead make the patch as big as necessary to convert all the
users who need it. The motivation is that implicit behaviour can
sometimes be missed, which I believe is related to why some existing
calls to clone_function_name are unnecessary.

On this note, what do you think about my idea of renaming the function
to numbered_clone_function_name and adding a suffixed_function_name to
make a distinction for users who don't care about numbering?

>  
>  void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
>  			       bool, bitmap, bool, bitmap, basic_block);
> diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
> index 0c0a94b04a3..865c3fa1ad0 100644
> --- a/gcc/cgraphclones.c
> +++ b/gcc/cgraphclones.c
> @@ -513,11 +513,13 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
>  
>  static GTY(()) unsigned int clone_fn_id_num;
>  
> -/* Return a new assembler name for a clone with SUFFIX of a decl named
> -   NAME.  */
> +/* Return a new assembler name for a clone with SUFFIX of a decl named NAME
> +   Apart from, string SUFFIX, the new name will end with either NUM_SIFFIX, if
> +   non-negative, or a unique unspecified number otherwise.  .  */
>  
>  tree
> -clone_function_name_1 (const char *name, const char *suffix)
> +clone_function_name_1 (const char *name, const char *suffix,
> +		       int num_suffix)
>  {
>    size_t len = strlen (name);
>    char *tmp_name, *prefix;
> @@ -526,22 +528,32 @@ clone_function_name_1 (const char *name, const char *suffix)
>    memcpy (prefix, name, len);
>    strcpy (prefix + len + 1, suffix);
>    prefix[len] = symbol_table::symbol_suffix_separator ();
> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
> +  unsigned int num;
> +  if (num_suffix < 0)
> +    num = clone_fn_id_num++;
> +  else
> +    num = (unsigned) num_suffix;
> +  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, num);
>    return get_identifier (tmp_name);
>  }
>  
> -/* Return a new assembler name for a clone of DECL with SUFFIX.  */
> +/* Return a new assembler name for a clone of DECL with SUFFIX. Apart from,
> +   string SUFFIX, the new name will end with either NUM_SIFFIX, if
> +   non-negative, or a unique unspecified number otherwise.  */
>  
>  tree
> -clone_function_name (tree decl, const char *suffix)
> +clone_function_name (tree decl, const char *suffix, int num_suffix)
>  {
>    tree name = DECL_ASSEMBLER_NAME (decl);
> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
> +  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix, num_suffix);
>  }
>  
>  
> -/* Create callgraph node clone with new declaration.  The actual body will
> -   be copied later at compilation stage.
> +/* Create callgraph node clone with new declaration.  The actual body will be
> +   copied later at compilation stage.  The name of the new clone will be
> +   constructed from the name of the original name, SUFFIX and a number which
> +   can either be NUM_SUFFIX if non-negative or a unique incremented integer
> +   otherwise.
>  
>     TODO: after merging in ipa-sra use function call notes instead of args_to_skip
>     bitmap interface.
> @@ -549,7 +561,8 @@ clone_function_name (tree decl, const char *suffix)
>  cgraph_node *
>  cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
>  				   vec<ipa_replace_map *, va_gc> *tree_map,
> -				   bitmap args_to_skip, const char * suffix)
> +				   bitmap args_to_skip, const char * suffix,
> +				   int num_suffix)
>  {
>    tree old_decl = decl;
>    cgraph_node *new_node = NULL;
> @@ -584,7 +597,8 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
>    strcpy (name + len + 1, suffix);
>    name[len] = '.';
>    DECL_NAME (new_decl) = get_identifier (name);
> -  SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name (old_decl, suffix));
> +  SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name (old_decl, suffix,
> +							  num_suffix));
>    SET_DECL_RTL (new_decl, NULL);
>  
>    new_node = create_clone (new_decl, count, false,
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index afc45969558..52690996419 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -376,6 +376,9 @@ static profile_count max_count;
>  
>  static long overall_size, max_new_size;
>  
> +/* Hash to give different UID suffixes */
> +static hash_map<cgraph_node *, int> *clone_num_suffixes;
> +
>  /* Return the param lattices structure corresponding to the Ith formal
>     parameter of the function described by INFO.  */
>  static inline struct ipcp_param_lattices *
> @@ -3850,8 +3853,11 @@ create_specialized_node (struct cgraph_node *node,
>  	}
>      }
>  
> +  int &suffix_counter = clone_num_suffixes->get_or_insert(node);
>    new_node = node->create_virtual_clone (callers, replace_trees,
> -					 args_to_skip, "constprop");
> +					 args_to_skip, "constprop",
> +					 suffix_counter);
> +  suffix_counter++;
>  
>    bool have_self_recursive_calls = !self_recursive_calls.is_empty ();
>    for (unsigned j = 0; j < self_recursive_calls.length (); j++)
> @@ -5065,6 +5071,7 @@ ipcp_driver (void)
>  
>    ipa_check_create_node_params ();
>    ipa_check_create_edge_args ();
> +  clone_num_suffixes = new hash_map<cgraph_node *, int>;
>  
>    if (dump_file)
>      {
> @@ -5086,6 +5093,7 @@ ipcp_driver (void)
>    ipcp_store_vr_results ();
>  
>    /* Free all IPCP structures.  */
> +  delete clone_num_suffixes;

I like how this allows the reduction of the lifetime of the mapping!
Richard: that helps with your concerns about the extra memory usage,
right?


>    free_toporder_info (&topo);
>    delete edge_clone_summaries;
>    edge_clone_summaries = NULL;
> 


Thanks again,
- Michael


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PING v2][PATCH] Make function clone name numbering independent.
  2018-09-03 10:02                         ` Martin Jambor
  2018-09-03 11:08                           ` Richard Biener
@ 2018-09-05  3:32                           ` Michael Ploujnikov
  2018-12-04 12:48                             ` Martin Jambor
  1 sibling, 1 reply; 30+ messages in thread
From: Michael Ploujnikov @ 2018-09-05  3:32 UTC (permalink / raw)
  To: Martin Jambor, gcc-patches


[-- Attachment #1.1: Type: text/plain, Size: 1740 bytes --]

Hi Martin,

On 2018-09-03 06:01 AM, Martin Jambor wrote:
> Hi,
> 
> On Fri, Aug 31 2018, Michael Ploujnikov wrote:
>> I've done some more digging into the current uses of
>> numbered_clone_function_name and checked if any tests fail if I change
>> it to suffixed_function_name:
>>
>>   - gcc/cgraphclones.c:  DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk");
>>     - no new tests fail, inconclusive
>>     - and despite the comment on redirect_callee_duplicating_thunks
>>       about "one or more" duplicates it doesn't seem like
>>       duplicate_thunk_for_node would be called more than once for each
>>       node, assuming each node is named uniquely, but I'm far from an
>>       expert in this area
> 
> The comment means that if there is a chain of thunks, the method clones
> all of them.  Nevertheless, you need name numbering here for the same
> reason why you need them for constprop.
> 
> 
>>   - gcc/omp-expand.c:  DECL_NAME (kern_fndecl) = numbered_clone_function_name (kern_fndecl, "kernel");
>>     - no new tests fail, inconclusive
>>     - I didn't see (and couldn't figure out a way to get) any of the
>>       existing omp/acc tests actually exercise this codeptah
> 
> I guess this one should not need it.  Build with
> --enable-offload-targets=hsa and run gomp.exp to try yourself.  I can
> run run-time HSA tests for you if you want.
> 
> Martin
> 

I've tried building with numbered_clone_function_name replaced by
suffixed_function_name and with --enable-offload-targets=hsa and
didn't see any errors in gomp.exp. I don't have a readily available
HSA setup so if you could do a quick test, I would really appreciate
it!

- Michael


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v3] Make function clone name numbering independent.
  2018-09-05  3:24                               ` Michael Ploujnikov
@ 2018-11-28 21:09                                 ` Michael Ploujnikov
  2018-11-28 22:49                                   ` Segher Boessenkool
  2018-11-30  7:26                                   ` Richard Biener
  0 siblings, 2 replies; 30+ messages in thread
From: Michael Ploujnikov @ 2018-11-28 21:09 UTC (permalink / raw)
  To: Martin Jambor, Richard Biener; +Cc: GCC Patches, Jan Hubicka, segher


[-- Attachment #1.1.1: Type: text/plain, Size: 1245 bytes --]

Continuing from https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00228.html

It took longer than expected, but I've finally rebased this on top of
the new clone_function_name* API and included the requested
optimizations.

There are a few remaining spots where we could probably apply similar
optimizations:

- gcc/multiple_target.c:create_target_clone
- gcc/multiple_target.c:create_dispatcher_calls
- gcc/omp-expand.c:grid_expand_target_grid_body

But I've yet to figure out how these work in detail and with the new
API these shouldn't block the main change from being merged.

I've also included a small change to rs6000 which I'm pretty sure is
safe, but I have no way of testing.

I'm not sure what's the consensus on what's more appropriate, but I
thought that it would be a good idea to keep these changes as separate
patches since only the first one really changes behaviour, while the
rest are optimizations. It's conceivable that someone who is
backporting this to an older release might want to just backport the
core bits of the change. I can re-submit it as one patch if that's
more appropriate.

Everything in these patches was bootstrapped and regression tested on
x86_64.

Ok for trunk?

- Michael

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.2: 0001-Make-function-assembly-more-independent.patch --]
[-- Type: text/x-patch; name="0001-Make-function-assembly-more-independent.patch", Size: 3631 bytes --]

From 40cb5c888522d69bc42791f0c884dcb9e29eff37 Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Thu, 1 Nov 2018 12:57:30 -0400
Subject: [PATCH 1/4] Make function assembly more independent.

This is achieved by having clone_function_name assign unique clone
numbers for each function independently.

gcc:

2018-11-28  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

	* cgraphclones.c: Replaced clone_fn_id_num with clone_fn_ids;
	  hash map.
	(clone_function_name_numbered): Use clone_fn_ids.

gcc/testsuite:

2018-11-28  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

	* gcc.dg/independent-cloneids-1.c: New test.
---
 gcc/cgraphclones.c                            | 10 ++++-
 gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++
 2 files changed, 46 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c

diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index e17959c0ca..fdd84b60d3 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -513,7 +513,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
   return new_node;
 }
 
-static GTY(()) unsigned int clone_fn_id_num;
+static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
 
 /* Return a new assembler name for a clone of decl named NAME.  Apart
    from the string SUFFIX, the new name will end with a unique (for
@@ -525,7 +525,13 @@ static GTY(()) unsigned int clone_fn_id_num;
 tree
 clone_function_name_numbered (const char *name, const char *suffix)
 {
-  return clone_function_name (name, suffix, clone_fn_id_num++);
+  /* Initialize the function->counter mapping the first time it's
+     needed.  */
+  if (!clone_fn_ids)
+    clone_fn_ids = hash_map<const char *, unsigned int>::create_ggc (64);
+  unsigned int &suffix_counter = clone_fn_ids->get_or_insert (
+				   IDENTIFIER_POINTER (get_identifier (name)));
+  return clone_function_name (name, suffix, suffix_counter++);
 }
 
 /* Return a new assembler name for a clone of DECL.  Apart from string
diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c gcc/testsuite/gcc.dg/independent-cloneids-1.c
new file mode 100644
index 0000000000..3203895492
--- /dev/null
+++ gcc/testsuite/gcc.dg/independent-cloneids-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fipa-cp -fipa-cp-clone"  } */
+
+extern int printf (const char *, ...);
+
+static int __attribute__ ((noinline))
+foo (int arg)
+{
+  return 7 * arg;
+}
+
+static int __attribute__ ((noinline))
+bar (int arg)
+{
+  return arg * arg;
+}
+
+int
+baz (int arg)
+{
+  printf("%d\n", bar (3));
+  printf("%d\n", bar (4));
+  printf("%d\n", foo (5));
+  printf("%d\n", foo (6));
+  /* adding or removing the following call should not affect foo
+     function's clone numbering */
+  printf("%d\n", bar (7));
+  return foo (8);
+}
+
+/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]0:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]1:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]2:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]0:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]1:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]2:} 1 } } */
+/* { dg-final { scan-assembler-not {(?n)\m_*foo[.$_]constprop[.$_]3:} } } */
+/* { dg-final { scan-assembler-not {(?n)\m_*foo[.$_]constprop[.$_]4:} } } */
-- 
2.19.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.3: 0002-Minimize-clone-counter-memory-usage-in-create_virtua.patch --]
[-- Type: text/x-patch; name="0002-Minimize-clone-counter-memory-usage-in-create_virtua.patch", Size: 6590 bytes --]

From 80b07ddf059415c896cecb9862517899db3993e9 Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Mon, 17 Sep 2018 16:02:27 -0400
Subject: [PATCH 2/4] Minimize clone counter memory usage in
 create_virtual_clone.

Based on Martin Jambour's suggestion:
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00111.html

gcc:

2018-11-28  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

	* cgraph.h (cgraph_node::create_virtual_clone): Add a new
	  argument: num_suffix.
	* cgraphclones.c (cgraph_node::create_virtual_clone): Pass
	  num_suffix to clone_function_name.
	* ipa-cp.c (create_specialized_node): Keep track of clone
	  counters in clone_num_suffixes hash map.
	(ipcp_driver): Free the counter hash map.
	* ipa-hsa.c (process_hsa_functions): Keep track of clone
	  counters in hsa_clone_numbers hash map.
---
 gcc/cgraph.h       |  8 +++++---
 gcc/cgraphclones.c | 16 +++++++++++-----
 gcc/ipa-cp.c       | 10 +++++++++-
 gcc/ipa-hsa.c      |  9 +++++++--
 4 files changed, 32 insertions(+), 11 deletions(-)

diff --git gcc/cgraph.h gcc/cgraph.h
index c13d79850f..4c0fb4da05 100644
--- gcc/cgraph.h
+++ gcc/cgraph.h
@@ -968,11 +968,13 @@ public:
 			     cgraph_node *new_inlined_to,
 			     bitmap args_to_skip, const char *suffix = NULL);
 
-  /* Create callgraph node clone with new declaration.  The actual body will
-     be copied later at compilation stage.  */
+  /* Create callgraph node clone with new declaration.  The actual body will be
+     copied later at compilation stage.  The name of the new clone will be
+     constructed from the name of the original node, SUFFIX and NUM_SUFFIX.  */
   cgraph_node *create_virtual_clone (vec<cgraph_edge *> redirect_callers,
 				     vec<ipa_replace_map *, va_gc> *tree_map,
-				     bitmap args_to_skip, const char * suffix);
+				     bitmap args_to_skip, const char * suffix,
+				     unsigned num_suffix);
 
   /* cgraph node being removed from symbol table; see if its entry can be
    replaced by other inline clone.  */
diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index fdd84b60d3..51bdd7680e 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -595,8 +595,9 @@ clone_function_name (tree decl, const char *suffix)
 }
 
 
-/* Create callgraph node clone with new declaration.  The actual body will
-   be copied later at compilation stage.
+/* Create callgraph node clone with new declaration.  The actual body will be
+   copied later at compilation stage.  The name of the new clone will be
+   constructed from the name of the original node, SUFFIX and NUM_SUFFIX.
 
    TODO: after merging in ipa-sra use function call notes instead of args_to_skip
    bitmap interface.
@@ -604,7 +605,8 @@ clone_function_name (tree decl, const char *suffix)
 cgraph_node *
 cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
 				   vec<ipa_replace_map *, va_gc> *tree_map,
-				   bitmap args_to_skip, const char * suffix)
+				   bitmap args_to_skip, const char * suffix,
+				   unsigned num_suffix)
 {
   tree old_decl = decl;
   cgraph_node *new_node = NULL;
@@ -639,8 +641,12 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
   strcpy (name + len + 1, suffix);
   name[len] = '.';
   DECL_NAME (new_decl) = get_identifier (name);
-  SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name_numbered (old_decl,
-								   suffix));
+  SET_DECL_ASSEMBLER_NAME (new_decl,
+			   clone_function_name (
+			     IDENTIFIER_POINTER (
+			       DECL_ASSEMBLER_NAME (old_decl)),
+			     suffix,
+			     num_suffix));
   SET_DECL_RTL (new_decl, NULL);
 
   new_node = create_clone (new_decl, count, false,
diff --git gcc/ipa-cp.c gcc/ipa-cp.c
index 4471bae11c..e0cd1bc45b 100644
--- gcc/ipa-cp.c
+++ gcc/ipa-cp.c
@@ -376,6 +376,9 @@ static profile_count max_count;
 
 static long overall_size, max_new_size;
 
+/* Node to unique clone suffix number map.  */
+static hash_map<cgraph_node *, unsigned> *clone_num_suffixes;
+
 /* Return the param lattices structure corresponding to the Ith formal
    parameter of the function described by INFO.  */
 static inline struct ipcp_param_lattices *
@@ -3828,8 +3831,11 @@ create_specialized_node (struct cgraph_node *node,
 	}
     }
 
+  unsigned &suffix_counter = clone_num_suffixes->get_or_insert (node);
   new_node = node->create_virtual_clone (callers, replace_trees,
-					 args_to_skip, "constprop");
+					 args_to_skip, "constprop",
+					 suffix_counter);
+  suffix_counter++;
 
   bool have_self_recursive_calls = !self_recursive_calls.is_empty ();
   for (unsigned j = 0; j < self_recursive_calls.length (); j++)
@@ -5043,6 +5049,7 @@ ipcp_driver (void)
 
   ipa_check_create_node_params ();
   ipa_check_create_edge_args ();
+  clone_num_suffixes = new hash_map<cgraph_node *, unsigned>;
 
   if (dump_file)
     {
@@ -5064,6 +5071,7 @@ ipcp_driver (void)
   ipcp_store_vr_results ();
 
   /* Free all IPCP structures.  */
+  delete clone_num_suffixes;
   free_toporder_info (&topo);
   delete edge_clone_summaries;
   edge_clone_summaries = NULL;
diff --git gcc/ipa-hsa.c gcc/ipa-hsa.c
index 7c6ceaab8f..b5dc3dfef2 100644
--- gcc/ipa-hsa.c
+++ gcc/ipa-hsa.c
@@ -74,6 +74,7 @@ process_hsa_functions (void)
   if (hsa_summaries == NULL)
     hsa_summaries = new hsa_summary_t (symtab);
 
+  hash_map<cgraph_node *, unsigned> hsa_clone_numbers;
   FOR_EACH_DEFINED_FUNCTION (node)
     {
       hsa_function_summary *s = hsa_summaries->get (node);
@@ -86,9 +87,11 @@ process_hsa_functions (void)
 	{
 	  if (!check_warn_node_versionable (node))
 	    continue;
+	  unsigned &clone_number = hsa_clone_numbers.get_or_insert (node);
 	  cgraph_node *clone
 	    = node->create_virtual_clone (vec <cgraph_edge *> (),
-					  NULL, NULL, "hsa");
+					  NULL, NULL, "hsa", clone_number);
+	  clone_number++;
 	  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
 	  clone->externally_visible = node->externally_visible;
 
@@ -107,9 +110,11 @@ process_hsa_functions (void)
 	{
 	  if (!check_warn_node_versionable (node))
 	    continue;
+	  unsigned &clone_number = hsa_clone_numbers.get_or_insert (node);
 	  cgraph_node *clone
 	    = node->create_virtual_clone (vec <cgraph_edge *> (),
-					  NULL, NULL, "hsa");
+					  NULL, NULL, "hsa", clone_number);
+	  clone_number++;
 	  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
 	  clone->externally_visible = node->externally_visible;
 
-- 
2.19.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.4: 0003-Minimize-clone-counter-memory-usage-in-LTO.patch --]
[-- Type: text/x-patch; name="0003-Minimize-clone-counter-memory-usage-in-LTO.patch", Size: 2613 bytes --]

From 5bb67981114c08fbaf78e5cc0f3f8abaa0aee60b Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Mon, 26 Nov 2018 14:22:39 -0500
Subject: [PATCH 3/4] Minimize clone counter memory usage in LTO.

gcc/lto:

2018-11-28  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

	* lto-partition.c (privatize_symbol_name_1): Keep track of
	  non-unique symbol counters in the lto_clone_numbers hash
	  map.
	(lto_promote_cross_file_statics): Allocate and free the
	lto_clone_numbers hash map.
	(lto_promote_statics_nonwpa): Free the lto_clone_numbers hash
	map.
---
 gcc/lto/lto-partition.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c
index 24e7c23859..3afdcd6f5d 100644
--- gcc/lto/lto-partition.c
+++ gcc/lto/lto-partition.c
@@ -951,6 +951,9 @@ validize_symbol_for_target (symtab_node *node)
     }
 }
 
+/* Maps symbol names to unique lto clone counters.  */
+static hash_map<const char *, unsigned> *lto_clone_numbers;
+
 /* Helper for privatize_symbol_name.  Mangle NODE symbol name
    represented by DECL.  */
 
@@ -962,10 +965,13 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
   if (must_not_rename (node, name))
     return false;
 
+  unsigned &clone_number = lto_clone_numbers->get_or_insert (
+			     IDENTIFIER_POINTER (DECL_NAME (decl)));
   name = maybe_rewrite_identifier (name);
   symtab->change_decl_assembler_name (decl,
-				      clone_function_name_numbered (
-					  name, "lto_priv"));
+				      clone_function_name (
+					  name, "lto_priv", clone_number));
+  clone_number++;
 
   if (node->lto_file_data)
     lto_record_renamed_decl (node->lto_file_data, name,
@@ -1157,6 +1163,8 @@ lto_promote_cross_file_statics (void)
       part->encoder = compute_ltrans_boundary (part->encoder);
     }
 
+  lto_clone_numbers = new hash_map<const char *, unsigned>;
+
   /* Look at boundaries and promote symbols as needed.  */
   for (i = 0; i < n_sets; i++)
     {
@@ -1187,6 +1195,7 @@ lto_promote_cross_file_statics (void)
           promote_symbol (node);
         }
     }
+  delete lto_clone_numbers;
 }
 
 /* Rename statics in the whole unit in the case that 
@@ -1196,9 +1205,12 @@ void
 lto_promote_statics_nonwpa (void)
 {
   symtab_node *node;
+
+  lto_clone_numbers = new hash_map<const char *, unsigned>;
   FOR_EACH_SYMBOL (node)
     {
       rename_statics (NULL, node);
       validize_symbol_for_target (node);
     }
+  delete lto_clone_numbers;
 }
-- 
2.19.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.5: 0004-There-can-be-at-most-one-.resolver-clone-per-functio.patch --]
[-- Type: text/x-patch; name="0004-There-can-be-at-most-one-.resolver-clone-per-functio.patch", Size: 1242 bytes --]

From a822759e422f0cea11640ec3d6d6fba94bcd4ee9 Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Wed, 28 Nov 2018 09:24:30 -0500
Subject: [PATCH 4/4] There can be at most one .resolver clone per function

gcc:

2018-11-28  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

	* config/rs6000/rs6000.c (make_resolver_func): Generate
	  resolver symbol with clone_function_name instead of
	  clone_function_name_numbered.
---
 gcc/config/rs6000/rs6000.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git gcc/config/rs6000/rs6000.c gcc/config/rs6000/rs6000.c
index 4f113cb025..a9d038829b 100644
--- gcc/config/rs6000/rs6000.c
+++ gcc/config/rs6000/rs6000.c
@@ -36997,7 +36997,7 @@ make_resolver_func (const tree default_decl,
 {
   /* Make the resolver function static.  The resolver function returns
      void *.  */
-  tree decl_name = clone_function_name_numbered (default_decl, "resolver");
+  tree decl_name = clone_function_name (default_decl, "resolver");
   const char *resolver_name = IDENTIFIER_POINTER (decl_name);
   tree type = build_function_type_list (ptr_type_node, NULL_TREE);
   tree decl = build_fn_decl (resolver_name, type);
-- 
2.19.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] Make function clone name numbering independent.
  2018-11-28 21:09                                 ` [PATCH v3] " Michael Ploujnikov
@ 2018-11-28 22:49                                   ` Segher Boessenkool
  2018-11-29 14:13                                     ` Michael Ploujnikov
  2018-11-30  7:26                                   ` Richard Biener
  1 sibling, 1 reply; 30+ messages in thread
From: Segher Boessenkool @ 2018-11-28 22:49 UTC (permalink / raw)
  To: Michael Ploujnikov
  Cc: Martin Jambor, Richard Biener, GCC Patches, Jan Hubicka

Hi!

On Wed, Nov 28, 2018 at 04:09:14PM -0500, Michael Ploujnikov wrote:
> I've also included a small change to rs6000 which I'm pretty sure is
> safe, but I have no way of testing.

Do you have an account on the GCC Compile Farm?
https://cfarm.tetaneutral.net/
There are some nice Power machines in there!

Does this patch dependent on the rest of the series?

If it tests okay, it is okay for trunk of course.  Thanks!

One comment about your changelog:

> 2018-11-28  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
> 
> 	* config/rs6000/rs6000.c (make_resolver_func): Generate
> 	  resolver symbol with clone_function_name instead of
> 	  clone_function_name_numbered.

Those last two lines should not start with the spaces.  It should be:

2018-11-28  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

	* config/rs6000/rs6000.c (make_resolver_func): Generate
	resolver symbol with clone_function_name instead of
	clone_function_name_numbered.


Segher

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] Make function clone name numbering independent.
  2018-11-28 22:49                                   ` Segher Boessenkool
@ 2018-11-29 14:13                                     ` Michael Ploujnikov
  0 siblings, 0 replies; 30+ messages in thread
From: Michael Ploujnikov @ 2018-11-29 14:13 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Martin Jambor, Richard Biener, GCC Patches, Jan Hubicka


[-- Attachment #1.1: Type: text/plain, Size: 1303 bytes --]

On 2018-11-28 5:49 p.m., Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Nov 28, 2018 at 04:09:14PM -0500, Michael Ploujnikov wrote:
>> I've also included a small change to rs6000 which I'm pretty sure is
>> safe, but I have no way of testing.
> 
> Do you have an account on the GCC Compile Farm?
> https://cfarm.tetaneutral.net/
> There are some nice Power machines in there!
> 
> Does this patch dependent on the rest of the series?
> 
> If it tests okay, it is okay for trunk of course.  Thanks!
> 
> One comment about your changelog:
> 
>> 2018-11-28  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
>>
>> 	* config/rs6000/rs6000.c (make_resolver_func): Generate
>> 	  resolver symbol with clone_function_name instead of
>> 	  clone_function_name_numbered.
> 
> Those last two lines should not start with the spaces.  It should be:
> 
> 2018-11-28  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
> 
> 	* config/rs6000/rs6000.c (make_resolver_func): Generate
> 	resolver symbol with clone_function_name instead of
> 	clone_function_name_numbered.
> 
> 
> Segher
> 

Thanks for the review and suggestion to use the cfarm! I've
successfully regtested this on power9 and committed the stand-alone
change to rs6000.c after fixing the changelog.

- Michael


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] Make function clone name numbering independent.
  2018-11-28 21:09                                 ` [PATCH v3] " Michael Ploujnikov
  2018-11-28 22:49                                   ` Segher Boessenkool
@ 2018-11-30  7:26                                   ` Richard Biener
  2018-11-30 21:11                                     ` Michael Ploujnikov
  1 sibling, 1 reply; 30+ messages in thread
From: Richard Biener @ 2018-11-30  7:26 UTC (permalink / raw)
  To: Michael Ploujnikov
  Cc: Martin Jambor, GCC Patches, Jan Hubicka, Segher Boessenkool

On Wed, Nov 28, 2018 at 10:09 PM Michael Ploujnikov
<michael.ploujnikov@oracle.com> wrote:
>
> Continuing from https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00228.html
>
> It took longer than expected, but I've finally rebased this on top of
> the new clone_function_name* API and included the requested
> optimizations.
>
> There are a few remaining spots where we could probably apply similar
> optimizations:
>
> - gcc/multiple_target.c:create_target_clone
> - gcc/multiple_target.c:create_dispatcher_calls
> - gcc/omp-expand.c:grid_expand_target_grid_body
>
> But I've yet to figure out how these work in detail and with the new
> API these shouldn't block the main change from being merged.
>
> I've also included a small change to rs6000 which I'm pretty sure is
> safe, but I have no way of testing.
>
> I'm not sure what's the consensus on what's more appropriate, but I
> thought that it would be a good idea to keep these changes as separate
> patches since only the first one really changes behaviour, while the
> rest are optimizations. It's conceivable that someone who is
> backporting this to an older release might want to just backport the
> core bits of the change. I can re-submit it as one patch if that's
> more appropriate.
>
> Everything in these patches was bootstrapped and regression tested on
> x86_64.
>
> Ok for trunk?

+  unsigned &clone_number = lto_clone_numbers->get_or_insert (
+     IDENTIFIER_POINTER (DECL_NAME (decl)));
   name = maybe_rewrite_identifier (name);
   symtab->change_decl_assembler_name (decl,
-      clone_function_name_numbered (
-  name, "lto_priv"));
+      clone_function_name (
+  name, "lto_priv", clone_number));

since we use 'name' from maybe_rewrite_identifier in the end wouldn't it
make more sense to use that as a key for lto_clone_numbers?

For the ipa-hsa.c changes since we iterate over all defined functions
we at most create a single clone per cgraph_node.  That means your
hash-map keyed on cgraph_node is useless and you could use
an unnumbered clone (or a static zero number).

-  SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name_numbered (old_decl,
-   suffix));
+  SET_DECL_ASSEMBLER_NAME (new_decl,
+   clone_function_name (
+     IDENTIFIER_POINTER (
+       DECL_ASSEMBLER_NAME (old_decl)),
+     suffix,
+     num_suffix));

can you please hide the implementation detail of accessing the assembler name
by adding an overload to clone_function_name_numbered with an explicit
number?

OK with those changes.

Thanks,
Richard.



>
> - Michael

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] Make function clone name numbering independent.
  2018-11-30  7:26                                   ` Richard Biener
@ 2018-11-30 21:11                                     ` Michael Ploujnikov
  2018-12-01 16:31                                       ` H.J. Lu
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Ploujnikov @ 2018-11-30 21:11 UTC (permalink / raw)
  To: Richard Biener
  Cc: Martin Jambor, GCC Patches, Jan Hubicka, Segher Boessenkool


[-- Attachment #1.1.1: Type: text/plain, Size: 1570 bytes --]

Hi,

On 2018-11-30 2:25 a.m., Richard Biener wrote:
> +  unsigned &clone_number = lto_clone_numbers->get_or_insert (
> +     IDENTIFIER_POINTER (DECL_NAME (decl)));
>    name = maybe_rewrite_identifier (name);
>    symtab->change_decl_assembler_name (decl,
> -      clone_function_name_numbered (
> -  name, "lto_priv"));
> +      clone_function_name (
> +  name, "lto_priv", clone_number));
> 
> since we use 'name' from maybe_rewrite_identifier in the end wouldn't it
> make more sense to use that as a key for lto_clone_numbers?

Yup, that looks much better. Fixed it.

> For the ipa-hsa.c changes since we iterate over all defined functions
> we at most create a single clone per cgraph_node.  That means your
> hash-map keyed on cgraph_node is useless and you could use
> an unnumbered clone (or a static zero number).

Good catch. I was thinking of doing this, but it somehow fell through the cracks during the rebase.

> 
> -  SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name_numbered (old_decl,
> -   suffix));
> +  SET_DECL_ASSEMBLER_NAME (new_decl,
> +   clone_function_name (
> +     IDENTIFIER_POINTER (
> +       DECL_ASSEMBLER_NAME (old_decl)),
> +     suffix,
> +     num_suffix));
> 
> can you please hide the implementation detail of accessing the assembler name
> by adding an overload to clone_function_name_numbered with an explicit
> number?

Done.


> 
> OK with those changes.
> 
> Thanks,
> Richard.

Thank you for the review. I will commit as soon as my last test run finishes.

- Michael

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.2: 0001-Make-function-assembly-more-independent.patch --]
[-- Type: text/x-patch; name="0001-Make-function-assembly-more-independent.patch", Size: 3629 bytes --]

From ac1f1579d37804c97833d460ec6cd5b87d6184c7 Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Thu, 1 Nov 2018 12:57:30 -0400
Subject: [PATCH 1/3] Make function assembly more independent.

This is achieved by having clone_function_name assign unique clone
numbers for each function independently.

gcc:

2018-11-30  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

	* cgraphclones.c: Replaced clone_fn_id_num with clone_fn_ids;
	hash map.
	(clone_function_name_numbered): Use clone_fn_ids.

gcc/testsuite:

2018-11-30  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

	* gcc.dg/independent-cloneids-1.c: New test.
---
 gcc/cgraphclones.c                            | 10 ++++-
 gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++
 2 files changed, 46 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c

diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index e17959c0ca..fdd84b60d3 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -513,7 +513,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
   return new_node;
 }
 
-static GTY(()) unsigned int clone_fn_id_num;
+static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
 
 /* Return a new assembler name for a clone of decl named NAME.  Apart
    from the string SUFFIX, the new name will end with a unique (for
@@ -525,7 +525,13 @@ static GTY(()) unsigned int clone_fn_id_num;
 tree
 clone_function_name_numbered (const char *name, const char *suffix)
 {
-  return clone_function_name (name, suffix, clone_fn_id_num++);
+  /* Initialize the function->counter mapping the first time it's
+     needed.  */
+  if (!clone_fn_ids)
+    clone_fn_ids = hash_map<const char *, unsigned int>::create_ggc (64);
+  unsigned int &suffix_counter = clone_fn_ids->get_or_insert (
+				   IDENTIFIER_POINTER (get_identifier (name)));
+  return clone_function_name (name, suffix, suffix_counter++);
 }
 
 /* Return a new assembler name for a clone of DECL.  Apart from string
diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c gcc/testsuite/gcc.dg/independent-cloneids-1.c
new file mode 100644
index 0000000000..3203895492
--- /dev/null
+++ gcc/testsuite/gcc.dg/independent-cloneids-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fipa-cp -fipa-cp-clone"  } */
+
+extern int printf (const char *, ...);
+
+static int __attribute__ ((noinline))
+foo (int arg)
+{
+  return 7 * arg;
+}
+
+static int __attribute__ ((noinline))
+bar (int arg)
+{
+  return arg * arg;
+}
+
+int
+baz (int arg)
+{
+  printf("%d\n", bar (3));
+  printf("%d\n", bar (4));
+  printf("%d\n", foo (5));
+  printf("%d\n", foo (6));
+  /* adding or removing the following call should not affect foo
+     function's clone numbering */
+  printf("%d\n", bar (7));
+  return foo (8);
+}
+
+/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]0:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]1:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]2:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]0:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]1:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]2:} 1 } } */
+/* { dg-final { scan-assembler-not {(?n)\m_*foo[.$_]constprop[.$_]3:} } } */
+/* { dg-final { scan-assembler-not {(?n)\m_*foo[.$_]constprop[.$_]4:} } } */
-- 
2.19.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.3: 0002-Minimize-clone-counter-memory-usage-in-create_virtua.patch --]
[-- Type: text/x-patch; name="0002-Minimize-clone-counter-memory-usage-in-create_virtua.patch", Size: 7198 bytes --]

From 9656fc66b05c4ee9efd1b4a0533d564a35a85bc5 Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Mon, 17 Sep 2018 16:02:27 -0400
Subject: [PATCH 2/3] Minimize clone counter memory usage in
 create_virtual_clone.

Based on Martin Jambour's suggestion:
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00111.html

gcc:

2018-11-30  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

	* cgraph.h (clone_function_name): Add a variant that takes a
	tree decl.
	* cgraph.h (cgraph_node::create_virtual_clone): Add a new
	argument: num_suffix.
	* cgraphclones.c (cgraph_node::create_virtual_clone): Pass
	num_suffix to clone_function_name.
	(clone_function_name): Add a variant that takes a tree decl.
	* ipa-cp.c (create_specialized_node): Keep track of clone
	counters in clone_num_suffixes hash map.
	(ipcp_driver): Free the counter hash map.
	* ipa-hsa.c (process_hsa_functions): Creates at most one hsa
	clone per function.
---
 gcc/cgraph.h       | 10 +++++++---
 gcc/cgraphclones.c | 25 ++++++++++++++++++++-----
 gcc/ipa-cp.c       | 10 +++++++++-
 gcc/ipa-hsa.c      |  4 ++--
 4 files changed, 38 insertions(+), 11 deletions(-)

diff --git gcc/cgraph.h gcc/cgraph.h
index c13d79850f..689bb828ca 100644
--- gcc/cgraph.h
+++ gcc/cgraph.h
@@ -968,11 +968,13 @@ public:
 			     cgraph_node *new_inlined_to,
 			     bitmap args_to_skip, const char *suffix = NULL);
 
-  /* Create callgraph node clone with new declaration.  The actual body will
-     be copied later at compilation stage.  */
+  /* Create callgraph node clone with new declaration.  The actual body will be
+     copied later at compilation stage.  The name of the new clone will be
+     constructed from the name of the original node, SUFFIX and NUM_SUFFIX.  */
   cgraph_node *create_virtual_clone (vec<cgraph_edge *> redirect_callers,
 				     vec<ipa_replace_map *, va_gc> *tree_map,
-				     bitmap args_to_skip, const char * suffix);
+				     bitmap args_to_skip, const char * suffix,
+				     unsigned num_suffix);
 
   /* cgraph node being removed from symbol table; see if its entry can be
    replaced by other inline clone.  */
@@ -2386,6 +2388,8 @@ tree clone_function_name_numbered (const char *name, const char *suffix);
 tree clone_function_name_numbered (tree decl, const char *suffix);
 tree clone_function_name (const char *name, const char *suffix,
 			  unsigned long number);
+tree clone_function_name (tree decl, const char *suffix,
+			  unsigned long number);
 tree clone_function_name (tree decl, const char *suffix);
 
 void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index fdd84b60d3..2b7598e29e 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -568,6 +568,19 @@ clone_function_name (const char *name, const char *suffix,
   return get_identifier (tmp_name);
 }
 
+/* Return a new assembler name for a clone of DECL.  Apart from the
+   string SUFFIX, the new name will end with the specified NUMBER.  If
+   clone numbering is not needed then the two argument
+   clone_function_name should be used instead.  */
+
+tree
+clone_function_name (tree decl, const char *suffix,
+		     unsigned long number)
+{
+  return clone_function_name (
+	   IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)), suffix, number);
+}
+
 /* Return a new assembler name ending with the string SUFFIX for a
    clone of DECL.  */
 
@@ -595,8 +608,9 @@ clone_function_name (tree decl, const char *suffix)
 }
 
 
-/* Create callgraph node clone with new declaration.  The actual body will
-   be copied later at compilation stage.
+/* Create callgraph node clone with new declaration.  The actual body will be
+   copied later at compilation stage.  The name of the new clone will be
+   constructed from the name of the original node, SUFFIX and NUM_SUFFIX.
 
    TODO: after merging in ipa-sra use function call notes instead of args_to_skip
    bitmap interface.
@@ -604,7 +618,8 @@ clone_function_name (tree decl, const char *suffix)
 cgraph_node *
 cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
 				   vec<ipa_replace_map *, va_gc> *tree_map,
-				   bitmap args_to_skip, const char * suffix)
+				   bitmap args_to_skip, const char * suffix,
+				   unsigned num_suffix)
 {
   tree old_decl = decl;
   cgraph_node *new_node = NULL;
@@ -639,8 +654,8 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
   strcpy (name + len + 1, suffix);
   name[len] = '.';
   DECL_NAME (new_decl) = get_identifier (name);
-  SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name_numbered (old_decl,
-								   suffix));
+  SET_DECL_ASSEMBLER_NAME (new_decl,
+			   clone_function_name (old_decl, suffix, num_suffix));
   SET_DECL_RTL (new_decl, NULL);
 
   new_node = create_clone (new_decl, count, false,
diff --git gcc/ipa-cp.c gcc/ipa-cp.c
index 4471bae11c..e0cd1bc45b 100644
--- gcc/ipa-cp.c
+++ gcc/ipa-cp.c
@@ -376,6 +376,9 @@ static profile_count max_count;
 
 static long overall_size, max_new_size;
 
+/* Node to unique clone suffix number map.  */
+static hash_map<cgraph_node *, unsigned> *clone_num_suffixes;
+
 /* Return the param lattices structure corresponding to the Ith formal
    parameter of the function described by INFO.  */
 static inline struct ipcp_param_lattices *
@@ -3828,8 +3831,11 @@ create_specialized_node (struct cgraph_node *node,
 	}
     }
 
+  unsigned &suffix_counter = clone_num_suffixes->get_or_insert (node);
   new_node = node->create_virtual_clone (callers, replace_trees,
-					 args_to_skip, "constprop");
+					 args_to_skip, "constprop",
+					 suffix_counter);
+  suffix_counter++;
 
   bool have_self_recursive_calls = !self_recursive_calls.is_empty ();
   for (unsigned j = 0; j < self_recursive_calls.length (); j++)
@@ -5043,6 +5049,7 @@ ipcp_driver (void)
 
   ipa_check_create_node_params ();
   ipa_check_create_edge_args ();
+  clone_num_suffixes = new hash_map<cgraph_node *, unsigned>;
 
   if (dump_file)
     {
@@ -5064,6 +5071,7 @@ ipcp_driver (void)
   ipcp_store_vr_results ();
 
   /* Free all IPCP structures.  */
+  delete clone_num_suffixes;
   free_toporder_info (&topo);
   delete edge_clone_summaries;
   edge_clone_summaries = NULL;
diff --git gcc/ipa-hsa.c gcc/ipa-hsa.c
index 7c6ceaab8f..63b41eabc9 100644
--- gcc/ipa-hsa.c
+++ gcc/ipa-hsa.c
@@ -88,7 +88,7 @@ process_hsa_functions (void)
 	    continue;
 	  cgraph_node *clone
 	    = node->create_virtual_clone (vec <cgraph_edge *> (),
-					  NULL, NULL, "hsa");
+					  NULL, NULL, "hsa", 0);
 	  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
 	  clone->externally_visible = node->externally_visible;
 
@@ -109,7 +109,7 @@ process_hsa_functions (void)
 	    continue;
 	  cgraph_node *clone
 	    = node->create_virtual_clone (vec <cgraph_edge *> (),
-					  NULL, NULL, "hsa");
+					  NULL, NULL, "hsa", 0);
 	  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
 	  clone->externally_visible = node->externally_visible;
 
-- 
2.19.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.4: 0003-Minimize-clone-counter-memory-usage-in-LTO.patch --]
[-- Type: text/x-patch; name="0003-Minimize-clone-counter-memory-usage-in-LTO.patch", Size: 2525 bytes --]

From 15283936692d1f0a8ff72c3e3211bb72eb5d11f8 Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Mon, 26 Nov 2018 14:22:39 -0500
Subject: [PATCH 3/3] Minimize clone counter memory usage in LTO.

gcc/lto:

2018-11-30  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

	* lto-partition.c (privatize_symbol_name_1): Keep track of
	non-unique symbol counters in the lto_clone_numbers hash
	map.
	(lto_promote_cross_file_statics): Allocate and free the
	lto_clone_numbers hash map.
	(lto_promote_statics_nonwpa): Free the lto_clone_numbers hash
	map.
---
 gcc/lto/lto-partition.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c
index 24e7c23859..867f075b58 100644
--- gcc/lto/lto-partition.c
+++ gcc/lto/lto-partition.c
@@ -951,6 +951,9 @@ validize_symbol_for_target (symtab_node *node)
     }
 }
 
+/* Maps symbol names to unique lto clone counters.  */
+static hash_map<const char *, unsigned> *lto_clone_numbers;
+
 /* Helper for privatize_symbol_name.  Mangle NODE symbol name
    represented by DECL.  */
 
@@ -963,9 +966,11 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
     return false;
 
   name = maybe_rewrite_identifier (name);
+  unsigned &clone_number = lto_clone_numbers->get_or_insert (name);
   symtab->change_decl_assembler_name (decl,
-				      clone_function_name_numbered (
-					  name, "lto_priv"));
+				      clone_function_name (
+					  name, "lto_priv", clone_number));
+  clone_number++;
 
   if (node->lto_file_data)
     lto_record_renamed_decl (node->lto_file_data, name,
@@ -1157,6 +1162,8 @@ lto_promote_cross_file_statics (void)
       part->encoder = compute_ltrans_boundary (part->encoder);
     }
 
+  lto_clone_numbers = new hash_map<const char *, unsigned>;
+
   /* Look at boundaries and promote symbols as needed.  */
   for (i = 0; i < n_sets; i++)
     {
@@ -1187,6 +1194,7 @@ lto_promote_cross_file_statics (void)
           promote_symbol (node);
         }
     }
+  delete lto_clone_numbers;
 }
 
 /* Rename statics in the whole unit in the case that 
@@ -1196,9 +1204,12 @@ void
 lto_promote_statics_nonwpa (void)
 {
   symtab_node *node;
+
+  lto_clone_numbers = new hash_map<const char *, unsigned>;
   FOR_EACH_SYMBOL (node)
     {
       rename_statics (NULL, node);
       validize_symbol_for_target (node);
     }
+  delete lto_clone_numbers;
 }
-- 
2.19.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] Make function clone name numbering independent.
  2018-11-30 21:11                                     ` Michael Ploujnikov
@ 2018-12-01 16:31                                       ` H.J. Lu
  2018-12-03 17:00                                         ` Michael Ploujnikov
  0 siblings, 1 reply; 30+ messages in thread
From: H.J. Lu @ 2018-12-01 16:31 UTC (permalink / raw)
  To: michael.ploujnikov
  Cc: Richard Guenther, Martin Jambor, GCC Patches, Jan Hubicka,
	Segher Boessenkool

On Fri, Nov 30, 2018 at 1:11 PM Michael Ploujnikov
<michael.ploujnikov@oracle.com> wrote:
>
> Hi,
>
> On 2018-11-30 2:25 a.m., Richard Biener wrote:
> > +  unsigned &clone_number = lto_clone_numbers->get_or_insert (
> > +     IDENTIFIER_POINTER (DECL_NAME (decl)));
> >    name = maybe_rewrite_identifier (name);
> >    symtab->change_decl_assembler_name (decl,
> > -      clone_function_name_numbered (
> > -  name, "lto_priv"));
> > +      clone_function_name (
> > +  name, "lto_priv", clone_number));
> >
> > since we use 'name' from maybe_rewrite_identifier in the end wouldn't it
> > make more sense to use that as a key for lto_clone_numbers?
>
> Yup, that looks much better. Fixed it.
>
> > For the ipa-hsa.c changes since we iterate over all defined functions
> > we at most create a single clone per cgraph_node.  That means your
> > hash-map keyed on cgraph_node is useless and you could use
> > an unnumbered clone (or a static zero number).
>
> Good catch. I was thinking of doing this, but it somehow fell through the cracks during the rebase.
>
> >
> > -  SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name_numbered (old_decl,
> > -   suffix));
> > +  SET_DECL_ASSEMBLER_NAME (new_decl,
> > +   clone_function_name (
> > +     IDENTIFIER_POINTER (
> > +       DECL_ASSEMBLER_NAME (old_decl)),
> > +     suffix,
> > +     num_suffix));
> >
> > can you please hide the implementation detail of accessing the assembler name
> > by adding an overload to clone_function_name_numbered with an explicit
> > number?
>
> Done.
>
>
> >
> > OK with those changes.
> >
> > Thanks,
> > Richard.
>
> Thank you for the review. I will commit as soon as my last test run finishes.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88297

-- 
H.J.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] Make function clone name numbering independent.
  2018-12-01 16:31                                       ` H.J. Lu
@ 2018-12-03 17:00                                         ` Michael Ploujnikov
  2018-12-04  3:06                                           ` Michael Ploujnikov
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Ploujnikov @ 2018-12-03 17:00 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Guenther, Martin Jambor, GCC Patches, Jan Hubicka


[-- Attachment #1.1: Type: text/plain, Size: 603 bytes --]

On 2018-12-01 11:29 a.m., H.J. Lu wrote:
> This caused:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88297
> 

Sorry about that. Looks like I should have been testing with
--with-build-config=bootstrap-lto rather than just --enable-bootstrap.

The quick fix would be to undo the patch to create_virtual_clone or to
just change clone_num_suffixes to key off of DECL_ASSEMBLER_NAME
(node->decl) instead of node pointers. Any preferences?

The harder fix would be to figure out why some nodes share the same
names and fix that, but maybe that's just inevitable with LTO?

- Michael


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] Make function clone name numbering independent.
  2018-12-03 17:00                                         ` Michael Ploujnikov
@ 2018-12-04  3:06                                           ` Michael Ploujnikov
  2018-12-04 14:07                                             ` Jan Hubicka
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Ploujnikov @ 2018-12-04  3:06 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Guenther, Martin Jambor, GCC Patches, Jan Hubicka


[-- Attachment #1.1.1: Type: text/plain, Size: 841 bytes --]

On 2018-12-03 12:00 p.m., Michael Ploujnikov wrote:
> On 2018-12-01 11:29 a.m., H.J. Lu wrote:
>> This caused:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88297
>>
> 
> Sorry about that. Looks like I should have been testing with
> --with-build-config=bootstrap-lto rather than just --enable-bootstrap.
> 
> The quick fix would be to undo the patch to create_virtual_clone or to
> just change clone_num_suffixes to key off of DECL_ASSEMBLER_NAME
> (node->decl) instead of node pointers. Any preferences?
> 
> The harder fix would be to figure out why some nodes share the same
> names and fix that, but maybe that's just inevitable with LTO?
> 
> - Michael
> 

Here's a quick fix while the issue is being investigated.

Bootstrapped (--with-build-config=bootstrap-lto) and regtested on x86_64.

Ok for trunk?

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.2: 0001-PR-ipa-88297.patch --]
[-- Type: text/x-patch; name="0001-PR-ipa-88297.patch", Size: 1849 bytes --]

From f5e2500f30ad337e85e0b53eaa15c724657966a2 Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Mon, 3 Dec 2018 18:19:18 -0500
Subject: [PATCH] PR ipa/88297

gcc/ChangeLog:

2018-12-03  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

	PR ipa/88297
	* ipa-cp.c (create_specialized_node): Store clone counters by
	node assembler names.
	(ipcp_driver): Change type of clone_num_suffixes key type to
	const char*.
---
 gcc/ipa-cp.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git gcc/ipa-cp.c gcc/ipa-cp.c
index e0cd1bc45b..f82473e37c 100644
--- gcc/ipa-cp.c
+++ gcc/ipa-cp.c
@@ -376,8 +376,8 @@ static profile_count max_count;
 
 static long overall_size, max_new_size;
 
-/* Node to unique clone suffix number map.  */
-static hash_map<cgraph_node *, unsigned> *clone_num_suffixes;
+/* Node name to unique clone suffix number map.  */
+static hash_map<const char *, unsigned> *clone_num_suffixes;
 
 /* Return the param lattices structure corresponding to the Ith formal
    parameter of the function described by INFO.  */
@@ -3831,7 +3831,9 @@ create_specialized_node (struct cgraph_node *node,
 	}
     }
 
-  unsigned &suffix_counter = clone_num_suffixes->get_or_insert (node);
+  unsigned &suffix_counter = clone_num_suffixes->get_or_insert (
+			       IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (
+				 node->decl)));
   new_node = node->create_virtual_clone (callers, replace_trees,
 					 args_to_skip, "constprop",
 					 suffix_counter);
@@ -5049,7 +5051,7 @@ ipcp_driver (void)
 
   ipa_check_create_node_params ();
   ipa_check_create_edge_args ();
-  clone_num_suffixes = new hash_map<cgraph_node *, unsigned>;
+  clone_num_suffixes = new hash_map<const char *, unsigned>;
 
   if (dump_file)
     {
-- 
2.19.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PING v2][PATCH] Make function clone name numbering independent.
  2018-09-05  3:32                           ` [PING v2][PATCH] " Michael Ploujnikov
@ 2018-12-04 12:48                             ` Martin Jambor
  2018-12-04 13:22                               ` Michael Ploujnikov
  0 siblings, 1 reply; 30+ messages in thread
From: Martin Jambor @ 2018-12-04 12:48 UTC (permalink / raw)
  To: Michael Ploujnikov, gcc-patches

Hi,

On Tue, Sep 04 2018, Michael Ploujnikov wrote:
>
> I've tried building with numbered_clone_function_name replaced by
> suffixed_function_name and with --enable-offload-targets=hsa and
> didn't see any errors in gomp.exp. I don't have a readily available
> HSA setup so if you could do a quick test, I would really appreciate
> it!

Sorry it took so long, I have run regular tests on trunk today and your
patch does not appear to have caused any issues.

Martin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PING v2][PATCH] Make function clone name numbering independent.
  2018-12-04 12:48                             ` Martin Jambor
@ 2018-12-04 13:22                               ` Michael Ploujnikov
  0 siblings, 0 replies; 30+ messages in thread
From: Michael Ploujnikov @ 2018-12-04 13:22 UTC (permalink / raw)
  To: Martin Jambor, gcc-patches


[-- Attachment #1.1: Type: text/plain, Size: 595 bytes --]

On 2018-12-04 7:48 a.m., Martin Jambor wrote:
> Hi,
> 
> On Tue, Sep 04 2018, Michael Ploujnikov wrote:
>>
>> I've tried building with numbered_clone_function_name replaced by
>> suffixed_function_name and with --enable-offload-targets=hsa and
>> didn't see any errors in gomp.exp. I don't have a readily available
>> HSA setup so if you could do a quick test, I would really appreciate
>> it!
> 
> Sorry it took so long, I have run regular tests on trunk today and your
> patch does not appear to have caused any issues.
> 
> Martin
> 

Thank you for checking.

- Michael


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] Make function clone name numbering independent.
  2018-12-04  3:06                                           ` Michael Ploujnikov
@ 2018-12-04 14:07                                             ` Jan Hubicka
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Hubicka @ 2018-12-04 14:07 UTC (permalink / raw)
  To: Michael Ploujnikov; +Cc: H.J. Lu, Richard Guenther, Martin Jambor, GCC Patches

> On 2018-12-03 12:00 p.m., Michael Ploujnikov wrote:
> > On 2018-12-01 11:29 a.m., H.J. Lu wrote:
> >> This caused:
> >>
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88297
> >>
> > 
> > Sorry about that. Looks like I should have been testing with
> > --with-build-config=bootstrap-lto rather than just --enable-bootstrap.
> > 
> > The quick fix would be to undo the patch to create_virtual_clone or to
> > just change clone_num_suffixes to key off of DECL_ASSEMBLER_NAME
> > (node->decl) instead of node pointers. Any preferences?
> > 
> > The harder fix would be to figure out why some nodes share the same
> > names and fix that, but maybe that's just inevitable with LTO?
> > 
> > - Michael
> > 
> 
> Here's a quick fix while the issue is being investigated.
> 
> Bootstrapped (--with-build-config=bootstrap-lto) and regtested on x86_64.
> 
> Ok for trunk?

> From f5e2500f30ad337e85e0b53eaa15c724657966a2 Mon Sep 17 00:00:00 2001
> From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
> Date: Mon, 3 Dec 2018 18:19:18 -0500
> Subject: [PATCH] PR ipa/88297
> 
> gcc/ChangeLog:
> 
> 2018-12-03  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
> 
> 	PR ipa/88297
> 	* ipa-cp.c (create_specialized_node): Store clone counters by
> 	node assembler names.
> 	(ipcp_driver): Change type of clone_num_suffixes key type to
> 	const char*.

OK,
thanks!
Honza

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2018-12-04 14:07 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-16 19:38 [PATCH] Make function clone name numbering independent Michael Ploujnikov
2018-07-17  6:10 ` Bernhard Reutner-Fischer
2018-07-17 10:03   ` Richard Biener
2018-07-17 20:25     ` Michael Ploujnikov
2018-07-20  2:49       ` Michael Ploujnikov
2018-07-20 10:05         ` Richard Biener
2018-07-24 13:57           ` Michael Ploujnikov
2018-07-26 17:27             ` Michael Ploujnikov
2018-07-31 17:40               ` Michael Ploujnikov
2018-08-01 10:38                 ` Richard Biener
2018-08-02 19:05                   ` Michael Ploujnikov
2018-08-13 23:58                     ` [PING][PATCH] " Michael Ploujnikov
2018-08-20 19:47                       ` Jeff Law
2018-08-31 18:49                       ` [PING v2][PATCH] " Michael Ploujnikov
2018-09-03 10:02                         ` Martin Jambor
2018-09-03 11:08                           ` Richard Biener
2018-09-03 13:15                             ` Martin Jambor
2018-09-05  3:24                               ` Michael Ploujnikov
2018-11-28 21:09                                 ` [PATCH v3] " Michael Ploujnikov
2018-11-28 22:49                                   ` Segher Boessenkool
2018-11-29 14:13                                     ` Michael Ploujnikov
2018-11-30  7:26                                   ` Richard Biener
2018-11-30 21:11                                     ` Michael Ploujnikov
2018-12-01 16:31                                       ` H.J. Lu
2018-12-03 17:00                                         ` Michael Ploujnikov
2018-12-04  3:06                                           ` Michael Ploujnikov
2018-12-04 14:07                                             ` Jan Hubicka
2018-09-05  3:32                           ` [PING v2][PATCH] " Michael Ploujnikov
2018-12-04 12:48                             ` Martin Jambor
2018-12-04 13:22                               ` Michael Ploujnikov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).