public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [Patch, objcopy] Fix objcopy quadratic-time when handling --redefine-syms
@ 2017-04-06  8:43 Jiong Wang
  2017-04-06  8:51 ` Pedro Alves
  2017-04-06  9:18 ` Alan Modra
  0 siblings, 2 replies; 4+ messages in thread
From: Jiong Wang @ 2017-04-06  8:43 UTC (permalink / raw)
  To: Binutils

[-- Attachment #1: Type: text/plain, Size: 2598 bytes --]

When handling --redefine-syms, objcopy is using a list.  For each redefine pair,
objcopy needs to traverse the list and do strcmp against each node on the list,
this caused quadratic-time.  When there is bulk-renaming in large collection of
c++ object files, it is quite slow.

This patch use hashtab for redefine node.  Customized "eq" and "hash" hooks are
added as we want to use redefine_node->source as the key to generate hash and do
the equality comparison.
  
For 10000 symbols, it reduce the time from >1s to < 0.05s.
For 100000 symbols, reduce the time from >2m to <0.5s.

The time increase looks like is linear after this patch.

gen-redef.sh
===
#!/bin/bash
for VARIABLE in {1..10000}
do
     echo "veryveryverylongsourcesymbolfortestingveryveryverylongsourcesymbolfortestingveryveryverylongsourcesymbolfortesting$VARIABLE muchmuchmuchveryveryverylonglongsourcesymbolfortestingmuchmuchmuchveryveryverylonglongsourcesymbolfortestingmuchmuchmuchveryveryverylonglongsourcesymbolfortesting$VARIABLE" >> redefine.txt
done

echo "main muchmuchmuchveryveryverylonglongsourcesymbolfortestingmuchmuchmuchveryveryverylonglongsourcesymbolfortestingmuchmuchmuchveryveryverylonglongsourcesymbolfortesting_mainrename" >> redefine.txt
===

objcopy --redefine-syms=redefine.txt hello.o hello-out.o
(hello.o is generated from simple helloworld).

The time reduced from
   real    0m1.264s
   user    0m1.260s
   sys     0m0.004s
to:
   real    0m0.044s
   user    0m0.044s
   sys     0m0.000s


There is no regression on check-binutils after this patch, although I found
there is actually no check on redefine symbol feature in binutils testsuite.
So I also tested this patch on several manually written testcases which were
trying to rename some symbols, they works fine.

Ok for master?


binutils/
2017-04-05  Jiong Wang  <jiong.wang@arm.com>

         * objcopy.c (struct redefine_node): Delete the field "next".
         (redefine_sym_list): Deleted.
         (redefine_specific_htab): New hash table.
         (redefine_specific_reverse_htab): Likewise.
         (eq_string_redefnode): New function.
         (htab_hash_redefnode): Likewise.
         (create_symbol2redef_htab): Likewise.
         (add_specific_symbol_node): Likewise.
         (create_symbol_htabs): Create redefine_specific_htab and
         redefine_specific_reverse_htab.
         (lookup_sym_redefinition): Use hash table instead of list.
         (redefine_list_append): Likewise, and rename to add_redefine_and_check.
         (copy_main): Use redefine_specific_htab instead of redefine_sym_list.
         Update comments.


[-- Attachment #2: fix-rename.patch --]
[-- Type: text/x-diff, Size: 7103 bytes --]

diff --git a/binutils/objcopy.c b/binutils/objcopy.c
index 4af4d92..715e5fa 100644
--- a/binutils/objcopy.c
+++ b/binutils/objcopy.c
@@ -54,12 +54,11 @@ struct is_specified_symbol_predicate_data
   bfd_boolean	found;
 };
 
-/* A list to support redefine_sym.  */
+/* A node includes symbol name mapping to support redefine_sym.  */
 struct redefine_node
 {
   char *source;
   char *target;
-  struct redefine_node *next;
 };
 
 struct addsym_node
@@ -249,7 +248,8 @@ static htab_t localize_specific_htab = NULL;
 static htab_t globalize_specific_htab = NULL;
 static htab_t keepglobal_specific_htab = NULL;
 static htab_t weaken_specific_htab = NULL;
-static struct redefine_node *redefine_sym_list = NULL;
+static htab_t redefine_specific_htab = NULL;
+static htab_t redefine_specific_reverse_htab = NULL;
 static struct addsym_node *add_sym_list = NULL, **add_sym_tail = &add_sym_list;
 static int add_symbols = 0;
 
@@ -947,6 +947,34 @@ find_section_list (const char *name, bfd_boolean add, unsigned int context)
   return p;
 }
 
+/* S1 is the entry node already in the table, S2 is the key node.  */
+
+static int
+eq_string_redefnode (const void *s1, const void *s2)
+{
+  struct redefine_node *node1 = (struct redefine_node *) s1;
+  struct redefine_node *node2 = (struct redefine_node *) s2;
+  return !strcmp ((const char *) node1->source, (const char *) node2->source);
+}
+
+/* P is redefine node.  Hash value is generated from its "source" filed.  */
+
+static hashval_t
+htab_hash_redefnode (const PTR p)
+{
+  struct redefine_node *redefnode = (struct redefine_node *) p;
+  return htab_hash_string (redefnode->source);
+}
+
+/* Create hashtab used for redefine node.  */
+
+static htab_t
+create_symbol2redef_htab (void)
+{
+  return htab_create_alloc (16, htab_hash_redefnode, eq_string_redefnode, NULL,
+			    xcalloc, free);
+}
+
 /* There is htab_hash_string but no htab_eq_string. Makes sense.  */
 
 static int
@@ -971,6 +999,10 @@ create_symbol_htabs (void)
   globalize_specific_htab = create_symbol_htab ();
   keepglobal_specific_htab = create_symbol_htab ();
   weaken_specific_htab = create_symbol_htab ();
+  redefine_specific_htab = create_symbol2redef_htab ();
+  /* As there is no bidirectional hash table in libiberty, need a reverse table
+     to check duplicated target string.  */
+  redefine_specific_reverse_htab = create_symbol_htab ();
 }
 
 /* Add a symbol to strip_specific_list.  */
@@ -981,6 +1013,14 @@ add_specific_symbol (const char *name, htab_t htab)
   *htab_find_slot (htab, name, INSERT) = (char *) name;
 }
 
+/* Like add_specific_symbol, but the element type is PTR.  */
+
+static void
+add_specific_symbol_node (const PTR node, htab_t htab)
+{
+  *htab_find_slot (htab, node, INSERT) = (PTR) node;
+}
+
 /* Add symbols listed in `filename' to strip_specific_list.  */
 
 #define IS_WHITESPACE(c)      ((c) == ' ' || (c) == '\t')
@@ -1439,7 +1479,7 @@ filter_symbols (bfd *abfd, bfd *obfd, asymbol **osyms,
 	    to[dst_count++] = create_new_symbol (ptr, obfd);
 	}
 
-      if (redefine_sym_list || section_rename_list)
+      if (htab_elements (redefine_specific_htab) || section_rename_list)
 	{
 	  char *new_name;
 
@@ -1625,42 +1665,40 @@ filter_symbols (bfd *abfd, bfd *obfd, asymbol **osyms,
 static const char *
 lookup_sym_redefinition (const char *source)
 {
-  struct redefine_node *list;
-
-  for (list = redefine_sym_list; list != NULL; list = list->next)
-    if (strcmp (source, list->source) == 0)
-      return list->target;
+  struct redefine_node key_node = {(char *) source, NULL};
+  struct redefine_node *redef_node
+    = (struct redefine_node *) htab_find (redefine_specific_htab, &key_node);
 
-  return source;
+  return redef_node == NULL ? source : redef_node->target;
 }
 
-/* Add a node to a symbol redefine list.  */
+/* Insert a node into symbol redefine hash tabel.  */
 
 static void
-redefine_list_append (const char *cause, const char *source, const char *target)
+add_redefine_and_check (const char *cause, const char *source,
+			const char *target)
 {
-  struct redefine_node **p;
-  struct redefine_node *list;
-  struct redefine_node *new_node;
+  struct redefine_node *new_node
+    = (struct redefine_node *) xmalloc (sizeof (struct redefine_node));
 
-  for (p = &redefine_sym_list; (list = *p) != NULL; p = &list->next)
-    {
-      if (strcmp (source, list->source) == 0)
-	fatal (_("%s: Multiple redefinition of symbol \"%s\""),
-	       cause, source);
+  new_node->source = strdup (source);
+  new_node->target = strdup (target);
 
-      if (strcmp (target, list->target) == 0)
-	fatal (_("%s: Symbol \"%s\" is target of more than one redefinition"),
-	       cause, target);
-    }
+  if (htab_find (redefine_specific_htab, new_node) != HTAB_EMPTY_ENTRY)
+    fatal (_("%s: Multiple redefinition of symbol \"%s\""),
+	   cause, source);
 
-  new_node = (struct redefine_node *) xmalloc (sizeof (struct redefine_node));
+  if (htab_find (redefine_specific_reverse_htab, target) != HTAB_EMPTY_ENTRY)
+    fatal (_("%s: Symbol \"%s\" is target of more than one redefinition"),
+	   cause, target);
 
-  new_node->source = strdup (source);
-  new_node->target = strdup (target);
-  new_node->next = NULL;
+  /* Insert the NEW_NODE into hash table for quick search.  */
+  add_specific_symbol_node (new_node, redefine_specific_htab);
+
+  /* Insert the target string into the reverse hash table, this is needed for
+     duplicated target string check.  */
+  add_specific_symbol (new_node->target, redefine_specific_reverse_htab);
 
-  *p = new_node;
 }
 
 /* Handle the --redefine-syms option.  Read lines containing "old new"
@@ -1745,7 +1783,7 @@ add_redefine_syms_file (const char *filename)
 	end_of_line:
 	  /* Append the redefinition to the list.  */
 	  if (buf[0] != '\0')
-	    redefine_list_append (filename, &buf[0], &buf[outsym_off]);
+	    add_redefine_and_check (filename, &buf[0], &buf[outsym_off]);
 
 	  lineno++;
 	  len = 0;
@@ -2695,13 +2733,13 @@ copy_object (bfd *ibfd, bfd *obfd, const bfd_arch_info_type *input_arch)
       || htab_elements (globalize_specific_htab) != 0
       || htab_elements (keepglobal_specific_htab) != 0
       || htab_elements (weaken_specific_htab) != 0
+      || htab_elements (redefine_specific_htab) != 0
       || prefix_symbols_string
       || sections_removed
       || sections_copied
       || convert_debugging
       || change_leading_char
       || remove_leading_char
-      || redefine_sym_list
       || section_rename_list
       || weaken
       || add_symbols)
@@ -4750,7 +4788,7 @@ copy_main (int argc, char *argv[])
 
 	case OPTION_REDEFINE_SYM:
 	  {
-	    /* Push this redefinition onto redefine_symbol_list.  */
+	    /* Insert this redefinition onto redefine_specific_htab.  */
 
 	    int len;
 	    const char *s;
@@ -4771,7 +4809,7 @@ copy_main (int argc, char *argv[])
 	    target = (char *) xmalloc (len + 1);
 	    strcpy (target, nextarg);
 
-	    redefine_list_append ("--redefine-sym", source, target);
+	    add_redefine_and_check ("--redefine-sym", source, target);
 
 	    free (source);
 	    free (target);

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Patch, objcopy] Fix objcopy quadratic-time when handling --redefine-syms
  2017-04-06  8:43 [Patch, objcopy] Fix objcopy quadratic-time when handling --redefine-syms Jiong Wang
@ 2017-04-06  8:51 ` Pedro Alves
  2017-04-06  8:58   ` Jiong Wang
  2017-04-06  9:18 ` Alan Modra
  1 sibling, 1 reply; 4+ messages in thread
From: Pedro Alves @ 2017-04-06  8:51 UTC (permalink / raw)
  To: Jiong Wang, Binutils

On 04/06/2017 09:43 AM, Jiong Wang wrote:
> 
> 
> There is no regression on check-binutils after this patch, although I found
> there is actually no check on redefine symbol feature in binutils testsuite.
> 
> So I also tested this patch on several manually written testcases which were
> 
> trying to rename some symbols, they works fine.

Could those manually written tests be converted to real testsuite tests?

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Patch, objcopy] Fix objcopy quadratic-time when handling --redefine-syms
  2017-04-06  8:51 ` Pedro Alves
@ 2017-04-06  8:58   ` Jiong Wang
  0 siblings, 0 replies; 4+ messages in thread
From: Jiong Wang @ 2017-04-06  8:58 UTC (permalink / raw)
  To: Pedro Alves, Binutils

On 06/04/17 09:51, Pedro Alves wrote:
> On 04/06/2017 09:43 AM, Jiong Wang wrote:
>>
>> There is no regression on check-binutils after this patch, although I found
>> there is actually no check on redefine symbol feature in binutils testsuite.
>>
>> So I also tested this patch on several manually written testcases which were
>>
>> trying to rename some symbols, they works fine.
> Could those manually written tests be converted to real testsuite tests?

They could.

I was using the automatically generated redefine file listed in this email, then inserted some
valid renaming pair in some random places, and was checking the symbols are really renamed in the
final output file.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Patch, objcopy] Fix objcopy quadratic-time when handling --redefine-syms
  2017-04-06  8:43 [Patch, objcopy] Fix objcopy quadratic-time when handling --redefine-syms Jiong Wang
  2017-04-06  8:51 ` Pedro Alves
@ 2017-04-06  9:18 ` Alan Modra
  1 sibling, 0 replies; 4+ messages in thread
From: Alan Modra @ 2017-04-06  9:18 UTC (permalink / raw)
  To: Jiong Wang; +Cc: Binutils

On Thu, Apr 06, 2017 at 09:43:15AM +0100, Jiong Wang wrote:
>         * objcopy.c (struct redefine_node): Delete the field "next".
>         (redefine_sym_list): Deleted.
>         (redefine_specific_htab): New hash table.
>         (redefine_specific_reverse_htab): Likewise.
>         (eq_string_redefnode): New function.
>         (htab_hash_redefnode): Likewise.
>         (create_symbol2redef_htab): Likewise.
>         (add_specific_symbol_node): Likewise.
>         (create_symbol_htabs): Create redefine_specific_htab and
>         redefine_specific_reverse_htab.
>         (lookup_sym_redefinition): Use hash table instead of list.
>         (redefine_list_append): Likewise, and rename to add_redefine_and_check.
>         (copy_main): Use redefine_specific_htab instead of redefine_sym_list.
>         Update comments.

OK, except

> +static hashval_t
> +htab_hash_redefnode (const PTR p)

please replace PTR with void *,

> +/* Like add_specific_symbol, but the element type is PTR.  */
> +
> +static void
> +add_specific_symbol_node (const PTR node, htab_t htab)
> +{
> +  *htab_find_slot (htab, node, INSERT) = (PTR) node;

and here.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-04-06  9:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-06  8:43 [Patch, objcopy] Fix objcopy quadratic-time when handling --redefine-syms Jiong Wang
2017-04-06  8:51 ` Pedro Alves
2017-04-06  8:58   ` Jiong Wang
2017-04-06  9:18 ` Alan Modra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).