public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] Optimize reading ELF objects with many group sections
@ 2018-01-08  9:54 Jens Widell
  2018-01-08 10:23 ` Nick Clifton
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Widell @ 2018-01-08  9:54 UTC (permalink / raw)
  To: binutils; +Cc: Jens Widell

When processing a section that is a member of a group, the group
that contains it is looked up using a linear search. The resulting
O(n^2) complexity causes significant performance issues when
dealing with object files with very many groups.

By remembering the index of the last found group and restarting
the next search from that index, the search instead becomes O(n)
in common cases.

Signed-off-by: Jens Widell <jl@opera.com>
---
Background: This turned out to be a significant issue when building
Chromium with debug fission, which means clang calls objcopy twice at
the end of a compilation to extract debug information into a separate
file, and unity building, which combines many source files into a
single compilation unit to reduce overall build time, thus procuding
some very large object files with many sections.

 bfd/elf-bfd.h |  1 +
 bfd/elf.c     | 15 ++++++++++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/bfd/elf-bfd.h b/bfd/elf-bfd.h
index 9c9b0c7..07879e5 100644
--- a/bfd/elf-bfd.h
+++ b/bfd/elf-bfd.h
@@ -1908,6 +1908,7 @@ struct elf_obj_tdata
 
   Elf_Internal_Shdr **group_sect_ptr;
   int num_group;
+  unsigned int group_search_offset;
 
   unsigned int symtab_section, dynsymtab_section;
   unsigned int dynversym_section, dynverdef_section, dynverref_section;
diff --git a/bfd/elf.c b/bfd/elf.c
index 9f44ff9..d7d084e 100644
--- a/bfd/elf.c
+++ b/bfd/elf.c
@@ -737,10 +737,17 @@ setup_group (bfd *abfd, Elf_Internal_Shdr *hdr, asection *newsect)
 
   if (num_group != (unsigned) -1)
     {
-      unsigned int i;
+      unsigned int search_offset = elf_tdata (abfd)->group_search_offset;
+      unsigned int j;
 
-      for (i = 0; i < num_group; i++)
+      for (j = 0; j < num_group; j++)
 	{
+	  /* Begin search from previous found group. */
+	  unsigned i = j + search_offset;
+
+	  if (i >= num_group)
+	    i -= num_group;
+
 	  Elf_Internal_Shdr *shdr = elf_tdata (abfd)->group_sect_ptr[i];
 	  Elf_Internal_Group *idx;
 	  bfd_size_type n_elt;
@@ -803,7 +810,9 @@ setup_group (bfd *abfd, Elf_Internal_Shdr *hdr, asection *newsect)
 		if (shdr->bfd_section != NULL)
 		  elf_next_in_group (shdr->bfd_section) = newsect;
 
-		i = num_group - 1;
+		elf_tdata (abfd)->group_search_offset = i;
+
+		j = num_group - 1;
 		break;
 	      }
 	}
-- 
2.7.4

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Optimize reading ELF objects with many group sections
  2018-01-08  9:54 [PATCH] Optimize reading ELF objects with many group sections Jens Widell
@ 2018-01-08 10:23 ` Nick Clifton
  2018-01-08 13:50   ` Jens Widell
  0 siblings, 1 reply; 6+ messages in thread
From: Nick Clifton @ 2018-01-08 10:23 UTC (permalink / raw)
  To: Jens Widell, binutils

Hi Jens,

  A couple of comments on the patch:

  * It would be helpful if you could include a changelog 
    entry along with the patch itself.

  * Please can you tell us how you tested the patch, and if
    you noticed any regressions.  (I do not expect that there
    were any, but you never know).  In particular a patch like
    this that affects generic code needs to be tested with a 
    wide range of targets to make sure that it does not break
    anything.

  * Also...

> +++ b/bfd/elf-bfd.h
> @@ -1908,6 +1908,7 @@ struct elf_obj_tdata
>  
>    Elf_Internal_Shdr **group_sect_ptr;
>    int num_group;
> +  unsigned int group_search_offset;

  I think that it would be helpful to include a comment by 
  this field explaining what it is for.


> +	  /* Begin search from previous found group. */
> +	  unsigned i = j + search_offset;
> +
> +	  if (i >= num_group)
> +	    i -= num_group;

  I found this to be a little bit confusing when I was reading 
  the patch for the first time.  Maybe this would be clearer ?

          unsigned int i = (j + search_offset) % num_group;

  Of course since the intent is to optimise this loop for speed 
  you may object to the presence of the modulo operator, so I 
  leave the choice up to you.

Cheers
  Nick


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Optimize reading ELF objects with many group sections
  2018-01-08 10:23 ` Nick Clifton
@ 2018-01-08 13:50   ` Jens Widell
  2018-01-08 14:40     ` Nick Clifton
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Widell @ 2018-01-08 13:50 UTC (permalink / raw)
  To: Nick Clifton; +Cc: binutils

On Mon, Jan 8, 2018 at 11:23 AM, Nick Clifton <nickc@redhat.com> wrote:
> Hi Jens,
>
>   A couple of comments on the patch:
>
>   * It would be helpful if you could include a changelog
>     entry along with the patch itself.

Will do that in the next revision.

>   * Please can you tell us how you tested the patch, and if
>     you noticed any regressions.  (I do not expect that there
>     were any, but you never know).  In particular a patch like
>     this that affects generic code needs to be tested with a
>     wide range of targets to make sure that it does not break
>     anything.

As I was investigating a performance issue, my manual testing has been
somewhat focused on performance. I've run `make check` (on a 64-bit
Linux) with and without my patch, and the output was identical.

Some pointers to other useful testing I should do would be helpful.
First-time contributor here, so I'm not exactly familiar with the
process.

>   * Also...
>
>> +++ b/bfd/elf-bfd.h
>> @@ -1908,6 +1908,7 @@ struct elf_obj_tdata
>>
>>    Elf_Internal_Shdr **group_sect_ptr;
>>    int num_group;
>> +  unsigned int group_search_offset;
>
>   I think that it would be helpful to include a comment by
>   this field explaining what it is for.

Will add.

>> +       /* Begin search from previous found group. */
>> +       unsigned i = j + search_offset;
>> +
>> +       if (i >= num_group)
>> +         i -= num_group;
>
>   I found this to be a little bit confusing when I was reading
>   the patch for the first time.  Maybe this would be clearer ?
>
>           unsigned int i = (j + search_offset) % num_group;
>
>   Of course since the intent is to optimise this loop for speed
>   you may object to the presence of the modulo operator, so I
>   leave the choice up to you.

That happens to be exactly what I had in the first iteration of my patch. :-)

IIRC my next profiling run showed that division instruction as
"significant", and thought someone might comment on it in the context
of an optimization patch, so changed to the current code. But overall,
I like the code with a modulo operator better, and I doubt the
difference in performance, if there even was one, matters in practice.
I'll change back.

--
Jens

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Optimize reading ELF objects with many group sections
  2018-01-08 13:50   ` Jens Widell
@ 2018-01-08 14:40     ` Nick Clifton
  2018-01-12  9:39       ` [PATCH v2] " Jens Widell
  0 siblings, 1 reply; 6+ messages in thread
From: Nick Clifton @ 2018-01-08 14:40 UTC (permalink / raw)
  To: Jens Widell; +Cc: binutils

Hi Jens,

> Some pointers to other useful testing I should do would be helpful.
> First-time contributor here, so I'm not exactly familiar with the
> process.

No worries - thank you for taking the time to pursue contributing this patch.

As for testing I would recommend building and testing a 32-bit target, in
order to make sure that there are no problems relating to bit sizes.  Eg:

  % <path-to-binutils-sources>/configure --quiet CFLAGS="-m32"
  % make all-gas all-ld all-binutils
  % make check

Another useful test is with a toolchain configured for all targets as this
can often throw up issues with other architectures that do not show up with
your primary architecture:

  % <path-to-binutils-source>/configure --enable-64-bit-bfd --enable-targets=all
  % make all-gas all-ld all-binutils
  % make check

Note - I am not actually expecting you to find any issues with the patch,
it looks perfectly fine to me.  But it is good practice to always test,
even when you think that something looks innocuous.


>>           unsigned int i = (j + search_offset) % num_group;
>>
>>   Of course since the intent is to optimise this loop for speed
>>   you may object to the presence of the modulo operator, so I
>>   leave the choice up to you.
> 
> That happens to be exactly what I had in the first iteration of my patch. :-)

Great minds ... :-)

> IIRC my next profiling run showed that division instruction as
> "significant", and thought someone might comment on it in the context
> of an optimization patch, so changed to the current code. But overall,
> I like the code with a modulo operator better, and I doubt the
> difference in performance, if there even was one, matters in practice.
> I'll change back.

Fair enough.  I would hope that a sufficiently clever compiler would
be able to replace the modulo operator with a subtraction anyway, so
performance should not suffer.

Cheers
  Nick

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2] Optimize reading ELF objects with many group sections
  2018-01-08 14:40     ` Nick Clifton
@ 2018-01-12  9:39       ` Jens Widell
  2018-01-12 13:18         ` Nick Clifton
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Widell @ 2018-01-12  9:39 UTC (permalink / raw)
  To: binutils, Nick Clifton; +Cc: Jens Widell

When processing a section that is a member of a group, the group
that contains it is looked up using a linear search. The resulting
O(n^2) complexity causes significant performance issues when
dealing with object files with very many groups.

By remembering the index of the last found group and restarting
the next search from that index, the search instead becomes O(n)
in common cases.

Signed-off-by: Jens Widell <jl@opera.com>
---
Changes from previous version:
 - added changelog entry
 - added comment in elf-bfd.h
 - use modulo operator for index calculation

Tests run with default build on 64-bit Linux, with CFLAGS=-m32 and
with "./configure --enable-64-bit-bfd --enable-targets=all".

 bfd/ChangeLog |  5 +++++
 bfd/elf-bfd.h |  4 ++++
 bfd/elf.c     | 12 +++++++++---
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/bfd/ChangeLog b/bfd/ChangeLog
index 7803ef8..44a22c1 100644
--- a/bfd/ChangeLog
+++ b/bfd/ChangeLog
@@ -1,3 +1,8 @@
+2018-01-08  Jens Widell  <jl@opera.com>
+
+	* elf.c (setup_group): Optimize search for group by remembering
+	last found group and restarting search at that index.
+
 2018-01-03  John Baldwin  <jhb@FreeBSD.org>
 
 	* elf.c (elfcore_grok_freebsd_note): Handle
diff --git a/bfd/elf-bfd.h b/bfd/elf-bfd.h
index 9c9b0c7..3136b22 100644
--- a/bfd/elf-bfd.h
+++ b/bfd/elf-bfd.h
@@ -1909,6 +1909,10 @@ struct elf_obj_tdata
   Elf_Internal_Shdr **group_sect_ptr;
   int num_group;
 
+  /* Index into group_sect_ptr, updated by setup_group when finding a
+     section's group.  Used to optimize subsequent group searches. */
+  unsigned int group_search_offset;
+
   unsigned int symtab_section, dynsymtab_section;
   unsigned int dynversym_section, dynverdef_section, dynverref_section;
 
diff --git a/bfd/elf.c b/bfd/elf.c
index 9f44ff9..2088e4b 100644
--- a/bfd/elf.c
+++ b/bfd/elf.c
@@ -737,10 +737,14 @@ setup_group (bfd *abfd, Elf_Internal_Shdr *hdr, asection *newsect)
 
   if (num_group != (unsigned) -1)
     {
-      unsigned int i;
+      unsigned int search_offset = elf_tdata (abfd)->group_search_offset;
+      unsigned int j;
 
-      for (i = 0; i < num_group; i++)
+      for (j = 0; j < num_group; j++)
 	{
+	  /* Begin search from previous found group. */
+	  unsigned i = (j + search_offset) % num_group;
+
 	  Elf_Internal_Shdr *shdr = elf_tdata (abfd)->group_sect_ptr[i];
 	  Elf_Internal_Group *idx;
 	  bfd_size_type n_elt;
@@ -803,7 +807,9 @@ setup_group (bfd *abfd, Elf_Internal_Shdr *hdr, asection *newsect)
 		if (shdr->bfd_section != NULL)
 		  elf_next_in_group (shdr->bfd_section) = newsect;
 
-		i = num_group - 1;
+		elf_tdata (abfd)->group_search_offset = i;
+
+		j = num_group - 1;
 		break;
 	      }
 	}
-- 
2.7.4

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] Optimize reading ELF objects with many group sections
  2018-01-12  9:39       ` [PATCH v2] " Jens Widell
@ 2018-01-12 13:18         ` Nick Clifton
  0 siblings, 0 replies; 6+ messages in thread
From: Nick Clifton @ 2018-01-12 13:18 UTC (permalink / raw)
  To: Jens Widell, binutils

Hi Jens,

  Thanks for updating the patch.  I have now applied it, with one small change:

> +2018-01-08  Jens Widell  <jl@opera.com>
> +
> +	* elf.c (setup_group): Optimize search for group by remembering
> +	last found group and restarting search at that index.

I added a line about the patch to elf-bfd.h to the ChangeLog entry.

Cheers
  Nick

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-01-12 13:18 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-08  9:54 [PATCH] Optimize reading ELF objects with many group sections Jens Widell
2018-01-08 10:23 ` Nick Clifton
2018-01-08 13:50   ` Jens Widell
2018-01-08 14:40     ` Nick Clifton
2018-01-12  9:39       ` [PATCH v2] " Jens Widell
2018-01-12 13:18         ` Nick Clifton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).