public inbox for debugedit@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] Optimize do_read_32_relocated using binary search
@ 2024-04-17  6:34 Nikita Popov
  2024-04-17 14:49 ` Mark Wielaard
  0 siblings, 1 reply; 3+ messages in thread
From: Nikita Popov @ 2024-04-17  6:34 UTC (permalink / raw)
  To: debugedit

[-- Attachment #1: Type: text/plain, Size: 1586 bytes --]

debugedit is currently very slow when processing DWARF 5 debuginfo
produced by clang. For some kernel modules, debugedit processing
takes hours.

The root cause of the issue is the loop for finding the correct
REL entry in do_read_32_relocated. This is currently a simple
linear scan. For large objects, it may loop for hundreds of
thousands of iterations.

As the relocations are sorted, we can use a binary search instead,
which is what this patch implements. The time to run debugedit on
a large kernel module (nouveau.ko) drops down to 3 seconds with
this change.

Signed-off-by: Nikita Popov <npopov@redhat.com>
---
 tools/debugedit.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/tools/debugedit.c b/tools/debugedit.c
index f16eecd..d678673 100644
--- a/tools/debugedit.c
+++ b/tools/debugedit.c
@@ -335,12 +335,27 @@ strptr (DSO *dso, size_t sec, size_t offset)
 REL *relptr, *relend;
 int reltype;

+static inline REL *find_rel_for_ptr(unsigned char *xptr)
+{
+  size_t l = 0, r = relend - relptr;
+  while (l < r)
+  {
+    size_t m = (l + r) / 2;
+    if (relptr[m].ptr < xptr)
+      l = m + 1;
+    else if (relptr[m].ptr > xptr)
+      r = m;
+    else
+      return &relptr[m];
+  }
+  return relend;
+}
+
 #define do_read_32_relocated(xptr) ({ \
   uint32_t dret = do_read_32 (xptr); \
   if (relptr) \
     { \
-      while (relptr < relend && relptr->ptr < (xptr)) \
- ++relptr; \
+      relptr = find_rel_for_ptr((xptr)); \
       if (relptr < relend && relptr->ptr == (xptr)) \
  { \
   if (reltype == SHT_REL) \
-- 
2.44.0

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Optimize do_read_32_relocated using binary search
  2024-04-17  6:34 [PATCH] Optimize do_read_32_relocated using binary search Nikita Popov
@ 2024-04-17 14:49 ` Mark Wielaard
  2024-05-15 13:38   ` Mark Wielaard
  0 siblings, 1 reply; 3+ messages in thread
From: Mark Wielaard @ 2024-04-17 14:49 UTC (permalink / raw)
  To: Nikita Popov, debugedit

Hi Nikita,

On Wed, 2024-04-17 at 15:34 +0900, Nikita Popov wrote:
> debugedit is currently very slow when processing DWARF 5 debuginfo
> produced by clang. For some kernel modules, debugedit processing
> takes hours.
> 
> The root cause of the issue is the loop for finding the correct
> REL entry in do_read_32_relocated. This is currently a simple
> linear scan. For large objects, it may loop for hundreds of
> thousands of iterations.
> 
> As the relocations are sorted, we can use a binary search instead,
> which is what this patch implements. The time to run debugedit on
> a large kernel module (nouveau.ko) drops down to 3 seconds with
> this change.

Very nice. I missed that in the original commit that added support for
strx. Commit 3e7aeeab4f744ad15108775685db68d3a35b0735 does explain a
bit, but I didn't realize that this change made things do a linear
search over and over again...

  The tricky part is the ET_REL (object files or kernel modules)
  support. Relocation reading is "global" per section and we expect to
  read a relocation only once. But we need to read the
  DW_AT_str_offsets_base before reading any strx form attributes. So we
  read that first, then reset the relptr. And when we read from the
  .debug_str_offsets section we need to save and restore the
  .debug_info relptr.


> Signed-off-by: Nikita Popov <npopov@redhat.com>
> ---
>  tools/debugedit.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/debugedit.c b/tools/debugedit.c
> index f16eecd..d678673 100644
> --- a/tools/debugedit.c
> +++ b/tools/debugedit.c
> @@ -335,12 +335,27 @@ strptr (DSO *dso, size_t sec, size_t offset)
>  REL *relptr, *relend;
>  int reltype;
>  
> +static inline REL *find_rel_for_ptr(unsigned char *xptr)
> +{
> +  size_t l = 0, r = relend - relptr;
> +  while (l < r)
> +  {
> +    size_t m = (l + r) / 2;
> +    if (relptr[m].ptr < xptr)
> +      l = m + 1;
> +    else if (relptr[m].ptr > xptr)
> +      r = m;
> +    else
> +      return &relptr[m];
> +  }
> +  return relend;
> +}
> +
>  #define do_read_32_relocated(xptr) ({ \
>    uint32_t dret = do_read_32 (xptr); \
>    if (relptr) \
>      { \
> -      while (relptr < relend && relptr->ptr < (xptr)) \
> - ++relptr; \
> +      relptr = find_rel_for_ptr((xptr)); \
>        if (relptr < relend && relptr->ptr == (xptr)) \
>   { \
>    if (reltype == SHT_REL) \

I think this might make the code slightly slower for the case where
there are no .debug_str_offsets since all other sections are parsed
linearly. But that might not really be such a problem. Since this is
only for the .ko cases.

But I now see a few errors in the testsuite (before and after your
patch, so it is a pre-existing failure) when configuring with CC=clang
CXX=clang++ for clang18, but also for clang17 with CC="clang -gdwarf-5"
CXX="clang++ -gdwarf-5".

  1: debugedit help                                  ok
  2: debugedit usage                                 ok
  3: debugedit executable                            ok
  4: debugedit .debug_str objects DWARF4             ok
  5: debugedit .debug_str/line_str objects DWARF5    ok
  6: debugedit .debug_str partial DWARF4             ok
  7: debugedit .debug_str/line_str partial DWARF5    ok
  8: debugedit .debug_str exe DWARF4                 ok
  9: debugedit .debug_str/line_str exe DWARF5        ok
 10: debugedit .debug_info objects                   FAILED
(debugedit.at:307)
 11: debugedit .debug_info partial                   FAILED
(debugedit.at:330)
 12: debugedit .debug_info exe                       ok
 13: debugedit .debug_types objects                  skipped
(debugedit.at:367)
 14: debugedit .debug_types partial                  skipped
(debugedit.at:405)
 15: debugedit .debug_types exe                      skipped
(debugedit.at:435)
 16: debugedit .debug_line objects DWARF4            ok
 17: debugedit .debug_line objects DWARF5            ok
 18: debugedit .debug_line partial DWARF4            ok
 19: debugedit .debug_line partial DWARF5            ok
 20: debugedit .debug_line exe DWARF4                ok
 21: debugedit .debug_line exe DWARF5                ok
 22: debugedit .debug_macro objects                  FAILED
(debugedit.at:619)
 23: debugedit .debug_macro partial                  FAILED
(debugedit.at:644)
 24: debugedit .debug_macro exe                      FAILED
(debugedit.at:667)
 25: debugedit --list-file DWARF4                    ok
 26: debugedit --list-file DWARF5                    ok


--- expout      2024-04-17 16:45:58.790260019 +0200
+++ /home/mark/src/debugedit/tests/testsuite.dir/at-groups/10/stdout  
2024-04-
17 16:45:58.814260130 +0200
@@ -1,3 +1,2 @@
 /foo/bar/baz
-/foo/bar/baz/baz.c
 /foo/bar/baz/subdir_bar
10. debugedit.at:294: 10. debugedit .debug_info objects
(debugedit.at:294): FAILED (debugedit.at:307)

#                             -*- compilation -*-
11. debugedit.at:319: testing debugedit .debug_info partial ...
./debugedit.at:329: debugedit -b $(pwd) -d /foo/bar/baz
./foobarbaz.part.o
./debugedit.at:330:
$READELF --debug-dump=info ./foobarbaz.part.o \
        | grep -E 'DW_AT_(name|comp_dir)' \
        | rev | cut -d: -f1 | rev | cut -c2- | grep ^/foo/bar/baz |
sort -u

--- /dev/null   2024-04-15 17:25:16.496344492 +0200
+++ /home/mark/src/debugedit/tests/testsuite.dir/at-groups/11/stderr  
2024-04-17 16:45:58.997260972 +0200
@@ -0,0 +1 @@
+readelf: ./foobarbaz.part.o: Warning: indirect offset too big: 0x1a9
11. debugedit.at:319: 11. debugedit .debug_info partial
(debugedit.at:319): FAILED (debugedit.at:330)

#                             -*- compilation -*-
22. debugedit.at:608: testing debugedit .debug_macro objects ...
./debugedit.at:619: debugedit -b $(pwd) -d /foo/bar/baz ./foo.o
--- /dev/null   2024-04-15 17:25:16.496344492 +0200
+++ /home/mark/src/debugedit/tests/testsuite.dir/at-groups/22/stderr  
2024-04-17 16:46:00.516267966 +0200
@@ -0,0 +1 @@
+debugedit: Unhandled DW_MACRO op 0xb
./debugedit.at:619: exit code was 1, expected 0
22. debugedit.at:608: 22. debugedit .debug_macro objects
(debugedit.at:608): FAILED (debugedit.at:619)

#                             -*- compilation -*-
23. debugedit.at:633: testing debugedit .debug_macro partial ...
./debugedit.at:644: debugedit -b $(pwd) -d /foo/bar/baz
./foobarbaz.part.o
--- /dev/null   2024-04-15 17:25:16.496344492 +0200
+++ /home/mark/src/debugedit/tests/testsuite.dir/at-groups/23/stderr  
2024-04-17 16:46:00.631268496 +0200
@@ -0,0 +1 @@
+debugedit: Unhandled DW_MACRO op 0xb
./debugedit.at:644: exit code was 1, expected 0
23. debugedit.at:633: 23. debugedit .debug_macro partial
(debugedit.at:633): FAILED (debugedit.at:644)

#                             -*- compilation -*-
24. debugedit.at:656: testing debugedit .debug_macro exe ...
./debugedit.at:667: debugedit -b $(pwd) -d /foo/bar/baz ./foobarbaz.exe
--- /dev/null   2024-04-15 17:25:16.496344492 +0200
+++ /home/mark/src/debugedit/tests/testsuite.dir/at-groups/24/stderr  
2024-04-17 16:46:00.746269025 +0200
@@ -0,0 +1 @@
+debugedit: Unhandled DW_MACRO op 0xb
./debugedit.at:667: exit code was 1, expected 0
24. debugedit.at:656: 24. debugedit .debug_macro exe
(debugedit.at:656): FAILED (debugedit.at:667)

I'll look into that and try to add CI to explicitly test against clang.

Thanks,

Mark

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Optimize do_read_32_relocated using binary search
  2024-04-17 14:49 ` Mark Wielaard
@ 2024-05-15 13:38   ` Mark Wielaard
  0 siblings, 0 replies; 3+ messages in thread
From: Mark Wielaard @ 2024-05-15 13:38 UTC (permalink / raw)
  To: Nikita Popov, debugedit

[-- Attachment #1: Type: text/plain, Size: 182 bytes --]

Hi,

I finally pushed this (already tested in Fedora rawhide).
I have some followup patches to handle relocations and strx better.
But this patch is correct.

Cheers,

Mark

[-- Attachment #2: 0001-Optimize-do_read_32_relocated-using-binary-search.patch --]
[-- Type: text/x-patch, Size: 1843 bytes --]

From fbad879eb03f7ecd58a9919865c84b422a473b37 Mon Sep 17 00:00:00 2001
From: Nikita Popov <npopov@redhat.com>
Date: Wed, 17 Apr 2024 15:34:19 +0900
Subject: [PATCH] Optimize do_read_32_relocated using binary search

debugedit is currently very slow when processing DWARF 5 debuginfo
produced by clang. For some kernel modules, debugedit processing
takes hours.

The root cause of the issue is the loop for finding the correct
REL entry in do_read_32_relocated. This is currently a simple
linear scan. For large objects, it may loop for hundreds of
thousands of iterations.

As the relocations are sorted, we can use a binary search instead,
which is what this patch implements. The time to run debugedit on
a large kernel module (nouveau.ko) drops down to 3 seconds with
this change.

Signed-off-by: Nikita Popov <npopov@redhat.com>
---
 tools/debugedit.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/tools/debugedit.c b/tools/debugedit.c
index f16eecd89a61..ae8e38fa58da 100644
--- a/tools/debugedit.c
+++ b/tools/debugedit.c
@@ -335,12 +335,28 @@ strptr (DSO *dso, size_t sec, size_t offset)
 REL *relptr, *relend;
 int reltype;
 
+static inline REL *
+find_rel_for_ptr (unsigned char *xptr)
+{
+  size_t l = 0, r = relend - relptr;
+  while (l < r)
+    {
+      size_t m = (l + r) / 2;
+      if (relptr[m].ptr < xptr)
+	l = m + 1;
+      else if (relptr[m].ptr > xptr)
+	r = m;
+      else
+	return &relptr[m];
+    }
+  return relend;
+}
+
 #define do_read_32_relocated(xptr) ({			\
   uint32_t dret = do_read_32 (xptr);			\
   if (relptr)						\
     {							\
-      while (relptr < relend && relptr->ptr < (xptr))	\
-	++relptr;					\
+      relptr = find_rel_for_ptr (xptr);			\
       if (relptr < relend && relptr->ptr == (xptr))	\
 	{						\
 	  if (reltype == SHT_REL)			\
-- 
2.45.0


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-05-15 13:38 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-17  6:34 [PATCH] Optimize do_read_32_relocated using binary search Nikita Popov
2024-04-17 14:49 ` Mark Wielaard
2024-05-15 13:38   ` Mark Wielaard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).