public inbox for gdb-prs@sourceware.org
help / color / mirror / Atom feed
* [Bug gdb/29702] New: Huge slowdown just after loading thread DWARF data when hitting breakpoints (gdb in MCUXpresso / Eclipse)
@ 2022-10-19 15:15 nickpelling at nickpelling dot com
  2022-10-21 16:53 ` [Bug gdb/29702] " tromey at sourceware dot org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: nickpelling at nickpelling dot com @ 2022-10-19 15:15 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29702

            Bug ID: 29702
           Summary: Huge slowdown just after loading thread DWARF data
                    when hitting breakpoints (gdb in MCUXpresso / Eclipse)
           Product: gdb
           Version: 10.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: gdb
          Assignee: unassigned at sourceware dot org
          Reporter: nickpelling at nickpelling dot com
  Target Milestone: ---

Hi everyone,

As our (non-LTO, Debug) Cortex-M7 executable has got larger, so the length of
time that MCUXpresso's gdb takes to hit the first breakpoint (near the end of
main()) has steadily increased.

I first noticed this with GNU gdb (GNU Tools for Arm Embedded Processors
8-2019-q3-update) 8.3.0.20190703-git, but hoped it would go away when we
upgraded to GNU gdb (GNU Arm Embedded Toolchain 10.3-2021.10)
10.2.90.20210621-git (though this too suffered from the same issue, alas).

With 16 threads, I'm seeing a 30-second delay. And with 25 threads, I'm seeing
a 60-second delay. (Note: the breakpoint is after the threads have been first
created, but before the FreeRTOS scheduler has actually called any of them.)

By enabling loads of gdb trace debug flags, I believe I have isolated the
slowdown to the gdb code just after it has parsed the DWARF data for each
thread/task.

From the trace, each thread causes gdb to parse the associated DIE, then parse
the directories & files, and then process blocks of lines. After the final
block of lines, there's a pause: though this pause is initially very short, it
gets longer (up to 10 seconds) as more threads are loaded.

For example, here's the gdb trace log for the last few lines of our ptpd thread
startup code in startup_mcu.c:

359,753 &"Processing actual line 191: file 1, address 0x6006089e, is_stmt 1,
discrim 0\n"
359,754 &"Recording line 191, file startup_mcu.c, address 0x6006089e\n"
359,754 &"Processing actual line 195: file 1, address 0x600608b8, is_stmt 1,
discrim 0\n"
359,754 &"Recording line 195, file startup_mcu.c, address 0x600608b8\n"
359,754 &"Processing actual line 195: file 1, address 0x600608cc, is_stmt 1,
discrim 0\t(end sequence)\n"
359,754 &"Finishing current line, file startup_mcu.c, address 0x600608cc\n"
359,754 &"Recording line 0, file startup_mcu.c, address 0x600608cc\n"
369,835 &"Done expanding CU at offset 0x556b6\n"
369,835 &"Done expanding symtabs of C:\\[File Path]\\Platform_FreeRTOS.axf.\n"

These trace messages were triggered using:
    set debug dwarf-read 10
    set debug dwarf-line 10

The C function this is describing is unremarkable (comments removed):

187: void ptpd_init(void)
188: {
189:    bool bMaster;
190:    bMaster = HOSTPLATFORM_TimeSync_ShouldBePTPMaster();
191:    xTaskCreate(ptpd_thread, "ptpd_task", PTPD_STACKSIZE_BYTES / 4, (void
*)bMaster, PTPD_PRIORITY, &xCreatedTask);
192:
193:    // Comment
194:    // Comment
195:}
196:
(end of file)

And yet looking at the timestamps, you can see a whopping 10-second gap between
the final "Recording line 0, file startup_mcu.c,..." line and the immediately
following "Done expanding CU at offset 0x556b6\n" line.

Any ideas what could be causing this super-annoying slowdown?

I believe that "Recording line 0..." is in dwarf_decode_lines() -->
dwarf_decode_lines_1() --> lnp_state_machine::record_line -->
dwarf_finish_line() --> dwarf_record_line_1().

Also: are there any additional gdb trace flags I can enable to get more
information about what it's doing between the end of the dwarf decoding and
"Done expanding CU at offset..."?

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug gdb/29702] Huge slowdown just after loading thread DWARF data when hitting breakpoints (gdb in MCUXpresso / Eclipse)
  2022-10-19 15:15 [Bug gdb/29702] New: Huge slowdown just after loading thread DWARF data when hitting breakpoints (gdb in MCUXpresso / Eclipse) nickpelling at nickpelling dot com
@ 2022-10-21 16:53 ` tromey at sourceware dot org
  2022-10-21 17:12 ` simark at simark dot ca
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: tromey at sourceware dot org @ 2022-10-21 16:53 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29702

Tom Tromey <tromey at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tromey at sourceware dot org

--- Comment #1 from Tom Tromey <tromey at sourceware dot org> ---
(In reply to Nick Pelling from comment #0)

> I first noticed this with GNU gdb (GNU Tools for Arm Embedded Processors
> 8-2019-q3-update) 8.3.0.20190703-git, but hoped it would go away when we
> upgraded to GNU gdb (GNU Arm Embedded Toolchain 10.3-2021.10)
> 10.2.90.20210621-git (though this too suffered from the same issue, alas).

Might be worth trying an even newer version.

> With 16 threads, I'm seeing a 30-second delay. And with 25 threads, I'm
> seeing a 60-second delay. (Note: the breakpoint is after the threads have
> been first created, but before the FreeRTOS scheduler has actually called
> any of them.)

It seems strange to me that the number of threads has an impact like this.

> By enabling loads of gdb trace debug flags, I believe I have isolated the
> slowdown to the gdb code just after it has parsed the DWARF data for each
> thread/task.

> For example, here's the gdb trace log for the last few lines of our ptpd
> thread startup code in startup_mcu.c:

This doesn't show the timestamp / output for the "Expanding one or more
symtabs"
message that I'd expect to see.

> 359,754 &"Recording line 0, file startup_mcu.c, address 0x600608cc\n"
> 369,835 &"Done expanding CU at offset 0x556b6\n"
> 369,835 &"Done expanding symtabs of C:\\[File
> Path]\\Platform_FreeRTOS.axf.\n"

> The C function this is describing is unremarkable (comments removed):
> 
[...]
> 
> And yet looking at the timestamps, you can see a whopping 10-second gap
> between the final "Recording line 0, file startup_mcu.c,..." line and the
> immediately following "Done expanding CU at offset 0x556b6\n" line.
> 
> Any ideas what could be causing this super-annoying slowdown?

gdb's DWARF reader has two phases.  In the first phase it reads "partial
symbols"
(this step is reimplemented for GDB 13 and the names are all different, but the
idea is the same).

When gdb needs full information about some object, e.g. when you set a
breakpoint
or when it attaches and tries to find out where the current PC is located, then
it will expand these partial symbols to full symbols.

To do this, it reads the DWARF for the entire compilation unit and constructs
symbols for everything there.

It's possible that if your compilation units are very large, then this will be
slow.
We'd like to fix this (see bug #29398) but it is not easy.

If your DWARF has intra-CU references then bumping the max cache age can help.
See bug #25703.  This may only affect LTO and/or dwz though.

The last time I saw a very bad slowdown was with -g3 and it turned out to be
a bad interaction between GCC and mold.  I don't know if that could be your
situation.

> Also: are there any additional gdb trace flags I can enable to get more
> information about what it's doing between the end of the dwarf decoding and
> "Done expanding CU at offset..."?

Not really, I think.  Though of course you can always add instrumentation,
or run it under a profiler.

It would be interesting to learn more about your situation.  For example if it
is a pathological CU expansion of some kind, is there a bug in the reader that
can be fixed (like if there's N^2 behavior somewhere or the like)?

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug gdb/29702] Huge slowdown just after loading thread DWARF data when hitting breakpoints (gdb in MCUXpresso / Eclipse)
  2022-10-19 15:15 [Bug gdb/29702] New: Huge slowdown just after loading thread DWARF data when hitting breakpoints (gdb in MCUXpresso / Eclipse) nickpelling at nickpelling dot com
  2022-10-21 16:53 ` [Bug gdb/29702] " tromey at sourceware dot org
@ 2022-10-21 17:12 ` simark at simark dot ca
  2022-10-24 16:50 ` nickpelling at nickpelling dot com
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: simark at simark dot ca @ 2022-10-21 17:12 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29702

Simon Marchi <simark at simark dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |simark at simark dot ca

--- Comment #2 from Simon Marchi <simark at simark dot ca> ---
To make the problem easier to investigate, you can try to manually trigger the
expansion of symtabs using the "maintenance expand-symtabs" command.  You don't
need to be connected to your target to do this.  For instance (running gdb on
gdb):

(top-gdb) with debug timestamp 1 -- with debug dwarf-read 1 -- maintenance
expand-symtabs finish
1110674.433796 [dwarf-read] process_queue: Expanding one or more symtabs of
objfile /home/simark/build/binutils-gdb/gdb/gdb ...
1110674.433821 [dwarf-read] process_queue: Expanding symtab of CU at offset
0x47aa3b3
1110674.489215 [dwarf-read] process_queue: Done expanding CU at offset
0x47aa3b3
1110674.489236 [dwarf-read] process_queue: Done expanding symtabs of
/home/simark/build/binutils-gdb/gdb/gdb.

If you can reproduce the delay this way, it will be much easier to investigate.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug gdb/29702] Huge slowdown just after loading thread DWARF data when hitting breakpoints (gdb in MCUXpresso / Eclipse)
  2022-10-19 15:15 [Bug gdb/29702] New: Huge slowdown just after loading thread DWARF data when hitting breakpoints (gdb in MCUXpresso / Eclipse) nickpelling at nickpelling dot com
  2022-10-21 16:53 ` [Bug gdb/29702] " tromey at sourceware dot org
  2022-10-21 17:12 ` simark at simark dot ca
@ 2022-10-24 16:50 ` nickpelling at nickpelling dot com
  2022-10-24 17:07 ` simon.marchi at polymtl dot ca
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: nickpelling at nickpelling dot com @ 2022-10-24 16:50 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29702

--- Comment #3 from Nick Pelling <nickpelling at nickpelling dot com> ---
Thanks very much Tom & Simon, I now have some additional debug info to share.

I tried using "maint set dwarf max-cache-age 0" but this made no noticeable
difference (alas). I also tried "maintenance expand-symtabs", along with "set
debug symfile on" and "set debug symbol-lookup 10", but these yielded no extra
information.

I did get some more information using "set debug symtab-create", e.g.

set debug symtab-create 10
set debug dwarf-read 10
set debug dwarf-line 10

This showed that the unexpected long pause for a given thread is happening
after the final "Created symtab" message and before "Done expanding CU at
offset":

794,918 &"Created symtab 0x213880c8 for module C:\\[File
Path]\\repos\\platformmaster\\FreeRTOSLib\\lwip\\src\\include/lwip/prot/udp.h.\n"
794,918 &"Created symtab 0x213880e8 for module
../../FreeRTOSLib/libcoap/include/coap2/utlist.h.\n"
800,521 &"Done expanding CU at offset 0x14dffb\n"

The final printf here is in process_queues(), so the code seems to be inside
process_full_comp_unit(), i.e. processing the CU's sequence of dwarf tokens.

Similarly, it looks as though the code is exiting dwarf_decode_lines() before
the slowdown is happening: which seems to be right at the end of the code
parsing the DW_AT_stmt_list token.

Using "set debug dwarf-die 10" causes the dwarf tokens in the die to be printed
out (a little further up the gdb trace). The final two dwarf tokens in each of
the modules appear to be (for example):

772,637 &"    DW_AT_stmt_list (DW_FORM_sec_offset) section offset: 206677\n"
772,637 &"    DW_AT_GNU_macros (DW_FORM_sec_offset) section offset: 289465\n"

Hence I'm now wondering if there might actually be something malformed in the
data being emitted by gcc following the DW_AT_GNU_macros dwarf token that is
causing the dwarf_decode_macros() code to get subtly borked or confused.

Unfortunately, there doesn't seem to be (unless I'm missing something obvious)
a single line of debug trace in all the gdb macro-processing code, so I'm not
getting a lot of trace assistance from that. :-(

Are there any known issues to do with gcc and parsing the DW_AT_GNU_macros
token section that might be triggering an issue here?

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug gdb/29702] Huge slowdown just after loading thread DWARF data when hitting breakpoints (gdb in MCUXpresso / Eclipse)
  2022-10-19 15:15 [Bug gdb/29702] New: Huge slowdown just after loading thread DWARF data when hitting breakpoints (gdb in MCUXpresso / Eclipse) nickpelling at nickpelling dot com
                   ` (2 preceding siblings ...)
  2022-10-24 16:50 ` nickpelling at nickpelling dot com
@ 2022-10-24 17:07 ` simon.marchi at polymtl dot ca
  2022-10-25 15:27 ` nickpelling at nickpelling dot com
  2022-10-25 16:16 ` tromey at sourceware dot org
  5 siblings, 0 replies; 7+ messages in thread
From: simon.marchi at polymtl dot ca @ 2022-10-24 17:07 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29702

Simon Marchi <simon.marchi at polymtl dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |simon.marchi at polymtl dot ca

--- Comment #4 from Simon Marchi <simon.marchi at polymtl dot ca> ---
(In reply to Nick Pelling from comment #3)
> Thanks very much Tom & Simon, I now have some additional debug info to share.
> 
> I tried using "maint set dwarf max-cache-age 0" but this made no noticeable
> difference (alas). I also tried "maintenance expand-symtabs", along with
> "set debug symfile on" and "set debug symbol-lookup 10", but these yielded
> no extra information.
> 
> I did get some more information using "set debug symtab-create", e.g.
> 
> set debug symtab-create 10
> set debug dwarf-read 10
> set debug dwarf-line 10
> 
> This showed that the unexpected long pause for a given thread is happening
> after the final "Created symtab" message and before "Done expanding CU at
> offset":
> 
> 794,918 &"Created symtab 0x213880c8 for module C:\\[File
> Path]\\repos\\platformmaster\\FreeRTOSLib\\lwip\\src\\include/lwip/prot/udp.
> h.\n"
> 794,918 &"Created symtab 0x213880e8 for module
> ../../FreeRTOSLib/libcoap/include/coap2/utlist.h.\n"
> 800,521 &"Done expanding CU at offset 0x14dffb\n"
> 
> The final printf here is in process_queues(), so the code seems to be inside
> process_full_comp_unit(), i.e. processing the CU's sequence of dwarf tokens.
> 
> Similarly, it looks as though the code is exiting dwarf_decode_lines()
> before the slowdown is happening: which seems to be right at the end of the
> code parsing the DW_AT_stmt_list token.
> 
> Using "set debug dwarf-die 10" causes the dwarf tokens in the die to be
> printed out (a little further up the gdb trace). The final two dwarf tokens
> in each of the modules appear to be (for example):
> 
> 772,637 &"    DW_AT_stmt_list (DW_FORM_sec_offset) section offset: 206677\n"
> 772,637 &"    DW_AT_GNU_macros (DW_FORM_sec_offset) section offset: 289465\n"
> 
> Hence I'm now wondering if there might actually be something malformed in
> the data being emitted by gcc following the DW_AT_GNU_macros dwarf token
> that is causing the dwarf_decode_macros() code to get subtly borked or
> confused.
> 
> Unfortunately, there doesn't seem to be (unless I'm missing something
> obvious) a single line of debug trace in all the gdb macro-processing code,
> so I'm not getting a lot of trace assistance from that. :-(
> 
> Are there any known issues to do with gcc and parsing the DW_AT_GNU_macros
> token section that might be triggering an issue here?

Maybe this: https://sourceware.org/bugzilla/show_bug.cgi?id=27754

This is what Tom was referring to when he said:

> The last time I saw a very bad slowdown was with -g3 and it turned out to be
a bad interaction between GCC and mold.  I don't know if that could be your
situation.

Basically, bad interaction between gcc and the linker would cause the macro
information to have a bunch of `DW_MACRO_import` with offset 0, when they were
actually supposed to reference some other offset.

You can check your file with `readelf --debug-dump=macro`, go to offset 0x46ab9
(I got this from your logs, it's 289465 decimal).  If there are a bunch of
`DW_MACRO_import 0x0`, it's likely this problem.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug gdb/29702] Huge slowdown just after loading thread DWARF data when hitting breakpoints (gdb in MCUXpresso / Eclipse)
  2022-10-19 15:15 [Bug gdb/29702] New: Huge slowdown just after loading thread DWARF data when hitting breakpoints (gdb in MCUXpresso / Eclipse) nickpelling at nickpelling dot com
                   ` (3 preceding siblings ...)
  2022-10-24 17:07 ` simon.marchi at polymtl dot ca
@ 2022-10-25 15:27 ` nickpelling at nickpelling dot com
  2022-10-25 16:16 ` tromey at sourceware dot org
  5 siblings, 0 replies; 7+ messages in thread
From: nickpelling at nickpelling dot com @ 2022-10-25 15:27 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29702

--- Comment #5 from Nick Pelling <nickpelling at nickpelling dot com> ---
Hi Simon,

Yes, readelf does indeed reveal numerous suspicious-looking zero-offset macros
in our file (in fact, 16257 of them), e.g.:

 DW_MACRO_import - offset : 0x0
 DW_MACRO_start_file - lineno: 16 filenum: 3 filename:
c:\nxp\mcuxpressoide_11.1.0_3209\ide\tools\lib\gcc\arm-none-eabi\8.3.1\include/stddef.h
 DW_MACRO_import - offset : 0x0
 DW_MACRO_end_file

I'm reasonably certain that this will turn out to be the smoking gun here, so
thanks very much for your help! My next step is to try to get hold of an
updated gcc linker for the Cortex-M7 and see if that fixes the issue. I'll
update this ticket when I have more information.

Cheers, Nick

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug gdb/29702] Huge slowdown just after loading thread DWARF data when hitting breakpoints (gdb in MCUXpresso / Eclipse)
  2022-10-19 15:15 [Bug gdb/29702] New: Huge slowdown just after loading thread DWARF data when hitting breakpoints (gdb in MCUXpresso / Eclipse) nickpelling at nickpelling dot com
                   ` (4 preceding siblings ...)
  2022-10-25 15:27 ` nickpelling at nickpelling dot com
@ 2022-10-25 16:16 ` tromey at sourceware dot org
  5 siblings, 0 replies; 7+ messages in thread
From: tromey at sourceware dot org @ 2022-10-25 16:16 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29702

Tom Tromey <tromey at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #6 from Tom Tromey <tromey at sourceware dot org> ---
I'm marking this as a dup of the other one.

*** This bug has been marked as a duplicate of bug 27754 ***

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-10-25 16:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-19 15:15 [Bug gdb/29702] New: Huge slowdown just after loading thread DWARF data when hitting breakpoints (gdb in MCUXpresso / Eclipse) nickpelling at nickpelling dot com
2022-10-21 16:53 ` [Bug gdb/29702] " tromey at sourceware dot org
2022-10-21 17:12 ` simark at simark dot ca
2022-10-24 16:50 ` nickpelling at nickpelling dot com
2022-10-24 17:07 ` simon.marchi at polymtl dot ca
2022-10-25 15:27 ` nickpelling at nickpelling dot com
2022-10-25 16:16 ` tromey at sourceware dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).