* GDB 13.2: breakpoint at wrong line after unrelated change
From: Paul Smith @ 2024-03-11 18:28 UTC
To: gdb

Hi all;

I have an extremely odd error and I'm wondering if it rings any bells
with anyone.  If not I'll embark on an effort of upgrading my tools to
see if it's fixed in newer versions and if not trying to file a bug.

I have a C++ unit test program.  This is GNU/Linux 64bit compiled with
GCC 12.3 and I'm using GDB 13.2 to debug it.  The error happens
regardless of whether I compile with "-ggdb3 -O0" or with "-ggdb3 -O2".
I haven't tried other optimization levels.

In the current behavior, I can set a breakpoint at a function and GDB
will stop at the first line of the function; for example:

  class TestClass : public ... {
    ...
    void breakpointTest(TestData* data)
    {
        printf("obj = %p\n", data);
    }
    ...

If I run:

  (gdb) br TestClass::breakpointTest

  (gdb) run

then GDB will stop at the printf line, and the "data" variable is set
properly:

  Thread 1 "TestClass" hit Breakpoint 1, TestClass::breakpointTest
  (this=0x7ffff2a09a00, data=0x7ffff2aa7000) at TestClass.cpp:100
  100         printf("obj = %p\n", data);

Now if I make a change in my program in a completely different,
irrelevant spot (this change creates a new templated function that uses
Args... and perfect forwarding etc.: it's complex and uses the fmt
library, but it is not being used at all in this function, or even in
this class although it's used in a superclass), then after I do exactly
the same thing as above, GDB will stop at the wrong location.  Instead
of stopping at line 100 at the first line of the function it stops
"before" the function is entered and the function arguments are not set
yet (in the example below note the values of "this" and "data" are
wrong).

I have noticed that if I only include the templated function definition
but don't call it, then the problem doesn't happen.  I have to use the
templated function somewhere in the translation unit, but it doesn't
have to be anywhere near the function.

In the failure case if I use "n" to go to the next line, THEN I get to
the first line in the function and everything is set properly.

Example:

  Thread 1 "TestClass" hit Breakpoint 1, TestClass::breakpointTest
  (this=0x0, data=0x21) at TestClass.cpp:98
  98      void breakpointTest(TestData* data)

  (gdb) n
  100         printf("obj = %p\n", data);

  (gdb) fr
  #0  TestClass::breakpointTest (this=0x7ffff2a09a00, data=0x7ffff2aa7000)
      at TestClass.cpp:100
  100         printf("obj = %p\n", data);

Is anyone aware of any issue in GDB, or GCC, where using templated
functions with perfect forwarding or other complex C++ template
features could cause GDB's understanding of the starting line number of
functions to be miscalculated like this?

Is there a way for me to investigate what information GDB is looking at
to determine where to set breakpoints when given a symbol name like
this?  Is this the same info available via addr2line etc.?

* Re: GDB 13.2: breakpoint at wrong line after unrelated change
From: Simon Marchi @ 2024-03-11 19:14 UTC
To: psmith, gdb

On 3/11/24 14:28, Paul Smith via Gdb wrote:
> Hi all;
>
> I have an extremely odd error and I'm wondering if it rings any bells
> with anyone.  If not I'll embark on an effort of upgrading my tools to
> see if it's fixed in newer versions and if not trying to file a bug.
>
> I have a C++ unit test program.  This is GNU/Linux 64bit compiled with
> GCC 12.3 and I'm using GDB 13.2 to debug it.  The error happens
> regardless of whether I compile with "-ggdb3 -O0" or with "-ggdb3 -O2".
> I haven't tried other optimization levels.
>
> In the current behavior, I can set a breakpoint at a function and GDB
> will stop at the first line of the function; for example:
>
>   class TestClass : public ... {
>     ...
>     void breakpointTest(TestData* data)
>     {
>         printf("obj = %p\n", data);
>     }
>     ...
>
> If I run:
>
>   (gdb) br TestClass::breakpointTest
>
>   (gdb) run
>
> then GDB will stop at the printf line, and the "data" variable is set
> properly:
>
>   Thread 1 "TestClass" hit Breakpoint 1, TestClass::breakpointTest
>   (this=0x7ffff2a09a00, data=0x7ffff2aa7000) at TestClass.cpp:100
>   100         printf("obj = %p\n", data);
>
> Now if I make a change in my program in a completely different,
> irrelevant spot (this change creates a new templated function that uses
> Args... and perfect forwarding etc.: it's complex and uses the fmt
> library, but it is not being used at all in this function, or even in
> this class although it's used in a superclass), then after I do exactly
> the same thing as above, GDB will stop at the wrong location.  Instead
> of stopping at line 100 at the first line of the function it stops
> "before" the function is entered and the function arguments are not set
> yet (in the example below note the values of "this" and "data" are
> wrong).
>
> I have noticed that if I only include the templated function definition
> but don't call it, then the problem doesn't happen.

When you say that, does it mean that you just define the templated
function, or do you manually instantiate it?  In other words, does it
cause any code to be generated?

> I have to use the
> templated function somewhere in the translation unit, but it doesn't
> have to be anywhere near the function.
>
> In the failure case if I use "n" to go to the next line, THEN I get to
> the first line in the function and everything is set properly.
>
> Example:
>
>   Thread 1 "TestClass" hit Breakpoint 1, TestClass::breakpointTest
>   (this=0x0, data=0x21) at TestClass.cpp:98
>   98      void breakpointTest(TestData* data)
>
>   (gdb) n
>   100         printf("obj = %p\n", data);
>
>   (gdb) fr
>   #0  TestClass::breakpointTest (this=0x7ffff2a09a00, data=0x7ffff2aa7000)
>       at TestClass.cpp:100
>   100         printf("obj = %p\n", data);
>
> Is anyone aware of any issue in GDB, or GCC, where using templated
> functions with perfect forwarding or other complex C++ template
> features could cause GDB's understanding of the starting line number of
> functions to be miscalculated like this?
>
> Is there a way for me to investigate what information GDB is looking at
> to determine where to set breakpoints when given a symbol name like
> this?  Is this the same info available via addr2line etc.?

When placing a breakpoint on a function name like this, on code compiled
by gcc/g++, GDB analyzes the prologue and tries to guess at which point
the stack for the function is set up and where the location expressions
given by the DWARF debug info for the local variables become meaningful.
With optimizations, this can become tricky, but you said it happens with
-O0, so let's focus on that.

I don't really have an idea of what's happening, but you could try
showing what the "disas" command shows after hitting the breakpoint in
both cases (the `=>` should show where you are stopped, so where the
breakpoint was set).

Simon

* Re: GDB 13.2: breakpoint at wrong line after unrelated change
From: Paul Smith @ 2024-03-11 19:38 UTC
To: gdb

On Mon, 2024-03-11 at 15:14 -0400, Simon Marchi wrote:
> > I have noticed that if I only include the templated function
> > definition but don't call it, then the problem doesn't happen.
>
> When you say that, does it mean that you just define the templated
> function, or do you manually instantiate it?  In other words, does it
> cause any code to be generated?

If I just define the templated function I don't see the issue.  If I
invoke the templated function, I get the problem.

FYI I'm switching to the fmt library (if you're familiar with that) and
the templated function invokes it; it's something like this:

  void criticalErrorV(fmt::string_view fmt, const char *file, int line,
                      fmt::format_args args);

  template <typename... Args>
  void criticalError(fmt::format_string<Args...> fmt,
                     const char* file, int line, Args &&...args)
  {
      criticalErrorV(fmt, file, line, fmt::make_format_args(args...));
  }

If I never call criticalError() then it works fine (or in my previous
implementation, which used printf-style calls with stdarg, it worked
fine as well).

If I have some invocation of criticalError() somewhere in the
translation unit, I get this problem.  I haven't checked moving it
around to see if it needs to be invoked before/after the "problem"
method in the TU to get this behavior.

> I don't really have an idea of what's happening, but you could try
> showing what the "disas" command shows after hitting the breakpoint
> in both cases (the `=>` should show where you are stopped, so where
> the breakpoint was set).

Good idea; here's what I get for the correct behavior:

     0x000000000053209c <+0>:   push   %rbp
     0x000000000053209d <+1>:   mov    %rsp,%rbp
     0x00000000005320a0 <+4>:   lea    -0x10(%rsp),%rsp
     0x00000000005320a5 <+9>:   mov    %rdi,-0x8(%rbp)
     0x00000000005320a9 <+13>:  mov    %rsi,-0x10(%rbp)
  => 0x00000000005320ad <+17>:  mov    -0x10(%rbp),%rax
     0x00000000005320b1 <+21>:  mov    %rax,%rsi
     0x00000000005320b4 <+24>:  lea    0x17e9a9(%rip),%rax        # 0x6b0a64
     0x00000000005320bb <+31>:  mov    %rax,%rdi
     0x00000000005320be <+34>:  mov    $0x0,%eax
     0x00000000005320c3 <+39>:  call   0x52bc00 <printf@plt>
     0x00000000005320c8 <+44>:  nop
     0x00000000005320c9 <+45>:  mov    %rbp,%rsp
     0x00000000005320cc <+48>:  pop    %rbp
     0x00000000005320cd <+49>:  ret

Here's what I get for the incorrect behavior:

  => 0x00000000005320ee <+0>:   push   %rbp
     0x00000000005320ef <+1>:   mov    %rsp,%rbp
     0x00000000005320f2 <+4>:   lea    -0x10(%rsp),%rsp
     0x00000000005320f7 <+9>:   mov    %rdi,-0x8(%rbp)
     0x00000000005320fb <+13>:  mov    %rsi,-0x10(%rbp)
     0x00000000005320ff <+17>:  mov    -0x10(%rbp),%rax
     0x0000000000532103 <+21>:  mov    %rax,%rsi
     0x0000000000532106 <+24>:  lea    0x17e9d6(%rip),%rax        # 0x6b0ae3
     0x000000000053210d <+31>:  mov    %rax,%rdi
     0x0000000000532110 <+34>:  mov    $0x0,%eax
     0x0000000000532115 <+39>:  call   0x52bc00 <printf@plt>
     0x000000000053211a <+44>:  nop
     0x000000000053211b <+45>:  mov    %rbp,%rsp
     0x000000000053211e <+48>:  pop    %rbp
     0x000000000053211f <+49>:  ret

It seems to have given up and just picked the first instruction :)

Here's the compile line args (removed extraneous stuff like warnings
and preprocessor options):

  g++ -std=gnu++20 -ggdb3 -fPIC -march=haswell -mtune=intel \
      -fno-omit-frame-pointer -O0 -pthread \
      -o TestClass.o -c TestClass.cpp

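For readers who want to picture the shape of the code under discussion,
here is a minimal sketch that depends only on the standard library.  It is
an assumption-laden stand-in, not the poster's code: the fmt calls are
replaced by a std::ostringstream and a C++17 fold expression, the signature
is simplified, and the helper unrelated() and the TestData layout are
hypothetical.  Nothing in the thread shows that this reduced form triggers
the breakpoint problem; it only illustrates "a variadic, perfectly
forwarding template that is instantiated somewhere else in the same
translation unit".

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>
#include <utility>

// Non-template back end; plays the role of criticalErrorV() in the thread.
inline std::string criticalErrorV(const std::string& msg, const char* file, int line)
{
    assert(file != nullptr);
    std::ostringstream out;
    out << file << ':' << line << ": " << msg;
    return out.str();
}

// Variadic front end with perfect forwarding; plays the role of
// criticalError().  No code is generated for it until it is instantiated.
template <typename... Args>
std::string criticalError(const char* file, int line, Args&&... args)
{
    std::ostringstream msg;
    (msg << ... << std::forward<Args>(args));  // C++17 fold expression
    return criticalErrorV(msg.str(), file, line);
}

struct TestData { int value; };  // hypothetical layout

class TestClass {
public:
    // The function the breakpoint is placed on.
    void breakpointTest(TestData* data)
    {
        std::printf("obj = %p\n", static_cast<void*>(data));
    }

    // Hypothetical, unrelated call site that instantiates the template.
    std::string unrelated()
    {
        return criticalError("TestClass.cpp", 42, "bad value ", 7);
    }
};
```

Removing unrelated() leaves the template defined but never instantiated,
which corresponds to the "define but don't call" case described above.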
* Re: GDB 13.2: breakpoint at wrong line after unrelated change
From: Simon Marchi @ 2024-03-11 19:50 UTC
To: psmith, gdb

On 3/11/24 15:38, Paul Smith via Gdb wrote:
> On Mon, 2024-03-11 at 15:14 -0400, Simon Marchi wrote:
>>> I have noticed that if I only include the templated function
>>> definition but don't call it, then the problem doesn't happen.
>>
>> When you say that, does it mean that you just define the templated
>> function, or do you manually instantiate it?  In other words, does it
>> cause any code to be generated?
>
> If I just define the templated function I don't see the issue.  If I
> invoke the templated function, I get the problem.
>
> FYI I'm switching to the fmt library (if you're familiar with that)

Yes, I love it, we should use it in GDB :).

> and
> the templated function invokes it; it's something like this:
>
>   void criticalErrorV(fmt::string_view fmt, const char *file, int line,
>                       fmt::format_args args);
>
>   template <typename... Args>
>   void criticalError(fmt::format_string<Args...> fmt,
>                      const char* file, int line, Args &&...args)
>   {
>       criticalErrorV(fmt, file, line, fmt::make_format_args(args...));
>   }
>
> If I never call criticalError() then it works fine (or in my previous
> implementation, which used printf-style calls with stdarg, it worked
> fine as well).

If you never call it, it never generates code, so it kinda makes sense
that it doesn't change anything.

> If I have some invocation of criticalError() somewhere in the
> translation unit, I get this problem.  I haven't checked moving it
> around to see if it needs to be invoked before/after the "problem"
> method in the TU to get this behavior.
>
>> I don't really have an idea of what's happening, but you could try
>> showing what the "disas" command shows after hitting the breakpoint
>> in both cases (the `=>` should show where you are stopped, so where
>> the breakpoint was set).
>
> Good idea; here's what I get for the correct behavior:
>
>      0x000000000053209c <+0>:   push   %rbp
>      0x000000000053209d <+1>:   mov    %rsp,%rbp
>      0x00000000005320a0 <+4>:   lea    -0x10(%rsp),%rsp
>      0x00000000005320a5 <+9>:   mov    %rdi,-0x8(%rbp)
>      0x00000000005320a9 <+13>:  mov    %rsi,-0x10(%rbp)
>   => 0x00000000005320ad <+17>:  mov    -0x10(%rbp),%rax
>      0x00000000005320b1 <+21>:  mov    %rax,%rsi
>      0x00000000005320b4 <+24>:  lea    0x17e9a9(%rip),%rax        # 0x6b0a64
>      0x00000000005320bb <+31>:  mov    %rax,%rdi
>      0x00000000005320be <+34>:  mov    $0x0,%eax
>      0x00000000005320c3 <+39>:  call   0x52bc00 <printf@plt>
>      0x00000000005320c8 <+44>:  nop
>      0x00000000005320c9 <+45>:  mov    %rbp,%rsp
>      0x00000000005320cc <+48>:  pop    %rbp
>      0x00000000005320cd <+49>:  ret
>
> Here's what I get for the incorrect behavior:
>
>   => 0x00000000005320ee <+0>:   push   %rbp
>      0x00000000005320ef <+1>:   mov    %rsp,%rbp
>      0x00000000005320f2 <+4>:   lea    -0x10(%rsp),%rsp
>      0x00000000005320f7 <+9>:   mov    %rdi,-0x8(%rbp)
>      0x00000000005320fb <+13>:  mov    %rsi,-0x10(%rbp)
>      0x00000000005320ff <+17>:  mov    -0x10(%rbp),%rax
>      0x0000000000532103 <+21>:  mov    %rax,%rsi
>      0x0000000000532106 <+24>:  lea    0x17e9d6(%rip),%rax        # 0x6b0ae3
>      0x000000000053210d <+31>:  mov    %rax,%rdi
>      0x0000000000532110 <+34>:  mov    $0x0,%eax
>      0x0000000000532115 <+39>:  call   0x52bc00 <printf@plt>
>      0x000000000053211a <+44>:  nop
>      0x000000000053211b <+45>:  mov    %rbp,%rsp
>      0x000000000053211e <+48>:  pop    %rbp
>      0x000000000053211f <+49>:  ret
>
> It seems to have given up and just picked the first instruction :)
>
> Here's the compile line args (removed extraneous stuff like warnings
> and preprocessor options):
>
>   g++ -std=gnu++20 -ggdb3 -fPIC -march=haswell -mtune=intel \
>       -fno-omit-frame-pointer -O0 -pthread \
>       -o TestClass.o -c TestClass.cpp

Ok, so clearly GDB failed to analyze the prologue.  Which is weird
because the two functions are identical (modulo the addresses).  To get
to the bottom of this, you (or someone else) would need to debug GDB
itself.  If you want to do this, I would start at function
skip_prologue_using_sal, in symtab.c.  Off hand, I don't think we have a
debug switch to enable logging for prologue skipping.  It would be
useful to have one here, as we would be able to compare the logging
shown in both cases.

When you have DWARF debug info (which is your case), prologue skipping
is done using the DWARF line tables.  You could try to extract the line
tables for both versions of the function and see what's different.  But
that would probably only be useful if you're debugging GDB already.

Simon

* Re: GDB 13.2: breakpoint at wrong line after unrelated change
From: Paul Smith @ 2024-03-11 20:17 UTC
To: gdb

On Mon, 2024-03-11 at 15:50 -0400, Simon Marchi wrote:
> > FYI I'm switching to the fmt library (if you're familiar with that)
>
> Yes, I love it, we should use it in GDB :).

It's very nice for portable programming, but I really hope that GCC
will add __attribute__ support for it like they have for printf()
formatting.  The errors generated are incomprehensible (basically you
get pages of error messages and the only useful thing is you get a
filename and line number to look at) and it has one significant
failing: it will throw a compile error if you have _too few_ arguments
for the formatting string, or the argument can't be formatted, but it is
completely silent if you have _too many_ arguments for the formatting
string.

In the abstract this makes sense since unlike with varargs it's quite
possible to have extra arguments that are not formatted... but in real
life I expect that capability is virtually never used, and the lack of
this warning makes porting very tricky (if you forget to switch a "%s"
to a "{}", the compiler will not warn you).

> If you never call it, it never generates code, so it kinda makes sense
> that it doesn't change anything.

Yes agreed.

> > If I have some invocation of criticalError() somewhere in the
> > translation unit, I get this problem.  I haven't checked moving it
> > around to see if it needs to be invoked before/after the "problem"
> > method in the TU to get this behavior.

Just to try it I put only one invocation near the start of the TU, then
only one invocation near the end of the TU.  I got the problem in both
cases, so that's kind of odd.  Maybe.

> Ok, so clearly GDB failed to analyze the prologue.  Which is weird
> because the two functions are identical (modulo the addresses).

Yes!  Very weird.

> To get to the bottom of this, you (or someone else) would need to
> debug GDB itself.

OK.  I will look into this but it may take a few days.  The first thing
I'll do is build the latest release of GDB and see if it makes a
difference.  I'm hopeful that the problem is in GDB not GCC or binutils
since those are harder to change, but we'll see.

Thanks for the conversation, Simon.  I'll let you know where I get,
with the goal of filing a bug if I can repro with the latest code and
can get some idea of what's going on.

* Re: GDB 13.2: breakpoint at wrong line after unrelated change
From: Paul Smith @ 2024-03-15 21:11 UTC
To: Simon Marchi, gdb

On Mon, 2024-03-11 at 15:50 -0400, Simon Marchi wrote:
> Ok, so clearly GDB failed to analyze the prologue.  Which is weird
> because the two functions are identical (modulo the addresses).  To
> get to the bottom of this, you (or someone else) would need to debug
> GDB itself.  If you want to do this, I would start at function
> skip_prologue_using_sal, in symtab.c.  Off hand, I don't think we
> have a debug switch to enable logging for prologue skipping.  It
> would be useful to have one here, as we would be able to compare the
> logging shown in both cases.

FYI I have finally gotten back to looking at this.  I've only been at
it for a short time but just for information:

I was able to build GDB 14.2 (latest release) from source and I still
see the issue there.  So I started debugging.

I can tell you that in the "good" binary case I can see that
amd64-tdep.c:amd64_skip_prologue() is invoked which invokes
symtab.c:skip_prologue_using_sal() as you suggested.  In fact, these
functions are called numerous times.

In the "bad" binary case, neither of those functions is called, ever.
I put a gdb_printf() in both functions and in the "good" binary I see
probably 20 invocations between starting, setting the breakpoint,
running, and exiting: in the "bad" binary zero invocations.  I do see
that we definitely invoke set_gdbarch_skip_prologue() with the amd64
function pointer in both cases, so it's not that.

I'm looking to see where *_skip_prologue() is called from to figure out
where the code paths diverge, just thought I'd send a note to let folks
know that I've not dropped this investigation.

* Re: GDB 13.2: breakpoint at wrong line after unrelated change
From: Paul Smith @ 2024-03-15 22:19 UTC
To: Simon Marchi, gdb

On Fri, 2024-03-15 at 17:11 -0400, Paul Smith via Gdb wrote:
> I can tell you that in the "good" binary case I can see that
> amd64-tdep.c:amd64_skip_prologue() is invoked which invokes
> symtab.c:skip_prologue_using_sal() as you suggested.  In fact, these
> functions are called numerous times.
>
> In the "bad" binary case, neither of those functions is called, ever.
> I put a gdb_printf() in both functions and in the "good" binary I see
> probably 20 invocations between starting, setting the breakpoint,
> running, and exiting: in the "bad" binary zero invocations.  I do see
> that we definitely invoke set_gdbarch_skip_prologue() with the amd64
> function pointer in both cases, so it's not that.

More details, no answers.

However, the problem is much deeper than some kind of incorrect
computation of the prologue length.  It appears to be a major
difference in the structure of the binary itself, which is weird.

The difference happens in symtab.c:find_function_start_sal_1().  When
this is called on the "good" binary,
sal.symtab->compunit()->locations_valid() is 0 so we fall through to
calling skip_prologue_sal().

In the "bad" binary, locations_valid() returns 1 instead.  This sends
us through this code starting at symtab.c:3607:

  if (funfirstline && sal.symtab != NULL
      && (sal.symtab->compunit ()->locations_valid ()
          || sal.symtab->language () == language_asm))
    {
      struct gdbarch *gdbarch = sal.symtab->compunit ()->objfile ()->arch ();

      sal.pc = func_addr;
      if (gdbarch_skip_entrypoint_p (gdbarch))
        sal.pc = gdbarch_skip_entrypoint (gdbarch, sal.pc);
      return sal;
    }

thus returning early.  I've checked and gdbarch_skip_entrypoint_p()
returns false so gdbarch_skip_entrypoint() is not called.

I've also verified that all other aspects of the above if-statement
(funfirstline and sal.symtab->language()) are the same (1 and 4)
between the good and bad calls.  The difference appears to be the
return value of locations_valid().

Looking into this it appears to be something set for the entire binary,
differently between the "good" and "bad" binary.

In the "good" binary, when we enter
dwarf2/read.c:process_full_comp_unit(), the passed-in dwarf2_cu value of
has_loclist is false.  Because of that, this is not called:

  if (cu->has_loclist && gcc_4_minor >= 5)
    cust->set_locations_valid (true);

and because this is not called, the locations_valid() return above is
false.

In the "bad" binary when we enter process_full_comp_unit(), the value
of has_loclist is true.  Because of this we call
cust->set_locations_valid(true) above, and this means locations_valid()
returns true and we take the early-return path in
find_function_start_sal_1() instead of calling skip_prologue_sal().

I have to stop here for today but maybe I'll have more time later this
weekend.  If anyone has hints on how to determine why the settings of
struct dwarf2_cu are different let me know.

Cheers!

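The branch described in this message can be restated as a small toy model.
This is only a sketch under stated assumptions: ToyCompUnit, ToyFunction
and breakpointAddress are hypothetical names invented for illustration and
are far simpler than GDB's real symtab_and_line machinery, and the fixed
prologue_len field stands in for whatever skip_prologue_sal() would compute
from the line table.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-compilation-unit state GDB consults.
struct ToyCompUnit {
    bool locations_valid;  // models compunit ()->locations_valid ()
    bool is_asm;           // models language () == language_asm
};

// Hypothetical stand-in for a function's layout.
struct ToyFunction {
    std::uint64_t entry_pc;      // address of the first instruction (<+0>)
    std::uint64_t prologue_len;  // bytes up to the first body instruction
};

// Address that "break FUNCTION" resolves to in this toy model.
inline std::uint64_t breakpointAddress(const ToyCompUnit& cu,
                                       const ToyFunction& fn,
                                       bool funfirstline)
{
    // Guard against wrap-around in the addition below.
    assert(fn.entry_pc <= UINT64_MAX - fn.prologue_len);

    // Early return: the unit claims its locations are valid everywhere,
    // so the prologue is not skipped (the "bad" binary).
    if (funfirstline && (cu.locations_valid || cu.is_asm))
        return fn.entry_pc;

    // Normal path: skip the prologue (the "good" binary).
    if (funfirstline)
        return fn.entry_pc + fn.prologue_len;

    return fn.entry_pc;
}
```

Plugging in the addresses from the disassembly earlier in the thread, the
"good" function at 0x53209c with a 17-byte prologue resolves to 0x5320ad
when locations_valid is false, while the "bad" function at 0x5320ee
resolves to 0x5320ee itself when locations_valid is true.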
* Re: GDB 13.2: breakpoint at wrong line after unrelated change
From: Simon Marchi @ 2024-03-16 16:33 UTC
To: psmith, gdb

On 2024-03-15 18:19, Paul Smith wrote:
> On Fri, 2024-03-15 at 17:11 -0400, Paul Smith via Gdb wrote:
>> I can tell you that in the "good" binary case I can see that
>> amd64-tdep.c:amd64_skip_prologue() is invoked which invokes
>> symtab.c:skip_prologue_using_sal() as you suggested.  In fact, these
>> functions are called numerous times.
>>
>> In the "bad" binary case, neither of those functions is called, ever.
>> I put a gdb_printf() in both functions and in the "good" binary I see
>> probably 20 invocations between starting, setting the breakpoint,
>> running, and exiting: in the "bad" binary zero invocations.  I do see
>> that we definitely invoke set_gdbarch_skip_prologue() with the amd64
>> function pointer in both cases, so it's not that.
>
> More details, no answers.
>
> However, the problem is much deeper than some kind of incorrect
> computation of the prologue length.  It appears to be a major
> difference in the structure of the binary itself, which is weird.
>
> The difference happens in symtab.c:find_function_start_sal_1().  When
> this is called on the "good" binary,
> sal.symtab->compunit()->locations_valid() is 0 so we fall through to
> calling skip_prologue_sal().
>
> In the "bad" binary, locations_valid() returns 1 instead.  This sends
> us through this code starting at symtab.c:3607:
>
>   if (funfirstline && sal.symtab != NULL
>       && (sal.symtab->compunit ()->locations_valid ()
>           || sal.symtab->language () == language_asm))
>     {
>       struct gdbarch *gdbarch = sal.symtab->compunit ()->objfile ()->arch ();
>
>       sal.pc = func_addr;
>       if (gdbarch_skip_entrypoint_p (gdbarch))
>         sal.pc = gdbarch_skip_entrypoint (gdbarch, sal.pc);
>       return sal;
>     }
>
> thus returning early.  I've checked and gdbarch_skip_entrypoint_p()
> returns false so gdbarch_skip_entrypoint() is not called.
>
> I've also verified that all other aspects of the above if-statement
> (funfirstline and sal.symtab->language()) are the same (1 and 4)
> between the good and bad calls.  The difference appears to be the
> return value of locations_valid().
>
> Looking into this it appears to be something set for the entire binary,
> differently between the "good" and "bad" binary.
>
> In the "good" binary, when we enter
> dwarf2/read.c:process_full_comp_unit(), the passed-in dwarf2_cu value of
> has_loclist is false.  Because of that, this is not called:
>
>   if (cu->has_loclist && gcc_4_minor >= 5)
>     cust->set_locations_valid (true);
>
> and because this is not called, the locations_valid() return above is
> false.
>
> In the "bad" binary when we enter process_full_comp_unit(), the value
> of has_loclist is true.  Because of this we call
> cust->set_locations_valid(true) above, and this means locations_valid()
> returns true and we take the early-return path in
> find_function_start_sal_1() instead of calling skip_prologue_sal().
>
> I have to stop here for today but maybe I'll have more time later this
> weekend.  If anyone has hints on how to determine why the settings of
> struct dwarf2_cu are different let me know.

Hi Paul,

I started to look at this problem this week, because I hit a case in my
own C++ program very similar to yours.  I didn't have time to finish my
reply, but my findings were very similar to yours.  When compiled with
gcc 11, the prologue is skipped.  When compiled with gcc 12 and 13, the
prologue is not skipped.  All with -O0.

Here's my analysis (partly redundant with what you said):

First, what I see:

Here, GDB stopped at the very first instruction of the function.  The
arguments are wrong:

  (gdb) info args
  this = 0x3dd736
  msgType1 = ((anonymous namespace)::MsgType::MSG_ITER_INACTIVITY | unknown: 0x5554)
  msgType2 = (unknown: 0x555553f8)

If I step past the prologue, they become correct:

  (gdb) n
  183         const auto specTestName = makeSpecTestName(_mTestName, msgType1, msgType2);
  (gdb) info args
  this = 0x555555a65e40 <(anonymous namespace)::errorTestCases>
  msgType1 = (anonymous namespace)::MsgType::STREAM
  msgType2 = (anonymous namespace)::MsgType::STREAM

When the prologue is skipped in the gcc 11-compiled executable, we reach
the skip_prologue_sal function like this:

  #0  skip_prologue_sal (sal=0x7ffd145c64b0)
      at /home/smarchi/src/binutils-gdb/gdb/symtab.c:3852
  #1  0x0000561d4fa02155 in find_function_start_sal_1 (func_addr=1910486,
      section=0x561d5247d818, funfirstline=true)
      at /home/smarchi/src/binutils-gdb/gdb/symtab.c:3716
  #2  0x0000561d4fa0221e in find_function_start_sal (sym=0x561d52ed5900,
      funfirstline=true)
      at /home/smarchi/src/binutils-gdb/gdb/symtab.c:3744
  #3  0x0000561d4f7bc0e5 in symbol_to_sal (result=0x7ffd145c6570,
      funfirstline=1, sym=0x561d52ed5900)
      at /home/smarchi/src/binutils-gdb/gdb/linespec.c:4376
  #4  0x0000561d4f7b62f1 in convert_linespec_to_sals (state=0x7ffd145c69a0,
      ls=0x7ffd145c69f0)
      at /home/smarchi/src/binutils-gdb/gdb/linespec.c:2255
  #5  0x0000561d4f7b73a5 in parse_linespec (parser=0x7ffd145c6970,
      arg=0x7f002c018ce0 "_runOne",
      match_type=symbol_name_match_type::WILD)
      at /home/smarchi/src/binutils-gdb/gdb/linespec.c:2640
  #6  0x0000561d4f7b84e4 in location_spec_to_sals (parser=0x7ffd145c6970,
      locspec=0x561d52464650)
      at /home/smarchi/src/binutils-gdb/gdb/linespec.c:3080
  #7  0x0000561d4f7b890a in decode_line_full (locspec=0x561d52464650,
      flags=1, search_pspace=0x0, default_symtab=0x0, default_line=0,
      canonical=0x7ffd145c6e00, select_mode=0x0, filter=0x0)
      at /home/smarchi/src/binutils-gdb/gdb/linespec.c:3157
  #8  0x0000561d4f4989e9 in parse_breakpoint_sals (locspec=0x561d52464650,
      canonical=0x7ffd145c6e00)
      at /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:8895
  #9  0x0000561d4f4a5077 in create_sals_from_location_spec_default
      (locspec=0x561d52464650, canonical=0x7ffd145c6e00)
      at /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:13200
  #10 0x0000561d4f499a0f in create_breakpoint (gdbarch=0x561d5244cac0,
      locspec=0x561d52464650, cond_string=0x0, thread=-1, inferior=-1,
      extra_string=0x0, force_condition=false, parse_extra=1, tempflag=0,
      type_wanted=bp_breakpoint, ignore_count=0,
      pending_break_support=AUTO_BOOLEAN_AUTO,
      ops=0x561d501b5100 <code_breakpoint_ops>, from_tty=1, enabled=1,
      internal=0, flags=0)
      at /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:9230
  #11 0x0000561d4f49a4da in break_command_1 (arg=0x561d521e1d49 "", flag=0,
      from_tty=1)
      at /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:9415

When the prologue is not skipped, with gcc 12 and 13, skip_prologue_sal
is never called.  Backtracking a bit, I found that in that case
find_function_start_sal_1 returns early due to
`sal.symtab->compunit ()->locations_valid ()` being true.

The locations_valid flag is set in the DWARF reader
(process_full_comp_unit function) whenever dwarf2_cu::has_loclist is
true.  That is set in var_decode_location when processing a symbol whose
location (DW_AT_location) is a loclist.  In my gcc 11-generated
executable, I don't have a symbol whose location is a loclist.  In my
gcc 12 or 13-generated executable, I do:

  DW_AT_location [DW_FORM_sec_offset] (0x0000000c:
     [0x000000000024ed64, 0x000000000024ed7b): DW_OP_reg5 RDI
     [0x000000000024ed7b, 0x000000000024ee14): DW_OP_reg3 RBX
     [0x000000000024ee14, 0x000000000024ee15): DW_OP_entry_value(DW_OP_reg5 RDI), DW_OP_stack_value
     [0x000000000024ee15, 0x000000000024ef0b): DW_OP_reg3 RBX)

The reasoning behind this is explained here in process_full_comp_unit:

  /* GCC-4.0 has started to support -fvar-tracking.  GCC-3.x still can
     produce DW_AT_location with location lists but it can be possibly
     invalid without -fvar-tracking.  Still up to GCC-4.4.x incl. 4.4.0
     there were bugs in prologue debug info, fixed later in GCC-4.5
     by "unwind info for epilogues" patch (which is not directly related).

     For -gdwarf-4 type units LOCATIONS_VALID indication is fortunately not
     needed, it would be wrong due to missing DW_AT_producer there.

     Still one can confuse GDB by using non-standard GCC compilation
     options - this waits on GCC PR other/32998 (-frecord-gcc-switches).
     */
  if (cu->has_loclist && gcc_4_minor >= 5)
    cust->set_locations_valid (true);

So, as soon as it sees one loclist in the compilation unit, GDB assumes
that GCC has produced loclists that accurately describe variable values
everywhere, even in prologues.  This assumption is not true here.  The
locations for the two arguments I tried to print earlier are only valid
after the prologue, after the stack has been set up:

  0x00055daa:     DW_TAG_formal_parameter
                    DW_AT_name [DW_FORM_strp]       ("msgType1")
                    DW_AT_location [DW_FORM_exprloc]        (DW_OP_fbreg -1660)

  0x00055dba:     DW_TAG_formal_parameter
                    DW_AT_name [DW_FORM_strp]       ("msgType2")
                    DW_AT_location [DW_FORM_exprloc]        (DW_OP_fbreg -1664)

Simon

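The "one loclist flips the whole compilation unit" behaviour described
above can be pictured with a small toy model.  This is a sketch under
stated assumptions, not GDB's code: ToySymbol, LocForm and locationsValid
are hypothetical names, the symbol named "tmp" is invented, and the
boolean producerIsGcc45OrLater is a simplification of the
`gcc_4_minor >= 5` producer check quoted above.

```cpp
#include <cassert>
#include <vector>

// How a symbol's DW_AT_location is encoded in this toy model.
enum class LocForm {
    ExprLoc,  // single expression, e.g. DW_OP_fbreg -1660
    LocList   // location list with per-address-range entries
};

struct ToySymbol {
    const char* name;
    LocForm form;
};

// Models the unit-wide decision: a single loclist anywhere in the unit
// marks every location in the unit as trustworthy, prologues included.
inline bool locationsValid(const std::vector<ToySymbol>& symbols,
                           bool producerIsGcc45OrLater)
{
    bool has_loclist = false;
    for (const ToySymbol& s : symbols) {
        assert(s.name != nullptr);
        if (s.form == LocForm::LocList)
            has_loclist = true;  // models dwarf2_cu::has_loclist
    }
    return has_loclist && producerIsGcc45OrLater;
}
```

In this model a unit containing only msgType1 and msgType2 (both
expression locations) is not flagged, so the prologue would be skipped.
Adding one unrelated symbol with a location list flags the entire unit,
even though the frame-base-relative locations of msgType1 and msgType2
are still only meaningful after the prologue.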
* Re: GDB 13.2: breakpoint at wrong line after unrelated change
From: Paul Smith @ 2024-03-16 19:57 UTC
To: Simon Marchi, gdb

On Sat, 2024-03-16 at 12:33 -0400, Simon Marchi wrote:
> Here, GDB stopped at the very first instruction of the function.  The
> arguments are wrong:

> If I step past the prologue, they become correct:

Everything you discovered is identical to my situation in every way,
and you got a little further than I did.

It's interesting that as long as I don't invoke my perfect-forwarding
fmt templated function it works; as soon as I do GDB seems to
misinterpret the handling of the entire binary.

It seems this is a bug in GDB, after all.  I feel like you're in a
better position to file a bug (with a deeper understanding of the
problem) but if you would prefer me to do so I can.

I'm certainly willing to test out any proposed fixes, if any were
forthcoming.  I do have the infrastructure here to build and test GDB.

In the meantime I will look into what I can do to work around this
issue as it's causing some of my tests to fail (we have a suite of GDB
Python macros we use and we wrote tests of these macros in our test
suite, which are failing due to this problem).  I don't really want to
delay deployment of my fmt changes until this GDB issue is fixed.
Perhaps I can modify the tests to add a "step" call, if it detects this
incorrect prologue skip situation or something like that.

Thanks!

-- 
Paul D. Smith <psmith@gnu.org>            Find some GNU Make tips at:
https://www.gnu.org                       http://make.mad-scientist.net
"Please remain calm...I may be mad, but I am a professional." --Mad
Scientist