public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] gdb/c++: fix handling of breakpoints on @plt symbols
@ 2022-12-20 13:55 Andrew Burgess
  2023-01-20 11:48 ` [PATCHv2] " Andrew Burgess
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Burgess @ 2022-12-20 13:55 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

This commit should fix PR gdb/20091, PR gdb/17201, and PR gdb/17071.
Additionally, PR gdb/17199 relates to this area of code, but is more
of a request to refactor some parts of GDB, this commit does not
address that request, but it is probably worth reading that PR when
looking at this commit.

When the current language is C++, and the user places a breakpoint on
a function in a shared library, GDB will currently find two locations
for the breakpoint, one location will be within the function itself as
we would expect, but the other location will be within the PLT table
for the call to the named function.  Consider this session:

  $ gdb -q /tmp/breakpoint-shlib-func
  Reading symbols from /tmp/breakpoint-shlib-func...
  (gdb) start
  Temporary breakpoint 1 at 0x40112e: file /tmp/breakpoint-shlib-func.cc, line 20.
  Starting program: /tmp/breakpoint-shlib-func

  Temporary breakpoint 1, main () at /tmp/breakpoint-shlib-func.cc:20
  20	  int answer = foo ();
  (gdb) break foo
  Breakpoint 2 at 0x401030 (2 locations)
  (gdb) info breakpoints
  Num     Type           Disp Enb Address            What
  2       breakpoint     keep y   <MULTIPLE>
  2.1                         y   0x0000000000401030 <foo()@plt>
  2.2                         y   0x00007ffff7fc50fd in foo() at /tmp/breakpoint-shlib-func-lib.cc:20

This is not the expected behaviour.  If we compile the same test using
a C compiler then we see this:

  (gdb) break foo
  Breakpoint 2 at 0x7ffff7fc50fd: file /tmp/breakpoint-shlib-func-c-lib.c, line 20.
  (gdb) info breakpoints
  Num     Type           Disp Enb Address            What
  2       breakpoint     keep y   0x00007ffff7fc50fd in foo at /tmp/breakpoint-shlib-func-c-lib.c:20

Here's what's happening.  When GDB parses the symbols in the main
executable and the shared library we see a number of different symbols
for foo, and use these to create entries in GDB's msymbol table:

  - In the main executable we see a symbol 'foo@plt' that points at
    the plt entry for foo, from this we add two entries into GDB's
    msymbol table, one called 'foo@plt' which points at the plt entry
    and has type mst_text, then we create a second symbol, this time
    called 'foo' with type mst_solib_trampoline which also points at
    the plt entry,

  - Then, when the shared library is loaded we see another symbol
    called 'foo', this one points at the actual implementation in the
    shared library.  This time GDB creates a msymbol called 'foo' with
    type mst_text that points at the implementation.

This means that GDB creates 3 msymbols to represent the 2 symbols
found in the executable and shared library.

When the user creates a breakpoint on 'foo' GDB eventually ends up in
search_minsyms_for_name (linespec.c), this function then calls
iterate_over_minimal_symbols passing in the name we are looking for
wrapped in a lookup_name_info object.

In iterate_over_minimal_symbols we iterate over two hash tables (using
the name we're looking for as the hash key), first we walk the hash
table of symbol linkage names, then we walk the hash table of
demangled symbol names.

When the language is C++ the symbols for 'foo' will all have been
mangled, as a result, in this case, the iteration of the linkage name
hash table will find no matching results.

However, when we walk the demangled hash table we do find some
results.  In order to match symbol names, GDB obtains a symbol name
matching function by calling the get_symbol_name_matcher method on the
language_defn class.  For C++, in this case, the matching function we
use is cp_fq_symbol_name_matches, which delegates the work to
strncmp_iw_with_mode with mode strncmp_iw_mode::MATCH_PARAMS and
language set to language_cplus.

The strncmp_iw_mode::MATCH_PARAMS mode means that strncmp_iw_mode will
skip any parameters in the demangled symbol name when checking for a
match, e.g. 'foo' will match the demangled name 'foo()'.  The way this
is done is that the strings are matched character by character, but,
once the string we are looking for ('foo' here) is exhausted, if we
are looking at '(' then we consider the match a success.

Lets consider the 3 symbols GDB created.  If the function declaration
is 'void foo ()' then from the main executable we added symbols
'_Z3foov@plt' and '_Z3foov', while from the shared library we added
another symbol call '_Z3foov'.  When these are demangled they become
'foo()@plt', 'foo', and 'foo' respectively.

Now, the '_Z3foov' symbol from the main executable has the type
mst_solib_trampoline, and in search_minsyms_for_name, we search for
any symbols of type mst_solib_trampoline and filter these out of the
results.

However, the '_Z3foov@plt' symbol (from the main executable), and the
'_Z3foov' symbol (from the shared library) both have type mst_text.

During the demangled name matching, due to the use of MATCH_PARAMS
mode, we stop the comparison as soon as we hit a '(' in the demangled
name.  And so, '_Z3foov@plt', which demangles to 'foo()@plt' matches
'foo', and '_Z3foov', which demangles to 'foo()' also matches 'foo'.

By contrast, for C, there are no demangled hash table entries to be
iterated over (in iterate_over_minimal_symbols), we only consider the
linkage name symbols which are 'foo@plt' and 'foo'.  The plain 'foo'
symbol obviously matches when we are looking for 'foo', but in this
case the 'foo@plt' will not match due to the '@plt' suffix.

And so, when the user asks for a breakpoint in 'foo', and the language
is C, search_minsyms_for_name, returns a single msymbol, the mst_text
symbol for foo in the shared library, while, when the language is C++,
we get two results, '_Z3foov' for the shared library function, and
'_Z3foov@plt' for the plt entry in the main executable.

I propose to fix this in strncmp_iw_with_mode.  When the mode is
MATCH_PARAMS, instead of stopping at a '(' and assuming the match is a
success, GDB will instead search forward for the matching, closing,
')', effectively skipping the parameter list, and then resume
matching.  Thus, when comparing 'foo' to 'foo()@plt' GDB will
effectively compare against 'foo@plt' (skipping the parameter list),
and the match will fail, just as it does when the language is C.

There is one slight complication, which is revealed by the test
gdb.linespec/cpcompletion.exp, when searching for the symbol of a
const member function, the demangled symbol will have 'const' at the
end of its name, e.g.:

  struct_with_const_overload::const_overload_fn() const

Previously, the matching would stop at the '(' character, but after my
change the whole '()' is skipped, and the match resumes.  As a result,
the 'const' modifier results in a failure to match, when previously
GDB would have found a match.

To work around this issue, in strncmp_iw_with_mode, for language C++
and mode MATCH_PARAMS, I explicitly allow a trailing 'const' to be
skipped.

With these changes in place I now see GDB correctly setting a
breakpoint only at the implementation of 'foo' in the shared library.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=20091
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17201
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17071
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17199
---
 .../gdb.cp/breakpoint-shlib-func-lib.cc       | 21 +++++
 gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc | 22 ++++++
 .../gdb.cp/breakpoint-shlib-func.exp          | 78 +++++++++++++++++++
 gdb/utils.c                                   | 26 ++++++-
 4 files changed, 146 insertions(+), 1 deletion(-)
 create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
 create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
 create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp

diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
new file mode 100644
index 00000000000..c88620a8bc5
--- /dev/null
+++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
@@ -0,0 +1,21 @@
+/* Copyright 2022 Free Software Foundation, Inc.
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+extern int foo ();
+
+int
+foo ()
+{
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
new file mode 100644
index 00000000000..3970e44ec8d
--- /dev/null
+++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
@@ -0,0 +1,22 @@
+/* Copyright 2022 Free Software Foundation, Inc.
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+extern int foo ();
+
+int
+main ()
+{
+  int answer = foo ();
+  return answer;
+}
diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp
new file mode 100644
index 00000000000..3d15b68c182
--- /dev/null
+++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp
@@ -0,0 +1,78 @@
+# Copyright 2022 Free Software Foundation, Inc.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Places a breakpoint on a function in a shared library before the
+# inferior has started.  GDB will place the breakpoint on the @plt
+# symbol in the main executable.
+#
+# When the inferior is started GDB will re-evaluate the breakpoint
+# location and move the breakpoint to the function implementation in
+# the shared library.
+#
+# Then, with the inferior started, delete all breakpoints, and
+# re-create the breakpoint on the shared library function, GDB should
+# place a single breakpoint on the function implementation in the
+# shared library.
+
+if {[skip_shlib_tests]} {
+    return 0
+}
+
+standard_testfile .cc -lib.cc
+
+set libobj [standard_output_file libfoo.so]
+if { [build_executable "build shared library" $libobj $srcfile2 {debug c++ shlib}] != 0 } {
+    return -1
+}
+
+if { [prepare_for_testing "failed to prepare" $testfile $srcfile [list debug c++ shlib=$libobj]] } {
+    return -1
+}
+
+# Place the breakpoint before the shared library has been loaded, the
+# breakpoint should be placed on the @plt symbol.
+gdb_test "break foo" "Breakpoint $decimal at $hex"
+gdb_test "info breakpoints" "<foo\\(\\)@plt>"
+
+# This is used as an override for delete_breakpoints when we don't
+# want functions in gdb.exp to delete breakpoints behind the scenes
+# for us.
+proc do_not_delete_breakpoints {} {
+    # Just do nothing.
+}
+
+# Runto main, but don't delete all the breakpoints.
+with_override delete_breakpoints do_not_delete_breakpoints {
+    if ![runto_main] {
+	return -1
+    }
+}
+
+# The breakpoint should now be showing in `foo` for real.
+gdb_test "info breakpoints" \
+    "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+\r\n.*" \
+    "check breakpoints after starting the inferior"
+
+# Now we can delete the breakpoints.
+delete_breakpoints
+
+# And recreate the foo breakpoint, we should only get one location,
+# the actual location.
+gdb_test "break foo" "Breakpoint $decimal at \[^\r\n\]+" \
+    "recreate foo breakpoint"
+
+# Check the breakpoint was recreated correctly.
+gdb_test "info breakpoints" \
+    "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+" \
+    "check breakpoints after recreation"
diff --git a/gdb/utils.c b/gdb/utils.c
index 4a715b945f6..b50e57b39cd 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -2397,7 +2397,31 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
 	  return 0;
 	}
       else
-	return (*string1 != '\0' && *string1 != '(');
+	{
+	  if (*string1 == '(')
+	    {
+	      int p_count = 0;
+
+	      do
+		{
+		  if (*string1 == '(')
+		    ++p_count;
+		  else if (*string1 == ')')
+		    --p_count;
+		  ++string1;
+		}
+	      while (*string1 != '\0' && p_count > 0);
+
+	      /* There maybe things like 'const' after the parameters,
+		 which we do want to ignore.  However, if there's an '@'
+		 then this likely indicates something like '@plt' which we
+		 should not ignore.  */
+	      return *string1 == '@';
+	    }
+
+	  return *string1 == '\0' ? 0 : 1;
+	}
+
     }
   else
     return 1;

base-commit: 832a980e1725a4a7fa527f2dcd92a64d7e68ab0f
-- 
2.25.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCHv2] gdb/c++: fix handling of breakpoints on @plt symbols
  2022-12-20 13:55 [PATCH] gdb/c++: fix handling of breakpoints on @plt symbols Andrew Burgess
@ 2023-01-20 11:48 ` Andrew Burgess
  2023-01-30 15:13   ` Andrew Burgess
                     ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Andrew Burgess @ 2023-01-20 11:48 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

Changes since v1:

  - Rebased to current HEAD of master,

  - Updated test to use 'require allow_shlib_tests', and wrapped a few
    long lines.  Also Copyright date ranges have been updated for 2023.

---

This commit should fix PR gdb/20091, PR gdb/17201, and PR gdb/17071.
Additionally, PR gdb/17199 relates to this area of code, but is more
of a request to refactor some parts of GDB, this commit does not
address that request, but it is probably worth reading that PR when
looking at this commit.

When the current language is C++, and the user places a breakpoint on
a function in a shared library, GDB will currently find two locations
for the breakpoint, one location will be within the function itself as
we would expect, but the other location will be within the PLT table
for the call to the named function.  Consider this session:

  $ gdb -q /tmp/breakpoint-shlib-func
  Reading symbols from /tmp/breakpoint-shlib-func...
  (gdb) start
  Temporary breakpoint 1 at 0x40112e: file /tmp/breakpoint-shlib-func.cc, line 20.
  Starting program: /tmp/breakpoint-shlib-func

  Temporary breakpoint 1, main () at /tmp/breakpoint-shlib-func.cc:20
  20	  int answer = foo ();
  (gdb) break foo
  Breakpoint 2 at 0x401030 (2 locations)
  (gdb) info breakpoints
  Num     Type           Disp Enb Address            What
  2       breakpoint     keep y   <MULTIPLE>
  2.1                         y   0x0000000000401030 <foo()@plt>
  2.2                         y   0x00007ffff7fc50fd in foo() at /tmp/breakpoint-shlib-func-lib.cc:20

This is not the expected behaviour.  If we compile the same test using
a C compiler then we see this:

  (gdb) break foo
  Breakpoint 2 at 0x7ffff7fc50fd: file /tmp/breakpoint-shlib-func-c-lib.c, line 20.
  (gdb) info breakpoints
  Num     Type           Disp Enb Address            What
  2       breakpoint     keep y   0x00007ffff7fc50fd in foo at /tmp/breakpoint-shlib-func-c-lib.c:20

Here's what's happening.  When GDB parses the symbols in the main
executable and the shared library we see a number of different symbols
for foo, and use these to create entries in GDB's msymbol table:

  - In the main executable we see a symbol 'foo@plt' that points at
    the plt entry for foo, from this we add two entries into GDB's
    msymbol table, one called 'foo@plt' which points at the plt entry
    and has type mst_text, then we create a second symbol, this time
    called 'foo' with type mst_solib_trampoline which also points at
    the plt entry,

  - Then, when the shared library is loaded we see another symbol
    called 'foo', this one points at the actual implementation in the
    shared library.  This time GDB creates a msymbol called 'foo' with
    type mst_text that points at the implementation.

This means that GDB creates 3 msymbols to represent the 2 symbols
found in the executable and shared library.

When the user creates a breakpoint on 'foo' GDB eventually ends up in
search_minsyms_for_name (linespec.c), this function then calls
iterate_over_minimal_symbols passing in the name we are looking for
wrapped in a lookup_name_info object.

In iterate_over_minimal_symbols we iterate over two hash tables (using
the name we're looking for as the hash key), first we walk the hash
table of symbol linkage names, then we walk the hash table of
demangled symbol names.

When the language is C++ the symbols for 'foo' will all have been
mangled, as a result, in this case, the iteration of the linkage name
hash table will find no matching results.

However, when we walk the demangled hash table we do find some
results.  In order to match symbol names, GDB obtains a symbol name
matching function by calling the get_symbol_name_matcher method on the
language_defn class.  For C++, in this case, the matching function we
use is cp_fq_symbol_name_matches, which delegates the work to
strncmp_iw_with_mode with mode strncmp_iw_mode::MATCH_PARAMS and
language set to language_cplus.

The strncmp_iw_mode::MATCH_PARAMS mode means that strncmp_iw_mode will
skip any parameters in the demangled symbol name when checking for a
match, e.g. 'foo' will match the demangled name 'foo()'.  The way this
is done is that the strings are matched character by character, but,
once the string we are looking for ('foo' here) is exhausted, if we
are looking at '(' then we consider the match a success.

Lets consider the 3 symbols GDB created.  If the function declaration
is 'void foo ()' then from the main executable we added symbols
'_Z3foov@plt' and '_Z3foov', while from the shared library we added
another symbol call '_Z3foov'.  When these are demangled they become
'foo()@plt', 'foo', and 'foo' respectively.

Now, the '_Z3foov' symbol from the main executable has the type
mst_solib_trampoline, and in search_minsyms_for_name, we search for
any symbols of type mst_solib_trampoline and filter these out of the
results.

However, the '_Z3foov@plt' symbol (from the main executable), and the
'_Z3foov' symbol (from the shared library) both have type mst_text.

During the demangled name matching, due to the use of MATCH_PARAMS
mode, we stop the comparison as soon as we hit a '(' in the demangled
name.  And so, '_Z3foov@plt', which demangles to 'foo()@plt' matches
'foo', and '_Z3foov', which demangles to 'foo()' also matches 'foo'.

By contrast, for C, there are no demangled hash table entries to be
iterated over (in iterate_over_minimal_symbols), we only consider the
linkage name symbols which are 'foo@plt' and 'foo'.  The plain 'foo'
symbol obviously matches when we are looking for 'foo', but in this
case the 'foo@plt' will not match due to the '@plt' suffix.

And so, when the user asks for a breakpoint in 'foo', and the language
is C, search_minsyms_for_name, returns a single msymbol, the mst_text
symbol for foo in the shared library, while, when the language is C++,
we get two results, '_Z3foov' for the shared library function, and
'_Z3foov@plt' for the plt entry in the main executable.

I propose to fix this in strncmp_iw_with_mode.  When the mode is
MATCH_PARAMS, instead of stopping at a '(' and assuming the match is a
success, GDB will instead search forward for the matching, closing,
')', effectively skipping the parameter list, and then resume
matching.  Thus, when comparing 'foo' to 'foo()@plt' GDB will
effectively compare against 'foo@plt' (skipping the parameter list),
and the match will fail, just as it does when the language is C.

There is one slight complication, which is revealed by the test
gdb.linespec/cpcompletion.exp, when searching for the symbol of a
const member function, the demangled symbol will have 'const' at the
end of its name, e.g.:

  struct_with_const_overload::const_overload_fn() const

Previously, the matching would stop at the '(' character, but after my
change the whole '()' is skipped, and the match resumes.  As a result,
the 'const' modifier results in a failure to match, when previously
GDB would have found a match.

To work around this issue, in strncmp_iw_with_mode, for language C++
and mode MATCH_PARAMS, I explicitly allow a trailing 'const' to be
skipped.

With these changes in place I now see GDB correctly setting a
breakpoint only at the implementation of 'foo' in the shared library.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=20091
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17201
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17071
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17199
---
 .../gdb.cp/breakpoint-shlib-func-lib.cc       | 21 +++++
 gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc | 22 ++++++
 .../gdb.cp/breakpoint-shlib-func.exp          | 78 +++++++++++++++++++
 gdb/utils.c                                   | 26 ++++++-
 4 files changed, 146 insertions(+), 1 deletion(-)
 create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
 create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
 create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp

diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
new file mode 100644
index 00000000000..7219f7c5a23
--- /dev/null
+++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
@@ -0,0 +1,21 @@
+/* Copyright 2022-2023 Free Software Foundation, Inc.
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+extern int foo ();
+
+int
+foo ()
+{
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
new file mode 100644
index 00000000000..a86d06560d4
--- /dev/null
+++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
@@ -0,0 +1,22 @@
+/* Copyright 2022-2023 Free Software Foundation, Inc.
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+extern int foo ();
+
+int
+main ()
+{
+  int answer = foo ();
+  return answer;
+}
diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp
new file mode 100644
index 00000000000..5d5c87d0b51
--- /dev/null
+++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp
@@ -0,0 +1,78 @@
+# Copyright 2022-2023 Free Software Foundation, Inc.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Places a breakpoint on a function in a shared library before the
+# inferior has started.  GDB will place the breakpoint on the @plt
+# symbol in the main executable.
+#
+# When the inferior is started GDB will re-evaluate the breakpoint
+# location and move the breakpoint to the function implementation in
+# the shared library.
+#
+# Then, with the inferior started, delete all breakpoints, and
+# re-create the breakpoint on the shared library function, GDB should
+# place a single breakpoint on the function implementation in the
+# shared library.
+
+require allow_shlib_tests
+
+standard_testfile .cc -lib.cc
+
+set libobj [standard_output_file libfoo.so]
+if {[build_executable "build shared library" $libobj $srcfile2 \
+	 {debug c++ shlib}] != 0} {
+    return -1
+}
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
+	 [list debug c++ shlib=$libobj]]} {
+    return -1
+}
+
+# Place the breakpoint before the shared library has been loaded, the
+# breakpoint should be placed on the @plt symbol.
+gdb_test "break foo" "Breakpoint $decimal at $hex"
+gdb_test "info breakpoints" "<foo\\(\\)@plt>"
+
+# This is used as an override for delete_breakpoints when we don't
+# want functions in gdb.exp to delete breakpoints behind the scenes
+# for us.
+proc do_not_delete_breakpoints {} {
+    # Just do nothing.
+}
+
+# Runto main, but don't delete all the breakpoints.
+with_override delete_breakpoints do_not_delete_breakpoints {
+    if {![runto_main]} {
+	return -1
+    }
+}
+
+# The breakpoint should now be showing in `foo` for real.
+gdb_test "info breakpoints" \
+    "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+\r\n.*" \
+    "check breakpoints after starting the inferior"
+
+# Now we can delete the breakpoints.
+delete_breakpoints
+
+# And recreate the foo breakpoint, we should only get one location,
+# the actual location.
+gdb_test "break foo" "Breakpoint $decimal at \[^\r\n\]+" \
+    "recreate foo breakpoint"
+
+# Check the breakpoint was recreated correctly.
+gdb_test "info breakpoints" \
+    "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+" \
+    "check breakpoints after recreation"
diff --git a/gdb/utils.c b/gdb/utils.c
index 734c5bf7f70..e64fec941f1 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -2397,7 +2397,31 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
 	  return 0;
 	}
       else
-	return (*string1 != '\0' && *string1 != '(');
+	{
+	  if (*string1 == '(')
+	    {
+	      int p_count = 0;
+
+	      do
+		{
+		  if (*string1 == '(')
+		    ++p_count;
+		  else if (*string1 == ')')
+		    --p_count;
+		  ++string1;
+		}
+	      while (*string1 != '\0' && p_count > 0);
+
+	      /* There maybe things like 'const' after the parameters,
+		 which we do want to ignore.  However, if there's an '@'
+		 then this likely indicates something like '@plt' which we
+		 should not ignore.  */
+	      return *string1 == '@';
+	    }
+
+	  return *string1 == '\0' ? 0 : 1;
+	}
+
     }
   else
     return 1;

base-commit: 7d02a94c8f74613edb0cdde11982b22eaaa9affe
-- 
2.25.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv2] gdb/c++: fix handling of breakpoints on @plt symbols
  2023-01-20 11:48 ` [PATCHv2] " Andrew Burgess
@ 2023-01-30 15:13   ` Andrew Burgess
  2023-02-08  9:36   ` Bruno Larsen
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Andrew Burgess @ 2023-01-30 15:13 UTC (permalink / raw)
  To: gdb-patches


Ping.

Andrew Burgess <aburgess@redhat.com> writes:

> Changes since v1:
>
>   - Rebased to current HEAD of master,
>
>   - Updated test to use 'require allow_shlib_tests', and wrapped a few
>     long lines.  Also Copyright date ranges have been updated for 2023.
>
> ---
>
> This commit should fix PR gdb/20091, PR gdb/17201, and PR gdb/17071.
> Additionally, PR gdb/17199 relates to this area of code, but is more
> of a request to refactor some parts of GDB, this commit does not
> address that request, but it is probably worth reading that PR when
> looking at this commit.
>
> When the current language is C++, and the user places a breakpoint on
> a function in a shared library, GDB will currently find two locations
> for the breakpoint, one location will be within the function itself as
> we would expect, but the other location will be within the PLT table
> for the call to the named function.  Consider this session:
>
>   $ gdb -q /tmp/breakpoint-shlib-func
>   Reading symbols from /tmp/breakpoint-shlib-func...
>   (gdb) start
>   Temporary breakpoint 1 at 0x40112e: file /tmp/breakpoint-shlib-func.cc, line 20.
>   Starting program: /tmp/breakpoint-shlib-func
>
>   Temporary breakpoint 1, main () at /tmp/breakpoint-shlib-func.cc:20
>   20	  int answer = foo ();
>   (gdb) break foo
>   Breakpoint 2 at 0x401030 (2 locations)
>   (gdb) info breakpoints
>   Num     Type           Disp Enb Address            What
>   2       breakpoint     keep y   <MULTIPLE>
>   2.1                         y   0x0000000000401030 <foo()@plt>
>   2.2                         y   0x00007ffff7fc50fd in foo() at /tmp/breakpoint-shlib-func-lib.cc:20
>
> This is not the expected behaviour.  If we compile the same test using
> a C compiler then we see this:
>
>   (gdb) break foo
>   Breakpoint 2 at 0x7ffff7fc50fd: file /tmp/breakpoint-shlib-func-c-lib.c, line 20.
>   (gdb) info breakpoints
>   Num     Type           Disp Enb Address            What
>   2       breakpoint     keep y   0x00007ffff7fc50fd in foo at /tmp/breakpoint-shlib-func-c-lib.c:20
>
> Here's what's happening.  When GDB parses the symbols in the main
> executable and the shared library we see a number of different symbols
> for foo, and use these to create entries in GDB's msymbol table:
>
>   - In the main executable we see a symbol 'foo@plt' that points at
>     the plt entry for foo, from this we add two entries into GDB's
>     msymbol table, one called 'foo@plt' which points at the plt entry
>     and has type mst_text, then we create a second symbol, this time
>     called 'foo' with type mst_solib_trampoline which also points at
>     the plt entry,
>
>   - Then, when the shared library is loaded we see another symbol
>     called 'foo', this one points at the actual implementation in the
>     shared library.  This time GDB creates a msymbol called 'foo' with
>     type mst_text that points at the implementation.
>
> This means that GDB creates 3 msymbols to represent the 2 symbols
> found in the executable and shared library.
>
> When the user creates a breakpoint on 'foo' GDB eventually ends up in
> search_minsyms_for_name (linespec.c), this function then calls
> iterate_over_minimal_symbols passing in the name we are looking for
> wrapped in a lookup_name_info object.
>
> In iterate_over_minimal_symbols we iterate over two hash tables (using
> the name we're looking for as the hash key), first we walk the hash
> table of symbol linkage names, then we walk the hash table of
> demangled symbol names.
>
> When the language is C++ the symbols for 'foo' will all have been
> mangled, as a result, in this case, the iteration of the linkage name
> hash table will find no matching results.
>
> However, when we walk the demangled hash table we do find some
> results.  In order to match symbol names, GDB obtains a symbol name
> matching function by calling the get_symbol_name_matcher method on the
> language_defn class.  For C++, in this case, the matching function we
> use is cp_fq_symbol_name_matches, which delegates the work to
> strncmp_iw_with_mode with mode strncmp_iw_mode::MATCH_PARAMS and
> language set to language_cplus.
>
> The strncmp_iw_mode::MATCH_PARAMS mode means that strncmp_iw_mode will
> skip any parameters in the demangled symbol name when checking for a
> match, e.g. 'foo' will match the demangled name 'foo()'.  The way this
> is done is that the strings are matched character by character, but,
> once the string we are looking for ('foo' here) is exhausted, if we
> are looking at '(' then we consider the match a success.
>
> Lets consider the 3 symbols GDB created.  If the function declaration
> is 'void foo ()' then from the main executable we added symbols
> '_Z3foov@plt' and '_Z3foov', while from the shared library we added
> another symbol call '_Z3foov'.  When these are demangled they become
> 'foo()@plt', 'foo', and 'foo' respectively.
>
> Now, the '_Z3foov' symbol from the main executable has the type
> mst_solib_trampoline, and in search_minsyms_for_name, we search for
> any symbols of type mst_solib_trampoline and filter these out of the
> results.
>
> However, the '_Z3foov@plt' symbol (from the main executable), and the
> '_Z3foov' symbol (from the shared library) both have type mst_text.
>
> During the demangled name matching, due to the use of MATCH_PARAMS
> mode, we stop the comparison as soon as we hit a '(' in the demangled
> name.  And so, '_Z3foov@plt', which demangles to 'foo()@plt' matches
> 'foo', and '_Z3foov', which demangles to 'foo()' also matches 'foo'.
>
> By contrast, for C, there are no demangled hash table entries to be
> iterated over (in iterate_over_minimal_symbols), we only consider the
> linkage name symbols which are 'foo@plt' and 'foo'.  The plain 'foo'
> symbol obviously matches when we are looking for 'foo', but in this
> case the 'foo@plt' will not match due to the '@plt' suffix.
>
> And so, when the user asks for a breakpoint in 'foo', and the language
> is C, search_minsyms_for_name, returns a single msymbol, the mst_text
> symbol for foo in the shared library, while, when the language is C++,
> we get two results, '_Z3foov' for the shared library function, and
> '_Z3foov@plt' for the plt entry in the main executable.
>
> I propose to fix this in strncmp_iw_with_mode.  When the mode is
> MATCH_PARAMS, instead of stopping at a '(' and assuming the match is a
> success, GDB will instead search forward for the matching, closing,
> ')', effectively skipping the parameter list, and then resume
> matching.  Thus, when comparing 'foo' to 'foo()@plt' GDB will
> effectively compare against 'foo@plt' (skipping the parameter list),
> and the match will fail, just as it does when the language is C.
>
> There is one slight complication, which is revealed by the test
> gdb.linespec/cpcompletion.exp, when searching for the symbol of a
> const member function, the demangled symbol will have 'const' at the
> end of its name, e.g.:
>
>   struct_with_const_overload::const_overload_fn() const
>
> Previously, the matching would stop at the '(' character, but after my
> change the whole '()' is skipped, and the match resumes.  As a result,
> the 'const' modifier results in a failure to match, when previously
> GDB would have found a match.
>
> To work around this issue, in strncmp_iw_with_mode, for language C++
> and mode MATCH_PARAMS, I explicitly allow a trailing 'const' to be
> skipped.
>
> With these changes in place I now see GDB correctly setting a
> breakpoint only at the implementation of 'foo' in the shared library.
>
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=20091
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17201
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17071
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17199
> ---
>  .../gdb.cp/breakpoint-shlib-func-lib.cc       | 21 +++++
>  gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc | 22 ++++++
>  .../gdb.cp/breakpoint-shlib-func.exp          | 78 +++++++++++++++++++
>  gdb/utils.c                                   | 26 ++++++-
>  4 files changed, 146 insertions(+), 1 deletion(-)
>  create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
>  create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
>  create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp
>
> diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
> new file mode 100644
> index 00000000000..7219f7c5a23
> --- /dev/null
> +++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
> @@ -0,0 +1,21 @@
> +/* Copyright 2022-2023 Free Software Foundation, Inc.
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +extern int foo ();
> +
> +int
> +foo ()
> +{
> +  return 0;
> +}
> diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
> new file mode 100644
> index 00000000000..a86d06560d4
> --- /dev/null
> +++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
> @@ -0,0 +1,22 @@
> +/* Copyright 2022-2023 Free Software Foundation, Inc.
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +extern int foo ();
> +
> +int
> +main ()
> +{
> +  int answer = foo ();
> +  return answer;
> +}
> diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp
> new file mode 100644
> index 00000000000..5d5c87d0b51
> --- /dev/null
> +++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp
> @@ -0,0 +1,78 @@
> +# Copyright 2022-2023 Free Software Foundation, Inc.
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Places a breakpoint on a function in a shared library before the
> +# inferior has started.  GDB will place the breakpoint on the @plt
> +# symbol in the main executable.
> +#
> +# When the inferior is started GDB will re-evaluate the breakpoint
> +# location and move the breakpoint to the function implementation in
> +# the shared library.
> +#
> +# Then, with the inferior started, delete all breakpoints, and
> +# re-create the breakpoint on the shared library function, GDB should
> +# place a single breakpoint on the function implementation in the
> +# shared library.
> +
> +require allow_shlib_tests
> +
> +standard_testfile .cc -lib.cc
> +
> +set libobj [standard_output_file libfoo.so]
> +if {[build_executable "build shared library" $libobj $srcfile2 \
> +	 {debug c++ shlib}] != 0} {
> +    return -1
> +}
> +
> +if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
> +	 [list debug c++ shlib=$libobj]]} {
> +    return -1
> +}
> +
> +# Place the breakpoint before the shared library has been loaded, the
> +# breakpoint should be placed on the @plt symbol.
> +gdb_test "break foo" "Breakpoint $decimal at $hex"
> +gdb_test "info breakpoints" "<foo\\(\\)@plt>"
> +
> +# This is used as an override for delete_breakpoints when we don't
> +# want functions in gdb.exp to delete breakpoints behind the scenes
> +# for us.
> +proc do_not_delete_breakpoints {} {
> +    # Just do nothing.
> +}
> +
> +# Runto main, but don't delete all the breakpoints.
> +with_override delete_breakpoints do_not_delete_breakpoints {
> +    if {![runto_main]} {
> +	return -1
> +    }
> +}
> +
> +# The breakpoint should now be showing in `foo` for real.
> +gdb_test "info breakpoints" \
> +    "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+\r\n.*" \
> +    "check breakpoints after starting the inferior"
> +
> +# Now we can delete the breakpoints.
> +delete_breakpoints
> +
> +# And recreate the foo breakpoint, we should only get one location,
> +# the actual location.
> +gdb_test "break foo" "Breakpoint $decimal at \[^\r\n\]+" \
> +    "recreate foo breakpoint"
> +
> +# Check the breakpoint was recreated correctly.
> +gdb_test "info breakpoints" \
> +    "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+" \
> +    "check breakpoints after recreation"
> diff --git a/gdb/utils.c b/gdb/utils.c
> index 734c5bf7f70..e64fec941f1 100644
> --- a/gdb/utils.c
> +++ b/gdb/utils.c
> @@ -2397,7 +2397,31 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
>  	  return 0;
>  	}
>        else
> -	return (*string1 != '\0' && *string1 != '(');
> +	{
> +	  if (*string1 == '(')
> +	    {
> +	      int p_count = 0;
> +
> +	      do
> +		{
> +		  if (*string1 == '(')
> +		    ++p_count;
> +		  else if (*string1 == ')')
> +		    --p_count;
> +		  ++string1;
> +		}
> +	      while (*string1 != '\0' && p_count > 0);
> +
> +	      /* There maybe things like 'const' after the parameters,
> +		 which we do want to ignore.  However, if there's an '@'
> +		 then this likely indicates something like '@plt' which we
> +		 should not ignore.  */
> +	      return *string1 == '@';
> +	    }
> +
> +	  return *string1 == '\0' ? 0 : 1;
> +	}
> +
>      }
>    else
>      return 1;
>
> base-commit: 7d02a94c8f74613edb0cdde11982b22eaaa9affe
> -- 
> 2.25.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv2] gdb/c++: fix handling of breakpoints on @plt symbols
  2023-01-20 11:48 ` [PATCHv2] " Andrew Burgess
  2023-01-30 15:13   ` Andrew Burgess
@ 2023-02-08  9:36   ` Bruno Larsen
  2023-02-10 19:09     ` Andrew Burgess
  2023-02-08 22:22   ` Simon Marchi
  2023-02-10 19:03   ` [PATCHv3] " Andrew Burgess
  3 siblings, 1 reply; 12+ messages in thread
From: Bruno Larsen @ 2023-02-08  9:36 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches

On 20/01/2023 12:48, Andrew Burgess via Gdb-patches wrote:
> Changes since v1:
>
>    - Rebased to current HEAD of master,
>
>    - Updated test to use 'require allow_shlib_tests', and wrapped a few
>      long lines.  Also Copyright date ranges have been updated for 2023.
>
> ---

Hi Andrew,

I've tested this patch and verified that it doesn't introduce any 
regressions, just a few minor comments inlined below.

Tested-By: Bruno Larsen <blarsen@redhat.com>

>
> This commit should fix PR gdb/20091, PR gdb/17201, and PR gdb/17071.
> Additionally, PR gdb/17199 relates to this area of code, but is more
> of a request to refactor some parts of GDB, this commit does not
> address that request, but it is probably worth reading that PR when
> looking at this commit.
>
> When the current language is C++, and the user places a breakpoint on
> a function in a shared library, GDB will currently find two locations
> for the breakpoint, one location will be within the function itself as
> we would expect, but the other location will be within the PLT table
> for the call to the named function.  Consider this session:
>
>    $ gdb -q /tmp/breakpoint-shlib-func
>    Reading symbols from /tmp/breakpoint-shlib-func...
>    (gdb) start
>    Temporary breakpoint 1 at 0x40112e: file /tmp/breakpoint-shlib-func.cc, line 20.
>    Starting program: /tmp/breakpoint-shlib-func
>
>    Temporary breakpoint 1, main () at /tmp/breakpoint-shlib-func.cc:20
>    20	  int answer = foo ();
>    (gdb) break foo
>    Breakpoint 2 at 0x401030 (2 locations)
>    (gdb) info breakpoints
>    Num     Type           Disp Enb Address            What
>    2       breakpoint     keep y   <MULTIPLE>
>    2.1                         y   0x0000000000401030 <foo()@plt>
>    2.2                         y   0x00007ffff7fc50fd in foo() at /tmp/breakpoint-shlib-func-lib.cc:20
>
> This is not the expected behaviour.  If we compile the same test using
> a C compiler then we see this:
>
>    (gdb) break foo
>    Breakpoint 2 at 0x7ffff7fc50fd: file /tmp/breakpoint-shlib-func-c-lib.c, line 20.
>    (gdb) info breakpoints
>    Num     Type           Disp Enb Address            What
>    2       breakpoint     keep y   0x00007ffff7fc50fd in foo at /tmp/breakpoint-shlib-func-c-lib.c:20
>
> Here's what's happening.  When GDB parses the symbols in the main
> executable and the shared library we see a number of different symbols
> for foo, and use these to create entries in GDB's msymbol table:
>
>    - In the main executable we see a symbol 'foo@plt' that points at
>      the plt entry for foo, from this we add two entries into GDB's
>      msymbol table, one called 'foo@plt' which points at the plt entry
>      and has type mst_text, then we create a second symbol, this time
>      called 'foo' with type mst_solib_trampoline which also points at
>      the plt entry,
>
>    - Then, when the shared library is loaded we see another symbol
>      called 'foo', this one points at the actual implementation in the
>      shared library.  This time GDB creates a msymbol called 'foo' with
>      type mst_text that points at the implementation.
>
> This means that GDB creates 3 msymbols to represent the 2 symbols
> found in the executable and shared library.
>
> When the user creates a breakpoint on 'foo' GDB eventually ends up in
> search_minsyms_for_name (linespec.c), this function then calls
> iterate_over_minimal_symbols passing in the name we are looking for
> wrapped in a lookup_name_info object.
>
> In iterate_over_minimal_symbols we iterate over two hash tables (using
> the name we're looking for as the hash key), first we walk the hash
> table of symbol linkage names, then we walk the hash table of
> demangled symbol names.
>
> When the language is C++ the symbols for 'foo' will all have been
> mangled, as a result, in this case, the iteration of the linkage name
> hash table will find no matching results.
>
> However, when we walk the demangled hash table we do find some
> results.  In order to match symbol names, GDB obtains a symbol name
> matching function by calling the get_symbol_name_matcher method on the
> language_defn class.  For C++, in this case, the matching function we
> use is cp_fq_symbol_name_matches, which delegates the work to
> strncmp_iw_with_mode with mode strncmp_iw_mode::MATCH_PARAMS and
> language set to language_cplus.
>
> The strncmp_iw_mode::MATCH_PARAMS mode means that strncmp_iw_mode will
> skip any parameters in the demangled symbol name when checking for a
> match, e.g. 'foo' will match the demangled name 'foo()'.  The way this
> is done is that the strings are matched character by character, but,
> once the string we are looking for ('foo' here) is exhausted, if we
> are looking at '(' then we consider the match a success.
>
> Lets consider the 3 symbols GDB created.  If the function declaration
> is 'void foo ()' then from the main executable we added symbols
> '_Z3foov@plt' and '_Z3foov', while from the shared library we added
> another symbol call '_Z3foov'.  When these are demangled they become
> 'foo()@plt', 'foo', and 'foo' respectively.
>
> Now, the '_Z3foov' symbol from the main executable has the type
> mst_solib_trampoline, and in search_minsyms_for_name, we search for
> any symbols of type mst_solib_trampoline and filter these out of the
> results.
>
> However, the '_Z3foov@plt' symbol (from the main executable), and the
> '_Z3foov' symbol (from the shared library) both have type mst_text.
>
> During the demangled name matching, due to the use of MATCH_PARAMS
> mode, we stop the comparison as soon as we hit a '(' in the demangled
> name.  And so, '_Z3foov@plt', which demangles to 'foo()@plt' matches
> 'foo', and '_Z3foov', which demangles to 'foo()' also matches 'foo'.
>
> By contrast, for C, there are no demangled hash table entries to be
> iterated over (in iterate_over_minimal_symbols), we only consider the
> linkage name symbols which are 'foo@plt' and 'foo'.  The plain 'foo'
> symbol obviously matches when we are looking for 'foo', but in this
> case the 'foo@plt' will not match due to the '@plt' suffix.
>
> And so, when the user asks for a breakpoint in 'foo', and the language
> is C, search_minsyms_for_name, returns a single msymbol, the mst_text
> symbol for foo in the shared library, while, when the language is C++,
> we get two results, '_Z3foov' for the shared library function, and
> '_Z3foov@plt' for the plt entry in the main executable.
>
> I propose to fix this in strncmp_iw_with_mode.  When the mode is
> MATCH_PARAMS, instead of stopping at a '(' and assuming the match is a
> success, GDB will instead search forward for the matching, closing,
> ')', effectively skipping the parameter list, and then resume
> matching.  Thus, when comparing 'foo' to 'foo()@plt' GDB will
> effectively compare against 'foo@plt' (skipping the parameter list),
> and the match will fail, just as it does when the language is C.
>
> There is one slight complication, which is revealed by the test
> gdb.linespec/cpcompletion.exp, when searching for the symbol of a
> const member function, the demangled symbol will have 'const' at the
> end of its name, e.g.:
>
>    struct_with_const_overload::const_overload_fn() const
>
> Previously, the matching would stop at the '(' character, but after my
> change the whole '()' is skipped, and the match resumes.  As a result,
> the 'const' modifier results in a failure to match, when previously
> GDB would have found a match.
>
> To work around this issue, in strncmp_iw_with_mode, for language C++
> and mode MATCH_PARAMS, I explicitly allow a trailing 'const' to be
> skipped.

This explanation is the inverse of what the code does. What you actually 
implemented is that if an @ is found, you assume it is @plt, rather than 
allowing for "const".

>
> With these changes in place I now see GDB correctly setting a
> breakpoint only at the implementation of 'foo' in the shared library.
>
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=20091
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17201
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17071
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17199
> ---
>   .../gdb.cp/breakpoint-shlib-func-lib.cc       | 21 +++++
>   gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc | 22 ++++++
>   .../gdb.cp/breakpoint-shlib-func.exp          | 78 +++++++++++++++++++
>   gdb/utils.c                                   | 26 ++++++-
>   4 files changed, 146 insertions(+), 1 deletion(-)
>   create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
>   create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
>   create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp
>
> diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
> new file mode 100644
> index 00000000000..7219f7c5a23
> --- /dev/null
> +++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
> @@ -0,0 +1,21 @@
> +/* Copyright 2022-2023 Free Software Foundation, Inc.
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +extern int foo ();
> +
> +int
> +foo ()
> +{
> +  return 0;
> +}
> diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
> new file mode 100644
> index 00000000000..a86d06560d4
> --- /dev/null
> +++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
> @@ -0,0 +1,22 @@
> +/* Copyright 2022-2023 Free Software Foundation, Inc.
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +extern int foo ();
> +
> +int
> +main ()
> +{
> +  int answer = foo ();
> +  return answer;
> +}
> diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp
> new file mode 100644
> index 00000000000..5d5c87d0b51
> --- /dev/null
> +++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp
> @@ -0,0 +1,78 @@
> +# Copyright 2022-2023 Free Software Foundation, Inc.
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Places a breakpoint on a function in a shared library before the
> +# inferior has started.  GDB will place the breakpoint on the @plt
> +# symbol in the main executable.
> +#
> +# When the inferior is started GDB will re-evaluate the breakpoint
> +# location and move the breakpoint to the function implementation in
> +# the shared library.
> +#
> +# Then, with the inferior started, delete all breakpoints, and
> +# re-create the breakpoint on the shared library function, GDB should
> +# place a single breakpoint on the function implementation in the
> +# shared library.
> +
> +require allow_shlib_tests
> +
> +standard_testfile .cc -lib.cc
> +
> +set libobj [standard_output_file libfoo.so]
> +if {[build_executable "build shared library" $libobj $srcfile2 \
> +	 {debug c++ shlib}] != 0} {
> +    return -1
> +}
> +
> +if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
> +	 [list debug c++ shlib=$libobj]]} {
> +    return -1
> +}
> +
> +# Place the breakpoint before the shared library has been loaded, the
> +# breakpoint should be placed on the @plt symbol.
> +gdb_test "break foo" "Breakpoint $decimal at $hex"
> +gdb_test "info breakpoints" "<foo\\(\\)@plt>"
> +
> +# This is used as an override for delete_breakpoints when we don't
> +# want functions in gdb.exp to delete breakpoints behind the scenes
> +# for us.
> +proc do_not_delete_breakpoints {} {
> +    # Just do nothing.
> +}
> +
> +# Runto main, but don't delete all the breakpoints.
> +with_override delete_breakpoints do_not_delete_breakpoints {
> +    if {![runto_main]} {
> +	return -1
> +    }
> +}
> +
> +# The breakpoint should now be showing in `foo` for real.
> +gdb_test "info breakpoints" \
> +    "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+\r\n.*" \
> +    "check breakpoints after starting the inferior"
> +
> +# Now we can delete the breakpoints.
> +delete_breakpoints
> +
> +# And recreate the foo breakpoint, we should only get one location,
> +# the actual location.
> +gdb_test "break foo" "Breakpoint $decimal at \[^\r\n\]+" \
> +    "recreate foo breakpoint"
> +
> +# Check the breakpoint was recreated correctly.
> +gdb_test "info breakpoints" \
> +    "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+" \
> +    "check breakpoints after recreation"
> diff --git a/gdb/utils.c b/gdb/utils.c
> index 734c5bf7f70..e64fec941f1 100644
> --- a/gdb/utils.c
> +++ b/gdb/utils.c
> @@ -2397,7 +2397,31 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
>   	  return 0;
>   	}
>         else
> -	return (*string1 != '\0' && *string1 != '(');
> +	{
> +	  if (*string1 == '(')
> +	    {
> +	      int p_count = 0;
> +
> +	      do
> +		{
> +		  if (*string1 == '(')
> +		    ++p_count;
> +		  else if (*string1 == ')')
> +		    --p_count;
> +		  ++string1;
> +		}
> +	      while (*string1 != '\0' && p_count > 0);
> +
> +	      /* There maybe things like 'const' after the parameters,
> +		 which we do want to ignore.  However, if there's an '@'
> +		 then this likely indicates something like '@plt' which we
> +		 should not ignore.  */
> +	      return *string1 == '@';
> +	    }
> +
> +	  return *string1 == '\0' ? 0 : 1;
> +	}
> +
In here you make the assumption that string1 will always be the one from 
the symbol table and string2 wil always be the one from the user. While 
it seems like a fine assumption to make, since it was there already, but 
it would be nice if this assumption was documented in the comment somewhere.
>       }
>     else
>       return 1;
>
> base-commit: 7d02a94c8f74613edb0cdde11982b22eaaa9affe


-- 
Cheers,
Bruno


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv2] gdb/c++: fix handling of breakpoints on @plt symbols
  2023-01-20 11:48 ` [PATCHv2] " Andrew Burgess
  2023-01-30 15:13   ` Andrew Burgess
  2023-02-08  9:36   ` Bruno Larsen
@ 2023-02-08 22:22   ` Simon Marchi
  2023-02-10 19:10     ` Andrew Burgess
  2023-02-10 19:03   ` [PATCHv3] " Andrew Burgess
  3 siblings, 1 reply; 12+ messages in thread
From: Simon Marchi @ 2023-02-08 22:22 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches



> Lets consider the 3 symbols GDB created.  If the function declaration
> is 'void foo ()' then from the main executable we added symbols
> '_Z3foov@plt' and '_Z3foov', while from the shared library we added
> another symbol call '_Z3foov'.  When these are demangled they become
> 'foo()@plt', 'foo', and 'foo' respectively.

I think the last two should be 'foo()'?

I have only looked at the code, not the test.  That part LGTM.  I was
wondering if you could add some unit tests for strncmp_iw_with_mode for
this case though.  There are already some just below the implementation.

Simon

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCHv3] gdb/c++: fix handling of breakpoints on @plt symbols
  2023-01-20 11:48 ` [PATCHv2] " Andrew Burgess
                     ` (2 preceding siblings ...)
  2023-02-08 22:22   ` Simon Marchi
@ 2023-02-10 19:03   ` Andrew Burgess
  2023-02-10 20:51     ` Simon Marchi
  3 siblings, 1 reply; 12+ messages in thread
From: Andrew Burgess @ 2023-02-10 19:03 UTC (permalink / raw)
  To: gdb-patches; +Cc: Simon Marchi, Bruno Larsen, Andrew Burgess

Changes since v2:

  - Reworded the commit message based on Bruno's feedback,

  - Fixed some missing '()' in the commit message based on Simon's
    feedback,

  - Added some selftests,

  - Rebased.

Changes since v1:

  - Rebased to current HEAD of master,

  - Updated test to use 'require allow_shlib_tests', and wrapped a few
    long lines.  Also Copyright date ranges have been updated for 2023.

---

This commit should fix PR gdb/20091, PR gdb/17201, and PR gdb/17071.
Additionally, PR gdb/17199 relates to this area of code, but is more
of a request to refactor some parts of GDB, this commit does not
address that request, but it is probably worth reading that PR when
looking at this commit.

When the current language is C++, and the user places a breakpoint on
a function in a shared library, GDB will currently find two locations
for the breakpoint, one location will be within the function itself as
we would expect, but the other location will be within the PLT table
for the call to the named function.  Consider this session:

  $ gdb -q /tmp/breakpoint-shlib-func
  Reading symbols from /tmp/breakpoint-shlib-func...
  (gdb) start
  Temporary breakpoint 1 at 0x40112e: file /tmp/breakpoint-shlib-func.cc, line 20.
  Starting program: /tmp/breakpoint-shlib-func

  Temporary breakpoint 1, main () at /tmp/breakpoint-shlib-func.cc:20
  20	  int answer = foo ();
  (gdb) break foo
  Breakpoint 2 at 0x401030 (2 locations)
  (gdb) info breakpoints
  Num     Type           Disp Enb Address            What
  2       breakpoint     keep y   <MULTIPLE>
  2.1                         y   0x0000000000401030 <foo()@plt>
  2.2                         y   0x00007ffff7fc50fd in foo() at /tmp/breakpoint-shlib-func-lib.cc:20

This is not the expected behaviour.  If we compile the same test using
a C compiler then we see this:

  (gdb) break foo
  Breakpoint 2 at 0x7ffff7fc50fd: file /tmp/breakpoint-shlib-func-c-lib.c, line 20.
  (gdb) info breakpoints
  Num     Type           Disp Enb Address            What
  2       breakpoint     keep y   0x00007ffff7fc50fd in foo at /tmp/breakpoint-shlib-func-c-lib.c:20

Here's what's happening.  When GDB parses the symbols in the main
executable and the shared library we see a number of different symbols
for foo, and use these to create entries in GDB's msymbol table:

  - In the main executable we see a symbol 'foo@plt' that points at
    the plt entry for foo, from this we add two entries into GDB's
    msymbol table, one called 'foo@plt' which points at the plt entry
    and has type mst_text, then we create a second symbol, this time
    called 'foo' with type mst_solib_trampoline which also points at
    the plt entry,

  - Then, when the shared library is loaded we see another symbol
    called 'foo', this one points at the actual implementation in the
    shared library.  This time GDB creates a msymbol called 'foo' with
    type mst_text that points at the implementation.

This means that GDB creates 3 msymbols to represent the 2 symbols
found in the executable and shared library.

When the user creates a breakpoint on 'foo' GDB eventually ends up in
search_minsyms_for_name (linespec.c), this function then calls
iterate_over_minimal_symbols passing in the name we are looking for
wrapped in a lookup_name_info object.

In iterate_over_minimal_symbols we iterate over two hash tables (using
the name we're looking for as the hash key), first we walk the hash
table of symbol linkage names, then we walk the hash table of
demangled symbol names.

When the language is C++ the symbols for 'foo' will all have been
mangled, as a result, in this case, the iteration of the linkage name
hash table will find no matching results.

However, when we walk the demangled hash table we do find some
results.  In order to match symbol names, GDB obtains a symbol name
matching function by calling the get_symbol_name_matcher method on the
language_defn class.  For C++, in this case, the matching function we
use is cp_fq_symbol_name_matches, which delegates the work to
strncmp_iw_with_mode with mode strncmp_iw_mode::MATCH_PARAMS and
language set to language_cplus.

The strncmp_iw_mode::MATCH_PARAMS mode means that strncmp_iw_mode will
skip any parameters in the demangled symbol name when checking for a
match, e.g. 'foo' will match the demangled name 'foo()'.  The way this
is done is that the strings are matched character by character, but,
once the string we are looking for ('foo' here) is exhausted, if we
are looking at '(' then we consider the match a success.

Lets consider the 3 symbols GDB created.  If the function declaration
is 'void foo ()' then from the main executable we added symbols
'_Z3foov@plt' and '_Z3foov', while from the shared library we added
another symbol call '_Z3foov'.  When these are demangled they become
'foo()@plt', 'foo()', and 'foo()' respectively.

Now, the '_Z3foov' symbol from the main executable has the type
mst_solib_trampoline, and in search_minsyms_for_name, we search for
any symbols of type mst_solib_trampoline and filter these out of the
results.

However, the '_Z3foov@plt' symbol (from the main executable), and the
'_Z3foov' symbol (from the shared library) both have type mst_text.

During the demangled name matching, due to the use of MATCH_PARAMS
mode, we stop the comparison as soon as we hit a '(' in the demangled
name.  And so, '_Z3foov@plt', which demangles to 'foo()@plt' matches
'foo', and '_Z3foov', which demangles to 'foo()' also matches 'foo'.

By contrast, for C, there are no demangled hash table entries to be
iterated over (in iterate_over_minimal_symbols), we only consider the
linkage name symbols which are 'foo@plt' and 'foo'.  The plain 'foo'
symbol obviously matches when we are looking for 'foo', but in this
case the 'foo@plt' will not match due to the '@plt' suffix.

And so, when the user asks for a breakpoint in 'foo', and the language
is C, search_minsyms_for_name, returns a single msymbol, the mst_text
symbol for foo in the shared library, while, when the language is C++,
we get two results, '_Z3foov' for the shared library function, and
'_Z3foov@plt' for the plt entry in the main executable.

I propose to fix this in strncmp_iw_with_mode.  When the mode is
MATCH_PARAMS, instead of stopping at a '(' and assuming the match is a
success, GDB will instead search forward for the matching, closing,
')', effectively skipping the parameter list, and then resume
matching.  Thus, when comparing 'foo' to 'foo()@plt' GDB will
effectively compare against 'foo@plt' (skipping the parameter list),
and the match will fail, just as it does when the language is C.

There is one slight complication, which is revealed by the test
gdb.linespec/cpcompletion.exp, when searching for the symbol of a
const member function, the demangled symbol will have 'const' at the
end of its name, e.g.:

  struct_with_const_overload::const_overload_fn() const

Previously, the matching would stop at the '(' character, but after my
change the whole '()' is skipped, and the match resumes.  As a result,
the 'const' modifier results in a failure to match, when previously
GDB would have found a match.

To work around this issue, in strncmp_iw_with_mode, when mode is
MATCH_PARAMS, after skipping the parameter list, if the next character
is '@' then we assume we are looking at something like '@plt' and
return a value indicating the match failed, otherwise, we return a
value indicating the match succeeded, this allows things like 'const'
to be skipped.

With these changes in place I now see GDB correctly setting a
breakpoint only at the implementation of 'foo' in the shared library.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=20091
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17201
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17071
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17199

Tested-By: Bruno Larsen <blarsen@redhat.com>
---
 .../gdb.cp/breakpoint-shlib-func-lib.cc       | 21 +++++
 gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc | 22 ++++++
 .../gdb.cp/breakpoint-shlib-func.exp          | 78 +++++++++++++++++++
 gdb/utils.c                                   | 31 +++++++-
 4 files changed, 151 insertions(+), 1 deletion(-)
 create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
 create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
 create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp

diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
new file mode 100644
index 00000000000..7219f7c5a23
--- /dev/null
+++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
@@ -0,0 +1,21 @@
+/* Copyright 2022-2023 Free Software Foundation, Inc.
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+extern int foo ();
+
+int
+foo ()
+{
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
new file mode 100644
index 00000000000..a86d06560d4
--- /dev/null
+++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
@@ -0,0 +1,22 @@
+/* Copyright 2022-2023 Free Software Foundation, Inc.
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+extern int foo ();
+
+int
+main ()
+{
+  int answer = foo ();
+  return answer;
+}
diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp
new file mode 100644
index 00000000000..5d5c87d0b51
--- /dev/null
+++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp
@@ -0,0 +1,78 @@
+# Copyright 2022-2023 Free Software Foundation, Inc.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Places a breakpoint on a function in a shared library before the
+# inferior has started.  GDB will place the breakpoint on the @plt
+# symbol in the main executable.
+#
+# When the inferior is started GDB will re-evaluate the breakpoint
+# location and move the breakpoint to the function implementation in
+# the shared library.
+#
+# Then, with the inferior started, delete all breakpoints, and
+# re-create the breakpoint on the shared library function, GDB should
+# place a single breakpoint on the function implementation in the
+# shared library.
+
+require allow_shlib_tests
+
+standard_testfile .cc -lib.cc
+
+set libobj [standard_output_file libfoo.so]
+if {[build_executable "build shared library" $libobj $srcfile2 \
+	 {debug c++ shlib}] != 0} {
+    return -1
+}
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
+	 [list debug c++ shlib=$libobj]]} {
+    return -1
+}
+
+# Place the breakpoint before the shared library has been loaded, the
+# breakpoint should be placed on the @plt symbol.
+gdb_test "break foo" "Breakpoint $decimal at $hex"
+gdb_test "info breakpoints" "<foo\\(\\)@plt>"
+
+# This is used as an override for delete_breakpoints when we don't
+# want functions in gdb.exp to delete breakpoints behind the scenes
+# for us.
+proc do_not_delete_breakpoints {} {
+    # Just do nothing.
+}
+
+# Runto main, but don't delete all the breakpoints.
+with_override delete_breakpoints do_not_delete_breakpoints {
+    if {![runto_main]} {
+	return -1
+    }
+}
+
+# The breakpoint should now be showing in `foo` for real.
+gdb_test "info breakpoints" \
+    "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+\r\n.*" \
+    "check breakpoints after starting the inferior"
+
+# Now we can delete the breakpoints.
+delete_breakpoints
+
+# And recreate the foo breakpoint, we should only get one location,
+# the actual location.
+gdb_test "break foo" "Breakpoint $decimal at \[^\r\n\]+" \
+    "recreate foo breakpoint"
+
+# Check the breakpoint was recreated correctly.
+gdb_test "info breakpoints" \
+    "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+" \
+    "check breakpoints after recreation"
diff --git a/gdb/utils.c b/gdb/utils.c
index 95adbe58e4a..76c0df36a80 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -2402,7 +2402,31 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
 	  return 0;
 	}
       else
-	return (*string1 != '\0' && *string1 != '(');
+	{
+	  if (*string1 == '(')
+	    {
+	      int p_count = 0;
+
+	      do
+		{
+		  if (*string1 == '(')
+		    ++p_count;
+		  else if (*string1 == ')')
+		    --p_count;
+		  ++string1;
+		}
+	      while (*string1 != '\0' && p_count > 0);
+
+	      /* There maybe things like 'const' after the parameters,
+		 which we do want to ignore.  However, if there's an '@'
+		 then this likely indicates something like '@plt' which we
+		 should not ignore.  */
+	      return *string1 == '@';
+	    }
+
+	  return *string1 == '\0' ? 0 : 1;
+	}
+
     }
   else
     return 1;
@@ -2934,6 +2958,11 @@ strncmp_iw_with_mode_tests ()
   CHECK_MATCH ("foo[abi:a][abi:b](bar[abi:c][abi:d])", "foo[abi:a][abi:b](bar[abi:c][abi:d])",
 	       MATCH_PARAMS);
   CHECK_MATCH ("foo[abi:a][abi:b](bar[abi:c][abi:d])", "foo", MATCH_PARAMS);
+  CHECK_NO_MATCH ("foo(args)@plt", "foo", MATCH_PARAMS);
+  CHECK_NO_MATCH ("foo((())args(()))@plt", "foo", MATCH_PARAMS);
+  CHECK_MATCH ("foo((())args(()))", "foo", MATCH_PARAMS);
+  CHECK_MATCH ("foo(args) const", "foo", MATCH_PARAMS);
+  CHECK_MATCH ("foo(args)const", "foo", MATCH_PARAMS);
 
   /* strncmp_iw_with_mode also supports case insensitivity.  */
   {

base-commit: 1947a4a4bb7651a4656edceb1f9b246f96f89ebd
-- 
2.25.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv2] gdb/c++: fix handling of breakpoints on @plt symbols
  2023-02-08  9:36   ` Bruno Larsen
@ 2023-02-10 19:09     ` Andrew Burgess
  2023-02-13  9:31       ` Bruno Larsen
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Burgess @ 2023-02-10 19:09 UTC (permalink / raw)
  To: Bruno Larsen, gdb-patches

Bruno Larsen <blarsen@redhat.com> writes:

> On 20/01/2023 12:48, Andrew Burgess via Gdb-patches wrote:
>> Changes since v1:
>>
>>    - Rebased to current HEAD of master,
>>
>>    - Updated test to use 'require allow_shlib_tests', and wrapped a few
>>      long lines.  Also Copyright date ranges have been updated for 2023.
>>
>> ---
>
> Hi Andrew,
>
> I've tested this patch and verified that it doesn't introduce any 
> regressions, just a few minor comments inlined below.
>
> Tested-By: Bruno Larsen <blarsen@redhat.com>
>
>>
>> This commit should fix PR gdb/20091, PR gdb/17201, and PR gdb/17071.
>> Additionally, PR gdb/17199 relates to this area of code, but is more
>> of a request to refactor some parts of GDB, this commit does not
>> address that request, but it is probably worth reading that PR when
>> looking at this commit.
>>
>> When the current language is C++, and the user places a breakpoint on
>> a function in a shared library, GDB will currently find two locations
>> for the breakpoint, one location will be within the function itself as
>> we would expect, but the other location will be within the PLT table
>> for the call to the named function.  Consider this session:
>>
>>    $ gdb -q /tmp/breakpoint-shlib-func
>>    Reading symbols from /tmp/breakpoint-shlib-func...
>>    (gdb) start
>>    Temporary breakpoint 1 at 0x40112e: file /tmp/breakpoint-shlib-func.cc, line 20.
>>    Starting program: /tmp/breakpoint-shlib-func
>>
>>    Temporary breakpoint 1, main () at /tmp/breakpoint-shlib-func.cc:20
>>    20	  int answer = foo ();
>>    (gdb) break foo
>>    Breakpoint 2 at 0x401030 (2 locations)
>>    (gdb) info breakpoints
>>    Num     Type           Disp Enb Address            What
>>    2       breakpoint     keep y   <MULTIPLE>
>>    2.1                         y   0x0000000000401030 <foo()@plt>
>>    2.2                         y   0x00007ffff7fc50fd in foo() at /tmp/breakpoint-shlib-func-lib.cc:20
>>
>> This is not the expected behaviour.  If we compile the same test using
>> a C compiler then we see this:
>>
>>    (gdb) break foo
>>    Breakpoint 2 at 0x7ffff7fc50fd: file /tmp/breakpoint-shlib-func-c-lib.c, line 20.
>>    (gdb) info breakpoints
>>    Num     Type           Disp Enb Address            What
>>    2       breakpoint     keep y   0x00007ffff7fc50fd in foo at /tmp/breakpoint-shlib-func-c-lib.c:20
>>
>> Here's what's happening.  When GDB parses the symbols in the main
>> executable and the shared library we see a number of different symbols
>> for foo, and use these to create entries in GDB's msymbol table:
>>
>>    - In the main executable we see a symbol 'foo@plt' that points at
>>      the plt entry for foo, from this we add two entries into GDB's
>>      msymbol table, one called 'foo@plt' which points at the plt entry
>>      and has type mst_text, then we create a second symbol, this time
>>      called 'foo' with type mst_solib_trampoline which also points at
>>      the plt entry,
>>
>>    - Then, when the shared library is loaded we see another symbol
>>      called 'foo', this one points at the actual implementation in the
>>      shared library.  This time GDB creates a msymbol called 'foo' with
>>      type mst_text that points at the implementation.
>>
>> This means that GDB creates 3 msymbols to represent the 2 symbols
>> found in the executable and shared library.
>>
>> When the user creates a breakpoint on 'foo' GDB eventually ends up in
>> search_minsyms_for_name (linespec.c), this function then calls
>> iterate_over_minimal_symbols passing in the name we are looking for
>> wrapped in a lookup_name_info object.
>>
>> In iterate_over_minimal_symbols we iterate over two hash tables (using
>> the name we're looking for as the hash key), first we walk the hash
>> table of symbol linkage names, then we walk the hash table of
>> demangled symbol names.
>>
>> When the language is C++ the symbols for 'foo' will all have been
>> mangled, as a result, in this case, the iteration of the linkage name
>> hash table will find no matching results.
>>
>> However, when we walk the demangled hash table we do find some
>> results.  In order to match symbol names, GDB obtains a symbol name
>> matching function by calling the get_symbol_name_matcher method on the
>> language_defn class.  For C++, in this case, the matching function we
>> use is cp_fq_symbol_name_matches, which delegates the work to
>> strncmp_iw_with_mode with mode strncmp_iw_mode::MATCH_PARAMS and
>> language set to language_cplus.
>>
>> The strncmp_iw_mode::MATCH_PARAMS mode means that strncmp_iw_mode will
>> skip any parameters in the demangled symbol name when checking for a
>> match, e.g. 'foo' will match the demangled name 'foo()'.  The way this
>> is done is that the strings are matched character by character, but,
>> once the string we are looking for ('foo' here) is exhausted, if we
>> are looking at '(' then we consider the match a success.
>>
>> Lets consider the 3 symbols GDB created.  If the function declaration
>> is 'void foo ()' then from the main executable we added symbols
>> '_Z3foov@plt' and '_Z3foov', while from the shared library we added
>> another symbol call '_Z3foov'.  When these are demangled they become
>> 'foo()@plt', 'foo', and 'foo' respectively.
>>
>> Now, the '_Z3foov' symbol from the main executable has the type
>> mst_solib_trampoline, and in search_minsyms_for_name, we search for
>> any symbols of type mst_solib_trampoline and filter these out of the
>> results.
>>
>> However, the '_Z3foov@plt' symbol (from the main executable), and the
>> '_Z3foov' symbol (from the shared library) both have type mst_text.
>>
>> During the demangled name matching, due to the use of MATCH_PARAMS
>> mode, we stop the comparison as soon as we hit a '(' in the demangled
>> name.  And so, '_Z3foov@plt', which demangles to 'foo()@plt' matches
>> 'foo', and '_Z3foov', which demangles to 'foo()' also matches 'foo'.
>>
>> By contrast, for C, there are no demangled hash table entries to be
>> iterated over (in iterate_over_minimal_symbols), we only consider the
>> linkage name symbols which are 'foo@plt' and 'foo'.  The plain 'foo'
>> symbol obviously matches when we are looking for 'foo', but in this
>> case the 'foo@plt' will not match due to the '@plt' suffix.
>>
>> And so, when the user asks for a breakpoint in 'foo', and the language
>> is C, search_minsyms_for_name, returns a single msymbol, the mst_text
>> symbol for foo in the shared library, while, when the language is C++,
>> we get two results, '_Z3foov' for the shared library function, and
>> '_Z3foov@plt' for the plt entry in the main executable.
>>
>> I propose to fix this in strncmp_iw_with_mode.  When the mode is
>> MATCH_PARAMS, instead of stopping at a '(' and assuming the match is a
>> success, GDB will instead search forward for the matching, closing,
>> ')', effectively skipping the parameter list, and then resume
>> matching.  Thus, when comparing 'foo' to 'foo()@plt' GDB will
>> effectively compare against 'foo@plt' (skipping the parameter list),
>> and the match will fail, just as it does when the language is C.
>>
>> There is one slight complication, which is revealed by the test
>> gdb.linespec/cpcompletion.exp, when searching for the symbol of a
>> const member function, the demangled symbol will have 'const' at the
>> end of its name, e.g.:
>>
>>    struct_with_const_overload::const_overload_fn() const
>>
>> Previously, the matching would stop at the '(' character, but after my
>> change the whole '()' is skipped, and the match resumes.  As a result,
>> the 'const' modifier results in a failure to match, when previously
>> GDB would have found a match.
>>
>> To work around this issue, in strncmp_iw_with_mode, for language C++
>> and mode MATCH_PARAMS, I explicitly allow a trailing 'const' to be
>> skipped.
>
> This explanation is the inverse of what the code does. What you actually 
> implemented is that if an @ is found, you assume it is @plt, rather than 
> allowing for "const".

Thanks, I've reworded this in V3.  Do let me know if you still think it
is not correct.

>
>>
>> With these changes in place I now see GDB correctly setting a
>> breakpoint only at the implementation of 'foo' in the shared library.
>>
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=20091
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17201
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17071
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17199
>> ---
>>   .../gdb.cp/breakpoint-shlib-func-lib.cc       | 21 +++++
>>   gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc | 22 ++++++
>>   .../gdb.cp/breakpoint-shlib-func.exp          | 78 +++++++++++++++++++
>>   gdb/utils.c                                   | 26 ++++++-
>>   4 files changed, 146 insertions(+), 1 deletion(-)
>>   create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
>>   create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
>>   create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp
>>
>> diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
>> new file mode 100644
>> index 00000000000..7219f7c5a23
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
>> @@ -0,0 +1,21 @@
>> +/* Copyright 2022-2023 Free Software Foundation, Inc.
>> +   This program is free software; you can redistribute it and/or modify
>> +   it under the terms of the GNU General Public License as published by
>> +   the Free Software Foundation; either version 3 of the License, or
>> +   (at your option) any later version.
>> +
>> +   This program is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +   GNU General Public License for more details.
>> +
>> +   You should have received a copy of the GNU General Public License
>> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
>> +
>> +extern int foo ();
>> +
>> +int
>> +foo ()
>> +{
>> +  return 0;
>> +}
>> diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
>> new file mode 100644
>> index 00000000000..a86d06560d4
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc
>> @@ -0,0 +1,22 @@
>> +/* Copyright 2022-2023 Free Software Foundation, Inc.
>> +   This program is free software; you can redistribute it and/or modify
>> +   it under the terms of the GNU General Public License as published by
>> +   the Free Software Foundation; either version 3 of the License, or
>> +   (at your option) any later version.
>> +
>> +   This program is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +   GNU General Public License for more details.
>> +
>> +   You should have received a copy of the GNU General Public License
>> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
>> +
>> +extern int foo ();
>> +
>> +int
>> +main ()
>> +{
>> +  int answer = foo ();
>> +  return answer;
>> +}
>> diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp
>> new file mode 100644
>> index 00000000000..5d5c87d0b51
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp
>> @@ -0,0 +1,78 @@
>> +# Copyright 2022-2023 Free Software Foundation, Inc.
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 3 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> +
>> +# Places a breakpoint on a function in a shared library before the
>> +# inferior has started.  GDB will place the breakpoint on the @plt
>> +# symbol in the main executable.
>> +#
>> +# When the inferior is started GDB will re-evaluate the breakpoint
>> +# location and move the breakpoint to the function implementation in
>> +# the shared library.
>> +#
>> +# Then, with the inferior started, delete all breakpoints, and
>> +# re-create the breakpoint on the shared library function, GDB should
>> +# place a single breakpoint on the function implementation in the
>> +# shared library.
>> +
>> +require allow_shlib_tests
>> +
>> +standard_testfile .cc -lib.cc
>> +
>> +set libobj [standard_output_file libfoo.so]
>> +if {[build_executable "build shared library" $libobj $srcfile2 \
>> +	 {debug c++ shlib}] != 0} {
>> +    return -1
>> +}
>> +
>> +if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
>> +	 [list debug c++ shlib=$libobj]]} {
>> +    return -1
>> +}
>> +
>> +# Place the breakpoint before the shared library has been loaded, the
>> +# breakpoint should be placed on the @plt symbol.
>> +gdb_test "break foo" "Breakpoint $decimal at $hex"
>> +gdb_test "info breakpoints" "<foo\\(\\)@plt>"
>> +
>> +# This is used as an override for delete_breakpoints when we don't
>> +# want functions in gdb.exp to delete breakpoints behind the scenes
>> +# for us.
>> +proc do_not_delete_breakpoints {} {
>> +    # Just do nothing.
>> +}
>> +
>> +# Runto main, but don't delete all the breakpoints.
>> +with_override delete_breakpoints do_not_delete_breakpoints {
>> +    if {![runto_main]} {
>> +	return -1
>> +    }
>> +}
>> +
>> +# The breakpoint should now be showing in `foo` for real.
>> +gdb_test "info breakpoints" \
>> +    "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+\r\n.*" \
>> +    "check breakpoints after starting the inferior"
>> +
>> +# Now we can delete the breakpoints.
>> +delete_breakpoints
>> +
>> +# And recreate the foo breakpoint, we should only get one location,
>> +# the actual location.
>> +gdb_test "break foo" "Breakpoint $decimal at \[^\r\n\]+" \
>> +    "recreate foo breakpoint"
>> +
>> +# Check the breakpoint was recreated correctly.
>> +gdb_test "info breakpoints" \
>> +    "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+" \
>> +    "check breakpoints after recreation"
>> diff --git a/gdb/utils.c b/gdb/utils.c
>> index 734c5bf7f70..e64fec941f1 100644
>> --- a/gdb/utils.c
>> +++ b/gdb/utils.c
>> @@ -2397,7 +2397,31 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
>>   	  return 0;
>>   	}
>>         else
>> -	return (*string1 != '\0' && *string1 != '(');
>> +	{
>> +	  if (*string1 == '(')
>> +	    {
>> +	      int p_count = 0;
>> +
>> +	      do
>> +		{
>> +		  if (*string1 == '(')
>> +		    ++p_count;
>> +		  else if (*string1 == ')')
>> +		    --p_count;
>> +		  ++string1;
>> +		}
>> +	      while (*string1 != '\0' && p_count > 0);
>> +
>> +	      /* There maybe things like 'const' after the parameters,
>> +		 which we do want to ignore.  However, if there's an '@'
>> +		 then this likely indicates something like '@plt' which we
>> +		 should not ignore.  */
>> +	      return *string1 == '@';
>> +	    }
>> +
>> +	  return *string1 == '\0' ? 0 : 1;
>> +	}
>> +
> In here you make the assumption that string1 will always be the one from 
> the symbol table and string2 wil always be the one from the user. While 
> it seems like a fine assumption to make, since it was there already, but 
> it would be nice if this assumption was documented in the comment
> somewhere.

The documentation for these functions is really not great.  You're right
that string1 is the one likely from the symbol table, but the actual
documentation can be found on the declaration of strcmp_iw in utils.h,
and I believe my changes don't add or change any of the assumptions
already stated there.

I'd really rather not get into trying to rewrite the documentation for
these functions are part of this commit, some parts of these functions,
especially around the mode, still leave me a little confused - I can see
what the functions does, but it's not clear to me why we do it...  I
suspect understanding all this, and documenting this could take some
time :/

Let me know if you feel this is a blocker for merging this patch.

Thanks,
Andrew


>>       }
>>     else
>>       return 1;
>>
>> base-commit: 7d02a94c8f74613edb0cdde11982b22eaaa9affe
>
>
> -- 
> Cheers,
> Bruno


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv2] gdb/c++: fix handling of breakpoints on @plt symbols
  2023-02-08 22:22   ` Simon Marchi
@ 2023-02-10 19:10     ` Andrew Burgess
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Burgess @ 2023-02-10 19:10 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

Simon Marchi <simark@simark.ca> writes:

>> Lets consider the 3 symbols GDB created.  If the function declaration
>> is 'void foo ()' then from the main executable we added symbols
>> '_Z3foov@plt' and '_Z3foov', while from the shared library we added
>> another symbol call '_Z3foov'.  When these are demangled they become
>> 'foo()@plt', 'foo', and 'foo' respectively.
>
> I think the last two should be 'foo()'?

Fixed in v3.

>
> I have only looked at the code, not the test.  That part LGTM.  I was
> wondering if you could add some unit tests for strncmp_iw_with_mode for
> this case though.  There are already some just below the implementation.

Thank, I added a couple of tests in v3, which I think cover all the
cases I was interested in.  Let me know what you think.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv3] gdb/c++: fix handling of breakpoints on @plt symbols
  2023-02-10 19:03   ` [PATCHv3] " Andrew Burgess
@ 2023-02-10 20:51     ` Simon Marchi
  2023-02-12  6:22       ` Andrew Burgess
  2023-02-13 14:14       ` Tom Tromey
  0 siblings, 2 replies; 12+ messages in thread
From: Simon Marchi @ 2023-02-10 20:51 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches; +Cc: Bruno Larsen

> diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
> new file mode 100644
> index 00000000000..7219f7c5a23
> --- /dev/null
> +++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
> @@ -0,0 +1,21 @@
> +/* Copyright 2022-2023 Free Software Foundation, Inc.
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +extern int foo ();

I don't think this extern declaration is needed.

> +# The breakpoint should now be showing in `foo` for real.
> +gdb_test "info breakpoints" \
> +    "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+\r\n.*" \
> +    "check breakpoints after starting the inferior"
> +
> +# Now we can delete the breakpoints.
> +delete_breakpoints
> +
> +# And recreate the foo breakpoint, we should only get one location,
> +# the actual location.
> +gdb_test "break foo" "Breakpoint $decimal at \[^\r\n\]+" \
> +    "recreate foo breakpoint"
> +
> +# Check the breakpoint was recreated correctly.
> +gdb_test "info breakpoints" \
> +    "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+" \
> +    "check breakpoints after recreation"

For completeness, you could maybe resume and confirm you are stopped at
foo (and not foo@plt).

Otherwise, it all LGTM:

Approved-By: Simon Marchi <simon.marchi@efficios.com>

> diff --git a/gdb/utils.c b/gdb/utils.c
> index 95adbe58e4a..76c0df36a80 100644
> --- a/gdb/utils.c
> +++ b/gdb/utils.c
> @@ -2402,7 +2402,31 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
>  	  return 0;
>  	}
>        else
> -	return (*string1 != '\0' && *string1 != '(');
> +	{
> +	  if (*string1 == '(')
> +	    {
> +	      int p_count = 0;
> +
> +	      do
> +		{
> +		  if (*string1 == '(')
> +		    ++p_count;
> +		  else if (*string1 == ')')
> +		    --p_count;
> +		  ++string1;
> +		}
> +	      while (*string1 != '\0' && p_count > 0);
> +
> +	      /* There maybe things like 'const' after the parameters,
> +		 which we do want to ignore.  However, if there's an '@'
> +		 then this likely indicates something like '@plt' which we
> +		 should not ignore.  */
> +	      return *string1 == '@';
> +	    }
> +
> +	  return *string1 == '\0' ? 0 : 1;
> +	}

Not really relevant for your patch but... just chatting.  I don't know
if it would be really inefficient, but reading this function, I dream of
some util function that would parse a demangled C++ function symbol into
some structured type (listing the namespaces, name, abi tags,
parameters, whether it is const or not, etc).  It would then be much
easier to work on that, rather than doing everything string-based.

Simon


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv3] gdb/c++: fix handling of breakpoints on @plt symbols
  2023-02-10 20:51     ` Simon Marchi
@ 2023-02-12  6:22       ` Andrew Burgess
  2023-02-13 14:14       ` Tom Tromey
  1 sibling, 0 replies; 12+ messages in thread
From: Andrew Burgess @ 2023-02-12  6:22 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches; +Cc: Bruno Larsen

Simon Marchi <simark@simark.ca> writes:

>> diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
>> new file mode 100644
>> index 00000000000..7219f7c5a23
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc
>> @@ -0,0 +1,21 @@
>> +/* Copyright 2022-2023 Free Software Foundation, Inc.
>> +   This program is free software; you can redistribute it and/or modify
>> +   it under the terms of the GNU General Public License as published by
>> +   the Free Software Foundation; either version 3 of the License, or
>> +   (at your option) any later version.
>> +
>> +   This program is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +   GNU General Public License for more details.
>> +
>> +   You should have received a copy of the GNU General Public License
>> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
>> +
>> +extern int foo ();
>
> I don't think this extern declaration is needed.
>
>> +# The breakpoint should now be showing in `foo` for real.
>> +gdb_test "info breakpoints" \
>> +    "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+\r\n.*" \
>> +    "check breakpoints after starting the inferior"
>> +
>> +# Now we can delete the breakpoints.
>> +delete_breakpoints
>> +
>> +# And recreate the foo breakpoint, we should only get one location,
>> +# the actual location.
>> +gdb_test "break foo" "Breakpoint $decimal at \[^\r\n\]+" \
>> +    "recreate foo breakpoint"
>> +
>> +# Check the breakpoint was recreated correctly.
>> +gdb_test "info breakpoints" \
>> +    "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+" \
>> +    "check breakpoints after recreation"
>
> For completeness, you could maybe resume and confirm you are stopped at
> foo (and not foo@plt).
>
> Otherwise, it all LGTM:

I removed the 'extern' and added an extra check that we hit the expected
breakpoint, then pushed this patch.

Thanks,
Andrew

>
> Approved-By: Simon Marchi <simon.marchi@efficios.com>
>
>> diff --git a/gdb/utils.c b/gdb/utils.c
>> index 95adbe58e4a..76c0df36a80 100644
>> --- a/gdb/utils.c
>> +++ b/gdb/utils.c
>> @@ -2402,7 +2402,31 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
>>  	  return 0;
>>  	}
>>        else
>> -	return (*string1 != '\0' && *string1 != '(');
>> +	{
>> +	  if (*string1 == '(')
>> +	    {
>> +	      int p_count = 0;
>> +
>> +	      do
>> +		{
>> +		  if (*string1 == '(')
>> +		    ++p_count;
>> +		  else if (*string1 == ')')
>> +		    --p_count;
>> +		  ++string1;
>> +		}
>> +	      while (*string1 != '\0' && p_count > 0);
>> +
>> +	      /* There maybe things like 'const' after the parameters,
>> +		 which we do want to ignore.  However, if there's an '@'
>> +		 then this likely indicates something like '@plt' which we
>> +		 should not ignore.  */
>> +	      return *string1 == '@';
>> +	    }
>> +
>> +	  return *string1 == '\0' ? 0 : 1;
>> +	}
>
> Not really relevant for your patch but... just chatting.  I don't know
> if it would be really inefficient, but reading this function, I dream of
> some util function that would parse a demangled C++ function symbol into
> some structured type (listing the namespaces, name, abi tags,
> parameters, whether it is const or not, etc).  It would then be much
> easier to work on that, rather than doing everything string-based.
>
> Simon


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv2] gdb/c++: fix handling of breakpoints on @plt symbols
  2023-02-10 19:09     ` Andrew Burgess
@ 2023-02-13  9:31       ` Bruno Larsen
  0 siblings, 0 replies; 12+ messages in thread
From: Bruno Larsen @ 2023-02-13  9:31 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches

On 10/02/2023 20:09, Andrew Burgess wrote:
> Bruno Larsen <blarsen@redhat.com> writes:
>
>> On 20/01/2023 12:48, Andrew Burgess via Gdb-patches wrote:
>>> Changes since v1:
>>>
>>>     - Rebased to current HEAD of master,
>>>
>>>     - Updated test to use 'require allow_shlib_tests', and wrapped a few
>>>       long lines.  Also Copyright date ranges have been updated for 2023.
>>>
>>> ---
>> Hi Andrew,
>>
>> I've tested this patch and verified that it doesn't introduce any
>> regressions, just a few minor comments inlined below.
>>
>> Tested-By: Bruno Larsen <blarsen@redhat.com>
>>
>>> This commit should fix PR gdb/20091, PR gdb/17201, and PR gdb/17071.
>>> Additionally, PR gdb/17199 relates to this area of code, but is more
>>> of a request to refactor some parts of GDB, this commit does not
>>> address that request, but it is probably worth reading that PR when
>>> looking at this commit.
>>>
>>> When the current language is C++, and the user places a breakpoint on
>>> a function in a shared library, GDB will currently find two locations
>>> for the breakpoint, one location will be within the function itself as
>>> we would expect, but the other location will be within the PLT table
>>> for the call to the named function.  Consider this session:
>>>
>>>     $ gdb -q /tmp/breakpoint-shlib-func
>>>     Reading symbols from /tmp/breakpoint-shlib-func...
>>>     (gdb) start
>>>     Temporary breakpoint 1 at 0x40112e: file /tmp/breakpoint-shlib-func.cc, line 20.
>>>     Starting program: /tmp/breakpoint-shlib-func
>>>
>>>     Temporary breakpoint 1, main () at /tmp/breakpoint-shlib-func.cc:20
>>>     20	  int answer = foo ();
>>>     (gdb) break foo
>>>     Breakpoint 2 at 0x401030 (2 locations)
>>>     (gdb) info breakpoints
>>>     Num     Type           Disp Enb Address            What
>>>     2       breakpoint     keep y   <MULTIPLE>
>>>     2.1                         y   0x0000000000401030 <foo()@plt>
>>>     2.2                         y   0x00007ffff7fc50fd in foo() at /tmp/breakpoint-shlib-func-lib.cc:20
>>>
>>> This is not the expected behaviour.  If we compile the same test using
>>> a C compiler then we see this:
>>>
>>>     (gdb) break foo
>>>     Breakpoint 2 at 0x7ffff7fc50fd: file /tmp/breakpoint-shlib-func-c-lib.c, line 20.
>>>     (gdb) info breakpoints
>>>     Num     Type           Disp Enb Address            What
>>>     2       breakpoint     keep y   0x00007ffff7fc50fd in foo at /tmp/breakpoint-shlib-func-c-lib.c:20
>>>
>>> Here's what's happening.  When GDB parses the symbols in the main
>>> executable and the shared library we see a number of different symbols
>>> for foo, and use these to create entries in GDB's msymbol table:
>>>
>>>     - In the main executable we see a symbol 'foo@plt' that points at
>>>       the plt entry for foo, from this we add two entries into GDB's
>>>       msymbol table, one called 'foo@plt' which points at the plt entry
>>>       and has type mst_text, then we create a second symbol, this time
>>>       called 'foo' with type mst_solib_trampoline which also points at
>>>       the plt entry,
>>>
>>>     - Then, when the shared library is loaded we see another symbol
>>>       called 'foo', this one points at the actual implementation in the
>>>       shared library.  This time GDB creates a msymbol called 'foo' with
>>>       type mst_text that points at the implementation.
>>>
>>> This means that GDB creates 3 msymbols to represent the 2 symbols
>>> found in the executable and shared library.
>>>
>>> When the user creates a breakpoint on 'foo' GDB eventually ends up in
>>> search_minsyms_for_name (linespec.c), this function then calls
>>> iterate_over_minimal_symbols passing in the name we are looking for
>>> wrapped in a lookup_name_info object.
>>>
>>> In iterate_over_minimal_symbols we iterate over two hash tables (using
>>> the name we're looking for as the hash key), first we walk the hash
>>> table of symbol linkage names, then we walk the hash table of
>>> demangled symbol names.
>>>
>>> When the language is C++ the symbols for 'foo' will all have been
>>> mangled, as a result, in this case, the iteration of the linkage name
>>> hash table will find no matching results.
>>>
>>> However, when we walk the demangled hash table we do find some
>>> results.  In order to match symbol names, GDB obtains a symbol name
>>> matching function by calling the get_symbol_name_matcher method on the
>>> language_defn class.  For C++, in this case, the matching function we
>>> use is cp_fq_symbol_name_matches, which delegates the work to
>>> strncmp_iw_with_mode with mode strncmp_iw_mode::MATCH_PARAMS and
>>> language set to language_cplus.
>>>
>>> The strncmp_iw_mode::MATCH_PARAMS mode means that strncmp_iw_mode will
>>> skip any parameters in the demangled symbol name when checking for a
>>> match, e.g. 'foo' will match the demangled name 'foo()'.  The way this
>>> is done is that the strings are matched character by character, but,
>>> once the string we are looking for ('foo' here) is exhausted, if we
>>> are looking at '(' then we consider the match a success.
>>>
>>> Lets consider the 3 symbols GDB created.  If the function declaration
>>> is 'void foo ()' then from the main executable we added symbols
>>> '_Z3foov@plt' and '_Z3foov', while from the shared library we added
>>> another symbol call '_Z3foov'.  When these are demangled they become
>>> 'foo()@plt', 'foo', and 'foo' respectively.
>>>
>>> Now, the '_Z3foov' symbol from the main executable has the type
>>> mst_solib_trampoline, and in search_minsyms_for_name, we search for
>>> any symbols of type mst_solib_trampoline and filter these out of the
>>> results.
>>>
>>> However, the '_Z3foov@plt' symbol (from the main executable), and the
>>> '_Z3foov' symbol (from the shared library) both have type mst_text.
>>>
>>> During the demangled name matching, due to the use of MATCH_PARAMS
>>> mode, we stop the comparison as soon as we hit a '(' in the demangled
>>> name.  And so, '_Z3foov@plt', which demangles to 'foo()@plt' matches
>>> 'foo', and '_Z3foov', which demangles to 'foo()' also matches 'foo'.
>>>
>>> By contrast, for C, there are no demangled hash table entries to be
>>> iterated over (in iterate_over_minimal_symbols), we only consider the
>>> linkage name symbols which are 'foo@plt' and 'foo'.  The plain 'foo'
>>> symbol obviously matches when we are looking for 'foo', but in this
>>> case the 'foo@plt' will not match due to the '@plt' suffix.
>>>
>>> And so, when the user asks for a breakpoint in 'foo', and the language
>>> is C, search_minsyms_for_name, returns a single msymbol, the mst_text
>>> symbol for foo in the shared library, while, when the language is C++,
>>> we get two results, '_Z3foov' for the shared library function, and
>>> '_Z3foov@plt' for the plt entry in the main executable.
>>>
>>> I propose to fix this in strncmp_iw_with_mode.  When the mode is
>>> MATCH_PARAMS, instead of stopping at a '(' and assuming the match is a
>>> success, GDB will instead search forward for the matching, closing,
>>> ')', effectively skipping the parameter list, and then resume
>>> matching.  Thus, when comparing 'foo' to 'foo()@plt' GDB will
>>> effectively compare against 'foo@plt' (skipping the parameter list),
>>> and the match will fail, just as it does when the language is C.
>>>
>>> There is one slight complication, which is revealed by the test
>>> gdb.linespec/cpcompletion.exp, when searching for the symbol of a
>>> const member function, the demangled symbol will have 'const' at the
>>> end of its name, e.g.:
>>>
>>>     struct_with_const_overload::const_overload_fn() const
>>>
>>> Previously, the matching would stop at the '(' character, but after my
>>> change the whole '()' is skipped, and the match resumes.  As a result,
>>> the 'const' modifier results in a failure to match, when previously
>>> GDB would have found a match.
>>>
>>> To work around this issue, in strncmp_iw_with_mode, for language C++
>>> and mode MATCH_PARAMS, I explicitly allow a trailing 'const' to be
>>> skipped.
>> This explanation is the inverse of what the code does. What you actually
>> implemented is that if an @ is found, you assume it is @plt, rather than
>> allowing for "const".
> Thanks, I've reworded this in V3.  Do let me know if you still think it
> is not correct.
The new wording is good, thank you!
>
>>> With these changes in place I now see GDB correctly setting a
>>> breakpoint only at the implementation of 'foo' in the shared library.
>>>
>>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=20091
>>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17201
>>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17071
>>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17199
>>> ---
<snip>
>>> diff --git a/gdb/utils.c b/gdb/utils.c
>>> index 734c5bf7f70..e64fec941f1 100644
>>> --- a/gdb/utils.c
>>> +++ b/gdb/utils.c
>>> @@ -2397,7 +2397,31 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
>>>    	  return 0;
>>>    	}
>>>          else
>>> -	return (*string1 != '\0' && *string1 != '(');
>>> +	{
>>> +	  if (*string1 == '(')
>>> +	    {
>>> +	      int p_count = 0;
>>> +
>>> +	      do
>>> +		{
>>> +		  if (*string1 == '(')
>>> +		    ++p_count;
>>> +		  else if (*string1 == ')')
>>> +		    --p_count;
>>> +		  ++string1;
>>> +		}
>>> +	      while (*string1 != '\0' && p_count > 0);
>>> +
>>> +	      /* There maybe things like 'const' after the parameters,
>>> +		 which we do want to ignore.  However, if there's an '@'
>>> +		 then this likely indicates something like '@plt' which we
>>> +		 should not ignore.  */
>>> +	      return *string1 == '@';
>>> +	    }
>>> +
>>> +	  return *string1 == '\0' ? 0 : 1;
>>> +	}
>>> +
>> In here you make the assumption that string1 will always be the one from
>> the symbol table and string2 wil always be the one from the user. While
>> it seems like a fine assumption to make, since it was there already, but
>> it would be nice if this assumption was documented in the comment
>> somewhere.
> The documentation for these functions is really not great.  You're right
> that string1 is the one likely from the symbol table, but the actual
> documentation can be found on the declaration of strcmp_iw in utils.h,
> and I believe my changes don't add or change any of the assumptions
> already stated there.
>
> I'd really rather not get into trying to rewrite the documentation for
> these functions are part of this commit, some parts of these functions,
> especially around the mode, still leave me a little confused - I can see
> what the functions does, but it's not clear to me why we do it...  I
> suspect understanding all this, and documenting this could take some
> time :/
>
> Let me know if you feel this is a blocker for merging this patch.
I don't think it is a blocker, especially because I agree that it isn't 
a new assumption, I would just like it documented at some point. The 
reason I brought it up is because I thought there were fewer checks for 
size in the path that you added, but I checked again and there aren't, 
so there is no real rush to get the docs up.

-- 
Cheers,
Bruno

>
> Thanks,
> Andrew
>
>
>>>        }
>>>      else
>>>        return 1;
>>>
>>> base-commit: 7d02a94c8f74613edb0cdde11982b22eaaa9affe
>>
>> -- 
>> Cheers,
>> Bruno


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv3] gdb/c++: fix handling of breakpoints on @plt symbols
  2023-02-10 20:51     ` Simon Marchi
  2023-02-12  6:22       ` Andrew Burgess
@ 2023-02-13 14:14       ` Tom Tromey
  1 sibling, 0 replies; 12+ messages in thread
From: Tom Tromey @ 2023-02-13 14:14 UTC (permalink / raw)
  To: Simon Marchi via Gdb-patches; +Cc: Andrew Burgess, Simon Marchi, Bruno Larsen

>>>>> "Simon" == Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> writes:

Simon> Not really relevant for your patch but... just chatting.  I don't know
Simon> if it would be really inefficient, but reading this function, I dream of
Simon> some util function that would parse a demangled C++ function symbol into
Simon> some structured type (listing the namespaces, name, abi tags,
Simon> parameters, whether it is const or not, etc).  It would then be much
Simon> easier to work on that, rather than doing everything string-based.

We have something along these lines in cp-name-parser.py.  It creates
the same data structure that is used internally by the demangler.

I too wish this area could be improved, though I'm unsure of the best
route forward.

Tom

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-02-13 14:14 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-20 13:55 [PATCH] gdb/c++: fix handling of breakpoints on @plt symbols Andrew Burgess
2023-01-20 11:48 ` [PATCHv2] " Andrew Burgess
2023-01-30 15:13   ` Andrew Burgess
2023-02-08  9:36   ` Bruno Larsen
2023-02-10 19:09     ` Andrew Burgess
2023-02-13  9:31       ` Bruno Larsen
2023-02-08 22:22   ` Simon Marchi
2023-02-10 19:10     ` Andrew Burgess
2023-02-10 19:03   ` [PATCHv3] " Andrew Burgess
2023-02-10 20:51     ` Simon Marchi
2023-02-12  6:22       ` Andrew Burgess
2023-02-13 14:14       ` Tom Tromey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).