From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 86FA33858D39 for ; Wed, 8 Feb 2023 09:36:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 86FA33858D39 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1675849015; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Tti7rncVG0QPKtOhB63mEMgFuNg8BMYZ6Uj+uQbxEeA=; b=bR/aMDwLtGfp8D42nxIYkYm506g7mnIX2SUdvN+6qUJTVVorsvtIcFyjWt3UGH/mZhIh2x 7lTuopnCZf6oyZK2H+GeFu2SC/ZQgDbbwtGa5tapvSrymw5P9Stz8nPCTJfx2v0rQU8oFJ ZOdTAFzJDLvGNYEufwFwpM5PrezZOq4= Received: from mail-ej1-f71.google.com (mail-ej1-f71.google.com [209.85.218.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-270-8SF-MxPFOlSuIImGdwz5IA-1; Wed, 08 Feb 2023 04:36:53 -0500 X-MC-Unique: 8SF-MxPFOlSuIImGdwz5IA-1 Received: by mail-ej1-f71.google.com with SMTP id d14-20020a170906c20e00b00889f989d8deso12808251ejz.15 for ; Wed, 08 Feb 2023 01:36:53 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:content-language :references:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Tti7rncVG0QPKtOhB63mEMgFuNg8BMYZ6Uj+uQbxEeA=; b=UwJhxyUeeGQdGLIpNl0cgQwpX7rqnk5f3chnsl5cqWfo9BzDDSEGfIOXhxVGbx6TvZ M/ek/oJusc12+soCymVSmcvLz6UHPqJPpfpjSJUHxTVHtr6G8MU7KrOksPqi8lEIK4Ka HNblEmTGLD/r8u62Ipwb0s5NU9he8hBkJM++zPn/vt+yW3LJ4wnWPOm1ElPHfGzMVnjN 3Jo6N/ssF+kQAA8z4VguTs0EGpmqbeortOP3i+IMXdBirbl6SzMZmiEBmpS0haznUP3H RfifW6kAGXkRx9cE275NOHiMevkTJWfEVRE13Jf6KwQqPMfmRa5h1SybJYsVxOWvuele hcEw== X-Gm-Message-State: AO0yUKXAVoOtuzmPJ4h/Kz7AOyGAPobMn4cGeqXltlUl8jO5kPbbxmrm VMTYGJXVapUra40b/RZ2t5cYhP+resFK8Uvb5GlllyOh1/s+oStX7aWQc5q3Z2g5MmE8FMFxfot UE28asUenKFhuO13PRwbh8w== X-Received: by 2002:a17:906:3857:b0:8aa:a7b6:1c4c with SMTP id w23-20020a170906385700b008aaa7b61c4cmr4496185ejc.21.1675849012191; Wed, 08 Feb 2023 01:36:52 -0800 (PST) X-Google-Smtp-Source: AK7set+owmr8k8uKhZ38XRJEhmFfHyt03d82CY/68906ZazCM8LMcttXu8VrTCdOTMKR2SBnt8dR5g== X-Received: by 2002:a17:906:3857:b0:8aa:a7b6:1c4c with SMTP id w23-20020a170906385700b008aaa7b61c4cmr4496168ejc.21.1675849011894; Wed, 08 Feb 2023 01:36:51 -0800 (PST) Received: from [10.43.2.105] (nat-pool-brq-t.redhat.com. [213.175.37.10]) by smtp.gmail.com with ESMTPSA id t4-20020a170906268400b0087bda70d3efsm8044489ejc.118.2023.02.08.01.36.50 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 08 Feb 2023 01:36:51 -0800 (PST) Message-ID: Date: Wed, 8 Feb 2023 10:36:50 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.7.1 Subject: Re: [PATCHv2] gdb/c++: fix handling of breakpoints on @plt symbols To: Andrew Burgess , gdb-patches@sourceware.org References: <756de5175770d79b3f0642fa3035ef348388bda4.1671544509.git.aburgess@redhat.com> <29e560fd9c94874f0839924fa25557a7e8418ad3.1674215225.git.aburgess@redhat.com> From: Bruno Larsen In-Reply-To: <29e560fd9c94874f0839924fa25557a7e8418ad3.1674215225.git.aburgess@redhat.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,TXREP,WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 20/01/2023 12:48, Andrew Burgess via Gdb-patches wrote: > Changes since v1: > > - Rebased to current HEAD of master, > > - Updated test to use 'require allow_shlib_tests', and wrapped a few > long lines. Also Copyright date ranges have been updated for 2023. > > --- Hi Andrew, I've tested this patch and verified that it doesn't introduce any regressions, just a few minor comments inlined below. Tested-By: Bruno Larsen > > This commit should fix PR gdb/20091, PR gdb/17201, and PR gdb/17071. > Additionally, PR gdb/17199 relates to this area of code, but is more > of a request to refactor some parts of GDB, this commit does not > address that request, but it is probably worth reading that PR when > looking at this commit. > > When the current language is C++, and the user places a breakpoint on > a function in a shared library, GDB will currently find two locations > for the breakpoint, one location will be within the function itself as > we would expect, but the other location will be within the PLT table > for the call to the named function. Consider this session: > > $ gdb -q /tmp/breakpoint-shlib-func > Reading symbols from /tmp/breakpoint-shlib-func... > (gdb) start > Temporary breakpoint 1 at 0x40112e: file /tmp/breakpoint-shlib-func.cc, line 20. > Starting program: /tmp/breakpoint-shlib-func > > Temporary breakpoint 1, main () at /tmp/breakpoint-shlib-func.cc:20 > 20 int answer = foo (); > (gdb) break foo > Breakpoint 2 at 0x401030 (2 locations) > (gdb) info breakpoints > Num Type Disp Enb Address What > 2 breakpoint keep y > 2.1 y 0x0000000000401030 > 2.2 y 0x00007ffff7fc50fd in foo() at /tmp/breakpoint-shlib-func-lib.cc:20 > > This is not the expected behaviour. If we compile the same test using > a C compiler then we see this: > > (gdb) break foo > Breakpoint 2 at 0x7ffff7fc50fd: file /tmp/breakpoint-shlib-func-c-lib.c, line 20. > (gdb) info breakpoints > Num Type Disp Enb Address What > 2 breakpoint keep y 0x00007ffff7fc50fd in foo at /tmp/breakpoint-shlib-func-c-lib.c:20 > > Here's what's happening. When GDB parses the symbols in the main > executable and the shared library we see a number of different symbols > for foo, and use these to create entries in GDB's msymbol table: > > - In the main executable we see a symbol 'foo@plt' that points at > the plt entry for foo, from this we add two entries into GDB's > msymbol table, one called 'foo@plt' which points at the plt entry > and has type mst_text, then we create a second symbol, this time > called 'foo' with type mst_solib_trampoline which also points at > the plt entry, > > - Then, when the shared library is loaded we see another symbol > called 'foo', this one points at the actual implementation in the > shared library. This time GDB creates a msymbol called 'foo' with > type mst_text that points at the implementation. > > This means that GDB creates 3 msymbols to represent the 2 symbols > found in the executable and shared library. > > When the user creates a breakpoint on 'foo' GDB eventually ends up in > search_minsyms_for_name (linespec.c), this function then calls > iterate_over_minimal_symbols passing in the name we are looking for > wrapped in a lookup_name_info object. > > In iterate_over_minimal_symbols we iterate over two hash tables (using > the name we're looking for as the hash key), first we walk the hash > table of symbol linkage names, then we walk the hash table of > demangled symbol names. > > When the language is C++ the symbols for 'foo' will all have been > mangled, as a result, in this case, the iteration of the linkage name > hash table will find no matching results. > > However, when we walk the demangled hash table we do find some > results. In order to match symbol names, GDB obtains a symbol name > matching function by calling the get_symbol_name_matcher method on the > language_defn class. For C++, in this case, the matching function we > use is cp_fq_symbol_name_matches, which delegates the work to > strncmp_iw_with_mode with mode strncmp_iw_mode::MATCH_PARAMS and > language set to language_cplus. > > The strncmp_iw_mode::MATCH_PARAMS mode means that strncmp_iw_mode will > skip any parameters in the demangled symbol name when checking for a > match, e.g. 'foo' will match the demangled name 'foo()'. The way this > is done is that the strings are matched character by character, but, > once the string we are looking for ('foo' here) is exhausted, if we > are looking at '(' then we consider the match a success. > > Lets consider the 3 symbols GDB created. If the function declaration > is 'void foo ()' then from the main executable we added symbols > '_Z3foov@plt' and '_Z3foov', while from the shared library we added > another symbol call '_Z3foov'. When these are demangled they become > 'foo()@plt', 'foo', and 'foo' respectively. > > Now, the '_Z3foov' symbol from the main executable has the type > mst_solib_trampoline, and in search_minsyms_for_name, we search for > any symbols of type mst_solib_trampoline and filter these out of the > results. > > However, the '_Z3foov@plt' symbol (from the main executable), and the > '_Z3foov' symbol (from the shared library) both have type mst_text. > > During the demangled name matching, due to the use of MATCH_PARAMS > mode, we stop the comparison as soon as we hit a '(' in the demangled > name. And so, '_Z3foov@plt', which demangles to 'foo()@plt' matches > 'foo', and '_Z3foov', which demangles to 'foo()' also matches 'foo'. > > By contrast, for C, there are no demangled hash table entries to be > iterated over (in iterate_over_minimal_symbols), we only consider the > linkage name symbols which are 'foo@plt' and 'foo'. The plain 'foo' > symbol obviously matches when we are looking for 'foo', but in this > case the 'foo@plt' will not match due to the '@plt' suffix. > > And so, when the user asks for a breakpoint in 'foo', and the language > is C, search_minsyms_for_name, returns a single msymbol, the mst_text > symbol for foo in the shared library, while, when the language is C++, > we get two results, '_Z3foov' for the shared library function, and > '_Z3foov@plt' for the plt entry in the main executable. > > I propose to fix this in strncmp_iw_with_mode. When the mode is > MATCH_PARAMS, instead of stopping at a '(' and assuming the match is a > success, GDB will instead search forward for the matching, closing, > ')', effectively skipping the parameter list, and then resume > matching. Thus, when comparing 'foo' to 'foo()@plt' GDB will > effectively compare against 'foo@plt' (skipping the parameter list), > and the match will fail, just as it does when the language is C. > > There is one slight complication, which is revealed by the test > gdb.linespec/cpcompletion.exp, when searching for the symbol of a > const member function, the demangled symbol will have 'const' at the > end of its name, e.g.: > > struct_with_const_overload::const_overload_fn() const > > Previously, the matching would stop at the '(' character, but after my > change the whole '()' is skipped, and the match resumes. As a result, > the 'const' modifier results in a failure to match, when previously > GDB would have found a match. > > To work around this issue, in strncmp_iw_with_mode, for language C++ > and mode MATCH_PARAMS, I explicitly allow a trailing 'const' to be > skipped. This explanation is the inverse of what the code does. What you actually implemented is that if an @ is found, you assume it is @plt, rather than allowing for "const". > > With these changes in place I now see GDB correctly setting a > breakpoint only at the implementation of 'foo' in the shared library. > > Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=20091 > Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17201 > Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17071 > Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17199 > --- > .../gdb.cp/breakpoint-shlib-func-lib.cc | 21 +++++ > gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc | 22 ++++++ > .../gdb.cp/breakpoint-shlib-func.exp | 78 +++++++++++++++++++ > gdb/utils.c | 26 ++++++- > 4 files changed, 146 insertions(+), 1 deletion(-) > create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc > create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc > create mode 100644 gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp > > diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc > new file mode 100644 > index 00000000000..7219f7c5a23 > --- /dev/null > +++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func-lib.cc > @@ -0,0 +1,21 @@ > +/* Copyright 2022-2023 Free Software Foundation, Inc. > + This program is free software; you can redistribute it and/or modify > + it under the terms of the GNU General Public License as published by > + the Free Software Foundation; either version 3 of the License, or > + (at your option) any later version. > + > + This program is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + GNU General Public License for more details. > + > + You should have received a copy of the GNU General Public License > + along with this program. If not, see . */ > + > +extern int foo (); > + > +int > +foo () > +{ > + return 0; > +} > diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc > new file mode 100644 > index 00000000000..a86d06560d4 > --- /dev/null > +++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.cc > @@ -0,0 +1,22 @@ > +/* Copyright 2022-2023 Free Software Foundation, Inc. > + This program is free software; you can redistribute it and/or modify > + it under the terms of the GNU General Public License as published by > + the Free Software Foundation; either version 3 of the License, or > + (at your option) any later version. > + > + This program is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + GNU General Public License for more details. > + > + You should have received a copy of the GNU General Public License > + along with this program. If not, see . */ > + > +extern int foo (); > + > +int > +main () > +{ > + int answer = foo (); > + return answer; > +} > diff --git a/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp > new file mode 100644 > index 00000000000..5d5c87d0b51 > --- /dev/null > +++ b/gdb/testsuite/gdb.cp/breakpoint-shlib-func.exp > @@ -0,0 +1,78 @@ > +# Copyright 2022-2023 Free Software Foundation, Inc. > +# This program is free software; you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation; either version 3 of the License, or > +# (at your option) any later version. > +# > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program. If not, see . > + > +# Places a breakpoint on a function in a shared library before the > +# inferior has started. GDB will place the breakpoint on the @plt > +# symbol in the main executable. > +# > +# When the inferior is started GDB will re-evaluate the breakpoint > +# location and move the breakpoint to the function implementation in > +# the shared library. > +# > +# Then, with the inferior started, delete all breakpoints, and > +# re-create the breakpoint on the shared library function, GDB should > +# place a single breakpoint on the function implementation in the > +# shared library. > + > +require allow_shlib_tests > + > +standard_testfile .cc -lib.cc > + > +set libobj [standard_output_file libfoo.so] > +if {[build_executable "build shared library" $libobj $srcfile2 \ > + {debug c++ shlib}] != 0} { > + return -1 > +} > + > +if {[prepare_for_testing "failed to prepare" $testfile $srcfile \ > + [list debug c++ shlib=$libobj]]} { > + return -1 > +} > + > +# Place the breakpoint before the shared library has been loaded, the > +# breakpoint should be placed on the @plt symbol. > +gdb_test "break foo" "Breakpoint $decimal at $hex" > +gdb_test "info breakpoints" "" > + > +# This is used as an override for delete_breakpoints when we don't > +# want functions in gdb.exp to delete breakpoints behind the scenes > +# for us. > +proc do_not_delete_breakpoints {} { > + # Just do nothing. > +} > + > +# Runto main, but don't delete all the breakpoints. > +with_override delete_breakpoints do_not_delete_breakpoints { > + if {![runto_main]} { > + return -1 > + } > +} > + > +# The breakpoint should now be showing in `foo` for real. > +gdb_test "info breakpoints" \ > + "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+\r\n.*" \ > + "check breakpoints after starting the inferior" > + > +# Now we can delete the breakpoints. > +delete_breakpoints > + > +# And recreate the foo breakpoint, we should only get one location, > +# the actual location. > +gdb_test "break foo" "Breakpoint $decimal at \[^\r\n\]+" \ > + "recreate foo breakpoint" > + > +# Check the breakpoint was recreated correctly. > +gdb_test "info breakpoints" \ > + "\r\n$decimal\\s+\[^\r\n\]+ in foo\\(\\) at \[^\r\n\]+" \ > + "check breakpoints after recreation" > diff --git a/gdb/utils.c b/gdb/utils.c > index 734c5bf7f70..e64fec941f1 100644 > --- a/gdb/utils.c > +++ b/gdb/utils.c > @@ -2397,7 +2397,31 @@ strncmp_iw_with_mode (const char *string1, const char *string2, > return 0; > } > else > - return (*string1 != '\0' && *string1 != '('); > + { > + if (*string1 == '(') > + { > + int p_count = 0; > + > + do > + { > + if (*string1 == '(') > + ++p_count; > + else if (*string1 == ')') > + --p_count; > + ++string1; > + } > + while (*string1 != '\0' && p_count > 0); > + > + /* There maybe things like 'const' after the parameters, > + which we do want to ignore. However, if there's an '@' > + then this likely indicates something like '@plt' which we > + should not ignore. */ > + return *string1 == '@'; > + } > + > + return *string1 == '\0' ? 0 : 1; > + } > + In here you make the assumption that string1 will always be the one from the symbol table and string2 wil always be the one from the user. While it seems like a fine assumption to make, since it was there already, but it would be nice if this assumption was documented in the comment somewhere. > } > else > return 1; > > base-commit: 7d02a94c8f74613edb0cdde11982b22eaaa9affe -- Cheers, Bruno