From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id CD9193857803 for ; Mon, 13 Feb 2023 09:31:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CD9193857803 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676280673; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=AgvqwEJmhDI5zgMBFZplcXkYhbYYNlr3Yjsq22WObos=; b=CKsHojDDzkIb+5NDCvg1t3DnHy2cwOMo4dWctB2SGc99H+fnL/i9ElM/GldZwW8Qx5HfY2 WEfTZaor2YJ0lWjW81uPLYIEmRY35w6c/Zg1il75+Kx1o8Bl/4WvKRezK7wEhYKmFik1Fj matGEDyofNw526AaxbNB1wZ1wQnPga0= Received: from mail-wr1-f71.google.com (mail-wr1-f71.google.com [209.85.221.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-518-Ptq0xDHLPN-9aq9a452Dqw-1; Mon, 13 Feb 2023 04:31:12 -0500 X-MC-Unique: Ptq0xDHLPN-9aq9a452Dqw-1 Received: by mail-wr1-f71.google.com with SMTP id r15-20020a5d494f000000b002c54d9b8312so723533wrs.5 for ; Mon, 13 Feb 2023 01:31:11 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=AgvqwEJmhDI5zgMBFZplcXkYhbYYNlr3Yjsq22WObos=; b=Cv4OTUqgMNg3QNY+6AsOvlPwp3tP/QEufqQubY0IpnAByUd/Xy/vs/fCwNO5CJEsfK bIwWpCSjoF7hY1Dafcr7uhARcI6bCvWXu0FezsPkfa55S11xJGcCwjQ7J/dshEf/CVOz W74eIx/8Dd55ZPNZ6sgYPyAJYXJn1eTuEWLy1nwnVpnL+gqkfsSN0uzHcAmEUv2QEj3g sZgpYocUAmKiwEB6e9zs5e04SFN3M1QsEKItpNnIONYOT8lF2e7dJHI4BjUMk1AB7KLo z+Ts7qDoxe4aqFqCcKeOFTutPrzxurbRB3NAaBT9CgjaTibIwf9iKZ3xM1Wq2xayYL5A ZZAQ== X-Gm-Message-State: AO0yUKUWYNnWRAqczGgkzuRW2D8LL6tFW3ilgs2AJHjTdiQ88oAIWxt9 tkrBke34OXhmn6cByzS4R8nJwpCN12n0mxJZG3F0YpwJzJ0nqlEvcnlvyuyiLaQWmd9lKxpBi4/ gYrz6TOONdav5Lk3AYLW4ogYBeFs= X-Received: by 2002:a5d:610c:0:b0:2bf:ecee:acc6 with SMTP id v12-20020a5d610c000000b002bfeceeacc6mr20060047wrt.61.1676280670574; Mon, 13 Feb 2023 01:31:10 -0800 (PST) X-Google-Smtp-Source: AK7set/FiR2pYAkQXwO0rgQBYl3YCnamKnaJRYK/XPIewNwHz3jYpbJYrz0z6qb/yAzJctYC95LcQg== X-Received: by 2002:a5d:610c:0:b0:2bf:ecee:acc6 with SMTP id v12-20020a5d610c000000b002bfeceeacc6mr20060032wrt.61.1676280670217; Mon, 13 Feb 2023 01:31:10 -0800 (PST) Received: from [192.168.0.45] (ip-94-112-225-44.bb.vodafone.cz. [94.112.225.44]) by smtp.gmail.com with ESMTPSA id x1-20020adfffc1000000b002425be3c9e2sm10118139wrs.60.2023.02.13.01.31.09 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 13 Feb 2023 01:31:09 -0800 (PST) Message-ID: Date: Mon, 13 Feb 2023 10:31:08 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.7.1 Subject: Re: [PATCHv2] gdb/c++: fix handling of breakpoints on @plt symbols To: Andrew Burgess , gdb-patches@sourceware.org References: <756de5175770d79b3f0642fa3035ef348388bda4.1671544509.git.aburgess@redhat.com> <29e560fd9c94874f0839924fa25557a7e8418ad3.1674215225.git.aburgess@redhat.com> <87v8k9uyku.fsf@redhat.com> From: Bruno Larsen In-Reply-To: <87v8k9uyku.fsf@redhat.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,NICE_REPLY_A,RCVD_IN_BARRACUDACENTRAL,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,TXREP,WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 10/02/2023 20:09, Andrew Burgess wrote: > Bruno Larsen writes: > >> On 20/01/2023 12:48, Andrew Burgess via Gdb-patches wrote: >>> Changes since v1: >>> >>> - Rebased to current HEAD of master, >>> >>> - Updated test to use 'require allow_shlib_tests', and wrapped a few >>> long lines. Also Copyright date ranges have been updated for 2023. >>> >>> --- >> Hi Andrew, >> >> I've tested this patch and verified that it doesn't introduce any >> regressions, just a few minor comments inlined below. >> >> Tested-By: Bruno Larsen >> >>> This commit should fix PR gdb/20091, PR gdb/17201, and PR gdb/17071. >>> Additionally, PR gdb/17199 relates to this area of code, but is more >>> of a request to refactor some parts of GDB, this commit does not >>> address that request, but it is probably worth reading that PR when >>> looking at this commit. >>> >>> When the current language is C++, and the user places a breakpoint on >>> a function in a shared library, GDB will currently find two locations >>> for the breakpoint, one location will be within the function itself as >>> we would expect, but the other location will be within the PLT table >>> for the call to the named function. Consider this session: >>> >>> $ gdb -q /tmp/breakpoint-shlib-func >>> Reading symbols from /tmp/breakpoint-shlib-func... >>> (gdb) start >>> Temporary breakpoint 1 at 0x40112e: file /tmp/breakpoint-shlib-func.cc, line 20. >>> Starting program: /tmp/breakpoint-shlib-func >>> >>> Temporary breakpoint 1, main () at /tmp/breakpoint-shlib-func.cc:20 >>> 20 int answer = foo (); >>> (gdb) break foo >>> Breakpoint 2 at 0x401030 (2 locations) >>> (gdb) info breakpoints >>> Num Type Disp Enb Address What >>> 2 breakpoint keep y >>> 2.1 y 0x0000000000401030 >>> 2.2 y 0x00007ffff7fc50fd in foo() at /tmp/breakpoint-shlib-func-lib.cc:20 >>> >>> This is not the expected behaviour. If we compile the same test using >>> a C compiler then we see this: >>> >>> (gdb) break foo >>> Breakpoint 2 at 0x7ffff7fc50fd: file /tmp/breakpoint-shlib-func-c-lib.c, line 20. >>> (gdb) info breakpoints >>> Num Type Disp Enb Address What >>> 2 breakpoint keep y 0x00007ffff7fc50fd in foo at /tmp/breakpoint-shlib-func-c-lib.c:20 >>> >>> Here's what's happening. When GDB parses the symbols in the main >>> executable and the shared library we see a number of different symbols >>> for foo, and use these to create entries in GDB's msymbol table: >>> >>> - In the main executable we see a symbol 'foo@plt' that points at >>> the plt entry for foo, from this we add two entries into GDB's >>> msymbol table, one called 'foo@plt' which points at the plt entry >>> and has type mst_text, then we create a second symbol, this time >>> called 'foo' with type mst_solib_trampoline which also points at >>> the plt entry, >>> >>> - Then, when the shared library is loaded we see another symbol >>> called 'foo', this one points at the actual implementation in the >>> shared library. This time GDB creates a msymbol called 'foo' with >>> type mst_text that points at the implementation. >>> >>> This means that GDB creates 3 msymbols to represent the 2 symbols >>> found in the executable and shared library. >>> >>> When the user creates a breakpoint on 'foo' GDB eventually ends up in >>> search_minsyms_for_name (linespec.c), this function then calls >>> iterate_over_minimal_symbols passing in the name we are looking for >>> wrapped in a lookup_name_info object. >>> >>> In iterate_over_minimal_symbols we iterate over two hash tables (using >>> the name we're looking for as the hash key), first we walk the hash >>> table of symbol linkage names, then we walk the hash table of >>> demangled symbol names. >>> >>> When the language is C++ the symbols for 'foo' will all have been >>> mangled, as a result, in this case, the iteration of the linkage name >>> hash table will find no matching results. >>> >>> However, when we walk the demangled hash table we do find some >>> results. In order to match symbol names, GDB obtains a symbol name >>> matching function by calling the get_symbol_name_matcher method on the >>> language_defn class. For C++, in this case, the matching function we >>> use is cp_fq_symbol_name_matches, which delegates the work to >>> strncmp_iw_with_mode with mode strncmp_iw_mode::MATCH_PARAMS and >>> language set to language_cplus. >>> >>> The strncmp_iw_mode::MATCH_PARAMS mode means that strncmp_iw_mode will >>> skip any parameters in the demangled symbol name when checking for a >>> match, e.g. 'foo' will match the demangled name 'foo()'. The way this >>> is done is that the strings are matched character by character, but, >>> once the string we are looking for ('foo' here) is exhausted, if we >>> are looking at '(' then we consider the match a success. >>> >>> Lets consider the 3 symbols GDB created. If the function declaration >>> is 'void foo ()' then from the main executable we added symbols >>> '_Z3foov@plt' and '_Z3foov', while from the shared library we added >>> another symbol call '_Z3foov'. When these are demangled they become >>> 'foo()@plt', 'foo', and 'foo' respectively. >>> >>> Now, the '_Z3foov' symbol from the main executable has the type >>> mst_solib_trampoline, and in search_minsyms_for_name, we search for >>> any symbols of type mst_solib_trampoline and filter these out of the >>> results. >>> >>> However, the '_Z3foov@plt' symbol (from the main executable), and the >>> '_Z3foov' symbol (from the shared library) both have type mst_text. >>> >>> During the demangled name matching, due to the use of MATCH_PARAMS >>> mode, we stop the comparison as soon as we hit a '(' in the demangled >>> name. And so, '_Z3foov@plt', which demangles to 'foo()@plt' matches >>> 'foo', and '_Z3foov', which demangles to 'foo()' also matches 'foo'. >>> >>> By contrast, for C, there are no demangled hash table entries to be >>> iterated over (in iterate_over_minimal_symbols), we only consider the >>> linkage name symbols which are 'foo@plt' and 'foo'. The plain 'foo' >>> symbol obviously matches when we are looking for 'foo', but in this >>> case the 'foo@plt' will not match due to the '@plt' suffix. >>> >>> And so, when the user asks for a breakpoint in 'foo', and the language >>> is C, search_minsyms_for_name, returns a single msymbol, the mst_text >>> symbol for foo in the shared library, while, when the language is C++, >>> we get two results, '_Z3foov' for the shared library function, and >>> '_Z3foov@plt' for the plt entry in the main executable. >>> >>> I propose to fix this in strncmp_iw_with_mode. When the mode is >>> MATCH_PARAMS, instead of stopping at a '(' and assuming the match is a >>> success, GDB will instead search forward for the matching, closing, >>> ')', effectively skipping the parameter list, and then resume >>> matching. Thus, when comparing 'foo' to 'foo()@plt' GDB will >>> effectively compare against 'foo@plt' (skipping the parameter list), >>> and the match will fail, just as it does when the language is C. >>> >>> There is one slight complication, which is revealed by the test >>> gdb.linespec/cpcompletion.exp, when searching for the symbol of a >>> const member function, the demangled symbol will have 'const' at the >>> end of its name, e.g.: >>> >>> struct_with_const_overload::const_overload_fn() const >>> >>> Previously, the matching would stop at the '(' character, but after my >>> change the whole '()' is skipped, and the match resumes. As a result, >>> the 'const' modifier results in a failure to match, when previously >>> GDB would have found a match. >>> >>> To work around this issue, in strncmp_iw_with_mode, for language C++ >>> and mode MATCH_PARAMS, I explicitly allow a trailing 'const' to be >>> skipped. >> This explanation is the inverse of what the code does. What you actually >> implemented is that if an @ is found, you assume it is @plt, rather than >> allowing for "const". > Thanks, I've reworded this in V3. Do let me know if you still think it > is not correct. The new wording is good, thank you! > >>> With these changes in place I now see GDB correctly setting a >>> breakpoint only at the implementation of 'foo' in the shared library. >>> >>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=20091 >>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17201 >>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17071 >>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17199 >>> --- >>> diff --git a/gdb/utils.c b/gdb/utils.c >>> index 734c5bf7f70..e64fec941f1 100644 >>> --- a/gdb/utils.c >>> +++ b/gdb/utils.c >>> @@ -2397,7 +2397,31 @@ strncmp_iw_with_mode (const char *string1, const char *string2, >>> return 0; >>> } >>> else >>> - return (*string1 != '\0' && *string1 != '('); >>> + { >>> + if (*string1 == '(') >>> + { >>> + int p_count = 0; >>> + >>> + do >>> + { >>> + if (*string1 == '(') >>> + ++p_count; >>> + else if (*string1 == ')') >>> + --p_count; >>> + ++string1; >>> + } >>> + while (*string1 != '\0' && p_count > 0); >>> + >>> + /* There maybe things like 'const' after the parameters, >>> + which we do want to ignore. However, if there's an '@' >>> + then this likely indicates something like '@plt' which we >>> + should not ignore. */ >>> + return *string1 == '@'; >>> + } >>> + >>> + return *string1 == '\0' ? 0 : 1; >>> + } >>> + >> In here you make the assumption that string1 will always be the one from >> the symbol table and string2 wil always be the one from the user. While >> it seems like a fine assumption to make, since it was there already, but >> it would be nice if this assumption was documented in the comment >> somewhere. > The documentation for these functions is really not great. You're right > that string1 is the one likely from the symbol table, but the actual > documentation can be found on the declaration of strcmp_iw in utils.h, > and I believe my changes don't add or change any of the assumptions > already stated there. > > I'd really rather not get into trying to rewrite the documentation for > these functions are part of this commit, some parts of these functions, > especially around the mode, still leave me a little confused - I can see > what the functions does, but it's not clear to me why we do it... I > suspect understanding all this, and documenting this could take some > time :/ > > Let me know if you feel this is a blocker for merging this patch. I don't think it is a blocker, especially because I agree that it isn't a new assumption, I would just like it documented at some point. The reason I brought it up is because I thought there were fewer checks for size in the path that you added, but I checked again and there aren't, so there is no real rush to get the docs up. -- Cheers, Bruno > > Thanks, > Andrew > > >>> } >>> else >>> return 1; >>> >>> base-commit: 7d02a94c8f74613edb0cdde11982b22eaaa9affe >> >> -- >> Cheers, >> Bruno