From: Zheng <linuxmaker@163.com>
To: "Andrew Burgess" <aburgess@redhat.com>
Cc: "Tom Tromey" <tom@tromey.com>,
"Zheng Zhan Liang via Gdb-patches" <gdb-patches@sourceware.org>,
"Zheng Zhan" <zzlossdev@163.com>
Subject: Re:Re:Re:Re: [PATCH] I'm debugging https://github.com/helix-editor/helix.git@63dcaae1b9083396fb3faaef9eaa2421f7e48fb9, which is a editor implemented with rust lang. When I type gdb command below: (gdb) b pars gdb dumped. I got: m_match = 0x7fffd8173cc7 "parse::h3bbecc5bbd82b347" m_ignored_ranges = { first = 0x7fffd8173cbb "<impl str>::parse::h3bbecc5bbd82b347", second = 0x7fffd8173cc5 "::parse::h3bbecc5bbd82b347" }
Date: Fri, 13 Jan 2023 13:24:56 +0800 (CST) [thread overview]
Message-ID: <61496177.3025.185a996955a.Coremail.linuxmaker@163.com> (raw)
In-Reply-To: <875ydbh8th.fsf@redhat.com>
At 2023-01-13 03:06:02, "Andrew Burgess" <aburgess@redhat.com> wrote:
>Zheng <linuxmaker@163.com> writes:
>
>> At 2023-01-06 23:53:48, "Andrew Burgess" <aburgess@redhat.com> wrote:
>>>linuxmaker via Gdb-patches <gdb-patches@sourceware.org> writes:
>>>
>>>> At 2023-01-06 04:24:42, "Tom Tromey" <tom@tromey.com> wrote:
>>>>>>>>>> "Zheng" == Zheng Zhan Liang via Gdb-patches <gdb-patches@sourceware.org> writes:
>>>>>
>>>>>Zheng> From: Zheng Zhan <zzlossdev@163.com>
>>>>>
>>>>>Hi. Thank you for the patch.
>>>>>
>>>>>The text all seemed to end up in the Subject line. Probably you need
>>>>>another newline after the first line of the commit message.
>>>>>
>>>>>Zheng> --- a/gdb/completer.h
>>>>>Zheng> +++ b/gdb/completer.h
>>>>>Zheng> @@ -163,8 +163,11 @@ class completion_match_for_lcd
>>>>>Zheng> const char *prev = m_match;
>>>>>Zheng> for (const auto &range : m_ignored_ranges)
>>>>>Zheng> {
>>>>>Zheng> - m_finished_storage.append (prev, range.first);
>>>>>Zheng> - prev = range.second;
>>>>>Zheng> + if (prev < range.first)
>>>>>Zheng> + {
>>>>>Zheng> + m_finished_storage.append (prev, range.first);
>>>>>Zheng> + prev = range.second;
>>>>>Zheng> + }
>>>>>
>>>>>Is there any way to construct a test case for this?
>>>> seems pretty common in rust. you can test it like below:
>>>> fn main() {
>>>> let four: u8 = "4".parse().unwrap();
>>>> println!("{}", four);
>>>> }
>>>> (gdb) b pars[TAB]
>>>
>>>I haven't reviewed the fix. But in case this makes it easier for
>> I've done a little more digging. And It looks like `strncmp_iw_with_mode` cached
>> the last `mark_ignored_range` setting, then crap the current one.
>> I paste a newer patch at the bottom, and it seems work for me.
>
>Thanks, I was also looking at the same area of code, wondering if that
>would be a better place for the fix.
>
>I'd actually like to propose that the fix be moved one frame up the
>stack to cp_symbol_name_matches_1, my reasons are explained in the patch
>below.
>
>Additionally, I've now managed to reproduce the crash when completing a
>C++ symbol, which means we have a much more reliable test - it doesn't
>depend on symbols being pulled in from the rust runtime, which might
>change in the future.
>
>The patch below is my proposal for what we should merge.
>
>All feedback welcome.
>
>Thanks,
>Andrew
>
>--
>
>commit 467065c7a25a6fc87fab6c73ac6d7d509aae81b6
>Author: Andrew Burgess <aburgess@redhat.com>
>Date: Fri Jan 6 15:50:26 2023 +0000
>
> gdb: fix crash during command completion
>
> In some cases GDB will fail when attempting to complete a command that
> involves a rust symbol, the failure can manifest as a crash.
>
> The problem is caused by the completion_match_for_lcd object being
> left containing invalid data during calls to cp_symbol_name_matches_1.
>
> The first question to address is why we are calling a C++ support
> function when handling a rust symbol. That's due to GDB's auto
> language detection for msymbols, in some cases GDB can't tell if a
> symbol is a rust symbol, or a C++ symbol.
>
> The test application contains symbols for functions which are
> statically linked in from various rust support libraries. There's no
> DWARF for these symbols, so all GDB has is the msymbols built from the
> ELF symbol table.
>
> Here's the problematic symbol that leads to our crash:
>
> mangled: _ZN4core3str21_$LT$impl$u20$str$GT$5parse17h5111d2d6a50d22bdE
> demangled: core::str::<impl str>::parse
>
> As an msymbol this is initially created with language auto, then GDB
> eventually calls symbol_find_demangled_name, which loops over all
> languages calling language_defn::sniff_from_mangled_name, the first
> language that can demangle the symbol gets assigned as the language
> for that symbol.
>
> Unfortunately, there's overlap in the mangled symbol names,
> some (legacy) rust symbols can be demangled as both rust and C++, see
> cplus_demangle in libiberty/cplus-dem.c where this is mentioned.
>
> And so, because we check the C++ language before we check for rust,
> then the msymbol is (incorrectly) given the C++ language.
>
> Now it's true that is some cases we might be able to figure out that a
> demangled symbol is not actually a valid C++ symbol, for example, in
> our case, the construct '::<impl str>::' is not, I believe, valid in a
> C++ symbol, we could look for ':<' and '>:' and refuse to accept this
> as a C++ symbol.
>
> However, I'm not sure it is always possible to tell that a demangled
> symbol is rust or C++, so, I think, we have to accept that some times
> we will get this language detection wrong.
>
> If we accept that we can't fix the symbol language detection 100% of
> the time, then we should make sure that GDB doesn't crash when it gets
> the language wrong, that is what this commit addresses.
>
> In our test case the user tries to complete a symbol name like this:
>
> (gdb) complete break pars
>
> This results in GDB trying to find all symbols that match 'pars',
> eventually we consider our problematic symbol, and we end up with a
> call stack that looks like this:
>
> #0 0x0000000000f3c6bd in strncmp_iw_with_mode
> #1 0x0000000000706d8d in cp_symbol_name_matches_1
> #2 0x0000000000706fa4 in cp_symbol_name_matches
> #3 0x0000000000df3c45 in compare_symbol_name
> #4 0x0000000000df3c91 in completion_list_add_name
> #5 0x0000000000df3f1d in completion_list_add_msymbol
> #6 0x0000000000df4c94 in default_collect_symbol_completion_matches_break_on
> #7 0x0000000000658c08 in language_defn::collect_symbol_completion_matches
> #8 0x0000000000df54c9 in collect_symbol_completion_matches
> #9 0x00000000009d98fb in linespec_complete_function
> #10 0x00000000009d99f0 in complete_linespec_component
> #11 0x00000000009da200 in linespec_complete
> #12 0x00000000006e4132 in complete_address_and_linespec_locations
> #13 0x00000000006e4ac3 in location_completer
>
> In cp_symbol_name_matches_1 we enter a loop, this loop repeatedly
> tries to match the demangled problematic symbol name against the user
> supplied text ('pars'). Each time around the loop another component
> of the symbol name is stripped off, thus, we check 'pars' against
> these options:
>
> core::str::<impl str>::parse
> str::<impl str>::parse
> <impl str>::parse
> parse
>
> As soon as we get a match the cp_symbol_name_matches_1 exits its loop
> and returns. In our case, when we're looking for 'pars', the match
> occurs on the last iteration of the loop, when we are comparing to
> 'parse'.
>
> Now the problem here is that cp_symbol_name_matches_1 uses the
> strncmp_iw_with_mode, and inside strncmp_iw_with_mode we allow for
> skipping over template parameters. This allows GDB to match the
> symbol name 'foo<int>(int,int)' if the user supplies 'foo(int,'.
> Inside strncmp_iw_with_mode GDB will record any template arguments
> that it has skipped over inside the completion_match_for_lcd object
> that is passed in as an argument.
>
> And so, when GDB tries to match against '<impl str>::parse', the first
> thing it sees is '<impl str>', GDB assumes this is a template argument
> and records this as a skipped region within the
> completion_match_for_lcd object. After '<impl str>' GDB sees a ':'
> character, which doesn't match with the 'pars' the user supplied, so
> strncmp_iw_with_mode returns a value indicating a non-match. GDB then
> removes the '<impl str>' component from the symbol name and tries
> again, this time comparing to 'parse', which does match.
>
> Having found a match, then in cp_symbol_name_matches_1 we record the
> match string, and the full symbol name within the
> completion_match_result object, and return.
>
> The problem here is that the skipped region, the '<impl str>' that we
> recorded in the penultimate loop iteration was never discarded, its
> still there in our returned result.
>
> If we look at what the pointers held in the completion_match_result
> that cp_symbol_name_matches_1 returns, this is what we see:
>
> core::str::<impl str>::parse
> | \________/ |
> | | '--- completion match string
> | '---skip range
> '--- full symbol name
>
> When GDB calls completion_match_for_lcd::finish, GDB tries to create a
> string using the completion match string (parse), but excluding the
> skip range, as the stored skip range is before the start of the
> completion match string, then GDB tries to do some weird string
> creation, which will cause GDB to crash.
>
> The reason we don't often see this problem in C++ is that for C++
> symbols there is always some non-template text before the template
> argument. This non-template text means GDB is likely to either match
> the symbol, or reject the symbol without storing a skip range.
>
> However, notice, I did say, we don't often see this problem. Once I
> understood the issue, I was able to reproduce the crash using a pure
> C++ example:
>
> template<typename S>
> struct foo
> {
> template<typename T>
> foo (int p1, T a)
> {
> s = 0;
> }
>
> S s;
> };
>
> int
> main ()
> {
> foo<int> obj (2.3, 0);
> return 0;
> }
>
> Then in GDB:
>
> (gdb) complete break foo(int
>
> The problem here is that the C++ symbol for the constructor looks like
> this:
>
> foo<int>::foo<double>(int, double)
>
> When GDB enters cp_symbol_name_matches_1 the symbols it examines are:
>
> foo<int>::foo<double>(int, double)
> foo<double>(int, double)
>
> The first iteration of the loop will match the 'foo', then add the
> '<int>' template argument will be added as a skip range. When GDB
> find the ':' after the '<int>' the first iteration of the loop fails
> to match, GDB removes the 'foo<int>::' component, and starts the
> second iteration of the loop.
>
> Again, GDB matches the 'foo', and now adds '<double>' as a skip
> region. After that the '(int' successfully matches, and so the second
> iteration of the loop succeeds, but, once again we left the '<int>' in
> place as a skip region, even though this occurs before the start of
> our match string, and this will cause GDB to crash.
>
> This problem was reported to the mailing list, and a solution
> discussed in this thread:
>
> https://sourceware.org/pipermail/gdb-patches/2023-January/195166.html
>
> The solution proposed here is similar to one proposed by the original
> bug reported, but implemented in a different location within GDB.
> Instead of placing the fix in strncmp_iw_with_mode, I place the fix in
> cp_symbol_name_matches_1. I believe this is a better location as it
> is this function that implements the loop, and it is this loop, which
> repeatedly calls strncmp_iw_with_mode, that should be resetting the
> result object state (I believe).
>
> What I have done is add an assert to strncmp_iw_with_mode that the
> incoming result object is empty.
>
> I've also added some other asserts in related code, in
> completion_match_for_lcd::mark_ignored_range, I make some basic
> assertions about the incoming range pointers, and in
> completion_match_for_lcd::finish I also make some assertions about how
> the skip ranges relate to the match pointer.
>
> There's two new tests. The original rust example that was used in the
> initial bug report, and a C++ test. The rust example depends on which
> symbols are pulled in from the rust libraries, so it is possible that,
> at some future date, the problematic symbol will disappear from this
> test program. The C++ test should be more reliable, as this only
> depends on symbols from within the C++ source code.
>
> Co-Authored-By: Zheng Zhan <zzlossdev@163.com>
>
>diff --git a/gdb/completer.h b/gdb/completer.h
>index 8b4ad8ec4d4..67d2fbf9d38 100644
>--- a/gdb/completer.h
>+++ b/gdb/completer.h
>@@ -145,7 +145,12 @@ class completion_match_for_lcd
>
> /* Mark the range between [BEGIN, END) as ignored. */
> void mark_ignored_range (const char *begin, const char *end)
>- { m_ignored_ranges.emplace_back (begin, end); }
>+ {
>+ gdb_assert (begin < end);
>+ gdb_assert (m_ignored_ranges.empty ()
>+ || m_ignored_ranges.back ().second < begin);
>+ m_ignored_ranges.emplace_back (begin, end);
>+ }
>
> /* Get the resulting LCD, after a successful match. If there are
> ignored ranges, then this builds a new string with the ignored
>@@ -160,9 +165,14 @@ class completion_match_for_lcd
> {
> m_finished_storage.clear ();
>
>+ gdb_assert (m_ignored_ranges.back ().second
>+ <= (m_match + strlen (m_match)));
>+
> const char *prev = m_match;
> for (const auto &range : m_ignored_ranges)
> {
>+ gdb_assert (prev < range.first);
>+ gdb_assert (range.second > range.first);
> m_finished_storage.append (prev, range.first);
> prev = range.second;
> }
>@@ -179,6 +189,13 @@ class completion_match_for_lcd
> m_ignored_ranges.clear ();
> }
>
>+ /* Return true if this object has had no match data set since its
>+ creation, or the last call to clear. */
>+ bool empty () const
>+ {
>+ return m_match == nullptr && m_ignored_ranges.empty ();
>+ }
>+
> private:
> /* The completion match result for LCD. This is usually either a
> pointer into to a substring within a symbol's name, or to the
>diff --git a/gdb/cp-support.c b/gdb/cp-support.c
>index b7ed2101b78..fb5758ff914 100644
>--- a/gdb/cp-support.c
>+++ b/gdb/cp-support.c
>@@ -1773,6 +1773,8 @@ cp_symbol_name_matches_1 (const char *symbol_search_name,
> completion_match_for_lcd *match_for_lcd
> = (comp_match_res != NULL ? &comp_match_res->match_for_lcd : NULL);
>
>+ gdb_assert (match_for_lcd == nullptr || match_for_lcd->empty ());
>+
> while (true)
> {
> if (strncmp_iw_with_mode (sname, lookup_name, lookup_name_len,
>@@ -1806,6 +1808,11 @@ cp_symbol_name_matches_1 (const char *symbol_search_name,
> return true;
> }
>
>+ /* Clear match_for_lcd so the next strncmp_iw_with_mode call starts
>+ from scratch. */
>+ if (match_for_lcd != nullptr)
>+ match_for_lcd->clear ();
Well done. I have a question here, what if the `match_for_lcd` comes from the final call of
`strncmp_iw_with_mode` and what's it contain matters? Currently, I find two more references to
`strnmcp_iw_with_mode`. Are we going to fix each one outsidely?
Thanks,
Zheng
>+
> unsigned int len = cp_find_first_component (sname);
>
> if (sname[len] == '\0')
>diff --git a/gdb/testsuite/gdb.cp/cpcompletion.exp b/gdb/testsuite/gdb.cp/cpcompletion.exp
>index 931f376a23d..48692dbc1f8 100644
>--- a/gdb/testsuite/gdb.cp/cpcompletion.exp
>+++ b/gdb/testsuite/gdb.cp/cpcompletion.exp
>@@ -135,3 +135,5 @@ with_test_prefix "expression with namespace" {
> # Add a disambiguating character and we get a unique completion.
> test_gdb_complete_unique "p Test_NS::f" "p Test_NS::foo"
> }
>+
>+test_gdb_complete_unique "break baz(int" "break baz(int, double)"
>diff --git a/gdb/testsuite/gdb.cp/pr9594.cc b/gdb/testsuite/gdb.cp/pr9594.cc
>index 54ddaafc0ca..b854d810b11 100644
>--- a/gdb/testsuite/gdb.cp/pr9594.cc
>+++ b/gdb/testsuite/gdb.cp/pr9594.cc
>@@ -52,8 +52,31 @@ int qux;
>
> } /* namespace Test_NS */
>
>+/* The important thing with class baz is that both the class and the
>+ constructor must have a template argument, we need the symbol to look
>+ like:
>+
>+ baz<TYPE_1>::baz<TYPE_2>(int,....whatever...)
>+
>+ It doesn't really matter if TYPE_1 and TYPE_2 are the same or different,
>+ but we create them differently in this test as it makes debugging GDB
>+ slightly easier. */
>+
>+template<typename S>
>+struct baz
>+{
>+ template<typename T>
>+ baz (int p1, T a)
>+ {
>+ s = 0;
>+ }
>+
>+ S s;
>+};
>+
> int main ()
> {
>+ baz<int> obj (2.3, 0.1);
> // Anonymous struct with method.
> struct {
> int get() { return 5; }
>diff --git a/gdb/testsuite/gdb.rust/completion.exp b/gdb/testsuite/gdb.rust/completion.exp
>new file mode 100644
>index 00000000000..00923db14a6
>--- /dev/null
>+++ b/gdb/testsuite/gdb.rust/completion.exp
>@@ -0,0 +1,34 @@
>+# Copyright (C) 2023 Free Software Foundation, Inc.
>+
>+# This program is free software; you can redistribute it and/or modify
>+# it under the terms of the GNU General Public License as published by
>+# the Free Software Foundation; either version 3 of the License, or
>+# (at your option) any later version.
>+#
>+# This program is distributed in the hope that it will be useful,
>+# but WITHOUT ANY WARRANTY; without even the implied warranty of
>+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>+# GNU General Public License for more details.
>+#
>+# You should have received a copy of the GNU General Public License
>+# along with this program. If not, see <http://www.gnu.org/licenses/>.
>+
>+# Test a completion case that was causing GDB to crash.
>+
>+load_lib rust-support.exp
>+if {[skip_rust_tests]} {
>+ return
>+}
>+
>+standard_testfile .rs
>+if {[prepare_for_testing "failed to prepare" $testfile $srcfile {debug rust}]} {
>+ return -1
>+}
>+
>+set line [gdb_get_line_number "set breakpoint here"]
>+if {![runto ${srcfile}:$line]} {
>+ untested "could not run to breakpoint"
>+ return -1
>+}
>+
>+gdb_test "complete break pars" ".*"
>diff --git a/gdb/testsuite/gdb.rust/completion.rs b/gdb/testsuite/gdb.rust/completion.rs
>new file mode 100644
>index 00000000000..8396e3f4086
>--- /dev/null
>+++ b/gdb/testsuite/gdb.rust/completion.rs
>@@ -0,0 +1,19 @@
>+// Copyright (C) 2023 Free Software Foundation, Inc.
>+
>+// This program is free software; you can redistribute it and/or modify
>+// it under the terms of the GNU General Public License as published by
>+// the Free Software Foundation; either version 3 of the License, or
>+// (at your option) any later version.
>+//
>+// This program is distributed in the hope that it will be useful,
>+// but WITHOUT ANY WARRANTY; without even the implied warranty of
>+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>+// GNU General Public License for more details.
>+//
>+// You should have received a copy of the GNU General Public License
>+// along with this program. If not, see <http://www.gnu.org/licenses/>.
>+
>+fn main() {
>+ let four: u8 = "4".parse().unwrap();
>+ println!("{}", four); // set breakpoint here
>+}
>diff --git a/gdb/utils.c b/gdb/utils.c
>index 734c5bf7f70..ab14e186821 100644
>--- a/gdb/utils.c
>+++ b/gdb/utils.c
>@@ -2139,6 +2139,8 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
> || language == language_rust
> || language == language_fortran);
>
>+ gdb_assert (match_for_lcd == nullptr || match_for_lcd->empty ());
>+
> while (1)
> {
> if (skip_spaces
next prev parent reply other threads:[~2023-01-13 5:25 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-02 12:43 Zheng Zhan Liang
2023-01-05 20:24 ` Tom Tromey
2023-01-06 3:01 ` linuxmaker
2023-01-06 15:53 ` Andrew Burgess
2023-01-07 8:58 ` Zheng
2023-01-12 19:06 ` Andrew Burgess
2023-01-13 5:24 ` Zheng [this message]
2023-01-13 14:46 ` Andrew Burgess
2023-03-20 13:56 ` Tom Tromey
2023-03-20 16:08 ` Andrew Burgess
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=61496177.3025.185a996955a.Coremail.linuxmaker@163.com \
--to=linuxmaker@163.com \
--cc=aburgess@redhat.com \
--cc=gdb-patches@sourceware.org \
--cc=tom@tromey.com \
--cc=zzlossdev@163.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).