* Re: [PATCH] Fix comparator bug in cooked index
2023-01-27 19:56 [PATCH] Fix comparator bug in cooked index Tom Tromey
@ 2023-01-28 4:20 ` Simon Marchi
2023-01-30 14:30 ` Tom Tromey
2023-01-30 10:33 ` Andrew Burgess
1 sibling, 1 reply; 5+ messages in thread
From: Simon Marchi @ 2023-01-28 4:20 UTC (permalink / raw)
To: Tom Tromey, gdb-patches
On 1/27/23 14:56, Tom Tromey via Gdb-patches wrote:
> Simon pointed out that the cooked index template-matching patch
> introduced a failure in libstdc++ debug mode. In particular, the new
> code violates the assumption of std::lower_bound and std::upper_bound
> that the range is sorted with respect to the comparison.
>
> When I first debugged this, I thought the problem was unfixable as-is
> and that a second layer of filtering would have to be done. However,
> on irc, Simon pointed out that it could perhaps be solved if the
> comparison function were assured that one operand always came from the
> index, with the other always being the search string.
>
> This patch implements this idea.
>
> First, a new mode is introduced: a sorting mode for
> cooked_index_entry::compare. In this mode, strings are compared
> case-insensitively, but we're careful to always sort '<' before any
> other printable character. This way, two names like "func" and
> "func<param>" will be sorted next to each other -- i.e., "func1" will
> not be seen between them. This is important when searching.
>
> Second, the compare function is changed to work in a strcmp-like way.
> This makes it easier to test and (IMO) understand.
>
> Third, the compare function is modified so that in non-sorting modes,
> the index entry is always the first argument. This allows consistency
> in compares.
>
> I regression tested this in libstdc++ debug mode on x86-64 Fedora 36.
> It fixes the crash that Simon saw.
I'm glad to know this idea works.
Just some minor comments below, otherwise the patch LGTM.
> ---
> gdb/dwarf2/cooked-index.c | 166 ++++++++++++++++++++------------------
> gdb/dwarf2/cooked-index.h | 42 ++++++++--
> 2 files changed, 123 insertions(+), 85 deletions(-)
>
> diff --git a/gdb/dwarf2/cooked-index.c b/gdb/dwarf2/cooked-index.c
> index 09b3fd70b26..f6b1df6e529 100644
> --- a/gdb/dwarf2/cooked-index.c
> +++ b/gdb/dwarf2/cooked-index.c
> @@ -30,56 +30,38 @@
>
> /* See cooked-index.h. */
>
> -bool
> +int
> cooked_index_entry::compare (const char *stra, const char *strb,
> - bool completing)
> + comparison_mode mode)
> {
> - /* If we've ever matched "<" in both strings, then we disable the
> - special template parameter handling. */
> - bool seen_lt = false;
> + /* We want to sort '<' before any other printable character. So,
> + rewrite '<' to something just before ' '. */
> +#define MUNGE(c) (c == '<' ? '\x1f' : TOLOWER ((unsigned char) c))
Can you make it a local function, like
auto munge = [] (char c)
{
...
};
? Macros are harder to debug.
> + /* Convenience aliases. */
> + const auto mode_compare = cooked_index_entry::COMPARE;
> + const auto mode_sort = cooked_index_entry::SORT;
> + const auto mode_complete = cooked_index_entry::COMPLETE;
> +
> + SELF_CHECK (cooked_index_entry::compare ("abcd", "abcd",
> + mode_compare) == 0);
> + SELF_CHECK (cooked_index_entry::compare ("abcd", "abcd",
> + mode_compare) == 0);
Maybe I'm missing something, but the two checks above seem identical.
> + SELF_CHECK (cooked_index_entry::compare ("abcd", "abcd",
> + mode_complete) == 0);
> + SELF_CHECK (cooked_index_entry::compare ("abcd", "abcd",
> + mode_complete) == 0);
And these two too.
> diff --git a/gdb/dwarf2/cooked-index.h b/gdb/dwarf2/cooked-index.h
> index 55eaf9955ab..1c291ba5694 100644
> --- a/gdb/dwarf2/cooked-index.h
> +++ b/gdb/dwarf2/cooked-index.h
> @@ -143,16 +143,46 @@ struct cooked_index_entry : public allocate_on_obstack
> STORAGE. */
> const char *full_name (struct obstack *storage) const;
>
> - /* Compare two strings, case-insensitively. Return true if STRA is
> - less than STRB. If one string has template parameters, but the
> - other does not, then they are considered to be equal; so for
> - example "t<x>" == "t<x>", "t<x>" < "t<y>", but "t" == "t<x>". */
> - static bool compare (const char *stra, const char *strb, bool completing);
> + /* Comparison modes for the 'compare' function. See the function
> + for a description. */
> + enum comparison_mode
Just a nit, but I like to use enum class. For the simple reason that
users do "comparison_mode::SORT" instead of just "SORT", and I find that
this little bit of context helps when reading.
> + {
> + COMPARE,
> + SORT,
> + COMPLETE,
It's slightly confusing to have a more called COMPARE, since the
function itself is called compare. Perhaps SEARCH or MATCH?
Simon
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] Fix comparator bug in cooked index
2023-01-27 19:56 [PATCH] Fix comparator bug in cooked index Tom Tromey
2023-01-28 4:20 ` Simon Marchi
@ 2023-01-30 10:33 ` Andrew Burgess
2023-01-30 15:15 ` Simon Marchi
1 sibling, 1 reply; 5+ messages in thread
From: Andrew Burgess @ 2023-01-30 10:33 UTC (permalink / raw)
To: Tom Tromey via Gdb-patches, gdb-patches; +Cc: Tom Tromey
Tom Tromey via Gdb-patches <gdb-patches@sourceware.org> writes:
> Simon pointed out that the cooked index template-matching patch
> introduced a failure in libstdc++ debug mode. In particular, the new
> code violates the assumption of std::lower_bound and std::upper_bound
> that the range is sorted with respect to the comparison.
>
> When I first debugged this, I thought the problem was unfixable as-is
> and that a second layer of filtering would have to be done. However,
> on irc, Simon pointed out that it could perhaps be solved if the
> comparison function were assured that one operand always came from the
> index, with the other always being the search string.
>
> This patch implements this idea.
>
> First, a new mode is introduced: a sorting mode for
> cooked_index_entry::compare. In this mode, strings are compared
> case-insensitively, but we're careful to always sort '<' before any
> other printable character. This way, two names like "func" and
> "func<param>" will be sorted next to each other -- i.e., "func1" will
> not be seen between them. This is important when searching.
>
> Second, the compare function is changed to work in a strcmp-like way.
> This makes it easier to test and (IMO) understand.
>
> Third, the compare function is modified so that in non-sorting modes,
> the index entry is always the first argument. This allows consistency
> in compares.
>
> I regression tested this in libstdc++ debug mode on x86-64 Fedora 36.
> It fixes the crash that Simon saw.
> ---
> gdb/dwarf2/cooked-index.c | 166 ++++++++++++++++++++------------------
> gdb/dwarf2/cooked-index.h | 42 ++++++++--
> 2 files changed, 123 insertions(+), 85 deletions(-)
>
> diff --git a/gdb/dwarf2/cooked-index.c b/gdb/dwarf2/cooked-index.c
> index 09b3fd70b26..f6b1df6e529 100644
> --- a/gdb/dwarf2/cooked-index.c
> +++ b/gdb/dwarf2/cooked-index.c
> @@ -30,56 +30,38 @@
>
> /* See cooked-index.h. */
>
> -bool
> +int
> cooked_index_entry::compare (const char *stra, const char *strb,
> - bool completing)
> + comparison_mode mode)
> {
> - /* If we've ever matched "<" in both strings, then we disable the
> - special template parameter handling. */
> - bool seen_lt = false;
> + /* We want to sort '<' before any other printable character. So,
> + rewrite '<' to something just before ' '. */
> +#define MUNGE(c) (c == '<' ? '\x1f' : TOLOWER ((unsigned char) c))
>
> while (*stra != '\0'
> && *strb != '\0'
> - && (TOLOWER ((unsigned char) *stra)
> - == TOLOWER ((unsigned char ) *strb)))
> + && (MUNGE (*stra) == MUNGE (*strb)))
> {
> - if (*stra == '<')
> - seen_lt = true;
> ++stra;
> ++strb;
> }
>
> - unsigned c1 = TOLOWER ((unsigned char) *stra);
> - unsigned c2 = TOLOWER ((unsigned char) *strb);
> + unsigned c1 = MUNGE (*stra);
> + unsigned c2 = MUNGE (*strb);
>
> - if (completing)
> - {
> - /* When completing, if one string ends earlier than the other,
> - consider them as equal. Also, completion mode ignores the
> - special '<' handling. */
> - if (c1 == '\0' || c2 == '\0')
> - return false;
> - /* Fall through to the generic case. */
> - }
> - else if (seen_lt)
> - {
> - /* Fall through to the generic case. */
> - }
> - else if (c1 == '\0' || c1 == '<')
> - {
> - /* Maybe they both end at the same spot. */
> - if (c2 == '\0' || c2 == '<')
> - return false;
> - /* First string ended earlier. */
> - return true;
> - }
> - else if (c2 == '\0' || c2 == '<')
> + if (c1 == c2)
> + return 0;
> +
> + /* When completing, if STRB ends earlier than STRA, consider them as
> + equal. When comparing, if STRB ends earlier and STRA ends with
> + '<', consider them as equal. */
> + if (mode == COMPLETE || (mode == COMPARE && c1 == MUNGE ('<')))
> {
> - /* Second string ended earlier. */
> - return false;
> + if (c2 == '\0')
> + return 0;
> }
>
> - return c1 < c2;
> + return c1 < c2 ? -1 : 1;
> }
>
> #if GDB_SELF_TEST
> @@ -89,45 +71,69 @@ namespace {
> void
> test_compare ()
> {
> - SELF_CHECK (!cooked_index_entry::compare ("abcd", "abcd", false));
> - SELF_CHECK (!cooked_index_entry::compare ("abcd", "abcd", false));
> - SELF_CHECK (!cooked_index_entry::compare ("abcd", "abcd", true));
> - SELF_CHECK (!cooked_index_entry::compare ("abcd", "abcd", true));
> -
> - SELF_CHECK (cooked_index_entry::compare ("abcd", "ABCDE", false));
> - SELF_CHECK (!cooked_index_entry::compare ("ABCDE", "abcd", false));
> - SELF_CHECK (!cooked_index_entry::compare ("abcd", "ABCDE", true));
> - SELF_CHECK (!cooked_index_entry::compare ("ABCDE", "abcd", true));
> -
> - SELF_CHECK (!cooked_index_entry::compare ("name", "name<>", false));
> - SELF_CHECK (!cooked_index_entry::compare ("name<>", "name", false));
> - SELF_CHECK (!cooked_index_entry::compare ("name", "name<>", true));
> - SELF_CHECK (!cooked_index_entry::compare ("name<>", "name", true));
> -
> - SELF_CHECK (!cooked_index_entry::compare ("name<arg>", "name<arg>", false));
> - SELF_CHECK (!cooked_index_entry::compare ("name<arg>", "name<arg>", false));
> - SELF_CHECK (!cooked_index_entry::compare ("name<arg>", "name<arg>", true));
> - SELF_CHECK (!cooked_index_entry::compare ("name<arg>", "name<ag>", true));
> -
> - SELF_CHECK (!cooked_index_entry::compare ("name<arg<more>>",
> - "name<arg<more>>", false));
> -
> - SELF_CHECK (!cooked_index_entry::compare ("name", "name<arg<more>>", false));
> - SELF_CHECK (!cooked_index_entry::compare ("name<arg<more>>", "name", false));
> - SELF_CHECK (cooked_index_entry::compare ("name<arg<", "name<arg<more>>",
> - false));
> - SELF_CHECK (!cooked_index_entry::compare ("name<arg<",
> - "name<arg<more>>",
> - true));
> - SELF_CHECK (!cooked_index_entry::compare ("name<arg<more>>", "name<arg<",
> - false));
> - SELF_CHECK (!cooked_index_entry::compare ("name<arg<more>>", "name<arg<",
> - true));
> -
> - SELF_CHECK (cooked_index_entry::compare ("", "abcd", false));
> - SELF_CHECK (!cooked_index_entry::compare ("", "abcd", true));
> - SELF_CHECK (!cooked_index_entry::compare ("abcd", "", false));
> - SELF_CHECK (!cooked_index_entry::compare ("abcd", "", true));
> + /* Convenience aliases. */
> + const auto mode_compare = cooked_index_entry::COMPARE;
> + const auto mode_sort = cooked_index_entry::SORT;
> + const auto mode_complete = cooked_index_entry::COMPLETE;
> +
> + SELF_CHECK (cooked_index_entry::compare ("abcd", "abcd",
> + mode_compare) == 0);
> + SELF_CHECK (cooked_index_entry::compare ("abcd", "abcd",
> + mode_compare) == 0);
> + SELF_CHECK (cooked_index_entry::compare ("abcd", "abcd",
> + mode_complete) == 0);
> + SELF_CHECK (cooked_index_entry::compare ("abcd", "abcd",
> + mode_complete) == 0);
> +
> + SELF_CHECK (cooked_index_entry::compare ("abcd", "ABCDE",
> + mode_compare) < 0);
> + SELF_CHECK (cooked_index_entry::compare ("ABCDE", "abcd",
> + mode_compare) > 0);
> + SELF_CHECK (cooked_index_entry::compare ("abcd", "ABCDE",
> + mode_complete) < 0);
> + SELF_CHECK (cooked_index_entry::compare ("ABCDE", "abcd",
> + mode_complete) == 0);
> +
> + SELF_CHECK (cooked_index_entry::compare ("name", "name<>",
> + mode_compare) < 0);
> + SELF_CHECK (cooked_index_entry::compare ("name<>", "name",
> + mode_compare) == 0);
> + SELF_CHECK (cooked_index_entry::compare ("name", "name<>",
> + mode_complete) < 0);
> + SELF_CHECK (cooked_index_entry::compare ("name<>", "name",
> + mode_complete) == 0);
> +
> + SELF_CHECK (cooked_index_entry::compare ("name<arg>", "name<arg>",
> + mode_compare) == 0);
> + SELF_CHECK (cooked_index_entry::compare ("name<arg>", "name<ag>",
> + mode_compare) > 0);
> + SELF_CHECK (cooked_index_entry::compare ("name<arg>", "name<arg>",
> + mode_complete) == 0);
> + SELF_CHECK (cooked_index_entry::compare ("name<arg>", "name<ag>",
> + mode_complete) > 0);
> +
> + SELF_CHECK (cooked_index_entry::compare ("name<arg<more>>",
> + "name<arg<more>>",
> + mode_compare) == 0);
> +
> + SELF_CHECK (cooked_index_entry::compare ("name", "name<arg<more>>",
> + mode_compare) < 0);
> + SELF_CHECK (cooked_index_entry::compare ("name<arg<more>>", "name",
> + mode_compare) == 0);
> + SELF_CHECK (cooked_index_entry::compare ("name<arg<more>>", "name<arg<",
> + mode_compare) > 0);
> + SELF_CHECK (cooked_index_entry::compare ("name<arg<more>>", "name<arg<",
> + mode_complete) == 0);
> +
> + SELF_CHECK (cooked_index_entry::compare ("", "abcd", mode_compare) < 0);
> + SELF_CHECK (cooked_index_entry::compare ("", "abcd", mode_complete) < 0);
> + SELF_CHECK (cooked_index_entry::compare ("abcd", "", mode_compare) > 0);
> + SELF_CHECK (cooked_index_entry::compare ("abcd", "", mode_complete) == 0);
> +
> + SELF_CHECK (cooked_index_entry::compare ("func", "func<type>",
> + mode_sort) < 0);
> + SELF_CHECK (cooked_index_entry::compare ("func<type>", "func1",
> + mode_sort) < 0);
> }
>
> } /* anonymous namespace */
> @@ -359,20 +365,22 @@ cooked_index::find (const std::string &name, bool completing)
> {
> wait ();
>
> + cooked_index_entry::comparison_mode mode = (completing
> + ? cooked_index_entry::COMPLETE
> + : cooked_index_entry::COMPARE);
> +
> auto lower = std::lower_bound (m_entries.begin (), m_entries.end (), name,
> [=] (const cooked_index_entry *entry,
> const std::string &n)
> {
> - return cooked_index_entry::compare (entry->canonical, n.c_str (),
> - completing);
> + return cooked_index_entry::compare (entry->canonical, n.c_str (), mode) < 0;
> });
>
> auto upper = std::upper_bound (m_entries.begin (), m_entries.end (), name,
> [=] (const std::string &n,
> const cooked_index_entry *entry)
> {
> - return cooked_index_entry::compare (n.c_str (), entry->canonical,
> - completing);
> + return cooked_index_entry::compare (entry->canonical, n.c_str (), mode) > 0;
> });
>
> return range (lower, upper);
> diff --git a/gdb/dwarf2/cooked-index.h b/gdb/dwarf2/cooked-index.h
> index 55eaf9955ab..1c291ba5694 100644
> --- a/gdb/dwarf2/cooked-index.h
> +++ b/gdb/dwarf2/cooked-index.h
> @@ -143,16 +143,46 @@ struct cooked_index_entry : public allocate_on_obstack
> STORAGE. */
> const char *full_name (struct obstack *storage) const;
>
> - /* Compare two strings, case-insensitively. Return true if STRA is
> - less than STRB. If one string has template parameters, but the
> - other does not, then they are considered to be equal; so for
> - example "t<x>" == "t<x>", "t<x>" < "t<y>", but "t" == "t<x>". */
> - static bool compare (const char *stra, const char *strb, bool completing);
> + /* Comparison modes for the 'compare' function. See the function
> + for a description. */
> + enum comparison_mode
> + {
> + COMPARE,
> + SORT,
> + COMPLETE,
> + };
> +
> + /* Compare two strings, case-insensitively. Return -1 if STRA is
> + less than STRB, 0 if they are equal, and 1 if STRA is greater.
> +
> + When comparing, '<' is considered to be less than all other
> + printable characters. This ensures that "t<x>" sorts before
> + "t1", which is necessary when looking up "t".
> +
> + MODE controls how the comparison proceeds.
> +
> + MODE==SORT is used when sorting and does no special template
> + handling.
I think you should replace "template" with "'<'" here. The handling is
not limited to template argument blocks, but to anything after a '<'.
So languages that don't have templates will be impacted, as will things
like 'operator<', 'operator<<', etc in C++.
This is all fine, returning false positives from the index is clearly
OK, we do the exact matching back in core GDB once the symbols are
expanded, but I don't think we should give the impression that this code
is actually checking for / skipping template arguments, when all it's
really doing is giving up on matching at a '<' (which, again, is fine).
I don't know if it's worth explicitly stating that the '<' handling is
just a rough heuristic? All the examples you give are template based,
so the comment does leave the impression that the compare function does
some smart detection/handling of template blocks.
Also, it might be worth saying why its fine that we do this C++ specific
'<' handling for all languages.
On a similar theme, it might be worth saying why a case-insensitive
match is fine, even for languages where symbols are case-sensitive.
For me, when I started looking at this code, these were all the
questions that I had. After some thought I figured out the answers, but
I had to dig into other parts of GDB to confirm them. It would have
been useful for me if the comments had covered all this stuff.
Thanks,
Andrew
> +
> + MODE==COMPARE is used when searching for a symbol. In this case,
> + STRB must always be the search name, and STRA must be the name in
> + the index that is under consideration. In compare mode, early
> + termination of STRB may match STRA -- for example, "t<int>" and
> + "t" will be considered to be equal. (However, if A=="t" and
> + B=="t<int>", then this will not consider them as equal.)
> +
> + MODE==COMPLETE is used when searching for a symbol for
> + completion. In this case, STRB must always be the search name,
> + and STRA must be the name in the index that is under
> + consideration. In completion mode, early termination of STRB
> + always results in a match. */
> + static int compare (const char *stra, const char *strb,
> + comparison_mode mode);
>
> /* Compare two entries by canonical name. */
> bool operator< (const cooked_index_entry &other) const
> {
> - return compare (canonical, other.canonical, false);
> + return compare (canonical, other.canonical, SORT) < 0;
> }
>
> /* The name as it appears in DWARF. This always points into one of
> --
> 2.38.1
^ permalink raw reply [flat|nested] 5+ messages in thread