From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=/ud0=53=redhat.com=aburgess@sourceware.org>
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124])
	by sourceware.org (Postfix) with ESMTPS id 6099F3858D32
	for <gdb-patches@sourceware.org>; Mon, 30 Jan 2023 10:33:36 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6099F3858D32
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1675074815;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=Ei8/H7xiPGTYA2mJXOAUIYBHJUVtiYAOmoRvkTEOTSM=;
	b=UKNwYarLlH4vSNe0NEYHFnok2F9Gm0R+8+sH9nh7YmzEbKZPYA/te8PqWtAWexkAkIkvxA
	OXehQvGaUpqGXMW9PievCYodepoGX73cRAuAv5yr9glbRtwLbEML2f1pBs7CihJvl0UK7d
	SdyMlnN2WQTQ2ohTcLjNZ5Cc6kMg2qw=
Received: from mail-qk1-f198.google.com (mail-qk1-f198.google.com
 [209.85.222.198]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id
 us-mta-605-Z5Z725BsMBaPqqs9LakYng-1; Mon, 30 Jan 2023 05:33:34 -0500
X-MC-Unique: Z5Z725BsMBaPqqs9LakYng-1
Received: by mail-qk1-f198.google.com with SMTP id v7-20020a05620a0f0700b006faffce43b2so6786854qkl.9
        for <gdb-patches@sourceware.org>; Mon, 30 Jan 2023 02:33:34 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=mime-version:message-id:date:references:in-reply-to:subject:cc:to
         :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=Ei8/H7xiPGTYA2mJXOAUIYBHJUVtiYAOmoRvkTEOTSM=;
        b=A9YcBrUY70dYoittKDDrchE5lswvfLs19qHgiiTqRs+ZuSbA3XQMch/gY8tLiQzVcZ
         xmerNjwl5uqWlGx2pWKOCMLidYKkeotd60tatMYeZwFmllQ91l0EPyjbMrcReSRD17BG
         KtLaJjQPjPGeL2taMLn4mLF+MQqUPBaR87fHrU2XfBn18F6I00bPd3mvBW0FXx9Ft2Xt
         mQjaRW5ivGxPRKaFr/DNKyYsNw9QH6LRiVIxinA2L8q4QUnw3Wgzu8b8b2FlEb9wvM3C
         Fe/9R3nhSNeWD/5G+aY44Z2enG28SIDu0uCr8ML5WyWWOdarimGBAlupKFwfwZ4c8cSG
         aU/Q==
X-Gm-Message-State: AO0yUKW+bZyDxEUykYV0z2s7PpTNTS2sPLW16c0QGTwMT7W/k3hY/+D7
	YdzzZ6BflauPb6063HXQV/lajzffTda+oVbeImiFlMrZBmY9Ok14YDRLWpw7XuStZgTsn3lP+fW
	sW8A9HFKLkTxWHdXi6UYlWxP1zyDHaDpfRuCBVlZS2WVGdG+J1d6/Ey4VwLtmz8FzJbrIXk+tpg
	==
X-Received: by 2002:ac8:5988:0:b0:3b8:48e4:83b1 with SMTP id e8-20020ac85988000000b003b848e483b1mr10652494qte.20.1675074813988;
        Mon, 30 Jan 2023 02:33:33 -0800 (PST)
X-Google-Smtp-Source: AK7set9310GYnUwh5/K2JWGe0Tb4zQSfljaog28lU0wbY1M/WPhsSe9PIUs6j/orMflG/CiJj9j+qg==
X-Received: by 2002:ac8:5988:0:b0:3b8:48e4:83b1 with SMTP id e8-20020ac85988000000b003b848e483b1mr10652462qte.20.1675074813401;
        Mon, 30 Jan 2023 02:33:33 -0800 (PST)
Received: from localhost (95.72.115.87.dyn.plus.net. [87.115.72.95])
        by smtp.gmail.com with ESMTPSA id s39-20020a05622a1aa700b003a7e38055c9sm7818240qtc.63.2023.01.30.02.33.32
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 30 Jan 2023 02:33:33 -0800 (PST)
From: Andrew Burgess <aburgess@redhat.com>
To: Tom Tromey via Gdb-patches <gdb-patches@sourceware.org>,
 gdb-patches@sourceware.org
Cc: Tom Tromey <tromey@adacore.com>
Subject: Re: [PATCH] Fix comparator bug in cooked index
In-Reply-To: <20230127195632.1570281-1-tromey@adacore.com>
References: <20230127195632.1570281-1-tromey@adacore.com>
Date: Mon, 30 Jan 2023 10:33:31 +0000
Message-ID: <87ilgouxac.fsf@redhat.com>
MIME-Version: 1.0
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain
X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gdb-patches.sourceware.org>

Tom Tromey via Gdb-patches <gdb-patches@sourceware.org> writes:

> Simon pointed out that the cooked index template-matching patch
> introduced a failure in libstdc++ debug mode.  In particular, the new
> code violates the assumption of std::lower_bound and std::upper_bound
> that the range is sorted with respect to the comparison.
>
> When I first debugged this, I thought the problem was unfixable as-is
> and that a second layer of filtering would have to be done.  However,
> on irc, Simon pointed out that it could perhaps be solved if the
> comparison function were assured that one operand always came from the
> index, with the other always being the search string.
>
> This patch implements this idea.
>
> First, a new mode is introduced: a sorting mode for
> cooked_index_entry::compare.  In this mode, strings are compared
> case-insensitively, but we're careful to always sort '<' before any
> other printable character.  This way, two names like "func" and
> "func<param>" will be sorted next to each other -- i.e., "func1" will
> not be seen between them.  This is important when searching.
>
> Second, the compare function is changed to work in a strcmp-like way.
> This makes it easier to test and (IMO) understand.
>
> Third, the compare function is modified so that in non-sorting modes,
> the index entry is always the first argument.  This allows consistency
> in compares.
>
> I regression tested this in libstdc++ debug mode on x86-64 Fedora 36.
> It fixes the crash that Simon saw.
> ---
>  gdb/dwarf2/cooked-index.c | 166 ++++++++++++++++++++------------------
>  gdb/dwarf2/cooked-index.h |  42 ++++++++--
>  2 files changed, 123 insertions(+), 85 deletions(-)
>
> diff --git a/gdb/dwarf2/cooked-index.c b/gdb/dwarf2/cooked-index.c
> index 09b3fd70b26..f6b1df6e529 100644
> --- a/gdb/dwarf2/cooked-index.c
> +++ b/gdb/dwarf2/cooked-index.c
> @@ -30,56 +30,38 @@
>  
>  /* See cooked-index.h.  */
>  
> -bool
> +int
>  cooked_index_entry::compare (const char *stra, const char *strb,
> -			     bool completing)
> +			     comparison_mode mode)
>  {
> -  /* If we've ever matched "<" in both strings, then we disable the
> -     special template parameter handling.  */
> -  bool seen_lt = false;
> +  /* We want to sort '<' before any other printable character.  So,
> +     rewrite '<' to something just before ' '.  */
> +#define MUNGE(c) (c == '<' ? '\x1f' : TOLOWER ((unsigned char) c))
>  
>    while (*stra != '\0'
>  	 && *strb != '\0'
> -	 && (TOLOWER ((unsigned char) *stra)
> -	     == TOLOWER ((unsigned char ) *strb)))
> +	 && (MUNGE (*stra) == MUNGE (*strb)))
>      {
> -      if (*stra == '<')
> -	seen_lt = true;
>        ++stra;
>        ++strb;
>      }
>  
> -  unsigned c1 = TOLOWER ((unsigned char) *stra);
> -  unsigned c2 = TOLOWER ((unsigned char) *strb);
> +  unsigned c1 = MUNGE (*stra);
> +  unsigned c2 = MUNGE (*strb);
>  
> -  if (completing)
> -    {
> -      /* When completing, if one string ends earlier than the other,
> -	 consider them as equal.  Also, completion mode ignores the
> -	 special '<' handling.  */
> -      if (c1 == '\0' || c2 == '\0')
> -	return false;
> -      /* Fall through to the generic case.  */
> -    }
> -  else if (seen_lt)
> -    {
> -      /* Fall through to the generic case.  */
> -    }
> -  else if (c1 == '\0' || c1 == '<')
> -    {
> -      /* Maybe they both end at the same spot.  */
> -      if (c2 == '\0' || c2 == '<')
> -	return false;
> -      /* First string ended earlier.  */
> -      return true;
> -    }
> -  else if (c2 == '\0' || c2 == '<')
> +  if (c1 == c2)
> +    return 0;
> +
> +  /* When completing, if STRB ends earlier than STRA, consider them as
> +     equal.  When comparing, if STRB ends earlier and STRA ends with
> +     '<', consider them as equal.  */
> +  if (mode == COMPLETE || (mode == COMPARE && c1 == MUNGE ('<')))
>      {
> -      /* Second string ended earlier.  */
> -      return false;
> +      if (c2 == '\0')
> +	return 0;
>      }
>  
> -  return c1 < c2;
> +  return c1 < c2 ? -1 : 1;
>  }
>  
>  #if GDB_SELF_TEST
> @@ -89,45 +71,69 @@ namespace {
>  void
>  test_compare ()
>  {
> -  SELF_CHECK (!cooked_index_entry::compare ("abcd", "abcd", false));
> -  SELF_CHECK (!cooked_index_entry::compare ("abcd", "abcd", false));
> -  SELF_CHECK (!cooked_index_entry::compare ("abcd", "abcd", true));
> -  SELF_CHECK (!cooked_index_entry::compare ("abcd", "abcd", true));
> -
> -  SELF_CHECK (cooked_index_entry::compare ("abcd", "ABCDE", false));
> -  SELF_CHECK (!cooked_index_entry::compare ("ABCDE", "abcd", false));
> -  SELF_CHECK (!cooked_index_entry::compare ("abcd", "ABCDE", true));
> -  SELF_CHECK (!cooked_index_entry::compare ("ABCDE", "abcd", true));
> -
> -  SELF_CHECK (!cooked_index_entry::compare ("name", "name<>", false));
> -  SELF_CHECK (!cooked_index_entry::compare ("name<>", "name", false));
> -  SELF_CHECK (!cooked_index_entry::compare ("name", "name<>", true));
> -  SELF_CHECK (!cooked_index_entry::compare ("name<>", "name", true));
> -
> -  SELF_CHECK (!cooked_index_entry::compare ("name<arg>", "name<arg>", false));
> -  SELF_CHECK (!cooked_index_entry::compare ("name<arg>", "name<arg>", false));
> -  SELF_CHECK (!cooked_index_entry::compare ("name<arg>", "name<arg>", true));
> -  SELF_CHECK (!cooked_index_entry::compare ("name<arg>", "name<ag>", true));
> -
> -  SELF_CHECK (!cooked_index_entry::compare ("name<arg<more>>",
> -					    "name<arg<more>>", false));
> -
> -  SELF_CHECK (!cooked_index_entry::compare ("name", "name<arg<more>>", false));
> -  SELF_CHECK (!cooked_index_entry::compare ("name<arg<more>>", "name", false));
> -  SELF_CHECK (cooked_index_entry::compare ("name<arg<", "name<arg<more>>",
> -					   false));
> -  SELF_CHECK (!cooked_index_entry::compare ("name<arg<",
> -					    "name<arg<more>>",
> -					    true));
> -  SELF_CHECK (!cooked_index_entry::compare ("name<arg<more>>", "name<arg<",
> -					    false));
> -  SELF_CHECK (!cooked_index_entry::compare ("name<arg<more>>", "name<arg<",
> -					    true));
> -
> -  SELF_CHECK (cooked_index_entry::compare ("", "abcd", false));
> -  SELF_CHECK (!cooked_index_entry::compare ("", "abcd", true));
> -  SELF_CHECK (!cooked_index_entry::compare ("abcd", "", false));
> -  SELF_CHECK (!cooked_index_entry::compare ("abcd", "", true));
> +  /* Convenience aliases.  */
> +  const auto mode_compare = cooked_index_entry::COMPARE;
> +  const auto mode_sort = cooked_index_entry::SORT;
> +  const auto mode_complete = cooked_index_entry::COMPLETE;
> +
> +  SELF_CHECK (cooked_index_entry::compare ("abcd", "abcd",
> +					   mode_compare) == 0);
> +  SELF_CHECK (cooked_index_entry::compare ("abcd", "abcd",
> +					   mode_compare) == 0);
> +  SELF_CHECK (cooked_index_entry::compare ("abcd", "abcd",
> +					   mode_complete) == 0);
> +  SELF_CHECK (cooked_index_entry::compare ("abcd", "abcd",
> +					   mode_complete) == 0);
> +
> +  SELF_CHECK (cooked_index_entry::compare ("abcd", "ABCDE",
> +					   mode_compare) < 0);
> +  SELF_CHECK (cooked_index_entry::compare ("ABCDE", "abcd",
> +					   mode_compare) > 0);
> +  SELF_CHECK (cooked_index_entry::compare ("abcd", "ABCDE",
> +					   mode_complete) < 0);
> +  SELF_CHECK (cooked_index_entry::compare ("ABCDE", "abcd",
> +					   mode_complete) == 0);
> +
> +  SELF_CHECK (cooked_index_entry::compare ("name", "name<>",
> +					   mode_compare) < 0);
> +  SELF_CHECK (cooked_index_entry::compare ("name<>", "name",
> +					   mode_compare) == 0);
> +  SELF_CHECK (cooked_index_entry::compare ("name", "name<>",
> +					   mode_complete) < 0);
> +  SELF_CHECK (cooked_index_entry::compare ("name<>", "name",
> +					   mode_complete) == 0);
> +
> +  SELF_CHECK (cooked_index_entry::compare ("name<arg>", "name<arg>",
> +					   mode_compare) == 0);
> +  SELF_CHECK (cooked_index_entry::compare ("name<arg>", "name<ag>",
> +					   mode_compare) > 0);
> +  SELF_CHECK (cooked_index_entry::compare ("name<arg>", "name<arg>",
> +					   mode_complete) == 0);
> +  SELF_CHECK (cooked_index_entry::compare ("name<arg>", "name<ag>",
> +					   mode_complete) > 0);
> +
> +  SELF_CHECK (cooked_index_entry::compare ("name<arg<more>>",
> +					   "name<arg<more>>",
> +					   mode_compare) == 0);
> +
> +  SELF_CHECK (cooked_index_entry::compare ("name", "name<arg<more>>",
> +					   mode_compare) < 0);
> +  SELF_CHECK (cooked_index_entry::compare ("name<arg<more>>", "name",
> +					   mode_compare) == 0);
> +  SELF_CHECK (cooked_index_entry::compare ("name<arg<more>>", "name<arg<",
> +					   mode_compare) > 0);
> +  SELF_CHECK (cooked_index_entry::compare ("name<arg<more>>", "name<arg<",
> +					   mode_complete) == 0);
> +
> +  SELF_CHECK (cooked_index_entry::compare ("", "abcd", mode_compare) < 0);
> +  SELF_CHECK (cooked_index_entry::compare ("", "abcd", mode_complete) < 0);
> +  SELF_CHECK (cooked_index_entry::compare ("abcd", "", mode_compare) > 0);
> +  SELF_CHECK (cooked_index_entry::compare ("abcd", "", mode_complete) == 0);
> +
> +  SELF_CHECK (cooked_index_entry::compare ("func", "func<type>",
> +					   mode_sort) < 0);
> +  SELF_CHECK (cooked_index_entry::compare ("func<type>", "func1",
> +					   mode_sort) < 0);
>  }
>  
>  } /* anonymous namespace */
> @@ -359,20 +365,22 @@ cooked_index::find (const std::string &name, bool completing)
>  {
>    wait ();
>  
> +  cooked_index_entry::comparison_mode mode = (completing
> +					      ? cooked_index_entry::COMPLETE
> +					      : cooked_index_entry::COMPARE);
> +
>    auto lower = std::lower_bound (m_entries.begin (), m_entries.end (), name,
>  				 [=] (const cooked_index_entry *entry,
>  				      const std::string &n)
>    {
> -    return cooked_index_entry::compare (entry->canonical, n.c_str (),
> -					completing);
> +    return cooked_index_entry::compare (entry->canonical, n.c_str (), mode) < 0;
>    });
>  
>    auto upper = std::upper_bound (m_entries.begin (), m_entries.end (), name,
>  				 [=] (const std::string &n,
>  				      const cooked_index_entry *entry)
>    {
> -    return cooked_index_entry::compare (n.c_str (), entry->canonical,
> -					completing);
> +    return cooked_index_entry::compare (entry->canonical, n.c_str (), mode) > 0;
>    });
>  
>    return range (lower, upper);
> diff --git a/gdb/dwarf2/cooked-index.h b/gdb/dwarf2/cooked-index.h
> index 55eaf9955ab..1c291ba5694 100644
> --- a/gdb/dwarf2/cooked-index.h
> +++ b/gdb/dwarf2/cooked-index.h
> @@ -143,16 +143,46 @@ struct cooked_index_entry : public allocate_on_obstack
>       STORAGE.  */
>    const char *full_name (struct obstack *storage) const;
>  
> -  /* Compare two strings, case-insensitively.  Return true if STRA is
> -     less than STRB.  If one string has template parameters, but the
> -     other does not, then they are considered to be equal; so for
> -     example "t<x>" == "t<x>", "t<x>" < "t<y>", but "t" == "t<x>".  */
> -  static bool compare (const char *stra, const char *strb, bool completing);
> +  /* Comparison modes for the 'compare' function.  See the function
> +     for a description.  */
> +  enum comparison_mode
> +  {
> +    COMPARE,
> +    SORT,
> +    COMPLETE,
> +  };
> +
> +  /* Compare two strings, case-insensitively.  Return -1 if STRA is
> +     less than STRB, 0 if they are equal, and 1 if STRA is greater.
> +
> +     When comparing, '<' is considered to be less than all other
> +     printable characters.  This ensures that "t<x>" sorts before
> +     "t1", which is necessary when looking up "t".
> +
> +     MODE controls how the comparison proceeds.
> +
> +     MODE==SORT is used when sorting and does no special template
> +     handling.

I think you should replace "template" with "'<'" here.  The handling is
not limited to template argument blocks, but to anything after a '<'.
So languages that don't have templates will be impacted, as will things
like 'operator<', 'operator<<', etc in C++.

This is all fine, returning false positives from the index is clearly
OK, we do the exact matching back in core GDB once the symbols are
expanded, but I don't think we should give the impression that this code
is actually checking for / skipping template arguments, when all it's
really doing is giving up on matching at a '<' (which, again, is fine).

I don't know if it's worth explicitly stating that the '<' handling is
just a rough heuristic?  All the examples you give are template based,
so the comment does leave the impression that the compare function does
some smart detection/handling of template blocks.

Also, it might be worth saying why its fine that we do this C++ specific
'<' handling for all languages.

On a similar theme, it might be worth saying why a case-insensitive
match is fine, even for languages where symbols are case-sensitive.

For me, when I started looking at this code, these were all the
questions that I had.  After some thought I figured out the answers, but
I had to dig into other parts of GDB to confirm them.  It would have
been useful for me if the comments had covered all this stuff.

Thanks,
Andrew

> +
> +     MODE==COMPARE is used when searching for a symbol.  In this case,
> +     STRB must always be the search name, and STRA must be the name in
> +     the index that is under consideration.  In compare mode, early
> +     termination of STRB may match STRA -- for example, "t<int>" and
> +     "t" will be considered to be equal.  (However, if A=="t" and
> +     B=="t<int>", then this will not consider them as equal.)
> +
> +     MODE==COMPLETE is used when searching for a symbol for
> +     completion.  In this case, STRB must always be the search name,
> +     and STRA must be the name in the index that is under
> +     consideration.  In completion mode, early termination of STRB
> +     always results in a match.  */
> +  static int compare (const char *stra, const char *strb,
> +		      comparison_mode mode);
>  
>    /* Compare two entries by canonical name.  */
>    bool operator< (const cooked_index_entry &other) const
>    {
> -    return compare (canonical, other.canonical, false);
> +    return compare (canonical, other.canonical, SORT) < 0;
>    }
>  
>    /* The name as it appears in DWARF.  This always points into one of
> -- 
> 2.38.1