From: Andrew Burgess <aburgess@redhat.com>
To: Tom Tromey <tom@tromey.com>, gdb-patches@sourceware.org
Cc: Tom Tromey <tom@tromey.com>
Subject: Re: [PATCH] Fix "start" for D, Rust, etc
In-Reply-To: <20230214030000.1982722-1-tom@tromey.com>
References: <20230214030000.1982722-1-tom@tromey.com>
Date: Fri, 17 Feb 2023 22:26:50 +0000
Message-ID: <87k00gkjx1.fsf@redhat.com>

Tom Tromey <tom@tromey.com> writes:

> The new DWARF indexer broke "start" for some languages.
>
> For D, it is broken because, while the code in cooked_index_shard::add
> specifically excludes Ada, it fails to exclude D.  This means that the
> C "main" will be detected as "main" here -- whereas what is intended
> is for the code in find_main_name to use d_main_name to find the name.
>
> The Rust compiler, on the other hand, uses DW_AT_main_subprogram.
> However, the code in dwarf2_build_psymtabs_hard fails to create a
> fully-qualified name, so the name always ends up as plain "main".
>
> For D and Ada, a very simple approach suffices: remove the check
> against "main" from cooked_index_shard::add.  This also has the
> benefit of slightly speeding up DWARF indexing.  I assume this
> approach will work for Pascal and Modula-2 as well, but I don't have a
> way to test those at present.
>
> For Rust, though, this is not sufficient.  And, computing the
> fully-qualified name in dwarf2_build_psymtabs_hard will crash, because
> cooked_index_entry::full_name uses the canonical name -- and that is
> not computed until after canonicalization.
>
> However, we don't want to wait for canonicalization to be done before
> computing the main name.  That would remove any benefit from doing
> canonicalization in the background.
>
> This patch solves this dilemma by noticing that languages using
> DW_AT_main_subprogram are, currently, disjoint from languages
> requiring canonicalization.  Because of this, we can add a parameter
> to full_name to let us avoid crashes, slowdowns, and races here.
>
> This is kind of tricky and ugly, so I've tried to comment it
> sufficiently.
>
> While doing this, I had to change gdb.dwarf2/main-subprogram.exp.  A
> different possibility here would be to ignore the canonicalization
> needs of C in this situation, because those only affect certain types.
> However, I chose this approach because the test case is artificial
> anyhow.

After reading the description a couple of times, then looking at the
patch and reading it again, I feel I understand what's going on, except
for one thing....

>
> A long time ago, in an earlier threading attempt, I changed the global
> current_language to be a function (hidden behind a macro) to let us
> attempt lazily computing the current language.  Perhaps this approach
> could still be made to work.  However, that also seemed rather tricky,
> more so than this patch.

... I don't understand what this last paragraph has to do with the
problem being solved here.

I have a couple of _really_ minor comment fixes that I think would help,
and one small code suggestion.  See inline below.

>
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30116
> ---
>  gdb/dwarf2/cooked-index.c                    | 50 +++++++++++++-------
>  gdb/dwarf2/cooked-index.h                    | 18 +++++--
>  gdb/dwarf2/read.c                            | 13 ++++-
>  gdb/testsuite/gdb.dlang/dlang-start.exp      | 38 +++++++++++++++
>  gdb/testsuite/gdb.dlang/simple.d             | 17 +++++++
>  gdb/testsuite/gdb.dwarf2/main-subprogram.exp |  3 +-
>  gdb/testsuite/gdb.rust/rust-start.exp        | 38 +++++++++++++++
>  7 files changed, 154 insertions(+), 23 deletions(-)
>  create mode 100644 gdb/testsuite/gdb.dlang/dlang-start.exp
>  create mode 100644 gdb/testsuite/gdb.dlang/simple.d
>  create mode 100644 gdb/testsuite/gdb.rust/rust-start.exp
>
> diff --git a/gdb/dwarf2/cooked-index.c b/gdb/dwarf2/cooked-index.c
> index 3d23a65ad8f..d465028add4 100644
> --- a/gdb/dwarf2/cooked-index.c
> +++ b/gdb/dwarf2/cooked-index.c
> @@ -48,6 +48,16 @@ to_string (cooked_index_flag flags)
>  
>  /* See cooked-index.h.  */
>  
> +bool
> +language_requires_canonicalization (enum language lang)
> +{
> +  return (lang == language_ada
> +	  || lang == language_c
> +	  || lang == language_cplus);
> +}
> +
> +/* See cooked-index.h.  */
> +
>  int
>  cooked_index_entry::compare (const char *stra, const char *strb,
>  			     comparison_mode mode)
> @@ -162,10 +172,12 @@ test_compare ()
>  /* See cooked-index.h.  */
>  
>  const char *
> -cooked_index_entry::full_name (struct obstack *storage) const
> +cooked_index_entry::full_name (struct obstack *storage, bool for_main) const
>  {
> +  const char *local_name = for_main ? name : canonical;
> +
>    if ((flags & IS_LINKAGE) != 0 || parent_entry == nullptr)
> -    return canonical;
> +    return local_name;
>  
>    const char *sep = nullptr;
>    switch (per_cu->lang ())
> @@ -182,11 +194,11 @@ cooked_index_entry::full_name (struct obstack *storage) const
>        break;
>  
>      default:
> -      return canonical;
> +      return local_name;
>      }
>  
> -  parent_entry->write_scope (storage, sep);
> -  obstack_grow0 (storage, canonical, strlen (canonical));
> +  parent_entry->write_scope (storage, sep, for_main);
> +  obstack_grow0 (storage, local_name, strlen (local_name));
>    return (const char *) obstack_finish (storage);
>  }
>  
> @@ -194,11 +206,13 @@ cooked_index_entry::full_name (struct obstack *storage) const
>  
>  void
>  cooked_index_entry::write_scope (struct obstack *storage,
> -				 const char *sep) const
> +				 const char *sep,
> +				 bool for_main) const

The comment above this function says 'See cooked-index.h.', but there's
no comment there explaining any of the parameters... which is a bit of a
shame.
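To make the FOR_MAIN plumbing concrete, here is a rough standalone
model of what full_name and write_scope compute.  It is a sketch, not
GDB code: the struct name `entry`, std::string in place of the obstack,
the fixed "::" separator, and dropping the IS_LINKAGE check are all
simplifications assumed for illustration.  What it shows is that with
FOR_MAIN set, only the raw DWARF names are read, so the result is
available before finalization fills in CANONICAL.

```cpp
// Hypothetical standalone model of cooked_index_entry::full_name and
// write_scope from the patch.  std::string stands in for the obstack,
// the per-language separator switch is reduced to "::", and the
// IS_LINKAGE check is omitted.
#include <cassert>
#include <string>

struct entry
{
  const char *name;          // name as read from the DWARF
  const char *canonical;     // nullptr until finalization fills it in
  const entry *parent_entry; // enclosing scope, or nullptr

  // Append each enclosing scope, outermost first, followed by SEP.
  // When FOR_MAIN is true the raw NAME is used instead of CANONICAL,
  // so this never reads a canonical name that may not exist yet.
  void write_scope (std::string &out, const char *sep, bool for_main) const
  {
    if (parent_entry != nullptr)
      parent_entry->write_scope (out, sep, for_main);
    out += for_main ? name : canonical;
    out += sep;
  }

  // Return the fully-qualified name of this entry.
  std::string full_name (bool for_main = false) const
  {
    const char *local_name = for_main ? name : canonical;
    if (parent_entry == nullptr)
      return local_name;

    std::string out;
    parent_entry->write_scope (out, "::", for_main);
    out += local_name;
    return out;
  }
};
```

With FOR_MAIN left false, the same call on an entry whose CANONICAL is
still nullptr would pass a null pointer to std::string, which is
undefined behavior; that is this model's version of the crash described
in the commit message.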

>  {
>    if (parent_entry != nullptr)
> -    parent_entry->write_scope (storage, sep);
> -  obstack_grow (storage, canonical, strlen (canonical));
> +    parent_entry->write_scope (storage, sep, for_main);
> +  const char *local_name = for_main ? name : canonical;
> +  obstack_grow (storage, local_name, strlen (local_name));
>    obstack_grow (storage, sep, strlen (sep));
>  }
>  
> @@ -218,10 +232,6 @@ cooked_index_shard::add (sect_offset die_offset, enum dwarf_tag tag,
>       implicit "main" discovery.  */
>    if ((flags & IS_MAIN) != 0)
>      m_main = result;
> -  else if (per_cu->lang () != language_ada
> -	   && m_main == nullptr
> -	   && strcmp (name, "main") == 0)
> -    m_main = result;

I think the comment on m_main in cooked-index.h needs updating after
this change.
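For reference while updating that comment: after this hunk, m_main only
records an entry flagged IS_MAIN, and the patched cooked_index::get_main
then takes the first such entry whose language does not need
canonicalization.  A simplified model of that flow follows; the types,
the trimmed-down language enum, and the is_main field standing in for
IS_MAIN are assumptions for illustration, not the real declarations.

```cpp
// Hypothetical model of "main" discovery after the patch.  Names and
// types are simplified stand-ins, not GDB's real declarations.
#include <cassert>
#include <vector>

enum language { language_ada, language_c, language_cplus, language_d,
                language_rust };

// Same set of languages as the patch's helper.
bool language_requires_canonicalization (language lang)
{
  return lang == language_ada || lang == language_c || lang == language_cplus;
}

struct entry
{
  const char *name;
  language lang;
  bool is_main;   // stands in for (flags & IS_MAIN) != 0
};

struct shard
{
  const entry *m_main = nullptr;

  // After the patch, only DW_AT_main_subprogram (IS_MAIN) sets m_main;
  // a DIE that merely happens to be called "main" no longer does.
  void add (const entry *e)
  {
    if (e->is_main)
      m_main = e;
  }
};

// Mirrors the patched cooked_index::get_main: return the first shard's
// main whose language does not need canonicalization, else nullptr.
const entry *get_main (const std::vector<shard> &shards)
{
  for (const shard &s : shards)
    if (s.m_main != nullptr
        && !language_requires_canonicalization (s.m_main->lang))
      return s.m_main;
  return nullptr;
}
```

In this model a C unit's plain "main" never lands in m_main, which is
what lets find_main_name fall back to d_main_name for D programs.  A C
unit that does carry DW_AT_main_subprogram is skipped by get_main too,
which is why main-subprogram.exp had to stop using DW_LANG_C.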

>  
>    return result;
>  }
> @@ -323,6 +333,8 @@ cooked_index_shard::do_finalize ()
>  
>    for (cooked_index_entry *entry : m_entries)
>      {
> +      /* Note that this code must be kept in sync with
> +	 language_requires_canonicalization.  */

Can I suggest that after the switch statement we include this assert:

  gdb_assert (entry->canonical == entry->name
              || language_requires_canonicalization (entry->per_cu->lang ()));

Then, if we ever add additional canonicalization and forget to keep
language_requires_canonicalization in sync, the assert will fire.
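A quick standalone model of how that assert would behave, in case it
helps.  finalize below is a hypothetical stand-in for the loop in
do_finalize: for languages that need canonicalization it just copies
the name into separate storage rather than really canonicalizing it.
canonical_is_consistent is the same condition as the proposed
gdb_assert, pulled out into a named predicate.

```cpp
// Hypothetical model of the suggested consistency check in
// do_finalize.  Types and helpers are simplified stand-ins.
#include <cassert>
#include <string>

enum language { language_ada, language_c, language_cplus, language_rust };

bool language_requires_canonicalization (language lang)
{
  return lang == language_ada || lang == language_c || lang == language_cplus;
}

struct entry
{
  const char *name;
  const char *canonical;   // nullptr before finalize runs
  language lang;
};

// The proposed assert condition: CANONICAL may only be a different
// pointer from NAME for languages the helper says need it.
bool canonical_is_consistent (const entry &e)
{
  return e.canonical == e.name || language_requires_canonicalization (e.lang);
}

// Simplified finalize step.  STORAGE must outlive E, since CANONICAL
// may end up pointing into it.
void finalize (entry &e, std::string &storage)
{
  if (language_requires_canonicalization (e.lang))
    {
      storage = e.name;              // placeholder for real canonicalization
      e.canonical = storage.c_str ();
    }
  else
    e.canonical = e.name;

  // Corresponds to placing the gdb_assert after the switch.
  assert (canonical_is_consistent (e));
}
```

An entry with language_rust whose CANONICAL is not the same pointer as
NAME makes canonical_is_consistent return false, which is exactly the
drift (canonicalization added for a new language without updating the
helper) that the assert is meant to catch.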


>        gdb_assert (entry->canonical == nullptr);
>        if ((entry->flags & IS_LINKAGE) != 0)
>  	entry->canonical = entry->name;
> @@ -474,11 +486,15 @@ cooked_index::get_main () const
>    for (const auto &index : m_vector)
>      {
>        const cooked_index_entry *entry = index->get_main ();
> -      if (result == nullptr
> -	  || ((result->flags & IS_MAIN) == 0
> -	      && entry != nullptr
> -	      && (entry->flags & IS_MAIN) != 0))
> -	result = entry;
> +      /* Choose the first "main" we see.  The choice among several is
> +	 arbitrary.  See the comment by the sole caller to understand
> +	 the rationale for filtering by language.  */
> +      if (entry != nullptr
> +	  && !language_requires_canonicalization (entry->per_cu->lang ()))
> +	{
> +	  result = entry;
> +	  break;
> +	}
>      }
>  
>    return result;
> diff --git a/gdb/dwarf2/cooked-index.h b/gdb/dwarf2/cooked-index.h
> index 7fa78d5e87e..e90544f7906 100644
> --- a/gdb/dwarf2/cooked-index.h
> +++ b/gdb/dwarf2/cooked-index.h
> @@ -58,6 +58,13 @@ DEF_ENUM_FLAGS_TYPE (enum cooked_index_flag_enum, cooked_index_flag);
>  
>  std::string to_string (cooked_index_flag flags);
>  
> +/* Return true if LANG requires canonicalization.  This is used
> +   primarily to work around an issue computing the name of "main".
> +   This function must be kept in sync with
> +   cooked_index_shard::do_finalize.  */
> +
> +extern bool language_requires_canonicalization (enum language lang);
> +
>  /* A cooked_index_entry represents a single item in the index.  Note
>     that two entries can be created for the same DIE -- one using the
>     name, and another one using the linkage name, if any.
> @@ -144,8 +151,12 @@ struct cooked_index_entry : public allocate_on_obstack
>  
>    /* Construct the fully-qualified name of this entry and return a
>       pointer to it.  If allocation is needed, it will be done on
> -     STORAGE.  */
> -  const char *full_name (struct obstack *storage) const;
> +     STORAGE.  FOR_MAIN is true if we are computing the name of the
> +     "main" entry -- one marked DW_AT_main_subprogram.  This matters
> +     for avoiding name canonicalization (see comments about this
> +     elsewhere) and also a related race (if "main" computation is done

Saying "see comments about this elsewhere" isn't very helpful.  Ideally,
say where to look, or repeat the reason here, or just drop that
sentence.

> +     during finalization).  */
> +  const char *full_name (struct obstack *storage, bool for_main = false) const;
>  
>    /* Comparison modes for the 'compare' function.  See the function
>       for a description.  */
> @@ -220,7 +231,8 @@ struct cooked_index_entry : public allocate_on_obstack
>  
>  private:
>  
> -  void write_scope (struct obstack *storage, const char *sep) const;
> +  void write_scope (struct obstack *storage, const char *sep,
> +		    bool for_main) const;
>  };
>  
>  class cooked_index;
> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
> index 470ff125c5b..382603c2936 100644
> --- a/gdb/dwarf2/read.c
> +++ b/gdb/dwarf2/read.c
> @@ -7167,8 +7167,17 @@ dwarf2_build_psymtabs_hard (dwarf2_per_objfile *per_objfile)
>  
>    const cooked_index_entry *main_entry = vec->get_main ();
>    if (main_entry != nullptr)
> -    set_objfile_main_name (objfile, main_entry->name,
> -			   main_entry->per_cu->lang ());
> +    {
> +      /* We only do this for names not requiring canonicalization.  At
> +	 this point in the process, names have not been canonicalized.

I think the comma after process can be dropped.

> +	 However, currently, languages that require this step also do
> +	 not use DW_AT_main_subprogram.  An assert is appropriate here
> +	 because this filtering is done in get_main.  */
> +      enum language lang = main_entry->per_cu->lang ();
> +      gdb_assert (!language_requires_canonicalization (lang));
> +      const char *full_name = main_entry->full_name (&per_bfd->obstack, true);
> +      set_objfile_main_name (objfile, full_name, lang);
> +    }
>  
>    dwarf_read_debug_printf ("Done building psymtabs of %s",
>  			   objfile_name (objfile));
> diff --git a/gdb/testsuite/gdb.dlang/dlang-start.exp b/gdb/testsuite/gdb.dlang/dlang-start.exp
> new file mode 100644
> index 00000000000..fd4688b0635
> --- /dev/null
> +++ b/gdb/testsuite/gdb.dlang/dlang-start.exp
> @@ -0,0 +1,38 @@
> +# Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Test "start" for D.
> +
> +load_lib d-support.exp
> +require allow_d_tests
> +
> +# This testcase verifies the behavior of the `start' command, which
> +# does not work when we use the gdb stub...
> +require !use_gdb_stub
> +
> +standard_testfile simple.d
> +if {[prepare_for_testing "failed to prepare" $testfile $srcfile {debug d}]} {
> +    return -1
> +}
> +
> +# Verify that "start" lands inside the right procedure.
> +if {[gdb_start_cmd] < 0} {
> +    unsupported "start failed"
> +    return -1
> +}
> +
> +gdb_test "" \
> +    "main \\(\\) at .*simple.d.*" \
> +    "start"
> diff --git a/gdb/testsuite/gdb.dlang/simple.d b/gdb/testsuite/gdb.dlang/simple.d
> new file mode 100644
> index 00000000000..b00884b1b9f
> --- /dev/null
> +++ b/gdb/testsuite/gdb.dlang/simple.d
> @@ -0,0 +1,17 @@
> +// Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +// This program is free software; you can redistribute it and/or modify
> +// it under the terms of the GNU General Public License as published by
> +// the Free Software Foundation; either version 3 of the License, or
> +// (at your option) any later version.
> +//
> +// This program is distributed in the hope that it will be useful,
> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +// GNU General Public License for more details.
> +//
> +// You should have received a copy of the GNU General Public License
> +// along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +void main() {
> +}
> diff --git a/gdb/testsuite/gdb.dwarf2/main-subprogram.exp b/gdb/testsuite/gdb.dwarf2/main-subprogram.exp
> index 23f02df8513..149b7a801be 100644
> --- a/gdb/testsuite/gdb.dwarf2/main-subprogram.exp
> +++ b/gdb/testsuite/gdb.dwarf2/main-subprogram.exp
> @@ -27,8 +27,9 @@ Dwarf::assemble $asm_file {
>      global srcfile
>  
>      cu {} {
> +	# Note we don't want C here as that requires canonicalization.
>  	DW_TAG_compile_unit {
> -                {DW_AT_language @DW_LANG_C}
> +		{DW_AT_language @DW_LANG_PLI}

I'm curious, why PLI?  I guess it's to force GDB to select the minimal
language, but this is so weird it feels like it's worth a comment.

I agree that the solution feels a little yuck, but I'm not sure we'd
want anything more complex until we're forced to (by some new language
that uses DW_AT_main_subprogram and requires canonicalization).  Plus I
was able to follow what's going on here, so I'd be happy to see this in
GDB.

Reviewed-By: Andrew Burgess <aburgess@redhat.com>

Thanks,
Andrew


>                  {DW_AT_name     $srcfile}
>                  {DW_AT_comp_dir /tmp}
>          } {
> diff --git a/gdb/testsuite/gdb.rust/rust-start.exp b/gdb/testsuite/gdb.rust/rust-start.exp
> new file mode 100644
> index 00000000000..96ba2ae3ac8
> --- /dev/null
> +++ b/gdb/testsuite/gdb.rust/rust-start.exp
> @@ -0,0 +1,38 @@
> +# Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Test "start" for Rust.
> +
> +load_lib rust-support.exp
> +require allow_rust_tests
> +
> +# This testcase verifies the behavior of the `start' command, which
> +# does not work when we use the gdb stub...
> +require !use_gdb_stub
> +
> +standard_testfile simple.rs
> +if {[prepare_for_testing "failed to prepare" $testfile $srcfile {debug rust}]} {
> +    return -1
> +}
> +
> +# Verify that "start" lands inside the right procedure.
> +if {[gdb_start_cmd] < 0} {
> +    unsupported "start failed"
> +    return -1
> +}
> +
> +gdb_test "" \
> +    "simple::main \\(\\) at .*simple.rs.*" \
> +    "start"
> -- 
> 2.39.1