From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-158488-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 100199 invoked by alias); 28 Aug 2019 05:16:03 -0000
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
Received: (qmail 100191 invoked by uid 89); 28 Aug 2019 05:16:02 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-19.1 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.1 spammy=surface, H*r:sk:server-, talks
X-HELO: simark.ca
Received: from simark.ca (HELO simark.ca) (158.69.221.121) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 28 Aug 2019 05:16:00 +0000
Received: from [10.0.0.11] (unknown [192.222.164.54])	(using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)	(No client certificate requested)	by simark.ca (Postfix) with ESMTPSA id 23FFC1E4A2;	Wed, 28 Aug 2019 01:15:58 -0400 (EDT)
Subject: Re: [PATCH 1/4] Increasing support for dwarf 5.
To: Ali Tamur <tamur@google.com>, gdb-patches@sourceware.org
References: <20190827234148.243410-1-tamur@google.com>
From: Simon Marchi <simark@simark.ca>
Message-ID: <d8a1fa51-d7bc-6f54-729d-acaeaa9cf079@simark.ca>
Date: Wed, 28 Aug 2019 05:16:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0
MIME-Version: 1.0
In-Reply-To: <20190827234148.243410-1-tamur@google.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-SW-Source: 2019-08/txt/msg00625.txt.bz2

Hi Ali,

Thanks for splitting your patches.  In addition to having separate patches, please
name each patch (git commit) to reflect what it does.  Otherwise we'll get 4 commits
named "Increasing support for dwarf 5", not super clear.

In commit titles / commit messages / comments, can you please use "DWARF" instead of
"dwarf", since that is the official spelling?

Could you also precise (here and in the commit message) how you have tested this?  Which
compiler, which version, which test case?

Here are a few comments on the code, just the surface since I am learning about
the subject as we go.

On 2019-08-27 7:41 p.m., Ali Tamur via gdb-patches wrote:
> * DW_UT_skeleton and DW_UT_split_compile compilation units also have signatures
> to match the compilation unit in the skeleton and .dwo files. The signature is
> part of the header.
> 
> 
> gdb/ChangeLog:                                                                   
> 2019-08-26  Ali Tamur  <tamur@google.com>
>                                                                                  
> 	* gdb/dwarf2read.c (comp_unit_head): Update comment.                     
> 	(dwarf2_dwo_name): New function declaration.                             
> 	(read_comp_unit_head): Add support for new compilation units,            
> 	DW_UT_partial, DW_UT_skeleton, DW_UT_split_compile, DW_UT_split_type.    
> 	Particularly, DW_UT_skeleton and DW_UT_split_compile also have signature 
> 	in their header.                                                         
> 	(lookup_signature): New function. Returns the signature of the given     
> 	compile unit.                                                            
> 	(lookup_dwo_unit): API change. Use the new lookup_signature function.    
> 	(init_cutu_and_read_dies): Use dwarf2_dwo_name to find the dwo name.     
> 	(create_dwo_cu_reader): Use the added lookup_signature function.         
> 	(dwarf2_dwo_name): Get the dwo name if present.
> ---
>  gdb/dwarf2read.c | 101 ++++++++++++++++++++++++++++++++---------------
>  1 file changed, 69 insertions(+), 32 deletions(-)
> 
> diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
> index de9755f6ce..91a4e48c8f 100644
> --- a/gdb/dwarf2read.c
> +++ b/gdb/dwarf2read.c
> @@ -373,8 +373,9 @@ struct comp_unit_head
>       This will be the first byte following the compilation unit header.  */
>    cu_offset first_die_cu_offset;
>  
> -  /* 64-bit signature of this type unit - it is valid only for
> -     UNIT_TYPE DW_UT_type.  */
> +  /* 64-bit signature of this type unit. In dwarf 4, it is valid only for
> +     UNIT_TYPE DW_UT_type. In dwarf 5, DW_UT_split_type, DW_UT_partial,
> +     DW_UT_skeleton, DW_UT_split_compile also contain a signature.  */
>    ULONGEST signature;

Does DW_UT_partial contain a signature?  In section 7.5.1.1 of the DWARF5 spec, it doesn't seem so.

If I understand the spec correctly, these have a dwo_id:

- DW_UT_skeleton
- DW_UT_split_compile

And these have a type_signature:

- DW_UT_type
- DW_UT_split_type

I think the code will be confusing if we keep the field named simply "signature", when it could
actually mean "dwo_id".  Somebody reading the code with the DWARF 5 spec on the side will have some
trouble making the correlation between both.  Maybe we should start using a union to describe that
different fields are used with different kinds of units?

  /* Unit-type-dependent data.  */
  union {
    /* Used in DWARF 5 if UNIT_TYPE is DW_UT_skeleton or
       DW_UT_split_compile.  */
    struct {
      /* 64-bit compilation unit id that links the skeleton unit and its
         corresponding split unit. */
      ULONGEST dwo_id;
    } skeleton_split_compile;

    /* Used in DWARF 5 if UNIT_TYPE is DW_UT_type or DW_UT_split_type.  Also
       used in DWARF 4 for type units.  */
    struct  {
      /* 64-bit signature of the type described by this unit.  */
      ULONGEST signature;

      /* Offset in the type's DIE of the type defined by this TU.  */
      cu_offset type_cu_offset_in_tu;
    } type_split_type;
  };

>  
>    /* For types, offset in the type's DIE of the type defined by this TU.  */
> @@ -1579,6 +1580,8 @@ static struct attribute *dwarf2_attr_no_follow (struct die_info *,
>  static const char *dwarf2_string_attr (struct die_info *die, unsigned int name,
>                                         struct dwarf2_cu *cu);
>  
> +static const char *dwarf2_dwo_name (struct die_info *die, struct dwarf2_cu *cu);
> +
>  static int dwarf2_flag_true_p (struct die_info *die, unsigned name,
>                                 struct dwarf2_cu *cu);
>  
> @@ -6384,18 +6387,24 @@ read_comp_unit_head (struct comp_unit_head *cu_header,
>        switch (cu_header->unit_type)
>  	{
>  	case DW_UT_compile:
> +	case DW_UT_partial:
> +	case DW_UT_skeleton:
> +	case DW_UT_split_compile:
>  	  if (section_kind != rcuh_kind::COMPILE)
>  	    error (_("Dwarf Error: wrong unit_type in compilation unit header "
> -		   "(is DW_UT_compile, should be DW_UT_type) [in module %s]"),
> -		   filename);
> +		   "(is %d, should be DW_UT_type) [in module %s]"),
> +		   cu_header->unit_type, filename);

There's a little discrepancy in this error message, it would show up as:

  (is 1, should be DW_UT_type)

Since we know it's an existing DW_UT value here, I'd suggest we print it as a
string, or even both:

  (is DW_UT_skeleton (0x4), should be DW_UT_type (0x2))

We would just need a small utility function that converts numerical DW_UT value to string.

>  	  break;
>  	case DW_UT_type:
> +	case DW_UT_split_type:
>  	  section_kind = rcuh_kind::TYPE;
>  	  break;
>  	default:
>  	  error (_("Dwarf Error: wrong unit_type in compilation unit header "
> -		 "(is %d, should be %d or %d) [in module %s]"),
> -		 cu_header->unit_type, DW_UT_compile, DW_UT_type, filename);
> +		 "(is %d, should be one of: %d, %d, %d, %d or %d) "
> +		 "[in module %s]"),
> +		 cu_header->unit_type, DW_UT_compile, DW_UT_skeleton,
> +		 DW_UT_split_compile, DW_UT_type, DW_UT_split_type, filename);
>  	}

Here, it's fine to not print the stringified version of the value, since it's an
invalid value.  But could make the message appear like:

  (is 0xff, should be one of: DW_UT_compile (0x1), DW_UT_skeleton (0x4), ...)

?

> @@ -7291,46 +7306,59 @@ read_cutu_die_from_dwo (struct dwarf2_per_cu_data *this_cu,
>    return 1;
>  }
>  
> +/* Return the signature of the compile unit. In dwarf 4 and before, the
> +   signature is in the DW_AT_GNU_dwo_id attribute. In dwarf 5 and later, the
> +   signature is part of the header. Returns 0 if the signature is not found.  */
> +static ULONGEST
> +lookup_signature (struct dwarf2_cu *cu, struct die_info* comp_unit_die)

Again here, since the DWARF 5 standard talks about dwo_id, not signature, wouldn't it make
more sense for us to use the term dwo_id?

> +{
> +  ULONGEST signature = 0;
> +  struct attribute *attr;
> +
> + attr = dwarf2_attr (comp_unit_die, DW_AT_GNU_dwo_id, cu);
> + if (attr)
> +   signature = DW_UNSND (attr);
> + else if (cu->header.signature)
> +   signature = cu->header.signature;
> + return signature;
> +}

Wouldn't it make sense here to check the DWARF version from the cu_header and
take an action based on that?

switch (cu->header.version)
  {
  case 4:
    // Do the thing for DWARF 4.
    break;
  case 5:
    // Do the thing for DWARF 5.
    break;
  default:
    internal_error (...);
  }

The idea being that we want to be sure not to use cu->header.signature if the CU
is DWARF 4, and we want to be sure not to use any DW_AT_GNU_dwo_id attribute if the
CU is DWARF 5 (AFAIK it shouldn't be there).

> +
>  /* Subroutine of init_cutu_and_read_dies to simplify it.
>     Look up the DWO unit specified by COMP_UNIT_DIE of THIS_CU.
>     Returns NULL if the specified DWO unit cannot be found.  */
>  
>  static struct dwo_unit *
> -lookup_dwo_unit (struct dwarf2_per_cu_data *this_cu,
> +lookup_dwo_unit (const struct die_reader_specs *reader,
>  		 struct die_info *comp_unit_die)
>  {
> -  struct dwarf2_cu *cu = this_cu->cu;
> -  ULONGEST signature;
> +  struct dwarf2_cu *cu = reader->cu;
> +  struct dwarf2_per_cu_data *per_cu = cu->per_cu;
>    struct dwo_unit *dwo_unit;
>    const char *comp_dir, *dwo_name;
>  
>    gdb_assert (cu != NULL);
>  
>    /* Yeah, we look dwo_name up again, but it simplifies the code.  */
> -  dwo_name = dwarf2_string_attr (comp_unit_die, DW_AT_GNU_dwo_name, cu);
> +  dwo_name = dwarf2_dwo_name (comp_unit_die, cu);
>    comp_dir = dwarf2_string_attr (comp_unit_die, DW_AT_comp_dir, cu);
>  
> -  if (this_cu->is_debug_types)
> +  if (per_cu->is_debug_types)
>      {
>        struct signatured_type *sig_type;
>  
> -      /* Since this_cu is the first member of struct signatured_type,
> +      /* Since per_cu is the first member of struct signatured_type,
>  	 we can go from a pointer to one to a pointer to the other.  */
> -      sig_type = (struct signatured_type *) this_cu;
> -      signature = sig_type->signature;
> +      sig_type = (struct signatured_type *) per_cu;
>        dwo_unit = lookup_dwo_type_unit (sig_type, dwo_name, comp_dir);
>      }
>    else
>      {
> -      struct attribute *attr;
> -
> -      attr = dwarf2_attr (comp_unit_die, DW_AT_GNU_dwo_id, cu);
> -      if (! attr)
> +      ULONGEST signature = lookup_signature(cu, comp_unit_die);
> +      if (!signature)
>  	error (_("Dwarf Error: missing dwo_id for dwo_name %s"
>  		 " [in module %s]"),

Even though we're unlikely to have a signature / dwo_id that is 0, I would
prefer if we didn't use it as a special value to denote failure.  The function
should return a boolean, and the signature / dwo_id would be returned as an
output parameter.

> @@ -20102,6 +20128,17 @@ dwarf2_string_attr (struct die_info *die, unsigned int name, struct dwarf2_cu *c
>    return str;
>  }
>  
> +/* Return the dwo name or NULL if not present. If present, it is in either
> +   DW_AT_GNU_dwo_name or DW_AT_dwo_name atrribute.  */
> +static const char *
> +dwarf2_dwo_name (struct die_info *die, struct dwarf2_cu *cu)
> +{
> +  const char *dwo_name = dwarf2_string_attr (die, DW_AT_GNU_dwo_name, cu);
> +  if (!dwo_name)
> +    dwo_name = dwarf2_string_attr (die, DW_AT_dwo_name, cu);
> +  return dwo_name;
> +}

Here too, should we do something based on the CU's DWARF version?  I presume we don't want
to accept a DW_AT_GNU_dwo_name attribute in a DWARF 5 CU.

Simon