Read and write DWARF5 units and forms

public inbox for dwz@sourceware.org
 help / color / mirror / Atom feed

* Read and write DWARF5 units and forms
@ 2020-09-24 16:25 Mark Wielaard
  2020-09-24 16:25 ` [PATCH 1/4] Calculate size and write correct DWARF5 header Mark Wielaard
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Mark Wielaard @ 2020-09-24 16:25 UTC (permalink / raw)
  To: dwz

Hi,

The following patches implement reading and writing of DWARF5
units and forms (excluding the split-dwarf DW_FORM_strx,
DW_FORM_addrx, DW_FORM_loclistx and DW_FORM_rnglistx)

This makes dwz able to process and write out
DWARF5 as generated by GCC for various files (as long as they
don't contain loclists or DWARF5 debug line tables).

[PATCH 1/4] Calculate size and write correct DWARF5 header
[PATCH 2/4] Handle DW_FORM_data16.
[PATCH 3/4] Handle DW_FORM_line_strp by not moving DIE.
[PATCH 4/4] Handle DW_FORM_implicit_const [experiment].

I believe the first two patches are good to go. A DWARF5 unit
header is 1 byte larger and has a different order for some of
the fields. DW_FORM_data16 is simple since it is just 16 bytes
of data that doesn't need to be treated specially for any attribute.

For now DW_FORM_line_strp is also fairly simple. But that is
mainly because I made it so that DIEs cannot be moved to a
multifile. There is no .debug_line_str section in a supplemental
file. I like to push this change for now and then revisit how
to handle line_strp when adding support for DWARF5 .debug_line
tables. It makes sense to merge all dir and line table strings
into the the supplemental .debug_str table (so both the main
file and multifile can refer to them using DW_FORM_strp or
DW_FORM_GNU_strp_alt/DW_FORM_strp_sup).

The last patch is a work in progress. DW_FORM_implicit_const
keeps the actual data value in the abbrev. This makes things tricky
because we have to decide where to keep (a reference to) the value
and when to update the value if necessary. In particular
DW_FORM_implicit_const might be used with a line table index for
the DW_AT_decl_file or DW_AT_call_file attributes which might
need updating in multifile mode. The patch commit message
contains some questions how to handle this better than is
done now. In particular I think the value updates are done at
the wrong point because I see various odr failures which seem
to indicate some DIEs aren't correctly detected as being similar.
I am also not sure things work correctly for --devel-ignore-locus.
Suggestions on how to handle this better would be very welcome.

I want to first work on handling the new .debug_loclists,
.debug_rnglists sections and parsing/writing of version 5 .debugline.
Then come back to properly handling DW_FORM_implicit_const.

Note that to make the testsuite work (there are still some failures
when building with make check CC="gcc -gdwarf-5" CXX="g++ -gdwarf-5")
you need binutils 2.35.1 (released this week) plus at least this patch:
https://sourceware.org/pipermail/binutils/2020-September/113425.html

Cheers,

Mark

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] Calculate size and write correct DWARF5 header
  2020-09-24 16:25 Read and write DWARF5 units and forms Mark Wielaard
@ 2020-09-24 16:25 ` Mark Wielaard
  2020-09-24 19:39   ` Jakub Jelinek
  2020-09-24 16:25 ` [PATCH 2/4] Handle DW_FORM_data16 Mark Wielaard
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Mark Wielaard @ 2020-09-24 16:25 UTC (permalink / raw)
  To: dwz; +Cc: Mark Wielaard

	* dwz.c (try_debug_info): Add header size check for cu_version 5.
	(read_debug_info): Likewise.
	(partition_dups_1): Add 1 to pu_size for cu_version 5.
	(create_import_tree): Include header_size, 13 or 14 depending on
	cu_version, in cost comparision.
	(compute_abbrevs): Adjust headersz depending on cu_version.
	(recompute_abbrevs): Likewise.
	(write_info): Write cu_version 5 unit type.
---
 dwz.c | 45 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 38 insertions(+), 7 deletions(-)

diff --git a/dwz.c b/dwz.c
index 180f9dc..8291333 100644
--- a/dwz.c
+++ b/dwz.c
@@ -5605,6 +5605,7 @@ try_debug_info (DSO *dso)
       unsigned int culen;
       int cu_version;
 
+      /* Note header is one bigger with DWARF version 5.  */
       if (ptr + (kind == DEBUG_TYPES ? 23 : 11) > endsec)
 	{
 	  error (0, 0, "%s: %s CU header too small", dso->filename,
@@ -5682,6 +5683,13 @@ try_debug_info (DSO *dso)
 
       if (cu_version == 5)
 	{
+	  /* Above we only checked for the smaller version 4 header size.  */
+	  if (ptr + 4 > endsec)
+	    {
+	      error (0, 0, "%s: %s CU version 5 header too small",
+		     dso->filename, debug_sections[kind].name);
+	      goto fail;
+	    }
 	  value = read_32 (ptr);
 	  if (value >= debug_sections[DEBUG_ABBREV].size)
 	    {
@@ -5867,6 +5875,7 @@ read_debug_info (DSO *dso, int kind, unsigned int *die_count)
       unsigned int debug_line_off;
       unsigned int type_offset = 0;
 
+      /* Note header is one bigger with DWARF version 5.  */
       if (ptr + (kind == DEBUG_TYPES ? 23 : 11) > endsec)
 	{
 	  error (0, 0, "%s: %s CU header too small", dso->filename,
@@ -5914,6 +5923,13 @@ read_debug_info (DSO *dso, int kind, unsigned int *die_count)
 	}
       else
 	{
+	  /* Above we only checked for the smaller version 4 header size.  */
+	  if (ptr + 4 > endsec)
+	    {
+	      error (0, 0, "%s: %s CU version 5 header too small",
+		     dso->filename, debug_sections[kind].name);
+	      goto fail;
+	    }
 	  value = read_32 (ptr);
 	  if (value >= debug_sections[DEBUG_ABBREV].size)
 	    {
@@ -7209,6 +7225,8 @@ partition_dups_1 (dw_die_ref *arr, size_t vec_size,
 	   /* CU Header: address_size (ubyte).
 	      1 byte.  */
 	   + 1
+	   /* DWARF5 CU header: unit_type (ubyte).  */
+	   + die_cu (arr[i])->cu_version >= 5 ? 1 : 0
 	   /* CU Root DIE: abbreviation code (unsigned LEB128).
 	      1 or more bytes.  Optimistically assume 1.  */
 	   + 1
@@ -8052,8 +8070,8 @@ create_import_tree (void)
      that would need to be added, and if some new DW_TAG_partial_unit
      CUs are going to be created as a result of this routine, that size
      too.  DW_TAG_imported_unit has size 5 (for DWARF3+) or 1 + ptr_size
-     (DWARF2), DW_TAG_partial_unit has size 13 (11 CU header + 1 byte
-     abbrev number + 1 byte child end).  */
+     (DWARF2), DW_TAG_partial_unit has size 13/14 (11 CU header + 1 byte
+     abbrev number + 1 byte child end + 1 byte for DWARF5 unit_type).  */
   unsigned int size = 0;
   /* Size of DW_TAG_imported_unit if the same everywhere, otherwise
      (mixing DWARF2 and DWARF3+ with ptr_size != 4) 0.  */
@@ -8306,14 +8324,18 @@ create_import_tree (void)
 		    continue;
 		  if (npu == NULL)
 		    {
+		      unsigned int header_size;
 		      pusrc[srccount] = e3->icu;
+		      header_size = (pusrc[srccount]->cu->cu_version >= 5
+				     ? 14 : 13); /* DWARF5 unit_type byte.  */
 		      cost += edge_cost;
 		      if (!edge_cost)
 			cost += pusrc[srccount]->cu->cu_version == 2
 				? 1 + ptr_size : 5;
 		      srccount++;
 		      if (ignore_size || ((dstcount - 1) * cost
-					  > 13 + dstcount * new_edge_cost))
+					  > (header_size
+					     + dstcount * new_edge_cost)))
 			{
 			  unsigned int j;
 
@@ -10451,7 +10473,8 @@ compute_abbrevs (DSO *dso)
       dw_die_ref *intracuarr, *intracuvec;
       enum dwarf_form intracuform = DW_FORM_ref4;
       dw_die_ref child, *lastotr, child_next, *last;
-      unsigned int headersz = cu->cu_kind == CU_TYPES ? 23 : 11;
+      unsigned int headersz = (cu->cu_kind == CU_TYPES
+			       ? 23 : (cu->cu_version >= 5 ? 12 : 11));
 
       if (unlikely (fi_multifile) && cu->cu_die->die_remove)
 	continue;
@@ -10627,7 +10650,7 @@ compute_abbrevs (DSO *dso)
 	collapse_children (cu, cu->cu_die);
     }
   if (wr_multifile)
-    total_size += 11;
+    total_size += 11; /* See the end of write_info.  */
   obstack_free (&ob2, (void *) t);
   cuarr = (dw_cu_ref *) obstack_alloc (&ob2, ncus * sizeof (dw_cu_ref));
   for (cu = first_cu, i = 0; cu; cu = cu->cu_next)
@@ -11721,7 +11744,8 @@ write_die (unsigned char *ptr, dw_cu_ref cu, dw_die_ref die,
 static void
 recompute_abbrevs (dw_cu_ref cu, unsigned int cu_size)
 {
-  unsigned int headersz = cu->cu_kind == CU_TYPES ? 23 : 11;
+  unsigned int headersz = (cu->cu_kind == CU_TYPES
+			   ? 23 : (cu->cu_version >= 5 ? 12 : 11));
   struct abbrev_tag *t;
   unsigned int ndies = 0, intracusize, off, i;
   dw_die_ref *intracuarr, *intracuvec;
@@ -11826,8 +11850,15 @@ write_info (unsigned int *die_count)
       /* Write CU header.  */
       write_32 (ptr, next_off - cu->cu_new_offset - 4);
       write_16 (ptr, cu->cu_version);
+      if (cu->cu_version >= 5)
+	{
+	  *ptr++ = (cu->cu_die->die_tag == DW_TAG_compile_unit
+		    ? DW_UT_compile : DW_UT_partial);
+	  write_8 (ptr, ptr_size);
+	}
       write_32 (ptr, cu->u2.cu_new_abbrev_offset);
-      write_8 (ptr, ptr_size);
+      if (cu->cu_version < 5)
+	write_8 (ptr, ptr_size);
       ptr = write_die (ptr, cu, cu->cu_die, NULL, NULL, die_count);
       assert (info + (next_off - (wr_multifile ? multi_info_off : 0)) == ptr);
       if (unlikely (low_mem) && cu->cu_kind != CU_PU)
-- 
2.18.4


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] Calculate size and write correct DWARF5 header
  2020-09-24 16:25 ` [PATCH 1/4] Calculate size and write correct DWARF5 header Mark Wielaard
@ 2020-09-24 19:39   ` Jakub Jelinek
  2020-09-25 16:35     ` Mark Wielaard
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Jelinek @ 2020-09-24 19:39 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: dwz

On Thu, Sep 24, 2020 at 06:25:54PM +0200, Mark Wielaard wrote:
> @@ -7209,6 +7225,8 @@ partition_dups_1 (dw_die_ref *arr, size_t vec_size,
>  	   /* CU Header: address_size (ubyte).
>  	      1 byte.  */
>  	   + 1
> +	   /* DWARF5 CU header: unit_type (ubyte).  */
> +	   + die_cu (arr[i])->cu_version >= 5 ? 1 : 0
>  	   /* CU Root DIE: abbreviation code (unsigned LEB128).
>  	      1 or more bytes.  Optimistically assume 1.  */
>  	   + 1

The above looks incorrect.
4 + 2 + 4 + 1 + x >= 5 ? 1 : 0 + 1 + 4 + 4
is (11 + x >= 5) ? 1 : (0 + 1 + 4 + 4)
rather than
4 + 2 + 4 + 1 + (x >= 5 ? 1 : 0) + 1 + 4 + 4
we need there.  So, please add ()s around.

> @@ -10451,7 +10473,8 @@ compute_abbrevs (DSO *dso)
>        dw_die_ref *intracuarr, *intracuvec;
>        enum dwarf_form intracuform = DW_FORM_ref4;
>        dw_die_ref child, *lastotr, child_next, *last;
> -      unsigned int headersz = cu->cu_kind == CU_TYPES ? 23 : 11;
> +      unsigned int headersz = (cu->cu_kind == CU_TYPES
> +			       ? 23 : (cu->cu_version >= 5 ? 12 : 11));

Is the DWARF 5 .debug_info types header also 23 bytes long?

	Jakub


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] Calculate size and write correct DWARF5 header
  2020-09-24 19:39   ` Jakub Jelinek
@ 2020-09-25 16:35     ` Mark Wielaard
  2020-09-25 16:39       ` Jakub Jelinek
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Wielaard @ 2020-09-25 16:35 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: dwz

[-- Attachment #1: Type: text/plain, Size: 1940 bytes --]

Hi Jakub,

On Thu, 2020-09-24 at 21:39 +0200, Jakub Jelinek wrote:
> On Thu, Sep 24, 2020 at 06:25:54PM +0200, Mark Wielaard wrote:
> > @@ -7209,6 +7225,8 @@ partition_dups_1 (dw_die_ref *arr, size_t
> > vec_size,
> >  	   /* CU Header: address_size (ubyte).
> >  	      1 byte.  */
> >  	   + 1
> > +	   /* DWARF5 CU header: unit_type (ubyte).  */
> > +	   + die_cu (arr[i])->cu_version >= 5 ? 1 : 0
> >  	   /* CU Root DIE: abbreviation code (unsigned LEB128).
> >  	      1 or more bytes.  Optimistically assume 1.  */
> >  	   + 1
> 
> The above looks incorrect.
> 4 + 2 + 4 + 1 + x >= 5 ? 1 : 0 + 1 + 4 + 4
> is (11 + x >= 5) ? 1 : (0 + 1 + 4 + 4)
> rather than
> 4 + 2 + 4 + 1 + (x >= 5 ? 1 : 0) + 1 + 4 + 4
> we need there.  So, please add ()s around.

Groan, how embarrassing. Fixed.

> > @@ -10451,7 +10473,8 @@ compute_abbrevs (DSO *dso)
> >        dw_die_ref *intracuarr, *intracuvec;
> >        enum dwarf_form intracuform = DW_FORM_ref4;
> >        dw_die_ref child, *lastotr, child_next, *last;
> > -      unsigned int headersz = cu->cu_kind == CU_TYPES ? 23 : 11;
> > +      unsigned int headersz = (cu->cu_kind == CU_TYPES
> > +			       ? 23 : (cu->cu_version >= 5 ? 12 : 11));
> 
> Is the DWARF 5 .debug_info types header also 23 bytes long?

DW_UT_type units are length (4) + version (2) + unit_type (1) +
ptr_size (1) + abbrev_off (4) + signature (8) + type_off (4) = 28 bytes
long. But we don't support those yet. When we do we need to distinguish
between CU_TYPES_4 and CU_TYPES_5 because they can be both present.

For my first pass of adding DWARF5 support I was not planning on doing
CU_TYPES_5 because as far as I know nothing produces it at the moment.
But it shouldn't be too hard to add it at the same level that dwz
supports CU_TYPES_4.

Is the patch (with the above fix) as attached for just compile and
partial DWARF5 unit headers OK for now?

Cheers,

Mark

[-- Attachment #2: Type: text/x-patch, Size: 5946 bytes --]

From c2cf77e0cdf05611bb0aa6ee6261acb13dff5634 Mon Sep 17 00:00:00 2001
From: Mark Wielaard <mark@klomp.org>
Date: Wed, 23 Sep 2020 01:48:39 +0200
Subject: [PATCH] Calculate size and write correct DWARF5 compile and partial
 unit headers.

	* dwz.c (try_debug_info): Add header size check for cu_version 5.
	(read_debug_info): Likewise.
	(partition_dups_1): Add 1 to pu_size for cu_version 5.
	(create_import_tree): Include header_size, 13 or 14 depending on
	cu_version, in cost comparision.
	(compute_abbrevs): Adjust headersz depending on cu_version.
	(recompute_abbrevs): Likewise.
	(write_info): Write cu_version 5 unit type.
---
 dwz.c | 45 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 38 insertions(+), 7 deletions(-)

diff --git a/dwz.c b/dwz.c
index 180f9dc..c8503a7 100644
--- a/dwz.c
+++ b/dwz.c
@@ -5605,6 +5605,7 @@ try_debug_info (DSO *dso)
       unsigned int culen;
       int cu_version;
 
+      /* Note header is one bigger with DWARF version 5.  */
       if (ptr + (kind == DEBUG_TYPES ? 23 : 11) > endsec)
 	{
 	  error (0, 0, "%s: %s CU header too small", dso->filename,
@@ -5682,6 +5683,13 @@ try_debug_info (DSO *dso)
 
       if (cu_version == 5)
 	{
+	  /* Above we only checked for the smaller version 4 header size.  */
+	  if (ptr + 4 > endsec)
+	    {
+	      error (0, 0, "%s: %s CU version 5 header too small",
+		     dso->filename, debug_sections[kind].name);
+	      goto fail;
+	    }
 	  value = read_32 (ptr);
 	  if (value >= debug_sections[DEBUG_ABBREV].size)
 	    {
@@ -5867,6 +5875,7 @@ read_debug_info (DSO *dso, int kind, unsigned int *die_count)
       unsigned int debug_line_off;
       unsigned int type_offset = 0;
 
+      /* Note header is one bigger with DWARF version 5.  */
       if (ptr + (kind == DEBUG_TYPES ? 23 : 11) > endsec)
 	{
 	  error (0, 0, "%s: %s CU header too small", dso->filename,
@@ -5945,6 +5954,13 @@ read_debug_info (DSO *dso, int kind, unsigned int *die_count)
 
       if (cu_version == 5)
 	{
+	  /* Above we only checked for the smaller version 4 header size.  */
+	  if (ptr + 4 > endsec)
+	    {
+	      error (0, 0, "%s: %s CU version 5 header too small",
+		     dso->filename, debug_sections[kind].name);
+	      goto fail;
+	    }
 	  value = read_32 (ptr);
 	  if (value >= debug_sections[DEBUG_ABBREV].size)
 	    {
@@ -7209,6 +7225,8 @@ partition_dups_1 (dw_die_ref *arr, size_t vec_size,
 	   /* CU Header: address_size (ubyte).
 	      1 byte.  */
 	   + 1
+	   /* DWARF5 CU header: unit_type (ubyte).  */
+	   + (die_cu (arr[i])->cu_version >= 5 ? 1 : 0)
 	   /* CU Root DIE: abbreviation code (unsigned LEB128).
 	      1 or more bytes.  Optimistically assume 1.  */
 	   + 1
@@ -8052,8 +8070,8 @@ create_import_tree (void)
      that would need to be added, and if some new DW_TAG_partial_unit
      CUs are going to be created as a result of this routine, that size
      too.  DW_TAG_imported_unit has size 5 (for DWARF3+) or 1 + ptr_size
-     (DWARF2), DW_TAG_partial_unit has size 13 (11 CU header + 1 byte
-     abbrev number + 1 byte child end).  */
+     (DWARF2), DW_TAG_partial_unit has size 13/14 (11 CU header + 1 byte
+     abbrev number + 1 byte child end + 1 byte for DWARF5 unit_type).  */
   unsigned int size = 0;
   /* Size of DW_TAG_imported_unit if the same everywhere, otherwise
      (mixing DWARF2 and DWARF3+ with ptr_size != 4) 0.  */
@@ -8306,14 +8324,18 @@ create_import_tree (void)
 		    continue;
 		  if (npu == NULL)
 		    {
+		      unsigned int header_size;
 		      pusrc[srccount] = e3->icu;
+		      header_size = (pusrc[srccount]->cu->cu_version >= 5
+				     ? 14 : 13); /* DWARF5 unit_type byte.  */
 		      cost += edge_cost;
 		      if (!edge_cost)
 			cost += pusrc[srccount]->cu->cu_version == 2
 				? 1 + ptr_size : 5;
 		      srccount++;
 		      if (ignore_size || ((dstcount - 1) * cost
-					  > 13 + dstcount * new_edge_cost))
+					  > (header_size
+					     + dstcount * new_edge_cost)))
 			{
 			  unsigned int j;
 
@@ -10451,7 +10473,8 @@ compute_abbrevs (DSO *dso)
       dw_die_ref *intracuarr, *intracuvec;
       enum dwarf_form intracuform = DW_FORM_ref4;
       dw_die_ref child, *lastotr, child_next, *last;
-      unsigned int headersz = cu->cu_kind == CU_TYPES ? 23 : 11;
+      unsigned int headersz = (cu->cu_kind == CU_TYPES
+			       ? 23 : (cu->cu_version >= 5 ? 12 : 11));
 
       if (unlikely (fi_multifile) && cu->cu_die->die_remove)
 	continue;
@@ -10627,7 +10650,7 @@ compute_abbrevs (DSO *dso)
 	collapse_children (cu, cu->cu_die);
     }
   if (wr_multifile)
-    total_size += 11;
+    total_size += 11; /* See the end of write_info.  */
   obstack_free (&ob2, (void *) t);
   cuarr = (dw_cu_ref *) obstack_alloc (&ob2, ncus * sizeof (dw_cu_ref));
   for (cu = first_cu, i = 0; cu; cu = cu->cu_next)
@@ -11721,7 +11744,8 @@ write_die (unsigned char *ptr, dw_cu_ref cu, dw_die_ref die,
 static void
 recompute_abbrevs (dw_cu_ref cu, unsigned int cu_size)
 {
-  unsigned int headersz = cu->cu_kind == CU_TYPES ? 23 : 11;
+  unsigned int headersz = (cu->cu_kind == CU_TYPES
+			   ? 23 : (cu->cu_version >= 5 ? 12 : 11));
   struct abbrev_tag *t;
   unsigned int ndies = 0, intracusize, off, i;
   dw_die_ref *intracuarr, *intracuvec;
@@ -11826,8 +11850,15 @@ write_info (unsigned int *die_count)
       /* Write CU header.  */
       write_32 (ptr, next_off - cu->cu_new_offset - 4);
       write_16 (ptr, cu->cu_version);
+      if (cu->cu_version >= 5)
+	{
+	  *ptr++ = (cu->cu_die->die_tag == DW_TAG_compile_unit
+		    ? DW_UT_compile : DW_UT_partial);
+	  write_8 (ptr, ptr_size);
+	}
       write_32 (ptr, cu->u2.cu_new_abbrev_offset);
-      write_8 (ptr, ptr_size);
+      if (cu->cu_version < 5)
+	write_8 (ptr, ptr_size);
       ptr = write_die (ptr, cu, cu->cu_die, NULL, NULL, die_count);
       assert (info + (next_off - (wr_multifile ? multi_info_off : 0)) == ptr);
       if (unlikely (low_mem) && cu->cu_kind != CU_PU)
-- 
2.18.4


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] Calculate size and write correct DWARF5 header
  2020-09-25 16:35     ` Mark Wielaard
@ 2020-09-25 16:39       ` Jakub Jelinek
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Jelinek @ 2020-09-25 16:39 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: dwz

On Fri, Sep 25, 2020 at 06:35:43PM +0200, Mark Wielaard wrote:
> > > -      unsigned int headersz = cu->cu_kind == CU_TYPES ? 23 : 11;
> > > +      unsigned int headersz = (cu->cu_kind == CU_TYPES
> > > +			       ? 23 : (cu->cu_version >= 5 ? 12 : 11));
> > 
> > Is the DWARF 5 .debug_info types header also 23 bytes long?
> 
> DW_UT_type units are length (4) + version (2) + unit_type (1) +
> ptr_size (1) + abbrev_off (4) + signature (8) + type_off (4) = 28 bytes
> long. But we don't support those yet. When we do we need to distinguish
> between CU_TYPES_4 and CU_TYPES_5 because they can be both present.
> 
> For my first pass of adding DWARF5 support I was not planning on doing
> CU_TYPES_5 because as far as I know nothing produces it at the moment.
> But it shouldn't be too hard to add it at the same level that dwz
> supports CU_TYPES_4.
> 
> Is the patch (with the above fix) as attached for just compile and
> partial DWARF5 unit headers OK for now?

Ok, thanks.

> From c2cf77e0cdf05611bb0aa6ee6261acb13dff5634 Mon Sep 17 00:00:00 2001
> From: Mark Wielaard <mark@klomp.org>
> Date: Wed, 23 Sep 2020 01:48:39 +0200
> Subject: [PATCH] Calculate size and write correct DWARF5 compile and partial
>  unit headers.
> 
> 	* dwz.c (try_debug_info): Add header size check for cu_version 5.
> 	(read_debug_info): Likewise.
> 	(partition_dups_1): Add 1 to pu_size for cu_version 5.
> 	(create_import_tree): Include header_size, 13 or 14 depending on
> 	cu_version, in cost comparision.
> 	(compute_abbrevs): Adjust headersz depending on cu_version.
> 	(recompute_abbrevs): Likewise.
> 	(write_info): Write cu_version 5 unit type.

	Jakub


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 2/4] Handle DW_FORM_data16.
  2020-09-24 16:25 Read and write DWARF5 units and forms Mark Wielaard
  2020-09-24 16:25 ` [PATCH 1/4] Calculate size and write correct DWARF5 header Mark Wielaard
@ 2020-09-24 16:25 ` Mark Wielaard
  2020-09-24 19:43   ` Jakub Jelinek
  2020-09-24 16:25 ` [PATCH 3/4] Handle DW_FORM_line_strp by not moving DIE Mark Wielaard
  2020-09-24 16:25 ` [PATCH 4/4] Handle DW_FORM_implicit_const [experiment] Mark Wielaard
  3 siblings, 1 reply; 13+ messages in thread
From: Mark Wielaard @ 2020-09-24 16:25 UTC (permalink / raw)
  To: dwz; +Cc: Mark Wielaard

This is a simple form to handle, it just encodes 16 bytes of DIE data.

	* dwz.c (read_abbrev): Accept DW_FORM_data16.
	(skip_attr_no_dw_form_indirect): Handle DW_FORM_data16.
	(checksum_die): Likewise.
	(checksum_ref_die): Likewise.
	(die_eq_1): Likewise.
	(mark_refs): Likewise.
	(read_debug_info): Likewise.
	(build_abbrevs_for_die): Likewise.
	(DW_FORM_ref_sig8): Likewise.
---
 dwz.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/dwz.c b/dwz.c
index 8291333..9719ee6 100644
--- a/dwz.c
+++ b/dwz.c
@@ -1260,7 +1260,8 @@ read_abbrev (DSO *dso, unsigned char *ptr)
 	  nattr++;
 	  form = read_uleb128 (p);
 	  if (form == 2
-	      || (form > DW_FORM_flag_present && form != DW_FORM_ref_sig8))
+	      || (form > DW_FORM_flag_present && (form != DW_FORM_ref_sig8
+						  || form != DW_FORM_data16)))
 	    {
 	      error (0, 0, "%s: Unknown DWARF %s",
 		     dso->filename, get_DW_FORM_str (form));
@@ -1709,6 +1710,9 @@ skip_attr_no_dw_form_indirect (unsigned int cu_version, uint32_t form,
     case DW_FORM_ref_sig8:
       ptr += 8;
       break;
+    case DW_FORM_data16:
+      ptr += 16;
+      break;
     case DW_FORM_sdata:
     case DW_FORM_ref_udata:
     case DW_FORM_udata:
@@ -2975,6 +2979,9 @@ checksum_die (DSO *dso, dw_cu_ref cu, dw_die_ref top_die, dw_die_ref die)
 	  die->die_no_multifile = 1;
 	  ptr += 8;
 	  break;
+	case DW_FORM_data16:
+	  ptr += 16;
+	  break;
 	case DW_FORM_sdata:
 	case DW_FORM_udata:
 	  skip_leb128 (ptr);
@@ -3394,6 +3401,9 @@ checksum_ref_die (dw_cu_ref cu, dw_die_ref top_die, dw_die_ref die,
 	    case DW_FORM_ref_sig8:
 	      ptr += 8;
 	      break;
+	    case DW_FORM_data16:
+	      ptr += 16;
+	      break;
 	    case DW_FORM_sdata:
 	    case DW_FORM_udata:
 	      skip_leb128 (ptr);
@@ -4214,6 +4224,10 @@ die_eq_1 (dw_cu_ref cu1, dw_cu_ref cu2,
 	  ptr1 += 8;
 	  ptr2 += 8;
 	  break;
+	case DW_FORM_data16:
+	  ptr1 += 16;
+	  ptr2 += 16;
+	  break;
 	case DW_FORM_sdata:
 	case DW_FORM_udata:
 	  skip_leb128 (ptr1);
@@ -5295,6 +5309,9 @@ mark_refs (dw_cu_ref cu, dw_die_ref top_die, dw_die_ref die, int mode)
 	    case DW_FORM_ref_sig8:
 	      ptr += 8;
 	      break;
+	    case DW_FORM_data16:
+	      ptr += 16;
+	      break;
 	    case DW_FORM_sdata:
 	    case DW_FORM_udata:
 	      skip_leb128 (ptr);
@@ -6270,6 +6287,9 @@ read_debug_info (DSO *dso, int kind, unsigned int *die_count)
 		case DW_FORM_ref_sig8:
 		  ptr += 8;
 		  break;
+		case DW_FORM_data16:
+		  ptr += 16;
+		  break;
 		case DW_FORM_sdata:
 		case DW_FORM_udata:
 		  if (lang_p
@@ -9941,6 +9961,9 @@ build_abbrevs_for_die (htab_t h, dw_cu_ref cu, dw_die_ref die,
 		case DW_FORM_ref_sig8:
 		  ptr += 8;
 		  break;
+		case DW_FORM_data16:
+		  ptr += 16;
+		  break;
 		case DW_FORM_sdata:
 		case DW_FORM_udata:
 		  skip_leb128 (ptr);
@@ -11481,6 +11504,9 @@ write_die (unsigned char *ptr, dw_cu_ref cu, dw_die_ref die,
 	    case DW_FORM_ref_sig8:
 	      inptr += 8;
 	      break;
+	    case DW_FORM_data16:
+	      inptr += 16;
+	      break;
 	    case DW_FORM_sdata:
 	    case DW_FORM_udata:
 	      skip_leb128 (inptr);
-- 
2.18.4


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] Handle DW_FORM_data16.
  2020-09-24 16:25 ` [PATCH 2/4] Handle DW_FORM_data16 Mark Wielaard
@ 2020-09-24 19:43   ` Jakub Jelinek
  2020-09-24 21:53     ` Jakub Jelinek
  2020-09-25 16:42     ` Mark Wielaard
  0 siblings, 2 replies; 13+ messages in thread
From: Jakub Jelinek @ 2020-09-24 19:43 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: dwz

On Thu, Sep 24, 2020 at 06:25:55PM +0200, Mark Wielaard wrote:
> This is a simple form to handle, it just encodes 16 bytes of DIE data.
> 
> 	* dwz.c (read_abbrev): Accept DW_FORM_data16.
> 	(skip_attr_no_dw_form_indirect): Handle DW_FORM_data16.
> 	(checksum_die): Likewise.
> 	(checksum_ref_die): Likewise.
> 	(die_eq_1): Likewise.
> 	(mark_refs): Likewise.
> 	(read_debug_info): Likewise.
> 	(build_abbrevs_for_die): Likewise.
> 	(DW_FORM_ref_sig8): Likewise.
> ---
>  dwz.c | 28 +++++++++++++++++++++++++++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
> 
> diff --git a/dwz.c b/dwz.c
> index 8291333..9719ee6 100644
> --- a/dwz.c
> +++ b/dwz.c
> @@ -1260,7 +1260,8 @@ read_abbrev (DSO *dso, unsigned char *ptr)
>  	  nattr++;
>  	  form = read_uleb128 (p);
>  	  if (form == 2
> -	      || (form > DW_FORM_flag_present && form != DW_FORM_ref_sig8))
> +	      || (form > DW_FORM_flag_present && (form != DW_FORM_ref_sig8
> +						  || form != DW_FORM_data16)))

This should have been changed to:
	      || (form > DW_FORM_ref_sig8 && form != DW_FORM_data16))
The point is that the code now handles everything up to DW_FORM_ref_sig8
inclusive except 2, and then DW_FORM_data16 with gaps in between.
>  	    {
>  	      error (0, 0, "%s: Unknown DWARF %s",
>  		     dso->filename, get_DW_FORM_str (form));
> @@ -1709,6 +1710,9 @@ skip_attr_no_dw_form_indirect (unsigned int cu_version, uint32_t form,
>      case DW_FORM_ref_sig8:
>        ptr += 8;
>        break;
> +    case DW_FORM_data16:
> +      ptr += 16;
> +      break;
>      case DW_FORM_sdata:
>      case DW_FORM_ref_udata:
>      case DW_FORM_udata:
> @@ -2975,6 +2979,9 @@ checksum_die (DSO *dso, dw_cu_ref cu, dw_die_ref top_die, dw_die_ref die)
>  	  die->die_no_multifile = 1;
>  	  ptr += 8;
>  	  break;
> +	case DW_FORM_data16:
> +	  ptr += 16;
> +	  break;

Also, wonder if DW_FORM_ref_sig8 has a separate case from DW_FORM_data8, if
it wouldn't be more readable to stick DW_FORM_data16 case below the
DW_FORM_data8 one.  When both are the same case, that isn't possible of
course.

Otherwise LGTM.

	Jakub


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] Handle DW_FORM_data16.
  2020-09-24 19:43   ` Jakub Jelinek
@ 2020-09-24 21:53     ` Jakub Jelinek
  2020-09-25 16:42     ` Mark Wielaard
  1 sibling, 0 replies; 13+ messages in thread
From: Jakub Jelinek @ 2020-09-24 21:53 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: dwz

On Thu, Sep 24, 2020 at 09:43:33PM +0200, Jakub Jelinek wrote:
> > -	      || (form > DW_FORM_flag_present && form != DW_FORM_ref_sig8))
> > +	      || (form > DW_FORM_flag_present && (form != DW_FORM_ref_sig8
> > +						  || form != DW_FORM_data16)))
> 
> This should have been changed to:
> 	      || (form > DW_FORM_ref_sig8 && form != DW_FORM_data16))
> The point is that the code now handles everything up to DW_FORM_ref_sig8
> inclusive except 2, and then DW_FORM_data16 with gaps in between.

Sorry, I'm wrong, have been after years caught by the issue I used to be
aware of, that DW_FORM_ref_sig8 has been given non-consecutive number 0x20
after previous 0x19 :(.

	Jakub


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] Handle DW_FORM_data16.
  2020-09-24 19:43   ` Jakub Jelinek
  2020-09-24 21:53     ` Jakub Jelinek
@ 2020-09-25 16:42     ` Mark Wielaard
  1 sibling, 0 replies; 13+ messages in thread
From: Mark Wielaard @ 2020-09-25 16:42 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: dwz

Hi,

On Thu, 2020-09-24 at 21:43 +0200, Jakub Jelinek wrote:
> On Thu, Sep 24, 2020 at 06:25:55PM +0200, Mark Wielaard wrote:
> > @@ -1709,6 +1710,9 @@ skip_attr_no_dw_form_indirect (unsigned int
> > cu_version, uint32_t form,
> >      case DW_FORM_ref_sig8:
> >        ptr += 8;
> >        break;
> > +    case DW_FORM_data16:
> > +      ptr += 16;
> > +      break;
> >      case DW_FORM_sdata:
> >      case DW_FORM_ref_udata:
> >      case DW_FORM_udata:
> > @@ -2975,6 +2979,9 @@ checksum_die (DSO *dso, dw_cu_ref cu,
> > dw_die_ref top_die, dw_die_ref die)
> >  	  die->die_no_multifile = 1;
> >  	  ptr += 8;
> >  	  break;
> > +	case DW_FORM_data16:
> > +	  ptr += 16;
> > +	  break;
> 
> Also, wonder if DW_FORM_ref_sig8 has a separate case from DW_FORM_data8, if
> it wouldn't be more readable to stick DW_FORM_data16 case below the
> DW_FORM_data8 one.  When both are the same case, that isn't possible of course.

In the case of skip_attr_no_dw_form_indirect DW_FORM_ref8,
DW_FORM_data8 and DW_FORM_ref_sig8 are the same case (and
DW_FORM_data16 immediately follows that case), but in the case of
checksym_die they are different cases. I moved DW_FORM_data16 right
after DW_FORM_data8 in that case.

> Otherwise LGTM.

Thanks,

Mark

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 3/4] Handle DW_FORM_line_strp by not moving DIE.
  2020-09-24 16:25 Read and write DWARF5 units and forms Mark Wielaard
  2020-09-24 16:25 ` [PATCH 1/4] Calculate size and write correct DWARF5 header Mark Wielaard
  2020-09-24 16:25 ` [PATCH 2/4] Handle DW_FORM_data16 Mark Wielaard
@ 2020-09-24 16:25 ` Mark Wielaard
  2020-09-24 19:45   ` Jakub Jelinek
  2020-09-24 16:25 ` [PATCH 4/4] Handle DW_FORM_implicit_const [experiment] Mark Wielaard
  3 siblings, 1 reply; 13+ messages in thread
From: Mark Wielaard @ 2020-09-24 16:25 UTC (permalink / raw)
  To: dwz; +Cc: Mark Wielaard

DW_FORM_line_strp points into a new string table .debug_line_str.
There is no supplemental .debug_line_str, so DIEs having a line_strp
form attribute cannot be moved. This isn't really a problem because
normally DW_FORM_line_strp is only used for file paths used in top-level
DW_AT_name and DW_AT_comp_dir, which wouldn't be moved anyway.

When we add support for DWARF5 line tables we might revise the decision
to keep the DW_FORM_line_strp as is since it might be advantaguous move
the dir table and file table strings into the supplemental file .debug_str
section.

	* dwz.c (read_abbrev): Accept DW_FORM_line_strp.
	(skip_attr_no_dw_form_indirect): Handle DW_FORM_line_strp.
	(get_AT_string): Return string from DEBUG_LINE_STR for
	DW_FORM_line_strp.
	(checksum_die): Make DIE CK_BAD for DW_FORM_line_strp to prevent
	moving.
	(die_eq_1): Handle DW_FORM_line_strp.
	(mark_refs): Likewise.
	(read_debug_info): Likewise. Don't note string.
	(build_abbrevs_for_die): Handle DW_FORM_line_strp. Keep form as is.
	(write_unit_die): Write out DW_FORM_line_strp data.
	(write_die): Likewise.
---
 dwz.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 66 insertions(+), 4 deletions(-)

diff --git a/dwz.c b/dwz.c
index 9719ee6..8f13298 100644
--- a/dwz.c
+++ b/dwz.c
@@ -1260,8 +1260,10 @@ read_abbrev (DSO *dso, unsigned char *ptr)
 	  nattr++;
 	  form = read_uleb128 (p);
 	  if (form == 2
-	      || (form > DW_FORM_flag_present && (form != DW_FORM_ref_sig8
-						  || form != DW_FORM_data16)))
+	      || (form > DW_FORM_flag_present
+		  && (form != DW_FORM_ref_sig8
+		      || form != DW_FORM_data16
+		      || form != DW_FORM_line_strp)))
 	    {
 	      error (0, 0, "%s: Unknown DWARF %s",
 		     dso->filename, get_DW_FORM_str (form));
@@ -1703,6 +1705,7 @@ skip_attr_no_dw_form_indirect (unsigned int cu_version, uint32_t form,
     case DW_FORM_data4:
     case DW_FORM_sec_offset:
     case DW_FORM_strp:
+    case DW_FORM_line_strp:
       ptr += 4;
       break;
     case DW_FORM_ref8:
@@ -1869,6 +1872,14 @@ get_AT_string (dw_die_ref die, enum dwarf_attribute at)
 	  return NULL;
 	return (char *) debug_sections[DEBUG_STR].data + strp;
       }
+    case DW_FORM_line_strp:
+      {
+	unsigned int line_strp = read_32 (ptr);
+	if (line_strp >= debug_sections[DEBUG_LINE_STR].size)
+	  return NULL;
+	else
+	  return (char *) debug_sections[DEBUG_LINE_STR].data + line_strp;
+      }
     default:
       return NULL;
     }
@@ -3063,6 +3074,14 @@ checksum_die (DSO *dso, dw_cu_ref cu, dw_die_ref top_die, dw_die_ref die)
 		}
 	    }
 	  break;
+	case DW_FORM_line_strp:
+	  /* There is no .debug_line_str in the alt file, so we cannot
+	     move this DIE unless we change the string reference.
+	     This is not that bad because DW_FORM_line_strp is often
+	     only used in the CU DIE for file name and comp_dir and we
+	     don't move the CU DIE anyway.  */
+	  die->die_ck_state = CK_BAD;
+	  break;
 	case DW_FORM_string:
 	  ptr = (unsigned char *) strchr ((char *)ptr, '\0') + 1;
 	  if (only_hash_name_p && t->attr[i].attr == DW_AT_name)
@@ -3395,6 +3414,7 @@ checksum_ref_die (dw_cu_ref cu, dw_die_ref top_die, dw_die_ref die,
 	    case DW_FORM_data4:
 	    case DW_FORM_sec_offset:
 	    case DW_FORM_strp:
+	    case DW_FORM_line_strp:
 	      ptr += 4;
 	      break;
 	    case DW_FORM_data8:
@@ -4264,6 +4284,10 @@ die_eq_1 (dw_cu_ref cu1, dw_cu_ref cu2,
 	  ptr1 += 4;
 	  ptr2 += 4;
 	  break;
+	case DW_FORM_line_strp:
+	  ptr1 += 4;
+	  ptr2 += 4;
+	  break;
 	case DW_FORM_string:
 	  ptr1 = (unsigned char *) strchr ((char *)ptr1, '\0') + 1;
 	  ptr2 = (unsigned char *) strchr ((char *)ptr2, '\0') + 1;
@@ -5303,6 +5327,7 @@ mark_refs (dw_cu_ref cu, dw_die_ref top_die, dw_die_ref die, int mode)
 	    case DW_FORM_data4:
 	    case DW_FORM_sec_offset:
 	    case DW_FORM_strp:
+	    case DW_FORM_line_strp:
 	      ptr += 4;
 	      break;
 	    case DW_FORM_data8:
@@ -6317,6 +6342,17 @@ read_debug_info (DSO *dso, int kind, unsigned int *die_count)
 		  else
 		    ptr += 4;
 		  break;
+		case DW_FORM_line_strp:
+		  if (t->attr[i].attr == DW_AT_name
+		      && (die->die_tag == DW_TAG_namespace
+			  || die->die_tag == DW_TAG_module)
+		      && !die->die_root
+		      && (die->die_parent->die_root
+			  || die->die_parent->die_named_namespace))
+		    die->die_named_namespace = 1;
+		  /* Don't note strp, different string table.  */
+		  ptr += 4;
+		  break;
 		case DW_FORM_string:
 		  ptr = (unsigned char *) strchr ((char *)ptr, '\0') + 1;
 		  if (t->attr[i].attr == DW_AT_name
@@ -9983,6 +10019,11 @@ build_abbrevs_for_die (htab_t h, dw_cu_ref cu, dw_die_ref die,
 		  else
 		    ptr += 4;
 		  break;
+		case DW_FORM_line_strp:
+		  /* Since we don't register the line_strp we cannot
+		     change the form in the case of multifile.  */
+		  ptr += 4;
+		  break;
 		case DW_FORM_string:
 		  ptr = (unsigned char *) strchr ((char *)ptr, '\0') + 1;
 		  break;
@@ -10128,13 +10169,17 @@ build_abbrevs_for_die (htab_t h, dw_cu_ref cu, dw_die_ref die,
 	  {
 	    enum dwarf_form form;
 	    unsigned char *ptr = get_AT (origin, DW_AT_comp_dir, &form);
-	    assert (ptr && (form == DW_FORM_string || form == DW_FORM_strp));
+	    assert (ptr && (form == DW_FORM_string
+			    || form == DW_FORM_strp
+			    || form == DW_FORM_line_strp));
 	    if (form == DW_FORM_strp)
 	      {
 		if (unlikely (op_multifile || fi_multifile))
 		  form = note_strp_offset2 (read_32 (ptr));
 		die->die_size += 4;
 	      }
+	    else if (form == DW_FORM_line_strp)
+	      die->die_size += 4;
 	    else
 	      die->die_size
 		+= strlen (refcu->cu_comp_dir) + 1;
@@ -10148,13 +10193,17 @@ build_abbrevs_for_die (htab_t h, dw_cu_ref cu, dw_die_ref die,
 	{
 	  enum dwarf_form form;
 	  unsigned char *ptr = get_AT (origin, DW_AT_name, &form);
-	  assert (ptr && (form == DW_FORM_string || form == DW_FORM_strp));
+	  assert (ptr && (form == DW_FORM_string
+			  || form == DW_FORM_strp
+			  || form == DW_FORM_line_strp));
 	  if (form == DW_FORM_strp)
 	    {
 	      if (unlikely (op_multifile || fi_multifile))
 		form = note_strp_offset2 (read_32 (ptr));
 	      die->die_size = 4;
 	    }
+	  else if (form == DW_FORM_line_strp)
+	    die->die_size += 4;
 	  else
 	    die->die_size = strlen ((char *) ptr) + 1;
 	  t->attr[0].attr = DW_AT_name;
@@ -11254,6 +11303,11 @@ write_unit_die (unsigned char *ptr, dw_die_ref die, dw_die_ref origin)
 		    ptr += 4;
 		  }
 	      }
+	    else if (form == DW_FORM_line_strp)
+	      {
+		memcpy (ptr, p, 4);
+		ptr += 4;
+	      }
 	    else
 	      {
 		size_t len = strlen ((char *) p) + 1;
@@ -11523,6 +11577,9 @@ write_die (unsigned char *ptr, dw_cu_ref cu, dw_die_ref die,
 		}
 	      inptr += 4;
 	      break;
+	    case DW_FORM_line_strp:
+	      inptr += 4;
+	      break;
 	    case DW_FORM_string:
 	      inptr = (unsigned char *) strchr ((char *)inptr, '\0') + 1;
 	      break;
@@ -11703,6 +11760,11 @@ write_die (unsigned char *ptr, dw_cu_ref cu, dw_die_ref die,
 		  ptr += 4;
 		}
 	    }
+	  else if (form == DW_FORM_line_strp)
+	    {
+	      memcpy (ptr, p, 4);
+	      ptr += 4;
+	    }
 	  else
 	    {
 	      size_t len = strlen ((char *) p) + 1;
-- 
2.18.4


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] Handle DW_FORM_line_strp by not moving DIE.
  2020-09-24 16:25 ` [PATCH 3/4] Handle DW_FORM_line_strp by not moving DIE Mark Wielaard
@ 2020-09-24 19:45   ` Jakub Jelinek
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Jelinek @ 2020-09-24 19:45 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: dwz

On Thu, Sep 24, 2020 at 06:25:56PM +0200, Mark Wielaard wrote:
> DW_FORM_line_strp points into a new string table .debug_line_str.
> There is no supplemental .debug_line_str, so DIEs having a line_strp
> form attribute cannot be moved. This isn't really a problem because
> normally DW_FORM_line_strp is only used for file paths used in top-level
> DW_AT_name and DW_AT_comp_dir, which wouldn't be moved anyway.
> 
> When we add support for DWARF5 line tables we might revise the decision
> to keep the DW_FORM_line_strp as is since it might be advantaguous move
> the dir table and file table strings into the supplemental file .debug_str
> section.
> 
> 	* dwz.c (read_abbrev): Accept DW_FORM_line_strp.
> 	(skip_attr_no_dw_form_indirect): Handle DW_FORM_line_strp.
> 	(get_AT_string): Return string from DEBUG_LINE_STR for
> 	DW_FORM_line_strp.
> 	(checksum_die): Make DIE CK_BAD for DW_FORM_line_strp to prevent
> 	moving.
> 	(die_eq_1): Handle DW_FORM_line_strp.
> 	(mark_refs): Likewise.
> 	(read_debug_info): Likewise. Don't note string.
> 	(build_abbrevs_for_die): Handle DW_FORM_line_strp. Keep form as is.
> 	(write_unit_die): Write out DW_FORM_line_strp data.
> 	(write_die): Likewise.

LGTM.

	Jakub


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 4/4] Handle DW_FORM_implicit_const [experiment].
  2020-09-24 16:25 Read and write DWARF5 units and forms Mark Wielaard
                   ` (2 preceding siblings ...)
  2020-09-24 16:25 ` [PATCH 3/4] Handle DW_FORM_line_strp by not moving DIE Mark Wielaard
@ 2020-09-24 16:25 ` Mark Wielaard
  2020-09-24 19:57   ` Jakub Jelinek
  3 siblings, 1 reply; 13+ messages in thread
From: Mark Wielaard @ 2020-09-24 16:25 UTC (permalink / raw)
  To: dwz; +Cc: Mark Wielaard

This handles DW_FORM_implicit_const. This form keeps the actual data
value in the abbrev. This makes things tricky because we have to decide
where to keep (a reference to) the value and when to update the value
if necessary. In particular DW_FORM_implicit_const might be used with
a line table index for the DW_AT_decl_file or DW_AT_call_file which
might be updated in multifile mode.

This implementation keeps a value with each attribute (which is highly
inefficient, we could allocate values at the end of the struct abbrev_attr
only when there is an DW_FORM_implicit_const attribute). And updates the
value late in compute_abbrevs when writing the final multifile.

	* dwz.c (struct abbrev_attr): Add value field.
	(abbrev_eq2): Handle DW_FORM_implicit_const by also comparing
	the value.
	(compute_abbrev_hash): Likewise.
	(read_abbrev): Read DW_FORM_implicit_const.
	(skip_attr_no_dw_form_indirect): Skip DW_FORM_implicit_const.
	(get_AT): Return pointer to value for DW_FORM_implicit_const.
	(get_AT_int): Return value of DW_FORM_implicit_const.
	(checksum_die): Get value for DW_AT_decl_file or DW_AT_call_file
	from DW_FORM_implicit_const, skip for others (should the value
	then be in the u.p1.die_hash instead?).
	(checksum_ref_die): Likewise.
	(die_eq_1): Likewise.
	(mark_refs): Handle DW_FORM_implicit_const.
	(read_debug_info): Handle DW_FORM_implicit_const, set cu_lang
	for compile/partial units.
	(build_abbrevs_for_die): Handle DW_FORM_implicit_const value.
	Don't update value for DW_AT_decl_file or DW_AT_call_file here.
	(abbrev_cmp): Also compare DW_FORM_implicit_const values.
	(compute_abbrevs): Update DW_AT_decl_file and DW_AT_call_file
	DW_FORM_implicit_const values here.
	(write_abbrev): Also write DW_FORM_implicit_const value.
	(write_die): Handle DW_FORM_implicit_const, but don't change
	form/value for DW_AT_decl_file and DW_AT_call_file.
---
 dwz.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 118 insertions(+), 27 deletions(-)

diff --git a/dwz.c b/dwz.c
index 8f13298..f020d15 100644
--- a/dwz.c
+++ b/dwz.c
@@ -848,6 +848,8 @@ struct abbrev_attr
   unsigned int attr;
   /* DW_FORM_* form code.  */
   unsigned int form;
+  /* Only for DW_FORM_implicit_const.  */
+  int64_t value;
 };
 
 /* Internal structure for .debug_abbrev entries.  */
@@ -1211,7 +1213,9 @@ abbrev_eq2 (const void *p, const void *q)
     return 0;
   for (i = 0; i < t1->nattr; i++)
     if (t1->attr[i].attr != t2->attr[i].attr
-	|| t1->attr[i].form != t2->attr[i].form)
+	|| t1->attr[i].form != t2->attr[i].form
+	|| (t1->attr[i].form == DW_FORM_implicit_const
+	    && t1->attr[i].value != t2->attr[i].value))
       return 0;
   return 1;
 }
@@ -1229,6 +1233,8 @@ compute_abbrev_hash (struct abbrev_tag *t)
     {
       t->hash = iterative_hash_object (t->attr[i].attr, t->hash);
       t->hash = iterative_hash_object (t->attr[i].form, t->hash);
+      if (t->attr[i].form == DW_FORM_implicit_const)
+	t->hash = iterative_hash_object (t->attr[i].value, t->hash);
     }
 }
 
@@ -1259,7 +1265,9 @@ read_abbrev (DSO *dso, unsigned char *ptr)
 	{
 	  nattr++;
 	  form = read_uleb128 (p);
-	  if (form == 2
+	  if (form == DW_FORM_implicit_const)
+	    skip_leb128 (p);
+	  else if (form == 2
 	      || (form > DW_FORM_flag_present
 		  && (form != DW_FORM_ref_sig8
 		      || form != DW_FORM_data16
@@ -1291,6 +1299,8 @@ read_abbrev (DSO *dso, unsigned char *ptr)
       while ((attr = read_uleb128 (ptr)) != 0)
 	{
 	  form = read_uleb128 (ptr);
+	  if (form == DW_FORM_implicit_const)
+	    t->attr[t->nattr].value = read_sleb128 (ptr);
 	  t->attr[t->nattr].attr = attr;
 	  t->attr[t->nattr++].form = form;
 	}
@@ -1691,6 +1701,7 @@ skip_attr_no_dw_form_indirect (unsigned int cu_version, uint32_t form,
       ptr += ptr_size;
       break;
     case DW_FORM_flag_present:
+    case DW_FORM_implicit_const:
       break;
     case DW_FORM_ref1:
     case DW_FORM_flag:
@@ -1791,6 +1802,8 @@ get_AT (dw_die_ref die, enum dwarf_attribute at, enum dwarf_form *formp)
       if (t->attr[i].attr == at)
 	{
 	  *formp = form;
+	  if (form == DW_FORM_implicit_const)
+	    return (unsigned char *) &t->attr[i].value;
 	  return ptr;
 	}
 
@@ -1839,6 +1852,8 @@ get_AT_int (dw_die_ref die, enum dwarf_attribute at, bool *present,
     case DW_FORM_ref_udata:
     case DW_FORM_udata:
       return read_uleb128 (ptr);
+    case DW_FORM_implicit_const:
+      return *(uint64_t *)ptr; /* See get_AT.  */
     default:
       *present = false;
       return 0;
@@ -2837,6 +2852,8 @@ checksum_die (DSO *dso, dw_cu_ref cu, dw_die_ref top_die, dw_die_ref die)
 	    case DW_FORM_data8: value = read_64 (ptr); handled = true; break;
 	    case DW_FORM_udata:
 	      value = read_uleb128 (ptr); handled = true; break;
+	    case DW_FORM_implicit_const:
+	      value = t->attr[i].value; handled = true; break;
 	    default:
 	      error (0, 0, "%s: Unhandled %s for %s",
 		     dso->filename, get_DW_FORM_str (form),
@@ -2971,6 +2988,7 @@ checksum_die (DSO *dso, dw_cu_ref cu, dw_die_ref top_die, dw_die_ref die)
 	  ptr += ptr_size;
 	  break;
 	case DW_FORM_flag_present:
+	case DW_FORM_implicit_const:
 	  break;
 	case DW_FORM_flag:
 	case DW_FORM_data1:
@@ -3403,6 +3421,7 @@ checksum_ref_die (dw_cu_ref cu, dw_die_ref top_die, dw_die_ref die,
 	      ptr += ptr_size;
 	      break;
 	    case DW_FORM_flag_present:
+	    case DW_FORM_implicit_const:
 	      break;
 	    case DW_FORM_flag:
 	    case DW_FORM_data1:
@@ -4095,6 +4114,7 @@ die_eq_1 (dw_cu_ref cu1, dw_cu_ref cu2,
 	    case DW_FORM_data4: value1 = read_32 (ptr1); break;
 	    case DW_FORM_data8: value1 = read_64 (ptr1); break;
 	    case DW_FORM_udata: value1 = read_uleb128 (ptr1); break;
+	    case DW_FORM_implicit_const: value1 = t1->attr[i].value; break;
 	    default: abort ();
 	    }
 	  switch (form2)
@@ -4104,6 +4124,7 @@ die_eq_1 (dw_cu_ref cu1, dw_cu_ref cu2,
 	    case DW_FORM_data4: value2 = read_32 (ptr2); break;
 	    case DW_FORM_data8: value2 = read_64 (ptr2); break;
 	    case DW_FORM_udata: value2 = read_uleb128 (ptr2); break;
+	    case DW_FORM_implicit_const: value2 = t2->attr[j].value; break;
 	    default: abort ();
 	    }
 	  if (ignore_locus)
@@ -4224,6 +4245,7 @@ die_eq_1 (dw_cu_ref cu1, dw_cu_ref cu2,
 	  ptr2 += ptr_size;
 	  break;
 	case DW_FORM_flag_present:
+	case DW_FORM_implicit_const:
 	  break;
 	case DW_FORM_flag:
 	case DW_FORM_data1:
@@ -5316,6 +5338,7 @@ mark_refs (dw_cu_ref cu, dw_die_ref top_die, dw_die_ref die, int mode)
 	      ptr += ptr_size;
 	      break;
 	    case DW_FORM_flag_present:
+	    case DW_FORM_implicit_const:
 	      break;
 	    case DW_FORM_flag:
 	    case DW_FORM_data1:
@@ -6269,6 +6292,13 @@ read_debug_info (DSO *dso, int kind, unsigned int *die_count)
 		  break;
 		case DW_FORM_flag_present:
 		  break;
+		case DW_FORM_implicit_const:
+		  if (lang_p
+		      && (die->die_tag == DW_TAG_compile_unit
+			  || die->die_tag == DW_TAG_partial_unit)
+		      && t->attr[i].attr == DW_AT_language)
+		    cu->lang = t->attr[i].value;
+		  break;
 		case DW_FORM_data1:
 		  if (lang_p
 		      && (die->die_tag == DW_TAG_compile_unit
@@ -9785,6 +9815,8 @@ build_abbrevs_for_die (htab_t h, dw_cu_ref cu, dw_die_ref die,
 	    {
 	      t->attr[i].attr = reft->attr[i].attr;
 	      t->attr[i].form = reft->attr[i].form;
+	      if (t->attr[i].form == DW_FORM_implicit_const)
+		t->attr[i].value = reft->attr[i].value;
 	    }
 	  t->nattr = reft->nattr;
 	}
@@ -9814,28 +9846,36 @@ build_abbrevs_for_die (htab_t h, dw_cu_ref cu, dw_die_ref die,
 		    case DW_FORM_data4: value = read_32 (ptr); break;
 		    case DW_FORM_data8: value = read_64 (ptr); break;
 		    case DW_FORM_udata: value = read_uleb128 (ptr); break;
+		    case DW_FORM_implicit_const: break;
 		    default:
 		      error (0, 0, "Unhandled %s for %s",
 			     get_DW_FORM_str (form),
 			     get_DW_AT_str (reft->attr[i].attr));
 		      return 1;
 		    }
-		  value = line_htab_lookup (refcu, value);
-		  if (value <= 0xff)
-		    {
-		      form = DW_FORM_data1;
-		      die->die_size++;
-		    }
-		  else if (value <= 0xffff)
+		  /* Note that we transform DW_FORM_implicit_const in
+		     compute_abbrev ().  */
+		  if (form != DW_FORM_implicit_const)
 		    {
-		      form = DW_FORM_data2;
-		      die->die_size += 2;
+		      value = line_htab_lookup (refcu, value);
+		      if (value <= 0xff)
+			{
+			  form = DW_FORM_data1;
+			  die->die_size++;
+			}
+		      else if (value <= 0xffff)
+			{
+			  form = DW_FORM_data2;
+			  die->die_size += 2;
+			}
+		      else
+			{
+			  form = DW_FORM_data4;
+			  die->die_size += 4;
+			}
 		    }
 		  else
-		    {
-		      form = DW_FORM_data4;
-		      die->die_size += 4;
-		    }
+		    t->attr[j].value = reft->attr[i].value;
 		  t->attr[j].attr = reft->attr[i].attr;
 		  t->attr[j++].form = form;
 		  continue;
@@ -9941,6 +9981,7 @@ build_abbrevs_for_die (htab_t h, dw_cu_ref cu, dw_die_ref die,
 		    }
 		  break;
 		case DW_FORM_flag_present:
+		case DW_FORM_implicit_const:
 		  break;
 		case DW_FORM_flag:
 		case DW_FORM_data1:
@@ -10123,7 +10164,10 @@ build_abbrevs_for_die (htab_t h, dw_cu_ref cu, dw_die_ref die,
 	      if (form == DW_FORM_block1)
 		ptr += len;
 	      t->attr[j].attr = reft->attr[i].attr;
-	      t->attr[j++].form = reft->attr[i].form;
+	      t->attr[j].form = reft->attr[i].form;
+	      if (reft->attr[i].form == DW_FORM_implicit_const)
+		t->attr[j].value = reft->attr[i].value;
+	      j++;
 	      die->die_size += ptr - orig_ptr;
 	    }
 	  t->nattr = j;
@@ -10348,6 +10392,13 @@ abbrev_cmp (const void *p, const void *q)
 	return -1;
       if (t1->attr[i].form > t2->attr[i].form)
 	return 1;
+      if (t1->attr[i].form == DW_FORM_implicit_const)
+	{
+	  if (t1->attr[i].value < t2->attr[i].value)
+	    return -1;
+	  if (t1->attr[i].value > t2->attr[i].value)
+	    return 1;
+	}
     }
   return 0;
 }
@@ -10947,6 +10998,21 @@ compute_abbrevs (DSO *dso)
 	    {
 	      abbrev_size += size_of_uleb128 (arr[i]->attr[j].attr);
 	      abbrev_size += size_of_uleb128 (arr[i]->attr[j].form);
+	      if (arr[i]->attr[j].form == DW_FORM_implicit_const)
+		{
+		  /* If this is a shared abbrev for a file reference
+		     attribute, update to the new file number (in the
+		     mulifile .debug_line).  Note that this might
+		     change the abbrev size...  */
+		  if (unlikely (wr_multifile)
+		      && (arr[i]->attr[j].attr == DW_AT_decl_file
+			  || arr[i]->attr[j].attr == DW_AT_call_file))
+		    {
+		      uint64_t value = arr[i]->attr[j].value;
+		      arr[i]->attr[j].value = line_htab_lookup (cu, value);
+		    }
+		  abbrev_size += size_of_uleb128 (arr[i]->attr[j].value);
+		}
 	    }
 	  abbrev_size += 2;
 	}
@@ -11037,6 +11103,8 @@ write_abbrev (void)
 	    {
 	      write_uleb128 (ptr, arr[i]->attr[j].attr);
 	      write_uleb128 (ptr, arr[i]->attr[j].form);
+	      if (arr[i]->attr[j].form == DW_FORM_implicit_const)
+		write_uleb128 (ptr, arr[i]->attr[j].value);
 	    }
 	  *ptr++ = 0;
 	  *ptr++ = 0;
@@ -11398,22 +11466,44 @@ write_die (unsigned char *ptr, dw_cu_ref cu, dw_die_ref die,
 	      && (reft->attr[i].attr == DW_AT_decl_file
 		  || reft->attr[i].attr == DW_AT_call_file))
 	    {
+	      bool update = false;
 	      switch (form)
 		{
-		case DW_FORM_data1: value = read_8 (inptr); break;
-		case DW_FORM_data2: value = read_16 (inptr); break;
-		case DW_FORM_data4: value = read_32 (inptr); break;
-		case DW_FORM_data8: value = read_64 (inptr); break;
-		case DW_FORM_udata: value = read_uleb128 (inptr); break;
+		case DW_FORM_data1:
+		  value = read_8 (inptr);
+		  update = true;
+		  break;
+		case DW_FORM_data2:
+		  value = read_16 (inptr);
+		  update = true;
+		  break;
+		case DW_FORM_data4:
+		  value = read_32 (inptr);
+		  update = true;
+		  break;
+		case DW_FORM_data8:
+		  value = read_64 (inptr);
+		  update = true;
+		  break;
+		case DW_FORM_udata:
+		  value = read_uleb128 (inptr);
+		  update = true;
+		  break;
+		case DW_FORM_implicit_const:
+		  /* This gets updated in compute_abbrevs.  */
+		  break;
 		default: abort ();
 		}
-	      value = line_htab_lookup (refcu, value);
-	      switch (t->attr[j].form)
+	      if (update)
 		{
-		case DW_FORM_data1: write_8 (ptr, value); break;
-		case DW_FORM_data2: write_16 (ptr, value); break;
-		case DW_FORM_data4: write_32 (ptr, value); break;
-		default: abort ();
+		  value = line_htab_lookup (refcu, value);
+		  switch (t->attr[j].form)
+		    {
+		    case DW_FORM_data1: write_8 (ptr, value); break;
+		    case DW_FORM_data2: write_16 (ptr, value); break;
+		    case DW_FORM_data4: write_32 (ptr, value); break;
+		    default: abort ();
+		    }
 		}
 	      j++;
 	      continue;
@@ -11505,6 +11595,7 @@ write_die (unsigned char *ptr, dw_cu_ref cu, dw_die_ref die,
 		}
 	      break;
 	    case DW_FORM_flag_present:
+	    case DW_FORM_implicit_const:
 	      break;
 	    case DW_FORM_flag:
 	    case DW_FORM_data1:
-- 
2.18.4


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] Handle DW_FORM_implicit_const [experiment].
  2020-09-24 16:25 ` [PATCH 4/4] Handle DW_FORM_implicit_const [experiment] Mark Wielaard
@ 2020-09-24 19:57   ` Jakub Jelinek
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Jelinek @ 2020-09-24 19:57 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: dwz

On Thu, Sep 24, 2020 at 06:25:57PM +0200, Mark Wielaard wrote:
> --- a/dwz.c
> +++ b/dwz.c
> @@ -848,6 +848,8 @@ struct abbrev_attr
>    unsigned int attr;
>    /* DW_FORM_* form code.  */
>    unsigned int form;
> +  /* Only for DW_FORM_implicit_const.  */
> +  int64_t value;
>  };

Indeed, this is the most expensive choice memory-wise, but I guess it might
be interesting to take some very large dwz run (e.g. the webkits) and
measure how many abbrev_attr structs we have actually allocated at peak time
and whether it just isn't acceptable.  It is true we'd waste all that memory
e.g. when processing DWARF4 and never actually use it in that case.
On the other side, it is the cleanest and fastest approach.

Another possibility is to add int64_t * field to struct abbrev_tag and
if the abbreviation contains at least one DW_FORM_implicit_const, then
allocate tail payload of int64_ts (either nattr of them, or perhaps optimize
and allocate only index of highest DW_FORM_implicit_const + 1 ones,
or optimize even more and allocate index of highest DW_FORM_implicit_const
- index of lowest DW_FORM_implicit_const + 1 times int64_t and point the
pointer earlier such that t1->value[i] would be the the value for each
DW_FORM_implicit_const).  Or even optimize memory even more and just
allocate as many int64_ts as there are DW_FORM_implicit_const forms.
Except that then all the loops that iterate from 0 to nattr looking through
the attributes would need to track how many DW_FORM_implicit_const were seen
(or there would need to be another loop to count that afterwards).

	Jakub


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-09-25 16:42 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-24 16:25 Read and write DWARF5 units and forms Mark Wielaard
2020-09-24 16:25 ` [PATCH 1/4] Calculate size and write correct DWARF5 header Mark Wielaard
2020-09-24 19:39   ` Jakub Jelinek
2020-09-25 16:35     ` Mark Wielaard
2020-09-25 16:39       ` Jakub Jelinek
2020-09-24 16:25 ` [PATCH 2/4] Handle DW_FORM_data16 Mark Wielaard
2020-09-24 19:43   ` Jakub Jelinek
2020-09-24 21:53     ` Jakub Jelinek
2020-09-25 16:42     ` Mark Wielaard
2020-09-24 16:25 ` [PATCH 3/4] Handle DW_FORM_line_strp by not moving DIE Mark Wielaard
2020-09-24 19:45   ` Jakub Jelinek
2020-09-24 16:25 ` [PATCH 4/4] Handle DW_FORM_implicit_const [experiment] Mark Wielaard
2020-09-24 19:57   ` Jakub Jelinek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).