public inbox for elfutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 2/2] libdw: Reduce size of struct Dwarf_Abbrev.
  2017-12-26 19:38 Simpler abbrev parsing using less memory Mark Wielaard
  2017-12-26 19:38 ` [PATCH 1/2] libdw: New get_uleb128_unchecked to use with already checked Dwarf_Abbrev Mark Wielaard
@ 2017-12-26 19:38 ` Mark Wielaard
  2018-01-01 22:59 ` Simpler abbrev parsing using less memory Mark Wielaard
  2 siblings, 0 replies; 4+ messages in thread
From: Mark Wielaard @ 2017-12-26 19:38 UTC (permalink / raw)
  To: elfutils-devel; +Cc: Mark Wielaard

If we don't cache the attrcnt and use bitfields for the has_children and
code we can reduce the size of struct Dwarf Abbrev from 32 to 24 bytes on
64bit architectures and from 28 to 20 bytes on 32bit arches.

Signed-off-by: Mark Wielaard <mark@klomp.org>
---
 libdw/ChangeLog          |  7 +++++++
 libdw/dwarf_getabbrev.c  |  7 +++----
 libdw/dwarf_getattrcnt.c | 19 +++++++++++++++++--
 libdw/libdwP.h           | 15 +++++++--------
 4 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/libdw/ChangeLog b/libdw/ChangeLog
index 3e3b7169..c284dbbd 100644
--- a/libdw/ChangeLog
+++ b/libdw/ChangeLog
@@ -1,3 +1,10 @@
+2017-12-26  Mark Wielaard  <mark@klomp.org>
+
+	* libdwP.h (struct Dwarf_Abbrev): Pack struct. Remove attrcnt,
+	use bitfields for has_children and code.
+	* dwarf_getabbrev.c (__libdw_getabbrev): Don't count attrs.
+	* dwarf_getattrcnt.c (dwarf_getattrcnt): Count attrs.
+
 2017-12-26  Mark Wielaard  <mark@klomp.org>
 
 	* memory-access.h (__libdw_get_uleb128_unchecked): New function.
diff --git a/libdw/dwarf_getabbrev.c b/libdw/dwarf_getabbrev.c
index ef51b847..a3a68b37 100644
--- a/libdw/dwarf_getabbrev.c
+++ b/libdw/dwarf_getabbrev.c
@@ -1,5 +1,5 @@
 /* Get abbreviation at given offset.
-   Copyright (C) 2003, 2004, 2005, 2006, 2014 Red Hat, Inc.
+   Copyright (C) 2003, 2004, 2005, 2006, 2014, 2017 Red Hat, Inc.
    This file is part of elfutils.
    Written by Ulrich Drepper <drepper@redhat.com>, 2003.
 
@@ -121,8 +121,7 @@ __libdw_getabbrev (Dwarf *dbg, struct Dwarf_CU *cu, Dwarf_Off offset,
   abb->attrp = (unsigned char *) abbrevp;
   abb->offset = offset;
 
-  /* Skip over all the attributes and count them while doing so.  */
-  abb->attrcnt = 0;
+  /* Skip over all the attributes and check rest of the abbrev is valid.  */
   unsigned int attrname;
   unsigned int attrform;
   do
@@ -134,7 +133,7 @@ __libdw_getabbrev (Dwarf *dbg, struct Dwarf_CU *cu, Dwarf_Off offset,
 	goto invalid;
       get_uleb128 (attrform, abbrevp, end);
     }
-  while (attrname != 0 && attrform != 0 && ++abb->attrcnt);
+  while (attrname != 0 && attrform != 0);
 
   /* Return the length to the caller if she asked for it.  */
   if (lengthp != NULL)
diff --git a/libdw/dwarf_getattrcnt.c b/libdw/dwarf_getattrcnt.c
index 2bfb4ac5..a05976d4 100644
--- a/libdw/dwarf_getattrcnt.c
+++ b/libdw/dwarf_getattrcnt.c
@@ -1,5 +1,5 @@
 /* Get number of attributes of abbreviation.
-   Copyright (C) 2003, 2004 Red Hat, Inc.
+   Copyright (C) 2003, 2004, 2017 Red Hat, Inc.
    This file is part of elfutils.
    Written by Ulrich Drepper <drepper@redhat.com>, 2003.
 
@@ -40,7 +40,22 @@ dwarf_getattrcnt (Dwarf_Abbrev *abbrev, size_t *attrcntp)
   if (abbrev == NULL)
     return -1;
 
-  *attrcntp = abbrev->attrcnt;
+  const unsigned char *abbrevp = abbrev->attrp;
+
+  /* Skip over all the attributes and count them while doing so.  */
+  int attrcnt = 0;
+  unsigned int attrname;
+  unsigned int attrform;
+  do
+    {
+      /* We can use unchecked since they were checked when the Dwrf_Abbrev
+	 was created.  */
+      get_uleb128_unchecked (attrname, abbrevp);
+      get_uleb128_unchecked (attrform, abbrevp);
+    }
+  while (attrname != 0 && attrform != 0 && ++attrcnt);
+
+  *attrcntp = attrcnt;
 
   return 0;
 }
diff --git a/libdw/libdwP.h b/libdw/libdwP.h
index 82b47d09..fbb8ed08 100644
--- a/libdw/libdwP.h
+++ b/libdw/libdwP.h
@@ -1,5 +1,5 @@
 /* Internal definitions for libdwarf.
-   Copyright (C) 2002-2011, 2013-2016 Red Hat, Inc.
+   Copyright (C) 2002-2011, 2013-2017 Red Hat, Inc.
    This file is part of elfutils.
    Written by Ulrich Drepper <drepper@redhat.com>, 2002.
 
@@ -212,13 +212,12 @@ struct Dwarf
 /* Abbreviation representation.  */
 struct Dwarf_Abbrev
 {
-  Dwarf_Off offset;
-  unsigned char *attrp;
-  unsigned int attrcnt;
-  unsigned int code;
-  unsigned int tag;
-  bool has_children;
-};
+  Dwarf_Off offset;	  /* Offset to start of abbrev into .debug_abbrev.  */
+  unsigned char *attrp;   /* Pointer to start of attribute name/form pairs. */
+  bool has_children : 1;  /* Whether or not the DIE has children. */
+  unsigned int code : 31; /* The (unique) abbrev code.  */
+  unsigned int tag;	  /* The tag of the DIE. */
+} attribute_packed;
 
 #include "dwarf_abbrev_hash.h"
 
-- 
2.14.3

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/2] libdw: New get_uleb128_unchecked to use with already checked Dwarf_Abbrev.
  2017-12-26 19:38 Simpler abbrev parsing using less memory Mark Wielaard
@ 2017-12-26 19:38 ` Mark Wielaard
  2017-12-26 19:38 ` [PATCH 2/2] libdw: Reduce size of struct Dwarf_Abbrev Mark Wielaard
  2018-01-01 22:59 ` Simpler abbrev parsing using less memory Mark Wielaard
  2 siblings, 0 replies; 4+ messages in thread
From: Mark Wielaard @ 2017-12-26 19:38 UTC (permalink / raw)
  To: elfutils-devel; +Cc: Mark Wielaard

When creating a Dwarf_Abbrev in dwarf_getabbrev (__libdw_getabbrev) we
already check it is fully readable from the .debug_abbrev section. So
whenever we reread it later using the attrp pointer we don't have to
check it again. Introduce get_uleb128_unchecked to use for ulebs we
know are safe to read directly.

Signed-off-by: Mark Wielaard <mark@klomp.org>
---
 libdw/ChangeLog             | 10 ++++++++++
 libdw/dwarf_child.c         | 19 +++++--------------
 libdw/dwarf_getabbrevattr.c | 10 +++++-----
 libdw/dwarf_getattrs.c      | 20 +++++---------------
 libdw/dwarf_hasattr.c       | 22 +++++-----------------
 libdw/memory-access.h       | 18 ++++++++++++++++++
 6 files changed, 48 insertions(+), 51 deletions(-)

diff --git a/libdw/ChangeLog b/libdw/ChangeLog
index eb1cb709..3e3b7169 100644
--- a/libdw/ChangeLog
+++ b/libdw/ChangeLog
@@ -1,3 +1,13 @@
+2017-12-26  Mark Wielaard  <mark@klomp.org>
+
+	* memory-access.h (__libdw_get_uleb128_unchecked): New function.
+	(get_uleb128_unchecked): New define.
+	* dwarf_child.c (__libdw_find_attr): Use get_uleb128_unchecked to
+	read attr name and form.
+	* dwarf_getabbrevattr.c (dwarf_getabbrevattr): Likewise.
+	* dwarf_getattrs.c (dwarf_getattrs): Likewise.
+	* dwarf_hasattr.c (dwarf_hasattr): Likewise.
+
 2017-05-09  Ulf Hermann  <ulf.hermann@qt.io>
 	    Mark Wielaard  <mark@klomp.org>
 
diff --git a/libdw/dwarf_child.c b/libdw/dwarf_child.c
index cc95fb3f..248338eb 100644
--- a/libdw/dwarf_child.c
+++ b/libdw/dwarf_child.c
@@ -1,5 +1,5 @@
 /* Return child of current DIE.
-   Copyright (C) 2003-2011, 2014 Red Hat, Inc.
+   Copyright (C) 2003-2011, 2014, 2017 Red Hat, Inc.
    This file is part of elfutils.
    Written by Ulrich Drepper <drepper@redhat.com>, 2003.
 
@@ -43,36 +43,27 @@ internal_function
 __libdw_find_attr (Dwarf_Die *die, unsigned int search_name,
 		   unsigned int *codep, unsigned int *formp)
 {
-  Dwarf *dbg = die->cu->dbg;
   const unsigned char *readp;
 
   /* Find the abbreviation entry.  */
   Dwarf_Abbrev *abbrevp = __libdw_dieabbrev (die, &readp);
   if (unlikely (abbrevp == DWARF_END_ABBREV))
     {
-    invalid_dwarf:
       __libdw_seterrno (DWARF_E_INVALID_DWARF);
       return NULL;
     }
 
-  /* Search the name attribute.  */
-  unsigned char *const endp
-    = ((unsigned char *) dbg->sectiondata[IDX_debug_abbrev]->d_buf
-       + dbg->sectiondata[IDX_debug_abbrev]->d_size);
-
+  /* Search the name attribute.  Attribute has been checked when
+     Dwarf_Abbrev was created, we can read unchecked.  */
   const unsigned char *attrp = abbrevp->attrp;
   while (1)
     {
       /* Get attribute name and form.  */
-      if (unlikely (attrp >= endp))
-	goto invalid_dwarf;
       unsigned int attr_name;
-      get_uleb128 (attr_name, attrp, endp);
+      get_uleb128_unchecked (attr_name, attrp);
 
-      if (unlikely (attrp >= endp))
-	goto invalid_dwarf;
       unsigned int attr_form;
-      get_uleb128 (attr_form, attrp, endp);
+      get_uleb128_unchecked (attr_form, attrp);
 
       /* We can stop if we found the attribute with value zero.  */
       if (attr_name == 0 && attr_form == 0)
diff --git a/libdw/dwarf_getabbrevattr.c b/libdw/dwarf_getabbrevattr.c
index 3b4da99c..57fe3637 100644
--- a/libdw/dwarf_getabbrevattr.c
+++ b/libdw/dwarf_getabbrevattr.c
@@ -1,5 +1,5 @@
 /* Get specific attribute of abbreviation.
-   Copyright (C) 2003, 2004, 2005, 2014 Red Hat, Inc.
+   Copyright (C) 2003, 2004, 2005, 2014, 2017 Red Hat, Inc.
    This file is part of elfutils.
    Written by Ulrich Drepper <drepper@redhat.com>, 2003.
 
@@ -53,10 +53,10 @@ dwarf_getabbrevattr (Dwarf_Abbrev *abbrev, size_t idx, unsigned int *namep,
     {
       start_attrp = attrp;
 
-      /* Attribute code and form are encoded as ULEB128 values.i
-         XXX We have no way to bounds check.  */
-      get_uleb128 (name, attrp, attrp + len_leb128 (name));
-      get_uleb128 (form, attrp, attrp + len_leb128 (form));
+      /* Attribute code and form are encoded as ULEB128 values.
+         Already checked when Dwarf_Abbrev was created, read unchecked.  */
+      get_uleb128_unchecked (name, attrp);
+      get_uleb128_unchecked (form, attrp);
 
       /* If both values are zero the index is out of range.  */
       if (name == 0 && form == 0)
diff --git a/libdw/dwarf_getattrs.c b/libdw/dwarf_getattrs.c
index 0da8b5ba..7f55fafc 100644
--- a/libdw/dwarf_getattrs.c
+++ b/libdw/dwarf_getattrs.c
@@ -1,5 +1,5 @@
 /* Get attributes of the DIE.
-   Copyright (C) 2004, 2005, 2008, 2009, 2014 Red Hat, Inc.
+   Copyright (C) 2004, 2005, 2008, 2009, 2014, 2017 Red Hat, Inc.
    This file is part of elfutils.
    Written by Ulrich Drepper <drepper@redhat.com>, 2004.
 
@@ -51,7 +51,6 @@ dwarf_getattrs (Dwarf_Die *die, int (*callback) (Dwarf_Attribute *, void *),
 
   if (unlikely (abbrevp == DWARF_END_ABBREV))
     {
-    invalid_dwarf:
       __libdw_seterrno (DWARF_E_INVALID_DWARF);
       return -1l;
     }
@@ -61,24 +60,15 @@ dwarf_getattrs (Dwarf_Die *die, int (*callback) (Dwarf_Attribute *, void *),
   const unsigned char *const offset_attrp = abbrevp->attrp + offset;
 
   /* Go over the list of attributes.  */
-  Dwarf *dbg = die->cu->dbg;
-  const unsigned char *endp;
-  endp = ((const unsigned char *) dbg->sectiondata[IDX_debug_abbrev]->d_buf
-	  + dbg->sectiondata[IDX_debug_abbrev]->d_size);
   while (1)
     {
-      /* Are we still in bounds?  */
-      if (unlikely (attrp >= endp))
-	goto invalid_dwarf;
-
-      /* Get attribute name and form.  */
+      /* Get attribute name and form.  Dwarf_Abbrev was checked when
+	 created, so we can read unchecked.  */
       Dwarf_Attribute attr;
       const unsigned char *remembered_attrp = attrp;
 
-      get_uleb128 (attr.code, attrp, endp);
-      if (unlikely (attrp >= endp))
-	goto invalid_dwarf;
-      get_uleb128 (attr.form, attrp, endp);
+      get_uleb128_unchecked (attr.code, attrp);
+      get_uleb128_unchecked (attr.form, attrp);
 
       /* We can stop if we found the attribute with value zero.  */
       if (attr.code == 0 && attr.form == 0)
diff --git a/libdw/dwarf_hasattr.c b/libdw/dwarf_hasattr.c
index 2bb8dc82..90b333e6 100644
--- a/libdw/dwarf_hasattr.c
+++ b/libdw/dwarf_hasattr.c
@@ -1,5 +1,5 @@
 /* Check whether given DIE has specific attribute.
-   Copyright (C) 2003, 2005, 2014 Red Hat, Inc.
+   Copyright (C) 2003, 2005, 2014, 2017 Red Hat, Inc.
    This file is part of elfutils.
    Written by Ulrich Drepper <drepper@redhat.com>, 2003.
 
@@ -45,32 +45,20 @@ dwarf_hasattr (Dwarf_Die *die, unsigned int search_name)
   Dwarf_Abbrev *abbrevp = __libdw_dieabbrev (die, NULL);
   if (unlikely (abbrevp == DWARF_END_ABBREV))
     {
-    invalid_dwarf:
       __libdw_seterrno (DWARF_E_INVALID_DWARF);
       return 0;
     }
 
-  Dwarf *dbg = die->cu->dbg;
-
-  /* Search the name attribute.  */
-  unsigned char *const endp
-    = ((unsigned char *) dbg->sectiondata[IDX_debug_abbrev]->d_buf
-       + dbg->sectiondata[IDX_debug_abbrev]->d_size);
-
+  /* Search the name attribute.  Dwarf_Abbrev was checked when created,
+     so we can read unchecked here.  */
   const unsigned char *attrp = abbrevp->attrp;
   while (1)
     {
-      /* Are we still in bounds?  This test needs to be refined.  */
-      if (unlikely (attrp >= endp))
-	goto invalid_dwarf;
-
       /* Get attribute name and form.  */
       unsigned int attr_name;
-      get_uleb128 (attr_name, attrp, endp);
+      get_uleb128_unchecked (attr_name, attrp);
       unsigned int attr_form;
-      if (unlikely (attrp >= endp))
-	goto invalid_dwarf;
-      get_uleb128 (attr_form, attrp, endp);
+      get_uleb128_unchecked (attr_form, attrp);
 
       /* We can stop if we found the attribute with value zero.  */
       if (attr_name == 0 || attr_form == 0)
diff --git a/libdw/memory-access.h b/libdw/memory-access.h
index afb651fc..ed68bdb9 100644
--- a/libdw/memory-access.h
+++ b/libdw/memory-access.h
@@ -87,8 +87,26 @@ __libdw_get_uleb128 (const unsigned char **addrp, const unsigned char *end)
   return UINT64_MAX;
 }
 
+static inline uint64_t
+__libdw_get_uleb128_unchecked (const unsigned char **addrp)
+{
+  uint64_t acc = 0;
+
+  /* Unroll the first step to help the compiler optimize
+     for the common single-byte case.  */
+  get_uleb128_step (acc, *addrp, 0);
+
+  const size_t max = len_leb128 (uint64_t);
+  for (size_t i = 1; i < max; ++i)
+    get_uleb128_step (acc, *addrp, i);
+  /* Other implementations set VALUE to UINT_MAX in this
+     case.  So we better do this as well.  */
+  return UINT64_MAX;
+}
+
 /* Note, addr needs to me smaller than end. */
 #define get_uleb128(var, addr, end) ((var) = __libdw_get_uleb128 (&(addr), end))
+#define get_uleb128_unchecked(var, addr) ((var) = __libdw_get_uleb128_unchecked (&(addr)))
 
 /* The signed case is similar, but we sign-extend the result.  */
 
-- 
2.14.3

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Simpler abbrev parsing using less memory
@ 2017-12-26 19:38 Mark Wielaard
  2017-12-26 19:38 ` [PATCH 1/2] libdw: New get_uleb128_unchecked to use with already checked Dwarf_Abbrev Mark Wielaard
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Mark Wielaard @ 2017-12-26 19:38 UTC (permalink / raw)
  To: elfutils-devel

Hi,

When we added bounds checking to almost all data reading functions
(commit 7a05347 libdw: Add get_uleb128 and get_sleb128 bounds checking)
we also added extra checks to the abbrev reading. But since we didn't
really have bounds for the "raw" Dwarf_Abbrev reading functions we
just "guessed" the maximum of a uleb128. This wasn't really correct
and not really needed. A struct Dwarf_Abbrev can only be created by
__libdw_getabbrev, which checks the whole abbrev (code, tag, children
and attribute names/forms) is valid already. So whenever we use the
attrp pointer from the Dwarf_Abbrev to read the name/forms we already
know they are in the .debug_abbrev bounds).

[PATCH 1/2] libdw: New get_uleb128_unchecked to use with already
                   checked Dwarf_Abbrev.

So the first patch introduces a get_uleb128_unchecked function that
is used for re-reading such uleb128 values.

The second patch reduces the size of the struct Dwarf_Abbrev by not
storing the attrcnt and by using bitfields for has_children and code.

[PATCH 2/2] libdw: Reduce size of struct Dwarf_Abbrev.

The attrcnt was only used by the dwarf_getattrcnt function. Which
is only used in one testcase. And which seems mostly unnecessary
for real programs. The function now explicitly counts the attrs
instead of using a cached value.

The combined patches very slightly reduces the time for parsing
abbrevs and make the abbrev cache somewhat smaller.

On my machine eu-readelf -N --debug-dump=info libstdc++.so.debug
goes down from 1.79 to 1.71 secs. And max rss goes down from 15.296
to 14.684 kbytes.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Simpler abbrev parsing using less memory
  2017-12-26 19:38 Simpler abbrev parsing using less memory Mark Wielaard
  2017-12-26 19:38 ` [PATCH 1/2] libdw: New get_uleb128_unchecked to use with already checked Dwarf_Abbrev Mark Wielaard
  2017-12-26 19:38 ` [PATCH 2/2] libdw: Reduce size of struct Dwarf_Abbrev Mark Wielaard
@ 2018-01-01 22:59 ` Mark Wielaard
  2 siblings, 0 replies; 4+ messages in thread
From: Mark Wielaard @ 2018-01-01 22:59 UTC (permalink / raw)
  To: elfutils-devel

On Tue, Dec 26, 2017 at 08:38:38PM +0100, Mark Wielaard wrote:
> [PATCH 1/2] libdw: New get_uleb128_unchecked to use with already
>                    checked Dwarf_Abbrev.
> 
> So the first patch introduces a get_uleb128_unchecked function that
> is used for re-reading such uleb128 values.
> 
> The second patch reduces the size of the struct Dwarf_Abbrev by not
> storing the attrcnt and by using bitfields for has_children and code.
> 
> [PATCH 2/2] libdw: Reduce size of struct Dwarf_Abbrev.

I pushed both patches to master.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-01-01 22:59 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-26 19:38 Simpler abbrev parsing using less memory Mark Wielaard
2017-12-26 19:38 ` [PATCH 1/2] libdw: New get_uleb128_unchecked to use with already checked Dwarf_Abbrev Mark Wielaard
2017-12-26 19:38 ` [PATCH 2/2] libdw: Reduce size of struct Dwarf_Abbrev Mark Wielaard
2018-01-01 22:59 ` Simpler abbrev parsing using less memory Mark Wielaard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).