public inbox for gdb@sourceware.org
* RFC Dwarf2 referent dies
@ 2003-03-13 15:09 Tim Combs
  2003-03-14 17:08 ` Daniel Berlin
From: Tim Combs @ 2003-03-13 15:09 UTC
  To: gdb

Working with the Arm ADS 1.1 compiler, I noticed a problem with
typedefs and gdb.  The setup would be:

   In a header file
foo.h
  typedef int INT32;

foo2.h
  #include "foo.h"
  typedef INT32 RETCODE;

foobar.c
   #include "foo2.h"
   RETCODE foobar(void);

foo.c
   #include "foo2.h"
   RETCODE foobar();

The relevant sections of the dwarf2 dump look like this:

** Section #5 '.debug_info' (SHT_PROGBITS)
    Size   : 684 bytes

  Header
    size 0x44 bytes, dwarf version 2, abbrevp 0x0, address size 4
  00000b: 11  = 0x11 (DW_TAG_compile_unit)
  00000c:   DW_AT_name foo.c
  000012:   DW_AT_producer Thumb C Compiler, ADS1.1 [Build 712]
  000037:   DW_AT_language 0x1
  000038:   DW_AT_macro_info 0x0
  00003c:   DW_AT_stmt_list 0x0
  000040:   4  = 0x24 (DW_TAG_base_type)
  000041:     DW_AT_byte_size 0x4
  000042:     DW_AT_encoding DW_ATE_signed
  000043:     DW_AT_name int
  000047:   0  null
  Header
    size 0x48 bytes, dwarf version 2, abbrevp 0x0, address size 4
  000053: 8  = 0x11 (DW_TAG_compile_unit)
  000054:   DW_AT_name foo.h
  00005a:   DW_AT_producer Thumb C Compiler, ADS1.1 [Build 712]
  00007f:   DW_AT_language 0x1
  000080:   4  = 0x24 (DW_TAG_base_type)
  000081:     DW_AT_byte_size 0x4
  000082:     DW_AT_encoding DW_ATE_signed
  000083:     DW_AT_name int
  000087:   56  = 0x16 (DW_TAG_typedef)
  000088:     DW_AT_name INT32
  00008e:     DW_AT_type indirect DW_FORM_ref_udata 0x38 (0x80)
  000090:   0  null
  000091: 0  padding
  000092: 0  padding
  000093: 0  padding
  Header
    size 0x4c bytes, dwarf version 2, abbrevp 0x0, address size 4
  00009f: 11  = 0x11 (DW_TAG_compile_unit)
  0000a0:   DW_AT_name foo2.h
  0000a7:   DW_AT_producer Thumb C Compiler, ADS1.1 [Build 712]
  0000cc:   DW_AT_language 0x1
  0000cd:   DW_AT_macro_info 0x17c
  0000d1:   DW_AT_stmt_list 0xac
  0000d5:   56  = 0x16 (DW_TAG_typedef)
  0000d6:     DW_AT_name RETCODE
  0000de:     DW_AT_type indirect DW_FORM_ref_addr 0x87
  0000e3:   0  null

<snip> .... foo.c compilation unit missing

  Header
    size 0xa8 bytes, dwarf version 2, abbrevp 0x0, address size 4
  00020b: 5  = 0x11 (DW_TAG_compile_unit)
  00020c:   DW_AT_name foobar.c
  000215:   DW_AT_producer Thumb C Compiler, ADS1.1 [Build 712]
  00023a:   DW_AT_language 0x1
  00023b:   DW_AT_low_pc 0x138
  00023f:   DW_AT_high_pc 0x144
  000243:   DW_AT_stmt_list 0xdc
  000247:   35  = 0x2e (DW_TAG_subprogram)
  000248:     DW_AT_sibling 0xab (0x2ab)
  00024b:     DW_AT_decl_file 0x1
  00024c:     DW_AT_decl_line 0x4
  00024d:     DW_AT_decl_column 0x0
  00024e:     DW_AT_name foobar
  000255:     DW_AT_external 0x1
  000256:     DW_AT_type indirect DW_FORM_ref_addr 0xd5
  00025b:     DW_AT_low_pc 0x138
  00025f:     DW_AT_high_pc 0x144
<snip> -- rest doesn't matter.

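To get the DW_AT_type of function foobar():
1. Read the DIE at offset 0xd5 (RETCODE).  Its DW_AT_type is a
   DW_FORM_ref_addr that tells you to go to offset 0x87.
2. Read the DIE at offset 0x87 (INT32).  Its DW_AT_type is a
   DW_FORM_ref_udata that tells you to go read offset 0x38 from the start
   of the compilation unit containing that DIE, i.e. 0x80.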

The problem arises when dwarf2_get_ref_die_offset() is called to resolve
this.  The offset that it adds to the attribute value is that of the original
compilation unit, the one we started in, rather than the one that contains
the referring DIE.  Since we have arrived by the "back door", we don't know
that compilation unit's offset:
dwarf2_get_ref_die_offset()
    switch (attr->form)
    {
    case DW_FORM_ref_addr:
      result = DW_ADDR (attr); 
      break;
    case DW_FORM_ref1: 
    case DW_FORM_ref2:
    case DW_FORM_ref4:
    case DW_FORM_ref8:
    case DW_FORM_ref_udata:
      result = cu_header_offset + DW_UNSND (attr);
               ^^^^^^^^^^^^^^^
      break;
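
To make the failure concrete with the dump above: the INT32 typedef at 0x87
sits in the CU whose header starts at 0x48, so its DW_FORM_ref_udata value
0x38 should resolve to 0x48 + 0x38 = 0x80, the "int" base type.  If the base
used is the CU where the lookup started (foobar.c, header at 0x200), the
result is 0x238 instead.  A minimal sketch of that arithmetic follows;
resolve_cu_relative() is a hypothetical helper, not a gdb function, and the
CU start offsets are inferred from the header sizes shown in the dump.

```c
#include <stdint.h>

/* Hypothetical helper, not part of gdb: a CU-relative reference
   (DW_FORM_ref1/2/4/8/ref_udata) is an offset from the start of the
   CU header that contains the referring DIE.  */
static uint32_t
resolve_cu_relative (uint32_t cu_start, uint32_t ref_value)
{
  return cu_start + ref_value;
}

/* Offsets taken from the dump above.  */
enum
{
  CU_FOO_H    = 0x48,   /* CU containing the INT32 typedef at 0x87 */
  CU_FOOBAR_C = 0x200,  /* CU where the type lookup started */
  REF_UDATA   = 0x38    /* DW_AT_type value on the INT32 typedef */
};

/* Base is the CU that owns the referring DIE: yields 0x80.  */
static uint32_t
correct_target (void)
{
  return resolve_cu_relative (CU_FOO_H, REF_UDATA);
}

/* Base is the CU we happened to start in: yields 0x238.  */
static uint32_t
wrong_target (void)
{
  return resolve_cu_relative (CU_FOOBAR_C, REF_UDATA);
}
```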

My proposal for working around this problem is based on an old patch from
Daniel Berlin.  The idea is to walk through the compilation unit headers,
record the offset and length of each one, and store them in a splay tree.
Given a DIE offset, we then look up the compilation unit that contains it.
I used a splay tree because a developer generally works on one area of the
code, and that keeps the most used compilation units at the top of the tree
where they are quick to find.  The dwarf2_get_ref_die_offset() snippet would
then look something like this:

  switch (attr->form)
    {
    case DW_FORM_ref_addr:
      result = DW_ADDR (attr);
      break;
    case DW_FORM_ref1:
    case DW_FORM_ref2:
    case DW_FORM_ref4:
    case DW_FORM_ref8:
    case DW_FORM_ref_udata:
      result = find_cu_header_offset(offset, objfile);
      if (result == FALSE) 
          {
              complain (&dwarf2_unknown_cu_offset, offset);
              return 0;
          }
      result += DW_UNSND (attr);
      break;
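
The lookup itself can be sketched without any tree at all, by walking the CU
headers in a raw .debug_info buffer.  This is only an illustration of "find
the compilation unit that contains this offset", not the splay-tree code from
the patch: find_cu_start() and CU_NOT_FOUND are hypothetical names, and the
sketch assumes 32-bit DWARF 2 with little-endian length fields.  It returns a
distinct not-found value rather than 0, since 0 is itself a valid CU offset
(the first CU in the dump starts there).

```c
#include <stddef.h>
#include <stdint.h>

#define CU_NOT_FOUND ((uint32_t) -1)

/* Read a 4-byte little-endian value (assumption: little-endian target).  */
static uint32_t
read_u32_le (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Hypothetical stand-in for find_cu_header_offset(): walk the CU headers
   and return the start of the CU that contains die_offset, or
   CU_NOT_FOUND if no CU covers it.  */
static uint32_t
find_cu_start (const unsigned char *info, size_t info_size,
               uint32_t die_offset)
{
  size_t cu = 0;

  while (cu + 4 <= info_size)
    {
      /* The length field does not count its own 4 bytes.  */
      uint32_t length = read_u32_le (info + cu);
      size_t next = cu + 4 + (size_t) length;

      if (next <= cu || next > info_size)
        break;                  /* malformed or truncated header */
      if (die_offset >= cu && die_offset < next)
        return (uint32_t) cu;
      cu = next;
    }
  return CU_NOT_FOUND;
}
```

With the header sizes from the dump (0x44, 0x48, 0x4c), a call such as
find_cu_start (info, size, 0x87) would walk from 0x0 to 0x48 and stop there.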

I implemented this in GDB 5.0 and, although the change was pretty invasive,
it seems to have worked for a couple of years.  I would like to take
advantage of the new location list changes and better C++ support in the
current GDB, but can't do so until something like this is in the current GDB
source base.

Comments?

Tim Combs


* Re: RFC Dwarf2 referent dies
  2003-03-13 15:09 RFC Dwarf2 referent dies Tim Combs
@ 2003-03-14 17:08 ` Daniel Berlin
From: Daniel Berlin @ 2003-03-14 17:08 UTC
  To: Tim Combs; +Cc: gdb

>
> To get the DW_AT_type of function foobar():
> 1. Read at offset 0xd5 DW_FORM_ref_addr ---that tells you to go to--->
> 2. Read at offset 0x87 DW_FORM_ref_udata ---that tells you to go read
>                                           offset of 0x38 of the compilation
>                                           unit.
>
> The problem comes in when dwarf2_get_ref_die_offset() is called to figure
> this out.  The offset that is added to the attribute is set to the original
> compilation unit.  Since we have arrived by the "back door", we don't know
> the compilation unit's offset:

Not that we actually *need* to know it beforehand; we just have to realize
it's not necessarily the same, and go hunting for the right CU in that
case.

> dwarf2_get_ref_die_offset()
>     switch (attr->form)
>     {
>     case DW_FORM_ref_addr:
>       result = DW_ADDR (attr);
>       break;
>     case DW_FORM_ref1:
>     case DW_FORM_ref2:
>     case DW_FORM_ref4:
>     case DW_FORM_ref8:
>     case DW_FORM_ref_udata:
>       result = cu_header_offset + DW_UNSND (attr);
>                ^^^^^^^^^^^^^^^
>       break;
>
> My proposal for working around this problem is based on an old patch from
> Daniel Berlin.

Whoops, you don't want to use my code. It's horrendous, ugly, evil, and
bad. Just ask Andrew. (Even though it works fine and has been used without
trouble by people for years).  I'm surprised your gdb doesn't explode into
flames just by including it.

> The idea is to create a binary tree of compilation unit headers
> using a splay algorithm and walk through the table and look
> at the offset and length of each compilation unit.  Then find a compilation
> unit that contains this offset.

This also means being able to read in a CU on demand, because we might be
at a DIE that is in a currently unread CU.

> I used a splay tree because it seems that
> a developer generally works on one area of the code and that keeps the
> most used compilation units at the top of the tree where they are quick to
> find.  Then dwarf2_get_ref_die_offset() snippet would look something like this:
>
>   switch (attr->form)
>     {
>     case DW_FORM_ref_addr:
>       result = DW_ADDR (attr);
>       break;
>     case DW_FORM_ref1:
>     case DW_FORM_ref2:
>     case DW_FORM_ref4:
>     case DW_FORM_ref8:
>     case DW_FORM_ref_udata:
>       result = find_cu_header_offset(offset, objfile);
>       if (result == FALSE)
>           {
>               complain (&dwarf2_unknown_cu_offset, offset);
>               return 0;
>           }
>       result += DW_UNSND (attr);
>       break;
>
> I implemented this in 5.0 and although the change was pretty invasive it has
> seemed to work for a couple of years.

The change was invasive because the dwarf2 reader is globals-happy.
It needed a large number of simple rewrites that touch almost every
function in order to be able to do it right, where "right" means not
having to read the entire debug info section just to get at a given CU.

> I would like to take advantage of the
> new location list changes and better C++ support in the current GDB but
> can't do it until something like this is in the current GDB source base.

You could reimplement on top of what's in the current gdb tree.
However, it's a bit of work: while the structure for multiple CUs
is there (struct comp_unit_head has a next pointer), it's not actually
used (you can #if 0 out the next member and it'll still compile).
In addition, there are still plenty of places that assume one CU at a time
only (using the die_ref table, for instance).

I went from a linked list to a splay tree specifically because linearly
searching all the CUs to find the right one was too slow.
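
For comparison, a sketch of a sub-linear lookup that is deliberately simpler
than a splay tree: a binary search over a table of CU ranges sorted by start
offset.  This is a stand-in technique, not what the patch does; it lacks the
splay tree's bias toward recently used CUs.  struct cu_range and lookup_cu()
are hypothetical names, and the table is assumed to be built once, sorted and
non-overlapping.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per compilation unit, sorted by ascending start offset.  */
struct cu_range
{
  uint32_t start;   /* offset of the CU header in .debug_info */
  uint32_t end;     /* one past the last byte of the CU */
};

/* Return the entry whose [start, end) range contains offset, or NULL
   if no CU covers it.  O(log n) in the number of CUs.  */
static const struct cu_range *
lookup_cu (const struct cu_range *table, size_t count, uint32_t offset)
{
  size_t lo = 0, hi = count;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (offset < table[mid].start)
        hi = mid;               /* target lies in an earlier CU */
      else if (offset >= table[mid].end)
        lo = mid + 1;           /* target lies in a later CU */
      else
        return &table[mid];
    }
  return NULL;
}
```

The returned entry's start field plays the role of cu_header_offset for any
CU-relative reference found in that DIE.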


> Comments?
>

Feel free to clean up your patch and make it apply to the head, and
submit it.  If you aren't sure how, send it to me and I'll update it for
you.
Or redo it using what's there now, and live with the increased memory
usage.
However, I'm not going to submit it for review/inclusion and shepherd it
through the process.  You'd have to do that.

> Tim Combs
>
