using relocs in disassembler


* using relocs in disassembler
@ 1999-11-15 13:56 Lynn Winebarger
  1999-11-15 15:20 ` Alan Modra
  1999-11-15 21:21 ` Ian Lance Taylor
  0 siblings, 2 replies; 6+ messages in thread
From: Lynn Winebarger @ 1999-11-15 13:56 UTC (permalink / raw)
  To: binutils

   Hello again.  Been a while since I last posted.  I've converted objdump
and the i386 disassembler to return scheme values, but ignoring relocs
(both static and dynamic).  Now that I've got that part basically
working, I'd like to try and incorporate the relocation information, since
I'm using this for writing a decompiler which I want to use on shared
libraries (besides a lot of code being in shared libraries, they have the
extra advantage of retaining a lot of symbol information).
    The way I plan on doing this is to change the disassembler to check
for the presence of relocs affecting an operand of the current
instruction.  Now that I'm familiar with the guts of the i386
disassembler, I don't think this will be too difficult.  Anyway, I think I
have a grip on how to deal with static relocs, but not with dynamic
relocs.  First let me see if my assumption about static relocs can be
verified: a reloc affects at most one immediate operand of an
instruction.  Thus I don't need to worry about the size of a reloc, I can
just let the disassembler get the operand as normal, and then just ignore
it (except for the size).  
   Now, I guess my question also applies to some static relocs: when a
reloc appears in data, how can I tell how much data is taken up by the
reloc, and exactly what I should replace it with.  I'm guessing, if it's
data, the data would become a pointer to the symbol (or the value of the
symbol).  
   Anyway, dynamic relocs aren't very well documented in the bfd manual
(at least the last version I got).  Nor are the various types of relocs
(e.g. BFD_RELOC_386_JUMP_SLOT).
   Detecting dynamic relocs in data will be done in objdump.c code, so I
can probably bring that back to binutils' objdump.c.  Do dynamic relocs
appear in code, or just data?

Lynn




* Re: using relocs in disassembler
  1999-11-15 13:56 using relocs in disassembler Lynn Winebarger
@ 1999-11-15 15:20 ` Alan Modra
  1999-11-15 15:52   ` Lynn Winebarger
  1999-11-16 14:22   ` Ralf Baechle
  1999-11-15 21:21 ` Ian Lance Taylor
  1 sibling, 2 replies; 6+ messages in thread
From: Alan Modra @ 1999-11-15 15:20 UTC (permalink / raw)
  To: Lynn Winebarger; +Cc: binutils

On Mon, 15 Nov 1999, Lynn Winebarger wrote:

> relocs.  First let me see if my assumption about static relocs can be
> verified: a reloc affects at most one immediate operand of an

You could have more than one reloc affecting an instruction, e.g. x86
"movl $addr1,addr2".  Other architectures may allow even more (VAX?).

> instruction.  Thus I don't need to worry about the size of a reloc, I can
> just let the disassembler get the operand as normal, and then just ignore
> it (except for the size).  
>    Now, I guess my question also applies to some static relocs: when a
> reloc appears in data, how can I tell how much data is taken up by the
> reloc, and exactly what I should replace it with.  I'm guessing, if it's
> data, the data would become a pointer to the symbol (or the value of the
> symbol).  

Think of a reloc record as just being a formula telling you how to modify
existing instruction or data bytes.  As far as I know, a reloc never
changes the size of an instruction in an object file.


* Re: using relocs in disassembler
  1999-11-15 15:20 ` Alan Modra
@ 1999-11-15 15:52   ` Lynn Winebarger
  1999-11-16 14:22   ` Ralf Baechle
  1 sibling, 0 replies; 6+ messages in thread
From: Lynn Winebarger @ 1999-11-15 15:52 UTC (permalink / raw)
  To: binutils

On Tue, 16 Nov 1999, Alan Modra wrote:

> On Mon, 15 Nov 1999, Lynn Winebarger wrote:
> 
> > relocs.  First let me see if my assumption about static relocs can be
> > verified: a reloc affects at most one immediate operand of an
> 
> You could have more than one reloc affecting an instruction. eg. x86
> "movl $addr1,addr2"  Other architectures may allow even more (vax ?)
> 
    Right, but each of those relocs would only affect one operand, right?
I'm also assuming a reloc wouldn't affect addressing mode information
(e.g. in i386 code I shouldn't have to worry about the ModR/M byte being
affected by a reloc).

> >    Now, I guess my question also applies to some static relocs: when a
> > reloc appears in data, how can I tell how much data is taken up by the
> > reloc, and exactly what I should replace it with.  I'm guessing, if it's
> > data, the data would become a pointer to the symbol (or the value of the
> > symbol).  
> 
> Think of a reloc record as just being a formula telling you how to modify
> existing instruction or data bytes.  As far as I know, a reloc never
> changes the size of an instruction in an object file.
   
    The problem is that I don't want to have to take into account the file
type, so I can't make assumptions about how to apply the reloc.  I don't
want to replace the data with the result of the relocation, I want to 
replace it with a scheme representation of the symbol reference (I'm
representing the disassembly/data with tagged list).  So all I really need
to know is how many bytes the reloc affects.  For example, right now, if
there's a block of data passed to disassemble_bytes, then I would return
a list like this
'(data 1 2 3 4 5 6 34 120  ...)
Now let's assume there's a reloc that starts at the 5th byte, so I would
want to return
'(data 1 2 3 4 (symbol-ref <name> <number-of-bytes>) 34 120 ...)
if, for example, the reloc affects only 2 bytes.  The problem then, is how
do I tell how many bytes the reloc affects.  In the i386, it could be
2, 4, or 6 bytes (all legitimate sizes of pointers).  Other architectures
could introduce other sizes (say 8 bytes).  
   I guess another small question I have would be whether I would need to
worry about a reloc for an offset.  Say I have the code


movl eax, (edx)
jmp eax

Could the value edx points to (an offset from the current EIP) be set up
by a reloc?  Such a thing could be set up by a tail-recursion-optimizing
compiler for a functional language, I think, or perhaps an implementation
of object methods.
   Another question: can dynamic relocs affect instructions?  (the shared
libraries I've looked at don't, but that's not decisive)

Lynn




* Re: using relocs in disassembler
  1999-11-15 13:56 using relocs in disassembler Lynn Winebarger
  1999-11-15 15:20 ` Alan Modra
@ 1999-11-15 21:21 ` Ian Lance Taylor
  1999-11-16  4:46   ` Lynn Winebarger
  1 sibling, 1 reply; 6+ messages in thread
From: Ian Lance Taylor @ 1999-11-15 21:21 UTC (permalink / raw)
  To: owinebar; +Cc: binutils

   Date: Mon, 15 Nov 1999 16:59:31 -0500 (EST)
   From: Lynn Winebarger <owinebar@free-expression.org>

   First let me see if my assumption about static relocs can be
   verified: a reloc affects at most one immediate operand of an
   instruction.

That is true on the i386, and on most chips.  On the PowerPC a reloc
can affect both the branch address and the branch prediction bit.
There may be a few other minor exceptions.

      Now, I guess my question also applies to some static relocs: when a
   reloc appears in data, how can I tell how much data is taken up by the
   reloc, and exactly what I should replace it with.  I'm guessing, if it's
   data, the data would become a pointer to the symbol (or the value of the
   symbol).  

Given the howto structure, you can call bfd_get_reloc_size to get the
number of bytes that it affects.

      Anyway, dynamic relocs aren't very well documented in the bfd manual
   (at least the last version I got).  Nor are the various types of relocs
   (e.g BFD_RELOC_386_JUMP_SLOT).

BFD_RELOC_386_JUMP_SLOT is a special relocation which marks a
procedure linkage table entry in an i386 dynamically linked executable
or shared library.  On the i386, a procedure linkage table entry is a
16 byte sequence used to locate a function at run time.  This is used
to permit the dynamic linker to only spend the time to locate a
function if it is actually called.  This speeds up program start
times.  See elf_i386_plt_entry in bfd/elf32-i386.c.

Note that BFD_RELOC_386_JUMP_SLOT is ELF specific.

      Detecting dynamic relocs in data will be done in objdump.c code, so I
   can probably bring that back to binutils' objdump.c.  Do dynamic relocs
   appear as in code, or just data?

Dynamic relocs can appear in both code and data.

   Date: Mon, 15 Nov 1999 18:54:55 -0500 (EST)
   From: Lynn Winebarger <owinebar@free-expression.org>

      I guess another small question I have would be whether I would need to
   worry about a reloc for an offset, say I have the code

   movl eax, (edx)
   jmp eax

   Could the value edx points to (an offset from the current EIP) be set up
   by a reloc (such a thing could be set up by a tail recursion optimizing
   compiler for a functional language, I think, or perhaps an implementation
   of object methods).  

I'm not sure I completely understand your question.  Certainly the
value at the address to which edx points could be initialized using a
reloc.  Also, the offset off of edx could be set by a reloc.  I don't
think a C compiler is likely to ever generate such a case, but it
could be done in assembly code.

      Another question: can dynamic relocs affect instructions?  (the shared
   libraries I've looked at don't, but that's not decisive)

Yes, dynamic relocs can affect instructions.  You should be able to
see an example by making a shared library without using -fpic when
compiling the code.

Ian


* Re: using relocs in disassembler
  1999-11-15 21:21 ` Ian Lance Taylor
@ 1999-11-16  4:46   ` Lynn Winebarger
  0 siblings, 0 replies; 6+ messages in thread
From: Lynn Winebarger @ 1999-11-16  4:46 UTC (permalink / raw)
  To: binutils

On 15 Nov 1999, Ian Lance Taylor wrote:

>       Now, I guess my question also applies to some static relocs: when a
>    reloc appears in data, how can I tell how much data is taken up by the
>    reloc, and exactly what I should replace it with.  I'm guessing, if it's
>    data, the data would become a pointer to the symbol (or the value of the
>    symbol).  
> 
> Given the howto structure, you can call bfd_get_reloc_size to get the
> number of bytes that it affects.
> 

   Yeah, I noticed that when I went back to the manual (doh!). The source
code (reloc.c) does mention the possibility of variable sized relocs,
though it doesn't mention when that could happen.

>    movl eax, (edx)
>    jmp eax
> 
>    Could the value edx points to (an offset from the current EIP) be set up
>    by a reloc (such a thing could be set up by a tail recursion optimizing
>    compiler for a functional language, I think, or perhaps an implementation
>    of object methods).  
> 
> I'm not sure I completely understand your question.  Certainly the
> value at the address to which edx points could be initialized using a
> reloc.  Also, the offset off of edx could be set by a reloc.  I don't
> think a C compiler is likely to ever generate such a case, but it
> could be done in assembly code.
> 

   This kind of code can be generated by a large switch statement in C.
A label is produced for each case of the switch, and the case to jump to
is determined by looking up the case in a table of labels, and jumping.
Depending on what kind of jump is used, the table entry could require a
pc-relative displacement.  (Though I can't think of any time this would
require a reloc - that's sort of the point of the pc-relative
displacement).
   Another (more likely) example is a functional language where tail calls
are optimized.  In this case, I might set a variable to a closure, then
apply that variable in tail position.  Tail calls should never cost more
than a jump in a tail recursion optimized language, so you might very well
want to use a dynamic relocation that sets the variable to the pc-relative
offset of a dynamically loaded piece of code.  Whether or not this is
actually supported by system tools, I don't know.

Thanks for the info.

Lynn


* Re: using relocs in disassembler
  1999-11-15 15:20 ` Alan Modra
  1999-11-15 15:52   ` Lynn Winebarger
@ 1999-11-16 14:22   ` Ralf Baechle
  1 sibling, 0 replies; 6+ messages in thread
From: Ralf Baechle @ 1999-11-16 14:22 UTC (permalink / raw)
  To: Alan Modra; +Cc: Lynn Winebarger, binutils

On Tue, Nov 16, 1999 at 09:50:41AM +1030, Alan Modra wrote:

> You could have more than one reloc affecting an instruction. eg. x86
> "movl $addr1,addr2"  Other architectures may allow even more (vax ?)

The MIPS N32 and 64 ELF formats even allow multiple relocations to be
applied to the same location.

  Ralf
