Link-time call sizing and section stretching

public inbox for binutils@sourceware.org

* Link-time call sizing and section stretching
@ 2000-01-06 17:50 Michael Sokolov
  2000-01-06 18:05 ` Ian Lance Taylor
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Sokolov @ 2000-01-06 17:50 UTC
  To: binutils; +Cc: alex, dayakar, hari, john_w_marshall, stacy

Hi binutils hackers,

I have this one problem that our company very much needs solved and that I
think should be doable, but binutils aren't my forte...

I want to do link-time call sizing. I'm programming for an embedded platform
based on Motorola 68000 (plain 68000, not 68020) using binutils-2.9.1 for the
m68k-*-coff target. I'm using COFF only as an intermediate format and the
"executable" produced by ld isn't actually executed on anything, but is fed
into a program that rips the raw section images out of it and blasts them into
ROM. I know that folks tend to prefer ELF to COFF these days. However, ELF
stands for Executable and Linkable Format, and almost all of its fancy features
not present in COFF have to do with the executable part and UNIX ABIs and
shared libraries. These features are of no use for an intermediate format and
would be just dead weight, which is why I think COFF is a better choice.
Anyway, given binutils' BFD library, this shouldn't really matter.

On this platform programs have to be split into sections no larger than 64
KB, and each section is ROMed independently of the others at an unpredictable
address, which means that function calls and code references have to be done in
a very special way. Since all code is in ROM, we can't patch calls at run time
with absolute addresses. This means that all references within a section must
be PC-relative. Since this is a 68000 and not a 68020, we only have signed
16-bit displacements here. I have devised wacky instruction sequences that
effectively act as bsrl, lea and pea with 32-bit PC-relative displacements,
which natively exist only on the 68020. However, there is a high overhead
associated with them, which makes it very important to use 16-bit references
wherever possible and use 32-bit ones only where necessary, i.e., when a
reference goes from one end of an almost full 64 KB section to the other and
exceeds the -32 KB to +32 KB range of 16-bit references. Even more importantly,
all of this applies only to intra-section references. Since not only the
absolute address of each section but even the relative location of sections is
unknown, inter-section references involve special instruction sequences that
extract the base address of the target section from a global variable
initialized at run time.
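
As a minimal sketch of the range test described above (Python is used only
for illustration; the name fits_disp16 and the pc_bias parameter are made up
for this example), a 16-bit PC-relative reference reaches its target only if
the displacement fits in a signed 16-bit word. The sketch assumes that the
displacement is measured from the address of the extension word, i.e. the
instruction address plus 2:

```python
def fits_disp16(site: int, target: int, pc_bias: int = 2) -> bool:
    """Return True if `target` is reachable from an instruction at `site`
    with a signed 16-bit PC-relative displacement.

    Assumption: the displacement is relative to site + pc_bias, where
    pc_bias = 2 is the offset of the extension word within the instruction.
    """
    disp = target - (site + pc_bias)
    return -0x8000 <= disp <= 0x7FFF
```

Under these assumptions, a call at offset 0 reaches a target at offset 0x8001
but not one at 0x8002, so a reference spanning more than about half of a full
64 KB section would need the long form.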

There are several solutions already available to the problem of inter-section
calls. Some involve teaching gcc which function is in which section and how to
generate inter-section call instruction sequences, others involve patching
assembly code as it comes out of gcc and is fed into as, and still others
involve adding stubs to each section that do the inter-section jump and having
everyone just call the stubs with normal intra-section calls. However, they all
have major drawbacks. Also, there is no solution to the problem of sizing
intra-section calls so that only the ones that need to be 32-bit are 32-bit and
the others are 16-bit. I've written a script that patches the assembly code
coming out of gcc before assembly to make all calls 32-bit, but that's really
expensive. And overall the lack of a true synergistic solution to all these
problems causes other inconveniences.

The only real synergistic solution is link-time call sizing. Instead of having
gcc generate the actual instruction for calls and other references and having
as assemble them, have gcc generate magic tokens saying "I want to call
function foo", have as encode them in the object file somehow, and have the
linker turn them into actual instruction sequences.

The problem with this is that it requires the linker to stretch the sections it
gets from object files, rather than just move them around, which is what all
the traditional relocations allow. This requires some hacking. It is my
understanding that none of the traditional UNIX object file formats (a.out,
COFF, ELF) is designed for this. In all of them the assembler
is free to assume that the section it outputs can only be moved around, but
cannot be stretched or compressed. If a single section in a single object
module contains a label and a relative reference to it, the assembler will
finalize the reference itself and won't emit a relocation, as it assumes that
it has sufficient information for this.

What I'm thinking about is creatively breaking these rules to achieve the
desired effect. First hack as (and I guess ld -r) to emit relocations for all
references, even relative ones that seem like they can be finalized. Second,
teach as some special new directives to emit special tokens in the object file
indicating where unknown-size relative references would have to be made and to
what target. Here I'm thinking about creating a special section that won't
appear in the final executable and that would contain this information in
object files. Wherever a call instruction is being placed now, place a local
label there instead. Then in the special section put records, each containing a
reference to this label, so that the linker knows where to patch, a reference
to the target, so that it knows what we are referencing, and a code indicating
what instruction sequence to generate (a call, a lea-type reference, or a
pea-type reference). Finally, add a new step to the linker to process this special
section and to inject call and reference instruction sequences into other
sections, stretching them as necessary. Stretch a section by correcting all
symbol table and reference table entries appropriately. This step must be added
to the linker after the link script has been processed and all sections have
been combined as directed there, so that everything is finalized as to what is
in what section, but before the relocations are stripped, so that there will
still be one relocation step to fix the relocations blown off by section
stretching, as well as the all-new relocations the generated instruction
sequences will have in them.
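
The record layout and the stretching step proposed above can be sketched
roughly as follows. This is an illustration in Python, not binutils code:
CallRecord, Section and expand are hypothetical names, and nothing here
corresponds to an existing BFD structure. One simplification is assumed and
matters: a symbol located exactly at the insertion point is treated as
following the inserted bytes, whereas a real implementation would need
ordering information to tell a label before the placeholder from one after
it.

```python
from dataclasses import dataclass, field

@dataclass
class CallRecord:
    site: int      # offset of the placeholder label within its section
    target: str    # name of the symbol being referenced
    kind: str      # "call", "lea" or "pea"

@dataclass
class Section:
    data: bytearray
    symbols: dict = field(default_factory=dict)   # symbol name -> offset
    relocs: list = field(default_factory=list)    # offsets of relocatable fields
    records: list = field(default_factory=list)   # pending CallRecord entries

def expand(section: Section, rec: CallRecord, new_bytes: bytes) -> None:
    """Insert the instruction sequence for `rec` at its site and shift
    every symbol, relocation and later record located at or after it.
    Records are assumed to be expanded in section order."""
    at, n = rec.site, len(new_bytes)
    section.data[at:at] = new_bytes
    section.symbols = {name: off + n if off >= at else off
                       for name, off in section.symbols.items()}
    section.relocs = [off + n if off >= at else off for off in section.relocs]
    for other in section.records:
        if other is not rec and other.site >= at:
            other.site += n
```

The bytes passed to expand would themselves carry fresh relocations against
the target, which is why one ordinary relocation pass still has to run after
this step.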

As far as I could tell, what I'm describing is doable. However, my binutils
hacking skills are probably a little insufficient for a hack like this.
Therefore, I would very much appreciate any suggestions and help. The
impression I've got from looking at the binutils code is that this is something
that a hacker more GNU-literate than I am would be able to do in a timeframe
from a few hours to a few days. Since this is not a hobby of mine but something
that our company needs for our projects, and we need it urgently, I'm sure we
would be able to pay someone for a couple of days of work for this (I'm Cc'ing
this posting to our management for that).

TIA 2^32 for any help!

--
Michael Sokolov				2695 VILLA CREEK DR STE 240
Software Engineer			DALLAS TX 75234-7329 USA
JP Systems, Inc.			Phone: +1-972-484-5432 x247
					    or +1-888-665-2460 x247
E-mail: msokolov@meson.jpsystems.com	Fax:   +1-972-484-4154


* Re: Link-time call sizing and section stretching
  2000-01-06 17:50 Link-time call sizing and section stretching Michael Sokolov
@ 2000-01-06 18:05 ` Ian Lance Taylor
  2000-04-01  0:00   ` Ian Lance Taylor
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Lance Taylor @ 2000-01-06 18:05 UTC
  To: msokolov; +Cc: binutils, alex, dayakar, hari, john_w_marshall, stacy

   Date: Thu, 6 Jan 00 19:49:39 -0600
   From: msokolov@meson.jpsystems.com (Michael Sokolov)

   I'm using COFF only as an intermediate format and the
   "executable" produced by ld isn't actually executed on anything, but is fed
   into a program that rips the raw section images out of it and blasts them into
   ROM. I know that folks tend to prefer ELF to COFF these days. However, ELF
   stands for Executable and Linkable Format, and almost all of its fancy features
   not present in COFF have to do with the executable part and UNIX ABIs and
   shared libraries. These features are of no use for an intermediate format and
   would be just dead weight, which is why I think COFF is a better choice.

The main advantage of ELF over COFF in an embedded system is that ELF
permits the section alignment to be set on a section-by-section basis.

   However, there is a high overhead associated with
   them, which makes it very important to use 16-bit references wherever possible
   and use 32-bit ones only where necessary

This general class of problem is what the GNU linker calls relaxation.
Relaxation is implemented for a number of targets: SH, H8/300,
MIPS/ECOFF with -membedded-pic come to mind.

The solutions more or less follow the lines you sketched out: keep all
the relocations in assembler output, and then hack over them in the
linker.
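
A rough sketch of what such a relaxation pass does, in Python for
illustration only: start with every call in its short form, lay out the
section, grow any call whose target is out of range, and repeat until
nothing changes. Since calls only ever grow, the loop terminates. The item
encoding, the name relax and the 10-byte size of the long sequence are
assumptions made for this example (the 4-byte short form corresponds to a
68000 bsr with a 16-bit displacement); none of this is taken from BFD.

```python
SHORT = 4    # bsr with a 16-bit displacement: opcode word + displacement word
LONG = 10    # assumed size of the 32-bit PC-relative call sequence

def relax(items):
    """items is a list of ('code', nbytes), ('label', name) or
    ('call', target_name) tuples describing one section in order.
    Returns the set of item indices whose call must use the long form."""
    long_calls = set()
    while True:
        # Lay out the section using the sizes chosen so far.
        offset, labels, sites = 0, {}, {}
        for i, (kind, arg) in enumerate(items):
            if kind == 'label':
                labels[arg] = offset
            elif kind == 'code':
                offset += arg
            else:
                sites[i] = offset
                offset += LONG if i in long_calls else SHORT
        # Grow every short call whose displacement no longer fits in 16 bits.
        grew = False
        for i, site in sites.items():
            if i in long_calls:
                continue
            disp = labels[items[i][1]] - (site + 2)
            if not -0x8000 <= disp <= 0x7FFF:
                long_calls.add(i)
                grew = True
        if not grew:
            return long_calls
```

Repeating the layout matters because growing one call moves everything
after it, which can push a different call that previously fit out of range.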

The most interesting one to you might be the MIPS -membedded-pic work,
since it has some similar characteristics.  The MIPS only has an 18-bit
relative branch.  A call to a procedure whose displacement does not fit
in 18 bits is changed to a 5-instruction sequence.  The linker relaxation
code is in bfd/coff-mips.c.

   As far as I could tell, what I'm describing is doable. However, my binutils
   hacking skills are probably a little insufficient for a hack like this.
   Therefore, I would very much appreciate any suggestions and help. The
   impression I've got from looking at the binutils code is that this is something
   that a hacker more GNU-literate than I am would be able to do in a timeframe
   from a few hours to a few days. Since this is not a hobby of mine but something
   that our company needs for our projects, and we need it urgently, I'm sure we
   would be able to pay someone for a couple of days of work for this (I'm Cc'ing
   this posting to our management for that).

Frankly, I think your time estimates are low.  I've done this sort of
relaxation work for a number of processors in BFD already, and I've
done very similar work for the m68000 using another linker.  However,
if I were to do this (which I will not), I would estimate the work at
closer to a couple of weeks.

Good luck, though.

Ian


* Re: Link-time call sizing and section stretching
@ 2000-01-06 18:31 Michael Sokolov
  2000-04-01  0:00 ` Michael Sokolov
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Sokolov @ 2000-01-06 18:31 UTC
  To: binutils; +Cc: john_w_marshall, stacy

Ian Lance Taylor <ian@zembu.com> wrote:

> The main advantage of ELF over COFF in an embedded system is that ELF
> permits the section alignment to be set on a section-by-section basis.

OK, this one doesn't really matter for my platform; 4-byte alignment everywhere
is fine. In fact, it's a non-issue, as my ld link script puts every
section at 0. Why? As I've said, each section ends up on the embedded device at
an unpredictable address independently of other sections, making it an exercise
in futility for the linker to picture the program as having first this
section, then that section, etc. This means that the starting address the
linker thinks each section has is meaningless. On the other hand, making it 0
makes it easy to get the offset of a symbol from the start of its
section, which is what's usually needed, by simply emitting an
"absolute" reference.

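To illustrate the point about linking every section at 0, here is a small
Python sketch; the function names and the runtime_bases table are
hypothetical and only stand in for the global variable holding section base
addresses mentioned earlier in the thread. With a link-time base of 0, the
"absolute" value of a symbol already is its offset within the section, and a
run-time address is that offset plus wherever the section actually landed:

```python
def section_offset(symbol_address: int, link_base: int = 0) -> int:
    """Offset of a symbol from the start of its section. With link_base 0
    (every section placed at 0 by the link script) this is the identity."""
    return symbol_address - link_base

def runtime_address(runtime_bases: dict, section: str, offset: int) -> int:
    """Address of a symbol once its section's real base is known at run time."""
    return runtime_bases[section] + offset
```

With any other link-time base, the linker's notion of the symbol address
would first have to be reduced by that base to obtain the same offset.
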
> This general class of problem is what the GNU linker calls relaxation.
> Relaxation is implemented for a number of targets: SH, H8/300,
> MIPS/ECOFF with -membedded-pic come to mind.
>
> The solutions more or less follow the lines you sketched out: keep all
> the relocations in assembler output, and then hack over them in the
> linker.
>
> The most interesting one to you might be the MIPS -membedded-pic work,
> since it has some similar characteristics.  The MIPS only has an 18-bit
> relative branch.  A call to a procedure whose displacement does not fit
> in 18 bits is changed to a 5-instruction sequence.  The linker relaxation
> code is in bfd/coff-mips.c.

Thanks for the pointers; with these I might actually be able to do something!
(The offer of a few days' pay for a good hacker still stands, though. We are in
Dallas, TX, USA, and we develop cool mobile communication software for handheld
devices, primarily PalmOS ones, which is the embedded platform I'm talking
about.)

--
Michael Sokolov				2695 VILLA CREEK DR STE 240
Software Engineer			DALLAS TX 75234-7329 USA
JP Systems, Inc.			Phone: +1-972-484-5432 x247
					    or +1-888-665-2460 x247
E-mail: msokolov@meson.jpsystems.com	Fax:   +1-972-484-4154

