public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* Linker : Make a map of typeinfo to vtable
@ 2024-01-19 22:29 Frederick Virchanza Gotham
  2024-01-19 22:52 ` Frederick Virchanza Gotham
  2024-01-22 16:50 ` Michael Matz
  0 siblings, 2 replies; 9+ messages in thread
From: Frederick Virchanza Gotham @ 2024-01-19 22:29 UTC (permalink / raw)
  To: binutils

[-- Attachment #1: Type: text/plain, Size: 2236 bytes --]

Over on the mailing list for new proposals for the C++ programming
language, I am proposing that we be able to get a pointer to a class's
vtable from its typeinfo.

So I want to edit the source code for the GNU linker in order to add two
new symbols to every binary:

    (1) __map_typeinfohash_vtable
    (2) __map_typeinfohash_vtable_size

The first one is an array mapping the hashes of type_info's to vtable
pointers. The second one is the number of elements in the array. The array
would look something like the following in C++:

    pair<size_t, void const*> const g_vtables[] = {
         { typeid(MyClass).hash_code(), &_ZTI7MyClass },
         { typeid(YourClass).hash_code(), &_ZTI9YourClass },
    };

Next I can write a function called 'std::get_polymorphic_facilitator' which
would
perform a binary search through this array, in order to convert a typeinfo
to a vtable pointer. So if a massive program like Chromium has 6,000
classes, then a worst case binary search would be about
13 checks. On a desktop PC that's less than a microsecond.

Afterward I would fork libstdc++ to edit the <typeinfo> standard header
file to add the following inline function:

    inline void const *get_polymorphic_facilitator(type_info const &ti)
    {
        extern pair<size_t,void const*> const *const
__map_typeinfohash_vtable;

        extern size_t const __map_typeinfohash_vtable_size;

        return lower_bound( p, p + __map_typeinfohash_vtable_size,
ti.hash_code(), OnlyCompareFirst()
)->second;
    }

But first of all I need to edit the source code for 'ld' so that, after it
has generated all the vtables and typeinfo's for every class, I can then
create my aforementioned array which maps typeinfo's to vtables. Or,
instead of waiting until after all the vtables and typeinfo's have been
generated, I can populate my array as and when these things are generated
(I imagine this might be easier than waiting until they're all generated,
but whatever works).

I've never edited a compiler or linker before, so could I please ask for a
little guidance here? Could someone point me toward the right source file
for 'ld' where I should start making changes to implement this? I would
appreciate any tips you can give me please.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Linker : Make a map of typeinfo to vtable
  2024-01-19 22:29 Linker : Make a map of typeinfo to vtable Frederick Virchanza Gotham
@ 2024-01-19 22:52 ` Frederick Virchanza Gotham
  2024-01-20 19:44   ` Frederick Virchanza Gotham
  2024-01-22 16:50 ` Michael Matz
  1 sibling, 1 reply; 9+ messages in thread
From: Frederick Virchanza Gotham @ 2024-01-19 22:52 UTC (permalink / raw)
  To: binutils

[-- Attachment #1: Type: text/plain, Size: 844 bytes --]

On Friday, January 19, 2024, Frederick Virchanza Gotham wrote:

>
> Or, instead of waiting until after all the vtables and typeinfo's have
> been generated, I can populate my array as and when these things are
> generated (I imagine this might be easier than waiting until they're all
> generated, but whatever works).
>


Having thought about this a little more, I realise that it's the C++
compiler that generates the vtables and typeinfo's, and so, inside the
source code for the linker, what I'll need to do is:

(1) After all the object files have been loaded in, make a finite list of
all typeinfo's and vtables
(2) Generate my array that maps typeinfo's to vtable pointers
(3) Place my array somewhere inside a section inside the ELF binary
(4) Provide the two linker symbols __map_typeinfohash_vtable and
__map_typeinfohash_vtable_size

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Linker : Make a map of typeinfo to vtable
  2024-01-19 22:52 ` Frederick Virchanza Gotham
@ 2024-01-20 19:44   ` Frederick Virchanza Gotham
  2024-01-21  3:27     ` Frederick Virchanza Gotham
  0 siblings, 1 reply; 9+ messages in thread
From: Frederick Virchanza Gotham @ 2024-01-20 19:44 UTC (permalink / raw)
  To: binutils

[-- Attachment #1: Type: text/plain, Size: 1046 bytes --]

On Fri, Jan 19, 2024 at 10:52 PM Frederick Virchanza Gotham wrote:
>
>> Or, instead of waiting until after all the vtables and typeinfo's have
been generated,
> I can populate my array as and when these things are generated (I imagine
this might
> be easier than waiting until they're all generated, but whatever works).


Am I right in thinking that I should be looking in the file "bfd/linker.c",
Line 1382, at the function "_bfd_generic_link_add_one_symbol", which you
can see here:

    https://github.com/bminor/binutils-gdb/blob/
4a2318c9858fdb1899157339f526df3d20e43cfe/bfd/linker.c#L1382

So if the linker is looking through a load of object files (.o), static
archives (.a), and shared libraries (.so), every symbol encountered will be
processed by the "add_one_symbol" function.

And so if I add a few lines to that function to detect symbols beginning
with "_ZTI", I can add these symbols to a global container for all the
vtables. Also I can detect all the type_info's too.

Am I barking up the right tree?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Linker : Make a map of typeinfo to vtable
  2024-01-20 19:44   ` Frederick Virchanza Gotham
@ 2024-01-21  3:27     ` Frederick Virchanza Gotham
  2024-01-21 22:35       ` Frederick Virchanza Gotham
  0 siblings, 1 reply; 9+ messages in thread
From: Frederick Virchanza Gotham @ 2024-01-21  3:27 UTC (permalink / raw)
  To: binutils

[-- Attachment #1: Type: text/plain, Size: 632 bytes --]

On Sat, Jan 20, 2024 at 7:44 PM Frederick Virchanza Gotham wrote:
>
> Am I right in thinking that I should be looking in the file
"bfd/linker.c", Line 1382, at the function
"_bfd_generic_link_add_one_symbol"


I'm still a bit clueless here, but here's what I've got so far, I'm able to
print out a list of all the classes that have both a vtable and a typeinfo:

Here's the link to Github showing the changes I've made so far, I've
created two new files 'polymorphism.h' and 'polymorphism.c'.


https://github.com/bminor/binutils-gdb/compare/5a75433a122ea1037ccb4e948332f4886e242911...healytpk:linker-vtable:master

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Linker : Make a map of typeinfo to vtable
  2024-01-21  3:27     ` Frederick Virchanza Gotham
@ 2024-01-21 22:35       ` Frederick Virchanza Gotham
  0 siblings, 0 replies; 9+ messages in thread
From: Frederick Virchanza Gotham @ 2024-01-21 22:35 UTC (permalink / raw)
  To: binutils

[-- Attachment #1: Type: text/plain, Size: 1302 bytes --]

On Sun, Jan 21, 2024 at 3:27 AM Frederick Virchanza Gotham wrote:


> I'm still a bit clueless here, but here's what I've got so far, I'm able
> to print out a list of all the classes that have both a vtable and a
> typeinfo:
>
> Here's the link to Github showing the changes I've made so far, I've
> created two new files 'polymorphism.h' and 'polymorphism.c'.
>
>
> https://github.com/bminor/binutils-gdb/compare/5a75433a122ea1037ccb4e948332f4886e242911...healytpk:linker-vtable:master
>
>

I'm close to getting this working but I'm stuck on something. The function,
"_bfd_generic_link_add_one_symbol", is called to add each new symbol, and
I'm able to record the names of the vtables and typeinfo's that are
encountered. One of the arguments passed to that function is called
'value', and it is the symbol's byte offset within its section. The problem
however is that sometimes 'value' is set to 0. I can understand if just one
symbol has its offset set to 0 inside its section (i.e. if it's the first
thing in the section), but a typeinfo and a vtable for different types
can't both be located at the beginning of the same section.

Does anyone know why "_bfd_generic_link_add_one_symbol" would be called
with an argument of 0 for 'value' when it is defining a symbol?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Linker : Make a map of typeinfo to vtable
  2024-01-19 22:29 Linker : Make a map of typeinfo to vtable Frederick Virchanza Gotham
  2024-01-19 22:52 ` Frederick Virchanza Gotham
@ 2024-01-22 16:50 ` Michael Matz
  2024-01-25 10:17   ` Frederick Virchanza Gotham
  1 sibling, 1 reply; 9+ messages in thread
From: Michael Matz @ 2024-01-22 16:50 UTC (permalink / raw)
  To: Frederick Virchanza Gotham; +Cc: binutils

Hello,

On Fri, 19 Jan 2024, Frederick Virchanza Gotham wrote:

> So I want to edit the source code for the GNU linker in order to add two
> new symbols to every binary:
> 
>     (1) __map_typeinfohash_vtable
>     (2) __map_typeinfohash_vtable_size
> 
> The first one is an array mapping the hashes of type_info's to vtable
> pointers. The second one is the number of elements in the array. The array
> would look something like the following in C++:
> 
>     pair<size_t, void const*> const g_vtables[] = {
>          { typeid(MyClass).hash_code(), &_ZTI7MyClass },
>          { typeid(YourClass).hash_code(), &_ZTI9YourClass },
>     };

When you want to get from typeinfo to vtable pointer, why not just add 
that pointer directly into the typeinfo?  A hashtable is quite a 
roundabout way to implement 'typeid(Type).vtable()' .

As you're proposing your general idea on a mailing list for new features 
of C++, that seems like a more sensible design than hacking the linker.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Linker : Make a map of typeinfo to vtable
  2024-01-22 16:50 ` Michael Matz
@ 2024-01-25 10:17   ` Frederick Virchanza Gotham
  2024-01-25 23:02     ` Frederick Virchanza Gotham
  0 siblings, 1 reply; 9+ messages in thread
From: Frederick Virchanza Gotham @ 2024-01-25 10:17 UTC (permalink / raw)
  To: binutils; +Cc: Michael Matz

On Mon, Jan 22, 2024 at 4:50 PM Michael Matz wrote:
>
> When you want to get from typeinfo to vtable pointer, why not just add
> that pointer directly into the typeinfo?  A hashtable is quite a
> roundabout way to implement 'typeid(Type).vtable()' .


That would be one option, yes it would be possible to edit the source
code for the GNU g++ compiler so that the type '__vmi_class_type_info'
has the vtable pointer appended onto the end of it, and then the
'__vmi_class_type_info::flags' member could have a bit set in it to
indicate the presence of the vtable pointer. There are two drawbacks
to this option though:
    (1) It involves editing the GNU g++ compiler, and I'd rather edit
the linker because then it can be extended easily to other languages
(such as D)
    (2) If we have multiple object files with a weak vtable symbol,
and if some of the weak symbols don't have the vtable pointer
appended, then a symbol missing the vtable pointer might become
'strong'
    (2) It will be necessary to recompile entire programs, whereas if
I edit just the linker, only the final linking step will need to be
redone (which is a big deal if you have been given proprietary object
files and static archives)

So for the time being I'm ploughing on ahead with editing the linker.
Please help me to get this working, here's what I've got so far:

    https://github.com/bminor/binutils-gdb/compare/f11786a1951932217d348f3739e8deb31975a355...healytpk:linker-vtable:master

Inside the file, "ld/ldmain.c", the function 'main' calls 'ldwrite'
which calls 'bfd_final_link'. After 'ldwrite' is called, I am able to
use the pointer "link_info.output_bfd" in order to get a list of all
symbols along with their offsets inside their sections. From this data
I'm able to produce my array which maps typeinfo's to vtables. But
there's one problem -- 'ldwrite' has already been called by the time I
produce my array. So, somehow, I need to call 'bfd_final_link' in
order get the offsets for the vtables inside their section, but then I
need to append my own array onto the end of the section, and then
perform the final link.

Can anybody please point me in the right direction for how to do this?
Can I possibly make a duplicate of the "bfd_link_info" struct and pass
it to "bfd_final_link" so that I can determine the vtable offsets,
then append my array onto the end of the section, and then call
'bfd_final_link' a second time giving it the original "bfd_link_info"
struct? Would something like that work?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Linker : Make a map of typeinfo to vtable
  2024-01-25 10:17   ` Frederick Virchanza Gotham
@ 2024-01-25 23:02     ` Frederick Virchanza Gotham
  2024-01-27 10:56       ` Frederick Virchanza Gotham
  0 siblings, 1 reply; 9+ messages in thread
From: Frederick Virchanza Gotham @ 2024-01-25 23:02 UTC (permalink / raw)
  To: binutils; +Cc: Michael Matz

On Thu, Jan 25, 2024 at 10:17 AM Frederick Virchanza Gotham wrote:
>
> So, somehow, I need to call 'bfd_final_link' in
> order get the offsets for the vtables inside their section, but then I
> need to append my own array onto the end of the section, and then
> perform the final link.


I've had another idea.

Inside the function "bfd_generic_link_add_one_symbol", I can keep a
record of all vtables and typeinfo's encountered. So before the call
to 'ldwrite', I know how many entries there will be in my array. So
before I call 'ldwrite', I add a new symbol to the readonly section
for my array, calling it "__map_typeinfo_vtable", and I give it the
correct size. I fill my array with all zeroes, except for a 128-Bit
magic number at the beginning: 021fed6674b3ac813273c5909cb7d0ac. So
then after I call 'ldwrite' and have produced the final binary file, I
re-open the binary file and I scan through it until I find my magic
number (i.e. 3c1fed6f74bb6c814173c5409cb7d0aa), and I replace it with
the correct data.

This will work I think. I'm gonna try code it now.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Linker : Make a map of typeinfo to vtable
  2024-01-25 23:02     ` Frederick Virchanza Gotham
@ 2024-01-27 10:56       ` Frederick Virchanza Gotham
  0 siblings, 0 replies; 9+ messages in thread
From: Frederick Virchanza Gotham @ 2024-01-27 10:56 UTC (permalink / raw)
  To: binutils; +Cc: Michael Matz

On Thu, Jan 25, 2024 at 11:02 PM Frederick Virchanza Gotham wrote:
>
> So then after I call 'ldwrite' and have produced the final binary file, I
> re-open the binary file and I scan through it until I find my magic
> number (i.e. 3c1fed6f74bb6c814173c5409cb7d0aa), and I replace it with
> the correct data.


I wasn't able to get much help with how 'ld' or 'libbfd' works so I
had to get creative. I have gotten my new feature for the GNU bfd
linker working, here's my latest code:

    https://github.com/bminor/binutils-gdb/compare/8409b75...healytpk:linker-vtable:e4adc46

Here's what I had to do:

(1) In 'main', I add another string to the end of 'argv' for another
object file (e.g. "/tmp/linker_9821421740127.c.o"). This object file
doesn't exist yet.
(2) In '_bfd_generic_link_add_one_symbol', I keep a record of all
typeinfo's and vtables encountered. I keep a count of matching pairs.
(3) In 'load_symbols', I keep a watch out for
"/tmp/linker_9821421740127.c.o", and I generate this object file at
the point it's requested by using 'gcc' to compile a simple C source
file which contains a definition for the array
'__map_typeinfo_vtable'. The size of this array depends upon the count
of matching pairs.
(4) The newly-generated object file contains the symbol
'__map_typeinfo_vtable' inside the section ".data.rel.ro", but the
array is all zeroes.
(5) After 'ldwrite' is called and the final output file has been
written, I then check what the offset of each symbol is by looking at
"link_info.output_bfd". I make an array of these offsets and I sort
it.
(6) I re-open the output file in binary mode, and I seek to the
location of '__map_typeinfo_vtable', and I write the real array
contents to the file.

I have this tested and working. I've written a C++ program that does a
binary search through '__map_typeinfo_vtable' in order to map a
typeinfo to a vtable, and I can then use that to convert a virtual
function pointer into a direct function pointer.

It's cool that I've gotten this working but it's a little bit
Frankensteinish in places. Instead of appending a new object file to
the end of the command line, it would be nice if I could just call a
function in 'libbfd' to manually add the symbol
'__map_typeinfo_vtable' -- but I don't know how to do this. And
instead of re-opening the output file and overwriting the zeroed-out
array, it would be nice if I could write it correctly the first time.

If anyone has any pointers here, please help me out. Here's my latest
post on the C++ proposals mailing list showing my C++ code:

    https://lists.isocpp.org/std-proposals/2024/01/8948.php

I'm hoping to either put this linker up on GodBolt or to host it on my
own webspace if I can.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-01-27 10:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-19 22:29 Linker : Make a map of typeinfo to vtable Frederick Virchanza Gotham
2024-01-19 22:52 ` Frederick Virchanza Gotham
2024-01-20 19:44   ` Frederick Virchanza Gotham
2024-01-21  3:27     ` Frederick Virchanza Gotham
2024-01-21 22:35       ` Frederick Virchanza Gotham
2024-01-22 16:50 ` Michael Matz
2024-01-25 10:17   ` Frederick Virchanza Gotham
2024-01-25 23:02     ` Frederick Virchanza Gotham
2024-01-27 10:56       ` Frederick Virchanza Gotham

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).