public inbox for gcc@gcc.gnu.org
* Custom hash tables in extensions
@ 2013-06-11 13:08 Alex Leach
From: Alex Leach @ 2013-06-11 13:08 UTC (permalink / raw)
  To: gcc

Dear GCC devs,

I hope you don't mind me posting on this list. I'm trying to finish up an  
AST to XML converter, which I started porting from GCC-XML (a patched  
version of GCC-4.2) to a GCC plugin, quite a while ago now.

I'd really appreciate any help with finishing this up, as there's a lot to  
learn about GCC's internal garbage collection mechanisms, and I can't  
afford to burn much time on this at the moment, but I would like it to  
actually work...


(Please excuse the use of `VEC(tree, gc)` instead of `vec<tree, va_gc>` in
this email; I'm accommodating both in my code. Also, sorry about the
length of this; I'm not very good at being concise...)


Current plugin deficiency, cf. original GCCXML implementation
-----

One of the (last?) limitations of the plugin as it stands is that each
`cp_binding_level` is missing an extra `VEC(tree, gc)*` member, which was
originally patched into name-lookup.h. This VEC, i.e.
`cp_binding_level->all_decls`, stored all (grand-)child declarations
passed by `ht_forall(ident_hash, callback, 0);` to `callback`, just before
the XML dump starts. (`callback` is implemented as
`xml_fill_all_decls`[1].)

I've tried generating this `all_decls` vector on the fly, during the main  
dump routine, but it seems that the information needs to be gathered in a  
separate, preliminary pass of the AST, which is what the `ht_forall` call  
achieves. Each `cp_binding_level`'s `all_decls` member is populated by  
recursing backwards through each `cxx_binding`'s `previous` member, while  
`ht_forall` recurses forward through the AST.
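
As a toy illustration of the preliminary pass described above, here is a
minimal sketch in plain C++ with no GCC headers. `Scope`, `Binding`,
`Identifier`, `fill_all_decls` and `fill_all_scopes` are hypothetical
stand-ins for `cp_binding_level`, `cxx_binding`, an identifier entry,
the callback and the `ht_forall` call; none of them are GCC types, and the
real callback may well filter bindings in ways this sketch does not.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for cp_binding_level: a scope plus the extra vector.
struct Scope {
    std::string name;
    std::vector<std::string> all_decls;  // filled by the preliminary pass
};

// Hypothetical stand-in for cxx_binding: one declaration of an identifier,
// linked to the binding it shadows through `previous`.
struct Binding {
    std::string decl;
    Scope *scope;
    Binding *previous;
};

// Hypothetical stand-in for one identifier entry in the hash table.
struct Identifier {
    std::string name;
    Binding *bindings;  // innermost binding first
};

// Per-identifier callback: walk back through `previous` and record every
// declaration in the scope that owns it.
void fill_all_decls(const Identifier &id) {
    for (Binding *b = id.bindings; b != nullptr; b = b->previous) {
        assert(b->scope != nullptr);
        b->scope->all_decls.push_back(b->decl);
    }
}

// Preliminary pass: visit every identifier once before the dump starts.
void fill_all_scopes(const std::vector<Identifier> &table) {
    for (const Identifier &id : table)
        fill_all_decls(id);
}
```

The point of the sketch is only the shape of the traversal: the outer loop
moves forward over identifiers, while the inner loop moves backward over
each identifier's chain of bindings.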

The full `all_decls` member is used during the XML dump only when writing
out complete `NAMESPACE_DECL`s (see the lines preceding 1673 in
`xml_output_namespace_decl`[2]).


Custom hash table
--------

I've been browsing the GCC code and reading the internals manual, and it
seems to me that one way to replicate this functionality in a plugin
would be to use `ht_forall(ident_hash, ..)` to populate a separate hash
table, mapping IDENTIFIER_NODEs to VECs.
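
To make the idea of a separate table concrete, here is a minimal sketch in
plain C++ rather than GCC's own hash tables. `IdentNode`, `Decl`,
`DeclTable`, `record_decl` and `decls_of` are hypothetical names made up
for illustration. The sketch deliberately ignores garbage collection: a
real plugin storing `tree` pointers in a side table would also have to
make sure the collector can see them, which is the hard part.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an IDENTIFIER_NODE naming a scope.
struct IdentNode {
    std::string name;
};

// Hypothetical stand-in for a declaration and the scope it belongs to.
struct Decl {
    std::string qualified_name;
    const IdentNode *scope_name;
};

// Side table: scope identifier -> every declaration recorded for it.
// Keyed by pointer identity, so the scope structure itself is untouched.
using DeclTable =
    std::unordered_map<const IdentNode *, std::vector<const Decl *>>;

// Called once per declaration during the preliminary pass.
void record_decl(DeclTable &table, const Decl &decl) {
    assert(decl.scope_name != nullptr);
    table[decl.scope_name].push_back(&decl);
}

// Called during the dump; returns an empty vector for unknown scopes.
const std::vector<const Decl *> &decls_of(const DeclTable &table,
                                          const IdentNode &scope) {
    static const std::vector<const Decl *> empty;
    auto it = table.find(&scope);
    return it == table.end() ? empty : it->second;
}
```

Keeping the vectors in a side table rather than in the binding level is
what would let a plugin avoid patching name-lookup.h.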

I'm sure you're all aware that implementing that is much easier said than
done! I had a grep for GCC source code using `ht_` functions, and came
across stringpool.c. So I started modifying code from there, and then hit
a bit of a wall: `struct GTY(()) string_pool_data`.

If I understand stringpool.c correctly, one `string_pool_data` instance is
assigned to each `hashnode`, but I don't know how to get
`string_pool_data` out of its hashnode. Is there some gengtype-generated
function that achieves this, or is a cast all that's required?

If this is the way to go about getting that `all_decls` VEC, please could
someone help me out(!), or point me at some source code that has a GTY'd
mapping of IDENTIFIER_NODEs to VECs? I've got chapter 22 of the
Internals manual in front of me (Memory Management and Type Information),
but it's a lot to take in. It also looks like I'll have to figure out
chapter 22.4, on how to use the `PLUGIN_GGC_(START|END)` callbacks, which
will also take some time. Pointers to any existing examples where this is
done would be really appreciated!


Using the existing hash table, `ident_hash`
----

This would be ideal. I think it would need the least code, and wouldn't
require gengtype or the call to `ht_forall`. If this is possible
(I'm sure it is), I've failed to get a working implementation. At first I
changed `xml_fill_all_decls` to instead put the VEC of declarations into
each `cxx_binding`'s `static_decls`. This gives improper results, however,
I think due to duplicating declarations and messing with things I
shouldn't touch.

My second attempt got rid of the `ht_forall` call, and instead used
`ht_lookup(ident_hash, ...)` during the dump, to get a namespace's
hashnode. But I haven't got this to work, because I haven't found a way to
get a VEC of all declarations, recursively, given the namespace node (as
either a `tree`, `hashnode` or `cxx_binding`). One inefficiency of doing
it this way is that each time a nested namespace is encountered, the
traversal would have to be repeated, as the parent namespace had already
recursed through it when populating its own VEC. Potential benefit: reduced
minimum memory usage during the dump. Still, I can't figure out which are
the necessary API functions or macros. I get lost looking in the tree.h
files.
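
The repeated work mentioned above can be shown with a toy model in plain
C++. `Namespace` and `collect_decls` are hypothetical names, not GCC API,
and the `visits` counter exists only to make the duplicated traversal
visible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a namespace with its own declarations and
// its directly nested namespaces.
struct Namespace {
    std::string name;
    std::vector<std::string> decls;
    std::vector<const Namespace *> children;
};

// Collect the declarations of `ns` and of everything nested inside it.
// `visits` counts how many namespaces were walked, across all calls.
void collect_decls(const Namespace &ns, std::vector<std::string> &out,
                   std::size_t &visits) {
    ++visits;
    out.insert(out.end(), ns.decls.begin(), ns.decls.end());
    for (const Namespace *child : ns.children) {
        assert(child != nullptr);
        collect_decls(*child, out, visits);
    }
}
```

Calling `collect_decls` for an outer namespace and then again for a
namespace nested inside it walks the nested one twice. Nothing has to be
kept between the two calls, which is the memory benefit, at the cost of
the second walk.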

Either way, the method doesn't matter so much, as long as the result is  
accurate and the implementation saves some time.


Ways to help..
----

If you're familiar with the GTY datatypes, gengtype, hash tables and / or
`tree.h`, please could you help me decide how I can replace the
`cp_binding_level->all_decls` member, and also help me find usage examples
of the relevant internal GCC API(s)? If you'd be happy to contribute code,
that would be even better! All due credit will be given where deserved, of
course.


Current code
---

If you'd like to see the current state of the hash table code I've tried,  
please let me know and I can easily create a fork on github with the  
`xml_ident_hash`. I haven't figured out the exact gengtype commands I need  
to put in the build system files yet, but that's on its way...

The attempt to use `static_decls` in place of `all_decls` is currently
what's live in my github repo[3]. This appears to work fine when testing
against the 80 C++ STL headers provided by GCC-4.8's libstdc++. Only four
of the tests fail, but further digging led me to figure out that the
missing `all_decls` is a much bigger problem than I'd initially thought.


Any help, pointers or advice would be really, really appreciated! If /  
when it's up to standard, I'd like to propose it for inclusion on the GCC  
plugins wiki, but it's not quite there yet...

Yours sincerely,
Alex


[1]: https://github.com/gccxml/gccxml/blob/master/GCC/gcc/cp/xml.c#L3709
[2]: https://github.com/gccxml/gccxml/blob/master/GCC/gcc/cp/xml.c#L1652
[3]: https://github.com/alexleach/gccxml_plugin
