* Parser rewriting
From: Sergio Durigan Junior @ 2010-03-30 18:46 UTC
To: Project Archer

Hello!

As you may have noticed, in the last Archer meeting I brought up a topic for discussion: rewriting GDB's parser. The current parser is written using Bison, and unfortunately it is insufficient for our current needs, especially for C++ productions.

With that in mind, Tom asked me to start this discussion on the mailing list to see what you think about it. We decided to send an e-mail to the archer list first; this topic will eventually be discussed on the gdb list as well.

I am sorry I took so long to send this e-mail, but I was trying to come up with an initial plan to re-implement the parser. I've been studying the GCC/G++ parsers in order to understand how they work, but I realized that it would take me some time to come up with a good plan. I also noticed that other people here have (much!!) more experience with parsers than I do, so why not present the idea and see what you think?

The initial idea (by Tom) would be to mimic the current structure of the G++ parser. There is also another proposal (from Keith), but I don't know if he wants it to be listed here :-). Feel free to post it, Keith!

Any more ideas? Comments about the existing ideas are also welcome, of course. Meanwhile, I'll continue studying this parser stuff and will try to propose something useful in some time.

Regards,

-- 
Sergio Durigan Junior
Debugger Engineer
Red Hat Inc.
* Re: Parser rewriting
From: Chris Moller @ 2010-03-30 19:05 UTC
To: Sergio Durigan Junior
Cc: Project Archer

There are a couple of antlr C++ parsers available:

http://hg.netbeans.org/main/file/tip/cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/cppparser.g
http://www.antlr.org/grammar/1198064893071/CPP_parser_v_3.2.zip

as well as a C++ preprocessor:

http://hg.netbeans.org/main/file/tip/cnd.apt/src/org/netbeans/modules/cnd/apt/impl/support/aptlexer.g

I don't know how easy or hard they'd be to adapt for use in GDB, but they might be worth looking at. And I'm just about certain it would be easier to use them than to write a whole new parser--antlr is a lot less weird than bison. There's an antlr package in brewroot.
* Re: Parser rewriting
From: Tom Tromey @ 2010-03-30 21:12 UTC
To: Chris Moller
Cc: Sergio Durigan Junior, Project Archer

>>>>> "Chris" == Chris Moller <cmoller@redhat.com> writes:

Chris> There are a couple of antlr C++ parsers available:
Chris> http://hg.netbeans.org/main/file/tip/cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/cppparser.g

We can't generally reuse code like this due to copyright assignment requirements.

Chris> And I'm just about certain it would be easier to use them than to
Chris> write a whole new parser--antlr is a lot less weird than bison.

My preferred route is to hand-write a recursive descent parser, mimicking the structure of the existing code in g++.

I think directly sharing code is impractical due to the impedance mismatch between gdb and g++ internals. Also, our goals are slightly different: in gdb we only need to parse expressions, we want a single parser for C and C++, and gdb must implement certain language extensions.

Using a parser generator may be ok, but I think there are benefits to following an existing parser. Also, parsers like the one in g++ are simpler to debug. (Of course, maybe that is a problem we should solve as well :-)

Tom
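To make the recursive descent suggestion above concrete, here is a minimal sketch in C. It is not GDB or g++ code: the `struct parser` type, the function names, and the toy integer-arithmetic grammar are all assumptions made for illustration, and a real expression parser would build an expression tree rather than compute a value directly. The point is only the shape: one function per grammar rule, each calling the rule below it.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical parser state: a cursor into the input plus an error flag. */
struct parser {
    const char *cur;
    int error;
};

static long parse_expr(struct parser *p);

static void skip_ws(struct parser *p)
{
    while (isspace((unsigned char)*p->cur))
        p->cur++;
}

/* primary := INTEGER | '(' expr ')' */
static long parse_primary(struct parser *p)
{
    skip_ws(p);
    if (*p->cur == '(') {
        p->cur++;
        long v = parse_expr(p);
        skip_ws(p);
        if (*p->cur == ')')
            p->cur++;
        else
            p->error = 1;           /* missing closing parenthesis */
        return v;
    }
    if (isdigit((unsigned char)*p->cur)) {
        char *end;
        long v = strtol(p->cur, &end, 10);
        p->cur = end;
        return v;
    }
    p->error = 1;                   /* unexpected character or end of input */
    return 0;
}

/* unary := '-' unary | primary */
static long parse_unary(struct parser *p)
{
    skip_ws(p);
    if (*p->cur == '-') {
        p->cur++;
        return -parse_unary(p);
    }
    return parse_primary(p);
}

/* term := unary (('*' | '/') unary)* */
static long parse_term(struct parser *p)
{
    long v = parse_unary(p);
    for (;;) {
        skip_ws(p);
        char op = *p->cur;
        if (op != '*' && op != '/')
            return v;
        p->cur++;
        long rhs = parse_unary(p);
        if (op == '*')
            v *= rhs;
        else if (rhs != 0)
            v /= rhs;
        else
            p->error = 1;           /* division by zero */
    }
}

/* expr := term (('+' | '-') term)* */
static long parse_expr(struct parser *p)
{
    long v = parse_term(p);
    for (;;) {
        skip_ws(p);
        char op = *p->cur;
        if (op != '+' && op != '-')
            return v;
        p->cur++;
        long rhs = parse_term(p);
        v = (op == '+') ? v + rhs : v - rhs;
    }
}

/* Entry point: returns 0 and stores the value on success, -1 on a syntax error. */
int eval_expression(const char *text, long *result)
{
    struct parser p = { text, 0 };
    long v = parse_expr(&p);
    skip_ws(&p);
    if (p.error || *p.cur != '\0')
        return -1;
    *result = v;
    return 0;
}
```

A caller would pass a string such as "1 + 2 * 3" to `eval_expression` and check the return code before using the result. Because each rule is an ordinary function, a breakpoint on `parse_term` or `parse_unary` shows exactly where the parser is in the grammar, which is one reading of the "simpler to debug" remark.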
* Re: Parser rewriting
From: Dodji Seketeli @ 2010-04-04 8:50 UTC
To: Tom Tromey
Cc: Chris Moller, Sergio Durigan Junior, Project Archer

On Tue, Mar 30, 2010 at 03:12:26PM -0600, Tom Tromey wrote:

[...]

> Chris> There are a couple of antlr C++ parsers available:
> Chris> http://hg.netbeans.org/main/file/tip/cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/cppparser.g
>
> We can't generally reuse code like this due to copyright assignment
> requirements.

Would the copyright assignment requirements prevent us from trying to reuse, say, Clang? Maybe one could think about providing a C API on top of Clang and consider Clang as an external dependency?

If not, then my point is to mention it explicitly and make sure we considered the option and ruled it out for sound reasons.

[...]

> My preferred route is to hand-write a recursive descent parser,
> mimicking the structure of the existing code in g++.
>
> I think directly sharing code is impractical due to impedance mismatch
> between gdb and g++ internals. Also our goals are slightly different,
> in that in gdb we only need to parse expressions, we want a single
> parser for C and C++, and finally gdb must implement certain language
> extensions.

I understand that this minimal parser is meant to stay simple (e.g. no preprocessing support, very minimal error reporting if any at all, no semantic analysis, etc.), but still, if we can't re-use Clang, would it be possible to design this new "minimal parser" as an independent, reusable library with its own dejagnu-free testsuite?

Maybe other projects might be interested in using (and extending) something like that.

Dodji
* Re: Parser rewriting
From: Tom Tromey @ 2010-04-08 19:28 UTC
To: Dodji Seketeli
Cc: Chris Moller, Sergio Durigan Junior, Project Archer

>>>>> "Dodji" == Dodji Seketeli <dodji@redhat.com> writes:

Dodji> Would the copyright assignment requirements prevent us from trying to
Dodji> reuse, say, Clang? Maybe one could think about providing a C api on top
Dodji> of Clang and consider Clang as an external dependency?

This can be done; after all, we do it with Python :)

A new external dependency always causes trouble, though. Look through the archives to see the discussions around expat, python, and libiconv. A required external dependency will be trouble.

Anyway, I suspect the impedance mismatch problem holds equally for clang. It is probably worth verifying that.

Dodji> I understand that this minimal parser is meant to stay simple, e.g. no
Dodji> preprocessing support, very minimal error reporting if any at all, no
Dodji> semantic analysis etc, but still, if we can't re-use Clang, then would
Dodji> it be possible to devise this new "minimal parser" as an independent,
Dodji> reusable library with its own dejagnu-free testsuite?

Dodji> Maybe other projects might be interested in using (and extending)
Dodji> something like that.

I'm not opposed to this, but I don't want to slow down our progress to make a library.

Tom
* Re: Parser rewriting
From: Jim Blandy @ 2010-04-10 22:05 UTC
To: Tom Tromey
Cc: Dodji Seketeli, Chris Moller, Sergio Durigan Junior, Project Archer

On Thu, Apr 8, 2010 at 12:28 PM, Tom Tromey <tromey@redhat.com> wrote:
> I'm not opposed to this but I don't want to slow down our progress to
> make a library.

For what it's worth, isolating a complex component like this makes it much easier to write unit tests for it.

As an experiment, I did my recent work on Google Breakpad --- a new symbol dumper for Linux that converts DWARF debugging info and CFI to Breakpad's own textual format, corresponding extensions to the parser for that data, and stack walkers for x86, x86_64, and ARM --- following a discipline of providing full code coverage and branch coverage (each branch has to be both taken and not taken) with unit tests for each separable component.

It slowed me down quite a bit --- I spent more time writing tests than code. But except for cases where I misunderstood the spec, I have also not had any bugs yet in ~5500 non-comment lines of code. Or, more precisely, I had lots of bugs --- some days I could have stayed in bed and not lost ground --- but none of them got committed. This full rewrite of the debugging info dumper, and pretty deep surgery on the stack walker, is running on our production crash-handling servers (crash-stats.mozilla.com), and the transition has been painless.

What made this possible, though, was that each piece could be taken in isolation and driven from the Google C++ Test Framework. It was easy for me to directly check the results of the parser in isolation, rather than the combined results of the command-line interpreter's dispatching, the parsing, the symbol table lookup (and thus the debug info readers), the evaluator, and the printer. The tests were fast to run, so I would run them at pretty much every point during development where the code could be expected to behave.

As I say, it wasn't quick. But it also means that my next project can actually have my full attention, because I'm not spreading that debugging effort across the next year, based on ill-defined, occasionally reproducible bug reports.

Anyway, what this message comes down to is, "But, but, unit testing! Wow!"
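As a rough illustration of testing one component in isolation, the sketch below drives a tiny lexer directly from a table of cases, with no command interpreter, symbol table, or evaluator involved. Everything here is hypothetical: `next_token`, the token kinds, and the test table are invented for the example, and it uses a plain C loop rather than the Google C++ Test Framework mentioned above.

```c
#include <ctype.h>
#include <stdio.h>

enum token_kind { TOK_END, TOK_NUMBER, TOK_NAME, TOK_PUNCT };

/* Hypothetical lexer entry point: classify the token at *INPUT
   and advance *INPUT past it. */
enum token_kind next_token(const char **input)
{
    const char *s = *input;
    enum token_kind kind;

    while (isspace((unsigned char)*s))
        s++;
    if (*s == '\0') {
        kind = TOK_END;
    } else if (isdigit((unsigned char)*s)) {
        while (isdigit((unsigned char)*s))
            s++;
        kind = TOK_NUMBER;
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        while (isalnum((unsigned char)*s) || *s == '_')
            s++;
        kind = TOK_NAME;
    } else {
        s++;                        /* any other single character */
        kind = TOK_PUNCT;
    }
    *input = s;
    return kind;
}

/* One test case: an input string and the token kinds it should produce.
   Unused trailing slots are zero, which is TOK_END. */
struct lexer_case {
    const char *input;
    enum token_kind expected[8];
};

static const struct lexer_case cases[] = {
    { "",           { TOK_END } },
    { "x + 42",     { TOK_NAME, TOK_PUNCT, TOK_NUMBER, TOK_END } },
    { "foo_1(bar)", { TOK_NAME, TOK_PUNCT, TOK_NAME, TOK_PUNCT, TOK_END } },
};

/* Run every case against the lexer alone; return the number of failing cases. */
int run_lexer_tests(void)
{
    int failures = 0;

    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        const char *p = cases[i].input;
        for (size_t j = 0; ; j++) {
            enum token_kind got = next_token(&p);
            if (got != cases[i].expected[j]) {
                fprintf(stderr, "case %zu, token %zu: got %d\n",
                        i, j, (int)got);
                failures++;
                break;
            }
            if (got == TOK_END)
                break;
        }
    }
    return failures;
}
```

Because the table is data, adding a regression case is one line, and the whole run takes a negligible amount of time, which is what makes it practical to rerun after every small change.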
* Re: Parser rewriting
From: Jim Blandy @ 2010-04-10 22:11 UTC
To: Tom Tromey
Cc: Dodji Seketeli, Chris Moller, Sergio Durigan Junior, Project Archer

On Sat, Apr 10, 2010 at 3:04 PM, Jim Blandy <jimb@red-bean.com> wrote:
> But except for cases where
> I misunderstood the spec, I have also not had any bugs yet in ~5500
> non-comment lines of code.

That's non-test lines. There are apparently ~10k lines of non-comment test code.
* Re: Parser rewriting
From: Tom Tromey @ 2010-03-30 21:18 UTC
To: Sergio Durigan Junior
Cc: Project Archer

>>>>> "Sergio" == Sergio Durigan Junior <sergiodj@redhat.com> writes:

Sergio> The current parser is written using Bison, and unfortunately it
Sergio> is insufficient to satisfy our current needs, especially for C++
Sergio> productions.

A few particulars...

We ran into some problems with the function-like cast notation. I think those are probably fixable, by not distinguishing between different kinds of names here, but we think there will be more problems. E.g., I suspect we'll run into problems when we get rid of the template name hack in the lexer.

Also, there is no good way in bison to disable a production only when the parsing language is C++. You can play games by returning different tokens in different modes, or you can run a preprocessor on the grammar, but both of those are pretty ugly.

Tom
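For contrast with the bison workarounds described above, here is a sketch of how a hand-written parser could switch a production on or off by language with an ordinary `if`. It is a toy recognizer, not GDB code: the `enum language` values, the `struct parser` layout, and the three hard-coded type names are assumptions for illustration. It accepts the C-style cast `(int) 3` in both modes and the function-like cast `int(3)` only in C++ mode.

```c
#include <ctype.h>
#include <string.h>

enum language { LANG_C, LANG_CPLUS };   /* hypothetical, not GDB's enum */

struct parser {
    const char *cur;
    enum language lang;
};

static void skip_ws(struct parser *p)
{
    while (isspace((unsigned char)*p->cur))
        p->cur++;
}

/* Consume the single character C if it comes next. */
static int accept_char(struct parser *p, char c)
{
    skip_ws(p);
    if (*p->cur != c)
        return 0;
    p->cur++;
    return 1;
}

/* Consume the word KW if it comes next and is not a prefix of a longer name. */
static int accept_word(struct parser *p, const char *kw)
{
    size_t n = strlen(kw);

    skip_ws(p);
    if (strncmp(p->cur, kw, n) != 0)
        return 0;
    if (isalnum((unsigned char)p->cur[n]) || p->cur[n] == '_')
        return 0;
    p->cur += n;
    return 1;
}

/* type_name := 'int' | 'char' | 'long'   (toy subset) */
static int parse_type_name(struct parser *p)
{
    return accept_word(p, "int") || accept_word(p, "char")
           || accept_word(p, "long");
}

static int parse_number(struct parser *p)
{
    skip_ws(p);
    if (!isdigit((unsigned char)*p->cur))
        return 0;
    while (isdigit((unsigned char)*p->cur))
        p->cur++;
    return 1;
}

/* cast_expr := '(' type_name ')' cast_expr     C and C++
              | '(' cast_expr ')'               C and C++
              | type_name '(' cast_expr ')'     C++ only
              | NUMBER                                        */
static int parse_cast_expr(struct parser *p)
{
    if (accept_char(p, '(')) {
        if (parse_type_name(p))
            return accept_char(p, ')') && parse_cast_expr(p);
        return parse_cast_expr(p) && accept_char(p, ')');
    }
    if (parse_type_name(p)) {
        /* The language check is a plain condition on this one production. */
        if (p->lang != LANG_CPLUS)
            return 0;
        return accept_char(p, '(') && parse_cast_expr(p)
               && accept_char(p, ')');
    }
    return parse_number(p);
}

/* Returns 1 if TEXT is a complete cast_expr in language LANG, else 0. */
int accepts(const char *text, enum language lang)
{
    struct parser p = { text, lang };

    if (!parse_cast_expr(&p))
        return 0;
    skip_ws(&p);
    return *p.cur == '\0';
}
```

With this sketch, `accepts("int(3)", LANG_CPLUS)` succeeds while `accepts("int(3)", LANG_C)` is rejected, and neither the lexer nor the grammar source had to be duplicated or preprocessed to get there.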
* Re: Parser rewriting
From: Keith Seitz @ 2010-03-30 22:20 UTC
To: Project Archer

On 03/30/2010 02:18 PM, Tom Tromey wrote:
> Also, there is no good way in bison to disable a production only when
> the parsing language is C++. You can play games by returning different
> tokens in different modes, or you can run a preprocessor on the grammar,
> but both of those are pretty ugly.

Do we really need to worry about C vs C++? How dangerous would it be to simply assume C++? [I know there is a subtle difference between the two, I just wonder whether it would matter that much in usage to warrant treating the two differently/independently.]

I also worry more about three other areas that might influence design/implementation decisions:

1) Java? Okay, we could probably work around this by using the current parser for Java (ick!) [Do we even consider adding Java to the mix worth it? I don't, but that's just my opinion...]

2) Linespec re-evaluation: Let's face it, a number of us have had to deal with problems in linespec.c, and we all know it's a nightmare. Anyone (else) interested in moving to expressions-based linespec processing?

3) Symbol table cleanups: I get a sinking feeling that the symbol table API may need some work before any attempt at writing a new parser may be started.

Specifically, when a symbol lookup happens, we should get ALL matching symbols, not just the first one found. [Maybe that's just me?] I know this was a constant barricade when trying to implement overload resolution in the parser. And to this day, we cannot implement overload resolution on a non-class function.

A nice side-effect of this: it would help with symbol completion. Heck, I might even just settle for something that says there are multiple matches...

Keith
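The difference between "first match" and "all matches" in point 3 can be sketched as follows. This is not GDB's symbol table API: `struct symbol`, the static `symtab` array, and both lookup functions are hypothetical stand-ins, chosen only to show why overload resolution needs the second form.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol record; real symbols carry far more information. */
struct symbol {
    const char *name;
    const char *signature;
};

/* Toy symbol table with three overloads of "foo". */
static const struct symbol symtab[] = {
    { "foo", "void foo (int)" },
    { "bar", "int bar (void)" },
    { "foo", "void foo (double)" },
    { "foo", "void foo (const char *)" },
};

#define SYMTAB_COUNT (sizeof symtab / sizeof symtab[0])

/* First-match lookup: the caller never learns that other overloads exist. */
const struct symbol *lookup_first(const char *name)
{
    for (size_t i = 0; i < SYMTAB_COUNT; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return &symtab[i];
    return NULL;
}

/* All-matches lookup: store up to MAX matches in OUT and return the total
   number of matches, which may exceed MAX.  Calling it with MAX == 0 just
   answers "are there multiple matches?". */
size_t lookup_all(const char *name, const struct symbol **out, size_t max)
{
    size_t found = 0;

    for (size_t i = 0; i < SYMTAB_COUNT; i++) {
        if (strcmp(symtab[i].name, name) != 0)
            continue;
        if (found < max)
            out[found] = &symtab[i];
        found++;
    }
    return found;
}
```

A caller resolving `foo(3.0)` would collect the candidates with `lookup_all` and then pick among them by argument type; with `lookup_first` it would only ever see `void foo (int)`. The same candidate list could feed symbol completion.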
* Re: Parser rewriting
From: Tom Tromey @ 2010-03-30 22:59 UTC
To: Keith Seitz
Cc: Project Archer

>>>>> "Keith" == Keith Seitz <keiths@redhat.com> writes:

Keith> Do we really need to worry about C vs C++? How dangerous would it be
Keith> to simply assume C++? [I know there is a subtle difference between the
Keith> two, I just wonder whether it would matter that much in usage to
Keith> warrant treating the two differently/independently.]

There are plenty of unsubtle distinctions as well, like all the additional operator names in C++. This we have to handle, though of course we already have an adequate solution here.

Offhand I don't know if there are productions which would cause confusion if enabled in C. Maybe not. It still seems less potentially confusing and perhaps mildly more future-proof to follow each language spec relatively closely.

Keith> 1) Java? Okay, we could probably work around this by using the current
Keith> parser for java (ick!) [Do we even consider adding java to the mix
Keith> worth it? I don't, but that's just my opinion...]

Let's leave Java alone. It is "good enough" and really reworking it isn't our mandate.

If we were going to really consider merging another language into this effort, I would say ObjC, which currently has its own fork of c-exp.y, minus most of the bug fixes from the last couple of years. But even there, I would rather have somebody knowledgeable and interested in ObjC do it.

Keith> 2) Linespec re-evaluation: Let's face it, a number of us have had to
Keith> deal with problems in linespec.c, and we all know it's a
Keith> nightmare. Anyone (else) interested in moving to expressions-based
Keith> linespec processing?

Yeah, I think we need a better parser in linespec.c, but I see that as mostly orthogonal. Maybe we would need a second entry point to each language's expression parser to let us ask for just a "function name" production, but otherwise I don't think there is a big overlap. This can easily be retrofitted into the bison-based parsers if needed.

Keith> 3) Symbol table cleanups: I get a sinking feeling that the symbol
Keith> table API may need some work before any attempt at writing a new
Keith> parser may be started.

Keith> Specifically, when a symbol lookup happens, we should get ALL matching
Keith> symbols, not just the first one found. [Maybe that's just me?]

I tend to agree with this idea, though I haven't thought through all the ramifications.

But this can also be done independently, I think. The overload resolution stuff is largely done at evaluation time, not in the parser (which makes sense if you want to choose different overloads depending on the value of a convenience variable, which doesn't have a static type). So here we would need the symbol table change and perhaps an IR change -- but not, I think, a parser change.

IMO, the first goal for a rewrite of the parser should just be feature parity. It is just changing how we express the parser, from bison to (say) recursive descent. Then we can start adding features, fixing bugs, and moving hacks out of the lexer and into the parser.

Tom
* Re: Parser rewriting
From: Matt Rice @ 2010-03-31 2:01 UTC
To: Tom Tromey
Cc: Keith Seitz, Project Archer

On Tue, Mar 30, 2010 at 3:59 PM, Tom Tromey <tromey@redhat.com> wrote:
>>>>>> "Keith" == Keith Seitz <keiths@redhat.com> writes:
>
> Keith> 1) Java? Okay, we could probably work around this by using the current
> Keith> parser for java (ick!) [Do we even consider adding java to the mix
> Keith> worth it? I don't, but that's just my opinion...]
>
> Let's leave Java alone. It is "good enough" and really reworking it
> isn't our mandate.
>
> If we were going to really consider merging another language into this
> effort, I would say ObjC, which currently has its own fork of c-exp.y,
> minus most of the bug fixes from the last couple of years. But even
> there, I would rather have somebody knowledgeable and interested in ObjC
> do it.

I would agree with Tom. ObjC is a strict superset of C, so it would be a lot easier to bolt on top of the new C parser. A unified parser could also have good implications for me: having been debugging ObjC++ code, I find it quite a pain to have to split up expressions and 'set language' in the middle of the split-up expression.

So I would be willing to put some time into getting ObjC working on what you guys come up with, and to keep an eye on your progress with this in mind. It's not really something I'd expect you guys to undertake just for fun.

So first I need to start making test cases of the things the ObjC parser currently handles, then ObjC++ cases it doesn't currently handle.

With any luck, the differences between C and C++ will also be applicable, and adding ObjC support to the parser will not add unforeseen issues (I won't really hold my breath on that until I see it...). If that is not the case, adding a second set of problems now won't get you guys any closer to your goal, while having the first set of problems solved by your parser would surely help when dealing with the second set from the ObjC perspective.
* Re: Parser rewriting
From: Chris Moller @ 2010-04-02 1:50 UTC
To: Sergio Durigan Junior
Cc: Project Archer

[-- Attachment #1: Type: text/plain, Size: 1476 bytes --]

On 03/30/10 14:46, Sergio Durigan Junior wrote:
> Hello!
>
> As you may have noticed, in the last Archer meeting I brought a topic into
> discussion: the rewriting of the GDB's parser. The current parser is written
> using Bison, and unfortunately it is insufficient to satisfy our current
> needs, especially for C++ productions.
>
> With that in mind, Tom asked me to start this discussion in the mailing-list
> to see what you think about it. We decided to send an e-mail to the archer
> list at first; this topic will eventually be discussed at the gdb list as
> well.

A lot of years ago I wrote a fairly elaborate parser using antlr--definitely a cool tool and I recommend you consider it. It's a predicated LL(*) parser generator--the "predicated" bit making it possible, among other things, to handle the context-dependent bits of C/C++ grammar.

Just as an example, I've attached a rudimentary antlr grammar that parses a subset of C/C++ decls--if you look, you'll see that the rules look a lot like the specifications in the C++ standard, and in fact started out as a cut'n'paste of those specs. Also, if you look in the grammar for "is_cpp," you can see how rule predicates can be used to have the parser do different things depending on circumstances.

Anyway, it's probably worth considering.

(In addition to the .g file attached, I wrote a couple of other .c and .h files that make it all work. I'll make them available if anyone wants them.)
Chris

[-- Attachment #2: CPPparser.g --]
[-- Type: text/plain, Size: 8503 bytes --]

grammar CPPparser;

options {
    language = C;
    backtrack = true;
}

@header {
#include "pd.h"
}

decl_specifier
@init {
    bzero(&data_type, sizeof(data_type));
    data_type.data_type = DATA_TYPE_TYPE_DESC;
}
@after {
    print_data (&data_type);
}
    : storage_class_specifier* WS? type_specifier? function_specifier?
      FRIEND? TYPEDEF? CONSTEXPR?
      /* | alignment_specifier */
    ;

storage_class_specifier
    : REGISTER {
          if (type_desc.storage_class == STORAGE_CLASS_NONE)
            type_desc.storage_class = STORAGE_CLASS_REGISTER;
          else
            fprintf (stderr, "Storage class already set.\n");
      }
    | STATIC {
          if (type_desc.storage_class == STORAGE_CLASS_NONE)
            type_desc.storage_class = STORAGE_CLASS_STATIC;
          else
            fprintf (stderr, "Storage class already set.\n");
      }
    | THREAD_LOCAL
    | EXTERN {
          if (type_desc.storage_class == STORAGE_CLASS_NONE)
            type_desc.storage_class = STORAGE_CLASS_EXTERN;
          else
            fprintf (stderr, "Storage class already set.\n");
      }
    | MUTABLE
    ;

type_specifier
@init {
    type_desc.type = TYPE_CODE_UNSET;
    type_desc.size = -1;
    type_desc.nr_longs = 0;
    type_desc.nosign_bit = 1;
    type_desc.signed_bit = 0;
}
    : ( ({!is_cpp}? simple_type_specifier
        | {is_cpp}? simple_type_specifier_cpp) WS?)+
    | class_specifier
    /* | enum_specifier */
    /* | elaborated_type_specifier */
    /* | typename_specifier */
    | cv_qualifier*
    ;

simple_type_specifier
    : /* nested_name_specifier? type_name */
      /* | nested_name_specifier TEMPLATE type_name */
      CHAR { type_desc.type = TYPE_CODE_INT; type_desc.size = sizeof(char); }
    | WCHAR_T { type_desc.type = TYPE_CODE_INT; type_desc.size = sizeof(wchar_t); }
    /* | BOOL { type_desc.type = TYPE_CODE_INT; type_desc.size = sizeof(bool); } */
    | SHORT { type_desc.type = TYPE_CODE_INT; type_desc.size = sizeof(short); }
    | INT {
          type_desc.type = TYPE_CODE_INT;
          switch (type_desc.nr_longs) {
          case 0: type_desc.size = sizeof(int); break;
          case 1: type_desc.size = sizeof(long int); break;
          case 2: type_desc.size = sizeof(long long int); break;
          }
      }
    | LONG {
          if (type_desc.type == TYPE_CODE_UNSET) type_desc.type = TYPE_CODE_INT;
          if (type_desc.nr_longs < 2) type_desc.nr_longs++;
          switch (type_desc.nr_longs) {
          case 0: type_desc.size = sizeof(int); break;
          case 1: type_desc.size = sizeof(long int); break;
          case 2: type_desc.size = sizeof(long long int); break;
          }
      }
    | SIGNED { type_desc.nosign_bit = 0; type_desc.signed_bit = 1; }
    | UNSIGNED { type_desc.nosign_bit = 0; type_desc.signed_bit = 0; }
    | FLOAT { type_desc.type = TYPE_CODE_FLT; type_desc.size = sizeof(float); }
    | DOUBLE {
          type_desc.type = TYPE_CODE_FLT;
          type_desc.size = (type_desc.nr_longs > 0)
            ? sizeof(long double) : sizeof(double);
      }
    | VOID { type_desc.type = TYPE_CODE_VOID; }
    | AUTO { }
    /* | decltype ( expression) */
    ;

simple_type_specifier_cpp
    : /* nested_name_specifier? type_name */
      /* | nested_name_specifier TEMPLATE type_name */
      Char { type_desc.type = TYPE_CODE_INT; type_desc.size = sizeof(char); }
    | Wchar_t { type_desc.type = TYPE_CODE_INT; type_desc.size = sizeof(wchar_t); }
    /* | Bool { type_desc.type = TYPE_CODE_INT; type_desc.size = sizeof(bool); } */
    | Short { type_desc.type = TYPE_CODE_INT; type_desc.size = sizeof(short); }
    | Int {
          type_desc.type = TYPE_CODE_INT;
          switch (type_desc.nr_longs) {
          case 0: type_desc.size = sizeof(int); break;
          case 1: type_desc.size = sizeof(long int); break;
          case 2: type_desc.size = sizeof(long long int); break;
          }
      }
    | Long {
          if (type_desc.type == TYPE_CODE_UNSET) type_desc.type = TYPE_CODE_INT;
          if (type_desc.nr_longs < 2) type_desc.nr_longs++;
          switch (type_desc.nr_longs) {
          case 0: type_desc.size = sizeof(int); break;
          case 1: type_desc.size = sizeof(long int); break;
          case 2: type_desc.size = sizeof(long long int); break;
          }
      }
    | Signed { type_desc.nosign_bit = 0; type_desc.signed_bit = 1; }
    | Unsigned { type_desc.nosign_bit = 0; type_desc.signed_bit = 0; }
    | Float { type_desc.type = TYPE_CODE_FLT; type_desc.size = sizeof(float); }
    | Double {
          type_desc.type = TYPE_CODE_FLT;
          type_desc.size = (type_desc.nr_longs > 0)
            ? sizeof(long double) : sizeof(double);
      }
    | Void { type_desc.type = TYPE_CODE_VOID; }
    | Auto { }
    /* | decltype ( expression) */
    ;

class_specifier
    : class_head '{' member_specification* '}'
    ;

class_head
    : class_key identifier?
    /* | nested_name_specifier identifier base_clause? */
    /* | nested_name_specifier? simple_template_id base_clause? */
    ;

member_specification
    : type_specifier identifier initialiser? ';'
    | scope_specifier ':'
    ;

class_key
    : CLASS
    | STRUCT
    | UNION
    ;

scope_specifier
    : PRIVATE
    | PUBLIC
    | PROTECTED
    ;

initialiser
    : '=' numeric
    /* | string */
    /* | array */
    ;

numeric
    : FIXED
    | FLOATING
    | EXPO
    ;

identifier
    : ALPHAI ALPHAC
    ;

/*
type_name:
    class_name
    enum_name
    typedef_name
    ;
*/

cv_qualifier
    : CONST
    | VOLATILE
    ;

function_specifier
    : INLINE
    | VIRTUAL
    | EXPLICIT
    ;

/* Literals for decl_specifier. */
FRIEND       : 'friend' ;
TYPEDEF      : 'typedef' ;
CONSTEXPR    : 'constexpr' ;

/* Literals for storage_specifier. */
REGISTER     : 'register' ;
STATIC       : 'static' ;
THREAD_LOCAL : 'thread_local' ;
EXTERN       : 'extern' ;
MUTABLE      : 'mutable' ;

/* Literals for function_specifier. */
INLINE       : 'inline' ;
VIRTUAL      : 'virtual' ;
EXPLICIT     : 'explicit' ;

/* Literals for simple_type_specifier. */
CHAR         : 'char' ;
WCHAR_T      : 'wchar_t' ;
BOOL         : 'bool' ;
SHORT        : 'short' ;
INT          : 'int' ;
LONG         : 'long' ;
SIGNED       : 'signed' ;
UNSIGNED     : 'unsigned' ;
FLOAT        : 'float' ;
DOUBLE       : 'double' ;
VOID         : 'void' ;
AUTO         : 'auto' ;

/* Literals for simple_type_specifier_cpp. */
Char         : 'Char' ;
Wchar_t      : 'Wchar_t' ;
Bool         : 'Bool' ;
Short        : 'Short' ;
Int          : 'Int' ;
Long         : 'Long' ;
Signed       : 'Signed' ;
Unsigned     : 'Unsigned' ;
Float        : 'Float' ;
Double       : 'Double' ;
Void         : 'Void' ;
Auto         : 'Auto' ;

/* Literals for cv_qualifier. */
CONST        : 'const' ;
VOLATILE     : 'volatile' ;

/* Literals for class_key. */
CLASS        : 'class' ;
STRUCT       : 'struct' ;
UNION        : 'union' ;

/* Literals for scope_specifier. */
PRIVATE      : 'private' ;
PUBLIC       : 'public' ;
PROTECTED    : 'protected' ;

SIGN     : ('+' | '-') ;
INTEGER  : ('0'..'9') ;
FIXED    : INTEGER+ ;
FLOATING : INTEGER '.' INTEGER* ;
EXPO     : INTEGER ('.' INTEGER*)? ('e' | 'E') SIGN? INTEGER ;
ALPHAI   : ('a'..'z' | 'A'..'Z' | '_') ;
ALPHAC   : (ALPHAI | INTEGER)* ;

NEWLINE  : '\r' ? '\n' ;

WS       : (' ' |'\t' |'\n' |'\r' )* /*{ SKIP(); }*/ ;
* Re: Parser rewriting
From: Tom Tromey @ 2010-04-08 19:21 UTC
To: Chris Moller
Cc: Sergio Durigan Junior, Project Archer

Chris> A lot of years ago I wrote a fairly elaborate parser using
Chris> antlr--definitely a cool tool and I recommend you consider it.

One thing to ensure is that the antlr output is GPL-compatible. If not, we can't use it.

Chris> Just as an example, I've attached a rudimentary antlr grammar that
Chris> parses a subset of C/C++ decls

We only need expressions.

Chris> Anyway, it's probably worth considering.

While I still think it makes the most sense to mimic g++, I am open to other solutions that are powerful enough.

Another thing worth considering is bison's GLR mode. This has the advantage that we wouldn't actually need to rewrite the whole parser; we could just start by tweaking it.

Using tools that generate code is problematic in GDB, because people complain about every new dependency. Even requiring bison will probably generate complaints, because AFAIK some people still do their builds with byacc. Maybe we could check in the generated code, though.

Tom
* Re: Parser rewriting
From: Chris Moller @ 2010-04-08 20:21 UTC
To: Tom Tromey
Cc: Sergio Durigan Junior, Project Archer

On 04/08/10 15:21, Tom Tromey wrote:
> Chris> A lot of years ago I wrote a fairly elaborate parser using
> Chris> antlr--definitely a cool tool and I recommend you consider it.
>
> One thing to ensure is that the antlr output is GPL-compatible.
> If not, we can't use it.

antlr.org says that ANTLR itself is under "The BSD License," which looks to me like a small subset of GPLv2, but IANAL. I couldn't find anything about licensing for the generated code.

http://www.antlr.org/wiki/display/Mantra/License

> Chris> Just as an example, I've attached a rudimentary antlr grammar that
> Chris> parses a subset of C/C++ decls
>
> We only need expressions.
>
> Chris> Anyway, it's probably worth considering.
>
> While I still think it makes the most sense to mimic g++, I am open to
> other solutions that are powerful enough.
>
> Another thing worth considering is bison's GLR mode. This has the
> advantage that we wouldn't actually need to rewrite the whole parser, we
> could just start by tweaking it.
>
> Using tools that generate code is problematic in GDB, because people
> complain about every new dependency. Even requiring bison will probably
> generate complaints, because AFAIK some people still do their builds
> with byacc. Maybe we could check in the generated code, though.

With one exception, ANTLR, including v3, is available under at least Fedora--I don't know about RHEL. The exception is the v3 C target-language support, which I had to install separately, but I expect it could be included in the antlrv3 package.

The generated code is kinda big. The source for the antlr C/C++ expression parser I wrote totals 737 lines, about 500 of which are C support code--the antlr grammar is only 239 lines. But those 239 lines get turned into about 8800 lines of combined lexer and parser.

> Tom