Date: Thu, 26 Aug 2004 14:01:00 -0000
From: Michael Chastain
To: bob@brasko.net
Subject: Re: GDB/MI Output Syntax
Cc: gdb@sources.redhat.com
Message-ID: <412DED43.nail3XH31S08T@mindspring.com>
References: <20040825154348.GA19533@white> <412CB6B6.nail1DX11BPYQ@mindspring.com> <20040825193659.GA19945@white>
In-Reply-To: <20040825193659.GA19945@white>

Bob Rossi wrote:
> so far, it seems to parse everything I throw at it. However, I haven't
> tested it to much because I am building an intermediate representation.
> This is what I'll use from the front end.

How can we hook this up with the gdb test suite?  I've got a corpus of
gdb.log files.  Someone could write some Perl script to pick out pieces
and invoke your parser as an external program.

It might help to add a few more rules at the top:

  session -> input_output_pair_list
  input_output_pair_list -> epsilon | input_output_pair_list input output
  input -> ...

The sticky part is that dejagnu mixes its own output into this.  Ick.

Getting into the grammar itself:

Comma separators and lists are kludgy.  In these rules:

  result_record -> opt_token "^" result_class result_list_prime
  result_list_prime -> result_list | epsilon
  result_list -> result_list "," result | "," result

The actual gdb output for a result_record could be either:

  105^done
  103^done,BreakPointTable={...}

It looks a little weird to me to parse the first comma as part of
result_list_prime.  How about:

  result_record -> opt_token "^" result_class
  result_record -> opt_token "^" result_class "," result_list
  result_list -> result | result_list "," result

That simplifies tuple and list as well:

  tuple -> "{}" | "{" result_list "}"
  list -> "[]" | "[" value_list "]" | "[" result_list "]"

That simplifies the rules also, because they won't need any special
code to construct a list for:

  "[" result result_list "]"

This also gets rid of the foo_prime constructions, which can cause
trouble.  The original oob_record_list_prime caused the original
shift/reduce conflict, because the parser had to decide whether to
reduce an epsilon to oob_record_list_prime or keep shifting and reduce
later to the non-epsilon form of the oob_record_list.

Style point: there is a lot of:

  foo_list -> foo_list foo | epsilon
  bar_list -> bar_list bar | bar

I think this is more readable:

  foo_list -> epsilon | foo_list foo
  bar_list -> bar | bar_list bar

Another nit: how is the grammar even working with:

  nl -> CR | CR_LF

Doesn't this have to be:

  nl -> LF | CR | CR LF

Or is the lexer quietly defining CR_LF to include "\n"?

For coding purposes it would be more efficient to make NL a single
token and have the lexer recognize all three forms.  For doco purposes
it might be better to explicitly make nl a non-terminal and show the
LF, CR, CR LF terminals.

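A minimal sketch of the "single NL token" option, in Python rather than
in the lexer's own language.  The token names NL and TEXT and the
function name tokenize are made up for illustration; they are not taken
from Bob's lexer.  The one point it is meant to show is that CR LF has
to be tried before a bare CR, otherwise "\r\n" comes out as two
newlines.

```python
import re

# Hypothetical token names; the real lexer may name things differently.
# Alternation order matters: "\r\n" is tried before "\r" and "\n", so a
# CR LF pair becomes one NL token rather than two.
TOKEN_RE = re.compile(r"(?P<NL>\r\n|\r|\n)|(?P<TEXT>[^\r\n]+)")


def tokenize(text):
    """Split raw output into (kind, lexeme) pairs, kind being NL or TEXT."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]
```

With this, the grammar only ever sees NL, and the three spellings stay
a private matter of the lexer.
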
Either way is okay, but I'd like to have one or the other: either have
the lexer do all the work, or have the lexer be stupid simple and have
the grammar do the work.

Michael
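
For reference, the revised result_record, tuple, and list rules
proposed above can be sketched as a small hand-written recursive-descent
parser.  This is an illustration in Python, not the yacc grammar under
discussion.  Assumptions: the rules "result -> variable = value" and
"value -> const | tuple | list" are taken from the GDB/MI output syntax
documentation rather than from this message; the class and function
names are invented; C-string escapes are left undecoded; and the
left-recursive result_list rule is written as a loop, which accepts the
same input.

```python
import re


class MIParser:
    """Recursive-descent sketch for one GDB/MI result record."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # Next character, or "" at end of input.
        return self.text[self.pos:self.pos + 1]

    def accept(self, ch):
        # Consume ch if it is next; report whether it was consumed.
        if self.peek() == ch:
            self.pos += 1
            return True
        return False

    def expect(self, ch):
        if not self.accept(ch):
            raise ValueError(f"expected {ch!r} at offset {self.pos}")

    def match(self, pattern):
        m = re.compile(pattern).match(self.text, self.pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {self.pos}")
        self.pos = m.end()
        return m.group()

    def result_record(self):
        # result_record -> opt_token "^" result_class
        #                | opt_token "^" result_class "," result_list
        token = self.match(r"[0-9]*") or None
        self.expect("^")
        result_class = self.match(r"[a-z]+")
        results = self.result_list() if self.accept(",") else []
        if self.pos != len(self.text):
            raise ValueError(f"trailing input at offset {self.pos}")
        return token, result_class, results

    def result_list(self):
        # result_list -> result | result_list "," result
        results = [self.result()]
        while self.accept(","):
            results.append(self.result())
        return results

    def result(self):
        # Assumed from the GDB/MI docs: result -> variable "=" value
        name = self.match(r"[A-Za-z_][A-Za-z0-9_-]*")
        self.expect("=")
        return name, self.value()

    def value(self):
        ch = self.peek()
        if ch == '"':
            # const: keep the text between the quotes, escapes undecoded.
            return self.match(r'"(?:[^"\\]|\\.)*"')[1:-1]
        if ch == "{":
            return self.tuple_()
        if ch == "[":
            return self.list_()
        raise ValueError(f"expected a value at offset {self.pos}")

    def tuple_(self):
        # tuple -> "{}" | "{" result_list "}"
        self.expect("{")
        if self.accept("}"):
            return {}
        results = self.result_list()
        self.expect("}")
        return dict(results)  # a repeated name keeps only its last value

    def list_(self):
        # list -> "[]" | "[" value_list "]" | "[" result_list "]"
        self.expect("[")
        if self.accept("]"):
            return []
        if self.peek() in ('"', "{", "["):
            items = [self.value()]
            while self.accept(","):
                items.append(self.value())
        else:
            items = self.result_list()
        self.expect("]")
        return items


def parse_result_record(line):
    """Parse one result record line (without its trailing newline)."""
    return MIParser(line).result_record()
```

Because the comma now belongs to result_record, tuple and list reuse
result_list unchanged.  For example, parse_result_record("105^done")
yields a record with an empty result list, and a trailing comma with
nothing after it is rejected.
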