From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-19240-listarch-gdb=sources.redhat.com@sources.redhat.com>
Received: (qmail 32479 invoked by alias); 26 Aug 2004 14:01:39 -0000
Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb/>
List-Post: <mailto:gdb@sources.redhat.com>
List-Help: <mailto:gdb-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-owner@sources.redhat.com
Received: (qmail 32449 invoked from network); 26 Aug 2004 14:01:36 -0000
Received: from unknown (HELO mclean.mail.mindspring.net) (207.69.200.57)
  by sourceware.org with SMTP; 26 Aug 2004 14:01:36 -0000
Received: from user-119a90a.biz.mindspring.com ([66.149.36.10] helo=berman.michael-chastain.com)
	by mclean.mail.mindspring.net with esmtp (Exim 3.33 #1)
	id 1C0Kog-0000a8-00; Thu, 26 Aug 2004 10:01:26 -0400
Received: from mindspring.com (localhost [127.0.0.1])
	by berman.michael-chastain.com (Postfix) with SMTP
	id 11E7E4B102; Thu, 26 Aug 2004 10:01:40 -0400 (EDT)
Date: Thu, 26 Aug 2004 14:01:00 -0000
From: Michael Chastain <mec.gnu@mindspring.com>
To: bob@brasko.net
Subject: Re: GDB/MI Output Syntax
Cc: gdb@sources.redhat.com
Message-ID: <412DED43.nail3XH31S08T@mindspring.com>
References: <20040825154348.GA19533@white>
 <412CB6B6.nail1DX11BPYQ@mindspring.com> <20040825193659.GA19945@white>
In-Reply-To: <20040825193659.GA19945@white>
User-Agent: nail 10.8 6/28/04
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-SW-Source: 2004-08/txt/msg00387.txt.bz2

Bob Rossi <bob@brasko.net> wrote:
> so far, it seems to parse everything I throw at it. However, I haven't
> tested it to much because I am building an intermediate representation.
> This is what I'll use from the front end.

How can we hook this up with the gdb test suite?

I've got a corpus of gdb.log files.  Someone could write a Perl
script to pick out the pieces and invoke your parser as an external program.
It might help to add a few more rules at the top:

  session                 -> input_output_pair_list
  input_output_pair_list  -> epsilon | input_output_pair_list input output
  input                   -> ...

The sticky part is that dejagnu mixes its own output into this.
Ick.

Getting into the grammar itself:

Comma separators and lists are kludgy.  In these rules:

  result_record      -> opt_token "^" result_class result_list_prime
  result_list_prime  -> result_list | epsilon
  result_list        -> result_list "," result | "," result

The actual gdb output for a result_record could be either:

  105^done
  103^done,BreakPointTable={...}

It looks a little weird to me to parse the first comma as part
of result_list_prime.  How about:

  result_record  -> opt_token "^" result_class
  result_record  -> opt_token "^" result_class "," result_list
  result_list    -> result | result_list "," result

That simplifies tuple and list as well:

  tuple  -> "{}" | "{" result_list "}"
  list   -> "[]" | "[" value_list "]" | "[" result_list "]"

It simplifies the code behind the rules too, because it no longer needs
a special case to construct a list for: "[" result result_list "]" .

This also gets rid of the foo_prime constructions, which can cause
trouble.  oob_record_list_prime is what caused the original
shift/reduce conflict: the parser had to decide whether to reduce an
epsilon to oob_record_list_prime right away, or keep shifting and reduce
later to the non-epsilon form of oob_record_list.
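
To make the revised result_record, tuple, and list rules concrete, here
is a rough recursive-descent sketch in Python.  It is not Bob's parser
and not anything in gdb; the class name, the regexps for tokens and
variable names, and the choice of dicts and lists for the output are all
my assumptions.  The point is only that the comma is consumed in
result_record, and that one result_list routine serves the record, the
tuple, and the list.

```python
import re


class MIParser:
    """Hypothetical parser for one result record (no trailing newline)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos:self.pos + 1]

    def expect(self, ch):
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    def match(self, pattern):
        m = re.compile(pattern).match(self.text, self.pos)
        if m is None:
            raise ValueError(f"no match for {pattern!r} at offset {self.pos}")
        self.pos = m.end()
        return m.group(0)

    # result_record -> opt_token "^" result_class
    # result_record -> opt_token "^" result_class "," result_list
    def result_record(self):
        token = self.match(r"[0-9]*") or None
        self.expect("^")
        result_class = self.match(r"[a-z]+")
        results = []
        if self.peek() == ",":
            self.pos += 1
            results = self.result_list()
        if self.pos != len(self.text):
            raise ValueError(f"trailing input at offset {self.pos}")
        return token, result_class, results

    # result_list -> result | result_list "," result
    def result_list(self):
        items = [self.result()]
        while self.peek() == ",":
            self.pos += 1
            items.append(self.result())
        return items

    # value_list -> value | value_list "," value
    def value_list(self):
        items = [self.value()]
        while self.peek() == ",":
            self.pos += 1
            items.append(self.value())
        return items

    # result -> variable "=" value   (variable spelling is assumed)
    def result(self):
        name = self.match(r"[A-Za-z_][A-Za-z0-9_-]*")
        self.expect("=")
        return name, self.value()

    def value(self):
        ch = self.peek()
        if ch == '"':
            # c-string; backslash escapes are skipped over, not decoded
            return self.match(r'"(?:[^"\\]|\\.)*"')[1:-1]
        if ch == "{":
            # tuple -> "{}" | "{" result_list "}"
            self.pos += 1
            fields = {} if self.peek() == "}" else dict(self.result_list())
            self.expect("}")
            return fields
        if ch == "[":
            # list -> "[]" | "[" value_list "]" | "[" result_list "]"
            self.pos += 1
            if self.peek() == "]":
                items = []
            elif self.peek() in ('"', "{", "["):
                items = self.value_list()
            else:
                items = self.result_list()
            self.expect("]")
            return items
        raise ValueError(f"expected a value at offset {self.pos}")


def parse_result_record(line):
    return MIParser(line).result_record()
```

With that, parse_result_record("105^done") gives ("105", "done", []),
and a record with a made-up table body such as
'103^done,BreakPointTable={nr_rows="0",body=[]}' gives one result whose
value is a dict.  A trailing comma with nothing after it is an error,
which is what the two-rule form of result_record says.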

Style point: there is a lot of:

  foo_list -> foo_list foo | epsilon
  bar_list -> bar_list bar | bar

I think this is more readable:

  foo_list -> epsilon | foo_list foo
  bar_list -> bar | bar_list bar

Another nit: how is the grammar even working with:

  nl -> CR | CR_LF

Doesn't this have to be:

  nl -> LF | CR | CR LF

Or is the lexer quietly defining CR_LF to include "\n"?

For coding purposes it would be more efficient to make NL
a single token and have the lexer recognize all three forms.
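
As a sketch of the "lexer does all the work" option, assuming a
regexp-based lexer in Python (the token names NL and TEXT and the
function name are made up for illustration, and TEXT just stands in for
everything else on the line):

```python
import re

# CR LF has to be tried before a lone CR, otherwise "\r\n" would come
# out as two NL tokens.
TOKEN_RE = re.compile(r"(?P<NL>\r\n|\r|\n)|(?P<TEXT>[^\r\n]+)")


def tokenize(stream):
    """Split raw output into (token_name, lexeme) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(stream)]
```

Here tokenize("^done\r\n(gdb) \n") yields one NL token for the "\r\n"
and one for the "\n", so the grammar only ever sees NL.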

For doco purposes it might be better to explicitly make nl
a non-terminal and show the LF, CR, CR LF terminals.

Either way is okay, but I'd like to have one or the other:
either have the lexer do all the work, or have the lexer be
stupid simple and have the grammar do the work.

Michael