From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-19132-listarch-gdb=sources.redhat.com@sources.redhat.com>
Received: (qmail 15812 invoked by alias); 22 Aug 2004 02:55:31 -0000
Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb/>
List-Post: <mailto:gdb@sources.redhat.com>
List-Help: <mailto:gdb-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-owner@sources.redhat.com
Received: (qmail 15735 invoked from network); 22 Aug 2004 02:55:29 -0000
Received: from unknown (HELO atlantic.mail.pas.earthlink.net) (207.217.120.179)
  by sourceware.org with SMTP; 22 Aug 2004 02:55:29 -0000
Received: from ip216-26-76-134.dsl.du.teleport.com ([216.26.76.134] helo=stray.canids)
	by atlantic.mail.pas.earthlink.net with esmtp (Exim 3.33 #1)
	id 1ByiW0-0004sp-00
	for gdb@sources.redhat.com; Sat, 21 Aug 2004 19:55:28 -0700
Received: from stray.canids (localhost.localdomain [127.0.0.1])
	by stray.canids (Postfix) with ESMTP id 96F5D511B4A
	for <gdb@sources.redhat.com>; Sat, 21 Aug 2004 19:55:27 -0700 (PDT)
From: Felix Lee <felix.1@canids.net>
To: gdb@sources.redhat.com
Subject: Re: GDB/XMI (XML Machine Interface) 
In-Reply-To: message
    on Sat, 21 Aug 2004 14:28:52 PDT
    from Felix Lee <felix.1@canids.net> 
Date: Sun, 22 Aug 2004 02:55:00 -0000
Message-Id: <20040822025527.96F5D511B4A@stray.canids>
X-SW-Source: 2004-08/txt/msg00279.txt.bz2

Felix Lee <felix.1@canids.net>:
> Bob Rossi <bob@brasko.net>:
> >    1. Have to write a parser. (regex, recursive decent)
> >       BTW, I guarantee the parser will have to be updated with every
> >       release of GDB.
> so far, I haven't found that xml is any less work than that, and
> it usually feels like a lot more work, but I haven't used xml for
> anything substantial yet, so it may just be unfamiliarity.

here's some elaboration.  this is what I think about xml parsers
today.  please correct me if I'm wrong.

there are two types of xml parsers, stream-based and tree-based.

using an xml stream parser is equivalent to writing a recursive
descent parser.  the stream parser basically just handles the
'tokenization' aspect of parsing xml (which is complicated by
considerations like character encoding, etc.)

to read data with an xml stream parser, you have to write
handlers that match the structure of the data you're parsing,
which is not any simpler than writing a recursive descent parser
for some other tree-like data format.
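
a minimal sketch of that point, in python with the standard xml.sax
module.  the document format (stack/frame/func/line) is made up for
illustration and is not any real gdb output.  note how the handler
has to carry its own state to know where it is in the structure,
much as a hand-written recursive descent parser would.

```python
import xml.sax

# hypothetical document; element and attribute names are invented
DOC = b"""<stack>
  <frame level="0"><func>main</func><line>12</line></frame>
  <frame level="1"><func>start</func><line>40</line></frame>
</stack>"""


class FrameHandler(xml.sax.ContentHandler):
    """collects <frame> elements; all structure tracking is done by hand."""

    def __init__(self):
        super().__init__()
        self.frames = []     # finished frames
        self.current = None  # frame being built while inside <frame>
        self.field = None    # child element currently being read
        self.text = []       # character data seen for that child

    def startElement(self, name, attrs):
        if name == "frame":
            self.current = {"level": int(attrs["level"])}
        elif self.current is not None and name in ("func", "line"):
            self.field = name
            self.text = []

    def characters(self, content):
        # the parser may deliver one text run in several pieces
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == self.field:
            self.current[name] = "".join(self.text)
            self.field = None
        elif name == "frame":
            self.frames.append(self.current)
            self.current = None


def parse_frames(data):
    handler = FrameHandler()
    xml.sax.parseString(data, handler)
    return handler.frames


frames = parse_frames(DOC)
```

the parser library takes care of tokens and encodings; everything
that knows about frames still has to be written by hand.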

using an xml tree parser is complicated by xml's origin as a
markup language, which introduces issues that aren't particularly
relevant to data representation, but can't easily be ignored.
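
one illustration of such an issue is mixed content: text and child
elements can be interleaved, and whitespace between tags is
preserved as data.  a small sketch with python's
xml.etree.ElementTree, using a made-up <msg> element:

```python
import xml.etree.ElementTree as ET

# hypothetical message mixing text and child elements
doc = "<msg>\n  stopped at <func>main</func>, line <line>12</line>\n</msg>"
root = ET.fromstring(doc)
func, line = list(root)

# the text is scattered across .text and .tail of several nodes,
# and the indentation whitespace is kept as part of the data
pieces = [root.text, func.text, func.tail, line.text, line.tail]
flat = "".join(root.itertext())
```

code that only wants the values of func and line still has to know
that these extra text fragments exist and decide to skip them.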

something like perl's XML::Simple tries to hide the messy details
and give you a natural data structure that corresponds to an xml
document, but there are a few problems that make XML::Simple
unsuitable for data that isn't "simple".
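
one commonly cited problem of this kind (not necessarily the one
meant above) is that a "natural" mapping changes shape depending on
how many times an element occurs.  the sketch below is a naive
python converter in the same spirit; it is not XML::Simple's actual
algorithm, and the element names are made up.

```python
import xml.etree.ElementTree as ET


def simplify(elem):
    """naive mapping: leaf -> text, children -> dict, repeats -> list."""
    children = list(elem)
    if not children:
        return elem.text
    out = {}
    for child in children:
        value = simplify(child)
        if child.tag in out:
            # second occurrence: promote the existing value to a list
            if not isinstance(out[child.tag], list):
                out[child.tag] = [out[child.tag]]
            out[child.tag].append(value)
        else:
            out[child.tag] = value
    return out


one = simplify(ET.fromstring(
    "<stack><frame><func>main</func></frame></stack>"))
two = simplify(ET.fromstring(
    "<stack><frame><func>main</func></frame>"
    "<frame><func>start</func></frame></stack>"))
```

with one frame, one["frame"] is a dict; with two, two["frame"] is a
list, so calling code has to handle both shapes.  XML::Simple has a
ForceArray option aimed at this case.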

using a more general xml tree parser is harder.  in order to
access the data you want, you either have to walk the document
tree yourself (which is similar to writing a recursive descent
parser) or use XPath expressions to locate items in the tree
(which is similar to using regexps).
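
the two approaches side by side, as a rough sketch using python's
xml.etree.ElementTree (which supports only a subset of XPath).  the
document and the helper names func_at_level_walk and
func_at_level_xpath are hypothetical.

```python
import xml.etree.ElementTree as ET

# hypothetical document; element and attribute names are invented
DOC = """<stack>
  <frame level="0"><func>main</func><line>12</line></frame>
  <frame level="1"><func>start</func><line>40</line></frame>
</stack>"""
root = ET.fromstring(DOC)


def func_at_level_walk(root, level):
    """walk the tree by hand, checking tags and attributes as we go."""
    for frame in root:
        if frame.tag == "frame" and frame.get("level") == str(level):
            for child in frame:
                if child.tag == "func":
                    return child.text
    return None


def func_at_level_xpath(root, level):
    """describe the location with a path pattern instead."""
    return root.findtext(f"./frame[@level='{level}']/func")


by_walk = func_at_level_walk(root, 1)
by_xpath = func_at_level_xpath(root, 1)
```

the path version is shorter, but like a regexp it is a string
pattern that has to be kept in sync with the document's structure.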

xml tree parsers also have the disadvantage of needing a lot of
memory.  the estimates are 10x to 30x the size of the xml
document, which puzzles me.  it's not clear to me why you'd need
more than about 2x.  (actually I'd expect more like 0.8x since
xml is redundantly verbose.)

with either stream parsers or tree parsers, if an xml schema
changes, you have to revise your code, unless the change is
careful to make only backward-compatible extensions.
guaranteeing that is hard for nontrivial changes, so people often
screw it up, or they play it safe and define a new schema.  in
either case, old code will often require updating anyway.
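
a small sketch of the difference, again in python with made-up
element names: a reader written against a first version of a format
survives an added element, but silently stops working when the same
information is moved somewhere else.

```python
import xml.etree.ElementTree as ET

# three hypothetical versions of the same record
V1 = "<frame><func>main</func><line>12</line></frame>"
# backward-compatible extension: a new element is added
V2_EXTENDED = ("<frame><func>main</func><line>12</line>"
               "<addr>0x4005d0</addr></frame>")
# incompatible change: the line number moves into an attribute
V2_RESTRUCTURED = ("<frame><func>main</func>"
                   "<loc file='a.c' line='12'/></frame>")


def read_line(xml_text):
    """old client code, written against V1 only."""
    frame = ET.fromstring(xml_text)
    return frame.findtext("line")


results = {
    "v1": read_line(V1),
    "extended": read_line(V2_EXTENDED),
    "restructured": read_line(V2_RESTRUCTURED),
}
```

the old reader ignores the extra <addr> element without trouble,
but gets None for the restructured version and has to be updated.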