From: Felix Lee
To: gdb@sources.redhat.com
Subject: Re: GDB/XMI (XML Machine Interface)
In-Reply-To: message on Sat, 21 Aug 2004 14:28:52 PDT from Felix Lee
Date: Sun, 22 Aug 2004 02:55:00 -0000
Message-Id: <20040822025527.96F5D511B4A@stray.canids>

Felix Lee:
> Bob Rossi:
> > 1. Have to write a parser. (regex, recursive descent)
> > BTW, I guarantee the parser will have to be updated with every
> > release of GDB.
> so far, I haven't found that xml is any less work than that, and
> it usually feels like a lot more work, but I haven't used xml for
> anything substantial yet, so it may just be unfamiliarity.

here's some elaboration.  this is what I think about xml parsers
today.  please correct me if I'm wrong.

there are two types of xml parsers, stream-based and tree-based.

using an xml stream parser is equivalent to writing a recursive
descent parser.  the stream parser basically just handles the
'tokenization' aspect of parsing xml (which is complicated by
considerations like character encoding, etc.)
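as a rough illustration of the 'tokenization' point, here is a
minimal sketch using python's xml.parsers.expat.  the document and
its element names (frame, func, line) are made up for the example
and are not any real GDB output format.  all the parser reports is
a flat sequence of start/text/end events; it builds no structure.

```python
import xml.parsers.expat

# hypothetical document: these element names are invented for
# illustration and do not come from GDB or any real schema.
DOC = '<frame level="0"><func>main</func><line>42</line></frame>'


def tokenize(text):
    """Return the flat list of events an expat stream parser reports."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # deliver each run of character data as a single chunk
    parser.buffer_text = True
    parser.StartElementHandler = (
        lambda name, attrs: events.append(("start", name, attrs)))
    parser.EndElementHandler = (
        lambda name: events.append(("end", name)))
    parser.CharacterDataHandler = (
        lambda data: events.append(("text", data)))
    parser.Parse(text, True)  # True: this is the final chunk of input
    return events


for event in tokenize(DOC):
    print(event)
```

the events come out in document order with no nesting information
beyond the order of the start and end markers, so working out what
a given piece of text means is left to the code receiving them.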
to read data with an xml stream parser, you have to write
handlers that match the structure of the data you're parsing,
which is not any simpler than writing a recursive descent parser
for some other tree-like data format.

using an xml tree parser is complicated by xml's origin as a
markup language, which introduces issues that aren't particularly
relevant to data representation, but can't easily be ignored.

something like perl's XML::Simple tries to hide the messy details
and give you a natural data structure that corresponds to an xml
document, but there are a few problems that make XML::Simple
unsuitable for data that isn't "simple".

using a more general xml tree parser is harder.  in order to
access the data you want, you either have to walk the document
tree yourself (which is similar to writing a recursive descent
parser) or use XPATH descriptions to locate items in the tree
(which is similar to using regexps).

xml tree parsers also have the disadvantage of needing a lot of
memory.  the estimates are 10x to 30x the size of the xml
document, which puzzles me.  it's not clear to me why you'd need
more than about 2x.  (actually I'd expect more like 0.8x since
xml is redundantly verbose.)

with either stream parsers or tree parsers, if an xml schema
changes, you have to revise your code, unless the change is
careful to make only backward-compatible extensions.
guaranteeing that is hard for nontrivial changes, so people often
screw it up, or they play it safe and define a new schema.  in
either case, old code will often require updating anyway.
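to make the tree-parser and schema-change points concrete, here
is a small sketch using python's xml.etree.ElementTree, which
supports only a limited subset of XPath.  the documents and
element names (stack, frame, func, location) are hypothetical,
chosen for the example.  it pulls out the same items by walking
the tree by hand and by a path expression, then applies both to
a revised layout.

```python
import xml.etree.ElementTree as ET

# hypothetical document: element names are invented for illustration.
DOC = """<stack>
  <frame level="0"><func>main</func><line>42</line></frame>
  <frame level="1"><func>start</func><line>7</line></frame>
</stack>"""

# hypothetical incompatible revision: <func> now sits inside <location>.
DOC_V2 = (DOC.replace("<func>", "<location><func>")
             .replace("</func>", "</func></location>"))


def funcs_by_walking(node):
    """Walk the tree by hand, much as a recursive descent parser would."""
    found = []
    if node.tag == "func":
        found.append(node.text)
    for child in node:
        found.extend(funcs_by_walking(child))
    return found


def funcs_by_path(node):
    """Locate the same items with a fixed path expression."""
    return [el.text for el in node.findall("./frame/func")]


root = ET.fromstring(DOC)
root_v2 = ET.fromstring(DOC_V2)

print(funcs_by_walking(root))     # both approaches agree on the original
print(funcs_by_path(root))
print(funcs_by_walking(root_v2))  # the generic walk still finds <func>
print(funcs_by_path(root_v2))     # the fixed path matches nothing now
```

in this sketch the path lookup is shorter, but it hard-codes the
layout, so the revised document silently yields an empty list.
the hand-written walk survives only because it ignores nesting
entirely, which would not be good enough for data where position
in the tree matters.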