From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <archer-return-1825-listarch-archer=sourceware.org@sourceware.org>
Received: (qmail 21064 invoked by alias); 21 Dec 2009 20:06:40 -0000
Mailing-List: contact archer-help@sourceware.org; run by ezmlm
Sender: <archer@sourceware.org>
Precedence: bulk
List-Post: <mailto:archer@sourceware.org>
List-Help: <mailto:archer-help@sourceware.org>
List-Subscribe: <mailto:archer-subscribe@sourceware.org>
List-Id: <archer.sourceware.org>
Received: (qmail 21053 invoked by uid 22791); 21 Dec 2009 20:06:39 -0000
X-SWARE-Spam-Status: No, hits=-2.5 required=5.0
	tests=AWL,BAYES_00,SPF_HELO_PASS,SPF_PASS
X-Spam-Check-By: sourceware.org
From: Tom Tromey <tromey@redhat.com>
To: Phil Muldoon <pmuldoon@redhat.com>
Cc: Project Archer <archer@sourceware.org>
Subject: Re: [patch][rfc] Implement Python lazy strings (PR 10705)
References: <4B2CDABA.4050301@redhat.com>
Reply-To: Tom Tromey <tromey@redhat.com>
Date: Mon, 21 Dec 2009 20:06:00 -0000
In-Reply-To: <4B2CDABA.4050301@redhat.com> (Phil Muldoon's message of "Sat, 19
	Dec 2009 13:52:58 +0000")
Message-ID: <m36380ay7v.fsf@fleche.redhat.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-SW-Source: 2009-q4/txt/msg00127.txt.bz2

>>>>> "Phil" == Phil Muldoon <pmuldoon@redhat.com> writes:

Phil> I wanted to send this patch out before the holidays for initial
Phil> review.  This patch implements Python lazy strings, and also
Phil> alters the Python pretty-printing process to handle them
Phil> accordingly.

This looks quite nice.

I think you should submit the next revision of this patch upstream.  In
general I think that fixes and modifications to Python code should go
upstream if the underlying functionality is already upstream.

Phil> +If the optional @var{encoding} argument is given, it must be a string
Phil> +naming the encoding of the string in the @code{gdb.LazyString}, such as
Phil> +@code{"ascii"}, @code{"iso-8859-6"} or @code{"utf-8"}.  When a lazy
Phil> +string is printed, the @value{GDBN} codec machinery is used to convert
Phil> +the string during printing.  If @var{encoding} is not given, or if
Phil> +@var{encoding} is an empty string, then @var{encoding} is set to
Phil> +NULL.  When @value{GDBN} attempts to print a @code{gdb.LazyString}
Phil> +where @var{encoding} is set to NULL, @value{GDBN} will automatically
Phil> +select the encoding most suitable for the string type.

I think this paragraph should be reorganized a little.  The "When a lazy
string is printed" sentence should probably be in its own paragraph.

Phil> +If the optional @var{length} argument is given, the string will be
Phil> +fetched and encoded to the given length.

This should mention whether the length is in chars or bytes, and what
happens if no length is given.

Phil> +@defivar LazyString length
Phil> +This attribute holds the length of the string.  This attribute is not
Phil> +writable.

Likewise.

Phil> +@defivar LazyString encoding
Phil> +This attribute holds the encoding that will be applied to the string
Phil> +when the string is printed by @value{GDBN}.  This attribute is not
Phil> +writable.

This should be None if there is no encoding.

Phil> +@defivar LazyString type
Phil> +This attribute holds the type that is associated with the string.

This should mention if it is the string's type or the underlying
character type.

Phil> +typedef struct {
Phil> +  PyObject_HEAD
Phil> +  CORE_ADDR address;
Phil> +  char *encoding;
Phil> +  long length;
Phil> +  struct type *type;
Phil> +} lazy_string_object;

I try to put comments on each local field (not HEAD) indicating the
meaning of the field.  This could mirror the docs, more or less.

Phil> +static PyObject *
Phil> +stpy_get_encoding (PyObject *self, void *closure)
Phil> +{
Phil> +  lazy_string_object *self_string = (lazy_string_object *) self;
Phil> +
Phil> +  /* An encoding can be set to NULL by the user, so check before
Phil> +     attempting a Python FromString call.  */
Phil> +  if (self_string->encoding)
Phil> +    return PyString_FromString (self_string->encoding);
Phil> +
Phil> +  return NULL;

I think this should return None if there is no encoding.

Phil> +PyObject *
Phil> +stpy_get_type (PyObject *self, void *closure)
Phil> +{
Phil> +  lazy_string_object *obj = (lazy_string_object *) self;
Phil> +  PyObject *type;
Phil> +
Phil> +  if (obj->type)
Phil> +    type = type_to_type_object (obj->type);
Phil> +  else
Phil> +    return NULL;

I think the lack of a type should be detected and rejected when the
object is constructed.

BTW it isn't generally valid to just return NULL in a Python method.
You must set the error first.

Phil> +PyObject *
Phil> +create_lazy_string_object (CORE_ADDR address, long length,
Phil> +			   const char *encoding, struct type *type)

I think we usually use some py_ name for exported APIs.

Phil> +  str_obj = PyObject_New (lazy_string_object, &lazy_string_object_type);
Phil> +  if (!str_obj)
Phil> +    {
Phil> +      PyErr_SetString (PyExc_MemoryError,
Phil> +		       "Could not allocate lazy string object.");
Phil> +      return NULL;
Phil> +    }

I think if PyObject_New returns NULL, then it has probably already set
the error, and you can simply return NULL without setting it again.

Phil> +  if (address == 0)
Phil> +    {
Phil> +      PyErr_SetString (PyExc_MemoryError,
Phil> +		       "Cannot create a lazy string from a GDB-side string.");
Phil> +      return NULL;
Phil> +    }

Do this check first, before making a new object.
Otherwise, this leaks the new object.

Phil> +  str_obj->address = address;
Phil> +  str_obj->length = length;
Phil> +  if (encoding == NULL || strcmp (encoding, ""))

You probably meant !strcmp here.

Phil> +  type_incref (type);

Upstream doesn't have this stuff FWIW.
Hopefully we'll fix it up next year...

Phil> +  val = value_at_lazy (self_string->type, self_string->address);

I guess this means the lazy string holds the string type, not the
character type.

Phil> +  xfree (((lazy_string_object *) self)->encoding);
Phil> +  type_decref (((lazy_string_object *) self)->type);

Introduce a temporary to avoid duplicating the ugly cast.

Phil> +/* Determine whether the printer object pointed to by OBJ is a
Phil> +   Python lazy string.  */
Phil> +int is_lazy_string (PyObject *result)

Newline after "int".

Probably should have a py_ name for this and for extract_lazy_string.

Phil> +gdb_byte *
Phil> +extract_lazy_string (PyObject *string, struct type
Phil> +		     **str_type, long *length,

Don't line break before "**str_type", that looks weird.

Phil> +  errcode = read_string (value_address (output), *length, width,
Phil> +			 *length, byte_order, &buffer,
Phil> +			 &bytes_read);

This seems fishy.  Doesn't this mean we end up reading the full contents
of the string?

I looked a little and it seems this is what gdb already does.
I guess I am confused about something here.

I suppose this does solve the decoding problem nicely, just not the full
laziness problem.

Also, note that read_string will allocate 'buffer' even on error (not on
exception though).  So this should free buffer if read_string returns an
error.

And, read_string can throw, so you need a try-catch here.

Phil> +	  if (is_lazy_string (py_v))
Phil> +	    {
Phil> +	      output = extract_lazy_string (py_v, &type, &length, &encoding);
Phil> +	      xfree (encoding);
Phil> +	    }
[...]
Phil> +	      fputs_filtered (output, stream);

It isn't ok to print bytes from the target like this.
They have to be converted to the host encoding.

Phil> +static PyObject *
Phil> +valpy_lazy_string (PyObject *self, PyObject *args, PyObject *kw)
[...]
Phil> +  str_obj = create_lazy_string_object (value_address (value), length,
Phil> +				       user_encoding, value_type (value));
Phil> +  if (str_obj)
Phil> +    Py_INCREF (str_obj);

I think create_lazy_string_object returns a new reference.
That means this incref is wrong.

Tom