From: Andrew Burgess <aburgess@redhat.com>
To: gdb-patches@sourceware.org
Subject: Re: [PATCH] gdb/python: make more use of host_string_to_python_string
Date: Thu, 23 Dec 2021 11:35:38 +0000 [thread overview]
Message-ID: <20211223113538.GJ2634@redhat.com> (raw)
In-Reply-To: <20211206181030.2463266-1-aburgess@redhat.com>
* Andrew Burgess <aburgess@redhat.com> [2021-12-06 18:10:30 +0000]:
> We have a function host_string_to_python_string, which is a wrapper
> around PyString_Decode, which, on Python 3, is an alias for
> PyUnicode_Decode.
>
> However, there are a few places where we still call PyUnicode_Decode
> directly.
>
> This commit replaces all uses of PyUnicode_Decode with calls to
> host_string_to_python_string instead.
>
> To make the use of host_string_to_python_string easier, I've added a
> couple of overloads for this function in python-internal.h, these all
> just forward their calls onto the single base implementation. The
> signatures of all host_string_to_python_string overloads are:
>
> gdbpy_ref<> host_string_to_python_string (const char *str, size_t len);
> gdbpy_ref<> host_string_to_python_string (const char *str)
> gdbpy_ref<> host_string_to_python_string (const string_file &str)
>
> For Python 3 PyString_Decode is setup (in python-internal.h) as an
> alias for PyUnicode_Decode, so there should certainly be no user
> visible changes in this case.
>
> For Python 2 this commit will change the behaviour. Previously by
> calling PyUnicode_Decode we would have been returning a Unicode
> object. Now, after calling host_string_to_python_string, we will have
> a str object.
>
> I've checked the GDB documentation, and, as far as I can tell, the
> methods I've touched were all documented as returning a string, or in
> the gdb.Command case, take a string as an argument, and my
> understanding is that for Python 2, string generally means str. So I
> think the new behaviour would be more expected.
>
> A different solution, that would also make things more consistent in
> the Python 2 world, would be to have host_string_to_python_string
> always return a Unicode object. However, I've been reading this page:
>
> https://pythonhosted.org/kitchen/unicode-frustrations.html
>
> item #3 recommends that unicode strings should be converted to str
> objects before being printed (in Python 2). That would mean that
> users should have been adding .encode() calls to the output of the
> routines I've changed in this commit (if they wanted to print that
> output), which is not something I think is made clear from the GDB
> documentation.
> ---
> gdb/python/py-cmd.c | 9 +++------
> gdb/python/py-frame.c | 5 ++---
> gdb/python/py-type.c | 3 +--
> gdb/python/py-utils.c | 5 ++---
> gdb/python/py-value.c | 4 ++--
> gdb/python/python-internal.h | 11 ++++++++++-
> gdb/python/python.c | 4 ++--
> 7 files changed, 22 insertions(+), 19 deletions(-)
>
> diff --git a/gdb/python/py-cmd.c b/gdb/python/py-cmd.c
> index 94608e0bbcf..b2cafeba320 100644
> --- a/gdb/python/py-cmd.c
> +++ b/gdb/python/py-cmd.c
> @@ -120,8 +120,7 @@ cmdpy_function (const char *args, int from_tty, cmd_list_element *command)
>
> if (! args)
> args = "";
> - gdbpy_ref<> argobj (PyUnicode_Decode (args, strlen (args), host_charset (),
> - NULL));
> + gdbpy_ref<> argobj = host_string_to_python_string (args);
> if (argobj == NULL)
> {
> gdbpy_print_stack ();
> @@ -181,8 +180,7 @@ cmdpy_completer_helper (struct cmd_list_element *command,
> return NULL;
> }
>
> - gdbpy_ref<> textobj (PyUnicode_Decode (text, strlen (text), host_charset (),
> - NULL));
> + gdbpy_ref<> textobj = host_string_to_python_string (text);
> if (textobj == NULL)
> error (_("Could not convert argument to Python string."));
>
> @@ -194,8 +192,7 @@ cmdpy_completer_helper (struct cmd_list_element *command,
> }
> else
> {
> - wordobj.reset (PyUnicode_Decode (word, strlen (word), host_charset (),
> - NULL));
> + wordobj = host_string_to_python_string (word);
> if (wordobj == NULL)
> error (_("Could not convert argument to Python string."));
> }
> diff --git a/gdb/python/py-frame.c b/gdb/python/py-frame.c
> index ee57eb10576..b507ff0794f 100644
> --- a/gdb/python/py-frame.c
> +++ b/gdb/python/py-frame.c
> @@ -131,8 +131,7 @@ frapy_name (PyObject *self, PyObject *args)
>
> if (name)
> {
> - result = PyUnicode_Decode (name.get (), strlen (name.get ()),
> - host_charset (), NULL);
> + result = host_string_to_python_string (name.get ()).release ();
> }
> else
> {
> @@ -658,7 +657,7 @@ gdbpy_frame_stop_reason_string (PyObject *self, PyObject *args)
> }
>
> str = unwind_stop_reason_to_string ((enum unwind_stop_reason) reason);
> - return PyUnicode_Decode (str, strlen (str), host_charset (), NULL);
> + return host_string_to_python_string (str).release ();
> }
>
> /* Implements the equality comparison for Frame objects.
> diff --git a/gdb/python/py-type.c b/gdb/python/py-type.c
> index 8b17b70fbe3..a178c6a4ab2 100644
> --- a/gdb/python/py-type.c
> +++ b/gdb/python/py-type.c
> @@ -1033,8 +1033,7 @@ typy_str (PyObject *self)
> GDB_PY_HANDLE_EXCEPTION (except);
> }
>
> - return PyUnicode_Decode (thetype.c_str (), thetype.size (),
> - host_charset (), NULL);
> + return host_string_to_python_string (thetype).release ();
> }
>
> /* Implement the richcompare method. */
> diff --git a/gdb/python/py-utils.c b/gdb/python/py-utils.c
> index 10c4173efcd..2eb1ed2a09e 100644
> --- a/gdb/python/py-utils.c
> +++ b/gdb/python/py-utils.c
> @@ -152,10 +152,9 @@ python_string_to_host_string (PyObject *obj)
> /* Convert a host string to a python string. */
>
> gdbpy_ref<>
> -host_string_to_python_string (const char *str)
> +host_string_to_python_string (const char *str, size_t len)
> {
> - return gdbpy_ref<> (PyString_Decode (str, strlen (str), host_charset (),
> - NULL));
> + return gdbpy_ref<> (PyString_Decode (str, len, host_charset (), nullptr));
> }
>
> /* Return true if OBJ is a Python string or unicode object, false
> diff --git a/gdb/python/py-value.c b/gdb/python/py-value.c
> index c843c2c3072..8bd30729454 100644
> --- a/gdb/python/py-value.c
> +++ b/gdb/python/py-value.c
> @@ -743,7 +743,7 @@ valpy_format_string (PyObject *self, PyObject *args, PyObject *kw)
> GDB_PY_HANDLE_EXCEPTION (except);
> }
>
> - return PyUnicode_Decode (stb.c_str (), stb.size (), host_charset (), NULL);
> + return host_string_to_python_string (stb).release ();
> }
>
> /* A helper function that implements the various cast operators. */
> @@ -1149,7 +1149,7 @@ valpy_str (PyObject *self)
> GDB_PY_HANDLE_EXCEPTION (except);
> }
>
> - return PyUnicode_Decode (stb.c_str (), stb.size (), host_charset (), NULL);
> + return host_string_to_python_string (stb).release ();
> }
>
> /* Implements gdb.Value.is_optimized_out. */
> diff --git a/gdb/python/python-internal.h b/gdb/python/python-internal.h
> index 211833e4b2d..195cb0a1896 100644
> --- a/gdb/python/python-internal.h
> +++ b/gdb/python/python-internal.h
> @@ -22,6 +22,7 @@
>
> #include "extension.h"
> #include "extension-priv.h"
> +#include "ui-file.h"
>
> /* These WITH_* macros are defined by the CPython API checker that
> comes with the Python plugin for GCC. See:
> @@ -715,7 +716,15 @@ gdb::unique_xmalloc_ptr<char> unicode_to_target_string (PyObject *unicode_str);
> gdb::unique_xmalloc_ptr<char> python_string_to_target_string (PyObject *obj);
> gdbpy_ref<> python_string_to_target_python_string (PyObject *obj);
> gdb::unique_xmalloc_ptr<char> python_string_to_host_string (PyObject *obj);
> -gdbpy_ref<> host_string_to_python_string (const char *str);
> +gdbpy_ref<> host_string_to_python_string (const char *str, size_t len);
> +static inline gdbpy_ref<> host_string_to_python_string (const char *str)
> +{
> + return host_string_to_python_string (str, strlen (str));
> +}
> +static inline gdbpy_ref<> host_string_to_python_string (const string_file &str)
> +{
> + return host_string_to_python_string (str.c_str (), str.size ());
> +}
> int gdbpy_is_string (PyObject *obj);
> gdb::unique_xmalloc_ptr<char> gdbpy_obj_to_string (PyObject *obj);
>
> diff --git a/gdb/python/python.c b/gdb/python/python.c
> index 82af012068b..6e85d30ed97 100644
> --- a/gdb/python/python.c
> +++ b/gdb/python/python.c
> @@ -558,7 +558,7 @@ gdbpy_target_charset (PyObject *self, PyObject *args)
> {
> const char *cset = target_charset (python_gdbarch);
>
> - return PyUnicode_Decode (cset, strlen (cset), host_charset (), NULL);
> + return host_string_to_python_string (cset).release ();
> }
I wonder if we are over using 'host_charset ()'? Consider this
session:
gdb) set host-charset UTF-32
(gdb) python print(gdb.target_charset ())
Traceback (most recent call last):
File "<string>", line 1, in <module>
UnicodeDecodeError: 'utf-32-le' codec can't decode bytes in position 0-3: code point not in range(0x110000)
Error while executing Python code.
(gdb)
Because the string we're encoding is not in host_charset, but is an
ascii string from within GDB. I wonder if in situations like this we
should actually be saying:
static PyObject *
gdbpy_target_charset (PyObject *self, PyObject *args)
{
const char *cset = target_charset (python_gdbarch);
return PyUnicode_Decode (cset, strlen (cset), "utf-8", NULL);
}
Then this function should be documented as returning a utf-8 unicode
object.
Though, having said that, I wonder if the real problem is with 'set
host-charset'? Consider:
(gdb) help set host-charset
Set the host character set.
The `host character set' is the one used by the system GDB is running on.
You may only use supersets of ASCII for your host character set; GDB does
not support any others.
To see a list of the character sets GDB supports, type `set host-charset <TAB>'.
It's not clear what a superset of ASCII is, but I would have assumed
it meant a character set where the bytes for any ascii character are
unchanged in the unicode encoding, which, as I understand it, is not
the case for things like utf-32. So maybe superset of ASCII means
something different... or maybe I'm not understanding utf-32
correctly...
Thanks,
Andrew
prev parent reply other threads:[~2021-12-23 11:35 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-06 18:10 Andrew Burgess
2021-12-22 15:08 ` PING: " Andrew Burgess
2021-12-23 2:19 ` Simon Marchi
2021-12-23 11:07 ` Andrew Burgess
2021-12-24 3:10 ` Simon Marchi
2021-12-24 12:20 ` Andrew Burgess
2021-12-27 1:30 ` Simon Marchi
2021-12-23 11:35 ` Andrew Burgess [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211223113538.GJ2634@redhat.com \
--to=aburgess@redhat.com \
--cc=gdb-patches@sourceware.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).