From: "b.r.longbons at gmail dot com" <sourceware-bugzilla@sourceware.org>
To: gdb-prs@sourceware.org
Subject: [Bug python/17138] C strings, gdb.Value.__str__, and Python 3
Date: Thu, 10 Jul 2014 04:59:00 -0000
Message-ID: <bug-17138-4717-N5ZRUxEBMA@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-17138-4717@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=17138

Ben Longbons <b.r.longbons at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |b.r.longbons at gmail dot com

--- Comment #1 from Ben Longbons <b.r.longbons at gmail dot com> ---
Having done a lot of this, I see only a few ways of dealing with
strings:

1. Avoid unicode strings entirely and use 'bytes', steering clear of operations
that differ between versions (in particular indexing, which returns an integer
in python3). Disadvantage: lots of operations are not available on 'bytes' in
python3.
2. Use unicode everywhere, with errors='surrogateescape'. Disadvantages: you
have to put up with lots of whining from the Python community about how you
don't understand strings; and surrogateescape has no C implementation in
python2, so you have to bundle the python version.
3. Use unicode in python3, bytes in python2. Advantage: avoids most of the
language-feature problems. Disadvantage: *lots* of opportunities for subtle
bugs, such as the ones mentioned in this bug report.
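
To make the trade-offs in 1 and 2 concrete, here is a minimal sketch that
assumes Python 3; the byte values are made up for illustration and do not
come from gdb. It shows the indexing difference mentioned in 1 and the
lossless round trip that errors='surrogateescape' gives you in 2.

```python
# Python 3 behaviour. The sample bytes are hypothetical "inferior" data.
raw = b"caf\xe9 \xff"  # not valid UTF-8

# Option 1: stay in bytes, but slice instead of indexing.
first_index = raw[0]    # an int in Python 3 (a length-1 str in Python 2)
first_slice = raw[0:1]  # a length-1 bytes object in both versions

# Option 2: decode with surrogateescape so that undecodable bytes are
# smuggled through as lone surrogates instead of raising an error.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler recovers the original bytes exactly.
round_trip = text.encode("utf-8", errors="surrogateescape")
```

The resulting `text` is not safe to print or to encode without the same
error handler, which is part of what makes 2 awkward in practice.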

You'll note that all the real difficulties occur only in Python3, since it
*insists* that you have absolute knowledge about and control over your users
(this has caused no end of pain for people writing webservers).

Option 3 is what programs end up doing if you don't pay any attention. Option 2
is feasible if you are developing new code with python3 as your primary target
(from __future__ import unicode_literals). Option 1 is the most correct for the
kind of work gdb is doing, but
can be painful without a. But that leads us to:


4. Invent an entire new string type that just DTRT in both python2 and python3.
This *should* be possible as long as everyone duck types. It probably *is* safe
to assume that any unicode string you get (mostly, from python string literals)
is safe to treat as utf-8 (most of them will be ascii anyway), but for the vast
majority of your code, you can just deal with byte strings in whatever encoding
the inferior wants.

In approach 4, all functions that take strings just need to feed them through
the new string factory, so there's not a lot of pain for callers. There is,
however, a problem: you can't use *builtin* operations on strings. In
particular, you can't write '%s %s' % (a, b); you have to write
bstring('%s %s') % (a, b). The best we can do for this case is to see whether
it's possible to make the builtin form always throw an exception, so that at
least such mistakes fail quickly.
Unless maybe we hook in an AST rewriter like py.test does ...
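
A rough sketch of what approach 4 might look like, assuming Python 3.5 or
later (where 'bytes' supports %-formatting). The class below is purely
hypothetical: only the name bstring comes from the text above, and nothing
like it exists in gdb. Treating unicode input as utf-8 and raising from
__str__ are illustrative choices, not a settled design.

```python
class bstring(bytes):
    """Hypothetical byte-string type that also accepts unicode input."""

    def __new__(cls, value=b"", encoding="utf-8"):
        # Assumption: unicode strings (mostly literals) are safe to treat
        # as utf-8; byte strings are kept in whatever encoding they had.
        if isinstance(value, str):
            value = value.encode(encoding, errors="surrogateescape")
        return super().__new__(cls, value)

    def __mod__(self, args):
        # Feed every string argument through the factory before formatting.
        # Mapping-style arguments ('%(name)s') are not handled in this sketch.
        if not isinstance(args, tuple):
            args = (args,)
        converted = tuple(
            bstring(a) if isinstance(a, (str, bytes)) else a for a in args
        )
        return bstring(bytes.__mod__(self, converted))

    def __str__(self):
        # Make the builtin form '%s' % (a, b) fail quickly instead of
        # silently producing "b'...'" text.
        raise TypeError("use bstring('...') % args, not a str format")


a = bstring("caf\u00e9")    # from a unicode literal
b = bstring(b"\xff\xfe")    # raw bytes in some unknown encoding
joined = bstring("%s %s") % (a, b)

try:
    "%s %s" % (a, b)
    builtin_form_failed = False
except TypeError:
    builtin_form_failed = True
```

The obvious cost of raising from __str__ is that print() and str() on such
an object fail too, so real code would need an explicit way to render it.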

If all this seems too complicated:

5. Just stick with python2 forever, and achieve ultimate success by simply
ignoring python3 and unicode.
