From: "naesten at gmail dot com" <sourceware-bugzilla@sourceware.org>
To: gdb-prs@sourceware.org
Subject: [Bug python/17138] New: C strings, gdb.Value.__str__, and Python 3
Date: Thu, 10 Jul 2014 02:47:00 -0000
Message-ID: <bug-17138-4717@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=17138

            Bug ID: 17138
           Summary: C strings, gdb.Value.__str__, and Python 3
           Product: gdb
           Version: 7.7
            Status: NEW
          Severity: normal
          Priority: P2
         Component: python
          Assignee: unassigned at sourceware dot org
          Reporter: naesten at gmail dot com

I wanted to see how GDB's Python support deals with unusual C strings under Python 3 (using a build of 7.7.1 based on the Debian packaging git).

I should probably start by reminding everyone that Python 3 has changed the rules for strings. In Python 2, the "" syntax and the corresponding function-like class, str(), implicitly refer to byte strings of no particular encoding. In Python 3 they refer to Unicode strings, although internally their code units may be 1, 2, or 4 bytes wide. Python 3 (and 2.6+) provide a b"" syntax and a bytes() type for strings of bytes. These may sometimes resemble text, but they should never be confused with actual text; when they really do represent text, they should be decoded.

Note: the str() calls below are probably all redundant given the use of print(), but I include them for clarity. The parentheses around the argument to print are mandatory in Python 3, because the print statement has been replaced by a built-in function.

The first thing I tried produced what looked like VERY strange results:

(gdb) python print(str(gdb.parse_and_eval('"foo\x80"')))
"foo\302\200"
(gdb)

...until I realized that the escape was presumably being handled by Python, not GDB, and so was treated as referring to U+0080. GDB then encoded that character as UTF-8 (the two bytes 0xC2 0x80, octal \302\200) before parsing the expression, with the obvious results.
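The escape-handling mix-up can be reproduced without a running GDB. The following plain Python 3 sketch shows what happens to the expression text before GDB parses it; it does not call the gdb module. The helper `c_octal_escape` is hypothetical, written only to roughly mimic the octal rendering seen in the session, and the sketch assumes (as described above) that the expression text is encoded as UTF-8.

```python
# Plain Python 3; the gdb module is not needed to see what happens to the
# expression text before GDB parses it.

def c_octal_escape(data: bytes) -> str:
    """Hypothetical helper that roughly mimics how GDB printed the strings
    above: printable ASCII bytes as-is, every other byte as \\ooo octal."""
    out = []
    for b in data:
        if 0x20 <= b < 0x7F:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return '"' + "".join(out) + '"'

# Single backslash: Python interprets \x80 itself, so the expression text
# already contains the code point U+0080 rather than a C escape sequence.
single = '"foo\x80"'
assert len(single) == 6 and single[4] == "\u0080"

# Encoded as UTF-8, U+0080 becomes the two bytes 0xC2 0x80 (octal 302 200).
print(c_octal_escape(single[1:-1].encode("utf-8")))  # "foo\302\200"

# Doubled backslash: the four characters \x80 reach GDB's C parser, which
# turns them into the single byte 0x80 (octal 200).
double = '"foo\\x80"'
assert double[4:8] == "\\x80"
print(c_octal_escape(b"foo\x80"))  # "foo\200"
```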
So next I tried:

(gdb) python print(str(gdb.parse_and_eval('"foo\\x80"')))
"foo\200"
(gdb)

...which looks like GDB just invented UCS-1. I also tried calling functions like len() and bytes() on these char* values, only to find that they were not implemented.

At this point I decided to consult the documentation, which, I discovered, does not mention the __str__() method *anywhere* but does describe a string() method, so I tried that instead:

(gdb) python print(gdb.parse_and_eval('"foo\\x80"').string())
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3: invalid start byte
Error while executing Python code.
(gdb)

At last, results that I can actually understand! (Unhelpful though they may be.)

Are we sure this is the right default? Might it not make more sense to return bytes unless an encoding is specifically requested? At the very least, we should definitely provide a way to get the uninterpreted bytes in a bytes() object for Python 2.6+.
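The "bytes unless an encoding is requested" idea can be sketched in plain Python 3. This is not GDB's API: `c_string_bytes` is a hypothetical helper, and the buffer is a stand-in for target memory, which in a real session might come from something like `gdb.selected_inferior().read_memory(address, length)` (assumed here to yield a bytes-like object). GDB's existing `Value.string()` already accepts optional `encoding` and `errors` arguments, which correspond to the explicit decoding choices at the end of the sketch.

```python
# Plain Python 3 sketch; the memoryview below stands in for raw target memory.

def c_string_bytes(buf) -> bytes:
    """Hypothetical helper: return the bytes of a NUL-terminated C string
    held in any bytes-like buffer, without decoding them."""
    data = bytes(buf)
    nul = data.find(b"\x00")
    return data if nul < 0 else data[:nul]

memory = memoryview(b"foo\x80\x00garbage")
raw = c_string_bytes(memory)
print(raw)  # b'foo\x80'

# Strict UTF-8 decoding fails at the same offset as in the session above.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.start, exc.reason)  # 3 invalid start byte

# Decoding happens only when the caller explicitly picks an encoding/policy.
print(ascii(raw.decode("latin-1")))                  # 'foo\x80'
print(ascii(raw.decode("utf-8", errors="replace")))  # 'foo\ufffd'

# surrogateescape keeps undecodable bytes recoverable: a lossless round trip.
text = raw.decode("utf-8", errors="surrogateescape")
assert text.encode("utf-8", errors="surrogateescape") == raw
```

Keeping the raw bytes as the default and making decoding an explicit step means no information is lost when the target's string is not valid in the host charset.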