From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp.polymtl.ca (smtp.polymtl.ca [132.207.4.11]) by sourceware.org (Postfix) with ESMTPS id 6FD3C3858D39 for ; Mon, 15 Nov 2021 02:44:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6FD3C3858D39 Received: from simark.ca (simark.ca [158.69.221.121]) (authenticated bits=0) by smtp.polymtl.ca (8.14.7/8.14.7) with ESMTP id 1AF2i3nv016878 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sun, 14 Nov 2021 21:44:08 -0500 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp.polymtl.ca 1AF2i3nv016878 Received: from [10.0.0.11] (192-222-157-6.qc.cable.ebox.net [192.222.157.6]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by simark.ca (Postfix) with ESMTPSA id 89CC51E940; Sun, 14 Nov 2021 21:44:03 -0500 (EST) Message-ID: <708b4894-2a9f-e10c-466e-45bf355b4587@polymtl.ca> Date: Sun, 14 Nov 2021 21:44:03 -0500 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.3.0 Subject: Re: [PATCHv5 4/4] gdb: handle binary data in 'maint packet' and RemoteTargetConnection.send_packet Content-Language: en-US To: Andrew Burgess , gdb-patches@sourceware.org References: <1e73226e9c0b203e6021b92fb825a4b1610e5181.1634922528.git.andrew.burgess@embecosm.com> From: Simon Marchi In-Reply-To: <1e73226e9c0b203e6021b92fb825a4b1610e5181.1634922528.git.andrew.burgess@embecosm.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Poly-FromMTA: (simark.ca [158.69.221.121]) at Mon, 15 Nov 2021 02:44:03 +0000 X-Spam-Status: No, score=-9.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_INFOUSMEBIZ, NICE_REPLY_A, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SCC_10_SHORT_WORD_LINES, SCC_20_SHORT_WORD_LINES, SCC_35_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Nov 2021 02:44:12 -0000 On 2021-10-22 13:10, Andrew Burgess wrote: > Previous the 'maint packet' command, and the recently added Python API Previous -> Previously? > RemoteTargetConnection.send_packet, could only cope with replies that > contained printable string characters. >=20 > The code implementing these two features treated the reply buffer as a > string, so any null-bytes would terminate the output, and if some > other non-printable character was encountered, GDB would just try to > print it anyway... >=20 > The only way I've found to reproduce this easily is to try and examine > the auxv data, which is binary in nature, here's a GDB session before > my patch: >=20 > (gdb) target remote :54321 > ... > (gdb) maint packet qXfer:auxv:read::0,1000 > sending: "qXfer:auxv:read::0,1000" > received: "l!" > (gdb) >=20 > And after my patch: >=20 > (gdb) target remote :54321 > ... > (gdb) maint packet qXfer:auxv:read::0,1000 > sending: "qXfer:auxv:read::0,1000" > received: "l!\x00\x00\x00\x00\x00\x00\x00\x00\xf0\xfc\xf7\xff\x7f\x00= \x00\x10\x00\x00\x00\x00\x00\x00\x00\xff\xfb\xeb\xbf\x00\x00\x00\x00\x06\= x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x11\x00\x00\x= 00\x00\x00\x00\x00d\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x= 00\x00@\x00@\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x008\x00\x00= \x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x0b\x00\x00\x00\x00\= x00\x00\x00\x07\x00\x00\x00\x00\x00\x00\x00\x00\x10\xfd\xf7\xff\x7f\x00\x= 00\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x09\x0= 0\x00\x00\x00\x00\x00\x00P\x10@\x00\x00\x00\x00\x00\x0b\x00\x00\x00\x00\x= 00\x00\x00\xe8\x03\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x00\x00\x00\x00\x0= 0\xe8\x03\x00\x00\x00\x00\x00\x00\x0d\x00\x00\x00\x00\x00\x00\x00\xe8\x03= \x00\x00\x00\x00\x00\x00\x0e\x00\x00\x00\x00\x00\x00\x00\xe8\x03\x00\x00\= x00\x00\x00\x00\x17\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x= 00\x00\x19\x00\x00\x00\x00\x00\x00\x00y\xba\xff\xff\xff\x7f\x00\x00\x1a\x= 00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x1f\x00\x00\x0= 0\x00\x00\x00\x00\xdf\xef\xff\xff\xff\x7f\x00\x00\x0f\x00\x00\x00\x00\x00= \x00\x00\x89\xba\xff\xff\xff\x7f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\= x00\x00\x00\x00\x00\x00\x00\x00" > (gdb) >=20 > The binary contents of the reply are now printed as escaped hex. >=20 > Similarly, here's the same packet fetched, and printed, using this > Python script: >=20 > inf =3D gdb.selected_inferior() > conn =3D inf.connection > res =3D conn.send_packet("qXfer:auxv:read::0,1000") > print ("Got: %s" % str(res)) >=20 > The GDB session: >=20 > (gdb) source fetch-auxv.py > Got: b'l!\x00\x00\x00\x00\x00\x00\x00\x00\xf0\xfc\xf7\xff\x7f\x00\x00= \x10\x00\x00\x00\x00\x00\x00\x00\xff\xfb\xeb\xbf\x00\x00\x00\x00\x06\x00\= x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x11\x00\x00\x00\x= 00\x00\x00\x00d\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x= 00@\x00@\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x008\x00\x00\x00= \x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x0b\x00\x00\x00\x00\x00\= x00\x00\x07\x00\x00\x00\x00\x00\x00\x00\x00\x10\xfd\xf7\xff\x7f\x00\x00\x= 08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\t\x00\x00\= x00\x00\x00\x00\x00P\x10@\x00\x00\x00\x00\x00\x0b\x00\x00\x00\x00\x00\x00= \x00\xe8\x03\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x00\x00\x00\x00\x00\xe8\= x03\x00\x00\x00\x00\x00\x00\r\x00\x00\x00\x00\x00\x00\x00\xe8\x03\x00\x00= \x00\x00\x00\x00\x0e\x00\x00\x00\x00\x00\x00\x00\xe8\x03\x00\x00\x00\x00\= x00\x00\x17\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x= 19\x00\x00\x00\x00\x00\x00\x00y\xba\xff\xff\xff\x7f\x00\x00\x1a\x00\x00\x= 00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x1f\x00\x00\x00\x00\x0= 0\x00\x00\xdf\xef\xff\xff\xff\x7f\x00\x00\x0f\x00\x00\x00\x00\x00\x00\x00= \x89\xba\xff\xff\xff\x7f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\= x00\x00\x00\x00\x00\x00' >=20 > The testing for this change is pretty weak right now, I just fetch the > auxv data using 'maint packet' and from Python, and check that the two > results match. I also check that the results, when converted to a > string, contains some hex-encoded characters (e.g. \xf0, etc). Reading this made me think that the input side of things should probably use bytes and not str as well. See this example using the X packet to write memory using binary data: inf =3D gdb.selected_inferior() conn =3D inf.connection res =3D conn.send_packet('X555555558028,4:\xff\x03\x02\x01') print(res) The result: (gdb) source s.py=20 [remote] Sending packet: $X555555558028,4:=C3=BF\003\002\001#f4 [remote] Packet received: E01 b'E01' And on GDBserver side: gdbserver: Received too much data from the target. The ff gets interpreted as this unicode character: https://www.fileformat.info/info/unicode/char/00ff/index.htm and encoded in UTF-8 as two bytes (C3 BF). > --- > gdb/python/py-connection.c | 21 ++++---- > gdb/remote.c | 57 ++++++++++++---------= > gdb/remote.h | 10 ++-- > gdb/testsuite/gdb.python/py-send-packet.exp | 33 ++++++++++++ > gdb/testsuite/gdb.python/py-send-packet.py | 28 ++++++++++ > 5 files changed, 112 insertions(+), 37 deletions(-) >=20 > diff --git a/gdb/python/py-connection.c b/gdb/python/py-connection.c > index d282c225e9c..a0bea11694d 100644 > --- a/gdb/python/py-connection.c > +++ b/gdb/python/py-connection.c > @@ -320,22 +320,25 @@ struct py_send_packet_callbacks : public send_rem= ote_packet_callbacks > void sending (const char *args) override > { /* Nothing. */ } > =20 > - /* When the result is returned create a Python string and assign thi= s > - into the result member variable. */ > + /* When the result is returned create a Python object and assign thi= s > + into M_RESULT. If for any reason we can't create a Python object= to > + represent the result then M_RESULT is left untouched (it is > + initialised to reference Py_None), so after this call M_RESULT wi= ll > + either be an object representing the reply, or None. */ > =20 > - void received (const gdb::char_vector &buf) override > + void received (const gdb::char_vector &buf, int bytes) override I was confused about why you needed to pass both BUF and BYTES. The reason is that BUF refers to the whole remote_state::buf vectors, but only a subset of that is valid. In that case, it would perhaps be better to replace the parameter with a gdb::array_view (instead of adding BYTES). > { > - /* m_result is initialized to None, leave it untouched unless we g= ot > - back a useful result. */ > - if (buf.data ()[0] !=3D '\0') > + gdb_assert (bytes >=3D 0); > + > + if (bytes > 0 && buf.data ()[0] !=3D '\0') > { > PyObject *result; > =20 > #ifdef IS_PY3K > - result =3D Py_BuildValue("y#", buf.data (), strlen (buf.data ())); > + result =3D Py_BuildValue("y#", buf.data (), bytes); > #else > - result =3D PyString_FromStringAndSize (buf.data (), > - strlen (buf.data ())); > + //result =3D PyString_FromStringAndSize (buf.data (), bytes); > + result =3D PyByteArray_FromStringAndSize (buf.data (), bytes); Leftover comment. I think that for Python 2, string was the right type actually. Strings were just plain byte sequences. Python 2.6 and 2.7 actually make bytes an alias of str, because they are the same thing: >>> bytes >>> bytes =3D=3D str True You might be able to use PyBytes_FromStringAndSize for both Python 2 and 3 actually. See the note at the top here: https://docs.python.org/2.7/c-api/string.html I didn't test though, as I don't have a Python 2 build on hand. I could make one though if we need to discuss this more. > diff --git a/gdb/testsuite/gdb.python/py-send-packet.py b/gdb/testsuite= /gdb.python/py-send-packet.py > index 35754164a98..226c3e9477f 100644 > --- a/gdb/testsuite/gdb.python/py-send-packet.py > +++ b/gdb/testsuite/gdb.python/py-send-packet.py > @@ -77,5 +77,33 @@ def run_send_packet_test(): > print("Send packet test passed") > =20 > =20 > +# Convert a bytes or bytearray object to a string. This follows the > +# same rules as the 'maint packet' command so that the output from the= > +# two sources can be compared. > +def bytes_to_string(byte_array): > + res =3D "" > + for b in byte_array: > + b =3D int(b) > + if b >=3D 32 and b <=3D 126: > + res =3D res + ("%c" % b) > + else: > + res =3D res + ("\\x%02x" % b) > + return res > + > +# A very simple test for sending the packet that reads the auxv data. > +# We convert the result to a string and expect to find some > +# hex-encoded bytes in the output. This test will only work on > +# targets that actually supply auxv data. > +def run_auxv_send_packet_test(expected_result): > + inf =3D gdb.selected_inferior() > + conn =3D inf.connection > + assert isinstance(conn, gdb.RemoteTargetConnection) > + res =3D conn.send_packet("qXfer:auxv:read::0,1000") Can you assert that type(res) is the right thing, to verify that what we document is true? Following what I said above, it could be assert type(res) is bytes for both Python 2 and 3. Simon