From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from simark.ca (simark.ca [158.69.221.121]) by sourceware.org (Postfix) with ESMTPS id 609433858D39 for ; Wed, 24 May 2023 15:42:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 609433858D39 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=simark.ca Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=simark.ca Received: from [10.0.5.195] (unknown [207.219.174.155]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by simark.ca (Postfix) with ESMTPSA id 68E041E0D4; Wed, 24 May 2023 11:42:16 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=simark.ca; s=mail; t=1684942936; bh=srdeNE/4k3cQa6Dnh0p3hdc4185WmG1TPTJLDKJqiFE=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=jhTmpqBTnTjZ4jtxun15vczvXlUW4agjKh/CeYVj32zsuDIOvM/rMFLoxx6uIL1Aa Z6fC/ocVM0m90YCxSwDT84TQl2aiVxOwNTTX8J/LJmoR5F6bukXNLuVU91flPO/UGi rATXnuImW4EfK2PHupH2U/473T9uLXLU9xQ/2MvA= Message-ID: Date: Wed, 24 May 2023 11:42:11 -0400 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0 Subject: Re: [PATCHv3] gdb: building inferior strings from within GDB Content-Language: fr To: Andrew Burgess , gdb-patches@sourceware.org Cc: Pedro Alves , Simon Marchi References: <75628df1fbd72166504c9b2a368170121b86e072.1680849242.git.aburgess@redhat.com> From: Simon Marchi In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-11.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,NICE_REPLY_A,SPF_HELO_PASS,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 5/24/23 10:10, Andrew Burgess wrote: > History Of This Patch > ===================== > > This commit aims to address PR gdb/21699. There have now been a > couple of attempts to fix this issue. Simon originally posted two > patches back in 2021: > > https://sourceware.org/pipermail/gdb-patches/2021-July/180894.html > https://sourceware.org/pipermail/gdb-patches/2021-July/180896.html > > Before Pedro then posted a version of his own: > > https://sourceware.org/pipermail/gdb-patches/2021-July/180970.html > > After this the conversation halted. Then in 2023 I (Andrew) also took > a look at this bug and posted two versions: > > https://sourceware.org/pipermail/gdb-patches/2023-April/198570.html > https://sourceware.org/pipermail/gdb-patches/2023-April/198680.html > > The approach taken in my first patch was pretty similar to what Simon > originally posted back in 2021. My second attempt was only a slight > variation on the first. > > Pedro then pointed out his older patch, and so we arrive at this > patch. The GDB changes here are mostly Pedro's work, but updated by > me (Andrew), any mistakes are mine. > > The tests here are a combinations of everyone's work, and the commit > message is new, but copies bits from everyone's earlier work. > > Problem Description > =================== > > Bug PR gdb/21699 makes the observation that using $_as_string with > GDB's printf can cause GDB to print unexpected data from the > inferior. The reproducer is pretty simple: > > #include > static char arena[100]; > > /* Override malloc() so value_coerce_to_target() gets a known > pointer, and we know we"ll see an error if $_as_string() gives > a string that isn't null terminated. */ > void > *malloc (size_t size) > { > memset (arena, 'x', sizeof (arena)); > if (size > sizeof (arena)) > return NULL; > return arena; > } > > int > main () > { > return 0; > } > > And then in a GDB session: > > $ gdb -q test > Reading symbols from /tmp/test... > (gdb) start > Temporary breakpoint 1 at 0x4004c8: file test.c, line 17. > Starting program: /tmp/test > > Temporary breakpoint 1, main () at test.c:17 > 17 return 0; > (gdb) printf "%s\n", $_as_string("hello") > "hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx > (gdb) quit > > The problem above is caused by how value_cstring is used within > py-value.c, but once we understand the issue then it turns out that > value_cstring is used in an unexpected way in many places within GDB. > > Within py-value.c we have a null-terminated C-style string. We then > pass a pointer to this string, along with the length of this > string (so not including the null-character) to value_cstring. > > In value_cstring GDB allocates an array value of the given character > type, and copies in requested number of characters. However > value_cstring does not add a null-character of its own. This means > that the value created by calling value_cstring is only > null-terminated if the null-character is included in the passed in > length. In py-value.c this is not the case, and indeed, in most uses > of value_cstring, this is not the case. > > When GDB tries to print one of these strings the value contents are > pushed to the inferior, and then read back as a C-style string, that > is, GDB reads inferior memory until it finds a null-terminator. For > the py-value.c case, no null-terminator is pushed into the inferior, > so GDB will continue reading inferior memory until a null-terminator > is found, with unpredictable results. > > Patch Description > ================= > > The first thing this patch does is better define what the arguments > for the two function value_cstring and value_string should represent. > The comments in the header file are updated to describe whether the > length argument should, or should not, include a null-character. > Also, the data argument is changed to type gdb_byte. The functions as > they currently exist will handle wide-characters, in which case more > than one 'char' would be needed for each character. As such using > gdb_byte seems to make more sense. > > To avoid adding casts throughout GDB, I've also added an overload that > still takes a 'char *', but asserts that the character type being used > is of size '1'. > > The value_cstring function is now responsible for adding a null > character at the end of the string value it creates. > > However, once we start looking at how value_cstring is used, we > realise there's another, related, problem. Not every language's > strings are null terminated. Fortran and Ada strings, for example, > are just an array of characters, GDB already has the function > value_string which can be used to create such values. > > Consider this example using current GDB: > > (gdb) set language ada > (gdb) p $_gdb_setting("arch") > $1 = (97, 117, 116, 111) > (gdb) ptype $ > type = array (1 .. 4) of char > (gdb) p $_gdb_maint_setting("test-settings string") > $2 = (0) > (gdb) ptype $ > type = array (1 .. 1) of char > > This shows two problems, first, the $_gdb_setting and > $_gdb_maint_setting functions are calling value_cstring using the > builtin_char character, rather than a language appropriate type. In > the first call, the 'arch' case, the value_cstring call doesn't > include the null character, so the returned array only contains the > expected characters. But, in the $_gdb_maint_setting example we do > end up including the null-character, even though this is not expected > for Ada strings. > > This commit adds a new language method language_defn::value_string, > this function takes a pointer and length and creates a language > appropriate value that represents the string. For C, C++, etc this > will be a null-terminated string (by calling value_cstring), and for > Fortran and Ada this can be a bounded array of characters with no null > terminator. Additionally, this new language_defn::value_string > function is responsible for selecting a language appropriate character > type. > > After this commit the only calls to value_cstring are from the C > expression evaluator and from the default language_defn::value_string. > > And the only calls to value_string are from Fortan, Ada, and ObjectC > related code. > > Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21699 > > Co-Authored-By: Simon Marchi > Co-Authored-By: Andrew Burgess > Co-Authored-By: Pedro Alves This LGTM: Approved-By: Simon Marchi Just one small question: > diff --git a/gdb/cli/cli-cmds.c b/gdb/cli/cli-cmds.c > index b7b65303a0b..3fa833d596c 100644 > --- a/gdb/cli/cli-cmds.c > +++ b/gdb/cli/cli-cmds.c > @@ -2316,11 +2316,9 @@ value_from_setting (const setting &var, struct gdbarch *gdbarch) > } > > if (len > 0) > - return value_cstring (value, len, > - builtin_type (gdbarch)->builtin_char); > + return current_language->value_string (gdbarch, value, len); > else > - return value_cstring ("", 1, > - builtin_type (gdbarch)->builtin_char); > + return current_language->value_string (gdbarch, "", 0); Do we still need the two calls, or can we just do with the first (even when len is 0)? Same idea for str_value_from_setting. Simon