public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Andrew Burgess <aburgess@redhat.com>
To: Simon Marchi <simark@simark.ca>, gdb-patches@sourceware.org
Cc: Pedro Alves <pedro@palves.net>, Simon Marchi <simon.marchi@efficios.com>
Subject: Re: [PATCHv3] gdb: building inferior strings from within GDB
Date: Mon, 05 Jun 2023 13:26:35 +0100	[thread overview]
Message-ID: <878rcy3xj8.fsf@redhat.com> (raw)
In-Reply-To: <d1ff310c-0b20-cc1e-7940-e0a44cf264ee@simark.ca>

Simon Marchi <simark@simark.ca> writes:

> On 5/24/23 10:10, Andrew Burgess wrote:
>> History Of This Patch
>> =====================
>> 
>> This commit aims to address PR gdb/21699.  There have now been a
>> couple of attempts to fix this issue.  Simon originally posted two
>> patches back in 2021:
>> 
>>   https://sourceware.org/pipermail/gdb-patches/2021-July/180894.html
>>   https://sourceware.org/pipermail/gdb-patches/2021-July/180896.html
>> 
>> Before Pedro then posted a version of his own:
>> 
>>   https://sourceware.org/pipermail/gdb-patches/2021-July/180970.html
>> 
>> After this the conversation halted.  Then in 2023 I (Andrew) also took
>> a look at this bug and posted two versions:
>> 
>>   https://sourceware.org/pipermail/gdb-patches/2023-April/198570.html
>>   https://sourceware.org/pipermail/gdb-patches/2023-April/198680.html
>> 
>> The approach taken in my first patch was pretty similar to what Simon
>> originally posted back in 2021.  My second attempt was only a slight
>> variation on the first.
>> 
>> Pedro then pointed out his older patch, and so we arrive at this
>> patch.  The GDB changes here are mostly Pedro's work, but updated by
>> me (Andrew), any mistakes are mine.
>> 
>> The tests here are a combinations of everyone's work, and the commit
>> message is new, but copies bits from everyone's earlier work.
>> 
>> Problem Description
>> ===================
>> 
>> Bug PR gdb/21699 makes the observation that using $_as_string with
>> GDB's printf can cause GDB to print unexpected data from the
>> inferior.  The reproducer is pretty simple:
>> 
>>   #include <stddef.h>
>>   static char arena[100];
>> 
>>   /* Override malloc() so value_coerce_to_target() gets a known
>>      pointer, and we know we"ll see an error if $_as_string() gives
>>      a string that isn't null terminated. */
>>   void
>>   *malloc (size_t size)
>>   {
>>       memset (arena, 'x', sizeof (arena));
>>       if (size > sizeof (arena))
>>           return NULL;
>>       return arena;
>>   }
>> 
>>   int
>>   main ()
>>   {
>>     return 0;
>>   }
>> 
>> And then in a GDB session:
>> 
>>   $ gdb -q test
>>   Reading symbols from /tmp/test...
>>   (gdb) start
>>   Temporary breakpoint 1 at 0x4004c8: file test.c, line 17.
>>   Starting program: /tmp/test
>> 
>>   Temporary breakpoint 1, main () at test.c:17
>>   17        return 0;
>>   (gdb) printf "%s\n", $_as_string("hello")
>>   "hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>   (gdb) quit
>> 
>> The problem above is caused by how value_cstring is used within
>> py-value.c, but once we understand the issue then it turns out that
>> value_cstring is used in an unexpected way in many places within GDB.
>> 
>> Within py-value.c we have a null-terminated C-style string.  We then
>> pass a pointer to this string, along with the length of this
>> string (so not including the null-character) to value_cstring.
>> 
>> In value_cstring GDB allocates an array value of the given character
>> type, and copies in requested number of characters.  However
>> value_cstring does not add a null-character of its own.  This means
>> that the value created by calling value_cstring is only
>> null-terminated if the null-character is included in the passed in
>> length.  In py-value.c this is not the case, and indeed, in most uses
>> of value_cstring, this is not the case.
>> 
>> When GDB tries to print one of these strings the value contents are
>> pushed to the inferior, and then read back as a C-style string, that
>> is, GDB reads inferior memory until it finds a null-terminator.  For
>> the py-value.c case, no null-terminator is pushed into the inferior,
>> so GDB will continue reading inferior memory until a null-terminator
>> is found, with unpredictable results.
>> 
>> Patch Description
>> =================
>> 
>> The first thing this patch does is better define what the arguments
>> for the two function value_cstring and value_string should represent.
>> The comments in the header file are updated to describe whether the
>> length argument should, or should not, include a null-character.
>> Also, the data argument is changed to type gdb_byte.  The functions as
>> they currently exist will handle wide-characters, in which case more
>> than one 'char' would be needed for each character.  As such using
>> gdb_byte seems to make more sense.
>> 
>> To avoid adding casts throughout GDB, I've also added an overload that
>> still takes a 'char *', but asserts that the character type being used
>> is of size '1'.
>> 
>> The value_cstring function is now responsible for adding a null
>> character at the end of the string value it creates.
>> 
>> However, once we start looking at how value_cstring is used, we
>> realise there's another, related, problem.  Not every language's
>> strings are null terminated.  Fortran and Ada strings, for example,
>> are just an array of characters, GDB already has the function
>> value_string which can be used to create such values.
>> 
>> Consider this example using current GDB:
>> 
>>   (gdb) set language ada
>>   (gdb) p $_gdb_setting("arch")
>>   $1 = (97, 117, 116, 111)
>>   (gdb) ptype $
>>   type = array (1 .. 4) of char
>>   (gdb) p $_gdb_maint_setting("test-settings string")
>>   $2 = (0)
>>   (gdb) ptype $
>>   type = array (1 .. 1) of char
>> 
>> This shows two problems, first, the $_gdb_setting and
>> $_gdb_maint_setting functions are calling value_cstring using the
>> builtin_char character, rather than a language appropriate type.  In
>> the first call, the 'arch' case, the value_cstring call doesn't
>> include the null character, so the returned array only contains the
>> expected characters.  But, in the $_gdb_maint_setting example we do
>> end up including the null-character, even though this is not expected
>> for Ada strings.
>> 
>> This commit adds a new language method language_defn::value_string,
>> this function takes a pointer and length and creates a language
>> appropriate value that represents the string.  For C, C++, etc this
>> will be a null-terminated string (by calling value_cstring), and for
>> Fortran and Ada this can be a bounded array of characters with no null
>> terminator.  Additionally, this new language_defn::value_string
>> function is responsible for selecting a language appropriate character
>> type.
>> 
>> After this commit the only calls to value_cstring are from the C
>> expression evaluator and from the default language_defn::value_string.
>> 
>> And the only calls to value_string are from Fortan, Ada, and ObjectC
>> related code.
>> 
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21699
>> 
>> Co-Authored-By: Simon Marchi <simon.marchi@efficios.com>
>> Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
>> Co-Authored-By: Pedro Alves <pedro@palves.net>
>
> This LGTM:
>
> Approved-By: Simon Marchi <simon.marchi@efficios.com>
>
> Just one small question:
>
>> diff --git a/gdb/cli/cli-cmds.c b/gdb/cli/cli-cmds.c
>> index b7b65303a0b..3fa833d596c 100644
>> --- a/gdb/cli/cli-cmds.c
>> +++ b/gdb/cli/cli-cmds.c
>> @@ -2316,11 +2316,9 @@ value_from_setting (const setting &var, struct gdbarch *gdbarch)
>>  	  }
>>  
>>  	if (len > 0)
>> -	  return value_cstring (value, len,
>> -				builtin_type (gdbarch)->builtin_char);
>> +	  return current_language->value_string (gdbarch, value, len);
>>  	else
>> -	  return value_cstring ("", 1,
>> -				builtin_type (gdbarch)->builtin_char);
>> +	  return current_language->value_string (gdbarch, "", 0);
>
> Do we still need the two calls, or can we just do with the first (even
> when len is 0)?  Same idea for str_value_from_setting.

You are right.  I merged these two calls, and the other two in
str_value_from_setting, and pushed this patch.

Thanks,
Andrew


  reply	other threads:[~2023-06-05 12:26 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-03 21:49 [PATCH] gdb: fix missing null-character when using value_cstring Andrew Burgess
2023-04-04 13:58 ` Simon Marchi
2023-04-06 13:20   ` Andrew Burgess
2023-04-11 12:58     ` Pedro Alves
2023-04-12 20:47       ` Andrew Burgess
2023-04-13 11:56         ` Pedro Alves
2023-04-07  6:35 ` [PATCHv2] " Andrew Burgess
2023-05-24 14:10   ` [PATCHv3] gdb: building inferior strings from within GDB Andrew Burgess
2023-05-24 15:42     ` Simon Marchi
2023-06-05 12:26       ` Andrew Burgess [this message]
2023-06-05 17:57         ` Simon Marchi
2023-06-06 15:50           ` Andrew Burgess
2023-06-09 13:41             ` Tom Tromey
2023-06-09 14:20               ` Andrew Burgess

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878rcy3xj8.fsf@redhat.com \
    --to=aburgess@redhat.com \
    --cc=gdb-patches@sourceware.org \
    --cc=pedro@palves.net \
    --cc=simark@simark.ca \
    --cc=simon.marchi@efficios.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).