public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] gdb: fix missing null-character when using value_cstring
@ 2023-04-03 21:49 Andrew Burgess
  2023-04-04 13:58 ` Simon Marchi
  2023-04-07  6:35 ` [PATCHv2] " Andrew Burgess
  0 siblings, 2 replies; 14+ messages in thread
From: Andrew Burgess @ 2023-04-03 21:49 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

In PR gdb/21699 an issue was reported with the $_as_string convenience
function.  It was observed that the string returned by this function,
when pushed into the inferior, was not null terminated.

This was causing problems when using the result with GDB's printf
command, as this command relies on the string having been pushed into
the inferior and being null terminated.

The bug includes a simple reproducer:

  #include <stddef.h>
  static char arena[51] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";

  /* Override malloc() so value_coerce_to_target() gets a known pointer, and we
     know we"ll see an error if $_as_string() gives a string that isn't NULL
     terminated. */
  void
  *malloc (size_t size)
  {
      if (size > sizeof (arena))
          return NULL;
      return arena;
  }

  int
  main ()
  {
    return 0;
  }

Then use this in a GDB session like this:

  $ gdb -q test
  Reading symbols from /tmp/test...
  (gdb) start
  Temporary breakpoint 1 at 0x4004c8: file test.c, line 17.
  Starting program: /tmp/test

  Temporary breakpoint 1, main () at test.c:17
  17        return 0;
  (gdb) printf "%s\n", $_as_string("hello")
  "hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  (gdb) quit

The problem is with the GDB function value_cstring, or at least, how
this function is being used.

When $_as_string is called we enter fnpy_call (python/py-function.c),
this calls into Python code.  The Python code returns a result, which
will be a Python string, and then we call convert_value_from_python to
convert the Python string to a GDB value.

In convert_value_from_python (python/py-value.c) we enter the
gdbpy_is_string case (as the result is a string), then we call
python_string_to_target_string, which returns a null terminated C
string.  Next we then make this call:

  value = value_cstring (s.get (), strlen (s.get ()),
                         builtin_type_pychar);

This passes the pointer to the first character 's.get ()' and the
length of the string 'strlen (s.get ())', however, this length does
not include the trailing null character.

If we look at value_cstring (valops.c) we see that an array is created
using the passed in length, and characters are copied into the newly
allocated array value.  This means we do not copy the strings trailing
null character, nor does value_cstring add a trailing null.

Finally, when we use this array value with printf, GDB pushed the
array to the inferior, which mallocs some space based on the array
length, and then copies the array content to the inferior.

As the array doesn't include a trailing null, non is written into the
inferior.  So what we place into the inferior is not a C string, but
is actually an array of characters.

When printf tries to print the value it starts at the address of the
first character and continues until it reaches a null.  When that will
be is undefined, so we may end up printing random garbage.

Now, ignore for a moment that the whole push an array to the inferior
just so we can fetch it in order to print it is clearly crazy.  That's
a problem for another day I think.  The important question here is:
should value_cstring include a null character or not.

Given the function name include 'cstring', which I'm assuming means C
style string, I think that we should be including a trailing null.

Given that, I see two possibilities, either value_cstring can always
add a trailing null, or value_cstring can assert that there is a
trailing null, and the caller is responsible for making sure that the
passed in length includes the null character.

Given we're always passing from a C style string to begin with the
question is really, should the length being passed to value_cstring
include the null, or not include the null?

The only place where we currently include the null in the passed
length is from c_string_operation::evaluate.  Every other use of
value_cstring passes the length excluding the null.

I was tempted to adjust c_string_operation::evaluate to exclude the
null, and then have value_cstring add a trailing null.  However, this
does mean that if, in the future, a use is introduced that incorrectly
includes the trailing null in the passed length, then we are unlikely
to spot immediately - we'd instead create an array with two null
characters at the end.

Alternatively, if we change the requirements of value_cstring so that
we require the passed length includes the trailing null, then we can
assert that this is indeed the case within value_cstring.  Any
incorrect uses in the future will be quickly spotted.

So that's what I did, c_string_operation::evaluate is left unchanged,
but every other use of value_cstring is adjusted with a '+ 1' so that
we include the null within the length being passed.

I've added a header comment to value_cstring (value.h) to describe the
requirements.

Upon testing there were two tests that failed after this fix,
gdb.base/settings.exp and gdb.python/py-mi.exp.  In both of these
cases we end up asking for the type of a character array allocated
through value_cstring.  The length of this array has now increased by
one.  Here's the previous behaviour:

  (gdb) set args abc
  (gdb) p $_gdb_setting("args")
  $1 = "abc"
  (gdb) ptype $1
  type = char [3]
  (gdb)

And here's the modified behaviour:

  (gdb) set args abc
  (gdb) p $_gdb_setting("args")
  $1 = "abc"
  (gdb) ptype $1
  type = char [4]
  (gdb)

Notice the type of $1 changed from an array of length 3 to an array of
length 4.  However, I don't think this is a bad thing, consider:

  char lit[] = "zzz";
  int
  main()
  {
    return 0;
  }

And in GDB:

  (gdb) ptype lit
  type = char [4]
  (gdb)

The null character is considered part of the array, so I think the new
behaviour makes sense.

I've added some new tests that try to exercise this issue in a few
different ways.  At the end of the day though it's all just variations
of the same thing, create a value by calling through value_cstring,
then force GDB to push the value to the inferior, and then treat that
as a C style string and see the lack of null character cause problems.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21699
---
 gdb/cli/cli-cmds.c                       | 17 ++++--
 gdb/guile/scm-math.c                     | 16 ++++--
 gdb/python/py-value.c                    |  8 ++-
 gdb/testsuite/gdb.base/cstring-exprs.c   | 51 ++++++++++++++++++
 gdb/testsuite/gdb.base/cstring-exprs.exp | 68 ++++++++++++++++++++++++
 gdb/testsuite/gdb.base/settings.exp      |  4 +-
 gdb/testsuite/gdb.python/py-mi.exp       |  2 +-
 gdb/valops.c                             |  4 ++
 gdb/value.c                              |  3 +-
 gdb/value.h                              |  4 ++
 10 files changed, 164 insertions(+), 13 deletions(-)
 create mode 100644 gdb/testsuite/gdb.base/cstring-exprs.c
 create mode 100644 gdb/testsuite/gdb.base/cstring-exprs.exp

diff --git a/gdb/cli/cli-cmds.c b/gdb/cli/cli-cmds.c
index 3b1c6a9f4bd..55cca84e7e5 100644
--- a/gdb/cli/cli-cmds.c
+++ b/gdb/cli/cli-cmds.c
@@ -2307,8 +2307,11 @@ value_from_setting (const setting &var, struct gdbarch *gdbarch)
 	  }
 
 	if (len > 0)
-	  return value_cstring (value, len,
-				builtin_type (gdbarch)->builtin_char);
+	  {
+	    /* The +1 here ensures we include the trailing null character.  */
+	    return value_cstring (value, len + 1,
+				  builtin_type (gdbarch)->builtin_char);
+	  }
 	else
 	  return value_cstring ("", 1,
 				builtin_type (gdbarch)->builtin_char);
@@ -2363,7 +2366,8 @@ str_value_from_setting (const setting &var, struct gdbarch *gdbarch)
       {
 	std::string cmd_val = get_setshow_command_value_string (var);
 
-	return value_cstring (cmd_val.c_str (), cmd_val.size (),
+	/* The +1 ensures we include the trailing null character.  */
+	return value_cstring (cmd_val.c_str (), cmd_val.size () + 1,
 			      builtin_type (gdbarch)->builtin_char);
       }
 
@@ -2392,8 +2396,11 @@ str_value_from_setting (const setting &var, struct gdbarch *gdbarch)
 	  }
 
 	if (len > 0)
-	  return value_cstring (value, len,
-				builtin_type (gdbarch)->builtin_char);
+	  {
+	    /* The +1 here ensures we include the trailing null character.  */
+	    return value_cstring (value, len + 1,
+				  builtin_type (gdbarch)->builtin_char);
+	  }
 	else
 	  return value_cstring ("", 1,
 				builtin_type (gdbarch)->builtin_char);
diff --git a/gdb/guile/scm-math.c b/gdb/guile/scm-math.c
index 49fe93b97f8..c7dcfc85f11 100644
--- a/gdb/guile/scm-math.c
+++ b/gdb/guile/scm-math.c
@@ -803,9 +803,19 @@ vlscm_convert_typed_value_from_scheme (const char *func_name,
 					0 /*non-strict*/,
 					&except_scm);
 	      if (s != NULL)
-		value = value_cstring (s.get (), len,
-				       language_string_char_type (language,
-								  gdbarch));
+		{
+		  /* The gdbscm_scm_to_string call doesn't guarantee to add
+		     a tailing null-character, so we need to add our own.  */
+		  gdb::unique_xmalloc_ptr<char> tmp ((char *) xmalloc (len + 1));
+		  memcpy (tmp.get (), s.get (), len);
+		  *(tmp.get () + len) = '\0';
+
+		  /* The +1 here ensures we include the trailing null
+		     character.  */
+		  value = value_cstring (tmp.get (), len + 1,
+					 language_string_char_type (language,
+								    gdbarch));
+		}
 	      else
 		value = NULL;
 	    }
diff --git a/gdb/python/py-value.c b/gdb/python/py-value.c
index 65384c781bc..59b879e5ebc 100644
--- a/gdb/python/py-value.c
+++ b/gdb/python/py-value.c
@@ -1881,8 +1881,12 @@ convert_value_from_python (PyObject *obj)
 	  gdb::unique_xmalloc_ptr<char> s
 	    = python_string_to_target_string (obj);
 	  if (s != NULL)
-	    value = value_cstring (s.get (), strlen (s.get ()),
-				   builtin_type_pychar);
+	    {
+	      /* The +1 here ensures we include the trailing null
+		 character.  */
+	      value = value_cstring (s.get (), strlen (s.get ()) + 1,
+				     builtin_type_pychar);
+	    }
 	}
       else if (PyObject_TypeCheck (obj, &value_object_type))
 	value = ((value_object *) obj)->value->copy ();
diff --git a/gdb/testsuite/gdb.base/cstring-exprs.c b/gdb/testsuite/gdb.base/cstring-exprs.c
new file mode 100644
index 00000000000..8135edd97d4
--- /dev/null
+++ b/gdb/testsuite/gdb.base/cstring-exprs.c
@@ -0,0 +1,51 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2023 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <stddef.h>
+#include <string.h>
+
+/* A memory area used as the malloc memory buffer.  */
+
+static char arena[] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
+
+/* Override malloc().  When GDB tries to push strings into the inferior we
+   always return the same pointer to arena.  This does mean we can't have
+   multiple strings in use at the same time, but that's fine for our basic
+   testing, and this is simpler than using dlsym.  */
+
+void
+*malloc (size_t size)
+{
+  memset (arena, 'X', sizeof (arena));
+  if (size > sizeof (arena))
+    return NULL;
+  return arena;
+}
+
+/* This function is called from GDB.  */
+
+void
+take_string (const char *str)
+{
+  /* Nothing.  */
+}
+
+int
+main (void)
+{
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.base/cstring-exprs.exp b/gdb/testsuite/gdb.base/cstring-exprs.exp
new file mode 100644
index 00000000000..fc2f96affa0
--- /dev/null
+++ b/gdb/testsuite/gdb.base/cstring-exprs.exp
@@ -0,0 +1,68 @@
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test different ways that we can cause GDB to call the value_string
+# function.  This function should create a C style string, i.e. the
+# string should include a null terminating character.
+#
+# If the value created by value_cstring is pushed into the inferior
+# and the null terminating character is missing, then we can get
+# unexpected results.
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" $testfile $srcfile] } {
+    return -1
+}
+
+if {![runto_main]} {
+    return 0
+}
+
+if [allow_python_tests] {
+    # The $_as_string convenience function is implemented in Python.
+    gdb_test {printf "%s\n", $_as_string("aabbcc")} "\"aabbcc\""
+
+    # Create a gdb.Value object for a string.  Take its address (which
+    # forces it into the inferior), and then print the address as a
+    # string.
+    gdb_test_no_output {python addr = gdb.Value("xxyyzz").address}
+    gdb_test {python gdb.execute("x/1s 0x%x" % addr)} \
+	"$hex <arena>:\\s+\"xxyyzz\""
+
+    # Call an inferior function through a gdb.Value object, pass a
+    # string to the inferior function and ensure it arrives correctly.
+    gdb_test "p/x take_string" " = $hex.*"
+    gdb_test_no_output "python func_ptr = gdb.history (0)" \
+	"place address of take_string into Python variable"
+    gdb_test "python func_value = func_ptr.dereference()" ""
+    gdb_breakpoint "take_string"
+    gdb_test {python result = func_value("qqaazz")} \
+	"Breakpoint $decimal, take_string \\(str=$hex <arena> \"qqaazz\"\\) at .*"
+    gdb_test "continue" "Continuing\\."
+}
+
+# Use printf on a string parsed by the C expression parser.
+gdb_test {printf "%s\n", "ddeeff"} "ddeeff"
+
+# Parse a string in the C expression parser, force it into the
+# inferior by taking its address, then print it as a string.
+gdb_test {x/1s &"gghhii"} "$hex <arena>:\\s+\"gghhii\""
+
+# Use $_gdb_setting_str and $_gdb_maint_setting_str to create a string
+# value, and then print using printf, which forces the string into the
+# inferior.
+gdb_test {printf "%s\n", $_gdb_setting_str("arch")} "auto"
+gdb_test {printf "%s\n", $_gdb_maint_setting_str("bfd-sharing")} "on"
diff --git a/gdb/testsuite/gdb.base/settings.exp b/gdb/testsuite/gdb.base/settings.exp
index e54203924ec..9dd3ee174c7 100644
--- a/gdb/testsuite/gdb.base/settings.exp
+++ b/gdb/testsuite/gdb.base/settings.exp
@@ -504,7 +504,9 @@ proc_with_prefix test-enum {} {
     gdb_test_no_output "$set_cmd zzz"
     show_setting "$show_cmd" "zzz" 0 "yyy"
 
-    check_type "test-settings enum" "type = char \\\[3\\\]"
+    # A string literal includes the trailing null character, hence the
+    # array size of four here.
+    check_type "test-settings enum" "type = char \\\[4\\\]"
 
     test_gdb_complete_multiple "$set_cmd " "" "" {
 	"xxx"
diff --git a/gdb/testsuite/gdb.python/py-mi.exp b/gdb/testsuite/gdb.python/py-mi.exp
index 97b8a42f715..c65c83553f1 100644
--- a/gdb/testsuite/gdb.python/py-mi.exp
+++ b/gdb/testsuite/gdb.python/py-mi.exp
@@ -288,7 +288,7 @@ mi_create_dynamic_varobj nstype2 nstype2 ".*" 1 \
   "create nstype2 varobj"
 
 mi_list_varobj_children nstype2 {
-    { {nstype2.<error at 0>} {<error at 0>} 6 {char \[6\]} }
+    { {nstype2.<error at 0>} {<error at 0>} 7 {char \[7\]} }
 } "list children after setting exception flag"
 
 mi_create_varobj me me \
diff --git a/gdb/valops.c b/gdb/valops.c
index d002c9dca9b..305885cc9c6 100644
--- a/gdb/valops.c
+++ b/gdb/valops.c
@@ -1738,9 +1738,13 @@ value_array (int lowbound, int highbound, struct value **elemvec)
   return val;
 }
 
+/* See value.h.  */
+
 struct value *
 value_cstring (const char *ptr, ssize_t len, struct type *char_type)
 {
+  gdb_assert (*(ptr + (len - 1)) == '\0');
+
   struct value *val;
   int lowbound = current_language->string_lower_bound ();
   ssize_t highbound = len / char_type->length ();
diff --git a/gdb/value.c b/gdb/value.c
index eab1933b256..4f1e0d6c53e 100644
--- a/gdb/value.c
+++ b/gdb/value.c
@@ -2044,7 +2044,8 @@ value_of_internalvar (struct gdbarch *gdbarch, struct internalvar *var)
       break;
 
     case INTERNALVAR_STRING:
-      val = value_cstring (var->u.string, strlen (var->u.string),
+      /* The +1 ensures we include the null character.  */
+      val = value_cstring (var->u.string, strlen (var->u.string) + 1,
 			   builtin_type (gdbarch)->builtin_char);
       break;
 
diff --git a/gdb/value.h b/gdb/value.h
index e13f92c397b..a23adda3c65 100644
--- a/gdb/value.h
+++ b/gdb/value.h
@@ -1182,6 +1182,10 @@ class scoped_value_mark
   const struct value *m_value;
 };
 
+/* Create an array value, with element type CHAR_TYPE and length LEN.  The
+   array represents a C style string.  The content for the value is copied
+   from PTR.  The last array element must be a null character.  */
+
 extern struct value *value_cstring (const char *ptr, ssize_t len,
 				    struct type *char_type);
 extern struct value *value_string (const char *ptr, ssize_t len,

base-commit: 60a13bbcdfb0ce008a77563cea0c34c830d7b170
-- 
2.25.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] gdb: fix missing null-character when using value_cstring
  2023-04-03 21:49 [PATCH] gdb: fix missing null-character when using value_cstring Andrew Burgess
@ 2023-04-04 13:58 ` Simon Marchi
  2023-04-06 13:20   ` Andrew Burgess
  2023-04-07  6:35 ` [PATCHv2] " Andrew Burgess
  1 sibling, 1 reply; 14+ messages in thread
From: Simon Marchi @ 2023-04-04 13:58 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches

On 4/3/23 17:49, Andrew Burgess via Gdb-patches wrote:
> In PR gdb/21699 an issue was reported with the $_as_string convenience
> function.  It was observed that the string returned by this function,
> when pushed into the inferior, was not null terminated.
> 
> This was causing problems when using the result with GDB's printf
> command, as this command relies on the string having been pushed into
> the inferior and being null terminated.
> 
> The bug includes a simple reproducer:
> 
>   #include <stddef.h>
>   static char arena[51] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
> 
>   /* Override malloc() so value_coerce_to_target() gets a known pointer, and we
>      know we"ll see an error if $_as_string() gives a string that isn't NULL
>      terminated. */
>   void
>   *malloc (size_t size)
>   {
>       if (size > sizeof (arena))
>           return NULL;
>       return arena;
>   }
> 
>   int
>   main ()
>   {
>     return 0;
>   }
> 
> Then use this in a GDB session like this:
> 
>   $ gdb -q test
>   Reading symbols from /tmp/test...
>   (gdb) start
>   Temporary breakpoint 1 at 0x4004c8: file test.c, line 17.
>   Starting program: /tmp/test
> 
>   Temporary breakpoint 1, main () at test.c:17
>   17        return 0;
>   (gdb) printf "%s\n", $_as_string("hello")
>   "hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>   (gdb) quit
> 
> The problem is with the GDB function value_cstring, or at least, how
> this function is being used.
> 
> When $_as_string is called we enter fnpy_call (python/py-function.c),
> this calls into Python code.  The Python code returns a result, which
> will be a Python string, and then we call convert_value_from_python to
> convert the Python string to a GDB value.
> 
> In convert_value_from_python (python/py-value.c) we enter the
> gdbpy_is_string case (as the result is a string), then we call
> python_string_to_target_string, which returns a null terminated C
> string.  Next we then make this call:
> 
>   value = value_cstring (s.get (), strlen (s.get ()),
>                          builtin_type_pychar);
> 
> This passes the pointer to the first character 's.get ()' and the
> length of the string 'strlen (s.get ())', however, this length does
> not include the trailing null character.
> 
> If we look at value_cstring (valops.c) we see that an array is created
> using the passed in length, and characters are copied into the newly
> allocated array value.  This means we do not copy the strings trailing
> null character, nor does value_cstring add a trailing null.
> 
> Finally, when we use this array value with printf, GDB pushed the
> array to the inferior, which mallocs some space based on the array
> length, and then copies the array content to the inferior.
> 
> As the array doesn't include a trailing null, non is written into the

non -> none

> inferior.  So what we place into the inferior is not a C string, but
> is actually an array of characters.
> 
> When printf tries to print the value it starts at the address of the
> first character and continues until it reaches a null.  When that will
> be is undefined, so we may end up printing random garbage.
> 
> Now, ignore for a moment that the whole push an array to the inferior
> just so we can fetch it in order to print it is clearly crazy.  That's
> a problem for another day I think.  The important question here is:
> should value_cstring include a null character or not.
> 
> Given the function name include 'cstring', which I'm assuming means C
> style string, I think that we should be including a trailing null.
> 
> Given that, I see two possibilities, either value_cstring can always
> add a trailing null, or value_cstring can assert that there is a
> trailing null, and the caller is responsible for making sure that the
> passed in length includes the null character.
> 
> Given we're always passing from a C style string to begin with the
> question is really, should the length being passed to value_cstring
> include the null, or not include the null?
> 
> The only place where we currently include the null in the passed
> length is from c_string_operation::evaluate.  Every other use of
> value_cstring passes the length excluding the null.
> 
> I was tempted to adjust c_string_operation::evaluate to exclude the
> null, and then have value_cstring add a trailing null.  However, this
> does mean that if, in the future, a use is introduced that incorrectly
> includes the trailing null in the passed length, then we are unlikely
> to spot immediately - we'd instead create an array with two null
> characters at the end.o

You can always assert that the last character is not '\0'.

> 
> Alternatively, if we change the requirements of value_cstring so that
> we require the passed length includes the trailing null, then we can
> assert that this is indeed the case within value_cstring.  Any
> incorrect uses in the future will be quickly spotted.
> 
> So that's what I did, c_string_operation::evaluate is left unchanged,
> but every other use of value_cstring is adjusted with a '+ 1' so that
> we include the null within the length being passed.

That sounds counterintuitive to me.  With an API of style pointer +
length, I don't expect the length to include the null terminator.  It
also unnecessarily forces the caller to have a null-terminated version
of the string, which may not always be the case (you might want to call
value_cstring on a subset of an existing string).

I think that:

struct value *
value_cstring (const char *ptr, ssize_t len, struct type *char_type)

should take a length excluding the null terminator, but a null
terminator in the result (its job is to build a C string, and a C string
requires a null terminator at the end).

We can have the following overload, for convenience, for places that
already have a C string but don't already know its length:

struct value *
value_cstring (const char *str, struct type *char_type)
{
  return value_cstring (str, strlen (str), char_type);
}

> I've added a header comment to value_cstring (value.h) to describe the
> requirements.
> 
> Upon testing there were two tests that failed after this fix,
> gdb.base/settings.exp and gdb.python/py-mi.exp.  In both of these
> cases we end up asking for the type of a character array allocated
> through value_cstring.  The length of this array has now increased by
> one.  Here's the previous behaviour:
> 
>   (gdb) set args abc
>   (gdb) p $_gdb_setting("args")
>   $1 = "abc"
>   (gdb) ptype $1
>   type = char [3]
>   (gdb)
> 
> And here's the modified behaviour:
> 
>   (gdb) set args abc
>   (gdb) p $_gdb_setting("args")
>   $1 = "abc"
>   (gdb) ptype $1
>   type = char [4]
>   (gdb)
> 
> Notice the type of $1 changed from an array of length 3 to an array of
> length 4.  However, I don't think this is a bad thing, consider:
> 
>   char lit[] = "zzz";
>   int
>   main()
>   {
>     return 0;
>   }
> 
> And in GDB:
> 
>   (gdb) ptype lit
>   type = char [4]
>   (gdb)
> 
> The null character is considered part of the array, so I think the new
> behaviour makes sense.

Makes sense.

> diff --git a/gdb/testsuite/gdb.base/cstring-exprs.c b/gdb/testsuite/gdb.base/cstring-exprs.c
> new file mode 100644
> index 00000000000..8135edd97d4
> --- /dev/null
> +++ b/gdb/testsuite/gdb.base/cstring-exprs.c
> @@ -0,0 +1,51 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> +   Copyright 2023 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +#include <stddef.h>
> +#include <string.h>
> +
> +/* A memory area used as the malloc memory buffer.  */
> +
> +static char arena[] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
> +
> +/* Override malloc().  When GDB tries to push strings into the inferior we
> +   always return the same pointer to arena.  This does mean we can't have
> +   multiple strings in use at the same time, but that's fine for our basic
> +   testing, and this is simpler than using dlsym.  */
> +
> +void
> +*malloc (size_t size)

The * is on the wrong line.

Simon

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] gdb: fix missing null-character when using value_cstring
  2023-04-04 13:58 ` Simon Marchi
@ 2023-04-06 13:20   ` Andrew Burgess
  2023-04-11 12:58     ` Pedro Alves
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Burgess @ 2023-04-06 13:20 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

Simon Marchi <simark@simark.ca> writes:

> On 4/3/23 17:49, Andrew Burgess via Gdb-patches wrote:
>> In PR gdb/21699 an issue was reported with the $_as_string convenience
>> function.  It was observed that the string returned by this function,
>> when pushed into the inferior, was not null terminated.
>> 
>> This was causing problems when using the result with GDB's printf
>> command, as this command relies on the string having been pushed into
>> the inferior and being null terminated.
>> 
>> The bug includes a simple reproducer:
>> 
>>   #include <stddef.h>
>>   static char arena[51] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
>> 
>>   /* Override malloc() so value_coerce_to_target() gets a known pointer, and we
>>      know we"ll see an error if $_as_string() gives a string that isn't NULL
>>      terminated. */
>>   void
>>   *malloc (size_t size)
>>   {
>>       if (size > sizeof (arena))
>>           return NULL;
>>       return arena;
>>   }
>> 
>>   int
>>   main ()
>>   {
>>     return 0;
>>   }
>> 
>> Then use this in a GDB session like this:
>> 
>>   $ gdb -q test
>>   Reading symbols from /tmp/test...
>>   (gdb) start
>>   Temporary breakpoint 1 at 0x4004c8: file test.c, line 17.
>>   Starting program: /tmp/test
>> 
>>   Temporary breakpoint 1, main () at test.c:17
>>   17        return 0;
>>   (gdb) printf "%s\n", $_as_string("hello")
>>   "hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>   (gdb) quit
>> 
>> The problem is with the GDB function value_cstring, or at least, how
>> this function is being used.
>> 
>> When $_as_string is called we enter fnpy_call (python/py-function.c),
>> this calls into Python code.  The Python code returns a result, which
>> will be a Python string, and then we call convert_value_from_python to
>> convert the Python string to a GDB value.
>> 
>> In convert_value_from_python (python/py-value.c) we enter the
>> gdbpy_is_string case (as the result is a string), then we call
>> python_string_to_target_string, which returns a null terminated C
>> string.  Next we then make this call:
>> 
>>   value = value_cstring (s.get (), strlen (s.get ()),
>>                          builtin_type_pychar);
>> 
>> This passes the pointer to the first character 's.get ()' and the
>> length of the string 'strlen (s.get ())', however, this length does
>> not include the trailing null character.
>> 
>> If we look at value_cstring (valops.c) we see that an array is created
>> using the passed in length, and characters are copied into the newly
>> allocated array value.  This means we do not copy the strings trailing
>> null character, nor does value_cstring add a trailing null.
>> 
>> Finally, when we use this array value with printf, GDB pushed the
>> array to the inferior, which mallocs some space based on the array
>> length, and then copies the array content to the inferior.
>> 
>> As the array doesn't include a trailing null, non is written into the
>
> non -> none
>
>> inferior.  So what we place into the inferior is not a C string, but
>> is actually an array of characters.
>> 
>> When printf tries to print the value it starts at the address of the
>> first character and continues until it reaches a null.  When that will
>> be is undefined, so we may end up printing random garbage.
>> 
>> Now, ignore for a moment that the whole push an array to the inferior
>> just so we can fetch it in order to print it is clearly crazy.  That's
>> a problem for another day I think.  The important question here is:
>> should value_cstring include a null character or not.
>> 
>> Given the function name include 'cstring', which I'm assuming means C
>> style string, I think that we should be including a trailing null.
>> 
>> Given that, I see two possibilities, either value_cstring can always
>> add a trailing null, or value_cstring can assert that there is a
>> trailing null, and the caller is responsible for making sure that the
>> passed in length includes the null character.
>> 
>> Given we're always passing from a C style string to begin with the
>> question is really, should the length being passed to value_cstring
>> include the null, or not include the null?
>> 
>> The only place where we currently include the null in the passed
>> length is from c_string_operation::evaluate.  Every other use of
>> value_cstring passes the length excluding the null.
>> 
>> I was tempted to adjust c_string_operation::evaluate to exclude the
>> null, and then have value_cstring add a trailing null.  However, this
>> does mean that if, in the future, a use is introduced that incorrectly
>> includes the trailing null in the passed length, then we are unlikely
>> to spot immediately - we'd instead create an array with two null
>> characters at the end.o
>
> You can always assert that the last character is not '\0'.

I thought about that, but I worried about strings that might contain
embedded '\0' characters... maybe we just don't care about them.

>
>> 
>> Alternatively, if we change the requirements of value_cstring so that
>> we require the passed length includes the trailing null, then we can
>> assert that this is indeed the case within value_cstring.  Any
>> incorrect uses in the future will be quickly spotted.
>> 
>> So that's what I did, c_string_operation::evaluate is left unchanged,
>> but every other use of value_cstring is adjusted with a '+ 1' so that
>> we include the null within the length being passed.
>
> That sounds counterintuitive to me.  With an API of style pointer +
> length, I don't expect the length to include the null terminator.  It
> also unnecessarily forces the caller to have a null-terminated version
> of the string, which may not always be the case (you might want to call
> value_cstring on a subset of an existing string).
>
> I think that:
>
> struct value *
> value_cstring (const char *ptr, ssize_t len, struct type *char_type)
>
> should take a length excluding the null terminator, but a null
> terminator in the result (its job is to build a C string, and a C string
> requires a null terminator at the end).

This is why writing comments is so important :)

I read it as "build a value* from this C-string", which is why I figured
we can assume there will be a '\0' at the end.

Anyway, I don't really mind either way, just so long as we can get
something that works!  I'll flip this around to the way you suggest and
repost.

Thanks for the feedback,
Andrew

>
> We can have the following overload, for convenience, for places that
> already have a C string but don't already know its length:
>
> struct value *
> value_cstring (const char *str, struct type *char_type)
> {
>   return value_cstring (str, strlen (str), char_type);
> }
>
>> I've added a header comment to value_cstring (value.h) to describe the
>> requirements.
>> 
>> Upon testing there were two tests that failed after this fix,
>> gdb.base/settings.exp and gdb.python/py-mi.exp.  In both of these
>> cases we end up asking for the type of a character array allocated
>> through value_cstring.  The length of this array has now increased by
>> one.  Here's the previous behaviour:
>> 
>>   (gdb) set args abc
>>   (gdb) p $_gdb_setting("args")
>>   $1 = "abc"
>>   (gdb) ptype $1
>>   type = char [3]
>>   (gdb)
>> 
>> And here's the modified behaviour:
>> 
>>   (gdb) set args abc
>>   (gdb) p $_gdb_setting("args")
>>   $1 = "abc"
>>   (gdb) ptype $1
>>   type = char [4]
>>   (gdb)
>> 
>> Notice the type of $1 changed from an array of length 3 to an array of
>> length 4.  However, I don't think this is a bad thing, consider:
>> 
>>   char lit[] = "zzz";
>>   int
>>   main()
>>   {
>>     return 0;
>>   }
>> 
>> And in GDB:
>> 
>>   (gdb) ptype lit
>>   type = char [4]
>>   (gdb)
>> 
>> The null character is considered part of the array, so I think the new
>> behaviour makes sense.
>
> Makes sense.
>
>> diff --git a/gdb/testsuite/gdb.base/cstring-exprs.c b/gdb/testsuite/gdb.base/cstring-exprs.c
>> new file mode 100644
>> index 00000000000..8135edd97d4
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.base/cstring-exprs.c
>> @@ -0,0 +1,51 @@
>> +/* This testcase is part of GDB, the GNU debugger.
>> +
>> +   Copyright 2023 Free Software Foundation, Inc.
>> +
>> +   This program is free software; you can redistribute it and/or modify
>> +   it under the terms of the GNU General Public License as published by
>> +   the Free Software Foundation; either version 3 of the License, or
>> +   (at your option) any later version.
>> +
>> +   This program is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +   GNU General Public License for more details.
>> +
>> +   You should have received a copy of the GNU General Public License
>> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
>> +
>> +#include <stddef.h>
>> +#include <string.h>
>> +
>> +/* A memory area used as the malloc memory buffer.  */
>> +
>> +static char arena[] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
>> +
>> +/* Override malloc().  When GDB tries to push strings into the inferior we
>> +   always return the same pointer to arena.  This does mean we can't have
>> +   multiple strings in use at the same time, but that's fine for our basic
>> +   testing, and this is simpler than using dlsym.  */
>> +
>> +void
>> +*malloc (size_t size)
>
> The * is on the wrong line.
>
> Simon


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCHv2] gdb: fix missing null-character when using value_cstring
  2023-04-03 21:49 [PATCH] gdb: fix missing null-character when using value_cstring Andrew Burgess
  2023-04-04 13:58 ` Simon Marchi
@ 2023-04-07  6:35 ` Andrew Burgess
  2023-05-24 14:10   ` [PATCHv3] gdb: building inferior strings from within GDB Andrew Burgess
  1 sibling, 1 reply; 14+ messages in thread
From: Andrew Burgess @ 2023-04-07  6:35 UTC (permalink / raw)
  To: gdb-patches; +Cc: Simon Marchi, Andrew Burgess

In v2:

 - Updated inline with Simon's suggestion.  value_cstring is now
   responsible for adding the null character, users of value_cstring
   have been updated as required.

---

In PR gdb/21699 an issue was reported with the $_as_string convenience
function.  It was observed that the string returned by this function,
when pushed into the inferior, was not null terminated.

This was causing problems when using the result with GDB's printf
command, as this command relies on the string having been pushed into
the inferior and being null terminated.

The bug includes a simple reproducer:

  #include <stddef.h>
  static char arena[51] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";

  /* Override malloc() so value_coerce_to_target() gets a known pointer, and we
     know we"ll see an error if $_as_string() gives a string that isn't NULL
     terminated. */
  void
  *malloc (size_t size)
  {
      if (size > sizeof (arena))
          return NULL;
      return arena;
  }

  int
  main ()
  {
    return 0;
  }

Then use this in a GDB session like this:

  $ gdb -q test
  Reading symbols from /tmp/test...
  (gdb) start
  Temporary breakpoint 1 at 0x4004c8: file test.c, line 17.
  Starting program: /tmp/test

  Temporary breakpoint 1, main () at test.c:17
  17        return 0;
  (gdb) printf "%s\n", $_as_string("hello")
  "hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  (gdb) quit

The problem is with the GDB function value_cstring, or at least, how
this function is being used.

When $_as_string is called we enter fnpy_call (python/py-function.c),
this calls into Python code.  The Python code returns a result, which
will be a Python string, and then we call convert_value_from_python to
convert the Python string to a GDB value.

In convert_value_from_python (python/py-value.c) we enter the
gdbpy_is_string case (as the result is a string), then we call
python_string_to_target_string, which returns a null terminated C
string.  Next we then make this call:

  value = value_cstring (s.get (), strlen (s.get ()),
                         builtin_type_pychar);

This passes the pointer to the first character 's.get ()' and the
length of the string 'strlen (s.get ())', however, this length does
not include the trailing null character.

If we look at value_cstring (valops.c) we see that an array is created
using the passed in length, and characters are copied into the newly
allocated array value.  This means we do not copy the strings trailing
null character, nor does value_cstring add a trailing null.

Finally, when we use this array value with printf, GDB pushed the
array to the inferior, which mallocs some space based on the array
length, and then copies the array content to the inferior.

As the array doesn't include a trailing null, none is written into the
inferior.  So what we place into the inferior is not a C string, but
is actually an array of characters.

When printf tries to print the value it starts at the address of the
first character and continues until it reaches a null.  When that will
be is undefined, so we may end up printing random garbage.

Now, ignore for a moment that the whole push an array to the inferior
just so we can fetch it in order to print it is clearly crazy.  That's
a problem for another day I think.  The important question here is:
should value_cstring include a null character or not.

Given the function name include 'cstring', which I'm assuming means C
style string, I think that we should be including a trailing null.

Currently, the use of value_cstring in c_string_operation::evaluate
includes the trailing null in the length sent to value_cstring, as do
two uses of value_cstring in str_value_from_setting.  Every other use
of value_cstring does not include the null in the length being sent,
and so is currently broken.

This commit changes value_cstring to add a null character at the end
of the generated array value, and updates c_string_operation::evaluate
and str_value_from_setting so that the length sent no longer includes
the trailing null.

I've added a header comment to value_cstring (value.h) to describe how
value_cstring works.

Upon testing there were two tests that failed after this fix,
gdb.base/settings.exp and gdb.python/py-mi.exp.  In both of these
cases we end up asking for the type of a character array allocated
through value_cstring.  The length of this array has now increased by
one.  Here's the previous behaviour:

  (gdb) set args abc
  (gdb) p $_gdb_setting("args")
  $1 = "abc"
  (gdb) ptype $1
  type = char [3]
  (gdb)

And here's the modified behaviour:

  (gdb) set args abc
  (gdb) p $_gdb_setting("args")
  $1 = "abc"
  (gdb) ptype $1
  type = char [4]
  (gdb)

Notice the type of $1 changed from an array of length 3 to an array of
length 4.  However, I don't think this is a bad thing, consider:

  char lit[] = "zzz";
  int
  main()
  {
    return 0;
  }

And in GDB:

  (gdb) ptype lit
  type = char [4]
  (gdb)

The null character is considered part of the array, so I think the new
behaviour makes sense.

I've added some new tests that try to exercise this issue in a few
different ways.  At the end of the day though it's all just variations
of the same thing, create a value by calling through value_cstring,
then force GDB to push the value to the inferior, and then treat that
as a C style string and see the lack of null character cause problems.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21699
---
 gdb/c-lang.c                             | 13 +++--
 gdb/cli/cli-cmds.c                       |  4 +-
 gdb/testsuite/gdb.base/cstring-exprs.c   | 51 ++++++++++++++++++
 gdb/testsuite/gdb.base/cstring-exprs.exp | 68 ++++++++++++++++++++++++
 gdb/testsuite/gdb.base/settings.exp      |  4 +-
 gdb/testsuite/gdb.python/py-mi.exp       |  2 +-
 gdb/valops.c                             | 12 +++++
 gdb/value.h                              |  6 +++
 8 files changed, 153 insertions(+), 7 deletions(-)
 create mode 100644 gdb/testsuite/gdb.base/cstring-exprs.c
 create mode 100644 gdb/testsuite/gdb.base/cstring-exprs.exp

diff --git a/gdb/c-lang.c b/gdb/c-lang.c
index 6535ab93498..d845f1ad829 100644
--- a/gdb/c-lang.c
+++ b/gdb/c-lang.c
@@ -678,9 +678,16 @@ c_string_operation::evaluate (struct type *expect_type,
 		  obstack_object_size (&output));
 	}
       else
-	result = value_cstring ((const char *) obstack_base (&output),
-				obstack_object_size (&output),
-				type);
+	{
+	  /* We expect the string on the obstack to, at a minimum, contain
+	     a null character.  */
+	  const char *str = (const char *) obstack_base (&output);
+	  gdb_assert (obstack_object_size (&output) >= type->length ());
+	  ssize_t len = obstack_object_size (&output) - type->length ();
+	  for (i = 0; i < type->length (); ++i)
+	    gdb_assert (str[len + i] == '\0');
+	  result = value_cstring (str, len, type);
+	}
     }
   return result;
 }
diff --git a/gdb/cli/cli-cmds.c b/gdb/cli/cli-cmds.c
index 3b1c6a9f4bd..ad22a68a9ec 100644
--- a/gdb/cli/cli-cmds.c
+++ b/gdb/cli/cli-cmds.c
@@ -2310,7 +2310,7 @@ value_from_setting (const setting &var, struct gdbarch *gdbarch)
 	  return value_cstring (value, len,
 				builtin_type (gdbarch)->builtin_char);
 	else
-	  return value_cstring ("", 1,
+	  return value_cstring ("", 0,
 				builtin_type (gdbarch)->builtin_char);
       }
     default:
@@ -2395,7 +2395,7 @@ str_value_from_setting (const setting &var, struct gdbarch *gdbarch)
 	  return value_cstring (value, len,
 				builtin_type (gdbarch)->builtin_char);
 	else
-	  return value_cstring ("", 1,
+	  return value_cstring ("", 0,
 				builtin_type (gdbarch)->builtin_char);
       }
     default:
diff --git a/gdb/testsuite/gdb.base/cstring-exprs.c b/gdb/testsuite/gdb.base/cstring-exprs.c
new file mode 100644
index 00000000000..e541bf0bf3f
--- /dev/null
+++ b/gdb/testsuite/gdb.base/cstring-exprs.c
@@ -0,0 +1,51 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2023 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <stddef.h>
+#include <string.h>
+
+/* A memory area used as the malloc memory buffer.  */
+
+static char arena[] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
+
+/* Override malloc().  When GDB tries to push strings into the inferior we
+   always return the same pointer to arena.  This does mean we can't have
+   multiple strings in use at the same time, but that's fine for our basic
+   testing, and this is simpler than using dlsym.  */
+
+void *
+malloc (size_t size)
+{
+  memset (arena, 'X', sizeof (arena));
+  if (size > sizeof (arena))
+    return NULL;
+  return arena;
+}
+
+/* This function is called from GDB.  */
+
+void
+take_string (const char *str)
+{
+  /* Nothing.  */
+}
+
+int
+main (void)
+{
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.base/cstring-exprs.exp b/gdb/testsuite/gdb.base/cstring-exprs.exp
new file mode 100644
index 00000000000..fc2f96affa0
--- /dev/null
+++ b/gdb/testsuite/gdb.base/cstring-exprs.exp
@@ -0,0 +1,68 @@
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test different ways that we can cause GDB to call the value_string
+# function.  This function should create a C style string, i.e. the
+# string should include a null terminating character.
+#
+# If the value created by value_cstring is pushed into the inferior
+# and the null terminating character is missing, then we can get
+# unexpected results.
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" $testfile $srcfile] } {
+    return -1
+}
+
+if {![runto_main]} {
+    return 0
+}
+
+if [allow_python_tests] {
+    # The $_as_string convenience function is implemented in Python.
+    gdb_test {printf "%s\n", $_as_string("aabbcc")} "\"aabbcc\""
+
+    # Create a gdb.Value object for a string.  Take its address (which
+    # forces it into the inferior), and then print the address as a
+    # string.
+    gdb_test_no_output {python addr = gdb.Value("xxyyzz").address}
+    gdb_test {python gdb.execute("x/1s 0x%x" % addr)} \
+	"$hex <arena>:\\s+\"xxyyzz\""
+
+    # Call an inferior function through a gdb.Value object, pass a
+    # string to the inferior function and ensure it arrives correctly.
+    gdb_test "p/x take_string" " = $hex.*"
+    gdb_test_no_output "python func_ptr = gdb.history (0)" \
+	"place address of take_string into Python variable"
+    gdb_test "python func_value = func_ptr.dereference()" ""
+    gdb_breakpoint "take_string"
+    gdb_test {python result = func_value("qqaazz")} \
+	"Breakpoint $decimal, take_string \\(str=$hex <arena> \"qqaazz\"\\) at .*"
+    gdb_test "continue" "Continuing\\."
+}
+
+# Use printf on a string parsed by the C expression parser.
+gdb_test {printf "%s\n", "ddeeff"} "ddeeff"
+
+# Parse a string in the C expression parser, force it into the
+# inferior by taking its address, then print it as a string.
+gdb_test {x/1s &"gghhii"} "$hex <arena>:\\s+\"gghhii\""
+
+# Use $_gdb_setting_str and $_gdb_maint_setting_str to create a string
+# value, and then print using printf, which forces the string into the
+# inferior.
+gdb_test {printf "%s\n", $_gdb_setting_str("arch")} "auto"
+gdb_test {printf "%s\n", $_gdb_maint_setting_str("bfd-sharing")} "on"
diff --git a/gdb/testsuite/gdb.base/settings.exp b/gdb/testsuite/gdb.base/settings.exp
index e54203924ec..9dd3ee174c7 100644
--- a/gdb/testsuite/gdb.base/settings.exp
+++ b/gdb/testsuite/gdb.base/settings.exp
@@ -504,7 +504,9 @@ proc_with_prefix test-enum {} {
     gdb_test_no_output "$set_cmd zzz"
     show_setting "$show_cmd" "zzz" 0 "yyy"
 
-    check_type "test-settings enum" "type = char \\\[3\\\]"
+    # A string literal includes the trailing null character, hence the
+    # array size of four here.
+    check_type "test-settings enum" "type = char \\\[4\\\]"
 
     test_gdb_complete_multiple "$set_cmd " "" "" {
 	"xxx"
diff --git a/gdb/testsuite/gdb.python/py-mi.exp b/gdb/testsuite/gdb.python/py-mi.exp
index 97b8a42f715..c65c83553f1 100644
--- a/gdb/testsuite/gdb.python/py-mi.exp
+++ b/gdb/testsuite/gdb.python/py-mi.exp
@@ -288,7 +288,7 @@ mi_create_dynamic_varobj nstype2 nstype2 ".*" 1 \
   "create nstype2 varobj"
 
 mi_list_varobj_children nstype2 {
-    { {nstype2.<error at 0>} {<error at 0>} 6 {char \[6\]} }
+    { {nstype2.<error at 0>} {<error at 0>} 7 {char \[7\]} }
 } "list children after setting exception flag"
 
 mi_create_varobj me me \
diff --git a/gdb/valops.c b/gdb/valops.c
index d002c9dca9b..13df292d251 100644
--- a/gdb/valops.c
+++ b/gdb/valops.c
@@ -1738,17 +1738,29 @@ value_array (int lowbound, int highbound, struct value **elemvec)
   return val;
 }
 
+/* See value.h.  */
+
 struct value *
 value_cstring (const char *ptr, ssize_t len, struct type *char_type)
 {
+  gdb_assert (len % char_type->length () == 0);
+
   struct value *val;
   int lowbound = current_language->string_lower_bound ();
   ssize_t highbound = len / char_type->length ();
+
+  /* Add one to HIGHBOUND to leave space for a null character.  */
+  highbound += 1;
+
   struct type *stringtype
     = lookup_array_range_type (char_type, lowbound, highbound + lowbound - 1);
 
   val = value::allocate (stringtype);
+  gdb_assert (val->type ()->length () >= len);
   memcpy (val->contents_raw ().data (), ptr, len);
+
+  /* Fill the null character.  */
+  memset (val->contents_raw ().data () + len, 0, char_type->length ());
   return val;
 }
 
diff --git a/gdb/value.h b/gdb/value.h
index e13f92c397b..86669a9a64b 100644
--- a/gdb/value.h
+++ b/gdb/value.h
@@ -1182,6 +1182,12 @@ class scoped_value_mark
   const struct value *m_value;
 };
 
+/* Create an array value, with element type CHAR_TYPE and length LEN.  The
+   array represents a C style string.  The content for the value is copied
+   from PTR.  This function will add a null-character at the end of the
+   resulting value, so LEN should not include the null from the source
+   string.  */
+
 extern struct value *value_cstring (const char *ptr, ssize_t len,
 				    struct type *char_type);
 extern struct value *value_string (const char *ptr, ssize_t len,

base-commit: 9340f361097963011369c3339f7d28239d2f851b
-- 
2.25.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] gdb: fix missing null-character when using value_cstring
  2023-04-06 13:20   ` Andrew Burgess
@ 2023-04-11 12:58     ` Pedro Alves
  2023-04-12 20:47       ` Andrew Burgess
  0 siblings, 1 reply; 14+ messages in thread
From: Pedro Alves @ 2023-04-11 12:58 UTC (permalink / raw)
  To: Andrew Burgess, Simon Marchi, gdb-patches

On 2023-04-06 2:20 p.m., Andrew Burgess via Gdb-patches wrote:
> Simon Marchi <simark@simark.ca> writes:
> 
>> On 4/3/23 17:49, Andrew Burgess via Gdb-patches wrote:

>>> Alternatively, if we change the requirements of value_cstring so that
>>> we require the passed length includes the trailing null, then we can
>>> assert that this is indeed the case within value_cstring.  Any
>>> incorrect uses in the future will be quickly spotted.
>>>
>>> So that's what I did, c_string_operation::evaluate is left unchanged,
>>> but every other use of value_cstring is adjusted with a '+ 1' so that
>>> we include the null within the length being passed.
>>
>> That sounds counterintuitive to me.  With an API of style pointer +
>> length, I don't expect the length to include the null terminator.  It
>> also unnecessarily forces the caller to have a null-terminated version
>> of the string, which may not always be the case (you might want to call
>> value_cstring on a subset of an existing string).
>>
>> I think that:
>>
>> struct value *
>> value_cstring (const char *ptr, ssize_t len, struct type *char_type)
>>
>> should take a length excluding the null terminator, but a null
>> terminator in the result (its job is to build a C string, and a C string
>> requires a null terminator at the end).
> 
> This is why writing comments is so important :)
> 
> I read it as "build a value* from this C-string", which is why I figured
> we can assume there will be a '\0' at the end.
> 
> Anyway, I don't really mind either way, just so long as we can get
> something that works!  I'll flip this around to the way you suggest and
> repost.
> 

Hmm, this rang a bell.  See this patch and following discussion:

 https://inbox.sourceware.org/gdb-patches/20210713153142.3957666-1-simon.marchi@efficios.com/

Pedro Alves

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] gdb: fix missing null-character when using value_cstring
  2023-04-11 12:58     ` Pedro Alves
@ 2023-04-12 20:47       ` Andrew Burgess
  2023-04-13 11:56         ` Pedro Alves
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Burgess @ 2023-04-12 20:47 UTC (permalink / raw)
  To: Pedro Alves, Simon Marchi, gdb-patches

Pedro Alves <pedro@palves.net> writes:

> On 2023-04-06 2:20 p.m., Andrew Burgess via Gdb-patches wrote:
>> Simon Marchi <simark@simark.ca> writes:
>> 
>>> On 4/3/23 17:49, Andrew Burgess via Gdb-patches wrote:
>
>>>> Alternatively, if we change the requirements of value_cstring so that
>>>> we require the passed length includes the trailing null, then we can
>>>> assert that this is indeed the case within value_cstring.  Any
>>>> incorrect uses in the future will be quickly spotted.
>>>>
>>>> So that's what I did, c_string_operation::evaluate is left unchanged,
>>>> but every other use of value_cstring is adjusted with a '+ 1' so that
>>>> we include the null within the length being passed.
>>>
>>> That sounds counterintuitive to me.  With an API of style pointer +
>>> length, I don't expect the length to include the null terminator.  It
>>> also unnecessarily forces the caller to have a null-terminated version
>>> of the string, which may not always be the case (you might want to call
>>> value_cstring on a subset of an existing string).
>>>
>>> I think that:
>>>
>>> struct value *
>>> value_cstring (const char *ptr, ssize_t len, struct type *char_type)
>>>
>>> should take a length excluding the null terminator, but a null
>>> terminator in the result (its job is to build a C string, and a C string
>>> requires a null terminator at the end).
>> 
>> This is why writing comments is so important :)
>> 
>> I read it as "build a value* from this C-string", which is why I figured
>> we can assume there will be a '\0' at the end.
>> 
>> Anyway, I don't really mind either way, just so long as we can get
>> something that works!  I'll flip this around to the way you suggest and
>> repost.
>> 
>
> Hmm, this rang a bell.  See this patch and following discussion:
>
>  https://inbox.sourceware.org/gdb-patches/20210713153142.3957666-1-simon.marchi@efficios.com/
>

Thanks for pointing that out.

I guess from the age of that patch it's not something you're planning to
get back to anytime soon.  I assume you'd have no objections if I
rebased your patch and merged it with my tests and reposted it?

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] gdb: fix missing null-character when using value_cstring
  2023-04-12 20:47       ` Andrew Burgess
@ 2023-04-13 11:56         ` Pedro Alves
  0 siblings, 0 replies; 14+ messages in thread
From: Pedro Alves @ 2023-04-13 11:56 UTC (permalink / raw)
  To: Andrew Burgess, Simon Marchi, gdb-patches

On 2023-04-12 9:47 p.m., Andrew Burgess wrote:
> Pedro Alves <pedro@palves.net> writes:

>> Hmm, this rang a bell.  See this patch and following discussion:
>>
>>  https://inbox.sourceware.org/gdb-patches/20210713153142.3957666-1-simon.marchi@efficios.com/
>>
> 
> Thanks for pointing that out.
> 
> I guess from the age of that patch it's not something you're planning to
> get back to anytime soon.  I assume you'd have no objections if I
> rebased your patch and merged it with my tests and reposted it?
> 

No objections at all.  Please feel free to use anything from it.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCHv3] gdb: building inferior strings from within GDB
  2023-04-07  6:35 ` [PATCHv2] " Andrew Burgess
@ 2023-05-24 14:10   ` Andrew Burgess
  2023-05-24 15:42     ` Simon Marchi
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Burgess @ 2023-05-24 14:10 UTC (permalink / raw)
  To: gdb-patches; +Cc: Pedro Alves, Simon Marchi, Andrew Burgess, Simon Marchi

History Of This Patch
=====================

This commit aims to address PR gdb/21699.  There have now been a
couple of attempts to fix this issue.  Simon originally posted two
patches back in 2021:

  https://sourceware.org/pipermail/gdb-patches/2021-July/180894.html
  https://sourceware.org/pipermail/gdb-patches/2021-July/180896.html

Before Pedro then posted a version of his own:

  https://sourceware.org/pipermail/gdb-patches/2021-July/180970.html

After this the conversation halted.  Then in 2023 I (Andrew) also took
a look at this bug and posted two versions:

  https://sourceware.org/pipermail/gdb-patches/2023-April/198570.html
  https://sourceware.org/pipermail/gdb-patches/2023-April/198680.html

The approach taken in my first patch was pretty similar to what Simon
originally posted back in 2021.  My second attempt was only a slight
variation on the first.

Pedro then pointed out his older patch, and so we arrive at this
patch.  The GDB changes here are mostly Pedro's work, but updated by
me (Andrew), any mistakes are mine.

The tests here are a combinations of everyone's work, and the commit
message is new, but copies bits from everyone's earlier work.

Problem Description
===================

Bug PR gdb/21699 makes the observation that using $_as_string with
GDB's printf can cause GDB to print unexpected data from the
inferior.  The reproducer is pretty simple:

  #include <stddef.h>
  static char arena[100];

  /* Override malloc() so value_coerce_to_target() gets a known
     pointer, and we know we"ll see an error if $_as_string() gives
     a string that isn't null terminated. */
  void
  *malloc (size_t size)
  {
      memset (arena, 'x', sizeof (arena));
      if (size > sizeof (arena))
          return NULL;
      return arena;
  }

  int
  main ()
  {
    return 0;
  }

And then in a GDB session:

  $ gdb -q test
  Reading symbols from /tmp/test...
  (gdb) start
  Temporary breakpoint 1 at 0x4004c8: file test.c, line 17.
  Starting program: /tmp/test

  Temporary breakpoint 1, main () at test.c:17
  17        return 0;
  (gdb) printf "%s\n", $_as_string("hello")
  "hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  (gdb) quit

The problem above is caused by how value_cstring is used within
py-value.c, but once we understand the issue then it turns out that
value_cstring is used in an unexpected way in many places within GDB.

Within py-value.c we have a null-terminated C-style string.  We then
pass a pointer to this string, along with the length of this
string (so not including the null-character) to value_cstring.

In value_cstring GDB allocates an array value of the given character
type, and copies in requested number of characters.  However
value_cstring does not add a null-character of its own.  This means
that the value created by calling value_cstring is only
null-terminated if the null-character is included in the passed in
length.  In py-value.c this is not the case, and indeed, in most uses
of value_cstring, this is not the case.

When GDB tries to print one of these strings the value contents are
pushed to the inferior, and then read back as a C-style string, that
is, GDB reads inferior memory until it finds a null-terminator.  For
the py-value.c case, no null-terminator is pushed into the inferior,
so GDB will continue reading inferior memory until a null-terminator
is found, with unpredictable results.

Patch Description
=================

The first thing this patch does is better define what the arguments
for the two function value_cstring and value_string should represent.
The comments in the header file are updated to describe whether the
length argument should, or should not, include a null-character.
Also, the data argument is changed to type gdb_byte.  The functions as
they currently exist will handle wide-characters, in which case more
than one 'char' would be needed for each character.  As such using
gdb_byte seems to make more sense.

To avoid adding casts throughout GDB, I've also added an overload that
still takes a 'char *', but asserts that the character type being used
is of size '1'.

The value_cstring function is now responsible for adding a null
character at the end of the string value it creates.

However, once we start looking at how value_cstring is used, we
realise there's another, related, problem.  Not every language's
strings are null terminated.  Fortran and Ada strings, for example,
are just an array of characters, GDB already has the function
value_string which can be used to create such values.

Consider this example using current GDB:

  (gdb) set language ada
  (gdb) p $_gdb_setting("arch")
  $1 = (97, 117, 116, 111)
  (gdb) ptype $
  type = array (1 .. 4) of char
  (gdb) p $_gdb_maint_setting("test-settings string")
  $2 = (0)
  (gdb) ptype $
  type = array (1 .. 1) of char

This shows two problems, first, the $_gdb_setting and
$_gdb_maint_setting functions are calling value_cstring using the
builtin_char character, rather than a language appropriate type.  In
the first call, the 'arch' case, the value_cstring call doesn't
include the null character, so the returned array only contains the
expected characters.  But, in the $_gdb_maint_setting example we do
end up including the null-character, even though this is not expected
for Ada strings.

This commit adds a new language method language_defn::value_string,
this function takes a pointer and length and creates a language
appropriate value that represents the string.  For C, C++, etc this
will be a null-terminated string (by calling value_cstring), and for
Fortran and Ada this can be a bounded array of characters with no null
terminator.  Additionally, this new language_defn::value_string
function is responsible for selecting a language appropriate character
type.

After this commit the only calls to value_cstring are from the C
expression evaluator and from the default language_defn::value_string.

And the only calls to value_string are from Fortan, Ada, and ObjectC
related code.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21699

Co-Authored-By: Simon Marchi <simon.marchi@efficios.com>
Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
Co-Authored-By: Pedro Alves <pedro@palves.net>
---
 gdb/ada-lang.c                                |  13 +
 gdb/c-lang.c                                  |  14 +-
 gdb/cli/cli-cmds.c                            |  16 +-
 gdb/f-lang.c                                  |  10 +
 gdb/f-lang.h                                  |   5 +
 gdb/guile/scm-math.c                          |   4 +-
 gdb/language.c                                |  10 +
 gdb/language.h                                |   6 +
 gdb/python/py-value.c                         |   8 +-
 .../gdb.base/internal-string-values.c         |  32 ++
 .../gdb.base/internal-string-values.exp       | 279 ++++++++++++++++++
 .../gdb.base/print-internal-string.c          |  56 ++++
 .../gdb.base/print-internal-string.exp        |  64 ++++
 gdb/testsuite/gdb.base/settings.exp           |   2 +-
 gdb/testsuite/gdb.python/py-mi.exp            |   2 +-
 gdb/valops.c                                  |  23 +-
 gdb/value.c                                   |   5 +-
 gdb/value.h                                   |  41 ++-
 18 files changed, 546 insertions(+), 44 deletions(-)
 create mode 100644 gdb/testsuite/gdb.base/internal-string-values.c
 create mode 100644 gdb/testsuite/gdb.base/internal-string-values.exp
 create mode 100644 gdb/testsuite/gdb.base/print-internal-string.c
 create mode 100644 gdb/testsuite/gdb.base/print-internal-string.exp

diff --git a/gdb/ada-lang.c b/gdb/ada-lang.c
index 21f3348a161..567db6ddab7 100644
--- a/gdb/ada-lang.c
+++ b/gdb/ada-lang.c
@@ -13526,6 +13526,19 @@ class ada_language : public language_defn
     return symbol->is_artificial ();
   }
 
+  /* See language.h.  */
+  struct value *value_string (struct gdbarch *gdbarch,
+			      const char *ptr, ssize_t len) const override
+  {
+    struct type *type = language_string_char_type (this, gdbarch);
+    value *val = ::value_string (ptr, len, type);
+    /* VAL will be a TYPE_CODE_STRING, but Ada only knows how to print
+       strings that are arrays of characters, so fix the type now.  */
+    gdb_assert (val->type ()->code () == TYPE_CODE_STRING);
+    val->type ()->set_code (TYPE_CODE_ARRAY);
+    return val;
+  }
+
   /* See language.h.  */
   void language_arch_info (struct gdbarch *gdbarch,
 			   struct language_arch_info *lai) const override
diff --git a/gdb/c-lang.c b/gdb/c-lang.c
index 484f81e455b..b71457bd25f 100644
--- a/gdb/c-lang.c
+++ b/gdb/c-lang.c
@@ -652,16 +652,11 @@ c_string_operation::evaluate (struct type *expect_type,
     }
   else
     {
-      int i;
-
-      /* Write the terminating character.  */
-      for (i = 0; i < type->length (); ++i)
-	obstack_1grow (&output, 0);
+      int element_size = type->length ();
 
       if (satisfy_expected)
 	{
 	  LONGEST low_bound, high_bound;
-	  int element_size = type->length ();
 
 	  if (!get_discrete_bounds (expect_type->index_type (),
 				    &low_bound, &high_bound))
@@ -676,10 +671,13 @@ c_string_operation::evaluate (struct type *expect_type,
 	  result = value::allocate (expect_type);
 	  memcpy (result->contents_raw ().data (), obstack_base (&output),
 		  obstack_object_size (&output));
+	  /* Write the terminating character.  */
+	  memset (result->contents_raw ().data () + obstack_object_size (&output),
+		  0, element_size);
 	}
       else
-	result = value_cstring ((const char *) obstack_base (&output),
-				obstack_object_size (&output),
+	result = value_cstring ((const gdb_byte *) obstack_base (&output),
+				obstack_object_size (&output) / element_size,
 				type);
     }
   return result;
diff --git a/gdb/cli/cli-cmds.c b/gdb/cli/cli-cmds.c
index b7b65303a0b..3fa833d596c 100644
--- a/gdb/cli/cli-cmds.c
+++ b/gdb/cli/cli-cmds.c
@@ -2316,11 +2316,9 @@ value_from_setting (const setting &var, struct gdbarch *gdbarch)
 	  }
 
 	if (len > 0)
-	  return value_cstring (value, len,
-				builtin_type (gdbarch)->builtin_char);
+	  return current_language->value_string (gdbarch, value, len);
 	else
-	  return value_cstring ("", 1,
-				builtin_type (gdbarch)->builtin_char);
+	  return current_language->value_string (gdbarch, "", 0);
       }
     default:
       gdb_assert_not_reached ("bad var_type");
@@ -2372,8 +2370,8 @@ str_value_from_setting (const setting &var, struct gdbarch *gdbarch)
       {
 	std::string cmd_val = get_setshow_command_value_string (var);
 
-	return value_cstring (cmd_val.c_str (), cmd_val.size (),
-			      builtin_type (gdbarch)->builtin_char);
+	return current_language->value_string (gdbarch, cmd_val.c_str (),
+					       cmd_val.size ());
       }
 
     case var_string:
@@ -2401,11 +2399,9 @@ str_value_from_setting (const setting &var, struct gdbarch *gdbarch)
 	  }
 
 	if (len > 0)
-	  return value_cstring (value, len,
-				builtin_type (gdbarch)->builtin_char);
+	  return current_language->value_string (gdbarch, value, len);
 	else
-	  return value_cstring ("", 1,
-				builtin_type (gdbarch)->builtin_char);
+	  return current_language->value_string (gdbarch, "", 0);
       }
     default:
       gdb_assert_not_reached ("bad var_type");
diff --git a/gdb/f-lang.c b/gdb/f-lang.c
index 365e0c0489b..7ab2a7b0688 100644
--- a/gdb/f-lang.c
+++ b/gdb/f-lang.c
@@ -101,6 +101,16 @@ f_language::get_encoding (struct type *type)
   return encoding;
 }
 
+/* See language.h.  */
+
+struct value *
+f_language::value_string (struct gdbarch *gdbarch,
+			  const char *ptr, ssize_t len) const
+{
+  struct type *type = language_string_char_type (this, gdbarch);
+  return ::value_string (ptr, len, type);
+}
+
 /* A helper function for the "bound" intrinsics that checks that TYPE
    is an array.  LBOUND_P is true for lower bound; this is used for
    the error message, if any.  */
diff --git a/gdb/f-lang.h b/gdb/f-lang.h
index 673e273d31a..0a29380d31d 100644
--- a/gdb/f-lang.h
+++ b/gdb/f-lang.h
@@ -203,6 +203,11 @@ class f_language : public language_defn
 
   /* See language.h.  */
 
+  struct value *value_string (struct gdbarch *gdbarch,
+			      const char *ptr, ssize_t len) const override;
+
+  /* See language.h.  */
+
   const char *struct_too_deep_ellipsis () const override
   { return "(...)"; }
 
diff --git a/gdb/guile/scm-math.c b/gdb/guile/scm-math.c
index 49fe93b97f8..32595cf0a68 100644
--- a/gdb/guile/scm-math.c
+++ b/gdb/guile/scm-math.c
@@ -803,9 +803,7 @@ vlscm_convert_typed_value_from_scheme (const char *func_name,
 					0 /*non-strict*/,
 					&except_scm);
 	      if (s != NULL)
-		value = value_cstring (s.get (), len,
-				       language_string_char_type (language,
-								  gdbarch));
+		value = language->value_string (gdbarch, s.get (), len);
 	      else
 		value = NULL;
 	    }
diff --git a/gdb/language.c b/gdb/language.c
index f82c5b173b3..c768971be42 100644
--- a/gdb/language.c
+++ b/gdb/language.c
@@ -874,6 +874,16 @@ language_string_char_type (const struct language_defn *la,
 
 /* See language.h.  */
 
+struct value *
+language_defn::value_string (struct gdbarch *gdbarch,
+			     const char *ptr, ssize_t len) const
+{
+  struct type *type = language_string_char_type (this, gdbarch);
+  return value_cstring (ptr, len, type);
+}
+
+/* See language.h.  */
+
 struct type *
 language_bool_type (const struct language_defn *la,
 		    struct gdbarch *gdbarch)
diff --git a/gdb/language.h b/gdb/language.h
index 0ebd4c1d957..e4b1e53b279 100644
--- a/gdb/language.h
+++ b/gdb/language.h
@@ -602,6 +602,12 @@ struct language_defn
   virtual char string_lower_bound () const
   { return c_style_arrays_p () ? 0 : 1; }
 
+  /* Return the LEN characters long string at PTR as a value suitable for
+     this language.  GDBARCH is used to infer the character type.  The
+     default implementation returns a null-terminated C string.  */
+  virtual struct value *value_string (struct gdbarch *gdbarch,
+				      const char *ptr, ssize_t len) const;
+
   /* Returns true if the symbols names should be stored in GDB's data
      structures for minimal/partial/full symbols using their linkage (aka
      mangled) form; false if the symbol names should be demangled first.
diff --git a/gdb/python/py-value.c b/gdb/python/py-value.c
index 6c62820c63b..a518965ae46 100644
--- a/gdb/python/py-value.c
+++ b/gdb/python/py-value.c
@@ -54,9 +54,6 @@
 #define builtin_type_pybool \
   language_bool_type (current_language, gdbpy_enter::get_gdbarch ())
 
-#define builtin_type_pychar \
-  language_string_char_type (current_language, gdbpy_enter::get_gdbarch ())
-
 struct value_object {
   PyObject_HEAD
   struct value_object *next;
@@ -1881,8 +1878,9 @@ convert_value_from_python (PyObject *obj)
 	  gdb::unique_xmalloc_ptr<char> s
 	    = python_string_to_target_string (obj);
 	  if (s != NULL)
-	    value = value_cstring (s.get (), strlen (s.get ()),
-				   builtin_type_pychar);
+	    value
+	      = current_language->value_string (gdbpy_enter::get_gdbarch (),
+						s.get (), strlen (s.get ()));
 	}
       else if (PyObject_TypeCheck (obj, &value_object_type))
 	value = ((value_object *) obj)->value->copy ();
diff --git a/gdb/testsuite/gdb.base/internal-string-values.c b/gdb/testsuite/gdb.base/internal-string-values.c
new file mode 100644
index 00000000000..e89f71707f7
--- /dev/null
+++ b/gdb/testsuite/gdb.base/internal-string-values.c
@@ -0,0 +1,32 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2021-2023 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+static void
+trace_me (void)
+{}
+
+static void
+end (void)
+{}
+
+int
+main (void)
+{
+  trace_me ();
+  end ();
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.base/internal-string-values.exp b/gdb/testsuite/gdb.base/internal-string-values.exp
new file mode 100644
index 00000000000..ddf38a6f482
--- /dev/null
+++ b/gdb/testsuite/gdb.base/internal-string-values.exp
@@ -0,0 +1,279 @@
+# Copyright 2021-2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test that string values are correctly allocated inside GDB when doing
+# various operations that yield strings.
+#
+# The issue that lead to this test was a missing NULL terminator in the
+# C-string values.  We verify that we can print the null terminator of these
+# strings.
+
+load_lib "trace-support.exp"
+load_lib "gdb-guile.exp"
+
+standard_testfile
+
+if {[build_executable "failed to prepare" $testfile $srcfile ]} {
+    return
+}
+
+set user_conv_funcs {$_gdb_setting $_gdb_setting_str}
+set maint_conv_funcs {$_gdb_maint_setting $_gdb_maint_setting_str}
+
+# Add language (LANG) appropriate quotation marks around string STR.
+proc quote_for_lang {lang str} {
+    if {$lang == "fortran"} {
+	return "'$str'"
+    } else {
+	return "\"$str\""
+    }
+}
+
+# Check that the string contained in the convenienced variable $v is
+# EXPECTED_STR.
+#
+# In particular, check that the null terminator is there and that we can't
+# access a character past the end of the string.
+
+proc check_v_string { expected_str } {
+    set len [string length $expected_str]
+
+    for { set i 0 } { $i < $len } { incr i } {
+	set c [string index $expected_str $i]
+	gdb_test "print \$v\[$i\]" "= $::decimal '$c'"
+    }
+
+    # Check that the string ends with a null terminator.
+    gdb_test "print \$v\[$i\]" {= 0 '\\000'}
+
+    # Check that we can't access a character after the end of the string.
+    incr i
+    gdb_test "print \$v\[$i\]" "no such vector element"
+}
+
+# Test with string values made by $_gdb_setting & co.
+
+proc_with_prefix test_setting { } {
+    clean_restart
+
+    # This is an internal GDB implementation detail, but the variable backing
+    # a string setting starts as nullptr (unless explicitly initialized at
+    # startup).  When assigning an empty value, the variable then points to an
+    # empty string.  Test both cases, as it triggers different code paths (in
+    # addition to a non-empty value).
+    #
+    # Use "set trace-user" and "maintenance set test-settings string" as they
+    # are both not initialized at startup.
+    with_test_prefix "user setting" {
+	with_test_prefix "not set" {
+	    foreach_with_prefix conv_func $::user_conv_funcs {
+		gdb_test_no_output "set \$v = ${conv_func}(\"trace-user\")"
+		check_v_string ""
+	    }
+	}
+
+	with_test_prefix "set to empty" {
+	    gdb_test "set trace-user"
+	    foreach_with_prefix conv_func $::user_conv_funcs {
+		gdb_test_no_output "set \$v = ${conv_func}(\"trace-user\")"
+		check_v_string ""
+	    }
+	}
+
+	with_test_prefix "set" {
+	    gdb_test "set trace-user poulet"
+	    foreach_with_prefix conv_func $::user_conv_funcs {
+		gdb_test_no_output {set $v = $_gdb_setting("trace-user")}
+		check_v_string "poulet"
+	    }
+	}
+    }
+
+    with_test_prefix "maintenance setting" {
+	with_test_prefix "not set" {
+	    foreach_with_prefix conv_func $::maint_conv_funcs {
+		gdb_test_no_output \
+		    "set \$v = ${conv_func}(\"test-settings string\")"
+		check_v_string ""
+	    }
+	}
+
+	with_test_prefix "set to empty" {
+	    gdb_test "maintenance set test-settings string"
+	    foreach_with_prefix conv_func $::maint_conv_funcs {
+		gdb_test_no_output \
+		    "set \$v = ${conv_func}(\"test-settings string\")"
+		check_v_string ""
+	    }
+	}
+
+	with_test_prefix "set" {
+	    gdb_test "maintenance set test-settings string perchaude"
+	    foreach_with_prefix conv_func $::maint_conv_funcs {
+		gdb_test_no_output \
+		    "set \$v = ${conv_func}(\"test-settings string\")"
+		check_v_string "perchaude"
+	    }
+	}
+    }
+
+    # Test with a non-string setting, this tests yet another code path.
+    with_test_prefix "integer setting" {
+	gdb_test_no_output {set $v = $_gdb_setting_str("remotetimeout")}
+	check_v_string "2"
+    }
+
+    # Test string values made by $_gdb_setting & co. in all languages.
+    with_test_prefix "all langs" {
+	# Get list of supported languages.
+	set langs [gdb_supported_languages]
+
+	gdb_test "maintenance set test-settings string foo"
+	foreach_with_prefix lang $langs {
+	    gdb_test_no_output "set language $lang"
+
+	    if {$lang == "modula-2"} {
+		# The Modula-2 parser doesn't know how to build a
+		# suitable string expression.
+		gdb_test "print \"foo\"" "strings are not implemented"
+		continue
+	    }
+
+	    if {$lang == "rust"} {
+		# Rust strings are actually structs, without a running
+		# inferior into which the string data can be pushed
+		# GDB can't print anything.
+		gdb_test "print \"foo\"" \
+		    "evaluation of this expression requires the target program to be active"
+		gdb_test "print \$_gdb_maint_setting(\"test-settings string\")" \
+		    "evaluation of this expression requires the target program to be active"
+		continue
+	    }
+
+	    if {$lang == "unknown"} {
+		# Skipped because expression parsing is not supported
+		# for the "unknown" language.  See gdb/28093 for more
+		# details.
+		continue
+	    }
+
+	    set print_output ""
+	    set ptype_output ""
+
+	    set foo_str [quote_for_lang $lang foo]
+	    gdb_test_multiple "print $foo_str" "" {
+		-wrap -re " = (.*)" {
+		    set print_output $expect_out(1,string)
+		    pass $gdb_test_name
+		}
+	    }
+
+	    gdb_test_multiple "ptype $foo_str" "" {
+		-wrap -re " = (.*)" {
+		    set ptype_output $expect_out(1,string)
+		    pass $gdb_test_name
+		}
+	    }
+
+	    set cmd_str [quote_for_lang $lang "test-settings string"]
+	    set ptype_output_re [string_to_regexp $ptype_output]
+	    set print_output_re [string_to_regexp $print_output]
+
+	    foreach_with_prefix conv_func $::maint_conv_funcs {
+		gdb_test "print ${conv_func}($cmd_str)" \
+		    " = $print_output_re"
+		gdb_test "ptype \$" \
+		    " = $ptype_output_re"
+	    }
+	}
+    }
+}
+
+# Test with a string value created by gdb.Value in Python.
+
+proc_with_prefix test_python_value { } {
+    clean_restart
+
+    if {![allow_python_tests]} {
+	untested "skipping test_python_value"
+	return
+    }
+
+    gdb_test_no_output "python gdb.set_convenience_variable(\"v\", \"bar\")" \
+	"set convenience var"
+    check_v_string "bar"
+}
+
+# Test with a string value created by make-value in Guile.
+
+proc_with_prefix test_guile_value { } {
+    clean_restart
+
+    if {![allow_guile_tests]} {
+	untested "skipping test_guile_value"
+	return
+    }
+
+    # We can't set a convenience var from Guile, but we can append to history.
+    # Do that, then transfer to a convenience var with a CLI command.
+    gdb_test_no_output "guile (use-modules (gdb))"
+    gdb_test_multiple "guile (history-append! (make-value \"foo\"))" "make value" {
+	-re -wrap "($::decimal)" {
+	    set histnum $expect_out(1,string)
+	}
+    }
+
+    gdb_test_no_output "set \$v = \$$histnum"
+    check_v_string "foo"
+}
+
+# Test with a string value coming from a string internal var.  The only internal
+# vars of this type, at the time of writing, are $trace_func and $trace_file.
+# They both require inspecting a trace frame.  So if the target is capable start
+# tracing, record one trace frame, and use $trace_func.
+
+proc_with_prefix test_internal_var { } {
+    if {![gdb_trace_common_supports_arch]} {
+	unsupported "arch does not support trace"
+	return
+    }
+
+    clean_restart $::binfile
+
+    if {![runto_main]} {
+	fail "could not run to main"
+	return
+    }
+
+    if {![gdb_target_supports_trace]} {
+	unsupported "target does not support trace"
+	return
+    }
+
+    gdb_breakpoint "end"
+    gdb_test "trace trace_me" "Tracepoint $::decimal at $::hex.*"
+    gdb_test_no_output "tstart"
+    gdb_continue_to_breakpoint "breakpoint at end"
+    gdb_test_no_output "tstop"
+    gdb_test "tfind" "Found trace frame 0, tracepoint $::decimal.*"
+    gdb_test_no_output "set \$v = \$trace_func"
+    gdb_test "tfind none" "No longer looking at any trace frame.*"
+    check_v_string "trace_me"
+}
+
+test_setting
+test_python_value
+test_guile_value
+test_internal_var
diff --git a/gdb/testsuite/gdb.base/print-internal-string.c b/gdb/testsuite/gdb.base/print-internal-string.c
new file mode 100644
index 00000000000..09dab882ae1
--- /dev/null
+++ b/gdb/testsuite/gdb.base/print-internal-string.c
@@ -0,0 +1,56 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2023 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <stddef.h>
+#include <string.h>
+
+/* A memory area used as the malloc memory buffer.  */
+
+static char arena[256];
+
+/* Override malloc().  When GDB tries to push strings into the inferior we
+   will always return the same pointer (to arena).  This does mean we can't
+   have multiple strings in use at the same time, but that's fine for this
+   simple test.  On each malloc call the contents of arena are reset, which
+   should make it more obvious if GDB tried to print memory that it
+   shouldn't.  */
+
+void *
+malloc (size_t size)
+{
+  /* Reset the contents of arena, and ensure there's a null-character at
+     the end just in case GDB tries to print memory that it shouldn't.  */
+  memset (arena, 'X', sizeof (arena));
+  arena [sizeof (arena) - 1] = '\0';
+  if (size > sizeof (arena))
+    return NULL;
+  return arena;
+}
+
+/* This function is called from GDB.  */
+
+void
+take_string (const char *str)
+{
+  /* Nothing.  */
+}
+
+int
+main (void)
+{
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.base/print-internal-string.exp b/gdb/testsuite/gdb.base/print-internal-string.exp
new file mode 100644
index 00000000000..be5b2f38206
--- /dev/null
+++ b/gdb/testsuite/gdb.base/print-internal-string.exp
@@ -0,0 +1,64 @@
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test different ways in which GDB can create a string and then push
+# that string into the inferior before reading it back.  Check that
+# the thing that is read back is correctly interpreted as a string.
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" $testfile $srcfile] } {
+    return -1
+}
+
+if {![runto_main]} {
+    return 0
+}
+
+if [allow_python_tests] {
+    # The $_as_string convenience function is implemented in Python.
+    gdb_test {printf "%s\n", $_as_string("aabbcc")} "\"aabbcc\""
+
+    # Create a gdb.Value object for a string.  Take its address (which
+    # forces it into the inferior), and then print the address as a
+    # string.
+    gdb_test_no_output {python addr = gdb.Value("ccbbaa").address}
+    gdb_test {python gdb.execute("x/1s 0x%x" % addr)} \
+	"$hex <arena>:\\s+\"ccbbaa\""
+
+    # Call an inferior function through a gdb.Value object, pass a
+    # string to the inferior function and ensure it arrives correctly.
+    gdb_test "p/x take_string" " = $hex.*"
+    gdb_test_no_output "python func_ptr = gdb.history (0)" \
+	"place address of take_string into Python variable"
+    gdb_test "python func_value = func_ptr.dereference()" ""
+    gdb_breakpoint "take_string"
+    gdb_test {python result = func_value("qqaazz")} \
+	"Breakpoint $decimal, take_string \\(str=$hex <arena> \"qqaazz\"\\) at .*"
+    gdb_test "continue" "Continuing\\."
+}
+
+# Use printf on a string parsed by the C expression parser.
+gdb_test {printf "%s\n", "ddeeff"} "ddeeff"
+
+# Parse a string in the C expression parser, force it into the
+# inferior by taking its address, then print it as a string.
+gdb_test {x/1s &"gghhii"} "$hex <arena>:\\s+\"gghhii\""
+
+# Use $_gdb_setting_str and $_gdb_maint_setting_str to create a string
+# value, and then print using printf, which forces the string into the
+# inferior.
+gdb_test {printf "%s\n", $_gdb_setting_str("arch")} "auto"
+gdb_test {printf "%s\n", $_gdb_maint_setting_str("bfd-sharing")} "on"
diff --git a/gdb/testsuite/gdb.base/settings.exp b/gdb/testsuite/gdb.base/settings.exp
index ac885d838a1..dc96f85c1bb 100644
--- a/gdb/testsuite/gdb.base/settings.exp
+++ b/gdb/testsuite/gdb.base/settings.exp
@@ -504,7 +504,7 @@ proc_with_prefix test-enum {} {
     gdb_test_no_output "$set_cmd zzz"
     show_setting "$show_cmd" "zzz" 0 "yyy"
 
-    check_type "test-settings enum" "type = char \\\[3\\\]"
+    check_type "test-settings enum" "type = char \\\[4\\\]"
 
     test_gdb_complete_multiple "$set_cmd " "" "" {
 	"xxx"
diff --git a/gdb/testsuite/gdb.python/py-mi.exp b/gdb/testsuite/gdb.python/py-mi.exp
index 97b8a42f715..c65c83553f1 100644
--- a/gdb/testsuite/gdb.python/py-mi.exp
+++ b/gdb/testsuite/gdb.python/py-mi.exp
@@ -288,7 +288,7 @@ mi_create_dynamic_varobj nstype2 nstype2 ".*" 1 \
   "create nstype2 varobj"
 
 mi_list_varobj_children nstype2 {
-    { {nstype2.<error at 0>} {<error at 0>} 6 {char \[6\]} }
+    { {nstype2.<error at 0>} {<error at 0>} 7 {char \[7\]} }
 } "list children after setting exception flag"
 
 mi_create_varobj me me \
diff --git a/gdb/valops.c b/gdb/valops.c
index 22be4805a52..5806003c3b6 100644
--- a/gdb/valops.c
+++ b/gdb/valops.c
@@ -1738,39 +1738,38 @@ value_array (int lowbound, int highbound, struct value **elemvec)
   return val;
 }
 
+/* See value.h.  */
+
 struct value *
-value_cstring (const char *ptr, ssize_t len, struct type *char_type)
+value_cstring (const gdb_byte *ptr, ssize_t count, struct type *char_type)
 {
   struct value *val;
   int lowbound = current_language->string_lower_bound ();
-  ssize_t highbound = len / char_type->length ();
+  ssize_t highbound = count + 1;
   struct type *stringtype
     = lookup_array_range_type (char_type, lowbound, highbound + lowbound - 1);
 
   val = value::allocate (stringtype);
+  ssize_t len = count * char_type->length ();
   memcpy (val->contents_raw ().data (), ptr, len);
+  /* Write the terminating null-character.  */
+  memset (val->contents_raw ().data () + len, 0, char_type->length ());
   return val;
 }
 
-/* Create a value for a string constant by allocating space in the
-   inferior, copying the data into that space, and returning the
-   address with type TYPE_CODE_STRING.  PTR points to the string
-   constant data; LEN is number of characters.
-
-   Note that string types are like array of char types with a lower
-   bound of zero and an upper bound of LEN - 1.  Also note that the
-   string may contain embedded null bytes.  */
+/* See value.h.  */
 
 struct value *
-value_string (const char *ptr, ssize_t len, struct type *char_type)
+value_string (const gdb_byte *ptr, ssize_t count, struct type *char_type)
 {
   struct value *val;
   int lowbound = current_language->string_lower_bound ();
-  ssize_t highbound = len / char_type->length ();
+  ssize_t highbound = count;
   struct type *stringtype
     = lookup_string_range_type (char_type, lowbound, highbound + lowbound - 1);
 
   val = value::allocate (stringtype);
+  ssize_t len = count * char_type->length ();
   memcpy (val->contents_raw ().data (), ptr, len);
   return val;
 }
diff --git a/gdb/value.c b/gdb/value.c
index 980722a6dd7..95178682284 100644
--- a/gdb/value.c
+++ b/gdb/value.c
@@ -2044,8 +2044,9 @@ value_of_internalvar (struct gdbarch *gdbarch, struct internalvar *var)
       break;
 
     case INTERNALVAR_STRING:
-      val = value_cstring (var->u.string, strlen (var->u.string),
-			   builtin_type (gdbarch)->builtin_char);
+      val = current_language->value_string (gdbarch,
+					    var->u.string,
+					    strlen (var->u.string));
       break;
 
     case INTERNALVAR_VALUE:
diff --git a/gdb/value.h b/gdb/value.h
index d042d816409..7a0b74ad6e9 100644
--- a/gdb/value.h
+++ b/gdb/value.h
@@ -1182,11 +1182,48 @@ class scoped_value_mark
   const struct value *m_value;
 };
 
-extern struct value *value_cstring (const char *ptr, ssize_t len,
+/* Create not_lval value representing a NULL-terminated C string.  The
+   resulting value has type TYPE_CODE_ARRAY.  The string passed in should
+   not include embedded null characters.
+
+   PTR points to the string data; COUNT is number of characters (does
+   not include the NULL terminator) pointed to by PTR, each character is of
+   type (and size of) CHAR_TYPE.  */
+
+extern struct value *value_cstring (const gdb_byte *ptr, ssize_t count,
 				    struct type *char_type);
-extern struct value *value_string (const char *ptr, ssize_t len,
+
+/* Specialisation of value_cstring above.  In this case PTR points to
+   single byte characters.  CHAR_TYPE must have a length of 1.  */
+inline struct value *value_cstring (const char *ptr, ssize_t count,
+				    struct type *char_type)
+{
+  gdb_assert (char_type->length () == 1);
+  return value_cstring ((const gdb_byte *) ptr, count, char_type);
+}
+
+/* Create a not_lval value with type TYPE_CODE_STRING, the resulting value
+   has type TYPE_CODE_STRING.
+
+   PTR points to the string data; COUNT is number of characters pointed to
+   by PTR, each character has the type (and size of) CHAR_TYPE.
+
+   Note that string types are like array of char types with a lower bound
+   defined by the language (usually zero or one).  Also the string may
+   contain embedded null characters.  */
+
+extern struct value *value_string (const gdb_byte *ptr, ssize_t count,
 				   struct type *char_type);
 
+/* Specialisation of value_string above.  In this case PTR points to
+   single byte characters.  CHAR_TYPE must have a length of 1.  */
+inline struct value *value_string (const char *ptr, ssize_t count,
+				   struct type *char_type)
+{
+  gdb_assert (char_type->length () == 1);
+  return value_string ((const gdb_byte *) ptr, count, char_type);
+}
+
 extern struct value *value_array (int lowbound, int highbound,
 				  struct value **elemvec);
 

base-commit: e84060b489746d031ed1ec9e7b6b39fdf4b6cfe3
-- 
2.25.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv3] gdb: building inferior strings from within GDB
  2023-05-24 14:10   ` [PATCHv3] gdb: building inferior strings from within GDB Andrew Burgess
@ 2023-05-24 15:42     ` Simon Marchi
  2023-06-05 12:26       ` Andrew Burgess
  0 siblings, 1 reply; 14+ messages in thread
From: Simon Marchi @ 2023-05-24 15:42 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches; +Cc: Pedro Alves, Simon Marchi

On 5/24/23 10:10, Andrew Burgess wrote:
> History Of This Patch
> =====================
> 
> This commit aims to address PR gdb/21699.  There have now been a
> couple of attempts to fix this issue.  Simon originally posted two
> patches back in 2021:
> 
>   https://sourceware.org/pipermail/gdb-patches/2021-July/180894.html
>   https://sourceware.org/pipermail/gdb-patches/2021-July/180896.html
> 
> Before Pedro then posted a version of his own:
> 
>   https://sourceware.org/pipermail/gdb-patches/2021-July/180970.html
> 
> After this the conversation halted.  Then in 2023 I (Andrew) also took
> a look at this bug and posted two versions:
> 
>   https://sourceware.org/pipermail/gdb-patches/2023-April/198570.html
>   https://sourceware.org/pipermail/gdb-patches/2023-April/198680.html
> 
> The approach taken in my first patch was pretty similar to what Simon
> originally posted back in 2021.  My second attempt was only a slight
> variation on the first.
> 
> Pedro then pointed out his older patch, and so we arrive at this
> patch.  The GDB changes here are mostly Pedro's work, but updated by
> me (Andrew), any mistakes are mine.
> 
> The tests here are a combinations of everyone's work, and the commit
> message is new, but copies bits from everyone's earlier work.
> 
> Problem Description
> ===================
> 
> Bug PR gdb/21699 makes the observation that using $_as_string with
> GDB's printf can cause GDB to print unexpected data from the
> inferior.  The reproducer is pretty simple:
> 
>   #include <stddef.h>
>   static char arena[100];
> 
>   /* Override malloc() so value_coerce_to_target() gets a known
>      pointer, and we know we"ll see an error if $_as_string() gives
>      a string that isn't null terminated. */
>   void
>   *malloc (size_t size)
>   {
>       memset (arena, 'x', sizeof (arena));
>       if (size > sizeof (arena))
>           return NULL;
>       return arena;
>   }
> 
>   int
>   main ()
>   {
>     return 0;
>   }
> 
> And then in a GDB session:
> 
>   $ gdb -q test
>   Reading symbols from /tmp/test...
>   (gdb) start
>   Temporary breakpoint 1 at 0x4004c8: file test.c, line 17.
>   Starting program: /tmp/test
> 
>   Temporary breakpoint 1, main () at test.c:17
>   17        return 0;
>   (gdb) printf "%s\n", $_as_string("hello")
>   "hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>   (gdb) quit
> 
> The problem above is caused by how value_cstring is used within
> py-value.c, but once we understand the issue then it turns out that
> value_cstring is used in an unexpected way in many places within GDB.
> 
> Within py-value.c we have a null-terminated C-style string.  We then
> pass a pointer to this string, along with the length of this
> string (so not including the null-character) to value_cstring.
> 
> In value_cstring GDB allocates an array value of the given character
> type, and copies in requested number of characters.  However
> value_cstring does not add a null-character of its own.  This means
> that the value created by calling value_cstring is only
> null-terminated if the null-character is included in the passed in
> length.  In py-value.c this is not the case, and indeed, in most uses
> of value_cstring, this is not the case.
> 
> When GDB tries to print one of these strings the value contents are
> pushed to the inferior, and then read back as a C-style string, that
> is, GDB reads inferior memory until it finds a null-terminator.  For
> the py-value.c case, no null-terminator is pushed into the inferior,
> so GDB will continue reading inferior memory until a null-terminator
> is found, with unpredictable results.
> 
> Patch Description
> =================
> 
> The first thing this patch does is better define what the arguments
> for the two function value_cstring and value_string should represent.
> The comments in the header file are updated to describe whether the
> length argument should, or should not, include a null-character.
> Also, the data argument is changed to type gdb_byte.  The functions as
> they currently exist will handle wide-characters, in which case more
> than one 'char' would be needed for each character.  As such using
> gdb_byte seems to make more sense.
> 
> To avoid adding casts throughout GDB, I've also added an overload that
> still takes a 'char *', but asserts that the character type being used
> is of size '1'.
> 
> The value_cstring function is now responsible for adding a null
> character at the end of the string value it creates.
> 
> However, once we start looking at how value_cstring is used, we
> realise there's another, related, problem.  Not every language's
> strings are null terminated.  Fortran and Ada strings, for example,
> are just an array of characters, GDB already has the function
> value_string which can be used to create such values.
> 
> Consider this example using current GDB:
> 
>   (gdb) set language ada
>   (gdb) p $_gdb_setting("arch")
>   $1 = (97, 117, 116, 111)
>   (gdb) ptype $
>   type = array (1 .. 4) of char
>   (gdb) p $_gdb_maint_setting("test-settings string")
>   $2 = (0)
>   (gdb) ptype $
>   type = array (1 .. 1) of char
> 
> This shows two problems, first, the $_gdb_setting and
> $_gdb_maint_setting functions are calling value_cstring using the
> builtin_char character, rather than a language appropriate type.  In
> the first call, the 'arch' case, the value_cstring call doesn't
> include the null character, so the returned array only contains the
> expected characters.  But, in the $_gdb_maint_setting example we do
> end up including the null-character, even though this is not expected
> for Ada strings.
> 
> This commit adds a new language method language_defn::value_string,
> this function takes a pointer and length and creates a language
> appropriate value that represents the string.  For C, C++, etc this
> will be a null-terminated string (by calling value_cstring), and for
> Fortran and Ada this can be a bounded array of characters with no null
> terminator.  Additionally, this new language_defn::value_string
> function is responsible for selecting a language appropriate character
> type.
> 
> After this commit the only calls to value_cstring are from the C
> expression evaluator and from the default language_defn::value_string.
> 
> And the only calls to value_string are from Fortan, Ada, and ObjectC
> related code.
> 
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21699
> 
> Co-Authored-By: Simon Marchi <simon.marchi@efficios.com>
> Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
> Co-Authored-By: Pedro Alves <pedro@palves.net>

This LGTM:

Approved-By: Simon Marchi <simon.marchi@efficios.com>

Just one small question:

> diff --git a/gdb/cli/cli-cmds.c b/gdb/cli/cli-cmds.c
> index b7b65303a0b..3fa833d596c 100644
> --- a/gdb/cli/cli-cmds.c
> +++ b/gdb/cli/cli-cmds.c
> @@ -2316,11 +2316,9 @@ value_from_setting (const setting &var, struct gdbarch *gdbarch)
>  	  }
>  
>  	if (len > 0)
> -	  return value_cstring (value, len,
> -				builtin_type (gdbarch)->builtin_char);
> +	  return current_language->value_string (gdbarch, value, len);
>  	else
> -	  return value_cstring ("", 1,
> -				builtin_type (gdbarch)->builtin_char);
> +	  return current_language->value_string (gdbarch, "", 0);

Do we still need the two calls, or can we just do with the first (even
when len is 0)?  Same idea for str_value_from_setting.

Simon

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv3] gdb: building inferior strings from within GDB
  2023-05-24 15:42     ` Simon Marchi
@ 2023-06-05 12:26       ` Andrew Burgess
  2023-06-05 17:57         ` Simon Marchi
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Burgess @ 2023-06-05 12:26 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches; +Cc: Pedro Alves, Simon Marchi

Simon Marchi <simark@simark.ca> writes:

> On 5/24/23 10:10, Andrew Burgess wrote:
>> History Of This Patch
>> =====================
>> 
>> This commit aims to address PR gdb/21699.  There have now been a
>> couple of attempts to fix this issue.  Simon originally posted two
>> patches back in 2021:
>> 
>>   https://sourceware.org/pipermail/gdb-patches/2021-July/180894.html
>>   https://sourceware.org/pipermail/gdb-patches/2021-July/180896.html
>> 
>> Before Pedro then posted a version of his own:
>> 
>>   https://sourceware.org/pipermail/gdb-patches/2021-July/180970.html
>> 
>> After this the conversation halted.  Then in 2023 I (Andrew) also took
>> a look at this bug and posted two versions:
>> 
>>   https://sourceware.org/pipermail/gdb-patches/2023-April/198570.html
>>   https://sourceware.org/pipermail/gdb-patches/2023-April/198680.html
>> 
>> The approach taken in my first patch was pretty similar to what Simon
>> originally posted back in 2021.  My second attempt was only a slight
>> variation on the first.
>> 
>> Pedro then pointed out his older patch, and so we arrive at this
>> patch.  The GDB changes here are mostly Pedro's work, but updated by
>> me (Andrew), any mistakes are mine.
>> 
>> The tests here are a combinations of everyone's work, and the commit
>> message is new, but copies bits from everyone's earlier work.
>> 
>> Problem Description
>> ===================
>> 
>> Bug PR gdb/21699 makes the observation that using $_as_string with
>> GDB's printf can cause GDB to print unexpected data from the
>> inferior.  The reproducer is pretty simple:
>> 
>>   #include <stddef.h>
>>   static char arena[100];
>> 
>>   /* Override malloc() so value_coerce_to_target() gets a known
>>      pointer, and we know we"ll see an error if $_as_string() gives
>>      a string that isn't null terminated. */
>>   void
>>   *malloc (size_t size)
>>   {
>>       memset (arena, 'x', sizeof (arena));
>>       if (size > sizeof (arena))
>>           return NULL;
>>       return arena;
>>   }
>> 
>>   int
>>   main ()
>>   {
>>     return 0;
>>   }
>> 
>> And then in a GDB session:
>> 
>>   $ gdb -q test
>>   Reading symbols from /tmp/test...
>>   (gdb) start
>>   Temporary breakpoint 1 at 0x4004c8: file test.c, line 17.
>>   Starting program: /tmp/test
>> 
>>   Temporary breakpoint 1, main () at test.c:17
>>   17        return 0;
>>   (gdb) printf "%s\n", $_as_string("hello")
>>   "hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>   (gdb) quit
>> 
>> The problem above is caused by how value_cstring is used within
>> py-value.c, but once we understand the issue then it turns out that
>> value_cstring is used in an unexpected way in many places within GDB.
>> 
>> Within py-value.c we have a null-terminated C-style string.  We then
>> pass a pointer to this string, along with the length of this
>> string (so not including the null-character) to value_cstring.
>> 
>> In value_cstring GDB allocates an array value of the given character
>> type, and copies in requested number of characters.  However
>> value_cstring does not add a null-character of its own.  This means
>> that the value created by calling value_cstring is only
>> null-terminated if the null-character is included in the passed in
>> length.  In py-value.c this is not the case, and indeed, in most uses
>> of value_cstring, this is not the case.
>> 
>> When GDB tries to print one of these strings the value contents are
>> pushed to the inferior, and then read back as a C-style string, that
>> is, GDB reads inferior memory until it finds a null-terminator.  For
>> the py-value.c case, no null-terminator is pushed into the inferior,
>> so GDB will continue reading inferior memory until a null-terminator
>> is found, with unpredictable results.
>> 
>> Patch Description
>> =================
>> 
>> The first thing this patch does is better define what the arguments
>> for the two function value_cstring and value_string should represent.
>> The comments in the header file are updated to describe whether the
>> length argument should, or should not, include a null-character.
>> Also, the data argument is changed to type gdb_byte.  The functions as
>> they currently exist will handle wide-characters, in which case more
>> than one 'char' would be needed for each character.  As such using
>> gdb_byte seems to make more sense.
>> 
>> To avoid adding casts throughout GDB, I've also added an overload that
>> still takes a 'char *', but asserts that the character type being used
>> is of size '1'.
>> 
>> The value_cstring function is now responsible for adding a null
>> character at the end of the string value it creates.
>> 
>> However, once we start looking at how value_cstring is used, we
>> realise there's another, related, problem.  Not every language's
>> strings are null terminated.  Fortran and Ada strings, for example,
>> are just an array of characters, GDB already has the function
>> value_string which can be used to create such values.
>> 
>> Consider this example using current GDB:
>> 
>>   (gdb) set language ada
>>   (gdb) p $_gdb_setting("arch")
>>   $1 = (97, 117, 116, 111)
>>   (gdb) ptype $
>>   type = array (1 .. 4) of char
>>   (gdb) p $_gdb_maint_setting("test-settings string")
>>   $2 = (0)
>>   (gdb) ptype $
>>   type = array (1 .. 1) of char
>> 
>> This shows two problems, first, the $_gdb_setting and
>> $_gdb_maint_setting functions are calling value_cstring using the
>> builtin_char character, rather than a language appropriate type.  In
>> the first call, the 'arch' case, the value_cstring call doesn't
>> include the null character, so the returned array only contains the
>> expected characters.  But, in the $_gdb_maint_setting example we do
>> end up including the null-character, even though this is not expected
>> for Ada strings.
>> 
>> This commit adds a new language method language_defn::value_string,
>> this function takes a pointer and length and creates a language
>> appropriate value that represents the string.  For C, C++, etc this
>> will be a null-terminated string (by calling value_cstring), and for
>> Fortran and Ada this can be a bounded array of characters with no null
>> terminator.  Additionally, this new language_defn::value_string
>> function is responsible for selecting a language appropriate character
>> type.
>> 
>> After this commit the only calls to value_cstring are from the C
>> expression evaluator and from the default language_defn::value_string.
>> 
>> And the only calls to value_string are from Fortan, Ada, and ObjectC
>> related code.
>> 
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21699
>> 
>> Co-Authored-By: Simon Marchi <simon.marchi@efficios.com>
>> Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
>> Co-Authored-By: Pedro Alves <pedro@palves.net>
>
> This LGTM:
>
> Approved-By: Simon Marchi <simon.marchi@efficios.com>
>
> Just one small question:
>
>> diff --git a/gdb/cli/cli-cmds.c b/gdb/cli/cli-cmds.c
>> index b7b65303a0b..3fa833d596c 100644
>> --- a/gdb/cli/cli-cmds.c
>> +++ b/gdb/cli/cli-cmds.c
>> @@ -2316,11 +2316,9 @@ value_from_setting (const setting &var, struct gdbarch *gdbarch)
>>  	  }
>>  
>>  	if (len > 0)
>> -	  return value_cstring (value, len,
>> -				builtin_type (gdbarch)->builtin_char);
>> +	  return current_language->value_string (gdbarch, value, len);
>>  	else
>> -	  return value_cstring ("", 1,
>> -				builtin_type (gdbarch)->builtin_char);
>> +	  return current_language->value_string (gdbarch, "", 0);
>
> Do we still need the two calls, or can we just do with the first (even
> when len is 0)?  Same idea for str_value_from_setting.

You are right.  I merged these two calls, and the other two in
str_value_from_setting, and pushed this patch.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv3] gdb: building inferior strings from within GDB
  2023-06-05 12:26       ` Andrew Burgess
@ 2023-06-05 17:57         ` Simon Marchi
  2023-06-06 15:50           ` Andrew Burgess
  0 siblings, 1 reply; 14+ messages in thread
From: Simon Marchi @ 2023-06-05 17:57 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches; +Cc: Pedro Alves, Simon Marchi

On 6/5/23 08:26, Andrew Burgess via Gdb-patches wrote:
> You are right.  I merged these two calls, and the other two in
> str_value_from_setting, and pushed this patch.

Turns out this test triggers an ASan error:

(gdb) PASS: gdb.base/internal-string-values.exp: test_setting: all langs: lang=ada: ptype "foo"
print $_gdb_maint_setting("test-settings string")
=================================================================
==80377==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000068034 at pc 0x564785cba682 bp 0x7ffd20644620 sp 0x7ffd20644610
READ of size 1 at 0x603000068034 thread T0
    #0 0x564785cba681 in find_command_name_length(char const*) /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:2129
    #1 0x564785cbacb2 in lookup_cmd_1(char const**, cmd_list_element*, cmd_list_element**, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, int, bool) /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:2186
    #2 0x564785cbb539 in lookup_cmd_1(char const**, cmd_list_element*, cmd_list_element**, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, int, bool) /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:2248
    #3 0x564785cbbcf3 in lookup_cmd(char const**, cmd_list_element*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, int, int) /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:2339
    #4 0x564785c82df2 in setting_cmd /home/smarchi/src/binutils-gdb/gdb/cli/cli-cmds.c:2219
    #5 0x564785c84274 in gdb_maint_setting_internal_fn /home/smarchi/src/binutils-gdb/gdb/cli/cli-cmds.c:2348
    #6 0x564788167b3b in call_internal_function(gdbarch*, language_defn const*, value*, int, value**) /home/smarchi/src/binutils-gdb/gdb/value.c:2321
    #7 0x5647854b6ebd in expr::ada_funcall_operation::evaluate(type*, expression*, noside) /home/smarchi/src/binutils-gdb/gdb/ada-lang.c:11254
    #8 0x564786658266 in expression::evaluate(type*, noside) /home/smarchi/src/binutils-gdb/gdb/eval.c:111
    #9 0x5647871242d6 in process_print_command_args /home/smarchi/src/binutils-gdb/gdb/printcmd.c:1322
    #10 0x5647871244b3 in print_command_1 /home/smarchi/src/binutils-gdb/gdb/printcmd.c:1335
    #11 0x564787125384 in print_command /home/smarchi/src/binutils-gdb/gdb/printcmd.c:1468
    #12 0x564785caac44 in do_simple_func /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:95
    #13 0x564785cc18f0 in cmd_func(cmd_list_element*, char const*, int) /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:2735
    #14 0x564787c70c68 in execute_command(char const*, int) /home/smarchi/src/binutils-gdb/gdb/top.c:574
    #15 0x564786686180 in command_handler(char const*) /home/smarchi/src/binutils-gdb/gdb/event-top.c:543
    #16 0x56478668752f in command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) /home/smarchi/src/binutils-gdb/gdb/event-top.c:779
    #17 0x564787dcb29a in tui_command_line_handler /home/smarchi/src/binutils-gdb/gdb/tui/tui-interp.c:104
    #18 0x56478668443d in gdb_rl_callback_handler /home/smarchi/src/binutils-gdb/gdb/event-top.c:250
    #19 0x7f4efd506246 in rl_callback_read_char (/usr/lib/libreadline.so.8+0x3b246) (BuildId: 092e91fc4361b0ef94561e3ae03a75f69398acbb)
    #20 0x564786683dea in gdb_rl_callback_read_char_wrapper_noexcept /home/smarchi/src/binutils-gdb/gdb/event-top.c:192
    #21 0x564786684042 in gdb_rl_callback_read_char_wrapper /home/smarchi/src/binutils-gdb/gdb/event-top.c:225
    #22 0x564787f1b119 in stdin_event_handler /home/smarchi/src/binutils-gdb/gdb/ui.c:155
    #23 0x56478862438d in handle_file_event /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:573
    #24 0x564788624d23 in gdb_wait_for_event /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:694
    #25 0x56478862297c in gdb_do_one_event(int) /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:264
    #26 0x564786df99f0 in start_event_loop /home/smarchi/src/binutils-gdb/gdb/main.c:412
    #27 0x564786dfa069 in captured_command_loop /home/smarchi/src/binutils-gdb/gdb/main.c:476
    #28 0x564786dff61f in captured_main /home/smarchi/src/binutils-gdb/gdb/main.c:1320
    #29 0x564786dff75c in gdb_main(captured_main_args*) /home/smarchi/src/binutils-gdb/gdb/main.c:1339
    #30 0x564785381b6d in main /home/smarchi/src/binutils-gdb/gdb/gdb.c:32
    #31 0x7f4efbc3984f  (/usr/lib/libc.so.6+0x2384f) (BuildId: 2f005a79cd1a8e385972f5a102f16adba414d75e)
    #32 0x7f4efbc39909 in __libc_start_main (/usr/lib/libc.so.6+0x23909) (BuildId: 2f005a79cd1a8e385972f5a102f16adba414d75e)
    #33 0x564785381934 in _start (/home/smarchi/build/binutils-gdb/gdb/gdb+0xabc5934) (BuildId: 90de353ac158646e7dab501b76a18a76628fca33)

0x603000068034 is located 0 bytes after 20-byte region [0x603000068020,0x603000068034)
allocated by thread T0 here:
    #0 0x7f4efcee0cd1 in __interceptor_calloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77
    #1 0x5647856265d8 in xcalloc /home/smarchi/src/binutils-gdb/gdb/alloc.c:97
    #2 0x564788610c6b in xzalloc(unsigned long) /home/smarchi/src/binutils-gdb/gdbsupport/common-utils.cc:29
    #3 0x56478815721a in value::allocate_contents(bool) /home/smarchi/src/binutils-gdb/gdb/value.c:929
    #4 0x564788157285 in value::allocate(type*, bool) /home/smarchi/src/binutils-gdb/gdb/value.c:941
    #5 0x56478815733a in value::allocate(type*) /home/smarchi/src/binutils-gdb/gdb/value.c:951
    #6 0x5647854ae81c in expr::ada_string_operation::evaluate(type*, expression*, noside) /home/smarchi/src/binutils-gdb/gdb/ada-lang.c:10675
    #7 0x5647854b63b8 in expr::ada_funcall_operation::evaluate(type*, expression*, noside) /home/smarchi/src/binutils-gdb/gdb/ada-lang.c:11184
    #8 0x564786658266 in expression::evaluate(type*, noside) /home/smarchi/src/binutils-gdb/gdb/eval.c:111
    #9 0x5647871242d6 in process_print_command_args /home/smarchi/src/binutils-gdb/gdb/printcmd.c:1322
    #10 0x5647871244b3 in print_command_1 /home/smarchi/src/binutils-gdb/gdb/printcmd.c:1335
    #11 0x564787125384 in print_command /home/smarchi/src/binutils-gdb/gdb/printcmd.c:1468
    #12 0x564785caac44 in do_simple_func /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:95
    #13 0x564785cc18f0 in cmd_func(cmd_list_element*, char const*, int) /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:2735
    #14 0x564787c70c68 in execute_command(char const*, int) /home/smarchi/src/binutils-gdb/gdb/top.c:574
    #15 0x564786686180 in command_handler(char const*) /home/smarchi/src/binutils-gdb/gdb/event-top.c:543
    #16 0x56478668752f in command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) /home/smarchi/src/binutils-gdb/gdb/event-top.c:779
    #17 0x564787dcb29a in tui_command_line_handler /home/smarchi/src/binutils-gdb/gdb/tui/tui-interp.c:104
    #18 0x56478668443d in gdb_rl_callback_handler /home/smarchi/src/binutils-gdb/gdb/event-top.c:250
    #19 0x7f4efd506246 in rl_callback_read_char (/usr/lib/libreadline.so.8+0x3b246) (BuildId: 092e91fc4361b0ef94561e3ae03a75f69398acbb)

Simon

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv3] gdb: building inferior strings from within GDB
  2023-06-05 17:57         ` Simon Marchi
@ 2023-06-06 15:50           ` Andrew Burgess
  2023-06-09 13:41             ` Tom Tromey
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Burgess @ 2023-06-06 15:50 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches; +Cc: Pedro Alves, Simon Marchi

Simon Marchi <simark@simark.ca> writes:

> On 6/5/23 08:26, Andrew Burgess via Gdb-patches wrote:
>> You are right.  I merged these two calls, and the other two in
>> str_value_from_setting, and pushed this patch.
>
> Turns out this test triggers an ASan error:
>
> (gdb) PASS: gdb.base/internal-string-values.exp: test_setting: all langs: lang=ada: ptype "foo"
> print $_gdb_maint_setting("test-settings string")
> =================================================================
> ==80377==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000068034 at pc 0x564785cba682 bp 0x7ffd20644620 sp 0x7ffd20644610
> READ of size 1 at 0x603000068034 thread T0

Thanks for pointing this out.  I see what's going on here.

What about this patch for fixing it?

Thanks,
Andrew

---

commit 2cf56b25deb3e74de60cf165f3f852a871125136
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Tue Jun 6 16:34:35 2023 +0100

    gdb: fix ASan failure after recent string changes
    
    After this commit:
    
      commit baab375361c365afee2577c94cbbd3fdd443d6da
      Date:   Tue Jul 13 14:44:27 2021 -0400
    
          gdb: building inferior strings from within GDB
    
    It was pointed out that a new ASan failure had been introduced which
    was triggered by gdb.base/internal-string-values.exp:
    
      (gdb) PASS: gdb.base/internal-string-values.exp: test_setting: all langs: lang=ada: ptype "foo"
      print $_gdb_maint_setting("test-settings string")
      =================================================================
      ==80377==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000068034 at pc 0x564785cba682 bp 0x7ffd20644620 sp 0x7ffd20644610
      READ of size 1 at 0x603000068034 thread T0
          #0 0x564785cba681 in find_command_name_length(char const*) /tmp/src/binutils-gdb/gdb/cli/cli-decode.c:2129
          #1 0x564785cbacb2 in lookup_cmd_1(char const**, cmd_list_element*, cmd_list_element**, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, int, bool) /tmp/src/binutils-gdb/gdb/cli/cli-decode.c:2186
          #2 0x564785cbb539 in lookup_cmd_1(char const**, cmd_list_element*, cmd_list_element**, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, int, bool) /tmp/src/binutils-gdb/gdb/cli/cli-decode.c:2248
          #3 0x564785cbbcf3 in lookup_cmd(char const**, cmd_list_element*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, int, int) /tmp/src/binutils-gdb/gdb/cli/cli-decode.c:2339
          #4 0x564785c82df2 in setting_cmd /tmp/src/binutils-gdb/gdb/cli/cli-cmds.c:2219
          #5 0x564785c84274 in gdb_maint_setting_internal_fn /tmp/src/binutils-gdb/gdb/cli/cli-cmds.c:2348
          #6 0x564788167b3b in call_internal_function(gdbarch*, language_defn const*, value*, int, value**) /tmp/src/binutils-gdb/gdb/value.c:2321
          #7 0x5647854b6ebd in expr::ada_funcall_operation::evaluate(type*, expression*, noside) /tmp/src/binutils-gdb/gdb/ada-lang.c:11254
          #8 0x564786658266 in expression::evaluate(type*, noside) /tmp/src/binutils-gdb/gdb/eval.c:111
          #9 0x5647871242d6 in process_print_command_args /tmp/src/binutils-gdb/gdb/printcmd.c:1322
          #10 0x5647871244b3 in print_command_1 /tmp/src/binutils-gdb/gdb/printcmd.c:1335
          #11 0x564787125384 in print_command /tmp/src/binutils-gdb/gdb/printcmd.c:1468
          #12 0x564785caac44 in do_simple_func /tmp/src/binutils-gdb/gdb/cli/cli-decode.c:95
          #13 0x564785cc18f0 in cmd_func(cmd_list_element*, char const*, int) /tmp/src/binutils-gdb/gdb/cli/cli-decode.c:2735
          #14 0x564787c70c68 in execute_command(char const*, int) /tmp/src/binutils-gdb/gdb/top.c:574
          #15 0x564786686180 in command_handler(char const*) /tmp/src/binutils-gdb/gdb/event-top.c:543
          #16 0x56478668752f in command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) /tmp/src/binutils-gdb/gdb/event-top.c:779
          #17 0x564787dcb29a in tui_command_line_handler /tmp/src/binutils-gdb/gdb/tui/tui-interp.c:104
          #18 0x56478668443d in gdb_rl_callback_handler /tmp/src/binutils-gdb/gdb/event-top.c:250
          #19 0x7f4efd506246 in rl_callback_read_char (/usr/lib/libreadline.so.8+0x3b246) (BuildId: 092e91fc4361b0ef94561e3ae03a75f69398acbb)
          #20 0x564786683dea in gdb_rl_callback_read_char_wrapper_noexcept /tmp/src/binutils-gdb/gdb/event-top.c:192
          #21 0x564786684042 in gdb_rl_callback_read_char_wrapper /tmp/src/binutils-gdb/gdb/event-top.c:225
          #22 0x564787f1b119 in stdin_event_handler /tmp/src/binutils-gdb/gdb/ui.c:155
          #23 0x56478862438d in handle_file_event /tmp/src/binutils-gdb/gdbsupport/event-loop.cc:573
          #24 0x564788624d23 in gdb_wait_for_event /tmp/src/binutils-gdb/gdbsupport/event-loop.cc:694
          #25 0x56478862297c in gdb_do_one_event(int) /tmp/src/binutils-gdb/gdbsupport/event-loop.cc:264
          #26 0x564786df99f0 in start_event_loop /tmp/src/binutils-gdb/gdb/main.c:412
          #27 0x564786dfa069 in captured_command_loop /tmp/src/binutils-gdb/gdb/main.c:476
          #28 0x564786dff61f in captured_main /tmp/src/binutils-gdb/gdb/main.c:1320
          #29 0x564786dff75c in gdb_main(captured_main_args*) /tmp/src/binutils-gdb/gdb/main.c:1339
          #30 0x564785381b6d in main /tmp/src/binutils-gdb/gdb/gdb.c:32
          #31 0x7f4efbc3984f  (/usr/lib/libc.so.6+0x2384f) (BuildId: 2f005a79cd1a8e385972f5a102f16adba414d75e)
          #32 0x7f4efbc39909 in __libc_start_main (/usr/lib/libc.so.6+0x23909) (BuildId: 2f005a79cd1a8e385972f5a102f16adba414d75e)
          #33 0x564785381934 in _start (/tmp/build/binutils-gdb/gdb/gdb+0xabc5934) (BuildId: 90de353ac158646e7dab501b76a18a76628fca33)
    
      0x603000068034 is located 0 bytes after 20-byte region [0x603000068020,0x603000068034) allocated by thread T0 here:
          #0 0x7f4efcee0cd1 in __interceptor_calloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77
          #1 0x5647856265d8 in xcalloc /tmp/src/binutils-gdb/gdb/alloc.c:97
          #2 0x564788610c6b in xzalloc(unsigned long) /tmp/src/binutils-gdb/gdbsupport/common-utils.cc:29
          #3 0x56478815721a in value::allocate_contents(bool) /tmp/src/binutils-gdb/gdb/value.c:929
          #4 0x564788157285 in value::allocate(type*, bool) /tmp/src/binutils-gdb/gdb/value.c:941
          #5 0x56478815733a in value::allocate(type*) /tmp/src/binutils-gdb/gdb/value.c:951
          #6 0x5647854ae81c in expr::ada_string_operation::evaluate(type*, expression*, noside) /tmp/src/binutils-gdb/gdb/ada-lang.c:10675
          #7 0x5647854b63b8 in expr::ada_funcall_operation::evaluate(type*, expression*, noside) /tmp/src/binutils-gdb/gdb/ada-lang.c:11184
          #8 0x564786658266 in expression::evaluate(type*, noside) /tmp/src/binutils-gdb/gdb/eval.c:111
          #9 0x5647871242d6 in process_print_command_args /tmp/src/binutils-gdb/gdb/printcmd.c:1322
          #10 0x5647871244b3 in print_command_1 /tmp/src/binutils-gdb/gdb/printcmd.c:1335
          #11 0x564787125384 in print_command /tmp/src/binutils-gdb/gdb/printcmd.c:1468
          #12 0x564785caac44 in do_simple_func /tmp/src/binutils-gdb/gdb/cli/cli-decode.c:95
          #13 0x564785cc18f0 in cmd_func(cmd_list_element*, char const*, int) /tmp/src/binutils-gdb/gdb/cli/cli-decode.c:2735
          #14 0x564787c70c68 in execute_command(char const*, int) /tmp/src/binutils-gdb/gdb/top.c:574
          #15 0x564786686180 in command_handler(char const*) /tmp/src/binutils-gdb/gdb/event-top.c:543
          #16 0x56478668752f in command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) /tmp/src/binutils-gdb/gdb/event-top.c:779
          #17 0x564787dcb29a in tui_command_line_handler /tmp/src/binutils-gdb/gdb/tui/tui-interp.c:104
          #18 0x56478668443d in gdb_rl_callback_handler /tmp/src/binutils-gdb/gdb/event-top.c:250
          #19 0x7f4efd506246 in rl_callback_read_char (/usr/lib/libreadline.so.8+0x3b246) (BuildId: 092e91fc4361b0ef94561e3ae03a75f69398acbb)
    
    The problem is in cli/cli-cmds.c, in the function setting_cmd, where
    we do this:
    
      const char *a0 = (const char *) argv[0]->contents ().data ();
    
    Here argv[0] is a value* which we know is either a TYPE_CODE_ARRAY or
    a TYPE_CODE_STRING.  The problem is that the above line is casting the
    value contents directly to a C-string, i.e. one that is assumed to
    have a null-terminator at the end.
    
    After the above commit this can no longer be assumed to be true.  A
    string value will be represented just as it would be in the current
    language, so for Ada and Fortran the string will be an array of
    characters with no null-terminator at the end.
    
    My proposed solution is to copy the string contents into a std::string
    object, and then use the std::string::c_str() value, this will ensure
    that a null-terminator has been added.
    
    I had a check through GDB at places TYPE_CODE_STRING was used and
    couldn't see any other obvious places where this type of assumption
    was being made, so hopefully this is the only offender.
    
    Running the above test with ASan compiled in no longer gives an error.

diff --git a/gdb/cli/cli-cmds.c b/gdb/cli/cli-cmds.c
index 8994cc5ea3d..638c138e7cb 100644
--- a/gdb/cli/cli-cmds.c
+++ b/gdb/cli/cli-cmds.c
@@ -2215,7 +2215,13 @@ setting_cmd (const char *fnname, struct cmd_list_element *showlist,
       && type0->code () != TYPE_CODE_STRING)
     error (_("First argument of %s must be a string."), fnname);
 
-  const char *a0 = (const char *) argv[0]->contents ().data ();
+  /* Not all languages null-terminate their strings, by moving the string
+     content into a std::string we ensure that a null-terminator is added.
+     For languages that do add a null-terminator the std::string might end
+     up with two null characters at the end, but that's harmless.  */
+  const std::string setting ((const char *) argv[0]->contents ().data (),
+			     type0->length ());
+  const char *a0 = setting.c_str ();
   cmd_list_element *cmd = lookup_cmd (&a0, showlist, "", NULL, -1, 0);
 
   if (cmd == nullptr || cmd->type != show_cmd)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv3] gdb: building inferior strings from within GDB
  2023-06-06 15:50           ` Andrew Burgess
@ 2023-06-09 13:41             ` Tom Tromey
  2023-06-09 14:20               ` Andrew Burgess
  0 siblings, 1 reply; 14+ messages in thread
From: Tom Tromey @ 2023-06-09 13:41 UTC (permalink / raw)
  To: Andrew Burgess via Gdb-patches
  Cc: Simon Marchi, Andrew Burgess, Pedro Alves, Simon Marchi

>>>>> "Andrew" == Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> writes:

Andrew> Thanks for pointing this out.  I see what's going on here.

Andrew> What about this patch for fixing it?

Looks good to me.

If this idiom:

Andrew> +  const std::string setting ((const char *) argv[0]->contents ().data (),
Andrew> +			     type0->length ());

is needed elsewhere, then maybe a new function or value method would be good.
For a single use though it doesn't seem to matter.

Tom

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv3] gdb: building inferior strings from within GDB
  2023-06-09 13:41             ` Tom Tromey
@ 2023-06-09 14:20               ` Andrew Burgess
  0 siblings, 0 replies; 14+ messages in thread
From: Andrew Burgess @ 2023-06-09 14:20 UTC (permalink / raw)
  To: Tom Tromey, Andrew Burgess via Gdb-patches
  Cc: Simon Marchi, Pedro Alves, Simon Marchi

Tom Tromey <tom@tromey.com> writes:

>>>>>> "Andrew" == Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> writes:
>
> Andrew> Thanks for pointing this out.  I see what's going on here.
>
> Andrew> What about this patch for fixing it?
>
> Looks good to me.
>
> If this idiom:
>
> Andrew> +  const std::string setting ((const char *) argv[0]->contents ().data (),
> Andrew> +			     type0->length ());
>
> is needed elsewhere, then maybe a new function or value method would be good.
> For a single use though it doesn't seem to matter.

I have some patches that move in this direction, but haven't finished
they yet.  Something like language_defn::value_to_string(value *) to
build a std::string from a value for a particular language.

I'll go ahead and push this patch as it is for now as it fixes a
regression.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-06-09 14:20 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-03 21:49 [PATCH] gdb: fix missing null-character when using value_cstring Andrew Burgess
2023-04-04 13:58 ` Simon Marchi
2023-04-06 13:20   ` Andrew Burgess
2023-04-11 12:58     ` Pedro Alves
2023-04-12 20:47       ` Andrew Burgess
2023-04-13 11:56         ` Pedro Alves
2023-04-07  6:35 ` [PATCHv2] " Andrew Burgess
2023-05-24 14:10   ` [PATCHv3] gdb: building inferior strings from within GDB Andrew Burgess
2023-05-24 15:42     ` Simon Marchi
2023-06-05 12:26       ` Andrew Burgess
2023-06-05 17:57         ` Simon Marchi
2023-06-06 15:50           ` Andrew Burgess
2023-06-09 13:41             ` Tom Tromey
2023-06-09 14:20               ` Andrew Burgess

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).