public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] gdb: fix missing null-character when using value_cstring
@ 2023-04-03 21:49 Andrew Burgess
  2023-04-04 13:58 ` Simon Marchi
  2023-04-07  6:35 ` [PATCHv2] " Andrew Burgess
  0 siblings, 2 replies; 14+ messages in thread
From: Andrew Burgess @ 2023-04-03 21:49 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

In PR gdb/21699 an issue was reported with the $_as_string convenience
function.  It was observed that the string returned by this function,
when pushed into the inferior, was not null terminated.

This was causing problems when using the result with GDB's printf
command, as this command relies on the string having been pushed into
the inferior and being null terminated.

The bug includes a simple reproducer:

  #include <stddef.h>
  static char arena[51] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";

  /* Override malloc() so value_coerce_to_target() gets a known pointer, and we
     know we"ll see an error if $_as_string() gives a string that isn't NULL
     terminated. */
  void
  *malloc (size_t size)
  {
      if (size > sizeof (arena))
          return NULL;
      return arena;
  }

  int
  main ()
  {
    return 0;
  }

Then use this in a GDB session like this:

  $ gdb -q test
  Reading symbols from /tmp/test...
  (gdb) start
  Temporary breakpoint 1 at 0x4004c8: file test.c, line 17.
  Starting program: /tmp/test

  Temporary breakpoint 1, main () at test.c:17
  17        return 0;
  (gdb) printf "%s\n", $_as_string("hello")
  "hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  (gdb) quit

The problem is with the GDB function value_cstring, or at least, how
this function is being used.

When $_as_string is called we enter fnpy_call (python/py-function.c),
this calls into Python code.  The Python code returns a result, which
will be a Python string, and then we call convert_value_from_python to
convert the Python string to a GDB value.

In convert_value_from_python (python/py-value.c) we enter the
gdbpy_is_string case (as the result is a string), then we call
python_string_to_target_string, which returns a null terminated C
string.  Next we then make this call:

  value = value_cstring (s.get (), strlen (s.get ()),
                         builtin_type_pychar);

This passes the pointer to the first character 's.get ()' and the
length of the string 'strlen (s.get ())', however, this length does
not include the trailing null character.

If we look at value_cstring (valops.c) we see that an array is created
using the passed in length, and characters are copied into the newly
allocated array value.  This means we do not copy the strings trailing
null character, nor does value_cstring add a trailing null.

Finally, when we use this array value with printf, GDB pushed the
array to the inferior, which mallocs some space based on the array
length, and then copies the array content to the inferior.

As the array doesn't include a trailing null, non is written into the
inferior.  So what we place into the inferior is not a C string, but
is actually an array of characters.

When printf tries to print the value it starts at the address of the
first character and continues until it reaches a null.  When that will
be is undefined, so we may end up printing random garbage.

Now, ignore for a moment that the whole push an array to the inferior
just so we can fetch it in order to print it is clearly crazy.  That's
a problem for another day I think.  The important question here is:
should value_cstring include a null character or not.

Given the function name include 'cstring', which I'm assuming means C
style string, I think that we should be including a trailing null.

Given that, I see two possibilities, either value_cstring can always
add a trailing null, or value_cstring can assert that there is a
trailing null, and the caller is responsible for making sure that the
passed in length includes the null character.

Given we're always passing from a C style string to begin with the
question is really, should the length being passed to value_cstring
include the null, or not include the null?

The only place where we currently include the null in the passed
length is from c_string_operation::evaluate.  Every other use of
value_cstring passes the length excluding the null.

I was tempted to adjust c_string_operation::evaluate to exclude the
null, and then have value_cstring add a trailing null.  However, this
does mean that if, in the future, a use is introduced that incorrectly
includes the trailing null in the passed length, then we are unlikely
to spot immediately - we'd instead create an array with two null
characters at the end.

Alternatively, if we change the requirements of value_cstring so that
we require the passed length includes the trailing null, then we can
assert that this is indeed the case within value_cstring.  Any
incorrect uses in the future will be quickly spotted.

So that's what I did, c_string_operation::evaluate is left unchanged,
but every other use of value_cstring is adjusted with a '+ 1' so that
we include the null within the length being passed.

I've added a header comment to value_cstring (value.h) to describe the
requirements.

Upon testing there were two tests that failed after this fix,
gdb.base/settings.exp and gdb.python/py-mi.exp.  In both of these
cases we end up asking for the type of a character array allocated
through value_cstring.  The length of this array has now increased by
one.  Here's the previous behaviour:

  (gdb) set args abc
  (gdb) p $_gdb_setting("args")
  $1 = "abc"
  (gdb) ptype $1
  type = char [3]
  (gdb)

And here's the modified behaviour:

  (gdb) set args abc
  (gdb) p $_gdb_setting("args")
  $1 = "abc"
  (gdb) ptype $1
  type = char [4]
  (gdb)

Notice the type of $1 changed from an array of length 3 to an array of
length 4.  However, I don't think this is a bad thing, consider:

  char lit[] = "zzz";
  int
  main()
  {
    return 0;
  }

And in GDB:

  (gdb) ptype lit
  type = char [4]
  (gdb)

The null character is considered part of the array, so I think the new
behaviour makes sense.

I've added some new tests that try to exercise this issue in a few
different ways.  At the end of the day though it's all just variations
of the same thing, create a value by calling through value_cstring,
then force GDB to push the value to the inferior, and then treat that
as a C style string and see the lack of null character cause problems.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21699
---
 gdb/cli/cli-cmds.c                       | 17 ++++--
 gdb/guile/scm-math.c                     | 16 ++++--
 gdb/python/py-value.c                    |  8 ++-
 gdb/testsuite/gdb.base/cstring-exprs.c   | 51 ++++++++++++++++++
 gdb/testsuite/gdb.base/cstring-exprs.exp | 68 ++++++++++++++++++++++++
 gdb/testsuite/gdb.base/settings.exp      |  4 +-
 gdb/testsuite/gdb.python/py-mi.exp       |  2 +-
 gdb/valops.c                             |  4 ++
 gdb/value.c                              |  3 +-
 gdb/value.h                              |  4 ++
 10 files changed, 164 insertions(+), 13 deletions(-)
 create mode 100644 gdb/testsuite/gdb.base/cstring-exprs.c
 create mode 100644 gdb/testsuite/gdb.base/cstring-exprs.exp

diff --git a/gdb/cli/cli-cmds.c b/gdb/cli/cli-cmds.c
index 3b1c6a9f4bd..55cca84e7e5 100644
--- a/gdb/cli/cli-cmds.c
+++ b/gdb/cli/cli-cmds.c
@@ -2307,8 +2307,11 @@ value_from_setting (const setting &var, struct gdbarch *gdbarch)
 	  }
 
 	if (len > 0)
-	  return value_cstring (value, len,
-				builtin_type (gdbarch)->builtin_char);
+	  {
+	    /* The +1 here ensures we include the trailing null character.  */
+	    return value_cstring (value, len + 1,
+				  builtin_type (gdbarch)->builtin_char);
+	  }
 	else
 	  return value_cstring ("", 1,
 				builtin_type (gdbarch)->builtin_char);
@@ -2363,7 +2366,8 @@ str_value_from_setting (const setting &var, struct gdbarch *gdbarch)
       {
 	std::string cmd_val = get_setshow_command_value_string (var);
 
-	return value_cstring (cmd_val.c_str (), cmd_val.size (),
+	/* The +1 ensures we include the trailing null character.  */
+	return value_cstring (cmd_val.c_str (), cmd_val.size () + 1,
 			      builtin_type (gdbarch)->builtin_char);
       }
 
@@ -2392,8 +2396,11 @@ str_value_from_setting (const setting &var, struct gdbarch *gdbarch)
 	  }
 
 	if (len > 0)
-	  return value_cstring (value, len,
-				builtin_type (gdbarch)->builtin_char);
+	  {
+	    /* The +1 here ensures we include the trailing null character.  */
+	    return value_cstring (value, len + 1,
+				  builtin_type (gdbarch)->builtin_char);
+	  }
 	else
 	  return value_cstring ("", 1,
 				builtin_type (gdbarch)->builtin_char);
diff --git a/gdb/guile/scm-math.c b/gdb/guile/scm-math.c
index 49fe93b97f8..c7dcfc85f11 100644
--- a/gdb/guile/scm-math.c
+++ b/gdb/guile/scm-math.c
@@ -803,9 +803,19 @@ vlscm_convert_typed_value_from_scheme (const char *func_name,
 					0 /*non-strict*/,
 					&except_scm);
 	      if (s != NULL)
-		value = value_cstring (s.get (), len,
-				       language_string_char_type (language,
-								  gdbarch));
+		{
+		  /* The gdbscm_scm_to_string call doesn't guarantee to add
+		     a tailing null-character, so we need to add our own.  */
+		  gdb::unique_xmalloc_ptr<char> tmp ((char *) xmalloc (len + 1));
+		  memcpy (tmp.get (), s.get (), len);
+		  *(tmp.get () + len) = '\0';
+
+		  /* The +1 here ensures we include the trailing null
+		     character.  */
+		  value = value_cstring (tmp.get (), len + 1,
+					 language_string_char_type (language,
+								    gdbarch));
+		}
 	      else
 		value = NULL;
 	    }
diff --git a/gdb/python/py-value.c b/gdb/python/py-value.c
index 65384c781bc..59b879e5ebc 100644
--- a/gdb/python/py-value.c
+++ b/gdb/python/py-value.c
@@ -1881,8 +1881,12 @@ convert_value_from_python (PyObject *obj)
 	  gdb::unique_xmalloc_ptr<char> s
 	    = python_string_to_target_string (obj);
 	  if (s != NULL)
-	    value = value_cstring (s.get (), strlen (s.get ()),
-				   builtin_type_pychar);
+	    {
+	      /* The +1 here ensures we include the trailing null
+		 character.  */
+	      value = value_cstring (s.get (), strlen (s.get ()) + 1,
+				     builtin_type_pychar);
+	    }
 	}
       else if (PyObject_TypeCheck (obj, &value_object_type))
 	value = ((value_object *) obj)->value->copy ();
diff --git a/gdb/testsuite/gdb.base/cstring-exprs.c b/gdb/testsuite/gdb.base/cstring-exprs.c
new file mode 100644
index 00000000000..8135edd97d4
--- /dev/null
+++ b/gdb/testsuite/gdb.base/cstring-exprs.c
@@ -0,0 +1,51 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2023 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <stddef.h>
+#include <string.h>
+
+/* A memory area used as the malloc memory buffer.  */
+
+static char arena[] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
+
+/* Override malloc().  When GDB tries to push strings into the inferior we
+   always return the same pointer to arena.  This does mean we can't have
+   multiple strings in use at the same time, but that's fine for our basic
+   testing, and this is simpler than using dlsym.  */
+
+void
+*malloc (size_t size)
+{
+  memset (arena, 'X', sizeof (arena));
+  if (size > sizeof (arena))
+    return NULL;
+  return arena;
+}
+
+/* This function is called from GDB.  */
+
+void
+take_string (const char *str)
+{
+  /* Nothing.  */
+}
+
+int
+main (void)
+{
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.base/cstring-exprs.exp b/gdb/testsuite/gdb.base/cstring-exprs.exp
new file mode 100644
index 00000000000..fc2f96affa0
--- /dev/null
+++ b/gdb/testsuite/gdb.base/cstring-exprs.exp
@@ -0,0 +1,68 @@
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test different ways that we can cause GDB to call the value_string
+# function.  This function should create a C style string, i.e. the
+# string should include a null terminating character.
+#
+# If the value created by value_cstring is pushed into the inferior
+# and the null terminating character is missing, then we can get
+# unexpected results.
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" $testfile $srcfile] } {
+    return -1
+}
+
+if {![runto_main]} {
+    return 0
+}
+
+if [allow_python_tests] {
+    # The $_as_string convenience function is implemented in Python.
+    gdb_test {printf "%s\n", $_as_string("aabbcc")} "\"aabbcc\""
+
+    # Create a gdb.Value object for a string.  Take its address (which
+    # forces it into the inferior), and then print the address as a
+    # string.
+    gdb_test_no_output {python addr = gdb.Value("xxyyzz").address}
+    gdb_test {python gdb.execute("x/1s 0x%x" % addr)} \
+	"$hex <arena>:\\s+\"xxyyzz\""
+
+    # Call an inferior function through a gdb.Value object, pass a
+    # string to the inferior function and ensure it arrives correctly.
+    gdb_test "p/x take_string" " = $hex.*"
+    gdb_test_no_output "python func_ptr = gdb.history (0)" \
+	"place address of take_string into Python variable"
+    gdb_test "python func_value = func_ptr.dereference()" ""
+    gdb_breakpoint "take_string"
+    gdb_test {python result = func_value("qqaazz")} \
+	"Breakpoint $decimal, take_string \\(str=$hex <arena> \"qqaazz\"\\) at .*"
+    gdb_test "continue" "Continuing\\."
+}
+
+# Use printf on a string parsed by the C expression parser.
+gdb_test {printf "%s\n", "ddeeff"} "ddeeff"
+
+# Parse a string in the C expression parser, force it into the
+# inferior by taking its address, then print it as a string.
+gdb_test {x/1s &"gghhii"} "$hex <arena>:\\s+\"gghhii\""
+
+# Use $_gdb_setting_str and $_gdb_maint_setting_str to create a string
+# value, and then print using printf, which forces the string into the
+# inferior.
+gdb_test {printf "%s\n", $_gdb_setting_str("arch")} "auto"
+gdb_test {printf "%s\n", $_gdb_maint_setting_str("bfd-sharing")} "on"
diff --git a/gdb/testsuite/gdb.base/settings.exp b/gdb/testsuite/gdb.base/settings.exp
index e54203924ec..9dd3ee174c7 100644
--- a/gdb/testsuite/gdb.base/settings.exp
+++ b/gdb/testsuite/gdb.base/settings.exp
@@ -504,7 +504,9 @@ proc_with_prefix test-enum {} {
     gdb_test_no_output "$set_cmd zzz"
     show_setting "$show_cmd" "zzz" 0 "yyy"
 
-    check_type "test-settings enum" "type = char \\\[3\\\]"
+    # A string literal includes the trailing null character, hence the
+    # array size of four here.
+    check_type "test-settings enum" "type = char \\\[4\\\]"
 
     test_gdb_complete_multiple "$set_cmd " "" "" {
 	"xxx"
diff --git a/gdb/testsuite/gdb.python/py-mi.exp b/gdb/testsuite/gdb.python/py-mi.exp
index 97b8a42f715..c65c83553f1 100644
--- a/gdb/testsuite/gdb.python/py-mi.exp
+++ b/gdb/testsuite/gdb.python/py-mi.exp
@@ -288,7 +288,7 @@ mi_create_dynamic_varobj nstype2 nstype2 ".*" 1 \
   "create nstype2 varobj"
 
 mi_list_varobj_children nstype2 {
-    { {nstype2.<error at 0>} {<error at 0>} 6 {char \[6\]} }
+    { {nstype2.<error at 0>} {<error at 0>} 7 {char \[7\]} }
 } "list children after setting exception flag"
 
 mi_create_varobj me me \
diff --git a/gdb/valops.c b/gdb/valops.c
index d002c9dca9b..305885cc9c6 100644
--- a/gdb/valops.c
+++ b/gdb/valops.c
@@ -1738,9 +1738,13 @@ value_array (int lowbound, int highbound, struct value **elemvec)
   return val;
 }
 
+/* See value.h.  */
+
 struct value *
 value_cstring (const char *ptr, ssize_t len, struct type *char_type)
 {
+  gdb_assert (*(ptr + (len - 1)) == '\0');
+
   struct value *val;
   int lowbound = current_language->string_lower_bound ();
   ssize_t highbound = len / char_type->length ();
diff --git a/gdb/value.c b/gdb/value.c
index eab1933b256..4f1e0d6c53e 100644
--- a/gdb/value.c
+++ b/gdb/value.c
@@ -2044,7 +2044,8 @@ value_of_internalvar (struct gdbarch *gdbarch, struct internalvar *var)
       break;
 
     case INTERNALVAR_STRING:
-      val = value_cstring (var->u.string, strlen (var->u.string),
+      /* The +1 ensures we include the null character.  */
+      val = value_cstring (var->u.string, strlen (var->u.string) + 1,
 			   builtin_type (gdbarch)->builtin_char);
       break;
 
diff --git a/gdb/value.h b/gdb/value.h
index e13f92c397b..a23adda3c65 100644
--- a/gdb/value.h
+++ b/gdb/value.h
@@ -1182,6 +1182,10 @@ class scoped_value_mark
   const struct value *m_value;
 };
 
+/* Create an array value, with element type CHAR_TYPE and length LEN.  The
+   array represents a C style string.  The content for the value is copied
+   from PTR.  The last array element must be a null character.  */
+
 extern struct value *value_cstring (const char *ptr, ssize_t len,
 				    struct type *char_type);
 extern struct value *value_string (const char *ptr, ssize_t len,

base-commit: 60a13bbcdfb0ce008a77563cea0c34c830d7b170
-- 
2.25.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-06-09 14:20 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-03 21:49 [PATCH] gdb: fix missing null-character when using value_cstring Andrew Burgess
2023-04-04 13:58 ` Simon Marchi
2023-04-06 13:20   ` Andrew Burgess
2023-04-11 12:58     ` Pedro Alves
2023-04-12 20:47       ` Andrew Burgess
2023-04-13 11:56         ` Pedro Alves
2023-04-07  6:35 ` [PATCHv2] " Andrew Burgess
2023-05-24 14:10   ` [PATCHv3] gdb: building inferior strings from within GDB Andrew Burgess
2023-05-24 15:42     ` Simon Marchi
2023-06-05 12:26       ` Andrew Burgess
2023-06-05 17:57         ` Simon Marchi
2023-06-06 15:50           ` Andrew Burgess
2023-06-09 13:41             ` Tom Tromey
2023-06-09 14:20               ` Andrew Burgess

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).