public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] gdb: consider null terminator for length arguments of value_cstring calls
@ 2021-07-13 15:31 Simon Marchi
  2021-07-13 17:07 ` Pedro Alves
  0 siblings, 1 reply; 8+ messages in thread
From: Simon Marchi @ 2021-07-13 15:31 UTC (permalink / raw)
  To: gdb-patches

From: Simon Marchi <simon.marchi@polymtl.ca>

This patch started when I looked at this bit in cli/cli-cmds.c:

    if (*(char **) cmd->var)
      return value_cstring (*(char **) cmd->var, strlen (*(char **) cmd->var),
			    builtin_type (gdbarch)->builtin_char);
    else
      return value_cstring ("", 1,
			    builtin_type (gdbarch)->builtin_char);

I found it odd that the first value_cstring call passes a length that
does not consider the null terminator of the C string, but second does.
I want to clarify the documentation of value_cstring and fix the one
that is wrong.

Debugging a little bit, I found that the first call is the wrong one.
Doing:

  (gdb) set args foo
  (gdb) print $_gdb_setting("args")
  $1 = "foo"

creates a struct value of code TYPE_CODE_ARRAY of size 3, which doesn't
have a null terminator.  That does not create a valid C string.  It is
however printed correctly by GDB, because the printing code makes sure
not to read past the value's length.

A way to expose the bug, as implemented in the new test, is to make an
inferior function call with this value, using a function that reads the
string until the null terminator.  That allocates some memory in the
inferior for the string and copies it.  If the null terminator is
missing, the function will read past the allocated buffer, possibly
causing a crash.  In practice, the memory following the buffer must
pretty often be zeroed-out, because I didn't see a crash just with this.
But if we compile the inferior program with AddressSanitizer, then the
out-of-bounds access causes a crash systematically.

It would be nice to find a test that does not require compiling the
inferior program with AddressSanitizer, if somebody has a better idea,
but at least I think it's better than no test.

I found a few calls to value_cstring that were wrong, I fixed them
all, all exercised by the test.

The change in guile/scm-math.c maybe requires a bit of explanation.
According to the doc of gdbscm_scm_to_string, if don't pass a length
pointer, we get back a null-terminated string.  If we pass a length
pointer, we get a non-null-terminated string, but we get the length
separately.  Trying to pass "len + 1" to value_cstring would lead to GDB
reading past the allocated buffer, that is exactly of length `len`.  So
instead, don't pass a length pointer and use strlen on the result.

gdb.base/settings.exp and gdb.python/py-mi.exp required changes in some
expected outputs, because the array type created by $_gdb_setting_str
is now one element larger, to account for the null terminator.  I don't
think this change in user-visible behavior is a problem.

Change-Id: If8dd13cce55c70521e34e7c360139064b4e87496
---
 gdb/cli/cli-cmds.c                            |   8 +-
 gdb/guile/scm-math.c                          |   6 +-
 gdb/python/py-value.c                         |   2 +-
 gdb/testsuite/gdb.base/infcall-gdb-setting.c  |  42 ++++
 .../gdb.base/infcall-gdb-setting.exp          | 204 ++++++++++++++++++
 gdb/testsuite/gdb.base/settings.exp           |   2 +-
 gdb/testsuite/gdb.python/py-mi.exp            |   2 +-
 gdb/valops.c                                  |  11 +-
 gdb/value.c                                   |   2 +-
 gdb/value.h                                   |  14 ++
 10 files changed, 274 insertions(+), 19 deletions(-)
 create mode 100644 gdb/testsuite/gdb.base/infcall-gdb-setting.c
 create mode 100644 gdb/testsuite/gdb.base/infcall-gdb-setting.exp

diff --git a/gdb/cli/cli-cmds.c b/gdb/cli/cli-cmds.c
index 56ae12a0c19..f6fc5ab854f 100644
--- a/gdb/cli/cli-cmds.c
+++ b/gdb/cli/cli-cmds.c
@@ -2146,7 +2146,8 @@ value_from_setting (const cmd_list_element *cmd, struct gdbarch *gdbarch)
     case var_filename:
     case var_enum:
       if (*(char **) cmd->var)
-	return value_cstring (*(char **) cmd->var, strlen (*(char **) cmd->var),
+	return value_cstring (*(char **) cmd->var,
+			      strlen (*(char **) cmd->var) + 1,
 			      builtin_type (gdbarch)->builtin_char);
       else
 	return value_cstring ("", 1,
@@ -2198,7 +2199,7 @@ str_value_from_setting (const cmd_list_element *cmd, struct gdbarch *gdbarch)
       {
 	std::string cmd_val = get_setshow_command_value_string (cmd);
 
-	return value_cstring (cmd_val.c_str (), cmd_val.size (),
+	return value_cstring (cmd_val.c_str (), cmd_val.size () + 1,
 			      builtin_type (gdbarch)->builtin_char);
       }
 
@@ -2212,7 +2213,8 @@ str_value_from_setting (const cmd_list_element *cmd, struct gdbarch *gdbarch)
 	 escaping quotes.  So, we directly use the cmd->var string value,
 	 similarly to the value_from_setting code for these cases.  */
       if (*(char **) cmd->var)
-	return value_cstring (*(char **) cmd->var, strlen (*(char **) cmd->var),
+	return value_cstring (*(char **) cmd->var,
+			      strlen (*(char **) cmd->var) + 1,
 			      builtin_type (gdbarch)->builtin_char);
       else
 	return value_cstring ("", 1,
diff --git a/gdb/guile/scm-math.c b/gdb/guile/scm-math.c
index d9fd6718196..8b7fe9b69ec 100644
--- a/gdb/guile/scm-math.c
+++ b/gdb/guile/scm-math.c
@@ -776,8 +776,6 @@ vlscm_convert_typed_value_from_scheme (const char *func_name,
 	}
       else if (scm_is_string (obj))
 	{
-	  size_t len;
-
 	  if (type != NULL)
 	    {
 	      except_scm = gdbscm_make_misc_error (func_name, type_arg_pos,
@@ -789,12 +787,12 @@ vlscm_convert_typed_value_from_scheme (const char *func_name,
 	    {
 	      /* TODO: Provide option to specify conversion strategy.  */
 	      gdb::unique_xmalloc_ptr<char> s
-		= gdbscm_scm_to_string (obj, &len,
+		= gdbscm_scm_to_string (obj, nullptr,
 					target_charset (gdbarch),
 					0 /*non-strict*/,
 					&except_scm);
 	      if (s != NULL)
-		value = value_cstring (s.get (), len,
+		value = value_cstring (s.get (), strlen (s.get ()) + 1,
 				       language_string_char_type (language,
 								  gdbarch));
 	      else
diff --git a/gdb/python/py-value.c b/gdb/python/py-value.c
index 8df8a15f8d6..bc0f88b7a16 100644
--- a/gdb/python/py-value.c
+++ b/gdb/python/py-value.c
@@ -1907,7 +1907,7 @@ convert_value_from_python (PyObject *obj)
 	  gdb::unique_xmalloc_ptr<char> s
 	    = python_string_to_target_string (obj);
 	  if (s != NULL)
-	    value = value_cstring (s.get (), strlen (s.get ()),
+	    value = value_cstring (s.get (), strlen (s.get ()) + 1,
 				   builtin_type_pychar);
 	}
       else if (PyObject_TypeCheck (obj, &value_object_type))
diff --git a/gdb/testsuite/gdb.base/infcall-gdb-setting.c b/gdb/testsuite/gdb.base/infcall-gdb-setting.c
new file mode 100644
index 00000000000..f8f00063783
--- /dev/null
+++ b/gdb/testsuite/gdb.base/infcall-gdb-setting.c
@@ -0,0 +1,42 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2021 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+static char *buf;
+
+static char *
+print_str (const char *s)
+{
+  free (buf);
+  buf = strdup (s);
+  return buf;
+}
+
+static void
+end (void)
+{}
+
+int
+main (void)
+{
+  print_str ("unused");
+  end ();
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.base/infcall-gdb-setting.exp b/gdb/testsuite/gdb.base/infcall-gdb-setting.exp
new file mode 100644
index 00000000000..fd016c645a3
--- /dev/null
+++ b/gdb/testsuite/gdb.base/infcall-gdb-setting.exp
@@ -0,0 +1,204 @@
+# Copyright 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test that string values are correctly allocated inside GDB when doing
+# various operations that yield a string object.
+#
+# The issue that lead to this test was missing NULL terminator in the C-string
+# values.  The way to expose this is:
+#
+#  - Compile a test program with AddressSanitizer, with a function that takes a
+#    string as a parameter and does something with it that requires reading
+#    until the NULL terminator (e.g. strdups it).
+#  - Call this function by hand, passing the string value to test
+#
+# This requires GDB to copy the string value to the inferior, allocating
+# as many bytes of new memory as the struct value requires.  If the NULL
+# terminator is missing, the function will read past the allocated buffer,
+# triggering an ASan failure, failing the test.
+#
+# The "infcall" portion is really just there to test and expose bugs in the
+# value-creation code.
+
+load_lib "trace-support.exp"
+load_lib "gdb-guile.exp"
+
+standard_testfile
+
+if {[build_executable "failed to prepare" \
+     $testfile $srcfile \
+     {debug additional_flags=-fsanitize=address ldflags=-fsanitize=address}]} {
+    return
+}
+
+# Test with string values made by $_gdb_setting & co.
+
+proc_with_prefix test_setting { } {
+    clean_restart $::binfile
+
+    if {![runto_main]} {
+	fail "could not run to main"
+	return
+    }
+
+    # This is an internal GDB implementation detail, but the variable backing a
+    # string setting starts as nullptr (unless explicitly initialized at startup).
+    # When assigning an empty value, the variable then points to an empty string.
+    # Test both cases, as it triggers different code paths (in addition to a
+    # non-empty value).
+    #
+    # Use "set trace-user" and "maintenance set test-settings string" as they are
+    # both not initialized at startup.
+    gdb_test "print print_str(\$_gdb_setting(\"trace-user\"))" \
+	" = $::hex \"\"" \
+	"print \$_gdb_setting nullptr"
+    gdb_test "print print_str(\$_gdb_setting_str(\"trace-user\"))" \
+	" = $::hex \"\"" \
+	"print \$_gdb_setting_str nullptr"
+
+    gdb_test "set trace-user"
+    gdb_test "print print_str(\$_gdb_setting(\"trace-user\"))" \
+	" = $::hex \"\"" \
+	"print \$_gdb_setting empty"
+    gdb_test "print print_str(\$_gdb_setting_str(\"trace-user\"))" \
+	" = $::hex \"\"" \
+	"print \$_gdb_setting_str empty"
+
+    gdb_test "set trace-user poulet"
+    gdb_test "print print_str(\$_gdb_setting(\"trace-user\"))" \
+	" = $::hex \"poulet\"" \
+	"print \$_gdb_setting non-empty"
+    gdb_test "print print_str(\$_gdb_setting_str(\"trace-user\"))" \
+	" = $::hex \"poulet\"" \
+	"print \$_gdb_setting_str non-empty"
+
+    gdb_test "print print_str(\$_gdb_maint_setting(\"test-settings string\"))" \
+	" = $::hex \"\"" \
+	"print \$_gdb_maint_setting nullptr"
+    gdb_test "print print_str(\$_gdb_maint_setting_str(\"test-settings string\"))" \
+	" = $::hex \"\"" \
+	"print \$_gdb_maint_setting_str nullptr"
+
+    gdb_test "maintenance set test-settings string"
+    gdb_test "print print_str(\$_gdb_maint_setting(\"test-settings string\"))" \
+	" = $::hex \"\"" \
+	"print \$_gdb_maint_setting empty"
+    gdb_test "print print_str(\$_gdb_maint_setting_str(\"test-settings string\"))" \
+	" = $::hex \"\"" \
+	"print \$_gdb_maint_setting_str empty"
+
+    gdb_test "maintenance set test-settings string perchaude"
+    gdb_test "print print_str(\$_gdb_maint_setting(\"test-settings string\"))" \
+	" = $::hex \"perchaude\"" \
+	"print \$_gdb_maint_setting non-empty"
+    gdb_test "print print_str(\$_gdb_maint_setting_str(\"test-settings string\"))" \
+	" = $::hex \"perchaude\"" \
+	"print \$_gdb_maint_setting_str non-empty"
+
+    # Test with a non-string setting, this tests yet another code path.
+    gdb_test "print print_str(\$_gdb_setting_str(\"remotetimeout\"))" \
+	" = $::hex \"2\"" \
+	"print \$_gdb_setting integer"
+}
+
+# Test with a string value created by gdb.Value in Python.
+
+proc_with_prefix test_python_value { } {
+    clean_restart $::binfile
+
+    if {![runto_main]} {
+	fail "could not run to main"
+	return
+    }
+
+    if {[skip_python_tests]} {
+	untested "skipping test_python_value"
+	return
+    }
+
+    gdb_test_no_output "python gdb.set_convenience_variable(\"foo\", \"bar\")" \
+	"set convenience var"
+    gdb_test "print print_str(\$foo)" \
+	" = $::hex \"bar\"" \
+	"print from Python value"
+}
+
+# Test with a string value created by make-value in Guile.
+
+proc_with_prefix test_guile_value { } {
+    clean_restart $::binfile
+
+    if {![runto_main]} {
+	fail "could not run to main"
+	return
+    }
+
+    if {[skip_guile_tests]} {
+	untested "skipping test_guile_value"
+	return
+    }
+
+    gdb_test_no_output "guile (use-modules (gdb))"
+    gdb_test_multiple "guile (history-append! (make-value \"foo\"))" "make value" {
+	-re -wrap "($::decimal)" {
+	    set histnum $expect_out(1,string)
+	}
+    }
+
+    gdb_test "print print_str(\$$histnum)" \
+        " = $::hex \"foo\"" \
+        "print from Guile value"
+}
+
+# Test with a string value coming from a string internal var.  The only internal
+# vars of this type, at the time of writing, are $trace_func and $trace_file.
+# They both require inspecting a trace frame.  So if the target is capable start
+# tracing, record one trace frame, and use $trace_func.
+
+proc_with_prefix test_internal_var { } {
+    if {![gdb_trace_common_supports_arch]} {
+	unsupported "arch does not support trace"
+	return
+    }
+
+    clean_restart $::binfile
+
+    if {![runto_main]} {
+	fail "could not run to main"
+	return
+    }
+
+    if {![gdb_target_supports_trace]} {
+	unsupported "target does not support trace"
+	return
+    }
+
+    gdb_test "break end" "Breakpoint $::decimal at $::hex.*"
+    gdb_test "trace print_str" "Tracepoint $::decimal at $::hex.*"
+    gdb_test_no_output "tstart"
+    gdb_test "continue" "Breakpoint $::decimal, end.*"
+    gdb_test_no_output "tstop"
+    gdb_test "tfind" "Found trace frame 0, tracepoint $::decimal.*"
+    gdb_test_no_output "set \$convenience = \$trace_func"
+    gdb_test "tfind none" "No longer looking at any trace frame.*"
+    gdb_test "print print_str(\$convenience)" \
+	" = $::hex \"print_str\"" \
+	"print internal var string"
+}
+
+test_setting
+test_python_value
+test_guile_value
+test_internal_var
diff --git a/gdb/testsuite/gdb.base/settings.exp b/gdb/testsuite/gdb.base/settings.exp
index 64257aaae5a..0a594685d6b 100644
--- a/gdb/testsuite/gdb.base/settings.exp
+++ b/gdb/testsuite/gdb.base/settings.exp
@@ -498,7 +498,7 @@ proc_with_prefix test-enum {} {
     gdb_test_no_output "$set_cmd zzz"
     show_setting "$show_cmd" "zzz" 0 "yyy"
 
-    check_type "test-settings enum" "type = char \\\[3\\\]"
+    check_type "test-settings enum" "type = char \\\[4\\\]"
 
     test_gdb_complete_multiple "$set_cmd " "" "" {
 	"xxx"
diff --git a/gdb/testsuite/gdb.python/py-mi.exp b/gdb/testsuite/gdb.python/py-mi.exp
index 6f0c1c65e62..b81ee9acd74 100644
--- a/gdb/testsuite/gdb.python/py-mi.exp
+++ b/gdb/testsuite/gdb.python/py-mi.exp
@@ -289,7 +289,7 @@ mi_create_dynamic_varobj nstype2 nstype2 1 \
   "create nstype2 varobj"
 
 mi_list_varobj_children nstype2 {
-    { {nstype2.<error at 0>} {<error at 0>} 6 {char \[6\]} }
+    { {nstype2.<error at 0>} {<error at 0>} 7 {char \[7\]} }
 } "list children after setting exception flag"
 
 mi_create_varobj me me \
diff --git a/gdb/valops.c b/gdb/valops.c
index bd547923496..f67612e87b0 100644
--- a/gdb/valops.c
+++ b/gdb/valops.c
@@ -1747,6 +1747,8 @@ value_array (int lowbound, int highbound, struct value **elemvec)
   return val;
 }
 
+/* See value.h.  */
+
 struct value *
 value_cstring (const char *ptr, ssize_t len, struct type *char_type)
 {
@@ -1761,14 +1763,7 @@ value_cstring (const char *ptr, ssize_t len, struct type *char_type)
   return val;
 }
 
-/* Create a value for a string constant by allocating space in the
-   inferior, copying the data into that space, and returning the
-   address with type TYPE_CODE_STRING.  PTR points to the string
-   constant data; LEN is number of characters.
-
-   Note that string types are like array of char types with a lower
-   bound of zero and an upper bound of LEN - 1.  Also note that the
-   string may contain embedded null bytes.  */
+/* See value.h.  */
 
 struct value *
 value_string (const char *ptr, ssize_t len, struct type *char_type)
diff --git a/gdb/value.c b/gdb/value.c
index 6a07495d32b..db34d2e5762 100644
--- a/gdb/value.c
+++ b/gdb/value.c
@@ -2188,7 +2188,7 @@ value_of_internalvar (struct gdbarch *gdbarch, struct internalvar *var)
       break;
 
     case INTERNALVAR_STRING:
-      val = value_cstring (var->u.string, strlen (var->u.string),
+      val = value_cstring (var->u.string, strlen (var->u.string) + 1,
 			   builtin_type (gdbarch)->builtin_char);
       break;
 
diff --git a/gdb/value.h b/gdb/value.h
index 379cddafbe7..24392437ed0 100644
--- a/gdb/value.h
+++ b/gdb/value.h
@@ -782,8 +782,22 @@ class scoped_value_mark
   const struct value *m_value;
 };
 
+/* Create a not_lval value with type TYPE_CODE_ARRAY.
+
+   PTR points to the string data; LEN is the length of the string including the
+   NULL terminator.  */
+
 extern struct value *value_cstring (const char *ptr, ssize_t len,
 				    struct type *char_type);
+
+/* Create a not_lval value with type TYPE_CODE_STRING.
+
+   PTR points to the string data; LEN is number of characters.
+
+   Note that string types are like array of char types with a lower
+   bound of zero and an upper bound of LEN - 1.  Also note that the
+   string may contain embedded null bytes.  */
+
 extern struct value *value_string (const char *ptr, ssize_t len,
 				   struct type *char_type);
 
-- 
2.32.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] gdb: consider null terminator for length arguments of value_cstring calls
  2021-07-13 15:31 [PATCH] gdb: consider null terminator for length arguments of value_cstring calls Simon Marchi
@ 2021-07-13 17:07 ` Pedro Alves
  2021-07-13 18:44   ` Simon Marchi
  0 siblings, 1 reply; 8+ messages in thread
From: Pedro Alves @ 2021-07-13 17:07 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 2021-07-13 4:31 p.m., Simon Marchi via Gdb-patches wrote:

> A way to expose the bug, as implemented in the new test, is to make an
> inferior function call with this value, using a function that reads the
> string until the null terminator.  That allocates some memory in the
> inferior for the string and copies it.  If the null terminator is
> missing, the function will read past the allocated buffer, possibly
> causing a crash.  In practice, the memory following the buffer must
> pretty often be zeroed-out, because I didn't see a crash just with this.
> But if we compile the inferior program with AddressSanitizer, then the
> out-of-bounds access causes a crash systematically.
> 
> It would be nice to find a test that does not require compiling the
> inferior program with AddressSanitizer, if somebody has a better idea,
> but at least I think it's better than no test.

Do we need to run a program at all?  How about simply something based on this?

Before patch:

 (gdb) maint set test-settings string foo
 (gdb) p $_gdb_maint_setting("test-settings string")
 $1 = "foo"
 (gdb) p $_gdb_maint_setting("test-settings string")[0]
 $2 = 102 'f'
 (gdb) p $_gdb_maint_setting("test-settings string")[1]
 $3 = 111 'o'
 (gdb) p $_gdb_maint_setting("test-settings string")[2]
 $4 = 111 'o'
 (gdb) p $_gdb_maint_setting("test-settings string")[3]
 no such vector element

After patch:

 (gdb) maint set test-settings string foo
 (gdb) p $_gdb_maint_setting("test-settings string")
 $1 = "foo"
 (gdb) p $_gdb_maint_setting("test-settings string")[0]
 $2 = 102 'f'
 (gdb) p $_gdb_maint_setting("test-settings string")[1]
 $3 = 111 'o'
 (gdb) p $_gdb_maint_setting("test-settings string")[2]
 $4 = 111 'o'
 (gdb) p $_gdb_maint_setting("test-settings string")[3]
 $5 = 0 '\000'
 (gdb) p $_gdb_maint_setting("test-settings string")[4]
 no such vector element

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] gdb: consider null terminator for length arguments of value_cstring calls
  2021-07-13 17:07 ` Pedro Alves
@ 2021-07-13 18:44   ` Simon Marchi
  2021-07-14 10:25     ` Pedro Alves
  0 siblings, 1 reply; 8+ messages in thread
From: Simon Marchi @ 2021-07-13 18:44 UTC (permalink / raw)
  To: Pedro Alves, Simon Marchi, gdb-patches

On 2021-07-13 1:07 p.m., Pedro Alves wrote:
> On 2021-07-13 4:31 p.m., Simon Marchi via Gdb-patches wrote:
> 
>> A way to expose the bug, as implemented in the new test, is to make an
>> inferior function call with this value, using a function that reads the
>> string until the null terminator.  That allocates some memory in the
>> inferior for the string and copies it.  If the null terminator is
>> missing, the function will read past the allocated buffer, possibly
>> causing a crash.  In practice, the memory following the buffer must
>> pretty often be zeroed-out, because I didn't see a crash just with this.
>> But if we compile the inferior program with AddressSanitizer, then the
>> out-of-bounds access causes a crash systematically.
>>
>> It would be nice to find a test that does not require compiling the
>> inferior program with AddressSanitizer, if somebody has a better idea,
>> but at least I think it's better than no test.
> 
> Do we need to run a program at all?  How about simply something based on this?
> 
> Before patch:
> 
>  (gdb) maint set test-settings string foo
>  (gdb) p $_gdb_maint_setting("test-settings string")
>  $1 = "foo"
>  (gdb) p $_gdb_maint_setting("test-settings string")[0]
>  $2 = 102 'f'
>  (gdb) p $_gdb_maint_setting("test-settings string")[1]
>  $3 = 111 'o'
>  (gdb) p $_gdb_maint_setting("test-settings string")[2]
>  $4 = 111 'o'
>  (gdb) p $_gdb_maint_setting("test-settings string")[3]
>  no such vector element
> 
> After patch:
> 
>  (gdb) maint set test-settings string foo
>  (gdb) p $_gdb_maint_setting("test-settings string")
>  $1 = "foo"
>  (gdb) p $_gdb_maint_setting("test-settings string")[0]
>  $2 = 102 'f'
>  (gdb) p $_gdb_maint_setting("test-settings string")[1]
>  $3 = 111 'o'
>  (gdb) p $_gdb_maint_setting("test-settings string")[2]
>  $4 = 111 'o'
>  (gdb) p $_gdb_maint_setting("test-settings string")[3]
>  $5 = 0 '\000'
>  (gdb) p $_gdb_maint_setting("test-settings string")[4]
>  no such vector element

Ohh, thanks.  The changes I needed to do in the existing tests should
have made me think about that.  I liked the inferior call approach,
because it makes the inferior program crash, so it's an obvious bug.  By
just printing the string elements, it may not be as clear cut that it's
a bug.  As long as the values stay in GDB, they don't cause a crash (or
at least, I didn't find a way to cause a crash).

But if there's not doubt that null-terminating these strings is what we
want, your approach is fine and makes the test a bit simpler.  I added a
little mention of the inferior function call way of reproducing the bug
in the commit message.  See the updated patch below.


From 633cacecf3e6fb958ca9fbf33a03f2ec416320dc Mon Sep 17 00:00:00 2001
From: Simon Marchi <simon.marchi@polymtl.ca>
Date: Tue, 13 Jul 2021 11:31:42 -0400
Subject: [PATCH v2] gdb: consider null terminator for length arguments of
 value_cstring calls

This patch started when I looked at this bit in cli/cli-cmds.c:

    if (*(char **) cmd->var)
      return value_cstring (*(char **) cmd->var, strlen (*(char **) cmd->var),
			    builtin_type (gdbarch)->builtin_char);
    else
      return value_cstring ("", 1,
			    builtin_type (gdbarch)->builtin_char);

I found it odd that the first value_cstring call passes a length that
does not consider the null terminator of the C string, but second does.
I want to clarify the documentation of value_cstring and fix the one
that is wrong.

Debugging a little bit, I found that the first call is the wrong one.
Doing:

  (gdb) set args foo
  (gdb) print $_gdb_setting("args")
  $1 = "foo"

creates a struct value of code TYPE_CODE_ARRAY of size 3, which doesn't
have a null terminator.  That does not create a valid C string.  It is
however printed correctly by GDB, because the printing code makes sure
not to read past the value's length.

A way to expose the bug is to print each element of the created string,
including the null terminator.  Before:

    (gdb) set args foo
    (gdb) p $_gdb_setting("args")[3]
    no such vector element

After:

    (gdb) set args foo
    (gdb) p $_gdb_setting("args")[3]
    $1 = 0 '\000'

Another perhaps more convincing way of showing the bug is if the string
value is passed to an inferior function call;

    (gdb) print an_inferior_function($_gdb_setting("args))

The space allocate for the string in the inferior will not take into
account a null terminator (with the string "foo", 3 bytes will be
allocated).  If the inferior tries to read the string until the null
terminator, it will read past the allocated buffer.  Compiling the
inferior with AddressSanitizer makes that bad access obvious.

I found a few calls to value_cstring that were wrong, I fixed them
all, all exercised by the test.

The change in guile/scm-math.c maybe requires a bit of explanation.
According to the doc of gdbscm_scm_to_string, if don't pass a length
pointer, we get back a null-terminated string.  If we pass a length
pointer, we get a non-null-terminated string, but we get the length
separately.  Trying to pass "len + 1" to value_cstring would lead to GDB
reading past the allocated buffer, that is exactly of length `len`.  So
instead, don't pass a length pointer and use strlen on the result.

gdb.base/settings.exp and gdb.python/py-mi.exp required changes in some
expected outputs, because the array type created by $_gdb_setting_str
is now one element larger, to account for the null terminator.  I don't
think this change in user-visible behavior is a problem.

Change-Id: If8dd13cce55c70521e34e7c360139064b4e87496
---
 gdb/cli/cli-cmds.c                            |   8 +-
 gdb/guile/scm-math.c                          |   6 +-
 gdb/python/py-value.c                         |   2 +-
 .../gdb.base/internal-string-values.c         |  32 +++
 .../gdb.base/internal-string-values.exp       | 198 ++++++++++++++++++
 gdb/testsuite/gdb.base/settings.exp           |   2 +-
 gdb/testsuite/gdb.python/py-mi.exp            |   2 +-
 gdb/valops.c                                  |  11 +-
 gdb/value.c                                   |   2 +-
 gdb/value.h                                   |  14 ++
 10 files changed, 258 insertions(+), 19 deletions(-)
 create mode 100644 gdb/testsuite/gdb.base/internal-string-values.c
 create mode 100644 gdb/testsuite/gdb.base/internal-string-values.exp

diff --git a/gdb/cli/cli-cmds.c b/gdb/cli/cli-cmds.c
index 56ae12a0c19d..f6fc5ab854f4 100644
--- a/gdb/cli/cli-cmds.c
+++ b/gdb/cli/cli-cmds.c
@@ -2146,7 +2146,8 @@ value_from_setting (const cmd_list_element *cmd, struct gdbarch *gdbarch)
     case var_filename:
     case var_enum:
       if (*(char **) cmd->var)
-	return value_cstring (*(char **) cmd->var, strlen (*(char **) cmd->var),
+	return value_cstring (*(char **) cmd->var,
+			      strlen (*(char **) cmd->var) + 1,
 			      builtin_type (gdbarch)->builtin_char);
       else
 	return value_cstring ("", 1,
@@ -2198,7 +2199,7 @@ str_value_from_setting (const cmd_list_element *cmd, struct gdbarch *gdbarch)
       {
 	std::string cmd_val = get_setshow_command_value_string (cmd);
 
-	return value_cstring (cmd_val.c_str (), cmd_val.size (),
+	return value_cstring (cmd_val.c_str (), cmd_val.size () + 1,
 			      builtin_type (gdbarch)->builtin_char);
       }
 
@@ -2212,7 +2213,8 @@ str_value_from_setting (const cmd_list_element *cmd, struct gdbarch *gdbarch)
 	 escaping quotes.  So, we directly use the cmd->var string value,
 	 similarly to the value_from_setting code for these cases.  */
       if (*(char **) cmd->var)
-	return value_cstring (*(char **) cmd->var, strlen (*(char **) cmd->var),
+	return value_cstring (*(char **) cmd->var,
+			      strlen (*(char **) cmd->var) + 1,
 			      builtin_type (gdbarch)->builtin_char);
       else
 	return value_cstring ("", 1,
diff --git a/gdb/guile/scm-math.c b/gdb/guile/scm-math.c
index d9fd67181966..8b7fe9b69ecd 100644
--- a/gdb/guile/scm-math.c
+++ b/gdb/guile/scm-math.c
@@ -776,8 +776,6 @@ vlscm_convert_typed_value_from_scheme (const char *func_name,
 	}
       else if (scm_is_string (obj))
 	{
-	  size_t len;
-
 	  if (type != NULL)
 	    {
 	      except_scm = gdbscm_make_misc_error (func_name, type_arg_pos,
@@ -789,12 +787,12 @@ vlscm_convert_typed_value_from_scheme (const char *func_name,
 	    {
 	      /* TODO: Provide option to specify conversion strategy.  */
 	      gdb::unique_xmalloc_ptr<char> s
-		= gdbscm_scm_to_string (obj, &len,
+		= gdbscm_scm_to_string (obj, nullptr,
 					target_charset (gdbarch),
 					0 /*non-strict*/,
 					&except_scm);
 	      if (s != NULL)
-		value = value_cstring (s.get (), len,
+		value = value_cstring (s.get (), strlen (s.get ()) + 1,
 				       language_string_char_type (language,
 								  gdbarch));
 	      else
diff --git a/gdb/python/py-value.c b/gdb/python/py-value.c
index 8df8a15f8d6b..bc0f88b7a166 100644
--- a/gdb/python/py-value.c
+++ b/gdb/python/py-value.c
@@ -1907,7 +1907,7 @@ convert_value_from_python (PyObject *obj)
 	  gdb::unique_xmalloc_ptr<char> s
 	    = python_string_to_target_string (obj);
 	  if (s != NULL)
-	    value = value_cstring (s.get (), strlen (s.get ()),
+	    value = value_cstring (s.get (), strlen (s.get ()) + 1,
 				   builtin_type_pychar);
 	}
       else if (PyObject_TypeCheck (obj, &value_object_type))
diff --git a/gdb/testsuite/gdb.base/internal-string-values.c b/gdb/testsuite/gdb.base/internal-string-values.c
new file mode 100644
index 000000000000..b9c7c847cbf5
--- /dev/null
+++ b/gdb/testsuite/gdb.base/internal-string-values.c
@@ -0,0 +1,32 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2021 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+static void
+trace_me (void)
+{}
+
+static void
+end (void)
+{}
+
+int
+main (void)
+{
+  trace_me ();
+  end ();
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.base/internal-string-values.exp b/gdb/testsuite/gdb.base/internal-string-values.exp
new file mode 100644
index 000000000000..0748ab0984c6
--- /dev/null
+++ b/gdb/testsuite/gdb.base/internal-string-values.exp
@@ -0,0 +1,198 @@
+# Copyright 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test that string values are correctly allocated inside GDB when doing
+# various operations that yield strings.
+#
+# The issue that lead to this test was a missing NULL terminator in the C-string
+# values.  We verify that we can print the null terminator of these strings.
+
+load_lib "trace-support.exp"
+load_lib "gdb-guile.exp"
+
+standard_testfile
+
+if {[build_executable "failed to prepare" $testfile $srcfile ]} {
+    return
+}
+
+# Check that the string contained in the convenienced variable $v is
+# EXPECTED_STR.
+#
+# In particular, check that the null terminator is there and that we can't
+# access a character past the end of the string.
+
+proc check_v_string { expected_str } {
+    set len [string length $expected_str]
+
+    for { set i 0 } { $i < $len } { incr i } {
+	set c [string index $expected_str $i]
+	gdb_test "print \$v\[$i\]" "= $::decimal '$c'"
+    }
+
+    # Check that the string ends with a null terminator.
+    gdb_test "print \$v\[$i\]" {= 0 '\\000'}
+
+    # Check that we can't access a character after the end of the string.
+    incr i
+    gdb_test "print \$v\[$i\]" "no such vector element"
+}
+
+# Test with string values made by $_gdb_setting & co.
+
+proc_with_prefix test_setting { } {
+    clean_restart
+
+    # This is an internal GDB implementation detail, but the variable backing a
+    # string setting starts as nullptr (unless explicitly initialized at startup).
+    # When assigning an empty value, the variable then points to an empty string.
+    # Test both cases, as it triggers different code paths (in addition to a
+    # non-empty value).
+    #
+    # Use "set trace-user" and "maintenance set test-settings string" as they are
+    # both not initialized at startup.
+    with_test_prefix "user setting" {
+	with_test_prefix "not set" {
+	    foreach_with_prefix conv_func {$_gdb_setting $_gdb_setting_str} {
+		gdb_test_no_output "set \$v = ${conv_func}(\"trace-user\")"
+		check_v_string ""
+	    }
+	}
+
+	with_test_prefix "set to empty" {
+	    gdb_test "set trace-user"
+	    foreach_with_prefix conv_func {$_gdb_setting $_gdb_setting_str} {
+		gdb_test_no_output "set \$v = ${conv_func}(\"trace-user\")"
+		check_v_string ""
+	    }
+	}
+
+	with_test_prefix "set" {
+	    gdb_test "set trace-user poulet"
+	    foreach_with_prefix conv_func {$_gdb_setting $_gdb_setting_str} {
+		gdb_test_no_output {set $v = $_gdb_setting("trace-user")}
+		check_v_string "poulet"
+	    }
+	}
+    }
+
+    with_test_prefix "maintenance setting" {
+	with_test_prefix "not set" {
+	    foreach_with_prefix conv_func {$_gdb_maint_setting $_gdb_maint_setting_str} {
+		gdb_test_no_output "set \$v = ${conv_func}(\"test-settings string\")"
+		check_v_string ""
+	    }
+	}
+
+	with_test_prefix "set to empty" {
+	    gdb_test "maintenance set test-settings string"
+	    foreach_with_prefix conv_func {$_gdb_maint_setting $_gdb_maint_setting_str} {
+		gdb_test_no_output "set \$v = ${conv_func}(\"test-settings string\")"
+		check_v_string ""
+	    }
+	}
+
+	with_test_prefix "set" {
+	    gdb_test "maintenance set test-settings string perchaude"
+	    foreach_with_prefix conv_func {$_gdb_maint_setting $_gdb_maint_setting_str} {
+		gdb_test_no_output "set \$v = ${conv_func}(\"test-settings string\")"
+		check_v_string "perchaude"
+	    }
+	}
+    }
+
+    # Test with a non-string setting, this tests yet another code path.
+    with_test_prefix "integer setting" {
+	gdb_test_no_output {set $v = $_gdb_setting_str("remotetimeout")}
+	check_v_string "2"
+    }
+}
+
+# Test with a string value created by gdb.Value in Python.
+
+proc_with_prefix test_python_value { } {
+    clean_restart
+
+    if {[skip_python_tests]} {
+	untested "skipping test_python_value"
+	return
+    }
+
+    gdb_test_no_output "python gdb.set_convenience_variable(\"v\", \"bar\")" \
+	"set convenience var"
+    check_v_string "bar"
+}
+
+# Test with a string value created by make-value in Guile.
+
+proc_with_prefix test_guile_value { } {
+    clean_restart
+
+    if {[skip_guile_tests]} {
+	untested "skipping test_guile_value"
+	return
+    }
+
+    # We can't set a convenience var from Guile, but we can append to history.
+    # Do that, then transfer to a convenience var with a CLI command.
+    gdb_test_no_output "guile (use-modules (gdb))"
+    gdb_test_multiple "guile (history-append! (make-value \"foo\"))" "make value" {
+	-re -wrap "($::decimal)" {
+	    set histnum $expect_out(1,string)
+	}
+    }
+
+    gdb_test_no_output "set \$v = \$$histnum"
+    check_v_string "foo"
+}
+
+# Test with a string value coming from a string internal var.  The only internal
+# vars of this type, at the time of writing, are $trace_func and $trace_file.
+# They both require inspecting a trace frame.  So if the target is capable start
+# tracing, record one trace frame, and use $trace_func.
+
+proc_with_prefix test_internal_var { } {
+    if {![gdb_trace_common_supports_arch]} {
+	unsupported "arch does not support trace"
+	return
+    }
+
+    clean_restart $::binfile
+
+    if {![runto_main]} {
+	fail "could not run to main"
+	return
+    }
+
+    if {![gdb_target_supports_trace]} {
+	unsupported "target does not support trace"
+	return
+    }
+
+    gdb_test "break end" "Breakpoint $::decimal at $::hex.*"
+    gdb_test "trace trace_me" "Tracepoint $::decimal at $::hex.*"
+    gdb_test_no_output "tstart"
+    gdb_test "continue" "Breakpoint $::decimal, end.*"
+    gdb_test_no_output "tstop"
+    gdb_test "tfind" "Found trace frame 0, tracepoint $::decimal.*"
+    gdb_test_no_output "set \$v = \$trace_func"
+    gdb_test "tfind none" "No longer looking at any trace frame.*"
+    check_v_string "trace_me"
+}
+
+test_setting
+test_python_value
+test_guile_value
+test_internal_var
diff --git a/gdb/testsuite/gdb.base/settings.exp b/gdb/testsuite/gdb.base/settings.exp
index 64257aaae5a7..0a594685d6b1 100644
--- a/gdb/testsuite/gdb.base/settings.exp
+++ b/gdb/testsuite/gdb.base/settings.exp
@@ -498,7 +498,7 @@ proc_with_prefix test-enum {} {
     gdb_test_no_output "$set_cmd zzz"
     show_setting "$show_cmd" "zzz" 0 "yyy"
 
-    check_type "test-settings enum" "type = char \\\[3\\\]"
+    check_type "test-settings enum" "type = char \\\[4\\\]"
 
     test_gdb_complete_multiple "$set_cmd " "" "" {
 	"xxx"
diff --git a/gdb/testsuite/gdb.python/py-mi.exp b/gdb/testsuite/gdb.python/py-mi.exp
index 6f0c1c65e62a..b81ee9acd747 100644
--- a/gdb/testsuite/gdb.python/py-mi.exp
+++ b/gdb/testsuite/gdb.python/py-mi.exp
@@ -289,7 +289,7 @@ mi_create_dynamic_varobj nstype2 nstype2 1 \
   "create nstype2 varobj"
 
 mi_list_varobj_children nstype2 {
-    { {nstype2.<error at 0>} {<error at 0>} 6 {char \[6\]} }
+    { {nstype2.<error at 0>} {<error at 0>} 7 {char \[7\]} }
 } "list children after setting exception flag"
 
 mi_create_varobj me me \
diff --git a/gdb/valops.c b/gdb/valops.c
index bd547923496f..f67612e87b08 100644
--- a/gdb/valops.c
+++ b/gdb/valops.c
@@ -1747,6 +1747,8 @@ value_array (int lowbound, int highbound, struct value **elemvec)
   return val;
 }
 
+/* See value.h.  */
+
 struct value *
 value_cstring (const char *ptr, ssize_t len, struct type *char_type)
 {
@@ -1761,14 +1763,7 @@ value_cstring (const char *ptr, ssize_t len, struct type *char_type)
   return val;
 }
 
-/* Create a value for a string constant by allocating space in the
-   inferior, copying the data into that space, and returning the
-   address with type TYPE_CODE_STRING.  PTR points to the string
-   constant data; LEN is number of characters.
-
-   Note that string types are like array of char types with a lower
-   bound of zero and an upper bound of LEN - 1.  Also note that the
-   string may contain embedded null bytes.  */
+/* See value.h.  */
 
 struct value *
 value_string (const char *ptr, ssize_t len, struct type *char_type)
diff --git a/gdb/value.c b/gdb/value.c
index 6a07495d32bc..db34d2e5762f 100644
--- a/gdb/value.c
+++ b/gdb/value.c
@@ -2188,7 +2188,7 @@ value_of_internalvar (struct gdbarch *gdbarch, struct internalvar *var)
       break;
 
     case INTERNALVAR_STRING:
-      val = value_cstring (var->u.string, strlen (var->u.string),
+      val = value_cstring (var->u.string, strlen (var->u.string) + 1,
 			   builtin_type (gdbarch)->builtin_char);
       break;
 
diff --git a/gdb/value.h b/gdb/value.h
index 379cddafbe7f..24392437ed0b 100644
--- a/gdb/value.h
+++ b/gdb/value.h
@@ -782,8 +782,22 @@ class scoped_value_mark
   const struct value *m_value;
 };
 
+/* Create a not_lval value with type TYPE_CODE_ARRAY.
+
+   PTR points to the string data; LEN is the length of the string including the
+   NULL terminator.  */
+
 extern struct value *value_cstring (const char *ptr, ssize_t len,
 				    struct type *char_type);
+
+/* Create a not_lval value with type TYPE_CODE_STRING.
+
+   PTR points to the string data; LEN is number of characters.
+
+   Note that string types are like array of char types with a lower
+   bound of zero and an upper bound of LEN - 1.  Also note that the
+   string may contain embedded null bytes.  */
+
 extern struct value *value_string (const char *ptr, ssize_t len,
 				   struct type *char_type);
 
-- 
2.32.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] gdb: consider null terminator for length arguments of value_cstring calls
  2021-07-13 18:44   ` Simon Marchi
@ 2021-07-14 10:25     ` Pedro Alves
  2021-07-14 13:50       ` Simon Marchi
  2021-07-21 15:21       ` Philippe Waroquiers
  0 siblings, 2 replies; 8+ messages in thread
From: Pedro Alves @ 2021-07-14 10:25 UTC (permalink / raw)
  To: Simon Marchi, Simon Marchi, gdb-patches, Philippe Waroquiers

Adding Philippe as $_gdb_setting original author.

On 2021-07-13 7:44 p.m., Simon Marchi wrote:
> On 2021-07-13 1:07 p.m., Pedro Alves wrote:
>> On 2021-07-13 4:31 p.m., Simon Marchi via Gdb-patches wrote:

> Ohh, thanks.  The changes I needed to do in the existing tests should
> have made me think about that.  I liked the inferior call approach,
> because it makes the inferior program crash, so it's an obvious bug.  By
> just printing the string elements, it may not be as clear cut that it's
> a bug.  As long as the values stay in GDB, they don't cause a crash (or
> at least, I didn't find a way to cause a crash).
> 
> But if there's not doubt that null-terminating these strings is what we
> want, your approach is fine and makes the test a bit simpler.  I added a
> little mention of the inferior function call way of reproducing the bug
> in the commit message.  See the updated patch below.
> 

For C, it makes sence to me to return a null terminated string.  Though I wonder
whether the string you get should depend on language?  I mean, see this:

 (gdb) set language c
 (gdb) ptype "foo"
 type = char [4]
 (gdb) set language ada
 (gdb) ptype "foo"
 type = array (1 .. 3) of character

Whatever we decide, I think we should update the manual to make it explicit.

> 
> creates a struct value of code TYPE_CODE_ARRAY of size 3, which doesn't
> have a null terminator.  That does not create a valid C string.  It is
> however printed correctly by GDB, because the printing code makes sure
> not to read past the value's length.
> 
> A way to expose the bug is to print each element of the created string,
> including the null terminator.  Before:
> 
>     (gdb) set args foo
>     (gdb) p $_gdb_setting("args")[3]
>     no such vector element
> 
> After:
> 
>     (gdb) set args foo
>     (gdb) p $_gdb_setting("args")[3]
>     $1 = 0 '\000'
> 
> Another perhaps more convincing way of showing the bug is if the string
> value is passed to an inferior function call;
> 
>     (gdb) print an_inferior_function($_gdb_setting("args))
> 
> The space allocate for the string in the inferior will not take into

"The space allocated" ?

Such an inferior call in Ada (and other non-terminated string languages) will now
be passing an unexpected \0 to the called function, of course.  I wonder
whether that's why Philippe didn't add the \0 in the first place, given his
Ada bias?  :-)

> account a null terminator (with the string "foo", 3 bytes will be
> allocated).  If the inferior tries to read the string until the null
> terminator, it will read past the allocated buffer.  Compiling the
> inferior with AddressSanitizer makes that bad access obvious.
> 
> I found a few calls to value_cstring that were wrong, I fixed them
> all, all exercised by the test.
> 
> The change in guile/scm-math.c maybe requires a bit of explanation.
> According to the doc of gdbscm_scm_to_string, if don't pass a length

"if we don't" ?

> pointer, we get back a null-terminated string.  If we pass a length
> pointer, we get a non-null-terminated string, but we get the length
> separately.  Trying to pass "len + 1" to value_cstring would lead to GDB
> reading past the allocated buffer, that is exactly of length `len`.  So
> instead, don't pass a length pointer and use strlen on the result.



> +load_lib "trace-support.exp"
> +load_lib "gdb-guile.exp"
> +
> +standard_testfile
> +
> +if {[build_executable "failed to prepare" $testfile $srcfile ]} {
> +    return
> +}
> +
> +# Check that the string contained in the convenienced variable $v is

convenienced -> convenience

> +# EXPECTED_STR.
> +#
> +# In particular, check that the null terminator is there and that we can't
> +# access a character past the end of the string.
> +
> +proc check_v_string { expected_str } {

> +# Test with string values made by $_gdb_setting & co.
> +
> +proc_with_prefix test_setting { } {
> +    clean_restart
> +
> +    # This is an internal GDB implementation detail, but the variable backing a
> +    # string setting starts as nullptr (unless explicitly initialized at startup).
> +    # When assigning an empty value, the variable then points to an empty string.
> +    # Test both cases, as it triggers different code paths (in addition to a
> +    # non-empty value).

The paranthesis remark in this last sentence gave me pause, as it sounded like
you were saying that the different code paths result in a non-empty value.

I'd suggest instead:

   # Test both cases, as it triggers different code paths, in addition to testing a
   # non-empty value.

> +    #
> +    # Use "set trace-user" and "maintenance set test-settings string" as they are
> +    # both not initialized at startup.
> +    with_test_prefix "user setting" {
> +	with_test_prefix "not set" {
> +	    foreach_with_prefix conv_func {$_gdb_setting $_gdb_setting_str} {
> +		gdb_test_no_output "set \$v = ${conv_func}(\"trace-user\")"
> +		check_v_string ""
> +	    }
> +	}


> +# Test with a string value created by make-value in Guile.
> +
> +proc_with_prefix test_guile_value { } {
> +    clean_restart
> +
> +    if {[skip_guile_tests]} {
> +	untested "skipping test_guile_value"
> +	return
> +    }
> +
> +    # We can't set a convenience var from Guile, but we can append to history.
> +    # Do that, then transfer to a convenience var with a CLI command.
> +    gdb_test_no_output "guile (use-modules (gdb))"
> +    gdb_test_multiple "guile (history-append! (make-value \"foo\"))" "make value" {
> +	-re -wrap "($::decimal)" {
> +	    set histnum $expect_out(1,string)

set histnum to something before gdb_test_multiple, to avoid ending up with an
unset tcl variable error if this gdb_test_multiple fails.

> +	}
> +    }
> +
> +    gdb_test_no_output "set \$v = \$$histnum"
> +    check_v_string "foo"
> +}
> +




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] gdb: consider null terminator for length arguments of value_cstring calls
  2021-07-14 10:25     ` Pedro Alves
@ 2021-07-14 13:50       ` Simon Marchi
  2021-07-15 14:29         ` Pedro Alves
  2021-07-21 15:21       ` Philippe Waroquiers
  1 sibling, 1 reply; 8+ messages in thread
From: Simon Marchi @ 2021-07-14 13:50 UTC (permalink / raw)
  To: Pedro Alves, Simon Marchi, gdb-patches, Philippe Waroquiers

> For C, it makes sence to me to return a null terminated string.  Though I wonder
> whether the string you get should depend on language?  I mean, see this:
>
>  (gdb) set language c
>  (gdb) ptype "foo"
>  type = char [4]
>  (gdb) set language ada
>  (gdb) ptype "foo"
>  type = array (1 .. 3) of character
>
> Whatever we decide, I think we should update the manual to make it explicit.

Agreed, I think the result of:

  (gdb) ptype "foo"

should be the same as

  (gdb) set args foo
  (gdb) ptype $_gdb_setting("args")

for every language.

For Ada though, I get:

  $ ./gdb -q -nx --data-directory=data-directory -ex "set language ada" -ex "set args foo" -ex 'ptype $_gdb_setting("args")' -ex 'print $_gdb_setting("args")' -batch
  =================================================================
  ==1687217==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200007a374 at pc 0x558ffff54aa4 bp 0x7ffc05e9dc30 sp 0x7ffc05e9dc20
  READ of size 1 at 0x60200007a374 thread T0
      #0 0x558ffff54aa3 in find_command_name_length(char const*) /home/simark/src/binutils-gdb/gdb/cli/cli-decode.c:1554
      #1 0x558ffff54e81 in lookup_cmd_1(char const**, cmd_list_element*, cmd_list_element**, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, int, bool) /home/simark/src/binutils-gdb/gdb/cli/cli-decode.c:1611
      #2 0x558ffff557e7 in lookup_cmd(char const**, cmd_list_element*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, int, int) /home/simark/src/binutils-gdb/gdb/cli/cli-decode.c:1764
      #3 0x558ffff36950 in setting_cmd /home/simark/src/binutils-gdb/gdb/cli/cli-cmds.c:2079
      #4 0x558ffff3737f in gdb_setting_internal_fn /home/simark/src/binutils-gdb/gdb/cli/cli-cmds.c:2167
      #5 0x559001394ac0 in call_internal_function(gdbarch*, language_defn const*, value*, int, value**) /home/simark/src/binutils-gdb/gdb/value.c:2469
      #6 0x558fffac5dbb in expr::ada_funcall_operation::evaluate(type*, expression*, noside) /home/simark/src/binutils-gdb/gdb/ada-lang.c:10633
      #7 0x5590003c4419 in expression::evaluate(type*, noside) /home/simark/src/binutils-gdb/gdb/eval.c:101
      #8 0x5590003c455b in evaluate_expression(expression*, type*) /home/simark/src/binutils-gdb/gdb/eval.c:115
      #9 0x559000ab4777 in process_print_command_args /home/simark/src/binutils-gdb/gdb/printcmd.c:1305
      #10 0x559000ab4947 in print_command_1 /home/simark/src/binutils-gdb/gdb/printcmd.c:1318
      #11 0x559000ab535a in print_command /home/simark/src/binutils-gdb/gdb/printcmd.c:1435
      #12 0x558ffff4ea06 in do_const_cfunc /home/simark/src/binutils-gdb/gdb/cli/cli-decode.c:102
      #13 0x558ffff59a2b in cmd_func(cmd_list_element*, char const*, int) /home/simark/src/binutils-gdb/gdb/cli/cli-decode.c:2160
      #14 0x55900113062b in execute_command(char const*, int) /home/simark/src/binutils-gdb/gdb/top.c:674
      #15 0x5590008230ee in catch_command_errors /home/simark/src/binutils-gdb/gdb/main.c:523
      #16 0x559000823796 in execute_cmdargs /home/simark/src/binutils-gdb/gdb/main.c:618
      #17 0x559000826d34 in captured_main_1 /home/simark/src/binutils-gdb/gdb/main.c:1322
      #18 0x559000827330 in captured_main /home/simark/src/binutils-gdb/gdb/main.c:1343
      #19 0x5590008273c5 in gdb_main(captured_main_args*) /home/simark/src/binutils-gdb/gdb/main.c:1368
      #20 0x558fffa0a6a8 in main /home/simark/src/binutils-gdb/gdb/gdb.c:32
      #21 0x7f217afd9b24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
      #22 0x558fffa0a47d in _start (/home/simark/build/binutils-gdb/gdb/gdb+0x12c547d)

  0x60200007a374 is located 0 bytes to the right of 4-byte region [0x60200007a370,0x60200007a374)
  allocated by thread T0 here:
      #0 0x7f217c25f459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
      #1 0x558fffb71bc0 in xcalloc /home/simark/src/binutils-gdb/gdb/alloc.c:100
      #2 0x55900251d16d in xzalloc(unsigned long) /home/simark/src/binutils-gdb/gdbsupport/common-utils.cc:28
      #3 0x55900138cbfd in allocate_value_contents /home/simark/src/binutils-gdb/gdb/value.c:1024
      #4 0x55900138cc37 in allocate_value(type*) /home/simark/src/binutils-gdb/gdb/value.c:1035
      #5 0x559001352466 in value_string(char const*, long, type*) /home/simark/src/binutils-gdb/gdb/valops.c:1777
      #6 0x5590003ca5b5 in eval_op_string(type*, expression*, noside, int, char const*) /home/simark/src/binutils-gdb/gdb/eval.c:1089
      #7 0x558fffa56120 in expr::string_operation::evaluate(type*, expression*, noside) /home/simark/src/binutils-gdb/gdb/expop.h:918
      #8 0x558fffac345b in expr::ada_string_operation::evaluate(type*, expression*, noside) /home/simark/src/binutils-gdb/gdb/ada-lang.c:10154
      #9 0x558fffac56e8 in expr::ada_funcall_operation::evaluate(type*, expression*, noside) /home/simark/src/binutils-gdb/gdb/ada-lang.c:10563
      #10 0x5590003c4419 in expression::evaluate(type*, noside) /home/simark/src/binutils-gdb/gdb/eval.c:101
      #11 0x5590003c455b in evaluate_expression(expression*, type*) /home/simark/src/binutils-gdb/gdb/eval.c:115
      #12 0x559000ab4777 in process_print_command_args /home/simark/src/binutils-gdb/gdb/printcmd.c:1305
      #13 0x559000ab4947 in print_command_1 /home/simark/src/binutils-gdb/gdb/printcmd.c:1318
      #14 0x559000ab535a in print_command /home/simark/src/binutils-gdb/gdb/printcmd.c:1435
      #15 0x558ffff4ea06 in do_const_cfunc /home/simark/src/binutils-gdb/gdb/cli/cli-decode.c:102
      #16 0x558ffff59a2b in cmd_func(cmd_list_element*, char const*, int) /home/simark/src/binutils-gdb/gdb/cli/cli-decode.c:2160
      #17 0x55900113062b in execute_command(char const*, int) /home/simark/src/binutils-gdb/gdb/top.c:674
      #18 0x5590008230ee in catch_command_errors /home/simark/src/binutils-gdb/gdb/main.c:523
      #19 0x559000823796 in execute_cmdargs /home/simark/src/binutils-gdb/gdb/main.c:618
      #20 0x559000826d34 in captured_main_1 /home/simark/src/binutils-gdb/gdb/main.c:1322
      #21 0x559000827330 in captured_main /home/simark/src/binutils-gdb/gdb/main.c:1343
      #22 0x5590008273c5 in gdb_main(captured_main_args*) /home/simark/src/binutils-gdb/gdb/main.c:1368
      #23 0x558fffa0a6a8 in main /home/simark/src/binutils-gdb/gdb/gdb.c:32
      #24 0x7f217afd9b24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)

Let's pretend I didn't see this and try with a non-Asan GDB:

  $ ./gdb -q -nx --data-directory=data-directory -ex "set language ada" -ex "set args foo" -ex 'ptype $_gdb_setting("args")' -ex 'print $_gdb_setting("args")' -batch
  type = <4-byte integer>
  $1 = (102, 111, 111)

  $ ./gdb -q -nx --data-directory=data-directory -ex "set language ada" -ex "set args foo" -ex 'ptype $_gdb_setting("args")' -ex 'print $_gdb_setting("args")' -batch
  type = <4-byte integer>
  $1 = (102, 111, 111, 0)

Not sure what the 4-byte integer refers to.  But in any case, the result
doesn't look good.

The created type (using value_cstring vs value_string) should probably
depend on the language.  I tried swapping value_cstring for
value_string, but that wasn't enough.  So, it seems there's work to do,
but I don't think I'm the right person to do it as I don't know much
about Ada (and don't have much time to dig into it).

>> creates a struct value of code TYPE_CODE_ARRAY of size 3, which doesn't
>> have a null terminator.  That does not create a valid C string.  It is
>> however printed correctly by GDB, because the printing code makes sure
>> not to read past the value's length.
>>
>> A way to expose the bug is to print each element of the created string,
>> including the null terminator.  Before:
>>
>>     (gdb) set args foo
>>     (gdb) p $_gdb_setting("args")[3]
>>     no such vector element
>>
>> After:
>>
>>     (gdb) set args foo
>>     (gdb) p $_gdb_setting("args")[3]
>>     $1 = 0 '\000'
>>
>> Another perhaps more convincing way of showing the bug is if the string
>> value is passed to an inferior function call;
>>
>>     (gdb) print an_inferior_function($_gdb_setting("args))
>>
>> The space allocate for the string in the inferior will not take into
>
> "The space allocated" ?

Fixed.

> Such an inferior call in Ada (and other non-terminated string languages) will now
> be passing an unexpected \0 to the called function, of course.  I wonder
> whether that's why Philippe didn't add the \0 in the first place, given his
> Ada bias?  :-)

Hmm, maybe.  I would appreciate if somebody familiar with Ada could look
into it.  At least, make sure I don't make things worse than they
already are.

>> account a null terminator (with the string "foo", 3 bytes will be
>> allocated).  If the inferior tries to read the string until the null
>> terminator, it will read past the allocated buffer.  Compiling the
>> inferior with AddressSanitizer makes that bad access obvious.
>>
>> I found a few calls to value_cstring that were wrong, I fixed them
>> all, all exercised by the test.
>>
>> The change in guile/scm-math.c maybe requires a bit of explanation.
>> According to the doc of gdbscm_scm_to_string, if don't pass a length
>
> "if we don't" ?

Fixed.

>> pointer, we get back a null-terminated string.  If we pass a length
>> pointer, we get a non-null-terminated string, but we get the length
>> separately.  Trying to pass "len + 1" to value_cstring would lead to GDB
>> reading past the allocated buffer, that is exactly of length `len`.  So
>> instead, don't pass a length pointer and use strlen on the result.
>
>
>
>> +load_lib "trace-support.exp"
>> +load_lib "gdb-guile.exp"
>> +
>> +standard_testfile
>> +
>> +if {[build_executable "failed to prepare" $testfile $srcfile ]} {
>> +    return
>> +}
>> +
>> +# Check that the string contained in the convenienced variable $v is
>
> convenienced -> convenience

Fixed.

>> +# EXPECTED_STR.
>> +#
>> +# In particular, check that the null terminator is there and that we can't
>> +# access a character past the end of the string.
>> +
>> +proc check_v_string { expected_str } {
>
>> +# Test with string values made by $_gdb_setting & co.
>> +
>> +proc_with_prefix test_setting { } {
>> +    clean_restart
>> +
>> +    # This is an internal GDB implementation detail, but the variable backing a
>> +    # string setting starts as nullptr (unless explicitly initialized at startup).
>> +    # When assigning an empty value, the variable then points to an empty string.
>> +    # Test both cases, as it triggers different code paths (in addition to a
>> +    # non-empty value).
>
> The paranthesis remark in this last sentence gave me pause, as it sounded like
> you were saying that the different code paths result in a non-empty value.
>
> I'd suggest instead:
>
>    # Test both cases, as it triggers different code paths, in addition to testing a
>    # non-empty value.

Fixed that.  Although with my other series that uses std::string for
string settings, that's not true anymore.  I could remove that comment
and the extra tests in that patch...

>> +# Test with a string value created by make-value in Guile.
>> +
>> +proc_with_prefix test_guile_value { } {
>> +    clean_restart
>> +
>> +    if {[skip_guile_tests]} {
>> +	untested "skipping test_guile_value"
>> +	return
>> +    }
>> +
>> +    # We can't set a convenience var from Guile, but we can append to history.
>> +    # Do that, then transfer to a convenience var with a CLI command.
>> +    gdb_test_no_output "guile (use-modules (gdb))"
>> +    gdb_test_multiple "guile (history-append! (make-value \"foo\"))" "make value" {
>> +	-re -wrap "($::decimal)" {
>> +	    set histnum $expect_out(1,string)
>
> set histnum to something before gdb_test_multiple, to avoid ending up with an
> unset tcl variable error if this gdb_test_multiple fails.

Done.

Simon

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] gdb: consider null terminator for length arguments of value_cstring calls
  2021-07-14 13:50       ` Simon Marchi
@ 2021-07-15 14:29         ` Pedro Alves
  2021-07-23 19:37           ` Simon Marchi
  0 siblings, 1 reply; 8+ messages in thread
From: Pedro Alves @ 2021-07-15 14:29 UTC (permalink / raw)
  To: Simon Marchi, Simon Marchi, gdb-patches, Philippe Waroquiers

On 2021-07-14 2:50 p.m., Simon Marchi wrote:
>> For C, it makes sence to me to return a null terminated string.  Though I wonder
>> whether the string you get should depend on language?  I mean, see this:
>>
>>  (gdb) set language c
>>  (gdb) ptype "foo"
>>  type = char [4]
>>  (gdb) set language ada
>>  (gdb) ptype "foo"
>>  type = array (1 .. 3) of character
>>
>> Whatever we decide, I think we should update the manual to make it explicit.
> 
> Agreed, I think the result of:
> 
>   (gdb) ptype "foo"
> 
> should be the same as
> 
>   (gdb) set args foo
>   (gdb) ptype $_gdb_setting("args")
> 
> for every language.
> 
> For Ada though, I get:

...

> Let's pretend I didn't see this and try with a non-Asan GDB:
> 
>   $ ./gdb -q -nx --data-directory=data-directory -ex "set language ada" -ex "set args foo" -ex 'ptype $_gdb_setting("args")' -ex 'print $_gdb_setting("args")' -batch
>   type = <4-byte integer>
>   $1 = (102, 111, 111)
> 
>   $ ./gdb -q -nx --data-directory=data-directory -ex "set language ada" -ex "set args foo" -ex 'ptype $_gdb_setting("args")' -ex 'print $_gdb_setting("args")' -batch
>   type = <4-byte integer>
>   $1 = (102, 111, 111, 0)
> 
> Not sure what the 4-byte integer refers to.  But in any case, the result
> doesn't look good.

That "4-byte integer" thing is a red herring and a known issue -- it also happens in C:

$ $g -q -nx  -ex "set language c" -ex "set args foo" -ex 'ptype $_gdb_setting("args")' -ex 'print $_gdb_setting("args")' -batch
type = int
$1 = "foo"

See settings.exp:

# Verifies that $_gdb_setting (SETTING) gives a value whose ptype matches EXPECTED.
proc check_type {setting expected} {
    with_test_prefix "check_type $setting $expected" {
        gdb_test "print \$_gdb_maint_setting(\"$setting\")"
        gdb_test "ptype $" "$expected"

        # Currently, GDB ptype <internal function> always tells it is type int.
        # ptype should better report an error such as:
        #   "No type information for GDB functions"
        # Test 'type int', so as to make it fail if ptype is changed.
        gdb_test "ptype \$_gdb_maint_setting(\"$setting\")" \
            "type = int"
    }
}

So instead, try (current master):

$ ./gdb -q -nx --data-directory=data-directory -ex "set language c" -ex "set args foo" -ex 'print $_gdb_setting("args")' -ex 'ptype $' -batch
$1 = "foo"
type = char [3]

> 
> The created type (using value_cstring vs value_string) should probably
> depend on the language.  I tried swapping value_cstring for
> value_string, but that wasn't enough.  So, it seems there's work to do,
> but I don't think I'm the right person to do it as I don't know much
> about Ada (and don't have much time to dig into it).

I think what we need is a language hook to return a string struct value from
a const char * / len.  For C, the implementation would use value_cstring,
for Ada it would use value_string.  The testcase can cycle through
all languages and make sure that "print/ptype "foo" and 
print/ptype \$_gdb_maint_setting return the same thing.

I gave this a try, see patch below, on top of your v2.  I've also put it in the 
users/palves/value_string branch on sourceware. 

Note I went the other direction and made value_cstring work with characters instead of
bytes, and made it add the \0 itself.  Thus I undid the guile change to use nullptr
instead of &len, as well as the strlen()+1 changes.

After this, the only value_cstring calls left are in the C parser, and in the
language method:

 $ grep value_cstring -rn
 value.h:792:extern struct value *value_cstring (const char *ptr, ssize_t len,
 c-lang.c:680:   result = value_cstring ((const char *) obstack_base (&output),
 valops.c:1753:value_cstring (const char *ptr, ssize_t len, struct type *char_type)
 language.c:985:  return value_cstring (ptr, len, type);

Note how the the current python and guile code were already depending on
the language, but in an odd way.  E.g., in Python:

-#define builtin_type_pychar \
-  language_string_char_type (python_language, python_gdbarch)

-           value = value_cstring (s.get (), strlen (s.get ()) + 1,
-                                  builtin_type_pychar);


I don't know what that was supposed to do when the language was not C?

The test runs into a few pre-existing oddities:

 (gdb) set language modula-2 
 (gdb) p "foo"
 strings are not implemented

 (gdb) p $_gdb_maint_setting("test-settings string")
 ',' or ')' expected

 (gdb) set language unknown 
 (gdb) p "foo"
 Aborted (core dumped)

I have no idea how rust one is just weird, but then again if parsing worked, I don't
think the call would work anyway, since Rust strings are a {ptr,len}
aggregate, not a plain char array.

I filed PR gdb/28093 for the crash.


From 50991aaf22b03a9925ae37155f16bc8257f95242 Mon Sep 17 00:00:00 2001
From: Pedro Alves <pedro@palves.net>
Date: Thu, 15 Jul 2021 10:47:56 +0100
Subject: [PATCH] all languages

Change-Id: I79ef914087dbf85e1cbc19263843a6034383afe7
---
 gdb/ada-lang.c                                | 10 +++
 gdb/c-lang.c                                  | 12 ++-
 gdb/cli/cli-cmds.c                            | 22 +++--
 gdb/f-lang.c                                  |  8 ++
 gdb/f-lang.h                                  |  3 +
 gdb/guile/scm-math.c                          |  8 +-
 gdb/language.c                                |  9 +++
 gdb/language.h                                |  7 ++
 gdb/python/py-value.c                         |  8 +-
 .../gdb.base/internal-string-values.exp       | 80 +++++++++++++++++++
 gdb/valops.c                                  |  7 +-
 gdb/value.c                                   |  5 +-
 gdb/value.h                                   |  8 +-
 13 files changed, 152 insertions(+), 35 deletions(-)

diff --git a/gdb/ada-lang.c b/gdb/ada-lang.c
index b098991612d..82319e49326 100644
--- a/gdb/ada-lang.c
+++ b/gdb/ada-lang.c
@@ -12900,6 +12900,16 @@ class ada_language : public language_defn
     return language_defn::read_var_value (var, var_block, frame);
   }
 
+  struct value *value_string (struct gdbarch *gdbarch,
+			      const char *ptr, ssize_t len) const override
+  {
+    struct type *type = language_string_char_type (this, gdbarch);
+    value *val = ::value_string (ptr, len, type);
+    /* See ada_string_operation::evaluate.  */
+    value_type (val)->set_code (TYPE_CODE_ARRAY);
+    return val;
+  }
+
   /* See language.h.  */
   void language_arch_info (struct gdbarch *gdbarch,
 			   struct language_arch_info *lai) const override
diff --git a/gdb/c-lang.c b/gdb/c-lang.c
index 98f4984b020..d30cebe9f0c 100644
--- a/gdb/c-lang.c
+++ b/gdb/c-lang.c
@@ -653,16 +653,11 @@ c_string_operation::evaluate (struct type *expect_type,
     }
   else
     {
-      int i;
-
-      /* Write the terminating character.  */
-      for (i = 0; i < TYPE_LENGTH (type); ++i)
-	obstack_1grow (&output, 0);
+      int element_size = TYPE_LENGTH (type);
 
       if (satisfy_expected)
 	{
 	  LONGEST low_bound, high_bound;
-	  int element_size = TYPE_LENGTH (type);
 
 	  if (!get_discrete_bounds (expect_type->index_type (),
 				    &low_bound, &high_bound))
@@ -677,10 +672,13 @@ c_string_operation::evaluate (struct type *expect_type,
 	  result = allocate_value (expect_type);
 	  memcpy (value_contents_raw (result), obstack_base (&output),
 		  obstack_object_size (&output));
+	  /* Write the terminating character.  */
+	  memset (value_contents_raw (result) + obstack_object_size (&output),
+		  0, element_size);
 	}
       else
 	result = value_cstring ((const char *) obstack_base (&output),
-				obstack_object_size (&output),
+				obstack_object_size (&output) / element_size,
 				type);
     }
   return result;
diff --git a/gdb/cli/cli-cmds.c b/gdb/cli/cli-cmds.c
index f6fc5ab854f..b445b75fae2 100644
--- a/gdb/cli/cli-cmds.c
+++ b/gdb/cli/cli-cmds.c
@@ -2146,12 +2146,10 @@ value_from_setting (const cmd_list_element *cmd, struct gdbarch *gdbarch)
     case var_filename:
     case var_enum:
       if (*(char **) cmd->var)
-	return value_cstring (*(char **) cmd->var,
-			      strlen (*(char **) cmd->var) + 1,
-			      builtin_type (gdbarch)->builtin_char);
+	return current_language->value_string (gdbarch, *(char **) cmd->var,
+					       strlen (*(char **) cmd->var));
       else
-	return value_cstring ("", 1,
-			      builtin_type (gdbarch)->builtin_char);
+	return current_language->value_string (gdbarch, "", 0);
     default:
       gdb_assert_not_reached ("bad var_type");
     }
@@ -2199,8 +2197,9 @@ str_value_from_setting (const cmd_list_element *cmd, struct gdbarch *gdbarch)
       {
 	std::string cmd_val = get_setshow_command_value_string (cmd);
 
-	return value_cstring (cmd_val.c_str (), cmd_val.size () + 1,
-			      builtin_type (gdbarch)->builtin_char);
+	return current_language->value_string (gdbarch,
+					       cmd_val.data (),
+					       cmd_val.size ());
       }
 
     case var_string:
@@ -2213,12 +2212,11 @@ str_value_from_setting (const cmd_list_element *cmd, struct gdbarch *gdbarch)
 	 escaping quotes.  So, we directly use the cmd->var string value,
 	 similarly to the value_from_setting code for these cases.  */
       if (*(char **) cmd->var)
-	return value_cstring (*(char **) cmd->var,
-			      strlen (*(char **) cmd->var) + 1,
-			      builtin_type (gdbarch)->builtin_char);
+	return current_language->value_string (gdbarch,
+					       *(char **) cmd->var,
+					       strlen (*(char **) cmd->var));
       else
-	return value_cstring ("", 1,
-			      builtin_type (gdbarch)->builtin_char);
+	return current_language->value_string (gdbarch, "", 0);
 
     default:
       gdb_assert_not_reached ("bad var_type");
diff --git a/gdb/f-lang.c b/gdb/f-lang.c
index 16ec9e04044..869a95d4304 100644
--- a/gdb/f-lang.c
+++ b/gdb/f-lang.c
@@ -103,6 +103,14 @@ f_language::get_encoding (struct type *type)
 
 \f
 
+struct value *
+f_language::value_string (struct gdbarch *gdbarch,
+			  const char *ptr, ssize_t len) const
+{
+  struct type *type = language_string_char_type (this, gdbarch);
+  return ::value_string (ptr, len, type);
+}
+
 /* A helper function for the "bound" intrinsics that checks that TYPE
    is an array.  LBOUND_P is true for lower bound; this is used for
    the error message, if any.  */
diff --git a/gdb/f-lang.h b/gdb/f-lang.h
index 1ccdd3978ea..1306b135779 100644
--- a/gdb/f-lang.h
+++ b/gdb/f-lang.h
@@ -193,6 +193,9 @@ class f_language : public language_defn
 		&& TYPE_TARGET_TYPE (type)->code () == TYPE_CODE_CHAR));
   }
 
+  struct value *value_string (struct gdbarch *gdbarch,
+			      const char *ptr, ssize_t len) const override;
+
   /* See language.h.  */
 
   const char *struct_too_deep_ellipsis () const override
diff --git a/gdb/guile/scm-math.c b/gdb/guile/scm-math.c
index 8b7fe9b69ec..a88c805fc23 100644
--- a/gdb/guile/scm-math.c
+++ b/gdb/guile/scm-math.c
@@ -776,6 +776,8 @@ vlscm_convert_typed_value_from_scheme (const char *func_name,
 	}
       else if (scm_is_string (obj))
 	{
+	  size_t len;
+
 	  if (type != NULL)
 	    {
 	      except_scm = gdbscm_make_misc_error (func_name, type_arg_pos,
@@ -787,14 +789,12 @@ vlscm_convert_typed_value_from_scheme (const char *func_name,
 	    {
 	      /* TODO: Provide option to specify conversion strategy.  */
 	      gdb::unique_xmalloc_ptr<char> s
-		= gdbscm_scm_to_string (obj, nullptr,
+		= gdbscm_scm_to_string (obj, &len,
 					target_charset (gdbarch),
 					0 /*non-strict*/,
 					&except_scm);
 	      if (s != NULL)
-		value = value_cstring (s.get (), strlen (s.get ()) + 1,
-				       language_string_char_type (language,
-								  gdbarch));
+		value = language->value_string (gdbarch, s.get (), len);
 	      else
 		value = NULL;
 	    }
diff --git a/gdb/language.c b/gdb/language.c
index 0d1e3848de8..2ee32a1edff 100644
--- a/gdb/language.c
+++ b/gdb/language.c
@@ -47,6 +47,7 @@
 #include <algorithm>
 #include "gdbarch.h"
 #include "compile/compile-internal.h"
+#include "arch-utils.h"
 
 static void set_range_case (void);
 
@@ -976,6 +977,14 @@ language_string_char_type (const struct language_defn *la,
   return ld->arch_info[la->la_language].string_char_type ();
 }
 
+struct value *
+language_defn::value_string (struct gdbarch *gdbarch,
+			     const char *ptr, ssize_t len) const
+{
+  struct type *type = language_string_char_type (this, gdbarch);
+  return value_cstring (ptr, len, type);
+}
+
 /* See language.h.  */
 
 struct type *
diff --git a/gdb/language.h b/gdb/language.h
index 21ed47b3580..87064962b9b 100644
--- a/gdb/language.h
+++ b/gdb/language.h
@@ -579,6 +579,13 @@ struct language_defn
   virtual char string_lower_bound () const
   { return c_style_arrays_p () ? 0 : 1; }
 
+  /* Return the LEN characters long string at STR as a value as
+     represented in this language.  GDBARCH is used to infer the
+     character type.  The default implementation returns a
+     null-terminated C string.  */
+  virtual struct value *value_string (struct gdbarch *gdbarch,
+				      const char *ptr, ssize_t len) const;
+
   /* Returns true if the symbols names should be stored in GDB's data
      structures for minimal/partial/full symbols using their linkage (aka
      mangled) form; false if the symbol names should be demangled first.
diff --git a/gdb/python/py-value.c b/gdb/python/py-value.c
index bc0f88b7a16..f5b72446912 100644
--- a/gdb/python/py-value.c
+++ b/gdb/python/py-value.c
@@ -50,9 +50,6 @@
 #define builtin_type_pybool \
   language_bool_type (python_language, python_gdbarch)
 
-#define builtin_type_pychar \
-  language_string_char_type (python_language, python_gdbarch)
-
 struct value_object {
   PyObject_HEAD
   struct value_object *next;
@@ -1907,8 +1904,9 @@ convert_value_from_python (PyObject *obj)
 	  gdb::unique_xmalloc_ptr<char> s
 	    = python_string_to_target_string (obj);
 	  if (s != NULL)
-	    value = value_cstring (s.get (), strlen (s.get ()) + 1,
-				   builtin_type_pychar);
+	    value = python_language->value_string (python_gdbarch,
+						   s.get (),
+						   strlen (s.get ()));
 	}
       else if (PyObject_TypeCheck (obj, &value_object_type))
 	value = value_copy (((value_object *) obj)->value);
diff --git a/gdb/testsuite/gdb.base/internal-string-values.exp b/gdb/testsuite/gdb.base/internal-string-values.exp
index 0748ab0984c..3380d9d8077 100644
--- a/gdb/testsuite/gdb.base/internal-string-values.exp
+++ b/gdb/testsuite/gdb.base/internal-string-values.exp
@@ -118,6 +118,86 @@ proc_with_prefix test_setting { } {
 	gdb_test_no_output {set $v = $_gdb_setting_str("remotetimeout")}
 	check_v_string "2"
     }
+
+    # Test string values made by $_gdb_setting & co. in all languages.
+    with_test_prefix "all langs" {
+	# Get list of supported languages.
+	set langs {}
+	gdb_test_multiple "complete set language " "" {
+	    -re "set language (\[^\r\n\]+)" {
+		lappend langs $expect_out(1,string)
+		exp_continue
+	    }
+	    -re "$::gdb_prompt $" {
+		pass $gdb_test_name
+	    }
+	}
+
+	proc quote {str} {
+	    upvar lang lang
+
+	    if {$lang == "fortran"} {
+		return "'$str'"
+	    } else {
+		return "\"$str\""
+	    }
+	}
+
+	gdb_test "maintenance set test-settings string foo"
+	foreach_with_prefix lang $langs {
+	    gdb_test_no_output "set language $lang"
+
+	    if {$lang == "modula-2"} {
+		gdb_test "print \"foo\"" "strings are not implemented"
+		continue
+	    }
+
+	    if {$lang == "rust"} {
+		gdb_test "print \"foo\"" \
+		    "evaluation of this expression requires the target program to be active"
+
+		# We could get the above working by starting the
+		# program, but how to make the below work?
+		gdb_test "print \$_gdb_maint_setting(\"test-settings string\")" \
+		    "',' or '\\)' expected"
+		continue
+	    }
+
+	    if {$lang == "unknown"} {
+		# Skipped because of PR gdb/28093 -- GDB would crash.
+		continue
+	    }
+
+	    set print_output ""
+	    set ptype_output ""
+
+	    set foo_str [quote foo]
+	    gdb_test_multiple "print $foo_str" "" {
+		-wrap -re " = (.*)" {
+		    set print_output $expect_out(1,string)
+		    pass $gdb_test_name
+		}
+	    }
+
+	    gdb_test_multiple "ptype $foo_str" "" {
+		-wrap -re " = (.*)" {
+		    set ptype_output $expect_out(1,string)
+		    pass $gdb_test_name
+		}
+	    }
+
+	    set cmd_str [quote "test-settings string"]
+	    set ptype_output_re [string_to_regexp $ptype_output]
+	    set print_output_re [string_to_regexp $print_output]
+
+	    foreach_with_prefix conv_func {$_gdb_maint_setting $_gdb_maint_setting_str} {
+		gdb_test "print ${conv_func}($cmd_str)" \
+		    " = $print_output_re"
+		gdb_test "ptype \$" \
+		    " = $ptype_output_re"
+	    }
+	}
+    }
 }
 
 # Test with a string value created by gdb.Value in Python.
diff --git a/gdb/valops.c b/gdb/valops.c
index f67612e87b0..161b8696cb2 100644
--- a/gdb/valops.c
+++ b/gdb/valops.c
@@ -1754,12 +1754,15 @@ value_cstring (const char *ptr, ssize_t len, struct type *char_type)
 {
   struct value *val;
   int lowbound = current_language->string_lower_bound ();
-  ssize_t highbound = len / TYPE_LENGTH (char_type);
+  ssize_t highbound = len + 1;
   struct type *stringtype
     = lookup_array_range_type (char_type, lowbound, highbound + lowbound - 1);
 
   val = allocate_value (stringtype);
-  memcpy (value_contents_raw (val), ptr, len);
+  memcpy (value_contents_raw (val), ptr, len * TYPE_LENGTH (char_type));
+  /* Write the terminating character.  */
+  memset (value_contents_raw (val) + len * TYPE_LENGTH (char_type),
+	  0, TYPE_LENGTH (char_type));
   return val;
 }
 
diff --git a/gdb/value.c b/gdb/value.c
index db34d2e5762..092325c8731 100644
--- a/gdb/value.c
+++ b/gdb/value.c
@@ -2188,8 +2188,9 @@ value_of_internalvar (struct gdbarch *gdbarch, struct internalvar *var)
       break;
 
     case INTERNALVAR_STRING:
-      val = value_cstring (var->u.string, strlen (var->u.string) + 1,
-			   builtin_type (gdbarch)->builtin_char);
+      val = current_language->value_string (gdbarch,
+					    var->u.string,
+					    strlen (var->u.string));
       break;
 
     case INTERNALVAR_VALUE:
diff --git a/gdb/value.h b/gdb/value.h
index 24392437ed0..3313b9fd346 100644
--- a/gdb/value.h
+++ b/gdb/value.h
@@ -782,10 +782,12 @@ class scoped_value_mark
   const struct value *m_value;
 };
 
-/* Create a not_lval value with type TYPE_CODE_ARRAY.
+/* Create not_lval value representing a NULL-terminated C string.  The
+   resulting value has type TYPE_CODE_ARRAY.
 
-   PTR points to the string data; LEN is the length of the string including the
-   NULL terminator.  */
+   PTR points to the string data; LEN is number of characters (does
+   not include the NULL terminator).  CHAR_TYPE is the character
+   type.  */
 
 extern struct value *value_cstring (const char *ptr, ssize_t len,
 				    struct type *char_type);

base-commit: f9e5d80cf75c4c549c392b6bb1bd33e103824657
prerequisite-patch-id: 77de8c2c15ea6664d089023bd3332da3e23913c0
-- 
2.26.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] gdb: consider null terminator for length arguments of value_cstring calls
  2021-07-14 10:25     ` Pedro Alves
  2021-07-14 13:50       ` Simon Marchi
@ 2021-07-21 15:21       ` Philippe Waroquiers
  1 sibling, 0 replies; 8+ messages in thread
From: Philippe Waroquiers @ 2021-07-21 15:21 UTC (permalink / raw)
  To: Pedro Alves, Simon Marchi, Simon Marchi, gdb-patches

On Wed, 2021-07-14 at 11:25 +0100, Pedro Alves wrote:
> Such an inferior call in Ada (and other non-terminated string languages) will now
> be passing an unexpected \0 to the called function, of course.  I wonder
> whether that's why Philippe didn't add the \0 in the first place, given his
> Ada bias?  :-)

The idea for $_gdb_setting was to allow a user defined command to access
the gdb settings and provide a behaviour depending on these settings.

I did not envisage the use case of providing the values created by $_gdb_setting
to the inferior.

For the "user defined command" use case, typically, $_streq will be used to 
implement the behaviour depending on the string setting values.

It looks like $_streq works without a null terminator:

(gdb) set var $ccc = "foo"
(gdb) set lang ada
(gdb) set var $aaa := "foo"
(gdb) p $_streq($ccc, $aaa)
$1 = true
(gdb) ptype $ccc
type = array (0 .. 3) of char
(gdb) ptype $aaa
type = array (1 .. 3) of character
(gdb) set lang c
(gdb) p $_streq($ccc, $aaa)
$2 = 1
(gdb) ptype $ccc
type = char [4]
(gdb) ptype $aaa
type = character [3]
(gdb) 

(gdb) set args foo
(gdb) p $_streq($_gdb_setting("args"), "foo")
$1 = 1
(gdb) set lang ada
(gdb) p $_streq($_gdb_setting("args"), "foo")
$2 = true
(gdb) 


So, at least for the initially envisaged use case, I do not see a problem
with the absence (or presence) of a trailing null terminator.

Philippe



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] gdb: consider null terminator for length arguments of value_cstring calls
  2021-07-15 14:29         ` Pedro Alves
@ 2021-07-23 19:37           ` Simon Marchi
  0 siblings, 0 replies; 8+ messages in thread
From: Simon Marchi @ 2021-07-23 19:37 UTC (permalink / raw)
  To: Pedro Alves, Simon Marchi, gdb-patches, Philippe Waroquiers

On 2021-07-15 10:29 a.m., Pedro Alves wrote:> On 2021-07-14 2:50 p.m., Simon Marchi wrote:
>>> For C, it makes sence to me to return a null terminated string.  Though I wonder
>>> whether the string you get should depend on language?  I mean, see this:
>>>
>>>  (gdb) set language c
>>>  (gdb) ptype "foo"
>>>  type = char [4]
>>>  (gdb) set language ada
>>>  (gdb) ptype "foo"
>>>  type = array (1 .. 3) of character
>>>
>>> Whatever we decide, I think we should update the manual to make it explicit.
>>
>> Agreed, I think the result of:
>>
>>   (gdb) ptype "foo"
>>
>> should be the same as
>>
>>   (gdb) set args foo
>>   (gdb) ptype $_gdb_setting("args")
>>
>> for every language.
>>
>> For Ada though, I get:
> 
> ...
> 
>> Let's pretend I didn't see this and try with a non-Asan GDB:
>>
>>   $ ./gdb -q -nx --data-directory=data-directory -ex "set language ada" -ex "set args foo" -ex 'ptype $_gdb_setting("args")' -ex 'print $_gdb_setting("args")' -batch
>>   type = <4-byte integer>
>>   $1 = (102, 111, 111)
>>
>>   $ ./gdb -q -nx --data-directory=data-directory -ex "set language ada" -ex "set args foo" -ex 'ptype $_gdb_setting("args")' -ex 'print $_gdb_setting("args")' -batch
>>   type = <4-byte integer>
>>   $1 = (102, 111, 111, 0)
>>
>> Not sure what the 4-byte integer refers to.  But in any case, the result
>> doesn't look good.
> 
> That "4-byte integer" thing is a red herring and a known issue -- it also happens in C:
> 
> $ $g -q -nx  -ex "set language c" -ex "set args foo" -ex 'ptype $_gdb_setting("args")' -ex 'print $_gdb_setting("args")' -batch
> type = int
> $1 = "foo"
> 
> See settings.exp:
> 
> # Verifies that $_gdb_setting (SETTING) gives a value whose ptype matches EXPECTED.
> proc check_type {setting expected} {
>     with_test_prefix "check_type $setting $expected" {
>         gdb_test "print \$_gdb_maint_setting(\"$setting\")"
>         gdb_test "ptype $" "$expected"
> 
>         # Currently, GDB ptype <internal function> always tells it is type int.
>         # ptype should better report an error such as:
>         #   "No type information for GDB functions"
>         # Test 'type int', so as to make it fail if ptype is changed.
>         gdb_test "ptype \$_gdb_maint_setting(\"$setting\")" \
>             "type = int"
>     }
> }
> 
> So instead, try (current master):
> 
> $ ./gdb -q -nx --data-directory=data-directory -ex "set language c" -ex "set args foo" -ex 'print $_gdb_setting("args")' -ex 'ptype $' -batch
> $1 = "foo"
> type = char [3]
> 
>>
>> The created type (using value_cstring vs value_string) should probably
>> depend on the language.  I tried swapping value_cstring for
>> value_string, but that wasn't enough.  So, it seems there's work to do,
>> but I don't think I'm the right person to do it as I don't know much
>> about Ada (and don't have much time to dig into it).
> 
> I think what we need is a language hook to return a string struct value from
> a const char * / len.  For C, the implementation would use value_cstring,
> for Ada it would use value_string.  The testcase can cycle through
> all languages and make sure that "print/ptype "foo" and 
> print/ptype \$_gdb_maint_setting return the same thing.
> 
> I gave this a try, see patch below, on top of your v2.  I've also put it in the 
> users/palves/value_string branch on sourceware. 
> 
> Note I went the other direction and made value_cstring work with characters instead of
> bytes, and made it add the \0 itself.  Thus I undid the guile change to use nullptr
> instead of &len, as well as the strlen()+1 changes.
> 
> After this, the only value_cstring calls left are in the C parser, and in the
> language method:
> 
>  $ grep value_cstring -rn
>  value.h:792:extern struct value *value_cstring (const char *ptr, ssize_t len,
>  c-lang.c:680:   result = value_cstring ((const char *) obstack_base (&output),
>  valops.c:1753:value_cstring (const char *ptr, ssize_t len, struct type *char_type)
>  language.c:985:  return value_cstring (ptr, len, type);
> 
> Note how the the current python and guile code were already depending on
> the language, but in an odd way.  E.g., in Python:
> 
> -#define builtin_type_pychar \
> -  language_string_char_type (python_language, python_gdbarch)
> 
> -           value = value_cstring (s.get (), strlen (s.get ()) + 1,
> -                                  builtin_type_pychar);
> 
> 
> I don't know what that was supposed to do when the language was not C?
> 
> The test runs into a few pre-existing oddities:
> 
>  (gdb) set language modula-2 
>  (gdb) p "foo"
>  strings are not implemented
> 
>  (gdb) p $_gdb_maint_setting("test-settings string")
>  ',' or ')' expected
> 
>  (gdb) set language unknown 
>  (gdb) p "foo"
>  Aborted (core dumped)
> 
> I have no idea how rust one is just weird, but then again if parsing worked, I don't
> think the call would work anyway, since Rust strings are a {ptr,len}
> aggregate, not a plain char array.
> 
> I filed PR gdb/28093 for the crash.

Your patch is much more complete than mine and does more the right
thing, so I don't think it's worth merging my patch (or if there are
some useful bits, they can be integrated in your patch).

Simon

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-07-23 19:38 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-13 15:31 [PATCH] gdb: consider null terminator for length arguments of value_cstring calls Simon Marchi
2021-07-13 17:07 ` Pedro Alves
2021-07-13 18:44   ` Simon Marchi
2021-07-14 10:25     ` Pedro Alves
2021-07-14 13:50       ` Simon Marchi
2021-07-15 14:29         ` Pedro Alves
2021-07-23 19:37           ` Simon Marchi
2021-07-21 15:21       ` Philippe Waroquiers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).