public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 0/4] Rewrite gdb_argv using extract_string_maybe_quoted
@ 2024-02-08 16:28 Andrew Burgess
  2024-02-08 16:28 ` [PATCH 1/4] gdb/unittests: add extract_string_maybe_quoted self tests Andrew Burgess
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Andrew Burgess @ 2024-02-08 16:28 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

This short series is really a refactor, though it does fix one tiny
bug in gdb_argv which can be seen in the Python API's
gdb.string_to_argv.  See patch #4 for details.

Patches #1 and #3 are purely about adding more unit tests.

Patches #2 and #4 are the real GDB changes.

This series feeds into two other projects that I'm working on, one of
which is already on the mailing list, and the other is still in
progress.

The first patch in this series:

  https://inbox.sourceware.org/gdb-patches/cover.1704809585.git.aburgess@redhat.com

is for libiberties buildargv function.  Using that function is good
(code reuse), but we already have extract_string_maybe_quoted which is
a GDB local copy of the core loop from within buildargv.  I therefore
claim that using buildargv is good, but using
extract_string_maybe_quoted is better -- the more heavily used GDB's
utility functions are, the more confident we can be that they are
correct.

The second place where this refactor will help is, I think, is with
addign support for filename command options (e.g. -option FILENAME),
in this case, when completing, we need to be able to skip over
FILENAME, which requires having a version of
extract_string_maybe_quoted that can adapt to the escaping which is
applied to FILENAME.  The extract_string_maybe_quoted changes in this
series will be helpful here.

All feedback welcome.

Thanks,
Andrew

---

Andrew Burgess (4):
  gdb/unittests: add extract_string_maybe_quoted self tests
  gdbsupport: allow for configuration of extract_string_maybe_quoted
  gdb/unittests: additional tests for gdb_argv class
  gdbsupport: rewrite gdb_argv to use extract_string_maybe_quoted

 gdb/Makefile.in                          |   2 +
 gdb/testsuite/gdb.python/py-cmd.exp      |   8 ++
 gdb/unittests/extract-string-selftests.c |  96 +++++++++++++++
 gdb/unittests/gdb_argv-selftests.c       | 134 +++++++++++++++++++++
 gdb/utils.c                              |  26 ----
 gdbsupport/buildargv.h                   |  50 +++++---
 gdbsupport/common-utils.cc               |  80 +++++++------
 gdbsupport/common-utils.h                | 144 ++++++++++++++++++++++-
 8 files changed, 464 insertions(+), 76 deletions(-)
 create mode 100644 gdb/unittests/extract-string-selftests.c
 create mode 100644 gdb/unittests/gdb_argv-selftests.c


base-commit: e58beedf2c8a1e0c78e0f57aeab3934de9416bfc
-- 
2.25.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] gdb/unittests: add extract_string_maybe_quoted self tests
  2024-02-08 16:28 [PATCH 0/4] Rewrite gdb_argv using extract_string_maybe_quoted Andrew Burgess
@ 2024-02-08 16:28 ` Andrew Burgess
  2024-02-08 18:17   ` Tom Tromey
  2024-02-08 16:28 ` [PATCH 2/4] gdbsupport: allow for configuration of extract_string_maybe_quoted Andrew Burgess
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Andrew Burgess @ 2024-02-08 16:28 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

This commit adds some self tests for the extract_string_maybe_quoted
function.  No functionality is changed in this commit.
---
 gdb/Makefile.in                          |  1 +
 gdb/unittests/extract-string-selftests.c | 85 ++++++++++++++++++++++++
 2 files changed, 86 insertions(+)
 create mode 100644 gdb/unittests/extract-string-selftests.c

diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index 0e0f19c40c9..92fe06de489 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -457,6 +457,7 @@ SELFTESTS_SRCS = \
 	unittests/copy_bitwise-selftests.c \
 	unittests/enum-flags-selftests.c \
 	unittests/environ-selftests.c \
+	unittests/extract-string-selftests.c \
 	unittests/filtered_iterator-selftests.c \
 	unittests/format_pieces-selftests.c \
 	unittests/frame_info_ptr-selftests.c \
diff --git a/gdb/unittests/extract-string-selftests.c b/gdb/unittests/extract-string-selftests.c
new file mode 100644
index 00000000000..c2eedd75342
--- /dev/null
+++ b/gdb/unittests/extract-string-selftests.c
@@ -0,0 +1,85 @@
+/* Self tests for the function extract_string_maybe_quoted.
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "defs.h"
+#include "gdbsupport/selftest.h"
+
+namespace selftests {
+namespace extract_string {
+
+struct test_def
+{
+  test_def (const char *input,
+	    const char *output,
+	    size_t offset)
+    : m_input (input),
+      m_output (output),
+      m_offset (offset)
+  { /* Nothing.  */ }
+
+  void run () const
+  {
+    const char *tmp = m_input;
+    std::string test_out = extract_string_maybe_quoted (&tmp);
+
+    if (run_verbose ())
+      {
+	debug_printf ("Input: %s\n", m_input);
+	debug_printf ("Output [Got]: %s\n", test_out.c_str ());
+	debug_printf ("Output [Exp]: %s\n", m_output);
+	debug_printf ("Rest [Got]: %s\n", tmp);
+	debug_printf ("Rest [Exp]: %s\n", (m_input + m_offset));
+      }
+
+    SELF_CHECK (test_out == m_output);
+    SELF_CHECK (tmp == m_input + m_offset);
+  }
+
+private:
+  const char *m_input;
+  const char *m_output;
+  size_t m_offset;
+};
+
+test_def tests[] = {
+  { "abc def", "abc", 3 },
+  { "'abc' def", "abc", 5 },
+  { "\"abc\" def", "abc", 5 },
+  { "ab\\ cd ef", "ab cd", 6 },
+  { "\"abc\\\"def\" ghi", "abc\"def", 10 },
+  { "\"'abc' 'def'\" ghi", "'abc' 'def'", 13 },
+};
+
+static void
+run_tests ()
+{
+  for (const auto &test : tests)
+    test.run ();
+}
+
+} /* namespace extract_string */
+} /* namespace selftests */
+
+void _initialize_extract_string_selftest ();
+void
+_initialize_extract_string_selftest ()
+{
+  selftests::register_test ("extract-string",
+			    selftests::extract_string::run_tests);
+}
-- 
2.25.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/4] gdbsupport: allow for configuration of extract_string_maybe_quoted
  2024-02-08 16:28 [PATCH 0/4] Rewrite gdb_argv using extract_string_maybe_quoted Andrew Burgess
  2024-02-08 16:28 ` [PATCH 1/4] gdb/unittests: add extract_string_maybe_quoted self tests Andrew Burgess
@ 2024-02-08 16:28 ` Andrew Burgess
  2024-02-08 19:05   ` Tom Tromey
  2024-02-08 16:28 ` [PATCH 3/4] gdb/unittests: additional tests for gdb_argv class Andrew Burgess
  2024-02-08 16:28 ` [PATCH 4/4] gdbsupport: rewrite gdb_argv to use extract_string_maybe_quoted Andrew Burgess
  3 siblings, 1 reply; 10+ messages in thread
From: Andrew Burgess @ 2024-02-08 16:28 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

This function has the extract_string_maybe_quoted function accept an
extra parameter, a control structure.  This control structure can then
be used to modify how extract_string_maybe_quoted operates.

What can be controlled exactly?

The new argument allows controls over which characters can and can't
be escaped within each of three contexts: outside any quotes, within
single quotes, and within double quotes.

The escaping rules for the current extract_string_maybe_quoted make
very little sense to me, we allow every character to be escaped within
all three contexts.

This doesn't make much sense to me, why do we need to allow so much
escaping?  In most places in GDB where extract_string_maybe_quoted is
used no characters have special meaning, so if I call
extract_string_maybe_quoted to extract from this:

  "Blah\@"

Then I'll get back (Blah@) without the parenthesis.  But why is the
escaping supported?  I could just as easily write:

  "Blah@"

In which case I'll get the same (Blah@) result.  If I want a result
of (Blah\@) then I need to type:

  "Blah\\@"

Now the argument could be used that this is consistent with shell
escaping, which is what users will expect.  Except it really isn't.
Outside of any quotes then yes, any \<CHAR> sequence will result in
<CHAR> even when <CHAR> doesn't have a special meaning, but within
double quotes only a limited set of characters need escaping, in all
other cases the backslash will be preserved, yet for GDB within double
quotes, our escaping is just like the unquoted escaping.

And within single quotes no characters can be escaped at a shell, and
yet, once again, GDB just uses the unquoted strategy.

So I think GDB's approach neither makes sense from a necessity point
of view, nor does it make sense from a "like a shell" point of view.

My plan, in a later commit, is to reuse extract_string_maybe_quoted as
the core of gdb_argv instead of calling into libiberty.  To do this I
need extract_string_maybe_quoted to behave more "shell like" just as
libiberties buildargv function does.

And so, by allowing control over which characters can, or can't be
quoted, I can retain the current extract_string_maybe_quoted behaviour
for the current users, but provide a different control structure for
when (later on) I reuse extract_string_maybe_quoted in gdb_argv.

I did consider changing the default behaviour of
extract_string_maybe_quoted, and I might still propose that in the
future.  Unfortunately, there is one test (gdb.base/options.exp) that
depends on the current extract_string_maybe_quoted behaviour,
specifically is relies on being able to escape a single quote
character within a single quote context -- at a POSIX shell it is not
possible to embed a single quote within a single quoted string.  So
as this is tested behaviour I'm reluctant to just go changing this.

In this commit I've added a new control structure extract_string_ctrl
in gdbsupport/common-utils.h, defined a default structure in
gdbsupport/common-utils.cc, and this default is used by
extract_string_maybe_quoted to maintain the current behaviour.

In gdb/unittests/extract-string-selftests.c I've defined a new
extract_string_ctrl object which implements shell like behaviour, I've
then added some new extract_string_maybe_quoted self tests which check
the shell like behaviour.
---
 gdb/unittests/extract-string-selftests.c |  20 +++-
 gdbsupport/common-utils.cc               |  75 +++++++------
 gdbsupport/common-utils.h                | 133 ++++++++++++++++++++++-
 3 files changed, 191 insertions(+), 37 deletions(-)

diff --git a/gdb/unittests/extract-string-selftests.c b/gdb/unittests/extract-string-selftests.c
index c2eedd75342..3b9a5bd104c 100644
--- a/gdb/unittests/extract-string-selftests.c
+++ b/gdb/unittests/extract-string-selftests.c
@@ -23,20 +23,30 @@
 namespace selftests {
 namespace extract_string {
 
+static extract_string_ctrl shell_extract_string_ctrl
+  (nullptr, "", "\"$`\\", "\n", "", "\n");
+
 struct test_def
 {
   test_def (const char *input,
 	    const char *output,
-	    size_t offset)
+	    size_t offset,
+	    extract_string_ctrl *ctrl = nullptr)
     : m_input (input),
       m_output (output),
-      m_offset (offset)
+      m_offset (offset),
+      m_ctrl (ctrl)
   { /* Nothing.  */ }
 
   void run () const
   {
     const char *tmp = m_input;
-    std::string test_out = extract_string_maybe_quoted (&tmp);
+    std::string test_out;
+
+    if (m_ctrl == nullptr)
+      test_out = extract_string_maybe_quoted (&tmp);
+    else
+      test_out = extract_string_maybe_quoted (&tmp, *m_ctrl);
 
     if (run_verbose ())
       {
@@ -55,6 +65,7 @@ struct test_def
   const char *m_input;
   const char *m_output;
   size_t m_offset;
+  extract_string_ctrl *m_ctrl;
 };
 
 test_def tests[] = {
@@ -64,6 +75,9 @@ test_def tests[] = {
   { "ab\\ cd ef", "ab cd", 6 },
   { "\"abc\\\"def\" ghi", "abc\"def", 10 },
   { "\"'abc' 'def'\" ghi", "'abc' 'def'", 13 },
+  { "'ab\\ cd' ef", "ab\\ cd", 8, &shell_extract_string_ctrl },
+  { "ab\\\ncd ef", "abcd", 6, &shell_extract_string_ctrl },
+  { "\"ab\\\ncd\" ef", "abcd", 8, &shell_extract_string_ctrl },
 };
 
 static void
diff --git a/gdbsupport/common-utils.cc b/gdbsupport/common-utils.cc
index 99b9cb8609b..15ee2c43f83 100644
--- a/gdbsupport/common-utils.cc
+++ b/gdbsupport/common-utils.cc
@@ -164,58 +164,67 @@ savestring (const char *ptr, size_t len)
   return p;
 }
 
+/* See common-utils.h.  */
+
+extract_string_ctrl default_extract_string_ctrl (nullptr, nullptr, nullptr);
+
 /* See documentation in common-utils.h.  */
 
 std::string
-extract_string_maybe_quoted (const char **arg)
+extract_string_maybe_quoted (const char **arg,
+			     const extract_string_ctrl &ctrl)
 {
-  bool squote = false;
-  bool dquote = false;
-  bool bsquote = false;
+  char quote = '\0';
   std::string result;
   const char *p = *arg;
 
   /* Find the start of the argument.  */
   p = skip_spaces (p);
 
-  /* Parse p similarly to gdb_argv buildargv function.  */
+  /* Parse p similarly to the libiberty buildargv function, but respect
+     the escaping rules of CTRL.  */
   while (*p != '\0')
     {
-      if (ISSPACE (*p) && !squote && !dquote && !bsquote)
+      if (ISSPACE (*p) && quote == '\0')
 	break;
       else
 	{
-	  if (bsquote)
-	    {
-	      bsquote = false;
-	      result += *p;
-	    }
-	  else if (*p == '\\')
-	    bsquote = true;
-	  else if (squote)
+	  if (*p == '\\')
 	    {
-	      if (*p == '\'')
-		squote = false;
+	      /* The character after the backslash.  */
+	      char c = *(p + 1);
+
+	      if (ctrl.discard_escaped_char (c, quote))
+		{
+		  /* We are discarding the escape character (backslash) and
+		     the character after the escape.  This allows us to
+		     emulate POSIX newline handling where an escaped
+		     newline is discarded.  */
+		  ++p;
+		}
+	      else if (ctrl.is_escaped (c, quote))
+		{
+		  /* The character C is escaped.  We discard the escape
+		     character (backslash), but do include C.  */
+		  ++p;
+		  result += c;
+		}
 	      else
-		result += *p;
-	    }
-	  else if (dquote)
-	    {
-	      if (*p == '"')
-		dquote = false;
-	      else
-		result += *p;
+		{
+		  /* We are going to treat the backslash as a literal
+		     character with no special association to the following
+		     character.  */
+		  result += *p;
+		}
 	    }
+	  else if (*p == quote)
+	    quote = '\0';
+	  else if (quote == '\0' && (*p == '\'' || *p == '"'))
+	    quote = *p;
 	  else
-	    {
-	      if (*p == '\'')
-		squote = true;
-	      else if (*p == '"')
-		dquote = true;
-	      else
-		result += *p;
-	    }
-	  p++;
+	    result += *p;
+
+	  ++p;
 	}
     }
 
diff --git a/gdbsupport/common-utils.h b/gdbsupport/common-utils.h
index 23cd40c0207..60578cd7560 100644
--- a/gdbsupport/common-utils.h
+++ b/gdbsupport/common-utils.h
@@ -74,6 +74,135 @@ std::string &string_vappendf (std::string &dest, const char* fmt, va_list args)
 
 char *savestring (const char *ptr, size_t len);
 
+/* A control data structure, instances of this class are passed to the
+   extract_string_maybe_quoted function in order to modify its behaviour.
+
+   This class controls which characters can and can't be quoted within
+   different context (no quotes, single quotes, or double quotes).  */
+
+struct extract_string_ctrl
+{
+  /* Constructor.  The CAN_ESCAPE_* arguments are the set of characters
+     which can be escaped within each context.  If any character within
+     this set appears within the given context with a backslash before it,
+     then the backslash will be removed and the character retained in the
+     output.  If any of these arguments are nullptr then every character
+     can be escaped within that context.  If you want no character escaping
+     within a particular context then pass the empty string.
+
+     The DISCARD_* arguments are the set of characters which should be
+     discarded when escaped within a given context.  If any of these
+     characters appear within the given context with a backslash before
+     them, then both the backslash and the character will be removed from
+     the final output.  If any of these arguments are nullptr then no
+     characters will be discarded when escaped within that context.
+
+     During string extraction, if a backslash is encountered in the input,
+     and the next character is not in the relevant CAN_ESCAPE_* variable,
+     nor is in the relevant DISCARD_* variable, then the backslash will be
+     retained in the output.  */
+
+  explicit extract_string_ctrl (const char *can_escape_unquoted,
+				const char *can_escape_single,
+				const char *can_escape_double,
+				const char *discard_unquoted = nullptr,
+				const char *discard_single = nullptr,
+				const char *discard_double = nullptr)
+    : m_can_escape_unquoted (can_escape_unquoted),
+      m_can_escape_single (can_escape_single),
+      m_can_escape_double (can_escape_double),
+      m_discard_unquoted (discard_unquoted),
+      m_discard_single (discard_single),
+      m_discard_double (discard_double)
+  {
+    /* Nothing.  */
+  }
+
+  /* Return true if, when character C appears escaped within an input
+     string, both the backslash escape character, and the character C
+     should be discarded from the input stream.  */
+
+  bool discard_escaped_char (const char c, const char quote) const
+  {
+    const char *char_set = nullptr;
+    switch (quote)
+      {
+      case '\0':
+	char_set = m_discard_unquoted;
+	break;
+      case '\'':
+	char_set = m_discard_single;
+	break;
+      case '"':
+	char_set = m_discard_double;
+	break;
+      default:
+	gdb_assert_not_reached ("unexpected quote character '%c'", quote);
+      }
+
+    /* Discard nothing.  */
+    if (char_set == nullptr)
+      return false;
+
+    /* If C is in CHAR_SET then discard.  */
+    return strchr (char_set, c) != nullptr;
+  }
+
+  /* Return true if, when character C appears escaped within an input
+     string, we should discard the previous escape character and just
+     include character C.  Otherwise return false.  */
+
+  bool is_escaped (const char c, const char quote) const
+  {
+    const char *char_set = nullptr;
+    switch (quote)
+      {
+      case '\0':
+	char_set = m_can_escape_unquoted;
+	break;
+      case '\'':
+	char_set = m_can_escape_single;
+	break;
+      case '"':
+	char_set = m_can_escape_double;
+	break;
+      default:
+	gdb_assert_not_reached ("unexpected quote character '%c'", quote);
+      }
+
+    /* When no character set is defined we always return true, this means
+       that every character can be escaped.  */
+    if (char_set == nullptr)
+      return true;
+
+    /* Otherwise, return true if C is in CHAR_SET.  */
+    return strchr (char_set, c) != nullptr;
+  }
+
+private:
+  /* The set of characters that can be escaped within a particular
+     context.  See the constructor for more information.  */
+  const char *m_can_escape_unquoted;
+  const char *m_can_escape_single;
+  const char *m_can_escape_double;
+
+  /* The set of characters that can be discarded within a particular
+     context.  See the constructor for more information.  */
+  const char *m_discard_unquoted;
+  const char *m_discard_single;
+  const char *m_discard_double;
+};
+
+/* The default control configuration for extract_string_maybe_quoted.
+   This is the way it is for backwards compatibility reasons, though it is
+   unclear if this is actually a good thing or not.  Every character in
+   every context can be escaped.  This really makes very little sense as
+   in most locations in GDB no characters, other than quotes and white
+   space, actually have any special meaning, so really very little
+   escaping should be supported.  */
+
+extern extract_string_ctrl default_extract_string_ctrl;
+
 /* Extract the next word from ARG.  The next word is defined as either,
    everything up to the next space, or, if the next word starts with either
    a single or double quote, then everything up to the closing quote.  The
@@ -82,7 +211,9 @@ char *savestring (const char *ptr, size_t len);
    word, or, for quoted words, the first character after the closing
    quote.  */
 
-std::string extract_string_maybe_quoted (const char **arg);
+std::string extract_string_maybe_quoted
+  (const char **arg,
+   const extract_string_ctrl &ctrl = default_extract_string_ctrl);
 
 /* The strerror() function can return NULL for errno values that are
    out of range.  Provide a "safe" version that always returns a
-- 
2.25.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 3/4] gdb/unittests: additional tests for gdb_argv class
  2024-02-08 16:28 [PATCH 0/4] Rewrite gdb_argv using extract_string_maybe_quoted Andrew Burgess
  2024-02-08 16:28 ` [PATCH 1/4] gdb/unittests: add extract_string_maybe_quoted self tests Andrew Burgess
  2024-02-08 16:28 ` [PATCH 2/4] gdbsupport: allow for configuration of extract_string_maybe_quoted Andrew Burgess
@ 2024-02-08 16:28 ` Andrew Burgess
  2024-02-08 16:28 ` [PATCH 4/4] gdbsupport: rewrite gdb_argv to use extract_string_maybe_quoted Andrew Burgess
  3 siblings, 0 replies; 10+ messages in thread
From: Andrew Burgess @ 2024-02-08 16:28 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

We already have some gdb_argv tests, but these only test the
gdb_argv::as_array_view member function.

This commit creates a new unittests/gdb_argv-selftests.c file and adds
a wider range of gdb_argv tests, this includes covering the existing
as_array_view tests, so the existing tests have been deleted from
gdb/utils.c.

I've also added an additional test to gdb.python/py-cmd.exp which
tests calling gdb.string_to_argv with an empty string.  The
gdb.string_to_argv function is a wrapper around gdb_argv, and I
thought the empty string case was worth covering.
---
 gdb/Makefile.in                     |   1 +
 gdb/testsuite/gdb.python/py-cmd.exp |   4 +
 gdb/unittests/gdb_argv-selftests.c  | 131 ++++++++++++++++++++++++++++
 gdb/utils.c                         |  26 ------
 4 files changed, 136 insertions(+), 26 deletions(-)
 create mode 100644 gdb/unittests/gdb_argv-selftests.c

diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index 92fe06de489..18dec570389 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -462,6 +462,7 @@ SELFTESTS_SRCS = \
 	unittests/format_pieces-selftests.c \
 	unittests/frame_info_ptr-selftests.c \
 	unittests/function-view-selftests.c \
+	unittests/gdb_argv-selftests.c \
 	unittests/gdb_tilde_expand-selftests.c \
 	unittests/gmp-utils-selftests.c \
 	unittests/intrusive_list-selftests.c \
diff --git a/gdb/testsuite/gdb.python/py-cmd.exp b/gdb/testsuite/gdb.python/py-cmd.exp
index 82cb4cb557a..01af475dded 100644
--- a/gdb/testsuite/gdb.python/py-cmd.exp
+++ b/gdb/testsuite/gdb.python/py-cmd.exp
@@ -142,6 +142,10 @@ gdb_test "python print (gdb.string_to_argv ('1\\ 2 3'))" \
   {\['1 2', '3'\]} \
     "string_to_argv ('1\\ 2 3')"
 
+gdb_test "python print (gdb.string_to_argv (''))" \
+  {\[\]} \
+    "string_to_argv ('')"
+
 # Test user-defined python commands.
 gdb_test_multiline "input simple user-defined command" \
   "python" "" \
diff --git a/gdb/unittests/gdb_argv-selftests.c b/gdb/unittests/gdb_argv-selftests.c
new file mode 100644
index 00000000000..b6a87254c93
--- /dev/null
+++ b/gdb/unittests/gdb_argv-selftests.c
@@ -0,0 +1,131 @@
+/* Self tests for the gdb_argv class from gdbsupport.
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "defs.h"
+#include "gdbsupport/selftest.h"
+#include "gdbsupport/buildargv.h"
+
+namespace selftests {
+namespace gdb_argv_tests {
+
+static extract_string_ctrl shell_extract_string_ctrl
+  (nullptr, "", "\"$`\\", "\n", "", "\n");
+
+struct test_def
+{
+  /* INPUT is the string to pass to the gdb_argv constructor.  OUTPUT is
+     the vector of arguments that gdb_argv should split INPUT into.  */
+
+  test_def (const char *input,
+	    std::vector<const char *> output)
+    : m_input (input),
+      m_output (output)
+  { /* Nothing.  */ }
+
+  /* Run this test and check the results.  Throws an exception if the test
+     fails.  */
+
+  void run () const
+  {
+    gdb_argv argv (m_input);
+    int count = argv.count ();
+
+    if (run_verbose ())
+      {
+	debug_printf ("Input: %s\n", m_input);
+	debug_printf ("Output [Count]: %d\n", count);
+      }
+
+    SELF_CHECK (count == m_output.size ());
+
+    gdb::array_view<char *> view = argv.as_array_view ();
+    SELF_CHECK (view.size () == count);
+
+    const char *const *data = argv.get ();
+    for (int i = 0; i < count; ++i)
+      {
+	const char *a = argv[i];
+	SELF_CHECK (view[i] == a);
+	SELF_CHECK (strcmp (a, m_output[i]) == 0);
+	SELF_CHECK (a == *data);
+	++data;
+      }
+
+    SELF_CHECK (*data == nullptr);
+  }
+
+private:
+  /* What to pass to gdb_argv constructor.  */
+
+  const char *m_input;
+
+  /* The expected contents of gdb_argv after construction.  */
+
+  std::vector<const char *> m_output;
+};
+
+/* The array of all tests.  */
+
+test_def tests[] = {
+  { "abc def", {"abc", "def"} },
+  { "one two 3", {"one", "two", "3"} },
+  { "one two\\ three", {"one", "two three"} },
+  { "one\\ two three", {"one two", "three"} },
+  { "'one two' three", {"one two", "three"} },
+  { "one \"two three\"", {"one", "two three"} },
+  { "'\"' \"''\"", {"\"", "''"} },
+};
+
+/* Test the creation of an empty gdb_argv object.  */
+
+static void
+empty_argv_tests ()
+{
+  {
+    gdb_argv argv;
+
+    SELF_CHECK (argv.get () == nullptr);
+    SELF_CHECK (argv.count () == 0);
+
+    gdb::array_view<char *> view = argv.as_array_view ();
+
+    SELF_CHECK (view.data () == nullptr);
+    SELF_CHECK (view.size () == 0);
+  }
+}
+
+static void
+run_tests ()
+{
+  empty_argv_tests ();
+
+  for (const auto &test : tests)
+    test.run ();
+}
+
+} /* namespace gdb_argv_tests */
+} /* namespace selftests */
+
+void _initialize_gdb_argv_selftest ();
+void
+_initialize_gdb_argv_selftest ()
+{
+  selftests::register_test ("gdb_argv",
+			    selftests::gdb_argv_tests::run_tests);
+}
diff --git a/gdb/utils.c b/gdb/utils.c
index 702c3f15826..05fbbdc2b63 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -3294,31 +3294,6 @@ gdb_realpath_tests ()
   gdb_realpath_check_trailer ("", "");
 }
 
-/* Test the gdb_argv::as_array_view method.  */
-
-static void
-gdb_argv_as_array_view_test ()
-{
-  {
-    gdb_argv argv;
-
-    gdb::array_view<char *> view = argv.as_array_view ();
-
-    SELF_CHECK (view.data () == nullptr);
-    SELF_CHECK (view.size () == 0);
-  }
-  {
-    gdb_argv argv ("une bonne 50");
-
-    gdb::array_view<char *> view = argv.as_array_view ();
-
-    SELF_CHECK (view.size () == 3);
-    SELF_CHECK (strcmp (view[0], "une") == 0);
-    SELF_CHECK (strcmp (view[1], "bonne") == 0);
-    SELF_CHECK (strcmp (view[2], "50") == 0);
-  }
-}
-
 #endif /* GDB_SELF_TEST */
 
 /* Simple, portable version of dirname that does not modify its
@@ -3794,7 +3769,6 @@ When set, debugging messages will be marked with seconds and microseconds."),
 
 #if GDB_SELF_TEST
   selftests::register_test ("gdb_realpath", gdb_realpath_tests);
-  selftests::register_test ("gdb_argv_array_view", gdb_argv_as_array_view_test);
   selftests::register_test ("strncmp_iw_with_mode",
 			    strncmp_iw_with_mode_tests);
   selftests::register_test ("pager", test_pager);
-- 
2.25.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 4/4] gdbsupport: rewrite gdb_argv to use extract_string_maybe_quoted
  2024-02-08 16:28 [PATCH 0/4] Rewrite gdb_argv using extract_string_maybe_quoted Andrew Burgess
                   ` (2 preceding siblings ...)
  2024-02-08 16:28 ` [PATCH 3/4] gdb/unittests: additional tests for gdb_argv class Andrew Burgess
@ 2024-02-08 16:28 ` Andrew Burgess
  2024-02-08 19:08   ` Tom Tromey
  3 siblings, 1 reply; 10+ messages in thread
From: Andrew Burgess @ 2024-02-08 16:28 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

This commit rewrites the gdb_argv class to make use of the function
extract_string_maybe_quoted.

I've moved shell_extract_string_ctrl from the unittests directory into
the gdbsupport directory to make it more widely available, and this is
used within gdb_argv to split arguments and handle escaping correctly.

The algorithm behind this rewrite is pretty simple, for an input
string we skip white space, and then call extract_string_maybe_quoted
to get the next argument.  We then repeat this process to get each
subsequent argument.

The result is almost identical to the libibery buildargv function
except for one edge case which I think is a bug in buildargv.

The libibery buildargv function, if passed a string that only contains
white space, will return a single empty argument.  I think this is
wrong, and after this rewrite we now return a zero length argument
list.  This edge case is tested in gdb_argv-selftests.c and also in
gdb.python/py-cmd.exp.
---
 gdb/testsuite/gdb.python/py-cmd.exp      |  4 ++
 gdb/unittests/extract-string-selftests.c |  3 --
 gdb/unittests/gdb_argv-selftests.c       |  3 ++
 gdbsupport/buildargv.h                   | 50 ++++++++++++++++--------
 gdbsupport/common-utils.cc               |  5 +++
 gdbsupport/common-utils.h                | 11 ++++++
 6 files changed, 57 insertions(+), 19 deletions(-)

diff --git a/gdb/testsuite/gdb.python/py-cmd.exp b/gdb/testsuite/gdb.python/py-cmd.exp
index 01af475dded..8b1175d44c8 100644
--- a/gdb/testsuite/gdb.python/py-cmd.exp
+++ b/gdb/testsuite/gdb.python/py-cmd.exp
@@ -146,6 +146,10 @@ gdb_test "python print (gdb.string_to_argv (''))" \
   {\[\]} \
     "string_to_argv ('')"
 
+gdb_test "python print (gdb.string_to_argv (' '))" \
+  {\[\]} \
+    "string_to_argv (' ')"
+
 # Test user-defined python commands.
 gdb_test_multiline "input simple user-defined command" \
   "python" "" \
diff --git a/gdb/unittests/extract-string-selftests.c b/gdb/unittests/extract-string-selftests.c
index 3b9a5bd104c..5fd8426ec22 100644
--- a/gdb/unittests/extract-string-selftests.c
+++ b/gdb/unittests/extract-string-selftests.c
@@ -23,9 +23,6 @@
 namespace selftests {
 namespace extract_string {
 
-static extract_string_ctrl shell_extract_string_ctrl
-  (nullptr, "", "\"$`\\", "\n", "", "\n");
-
 struct test_def
 {
   test_def (const char *input,
diff --git a/gdb/unittests/gdb_argv-selftests.c b/gdb/unittests/gdb_argv-selftests.c
index b6a87254c93..b374b9b32ec 100644
--- a/gdb/unittests/gdb_argv-selftests.c
+++ b/gdb/unittests/gdb_argv-selftests.c
@@ -83,6 +83,9 @@ struct test_def
 /* The array of all tests.  */
 
 test_def tests[] = {
+  { "", { } },
+  { "  ", { } },
+  { "''", { "" } },
   { "abc def", {"abc", "def"} },
   { "one two 3", {"one", "two", "3"} },
   { "one two\\ three", {"one", "two three"} },
diff --git a/gdbsupport/buildargv.h b/gdbsupport/buildargv.h
index 8fdd085b386..8e84affed39 100644
--- a/gdbsupport/buildargv.h
+++ b/gdbsupport/buildargv.h
@@ -1,4 +1,4 @@
-/* RAII wrapper for buildargv
+/* RAII wrapper to split an argument string into individual arguments.
 
    Copyright (C) 2021-2024 Free Software Foundation, Inc.
 
@@ -21,9 +21,12 @@
 #define GDBSUPPORT_BUILDARGV_H
 
 #include "libiberty.h"
+#include "gdbsupport/common-utils.h"
+#include <string>
 
-/* A wrapper for an array of char* that was allocated in the way that
-   'buildargv' does, and should be freed with 'freeargv'.  */
+/* A class for splitting a single char* string into an array of char*
+   arguments, each argument is extracted from the original string.  The
+   resulting array will be freed by this classes destructor.  */
 
 class gdb_argv
 {
@@ -32,15 +35,15 @@ class gdb_argv
   /* A constructor that initializes to NULL.  */
 
   gdb_argv ()
-    : m_argv (NULL)
+    : m_argv (nullptr)
   {
   }
 
-  /* A constructor that calls buildargv on STR.  STR may be NULL, in
-     which case this object is initialized with a NULL array.  */
+  /* A constructor that splits STR into an array of arguments.  STR may be
+     NULL, in which case this object is initialized with a NULL array.  */
 
   explicit gdb_argv (const char *str)
-    : m_argv (NULL)
+    : m_argv (nullptr)
   {
     reset (str);
   }
@@ -74,18 +77,34 @@ class gdb_argv
     freeargv (m_argv);
   }
 
-  /* Call buildargv on STR, storing the result in this object.  Any
-     previous state is freed.  STR may be NULL, in which case this
-     object is reset with a NULL array.  If buildargv fails due to
-     out-of-memory, call malloc_failure.  Therefore, the value is
-     guaranteed to be non-NULL, unless the parameter itself is
-     NULL.  */
+  /* Read arguments from STR by calling extract_string_maybe_quoted.
+     Leading and trailing white space in STR will be ignored.  Any previous
+     argument state is freed.  STR may be nullptr, in which case this
+     object is reset with a nullptr array.  */
 
   void reset (const char *str)
   {
-    char **argv = buildargv (str);
     freeargv (m_argv);
-    m_argv = argv;
+    m_argv = nullptr;
+
+    if (str == nullptr)
+      return;
+
+    std::vector<char *> tmp_argv;
+
+    str = skip_spaces (str);
+    while (*str != '\0')
+      {
+	std::string arg
+	  = extract_string_maybe_quoted (&str, shell_extract_string_ctrl);
+	tmp_argv.emplace_back (xstrdup (arg.c_str ()));
+	str = skip_spaces (str);
+      }
+    tmp_argv.emplace_back (nullptr);
+
+    size_t sz = sizeof (decltype (tmp_argv)::value_type) * tmp_argv.size ();
+    m_argv = (char **) xmalloc (sz);
+    memcpy (m_argv, tmp_argv.data (), sz);
   }
 
   /* Return the underlying array.  */
@@ -197,7 +216,6 @@ class gdb_argv
 private:
 
   /* The wrapped array.  */
-
   char **m_argv;
 };
 
diff --git a/gdbsupport/common-utils.cc b/gdbsupport/common-utils.cc
index 15ee2c43f83..0fb2b6f350a 100644
--- a/gdbsupport/common-utils.cc
+++ b/gdbsupport/common-utils.cc
@@ -168,6 +168,11 @@ savestring (const char *ptr, size_t len)
 
 extract_string_ctrl default_extract_string_ctrl (nullptr, nullptr, nullptr);
 
+/* See common-utils.h.  */
+
+extract_string_ctrl shell_extract_string_ctrl (nullptr, "", "\"$`\\",
+					       "\n", "", "\n");
+
 /* See documentation in common-utils.h.  */
 
 std::string
diff --git a/gdbsupport/common-utils.h b/gdbsupport/common-utils.h
index 60578cd7560..bf603ee04e3 100644
--- a/gdbsupport/common-utils.h
+++ b/gdbsupport/common-utils.h
@@ -203,6 +203,17 @@ struct extract_string_ctrl
 
 extern extract_string_ctrl default_extract_string_ctrl;
 
+/* This control object for use with extract_string_maybe_quoted replicates
+   shell like behaviour.  Every character can be escaped in an unquoted
+   context, nothing can be escaped within single quotes, and a limited set
+   of characters (see POSIX shell spec) can be escaped within a double
+   quoted context.
+
+   An escaped newline will be removed in an unquoted and double quoted
+   context.  */
+
+extern extract_string_ctrl shell_extract_string_ctrl;
+
 /* Extract the next word from ARG.  The next word is defined as either,
    everything up to the next space, or, if the next word starts with either
    a single or double quote, then everything up to the closing quote.  The
-- 
2.25.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] gdb/unittests: add extract_string_maybe_quoted self tests
  2024-02-08 16:28 ` [PATCH 1/4] gdb/unittests: add extract_string_maybe_quoted self tests Andrew Burgess
@ 2024-02-08 18:17   ` Tom Tromey
  0 siblings, 0 replies; 10+ messages in thread
From: Tom Tromey @ 2024-02-08 18:17 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches

>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:

Andrew> This commit adds some self tests for the extract_string_maybe_quoted
Andrew> function.  No functionality is changed in this commit.

Andrew> +  test_def (const char *input,
Andrew> +	    const char *output,
Andrew> +	    size_t offset)
Andrew> +    : m_input (input),
Andrew> +      m_output (output),
Andrew> +      m_offset (offset)
Andrew> +  { /* Nothing.  */ }

I wonder -- if this constructor was constexpr, would that create the
objects in the rodata section?

I don't really know how that works.  However, making data read-only like
this is one of the advantages of...

Andrew> +test_def tests[] = {
Andrew> +  { "abc def", "abc", 3 },

... making this kind of thing 'static const'.

Tom

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/4] gdbsupport: allow for configuration of extract_string_maybe_quoted
  2024-02-08 16:28 ` [PATCH 2/4] gdbsupport: allow for configuration of extract_string_maybe_quoted Andrew Burgess
@ 2024-02-08 19:05   ` Tom Tromey
  0 siblings, 0 replies; 10+ messages in thread
From: Tom Tromey @ 2024-02-08 19:05 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches

Andrew> And so, by allowing control over which characters can, or can't be
Andrew> quoted, I can retain the current extract_string_maybe_quoted behaviour
Andrew> for the current users, but provide a different control structure for
Andrew> when (later on) I reuse extract_string_maybe_quoted in gdb_argv.

Andrew> I did consider changing the default behaviour of
Andrew> extract_string_maybe_quoted, and I might still propose that in the
Andrew> future.  Unfortunately, there is one test (gdb.base/options.exp) that
Andrew> depends on the current extract_string_maybe_quoted behaviour,
Andrew> specifically is relies on being able to escape a single quote
Andrew> character within a single quote context -- at a POSIX shell it is not
Andrew> possible to embed a single quote within a single quoted string.  So
Andrew> as this is tested behaviour I'm reluctant to just go changing this.

To me the current behavior seems pretty surprising and not extremely
useful.  I think I'd be in favor of changing it to always be more
shell-like.

I think it might be better to also just remove the controls from this
function -- i.e., simplify this patch -- and have all commands that use
this present the same UI.

The idea here is to try to make gdb be a bit more consistent.  However,
I haven't looked at all the callers to know if this is infeasible for
some reason.  Do you know if there's a reason not to do this?

Internally at AdaCore, we have a change to the libiberty buildargv that
changes the backslash behavior on Windows hosts.  However, it seems to
me that fixing the backslash to not affect ordinary characters would
probably fill the same role.

Tom

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] gdbsupport: rewrite gdb_argv to use extract_string_maybe_quoted
  2024-02-08 16:28 ` [PATCH 4/4] gdbsupport: rewrite gdb_argv to use extract_string_maybe_quoted Andrew Burgess
@ 2024-02-08 19:08   ` Tom Tromey
  2024-02-09 16:46     ` Andrew Burgess
  0 siblings, 1 reply; 10+ messages in thread
From: Tom Tromey @ 2024-02-08 19:08 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches

>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:

Andrew> The libibery buildargv function, if passed a string that only contains

typo, 'libiberty'

Andrew> white space, will return a single empty argument.  I think this is
Andrew> wrong, and after this rewrite we now return a zero length argument
Andrew> list.

I wonder if this should also be fixed in libiberty.

That said, I'm also in favor of gdb moving away from using buildargv.
Maybe gdb_argv could even be simplified later -- right now it has to do
some memory allocation stuff just to be compatible with how libiberty
works; but I wonder if we could eventually just switch it to
std::vector<std::string>.

Tom

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] gdbsupport: rewrite gdb_argv to use extract_string_maybe_quoted
  2024-02-08 19:08   ` Tom Tromey
@ 2024-02-09 16:46     ` Andrew Burgess
  2024-02-09 17:22       ` Tom Tromey
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Burgess @ 2024-02-09 16:46 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb-patches

Tom Tromey <tom@tromey.com> writes:

>>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:
>
> Andrew> The libibery buildargv function, if passed a string that only contains
>
> typo, 'libiberty'
>
> Andrew> white space, will return a single empty argument.  I think this is
> Andrew> wrong, and after this rewrite we now return a zero length argument
> Andrew> list.
>
> I wonder if this should also be fixed in libiberty.

I already have a patch in this area on the gcc list.  It doesn't include
this fix, but an item on my todo list is to update that patch to include
this fix too.

> That said, I'm also in favor of gdb moving away from using buildargv.

If I simplify extract_string_maybe_quoted as you suggest on patch #2
then I'll not be able to share that here.  I'm going to rework the first
3 patches, and see how this leaves this patch looking.

> Maybe gdb_argv could even be simplified later -- right now it has to do
> some memory allocation stuff just to be compatible with how libiberty
> works; but I wonder if we could eventually just switch it to
> std::vector<std::string>.

Maybe.  One place we use gdb_argv is when we start an inferior without a
shell, in which case we forward the gdb_argv data to the exec call (so
we need a nullptr terminated array of C strings).  But that's just one
of _many_ uses of gdb_argv, so maybe we should specialise that case in
some way, and all the other uses could use the vector of strings.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] gdbsupport: rewrite gdb_argv to use extract_string_maybe_quoted
  2024-02-09 16:46     ` Andrew Burgess
@ 2024-02-09 17:22       ` Tom Tromey
  0 siblings, 0 replies; 10+ messages in thread
From: Tom Tromey @ 2024-02-09 17:22 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: Tom Tromey, gdb-patches

>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:

>> That said, I'm also in favor of gdb moving away from using buildargv.

Andrew> If I simplify extract_string_maybe_quoted as you suggest on patch #2
Andrew> then I'll not be able to share that here.  I'm going to rework the first
Andrew> 3 patches, and see how this leaves this patch looking.

If my suggestion causes problems, it's fine to push back... I wouldn't
want a sort of offhand thought to make more work.

Andrew> Maybe.  One place we use gdb_argv is when we start an inferior without a
Andrew> shell, in which case we forward the gdb_argv data to the exec call (so
Andrew> we need a nullptr terminated array of C strings).  But that's just one
Andrew> of _many_ uses of gdb_argv, so maybe we should specialise that case in
Andrew> some way, and all the other uses could use the vector of strings.

For this particular case we could have some method to return a transient
vector of 'const char *' or something along those lines.  To be clear
I'm not suggesting you do anything like this in your series, I was just
pointing out why I think it'd be good to move off of buildargv.

Tom

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-02-09 17:22 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-08 16:28 [PATCH 0/4] Rewrite gdb_argv using extract_string_maybe_quoted Andrew Burgess
2024-02-08 16:28 ` [PATCH 1/4] gdb/unittests: add extract_string_maybe_quoted self tests Andrew Burgess
2024-02-08 18:17   ` Tom Tromey
2024-02-08 16:28 ` [PATCH 2/4] gdbsupport: allow for configuration of extract_string_maybe_quoted Andrew Burgess
2024-02-08 19:05   ` Tom Tromey
2024-02-08 16:28 ` [PATCH 3/4] gdb/unittests: additional tests for gdb_argv class Andrew Burgess
2024-02-08 16:28 ` [PATCH 4/4] gdbsupport: rewrite gdb_argv to use extract_string_maybe_quoted Andrew Burgess
2024-02-08 19:08   ` Tom Tromey
2024-02-09 16:46     ` Andrew Burgess
2024-02-09 17:22       ` Tom Tromey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).