public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] libiberty/buildargv: POSIX behaviour for backslash handling
@ 2023-12-06 16:50 Andrew Burgess
  2024-01-02 11:22 ` Ping: " Andrew Burgess
  2024-02-10 17:25 ` [PATCHv2 0/2] Changes to libiberty buildargv Andrew Burgess
  0 siblings, 2 replies; 10+ messages in thread
From: Andrew Burgess @ 2023-12-06 16:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andrew Burgess

GDB makes use of the libiberty function buildargv for splitting the
inferior (program being debugged) argument string in the case where
the inferior is not being started under a shell.

I have recently been working to improve this area of GDB, and have
tracked done some of the unexpected behaviour to the libiberty
function buildargv, and how it handles backslash escapes.

For reference, I've been mostly reading:

  https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html

The issues that I would like to fix are:

  1. Backslashes within single quotes should not be treated as an
  escape, thus: '\a' should split to \a, retaining the backslash.

  2. Backslashes within double quotes should only act as an escape if
  they are immediately before one of the characters $ (dollar),
  ` (backtick), " (double quote), ` (backslash), or \n (newline).  In
  all other cases a backslash should not be treated as an escape
  character.  Thus: "\a" should split to \a, but "\$" should split to
  $.

  3. A backslash-newline sequence should be treated as a line
  continuation, both the backslash and the newline should be removed.

I've updated libiberty and also added some tests.  All the existing
libiberty tests continue to pass, but I'm not sure if there is more
testing that should be done, buildargv is used within lto-wraper.cc,
so maybe there's some testing folk can suggest that I run?
---
 libiberty/argv.c                      |  8 +++++--
 libiberty/testsuite/test-expandargv.c | 34 +++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/libiberty/argv.c b/libiberty/argv.c
index c2823d3e4ba..6bae4ca2ee9 100644
--- a/libiberty/argv.c
+++ b/libiberty/argv.c
@@ -224,9 +224,13 @@ char **buildargv (const char *input)
 		  if (bsquote)
 		    {
 		      bsquote = 0;
-		      *arg++ = *input;
+		      if (*input != '\n')
+			*arg++ = *input;
 		    }
-		  else if (*input == '\\')
+		  else if (*input == '\\'
+			   && !squote
+			   && (!dquote
+			       || strchr ("$`\"\\\n", *(input + 1)) != NULL))
 		    {
 		      bsquote = 1;
 		    }
diff --git a/libiberty/testsuite/test-expandargv.c b/libiberty/testsuite/test-expandargv.c
index 30f2337ef77..b8dcc6a269a 100644
--- a/libiberty/testsuite/test-expandargv.c
+++ b/libiberty/testsuite/test-expandargv.c
@@ -142,6 +142,40 @@ const char *test_data[] = {
   "b",
   0,
 
+  /* Test 7 - No backslash removal within single quotes.  */
+  "'a\\$VAR' '\\\"'",    /* Test 7 data */
+  ARGV0,
+  "@test-expandargv-7.lst",
+  0,
+  ARGV0,
+  "a\\$VAR",
+  "\\\"",
+  0,
+
+  /* Test 8 - Remove backslash / newline pairs.  */
+  "\"ab\\\ncd\" ef\\\ngh",    /* Test 8 data */
+  ARGV0,
+  "@test-expandargv-8.lst",
+  0,
+  ARGV0,
+  "abcd",
+  "efgh",
+  0,
+
+  /* Test 9 - Backslash within double quotes.  */
+  "\"\\$VAR\" \"\\`\" \"\\\"\" \"\\\\\" \"\\n\" \"\\t\"",    /* Test 9 data */
+  ARGV0,
+  "@test-expandargv-9.lst",
+  0,
+  ARGV0,
+  "$VAR",
+  "`",
+  "\"",
+  "\\",
+  "\\n",
+  "\\t",
+  0,
+
   0 /* Test done marker, don't remove. */
 };
 

base-commit: 458e7c937924bbcef80eb006af0b61420dbfc1c1
-- 
2.25.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Ping: Re: [PATCH] libiberty/buildargv: POSIX behaviour for backslash handling
  2023-12-06 16:50 [PATCH] libiberty/buildargv: POSIX behaviour for backslash handling Andrew Burgess
@ 2024-01-02 11:22 ` Andrew Burgess
  2024-02-10 17:25 ` [PATCHv2 0/2] Changes to libiberty buildargv Andrew Burgess
  1 sibling, 0 replies; 10+ messages in thread
From: Andrew Burgess @ 2024-01-02 11:22 UTC (permalink / raw)
  To: gcc-patches


Ping!

Thanks,
Andrew

Andrew Burgess <aburgess@redhat.com> writes:

> GDB makes use of the libiberty function buildargv for splitting the
> inferior (program being debugged) argument string in the case where
> the inferior is not being started under a shell.
>
> I have recently been working to improve this area of GDB, and have
> tracked done some of the unexpected behaviour to the libiberty
> function buildargv, and how it handles backslash escapes.
>
> For reference, I've been mostly reading:
>
>   https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html
>
> The issues that I would like to fix are:
>
>   1. Backslashes within single quotes should not be treated as an
>   escape, thus: '\a' should split to \a, retaining the backslash.
>
>   2. Backslashes within double quotes should only act as an escape if
>   they are immediately before one of the characters $ (dollar),
>   ` (backtick), " (double quote), ` (backslash), or \n (newline).  In
>   all other cases a backslash should not be treated as an escape
>   character.  Thus: "\a" should split to \a, but "\$" should split to
>   $.
>
>   3. A backslash-newline sequence should be treated as a line
>   continuation, both the backslash and the newline should be removed.
>
> I've updated libiberty and also added some tests.  All the existing
> libiberty tests continue to pass, but I'm not sure if there is more
> testing that should be done, buildargv is used within lto-wraper.cc,
> so maybe there's some testing folk can suggest that I run?
> ---
>  libiberty/argv.c                      |  8 +++++--
>  libiberty/testsuite/test-expandargv.c | 34 +++++++++++++++++++++++++++
>  2 files changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/libiberty/argv.c b/libiberty/argv.c
> index c2823d3e4ba..6bae4ca2ee9 100644
> --- a/libiberty/argv.c
> +++ b/libiberty/argv.c
> @@ -224,9 +224,13 @@ char **buildargv (const char *input)
>  		  if (bsquote)
>  		    {
>  		      bsquote = 0;
> -		      *arg++ = *input;
> +		      if (*input != '\n')
> +			*arg++ = *input;
>  		    }
> -		  else if (*input == '\\')
> +		  else if (*input == '\\'
> +			   && !squote
> +			   && (!dquote
> +			       || strchr ("$`\"\\\n", *(input + 1)) != NULL))
>  		    {
>  		      bsquote = 1;
>  		    }
> diff --git a/libiberty/testsuite/test-expandargv.c b/libiberty/testsuite/test-expandargv.c
> index 30f2337ef77..b8dcc6a269a 100644
> --- a/libiberty/testsuite/test-expandargv.c
> +++ b/libiberty/testsuite/test-expandargv.c
> @@ -142,6 +142,40 @@ const char *test_data[] = {
>    "b",
>    0,
>  
> +  /* Test 7 - No backslash removal within single quotes.  */
> +  "'a\\$VAR' '\\\"'",    /* Test 7 data */
> +  ARGV0,
> +  "@test-expandargv-7.lst",
> +  0,
> +  ARGV0,
> +  "a\\$VAR",
> +  "\\\"",
> +  0,
> +
> +  /* Test 8 - Remove backslash / newline pairs.  */
> +  "\"ab\\\ncd\" ef\\\ngh",    /* Test 8 data */
> +  ARGV0,
> +  "@test-expandargv-8.lst",
> +  0,
> +  ARGV0,
> +  "abcd",
> +  "efgh",
> +  0,
> +
> +  /* Test 9 - Backslash within double quotes.  */
> +  "\"\\$VAR\" \"\\`\" \"\\\"\" \"\\\\\" \"\\n\" \"\\t\"",    /* Test 9 data */
> +  ARGV0,
> +  "@test-expandargv-9.lst",
> +  0,
> +  ARGV0,
> +  "$VAR",
> +  "`",
> +  "\"",
> +  "\\",
> +  "\\n",
> +  "\\t",
> +  0,
> +
>    0 /* Test done marker, don't remove. */
>  };
>  
>
> base-commit: 458e7c937924bbcef80eb006af0b61420dbfc1c1
> -- 
> 2.25.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCHv2 0/2] Changes to libiberty buildargv
  2023-12-06 16:50 [PATCH] libiberty/buildargv: POSIX behaviour for backslash handling Andrew Burgess
  2024-01-02 11:22 ` Ping: " Andrew Burgess
@ 2024-02-10 17:25 ` Andrew Burgess
  2024-02-10 17:26   ` [PATCHv2 1/2] libiberty/buildargv: POSIX behaviour for backslash handling Andrew Burgess
                     ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Andrew Burgess @ 2024-02-10 17:25 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andrew Burgess

I realise that these patches are not going to get merged until GCC is
back in stage 1, but thought I'd post my latest set of changes for the
libiberty buildargv function.

Patch #1 is unchanged from V1.

Patch #2 is new in V2.

Thanks,
Andrew

---

Andrew Burgess (2):
  libiberty/buildargv: POSIX behaviour for backslash handling
  libiberty/buildargv: handle input consisting of only white space

 libiberty/argv.c                      | 104 ++++++++--------
 libiberty/testsuite/test-expandargv.c | 170 ++++++++++++++++++++++----
 2 files changed, 200 insertions(+), 74 deletions(-)


base-commit: cff174fabd6c980c09aee95db1d9d5c22421761f
-- 
2.25.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCHv2 1/2] libiberty/buildargv: POSIX behaviour for backslash handling
  2024-02-10 17:25 ` [PATCHv2 0/2] Changes to libiberty buildargv Andrew Burgess
@ 2024-02-10 17:26   ` Andrew Burgess
  2024-05-26 15:03     ` Jeff Law
  2024-02-10 17:26   ` [PATCHv2 2/2] libiberty/buildargv: handle input consisting of only white space Andrew Burgess
  2024-04-27  9:48   ` [PATCHv2 0/2] Changes to libiberty buildargv Andrew Burgess
  2 siblings, 1 reply; 10+ messages in thread
From: Andrew Burgess @ 2024-02-10 17:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andrew Burgess

GDB makes use of the libiberty function buildargv for splitting the
inferior (program being debugged) argument string in the case where
the inferior is not being started under a shell.

I have recently been working to improve this area of GDB, and have
tracked done some of the unexpected behaviour to the libiberty
function buildargv, and how it handles backslash escapes.

For reference, I've been mostly reading:

  https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html

The issues that I would like to fix are:

  1. Backslashes within single quotes should not be treated as an
  escape, thus: '\a' should split to \a, retaining the backslash.

  2. Backslashes within double quotes should only act as an escape if
  they are immediately before one of the characters $ (dollar),
  ` (backtick), " (double quote), ` (backslash), or \n (newline).  In
  all other cases a backslash should not be treated as an escape
  character.  Thus: "\a" should split to \a, but "\$" should split to
  $.

  3. A backslash-newline sequence should be treated as a line
  continuation, both the backslash and the newline should be removed.

I've updated libiberty and also added some tests.  All the existing
libiberty tests continue to pass, but I'm not sure if there is more
testing that should be done, buildargv is used within lto-wraper.cc,
so maybe there's some testing folk can suggest that I run?
---
 libiberty/argv.c                      |  8 +++++--
 libiberty/testsuite/test-expandargv.c | 34 +++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/libiberty/argv.c b/libiberty/argv.c
index 45f16854603..d9d32e59e72 100644
--- a/libiberty/argv.c
+++ b/libiberty/argv.c
@@ -224,9 +224,13 @@ char **buildargv (const char *input)
 		  if (bsquote)
 		    {
 		      bsquote = 0;
-		      *arg++ = *input;
+		      if (*input != '\n')
+			*arg++ = *input;
 		    }
-		  else if (*input == '\\')
+		  else if (*input == '\\'
+			   && !squote
+			   && (!dquote
+			       || strchr ("$`\"\\\n", *(input + 1)) != NULL))
 		    {
 		      bsquote = 1;
 		    }
diff --git a/libiberty/testsuite/test-expandargv.c b/libiberty/testsuite/test-expandargv.c
index 1e9cb0a0d5a..ea1aeb0eda2 100644
--- a/libiberty/testsuite/test-expandargv.c
+++ b/libiberty/testsuite/test-expandargv.c
@@ -142,6 +142,40 @@ const char *test_data[] = {
   "b",
   0,
 
+  /* Test 7 - No backslash removal within single quotes.  */
+  "'a\\$VAR' '\\\"'",    /* Test 7 data */
+  ARGV0,
+  "@test-expandargv-7.lst",
+  0,
+  ARGV0,
+  "a\\$VAR",
+  "\\\"",
+  0,
+
+  /* Test 8 - Remove backslash / newline pairs.  */
+  "\"ab\\\ncd\" ef\\\ngh",    /* Test 8 data */
+  ARGV0,
+  "@test-expandargv-8.lst",
+  0,
+  ARGV0,
+  "abcd",
+  "efgh",
+  0,
+
+  /* Test 9 - Backslash within double quotes.  */
+  "\"\\$VAR\" \"\\`\" \"\\\"\" \"\\\\\" \"\\n\" \"\\t\"",    /* Test 9 data */
+  ARGV0,
+  "@test-expandargv-9.lst",
+  0,
+  ARGV0,
+  "$VAR",
+  "`",
+  "\"",
+  "\\",
+  "\\n",
+  "\\t",
+  0,
+
   0 /* Test done marker, don't remove. */
 };
 
-- 
2.25.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCHv2 2/2] libiberty/buildargv: handle input consisting of only white space
  2024-02-10 17:25 ` [PATCHv2 0/2] Changes to libiberty buildargv Andrew Burgess
  2024-02-10 17:26   ` [PATCHv2 1/2] libiberty/buildargv: POSIX behaviour for backslash handling Andrew Burgess
@ 2024-02-10 17:26   ` Andrew Burgess
  2024-05-26 15:08     ` Jeff Law
  2024-04-27  9:48   ` [PATCHv2 0/2] Changes to libiberty buildargv Andrew Burgess
  2 siblings, 1 reply; 10+ messages in thread
From: Andrew Burgess @ 2024-02-10 17:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andrew Burgess

GDB makes use of the libiberty function buildargv for splitting the
inferior (program being debugged) argument string in the case where
the inferior is not being started under a shell.

I have recently been working to improve this area of GDB, and noticed
some unexpected behaviour to the libiberty function buildargv, when
the input is a string consisting only of white space.

What I observe is that if the input to buildargv is a string
containing only white space, then buildargv will return an argv list
containing a single empty argument, e.g.:

  char **argv = buildargv (" ");
  assert (*argv[0] == '\0');
  assert (argv[1] == NULL);

We get the same output from buildargv if the input is a single space,
or multiple spaces.  Other white space characters give the same
results.

This doesn't seem right to me, and in fact, there appears to be a work
around for this issue in expandargv where we have this code:

  /* If the file is empty or contains only whitespace, buildargv would
     return a single empty argument.  In this context we want no arguments,
     instead.  */
  if (only_whitespace (buffer))
    {
      file_argv = (char **) xmalloc (sizeof (char *));
      file_argv[0] = NULL;
    }
  else
    /* Parse the string.  */
    file_argv = buildargv (buffer);

I think that the correct behaviour in this situation is to return an
empty argv array, e.g.:

  char **argv = buildargv (" ");
  assert (argv[0] == NULL);

And it turns out that this is a trivial change to buildargv.  The diff
does look big, but this is because I've re-indented a block.  Check
with 'git diff -b' to see the minimal changes.  I've also removed the
work around from expandargv.

When testing this sort of thing I normally write the tests first, and
then fix the code.  In this case test-expandargv.c has sort-of been
used as a mechanism for testing the buildargv function (expandargv
does call buildargv most of the time), however, for this particular
issue the work around in expandargv (mentioned above) masked the
buildargv bug.

I did consider adding a new test-buildargv.c file, however, this would
have basically been a copy & paste of test-expandargv.c (with some
minor changes to call buildargv).  This would be fine now, but feels
like we would eventually end up with one file not being updated as
much as the other, and so test coverage would suffer.

Instead, I have added some explicit buildargv testing to the
test-expandargv.c file, this reuses the test input that is already
defined for expandargv.

Of course, once I removed the work around from expandargv then we now
do always call buildargv from expandargv, and so the bug I'm fixing
would impact both expandargv and buildargv, so maybe the new testing
is redundant?  I tend to think more testing is always better, so I've
left it in for now.
---
 libiberty/argv.c                      | 108 ++++++++++----------
 libiberty/testsuite/test-expandargv.c | 136 ++++++++++++++++++++++----
 2 files changed, 166 insertions(+), 78 deletions(-)

diff --git a/libiberty/argv.c b/libiberty/argv.c
index d9d32e59e72..675336273f3 100644
--- a/libiberty/argv.c
+++ b/libiberty/argv.c
@@ -212,71 +212,74 @@ char **buildargv (const char *input)
 	      argv[argc] = NULL;
 	    }
 	  /* Begin scanning arg */
-	  arg = copybuf;
-	  while (*input != EOS)
+	  if (*input != EOS)
 	    {
-	      if (ISSPACE (*input) && !squote && !dquote && !bsquote)
+	      arg = copybuf;
+	      while (*input != EOS)
 		{
-		  break;
-		}
-	      else
-		{
-		  if (bsquote)
-		    {
-		      bsquote = 0;
-		      if (*input != '\n')
-			*arg++ = *input;
-		    }
-		  else if (*input == '\\'
-			   && !squote
-			   && (!dquote
-			       || strchr ("$`\"\\\n", *(input + 1)) != NULL))
+		  if (ISSPACE (*input) && !squote && !dquote && !bsquote)
 		    {
-		      bsquote = 1;
-		    }
-		  else if (squote)
-		    {
-		      if (*input == '\'')
-			{
-			  squote = 0;
-			}
-		      else
-			{
-			  *arg++ = *input;
-			}
+		      break;
 		    }
-		  else if (dquote)
+		  else
 		    {
-		      if (*input == '"')
+		      if (bsquote)
 			{
-			  dquote = 0;
+			  bsquote = 0;
+			  if (*input != '\n')
+			    *arg++ = *input;
 			}
-		      else
+		      else if (*input == '\\'
+			       && !squote
+			       && (!dquote
+				   || strchr ("$`\"\\\n", *(input + 1)) != NULL))
 			{
-			  *arg++ = *input;
+			  bsquote = 1;
 			}
-		    }
-		  else
-		    {
-		      if (*input == '\'')
+		      else if (squote)
 			{
-			  squote = 1;
+			  if (*input == '\'')
+			    {
+			      squote = 0;
+			    }
+			  else
+			    {
+			      *arg++ = *input;
+			    }
 			}
-		      else if (*input == '"')
+		      else if (dquote)
 			{
-			  dquote = 1;
+			  if (*input == '"')
+			    {
+			      dquote = 0;
+			    }
+			  else
+			    {
+			      *arg++ = *input;
+			    }
 			}
 		      else
 			{
-			  *arg++ = *input;
+			  if (*input == '\'')
+			    {
+			      squote = 1;
+			    }
+			  else if (*input == '"')
+			    {
+			      dquote = 1;
+			    }
+			  else
+			    {
+			      *arg++ = *input;
+			    }
 			}
+		      input++;
 		    }
-		  input++;
 		}
+	      *arg = EOS;
+	      argv[argc] = xstrdup (copybuf);
+	      argc++;
 	    }
-	  *arg = EOS;
-	  argv[argc] = xstrdup (copybuf);
-	  argc++;
 	  argv[argc] = NULL;
 
 	  consume_whitespace (&input);
@@ -439,17 +442,8 @@ expandargv (int *argcp, char ***argvp)
 	}
       /* Add a NUL terminator.  */
       buffer[len] = '\0';
-      /* If the file is empty or contains only whitespace, buildargv would
-	 return a single empty argument.  In this context we want no arguments,
-	 instead.  */
-      if (only_whitespace (buffer))
-	{
-	  file_argv = (char **) xmalloc (sizeof (char *));
-	  file_argv[0] = NULL;
-	}
-      else
-	/* Parse the string.  */
-	file_argv = buildargv (buffer);
+      /* Parse the string.  */
+      file_argv = buildargv (buffer);
       /* If *ARGVP is not already dynamically allocated, copy it.  */
       if (*argvp == original_argv)
 	*argvp = dupargv (*argvp);
diff --git a/libiberty/testsuite/test-expandargv.c b/libiberty/testsuite/test-expandargv.c
index ea1aeb0eda2..ca7031eaf68 100644
--- a/libiberty/testsuite/test-expandargv.c
+++ b/libiberty/testsuite/test-expandargv.c
@@ -176,6 +176,30 @@ const char *test_data[] = {
   "\\t",
   0,
 
+  /* Test 10 - Mixed white space characters.  */
+  "\t \n \t ",		/* Test 10 data */
+  ARGV0,
+  "@test-expandargv-10.lst",
+  0,
+  ARGV0,
+  0,
+
+  /* Test 11 - Single ' ' character.  */
+  " ",		/* Test 11 data */
+  ARGV0,
+  "@test-expandargv-11.lst",
+  0,
+  ARGV0,
+  0,
+
+  /* Test 12 - Multiple ' ' characters.  */
+  "   ",		/* Test 12 data */
+  ARGV0,
+  "@test-expandargv-12.lst",
+  0,
+  ARGV0,
+  0,
+
   0 /* Test done marker, don't remove. */
 };
 
@@ -265,6 +289,78 @@ erase_test (int test)
     fatal_error (__LINE__, "Failed to erase test file.", errno);
 }
 
+/* compare_argv:
+     TEST is the current test number, and NAME is a short string to identify
+     which libibery function is being tested.  ARGC_A and ARGV_A describe an
+     argument array, and this is compared to ARGC_B and ARGV_B, return 0 if
+     the two arrays match, otherwise return 1.  */
+
+static int
+compare_argv (int test, const char *name, int argc_a, char *argv_a[],
+	      int argc_b, char *argv_b[])
+{
+  int failed = 0, k;
+
+  if (argc_a != argc_b)
+    {
+      printf ("FAIL: test-%s-%d.  Argument count didn't match\n", name, test);
+      failed = 1;
+    }
+  /* Compare each of the argv's ... */
+  else
+    for (k = 0; k < argc_a; k++)
+      if (strcmp (argv_a[k], argv_b[k]) != 0)
+	{
+	  printf ("FAIL: test-%s-%d. Arguments don't match.\n", name, test);
+	  failed = 1;
+	  break;
+	}
+
+  if (!failed)
+    printf ("PASS: test-%s-%d.\n", name, test);
+
+  return failed;
+}
+
+/* test_buildargv
+     Test the buildargv function from libiberty.  TEST is the current test
+     number and TEST_INPUT is the string to pass to buildargv (after calling
+     run_replaces on it).  ARGC_AFTER and ARGV_AFTER are the expected
+     results.  Return 0 if the test passes, otherwise return 1.  */
+
+static int
+test_buildargv (int test, const char * test_input, int argc_after,
+		char *argv_after[])
+{
+  char * input, ** argv;
+  size_t len;
+  int argc, failed;
+
+  /* Generate RW copy of data for replaces */
+  len = strlen (test_input);
+  input = malloc (sizeof (char) * (len + 1));
+  if (input == NULL)
+    fatal_error (__LINE__, "Failed to malloc buildargv input buffer.", errno);
+
+  memcpy (input, test_input, sizeof (char) * (len + 1));
+  /* Run all possible replaces */
+  run_replaces (input);
+
+  /* Split INPUT into separate arguments.  */
+  argv = buildargv (input);
+
+  /* Count the arguments we got back.  */
+  argc = 0;
+  while (argv[argc])
+    ++argc;
+
+  failed = compare_argv (test, "buildargv", argc_after, argv_after, argc, argv);
+
+  free (input);
+  freeargv (argv);
+
+  return failed;
+}
 
 /* run_tests:
     Run expandargv
@@ -276,12 +372,16 @@ run_tests (const char **test_data)
 {
   int argc_after, argc_before;
   char ** argv_before, ** argv_after;
-  int i, j, k, fails, failed;
+  int i, j, k, fails;
+  const char * input_str;
 
   i = j = fails = 0;
   /* Loop over all the tests */
   while (test_data[j])
     {
+      /* Save original input in case we run a buildargv test.  */
+      input_str = test_data[j];
+
       /* Write test data */
       writeout_test (i, test_data[j++]);
       /* Copy argv before */
@@ -305,29 +405,23 @@ run_tests (const char **test_data)
       for (k = 0; k < argc_after; k++)
         run_replaces (argv_after[k]);
 
+      /* If the test input is just a file to expand then we can also test
+	 calling buildargv directly as the expected output is equivalent to
+	 calling buildargv on the contents of the file.
+
+	 The results of calling buildargv will not include the ARGV0 constant,
+	 which is why we pass 'argc_after - 1' and 'argv_after + 1', this skips
+	 over the ARGV0 in the expected results.  */
+      if (argc_before == 2)
+	fails += test_buildargv (i, input_str, argc_after - 1, argv_after + 1);
+      else
+	printf ("SKIP: test-buildargv-%d.  This test isn't for buildargv\n", i);
+
       /* Run test: Expand arguments */
       expandargv (&argc_before, &argv_before);
 
-      failed = 0;
-      /* Compare size first */
-      if (argc_before != argc_after)
-        {
-          printf ("FAIL: test-expandargv-%d. Number of arguments don't match.\n", i);
-	  failed++;
-        }
-      /* Compare each of the argv's ... */
-      else
-        for (k = 0; k < argc_after; k++)
-          if (strcmp (argv_before[k], argv_after[k]) != 0)
-            {
-              printf ("FAIL: test-expandargv-%d. Arguments don't match.\n", i);
-              failed++;
-            }
-
-      if (!failed)
-        printf ("PASS: test-expandargv-%d.\n", i);
-      else
-        fails++;
+      fails += compare_argv (i, "expandargv", argc_before, argv_before,
+			     argc_after, argv_after);
 
       freeargv (argv_before);
       freeargv (argv_after);
-- 
2.25.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2 0/2] Changes to libiberty buildargv
  2024-02-10 17:25 ` [PATCHv2 0/2] Changes to libiberty buildargv Andrew Burgess
  2024-02-10 17:26   ` [PATCHv2 1/2] libiberty/buildargv: POSIX behaviour for backslash handling Andrew Burgess
  2024-02-10 17:26   ` [PATCHv2 2/2] libiberty/buildargv: handle input consisting of only white space Andrew Burgess
@ 2024-04-27  9:48   ` Andrew Burgess
  2 siblings, 0 replies; 10+ messages in thread
From: Andrew Burgess @ 2024-04-27  9:48 UTC (permalink / raw)
  To: gcc-patches


Ping!

Any thoughts on these patches?

Thanks,
Andrew


Andrew Burgess <aburgess@redhat.com> writes:

> I realise that these patches are not going to get merged until GCC is
> back in stage 1, but thought I'd post my latest set of changes for the
> libiberty buildargv function.
>
> Patch #1 is unchanged from V1.
>
> Patch #2 is new in V2.
>
> Thanks,
> Andrew
>
> ---
>
> Andrew Burgess (2):
>   libiberty/buildargv: POSIX behaviour for backslash handling
>   libiberty/buildargv: handle input consisting of only white space
>
>  libiberty/argv.c                      | 104 ++++++++--------
>  libiberty/testsuite/test-expandargv.c | 170 ++++++++++++++++++++++----
>  2 files changed, 200 insertions(+), 74 deletions(-)
>
>
> base-commit: cff174fabd6c980c09aee95db1d9d5c22421761f
> -- 
> 2.25.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2 1/2] libiberty/buildargv: POSIX behaviour for backslash handling
  2024-02-10 17:26   ` [PATCHv2 1/2] libiberty/buildargv: POSIX behaviour for backslash handling Andrew Burgess
@ 2024-05-26 15:03     ` Jeff Law
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Law @ 2024-05-26 15:03 UTC (permalink / raw)
  To: Andrew Burgess, gcc-patches



On 2/10/24 10:26 AM, Andrew Burgess wrote:
> GDB makes use of the libiberty function buildargv for splitting the
> inferior (program being debugged) argument string in the case where
> the inferior is not being started under a shell.
> 
> I have recently been working to improve this area of GDB, and have
> tracked done some of the unexpected behaviour to the libiberty
> function buildargv, and how it handles backslash escapes.
> 
> For reference, I've been mostly reading:
> 
>    https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html
> 
> The issues that I would like to fix are:
> 
>    1. Backslashes within single quotes should not be treated as an
>    escape, thus: '\a' should split to \a, retaining the backslash.
> 
>    2. Backslashes within double quotes should only act as an escape if
>    they are immediately before one of the characters $ (dollar),
>    ` (backtick), " (double quote), ` (backslash), or \n (newline).  In
>    all other cases a backslash should not be treated as an escape
>    character.  Thus: "\a" should split to \a, but "\$" should split to
>    $.
> 
>    3. A backslash-newline sequence should be treated as a line
>    continuation, both the backslash and the newline should be removed.
> 
> I've updated libiberty and also added some tests.  All the existing
> libiberty tests continue to pass, but I'm not sure if there is more
> testing that should be done, buildargv is used within lto-wraper.cc,
> so maybe there's some testing folk can suggest that I run?
I'm not immediately aware of other tests that we should be running other 
than the usual bootstrap & regression test.

You do need a ChangeLog entry or the pre-commit hooks are going to 
complain loudly.

OK for the trunk with the usual testing.  I know you're typically on the 
GDB side of things, so if you need help on the testing GCC, let us know.

jeff



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2 2/2] libiberty/buildargv: handle input consisting of only white space
  2024-02-10 17:26   ` [PATCHv2 2/2] libiberty/buildargv: handle input consisting of only white space Andrew Burgess
@ 2024-05-26 15:08     ` Jeff Law
  2024-06-11 10:39       ` Andrew Burgess
  2024-06-11 10:41       ` Andrew Burgess
  0 siblings, 2 replies; 10+ messages in thread
From: Jeff Law @ 2024-05-26 15:08 UTC (permalink / raw)
  To: Andrew Burgess, gcc-patches



On 2/10/24 10:26 AM, Andrew Burgess wrote:
> GDB makes use of the libiberty function buildargv for splitting the
> inferior (program being debugged) argument string in the case where
> the inferior is not being started under a shell.
> 
> I have recently been working to improve this area of GDB, and noticed
> some unexpected behaviour to the libiberty function buildargv, when
> the input is a string consisting only of white space.
> 
> What I observe is that if the input to buildargv is a string
> containing only white space, then buildargv will return an argv list
> containing a single empty argument, e.g.:
> 
>    char **argv = buildargv (" ");
>    assert (*argv[0] == '\0');
>    assert (argv[1] == NULL);
> 
> We get the same output from buildargv if the input is a single space,
> or multiple spaces.  Other white space characters give the same
> results.
> 
> This doesn't seem right to me, and in fact, there appears to be a work
> around for this issue in expandargv where we have this code:
> 
>    /* If the file is empty or contains only whitespace, buildargv would
>       return a single empty argument.  In this context we want no arguments,
>       instead.  */
>    if (only_whitespace (buffer))
>      {
>        file_argv = (char **) xmalloc (sizeof (char *));
>        file_argv[0] = NULL;
>      }
>    else
>      /* Parse the string.  */
>      file_argv = buildargv (buffer);
> 
> I think that the correct behaviour in this situation is to return an
> empty argv array, e.g.:
> 
>    char **argv = buildargv (" ");
>    assert (argv[0] == NULL);
> 
> And it turns out that this is a trivial change to buildargv.  The diff
> does look big, but this is because I've re-indented a block.  Check
> with 'git diff -b' to see the minimal changes.  I've also removed the
> work around from expandargv.
> 
> When testing this sort of thing I normally write the tests first, and
> then fix the code.  In this case test-expandargv.c has sort-of been
> used as a mechanism for testing the buildargv function (expandargv
> does call buildargv most of the time), however, for this particular
> issue the work around in expandargv (mentioned above) masked the
> buildargv bug.
> 
> I did consider adding a new test-buildargv.c file, however, this would
> have basically been a copy & paste of test-expandargv.c (with some
> minor changes to call buildargv).  This would be fine now, but feels
> like we would eventually end up with one file not being updated as
> much as the other, and so test coverage would suffer.
> 
> Instead, I have added some explicit buildargv testing to the
> test-expandargv.c file, this reuses the test input that is already
> defined for expandargv.
> 
> Of course, once I removed the work around from expandargv then we now
> do always call buildargv from expandargv, and so the bug I'm fixing
> would impact both expandargv and buildargv, so maybe the new testing
> is redundant?  I tend to think more testing is always better, so I've
> left it in for now.
So just an FYI.  Sometimes folks include the -b diffs as well for these 
scenarios.  THe problem with doing so (as I recently stumbled over 
myself) is the bots which monitor the list and apply patches get quite 
confused by that practice.  Anyway, just something to be aware of.

As for testing, I tend to agree, more is better unless we're highly 
confident its redundant.  So I'll go with your judgment on 
redundant-ness of the test.

As with the prior patch, you'll need to run it through the usual 
bootstrap/regression cycle and cobble together a ChangeLog.

OK once those things are taken care of.

jeff



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2 2/2] libiberty/buildargv: handle input consisting of only white space
  2024-05-26 15:08     ` Jeff Law
@ 2024-06-11 10:39       ` Andrew Burgess
  2024-06-11 10:41       ` Andrew Burgess
  1 sibling, 0 replies; 10+ messages in thread
From: Andrew Burgess @ 2024-06-11 10:39 UTC (permalink / raw)
  To: Jeff Law, gcc-patches

Jeff Law <jeffreyalaw@gmail.com> writes:

> On 2/10/24 10:26 AM, Andrew Burgess wrote:
>> GDB makes use of the libiberty function buildargv for splitting the
>> inferior (program being debugged) argument string in the case where
>> the inferior is not being started under a shell.
>> 
>> I have recently been working to improve this area of GDB, and noticed
>> some unexpected behaviour to the libiberty function buildargv, when
>> the input is a string consisting only of white space.
>> 
>> What I observe is that if the input to buildargv is a string
>> containing only white space, then buildargv will return an argv list
>> containing a single empty argument, e.g.:
>> 
>>    char **argv = buildargv (" ");
>>    assert (*argv[0] == '\0');
>>    assert (argv[1] == NULL);
>> 
>> We get the same output from buildargv if the input is a single space,
>> or multiple spaces.  Other white space characters give the same
>> results.
>> 
>> This doesn't seem right to me, and in fact, there appears to be a work
>> around for this issue in expandargv where we have this code:
>> 
>>    /* If the file is empty or contains only whitespace, buildargv would
>>       return a single empty argument.  In this context we want no arguments,
>>       instead.  */
>>    if (only_whitespace (buffer))
>>      {
>>        file_argv = (char **) xmalloc (sizeof (char *));
>>        file_argv[0] = NULL;
>>      }
>>    else
>>      /* Parse the string.  */
>>      file_argv = buildargv (buffer);
>> 
>> I think that the correct behaviour in this situation is to return an
>> empty argv array, e.g.:
>> 
>>    char **argv = buildargv (" ");
>>    assert (argv[0] == NULL);
>> 
>> And it turns out that this is a trivial change to buildargv.  The diff
>> does look big, but this is because I've re-indented a block.  Check
>> with 'git diff -b' to see the minimal changes.  I've also removed the
>> work around from expandargv.
>> 
>> When testing this sort of thing I normally write the tests first, and
>> then fix the code.  In this case test-expandargv.c has sort-of been
>> used as a mechanism for testing the buildargv function (expandargv
>> does call buildargv most of the time), however, for this particular
>> issue the work around in expandargv (mentioned above) masked the
>> buildargv bug.
>> 
>> I did consider adding a new test-buildargv.c file, however, this would
>> have basically been a copy & paste of test-expandargv.c (with some
>> minor changes to call buildargv).  This would be fine now, but feels
>> like we would eventually end up with one file not being updated as
>> much as the other, and so test coverage would suffer.
>> 
>> Instead, I have added some explicit buildargv testing to the
>> test-expandargv.c file, this reuses the test input that is already
>> defined for expandargv.
>> 
>> Of course, once I removed the work around from expandargv then we now
>> do always call buildargv from expandargv, and so the bug I'm fixing
>> would impact both expandargv and buildargv, so maybe the new testing
>> is redundant?  I tend to think more testing is always better, so I've
>> left it in for now.
> So just an FYI.  Sometimes folks include the -b diffs as well for these 
> scenarios.  THe problem with doing so (as I recently stumbled over 
> myself) is the bots which monitor the list and apply patches get quite 
> confused by that practice.  Anyway, just something to be aware of.
>
> As for testing, I tend to agree, more is better unless we're highly 
> confident its redundant.  So I'll go with your judgment on 
> redundant-ness of the test.
>
> As with the prior patch, you'll need to run it through the usual 
> bootstrap/regression cycle and cobble together a ChangeLog.
>
> OK once those things are taken care of.

Jeff,

Thanks for looking these patches over.

For testing, using current(ish) gcc HEAD, on x86-64 GNU/Linux, I:

  ../src/configure --prefix=$(cd .. && pwd)/install
  make
  make check

I did this with / without my patch and then:

  find . -name "*.sum"
  ... compare all .sum files ...

There was no change in any of the .sum files.

  1. Am I correct that this will have run the bootstrap test by default?

  2. Is there any other testing I should be doing?

  3. If not, am I OK to apply both patches in this series?

Thanks,
Andrew



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2 2/2] libiberty/buildargv: handle input consisting of only white space
  2024-05-26 15:08     ` Jeff Law
  2024-06-11 10:39       ` Andrew Burgess
@ 2024-06-11 10:41       ` Andrew Burgess
  1 sibling, 0 replies; 10+ messages in thread
From: Andrew Burgess @ 2024-06-11 10:41 UTC (permalink / raw)
  To: Jeff Law, gcc-patches

Jeff Law <jeffreyalaw@gmail.com> writes:

> On 2/10/24 10:26 AM, Andrew Burgess wrote:
>> GDB makes use of the libiberty function buildargv for splitting the
>> inferior (program being debugged) argument string in the case where
>> the inferior is not being started under a shell.
>> 
>> I have recently been working to improve this area of GDB, and noticed
>> some unexpected behaviour to the libiberty function buildargv, when
>> the input is a string consisting only of white space.
>> 
>> What I observe is that if the input to buildargv is a string
>> containing only white space, then buildargv will return an argv list
>> containing a single empty argument, e.g.:
>> 
>>    char **argv = buildargv (" ");
>>    assert (*argv[0] == '\0');
>>    assert (argv[1] == NULL);
>> 
>> We get the same output from buildargv if the input is a single space,
>> or multiple spaces.  Other white space characters give the same
>> results.
>> 
>> This doesn't seem right to me, and in fact, there appears to be a work
>> around for this issue in expandargv where we have this code:
>> 
>>    /* If the file is empty or contains only whitespace, buildargv would
>>       return a single empty argument.  In this context we want no arguments,
>>       instead.  */
>>    if (only_whitespace (buffer))
>>      {
>>        file_argv = (char **) xmalloc (sizeof (char *));
>>        file_argv[0] = NULL;
>>      }
>>    else
>>      /* Parse the string.  */
>>      file_argv = buildargv (buffer);
>> 
>> I think that the correct behaviour in this situation is to return an
>> empty argv array, e.g.:
>> 
>>    char **argv = buildargv (" ");
>>    assert (argv[0] == NULL);
>> 
>> And it turns out that this is a trivial change to buildargv.  The diff
>> does look big, but this is because I've re-indented a block.  Check
>> with 'git diff -b' to see the minimal changes.  I've also removed the
>> work around from expandargv.
>> 
>> When testing this sort of thing I normally write the tests first, and
>> then fix the code.  In this case test-expandargv.c has sort-of been
>> used as a mechanism for testing the buildargv function (expandargv
>> does call buildargv most of the time), however, for this particular
>> issue the work around in expandargv (mentioned above) masked the
>> buildargv bug.
>> 
>> I did consider adding a new test-buildargv.c file, however, this would
>> have basically been a copy & paste of test-expandargv.c (with some
>> minor changes to call buildargv).  This would be fine now, but feels
>> like we would eventually end up with one file not being updated as
>> much as the other, and so test coverage would suffer.
>> 
>> Instead, I have added some explicit buildargv testing to the
>> test-expandargv.c file, this reuses the test input that is already
>> defined for expandargv.
>> 
>> Of course, once I removed the work around from expandargv then we now
>> do always call buildargv from expandargv, and so the bug I'm fixing
>> would impact both expandargv and buildargv, so maybe the new testing
>> is redundant?  I tend to think more testing is always better, so I've
>> left it in for now.
> So just an FYI.  Sometimes folks include the -b diffs as well for these 
> scenarios.  THe problem with doing so (as I recently stumbled over 
> myself) is the bots which monitor the list and apply patches get quite 
> confused by that practice.  Anyway, just something to be aware of.

I meant to say: I used to do this (diff -b), but I stopped as it was
causing more problems.  Not just bots trying to apply patches, but
reviewers who want to download and apply patches to test.

I'm happy to send the reduced diff if it helps though.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-06-11 10:42 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-06 16:50 [PATCH] libiberty/buildargv: POSIX behaviour for backslash handling Andrew Burgess
2024-01-02 11:22 ` Ping: " Andrew Burgess
2024-02-10 17:25 ` [PATCHv2 0/2] Changes to libiberty buildargv Andrew Burgess
2024-02-10 17:26   ` [PATCHv2 1/2] libiberty/buildargv: POSIX behaviour for backslash handling Andrew Burgess
2024-05-26 15:03     ` Jeff Law
2024-02-10 17:26   ` [PATCHv2 2/2] libiberty/buildargv: handle input consisting of only white space Andrew Burgess
2024-05-26 15:08     ` Jeff Law
2024-06-11 10:39       ` Andrew Burgess
2024-06-11 10:41       ` Andrew Burgess
2024-04-27  9:48   ` [PATCHv2 0/2] Changes to libiberty buildargv Andrew Burgess

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).