public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] gdb: add a numeric check after the exponent (PR cli/24124)
@ 2022-09-04  8:36 Enze Li
  2022-09-04  8:42 ` Andreas Schwab
  2022-09-05 13:57 ` [PATCH v2] " Enze Li
  0 siblings, 2 replies; 8+ messages in thread
From: Enze Li @ 2022-09-04  8:36 UTC (permalink / raw)
  To: gdb-patches; +Cc: enze.li

PR cli/24124 points out that `b *804874d` or `b *804874f` gives output
`Invalid number "804874d".` or `Invalid number "804874f".`  And the
output of `b *804874e` is `Breakpoint 1 at 0xc480a`.

That is to say, when "e" or "E" appears after a decimal value, it will
be incorrectly parsed as a floating point number.  Importantly, this
parsing is not consistent with the C language.

The initial idea was to perform a "0x" or "0X" check of the address.
But Tom pointed out that the text after the "*" is an arbitrary
expression, not just an integer.  Therefore, I realized that this idea
was going in the wrong direction.

After digging a bit deeper, I found that the root cause of this problem
is that the lex_one_token function doesn't check for the case where the
exponent has no digits.  If we check this, GDB will not continue parsing
the invalid numbers.

Before this patch applied, things like

  (gdb) b *804874d
  Invalid number "804874d".
  (gdb) b *804874e
  Breakpoint 1 at 0xc480a
  (gdb) print 80d
  Invalid number "80d".
  (gdb) ptype 80d
  Invalid number "80d".
  (gdb) print 80e
  $1 = 80
  (gdb) ptype 80e
  type = double

The new behavior is

  (gdb) b *804874d
  Invalid number "804874d".
  (gdb) b *804874e
  Invalid number "804874e".
  (gdb) print 80d
  Invalid number "80d".
  (gdb) ptype 80d
  Invalid number "80d".
  (gdb) print 80e
  Invalid number "80e".
  (gdb) ptype 80e
  Invalid number "80e".

Tested on x86_64-linux.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=24124
---
 gdb/c-exp.y                         | 3 ++-
 gdb/testsuite/gdb.base/commands.exp | 7 +++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/gdb/c-exp.y b/gdb/c-exp.y
index 61a61fcba09c..b13de967b1b6 100644
--- a/gdb/c-exp.y
+++ b/gdb/c-exp.y
@@ -2769,7 +2769,8 @@ lex_one_token (struct parser_state *par_state, bool *is_quoted_name)
 	    /* This test includes !hex because 'e' is a valid hex digit
 	       and thus does not indicate a floating point number when
 	       the radix is hex.  */
-	    if (!hex && !got_e && !got_p && (*p == 'e' || *p == 'E'))
+	    if (!hex && !got_e && !got_p && (*p == 'e' || *p == 'E')
+	        && p[1] >= '0' && p[1] <= '9')
 	      got_dot = got_e = 1;
 	    else if (!got_e && !got_p && (*p == 'p' || *p == 'P'))
 	      got_dot = got_p = 1;
diff --git a/gdb/testsuite/gdb.base/commands.exp b/gdb/testsuite/gdb.base/commands.exp
index 3eb4463cd1a5..ca30b757e38a 100644
--- a/gdb/testsuite/gdb.base/commands.exp
+++ b/gdb/testsuite/gdb.base/commands.exp
@@ -278,6 +278,13 @@ proc_with_prefix breakpoint_command_test {} {
     gdb_test "continue" \
 	    "Breakpoint \[0-9\]*, factorial.*Now the value is 5"
     gdb_test "print value" " = 5"
+
+    gdb_test "break *804874d" "Invalid number.*" "804874d is an invalid number"
+    gdb_test "break *804874e" "Invalid number.*" "804874e is an invalid number"
+    gdb_test "print 80d" "Invalid number.*" "(print) 80d is an invalid number"
+    gdb_test "ptype 80d" "Invalid number.*" "(ptype) 80d is an invalid number"
+    gdb_test "print 80e" "Invalid number.*" "(print) 80e is an invalid number"
+    gdb_test "ptype 80e" "Invalid number.*" "(ptype) 80e is an invalid number"
 }
 
 # Test clearing the commands of several breakpoints with one single "end".
-- 
2.37.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] gdb: add a numeric check after the exponent (PR cli/24124)
  2022-09-04  8:36 [PATCH] gdb: add a numeric check after the exponent (PR cli/24124) Enze Li
@ 2022-09-04  8:42 ` Andreas Schwab
  2022-09-04 10:01   ` Enze Li
  2022-09-05 13:57 ` [PATCH v2] " Enze Li
  1 sibling, 1 reply; 8+ messages in thread
From: Andreas Schwab @ 2022-09-04  8:42 UTC (permalink / raw)
  To: Enze Li via Gdb-patches; +Cc: Enze Li, enze.li

On Sep 04 2022, Enze Li via Gdb-patches wrote:

> diff --git a/gdb/c-exp.y b/gdb/c-exp.y
> index 61a61fcba09c..b13de967b1b6 100644
> --- a/gdb/c-exp.y
> +++ b/gdb/c-exp.y
> @@ -2769,7 +2769,8 @@ lex_one_token (struct parser_state *par_state, bool *is_quoted_name)
>  	    /* This test includes !hex because 'e' is a valid hex digit
>  	       and thus does not indicate a floating point number when
>  	       the radix is hex.  */
> -	    if (!hex && !got_e && !got_p && (*p == 'e' || *p == 'E'))
> +	    if (!hex && !got_e && !got_p && (*p == 'e' || *p == 'E')
> +	        && p[1] >= '0' && p[1] <= '9')

'e' can be followed by a signed number.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] gdb: add a numeric check after the exponent (PR cli/24124)
  2022-09-04  8:42 ` Andreas Schwab
@ 2022-09-04 10:01   ` Enze Li
  0 siblings, 0 replies; 8+ messages in thread
From: Enze Li @ 2022-09-04 10:01 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Enze Li via Gdb-patches, enze.li

On Sun, Sep 04, 2022 at 10:42:43 AM +0200, Andreas Schwab wrote:

> On Sep 04 2022, Enze Li via Gdb-patches wrote:
>
>> diff --git a/gdb/c-exp.y b/gdb/c-exp.y
>> index 61a61fcba09c..b13de967b1b6 100644
>> --- a/gdb/c-exp.y
>> +++ b/gdb/c-exp.y
>> @@ -2769,7 +2769,8 @@ lex_one_token (struct parser_state *par_state, bool *is_quoted_name)
>>  	    /* This test includes !hex because 'e' is a valid hex digit
>>  	       and thus does not indicate a floating point number when
>>  	       the radix is hex.  */
>> -	    if (!hex && !got_e && !got_p && (*p == 'e' || *p == 'E'))
>> +	    if (!hex && !got_e && !got_p && (*p == 'e' || *p == 'E')
>> +	        && p[1] >= '0' && p[1] <= '9')
>
> 'e' can be followed by a signed number.

Oops, I didn't metion that.  Thank you for reminding me.

I'll soon update this patch in current thread.

Thanks,
Enze

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2] gdb: add a numeric check after the exponent (PR cli/24124)
  2022-09-04  8:36 [PATCH] gdb: add a numeric check after the exponent (PR cli/24124) Enze Li
  2022-09-04  8:42 ` Andreas Schwab
@ 2022-09-05 13:57 ` Enze Li
  2022-09-21 18:12   ` Tom Tromey
  1 sibling, 1 reply; 8+ messages in thread
From: Enze Li @ 2022-09-05 13:57 UTC (permalink / raw)
  To: gdb-patches; +Cc: enze.li, schwab

PR cli/24124 points out that `b *804874d` or `b *804874f` gives output
`Invalid number "804874d".` or `Invalid number "804874f".`  And the
output of `b *804874e` is `Breakpoint 1 at 0xc480a`.

That is to say, when "e" or "E" appears after a decimal value, it will
be incorrectly parsed as a floating point number.  Importantly, this
parsing is not consistent with the C language.

The initial idea was to perform a "0x" or "0X" check of the address.
But Tom pointed out that the text after the "*" is an arbitrary
expression, not just an integer.  Therefore, I realized that this idea
was going in the wrong direction.

After digging a bit deeper, I found that the root cause of this problem
is that the lex_one_token function doesn't check for the case where the
exponent has no digits.  If we check this, GDB will not continue parsing
the invalid numbers.

Before this patch applied, things like

  (gdb) b *804874d
  Invalid number "804874d".
  (gdb) b *804874e
  Breakpoint 1 at 0xc480a
  (gdb) print 80d
  Invalid number "80d".
  (gdb) ptype 80d
  Invalid number "80d".
  (gdb) print 80e
  $1 = 80
  (gdb) ptype 80e
  type = double

The new behavior is

  (gdb) b *804874d
  Invalid number "804874d".
  (gdb) b *804874e
  Invalid number "804874e".
  (gdb) print 80d
  Invalid number "80d".
  (gdb) ptype 80d
  Invalid number "80d".
  (gdb) print 80e
  Invalid number "80e".
  (gdb) ptype 80e
  Invalid number "80e".

New in v2:

- Address Andreas's comments, avoid breaking the resolution of the sign
  of the exponent.
- Add test cases for exponent with sign.

Tested on x86_64-linux.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=24124
---
 gdb/c-exp.y                         | 10 ++++++++++
 gdb/testsuite/gdb.base/commands.exp | 11 +++++++++++
 2 files changed, 21 insertions(+)

diff --git a/gdb/c-exp.y b/gdb/c-exp.y
index 61a61fcba09c..9827de9ad549 100644
--- a/gdb/c-exp.y
+++ b/gdb/c-exp.y
@@ -2752,6 +2752,7 @@ lex_one_token (struct parser_state *par_state, bool *is_quoted_name)
 	int got_dot = 0, got_e = 0, got_p = 0, toktype;
 	const char *p = tokstart;
 	int hex = input_radix > 10;
+	int exp_num = 0;
 
 	if (c == '0' && (p[1] == 'x' || p[1] == 'X'))
 	  {
@@ -2783,6 +2784,9 @@ lex_one_token (struct parser_state *par_state, bool *is_quoted_name)
 	      /* This is the sign of the exponent, not the end of the
 		 number.  */
 	      continue;
+	    /* This is the digit of the exponent.  */
+	    else if (got_e && *p >= '0' && *p <= '9')
+	      exp_num++;
 	    /* We will take any letters or digits.  parse_number will
 	       complain if past the radix, or if L or U are not final.  */
 	    else if ((*p < '0' || *p > '9')
@@ -2790,6 +2794,12 @@ lex_one_token (struct parser_state *par_state, bool *is_quoted_name)
 				  && (*p < 'A' || *p > 'Z')))
 	      break;
 	  }
+
+	/* If the exponent has no digits, it must be invalid.  There is
+	   no need to continue parsing.  */
+	if (got_e && exp_num == 0)
+	  error (_("Invalid number \"%s\"."), tokstart);
+
 	toktype = parse_number (par_state, tokstart, p - tokstart,
 				got_dot | got_e | got_p, &yylval);
 	if (toktype == ERROR)
diff --git a/gdb/testsuite/gdb.base/commands.exp b/gdb/testsuite/gdb.base/commands.exp
index 3eb4463cd1a5..10c4c2722901 100644
--- a/gdb/testsuite/gdb.base/commands.exp
+++ b/gdb/testsuite/gdb.base/commands.exp
@@ -278,6 +278,17 @@ proc_with_prefix breakpoint_command_test {} {
     gdb_test "continue" \
 	    "Breakpoint \[0-9\]*, factorial.*Now the value is 5"
     gdb_test "print value" " = 5"
+
+    gdb_test "break *804874d" "Invalid number.*" "804874d is an invalid number"
+    gdb_test "break *804874e" "Invalid number.*" "804874e is an invalid number"
+    gdb_test "print 80d" "Invalid number.*" "(print) 80d is an invalid number"
+    gdb_test "ptype 80d" "Invalid number.*" "(ptype) 80d is an invalid number"
+    gdb_test "print 80e" "Invalid number.*" "(print) 80e is an invalid number"
+    gdb_test "ptype 80e" "Invalid number.*" "(ptype) 80e is an invalid number"
+    gdb_test "print 80e0" " = 80" "(print) 80e0"
+    gdb_test "print 80e1" " = 800" "(print) 80e1"
+    gdb_test "print 80e+2" " = 8000" "(print) 80e+2"
+    gdb_test "ptype 80e-1" " = double" "(ptype) 80e-1"
 }
 
 # Test clearing the commands of several breakpoints with one single "end".
-- 
2.37.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] gdb: add a numeric check after the exponent (PR cli/24124)
  2022-09-05 13:57 ` [PATCH v2] " Enze Li
@ 2022-09-21 18:12   ` Tom Tromey
  2022-09-22 14:09     ` Enze Li
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Tromey @ 2022-09-21 18:12 UTC (permalink / raw)
  To: Enze Li via Gdb-patches; +Cc: Enze Li, schwab, enze.li

>>>>> Enze Li via Gdb-patches <gdb-patches@sourceware.org> writes:

> PR cli/24124 points out that `b *804874d` or `b *804874f` gives output
> `Invalid number "804874d".` or `Invalid number "804874f".`  And the
> output of `b *804874e` is `Breakpoint 1 at 0xc480a`.

Thank you for the patch.

> That is to say, when "e" or "E" appears after a decimal value, it will
> be incorrectly parsed as a floating point number.  Importantly, this
> parsing is not consistent with the C language.

I suspect the change has to be in c-exp.y:parse_number, in order to
support the case where the input radix is 16.  In this situation,
something like "80e" is a valid number.

The input radix seems like a misfeature, IMNSHO, but I think people do
actually use it.

Tom

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] gdb: add a numeric check after the exponent (PR cli/24124)
  2022-09-21 18:12   ` Tom Tromey
@ 2022-09-22 14:09     ` Enze Li
  2022-09-23 13:47       ` Tom Tromey
  0 siblings, 1 reply; 8+ messages in thread
From: Enze Li @ 2022-09-22 14:09 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Enze Li via Gdb-patches, schwab, enze.li

Hi Tom,

Thanks for your review.  A few questions, please see below.

On Wed, Sep 21 2022 at 12:12:45 PM -0600, Tom Tromey wrote:

>>>>>> Enze Li via Gdb-patches <gdb-patches@sourceware.org> writes:
>
>> PR cli/24124 points out that `b *804874d` or `b *804874f` gives output
>> `Invalid number "804874d".` or `Invalid number "804874f".`  And the
>> output of `b *804874e` is `Breakpoint 1 at 0xc480a`.
>
> Thank you for the patch.
>
>> That is to say, when "e" or "E" appears after a decimal value, it will
>> be incorrectly parsed as a floating point number.  Importantly, this
>> parsing is not consistent with the C language.
>
> I suspect the change has to be in c-exp.y:parse_number, in order to
> support the case where the input radix is 16.  In this situation,
                                            ^^
Sorry, I didn't get that.  Shouldn't this radix be 10?

> something like "80e" is a valid number.

If the radix is 16, is that mean that we should treat the "80" as a hex
number, and "80e" equals "128e0"(radix is 10)?

>
> The input radix seems like a misfeature, IMNSHO, but I think people do
> actually use it.

In addition, do these test cases below meet expectations?
......
+    gdb_test "break *804874d" "Invalid number.*" "804874d is an invalid number"
+    gdb_test "break *804874e" "Invalid number.*" "804874e is an invalid number"
+    gdb_test "print 80d" "Invalid number.*" "(print) 80d is an invalid number"
+    gdb_test "ptype 80d" "Invalid number.*" "(ptype) 80d is an invalid number"
+    gdb_test "print 80e" "Invalid number.*" "(print) 80e is an invalid number"
+    gdb_test "ptype 80e" "Invalid number.*" "(ptype) 80e is an invalid number"
+    gdb_test "print 80e0" " = 80" "(print) 80e0"
+    gdb_test "print 80e1" " = 800" "(print) 80e1"
+    gdb_test "print 80e+2" " = 8000" "(print) 80e+2"
+    gdb_test "ptype 80e-1" " = double" "(ptype) 80e-1"
......

Thanks,
Enze

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] gdb: add a numeric check after the exponent (PR cli/24124)
  2022-09-22 14:09     ` Enze Li
@ 2022-09-23 13:47       ` Tom Tromey
  2022-10-02 12:15         ` Enze Li
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Tromey @ 2022-09-23 13:47 UTC (permalink / raw)
  To: Enze Li via Gdb-patches; +Cc: Tom Tromey, Enze Li, schwab, enze.li

>>>>> "Enze" == Enze Li via Gdb-patches <gdb-patches@sourceware.org> writes:

>> I suspect the change has to be in c-exp.y:parse_number, in order to
>> support the case where the input radix is 16.  In this situation,
Enze>                                             ^^
Enze> Sorry, I didn't get that.  Shouldn't this radix be 10?

No, what I mean is:

(gdb) set input-radix 16
Input radix now set to decimal 16, hex 10, octal 20.
(gdb) print 80e
$1 = 2062

I think with your patch this will issue an error.

Enze> In addition, do these test cases below meet expectations?

They seem fine to me.

Tom

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] gdb: add a numeric check after the exponent (PR cli/24124)
  2022-09-23 13:47       ` Tom Tromey
@ 2022-10-02 12:15         ` Enze Li
  0 siblings, 0 replies; 8+ messages in thread
From: Enze Li @ 2022-10-02 12:15 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Enze Li via Gdb-patches, schwab, enze.li

Hi Tom,

Sorry for the delayed response.

On Fri, Sep 23 2022 at 07:47:51 AM -0600, Tom Tromey wrote:

>>>>>> "Enze" == Enze Li via Gdb-patches <gdb-patches@sourceware.org> writes:
>
>>> I suspect the change has to be in c-exp.y:parse_number, in order to
>>> support the case where the input radix is 16.  In this situation,
> Enze>                                             ^^
> Enze> Sorry, I didn't get that.  Shouldn't this radix be 10?
>
> No, what I mean is:
>
> (gdb) set input-radix 16
> Input radix now set to decimal 16, hex 10, octal 20.
> (gdb) print 80e
> $1 = 2062
>
> I think with your patch this will issue an error.

I got the same output as you metioned above with v2 of the patch
applied.  If I set input-radix to 16, the "input_radix"(line 2754)
variable is 16, and "hex" will be 1.  In this case, GDB assumes that the
"80e" is a hex number.  None of the following if and else-if conditions
are satisfied.  When GDB runs to line 2767, "hex" is still 1, this is vital.

......
2751    /* It's a number.  */
2752    int got_dot = 0, got_e = 0, got_p = 0, toktype;
2753    const char *p = tokstart;
2754    int hex = input_radix > 10;
2755    int exp_num = 0;
2756
2757    if (c == '0' && (p[1] == 'x' || p[1] == 'X'))
2758      {
2759        p += 2;
2760        hex = 1;
2761      }
2762    else if (c == '0' && (p[1]=='t' || p[1]=='T' || p[1]=='d' || p[1]=='D'))
2763      {
2764        p += 2;
2765        hex = 0;
2766      }
2767
......

Now, GDB runs to line 2773, the condition cannot be satisfied because
"hex" is 1, which means that "got_e" cannot be set to 1.  Then GDB runs
to line 2788, the judgment condition contains "got_e", and "got_e" is
now 0, which does not satisfy this condition, so it is not affected for
"80e".

......
2768	for (;; ++p)
2769	  {
2770	    /* This test includes !hex because 'e' is a valid hex digit
2771	       and thus does not indicate a floating point number when
2772	       the radix is hex.  */
2773	    if (!hex && !got_e && !got_p && (*p == 'e' || *p == 'E'))
2774	      got_dot = got_e = 1;
2775	    else if (!got_e && !got_p && (*p == 'p' || *p == 'P'))
2776	      got_dot = got_p = 1;
2777	    /* This test does not include !hex, because a '.' always indicates
2778	       a decimal floating point number regardless of the radix.  */
2779	    else if (!got_dot && *p == '.')
2780	      got_dot = 1;
2781	    else if (((got_e && (p[-1] == 'e' || p[-1] == 'E'))
2782		      || (got_p && (p[-1] == 'p' || p[-1] == 'P')))
2783		     && (*p == '-' || *p == '+'))
2784	      /* This is the sign of the exponent, not the end of the
2785		 number.  */
2786	      continue;
2787	    /* This is the digit of the exponent.  */
2788	    else if (got_e && *p >= '0' && *p <= '9')
2789	      exp_num++;
......

This is my understanding.  Did I miss something?

Best Regards,
Enze

>
> Enze> In addition, do these test cases below meet expectations?
>
> They seem fine to me.
>
> Tom

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-10-02 12:15 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-04  8:36 [PATCH] gdb: add a numeric check after the exponent (PR cli/24124) Enze Li
2022-09-04  8:42 ` Andreas Schwab
2022-09-04 10:01   ` Enze Li
2022-09-05 13:57 ` [PATCH v2] " Enze Li
2022-09-21 18:12   ` Tom Tromey
2022-09-22 14:09     ` Enze Li
2022-09-23 13:47       ` Tom Tromey
2022-10-02 12:15         ` Enze Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).