public inbox for gdb-patches@sourceware.org
* [PATCH] Handle encoding failures in Windows thread names
@ 2022-04-21 14:39 Tom Tromey
  2022-04-21 15:47 ` Eli Zaretskii
  2022-06-02 14:19 ` Jon Turney
  0 siblings, 2 replies; 10+ messages in thread
From: Tom Tromey @ 2022-04-21 14:39 UTC (permalink / raw)
  To: gdb-patches; +Cc: Tom Tromey

Internally at AdaCore, we noticed that the new Windows thread name
code could fail.  First, it might return a zero-length string, but in
gdb conventions it should return nullptr instead.  Second, an encoding
failure could wind up showing replacement characters to the user; this
is confusing and not useful; it's better to recognize such errors and
simply discard the name.  This patch makes both of these changes.
---
 gdb/nat/windows-nat.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/gdb/nat/windows-nat.c b/gdb/nat/windows-nat.c
index bd1b9459145..7a4e804f891 100644
--- a/gdb/nat/windows-nat.c
+++ b/gdb/nat/windows-nat.c
@@ -119,12 +119,19 @@ windows_thread_info::thread_name ()
       HRESULT result = GetThreadDescription (h, &value);
       if (SUCCEEDED (result))
 	{
-	  size_t needed = wcstombs (nullptr, value, 0);
-	  if (needed != (size_t) -1)
+	  int needed = WideCharToMultiByte (CP_ACP, 0, value, -1, nullptr, 0,
+					    nullptr, nullptr);
+	  if (needed != 0)
 	    {
-	      name.reset ((char *) xmalloc (needed));
-	      if (wcstombs (name.get (), value, needed) == (size_t) -1)
-		name.reset ();
+	      BOOL used_default = FALSE;
+	      gdb::unique_xmalloc_ptr<char> new_name
+		((char *) xmalloc (needed));
+	      if (WideCharToMultiByte (CP_ACP, 0, value, -1,
+				       new_name.get (), needed,
+				       nullptr, &used_default) == needed
+		  && !used_default
+		  && strlen (new_name.get ()) > 0)
+		name = std::move (new_name);
 	    }
 	  LocalFree (value);
 	}
-- 
2.34.1



* Re: [PATCH] Handle encoding failures in Windows thread names
  2022-04-21 14:39 [PATCH] Handle encoding failures in Windows thread names Tom Tromey
@ 2022-04-21 15:47 ` Eli Zaretskii
  2022-04-26 18:53   ` Tom Tromey
  2022-06-02 14:19 ` Jon Turney
  1 sibling, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2022-04-21 15:47 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb-patches

> Date: Thu, 21 Apr 2022 08:39:26 -0600
> From: Tom Tromey via Gdb-patches <gdb-patches@sourceware.org>
> Cc: Tom Tromey <tromey@adacore.com>
> 
> Internally at AdaCore, we noticed that the new Windows thread name
> code could fail.  First, it might return a zero-length string, but in
> gdb conventions it should return nullptr instead.  Second, an encoding
> failure could wind up showing replacement characters to the user; this
> is confusing and not useful; it's better to recognize such errors and
> simply discard the name.  This patch makes both of these changes.

I suggest explaining in a comment how this code detects encoding
failures.  The documentation of WideCharToMultiByte is not simple to
understand in this regard, and "used_default" is not expressive enough
to explain its role here, so I suggest not relying on the reader to
know those subtleties.

Thanks.
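
For illustration, the role of the flag can be modeled portably.  The
sketch below is not the Windows API and not the comment that was later
checked in: to_latin1 and thread_name_or_null are hypothetical helpers.
to_latin1 converts wide characters to Latin-1 and, in the way
WideCharToMultiByte is documented to treat a non-null lpUsedDefaultChar
argument, substitutes a default character and sets a flag whenever an
input character has no representation in the target encoding.  The
caller then discards the result if the flag was set or the name is
empty.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical helper, NOT the Windows API: convert WIDE to Latin-1.
// Characters above U+00FF cannot be represented, so each one is
// replaced by '?' and *USED_DEFAULT is set to true.
static std::string
to_latin1 (const std::wstring &wide, bool *used_default)
{
  assert (used_default != nullptr);
  *used_default = false;
  std::string result;
  for (wchar_t ch : wide)
    {
      if (static_cast<unsigned long> (ch) <= 0xFF)
        result.push_back (static_cast<char> (ch));
      else
        {
          result.push_back ('?');
          *used_default = true;
        }
    }
  return result;
}

// Return the converted name, or null if the name is empty or could
// not be converted without substituting a default character.
static std::unique_ptr<std::string>
thread_name_or_null (const std::wstring &wide)
{
  bool used_default = false;
  std::string name = to_latin1 (wide, &used_default);
  if (used_default || name.empty ())
    return nullptr;
  return std::make_unique<std::string> (std::move (name));
}
```

With this model, L"worker" yields "worker", while an empty name or a
name containing a character outside Latin-1 yields null rather than a
string with replacement characters in it.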


* Re: [PATCH] Handle encoding failures in Windows thread names
  2022-04-21 15:47 ` Eli Zaretskii
@ 2022-04-26 18:53   ` Tom Tromey
  0 siblings, 0 replies; 10+ messages in thread
From: Tom Tromey @ 2022-04-26 18:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Tom Tromey, gdb-patches

Eli> I suggest explaining in a comment how this code detects encoding
Eli> failures.  The documentation of WideCharToMultiByte is not simple to
Eli> understand in this regard, and "used_default" is not expressive enough
Eli> to explain its role here, so I suggest not relying on the reader to
Eli> know those subtleties.

I've added a comment and I'm checking this in.

Tom


* Re: [PATCH] Handle encoding failures in Windows thread names
  2022-04-21 14:39 [PATCH] Handle encoding failures in Windows thread names Tom Tromey
  2022-04-21 15:47 ` Eli Zaretskii
@ 2022-06-02 14:19 ` Jon Turney
  2022-06-02 14:33   ` Tom Tromey
  2022-06-02 16:07   ` Eli Zaretskii
  1 sibling, 2 replies; 10+ messages in thread
From: Jon Turney @ 2022-06-02 14:19 UTC (permalink / raw)
  To: Tom Tromey, gdb-patches

On 21/04/2022 15:39, Tom Tromey via Gdb-patches wrote:
> Internally at AdaCore, we noticed that the new Windows thread name
> code could fail.  First, it might return a zero-length string, but in
> gdb conventions it should return nullptr instead.  Second, an encoding
> failure could wind up showing replacement characters to the user; this
> is confusing and not useful; it's better to recognize such errors and
> simply discard the name.  This patch makes both of these changes.
> ---
>   gdb/nat/windows-nat.c | 17 ++++++++++++-----
>   1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/gdb/nat/windows-nat.c b/gdb/nat/windows-nat.c
> index bd1b9459145..7a4e804f891 100644
> --- a/gdb/nat/windows-nat.c
> +++ b/gdb/nat/windows-nat.c
> @@ -119,12 +119,19 @@ windows_thread_info::thread_name ()
>         HRESULT result = GetThreadDescription (h, &value);
>         if (SUCCEEDED (result))
>   	{
> -	  size_t needed = wcstombs (nullptr, value, 0);
> -	  if (needed != (size_t) -1)
> +	  int needed = WideCharToMultiByte (CP_ACP, 0, value, -1, nullptr, 0,
> +					    nullptr, nullptr);
> +	  if (needed != 0)
>   	    {
> -	      name.reset ((char *) xmalloc (needed));
> -	      if (wcstombs (name.get (), value, needed) == (size_t) -1)
> -		name.reset ();
> +	      BOOL used_default = FALSE;
> +	      gdb::unique_xmalloc_ptr<char> new_name
> +		((char *) xmalloc (needed));
> +	      if (WideCharToMultiByte (CP_ACP, 0, value, -1,
> +				       new_name.get (), needed,
> +				       nullptr, &used_default) == needed
> +		  && !used_default
> +		  && strlen (new_name.get ()) > 0)
> +		name = std::move (new_name);
>   	    }
>   	  LocalFree (value);
>   	}

This is probably wrong on Cygwin (as the target encoding should be 
Cygwin's conception of the locale, not the Windows codepage).


* Re: [PATCH] Handle encoding failures in Windows thread names
  2022-06-02 14:19 ` Jon Turney
@ 2022-06-02 14:33   ` Tom Tromey
  2022-06-02 19:29     ` Jon Turney
  2022-06-02 16:07   ` Eli Zaretskii
  1 sibling, 1 reply; 10+ messages in thread
From: Tom Tromey @ 2022-06-02 14:33 UTC (permalink / raw)
  To: Jon Turney; +Cc: Tom Tromey, gdb-patches

>>>>> "Jon" == Jon Turney <jon.turney@dronecode.org.uk> writes:

Jon> This is probably wrong on Cygwin (as the target encoding should be
Jon> Cygwin's conception of the locale, not the Windows codepage).

If there's some way for gdb to know the locale of the inferior, I guess
we could use that here.  I don't have Cygwin and so I can't test it or
anything, but if I knew what to do I could try to write a patch for
someone else to test.

Tom


* Re: [PATCH] Handle encoding failures in Windows thread names
  2022-06-02 14:19 ` Jon Turney
  2022-06-02 14:33   ` Tom Tromey
@ 2022-06-02 16:07   ` Eli Zaretskii
  2022-06-02 19:29     ` Jon Turney
  1 sibling, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2022-06-02 16:07 UTC (permalink / raw)
  To: Jon Turney; +Cc: tromey, gdb-patches

> Date: Thu, 2 Jun 2022 15:19:21 +0100
> From: Jon Turney <jon.turney@dronecode.org.uk>
> 
> > diff --git a/gdb/nat/windows-nat.c b/gdb/nat/windows-nat.c
> > index bd1b9459145..7a4e804f891 100644
> > --- a/gdb/nat/windows-nat.c
> > +++ b/gdb/nat/windows-nat.c
> > @@ -119,12 +119,19 @@ windows_thread_info::thread_name ()
> >         HRESULT result = GetThreadDescription (h, &value);
> >         if (SUCCEEDED (result))
> >   	{
> > -	  size_t needed = wcstombs (nullptr, value, 0);
> > -	  if (needed != (size_t) -1)
> > +	  int needed = WideCharToMultiByte (CP_ACP, 0, value, -1, nullptr, 0,
> > +					    nullptr, nullptr);
> > +	  if (needed != 0)
> >   	    {
> > -	      name.reset ((char *) xmalloc (needed));
> > -	      if (wcstombs (name.get (), value, needed) == (size_t) -1)
> > -		name.reset ();
> > +	      BOOL used_default = FALSE;
> > +	      gdb::unique_xmalloc_ptr<char> new_name
> > +		((char *) xmalloc (needed));
> > +	      if (WideCharToMultiByte (CP_ACP, 0, value, -1,
> > +				       new_name.get (), needed,
> > +				       nullptr, &used_default) == needed
> > +		  && !used_default
> > +		  && strlen (new_name.get ()) > 0)
> > +		name = std::move (new_name);
> >   	    }
> >   	  LocalFree (value);
> >   	}
> 
> This is probably wrong on Cygwin (as the target encoding should be 
> Cygwin's conception of the locale, not the Windows codepage).

What does CP_ACP do on Cygwin?  Is it different from what it does
in native Windows programs?

Anyway, for Cygwin I think we should replace CP_ACP with CP_UTF8,
don't you think?
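
As background for the CP_UTF8 idea: UTF-8 can encode every Unicode
code point, so a UTF-16 to UTF-8 conversion has no unrepresentable
characters and can fail only on malformed input such as an unpaired
surrogate.  The sketch below shows that conversion in portable C++.
utf16_to_utf8 is a hypothetical helper written for illustration; it is
not gdb code and makes no claim about how WideCharToMultiByte
implements CP_UTF8.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical helper: encode UTF-16 text as UTF-8.  Returns null if
// OUT is unusable because the input contains an unpaired surrogate.
static std::unique_ptr<std::string>
utf16_to_utf8 (const std::u16string &in)
{
  auto out = std::make_unique<std::string> ();
  assert (out != nullptr);
  for (std::size_t i = 0; i < in.size (); ++i)
    {
      char32_t cp = in[i];
      if (cp >= 0xD800 && cp <= 0xDBFF)
        {
          // High surrogate: it must be followed by a low surrogate.
          if (i + 1 >= in.size ()
              || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
            return nullptr;
          cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
          ++i;
        }
      else if (cp >= 0xDC00 && cp <= 0xDFFF)
        return nullptr;  // Low surrogate without a preceding high one.

      if (cp < 0x80)
        out->push_back (static_cast<char> (cp));
      else if (cp < 0x800)
        {
          out->push_back (static_cast<char> (0xC0 | (cp >> 6)));
          out->push_back (static_cast<char> (0x80 | (cp & 0x3F)));
        }
      else if (cp < 0x10000)
        {
          out->push_back (static_cast<char> (0xE0 | (cp >> 12)));
          out->push_back (static_cast<char> (0x80 | ((cp >> 6) & 0x3F)));
          out->push_back (static_cast<char> (0x80 | (cp & 0x3F)));
        }
      else
        {
          out->push_back (static_cast<char> (0xF0 | (cp >> 18)));
          out->push_back (static_cast<char> (0x80 | ((cp >> 12) & 0x3F)));
          out->push_back (static_cast<char> (0x80 | ((cp >> 6) & 0x3F)));
          out->push_back (static_cast<char> (0x80 | (cp & 0x3F)));
        }
    }
  return out;
}
```

Whether UTF-8 output is what the user's terminal expects is a separate
question that this conversion does not answer.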


* Re: [PATCH] Handle encoding failures in Windows thread names
  2022-06-02 14:33   ` Tom Tromey
@ 2022-06-02 19:29     ` Jon Turney
  2022-06-03 16:19       ` Tom Tromey
  0 siblings, 1 reply; 10+ messages in thread
From: Jon Turney @ 2022-06-02 19:29 UTC (permalink / raw)
  To: Tom Tromey, gdb-patches

On 02/06/2022 15:33, Tom Tromey wrote:
>>>>>> "Jon" == Jon Turney <jon.turney@dronecode.org.uk> writes:
> 
> Jon> This is probably wrong on Cygwin (as the target encoding should be
> Jon> Cygwin's conception of the locale, not the Windows codepage).

Here I meant "Cygwin's conception of the locale for the gdb process"

> If there's some way for gdb to know the locale of the inferior, I guess
> we could use that here.  I don't have Cygwin and so I can't test it or
> anything, but if I knew what to do I could try to write a patch for
> someone else to test.

I am confused by this, but probably I'm missing something.

GetThreadDescription() only exists as a wide-char API, so the threadname 
it returns is always UTF-16 encoded, irrespective of the inferior's locale.

Probably, the "right" thing to do on cygwin is use wcstombs(), but 
apparently that can't report the encoding failure condition you want to 
detect.
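
For reference, a two-pass wcstombs() conversion can be sketched as
follows.  This is a minimal illustration, not gdb code: wide_to_locale
is a hypothetical helper, and it assumes a C library in which
wcstombs() accepts a null destination to query the required length (as
the code removed by the patch did) and returns (size_t) -1 for a wide
character the current locale cannot represent.  Whether the C library
in use substitutes a replacement character instead is exactly the open
question here, and the sketch does not settle it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <string>
#include <utility>

// Hypothetical helper: convert WIDE to the multibyte encoding of the
// current C locale.  Returns null if the name is empty or wcstombs
// reports a character that the locale cannot represent.
static std::unique_ptr<std::string>
wide_to_locale (const wchar_t *wide)
{
  assert (wide != nullptr);

  // First pass: with a null destination, wcstombs returns the number
  // of bytes needed, not counting the terminating null byte.
  std::size_t needed = std::wcstombs (nullptr, wide, 0);
  if (needed == static_cast<std::size_t> (-1) || needed == 0)
    return nullptr;

  // Second pass: reserve one extra byte for the terminator.
  std::string buf (needed + 1, '\0');
  if (std::wcstombs (&buf[0], wide, needed + 1)
      == static_cast<std::size_t> (-1))
    return nullptr;

  buf.resize (needed);  // Drop the terminator from the std::string.
  return std::make_unique<std::string> (std::move (buf));
}
```

The result depends on the locale in effect when the function is called;
a program that never calls setlocale() runs in the "C" locale.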



* Re: [PATCH] Handle encoding failures in Windows thread names
  2022-06-02 16:07   ` Eli Zaretskii
@ 2022-06-02 19:29     ` Jon Turney
  2022-06-03  5:28       ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Jon Turney @ 2022-06-02 19:29 UTC (permalink / raw)
  To: Eli Zaretskii, gdb-patches

On 02/06/2022 17:07, Eli Zaretskii wrote:
>> Date: Thu, 2 Jun 2022 15:19:21 +0100
>> From: Jon Turney <jon.turney@dronecode.org.uk>
>>
>>> diff --git a/gdb/nat/windows-nat.c b/gdb/nat/windows-nat.c
>>> index bd1b9459145..7a4e804f891 100644
>>> --- a/gdb/nat/windows-nat.c
>>> +++ b/gdb/nat/windows-nat.c
>>> @@ -119,12 +119,19 @@ windows_thread_info::thread_name ()
>>>          HRESULT result = GetThreadDescription (h, &value);
>>>          if (SUCCEEDED (result))
>>>    	{
>>> -	  size_t needed = wcstombs (nullptr, value, 0);
>>> -	  if (needed != (size_t) -1)
>>> +	  int needed = WideCharToMultiByte (CP_ACP, 0, value, -1, nullptr, 0,
>>> +					    nullptr, nullptr);
>>> +	  if (needed != 0)
>>>    	    {
>>> -	      name.reset ((char *) xmalloc (needed));
>>> -	      if (wcstombs (name.get (), value, needed) == (size_t) -1)
>>> -		name.reset ();
>>> +	      BOOL used_default = FALSE;
>>> +	      gdb::unique_xmalloc_ptr<char> new_name
>>> +		((char *) xmalloc (needed));
>>> +	      if (WideCharToMultiByte (CP_ACP, 0, value, -1,
>>> +				       new_name.get (), needed,
>>> +				       nullptr, &used_default) == needed
>>> +		  && !used_default
>>> +		  && strlen (new_name.get ()) > 0)
>>> +		name = std::move (new_name);
>>>    	    }
>>>    	  LocalFree (value);
>>>    	}
>>
>> This is probably wrong on Cygwin (as the target encoding should be
>> Cygwin's conception of the locale, not the Windows codepage).
> 
> What does CP_ACP do on Cygwin?  Is it different from what it does
> in native Windows programs?

It is the same function, doing the same thing.

But I think it's not what's wanted in context:  This string ends up 
being written to stdout, which may be connected to a terminal, which 
would be expecting that to be encoded for the locale specified by LANG 
etc, not the Windows codepage.

> Anyway, for Cygwin I think we should replace CP_ACP with CP_UTF8,
> don't you think?

That would only be right if LANG etc. specified a UTF8 locale?
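
One way to check that condition, sketched under the assumption of a
POSIX environment that provides setlocale() and nl_langinfo(), is to
ask the C library which codeset the environment's locale uses.  The
helper names is_utf8_codeset and environment_locale_is_utf8 are
hypothetical and this is not gdb code.

```cpp
#include <cassert>
#include <cctype>
#include <clocale>
#include <string>
#include <langinfo.h>

// Return true if CODESET names UTF-8, accepting spellings such as
// "UTF-8", "utf8" and "UTF8".
static bool
is_utf8_codeset (const std::string &codeset)
{
  std::string folded;
  for (unsigned char ch : codeset)
    if (ch != '-' && ch != '_')
      folded.push_back (static_cast<char> (std::tolower (ch)));
  return folded == "utf8";
}

// Hypothetical helper: report whether the locale selected by the
// environment (LC_ALL, LC_CTYPE, LANG) uses UTF-8.  Note that this
// changes the process's LC_CTYPE locale as a side effect.
static bool
environment_locale_is_utf8 ()
{
  // A program starts in the "C" locale; "" adopts the environment's.
  if (std::setlocale (LC_CTYPE, "") == nullptr)
    return false;
  const char *codeset = nl_langinfo (CODESET);
  assert (codeset != nullptr);
  return is_utf8_codeset (codeset);
}
```

A caller could use such a check to choose CP_UTF8 only when it returns
true, and fall back to some other conversion otherwise.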


* Re: [PATCH] Handle encoding failures in Windows thread names
  2022-06-02 19:29     ` Jon Turney
@ 2022-06-03  5:28       ` Eli Zaretskii
  0 siblings, 0 replies; 10+ messages in thread
From: Eli Zaretskii @ 2022-06-03  5:28 UTC (permalink / raw)
  To: Jon Turney; +Cc: gdb-patches

> Date: Thu, 2 Jun 2022 20:29:48 +0100
> From: Jon Turney <jon.turney@dronecode.org.uk>
> 
> But I think it's not what's wanted in context:  This string ends up 
> being written to stdout, which may be connected to a terminal, which 
> would be expecting that to be encoded for the locale specified by LANG 
> etc, not the Windows codepage.

So we need to convert the terminal's locale to a Windows codepage?
Or maybe calling GetConsoleOutputCP would do the job?

> > Anyway, for Cygwin I think we should replace CP_ACP with CP_UTF8,
> > don't you think?
> 
> That would only be right if LANG etc. specified a UTF8 locale?

I thought Cygwin nowadays used UTF-8 by default?  But I'm far from
being a Cygwin expert, so ignore me if this makes no sense.


* Re: [PATCH] Handle encoding failures in Windows thread names
  2022-06-02 19:29     ` Jon Turney
@ 2022-06-03 16:19       ` Tom Tromey
  0 siblings, 0 replies; 10+ messages in thread
From: Tom Tromey @ 2022-06-03 16:19 UTC (permalink / raw)
  To: Jon Turney; +Cc: Tom Tromey, gdb-patches

>> If there's some way for gdb to know the locale of the inferior, I guess
>> we could use that here.  I don't have Cygwin and so I can't test it or
>> anything, but if I knew what to do I could try to write a patch for
>> someone else to test.

Jon> I am confused by this, but probably I'm missing something.

No, I was confused.

Jon> Probably, the "right" thing to do on cygwin is use wcstombs(), but
Jon> apparently that can't report the encoding failure condition you want
Jon> to detect.

At some point I switched which API was in use here for precisely this
reason.

Another approach might be to use gdb's charset conversion code.

Tom

