* [PATCH] Handle encoding failures in Windows thread names
@ 2022-04-21 14:39 Tom Tromey
2022-04-21 15:47 ` Eli Zaretskii
2022-06-02 14:19 ` Jon Turney
0 siblings, 2 replies; 10+ messages in thread
From: Tom Tromey @ 2022-04-21 14:39 UTC
To: gdb-patches; +Cc: Tom Tromey
Internally at AdaCore, we noticed that the new Windows thread name
code could fail. First, it might return a zero-length string, but in
gdb conventions it should return nullptr instead. Second, an encoding
failure could wind up showing replacement characters to the user; this
is confusing and not useful; it's better to recognize such errors and
simply discard the name. This patch makes both of these changes.
---
gdb/nat/windows-nat.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/gdb/nat/windows-nat.c b/gdb/nat/windows-nat.c
index bd1b9459145..7a4e804f891 100644
--- a/gdb/nat/windows-nat.c
+++ b/gdb/nat/windows-nat.c
@@ -119,12 +119,19 @@ windows_thread_info::thread_name ()
       HRESULT result = GetThreadDescription (h, &value);
       if (SUCCEEDED (result))
         {
-          size_t needed = wcstombs (nullptr, value, 0);
-          if (needed != (size_t) -1)
+          int needed = WideCharToMultiByte (CP_ACP, 0, value, -1, nullptr, 0,
+                                            nullptr, nullptr);
+          if (needed != 0)
             {
-              name.reset ((char *) xmalloc (needed));
-              if (wcstombs (name.get (), value, needed) == (size_t) -1)
-                name.reset ();
+              BOOL used_default = FALSE;
+              gdb::unique_xmalloc_ptr<char> new_name
+                ((char *) xmalloc (needed));
+              if (WideCharToMultiByte (CP_ACP, 0, value, -1,
+                                       new_name.get (), needed,
+                                       nullptr, &used_default) == needed
+                  && !used_default
+                  && strlen (new_name.get ()) > 0)
+                name = std::move (new_name);
             }
           LocalFree (value);
         }
--
2.34.1
* Re: [PATCH] Handle encoding failures in Windows thread names
2022-04-21 14:39 [PATCH] Handle encoding failures in Windows thread names Tom Tromey
@ 2022-04-21 15:47 ` Eli Zaretskii
2022-04-26 18:53 ` Tom Tromey
2022-06-02 14:19 ` Jon Turney
1 sibling, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2022-04-21 15:47 UTC
To: Tom Tromey; +Cc: gdb-patches
> Date: Thu, 21 Apr 2022 08:39:26 -0600
> From: Tom Tromey via Gdb-patches <gdb-patches@sourceware.org>
> Cc: Tom Tromey <tromey@adacore.com>
>
> Internally at AdaCore, we noticed that the new Windows thread name
> code could fail. First, it might return a zero-length string, but in
> gdb conventions it should return nullptr instead. Second, an encoding
> failure could wind up showing replacement characters to the user; this
> is confusing and not useful; it's better to recognize such errors and
> simply discard the name. This patch makes both of these changes.
I suggest explaining in a comment how this code detects encoding
failures. The documentation of WideCharToMultiByte is not simple to
understand in this regard, and "used_default" is not expressive enough
to explain its role here, so I suggest not relying on the reader to
know those subtleties.
Thanks.
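For readers following along, here is a minimal sketch of the detection
being discussed. It is not the code that was checked in: the helper name
narrow_thread_name and the use of std::optional/std::string are
assumptions made for illustration. The idea is that the last argument of
WideCharToMultiByte (lpUsedDefaultChar) is set to TRUE whenever some
character could not be represented in the target code page and the
default character was substituted, so a TRUE value can be treated as an
encoding failure.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Windows-only sketch; the guard just keeps the file inert elsewhere.
#ifdef _WIN32
#include <windows.h>

// Hypothetical helper (not gdb's actual code): convert a NUL-terminated
// UTF-16 name to the ANSI code page, or return std::nullopt if the name
// is empty or cannot be represented exactly.
std::optional<std::string>
narrow_thread_name (const wchar_t *wide)
{
  assert (wide != nullptr);

  // First call: an output size of 0 only asks for the required size in
  // bytes.  Because the input length is -1, that size includes the
  // terminating NUL.
  int needed = WideCharToMultiByte (CP_ACP, 0, wide, -1, nullptr, 0,
                                    nullptr, nullptr);
  if (needed == 0)
    return std::nullopt;

  std::string out (needed, '\0');
  BOOL used_default = FALSE;
  int written = WideCharToMultiByte (CP_ACP, 0, wide, -1, &out[0], needed,
                                     nullptr, &used_default);

  // used_default is TRUE if a character had no representation in the
  // code page and the default character was substituted for it.
  if (written != needed || used_default)
    return std::nullopt;

  out.resize (needed - 1);      // Drop the terminating NUL.
  if (out.empty ())
    return std::nullopt;        // Zero-length names count as "no name".
  return out;
}
#endif
```

One caveat if the code page is ever changed: the Windows documentation
requires lpUsedDefaultChar to be NULL for CP_UTF7 and CP_UTF8, so this
particular check only applies to ANSI-style code pages such as CP_ACP.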
* Re: [PATCH] Handle encoding failures in Windows thread names
2022-04-21 15:47 ` Eli Zaretskii
@ 2022-04-26 18:53 ` Tom Tromey
0 siblings, 0 replies; 10+ messages in thread
From: Tom Tromey @ 2022-04-26 18:53 UTC
To: Eli Zaretskii; +Cc: Tom Tromey, gdb-patches
Eli> I suggest explaining in a comment how this code detects encoding
Eli> failures. The documentation of WideCharToMultiByte is not simple to
Eli> understand in this regard, and "used_default" is not expressive enough
Eli> to explain its role here, so I suggest not relying on the reader to
Eli> know those subtleties.
I've added a comment and I'm checking this in.
Tom
* Re: [PATCH] Handle encoding failures in Windows thread names
2022-04-21 14:39 [PATCH] Handle encoding failures in Windows thread names Tom Tromey
2022-04-21 15:47 ` Eli Zaretskii
@ 2022-06-02 14:19 ` Jon Turney
2022-06-02 14:33 ` Tom Tromey
2022-06-02 16:07 ` Eli Zaretskii
1 sibling, 2 replies; 10+ messages in thread
From: Jon Turney @ 2022-06-02 14:19 UTC
To: Tom Tromey, gdb-patches
On 21/04/2022 15:39, Tom Tromey via Gdb-patches wrote:
> Internally at AdaCore, we noticed that the new Windows thread name
> code could fail. First, it might return a zero-length string, but in
> gdb conventions it should return nullptr instead. Second, an encoding
> failure could wind up showing replacement characters to the user; this
> is confusing and not useful; it's better to recognize such errors and
> simply discard the name. This patch makes both of these changes.
> ---
> gdb/nat/windows-nat.c | 17 ++++++++++++-----
> 1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/gdb/nat/windows-nat.c b/gdb/nat/windows-nat.c
> index bd1b9459145..7a4e804f891 100644
> --- a/gdb/nat/windows-nat.c
> +++ b/gdb/nat/windows-nat.c
> @@ -119,12 +119,19 @@ windows_thread_info::thread_name ()
> HRESULT result = GetThreadDescription (h, &value);
> if (SUCCEEDED (result))
> {
> - size_t needed = wcstombs (nullptr, value, 0);
> - if (needed != (size_t) -1)
> + int needed = WideCharToMultiByte (CP_ACP, 0, value, -1, nullptr, 0,
> + nullptr, nullptr);
> + if (needed != 0)
> {
> - name.reset ((char *) xmalloc (needed));
> - if (wcstombs (name.get (), value, needed) == (size_t) -1)
> - name.reset ();
> + BOOL used_default = FALSE;
> + gdb::unique_xmalloc_ptr<char> new_name
> + ((char *) xmalloc (needed));
> + if (WideCharToMultiByte (CP_ACP, 0, value, -1,
> + new_name.get (), needed,
> + nullptr, &used_default) == needed
> + && !used_default
> + && strlen (new_name.get ()) > 0)
> + name = std::move (new_name);
> }
> LocalFree (value);
> }
This is probably wrong on Cygwin (as the target encoding should be
Cygwin's conception of the locale, not the Windows codepage).
* Re: [PATCH] Handle encoding failures in Windows thread names
2022-06-02 14:19 ` Jon Turney
@ 2022-06-02 14:33 ` Tom Tromey
2022-06-02 19:29 ` Jon Turney
2022-06-02 16:07 ` Eli Zaretskii
1 sibling, 1 reply; 10+ messages in thread
From: Tom Tromey @ 2022-06-02 14:33 UTC
To: Jon Turney; +Cc: Tom Tromey, gdb-patches
>>>>> "Jon" == Jon Turney <jon.turney@dronecode.org.uk> writes:
Jon> This is probably wrong on Cygwin (as the target encoding should be
Jon> Cygwin's conception of the locale, not the Windows codepage).
If there's some way for gdb to know the locale of the inferior, I guess
we could use that here. I don't have Cygwin and so I can't test it or
anything, but if I knew what to do I could try to write a patch for
someone else to test.
Tom
* Re: [PATCH] Handle encoding failures in Windows thread names
2022-06-02 14:19 ` Jon Turney
2022-06-02 14:33 ` Tom Tromey
@ 2022-06-02 16:07 ` Eli Zaretskii
2022-06-02 19:29 ` Jon Turney
1 sibling, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2022-06-02 16:07 UTC
To: Jon Turney; +Cc: tromey, gdb-patches
> Date: Thu, 2 Jun 2022 15:19:21 +0100
> From: Jon Turney <jon.turney@dronecode.org.uk>
>
> > diff --git a/gdb/nat/windows-nat.c b/gdb/nat/windows-nat.c
> > index bd1b9459145..7a4e804f891 100644
> > --- a/gdb/nat/windows-nat.c
> > +++ b/gdb/nat/windows-nat.c
> > @@ -119,12 +119,19 @@ windows_thread_info::thread_name ()
> > HRESULT result = GetThreadDescription (h, &value);
> > if (SUCCEEDED (result))
> > {
> > - size_t needed = wcstombs (nullptr, value, 0);
> > - if (needed != (size_t) -1)
> > + int needed = WideCharToMultiByte (CP_ACP, 0, value, -1, nullptr, 0,
> > + nullptr, nullptr);
> > + if (needed != 0)
> > {
> > - name.reset ((char *) xmalloc (needed));
> > - if (wcstombs (name.get (), value, needed) == (size_t) -1)
> > - name.reset ();
> > + BOOL used_default = FALSE;
> > + gdb::unique_xmalloc_ptr<char> new_name
> > + ((char *) xmalloc (needed));
> > + if (WideCharToMultiByte (CP_ACP, 0, value, -1,
> > + new_name.get (), needed,
> > + nullptr, &used_default) == needed
> > + && !used_default
> > + && strlen (new_name.get ()) > 0)
> > + name = std::move (new_name);
> > }
> > LocalFree (value);
> > }
>
> This is probably wrong on Cygwin (as the target encoding should be
> Cygwin's conception of the locale, not the Windows codepage).
What does CP_ACP do on Cygwin? Is it different from what it does
in native Windows programs?
Anyway, for Cygwin I think we should replace CP_ACP with CP_UTF8,
don't you think?
* Re: [PATCH] Handle encoding failures in Windows thread names
2022-06-02 14:33 ` Tom Tromey
@ 2022-06-02 19:29 ` Jon Turney
2022-06-03 16:19 ` Tom Tromey
0 siblings, 1 reply; 10+ messages in thread
From: Jon Turney @ 2022-06-02 19:29 UTC
To: Tom Tromey, gdb-patches
On 02/06/2022 15:33, Tom Tromey wrote:
>>>>>> "Jon" == Jon Turney <jon.turney@dronecode.org.uk> writes:
>
> Jon> This is probably wrong on Cygwin (as the target encoding should be
> Jon> Cygwin's conception of the locale, not the Windows codepage).
Here I meant "Cygwin's conception of the locale for the gdb process"
> If there's some way for gdb to know the locale of the inferior, I guess
> we could use that here. I don't have Cygwin and so I can't test it or
> anything, but if I knew what to do I could try to write a patch for
> someone else to test.
I am confused by this, but probably I'm missing something.
GetThreadDescription() only exists as a wide-char API, so the threadname
it returns is always UTF-16 encoded, irrespective of the inferior's locale.
Probably, the "right" thing to do on cygwin is use wcstombs(), but
apparently that can't report the encoding failure condition you want to
detect.
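As a point of comparison, the two-call wcstombs pattern looks roughly
like the sketch below. The helper name narrow_with_wcstombs is
hypothetical, and the null-destination sizing call is a POSIX extension
that common C libraries provide rather than something ISO C guarantees.
The C standard has wcstombs return (size_t) -1 for a wide character
with no multibyte representation in the current locale; whether a given
runtime actually does that, or substitutes a replacement character
instead, is the open question in this thread.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper: convert WIDE to the current locale's multibyte
// encoding, or return std::nullopt on failure or an empty result.
std::optional<std::string>
narrow_with_wcstombs (const wchar_t *wide)
{
  assert (wide != nullptr);

  // Sizing call: with a null destination, POSIX-style libraries return
  // the number of bytes needed, not counting the terminating NUL.
  size_t needed = std::wcstombs (nullptr, wide, 0);
  if (needed == (size_t) -1 || needed == 0)
    return std::nullopt;

  // Leave room for the NUL that the second call writes.
  std::string out (needed + 1, '\0');
  if (std::wcstombs (&out[0], wide, out.size ()) == (size_t) -1)
    return std::nullopt;

  out.resize (needed);
  return out;
}
```

Note that the result depends on the locale in effect when the function
is called, which is the property that makes it attractive on Cygwin.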
* Re: [PATCH] Handle encoding failures in Windows thread names
2022-06-02 16:07 ` Eli Zaretskii
@ 2022-06-02 19:29 ` Jon Turney
2022-06-03 5:28 ` Eli Zaretskii
0 siblings, 1 reply; 10+ messages in thread
From: Jon Turney @ 2022-06-02 19:29 UTC
To: Eli Zaretskii, gdb-patches
On 02/06/2022 17:07, Eli Zaretskii wrote:
>> Date: Thu, 2 Jun 2022 15:19:21 +0100
>> From: Jon Turney <jon.turney@dronecode.org.uk>
>>
>>> diff --git a/gdb/nat/windows-nat.c b/gdb/nat/windows-nat.c
>>> index bd1b9459145..7a4e804f891 100644
>>> --- a/gdb/nat/windows-nat.c
>>> +++ b/gdb/nat/windows-nat.c
>>> @@ -119,12 +119,19 @@ windows_thread_info::thread_name ()
>>> HRESULT result = GetThreadDescription (h, &value);
>>> if (SUCCEEDED (result))
>>> {
>>> - size_t needed = wcstombs (nullptr, value, 0);
>>> - if (needed != (size_t) -1)
>>> + int needed = WideCharToMultiByte (CP_ACP, 0, value, -1, nullptr, 0,
>>> + nullptr, nullptr);
>>> + if (needed != 0)
>>> {
>>> - name.reset ((char *) xmalloc (needed));
>>> - if (wcstombs (name.get (), value, needed) == (size_t) -1)
>>> - name.reset ();
>>> + BOOL used_default = FALSE;
>>> + gdb::unique_xmalloc_ptr<char> new_name
>>> + ((char *) xmalloc (needed));
>>> + if (WideCharToMultiByte (CP_ACP, 0, value, -1,
>>> + new_name.get (), needed,
>>> + nullptr, &used_default) == needed
>>> + && !used_default
>>> + && strlen (new_name.get ()) > 0)
>>> + name = std::move (new_name);
>>> }
>>> LocalFree (value);
>>> }
>>
>> This is probably wrong on Cygwin (as the target encoding should be
>> Cygwin's conception of the locale, not the Windows codepage).
>
> What does CP_ACP do on Cygwin? Is it different from what it does
> in native Windows programs?
It is the same function, doing the same thing.
But I think it's not what's wanted in context: This string ends up
being written to stdout, which may be connected to a terminal, which
would be expecting that to be encoded for the locale specified by LANG
etc, not the Windows codepage.
> Anyway, for Cygwin I think we should replace CP_ACP with CP_UTF8,
> don't you think?
That would only be right if LANG etc. specified a UTF8 locale?
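For illustration, a POSIX-style program can ask which character set the
LANG/LC_* settings imply with nl_langinfo (CODESET). This is only a
sketch of that lookup, with the hypothetical helper name locale_codeset;
it is not taken from gdb and it says nothing about how gdb itself picks
its host charset.

```cpp
#include <cassert>
#include <clocale>
#include <langinfo.h>
#include <string>

// Hypothetical helper: return the name of the character set selected by
// the environment (LC_ALL, LC_CTYPE, LANG), e.g. "UTF-8".
std::string
locale_codeset ()
{
  // Adopt the environment's LC_CTYPE.  This changes process-wide state;
  // a real program would normally do it once at startup.  If the
  // environment names an unknown locale, the current one is kept.
  std::setlocale (LC_CTYPE, "");

  const char *codeset = nl_langinfo (CODESET);
  assert (codeset != nullptr);
  return codeset;
}
```

Only when this reports a UTF-8 codeset would converting the thread name
to UTF-8 match what the terminal expects.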
* Re: [PATCH] Handle encoding failures in Windows thread names
2022-06-02 19:29 ` Jon Turney
@ 2022-06-03 5:28 ` Eli Zaretskii
0 siblings, 0 replies; 10+ messages in thread
From: Eli Zaretskii @ 2022-06-03 5:28 UTC
To: Jon Turney; +Cc: gdb-patches
> Date: Thu, 2 Jun 2022 20:29:48 +0100
> From: Jon Turney <jon.turney@dronecode.org.uk>
>
> But I think it's not what's wanted in context: This string ends up
> being written to stdout, which may be connected to a terminal, which
> would be expecting that to be encoded for the locale specified by LANG
> etc, not the Windows codepage.
So we need to convert the terminal's locale to a Windows codepage?
Or maybe calling GetConsoleOutputCP would do the job?
> > Anyway, for Cygwin I think we should replace CP_ACP with CP_UTF8,
> > don't you think?
>
> That would only be right if LANG etc. specified a UTF8 locale?
I thought Cygwin nowadays used UTF-8 by default? But I'm far from
being a Cygwin expert, so ignore me if this makes no sense.
* Re: [PATCH] Handle encoding failures in Windows thread names
2022-06-02 19:29 ` Jon Turney
@ 2022-06-03 16:19 ` Tom Tromey
0 siblings, 0 replies; 10+ messages in thread
From: Tom Tromey @ 2022-06-03 16:19 UTC
To: Jon Turney; +Cc: Tom Tromey, gdb-patches
>> If there's some way for gdb to know the locale of the inferior, I guess
>> we could use that here. I don't have Cygwin and so I can't test it or
>> anything, but if I knew what to do I could try to write a patch for
>> someone else to test.
Jon> I am confused by this, but probably I'm missing something.
No, I was confused.
Jon> Probably, the "right" thing to do on cygwin is use wcstombs(), but
Jon> apparently that can't report the encoding failure condition you want
Jon> to detect.
At some point I switched which API was in use here for precisely this
reason.
Another approach might be to use gdb's charset conversion code.
Tom
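To make the last idea concrete, here is a rough sketch using plain
iconv rather than gdb's own charset wrappers, whose interface is not
shown in this thread. The helper name utf16_to_charset is hypothetical,
and the sketch assumes a little-endian host and a stateless target
charset. The useful property is that a conversion which cannot represent
a character reports it (an error, or a non-zero count of non-reversible
conversions) instead of silently substituting.

```cpp
#include <cassert>
#include <iconv.h>
#include <optional>
#include <string>

// Hypothetical helper, not part of gdb: convert UTF-16 text to
// TARGET_CHARSET.  Returns std::nullopt if the charset is unknown, if
// any character cannot be converted exactly, or if the result is empty.
std::optional<std::string>
utf16_to_charset (const std::u16string &in, const char *target_charset)
{
  assert (target_charset != nullptr);

  // Assumption: little-endian host, so char16_t storage is UTF-16LE.
  iconv_t cd = iconv_open (target_charset, "UTF-16LE");
  if (cd == (iconv_t) -1)
    return std::nullopt;

  // Four bytes per UTF-16 code unit is a generous bound for the usual
  // stateless charsets; if it is ever too small, iconv fails and the
  // name is discarded.
  std::string out (in.size () * 4 + 1, '\0');

  // iconv takes a non-const input pointer but does not modify the input.
  char *inp = const_cast<char *> (reinterpret_cast<const char *> (in.data ()));
  size_t inleft = in.size () * sizeof (char16_t);
  char *outp = &out[0];
  size_t outleft = out.size ();

  size_t rc = iconv (cd, &inp, &inleft, &outp, &outleft);
  iconv_close (cd);

  // (size_t) -1 is an error; any other non-zero value counts characters
  // that were converted in a non-reversible way.  Reject both.
  if (rc != 0 || inleft != 0)
    return std::nullopt;

  out.resize (out.size () - outleft);
  if (out.empty ())
    return std::nullopt;
  return out;
}
```

The target charset would presumably come from the locale of the gdb
process, per the discussion above, rather than from a Windows code page.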