public inbox for gdb-patches@sourceware.org
* [PATCH] Different outputs affected by locale
@ 2014-05-27 12:13 Yao Qi
  2014-06-04  5:32 ` [ping] " Yao Qi
  0 siblings, 1 reply; 39+ messages in thread
From: Yao Qi @ 2014-05-27 12:13 UTC (permalink / raw)
  To: gdb-patches

We see the following failures in the GDB testsuite on a mingw host.

FAIL: gdb.base/wchar.exp: print repeat
FAIL: gdb.base/wchar.exp: print repeat_p
FAIL: gdb.base/wchar.exp: print repeat (print null on)
FAIL: gdb.base/wchar.exp: print repeat (print elements 3)
FAIL: gdb.base/wchar.exp: print repeat_p (print elements 3)

print repeat^M
$7 = L"A", '¢' <repeats 21 times>, "B", '\000' <repeats 104 times>^M
(gdb) FAIL: gdb.base/wchar.exp: print repeat

The test expects \242, but the cent sign is displayed.

In valprint.c:print_wchar, wchar_printable is called to determine
whether a wchar is printable.  wchar_printable calls iswprint, but
iswprint's return value depends on the LC_CTYPE setting of the locale [1, 2],
so the output may vary with different locale settings.  I noticed that
gdb.exp:gdb_init sets LC_CTYPE to C.  If I remove that line, the tests
fail in native testing too.
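
As a rough illustration of that decision (a simplified sketch, not the
actual valprint.c code): the helper names render_wchar and
render_wchar_in_locale are made up for this example, and the escape
formats are modeled only on the outputs seen in this test (\242 and
\xfeed).

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: render C into BUF either as itself or as an
   escape.  PRINTABLE is passed in so that the locale-dependent
   decision is explicit and easy to test.  */
void
render_wchar (wchar_t c, int printable, char *buf, size_t size)
{
  char mb[MB_LEN_MAX];
  /* Only try the multibyte conversion for printable characters.  */
  int len = printable ? wctomb (mb, c) : -1;

  if (len > 0 && (size_t) len < size)
    {
      memcpy (buf, mb, (size_t) len);
      buf[len] = '\0';
    }
  else if ((unsigned long) c <= 0xff)
    snprintf (buf, size, "\\%03lo", (unsigned long) c);	/* e.g. \242 */
  else
    snprintf (buf, size, "\\x%lx", (unsigned long) c);	/* e.g. \xfeed */
}

/* Same, but let the current LC_CTYPE decide via iswprint.  This is the
   step whose result differs between hosts and locales.  */
void
render_wchar_in_locale (wchar_t c, char *buf, size_t size)
{
  render_wchar (c, iswprint ((wint_t) c) != 0, buf, size);
}
```

With printable forced to 0, 0xa2 is rendered as the escape \242; whether
render_wchar_in_locale takes that path depends entirely on what iswprint
reports on the host.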

IMO, either \242 or '¢' (cent sign) is a correct output; which one is
printed depends on the locale, and is not related to gdb at all.

[1] http://pubs.opengroup.org/onlinepubs/009604499/functions/iswprint.html
[2] http://msdn.microsoft.com/en-us/library/ewx8s4kw.aspx

This patch adds code that runs 'p repeat[1]' to extract the cent
character first, and then uses it to match the output in the following tests.

gdb/testsuite:

2014-05-27  Yao Qi  <yao@codesourcery.com>

	* gdb.base/wchar.exp: Execute command 'p repeat[1]' and extract
	cent from the output.
---
 gdb/testsuite/gdb.base/wchar.exp | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/gdb/testsuite/gdb.base/wchar.exp b/gdb/testsuite/gdb.base/wchar.exp
index 4290478..215d2f4 100644
--- a/gdb/testsuite/gdb.base/wchar.exp
+++ b/gdb/testsuite/gdb.base/wchar.exp
@@ -36,7 +36,23 @@ gdb_test "print simple\[2\]" "= 99 L'c'"
 
 gdb_test "print difficile\[2\]" "= 65261 L'\\\\xfeed'"
 
-set cent "\\\\242"
+# The contents of 'repeat' are shown differently under different
+# locales.  Instead of hard-coding the cent sign in variable 'cent',
+# extract it from the output of 'print repeat[1]', and use it to
+# match the output in the following tests.
+set cent ""
+set test "get cent"
+gdb_test_multiple "p repeat\[1\]" $test {
+    -re " = 162 L'(.*)'.*\r\n$gdb_prompt $" {
+	set cent [string_to_regexp $expect_out(1,string)]
+	pass $test
+    }
+    -re ".*$gdb_prompt $" {
+	fail $test
+	return
+    }
+}
+
 gdb_test "print repeat" "= L\"A\", '$cent' <repeats 21 times>, \"B.*"
 
 global hex
-- 
1.9.0

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [ping]  [PATCH] Different outputs affected by locale
  2014-05-27 12:13 [PATCH] Different outputs affected by locale Yao Qi
@ 2014-06-04  5:32 ` Yao Qi
  2014-06-04 12:47   ` Joel Brobecker
  0 siblings, 1 reply; 39+ messages in thread
From: Yao Qi @ 2014-06-04  5:32 UTC (permalink / raw)
  To: gdb-patches

On 05/27/2014 08:10 PM, Yao Qi wrote:
> We find the following fails in gdb test on mingw host.
> 
> FAIL: gdb.base/wchar.exp: print repeat
> FAIL: gdb.base/wchar.exp: print repeat_p
> FAIL: gdb.base/wchar.exp: print repeat (print null on)
> FAIL: gdb.base/wchar.exp: print repeat (print elements 3)
> FAIL: gdb.base/wchar.exp: print repeat_p (print elements 3)
> 
> print repeat^M
> $7 = L"A", '¢' <repeats 21 times>, "B", '\000' <repeats 104 times>^M
> (gdb) FAIL: gdb.base/wchar.exp: print repeat
> 
> the \242 is expected in the test but cent sign is displayed.
> 
> In valprint.c:print_wchar, wchar_printable is called to determine
> whether a wchar is printable.  wchar_printable calls iswprint but
> the iswprint's return value depends on LC_CTYPE setting of locale [1, 2].
> The output may vary with different locale settings.  I noticed that
> gdb.exp:gdb_init set LC_CTYPE to C.  If I remove that line, tests
> fail on native testing too.
> 
> IMO, either \242 or '¢' (cent sign) is a correct output, which is
> affect by locale, and it is not related to gdb at all.
> 
> [1] http://pubs.opengroup.org/onlinepubs/009604499/functions/iswprint.html
> [2] msdn.microsoft.com/en-us/library/ewx8s4kw.aspx
> 
> This patch is to add code to 'p repeat[1]' to extract the cent first,
> and then use it to match in the following tests.
> 
> gdb/testsuite:
> 
> 2014-05-27  Yao Qi  <yao@codesourcery.com>
> 
> 	* gdb.base/wchar.exp: Execute command 'p repeat[1]' and extract
> 	cent from the output.

Ping.

-- 
Yao (齐尧)


* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-04  5:32 ` [ping] " Yao Qi
@ 2014-06-04 12:47   ` Joel Brobecker
  2014-06-04 13:21     ` Yao Qi
  0 siblings, 1 reply; 39+ messages in thread
From: Joel Brobecker @ 2014-06-04 12:47 UTC (permalink / raw)
  To: Yao Qi; +Cc: gdb-patches

> > 2014-05-27  Yao Qi  <yao@codesourcery.com>
> > 
> > 	* gdb.base/wchar.exp: Execute command 'p repeat[1]' and extract
> > 	cent from the output.

This is a patch that I felt would be better reviewed by Tom, but
we'd have to wait for him to be back. When I read your patch,
I thought that the approach you took was weakening the test a little,
because if GDB started printing the character incorrectly, you would
not notice it anymore.

-- 
Joel


* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-04 12:47   ` Joel Brobecker
@ 2014-06-04 13:21     ` Yao Qi
  2014-06-04 13:52       ` Joel Brobecker
  2014-06-04 20:15       ` Tom Tromey
  0 siblings, 2 replies; 39+ messages in thread
From: Yao Qi @ 2014-06-04 13:21 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb-patches

On 06/04/2014 08:47 PM, Joel Brobecker wrote:
>>> 2014-05-27  Yao Qi  <yao@codesourcery.com>
>>>
>>> 	* gdb.base/wchar.exp: Execute command 'p repeat[1]' and extract
>>> 	cent from the output.
> 
> This is a patch that I felt would be better reviewed by Tom, but
> we'd have to wait for him to be back. When I read your patch,
> I thought that the approach you took was weakening the test a little,
> because if GDB started printing the character incorrectly, you would
> not notice it anymore.
> 

The character printed by GDB in this case is out of GDB's control,
IMO.  IOW, we can't tell which printed character is correct and which is
incorrect.  Alternatively, we can relax the pattern to match either \242
or '¢' (cent sign) in the test.  WDYT?
-- 
Yao (齐尧)


* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-04 13:21     ` Yao Qi
@ 2014-06-04 13:52       ` Joel Brobecker
  2014-06-04 20:15       ` Tom Tromey
  1 sibling, 0 replies; 39+ messages in thread
From: Joel Brobecker @ 2014-06-04 13:52 UTC (permalink / raw)
  To: Yao Qi; +Cc: gdb-patches

> The character printed by GDB in this case is out the control of GDB,
> IMO.

IIRC, it is, at least a little, by way of how it decodes multibyte characters?

> IOW, we can't tell what character printed is correct and what is
> incorrect.  Or we can relax the pattern to match either \242 or '¢'
> (cent sign) in the test.  WDYT?

That would have been my first approach, but I would prefer it if
someone who knows better about encodings commented on that. I could
be wrong!

-- 
Joel


* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-04 13:21     ` Yao Qi
  2014-06-04 13:52       ` Joel Brobecker
@ 2014-06-04 20:15       ` Tom Tromey
  2014-06-04 20:23         ` Pedro Alves
  1 sibling, 1 reply; 39+ messages in thread
From: Tom Tromey @ 2014-06-04 20:15 UTC (permalink / raw)
  To: Yao Qi; +Cc: Joel Brobecker, gdb-patches

>>>>> "Yao" == Yao Qi <yao@codesourcery.com> writes:

Yao> The character printed by GDB in this case is out the control of GDB,
Yao> IMO.  IOW, we can't tell what character printed is correct and what is
Yao> incorrect.  Or we can relax the pattern to match either \242 or '¢'
Yao> (cent sign) in the test.  WDYT?

I think that would be preferable.  It is more conservative for the
reason Joel pointed out; and should we encounter a system that emits
something else, it is easy to update the test at that time.

I am not really a great standards lawyer but my first reaction is that
mingw's C locale is not conforming.  At least from:

    http://pubs.opengroup.org/onlinepubs/009604499/basedefs/xbd_chap07.html

.. it seems to me that \242 is not defined as a 'print' character in the
LC_CTYPE section.  Though I'd like to reiterate that I don't actually
trust my own reading of that text.

Tom


* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-04 20:15       ` Tom Tromey
@ 2014-06-04 20:23         ` Pedro Alves
  2014-06-05  3:31           ` Yao Qi
  0 siblings, 1 reply; 39+ messages in thread
From: Pedro Alves @ 2014-06-04 20:23 UTC (permalink / raw)
  To: Tom Tromey, Yao Qi; +Cc: Joel Brobecker, gdb-patches

On 06/04/2014 09:15 PM, Tom Tromey wrote:

> I am not really a great standards lawyer but my first reaction is that
> mingw's C locale is not conforming.  At least from:
> 
>     http://pubs.opengroup.org/onlinepubs/009604499/basedefs/xbd_chap07.html
> 
> .. it seems to me that \242 is not defined as a 'print' character in the
> LC_CTYPE section.  Though I'd like to reiterate that I don't actually
> trust my own reading of that text.

I wonder whether this is really a mingw issue, or whether this is a
remote host testing issue.  That is, aren't we setting LC_CTYPE
on the _build_ (where expect runs), not on the host (mingw, through
ssh)?  Is LC_CTYPE really being propagated to the host?
Does testing GDB manually directly on a Windows console show the same
issue?

-- 
Pedro Alves


* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-04 20:23         ` Pedro Alves
@ 2014-06-05  3:31           ` Yao Qi
  2014-06-05  8:58             ` Pedro Alves
  2014-06-05 14:47             ` Eli Zaretskii
  0 siblings, 2 replies; 39+ messages in thread
From: Yao Qi @ 2014-06-05  3:31 UTC (permalink / raw)
  To: Pedro Alves, Tom Tromey; +Cc: Joel Brobecker, gdb-patches

On 06/05/2014 04:23 AM, Pedro Alves wrote:
>> > I am not really a great standards lawyer but my first reaction is that
>> > mingw's C locale is not conforming.  At least from:
>> > 
>> >     http://pubs.opengroup.org/onlinepubs/009604499/basedefs/xbd_chap07.html
>> > 
>> > .. it seems to me that \242 is not defined as a 'print' character in the
>> > LC_CTYPE section.  Though I'd like to reiterate that I don't actually
>> > trust my own reading of that text.
> I wonder whether this is really a mingw issue, or whether this is a
> remote host testing issue.  That is, aren't we setting LC_CTYPE
> on the _build_ (where expect runs), not on the host (mingw, through

This is neither a mingw issue nor a remote host testing issue.  If
LC_CTYPE isn't set properly on the host, these tests will fail, even in
native testing.

> ssh)?  Is LC_CTYPE really being propagated to the host?

No; setting env variables on the host or target in dejagnu isn't trivial,
as far as I can see.

> Does testing GDB manually directly on a Windows console show the same
> issue?

Yes, here is the output I got on Windows 7 (running gdb.exe in a Windows console).
However, I didn't investigate why 'ó' is printed.

(gdb) p repeat
$1 = L"A", 'ó' <repeats 21 times>, "B\000\xffff\200\000\x1370\500\xfe0c\"\x300\x
7ffe\xfe98\"\xe115\x771b\x67c9\x42c8\xfffe\xffff\x6d91\x7726\x1ae0@\xeb0:\x300\x
7ffe\xea8:\200\000Ω\000\xf480\x7594\000:\000\000\xf489\x7594\017\000\004\000Ω\00
0\xfe9c\"\x6094\x771e\xa2ac\x771f\xffff\xffff$\000\xfe98\"\004\000\000\000\x559\
xc000\xfea8\"\xf600\x7594\000\000\000\000\000\000\xfebc\"\xa442\x7594\x2a8\x759e
\xfefc\"\xf4d2\x7594\b\000\x118e\x7595\x1162\x7595\x8ccb\x3e13\000\000\000\000\0
00\000\x1ae0@\xfed0\"\x8fe3\x759b\xffc4"

Here is the updated patch to match either \242 or the cent sign.

-- 
Yao (齐尧)

Subject: [PATCH] Different outputs affected by locale
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We see the following failures in the GDB testsuite on a mingw host.

FAIL: gdb.base/wchar.exp: print repeat
FAIL: gdb.base/wchar.exp: print repeat_p
FAIL: gdb.base/wchar.exp: print repeat (print null on)
FAIL: gdb.base/wchar.exp: print repeat (print elements 3)
FAIL: gdb.base/wchar.exp: print repeat_p (print elements 3)

print repeat^M
$7 = L"A", '¢' <repeats 21 times>, "B", '\000' <repeats 104 times>^M
(gdb) FAIL: gdb.base/wchar.exp: print repeat

The test expects \242, but the cent sign is displayed.

In valprint.c:print_wchar, wchar_printable is called to determine
whether a wchar is printable.  wchar_printable calls iswprint, but
iswprint's return value depends on the LC_CTYPE setting of the locale [1, 2],
so the output may vary with different locale settings.  I noticed that
gdb.exp:gdb_init sets LC_CTYPE to C.  If I remove that line, the tests
fail in native testing too.

IMO, either \242 or '¢' (cent sign) is a correct output; which one is
printed depends on the locale, and is not related to gdb at all.

[1] http://pubs.opengroup.org/onlinepubs/009604499/functions/iswprint.html
[2] http://msdn.microsoft.com/en-us/library/ewx8s4kw.aspx

This patch extends $cent so that the tests also match the cent sign.

gdb/testsuite:

2014-06-05  Yao Qi  <yao@codesourcery.com>

	* gdb.base/wchar.exp: Extend $cent to match cent sign.
---
 gdb/testsuite/gdb.base/wchar.exp | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gdb/testsuite/gdb.base/wchar.exp b/gdb/testsuite/gdb.base/wchar.exp
index 4290478..aa19d92 100644
--- a/gdb/testsuite/gdb.base/wchar.exp
+++ b/gdb/testsuite/gdb.base/wchar.exp
@@ -36,7 +36,10 @@ gdb_test "print simple\[2\]" "= 99 L'c'"
 
 gdb_test "print difficile\[2\]" "= 65261 L'\\\\xfeed'"
 
-set cent "\\\\242"
+# The contents of 'repeat' are shown differently under different
+# locales.  We match all the possible outputs here: '\242' or cent sign.
+set cent "(\\\\242|\u00A2)"
+
 gdb_test "print repeat" "= L\"A\", '$cent' <repeats 21 times>, \"B.*"
 
 global hex
-- 
1.9.0
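
To show that the relaxed pattern still rejects anything other than the
two known renderings, here is a small C sketch of the same alternation
using POSIX regcomp/regexec.  The function name matches_repeat_output is
made up for this example, and writing the cent sign as the UTF-8 bytes
C2 A2 is an assumption about the encoding of the captured output.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if LINE shows the repeated element as
   either the octal escape \242 or a UTF-8 encoded cent sign, else 0.  */
int
matches_repeat_output (const char *line)
{
  /* ERE: a literal backslash followed by 242, or the bytes C2 A2.  */
  static const char pattern[]
    = "'(\\\\242|\xc2\xa2)' <repeats [0-9]+ times>";
  regex_t re;
  int matched;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return 0;
  matched = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

A line containing any other character between the quotes does not
match, so the test would still notice GDB printing something
unexpected.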


* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-05  3:31           ` Yao Qi
@ 2014-06-05  8:58             ` Pedro Alves
  2014-06-05  9:58               ` Yao Qi
  2014-06-05 14:47             ` Eli Zaretskii
  1 sibling, 1 reply; 39+ messages in thread
From: Pedro Alves @ 2014-06-05  8:58 UTC (permalink / raw)
  To: Yao Qi; +Cc: Tom Tromey, Joel Brobecker, gdb-patches

On 06/05/2014 04:29 AM, Yao Qi wrote:
> On 06/05/2014 04:23 AM, Pedro Alves wrote:
>>>> I am not really a great standards lawyer but my first reaction is that
>>>> mingw's C locale is not conforming.  At least from:
>>>>
>>>>     http://pubs.opengroup.org/onlinepubs/009604499/basedefs/xbd_chap07.html
>>>>
>>>> .. it seems to me that \242 is not defined as a 'print' character in the
>>>> LC_CTYPE section.  Though I'd like to reiterate that I don't actually
>>>> trust my own reading of that text.
>> I wonder whether this is really a mingw issue, or whether this is a
>> remote host testing issue.  That is, aren't we setting LC_CTYPE
>> on the _build_ (where expect runs), not on the host (mingw, through
> 
> This is a not a mingw issue nor a remote host testing issue.  

But that's a conflicting answer.  It's a remote host testing issue
if this only triggers with remote host testing.

> If the
> LC_CTYPE isn't set properly on host, these tests will fail, even in the
> native testing.

Sure, but it's supposed to be set, and then tests can assume so.
If it is not set in some circumstance, then it's a bug in the test
infrastructure, not the test.  For native testing, those are
set by gdb.exp:gdb_init.

> 
>> ssh)?  Is LC_CTYPE really being propagated to the host?
> 
> No, setting env variables on host or target in dejagnu isn't trivial to
> me.

They need to be passed down explicitly in the ssh command line:

$ ssh localhost "FOO=1 env | grep FOO"
FOO=1

> 
>> Does testing GDB manually directly on a Windows console show the same
>> issue?
> 
> Yes, here is the output I got on Windows 7 (running gdb.exe in Windows console).
> However, I didn't investigate why 'ó' is printed.

But was that with LC_CTYPE set to C?

-- 
Pedro Alves


* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-05  8:58             ` Pedro Alves
@ 2014-06-05  9:58               ` Yao Qi
  2014-06-05 10:12                 ` Pedro Alves
  2014-06-05 10:27                 ` Pedro Alves
  0 siblings, 2 replies; 39+ messages in thread
From: Yao Qi @ 2014-06-05  9:58 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Tom Tromey, Joel Brobecker, gdb-patches

On 06/05/2014 04:58 PM, Pedro Alves wrote:
>> This is a not a mingw issue nor a remote host testing issue.  
> But that's a conflicting answer.  It's a remote host testing
> if this only triggers with remote host testing.
> 

OK, it is a remote host testing issue, since LC_CTYPE is set on build
only.

>> > If the
>> > LC_CTYPE isn't set properly on host, these tests will fail, even in the
>> > native testing.
> Sure, but it's supposed to be set, and then tests can assume so.
> If not set in some circumstance, then it's a bug in the test
> infrustruture, not the test.  For native testing, those are
> set by gdb.exp:gdb_init.
> 
>> > 
>>> >> ssh)?  Is LC_CTYPE really being propagated to the host?
>> > 
>> > No, setting env variables on host or target in dejagnu isn't trivial to
>> > me.
> They need to be passed down explicitly in the ssh command line:
> 
> $ ssh localhost "FOO=1 env | grep FOO"
> FOO=1
> 

Yes, it is simple to pass an env variable through ssh, but it isn't
trivial to pass env variables to host or target in dejagnu, because:

 - ssh is not the only connection dejagnu supports; how about telnet?
 - env variables should be bound to the board.  host and target can have
different env vars.

I saw Jie's patch to set env vars on the target
http://lists.gnu.org/archive/html/dejagnu/2008-07/msg00000.html
but we need to do more than that, IMO.  That is the reason I am inclined to
fix the test case instead of the infrastructure (dejagnu).

>> > 
>>> >> Does testing GDB manually directly on a Windows console show the same
>>> >> issue?
>> > 
>> > Yes, here is the output I got on Windows 7 (running gdb.exe in Windows console).
>> > However, I didn't investigate why 'ó' is printed.
> But was that with LC_CTYPE set to C?

I don't know how to check LC_CTYPE on Windows. :(

-- 
Yao (齐尧)


* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-05  9:58               ` Yao Qi
@ 2014-06-05 10:12                 ` Pedro Alves
  2014-06-05 15:04                   ` Eli Zaretskii
  2014-06-09  8:37                   ` Yao Qi
  2014-06-05 10:27                 ` Pedro Alves
  1 sibling, 2 replies; 39+ messages in thread
From: Pedro Alves @ 2014-06-05 10:12 UTC (permalink / raw)
  To: Yao Qi; +Cc: Tom Tromey, Joel Brobecker, gdb-patches

On 06/05/2014 10:56 AM, Yao Qi wrote:
> On 06/05/2014 04:58 PM, Pedro Alves wrote:

>>>>>> Does testing GDB manually directly on a Windows console show the same
>>>>>> issue?
>>>>
>>>> Yes, here is the output I got on Windows 7 (running gdb.exe in Windows console).
>>>> However, I didn't investigate why 'ó' is printed.
>> But was that with LC_CTYPE set to C?
> 
> I don't know how check LC_CTYPE on Windows. :(

Try "set", and "set /?".

-- 
Pedro Alves


* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-05  9:58               ` Yao Qi
  2014-06-05 10:12                 ` Pedro Alves
@ 2014-06-05 10:27                 ` Pedro Alves
  1 sibling, 0 replies; 39+ messages in thread
From: Pedro Alves @ 2014-06-05 10:27 UTC (permalink / raw)
  To: Yao Qi; +Cc: Tom Tromey, Joel Brobecker, gdb-patches

On 06/05/2014 10:56 AM, Yao Qi wrote:

> Yes, it is simple to pass env variable through ssh, but isn't trivial to
> pass env variable to host or target in dejagnu, because,
> 
>  - ssh is not the only connection dejagnu supports, how about telnet?

Well, nobody really uses that for _host_ connections.

>  - env variable should bind to board.  host and target can have
> different env vars.
> 
> I saw Jie's patch to set env var on target
> http://lists.gnu.org/archive/html/dejagnu/2008-07/msg00000.html
> but we need do more than that, IMO.  That is the reason I am inclined to
> fix the test case instead of the infrastructure (dejagnu).

In practice, all real host board files will have a ${board}_spawn
override anyway.  We can set GDB's vars in a gdb_env array, similar
to Jie's patch, and then the ${board}_spawn routine can pass them
to $RSH.  When/if Jie's patch is extended to bind to board, and
accepted upstream, we just set the appropriate new board var to $gdb_env.

-- 
Pedro Alves


* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-05  3:31           ` Yao Qi
  2014-06-05  8:58             ` Pedro Alves
@ 2014-06-05 14:47             ` Eli Zaretskii
  1 sibling, 0 replies; 39+ messages in thread
From: Eli Zaretskii @ 2014-06-05 14:47 UTC (permalink / raw)
  To: Yao Qi; +Cc: palves, tromey, brobecker, gdb-patches

> Date: Thu, 5 Jun 2014 11:29:22 +0800
> From: Yao Qi <yao@codesourcery.com>
> CC: Joel Brobecker <brobecker@adacore.com>, <gdb-patches@sourceware.org>
> 
> However, I didn't investigate why 'ó' is printed.

'ó' is 243 decimal (0363 octal), not 242.


* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-05 10:12                 ` Pedro Alves
@ 2014-06-05 15:04                   ` Eli Zaretskii
  2014-06-09  8:37                   ` Yao Qi
  1 sibling, 0 replies; 39+ messages in thread
From: Eli Zaretskii @ 2014-06-05 15:04 UTC (permalink / raw)
  To: Pedro Alves; +Cc: yao, tromey, brobecker, gdb-patches

> Date: Thu, 05 Jun 2014 11:12:50 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>,        gdb-patches@sourceware.org
> 
> > I don't know how check LC_CTYPE on Windows. :(
> 
> Try "set", and "set /?".

Typing "set LC_CTYPE" will either display its value or say that it is
not defined.


* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-05 10:12                 ` Pedro Alves
  2014-06-05 15:04                   ` Eli Zaretskii
@ 2014-06-09  8:37                   ` Yao Qi
  2014-06-09 10:11                     ` Pedro Alves
  1 sibling, 1 reply; 39+ messages in thread
From: Yao Qi @ 2014-06-09  8:37 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Tom Tromey, Joel Brobecker, gdb-patches

On 06/05/2014 06:12 PM, Pedro Alves wrote:
> On 06/05/2014 10:56 AM, Yao Qi wrote:
>> On 06/05/2014 04:58 PM, Pedro Alves wrote:
> 
>>>>>>> Does testing GDB manually directly on a Windows console show the same
>>>>>>> issue?
>>>>>
>>>>> Yes, here is the output I got on Windows 7 (running gdb.exe in Windows console).
>>>>> However, I didn't investigate why 'ó' is printed.
>>> But was that with LC_CTYPE set to C?
>>
>> I don't know how check LC_CTYPE on Windows. :(
> 
> Try "set", and "set /?".
> 

LC_CTYPE isn't set on the Windows machine I am using.  I set LC_CTYPE=C,
but the output is unchanged.

I dug into the locale stuff and found something more: in
main.c:captured_main, gdb does

#if defined (HAVE_SETLOCALE)
  setlocale (LC_CTYPE, "");
#endif

The man page of setlocale says

If locale is "", each part of the locale that should be modified is set
according to the environment variables.

That is why we can pass an env var to change gdb's locale.

However, it looks like setlocale on Windows behaves differently when
locale is "".  The MSDN page about setlocale
<http://msdn.microsoft.com/en-us/library/x99tb11d.aspx> says "If locale
points to an empty string, the locale is the implementation-defined
native environment.", but it doesn't say much about the
"implementation-defined native environment".  The following example
on the same page gives me some hints:

setlocale( LC_ALL, "" );
Sets the locale to the default, which is the user-default ANSI code page
obtained from the operating system.

As far as I can see, Windows doesn't consider any env var with
setlocale(FOO, "").  If I am correct, we can't set gdb's locale by means
of setting env vars; instead, we have to match all the possibilities in
the testcase.  WDYT?

-- 
Yao (齐尧)


* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-09  8:37                   ` Yao Qi
@ 2014-06-09 10:11                     ` Pedro Alves
  2014-06-11  2:22                       ` Yao Qi
  0 siblings, 1 reply; 39+ messages in thread
From: Pedro Alves @ 2014-06-09 10:11 UTC (permalink / raw)
  To: Yao Qi; +Cc: Tom Tromey, Joel Brobecker, gdb-patches

On 06/09/2014 09:35 AM, Yao Qi wrote:

> LC_CTYPE isn't set on the Windows machine I am using.  set LC_CTYPE=C,
> but the output is unchanged.
> 
> I dive into locale stuff, and find something more, in
> main.c:captured_main, gdb does
> 
> #if defined (HAVE_SETLOCALE)
>   setlocale (LC_CTYPE, "");
> #endif
> 
> the man page of setlocale says
> 
> If locale is "", each part of the locale that should be modified is set
> according to the environment variables.
> 
> That is why we can pass env var to change gdb's locale.
> 
> However, looks setlocale on Windows behaves differently when locale is
> "".  The msdn about setlocale
> <http://msdn.microsoft.com/en-us/library/x99tb11d.aspx> says "If locale
> points to an empty string, the locale is the implementation-defined
> native environment.", but it doesn't say much on the
> "implementation-defined native environment".  The following example
> in the same page gives me some hints,
> 
> setlocale( LC_ALL, "" );
> Sets the locale to the default, which is the user-default ANSI code page
> obtained from the operating system.
> 
> As far as I can see, windows doesn't consider any env var with
> setlocale(FOO, "").  

Correct.

> If I am correct, we can't set gdb's locale by means
> of setting env var, 

Not true.  It just means that GDB should be doing more
on native Windows, instead of assuming setlocale on Windows
behaves like the POSIX counterpart.  See e.g.,
src/intl/localename.c  (gettext):

...
  /* Let the user override the system settings through environment
     variables, as on POSIX systems.  */
  retval = getenv ("LC_ALL");
  if (retval != NULL && retval[0] != '\0')
    return retval;
  retval = getenv (categoryname);
  if (retval != NULL && retval[0] != '\0')
    return retval;
  retval = getenv ("LANG");
  if (retval != NULL && retval[0] != '\0')
    return retval;

  /* Use native Win32 API locale ID.  */
  lcid = GetThreadLocale ();
...

etc.

But that code has evolved upstream, and we have the solution
already in gnulib.  See:

http://lists.gnu.org/archive/html/bug-gnulib/2011-02/msg00154.html

Newer versions of intl/gettext override setlocale like that too:

 http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/setlocale.c

> instead, we have to match all the possibilities in
> the testcase.  WDYT?

I think the test caught a real GDB bug on Windows, and we
should fix GDB to make it look at the environment variables,
as is expected of GNU programs.  And that the best way
to handle this is to import the gnulib setlocale module.
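
To make the idea concrete, a minimal sketch of such an override could
look like the following.  This is only an illustration, not the gnulib
code: the names locale_name_from_env and setlocale_from_env are
hypothetical, and a real replacement also has to map POSIX locale names
to names that the Windows setlocale accepts, which this sketch does not
attempt.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: pick the locale name the way POSIX
   setlocale (category, "") would, i.e. LC_ALL, then the category
   variable, then LANG.  Returns NULL if none of them is set to a
   non-empty value.  */
const char *
locale_name_from_env (const char *category_name)
{
  const char *names[3];
  size_t i;

  names[0] = "LC_ALL";
  names[1] = category_name;
  names[2] = "LANG";
  for (i = 0; i < 3; i++)
    {
      const char *value = getenv (names[i]);

      if (value != NULL && value[0] != '\0')
	return value;
    }
  return NULL;
}

/* Hypothetical wrapper: like setlocale, but when LOCALE is "" consult
   the environment first, falling back to the system default.  */
char *
setlocale_from_env (int category, const char *category_name,
		    const char *locale)
{
  if (locale != NULL && locale[0] == '\0')
    {
      const char *name = locale_name_from_env (category_name);

      if (name != NULL)
	{
	  char *result = setlocale (category, name);

	  if (result != NULL)
	    return result;
	}
    }
  return setlocale (category, locale);
}
```

GDB's startup call would then become something like
setlocale_from_env (LC_CTYPE, "LC_CTYPE", "").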

-- 
Pedro Alves


* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-09 10:11                     ` Pedro Alves
@ 2014-06-11  2:22                       ` Yao Qi
  2014-06-11 16:23                         ` Eli Zaretskii
  2014-06-12 11:36                         ` Pedro Alves
  0 siblings, 2 replies; 39+ messages in thread
From: Yao Qi @ 2014-06-11  2:22 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Tom Tromey, Joel Brobecker, gdb-patches

On 06/09/2014 06:11 PM, Pedro Alves wrote:
> I think the test caught a real GDB bug on Windows, and we
> should fix GDB to make it look at the environment variables,
> as is expected of GNU programs.  And that the best way
> to handle this is to import the gnulib setlocale module.

I've started on the setlocale module import, but during the work I did
some experiments, and the results confuse me.

We import setlocale so that we can set the locale through env vars,
assuming that different locales affect the return value of iswprint (0xa2).
However, this assumption isn't true on Windows :(

I wrote the program below to check the return value of iswprint
under different locales.

On Linux, the output is reasonable:
$ ./iswprint
4
C: 0
en_US.UTF-8: 1
C: 0

On Windows, iswprint always returns true!
C:\>iswprint.win.exe
2
C: 16
English_United States.1252: 16
C: 16

iswprint's return value depends on LC_CTYPE, but under LC_CTYPE=C,
iswprint (0xa2) behaves differently on Windows and Linux.
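
Since iswprint only promises a nonzero value for printable characters,
the 16 on Windows and the 1 on Linux both just mean "printable".  A
small sketch that normalizes the result and restores the previous
locale afterwards (the name is_printable_in is made up for this
example):

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: return 1 if C is printable under LC_CTYPE
   locale LOCALE, 0 if it is not, and -1 if the locale cannot be
   selected.  The previous LC_CTYPE setting is restored on return.  */
int
is_printable_in (const char *locale, wint_t c)
{
  char saved[256];
  const char *current = setlocale (LC_CTYPE, NULL);
  int result;

  if (current == NULL)
    return -1;
  /* Copy the name: later setlocale calls may overwrite the string.  */
  snprintf (saved, sizeof saved, "%s", current);

  if (setlocale (LC_CTYPE, locale) == NULL)
    return -1;
  result = iswprint (c) != 0;	/* Collapse any nonzero value to 1.  */
  setlocale (LC_CTYPE, saved);
  return result;
}
```

For example, is_printable_in ("C", 0xa2) can be compared directly
between hosts without caring about the exact nonzero value.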

-- 
Yao (齐尧)

#include <wchar.h>
#include <wctype.h>
#include <stdio.h>
#include <locale.h>

int
main (void)
{
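  wchar_t c = 0xa2;	/* The cent sign.  */

  /* sizeof yields a size_t; cast it to match the %d conversion.  */
  printf ("%d\n", (int) sizeof c);

  /* Locale in effect at program startup.  */
  printf ("%s: %d\n", setlocale (LC_CTYPE, NULL), iswprint (c));

  /* Locale selected by the environment (or the system default).  */
  setlocale (LC_CTYPE, "");
  printf ("%s: %d\n", setlocale (LC_CTYPE, NULL), iswprint (c));

  /* Explicit "C" locale.  */
  setlocale (LC_CTYPE, "C");
  printf ("%s: %d\n", setlocale (LC_CTYPE, NULL), iswprint (c));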


  return 0;
}


* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-11  2:22                       ` Yao Qi
@ 2014-06-11 16:23                         ` Eli Zaretskii
  2014-06-12  0:48                           ` Yao Qi
  2014-06-12 11:36                         ` Pedro Alves
  1 sibling, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2014-06-11 16:23 UTC (permalink / raw)
  To: Yao Qi; +Cc: palves, tromey, brobecker, gdb-patches

> Date: Wed, 11 Jun 2014 10:20:28 +0800
> From: Yao Qi <yao@codesourcery.com>
> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>,	<gdb-patches@sourceware.org>
> 
> We import setlocale so that we can set locale through env var, assuming
> that different locales affect the return value of iswprint (0xa2).
> However, this assumption isn't true on Windows :(
> 
> I write the following program to check the return value of iswprint
> under different locales.
> 
> On Linux, the output is reasonable
> $ ./iswprint
> 4
> C: 0
> en_US.UTF-8: 1
> C: 0
> 
> On Windows, iswprint always return true!
> C:\>iswprint.win.exe
> 2
> C: 16
> English_United States.1252: 16
> C: 16
> 
> iswprint return value depends on LC_CTYPE, but under LC_CTYPE=C,
> iswprint (0xa2) behaves differently on Windows and Linux.

Why do you need 0xa2 to be unprintable?


* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-11 16:23                         ` Eli Zaretskii
@ 2014-06-12  0:48                           ` Yao Qi
  2014-06-12  2:47                             ` Eli Zaretskii
  0 siblings, 1 reply; 39+ messages in thread
From: Yao Qi @ 2014-06-12  0:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: palves, tromey, brobecker, gdb-patches

On 06/12/2014 12:22 AM, Eli Zaretskii wrote:
> Why do you need 0xa2 to be unprintable?

The test in gdb.base/wchar.exp expects 0xa2 to be unprintable.

set cent "\\\\242"
gdb_test "print repeat" "= L\"A\", '$cent' <repeats 21 times>, \"B.*"

but it is printable on mingw, which causes several failures in wchar.exp.
At first I thought this was caused by the locale, but the later experiment
showed that setting the locale doesn't change anything (so the subject has
become misleading).
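
A minimal sketch of the choice being discussed, for readers unfamiliar with
the escape: a printable byte is emitted as itself, an unprintable one as a
three-digit octal escape, so 0xa2 becomes \242.  The helper name format_byte
and its printable flag are hypothetical; the flag stands in for whatever the
host's iswprint reports.  This is not GDB's print_wchar.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not GDB code.  Write VALUE into BUF either as the
   raw byte (when the caller says it is printable) or as a three-digit
   octal escape.  Returns the character count, as snprintf does.  */
int
format_byte (unsigned char value, int printable, char *buf, size_t size)
{
  assert (buf != NULL && size > 0);
  if (printable)
    return snprintf (buf, size, "%c", value);
  return snprintf (buf, size, "\\%03o", (unsigned int) value);
}
```

With printable == 0, the byte 0xa2 comes out as the four characters \242,
which is what wchar.exp expects.  With printable != 0 the raw byte is
emitted, and what the user sees then depends on the host's code page.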

I should change the subject to "Different output affected by host", and
probably go back to the patch that relaxes the pattern to

set cent "(\\\\242|\u00A2)"

-- 
Yao (齐尧)

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-12  0:48                           ` Yao Qi
@ 2014-06-12  2:47                             ` Eli Zaretskii
  2014-06-12  7:04                               ` Yao Qi
  0 siblings, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2014-06-12  2:47 UTC (permalink / raw)
  To: Yao Qi; +Cc: palves, tromey, brobecker, gdb-patches

> Date: Thu, 12 Jun 2014 08:46:23 +0800
> From: Yao Qi <yao@codesourcery.com>
> CC: <palves@redhat.com>, <tromey@redhat.com>, <brobecker@adacore.com>,
> 	<gdb-patches@sourceware.org>
> 
> On 06/12/2014 12:22 AM, Eli Zaretskii wrote:
> > Why do you need 0xa2 to be unprintable?
> 
> Test in gdb.base/wchar.exp expects 0xa2 being unprintable.

So you need _any_ character for which iswprint returns zero?  If so,
does the character have to be a single byte?

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-12  2:47                             ` Eli Zaretskii
@ 2014-06-12  7:04                               ` Yao Qi
  2014-06-12 17:03                                 ` Eli Zaretskii
  0 siblings, 1 reply; 39+ messages in thread
From: Yao Qi @ 2014-06-12  7:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: palves, tromey, brobecker, gdb-patches

On 06/12/2014 10:46 AM, Eli Zaretskii wrote:
> So you need _any_ character for which iswprint returns zero?  If so,
> does the character have to be a single byte?

Finding a character for which iswprint returns zero isn't the point, IMO.
The problem is that wchar.exp expects "\242" but GDB prints the cent sign on
mingw.  Instead of changing to another character, isn't it better to match
both (\242 and the cent sign) in the regexp pattern?

-- 
Yao (齐尧)

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-11  2:22                       ` Yao Qi
  2014-06-11 16:23                         ` Eli Zaretskii
@ 2014-06-12 11:36                         ` Pedro Alves
  2014-06-12 14:39                           ` Yao Qi
  2014-06-12 17:09                           ` Eli Zaretskii
  1 sibling, 2 replies; 39+ messages in thread
From: Pedro Alves @ 2014-06-12 11:36 UTC (permalink / raw)
  To: Yao Qi; +Cc: Tom Tromey, Joel Brobecker, gdb-patches

On 06/11/2014 03:20 AM, Yao Qi wrote:
> On 06/09/2014 06:11 PM, Pedro Alves wrote:
>> I think the test caught a real GDB bug on Windows, and we
>> should fix GDB to make it look at the environment variables,
>> as is expected of GNU programs.  And that the best way
>> to handle this is to import the gnulib setlocale module.
> 
> I've started setlocale module import, but during the work, I did some
> experiments and the result is confusing me.
> 
> We import setlocale so that we can set locale through env var, assuming
> that different locales affect the return value of iswprint (0xa2).
> However, this assumption isn't true on Windows :(

Well, it actually is.

> 
> I write the following program to check the return value of iswprint
> under different locales.
> 
> On Linux, the output is reasonable
> $ ./iswprint
> 4
> C: 0
> en_US.UTF-8: 1
> C: 0
> 
> On Windows, iswprint always return true!
> C:\>iswprint.win.exe
> 2
> C: 16
> English_United States.1252: 16

This shows that what happens is that on Windows the LC_CTYPE=C picks
up the CP-1252 Windows code page (Latin 1), an extended ASCII
code page.  And in that code page, 162 is printable.

> C: 16
> 
> iswprint return value depends on LC_CTYPE, but under LC_CTYPE=C,
> iswprint (0xa2) behaves differently on Windows and Linux.
> 

The difference is really in what locale/code page LC_CTYPE=C picks
up.
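
The kind of probe discussed here can be sketched as follows.  This is only
an illustration: the helper name printable_in_locale is hypothetical, which
locale names are accepted is host-specific, and the answer for 0xa2 is
exactly the thing that differs between hosts.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper.  Return 1 if WC is printable under the LC_CTYPE
   locale NAME, 0 if it is not, and -1 if the host rejects NAME.  The
   caller's LC_CTYPE setting is restored before returning.  */
int
printable_in_locale (const char *name, wint_t wc)
{
  char saved[256];
  const char *cur;
  int result;

  assert (name != NULL);

  /* setlocale may reuse its return buffer, so copy the current name.  */
  cur = setlocale (LC_CTYPE, NULL);
  snprintf (saved, sizeof saved, "%s", cur != NULL ? cur : "C");

  if (setlocale (LC_CTYPE, name) == NULL)
    return -1;
  result = iswprint (wc) != 0;
  setlocale (LC_CTYPE, saved);
  return result;
}
```

Calling printable_in_locale ("C", 0xa2) and then again with another locale
name the host knows gives the same comparison as the quoted output, without
leaving the process in a changed locale.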

What does "show host-charset" show on Windows, before and after
you make GDB pick LC_CTYPE=C from the environment (with the
setlocale gnulib module)?

(Ideally, the wchar tests would actually iterate testing GDB
behaves as expected with different values of LC_CTYPE, etc. set
in the environment.  With all other tests assuming ASCII as set
by default by the testsuite framework.)

-- 
Pedro Alves

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-12 11:36                         ` Pedro Alves
@ 2014-06-12 14:39                           ` Yao Qi
  2014-06-12 17:07                             ` Eli Zaretskii
  2014-06-12 17:23                             ` Pedro Alves
  2014-06-12 17:09                           ` Eli Zaretskii
  1 sibling, 2 replies; 39+ messages in thread
From: Yao Qi @ 2014-06-12 14:39 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Tom Tromey, Joel Brobecker, gdb-patches

On 06/12/2014 07:36 PM, Pedro Alves wrote:
> What does "show host-charset" show on Windows, before and after
> you make GDB pick LC_CTYPE=C from the environment (with the
> setlocale gnulib module)?

GDB on Windows gets host charset from GetACP(), in
charset.c:_initialize_charset ().

#elif defined (USE_WIN32API)
  {
    /* "CP" + x<=5 digits + paranoia.  */
    static char w32_host_default_charset[16];

    snprintf (w32_host_default_charset, sizeof w32_host_default_charset,
	      "CP%d", GetACP());
    auto_host_charset_name = w32_host_default_charset;
    auto_target_charset_name = auto_host_charset_name;
  }
#endif

GetACP doesn't depend on locale, so I don't think LC_CTYPE=C affects the
host-charset in GDB.  However, I do this:

  printf ("%d\n", GetACP());

  setlocale (LC_CTYPE, "");
  printf ("%d\n", GetACP());

  setlocale (LC_CTYPE, "C");
  printf ("%d\n", GetACP());

On my Windows machine, 1252 is printed three times.
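
Put together as a compilable sketch (the function names are hypothetical;
only the "CP%d"-style formatting and the GetACP call come from the code
above, and the Windows part is guarded so the rest builds elsewhere):

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: format a Windows code page number the way the
   charset.c snippet does, e.g. 1252 -> "CP1252".  Returns the length.  */
int
codepage_charset_name (unsigned int codepage, char *buf, size_t size)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size, "CP%u", codepage);
}

#ifdef _WIN32
/* GetACP reports the ANSI code page; as the experiment above shows, it
   is not changed by setlocale.  */
int
host_charset_name (char *buf, size_t size)
{
  return codepage_charset_name (GetACP (), buf, size);
}
#endif
```

On a machine where GetACP returns 1252, host_charset_name yields "CP1252"
regardless of any earlier setlocale call.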

> 
> (Ideally, the wchar tests would actually iterate testing GDB
> behaves as expected with different values of LC_CTYPE, etc. set
> in the environment.  With all other tests assuming ASCII as set
> by default by the testsuite framework.)

That works only if we know, or can enumerate, the expected output for
wchars under each LC_CTYPE on each host (or OS).  Tests like this are
out of the scope of GDB (or debugger) testing, IMO.

-- 
Yao (齐尧)

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-12  7:04                               ` Yao Qi
@ 2014-06-12 17:03                                 ` Eli Zaretskii
  2014-06-17  1:03                                   ` Yao Qi
  0 siblings, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2014-06-12 17:03 UTC (permalink / raw)
  To: Yao Qi; +Cc: palves, tromey, brobecker, gdb-patches

> Date: Thu, 12 Jun 2014 15:02:57 +0800
> From: Yao Qi <yao@codesourcery.com>
> CC: <palves@redhat.com>, <tromey@redhat.com>, <brobecker@adacore.com>,
> 	<gdb-patches@sourceware.org>
> 
> On 06/12/2014 10:46 AM, Eli Zaretskii wrote:
> > So you need _any_ character for which iswprint returns zero?  If so,
> > does the character have to be a single byte?
> 
> Find a character for which iswprint returns zero isn't the point, IMO.
> The problem is wchar.exp expects "\242" but GDB prints cent sign on
> mingw.  Instead of changing to another character, isn't better to match
> both (\242 and cent sign) in regexp pattern?

Maybe.  Can you tell what is the purpose of the test?  (Sorry, I know
almost nothing about the test suite.)

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-12 14:39                           ` Yao Qi
@ 2014-06-12 17:07                             ` Eli Zaretskii
  2014-06-12 17:23                             ` Pedro Alves
  1 sibling, 0 replies; 39+ messages in thread
From: Eli Zaretskii @ 2014-06-12 17:07 UTC (permalink / raw)
  To: Yao Qi; +Cc: palves, tromey, brobecker, gdb-patches

> Date: Thu, 12 Jun 2014 22:37:38 +0800
> From: Yao Qi <yao@codesourcery.com>
> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>,	<gdb-patches@sourceware.org>
> 
> GetACP doesn't depend on locale, so I don't think LC_CTYPE=C affects the
> host-charset in GDB.

Indeed, it doesn't.

> However, I do this:
> 
>   printf ("%d\n", GetACP());
> 
>   setlocale (LC_CTYPE, "");
>   printf ("%d\n", GetACP());
> 
>   setlocale (LC_CTYPE, "C");
>   printf ("%d\n", GetACP());
> 
> On my Windows machine, 1252 is printed three times.

As expected: GetACP returns the _default_ codepage, and the default
does not change when you change a locale.  And the iswprint function
doesn't consult the default codepage.  So I don't think this issue
with GetACP is at all relevant.

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-12 11:36                         ` Pedro Alves
  2014-06-12 14:39                           ` Yao Qi
@ 2014-06-12 17:09                           ` Eli Zaretskii
  2014-06-12 17:27                             ` Pedro Alves
  1 sibling, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2014-06-12 17:09 UTC (permalink / raw)
  To: Pedro Alves; +Cc: yao, tromey, brobecker, gdb-patches

> Date: Thu, 12 Jun 2014 12:36:29 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>,        gdb-patches@sourceware.org
> 
> (Ideally, the wchar tests would actually iterate testing GDB
> behaves as expected with different values of LC_CTYPE, etc. set
> in the environment.  With all other tests assuming ASCII as set
> by default by the testsuite framework.)

What do you mean by "behaves as expected"?  And why is LC_CTYPE
important here?

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-12 14:39                           ` Yao Qi
  2014-06-12 17:07                             ` Eli Zaretskii
@ 2014-06-12 17:23                             ` Pedro Alves
  2014-06-12 17:48                               ` Eli Zaretskii
  2014-06-17  3:46                               ` Yao Qi
  1 sibling, 2 replies; 39+ messages in thread
From: Pedro Alves @ 2014-06-12 17:23 UTC (permalink / raw)
  To: Yao Qi; +Cc: Tom Tromey, Joel Brobecker, gdb-patches

On 06/12/2014 03:37 PM, Yao Qi wrote:
> On 06/12/2014 07:36 PM, Pedro Alves wrote:
>> What does "show host-charset" show on Windows, before and after
>> you make GDB pick LC_CTYPE=C from the environment (with the
>> setlocale gnulib module)?
> 
> GDB on Windows gets host charset from GetACP(), in
> charset.c:_initialize_charset ().
> 
> #elif defined (USE_WIN32API)
>   {
>     /* "CP" + x<=5 digits + paranoia.  */
>     static char w32_host_default_charset[16];
> 
>     snprintf (w32_host_default_charset, sizeof w32_host_default_charset,
> 	      "CP%d", GetACP());
>     auto_host_charset_name = w32_host_default_charset;
>     auto_target_charset_name = auto_host_charset_name;
>   }
> #endif
> 

I note gnulib's nl_langinfo replacement actually does
the same thing.

> GetACP doesn't depend on locale, 

Yeah, it's a mess, and those are really different
things.  The former is the system locale, while the latter
the user locale.  MSDN is confusing, but there are lots of blogs around
explaining this.

> so I don't think LC_CTYPE=C affects the
> host-charset in GDB.  However, I do this:
> 
>   printf ("%d\n", GetACP());
> 
>   setlocale (LC_CTYPE, "");
>   printf ("%d\n", GetACP());
> 
>   setlocale (LC_CTYPE, "C");
>   printf ("%d\n", GetACP());
> 
> On my Windows machine, 1252 is printed three times.

So what I'm thinking is indeed going with making the test
accept the cent, but conditioned, like:

# Fall back to assuming 7-bit ASCII.  Tests are run under LC_CTYPE=C.

set cent "\\\\242"

set test "show host-charset"
gdb_test_multiple $test $test {
   -re "CP1252\r\n$gdb_prompt $" {
        # With Windows code page 1252 (Latin 1), the cent
        # is printable.
	set cent "\u00A2"
	pass $test
   }
   -re "$gdb_prompt $" {
	pass $test
   }
}

> 
>>
>> (Ideally, the wchar tests would actually iterate testing GDB
>> behaves as expected with different values of LC_CTYPE, etc. set
>> in the environment.  With all other tests assuming ASCII as set
>> by default by the testsuite framework.)
> 
> On the condition that we know or enumerate the expected output for
> wchars under each LC_CTYPE on different host (or OS).  Test like this
> is out of the scope of GDB (or debugger) testing, IMO.

Not an exhaustive test, and not by host, but just by picking a couple
charsets/locales.  So that we at least ensure that the framework is
all in sync.  That is, check:

$ unset LC_CTYPE; gdb -ex "show host-charset" -ex ' p "\u00A2"' --batch
$ LC_CTYPE=XXX gdb -ex "show host-charset" -ex ' p "\u00A2"' --batch
$ LC_CTYPE=en_US gdb -ex "show host-charset" -ex ' p "\u00A2"' --batch
$ LC_CTYPE=en_US.UTF-8 gdb -ex "show host-charset" -ex ' p "\u00A2"' --batch

-- 
Pedro Alves

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-12 17:09                           ` Eli Zaretskii
@ 2014-06-12 17:27                             ` Pedro Alves
  2014-06-12 17:50                               ` Eli Zaretskii
  0 siblings, 1 reply; 39+ messages in thread
From: Pedro Alves @ 2014-06-12 17:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: yao, tromey, brobecker, gdb-patches

On 06/12/2014 06:08 PM, Eli Zaretskii wrote:
>> Date: Thu, 12 Jun 2014 12:36:29 +0100
>> From: Pedro Alves <palves@redhat.com>
>> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>,        gdb-patches@sourceware.org
>>
>> (Ideally, the wchar tests would actually iterate testing GDB
>> behaves as expected with different values of LC_CTYPE, etc. set
>> in the environment.  With all other tests assuming ASCII as set
>> by default by the testsuite framework.)
> 
> What do you mean by "behaves as expected"?  And why is LC_CTYPE
> important here?

I think I've answered this in my response to Yao.

-- 
Pedro Alves

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-12 17:23                             ` Pedro Alves
@ 2014-06-12 17:48                               ` Eli Zaretskii
  2014-06-17  3:46                               ` Yao Qi
  1 sibling, 0 replies; 39+ messages in thread
From: Eli Zaretskii @ 2014-06-12 17:48 UTC (permalink / raw)
  To: Pedro Alves; +Cc: yao, tromey, brobecker, gdb-patches

> Date: Thu, 12 Jun 2014 18:23:34 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>,        gdb-patches@sourceware.org
> 
> On 06/12/2014 03:37 PM, Yao Qi wrote:
> > On 06/12/2014 07:36 PM, Pedro Alves wrote:
> >> What does "show host-charset" show on Windows, before and after
> >> you make GDB pick LC_CTYPE=C from the environment (with the
> >> setlocale gnulib module)?
> > 
> > GDB on Windows gets host charset from GetACP(), in
> > charset.c:_initialize_charset ().
> > 
> > #elif defined (USE_WIN32API)
> >   {
> >     /* "CP" + x<=5 digits + paranoia.  */
> >     static char w32_host_default_charset[16];
> > 
> >     snprintf (w32_host_default_charset, sizeof w32_host_default_charset,
> > 	      "CP%d", GetACP());
> >     auto_host_charset_name = w32_host_default_charset;
> >     auto_target_charset_name = auto_host_charset_name;
> >   }
> > #endif
> > 
> 
> I note gnulib's nl_langinfo replacement actually does
> the same thing.

And gnulib's nl_langinfo is wrong, btw, because one can use
'setlocale' to change the codeset, without any relation whatsoever to
the console encoding.  (I sent a fix for that to gnulib's list just
yesterday.)

> > GetACP doesn't depend on locale, 
> 
> Yeah, it's a mess, and those are really different
> things.  The former is the system locale, while the latter
> the user locale.

That's true, but that's not the important issue here.  The important
issue here is the fundamental difference between the Windows console
encoding and the current locale's codeset.  The former affects how
Windows writes to the console, and in most cases changing the console
codepage (e.g., with SetConsoleCP or SetConsoleOutputCP) is a futile
exercise, because all it does is cause garbled display.  The latter is
an important feature when you are dealing with programs that don't
intend using the codeset to display text to the user, but, for
example, to change the behavior of iswprint.

Using the console codepage when really the locale's codeset is needed
is only going to work for the default setup, not when you want or
need to change the locale.

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-12 17:27                             ` Pedro Alves
@ 2014-06-12 17:50                               ` Eli Zaretskii
  2014-06-12 18:06                                 ` Pedro Alves
  0 siblings, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2014-06-12 17:50 UTC (permalink / raw)
  To: Pedro Alves; +Cc: yao, tromey, brobecker, gdb-patches

> Date: Thu, 12 Jun 2014 18:26:47 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: yao@codesourcery.com, tromey@redhat.com, brobecker@adacore.com,        gdb-patches@sourceware.org
> 
> > What do you mean by "behaves as expected"?  And why is LC_CTYPE
> > important here?
> 
> I think I've answered this in my response to Yao.

Not really, but you don't have to explain as long as the original
problem is solved.

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-12 17:50                               ` Eli Zaretskii
@ 2014-06-12 18:06                                 ` Pedro Alves
  2014-06-12 18:35                                   ` Eli Zaretskii
  0 siblings, 1 reply; 39+ messages in thread
From: Pedro Alves @ 2014-06-12 18:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: yao, tromey, brobecker, gdb-patches

On 06/12/2014 06:49 PM, Eli Zaretskii wrote:
>> Date: Thu, 12 Jun 2014 18:26:47 +0100
>> From: Pedro Alves <palves@redhat.com>
>> CC: yao@codesourcery.com, tromey@redhat.com, brobecker@adacore.com,        gdb-patches@sourceware.org
>>
>>> What do you mean by "behaves as expected"?  And why is LC_CTYPE
>>> important here?
>>
>> I think I've answered this in my response to Yao.
> 
> Not really, but you don't have to explain as long as the original
> problem is solved.

Trying again then.

The testsuite framework does, in gdb.exp:gdb_init:

    # We set LC_ALL, LC_CTYPE, and LANG to C so that we get the same
    # messages as expected.
    setenv LC_ALL C
    setenv LC_CTYPE C
    setenv LANG C

... so that output is stable for everyone.
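
On POSIX hosts those three variables are consulted in a fixed order: LC_ALL
first, then the category variable (LC_CTYPE here), then LANG, skipping any
that are unset or empty.  A minimal sketch of that lookup follows; the name
effective_ctype_name and its fallback parameter are hypothetical, and this
describes POSIX behaviour only, not what any particular setlocale does
internally.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper.  Return the locale name that POSIX precedence
   selects for LC_CTYPE from the environment: LC_ALL, then LC_CTYPE,
   then LANG.  Unset or empty variables are skipped.  FALLBACK is
   returned when none of them is usable.  */
const char *
effective_ctype_name (const char *fallback)
{
  static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
  size_t i;

  assert (fallback != NULL);
  for (i = 0; i < sizeof vars / sizeof vars[0]; i++)
    {
      const char *value = getenv (vars[i]);

      if (value != NULL && value[0] != '\0')
	return value;
    }
  return fallback;
}
```

Because LC_ALL wins over the other two, setting all three to C as gdb_init
does leaves no way for a stray LC_CTYPE or LANG in the user's environment
to change the result.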

And if we do that, we miss making sure GDB works correctly
with locales/charsets other than C/ASCII on most hosts.

So I was just saying that IMO ideally we'd have tests that
make sure GDB prints what we think it should print when
LC_CTYPE (etc.) is set to something else, like e.g.,
en_US.UTF-8.

Does that answer the question?

-- 
Pedro Alves

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-12 18:06                                 ` Pedro Alves
@ 2014-06-12 18:35                                   ` Eli Zaretskii
  2014-06-16 13:58                                     ` Pedro Alves
  0 siblings, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2014-06-12 18:35 UTC (permalink / raw)
  To: Pedro Alves; +Cc: yao, tromey, brobecker, gdb-patches

> Date: Thu, 12 Jun 2014 19:05:52 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: yao@codesourcery.com, tromey@redhat.com, brobecker@adacore.com,
>         gdb-patches@sourceware.org
> 
> Trying again then.

Thanks.

> The testsuite framework does, in gdb.exp:gdb_init:
> 
>     # We set LC_ALL, LC_CTYPE, and LANG to C so that we get the same
>     # messages as expected.
>     setenv LC_ALL C
>     setenv LC_CTYPE C
>     setenv LANG C
> 
> ... so that output is stable for everyone.

With you so far.  But note that on Windows, even the above does not
guarantee "stable output", because the console codepage is not changed
by 'setlocale', and moreover, the Windows 'setlocale' doesn't pay
attention to environment variables.  So on Windows, these tests run in
the default system locale (because we call 'setlocale' with the 2nd
argument an empty string).

> And if we do that, we miss making sure GDB works correctly
> with locales/charsets other than C/ASCII on most hosts.

And here, "works correctly" means what? sets host-charset? or
something else?  Assuming the former below.

> So I was just saying that IMO ideally we'd have tests that
> make sure GDB prints what we think it should print when
> LC_CTYPE (etc.) is set to something else, like e.g.,
> en_US.UTF-8.

You cannot ask the Windows 'setlocale' to use UTF-8 as the codeset
(although there is a UTF-8 codepage, and Windows does support it in
general).  More importantly, since 'setlocale' on Windows disregards
the environment variables, you cannot change the host charset by
setting environment variables.  You must do that by a GDB command that
sets host-charset.

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-12 18:35                                   ` Eli Zaretskii
@ 2014-06-16 13:58                                     ` Pedro Alves
  2014-06-16 15:40                                       ` Eli Zaretskii
  0 siblings, 1 reply; 39+ messages in thread
From: Pedro Alves @ 2014-06-16 13:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: yao, tromey, brobecker, gdb-patches

On 06/12/2014 07:34 PM, Eli Zaretskii wrote:

> With you so far.  But note that on Windows, even the above does not
> guarantee "stable output", because the console codepage is not changed
> by 'setlocale', 

I guess the harness could run gdb under chcp 65001 or some such.

> You cannot ask the Windows 'setlocale' to use UTF-8 as the codeset
> (although there is a UTF-8 codepage, and Windows does support it in
> general).  More importantly, since 'setlocale' on Windows disregards
> the environment variables, you cannot change the host charset by
> setting environment variables.  You must do that by a GDB command that
> sets host-charset.

See https://sourceware.org/ml/gdb-patches/2014-06/msg00364.html .

-- 
Pedro Alves

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-16 13:58                                     ` Pedro Alves
@ 2014-06-16 15:40                                       ` Eli Zaretskii
  2014-06-16 16:23                                         ` Pedro Alves
  0 siblings, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2014-06-16 15:40 UTC (permalink / raw)
  To: Pedro Alves; +Cc: yao, tromey, brobecker, gdb-patches

> Date: Mon, 16 Jun 2014 14:57:59 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: yao@codesourcery.com, tromey@redhat.com, brobecker@adacore.com,
>         gdb-patches@sourceware.org
> 
> On 06/12/2014 07:34 PM, Eli Zaretskii wrote:
> 
> > With you so far.  But note that on Windows, even the above does not
> > guarantee "stable output", because the console codepage is not changed
> > by 'setlocale', 
> 
> I guess the harness could run gdb under chcp 65001 or some such.

You could, but it won't help, really.  It's a long story, but support
for UTF-8 on a Windows console is pathetic.  With enough trouble
(which will need source changes in GDB and in Readline), you might
have European characters displayed correctly, if you also change the
console font to Lucida Console.  But anything beyond European
characters simply cannot be displayed, because the font doesn't have
them.

> > You cannot ask the Windows 'setlocale' to use UTF-8 as the codeset
> > (although there is a UTF-8 codepage, and Windows does support it in
> > general).  More importantly, since 'setlocale' on Windows disregards
> > the environment variables, you cannot change the host charset by
> > setting environment variables.  You must do that by a GDB command that
> > sets host-charset.
> 
> See https://sourceware.org/ml/gdb-patches/2014-06/msg00364.html .

If you mean the last 2 sentences, then yes, using setlocale from
gnulib will fix that.  But the problem with UTF-8 as the charset isn't
(and AFAIK cannot be) solved by gnulib, because Windows simply does
not support codepage 65001 in its setlocale implementation (this is
documented in MSDN).

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-16 15:40                                       ` Eli Zaretskii
@ 2014-06-16 16:23                                         ` Pedro Alves
  0 siblings, 0 replies; 39+ messages in thread
From: Pedro Alves @ 2014-06-16 16:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: yao, tromey, brobecker, gdb-patches

On 06/16/2014 04:40 PM, Eli Zaretskii wrote:

>>> With you so far.  But note that on Windows, even the above does not
>>> guarantee "stable output", because the console codepage is not changed
>>> by 'setlocale', 
>>
>> I guess the harness could run gdb under chcp 65001 or some such.
> 
> You could, but it won't help, really.  It's a long story, but support
> for UTF-8 on a Windows console is pathetic.  With enough trouble
> (which will need source changes in GDB and in Readline), you might
> have European characters displayed correctly, if you also change the
> console font to Lucida Console.  But anything beyond European
> characters simply cannot be displayed, because the font doesn't have
> them.

OK, I was focusing more on the "stable output" aspect than
the specific codepage.

-- 
Pedro Alves

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-12 17:03                                 ` Eli Zaretskii
@ 2014-06-17  1:03                                   ` Yao Qi
  0 siblings, 0 replies; 39+ messages in thread
From: Yao Qi @ 2014-06-17  1:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: palves, tromey, brobecker, gdb-patches

On 06/13/2014 01:02 AM, Eli Zaretskii wrote:
> Maybe.  Can you tell what is the purpose of the test?  (Sorry, I know
> almost nothing about the test suite.)

Eli,
This test was added by the following patch, which fixes the
incorrect placement of the comma in repeated characters,

 [RFA] gdb/14288
 https://sourceware.org/ml/gdb-patches/2012-08/msg00780.html

The test wasn't about the \242-versus-cent-sign printing we discussed here.

-- 
Yao (齐尧)

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-12 17:23                             ` Pedro Alves
  2014-06-12 17:48                               ` Eli Zaretskii
@ 2014-06-17  3:46                               ` Yao Qi
  2014-06-17 10:03                                 ` Pedro Alves
  1 sibling, 1 reply; 39+ messages in thread
From: Yao Qi @ 2014-06-17  3:46 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Tom Tromey, Joel Brobecker, gdb-patches

On 06/13/2014 01:23 AM, Pedro Alves wrote:
> So what I'm thinking is indeed going with making the test
> accept the cent, but conditioned, like:

OK, that is more restrictive.

> 
> # Fall back to assuming 7-bit ASCII.  Tests are run under LC_CTYPE=C.
> 
> set cent "\\\\242"
> 
> set test "show host-charset"
> gdb_test_multiple $test $test {
>    -re "CP1252\r\n$gdb_prompt $" {

I tweaked the pattern to match the output...

>         # With Windows code page 1252 (Latin 1), the cent
>         # is printable.
> 	set cent "\u00A2"
> 	pass $test
>    }
>    -re "$gdb_prompt $" {
> 	pass $test
>    }
> }

... and how about the patch below?

-- 
Yao (齐尧)

Subject: [PATCH] Different outputs affected by hosts
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We find the following fails in gdb test on mingw host.

FAIL: gdb.base/wchar.exp: print repeat
FAIL: gdb.base/wchar.exp: print repeat_p
FAIL: gdb.base/wchar.exp: print repeat (print null on)
FAIL: gdb.base/wchar.exp: print repeat (print elements 3)
FAIL: gdb.base/wchar.exp: print repeat_p (print elements 3)

print repeat^M
$7 = L"A", '¢' <repeats 21 times>, "B", '\000' <repeats 104 times>^M
(gdb) FAIL: gdb.base/wchar.exp: print repeat

the \242 is expected in the test but cent sign is displayed.

In valprint.c:print_wchar, wchar_printable is called to determine
whether a wchar is printable.  wchar_printable calls iswprint but
the iswprint's return value depends on LC_CTYPE setting of locale [1, 2].
The output may vary with different locale settings and OS.  IMO, '¢'
(cent sign) is a correct output on Windows.

[1] http://pubs.opengroup.org/onlinepubs/009604499/functions/iswprint.html
[2] http://msdn.microsoft.com/en-us/library/ewx8s4kw.aspx

This patch sets $cent to the cent sign if GDB is running on a
Windows host.

gdb/testsuite:

2014-06-17  Yao Qi  <yao@codesourcery.com>

	* gdb.base/wchar.exp: Set $cent to \u00A2 if "host-charset" is
	CP1252.
---
 gdb/testsuite/gdb.base/wchar.exp | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/gdb/testsuite/gdb.base/wchar.exp b/gdb/testsuite/gdb.base/wchar.exp
index 4290478..651bd88 100644
--- a/gdb/testsuite/gdb.base/wchar.exp
+++ b/gdb/testsuite/gdb.base/wchar.exp
@@ -37,6 +37,20 @@ gdb_test "print simple\[2\]" "= 99 L'c'"
 gdb_test "print difficile\[2\]" "= 65261 L'\\\\xfeed'"
 
 set cent "\\\\242"
+
+set test "show host-charset"
+gdb_test_multiple $test $test {
+   -re "CP1252\".*\r\n$gdb_prompt $" {
+       # With Windows code page 1252 (Latin 1), the cent
+       # is printable.
+	set cent "\u00A2"
+	pass $test
+   }
+   -re "$gdb_prompt $" {
+	pass $test
+   }
+}
+
 gdb_test "print repeat" "= L\"A\", '$cent' <repeats 21 times>, \"B.*"
 
 global hex
-- 
1.9.0

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-17  3:46                               ` Yao Qi
@ 2014-06-17 10:03                                 ` Pedro Alves
  2014-06-17 11:39                                   ` Yao Qi
  0 siblings, 1 reply; 39+ messages in thread
From: Pedro Alves @ 2014-06-17 10:03 UTC (permalink / raw)
  To: Yao Qi; +Cc: Tom Tromey, Joel Brobecker, gdb-patches

On 06/17/2014 04:44 AM, Yao Qi wrote:
> On 06/13/2014 01:23 AM, Pedro Alves wrote:
>> So what I'm thinking is indeed going with making the test
>> accept the cent, but conditioned, like:
> 
> OK, that is more restrictive.
> 
>>
>> # Fall back to assuming 7-bit ASCII.  Tests are run under LC_CTYPE=C.
>>
>> set cent "\\\\242"
>>
>> set test "show host-charset"
>> gdb_test_multiple $test $test {
>>    -re "CP1252\r\n$gdb_prompt $" {
> 
> I tweak the pattern to match the output...
> 
>>         # With Windows code page 1252 (Latin 1), the cent
>>         # is printable.
>> 	set cent "\u00A2"
>> 	pass $test
>>    }
>>    -re "$gdb_prompt $" {
>> 	pass $test
>>    }
>> }
> 
> ... and how about the patch below?
> 

Looks good to me.

Thanks,
-- 
Pedro Alves

* Re: [ping]  [PATCH] Different outputs affected by locale
  2014-06-17 10:03                                 ` Pedro Alves
@ 2014-06-17 11:39                                   ` Yao Qi
  0 siblings, 0 replies; 39+ messages in thread
From: Yao Qi @ 2014-06-17 11:39 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Tom Tromey, Joel Brobecker, gdb-patches

On 06/17/2014 06:03 PM, Pedro Alves wrote:
> Looks good to me.

Thanks for the review.  Patch is pushed in.

-- 
Yao (齐尧)


end of thread, other threads:[~2014-06-17 11:39 UTC | newest]

Thread overview: 39+ messages
2014-05-27 12:13 [PATCH] Different outputs affected by locale Yao Qi
2014-06-04  5:32 ` [ping] " Yao Qi
2014-06-04 12:47   ` Joel Brobecker
2014-06-04 13:21     ` Yao Qi
2014-06-04 13:52       ` Joel Brobecker
2014-06-04 20:15       ` Tom Tromey
2014-06-04 20:23         ` Pedro Alves
2014-06-05  3:31           ` Yao Qi
2014-06-05  8:58             ` Pedro Alves
2014-06-05  9:58               ` Yao Qi
2014-06-05 10:12                 ` Pedro Alves
2014-06-05 15:04                   ` Eli Zaretskii
2014-06-09  8:37                   ` Yao Qi
2014-06-09 10:11                     ` Pedro Alves
2014-06-11  2:22                       ` Yao Qi
2014-06-11 16:23                         ` Eli Zaretskii
2014-06-12  0:48                           ` Yao Qi
2014-06-12  2:47                             ` Eli Zaretskii
2014-06-12  7:04                               ` Yao Qi
2014-06-12 17:03                                 ` Eli Zaretskii
2014-06-17  1:03                                   ` Yao Qi
2014-06-12 11:36                         ` Pedro Alves
2014-06-12 14:39                           ` Yao Qi
2014-06-12 17:07                             ` Eli Zaretskii
2014-06-12 17:23                             ` Pedro Alves
2014-06-12 17:48                               ` Eli Zaretskii
2014-06-17  3:46                               ` Yao Qi
2014-06-17 10:03                                 ` Pedro Alves
2014-06-17 11:39                                   ` Yao Qi
2014-06-12 17:09                           ` Eli Zaretskii
2014-06-12 17:27                             ` Pedro Alves
2014-06-12 17:50                               ` Eli Zaretskii
2014-06-12 18:06                                 ` Pedro Alves
2014-06-12 18:35                                   ` Eli Zaretskii
2014-06-16 13:58                                     ` Pedro Alves
2014-06-16 15:40                                       ` Eli Zaretskii
2014-06-16 16:23                                         ` Pedro Alves
2014-06-05 10:27                 ` Pedro Alves
2014-06-05 14:47             ` Eli Zaretskii
