public inbox for gdb-prs@sourceware.org
* [Bug symtab/29205] New: Running selftest dw2_expand_symtabs_matching. warning: could not convert 'yfunc�' from the host encoding (UTF-8) to UTF-32
From: vries at gcc dot gnu.org @ 2022-05-31 11:08 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29205

            Bug ID: 29205
           Summary: Running selftest dw2_expand_symtabs_matching. warning:
                    could not convert 'yfunc�' from the host encoding
                    (UTF-8) to UTF-32
           Product: gdb
           Version: HEAD
            Status: NEW
          Severity: normal
          Priority: P2
         Component: symtab
          Assignee: unassigned at sourceware dot org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

I run into:
...
$ gdb -q -batch -ex "maint selftest dw2_expand_symtabs_matching" 

Running selftest dw2_expand_symtabs_matching.
warning: could not convert 'yfunc�' from the host encoding (UTF-8) to UTF-32.
This normally should not happen, please file a bug report.
warning: charset conversion failure for 'yfunc�'.
You may have the wrong value for 'set ada source-charset'.
Ran 1 unit tests, 0 failed
...

-- 
You are receiving this mail because:
You are on the CC list for the bug.

From: vries at gcc dot gnu.org @ 2022-05-31 11:11 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29205

--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
Created attachment 14124
  --> https://sourceware.org/bugzilla/attachment.cgi?id=14124&action=edit
Tentative fix

The unit test uses '\377' for 'ÿ', but AFAICT in UTF-8 a lone 0xff byte is
never valid: it can neither start nor continue a multibyte sequence.

This patch replaces each '\377' with 'ÿ', and fixes the warning.
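
To make the encoding point concrete, here is a minimal sketch of a UTF-8
well-formedness check. It is not GDB code: is_valid_utf8 is a hypothetical
helper written only for this illustration. It shows that "yfunc\377" is
rejected, while the two-byte encoding of 'ÿ' (0xc3 0xbf) is accepted.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of GDB): return true if S is well-formed
// UTF-8.  Checks lead-byte ranges, continuation bytes, overlong forms,
// surrogates and the U+10FFFF upper limit.
bool is_valid_utf8(const std::string &s) {
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t len;
    unsigned long cp;
    if (c < 0x80) { len = 1; cp = c; }
    else if (c >= 0xc2 && c <= 0xdf) { len = 2; cp = c & 0x1f; }
    else if (c >= 0xe0 && c <= 0xef) { len = 3; cp = c & 0x0f; }
    else if (c >= 0xf0 && c <= 0xf4) { len = 4; cp = c & 0x07; }
    else return false;  // 0x80..0xc1 and 0xf5..0xff are never lead bytes

    if (i + len > n)
      return false;  // sequence truncated at end of string

    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char cc = static_cast<unsigned char>(s[i + k]);
      if ((cc & 0xc0) != 0x80)
        return false;  // not a continuation byte
      cp = (cp << 6) | (cc & 0x3f);
    }

    if (len == 3 && cp < 0x800) return false;                       // overlong
    if (len == 4 && (cp < 0x10000 || cp > 0x10ffff)) return false;  // range
    if (cp >= 0xd800 && cp <= 0xdfff) return false;                 // surrogate
    i += len;
  }
  return true;
}
```

With this sketch, is_valid_utf8 ("yfunc\377") is false and
is_valid_utf8 ("yfunc\xc3\xbf") is true.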

From: vries at gcc dot gnu.org @ 2022-05-31 11:11 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29205

Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |palves at sourceware dot org

From: vries at gcc dot gnu.org @ 2022-05-31 11:25 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29205

--- Comment #2 from Tom de Vries <vries at gcc dot gnu.org> ---
OTOH, if I run the test with LC_ALL=C, I get:
...
$ LC_ALL=C gdb -q -batch -ex "maint selftest dw2_expand_symtabs_matching" 
Running selftest dw2_expand_symtabs_matching.
warning: could not convert 'u8função' from the host encoding (ANSI_X3.4-1968)
to UTF-32.
This normally should not happen, please file a bug report.
warning: charset conversion failure for 'u8função'.
You may have the wrong value for 'set ada source-charset'.
Ran 1 unit tests, 0 failed
...

So I guess here we try to interpret a UTF-8 string as ANSI_X3.4-1968.
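
As a rough illustration of why that conversion fails: ANSI_X3.4-1968 is
US-ASCII, which only defines bytes 0x00..0x7f, while the UTF-8 encoding of
"u8função" contains bytes above that range. The helper below is a
hypothetical sketch, not GDB code; it reports the offset of the first byte
that cannot be ASCII.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of GDB): return the index of the first
// byte in S that is outside the 7-bit ASCII range, or std::string::npos
// if every byte is plain ASCII.
std::size_t first_non_ascii(const std::string &s) {
  for (std::size_t i = 0; i < s.size(); ++i) {
    if (static_cast<unsigned char>(s[i]) >= 0x80)
      return i;
  }
  return std::string::npos;
}
```

For the UTF-8 bytes of "u8função" ("u8fun\xc3\xa7\xc3\xa3o") this returns 5,
the offset of the 0xc3 lead byte of 'ç'.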

From: pedro at palves dot net @ 2022-05-31 11:58 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29205

Pedro Alves <pedro at palves dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pedro at palves dot net

--- Comment #3 from Pedro Alves <pedro at palves dot net> ---
The point of those tests that use \377 is to explicitly test 0xff, not whatever
is "ÿ" in UTF-8, because the name matching algorithm increments the last char
of the lookup string by 1, and 0xff would wrap around.  The tests are making
sure that is handled properly.  See the comments in make_sort_after_prefix_name.

I think the "This normally should not happen, please file a bug report."
warning is too strict -- I imagine it should be possible to pass such symbol
names to the compiler by using escape codes.
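
The wraparound concern can be sketched as follows. This is only an
illustration of the general technique under my own assumptions, not the
actual body of make_sort_after_prefix_name; the name upper_bound_for_prefix
is hypothetical. The idea is that every string starting with a prefix sorts
(in unsigned byte order) before the prefix with its last byte incremented,
and that trailing 0xff bytes must be dropped first because incrementing
them would wrap to 0x00.

```cpp
#include <string>

// Hypothetical sketch (not GDB's implementation): return the smallest
// string that sorts after every string starting with PREFIX, assuming
// unsigned byte-wise comparison.  An empty result means "no upper bound".
std::string upper_bound_for_prefix(std::string prefix) {
  // A trailing 0xff cannot be incremented without wrapping around, so
  // strip such bytes and increment the byte before them instead.
  while (!prefix.empty()
         && static_cast<unsigned char>(prefix.back()) == 0xff)
    prefix.pop_back();

  if (!prefix.empty()) {
    const unsigned char last = static_cast<unsigned char>(prefix.back());
    prefix.back() = static_cast<char>(last + 1);
  }
  return prefix;
}
```

Under these assumptions, "func" maps to "fund", "yfunc\377" maps to "yfund",
and a prefix consisting only of 0xff bytes maps to the empty string.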

From: vries at gcc dot gnu.org @ 2022-05-31 12:06 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29205

Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tromey at sourceware dot org

From: vries at gcc dot gnu.org @ 2022-05-31 12:08 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29205

--- Comment #4 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #2)
> OTOH, if I run the test with LC_ALL=C, I get:
> ...
> $ LC_ALL=C gdb -q -batch -ex "maint selftest dw2_expand_symtabs_matching" 
> Running selftest dw2_expand_symtabs_matching.
> warning: could not convert 'u8função' from the host encoding
> (ANSI_X3.4-1968) to UTF-32.
> This normally should not happen, please file a bug report.
> warning: charset conversion failure for 'u8função'.
> You may have the wrong value for 'set ada source-charset'.
> Ran 1 unit tests, 0 failed
> ...
> 
> So I guess here we try to interpret a UTF-8 string as ANSI_X3.4-1968.

FWIW, fixed by:
...
diff --git a/gdb/charset.c b/gdb/charset.c
index 74f742e0aa7..4dfd05e8851 100644
--- a/gdb/charset.c
+++ b/gdb/charset.c
@@ -225,7 +225,7 @@ gdb_iconv (iconv_t utf_flag, ICONV_CONST char **inbuf, size_t *inbytesleft,
 #endif

 static const char *auto_host_charset_name = GDB_DEFAULT_HOST_CHARSET;
-static const char *host_charset_name = "auto";
+const char *host_charset_name = "auto";
 static void
 show_host_charset_name (struct ui_file *file, int from_tty,
                        struct cmd_list_element *c,
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index c4578c687d2..9cc16d3fd0b 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -3498,6 +3498,8 @@ dw2_expand_symtabs_matching_symbol
   return result;
 }

+extern const char *host_charset_name;
+
 #if GDB_SELF_TEST

 namespace selftests { namespace dw2_expand_symtabs_matching {
@@ -3746,6 +3748,10 @@ test_dw2_expand_symtabs_matching_symbol ()
   /* Identity checks.  */
   for (const char *sym : test_symbols)
     {
+      const char *saved_host_charset_name = host_charset_name;
+      if (strcmp (sym, u8"u8função") == 0)
+       host_charset_name = "UTF-8";
+
       /* Should be able to match all existing symbols.  */
       CHECK_MATCH (sym, symbol_name_match_type::FULL, false,
                   EXPECT (sym));
@@ -3767,6 +3773,8 @@ test_dw2_expand_symtabs_matching_symbol ()
       with_params = std::string (sym) + " ( int ) &&";
       CHECK_MATCH (with_params.c_str (), symbol_name_match_type::FULL, false,
                   {});
+
+      host_charset_name = saved_host_charset_name;
     }

   /* Check that the name matching algorithm for completion doesn't get
...

From: vries at gcc dot gnu.org @ 2022-05-31 12:23 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29205

--- Comment #5 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Pedro Alves from comment #3)
> The point of those tests that use \377 is to explicitly test 0xff, not
> whatever is "ÿ" in utf-8, 

If 'ÿ' is not relevant, I would suggest revising the comment:
...
  /* \377 (0xff) is Latin1 'ÿ'.  */
  "yfunc\377",
...
because to me it seems to suggest that it is.

From: pedro at palves dot net @ 2022-05-31 12:58 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29205

--- Comment #6 from Pedro Alves <pedro at palves dot net> ---
Done:
  [pushed] Clarify why we unit test matching symbol names with 0xff characters
  https://sourceware.org/pipermail/gdb-patches/2022-May/189630.html

From: vries at gcc dot gnu.org @ 2022-06-07 13:05 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29205

--- Comment #7 from Tom de Vries <vries at gcc dot gnu.org> ---
Created attachment 14134
  --> https://sourceware.org/bugzilla/attachment.cgi?id=14134&action=edit
Tentative patch (downgrades warning to symbol_lookup_debug)

(In reply to Pedro Alves from comment #3)
> I think the "This normally should not happen, please file a bug report."
> warning is too strict -- I imagine it should be possible to pass such symbol
> names to the compiler by using escape codes.

This patch downgrades the warnings to symbol_lookup_debug messages.

WDYT?
