public inbox for glibc-bugs@sourceware.org
* [Bug string/28898] New: mbrtoc16 segfaults when counting wide characters
From: mbuilov at gmail dot com @ 2022-02-16 6:41 UTC
To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=28898

            Bug ID: 28898
           Summary: mbrtoc16 segfaults when counting wide characters
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: string
          Assignee: unassigned at sourceware dot org
          Reporter: mbuilov at gmail dot com
  Target Milestone: ---

The following test case demonstrates a segmentation fault.
The program works correctly if the first argument to mbrtoc16() is not NULL;
otherwise it crashes.

$ cat bug.c
#include <stdio.h>
#include <string.h>
#include <locale.h>
#include <uchar.h>

/* U+1D520 (4 bytes, needs a surrogate pair) followed by U+056E (2 bytes). */
static const unsigned char utf8[] = {
    0xf0, 0x9d, 0x94, 0xa0, 0xd5, 0xae
};

int main(int argc, char *argv[])
{
    mbstate_t state;
    unsigned i = 0, n = 0;
    const char *const locale = setlocale(LC_ALL, "C.UTF-8");
    if (!locale) {
        fprintf(stderr, "failed to set locale\n");
        return -2;
    }
    memset(&state, 0, sizeof(state));
    /* Feed one byte per call; NULL output because only the count matters. */
    while (i < sizeof(utf8)) {
        const size_t sz = mbrtoc16(NULL, (const char*)&utf8[i], 1, &state);
        if (sz == (size_t)-2) {
            /* Incomplete sequence: supply the next byte. */
            i += 1;
            continue;
        }
        /* (size_t)-3 means a low surrogate was returned; no input consumed. */
        if (sz != (size_t)-3)
            i += 1;
        n++;
    }
    fprintf(stdout, "number of utf16 characters: %u\n", n);
    (void)argc, (void)argv;
    return 0;
}

-------------

The problem is in the following piece of code of mbrtoc16(), in
./wcsmbs/mbrtoc16.c:

  /* The standard text does not say that S being NULL means the state
     is reset even if the second half of a surrogate still have to be
     returned.  In fact, the error code description indicates
     otherwise.  Therefore always first try to return a second
     half.  */
  if (ps->__count & 0x80000000)
    {
      /* We have to return the second word for a surrogate.  */
      ps->__count &= 0x7fffffff;
      *pc16 = ps->__value.__wch;
      ps->__value.__wch = L'\0';
      return (size_t) -3;
    }

- the pc16 pointer is not checked for NULL before being dereferenced.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
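Since the report notes that the program works when the first argument is not
NULL, one possible workaround on affected versions is to pass a throwaway
char16_t instead of NULL. The following is only a sketch under that
assumption, not code from the report or from glibc: count_utf16_units and
count_report_bytes are hypothetical helper names, and the sketch hands
mbrtoc16() all remaining bytes at once rather than one byte per call.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: count the UTF-16 code units mbrtoc16() yields for
   the len bytes at s, in the current locale.  Returns (size_t)-1 on an
   invalid or truncated sequence.  A dummy output variable replaces NULL,
   so the pending-low-surrogate path always has somewhere to store. */
size_t count_utf16_units(const char *s, size_t len)
{
    mbstate_t state;
    char16_t unit = 0;
    size_t i = 0, n = 0;

    memset(&state, 0, sizeof state);
    while (i < len) {
        size_t sz = mbrtoc16(&unit, s + i, len - i, &state);
        /* -1: invalid, -2: truncated, -3: unexpected here because a
           pending low surrogate is always drained below. */
        if (sz == (size_t)-1 || sz == (size_t)-2 || sz == (size_t)-3)
            return (size_t)-1;
        i += sz ? sz : 1;       /* a null character consumes one byte */
        n++;
        if (unit >= 0xD800 && unit <= 0xDBFF) {
            /* High surrogate stored: fetch the low surrogate right away.
               This call consumes no input and should return (size_t)-3. */
            if (mbrtoc16(&unit, s + i, len - i, &state) != (size_t)-3)
                return (size_t)-1;
            n++;
        }
    }
    return n;
}

/* Hypothetical demo on the six bytes from the report.  Returns
   (size_t)-1 if the C.UTF-8 locale cannot be selected. */
size_t count_report_bytes(void)
{
    static const unsigned char utf8[] = {
        0xf0, 0x9d, 0x94, 0xa0, 0xd5, 0xae
    };
    if (!setlocale(LC_ALL, "C.UTF-8"))
        return (size_t)-1;
    return count_utf16_units((const char *)utf8, sizeof utf8);
}
```

For the six bytes in the report this counts three code units (a surrogate
pair plus one BMP character), provided the C.UTF-8 locale is available.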
* [Bug string/28898] mbrtoc16 segfaults when counting wide characters
From: bruno at clisp dot org @ 2023-06-26 13:23 UTC
To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=28898

Bruno Haible <bruno at clisp dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bruno at clisp dot org

--- Comment #1 from Bruno Haible <bruno at clisp dot org> ---
Created attachment 14949
  --> https://sourceware.org/bugzilla/attachment.cgi?id=14949&action=edit
test case bug.c

Find attached another test case: simpler code, without a loop.
It crashes at the call
  ret = mbrtoc16 (NULL, input + 4, 0, &state);
* [Bug string/28898] mbrtoc16 segfaults when counting wide characters
From: bruno at clisp dot org @ 2023-06-26 13:27 UTC
To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=28898

--- Comment #2 from Bruno Haible <bruno at clisp dot org> ---
Reproduced with glibc 2.35 and 2.36.
* [Bug string/28898] mbrtoc16 segfaults when counting wide characters
From: luigighiron at gmail dot com @ 2024-05-04 4:28 UTC
To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=28898

Halalaluyafail3 <luigighiron at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |luigighiron at gmail dot com

--- Comment #3 from Halalaluyafail3 <luigighiron at gmail dot com> ---
This still seems to happen. Also, I think it is worth mentioning that the
current text of the standard does say "Subsequent calls will store successive
wide characters without consuming any additional input until all the
characters have been stored." This could be interpreted as allowing the crash,
since no mention of null is made here (though I do not think that this was
intended).
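If a null pc16 is meant to be tolerated on the "subsequent call" path as
well, the pending-surrogate branch quoted in the original report could guard
its store. The sketch below only models that idea: struct toy_state and
toy_take_pending are hypothetical stand-ins for the two mbstate_t fields and
the early-return branch, not glibc internals and not an actual patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <uchar.h>

/* Hypothetical stand-in for the two state fields the quoted code uses. */
struct toy_state {
    uint32_t count;    /* top bit set: a low surrogate is pending */
    char16_t pending;  /* the pending low surrogate */
};

/* Model of the early-return branch with a NULL guard added.
   Returns (size_t)-3 when a pending low surrogate was consumed,
   0 when nothing was pending. */
size_t toy_take_pending(char16_t *pc16, struct toy_state *ps)
{
    if (ps->count & 0x80000000u) {
        ps->count &= 0x7fffffffu;
        if (pc16 != NULL)        /* the check missing from the quoted code */
            *pc16 = ps->pending;
        ps->pending = 0;
        return (size_t)-3;
    }
    return 0;
}
```

With the guard, a caller that only counts code units can pass NULL and still
receive (size_t)-3 for the second half of a surrogate pair; whether the
standard requires this is the open question raised in comment #3.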