public inbox for glibc-bugs@sourceware.org
* [Bug string/28898] New: mbrtoc16 segfaults when counting wide characters
@ 2022-02-16 6:41 mbuilov at gmail dot com
2023-06-26 13:23 ` [Bug string/28898] " bruno at clisp dot org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: mbuilov at gmail dot com @ 2022-02-16 6:41 UTC
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=28898
Bug ID: 28898
Summary: mbrtoc16 segfaults when counting wide characters
Product: glibc
Version: unspecified
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: string
Assignee: unassigned at sourceware dot org
Reporter: mbuilov at gmail dot com
Target Milestone: ---
The following test case triggers a segmentation fault.
The program works correctly if the first argument to mbrtoc16() is not NULL,
otherwise it crashes.
$ cat bug.c
#include <stdio.h>
#include <string.h>
#include <locale.h>
#include <uchar.h>
static const unsigned char utf8[] = {
    0xf0, 0x9d, 0x94, 0xa0, 0xd5, 0xae
};

int main(int argc, char *argv[])
{
    mbstate_t state;
    unsigned i = 0, n = 0;
    const char *const locale = setlocale(LC_ALL, "C.UTF-8");
    if (!locale) {
        fprintf(stderr, "failed to set locale\n");
        return -2;
    }
    memset(&state, 0, sizeof(state));
    while (i < sizeof(utf8)) {
        const size_t sz = mbrtoc16(NULL, (const char*)&utf8[i], 1,
                                   &state);
        if (sz == (size_t)-2) {
            i += 1;
            continue;
        }
        if (sz != (size_t)-3)
            i += 1;
        n++;
    }
    fprintf(stdout, "number of utf16 characters: %u\n", n);
    (void)argc, (void)argv;
    return 0;
}
-------------
The problem is in the following piece of code in mbrtoc16(), in
./wcsmbs/mbrtoc16.c:
  /* The standard text does not say that S being NULL means the state
     is reset even if the second half of a surrogate still have to be
     returned.  In fact, the error code description indicates
     otherwise.  Therefore always first try to return a second
     half.  */
  if (ps->__count & 0x80000000)
    {
      /* We have to return the second word for a surrogate.  */
      ps->__count &= 0x7fffffff;
      *pc16 = ps->__value.__wch;
      ps->__value.__wch = L'\0';
      return (size_t) -3;
    }
- the pc16 pointer is not checked for NULL before being dereferenced.
--
You are receiving this mail because:
You are on the CC list for the bug.
* [Bug string/28898] mbrtoc16 segfaults when counting wide characters
From: bruno at clisp dot org @ 2023-06-26 13:23 UTC
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=28898
Bruno Haible <bruno at clisp dot org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bruno at clisp dot org
--- Comment #1 from Bruno Haible <bruno at clisp dot org> ---
Created attachment 14949
--> https://sourceware.org/bugzilla/attachment.cgi?id=14949&action=edit
test case bug.c
Find attached another test case: simpler code, without a loop. It crashes at
the call
ret = mbrtoc16 (NULL, input + 4, 0, &state);
* [Bug string/28898] mbrtoc16 segfaults when counting wide characters
From: bruno at clisp dot org @ 2023-06-26 13:27 UTC
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=28898
--- Comment #2 from Bruno Haible <bruno at clisp dot org> ---
Reproduced with glibc 2.35 and 2.36.
* [Bug string/28898] mbrtoc16 segfaults when counting wide characters
From: luigighiron at gmail dot com @ 2024-05-04 4:28 UTC
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=28898
Halalaluyafail3 <luigighiron at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |luigighiron at gmail dot com
--- Comment #3 from Halalaluyafail3 <luigighiron at gmail dot com> ---
This still seems to happen. It is also worth mentioning that the current text
of the standard says "Subsequent calls will store successive wide characters
without consuming any additional input until all the characters have been
stored." Since that sentence makes no mention of a null pointer, it could be
read as permitting this crash (though I do not think that was intended).