public inbox for glibc-bugs@sourceware.org
* [Bug string/28898] New: mbrtoc16 segfaults when counting wide characters
@ 2022-02-16  6:41 mbuilov at gmail dot com
From: mbuilov at gmail dot com @ 2022-02-16  6:41 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=28898

            Bug ID: 28898
           Summary: mbrtoc16 segfaults when counting wide characters
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: string
          Assignee: unassigned at sourceware dot org
          Reporter: mbuilov at gmail dot com
  Target Milestone: ---

The following test case demonstrates a segmentation fault.

The program runs correctly when the first argument to mbrtoc16() is a non-NULL
pointer; when it is NULL, the program crashes.
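Because the crash only happens when the first argument is NULL, a caller that only wants to count UTF-16 code units can pass a scratch char16_t and ignore what is written to it. Below is a minimal sketch of that approach.

`count_utf16_units` is a hypothetical helper, not a glibc function. It assumes the caller has already selected a UTF-8 LC_CTYPE, for example with setlocale(LC_ALL, "C.UTF-8"). Unlike the byte-at-a-time loop in the test case, it passes all remaining bytes to each call. It also drains a low surrogate that may still be pending after the last byte.

```c
#include <locale.h> /* callers must select a UTF-8 locale with setlocale() */
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>  /* mbstate_t */

/* Hypothetical helper: count the UTF-16 code units needed for s[0..len)
   in the current locale.  A scratch char16_t is passed instead of NULL,
   so affected glibc versions never write through a null pc16.
   Returns (size_t)-1 on an invalid or truncated sequence. */
static size_t count_utf16_units(const char *s, size_t len)
{
    mbstate_t state;
    char16_t scratch; /* written by mbrtoc16(); value ignored */
    size_t i = 0, units = 0;

    memset(&state, 0, sizeof state);
    while (i < len) {
        size_t rc = mbrtoc16(&scratch, s + i, len - i, &state);
        if (rc == (size_t)-1 || rc == (size_t)-2)
            return (size_t)-1;      /* invalid, or input ends mid-character */
        if (rc == (size_t)-3) {     /* pending second surrogate half; no bytes consumed */
            units++;
            continue;
        }
        i += (rc == 0) ? 1 : rc;    /* rc == 0: a null character (one byte) was read */
        units++;
    }
    /* If the last character was outside the BMP, its low surrogate is
       still pending; one more call returns it as (size_t)-3. */
    if (mbrtoc16(&scratch, "", 1, &state) == (size_t)-3)
        units++;
    return units;
}
```

With the six test bytes below, this should count 3 units: two for the surrogate pair and one for the second character.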


$ cat bug.c
#include <stdio.h>
#include <string.h>
#include <locale.h>
#include <uchar.h>

static const unsigned char utf8[] = {
  0xf0, 0x9d, 0x94, 0xa0, 0xd5, 0xae
};

int main(int argc, char *argv[])
{
        mbstate_t state;
        unsigned i = 0, n = 0;
        const char *const locale = setlocale(LC_ALL, "C.UTF-8");
        if (!locale) {
                fprintf(stderr, "failed to set locale\n");
                return -2;
        }
        memset(&state, 0, sizeof(state));
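        while (i < sizeof(utf8)) {
                /* Feed one byte at a time; pc16 is NULL because only the
                   number of UTF-16 code units is wanted. */
                const size_t sz = mbrtoc16(NULL, (const char*)&utf8[i], 1, &state);
                if (sz == (size_t)-2) {
                        /* Incomplete character: byte consumed, nothing produced. */
                        i += 1;
                        continue;
                }
                /* (size_t)-3 returns the pending second surrogate half
                   without consuming input, so i is not advanced. */
                if (sz != (size_t)-3)
                        i += 1;
                n++;
        }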
        fprintf(stdout, "number of utf16 characters: %u\n", n);
        (void)argc, (void)argv;
        return 0;
}
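The six test bytes can be decoded by hand to show why a second surrogate half is involved at all:

- f0 9d 94 a0 encodes U+1D520. This lies above U+FFFF and therefore needs a UTF-16 surrogate pair (d835 dd20).
- d5 ae encodes U+056E, which needs a single unit.

A correct run of bug.c should therefore print 3. The sketch below performs the same arithmetic without mbrtoc16() or a locale. `utf16_units_from_utf8` is a hypothetical helper. To stay short, it assumes well-formed UTF-8 and does not validate continuation bytes or overlong forms.

```c
#include <stddef.h>

/* Hypothetical helper: decode well-formed UTF-8 by hand and count
   UTF-16 code units.  Code points above U+FFFF need a surrogate pair
   (2 units); all others need 1.  No validation is performed. */
static size_t utf16_units_from_utf8(const unsigned char *p, size_t len)
{
    size_t i = 0, units = 0;
    while (i < len) {
        unsigned char b = p[i];
        /* Sequence length from the lead byte. */
        size_t seq = b < 0x80 ? 1 : b < 0xe0 ? 2 : b < 0xf0 ? 3 : 4;
        /* Payload bits of the lead byte: 0x1f, 0x0f or 0x07. */
        unsigned long cp = (seq == 1) ? b : (unsigned long)(b & (0x7f >> seq));
        for (size_t k = 1; k < seq && i + k < len; k++)
            cp = (cp << 6) | (p[i + k] & 0x3f); /* 6 bits per continuation byte */
        units += (cp > 0xffff) ? 2 : 1;
        i += seq;
    }
    return units;
}
```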

-------------

The problem is in the following piece of code in mbrtoc16()
(./wcsmbs/mbrtoc16.c):


  /* The standard text does not say that S being NULL means the state
     is reset even if the second half of a surrogate still have to be
     returned.  In fact, the error code description indicates
     otherwise.  Therefore always first try to return a second
     half.  */
  if (ps->__count & 0x80000000)
    {
      /* We have to return the second word for a surrogate.  */
      ps->__count &= 0x7fffffff;
      *pc16 = ps->__value.__wch;
      ps->__value.__wch = L'\0';
      return (size_t) -3;
    }


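- the pc16 pointer is dereferenced without first being checked for NULL.
  Once a character outside the Basic Multilingual Plane has been decoded
  (here U+1D520, bytes f0 9d 94 a0), the second half of its surrogate pair
  is pending. The next call takes this branch and writes through pc16
  even when it is NULL.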
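Below is a minimal sketch of the guarded branch, modeled outside glibc. `struct pending_surrogate` and `take_pending_half` are hypothetical stand-ins for two things: the `__count`/`__value.__wch` fields of glibc's internal mbstate_t, and the quoted early return.

When pc16 is NULL, the pending half is still consumed and the flag cleared. A counting caller therefore sees the same (size_t)-3 result as with a real buffer; only the store is skipped. This illustrates one way to add the missing check. It is not necessarily the change that was eventually made upstream.

```c
#include <stddef.h>
#include <uchar.h>

/* Hypothetical stand-in for the mbstate_t fields used by the quoted branch. */
struct pending_surrogate {
    unsigned int count; /* bit 31 set: second surrogate half is pending */
    char16_t value;     /* the pending low surrogate */
};

/* Returns 1 if a pending half was consumed (the point where mbrtoc16()
   would return (size_t)-3), 0 if nothing was pending.  The half is
   stored only when pc16 is non-NULL. */
static int take_pending_half(char16_t *pc16, struct pending_surrogate *ps)
{
    if (!(ps->count & 0x80000000u))
        return 0;
    ps->count &= 0x7fffffffu;
    if (pc16 != NULL)   /* the check missing from the quoted code */
        *pc16 = ps->value;
    ps->value = 0;
    return 1;
}
```

For the test input, the pending value would be 0xdd20, the low surrogate of U+1D520.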



Thread overview: 4+ messages
2022-02-16  6:41 [Bug string/28898] New: mbrtoc16 segfaults when counting wide characters mbuilov at gmail dot com
2023-06-26 13:23 ` [Bug string/28898] " bruno at clisp dot org
2023-06-26 13:27 ` bruno at clisp dot org
2024-05-04  4:28 ` luigighiron at gmail dot com
