public inbox for glibc-bugs@sourceware.org
* [Bug string/28898] New: mbrtoc16 segfaults when counting wide characters
@ 2022-02-16  6:41 mbuilov at gmail dot com
From: mbuilov at gmail dot com @ 2022-02-16  6:41 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=28898

            Bug ID: 28898
           Summary: mbrtoc16 segfaults when counting wide characters
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: string
          Assignee: unassigned at sourceware dot org
          Reporter: mbuilov at gmail dot com
  Target Milestone: ---

The following test case demonstrates a segmentation fault.

The program works correctly if the first argument to mbrtoc16() is not NULL;
otherwise it crashes.


$ cat bug.c
#include <stdio.h>
#include <string.h>
#include <locale.h>
#include <uchar.h>

static const unsigned char utf8[] = {
  0xf0, 0x9d, 0x94, 0xa0, 0xd5, 0xae
};

int main(int argc, char *argv[])
{
        mbstate_t state;
        unsigned i = 0, n = 0;
        const char *const locale = setlocale(LC_ALL, "C.UTF-8");
        if (!locale) {
                fprintf(stderr, "failed to set locale\n");
                return -2;
        }
        memset(&state, 0, sizeof(state));
        while (i < sizeof(utf8)) {
                const size_t sz = mbrtoc16(NULL, (const char*)&utf8[i], 1, &state);
                if (sz == (size_t)-2) {
                        i += 1;
                        continue;
                }
                if (sz != (size_t)-3)
                        i += 1;
                n++;
        }
        fprintf(stdout, "number of utf16 characters: %u\n", n);
        (void)argc, (void)argv;
        return 0;
}

-------------

The problem is in the following piece of code in mbrtoc16(), in
./wcsmbs/mbrtoc16.c:


  /* The standard text does not say that S being NULL means the state
     is reset even if the second half of a surrogate still have to be
     returned.  In fact, the error code description indicates
     otherwise.  Therefore always first try to return a second
     half.  */
  if (ps->__count & 0x80000000)
    {
      /* We have to return the second word for a surrogate.  */
      ps->__count &= 0x7fffffff;
      *pc16 = ps->__value.__wch;
      ps->__value.__wch = L'\0';
      return (size_t) -3;
    }


- the pc16 pointer is not checked for NULL before being dereferenced, so a
  call with pc16 == NULL while a low surrogate is still pending writes
  through a null pointer.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug string/28898] mbrtoc16 segfaults when counting wide characters
From: bruno at clisp dot org @ 2023-06-26 13:23 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=28898

Bruno Haible <bruno at clisp dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bruno at clisp dot org

--- Comment #1 from Bruno Haible <bruno at clisp dot org> ---
Created attachment 14949
  --> https://sourceware.org/bugzilla/attachment.cgi?id=14949&action=edit
test case bug.c

Find attached another test case: simpler code, without a loop. It crashes at
the call

  ret = mbrtoc16 (NULL, input + 4, 0, &state);


* [Bug string/28898] mbrtoc16 segfaults when counting wide characters
From: bruno at clisp dot org @ 2023-06-26 13:27 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=28898

--- Comment #2 from Bruno Haible <bruno at clisp dot org> ---
Reproduced with glibc 2.35 and 2.36.


* [Bug string/28898] mbrtoc16 segfaults when counting wide characters
From: luigighiron at gmail dot com @ 2024-05-04  4:28 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=28898

Halalaluyafail3 <luigighiron at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |luigighiron at gmail dot com

--- Comment #3 from Halalaluyafail3 <luigighiron at gmail dot com> ---
This still seems to happen. I also think it is worth mentioning that the
current text of the standard does say "Subsequent calls will store successive
wide characters without consuming any additional input until all the characters
have been stored." Since that sentence makes no mention of a null pointer, it
could be interpreted as allowing this crash to happen (though I do not think
that was intended).
