public inbox for libc-alpha@sourceware.org
From: Carlos O'Donell <carlos@redhat.com>
To: Andreas Schwab <schwab@suse.de>,
	Carlos O'Donell via Libc-alpha <libc-alpha@sourceware.org>
Cc: Florian Weimer <fw@deneb.enyo.de>
Subject: Re: [PATCH] Reset converter state after second wchar_t output (Bug 25734)
Date: Mon, 30 Mar 2020 13:52:24 -0400
Message-ID: <a2394d5c-4124-d846-bba1-e2270683056e@redhat.com>
In-Reply-To: <87eet9lv5f.fsf@igel.home>

On 3/30/20 11:28 AM, Andreas Schwab wrote:
> On Mar 30 2020, Carlos O'Donell via Libc-alpha wrote:
> 
>> On 3/30/20 10:28 AM, Andreas Schwab wrote:
>>> On Mar 30 2020, Florian Weimer wrote:
>>>
>>>> I'm not sure if the C committee wants implementations to be able to
>>>> support Big5 (without Unicode changes first, to add characters which
>>>> avoid the two-codepoint special cases).
>>>
>>> Are you saying mbrtowc should return -1 here?
>>
>> No. That indicates an invalid multibyte sequence was found.
> 
> It is not representable, thus invalid.

Sorry, I think I misunderstood your question.

I think you are actually asking what a hypothetically correct
implementation should do in this case.

If that is your question, then I agree, it should return -1 when it
finds any input that violates the C requirements.

I would *not* change glibc to do this though since BIG5-HKSCS is
supported and in use in glibc.

A simple converter can be written that goes through all input bytes
until the input is at the end or errors out (rather than stopping at
the observed L'\0'), but it requires that you know the length of the
input.
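
As a rough sketch of such a length-driven converter (not glibc code;
convert_all and its parameters are hypothetical names, and the handling
of a 0 return relies on the assumption stated in the comments), the loop
could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert exactly len bytes of src into dst.
   Returns the number of wide characters stored, or (size_t)-1 if
   mbrtowc reports an invalid sequence.  It does not stop at L'\0'. */
size_t convert_all(const char *src, size_t len, wchar_t *dst, size_t dstlen)
{
    assert(src != NULL && dst != NULL);

    mbstate_t st;
    memset(&st, 0, sizeof st);          /* start in the initial state */

    size_t in = 0, out = 0;
    while (in < len && out < dstlen) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, src + in, len - in, &st);

        if (r == (size_t)-1)
            return (size_t)-1;          /* invalid multibyte sequence */
        if (r == (size_t)-2)
            break;                      /* input ends mid-sequence */

        dst[out++] = wc;

        if (r == 0) {
            /* A real NUL byte was converted: step over that one byte.
               Assumption: a 0 return at a non-NUL byte (the second
               wchar_t in the Linux output in this thread) consumed no
               input, so the position stays where it is. */
            if (src[in] == '\0')
                in += 1;
        } else {
            in += r;                    /* r bytes were consumed */
        }
    }
    return out;
}
```

Because the loop is bounded by len and dstlen rather than by the first
L'\0', it keeps going past a 0 return, which is why the caller has to
supply the input length.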

I have seen many examples looking for result > 0 though, so I expect
such code would immediately stop when encountering BIG5-HKSCS input
that generates a 0 return.

The Microsoft docs have a similar example stopping the conversion when
0 is returned, but using -2 to continue stepping through the input,
advancing by one byte to attempt to put together the incomplete sequence
(expecting the state to accrue).
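
For illustration only, a loop in the style described above might be
sketched as follows. This is not the code from the Microsoft docs;
convert_bytewise is a hypothetical name, and feeding exactly one byte
per call is an assumption about how the stepping works:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a NUL-terminated string one byte at a
   time.  Returns the number of wide characters stored, or (size_t)-1
   if mbrtowc reports an invalid sequence. */
size_t convert_bytewise(const char *src, wchar_t *dst, size_t dstlen)
{
    assert(src != NULL && dst != NULL);

    mbstate_t st;
    memset(&st, 0, sizeof st);          /* start in the initial state */

    const char *p = src;
    size_t out = 0;
    while (out < dstlen) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, p, 1, &st);

        if (r == (size_t)-1)
            return (size_t)-1;          /* invalid multibyte sequence */
        if (r == (size_t)-2) {
            p++;                        /* byte absorbed into the state */
            continue;
        }
        if (r == 0)
            break;                      /* 0 is treated as end of input */

        dst[out++] = wc;
        p += r;
    }
    return out;
}
```

A loop like this ends at the first 0 return, so input that produces a 0
for a second wchar_t would be cut short at that point.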

The Microsoft docs are here:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/mbrtowc?view=vs-2019

Example here:
https://rextester.com/UYPGU65292

Windows:
1st mbrtowc call: 0xF325
  result: 2
2nd mbrtowc call: 0x0062
  result: 1
3rd mbrtowc call: 0x0058
  result: 1

Linux:
1st mbrtowc call: 0x00CA
  result: 2
2nd mbrtowc call: 0x0304
  result: 0
3rd mbrtowc call: 0x0058
  result: 1

Note that in the Microsoft implementation you *can't* use the
return value of mbrtowc to walk the input forward, and that
seems like a mistake to me. At least 0 is an honest (if wrong)
answer.

-- 
Cheers,
Carlos.


