From: brian.inglis@systematicsw.ab.ca
To: newlib@sourceware.org
Subject: Re: wctomb() accepts out-of-range character in C-locale
Date: Mon, 25 Mar 2024 14:18:52 -0600
Message-ID: <000faa1d-91bf-4d90-9e4e-138c4bf889c0@systematicsw.ab.ca>
In-Reply-To: <5DC0BA8B-0B0C-4C91-8F35-C11ACE3E9EF9@kba.biglobe.ne.jp>
On 2024-03-25 08:07, Jun. T wrote:
>
>> 2024/03/25 20:26, Bruno Haible <bruno@clisp.org> wrote:
>>
>>>> But a wide character >= 0x80 can't be converted into a valid
>>>> character in C-locale (7bit), I think.
>>
>> Err. "C" locale, a.k.a. "POSIX" locale, is not 7-bit but 8-bit.
>> Quoting https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/basedefs/V1_chap06.html#tag_06_02 :
>> "The POSIX locale shall contain 256 single-byte characters ..."
>
> I still can't understand why it is useful to convert wide char
> in the range 0x80-0xff to an 8bit char in C-locale (for example
> convert wide char 0xe1 (U+00e1) = á to an 8bit char 0xe1).
Before Unicode/UCS and the UTF encodings, European Single Byte Character Sets
(SBCS) such as ISO-8859-* were used for Latin-script languages, and for most
programming languages. They kept accented characters mainly in the high half
(0x80-0xff) and supported (most of) the POSIX character set. Arabic, Cyrillic,
Greek, Hebrew, Indian and other Asian scripts, and CJK Han scripts used local
SBCS, fuller-featured Double Byte Character Sets (DBCS), or Multi Byte
Character Sets (MBCS). Some of these supported (parts of) the POSIX character
set, and used shift characters to switch to characters encoded using the
second and later bytes.
For more info see https://en.wikipedia.org/wiki/SBCS and linked articles.
> But if you say this is THE correct behavior then it's OK.
POSIX says it, so by definition, it's OK! ;^>
--
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                -- Antoine de Saint-Exupéry