From: brian.inglis@systematicsw.ab.ca
To: newlib@sourceware.org
Subject: Re: wctomb() accepts out-of-range character in C-locale
Date: Mon, 25 Mar 2024 14:18:52 -0600
Message-ID: <000faa1d-91bf-4d90-9e4e-138c4bf889c0@systematicsw.ab.ca>
In-Reply-To: <5DC0BA8B-0B0C-4C91-8F35-C11ACE3E9EF9@kba.biglobe.ne.jp>

On 2024-03-25 08:07, Jun. T wrote:
> 
>> 2024/03/25 20:26, Bruno Haible <bruno@clisp.org> wrote:
>>
>>>> But a wide character >= 0x80 can't be converted into a valid
>>>> character in C-locale (7bit), I think.
>>
>> Err. "C" locale, a.k.a. "POSIX" locale, is not 7-bit but 8-bit.
>> Quoting https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/basedefs/V1_chap06.html#tag_06_02 :
>>   "The POSIX locale shall contain 256 single-byte characters ..."
> 
> I still can't understand why it is useful to convert wide char
> in the range 0x80-0xff to an 8bit char in C-locale (for example
> convert wide char 0xe1 (U+00e1) = á to an 8bit char 0xe1).

Before Unicode, UCS, and the UTF encodings, Latin-script languages (and most
programming languages) used European Single Byte Character Sets such as
ISO-8859-*, which supported (most of) the POSIX character set and put accented
characters mainly in the high half, 0x80-0xff. Arabic, Cyrillic, Greek, Hebrew,
other Asian and Indian, and CJK Han script based languages used local SBCS,
fuller featured Double Byte Character Sets, or Multi Byte Character Sets. Some
of those supported (parts of) the POSIX character set and used shift characters
to switch to characters encoded using the second and other bytes.

For more info see https://en.wikipedia.org/wiki/SBCS and linked articles.

> But if you say this is THE correct behavior then it's OK.

POSIX says it, so by definition, it's OK! ;^>
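
For anyone who wants to check what their own C library does, here is a minimal
sketch. The helper name c_locale_wctomb is hypothetical (not part of newlib or
any other library), and the caller is assumed to pass a buffer of at least
MB_LEN_MAX bytes. It selects the "C" locale and hands one wide character to
wctomb():

```c
#include <limits.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert one wide character in the "C" locale.
 * 'out' must have room for at least MB_LEN_MAX bytes.
 * Returns whatever wctomb() returns: the number of bytes written,
 * or -1 if the implementation rejects the character. */
int c_locale_wctomb(wchar_t wc, unsigned char *out)
{
    char buf[MB_LEN_MAX];
    int n;

    setlocale(LC_ALL, "C");   /* the "C" locale, a.k.a. "POSIX" locale */
    wctomb(NULL, 0);          /* reset any conversion (shift) state */

    n = wctomb(buf, wc);
    if (n > 0)
        memcpy(out, buf, (size_t)n);
    return n;
}
```

Calling c_locale_wctomb(0xe1, out) should return 1 with out[0] == 0xe1 on an
implementation that treats the "C" locale as 8-bit, which is the newlib
behavior discussed in this thread. Other C libraries may return -1 instead, so
treat the result as implementation-specific.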

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                 -- Antoine de Saint-Exupéry
