From: Joseph Myers <joseph@codesourcery.com>
To: Carlos O'Donell <carlos@redhat.com>
Cc: GNU C Library <libc-alpha@sourceware.org>, <libc-locales@sourceware.org>
Subject: Re: Output of `locale -a` could be in mixed encodings?
Date: Wed, 21 Jan 2015 02:37:00 -0000
Message-ID: <alpine.DEB.2.10.1501210206400.10663@digraph.polyomino.org.uk>
In-Reply-To: <54BF0329.5050604@redhat.com>
On Tue, 20 Jan 2015, Carlos O'Donell wrote:
> The problem then is that if you took that UTF8 converted name of
> `bokmål` and tried to call setlocale with that, it would fail.
> It fails because the name in UTF8 doesn't match the name in
> ISO-8859-1 that's stored as the alias or official locale name.

This could be a bug in setlocale.

POSIX says the locale name is a "character string", which is defined as a
sequence of multibyte characters. So arguably it should be interpreted in
the current locale's character set (and so work if the LC_CTYPE before
setlocale is that of a UTF-8 locale, fail if it's ASCII or ISO-8859-1).
Except that the statement about being a character string is not CX-shaded,
so should not be taken as intending any semantics beyond those in ISO C,
and I don't see ISO C requiring any such thing. (That said, I think
interpreting the locale name in the current locale makes sense anyway, and
is at least consistent with ISO C, even if not required.)
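
To make the mismatch concrete, here is a minimal Python sketch. It is not glibc code: it assumes the alias is stored as ISO-8859-1 bytes, as described above, and the helpers `bytewise_match` and `name_matches` are hypothetical names used only to contrast a raw byte comparison with one that interprets the name in the caller's current character set.

```python
# Illustration only; this is not how glibc's setlocale is implemented.
# Assumption: the alias is stored as ISO-8859-1 bytes.
STORED_ALIAS = "bokm\u00e5l".encode("iso-8859-1")  # b'bokm\xe5l'


def bytewise_match(name: bytes) -> bool:
    """Compare raw bytes, ignoring any character set."""
    return name == STORED_ALIAS


def name_matches(name: bytes, current_charset: str) -> bool:
    """Decode the name in the caller's current charset, then compare characters."""
    try:
        decoded = name.decode(current_charset)
    except UnicodeDecodeError:
        # The bytes are not valid in the current charset, so no match.
        return False
    return decoded == STORED_ALIAS.decode("iso-8859-1")


utf8_name = "bokm\u00e5l".encode("utf-8")  # b'bokm\xc3\xa5l'

print(bytewise_match(utf8_name))              # raw comparison
print(name_matches(utf8_name, "utf-8"))       # LC_CTYPE is a UTF-8 locale
print(name_matches(utf8_name, "ascii"))       # LC_CTYPE is ASCII
print(name_matches(utf8_name, "iso-8859-1"))  # LC_CTYPE is ISO-8859-1
```

Under these assumptions only the UTF-8 case matches, which is the behaviour the paragraph above describes: the UTF-8 name works when the current LC_CTYPE is UTF-8 and fails for ASCII or ISO-8859-1.
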
Now, we should also probably say that all non-ASCII locale names are
deprecated (so this would just be a matter of adding a few more aliases
for this locale using different encodings). And then we could say that
the locale utility doesn't output any non-ASCII locale names - as long as
each locale has a valid ASCII name, I think that's conforming to POSIX.
In fact, these aliases are already deprecated (locale.alias says "This
file is obsolete ... Nobody should rely on the names defined here").
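
As a rough sketch of what "don't output any non-ASCII locale names" could mean in practice, the following Python snippet filters a list of names by byte content. The function `ascii_only` and the sample list are illustrative assumptions, not the locale utility's actual code or output.

```python
from typing import List


def ascii_only(names: List[bytes]) -> List[str]:
    """Keep only the names made up entirely of ASCII bytes."""
    return [n.decode("ascii") for n in names if n.isascii()]


# Hypothetical sample; one entry carries the ISO-8859-1 byte 0xE5.
sample = [b"nb_NO.utf8", b"bokm\xe5l", b"bokmal", b"nb_NO"]
print(ascii_only(sample))
```

The locale behind the dropped name stays reachable in this sample because it also has ASCII names, which is the condition stated above.
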
It's also the case that there's an existing weak deprecation of non-UTF-8
locales (in the sense that every locale with a non-UTF-8 character set is
supposed to have a corresponding locale with UTF-8 character set - if any
don't, that's a bug unless there's some other reason for the locale to be
deprecated whatever the character set - and the threshold for adding any
new non-UTF-8 locales should be higher than for adding new UTF-8 locales).

> language | Norwegian, Bokm<E5>l

That part of the output, however, should clearly be output in the user's
locale character set - not in the character set of the locale in question.
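
A minimal sketch of that conversion, in Python rather than in the locale utility itself. It assumes the field is stored in the character set of the locale being described (ISO-8859-1 in the quoted output) and that the user's character set is known; `render_field` is a hypothetical helper name.

```python
def render_field(raw: bytes, source_charset: str, user_charset: str) -> bytes:
    """Re-encode a descriptive field from the locale's charset to the user's."""
    text = raw.decode(source_charset)
    # Characters the user's charset cannot represent are replaced with '?'.
    return text.encode(user_charset, errors="replace")


field = b"Norwegian, Bokm\xe5l"  # bytes as shown in the quoted output
print(render_field(field, "iso-8859-1", "utf-8"))
print(render_field(field, "iso-8859-1", "ascii"))
```

Replacing unrepresentable characters is only one possible policy; transliteration or an error would be equally valid choices for a real implementation.
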
--
Joseph S. Myers
joseph@codesourcery.com