Re: Output of `locale -a` could be in mixed encodings?

From: Joseph Myers <joseph@codesourcery.com>
To: Carlos O'Donell <carlos@redhat.com>
Cc: GNU C Library <libc-alpha@sourceware.org>, <libc-locales@sourceware.org>
Subject: Re: Output of `locale -a` could be in mixed encodings?
Date: Wed, 21 Jan 2015 02:37:00 -0000
Message-ID: <alpine.DEB.2.10.1501210206400.10663@digraph.polyomino.org.uk>
In-Reply-To: <54BF0329.5050604@redhat.com>

On Tue, 20 Jan 2015, Carlos O'Donell wrote:

> The problem then is that if you took that UTF8 converted name of
> `bokmål` and tried to call setlocale with that, it would fail.
> It fails because the name in UTF8 doesn't match the name in
> ISO-8859-1 that's stored as the alias or official locale name.

This could be a bug in setlocale.
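
To make the mismatch concrete, here is a small Python sketch (not glibc 
code) that compares the bytes of the same name in the two encodings.  It 
assumes the alias is stored as ISO-8859-1, as the quoted text describes; 
the byte-wise comparison is only a stand-in for whatever lookup setlocale 
actually performs.

```python
# Illustration only: the same locale name as bytes in two encodings.
name = "bokm\u00e5l"  # "bokmål"

stored_alias = name.encode("iso-8859-1")  # assumed stored form of the alias
utf8_name = name.encode("utf-8")          # what a UTF-8 caller would pass

print(stored_alias)  # b'bokm\xe5l'
print(utf8_name)     # b'bokm\xc3\xa5l'

# A byte-wise comparison (standing in for the alias lookup) does not match.
names_match = stored_alias == utf8_name
print(names_match)   # False
```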

POSIX says the locale name is a "character string", which is defined as a 
sequence of multibyte characters.  So arguably it should be interpreted in 
the current locale's character set (and so work if the LC_CTYPE before 
setlocale is that of a UTF-8 locale, fail if it's ASCII or ISO-8859-1).  
Except that the statement about being a character string is not CX-shaded, 
so should not be taken as intending any semantics beyond those in ISO C, 
and I don't see ISO C requiring any such thing.  (That said, I think 
interpreting the locale name in the current locale makes sense anyway, and 
is at least consistent with ISO C, even if not required.)
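
A rough Python sketch of that interpretation, under stated assumptions: 
the `ALIASES` table, its target locale name and the `resolve` helper are 
hypothetical and only model the idea; they are not how glibc implements 
setlocale.  The requested name is decoded using the caller's current 
character set and compared as text against each stored name, decoded in 
its own character set.

```python
# Hypothetical alias table: stored bytes -> (charset of stored name, target).
ALIASES = {
    "bokm\u00e5l".encode("iso-8859-1"): ("iso-8859-1", "nb_NO.ISO-8859-1"),
}

def resolve(name_bytes, current_charset):
    """Return the target locale, or None if the name does not resolve."""
    try:
        requested = name_bytes.decode(current_charset)
    except UnicodeDecodeError:
        return None  # not a valid character string in the current charset
    for stored, (charset, target) in ALIASES.items():
        if stored.decode(charset) == requested:
            return target
    return None

utf8_name = "bokm\u00e5l".encode("utf-8")
print(resolve(utf8_name, "utf-8"))       # nb_NO.ISO-8859-1
print(resolve(utf8_name, "ascii"))       # None
print(resolve(utf8_name, "iso-8859-1"))  # None
```

With the UTF-8 bytes as input, the lookup succeeds only when the current 
character set is UTF-8, which matches the behaviour described above.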

Now, we should also probably say that all non-ASCII locale names are 
deprecated (so this would just be a matter of adding a few more aliases 
for this locale using different encodings).  And then we could say that 
the locale utility doesn't output any non-ASCII locale names - as long as 
each locale has a valid ASCII name, I think that's conforming to POSIX.  
In fact, these aliases are already deprecated (locale.alias says "This 
file is obsolete ... Nobody should rely on the names defined here").
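
As an illustration of what ASCII-only output could look like, the Python 
sketch below filters a list of raw locale names down to those made up 
entirely of ASCII bytes.  The `ascii_names` helper and the sample list 
are invented for the example and are not taken from any real `locale -a` 
run.

```python
def ascii_names(raw_names):
    """Keep only names whose bytes are all ASCII, decoded to str."""
    return [n.decode("ascii") for n in raw_names if n.isascii()]

# Hypothetical sample: one name contains the ISO-8859-1 byte 0xE5.
sample = [b"C", b"POSIX", b"bokmal", b"bokm\xe5l", b"nb_NO.utf8"]

print(ascii_names(sample))  # ['C', 'POSIX', 'bokmal', 'nb_NO.utf8']
```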

There's also an existing weak deprecation of non-UTF-8 locales.  Every 
locale with a non-UTF-8 character set is supposed to have a corresponding 
locale with a UTF-8 character set; if any don't, that's a bug, unless 
there's some other reason for the locale to be deprecated whatever the 
character set.  And the threshold for adding any new non-UTF-8 locales 
should be higher than for adding new UTF-8 locales.

>  language | Norwegian, Bokm<E5>l

That part of the output, however, should clearly be output in the user's 
locale character set - not in the character set of the locale in question.
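
A minimal Python sketch of that conversion, assuming the field is stored 
in the locale's own character set (ISO-8859-1 here) and has to be 
re-encoded for the user.  The `to_user_charset` helper is hypothetical, 
and replacing unrepresentable characters with "?" is just one possible 
policy.

```python
def to_user_charset(raw, locale_charset, user_charset):
    """Re-encode a field from the locale's charset to the user's charset."""
    text = raw.decode(locale_charset)
    # Characters the user's charset cannot represent become "?".
    return text.encode(user_charset, errors="replace")

raw_language = b"Norwegian, Bokm\xe5l"  # ISO-8859-1 bytes

print(to_user_charset(raw_language, "iso-8859-1", "utf-8"))
# b'Norwegian, Bokm\xc3\xa5l'
print(to_user_charset(raw_language, "iso-8859-1", "ascii"))
# b'Norwegian, Bokm?l'
```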

-- 
Joseph S. Myers
joseph@codesourcery.com
