From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 105964 invoked by alias); 24 Jul 2015 00:20:12 -0000
Mailing-List: contact libc-locales-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: libc-locales-owner@sourceware.org
Received: (qmail 105839 invoked by uid 89); 24 Jul 2015 00:20:10 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2
X-Spam-User: qpsmtpd, 2 recipients
X-HELO: mx1.redhat.com
Message-ID: <55B184B4.9030400@redhat.com>
Date: Fri, 24 Jul 2015 00:20:00 -0000
From: "Carlos O'Donell"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: =?UTF-8?B?T25kxZllaiBCw61sa2E=?= , Joseph Myers
CC: Keld Simonsen , Marko Myllynen , GNU C Library , libc-locales@sourceware.org
Subject: Re: [PATCH] Use Unicode code points for country_isbn
References: <5571B8C2.8000108@redhat.com> <20150609071130.GA26925@domone> <5576BC13.5020001@redhat.com> <20150721081840.GE12267@vapier> <20150721084006.GB29742@www5.open-std.org> <20150721092217.GG12267@vapier> <20150721115852.GA24115@rap.rap.dk> <20150722190228.GA18489@www5.open-std.org> <20150723222705.GB2518@domone>
In-Reply-To: <20150723222705.GB2518@domone>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-SW-Source: 2015-q3/txt/msg00034.txt.bz2

On 07/23/2015 06:27 PM, Ondřej Bílka wrote:
>> I'd rather have some extension to allow a locale source file to declare
>> that it is in UTF-8, and then use UTF-8 throughout except for control
>> characters or combining characters used in isolation.
>>
> I second that. It would be technically easy to do, so its mostly matter
> of selecting proper interface. If we require some utf8 locale (if we
> decide for C.UTF8 then use it otherwise for example en_US.
>
> Then it would be matter of selecting different locale on files marked
> say by having UTF8 in first line. Sample implementation would be:
>
> fgets (first_line, 5, locale);
> if (!memcmp (first_line, "UTF8", 4))
>   setlocale(LC_ALL,"en_US.UTF8");
> else
>   /* unget first line. */
>

I agree with Joseph's position here.

Further to that, my primary goal is to make contributing to these
files easier. I have no interest in the abstract cases that are not
being supported by anyone at the present moment.

Cheers,
Carlos.
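
For reference, the marker check quoted above could be fleshed out roughly
as follows. This is only a sketch under stated assumptions: the helper
name has_utf8_marker is hypothetical and not part of glibc or localedef,
the "UTF8" first-line marker is just the convention floated in the quote,
and the stream is assumed to be seekable so that the "unget" step can be
done with rewind(). It reads the four bytes with fread() and checks the
count, so a shorter first line never leads to comparing uninitialized
bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of glibc or localedef).
   Returns 1 if the stream starts with the assumed "UTF8" marker and
   leaves the stream positioned just after the marker line.
   Returns 0 otherwise and seeks back to the start, which assumes the
   stream is seekable (a regular file, not a pipe).  */
static int
has_utf8_marker (FILE *fp)
{
  char buf[4];
  size_t n;
  int c;

  assert (fp != NULL);

  /* Read exactly the marker length; a short read means no marker.  */
  n = fread (buf, 1, sizeof buf, fp);
  if (n == sizeof buf && memcmp (buf, "UTF8", sizeof buf) == 0)
    {
      /* Consume the rest of the marker line, including the newline.  */
      while ((c = getc (fp)) != EOF && c != '\n')
        continue;
      return 1;
    }

  /* No marker: "unget" the bytes by going back to the beginning.  */
  rewind (fp);
  return 0;
}
```

A caller would then select the locale only when the marker is present,
e.g. `if (has_utf8_marker (fp)) setlocale (LC_ALL, "en_US.UTF8");`, with
the caveat from the quoted mail that a suitable UTF-8 locale has to be
available on the build system. Like the quoted sketch, this accepts any
first line that merely begins with "UTF8".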