Subject: Re: Is it OK to write ASCII strings directly into locale source files?
To: Florian Weimer, Andreas Schwab
Cc: Mike FABIAN, libc-alpha@sourceware.org
From: Carlos O'Donell
Date: Mon, 24 Jul 2017 20:07:00 -0000

On 07/24/2017 01:05 PM, Florian Weimer wrote:
> * Andreas Schwab:
>
>> On Jul 24 2017, Carlos O'Donell wrote:
>>
>>> So let us start slowly and agree with 'ASCII - [<>]' where < denotes
>>> the start of a code point and > the end of the code point.
>>
>> POSIX says "character in the portable character set" if you want to keep
>> it portable.
>
> But our locales only have to be compatible with our localedef, right?

Should developers be able to write tools against the POSIX locale specification and use them to parse our locale source definitions? Should we support more than just GNU/Linux? Do the BSDs share our locale definitions?

> I know that the FSF does not claim copyright on our locales, so anyone
> is free to take them and use them with their own non-GNU systems (or
> sell them as PDFs/books). But this does not mean we have to make
> their lives easier if it comes at a cost to us (e.g., verifying that
> we only use the portable character set, or refraining from using full
> UTF-8 at a future date).

I agree with your sentiment, and I leave it up to Mike to decide what ultimately makes it easier for him, as a subsystem maintainer, to work with. There is certainly a cost/reward balance.

My only technical objection to writing straight UTF-8 is that it could lead to more mistakes. Mike just found one in CLDR where an Arabic Farsi character was used incorrectly because it renders with the same glyph as the intended character.
It was caught when harmonizing with glibc, where you have to write out the code points (Mike filed a bug upstream with CLDR).

My preference would be to start small: use the POSIX portable character set to its maximum extent for all Latin-based languages, see how that works out, and then decide whether we even need to pursue full UTF-8, and in which form.

-- 
Cheers,
Carlos.