From: Godmar Back
Date: Thu, 27 Jan 2022 00:40:07 -0500
Subject: Re: why does `LANG` not influence the locale in GNU/Linux
To: "Carlos O'Donell"
Cc: libc-help

On Thu, Jan 27, 2022 at 12:29 AM Carlos O'Donell wrote:
>
> On 1/26/22 19:35, Godmar Back via Libc-help wrote:
> > If I am writing code that uses fgetwc to read from standard input, I
> > need to call setlocale(LC_CTYPE, "en_US.utf8"); to ensure that fgetwc
> > will treat stdin as a UTF-8 encoded stream. This property appears to
> > hold (on Ubuntu 20, with default GNU libc), independent of the setting
> > of LANG. LANG is set to en_US.utf8.
> >
> > By contrast, Python 3 changes its behavior with regard to the encoding
> > it assumes sys.stdin to be in based on the LANG variable - if set to
> > en_US.utf8, it'll decode as UTF-8, if set to C or unset, it'll treat
> > the input as 8-bit ASCII encoding (I believe).
> >
> > My question is why GNU libc (chose?) to not use LANG and what
> > standards, if any, apply here.
> > Is my characterization of the behavior correct?
>
> The ISO C standard and the POSIX standard have specified the behaviour.
>
> ISO C says that at program startup the equivalent of:
> setlocale (LC_ALL, "C"); /* (See 7.11.1.1.4) */
> is executed. Therefore C programs start in the C locale, not the locale
> as specified by LANG.
>
> POSIX defines several environment variables, and they have precedence
> rules for deciding which one should be used by the application [1].
> However, this just defines the categories, but it doesn't *apply* them to the
> running program. Internationalized programs that wish to initialize locale
> specific operation must call setlocale [2]:
>
> setlocale (LC_CALL, "");

There's an extraneous `C` here, I'm reading this as `setlocale (LC_ALL, "")`.

>
> Which will set up the global locale according to the environment variables
> in the precedence as setup by POSIX.
>
> Does that answer your question?

Yes it does, thank you. It's slightly ironic if you read, for instance,
PEP-538 [*], which claims that Python has a need "to use the configured locale
encoding by default for consistency with other locale-aware components
in the same process or subprocesses" when the default behavior in a Unix
environment is to not be locale-aware and to revert to the C locale.

[*] https://www.python.org/dev/peps/pep-0538/

>
> --
> Cheers,
> Carlos.
>
> [1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
> [2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/setlocale.html
>