From: Carlos O'Donell
Organization: Red Hat
To: Godmar Back
Cc: libc-help
Subject: Re: why does `LANG` not influence the locale in GNU/Linux
Date: Thu, 27 Jan 2022 00:28:55 -0500
Message-ID: <033b9a58-46c7-ceff-212f-203bce879172@redhat.com>

On 1/26/22 19:35, Godmar Back via Libc-help wrote:
> If I am writing code that uses fgetwc to read from standard input, I
> need to call setlocale(LC_CTYPE, "en_US.utf8"); to ensure that fgetwc
> will treat stdin as a UTF-8 encoded stream. This property appears to
> hold (on Ubuntu 20, with default GNU libc), independent of the setting
> of LANG. LANG is set to en_US.utf8.
>
> By contrast, Python 3 changes its behavior with regard to the encoding
> it assumes sys.stdin to be in based on the LANG variable - if set to
> en_US.utf8, it'll decode as UTF-8, if set to C or unset, it'll treat
> the input as 8-bit ASCII encoding (I believe).
>
> My question is why GNU libc (chose?) to not use LANG and what
> standards, if any, apply here.
> Is my characterization of the behavior correct?

The ISO C standard and the POSIX standard specify this behaviour.

ISO C says that at program startup the equivalent of:

    setlocale (LC_ALL, "C"); /* (See 7.11.1.1.4) */

is executed. Therefore C programs start in the C locale, not the locale
specified by LANG.

POSIX defines several environment variables, with precedence rules for
deciding which one applies to each locale category [1]. However, this only
defines how the categories are selected; it doesn't *apply* them to the
running program.

Internationalized programs that wish to initialize locale-specific
operation must call setlocale [2]:

    setlocale (LC_ALL, "");

This sets up the global locale according to the environment variables,
in the precedence defined by POSIX.

Does that answer your question?

-- 
Cheers,
Carlos.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
[2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/setlocale.html