From: Godmar Back
Date: Wed, 26 Jan 2022 19:35:41 -0500
Subject: why does `LANG` not influence the locale in GNU/Linux
To: William Tambe via Libc-help

If I am writing code that uses fgetwc to read from standard input, I
need to call

  setlocale(LC_CTYPE, "en_US.utf8");

to ensure that fgetwc will treat stdin as a UTF-8 encoded stream. This
appears to hold (on Ubuntu 20, with the default GNU libc) regardless of
the setting of LANG, which is set to en_US.utf8 on my system.

By contrast, Python 3 changes the encoding it assumes for sys.stdin
based on the LANG variable: if it is set to en_US.utf8, it decodes the
input as UTF-8; if it is set to C or unset, it treats the input as
8-bit ASCII (I believe).

My question is why GNU libc does not (chose not to?) use LANG, and what
standards, if any, apply here. Is my characterization of the behavior
correct?

 - Godmar
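
For reference, here is a minimal sketch of the behavior described above. It relies on two things from ISO C: a program starts in the "C" locale until it calls setlocale, and passing the empty string "" as the locale name asks the library to pick the locale from the environment (LC_ALL, LC_CTYPE, LANG). The helper names ctype_is_c_locale, use_environment_ctype, and count_wide_chars are hypothetical and exist only for illustration.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if LC_CTYPE is still the "C" locale,
   which is the state ISO C specifies at program startup. */
int ctype_is_c_locale(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);  /* NULL means query only */
    return name != NULL && strcmp(name, "C") == 0;
}

/* Hypothetical helper: adopt LC_CTYPE from the environment instead of
   hard-coding "en_US.utf8". Returns the selected locale name, or NULL
   if the locale named by the environment is not available. */
const char *use_environment_ctype(void)
{
    return setlocale(LC_CTYPE, "");
}

/* Hypothetical helper: count the wide characters fgetwc decodes from
   `in`. Returns -1 if the stream reports a read or decoding error. */
long count_wide_chars(FILE *in)
{
    long n = 0;
    while (fgetwc(in) != WEOF)
        n++;
    return ferror(in) ? -1 : n;
}
```

A program would call use_environment_ctype() once, before its first read, and then call count_wide_chars(stdin). With LANG=en_US.utf8 and that locale installed, fgetwc should then decode stdin as UTF-8 without the locale name being hard-coded.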