Date: Fri, 28 Jan 2022 11:42:52 -0500
Subject: Re: [PATCH v12 2/2] Add generic C.UTF-8 locale (Bug 17318)
To: Michael Hudson-Doyle
Cc: libc-alpha@sourceware.org, Florian Weimer
References: <20210906154336.610973-1-carlos@redhat.com> <20210906154336.610973-3-carlos@redhat.com>
From: Carlos O'Donell
Organization: Red Hat

On 1/25/22 21:44, Michael Hudson-Doyle wrote:
> On Tue, 7 Sept 2021 at 03:45, Carlos O'Donell via Libc-alpha <
> libc-alpha@sourceware.org> wrote:
>
>> diff --git a/localedata/locales/C b/localedata/locales/C
>> new file mode 100644
>> index 0000000000..ca801c79cf
>> --- /dev/null
>> +++ b/localedata/locales/C
>
> [...]
>
>> +LC_TIME
>> +% This is the POSIX Locale definition for the LC_TIME category with the
>> +% exception that time is per ISO 8601 and 24-hour.
>> +%
>> +% Abbreviated weekday names (%a)
>> +abday "Sun";"Mon";"Tue";"Wed";"Thu";"Fri";"Sat"
>> +
>> +% Full weekday names (%A)
>> +day "Sunday";"Monday";"Tuesday";"Wednesday";"Thursday";/
>> + "Friday";"Saturday"
>> +
>> +% Abbreviated month names (%b)
>> +abmon "Jan";"Feb";"Mar";"Apr";"May";"Jun";"Jul";"Aug";"Sep";/
>> + "Oct";"Nov";"Dec"
>> +
>> +% Full month names (%B)
>> +mon "January";"February";"March";"April";"May";"June";"July";/
>> + "August";"September";"October";"November";"December"
>> +
>> +% Week description, consists of three fields:
>> +% 1. Number of days in a week.
>> +% 2. Gregorian date that is a first weekday (19971130 for Sunday, 19971201 for Monday).
>> +% 3. The weekday number to be contained in the first week of the year.
>> +%
>> +% ISO 8601 conforming applications should use the values 7, 19971201 (a
>> +% Monday), and 4 (Thursday), respectively.
>> +week 7;19971201;4
>>
>
> It's obviously a bit late, but this is a difference from the Debian/Ubuntu

It is never too late! Thank you for raising this.

Given that you've had problems with one application, other applications
will have problems too. I think we should probably keep C == C.UTF-8 and
not change any of the existing LC_TIME properties.

> C.UTF-8 locale, which has:
>
> week 7;19971130;4

This is the default value from ISO 30112, and it matches the data in the
internal C/POSIX locale, e.g.:

    { .string = "\7" },    /* 7 days in the week.  */
    { .word = 19971130 },  /* Week starts on Sunday. This matches the ISO
                              30112 definition if week is not specified.  */
    { .string = "\4" },    /* Thursday needs to be included in the week for
                              it to be considered the "first week."  */
    { .string = "\1" },    /* first_weekday 1.  */
    { .string = "\2" },    /* first_workday 2.  */

ld-time.c also follows the ISO 30112 defaults:

    /* Set up defaults based on ISO 30112 WD10 [2014].  */
    if (time->week_ndays == 0)
      time->week_ndays = 7;

    if (time->week_1stday == 0)
      time->week_1stday = 19971130;

    if (time->week_1stweek == 0)
      time->week_1stweek = 7;

> (confusingly, this is preceded by this comment:
>
> % ISO 8601 conforming applications should use the values 7, 19971130 (a
> % Monday), and 4 (Thursday), respectively.
>
> but 19971130 is a Sunday).

As you note, the above comment is wrong: 19971130 is a Sunday. The
verbatim text from the ISO 30112 standard is:

~~~
ISO 8601 conforming applications should use the values 7, 19971201 (a
Monday), and 4 (Thursday), respectively.
~~~

Note the corrected YYYYMMDD value, 19971201.

In our upstream C.UTF-8 locale we are consciously aligning with ISO 8601
in more cases:

    % Week description, consists of three fields:
    % 1. Number of days in a week.
    % 2. Gregorian date that is a first weekday (19971130 for Sunday, 19971201 for Monday).
    % 3. The weekday number to be contained in the first week of the year.
    %
    % ISO 8601 conforming applications should use the values 7, 19971201 (a
    % Monday), and 4 (Thursday), respectively.
    week 7;19971201;4
    first_weekday 1
    first_workday 2

So C and C.UTF-8 differ: they have different first weekdays (Sunday for
C, Monday for C.UTF-8).

> The locale(5) page from the man-pages project also says:
>
> "For compatibility reasons, all glibc locales should set the value of the
> second week list item to 19971130 (Sunday) and base the abday and day lists
> appropriately,".

This is to align with ISO 30112, which is an older standard.

> I found this because it breaks a test of rrdtool (which is probably buggy!
> It sets LC_TIME but needs to clear LC_ALL for that to take any effect) and
> I just wanted to check that this was truly the intended value before (even
> if only just) the release.

In this case, for C.UTF-8 we have aligned week with ISO 8601. There are
other parts of C.UTF-8's LC_TIME which are not aligned with ISO 8601.
However, this choice (aligning week with ISO 8601) is perhaps
inconsistent with the intent of C.UTF-8, so I think this is actually a
bug. Florian also found a real bug in d_fmt (it needs double slashes).
I'm going to post a patch to fix this and make C.UTF-8 consistent with C.

-- 
Cheers,
Carlos.