From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from olivedrab.birch.relay.mailchannels.net (olivedrab.birch.relay.mailchannels.net [23.83.209.135]) by sourceware.org (Postfix) with ESMTPS id 79A7F396EC80 for ; Wed, 11 Aug 2021 07:43:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 79A7F396EC80 X-Sender-Id: dreamhost|x-authsender|siddhesh@gotplt.org Received: from relay.mailchannels.net (localhost [127.0.0.1]) by relay.mailchannels.net (Postfix) with ESMTP id 7329012281E for ; Wed, 11 Aug 2021 07:43:08 +0000 (UTC) Received: from pdx1-sub0-mail-a42.g.dreamhost.com (100-96-99-6.trex.outbound.svc.cluster.local [100.96.99.6]) (Authenticated sender: dreamhost) by relay.mailchannels.net (Postfix) with ESMTPA id 848AB12284D for ; Wed, 11 Aug 2021 07:43:05 +0000 (UTC) X-Sender-Id: dreamhost|x-authsender|siddhesh@gotplt.org Received: from pdx1-sub0-mail-a42.g.dreamhost.com (pop.dreamhost.com [64.90.62.162]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384) by 100.96.99.6 (trex/6.3.3); Wed, 11 Aug 2021 07:43:08 +0000 X-MC-Relay: Neutral X-MailChannels-SenderId: dreamhost|x-authsender|siddhesh@gotplt.org X-MailChannels-Auth-Id: dreamhost X-Callous-Fumbling: 390ecbdb4486e357_1628667785914_1674632939 X-MC-Loop-Signature: 1628667785914:3240115492 X-MC-Ingress-Time: 1628667785914 Received: from pdx1-sub0-mail-a42.g.dreamhost.com (localhost [127.0.0.1]) by pdx1-sub0-mail-a42.g.dreamhost.com (Postfix) with ESMTP id D05E27E639 for ; Wed, 11 Aug 2021 00:43:04 -0700 (PDT) Received: from [192.168.1.159] (unknown [1.186.101.110]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: siddhesh@gotplt.org) by pdx1-sub0-mail-a42.g.dreamhost.com (Postfix) with ESMTPSA id 9CD577EED5 for ; Wed, 11 Aug 2021 00:43:02 -0700 (PDT) Subject: [PING][PATCH v2] setlocale: Fail if iconv module for charset is not present [BZ #27996] To: libc-alpha@sourceware.org References: <2e328221-075c-0297-d437-a1515373dd92@sourceware.org> <20210720023658.1278155-1-siddhesh@sourceware.org> X-DH-BACKEND: pdx1-sub0-mail-a42 From: Siddhesh Poyarekar Message-ID: <380077ec-12e6-a06e-d9ed-f433080bd30a@sourceware.org> Date: Wed, 11 Aug 2021 13:12:58 +0530 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: <20210720023658.1278155-1-siddhesh@sourceware.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-3493.8 required=5.0 tests=BAYES_00, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_SHORT, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_NEUTRAL, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Aug 2021 07:43:21 -0000 Ping! On 7/20/21 8:06 AM, Siddhesh Poyarekar via Libc-alpha wrote: > setlocale currently succeeds even if the requested locale uses a > charset that does not have a converter module installed. Check for > existence of the charset (either the one requested through the input > name or the one needed by the selected locale file) and fail if it > doesn't. > > The new test tst-invalid-charset verifes that loading test5 and test6 > locales fail because both locales have charsets without a converter, > viz. test5 and test6 respectively. Also, test6.c has been removed as > it was unused. > --- > Changes from v1: > - Find full transformation paths both ways instead of merely looking for > a FROM converter. > > locale/findlocale.c | 77 ++++++++++++----- > localedata/Makefile | 12 ++- > localedata/tests/test6.c | 137 ------------------------------- > localedata/tst-invalid-charset.c | 31 +++++++ > 4 files changed, 95 insertions(+), 162 deletions(-) > delete mode 100644 localedata/tests/test6.c > create mode 100644 localedata/tst-invalid-charset.c > > diff --git a/locale/findlocale.c b/locale/findlocale.c > index ab09122b0c..7ccc98cd8b 100644 > --- a/locale/findlocale.c > +++ b/locale/findlocale.c > @@ -98,6 +98,30 @@ valid_locale_name (const char *name) > return 1; > } > > +/* Return true if we have gconv modules to transform between the INTERNAL > + encoding and CODESET. */ > +static bool > +codeset_has_module (const char *codeset) > +{ > + struct __gconv_step *steps; > + size_t nsteps; > + > + char *ccodeset = (char *) alloca (strlen (codeset) + 3); > + strip (ccodeset, codeset); > + > + if (__gconv_find_transform ("INTERNAL", ccodeset, &steps, &nsteps, 0) > + != __GCONV_OK) > + return false; > + __gconv_close_transform (steps, nsteps); > + > + if (__gconv_find_transform (ccodeset, "INTERNAL", &steps, &nsteps, 0) > + != __GCONV_OK) > + return false; > + __gconv_close_transform (steps, nsteps); > + > + return true; > +} > + > struct __locale_data * > _nl_find_locale (const char *locale_path, size_t locale_path_len, > int category, const char **name) > @@ -200,6 +224,10 @@ _nl_find_locale (const char *locale_path, size_t locale_path_len, > /* Memory allocate problem. */ > return NULL; > > + /* The requested codeset does not have a converter, don't use it. */ > + if (codeset != NULL && !codeset_has_module (codeset)) > + return NULL; > + > /* If exactly this locale was already asked for we have an entry with > the complete name. */ > locale_file = _nl_make_l10nflist (&_nl_locale_file_list[category], > @@ -248,6 +276,33 @@ _nl_find_locale (const char *locale_path, size_t locale_path_len, > return NULL; > } > > + /* Get the codeset information from the locale file. */ > + static const int codeset_idx[] = > + { > + [__LC_CTYPE] = _NL_ITEM_INDEX (CODESET), > + [__LC_NUMERIC] = _NL_ITEM_INDEX (_NL_NUMERIC_CODESET), > + [__LC_TIME] = _NL_ITEM_INDEX (_NL_TIME_CODESET), > + [__LC_COLLATE] = _NL_ITEM_INDEX (_NL_COLLATE_CODESET), > + [__LC_MONETARY] = _NL_ITEM_INDEX (_NL_MONETARY_CODESET), > + [__LC_MESSAGES] = _NL_ITEM_INDEX (_NL_MESSAGES_CODESET), > + [__LC_PAPER] = _NL_ITEM_INDEX (_NL_PAPER_CODESET), > + [__LC_NAME] = _NL_ITEM_INDEX (_NL_NAME_CODESET), > + [__LC_ADDRESS] = _NL_ITEM_INDEX (_NL_ADDRESS_CODESET), > + [__LC_TELEPHONE] = _NL_ITEM_INDEX (_NL_TELEPHONE_CODESET), > + [__LC_MEASUREMENT] = _NL_ITEM_INDEX (_NL_MEASUREMENT_CODESET), > + [__LC_IDENTIFICATION] = _NL_ITEM_INDEX (_NL_IDENTIFICATION_CODESET) > + }; > + const struct __locale_data *data; > + const char *locale_codeset; > + > + data = (const struct __locale_data *) locale_file->data; > + locale_codeset = (const char *) data->values[codeset_idx[category]].string; > + assert (locale_codeset != NULL); > + > + /* The locale codeset does not have a converter, don't use it. */ > + if (locale_codeset[0] != '\0' && !codeset_has_module (locale_codeset)) > + return NULL; > + > /* The LC_CTYPE category allows to check whether a locale is really > usable. If the locale name contains a charset name and the > charset name used in the locale (present in the LC_CTYPE data) is > @@ -256,31 +311,9 @@ _nl_find_locale (const char *locale_path, size_t locale_path_len, > in the locale name. */ > if (codeset != NULL) > { > - /* Get the codeset information from the locale file. */ > - static const int codeset_idx[] = > - { > - [__LC_CTYPE] = _NL_ITEM_INDEX (CODESET), > - [__LC_NUMERIC] = _NL_ITEM_INDEX (_NL_NUMERIC_CODESET), > - [__LC_TIME] = _NL_ITEM_INDEX (_NL_TIME_CODESET), > - [__LC_COLLATE] = _NL_ITEM_INDEX (_NL_COLLATE_CODESET), > - [__LC_MONETARY] = _NL_ITEM_INDEX (_NL_MONETARY_CODESET), > - [__LC_MESSAGES] = _NL_ITEM_INDEX (_NL_MESSAGES_CODESET), > - [__LC_PAPER] = _NL_ITEM_INDEX (_NL_PAPER_CODESET), > - [__LC_NAME] = _NL_ITEM_INDEX (_NL_NAME_CODESET), > - [__LC_ADDRESS] = _NL_ITEM_INDEX (_NL_ADDRESS_CODESET), > - [__LC_TELEPHONE] = _NL_ITEM_INDEX (_NL_TELEPHONE_CODESET), > - [__LC_MEASUREMENT] = _NL_ITEM_INDEX (_NL_MEASUREMENT_CODESET), > - [__LC_IDENTIFICATION] = _NL_ITEM_INDEX (_NL_IDENTIFICATION_CODESET) > - }; > - const struct __locale_data *data; > - const char *locale_codeset; > char *clocale_codeset; > char *ccodeset; > > - data = (const struct __locale_data *) locale_file->data; > - locale_codeset = > - (const char *) data->values[codeset_idx[category]].string; > - assert (locale_codeset != NULL); > /* Note the length of the allocated memory: +3 for up to two slashes > and the NUL byte. */ > clocale_codeset = (char *) alloca (strlen (locale_codeset) + 3); > diff --git a/localedata/Makefile b/localedata/Makefile > index 14e04cd3c5..2af399ec51 100644 > --- a/localedata/Makefile > +++ b/localedata/Makefile > @@ -124,11 +124,13 @@ test-input := \ > test-input-data = $(addsuffix .in, $(test-input)) > test-output := $(foreach s, .out .xout, \ > $(addsuffix $s, $(basename $(test-input)))) > +# Note that tst-invalid-charset depends on test5 and test6 being locales that > +# do not have valid charset converters. > ld-test-names := test1 test2 test3 test4 test5 test6 test7 > ld-test-srcs := $(addprefix tests/,$(addsuffix .cm,$(ld-test-names)) \ > $(addsuffix .def,$(ld-test-names)) \ > $(addsuffix .ds,test5 test6) \ > - test6.c trans.def) > + trans.def) > > fmon-tests = n01y12 n02n40 n10y31 n11y41 n12y11 n20n32 n30y20 n41n00 \ > y01y10 y02n22 y22n42 y30y21 y32n31 y40y00 y42n21 > @@ -158,7 +160,7 @@ tests = $(locale_test_suite) tst-digits tst-setlocale bug-iconv-trans \ > tst-leaks tst-mbswcs1 tst-mbswcs2 tst-mbswcs3 tst-mbswcs4 tst-mbswcs5 \ > tst-mbswcs6 tst-xlocale1 tst-xlocale2 bug-usesetlocale \ > tst-strfmon1 tst-sscanf bug-setlocale1 tst-setlocale2 tst-setlocale3 \ > - tst-wctype tst-iconv-math-trans > + tst-wctype tst-iconv-math-trans tst-invalid-charset > tests-static = bug-setlocale1-static > tests += $(tests-static) > ifeq (yes,$(build-shared)) > @@ -401,7 +403,10 @@ $(objpfx)tst-langinfo-setlocale-static.out: tst-langinfo.sh \ > '$(run-program-env)' '$(test-program-cmd-after-env)' > $@; \ > $(evaluate-test) > > +# These tests depend on tst-locale because they use the locales compiled by > +# that test. > $(objpfx)tst-digits.out: $(objpfx)tst-locale.out > +$(objpfx)tst-invalid-charset.out: $(objpfx)tst-locale.out > $(objpfx)tst-mbswcs6.out: $(addprefix $(objpfx),$(CTYPE_FILES)) > endif > > @@ -461,7 +466,8 @@ $(objpfx)mtrace-tst-leaks.out: $(objpfx)tst-leaks.out > $(common-objpfx)malloc/mtrace $(objpfx)tst-leaks.mtrace > $@; \ > $(evaluate-test) > > -bug-setlocale1-ENV-only = LOCPATH=$(objpfx) LC_CTYPE=de_DE.UTF-8 > +bug-setlocale1-ENV-only = GCONV_PATH=$(common-objpfx)iconvdata \ > + LOCPATH=$(objpfx) LC_CTYPE=de_DE.UTF-8 > bug-setlocale1-static-ENV-only = $(bug-setlocale1-ENV-only) > > $(objdir)/iconvdata/gconv-modules: > diff --git a/localedata/tests/test6.c b/localedata/tests/test6.c > deleted file mode 100644 > index edb5fe4a5f..0000000000 > --- a/localedata/tests/test6.c > +++ /dev/null > @@ -1,137 +0,0 @@ > -/* Test program for character classes and mappings. > - Copyright (C) 1999-2021 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - Contributed by Ulrich Drepper , 1999. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - . */ > - > -#include > -#include > -#include > - > - > -int > -main (void) > -{ > - const char lower[] = "abcdefghijklmnopqrstuvwxyz"; > - const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; > -#define LEN (sizeof (upper) - 1) > - const wchar_t wlower[] = L"abcdefghijklmnopqrstuvwxyz"; > - const wchar_t wupper[] = L"ABCDEFGHIJKLMNOPQRSTUVWXYZ"; > - int i; > - int result = 0; > - > - setlocale (LC_ALL, "test6"); > - > - for (i = 0; i < LEN; ++i) > - { > - /* Test basic table handling (basic == not more than 256 characters). > - The charmaps swaps the normal lower-upper case meaning of the > - ASCII characters used in the source code while the Unicode mapping > - in the repertoire map has the normal correspondents. This test > - shows the independence of the tables for `char' and `wchar_t' > - characters. */ > - > - if (islower (lower[i])) > - { > - printf ("islower ('%c') false\n", lower[i]); > - result = 1; > - } > - if (! isupper (lower[i])) > - { > - printf ("isupper ('%c') false\n", lower[i]); > - result = 1; > - } > - > - if (! islower (upper[i])) > - { > - printf ("islower ('%c') false\n", upper[i]); > - result = 1; > - } > - if (isupper (upper[i])) > - { > - printf ("isupper ('%c') false\n", upper[i]); > - result = 1; > - } > - > - if (toupper (lower[i]) != lower[i]) > - { > - printf ("toupper ('%c') false\n", lower[i]); > - result = 1; > - } > - if (tolower (lower[i]) != upper[i]) > - { > - printf ("tolower ('%c') false\n", lower[i]); > - result = 1; > - } > - > - if (tolower (upper[i]) != upper[i]) > - { > - printf ("tolower ('%c') false\n", upper[i]); > - result = 1; > - } > - if (toupper (upper[i]) != lower[i]) > - { > - printf ("toupper ('%c') false\n", upper[i]); > - result = 1; > - } > - > - if (iswlower (wupper[i])) > - { > - printf ("iswlower (L'%c') false\n", upper[i]); > - result = 1; > - } > - if (! iswupper (wupper[i])) > - { > - printf ("iswupper (L'%c') false\n", upper[i]); > - result = 1; > - } > - > - if (iswupper (wlower[i])) > - { > - printf ("iswupper (L'%c') false\n", lower[i]); > - result = 1; > - } > - if (! iswlower (wlower[i])) > - { > - printf ("iswlower (L'%c') false\n", lower[i]); > - result = 1; > - } > - > - if (towupper (wlower[i]) != wupper[i]) > - { > - printf ("towupper ('%c') false\n", lower[i]); > - result = 1; > - } > - if (towlower (wlower[i]) != wlower[i]) > - { > - printf ("towlower ('%c') false\n", lower[i]); > - result = 1; > - } > - > - if (towlower (wupper[i]) != wlower[i]) > - { > - printf ("towlower ('%c') false\n", upper[i]); > - result = 1; > - } > - if (towupper (wupper[i]) != wupper[i]) > - { > - printf ("towupper ('%c') false\n", upper[i]); > - result = 1; > - } > - } > - > - return result; > -} > diff --git a/localedata/tst-invalid-charset.c b/localedata/tst-invalid-charset.c > new file mode 100644 > index 0000000000..46a5198c66 > --- /dev/null > +++ b/localedata/tst-invalid-charset.c > @@ -0,0 +1,31 @@ > +/* Test program to verify that setlocale fails for charsets that do not have a > + converter. > + Copyright (C) 2021 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#include > +#include > +#include > + > + > +int > +main (void) > +{ > + /* Fail if setlocale succeeds for any of these locales. */ > + return (setlocale (LC_ALL, "test5") != NULL > + || setlocale (LC_ALL, "test6") != NULL); > +} >