From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf1-x429.google.com (mail-pf1-x429.google.com [IPv6:2607:f8b0:4864:20::429]) by sourceware.org (Postfix) with ESMTPS id 0F1C43858C5F for ; Fri, 1 Mar 2024 17:26:12 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0F1C43858C5F Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 0F1C43858C5F Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::429 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709313975; cv=none; b=X0PyrrS5IZp8djgmejqNqLH6YOIFyuNCaZcHMeEyvl7YLvAg7UzP4TyFxhTMbNhirxNJLLVO5Xkwg7uaDrHS414GCtMXfBI9MW+WXjFgTxasRSeUzFedwbWsyNrzHwPCkOXQnvxjEs1wQf97luggG9HfoJGn9TBd+xlk9xyfpFM= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709313975; c=relaxed/simple; bh=b8MYtQY50vQak7W/jhwFEjM7TiU2RKnpWay+HCJeAI8=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=ufugEKFRUsZDE5K/c4wQOocbsFRkIc8UpPtBWWcIkyVo7pRhsd7s/AAbK7mCGKefWI7G9saNu1++VSKftPvnzF5sSmoZ3jfmslRKcEutxQTuV0h3leruKM4BRFEt3Ay/EMfEsv7qpdWZERdSK9LPoxqqklKXJjTKW7mLIr5ZX1w= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-pf1-x429.google.com with SMTP id d2e1a72fcca58-6da9c834646so1995495b3a.3 for ; Fri, 01 Mar 2024 09:26:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1709313971; x=1709918771; darn=sourceware.org; h=content-transfer-encoding:in-reply-to:organization:from:references :to:content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=tu/FVw0Bxdj9r5VBE5dOkEmc2OnpBF5cY2DBVa+D7qk=; b=qwlOt+DHz/zc02I2nj9n4ezh8TIYhNNIS4io3T5AFs/8IWyYoXovm/8+bS59RnFIQg MydSPMmHNCqK0kATaSnbQNtbU8ddi+8GJgjEHS2yxTTeexplKX25KNIISPaILO0QBBby wJhqaA/WrifUwRAQ/pJPJmMi4o218yd3F+H40DT02LLnpqEQ+CRBtUuHujBfbGYmydsG ddqS1rGhls9aBzh5EeoiIWOTMafOANgmopoXTgMSkuojMi1QQvl5TUfsUmZxGaEDGTkl x3keJk+Wnwt3zclEU8GcpUfEu+bZrKQr/AZVqy63bZ0pG8fSjNCyqsTaN1K0TWb2L+2S PfEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709313971; x=1709918771; h=content-transfer-encoding:in-reply-to:organization:from:references :to:content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=tu/FVw0Bxdj9r5VBE5dOkEmc2OnpBF5cY2DBVa+D7qk=; b=WxH1la26lrnxYqmXjZTqkBlzUSQH6QetWzc9pgfWxRoV1wbIfw8eDToCvcecUIwE30 Xo4cViM8sk/MKvqpCPe75yyE6E1RByqDb6oyciGxRzhwfovTF/60dof0YIkAwht2y14/ BTi6AbYqE7ty00ocjaG8Lmvlodtq9AJsNpop1zuhFuCFfnIzGQ+Cwo7QDMgy+9fss9L5 AZ7D5aD0UUwmF+oBD8ygyHYTRNv33cJtYHqdtzq2qViUwgk8FfCnzP5kn2/TetSzkedL OhuOO3nx0HhQKT6NXdO/4cLvzMKlt4bsGFqPZ7+iph5niWMyN41d5JuC1YBrikTsvq+Q SnXw== X-Forwarded-Encrypted: i=1; AJvYcCVuo+sw6ILGyySeaes6eGw9qGp+8jBRTLKCduyVJfGNQlKGj743fl7Yco7UIH4MJIhRgEuD4hfJb4b9uOmFTkOG9CtgW7eMNZOL X-Gm-Message-State: AOJu0Yw1AB3loX2jj4on5wclnYaMz6pU+iHt+K1PxPsdgg7j/o7TI0uh eli+rAAxvhhj/R/ZPg1mHInpJe0yxMKyspM94pI+wvNsqOhliToHz5JVgNiSjCM= X-Google-Smtp-Source: AGHT+IG0Qg4Q+YmswH8FNJrPTAqDutytADsXG8dUVIlcmdRwrX/2scx6xRRE/O227gxpLynSgx/D0g== X-Received: by 2002:a05:6a20:2156:b0:1a0:efd0:b183 with SMTP id z22-20020a056a20215600b001a0efd0b183mr2099293pzz.44.1709313970780; Fri, 01 Mar 2024 09:26:10 -0800 (PST) Received: from ?IPV6:2804:1b3:a7c1:c57d:5cfe:2983:6ffc:5712? ([2804:1b3:a7c1:c57d:5cfe:2983:6ffc:5712]) by smtp.gmail.com with ESMTPSA id 23-20020a630d57000000b005dc87f5dfcfsm3226707pgn.78.2024.03.01.09.26.09 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 01 Mar 2024 09:26:10 -0800 (PST) Message-ID: <94141573-394d-4a38-a7ee-a349a8f73623@linaro.org> Date: Fri, 1 Mar 2024 14:26:07 -0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] locale: Handle loading a missing locale twice (Bug 14247) Content-Language: en-US To: Carlos O'Donell , libc-alpha@sourceware.org, fweimer@redhat.com References: <20240227153513.1790813-1-carlos@redhat.com> From: Adhemerval Zanella Netto Organization: Linaro In-Reply-To: <20240227153513.1790813-1-carlos@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 27/02/24 12:35, Carlos O'Donell wrote: > Fedora has been carrying a downstream-only patch for 13 years that > changes the error handling for newlocale() and other locale related > APIs. The patch was likely never upstreamed because it was incomplete > and needed further fixes. In an attempt to cleanup our downstream patch > collection I've debugged the remaining failures and resolved bug 14247 > in the process. > > 8< --- 8< --- 8< > > Delay setting file->decided until the data has been successfully loaded > by _nl_load_locale(). If the function fails to load the data then we > must return and error and leave decided untouched to allow the caller to > attempt to load the data again at a later time. We should not set > decided to 1 early in the function since doing so may prevent attempting > to load it again. We want to try loading it again because that allows an > open to fail and set errno correctly. > > On the other side of this problem is that if we are called again with > the same inputs we will fetch the cached version of the object and carry > out no open syscalls and that fails to set errno so we must set errno to > ENOENT in that case. There is a second code path that has to be handled > where the name of the locale matches but the codeset doesn't match. > > These changes ensure that errno is correctly set on failure in all the > return paths in _nl_find_locale(). > > Adds tst-locale-loadlocale to cover the bug. > > No regressions on x86_64. > > Co-authored-by: Jeff Law Change looks ok, some minor style issue and a suggestion to use libsupport. > --- > gen-locales.mk | 13 +++- > locale/findlocale.c | 19 ++++- > locale/loadlocale.c | 2 +- > localedata/Makefile | 4 ++ > localedata/gen-locale.sh | 24 +++++-- > localedata/tst-locale-loadlocale.c | 109 +++++++++++++++++++++++++++++ > 6 files changed, 161 insertions(+), 10 deletions(-) > create mode 100644 localedata/tst-locale-loadlocale.c > > diff --git a/gen-locales.mk b/gen-locales.mk > index 9c523d2a05..96add65c05 100644 > --- a/gen-locales.mk > +++ b/gen-locales.mk > @@ -1,8 +1,19 @@ > # defines target $(gen-locales) that generates the locales given in $(LOCALES) > > LOCALE_SRCS := $(shell echo "$(LOCALES)"|sed 's/\([^ .]*\)[^@ ]*\(@[^ ]*\)\?/\1\2/g') > +# The CHARMAPS dependency handling must be able to process: > +# 1. No character map e.g. eo, en_US > +# 2. Character maps e.g. en_US.UTF-8 > +# 3. Character maps with modifier e.g. tt_RU.UTF-8@iqtelif > +# This complicates the processing slightly so we do it multiple edits, > +# the first captures the character map with the anchoring period while > +# the rest of the edits remove the period to get a valid file or adjust > +# the name to match the true name. > CHARMAPS := $(shell echo "$(LOCALES)" | \ > - sed -e 's/[^ .]*[.]\([^@ ]*\)\(@[^@ ]*\)*/\1/g' -e s/SJIS/SHIFT_JIS/g) > + sed -e 's/\([^ .]*\)\([^@ ]*\)\(@[^@ ]*\)*/\2/g' \ > + -e 's/^\./ /g' \ > + -e 's/ \./ /g' \ > + -e s/SJIS/SHIFT_JIS/g) > CTYPE_FILES = $(addsuffix /LC_CTYPE,$(LOCALES)) > gen-locales := $(addprefix $(common-objpfx)localedata/,$(CTYPE_FILES)) > > diff --git a/locale/findlocale.c b/locale/findlocale.c > index 8d6e4e33e3..43ff7201c1 100644 > --- a/locale/findlocale.c > +++ b/locale/findlocale.c > @@ -244,7 +244,14 @@ _nl_find_locale (const char *locale_path, size_t locale_path_len, > locale_file = locale_file->successor[cnt]; > > if (locale_file == NULL) > - return NULL; > + { > + /* If this is the second time we tried to load a failed > + locale then the locale_file value comes from the cache > + and we will not carry out any actual filesystem > + operations so we must set ENOENT here. */ > + __set_errno (ENOENT); > + return NULL; > + } > } > > /* The LC_CTYPE category allows to check whether a locale is really > @@ -291,8 +298,14 @@ _nl_find_locale (const char *locale_path, size_t locale_path_len, > if (__gconv_compare_alias (upstr (ccodeset, ccodeset), > upstr (clocale_codeset, > clocale_codeset)) != 0) > - /* The codesets are not identical, don't use the locale. */ > - return NULL; > + { > + /* The codesets are not identical, don't use the locale. > + If this is the second time we tried to load a locale > + whose codeset doesn't match then the result came from > + the cache and must set ENOENT here. */ > + __set_errno (ENOENT); > + return NULL; > + } > } > > /* Determine the locale name for which loading succeeded. This > diff --git a/locale/loadlocale.c b/locale/loadlocale.c > index 1e5f93e927..991c0591e9 100644 > --- a/locale/loadlocale.c > +++ b/locale/loadlocale.c > @@ -237,7 +237,6 @@ _nl_load_locale (struct loaded_l10nfile *file, int category) > int save_err; > int alloc = ld_mapped; > > - file->decided = 1; > file->data = NULL; > > fd = __open_nocancel (file->filename, O_RDONLY | O_CLOEXEC); > @@ -345,6 +344,7 @@ _nl_load_locale (struct loaded_l10nfile *file, int category) > newdata->alloc = alloc; > > file->data = newdata; > + file->decided = 1; > } > > void > diff --git a/localedata/Makefile b/localedata/Makefile > index 713e7aebad..6d0bac225b 100644 > --- a/localedata/Makefile > +++ b/localedata/Makefile > @@ -239,6 +239,7 @@ tests = \ > tst-iconv-emojis-trans \ > tst-iconv-math-trans \ > tst-leaks \ > + tst-locale-loadlocale \ > tst-mbswcs1 \ > tst-mbswcs2 \ > tst-mbswcs3 \ > @@ -325,6 +326,7 @@ LOCALES := \ > dsb_DE.UTF-8 \ > dz_BT.UTF-8 \ > en_GB.UTF-8 \ > + en_US \ > en_US.ANSI_X3.4-1968 \ > en_US.ISO-8859-1\ > en_US.UTF-8 \ > @@ -406,6 +408,8 @@ include ../gen-locales.mk > $(objpfx)tst-iconv-emojis-trans.out: $(gen-locales) > > $(objpfx)tst-iconv-math-trans.out: $(gen-locales) > +# tst-locale-loadlocale: Needs an en_US-named locale for the test. > +$(objpfx)tst-locale-loadlocale.out: $(gen-locales) > endif > > include ../Rules > diff --git a/localedata/gen-locale.sh b/localedata/gen-locale.sh > index b8c422ad13..e400d02b82 100644 > --- a/localedata/gen-locale.sh > +++ b/localedata/gen-locale.sh > @@ -48,9 +48,9 @@ generate_locale () > } > > locfile=`echo $locfile|sed 's|.*/\([^/]*/LC_CTYPE\)|\1|'` > -locale=`echo $locfile|sed 's|\([^.]*\)[.].*/LC_CTYPE|\1|'` > -charmap=`echo $locfile|sed 's|[^.]*[.]\([^@ ]*\)\(@[^ ]*\)\?/LC_CTYPE|\1|'` > -modifier=`echo $locfile|sed 's|[^.]*[.]\([^@ ]*\)\(@[^ ]*\)\?/LC_CTYPE|\2|'` > +locale=`echo $locfile|sed 's|\([^.]*\).*/LC_CTYPE|\1|'` > +charmap=`echo $locfile|sed -e 's|[^.]*\([^@ ]*\)\(@[^ ]*\)\?/LC_CTYPE|\1|' -e 's|^\.||g'` > +modifier=`echo $locfile|sed 's|[^.]*\([^@ ]*\)\(@[^ ]*\)\?/LC_CTYPE|\2|'` > > echo "Generating locale $locale.$charmap: this might take a while..." > > @@ -63,10 +63,16 @@ echo "Generating locale $locale.$charmap: this might take a while..." > # you to develop in-progress locales. > flags="" > > +charmap_real="$charmap" > + > +# If no character map is specified then we fall back to UTF-8. > +if [ -z "$charmap" ]; then > + charmap_real="UTF-8" > +fi > + > # For SJIS the charmap is SHIFT_JIS. We just want the locale to have > # a slightly nicer name instead of using "*.SHIFT_SJIS", but that > # means we need a mapping here. > -charmap_real="$charmap" > if [ "$charmap" = "SJIS" ]; then > charmap_real="SHIFT_JIS" > fi > @@ -80,4 +86,12 @@ if [ "$charmap_real" = 'SHIFT_JIS' ] \ > flags="$flags --no-warnings=ascii" > fi > > -generate_locale $charmap_real $locale$modifier $locale.$charmap$modifier "$flags" > +# If the character map is not specified then we output a locale > +# with the just the name of the locale e.g. eo, en_US. This is > +# used for test locales that act as fallbacks. > +output_file="$locale.$charmap$modifier" > +if [ -z "$charmap" ]; then > + output_file="$locale" > +fi > + > +generate_locale $charmap_real $locale$modifier $output_file "$flags" > diff --git a/localedata/tst-locale-loadlocale.c b/localedata/tst-locale-loadlocale.c > new file mode 100644 > index 0000000000..14f394d395 > --- /dev/null > +++ b/localedata/tst-locale-loadlocale.c > @@ -0,0 +1,109 @@ > +/* Test for locale loading error handling (Bug 14247) > + Copyright (C) 2023 Free Software Foundation, Inc. s/2023/2024. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#include > +#include > +#include > +#include > + > +static int > +do_test(void) Space after function name. Same for the rest of the file. > +{ > + locale_t loc; > + int ret; > + int saved_errno; > + > + /* We must load an en_US-based locale for the final test to fail. > + The locale loading code will find the en_US locale already loaded > + but the codesets won't match. */ > + xsetlocale(LC_ALL, "en_US.UTF-8")> + > + /* Call newlocale with an invalid locale. We expect the test system > + does not have "invalidlocale" locale. */ > + errno = 0; > + ret = 0; > + loc = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, > + "invalidlocale.utf8", > + 0); > + saved_errno = errno; > + if (errno != 0) > + printf ("PASS: First call to newlocale failed as expected.\n"); > + else > + { > + printf ("FAIL: First call to newlocale unexpectedly succeeded.\n"); > + ret = 1; > + } > + printf("result = %p errno = %d\n", loc, saved_errno); > + > + /* Call newlocale again with the same name. This triggers bug 14247 where > + the second call fails to set errno correctly. */ > + errno = 0; > + loc = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, > + "invalidlocale.utf8", > + 0); > + saved_errno = errno; > + if (errno != 0) > + printf ("PASS: Second call to newlocale failed as expected.\n"); > + else > + { > + printf ("FAIL: Second call to newlocale unexpectedly succeeded.\n"); > + ret = 1; > + } > + printf("result = %p errno = %d\n", loc, saved_errno); > + > + /* Next we attempt to load a locale that exists but whose codeset > + does not match the codeset of the locale on the system. > + This is more difficult to test but we rely on the fact that locale > + processing will normalize the locale name and attempt to open > + "en_US" with no codeset as a fallback and this will allow us to > + compare a loaded "en_US" locale with a UTF-8 codeset to the > + ficiticious "en_US.utf99" and get a codeset match failure. */ > + errno = 0; > + loc = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, > + "en_US.utf99", > + 0); > + saved_errno = errno; > + if (errno != 0) > + printf ("PASS: Third call to newlocale failed as expected.\n"); > + else > + { > + printf ("FAIL: Third call to newlocale unexpectedly succeeded.\n"); > + ret = 1; > + } > + printf("result = %p errno = %d\n", loc, saved_errno); Maybe use libsupport here? errno = 0; loc = newlocale (LC_COLLATE_MASK | LC_CTYPE_MASK, "invalidlocale.utf8", 0); TEST_VERIFY (errno != 0); printf ("result = %p errno = %d\n", loc, saved_errno); [...] > + > + /* Call newlocale again with the same name. This triggers bug 14247 where > + the second call fails to set errno correctly. */ > + errno = 0; > + loc = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, > + "en_US.utf99", > + 0); > + saved_errno = errno; > + if (errno != 0) > + printf ("PASS: Fourth call to newlocale failed as expected.\n"); > + else > + { > + printf ("FAIL: Fourth call to newlocale unexpectedly succeeded.\n"); > + ret = 1; > + } > + printf("result = %p errno = %d\n", loc, saved_errno); > + > + return ret; > +} > + > +#include