From: Adhemerval Zanella Netto
To: libc-alpha@sourceware.org, Joe Simmons-Talbott
Subject: Re: [PATCH] vfscanf-internal: Remove potentially unbounded allocas
Date: Mon, 3 Jul 2023 11:39:35 -0300
Message-ID: <3baffc46-3a2f-85ca-f5a4-b1857d7c44a7@linaro.org>
In-Reply-To: <20230703143829.2256518-1-adhemerval.zanella@linaro.org>
References: <20230703143829.2256518-1-adhemerval.zanella@linaro.org>
Organization: Linaro

On 03/07/23 11:38, Adhemerval Zanella wrote:
> From: Joe Simmons-Talbott

Sorry about that, I forgot to adjust the author.

>
> Some locales define a list of mapping pairs of alternate digits and
> separators for input digits (to_inpunct). This requires scanf to
> create a list of all possible inputs for the optional type
> modifier 'I'.
>
> Checked on x86_64-linux-gnu.
> ---
>  stdio-common/Makefile               |  3 ++
>  stdio-common/tst-scanf-to_inpunct.c | 78 ++++++++++++++++++++++++++++
>  stdio-common/vfscanf-internal.c     | 40 ++++++++-------
>  wcsmbs/Makefile                     |  3 ++
>  wcsmbs/tst-wscanf-to_inpunct.c      | 79 +++++++++++++++++++++++++++++
>  5 files changed, 186 insertions(+), 17 deletions(-)
>  create mode 100644 stdio-common/tst-scanf-to_inpunct.c
>  create mode 100644 wcsmbs/tst-wscanf-to_inpunct.c
>
> diff --git a/stdio-common/Makefile b/stdio-common/Makefile
> index 8871ec7668..f6d9017ff1 100644
> --- a/stdio-common/Makefile
> +++ b/stdio-common/Makefile
> @@ -231,6 +231,7 @@ tests := \
>    tst-scanf-binary-gnu11 \
>    tst-scanf-binary-gnu89 \
>    tst-scanf-round \
> +  tst-scanf-to_inpunct \
>    tst-setvbuf1 \
>    tst-sprintf \
>    tst-sprintf-errno \
> @@ -347,6 +348,7 @@ LOCALES := \
>    de_DE.ISO-8859-1 \
>    de_DE.UTF-8 \
>    en_US.ISO-8859-1 \
> +  fa_IR.UTF-8 \
>    hi_IN.UTF-8 \
>    ja_JP.EUC-JP \
>    ps_AF.UTF-8 \
> @@ -366,6 +368,7 @@ $(objpfx)tst-swprintf.out: $(gen-locales)
>  $(objpfx)tst-vfprintf-mbs-prec.out: $(gen-locales)
>  $(objpfx)tst-vfprintf-width-i18n.out: $(gen-locales)
>  $(objpfx)tst-grouping3.out: $(gen-locales)
> +$(objpfx)tst-scanf-to_inpunct.out: $(gen-locales)
>  endif
>
>  tst-printf-bz18872-ENV = MALLOC_TRACE=$(objpfx)tst-printf-bz18872.mtrace \
> diff --git a/stdio-common/tst-scanf-to_inpunct.c b/stdio-common/tst-scanf-to_inpunct.c
> new file mode 100644
> index 0000000000..32236ac2dc
> --- /dev/null
> +++ b/stdio-common/tst-scanf-to_inpunct.c
> @@ -0,0 +1,78 @@
> +/* Test scanf for languages with mapping pairs of alternate digits and
> +   separators.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   .  */
> +
> +#include
> +#include
> +#include
> +#include
> +
> +/* fa_IR defines to_inpunct for numbers.  */
> +static const struct
> +{
> +  int n;
> +  const char *str;
> +} inputs[] =
> +{
> +  { 1, "\xdb\xb1" },
> +  { 2, "\xdb\xb2" },
> +  { 3, "\xdb\xb3" },
> +  { 4, "\xdb\xb4" },
> +  { 5, "\xdb\xb5" },
> +  { 6, "\xdb\xb6" },
> +  { 7, "\xdb\xb7" },
> +  { 8, "\xdb\xb8" },
> +  { 9, "\xdb\xb9" },
> +  { 10, "\xdb\xb1\xdb\xb0" },
> +  { 11, "\xdb\xb1\xdb\xb1" },
> +  { 12, "\xdb\xb1\xdb\xb2" },
> +  { 13, "\xdb\xb1\xdb\xb3" },
> +  { 14, "\xdb\xb1\xdb\xb4" },
> +  { 15, "\xdb\xb1\xdb\xb5" },
> +  { 16, "\xdb\xb1\xdb\xb6" },
> +  { 17, "\xdb\xb1\xdb\xb7" },
> +  { 18, "\xdb\xb1\xdb\xb8" },
> +  { 19, "\xdb\xb1\xdb\xb9" },
> +  { 20, "\xdb\xb2\xdb\xb0" },
> +  { 30, "\xdb\xb3\xdb\xb0" },
> +  { 40, "\xdb\xb4\xdb\xb0" },
> +  { 50, "\xdb\xb5\xdb\xb0" },
> +  { 60, "\xdb\xb6\xdb\xb0" },
> +  { 70, "\xdb\xb7\xdb\xb0" },
> +  { 80, "\xdb\xb8\xdb\xb0" },
> +  { 90, "\xdb\xb9\xdb\xb0" },
> +  { 100, "\xdb\xb1\xdb\xb0\xdb\xb0" },
> +  { 1000, "\xdb\xb1\xdb\xb0\xdb\xb0\xdb\xb0" },
> +};
> +
> +static int
> +do_test (void)
> +{
> +  xsetlocale (LC_ALL, "fa_IR.UTF-8");
> +
> +  for (int i = 0; i < array_length (inputs); i++)
> +    {
> +      int n;
> +      sscanf (inputs[i].str, "%Id", &n);
> +      TEST_COMPARE (n, inputs[i].n);
> +    }
> +
> +  return 0;
> +}
> +
> +#include
> diff --git a/stdio-common/vfscanf-internal.c b/stdio-common/vfscanf-internal.c
> index bfb9baa21a..ba4b289de6 100644
> --- a/stdio-common/vfscanf-internal.c
> +++ b/stdio-common/vfscanf-internal.c
> @@ -1455,13 +1455,14 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
>                int from_level;
>                int to_level;
>                int level;
> +              enum { num_digits_len = 10 };
>  #ifdef COMPILE_WSCANF
> -              const wchar_t *wcdigits[10];
> -              const wchar_t *wcdigits_extended[10];
> +              const wchar_t *wcdigits[num_digits_len];
>  #else
> -              const char *mbdigits[10];
> -              const char *mbdigits_extended[10];
> +              const char *mbdigits[num_digits_len];
>  #endif
> +              CHAR_T *digits_extended[num_digits_len] = { NULL };
> +
>                /* "to_inpunct" is a map from ASCII digits to their
>                   equivalent in locale. This is defined for locales
>                   which use an extra digits set.  */
> @@ -1482,18 +1483,18 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
>                    /* Adding new level for extra digits set in locale file.  */
>                    ++to_level;
>
> -                  for (n = 0; n < 10; ++n)
> +                  for (n = 0; n < num_digits_len; ++n)
>                      {
>  #ifdef COMPILE_WSCANF
>                        wcdigits[n] = (const wchar_t *)
>                          _NL_CURRENT (LC_CTYPE, _NL_CTYPE_INDIGITS0_WC + n);
>
>                        wchar_t *wc_extended = (wchar_t *)
> -                        alloca ((to_level + 2) * sizeof (wchar_t));
> +                        malloc ((to_level + 2) * sizeof (wchar_t));
>                        __wmemcpy (wc_extended, wcdigits[n], to_level);
>                        wc_extended[to_level] = __towctrans (L'0' + n, map);
>                        wc_extended[to_level + 1] = '\0';
> -                      wcdigits_extended[n] = wc_extended;
> +                      digits_extended[n] = wc_extended;
>  #else
>                        mbdigits[n]
>                          = curctype->values[_NL_CTYPE_INDIGITS0_MB + n].string;
> @@ -1524,14 +1525,13 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
>                        size_t mbdigits_len = last_char - mbdigits[n];
>
>                        /* Allocate memory for extended multibyte digit.  */
> -                      char *mb_extended;
> -                      mb_extended = (char *) alloca (mbdigits_len + mblen + 1);
> +                      char *mb_extended = malloc (mbdigits_len + mblen + 1);
>
>                        /* And get the mbdigits + extra_digit string.  */
>                        *(char *) __mempcpy (__mempcpy (mb_extended, mbdigits[n],
>                                                        mbdigits_len),
>                                             extra_mbdigit, mblen) = '\0';
> -                      mbdigits_extended[n] = mb_extended;
> +                      digits_extended[n] = mb_extended;
>  #endif
>                      }
>                  }
> @@ -1541,7 +1541,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
>                  {
>                    /* In this round we get the pointer to the digit strings
>                       and also perform the first round of comparisons.  */
> -                  for (n = 0; n < 10; ++n)
> +                  for (n = 0; n < num_digits_len; ++n)
>                      {
>                        /* Get the string for the digits with value N.  */
>  #ifdef COMPILE_WSCANF
> @@ -1553,7 +1553,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
>                        DIAG_IGNORE_NEEDS_COMMENT (4.7, "-Wmaybe-uninitialized");
>
>                        if (__glibc_unlikely (map != NULL))
> -                        wcdigits[n] = wcdigits_extended[n];
> +                        wcdigits[n] = digits_extended[n];
>                        else
>                          wcdigits[n] = (const wchar_t *)
>                            _NL_CURRENT (LC_CTYPE, _NL_CTYPE_INDIGITS0_WC + n);
> @@ -1574,7 +1574,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
>                        int avail = width > 0 ? width : INT_MAX;
>
>                        if (__glibc_unlikely (map != NULL))
> -                        mbdigits[n] = mbdigits_extended[n];
> +                        mbdigits[n] = digits_extended[n];
>                        else
>                          mbdigits[n]
>                            = curctype->values[_NL_CTYPE_INDIGITS0_MB + n].string;
> @@ -1617,13 +1617,13 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
>  #endif
>                      }
>
> -                  if (n == 10)
> +                  if (n == num_digits_len)
>                      {
>                        /* Have not yet found the digit.  */
>                        for (level = from_level + 1; level <= to_level; ++level)
>                          {
>                            /* Search all ten digits of this level.  */
> -                          for (n = 0; n < 10; ++n)
> +                          for (n = 0; n < num_digits_len; ++n)
>                              {
>  #ifdef COMPILE_WSCANF
>                                if (c == (wint_t) *wcdigits[n])
> @@ -1679,7 +1679,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
>                      }
>                  }
>
> -              if (n < 10)
> +              if (n < num_digits_len)
>                  c = L_('0') + n;
>                else if (flags & GROUP)
>                  {
> @@ -1708,7 +1708,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
>                          {
>                            __set_errno (ENOMEM);
>                            done = EOF;
> -                          goto errout;
> +                          break;
>                          }
>
>                        if (*cmpp != '\0')
> @@ -1742,6 +1742,12 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
>
>                    c = inchar ();
>                  }
> +
> +              for (n = 0; n < num_digits_len; n++)
> +                free (digits_extended[n]);
> +
> +              if (done == EOF)
> +                goto errout;
>              }
>            else
>              /* Read the number into workspace.  */
> diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
> index 22192985e1..ed2660a524 100644
> --- a/wcsmbs/Makefile
> +++ b/wcsmbs/Makefile
> @@ -175,6 +175,7 @@ tests := \
>    tst-wscanf-binary-c2x \
>    tst-wscanf-binary-gnu11 \
>    tst-wscanf-binary-gnu89 \
> +  tst-wscanf-to_inpunct \
>    wcsatcliff \
>    wcsmbs-tst1 \
>    # tests
> @@ -186,6 +187,7 @@ LOCALES := \
>    de_DE.ISO-8859-1 \
>    de_DE.UTF-8 \
>    en_US.ANSI_X3.4-1968 \
> +  fa_IR.UTF-8 \
>    hr_HR.ISO-8859-2 \
>    ja_JP.EUC-JP \
>    tr_TR.ISO-8859-9 \
> @@ -207,6 +209,7 @@ $(objpfx)tst-c16-surrogate.out: $(gen-locales)
>  $(objpfx)tst-c32-state.out: $(gen-locales)
>  $(objpfx)test-c8rtomb.out: $(gen-locales)
>  $(objpfx)test-mbrtoc8.out: $(gen-locales)
> +$(objpfx)tst-wscanf-to_inpunct.out: $(gen-locales)
>  endif
>
>  $(objpfx)tst-wcstod-round: $(libm)
> diff --git a/wcsmbs/tst-wscanf-to_inpunct.c b/wcsmbs/tst-wscanf-to_inpunct.c
> new file mode 100644
> index 0000000000..72f2a1a422
> --- /dev/null
> +++ b/wcsmbs/tst-wscanf-to_inpunct.c
> @@ -0,0 +1,79 @@
> +/* Test scanf for languages with mapping pairs of alternate digits and
> +   separators.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   .  */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +/* fa_IR defines to_inpunct for numbers.  */
> +static const struct input_t
> +{
> +  int n;
> +  const wchar_t str[5];
> +} inputs[] =
> +{
> +  { 1, { 0x000006f1, L'\0' } },
> +  { 2, { 0x000006f2, L'\0' } },
> +  { 3, { 0x000006f3, L'\0' } },
> +  { 4, { 0x000006f4, L'\0' } },
> +  { 5, { 0x000006f5, L'\0' } },
> +  { 6, { 0x000006f6, L'\0' } },
> +  { 7, { 0x000006f7, L'\0' } },
> +  { 8, { 0x000006f8, L'\0' } },
> +  { 9, { 0x000006f9, L'\0' } },
> +  { 10, { 0x000006f1, 0x000006f0, L'\0' } },
> +  { 11, { 0x000006f1, 0x000006f1, L'\0' } },
> +  { 12, { 0x000006f1, 0x000006f2, L'\0' } },
> +  { 13, { 0x000006f1, 0x000006f3, L'\0' } },
> +  { 14, { 0x000006f1, 0x000006f4, L'\0' } },
> +  { 15, { 0x000006f1, 0x000006f5, L'\0' } },
> +  { 16, { 0x000006f1, 0x000006f6, L'\0' } },
> +  { 17, { 0x000006f1, 0x000006f7, L'\0' } },
> +  { 18, { 0x000006f1, 0x000006f8, L'\0' } },
> +  { 19, { 0x000006f1, 0x000006f9, L'\0' } },
> +  { 20, { 0x000006f2, 0x000006f0, L'\0' } },
> +  { 30, { 0x000006f3, 0x000006f0, L'\0' } },
> +  { 40, { 0x000006f4, 0x000006f0, L'\0' } },
> +  { 50, { 0x000006f5, 0x000006f0, L'\0' } },
> +  { 60, { 0x000006f6, 0x000006f0, L'\0' } },
> +  { 70, { 0x000006f7, 0x000006f0, L'\0' } },
> +  { 80, { 0x000006f8, 0x000006f0, L'\0' } },
> +  { 90, { 0x000006f9, 0x000006f0, L'\0' } },
> +  { 100, { 0x000006f1, 0x000006f0, 0x000006f0, L'\0' } },
> +  { 1000, { 0x000006f1, 0x000006f0, 0x000006f0, 0x000006f0, L'\0' } },
> +};
> +
> +static int
> +do_test (void)
> +{
> +  xsetlocale (LC_ALL, "fa_IR.UTF-8");
> +
> +  for (int i = 0; i < array_length (inputs); i++)
> +    {
> +      int n;
> +      swscanf (inputs[i].str, L"%Id", &n);
> +      TEST_COMPARE (n, inputs[i].n);
> +    }
> +
> +  return 0;
> +}
> +
> +#include
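
For reference, the cleanup pattern the vfscanf-internal.c hunks move to can be sketched outside of glibc roughly as below. This is only an illustration under assumptions, not the glibc code: make_extended, scan_sketch, and the suffix and fail_at parameters are hypothetical names invented for the example, and the matching loop is reduced to a counter. It keeps the three ideas from the patch: a pointer array initialised to NULL, a break instead of a goto past the cleanup, and a single free loop before the error exit. The sketch also checks the malloc result, which the hunks quoted above do not do.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { num_digits_len = 10 };

/* Hypothetical helper: return a heap string made of the ASCII digit N
   followed by SUFFIX, or NULL if the allocation fails.  */
static char *
make_extended (int n, const char *suffix)
{
  assert (n >= 0 && n < num_digits_len);
  size_t len = strlen (suffix);
  char *p = malloc (1 + len + 1);
  if (p == NULL)
    return NULL;
  p[0] = (char) ('0' + n);
  memcpy (p + 1, suffix, len + 1);   /* Copies the terminating NUL too.  */
  return p;
}

/* Hypothetical stand-in for the digit-matching code.  Returns the number
   of digits visited, or EOF if an error occurred.  FAIL_AT simulates a
   failure in the middle of the loop (pass -1 for no failure).  */
static int
scan_sketch (const char *suffix, int fail_at)
{
  /* All slots start as NULL so the cleanup loop is always safe.  */
  char *digits_extended[num_digits_len] = { NULL };
  int done = 0;

  for (int n = 0; n < num_digits_len; ++n)
    {
      digits_extended[n] = make_extended (n, suffix);
      if (digits_extended[n] == NULL)
        {
          errno = ENOMEM;
          done = EOF;
          break;
        }
    }

  for (int n = 0; done != EOF && n < num_digits_len; ++n)
    {
      if (n == fail_at)
        {
          errno = ENOMEM;
          done = EOF;
          break;   /* Leave the loop, but still reach the cleanup.  */
        }
      ++done;
    }

  /* Single cleanup point; free (NULL) is a no-op.  */
  for (int n = 0; n < num_digits_len; n++)
    free (digits_extended[n]);

  return done;
}
```

Calling scan_sketch ("x", -1) walks all ten digits, while scan_sketch ("x", 3) takes the error path; in both cases every allocated string is released before the function returns, which is the property the change from "goto errout" to "break" is meant to preserve.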