From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ot1-x32c.google.com (mail-ot1-x32c.google.com [IPv6:2607:f8b0:4864:20::32c]) by sourceware.org (Postfix) with ESMTPS id 81B1E3858D28 for ; Mon, 3 Jul 2023 13:40:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 81B1E3858D28 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-ot1-x32c.google.com with SMTP id 46e09a7af769-6b5d57d7db9so3515831a34.3 for ; Mon, 03 Jul 2023 06:40:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1688391632; x=1690983632; h=content-transfer-encoding:in-reply-to:organization:references:to :from:content-language:subject:user-agent:mime-version:date :message-id:from:to:cc:subject:date:message-id:reply-to; bh=356+4pJm+v7sGvs1Uos/Guk0knNz4xoATZ91rOq4X/A=; b=kMOEtNn1JY3Y7Jb8CLd/jRUuAKLne83hercm2gr2lTIGWLD93UHrkG5n5nWgag5RF/ nq33RJdQg5kd2Eeu+vJ41ZCo1wLvpYcXyQXngd6nmitP+Ys/2+TxWlpFJ5KYZAwniH3L yFmPpiO2Fg9xrOzNgxH3Y1BQysVaC9V9c57SywD89Ba11Mn1aqsG4UR+ScTht96cxjxN x2gtJG/y0yshptcvPqAbtV+3T6IeFGqGLGRykAZyhBtz7doD9qwxfq8YEog0gYD4b3eD 02nE+gas5i2b5qpTsm7j8sNOnfD8NpgO5w39GDJaKpcoHHF1jlxF9vJdwm7qtrJQWYkX T7ew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1688391632; x=1690983632; h=content-transfer-encoding:in-reply-to:organization:references:to :from:content-language:subject:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=356+4pJm+v7sGvs1Uos/Guk0knNz4xoATZ91rOq4X/A=; b=C7IoGvREKyV5q1tWiYDSOfyZIhvNBJWGBo0Et1UzeaE5I6AxyzJJVTViTa9R9M8AH0 h27hzsR6fZRg5WQpwBIub2YpOGbKbrBE273fZrrFEfb/m+VPP9vpq8ih73lU73jYLaOC 23Z4veNd3Yhwww94+0XTj1r7pq0RPbnEuagxZbT07y3RyEuQP/SaK0sL+XlqPF26HaRa h7PKNBAzAPAOUBqUikQa6QxCGbqGicidEIbxebzkN09NkygqDwC2im8k1wWRjF5ziKio 9wSGkm8HvMy2MQfYpgucORO5U/FotSZ6kUtSbbBGAovQb6ZT/jmtvuC2P0obP/ry9CrP ahuA== X-Gm-Message-State: AC+VfDyLLnsxofU+uG/eNgygcYvUnJxHvK9ICzyO7zXq1YX//c93oocq DKCIztbqRlg8VvBEUtN5n9hvVNlavj2+XoxCZs+pPQ== X-Google-Smtp-Source: ACHHUZ7MmzTi9LIoDWiwNy/FD5uzJniz/taESMuv90T+MvZF96FYWw0/fY+46rN8Uv+bfcUFEWppIg== X-Received: by 2002:a9d:6c55:0:b0:6b8:93c7:a84a with SMTP id g21-20020a9d6c55000000b006b893c7a84amr8372843otq.16.1688391631723; Mon, 03 Jul 2023 06:40:31 -0700 (PDT) Received: from ?IPV6:2804:1b3:a7c3:665c:4c86:ac7d:d2ce:ef? ([2804:1b3:a7c3:665c:4c86:ac7d:d2ce:ef]) by smtp.gmail.com with ESMTPSA id f16-20020a9d5e90000000b006b4a11b8df4sm9148444otl.51.2023.07.03.06.40.30 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 03 Jul 2023 06:40:31 -0700 (PDT) Message-ID: <8a8f0468-c7aa-61d4-6e7f-fb782dcc3059@linaro.org> Date: Mon, 3 Jul 2023 10:40:28 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.12.0 Subject: Re: [PATCH] vfscanf-internal: Get rid of unbounded allocas Content-Language: en-US From: Adhemerval Zanella Netto To: Joe Simmons-Talbott , libc-alpha@sourceware.org References: <20230627180556.728512-1-josimmon@redhat.com> <148e9429-761f-ec58-3631-8fd26413b828@linaro.org> Organization: Linaro In-Reply-To: <148e9429-761f-ec58-3631-8fd26413b828@linaro.org> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 30/06/23 15:14, Adhemerval Zanella Netto wrote: > > > On 27/06/23 15:05, Joe Simmons-Talbott via Libc-alpha wrote: >> Replace two unbounded allocas with scratch_buffers to avoid potential >> stack overflow. >> --- >> stdio-common/vfscanf-internal.c | 26 +++++++++++++++++++++++--- >> 1 file changed, 23 insertions(+), 3 deletions(-) >> >> diff --git a/stdio-common/vfscanf-internal.c b/stdio-common/vfscanf-internal.c >> index bfb9baa21a..60376ccff9 100644 >> --- a/stdio-common/vfscanf-internal.c >> +++ b/stdio-common/vfscanf-internal.c >> @@ -335,6 +335,12 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, >> struct char_buffer charbuf; >> scratch_buffer_init (&charbuf.scratch); >> >> + struct scratch_buffer wc_ext_sbuf; >> + scratch_buffer_init (&wc_ext_sbuf); >> + >> + struct scratch_buffer mb_ext_sbuf; >> + scratch_buffer_init (&mb_ext_sbuf); >> + >> #ifdef __va_copy >> __va_copy (arg, argptr); >> #else >> @@ -1488,8 +1494,14 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, >> wcdigits[n] = (const wchar_t *) >> _NL_CURRENT (LC_CTYPE, _NL_CTYPE_INDIGITS0_WC + n); >> >> - wchar_t *wc_extended = (wchar_t *) >> - alloca ((to_level + 2) * sizeof (wchar_t)); >> + wchar_t *wc_extended; >> + if (!scratch_buffer_set_array_size >> + (&wc_ext_sbuf, sizeof (wchar_t), to_level + 2)) >> + { >> + done = EOF; >> + goto errout; >> + } >> + wc_extended = wc_ext_sbuf.data; >> __wmemcpy (wc_extended, wcdigits[n], to_level); >> wc_extended[to_level] = __towctrans (L'0' + n, map); >> wc_extended[to_level + 1] = '\0'; >> @@ -1525,7 +1537,13 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, >> >> /* Allocate memory for extended multibyte digit. */ >> char *mb_extended; >> - mb_extended = (char *) alloca (mbdigits_len + mblen + 1); >> + if (!scratch_buffer_set_array_size >> + (&mb_ext_sbuf, 1, mbdigits_len + mblen + 1)) >> + { >> + done = EOF; >> + goto errout; >> + } >> + mb_extended = mb_ext_sbuf.data; > > Unfortunately this change is wrong, it will overwrite the wcdigits[n]/mbdigits[n] > on each iteration since you set the value of the stratch_buffer without adjusting > the offset for each entry (for the wide operation). > > This code path will just be triggered by a handful of locale, but you can check > with the example: > > --- > #include > #include > #include > > #define array_length(var) \ > (sizeof (var) / sizeof ((var)[0]) \ > + 0 * sizeof (struct { \ > _Static_assert (!__builtin_types_compatible_p \ > (__typeof (var), __typeof (&(var)[0])), \ > "argument must be an array"); \ > })) > > static const struct > { > int n; > const char *str; > } inputs[] = > { > { 1, "\xdb\xb1" }, > { 2, "\xdb\xb2" }, > { 3, "\xdb\xb3" }, > { 4, "\xdb\xb4" }, > { 5, "\xdb\xb5" }, > { 6, "\xdb\xb6" }, > { 7, "\xdb\xb7" }, > { 8, "\xdb\xb8" }, > { 9, "\xdb\xb9" }, > { 10, "\xdb\xb1\xdb\xb0" }, > { 11, "\xdb\xb1\xdb\xb1" }, > { 12, "\xdb\xb1\xdb\xb2" }, > { 13, "\xdb\xb1\xdb\xb3" }, > { 14, "\xdb\xb1\xdb\xb4" }, > { 15, "\xdb\xb1\xdb\xb5" }, > { 16, "\xdb\xb1\xdb\xb6" }, > { 17, "\xdb\xb1\xdb\xb7" }, > { 18, "\xdb\xb1\xdb\xb8" }, > { 19, "\xdb\xb1\xdb\xb9" }, > { 20, "\xdb\xb2\xdb\xb0" }, > { 30, "\xdb\xb3\xdb\xb0" }, > { 40, "\xdb\xb4\xdb\xb0" }, > { 50, "\xdb\xb5\xdb\xb0" }, > { 60, "\xdb\xb6\xdb\xb0" }, > { 70, "\xdb\xb7\xdb\xb0" }, > { 80, "\xdb\xb8\xdb\xb0" }, > { 90, "\xdb\xb9\xdb\xb0" }, > { 100, "\xdb\xb1\xdb\xb0\xdb\xb0" }, > { 1000, "\xdb\xb1\xdb\xb0\xdb\xb0\xdb\xb0" }, > }; > > int main (int argc, char *argv[]) > { > assert (setlocale (LC_ALL, "fa_IR.UTF-8") != NULL); > > for (int i = 0; i < array_length (inputs); i++) > { > int n; > sscanf (inputs[i].str, "%Id", &n); > assert (n == inputs[i].n); > } > > return 0; > } > --- > > I generate the inputs by round-trip with printf using the same locale. A similar test > for wide char would be: > > --- > #include > #include > #include > #include > > #define array_length(var) \ > (sizeof (var) / sizeof ((var)[0]) \ > + 0 * sizeof (struct { \ > _Static_assert (!__builtin_types_compatible_p \ > (__typeof (var), __typeof (&(var)[0])), \ > "argument must be an array"); \ > })) > > static const struct input_t > { > int n; > const wchar_t str[5]; > } inputs[] = > { > { 1, { 0x000006f1, L'\0' } }, > { 2, { 0x000006f2, L'\0' } }, > { 3, { 0x000006f3, L'\0' } }, > { 4, { 0x000006f4, L'\0' } }, > { 5, { 0x000006f5, L'\0' } }, > { 6, { 0x000006f6, L'\0' } }, > { 7, { 0x000006f7, L'\0' } }, > { 8, { 0x000006f8, L'\0' } }, > { 9, { 0x000006f9, L'\0' } }, > { 10, { 0x000006f1, 0x000006f0, L'\0' } }, > { 11, { 0x000006f1, 0x000006f1, L'\0' } }, > { 12, { 0x000006f1, 0x000006f2, L'\0' } }, > { 13, { 0x000006f1, 0x000006f3, L'\0' } }, > { 14, { 0x000006f1, 0x000006f4, L'\0' } }, > { 15, { 0x000006f1, 0x000006f5, L'\0' } }, > { 16, { 0x000006f1, 0x000006f6, L'\0' } }, > { 17, { 0x000006f1, 0x000006f7, L'\0' } }, > { 18, { 0x000006f1, 0x000006f8, L'\0' } }, > { 19, { 0x000006f1, 0x000006f9, L'\0' } }, > { 20, { 0x000006f2, 0x000006f0, L'\0' } }, > { 30, { 0x000006f3, 0x000006f0, L'\0' } }, > { 40, { 0x000006f4, 0x000006f0, L'\0' } }, > { 50, { 0x000006f5, 0x000006f0, L'\0' } }, > { 60, { 0x000006f6, 0x000006f0, L'\0' } }, > { 70, { 0x000006f7, 0x000006f0, L'\0' } }, > { 80, { 0x000006f8, 0x000006f0, L'\0' } }, > { 90, { 0x000006f9, 0x000006f0, L'\0' } }, > { 100, { 0x000006f1, 0x000006f0, 0x000006f0, L'\0' } }, > { 1000, { 0x000006f1, 0x000006f0, 0x000006f0, 0x000006f0, L'\0' } }, > }; > > int main (int argc, char *argv[]) > { > assert (setlocale (LC_ALL, "fa_IR.UTF-8") != NULL); > > for (int i = 0; i < array_length (inputs); i++) > { > int n = 0; > swscanf (inputs[i].str, L"%Id", &n); > wprintf (L"n=%d input[%d].n=%d (%ls)\n", > n, > i, > inputs[i].n, > inputs[i].str); > assert (n == inputs[i].n); > } > > return 0; > }; > --- > > Also, you do not need two scratch buffer, since either one will be used for > !COMPILE_WSCANF and COMPILE_WSCANF; and to avoid the extra stack requirement > (since there are not used for the most locales) only define them if I18N > is actually used. The alternate digit handling should be something like: Scratch that, I just realized that once we reallocate the scratch_buffer the digits_extended pointers will be invalidated anyway. I think a better strategy would be just use malloc here, it will slow down the conversion a bit but since the size will small it will be fulfilled by the thread cache anyway. I will send a fix with some testcases.