From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=7+7C=CV=linaro.org=adhemerval.zanella@sourceware.org>
Received: from mail-ot1-x32c.google.com (mail-ot1-x32c.google.com [IPv6:2607:f8b0:4864:20::32c])
	by sourceware.org (Postfix) with ESMTPS id 81B1E3858D28
	for <libc-alpha@sourceware.org>; Mon,  3 Jul 2023 13:40:32 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 81B1E3858D28
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org
Received: by mail-ot1-x32c.google.com with SMTP id 46e09a7af769-6b5d57d7db9so3515831a34.3
        for <libc-alpha@sourceware.org>; Mon, 03 Jul 2023 06:40:32 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=linaro.org; s=google; t=1688391632; x=1690983632;
        h=content-transfer-encoding:in-reply-to:organization:references:to
         :from:content-language:subject:user-agent:mime-version:date
         :message-id:from:to:cc:subject:date:message-id:reply-to;
        bh=356+4pJm+v7sGvs1Uos/Guk0knNz4xoATZ91rOq4X/A=;
        b=kMOEtNn1JY3Y7Jb8CLd/jRUuAKLne83hercm2gr2lTIGWLD93UHrkG5n5nWgag5RF/
         nq33RJdQg5kd2Eeu+vJ41ZCo1wLvpYcXyQXngd6nmitP+Ys/2+TxWlpFJ5KYZAwniH3L
         yFmPpiO2Fg9xrOzNgxH3Y1BQysVaC9V9c57SywD89Ba11Mn1aqsG4UR+ScTht96cxjxN
         x2gtJG/y0yshptcvPqAbtV+3T6IeFGqGLGRykAZyhBtz7doD9qwxfq8YEog0gYD4b3eD
         02nE+gas5i2b5qpTsm7j8sNOnfD8NpgO5w39GDJaKpcoHHF1jlxF9vJdwm7qtrJQWYkX
         T7ew==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1688391632; x=1690983632;
        h=content-transfer-encoding:in-reply-to:organization:references:to
         :from:content-language:subject:user-agent:mime-version:date
         :message-id:x-gm-message-state:from:to:cc:subject:date:message-id
         :reply-to;
        bh=356+4pJm+v7sGvs1Uos/Guk0knNz4xoATZ91rOq4X/A=;
        b=C7IoGvREKyV5q1tWiYDSOfyZIhvNBJWGBo0Et1UzeaE5I6AxyzJJVTViTa9R9M8AH0
         h27hzsR6fZRg5WQpwBIub2YpOGbKbrBE273fZrrFEfb/m+VPP9vpq8ih73lU73jYLaOC
         23Z4veNd3Yhwww94+0XTj1r7pq0RPbnEuagxZbT07y3RyEuQP/SaK0sL+XlqPF26HaRa
         h7PKNBAzAPAOUBqUikQa6QxCGbqGicidEIbxebzkN09NkygqDwC2im8k1wWRjF5ziKio
         9wSGkm8HvMy2MQfYpgucORO5U/FotSZ6kUtSbbBGAovQb6ZT/jmtvuC2P0obP/ry9CrP
         ahuA==
X-Gm-Message-State: AC+VfDyLLnsxofU+uG/eNgygcYvUnJxHvK9ICzyO7zXq1YX//c93oocq
	DKCIztbqRlg8VvBEUtN5n9hvVNlavj2+XoxCZs+pPQ==
X-Google-Smtp-Source: ACHHUZ7MmzTi9LIoDWiwNy/FD5uzJniz/taESMuv90T+MvZF96FYWw0/fY+46rN8Uv+bfcUFEWppIg==
X-Received: by 2002:a9d:6c55:0:b0:6b8:93c7:a84a with SMTP id g21-20020a9d6c55000000b006b893c7a84amr8372843otq.16.1688391631723;
        Mon, 03 Jul 2023 06:40:31 -0700 (PDT)
Received: from ?IPV6:2804:1b3:a7c3:665c:4c86:ac7d:d2ce:ef? ([2804:1b3:a7c3:665c:4c86:ac7d:d2ce:ef])
        by smtp.gmail.com with ESMTPSA id f16-20020a9d5e90000000b006b4a11b8df4sm9148444otl.51.2023.07.03.06.40.30
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Mon, 03 Jul 2023 06:40:31 -0700 (PDT)
Message-ID: <8a8f0468-c7aa-61d4-6e7f-fb782dcc3059@linaro.org>
Date: Mon, 3 Jul 2023 10:40:28 -0300
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
 Gecko/20100101 Thunderbird/102.12.0
Subject: Re: [PATCH] vfscanf-internal: Get rid of unbounded allocas
Content-Language: en-US
From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: Joe Simmons-Talbott <josimmon@redhat.com>, libc-alpha@sourceware.org
References: <20230627180556.728512-1-josimmon@redhat.com>
 <148e9429-761f-ec58-3631-8fd26413b828@linaro.org>
Organization: Linaro
In-Reply-To: <148e9429-761f-ec58-3631-8fd26413b828@linaro.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <libc-alpha.sourceware.org>


On 30/06/23 15:14, Adhemerval Zanella Netto wrote:
> 
> 
> On 27/06/23 15:05, Joe Simmons-Talbott via Libc-alpha wrote:
>> Replace two unbounded allocas with scratch_buffers to avoid potential
>> stack overflow.
>> ---
>>  stdio-common/vfscanf-internal.c | 26 +++++++++++++++++++++++---
>>  1 file changed, 23 insertions(+), 3 deletions(-)
>>
>> diff --git a/stdio-common/vfscanf-internal.c b/stdio-common/vfscanf-internal.c
>> index bfb9baa21a..60376ccff9 100644
>> --- a/stdio-common/vfscanf-internal.c
>> +++ b/stdio-common/vfscanf-internal.c
>> @@ -335,6 +335,12 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
>>    struct char_buffer charbuf;
>>    scratch_buffer_init (&charbuf.scratch);
>>  
>> +  struct scratch_buffer wc_ext_sbuf;
>> +  scratch_buffer_init (&wc_ext_sbuf);
>> +
>> +  struct scratch_buffer mb_ext_sbuf;
>> +  scratch_buffer_init (&mb_ext_sbuf);
>> +
>>  #ifdef __va_copy
>>    __va_copy (arg, argptr);
>>  #else
>> @@ -1488,8 +1494,14 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
>>  		      wcdigits[n] = (const wchar_t *)
>>  			_NL_CURRENT (LC_CTYPE, _NL_CTYPE_INDIGITS0_WC + n);
>>  
>> -		      wchar_t *wc_extended = (wchar_t *)
>> -			alloca ((to_level + 2) * sizeof (wchar_t));
>> +		      wchar_t *wc_extended;
>> +		      if (!scratch_buffer_set_array_size
>> +		          (&wc_ext_sbuf, sizeof (wchar_t), to_level + 2))
>> +		        {
>> +		          done = EOF;
>> +		          goto errout;
>> +		        }
>> +		      wc_extended = wc_ext_sbuf.data;
>>  		      __wmemcpy (wc_extended, wcdigits[n], to_level);
>>  		      wc_extended[to_level] = __towctrans (L'0' + n, map);
>>  		      wc_extended[to_level + 1] = '\0';
>> @@ -1525,7 +1537,13 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr,
>>  
>>  		      /*  Allocate memory for extended multibyte digit.  */
>>  		      char *mb_extended;
>> -		      mb_extended = (char *) alloca (mbdigits_len + mblen + 1);
>> +		      if (!scratch_buffer_set_array_size
>> +		          (&mb_ext_sbuf, 1, mbdigits_len + mblen + 1))
>> +		        {
>> +		          done = EOF;
>> +		          goto errout;
>> +		        }
>> +		      mb_extended = mb_ext_sbuf.data;
> 
> Unfortunately this change is wrong, it will overwrite the wcdigits[n]/mbdigits[n]
> on each iteration since you set the value of the stratch_buffer without adjusting
> the offset for each entry (for the wide operation).
> 
> This code path will just be triggered by a handful of locale, but you can check
> with the example:
> 
> ---
> #include <assert.h>
> #include <stdio.h>
> #include <locale.h>
> 
> #define array_length(var)                                               \
>   (sizeof (var) / sizeof ((var)[0])                                     \
>    + 0 * sizeof (struct {                                               \
>        _Static_assert (!__builtin_types_compatible_p                    \
>                        (__typeof (var), __typeof (&(var)[0])),          \
>                        "argument must be an array");                    \
>    }))
> 
> static const struct
> {
>   int n;
>   const char *str;
> } inputs[] =
> {
>   { 1, "\xdb\xb1" },
>   { 2, "\xdb\xb2" },
>   { 3, "\xdb\xb3" },
>   { 4, "\xdb\xb4" },
>   { 5, "\xdb\xb5" },
>   { 6, "\xdb\xb6" },
>   { 7, "\xdb\xb7" },
>   { 8, "\xdb\xb8" },
>   { 9, "\xdb\xb9" },
>   { 10, "\xdb\xb1\xdb\xb0" },
>   { 11, "\xdb\xb1\xdb\xb1" },
>   { 12, "\xdb\xb1\xdb\xb2" },
>   { 13, "\xdb\xb1\xdb\xb3" },
>   { 14, "\xdb\xb1\xdb\xb4" },
>   { 15, "\xdb\xb1\xdb\xb5" },
>   { 16, "\xdb\xb1\xdb\xb6" },
>   { 17, "\xdb\xb1\xdb\xb7" },
>   { 18, "\xdb\xb1\xdb\xb8" },
>   { 19, "\xdb\xb1\xdb\xb9" },
>   { 20, "\xdb\xb2\xdb\xb0" },
>   { 30, "\xdb\xb3\xdb\xb0" },
>   { 40, "\xdb\xb4\xdb\xb0" },
>   { 50, "\xdb\xb5\xdb\xb0" },
>   { 60, "\xdb\xb6\xdb\xb0" },
>   { 70, "\xdb\xb7\xdb\xb0" },
>   { 80, "\xdb\xb8\xdb\xb0" },
>   { 90, "\xdb\xb9\xdb\xb0" },
>   { 100, "\xdb\xb1\xdb\xb0\xdb\xb0" },
>   { 1000, "\xdb\xb1\xdb\xb0\xdb\xb0\xdb\xb0" },
> };
> 
> int main (int argc, char *argv[])
> {
>   assert (setlocale (LC_ALL, "fa_IR.UTF-8") != NULL);
> 
>   for (int i = 0; i < array_length (inputs); i++)
>     {
>       int n;
>       sscanf (inputs[i].str, "%Id", &n);
>       assert (n == inputs[i].n);
>     }
> 
>   return 0;
> }
> ---
> 
> I generate the inputs by round-trip with printf using the same locale.  A similar test
> for wide char would be:
> 
> ---
> #include <assert.h>
> #include <stdio.h>
> #include <locale.h>
> #include <wchar.h>
> 
> #define array_length(var)                                               \
>   (sizeof (var) / sizeof ((var)[0])                                     \
>    + 0 * sizeof (struct {                                               \
>        _Static_assert (!__builtin_types_compatible_p                    \
>                        (__typeof (var), __typeof (&(var)[0])),          \
>                        "argument must be an array");                    \
>    }))
> 
> static const struct input_t
> {
>   int n;
>   const wchar_t str[5];
> } inputs[] =
> {
>   {    1, { 0x000006f1, L'\0' } },
>   {    2, { 0x000006f2, L'\0' } },
>   {    3, { 0x000006f3, L'\0' } },
>   {    4, { 0x000006f4, L'\0' } },
>   {    5, { 0x000006f5, L'\0' } },
>   {    6, { 0x000006f6, L'\0' } },
>   {    7, { 0x000006f7, L'\0' } },
>   {    8, { 0x000006f8, L'\0' } },
>   {    9, { 0x000006f9, L'\0' } },
>   {   10, { 0x000006f1, 0x000006f0, L'\0' } },
>   {   11, { 0x000006f1, 0x000006f1, L'\0' } },
>   {   12, { 0x000006f1, 0x000006f2, L'\0' } },
>   {   13, { 0x000006f1, 0x000006f3, L'\0' } },
>   {   14, { 0x000006f1, 0x000006f4, L'\0' } },
>   {   15, { 0x000006f1, 0x000006f5, L'\0' } },
>   {   16, { 0x000006f1, 0x000006f6, L'\0' } },
>   {   17, { 0x000006f1, 0x000006f7, L'\0' } },
>   {   18, { 0x000006f1, 0x000006f8, L'\0' } },
>   {   19, { 0x000006f1, 0x000006f9, L'\0' } },
>   {   20, { 0x000006f2, 0x000006f0, L'\0' } },
>   {   30, { 0x000006f3, 0x000006f0, L'\0' } },
>   {   40, { 0x000006f4, 0x000006f0, L'\0' } },
>   {   50, { 0x000006f5, 0x000006f0, L'\0' } },
>   {   60, { 0x000006f6, 0x000006f0, L'\0' } },
>   {   70, { 0x000006f7, 0x000006f0, L'\0' } },
>   {   80, { 0x000006f8, 0x000006f0, L'\0' } },
>   {   90, { 0x000006f9, 0x000006f0, L'\0' } },
>   {  100, { 0x000006f1, 0x000006f0, 0x000006f0, L'\0' } },
>   { 1000, { 0x000006f1, 0x000006f0, 0x000006f0, 0x000006f0, L'\0' } },
> };
> 
> int main (int argc, char *argv[])
> {
>   assert (setlocale (LC_ALL, "fa_IR.UTF-8") != NULL);
> 
>   for (int i = 0; i < array_length (inputs); i++)
>     {
>       int n = 0;
>       swscanf (inputs[i].str, L"%Id", &n);
>       wprintf (L"n=%d input[%d].n=%d (%ls)\n",
>                n,
>                i,
>                inputs[i].n,
>                inputs[i].str);
>       assert (n == inputs[i].n);
>     }
> 
>   return 0;  
> };
> ---
> 
> Also, you do not need two scratch buffer, since either one will be used for
> !COMPILE_WSCANF and COMPILE_WSCANF; and to avoid the extra stack requirement
> (since there are not used for the most locales) only define them if I18N
> is actually used.  The alternate digit handling should be something like:

Scratch that, I just realized that once we reallocate the scratch_buffer the
digits_extended pointers will be invalidated anyway.  I think a better strategy
would be just use malloc here, it will slow down the conversion a bit but since
the size will small it will be fulfilled by the thread cache anyway.

I will send a fix with some testcases.