From: Adhemerval Zanella
To: Yair Lenga
Cc: libc-help@sourceware.org
Subject: Re: Buffer size checking for scanf* functions
Date: Mon, 11 Jul 2022 09:38:50 -0300

> On 6 Jul 2022, at 12:04, Yair Lenga wrote:
>
> Thanks for elaborating. Agreed that '%m' should be used when possible, but it is not always trivial. A lot of code is already written in such a way that those changes are less than trivial. Not to mention that each %m will require adding a strcpy (or equivalent) to copy the dynamically allocated strings into the fixed-length storage usually defined in a struct, etc.
>
> struct { ... } s ;
> char *sx = NULL;
> scanf("%ms %d", &sx, &s.i) ;
> strcpy_fix(s.output, sizeof(s.output), &sx) ; // Copy from *sx to s.output, up to the limit, free *sx, and set sx = NULL

Well, either this or use a simpler solution like:

  #define NAMELEN 32
  struct {
    char name[NAMELEN + 1];  /* %32s stores up to 32 chars plus the NUL */
  } x;

  #define STRFY(__n) XSTRFY(__n)
  #define XSTRFY(__n) #__n
  scanf ("%"STRFY(NAMELEN)"s", x.name);

>
> Your point about backward compatibility is also very valid: if possible, new features should try to avoid collisions with future improvements. The C standard is getting updated every 10 years (c99, c11, and the expected c23). I could not find any reason why the C standard committee chose to use '%m' instead of using the already established '%a' that existed for many years in glibc, and assigned a new meaning to '%a'. I hope that those are exceptions that prove the rule.
>
> You raised a good point with 'scanf_s': the fact that they chose to modify the behavior of '%s'. To my understanding, scanf_s '%s' requires 2 arguments (char *, size_t), vs. scanf, which expects only the 'char *'. It would have been a much better solution to keep '%s' compatible and introduce another formatting sequence for the dynamic fixed-length string.
>
> Going back to the question: what would be a good way to integrate the type safety provided by scanf_s without creating problems? A few ideas that I have:
> * Use '%S' (uppercase S) to indicate that a pair (char *, size_t) is expected, OR
> * Use '%@s' ('@' can be any unused letter or special character, e.g. '%!s', '%:s', ...). The logical choice would have been '*', for symmetry with printf("%*s"). Unfortunately, '*' is already used as the "ignore assignment" flag.
> * Use '%n %s', where 'n' indicates that a size parameter will be provided and applies to the next '%s' or '%[', or even '%ms': a dynamic width limit instead of a static width limit.
>
> Personally, I prefer the second option, '%@s'; it matches the style of '*' for printf and is easy for existing developers to grasp. Interestingly enough, it might be possible to implement scanf_s as a wrapper around scanf, with some manipulation of the argument list.

I don't have a strong preference, and although I think scanf is still a bad interface [1], I think you might try to raise this on libc-alpha.

I think the '@' modifier would make more sense, since ideally it would extend to the wscanf family as well (and it already defines '%S').

[1] https://github.com/biojppm/rapidyaml/issues/40