From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from elaine.keithp.com (home.keithp.com [63.227.221.253]) by sourceware.org (Postfix) with ESMTPS id C03233858433 for ; Tue, 17 Aug 2021 19:11:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C03233858433 Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=keithp.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=keithp.com Received: from localhost (localhost [127.0.0.1]) by elaine.keithp.com (Postfix) with ESMTP id 160193F30290 for ; Tue, 17 Aug 2021 12:10:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=keithp.com; s=mail; t=1629227452; bh=T76rWpMAsPknHdcPggsH2uIHIMtlsuLHVG3zIgdFK9I=; h=From:To:Subject:Date:From; b=WWY1XMzZ7eYIBNnhlUfJs/RZ2SqnChhfJKv1j66EtuJ4VjaGDEyrRym8ftjjg46mR hiPhXTnd62QgV7GrmSGdBhmQ+zRcmEn8VZ7lnbex5jjGGoTbjDf720j/ZNalKVerUD xA/6FQ+TzTtiuIhiP7SysIf2oaXzNFD7Ila+WcuN/ds7nvX5ctxS9NN8uRFhS80Kkk NWCxb5aF7D5inyxyd9R7p/h3el2SXPQ0lDWkgez/4dJPmkzDmc2hf6+PC61o0d2LX7 hoysVSKdI9WAo5qGnkawmvNIB6xd+sxpQidN/b/0Tp6cOCnTdDsenVUZUz2L29TTmg I6SAy/6cZUR8A== X-Virus-Scanned: Debian amavisd-new at keithp.com Received: from elaine.keithp.com ([127.0.0.1]) by localhost (elaine.keithp.com [127.0.0.1]) (amavisd-new, port 10024) with LMTP id GgvVQ3DJ9ZmB for ; Tue, 17 Aug 2021 12:10:47 -0700 (PDT) Received: from keithp.com (koto.keithp.com [192.168.11.2]) by elaine.keithp.com (Postfix) with ESMTPSA id 4D4043F30188 for ; Tue, 17 Aug 2021 12:10:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=keithp.com; s=mail; t=1629227447; bh=T76rWpMAsPknHdcPggsH2uIHIMtlsuLHVG3zIgdFK9I=; h=From:To:Subject:Date:From; b=isFD85cHdZT+4dxiUHOacsC5LHXyN0dLoWlPRopLKLg3fxP4rGZUMMWJiEqyOsde4 uelUujUGVha6tNm9tjbDp8EGRSMwByfBscz3AhyCRTWIOIl7vByGSpUXfXPI4EH49o jR3+CRmf7WxmjKEcueg37c0sgbj9sJw2c7H1SbO7MrjnI4PX7iP2Y37RijNPG6Q0vj 3MC0hcHAsOZDpYR7Bj7Q6oBOD/8fDhoG/SyR0KZbgHZ21vERGWUfxnpBBYBhXredE0 ww91zTz5tBWC3HVKpCiU6j5PhBUnPzF6noMgypFghrOB7kgT7DC82QCDKXNfqRi7a9 pChNUTKV3I65g== Received: by keithp.com (Postfix, from userid 1000) id AFCF61E6013A; Tue, 17 Aug 2021 12:11:09 -0700 (PDT) From: Keith Packard To: newlib@sourceware.org Subject: [PATCH] svfwscanf: Simplify _sungetwc_r to eliminate apparent buffer overflow Date: Tue, 17 Aug 2021 12:11:09 -0700 Message-ID: <874kbnevjm.fsf@keithp.com> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="==-=-="; micalg=pgp-sha256; protocol="application/pgp-signature" X-Spam-Status: No, score=-10.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_ASCII_DIVIDERS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: newlib@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Newlib mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 Aug 2021 19:11:26 -0000 --==-=-= Content-Type: multipart/mixed; boundary="=-=-=" --=-=-= Content-Type: text/plain svfwscanf replaces getwc and ungetwc_r. The comments in the code talk about avoiding file operations, but they also need to bypass the mbtowc calls as svfwscanf operates on wchar_t, not multibyte data, which is a more important reason here; they would not work correctly otherwise. The ungetwc replacement has code which uses the 3 byte FILE _ubuf field, but if wchar_t is 32-bits, this field is not large enough to hold even one wchar_t value. Building in this mode generates warnings about array overflow: In file included from ../../newlib/libc/stdio/svfiwscanf.c:35: ../../newlib/libc/stdio/vfwscanf.c: In function '_sungetwc_r.isra': ../../newlib/libc/stdio/vfwscanf.c:316:12: warning: array subscript 4294967295 is above array bounds of 'unsigned char[3]' [-Warray-bounds] 316 | fp->_p = &fp->_ubuf[sizeof (fp->_ubuf) - sizeof (wchar_t)]; | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from ../../newlib/libc/stdio/stdio.h:46, from ../../newlib/libc/stdio/vfwscanf.c:82, from ../../newlib/libc/stdio/svfiwscanf.c:35: ../../newlib/libc/include/sys/reent.h:216:17: note: while referencing '_ubuf' 216 | unsigned char _ubuf[3]; /* guarantee an ungetc() buffer */ | ^~~~~ However, the vfwscanf code *never* ungets data before the start of the scanning operation, and *always* ungets data which matches the input at that point, so the code always hits the block which backs up over the input data and never hits the block which uses the _ubuf field. In addition, the svfwscanf code will always start with the unget buffer empty, so the ungetwc replacement never needs to support an unget buffer at all. Simplify the code by removing support for everything other than backing up over the input data, leaving the check to make sure it doesn't get underflowed in case the vfscanf code has a bug in it. Signed-off-by: Keith Packard --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename=0001-svfwscanf-Simplify-_sungetwc_r-to-eliminate-apparent.patch Content-Transfer-Encoding: quoted-printable From=202bbbf24fab03472b73296356b7dfb13d616302e9 Mon Sep 17 00:00:00 2001 From: Keith Packard Date: Tue, 17 Aug 2021 11:51:52 -0700 Subject: [PATCH] svfwscanf: Simplify _sungetwc_r to eliminate apparent buff= er overflow svfwscanf replaces getwc and ungetwc_r. The comments in the code talk about avoiding file operations, but they also need to bypass the mbtowc calls as svfwscanf operates on wchar_t, not multibyte data, which is a more important reason here; they would not work correctly otherwise. The ungetwc replacement has code which uses the 3 byte FILE _ubuf field, but if wchar_t is 32-bits, this field is not large enough to hold even one wchar_t value. Building in this mode generates warnings about array overflow: In file included from ../../newlib/libc/stdio/svfiwscanf.c:35: ../../newlib/libc/stdio/vfwscanf.c: In function '_sungetwc_r.isra': ../../newlib/libc/stdio/vfwscanf.c:316:12: warning: array subscript 429496= 7295 is above array bounds of 'unsigned char[3]' [-Warray-bounds] 316 | fp->_p =3D &fp->_ubuf[sizeof (fp->_ubuf) - sizeof (wchar_t)]; | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from ../../newlib/libc/stdio/stdio.h:46, from ../../newlib/libc/stdio/vfwscanf.c:82, from ../../newlib/libc/stdio/svfiwscanf.c:35: ../../newlib/libc/include/sys/reent.h:216:17: note: while referencing '_ub= uf' 216 | unsigned char _ubuf[3]; /* guarantee an ungetc() buffer */ | ^~~~~ However, the vfwscanf code *never* ungets data before the start of the scanning operation, and *always* ungets data which matches the input at that point, so the code always hits the block which backs up over the input data and never hits the block which uses the _ubuf field. In addition, the svfwscanf code will always start with the unget buffer empty, so the ungetwc replacement never needs to support an unget buffer at all. Simplify the code by removing support for everything other than backing up over the input data, leaving the check to make sure it doesn't get underflowed in case the vfscanf code has a bug in it. Signed-off-by: Keith Packard =2D-- newlib/libc/stdio/vfwscanf.c | 40 +++--------------------------------- 1 file changed, 3 insertions(+), 37 deletions(-) diff --git a/newlib/libc/stdio/vfwscanf.c b/newlib/libc/stdio/vfwscanf.c index f00d41a09..e9e00dfec 100644 =2D-- a/newlib/libc/stdio/vfwscanf.c +++ b/newlib/libc/stdio/vfwscanf.c @@ -272,49 +272,15 @@ _sungetwc_r (struct _reent *data, /* After ungetc, we won't be at eof anymore */ fp->_flags &=3D ~__SEOF; =20 =2D /* =2D * If we are in the middle of ungetwc'ing, just continue. =2D * This may require expanding the current ungetc buffer. + /* All ungetwc usage in scanf un-gets the current character, so + * just back up over the string if we aren't at the start */ =2D =2D if (HASUB (fp)) =2D { =2D if (fp->_r >=3D fp->_ub._size && __submore (data, fp)) =2D { =2D return EOF; =2D } =2D fp->_p -=3D sizeof (wchar_t); =2D *fp->_p =3D (wchar_t) wc; =2D fp->_r +=3D sizeof (wchar_t); =2D return wc; =2D } =2D =2D /* =2D * If we can handle this by simply backing up, do so, =2D * but never replace the original character. =2D * (This makes swscanf() work when scanning `const' data.) =2D */ =2D =2D if (fp->_bf._base !=3D NULL && fp->_p > fp->_bf._base =2D && ((wchar_t *)fp->_p)[-1] =3D=3D wc) + if (fp->_bf._base !=3D NULL && fp->_p > fp->_bf._base) { fp->_p -=3D sizeof (wchar_t); fp->_r +=3D sizeof (wchar_t); =2D return wc; } =20 =2D /* =2D * Create an ungetc buffer. =2D * Initially, we will use the `reserve' buffer. =2D */ =2D =2D fp->_ur =3D fp->_r; =2D fp->_up =3D fp->_p; =2D fp->_ub._base =3D fp->_ubuf; =2D fp->_ub._size =3D sizeof (fp->_ubuf); =2D fp->_p =3D &fp->_ubuf[sizeof (fp->_ubuf) - sizeof (wchar_t)]; =2D *(wchar_t *) fp->_p =3D wc; =2D fp->_r =3D 2; return wc; } =20 =2D-=20 2.32.0 --=-=-= Content-Type: text/plain Content-Transfer-Encoding: quoted-printable =2D-=20 =2Dkeith --=-=-=-- --==-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEw4O3eCVWE9/bQJ2R2yIaaQAAABEFAmEcCc0ACgkQ2yIaaQAA ABEBKRAAh4KPGL934f+sD2+auzAG8iERGoTNiNjvOh0eY3v4fFEV41MByOuUMWIG Lz9Xksiuun5dpM/2aPlQPHwnPVipYBvjqEUwA6PGJMKezF2/2rnX4HMEBVuoMF37 otE+rwJRnKHudSxBSzGIiQlj7mihvErlJ2GqTKujp/csWqyPdFfSRrtFGwAp7Xfs hi6T+ubH46cJvJOVDqCkoPb/ldA3ahUHnbx5+D2bNe8XdywQxYYaaYBNL01vSkAu yVm7JIu8NRS6aUKUQfNkQ8mQA+/G2M3cOybFCRwvQcj8gJwAm3xJnvwBSGOV0J2h eP3c/UrXXqn9hLGAfRh3FntWWsJ8wG3JQOccIUS4+r7KzOxy2TkmUVAYeRFdI9yB rTYMQUKOKb0P5LBi6dlYc194Nc1IXgJcmR34JkPifN/5vRw5GgJ68A9TSIFylLRt rMRcrhAPNtRxY/uQLCvx3xO+G6jsekB7tLMoo7nksyzoC66aKqY5go9N4ILp4Vt3 SO0ed1ZR3/0Wt9H10VX6i+8T5i0ma/tLLSxkXEInRHbz2XHoEHsIYXJ+kx1rxBX7 pa8g/Zrk0uGJcoDbo/E6Vmh6gjId9BGoPQctAwQemF6C+pkzIwxtATUuQWT+w1fe /kum6TxQON/rp7pQ8h306t0/FIXx7ZXapiH3gtA0TNZ1McUbHEE= =Eg/A -----END PGP SIGNATURE----- --==-=-=--