public inbox for newlib@sourceware.org
 help / color / mirror / Atom feed
From: Keith Packard <keithp@keithp.com>
To: newlib@sourceware.org <newlib@sourceware.org>
Subject: [PATCH] svfwscanf: Simplify _sungetwc_r to eliminate apparent buffer overflow
Date: Tue, 17 Aug 2021 12:11:09 -0700	[thread overview]
Message-ID: <874kbnevjm.fsf@keithp.com> (raw)


[-- Attachment #1.1: Type: text/plain, Size: 1943 bytes --]


svfwscanf replaces getwc and ungetwc_r. The comments in the code talk
about avoiding file operations, but they also need to bypass the
mbtowc calls as svfwscanf operates on wchar_t, not multibyte data,
which is a more important reason here; they would not work correctly
otherwise.

The ungetwc replacement has code which uses the 3 byte FILE _ubuf
field, but if wchar_t is 32-bits, this field is not large enough to
hold even one wchar_t value. Building in this mode generates warnings
about array overflow:

	In file included from ../../newlib/libc/stdio/svfiwscanf.c:35:
	../../newlib/libc/stdio/vfwscanf.c: In function '_sungetwc_r.isra':
	../../newlib/libc/stdio/vfwscanf.c:316:12: warning: array subscript 4294967295 is above array bounds of 'unsigned char[3]' [-Warray-bounds]
	  316 |   fp->_p = &fp->_ubuf[sizeof (fp->_ubuf) - sizeof (wchar_t)];
	      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	In file included from ../../newlib/libc/stdio/stdio.h:46,
			 from ../../newlib/libc/stdio/vfwscanf.c:82,
			 from ../../newlib/libc/stdio/svfiwscanf.c:35:
	../../newlib/libc/include/sys/reent.h:216:17: note: while referencing '_ubuf'
	  216 |   unsigned char _ubuf[3]; /* guarantee an ungetc() buffer */
	      |                 ^~~~~

However, the vfwscanf code *never* ungets data before the start of the
scanning operation, and *always* ungets data which matches the input
at that point, so the code always hits the block which backs up over
the input data and never hits the block which uses the _ubuf field.

In addition, the svfwscanf code will always start with the unget
buffer empty, so the ungetwc replacement never needs to support an
unget buffer at all.

Simplify the code by removing support for everything other than
backing up over the input data, leaving the check to make sure it
doesn't get underflowed in case the vfscanf code has a bug in it.

Signed-off-by: Keith Packard <keithp@keithp.com>


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: 0001-svfwscanf-Simplify-_sungetwc_r-to-eliminate-apparent.patch --]
[-- Type: text/x-diff, Size: 4031 bytes --]

From 2bbbf24fab03472b73296356b7dfb13d616302e9 Mon Sep 17 00:00:00 2001
From: Keith Packard <keithp@keithp.com>
Date: Tue, 17 Aug 2021 11:51:52 -0700
Subject: [PATCH] svfwscanf: Simplify _sungetwc_r to eliminate apparent buffer
 overflow

svfwscanf replaces getwc and ungetwc_r. The comments in the code talk
about avoiding file operations, but they also need to bypass the
mbtowc calls as svfwscanf operates on wchar_t, not multibyte data,
which is a more important reason here; they would not work correctly
otherwise.

The ungetwc replacement has code which uses the 3 byte FILE _ubuf
field, but if wchar_t is 32-bits, this field is not large enough to
hold even one wchar_t value. Building in this mode generates warnings
about array overflow:

	In file included from ../../newlib/libc/stdio/svfiwscanf.c:35:
	../../newlib/libc/stdio/vfwscanf.c: In function '_sungetwc_r.isra':
	../../newlib/libc/stdio/vfwscanf.c:316:12: warning: array subscript 4294967295 is above array bounds of 'unsigned char[3]' [-Warray-bounds]
	  316 |   fp->_p = &fp->_ubuf[sizeof (fp->_ubuf) - sizeof (wchar_t)];
	      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	In file included from ../../newlib/libc/stdio/stdio.h:46,
			 from ../../newlib/libc/stdio/vfwscanf.c:82,
			 from ../../newlib/libc/stdio/svfiwscanf.c:35:
	../../newlib/libc/include/sys/reent.h:216:17: note: while referencing '_ubuf'
	  216 |   unsigned char _ubuf[3]; /* guarantee an ungetc() buffer */
	      |                 ^~~~~

However, the vfwscanf code *never* ungets data before the start of the
scanning operation, and *always* ungets data which matches the input
at that point, so the code always hits the block which backs up over
the input data and never hits the block which uses the _ubuf field.

In addition, the svfwscanf code will always start with the unget
buffer empty, so the ungetwc replacement never needs to support an
unget buffer at all.

Simplify the code by removing support for everything other than
backing up over the input data, leaving the check to make sure it
doesn't get underflowed in case the vfscanf code has a bug in it.

Signed-off-by: Keith Packard <keithp@keithp.com>
---
 newlib/libc/stdio/vfwscanf.c | 40 +++---------------------------------
 1 file changed, 3 insertions(+), 37 deletions(-)

diff --git a/newlib/libc/stdio/vfwscanf.c b/newlib/libc/stdio/vfwscanf.c
index f00d41a09..e9e00dfec 100644
--- a/newlib/libc/stdio/vfwscanf.c
+++ b/newlib/libc/stdio/vfwscanf.c
@@ -272,49 +272,15 @@ _sungetwc_r (struct _reent *data,
   /* After ungetc, we won't be at eof anymore */
   fp->_flags &= ~__SEOF;
 
-  /*
-   * If we are in the middle of ungetwc'ing, just continue.
-   * This may require expanding the current ungetc buffer.
+  /* All ungetwc usage in scanf un-gets the current character, so
+   * just back up over the string if we aren't at the start
    */
-
-  if (HASUB (fp))
-    {
-      if (fp->_r >= fp->_ub._size && __submore (data, fp))
-        {
-          return EOF;
-        }
-      fp->_p -= sizeof (wchar_t);
-      *fp->_p = (wchar_t) wc;
-      fp->_r += sizeof (wchar_t);
-      return wc;
-    }
-
-  /*
-   * If we can handle this by simply backing up, do so,
-   * but never replace the original character.
-   * (This makes swscanf() work when scanning `const' data.)
-   */
-
-  if (fp->_bf._base != NULL && fp->_p > fp->_bf._base
-      && ((wchar_t *)fp->_p)[-1] == wc)
+  if (fp->_bf._base != NULL && fp->_p > fp->_bf._base)
     {
       fp->_p -= sizeof (wchar_t);
       fp->_r += sizeof (wchar_t);
-      return wc;
     }
 
-  /*
-   * Create an ungetc buffer.
-   * Initially, we will use the `reserve' buffer.
-   */
-
-  fp->_ur = fp->_r;
-  fp->_up = fp->_p;
-  fp->_ub._base = fp->_ubuf;
-  fp->_ub._size = sizeof (fp->_ubuf);
-  fp->_p = &fp->_ubuf[sizeof (fp->_ubuf) - sizeof (wchar_t)];
-  *(wchar_t *) fp->_p = wc;
-  fp->_r = 2;
   return wc;
 }
 
-- 
2.32.0


[-- Attachment #1.3: Type: text/plain, Size: 15 bytes --]


-- 
-keith

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

             reply	other threads:[~2021-08-17 19:11 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-17 19:11 Keith Packard [this message]
2021-08-18  8:59 ` Corinna Vinschen
2021-08-18 16:30   ` Keith Packard
2021-08-18 16:50     ` Keith Packard
2021-08-19 10:10       ` Corinna Vinschen
2021-08-19 15:17         ` Keith Packard
2021-08-19 15:38           ` Corinna Vinschen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874kbnevjm.fsf@keithp.com \
    --to=keithp@keithp.com \
    --cc=newlib@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).