public inbox for newlib@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] svfwscanf: Simplify _sungetwc_r to eliminate apparent buffer overflow
@ 2021-08-17 19:11 Keith Packard
  2021-08-18  8:59 ` Corinna Vinschen
  0 siblings, 1 reply; 7+ messages in thread
From: Keith Packard @ 2021-08-17 19:11 UTC (permalink / raw)
  To: newlib


[-- Attachment #1.1: Type: text/plain, Size: 1943 bytes --]


svfwscanf replaces getwc and ungetwc_r. The comments in the code talk
about avoiding file operations, but they also need to bypass the
mbtowc calls as svfwscanf operates on wchar_t, not multibyte data,
which is a more important reason here; they would not work correctly
otherwise.

The ungetwc replacement has code which uses the 3 byte FILE _ubuf
field, but if wchar_t is 32-bits, this field is not large enough to
hold even one wchar_t value. Building in this mode generates warnings
about array overflow:

	In file included from ../../newlib/libc/stdio/svfiwscanf.c:35:
	../../newlib/libc/stdio/vfwscanf.c: In function '_sungetwc_r.isra':
	../../newlib/libc/stdio/vfwscanf.c:316:12: warning: array subscript 4294967295 is above array bounds of 'unsigned char[3]' [-Warray-bounds]
	  316 |   fp->_p = &fp->_ubuf[sizeof (fp->_ubuf) - sizeof (wchar_t)];
	      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	In file included from ../../newlib/libc/stdio/stdio.h:46,
			 from ../../newlib/libc/stdio/vfwscanf.c:82,
			 from ../../newlib/libc/stdio/svfiwscanf.c:35:
	../../newlib/libc/include/sys/reent.h:216:17: note: while referencing '_ubuf'
	  216 |   unsigned char _ubuf[3]; /* guarantee an ungetc() buffer */
	      |                 ^~~~~

However, the vfwscanf code *never* ungets data before the start of the
scanning operation, and *always* ungets data which matches the input
at that point, so the code always hits the block which backs up over
the input data and never hits the block which uses the _ubuf field.

In addition, the svfwscanf code will always start with the unget
buffer empty, so the ungetwc replacement never needs to support an
unget buffer at all.

Simplify the code by removing support for everything other than
backing up over the input data, leaving the check to make sure it
doesn't get underflowed in case the vfscanf code has a bug in it.

Signed-off-by: Keith Packard <keithp@keithp.com>


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: 0001-svfwscanf-Simplify-_sungetwc_r-to-eliminate-apparent.patch --]
[-- Type: text/x-diff, Size: 4031 bytes --]

From 2bbbf24fab03472b73296356b7dfb13d616302e9 Mon Sep 17 00:00:00 2001
From: Keith Packard <keithp@keithp.com>
Date: Tue, 17 Aug 2021 11:51:52 -0700
Subject: [PATCH] svfwscanf: Simplify _sungetwc_r to eliminate apparent buffer
 overflow

svfwscanf replaces getwc and ungetwc_r. The comments in the code talk
about avoiding file operations, but they also need to bypass the
mbtowc calls as svfwscanf operates on wchar_t, not multibyte data,
which is a more important reason here; they would not work correctly
otherwise.

The ungetwc replacement has code which uses the 3 byte FILE _ubuf
field, but if wchar_t is 32-bits, this field is not large enough to
hold even one wchar_t value. Building in this mode generates warnings
about array overflow:

	In file included from ../../newlib/libc/stdio/svfiwscanf.c:35:
	../../newlib/libc/stdio/vfwscanf.c: In function '_sungetwc_r.isra':
	../../newlib/libc/stdio/vfwscanf.c:316:12: warning: array subscript 4294967295 is above array bounds of 'unsigned char[3]' [-Warray-bounds]
	  316 |   fp->_p = &fp->_ubuf[sizeof (fp->_ubuf) - sizeof (wchar_t)];
	      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	In file included from ../../newlib/libc/stdio/stdio.h:46,
			 from ../../newlib/libc/stdio/vfwscanf.c:82,
			 from ../../newlib/libc/stdio/svfiwscanf.c:35:
	../../newlib/libc/include/sys/reent.h:216:17: note: while referencing '_ubuf'
	  216 |   unsigned char _ubuf[3]; /* guarantee an ungetc() buffer */
	      |                 ^~~~~

However, the vfwscanf code *never* ungets data before the start of the
scanning operation, and *always* ungets data which matches the input
at that point, so the code always hits the block which backs up over
the input data and never hits the block which uses the _ubuf field.

In addition, the svfwscanf code will always start with the unget
buffer empty, so the ungetwc replacement never needs to support an
unget buffer at all.

Simplify the code by removing support for everything other than
backing up over the input data, leaving the check to make sure it
doesn't get underflowed in case the vfscanf code has a bug in it.

Signed-off-by: Keith Packard <keithp@keithp.com>
---
 newlib/libc/stdio/vfwscanf.c | 40 +++---------------------------------
 1 file changed, 3 insertions(+), 37 deletions(-)

diff --git a/newlib/libc/stdio/vfwscanf.c b/newlib/libc/stdio/vfwscanf.c
index f00d41a09..e9e00dfec 100644
--- a/newlib/libc/stdio/vfwscanf.c
+++ b/newlib/libc/stdio/vfwscanf.c
@@ -272,49 +272,15 @@ _sungetwc_r (struct _reent *data,
   /* After ungetc, we won't be at eof anymore */
   fp->_flags &= ~__SEOF;
 
-  /*
-   * If we are in the middle of ungetwc'ing, just continue.
-   * This may require expanding the current ungetc buffer.
+  /* All ungetwc usage in scanf un-gets the current character, so
+   * just back up over the string if we aren't at the start
    */
-
-  if (HASUB (fp))
-    {
-      if (fp->_r >= fp->_ub._size && __submore (data, fp))
-        {
-          return EOF;
-        }
-      fp->_p -= sizeof (wchar_t);
-      *fp->_p = (wchar_t) wc;
-      fp->_r += sizeof (wchar_t);
-      return wc;
-    }
-
-  /*
-   * If we can handle this by simply backing up, do so,
-   * but never replace the original character.
-   * (This makes swscanf() work when scanning `const' data.)
-   */
-
-  if (fp->_bf._base != NULL && fp->_p > fp->_bf._base
-      && ((wchar_t *)fp->_p)[-1] == wc)
+  if (fp->_bf._base != NULL && fp->_p > fp->_bf._base)
     {
       fp->_p -= sizeof (wchar_t);
       fp->_r += sizeof (wchar_t);
-      return wc;
     }
 
-  /*
-   * Create an ungetc buffer.
-   * Initially, we will use the `reserve' buffer.
-   */
-
-  fp->_ur = fp->_r;
-  fp->_up = fp->_p;
-  fp->_ub._base = fp->_ubuf;
-  fp->_ub._size = sizeof (fp->_ubuf);
-  fp->_p = &fp->_ubuf[sizeof (fp->_ubuf) - sizeof (wchar_t)];
-  *(wchar_t *) fp->_p = wc;
-  fp->_r = 2;
   return wc;
 }
 
-- 
2.32.0


[-- Attachment #1.3: Type: text/plain, Size: 15 bytes --]


-- 
-keith

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] svfwscanf: Simplify _sungetwc_r to eliminate apparent buffer overflow
  2021-08-17 19:11 [PATCH] svfwscanf: Simplify _sungetwc_r to eliminate apparent buffer overflow Keith Packard
@ 2021-08-18  8:59 ` Corinna Vinschen
  2021-08-18 16:30   ` Keith Packard
  0 siblings, 1 reply; 7+ messages in thread
From: Corinna Vinschen @ 2021-08-18  8:59 UTC (permalink / raw)
  To: newlib

Hi Keith,

On Aug 17 12:11, Keith Packard wrote:
> svfwscanf replaces getwc and ungetwc_r. The comments in the code talk
> about avoiding file operations, but they also need to bypass the
> mbtowc calls as svfwscanf operates on wchar_t, not multibyte data,
> which is a more important reason here; they would not work correctly
> otherwise.
> 
> The ungetwc replacement has code which uses the 3 byte FILE _ubuf
> field, but if wchar_t is 32-bits, this field is not large enough to
> hold even one wchar_t value. Building in this mode generates warnings
> about array overflow:
> 
> 	In file included from ../../newlib/libc/stdio/svfiwscanf.c:35:
> 	../../newlib/libc/stdio/vfwscanf.c: In function '_sungetwc_r.isra':
> 	../../newlib/libc/stdio/vfwscanf.c:316:12: warning: array subscript 4294967295 is above array bounds of 'unsigned char[3]' [-Warray-bounds]
> 	  316 |   fp->_p = &fp->_ubuf[sizeof (fp->_ubuf) - sizeof (wchar_t)];
> 	      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 	In file included from ../../newlib/libc/stdio/stdio.h:46,
> 			 from ../../newlib/libc/stdio/vfwscanf.c:82,
> 			 from ../../newlib/libc/stdio/svfiwscanf.c:35:
> 	../../newlib/libc/include/sys/reent.h:216:17: note: while referencing '_ubuf'
> 	  216 |   unsigned char _ubuf[3]; /* guarantee an ungetc() buffer */
> 	      |                 ^~~~~
> 
> However, the vfwscanf code *never* ungets data before the start of the
> scanning operation, and *always* ungets data which matches the input
> at that point, so the code always hits the block which backs up over
> the input data and never hits the block which uses the _ubuf field.

LGTM.  Under the unlikely assumption that wscanf gets extended in future
and has to ungetc a char different from the input char, how do we catch
that?  Do we need a hint, somehow, somewhere?


Corinna


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] svfwscanf: Simplify _sungetwc_r to eliminate apparent buffer overflow
  2021-08-18  8:59 ` Corinna Vinschen
@ 2021-08-18 16:30   ` Keith Packard
  2021-08-18 16:50     ` Keith Packard
  0 siblings, 1 reply; 7+ messages in thread
From: Keith Packard @ 2021-08-18 16:30 UTC (permalink / raw)
  To: Corinna Vinschen, newlib

[-- Attachment #1: Type: text/plain, Size: 1008 bytes --]

Corinna Vinschen <vinschen@redhat.com> writes:

> LGTM.  Under the unlikely assumption that wscanf gets extended in future
> and has to ungetc a char different from the input char, how do we catch
> that?  Do we need a hint, somehow, somewhere?

I can't imagine a case where that wouldn't be a bug; the only reason I
know that ungetc takes the old char is because stdio might not still
have it in the buffer, and that can't happen for string sources.

However, I also don't know how we'd catch this bug at compile time, and
having the string source work differently in this case than the file
source means testing using string sources might not uncover a bug that
would appear with file sources.

A reasonable alternative would be to have the wchar_t version call the
char version multiple times, instead of just doing it inline; that would
avoid any semantic difference, which seems like a feature. I'll post a
patch that does it this way shortly and see which you prefer.

-- 
-keith

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] svfwscanf: Simplify _sungetwc_r to eliminate apparent buffer overflow
  2021-08-18 16:30   ` Keith Packard
@ 2021-08-18 16:50     ` Keith Packard
  2021-08-19 10:10       ` Corinna Vinschen
  0 siblings, 1 reply; 7+ messages in thread
From: Keith Packard @ 2021-08-18 16:50 UTC (permalink / raw)
  To: Corinna Vinschen, newlib

[-- Attachment #1: Type: text/plain, Size: 1342 bytes --]

Keith Packard <keithp@keithp.com> writes:

> A reasonable alternative would be to have the wchar_t version call the
> char version multiple times, instead of just doing it inline; that would
> avoid any semantic difference, which seems like a feature. I'll post a
> patch that does it this way shortly and see which you prefer.

Never mind -- the wchar_t function cannot simply call the char version
as that would require converting to multi-byte so that EOF is handled
correctly, *and* would end up breaking the getchar function which
assumes that the unget buffer has wchar_t values, not multibyte values.

After looking at the code further and discovering numerous other bugs in
this path, I think we should just use the patch as submitted; I cannot
imagine a case where POSIX would change the semantics of scanf to ever
re-write the input with different characters.

other bugs:

 1. Assumes wchar_t is 2 bytes by setting the bytes remaining (_r) to 2.

 2. Computes the pointer value by subtracting the wchar_t size from the
    _ubuf size, resulting in an unaligned pointer. Then uses this
    pointer directly in both getwc and ungetwc, which will cause a bus
    error on processors unable to access unaligned data.

We could also simplify the matching code path in vfscanf.c at some
point.

-- 
-keith

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] svfwscanf: Simplify _sungetwc_r to eliminate apparent buffer overflow
  2021-08-18 16:50     ` Keith Packard
@ 2021-08-19 10:10       ` Corinna Vinschen
  2021-08-19 15:17         ` Keith Packard
  0 siblings, 1 reply; 7+ messages in thread
From: Corinna Vinschen @ 2021-08-19 10:10 UTC (permalink / raw)
  To: newlib

On Aug 18 09:50, Keith Packard wrote:
> Keith Packard <keithp@keithp.com> writes:
> 
> > A reasonable alternative would be to have the wchar_t version call the
> > char version multiple times, instead of just doing it inline; that would
> > avoid any semantic difference, which seems like a feature. I'll post a
> > patch that does it this way shortly and see which you prefer.
> 
> Never mind -- the wchar_t function cannot simply call the char version
> as that would require converting to multi-byte so that EOF is handled
> correctly, *and* would end up breaking the getchar function which
> assumes that the unget buffer has wchar_t values, not multibyte values.
> 
> After looking at the code further and discovering numerous other bugs in
> this path, I think we should just use the patch as submitted; I cannot
> imagine a case where POSIX would change the semantics of scanf to ever
> re-write the input with different characters.

Ok, pushed.

> other bugs:
> 
>  1. Assumes wchar_t is 2 bytes by setting the bytes remaining (_r) to 2.

That was an oversight.  A few lines above I had replaced that with
sizeof(wchar_t) but apparently forgot this line.  D'oh.

>  2. Computes the pointer value by subtracting the wchar_t size from the
>     _ubuf size, resulting in an unaligned pointer. Then uses this
>     pointer directly in both getwc and ungetwc, which will cause a bus
>     error on processors unable to access unaligned data.

Given all chars are sizeof(wchar_t), how's the buffer ever going to
become unaligned?


Corinna


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] svfwscanf: Simplify _sungetwc_r to eliminate apparent buffer overflow
  2021-08-19 10:10       ` Corinna Vinschen
@ 2021-08-19 15:17         ` Keith Packard
  2021-08-19 15:38           ` Corinna Vinschen
  0 siblings, 1 reply; 7+ messages in thread
From: Keith Packard @ 2021-08-19 15:17 UTC (permalink / raw)
  To: Corinna Vinschen, newlib

[-- Attachment #1: Type: text/plain, Size: 313 bytes --]

Corinna Vinschen <vinschen@redhat.com> writes:

> Given all chars are sizeof(wchar_t), how's the buffer ever going to
> become unaligned?

It places the wchar_t at the end of _ubuf, which is 3 bytes long, making
it unaligned:

  fp->_p = &fp->_ubuf[sizeof (fp->_ubuf) - sizeof (wchar_t)];

-- 
-keith

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] svfwscanf: Simplify _sungetwc_r to eliminate apparent buffer overflow
  2021-08-19 15:17         ` Keith Packard
@ 2021-08-19 15:38           ` Corinna Vinschen
  0 siblings, 0 replies; 7+ messages in thread
From: Corinna Vinschen @ 2021-08-19 15:38 UTC (permalink / raw)
  To: newlib

[Please don't CC me, I'm reading the ML all the time.  Thanks]

On Aug 19 08:17, Keith Packard wrote:
> Corinna Vinschen <vinschen@redhat.com> writes:
> 
> > Given all chars are sizeof(wchar_t), how's the buffer ever going to
> > become unaligned?
> 
> It places the wchar_t at the end of _ubuf, which is 3 bytes long, making
> it unaligned:
> 
>   fp->_p = &fp->_ubuf[sizeof (fp->_ubuf) - sizeof (wchar_t)];

Oops, right.


Thanks,
Corinna


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-08-19 15:38 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-17 19:11 [PATCH] svfwscanf: Simplify _sungetwc_r to eliminate apparent buffer overflow Keith Packard
2021-08-18  8:59 ` Corinna Vinschen
2021-08-18 16:30   ` Keith Packard
2021-08-18 16:50     ` Keith Packard
2021-08-19 10:10       ` Corinna Vinschen
2021-08-19 15:17         ` Keith Packard
2021-08-19 15:38           ` Corinna Vinschen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).