public inbox for newlib@sourceware.org
 help / color / mirror / Atom feed
* undefined references since newlib-3.2.0
@ 2020-06-09  8:00 Wolf, Josef
  2020-06-12  8:21 ` Josef Wolf
  0 siblings, 1 reply; 14+ messages in thread
From: Wolf, Josef @ 2020-06-09  8:00 UTC (permalink / raw)
  To: newlib

Hello everybody,



Since newlib-3.2.0, I get this error when linking one of my projects:



$ m68k-unknown-elf-gcc -nostartfiles -Wl,--cref,--section-start=vectors=0 \

    -Wl,-Ttext=0x400,--entry=entry -Wl,--oformat,elf32-m68k \

    -Wl,--cref,-Map,proj.map \

    -Wl,-T,ldscript.be -ansi -pedantic -Wall -Wcast-align \

    -Wstrict-prototypes -Wmissing-prototypes -std=c89 -Wnull-dereference -g \

    -O2 -fno-toplevel-reorder  -mcpu32  -o proj.elf `cat proj.objs` -lc

      /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/bin/ld: /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/lib/mcpu32/libc.a(lib_a-abort.o):

      in function `abort':

      /m/a/tmp/builds/crossgcc/src/newlib-3.2.0/newlib/libc/stdlib/abort.c:59: undefined reference to `_exit'

      /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/bin/ld: /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/lib/mcpu32/libc.a(lib_a-signalr.o):

      in function `_kill_r':

      /m/a/tmp/builds/crossgcc/src/newlib-3.2.0/newlib/libc/reent/signalr.c:53: undefined reference to `kill'

      /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/bin/ld: /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/lib/mcpu32/libc.a(lib_a-signalr.o):

      in function `_getpid_r':

      /m/a/tmp/builds/crossgcc/src/newlib-3.2.0/newlib/libc/reent/signalr.c:83: undefined reference to `getpid'



The project builds just fine with newlib versions up to 3.1.0



newlib was configured like this:



/m/a/tmp/builds/crossgcc/src/newlib-3.2.0/configure --target=m68k-unknown-elf --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --prefix=/m/a/local/crossgcc/m68k-unknown-elf/nanoLibraries --enable-lite-exit --enable-newlib-nano-malloc --enable-newlib-nano-formatted-io --enable-newlib-reent-small --enable-newlib-retargetable-locking --enable-newlib-global-atexit --enable-newlib-global-stdio-streams --disable-newlib-supplied-syscalls --disable-newlib-fvwrite-in-streamio --disable-newlib-fseek-optimization --disable-newlib-wide-orient --disable-newlib-unbuf-stream-opt --disable-nls



Any suggestions?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: undefined references since newlib-3.2.0
  2020-06-09  8:00 undefined references since newlib-3.2.0 Wolf, Josef
@ 2020-06-12  8:21 ` Josef Wolf
  2020-06-12 22:50   ` Josef Wolf
  0 siblings, 1 reply; 14+ messages in thread
From: Josef Wolf @ 2020-06-12  8:21 UTC (permalink / raw)
  To: newlib

Hello again,

finally, I have identified the commit, which makes my project to fail:

My project links just fine against newlib-df5c79f30c3f871b7e0edd6d4629af78b30fca15
but it fails with the error message shown below when linking against
newlib-f88aece242178ff0c187d56e34a79645fbc44a23

To be honest, I completely fail to understand how this commit
https://sourceware.org/git?p=newlib-cygwin.git;a=commit;h=f88aece242178ff0c187d56e34a79645fbc44a23
can possibly cause those link failures. After all,
f88aece242178ff0c187d56e34a79645fbc44a23 shows no traces of abort/exit/kill/getpid.

Any suggestions?

On Tue, Jun 09, 2020 at 08:00:25AM +0000, Wolf, Josef via Newlib wrote:
> 
> Since newlib-3.2.0, I get this error when linking one of my projects:
> 
> 
> $ m68k-unknown-elf-gcc -nostartfiles -Wl,--cref,--section-start=vectors=0 \
>     -Wl,-Ttext=0x400,--entry=entry -Wl,--oformat,elf32-m68k \
>     -Wl,--cref,-Map,proj.map \
>     -Wl,-T,ldscript.be -ansi -pedantic -Wall -Wcast-align \
>     -Wstrict-prototypes -Wmissing-prototypes -std=c89 -Wnull-dereference -g \
>     -O2 -fno-toplevel-reorder  -mcpu32  -o proj.elf `cat proj.objs` -lc
>       /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/bin/ld: /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/lib/mcpu32/libc.a(lib_a-abort.o):
>       in function `abort':
>       /m/a/tmp/builds/crossgcc/src/newlib-3.2.0/newlib/libc/stdlib/abort.c:59: undefined reference to `_exit'
>       /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/bin/ld: /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/lib/mcpu32/libc.a(lib_a-signalr.o):
>       in function `_kill_r':
>       /m/a/tmp/builds/crossgcc/src/newlib-3.2.0/newlib/libc/reent/signalr.c:53: undefined reference to `kill'
>       /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/bin/ld: /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/lib/mcpu32/libc.a(lib_a-signalr.o):
>       in function `_getpid_r':
>       /m/a/tmp/builds/crossgcc/src/newlib-3.2.0/newlib/libc/reent/signalr.c:83: undefined reference to `getpid'
> 
> The project builds just fine with newlib versions up to 3.1.0
> 
> newlib was configured like this:
> 
> /m/a/tmp/builds/crossgcc/src/newlib-3.2.0/configure --target=m68k-unknown-elf --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --prefix=/m/a/local/crossgcc/m68k-unknown-elf/nanoLibraries --enable-lite-exit --enable-newlib-nano-malloc --enable-newlib-nano-formatted-io --enable-newlib-reent-small --enable-newlib-retargetable-locking --enable-newlib-global-atexit --enable-newlib-global-stdio-streams --disable-newlib-supplied-syscalls --disable-newlib-fvwrite-in-streamio --disable-newlib-fseek-optimization --disable-newlib-wide-orient --disable-newlib-unbuf-stream-opt --disable-nls
> 
> Any suggestions?

-- 
Josef Wolf
jw@raven.inka.de

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: undefined references since newlib-3.2.0
  2020-06-12  8:21 ` Josef Wolf
@ 2020-06-12 22:50   ` Josef Wolf
  2020-06-13  0:05     ` Keith Packard
  2020-06-13 10:31     ` Jeffrey Walton
  0 siblings, 2 replies; 14+ messages in thread
From: Josef Wolf @ 2020-06-12 22:50 UTC (permalink / raw)
  To: newlib

On Fri, Jun 12, 2020 at 10:21:11AM +0200, Josef Wolf wrote:

> To be honest, I completely fail to understand how this commit
> https://sourceware.org/git?p=newlib-cygwin.git;a=commit;h=f88aece242178ff0c187d56e34a79645fbc44a23
> can possibly cause those link failures. After all,
> f88aece242178ff0c187d56e34a79645fbc44a23 shows no traces of abort/exit/kill/getpid.

This patch replaces Balloc() by eBalloc() which calls __assert_func() on
failure which in turn calls exit() and/or abort()

Is this really a sane thing to do? After all, assertions should be used to
detect logically impossible situations but not as a general replacement for
error handling.

Shouldn't those library functions simply report the error by returning NULL to
the caller and let the application handle the situation?

-- 
Josef Wolf
jw@raven.inka.de

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: undefined references since newlib-3.2.0
  2020-06-12 22:50   ` Josef Wolf
@ 2020-06-13  0:05     ` Keith Packard
  2020-06-13  8:27       ` Josef Wolf
  2020-06-13 10:31     ` Jeffrey Walton
  1 sibling, 1 reply; 14+ messages in thread
From: Keith Packard @ 2020-06-13  0:05 UTC (permalink / raw)
  To: Josef Wolf, newlib

[-- Attachment #1: Type: text/plain, Size: 1522 bytes --]

Josef Wolf <jw@raven.inka.de> writes:

> This patch replaces Balloc() by eBalloc() which calls __assert_func() on
> failure which in turn calls exit() and/or abort()

As an interim step, having a fix available sooner was a good thing.

> Shouldn't those library functions simply report the error by returning NULL to
> the caller and let the application handle the situation?

It's complicated by the libc API which doesn't have defined error return
values for all of the affected APIs. Glibc has similar failure modes;
using those is probably the best option for newlib.

I've patched this code in picolibc (a newlib fork designed for embedded
systems) to track the malloc failures and return status information back
to applications. Interested parties might want to port those fixes back
to newlib.

https://github.com/keith-packard/picolibc/commit/dad1bd20fb927fb14a63d7951b0441f1cec73bf8
https://github.com/keith-packard/picolibc/commit/9fc3006ef9fec4a72b6dd29d2b691c1cc386e8f1

I've also implemented the recent 'ryu' float/string conversion code,
which performs exact (i.e. round-trip-able) conversions with much lower
precision (128bit for 64-bit floats, 64-bit for 32-bit floats). As a
result, that code does all of its operations on the stack and doesn't
use malloc at all, eliminating this particular source of errors. That
code was not writen for the newlib stdio bits, but instead for the
picolibc tinystdio bits. Still, it might serve as a basis for a newlib
port.

-- 
-keith

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: undefined references since newlib-3.2.0
  2020-06-13  0:05     ` Keith Packard
@ 2020-06-13  8:27       ` Josef Wolf
  2020-06-13 17:20         ` Keith Packard
  0 siblings, 1 reply; 14+ messages in thread
From: Josef Wolf @ 2020-06-13  8:27 UTC (permalink / raw)
  To: newlib

Thanks for your reply, Keith!

On Fri, Jun 12, 2020 at 05:05:15PM -0700, Keith Packard wrote:
> Josef Wolf <jw@raven.inka.de> writes:
> 
> > This patch replaces Balloc() by eBalloc() which calls __assert_func() on
> > failure which in turn calls exit() and/or abort()
> 
> As an interim step, having a fix available sooner was a good thing.

I see. But at a first glance, returning NULL seems to be not much more work.

> > Shouldn't those library functions simply report the error by returning NULL to
> > the caller and let the application handle the situation?
> 
> It's complicated by the libc API which doesn't have defined error return
> values for all of the affected APIs.

I don't really understand this. The classical way to return error from
functions returning a pointer is to return NULL. One has to make sure not to
leak, of course. Which return values would be needed beyond NULL?

-- 
Josef Wolf
jw@raven.inka.de

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: undefined references since newlib-3.2.0
  2020-06-12 22:50   ` Josef Wolf
  2020-06-13  0:05     ` Keith Packard
@ 2020-06-13 10:31     ` Jeffrey Walton
  1 sibling, 0 replies; 14+ messages in thread
From: Jeffrey Walton @ 2020-06-13 10:31 UTC (permalink / raw)
  To: newlib

On Fri, Jun 12, 2020 at 6:51 PM Josef Wolf <jw@raven.inka.de> wrote:
>
> On Fri, Jun 12, 2020 at 10:21:11AM +0200, Josef Wolf wrote:
>
> > To be honest, I completely fail to understand how this commit
> > https://sourceware.org/git?p=newlib-cygwin.git;a=commit;h=f88aece242178ff0c187d56e34a79645fbc44a23
> > can possibly cause those link failures. After all,
> > f88aece242178ff0c187d56e34a79645fbc44a23 shows no traces of abort/exit/kill/getpid.
>
> This patch replaces Balloc() by eBalloc() which calls __assert_func() on
> failure which in turn calls exit() and/or abort()

Please don't call abort(3).

My programs usually handle sensitive information. I need destructors
to run to ensure sensitive information is wiped. This is a compliance
item.

I also need to ensure sensitive information does not escape from the
application's security boundary. For example, the sensitive
information should not be written to the file system by way of a core
file; and it should not be egressed to an error reporting service for
the platform provider or application developer to inspect.

Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: undefined references since newlib-3.2.0
  2020-06-13  8:27       ` Josef Wolf
@ 2020-06-13 17:20         ` Keith Packard
  2020-06-14 14:50           ` Josef Wolf
  2020-06-17 11:36           ` Josef Wolf
  0 siblings, 2 replies; 14+ messages in thread
From: Keith Packard @ 2020-06-13 17:20 UTC (permalink / raw)
  To: Josef Wolf, newlib

[-- Attachment #1: Type: text/plain, Size: 1837 bytes --]

Josef Wolf <jw@raven.inka.de> writes:

> I don't really understand this. The classical way to return error from
> functions returning a pointer is to return NULL. One has to make sure not to
> leak, of course. Which return values would be needed beyond NULL?

The issue is that you need to reflect the allocation error all the way
up to the initial libc entry point, and you need to use a defined error
return value at that point, preferably one which indicates that the
library had an internal allocation failure.

Here are some affected libc functions which don't return a pointer and
which have no error indication of any kind:

        atof
        atoff

These are defined to just return strtod/strtof, so they continue to
inherit whatever those functions do. Which is now:

These have errnos defined, but those do not include ENOMEM:

        strtod
        strtof
        strtold
        wcstod
        wcstold
        strtodg

These now return infinity and set errno to ERANGE on allocation
failure. (not ideal, but the options are limited)

Here are some which do return a pointer, but do not document any errors:

        ecvt
        fcvt
        gcvt
        ecvtbuf
        fcvtbuf
        gcvtbuf

These now can return NULL on allocation failure. They do not set errno.

And here's a list of functions which I feel reasonable applications
should not expect an allocation error from:

        sprintf
        snprintf
        sscanf
        
These return EOF on allocation failure.

I wasn't too concerned with the exact semantics as there are no good
answers when libc uses malloc in these functions, so once I had
eliminated the internal aborts, I moved on to adding the ryu conversion
code to tinystdio to eliminate allocation in that path entirely.

-- 
-keith

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: undefined references since newlib-3.2.0
  2020-06-13 17:20         ` Keith Packard
@ 2020-06-14 14:50           ` Josef Wolf
  2020-06-14 17:02             ` Keith Packard
  2020-06-17 11:36           ` Josef Wolf
  1 sibling, 1 reply; 14+ messages in thread
From: Josef Wolf @ 2020-06-14 14:50 UTC (permalink / raw)
  To: newlib

On Sat, Jun 13, 2020 at 10:20:54AM -0700, Keith Packard wrote:

> Here are some affected libc functions which don't return a pointer and
> which have no error indication of any kind:
> 
>         atof
>         atoff
> [ ... ]
>         strtod
>         strtof
>         strtold
>         wcstod
>         wcstold
>         strtodg

Uh! Why on earth would those functions need to allocate memory?

> These now return infinity and set errno to ERANGE on allocation
> failure. (not ideal, but the options are limited)
> 
> Here are some which do return a pointer, but do not document any errors:
> 
>         ecvt
>         fcvt

Maybe the documentation can be fixed?

>         gcvt
>         ecvtbuf
>         fcvtbuf
>         gcvtbuf

Those get a pointer passed. No need to allocate memory.

> And here's a list of functions which I feel reasonable applications
> should not expect an allocation error from:

I don't think any application should expect those functions to call exit()
and/or abort() either.

>         sprintf
>         snprintf

Those should return -1 on failure.

>         sscanf

For this, ENOMEM is documented.

-- 
Josef Wolf
jw@raven.inka.de

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: undefined references since newlib-3.2.0
  2020-06-14 14:50           ` Josef Wolf
@ 2020-06-14 17:02             ` Keith Packard
  2020-06-14 17:40               ` Brian Inglis
  2020-06-15  4:53               ` Dimitrios Glynos
  0 siblings, 2 replies; 14+ messages in thread
From: Keith Packard @ 2020-06-14 17:02 UTC (permalink / raw)
  To: Josef Wolf, newlib

[-- Attachment #1: Type: text/plain, Size: 5229 bytes --]

Josef Wolf <jw@raven.inka.de> writes:

>>         atof
>>         atoff
>> [ ... ]
>>         strtod
>>         strtof
>>         strtold
>>         wcstod
>>         wcstold
>>         strtodg
>
> Uh! Why on earth would those functions need to allocate memory?

Because they are performing string to float conversions using code
written in 1991 by David Gay, based on research done by Will Clinger
which shows that exact conversion from arbitrary strings of decimal
digits to fixed precision binary requires arbitrary precision
arithmetic.

        https://dl.acm.org/doi/10.1145/93548.93557

>> These now return infinity and set errno to ERANGE on allocation
>> failure. (not ideal, but the options are limited)
>> 
>> Here are some which do return a pointer, but do not document any errors:
>> 
>>         ecvt
>>         fcvt
>
> Maybe the documentation can be fixed?

The documentation is based on a standard, and fixing that standard
involves a bit of process...

>>         gcvt
>>         ecvtbuf
>>         fcvtbuf
>>         gcvtbuf
>
> Those get a pointer passed. No need to allocate memory.

These functions are using code also written by David Gay to perform
float to string conversion, based on research done by Guy Steele and Jon
White in how to print floating point numbers accurately (which happened
to be presented at the same conference as the work above!). In this
work, they showed that exact conversion could be done using 1050 bit
arithmetic to generate a 64-bit double result:

        https://dl.acm.org/doi/10.1145/93548.93559

David Gay's code in newlib for both directions uses arbitrary precision
arithmetic code found in newlib/libc/stdlib/mprec.c. This code allocates
variable sized arrays of integers on the heap to hold all of the values.
Before the eBalloc patch, none of these allocations were checked,
leading to a rather long list of CVEs as the code could end up storing
through a NULL pointer, which can cause security problems on some
architectures.

>> And here's a list of functions which I feel reasonable applications
>> should not expect an allocation error from:
>
> I don't think any application should expect those functions to call exit()
> and/or abort() either.

I'm in complete agreement here. It's better to return an error that an
application *might* check than to not give it any chance to recover at
all.

>>         sprintf
>>         snprintf
>
> Those should return -1 on failure.
>
>>         sscanf
>
> For this, ENOMEM is documented.

Yes, but as I suggested, applications probably aren't expecting a call
to sscanf to return EOF and set errno to ENOMEM.

The real answer to your concerns is to replace the old arbitrary
precision based float/string conversion code with code that uses results
from new research by Ulf Adams.

That research improves on Steel & White by reducing the precision
required for exact 64-bit float to string conversion from 1050 bits to
128 bits. Adams also presents an algorithm using a similar technique to
perform (a slightly weaker form of) exact string to float conversion in
the same precision:

        https://dl.acm.org/citation.cfm?doid=3296979.3192369

This reasonably small fixed precision can be statically allocated in
memory, or allocated on the stack. Either of these solutions eliminates
the use of the dynamic heap through malloc, and eliminates the need to
change the specification of all of these functions to account for the
heap usage in the existing newlib code.

Ulf Adams also published code to implement this algorithm on github:

        https://github.com/ulfjack/ryu

I've ported this code to picolibc, a fork of newlib designed for
embedded systems. That library has an alternate stdio implementation
that doesn't need to use malloc, and it made sense to add this
malloc-free float/string conversion code to that (the previous
float/string conversion code in this implementation was not exact). When
compiled using that code, picolibc will not return errors from malloc
failures in the above cases because it does not call malloc in those
code paths.

The picolibc source repository also includes the stdio code from newlib
which can be used in place of the default picolibc stdio code by setting
a build option. That code has been modified to catch allocation
failures and return the failures above. I did that in case someone
wanted to use the original stdio code as I felt even this non-default
code should not expose applications to arbitrary calls to abort from
inside the library. I believe this code should be ported back to newlib
so that at least newlib wouldn't call abort. Even better would be to
have someone take a look at the Ryu paper and code and make that work in
newlib.

(The definition of 'exact' used in Ulf Adams work offers the guarantee
that you can print any floating point value, and then re-read that
string to exactly reproduce the original floating point value in
memory. This is weaker than what Clinger's research used; in that work,
the goal was to generate the floating point value closest to an
arbitrary string of decimal digits.)

-- 
-keith

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: undefined references since newlib-3.2.0
  2020-06-14 17:02             ` Keith Packard
@ 2020-06-14 17:40               ` Brian Inglis
  2020-06-14 20:13                 ` Keith Packard
  2020-06-15  4:53               ` Dimitrios Glynos
  1 sibling, 1 reply; 14+ messages in thread
From: Brian Inglis @ 2020-06-14 17:40 UTC (permalink / raw)
  To: newlib

On 2020-06-14 11:02, Keith Packard via Newlib wrote:
> Josef Wolf <jw@raven.inka.de> writes:
>>>         atof
>>>         atoff
>>> [ ... ]
>>>         strtod
>>>         strtof
>>>         strtold
>>>         wcstod
>>>         wcstold
>>>         strtodg

>> Uh! Why on earth would those functions need to allocate memory?

> Because they are performing string to float conversions using code
> written in 1991 by David Gay, based on research done by Will Clinger
> which shows that exact conversion from arbitrary strings of decimal
> digits to fixed precision binary requires arbitrary precision
> arithmetic.
>         https://dl.acm.org/doi/10.1145/93548.93557

>>> These now return infinity and set errno to ERANGE on allocation
>>> failure. (not ideal, but the options are limited)
>>> Here are some which do return a pointer, but do not document any errors:
>>>         ecvt
>>>         fcvt

>> Maybe the documentation can be fixed?

> The documentation is based on a standard, and fixing that standard
> involves a bit of process...

>>>         gcvt
>>>         ecvtbuf
>>>         fcvtbuf
>>>         gcvtbuf

>> Those get a pointer passed. No need to allocate memory.

> These functions are using code also written by David Gay to perform
> float to string conversion, based on research done by Guy Steele and Jon
> White in how to print floating point numbers accurately (which happened
> to be presented at the same conference as the work above!). In this
> work, they showed that exact conversion could be done using 1050 bit
> arithmetic to generate a 64-bit double result:
>         https://dl.acm.org/doi/10.1145/93548.93559
> David Gay's code in newlib for both directions uses arbitrary precision
> arithmetic code found in newlib/libc/stdlib/mprec.c. This code allocates
> variable sized arrays of integers on the heap to hold all of the values.
> Before the eBalloc patch, none of these allocations were checked,
> leading to a rather long list of CVEs as the code could end up storing
> through a NULL pointer, which can cause security problems on some
> architectures.

>>> And here's a list of functions which I feel reasonable applications
>>> should not expect an allocation error from:

>> I don't think any application should expect those functions to call exit()
>> and/or abort() either.

> I'm in complete agreement here. It's better to return an error that an
> application *might* check than to not give it any chance to recover at
> all.

>>>         sprintf
>>>         snprintf

>> Those should return -1 on failure.

>>>         sscanf

>> For this, ENOMEM is documented.

> Yes, but as I suggested, applications probably aren't expecting a call
> to sscanf to return EOF and set errno to ENOMEM.
> The real answer to your concerns is to replace the old arbitrary
> precision based float/string conversion code with code that uses results
> from new research by Ulf Adams.
> That research improves on Steel & White by reducing the precision
> required for exact 64-bit float to string conversion from 1050 bits to
> 128 bits. Adams also presents an algorithm using a similar technique to
> perform (a slightly weaker form of) exact string to float conversion in
> the same precision:
>         https://dl.acm.org/citation.cfm?doid=3296979.3192369

Paper is available:
https://www.researchgate.net/publication/329410883_Ryu_fast_float-to-string_conversion/link/5c073a3292851c6ca1ff1bdb/download

> This reasonably small fixed precision can be statically allocated in
> memory, or allocated on the stack. Either of these solutions eliminates
> the use of the dynamic heap through malloc, and eliminates the need to
> change the specification of all of these functions to account for the
> heap usage in the existing newlib code.
> Ulf Adams also published code to implement this algorithm on github:
>         https://github.com/ulfjack/ryu
> I've ported this code to picolibc, a fork of newlib designed for
> embedded systems. That library has an alternate stdio implementation
> that doesn't need to use malloc, and it made sense to add this
> malloc-free float/string conversion code to that (the previous
> float/string conversion code in this implementation was not exact). When
> compiled using that code, picolibc will not return errors from malloc
> failures in the above cases because it does not call malloc in those
> code paths.
> The picolibc source repository also includes the stdio code from newlib
> which can be used in place of the default picolibc stdio code by setting
> a build option. That code has been modified to catch allocation
> failures and return the failures above. I did that in case someone
> wanted to use the original stdio code as I felt even this non-default
> code should not expose applications to arbitrary calls to abort from
> inside the library. I believe this code should be ported back to newlib
> so that at least newlib wouldn't call abort. Even better would be to
> have someone take a look at the Ryu paper and code and make that work in
> newlib.
> (The definition of 'exact' used in Ulf Adams work offers the guarantee
> that you can print any floating point value, and then re-read that
> string to exactly reproduce the original floating point value in
> memory. This is weaker than what Clinger's research used; in that work,
> the goal was to generate the floating point value closest to an
> arbitrary string of decimal digits.)

What are the accuracy tradeoffs, if any, vs memory vs time?
Should both approaches be retained for flexibility and reproducibility?

Hopefully any new approach added will be implemented as reentrant functions,
using externally supplied memory if required, and some interfaces may provide
that memory statically allocated.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: undefined references since newlib-3.2.0
  2020-06-14 17:40               ` Brian Inglis
@ 2020-06-14 20:13                 ` Keith Packard
  0 siblings, 0 replies; 14+ messages in thread
From: Keith Packard @ 2020-06-14 20:13 UTC (permalink / raw)
  To: Brian Inglis, newlib

[-- Attachment #1: Type: text/plain, Size: 4224 bytes --]

Brian Inglis <Brian.Inglis@SystematicSw.ab.ca> writes:

> What are the accuracy tradeoffs, if any, vs memory vs time?

The 'exact' code offers a 'round trip' guarantee if you use at least 9
digits for 32-bit floats and 17 digits for 64-bit floats. The inexact
code performs all of the exponent scaling operations using floating
point values; the more scaling you do, the less precise the result.

For memory, neither use significant RAM as there aren't that many
intermediate values. Here's a couple of complete application size
comparisons (this example is taken from
https://github.com/keith-packard/picolibc/blob/main/doc/printf.md ):

        #include <stdio.h>
        #include <stdlib.h>

        void
        main(void)
        {
        	printf(" 2⁶¹ = %lld π ≃ %.17g\n", 1ll << 61, printf_float(3.141592653589793));
        }


Here's a comparison between the available options, all built from
picolibc head as of today. These are all compiled for RISC-V rv32imafdc
(32-bit with hardware float and double to avoid linking in all of the
soft float code) as I can emulate that using qemu for
testing. Picolibc's replacement stdio code has a hack for supporting
printf of 32-bit floats, which is useful on smaller systems to reduce
the size of the code, which I show here with the 32-bit measurements.

   text	   data	    bss	    dec	    hex	filename                Description
   9078	     16	      8	   9102	   238e	printf-ryu.elf          Ryu 64-bit
   6350	     16	      8	   6374	   18e6	printf-ryu-float.elf    Ryu 32-bit

   7364	     16	      8	   7388	   1cdc	printf-hard.elf         Old 64-bit
   6428	     16	      8	   6452	   1934	printf-hard-float.elf   Old 32-bit

   2314	     16	      8	   2338	    922	printf-ryu-int.elf      No floats

For more comparison, here's the size of the newlib stdio. Note that
this does not include 1700 bytes of heap allocated after the program
starts:

   text	   data	    bss	    dec	    hex	filename
  22702	    760	    868	  24330	   5f0a	printf-newlib.elf

I haven't benchmarked performance of any of these implementations; Ulf
Adams (the author of the Ryu paper and code) claims significant
performance benefits over glibc and musl. I would wager that it's
significantly faster than the newlib code though.

For a machine without hardware float support, I'd be hard pressed to
guess which is faster; the Ryu code is all done in integer arithmetic,
and so avoids the cost of software floating point, even though it's
doing a lot more visible computation.

> Should both approaches be retained for flexibility and
> reproducibility?

Picolibc retains all of these options in the source code, so you are
free to select the option which is best for your environment. The Debian
packages of picolibc binary versions for RISC-V, ARM and ESP8266 all use
the default options, which use the exact (Ryu) option in the replacement
stdio code.

Configure the build with '-Dio-float-exact=false' to use the inexact
versions, compile with '-Dtinystdio=false -Dnewlib-io-float=true -Dio-long-long=true' to use
the newlib stdio code with float and long long support (which is
always included in the replacement stdio code).

The tests adapt to the inexact mode by reducing the required precision,
which is why picolibc can still pass in that mode. In exact mode, the
tests require exact results. CI tests run on every push show that
picolibc continues to pass this requirement:

        https://github.com/keith-packard/picolibc/actions

> Hopefully any new approach added will be implemented as reentrant functions,
> using externally supplied memory if required, and some interfaces may provide
> that memory statically allocated.

Yes, the new functions use only constant data with all read-write values
being local variables allocated in registers or on the stack. The
picolibc stdio code doesn't have any state in the FILE structure when
performing output, which makes these functions entirely re-entrant. You
can see from the above that picolibc doesn't use a lot of RAM for these
operations; that 24 bytes is all that is required, aside from space on
the stack.

-- 
-keith

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: undefined references since newlib-3.2.0
  2020-06-14 17:02             ` Keith Packard
  2020-06-14 17:40               ` Brian Inglis
@ 2020-06-15  4:53               ` Dimitrios Glynos
  2020-06-16  3:43                 ` Keith Packard
  1 sibling, 1 reply; 14+ messages in thread
From: Dimitrios Glynos @ 2020-06-15  4:53 UTC (permalink / raw)
  To: newlib

Hello,

I'm responsible for the CVE's that triggered these patches
(see [1]) and would just like to mention on-list that
during the initial (private) discussion of these issues with both
Jeff Johnston of newlib and Keith Packard of newlib-nano/picolibc
I had provided the following recommendation:

a) when the standard allows so, have the API return the NULL pointer
   to the caller with ENOMEM, so that the caller knows there's an
   out of memory (OOM) issue.

b) when the standard has no provision for ENOMEM, it would be
   good practice to introduce a function that would be called under
   OOM conditions.

OOM conditions are special, especially for embedded and
bare metal projects, as:
i) a project might not be able to simply call abort(), i.e. other
   important actions must be performed (e.g. communicate the error
   to the user or over some interface)
ii) a project might want to take special actions so that it survives
   this condition (e.g. relinquish some resources, unwind
   to some other state)

For these reasons, I had proposed for case (b) the introduction
of a function that would be implemented by the developer, and
would handle non-reported out-of-memory conditions. Developers
already supply implementations for system calls on bare bones devices,
and this could be a similar concept to that. There are two ways
of implementing this, one at compile-time (e.g. as you define
sbrk today) and another at runtime (similar to an atexit()
registration). However I feel that the compile-time introduction
of the function elaborates more clearly the message "that this
is something that you must take care of when using this library".

I don't recommend masquerading ENOMEM as some other error, as many
developers may miss that this is a point in the
code they should be covering points (i) and/or (ii).

CVE-wise newlib with the right compile-time configuration covers
the specific issues presented in the advisory, albeit with the
caveats you have all witnessed. It is wise to also inspect if
the callers of the reported functions handle NULL returns correctly
as in picolibc there were a number of callers that required a patch.

Finally, it would be beneficial for all if both projects
(picolibc and newlib) filed a comment to the standards group
stating the cases where an ENOMEM was found useful but was not
covered by the standard.

Hope this helps.

Best regards,

Dimitris

[1]
https://census-labs.com/news/2020/01/31/multiple-null-pointer-dereference-vulnerabilities-in-newlib/

On 14/6/20 8:02 μ.μ., Keith Packard via Newlib wrote:
> Josef Wolf <jw@raven.inka.de> writes:
> 
>>>         atof
>>>         atoff
>>> [ ... ]
>>>         strtod
>>>         strtof
>>>         strtold
>>>         wcstod
>>>         wcstold
>>>         strtodg
>>
>> Uh! Why on earth would those functions need to allocate memory?
> 
> Because they are performing string to float conversions using code
> written in 1991 by David Gay, based on research done by Will Clinger
> which shows that exact conversion from arbitrary strings of decimal
> digits to fixed precision binary requires arbitrary precision
> arithmetic.
> 
>         https://dl.acm.org/doi/10.1145/93548.93557
> 
>>> These now return infinity and set errno to ERANGE on allocation
>>> failure. (not ideal, but the options are limited)
>>>
>>> Here are some which do return a pointer, but do not document any errors:
>>>
>>>         ecvt
>>>         fcvt
>>
>> Maybe the documentation can be fixed?
> 
> The documentation is based on a standard, and fixing that standard
> involves a bit of process...
> 
>>>         gcvt
>>>         ecvtbuf
>>>         fcvtbuf
>>>         gcvtbuf
>>
>> Those get a pointer passed. No need to allocate memory.
> 
> These functions are using code also written by David Gay to perform
> float to string conversion, based on research done by Guy Steele and Jon
> White in how to print floating point numbers accurately (which happened
> to be presented at the same conference as the work above!). In this
> work, they showed that exact conversion could be done using 1050 bit
> arithmetic to generate a 64-bit double result:
> 
>         https://dl.acm.org/doi/10.1145/93548.93559
> 
> David Gay's code in newlib for both directions uses arbitrary precision
> arithmetic code found in newlib/libc/stdlib/mprec.c. This code allocates
> variable sized arrays of integers on the heap to hold all of the values.
> Before the eBalloc patch, none of these allocations were checked,
> leading to a rather long list of CVEs as the code could end up storing
> through a NULL pointer, which can cause security problems on some
> architectures.
> 
>>> And here's a list of functions which I feel reasonable applications
>>> should not expect an allocation error from:
>>
>> I don't think any application should expect those functions to call exit()
>> and/or abort() either.
> 
> I'm in complete agreement here. It's better to return an error that an
> application *might* check than to not give it any chance to recover at
> all.
> 
>>>         sprintf
>>>         snprintf
>>
>> Those should return -1 on failure.
>>
>>>         sscanf
>>
>> For this, ENOMEM is documented.
> 
> Yes, but as I suggested, applications probably aren't expecting a call
> to sscanf to return EOF and set errno to ENOMEM.
> 
> The real answer to your concerns is to replace the old arbitrary
> precision based float/string conversion code with code that uses results
> from new research by Ulf Adams.
> 
> That research improves on Steel & White by reducing the precision
> required for exact 64-bit float to string conversion from 1050 bits to
> 128 bits. Adams also presents an algorithm using a similar technique to
> perform (a slightly weaker form of) exact string to float conversion in
> the same precision:
> 
>         https://dl.acm.org/citation.cfm?doid=3296979.3192369
> 
> This reasonably small fixed precision can be statically allocated in
> memory, or allocated on the stack. Either of these solutions eliminates
> the use of the dynamic heap through malloc, and eliminates the need to
> change the specification of all of these functions to account for the
> heap usage in the existing newlib code.
> 
> Ulf Adams also published code to implement this algorithm on github:
> 
>         https://github.com/ulfjack/ryu
> 
> I've ported this code to picolibc, a fork of newlib designed for
> embedded systems. That library has an alternate stdio implementation
> that doesn't need to use malloc, and it made sense to add this
> malloc-free float/string conversion code to that (the previous
> float/string conversion code in this implementation was not exact). When
> compiled using that code, picolibc will not return errors from malloc
> failures in the above cases because it does not call malloc in those
> code paths.
> 
> The picolibc source repository also includes the stdio code from newlib
> which can be used in place of the default picolibc stdio code by setting
> a build option. That code has been modified to catch allocation
> failures and return the failures above. I did that in case someone
> wanted to use the original stdio code as I felt even this non-default
> code should not expose applications to arbitrary calls to abort from
> inside the library. I believe this code should be ported back to newlib
> so that at least newlib wouldn't call abort. Even better would be to
> have someone take a look at the Ryu paper and code and make that work in
> newlib.
> 
> (The definition of 'exact' used in Ulf Adams work offers the guarantee
> that you can print any floating point value, and then re-read that
> string to exactly reproduce the original floating point value in
> memory. This is weaker than what Clinger's research used; in that work,
> the goal was to generate the floating point value closest to an
> arbitrary string of decimal digits.)
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: undefined references since newlib-3.2.0
  2020-06-15  4:53               ` Dimitrios Glynos
@ 2020-06-16  3:43                 ` Keith Packard
  0 siblings, 0 replies; 14+ messages in thread
From: Keith Packard @ 2020-06-16  3:43 UTC (permalink / raw)
  To: Dimitrios Glynos, newlib

[-- Attachment #1: Type: text/plain, Size: 1245 bytes --]

Dimitrios Glynos <dimitris@census-labs.com> writes:

> For these reasons, I had proposed for case (b) the introduction of a
> function that would be implemented by the developer, and would handle
> non-reported out-of-memory conditions. Developers already supply
> implementations for system calls on bare bones devices, and this could
> be a similar concept to that.

I'd suggest that this new application-supplied function would be defined
to return normally, and that the library would then return a "safe"
value back to the application to provide a well-defined flow of control
even in this case.

> Finally, it would be beneficial for all if both projects
> (picolibc and newlib) filed a comment to the standards group
> stating the cases where an ENOMEM was found useful but was not
> covered by the standard.

Yeah, fortunately in the default configuration, picolibc doesn't have
any cases like this at this point. In that configuration, picolibc will
only call malloc for operations which already have well defined
out-of-memory behavior (e.g. strdup). It's only if you switch to the
original newlib stdio code that you hit these cases due to the use of
the multi-precision math code in that path.

-- 
-keith

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: undefined references since newlib-3.2.0
  2020-06-13 17:20         ` Keith Packard
  2020-06-14 14:50           ` Josef Wolf
@ 2020-06-17 11:36           ` Josef Wolf
  1 sibling, 0 replies; 14+ messages in thread
From: Josef Wolf @ 2020-06-17 11:36 UTC (permalink / raw)
  To: newlib

On Sat, Jun 13, 2020 at 10:20:54AM -0700, Keith Packard wrote:
>         strtod
>         strtof
>         strtold
>         wcstod

While we're at it: the signatures for those functions descard the "const"
qualifier from their first parameter by returning a non-qualified pointer to
it.

This also applies to strtol, strtoul and friends.

Is this required by the standard? Or am I missing smoething important?

-- 
Josef Wolf
jw@raven.inka.de

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-06-17 11:40 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-09  8:00 undefined references since newlib-3.2.0 Wolf, Josef
2020-06-12  8:21 ` Josef Wolf
2020-06-12 22:50   ` Josef Wolf
2020-06-13  0:05     ` Keith Packard
2020-06-13  8:27       ` Josef Wolf
2020-06-13 17:20         ` Keith Packard
2020-06-14 14:50           ` Josef Wolf
2020-06-14 17:02             ` Keith Packard
2020-06-14 17:40               ` Brian Inglis
2020-06-14 20:13                 ` Keith Packard
2020-06-15  4:53               ` Dimitrios Glynos
2020-06-16  3:43                 ` Keith Packard
2020-06-17 11:36           ` Josef Wolf
2020-06-13 10:31     ` Jeffrey Walton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).