* undefined references since newlib-3.2.0 @ 2020-06-09 8:00 Wolf, Josef 2020-06-12 8:21 ` Josef Wolf 0 siblings, 1 reply; 14+ messages in thread From: Wolf, Josef @ 2020-06-09 8:00 UTC (permalink / raw) To: newlib Hello everybody, Since newlib-3.2.0, I get this error when linking one of my projects: $ m68k-unknown-elf-gcc -nostartfiles -Wl,--cref,--section-start=vectors=0 \ -Wl,-Ttext=0x400,--entry=entry -Wl,--oformat,elf32-m68k \ -Wl,--cref,-Map,proj.map \ -Wl,-T,ldscript.be -ansi -pedantic -Wall -Wcast-align \ -Wstrict-prototypes -Wmissing-prototypes -std=c89 -Wnull-dereference -g \ -O2 -fno-toplevel-reorder -mcpu32 -o proj.elf `cat proj.objs` -lc /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/bin/ld: /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/lib/mcpu32/libc.a(lib_a-abort.o): in function `abort': /m/a/tmp/builds/crossgcc/src/newlib-3.2.0/newlib/libc/stdlib/abort.c:59: undefined reference to `_exit' /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/bin/ld: /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/lib/mcpu32/libc.a(lib_a-signalr.o): in function `_kill_r': /m/a/tmp/builds/crossgcc/src/newlib-3.2.0/newlib/libc/reent/signalr.c:53: undefined reference to `kill' /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/bin/ld: /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/lib/mcpu32/libc.a(lib_a-signalr.o): in function `_getpid_r': /m/a/tmp/builds/crossgcc/src/newlib-3.2.0/newlib/libc/reent/signalr.c:83: undefined reference to `getpid' The project builds just fine with newlib versions up to 3.1.0 newlib was configured like this: /m/a/tmp/builds/crossgcc/src/newlib-3.2.0/configure --target=m68k-unknown-elf --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --prefix=/m/a/local/crossgcc/m68k-unknown-elf/nanoLibraries --enable-lite-exit --enable-newlib-nano-malloc --enable-newlib-nano-formatted-io --enable-newlib-reent-small --enable-newlib-retargetable-locking --enable-newlib-global-atexit --enable-newlib-global-stdio-streams --disable-newlib-supplied-syscalls --disable-newlib-fvwrite-in-streamio --disable-newlib-fseek-optimization --disable-newlib-wide-orient --disable-newlib-unbuf-stream-opt --disable-nls Any suggestions? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: undefined references since newlib-3.2.0 2020-06-09 8:00 undefined references since newlib-3.2.0 Wolf, Josef @ 2020-06-12 8:21 ` Josef Wolf 2020-06-12 22:50 ` Josef Wolf 0 siblings, 1 reply; 14+ messages in thread From: Josef Wolf @ 2020-06-12 8:21 UTC (permalink / raw) To: newlib Hello again, finally, I have identified the commit, which makes my project to fail: My project links just fine against newlib-df5c79f30c3f871b7e0edd6d4629af78b30fca15 but it fails with the error message shown below when linking against newlib-f88aece242178ff0c187d56e34a79645fbc44a23 To be honest, I completely fail to understand how this commit https://sourceware.org/git?p=newlib-cygwin.git;a=commit;h=f88aece242178ff0c187d56e34a79645fbc44a23 can possibly cause those link failures. After all, f88aece242178ff0c187d56e34a79645fbc44a23 shows no traces of abort/exit/kill/getpid. Any suggestions? On Tue, Jun 09, 2020 at 08:00:25AM +0000, Wolf, Josef via Newlib wrote: > > Since newlib-3.2.0, I get this error when linking one of my projects: > > > $ m68k-unknown-elf-gcc -nostartfiles -Wl,--cref,--section-start=vectors=0 \ > -Wl,-Ttext=0x400,--entry=entry -Wl,--oformat,elf32-m68k \ > -Wl,--cref,-Map,proj.map \ > -Wl,-T,ldscript.be -ansi -pedantic -Wall -Wcast-align \ > -Wstrict-prototypes -Wmissing-prototypes -std=c89 -Wnull-dereference -g \ > -O2 -fno-toplevel-reorder -mcpu32 -o proj.elf `cat proj.objs` -lc > /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/bin/ld: /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/lib/mcpu32/libc.a(lib_a-abort.o): > in function `abort': > /m/a/tmp/builds/crossgcc/src/newlib-3.2.0/newlib/libc/stdlib/abort.c:59: undefined reference to `_exit' > /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/bin/ld: /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/lib/mcpu32/libc.a(lib_a-signalr.o): > in function `_kill_r': > /m/a/tmp/builds/crossgcc/src/newlib-3.2.0/newlib/libc/reent/signalr.c:53: undefined reference to `kill' > /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/bin/ld: /m/a/local/crossgcc/m68k-unknown-elf/lib64/gcc/m68k-unknown-elf/9.3.0/../../../../m68k-unknown-elf/lib/mcpu32/libc.a(lib_a-signalr.o): > in function `_getpid_r': > /m/a/tmp/builds/crossgcc/src/newlib-3.2.0/newlib/libc/reent/signalr.c:83: undefined reference to `getpid' > > The project builds just fine with newlib versions up to 3.1.0 > > newlib was configured like this: > > /m/a/tmp/builds/crossgcc/src/newlib-3.2.0/configure --target=m68k-unknown-elf --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --prefix=/m/a/local/crossgcc/m68k-unknown-elf/nanoLibraries --enable-lite-exit --enable-newlib-nano-malloc --enable-newlib-nano-formatted-io --enable-newlib-reent-small --enable-newlib-retargetable-locking --enable-newlib-global-atexit --enable-newlib-global-stdio-streams --disable-newlib-supplied-syscalls --disable-newlib-fvwrite-in-streamio --disable-newlib-fseek-optimization --disable-newlib-wide-orient --disable-newlib-unbuf-stream-opt --disable-nls > > Any suggestions? -- Josef Wolf jw@raven.inka.de ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: undefined references since newlib-3.2.0 2020-06-12 8:21 ` Josef Wolf @ 2020-06-12 22:50 ` Josef Wolf 2020-06-13 0:05 ` Keith Packard 2020-06-13 10:31 ` Jeffrey Walton 0 siblings, 2 replies; 14+ messages in thread From: Josef Wolf @ 2020-06-12 22:50 UTC (permalink / raw) To: newlib On Fri, Jun 12, 2020 at 10:21:11AM +0200, Josef Wolf wrote: > To be honest, I completely fail to understand how this commit > https://sourceware.org/git?p=newlib-cygwin.git;a=commit;h=f88aece242178ff0c187d56e34a79645fbc44a23 > can possibly cause those link failures. After all, > f88aece242178ff0c187d56e34a79645fbc44a23 shows no traces of abort/exit/kill/getpid. This patch replaces Balloc() by eBalloc() which calls __assert_func() on failure which in turn calls exit() and/or abort() Is this really a sane thing to do? After all, assertions should be used to detect logically impossible situations but not as a general replacement for error handling. Shouldn't those library functions simply report the error by returning NULL to the caller and let the application handle the situation? -- Josef Wolf jw@raven.inka.de ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: undefined references since newlib-3.2.0 2020-06-12 22:50 ` Josef Wolf @ 2020-06-13 0:05 ` Keith Packard 2020-06-13 8:27 ` Josef Wolf 2020-06-13 10:31 ` Jeffrey Walton 1 sibling, 1 reply; 14+ messages in thread From: Keith Packard @ 2020-06-13 0:05 UTC (permalink / raw) To: Josef Wolf, newlib [-- Attachment #1: Type: text/plain, Size: 1522 bytes --] Josef Wolf <jw@raven.inka.de> writes: > This patch replaces Balloc() by eBalloc() which calls __assert_func() on > failure which in turn calls exit() and/or abort() As an interim step, having a fix available sooner was a good thing. > Shouldn't those library functions simply report the error by returning NULL to > the caller and let the application handle the situation? It's complicated by the libc API which doesn't have defined error return values for all of the affected APIs. Glibc has similar failure modes; using those is probably the best option for newlib. I've patched this code in picolibc (a newlib fork designed for embedded systems) to track the malloc failures and return status information back to applications. Interested parties might want to port those fixes back to newlib. https://github.com/keith-packard/picolibc/commit/dad1bd20fb927fb14a63d7951b0441f1cec73bf8 https://github.com/keith-packard/picolibc/commit/9fc3006ef9fec4a72b6dd29d2b691c1cc386e8f1 I've also implemented the recent 'ryu' float/string conversion code, which performs exact (i.e. round-trip-able) conversions with much lower precision (128bit for 64-bit floats, 64-bit for 32-bit floats). As a result, that code does all of its operations on the stack and doesn't use malloc at all, eliminating this particular source of errors. That code was not writen for the newlib stdio bits, but instead for the picolibc tinystdio bits. Still, it might serve as a basis for a newlib port. -- -keith [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: undefined references since newlib-3.2.0 2020-06-13 0:05 ` Keith Packard @ 2020-06-13 8:27 ` Josef Wolf 2020-06-13 17:20 ` Keith Packard 0 siblings, 1 reply; 14+ messages in thread From: Josef Wolf @ 2020-06-13 8:27 UTC (permalink / raw) To: newlib Thanks for your reply, Keith! On Fri, Jun 12, 2020 at 05:05:15PM -0700, Keith Packard wrote: > Josef Wolf <jw@raven.inka.de> writes: > > > This patch replaces Balloc() by eBalloc() which calls __assert_func() on > > failure which in turn calls exit() and/or abort() > > As an interim step, having a fix available sooner was a good thing. I see. But at a first glance, returning NULL seems to be not much more work. > > Shouldn't those library functions simply report the error by returning NULL to > > the caller and let the application handle the situation? > > It's complicated by the libc API which doesn't have defined error return > values for all of the affected APIs. I don't really understand this. The classical way to return error from functions returning a pointer is to return NULL. One has to make sure not to leak, of course. Which return values would be needed beyond NULL? -- Josef Wolf jw@raven.inka.de ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: undefined references since newlib-3.2.0 2020-06-13 8:27 ` Josef Wolf @ 2020-06-13 17:20 ` Keith Packard 2020-06-14 14:50 ` Josef Wolf 2020-06-17 11:36 ` Josef Wolf 0 siblings, 2 replies; 14+ messages in thread From: Keith Packard @ 2020-06-13 17:20 UTC (permalink / raw) To: Josef Wolf, newlib [-- Attachment #1: Type: text/plain, Size: 1837 bytes --] Josef Wolf <jw@raven.inka.de> writes: > I don't really understand this. The classical way to return error from > functions returning a pointer is to return NULL. One has to make sure not to > leak, of course. Which return values would be needed beyond NULL? The issue is that you need to reflect the allocation error all the way up to the initial libc entry point, and you need to use a defined error return value at that point, preferably one which indicates that the library had an internal allocation failure. Here are some affected libc functions which don't return a pointer and which have no error indication of any kind: atof atoff These are defined to just return strtod/strtof, so they continue to inherit whatever those functions do. Which is now: These have errnos defined, but those do not include ENOMEM: strtod strtof strtold wcstod wcstold strtodg These now return infinity and set errno to ERANGE on allocation failure. (not ideal, but the options are limited) Here are some which do return a pointer, but do not document any errors: ecvt fcvt gcvt ecvtbuf fcvtbuf gcvtbuf These now can return NULL on allocation failure. They do not set errno. And here's a list of functions which I feel reasonable applications should not expect an allocation error from: sprintf snprintf sscanf These return EOF on allocation failure. I wasn't too concerned with the exact semantics as there are no good answers when libc uses malloc in these functions, so once I had eliminated the internal aborts, I moved on to adding the ryu conversion code to tinystdio to eliminate allocation in that path entirely. -- -keith [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: undefined references since newlib-3.2.0 2020-06-13 17:20 ` Keith Packard @ 2020-06-14 14:50 ` Josef Wolf 2020-06-14 17:02 ` Keith Packard 2020-06-17 11:36 ` Josef Wolf 1 sibling, 1 reply; 14+ messages in thread From: Josef Wolf @ 2020-06-14 14:50 UTC (permalink / raw) To: newlib On Sat, Jun 13, 2020 at 10:20:54AM -0700, Keith Packard wrote: > Here are some affected libc functions which don't return a pointer and > which have no error indication of any kind: > > atof > atoff > [ ... ] > strtod > strtof > strtold > wcstod > wcstold > strtodg Uh! Why on earth would those functions need to allocate memory? > These now return infinity and set errno to ERANGE on allocation > failure. (not ideal, but the options are limited) > > Here are some which do return a pointer, but do not document any errors: > > ecvt > fcvt Maybe the documentation can be fixed? > gcvt > ecvtbuf > fcvtbuf > gcvtbuf Those get a pointer passed. No need to allocate memory. > And here's a list of functions which I feel reasonable applications > should not expect an allocation error from: I don't think any application should expect those functions to call exit() and/or abort() either. > sprintf > snprintf Those should return -1 on failure. > sscanf For this, ENOMEM is documented. -- Josef Wolf jw@raven.inka.de ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: undefined references since newlib-3.2.0 2020-06-14 14:50 ` Josef Wolf @ 2020-06-14 17:02 ` Keith Packard 2020-06-14 17:40 ` Brian Inglis 2020-06-15 4:53 ` Dimitrios Glynos 0 siblings, 2 replies; 14+ messages in thread From: Keith Packard @ 2020-06-14 17:02 UTC (permalink / raw) To: Josef Wolf, newlib [-- Attachment #1: Type: text/plain, Size: 5229 bytes --] Josef Wolf <jw@raven.inka.de> writes: >> atof >> atoff >> [ ... ] >> strtod >> strtof >> strtold >> wcstod >> wcstold >> strtodg > > Uh! Why on earth would those functions need to allocate memory? Because they are performing string to float conversions using code written in 1991 by David Gay, based on research done by Will Clinger which shows that exact conversion from arbitrary strings of decimal digits to fixed precision binary requires arbitrary precision arithmetic. https://dl.acm.org/doi/10.1145/93548.93557 >> These now return infinity and set errno to ERANGE on allocation >> failure. (not ideal, but the options are limited) >> >> Here are some which do return a pointer, but do not document any errors: >> >> ecvt >> fcvt > > Maybe the documentation can be fixed? The documentation is based on a standard, and fixing that standard involves a bit of process... >> gcvt >> ecvtbuf >> fcvtbuf >> gcvtbuf > > Those get a pointer passed. No need to allocate memory. These functions are using code also written by David Gay to perform float to string conversion, based on research done by Guy Steele and Jon White in how to print floating point numbers accurately (which happened to be presented at the same conference as the work above!). In this work, they showed that exact conversion could be done using 1050 bit arithmetic to generate a 64-bit double result: https://dl.acm.org/doi/10.1145/93548.93559 David Gay's code in newlib for both directions uses arbitrary precision arithmetic code found in newlib/libc/stdlib/mprec.c. This code allocates variable sized arrays of integers on the heap to hold all of the values. Before the eBalloc patch, none of these allocations were checked, leading to a rather long list of CVEs as the code could end up storing through a NULL pointer, which can cause security problems on some architectures. >> And here's a list of functions which I feel reasonable applications >> should not expect an allocation error from: > > I don't think any application should expect those functions to call exit() > and/or abort() either. I'm in complete agreement here. It's better to return an error that an application *might* check than to not give it any chance to recover at all. >> sprintf >> snprintf > > Those should return -1 on failure. > >> sscanf > > For this, ENOMEM is documented. Yes, but as I suggested, applications probably aren't expecting a call to sscanf to return EOF and set errno to ENOMEM. The real answer to your concerns is to replace the old arbitrary precision based float/string conversion code with code that uses results from new research by Ulf Adams. That research improves on Steel & White by reducing the precision required for exact 64-bit float to string conversion from 1050 bits to 128 bits. Adams also presents an algorithm using a similar technique to perform (a slightly weaker form of) exact string to float conversion in the same precision: https://dl.acm.org/citation.cfm?doid=3296979.3192369 This reasonably small fixed precision can be statically allocated in memory, or allocated on the stack. Either of these solutions eliminates the use of the dynamic heap through malloc, and eliminates the need to change the specification of all of these functions to account for the heap usage in the existing newlib code. Ulf Adams also published code to implement this algorithm on github: https://github.com/ulfjack/ryu I've ported this code to picolibc, a fork of newlib designed for embedded systems. That library has an alternate stdio implementation that doesn't need to use malloc, and it made sense to add this malloc-free float/string conversion code to that (the previous float/string conversion code in this implementation was not exact). When compiled using that code, picolibc will not return errors from malloc failures in the above cases because it does not call malloc in those code paths. The picolibc source repository also includes the stdio code from newlib which can be used in place of the default picolibc stdio code by setting a build option. That code has been modified to catch allocation failures and return the failures above. I did that in case someone wanted to use the original stdio code as I felt even this non-default code should not expose applications to arbitrary calls to abort from inside the library. I believe this code should be ported back to newlib so that at least newlib wouldn't call abort. Even better would be to have someone take a look at the Ryu paper and code and make that work in newlib. (The definition of 'exact' used in Ulf Adams work offers the guarantee that you can print any floating point value, and then re-read that string to exactly reproduce the original floating point value in memory. This is weaker than what Clinger's research used; in that work, the goal was to generate the floating point value closest to an arbitrary string of decimal digits.) -- -keith [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: undefined references since newlib-3.2.0 2020-06-14 17:02 ` Keith Packard @ 2020-06-14 17:40 ` Brian Inglis 2020-06-14 20:13 ` Keith Packard 2020-06-15 4:53 ` Dimitrios Glynos 1 sibling, 1 reply; 14+ messages in thread From: Brian Inglis @ 2020-06-14 17:40 UTC (permalink / raw) To: newlib On 2020-06-14 11:02, Keith Packard via Newlib wrote: > Josef Wolf <jw@raven.inka.de> writes: >>> atof >>> atoff >>> [ ... ] >>> strtod >>> strtof >>> strtold >>> wcstod >>> wcstold >>> strtodg >> Uh! Why on earth would those functions need to allocate memory? > Because they are performing string to float conversions using code > written in 1991 by David Gay, based on research done by Will Clinger > which shows that exact conversion from arbitrary strings of decimal > digits to fixed precision binary requires arbitrary precision > arithmetic. > https://dl.acm.org/doi/10.1145/93548.93557 >>> These now return infinity and set errno to ERANGE on allocation >>> failure. (not ideal, but the options are limited) >>> Here are some which do return a pointer, but do not document any errors: >>> ecvt >>> fcvt >> Maybe the documentation can be fixed? > The documentation is based on a standard, and fixing that standard > involves a bit of process... >>> gcvt >>> ecvtbuf >>> fcvtbuf >>> gcvtbuf >> Those get a pointer passed. No need to allocate memory. > These functions are using code also written by David Gay to perform > float to string conversion, based on research done by Guy Steele and Jon > White in how to print floating point numbers accurately (which happened > to be presented at the same conference as the work above!). In this > work, they showed that exact conversion could be done using 1050 bit > arithmetic to generate a 64-bit double result: > https://dl.acm.org/doi/10.1145/93548.93559 > David Gay's code in newlib for both directions uses arbitrary precision > arithmetic code found in newlib/libc/stdlib/mprec.c. This code allocates > variable sized arrays of integers on the heap to hold all of the values. > Before the eBalloc patch, none of these allocations were checked, > leading to a rather long list of CVEs as the code could end up storing > through a NULL pointer, which can cause security problems on some > architectures. >>> And here's a list of functions which I feel reasonable applications >>> should not expect an allocation error from: >> I don't think any application should expect those functions to call exit() >> and/or abort() either. > I'm in complete agreement here. It's better to return an error that an > application *might* check than to not give it any chance to recover at > all. >>> sprintf >>> snprintf >> Those should return -1 on failure. >>> sscanf >> For this, ENOMEM is documented. > Yes, but as I suggested, applications probably aren't expecting a call > to sscanf to return EOF and set errno to ENOMEM. > The real answer to your concerns is to replace the old arbitrary > precision based float/string conversion code with code that uses results > from new research by Ulf Adams. > That research improves on Steel & White by reducing the precision > required for exact 64-bit float to string conversion from 1050 bits to > 128 bits. Adams also presents an algorithm using a similar technique to > perform (a slightly weaker form of) exact string to float conversion in > the same precision: > https://dl.acm.org/citation.cfm?doid=3296979.3192369 Paper is available: https://www.researchgate.net/publication/329410883_Ryu_fast_float-to-string_conversion/link/5c073a3292851c6ca1ff1bdb/download > This reasonably small fixed precision can be statically allocated in > memory, or allocated on the stack. Either of these solutions eliminates > the use of the dynamic heap through malloc, and eliminates the need to > change the specification of all of these functions to account for the > heap usage in the existing newlib code. > Ulf Adams also published code to implement this algorithm on github: > https://github.com/ulfjack/ryu > I've ported this code to picolibc, a fork of newlib designed for > embedded systems. That library has an alternate stdio implementation > that doesn't need to use malloc, and it made sense to add this > malloc-free float/string conversion code to that (the previous > float/string conversion code in this implementation was not exact). When > compiled using that code, picolibc will not return errors from malloc > failures in the above cases because it does not call malloc in those > code paths. > The picolibc source repository also includes the stdio code from newlib > which can be used in place of the default picolibc stdio code by setting > a build option. That code has been modified to catch allocation > failures and return the failures above. I did that in case someone > wanted to use the original stdio code as I felt even this non-default > code should not expose applications to arbitrary calls to abort from > inside the library. I believe this code should be ported back to newlib > so that at least newlib wouldn't call abort. Even better would be to > have someone take a look at the Ryu paper and code and make that work in > newlib. > (The definition of 'exact' used in Ulf Adams work offers the guarantee > that you can print any floating point value, and then re-read that > string to exactly reproduce the original floating point value in > memory. This is weaker than what Clinger's research used; in that work, > the goal was to generate the floating point value closest to an > arbitrary string of decimal digits.) What are the accuracy tradeoffs, if any, vs memory vs time? Should both approaches be retained for flexibility and reproducibility? Hopefully any new approach added will be implemented as reentrant functions, using externally supplied memory if required, and some interfaces may provide that memory statically allocated. -- Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada This email may be disturbing to some readers as it contains too much technical detail. Reader discretion is advised. [Data in IEC units and prefixes, physical quantities in SI.] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: undefined references since newlib-3.2.0 2020-06-14 17:40 ` Brian Inglis @ 2020-06-14 20:13 ` Keith Packard 0 siblings, 0 replies; 14+ messages in thread From: Keith Packard @ 2020-06-14 20:13 UTC (permalink / raw) To: Brian Inglis, newlib [-- Attachment #1: Type: text/plain, Size: 4224 bytes --] Brian Inglis <Brian.Inglis@SystematicSw.ab.ca> writes: > What are the accuracy tradeoffs, if any, vs memory vs time? The 'exact' code offers a 'round trip' guarantee if you use at least 9 digits for 32-bit floats and 17 digits for 64-bit floats. The inexact code performs all of the exponent scaling operations using floating point values; the more scaling you do, the less precise the result. For memory, neither use significant RAM as there aren't that many intermediate values. Here's a couple of complete application size comparisons (this example is taken from https://github.com/keith-packard/picolibc/blob/main/doc/printf.md ): #include <stdio.h> #include <stdlib.h> void main(void) { printf(" 2⁶¹ = %lld π ≃ %.17g\n", 1ll << 61, printf_float(3.141592653589793)); } Here's a comparison between the available options, all built from picolibc head as of today. These are all compiled for RISC-V rv32imafdc (32-bit with hardware float and double to avoid linking in all of the soft float code) as I can emulate that using qemu for testing. Picolibc's replacement stdio code has a hack for supporting printf of 32-bit floats, which is useful on smaller systems to reduce the size of the code, which I show here with the 32-bit measurements. text data bss dec hex filename Description 9078 16 8 9102 238e printf-ryu.elf Ryu 64-bit 6350 16 8 6374 18e6 printf-ryu-float.elf Ryu 32-bit 7364 16 8 7388 1cdc printf-hard.elf Old 64-bit 6428 16 8 6452 1934 printf-hard-float.elf Old 32-bit 2314 16 8 2338 922 printf-ryu-int.elf No floats For more comparison, here's the size of the newlib stdio. Note that this does not include 1700 bytes of heap allocated after the program starts: text data bss dec hex filename 22702 760 868 24330 5f0a printf-newlib.elf I haven't benchmarked performance of any of these implementations; Ulf Adams (the author of the Ryu paper and code) claims significant performance benefits over glibc and musl. I would wager that it's significantly faster than the newlib code though. For a machine without hardware float support, I'd be hard pressed to guess which is faster; the Ryu code is all done in integer arithmetic, and so avoids the cost of software floating point, even though it's doing a lot more visible computation. > Should both approaches be retained for flexibility and > reproducibility? Picolibc retains all of these options in the source code, so you are free to select the option which is best for your environment. The Debian packages of picolibc binary versions for RISC-V, ARM and ESP8266 all use the default options, which use the exact (Ryu) option in the replacement stdio code. Configure the build with '-Dio-float-exact=false' to use the inexact versions, compile with '-Dtinystdio=false -Dnewlib-io-float=true -Dio-long-long=true' to use the newlib stdio code with float and long long support (which is always included in the replacement stdio code). The tests adapt to the inexact mode by reducing the required precision, which is why picolibc can still pass in that mode. In exact mode, the tests require exact results. CI tests run on every push show that picolibc continues to pass this requirement: https://github.com/keith-packard/picolibc/actions > Hopefully any new approach added will be implemented as reentrant functions, > using externally supplied memory if required, and some interfaces may provide > that memory statically allocated. Yes, the new functions use only constant data with all read-write values being local variables allocated in registers or on the stack. The picolibc stdio code doesn't have any state in the FILE structure when performing output, which makes these functions entirely re-entrant. You can see from the above that picolibc doesn't use a lot of RAM for these operations; that 24 bytes is all that is required, aside from space on the stack. -- -keith [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: undefined references since newlib-3.2.0 2020-06-14 17:02 ` Keith Packard 2020-06-14 17:40 ` Brian Inglis @ 2020-06-15 4:53 ` Dimitrios Glynos 2020-06-16 3:43 ` Keith Packard 1 sibling, 1 reply; 14+ messages in thread From: Dimitrios Glynos @ 2020-06-15 4:53 UTC (permalink / raw) To: newlib Hello, I'm responsible for the CVE's that triggered these patches (see [1]) and would just like to mention on-list that during the initial (private) discussion of these issues with both Jeff Johnston of newlib and Keith Packard of newlib-nano/picolibc I had provided the following recommendation: a) when the standard allows so, have the API return the NULL pointer to the caller with ENOMEM, so that the caller knows there's an out of memory (OOM) issue. b) when the standard has no provision for ENOMEM, it would be good practice to introduce a function that would be called under OOM conditions. OOM conditions are special, especially for embedded and bare metal projects, as: i) a project might not be able to simply call abort(), i.e. other important actions must be performed (e.g. communicate the error to the user or over some interface) ii) a project might want to take special actions so that it survives this condition (e.g. relinquish some resources, unwind to some other state) For these reasons, I had proposed for case (b) the introduction of a function that would be implemented by the developer, and would handle non-reported out-of-memory conditions. Developers already supply implementations for system calls on bare bones devices, and this could be a similar concept to that. There are two ways of implementing this, one at compile-time (e.g. as you define sbrk today) and another at runtime (similar to an atexit() registration). However I feel that the compile-time introduction of the function elaborates more clearly the message "that this is something that you must take care of when using this library". I don't recommend masquerading ENOMEM as some other error, as many developers may miss that this is a point in the code they should be covering points (i) and/or (ii). CVE-wise newlib with the right compile-time configuration covers the specific issues presented in the advisory, albeit with the caveats you have all witnessed. It is wise to also inspect if the callers of the reported functions handle NULL returns correctly as in picolibc there were a number of callers that required a patch. Finally, it would be beneficial for all if both projects (picolibc and newlib) filed a comment to the standards group stating the cases where an ENOMEM was found useful but was not covered by the standard. Hope this helps. Best regards, Dimitris [1] https://census-labs.com/news/2020/01/31/multiple-null-pointer-dereference-vulnerabilities-in-newlib/ On 14/6/20 8:02 μ.μ., Keith Packard via Newlib wrote: > Josef Wolf <jw@raven.inka.de> writes: > >>> atof >>> atoff >>> [ ... ] >>> strtod >>> strtof >>> strtold >>> wcstod >>> wcstold >>> strtodg >> >> Uh! Why on earth would those functions need to allocate memory? > > Because they are performing string to float conversions using code > written in 1991 by David Gay, based on research done by Will Clinger > which shows that exact conversion from arbitrary strings of decimal > digits to fixed precision binary requires arbitrary precision > arithmetic. > > https://dl.acm.org/doi/10.1145/93548.93557 > >>> These now return infinity and set errno to ERANGE on allocation >>> failure. (not ideal, but the options are limited) >>> >>> Here are some which do return a pointer, but do not document any errors: >>> >>> ecvt >>> fcvt >> >> Maybe the documentation can be fixed? > > The documentation is based on a standard, and fixing that standard > involves a bit of process... > >>> gcvt >>> ecvtbuf >>> fcvtbuf >>> gcvtbuf >> >> Those get a pointer passed. No need to allocate memory. > > These functions are using code also written by David Gay to perform > float to string conversion, based on research done by Guy Steele and Jon > White in how to print floating point numbers accurately (which happened > to be presented at the same conference as the work above!). In this > work, they showed that exact conversion could be done using 1050 bit > arithmetic to generate a 64-bit double result: > > https://dl.acm.org/doi/10.1145/93548.93559 > > David Gay's code in newlib for both directions uses arbitrary precision > arithmetic code found in newlib/libc/stdlib/mprec.c. This code allocates > variable sized arrays of integers on the heap to hold all of the values. > Before the eBalloc patch, none of these allocations were checked, > leading to a rather long list of CVEs as the code could end up storing > through a NULL pointer, which can cause security problems on some > architectures. > >>> And here's a list of functions which I feel reasonable applications >>> should not expect an allocation error from: >> >> I don't think any application should expect those functions to call exit() >> and/or abort() either. > > I'm in complete agreement here. It's better to return an error that an > application *might* check than to not give it any chance to recover at > all. > >>> sprintf >>> snprintf >> >> Those should return -1 on failure. >> >>> sscanf >> >> For this, ENOMEM is documented. > > Yes, but as I suggested, applications probably aren't expecting a call > to sscanf to return EOF and set errno to ENOMEM. > > The real answer to your concerns is to replace the old arbitrary > precision based float/string conversion code with code that uses results > from new research by Ulf Adams. > > That research improves on Steel & White by reducing the precision > required for exact 64-bit float to string conversion from 1050 bits to > 128 bits. Adams also presents an algorithm using a similar technique to > perform (a slightly weaker form of) exact string to float conversion in > the same precision: > > https://dl.acm.org/citation.cfm?doid=3296979.3192369 > > This reasonably small fixed precision can be statically allocated in > memory, or allocated on the stack. Either of these solutions eliminates > the use of the dynamic heap through malloc, and eliminates the need to > change the specification of all of these functions to account for the > heap usage in the existing newlib code. > > Ulf Adams also published code to implement this algorithm on github: > > https://github.com/ulfjack/ryu > > I've ported this code to picolibc, a fork of newlib designed for > embedded systems. That library has an alternate stdio implementation > that doesn't need to use malloc, and it made sense to add this > malloc-free float/string conversion code to that (the previous > float/string conversion code in this implementation was not exact). When > compiled using that code, picolibc will not return errors from malloc > failures in the above cases because it does not call malloc in those > code paths. > > The picolibc source repository also includes the stdio code from newlib > which can be used in place of the default picolibc stdio code by setting > a build option. That code has been modified to catch allocation > failures and return the failures above. I did that in case someone > wanted to use the original stdio code as I felt even this non-default > code should not expose applications to arbitrary calls to abort from > inside the library. I believe this code should be ported back to newlib > so that at least newlib wouldn't call abort. Even better would be to > have someone take a look at the Ryu paper and code and make that work in > newlib. > > (The definition of 'exact' used in Ulf Adams work offers the guarantee > that you can print any floating point value, and then re-read that > string to exactly reproduce the original floating point value in > memory. This is weaker than what Clinger's research used; in that work, > the goal was to generate the floating point value closest to an > arbitrary string of decimal digits.) > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: undefined references since newlib-3.2.0 2020-06-15 4:53 ` Dimitrios Glynos @ 2020-06-16 3:43 ` Keith Packard 0 siblings, 0 replies; 14+ messages in thread From: Keith Packard @ 2020-06-16 3:43 UTC (permalink / raw) To: Dimitrios Glynos, newlib [-- Attachment #1: Type: text/plain, Size: 1245 bytes --] Dimitrios Glynos <dimitris@census-labs.com> writes: > For these reasons, I had proposed for case (b) the introduction of a > function that would be implemented by the developer, and would handle > non-reported out-of-memory conditions. Developers already supply > implementations for system calls on bare bones devices, and this could > be a similar concept to that. I'd suggest that this new application-supplied function would be defined to return normally, and that the library would then return a "safe" value back to the application to provide a well-defined flow of control even in this case. > Finally, it would be beneficial for all if both projects > (picolibc and newlib) filed a comment to the standards group > stating the cases where an ENOMEM was found useful but was not > covered by the standard. Yeah, fortunately in the default configuration, picolibc doesn't have any cases like this at this point. In that configuration, picolibc will only call malloc for operations which already have well defined out-of-memory behavior (e.g. strdup). It's only if you switch to the original newlib stdio code that you hit these cases due to the use of the multi-precision math code in that path. -- -keith [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: undefined references since newlib-3.2.0 2020-06-13 17:20 ` Keith Packard 2020-06-14 14:50 ` Josef Wolf @ 2020-06-17 11:36 ` Josef Wolf 1 sibling, 0 replies; 14+ messages in thread From: Josef Wolf @ 2020-06-17 11:36 UTC (permalink / raw) To: newlib On Sat, Jun 13, 2020 at 10:20:54AM -0700, Keith Packard wrote: > strtod > strtof > strtold > wcstod While we're at it: the signatures for those functions descard the "const" qualifier from their first parameter by returning a non-qualified pointer to it. This also applies to strtol, strtoul and friends. Is this required by the standard? Or am I missing smoething important? -- Josef Wolf jw@raven.inka.de ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: undefined references since newlib-3.2.0 2020-06-12 22:50 ` Josef Wolf 2020-06-13 0:05 ` Keith Packard @ 2020-06-13 10:31 ` Jeffrey Walton 1 sibling, 0 replies; 14+ messages in thread From: Jeffrey Walton @ 2020-06-13 10:31 UTC (permalink / raw) To: newlib On Fri, Jun 12, 2020 at 6:51 PM Josef Wolf <jw@raven.inka.de> wrote: > > On Fri, Jun 12, 2020 at 10:21:11AM +0200, Josef Wolf wrote: > > > To be honest, I completely fail to understand how this commit > > https://sourceware.org/git?p=newlib-cygwin.git;a=commit;h=f88aece242178ff0c187d56e34a79645fbc44a23 > > can possibly cause those link failures. After all, > > f88aece242178ff0c187d56e34a79645fbc44a23 shows no traces of abort/exit/kill/getpid. > > This patch replaces Balloc() by eBalloc() which calls __assert_func() on > failure which in turn calls exit() and/or abort() Please don't call abort(3). My programs usually handle sensitive information. I need destructors to run to ensure sensitive information is wiped. This is a compliance item. I also need to ensure sensitive information does not escape from the application's security boundary. For example, the sensitive information should not be written to the file system by way of a core file; and it should not be egressed to an error reporting service for the platform provider or application developer to inspect. Jeff ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2020-06-17 11:40 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-06-09 8:00 undefined references since newlib-3.2.0 Wolf, Josef 2020-06-12 8:21 ` Josef Wolf 2020-06-12 22:50 ` Josef Wolf 2020-06-13 0:05 ` Keith Packard 2020-06-13 8:27 ` Josef Wolf 2020-06-13 17:20 ` Keith Packard 2020-06-14 14:50 ` Josef Wolf 2020-06-14 17:02 ` Keith Packard 2020-06-14 17:40 ` Brian Inglis 2020-06-14 20:13 ` Keith Packard 2020-06-15 4:53 ` Dimitrios Glynos 2020-06-16 3:43 ` Keith Packard 2020-06-17 11:36 ` Josef Wolf 2020-06-13 10:31 ` Jeffrey Walton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).