public inbox for newlib@sourceware.org
From: Brian Inglis <Brian.Inglis@SystematicSw.ab.ca>
To: newlib@sourceware.org
Subject: Re: undefined references since newlib-3.2.0
Date: Sun, 14 Jun 2020 11:40:53 -0600
Message-ID: <d4b62347-0388-56d8-6f46-2f05c0a1661d@SystematicSw.ab.ca>
In-Reply-To: <87k109h96z.fsf@keithp.com>

On 2020-06-14 11:02, Keith Packard via Newlib wrote:
> Josef Wolf <jw@raven.inka.de> writes:
>>>         atof
>>>         atoff
>>> [ ... ]
>>>         strtod
>>>         strtof
>>>         strtold
>>>         wcstod
>>>         wcstold
>>>         strtodg

>> Uh! Why on earth would those functions need to allocate memory?

> Because they are performing string to float conversions using code
> written in 1991 by David Gay, based on research done by Will Clinger
> which shows that exact conversion from arbitrary strings of decimal
> digits to fixed precision binary requires arbitrary precision
> arithmetic.
>         https://dl.acm.org/doi/10.1145/93548.93557

>>> These now return infinity and set errno to ERANGE on allocation
>>> failure. (not ideal, but the options are limited)
>>> Here are some which do return a pointer, but do not document any errors:
>>>         ecvt
>>>         fcvt
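
For the strtod family listed above, a caller can at least notice the infinity-plus-ERANGE result instead of silently using it. The following is a minimal sketch, not newlib code: `parse_double` is a hypothetical helper name, and it treats overflow and the allocation failure described above the same way, since both are reported as an infinite result with errno set to ERANGE.

```c
#include <errno.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical helper (not part of newlib): parse a double and report
 * failure rather than handing +/-HUGE_VAL back to the caller.
 * Returns 0 on success, -1 on error. */
int parse_double(const char *s, double *out)
{
    char *end;
    double v;

    errno = 0;
    v = strtod(s, &end);

    if (end == s)
        return -1;              /* no characters were converted */

    /* Overflow, or (per this thread) an internal allocation failure:
     * both show up as an infinite result with errno == ERANGE. */
    if (errno == ERANGE && (v == HUGE_VAL || v == -HUGE_VAL))
        return -1;

    *out = v;
    return 0;
}
```

A caller would test the return value before using `*out`; the two failure causes cannot be told apart through this interface.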

>> Maybe the documentation can be fixed?

> The documentation is based on a standard, and fixing that standard
> involves a bit of process...

>>>         gcvt
>>>         ecvtbuf
>>>         fcvtbuf
>>>         gcvtbuf

>> Those get a pointer passed. No need to allocate memory.

> These functions are using code also written by David Gay to perform
> float to string conversion, based on research done by Guy Steele and Jon
> White in how to print floating point numbers accurately (which happened
> to be presented at the same conference as the work above!). In this
> work, they showed that exact conversion of a 64-bit double could be
> done using 1050-bit arithmetic:
>         https://dl.acm.org/doi/10.1145/93548.93559
> David Gay's code in newlib for both directions uses arbitrary precision
> arithmetic code found in newlib/libc/stdlib/mprec.c. This code allocates
> variable sized arrays of integers on the heap to hold all of the values.
> Before the eBalloc patch, none of these allocations were checked,
> leading to a rather long list of CVEs as the code could end up storing
> through a NULL pointer, which can cause security problems on some
> architectures.
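
To illustrate the difference between unchecked and checked allocation described above, here is a minimal sketch. It is not the mprec.c code: `struct bigint`, `bigint_alloc`, and `bigint_from_u64` are hypothetical names standing in for a heap-allocated variable-sized integer, and the point is only that every allocation result is tested and the failure is passed up to the caller.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a variable-sized big integer kept on the heap. */
struct bigint {
    size_t   nwords;
    uint32_t words[];           /* flexible array member */
};

/* Allocate a zeroed big integer with room for nwords 32-bit words.
 * Returns NULL on size overflow or allocation failure. */
struct bigint *bigint_alloc(size_t nwords)
{
    struct bigint *b;

    if (nwords > (SIZE_MAX - sizeof(struct bigint)) / sizeof(uint32_t))
        return NULL;            /* size computation would overflow */

    b = calloc(1, sizeof *b + nwords * sizeof b->words[0]);
    if (b == NULL)
        return NULL;            /* report failure, do not store through NULL */

    b->nwords = nwords;
    return b;
}

/* Build a big integer from a 64-bit value, propagating allocation failure. */
struct bigint *bigint_from_u64(uint64_t v)
{
    struct bigint *b = bigint_alloc(2);

    if (b == NULL)
        return NULL;
    b->words[0] = (uint32_t)v;          /* low word */
    b->words[1] = (uint32_t)(v >> 32);  /* high word */
    return b;
}
```

Every caller up the chain then needs its own way to report the failure, which is why the public functions discussed in this thread end up needing an error return.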

>>> And here's a list of functions which I feel reasonable applications
>>> should not expect an allocation error from:

>> I don't think any application should expect those functions to call exit()
>> and/or abort() either.

> I'm in complete agreement here. It's better to return an error that an
> application *might* check than to not give it any chance to recover at
> all.

>>>         sprintf
>>>         snprintf

>> Those should return -1 on failure.

>>>         sscanf

>> For this, ENOMEM is documented.
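
An application that wants to cope with this case has to look at both the return value and errno. A minimal sketch, assuming the behaviour described in this thread (sscanf returning EOF with errno set to ENOMEM); `scan_double` and its return codes are hypothetical:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical wrapper: returns 0 on success, -1 if the input did not
 * match or was empty, -2 if the library reported running out of memory. */
int scan_double(const char *s, double *out)
{
    int n;

    errno = 0;
    n = sscanf(s, "%lf", out);

    if (n == 1)
        return 0;               /* exactly one item converted */
    if (n == EOF && errno == ENOMEM)
        return -2;              /* allocation failure inside the library */
    return -1;                  /* matching failure or end of input */
}
```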

> Yes, but as I suggested, applications probably aren't expecting a call
> to sscanf to return EOF and set errno to ENOMEM.
> The real answer to your concerns is to replace the old arbitrary
> precision based float/string conversion code with code that uses results
> from new research by Ulf Adams.
> That research improves on Steele & White by reducing the precision
> required for exact 64-bit float to string conversion from 1050 bits to
> 128 bits. Adams also presents an algorithm using a similar technique to
> perform (a slightly weaker form of) exact string to float conversion in
> the same precision:
>         https://dl.acm.org/citation.cfm?doid=3296979.3192369

The paper is available at:
https://www.researchgate.net/publication/329410883_Ryu_fast_float-to-string_conversion/link/5c073a3292851c6ca1ff1bdb/download

> This reasonably small fixed precision can be statically allocated in
> memory, or allocated on the stack. Either of these solutions eliminates
> the use of the dynamic heap through malloc, and eliminates the need to
> change the specification of all of these functions to account for the
> heap usage in the existing newlib code.
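
To make the "fixed precision on the stack" point concrete, here is a minimal sketch of a 64x64 to 128-bit multiplication whose result lives in a small struct. It is not taken from the Ryu sources; `struct u128` and `mul_64x64` are hypothetical names, and the code only shows that a 128-bit intermediate needs no heap storage.

```c
#include <stdint.h>

/* Hypothetical 128-bit unsigned value held in two 64-bit halves. */
struct u128 {
    uint64_t hi;
    uint64_t lo;
};

/* Full 64x64 -> 128-bit product built from four 32x32 partial products.
 * All storage is in registers or on the stack; nothing is allocated. */
struct u128 mul_64x64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;  /* contributes to bits 0..63   */
    uint64_t p1 = a_lo * b_hi;  /* contributes to bits 32..95  */
    uint64_t p2 = a_hi * b_lo;  /* contributes to bits 32..95  */
    uint64_t p3 = a_hi * b_hi;  /* contributes to bits 64..127 */

    /* Sum of the three terms that land in bits 32..63; cannot overflow. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    struct u128 r;
    r.lo = (mid << 32) | (uint32_t)p0;
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}
```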
> Ulf Adams also published code to implement this algorithm on github:
>         https://github.com/ulfjack/ryu
> I've ported this code to picolibc, a fork of newlib designed for
> embedded systems. That library has an alternate stdio implementation
> that doesn't need to use malloc, and it made sense to add this
> malloc-free float/string conversion code to that (the previous
> float/string conversion code in this implementation was not exact). When
> compiled using that code, picolibc will not return errors from malloc
> failures in the above cases because it does not call malloc in those
> code paths.
> The picolibc source repository also includes the stdio code from newlib
> which can be used in place of the default picolibc stdio code by setting
> a build option. That code has been modified to catch allocation
> failures and return the failures above. I did that in case someone
> wanted to use the original stdio code as I felt even this non-default
> code should not expose applications to arbitrary calls to abort from
> inside the library. I believe this code should be ported back to newlib
> so that at least newlib wouldn't call abort. Even better would be to
> have someone take a look at the Ryu paper and code and make that work in
> newlib.
> (The definition of 'exact' used in Ulf Adams's work offers the guarantee
> that you can print any floating point value, and then re-read that
> string to exactly reproduce the original floating point value in
> memory. This is weaker than what Clinger's research used; in that work,
> the goal was to generate the floating point value closest to an
> arbitrary string of decimal digits.)
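
The round-trip guarantee described above can be checked for any particular value with a few lines of C. This is a minimal sketch using only standard library calls: it prints with 17 significant digits (enough for an IEEE 754 binary64 double, which is an assumption about the target) rather than with Ryu's shortest-output algorithm, and `roundtrips` is a hypothetical helper name.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if printing v and re-reading the string reproduces the same
 * bit pattern, 0 otherwise. Assumes double is IEEE 754 binary64. */
int roundtrips(double v)
{
    char buf[32];
    double back;
    int n = snprintf(buf, sizeof buf, "%.17g", v);

    if (n < 0 || (size_t)n >= sizeof buf)
        return 0;               /* formatting failed or was truncated */

    back = strtod(buf, NULL);

    /* Compare representations, so that -0.0 and 0.0 are distinguished. */
    return memcmp(&v, &back, sizeof v) == 0;
}
```

The result depends on the C library's conversions being correctly rounded in both directions, which is the property the thread is discussing.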

What are the tradeoffs, if any, among accuracy, memory use, and run time?
Should both approaches be retained for flexibility and reproducibility?

Hopefully any new approach added will be implemented as reentrant functions,
using externally supplied memory if required, and some interfaces may provide
that memory statically allocated.
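
As an illustration of the interface shape I have in mind, here is a minimal sketch. The names `fmt_double_r`, `fmt_double`, and `FMT_DOUBLE_BUFSZ` are hypothetical, and snprintf is used only as a placeholder for whatever malloc-free conversion routine would actually do the work.

```c
#include <stdio.h>

#define FMT_DOUBLE_BUFSZ 32     /* hypothetical: enough for "%.17g" output */

/* Reentrant form: the caller supplies the memory.
 * Returns buf on success, NULL if formatting failed or did not fit. */
char *fmt_double_r(double v, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%.17g", v);   /* placeholder conversion */

    if (n < 0 || (size_t)n >= len)
        return NULL;
    return buf;
}

/* Convenience form: same conversion with statically allocated memory.
 * Not reentrant; each call overwrites the previous result. */
char *fmt_double(double v)
{
    static char buf[FMT_DOUBLE_BUFSZ];

    return fmt_double_r(v, buf, sizeof buf);
}
```

The static wrapper costs a fixed number of bytes of data space, while the `_r` form leaves the choice of stack, static, or heap storage to the caller.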

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]

