Brian Inglis writes:

> What are the accuracy tradeoffs, if any, vs memory vs time?

The 'exact' code offers a 'round trip' guarantee if you use at least 9
significant digits for 32-bit floats and 17 significant digits for
64-bit floats: the printed value parses back to exactly the original
bits. The inexact code performs all of the exponent scaling operations
using floating point values; the more scaling you do, the less precise
the result. (Both effects are illustrated in the sketches further
down.) As for memory, neither uses significant RAM, as there aren't
that many intermediate values.

Here are a couple of complete application size comparisons (this
example is taken from
https://github.com/keith-packard/picolibc/blob/main/doc/printf.md ):

    #include <stdio.h>

    void main(void)
    {
        printf(" 2⁶¹ = %lld π ≃ %.17g\n",
               1ll << 61, printf_float(3.141592653589793));
    }

Here's a comparison between the available options, all built from
picolibc head as of today. These are all compiled for RISC-V
rv32imafdc (32-bit with hardware float and double, to avoid linking in
all of the soft float code), as I can emulate that using qemu for
testing. Picolibc's replacement stdio code has a hack for supporting
printf of 32-bit floats, which is useful for reducing code size on
smaller systems; that is what the 32-bit measurements below show.

       text    data     bss     dec     hex filename              Description
       9078      16       8    9102    238e printf-ryu.elf        Ryu 64-bit
       6350      16       8    6374    18e6 printf-ryu-float.elf  Ryu 32-bit
       7364      16       8    7388    1cdc printf-hard.elf       Old 64-bit
       6428      16       8    6452    1934 printf-hard-float.elf Old 32-bit
       2314      16       8    2338     922 printf-ryu-int.elf    No floats

For more comparison, here's the size of the newlib stdio. Note that
this does not include the 1700 bytes of heap allocated after the
program starts:

       text    data     bss     dec     hex filename
      22702     760     868   24330    5f0a printf-newlib.elf

I haven't benchmarked the performance of any of these implementations;
Ulf Adams (the author of the Ryu paper and code) claims significant
performance benefits over glibc and musl. I would wager that it's
significantly faster than the newlib code, though. For a machine
without hardware float support, I'd be hard pressed to guess which is
faster; the Ryu code is all done in integer arithmetic, and so avoids
the cost of software floating point, even though it's doing a lot more
visible computation.

> Should both approaches be retained for flexibility and
> reproducibility?

Picolibc retains all of these options in the source code, so you are
free to select whichever is best for your environment. The Debian
binary packages of picolibc for RISC-V, ARM and ESP8266 all use the
default options, which select the exact (Ryu) code in the replacement
stdio. Configure the build with '-Dio-float-exact=false' to use the
inexact versions, or configure with '-Dtinystdio=false
-Dnewlib-io-float=true -Dio-long-long=true' to use the newlib stdio
code with float and long long support (support that is always included
in the replacement stdio code).

The tests adapt to the inexact mode by reducing the required
precision, which is why picolibc can still pass in that mode. In exact
mode, the tests require exact results. CI tests run on every push show
that picolibc continues to meet this requirement:

    https://github.com/keith-packard/picolibc/actions

> Hopefully any new approach added will be implemented as reentrant functions,
> using externally supplied memory if required, and some interfaces may provide
> that memory statically allocated.

Yes, the new functions use only constant data, with all read-write
values being local variables allocated in registers or on the stack.
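Coming back to the accuracy question: to make the 'round trip'
guarantee concrete, here's a minimal stand-alone sketch (my own
illustration, not picolibc code) that prints a double with 17
significant digits, parses it back with strtod, and compares the bits:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        double original = 3.141592653589793;
        double parsed;
        char buf[32];

        /* 17 significant digits are enough for any 64-bit float */
        snprintf(buf, sizeof(buf), "%.17g", original);
        parsed = strtod(buf, NULL);

        /* compare the representations bit-for-bit rather than
           with ==, so the check is exact */
        if (memcmp(&original, &parsed, sizeof(double)) == 0)
            printf("round trip ok: %s\n", buf);
        else
            printf("round trip FAILED: %s\n", buf);
        return 0;
    }

With the exact conversion code (and a correctly rounded strtod on the
way back), this succeeds for every finite double; with the inexact
code, some values come back a few ULPs off.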
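And here's a similar sketch (again just an illustration, not the
actual picolibc scaling code) of why doing exponent scaling in
floating point loses precision: computing 10⁻⁷⁰ by dividing by 10.0
seventy times picks up one rounding error per step, so the result will
generally land a few ULPs away from the correctly rounded constant
1e-70:

    #include <stdio.h>

    int main(void)
    {
        double scaled = 1.0;
        int i;

        /* each division rounds, and the errors accumulate */
        for (i = 0; i < 70; i++)
            scaled /= 10.0;

        printf("scaled   = %.17g\n", scaled);
        printf("constant = %.17g\n", 1e-70);
        printf("equal    = %s\n", scaled == 1e-70 ? "yes" : "no");
        return 0;
    }

The inexact conversion code presumably scales in larger steps rather
than one digit at a time, but the drift mechanism is the same, and
that's the imprecision the tests have to tolerate in that mode.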
Returning to reentrancy: the picolibc stdio code doesn't keep any
state in the FILE structure when performing output, which makes these
functions entirely re-entrant. You can also see from the numbers above
that picolibc doesn't use much RAM for these operations; those 24
bytes (16 of data plus 8 of bss) are all that is required, aside from
space on the stack.

--
-keith