possible snprintf() regression in 3.3.2

public inbox for cygwin@cygwin.com
 help / color / mirror / Atom feed

* possible snprintf() regression in 3.3.2
@ 2021-11-17  0:37 Tony Cook
  2021-11-17  9:21 ` Takashi Yano
  0 siblings, 1 reply; 23+ messages in thread
From: Tony Cook @ 2021-11-17  0:37 UTC (permalink / raw)
  To: cygwin

This came up from regression testing perl.

Regression testing of perl @4a1b9dd524007193213d3919d6a331109608b90c
used (from uname):

 cygwin_nt-10.0 fv-az177-186 3.3.1(0.34153) 2021-10-28 20:52 x86_64 cygwin

this did not exhibit the problem.  See https://github.com/Perl/perl5/runs/4084168038?check_suite_focus=true

Testing of perl @a85e04e2281234a61c051f8f3ff63bed7381902c, the next
commit, which is purely a documentation change did exhibit the bug, used:

  cygwin_nt-10.0 fv-az177-290 3.3.2(0.34153) 2021-11-08 16:55 x86_64 cygwin

which did crash.  See https://github.com/Perl/perl5/runs/4159124596?check_suite_focus=true

snprintf() appears to be crashing internally to ldtoa_r(), without
cygwin-debuginfo the backtrace is:

Thread 1 "perl" received signal SIGSEGV, Segmentation fault.
0x00007ffd26b21548 in ntdll!RtlVirtualUnwind ()
   from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
(gdb) bt
#0  0x00007ffd26b21548 in ntdll!RtlVirtualUnwind ()
   from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
#1  0x00007ffd26b21040 in ntdll!RtlVirtualUnwind ()
   from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
#2  0x00007ffd26b20e7b in ntdll!RtlVirtualUnwind ()
   from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
#3  0x00007ffd26b413a8 in ntdll!RtlRaiseException ()
   from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
#4  0x00007ffd26b90bfe in ntdll!KiUserExceptionDispatcher ()
   from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
#5  0x00000001801fec7c in eiremain () from /usr/bin/cygwin1.dll
#6  0x0000000180200319 in _ldtoa_r () from /usr/bin/cygwin1.dll
#7  0x00000001801cfca9 in _svfprintf_r () from /usr/bin/cygwin1.dll
#8  0x00000001801bf327 in snprintf () from /usr/bin/cygwin1.dll
#9  0x000000018018eb0b in _sigfe () from /usr/bin/cygwin1.dll
#10 0x0000000052162647 in Perl_sv_vcatpvfn_flags (my_perl=0x80004a3e0,
    sv=0x800ca9e78, pat=0x523a3501 <regexp_core_intflags_names+4769> "%.9f",
    patlen=4, args=0xffffc550, svargs=0x0, sv_count=0, maybe_tainted=0x0,
    flags=0) at sv.c:13482
#11 0x000000005215e360 in Perl_sv_vsetpvfn (my_perl=0x80004a3e0,
    sv=0x800ca9e78, pat=0x523a3501 <regexp_core_intflags_names+4769> "%.9f",
    patlen=4, args=0xffffc550, svargs=0x0, sv_count=0, maybe_tainted=0x0)
    at sv.c:11271
#12 0x000000005215dde9 in Perl_sv_vsetpvf (my_perl=0x80004a3e0,
    sv=0x800ca9e78, pat=0x523a3501 <regexp_core_intflags_names+4769> "%.9f",
    args=0xffffc550) at sv.c:11101
#13 0x000000005215dd6a in Perl_sv_setpvf (my_perl=0x80004a3e0, sv=0x800ca9e78,
    pat=0x523a3501 <regexp_core_intflags_names+4769> "%.9f") at sv.c:11076
#14 0x000000005210aa74 in Perl_upg_version (my_perl=0x80004a3e0,
    ver=0x800cacb00, qv=false) at /home/tony/dev/perl/git/perl/vutil.c:700
#15 0x00000000520440a4 in XS_universal_version (my_perl=0x80004a3e0,
    cv=0x80004dfa0) at /home/tony/dev/perl/git/perl/vxs.inc:122
#16 0x0000000052142b10 in Perl_pp_entersub (my_perl=0x80004a3e0)
    at pp_hot.c:5361
#17 0x00000000521318e7 in Perl_runops_standard (my_perl=0x80004a3e0)
    at run.c:41
#18 0x00000000520376ff in S_run_body (my_perl=0x80004a3e0, oldscope=1)
    at perl.c:2715
#19 0x0000000052037214 in perl_run (my_perl=0x80004a3e0) at perl.c:2643
#20 0x000000010040117c in main (argc=4, argv=0xffffcc00, env=0x8000281a0)
    at perlmain.c:110

With cygwin-debuginfo installed the backtrace is:

Thread 1 "perl" received signal SIGSEGV, Segmentation fault.
0x00007ffd26b21548 in ntdll!RtlVirtualUnwind ()
   from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
(gdb) bt
#0  0x00007ffd26b21548 in ntdll!RtlVirtualUnwind ()
   from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
#1  0x00007ffd26b21040 in ntdll!RtlVirtualUnwind ()
   from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
#2  0x00007ffd26b20e7b in ntdll!RtlVirtualUnwind ()
   from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
#3  0x00007ffd26b413a8 in ntdll!RtlRaiseException ()
   from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
#4  0x00007ffd26b90bfe in ntdll!KiUserExceptionDispatcher ()
   from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
#5  0x00000001801fec7c in eiremain (den=0x1, num=0x0, ldp=0x5)
    at /usr/src/debug/cygwin-3.3.2-1/newlib/libc/stdlib/ldtoa.c:3736
#6  0x00000001802bb0b0 in etens () from /usr/bin/cygwin1.dll
#7  0x00000000ffffbac2 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

The stack appears to be badly corrupted (etens isn't a function).

Unfortunately I haven't been able to make a simple pure C reproducer
for this.

To reproduce:

Fetch and build perl (needs make, gcc):

   $ git clone https://github.com/Perl/perl5.git
   $ cd perl5
   $ ./Configure -des -Dusedevel -Doptimize=-O0\ -g
   # adjust -j6 to the desired parallel jobs
   $ make -j6 test-prep

reproducing the original failure:

  $ cd t ; gdb --args ./perl -I.. -MTestInit ../cpan/version/t/01base.t

a simpler test case, with some debugging:

  $ echo '$Foo::VERSION = 9.e99; eval { Foo->VERSION(9e99) }' >mytest.pl
  $ gdb --args ./perl mytest.pl
  ...
  Reading symbols from ./perl...
  (gdb) b vutil.c:700
  No source file named vutil.c.
  Make breakpoint pending on future shared library load? (y or [n]) y
  Breakpoint 1 (vutil.c:700) pending.
  (gdb) r
  Starting program: /home/tony/dev/perl/git/perl/perl mytest.pl
  [New Thread 16120.0x3a54]
  [New Thread 16120.0x3368]
  [New Thread 16120.0x134]
  [New Thread 16120.0x133c]

  Thread 1 "perl" hit Breakpoint 1, Perl_upg_version (my_perl=0x80004a3b0,
      ver=0x800090c98, qv=false) at /home/tony/dev/perl/git/perl/vutil.c:700
  700                     Perl_sv_setpvf(aTHX_ sv, "%.9" NVff, SvNVX(ver));
  (gdb) b sv.c:13482
  Breakpoint 2 at 0x521624f6: file sv.c, line 13482.
  (gdb) c
  Continuing.

  Thread 1 "perl" hit Breakpoint 2, Perl_sv_vcatpvfn_flags (my_perl=0x80004a3b0,
      sv=0x800090c80, pat=0x523a3501 <regexp_core_intflags_names+4769> "%.9f",
      patlen=4, args=0xffffc580, svargs=0x0, sv_count=0, maybe_tainted=0x0,
      flags=0) at sv.c:13482
  13482                   WITH_LC_NUMERIC_SET_TO_NEEDED_IN(in_lc_numeric,
  (gdb) p my_perl->Iefloatbuf
  $1 = 0x80009b540 ""
  (gdb) p my_perl->Iefloatsize
  $2 = 177
  (gdb) p ptr
  $3 = 0xffffc44a "%.9f"
  (gdb) p fv
  $4 = 8.99999999999999994886e+99
  (gdb) c
  Continuing.
  [New Thread 16120.0x3a98]
  [New Thread 16120.0x1c3c]

  Thread 1 "perl" received signal SIGSEGV, Segmentation fault.
  0x00007ffd26b21548 in ntdll!RtlVirtualUnwind ()
     from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll

The code at sv.c:13482 is 13482 is:

                WITH_LC_NUMERIC_SET_TO_NEEDED_IN(in_lc_numeric,
                    elen = ((intsize == 'q')
                            ? my_snprintf(PL_efloatbuf, PL_efloatsize, ptr, fv)
                            : my_snprintf(PL_efloatbuf, PL_efloatsize, ptr, (double)fv))
                );

Given the macro definitions and value of intsize at this point, this
becomes a call to snprintf:

  snprintf(my_perl->Iefloatbuf, my_perl->Iefloatsize, ptr, (double)fv)

I suspect

  https://cygwin.com/git/?p=newlib-cygwin.git;a=commit;h=4d90e5335914551862831de3e02f6c102b78435b

but I'm not really setup to build and debug cygwin.

Tony

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-17  0:37 possible snprintf() regression in 3.3.2 Tony Cook
@ 2021-11-17  9:21 ` Takashi Yano
  2021-11-17 12:27   ` Corinna Vinschen
  0 siblings, 1 reply; 23+ messages in thread
From: Takashi Yano @ 2021-11-17  9:21 UTC (permalink / raw)
  To: cygwin

On Wed, 17 Nov 2021 11:37:18 +1100
Tony Cook wrote:
> This came up from regression testing perl.
> 
> Regression testing of perl @4a1b9dd524007193213d3919d6a331109608b90c
> used (from uname):
> 
>  cygwin_nt-10.0 fv-az177-186 3.3.1(0.34153) 2021-10-28 20:52 x86_64 cygwin
> 
> this did not exhibit the problem.  See https://github.com/Perl/perl5/runs/4084168038?check_suite_focus=true
> 
> Testing of perl @a85e04e2281234a61c051f8f3ff63bed7381902c, the next
> commit, which is purely a documentation change did exhibit the bug, used:
> 
>   cygwin_nt-10.0 fv-az177-290 3.3.2(0.34153) 2021-11-08 16:55 x86_64 cygwin
> 
> which did crash.  See https://github.com/Perl/perl5/runs/4159124596?check_suite_focus=true
> 
> snprintf() appears to be crashing internally to ldtoa_r(), without
> cygwin-debuginfo the backtrace is:
> 
> Thread 1 "perl" received signal SIGSEGV, Segmentation fault.
> 0x00007ffd26b21548 in ntdll!RtlVirtualUnwind ()
>    from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
> (gdb) bt
> #0  0x00007ffd26b21548 in ntdll!RtlVirtualUnwind ()
>    from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
> #1  0x00007ffd26b21040 in ntdll!RtlVirtualUnwind ()
>    from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
> #2  0x00007ffd26b20e7b in ntdll!RtlVirtualUnwind ()
>    from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
> #3  0x00007ffd26b413a8 in ntdll!RtlRaiseException ()
>    from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
> #4  0x00007ffd26b90bfe in ntdll!KiUserExceptionDispatcher ()
>    from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
> #5  0x00000001801fec7c in eiremain () from /usr/bin/cygwin1.dll
> #6  0x0000000180200319 in _ldtoa_r () from /usr/bin/cygwin1.dll
> #7  0x00000001801cfca9 in _svfprintf_r () from /usr/bin/cygwin1.dll
> #8  0x00000001801bf327 in snprintf () from /usr/bin/cygwin1.dll
> #9  0x000000018018eb0b in _sigfe () from /usr/bin/cygwin1.dll
> #10 0x0000000052162647 in Perl_sv_vcatpvfn_flags (my_perl=0x80004a3e0,
>     sv=0x800ca9e78, pat=0x523a3501 <regexp_core_intflags_names+4769> "%.9f",
>     patlen=4, args=0xffffc550, svargs=0x0, sv_count=0, maybe_tainted=0x0,
>     flags=0) at sv.c:13482
> #11 0x000000005215e360 in Perl_sv_vsetpvfn (my_perl=0x80004a3e0,
>     sv=0x800ca9e78, pat=0x523a3501 <regexp_core_intflags_names+4769> "%.9f",
>     patlen=4, args=0xffffc550, svargs=0x0, sv_count=0, maybe_tainted=0x0)
>     at sv.c:11271
> #12 0x000000005215dde9 in Perl_sv_vsetpvf (my_perl=0x80004a3e0,
>     sv=0x800ca9e78, pat=0x523a3501 <regexp_core_intflags_names+4769> "%.9f",
>     args=0xffffc550) at sv.c:11101
> #13 0x000000005215dd6a in Perl_sv_setpvf (my_perl=0x80004a3e0, sv=0x800ca9e78,
>     pat=0x523a3501 <regexp_core_intflags_names+4769> "%.9f") at sv.c:11076
> #14 0x000000005210aa74 in Perl_upg_version (my_perl=0x80004a3e0,
>     ver=0x800cacb00, qv=false) at /home/tony/dev/perl/git/perl/vutil.c:700
> #15 0x00000000520440a4 in XS_universal_version (my_perl=0x80004a3e0,
>     cv=0x80004dfa0) at /home/tony/dev/perl/git/perl/vxs.inc:122
> #16 0x0000000052142b10 in Perl_pp_entersub (my_perl=0x80004a3e0)
>     at pp_hot.c:5361
> #17 0x00000000521318e7 in Perl_runops_standard (my_perl=0x80004a3e0)
>     at run.c:41
> #18 0x00000000520376ff in S_run_body (my_perl=0x80004a3e0, oldscope=1)
>     at perl.c:2715
> #19 0x0000000052037214 in perl_run (my_perl=0x80004a3e0) at perl.c:2643
> #20 0x000000010040117c in main (argc=4, argv=0xffffcc00, env=0x8000281a0)
>     at perlmain.c:110
> 
> With cygwin-debuginfo installed the backtrace is:
> 
> Thread 1 "perl" received signal SIGSEGV, Segmentation fault.
> 0x00007ffd26b21548 in ntdll!RtlVirtualUnwind ()
>    from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
> (gdb) bt
> #0  0x00007ffd26b21548 in ntdll!RtlVirtualUnwind ()
>    from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
> #1  0x00007ffd26b21040 in ntdll!RtlVirtualUnwind ()
>    from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
> #2  0x00007ffd26b20e7b in ntdll!RtlVirtualUnwind ()
>    from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
> #3  0x00007ffd26b413a8 in ntdll!RtlRaiseException ()
>    from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
> #4  0x00007ffd26b90bfe in ntdll!KiUserExceptionDispatcher ()
>    from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
> #5  0x00000001801fec7c in eiremain (den=0x1, num=0x0, ldp=0x5)
>     at /usr/src/debug/cygwin-3.3.2-1/newlib/libc/stdlib/ldtoa.c:3736
> #6  0x00000001802bb0b0 in etens () from /usr/bin/cygwin1.dll
> #7  0x00000000ffffbac2 in ?? ()
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> 
> The stack appears to be badly corrupted (etens isn't a function).

I found the caused by the commit:
commit 4d90e5335914551862831de3e02f6c102b78435b
Author: Corinna Vinschen <corinna@vinschen.de>
Date:   Thu Nov 4 11:30:44 2021 +0100

    ldtoa: fix dropping too many digits from output

    ldtoa cuts the number of digits it returns based on a computation of
    number of supported bits (144) divide by log10(2).  Not only is the
    integer approximation of log10(2) ~= 8/27 missing a digit here, it
    also fails to take really small double and long double values into
    account.

    Allow for the full potential precision of long double values.  At the
    same time, change the local string array allocation to request only as
    much bytes as necessary to support the caller-requested number of
    digits, to keep the stack size low on small targets.

    In the long run a better fix would be to switch to gdtoa, as the BSD
    variants, as well as Mingw64 do.

    Signed-off-by: Corinna Vinschen <corinna@vinschen.de>

Reverting this commit solves the problem.

Corinna, could you please have a look?

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-17  9:21 ` Takashi Yano
@ 2021-11-17 12:27   ` Corinna Vinschen
  2021-11-18  0:06     ` Tony Cook
  0 siblings, 1 reply; 23+ messages in thread
From: Corinna Vinschen @ 2021-11-17 12:27 UTC (permalink / raw)
  To: cygwin

On Nov 17 18:21, Takashi Yano via Cygwin wrote:
> On Wed, 17 Nov 2021 11:37:18 +1100
> Tony Cook wrote:
> > This came up from regression testing perl.
> > 
> > Regression testing of perl @4a1b9dd524007193213d3919d6a331109608b90c
> > used (from uname):
> > [...]
> I found the caused by the commit:
> commit 4d90e5335914551862831de3e02f6c102b78435b
> Author: Corinna Vinschen <corinna@vinschen.de>
> Date:   Thu Nov 4 11:30:44 2021 +0100
> 
>     ldtoa: fix dropping too many digits from output
> 
>     ldtoa cuts the number of digits it returns based on a computation of
>     number of supported bits (144) divide by log10(2).  Not only is the
>     integer approximation of log10(2) ~= 8/27 missing a digit here, it
>     also fails to take really small double and long double values into
>     account.
> 
>     Allow for the full potential precision of long double values.  At the
>     same time, change the local string array allocation to request only as
>     much bytes as necessary to support the caller-requested number of
>     digits, to keep the stack size low on small targets.
> 
>     In the long run a better fix would be to switch to gdtoa, as the BSD
>     variants, as well as Mingw64 do.
> 
>     Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
> 
> Reverting this commit solves the problem.
> 
> Corinna, could you please have a look?

I don't have a good solution.  The old ldtoa code is lacking, for
switching newlib to gdtoa I simply don't have the time.  On the newlib
list was a short discussion starting at
https://sourceware.org/pipermail/newlib/2021/018626.html but nothing
came out of it yet.

Patches gratefully accepted (except just reverting the above change).


Corinna

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-17 12:27   ` Corinna Vinschen
@ 2021-11-18  0:06     ` Tony Cook
  2021-11-18 11:35       ` Takashi Yano
  0 siblings, 1 reply; 23+ messages in thread
From: Tony Cook @ 2021-11-18  0:06 UTC (permalink / raw)
  To: cygwin

On Wed, Nov 17, 2021 at 01:27:55PM +0100, Corinna Vinschen via Cygwin wrote:
> On Nov 17 18:21, Takashi Yano via Cygwin wrote:
> > On Wed, 17 Nov 2021 11:37:18 +1100
> > Tony Cook wrote:
> > > This came up from regression testing perl.
> > > 
> > > Regression testing of perl @4a1b9dd524007193213d3919d6a331109608b90c
> > > used (from uname):
> > > [...]
> > I found the caused by the commit:
> > commit 4d90e5335914551862831de3e02f6c102b78435b
> > Author: Corinna Vinschen <corinna@vinschen.de>
> > Date:   Thu Nov 4 11:30:44 2021 +0100
> > 
> >     ldtoa: fix dropping too many digits from output
> > 
> >     ldtoa cuts the number of digits it returns based on a computation of
> >     number of supported bits (144) divide by log10(2).  Not only is the
> >     integer approximation of log10(2) ~= 8/27 missing a digit here, it
> >     also fails to take really small double and long double values into
> >     account.
> > 
> >     Allow for the full potential precision of long double values.  At the
> >     same time, change the local string array allocation to request only as
> >     much bytes as necessary to support the caller-requested number of
> >     digits, to keep the stack size low on small targets.
> > 
> >     In the long run a better fix would be to switch to gdtoa, as the BSD
> >     variants, as well as Mingw64 do.
> > 
> >     Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
> > 
> > Reverting this commit solves the problem.
> > 
> > Corinna, could you please have a look?
> 
> I don't have a good solution.  The old ldtoa code is lacking, for
> switching newlib to gdtoa I simply don't have the time.  On the newlib
> list was a short discussion starting at
> https://sourceware.org/pipermail/newlib/2021/018626.html but nothing
> came out of it yet.
> 
> Patches gratefully accepted (except just reverting the above change).

From what I can tell the problem has nothing to do with the extra
precision, but has to do with misusing ndigits for the buffer size
with a %f format string, leading to a buffer overflow.

At entry to _ldtoa_r() ndigits is 9, but for a %f format with a large
number the number of digits is more closely related to the magnitude
of the number, not ndigits.

With the input number (9e99) and the supplied format I'd expect 109
characters output, but outbuf is only:

   ndigits + MAX_EXP_DIGITS + 10 = 9 + 5 + 10 = 24

characters in length.

Tony

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-18  0:06     ` Tony Cook
@ 2021-11-18 11:35       ` Takashi Yano
  2021-11-18 13:19         ` Corinna Vinschen
  0 siblings, 1 reply; 23+ messages in thread
From: Takashi Yano @ 2021-11-18 11:35 UTC (permalink / raw)
  To: cygwin

On Thu, 18 Nov 2021 11:06:49 +1100
Tony Cook wrote:
> On Wed, Nov 17, 2021 at 01:27:55PM +0100, Corinna Vinschen via Cygwin wrote:
> > I don't have a good solution.  The old ldtoa code is lacking, for
> > switching newlib to gdtoa I simply don't have the time.  On the newlib
> > list was a short discussion starting at
> > https://sourceware.org/pipermail/newlib/2021/018626.html but nothing
> > came out of it yet.
> > 
> > Patches gratefully accepted (except just reverting the above change).
> 
> From what I can tell the problem has nothing to do with the extra
> precision, but has to do with misusing ndigits for the buffer size
> with a %f format string, leading to a buffer overflow.
> 
> At entry to _ldtoa_r() ndigits is 9, but for a %f format with a large
> number the number of digits is more closely related to the magnitude
> of the number, not ndigits.
> 
> With the input number (9e99) and the supplied format I'd expect 109
> characters output, but outbuf is only:
> 
>    ndigits + MAX_EXP_DIGITS + 10 = 9 + 5 + 10 = 24
> 
> characters in length.

Then, isn't the following the right thing?

diff --git a/newlib/libc/stdlib/ldtoa.c b/newlib/libc/stdlib/ldtoa.c
index 7da61457b..826a1b2ed 100644
--- a/newlib/libc/stdlib/ldtoa.c
+++ b/newlib/libc/stdlib/ldtoa.c
@@ -2794,6 +2794,7 @@ _ldtoa_r (struct _reent *ptr, long double d, int mode, int ndigits,
   LDPARMS rnd;
   LDPARMS *ldp = &rnd;
   char *outstr;
+  char outbuf[NDEC + MAX_EXP_DIGITS + 10];
   union uconv du;
   du.d = d;
 
@@ -2840,8 +2841,6 @@ _ldtoa_r (struct _reent *ptr, long double d, int mode, int ndigits,
   if (ndigits > NDEC)
     ndigits = NDEC;
 
-  char outbuf[ndigits + MAX_EXP_DIGITS + 10];
-
   etoasc (e, outbuf, ndigits, mode, ldp);
   s = outbuf;
   if (eisinf (e) || eisnan (e))


-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-18 11:35       ` Takashi Yano
@ 2021-11-18 13:19         ` Corinna Vinschen
  2021-11-18 14:11           ` Noel Grandin
  0 siblings, 1 reply; 23+ messages in thread
From: Corinna Vinschen @ 2021-11-18 13:19 UTC (permalink / raw)
  To: cygwin

On Nov 18 20:35, Takashi Yano via Cygwin wrote:
> On Thu, 18 Nov 2021 11:06:49 +1100
> Tony Cook wrote:
> > On Wed, Nov 17, 2021 at 01:27:55PM +0100, Corinna Vinschen via Cygwin wrote:
> > > I don't have a good solution.  The old ldtoa code is lacking, for
> > > switching newlib to gdtoa I simply don't have the time.  On the newlib
> > > list was a short discussion starting at
> > > https://sourceware.org/pipermail/newlib/2021/018626.html but nothing
> > > came out of it yet.
> > > 
> > > Patches gratefully accepted (except just reverting the above change).
> > 
> > From what I can tell the problem has nothing to do with the extra
> > precision, but has to do with misusing ndigits for the buffer size
> > with a %f format string, leading to a buffer overflow.
> > 
> > At entry to _ldtoa_r() ndigits is 9, but for a %f format with a large
> > number the number of digits is more closely related to the magnitude
> > of the number, not ndigits.
> > 
> > With the input number (9e99) and the supplied format I'd expect 109
> > characters output, but outbuf is only:
> > 
> >    ndigits + MAX_EXP_DIGITS + 10 = 9 + 5 + 10 = 24
> > 
> > characters in length.
> 
> Then, isn't the following the right thing?
> 
> diff --git a/newlib/libc/stdlib/ldtoa.c b/newlib/libc/stdlib/ldtoa.c
> index 7da61457b..826a1b2ed 100644
> --- a/newlib/libc/stdlib/ldtoa.c
> +++ b/newlib/libc/stdlib/ldtoa.c
> @@ -2794,6 +2794,7 @@ _ldtoa_r (struct _reent *ptr, long double d, int mode, int ndigits,
>    LDPARMS rnd;
>    LDPARMS *ldp = &rnd;
>    char *outstr;
> +  char outbuf[NDEC + MAX_EXP_DIGITS + 10];
>    union uconv du;
>    du.d = d;
>  
> @@ -2840,8 +2841,6 @@ _ldtoa_r (struct _reent *ptr, long double d, int mode, int ndigits,
>    if (ndigits > NDEC)
>      ndigits = NDEC;
>  
> -  char outbuf[ndigits + MAX_EXP_DIGITS + 10];
> -
>    etoasc (e, outbuf, ndigits, mode, ldp);
>    s = outbuf;
>    if (eisinf (e) || eisnan (e))

Ouch.

My patch raised NDEC from 43 to 1023 to allow aproximately the same
number of digits as glibc.  Newlib strives to support embedded targets
and bare metal.  Some of them are lucky if they have a stack size of 1K.
The outbuf buffer is created on the stack, so I used ndigits to save
stack space.

While that patch fixes the reported problem, it will make users of
smaller-than-Cygwin targets pretty unhappy.

A workaround would be to malloc outbuf instead.  Given that printf
doesn't work without malloc anyway, that might be a working workaround,
until somebody takes heart and provides newlib with a new ldtoa
solution.

Either way, this discussion should better take place on the newlib
mailing list, given the embedded stakeholders on this list, ideally
as reply to

https://sourceware.org/pipermail/newlib/2021/018626.html

If push comes to shove, we probably have to revert my patch for the time
being, I guess.


Thanks,
Corinna

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-18 13:19         ` Corinna Vinschen
@ 2021-11-18 14:11           ` Noel Grandin
  2021-11-18 14:27             ` Corinna Vinschen
  0 siblings, 1 reply; 23+ messages in thread
From: Noel Grandin @ 2021-11-18 14:11 UTC (permalink / raw)
  To: cygwin



On 2021/11/18 3:19 pm, Corinna Vinschen via Cygwin wrote:
> My patch raised NDEC from 43 to 1023 to allow aproximately the same
> number of digits as glibc.  Newlib strives to support embedded targets
> and bare metal.  Some of them are lucky if they have a stack size of 1K.
> The outbuf buffer is created on the stack, so I used ndigits to save
> stack space.
> 
> While that patch fixes the reported problem, it will make users of
> smaller-than-Cygwin targets pretty unhappy.
> 
> A workaround would be to malloc outbuf instead.  Given that printf

printf is often performance sensitive, and using malloc there would likely be significantly slower.

Possibly use alloca() to allocate only the necessary amount on stack?
Seems unlikely that embedded systems would be printing values that needed such large space anyway.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-18 14:11           ` Noel Grandin
@ 2021-11-18 14:27             ` Corinna Vinschen
  2021-11-18 21:08               ` Sam Edge
  0 siblings, 1 reply; 23+ messages in thread
From: Corinna Vinschen @ 2021-11-18 14:27 UTC (permalink / raw)
  To: cygwin

On Nov 18 16:11, Noel Grandin via Cygwin wrote:
> 
> 
> On 2021/11/18 3:19 pm, Corinna Vinschen via Cygwin wrote:
> > My patch raised NDEC from 43 to 1023 to allow aproximately the same
> > number of digits as glibc.  Newlib strives to support embedded targets
> > and bare metal.  Some of them are lucky if they have a stack size of 1K.
> > The outbuf buffer is created on the stack, so I used ndigits to save
> > stack space.
> > 
> > While that patch fixes the reported problem, it will make users of
> > smaller-than-Cygwin targets pretty unhappy.
> > 
> > A workaround would be to malloc outbuf instead.  Given that printf
> 
> printf is often performance sensitive, and using malloc there would likely be significantly slower.
> 
> Possibly use alloca() to allocate only the necessary amount on stack?

That's kind of what the current code does.

But that's apparently the problem.  The necessary amount is only known to
the current algorithm while populating outbuf already.  So before my
patch, outbuf had a constant size, but it was size restricted.

> Seems unlikely that embedded systems would be printing values that needed such large space anyway.

Perhaps that's a workaround:

Use a constant buffer size, but use NDEC = 1023 only on Cygwin for the
time being, something like NDEC = 64 otherwise...


Corinna

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-18 14:27             ` Corinna Vinschen
@ 2021-11-18 21:08               ` Sam Edge
  2021-11-21  0:16                 ` Tony Cook
  0 siblings, 1 reply; 23+ messages in thread
From: Sam Edge @ 2021-11-18 21:08 UTC (permalink / raw)
  To: cygwin

On 18/11/2021 14:27, Corinna Vinschen via Cygwin wrote:

> On Nov 18 16:11, Noel Grandin via Cygwin wrote:
>>
>> On 2021/11/18 3:19 pm, Corinna Vinschen via Cygwin wrote:
>>> My patch raised NDEC from 43 to 1023 to allow aproximately the same
>>> number of digits as glibc.  Newlib strives to support embedded targets
>>> and bare metal.  Some of them are lucky if they have a stack size of 1K.
>>> The outbuf buffer is created on the stack, so I used ndigits to save
>>> stack space.
>>>
>>> While that patch fixes the reported problem, it will make users of
>>> smaller-than-Cygwin targets pretty unhappy.
>>>
>>> A workaround would be to malloc outbuf instead.  Given that printf
>> printf is often performance sensitive, and using malloc there would likely be significantly slower.
>>
>> Possibly use alloca() to allocate only the necessary amount on stack?
> That's kind of what the current code does.
>
> But that's apparently the problem.  The necessary amount is only known to
> the current algorithm while populating outbuf already.  So before my
> patch, outbuf had a constant size, but it was size restricted.
>
>> Seems unlikely that embedded systems would be printing values that needed such large space anyway.
> Perhaps that's a workaround:
>
> Use a constant buffer size, but use NDEC = 1023 only on Cygwin for the
> time being, something like NDEC = 64 otherwise...
>
>
> Corinna


Hi all.

I use newlib on embedded with threading libs that have predetermined
fixed thread stack sizes. While we tend to have more RAM than in former
times we also have multiple thread stacks. Use of alloca() or variable
length automatic arrays makes me wince especially in code I might not be
able to avoid calling which is often the case with XXXprintf() in
third-party libraries' debug output. I'd usually rather take the
performance hit from using heap instead of having to make all my stacks
bigger.

Just my two penn'orth.

Cheers.
--
Sam


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-18 21:08               ` Sam Edge
@ 2021-11-21  0:16                 ` Tony Cook
  2021-11-22 10:34                   ` Corinna Vinschen
  0 siblings, 1 reply; 23+ messages in thread
From: Tony Cook @ 2021-11-21  0:16 UTC (permalink / raw)
  To: Sam Edge; +Cc: cygwin

On Thu, Nov 18, 2021 at 09:08:40PM +0000, Sam Edge via Cygwin wrote:
> I use newlib on embedded with threading libs that have predetermined
> fixed thread stack sizes. While we tend to have more RAM than in former
> times we also have multiple thread stacks. Use of alloca() or variable
> length automatic arrays makes me wince especially in code I might not be
> able to avoid calling which is often the case with XXXprintf() in
> third-party libraries' debug output. I'd usually rather take the
> performance hit from using heap instead of having to make all my stacks
> bigger.

A simple option would be to use an small auto fixed buffer for most
conversions, but use malloc() for %f formats for numbers greater in
magnitude than some limit, though it would also need to be adjusted
for the precision (ndigits here), since they take extra space.

This would avoid using the optional-to-implement VLA feature too.

Tony

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-21  0:16                 ` Tony Cook
@ 2021-11-22 10:34                   ` Corinna Vinschen
  2021-11-22 13:04                     ` Corinna Vinschen
  0 siblings, 1 reply; 23+ messages in thread
From: Corinna Vinschen @ 2021-11-22 10:34 UTC (permalink / raw)
  To: cygwin

On Nov 21 11:16, Tony Cook wrote:
> On Thu, Nov 18, 2021 at 09:08:40PM +0000, Sam Edge via Cygwin wrote:
> > I use newlib on embedded with threading libs that have predetermined
> > fixed thread stack sizes. While we tend to have more RAM than in former
> > times we also have multiple thread stacks. Use of alloca() or variable
> > length automatic arrays makes me wince especially in code I might not be
> > able to avoid calling which is often the case with XXXprintf() in
> > third-party libraries' debug output. I'd usually rather take the
> > performance hit from using heap instead of having to make all my stacks
> > bigger.
> 
> A simple option would be to use an small auto fixed buffer for most
> conversions, but use malloc() for %f formats for numbers greater in
> magnitude than some limit, though it would also need to be adjusted
> for the precision (ndigits here), since they take extra space.
> 
> This would avoid using the optional-to-implement VLA feature too.

Good idea.  I guess I create a simple fix doing just that.


Thanks,
Corinna

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-22 10:34                   ` Corinna Vinschen
@ 2021-11-22 13:04                     ` Corinna Vinschen
  2021-11-22 23:23                       ` Tony Cook
  0 siblings, 1 reply; 23+ messages in thread
From: Corinna Vinschen @ 2021-11-22 13:04 UTC (permalink / raw)
  To: cygwin

On Nov 22 11:34, Corinna Vinschen via Cygwin wrote:
> On Nov 21 11:16, Tony Cook wrote:
> > On Thu, Nov 18, 2021 at 09:08:40PM +0000, Sam Edge via Cygwin wrote:
> > > I use newlib on embedded with threading libs that have predetermined
> > > fixed thread stack sizes. While we tend to have more RAM than in former
> > > times we also have multiple thread stacks. Use of alloca() or variable
> > > length automatic arrays makes me wince especially in code I might not be
> > > able to avoid calling which is often the case with XXXprintf() in
> > > third-party libraries' debug output. I'd usually rather take the
> > > performance hit from using heap instead of having to make all my stacks
> > > bigger.
> > 
> > A simple option would be to use an small auto fixed buffer for most
> > conversions, but use malloc() for %f formats for numbers greater in
> > magnitude than some limit, though it would also need to be adjusted
> > for the precision (ndigits here), since they take extra space.
> > 
> > This would avoid using the optional-to-implement VLA feature too.
> 
> Good idea.  I guess I create a simple fix doing just that.

I created a patch:
https://sourceware.org/git/?p=newlib-cygwin.git;a=commitdiff;h=68faeef4be71

Please test the latest developer snapshot from http://cygwin.com/snapshots/


Thanks,
Corinna

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-22 13:04                     ` Corinna Vinschen
@ 2021-11-22 23:23                       ` Tony Cook
  2021-11-23  8:34                         ` Takashi Yano
  0 siblings, 1 reply; 23+ messages in thread
From: Tony Cook @ 2021-11-22 23:23 UTC (permalink / raw)
  To: cygwin

On Mon, Nov 22, 2021 at 02:04:06PM +0100, Corinna Vinschen via Cygwin wrote:
> On Nov 22 11:34, Corinna Vinschen via Cygwin wrote:
> > On Nov 21 11:16, Tony Cook wrote:
> > > On Thu, Nov 18, 2021 at 09:08:40PM +0000, Sam Edge via Cygwin wrote:
> > > > I use newlib on embedded with threading libs that have predetermined
> > > > fixed thread stack sizes. While we tend to have more RAM than in former
> > > > times we also have multiple thread stacks. Use of alloca() or variable
> > > > length automatic arrays makes me wince especially in code I might not be
> > > > able to avoid calling which is often the case with XXXprintf() in
> > > > third-party libraries' debug output. I'd usually rather take the
> > > > performance hit from using heap instead of having to make all my stacks
> > > > bigger.
> > > 
> > > A simple option would be to use an small auto fixed buffer for most
> > > conversions, but use malloc() for %f formats for numbers greater in
> > > magnitude than some limit, though it would also need to be adjusted
> > > for the precision (ndigits here), since they take extra space.
> > > 
> > > This would avoid using the optional-to-implement VLA feature too.
> > 
> > Good idea.  I guess I create a simple fix doing just that.
> 
> I created a patch:
> https://sourceware.org/git/?p=newlib-cygwin.git;a=commitdiff;h=68faeef4be71
> 
> Please test the latest developer snapshot from http://cygwin.com/snapshots/

I don't think this solves the fundamental problem.

Simply looking at ndigits isn't enough for %f.

For %f with a large number (like 9e99), the buffer size required is
ndigits plus (roughly) log10(n), which we can further estimate
with log2(n)*146/485 (log2(10) is 3.32 ~== 485/146)

I think something more like:

  size_t outsize;
  if (mode == 3) {        /* %f */
    int expon = (e[NI-1] & 0x7fff) - (EXONE - 1); /* exponent part of float */
    /* log2(10) approximately 485/146 */
    outsize = expon * 146 / 485 + ndigits + 10;
  }
  else { /* %g/%e */
    outsize = ndigits + MAX_EXP_DIGITS + 10;
  }
  if (outsize > NDEC_SML) {
    outbuf = (char *)_malloc_r(ptr, outsize);
  }

You'll probably need to pass outsize into etoasc() rather than
calculating it.

See https://github.com/Perl/perl5/blob/blead/sv.c#L13295 for code in
perl that calculates the buffer size needed for %f (precis aka ndigits
is added at line 13385).

Tony

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-22 23:23                       ` Tony Cook
@ 2021-11-23  8:34                         ` Takashi Yano
  2021-11-23  9:48                           ` Corinna Vinschen
  0 siblings, 1 reply; 23+ messages in thread
From: Takashi Yano @ 2021-11-23  8:34 UTC (permalink / raw)
  To: cygwin

On Tue, 23 Nov 2021 10:23:02 +1100
Tony Cook wrote:
> On Mon, Nov 22, 2021 at 02:04:06PM +0100, Corinna Vinschen via Cygwin wrote:
> > On Nov 22 11:34, Corinna Vinschen via Cygwin wrote:
> > > On Nov 21 11:16, Tony Cook wrote:
> > > > On Thu, Nov 18, 2021 at 09:08:40PM +0000, Sam Edge via Cygwin wrote:
> > > > > I use newlib on embedded with threading libs that have predetermined
> > > > > fixed thread stack sizes. While we tend to have more RAM than in former
> > > > > times we also have multiple thread stacks. Use of alloca() or variable
> > > > > length automatic arrays makes me wince especially in code I might not be
> > > > > able to avoid calling which is often the case with XXXprintf() in
> > > > > third-party libraries' debug output. I'd usually rather take the
> > > > > performance hit from using heap instead of having to make all my stacks
> > > > > bigger.
> > > > 
> > > > A simple option would be to use an small auto fixed buffer for most
> > > > conversions, but use malloc() for %f formats for numbers greater in
> > > > magnitude than some limit, though it would also need to be adjusted
> > > > for the precision (ndigits here), since they take extra space.
> > > > 
> > > > This would avoid using the optional-to-implement VLA feature too.
> > > 
> > > Good idea.  I guess I create a simple fix doing just that.
> > 
> > I created a patch:
> > https://sourceware.org/git/?p=newlib-cygwin.git;a=commitdiff;h=68faeef4be71
> > 
> > Please test the latest developer snapshot from http://cygwin.com/snapshots/
> 
> I don't think this solves the fundamental problem.
> 
> Simply looking at ndigits isn't enough for %f.
> 
> For %f with a large number (like 9e99), the buffer size required is
> ndigits plus (roughly) log10(n), which we can further estimate
> with log2(n)*146/485 (log2(10) is 3.32 ~== 485/146)
> 
> I think something more like:
> 
>   size_t outsize;
>   if (mode == 3) {        /* %f */
>     int expon = (e[NI-1] & 0x7fff) - (EXONE - 1); /* exponent part of float */
>     /* log2(10) approximately 485/146 */
>     outsize = expon * 146 / 485 + ndigits + 10;
>   }
>   else { /* %g/%e */
>     outsize = ndigits + MAX_EXP_DIGITS + 10;
>   }
>   if (outsize > NDEC_SML) {
>     outbuf = (char *)_malloc_r(ptr, outsize);
>   }
> 
> You'll probably need to pass outsize into etoasc() rather than
> calculating it.
> 
> See https://github.com/Perl/perl5/blob/blead/sv.c#L13295 for code in
> perl that calculates the buffer size needed for %f (precis aka ndigits
> is added at line 13385).

I guess Corinna thinks that 'ndigits' keeps the total number
of digits to be printed.

However, in reality, for example in the case:
snprintf(buf, sizeof(buf), "%.3f", 1234567890123456.789);
'ndigits' is only 3 even though total digits will be 20.

So, Tony thinks current code does not correct.

However, I think something is wrong with interpretation
of 'ndigits' in dltoa.c.

printf("%.3f\n", sqrt(2)*1e70);
printf("%.50f\n", sqrt(2)*1e70);

outputs

14142135623730951759073108307330633613786387000000000000000000000000000.000
14142135623730951759073108307330633613786386978891021459448717416650727.13402790000888758223149296720949629080194006476078

Is this as intended?

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-23  8:34                         ` Takashi Yano
@ 2021-11-23  9:48                           ` Corinna Vinschen
  2021-11-24  3:40                             ` Takashi Yano
  0 siblings, 1 reply; 23+ messages in thread
From: Corinna Vinschen @ 2021-11-23  9:48 UTC (permalink / raw)
  To: cygwin

On Nov 23 17:34, Takashi Yano via Cygwin wrote:
> On Tue, 23 Nov 2021 10:23:02 +1100
> Tony Cook wrote:
> > On Mon, Nov 22, 2021 at 02:04:06PM +0100, Corinna Vinschen via Cygwin wrote:
> > > On Nov 22 11:34, Corinna Vinschen via Cygwin wrote:
> > > > On Nov 21 11:16, Tony Cook wrote:
> > > > > A simple option would be to use an small auto fixed buffer for most
> > > > > conversions, but use malloc() for %f formats for numbers greater in
> > > > > magnitude than some limit, though it would also need to be adjusted
> > > > > for the precision (ndigits here), since they take extra space.
> > > > > 
> > > > > This would avoid using the optional-to-implement VLA feature too.
> > > > 
> > > > Good idea.  I guess I create a simple fix doing just that.
> > > 
> > > I created a patch:
> > > https://sourceware.org/git/?p=newlib-cygwin.git;a=commitdiff;h=68faeef4be71
> > I don't think this solves the fundamental problem.
> > 
> > Simply looking at ndigits isn't enough for %f.
> > 
> > For %f with a large number (like 9e99), the buffer size required is
> > ndigits plus (roughly) log10(n), which we can further estimate
> > with log2(n)*146/485 (log2(10) is 3.32 ~== 485/146)
> > 
> > I think something more like:
> > 
> >   size_t outsize;
> >   if (mode == 3) {        /* %f */
> >     int expon = (e[NI-1] & 0x7fff) - (EXONE - 1); /* exponent part of float */
> >     /* log2(10) approximately 485/146 */
> >     outsize = expon * 146 / 485 + ndigits + 10;
> >   }
> >   else { /* %g/%e */
> >     outsize = ndigits + MAX_EXP_DIGITS + 10;
> >   }
> >   if (outsize > NDEC_SML) {
> >     outbuf = (char *)_malloc_r(ptr, outsize);
> >   }
> > 
> > You'll probably need to pass outsize into etoasc() rather than
> > calculating it.
> > 
> > See https://github.com/Perl/perl5/blob/blead/sv.c#L13295 for code in
> > perl that calculates the buffer size needed for %f (precis aka ndigits
> > is added at line 13385).
> 
> I guess Corinna thinks that 'ndigits' keeps the total number
> of digits to be printed.

No, I don't.  It's the requested decimal precision.

However, the fun fact is that ldtoa in newlib is more than 20 years old,
with only minor changes in 2003.  My patches don't change the basic
mechanism of ldtoa.  I just don't have enough knowledge of floating
point arithmetic to do that.  My patches only try to raise the number of
*possible* digits by raising the matching macro and raising the size of
the single, local digit buffer accordingly.

If the above crashed, then probably because the buffer was too small.
That should be fixed now, because the second patch fixes the buffer size
and the computation based on the buffer size.  If that's not the
problem, then, in theory, the same would have occured with the old code.

If my patches are inadequate, we can revert the patches and then the
precision will be restricted to 42 digits again, as before, see the
thread https://sourceware.org/pipermail/newlib/2021/018626.html

For everything else, we either need somebody who knows how to change the
current ldtoa to "do the right thing", whatever that is, or somebody who
takes a stab at replacing ldtoa with another, better alternative.

> However, in reality, for example in the case:
> snprintf(buf, sizeof(buf), "%.3f", 1234567890123456.789);
> 'ndigits' is only 3 even though total digits will be 20.
> 
> So, Tony thinks current code does not correct.
> 
> However, I think something is wrong with interpretation
> of 'ndigits' in dltoa.c.
> 
> printf("%.3f\n", sqrt(2)*1e70);
> printf("%.50f\n", sqrt(2)*1e70);
> 
> outputs
> 
> 14142135623730951759073108307330633613786387000000000000000000000000000.000
> 14142135623730951759073108307330633613786386978891021459448717416650727.13402790000888758223149296720949629080194006476078
> 
> Is this as intended?

On Linux I see

14142135623730951759073108307330633613786387161811679011529922516615168.000
14142135623730951759073108307330633613786387161811679011529922516615168.00000000000000000000000000000000000000000000000000

The newlib output for .3f probably suffers from the fact that ldtoa
chooses the small buffer, which restricts the number of digits to 43 or
44.  But keep in mind that ldtoa *always* restricted the output to 42,
so you never got a more precise output anyway.  Every digit beyond digit
42 is only printed due to the bigger buffer sizes.

So, what newlib and, in extension, Cygwin really needs at this point are
patches to the existing ldtoa or a change to gdtoa or equivalent.

https://cygwin.com/acronyms/#SHTDI
https://cygwin.com/acronyms/#PTC


Corinna

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-23  9:48                           ` Corinna Vinschen
@ 2021-11-24  3:40                             ` Takashi Yano
  2021-11-24  8:48                               ` Corinna Vinschen
  2021-11-24  8:52                               ` Takashi Yano
  0 siblings, 2 replies; 23+ messages in thread
From: Takashi Yano @ 2021-11-24  3:40 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 1932 bytes --]

On Tue, 23 Nov 2021 10:48:21 +0100
Corinna Vinschen wrote:
> On Nov 23 17:34, Takashi Yano via Cygwin wrote:
> > However, in reality, for example in the case:
> > snprintf(buf, sizeof(buf), "%.3f", 1234567890123456.789);
> > 'ndigits' is only 3 even though total digits will be 20.
> > 
> > So, Tony thinks current code does not correct.
> > 
> > However, I think something is wrong with interpretation
> > of 'ndigits' in dltoa.c.
> > 
> > printf("%.3f\n", sqrt(2)*1e70);
> > printf("%.50f\n", sqrt(2)*1e70);
> > 
> > outputs
> > 
> > 14142135623730951759073108307330633613786387000000000000000000000000000.000
> > 14142135623730951759073108307330633613786386978891021459448717416650727.13402790000888758223149296720949629080194006476078
> > 
> > Is this as intended?
> 
> On Linux I see
> 
> 14142135623730951759073108307330633613786387161811679011529922516615168.000
> 14142135623730951759073108307330633613786387161811679011529922516615168.00000000000000000000000000000000000000000000000000
> 
> The newlib output for .3f probably suffers from the fact that ldtoa
> chooses the small buffer, which restricts the number of digits to 43 or
> 44.  But keep in mind that ldtoa *always* restricted the output to 42,
> so you never got a more precise output anyway.  Every digit beyond digit
> 42 is only printed due to the bigger buffer sizes.
> 
> So, what newlib and, in extension, Cygwin really needs at this point are
> patches to the existing ldtoa or a change to gdtoa or equivalent.
> 
> https://cygwin.com/acronyms/#SHTDI
> https://cygwin.com/acronyms/#PTC

The attached patch is the one which I think correct so far.

With this patch:

14142135623730951759073108307330633613786386978891021459448717416650727.134
14142135623730951759073108307330633613786386978891021459448717416650727.13402790000888758223149296720949629080194006476078

Isn't this better than current behaviour?


-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

[-- Attachment #2: ldtoa.patch --]
[-- Type: application/octet-stream, Size: 3537 bytes --]

diff --git a/newlib/libc/stdlib/ldtoa.c b/newlib/libc/stdlib/ldtoa.c
index 1b32e801c..95347e017 100644
--- a/newlib/libc/stdlib/ldtoa.c
+++ b/newlib/libc/stdlib/ldtoa.c
@@ -83,7 +83,7 @@ static void eclear (register short unsigned int *x);
 static void einfin (register short unsigned int *x, register LDPARMS * ldp);
 static void efloor (short unsigned int *x, short unsigned int *y,
 		    LDPARMS * ldp);
-static void etoasc (short unsigned int *x, char *string, int ndigs,
+static void etoasc (short unsigned int *x, char *string, int ndec, int ndigs,
 		    int outformat, LDPARMS * ldp);
 
 union uconv
@@ -217,7 +217,7 @@ static const char *const ermsg[7] = {
  *	e24toasc( &f, str, n )	single to ASCII string, n digits after decimal
  *	e53toasc( &d, str, n )	double to ASCII string, n digits after decimal
  *	e64toasc( &d, str, n )	long double to ASCII string
- *	etoasc(e,str,n,fmt,ldp)e to ASCII string, n digits after decimal
+ *	etoasc(e,str,ndec,n,fmt,ldp)e to ASCII string, n digits after decimal
  *	etoe24( e, &f )		convert e type to IEEE single precision
  *	etoe53( e, &d )		convert e type to IEEE double precision
  *	etoe64( e, &d )		convert e type to IEEE long double precision
@@ -2839,23 +2839,33 @@ _ldtoa_r (struct _reent *ptr, long double d, int mode, int ndigits,
 
 /* This sanity limit must agree with the corresponding one in etoasc, to
    keep straight the returned value of outexpon.  Note that we use a dynamic
-   limit now, either NDEC or NDEC_SML, depending on ndigits.  See the usage
-   of "my_NDEC" in etoasc. */
-  if (ndigits > NDEC)
-    ndigits = NDEC;
+   limit now, either ndec (<= NDEC) or NDEC_SML, depending on ndigits. */
+  int ndec;
+  if (mode == 3) /* %f */
+    {
+      int expon = (e[NE - 1] & 0x7fff) - (EXONE - 1); /* exponent part */
+      /* log2(10) approximately 485/146 */
+      ndec = expon * 146 / 485 + ndigits;
+    }
+  else /* %g/%e */
+    ndec = ndigits;
+  if (ndec < 0)
+    ndec = 0;
+  if (ndec > NDEC)
+    ndec = NDEC;
 
   /* Allocate buffer if more than NDEC_SML digits are requested. */
-  if (ndigits > NDEC_SML)
+  if (ndec > NDEC_SML)
     {
-      outbuf = (char *) _malloc_r (ptr, NDEC + MAX_EXP_DIGITS + 10);
+      outbuf = (char *) _malloc_r (ptr, ndec + MAX_EXP_DIGITS + 10);
       if (!outbuf)
 	{
-	  ndigits = NDEC_SML;
+	  ndec = NDEC_SML;
 	  outbuf = outbuf_sml;
 	}
     }
 
-  etoasc (e, outbuf, ndigits, mode, ldp);
+  etoasc (e, outbuf, ndec, ndigits, mode, ldp);
   s = outbuf;
   if (eisinf (e) || eisnan (e))
     {
@@ -2992,8 +3002,8 @@ _ldcheck (long double *d)
 }				/* _ldcheck */
 
 static void
-etoasc (short unsigned int *x, char *string, int ndigits, int outformat,
-	LDPARMS * ldp)
+etoasc (short unsigned int *x, char *string, int ndec, int ndigits,
+	int outformat, LDPARMS * ldp)
 {
   long digit;
   unsigned short y[NI], t[NI], u[NI], w[NI];
@@ -3003,7 +3013,6 @@ etoasc (short unsigned int *x, char *string, int ndigits, int outformat,
   char *s, *ss;
   unsigned short m;
   unsigned short *equot = ldp->equot;
-  int my_NDEC = (ndigits > NDEC_SML) ? NDEC : NDEC_SML;
 
   ndigs = ndigits;
   rndsav = ldp->rndprc;
@@ -3129,7 +3138,7 @@ tnzro:
       else
 	{
 	  emovi (y, w);
-	  for (i = 0; i < my_NDEC + 1; i++)
+	  for (i = 0; i < ndec + 1; i++)
 	    {
 	      if ((w[NI - 1] & 0x7) != 0)
 		break;
@@ -3205,8 +3214,8 @@ isone:
 else if( ndigs < 0 )
         ndigs = 0;
 */
-  if (ndigs > my_NDEC)
-    ndigs = my_NDEC;
+  if (ndigs > ndec)
+    ndigs = ndec;
   if (digit == 10)
     {
       *s++ = '1';

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-24  3:40                             ` Takashi Yano
@ 2021-11-24  8:48                               ` Corinna Vinschen
  2021-11-24  8:52                               ` Takashi Yano
  1 sibling, 0 replies; 23+ messages in thread
From: Corinna Vinschen @ 2021-11-24  8:48 UTC (permalink / raw)
  To: cygwin

On Nov 24 12:40, Takashi Yano via Cygwin wrote:
> On Tue, 23 Nov 2021 10:48:21 +0100
> Corinna Vinschen wrote:
> > On Nov 23 17:34, Takashi Yano via Cygwin wrote:
> > > However, in reality, for example in the case:
> > > snprintf(buf, sizeof(buf), "%.3f", 1234567890123456.789);
> > > 'ndigits' is only 3 even though total digits will be 20.
> > > 
> > > So, Tony thinks current code does not correct.
> > > 
> > > However, I think something is wrong with interpretation
> > > of 'ndigits' in dltoa.c.
> > > 
> > > printf("%.3f\n", sqrt(2)*1e70);
> > > printf("%.50f\n", sqrt(2)*1e70);
> > > 
> > > outputs
> > > 
> > > 14142135623730951759073108307330633613786387000000000000000000000000000.000
> > > 14142135623730951759073108307330633613786386978891021459448717416650727.13402790000888758223149296720949629080194006476078
> > > 
> > > Is this as intended?
> > 
> > On Linux I see
> > 
> > 14142135623730951759073108307330633613786387161811679011529922516615168.000
> > 14142135623730951759073108307330633613786387161811679011529922516615168.00000000000000000000000000000000000000000000000000
> > 
> > The newlib output for .3f probably suffers from the fact that ldtoa
> > chooses the small buffer, which restricts the number of digits to 43 or
> > 44.  But keep in mind that ldtoa *always* restricted the output to 42,
> > so you never got a more precise output anyway.  Every digit beyond digit
> > 42 is only printed due to the bigger buffer sizes.
> > 
> > So, what newlib and, in extension, Cygwin really needs at this point are
> > patches to the existing ldtoa or a change to gdtoa or equivalent.
> > 
> > https://cygwin.com/acronyms/#SHTDI
> > https://cygwin.com/acronyms/#PTC
> 
> The attached patch is the one which I think correct so far.
> 
> With this patch:
> 
> 14142135623730951759073108307330633613786386978891021459448717416650727.134
> 14142135623730951759073108307330633613786386978891021459448717416650727.13402790000888758223149296720949629080194006476078
> 
> Isn't this better than current behaviour?

LGTM, thanks!  Can you send this as patch to the newlib ML, please?


Thanks,
Corinna

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-24  3:40                             ` Takashi Yano
  2021-11-24  8:48                               ` Corinna Vinschen
@ 2021-11-24  8:52                               ` Takashi Yano
  2021-11-24  9:14                                 ` Takashi Yano
  1 sibling, 1 reply; 23+ messages in thread
From: Takashi Yano @ 2021-11-24  8:52 UTC (permalink / raw)
  To: cygwin

On Wed, 24 Nov 2021 12:40:55 +0900
Takashi Yano wrote:
> On Tue, 23 Nov 2021 10:48:21 +0100
> Corinna Vinschen wrote:
> > On Nov 23 17:34, Takashi Yano via Cygwin wrote:
> > > However, in reality, for example in the case:
> > > snprintf(buf, sizeof(buf), "%.3f", 1234567890123456.789);
> > > 'ndigits' is only 3 even though total digits will be 20.
> > > 
> > > So, Tony thinks current code does not correct.
> > > 
> > > However, I think something is wrong with interpretation
> > > of 'ndigits' in dltoa.c.
> > > 
> > > printf("%.3f\n", sqrt(2)*1e70);
> > > printf("%.50f\n", sqrt(2)*1e70);
> > > 
> > > outputs
> > > 
> > > 14142135623730951759073108307330633613786387000000000000000000000000000.000
> > > 14142135623730951759073108307330633613786386978891021459448717416650727.13402790000888758223149296720949629080194006476078
> > > 
> > > Is this as intended?
> > 
> > On Linux I see
> > 
> > 14142135623730951759073108307330633613786387161811679011529922516615168.000
> > 14142135623730951759073108307330633613786387161811679011529922516615168.00000000000000000000000000000000000000000000000000
> > 
> > The newlib output for .3f probably suffers from the fact that ldtoa
> > chooses the small buffer, which restricts the number of digits to 43 or
> > 44.  But keep in mind that ldtoa *always* restricted the output to 42,
> > so you never got a more precise output anyway.  Every digit beyond digit
> > 42 is only printed due to the bigger buffer sizes.
> > 
> > So, what newlib and, in extension, Cygwin really needs at this point are
> > patches to the existing ldtoa or a change to gdtoa or equivalent.
> > 
> > https://cygwin.com/acronyms/#SHTDI
> > https://cygwin.com/acronyms/#PTC
> 
> The attached patch is the one which I think correct so far.
> 
> With this patch:
> 
> 14142135623730951759073108307330633613786386978891021459448717416650727.134
> 14142135623730951759073108307330633613786386978891021459448717416650727.13402790000888758223149296720949629080194006476078
> 
> Isn't this better than current behaviour?

The printed value is still something wrong...
sqrt(2)*1e70 should be an integer value.

sqrt(2)*1e62: OK
141421356237309519594917508377920125928940956418610020629872640.000000

sqrt(2)*1e63: NG
1414213562373095241621101250369917453154560547438491094548266395.615752

I will look into this problem.

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-24  8:52                               ` Takashi Yano
@ 2021-11-24  9:14                                 ` Takashi Yano
  2021-11-24  9:28                                   ` Corinna Vinschen
  0 siblings, 1 reply; 23+ messages in thread
From: Takashi Yano @ 2021-11-24  9:14 UTC (permalink / raw)
  To: cygwin

On Wed, 24 Nov 2021 17:52:04 +0900
Takashi Yano wrote:
> On Wed, 24 Nov 2021 12:40:55 +0900
> Takashi Yano wrote:
> > On Tue, 23 Nov 2021 10:48:21 +0100
> > Corinna Vinschen wrote:
> > > On Nov 23 17:34, Takashi Yano via Cygwin wrote:
> > > > However, in reality, for example in the case:
> > > > snprintf(buf, sizeof(buf), "%.3f", 1234567890123456.789);
> > > > 'ndigits' is only 3 even though total digits will be 20.
> > > > 
> > > > So, Tony thinks current code does not correct.
> > > > 
> > > > However, I think something is wrong with interpretation
> > > > of 'ndigits' in dltoa.c.
> > > > 
> > > > printf("%.3f\n", sqrt(2)*1e70);
> > > > printf("%.50f\n", sqrt(2)*1e70);
> > > > 
> > > > outputs
> > > > 
> > > > 14142135623730951759073108307330633613786387000000000000000000000000000.000
> > > > 14142135623730951759073108307330633613786386978891021459448717416650727.13402790000888758223149296720949629080194006476078
> > > > 
> > > > Is this as intended?
> > > 
> > > On Linux I see
> > > 
> > > 14142135623730951759073108307330633613786387161811679011529922516615168.000
> > > 14142135623730951759073108307330633613786387161811679011529922516615168.00000000000000000000000000000000000000000000000000
> > > 
> > > The newlib output for .3f probably suffers from the fact that ldtoa
> > > chooses the small buffer, which restricts the number of digits to 43 or
> > > 44.  But keep in mind that ldtoa *always* restricted the output to 42,
> > > so you never got a more precise output anyway.  Every digit beyond digit
> > > 42 is only printed due to the bigger buffer sizes.
> > > 
> > > So, what newlib and, in extension, Cygwin really needs at this point are
> > > patches to the existing ldtoa or a change to gdtoa or equivalent.
> > > 
> > > https://cygwin.com/acronyms/#SHTDI
> > > https://cygwin.com/acronyms/#PTC
> > 
> > The attached patch is the one which I think correct so far.
> > 
> > With this patch:
> > 
> > 14142135623730951759073108307330633613786386978891021459448717416650727.134
> > 14142135623730951759073108307330633613786386978891021459448717416650727.13402790000888758223149296720949629080194006476078
> > 
> > Isn't this better than current behaviour?
> 
> The printed value is still something wrong...
> sqrt(2)*1e70 should be an integer value.

I mean...

sqrt(2)*1e70 is actually not an integer, however, double has mantissa
of only 52 bit. So, (double value)*(5^70*2^70) should be an integer.

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-24  9:14                                 ` Takashi Yano
@ 2021-11-24  9:28                                   ` Corinna Vinschen
  2021-11-24 12:29                                     ` Lemke, Michael  SF/HZA-ZI2E
  2021-11-25 12:02                                     ` Takashi Yano
  0 siblings, 2 replies; 23+ messages in thread
From: Corinna Vinschen @ 2021-11-24  9:28 UTC (permalink / raw)
  To: cygwin

On Nov 24 18:14, Takashi Yano via Cygwin wrote:
> On Wed, 24 Nov 2021 17:52:04 +0900
> Takashi Yano wrote:
> > The printed value is still something wrong...
> > sqrt(2)*1e70 should be an integer value.
> 
> I mean...
> 
> sqrt(2)*1e70 is actually not an integer, however, double has mantissa
> of only 52 bit. So, (double value)*(5^70*2^70) should be an integer.

The conversion is a bit inexact, I guess, but that's another problem
of this old ldto, right?


Corinna

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: possible snprintf() regression in 3.3.2
  2021-11-24  9:28                                   ` Corinna Vinschen
@ 2021-11-24 12:29                                     ` Lemke, Michael  SF/HZA-ZI2E
  2021-11-25 12:02                                     ` Takashi Yano
  1 sibling, 0 replies; 23+ messages in thread
From: Lemke, Michael  SF/HZA-ZI2E @ 2021-11-24 12:29 UTC (permalink / raw)
  To: cygwin

On Wednesday, November 24, 2021 10:28 AM Corinna Vinschen wrote:
> On Nov 24 18:14, Takashi Yano via Cygwin wrote:
> > On Wed, 24 Nov 2021 17:52:04 +0900
> > Takashi Yano wrote:
> > > The printed value is still something wrong...
> > > sqrt(2)*1e70 should be an integer value.
> > 
> > I mean...
> > 
> > sqrt(2)*1e70 is actually not an integer, however, double has mantissa
> > of only 52 bit. So, (double value)*(5^70*2^70) should be an integer.
> 
> The conversion is a bit inexact, I guess, but that's another problem
> of this old ldto, right?

Just in case you didn't notice: Only the first 16 digits of both of
Takashi's long numbers are correct. Floating point accuracy?

Michael

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-24  9:28                                   ` Corinna Vinschen
  2021-11-24 12:29                                     ` Lemke, Michael  SF/HZA-ZI2E
@ 2021-11-25 12:02                                     ` Takashi Yano
  2021-11-25 12:45                                       ` Corinna Vinschen
  1 sibling, 1 reply; 23+ messages in thread
From: Takashi Yano @ 2021-11-25 12:02 UTC (permalink / raw)
  To: cygwin

On Wed, 24 Nov 2021 10:28:13 +0100
Corinna Vinschen wrote:
> On Nov 24 18:14, Takashi Yano via Cygwin wrote:
> > On Wed, 24 Nov 2021 17:52:04 +0900
> > Takashi Yano wrote:
> > > The printed value is still something wrong...
> > > sqrt(2)*1e70 should be an integer value.
> > 
> > I mean...
> > 
> > sqrt(2)*1e70 is actually not an integer, however, double has mantissa
> > of only 52 bit. So, (double value)*(5^70*2^70) should be an integer.
> 
> The conversion is a bit inexact, I guess, but that's another problem
> of this old ldto, right?

I looked into this problem and found that:

This problem is in principle unavoidable with current algorithms.
This is because the current algorithm uses a value of 10^n for
the conversion. When n>62, the value does not fit into the 144
bits of the mantissa part of the internal representation in ldtoa.
This degrades the precision.


-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: possible snprintf() regression in 3.3.2
  2021-11-25 12:02                                     ` Takashi Yano
@ 2021-11-25 12:45                                       ` Corinna Vinschen
  0 siblings, 0 replies; 23+ messages in thread
From: Corinna Vinschen @ 2021-11-25 12:45 UTC (permalink / raw)
  To: cygwin

On Nov 25 21:02, Takashi Yano via Cygwin wrote:
> On Wed, 24 Nov 2021 10:28:13 +0100
> Corinna Vinschen wrote:
> > On Nov 24 18:14, Takashi Yano via Cygwin wrote:
> > > On Wed, 24 Nov 2021 17:52:04 +0900
> > > Takashi Yano wrote:
> > > > The printed value is still something wrong...
> > > > sqrt(2)*1e70 should be an integer value.
> > > 
> > > I mean...
> > > 
> > > sqrt(2)*1e70 is actually not an integer, however, double has mantissa
> > > of only 52 bit. So, (double value)*(5^70*2^70) should be an integer.
> > 
> > The conversion is a bit inexact, I guess, but that's another problem
> > of this old ldto, right?
> 
> I looked into this problem and found that:
> 
> This problem is in principle unavoidable with current algorithms.
> This is because the current algorithm uses a value of 10^n for
> the conversion. When n>62, the value does not fit into the 144
> bits of the mantissa part of the internal representation in ldtoa.
> This degrades the precision.

Thanks for checking.

Corinna

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-11-25 12:45 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-17  0:37 possible snprintf() regression in 3.3.2 Tony Cook
2021-11-17  9:21 ` Takashi Yano
2021-11-17 12:27   ` Corinna Vinschen
2021-11-18  0:06     ` Tony Cook
2021-11-18 11:35       ` Takashi Yano
2021-11-18 13:19         ` Corinna Vinschen
2021-11-18 14:11           ` Noel Grandin
2021-11-18 14:27             ` Corinna Vinschen
2021-11-18 21:08               ` Sam Edge
2021-11-21  0:16                 ` Tony Cook
2021-11-22 10:34                   ` Corinna Vinschen
2021-11-22 13:04                     ` Corinna Vinschen
2021-11-22 23:23                       ` Tony Cook
2021-11-23  8:34                         ` Takashi Yano
2021-11-23  9:48                           ` Corinna Vinschen
2021-11-24  3:40                             ` Takashi Yano
2021-11-24  8:48                               ` Corinna Vinschen
2021-11-24  8:52                               ` Takashi Yano
2021-11-24  9:14                                 ` Takashi Yano
2021-11-24  9:28                                   ` Corinna Vinschen
2021-11-24 12:29                                     ` Lemke, Michael  SF/HZA-ZI2E
2021-11-25 12:02                                     ` Takashi Yano
2021-11-25 12:45                                       ` Corinna Vinschen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).