public inbox for glibc-bugs@sourceware.org
* [Bug libc/15854] New: strtod should avoid calling strlen
@ 2013-08-20  2:12 emogenet at gmail dot com
  2013-08-20  2:13 ` [Bug libc/15854] " emogenet at gmail dot com
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: emogenet at gmail dot com @ 2013-08-20  2:12 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=15854

            Bug ID: 15854
           Summary: strtod should avoid calling strlen
           Product: glibc
           Version: 2.18
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: emogenet at gmail dot com
                CC: drepper.fsp at gmail dot com

Problem: glibc's strtod seems to systematically call strlen on its input.

To the layman that I am, there doesn't seem to be any legitimate reason why it
should: strtod should simply consume its input one char at a time until it
reaches a char that marks the end of a valid ASCII representation of an FP
number. It should therefore work on a buffer that is not zero-terminated, as
long as said buffer ends with a char that terminates the parsing.

This internal call to strlen makes it essentially impossible to call strtod
on a buffer that is not zero-terminated, and there seems to be no other way to
access the non-trivial code that converts an ASCII buffer to an FP number.

This makes it particularly painful to call strtod on a very large mmap'd
buffer of ASCII floats: strlen will plow through the entire file for every
call to strtod, making things highly inefficient (it is also not guaranteed
not to crash).

To work around this shortcoming, one ends up having to figure out the end of
the FP ASCII string "by hand", copy the result to a zero-terminated buffer,
and then call strtod on that.

This is both inefficient and clunky.
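
For illustration, the workaround described above might look roughly like the
sketch below. The helper name parse_double_bounded, the 64-byte scratch buffer
and the set of accepted characters are assumptions made for this example; none
of them come from glibc. The sketch handles only plain decimal literals with
"." as the decimal point, and the caller is expected to skip separators first.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of glibc): parse one number from a buffer
   that is not zero-terminated by copying the candidate token into a small
   zero-terminated scratch buffer and calling strtod on the copy. */
static double parse_double_bounded(const char *buf, size_t len, size_t *consumed)
{
    char tmp[64];   /* assumed upper bound on the length of one token */
    size_t n = 0;

    /* Find the end of the token "by hand": stop at the first byte that
       cannot be part of a plain decimal floating-point literal.  The
       explicit '\0' check is needed because strchr also matches the
       terminator of the character set. */
    while (n < len && n < sizeof tmp - 1 && buf[n] != '\0' &&
           strchr("+-.0123456789eE", buf[n]) != NULL)
        n++;

    memcpy(tmp, buf, n);
    tmp[n] = '\0';

    char *end;
    double value = strtod(tmp, &end);
    *consumed = (size_t)(end - tmp);   /* 0 means nothing was converted */
    return value;
}
```

The extra scan and copy per number is what makes this approach feel clunky,
even though each call stays bounded by the token length.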

See this article for a good description of the issue:

http://www.ryanjuckett.com/programming/c-cplusplus/25-optimizing-atof-and-strtod

Here's another instance of the problem:

http://stackoverflow.com/questions/2033845/any-one-know-how-to-convert-a-huge-char-array-to-float-very-huge-array-perform

-- 
You are receiving this mail because:
You are on the CC list for the bug.



* [Bug libc/15854] strtod should avoid calling strlen
  2013-08-20  2:12 [Bug libc/15854] New: strtod should avoid calling strlen emogenet at gmail dot com
@ 2013-08-20  2:13 ` emogenet at gmail dot com
  2013-08-20  7:11 ` neleai at seznam dot cz
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: emogenet at gmail dot com @ 2013-08-20  2:13 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=15854

emogenet at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |emogenet at gmail dot com


* [Bug libc/15854] strtod should avoid calling strlen
  2013-08-20  2:12 [Bug libc/15854] New: strtod should avoid calling strlen emogenet at gmail dot com
  2013-08-20  2:13 ` [Bug libc/15854] " emogenet at gmail dot com
@ 2013-08-20  7:11 ` neleai at seznam dot cz
  2013-08-20  7:11 ` [Bug libc/15854] New: " Ondřej Bílka
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: neleai at seznam dot cz @ 2013-08-20  7:11 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=15854

--- Comment #1 from Ondrej Bilka <neleai at seznam dot cz> ---
On Tue, Aug 20, 2013 at 02:12:32AM +0000, emogenet at gmail dot com wrote:
> http://sourceware.org/bugzilla/show_bug.cgi?id=15854
> 
>             Bug ID: 15854
>            Summary: strtod should avoid calling strlen
>            Product: glibc
>            Version: 2.18
>             Status: NEW
>           Severity: enhancement
>           Priority: P2
>          Component: libc
>           Assignee: unassigned at sourceware dot org
>           Reporter: emogenet at gmail dot com
>                 CC: drepper.fsp at gmail dot com
> 
> Problem: glibc's strtod seems to systematically call strlen on its input.
> 
> To the layman that I am, there doesn't seem to be any legitimate reason why it
> should: it seems that strtod should simply consume its input one char at a time
> until it reaches a char that marks the end of a valid FP number ASCII rep. and
> should therefore work on a non-zero terminated buffer, as long as said buffer ends
> with a char that terminates the parsing.
>
This is not that big a problem: strtod only uses strlen in the following context:

  decimal = _NL_CURRENT (LC_NUMERIC, DECIMAL_POINT); // which is "."
  decimal_len = strlen (decimal); // which is 1
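
A rough way to observe the same thing from user code, as a sketch only:
_NL_CURRENT is internal to glibc, so the example below uses the standard
localeconv() interface instead, and the helper name decimal_point_length is
made up for illustration. The string passed to strlen is the locale's decimal
point, not the caller's input buffer.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: length of the current locale's decimal point string.
   This mirrors the strlen call quoted above, but goes through the public
   localeconv() API rather than glibc's internal _NL_CURRENT macro. */
static size_t decimal_point_length(void)
{
    const struct lconv *lc = localeconv();
    return strlen(lc->decimal_point);
}
```

In the default "C" locale the decimal point is ".", so the length is 1
regardless of how large the buffer handed to strtod is.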


> This internal call to strlen makes it essentially impossible to call strtod
> on a non-zero-terminated buffer, and there seems to be no other way to otherwise
> access the non-trivial code that converts an ASCII buffer to an FP number.
> 
> This makes it in particular painful to call strtod on a very large mmap'd
> buffer of ASCII floats: strlen will plow through the entire file for every
> call to strtod, making things highly inefficient (it is also not guaranteed
> not to crash).
>
Do you have a test case that demonstrates the quadratic behavior? It is possible
that the end is determined by some other inefficient means.

> To work around this shortcoming, one ends up having to figure out the end of
> the FP ASCII string, "by hand", copy the result to a zero terminated buffer,
> and then call strtod on that.
> 
> This is both inefficient and clunky.
> 
> See this article for a good description of the issue:
> 
> http://www.ryanjuckett.com/programming/c-cplusplus/25-optimizing-atof-and-strtod
> 
> Here's another instance of the problem:
> 
> http://stackoverflow.com/questions/2033845/any-one-know-how-to-convert-a-huge-char-array-to-float-very-huge-array-perform
> 
Not relevant for us, as these are Windows problems.


* [Bug libc/15854] strtod should avoid calling strlen
  2013-08-20  2:12 [Bug libc/15854] New: strtod should avoid calling strlen emogenet at gmail dot com
                   ` (2 preceding siblings ...)
  2013-08-20  7:11 ` [Bug libc/15854] New: " Ondřej Bílka
@ 2013-08-20  7:29 ` emogenet at gmail dot com
  2013-08-24  7:42   ` Ondřej Bílka
  2013-08-20  7:32 ` allan at archlinux dot org
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 9+ messages in thread
From: emogenet at gmail dot com @ 2013-08-20  7:29 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=15854

--- Comment #2 from emogenet at gmail dot com ---
You are correct, the call to strlen in strtod is not a problem. I incorrectly
assumed it was calling strlen on the whole buffer because sscanf does exhibit
the problem I describe, but as it turns out, the problem is inherent to sscanf,
and strtod works fine.

As a matter of fact, I just tested glibc's strtod on a very large mmap'd ASCII
buffer, and it works fine: no quadratic behavior.

Apologies for not testing this better before reporting the bug. Please feel
free to close.
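
A test along the lines described above could be sketched as follows. The
function name count_doubles_strtod is an assumption for illustration, and the
sketch assumes the text ends with a zero byte or with some character that
cannot start a number. Each call resumes from the end pointer returned by the
previous one, so the buffer is walked once.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical test helper: count the numbers strtod can parse back to back,
   resuming each call where the previous one stopped. */
static size_t count_doubles_strtod(const char *text)
{
    size_t count = 0;

    for (;;) {
        char *end;
        (void)strtod(text, &end);   /* leading whitespace is skipped */
        if (end == text)            /* no conversion was performed: stop */
            break;
        text = end;
        count++;
    }
    return count;
}
```

Timing this over inputs of increasing size is one way to check whether the
running time grows linearly with the buffer length.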

   - Emmanuel




* [Bug libc/15854] strtod should avoid calling strlen
  2013-08-20  2:12 [Bug libc/15854] New: strtod should avoid calling strlen emogenet at gmail dot com
                   ` (3 preceding siblings ...)
  2013-08-20  7:29 ` [Bug libc/15854] " emogenet at gmail dot com
@ 2013-08-20  7:32 ` allan at archlinux dot org
  2013-08-24  7:42 ` neleai at seznam dot cz
  2014-06-13 13:08 ` fweimer at redhat dot com
  6 siblings, 0 replies; 9+ messages in thread
From: allan at archlinux dot org @ 2013-08-20  7:32 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=15854

Allan McRae <allan at archlinux dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |allan at archlinux dot org
         Resolution|---                         |INVALID

--- Comment #3 from Allan McRae <allan at archlinux dot org> ---
Closing.


* [Bug libc/15854] strtod should avoid calling strlen
  2013-08-20  2:12 [Bug libc/15854] New: strtod should avoid calling strlen emogenet at gmail dot com
                   ` (4 preceding siblings ...)
  2013-08-20  7:32 ` allan at archlinux dot org
@ 2013-08-24  7:42 ` neleai at seznam dot cz
  2014-06-13 13:08 ` fweimer at redhat dot com
  6 siblings, 0 replies; 9+ messages in thread
From: neleai at seznam dot cz @ 2013-08-24  7:42 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=15854

--- Comment #4 from Ondrej Bilka <neleai at seznam dot cz> ---
On Tue, Aug 20, 2013 at 07:29:21AM +0000, emogenet at gmail dot com wrote:
> http://sourceware.org/bugzilla/show_bug.cgi?id=15854
> 
> --- Comment #2 from emogenet at gmail dot com ---
> You are correct, the call to strlen in strtod is not a problem. I incorrectly
> assumed it was calling strlen on the whole buffer because sscanf does exhibit
> the problem I describe, but as it turns out, the problem is inherent to sscanf,
> and strtod works fine.
> 
Could you provide the sscanf test case as a separate bug report?
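
As a starting point, such a test case might be built around a loop like the
sketch below. The function name count_doubles_sscanf is made up for this
example, and the claim that sscanf rescans the rest of the input on every call
is the reporter's observation in this thread, not something the sketch proves
by itself.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical test-case skeleton: parse every number in a zero-terminated
   text with sscanf, advancing by the %n count after each conversion. */
static size_t count_doubles_sscanf(const char *text)
{
    size_t count = 0;
    double value;
    int used = 0;

    /* sscanf returns 1 while a number is converted, 0 on a matching
       failure, and EOF once the remaining input is exhausted. */
    while (sscanf(text, "%lf%n", &value, &used) == 1) {
        text += used;
        count++;
    }
    return count;
}
```

Timing this loop on inputs of, say, 1 MB, 2 MB and 4 MB and comparing the
growth against an equivalent strtod loop would show whether the cost per call
depends on the length of the remaining input.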


* [Bug libc/15854] strtod should avoid calling strlen
  2013-08-20  2:12 [Bug libc/15854] New: strtod should avoid calling strlen emogenet at gmail dot com
                   ` (5 preceding siblings ...)
  2013-08-24  7:42 ` neleai at seznam dot cz
@ 2014-06-13 13:08 ` fweimer at redhat dot com
  6 siblings, 0 replies; 9+ messages in thread
From: fweimer at redhat dot com @ 2014-06-13 13:08 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=15854

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-

