public inbox for glibc-bugs@sourceware.org
* [Bug libc/15854] New: strtod should avoid calling strlen
From: emogenet at gmail dot com @ 2013-08-20 2:12 UTC
To: glibc-bugs
http://sourceware.org/bugzilla/show_bug.cgi?id=15854
Bug ID: 15854
Summary: strtod should avoid calling strlen
Product: glibc
Version: 2.18
Status: NEW
Severity: enhancement
Priority: P2
Component: libc
Assignee: unassigned at sourceware dot org
Reporter: emogenet at gmail dot com
CC: drepper.fsp at gmail dot com
Problem: glibc's strtod seems to systematically call strlen on its input.
To the layman that I am, there doesn't seem to be any legitimate reason why it
should: strtod should simply consume its input one char at a time until it
reaches a char that marks the end of a valid ASCII representation of an FP
number. It should therefore work on a non-zero-terminated buffer, as long as
said buffer ends with a char that terminates the parsing.
This internal call to strlen makes it essentially impossible to call strtod
on a non-zero-terminated buffer, and there seems to be no other way to
access the non-trivial code that converts an ASCII buffer to an FP number.
This makes it particularly painful to call strtod on a very large mmap'd
buffer of ASCII floats: strlen will plow through the entire file for every
call to strtod, making things highly inefficient (it is also not guaranteed
not to crash).
To work around this shortcoming, one ends up having to find the end of
the FP ASCII string "by hand", copy it to a zero-terminated buffer,
and then call strtod on that.
This is both inefficient and clunky.
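The workaround described above could look roughly like the following. This is only a sketch under stated assumptions: the helper name parse_double_n, the whitespace-delimited token format, and the 64-byte scratch buffer are all hypothetical and do not come from glibc or from the report.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse one double from buf[0..len) without relying on
   a terminating NUL in buf.  Returns the number of bytes consumed, or 0 if
   nothing could be parsed. */
static size_t parse_double_n(const char *buf, size_t len, double *out)
{
    char tmp[64];              /* assumed upper bound on one token */
    size_t start = 0, end;

    /* Skip leading whitespace by hand. */
    while (start < len && isspace((unsigned char)buf[start]))
        start++;

    /* Find the end of the token: next whitespace or end of buffer. */
    end = start;
    while (end < len && !isspace((unsigned char)buf[end]))
        end++;

    if (end == start || end - start >= sizeof tmp)
        return 0;

    /* Copy into a zero-terminated scratch buffer, then call strtod. */
    memcpy(tmp, buf + start, end - start);
    tmp[end - start] = '\0';

    char *stop;
    *out = strtod(tmp, &stop);
    if (stop == tmp)
        return 0;              /* no conversion performed */

    return start + (size_t)(stop - tmp);
}
```

The extra scan and copy per number is the overhead the report calls clunky.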
See this article for a good description of the issue:
http://www.ryanjuckett.com/programming/c-cplusplus/25-optimizing-atof-and-strtod
Here's another instance of the problem:
http://stackoverflow.com/questions/2033845/any-one-know-how-to-convert-a-huge-char-array-to-float-very-huge-array-perform
--
You are receiving this mail because:
You are on the CC list for the bug.
* [Bug libc/15854] strtod should avoid calling strlen
From: emogenet at gmail dot com @ 2013-08-20 2:13 UTC
To: glibc-bugs
http://sourceware.org/bugzilla/show_bug.cgi?id=15854
emogenet at gmail dot com changed:
What |Removed |Added
-----+--------+-------------------------
CC   |        |emogenet at gmail dot com
* Re: [Bug libc/15854] New: strtod should avoid calling strlen
From: Ondřej Bílka @ 2013-08-20 7:11 UTC
To: emogenet at gmail dot com; +Cc: glibc-bugs
On Tue, Aug 20, 2013 at 02:12:32AM +0000, emogenet at gmail dot com wrote:
> http://sourceware.org/bugzilla/show_bug.cgi?id=15854
>
> Bug ID: 15854
> Summary: strtod should avoid calling strlen
> Product: glibc
> Version: 2.18
> Status: NEW
> Severity: enhancement
> Priority: P2
> Component: libc
> Assignee: unassigned at sourceware dot org
> Reporter: emogenet at gmail dot com
> CC: drepper.fsp at gmail dot com
>
> Problem: glibc's strtod seems to systematically call strlen on its input.
>
> To the layman that I am, there doesn't seem to be any legitimate reason why it
> should: strtod should simply consume its input one char at a time until it
> reaches a char that marks the end of a valid ASCII representation of an FP
> number. It should therefore work on a non-zero-terminated buffer, as long as
> said buffer ends with a char that terminates the parsing.
>
This is not a big problem; strtod only uses strlen in the following context:
decimal = _NL_CURRENT (LC_NUMERIC, DECIMAL_POINT); // which is "."
decimal_len = strlen (decimal); // which is 1
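The same quantity can be observed from user code. A minimal sketch, assuming the public localeconv() interface as a stand-in for glibc's internal _NL_CURRENT macro; the helper name decimal_point_length is hypothetical:

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Length of the current locale's decimal-point string.  This mirrors the
   strlen call quoted above, but goes through localeconv() instead of the
   internal _NL_CURRENT macro, which is not available to applications. */
static size_t decimal_point_length(void)
{
    const struct lconv *lc = localeconv();
    return strlen(lc->decimal_point);   /* "." in the default C locale */
}
```

The argument to strlen here is the short locale string, not the caller's input buffer.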
> This internal call to strlen makes it essentially impossible to call strtod
> on a non-zero-terminated buffer, and there seems to be no other way to
> access the non-trivial code that converts an ASCII buffer to an FP number.
>
> This makes it particularly painful to call strtod on a very large mmap'd
> buffer of ASCII floats: strlen will plow through the entire file for every
> call to strtod, making things highly inefficient (it is also not guaranteed
> not to crash).
>
Do you have a test case that demonstrates the quadratic behavior? It is
possible that the end is determined by some other inefficient means.
> To work around this shortcoming, one ends up having to find the end of
> the FP ASCII string "by hand", copy it to a zero-terminated buffer,
> and then call strtod on that.
>
> This is both inefficient and clunky.
>
> See this article for a good description of the issue:
>
> http://www.ryanjuckett.com/programming/c-cplusplus/25-optimizing-atof-and-strtod
>
> Here's another instance of the problem:
>
> http://stackoverflow.com/questions/2033845/any-one-know-how-to-convert-a-huge-char-array-to-float-very-huge-array-perform
>
Not relevant for us, as these are Windows problems.
* [Bug libc/15854] strtod should avoid calling strlen
From: emogenet at gmail dot com @ 2013-08-20 7:29 UTC
To: glibc-bugs
http://sourceware.org/bugzilla/show_bug.cgi?id=15854
--- Comment #2 from emogenet at gmail dot com ---
You are correct, the call to strlen in strtod is not a problem. I incorrectly
assumed it was calling strlen on the whole buffer because sscanf does exhibit
the problem I describe, but as it turns out, the problem is inherent to sscanf,
and strtod works fine.
As a matter of fact, I just tested glibc's strtod on a very large ASCII mmap'd
buffer, and it works fine: no quadratic behavior.
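The test itself is not included in this thread. A minimal sketch of that kind of loop, assuming a zero-terminated buffer of whitespace-separated numbers (the helper name sum_floats is hypothetical):

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: sum every number in a zero-terminated buffer.
   Each strtod call resumes where the previous one stopped, so the buffer
   is walked once overall. */
static double sum_floats(const char *text, size_t *count)
{
    double total = 0.0;
    size_t n = 0;
    const char *p = text;

    for (;;) {
        char *end;
        double v = strtod(p, &end);
        if (end == p)          /* nothing converted: end of data or junk */
            break;
        total += v;
        n++;
        p = end;               /* continue right after the parsed number */
    }
    if (count)
        *count = n;
    return total;
}
```

strtod skips leading whitespace itself, so the loop needs no separate tokenizing step.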
Apologies for not testing this better before reporting the bug. Please feel
free to close.
- Emmanuel
* [Bug libc/15854] strtod should avoid calling strlen
From: allan at archlinux dot org @ 2013-08-20 7:32 UTC
To: glibc-bugs
http://sourceware.org/bugzilla/show_bug.cgi?id=15854
Allan McRae <allan at archlinux dot org> changed:
What       |Removed |Added
-----------+--------+--------------------------
Status     |NEW     |RESOLVED
CC         |        |allan at archlinux dot org
Resolution |---     |INVALID
--- Comment #3 from Allan McRae <allan at archlinux dot org> ---
Closing.
* [Bug libc/15854] strtod should avoid calling strlen
From: neleai at seznam dot cz @ 2013-08-24 7:42 UTC
To: glibc-bugs
http://sourceware.org/bugzilla/show_bug.cgi?id=15854
--- Comment #4 from Ondrej Bilka <neleai at seznam dot cz> ---
On Tue, Aug 20, 2013 at 07:29:21AM +0000, emogenet at gmail dot com wrote:
> http://sourceware.org/bugzilla/show_bug.cgi?id=15854
>
> --- Comment #2 from emogenet at gmail dot com ---
> You are correct, the call to strlen in strtod is not a problem. I incorrectly
> assumed it was calling strlen on the whole buffer because sscanf does exhibit
> the problem I describe, but as it turns out, the problem is inherent to sscanf,
> and strtod works fine.
>
Could you provide an sscanf test case as a separate bug report?
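No such test case appears in this thread. A hypothetical sketch of the sscanf pattern in question, assuming a zero-terminated buffer of whitespace-separated numbers (the helper name count_floats_sscanf is made up for illustration):

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: count the numbers in a zero-terminated buffer by
   calling sscanf once per number.  %n stores how many bytes were consumed
   so the next call can resume from there. */
static size_t count_floats_sscanf(const char *text)
{
    size_t n = 0;
    const char *p = text;

    for (;;) {
        double v;
        int used = 0;
        if (sscanf(p, "%lf%n", &v, &used) != 1)
            break;             /* EOF or a non-numeric token */
        (void)v;               /* value itself is not needed here */
        n++;
        p += used;
    }
    return n;
}
```

If each sscanf call first scans to the end of its input string, as comment #2 suggests, the total work over a large buffer grows quadratically; timing this loop at several buffer sizes would be one way to check.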
* [Bug libc/15854] strtod should avoid calling strlen
From: fweimer at redhat dot com @ 2014-06-13 13:08 UTC
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=15854
Florian Weimer <fweimer at redhat dot com> changed:
What  |Removed |Added
------+--------+----------
Flags |        |security-
Thread overview: 9+ messages
2013-08-20 2:12 [Bug libc/15854] New: strtod should avoid calling strlen emogenet at gmail dot com
2013-08-20 2:13 ` [Bug libc/15854] " emogenet at gmail dot com
2013-08-20 7:11 ` neleai at seznam dot cz
2013-08-20 7:11 ` [Bug libc/15854] New: " Ondřej Bílka
2013-08-20 7:29 ` [Bug libc/15854] " emogenet at gmail dot com
2013-08-24 7:42 ` Ondřej Bílka
2013-08-20 7:32 ` allan at archlinux dot org
2013-08-24 7:42 ` neleai at seznam dot cz
2014-06-13 13:08 ` fweimer at redhat dot com