From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 29998 invoked by alias); 20 Aug 2013 07:11:33 -0000
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: glibc-bugs-owner@sourceware.org
Received: (qmail 29957 invoked by uid 89); 20 Aug 2013 07:11:32 -0000
X-Spam-SWARE-Status: No, score=-0.7 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,SPF_NEUTRAL autolearn=no version=3.3.2
X-Spam-User: qpsmtpd, 2 recipients
Received: from popelka.ms.mff.cuni.cz (HELO popelka.ms.mff.cuni.cz) (195.113.20.131)
    by sourceware.org (qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP; Tue, 20 Aug 2013 07:11:30 +0000
Received: from domone.kolej.mff.cuni.cz (popelka.ms.mff.cuni.cz [195.113.20.131])
    by popelka.ms.mff.cuni.cz (Postfix) with ESMTPS id B927A500DE; Tue, 20 Aug 2013 09:11:25 +0200 (CEST)
Received: by domone.kolej.mff.cuni.cz (Postfix, from userid 1000)
    id 8D37260135; Tue, 20 Aug 2013 09:11:25 +0200 (CEST)
Date: Tue, 20 Aug 2013 07:11:00 -0000
From: =?utf-8?B?T25kxZllaiBCw61sa2E=?=
To: emogenet at gmail dot com
Cc: glibc-bugs@sourceware.org
Subject: Re: [Bug libc/15854] New: strtod should avoid calling strlen
Message-ID: <20130820071124.GA5814@domone.kolej.mff.cuni.cz>
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.20 (2009-06-14)
X-SW-Source: 2013-08/txt/msg00099.txt.bz2

On Tue, Aug 20, 2013 at 02:12:32AM +0000, emogenet at gmail dot com wrote:
> http://sourceware.org/bugzilla/show_bug.cgi?id=15854
>
>             Bug ID: 15854
>            Summary: strtod should avoid calling strlen
>            Product: glibc
>            Version: 2.18
>             Status: NEW
>           Severity: enhancement
>           Priority: P2
>          Component: libc
>           Assignee: unassigned at sourceware dot org
>           Reporter: emogenet at gmail dot com
>                 CC: drepper.fsp at gmail dot com
>
> Problem : glibc's strtod seem to systematically call strlen on its input.
>
> To the layman that I am, there doesn't seem to be any legitimate reason why it
> should: it seems that strtod should simply consume its input one char at a time
> until it reaches a char that marks the end of a valid FP number ASCII rep. and
> should therefore work on a non-zero terminated buffer, as long said buffer ends
> with a char that terminates the parsing.
>
This is not that big a problem: strtod only uses strlen in the following
context:

  decimal = _NL_CURRENT (LC_NUMERIC, DECIMAL_POINT); // which is "."
  decimal_len = strlen (decimal);                    // which is 1

> This internal call to strlen makes it essentially impossible to call strtod
> on a no zero terminated buffer, and there seems to be no other way to otherwise
> access the non-trivial code that converts an ASCII buffer to a FP number.
>
> This makes it in particular painful to call strtod on a very large mmap'd
> buffer of ASCII floats : strlen will plow through the entire file for every
> call to strtod, making things highly inefficient (it is also not guaranteed
> not to crash).
>
Do you have a testcase that demonstrates the quadratic behavior? It is
possible that the end is determined by some other inefficient means.

> To work around this shortcoming, one ends up having to figure out the end of
> the FP ASCII string, "by hand", copy the result to a zero terminated buffer,
> and then call strtod on that.
>
> This is both inefficient and clunky.
>
> See this article for a good description of the issue:
>
> http://www.ryanjuckett.com/programming/c-cplusplus/25-optimizing-atof-and-strtod
>
> Here's another instance of the problem:
>
> http://stackoverflow.com/questions/2033845/any-one-know-how-to-convert-a-huge-char-array-to-float-very-huge-array-perform
>
Not relevant for us, as these are Windows problems.
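
For reference, the copy-then-parse workaround described in the report can be
sketched roughly as below. This is only an illustration, not glibc code:
parse_double_n is a hypothetical helper, and the 64-byte scratch buffer is an
assumed upper bound on the length of one number (a longer token, including any
leading whitespace, would be cut short). Instead of locating the end of the
number by hand, the sketch copies a bounded window and lets strtod report how
far it got. It also assumes the default "C" locale, where the decimal point
is ".".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of glibc): parse one double from a buffer
   that need not be NUL-terminated.  At most `len` bytes of `buf` are read.
   Returns the parsed value and stores the number of bytes consumed in
   *consumed (0 if no number was found at the start of the buffer). */
static double parse_double_n(const char *buf, size_t len, size_t *consumed)
{
    char tmp[64];  /* assumed upper bound on the length of one number */
    size_t n = len < sizeof tmp - 1 ? len : sizeof tmp - 1;
    char *end;
    double value;

    assert(buf != NULL && consumed != NULL);

    /* Copy a bounded window and terminate it, so that strtod never reads
       past the caller's buffer. */
    memcpy(tmp, buf, n);
    tmp[n] = '\0';

    value = strtod(tmp, &end);
    *consumed = (size_t)(end - tmp);
    return value;
}
```

A caller walking a large mapped file would advance its pointer by *consumed
after each call and stop when *consumed comes back as 0. Each call then costs
one bounded copy, regardless of how large the mapping is.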