From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 21 Jul 2020 12:04:18 +0100
From: Jonathan Wakely <jwakely@redhat.com>
To: Florian Weimer <fweimer@redhat.com>
Cc: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: Re: [committed] libstdc++: Add std::from_chars for floating-point
 types
Message-ID: <20200721110418.GH3215@redhat.com>
References: <20200720225233.GA174400@redhat.com>
 <87y2ndtnr5.fsf@oldenburg2.str.redhat.com>
MIME-Version: 1.0
In-Reply-To: <87y2ndtnr5.fsf@oldenburg2.str.redhat.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Disposition: inline

On 21/07/20 07:56 +0200, Florian Weimer wrote:
>* Jonathan Wakely via Libstdc:
>
>> By replacing the use of strtod we could avoid allocation, avoid changing
>> locale, and use optimised code paths specific to each std::chars_format
>> case. We would also get more portable behaviour, rather than depending
>> on the presence of uselocale, and on any bugs or quirks of the target
>> libc's strtod. Replacing strtod is a project for a later date.
>
>glibc already has strtod_l (since glibc 2.1, undocumented, but declared
>in <stdlib.h>).

Yes, I noticed that in the glibc sources. I decided not to bother
using it because we still need the newlocale and freelocale calls,
which can still potentially allocate memory (although in practice
maybe they don't for the "C" locale?) and because what I committed
should work for any POSIX target.

>What seems to be missing is a function that takes an explicit buffer
>length.  A static reference to the C locale object would be helpful as
>well, I assume.

How expensive is it to do newlocale(LC_ALL_MASK, "C", (locale_t)0),
uselocale, and freelocale?
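
For reference, the sequence being asked about looks roughly like the
sketch below. It is an illustration only, not the committed code:
strtod_in_c_locale is a made-up name, and it assumes a POSIX.1-2008
target where <locale.h> declares locale_t and the three locale calls.

```cpp
#include <locale.h>  // newlocale, uselocale, freelocale (POSIX.1-2008)
#include <stdlib.h>  // strtod

// Hypothetical helper: run strtod with the calling thread temporarily
// switched to the "C" locale. Returns false if the locale object could
// not be created (newlocale may need to allocate).
inline bool strtod_in_c_locale(const char* str, double& value, char** end)
{
    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return false;

    locale_t prev = uselocale(c_loc);  // switch this thread only
    value = strtod(str, end);
    uselocale(prev);                   // restore the previous locale
    freelocale(c_loc);
    return true;
}
```

Every call pays for creating and destroying the locale object, which
is the cost the question is about.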

>Maybe this is sufficiently clean that we can export this for libstdc++'s
>use?  Without repeating the libio mess?

I think we could beat strtod's performance with a handwritten
implementation, so I don't know if it's worth adding glibc extensions
that we would eventually stop using anyway.

std::from_chars takes an enum that says whether the input is in
hex, scientific or fixed format (or 'general' which is
fixed|scientific). Because strtod determines the format itself, we
need to do some preprocessing before calling strtod, to stop it being
too general.
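
A minimal sketch of how those format flags combine, assuming a C++17
<charconv> header. The helper format_allows is a hypothetical name
used only for illustration:

```cpp
#include <charconv>

// The standard defines general as the union of fixed and scientific.
static_assert((std::chars_format::fixed | std::chars_format::scientific)
              == std::chars_format::general,
              "general is fixed|scientific");

// Hypothetical helper: true if every bit of 'flag' is set in 'fmt'.
constexpr bool format_allows(std::chars_format fmt, std::chars_format flag)
{
    return (fmt & flag) == flag;
}
```

So format_allows(std::chars_format::general, std::chars_format::fixed)
is true, while the same test against std::chars_format::hex is false.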

Some examples where strtod does the wrong thing unless we do extra
work before calling it:

"0x1p01" should always produce the result 0, for any format. Because
the hex format is selected by the chars_format argument, hex input has
no "0x" prefix, and if one is present only the leading "0" is parsed.
If we don't truncate the string, strtod produces 2.
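
The truncation could be sketched as below. This is not the committed
implementation: strtod_without_hex_prefix is a hypothetical name, the
copy into a std::string is only for brevity, and it assumes the
program is running in the "C" locale. It also ignores the other
differences from strtod, such as leading whitespace and '+'.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: hide a "0x"/"0X" prefix from strtod so that only
// the leading "0" (or "-0") is parsed.
double strtod_without_hex_prefix(const std::string& in, std::size_t& consumed)
{
    std::string buf = in;
    const std::size_t i = (!buf.empty() && buf[0] == '-') ? 1 : 0;
    if (buf.size() > i + 1 && buf[i] == '0'
        && (buf[i + 1] == 'x' || buf[i + 1] == 'X'))
        buf.erase(i + 1);  // keep everything up to and including the '0'

    char* end = nullptr;
    const double value = std::strtod(buf.c_str(), &end);
    consumed = static_cast<std::size_t>(end - buf.c_str());
    return value;
}
```

For "0x1p01" this returns 0 with consumed == 1, whereas passing the
same string straight to strtod gives 2.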

"0.8p1" should produce 0.8 for fixed and general formats, produce an
error for scientific format (which requires an exponent, and "p1" is
not a decimal exponent), and produce 1 for hex format (hex 0.8 is 0.5,
times 2^1), which means we need to create the string "0x0.8p1" to pass
to strtod. strtod always produces 0.8 for this input.
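
A rough sketch of the hex case, again not the committed code:
strtod_as_hex is a hypothetical name, the std::string buffer is just
the simplest way to build the prefixed input, and the "C" locale is
assumed.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: parse 'in' as hex floating-point by inserting
// the "0x" prefix that strtod needs, after an optional '-'.
// 'consumed' counts characters of the original input (0 means no match).
double strtod_as_hex(const std::string& in, std::size_t& consumed)
{
    const std::size_t i = (!in.empty() && in[0] == '-') ? 1 : 0;
    const std::string buf = in.substr(0, i) + "0x" + in.substr(i);

    char* end = nullptr;
    const double value = std::strtod(buf.c_str(), &end);
    const std::size_t used = static_cast<std::size_t>(end - buf.c_str());
    // Subtract the two inserted characters; if strtod stopped inside
    // the inserted prefix, nothing from the input matched.
    consumed = used > i + 2 ? used - 2 : 0;
    return value;
}
```

With this, "0.8p1" gives 1 and consumes all five characters, while
plain strtod stops at the 'p' and gives 0.8.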

I also noticed some strings give an underflow error with glibc's
strtod, but are valid for the Microsoft implementation. For example,
this one:
https://github.com/microsoft/STL/blob/master/tests/std/tests/P0067R5_charconv/double_from_chars_test_cases.hpp#L265

Without the final '1' digit glibc returns DBL_MIN, but with the final
'1' digit (so a number larger than DBL_MIN) it underflows. Is that
expected?
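
One way to check what a given libc reports for such inputs is to look
at errno after the call. The sketch below uses hypothetical names
(strtod_result, checked_strtod) and deliberately does not hardcode the
test string from the link above:

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical result type: the parsed value plus whether strtod
// reported a range error (overflow or underflow) via errno.
struct strtod_result
{
    double value;
    bool out_of_range;
};

strtod_result checked_strtod(const char* str)
{
    errno = 0;  // strtod only sets errno on failure, so clear it first
    const double value = std::strtod(str, nullptr);
    return { value, errno == ERANGE };
}
```

Calling checked_strtod on the string with and without its final digit
shows whether the target's strtod treats the two differently.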