Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Date: Fri, 29 Nov 2019
 22:11:00 -0000
From: Jonathan Wakely
To: Tom Honermann
Cc: "libstdc++@gcc.gnu.org", gcc-patches
Subject: Re: [PATCH 0/4]: C++ P1423R3 char8_t remediation implementation
Message-ID: <20191129214534.GA11522@redhat.com>
References: <4491be7d-2cea-c437-b991-b1cc43e344ee@honermann.net> <20191129174517.GY11522@redhat.com> <20191129194845.GZ11522@redhat.com>
In-Reply-To: <20191129194845.GZ11522@redhat.com>

On 29/11/19 19:48 +0000, Jonathan Wakely wrote:
>On 29/11/19 17:45 +0000, Jonathan Wakely wrote:
>>On 15/09/19 15:39 -0400, Tom Honermann wrote:
>>>This series of patches provides an implementation of the changes
>>>for C++ proposal P1423R3 [1].
>>>
>>>These changes do not impact default libstdc++ behavior for C++17
>>>and earlier; they are only active for C++2a or when the -fchar8_t
>>>option is specified.
>>>
>>>Tested x86_64-linux.
>>>
>>>Patch 1: Decouple constraints for u8path from path constructors.
>>>Patch 2: Update __cpp_lib_char8_t feature test macro value, add
>>>deleted operators, update u8path.
>>>Patch 3: Updates to existing tests.
>>>Patch 4: New tests.
>>>
>>>Tom.
>>>
>>>[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r3.html
>>
>>It took a while, but I've committed these four patches, with just some
>>minor whitespace changes and changelog tweaks.
>
>Running the new tests revealed a latent bug on Windows, where
>experimental::filesystem::u8path(const Source&) assumed the input
>was an iterator over an NTCTS. That worked for a const char* but not a
>std::string or experimental::string_view.
>
>The attached patch fixes that (and simplifies the #if and if-constexpr
>conditions for Windows) but there's a remaining bug. Constructing an
>experimental::filesystem::path from a char8_t string doesn't do the
>right thing on Windows, so these cases fail:
>
>  fs::path p(u8"\xf0\x9d\x84\x9e");
>  VERIFY( p.u8string() == u8"\U0001D11E" );
>
>  p = fs::u8path(u8"\xf0\x9d\x84\x9e");
>  VERIFY( p.u8string() == u8"\U0001D11E" );
>
>It works correctly for std::filesystem::path, just not the TS version.

I think this is the fix needed for the TS code:

--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -765,7 +765,14 @@ namespace __detail
         {
 #ifdef _GLIBCXX_USE_CHAR8_T
           if constexpr (is_same<_CharT, char8_t>::value)
-            return _S_wconvert((const char*)__f, (const char*)__l, true_type());
+            {
+              const char* __f2 = (const char*)__f;
+              const char* __l2 = (const char*)__l;
+              std::wstring __wstr;
+              std::codecvt_utf8_utf16<wchar_t> __wcvt;
+              if (__str_codecvt_in_all(__f2, __l2, __wstr, __wcvt))
+                return __wstr;
+            }
           else
 #endif
             {

The current code uses std::codecvt, but when we know the input is
UTF-8 encoded we should use codecvt_utf8_utf16 (which is what the
C++17 code already does for char8_t input). I'll add that to the patch
I'm testing.
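
For illustration only (this is not part of the patch), here is a minimal
standalone sketch of the same idea: run UTF-8 input through
std::codecvt_utf8_utf16 and accept the result only if the whole input
converts. The helper name utf8_to_utf16_all is hypothetical, and it uses
char16_t rather than wchar_t so the result is the same on every platform.
Note that the <codecvt> facets are deprecated since C++17, so a compiler
may warn about them.

```cpp
#include <codecvt>   // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <cwchar>    // std::mbstate_t
#include <locale>    // std::codecvt_base
#include <string>
#include <utility>   // std::move

// Hypothetical helper, not a libstdc++ function: convert all of `in`
// (assumed to be UTF-8) to UTF-16 code units in `out`.
// Returns false if the input is not converted completely.
bool utf8_to_utf16_all(const std::string& in, std::u16string& out)
{
  out.clear();
  if (in.empty())
    return true;

  std::codecvt_utf8_utf16<char16_t> cvt;
  std::mbstate_t state{};

  // UTF-16 never needs more code units than UTF-8 needs bytes,
  // so a buffer of in.size() units is always large enough.
  std::u16string buf(in.size(), u'\0');

  const char* from = in.data();
  const char* from_end = from + in.size();
  const char* from_next = from;
  char16_t* to = &buf[0];
  char16_t* to_end = to + buf.size();
  char16_t* to_next = to;

  const auto res
    = cvt.in(state, from, from_end, from_next, to, to_end, to_next);
  if (res != std::codecvt_base::ok || from_next != from_end)
    return false;

  // Trim the buffer to the number of code units actually written.
  buf.resize(static_cast<std::u16string::size_type>(to_next - to));
  out = std::move(buf);
  return true;
}
```

Calling utf8_to_utf16_all("\xf0\x9d\x84\x9e", out) should leave the two
code units 0xD834 0xDD1E in out, i.e. the surrogate pair for U+1D11E,
which is the character used in the failing tests above.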