Subject: [PATCH v2] libstdc++: Fix Unicode codecvt and add tests [PR86419]
From: Dimitrij Mijoski
To: gcc-patches@gcc.gnu.org, libstdc++@gcc.gnu.org
Date: Tue, 10 Jan 2023 13:58:59 +0100

Fixes the conversion from UTF-8 to UTF-16 to properly return partial instead of ok.
Fixes the conversion from UTF-16 to UTF-8 to properly return partial instead of ok.
Fixes the conversion from UTF-8 to UCS-2 to properly return partial instead of error.
Fixes the conversion from UTF-8 to UCS-2 to treat 4-byte UTF-8 sequences as an error just by seeing the leading byte.
Fixes UTF-8 decoding for all codecvts so they detect an error at the end of the input range when the last code point is also incomplete.

libstdc++-v3/ChangeLog:

	PR libstdc++/86419
	* src/c++11/codecvt.cc: Fix bugs.
	* testsuite/22_locale/codecvt/codecvt_unicode.cc: New tests.
	* testsuite/22_locale/codecvt/codecvt_unicode.h: New tests.
	* testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc: New tests.
---
 libstdc++-v3/src/c++11/codecvt.cc             |   38 +-
 .../22_locale/codecvt/codecvt_unicode.cc      |   68 +
 .../22_locale/codecvt/codecvt_unicode.h       | 1268 +++++++++++++++++
 .../codecvt/codecvt_unicode_wchar_t.cc        |   59 +
 4 files changed, 1414 insertions(+), 19 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc
 create mode 100644 libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h
 create mode 100644 libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc

diff --git a/libstdc++-v3/src/c++11/codecvt.cc b/libstdc++-v3/src/c++11/codecvt.cc
index 9f8cb7677..49282a510 100644
--- a/libstdc++-v3/src/c++11/codecvt.cc
+++ b/libstdc++-v3/src/c++11/codecvt.cc
@@ -277,13 +277,15 @@ namespace
         }
       else if (c1 < 0xF0) // 3-byte sequence
         {
-          if (avail < 3)
+          if (avail < 2)
             return incomplete_mb_character;
           char32_t c2 = (unsigned char) from[1];
           if ((c2 & 0xC0) != 0x80)
             return invalid_mb_sequence;
           if (c1 == 0xE0 && c2 < 0xA0) // overlong
             return invalid_mb_sequence;
+          if (avail < 3)
+            return incomplete_mb_character;
           char32_t c3 = (unsigned char) from[2];
           if ((c3 & 0xC0) != 0x80)
             return invalid_mb_sequence;
@@ -292,9 +294,9 @@ namespace
           from += 3;
           return c;
         }
-      else if (c1 < 0xF5) // 4-byte sequence
+      else if (c1 < 0xF5 && maxcode > 0xFFFF) // 4-byte sequence
         {
-          if (avail < 4)
+          if (avail < 2)
             return incomplete_mb_character;
           char32_t c2 = (unsigned char) from[1];
           if ((c2 & 0xC0) != 0x80)
@@ -302,10 +304,14 @@ namespace
           if (c1 == 0xF0 && c2 < 0x90) // overlong
             return invalid_mb_sequence;
           if (c1 == 0xF4 && c2 >= 0x90) // > U+10FFFF
-              return invalid_mb_sequence;
+            return invalid_mb_sequence;
+          if (avail < 3)
+            return incomplete_mb_character;
           char32_t c3 = (unsigned char) from[2];
           if ((c3 & 0xC0) != 0x80)
             return invalid_mb_sequence;
+          if (avail < 4)
+            return incomplete_mb_character;
           char32_t c4 = (unsigned char) from[3];
           if ((c4 & 0xC0) != 0x80)
             return invalid_mb_sequence;
@@ -527,12 +533,11 @@ namespace
   // Flag indicating whether to process UTF-16 or UCS2
   enum class surrogates { allowed, disallowed };
 
-  // utf8 -> utf16 (or utf8 -> ucs2 if s == surrogates::disallowed)
-  template
-  codecvt_base::result
-  utf16_in(range& from, range& to,
-           unsigned long maxcode = max_code_point, codecvt_mode mode = {},
-           surrogates s = surrogates::allowed)
+  // utf8 -> utf16 (or utf8 -> ucs2 if maxcode <= 0xFFFF)
+  template
+  codecvt_base::result utf16_in (range &from, range &to,
+                                 unsigned long maxcode = max_code_point,
+                                 codecvt_mode mode = {})
   {
     read_utf8_bom(from, mode);
     while (from.size() && to.size())
@@ -540,12 +545,7 @@ namespace
         auto orig = from;
         const char32_t codepoint = read_utf8_code_point(from, maxcode);
         if (codepoint == incomplete_mb_character)
-          {
-            if (s == surrogates::allowed)
-              return codecvt_base::partial;
-            else
-              return codecvt_base::error; // No surrogates in UCS2
-          }
+          return codecvt_base::partial;
         if (codepoint > maxcode)
           return codecvt_base::error;
         if (!write_utf16_code_point(to, codepoint, mode))
@@ -554,7 +554,7 @@ namespace
             return codecvt_base::partial;
           }
       }
-    return codecvt_base::ok;
+    return from.size () ? codecvt_base::partial : codecvt_base::ok;
   }
 
   // utf16 -> utf8 (or ucs2 -> utf8 if s == surrogates::disallowed)
@@ -576,7 +576,7 @@ namespace
               return codecvt_base::error; // No surrogates in UCS-2
 
             if (from.size() < 2)
-              return codecvt_base::ok; // stop converting at this point
+              return codecvt_base::partial; // stop converting at this point
 
             const char32_t c2 = from[1];
             if (is_low_surrogate(c2))
@@ -629,7 +629,7 @@ namespace
   {
     // UCS-2 only supports characters in the BMP, i.e. one UTF-16 code unit:
     maxcode = std::min(max_single_utf16_unit, maxcode);
-    return utf16_in(from, to, maxcode, mode, surrogates::disallowed);
+    return utf16_in (from, to, maxcode, mode);
   }
 
   // ucs2 -> utf8
diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc
new file mode 100644
index 000000000..ae4b6c896
--- /dev/null
+++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc
@@ -0,0 +1,68 @@
+// Copyright (C) 2020-2023 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3. If not see
+// .
+
+// { dg-do run { target c++11 } }
+
+#include "codecvt_unicode.h"
+
+#include
+
+using namespace std;
+
+void
+test_utf8_utf32_codecvts ()
+{
+  using codecvt_c32 = codecvt;
+  auto loc_c = locale::classic ();
+  VERIFY (has_facet (loc_c));
+  auto &cvt = use_facet (loc_c);
+  test_utf8_utf32_codecvts (cvt);
+
+  auto cvt_ptr = to_unique_ptr (new codecvt_utf8 ());
+  test_utf8_utf32_codecvts (*cvt_ptr);
+}
+
+void
+test_utf8_utf16_codecvts ()
+{
+  using codecvt_c16 = codecvt;
+  auto loc_c = locale::classic ();
+  VERIFY (has_facet (loc_c));
+  auto &cvt = use_facet (loc_c);
+  test_utf8_utf16_cvts (cvt);
+
+  auto cvt_ptr = to_unique_ptr (new codecvt_utf8_utf16 ());
+  test_utf8_utf16_cvts (*cvt_ptr);
+
+  auto cvt_ptr2 = to_unique_ptr (new codecvt_utf8_utf16 ());
+  test_utf8_utf16_cvts (*cvt_ptr2);
+}
+
+void
+test_utf8_ucs2_codecvts ()
+{
+  auto cvt_ptr = to_unique_ptr (new codecvt_utf8 ());
+  test_utf8_ucs2_cvts (*cvt_ptr);
+}
+
+int
+main ()
+{
+  test_utf8_utf32_codecvts ();
+  test_utf8_utf16_codecvts ();
+  test_utf8_ucs2_codecvts ();
+}
diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h
new file mode 100644
index 000000000..70d079286
--- /dev/null
+++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h
@@ -0,0 +1,1268 @@
+// Copyright (C) 2020-2023 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3. If not see
+// .
+
+#include
+#include
+#include
+
+template
+std::unique_ptr
+to_unique_ptr (T *ptr)
+{
+  return std::unique_ptr (ptr);
+}
+
+struct test_offsets_ok
+{
+  size_t in_size, out_size;
+};
+struct test_offsets_partial
+{
+  size_t in_size, out_size, expected_in_next, expected_out_next;
+};
+
+template struct test_offsets_error
+{
+  size_t in_size, out_size, expected_in_next, expected_out_next;
+  CharT replace_char;
+  size_t replace_pos;
+};
+
+template
+auto constexpr array_size (const T (&)[N]) -> size_t
+{
+  return N;
+}
+
+template
+void
+utf8_to_utf32_in_ok (const std::codecvt &cvt)
+{
+  using namespace std;
+  // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP
+  const char in[] = "bш\uAAAA\U0010AAAA";
+  const char32_t exp_literal[] = U"bш\uAAAA\U0010AAAA";
+  CharT exp[array_size (exp_literal)] = {};
+  std::copy (begin (exp_literal), end (exp_literal), begin (exp));
+
+  static_assert (array_size (in) == 11, "");
+  static_assert (array_size (exp_literal) == 5, "");
+  static_assert (array_size (exp) == 5, "");
+  VERIFY (char_traits::length (in) == 10);
+  VERIFY (char_traits::length (exp_literal) == 4);
+  VERIFY (char_traits::length (exp) == 4);
+
+  test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {3, 2}, {6, 3}, {10, 4}};
+  for (auto t : offsets)
+    {
+      CharT out[array_size (exp) - 1] = {};
+      VERIFY (t.in_size <= array_size (in));
+      VERIFY (t.out_size <= array_size (out));
+      auto state = mbstate_t{};
+      auto in_next = (const char *) nullptr;
+      auto out_next = (CharT *) nullptr;
+      auto res = codecvt_base::result ();
+
+      res = cvt.in (state, in, in + t.in_size, in_next, out, out + t.out_size,
+                    out_next);
+      VERIFY (res == cvt.ok);
+      VERIFY (in_next == in + t.in_size);
+      VERIFY (out_next == out + t.out_size);
+      VERIFY (char_traits::compare (out, exp, t.out_size) == 0);
+      if (t.out_size < array_size (out))
+        VERIFY (out[t.out_size] == 0);
+    }
+
+  for (auto t : offsets)
+    {
+      CharT out[array_size (exp)] = {};
+      VERIFY (t.in_size <= array_size (in));
+      VERIFY (t.out_size <= array_size (out));
+      auto state = mbstate_t{};
+      auto in_next = (const char *) nullptr;
+      auto out_next = (CharT *) nullptr;
+      auto res = codecvt_base::result ();
+
+      res
+        = cvt.in (state, in, in + t.in_size, in_next, out, end (out), out_next);
+      VERIFY (res == cvt.ok);
+      VERIFY (in_next == in + t.in_size);
+      VERIFY (out_next == out + t.out_size);
+      VERIFY (char_traits::compare (out, exp, t.out_size) == 0);
+      if (t.out_size < array_size (out))
+        VERIFY (out[t.out_size] == 0);
+    }
+}
+
+template
+void
+utf8_to_utf32_in_partial (const std::codecvt &cvt)
+{
+  using namespace std;
+  // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP
+  const char in[] = "bш\uAAAA\U0010AAAA";
+  const char32_t exp_literal[] = U"bш\uAAAA\U0010AAAA";
+  CharT exp[array_size (exp_literal)] = {};
+  std::copy (begin (exp_literal), end (exp_literal), begin (exp));
+
+  static_assert (array_size (in) == 11, "");
+  static_assert (array_size (exp_literal) == 5, "");
+  static_assert (array_size (exp) == 5, "");
+  VERIFY (char_traits::length (in) == 10);
+  VERIFY (char_traits::length (exp_literal) == 4);
+  VERIFY (char_traits::length (exp) == 4);
+
+  test_offsets_partial offsets[] = {
+    {1, 0, 0, 0}, // no space for first CP
+
+    {3, 1, 1, 1}, // no space for second CP
+    {2, 2, 1, 1}, // incomplete second CP
+    {2, 1, 1, 1}, // incomplete second CP, and no space for it
+
+    {6, 2, 3, 2}, // no space for third CP
+    {4, 3, 3, 2}, // incomplete third CP
+    {5, 3, 3, 2}, // incomplete third CP
+    {4, 2, 3, 2}, // incomplete third CP, and no space for it
+    {5, 2, 3, 2}, // incomplete third CP, and no space for it
+
+    {10, 3, 6, 3}, // no space for fourth CP
+    {7, 4, 6, 3},  // incomplete fourth CP
+    {8, 4, 6, 3},  // incomplete fourth CP
+    {9, 4, 6, 3},  // incomplete fourth CP
+    {7, 3, 6, 3},  // incomplete fourth CP, and no space for it
+    {8, 3, 6, 3},  // incomplete fourth CP, and no space for it
+    {9, 3, 6, 3},  // incomplete fourth CP, and no space for it
+  };
+
+  for (auto t : offsets)
+    {
+      CharT out[array_size (exp) - 1] = {};
+      VERIFY (t.in_size <= array_size (in));
+      VERIFY (t.out_size <= array_size (out));
+      VERIFY (t.expected_in_next <= t.in_size);
+      VERIFY (t.expected_out_next <= t.out_size);
+      auto state = mbstate_t{};
+      auto in_next = (const char *) nullptr;
+      auto out_next = (CharT *) nullptr;
+      auto res = codecvt_base::result ();
+
+      res = cvt.in (state, in, in + t.in_size, in_next, out, out + t.out_size,
+                    out_next);
+      VERIFY (res == cvt.partial);
+      VERIFY (in_next == in + t.expected_in_next);
+      VERIFY (out_next == out + t.expected_out_next);
+      VERIFY (char_traits::compare (out, exp, t.expected_out_next) == 0);
+      if (t.expected_out_next < array_size (out))
+        VERIFY (out[t.expected_out_next] == 0);
+    }
+}
+
+template
+void
+utf8_to_utf32_in_error (const std::codecvt &cvt)
+{
+  using namespace std;
+  // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP
+  const char valid_in[] = "bш\uAAAA\U0010AAAA";
+  const char32_t exp_literal[] = U"bш\uAAAA\U0010AAAA";
+  CharT exp[array_size (exp_literal)] = {};
+  std::copy (begin (exp_literal), end (exp_literal), begin (exp));
+
+  static_assert (array_size (valid_in) == 11, "");
+  static_assert (array_size (exp_literal) == 5, "");
+  static_assert (array_size (exp) == 5, "");
+  VERIFY (char_traits::length (valid_in) == 10);
+  VERIFY (char_traits::length (exp_literal) == 4);
+  VERIFY (char_traits::length (exp) == 4);
+
+  test_offsets_error offsets[] = {
+
+    // replace leading byte with invalid byte
+    {1, 4, 0, 0, '\xFF', 0},
+    {3, 4, 1, 1, '\xFF', 1},
+    {6, 4, 3, 2, '\xFF', 3},
+    {10, 4, 6, 3, '\xFF', 6},
+
+    // replace first trailing byte with ASCII byte
+    {3, 4, 1, 1, 'z', 2},
+    {6, 4, 3, 2, 'z', 4},
+    {10, 4, 6, 3, 'z', 7},
+
+    // replace first trailing byte with invalid byte
+    {3, 4, 1, 1, '\xFF', 2},
+    {6, 4, 3, 2, '\xFF', 4},
+    {10, 4, 6, 3, '\xFF', 7},
+
+    // replace second trailing byte with ASCII byte
+    {6, 4, 3, 2, 'z', 5},
+    {10, 4, 6, 3, 'z', 8},
+
+    // replace second trailing byte with invalid byte
+    {6, 4, 3, 2, '\xFF', 5},
+    {10, 4, 6, 3, '\xFF', 8},
+
+    // replace third trailing byte
+    {10, 4, 6, 3, 'z', 9},
+    {10, 4, 6, 3, '\xFF', 9},
+
+    // replace first trailing byte with ASCII byte, also incomplete at end
+    {5, 4, 3, 2, 'z', 4},
+    {8, 4, 6, 3, 'z', 7},
+    {9, 4, 6, 3, 'z', 7},
+
+    // replace first trailing byte with invalid byte, also incomplete at end
+    {5, 4, 3, 2, '\xFF', 4},
+    {8, 4, 6, 3, '\xFF', 7},
+    {9, 4, 6, 3, '\xFF', 7},
+
+    // replace second trailing byte with ASCII byte, also incomplete at end
+    {9, 4, 6, 3, 'z', 8},
+
+    // replace second trailing byte with invalid byte, also incomplete at end
+    {9, 4, 6, 3, '\xFF', 8},
+  };
+  for (auto t : offsets)
+    {
+      char in[array_size (valid_in)] = {};
+      CharT out[array_size (exp) - 1] = {};
+      VERIFY (t.in_size <= array_size (in));
+      VERIFY (t.out_size <= array_size (out));
+      VERIFY (t.expected_in_next <= t.in_size);
+      VERIFY (t.expected_out_next <= t.out_size);
+      char_traits::copy (in, valid_in, array_size (valid_in));
+      in[t.replace_pos] = t.replace_char;
+
+      auto state = mbstate_t{};
+      auto in_next = (const char *) nullptr;
+      auto out_next = (CharT *) nullptr;
+      auto res = codecvt_base::result ();
+
+      res = cvt.in (state, in, in + t.in_size, in_next, out, out + t.out_size,
+                    out_next);
+      VERIFY (res == cvt.error);
+      VERIFY (in_next == in + t.expected_in_next);
+      VERIFY (out_next == out + t.expected_out_next);
+      VERIFY (char_traits::compare (out, exp, t.expected_out_next) == 0);
+      if (t.expected_out_next < array_size (out))
+        VERIFY (out[t.expected_out_next] == 0);
+    }
+}
+
+template
+void
+utf8_to_utf32_in (const std::codecvt &cvt)
+{
+  utf8_to_utf32_in_ok (cvt);
+  utf8_to_utf32_in_partial (cvt);
+  utf8_to_utf32_in_error (cvt);
+}
+
+template
+void
+utf32_to_utf8_out_ok (const std::codecvt &cvt)
+{
+  using namespace std;
+  // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP
+  const char32_t in_literal[] = U"bш\uAAAA\U0010AAAA";
+  const char exp[] = "bш\uAAAA\U0010AAAA";
+  CharT in[array_size (in_literal)] = {};
+  copy (begin (in_literal), end (in_literal), begin (in));
+
+  static_assert (array_size (in_literal) == 5, "");
+  static_assert (array_size (in) == 5, "");
+  static_assert (array_size (exp) == 11, "");
+  VERIFY (char_traits::length (in_literal) == 4);
+  VERIFY (char_traits::length (in) == 4);
+  VERIFY (char_traits::length (exp) == 10);
+
+  const test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {2, 3}, {3, 6}, {4, 10}};
+  for (auto t : offsets)
+    {
+      char out[array_size (exp) - 1] = {};
+      VERIFY (t.in_size <= array_size (in));
+      VERIFY (t.out_size <= array_size (out));
+      auto state = mbstate_t{};
+      auto in_next = (const CharT *) nullptr;
+      auto out_next = (char *) nullptr;
+      auto res = codecvt_base::result ();
+
+      res = cvt.out (state, in, in + t.in_size, in_next, out, out + t.out_size,
+                     out_next);
+      VERIFY (res == cvt.ok);
+      VERIFY (in_next == in + t.in_size);
+      VERIFY (out_next == out + t.out_size);
+      VERIFY (char_traits::compare (out, exp, t.out_size) == 0);
+      if (t.out_size < array_size (out))
+        VERIFY (out[t.out_size] == 0);
+    }
+}
+
+template
+void
+utf32_to_utf8_out_partial (const std::codecvt &cvt)
+{
+  using namespace std;
+  // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP
+  const char32_t in_literal[] = U"bш\uAAAA\U0010AAAA";
+  const char exp[] = "bш\uAAAA\U0010AAAA";
+  CharT in[array_size (in_literal)] = {};
+  copy (begin (in_literal), end (in_literal), begin (in));
+
+  static_assert (array_size (in_literal) == 5, "");
+  static_assert (array_size (in) == 5, "");
+  static_assert (array_size (exp) == 11, "");
+  VERIFY (char_traits::length (in_literal) == 4);
+  VERIFY (char_traits::length (in) == 4);
+  VERIFY (char_traits::length (exp) == 10);
+
+  const test_offsets_partial offsets[] = {
+    {1, 0, 0, 0}, // no space for first CP
+
+    {2, 1, 1, 1}, // no space for second CP
+    {2, 2, 1, 1}, // no space for second CP
+
+    {3, 3, 2, 3}, // no space for third CP
+    {3, 4, 2, 3}, // no space for third CP
+    {3, 5, 2, 3}, // no space for third CP
+
+    {4, 6, 3, 6}, // no space for fourth CP
+    {4, 7, 3, 6}, // no space for fourth CP
+    {4, 8, 3, 6}, // no space for fourth CP
+    {4, 9, 3, 6}, // no space for fourth CP
+  };
+  for (auto t : offsets)
+    {
+      char out[array_size (exp) - 1] = {};
+      VERIFY (t.in_size <= array_size (in));
+      VERIFY (t.out_size <= array_size (out));
+      VERIFY (t.expected_in_next <= t.in_size);
+      VERIFY (t.expected_out_next <= t.out_size);
+      auto state = mbstate_t{};
+      auto in_next = (const CharT *) nullptr;
+      auto out_next = (char *) nullptr;
+      auto res = codecvt_base::result ();
+
+      res = cvt.out (state, in, in + t.in_size, in_next, out, out + t.out_size,
+                     out_next);
+      VERIFY (res == cvt.partial);
+      VERIFY (in_next == in + t.expected_in_next);
+      VERIFY (out_next == out + t.expected_out_next);
+      VERIFY (char_traits::compare (out, exp, t.expected_out_next) == 0);
+      if (t.expected_out_next < array_size (out))
+        VERIFY (out[t.expected_out_next] == 0);
+    }
+}
+
+template
+void
+utf32_to_utf8_out_error (const std::codecvt &cvt)
+{
+  using namespace std;
+  const char32_t valid_in[] = U"bш\uAAAA\U0010AAAA";
+  const char exp[] = "bш\uAAAA\U0010AAAA";
+
+  static_assert (array_size (valid_in) == 5, "");
+  static_assert (array_size (exp) == 11, "");
+  VERIFY (char_traits::length (valid_in) == 4);
+  VERIFY (char_traits::length (exp) == 10);
+
+  test_offsets_error offsets[] = {{4, 10, 0, 0, 0x00110000, 0},
+                                  {4, 10, 1, 1, 0x00110000, 1},
+                                  {4, 10, 2, 3, 0x00110000, 2},
+                                  {4, 10, 3, 6, 0x00110000, 3}};
+
+  for (auto t : offsets)
+    {
+      CharT in[array_size (valid_in)] = {};
+      char out[array_size (exp) - 1] = {};
+      VERIFY (t.in_size <= array_size (in));
+      VERIFY (t.out_size <= array_size (out));
+      VERIFY (t.expected_in_next <= t.in_size);
+      VERIFY (t.expected_out_next <= t.out_size);
+      copy (begin (valid_in), end (valid_in), begin (in));
+      in[t.replace_pos] = t.replace_char;
+
+      auto state = mbstate_t{};
+      auto in_next = (const CharT *) nullptr;
+      auto out_next = (char *) nullptr;
+      auto res = codecvt_base::result ();
+
+      res = cvt.out (state, in, in + t.in_size, in_next, out, out + t.out_size,
+                     out_next);
+      VERIFY (res == cvt.error);
+      VERIFY (in_next == in + t.expected_in_next);
+      VERIFY (out_next == out + t.expected_out_next);
+      VERIFY (char_traits::compare (out, exp, t.expected_out_next) == 0);
+      if (t.expected_out_next < array_size (out))
+        VERIFY (out[t.expected_out_next] == 0);
+    }
+}
+
+template
+void
+utf32_to_utf8_out (const std::codecvt &cvt)
+{
+  utf32_to_utf8_out_ok (cvt);
+  utf32_to_utf8_out_partial (cvt);
+  utf32_to_utf8_out_error (cvt);
+}
+
+template
+void
+test_utf8_utf32_codecvts (const std::codecvt &cvt)
+{
+  utf8_to_utf32_in (cvt);
+  utf32_to_utf8_out (cvt);
+}
+
+template
+void
+utf8_to_utf16_in_ok (const std::codecvt &cvt)
+{
+  using namespace std;
+  // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP
+  const char in[] = "bш\uAAAA\U0010AAAA";
+  const char16_t exp_literal[] = u"bш\uAAAA\U0010AAAA";
+  CharT exp[array_size (exp_literal)] = {};
+  copy (begin (exp_literal), end (exp_literal), begin (exp));
+
+  static_assert (array_size (in) == 11, "");
+  static_assert (array_size (exp_literal) == 6, "");
+  static_assert (array_size (exp) == 6, "");
+  VERIFY (char_traits::length (in) == 10);
+  VERIFY (char_traits::length (exp_literal) == 5);
+  VERIFY (char_traits::length (exp) == 5);
+
+  test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {3, 2}, {6, 3}, {10, 5}};
+  for (auto t : offsets)
+    {
+      CharT out[array_size (exp) - 1] = {};
+      VERIFY (t.in_size <= array_size (in));
+      VERIFY (t.out_size <= array_size (out));
+      auto state = mbstate_t{};
+      auto in_next = (const char *) nullptr;
+      auto out_next = (CharT *) nullptr;
+      auto res = codecvt_base::result ();
+
+      res = cvt.in (state, in, in + t.in_size, in_next, out, out + t.out_size,
+                    out_next);
+      VERIFY (res == cvt.ok);
+      VERIFY (in_next == in + t.in_size);
+      VERIFY (out_next == out + t.out_size);
+      VERIFY (char_traits::compare (out, exp, t.out_size) == 0);
+      if (t.out_size < array_size (out))
+        VERIFY (out[t.out_size] == 0);
+    }
+
+  for (auto t : offsets)
+    {
+      CharT out[array_size (exp)] = {};
+      VERIFY (t.in_size <= array_size (in));
+      VERIFY (t.out_size <= array_size (out));
+      auto state = mbstate_t{};
+      auto in_next = (const char *) nullptr;
+      auto out_next = (CharT *) nullptr;
+      auto res = codecvt_base::result ();
+
+      res
+        = cvt.in (state, in, in + t.in_size, in_next, out, end (out), out_next);
+      VERIFY (res == cvt.ok);
+      VERIFY (in_next == in + t.in_size);
+      VERIFY (out_next == out + t.out_size);
+      VERIFY (char_traits::compare (out, exp, t.out_size) == 0);
+      if (t.out_size < array_size (out))
+        VERIFY (out[t.out_size] == 0);
+    }
+}
+
+template
+void
+utf8_to_utf16_in_partial (const std::codecvt &cvt)
+{
+  using namespace std;
+  // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP
+  const char in[] = "bш\uAAAA\U0010AAAA";
+  const char16_t exp_literal[] = u"bш\uAAAA\U0010AAAA";
+  CharT exp[array_size (exp_literal)] = {};
+  copy (begin (exp_literal), end (exp_literal), begin (exp));
+
+  static_assert (array_size (in) == 11, "");
+  static_assert (array_size (exp_literal) == 6, "");
+  static_assert (array_size (exp) == 6, "");
+  VERIFY (char_traits::length (in) == 10);
+  VERIFY (char_traits::length (exp_literal) == 5);
+  VERIFY (char_traits::length (exp) == 5);
+
+  test_offsets_partial offsets[] = {
+    {1, 0, 0, 0}, // no space for first CP
+
+    {3, 1, 1, 1}, // no space for second CP
+    {2, 2, 1, 1}, // incomplete second CP
+    {2, 1, 1, 1}, // incomplete second CP, and no space for it
+
+    {6, 2, 3, 2}, // no space for third CP
+    {4, 3, 3, 2}, // incomplete third CP
+    {5, 3, 3, 2}, // incomplete third CP
+    {4, 2, 3, 2}, // incomplete third CP, and no space for it
+    {5, 2, 3, 2}, // incomplete third CP, and no space for it
+
+    {10, 3, 6, 3}, // no space for fourth CP
+    {10, 4, 6, 3}, // no space for fourth CP
+    {7, 5, 6, 3},  // incomplete fourth CP
+    {8, 5, 6, 3},  // incomplete fourth CP
+    {9, 5, 6, 3},  // incomplete fourth CP
+    {7, 3, 6, 3},  // incomplete fourth CP, and no space for it
+    {8, 3, 6, 3},  // incomplete fourth CP, and no space for it
+    {9, 3, 6, 3},  // incomplete fourth CP, and no space for it
+    {7, 4, 6, 3},  // incomplete fourth CP, and no space for it
+    {8, 4, 6, 3},  // incomplete fourth CP, and no space for it
+    {9, 4, 6, 3},  // incomplete fourth CP, and no space for it
+
+  };
+
+  for (auto t : offsets)
+    {
+      CharT out[array_size (exp) - 1] = {};
+      VERIFY (t.in_size <= array_size (in));
+      VERIFY (t.out_size <= array_size (out));
+      VERIFY (t.expected_in_next <= t.in_size);
+      VERIFY (t.expected_out_next <= t.out_size);
+      auto state = mbstate_t{};
+      auto in_next = (const char *) nullptr;
+      auto out_next = (CharT *) nullptr;
+      auto res = codecvt_base::result ();
+
+      res = cvt.in (state, in, in + t.in_size, in_next, out, out + t.out_size,
+                    out_next);
+      VERIFY (res == cvt.partial);
+      VERIFY (in_next == in + t.expected_in_next);
+      VERIFY (out_next == out + t.expected_out_next);
+      VERIFY (char_traits::compare (out, exp, t.expected_out_next) == 0);
+      if (t.expected_out_next < array_size (out))
+        VERIFY (out[t.expected_out_next] == 0);
+    }
+}
+
+template
+void
+utf8_to_utf16_in_error (const std::codecvt &cvt)
+{
+  using namespace std;
+  const char valid_in[] = "bш\uAAAA\U0010AAAA";
+  const char16_t exp_literal[] = u"bш\uAAAA\U0010AAAA";
+  CharT exp[array_size (exp_literal)] = {};
+  copy (begin (exp_literal), end (exp_literal), begin (exp));
+
+  static_assert (array_size (valid_in) == 11, "");
+  static_assert (array_size (exp_literal) == 6, "");
+  static_assert (array_size (exp) == 6, "");
+  VERIFY (char_traits::length (valid_in) == 10);
+  VERIFY (char_traits::length (exp_literal) == 5);
+  VERIFY (char_traits::length (exp) == 5);
+
+  test_offsets_error offsets[] = {
+
+    // replace leading byte with invalid byte
+    {1, 5, 0, 0, '\xFF', 0},
+    {3, 5, 1, 1, '\xFF', 1},
+    {6, 5, 3, 2, '\xFF', 3},
+    {10, 5, 6, 3, '\xFF', 6},
+
+    // replace first trailing byte with ASCII byte
+    {3, 5, 1, 1, 'z', 2},
+    {6, 5, 3, 2, 'z', 4},
+    {10, 5, 6, 3, 'z', 7},
+
+    // replace first trailing byte with invalid byte
+    {3, 5, 1, 1, '\xFF', 2},
+    {6, 5, 3, 2, '\xFF', 4},
+    {10, 5, 6, 3, '\xFF', 7},
+
+    // replace second trailing byte with ASCII byte
+    {6, 5, 3, 2, 'z', 5},
+    {10, 5, 6, 3, 'z', 8},
+
+    // replace second trailing byte with invalid byte
+    {6, 5, 3, 2, '\xFF', 5},
+    {10, 5, 6, 3, '\xFF', 8},
+
+    // replace third trailing byte
+    {10, 5, 6, 3, 'z', 9},
+    {10, 5, 6, 3, '\xFF', 9},
+
+    // replace first trailing byte with ASCII byte, also incomplete at end
+    {5, 5, 3, 2, 'z', 4},
+    {8, 5, 6, 3, 'z', 7},
+    {9, 5, 6, 3, 'z', 7},
+
+    // replace first trailing byte with invalid byte, also incomplete at end
+    {5, 5, 3, 2, '\xFF', 4},
+    {8, 5, 6, 3, '\xFF', 7},
+    {9, 5, 6, 3, '\xFF', 7},
+
+    // replace second trailing byte with ASCII byte, also incomplete at end
+    {9, 5, 6, 3, 'z', 8},
+
+    // replace second trailing byte with invalid byte, also incomplete at end
+    {9, 5, 6,
3, '\xFF', 8}, + }; + for (auto t : offsets) + { + char in[array_size (valid_in)] =3D {}; + CharT out[array_size (exp) - 1] =3D {}; + VERIFY (t.in_size <=3D array_size (in)); + VERIFY (t.out_size <=3D array_size (out)); + VERIFY (t.expected_in_next <=3D t.in_size); + VERIFY (t.expected_out_next <=3D t.out_size); + char_traits::copy (in, valid_in, array_size (valid_in)); + in[t.replace_pos] =3D t.replace_char; + + auto state =3D mbstate_t{}; + auto in_next =3D (const char *) nullptr; + auto out_next =3D (CharT *) nullptr; + auto res =3D codecvt_base::result (); + + res =3D cvt.in (state, in, in + t.in_size, in_next, out, out + t.out= _size, + out_next); + VERIFY (res =3D=3D cvt.error); + VERIFY (in_next =3D=3D in + t.expected_in_next); + VERIFY (out_next =3D=3D out + t.expected_out_next); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) = =3D=3D 0); + if (t.expected_out_next < array_size (out)) + VERIFY (out[t.expected_out_next] =3D=3D 0); + } +} + +template +void +utf8_to_utf16_in (const std::codecvt &cvt) +{ + utf8_to_utf16_in_ok (cvt); + utf8_to_utf16_in_partial (cvt); + utf8_to_utf16_in_error (cvt); +} + +template +void +utf16_to_utf8_out_ok (const std::codecvt &cvt) +{ + using namespace std; + // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP + const char16_t in_literal[] =3D u"b=D1=88\uAAAA\U0010AAAA"; + const char exp[] =3D "b=D1=88\uAAAA\U0010AAAA"; + CharT in[array_size (in_literal)]; + copy (begin (in_literal), end (in_literal), begin (in)); + + static_assert (array_size (in_literal) =3D=3D 6, ""); + static_assert (array_size (exp) =3D=3D 11, ""); + static_assert (array_size (in) =3D=3D 6, ""); + VERIFY (char_traits::length (in_literal) =3D=3D 5); + VERIFY (char_traits::length (exp) =3D=3D 10); + VERIFY (char_traits::length (in) =3D=3D 5); + + const test_offsets_ok offsets[] =3D {{0, 0}, {1, 1}, {2, 3}, {3, 6}, {5,= 10}}; + for (auto t : offsets) + { + char out[array_size (exp) - 1] =3D {}; + VERIFY (t.in_size <=3D array_size 
(in)); + VERIFY (t.out_size <=3D array_size (out)); + auto state =3D mbstate_t{}; + auto in_next =3D (const CharT *) nullptr; + auto out_next =3D (char *) nullptr; + auto res =3D codecvt_base::result (); + + res =3D cvt.out (state, in, in + t.in_size, in_next, out, out + t.ou= t_size, + out_next); + VERIFY (res =3D=3D cvt.ok); + VERIFY (in_next =3D=3D in + t.in_size); + VERIFY (out_next =3D=3D out + t.out_size); + VERIFY (char_traits::compare (out, exp, t.out_size) =3D=3D 0); + if (t.out_size < array_size (out)) + VERIFY (out[t.out_size] =3D=3D 0); + } +} + +template +void +utf16_to_utf8_out_partial (const std::codecvt &cvt= ) +{ + using namespace std; + // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP + const char16_t in_literal[] =3D u"b=D1=88\uAAAA\U0010AAAA"; + const char exp[] =3D "b=D1=88\uAAAA\U0010AAAA"; + CharT in[array_size (in_literal)]; + copy (begin (in_literal), end (in_literal), begin (in)); + + static_assert (array_size (in_literal) =3D=3D 6, ""); + static_assert (array_size (exp) =3D=3D 11, ""); + static_assert (array_size (in) =3D=3D 6, ""); + VERIFY (char_traits::length (in_literal) =3D=3D 5); + VERIFY (char_traits::length (exp) =3D=3D 10); + VERIFY (char_traits::length (in) =3D=3D 5); + + const test_offsets_partial offsets[] =3D { + {1, 0, 0, 0}, // no space for first CP + + {2, 1, 1, 1}, // no space for second CP + {2, 2, 1, 1}, // no space for second CP + + {3, 3, 2, 3}, // no space for third CP + {3, 4, 2, 3}, // no space for third CP + {3, 5, 2, 3}, // no space for third CP + + {5, 6, 3, 6}, // no space for fourth CP + {5, 7, 3, 6}, // no space for fourth CP + {5, 8, 3, 6}, // no space for fourth CP + {5, 9, 3, 6}, // no space for fourth CP + + {4, 10, 3, 6}, // incomplete fourth CP + + {4, 6, 3, 6}, // incomplete fourth CP, and no space for it + {4, 7, 3, 6}, // incomplete fourth CP, and no space for it + {4, 8, 3, 6}, // incomplete fourth CP, and no space for it + {4, 9, 3, 6}, // incomplete fourth CP, and no space for it + 
}; + for (auto t : offsets) + { + char out[array_size (exp) - 1] =3D {}; + VERIFY (t.in_size <=3D array_size (in)); + VERIFY (t.out_size <=3D array_size (out)); + VERIFY (t.expected_in_next <=3D t.in_size); + VERIFY (t.expected_out_next <=3D t.out_size); + auto state =3D mbstate_t{}; + auto in_next =3D (const CharT *) nullptr; + auto out_next =3D (char *) nullptr; + auto res =3D codecvt_base::result (); + + res =3D cvt.out (state, in, in + t.in_size, in_next, out, out + t.ou= t_size, + out_next); + VERIFY (res =3D=3D cvt.partial); + VERIFY (in_next =3D=3D in + t.expected_in_next); + VERIFY (out_next =3D=3D out + t.expected_out_next); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) = =3D=3D 0); + if (t.expected_out_next < array_size (out)) + VERIFY (out[t.expected_out_next] =3D=3D 0); + } +} + +template +void +utf16_to_utf8_out_error (const std::codecvt &cvt) +{ + using namespace std; + const char16_t valid_in[] =3D u"b=D1=88\uAAAA\U0010AAAA"; + const char exp[] =3D "b=D1=88\uAAAA\U0010AAAA"; + + static_assert (array_size (valid_in) =3D=3D 6, ""); + static_assert (array_size (exp) =3D=3D 11, ""); + VERIFY (char_traits::length (valid_in) =3D=3D 5); + VERIFY (char_traits::length (exp) =3D=3D 10); + + test_offsets_error offsets[] =3D { + {5, 10, 0, 0, 0xD800, 0}, + {5, 10, 0, 0, 0xDBFF, 0}, + {5, 10, 0, 0, 0xDC00, 0}, + {5, 10, 0, 0, 0xDFFF, 0}, + + {5, 10, 1, 1, 0xD800, 1}, + {5, 10, 1, 1, 0xDBFF, 1}, + {5, 10, 1, 1, 0xDC00, 1}, + {5, 10, 1, 1, 0xDFFF, 1}, + + {5, 10, 2, 3, 0xD800, 2}, + {5, 10, 2, 3, 0xDBFF, 2}, + {5, 10, 2, 3, 0xDC00, 2}, + {5, 10, 2, 3, 0xDFFF, 2}, + + // make the leading surrogate a trailing one + {5, 10, 3, 6, 0xDC00, 3}, + {5, 10, 3, 6, 0xDFFF, 3}, + + // make the trailing surrogate a leading one + {5, 10, 3, 6, 0xD800, 4}, + {5, 10, 3, 6, 0xDBFF, 4}, + + // make the trailing surrogate a BMP char + {5, 10, 3, 6, u'z', 4}, + }; + + for (auto t : offsets) + { + CharT in[array_size (valid_in)] =3D {}; + char out[array_size (exp) - 1] 
=3D {}; + VERIFY (t.in_size <=3D array_size (in)); + VERIFY (t.out_size <=3D array_size (out)); + VERIFY (t.expected_in_next <=3D t.in_size); + VERIFY (t.expected_out_next <=3D t.out_size); + copy (begin (valid_in), end (valid_in), begin (in)); + in[t.replace_pos] =3D t.replace_char; + + auto state =3D mbstate_t{}; + auto in_next =3D (const CharT *) nullptr; + auto out_next =3D (char *) nullptr; + auto res =3D codecvt_base::result (); + + res =3D cvt.out (state, in, in + t.in_size, in_next, out, out + t.ou= t_size, + out_next); + VERIFY (res =3D=3D cvt.error); + VERIFY (in_next =3D=3D in + t.expected_in_next); + VERIFY (out_next =3D=3D out + t.expected_out_next); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) = =3D=3D 0); + if (t.expected_out_next < array_size (out)) + VERIFY (out[t.expected_out_next] =3D=3D 0); + } +} + +template +void +utf16_to_utf8_out (const std::codecvt &cvt) +{ + utf16_to_utf8_out_ok (cvt); + utf16_to_utf8_out_partial (cvt); + utf16_to_utf8_out_error (cvt); +} + +template +void +test_utf8_utf16_cvts (const std::codecvt &cvt) +{ + utf8_to_utf16_in (cvt); + utf16_to_utf8_out (cvt); +} + +template +void +utf8_to_ucs2_in_ok (const std::codecvt &cvt) +{ + using namespace std; + // UTF-8 string of 1-byte CP, 2-byte CP and 3-byte CP + const char in[] =3D "b=D1=88\uAAAA"; + const char16_t exp_literal[] =3D u"b=D1=88\uAAAA"; + CharT exp[array_size (exp_literal)] =3D {}; + copy (begin (exp_literal), end (exp_literal), begin (exp)); + + static_assert (array_size (in) =3D=3D 7, ""); + static_assert (array_size (exp_literal) =3D=3D 4, ""); + static_assert (array_size (exp) =3D=3D 4, ""); + VERIFY (char_traits::length (in) =3D=3D 6); + VERIFY (char_traits::length (exp_literal) =3D=3D 3); + VERIFY (char_traits::length (exp) =3D=3D 3); + + test_offsets_ok offsets[] =3D {{0, 0}, {1, 1}, {3, 2}, {6, 3}}; + for (auto t : offsets) + { + CharT out[array_size (exp) - 1] =3D {}; + VERIFY (t.in_size <=3D array_size (in)); + VERIFY (t.out_size <=3D 
array_size (out)); + auto state =3D mbstate_t{}; + auto in_next =3D (const char *) nullptr; + auto out_next =3D (CharT *) nullptr; + auto res =3D codecvt_base::result (); + + res =3D cvt.in (state, in, in + t.in_size, in_next, out, out + t.out= _size, + out_next); + VERIFY (res =3D=3D cvt.ok); + VERIFY (in_next =3D=3D in + t.in_size); + VERIFY (out_next =3D=3D out + t.out_size); + VERIFY (char_traits::compare (out, exp, t.out_size) =3D=3D 0)= ; + if (t.out_size < array_size (out)) + VERIFY (out[t.out_size] =3D=3D 0); + } + + for (auto t : offsets) + { + CharT out[array_size (exp)] =3D {}; + VERIFY (t.in_size <=3D array_size (in)); + VERIFY (t.out_size <=3D array_size (out)); + auto state =3D mbstate_t{}; + auto in_next =3D (const char *) nullptr; + auto out_next =3D (CharT *) nullptr; + auto res =3D codecvt_base::result (); + + res + =3D cvt.in (state, in, in + t.in_size, in_next, out, end (out), out_next)= ; + VERIFY (res =3D=3D cvt.ok); + VERIFY (in_next =3D=3D in + t.in_size); + VERIFY (out_next =3D=3D out + t.out_size); + VERIFY (char_traits::compare (out, exp, t.out_size) =3D=3D 0)= ; + if (t.out_size < array_size (out)) + VERIFY (out[t.out_size] =3D=3D 0); + } +} + +template +void +utf8_to_ucs2_in_partial (const std::codecvt &cvt) +{ + using namespace std; + // UTF-8 string of 1-byte CP, 2-byte CP and 3-byte CP + const char in[] =3D "b=D1=88\uAAAA"; + const char16_t exp_literal[] =3D u"b=D1=88\uAAAA"; + CharT exp[array_size (exp_literal)] =3D {}; + copy (begin (exp_literal), end (exp_literal), begin (exp)); + + static_assert (array_size (in) =3D=3D 7, ""); + static_assert (array_size (exp_literal) =3D=3D 4, ""); + static_assert (array_size (exp) =3D=3D 4, ""); + VERIFY (char_traits::length (in) =3D=3D 6); + VERIFY (char_traits::length (exp_literal) =3D=3D 3); + VERIFY (char_traits::length (exp) =3D=3D 3); + + test_offsets_partial offsets[] =3D { + {1, 0, 0, 0}, // no space for first CP + + {3, 1, 1, 1}, // no space for second CP + {2, 2, 1, 1}, // incomplete 
second CP + {2, 1, 1, 1}, // incomplete second CP, and no space for it + + {6, 2, 3, 2}, // no space for third CP + {4, 3, 3, 2}, // incomplete third CP + {5, 3, 3, 2}, // incomplete third CP + {4, 2, 3, 2}, // incomplete third CP, and no space for it + {5, 2, 3, 2}, // incomplete third CP, and no space for it + }; + + for (auto t : offsets) + { + CharT out[array_size (exp) - 1] =3D {}; + VERIFY (t.in_size <=3D array_size (in)); + VERIFY (t.out_size <=3D array_size (out)); + VERIFY (t.expected_in_next <=3D t.in_size); + VERIFY (t.expected_out_next <=3D t.out_size); + auto state =3D mbstate_t{}; + auto in_next =3D (const char *) nullptr; + auto out_next =3D (CharT *) nullptr; + auto res =3D codecvt_base::result (); + + res =3D cvt.in (state, in, in + t.in_size, in_next, out, out + t.out= _size, + out_next); + VERIFY (res =3D=3D cvt.partial); + VERIFY (in_next =3D=3D in + t.expected_in_next); + VERIFY (out_next =3D=3D out + t.expected_out_next); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) = =3D=3D 0); + if (t.expected_out_next < array_size (out)) + VERIFY (out[t.expected_out_next] =3D=3D 0); + } +} + +template +void +utf8_to_ucs2_in_error (const std::codecvt &cvt) +{ + using namespace std; + const char valid_in[] =3D "b=D1=88\uAAAA\U0010AAAA"; + const char16_t exp_literal[] =3D u"b=D1=88\uAAAA\U0010AAAA"; + CharT exp[array_size (exp_literal)] =3D {}; + copy (begin (exp_literal), end (exp_literal), begin (exp)); + + static_assert (array_size (valid_in) =3D=3D 11, ""); + static_assert (array_size (exp_literal) =3D=3D 6, ""); + static_assert (array_size (exp) =3D=3D 6, ""); + VERIFY (char_traits::length (valid_in) =3D=3D 10); + VERIFY (char_traits::length (exp_literal) =3D=3D 5); + VERIFY (char_traits::length (exp) =3D=3D 5); + + test_offsets_error offsets[] =3D { + + // replace leading byte with invalid byte + {1, 5, 0, 0, '\xFF', 0}, + {3, 5, 1, 1, '\xFF', 1}, + {6, 5, 3, 2, '\xFF', 3}, + {10, 5, 6, 3, '\xFF', 6}, + + // replace first trailing byte 
with ASCII byte + {3, 5, 1, 1, 'z', 2}, + {6, 5, 3, 2, 'z', 4}, + {10, 5, 6, 3, 'z', 7}, + + // replace first trailing byte with invalid byte + {3, 5, 1, 1, '\xFF', 2}, + {6, 5, 3, 2, '\xFF', 4}, + {10, 5, 6, 3, '\xFF', 7}, + + // replace second trailing byte with ASCII byte + {6, 5, 3, 2, 'z', 5}, + {10, 5, 6, 3, 'z', 8}, + + // replace second trailing byte with invalid byte + {6, 5, 3, 2, '\xFF', 5}, + {10, 5, 6, 3, '\xFF', 8}, + + // replace third trailing byte + {10, 5, 6, 3, 'z', 9}, + {10, 5, 6, 3, '\xFF', 9}, + + // When we see a leading byte of 4-byte CP, we should return error, no + // matter if it is incomplete at the end or has errors in the trailing + // bytes. + + // Don't replace anything, show full 4-byte CP + {10, 4, 6, 3, 'b', 0}, + {10, 5, 6, 3, 'b', 0}, + + // Don't replace anything, show incomplete 4-byte CP at the end + {7, 4, 6, 3, 'b', 0}, // incomplete fourth CP + {8, 4, 6, 3, 'b', 0}, // incomplete fourth CP + {9, 4, 6, 3, 'b', 0}, // incomplete fourth CP + {7, 5, 6, 3, 'b', 0}, // incomplete fourth CP + {8, 5, 6, 3, 'b', 0}, // incomplete fourth CP + {9, 5, 6, 3, 'b', 0}, // incomplete fourth CP + + // replace first trailing byte with ASCII byte, also incomplete at end + {5, 5, 3, 2, 'z', 4}, + + // replace first trailing byte with invalid byte, also incomplete at e= nd + {5, 5, 3, 2, '\xFF', 4}, + + // replace first trailing byte with ASCII byte, also incomplete at end + {8, 5, 6, 3, 'z', 7}, + {9, 5, 6, 3, 'z', 7}, + + // replace first trailing byte with invalid byte, also incomplete at e= nd + {8, 5, 6, 3, '\xFF', 7}, + {9, 5, 6, 3, '\xFF', 7}, + + // replace second trailing byte with ASCII byte, also incomplete at en= d + {9, 5, 6, 3, 'z', 8}, + + // replace second trailing byte with invalid byte, also incomplete at = end + {9, 5, 6, 3, '\xFF', 8}, + }; + for (auto t : offsets) + { + char in[array_size (valid_in)] =3D {}; + CharT out[array_size (exp) - 1] =3D {}; + VERIFY (t.in_size <=3D array_size (in)); + VERIFY (t.out_size <=3D 
array_size (out)); + VERIFY (t.expected_in_next <=3D t.in_size); + VERIFY (t.expected_out_next <=3D t.out_size); + char_traits::copy (in, valid_in, array_size (valid_in)); + in[t.replace_pos] =3D t.replace_char; + + auto state =3D mbstate_t{}; + auto in_next =3D (const char *) nullptr; + auto out_next =3D (CharT *) nullptr; + auto res =3D codecvt_base::result (); + + res =3D cvt.in (state, in, in + t.in_size, in_next, out, out + t.out= _size, + out_next); + VERIFY (res =3D=3D cvt.error); + VERIFY (in_next =3D=3D in + t.expected_in_next); + VERIFY (out_next =3D=3D out + t.expected_out_next); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) = =3D=3D 0); + if (t.expected_out_next < array_size (out)) + VERIFY (out[t.expected_out_next] =3D=3D 0); + } +} + +template +void +utf8_to_ucs2_in (const std::codecvt &cvt) +{ + utf8_to_ucs2_in_ok (cvt); + utf8_to_ucs2_in_partial (cvt); + utf8_to_ucs2_in_error (cvt); +} + +template +void +ucs2_to_utf8_out_ok (const std::codecvt &cvt) +{ + using namespace std; + // UTF-8 string of 1-byte CP, 2-byte CP and 3-byte CP + const char16_t in_literal[] =3D u"b=D1=88\uAAAA"; + const char exp[] =3D "b=D1=88\uAAAA"; + CharT in[array_size (in_literal)] =3D {}; + copy (begin (in_literal), end (in_literal), begin (in)); + + static_assert (array_size (in_literal) =3D=3D 4, ""); + static_assert (array_size (exp) =3D=3D 7, ""); + static_assert (array_size (in) =3D=3D 4, ""); + VERIFY (char_traits::length (in_literal) =3D=3D 3); + VERIFY (char_traits::length (exp) =3D=3D 6); + VERIFY (char_traits::length (in) =3D=3D 3); + + const test_offsets_ok offsets[] =3D {{0, 0}, {1, 1}, {2, 3}, {3, 6}}; + for (auto t : offsets) + { + char out[array_size (exp) - 1] =3D {}; + VERIFY (t.in_size <=3D array_size (in)); + VERIFY (t.out_size <=3D array_size (out)); + auto state =3D mbstate_t{}; + auto in_next =3D (const CharT *) nullptr; + auto out_next =3D (char *) nullptr; + auto res =3D codecvt_base::result (); + + res =3D cvt.out (state, in, in + 
t.in_size, in_next, out, out + t.ou= t_size, + out_next); + VERIFY (res =3D=3D cvt.ok); + VERIFY (in_next =3D=3D in + t.in_size); + VERIFY (out_next =3D=3D out + t.out_size); + VERIFY (char_traits::compare (out, exp, t.out_size) =3D=3D 0); + if (t.out_size < array_size (out)) + VERIFY (out[t.out_size] =3D=3D 0); + } +} + +template +void +ucs2_to_utf8_out_partial (const std::codecvt &cvt) +{ + using namespace std; + // UTF-8 string of 1-byte CP, 2-byte CP and 3-byte CP + const char16_t in_literal[] =3D u"b=D1=88\uAAAA"; + const char exp[] =3D "b=D1=88\uAAAA"; + CharT in[array_size (in_literal)] =3D {}; + copy (begin (in_literal), end (in_literal), begin (in)); + + static_assert (array_size (in_literal) =3D=3D 4, ""); + static_assert (array_size (exp) =3D=3D 7, ""); + static_assert (array_size (in) =3D=3D 4, ""); + VERIFY (char_traits::length (in_literal) =3D=3D 3); + VERIFY (char_traits::length (exp) =3D=3D 6); + VERIFY (char_traits::length (in) =3D=3D 3); + + const test_offsets_partial offsets[] =3D { + {1, 0, 0, 0}, // no space for first CP + + {2, 1, 1, 1}, // no space for second CP + {2, 2, 1, 1}, // no space for second CP + + {3, 3, 2, 3}, // no space for third CP + {3, 4, 2, 3}, // no space for third CP + {3, 5, 2, 3}, // no space for third CP + }; + for (auto t : offsets) + { + char out[array_size (exp) - 1] =3D {}; + VERIFY (t.in_size <=3D array_size (in)); + VERIFY (t.out_size <=3D array_size (out)); + VERIFY (t.expected_in_next <=3D t.in_size); + VERIFY (t.expected_out_next <=3D t.out_size); + auto state =3D mbstate_t{}; + auto in_next =3D (const CharT *) nullptr; + auto out_next =3D (char *) nullptr; + auto res =3D codecvt_base::result (); + + res =3D cvt.out (state, in, in + t.in_size, in_next, out, out + t.ou= t_size, + out_next); + VERIFY (res =3D=3D cvt.partial); + VERIFY (in_next =3D=3D in + t.expected_in_next); + VERIFY (out_next =3D=3D out + t.expected_out_next); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) = =3D=3D 0); + if 
(t.expected_out_next < array_size (out)) + VERIFY (out[t.expected_out_next] =3D=3D 0); + } +} + +template +void +ucs2_to_utf8_out_error (const std::codecvt &cvt) +{ + using namespace std; + const char16_t valid_in[] =3D u"b=D1=88\uAAAA\U0010AAAA"; + const char exp[] =3D "b=D1=88\uAAAA\U0010AAAA"; + + static_assert (array_size (valid_in) =3D=3D 6, ""); + static_assert (array_size (exp) =3D=3D 11, ""); + VERIFY (char_traits::length (valid_in) =3D=3D 5); + VERIFY (char_traits::length (exp) =3D=3D 10); + + test_offsets_error offsets[] =3D { + {5, 10, 0, 0, 0xD800, 0}, + {5, 10, 0, 0, 0xDBFF, 0}, + {5, 10, 0, 0, 0xDC00, 0}, + {5, 10, 0, 0, 0xDFFF, 0}, + + {5, 10, 1, 1, 0xD800, 1}, + {5, 10, 1, 1, 0xDBFF, 1}, + {5, 10, 1, 1, 0xDC00, 1}, + {5, 10, 1, 1, 0xDFFF, 1}, + + {5, 10, 2, 3, 0xD800, 2}, + {5, 10, 2, 3, 0xDBFF, 2}, + {5, 10, 2, 3, 0xDC00, 2}, + {5, 10, 2, 3, 0xDFFF, 2}, + + // dont replace anything, just show the surrogate pair + {5, 10, 3, 6, u'b', 0}, + + // make the leading surrogate a trailing one + {5, 10, 3, 6, 0xDC00, 3}, + {5, 10, 3, 6, 0xDFFF, 3}, + + // make the trailing surrogate a leading one + {5, 10, 3, 6, 0xD800, 4}, + {5, 10, 3, 6, 0xDBFF, 4}, + + // make the trailing surrogate a BMP char + {5, 10, 3, 6, u'z', 4}, + + {5, 7, 3, 6, u'b', 0}, // no space for fourth CP + {5, 8, 3, 6, u'b', 0}, // no space for fourth CP + {5, 9, 3, 6, u'b', 0}, // no space for fourth CP + + {4, 10, 3, 6, u'b', 0}, // incomplete fourth CP + {4, 7, 3, 6, u'b', 0}, // incomplete fourth CP, and no space for it + {4, 8, 3, 6, u'b', 0}, // incomplete fourth CP, and no space for it + {4, 9, 3, 6, u'b', 0}, // incomplete fourth CP, and no space for it + + }; + + for (auto t : offsets) + { + CharT in[array_size (valid_in)] =3D {}; + char out[array_size (exp) - 1] =3D {}; + VERIFY (t.in_size <=3D array_size (in)); + VERIFY (t.out_size <=3D array_size (out)); + VERIFY (t.expected_in_next <=3D t.in_size); + VERIFY (t.expected_out_next <=3D t.out_size); + copy (begin (valid_in), end 
(valid_in), begin (in)); + in[t.replace_pos] =3D t.replace_char; + + auto state =3D mbstate_t{}; + auto in_next =3D (const CharT *) nullptr; + auto out_next =3D (char *) nullptr; + auto res =3D codecvt_base::result (); + + res =3D cvt.out (state, in, in + t.in_size, in_next, out, out + t.ou= t_size, + out_next); + VERIFY (res =3D=3D cvt.error); + VERIFY (in_next =3D=3D in + t.expected_in_next); + VERIFY (out_next =3D=3D out + t.expected_out_next); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) = =3D=3D 0); + if (t.expected_out_next < array_size (out)) + VERIFY (out[t.expected_out_next] =3D=3D 0); + } +} + +template +void +ucs2_to_utf8_out (const std::codecvt &cvt) +{ + ucs2_to_utf8_out_ok (cvt); + ucs2_to_utf8_out_partial (cvt); + ucs2_to_utf8_out_error (cvt); +} + +template +void +test_utf8_ucs2_cvts (const std::codecvt &cvt) +{ + utf8_to_ucs2_in (cvt); + ucs2_to_utf8_out (cvt); +} diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode_wchar= _t.cc b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc new file mode 100644 index 000000000..169504939 --- /dev/null +++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc @@ -0,0 +1,59 @@ +// Copyright (C) 2020-2023 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License along +// with this library; see the file COPYING3. If not see +// . 
+
+// { dg-do run { target c++11 } }
+
+#include "codecvt_unicode.h"
+
+#include 
+
+using namespace std;
+
+void
+test_utf8_utf32_codecvts ()
+{
+#if __SIZEOF_WCHAR_T__ == 4
+  auto cvt_ptr = to_unique_ptr (new codecvt_utf8 ());
+  test_utf8_utf32_codecvts (*cvt_ptr);
+#endif
+}
+
+void
+test_utf8_utf16_codecvts ()
+{
+#if __SIZEOF_WCHAR_T__ >= 2
+  auto cvt_ptr = to_unique_ptr (new codecvt_utf8_utf16 ());
+  test_utf8_utf16_cvts (*cvt_ptr);
+#endif
+}
+
+void
+test_utf8_ucs2_codecvts ()
+{
+#if __SIZEOF_WCHAR_T__ == 2
+  auto cvt_ptr = to_unique_ptr (new codecvt_utf8 ());
+  test_utf8_ucs2_cvts (*cvt_ptr);
+#endif
+}
+
+int
+main ()
+{
+  test_utf8_utf32_codecvts ();
+  test_utf8_utf16_codecvts ();
+  test_utf8_ucs2_codecvts ();
+}
-- 
2.34.1
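For readers outside the testsuite, the in() contract these tables pin down can be tried in isolation: ok when everything was converted, partial when the input ends inside a code point, and error on a malformed sequence, with in_next and out_next left at the start of the offending code point. The following is a minimal standalone sketch (not part of the patch) of three rows from the utf8_to_utf16_in_* tables. It assumes a standard library that still ships the deprecated std::codecvt_utf8_utf16<char16_t>. The names sample_utf8, in_summary, convert_prefix and replace_byte are hypothetical helpers, not testsuite code.

```cpp
#include <cassert>  // assert, for the usage checks
#include <codecvt>  // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <cstddef>
#include <cwchar>   // std::mbstate_t
#include <locale>   // std::codecvt_base
#include <string>

// Same sample as the tests: 1-, 2-, 3- and 4-byte UTF-8 code points
// ("b", U+0448, U+AAAA, U+10AAAA), spelled as explicit bytes.
const char sample_utf8[] = "b\xD1\x88\xEA\xAA\xAA\xF4\x8A\xAA\xAA";

// Hypothetical summary of one codecvt::in() call.
struct in_summary
{
  std::codecvt_base::result res;
  std::size_t consumed;  // in_next - in
  std::u16string units;  // the UTF-16 code units in [out, out_next)
};

// Convert the first in_size bytes of in into a buffer of out_size code
// units, starting from a fresh conversion state.
in_summary
convert_prefix (const char *in, std::size_t in_size, std::size_t out_size)
{
  std::codecvt_utf8_utf16<char16_t> cvt;
  std::mbstate_t state{};
  std::u16string buf (out_size, u'\0');
  const char *in_next = nullptr;
  char16_t *out_next = nullptr;
  auto res = cvt.in (state, in, in + in_size, in_next,
                     &buf[0], &buf[0] + out_size, out_next);
  // Keep only the code units the facet actually wrote.
  buf.resize (static_cast<std::size_t> (out_next - &buf[0]));
  return {res, static_cast<std::size_t> (in_next - in), buf};
}

// Copy of sample_utf8 (without the terminating NUL) with the byte at pos
// replaced by ch.
std::string
replace_byte (std::size_t pos, char ch)
{
  std::string s (sample_utf8, sizeof sample_utf8 - 1);
  s[pos] = ch;
  return s;
}
```

With a fixed library, convert_prefix (sample_utf8, 9, 5) should stop at byte 6 with three code units written, matching the {9, 5, 6, 3} row of utf8_to_utf16_in_partial. Replacing byte 2 with 'z' and converting 3 bytes should reproduce the {3, 5, 1, 1, 'z', 2} row of utf8_to_utf16_in_error.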