On 15/01/24 19:09 -0500, Patrick Palka wrote: >On Mon, 15 Jan 2024, Jonathan Wakely wrote: > >> I think I'm happy with this now. It has tests for all the new functions, >> and the performance of the charset alias match algorithm is improved by >> reusing part of . >> >> Tested x86_64-linux. >> >> -- >8 -- >> >> This is another C++26 change, approved in Varna 2022. We require a new > >2023? Oops, yes. >> @@ -1056,6 +1057,54 @@ inline namespace __v15_1_0 >> __literal_encoding_is_utf8() >> { return __literal_encoding_is_unicode(); } >> >> + consteval bool >> + __literal_encoding_is_extended_ascii() >> + { >> + return '0' == 0x30 && 'A' == 0x41 && 'Z' == 0x5a >> + && 'a' == 0x61 && 'z' == 0x7a; >> + } > >This function seems unused now. It is. I thought it might be a useful utility to have in the unicode file, but it's easy enough to recreate if needed. I'll remove it. >> + >> + // https://www.unicode.org/reports/tr22/tr22-8.html#Charset_Alias_Matching >> + constexpr bool >> + __charset_alias_match(string_view __a, string_view __b) >> + { >> + // Map alphanumeric chars to their base 64 value, everything else to 127. >> + auto __map = [](char __c, bool& __num) -> unsigned char { >> + using __detail::__from_chars_alnum_to_val_table; >> + if (__c == '0') [[unlikely]] >> + return __num ? 0 : 127; >> + auto __v = __from_chars_alnum_to_val_table::value.__data[__c]; > >Maybe it'd be more concise to use the accessor function >__from_chars_alnum_to_val here (so e.g. the caller doesn't have to pass >_DecOnly=false explicitly)? Yup, done. > + >> + class _Iterator >> + { >> + public: >> + using value_type = const char*; >> + using reference = const char*; >> + using difference_type = int; >> + constexpr value_type operator*() const; > >Are these defined out-of-line to avoid excessive indentation? Perhaps >we could define _Iterator (and aliases_view::begin) out-of-line instead? Not due to indentation, but because the text_encoding class itself became even more unreadable with this code in the middle of it. I realise that the huge list of enumerators in the id enum already makes it unwieldy, but at least that's just a repetitive list of constants. The iterator function bodies don't really add anything to the understanding of the std::text_encoding API (it's an iterator, it has the expected iterator operations, the details of how they're implemented don't matter to most casual readers). But moving the whole _Iterator class to the end serves the same purpose. I didn't try moving it to the end because I was concerned about Clang giving bogus errors about constexpr functions not being defined, like this recent one: https://github.com/llvm/llvm-project/issues/73232 Although if that was going to be a problem, defining the members out-of-line would probably already fail. I've moved it (and the begin() function that uses it). > >> + constexpr _Iterator& operator++(); >> + constexpr _Iterator& operator--(); >> + constexpr _Iterator operator++(int); >> + constexpr _Iterator operator--(int); >> + constexpr value_type operator[](difference_type) const; >> + constexpr _Iterator& operator+=(difference_type); >> + constexpr _Iterator& operator-=(difference_type); >> + constexpr difference_type operator-(const _Iterator&) const; >> + constexpr bool operator==(const _Iterator&) const = default; >> + constexpr bool operator==(_Sentinel) const noexcept; >> + constexpr strong_ordering operator<=>(const _Iterator&) const; >> + >> + friend _Iterator >> + operator+(_Iterator __i, difference_type __n) > >constexpr? Fixed. I've added tests that all iterator ops are usable in constant expressions, which found a bug in operator+= (it didn't let you increment one past the end of the range). Thanks for the review. Patch v4 attached. I'm testing it now.