[PATCH] c++: Add testcase for C++23 P2316R2 - consistent character literal encoding [PR102615]

public inbox for gcc-patches@gcc.gnu.org

* [PATCH] c++: Add testcase for C++23 P2316R2 - consistent character literal encoding [PR102615]
@ 2021-10-07 13:00 Jakub Jelinek
  2021-10-07 13:12 ` Jason Merrill
  2021-10-07 13:34 ` Lewis Hyatt
  0 siblings, 2 replies; 4+ messages in thread
From: Jakub Jelinek @ 2021-10-07 13:00 UTC
  To: Jason Merrill, Joseph S. Myers; +Cc: gcc-patches

Hi!

I believe we need no changes to the compiler for P2316R2, seems we treat
character literals the same between preprocessor and C++ expressions,
here is a testcase that should verify it.

Tested on x86_64-linux, ok for trunk?

Note, seems the internal charset for GCC can be either UTF-8 or UTF-EBCDIC,
but I bet it is very hard (at least for me) to actually test the latter.
I'd guess one needs all system headers to be in EBCDIC and the gcc sources too.
But looking around the source, I'm a little bit worried about the UTF-EBCDIC
case.
One is:
 #if  '\n' == 0x0A && ' ' == 0x20 && '0' == 0x30 \
    && 'A' == 0x41 && 'a' == 0x61 && '!' == 0x21
 #  define HOST_CHARSET HOST_CHARSET_ASCII
 #else
 # if '\n' == 0x15 && ' ' == 0x40 && '0' == 0xF0 \
    && 'A' == 0xC1 && 'a' == 0x81 && '!' == 0x5A
 #  define HOST_CHARSET HOST_CHARSET_EBCDIC
 # else
 #  define HOST_CHARSET HOST_CHARSET_UNKNOWN
 # endif
 #endif
in include/safe-ctype.h, does that mean we only support EBCDIC if -funsigned-char
and otherwise fail to build gcc?  Because with -fsigned-char, '0' is -0x10
rather than 0xF0, 'A' is -0x3F rather than 0xC1 and 'a' is -0x7F rather than
0x81.
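
A minimal sketch of the arithmetic behind this concern, assuming an 8-bit
two's-complement plain char.  The helper name as_signed_char is hypothetical
and not taken from the GCC sources; the byte values are the ones from the
safe-ctype.h check above:

```cpp
#include <cassert>

// Hypothetical helper (not from GCC): the value that an 8-bit,
// two's-complement signed char holds for a given encoded byte.
constexpr int as_signed_char(unsigned byte)
{
  return byte >= 0x80 ? static_cast<int>(byte) - 0x100
                      : static_cast<int>(byte);
}

// EBCDIC bytes with the high bit set become negative when char is signed,
// so comparisons such as '0' == 0xF0 can no longer be true.
static_assert(as_signed_char(0xF0) == -0x10, "'0' becomes negative");
static_assert(as_signed_char(0xC1) == -0x3F, "'A' becomes negative");
static_assert(as_signed_char(0x81) == -0x7F, "'a' becomes negative");

// Bytes below 0x80 are unaffected by the signedness of char.
static_assert(as_signed_char(0x15) == 0x15, "'\\n' is unchanged");
static_assert(as_signed_char(0x40) == 0x40, "' ' is unchanged");
static_assert(as_signed_char(0x5A) == 0x5A, "'!' is unchanged");
```

Under that assumption only three of the six comparisons in the EBCDIC branch
would differ, which is enough to make the whole && chain false.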
And another thing, if HOST_CHARSET == HOST_CHARSET_EBCDIC, how does the libcpp/lex.c
static const cppchar_t utf8_signifier = 0xC0;
...
      if (*buffer->cur >= utf8_signifier)
        {
          if (_cpp_valid_utf8 (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
                               state, &s))
            return true;
        }
work?  Because in UTF-EBCDIC, >= 0xC0 isn't the right test for start of
multi-byte character, it is more complicated and seems _cpp_valid_utf8
assumes UTF-8 as the host charset.
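
To make the concern concrete, here is a small sketch that applies the same
threshold to a few byte values.  The helper may_start_multibyte is a
hypothetical stand-in for the comparison quoted above, not a libcpp function,
and the EBCDIC values are the ones from the safe-ctype.h check:

```cpp
#include <cassert>

// Same threshold as the utf8_signifier constant quoted from libcpp/lex.c.
constexpr unsigned char utf8_signifier = 0xC0;

// Hypothetical helper mirroring the `*buffer->cur >= utf8_signifier` test.
constexpr bool may_start_multibyte(unsigned char byte)
{
  return byte >= utf8_signifier;
}

// With a UTF-8 host charset the test separates ASCII from lead bytes.
static_assert(!may_start_multibyte(0x41), "ASCII 'A' is a single byte");
static_assert(may_start_multibyte(0xC3), "0xC3 leads a two-byte sequence");

// With an EBCDIC host charset, ordinary letters and digits pass the test.
static_assert(may_start_multibyte(0xC1), "EBCDIC 'A' passes the test");
static_assert(may_start_multibyte(0xF0), "EBCDIC '0' passes the test");
static_assert(!may_start_multibyte(0x81), "EBCDIC 'a' does not");
```

So on such a host the single-byte characters 'A' and '0' would be handed to
_cpp_valid_utf8 as if they started a UTF-8 sequence.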

2021-10-07  Jakub Jelinek  <jakub@redhat.com>

	PR c++/102615
	* g++.dg/cpp23/charlit-encoding1.C: New testcase for C++23 P2316R2.

--- gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C.jj	2021-10-07 14:34:35.182132411 +0200
+++ gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C	2021-10-07 14:34:02.902583774 +0200
@@ -0,0 +1,33 @@
+// PR c++/102615 - P2316R2 - Consistent character literal encoding
+// { dg-do compile }
+
+extern "C" void abort ();
+
+int
+main ()
+{
+#if ' ' == 0x20
+  if (' ' != 0x20)
+    abort ();
+#elif ' ' == 0x40
+  if (' ' != 0x40)
+    abort ();
+#else
+  if (' ' == 0x20 || ' ' == 0x40)
+    abort ();
+#endif
+#if 'a' == 0x61
+  if ('a' != 0x61)
+    abort ();
+#elif 'a' == 0x81
+  if ('a' != 0x81)
+    abort ();
+#elif 'a' == -0x7F
+  if ('a' != -0x7F)
+    abort ();
+#else
+  if ('a' == 0x61 || 'a' == 0x81 || 'a' == -0x7F)
+    abort ();
+#endif
+  return 0;
+}

	Jakub



* Re: [PATCH] c++: Add testcase for C++23 P2316R2 - consistent character literal encoding [PR102615]
  2021-10-07 13:00 [PATCH] c++: Add testcase for C++23 P2316R2 - consistent character literal encoding [PR102615] Jakub Jelinek
@ 2021-10-07 13:12 ` Jason Merrill
  2021-10-07 13:23   ` Jakub Jelinek
  2021-10-07 13:34 ` Lewis Hyatt
  1 sibling, 1 reply; 4+ messages in thread
From: Jason Merrill @ 2021-10-07 13:12 UTC
  To: Jakub Jelinek, Joseph S. Myers; +Cc: gcc-patches

On 10/7/21 09:00, Jakub Jelinek wrote:
> Hi!
> 
> I believe we need no changes to the compiler for P2316R2, seems we treat
> character literals the same between preprocessor and C++ expressions,
> here is a testcase that should verify it.
> 
> Tested on x86_64-linux, ok for trunk?
> 
> Note, seems the internal charset for GCC can be either UTF-8 or UTF-EBCDIC,
> but I bet it is very hard (at least for me) to actually test the latter.
> I'd guess one needs all system headers to be in EBCDIC and the gcc sources too.
> But looking around the source, I'm a little bit worried about the UTF-EBCDIC
> case.
> One is:
>   #if  '\n' == 0x0A && ' ' == 0x20 && '0' == 0x30 \
>      && 'A' == 0x41 && 'a' == 0x61 && '!' == 0x21
>   #  define HOST_CHARSET HOST_CHARSET_ASCII
>   #else
>   # if '\n' == 0x15 && ' ' == 0x40 && '0' == 0xF0 \
>      && 'A' == 0xC1 && 'a' == 0x81 && '!' == 0x5A
>   #  define HOST_CHARSET HOST_CHARSET_EBCDIC
>   # else
>   #  define HOST_CHARSET HOST_CHARSET_UNKNOWN
>   # endif
>   #endif
> in include/safe-ctype.h, does that mean we only support EBCDIC if -funsigned-char
> and otherwise fail to build gcc?  Because with -fsigned-char, '0' is -0x10
> rather than 0xF0, 'A' is -0x3F rather than 0xC1 and 'a' is -0x7F rather than
> 0x81.
> And another thing, if HOST_CHARSET == HOST_CHARSET_EBCDIC, how does the libcpp/lex.c
> static const cppchar_t utf8_signifier = 0xC0;
> ...
>        if (*buffer->cur >= utf8_signifier)
>          {
>            if (_cpp_valid_utf8 (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
>                                 state, &s))
>              return true;
>          }
> work?  Because in UTF-EBCDIC, >= 0xC0 isn't the right test for start of
> multi-byte character, it is more complicated and seems _cpp_valid_utf8
> assumes UTF-8 as the host charset.

Are there any supported platforms that use UTF-EBCDIC?

> 2021-10-07  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR c++/102615
> 	* g++.dg/cpp23/charlit-encoding1.C: New testcase for C++23 P2316R2.
> 
> --- gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C.jj	2021-10-07 14:34:35.182132411 +0200
> +++ gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C	2021-10-07 14:34:02.902583774 +0200
> @@ -0,0 +1,33 @@
> +// PR c++/102615 - P2316R2 - Consistent character literal encoding
> +// { dg-do compile }

Doesn't this need to run?  OK with that change.

> +extern "C" void abort ();
> +
> +int
> +main ()
> +{
> +#if ' ' == 0x20
> +  if (' ' != 0x20)
> +    abort ();
> +#elif ' ' == 0x40
> +  if (' ' != 0x40)
> +    abort ();
> +#else
> +  if (' ' == 0x20 || ' ' == 0x40)
> +    abort ();
> +#endif
> +#if 'a' == 0x61
> +  if ('a' != 0x61)
> +    abort ();
> +#elif 'a' == 0x81
> +  if ('a' != 0x81)
> +    abort ();
> +#elif 'a' == -0x7F
> +  if ('a' != -0x7F)
> +    abort ();
> +#else
> +  if ('a' == 0x61 || 'a' == 0x81 || 'a' == -0x7F)
> +    abort ();
> +#endif
> +  return 0;
> +}
> 
> 	Jakub
> 



* Re: [PATCH] c++: Add testcase for C++23 P2316R2 - consistent character literal encoding [PR102615]
  2021-10-07 13:12 ` Jason Merrill
@ 2021-10-07 13:23   ` Jakub Jelinek
  0 siblings, 0 replies; 4+ messages in thread
From: Jakub Jelinek @ 2021-10-07 13:23 UTC
  To: Jason Merrill, Andreas Krebbel; +Cc: Joseph S. Myers, gcc-patches

On Thu, Oct 07, 2021 at 09:12:15AM -0400, Jason Merrill wrote:
> > And another thing, if HOST_CHARSET == HOST_CHARSET_EBCDIC, how does the libcpp/lex.c
> > static const cppchar_t utf8_signifier = 0xC0;
> > ...
> >        if (*buffer->cur >= utf8_signifier)
> >          {
> >            if (_cpp_valid_utf8 (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
> >                                 state, &s))
> >              return true;
> >          }
> > work?  Because in UTF-EBCDIC, >= 0xC0 isn't the right test for start of
> > multi-byte character, it is more complicated and seems _cpp_valid_utf8
> > assumes UTF-8 as the host charset.
> 
> Are there any supported platforms that use UTF-EBCDIC?

I have no idea.  From the libcpp/charset.c code, it seems there is no built-in
conversion for UTF-EBCDIC; the only internally supported conversions are
  { "UTF-8/UTF-32LE", convert_utf8_utf32, (iconv_t)0 },
  { "UTF-8/UTF-32BE", convert_utf8_utf32, (iconv_t)1 },
  { "UTF-8/UTF-16LE", convert_utf8_utf16, (iconv_t)0 },
  { "UTF-8/UTF-16BE", convert_utf8_utf16, (iconv_t)1 },
  { "UTF-32LE/UTF-8", convert_utf32_utf8, (iconv_t)0 },
  { "UTF-32BE/UTF-8", convert_utf32_utf8, (iconv_t)1 },
  { "UTF-16LE/UTF-8", convert_utf16_utf8, (iconv_t)0 },
  { "UTF-16BE/UTF-8", convert_utf16_utf8, (iconv_t)1 },
and identity, so unless the C library iconv supports conversion to
UTF-EBCDIC, the only case that could be supported is when -finput-charset=
is also UTF-EBCDIC.  E.g. glibc iconv doesn't support that.
I have never used z/VM or OS/390, which I think are the only possible hosts
that could have UTF-EBCDIC.
CCing Andreas in case he knows more...
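
A rough sketch of that reasoning, using only the pair names listed above.
The function has_builtin_conversion and the array builtin_pairs are
hypothetical names for illustration, and the sketch deliberately ignores the
fallback to the C library iconv:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Pair names copied from the conversion table quoted above.
static const char *const builtin_pairs[] = {
  "UTF-8/UTF-32LE", "UTF-8/UTF-32BE", "UTF-8/UTF-16LE", "UTF-8/UTF-16BE",
  "UTF-32LE/UTF-8", "UTF-32BE/UTF-8", "UTF-16LE/UTF-8", "UTF-16BE/UTF-8",
};

// Hypothetical helper (not a libcpp function): true if the conversion can
// be done without iconv, i.e. it is the identity or one of the pairs above.
inline bool has_builtin_conversion(const char *from, const char *to)
{
  if (std::strcmp(from, to) == 0)
    return true;  // identity conversion
  const std::string key = std::string(from) + "/" + to;
  for (const char *pair : builtin_pairs)
    if (key == pair)
      return true;
  return false;
}
```

With this model, has_builtin_conversion("UTF-8", "UTF-EBCDIC") is false while
has_builtin_conversion("UTF-EBCDIC", "UTF-EBCDIC") is true, which matches the
conclusion that only the identity case could work without iconv support.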

> > --- gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C.jj	2021-10-07 14:34:35.182132411 +0200
> > +++ gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C	2021-10-07 14:34:02.902583774 +0200
> > @@ -0,0 +1,33 @@
> > +// PR c++/102615 - P2316R2 - Consistent character literal encoding
> > +// { dg-do compile }
> 
> Doesn't this need to run?  OK with that change.

Thanks for catching that, fixed, retested and committed.

	Jakub


* Re: [PATCH] c++: Add testcase for C++23 P2316R2 - consistent character literal encoding [PR102615]
  2021-10-07 13:00 [PATCH] c++: Add testcase for C++23 P2316R2 - consistent character literal encoding [PR102615] Jakub Jelinek
  2021-10-07 13:12 ` Jason Merrill
@ 2021-10-07 13:34 ` Lewis Hyatt
  1 sibling, 0 replies; 4+ messages in thread
From: Lewis Hyatt @ 2021-10-07 13:34 UTC
  To: Jakub Jelinek; +Cc: Jason Merrill, Joseph S. Myers, gcc-patches

On Thu, Oct 7, 2021 at 9:01 AM Jakub Jelinek via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
> And another thing, if HOST_CHARSET == HOST_CHARSET_EBCDIC, how does the libcpp/lex.c
> static const cppchar_t utf8_signifier = 0xC0;
> ...
>       if (*buffer->cur >= utf8_signifier)
>         {
>           if (_cpp_valid_utf8 (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
>                                state, &s))
>             return true;
>         }
> work?  Because in UTF-EBCDIC, >= 0xC0 isn't the right test for start of
> multi-byte character, it is more complicated and seems _cpp_valid_utf8
> assumes UTF-8 as the host charset.

FWIW, here I was following Joseph's guidance from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224#c21 ("You can
ignore anything claiming to handle UTF-EBCDIC.")

-Lewis
