public inbox for gcc-patches@gcc.gnu.org
From: Jason Merrill <jason@redhat.com>
To: Jakub Jelinek <jakub@redhat.com>,
	"Joseph S. Myers" <joseph@codesourcery.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] c++: Add testcase for C++23 P2316R2 - consistent character literal encoding [PR102615]
Date: Thu, 7 Oct 2021 09:12:15 -0400
Message-ID: <364eac8d-92d1-eadf-ad8e-565712f463fe@redhat.com>
In-Reply-To: <20211007130049.GT304296@tucnak>

On 10/7/21 09:00, Jakub Jelinek wrote:
> Hi!
> 
> I believe we need no compiler changes for P2316R2: it seems we already
> treat character literals the same way in the preprocessor and in C++
> expressions.  Here is a testcase that should verify that.
> 
> Tested on x86_64-linux, ok for trunk?
> 
> Note that the internal charset for GCC can apparently be either UTF-8 or
> UTF-EBCDIC, but I bet it is very hard (at least for me) to actually test
> the latter; I'd guess one needs all system headers, and the GCC sources
> too, to be in EBCDIC.  Still, looking around the source, I'm a little bit
> worried about the UTF-EBCDIC case.
> One is:
>   #if  '\n' == 0x0A && ' ' == 0x20 && '0' == 0x30 \
>      && 'A' == 0x41 && 'a' == 0x61 && '!' == 0x21
>   #  define HOST_CHARSET HOST_CHARSET_ASCII
>   #else
>   # if '\n' == 0x15 && ' ' == 0x40 && '0' == 0xF0 \
>      && 'A' == 0xC1 && 'a' == 0x81 && '!' == 0x5A
>   #  define HOST_CHARSET HOST_CHARSET_EBCDIC
>   # else
>   #  define HOST_CHARSET HOST_CHARSET_UNKNOWN
>   # endif
>   #endif
> in include/safe-ctype.h.  Does that mean we only support EBCDIC with
> -funsigned-char and otherwise fail to build GCC?  With -fsigned-char,
> '0' is -0x10 rather than 0xF0, 'A' is -0x3F rather than 0xC1 and 'a' is
> -0x7F rather than 0x81.
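
A minimal C++ model of the arithmetic above, as a sketch. It assumes an 8-bit char and GCC's modulo-256 wraparound when a byte in 0x80..0xFF is stored in a signed plain char; the helpers charlit_value and safe_ctype_ebcdic_test are hypothetical and do not exist in GCC. It evaluates the EBCDIC condition from safe-ctype.h under both signedness settings.

```cpp
// Hypothetical model (not GCC code): value of a single-byte character
// constant whose encoding is 'byte', assuming 8-bit char and modulo-256
// wraparound when plain char is signed (-fsigned-char).
constexpr int charlit_value (unsigned byte, bool plain_char_is_signed)
{
  return (plain_char_is_signed && byte >= 0x80)
         ? static_cast<int> (byte) - 0x100   // 0x80..0xFF become negative
         : static_cast<int> (byte);          // value is the byte itself
}

// The EBCDIC condition from include/safe-ctype.h, with each character
// constant replaced by its EBCDIC code point under the chosen signedness.
constexpr bool safe_ctype_ebcdic_test (bool plain_char_is_signed)
{
  return charlit_value (0x15, plain_char_is_signed) == 0x15     // '\n'
         && charlit_value (0x40, plain_char_is_signed) == 0x40  // ' '
         && charlit_value (0xF0, plain_char_is_signed) == 0xF0  // '0'
         && charlit_value (0xC1, plain_char_is_signed) == 0xC1  // 'A'
         && charlit_value (0x81, plain_char_is_signed) == 0x81  // 'a'
         && charlit_value (0x5A, plain_char_is_signed) == 0x5A; // '!'
}

// -funsigned-char: the EBCDIC branch is taken.
static_assert (safe_ctype_ebcdic_test (false), "unsigned: EBCDIC detected");
// -fsigned-char: '0', 'A' and 'a' become negative, so the test fails.
static_assert (!safe_ctype_ebcdic_test (true), "signed: EBCDIC not detected");
```

Under this model, a signed-char EBCDIC host fails both the ASCII test ('\n' is 0x15, not 0x0A) and the EBCDIC test, so HOST_CHARSET would come out as HOST_CHARSET_UNKNOWN.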
> And another thing: if HOST_CHARSET == HOST_CHARSET_EBCDIC, how does this
> code in libcpp/lex.c
> static const cppchar_t utf8_signifier = 0xC0;
> ...
>        if (*buffer->cur >= utf8_signifier)
>          {
>            if (_cpp_valid_utf8 (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
>                                 state, &s))
>              return true;
>          }
> work?  In UTF-EBCDIC, >= 0xC0 isn't the right test for the start of a
> multi-byte character (the rule is more complicated), and _cpp_valid_utf8
> seems to assume UTF-8 as the host charset.
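
For comparison, a C++ sketch of why the >= 0xC0 test suffices when the host charset is UTF-8: every byte whose top two bits are 11 is a lead byte of a multi-byte sequence. The names below are hypothetical, this is not the libcpp code, and it does not model the UTF-EBCDIC rules at all. The loop inside a constexpr function needs C++14 or later.

```cpp
// Hypothetical helpers (not the libcpp implementation) showing why the
// lex.c test "byte >= utf8_signifier (0xC0)" finds the start of a
// multi-byte sequence when the host charset is UTF-8.  Requires C++14.

enum class utf8_byte_kind { single, continuation, lead };

// Classify a byte by its high bits, following the UTF-8 encoding form.
constexpr utf8_byte_kind classify_utf8_byte (unsigned char b)
{
  return (b & 0x80) == 0x00 ? utf8_byte_kind::single        // 0xxxxxxx
       : (b & 0xC0) == 0x80 ? utf8_byte_kind::continuation  // 10xxxxxx
       : utf8_byte_kind::lead;  // 11xxxxxx (0xC0, 0xC1, 0xF5..0xFF never valid)
}

// The comparison used in lex.c.
constexpr bool starts_multibyte (unsigned char b)
{
  return b >= 0xC0;
}

// Check that both tests agree on all 256 byte values.
constexpr bool signifier_matches_lead_bytes ()
{
  for (unsigned v = 0; v < 256; ++v)
    {
      unsigned char b = static_cast<unsigned char> (v);
      if (starts_multibyte (b)
          != (classify_utf8_byte (b) == utf8_byte_kind::lead))
        return false;
    }
  return true;
}

static_assert (signifier_matches_lead_bytes (),
               ">= 0xC0 is exactly the UTF-8 lead-byte test");
```

The equivalence relies on UTF-8's bit layout; UTF-EBCDIC assigns lead bytes differently, which is the concern raised above.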

Are there any supported platforms that use UTF-EBCDIC?

> 2021-10-07  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR c++/102615
> 	* g++.dg/cpp23/charlit-encoding1.C: New testcase for C++23 P2316R2.
> 
> --- gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C.jj	2021-10-07 14:34:35.182132411 +0200
> +++ gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C	2021-10-07 14:34:02.902583774 +0200
> @@ -0,0 +1,33 @@
> +// PR c++/102615 - P2316R2 - Consistent character literal encoding
> +// { dg-do compile }

Doesn't this need to run ({ dg-do run })?  As written, the abort () calls
are never executed.  OK with that change.

> +extern "C" void abort ();
> +
> +int
> +main ()
> +{
> +#if ' ' == 0x20
> +  if (' ' != 0x20)
> +    abort ();
> +#elif ' ' == 0x40
> +  if (' ' != 0x40)
> +    abort ();
> +#else
> +  if (' ' == 0x20 || ' ' == 0x40)
> +    abort ();
> +#endif
> +#if 'a' == 0x61
> +  if ('a' != 0x61)
> +    abort ();
> +#elif 'a' == 0x81
> +  if ('a' != 0x81)
> +    abort ();
> +#elif 'a' == -0x7F
> +  if ('a' != -0x7F)
> +    abort ();
> +#else
> +  if ('a' == 0x61 || 'a' == 0x81 || 'a' == -0x7F)
> +    abort ();
> +#endif
> +  return 0;
> +}
> 
> 	Jakub
> 
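
A possible compile-time variant of the same checks, sketched under the assumption that a static_assert-based test would be acceptable in the testsuite; it is not part of the posted patch, and the names pp_space and pp_a are hypothetical. Each #if branch records the value the preprocessor computed, and static_assert then checks that the C++ expression agrees, as P2316R2 requires. Because a failing static_assert is a compile error, a test written this way would be checked even under { dg-do compile }.

```cpp
// Hypothetical compile-time variant of charlit-encoding1.C (not the
// posted patch).  A value of 0 means "no known value matched in #if";
// neither ' ' nor 'a' can be 0, so the sentinel is unambiguous.

#if ' ' == 0x20
constexpr int pp_space = 0x20;   // ASCII-compatible execution charset
#elif ' ' == 0x40
constexpr int pp_space = 0x40;   // EBCDIC
#else
constexpr int pp_space = 0;
#endif

static_assert (pp_space != 0 ? ' ' == pp_space
                             : (' ' != 0x20 && ' ' != 0x40),
               "' ' differs between #if and C++ expressions");

#if 'a' == 0x61
constexpr int pp_a = 0x61;       // ASCII-compatible
#elif 'a' == 0x81
constexpr int pp_a = 0x81;       // EBCDIC, unsigned plain char
#elif 'a' == -0x7F
constexpr int pp_a = -0x7F;      // EBCDIC, signed plain char
#else
constexpr int pp_a = 0;
#endif

static_assert (pp_a != 0 ? 'a' == pp_a
                         : ('a' != 0x61 && 'a' != 0x81 && 'a' != -0x7F),
               "'a' differs between #if and C++ expressions");
```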


Thread overview: 4+ messages
2021-10-07 13:00 Jakub Jelinek
2021-10-07 13:12 ` Jason Merrill [this message]
2021-10-07 13:23   ` Jakub Jelinek
2021-10-07 13:34 ` Lewis Hyatt
