From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jason@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124])
	by sourceware.org (Postfix) with ESMTPS id 3EE563856959
	for <gcc-patches@gcc.gnu.org>; Mon, 29 Aug 2022 21:15:32 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3EE563856959
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1661807730;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=SzCqKSuZUkz9QsHr2jzzTsg3H71T+k8dd1c5vkfNyi8=;
	b=e1eJ2ZAf9P7MWLc7r07TQSP1v+2qFXxdQo265u+7AIkRQhpZOUutw33/nUqmqTZlMPbqLR
	sxT/uXuimLjUJu18R0QY7ZSkNEADHBTc53jXdVaRD+xlmdqGcb2TtgLXeu/iuUGaoIuy5+
	eVGUE4pNbcHxpD93E1M0GRM4Jg8SiR4=
Received: from mail-qk1-f197.google.com (mail-qk1-f197.google.com
 [209.85.222.197]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id
 us-mta-41-94CxetsXMuGxIZuSU4I3ag-1; Mon, 29 Aug 2022 17:15:29 -0400
X-MC-Unique: 94CxetsXMuGxIZuSU4I3ag-1
Received: by mail-qk1-f197.google.com with SMTP id bj2-20020a05620a190200b006bba055ab6eso7562036qkb.12
        for <gcc-patches@gcc.gnu.org>; Mon, 29 Aug 2022 14:15:29 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=content-transfer-encoding:in-reply-to:from:references:cc:to
         :content-language:subject:user-agent:mime-version:date:message-id
         :x-gm-message-state:from:to:cc;
        bh=SzCqKSuZUkz9QsHr2jzzTsg3H71T+k8dd1c5vkfNyi8=;
        b=t2J368IbIqcjD1ygXAp3t2QhOChiOhhlk/49UFJd9b/HKTstRIknKPrgIprfCy3ZRV
         CrfSv3bvIibp6BvDmMqUDJaFJOqVKlYCntDh3IYiltGcQ8zhAe80CcCj0UYFWJyH+Jqy
         mDrOM5PXKMNKfBeRX06jq2Kyp0Uq40gadIeFRhrUkYQ3KNfwRcnnrJ80eTSNeM/PzsED
         H+aRbrFtspQvFWlG+5pNVD+wxl6wr593kjISXSrYVB8hfDeZ5xTeR/Dfv6XHVtwTmHJ6
         ie5pJ0j57JkYEmRFLDaqkLePzoO5noT0Y4mb9gIv3ihRWO1YZMHoGdfzwAzGxYbKCX3+
         nxYw==
X-Gm-Message-State: ACgBeo3vXU3083VtLvNEP1rxaCxLeEDY5P9xhflA0lrpX4toSysrEM5i
	qYZXbCk8o7I0UyEnhkjVugcdMFwsGGuFa2yRohHQYmSusNpS2VtQOblFsYNdlWS4AIyD9yQrXJ0
	HQoI4pKdZSZYEVLN6Uw==
X-Received: by 2002:a05:622a:413:b0:343:73b9:5cc6 with SMTP id n19-20020a05622a041300b0034373b95cc6mr11987023qtx.655.1661807728778;
        Mon, 29 Aug 2022 14:15:28 -0700 (PDT)
X-Google-Smtp-Source: AA6agR5KgpRG65UbnCmkuRN2W/1HbivrroYLicgQUdhKb81CziyffN5fHLHw3YkOs5PQRqWLYdvrTg==
X-Received: by 2002:a05:622a:413:b0:343:73b9:5cc6 with SMTP id n19-20020a05622a041300b0034373b95cc6mr11986984qtx.655.1661807728232;
        Mon, 29 Aug 2022 14:15:28 -0700 (PDT)
Received: from [192.168.1.101] (130-44-159-43.s15913.c3-0.arl-cbr1.sbo-arl.ma.cable.rcncustomer.com. [130.44.159.43])
        by smtp.gmail.com with ESMTPSA id hf11-20020a05622a608b00b00344b807bb95sm5705344qtb.74.2022.08.29.14.15.27
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Mon, 29 Aug 2022 14:15:27 -0700 (PDT)
Message-ID: <e93528c7-efa8-a328-2af5-a2c2ce59dfe4@redhat.com>
Date: Mon, 29 Aug 2022 17:15:26 -0400
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.13.0
Subject: Re: [PATCH] libcpp: Add -Winvalid-utf8 warning [PR106655]
To: Jakub Jelinek <jakub@redhat.com>
Cc: gcc-patches@gcc.gnu.org
References: <Ywx1jiBRWLmv2NOZ@tucnak>
From: Jason Merrill <jason@redhat.com>
In-Reply-To: <Ywx1jiBRWLmv2NOZ@tucnak>
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-5.5 required=5.0 tests=BAYES_00,BODY_8BITS,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On 8/29/22 04:15, Jakub Jelinek wrote:
> Hi!
> 
> The following patch introduces a new warning - -Winvalid-utf8 similarly
> to what clang now has - to diagnose invalid UTF-8 byte sequences in
> comments.  In identifiers and in string literals it should be diagnosed
> already but comment content hasn't been really verified.
> 
> I'm not sure if this is enough to say P2295R6 is implemented or not.
> 
> The problem is that in the most common case, people don't use
> -finput-charset= option and the sources often are UTF-8, but sometimes
> could be some ASCII compatible single byte encoding where non-ASCII
> characters only appear in comments.  So having the warning off by default
> is IMO desirable.  Now, if people use explicit -finput-charset=UTF-8,
> perhaps we could make the warning on by default for C++23 and use pedwarn
> instead of warning, because then the user told us explicitly that the source
> is UTF-8.  From the paper I understood one of the implementation options
> is to claim that the implementation supports 2 encodings, UTF-8 and UTF-8
> like encodings where invalid UTF-8 characters in comments are replaced say
> by spaces, where the latter could be the default and the former only
> used if -finput-charset=UTF-8 -Werror=invalid-utf8 options are used.
> 
> Thoughts on this?

That sounds good to me.

> 2022-08-29  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR c++/106655
> libcpp/
> 	* include/cpplib.h (struct cpp_options): Implement C++23
> 	P2295R6 - Support for UTF-8 as a portable source file encoding.
> 	Add cpp_warn_invalid_utf8 field.
> 	(enum cpp_warning_reason): Add CPP_W_INVALID_UTF8 enumerator.
> 	* init.cc (cpp_create_reader): Initialize cpp_warn_invalid_utf8.
> 	* lex.cc (utf8_continuation): New const variable.
> 	(utf8_signifier): Move earlier in the file.
> 	(_cpp_warn_invalid_utf8): New function.
> 	(_cpp_skip_block_comment): Handle -Winvalid-utf8 warning.
> 	(skip_line_comment): Likewise.
> gcc/
> 	* doc/invoke.texi (-Winvalid-utf8): Document it.
> gcc/c-family/
> 	* c.opt (-Winvalid-utf8): New warning.
> gcc/testsuite/
> 	* c-c++-common/cpp/Winvalid-utf8-1.c: New test.
> 
> --- libcpp/include/cpplib.h.jj	2022-08-25 14:25:23.866912426 +0200
> +++ libcpp/include/cpplib.h	2022-08-27 12:17:55.185022807 +0200
> @@ -560,6 +560,9 @@ struct cpp_options
>        cpp_bidirectional_level.  */
>     unsigned char cpp_warn_bidirectional;
>   
> +  /* True if libcpp should warn about invalid UTF-8 characters in comments.  */
> +  bool cpp_warn_invalid_utf8;
> +
>     /* Dependency generation.  */
>     struct
>     {
> @@ -666,7 +669,8 @@ enum cpp_warning_reason {
>     CPP_W_CXX11_COMPAT,
>     CPP_W_CXX20_COMPAT,
>     CPP_W_EXPANSION_TO_DEFINED,
> -  CPP_W_BIDIRECTIONAL
> +  CPP_W_BIDIRECTIONAL,
> +  CPP_W_INVALID_UTF8
>   };
>   
>   /* Callback for header lookup for HEADER, which is the name of a
> --- libcpp/init.cc.jj	2022-08-24 09:55:44.571876638 +0200
> +++ libcpp/init.cc	2022-08-27 12:18:54.559246323 +0200
> @@ -227,6 +227,7 @@ cpp_create_reader (enum c_lang lang, cpp
>     CPP_OPTION (pfile, ext_numeric_literals) = 1;
>     CPP_OPTION (pfile, warn_date_time) = 0;
>     CPP_OPTION (pfile, cpp_warn_bidirectional) = bidirectional_unpaired;
> +  CPP_OPTION (pfile, cpp_warn_invalid_utf8) = 0;
>   
>     /* Default CPP arithmetic to something sensible for the host for the
>        benefit of dumb users like fix-header.  */
> --- libcpp/lex.cc.jj	2022-08-26 09:24:12.089615949 +0200
> +++ libcpp/lex.cc	2022-08-27 13:43:40.560769087 +0200
> @@ -1704,6 +1704,59 @@ maybe_warn_bidi_on_char (cpp_reader *pfi
>     bidi::on_char (kind, ucn_p, loc);
>   }
>   
> +static const cppchar_t utf8_continuation = 0x80;
> +static const cppchar_t utf8_signifier = 0xC0; >
> +/* Emit -Winvalid-utf8 warning on invalid UTF-8 character starting
> +   at PFILE->buffer->cur.  Return a pointer after the diagnosed
> +   invalid character.  */
> +
> +static const uchar *
> +_cpp_warn_invalid_utf8 (cpp_reader *pfile)
> +{
> +  cpp_buffer *buffer = pfile->buffer;
> +  const uchar *cur = buffer->cur;
> +
> +  if (cur[0] < utf8_signifier
> +      || cur[1] < utf8_continuation || cur[1] >= utf8_signifier)
> +    {
> +      cpp_warning_with_line (pfile, CPP_W_INVALID_UTF8,
> +			     pfile->line_table->highest_line,
> +			     CPP_BUF_COL (buffer),
> +			     "invalid UTF-8 character <%x> in comment",
> +			     cur[0]);
> +      return cur + 1;
> +    }
> +  else if (cur[2] < utf8_continuation || cur[2] >= utf8_signifier)

Unicode table 3-7 says that the second byte is sometimes restricted to 
less than this range.  Hmm, it looks like one_utf8_to_cppchar doesn't 
check that, either.

Did you consider adding error reporting to one_utf8_to_cppchar?  It 
might be better to localize the utf8 logic.

> +    {
> +      cpp_warning_with_line (pfile, CPP_W_INVALID_UTF8,
> +			     pfile->line_table->highest_line,
> +			     CPP_BUF_COL (buffer),
> +			     "invalid UTF-8 character <%x><%x> in comment",
> +			     cur[0], cur[1]);
> +      return cur + 2;
> +    }
> +  else if (cur[3] < utf8_continuation || cur[3] >= utf8_signifier)
> +    {
> +      cpp_warning_with_line (pfile, CPP_W_INVALID_UTF8,
> +			     pfile->line_table->highest_line,
> +			     CPP_BUF_COL (buffer),
> +			     "invalid UTF-8 character <%x><%x><%x> in comment",
> +			     cur[0], cur[1], cur[2]);
> +      return cur + 3;
> +    }
> +  else
> +    {
> +      cpp_warning_with_line (pfile, CPP_W_INVALID_UTF8,
> +			     pfile->line_table->highest_line,
> +			     CPP_BUF_COL (buffer),
> +			     "invalid UTF-8 character "
> +			     "<%x><%x><%x><%x> in comment",
> +			     cur[0], cur[1], cur[2], cur[3]);
> +      return cur + 4;
> +    }
> +}
> +
>   /* Skip a C-style block comment.  We find the end of the comment by
>      seeing if an asterisk is before every '/' we encounter.  Returns
>      nonzero if comment terminated by EOF, zero otherwise.
> @@ -1716,6 +1769,8 @@ _cpp_skip_block_comment (cpp_reader *pfi
>     const uchar *cur = buffer->cur;
>     uchar c;
>     const bool warn_bidi_p = pfile->warn_bidi_p ();
> +  const bool warn_invalid_utf8_p = CPP_OPTION (pfile, cpp_warn_invalid_utf8);
> +  const bool warn_bidi_or_invalid_utf8_p = warn_bidi_p | warn_invalid_utf8_p;
>   
>     cur++;
>     if (*cur == '/')
> @@ -1765,13 +1820,32 @@ _cpp_skip_block_comment (cpp_reader *pfi
>   
>   	  cur = buffer->cur;
>   	}
> -      /* If this is a beginning of a UTF-8 encoding, it might be
> -	 a bidirectional control character.  */
> -      else if (__builtin_expect (c == bidi::utf8_start, 0) && warn_bidi_p)
> -	{
> -	  location_t loc;
> -	  bidi::kind kind = get_bidi_utf8 (pfile, cur - 1, &loc);
> -	  maybe_warn_bidi_on_char (pfile, kind, /*ucn_p=*/false, loc);
> +      else if (__builtin_expect (c >= utf8_continuation, 0)
> +	       && warn_bidi_or_invalid_utf8_p)
> +	{
> +	  /* If this is a beginning of a UTF-8 encoding, it might be
> +	     a bidirectional control character.  */
> +	  if (c == bidi::utf8_start && warn_bidi_p)
> +	    {
> +	      location_t loc;
> +	      bidi::kind kind = get_bidi_utf8 (pfile, cur - 1, &loc);
> +	      maybe_warn_bidi_on_char (pfile, kind, /*ucn_p=*/false, loc);
> +	    }
> +	  if (!warn_invalid_utf8_p)
> +	    continue;
> +	  if (c >= utf8_signifier)
> +	    {
> +	      cppchar_t s;
> +	      const uchar *pstr = cur - 1;
> +	      if (_cpp_valid_utf8 (pfile, &pstr, buffer->rlimit, 0, NULL, &s)
> +		  && s <= 0x0010FFFF)
> +		{
> +		  cur = pstr;
> +		  continue;
> +		}
> +	    }
> +	  buffer->cur = cur - 1;
> +	  cur = _cpp_warn_invalid_utf8 (pfile);
>   	}
>       }
>   
> @@ -1789,11 +1863,13 @@ skip_line_comment (cpp_reader *pfile)
>     cpp_buffer *buffer = pfile->buffer;
>     location_t orig_line = pfile->line_table->highest_line;
>     const bool warn_bidi_p = pfile->warn_bidi_p ();
> +  const bool warn_invalid_utf8_p = CPP_OPTION (pfile, cpp_warn_invalid_utf8);
> +  const bool warn_bidi_or_invalid_utf8_p = warn_bidi_p | warn_invalid_utf8_p;
>   
> -  if (!warn_bidi_p)
> +  if (!warn_bidi_or_invalid_utf8_p)
>       while (*buffer->cur != '\n')
>         buffer->cur++;
> -  else
> +  else if (!warn_invalid_utf8_p)
>       {
>         while (*buffer->cur != '\n'
>   	     && *buffer->cur != bidi::utf8_start)
> @@ -1813,6 +1889,38 @@ skip_line_comment (cpp_reader *pfile)
>   	  maybe_warn_bidi_on_close (pfile, buffer->cur);
>   	}
>       }
> +  else
> +    {
> +      while (*buffer->cur != '\n')
> +	{
> +	  if (*buffer->cur < utf8_continuation)
> +	    {
> +	      buffer->cur++;
> +	      continue;
> +	    }
> +	  if (__builtin_expect (*buffer->cur == bidi::utf8_start, 0)
> +	      && warn_bidi_p)
> +	    {
> +	      location_t loc;
> +	      bidi::kind kind = get_bidi_utf8 (pfile, buffer->cur, &loc);
> +	      maybe_warn_bidi_on_char (pfile, kind, /*ucn_p=*/false, loc);
> +	    }
> +	  if (*buffer->cur >= utf8_signifier)
> +	    {
> +	      cppchar_t s;
> +	      const uchar *pstr = buffer->cur;
> +	      if (_cpp_valid_utf8 (pfile, &pstr, buffer->rlimit, 0, NULL, &s)
> +		  && s <= 0x0010FFFF)
> +		{
> +		  buffer->cur = pstr;
> +		  continue;
> +		}
> +	    }
> +	  buffer->cur = _cpp_warn_invalid_utf8 (pfile);
> +	}
> +      if (warn_bidi_p)
> +	maybe_warn_bidi_on_close (pfile, buffer->cur);
> +    }
>   
>     _cpp_process_line_notes (pfile, true);
>     return orig_line != pfile->line_table->highest_line;
> @@ -1919,8 +2027,6 @@ warn_about_normalization (cpp_reader *pf
>       }
>   }
>   
> -static const cppchar_t utf8_signifier = 0xC0;
> -
>   /* Returns TRUE if the sequence starting at buffer->cur is valid in
>      an identifier.  FIRST is TRUE if this starts an identifier.  */
>   
> --- gcc/doc/invoke.texi.jj	2022-08-27 09:14:43.047696028 +0200
> +++ gcc/doc/invoke.texi	2022-08-27 14:05:22.417755406 +0200
> @@ -365,9 +365,9 @@ Objective-C and Objective-C++ Dialects}.
>   -Winfinite-recursion @gol
>   -Winit-self  -Winline  -Wno-int-conversion  -Wint-in-bool-context @gol
>   -Wno-int-to-pointer-cast  -Wno-invalid-memory-model @gol
> --Winvalid-pch  -Wjump-misses-init  -Wlarger-than=@var{byte-size} @gol
> --Wlogical-not-parentheses  -Wlogical-op  -Wlong-long @gol
> --Wno-lto-type-mismatch -Wmain  -Wmaybe-uninitialized @gol
> +-Winvalid-pch  -Winvalid-utf8 -Wjump-misses-init  @gol
> +-Wlarger-than=@var{byte-size}  -Wlogical-not-parentheses  -Wlogical-op  @gol
> +-Wlong-long  -Wno-lto-type-mismatch -Wmain  -Wmaybe-uninitialized @gol
>   -Wmemset-elt-size  -Wmemset-transposed-args @gol
>   -Wmisleading-indentation  -Wmissing-attributes  -Wmissing-braces @gol
>   -Wmissing-field-initializers  -Wmissing-format-attribute @gol
> @@ -9569,6 +9569,11 @@ different size.
>   Warn if a precompiled header (@pxref{Precompiled Headers}) is found in
>   the search path but cannot be used.
>   
> +@item -Winvalid-utf8
> +@opindex Winvalid-utf8
> +@opindex Wno-invalid-utf8
> +Warn if an invalid UTF-8 character is inside of a comment.
> +
>   @item -Wlong-long
>   @opindex Wlong-long
>   @opindex Wno-long-long
> --- gcc/c-family/c.opt.jj	2022-08-27 09:14:43.036696173 +0200
> +++ gcc/c-family/c.opt	2022-08-27 14:03:06.328534617 +0200
> @@ -821,6 +821,10 @@ Winvalid-pch
>   C ObjC C++ ObjC++ CPP(warn_invalid_pch) CppReason(CPP_W_INVALID_PCH) Var(cpp_warn_invalid_pch) Init(0) Warning
>   Warn about PCH files that are found but not used.
>   
> +Winvalid-utf8
> +C objC C++ ObjC++ CPP(cpp_warn_invalid_utf8) CppReason(CPP_W_INVALID_UTF8) Var(warn_invalid_utf8) Init(0) Warning
> +Warn about invalid UTF-8 characters in comments.
> +
>   Wjump-misses-init
>   C ObjC Var(warn_jump_misses_init) Warning LangEnabledby(C ObjC,Wc++-compat)
>   Warn when a jump misses a variable initialization.
> --- gcc/testsuite/c-c++-common/cpp/Winvalid-utf8-1.c.jj	2022-08-27 14:01:51.115517571 +0200
> +++ gcc/testsuite/c-c++-common/cpp/Winvalid-utf8-1.c	2022-08-27 14:33:21.466802817 +0200
> @@ -0,0 +1,39 @@
> +// P2295R6 - Support for UTF-8 as a portable source file encoding
> +// This test intentionally contains various byte sequences which are not valid UTF-8
> +// { dg-do preprocess }
> +// { dg-options "-finput-charset=UTF-8 -Winvalid-utf8" }
> +
> +// a߿ࠀ퟿𐀀􏿿a		{ dg-bogus "invalid UTF-8 character" }
> +// a�a					{ dg-warning "invalid UTF-8 character <80> in comment" }
> +// a�a					{ dg-warning "invalid UTF-8 character <bf> in comment" }
> +// a�a					{ dg-warning "invalid UTF-8 character <c0> in comment" }
> +// a�a					{ dg-warning "invalid UTF-8 character <c1> in comment" }
> +// a�a					{ dg-warning "invalid UTF-8 character <f5> in comment" }
> +// a�a					{ dg-warning "invalid UTF-8 character <ff> in comment" }
> +// a�a					{ dg-warning "invalid UTF-8 character <c2> in comment" }
> +// a�a					{ dg-warning "invalid UTF-8 character <e0> in comment" }
> +// a���a				{ dg-warning "invalid UTF-8 character <e0><80><bf> in comment" }
> +// a���a				{ dg-warning "invalid UTF-8 character <e0><9f><80> in comment" }
> +// a��a					{ dg-warning "invalid UTF-8 character <e0><bf> in comment" }
> +// a��a					{ dg-warning "invalid UTF-8 character <ec><80> in comment" }
> +// a���a				{ dg-warning "invalid UTF-8 character <ed><a0><80> in comment" }
> +// a����a				{ dg-warning "invalid UTF-8 character <f0><80><80><80> in comment" }
> +// a����a				{ dg-warning "invalid UTF-8 character <f0><8f><bf><bf> in comment" }
> +// a����a				{ dg-warning "invalid UTF-8 character <f4><90><80><80> in comment" }
> +/* a߿ࠀ퟿𐀀􏿿a		{ dg-bogus "invalid UTF-8 character" } */
> +/* a�a					{ dg-warning "invalid UTF-8 character <80> in comment" } */
> +/* a�a					{ dg-warning "invalid UTF-8 character <bf> in comment" } */
> +/* a�a					{ dg-warning "invalid UTF-8 character <c0> in comment" } */
> +/* a�a					{ dg-warning "invalid UTF-8 character <c1> in comment" } */
> +/* a�a					{ dg-warning "invalid UTF-8 character <f5> in comment" } */
> +/* a�a					{ dg-warning "invalid UTF-8 character <ff> in comment" } */
> +/* a�a					{ dg-warning "invalid UTF-8 character <c2> in comment" } */
> +/* a�a					{ dg-warning "invalid UTF-8 character <e0> in comment" } */
> +/* a���a				{ dg-warning "invalid UTF-8 character <e0><80><bf> in comment" } */
> +/* a���a				{ dg-warning "invalid UTF-8 character <e0><9f><80> in comment" } */
> +/* a��a					{ dg-warning "invalid UTF-8 character <e0><bf> in comment" } */
> +/* a��a					{ dg-warning "invalid UTF-8 character <ec><80> in comment" } */
> +/* a���a				{ dg-warning "invalid UTF-8 character <ed><a0><80> in comment" } */
> +/* a����a				{ dg-warning "invalid UTF-8 character <f0><80><80><80> in comment" } */
> +/* a����a				{ dg-warning "invalid UTF-8 character <f0><8f><bf><bf> in comment" } */
> +/* a����a				{ dg-warning "invalid UTF-8 character <f4><90><80><80> in comment" } */
> 
> 	Jakub
>