Patch ping

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* Patch ping
@ 2009-10-12 12:37 Jakub Jelinek
  2009-10-12 19:23 ` Tom Tromey
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Jelinek @ 2009-10-12 12:37 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gcc-patches

Hi!

Could you please in light of
http://gcc.gnu.org/ml/gcc-patches/2009-10/msg00179.html
review the libcpp bits of
http://gcc.gnu.org/ml/gcc-patches/2009-04/msg01099.html
?

Thanks.

	Jakub

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Patch ping
  2009-10-12 12:37 Patch ping Jakub Jelinek
@ 2009-10-12 19:23 ` Tom Tromey
  2009-10-12 20:21   ` Jakub Jelinek
  2009-10-13 14:05   ` [PATCH] Raw strings (take 3) Jakub Jelinek
  0 siblings, 2 replies; 6+ messages in thread
From: Tom Tromey @ 2009-10-12 19:23 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

>>>>> "Jakub" == Jakub Jelinek <jakub@redhat.com> writes:

Jakub> Could you please in light of
Jakub> http://gcc.gnu.org/ml/gcc-patches/2009-10/msg00179.html
Jakub> review the libcpp bits of
Jakub> http://gcc.gnu.org/ml/gcc-patches/2009-04/msg01099.html
Jakub> ?

I read the patch.  Sorry about the delay -- these days my attention
wanders a lot so pings for libcpp patches are very helpful.

Most of the patch is clearly fine.

I didn't see anything limiting this to C++0x, but I suppose that will be
done outside libcpp.

The patch refers to `CPP_OPTION (pfile, uliterals)' but I didn't see an
addition to struct cpp_options.

Would it be too much trouble to use calls to cpp_error_with_line for all
new errors?  I think this is generally preferable, and in this code I
think it would also let us emit errors against locations inside strings.
(And, for errors about unterminated strings, it would let us point to
the start of the string, which seems better to me.)

lex_raw_string uses _cpp_get_fresh_line, failing if that returns false.
_cpp_get_fresh_line will always return false inside of a directive -- do
we care about raw strings containing newlines in directives?

Some nits..

From lex_raw_string:

+/* Lexes raw a string.  The stored string contains the spelling, including

I think the first sentence should be "Lexes a raw string".

From _cpp_lex_direct:

+    case 'R':
       /* 'L', 'u' or 'U' may introduce wide characters or strings.  */

This comment needs an update.

This isn't part of libcpp, but it seems to me that C_LEX_RAW_STRINGS is
now confusingly named.

Tom

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Patch ping
  2009-10-12 19:23 ` Tom Tromey
@ 2009-10-12 20:21   ` Jakub Jelinek
  2009-10-12 21:29     ` Tom Tromey
  2009-10-13 14:05   ` [PATCH] Raw strings (take 3) Jakub Jelinek
  1 sibling, 1 reply; 6+ messages in thread
From: Jakub Jelinek @ 2009-10-12 20:21 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Jakub Jelinek, gcc-patches

On Mon, Oct 12, 2009 at 01:20:36PM -0600, Tom Tromey wrote:
> I read the patch.  Sorry about the delay -- these days my attention
> wanders a lot so pings for libcpp patches are very helpful.

Thanks.

> I didn't see anything limiting this to C++0x, but I suppose that will be
> done outside libcpp.
> 
> The patch refers to `CPP_OPTION (pfile, uliterals)' but I didn't see an
> addition to struct cpp_options.

Both of the above questions are related.  It is uliterals that limits this
to C++0x and GNUC99, and that wasn't added because it is already
pre-existing.  Before this patch it was used to limit u"", U"", u'x', U'x',
now it guards also u8"", R"[]", LR"[]", uR"[]", UR"[]" and u8R"[]" style
strings.  See init.c (lang_defaults).

> Would it be too much trouble to use calls to cpp_error_with_line for all
> new errors?  I think this is generally preferable, and in this code I
> think it would also let us emit errors against locations inside strings.
> (And, for errors about unterminated strings, it would let us point to
> the start of the string, which seems better to me.)
> 
> lex_raw_string uses _cpp_get_fresh_line, failing if that returns false.
> _cpp_get_fresh_line will always return false inside of a directive -- do
> we care about raw strings containing newlines in directives?

I'll look at these 2 tomorrow.

> +/* Lexes raw a string.  The stored string contains the spelling, including
> 
> I think the first sentence should be "Lexes a raw string".

Fixed in my copy.

> >From _cpp_lex_direct:
> 
> +    case 'R':
>        /* 'L', 'u' or 'U' may introduce wide characters or strings.  */
> 
> This comment needs an update.

Likewise.

> This isn't part of libcpp, but it seems to me that C_LEX_RAW_STRINGS is
> now confusingly named.

True, perhaps C_LEX_STRING_NO_TRANSLATE_NO_JOIN or just
C_LEX_STRING_NO_JOIN will need to be used instead.

	Jakub

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Patch ping
  2009-10-12 20:21   ` Jakub Jelinek
@ 2009-10-12 21:29     ` Tom Tromey
  0 siblings, 0 replies; 6+ messages in thread
From: Tom Tromey @ 2009-10-12 21:29 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

>>>>> "Jakub" == Jakub Jelinek <jakub@redhat.com> writes:

Tom> I didn't see anything limiting this to C++0x, but I suppose that will be
Tom> done outside libcpp.

Tom> The patch refers to `CPP_OPTION (pfile, uliterals)' but I didn't see an
Tom> addition to struct cpp_options.

Jakub> Both of the above questions are related.  It is uliterals that
Jakub> limits this to C++0x and GNUC99, and that wasn't added because it
Jakub> is already pre-existing.

Oops, I didn't think to look there :)

If you need to add a new option here, that is fine by me.  IIUC, this
part is really just about satisfying the differing needs of the C and
C++ FEs.

Tom

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] Raw strings (take 3)
  2009-10-12 19:23 ` Tom Tromey
  2009-10-12 20:21   ` Jakub Jelinek
@ 2009-10-13 14:05   ` Jakub Jelinek
  2009-10-13 17:24     ` Tom Tromey
  1 sibling, 1 reply; 6+ messages in thread
From: Jakub Jelinek @ 2009-10-13 14:05 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gcc-patches

On Mon, Oct 12, 2009 at 01:20:36PM -0600, Tom Tromey wrote:
> Would it be too much trouble to use calls to cpp_error_with_line for all
> new errors?  I think this is generally preferable, and in this code I
> think it would also let us emit errors against locations inside strings.
> (And, for errors about unterminated strings, it would let us point to
> the start of the string, which seems better to me.)

Done.

> lex_raw_string uses _cpp_get_fresh_line, failing if that returns false.
> _cpp_get_fresh_line will always return false inside of a directive -- do
> we care about raw strings containing newlines in directives?

The code already refuses them before calling cpp_get_fresh_line.  #include
etc. require normal non-raw strings (< h-char-sequence > or " q-char-sequence " ),
and I don't ATM see any need for raw strings in directives.

> >From lex_raw_string:
> 
> +/* Lexes raw a string.  The stored string contains the spelling, including

Fixed.

> +    case 'R':
>        /* 'L', 'u' or 'U' may introduce wide characters or strings.  */
> 
> This comment needs an update.

Likewise.

> This isn't part of libcpp, but it seems to me that C_LEX_RAW_STRINGS is
> now confusingly named.

Likewise.

Bootstrapped/regtested on x86_64-linux and i686-linux.

2009-10-13  Jakub Jelinek  <jakub@redhat.com>

	* charset.c (cpp_init_iconv): Initialize utf8_cset_desc.
	(_cpp_destroy_iconv): Destroy utf8_cset_desc, char16_cset_desc
	and char32_cset_desc.
	(converter_for_type): Handle CPP_UTF8STRING.
	(cpp_interpret_string): Handle CPP_UTF8STRING and raw-strings.
	* directives.c (get__Pragma_string): Handle CPP_UTF8STRING.
	(parse_include): Reject raw strings.
	* include/cpplib.h (CPP_UTF8STRING): New token type.
	* internal.h (struct cpp_reader): Add utf8_cset_desc field.
	* lex.c (lex_raw_string): New function.
	(lex_string): Handle u8 string literals, call lex_raw_string
	for raw string literals.
	(_cpp_lex_direct): Call lex_string even for u8" and {,u,U,L,u8}R"
	sequences.
	* macro.c (stringify_arg): Handle CPP_UTF8STRING.

	* c-common.c (c_parse_error): Handle CPP_UTF8STRING.
	* c-lex.c (c_lex_with_flags): Likewise.  Test C_LEX_STRING_NO_JOIN
	instead of C_LEX_RAW_STRINGS.
	(lex_string): Handle CPP_UTF8STRING.
	* c-parser.c (c_parser_postfix_expression): Likewise.
	* c-pragma.h (C_LEX_RAW_STRINGS): Rename to ...
	(C_LEX_STRING_NO_JOIN): ... this.

	* parser.c (cp_lexer_print_token, cp_parser_is_string_literal,
	cp_parser_string_literal, cp_parser_primary_expression): Likewise.
	(cp_lexer_get_preprocessor_token): Use C_LEX_STRING_JOIN instead
	of C_LEX_RAW_STRINGS.

	* gcc.dg/raw-string-1.c: New test.
	* gcc.dg/raw-string-2.c: New test.
	* gcc.dg/raw-string-3.c: New test.
	* gcc.dg/raw-string-4.c: New test.
	* gcc.dg/raw-string-5.c: New test.
	* gcc.dg/raw-string-6.c: New test.
	* gcc.dg/raw-string-7.c: New test.
	* gcc.dg/utf8-1.c: New test.
	* gcc.dg/utf8-2.c: New test.
	* gcc.dg/utf-badconcat2.c: New test.
	* gcc.dg/utf-dflt2.c: New test.
	* gcc.dg/cpp/include6.c: New test.
	* g++.dg/ext/raw-string-1.C: New test.
	* g++.dg/ext/raw-string-2.C: New test.
	* g++.dg/ext/raw-string-3.C: New test.
	* g++.dg/ext/raw-string-4.C: New test.
	* g++.dg/ext/raw-string-5.C: New test.
	* g++.dg/ext/raw-string-6.C: New test.
	* g++.dg/ext/raw-string-7.C: New test.
	* g++.dg/ext/utf8-1.C: New test.
	* g++.dg/ext/utf8-2.C: New test.
	* g++.dg/ext/utf-badconcat2.C: New test.
	* g++.dg/ext/utf-dflt2.C: New test.

--- libcpp/macro.c.jj	2009-09-03 09:59:43.000000000 +0200
+++ libcpp/macro.c	2009-10-12 21:45:01.000000000 +0200
@@ -379,7 +379,8 @@ stringify_arg (cpp_reader *pfile, macro_
       escape_it = (token->type == CPP_STRING || token->type == CPP_CHAR
 		   || token->type == CPP_WSTRING || token->type == CPP_WCHAR
 		   || token->type == CPP_STRING32 || token->type == CPP_CHAR32
-		   || token->type == CPP_STRING16 || token->type == CPP_CHAR16);
+		   || token->type == CPP_STRING16 || token->type == CPP_CHAR16
+		   || token->type == CPP_UTF8STRING);
 
       /* Room for each char being written in octal, initial space and
 	 final quote and NUL.  */
--- libcpp/include/cpplib.h.jj	2009-09-19 12:04:15.000000000 +0200
+++ libcpp/include/cpplib.h	2009-10-12 21:45:01.000000000 +0200
@@ -127,6 +127,7 @@ struct _cpp_file;
   TK(WSTRING,		LITERAL) /* L"string" */			\
   TK(STRING16,		LITERAL) /* u"string" */			\
   TK(STRING32,		LITERAL) /* U"string" */			\
+  TK(UTF8STRING,	LITERAL) /* u8"string" */			\
   TK(OBJC_STRING,	LITERAL) /* @"string" - Objective-C */		\
   TK(HEADER_NAME,	LITERAL) /* <stdio.h> in #include */		\
 									\
@@ -728,10 +729,10 @@ extern const unsigned char *cpp_macro_de
 extern void _cpp_backup_tokens (cpp_reader *, unsigned int);
 extern const cpp_token *cpp_peek_token (cpp_reader *, int);
 
-/* Evaluate a CPP_CHAR or CPP_WCHAR token.  */
+/* Evaluate a CPP_*CHAR* token.  */
 extern cppchar_t cpp_interpret_charconst (cpp_reader *, const cpp_token *,
 					  unsigned int *, int *);
-/* Evaluate a vector of CPP_STRING or CPP_WSTRING tokens.  */
+/* Evaluate a vector of CPP_*STRING* tokens.  */
 extern bool cpp_interpret_string (cpp_reader *,
 				  const cpp_string *, size_t,
 				  cpp_string *, enum cpp_ttype);
--- libcpp/internal.h.jj	2009-06-08 11:54:15.000000000 +0200
+++ libcpp/internal.h	2009-10-12 21:45:01.000000000 +0200
@@ -397,6 +397,10 @@ struct cpp_reader
   struct cset_converter narrow_cset_desc;
 
   /* Descriptor for converting from the source character set to the
+     UTF-8 execution character set.  */
+  struct cset_converter utf8_cset_desc;
+
+  /* Descriptor for converting from the source character set to the
      UTF-16 execution character set.  */
   struct cset_converter char16_cset_desc;
 
--- libcpp/directives.c.jj	2009-09-13 19:25:04.000000000 +0200
+++ libcpp/directives.c	2009-10-13 10:00:23.000000000 +0200
@@ -697,7 +697,8 @@ parse_include (cpp_reader *pfile, int *p
   /* Allow macro expansion.  */
   header = get_token_no_padding (pfile);
   *location = header->src_loc;
-  if (header->type == CPP_STRING || header->type == CPP_HEADER_NAME)
+  if ((header->type == CPP_STRING && header->val.str.text[0] != 'R')
+      || header->type == CPP_HEADER_NAME)
     {
       fname = XNEWVEC (char, header->val.str.len - 1);
       memcpy (fname, header->val.str.text + 1, header->val.str.len - 2);
@@ -1537,7 +1538,8 @@ get__Pragma_string (cpp_reader *pfile)
   if (string->type == CPP_EOF)
     _cpp_backup_tokens (pfile, 1);
   if (string->type != CPP_STRING && string->type != CPP_WSTRING
-      && string->type != CPP_STRING32 && string->type != CPP_STRING16)
+      && string->type != CPP_STRING32 && string->type != CPP_STRING16
+      && string->type != CPP_UTF8STRING)
     return NULL;
 
   paren = get_token_no_padding (pfile);
--- libcpp/lex.c.jj	2009-06-30 13:10:43.000000000 +0200
+++ libcpp/lex.c	2009-10-13 09:41:20.000000000 +0200
@@ -617,12 +617,191 @@ create_literal (cpp_reader *pfile, cpp_t
   token->val.str.text = dest;
 }
 
+/* Lexes a raw string.  The stored string contains the spelling, including
+   double quotes, delimiter string, '[' and ']', any leading
+   'L', 'u', 'U' or 'u8' and 'R' modifier.  It returns the type of the
+   literal, or CPP_OTHER if it was not properly terminated.
+
+   The spelling is NUL-terminated, but it is not guaranteed that this
+   is the first NUL since embedded NULs are preserved.  */
+
+static void
+lex_raw_string (cpp_reader *pfile, cpp_token *token, const uchar *base,
+		const uchar *cur)
+{
+  source_location saw_NUL = 0;
+  const uchar *raw_prefix;
+  unsigned int raw_prefix_len = 0;
+  enum cpp_ttype type;
+  size_t total_len = 0;
+  _cpp_buff *first_buff = NULL, *last_buff = NULL;
+
+  type = (*base == 'L' ? CPP_WSTRING :
+	  *base == 'U' ? CPP_STRING32 :
+	  *base == 'u' ? (base[1] == '8' ? CPP_UTF8STRING : CPP_STRING16)
+	  : CPP_STRING);
+
+  raw_prefix = cur + 1;
+  while (raw_prefix_len < 16)
+    {
+      switch (raw_prefix[raw_prefix_len])
+	{
+	case ' ': case '[': case ']': case '\t':
+	case '\v': case '\f': case '\n': default:
+	  break;
+	/* Basic source charset except the above chars.  */
+	case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
+	case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
+	case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
+	case 's': case 't': case 'u': case 'v': case 'w': case 'x':
+	case 'y': case 'z':
+	case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
+	case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
+	case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
+	case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
+	case 'Y': case 'Z':
+	case '0': case '1': case '2': case '3': case '4': case '5':
+	case '6': case '7': case '8': case '9':
+	case '_': case '{': case '}': case '#': case '(': case ')':
+	case '<': case '>': case '%': case ':': case ';': case '.':
+	case '?': case '*': case '+': case '-': case '/': case '^':
+	case '&': case '|': case '~': case '!': case '=': case ',':
+	case '\\': case '"': case '\'':
+	  raw_prefix_len++;
+	  continue;
+	}
+      break;
+    }
+
+  if (raw_prefix[raw_prefix_len] != '[')
+    {
+      int col = CPP_BUF_COLUMN (pfile->buffer, raw_prefix + raw_prefix_len);
+      if (raw_prefix_len == 16)
+	cpp_error_with_line (pfile, CPP_DL_ERROR, token->src_loc, col,
+			     "raw string delimiter longer than 16 characters");
+      else
+	cpp_error_with_line (pfile, CPP_DL_ERROR, token->src_loc, col,
+			     "invalid character '%c' in raw string delimiter",
+			     (int) raw_prefix[raw_prefix_len]);
+      pfile->buffer->cur = raw_prefix - 1;
+      create_literal (pfile, token, base, raw_prefix - 1 - base, CPP_OTHER);
+      return;
+    }
+
+  cur = raw_prefix + raw_prefix_len + 1;
+  for (;;)
+    {
+      cppchar_t c = *cur++;
+
+      if (c == ']'
+	  && strncmp ((const char *) cur, (const char *) raw_prefix,
+		      raw_prefix_len) == 0
+	  && cur[raw_prefix_len] == '"')
+	{
+	  cur += raw_prefix_len + 1;
+	  break;
+	}
+      else if (c == '\n')
+	{
+	  if (pfile->state.in_directive
+	      || pfile->state.parsing_args
+	      || pfile->state.in_deferred_pragma)
+	    {
+	      cur--;
+	      type = CPP_OTHER;
+	      cpp_error_with_line (pfile, CPP_DL_ERROR, token->src_loc, 0,
+				   "unterminated raw string");
+	      break;
+	    }
+
+	  /* raw strings allow embedded non-escaped newlines, which
+	     complicates this routine a lot.  */
+	  if (first_buff == NULL)
+	    {
+	      total_len = cur - base;
+	      first_buff = last_buff = _cpp_get_buff (pfile, total_len);
+	      memcpy (BUFF_FRONT (last_buff), base, total_len);
+	      raw_prefix = BUFF_FRONT (last_buff) + (raw_prefix - base);
+	      BUFF_FRONT (last_buff) += total_len;
+	    }
+	  else
+	    {
+	      size_t len = cur - base;
+	      size_t cur_len = len > BUFF_ROOM (last_buff)
+			       ? BUFF_ROOM (last_buff) : len;
+
+	      total_len += len;
+	      memcpy (BUFF_FRONT (last_buff), base, cur_len);
+	      BUFF_FRONT (last_buff) += cur_len;
+	      if (len > cur_len)
+		{
+		  last_buff = _cpp_append_extend_buff (pfile, last_buff,
+						       len - cur_len);
+		  memcpy (BUFF_FRONT (last_buff), base + cur_len,
+			  len - cur_len);
+		  BUFF_FRONT (last_buff) += len - cur_len;
+		}
+	    }
+
+	  if (pfile->buffer->cur < pfile->buffer->rlimit)
+	    CPP_INCREMENT_LINE (pfile, 0);
+	  pfile->buffer->need_line = true;
+
+	  if (!_cpp_get_fresh_line (pfile))
+	    {
+	      source_location src_loc = token->src_loc;
+	      token->type = CPP_EOF;
+	      /* Tell the compiler the line number of the EOF token.  */
+	      token->src_loc = pfile->line_table->highest_line;
+	      token->flags = BOL;
+	      if (first_buff != NULL)
+		_cpp_release_buff (pfile, first_buff);
+	      cpp_error_with_line (pfile, CPP_DL_ERROR, src_loc, 0,
+				   "unterminated raw string");
+	      return;
+	    }
+
+	  cur = base = pfile->buffer->cur;
+	}
+      else if (c == '\0' && !saw_NUL)
+	LINEMAP_POSITION_FOR_COLUMN (saw_NUL, pfile->line_table,
+				     CPP_BUF_COLUMN (pfile->buffer, cur - 1));
+    }
+
+  if (saw_NUL && !pfile->state.skipping)
+    cpp_error_with_line (pfile, CPP_DL_WARNING, saw_NUL, 0,
+	       "null character(s) preserved in literal");
+
+  pfile->buffer->cur = cur;
+  if (first_buff == NULL)
+    create_literal (pfile, token, base, cur - base, type);
+  else
+    {
+      uchar *dest = _cpp_unaligned_alloc (pfile, total_len + (cur - base) + 1);
+
+      token->type = type;
+      token->val.str.len = total_len + (cur - base);
+      token->val.str.text = dest;
+      last_buff = first_buff;
+      while (last_buff != NULL)
+	{
+	  memcpy (dest, last_buff->base,
+		  BUFF_FRONT (last_buff) - last_buff->base);
+	  dest += BUFF_FRONT (last_buff) - last_buff->base;
+	  last_buff = last_buff->next;
+	}
+      _cpp_release_buff (pfile, first_buff);
+      memcpy (dest, base, cur - base);
+      dest[cur - base] = '\0';
+    }
+}
+
 /* Lexes a string, character constant, or angle-bracketed header file
    name.  The stored string contains the spelling, including opening
-   quote and leading any leading 'L', 'u' or 'U'.  It returns the type
-   of the literal, or CPP_OTHER if it was not properly terminated, or
-   CPP_LESS for an unterminated header name which must be relexed as
-   normal tokens.
+   quote and any leading 'L', 'u', 'U' or 'u8' and optional
+   'R' modifier.  It returns the type of the literal, or CPP_OTHER
+   if it was not properly terminated, or CPP_LESS for an unterminated
+   header name which must be relexed as normal tokens.
 
    The spelling is NUL-terminated, but it is not guaranteed that this
    is the first NUL since embedded NULs are preserved.  */
@@ -636,12 +815,24 @@ lex_string (cpp_reader *pfile, cpp_token
 
   cur = base;
   terminator = *cur++;
-  if (terminator == 'L' || terminator == 'u' || terminator == 'U')
+  if (terminator == 'L' || terminator == 'U')
     terminator = *cur++;
-  if (terminator == '\"')
+  else if (terminator == 'u')
+    {
+      terminator = *cur++;
+      if (terminator == '8')
+	terminator = *cur++;
+    }
+  if (terminator == 'R')
+    {
+      lex_raw_string (pfile, token, base, cur);
+      return;
+    }
+  if (terminator == '"')
     type = (*base == 'L' ? CPP_WSTRING :
 	    *base == 'U' ? CPP_STRING32 :
-	    *base == 'u' ? CPP_STRING16 : CPP_STRING);
+	    *base == 'u' ? (base[1] == '8' ? CPP_UTF8STRING : CPP_STRING16)
+			 : CPP_STRING);
   else if (terminator == '\'')
     type = (*base == 'L' ? CPP_WCHAR :
 	    *base == 'U' ? CPP_CHAR32 :
@@ -1101,10 +1292,21 @@ _cpp_lex_direct (cpp_reader *pfile)
     case 'L':
     case 'u':
     case 'U':
-      /* 'L', 'u' or 'U' may introduce wide characters or strings.  */
+    case 'R':
+      /* 'L', 'u', 'U', 'u8' or 'R' may introduce wide characters,
+	 wide strings or raw strings.  */
       if (c == 'L' || CPP_OPTION (pfile, uliterals))
 	{
-	  if (*buffer->cur == '\'' || *buffer->cur == '"')
+	  if ((*buffer->cur == '\'' && c != 'R')
+	      || *buffer->cur == '"'
+	      || (*buffer->cur == 'R'
+		  && c != 'R'
+		  && buffer->cur[1] == '"'
+		  && CPP_OPTION (pfile, uliterals))
+	      || (*buffer->cur == '8'
+		  && c == 'u'
+		  && (buffer->cur[1] == '"'
+		      || (buffer->cur[1] == 'R' && buffer->cur[2] == '"'))))
 	    {
 	      lex_string (pfile, result, buffer->cur - 1);
 	      break;
@@ -1120,7 +1322,7 @@ _cpp_lex_direct (cpp_reader *pfile)
     case 'y': case 'z':
     case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
     case 'G': case 'H': case 'I': case 'J': case 'K':
-    case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
+    case 'M': case 'N': case 'O': case 'P': case 'Q':
     case 'S': case 'T':           case 'V': case 'W': case 'X':
     case 'Y': case 'Z':
       result->type = CPP_NAME;
--- libcpp/charset.c.jj	2009-08-19 17:46:22.000000000 +0200
+++ libcpp/charset.c	2009-10-12 21:45:01.000000000 +0200
@@ -721,6 +721,8 @@ cpp_init_iconv (cpp_reader *pfile)
 
   pfile->narrow_cset_desc = init_iconv_desc (pfile, ncset, SOURCE_CHARSET);
   pfile->narrow_cset_desc.width = CPP_OPTION (pfile, char_precision);
+  pfile->utf8_cset_desc = init_iconv_desc (pfile, "UTF-8", SOURCE_CHARSET);
+  pfile->utf8_cset_desc.width = CPP_OPTION (pfile, char_precision);
   pfile->char16_cset_desc = init_iconv_desc (pfile,
 					     be ? "UTF-16BE" : "UTF-16LE",
 					     SOURCE_CHARSET);
@@ -741,6 +743,12 @@ _cpp_destroy_iconv (cpp_reader *pfile)
     {
       if (pfile->narrow_cset_desc.func == convert_using_iconv)
 	iconv_close (pfile->narrow_cset_desc.cd);
+      if (pfile->utf8_cset_desc.func == convert_using_iconv)
+	iconv_close (pfile->utf8_cset_desc.cd);
+      if (pfile->char16_cset_desc.func == convert_using_iconv)
+	iconv_close (pfile->char16_cset_desc.cd);
+      if (pfile->char32_cset_desc.func == convert_using_iconv)
+	iconv_close (pfile->char32_cset_desc.cd);
       if (pfile->wide_cset_desc.func == convert_using_iconv)
 	iconv_close (pfile->wide_cset_desc.cd);
     }
@@ -1330,6 +1338,8 @@ converter_for_type (cpp_reader *pfile, e
     {
     default:
 	return pfile->narrow_cset_desc;
+    case CPP_UTF8STRING:
+	return pfile->utf8_cset_desc;
     case CPP_CHAR16:
     case CPP_STRING16:
 	return pfile->char16_cset_desc;
@@ -1364,7 +1374,47 @@ cpp_interpret_string (cpp_reader *pfile,
   for (i = 0; i < count; i++)
     {
       p = from[i].text;
-      if (*p == 'L' || *p == 'u' || *p == 'U') p++;
+      if (*p == 'u')
+	{
+	  if (*++p == '8')
+	    p++;
+	}
+      else if (*p == 'L' || *p == 'U') p++;
+      if (*p == 'R')
+	{
+	  const uchar *prefix;
+
+	  /* Skip over 'R"'.  */
+	  p += 2;
+	  prefix = p;
+	  while (*p != '[')
+	    p++;
+	  p++;
+	  limit = from[i].text + from[i].len;
+	  if (limit >= p + (p - prefix) + 1)
+	    limit -= (p - prefix) + 1;
+
+	  for (;;)
+	    {
+	      base = p;
+	      while (p < limit && (*p != '\\' || (p[1] != 'u' && p[1] != 'U')))
+		p++;
+	      if (p > base)
+		{
+		  /* We have a run of normal characters; these can be fed
+		     directly to convert_cset.  */
+		  if (!APPLY_CONVERSION (cvt, base, p - base, &tbuf))
+		    goto fail;
+		}
+	      if (p == limit)
+		break;
+
+	      p = convert_ucn (pfile, p + 1, limit, &tbuf, cvt);
+	    }
+
+	  continue;
+	}
+
       p++; /* Skip leading quote.  */
       limit = from[i].text + from[i].len - 1; /* Skip trailing quote.  */
 
--- gcc/cp/parser.c.jj	2009-10-07 09:24:42.000000000 +0200
+++ gcc/cp/parser.c	2009-10-13 09:43:46.000000000 +0200
@@ -402,7 +402,7 @@ cp_lexer_get_preprocessor_token (cp_lexe
    /* Get a new token from the preprocessor.  */
   token->type
     = c_lex_with_flags (&token->u.value, &token->location, &token->flags,
-			lexer == NULL ? 0 : C_LEX_RAW_STRINGS);
+			lexer == NULL ? 0 : C_LEX_STRING_NO_JOIN);
   token->keyword = RID_MAX;
   token->pragma_kind = PRAGMA_NONE;
 
@@ -792,6 +792,7 @@ cp_lexer_print_token (FILE * stream, cp_
     case CPP_STRING16:
     case CPP_STRING32:
     case CPP_WSTRING:
+    case CPP_UTF8STRING:
       fprintf (stream, " \"%s\"", TREE_STRING_POINTER (token->u.value));
       break;
 
@@ -2060,7 +2061,8 @@ cp_parser_is_string_literal (cp_token* t
   return (token->type == CPP_STRING ||
 	  token->type == CPP_STRING16 ||
 	  token->type == CPP_STRING32 ||
-	  token->type == CPP_WSTRING);
+	  token->type == CPP_WSTRING ||
+	  token->type == CPP_UTF8STRING);
 }
 
 /* Returns nonzero if TOKEN is the indicated KEYWORD.  */
@@ -2999,6 +3001,7 @@ cp_parser_string_literal (cp_parser *par
 	{
 	default:
 	case CPP_STRING:
+	case CPP_UTF8STRING:
 	  TREE_TYPE (value) = char_array_type_node;
 	  break;
 	case CPP_STRING16:
@@ -3228,6 +3231,7 @@ cp_parser_primary_expression (cp_parser 
     case CPP_STRING16:
     case CPP_STRING32:
     case CPP_WSTRING:
+    case CPP_UTF8STRING:
       /* ??? Should wide strings be allowed when parser->translate_strings_p
 	 is false (i.e. in attributes)?  If not, we can kill the third
 	 argument to cp_parser_string_literal.  */
--- gcc/c-parser.c.jj	2009-09-24 11:32:36.000000000 +0200
+++ gcc/c-parser.c	2009-10-12 21:45:01.000000000 +0200
@@ -5349,6 +5349,7 @@ c_parser_postfix_expression (c_parser *p
     case CPP_STRING16:
     case CPP_STRING32:
     case CPP_WSTRING:
+    case CPP_UTF8STRING:
       expr.value = c_parser_peek_token (parser)->value;
       expr.original_code = STRING_CST;
       c_parser_consume_token (parser);
--- gcc/c-pragma.h.jj	2008-09-05 12:56:32.000000000 +0200
+++ gcc/c-pragma.h	2009-10-13 09:43:07.000000000 +0200
@@ -118,9 +118,9 @@ extern enum cpp_ttype pragma_lex (tree *
    so that 0 means to translate and join strings.  */
 #define C_LEX_STRING_NO_TRANSLATE 1 /* Do not lex strings into
 				       execution character set.  */
-#define C_LEX_RAW_STRINGS         2 /* Return raw strings -- no
-				       concatenation, no
-				       translation.  */
+#define C_LEX_STRING_NO_JOIN	  2 /* Do not concatenate strings
+				       nor translate them into execution
+				       character set.  */
 
 /* This is not actually available to pragma parsers.  It's merely a
    convenient location to declare this function for c-lex, after
--- gcc/c-common.c.jj	2009-10-05 10:07:22.000000000 +0200
+++ gcc/c-common.c	2009-10-12 21:59:17.000000000 +0200
@@ -8148,7 +8148,8 @@ c_parse_error (const char *gmsgid, enum 
   else if (token_type == CPP_STRING 
 	   || token_type == CPP_WSTRING 
 	   || token_type == CPP_STRING16
-	   || token_type == CPP_STRING32)
+	   || token_type == CPP_STRING32
+	   || token_type == CPP_UTF8STRING)
     message = catenate_messages (gmsgid, " before string constant");
   else if (token_type == CPP_NUMBER)
     message = catenate_messages (gmsgid, " before numeric constant");
--- gcc/c-lex.c.jj	2009-08-19 17:46:11.000000000 +0200
+++ gcc/c-lex.c	2009-10-13 09:43:25.000000000 +0200
@@ -365,6 +365,7 @@ c_lex_with_flags (tree *value, location_
 	    case CPP_WSTRING:
 	    case CPP_STRING16:
 	    case CPP_STRING32:
+	    case CPP_UTF8STRING:
 	      type = lex_string (tok, value, true, true);
 	      break;
 
@@ -423,7 +424,8 @@ c_lex_with_flags (tree *value, location_
     case CPP_WSTRING:
     case CPP_STRING16:
     case CPP_STRING32:
-      if ((lex_flags & C_LEX_RAW_STRINGS) == 0)
+    case CPP_UTF8STRING:
+      if ((lex_flags & C_LEX_STRING_NO_JOIN) == 0)
 	{
 	  type = lex_string (tok, value, false,
 			     (lex_flags & C_LEX_STRING_NO_TRANSLATE) == 0);
@@ -871,12 +873,13 @@ interpret_fixed (const cpp_token *token,
   return value;
 }
 
-/* Convert a series of STRING, WSTRING, STRING16 and/or STRING32 tokens
-   into a tree, performing string constant concatenation.  TOK is the
-   first of these.  VALP is the location to write the string into.
-   OBJC_STRING indicates whether an '@' token preceded the incoming token.
+/* Convert a series of STRING, WSTRING, STRING16, STRING32 and/or
+   UTF8STRING tokens into a tree, performing string constant
+   concatenation.  TOK is the first of these.  VALP is the location
+   to write the string into. OBJC_STRING indicates whether an '@' token
+   preceded the incoming token.
    Returns the CPP token type of the result (CPP_STRING, CPP_WSTRING,
-   CPP_STRING32, CPP_STRING16, or CPP_OBJC_STRING).
+   CPP_STRING32, CPP_STRING16, CPP_UTF8STRING, or CPP_OBJC_STRING).
 
    This is unfortunately more work than it should be.  If any of the
    strings in the series has an L prefix, the result is a wide string
@@ -921,6 +924,7 @@ lex_string (const cpp_token *tok, tree *
     case CPP_WSTRING:
     case CPP_STRING16:
     case CPP_STRING32:
+    case CPP_UTF8STRING:
       if (type != tok->type)
 	{
 	  if (type == CPP_STRING)
@@ -966,6 +970,7 @@ lex_string (const cpp_token *tok, tree *
 	{
 	default:
 	case CPP_STRING:
+	case CPP_UTF8STRING:
 	  value = build_string (1, "");
 	  break;
 	case CPP_STRING16:
@@ -991,6 +996,7 @@ lex_string (const cpp_token *tok, tree *
     {
     default:
     case CPP_STRING:
+    case CPP_UTF8STRING:
       TREE_TYPE (value) = char_array_type_node;
       break;
     case CPP_STRING16:
--- gcc/testsuite/gcc.dg/utf8-1.c.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/gcc.dg/utf8-1.c	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-require-iconv "ISO-8859-2" } */
+/* { dg-options "-std=gnu99 -fexec-charset=ISO-8859-2" } */
+
+const char *str1 = "h\u00e1\U0000010Dky ";
+const char *str2 = "\u010d\u00E1rky\n";
+const char *str3 = u8"h\u00e1\U0000010Dky ";
+const char *str4 = u8"\u010d\u00E1rky\n";
+const char *str5 = "h\u00e1\U0000010Dky " "\u010d\u00E1rky\n";
+const char *str6 = u8"h\u00e1\U0000010Dky " "\u010d\u00E1rky\n";
+const char *str7 = "h\u00e1\U0000010Dky " u8"\u010d\u00E1rky\n";
+#define u8
+const char *str8 = u8"h\u00e1\U0000010Dky " u8"\u010d\u00E1rky\n";
+
+const char latin2_1[] = "\x68\xe1\xe8\x6b\x79\x20";
+const char latin2_2[] = "\xe8\xe1\x72\x6b\x79\n";
+const char utf8_1[] = "\x68\xc3\xa1\xc4\x8d\x6b\x79\x20";
+const char utf8_2[] = "\xc4\x8d\xc3\xa1\x72\x6b\x79\n";
+
+int
+main (void)
+{
+  if (__builtin_strcmp (str1, latin2_1) != 0
+      || __builtin_strcmp (str2, latin2_2) != 0
+      || __builtin_strcmp (str3, utf8_1) != 0
+      || __builtin_strcmp (str4, utf8_2) != 0
+      || __builtin_strncmp (str5, latin2_1, sizeof (latin2_1) - 1) != 0
+      || __builtin_strcmp (str5 + sizeof (latin2_1) - 1, latin2_2) != 0
+      || __builtin_strncmp (str6, utf8_1, sizeof (utf8_1) - 1) != 0
+      || __builtin_strcmp (str6 + sizeof (utf8_1) - 1, utf8_2) != 0
+      || __builtin_strncmp (str7, utf8_1, sizeof (utf8_1) - 1) != 0
+      || __builtin_strcmp (str7 + sizeof (utf8_1) - 1, utf8_2) != 0
+      || __builtin_strncmp (str8, utf8_1, sizeof (utf8_1) - 1) != 0
+      || __builtin_strcmp (str8 + sizeof (utf8_1) - 1, utf8_2) != 0)
+    __builtin_abort ();
+  if (sizeof ("a" u8"b"[0]) != 1
+      || sizeof (u8"a" "b"[0]) != 1
+      || sizeof (u8"a" u8"b"[0]) != 1
+      || sizeof ("a" "\u010d") != 3
+      || sizeof ("a" u8"\u010d") != 4
+      || sizeof (u8"a" "\u010d") != 4
+      || sizeof (u8"a" "\u010d") != 4)
+    __builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.dg/utf-badconcat2.c.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/gcc.dg/utf-badconcat2.c	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,15 @@
+/* Test unsupported concatenation of UTF-8 string literals. */
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99" } */
+
+void	*s0	= u8"a"   "b";
+void	*s1	=   "a" u8"b";
+void	*s2	= u8"a" u8"b";
+void	*s3	= u8"a"  u"b";	/* { dg-error "non-standard concatenation" } */
+void	*s4	=  u"a" u8"b";	/* { dg-error "non-standard concatenation" } */
+void	*s5	= u8"a"  U"b";	/* { dg-error "non-standard concatenation" } */
+void	*s6	=  U"a" u8"b";	/* { dg-error "non-standard concatenation" } */
+void	*s7	= u8"a"  L"b";	/* { dg-error "non-standard concatenation" } */
+void	*s8	=  L"a" u8"b";	/* { dg-error "non-standard concatenation" } */
+
+int main () {}
--- gcc/testsuite/gcc.dg/utf8-2.c.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/gcc.dg/utf8-2.c	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99" } */
+
+#include <wchar.h>
+
+typedef __CHAR16_TYPE__	char16_t;
+typedef __CHAR32_TYPE__ char32_t;
+
+const char	s0[]	= u8"ab";
+const char16_t	s1[]	= u8"ab";	/* { dg-error "from non-wide" } */
+const char32_t  s2[]    = u8"ab";	/* { dg-error "from non-wide" } */
+const wchar_t   s3[]    = u8"ab";	/* { dg-error "from non-wide" } */
+
+const char      t0[0]   = u8"ab";	/* { dg-warning "chars is too long" } */
+const char      t1[1]   = u8"ab";	/* { dg-warning "chars is too long" } */
+const char      t2[2]   = u8"ab";
+const char      t3[3]   = u8"ab";
+const char      t4[4]   = u8"ab";
+
+const char      u0[0]   = u8"\u2160.";	/* { dg-warning "chars is too long" } */
+const char      u1[1]   = u8"\u2160.";	/* { dg-warning "chars is too long" } */
+const char      u2[2]   = u8"\u2160.";	/* { dg-warning "chars is too long" } */
+const char      u3[3]   = u8"\u2160.";	/* { dg-warning "chars is too long" } */
+const char      u4[4]   = u8"\u2160.";
+const char      u5[5]   = u8"\u2160.";
+const char      u6[6]   = u8"\u2160.";
--- gcc/testsuite/gcc.dg/raw-string-6.c.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/gcc.dg/raw-string-6.c	2009-10-13 09:49:33.000000000 +0200
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99" } */
+
+const void *s0 = R"ouch[]ouCh";	/* { dg-error "expected expression at end of input" } */
+	/* { dg-error "unterminated raw string" "" { target *-*-* } 4 } */
--- gcc/testsuite/gcc.dg/raw-string-1.c.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/gcc.dg/raw-string-1.c	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,101 @@
+/* { dg-do run } */
+/* { dg-options "-std=gnu99" } */
+
+#include <wchar.h>
+
+typedef __CHAR16_TYPE__	char16_t;
+typedef __CHAR32_TYPE__ char32_t;
+
+const char s0[] = R"[a\
+\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb
+c]";
+const char s1[] = "a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc";
+const char s2[] = R"*|*[a\
+b
+c]"
+c]*|"
+c]*|*";
+const char s3[] = "ab\nc]\"\nc]*|\"\nc";
+
+const char t0[] = u8R"[a\
+\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb
+c]";
+const char t1[] = u8"a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc";
+const char t2[] = u8R"*|*[a\
+b
+c]"
+c]*|"
+c]*|*";
+const char t3[] = u8"ab\nc]\"\nc]*|\"\nc";
+
+const char16_t u0[] = uR"[a\
+\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb
+c]";
+const char16_t u1[] = u"a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc";
+const char16_t u2[] = uR"*|*[a\
+b
+c]"
+c]*|"
+c]*|*";
+const char16_t u3[] = u"ab\nc]\"\nc]*|\"\nc";
+
+const char32_t U0[] = UR"[a\
+\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb
+c]";
+const char32_t U1[] = U"a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc";
+const char32_t U2[] = UR"*|*[a\
+b
+c]"
+c]*|"
+c]*|*";
+const char32_t U3[] = U"ab\nc]\"\nc]*|\"\nc";
+
+const wchar_t L0[] = LR"[a\
+\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb
+c]";
+const wchar_t L1[] = L"a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc";
+const wchar_t L2[] = LR"*|*[a\
+b
+c]"
+c]*|"
+c]*|*";
+const wchar_t L3[] = L"ab\nc]\"\nc]*|\"\nc";
+
+int
+main (void)
+{
+  if (sizeof (s0) != sizeof (s1)
+      || __builtin_memcmp (s0, s1, sizeof (s0)) != 0)
+    __builtin_abort ();
+  if (sizeof (s2) != sizeof (s3)
+      || __builtin_memcmp (s2, s3, sizeof (s2)) != 0)
+    __builtin_abort ();
+  if (sizeof (t0) != sizeof (t1)
+      || __builtin_memcmp (t0, t1, sizeof (t0)) != 0)
+    __builtin_abort ();
+  if (sizeof (t2) != sizeof (t3)
+      || __builtin_memcmp (t2, t3, sizeof (t2)) != 0)
+    __builtin_abort ();
+  if (sizeof (u0) != sizeof (u1)
+      || __builtin_memcmp (u0, u1, sizeof (u0)) != 0)
+    __builtin_abort ();
+  if (sizeof (u2) != sizeof (u3)
+      || __builtin_memcmp (u2, u3, sizeof (u2)) != 0)
+    __builtin_abort ();
+  if (sizeof (U0) != sizeof (U1)
+      || __builtin_memcmp (U0, U1, sizeof (U0)) != 0)
+    __builtin_abort ();
+  if (sizeof (U2) != sizeof (U3)
+      || __builtin_memcmp (U2, U3, sizeof (U2)) != 0)
+    __builtin_abort ();
+  if (sizeof (L0) != sizeof (L1)
+      || __builtin_memcmp (L0, L1, sizeof (L0)) != 0)
+    __builtin_abort ();
+  if (sizeof (L2) != sizeof (L3)
+      || __builtin_memcmp (L2, L3, sizeof (L2)) != 0)
+    __builtin_abort ();
+  if (sizeof (R"*[]*") != 1
+      || __builtin_memcmp (R"*[]*", "", 1) != 0)
+    __builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.dg/raw-string-2.c.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/gcc.dg/raw-string-2.c	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,109 @@
+/* { dg-do run } */
+/* { dg-options "-std=gnu99" } */
+
+#include <wchar.h>
+
+typedef __CHAR16_TYPE__	char16_t;
+typedef __CHAR32_TYPE__ char32_t;
+
+#define R
+#define u
+#define uR
+#define U
+#define UR
+#define u8
+#define u8R
+#define L
+#define LR
+
+const char s00[] = R"[a]" "[b]";
+const char s01[] = "[a]" R"*[b]*";
+const char s02[] = R"[a]" R"[b]";
+const char s03[] = R"-[a]-" u8"[b]";
+const char s04[] = "[a]" u8R"MNOPQRSTUVWXYZ[b]MNOPQRSTUVWXYZ";
+const char s05[] = R"[a]" u8R"wxyzABCDEFGHIJKL[b]wxyzABCDEFGHIJKL";
+const char s06[] = u8R";([a];(" "[b]";
+const char s07[] = u8"[a]" R"[b]";
+const char s08[] = u8R"[a]" R"_{}#()<>%:;.?*+-[b]_{}#()<>%:;.?*+-";
+const char s09[] = u8R"/^&|~!=,"'\[a]/^&|~!=,"'\" u8"[b]";
+const char s10[] = u8"[a]" u8R"0123456789abcdef[b]0123456789abcdef";
+const char s11[] = u8R"ghijklmnopqrstuv[a]ghijklmnopqrstuv" u8R"w[b]w";
+
+const char16_t u03[] = R"-[a]-" u"[b]";
+const char16_t u04[] = "[a]" uR"MNOPQRSTUVWXYZ[b]MNOPQRSTUVWXYZ";
+const char16_t u05[] = R"[a]" uR"wxyzABCDEFGHIJKL[b]wxyzABCDEFGHIJKL";
+const char16_t u06[] = uR";([a];(" "[b]";
+const char16_t u07[] = u"[a]" R"[b]";
+const char16_t u08[] = uR"[a]" R"_{}#()<>%:;.?*+-[b]_{}#()<>%:;.?*+-";
+const char16_t u09[] = uR"/^&|~!=,"'\[a]/^&|~!=,"'\" u"[b]";
+const char16_t u10[] = u"[a]" uR"0123456789abcdef[b]0123456789abcdef";
+const char16_t u11[] = uR"ghijklmnopqrstuv[a]ghijklmnopqrstuv" uR"w[b]w";
+
+const char32_t U03[] = R"-[a]-" U"[b]";
+const char32_t U04[] = "[a]" UR"MNOPQRSTUVWXYZ[b]MNOPQRSTUVWXYZ";
+const char32_t U05[] = R"[a]" UR"wxyzABCDEFGHIJKL[b]wxyzABCDEFGHIJKL";
+const char32_t U06[] = UR";([a];(" "[b]";
+const char32_t U07[] = U"[a]" R"[b]";
+const char32_t U08[] = UR"[a]" R"_{}#()<>%:;.?*+-[b]_{}#()<>%:;.?*+-";
+const char32_t U09[] = UR"/^&|~!=,"'\[a]/^&|~!=,"'\" U"[b]";
+const char32_t U10[] = U"[a]" UR"0123456789abcdef[b]0123456789abcdef";
+const char32_t U11[] = UR"ghijklmnopqrstuv[a]ghijklmnopqrstuv" UR"w[b]w";
+
+const wchar_t L03[] = R"-[a]-" L"[b]";
+const wchar_t L04[] = "[a]" LR"MNOPQRSTUVWXYZ[b]MNOPQRSTUVWXYZ";
+const wchar_t L05[] = R"[a]" LR"wxyzABCDEFGHIJKL[b]wxyzABCDEFGHIJKL";
+const wchar_t L06[] = LR";([a];(" "[b]";
+const wchar_t L07[] = L"[a]" R"[b]";
+const wchar_t L08[] = LR"[a]" R"_{}#()<>%:;.?*+-[b]_{}#()<>%:;.?*+-";
+const wchar_t L09[] = LR"/^&|~!=,"'\[a]/^&|~!=,"'\" L"[b]";
+const wchar_t L10[] = L"[a]" LR"0123456789abcdef[b]0123456789abcdef";
+const wchar_t L11[] = LR"ghijklmnopqrstuv[a]ghijklmnopqrstuv" LR"w[b]w";
+
+int
+main (void)
+{
+#define TEST(str, val) \
+  if (sizeof (str) != sizeof (val) \
+      || __builtin_memcmp (str, val, sizeof (str)) != 0) \
+    __builtin_abort ()
+  TEST (s00, "a[b]");
+  TEST (s01, "[a]b");
+  TEST (s02, "ab");
+  TEST (s03, "a[b]");
+  TEST (s04, "[a]b");
+  TEST (s05, "ab");
+  TEST (s06, "a[b]");
+  TEST (s07, "[a]b");
+  TEST (s08, "ab");
+  TEST (s09, "a[b]");
+  TEST (s10, "[a]b");
+  TEST (s11, "ab");
+  TEST (u03, u"a[b]");
+  TEST (u04, u"[a]b");
+  TEST (u05, u"ab");
+  TEST (u06, u"a[b]");
+  TEST (u07, u"[a]b");
+  TEST (u08, u"ab");
+  TEST (u09, u"a[b]");
+  TEST (u10, u"[a]b");
+  TEST (u11, u"ab");
+  TEST (U03, U"a[b]");
+  TEST (U04, U"[a]b");
+  TEST (U05, U"ab");
+  TEST (U06, U"a[b]");
+  TEST (U07, U"[a]b");
+  TEST (U08, U"ab");
+  TEST (U09, U"a[b]");
+  TEST (U10, U"[a]b");
+  TEST (U11, U"ab");
+  TEST (L03, L"a[b]");
+  TEST (L04, L"[a]b");
+  TEST (L05, L"ab");
+  TEST (L06, L"a[b]");
+  TEST (L07, L"[a]b");
+  TEST (L08, L"ab");
+  TEST (L09, L"a[b]");
+  TEST (L10, L"[a]b");
+  TEST (L11, L"ab");
+  return 0;
+}
--- gcc/testsuite/gcc.dg/raw-string-4.c.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/gcc.dg/raw-string-4.c	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,28 @@
+/* R is not applicable for character literals.  */
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99" } */
+
+const int	i0	= R'a';	/* { dg-error "undeclared" } */
+		/* { dg-error "expected ',' or ';'" "" { target *-*-* } 5 } */
+const int	i1	= uR'a';	/* { dg-error "undeclared" } */
+		/* { dg-error "expected ',' or ';'" "" { target *-*-* } 7 } */
+const int	i2	= UR'a';	/* { dg-error "undeclared" } */
+		/* { dg-error "expected ',' or ';'" "" { target *-*-* } 9 } */
+const int	i3	= u8R'a';	/* { dg-error "undeclared" } */
+		/* { dg-error "expected ',' or ';'" "" { target *-*-* } 11 } */
+const int	i4	= LR'a';	/* { dg-error "undeclared" } */
+		/* { dg-error "expected ',' or ';'" "" { target *-*-* } 13 } */
+
+#define R	1 +
+#define uR	2 +
+#define UR	3 +
+#define u8R	4 +
+#define LR	5 +
+
+const int	i5	= R'a';
+const int	i6	= uR'a';
+const int	i7	= UR'a';
+const int	i8	= u8R'a';
+const int	i9	= LR'a';
+
+int main () {}
--- gcc/testsuite/gcc.dg/raw-string-3.c.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/gcc.dg/raw-string-3.c	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,53 @@
+/* If not gnu99, the {,u,u8,U,L}R prefix should be parsed as separate
+   token. */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+const void	*s0	= R"[a]";	/* { dg-error "undeclared" } */
+		/* { dg-error "expected ',' or ';'" "" { target *-*-* } 6 } */
+const void	*s1	= uR"[a]";	/* { dg-error "undeclared" } */
+		/* { dg-error "expected ',' or ';'" "" { target *-*-* } 8 } */
+const void	*s2	= UR"[a]";	/* { dg-error "undeclared" } */
+		/* { dg-error "expected ',' or ';'" "" { target *-*-* } 10 } */
+const void	*s3	= u8R"[a]";	/* { dg-error "undeclared" } */
+		/* { dg-error "expected ',' or ';'" "" { target *-*-* } 12 } */
+const void	*s4	= LR"[a]";	/* { dg-error "undeclared" } */
+		/* { dg-error "expected ',' or ';'" "" { target *-*-* } 14 } */
+
+const int	i0	= R'a';		/* { dg-error "expected ',' or ';'" } */
+const int	i1	= uR'a';	/* { dg-error "expected ',' or ';'" } */
+const int	i2	= UR'a';	/* { dg-error "expected ',' or ';'" } */
+const int	i3	= u8R'a';	/* { dg-error "expected ',' or ';'" } */
+const int	i4	= LR'a';	/* { dg-error "expected ',' or ';'" } */
+
+#define R	"a"
+#define uR	"b"
+#define UR	"c"
+#define u8R	"d"
+#define LR	"e"
+
+const void	*s5	= R"[a]";
+const void	*s6	= uR"[a]";
+const void	*s7	= UR"[a]";
+const void	*s8	= u8R"[a]";
+const void	*s9	= LR"[a]";
+
+#undef R
+#undef uR
+#undef UR
+#undef u8R
+#undef LR
+
+#define R	1 +
+#define uR	2 +
+#define UR	3 +
+#define u8R	4 +
+#define LR	5 +
+
+const int	i5	= R'a';
+const int	i6	= uR'a';
+const int	i7	= UR'a';
+const int	i8	= u8R'a';
+const int	i9	= LR'a';
+
+int main () {}
--- gcc/testsuite/gcc.dg/raw-string-5.c.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/gcc.dg/raw-string-5.c	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99" } */
+
+const void *s0 = R"0123456789abcdefg[]0123456789abcdefg";
+	/* { dg-error "raw string delimiter longer" "" { target *-*-* } 4 } */
+	/* { dg-error "stray" "" { target *-*-* } 4 } */
+const void *s1 = R" [] ";
+	/* { dg-error "invalid character" "" { target *-*-* } 7 } */
+	/* { dg-error "stray" "" { target *-*-* } 7 } */
+const void *s2 = R"	[]	";
+	/* { dg-error "invalid character" "" { target *-*-* } 10 } */
+	/* { dg-error "stray" "" { target *-*-* } 10 } */
+const void *s3 = R"][]]";
+	/* { dg-error "invalid character" "" { target *-*-* } 13 } */
+	/* { dg-error "stray" "" { target *-*-* } 13 } */
+const void *s4 = R"@[]@";
+	/* { dg-error "invalid character" "" { target *-*-* } 16 } */
+	/* { dg-error "stray" "" { target *-*-* } 16 } */
+const void *s5 = R"$[]$";
+	/* { dg-error "invalid character" "" { target *-*-* } 19 } */
+	/* { dg-error "stray" "" { target *-*-* } 19 } */
+
+int main () {}
--- gcc/testsuite/gcc.dg/utf-dflt2.c.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/gcc.dg/utf-dflt2.c	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,12 @@
+/* If not gnu99, the u8 prefix should be parsed as separate tokens. */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+const void	*s0 = u8"a";		/* { dg-error "undeclared" } */
+		/* { dg-error "expected ',' or ';'" "" { target *-*-* } 5 } */
+
+#define u8	"a"
+
+const void	*s1 = u8"a";
+
+int main () {}
--- gcc/testsuite/gcc.dg/cpp/include6.c.jj	2009-10-13 10:03:37.000000000 +0200
+++ gcc/testsuite/gcc.dg/cpp/include6.c	2009-10-13 10:06:05.000000000 +0200
@@ -0,0 +1,14 @@
+/* { dg-do preprocess } */
+/* { dg-options "-std=gnu99" } */
+
+#include <stddef.h>
+#include "stddef.h"
+#include L"stddef.h"		/* { dg-error "include expects" } */
+#include u"stddef.h"		/* { dg-error "include expects" } */
+#include U"stddef.h"		/* { dg-error "include expects" } */
+#include u8"stddef.h"		/* { dg-error "include expects" } */
+#include R"[stddef.h]"		/* { dg-error "include expects" } */
+#include LR"[stddef.h]"		/* { dg-error "include expects" } */
+#include uR"[stddef.h]"		/* { dg-error "include expects" } */
+#include UR"[stddef.h]"		/* { dg-error "include expects" } */
+#include u8R"[stddef.h]"	/* { dg-error "include expects" } */
--- gcc/testsuite/gcc.dg/raw-string-7.c.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/gcc.dg/raw-string-7.c	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,23 @@
+/* The trailing whitespace after \ and before newline extension
+   breaks full compliance for raw strings.  */
+/* { dg-do run { xfail *-*-* } } */
+/* { dg-options "-std=gnu99" } */
+
+/* Note, there is a single space after \ on the following line.  */
+const void *s0 = R"[\ 
+]";
+/* { dg-bogus "backslash and newline separated by space" "" { xfail *-*-* } 7 } */
+
+/* Note, there is a single tab after \ on the following line.  */
+const void *s1 = R"[\	
+]";
+/* { dg-bogus "backslash and newline separated by space" "" { xfail *-*-* } 12 } */
+
+int
+main (void)
+{
+  if (__builtin_strcmp (s0, "\\ \n") != 0
+      || __builtin_strcmp (s1, "\\\t\n") != 0)
+    __builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/g++.dg/ext/utf-dflt2.C.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/g++.dg/ext/utf-dflt2.C	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,12 @@
+// In C++0x, the u8 prefix should be parsed as separate tokens.
+// { dg-do compile }
+// { dg-options "-std=c++98" }
+
+const void	*s0 = u8"a";		// { dg-error "was not declared" }
+		// { dg-error "expected ',' or ';'" "" { target *-*-* } 5 }
+
+#define u8	"a"
+
+const void	*s1 = u8"a";
+
+int main () {}
--- gcc/testsuite/g++.dg/ext/raw-string-4.C.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/g++.dg/ext/raw-string-4.C	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,28 @@
+// R is not applicable for character literals.
+// { dg-do compile }
+// { dg-options "-std=c++0x" }
+
+const int	i0	= R'a';	// { dg-error "was not declared" }
+		// { dg-error "expected ',' or ';'" "" { target *-*-* } 5 }
+const int	i1	= uR'a';	// { dg-error "was not declared" }
+		// { dg-error "expected ',' or ';'" "" { target *-*-* } 7 }
+const int	i2	= UR'a';	// { dg-error "was not declared" }
+		// { dg-error "expected ',' or ';'" "" { target *-*-* } 9 }
+const int	i3	= u8R'a';	// { dg-error "was not declared" }
+		// { dg-error "expected ',' or ';'" "" { target *-*-* } 11 }
+const int	i4	= LR'a';	// { dg-error "was not declared" }
+		// { dg-error "expected ',' or ';'" "" { target *-*-* } 13 }
+
+#define R	1 +
+#define uR	2 +
+#define UR	3 +
+#define u8R	4 +
+#define LR	5 +
+
+const int	i5	= R'a';
+const int	i6	= uR'a';
+const int	i7	= UR'a';
+const int	i8	= u8R'a';
+const int	i9	= LR'a';
+
+int main () {}
--- gcc/testsuite/g++.dg/ext/utf8-1.C.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/g++.dg/ext/utf8-1.C	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,45 @@
+// { dg-do run }
+// { dg-require-iconv "ISO-8859-2" }
+// { dg-options "-std=c++0x -fexec-charset=ISO-8859-2" }
+
+const char *str1 = "h\u00e1\U0000010Dky ";
+const char *str2 = "\u010d\u00E1rky\n";
+const char *str3 = u8"h\u00e1\U0000010Dky ";
+const char *str4 = u8"\u010d\u00E1rky\n";
+const char *str5 = "h\u00e1\U0000010Dky " "\u010d\u00E1rky\n";
+const char *str6 = u8"h\u00e1\U0000010Dky " "\u010d\u00E1rky\n";
+const char *str7 = "h\u00e1\U0000010Dky " u8"\u010d\u00E1rky\n";
+#define u8
+const char *str8 = u8"h\u00e1\U0000010Dky " u8"\u010d\u00E1rky\n";
+
+const char latin2_1[] = "\x68\xe1\xe8\x6b\x79\x20";
+const char latin2_2[] = "\xe8\xe1\x72\x6b\x79\n";
+const char utf8_1[] = "\x68\xc3\xa1\xc4\x8d\x6b\x79\x20";
+const char utf8_2[] = "\xc4\x8d\xc3\xa1\x72\x6b\x79\n";
+
+int
+main (void)
+{
+  if (__builtin_strcmp (str1, latin2_1) != 0
+      || __builtin_strcmp (str2, latin2_2) != 0
+      || __builtin_strcmp (str3, utf8_1) != 0
+      || __builtin_strcmp (str4, utf8_2) != 0
+      || __builtin_strncmp (str5, latin2_1, sizeof (latin2_1) - 1) != 0
+      || __builtin_strcmp (str5 + sizeof (latin2_1) - 1, latin2_2) != 0
+      || __builtin_strncmp (str6, utf8_1, sizeof (utf8_1) - 1) != 0
+      || __builtin_strcmp (str6 + sizeof (utf8_1) - 1, utf8_2) != 0
+      || __builtin_strncmp (str7, utf8_1, sizeof (utf8_1) - 1) != 0
+      || __builtin_strcmp (str7 + sizeof (utf8_1) - 1, utf8_2) != 0
+      || __builtin_strncmp (str8, utf8_1, sizeof (utf8_1) - 1) != 0
+      || __builtin_strcmp (str8 + sizeof (utf8_1) - 1, utf8_2) != 0)
+    __builtin_abort ();
+  if (sizeof ("a" u8"b"[0]) != 1
+      || sizeof (u8"a" "b"[0]) != 1
+      || sizeof (u8"a" u8"b"[0]) != 1
+      || sizeof ("a" "\u010d") != 3
+      || sizeof ("a" u8"\u010d") != 4
+      || sizeof (u8"a" "\u010d") != 4
+      || sizeof (u8"a" "\u010d") != 4)
+    __builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/g++.dg/ext/raw-string-3.C.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/g++.dg/ext/raw-string-3.C	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,58 @@
+// If c++98, the {,u,u8,U,L}R prefix should be parsed as separate
+// token.
+// { dg-do compile }
+// { dg-options "-std=c++98" }
+
+const void	*s0	= R"[a]";	// { dg-error "was not declared" }
+		// { dg-error "expected ',' or ';'" "" { target *-*-* } 6 }
+const void	*s1	= uR"[a]";	// { dg-error "was not declared" }
+		// { dg-error "expected ',' or ';'" "" { target *-*-* } 8 }
+const void	*s2	= UR"[a]";	// { dg-error "was not declared" }
+		// { dg-error "expected ',' or ';'" "" { target *-*-* } 10 }
+const void	*s3	= u8R"[a]";	// { dg-error "was not declared" }
+		// { dg-error "expected ',' or ';'" "" { target *-*-* } 12 }
+const void	*s4	= LR"[a]";	// { dg-error "was not declared" }
+		// { dg-error "expected ',' or ';'" "" { target *-*-* } 14 }
+
+const int	i0	= R'a';	// { dg-error "was not declared" }
+		// { dg-error "expected ',' or ';'" "" { target *-*-* } 17 }
+const int	i1	= uR'a';	// { dg-error "was not declared" }
+		// { dg-error "expected ',' or ';'" "" { target *-*-* } 19 }
+const int	i2	= UR'a';	// { dg-error "was not declared" }
+		// { dg-error "expected ',' or ';'" "" { target *-*-* } 21 }
+const int	i3	= u8R'a';	// { dg-error "was not declared" }
+		// { dg-error "expected ',' or ';'" "" { target *-*-* } 23 }
+const int	i4	= LR'a';	// { dg-error "was not declared" }
+		// { dg-error "expected ',' or ';'" "" { target *-*-* } 25 }
+
+#define R	"a"
+#define uR	"b"
+#define UR	"c"
+#define u8R	"d"
+#define LR	"e"
+
+const void	*s5	= R"[a]";
+const void	*s6	= uR"[a]";
+const void	*s7	= UR"[a]";
+const void	*s8	= u8R"[a]";
+const void	*s9	= LR"[a]";
+
+#undef R
+#undef uR
+#undef UR
+#undef u8R
+#undef LR
+
+#define R	1 +
+#define uR	2 +
+#define UR	3 +
+#define u8R	4 +
+#define LR	5 +
+
+const int	i5	= R'a';
+const int	i6	= uR'a';
+const int	i7	= UR'a';
+const int	i8	= u8R'a';
+const int	i9	= LR'a';
+
+int main () {}
--- gcc/testsuite/g++.dg/ext/raw-string-7.C.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/g++.dg/ext/raw-string-7.C	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,23 @@
+// The trailing whitespace after \ and before newline extension
+// breaks full compliance for raw strings.
+// { dg-do run { xfail *-*-* } }
+// { dg-options "-std=c++0x" }
+
+// Note, there is a single space after \ on the following line.
+const char *s0 = R"[\ 
+]";
+// { dg-bogus "backslash and newline separated by space" "" { xfail *-*-* } 7 }
+
+// Note, there is a single tab after \ on the following line.
+const char *s1 = R"[\	
+]";
+// { dg-bogus "backslash and newline separated by space" "" { xfail *-*-* } 12 }
+
+int
+main (void)
+{
+  if (__builtin_strcmp (s0, "\\ \n") != 0
+      || __builtin_strcmp (s1, "\\\t\n") != 0)
+    __builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/g++.dg/ext/utf-badconcat2.C.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/g++.dg/ext/utf-badconcat2.C	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,15 @@
+// Test unsupported concatenation of UTF-8 string literals.
+// { dg-do compile }
+// { dg-options "-std=c++0x" }
+
+const void *s0	= u8"a"   "b";
+const void *s1	=   "a" u8"b";
+const void *s2	= u8"a" u8"b";
+const void *s3	= u8"a"  u"b";	// { dg-error "non-standard concatenation" }
+const void *s4	=  u"a" u8"b";	// { dg-error "non-standard concatenation" }
+const void *s5	= u8"a"  U"b";	// { dg-error "non-standard concatenation" }
+const void *s6	=  U"a" u8"b";	// { dg-error "non-standard concatenation" }
+const void *s7	= u8"a"  L"b";	// { dg-error "non-standard concatenation" }
+const void *s8	=  L"a" u8"b";	// { dg-error "non-standard concatenation" }
+
+int main () {}
--- gcc/testsuite/g++.dg/ext/utf8-2.C.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/g++.dg/ext/utf8-2.C	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,21 @@
+// { dg-do compile }
+// { dg-options "-std=c++0x" }
+
+const char	s0[]	= u8"ab";
+const char16_t	s1[]	= u8"ab";	// { dg-error "from non-wide" }
+const char32_t  s2[]    = u8"ab";	// { dg-error "from non-wide" }
+const wchar_t   s3[]    = u8"ab";	// { dg-error "from non-wide" }
+
+const char      t0[0]   = u8"ab";	// { dg-error "chars is too long" }
+const char      t1[1]   = u8"ab";	// { dg-error "chars is too long" }
+const char      t2[2]   = u8"ab";	// { dg-error "chars is too long" }
+const char      t3[3]   = u8"ab";
+const char      t4[4]   = u8"ab";
+
+const char      u0[0]   = u8"\u2160.";	// { dg-error "chars is too long" }
+const char      u1[1]   = u8"\u2160.";	// { dg-error "chars is too long" }
+const char      u2[2]   = u8"\u2160.";	// { dg-error "chars is too long" }
+const char      u3[3]   = u8"\u2160.";	// { dg-error "chars is too long" }
+const char      u4[4]   = u8"\u2160.";	// { dg-error "chars is too long" }
+const char      u5[5]   = u8"\u2160.";
+const char      u6[6]   = u8"\u2160.";
--- gcc/testsuite/g++.dg/ext/raw-string-5.C.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/g++.dg/ext/raw-string-5.C	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,23 @@
+// { dg-do compile }
+// { dg-options "-std=c++0x" }
+
+const void *s0 = R"0123456789abcdefg[]0123456789abcdefg";
+	// { dg-error "raw string delimiter longer" "" { target *-*-* } 4 }
+	// { dg-error "stray" "" { target *-*-* } 4 }
+const void *s1 = R" [] ";
+	// { dg-error "invalid character" "" { target *-*-* } 7 }
+	// { dg-error "stray" "" { target *-*-* } 7 }
+const void *s2 = R"	[]	";
+	// { dg-error "invalid character" "" { target *-*-* } 10 }
+	// { dg-error "stray" "" { target *-*-* } 10 }
+const void *s3 = R"][]]";
+	// { dg-error "invalid character" "" { target *-*-* } 13 }
+	// { dg-error "stray" "" { target *-*-* } 13 }
+const void *s4 = R"@[]@";
+	// { dg-error "invalid character" "" { target *-*-* } 16 }
+	// { dg-error "stray" "" { target *-*-* } 16 }
+const void *s5 = R"$[]$";
+	// { dg-error "invalid character" "" { target *-*-* } 19 }
+	// { dg-error "stray" "" { target *-*-* } 19 }
+
+int main () {}
--- gcc/testsuite/g++.dg/ext/raw-string-6.C.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/g++.dg/ext/raw-string-6.C	2009-10-13 09:51:54.000000000 +0200
@@ -0,0 +1,5 @@
+// { dg-do compile }
+// { dg-options "-std=c++0x" }
+
+const void *s0 = R"ouch[]ouCh";	// { dg-error "at end of input" }
+	// { dg-error "unterminated raw string" "" { target *-*-* } 4 }
--- gcc/testsuite/g++.dg/ext/raw-string-2.C.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/g++.dg/ext/raw-string-2.C	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,104 @@
+// { dg-do run }
+// { dg-options "-std=c++0x" }
+
+#define R
+#define u
+#define uR
+#define U
+#define UR
+#define u8
+#define u8R
+#define L
+#define LR
+
+const char s00[] = R"[a]" "[b]";
+const char s01[] = "[a]" R"*[b]*";
+const char s02[] = R"[a]" R"[b]";
+const char s03[] = R"-[a]-" u8"[b]";
+const char s04[] = "[a]" u8R"MNOPQRSTUVWXYZ[b]MNOPQRSTUVWXYZ";
+const char s05[] = R"[a]" u8R"wxyzABCDEFGHIJKL[b]wxyzABCDEFGHIJKL";
+const char s06[] = u8R";([a];(" "[b]";
+const char s07[] = u8"[a]" R"[b]";
+const char s08[] = u8R"[a]" R"_{}#()<>%:;.?*+-[b]_{}#()<>%:;.?*+-";
+const char s09[] = u8R"/^&|~!=,"'\[a]/^&|~!=,"'\" u8"[b]";
+const char s10[] = u8"[a]" u8R"0123456789abcdef[b]0123456789abcdef";
+const char s11[] = u8R"ghijklmnopqrstuv[a]ghijklmnopqrstuv" u8R"w[b]w";
+
+const char16_t u03[] = R"-[a]-" u"[b]";
+const char16_t u04[] = "[a]" uR"MNOPQRSTUVWXYZ[b]MNOPQRSTUVWXYZ";
+const char16_t u05[] = R"[a]" uR"wxyzABCDEFGHIJKL[b]wxyzABCDEFGHIJKL";
+const char16_t u06[] = uR";([a];(" "[b]";
+const char16_t u07[] = u"[a]" R"[b]";
+const char16_t u08[] = uR"[a]" R"_{}#()<>%:;.?*+-[b]_{}#()<>%:;.?*+-";
+const char16_t u09[] = uR"/^&|~!=,"'\[a]/^&|~!=,"'\" u"[b]";
+const char16_t u10[] = u"[a]" uR"0123456789abcdef[b]0123456789abcdef";
+const char16_t u11[] = uR"ghijklmnopqrstuv[a]ghijklmnopqrstuv" uR"w[b]w";
+
+const char32_t U03[] = R"-[a]-" U"[b]";
+const char32_t U04[] = "[a]" UR"MNOPQRSTUVWXYZ[b]MNOPQRSTUVWXYZ";
+const char32_t U05[] = R"[a]" UR"wxyzABCDEFGHIJKL[b]wxyzABCDEFGHIJKL";
+const char32_t U06[] = UR";([a];(" "[b]";
+const char32_t U07[] = U"[a]" R"[b]";
+const char32_t U08[] = UR"[a]" R"_{}#()<>%:;.?*+-[b]_{}#()<>%:;.?*+-";
+const char32_t U09[] = UR"/^&|~!=,"'\[a]/^&|~!=,"'\" U"[b]";
+const char32_t U10[] = U"[a]" UR"0123456789abcdef[b]0123456789abcdef";
+const char32_t U11[] = UR"ghijklmnopqrstuv[a]ghijklmnopqrstuv" UR"w[b]w";
+
+const wchar_t L03[] = R"-[a]-" L"[b]";
+const wchar_t L04[] = "[a]" LR"MNOPQRSTUVWXYZ[b]MNOPQRSTUVWXYZ";
+const wchar_t L05[] = R"[a]" LR"wxyzABCDEFGHIJKL[b]wxyzABCDEFGHIJKL";
+const wchar_t L06[] = LR";([a];(" "[b]";
+const wchar_t L07[] = L"[a]" R"[b]";
+const wchar_t L08[] = LR"[a]" R"_{}#()<>%:;.?*+-[b]_{}#()<>%:;.?*+-";
+const wchar_t L09[] = LR"/^&|~!=,"'\[a]/^&|~!=,"'\" L"[b]";
+const wchar_t L10[] = L"[a]" LR"0123456789abcdef[b]0123456789abcdef";
+const wchar_t L11[] = LR"ghijklmnopqrstuv[a]ghijklmnopqrstuv" LR"w[b]w";
+
+int
+main (void)
+{
+#define TEST(str, val) \
+  if (sizeof (str) != sizeof (val) \
+      || __builtin_memcmp (str, val, sizeof (str)) != 0) \
+    __builtin_abort ()
+  TEST (s00, "a[b]");
+  TEST (s01, "[a]b");
+  TEST (s02, "ab");
+  TEST (s03, "a[b]");
+  TEST (s04, "[a]b");
+  TEST (s05, "ab");
+  TEST (s06, "a[b]");
+  TEST (s07, "[a]b");
+  TEST (s08, "ab");
+  TEST (s09, "a[b]");
+  TEST (s10, "[a]b");
+  TEST (s11, "ab");
+  TEST (u03, u"a[b]");
+  TEST (u04, u"[a]b");
+  TEST (u05, u"ab");
+  TEST (u06, u"a[b]");
+  TEST (u07, u"[a]b");
+  TEST (u08, u"ab");
+  TEST (u09, u"a[b]");
+  TEST (u10, u"[a]b");
+  TEST (u11, u"ab");
+  TEST (U03, U"a[b]");
+  TEST (U04, U"[a]b");
+  TEST (U05, U"ab");
+  TEST (U06, U"a[b]");
+  TEST (U07, U"[a]b");
+  TEST (U08, U"ab");
+  TEST (U09, U"a[b]");
+  TEST (U10, U"[a]b");
+  TEST (U11, U"ab");
+  TEST (L03, L"a[b]");
+  TEST (L04, L"[a]b");
+  TEST (L05, L"ab");
+  TEST (L06, L"a[b]");
+  TEST (L07, L"[a]b");
+  TEST (L08, L"ab");
+  TEST (L09, L"a[b]");
+  TEST (L10, L"[a]b");
+  TEST (L11, L"ab");
+  return 0;
+}
--- gcc/testsuite/g++.dg/ext/raw-string-1.C.jj	2009-10-12 21:45:01.000000000 +0200
+++ gcc/testsuite/g++.dg/ext/raw-string-1.C	2009-10-12 21:45:01.000000000 +0200
@@ -0,0 +1,96 @@
+// { dg-do run }
+// { dg-options "-std=c++0x" }
+
+const char s0[] = R"[a\
+\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb
+c]";
+const char s1[] = "a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc";
+const char s2[] = R"*|*[a\
+b
+c]"
+c]*|"
+c]*|*";
+const char s3[] = "ab\nc]\"\nc]*|\"\nc";
+
+const char t0[] = u8R"[a\
+\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb
+c]";
+const char t1[] = u8"a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc";
+const char t2[] = u8R"*|*[a\
+b
+c]"
+c]*|"
+c]*|*";
+const char t3[] = u8"ab\nc]\"\nc]*|\"\nc";
+
+const char16_t u0[] = uR"[a\
+\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb
+c]";
+const char16_t u1[] = u"a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc";
+const char16_t u2[] = uR"*|*[a\
+b
+c]"
+c]*|"
+c]*|*";
+const char16_t u3[] = u"ab\nc]\"\nc]*|\"\nc";
+
+const char32_t U0[] = UR"[a\
+\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb
+c]";
+const char32_t U1[] = U"a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc";
+const char32_t U2[] = UR"*|*[a\
+b
+c]"
+c]*|"
+c]*|*";
+const char32_t U3[] = U"ab\nc]\"\nc]*|\"\nc";
+
+const wchar_t L0[] = LR"[a\
+\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb
+c]";
+const wchar_t L1[] = L"a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc";
+const wchar_t L2[] = LR"*|*[a\
+b
+c]"
+c]*|"
+c]*|*";
+const wchar_t L3[] = L"ab\nc]\"\nc]*|\"\nc";
+
+int
+main (void)
+{
+  if (sizeof (s0) != sizeof (s1)
+      || __builtin_memcmp (s0, s1, sizeof (s0)) != 0)
+    __builtin_abort ();
+  if (sizeof (s2) != sizeof (s3)
+      || __builtin_memcmp (s2, s3, sizeof (s2)) != 0)
+    __builtin_abort ();
+  if (sizeof (t0) != sizeof (t1)
+      || __builtin_memcmp (t0, t1, sizeof (t0)) != 0)
+    __builtin_abort ();
+  if (sizeof (t2) != sizeof (t3)
+      || __builtin_memcmp (t2, t3, sizeof (t2)) != 0)
+    __builtin_abort ();
+  if (sizeof (u0) != sizeof (u1)
+      || __builtin_memcmp (u0, u1, sizeof (u0)) != 0)
+    __builtin_abort ();
+  if (sizeof (u2) != sizeof (u3)
+      || __builtin_memcmp (u2, u3, sizeof (u2)) != 0)
+    __builtin_abort ();
+  if (sizeof (U0) != sizeof (U1)
+      || __builtin_memcmp (U0, U1, sizeof (U0)) != 0)
+    __builtin_abort ();
+  if (sizeof (U2) != sizeof (U3)
+      || __builtin_memcmp (U2, U3, sizeof (U2)) != 0)
+    __builtin_abort ();
+  if (sizeof (L0) != sizeof (L1)
+      || __builtin_memcmp (L0, L1, sizeof (L0)) != 0)
+    __builtin_abort ();
+  if (sizeof (L2) != sizeof (L3)
+      || __builtin_memcmp (L2, L3, sizeof (L2)) != 0)
+    __builtin_abort ();
+  if (sizeof (R"*[]*") != 1
+      || __builtin_memcmp (R"*[]*", "", 1) != 0)
+    __builtin_abort ();
+  return 0;
+}


	Jakub

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Raw strings (take 3)
  2009-10-13 14:05   ` [PATCH] Raw strings (take 3) Jakub Jelinek
@ 2009-10-13 17:24     ` Tom Tromey
  0 siblings, 0 replies; 6+ messages in thread
From: Tom Tromey @ 2009-10-13 17:24 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

>>>>> "Jakub" == Jakub Jelinek <jakub@redhat.com> writes:

Tom> Would it be too much trouble to use calls to cpp_error_with_line for all
Tom> new errors?  I think this is generally preferable, and in this code I
Tom> think it would also let us emit errors against locations inside strings.
Tom> (And, for errors about unterminated strings, it would let us point to
Tom> the start of the string, which seems better to me.)

Jakub> Done.

Thanks!

Jakub> 2009-10-13  Jakub Jelinek  <jakub@redhat.com>
Jakub> 	* charset.c (cpp_init_iconv): Initialize utf8_cset_desc.
[...]

The parts of this that I can approve are ok.
Thanks again.

Tom

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2009-10-13 17:21 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-10-12 12:37 Patch ping Jakub Jelinek
2009-10-12 19:23 ` Tom Tromey
2009-10-12 20:21   ` Jakub Jelinek
2009-10-12 21:29     ` Tom Tromey
2009-10-13 14:05   ` [PATCH] Raw strings (take 3) Jakub Jelinek
2009-10-13 17:24     ` Tom Tromey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).