From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 9690 invoked by alias); 12 Sep 2008 13:24:47 -0000 Received: (qmail 9669 invoked by uid 22791); 12 Sep 2008 13:24:42 -0000 X-Spam-Check-By: sourceware.org Received: from mx1.redhat.com (HELO mx1.redhat.com) (66.187.233.31) by sourceware.org (qpsmtpd/0.31) with ESMTP; Fri, 12 Sep 2008 13:24:13 +0000 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id m8CDKAII011192; Fri, 12 Sep 2008 09:20:10 -0400 Received: from hs20-bc2-1.build.redhat.com (hs20-bc2-1.build.redhat.com [10.10.28.34]) by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id m8CDK7pn030187; Fri, 12 Sep 2008 09:20:07 -0400 Received: from hs20-bc2-1.build.redhat.com (localhost.localdomain [127.0.0.1]) by hs20-bc2-1.build.redhat.com (8.13.1/8.13.1) with ESMTP id m8CDK7YV026723; Fri, 12 Sep 2008 09:20:07 -0400 Received: (from jakub@localhost) by hs20-bc2-1.build.redhat.com (8.13.1/8.13.1/Submit) id m8CDK7fO026719; Fri, 12 Sep 2008 09:20:07 -0400 Date: Fri, 12 Sep 2008 14:22:00 -0000 From: Jakub Jelinek To: Tom Tromey , Jason Merrill , "Joseph S. Myers" Cc: gcc-patches@gcc.gnu.org, Kris Van Hees , Ulrich Drepper Subject: [PATCH] Support for C++0x and C1x u8 string literals and raw string literals Message-ID: <20080912132007.GA9666@hs20-bc2-1.build.redhat.com> Reply-To: Jakub Jelinek Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.1i X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2008-09/txt/msg00961.txt.bz2 Hi! Apparently GCC trunk currently supports only part of the new C++0x (and C1x) string literals, it supports u and U prefixed character literals and also u and U prefixed string literals, but doesn't support u8, R, uR, UR, LR and u8R prefixed string literals. The following patch adds support for the rest. There is one thing in which currently gcc raw strings violates the standard, because of the controversial extension which treats backslash whitespace newline the same as backslash newline. I've added test for that and xfailed it for now. For the raw string delimiter sequences I've tried to be really pedantic and accept only basic source charset character except the listed 7, rather than say all characters except the listed 7 plus maybe disallowing '\0', as this is a new feature I think being pedantic doesn't hurt. In one of the raw string papers floating around there was an example using R"@[...]@" which is not pedantically valid, as @ is not basic source charset character. u8 string literal is quite trivial, all we need to do is use conversion to UTF-8 (nop conversion for UTF-8 SOURCE_CHARSET) unconditionally, no matter what -fexec-charset is. I was mainly following n2723.pdf, but found a few small issues in it, Jason, could you raise that up (or is there any newer draft than n2723?)? In [lex.string]/3 example R"[a should be followed by backslash. In [lex.string]/9, replace "belowit" with "below; it". In [lex.string]/11, it would be good if the text could be clearer about whether R is considered part of the prefix for this paragraph or not. As this is in translation phase 6, a raw-string can't be turned into non raw-string and vice versa I'd say, so IMHO this paragraph should just talk about the initial prefix without the optional trailing R. I don't see a reason why one couldn't concatenate say R"[abc]" "def", there is no reason to force people to concatenate only R"[abc]" R"[def]". Similarly u8"a" R"[def]" should IMHO mean u8"adef". 2008-09-12 Jakub Jelinek * charset.c (cpp_init_iconv): Initialize utf8_cset_desc. (_cpp_destroy_iconv): Destroy utf8_cset_desc, char16_cset_desc and char32_cset_desc. (converter_for_type): Handle CPP_UTF8STRING. (cpp_interpret_string): Handle CPP_UTF8STRING and raw-strings. * directives.c (get__Pragma_string): Handle CPP_UTF8STRING. * include/cpplib.h (CPP_UTF8STRING): New token type. * internal.h (struct cpp_reader): Add utf8_cset_desc field. * lex.c (lex_raw_string): New function. (lex_string): Handle u8 string literals, call lex_raw_string for raw string literals. (_cpp_lex_direct): Call lex_string even for u8" and {,u,U,L,u8}R" sequences. * macro.c (stringify_arg): Handle CPP_UTF8STRING. * c-common.c (c_parse_error): Handle CPP_UTF8STRING. * c-lex.c (c_lex_with_flags, lex_string): Likewise. * c-parser.c (c_parser_postfix_expression): Likewise. * parser.c (cp_lexer_print_token, cp_parser_is_string_literal, cp_parser_string_literal, cp_parser_primary_expression): Likewise. * gcc.dg/raw-string-1.c: New test. * gcc.dg/raw-string-2.c: New test. * gcc.dg/raw-string-3.c: New test. * gcc.dg/raw-string-4.c: New test. * gcc.dg/raw-string-5.c: New test. * gcc.dg/raw-string-6.c: New test. * gcc.dg/raw-string-7.c: New test. * gcc.dg/utf8-1.c: New test. * gcc.dg/utf8-2.c: New test. * gcc.dg/utf-badconcat2.c: New test. * gcc.dg/utf-dflt2.c: New test. * g++.dg/ext/raw-string-1.C: New test. * g++.dg/ext/raw-string-2.C: New test. * g++.dg/ext/raw-string-3.C: New test. * g++.dg/ext/raw-string-4.C: New test. * g++.dg/ext/raw-string-5.C: New test. * g++.dg/ext/raw-string-6.C: New test. * g++.dg/ext/raw-string-7.C: New test. * g++.dg/ext/utf8-1.C: New test. * g++.dg/ext/utf8-2.C: New test. * g++.dg/ext/utf-badconcat2.C: New test. * g++.dg/ext/utf-dflt2.C: New test. --- libcpp/charset.c.jj 2008-09-05 12:59:49.000000000 +0200 +++ libcpp/charset.c 2008-09-11 22:11:02.000000000 +0200 @@ -721,6 +721,8 @@ cpp_init_iconv (cpp_reader *pfile) pfile->narrow_cset_desc = init_iconv_desc (pfile, ncset, SOURCE_CHARSET); pfile->narrow_cset_desc.width = CPP_OPTION (pfile, char_precision); + pfile->utf8_cset_desc = init_iconv_desc (pfile, "UTF-8", SOURCE_CHARSET); + pfile->utf8_cset_desc.width = CPP_OPTION (pfile, char_precision); pfile->char16_cset_desc = init_iconv_desc (pfile, be ? "UTF-16BE" : "UTF-16LE", SOURCE_CHARSET); @@ -741,6 +743,12 @@ _cpp_destroy_iconv (cpp_reader *pfile) { if (pfile->narrow_cset_desc.func == convert_using_iconv) iconv_close (pfile->narrow_cset_desc.cd); + if (pfile->utf8_cset_desc.func == convert_using_iconv) + iconv_close (pfile->utf8_cset_desc.cd); + if (pfile->char16_cset_desc.func == convert_using_iconv) + iconv_close (pfile->char16_cset_desc.cd); + if (pfile->char32_cset_desc.func == convert_using_iconv) + iconv_close (pfile->char32_cset_desc.cd); if (pfile->wide_cset_desc.func == convert_using_iconv) iconv_close (pfile->wide_cset_desc.cd); } @@ -1330,6 +1338,8 @@ converter_for_type (cpp_reader *pfile, e { default: return pfile->narrow_cset_desc; + case CPP_UTF8STRING: + return pfile->utf8_cset_desc; case CPP_CHAR16: case CPP_STRING16: return pfile->char16_cset_desc; @@ -1364,7 +1374,47 @@ cpp_interpret_string (cpp_reader *pfile, for (i = 0; i < count; i++) { p = from[i].text; - if (*p == 'L' || *p == 'u' || *p == 'U') p++; + if (*p == 'u') + { + if (*++p == '8') + p++; + } + else if (*p == 'L' || *p == 'U') p++; + if (*p == 'R') + { + const uchar *prefix; + + /* Skip over 'R"'. */ + p += 2; + prefix = p; + while (*p != '[') + p++; + p++; + limit = from[i].text + from[i].len; + if (limit >= p + (p - prefix) + 1) + limit -= (p - prefix) + 1; + + for (;;) + { + base = p; + while (p < limit && (*p != '\\' || (p[1] != 'u' && p[1] != 'U'))) + p++; + if (p > base) + { + /* We have a run of normal characters; these can be fed + directly to convert_cset. */ + if (!APPLY_CONVERSION (cvt, base, p - base, &tbuf)) + goto fail; + } + if (p == limit) + break; + + p = convert_ucn (pfile, p + 1, limit, &tbuf, cvt); + } + + continue; + } + p++; /* Skip leading quote. */ limit = from[i].text + from[i].len - 1; /* Skip trailing quote. */ --- libcpp/directives.c.jj 2008-09-05 12:59:49.000000000 +0200 +++ libcpp/directives.c 2008-09-11 20:27:32.000000000 +0200 @@ -1519,7 +1519,8 @@ get__Pragma_string (cpp_reader *pfile) if (string->type == CPP_EOF) _cpp_backup_tokens (pfile, 1); if (string->type != CPP_STRING && string->type != CPP_WSTRING - && string->type != CPP_STRING32 && string->type != CPP_STRING16) + && string->type != CPP_STRING32 && string->type != CPP_STRING16 + && string->type != CPP_UTF8STRING) return NULL; paren = get_token_no_padding (pfile); --- libcpp/include/cpplib.h.jj 2008-09-05 12:59:47.000000000 +0200 +++ libcpp/include/cpplib.h 2008-09-11 20:23:53.000000000 +0200 @@ -131,6 +131,7 @@ struct _cpp_file; TK(WSTRING, LITERAL) /* L"string" */ \ TK(STRING16, LITERAL) /* u"string" */ \ TK(STRING32, LITERAL) /* U"string" */ \ + TK(UTF8STRING, LITERAL) /* u8"string" */ \ TK(OBJC_STRING, LITERAL) /* @"string" - Objective-C */ \ TK(HEADER_NAME, LITERAL) /* in #include */ \ \ @@ -724,10 +725,10 @@ extern const unsigned char *cpp_macro_de extern void _cpp_backup_tokens (cpp_reader *, unsigned int); extern const cpp_token *cpp_peek_token (cpp_reader *, int); -/* Evaluate a CPP_CHAR or CPP_WCHAR token. */ +/* Evaluate a CPP_*CHAR* token. */ extern cppchar_t cpp_interpret_charconst (cpp_reader *, const cpp_token *, unsigned int *, int *); -/* Evaluate a vector of CPP_STRING or CPP_WSTRING tokens. */ +/* Evaluate a vector of CPP_*STRING* tokens. */ extern bool cpp_interpret_string (cpp_reader *, const cpp_string *, size_t, cpp_string *, enum cpp_ttype); --- libcpp/internal.h.jj 2008-09-05 12:59:49.000000000 +0200 +++ libcpp/internal.h 2008-09-11 18:23:02.000000000 +0200 @@ -400,6 +400,10 @@ struct cpp_reader struct cset_converter narrow_cset_desc; /* Descriptor for converting from the source character set to the + UTF-8 execution character set. */ + struct cset_converter utf8_cset_desc; + + /* Descriptor for converting from the source character set to the UTF-16 execution character set. */ struct cset_converter char16_cset_desc; --- libcpp/lex.c.jj 2008-09-05 12:59:49.000000000 +0200 +++ libcpp/lex.c 2008-09-12 13:54:01.000000000 +0200 @@ -609,10 +609,185 @@ create_literal (cpp_reader *pfile, cpp_t token->val.str.text = dest; } +/* Lexes raw a string. The stored string contains the spelling, including + double quotes, delimiter string, '[' and ']', any leading + 'L', 'u', 'U' or 'u8' and 'R' modifier. It returns the type of the + literal, or CPP_OTHER if it was not properly terminated. + + The spelling is NUL-terminated, but it is not guaranteed that this + is the first NUL since embedded NULs are preserved. */ + +static void +lex_raw_string (cpp_reader *pfile, cpp_token *token, const uchar *base, + const uchar *cur) +{ + bool saw_NUL = false; + const uchar *raw_prefix; + unsigned int raw_prefix_len = 0; + enum cpp_ttype type; + size_t total_len = 0; + _cpp_buff *first_buff = NULL, *last_buff = NULL; + + type = (*base == 'L' ? CPP_WSTRING : + *base == 'U' ? CPP_STRING32 : + *base == 'u' ? (base[1] == '8' ? CPP_UTF8STRING : CPP_STRING16) + : CPP_STRING); + + raw_prefix = cur + 1; + while (raw_prefix_len < 16) + { + switch (raw_prefix[raw_prefix_len]) + { + case ' ': case '[': case ']': case '\t': + case '\v': case '\f': case '\n': default: + break; + /* Basic source charset except the above chars. */ + case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': + case 'g': case 'h': case 'i': case 'j': case 'k': case 'l': + case 'm': case 'n': case 'o': case 'p': case 'q': case 'r': + case 's': case 't': case 'u': case 'v': case 'w': case 'x': + case 'y': case 'z': + case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': + case 'G': case 'H': case 'I': case 'J': case 'K': case 'L': + case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R': + case 'S': case 'T': case 'U': case 'V': case 'W': case 'X': + case 'Y': case 'Z': + case '0': case '1': case '2': case '3': case '4': case '5': + case '6': case '7': case '8': case '9': + case '_': case '{': case '}': case '#': case '(': case ')': + case '<': case '>': case '%': case ':': case ';': case '.': + case '?': case '*': case '+': case '-': case '/': case '^': + case '&': case '|': case '~': case '!': case '=': case ',': + case '\\': case '"': case '\'': + raw_prefix_len++; + continue; + } + break; + } + + if (raw_prefix[raw_prefix_len] != '[') + { + if (raw_prefix_len == 16) + cpp_error (pfile, CPP_DL_ERROR, + "raw string delimiter longer than 16 characters"); + else + cpp_error (pfile, CPP_DL_ERROR, + "invalid character '%c' in raw string delimiter", + (int) raw_prefix[raw_prefix_len]); + pfile->buffer->cur = raw_prefix - 1; + create_literal (pfile, token, base, raw_prefix - 1 - base, CPP_OTHER); + return; + } + + cur = raw_prefix + raw_prefix_len + 1; + for (;;) + { + cppchar_t c = *cur++; + + if (c == ']' + && strncmp ((const char *) cur, (const char *) raw_prefix, + raw_prefix_len) == 0 + && cur[raw_prefix_len] == '"') + { + cur += raw_prefix_len + 1; + break; + } + else if (c == '\n') + { + if (pfile->state.in_directive + || pfile->state.parsing_args + || pfile->state.in_deferred_pragma) + { + cur--; + type = CPP_OTHER; + cpp_error (pfile, CPP_DL_ERROR, "unterminated raw string"); + break; + } + + /* raw strings allow embedded non-escaped newlines, which + complicates this routine a lot. */ + if (first_buff == NULL) + { + total_len = cur - base; + first_buff = last_buff = _cpp_get_buff (pfile, total_len); + memcpy (BUFF_FRONT (last_buff), base, total_len); + raw_prefix = BUFF_FRONT (last_buff) + (raw_prefix - base); + BUFF_FRONT (last_buff) += total_len; + } + else + { + size_t len = cur - base; + size_t cur_len = len > BUFF_ROOM (last_buff) + ? BUFF_ROOM (last_buff) : len; + + total_len += len; + memcpy (BUFF_FRONT (last_buff), base, cur_len); + BUFF_FRONT (last_buff) += cur_len; + if (len > cur_len) + { + last_buff = _cpp_append_extend_buff (pfile, last_buff, + len - cur_len); + memcpy (BUFF_FRONT (last_buff), base + cur_len, + len - cur_len); + BUFF_FRONT (last_buff) += len - cur_len; + } + } + + if (pfile->buffer->cur < pfile->buffer->rlimit) + CPP_INCREMENT_LINE (pfile, 0); + pfile->buffer->need_line = true; + + if (!_cpp_get_fresh_line (pfile)) + { + token->type = CPP_EOF; + /* Tell the compiler the line number of the EOF token. */ + token->src_loc = pfile->line_table->highest_line; + token->flags = BOL; + if (first_buff != NULL) + _cpp_release_buff (pfile, first_buff); + cpp_error (pfile, CPP_DL_ERROR, "unterminated raw string"); + return; + } + + cur = base = pfile->buffer->cur; + } + else if (c == '\0') + saw_NUL = true; + } + + if (saw_NUL && !pfile->state.skipping) + cpp_error (pfile, CPP_DL_WARNING, + "null character(s) preserved in literal"); + + pfile->buffer->cur = cur; + if (first_buff == NULL) + create_literal (pfile, token, base, cur - base, type); + else + { + uchar *dest = _cpp_unaligned_alloc (pfile, total_len + (cur - base) + 1); + + token->type = type; + token->val.str.len = total_len + (cur - base); + token->val.str.text = dest; + last_buff = first_buff; + while (last_buff != NULL) + { + memcpy (dest, last_buff->base, + BUFF_FRONT (last_buff) - last_buff->base); + dest += BUFF_FRONT (last_buff) - last_buff->base; + last_buff = last_buff->next; + } + _cpp_release_buff (pfile, first_buff); + memcpy (dest, base, cur - base); + dest[cur - base] = '\0'; + } +} + /* Lexes a string, character constant, or angle-bracketed header file name. The stored string contains the spelling, including opening - quote and leading any leading 'L', 'u' or 'U'. It returns the type - of the literal, or CPP_OTHER if it was not properly terminated. + quote and any leading 'L', 'u', 'U' or 'u8' and optional + 'R' modifier. It returns the type of the literal, or CPP_OTHER + if it was not properly terminated. The spelling is NUL-terminated, but it is not guaranteed that this is the first NUL since embedded NULs are preserved. */ @@ -626,12 +801,24 @@ lex_string (cpp_reader *pfile, cpp_token cur = base; terminator = *cur++; - if (terminator == 'L' || terminator == 'u' || terminator == 'U') + if (terminator == 'L' || terminator == 'U') terminator = *cur++; - if (terminator == '\"') + else if (terminator == 'u') + { + terminator = *cur++; + if (terminator == '8') + terminator = *cur++; + } + if (terminator == 'R') + { + lex_raw_string (pfile, token, base, cur); + return; + } + if (terminator == '"') type = (*base == 'L' ? CPP_WSTRING : *base == 'U' ? CPP_STRING32 : - *base == 'u' ? CPP_STRING16 : CPP_STRING); + *base == 'u' ? (base[1] == '8' ? CPP_UTF8STRING : CPP_STRING16) + : CPP_STRING); else if (terminator == '\'') type = (*base == 'L' ? CPP_WCHAR : *base == 'U' ? CPP_CHAR32 : @@ -1035,10 +1222,20 @@ _cpp_lex_direct (cpp_reader *pfile) case 'L': case 'u': case 'U': + case 'R': /* 'L', 'u' or 'U' may introduce wide characters or strings. */ if (c == 'L' || CPP_OPTION (pfile, uliterals)) { - if (*buffer->cur == '\'' || *buffer->cur == '"') + if ((*buffer->cur == '\'' && c != 'R') + || *buffer->cur == '"' + || (*buffer->cur == 'R' + && c != 'R' + && buffer->cur[1] == '"' + && CPP_OPTION (pfile, uliterals)) + || (*buffer->cur == '8' + && c == 'u' + && (buffer->cur[1] == '"' + || (buffer->cur[1] == 'R' && buffer->cur[2] == '"')))) { lex_string (pfile, result, buffer->cur - 1); break; @@ -1054,7 +1251,7 @@ _cpp_lex_direct (cpp_reader *pfile) case 'y': case 'z': case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G': case 'H': case 'I': case 'J': case 'K': - case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R': + case 'M': case 'N': case 'O': case 'P': case 'Q': case 'S': case 'T': case 'V': case 'W': case 'X': case 'Y': case 'Z': result->type = CPP_NAME; --- libcpp/macro.c.jj 2008-09-05 12:59:49.000000000 +0200 +++ libcpp/macro.c 2008-09-11 20:25:20.000000000 +0200 @@ -377,7 +377,8 @@ stringify_arg (cpp_reader *pfile, macro_ escape_it = (token->type == CPP_STRING || token->type == CPP_CHAR || token->type == CPP_WSTRING || token->type == CPP_STRING || token->type == CPP_STRING32 || token->type == CPP_CHAR32 - || token->type == CPP_STRING16 || token->type == CPP_CHAR16); + || token->type == CPP_STRING16 || token->type == CPP_CHAR16 + || token->type == CPP_UTF8STRING); /* Room for each char being written in octal, initial space and final quote and NUL. */ --- gcc/c-common.c.jj 2008-09-09 16:08:04.000000000 +0200 +++ gcc/c-common.c 2008-09-11 20:30:57.000000000 +0200 @@ -7472,7 +7472,7 @@ c_parse_error (const char *gmsgid, enum message = NULL; } else if (token == CPP_STRING || token == CPP_WSTRING || token == CPP_STRING16 - || token == CPP_STRING32) + || token == CPP_STRING32 || token == CPP_UTF8STRING) message = catenate_messages (gmsgid, " before string constant"); else if (token == CPP_NUMBER) message = catenate_messages (gmsgid, " before numeric constant"); --- gcc/c-lex.c.jj 2008-09-05 12:56:31.000000000 +0200 +++ gcc/c-lex.c 2008-09-11 20:34:06.000000000 +0200 @@ -365,6 +365,7 @@ c_lex_with_flags (tree *value, location_ case CPP_WSTRING: case CPP_STRING16: case CPP_STRING32: + case CPP_UTF8STRING: type = lex_string (tok, value, true, true); break; @@ -423,6 +424,7 @@ c_lex_with_flags (tree *value, location_ case CPP_WSTRING: case CPP_STRING16: case CPP_STRING32: + case CPP_UTF8STRING: if ((lex_flags & C_LEX_RAW_STRINGS) == 0) { type = lex_string (tok, value, false, @@ -830,12 +832,13 @@ interpret_fixed (const cpp_token *token, return value; } -/* Convert a series of STRING, WSTRING, STRING16 and/or STRING32 tokens - into a tree, performing string constant concatenation. TOK is the - first of these. VALP is the location to write the string into. - OBJC_STRING indicates whether an '@' token preceded the incoming token. +/* Convert a series of STRING, WSTRING, STRING16, STRING32 and/or + UTF8STRING tokens into a tree, performing string constant + concatenation. TOK is the first of these. VALP is the location + to write the string into. OBJC_STRING indicates whether an '@' token + preceded the incoming token. Returns the CPP token type of the result (CPP_STRING, CPP_WSTRING, - CPP_STRING32, CPP_STRING16, or CPP_OBJC_STRING). + CPP_STRING32, CPP_STRING16, CPP_UTF8STRING, or CPP_OBJC_STRING). This is unfortunately more work than it should be. If any of the strings in the series has an L prefix, the result is a wide string @@ -880,6 +883,7 @@ lex_string (const cpp_token *tok, tree * case CPP_WSTRING: case CPP_STRING16: case CPP_STRING32: + case CPP_UTF8STRING: if (type != tok->type) { if (type == CPP_STRING) @@ -925,6 +929,7 @@ lex_string (const cpp_token *tok, tree * { default: case CPP_STRING: + case CPP_UTF8STRING: value = build_string (1, ""); break; case CPP_STRING16: @@ -950,6 +955,7 @@ lex_string (const cpp_token *tok, tree * { default: case CPP_STRING: + case CPP_UTF8STRING: TREE_TYPE (value) = char_array_type_node; break; case CPP_STRING16: --- gcc/c-parser.c.jj 2008-09-09 16:08:04.000000000 +0200 +++ gcc/c-parser.c 2008-09-11 20:34:34.000000000 +0200 @@ -5085,6 +5085,7 @@ c_parser_postfix_expression (c_parser *p case CPP_STRING16: case CPP_STRING32: case CPP_WSTRING: + case CPP_UTF8STRING: expr.value = c_parser_peek_token (parser)->value; expr.original_code = STRING_CST; c_parser_consume_token (parser); --- gcc/cp/parser.c.jj 2008-09-09 16:08:03.000000000 +0200 +++ gcc/cp/parser.c 2008-09-11 20:36:10.000000000 +0200 @@ -797,6 +797,7 @@ cp_lexer_print_token (FILE * stream, cp_ case CPP_STRING16: case CPP_STRING32: case CPP_WSTRING: + case CPP_UTF8STRING: fprintf (stream, " \"%s\"", TREE_STRING_POINTER (token->u.value)); break; @@ -2049,7 +2050,8 @@ cp_parser_is_string_literal (cp_token* t return (token->type == CPP_STRING || token->type == CPP_STRING16 || token->type == CPP_STRING32 || - token->type == CPP_WSTRING); + token->type == CPP_WSTRING || + token->type == CPP_UTF8STRING); } /* Returns nonzero if TOKEN is the indicated KEYWORD. */ @@ -2972,6 +2974,7 @@ cp_parser_string_literal (cp_parser *par { default: case CPP_STRING: + case CPP_UTF8STRING: TREE_TYPE (value) = char_array_type_node; break; case CPP_STRING16: @@ -3195,6 +3198,7 @@ cp_parser_primary_expression (cp_parser case CPP_STRING16: case CPP_STRING32: case CPP_WSTRING: + case CPP_UTF8STRING: /* ??? Should wide strings be allowed when parser->translate_strings_p is false (i.e. in attributes)? If not, we can kill the third argument to cp_parser_string_literal. */ --- gcc/testsuite/gcc.dg/raw-string-1.c.jj 2008-09-12 11:48:36.000000000 +0200 +++ gcc/testsuite/gcc.dg/raw-string-1.c 2008-09-12 14:01:27.000000000 +0200 @@ -0,0 +1,101 @@ +/* { dg-do run } */ +/* { dg-options "-std=gnu99" } */ + +#include + +typedef __CHAR16_TYPE__ char16_t; +typedef __CHAR32_TYPE__ char32_t; + +const char s0[] = R"[a\ +\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb +c]"; +const char s1[] = "a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc"; +const char s2[] = R"*|*[a\ +b +c]" +c]*|" +c]*|*"; +const char s3[] = "ab\nc]\"\nc]*|\"\nc"; + +const char t0[] = u8R"[a\ +\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb +c]"; +const char t1[] = u8"a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc"; +const char t2[] = u8R"*|*[a\ +b +c]" +c]*|" +c]*|*"; +const char t3[] = u8"ab\nc]\"\nc]*|\"\nc"; + +const char16_t u0[] = uR"[a\ +\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb +c]"; +const char16_t u1[] = u"a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc"; +const char16_t u2[] = uR"*|*[a\ +b +c]" +c]*|" +c]*|*"; +const char16_t u3[] = u"ab\nc]\"\nc]*|\"\nc"; + +const char32_t U0[] = UR"[a\ +\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb +c]"; +const char32_t U1[] = U"a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc"; +const char32_t U2[] = UR"*|*[a\ +b +c]" +c]*|" +c]*|*"; +const char32_t U3[] = U"ab\nc]\"\nc]*|\"\nc"; + +const wchar_t L0[] = LR"[a\ +\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb +c]"; +const wchar_t L1[] = L"a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc"; +const wchar_t L2[] = LR"*|*[a\ +b +c]" +c]*|" +c]*|*"; +const wchar_t L3[] = L"ab\nc]\"\nc]*|\"\nc"; + +int +main (void) +{ + if (sizeof (s0) != sizeof (s1) + || __builtin_memcmp (s0, s1, sizeof (s0)) != 0) + __builtin_abort (); + if (sizeof (s2) != sizeof (s3) + || __builtin_memcmp (s2, s3, sizeof (s2)) != 0) + __builtin_abort (); + if (sizeof (t0) != sizeof (t1) + || __builtin_memcmp (t0, t1, sizeof (t0)) != 0) + __builtin_abort (); + if (sizeof (t2) != sizeof (t3) + || __builtin_memcmp (t2, t3, sizeof (t2)) != 0) + __builtin_abort (); + if (sizeof (u0) != sizeof (u1) + || __builtin_memcmp (u0, u1, sizeof (u0)) != 0) + __builtin_abort (); + if (sizeof (u2) != sizeof (u3) + || __builtin_memcmp (u2, u3, sizeof (u2)) != 0) + __builtin_abort (); + if (sizeof (U0) != sizeof (U1) + || __builtin_memcmp (U0, U1, sizeof (U0)) != 0) + __builtin_abort (); + if (sizeof (U2) != sizeof (U3) + || __builtin_memcmp (U2, U3, sizeof (U2)) != 0) + __builtin_abort (); + if (sizeof (L0) != sizeof (L1) + || __builtin_memcmp (L0, L1, sizeof (L0)) != 0) + __builtin_abort (); + if (sizeof (L2) != sizeof (L3) + || __builtin_memcmp (L2, L3, sizeof (L2)) != 0) + __builtin_abort (); + if (sizeof (R"*[]*") != 1 + || __builtin_memcmp (R"*[]*", "", 1) != 0) + __builtin_abort (); + return 0; +} --- gcc/testsuite/gcc.dg/raw-string-2.c.jj 2008-09-12 12:14:42.000000000 +0200 +++ gcc/testsuite/gcc.dg/raw-string-2.c 2008-09-12 13:37:10.000000000 +0200 @@ -0,0 +1,109 @@ +/* { dg-do run } */ +/* { dg-options "-std=gnu99" } */ + +#include + +typedef __CHAR16_TYPE__ char16_t; +typedef __CHAR32_TYPE__ char32_t; + +#define R +#define u +#define uR +#define U +#define UR +#define u8 +#define u8R +#define L +#define LR + +const char s00[] = R"[a]" "[b]"; +const char s01[] = "[a]" R"*[b]*"; +const char s02[] = R"[a]" R"[b]"; +const char s03[] = R"-[a]-" u8"[b]"; +const char s04[] = "[a]" u8R"MNOPQRSTUVWXYZ[b]MNOPQRSTUVWXYZ"; +const char s05[] = R"[a]" u8R"wxyzABCDEFGHIJKL[b]wxyzABCDEFGHIJKL"; +const char s06[] = u8R";([a];(" "[b]"; +const char s07[] = u8"[a]" R"[b]"; +const char s08[] = u8R"[a]" R"_{}#()<>%:;.?*+-[b]_{}#()<>%:;.?*+-"; +const char s09[] = u8R"/^&|~!=,"'\[a]/^&|~!=,"'\" u8"[b]"; +const char s10[] = u8"[a]" u8R"0123456789abcdef[b]0123456789abcdef"; +const char s11[] = u8R"ghijklmnopqrstuv[a]ghijklmnopqrstuv" u8R"w[b]w"; + +const char16_t u03[] = R"-[a]-" u"[b]"; +const char16_t u04[] = "[a]" uR"MNOPQRSTUVWXYZ[b]MNOPQRSTUVWXYZ"; +const char16_t u05[] = R"[a]" uR"wxyzABCDEFGHIJKL[b]wxyzABCDEFGHIJKL"; +const char16_t u06[] = uR";([a];(" "[b]"; +const char16_t u07[] = u"[a]" R"[b]"; +const char16_t u08[] = uR"[a]" R"_{}#()<>%:;.?*+-[b]_{}#()<>%:;.?*+-"; +const char16_t u09[] = uR"/^&|~!=,"'\[a]/^&|~!=,"'\" u"[b]"; +const char16_t u10[] = u"[a]" uR"0123456789abcdef[b]0123456789abcdef"; +const char16_t u11[] = uR"ghijklmnopqrstuv[a]ghijklmnopqrstuv" uR"w[b]w"; + +const char32_t U03[] = R"-[a]-" U"[b]"; +const char32_t U04[] = "[a]" UR"MNOPQRSTUVWXYZ[b]MNOPQRSTUVWXYZ"; +const char32_t U05[] = R"[a]" UR"wxyzABCDEFGHIJKL[b]wxyzABCDEFGHIJKL"; +const char32_t U06[] = UR";([a];(" "[b]"; +const char32_t U07[] = U"[a]" R"[b]"; +const char32_t U08[] = UR"[a]" R"_{}#()<>%:;.?*+-[b]_{}#()<>%:;.?*+-"; +const char32_t U09[] = UR"/^&|~!=,"'\[a]/^&|~!=,"'\" U"[b]"; +const char32_t U10[] = U"[a]" UR"0123456789abcdef[b]0123456789abcdef"; +const char32_t U11[] = UR"ghijklmnopqrstuv[a]ghijklmnopqrstuv" UR"w[b]w"; + +const wchar_t L03[] = R"-[a]-" L"[b]"; +const wchar_t L04[] = "[a]" LR"MNOPQRSTUVWXYZ[b]MNOPQRSTUVWXYZ"; +const wchar_t L05[] = R"[a]" LR"wxyzABCDEFGHIJKL[b]wxyzABCDEFGHIJKL"; +const wchar_t L06[] = LR";([a];(" "[b]"; +const wchar_t L07[] = L"[a]" R"[b]"; +const wchar_t L08[] = LR"[a]" R"_{}#()<>%:;.?*+-[b]_{}#()<>%:;.?*+-"; +const wchar_t L09[] = LR"/^&|~!=,"'\[a]/^&|~!=,"'\" L"[b]"; +const wchar_t L10[] = L"[a]" LR"0123456789abcdef[b]0123456789abcdef"; +const wchar_t L11[] = LR"ghijklmnopqrstuv[a]ghijklmnopqrstuv" LR"w[b]w"; + +int +main (void) +{ +#define TEST(str, val) \ + if (sizeof (str) != sizeof (val) \ + || __builtin_memcmp (str, val, sizeof (str)) != 0) \ + __builtin_abort () + TEST (s00, "a[b]"); + TEST (s01, "[a]b"); + TEST (s02, "ab"); + TEST (s03, "a[b]"); + TEST (s04, "[a]b"); + TEST (s05, "ab"); + TEST (s06, "a[b]"); + TEST (s07, "[a]b"); + TEST (s08, "ab"); + TEST (s09, "a[b]"); + TEST (s10, "[a]b"); + TEST (s11, "ab"); + TEST (u03, u"a[b]"); + TEST (u04, u"[a]b"); + TEST (u05, u"ab"); + TEST (u06, u"a[b]"); + TEST (u07, u"[a]b"); + TEST (u08, u"ab"); + TEST (u09, u"a[b]"); + TEST (u10, u"[a]b"); + TEST (u11, u"ab"); + TEST (U03, U"a[b]"); + TEST (U04, U"[a]b"); + TEST (U05, U"ab"); + TEST (U06, U"a[b]"); + TEST (U07, U"[a]b"); + TEST (U08, U"ab"); + TEST (U09, U"a[b]"); + TEST (U10, U"[a]b"); + TEST (U11, U"ab"); + TEST (L03, L"a[b]"); + TEST (L04, L"[a]b"); + TEST (L05, L"ab"); + TEST (L06, L"a[b]"); + TEST (L07, L"[a]b"); + TEST (L08, L"ab"); + TEST (L09, L"a[b]"); + TEST (L10, L"[a]b"); + TEST (L11, L"ab"); + return 0; +} --- gcc/testsuite/gcc.dg/raw-string-3.c.jj 2008-09-12 13:27:09.000000000 +0200 +++ gcc/testsuite/gcc.dg/raw-string-3.c 2008-09-12 13:42:55.000000000 +0200 @@ -0,0 +1,53 @@ +/* If not gnu99, the {,u,u8,U,L}R prefix should be parsed as separate + token. */ +/* { dg-do compile } */ +/* { dg-options "" } */ + +const void *s0 = R"[a]"; /* { dg-error "undeclared" } */ + /* { dg-error "expected ',' or ';'" "" { target *-*-* } 6 } */ +const void *s1 = uR"[a]"; /* { dg-error "undeclared" } */ + /* { dg-error "expected ',' or ';'" "" { target *-*-* } 8 } */ +const void *s2 = UR"[a]"; /* { dg-error "undeclared" } */ + /* { dg-error "expected ',' or ';'" "" { target *-*-* } 10 } */ +const void *s3 = u8R"[a]"; /* { dg-error "undeclared" } */ + /* { dg-error "expected ',' or ';'" "" { target *-*-* } 12 } */ +const void *s4 = LR"[a]"; /* { dg-error "undeclared" } */ + /* { dg-error "expected ',' or ';'" "" { target *-*-* } 14 } */ + +const int i0 = R'a'; /* { dg-error "expected ',' or ';'" } */ +const int i1 = uR'a'; /* { dg-error "expected ',' or ';'" } */ +const int i2 = UR'a'; /* { dg-error "expected ',' or ';'" } */ +const int i3 = u8R'a'; /* { dg-error "expected ',' or ';'" } */ +const int i4 = LR'a'; /* { dg-error "expected ',' or ';'" } */ + +#define R "a" +#define uR "b" +#define UR "c" +#define u8R "d" +#define LR "e" + +const void *s5 = R"[a]"; +const void *s6 = uR"[a]"; +const void *s7 = UR"[a]"; +const void *s8 = u8R"[a]"; +const void *s9 = LR"[a]"; + +#undef R +#undef uR +#undef UR +#undef u8R +#undef LR + +#define R 1 + +#define uR 2 + +#define UR 3 + +#define u8R 4 + +#define LR 5 + + +const int i5 = R'a'; +const int i6 = uR'a'; +const int i7 = UR'a'; +const int i8 = u8R'a'; +const int i9 = LR'a'; + +int main () {} --- gcc/testsuite/gcc.dg/raw-string-4.c.jj 2008-09-12 13:27:09.000000000 +0200 +++ gcc/testsuite/gcc.dg/raw-string-4.c 2008-09-12 13:33:43.000000000 +0200 @@ -0,0 +1,28 @@ +/* R is not applicable for character literals. */ +/* { dg-do compile } */ +/* { dg-options "-std=gnu99" } */ + +const int i0 = R'a'; /* { dg-error "undeclared" } */ + /* { dg-error "expected ',' or ';'" "" { target *-*-* } 5 } */ +const int i1 = uR'a'; /* { dg-error "undeclared" } */ + /* { dg-error "expected ',' or ';'" "" { target *-*-* } 7 } */ +const int i2 = UR'a'; /* { dg-error "undeclared" } */ + /* { dg-error "expected ',' or ';'" "" { target *-*-* } 9 } */ +const int i3 = u8R'a'; /* { dg-error "undeclared" } */ + /* { dg-error "expected ',' or ';'" "" { target *-*-* } 11 } */ +const int i4 = LR'a'; /* { dg-error "undeclared" } */ + /* { dg-error "expected ',' or ';'" "" { target *-*-* } 13 } */ + +#define R 1 + +#define uR 2 + +#define UR 3 + +#define u8R 4 + +#define LR 5 + + +const int i5 = R'a'; +const int i6 = uR'a'; +const int i7 = UR'a'; +const int i8 = u8R'a'; +const int i9 = LR'a'; + +int main () {} --- gcc/testsuite/gcc.dg/raw-string-5.c.jj 2008-09-12 13:49:58.000000000 +0200 +++ gcc/testsuite/gcc.dg/raw-string-5.c 2008-09-12 13:59:14.000000000 +0200 @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-std=gnu99" } */ + +const void *s0 = R"0123456789abcdefg[]0123456789abcdefg"; + /* { dg-error "raw string delimiter longer" "" { target *-*-* } 4 } */ + /* { dg-error "stray" "" { target *-*-* } 4 } */ +const void *s1 = R" [] "; + /* { dg-error "invalid character" "" { target *-*-* } 7 } */ + /* { dg-error "stray" "" { target *-*-* } 7 } */ +const void *s2 = R" [] "; + /* { dg-error "invalid character" "" { target *-*-* } 10 } */ + /* { dg-error "stray" "" { target *-*-* } 10 } */ +const void *s3 = R"][]]"; + /* { dg-error "invalid character" "" { target *-*-* } 13 } */ + /* { dg-error "stray" "" { target *-*-* } 13 } */ +const void *s4 = R"@[]@"; + /* { dg-error "invalid character" "" { target *-*-* } 16 } */ + /* { dg-error "stray" "" { target *-*-* } 16 } */ +const void *s5 = R"$[]$"; + /* { dg-error "invalid character" "" { target *-*-* } 19 } */ + /* { dg-error "stray" "" { target *-*-* } 19 } */ + +int main () {} --- gcc/testsuite/gcc.dg/raw-string-6.c.jj 2008-09-12 13:59:33.000000000 +0200 +++ gcc/testsuite/gcc.dg/raw-string-6.c 2008-09-12 14:03:46.000000000 +0200 @@ -0,0 +1,5 @@ +/* { dg-do compile } */ +/* { dg-options "-std=gnu99" } */ + +const void *s0 = R"ouch[]ouCh"; /* { dg-error "expected expression at end of input" } */ + /* { dg-error "unterminated raw string" "" { target *-*-* } 6 } */ --- gcc/testsuite/gcc.dg/raw-string-7.c.jj 2008-09-12 14:27:39.000000000 +0200 +++ gcc/testsuite/gcc.dg/raw-string-7.c 2008-09-12 14:34:17.000000000 +0200 @@ -0,0 +1,23 @@ +/* The trailing whitespace after \ and before newline extension + breaks full compliance for raw strings. */ +/* { dg-do run { xfail *-*-* } } */ +/* { dg-options "-std=gnu99" } */ + +/* Note, there is a single space after \ on the following line. */ +const void *s0 = R"[\ +]"; +/* { dg-bogus "backslash and newline separated by space" "" { xfail *-*-* } 7 } */ + +/* Note, there is a single tab after \ on the following line. */ +const void *s1 = R"[\ +]"; +/* { dg-bogus "backslash and newline separated by space" "" { xfail *-*-* } 12 } */ + +int +main (void) +{ + if (__builtin_strcmp (s0, "\\ \n") != 0 + || __builtin_strcmp (s1, "\\\t\n") != 0) + __builtin_abort (); + return 0; +} --- gcc/testsuite/gcc.dg/utf8-1.c.jj 2008-09-12 10:01:47.000000000 +0200 +++ gcc/testsuite/gcc.dg/utf8-1.c 2008-09-12 11:45:48.000000000 +0200 @@ -0,0 +1,45 @@ +/* { dg-do run } */ +/* { dg-require-iconv "ISO-8859-2" } */ +/* { dg-options "-std=gnu99 -fexec-charset=ISO-8859-2" } */ + +const char *str1 = "h\u00e1\U0000010Dky "; +const char *str2 = "\u010d\u00E1rky\n"; +const char *str3 = u8"h\u00e1\U0000010Dky "; +const char *str4 = u8"\u010d\u00E1rky\n"; +const char *str5 = "h\u00e1\U0000010Dky " "\u010d\u00E1rky\n"; +const char *str6 = u8"h\u00e1\U0000010Dky " "\u010d\u00E1rky\n"; +const char *str7 = "h\u00e1\U0000010Dky " u8"\u010d\u00E1rky\n"; +#define u8 +const char *str8 = u8"h\u00e1\U0000010Dky " u8"\u010d\u00E1rky\n"; + +const char latin2_1[] = "\x68\xe1\xe8\x6b\x79\x20"; +const char latin2_2[] = "\xe8\xe1\x72\x6b\x79\n"; +const char utf8_1[] = "\x68\xc3\xa1\xc4\x8d\x6b\x79\x20"; +const char utf8_2[] = "\xc4\x8d\xc3\xa1\x72\x6b\x79\n"; + +int +main (void) +{ + if (__builtin_strcmp (str1, latin2_1) != 0 + || __builtin_strcmp (str2, latin2_2) != 0 + || __builtin_strcmp (str3, utf8_1) != 0 + || __builtin_strcmp (str4, utf8_2) != 0 + || __builtin_strncmp (str5, latin2_1, sizeof (latin2_1) - 1) != 0 + || __builtin_strcmp (str5 + sizeof (latin2_1) - 1, latin2_2) != 0 + || __builtin_strncmp (str6, utf8_1, sizeof (utf8_1) - 1) != 0 + || __builtin_strcmp (str6 + sizeof (utf8_1) - 1, utf8_2) != 0 + || __builtin_strncmp (str7, utf8_1, sizeof (utf8_1) - 1) != 0 + || __builtin_strcmp (str7 + sizeof (utf8_1) - 1, utf8_2) != 0 + || __builtin_strncmp (str8, utf8_1, sizeof (utf8_1) - 1) != 0 + || __builtin_strcmp (str8 + sizeof (utf8_1) - 1, utf8_2) != 0) + __builtin_abort (); + if (sizeof ("a" u8"b"[0]) != 1 + || sizeof (u8"a" "b"[0]) != 1 + || sizeof (u8"a" u8"b"[0]) != 1 + || sizeof ("a" "\u010d") != 3 + || sizeof ("a" u8"\u010d") != 4 + || sizeof (u8"a" "\u010d") != 4 + || sizeof (u8"a" "\u010d") != 4) + __builtin_abort (); + return 0; +} --- gcc/testsuite/gcc.dg/utf8-2.c.jj 2008-09-12 11:27:51.000000000 +0200 +++ gcc/testsuite/gcc.dg/utf8-2.c 2008-09-12 11:36:48.000000000 +0200 @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-std=gnu99" } */ + +#include + +typedef __CHAR16_TYPE__ char16_t; +typedef __CHAR32_TYPE__ char32_t; + +const char s0[] = u8"ab"; +const char16_t s1[] = u8"ab"; /* { dg-error "from non-wide" } */ +const char32_t s2[] = u8"ab"; /* { dg-error "from non-wide" } */ +const wchar_t s3[] = u8"ab"; /* { dg-error "from non-wide" } */ + +const char t0[0] = u8"ab"; /* { dg-warning "chars is too long" } */ +const char t1[1] = u8"ab"; /* { dg-warning "chars is too long" } */ +const char t2[2] = u8"ab"; +const char t3[3] = u8"ab"; +const char t4[4] = u8"ab"; + +const char u0[0] = u8"\u2160."; /* { dg-warning "chars is too long" } */ +const char u1[1] = u8"\u2160."; /* { dg-warning "chars is too long" } */ +const char u2[2] = u8"\u2160."; /* { dg-warning "chars is too long" } */ +const char u3[3] = u8"\u2160."; /* { dg-warning "chars is too long" } */ +const char u4[4] = u8"\u2160."; +const char u5[5] = u8"\u2160."; +const char u6[6] = u8"\u2160."; --- gcc/testsuite/gcc.dg/utf-badconcat2.c.jj 2008-09-12 11:28:26.000000000 +0200 +++ gcc/testsuite/gcc.dg/utf-badconcat2.c 2008-09-12 11:30:53.000000000 +0200 @@ -0,0 +1,15 @@ +/* Test unsupported concatenation of UTF-8 string literals. */ +/* { dg-do compile } */ +/* { dg-options "-std=gnu99" } */ + +void *s0 = u8"a" "b"; +void *s1 = "a" u8"b"; +void *s2 = u8"a" u8"b"; +void *s3 = u8"a" u"b"; /* { dg-error "non-standard concatenation" } */ +void *s4 = u"a" u8"b"; /* { dg-error "non-standard concatenation" } */ +void *s5 = u8"a" U"b"; /* { dg-error "non-standard concatenation" } */ +void *s6 = U"a" u8"b"; /* { dg-error "non-standard concatenation" } */ +void *s7 = u8"a" L"b"; /* { dg-error "non-standard concatenation" } */ +void *s8 = L"a" u8"b"; /* { dg-error "non-standard concatenation" } */ + +int main () {} --- gcc/testsuite/gcc.dg/utf-dflt2.c.jj 2008-09-12 11:32:03.000000000 +0200 +++ gcc/testsuite/gcc.dg/utf-dflt2.c 2008-09-12 13:24:39.000000000 +0200 @@ -0,0 +1,12 @@ +/* If not gnu99, the u8 prefix should be parsed as separate tokens. */ +/* { dg-do compile } */ +/* { dg-options "" } */ + +const void *s0 = u8"a"; /* { dg-error "undeclared" } */ + /* { dg-error "expected ',' or ';'" "" { target *-*-* } 5 } */ + +#define u8 "a" + +const void *s1 = u8"a"; + +int main () {} --- gcc/testsuite/g++.dg/ext/raw-string-1.C.jj 2008-09-12 11:48:36.000000000 +0200 +++ gcc/testsuite/g++.dg/ext/raw-string-1.C 2008-09-12 14:18:07.000000000 +0200 @@ -0,0 +1,96 @@ +// { dg-do run } +// { dg-options "-std=c++0x" } + +const char s0[] = R"[a\ +\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb +c]"; +const char s1[] = "a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc"; +const char s2[] = R"*|*[a\ +b +c]" +c]*|" +c]*|*"; +const char s3[] = "ab\nc]\"\nc]*|\"\nc"; + +const char t0[] = u8R"[a\ +\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb +c]"; +const char t1[] = u8"a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc"; +const char t2[] = u8R"*|*[a\ +b +c]" +c]*|" +c]*|*"; +const char t3[] = u8"ab\nc]\"\nc]*|\"\nc"; + +const char16_t u0[] = uR"[a\ +\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb +c]"; +const char16_t u1[] = u"a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc"; +const char16_t u2[] = uR"*|*[a\ +b +c]" +c]*|" +c]*|*"; +const char16_t u3[] = u"ab\nc]\"\nc]*|\"\nc"; + +const char32_t U0[] = UR"[a\ +\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb +c]"; +const char32_t U1[] = U"a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc"; +const char32_t U2[] = UR"*|*[a\ +b +c]" +c]*|" +c]*|*"; +const char32_t U3[] = U"ab\nc]\"\nc]*|\"\nc"; + +const wchar_t L0[] = LR"[a\ +\u010d\U0000010D\\\'\"\?\a\b\f\n\r\t\v\0\00\000\xa\xabb +c]"; +const wchar_t L1[] = L"a\U0000010d\u010d\\\\\\'\\\"\\?\\a\\b\\f\\n\\r\\t\\v\\0\\00\\000\\xa\\xabb\nc"; +const wchar_t L2[] = LR"*|*[a\ +b +c]" +c]*|" +c]*|*"; +const wchar_t L3[] = L"ab\nc]\"\nc]*|\"\nc"; + +int +main (void) +{ + if (sizeof (s0) != sizeof (s1) + || __builtin_memcmp (s0, s1, sizeof (s0)) != 0) + __builtin_abort (); + if (sizeof (s2) != sizeof (s3) + || __builtin_memcmp (s2, s3, sizeof (s2)) != 0) + __builtin_abort (); + if (sizeof (t0) != sizeof (t1) + || __builtin_memcmp (t0, t1, sizeof (t0)) != 0) + __builtin_abort (); + if (sizeof (t2) != sizeof (t3) + || __builtin_memcmp (t2, t3, sizeof (t2)) != 0) + __builtin_abort (); + if (sizeof (u0) != sizeof (u1) + || __builtin_memcmp (u0, u1, sizeof (u0)) != 0) + __builtin_abort (); + if (sizeof (u2) != sizeof (u3) + || __builtin_memcmp (u2, u3, sizeof (u2)) != 0) + __builtin_abort (); + if (sizeof (U0) != sizeof (U1) + || __builtin_memcmp (U0, U1, sizeof (U0)) != 0) + __builtin_abort (); + if (sizeof (U2) != sizeof (U3) + || __builtin_memcmp (U2, U3, sizeof (U2)) != 0) + __builtin_abort (); + if (sizeof (L0) != sizeof (L1) + || __builtin_memcmp (L0, L1, sizeof (L0)) != 0) + __builtin_abort (); + if (sizeof (L2) != sizeof (L3) + || __builtin_memcmp (L2, L3, sizeof (L2)) != 0) + __builtin_abort (); + if (sizeof (R"*[]*") != 1 + || __builtin_memcmp (R"*[]*", "", 1) != 0) + __builtin_abort (); + return 0; +} --- gcc/testsuite/g++.dg/ext/raw-string-2.C.jj 2008-09-12 12:14:42.000000000 +0200 +++ gcc/testsuite/g++.dg/ext/raw-string-2.C 2008-09-12 14:18:14.000000000 +0200 @@ -0,0 +1,104 @@ +// { dg-do run } +// { dg-options "-std=c++0x" } + +#define R +#define u +#define uR +#define U +#define UR +#define u8 +#define u8R +#define L +#define LR + +const char s00[] = R"[a]" "[b]"; +const char s01[] = "[a]" R"*[b]*"; +const char s02[] = R"[a]" R"[b]"; +const char s03[] = R"-[a]-" u8"[b]"; +const char s04[] = "[a]" u8R"MNOPQRSTUVWXYZ[b]MNOPQRSTUVWXYZ"; +const char s05[] = R"[a]" u8R"wxyzABCDEFGHIJKL[b]wxyzABCDEFGHIJKL"; +const char s06[] = u8R";([a];(" "[b]"; +const char s07[] = u8"[a]" R"[b]"; +const char s08[] = u8R"[a]" R"_{}#()<>%:;.?*+-[b]_{}#()<>%:;.?*+-"; +const char s09[] = u8R"/^&|~!=,"'\[a]/^&|~!=,"'\" u8"[b]"; +const char s10[] = u8"[a]" u8R"0123456789abcdef[b]0123456789abcdef"; +const char s11[] = u8R"ghijklmnopqrstuv[a]ghijklmnopqrstuv" u8R"w[b]w"; + +const char16_t u03[] = R"-[a]-" u"[b]"; +const char16_t u04[] = "[a]" uR"MNOPQRSTUVWXYZ[b]MNOPQRSTUVWXYZ"; +const char16_t u05[] = R"[a]" uR"wxyzABCDEFGHIJKL[b]wxyzABCDEFGHIJKL"; +const char16_t u06[] = uR";([a];(" "[b]"; +const char16_t u07[] = u"[a]" R"[b]"; +const char16_t u08[] = uR"[a]" R"_{}#()<>%:;.?*+-[b]_{}#()<>%:;.?*+-"; +const char16_t u09[] = uR"/^&|~!=,"'\[a]/^&|~!=,"'\" u"[b]"; +const char16_t u10[] = u"[a]" uR"0123456789abcdef[b]0123456789abcdef"; +const char16_t u11[] = uR"ghijklmnopqrstuv[a]ghijklmnopqrstuv" uR"w[b]w"; + +const char32_t U03[] = R"-[a]-" U"[b]"; +const char32_t U04[] = "[a]" UR"MNOPQRSTUVWXYZ[b]MNOPQRSTUVWXYZ"; +const char32_t U05[] = R"[a]" UR"wxyzABCDEFGHIJKL[b]wxyzABCDEFGHIJKL"; +const char32_t U06[] = UR";([a];(" "[b]"; +const char32_t U07[] = U"[a]" R"[b]"; +const char32_t U08[] = UR"[a]" R"_{}#()<>%:;.?*+-[b]_{}#()<>%:;.?*+-"; +const char32_t U09[] = UR"/^&|~!=,"'\[a]/^&|~!=,"'\" U"[b]"; +const char32_t U10[] = U"[a]" UR"0123456789abcdef[b]0123456789abcdef"; +const char32_t U11[] = UR"ghijklmnopqrstuv[a]ghijklmnopqrstuv" UR"w[b]w"; + +const wchar_t L03[] = R"-[a]-" L"[b]"; +const wchar_t L04[] = "[a]" LR"MNOPQRSTUVWXYZ[b]MNOPQRSTUVWXYZ"; +const wchar_t L05[] = R"[a]" LR"wxyzABCDEFGHIJKL[b]wxyzABCDEFGHIJKL"; +const wchar_t L06[] = LR";([a];(" "[b]"; +const wchar_t L07[] = L"[a]" R"[b]"; +const wchar_t L08[] = LR"[a]" R"_{}#()<>%:;.?*+-[b]_{}#()<>%:;.?*+-"; +const wchar_t L09[] = LR"/^&|~!=,"'\[a]/^&|~!=,"'\" L"[b]"; +const wchar_t L10[] = L"[a]" LR"0123456789abcdef[b]0123456789abcdef"; +const wchar_t L11[] = LR"ghijklmnopqrstuv[a]ghijklmnopqrstuv" LR"w[b]w"; + +int +main (void) +{ +#define TEST(str, val) \ + if (sizeof (str) != sizeof (val) \ + || __builtin_memcmp (str, val, sizeof (str)) != 0) \ + __builtin_abort () + TEST (s00, "a[b]"); + TEST (s01, "[a]b"); + TEST (s02, "ab"); + TEST (s03, "a[b]"); + TEST (s04, "[a]b"); + TEST (s05, "ab"); + TEST (s06, "a[b]"); + TEST (s07, "[a]b"); + TEST (s08, "ab"); + TEST (s09, "a[b]"); + TEST (s10, "[a]b"); + TEST (s11, "ab"); + TEST (u03, u"a[b]"); + TEST (u04, u"[a]b"); + TEST (u05, u"ab"); + TEST (u06, u"a[b]"); + TEST (u07, u"[a]b"); + TEST (u08, u"ab"); + TEST (u09, u"a[b]"); + TEST (u10, u"[a]b"); + TEST (u11, u"ab"); + TEST (U03, U"a[b]"); + TEST (U04, U"[a]b"); + TEST (U05, U"ab"); + TEST (U06, U"a[b]"); + TEST (U07, U"[a]b"); + TEST (U08, U"ab"); + TEST (U09, U"a[b]"); + TEST (U10, U"[a]b"); + TEST (U11, U"ab"); + TEST (L03, L"a[b]"); + TEST (L04, L"[a]b"); + TEST (L05, L"ab"); + TEST (L06, L"a[b]"); + TEST (L07, L"[a]b"); + TEST (L08, L"ab"); + TEST (L09, L"a[b]"); + TEST (L10, L"[a]b"); + TEST (L11, L"ab"); + return 0; +} --- gcc/testsuite/g++.dg/ext/raw-string-3.C.jj 2008-09-12 13:27:09.000000000 +0200 +++ gcc/testsuite/g++.dg/ext/raw-string-3.C 2008-09-12 14:17:57.000000000 +0200 @@ -0,0 +1,58 @@ +// If c++98, the {,u,u8,U,L}R prefix should be parsed as separate +// token. +// { dg-do compile } +// { dg-options "-std=c++98" } + +const void *s0 = R"[a]"; // { dg-error "was not declared" } + // { dg-error "expected ',' or ';'" "" { target *-*-* } 6 } +const void *s1 = uR"[a]"; // { dg-error "was not declared" } + // { dg-error "expected ',' or ';'" "" { target *-*-* } 8 } +const void *s2 = UR"[a]"; // { dg-error "was not declared" } + // { dg-error "expected ',' or ';'" "" { target *-*-* } 10 } +const void *s3 = u8R"[a]"; // { dg-error "was not declared" } + // { dg-error "expected ',' or ';'" "" { target *-*-* } 12 } +const void *s4 = LR"[a]"; // { dg-error "was not declared" } + // { dg-error "expected ',' or ';'" "" { target *-*-* } 14 } + +const int i0 = R'a'; // { dg-error "was not declared" } + // { dg-error "expected ',' or ';'" "" { target *-*-* } 17 } +const int i1 = uR'a'; // { dg-error "was not declared" } + // { dg-error "expected ',' or ';'" "" { target *-*-* } 19 } +const int i2 = UR'a'; // { dg-error "was not declared" } + // { dg-error "expected ',' or ';'" "" { target *-*-* } 21 } +const int i3 = u8R'a'; // { dg-error "was not declared" } + // { dg-error "expected ',' or ';'" "" { target *-*-* } 23 } +const int i4 = LR'a'; // { dg-error "was not declared" } + // { dg-error "expected ',' or ';'" "" { target *-*-* } 25 } + +#define R "a" +#define uR "b" +#define UR "c" +#define u8R "d" +#define LR "e" + +const void *s5 = R"[a]"; +const void *s6 = uR"[a]"; +const void *s7 = UR"[a]"; +const void *s8 = u8R"[a]"; +const void *s9 = LR"[a]"; + +#undef R +#undef uR +#undef UR +#undef u8R +#undef LR + +#define R 1 + +#define uR 2 + +#define UR 3 + +#define u8R 4 + +#define LR 5 + + +const int i5 = R'a'; +const int i6 = uR'a'; +const int i7 = UR'a'; +const int i8 = u8R'a'; +const int i9 = LR'a'; + +int main () {} --- gcc/testsuite/g++.dg/ext/raw-string-4.C.jj 2008-09-12 13:27:09.000000000 +0200 +++ gcc/testsuite/g++.dg/ext/raw-string-4.C 2008-09-12 14:18:23.000000000 +0200 @@ -0,0 +1,28 @@ +// R is not applicable for character literals. +// { dg-do compile } +// { dg-options "-std=c++0x" } + +const int i0 = R'a'; // { dg-error "was not declared" } + // { dg-error "expected ',' or ';'" "" { target *-*-* } 5 } +const int i1 = uR'a'; // { dg-error "was not declared" } + // { dg-error "expected ',' or ';'" "" { target *-*-* } 7 } +const int i2 = UR'a'; // { dg-error "was not declared" } + // { dg-error "expected ',' or ';'" "" { target *-*-* } 9 } +const int i3 = u8R'a'; // { dg-error "was not declared" } + // { dg-error "expected ',' or ';'" "" { target *-*-* } 11 } +const int i4 = LR'a'; // { dg-error "was not declared" } + // { dg-error "expected ',' or ';'" "" { target *-*-* } 13 } + +#define R 1 + +#define uR 2 + +#define UR 3 + +#define u8R 4 + +#define LR 5 + + +const int i5 = R'a'; +const int i6 = uR'a'; +const int i7 = UR'a'; +const int i8 = u8R'a'; +const int i9 = LR'a'; + +int main () {} --- gcc/testsuite/g++.dg/ext/raw-string-5.C.jj 2008-09-12 13:49:58.000000000 +0200 +++ gcc/testsuite/g++.dg/ext/raw-string-5.C 2008-09-12 14:18:32.000000000 +0200 @@ -0,0 +1,23 @@ +// { dg-do compile } +// { dg-options "-std=c++0x" } + +const void *s0 = R"0123456789abcdefg[]0123456789abcdefg"; + // { dg-error "raw string delimiter longer" "" { target *-*-* } 4 } + // { dg-error "stray" "" { target *-*-* } 4 } +const void *s1 = R" [] "; + // { dg-error "invalid character" "" { target *-*-* } 7 } + // { dg-error "stray" "" { target *-*-* } 7 } +const void *s2 = R" [] "; + // { dg-error "invalid character" "" { target *-*-* } 10 } + // { dg-error "stray" "" { target *-*-* } 10 } +const void *s3 = R"][]]"; + // { dg-error "invalid character" "" { target *-*-* } 13 } + // { dg-error "stray" "" { target *-*-* } 13 } +const void *s4 = R"@[]@"; + // { dg-error "invalid character" "" { target *-*-* } 16 } + // { dg-error "stray" "" { target *-*-* } 16 } +const void *s5 = R"$[]$"; + // { dg-error "invalid character" "" { target *-*-* } 19 } + // { dg-error "stray" "" { target *-*-* } 19 } + +int main () {} --- gcc/testsuite/g++.dg/ext/raw-string-6.C.jj 2008-09-12 13:59:33.000000000 +0200 +++ gcc/testsuite/g++.dg/ext/raw-string-6.C 2008-09-12 14:20:21.000000000 +0200 @@ -0,0 +1,5 @@ +// { dg-do compile } +// { dg-options "-std=c++0x" } + +const void *s0 = R"ouch[]ouCh"; // { dg-error "at end of input" } + // { dg-error "unterminated raw string" "" { target *-*-* } 6 } --- gcc/testsuite/g++.dg/ext/raw-string-7.C.jj 2008-09-12 14:34:54.000000000 +0200 +++ gcc/testsuite/g++.dg/ext/raw-string-7.C 2008-09-12 14:36:40.000000000 +0200 @@ -0,0 +1,23 @@ +// The trailing whitespace after \ and before newline extension +// breaks full compliance for raw strings. +// { dg-do run { xfail *-*-* } } +// { dg-options "-std=c++0x" } + +// Note, there is a single space after \ on the following line. +const char *s0 = R"[\ +]"; +// { dg-bogus "backslash and newline separated by space" "" { xfail *-*-* } 7 } + +// Note, there is a single tab after \ on the following line. +const char *s1 = R"[\ +]"; +// { dg-bogus "backslash and newline separated by space" "" { xfail *-*-* } 12 } + +int +main (void) +{ + if (__builtin_strcmp (s0, "\\ \n") != 0 + || __builtin_strcmp (s1, "\\\t\n") != 0) + __builtin_abort (); + return 0; +} --- gcc/testsuite/g++.dg/ext/utf8-1.C.jj 2008-09-12 10:01:47.000000000 +0200 +++ gcc/testsuite/g++.dg/ext/utf8-1.C 2008-09-12 14:18:53.000000000 +0200 @@ -0,0 +1,45 @@ +// { dg-do run } +// { dg-require-iconv "ISO-8859-2" } +// { dg-options "-std=c++0x -fexec-charset=ISO-8859-2" } + +const char *str1 = "h\u00e1\U0000010Dky "; +const char *str2 = "\u010d\u00E1rky\n"; +const char *str3 = u8"h\u00e1\U0000010Dky "; +const char *str4 = u8"\u010d\u00E1rky\n"; +const char *str5 = "h\u00e1\U0000010Dky " "\u010d\u00E1rky\n"; +const char *str6 = u8"h\u00e1\U0000010Dky " "\u010d\u00E1rky\n"; +const char *str7 = "h\u00e1\U0000010Dky " u8"\u010d\u00E1rky\n"; +#define u8 +const char *str8 = u8"h\u00e1\U0000010Dky " u8"\u010d\u00E1rky\n"; + +const char latin2_1[] = "\x68\xe1\xe8\x6b\x79\x20"; +const char latin2_2[] = "\xe8\xe1\x72\x6b\x79\n"; +const char utf8_1[] = "\x68\xc3\xa1\xc4\x8d\x6b\x79\x20"; +const char utf8_2[] = "\xc4\x8d\xc3\xa1\x72\x6b\x79\n"; + +int +main (void) +{ + if (__builtin_strcmp (str1, latin2_1) != 0 + || __builtin_strcmp (str2, latin2_2) != 0 + || __builtin_strcmp (str3, utf8_1) != 0 + || __builtin_strcmp (str4, utf8_2) != 0 + || __builtin_strncmp (str5, latin2_1, sizeof (latin2_1) - 1) != 0 + || __builtin_strcmp (str5 + sizeof (latin2_1) - 1, latin2_2) != 0 + || __builtin_strncmp (str6, utf8_1, sizeof (utf8_1) - 1) != 0 + || __builtin_strcmp (str6 + sizeof (utf8_1) - 1, utf8_2) != 0 + || __builtin_strncmp (str7, utf8_1, sizeof (utf8_1) - 1) != 0 + || __builtin_strcmp (str7 + sizeof (utf8_1) - 1, utf8_2) != 0 + || __builtin_strncmp (str8, utf8_1, sizeof (utf8_1) - 1) != 0 + || __builtin_strcmp (str8 + sizeof (utf8_1) - 1, utf8_2) != 0) + __builtin_abort (); + if (sizeof ("a" u8"b"[0]) != 1 + || sizeof (u8"a" "b"[0]) != 1 + || sizeof (u8"a" u8"b"[0]) != 1 + || sizeof ("a" "\u010d") != 3 + || sizeof ("a" u8"\u010d") != 4 + || sizeof (u8"a" "\u010d") != 4 + || sizeof (u8"a" "\u010d") != 4) + __builtin_abort (); + return 0; +} --- gcc/testsuite/g++.dg/ext/utf8-2.C.jj 2008-09-12 11:27:51.000000000 +0200 +++ gcc/testsuite/g++.dg/ext/utf8-2.C 2008-09-12 14:19:01.000000000 +0200 @@ -0,0 +1,21 @@ +// { dg-do compile } +// { dg-options "-std=c++0x" } + +const char s0[] = u8"ab"; +const char16_t s1[] = u8"ab"; // { dg-error "from non-wide" } +const char32_t s2[] = u8"ab"; // { dg-error "from non-wide" } +const wchar_t s3[] = u8"ab"; // { dg-error "from non-wide" } + +const char t0[0] = u8"ab"; // { dg-error "chars is too long" } +const char t1[1] = u8"ab"; // { dg-error "chars is too long" } +const char t2[2] = u8"ab"; // { dg-error "chars is too long" } +const char t3[3] = u8"ab"; +const char t4[4] = u8"ab"; + +const char u0[0] = u8"\u2160."; // { dg-error "chars is too long" } +const char u1[1] = u8"\u2160."; // { dg-error "chars is too long" } +const char u2[2] = u8"\u2160."; // { dg-error "chars is too long" } +const char u3[3] = u8"\u2160."; // { dg-error "chars is too long" } +const char u4[4] = u8"\u2160."; // { dg-error "chars is too long" } +const char u5[5] = u8"\u2160."; +const char u6[6] = u8"\u2160."; --- gcc/testsuite/g++.dg/ext/utf-badconcat2.C.jj 2008-09-12 11:28:26.000000000 +0200 +++ gcc/testsuite/g++.dg/ext/utf-badconcat2.C 2008-09-12 14:19:17.000000000 +0200 @@ -0,0 +1,15 @@ +// Test unsupported concatenation of UTF-8 string literals. +// { dg-do compile } +// { dg-options "-std=c++0x" } + +const void *s0 = u8"a" "b"; +const void *s1 = "a" u8"b"; +const void *s2 = u8"a" u8"b"; +const void *s3 = u8"a" u"b"; // { dg-error "non-standard concatenation" } +const void *s4 = u"a" u8"b"; // { dg-error "non-standard concatenation" } +const void *s5 = u8"a" U"b"; // { dg-error "non-standard concatenation" } +const void *s6 = U"a" u8"b"; // { dg-error "non-standard concatenation" } +const void *s7 = u8"a" L"b"; // { dg-error "non-standard concatenation" } +const void *s8 = L"a" u8"b"; // { dg-error "non-standard concatenation" } + +int main () {} --- gcc/testsuite/g++.dg/ext/utf-dflt2.C.jj 2008-09-12 11:32:03.000000000 +0200 +++ gcc/testsuite/g++.dg/ext/utf-dflt2.C 2008-09-12 14:19:28.000000000 +0200 @@ -0,0 +1,12 @@ +// In C++0x, the u8 prefix should be parsed as separate tokens. +// { dg-do compile } +// { dg-options "-std=c++98" } + +const void *s0 = u8"a"; // { dg-error "was not declared" } + // { dg-error "expected ',' or ';'" "" { target *-*-* } 5 } + +#define u8 "a" + +const void *s1 = u8"a"; + +int main () {} Jakub