From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <6d804e2c-a23f-b2d3-a6c1-62e7a0d38139@redhat.com>
Date: Fri, 4 Nov 2022 14:25:37 -0400
Subject: Re: [PATCH] c++: libcpp: Support raw strings with newlines in directives [PR55971]
To: Lewis Hyatt, gcc-patches@gcc.gnu.org
References: <38b67944c0759299533ad163d002247996fa5e33.1666891579.git.lhyatt@gmail.com>
From: Jason Merrill
In-Reply-To: <38b67944c0759299533ad163d002247996fa5e33.1666891579.git.lhyatt@gmail.com>

On 10/27/22 13:30, Lewis Hyatt wrote:
> Hello-
>
> May I please ask for a review of this patch from June? I realize it's a
> 10-year-old PR that doesn't seem to be bothering people much, but I still feel
> like it's an unfortunate gap in C++11 support that is not hard to fix.
>
> Original submission is here:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596820.html
>
> But I have attached a new version here that is simplified, all the
> _Pragma-related stuff has been removed and I will handle that in a later patch
> instead. I also removed the changes to c-ppoutput.cc that I realized were not
> needed after all. Bootstrap+regtest all languages on x86-64 Linux still looks
> good. Thanks!
>
> -Lewis

OK, thanks.

> -- >8 --
>
> It's not currently possible to use a C++11 raw string containing a newline as
> part of the definition of a macro, or in any other preprocessing directive,
> such as:
>
> #define X R"(two
> lines)"
>
> #error R"(this error has
> two lines)"
>
> Add support for that by relaxing the conditions under which
> _cpp_get_fresh_line() refuses to get a new line. For the case of lexing a raw
> string, it's OK to do so as long as there is another line within the current
> buffer. The code in _cpp_get_fresh_line() was refactored into a new function
> get_fresh_line_impl(), so that the new logic is applied only when processing a
> raw string and not any other times.
>
> libcpp/ChangeLog:
>
> 	PR preprocessor/55971
> 	* lex.cc (get_fresh_line_impl): New function refactoring the code
> 	from...
> 	(_cpp_get_fresh_line): ...here.
> 	(lex_raw_string): Use the new version of get_fresh_line_impl() to
> 	support raw strings containing new lines when processing a directive.
>
> gcc/testsuite/ChangeLog:
>
> 	PR preprocessor/55971
> 	* c-c++-common/raw-string-directive-1.c: New test.
> 	* c-c++-common/raw-string-directive-2.c: New test.
>
> gcc/c-family/ChangeLog:
>
> 	PR preprocessor/55971
> 	* c-ppoutput.cc (account_for_newlines): Update comment.
> ---
>  gcc/c-family/c-ppoutput.cc                    | 10 ++-
>  .../c-c++-common/raw-string-directive-1.c     | 74 +++++++++++++++++++
>  .../c-c++-common/raw-string-directive-2.c     | 33 +++++++++
>  libcpp/lex.cc                                 | 41 +++++++---
>  4 files changed, 148 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/raw-string-directive-1.c
>  create mode 100644 gcc/testsuite/c-c++-common/raw-string-directive-2.c
>
> diff --git a/gcc/c-family/c-ppoutput.cc b/gcc/c-family/c-ppoutput.cc
> index a99d9e9c5ca..6e054358e9e 100644
> --- a/gcc/c-family/c-ppoutput.cc
> +++ b/gcc/c-family/c-ppoutput.cc
> @@ -433,7 +433,15 @@ scan_translation_unit_directives_only (cpp_reader *pfile)
>                             lang_hooks.preprocess_token (pfile, NULL, streamer.filter);
>  }
>
> -/* Adjust print.src_line for newlines embedded in output. */
> +/* Adjust print.src_line for newlines embedded in output. For example, if a raw
> +   string literal contains newlines, then we need to increment our notion of the
> +   current line to keep in sync and avoid outputting a line marker
> +   unnecessarily. If a raw string literal containing newlines is the result of
> +   macro expansion, then we have the opposite problem, where the token takes up
> +   more lines in the output than it did in the input, and hence a line marker is
> +   needed to restore the correct state for subsequent lines. In this case,
> +   incrementing print.src_line still does the job, because it will cause us to
> +   emit the line marker the next time a token is streamed. */
>  static void
>  account_for_newlines (const unsigned char *str, size_t len)
>  {
> diff --git a/gcc/testsuite/c-c++-common/raw-string-directive-1.c b/gcc/testsuite/c-c++-common/raw-string-directive-1.c
> new file mode 100644
> index 00000000000..d6525e107bc
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/raw-string-directive-1.c
> @@ -0,0 +1,74 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=gnu99" { target c } } */
> +/* { dg-options "-std=c++11" { target c++ } } */
> +
> +/* Test that multi-line raw strings are lexed OK for all preprocessing
> +   directives where one could appear. Test raw-string-directive-2.c
> +   checks that #define is also processed properly. */
> +
> +/* Note that in cases where we cause GCC to produce a multi-line error
> +   message, we construct the string so that the second line looks enough
> +   like an error message for DejaGNU to process it as such, so that we
> +   can use dg-warning or dg-error directives to check for it. */
> +
> +#warning R"delim(line1 /* { dg-warning "line1" } */
> +file:15:1: warning: line2)delim" /* { dg-warning "line2" } */
> +
> +#error R"delim(line3 /* { dg-error "line3" } */
> +file:18:1: error: line4)delim" /* { dg-error "line4" } */
> +
> +#define X1 R"(line 5
> +line 6
> +line 7
> +line 8
> +/*
> +//
> +line 9)" R"delim(
> +line10)delim"
> +
> +#define X2(a) X1 #a R"(line 11
> +/*
> +line12
> +)"
> +
> +#if R"(line 13 /* { dg-error "line13" } */
> +file:35:1: error: line14)" /* { dg-error "line14\\)\"\" is not valid" } */
> +#endif R"(line 15 /* { dg-warning "extra tokens at end of #endif" } */
> +\
> +line16)" ""
> +
> +#ifdef XYZ R"(line17 /* { dg-warning "extra tokens at end of #ifdef" } */
> +\
> +\
> +line18)"
> +#endif
> +
> +#if 1
> +#else R"(line23 /* { dg-warning "extra tokens at end of #else" } */
> +\
> +
> +line24)"
> +#endif
> +
> +#if 0
> +#elif R"(line 25 /* { dg-error "line25" } */
> +file:55:1: error: line26)" /* { dg-error "line26\\)\"\" is not valid" } */
> +#endif
> +
> +#line 60 R"(file:60:1: warning: this file has a space
> +in it!)"
> +#warning "line27" /* { dg-warning "line27" } */
> +/* { dg-warning "this file has a space" "#line check" { target *-*-* } 60 } */
> +#line 63 "file"
> +
> +#undef X1 R"(line28 /* { dg-warning "extra tokens at end of #undef" } */
> +line29
> +\
> +)"
> +
> +#ident R"(line30
> +line31)" R"(line 32 /* { dg-warning "extra tokens at end of #ident" } */
> +line 33)"
> +
> +#pragma GCC diagnostic ignored R"(-Woption /* { dg-warning "-Wpragmas" } */
> +-with-a-newline)"
> diff --git a/gcc/testsuite/c-c++-common/raw-string-directive-2.c b/gcc/testsuite/c-c++-common/raw-string-directive-2.c
> new file mode 100644
> index 00000000000..6fc673ccd82
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/raw-string-directive-2.c
> @@ -0,0 +1,33 @@
> +/* { dg-do run } */
> +/* { dg-options "-std=gnu99" { target c } } */
> +/* { dg-options "-std=c++11" { target c++ } } */
> +
> +#define S1 R"(three
> +line
> +string)"
> +
> +#define S2 R"(pasted
> +two line)" " string"
> +
> +#define X(a, b) a b R"(
> +one more)"
> +
> +const char *s1 = S1;
> +const char *s2 = S2;
> +const char *s3 = X(S1, R"(
> +with this line plus)");
> +
> +int main ()
> +{
> +  const char s1_correct[] = "three\nline\nstring";
> +  if (__builtin_memcmp (s1, s1_correct, sizeof s1_correct))
> +    __builtin_abort ();
> +
> +  const char s2_correct[] = "pasted\ntwo line string";
> +  if (__builtin_memcmp (s2, s2_correct, sizeof s2_correct))
> +    __builtin_abort ();
> +
> +  const char s3_correct[] = "three\nline\nstring\nwith this line plus\none more";
> +  if (__builtin_memcmp (s3, s3_correct, sizeof s3_correct))
> +    __builtin_abort ();
> +}
> diff --git a/libcpp/lex.cc b/libcpp/lex.cc
> index cc12a52d282..b1107920c94 100644
> --- a/libcpp/lex.cc
> +++ b/libcpp/lex.cc
> @@ -1076,6 +1076,9 @@ _cpp_clean_line (cpp_reader *pfile)
>    buffer->next_line = s + 1;
>  }
>
> +template <bool lexing_raw_string>
> +static bool get_fresh_line_impl (cpp_reader *pfile);
> +
>  /* Return true if the trigraph indicated by NOTE should be warned
>     about in a comment. */
>  static bool
> @@ -2695,9 +2698,8 @@ lex_raw_string (cpp_reader *pfile, cpp_token *token, const uchar *base)
>          {
>            pos--;
>            pfile->buffer->cur = pos;
> -          if (pfile->state.in_directive
> -              || (pfile->state.parsing_args
> -                  && pfile->buffer->next_line >= pfile->buffer->rlimit))
> +          if ((pfile->state.in_directive || pfile->state.parsing_args)
> +              && pfile->buffer->next_line >= pfile->buffer->rlimit)
>              {
>                cpp_error_with_line (pfile, CPP_DL_ERROR, token->src_loc, 0,
>                                     "unterminated raw string");
> @@ -2712,7 +2714,7 @@ lex_raw_string (cpp_reader *pfile, cpp_token *token, const uchar *base)
>            CPP_INCREMENT_LINE (pfile, 0);
>            pfile->buffer->need_line = true;
>
> -          if (!_cpp_get_fresh_line (pfile))
> +          if (!get_fresh_line_impl<true> (pfile))
>              {
>                /* We ran out of file and failed to get a line. */
>                location_t src_loc = token->src_loc;
> @@ -2724,8 +2726,15 @@ lex_raw_string (cpp_reader *pfile, cpp_token *token, const uchar *base)
>                _cpp_release_buff (pfile, accum.first);
>                cpp_error_with_line (pfile, CPP_DL_ERROR, src_loc, 0,
>                                     "unterminated raw string");
> -              /* Now pop the buffer that _cpp_get_fresh_line did not. */
> +
> +              /* Now pop the buffer that get_fresh_line_impl() did not. Popping
> +                 is not safe if processing a directive, however this cannot
> +                 happen as we already checked above that a line would be
> +                 available, and get_fresh_line_impl() can't fail in this
> +                 case. */
> +              gcc_assert (!pfile->state.in_directive);
>                _cpp_pop_buffer (pfile);
> +
>                return;
>              }
>
> @@ -3659,11 +3668,14 @@ _cpp_lex_token (cpp_reader *pfile)
>  }
>
>  /* Returns true if a fresh line has been loaded. */
> -bool
> -_cpp_get_fresh_line (cpp_reader *pfile)
> +template <bool lexing_raw_string>
> +static bool
> +get_fresh_line_impl (cpp_reader *pfile)
>  {
> -  /* We can't get a new line until we leave the current directive. */
> -  if (pfile->state.in_directive)
> +  /* We can't get a new line until we leave the current directive, unless we
> +     are lexing a raw string, in which case it will be OK as long as we don't
> +     pop the current buffer. */
> +  if (!lexing_raw_string && pfile->state.in_directive)
>      return false;
>
>    for (;;)
> @@ -3679,6 +3691,10 @@ _cpp_get_fresh_line (cpp_reader *pfile)
>            return true;
>          }
>
> +      /* We can't change buffers until we leave the current directive. */
> +      if (lexing_raw_string && pfile->state.in_directive)
> +        return false;
> +
>        /* First, get out of parsing arguments state. */
>        if (pfile->state.parsing_args)
>          return false;
> @@ -3706,6 +3722,13 @@ _cpp_get_fresh_line (cpp_reader *pfile)
>      }
>  }
>
> +bool
> +_cpp_get_fresh_line (cpp_reader *pfile)
> +{
> +  return get_fresh_line_impl<false> (pfile);
> +}
> +
> +
>  #define IF_NEXT_IS(CHAR, THEN_TYPE, ELSE_TYPE) \
>    do \
>    { \
>
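
For readers who want to see the control flow of get_fresh_line_impl() in isolation, here is a minimal, self-contained sketch in C++. It is a toy model under stated assumptions, not libcpp code: ToyBuffer, ToyReader, toy_reader_for and toy_get_fresh_line are hypothetical names invented for this illustration, and the real function also deals with need_line, _cpp_clean_line(), parsing_args and its end-of-file handling, all of which are left out. The template parameter is assumed to be a bool, as the way lexing_raw_string is tested in the patch suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one libcpp buffer: its lines plus a cursor.
struct ToyBuffer {
  std::vector<std::string> lines;
  std::size_t next_line = 0;
};

// Hypothetical stand-in for cpp_reader: a buffer stack and the directive flag.
struct ToyReader {
  std::vector<ToyBuffer> buffers;  // back() plays the role of pfile->buffer
  bool in_directive = false;       // plays the role of pfile->state.in_directive
  std::string current;             // the line most recently handed out
};

// Convenience for the example: a reader over a single buffer of lines.
inline ToyReader toy_reader_for(std::vector<std::string> lines) {
  assert(!lines.empty());
  ToyBuffer buf;
  buf.lines = std::move(lines);
  ToyReader reader;
  reader.buffers.push_back(std::move(buf));
  return reader;
}

// Simplified model of the patched logic.  Returns true if a line was loaded.
template <bool lexing_raw_string>
bool toy_get_fresh_line(ToyReader &reader) {
  // Ordinary lexing may not read past the end of a directive's line.
  if (!lexing_raw_string && reader.in_directive)
    return false;

  for (;;) {
    if (reader.buffers.empty())
      return false;  // ran out of input entirely

    ToyBuffer &buf = reader.buffers.back();
    if (buf.next_line < buf.lines.size()) {
      reader.current = buf.lines[buf.next_line++];
      return true;
    }

    // The current buffer is exhausted.  A raw string inside a directive may
    // take more lines from the same buffer, but must not pop it.
    if (lexing_raw_string && reader.in_directive)
      return false;

    reader.buffers.pop_back();
  }
}
```

With a reader that is inside a directive, toy_get_fresh_line<false> refuses immediately, while toy_get_fresh_line<true> hands back the next line of the same buffer and only refuses once that buffer is exhausted, leaving it unpopped. That mirrors the split between _cpp_get_fresh_line() and the call made from lex_raw_string() in the patch.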