From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 9D34A3858C1F for ; Mon, 31 Jan 2022 19:04:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9D34A3858C1F Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-640-86jNADUSOP2JiJir79obVQ-1; Mon, 31 Jan 2022 14:04:16 -0500 X-MC-Unique: 86jNADUSOP2JiJir79obVQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id BB83F83DD25; Mon, 31 Jan 2022 19:04:15 +0000 (UTC) Received: from tucnak.zalov.cz (unknown [10.39.192.125]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 13E1560BF1; Mon, 31 Jan 2022 19:03:52 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.16.1/8.16.1) with ESMTPS id 20VJ3orJ3444669 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Mon, 31 Jan 2022 20:03:50 +0100 Received: (from jakub@localhost) by tucnak.zalov.cz (8.16.1/8.16.1/Submit) id 20VJ3nPU3444668; Mon, 31 Jan 2022 20:03:49 +0100 Date: Mon, 31 Jan 2022 20:03:49 +0100 From: Jakub Jelinek To: "Joseph S. Myers" , Marek Polacek , Jason Merrill Cc: gcc-patches@gcc.gnu.org Subject: [PATCH] libcpp: Fix up padding handling in funlike_invocation_p [PR104147] Message-ID: <20220131190349.GM2646553@tucnak> Reply-To: Jakub Jelinek MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-5.2 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 Jan 2022 19:04:22 -0000 Hi! As mentioned in the PR, in some cases we preprocess incorrectly when we encounter an identifier which is defined as function-like macro, followed by at least 2 CPP_PADDING tokens and then some other identifier. On the following testcase, the problem is in the 3rd funlike_invocation_p, the tokens are CPP_NAME Y, CPP_PADDING (the pfile->avoid_paste shared token), CPP_PADDING (one created with padding_token, val.source is non-NULL and val.source->flags & PREV_WHITE is non-zero) and then another CPP_NAME. funlike_invocation_p remembers there was a padding token, but remembers the first one because of its condition, then the next token is the CPP_NAME, which is not CPP_OPEN_PAREN, so the CPP_NAME token is backed up, but as we can't easily backup more tokens, it pushes into a new context the padding token (the pfile->avoid_paste one). The net effect is that when Y is not defined as fun-like macro, we read Y, avoid_paste, padding_token, Y, while if Y is fun-like macro, we read Y, avoid_paste, avoid_paste, Y (the second avoid_paste is because that is how we handle end of a context). Now, for stringify_arg that is unfortunately a significant difference, which handles CPP_PADDING tokens with: if (token->type == CPP_PADDING) { if (source == NULL || (!(source->flags & PREV_WHITE) && token->val.source == NULL)) source = token->val.source; continue; } and later on /* Leading white space? */ if (dest - 1 != BUFF_FRONT (pfile->u_buff)) { if (source == NULL) source = token; if (source->flags & PREV_WHITE) *dest++ = ' '; } source = NULL; (and c-ppoutput.cc has similar code). So, when Y is not fun-like macro, ' ' is added because padding_token's val.source->flags & PREV_WHITE is non-zero, while when it is fun-like macro, we don't add ' ' in between, because source is NULL and so used from the next token (CPP_NAME Y), which doesn't have PREV_WHITE set. Now, the funlike_invocation_p condition if (padding == NULL || (!(padding->flags & PREV_WHITE) && token->val.source == NULL)) padding = token; looks very similar to that in stringify_arg/c-ppoutput.cc, so I assume the intent was to prefer do the same thing and pick the right padding. But there are significant differences. Both stringify_arg and c-ppoutput.cc don't remember the CPP_PADDING token, but its val.source instead, while in funlike_invocation_p we want to remember the padding token that has the significant information for stringify_arg/c-ppoutput.cc. So, IMHO we want to overwrite padding if: 1) padding == NULL (remember that there was any padding at all) 2) padding->val.source == NULL (this matches the source == NULL case in stringify_arg) 3) !(padding->val.source->flags & PREV_WHITE) && token->val.source == NULL (this matches the !(source->flags & PREV_WHITE) && token->val.source == NULL case in stringify_arg) Bootstrapped/regtested on powerpc64le-linux, ok for trunk (and after a while for some release branches)? 2022-01-31 Jakub Jelinek PR preprocessor/104147 * macro.cc (funlike_invocation_p): For padding prefer a token with val.source non-NULL especially if it has PREV_WHITE set on val.source->flags. * c-c++-common/cpp/pr104147.c: New test. --- libcpp/macro.cc.jj 2022-01-18 11:59:00.277972128 +0100 +++ libcpp/macro.cc 2022-01-31 15:44:49.208847524 +0100 @@ -1374,7 +1374,9 @@ funlike_invocation_p (cpp_reader *pfile, if (token->type != CPP_PADDING) break; if (padding == NULL - || (!(padding->flags & PREV_WHITE) && token->val.source == NULL)) + || padding->val.source == NULL + || (!(padding->val.source->flags & PREV_WHITE) + && token->val.source == NULL)) padding = token; } --- gcc/testsuite/c-c++-common/cpp/pr104147.c.jj 2022-01-31 15:54:18.113847934 +0100 +++ gcc/testsuite/c-c++-common/cpp/pr104147.c 2022-01-31 15:54:04.530038941 +0100 @@ -0,0 +1,27 @@ +/* PR preprocessor/104147 */ +/* { dg-do run } */ + +#define X(x,y) x y +#define STR_(x) #x +#define STR(x) STR_(x) +const char *str = +STR(X(Y,Y)) +#define Y() +STR(X(Y,Y)) +#undef Y +STR(X(Y,Y)) +#define Y() +STR(X(Y,Y)) +STR(X(Y, +Y)) +STR(X(Y +,Y)) +; + +int +main () +{ + if (__builtin_strcmp (str, "Y YY YY YY YY YY Y") != 0) + __builtin_abort (); + return 0; +} Jakub