From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 5714B3858D35 for ; Tue, 14 Nov 2023 21:03:33 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5714B3858D35 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 5714B3858D35 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699995815; cv=none; b=hkWrQD4jmVkK+gnPVUA7YS3WSKdKs++ZcAFPlOxsy1r9ZiF3N744oNHhs+WNPoQLZo811PypPs4MkfllmXOz9XcKMovZgB80EMDGNjIexlNR1Ip73TkjOLODn2/7zHuZyF4YiEVjSk/2zb2P1Xki0EpEdDw54QUU6ixBZJ2ibcs= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699995815; c=relaxed/simple; bh=MLW1OTKbOKDYwYPFRHKLTeeD11RTXYlEmIlsFCFJ6D4=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=pxt5HvbouGWm+dBPdPue+R12BxSunMmTFBPVVPYgiEjFNzZlQPz5ctA9MsfXgsIa9qeJIoHxXfQhq8+rQRBZoqNDVSAeiDOtd4On0eFRQCxHxt5VW3AyhPADIk4dHATgr4rKLWcVck2fe3DCTmlgDbn+jvFSEQk/TkWfP1pHUag= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1699995813; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=ZW681qrZlCmGZO+82tGM4Zi/CLcr00sBT6IlyaMFbSM=; b=gWTUzLtkfTUbjuBj0Lg0zhfOFQhbIJ13Ch5yKeM7XCLOVnZ4wjUS2LYaP2WILkl7+PwBVW NT+PMx4Eni72mPkNb88hmfSvFJDWUixJ8ZCIPVlAeZfti834mDtct4Njdnqm21o3B6SIzB tz4peKgxfDXR+yxoalZNTTnc/GFtcB0= Received: from mail-qv1-f72.google.com (mail-qv1-f72.google.com [209.85.219.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-448-1m-EM7TxMZ6wpb-2r7xvNw-1; Tue, 14 Nov 2023 16:03:29 -0500 X-MC-Unique: 1m-EM7TxMZ6wpb-2r7xvNw-1 Received: by mail-qv1-f72.google.com with SMTP id 6a1803df08f44-66d91b47f23so61890906d6.2 for ; Tue, 14 Nov 2023 13:03:29 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699995809; x=1700600609; h=user-agent:in-reply-to:content-disposition:mime-version:references :message-id:subject:cc:to:from:date:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ZW681qrZlCmGZO+82tGM4Zi/CLcr00sBT6IlyaMFbSM=; b=cXUGHC9T6ZP/FzwZv14O3gVumMSI0pnJVgkjNPgWvqlgVyANbSN2wTmTmPMN9BkvVo /DB2CGD5ELOXHaMEZYwY7Kl0taVKhaZH/GLEjA4NzIO3BVkxZhw/rXUl/lzOfD91m4Ow pNXDWCjVSXfgiXMAPP/i3DwXBP1SZZjGVvbkMmJ5VxCVRZVlH1NfNggr9L5tjowOcmUO 6Jv1Na5hiZ84Z5z1ZD5iszTzCvV4D1tnQicZgAAC7HTDD9e/v7ruGhm4eim0PiEDruFC DA2B5jPCEj3ICbMCv0fewshLEKu2zjz03OcsDVdZyTFY9HTjXIT3PRvQyjDJRYqk81fC Rm0Q== X-Gm-Message-State: AOJu0YxcwGQFBE3ydQ/EDHfGuG7GRIfGEwtuwebRCnjYziEm5vbKCcGK yp51D0FHaHsHlzSVA+kn5r9QUSd6U5Uz5k3j6kBx8bpJwXOUZG+AW1ZPwq3BYXzbNtJzjz6ycWv cSSz30ARq1xjkGJQA7w== X-Received: by 2002:ad4:40cb:0:b0:66d:a741:d50e with SMTP id x11-20020ad440cb000000b0066da741d50emr3797932qvp.21.1699995808878; Tue, 14 Nov 2023 13:03:28 -0800 (PST) X-Google-Smtp-Source: AGHT+IGKoKGuZ0ORk0wPD1WTfBPHG7eIvZEsyOxX5htdUE6kxadLVQNHnGR0yAxAb9yqsLxf+0Em0Q== X-Received: by 2002:ad4:40cb:0:b0:66d:a741:d50e with SMTP id x11-20020ad440cb000000b0066da741d50emr3797908qvp.21.1699995808503; Tue, 14 Nov 2023 13:03:28 -0800 (PST) Received: from redhat.com (2603-7000-9500-34a5-0000-0000-0000-1db4.res6.spectrum.com. [2603:7000:9500:34a5::1db4]) by smtp.gmail.com with ESMTPSA id i2-20020a0cd842000000b0066d1f118b7esm3195489qvj.1.2023.11.14.13.03.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 14 Nov 2023 13:03:28 -0800 (PST) Date: Tue, 14 Nov 2023 16:03:26 -0500 From: Marek Polacek To: iain@sandoe.co.uk Cc: gcc-patches@gcc.gnu.org, joseph@codesourcery.com, jason@redhat.com Subject: Re: [PATCH 3/4] c-family, C++: Handle clang attributes [PR109877]. Message-ID: References: <20231113060244.90554-1-iain@sandoe.co.uk> <20231113060244.90554-2-iain@sandoe.co.uk> <20231113060244.90554-3-iain@sandoe.co.uk> <20231113060244.90554-4-iain@sandoe.co.uk> MIME-Version: 1.0 In-Reply-To: <20231113060244.90554-4-iain@sandoe.co.uk> User-Agent: Mutt/2.2.9 (2022-11-12) X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Sun, Nov 12, 2023 at 08:02:43PM -1000, Iain Sandoe wrote: > This adds the ability to defer the validation of numeric attribute > arguments until the sequence is parsed if the attribute being > handled is one known to be 'clang form'. > > We do this by considering the arguments to be strings regardless > of content and defer the interpretation of those strings until the > argument processing. > > Since the C++ front end lexes tokens separately (and would report > non-compliant numeric tokens before we can intercept them), we need > to implement a small state machine to track the lexing of attributes. > > PR c++/109877 > > gcc/cp/ChangeLog: > > * parser.cc (enum clang_attr_state): New. > (cp_lexer_attribute_state): New. > (cp_lexer_new_main): Set initial attribute lexing state. > (cp_lexer_get_preprocessor_token): Handle lexing of clang- > form attributes. > (cp_parser_clang_attribute): Handle clang-form attributes. > (cp_parser_gnu_attribute_list): Switch to clang-form parsing > where needed. > * parser.h : New flag to signal lexing clang-form attributes. > > Signed-off-by: Iain Sandoe > --- > gcc/cp/parser.cc | 230 +++++++++++++++++++++++++++++++++++++++++++++-- > gcc/cp/parser.h | 3 + > 2 files changed, 227 insertions(+), 6 deletions(-) > > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc > index 5116bcb78f6..c12f473f2e3 100644 > --- a/gcc/cp/parser.cc > +++ b/gcc/cp/parser.cc > @@ -699,6 +699,91 @@ cp_lexer_handle_early_pragma (cp_lexer *lexer) > static cp_parser *cp_parser_new (cp_lexer *); > static GTY (()) cp_parser *the_parser; > > +/* Context-sensitive parse-checking for clang-style attributes. */ > + > +enum clang_attr_state { > + CA_NONE = 0, > + CA_ATTR, > + CA_BR1, CA_BR2, > + CA_LIST, > + CA_LIST_ARGS, > + CA_IS_CA, > + CA_CA_ARGS, > + CA_LIST_CONT > +}; > + > +/* State machine tracking context of attribute lexing. */ > + > +static enum clang_attr_state > +cp_lexer_attribute_state (cp_token& token, enum clang_attr_state attr_state) > +{ > + /* Implement a context-sensitive parser for clang attributes. > + We detect __attribute__((clang_style_attribute (ARGS))) and lex the > + args ARGS with the following differences from GNU attributes: > + (a) number-like values are lexed as strings [this allows lexing XX.YY.ZZ > + version numbers]. > + (b) we concatenate strings, since clang attributes allow this too. */ > + switch (attr_state) > + { > + case CA_NONE: > + if (token.type == CPP_KEYWORD > + && token.keyword == RID_ATTRIBUTE) > + attr_state = CA_ATTR; > + break; > + case CA_ATTR: > + if (token.type == CPP_OPEN_PAREN) > + attr_state = CA_BR1; > + else > + attr_state = CA_NONE; > + break; > + case CA_BR1: > + if (token.type == CPP_OPEN_PAREN) > + attr_state = CA_BR2; > + else > + attr_state = CA_NONE; > + break; > + case CA_BR2: > + if (token.type == CPP_NAME) > + { > + tree identifier = (token.type == CPP_KEYWORD) > + /* For keywords, use the canonical spelling, not the > + parsed identifier. */ > + ? ridpointers[(int) token.keyword] > + : token.u.value; > + identifier = canonicalize_attr_name (identifier); > + if (attribute_clang_form_p (identifier)) > + attr_state = CA_IS_CA; > + else > + attr_state = CA_LIST; > + } > + else > + attr_state = CA_NONE; > + break; > + case CA_IS_CA: > + case CA_LIST: > + if (token.type == CPP_COMMA) > + attr_state = CA_BR2; /* Back to the list outer. */ > + else if (token.type == CPP_OPEN_PAREN) > + attr_state = attr_state == CA_IS_CA ? CA_CA_ARGS > + : CA_LIST_ARGS; > + else > + attr_state = CA_NONE; > + break; > + case CA_CA_ARGS: /* We will special-case args in this state. */ > + case CA_LIST_ARGS: > + if (token.type == CPP_CLOSE_PAREN) > + attr_state = CA_LIST_CONT; > + break; > + case CA_LIST_CONT: > + if (token.type == CPP_COMMA) > + attr_state = CA_BR2; /* Back to the list outer. */ > + else > + attr_state = CA_NONE; > + break; > + } > + return attr_state; > +} > + > /* Create a new main C++ lexer, the lexer that gets tokens from the > preprocessor, and also create the main parser. */ > > @@ -715,6 +800,9 @@ cp_lexer_new_main (void) > c_common_no_more_pch (); > > cp_lexer *lexer = cp_lexer_alloc (); > + lexer->lex_number_as_string_p = false; > + enum clang_attr_state attr_state = CA_NONE; > + > /* Put the first token in the buffer. */ > cp_token *tok = lexer->buffer->quick_push (token); > > @@ -738,8 +826,20 @@ cp_lexer_new_main (void) > if (tok->type == CPP_PRAGMA_EOL) > cp_lexer_handle_early_pragma (lexer); > > + attr_state = cp_lexer_attribute_state (*tok, attr_state); > tok = vec_safe_push (lexer->buffer, cp_token ()); > - cp_lexer_get_preprocessor_token (C_LEX_STRING_NO_JOIN, tok); > + unsigned int flags = C_LEX_STRING_NO_JOIN; > + if (attr_state == CA_CA_ARGS) > + { > + /* We are handling a clang attribute; > + 1. do not try to diagnose x.y.z as a bad number, it could be a > + version. > + 2. join strings. */ > + flags = C_LEX_NUMBER_AS_STRING; > + lexer->lex_number_as_string_p = true; > + } > + cp_lexer_get_preprocessor_token (flags, tok); > + lexer->lex_number_as_string_p = false; > } > > lexer->next_token = lexer->buffer->address (); > @@ -956,7 +1056,15 @@ cp_lexer_get_preprocessor_token (unsigned flags, cp_token *token) > { > static int is_extern_c = 0; > > - /* Get a new token from the preprocessor. */ > + /* Get a new token from the preprocessor. > + int lex_flags = 0; > + if (lexer != NULL) > + { > + lex_flags = C_LEX_STRING_NO_JOIN; > + if (lexer->lex_number_as_string_p) > + lex_flags |= C_LEX_NUMBER_AS_STRING; > + } > + */ Why is this commented out? > token->type > = c_lex_with_flags (&token->u.value, &token->location, &token->flags, > flags); > @@ -29302,6 +29410,113 @@ cp_parser_gnu_attributes_opt (cp_parser* parser) > return attributes; > } > > +/* Parse the arguments list for a clang attribute. */ > +static tree > +cp_parser_clang_attribute (cp_parser *parser, tree/*attr_id*/) > +{ > + /* Each entry can be : > + identifier > + identifier=N.MM.Z > + identifier="string" > + followed by ',' or ) for the last entry*/ > + > + matching_parens parens; > + if (!parens.require_open (parser)) > + return NULL; > + > + bool save_translate_strings_p = parser->translate_strings_p; > + parser->translate_strings_p = false; This could be auto ts = make_temp_override (parser->translate_strings_p, false); and then... > + tree attr_args = NULL_TREE; > + cp_token *token = cp_lexer_peek_token (parser->lexer); > + if (token->type == CPP_NAME > + && cp_lexer_peek_nth_token (parser->lexer, 2)->type == CPP_COMMA) > + { > + tree platf = token->u.value; > + cp_lexer_consume_token (parser->lexer); > + attr_args = tree_cons (NULL_TREE, platf, NULL_TREE); > + } > + else > + { > + error_at (token->location, > + "expected a platform name followed by %<,%>"); > + cp_parser_skip_to_closing_parenthesis (parser, > + /*recovering=*/true, > + /*or_comma=*/false, > + /*consume_paren=*/true); > + parser->translate_strings_p = save_translate_strings_p; you don't need this, and the one below. > + return error_mark_node; > + } > + > + cp_lexer_consume_token (parser->lexer); /* consume the ',' */ > + do > + { > + tree name = NULL_TREE; > + tree value = NULL_TREE; > + > + token = cp_lexer_peek_token (parser->lexer); > + if (token->type == CPP_NAME) > + { > + name = token->u.value; > + cp_lexer_consume_token (parser->lexer); > + } > + else > + { > + /* FIXME: context-sensitive for that attrib. */ > + error_at (token->location, "expected an attribute keyword"); > + cp_parser_skip_to_closing_parenthesis (parser, > + /*recovering=*/true, > + /*or_comma=*/false, > + /*consume_paren=*/true); > + attr_args = error_mark_node; > + break; > + } > + > + if (cp_lexer_next_token_is (parser->lexer, CPP_EQ)) > + { > + cp_lexer_consume_token (parser->lexer); /* eat the '=' */ > + token = cp_lexer_peek_token (parser->lexer); > + if (cp_lexer_next_token_is_not (parser->lexer, CPP_CLOSE_PAREN) > + && cp_lexer_next_token_is_not (parser->lexer, CPP_COMMA)) > + { > + value = token->u.value; > + /* ???: check for error mark and early-return? */ > + cp_lexer_consume_token (parser->lexer); > + } > + else > + { > + error_at (token->location, "expected a value"); > + cp_parser_skip_to_closing_parenthesis (parser, > + /*recovering=*/true, > + /*or_comma=*/false, > + /*consume_paren=*/true); > + attr_args = error_mark_node; > + break; > + } > + } > + else if (cp_lexer_next_token_is_not (parser->lexer, CPP_CLOSE_PAREN) > + && cp_lexer_next_token_is_not (parser->lexer, CPP_COMMA)) > + { > + error_at (token->location, "expected %<,%>, %<=%> or %<)%>"); > + cp_parser_skip_to_closing_parenthesis (parser, > + /*recovering=*/true, > + /*or_comma=*/false, > + /*consume_paren=*/true); > + attr_args = error_mark_node; > + break; > + } > + if (cp_lexer_next_token_is (parser->lexer, CPP_COMMA)) > + cp_lexer_consume_token (parser->lexer); > + tree t = tree_cons (value, name, NULL_TREE); > + attr_args = chainon (attr_args, t); attr_chainon > + } while (cp_lexer_next_token_is_not (parser->lexer, CPP_CLOSE_PAREN)); I think we format do-while as do { } while (cond); > + > + parser->translate_strings_p = save_translate_strings_p; > + if (!parens.require_close (parser)) > + return error_mark_node; > + > + return attr_args; > +} > + > /* Parse a GNU attribute-list. > > attribute-list: > @@ -29361,9 +29576,12 @@ cp_parser_gnu_attribute_list (cp_parser* parser, bool exactly_one /* = false */) > > /* Peek at the next token. */ > token = cp_lexer_peek_token (parser->lexer); > - /* If it's an `(', then parse the attribute arguments. */ > - if (token->type == CPP_OPEN_PAREN) > + if (token->type == CPP_OPEN_PAREN > + && attribute_clang_form_p (identifier)) > + arguments = cp_parser_clang_attribute (parser, identifier); > + else if (token->type == CPP_OPEN_PAREN) I think it'd be nice to eliminate checking token->type == CPP_OPEN_PAREN twice here, and I'd say it's worth it wrapping attribute_clang_form_p (...) in UNLIKELY (), since it should be extremely rare to be true? Marek