From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=LBYO=G3=redhat.com=polacek@sourceware.org>
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124])
	by sourceware.org (Postfix) with ESMTPS id 5714B3858D35
	for <gcc-patches@gcc.gnu.org>; Tue, 14 Nov 2023 21:03:33 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5714B3858D35
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 5714B3858D35
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.133.124
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699995815; cv=none;
	b=hkWrQD4jmVkK+gnPVUA7YS3WSKdKs++ZcAFPlOxsy1r9ZiF3N744oNHhs+WNPoQLZo811PypPs4MkfllmXOz9XcKMovZgB80EMDGNjIexlNR1Ip73TkjOLODn2/7zHuZyF4YiEVjSk/2zb2P1Xki0EpEdDw54QUU6ixBZJ2ibcs=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1699995815; c=relaxed/simple;
	bh=MLW1OTKbOKDYwYPFRHKLTeeD11RTXYlEmIlsFCFJ6D4=;
	h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=pxt5HvbouGWm+dBPdPue+R12BxSunMmTFBPVVPYgiEjFNzZlQPz5ctA9MsfXgsIa9qeJIoHxXfQhq8+rQRBZoqNDVSAeiDOtd4On0eFRQCxHxt5VW3AyhPADIk4dHATgr4rKLWcVck2fe3DCTmlgDbn+jvFSEQk/TkWfP1pHUag=
ARC-Authentication-Results: i=1; server2.sourceware.org
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1699995813;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=ZW681qrZlCmGZO+82tGM4Zi/CLcr00sBT6IlyaMFbSM=;
	b=gWTUzLtkfTUbjuBj0Lg0zhfOFQhbIJ13Ch5yKeM7XCLOVnZ4wjUS2LYaP2WILkl7+PwBVW
	NT+PMx4Eni72mPkNb88hmfSvFJDWUixJ8ZCIPVlAeZfti834mDtct4Njdnqm21o3B6SIzB
	tz4peKgxfDXR+yxoalZNTTnc/GFtcB0=
Received: from mail-qv1-f72.google.com (mail-qv1-f72.google.com
 [209.85.219.72]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id
 us-mta-448-1m-EM7TxMZ6wpb-2r7xvNw-1; Tue, 14 Nov 2023 16:03:29 -0500
X-MC-Unique: 1m-EM7TxMZ6wpb-2r7xvNw-1
Received: by mail-qv1-f72.google.com with SMTP id 6a1803df08f44-66d91b47f23so61890906d6.2
        for <gcc-patches@gcc.gnu.org>; Tue, 14 Nov 2023 13:03:29 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1699995809; x=1700600609;
        h=user-agent:in-reply-to:content-disposition:mime-version:references
         :message-id:subject:cc:to:from:date:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=ZW681qrZlCmGZO+82tGM4Zi/CLcr00sBT6IlyaMFbSM=;
        b=cXUGHC9T6ZP/FzwZv14O3gVumMSI0pnJVgkjNPgWvqlgVyANbSN2wTmTmPMN9BkvVo
         /DB2CGD5ELOXHaMEZYwY7Kl0taVKhaZH/GLEjA4NzIO3BVkxZhw/rXUl/lzOfD91m4Ow
         pNXDWCjVSXfgiXMAPP/i3DwXBP1SZZjGVvbkMmJ5VxCVRZVlH1NfNggr9L5tjowOcmUO
         6Jv1Na5hiZ84Z5z1ZD5iszTzCvV4D1tnQicZgAAC7HTDD9e/v7ruGhm4eim0PiEDruFC
         DA2B5jPCEj3ICbMCv0fewshLEKu2zjz03OcsDVdZyTFY9HTjXIT3PRvQyjDJRYqk81fC
         Rm0Q==
X-Gm-Message-State: AOJu0YxcwGQFBE3ydQ/EDHfGuG7GRIfGEwtuwebRCnjYziEm5vbKCcGK
	yp51D0FHaHsHlzSVA+kn5r9QUSd6U5Uz5k3j6kBx8bpJwXOUZG+AW1ZPwq3BYXzbNtJzjz6ycWv
	cSSz30ARq1xjkGJQA7w==
X-Received: by 2002:ad4:40cb:0:b0:66d:a741:d50e with SMTP id x11-20020ad440cb000000b0066da741d50emr3797932qvp.21.1699995808878;
        Tue, 14 Nov 2023 13:03:28 -0800 (PST)
X-Google-Smtp-Source: AGHT+IGKoKGuZ0ORk0wPD1WTfBPHG7eIvZEsyOxX5htdUE6kxadLVQNHnGR0yAxAb9yqsLxf+0Em0Q==
X-Received: by 2002:ad4:40cb:0:b0:66d:a741:d50e with SMTP id x11-20020ad440cb000000b0066da741d50emr3797908qvp.21.1699995808503;
        Tue, 14 Nov 2023 13:03:28 -0800 (PST)
Received: from redhat.com (2603-7000-9500-34a5-0000-0000-0000-1db4.res6.spectrum.com. [2603:7000:9500:34a5::1db4])
        by smtp.gmail.com with ESMTPSA id i2-20020a0cd842000000b0066d1f118b7esm3195489qvj.1.2023.11.14.13.03.27
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 14 Nov 2023 13:03:28 -0800 (PST)
Date: Tue, 14 Nov 2023 16:03:26 -0500
From: Marek Polacek <polacek@redhat.com>
To: iain@sandoe.co.uk
Cc: gcc-patches@gcc.gnu.org, joseph@codesourcery.com, jason@redhat.com
Subject: Re: [PATCH 3/4] c-family, C++: Handle clang attributes [PR109877].
Message-ID: <ZVPgnjX1jXup8txe@redhat.com>
References: <20231113060244.90554-1-iain@sandoe.co.uk>
 <20231113060244.90554-2-iain@sandoe.co.uk>
 <20231113060244.90554-3-iain@sandoe.co.uk>
 <20231113060244.90554-4-iain@sandoe.co.uk>
MIME-Version: 1.0
In-Reply-To: <20231113060244.90554-4-iain@sandoe.co.uk>
User-Agent: Mutt/2.2.9 (2022-11-12)
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Sun, Nov 12, 2023 at 08:02:43PM -1000, Iain Sandoe wrote:
> This adds the ability to defer the validation of numeric attribute
> arguments until the sequence is parsed if the attribute being
> handled is one known to be 'clang form'.
> 
> We do this by considering the arguments to be strings regardless
> of content and defer the interpretation of those strings until the
> argument processing.
> 
> Since the C++ front end lexes tokens separately (and would report
> non-compliant numeric tokens before we can intercept them), we need
> to implement a small state machine to track the lexing of attributes.
> 
> 	PR c++/109877
> 
> gcc/cp/ChangeLog:
> 
> 	* parser.cc (enum clang_attr_state): New.
> 	(cp_lexer_attribute_state): New.
> 	(cp_lexer_new_main): Set initial attribute lexing state.
> 	(cp_lexer_get_preprocessor_token): Handle lexing of clang-
> 	form attributes.
> 	(cp_parser_clang_attribute): Handle clang-form attributes.
> 	(cp_parser_gnu_attribute_list): Switch to clang-form parsing
> 	where needed.
> 	* parser.h : New flag to signal lexing clang-form attributes.
> 
> Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
> ---
>  gcc/cp/parser.cc | 230 +++++++++++++++++++++++++++++++++++++++++++++--
>  gcc/cp/parser.h  |   3 +
>  2 files changed, 227 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index 5116bcb78f6..c12f473f2e3 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -699,6 +699,91 @@ cp_lexer_handle_early_pragma (cp_lexer *lexer)
>  static cp_parser *cp_parser_new (cp_lexer *);
>  static GTY (()) cp_parser *the_parser;
>  
> +/* Context-sensitive parse-checking for clang-style attributes.  */
> +
> +enum clang_attr_state {
> +  CA_NONE = 0,
> +  CA_ATTR,
> +  CA_BR1, CA_BR2,
> +  CA_LIST,
> +  CA_LIST_ARGS,
> +  CA_IS_CA,
> +  CA_CA_ARGS,
> +  CA_LIST_CONT
> +};
> +
> +/* State machine tracking context of attribute lexing.  */
> +
> +static enum clang_attr_state
> +cp_lexer_attribute_state (cp_token& token, enum clang_attr_state attr_state)
> +{
> +  /* Implement a context-sensitive parser for clang attributes.
> +     We detect __attribute__((clang_style_attribute (ARGS))) and lex the
> +     args ARGS with the following differences from GNU attributes:
> +	(a) number-like values are lexed as strings [this allows lexing XX.YY.ZZ
> +	   version numbers].
> +	(b) we concatenate strings, since clang attributes allow this too.  */
> +  switch (attr_state)
> +    {
> +    case CA_NONE:
> +      if (token.type == CPP_KEYWORD
> +	  && token.keyword == RID_ATTRIBUTE)
> +	attr_state = CA_ATTR;
> +      break;
> +    case CA_ATTR:
> +      if (token.type == CPP_OPEN_PAREN)
> +	attr_state = CA_BR1;
> +      else
> +	attr_state = CA_NONE;
> +      break;
> +    case CA_BR1:
> +      if (token.type == CPP_OPEN_PAREN)
> +	attr_state = CA_BR2;
> +      else
> +	attr_state = CA_NONE;
> +      break;
> +    case CA_BR2:
> +      if (token.type == CPP_NAME)
> +	{
> +	  tree identifier = (token.type == CPP_KEYWORD)
> +	    /* For keywords, use the canonical spelling, not the
> +	       parsed identifier.  */
> +	    ? ridpointers[(int) token.keyword]
> +	    : token.u.value;
> +	  identifier = canonicalize_attr_name (identifier);
> +	  if (attribute_clang_form_p (identifier))
> +	    attr_state = CA_IS_CA;
> +	  else
> +	    attr_state = CA_LIST;
> +	}
> +      else
> +	attr_state = CA_NONE;
> +      break;
> +    case CA_IS_CA:
> +    case CA_LIST:
> +      if (token.type == CPP_COMMA)
> +	attr_state = CA_BR2; /* Back to the list outer.  */
> +      else if (token.type == CPP_OPEN_PAREN)
> +	attr_state = attr_state == CA_IS_CA ? CA_CA_ARGS
> +					    : CA_LIST_ARGS;
> +      else
> +	attr_state = CA_NONE;
> +      break;
> +    case CA_CA_ARGS: /* We will special-case args in this state.  */
> +    case CA_LIST_ARGS:
> +      if (token.type == CPP_CLOSE_PAREN)
> +	attr_state = CA_LIST_CONT;
> +      break;
> +    case CA_LIST_CONT:
> +      if (token.type == CPP_COMMA)
> +	attr_state = CA_BR2; /* Back to the list outer.  */
> +      else
> +	attr_state = CA_NONE;
> +      break;
> +    }
> +  return attr_state;
> +}
> +
>  /* Create a new main C++ lexer, the lexer that gets tokens from the
>     preprocessor, and also create the main parser.  */
>  
> @@ -715,6 +800,9 @@ cp_lexer_new_main (void)
>    c_common_no_more_pch ();
>  
>    cp_lexer *lexer = cp_lexer_alloc ();
> +  lexer->lex_number_as_string_p = false;
> +  enum clang_attr_state attr_state = CA_NONE;
> +
>    /* Put the first token in the buffer.  */
>    cp_token *tok = lexer->buffer->quick_push (token);
>  
> @@ -738,8 +826,20 @@ cp_lexer_new_main (void)
>        if (tok->type == CPP_PRAGMA_EOL)
>  	cp_lexer_handle_early_pragma (lexer);
>  
> +      attr_state = cp_lexer_attribute_state (*tok, attr_state);
>        tok = vec_safe_push (lexer->buffer, cp_token ());
> -      cp_lexer_get_preprocessor_token (C_LEX_STRING_NO_JOIN, tok);
> +      unsigned int flags = C_LEX_STRING_NO_JOIN;
> +      if (attr_state == CA_CA_ARGS)
> +	{
> +	  /* We are handling a clang attribute;
> +	     1. do not try to diagnose x.y.z as a bad number, it could be a
> +		version.
> +	     2. join strings.  */
> +          flags = C_LEX_NUMBER_AS_STRING;
> +	  lexer->lex_number_as_string_p = true;
> +	}
> +      cp_lexer_get_preprocessor_token (flags, tok);
> +      lexer->lex_number_as_string_p = false;
>      }
>  
>    lexer->next_token = lexer->buffer->address ();
> @@ -956,7 +1056,15 @@ cp_lexer_get_preprocessor_token (unsigned flags, cp_token *token)
>  {
>    static int is_extern_c = 0;
>  
> -   /* Get a new token from the preprocessor.  */
> +   /* Get a new token from the preprocessor.
> +  int lex_flags = 0;
> +  if (lexer != NULL)
> +    {
> +      lex_flags = C_LEX_STRING_NO_JOIN;
> +      if (lexer->lex_number_as_string_p)
> +	lex_flags |= C_LEX_NUMBER_AS_STRING;
> +    }
> +  */

Why is this commented out?

>    token->type
>      = c_lex_with_flags (&token->u.value, &token->location, &token->flags,
>  			flags);
> @@ -29302,6 +29410,113 @@ cp_parser_gnu_attributes_opt (cp_parser* parser)
>    return attributes;
>  }
>  
> +/* Parse the arguments list for a clang attribute.   */
> +static tree
> +cp_parser_clang_attribute (cp_parser *parser, tree/*attr_id*/)
> +{
> +  /* Each entry can be :
> +     identifier
> +     identifier=N.MM.Z
> +     identifier="string"
> +     followed by ',' or ) for the last entry*/
> +
> +  matching_parens parens;
> +  if (!parens.require_open (parser))
> +    return NULL;
> +
> +  bool save_translate_strings_p = parser->translate_strings_p;
> +  parser->translate_strings_p = false;

This could be

  auto ts = make_temp_override (parser->translate_strings_p, false);

and then...

> +  tree attr_args = NULL_TREE;
> +  cp_token *token = cp_lexer_peek_token (parser->lexer);
> +  if (token->type == CPP_NAME
> +      && cp_lexer_peek_nth_token (parser->lexer, 2)->type == CPP_COMMA)
> +    {
> +      tree platf = token->u.value;
> +      cp_lexer_consume_token (parser->lexer);
> +      attr_args = tree_cons (NULL_TREE, platf, NULL_TREE);
> +    }
> +  else
> +    {
> +      error_at (token->location,
> +		"expected a platform name followed by %<,%>");
> +      cp_parser_skip_to_closing_parenthesis (parser,
> +					     /*recovering=*/true,
> +					     /*or_comma=*/false,
> +					     /*consume_paren=*/true);
> +      parser->translate_strings_p = save_translate_strings_p;

you don't need this, and the one below.

> +      return error_mark_node;
> +    }
> +
> +  cp_lexer_consume_token (parser->lexer); /* consume the ',' */
> +  do
> +    {
> +      tree name = NULL_TREE;
> +      tree value = NULL_TREE;
> +
> +      token = cp_lexer_peek_token (parser->lexer);
> +      if (token->type == CPP_NAME)
> +	{
> +	  name = token->u.value;
> +	  cp_lexer_consume_token (parser->lexer);
> +	}
> +      else
> +	{
> +	  /* FIXME: context-sensitive for that attrib.  */
> +	  error_at (token->location, "expected an attribute keyword");
> +	  cp_parser_skip_to_closing_parenthesis (parser,
> +						 /*recovering=*/true,
> +						 /*or_comma=*/false,
> +						 /*consume_paren=*/true);
> +	  attr_args = error_mark_node;
> +	  break;
> +	}
> +
> +      if (cp_lexer_next_token_is (parser->lexer, CPP_EQ))
> +	{
> +	  cp_lexer_consume_token (parser->lexer); /* eat the '=' */
> +	  token = cp_lexer_peek_token (parser->lexer);
> +	  if (cp_lexer_next_token_is_not (parser->lexer, CPP_CLOSE_PAREN)
> +	      && cp_lexer_next_token_is_not (parser->lexer, CPP_COMMA))
> +	    {
> +	      value = token->u.value;
> +	      /* ???: check for error mark and early-return?  */
> +	      cp_lexer_consume_token (parser->lexer);
> +	    }
> +	  else
> +	    {
> +	      error_at (token->location, "expected a value");
> +	      cp_parser_skip_to_closing_parenthesis (parser,
> +						     /*recovering=*/true,
> +						     /*or_comma=*/false,
> +						     /*consume_paren=*/true);
> +	      attr_args = error_mark_node;
> +	      break;
> +	    }
> +	}
> +      else if (cp_lexer_next_token_is_not (parser->lexer, CPP_CLOSE_PAREN)
> +	       && cp_lexer_next_token_is_not (parser->lexer, CPP_COMMA))
> +	{
> +	  error_at (token->location, "expected %<,%>, %<=%> or %<)%>");
> +	  cp_parser_skip_to_closing_parenthesis (parser,
> +						 /*recovering=*/true,
> +						 /*or_comma=*/false,
> +						 /*consume_paren=*/true);
> +	  attr_args = error_mark_node;
> +	  break;
> +	}
> +    if (cp_lexer_next_token_is (parser->lexer, CPP_COMMA))
> +      cp_lexer_consume_token (parser->lexer);
> +    tree t = tree_cons (value, name, NULL_TREE);
> +    attr_args = chainon (attr_args, t);

attr_chainon

> +  } while (cp_lexer_next_token_is_not (parser->lexer, CPP_CLOSE_PAREN));

I think we format do-while as

  do
    {
    }
  while (cond);

> +
> +  parser->translate_strings_p = save_translate_strings_p;
> +  if (!parens.require_close (parser))
> +    return error_mark_node;
> +
> +  return attr_args;
> +}
> +
>  /* Parse a GNU attribute-list.
>  
>     attribute-list:
> @@ -29361,9 +29576,12 @@ cp_parser_gnu_attribute_list (cp_parser* parser, bool exactly_one /* = false */)
>  
>  	  /* Peek at the next token.  */
>  	  token = cp_lexer_peek_token (parser->lexer);
> -	  /* If it's an `(', then parse the attribute arguments.  */
> -	  if (token->type == CPP_OPEN_PAREN)
> +	  if (token->type == CPP_OPEN_PAREN
> +	      && attribute_clang_form_p (identifier))
> +	    arguments = cp_parser_clang_attribute (parser, identifier);
> +	  else if (token->type == CPP_OPEN_PAREN)

I think it'd be nice to eliminate checking token->type == CPP_OPEN_PAREN
twice here, and I'd say it's worth it wrapping attribute_clang_form_p (...)
in UNLIKELY (), since it should be extremely rare to be true?

Marek