public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jakub Jelinek <jakub@redhat.com>
To: Jason Merrill <jason@redhat.com>
Cc: Marek Polacek <polacek@redhat.com>,
	"Joseph S. Myers" <joseph@codesourcery.com>,
	gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] libcpp: Implement C++23 P2290R3 - Delimited escape sequences [PR106645]
Date: Wed, 17 Aug 2022 23:19:14 +0200	[thread overview]
Message-ID: <Yv1bUmjit6zH+Jw0@tucnak> (raw)
In-Reply-To: <d6c2553d-8ed3-b0e3-6d1d-0445103f1a0a@redhat.com>

On Wed, Aug 17, 2022 at 04:47:19PM -0400, Jason Merrill via Gcc-patches wrote:
> > +	  length = 32;
> 
> /* Magic value to indicate no digits seen.  */

Indeed, will add the comment.

> > +	  delimited = true;
> > +	  if (loc_reader)
> > +	    char_range->m_finish = loc_reader->get_next ().m_finish;
> > +	}
> > +    }
> >     else if (str[-1] == 'U')
> >       length = 8;
> >     else
> > @@ -1107,6 +1118,8 @@ _cpp_valid_ucn (cpp_reader *pfile, const
> >     result = 0;
> >     do
> >       {
> > +      if (str == limit)
> > +	break;
> >         c = *str;
> >         if (!ISXDIGIT (c))
> >   	break;
> > @@ -1116,9 +1129,41 @@ _cpp_valid_ucn (cpp_reader *pfile, const
> >   	  gcc_assert (char_range);
> >   	  char_range->m_finish = loc_reader->get_next ().m_finish;
> >   	}
> > +      if (delimited)
> > +	{
> > +	  if (!result)
> > +	    /* Accept arbitrary number of leading zeros.  */
> > +	    length = 16;
> > +	  else if (length == 8)
> > +	    {
> > +	      /* Make sure we detect overflows.  */
> > +	      result |= 0x8000000;
> > +	      ++length;
> > +	    }
> 
> 16 above so that this case happens after we read 8 digits after leading
> zeroes?

Another magic value less than the no digits seen one and
> 8, so that it can count 8 digits with the first non-zero one after
which to or in the overflow flag.  The intent is not to break the loop
if there are further digits, just that there will be overflow.
Another option would be those overflow |= n ^ (n << 4 >> 4);
tests that convert_hex does and just making sure length is never decremented
(except we need a way to distinguish between \u{} and at least one digit).

> > +      if (loc_reader)
> > +	char_range->m_finish = loc_reader->get_next ().m_finish;
> 
> Here and in other functions, the pattern of increment the input pointer and
> update m_finish seems like it should be a macro?

Perhaps or inline function.  Before my patch, there are 5 such ifs
(some with char_range.m_finish and others char_range->m_finish),
the patch adds another 7 such spots.

> > @@ -2119,15 +2255,23 @@ _cpp_interpret_identifier (cpp_reader *p
> >   	cppchar_t value = 0;
> >   	size_t bufleft = len - (bufp - buf);
> >   	int rval;
> > +	bool delimited = false;
> >   	idp += 2;
> > +	if (length == 4 && id[idp] == '{')
> > +	  {
> > +	    delimited = true;
> > +	    idp++;
> > +	  }
> >   	while (length && idp < len && ISXDIGIT (id[idp]))
> >   	  {
> >   	    value = (value << 4) + hex_value (id[idp]);
> >   	    idp++;
> > -	    length--;
> > +	    if (!delimited)
> > +	      length--;
> >   	  }
> > -	idp--;
> > +	if (!delimited)
> > +	  idp--;
> 
> Don't we need to check that the first non-xdigit is a }?

The comments and my understanding of the code say that we first
check what is a valid identifier and the above is only called on
a valid identifier.  So, if it would be delimited \u{ not terminated
with }, then it would fail forms_identifier_p and wouldn't be included
in the range.  Thus e.g. the ISXDIGIT (id[id]) test is probably not needed
unless delimited is true because we've checked earlier that it has 4 or 8
hex digits.
But sure, if you want a id[idp] == '}' test or assertion, it can be
added.

	Jakub


  reply	other threads:[~2022-08-17 21:19 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-17  7:17 Jakub Jelinek
2022-08-17 20:47 ` Jason Merrill
2022-08-17 21:19   ` Jakub Jelinek [this message]
2022-08-18  2:22     ` Jason Merrill
2022-08-18  8:17       ` [PATCH] libcpp, v2: " Jakub Jelinek
2022-08-19  0:34         ` Jason Merrill

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yv1bUmjit6zH+Jw0@tucnak \
    --to=jakub@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jason@redhat.com \
    --cc=joseph@codesourcery.com \
    --cc=polacek@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).