public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [C++/preprocessor Patch] PR c++/53690
@ 2015-07-01 20:30 Paolo Carlini
  2015-07-01 20:45 ` Andreas Schwab
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Carlini @ 2015-07-01 20:30 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jason Merrill, Tom Tromey

[-- Attachment #1: Type: text/plain, Size: 1465 bytes --]

Hi,

this issue isn't a regression but it's getting a lot of attention and 
duplicates. Essentially, due to the lines at the very end of _cpp_valid_ucn:

   if (result == 0)
     result = 1;

   return result;
}

we can't possibly get the encoding of \U00000000 right in C++, that is 
zero, even after Jason's r152614 which added the following special case 
for C++ earlier in the function:

   else if ((result < 0xa0
         && !CPP_OPTION (pfile, cplusplus)
         && (result != 0x24 && result != 0x40 && result != 0x60))
        || (result & 0x80000000)
        || (result >= 0xD800 && result <= 0xDFFF))
     {
       cpp_error (pfile, CPP_DL_ERROR,
          "%.*s is not a valid universal character",
          (int) (str - base), base);
       result = 1;
     }

thus letting result == 0 thru until the very end of the function. Now, 
fixing the problem seems easy: just change the return type of the 
function to int, thus return zero for success; add a cppchar_t* 
parameter cp where *cp is computed exactly like the current return value 
*without* the final reset to one. Note that this is supposed to have 
*no* effects whatsoever outside C++ because currently, for all those 
languages, the final reset never triggers: in the second conditional 
above, a result == 0 triggers a cpp_error and result becomes == 1, thus 
the final conditional never sees result == 0.

Thus I'm finishing testing the below...

Thanks!
Paolo.

////////////////////////

[-- Attachment #2: CL_53690 --]
[-- Type: text/plain, Size: 450 bytes --]

/libcpp
2015-07-01  Paolo Carlini  <paolo.carlini@oracle.com>

	PR c++/53690
	* charset.c (_cpp_valid_ucn): Add cppchar_t * parameter and change
	return type to int.  Fix encoding of \u0000 and \U00000000 in C++.
	(convert_ucn): Adjust call.
	* lex.c (forms_identifier_p): Likewise.
	* internal.h (_cpp_valid_ucn): Adjust declaration.

/gcc/testsuite
2015-07-01  Paolo Carlini  <paolo.carlini@oracle.com>

	PR c++/53690
	* g++.dg/cpp/pr53690.C: New.

[-- Attachment #3: patch_53690 --]
[-- Type: text/plain, Size: 4164 bytes --]

Index: gcc/testsuite/g++.dg/cpp/pr53690.C
===================================================================
--- gcc/testsuite/g++.dg/cpp/pr53690.C	(revision 0)
+++ gcc/testsuite/g++.dg/cpp/pr53690.C	(working copy)
@@ -0,0 +1,7 @@
+// PR c++/53690
+// { dg-do compile { target c++11 } }
+
+int array1[U'\U00000000' == 0 ? 1 : -1];
+int array2[U'\u0000' == 0 ? 1 : -1];
+int array3[u'\U00000000' == 0 ? 1 : -1];
+int array4[u'\u0000' == 0 ? 1 : -1];
Index: libcpp/charset.c
===================================================================
--- libcpp/charset.c	(revision 225265)
+++ libcpp/charset.c	(working copy)
@@ -972,21 +972,20 @@ ucn_valid_in_identifier (cpp_reader *pfile, cppcha
    or 0060 (`), nor one in the range D800 through DFFF inclusive.
 
    *PSTR must be preceded by "\u" or "\U"; it is assumed that the
-   buffer end is delimited by a non-hex digit.  Returns zero if the
-   UCN has not been consumed.
+   buffer end is delimited by a non-hex digit.  Returns one if the
+   UCN has not been consumed, zero otherwise.
 
-   Otherwise the nonzero value of the UCN, whether valid or invalid,
-   is returned.  Diagnostics are emitted for invalid values.  PSTR
-   is updated to point one beyond the UCN, or to the syntactically
-   invalid character.
+   The value of the UCN, whether valid or invalid, is returned in *CP.
+   Diagnostics are emitted for invalid values.  PSTR is updated to point
+   one beyond the UCN, or to the syntactically invalid character.
 
    IDENTIFIER_POS is 0 when not in an identifier, 1 for the start of
    an identifier, or 2 otherwise.  */
 
-cppchar_t
+int
 _cpp_valid_ucn (cpp_reader *pfile, const uchar **pstr,
 		const uchar *limit, int identifier_pos,
-		struct normalize_state *nst)
+		struct normalize_state *nst, cppchar_t *cp)
 {
   cppchar_t result, c;
   unsigned int length;
@@ -1030,8 +1029,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const uchar **p
      multiple tokens in identifiers, so we can't give a helpful
      error message in that case.  */
   if (length && identifier_pos)
-    return 0;
-  
+    {
+      *cp = 0;
+      return 1;
+    }
+
   *pstr = str;
   if (length)
     {
@@ -1079,10 +1081,8 @@ _cpp_valid_ucn (cpp_reader *pfile, const uchar **p
 		   (int) (str - base), base);
     }
 
-  if (result == 0)
-    result = 1;
-
-  return result;
+  *cp = result;
+  return 0;
 }
 
 /* Convert an UCN, pointed to by FROM, to UTF-8 encoding, then translate
@@ -1100,7 +1100,7 @@ convert_ucn (cpp_reader *pfile, const uchar *from,
   struct normalize_state nst = INITIAL_NORMALIZE_STATE;
 
   from++;  /* Skip u/U.  */
-  ucn = _cpp_valid_ucn (pfile, &from, limit, 0, &nst);
+  _cpp_valid_ucn (pfile, &from, limit, 0, &nst, &ucn);
 
   rval = one_cppchar_to_utf8 (ucn, &bufp, &bytesleft);
   if (rval)
Index: libcpp/internal.h
===================================================================
--- libcpp/internal.h	(revision 225265)
+++ libcpp/internal.h	(working copy)
@@ -744,9 +744,10 @@ struct normalize_state
 #define NORMALIZE_STATE_UPDATE_IDNUM(st, c)	\
   ((st)->previous = (c), (st)->prev_class = 0)
 
-extern cppchar_t _cpp_valid_ucn (cpp_reader *, const unsigned char **,
-				 const unsigned char *, int,
-				 struct normalize_state *state);
+extern int _cpp_valid_ucn (cpp_reader *, const unsigned char **,
+			   const unsigned char *, int,
+			   struct normalize_state *state,
+			   cppchar_t *);
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
 					  unsigned char *, size_t, size_t,
Index: libcpp/lex.c
===================================================================
--- libcpp/lex.c	(revision 225265)
+++ libcpp/lex.c	(working copy)
@@ -1244,9 +1244,10 @@ forms_identifier_p (cpp_reader *pfile, int first,
       && *buffer->cur == '\\'
       && (buffer->cur[1] == 'u' || buffer->cur[1] == 'U'))
     {
+      cppchar_t s;
       buffer->cur += 2;
-      if (_cpp_valid_ucn (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
-			  state))
+      if (!_cpp_valid_ucn (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
+			   state, &s))
 	return true;
       buffer->cur -= 2;
     }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [C++/preprocessor Patch] PR c++/53690
  2015-07-01 20:30 [C++/preprocessor Patch] PR c++/53690 Paolo Carlini
@ 2015-07-01 20:45 ` Andreas Schwab
  2015-07-01 20:54   ` Paolo Carlini
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Schwab @ 2015-07-01 20:45 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: gcc-patches, Jason Merrill, Tom Tromey

Paolo Carlini <paolo.carlini@oracle.com> writes:

> @@ -972,21 +972,20 @@ ucn_valid_in_identifier (cpp_reader *pfile, cppcha
>     or 0060 (`), nor one in the range D800 through DFFF inclusive.
>  
>     *PSTR must be preceded by "\u" or "\U"; it is assumed that the
> -   buffer end is delimited by a non-hex digit.  Returns zero if the
> -   UCN has not been consumed.
> +   buffer end is delimited by a non-hex digit.  Returns one if the
> +   UCN has not been consumed, zero otherwise.

The name of the function would make more sense if it returned a boolean
with true meaning success.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [C++/preprocessor Patch] PR c++/53690
  2015-07-01 20:45 ` Andreas Schwab
@ 2015-07-01 20:54   ` Paolo Carlini
  2015-07-01 21:14     ` Paolo Carlini
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Carlini @ 2015-07-01 20:54 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: gcc-patches, Jason Merrill, Tom Tromey, tom

Hi,

On 07/01/2015 10:45 PM, Andreas Schwab wrote:
> Paolo Carlini <paolo.carlini@oracle.com> writes:
>
>> @@ -972,21 +972,20 @@ ucn_valid_in_identifier (cpp_reader *pfile, cppcha
>>      or 0060 (`), nor one in the range D800 through DFFF inclusive.
>>   
>>      *PSTR must be preceded by "\u" or "\U"; it is assumed that the
>> -   buffer end is delimited by a non-hex digit.  Returns zero if the
>> -   UCN has not been consumed.
>> +   buffer end is delimited by a non-hex digit.  Returns one if the
>> +   UCN has not been consumed, zero otherwise.
> The name of the function would make more sense if it returned a boolean
> with true meaning success.
To be clear (the diff can be a little confusing) we are talking about 
_cpp_valid_ucn, the name of the function I'm patching. That said, I'm of 
course open to proposals about a better name for the patched function. 
Likewise for the values returned, which I picked for consistency with 
all the other functions in libcpp, the usual "unix" convention.

Paolo.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [C++/preprocessor Patch] PR c++/53690
  2015-07-01 20:54   ` Paolo Carlini
@ 2015-07-01 21:14     ` Paolo Carlini
  2015-07-02 14:33       ` Paolo Carlini
  2015-07-02 16:22       ` Jason Merrill
  0 siblings, 2 replies; 6+ messages in thread
From: Paolo Carlini @ 2015-07-01 21:14 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: gcc-patches, Jason Merrill, Tom Tromey, tom

[-- Attachment #1: Type: text/plain, Size: 1352 bytes --]

Hi again,

On 07/01/2015 10:54 PM, Paolo Carlini wrote:
> Hi,
>
> On 07/01/2015 10:45 PM, Andreas Schwab wrote:
>> Paolo Carlini <paolo.carlini@oracle.com> writes:
>>
>>> @@ -972,21 +972,20 @@ ucn_valid_in_identifier (cpp_reader *pfile, 
>>> cppcha
>>>      or 0060 (`), nor one in the range D800 through DFFF inclusive.
>>>        *PSTR must be preceded by "\u" or "\U"; it is assumed that the
>>> -   buffer end is delimited by a non-hex digit.  Returns zero if the
>>> -   UCN has not been consumed.
>>> +   buffer end is delimited by a non-hex digit.  Returns one if the
>>> +   UCN has not been consumed, zero otherwise.
>> The name of the function would make more sense if it returned a boolean
>> with true meaning success.
> To be clear (the diff can be a little confusing) we are talking about 
> _cpp_valid_ucn, the name of the function I'm patching. That said, I'm 
> of course open to proposals about a better name for the patched 
> function. Likewise for the values returned, which I picked for 
> consistency with all the other functions in libcpp, the usual "unix" 
> convention.
I stand corrected: in fact we are already using a mix of bool and int 
return types in those functions. Thus I'm also testing the below 
version, which simply changes the return type to bool with true meaning 
success.

Thanks!
Paolo.

///////////////////

[-- Attachment #2: CL_53690_2 --]
[-- Type: text/plain, Size: 451 bytes --]

/libcpp
2015-07-01  Paolo Carlini  <paolo.carlini@oracle.com>

	PR c++/53690
	* charset.c (_cpp_valid_ucn): Add cppchar_t * parameter and change
	return type to bool.  Fix encoding of \u0000 and \U00000000 in C++.
	(convert_ucn): Adjust call.
	* lex.c (forms_identifier_p): Likewise.
	* internal.h (_cpp_valid_ucn): Adjust declaration.

/gcc/testsuite
2015-07-01  Paolo Carlini  <paolo.carlini@oracle.com>

	PR c++/53690
	* g++.dg/cpp/pr53690.C: New.

[-- Attachment #3: patch_53690_2 --]
[-- Type: text/plain, Size: 4100 bytes --]

Index: gcc/testsuite/g++.dg/cpp/pr53690.C
===================================================================
--- gcc/testsuite/g++.dg/cpp/pr53690.C	(revision 0)
+++ gcc/testsuite/g++.dg/cpp/pr53690.C	(working copy)
@@ -0,0 +1,7 @@
+// PR c++/53690
+// { dg-do compile { target c++11 } }
+
+int array1[U'\U00000000' == 0 ? 1 : -1];
+int array2[U'\u0000' == 0 ? 1 : -1];
+int array3[u'\U00000000' == 0 ? 1 : -1];
+int array4[u'\u0000' == 0 ? 1 : -1];
Index: libcpp/charset.c
===================================================================
--- libcpp/charset.c	(revision 225275)
+++ libcpp/charset.c	(working copy)
@@ -972,21 +972,20 @@ ucn_valid_in_identifier (cpp_reader *pfile, cppcha
    or 0060 (`), nor one in the range D800 through DFFF inclusive.
 
    *PSTR must be preceded by "\u" or "\U"; it is assumed that the
-   buffer end is delimited by a non-hex digit.  Returns zero if the
-   UCN has not been consumed.
+   buffer end is delimited by a non-hex digit.  Returns false if the
+   UCN has not been consumed, true otherwise.
 
-   Otherwise the nonzero value of the UCN, whether valid or invalid,
-   is returned.  Diagnostics are emitted for invalid values.  PSTR
-   is updated to point one beyond the UCN, or to the syntactically
-   invalid character.
+   The value of the UCN, whether valid or invalid, is returned in *CP.
+   Diagnostics are emitted for invalid values.  PSTR is updated to point
+   one beyond the UCN, or to the syntactically invalid character.
 
    IDENTIFIER_POS is 0 when not in an identifier, 1 for the start of
    an identifier, or 2 otherwise.  */
 
-cppchar_t
+bool
 _cpp_valid_ucn (cpp_reader *pfile, const uchar **pstr,
 		const uchar *limit, int identifier_pos,
-		struct normalize_state *nst)
+		struct normalize_state *nst, cppchar_t *cp)
 {
   cppchar_t result, c;
   unsigned int length;
@@ -1030,8 +1029,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const uchar **p
      multiple tokens in identifiers, so we can't give a helpful
      error message in that case.  */
   if (length && identifier_pos)
-    return 0;
-  
+    {
+      *cp = 0;
+      return false;
+    }
+
   *pstr = str;
   if (length)
     {
@@ -1079,10 +1081,8 @@ _cpp_valid_ucn (cpp_reader *pfile, const uchar **p
 		   (int) (str - base), base);
     }
 
-  if (result == 0)
-    result = 1;
-
-  return result;
+  *cp = result;
+  return true;
 }
 
 /* Convert an UCN, pointed to by FROM, to UTF-8 encoding, then translate
@@ -1100,7 +1100,7 @@ convert_ucn (cpp_reader *pfile, const uchar *from,
   struct normalize_state nst = INITIAL_NORMALIZE_STATE;
 
   from++;  /* Skip u/U.  */
-  ucn = _cpp_valid_ucn (pfile, &from, limit, 0, &nst);
+  _cpp_valid_ucn (pfile, &from, limit, 0, &nst, &ucn);
 
   rval = one_cppchar_to_utf8 (ucn, &bufp, &bytesleft);
   if (rval)
Index: libcpp/internal.h
===================================================================
--- libcpp/internal.h	(revision 225275)
+++ libcpp/internal.h	(working copy)
@@ -744,9 +744,10 @@ struct normalize_state
 #define NORMALIZE_STATE_UPDATE_IDNUM(st, c)	\
   ((st)->previous = (c), (st)->prev_class = 0)
 
-extern cppchar_t _cpp_valid_ucn (cpp_reader *, const unsigned char **,
-				 const unsigned char *, int,
-				 struct normalize_state *state);
+extern bool _cpp_valid_ucn (cpp_reader *, const unsigned char **,
+			    const unsigned char *, int,
+			    struct normalize_state *state,
+			    cppchar_t *);
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
 					  unsigned char *, size_t, size_t,
Index: libcpp/lex.c
===================================================================
--- libcpp/lex.c	(revision 225275)
+++ libcpp/lex.c	(working copy)
@@ -1244,9 +1244,10 @@ forms_identifier_p (cpp_reader *pfile, int first,
       && *buffer->cur == '\\'
       && (buffer->cur[1] == 'u' || buffer->cur[1] == 'U'))
     {
+      cppchar_t s;
       buffer->cur += 2;
       if (_cpp_valid_ucn (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
-			  state))
+			  state, &s))
 	return true;
       buffer->cur -= 2;
     }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [C++/preprocessor Patch] PR c++/53690
  2015-07-01 21:14     ` Paolo Carlini
@ 2015-07-02 14:33       ` Paolo Carlini
  2015-07-02 16:22       ` Jason Merrill
  1 sibling, 0 replies; 6+ messages in thread
From: Paolo Carlini @ 2015-07-02 14:33 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: gcc-patches, Jason Merrill, tom

On 07/01/2015 11:14 PM, Paolo Carlini wrote:
> I stand corrected: in fact we are already using a mix of bool and int 
> return types in those functions. Thus I'm also testing the below 
> version, which simply changes the return type to bool with true 
> meaning success.
Testing went Ok.

Paolo.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [C++/preprocessor Patch] PR c++/53690
  2015-07-01 21:14     ` Paolo Carlini
  2015-07-02 14:33       ` Paolo Carlini
@ 2015-07-02 16:22       ` Jason Merrill
  1 sibling, 0 replies; 6+ messages in thread
From: Jason Merrill @ 2015-07-02 16:22 UTC (permalink / raw)
  To: Paolo Carlini, Andreas Schwab; +Cc: gcc-patches, tom

OK.

Jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-07-02 16:22 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-01 20:30 [C++/preprocessor Patch] PR c++/53690 Paolo Carlini
2015-07-01 20:45 ` Andreas Schwab
2015-07-01 20:54   ` Paolo Carlini
2015-07-01 21:14     ` Paolo Carlini
2015-07-02 14:33       ` Paolo Carlini
2015-07-02 16:22       ` Jason Merrill

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).