public inbox for gcc@gcc.gnu.org
* The treatment of null characters in C source files
@ 1999-09-05 16:29 Zack Weinberg
From: Zack Weinberg @ 1999-09-05 16:29 UTC (permalink / raw)
  To: gcc

Consider a source file such as

#include <stdio.h>

int main(void)
{
  puts ("hello^@ world");
  return 0;
}

where ^@ is a null character.  cccp passes null characters through to
the output and cc1 accepts them in strings.  All released versions of
gcc will therefore compile this without complaint, producing an
executable that prints "hello".

cpplib used to mangle input files with nulls in them.  The patch I
sent in on Friday (gcc-patches/1999-09/msg00158.html) makes it instead
emit a warning and ignore the null.  The above will produce

test.c:5:15: warning: ignoring ASCII NUL in input

and an executable that prints "hello world".

The question is, is this an acceptable behavior change for the
compiler?  Making cpplib pass through nulls would be extremely
difficult, but someone might have a legitimate use for them.

zw

p.s. This has nothing to do with multibyte support.  I'm fully aware
that non-ASCII character sets may contain zero bytes which are not
null characters.  Currently cpplib supports only ASCII (unibyte strict
supersets such as Latin1 probably work if the extended characters are
confined to strings and comments).

* Re: The treatment of null characters in C source files
@ 1999-09-06 13:46 John Marshall
From: John Marshall @ 1999-09-06 13:46 UTC (permalink / raw)
  To: law; +Cc: gcc

>> But, surely there's no requirement that a C *source file* be allowed to
>> have a null character in it.
> Oh, I misunderstood.  Sorry.  No clue what the standard says here.

Section 5.2.1 (Character sets) requires the basic source character set to
have the usual bunch of alphanumerics and punctuation, and space, HT, VT,
FF, and "some way of indicating the end of each line of text".  Outside
of char and string literals and a few other places, encountering anything
else (e.g. NUL) results in undefined behavior.  Inside a char constant or
string literal:

	[...] members of the execution character set shall be represented
	by corresponding members of the source character set or by escape
	sequences [...]

The next sentence requires there to be a NUL character in the basic
execution character set, but not in the source one.

That section then refers you to 6.1.4 for string literals, which doesn't
really say anything about NULs, although there is an obliquely relevant
footnote:

	A character string literal need not be a string (sec 7.1.1), because
	a null character may be embedded in it by a \0 escape sequence.

So I don't think we're required to understand real NUL characters.
(I didn't look in the C++ standard, though.)

    John  "lawyers 'r' us :-)"

