From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 4486 invoked by alias); 31 Oct 2013 12:46:22 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 4477 invoked by uid 89); 31 Oct 2013 12:46:21 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-4.2 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 31 Oct 2013 12:46:18 +0000 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r9VCkHPV024403 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 31 Oct 2013 08:46:17 -0400 Received: from localhost (ovpn-116-76.ams2.redhat.com [10.36.116.76]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r9VCkFMX021666 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 31 Oct 2013 08:46:16 -0400 Received: by localhost (Postfix, from userid 1000) id 68F4916284D; Thu, 31 Oct 2013 13:46:14 +0100 (CET) From: Dodji Seketeli To: Tom Tromey , Manuel =?utf-8?B?TMOzcGV6LUliw6HDsWV6?= , Jakub Jelinek Cc: GCC Patches Subject: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals X-URL: http://www.redhat.com Date: Thu, 31 Oct 2013 13:45:00 -0000 Message-ID: <871u311ucp.fsf@redhat.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-SW-Source: 2013-10/txt/msg02680.txt.bz2 Hello, In this problem report, the compiler is fed a (bogus) translation unit in which some literals contains bytes whose value is zero. The preprocessor detects that and proceeds to emit diagnostics for that king of bogus literals. But then when the diagnostics machinery re-reads the input file again to display the bogus literals with a caret, it attempts to calculate the length of each of the lines it got using fgets. The line length calculation is done using strlen. But that doesn't work well when the content of the line can have several zero bytes. The result is that the read_line never sees the end of the line because strlen repeatedly reports that the line ends before the end-of-line character; so read_line thinks its buffer for reading the line is too small; it thus increases the buffer, leading to a huge memory consumption, pain and disaster. The patch below introduces a new string_length() function that can return the length of a string contained in a buffer even if the string contains zero bytes; it does so by starting from the end of the buffer and stops when it encounters the first non-null byte; for that to work, the buffer must have been totally zeroed before getting data. read_line is then modified to return the length of the line along with the line itself, as the line can now contain zero bytes. Callers of read_line are adjusted consequently. diagnostic_show_locus() is modified to consider that a line can have characters of value zero, and so just show a white space when instructed to display one of these characters. Tested on x86_64-unknown-linux-gnu against trunk. I realize this is diagnostics code and I am supposed to be a maintainer for it, but I'd appreciate a review for it nonetheless. Thanks. gcc/ChangeLog: * input.h (location_get_source_line): Take an additional line_size parameter by reference. * input.c (string_length): New static function definition. (read_line): Take an additional line_length output parameter to be set to the size of the line. Use the new string_length function to compute the size of the line returned by fgets, rather than using strlen. Ensure that the buffer is initially zeroed; ensure that when growing the buffer too. (location_get_source_line): Take an additional output line_len parameter. Update the use of read_line to pass it the line_len parameter. * diagnostic.c (adjust_line): Take an additional input parameter for the length of the line, rather than calculating it with strlen. (diagnostic_show_locus): Adjust the use of location_get_source_line and adjust_line with respect to their new signature. While displaying a line now, do not stop at the first null byte. Rather, display the zero byte as a space and keep going until we reach the size of the line. gcc/testsuite/ChangeLog: * c-c++-common/cpp/warning-zero-in-literals-1.c: New test file. --- gcc/diagnostic.c | 17 +++--- gcc/input.c | 60 +++++++++++++++++---- gcc/input.h | 3 +- .../c-c++-common/cpp/warning-zero-in-literals-1.c | Bin 0 -> 240 bytes 4 files changed, 62 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c index 36094a1..0ca7081 100644 --- a/gcc/diagnostic.c +++ b/gcc/diagnostic.c @@ -259,12 +259,13 @@ diagnostic_build_prefix (diagnostic_context *context, MAX_WIDTH by some margin, then adjust the start of the line such that the COLUMN is smaller than MAX_WIDTH minus the margin. The margin is either 10 characters or the difference between the column - and the length of the line, whatever is smaller. */ + and the length of the line, whatever is smaller. The length of + LINE is given by LINE_WIDTH. */ static const char * -adjust_line (const char *line, int max_width, int *column_p) +adjust_line (const char *line, int line_width, + int max_width, int *column_p) { int right_margin = 10; - int line_width = strlen (line); int column = *column_p; right_margin = MIN (line_width - column, right_margin); @@ -284,6 +285,7 @@ diagnostic_show_locus (diagnostic_context * context, const diagnostic_info *diagnostic) { const char *line; + int line_width; char *buffer; expanded_location s; int max_width; @@ -297,22 +299,25 @@ diagnostic_show_locus (diagnostic_context * context, context->last_location = diagnostic->location; s = expand_location_to_spelling_point (diagnostic->location); - line = location_get_source_line (s); + line = location_get_source_line (s, line_width); if (line == NULL) return; max_width = context->caret_max_width; - line = adjust_line (line, max_width, &(s.column)); + line = adjust_line (line, line_width, max_width, &(s.column)); pp_newline (context->printer); saved_prefix = pp_get_prefix (context->printer); pp_set_prefix (context->printer, NULL); pp_space (context->printer); - while (max_width > 0 && *line != '\0') + while (max_width > 0 && line_width > 0) { char c = *line == '\t' ? ' ' : *line; + if (c == '\0') + c = ' '; pp_character (context->printer, c); max_width--; + line_width--; line++; } pp_newline (context->printer); diff --git a/gcc/input.c b/gcc/input.c index a141a92..b72183c 100644 --- a/gcc/input.c +++ b/gcc/input.c @@ -87,9 +87,37 @@ expand_location_1 (source_location loc, return xloc; } -/* Reads one line from file into a static buffer. */ +/* Returns the length of a string contained in a buffer that was + initially totally zeroed. The length of the string is the number + of bytes up to the last non-null byte of the buffer. Note that + with this definition, the string can contain bytes which value is + zero. + + BUF is a pointer to the beginning of the buffer containing the + string to consider. + + BUF_SIZE is the size of the buffer containing the string to + consider. The string length is thus at most BUF_SIZE. +*/ +static size_t +string_length (const char* buf, size_t buf_size) +{ + for (int i = buf_size - 1; i > 0; --i) + { + if (buf[i] != 0) + return i + 1; + + if (buf[i - 1] != 0) + return i; + } + return 0; +} + +/* Reads one line from FILE into a static buffer. LINE_LENGTH is set + by this function to the length of the returned line. Note that the + returned line can contain several zero bytes. */ static const char * -read_line (FILE *file) +read_line (FILE *file, int& line_length) { static char *string; static size_t string_len; @@ -99,32 +127,42 @@ read_line (FILE *file) if (!string_len) { string_len = 200; - string = XNEWVEC (char, string_len); + string = XCNEWVEC (char, string_len); } + else + memset (string, 0, string_len); while ((ptr = fgets (string + pos, string_len - pos, file))) { - size_t len = strlen (string + pos); + size_t len = string_length (string + pos, string_len - pos); if (string[pos + len - 1] == '\n') { string[pos + len - 1] = 0; + line_length = len; return string; } pos += len; - string = XRESIZEVEC (char, string, string_len * 2); - string_len *= 2; - } - + size_t string_len2 = string_len * 2; + char *string2 = XCNEWVEC (char, string_len2); + memmove (string2, string, string_len); + XDELETE (string);; + string = string2; + string_len = string_len2; + } + + line_length = pos ? string_len : 0; return pos ? string : NULL; } /* Return the physical source line that corresponds to xloc in a buffer that is statically allocated. The newline is replaced by - the null character. */ + the null character. Note that the line can contain several null + characters, so LINE_LEN contains the actual length of the line. */ const char * -location_get_source_line (expanded_location xloc) +location_get_source_line (expanded_location xloc, + int& line_len) { const char *buffer; int lines = 1; @@ -132,7 +170,7 @@ location_get_source_line (expanded_location xloc) if (!stream) return NULL; - while ((buffer = read_line (stream)) && lines < xloc.line) + while ((buffer = read_line (stream, line_len)) && lines < xloc.line) lines++; fclose (stream); diff --git a/gcc/input.h b/gcc/input.h index 8fdc7b2..79b3a10 100644 --- a/gcc/input.h +++ b/gcc/input.h @@ -37,7 +37,8 @@ extern char builtins_location_check[(BUILTINS_LOCATION < RESERVED_LOCATION_COUNT) ? 1 : -1]; extern expanded_location expand_location (source_location); -extern const char *location_get_source_line (expanded_location xloc); +extern const char *location_get_source_line (expanded_location xloc, + int& line_size); extern expanded_location expand_location_to_spelling_point (source_location); extern source_location expansion_point_location_if_in_system_header (source_location); diff --git a/gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c b/gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c new file mode 100644 index 0000000000000000000000000000000000000000..ff2ed962ac96e47ae05b0b040f4e10b8e09637e2 GIT binary patch literal 240 zcmdPbSEyD