From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
Date: Thu, 31 Oct 2013 18:26:00 -0000
From: Jakub Jelinek
To: Dodji Seketeli
Cc: Bernd Edlinger, "gcc-patches@gcc.gnu.org"
Subject: Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals
Message-ID: <20131031173649.GW27813@tucnak.zalov.cz>
Reply-To: Jakub Jelinek
References: <20131031144309.GR27813@tucnak.zalov.cz> <87y559xz7y.fsf@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87y559xz7y.fsf@redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-SW-Source: 2013-10/txt/msg02719.txt.bz2

On Thu, Oct 31, 2013 at 04:00:01PM +0100, Dodji Seketeli wrote:
> Jakub Jelinek writes:
>
> > On Thu, Oct 31, 2013 at 03:36:07PM +0100, Bernd Edlinger wrote:
> >> if you want to read zero-chars, why don't you simply use fgetc,
> >> optionally replacing '\0' with ' ' in read_line?
> >
> > Because it is too slow?
> >
> > getline(3) would be much better for this purpose, though of course
> > it is a GNU extension in glibc and so we'd need some fallback, which
> > very well could be the fgetc or something similar.
>
> So would getline (+ the current patch as a fallback) be acceptable?

I think the patch is too expensive even as a fallback.
I'd say the best approach would be to write a getline API compatible
function and just use it: fread into a fixed size buffer (say 4KB or
similar), then search the bytes fread actually stored into that buffer
for the line terminator and allocate/copy into the dynamically
allocated buffer.  A slight complication is what to do on mingw/cygwin
and other environments with DOS or Mac style line endings; I have no
idea what exactly fgets does there.
But, ignoring the DOS/Mac style line endings, it would be roughly
(partially from glibc iogetdelim.c):

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

ssize_t
getline_fallback (char **lineptr, size_t *n, FILE *fp)
{
  ssize_t cur_len = 0, len;
  char buf[16384];

  if (lineptr == NULL || n == NULL)
    return -1;

  if (*lineptr == NULL || *n == 0)
    {
      *n = 120;
      *lineptr = (char *) malloc (*n);
      if (*lineptr == NULL)
	return -1;
    }

  len = fread (buf, 1, sizeof buf, fp);
  if (ferror (fp))
    return -1;
  /* Nothing left to read: report EOF the way getline does.  */
  if (len == 0)
    return -1;

  for (;;)
    {
      size_t needed;
      char *t = (char *) memchr (buf, '\n', len);

      /* NOTE: any bytes in buf after the newline are dropped here; a
	 real implementation has to keep them for the next call.  */
      if (t != NULL)
	len = (t - buf) + 1;
      if (__builtin_expect (len >= SSIZE_MAX - cur_len, 0))
	return -1;
      needed = cur_len + len + 1;
      if (needed > *n)
	{
	  char *new_lineptr;
	  if (needed < 2 * *n)
	    needed = 2 * *n;
	  new_lineptr = (char *) realloc (*lineptr, needed);
	  if (new_lineptr == NULL)
	    return -1;
	  *lineptr = new_lineptr;
	  *n = needed;
	}
      memcpy (*lineptr + cur_len, buf, len);
      cur_len += len;
      if (t != NULL)
	break;
      len = fread (buf, 1, sizeof buf, fp);
      if (ferror (fp))
	return -1;
      if (len == 0)
	break;
    }
  (*lineptr)[cur_len] = '\0';
  return cur_len;
}

For the DOS/Mac style line endings, you probably want to look at what
exactly libcpp does with them.

BTW, we probably want to do something about the speed of the caret
diagnostics too.  Right now it opens the file again for every single
line to be printed in caret diagnostics and reads all lines up to the
right one, so imagine how slow it is to print many warnings on almost
adjacent lines near the end of a file that is many megabytes long.
Perhaps we could remember the last file we've opened for caret
diagnostics and not fclose it right away, but only when a new request
is for a different file.  We could also keep a vector of line start
offsets (say the starting byte of every 100th line or similar) and
remember the offset of the last line read.  Then if a new request is
for the same file but a higher line than the last one, we can just keep
calling getline, and if it is for a lower line, we look up the offset
for line / 100, fseek to it and getline only line % 100 more lines.
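
To illustrate the line start offsets idea, here is a minimal sketch.
The names line_index, line_index_build, line_index_read and
INDEX_STRIDE are made up for this example and are not existing GCC
functions.  It assumes the stream is opened in binary mode (so counted
bytes match fseek offsets), uses plain getc rather than getline, and
leaves out both the "remember the last line read" shortcut and the
keeping of the file open across requests.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define INDEX_STRIDE 100	/* index the start of every 100th line */

/* Hypothetical sparse index: offsets[i] is the byte offset at which
   1-based line i * INDEX_STRIDE + 1 starts.  */
struct line_index
{
  long *offsets;
  size_t count;
};

/* Scan FP once from the beginning and fill IDX.
   Return 0 on success, -1 on allocation or read failure.  */
int
line_index_build (struct line_index *idx, FILE *fp)
{
  size_t cap = 16, lines_seen = 0;
  int c, at_line_start = 1;
  long pos = 0;

  idx->count = 0;
  idx->offsets = (long *) malloc (cap * sizeof *idx->offsets);
  if (idx->offsets == NULL)
    return -1;
  rewind (fp);
  while ((c = getc (fp)) != EOF)
    {
      if (at_line_start && lines_seen % INDEX_STRIDE == 0)
	{
	  if (idx->count == cap)
	    {
	      long *grown
		= (long *) realloc (idx->offsets, 2 * cap * sizeof *grown);
	      if (grown == NULL)
		return -1;
	      idx->offsets = grown;
	      cap *= 2;
	    }
	  idx->offsets[idx->count++] = pos;
	}
      at_line_start = (c == '\n');
      if (at_line_start)
	lines_seen++;
      pos++;
    }
  return ferror (fp) ? -1 : 0;
}

/* Copy 1-based line LINE_NO into BUF without its newline, truncated to
   SIZE - 1 bytes.  Return 0 on success, -1 if there is no such line.  */
int
line_index_read (const struct line_index *idx, FILE *fp, size_t line_no,
		 char *buf, size_t size)
{
  size_t slot, skip, len = 0;
  int c;

  assert (size > 0);
  if (line_no == 0)
    return -1;
  slot = (line_no - 1) / INDEX_STRIDE;
  if (slot >= idx->count
      || fseek (fp, idx->offsets[slot], SEEK_SET) != 0)
    return -1;
  /* Skip the at most INDEX_STRIDE - 1 lines before the wanted one.  */
  for (skip = (line_no - 1) % INDEX_STRIDE; skip > 0; )
    {
      c = getc (fp);
      if (c == EOF)
	return -1;
      if (c == '\n')
	skip--;
    }
  c = getc (fp);
  if (c == EOF)
    return -1;
  while (c != EOF && c != '\n')
    {
      if (len + 1 < size)
	buf[len++] = (char) c;
      c = getc (fp);
    }
  buf[len] = '\0';
  return 0;
}
```

With a stride of 100, fetching any line costs one fseek plus a scan of
at most 99 preceding lines, instead of a scan from the start of the
file.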
Maybe we should keep not just one, but 2 or 4 open files in the cache
(again, each with its own vector of line start offsets).

	Jakub
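
One detail of the getline_fallback outline is worth spelling out: it
freads a whole chunk but copies only up to the first newline, so the
rest of the chunk has to survive until the next call.  Below is a
minimal sketch of one way to do that, keeping the chunk buffer in a
small reader object.  The names line_reader, line_reader_init and
line_reader_getline are hypothetical, ssize_t is assumed to come from
the POSIX <sys/types.h>, and DOS/Mac line endings are still ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>		/* ssize_t (POSIX) */

/* Hypothetical reader state.  The chunk buffer outlives a single call,
   so bytes that follow a newline are served by the next call.  */
struct line_reader
{
  FILE *fp;
  size_t pos;			/* index of the next unconsumed byte */
  size_t end;			/* number of valid bytes in buf */
  char buf[4096];
};

void
line_reader_init (struct line_reader *r, FILE *fp)
{
  r->fp = fp;
  r->pos = r->end = 0;
}

/* Like getline: store the next line (with its '\n', if any) in
   *LINEPTR, growing it as needed, and return its length.  Return -1 on
   error or when no bytes are left.  Embedded '\0' bytes are kept.  */
ssize_t
line_reader_getline (struct line_reader *r, char **lineptr, size_t *n)
{
  size_t cur_len = 0;

  assert (r != NULL && lineptr != NULL && n != NULL);
  if (*lineptr == NULL)
    *n = 0;
  for (;;)
    {
      char *start, *nl;
      size_t len, needed;

      if (r->pos == r->end)
	{
	  /* Buffer exhausted: refill it with the next chunk.  */
	  r->end = fread (r->buf, 1, sizeof r->buf, r->fp);
	  r->pos = 0;
	  if (r->end == 0)
	    {
	      if (ferror (r->fp) || cur_len == 0)
		return -1;
	      break;		/* last line lacks a trailing newline */
	    }
	}
      start = r->buf + r->pos;
      nl = (char *) memchr (start, '\n', r->end - r->pos);
      len = nl != NULL ? (size_t) (nl - start) + 1 : r->end - r->pos;
      needed = cur_len + len + 1;
      if (needed > *n)
	{
	  char *new_lineptr;
	  if (needed < 2 * *n)
	    needed = 2 * *n;
	  new_lineptr = (char *) realloc (*lineptr, needed);
	  if (new_lineptr == NULL)
	    return -1;
	  *lineptr = new_lineptr;
	  *n = needed;
	}
      memcpy (*lineptr + cur_len, start, len);
      cur_len += len;
      r->pos += len;		/* the rest stays for the next call */
      if (nl != NULL)
	break;
    }
  (*lineptr)[cur_len] = '\0';
  return (ssize_t) cur_len;
}
```

The price is that the signature is no longer identical to getline,
since the leftover bytes need somewhere to live between calls.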