public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jakub Jelinek <jakub@redhat.com>
To: Dodji Seketeli <dodji@redhat.com>
Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>,
	       "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals
Date: Thu, 31 Oct 2013 18:26:00 -0000	[thread overview]
Message-ID: <20131031173649.GW27813@tucnak.zalov.cz> (raw)
In-Reply-To: <87y559xz7y.fsf@redhat.com>

On Thu, Oct 31, 2013 at 04:00:01PM +0100, Dodji Seketeli wrote:
> Jakub Jelinek <jakub@redhat.com> writes:
> 
> > On Thu, Oct 31, 2013 at 03:36:07PM +0100, Bernd Edlinger wrote:
> >> if you want to read zero-chars, why don't you simply use fgetc,
> >> optionally replacing '\0' with ' ' in read_line?
> >
> > Because it is too slow?
> >
> > getline(3) would be much better for this purpose, though of course
> > it is a GNU extension in glibc and so we'd need some fallback, which
> > very well could be the fgetc or something similar.
> 
> So would getline (+ the current patch as a fallback) be acceptable?

I think even as a fallback is the patch too expensive.
I'd say best would be to write some getline API compatible function
and just use it, using fread on say fixed size buffer (4KB or similar),
then for the number of characters returned by fread that were stored
into that buffer look for the line terminator there and allocate/copy
to the dynamically allocated buffer.  A slight complication is what to do
on mingw/cygwin and other DOS or Mac style line ending environments,
no idea what fgets exactly does there.  But, ignoring the DOS/Mac style line
endings, it would be roughly (partially from glibc iogetdelim.c).

ssize_t
getline_fallback (char **lineptr, size_t *n, FILE *fp)
{
  ssize_t cur_len = 0, len;
  char buf[16384];

  if (lineptr == NULL || n == NULL)
    return -1;

  if (*lineptr == NULL || *n == 0)
    {
      *n = 120;
      *lineptr = (char *) malloc (*n);
      if (*lineptr == NULL)
	return -1;
    }

  len = fread (buf, 1, sizeof buf, fp);
  if (ferror (fp))
    return -1;

  for (;;)
    {
      size_t needed;
      char *t = memchr (buf, '\n', len);
      if (t != NULL)
	len = (t - buf) + 1;
      if (__builtin_expect (len >= SSIZE_MAX - cur_len, 0))
	return -1;
      needed = cur_len + len + 1;
      if (needed > *n)
	{
	  char *new_lineptr;
	  if (needed < 2 * *n)
	    needed = 2 * *n;
	  new_lineptr = realloc (*lineptr, needed);
	  if (new_lineptr == NULL)
	    return -1;
	  *lineptr = new_lineptr;
	  *n = needed;
	}
      memcpy (*lineptr + cur_len, buf, len);
      cur_len += len;
      if (t != NULL)
	break;
      len = fread (buf, 1, sizeof buf, fp);
      if (ferror (fp))
	return -1;
      if (len == 0)
	break;
    }
  (*lineptr)[cur_len] = '\0';
  return cur_len;
}

For the DOS/Mac style line endings, you probably want to look at what
exactly does libcpp do with them.

BTW, we probably want to do something with the speed of the caret
diagnostics too, right now it opens the file again for each single line
to be printed in caret diagnostics and reads all lines until the right one,
so imagine how fast is printing of many warnings on almost adjacent lines
near the end of many megabytes long file.
Perhaps we could remember the last file we've opened for caret diagnostics,
don't fclose the file right away but only if a new request is for a
different file, perhaps keep some vector of line start offsets (say starting
byte of every 100th line or similar) and also remember the last read line
offset, so if a new request is for the same file, but higher line than last,
we can just keep getlineing, and if it is smaller line than last, we look up
the offset of the line / 100, fseek to it and just getline only modulo 100
lines.  Maybe we should keep not just one, but 2 or 4 opened files as cache
(again, with the starting line offsets vectors).

	Jakub

  reply	other threads:[~2013-10-31 17:36 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-31 14:48 Bernd Edlinger
2013-10-31 15:06 ` Jakub Jelinek
2013-10-31 15:19   ` Dodji Seketeli
2013-10-31 18:26     ` Jakub Jelinek [this message]
2013-11-04 11:52       ` Dodji Seketeli
2013-11-04 11:59         ` Jakub Jelinek
2013-11-04 15:42           ` Dodji Seketeli
2013-11-05  0:10             ` Bernd Edlinger
2013-11-05  9:50               ` Dodji Seketeli
2013-11-05 11:19                 ` Bernd Edlinger
2013-11-05 11:43                   ` Dodji Seketeli
2013-11-06 22:27                 ` Bernd Edlinger
2013-11-04 12:06         ` Bernd Edlinger
2013-11-04 12:15           ` Jakub Jelinek
2013-11-04 12:32             ` Bernd Edlinger
2013-11-04 15:21           ` Dodji Seketeli
2013-11-11 10:49       ` Dodji Seketeli
2013-11-11 14:35         ` Jakub Jelinek
2013-11-11 17:13           ` Dodji Seketeli
2013-11-12 16:42             ` Dodji Seketeli
2013-11-13  5:10               ` Bernd Edlinger
2013-11-13  9:40                 ` Dodji Seketeli
2013-11-13  9:43                   ` Bernd Edlinger
2013-11-13  9:49                     ` Dodji Seketeli
2013-11-13  9:49                     ` Dodji Seketeli
2013-11-13  9:51               ` Jakub Jelinek
2013-11-14 15:12                 ` Dodji Seketeli
2013-12-09 20:11                   ` Tom Tromey
2014-01-21 12:28                   ` Bernd Edlinger
2014-01-22  8:16                     ` Dodji Seketeli
2014-01-23 17:12                       ` Jakub Jelinek
2014-01-24  2:58                         ` Bernd Edlinger
2014-01-24  7:53                         ` Dodji Seketeli
2014-01-24 15:05                       ` Markus Trippelsdorf
2014-01-24 15:41                         ` Dodji Seketeli
2014-01-24 15:44                           ` Jakub Jelinek
2014-01-24 16:09                             ` Dodji Seketeli
2014-01-24 16:13                               ` Jakub Jelinek
2014-01-24 23:02                               ` Markus Trippelsdorf
2014-01-24 23:20                                 ` Markus Trippelsdorf
2014-01-28 13:20                                   ` Dodji Seketeli
2014-01-28 13:23                                     ` Dodji Seketeli
2014-01-28 18:40                                       ` H.J. Lu
2014-01-29 11:28                                         ` Dodji Seketeli
  -- strict thread matches above, loose matches on Subject: below --
2013-10-31 13:45 Dodji Seketeli
2013-10-31 17:30 ` Manuel López-Ibáñez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131031173649.GW27813@tucnak.zalov.cz \
    --to=jakub@redhat.com \
    --cc=bernd.edlinger@hotmail.de \
    --cc=dodji@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).