[Patch, fortran] Module loading improvements part 1/3

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [Patch, fortran] Module loading improvements part 1/3
@ 2013-03-24 21:58 Janne Blomqvist
  2013-03-26 15:08 ` Mikael Morin
  0 siblings, 1 reply; 6+ messages in thread
From: Janne Blomqvist @ 2013-03-24 21:58 UTC (permalink / raw)
  To: Fortran List, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2083 bytes --]

Hi,

gfortran's handling of .mod files leaves a lot to be desired. One
problem being that the parsing of the module is done in such a way
that we seek back and forth in the file; see e.g. comments in PR
25708. This not only causes us to spend unnecessary time executing
lseek syscalls, but we also read a lot more data than needed, as the
typical stdio implementation discards any buffered data when the file
is seeked.

Furthermore, I had a patch a year or so ago to reduce the size of the
module files (PR 40958) by compressing them with zlib. Unfortunately
this turned out to be unworkable since seeking in a zlib file is
extremely inefficient (it appears to restart uncompression from the
beginning of the file in order to count up to the right offset). To
fix this I tried various approaches to parse module files without
requiring seeking, but I was not successful in this.

The attached patch takes the crude approach of first sequentially
reading the .mod file into a temporary buffer, then does the actual
parsing from that buffer. Testing on CP2K_2009-05-01.f90 (single-file
version of CP2K for benchmarking purposes), the -fsyntax-only compile
time drops from 55 seconds to 46, lseek syscalls drop from ~512k to
22k, read syscalls from 460k to 37k. This patch lays the groundwork
for

- zlib compression of .mod files (part 2/3).

- Caching module files, a crude implementation of the old "module
namespaces" idea. E.g. put the uncompressed module contents into a map
keyed by module name. (part 3/3).

Regtested on x86_64-unknown-linux-gnu, Ok for trunk?

2013-03-24  Janne Blomqvist  <jb@gcc.gnu.org>

	PR fortran/25708
	* module.c (module_locus): Use long for position.
	(module_content): New variable.
	(module_pos): Likewise.
	(prev_character): Remove.
	(bad_module): Free data instead of closing mod file.
	(set_module_locus): Use module_pos.
	(get_module_locus): Likewise.
	(module_char): use buffer rather than stdio file.
	(module_unget_char): Likewise.
	(read_module_to_tmpbuf): New function.
	(gfc_use_module): Call read_module_to_tmpbuf.


-- 
Janne Blomqvist

[-- Attachment #2: tmpbuf.diff --]
[-- Type: application/octet-stream, Size: 3174 bytes --]

diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 1b38555..66b5e8e 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -88,7 +88,7 @@ along with GCC; see the file COPYING3.  If not see
 typedef struct
 {
   int column, line;
-  fpos_t pos;
+  long pos;
 }
 module_locus;
 
@@ -190,8 +190,12 @@ static struct md5_ctx ctx;
 static const char *module_name;
 static gfc_use_list *module_list;
 
+/* Content of module.  */
+static const char* module_content;
+
+static long module_pos;
 static int module_line, module_column, only_flag;
-static int prev_module_line, prev_module_column, prev_character;
+static int prev_module_line, prev_module_column;
 
 static enum
 { IO_INPUT, IO_OUTPUT }
@@ -1004,7 +1008,7 @@ static void bad_module (const char *) ATTRIBUTE_NORETURN;
 static void
 bad_module (const char *msgid)
 {
-  fclose (module_fp);
+  XDELETEVEC (module_content);
 
   switch (iomode)
     {
@@ -1031,7 +1035,7 @@ set_module_locus (module_locus *m)
 {
   module_column = m->column;
   module_line = m->line;
-  fsetpos (module_fp, &m->pos);
+  module_pos = m->pos;
 }
 
 
@@ -1042,7 +1046,7 @@ get_module_locus (module_locus *m)
 {
   m->column = module_column;
   m->line = module_line;
-  fgetpos (module_fp, &m->pos);
+  m->pos = module_pos;
 }
 
 
@@ -1052,16 +1056,12 @@ get_module_locus (module_locus *m)
 static int
 module_char (void)
 {
-  int c;
-
-  c = getc (module_fp);
-
-  if (c == EOF)
+  const char c = module_content[module_pos++];
+  if (c == '\0')
     bad_module ("Unexpected EOF");
 
   prev_module_line = module_line;
   prev_module_column = module_column;
-  prev_character = c;
 
   if (c == '\n')
     {
@@ -1081,7 +1081,7 @@ module_unget_char (void)
 {
   module_line = prev_module_line;
   module_column = prev_module_column;
-  ungetc (prev_character, module_fp);
+  module_pos--;
 }
 
 /* Parse a string constant.  The delimiter is guaranteed to be a
@@ -5903,6 +5903,29 @@ create_derived_type (const char *name, const char *modname,
 }
 
 
+/* Read the contents of the module file into a temporary buffer.  */
+
+static void
+read_module_to_tmpbuf ()
+{
+  /* Find out the size of the file and reserve space.  Assume we're at
+     the beginning.  */
+  fseek (module_fp, 0, SEEK_END);
+  long file_size = ftell (module_fp);
+  fseek (module_fp, 0, SEEK_SET);
+
+  /* An extra byte for the terminating NULL.  */
+  char* s = XNEWVEC (char, file_size + 1);
+
+  size_t nread = fread (s, 1, file_size, module_fp);
+  gcc_assert (nread == file_size);
+  s[file_size] = '\0';
+
+  module_content = s;
+  module_pos = 0;
+}
+
+
 /* USE the ISO_FORTRAN_ENV intrinsic module.  */
 
 static void
@@ -6187,6 +6210,9 @@ gfc_use_module (gfc_use_list *module)
   module_column = 1;
   start = 0;
 
+  read_module_to_tmpbuf ();
+  fclose (module_fp);
+
   /* Skip the first two lines of the module, after checking that this is
      a gfortran module file.  */
   line = 0;
@@ -6234,7 +6260,8 @@ gfc_use_module (gfc_use_list *module)
   free_pi_tree (pi_root);
   pi_root = NULL;
 
-  fclose (module_fp);
+  XDELETEVEC (module_content);
+  module_content = NULL;
 
   use_stmt = gfc_get_use_list ();
   *use_stmt = *module;

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Patch, fortran] Module loading improvements part 1/3
  2013-03-24 21:58 [Patch, fortran] Module loading improvements part 1/3 Janne Blomqvist
@ 2013-03-26 15:08 ` Mikael Morin
  2013-03-26 15:24   ` Tobias Burnus
  2013-03-26 22:09   ` Janne Blomqvist
  0 siblings, 2 replies; 6+ messages in thread
From: Mikael Morin @ 2013-03-26 15:08 UTC (permalink / raw)
  To: Janne Blomqvist; +Cc: Fortran List, GCC Patches

Le 24/03/2013 22:58, Janne Blomqvist a Ã©crit :
> The attached patch takes the crude approach of first sequentially
> reading the .mod file into a temporary buffer, then does the actual
> parsing from that buffer. 
>
I don't like it much, but knowing how bad module files are currently
handled, it's probably the way to go.

> This patch lays the groundwork
> for
> 
> - zlib compression of .mod files (part 2/3).
> 
This one sounds promising


> - Caching module files, a crude implementation of the old "module
> namespaces" idea. E.g. put the uncompressed module contents into a map
> keyed by module name. (part 3/3).
> 
But I'm not too fond of that one.  Bah, let's see if it improves things.



> @@ -1004,7 +1008,7 @@ static void bad_module (const char *) ATTRIBUTE_NORETURN;
>  static void
>  bad_module (const char *msgid)
>  {
> -  fclose (module_fp);
> +  XDELETEVEC (module_content);
>  
I know this is followed by fatal errors only, but I would feel much
better with an additional 'module_content = NULL;'

OK with that change.

Mikael

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Patch, fortran] Module loading improvements part 1/3
  2013-03-26 15:08 ` Mikael Morin
@ 2013-03-26 15:24   ` Tobias Burnus
  2013-03-26 17:23     ` Thomas Koenig
  2013-03-26 22:09   ` Janne Blomqvist
  1 sibling, 1 reply; 6+ messages in thread
From: Tobias Burnus @ 2013-03-26 15:24 UTC (permalink / raw)
  To: Mikael Morin; +Cc: Janne Blomqvist, Fortran List, GCC Patches

Mikael Morin wrote:
> Le 24/03/2013 22:58, Janne Blomqvist a Ã©crit :
>> The attached patch takes the crude approach of first sequentially
>> reading the .mod file into a temporary buffer, then does the actual
>> parsing from that buffer.
> I don't like it much, but knowing how bad module files are currently
> handled, it's probably the way to go.
...
>> - Caching module files, a crude implementation of the old "module
>> namespaces" idea. E.g. put the uncompressed module contents into a map
>> keyed by module name. (part 3/3).
>>
> But I'm not too fond of that one.  Bah, let's see if it improves things.

I wonder whether one should also do what Joost has proposed:* Changing 
"allocatable" to "al" etc. That reduces both the .mod file size (and 
thus I/O and improves caching) and the memory consumption of the 
compiler with the proposed caching scheme. As context-aware compression, 
it could even have a better ratio than ZIP. (ZIP can still be used on 
top of it.)

Tobias

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Patch, fortran] Module loading improvements part 1/3
  2013-03-26 15:24   ` Tobias Burnus
@ 2013-03-26 17:23     ` Thomas Koenig
  2013-03-26 22:27       ` Janne Blomqvist
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Koenig @ 2013-03-26 17:23 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: Mikael Morin, Janne Blomqvist, Fortran List, GCC Patches

Am 26.03.2013 16:24, schrieb Tobias Burnus:

> I wonder whether one should also do what Joost has proposed:* Changing
> "allocatable" to "al" etc. That reduces both the .mod file size (and
> thus I/O and improves caching) and the memory consumption of the
> compiler with the proposed caching scheme. As context-aware compression,
> it could even have a better ratio than ZIP. (ZIP can still be used on
> top of it.)

I have been thinking a little bit about a complete redesign of the
module files.

Ideally, I would like it to be an extensible binary format, which uses
a keyword-value combination and which could be accompanied by a "dumper"
which makes it human-readable.  A simple yacc grammar could handle
both the reading and making the "dumper".

If this is designed right, it might not even be necessary to bump the
module number for old library files.

	Thomas

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Patch, fortran] Module loading improvements part 1/3
  2013-03-26 15:08 ` Mikael Morin
  2013-03-26 15:24   ` Tobias Burnus
@ 2013-03-26 22:09   ` Janne Blomqvist
  1 sibling, 0 replies; 6+ messages in thread
From: Janne Blomqvist @ 2013-03-26 22:09 UTC (permalink / raw)
  To: Mikael Morin; +Cc: Fortran List, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1794 bytes --]

On Tue, Mar 26, 2013 at 5:08 PM, Mikael Morin <mikael.morin@sfr.fr> wrote:
> Le 24/03/2013 22:58, Janne Blomqvist a écrit :
>> The attached patch takes the crude approach of first sequentially
>> reading the .mod file into a temporary buffer, then does the actual
>> parsing from that buffer.
>>
> I don't like it much, but knowing how bad module files are currently
> handled, it's probably the way to go.

I don't like it either, I'd have preferred that my previous efforts to
parse module files sequentially without the tmp buffer would have been
successful. But alas, this is better than what we have now, IMHO.

>> - Caching module files, a crude implementation of the old "module
>> namespaces" idea. E.g. put the uncompressed module contents into a map
>> keyed by module name. (part 3/3).
>>
> But I'm not too fond of that one.  Bah, let's see if it improves things.

As of today, I don't think this is useful. But on top of the zlib
patch it would reduce the decompression overhead when the same module
is used repeatedly in the same translation unit. Whether the zlib
overhead is significant enough to be worth bothering about I don't
know; I'm happy to take a wait-and-see approach wrt this.

>> @@ -1004,7 +1008,7 @@ static void bad_module (const char *) ATTRIBUTE_NORETURN;
>>  static void
>>  bad_module (const char *msgid)
>>  {
>> -  fclose (module_fp);
>> +  XDELETEVEC (module_content);
>>
> I know this is followed by fatal errors only, but I would feel much
> better with an additional 'module_content = NULL;'
>
> OK with that change.

Done. Joost also reported that my original patch didn't pass bootstrap
(I cheated and used --disable-bootstrap). Slightly fixed patch
attached, and committed as r197124.


-- 
Janne Blomqvist

[-- Attachment #2: tmpbuf.diff --]
[-- Type: application/octet-stream, Size: 3153 bytes --]

diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index ee09291..814a40d 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -88,7 +88,7 @@ along with GCC; see the file COPYING3.  If not see
 typedef struct
 {
   int column, line;
-  fpos_t pos;
+  long pos;
 }
 module_locus;
 
@@ -190,8 +190,12 @@ static struct md5_ctx ctx;
 static const char *module_name;
 static gfc_use_list *module_list;
 
+/* Content of module.  */
+static char* module_content;
+
+static long module_pos;
 static int module_line, module_column, only_flag;
-static int prev_module_line, prev_module_column, prev_character;
+static int prev_module_line, prev_module_column;
 
 static enum
 { IO_INPUT, IO_OUTPUT }
@@ -1004,7 +1008,8 @@ static void bad_module (const char *) ATTRIBUTE_NORETURN;
 static void
 bad_module (const char *msgid)
 {
-  fclose (module_fp);
+  XDELETEVEC (module_content);
+  module_content = NULL;
 
   switch (iomode)
     {
@@ -1031,7 +1036,7 @@ set_module_locus (module_locus *m)
 {
   module_column = m->column;
   module_line = m->line;
-  fsetpos (module_fp, &m->pos);
+  module_pos = m->pos;
 }
 
 
@@ -1042,7 +1047,7 @@ get_module_locus (module_locus *m)
 {
   m->column = module_column;
   m->line = module_line;
-  fgetpos (module_fp, &m->pos);
+  m->pos = module_pos;
 }
 
 
@@ -1052,16 +1057,12 @@ get_module_locus (module_locus *m)
 static int
 module_char (void)
 {
-  int c;
-
-  c = getc (module_fp);
-
-  if (c == EOF)
+  const char c = module_content[module_pos++];
+  if (c == '\0')
     bad_module ("Unexpected EOF");
 
   prev_module_line = module_line;
   prev_module_column = module_column;
-  prev_character = c;
 
   if (c == '\n')
     {
@@ -1081,7 +1082,7 @@ module_unget_char (void)
 {
   module_line = prev_module_line;
   module_column = prev_module_column;
-  ungetc (prev_character, module_fp);
+  module_pos--;
 }
 
 /* Parse a string constant.  The delimiter is guaranteed to be a
@@ -6019,6 +6020,27 @@ create_derived_type (const char *name, const char *modname,
 }
 
 
+/* Read the contents of the module file into a temporary buffer.  */
+
+static void
+read_module_to_tmpbuf ()
+{
+  /* Find out the size of the file and reserve space.  Assume we're at
+     the beginning.  */
+  fseek (module_fp, 0, SEEK_END);
+  long file_size = ftell (module_fp);
+  fseek (module_fp, 0, SEEK_SET);
+
+  /* An extra byte for the terminating NULL.  */
+  module_content = XNEWVEC (char, file_size + 1);
+
+  fread (module_content, 1, file_size, module_fp);
+  module_content[file_size] = '\0';
+
+  module_pos = 0;
+}
+
+
 /* USE the ISO_FORTRAN_ENV intrinsic module.  */
 
 static void
@@ -6289,6 +6311,9 @@ gfc_use_module (gfc_use_list *module)
   module_column = 1;
   start = 0;
 
+  read_module_to_tmpbuf ();
+  fclose (module_fp);
+
   /* Skip the first two lines of the module, after checking that this is
      a gfortran module file.  */
   line = 0;
@@ -6336,7 +6361,8 @@ gfc_use_module (gfc_use_list *module)
   free_pi_tree (pi_root);
   pi_root = NULL;
 
-  fclose (module_fp);
+  XDELETEVEC (module_content);
+  module_content = NULL;
 
   use_stmt = gfc_get_use_list ();
   *use_stmt = *module;

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Patch, fortran] Module loading improvements part 1/3
  2013-03-26 17:23     ` Thomas Koenig
@ 2013-03-26 22:27       ` Janne Blomqvist
  0 siblings, 0 replies; 6+ messages in thread
From: Janne Blomqvist @ 2013-03-26 22:27 UTC (permalink / raw)
  To: Thomas Koenig; +Cc: Tobias Burnus, Mikael Morin, Fortran List, GCC Patches

On Tue, Mar 26, 2013 at 7:23 PM, Thomas Koenig <tkoenig@netcologne.de> wrote:
> Am 26.03.2013 16:24, schrieb Tobias Burnus:
>
>
>> I wonder whether one should also do what Joost has proposed:* Changing
>> "allocatable" to "al" etc. That reduces both the .mod file size (and
>> thus I/O and improves caching) and the memory consumption of the
>> compiler with the proposed caching scheme. As context-aware compression,
>> it could even have a better ratio than ZIP. (ZIP can still be used on
>> top of it.)
>
>
> I have been thinking a little bit about a complete redesign of the
> module files.

Yes, the more I dig into module.c the more I agree with that sentiment.. :-/

> Ideally, I would like it to be an extensible binary format, which uses
> a keyword-value combination and which could be accompanied by a "dumper"
> which makes it human-readable.  A simple yacc grammar could handle
> both the reading and making the "dumper".
>
> If this is designed right, it might not even be necessary to bump the
> module number for old library files.

I'd suggest not inventing our own wheel but rather using an existing
one such as Google's protocol buffers.

https://developers.google.com/protocol-buffers/docs/overview


-- 
Janne Blomqvist

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-03-26 22:27 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-24 21:58 [Patch, fortran] Module loading improvements part 1/3 Janne Blomqvist
2013-03-26 15:08 ` Mikael Morin
2013-03-26 15:24   ` Tobias Burnus
2013-03-26 17:23     ` Thomas Koenig
2013-03-26 22:27       ` Janne Blomqvist
2013-03-26 22:09   ` Janne Blomqvist

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).