public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* C++ ping: reduce size of cp_token
@ 2008-03-15  0:30 Tom Tromey
  2008-03-15  9:44 ` Jakub Jelinek
  0 siblings, 1 reply; 9+ messages in thread
From: Tom Tromey @ 2008-03-15  0:30 UTC (permalink / raw)
  To: Gcc Patch List

Ping.  This patch is pretty simple and reduces the size of cp_token.

http://gcc.gnu.org/ml/gcc-patches/2008-03/msg00023.html

Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: C++ ping: reduce size of cp_token
  2008-03-15  0:30 C++ ping: reduce size of cp_token Tom Tromey
@ 2008-03-15  9:44 ` Jakub Jelinek
  2008-03-18 20:15   ` Tom Tromey
  2008-03-20 19:58   ` Tom Tromey
  0 siblings, 2 replies; 9+ messages in thread
From: Jakub Jelinek @ 2008-03-15  9:44 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Gcc Patch List

On Fri, Mar 14, 2008 at 05:22:36PM -0600, Tom Tromey wrote:
> Ping.  This patch is pretty simple and reduces the size of cp_token.
> 
> http://gcc.gnu.org/ml/gcc-patches/2008-03/msg00023.html

What size increase does it cause on parser.o's .text section?
There are hundreds of ->type accesses in parser.c, and that doesn't
count inlined cp_lexer_next_token_is{,_not} etc.  If it is 8 bit, it
can use char access instruction, when it is 7 bit, it needs to additionally
mask it.

What about instead reusing one of the ->flags bits (from what I can see,
parser.c only cares about 3 bits from that or so):
egrep '(\.|->)flags' parser.c
    = c_lex_with_flags (&token->u.value, &token->location, &token->flags,
      || (token->flags & PREV_WHITE))
      && next_token->flags & DIGRAPH
      && !(next_token_2->flags & PREV_WHITE))
  if (token->type != CPP_NUMBER || !(token->flags & PURE_ZERO))
  if (token->type == CPP_OPEN_SQUARE && token->flags & DIGRAPH)
      if (token2->type == CPP_COLON && !(token2->flags & PREV_WHITE))

So if after c_lex_with_flags call you do token->flags &= ~AMBIGUOUS_P;
and transform ->ambiguous_p into ->flags & AMBIGUOUS_P, it might
have smaller code size impact and perhaps be even faster.

	Jakub

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: C++ ping: reduce size of cp_token
  2008-03-15  9:44 ` Jakub Jelinek
@ 2008-03-18 20:15   ` Tom Tromey
  2008-03-20 19:58   ` Tom Tromey
  1 sibling, 0 replies; 9+ messages in thread
From: Tom Tromey @ 2008-03-18 20:15 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Gcc Patch List

>>>>> "Jakub" == Jakub Jelinek <jakub@redhat.com> writes:

Jakub> On Fri, Mar 14, 2008 at 05:22:36PM -0600, Tom Tromey wrote:
>> Ping.  This patch is pretty simple and reduces the size of cp_token.
>> 
>> http://gcc.gnu.org/ml/gcc-patches/2008-03/msg00023.html

Jakub> What size increase does it cause on parser.o's .text section?
Jakub> There are hundreds of ->type accesses in parser.c, and that
Jakub> doesn't count inlined cp_lexer_next_token_is{,_not} etc.  If it
Jakub> is 8 bit, it can use char access instruction, when it is 7 bit,
Jakub> it needs to additionally mask it.

Thanks for looking at this.  I'll take a look at your idea sometime
later this week.

Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: C++ ping: reduce size of cp_token
  2008-03-15  9:44 ` Jakub Jelinek
  2008-03-18 20:15   ` Tom Tromey
@ 2008-03-20 19:58   ` Tom Tromey
  2008-03-24 17:39     ` Mark Mitchell
  1 sibling, 1 reply; 9+ messages in thread
From: Tom Tromey @ 2008-03-20 19:58 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Gcc Patch List

>>>>> "Jakub" == Jakub Jelinek <jakub@redhat.com> writes:

Jakub> What size increase does it cause on parser.o's .text section?

In the end I didn't look at this, I just rewrote the patch following
your suggestion, since I agreed with your logic.

Instead of re-purposing flag bits in the 'flags' field, I just turned
the needed flags into boolean bitfields in cp_token, since that is
clearer (IMO).  Also, it lets us make it clear that we have 4 unused
bits.

Bootstrapped and regression tested on the compile farm.  Ok?

Tom

gcc/cp/ChangeLog:
2008-03-20  Tom Tromey  <tromey@redhat.com>

	* parser.c (struct cp_token) <flags>: Remove.
	<flag_prev_white, flag_digraph, flag_pure_zero>: New fields.
	<location>: Move earlier.
	(eof_token): Update.
	(cp_lexer_get_preprocessor_token): Set flag fields.
	(check_empty_body): Update.
	(cp_parser_template_id): Likewise.
	(cp_parser_pure_specifier): Likewise.
	(cp_parser_nth_token_starts_template_argument_list_p): Likewise.

Index: gcc/cp/parser.c
===================================================================
--- gcc/cp/parser.c	(revision 133353)
+++ gcc/cp/parser.c	(working copy)
@@ -58,7 +58,8 @@
   tree qualifying_scope;
 };
 
-/* A C++ token.  */
+/* A C++ token.  Note that this is carefully laid out to have the
+   least amount of padding.  */
 
 typedef struct cp_token GTY (())
 {
@@ -67,8 +68,12 @@
   /* If this token is a keyword, this value indicates which keyword.
      Otherwise, this value is RID_MAX.  */
   ENUM_BITFIELD (rid) keyword : 8;
-  /* Token flags.  */
-  unsigned char flags;
+  /* Token flag: if there was whitespace before the token.  */
+  BOOL_BITFIELD flag_prev_white : 1;
+  /* Token flag: if the token was a digraph.  */
+  BOOL_BITFIELD flag_digraph : 1;
+  /* Token flag: if the token was a single 0 digit.  */
+  BOOL_BITFIELD flag_pure_zero : 1;
   /* Identifier for the pragma.  */
   ENUM_BITFIELD (pragma_kind) pragma_kind : 6;
   /* True if this token is from a system header.  */
@@ -79,6 +84,9 @@
      KEYWORD is RID_MAX) iff this name was looked up and found to be
      ambiguous.  An error has already been reported.  */
   BOOL_BITFIELD ambiguous_p : 1;
+  /* Note: 4 free bits.  */
+  /* The location at which this token was found.  */
+  location_t location;
   /* The value associated with this token, if any.  */
   union cp_token_value {
     /* Used for CPP_NESTED_NAME_SPECIFIER and CPP_TEMPLATE_ID.  */
@@ -86,8 +94,6 @@
     /* Use for all other tokens.  */
     tree GTY((tag ("0"))) value;
   } GTY((desc ("(%1.type == CPP_TEMPLATE_ID) || (%1.type == CPP_NESTED_NAME_SPECIFIER)"))) u;
-  /* The location at which this token was found.  */
-  location_t location;
 } cp_token;
 
 /* We use a stack of token pointer for saving token sets.  */
@@ -97,8 +103,8 @@
 
 static cp_token eof_token =
 {
-  CPP_EOF, RID_MAX, 0, PRAGMA_NONE, 0, false, 0, { NULL },
-  0
+  CPP_EOF, RID_MAX, false, false, false, PRAGMA_NONE,
+  false, false, false, 0, { NULL }
 };
 
 /* The cp_lexer structure represents the C++ lexer.  It is responsible
@@ -401,14 +407,18 @@
 cp_lexer_get_preprocessor_token (cp_lexer *lexer, cp_token *token)
 {
   static int is_extern_c = 0;
+  unsigned char flags;
 
    /* Get a new token from the preprocessor.  */
   token->type
-    = c_lex_with_flags (&token->u.value, &token->location, &token->flags,
+    = c_lex_with_flags (&token->u.value, &token->location, &flags,
 			lexer == NULL ? 0 : C_LEX_RAW_STRINGS);
   token->keyword = RID_MAX;
   token->pragma_kind = PRAGMA_NONE;
   token->in_system_header = in_system_header;
+  token->flag_prev_white = (flags & PREV_WHITE) ? 1 : 0;
+  token->flag_digraph = (flags & DIGRAPH) ? 1 : 0;
+  token->flag_pure_zero = (flags & PURE_ZERO) ? 1 : 0;
 
   /* On some systems, some header files are surrounded by an
      implicit extern "C" block.  Set a flag in the token if it
@@ -7195,8 +7205,7 @@
   close_loc = expand_location (close_paren->location);
   token = cp_lexer_peek_nth_token (parser->lexer, 2);
 
-  if (token->type != CPP_SEMICOLON
-      || (token->flags & PREV_WHITE))
+  if (token->type != CPP_SEMICOLON || token->flag_prev_white)
     return;
 
   semi_loc =  expand_location (token->location);
@@ -9751,9 +9760,9 @@
   next_token = cp_lexer_peek_token (parser->lexer);
   next_token_2 = cp_lexer_peek_nth_token (parser->lexer, 2);
   if (next_token->type == CPP_OPEN_SQUARE
-      && next_token->flags & DIGRAPH
+      && next_token->flag_digraph
       && next_token_2->type == CPP_COLON
-      && !(next_token_2->flags & PREV_WHITE))
+      && !next_token_2->flag_prev_white)
     {
       cp_parser_parse_tentatively (parser);
       /* Change `:' into `::'.  */
@@ -15298,7 +15307,7 @@
   /* Look for the `0' token.  */
   token = cp_lexer_consume_token (parser->lexer);
   /* c_lex_with_flags marks a single digit '0' with PURE_ZERO.  */
-  if (token->type != CPP_NUMBER || !(token->flags & PURE_ZERO))
+  if (token->type != CPP_NUMBER || !token->flag_pure_zero)
     {
       cp_parser_error (parser,
 		       "invalid pure specifier (only `= 0' is allowed)");
@@ -17965,11 +17974,11 @@
   /* Check for the sequence `<::' in the original code. It would be lexed as
      `[:', where `[' is a digraph, and there is no whitespace before
      `:'.  */
-  if (token->type == CPP_OPEN_SQUARE && token->flags & DIGRAPH)
+  if (token->type == CPP_OPEN_SQUARE && token->flag_digraph)
     {
       cp_token *token2;
       token2 = cp_lexer_peek_nth_token (parser->lexer, n+1);
-      if (token2->type == CPP_COLON && !(token2->flags & PREV_WHITE))
+      if (token2->type == CPP_COLON && !token2->flag_prev_white)
 	return true;
     }
   return false;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: C++ ping: reduce size of cp_token
  2008-03-20 19:58   ` Tom Tromey
@ 2008-03-24 17:39     ` Mark Mitchell
  2008-03-24 18:05       ` Tom Tromey
  2008-03-25 20:34       ` Tom Tromey
  0 siblings, 2 replies; 9+ messages in thread
From: Mark Mitchell @ 2008-03-24 17:39 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Jakub Jelinek, Gcc Patch List

Tom Tromey wrote:

> gcc/cp/ChangeLog:
> 2008-03-20  Tom Tromey  <tromey@redhat.com>
> 
> 	* parser.c (struct cp_token) <flags>: Remove.
> 	<flag_prev_white, flag_digraph, flag_pure_zero>: New fields.
> 	<location>: Move earlier.
> 	(eof_token): Update.
> 	(cp_lexer_get_preprocessor_token): Set flag fields.
> 	(check_empty_body): Update.
> 	(cp_parser_template_id): Likewise.
> 	(cp_parser_pure_specifier): Likewise.
> 	(cp_parser_nth_token_starts_template_argument_list_p): Likewise.

Do you have any performance data for this patch, either in terms of 
memory saved or in terms of speed improvements?  This patch certainly 
looks like it should help, because making tokens smaller should save 
memory, and we know GCC performance is memory-sensitive.  However, it's 
always nice to measure.

I'll pre-approve the patch if you do some plausible measurement of your 
choice that shows that this makes things no worse.

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: C++ ping: reduce size of cp_token
  2008-03-24 17:39     ` Mark Mitchell
@ 2008-03-24 18:05       ` Tom Tromey
  2008-03-25 20:34       ` Tom Tromey
  1 sibling, 0 replies; 9+ messages in thread
From: Tom Tromey @ 2008-03-24 18:05 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jakub Jelinek, Gcc Patch List

>>>>> "Mark" == Mark Mitchell <mark@codesourcery.com> writes:

Mark> Do you have any performance data for this patch, either in terms of
Mark> memory saved or in terms of speed improvements?

Nope.  I don't see how it could possibly increase memory use fwiw.
But, I guess it could slow things down.

Mark> I'll pre-approve the patch if you do some plausible measurement of
Mark> your choice that shows that this makes things no worse.

Ok.  I will try to do that sometime soon.

Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: C++ ping: reduce size of cp_token
  2008-03-24 17:39     ` Mark Mitchell
  2008-03-24 18:05       ` Tom Tromey
@ 2008-03-25 20:34       ` Tom Tromey
  2008-03-26 12:34         ` Richard Guenther
  1 sibling, 1 reply; 9+ messages in thread
From: Tom Tromey @ 2008-03-25 20:34 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jakub Jelinek, Gcc Patch List

>>>>> "Mark" == Mark Mitchell <mark@codesourcery.com> writes:

Mark> This patch certainly looks like it should help, because making
Mark> tokens smaller should save memory, and we know GCC performance
Mark> is memory-sensitive.  However, it's always nice to measure.

Mark> I'll pre-approve the patch if you do some plausible measurement of
Mark> your choice that shows that this makes things no worse.

I looked into this.  I rebuilt libstdc++ a number of times both with
and without the patch.

The memory saved by this patch is completely swamped by the memory
wasted due to the buffer reallocation approach in cp_lexer_new_main.
I instrumented this function to print the amount of memory being used.
For smallish compilation units, the default buffer is not filled.
Resizing the buffer also often yields a lot of wasted space.

So, consider this patch withdrawn.  There's no point to it unless the
greater part is dealt with.  FWIW it did not seem to have any
measurable effect on compilation time.

Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: C++ ping: reduce size of cp_token
  2008-03-25 20:34       ` Tom Tromey
@ 2008-03-26 12:34         ` Richard Guenther
  2008-03-26 14:19           ` Tom Tromey
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Guenther @ 2008-03-26 12:34 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Mark Mitchell, Jakub Jelinek, Gcc Patch List

On Tue, Mar 25, 2008 at 8:09 PM, Tom Tromey <tromey@redhat.com> wrote:
> >>>>> "Mark" == Mark Mitchell <mark@codesourcery.com> writes:
>
>  Mark> This patch certainly looks like it should help, because making
>  Mark> tokens smaller should save memory, and we know GCC performance
>  Mark> is memory-sensitive.  However, it's always nice to measure.
>
>  Mark> I'll pre-approve the patch if you do some plausible measurement of
>  Mark> your choice that shows that this makes things no worse.
>
>  I looked into this.  I rebuilt libstdc++ a number of times both with
>  and without the patch.
>
>  The memory saved by this patch is completely swamped by the memory
>  wasted due to the buffer reallocation approach in cp_lexer_new_main.
>  I instrumented this function to print the amount of memory being used.
>  For smallish compilation units, the default buffer is not filled.
>  Resizing the buffer also often yields a lot of wasted space.
>
>  So, consider this patch withdrawn.  There's no point to it unless the
>  greater part is dealt with.  FWIW it did not seem to have any
>  measurable effect on compilation time.

It is IMHO good to keep structs small, even if there happens to be no
visible effect of doing so.  So please do not withdraw it as the patch
does the right thing ;)

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: C++ ping: reduce size of cp_token
  2008-03-26 12:34         ` Richard Guenther
@ 2008-03-26 14:19           ` Tom Tromey
  0 siblings, 0 replies; 9+ messages in thread
From: Tom Tromey @ 2008-03-26 14:19 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Mark Mitchell, Jakub Jelinek, Gcc Patch List

>>>>> "Richard" == Richard Guenther <richard.guenther@gmail.com> writes:

>> So, consider this patch withdrawn.  There's no point to it unless the
>> greater part is dealt with.  FWIW it did not seem to have any
>> measurable effect on compilation time.

Richard> It is IMHO good to keep structs small, even if there happens to be no
Richard> visible effect of doing so.  So please do not withdraw it as the patch
Richard> does the right thing ;)

I was particularly slow yesterday and it took me a long time to
realize that, even though we still waste a lot of memory, shrinking
the struct may still let us avoid resizing the buffer in some cases --
providing a benefit.

So, I looked at the numbers for all the compiles in libstdc++.  Of the
106 compiles, we avoided reallocating the buffer in 3.  Therefore I
think this is still not a very important effect.

I wonder if the parser buffer impacts peak memory use.  That is, does
the peak occur while this buffer is live?  I tend to suspect not,
though I did not try to measure that.  I'm particularly curious since
the incremental C compiler uses a similar lexing approach, and I was
wondering if I should try to make it be smarter about buffer
allocation.

I don't really mind either way.  If you feel strongly about it I think
it should not hurt.

Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2008-03-26 13:41 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-15  0:30 C++ ping: reduce size of cp_token Tom Tromey
2008-03-15  9:44 ` Jakub Jelinek
2008-03-18 20:15   ` Tom Tromey
2008-03-20 19:58   ` Tom Tromey
2008-03-24 17:39     ` Mark Mitchell
2008-03-24 18:05       ` Tom Tromey
2008-03-25 20:34       ` Tom Tromey
2008-03-26 12:34         ` Richard Guenther
2008-03-26 14:19           ` Tom Tromey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).