public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Subject: Removing the arg_index field from cpp_hashnode
@ 2002-10-30 11:32 Per Bothner
  2002-10-30 12:05 ` Zack Weinberg
  2002-11-07  0:33 ` Neil Booth
  0 siblings, 2 replies; 5+ messages in thread
From: Per Bothner @ 2002-10-30 11:32 UTC (permalink / raw)
  To: gcc

Large compiles generate lots of IDENTIFIER_NODEs, each of which
has many fields, most of which are little used.  We can save a
lot of space by being a little more careful.

I suggest we start out by getting rid of the arg_index field of
cpp_hashnode, which is embedded in all IDENTIFIER_NODE for C/C++/ObjC.
Getting rid of arg_index directly saves 16 bits; on 32-bit machines it
saves a whole 4 bytes due to alignment.

As far as I can tell, the arg_index field is only used during
the processing of a #define, to determine which identifiers in
the macro expansion are macro parameters, and to catch duplicate
macro parameter names.

A simple replacement is to add a new flag:
#define NODE_MACROARG   (1 << 6)  /* Parameter in current macro def. */
When a macro parameter is seen, the NODE_MACROARG bit is set in the
flags field of the cpp_hashnode.  The bit is cleared at the end of the
macro definition, just as we currently clear arg_index.  Testing for
duplicate parameter names is trivial:  Just check the bit.

Harder is modifying lex_expansion_token, when we need to set the
arg_no of the CPP_MACROARG token.  An easy fix is to search the
argument list linearly looking for a match.  This causes O(M*N)
behavior, when M is the number of parameters, and N is the number
of references in the macro definition.  In practice both are likely
to be small, but on general principles we to avoid quadratic
algorithms.  And it is easy to avoid it.

The solution is to stash the arg_index in the cpp_hashnode, but
only during definition processing - i.e. only when the NODE_MACROARG
bit is set.  So we can stash the arg_index anywhere in the cpp_hashnode
that is not used during definition processing.  We need to save the
old value somewhere, and then restore it when we're done with the
define.  We can use the rid_code field for example - except that it
is only 8 bits (and I'd like to remove that field too).

My preferred choice would be to add an arg_index variant to the
'value' union.  Then setting CPP_MACROARG would also save the
entire union in a buffer, and _cpp_create_definition would
restore the entire union from the buffer.  But I don't know
if any of the fields are used using define processing.  Zack?

I'll implement and test this, assuming people think it is worthwhile.
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Subject: Removing the arg_index field from cpp_hashnode
  2002-10-30 11:32 Subject: Removing the arg_index field from cpp_hashnode Per Bothner
@ 2002-10-30 12:05 ` Zack Weinberg
  2002-10-30 12:13   ` Per Bothner
  2002-11-07  0:33 ` Neil Booth
  1 sibling, 1 reply; 5+ messages in thread
From: Zack Weinberg @ 2002-10-30 12:05 UTC (permalink / raw)
  To: Per Bothner; +Cc: gcc

On Tue, Oct 29, 2002 at 02:16:39PM -0800, Per Bothner wrote:
> Large compiles generate lots of IDENTIFIER_NODEs, each of which
> has many fields, most of which are little used.  We can save a
> lot of space by being a little more careful.

I am all for this.

The really gaping space waste in IDENTIFIER_NODEs is most of struct
tree_common (20 bytes of which only a few of the flag bits are
actually used for something); however, that's very hard to fix.

> I suggest we start out by getting rid of the arg_index field of
> cpp_hashnode, which is embedded in all IDENTIFIER_NODE for C/C++/ObjC.
> Getting rid of arg_index directly saves 16 bits; on 32-bit machines it
> saves a whole 4 bytes due to alignment.

Sounds good.

Note that on 64-bit machines the patch will make no difference at all,
due to alignment, *unless* you swap the str and len fields in struct
ht_identifier; then struct cpp_hashnode will shrink by eight bytes.
(Both changes are necessary.)

> As far as I can tell, the arg_index field is only used during
> the processing of a #define, to determine which identifiers in
> the macro expansion are macro parameters, and to catch duplicate
> macro parameter names.

Correct.

[...]
> The solution is to stash the arg_index in the cpp_hashnode, but
> only during definition processing - i.e. only when the NODE_MACROARG
> bit is set.  So we can stash the arg_index anywhere in the cpp_hashnode
> that is not used during definition processing.  We need to save the
> old value somewhere, and then restore it when we're done with the
> define.  We can use the rid_code field for example - except that it
> is only 8 bits (and I'd like to remove that field too).

Note that the C99 requirement for minimum-maximum number of macro
arguments is only 127.  Still, 255 might be too small for some really
twisted generated code, and in general we like having lots of headroom.

I'm not sure what good you think it will do to get rid of the rid_code
field.  In principle, the same identifier might simultaneously be a
language keyword (rid_code), directive name (directive_index), and
macro parameter name (arg_index) -- e.g., "if"; there is no way to
get rid of the type and flags fields that I can see; and you'd only
save more space if you somehow got rid of all four of the 8-bit fields
(and then only on 32-bit systems).
 
> My preferred choice would be to add an arg_index variant to the
> 'value' union.  Then setting CPP_MACROARG would also save the
> entire union in a buffer, and _cpp_create_definition would
> restore the entire union from the buffer.  But I don't know
> if any of the fields are used using define processing.  Zack?

I do not believe that any of those fields are used during #define
processing.  Neil could probably tell you better; most of that code is
his.  All the relevant code should be in either cppmacro.c or
cpptrad.c.

I'd be happy to see a patch along these lines.

zw

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Subject: Removing the arg_index field from cpp_hashnode
  2002-10-30 12:05 ` Zack Weinberg
@ 2002-10-30 12:13   ` Per Bothner
  2002-10-30 12:44     ` Zack Weinberg
  0 siblings, 1 reply; 5+ messages in thread
From: Per Bothner @ 2002-10-30 12:13 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: gcc

Zack Weinberg wrote:

> The really gaping space waste in IDENTIFIER_NODEs is most of struct
> tree_common (20 bytes of which only a few of the flag bits are
> actually used for something); however, that's very hard to fix.

I have some other ideas for IDENTIFIER_NODE, but they are orthogonal,
and I'll bring them on individually.

I only count 12 bytes for tree_common (on a 32-bit machine), unless
there is some memory allocation overhead.

> I'm not sure what good you think it will do to get rid of the rid_code
> field.  In principle, the same identifier might simultaneously be a
> language keyword (rid_code), directive name (directive_index), and
> macro parameter name (arg_index) -- e.g., "if".

Well, the rid_code (reserved word) and IDENTIFIER_LOCAL_VALUE
(non-reserved identifier) are mutually exclusive, so for example
we can put them both in the same union.  For example.

> ; there is no way to
> get rid of the type and flags fields that I can see;

The 'type' flag as far as I can see only needs two bits.  The flags
fields can perhaps be combined with the flags in tree_common, though
that is harder.

>  and you'd only
> save more space if you somehow got rid of all four of the 8-bit fields
> (and then only on 32-bit systems).

Yes, saving another word is harder, and may require more radical
changes.  We can save a number of fields in lang_ident, but I'll
get to that in a later message.

But let's start with arg_index.
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Subject: Removing the arg_index field from cpp_hashnode
  2002-10-30 12:13   ` Per Bothner
@ 2002-10-30 12:44     ` Zack Weinberg
  0 siblings, 0 replies; 5+ messages in thread
From: Zack Weinberg @ 2002-10-30 12:44 UTC (permalink / raw)
  To: Per Bothner; +Cc: gcc

On Tue, Oct 29, 2002 at 03:35:43PM -0800, Per Bothner wrote:
> Zack Weinberg wrote:
> 
> >The really gaping space waste in IDENTIFIER_NODEs is most of struct
> >tree_common (20 bytes of which only a few of the flag bits are
> >actually used for something); however, that's very hard to fix.
> 
> I have some other ideas for IDENTIFIER_NODE, but they are orthogonal,
> and I'll bring them on individually.
> 
> I only count 12 bytes for tree_common (on a 32-bit machine), unless
> there is some memory allocation overhead.

My bad, yes.  20 bytes is the total size of struct tree_identifier on
a 32-bit machine.

> Well, the rid_code (reserved word) and IDENTIFIER_LOCAL_VALUE
> (non-reserved identifier) are mutually exclusive, so for example
> we can put them both in the same union.  For example.

Could work.  I believe rid_code is only in struct cpp_hashnode because
there were eight convenient bits of padding going to waste there...

> Yes, saving another word is harder, and may require more radical
> changes.  We can save a number of fields in lang_ident, but I'll
> get to that in a later message.
> 
> But let's start with arg_index.

Fair 'nuff.  I look forward to seeing your patches.  (You're still
listed as a cpplib maintainer, you don't need me to rubber stamp
them.)

zw

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Subject: Removing the arg_index field from cpp_hashnode
  2002-10-30 11:32 Subject: Removing the arg_index field from cpp_hashnode Per Bothner
  2002-10-30 12:05 ` Zack Weinberg
@ 2002-11-07  0:33 ` Neil Booth
  1 sibling, 0 replies; 5+ messages in thread
From: Neil Booth @ 2002-11-07  0:33 UTC (permalink / raw)
  To: Per Bothner; +Cc: gcc

Per Bothner wrote:-

> I suggest we start out by getting rid of the arg_index field of
> cpp_hashnode, which is embedded in all IDENTIFIER_NODE for C/C++/ObjC.
> Getting rid of arg_index directly saves 16 bits; on 32-bit machines it
> saves a whole 4 bytes due to alignment.

I like the idea.
 
> A simple replacement is to add a new flag:
> #define NODE_MACROARG   (1 << 6)  /* Parameter in current macro def. */
> When a macro parameter is seen, the NODE_MACROARG bit is set in the
> flags field of the cpp_hashnode.  The bit is cleared at the end of the
> macro definition, just as we currently clear arg_index.  Testing for
> duplicate parameter names is trivial:  Just check the bit.
> 
> Harder is modifying lex_expansion_token, when we need to set the
> arg_no of the CPP_MACROARG token.  An easy fix is to search the
> argument list linearly looking for a match.  This causes O(M*N)
> behavior, when M is the number of parameters, and N is the number
> of references in the macro definition.  In practice both are likely
> to be small, but on general principles we to avoid quadratic
> algorithms.  And it is easy to avoid it.

I'm not sure it's worth the code complexity of trying to avoid
the O(M*N).  On average it's only M*N/2 comparisons, since matches are
guaranteed, and as you said M and N are normally small.

Does your suggestion handle C++'s alternative tokens in the expansion of
#define correctly, by still recording their special status?

Neil.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2002-11-07  8:30 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-10-30 11:32 Subject: Removing the arg_index field from cpp_hashnode Per Bothner
2002-10-30 12:05 ` Zack Weinberg
2002-10-30 12:13   ` Per Bothner
2002-10-30 12:44     ` Zack Weinberg
2002-11-07  0:33 ` Neil Booth

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).