public inbox for gcc@gcc.gnu.org
* cpplib: redoing the lexer
From: Neil Booth @ 2000-09-05 23:38 UTC (permalink / raw)
  To: gcc; +Cc: Zack Weinberg

This is a rough outline, in stages, of changes I'm planning to make to
the lexer.  Comments are welcome.

1) Return lexing to a forward-looking process rather than
backward-looking.  This will still be single-pass.

2) Stage 1 enables a move from line-at-a-time to token-at-a-time
lexing.

3) Stage 2 enables better tracking of lexer state within the lexer
itself, e.g. by processing directives as they are lexed.  With this, we
should be able to, for example, take various information out of the
header files, optimize the skipping of false conditionals, move the
check for use of poisoned identifiers to one place, and clean up the
handling of <...> in #include directives.

4) Rework memory management, to distinguish better between permanent
allocations and temporary allocations, probably using obstacks.  The
lexer state tracking makes this easier, e.g. the expansion of a macro
when in #define should go into permanent storage.  This should save us
having to re-allocate memory in handlers like do_define.

5) With improved memory management, move towards N-token lookahead
and/or lookback, where N is a fixed constant.  This is useful within
cpplib itself when looking for '(' while testing for macros, but it
should matter more to front-end parsers once cpplib is integrated,
particularly the C++ one, I suspect.

6) Move more functionality into the preprocessor stage, e.g. ISO
string concatenation, and maybe interpretation of integers and / or
floats.

7) (Longer term) I think the above changes should enable us to
describe token streams, for e.g. precompiled headers, in a much more
compact format than the current 16-bytes-per-token + string /
identifier overhead.  I think it should be possible to reach less than
4 bytes per token, with many tokens just being a single "type" byte,
with a special whitespace token.
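
As a rough illustration of the idea in 7), here is a minimal sketch in
C of one possible byte layout.  Everything in it is an assumption made
for illustration: the token kinds, the struct, and the choice of a
2-byte index into some separate name/number table are hypothetical and
are not taken from cpplib.  Punctuators and whitespace cost one "type"
byte; names and numbers cost the type byte plus the index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token kinds; these are not cpplib's real enumerators. */
enum tok_type {
    T_SPACE, T_OPEN_PAREN, T_CLOSE_PAREN, T_PLUS, T_SEMI, T_NAME, T_NUMBER
};

struct tok {
    enum tok_type type;
    uint16_t index;  /* Assumed index into a name/number table;
                        ignored for whitespace and punctuators. */
};

/* Only names and numbers carry a payload after the type byte. */
static int has_payload(enum tok_type t)
{
    return t == T_NAME || t == T_NUMBER;
}

/* Encode N tokens into OUT and return the number of bytes written.
   OUT must have room for 3 * N bytes (the worst case). */
static size_t encode(const struct tok *toks, size_t n, uint8_t *out)
{
    size_t len = 0;
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[len++] = (uint8_t) toks[i].type;
        if (has_payload(toks[i].type)) {
            out[len++] = (uint8_t) (toks[i].index & 0xff); /* low byte */
            out[len++] = (uint8_t) (toks[i].index >> 8);   /* high byte */
        }
    }
    return len;
}

/* Decode LEN bytes from IN into TOKS and return the token count.
   TOKS must have room for LEN tokens (the worst case). */
static size_t decode(const uint8_t *in, size_t len, struct tok *toks)
{
    size_t n = 0, i = 0;
    while (i < len) {
        toks[n].type = (enum tok_type) in[i++];
        toks[n].index = 0;
        if (has_payload(toks[n].type)) {
            toks[n].index = (uint16_t) (in[i] | (in[i + 1] << 8));
            i += 2;
        }
        n++;
    }
    return n;
}
```

Under this layout the ten tokens of `f (a + 1);` (three of them
whitespace) take 3 * 3 + 7 = 16 bytes, against 160 bytes at 16 bytes
per token.  The name and number tables themselves are not counted.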

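Stages 2) and 5) can be sketched together: a forward-scanning lexer
that produces one token per call, with a fixed-size ring buffer for
lookahead.  This is only a sketch under assumptions: the names (`struct
lexer`, `lex_one`, `peek`, `next`, `followed_by_paren`), the token
kinds, and LOOKAHEAD = 2 are all hypothetical, and the lexer ignores
comments, strings, directives and line splicing.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define LOOKAHEAD 2  /* N, a fixed constant; the value is an assumption. */

/* Hypothetical token kinds, not cpplib's. */
enum kind { K_EOF, K_NAME, K_NUMBER, K_OPEN_PAREN, K_OTHER };

struct token {
    enum kind kind;
    const char *start;  /* Points into the input; not NUL-terminated. */
    size_t len;
};

struct lexer {
    const char *cur;              /* Next unread character. */
    struct token buf[LOOKAHEAD];  /* Ring buffer of tokens lexed ahead. */
    size_t head;                  /* Slot of the oldest buffered token. */
    size_t count;                 /* Number of buffered tokens. */
};

static void lexer_init(struct lexer *lx, const char *input)
{
    lx->cur = input;
    lx->head = 0;
    lx->count = 0;
}

/* Lex exactly one token, scanning forward only.  At end of input this
   keeps returning K_EOF without advancing. */
static struct token lex_one(struct lexer *lx)
{
    struct token t;
    while (*lx->cur == ' ' || *lx->cur == '\t')
        lx->cur++;
    t.start = lx->cur;
    unsigned char c = (unsigned char) *lx->cur;
    if (c == '\0') {
        t.kind = K_EOF;
    } else if (isalpha(c) || c == '_') {
        t.kind = K_NAME;
        while (isalnum((unsigned char) *lx->cur) || *lx->cur == '_')
            lx->cur++;
    } else if (isdigit(c)) {
        t.kind = K_NUMBER;
        while (isdigit((unsigned char) *lx->cur))
            lx->cur++;
    } else {
        t.kind = (c == '(') ? K_OPEN_PAREN : K_OTHER;
        lx->cur++;
    }
    t.len = (size_t) (lx->cur - t.start);
    return t;
}

/* Return the token N positions ahead (0 = next) without consuming it. */
static const struct token *peek(struct lexer *lx, size_t n)
{
    assert(n < LOOKAHEAD);
    while (lx->count <= n) {
        lx->buf[(lx->head + lx->count) % LOOKAHEAD] = lex_one(lx);
        lx->count++;
    }
    return &lx->buf[(lx->head + n) % LOOKAHEAD];
}

/* Consume and return the next token. */
static struct token next(struct lexer *lx)
{
    struct token t = *peek(lx, 0);
    lx->head = (lx->head + 1) % LOOKAHEAD;
    lx->count--;
    return t;
}

/* The use mentioned in 5): after reading a name, is the next token '('? */
static int followed_by_paren(struct lexer *lx)
{
    return peek(lx, 0)->kind == K_OPEN_PAREN;
}
```

A caller would read a name with next() and then ask
followed_by_paren() before deciding how to treat it; the '(' stays in
the buffer for the following next() call.
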
Neil.
