* [gcjx] Patch: FYI: parser and lexer changes
@ 2005-10-03 20:44 Tom Tromey
2005-10-04 2:47 ` Ranjit Mathew
0 siblings, 1 reply; 7+ messages in thread
From: Tom Tromey @ 2005-10-03 20:44 UTC (permalink / raw)
To: Java Patch List
I'm checking this in on the gcjx branch.
This weekend I looked into gcjx performance a bit. The appended patch
yields about a 30% speed improvement when building Classpath on my
machine. This is nice, but not nearly good enough; I think I'll
probably just have to throw out the current parser and write one using
a real tool. Most likely, I'll put this off for a while.
FWIW, in the end I didn't use oprofile but instead built a static gcjx
with -pg. That seemed to yield the most useful results.
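For context, the central lexer change below replaces the per-character
reader interface with block reads: the iconv reader now hands the lexer a
whole translated buffer plus a size, and get_raw just indexes into it
until it needs a refill. Here is a minimal sketch of that pattern -- the
names and chunk size are illustrative only, not the actual gcjx classes:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Stand-in for the reader side: instead of one character per virtual
// call, get() returns a pointer to the next chunk and reports its size
// (size 0 means EOF), like the new iconv_ucs2_reader::get (int &).
class chunk_reader
{
  std::string data;      // stands in for the iconv-translated buffer
  std::size_t offset;

public:
  explicit chunk_reader (const std::string &d) : data (d), offset (0) {}

  const char *get (std::size_t &size)
  {
    const std::size_t chunk = 4;   // tiny chunk, for illustration
    size = std::min (chunk, data.size () - offset);
    const char *p = data.data () + offset;
    offset += size;
    return p;
  }
};

// Stand-in for the lexer side: refill only when the local buffer is
// exhausted, mirroring the shape of lexer::get_raw in the patch.
std::string drain (chunk_reader &r)
{
  std::string out;
  const char *buf = nullptr;
  std::size_t n = 0, pos = 0;
  while (true)
    {
      if (pos >= n)
        {
          buf = r.get (n);
          pos = 0;
          if (n == 0)
            break;               // EOF
        }
      out += buf[pos++];
    }
  return out;
}
```

The win is that the virtual call and the bounds bookkeeping happen once
per chunk instead of once per character.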
Tom
Index: ChangeLog
from Tom Tromey <tromey@redhat.com>
* source/lex.cc (get_raw): Increment column.
(get): Don't increment line number when newline comes from unget.
* source/iconv.cc (get): Rewrote.
(iconv_ucs2_reader): Updated.
(refill): Updated.
* source/iconv.hh (iconv_ucs2_reader::get): Updated.
(iconv_ucs2_reader::next): Removed.
* source/ucs2.cc (internal_get): New method.
(get): Rewrote.
* source/ucs2.hh (ucs2_reader::get): Changed prototype.
(fallback_ucs2_reader::val): New field.
(fallback_ucs2_reader::internal_get): Declare.
* source/lex.cc (lexer): Initialize new fields.
(get_raw): Rewrote.
* source/lex.hh (lexer::chars): New field.
(lexer::num_chars): Likewise.
(lexer::position): Likewise.
* scope.cc (resolution_scope): Initialize it.
(update_cache): New method.
(get_current_class): Removed.
(push_scope): New method.
(push_iscope): Moved from .hh file.
* scope.hh (resolution_scope::current_class): New field.
(resolution_scope::update_cache): Declare.
(resolution_scope::get_current_class): Now inline.
(push_iscope::save): New field.
(push_iscope): Outline.
(~push_iscope): New destructor.
(resolution_scope::push_scope): Outline.
* model/ideprecatable.hh (IDeprecatable::set_deprecated): Removed
overload.
* source/tstream.cc (lex_file): Handle javadoc here.
(get_javadoc): Rewrote.
(get_token): Removed.
(peek_token): Removed.
* source/parse.hh (class parse): Updated.
* source/parse.cc (any_method_declarator): Changed argument type
from ref_javadoc to bool.
(field_declarator): Likewise.
(void_method): Likewise.
(member_decl): Likewise.
(class_body_declaration): Updated.
(class_or_interface_declaration): Likewise.
* source/tstream.hh (token_stream::javadoc_is_ok): Removed.
(token_stream): Updated.
(token_stream::deprecated): New field.
(token_stream::get_token): Now inline.
(token_stream::get_javadoc): Changed return type.
(token_stream::peek_token): Now inline.
(token_stream::peek_token1): Likewise.
(token_stream::peek_token): Removed overload.
* source/parse.hh (parse::get): Return reference.
(parse::peek): Likewise.
(parse::peek1): Likewise.
(parse::require): Likewise.
(parse::assume): Likewise.
* source/lex.cc (lex_token): Renamed.
* source/lex.hh (lexer::lex_token): Renamed from get_token. Now
protected.
* source/parse.cc (class_body_declaration): Use get_javadoc.
(class_or_interface_declaration): Likewise.
(require): Use and return reference.
(assume): Likewise.
(identifier): Use reference.
(new_something): Likewise.
(type_dot_class): Likewise.
(class_name_dot_this): Likewise.
(postfix_expression): Likewise.
(unary_expression_not_plus_minus): Likewise.
(unary_expression): Likewise.
(relational_expression): Likewise.
(assignment): Likewise.
(primitive_type): Likewise.
(type_name): Likewise.
(final_or_attributes): Likewise.
(assert_statement): Likewise.
(try_statement): Likewise.
(if_then_statement): Likewise.
(statement): Likewise.
(explicit_constructor_invocation): Likewise.
(type_parameters): Likewise.
(member_value): Likewise.
(any_method_declarator): Likewise.
(void_method): Likewise.
(member_decl): Likewise.
(class_body_declaration): Likewise.
(interface_declaration): Likewise.
(class_declaration): Likewise.
(class_or_interface_declaration): Likewise.
(type_declaration): Likewise.
* source/tstream.cc (get_unfiltered_token): Removed.
(peek_token): Updated.
(peek_token1): Updated.
(get_token): Rewrote.
(lex_file): Added assert.
(peek_token): New method.
(peek_token, peek_token1): Removed.
(class saver): Removed.
(get_javadoc): New method.
* source/tstream.hh (token_stream::get_unfiltered_token):
Removed.
(token_stream::position_type): New typedef.
(token_stream::read_position): Use it.
(token_stream::set_mark): Likewise.
(token_stream::reset_to_mark): Likewise.
(class marker): Likewise.
(token_stream::javadoc_ok): Removed.
(token_stream::peek_token): New overload.
(token_stream::peek_token1): Inline.
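The scope.cc/scope.hh hunks below cache the current class instead of
scanning the scope stack on every get_current_class() call; the cache is
set when a class scope is pushed and recomputed just before the RAII
helper pops one. A rough sketch of the idea, with made-up names (the
real code works on IScope/model_class):

```cpp
#include <deque>

struct scope_entry { bool is_class; };

class scope_stack
{
  std::deque<scope_entry *> stack;
  scope_entry *current_class = nullptr;

  // Recompute the cache before the back entry is popped, like
  // resolution_scope::update_cache in the patch: if we are not popping
  // the cached class, nothing to do; otherwise rescan from below it.
  void update_cache ()
  {
    if (stack.empty () || stack.back () != current_class)
      return;
    current_class = nullptr;
    for (auto i = stack.rbegin () + 1; i != stack.rend (); ++i)
      if ((*i)->is_class)
        {
          current_class = *i;
          break;
        }
  }

public:
  scope_entry *get_current_class () const { return current_class; }

  // RAII pusher, analogous to push_iscope: pushing a class scope
  // updates the cache cheaply; popping triggers update_cache.
  class pusher
  {
    scope_stack *s;

  public:
    pusher (scope_stack *ss, scope_entry *e) : s (ss)
    {
      s->stack.push_back (e);
      if (e->is_class)
        s->current_class = e;
    }

    ~pusher ()
    {
      s->update_cache ();
      s->stack.pop_back ();
    }
  };
};
```

So the common case (querying the current class) is O(1), and the scan
only happens when a class scope is actually popped.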
Index: scope.cc
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/Attic/scope.cc,v
retrieving revision 1.1.2.1
diff -u -r1.1.2.1 scope.cc
--- scope.cc 13 Jan 2005 03:18:33 -0000 1.1.2.1
+++ scope.cc 3 Oct 2005 20:27:32 -0000
@@ -1,6 +1,6 @@
// Scopes.
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
//
// This file is part of GCC.
//
@@ -24,7 +24,8 @@
\f
resolution_scope::resolution_scope ()
- : warning_scope (static_cast<warning_scope *> (global->get_compiler ()))
+ : warning_scope (static_cast<warning_scope *> (global->get_compiler ())),
+ current_class (NULL)
{
}
@@ -55,18 +56,26 @@
return NULL;
}
-model_class *
-resolution_scope::get_current_class () const
+void
+resolution_scope::update_cache ()
{
- for (std::deque<IScope *>::const_reverse_iterator i = scope_stack.rbegin ();
- i != scope_stack.rend ();
- ++i)
+ // Note that this is called before the last item is popped from the
+ // stack, due to how push_iscope is implemented.
+ std::deque<IScope *>::reverse_iterator i = scope_stack.rbegin ();
+ // Return early if stack is empty or if we aren't popping the
+ // current class.
+ if (i == scope_stack.rend () || *i != current_class)
+ return;
+ current_class = NULL;
+ for (++i; i != scope_stack.rend (); ++i)
{
model_class *r = dynamic_cast<model_class *> (*i);
if (r != NULL)
- return r;
+ {
+ current_class = r;
+ break;
+ }
}
- return NULL;
}
ICatcher *
@@ -224,3 +233,22 @@
scope_stack.back ()->add_binding (klass);
}
+
+void
+resolution_scope::push_scope (IScope *i)
+{
+ scope_stack.push_back (i);
+ model_class *r = dynamic_cast<model_class *> (i);
+ if (r != NULL)
+ current_class = r;
+}
+
+resolution_scope::push_iscope::push_iscope (resolution_scope *scope,
+ IScope *is)
+ : stack_temporary<IScope *> (scope->scope_stack, is),
+ save (scope)
+{
+ model_class *r = dynamic_cast<model_class *> (is);
+ if (r != NULL)
+ scope->current_class = r;
+}
Index: scope.hh
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/Attic/scope.hh,v
retrieving revision 1.1.2.1
diff -u -r1.1.2.1 scope.hh
--- scope.hh 13 Jan 2005 03:18:33 -0000 1.1.2.1
+++ scope.hh 3 Oct 2005 20:27:32 -0000
@@ -1,6 +1,6 @@
// Scopes used during resolution.
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
//
// This file is part of GCC.
//
@@ -35,6 +35,12 @@
// Stack of scope objects.
std::deque<IScope *> scope_stack;
+ // Cache the current class.
+ model_class *current_class;
+
+ // Update the class cache before popping a scope.
+ void update_cache ();
+
public:
resolution_scope ();
@@ -75,7 +81,10 @@
// Return the class we are currently resolving. This will return
// NULL outside of a class context, for instance while resolving
// import declarations.
- model_class *get_current_class () const;
+ model_class *get_current_class () const
+ {
+ return current_class;
+ }
model_method *get_current_method () const;
@@ -88,21 +97,23 @@
/// Note the asymmetric API: we can push a scope on the stack with
/// this method, but there is no way to pop a scope.
- void push_scope (IScope *i)
- {
- scope_stack.push_back (i);
- }
+ void push_scope (IScope *);
+
/// This class is used, via the RAII idiom, to push a new IScope on
/// the current resolution_scope, and then later remove it when this
/// object is destroyed.
class push_iscope : public stack_temporary<IScope *>
{
+ resolution_scope *save;
+
public:
- push_iscope (resolution_scope *scope, IScope *is)
- : stack_temporary<IScope *> (scope->scope_stack, is)
+ push_iscope (resolution_scope *, IScope *);
+
+ ~push_iscope ()
{
+ save->update_cache ();
}
};
};
Index: model/ideprecatable.hh
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/model/Attic/ideprecatable.hh,v
retrieving revision 1.1.2.1
diff -u -r1.1.2.1 ideprecatable.hh
--- model/ideprecatable.hh 13 Jan 2005 03:18:36 -0000 1.1.2.1
+++ model/ideprecatable.hh 3 Oct 2005 20:27:32 -0000
@@ -1,6 +1,6 @@
// "deprecatable" interface.
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
//
// This file is part of GCC.
//
@@ -48,14 +48,6 @@
{
}
- /// This can be used to set the deprecation state based on a
- /// javadoc comment.
- void set_deprecated (const ref_javadoc &javadoc)
- {
- if (javadoc)
- deprecated = javadoc->deprecated_p ();
- }
-
/// Set the deprecation state of this object.
void set_deprecated (bool d)
{
Index: source/iconv.cc
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/iconv.cc,v
retrieving revision 1.1.2.2
diff -u -r1.1.2.2 iconv.cc
--- source/iconv.cc 3 Oct 2005 17:38:09 -0000 1.1.2.2
+++ source/iconv.cc 3 Oct 2005 20:27:32 -0000
@@ -48,7 +48,6 @@
assert (handle != (iconv_t) -1);
last = 0;
- next = 0;
}
iconv_ucs2_reader::~iconv_ucs2_reader ()
@@ -60,8 +59,6 @@
void
iconv_ucs2_reader::refill ()
{
- assert (last == next);
-
size_t in_rem = end - curr;
char *out_loc = (char *) translated;
size_t out_rem = sizeof (translated);
@@ -86,23 +83,15 @@
}
}
- // Now update our internal state to reflect the current situation.
- next = 0;
-
// A UCS-2 character is always 2 bytes.
assert ((out_loc - (char *) translated) % 2 == 0);
last = (out_loc - (char *) translated) / 2;
}
-unicode_w_t
-iconv_ucs2_reader::get ()
+jchar *
+iconv_ucs2_reader::get (int &size)
{
- if (next == last)
- {
- if (curr == end)
- return UNICODE_EOF;
- refill ();
- assert (next != last);
- }
- return translated[next++];
+ refill ();
+ size = last;
+ return translated;
}
Index: source/iconv.hh
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/iconv.hh,v
retrieving revision 1.1.2.1
diff -u -r1.1.2.1 iconv.hh
--- source/iconv.hh 13 Jan 2005 03:18:37 -0000 1.1.2.1
+++ source/iconv.hh 3 Oct 2005 20:27:32 -0000
@@ -1,6 +1,6 @@
// iconv-based reader.
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
//
// This file is part of GCC.
//
@@ -35,9 +35,6 @@
// One past the last valid character in TRANSLATED.
int last;
- // Index of the next valid character.
- int next;
-
// Name of the input encoding.
std::string input_encoding;
@@ -50,7 +47,7 @@
~iconv_ucs2_reader ();
- unicode_w_t get ();
+ jchar *get (int &);
};
#endif // GCJX_SOURCE_ICONV_HH
Index: source/lex.cc
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/lex.cc,v
retrieving revision 1.1.2.4
diff -u -r1.1.2.4 lex.cc
--- source/lex.cc 3 Oct 2005 17:38:09 -0000 1.1.2.4
+++ source/lex.cc 3 Oct 2005 20:27:32 -0000
@@ -42,7 +42,10 @@
filename (file),
line (1),
column (0),
- tab_width (global->get_compiler ()->get_tab_width ())
+ tab_width (global->get_compiler ()->get_tab_width ()),
+ chars (NULL),
+ num_chars (0),
+ position (0)
{
}
@@ -215,30 +218,27 @@
{
c = unget_value;
unget_value = UNICODE_W_NONE;
+ return c;
}
- else
+
+ if (position >= num_chars)
{
try
{
- c = input_filter->get ();
+ chars = input_filter->get (num_chars);
+ position = num_chars > 0 ? 0 : -1;
}
catch (conversion_error &exc)
{
exc.set_location (here ());
throw exc;
}
-
- if (c == UNICODE_TAB)
- {
- // Advance to next multiple of tab width. Note we don't
- // subtract one from tab_width since we start columns at
- // zero.
- column = ((column + tab_width) / tab_width) * tab_width;
- }
- else
- ++column;
}
- return c;
+
+ if (position == -1)
+ return UNICODE_EOF;
+ ++column;
+ return chars[position++];
}
unicode_w_t
@@ -300,10 +300,12 @@
lexer::get ()
{
unicode_w_t c;
+ bool can_incr = true;
if (cooked_unget_value != UNICODE_W_NONE)
{
c = cooked_unget_value;
cooked_unget_value = UNICODE_W_NONE;
+ can_incr = false;
}
else
c = read_handling_escapes ();
@@ -322,7 +324,7 @@
c = UNICODE_LINE_FEED;
}
- if (c == UNICODE_LINE_FEED)
+ if (can_incr && c == UNICODE_LINE_FEED)
{
// Note that it is somewhat bogus for this to be here, since it
// means an escape like \u00a0 will increment the line number.
@@ -1300,7 +1302,7 @@
}
token
-lexer::get_token ()
+lexer::lex_token ()
{
token result;
location where;
Index: source/lex.hh
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/lex.hh,v
retrieving revision 1.1.2.3
diff -u -r1.1.2.3 lex.hh
--- source/lex.hh 3 Oct 2005 17:38:09 -0000 1.1.2.3
+++ source/lex.hh 3 Oct 2005 20:27:32 -0000
@@ -64,6 +64,12 @@
// the compiler.
int tab_width;
+ // Buffer of characters from the input reader, its size, and our
+ // position in it.
+ jchar *chars;
+ int num_chars;
+ int position;
+
// Some methods to characterize characters.
bool identifier_part_p (unicode_w_t);
bool identifier_start_p (unicode_w_t);
@@ -152,15 +158,17 @@
// Internal interface to the lexer; does most of the work.
token get_token_internal (location &);
+protected:
+
+ // Return a single token.
+ token lex_token ();
+
public:
lexer (ucs2_reader *source, const char *file);
virtual ~lexer ();
- // Return a single token.
- token get_token ();
-
// Return the name of a token.
static const char *token_to_string (token_value);
Index: source/parse.cc
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/parse.cc,v
retrieving revision 1.1.2.5
diff -u -r1.1.2.5 parse.cc
--- source/parse.cc 20 Sep 2005 17:11:45 -0000 1.1.2.5
+++ source/parse.cc 3 Oct 2005 20:27:33 -0000
@@ -79,10 +79,10 @@
\f
-inline token
+inline token &
parse::require (token_value val, const char *message)
{
- token t = get ();
+ token &t = get ();
if (t != val)
{
if (message)
@@ -93,10 +93,10 @@
return t;
}
-inline token
+inline token &
parse::assume (token_value val)
{
- token t = get ();
+ token &t = get ();
assert (t == val);
return t;
}
@@ -180,7 +180,7 @@
std::string
parse::identifier ()
{
- token t = require (TOKEN_IDENTIFIER, "identifier expected");
+ token &t = require (TOKEN_IDENTIFIER, "identifier expected");
return ((ref_identifier) t)->get_identifier ();
}
@@ -343,7 +343,7 @@
// since in this case we must search the primary's class for the
// member type named Name; instead we handle this when resolving the
// 'new'.
- token t = peek ();
+ token &t = peek ();
if (t == TOKEN_OPEN_PAREN || primary_dot_new || require_class)
{
std::list<ref_expression> args (arguments ());
@@ -421,7 +421,7 @@
{
ref_forwarding_type the_type (type ());
require (TOKEN_DOT);
- token t = require (TOKEN_CLASS);
+ token &t = require (TOKEN_CLASS);
return ref_expression (new model_class_ref (t, the_type));
}
@@ -432,7 +432,7 @@
{
std::string name = identifier ();
require (TOKEN_DOT);
- token t = require (TOKEN_THIS);
+ token &t = require (TOKEN_THIS);
std::list<std::string> fake_qual;
fake_qual.push_back (name);
ref_forwarding_type fwdt (new model_forwarding_simple (t, fake_qual));
@@ -534,7 +534,7 @@
parse::primary ()
{
ref_expression result;
- token t;
+ token t; // FIXME
// Introduce a new scope so the marker lives just as long as we may
// need it.
@@ -753,7 +753,7 @@
ref_expression result (primary ());
while (true)
{
- token t = peek ();
+ token &t = peek ();
if (t == TOKEN_INCREMENT)
{
assume (TOKEN_INCREMENT);
@@ -782,7 +782,7 @@
parse::unary_expression_not_plus_minus ()
{
ref_unary op;
- token t = peek ();
+ token &t = peek ();
switch (t.get_type ())
{
case TOKEN_BITWISE_NOT:
@@ -841,7 +841,7 @@
parse::unary_expression ()
{
ref_unary op;
- token t = peek ();
+ token &t = peek ();
switch (t.get_type ())
{
case TOKEN_INCREMENT:
@@ -864,13 +864,13 @@
assume (t.get_type ());
if (t == TOKEN_MINUS)
{
- t = peek ();
+ token &t2 = peek ();
// This special case circumvents the value checking in the
// ordinary case, allowing the largest negative value through.
- if (t == TOKEN_DECIMAL_INT_LIT || t == TOKEN_DECIMAL_LONG_LIT)
+ if (t2 == TOKEN_DECIMAL_INT_LIT || t2 == TOKEN_DECIMAL_LONG_LIT)
{
- assume (t.get_type ());
- op->set_expression (ref_expression (t));
+ assume (t2.get_type ());
+ op->set_expression (ref_expression (t2));
return op;
}
}
@@ -942,7 +942,7 @@
while (true)
{
ref_binary op;
- token t = peek ();
+ token &t = peek ();
switch (t.get_type ())
{
case TOKEN_LESS_THAN:
@@ -1113,7 +1113,7 @@
parse::assignment ()
{
ref_expression lhs (left_hand_side ());
- token t = get ();
+ token &t = get ();
if (! assignment_op_p (t.get_type ()))
throw syntax_error (t, "assignment operator expected");
ref_expression rhs (assignment_expression ());
@@ -1210,7 +1210,7 @@
ref_forwarding_type
parse::primitive_type ()
{
- token t = get ();
+ token &t = get ();
if (! basic_type_p (t.get_type ()))
throw syntax_error (t, "primitive type expected");
ref_forwarding_type pt
@@ -1223,7 +1223,7 @@
ref_forwarding_type
parse::type_name (int &lt_depth)
{
- token t = peek ();
+ token t = peek (); // FIXME
if (basic_type_p (t.get_type ()))
{
get ();
@@ -1243,7 +1243,7 @@
while (true)
{
ids.push_back (identifier ());
- token t = peek ();
+ token t = peek (); // FIXME
// '<' means we're starting the parameters of a generic type.
if (t == TOKEN_LESS_THAN)
@@ -1438,7 +1438,7 @@
{
while (true)
{
- token t = peek ();
+ token &t = peek ();
if (t == TOKEN_FINAL)
{
// FIXME: if (is_final) throw ...
@@ -1515,7 +1515,7 @@
ref_assert
parse::assert_statement ()
{
- token t = assume (TOKEN_ASSERT);
+ token &t = assume (TOKEN_ASSERT);
ref_assert result (new model_assert (t));
result->set_expression (expression ());
@@ -1794,7 +1794,7 @@
ref_try
parse::try_statement ()
{
- token t = assume (TOKEN_TRY);
+ token &t = assume (TOKEN_TRY);
ref_try result (new model_try (t));
result->set_block (block (new model_block (t))); // FIXME location
@@ -1842,7 +1842,7 @@
result->set_true_branch (statement ());
- token t = peek ();
+ token &t = peek ();
// The simple rule is, the inner-most "if" gets the "else".
// We don't need to handle the no-short-if case specially.
if (t == TOKEN_ELSE)
@@ -1866,6 +1866,7 @@
void
parse::switch_block_statement_groups (const ref_switch &result_switch)
{
+ // FIXME
for (token t = peek (); t == TOKEN_CASE || t == TOKEN_DEFAULT; t = peek ())
{
ref_switch_block swb (new model_switch_block (t));
@@ -1952,7 +1953,7 @@
ref_stmt
parse::statement ()
{
- token t = peek ();
+ token &t = peek ();
switch (t.get_type ())
{
@@ -2089,6 +2090,7 @@
std::list<ref_stmt> result;
// This function is also used when parsing a switch block.
+ // FIXME
for (token t = peek ();
t != TOKEN_CLOSE_BRACE && t != TOKEN_CASE && t != TOKEN_DEFAULT;
t = peek ())
@@ -2129,7 +2131,7 @@
{
ref_modifier_list mods (new model_modifier_list ());
- for (token t = peek ();
+ for (token t = peek (); // FIXME
model_modifier_list::modifier_token_p (t);
t = peek ())
{
@@ -2154,7 +2156,7 @@
if (peek () == TOKEN_LESS_THAN)
type_params = actual_type_parameters ();
- token t = peek ();
+ token &t = peek ();
bool need_default = false;
switch (t.get_type ())
@@ -2325,18 +2327,18 @@
std::list<ref_type_variable> result;
int lt_depth = 1;
- token t;
do
{
result.push_back (type_parameter (lt_depth));
assert (lt_depth <= 1);
if (! lt_depth)
break;
- t = peek ();
- if (t == TOKEN_COMMA)
- assume (TOKEN_COMMA);
+ token &t = peek ();
+ if (t != TOKEN_COMMA)
+ break;
+ assume (TOKEN_COMMA);
}
- while (t == TOKEN_COMMA);
+ while (true);
if (lt_depth)
require (TOKEN_GREATER_THAN);
@@ -2350,7 +2352,7 @@
ref_expression
parse::member_value ()
{
- token t = peek ();
+ token &t = peek ();
if (t == TOKEN_OPEN_BRACE)
{
assume (TOKEN_OPEN_BRACE);
@@ -2397,7 +2399,7 @@
{
assume (TOKEN_OPEN_PAREN);
- token t = peek ();
+ token t = peek (); // FIXME
if (t == TOKEN_CLOSE_PAREN)
{
// Marker annotation, do nothing.
@@ -2480,7 +2482,7 @@
const ref_modifier_list &mods,
int flags,
ref_forwarding_type member_type,
- const ref_javadoc &javadoc)
+ bool deprecated)
{
std::string id;
bool is_interface = enclosing_class ()->interface_p ();
@@ -2498,7 +2500,7 @@
result->set_return_type (member_type);
result->set_annotations (annos);
- result->set_deprecated (javadoc);
+ result->set_deprecated (deprecated);
result->set_name (id);
bool dots_found = false;
@@ -2536,7 +2538,7 @@
result->set_throws (type_list ());
}
- token t = peek ();
+ token &t = peek ();
if (t == TOKEN_OPEN_BRACE)
{
if (is_interface)
@@ -2562,7 +2564,7 @@
const ref_forwarding_type &member_type,
const std::list<ref_annotation> &annos,
const ref_modifier_list &mods,
- const ref_javadoc &javadoc)
+ bool deprecated)
{
// We can have several fields from a single declarator.
std::list<ref_variable_decl> vars = variable_declarators (member_type);
@@ -2584,7 +2586,7 @@
ref_field result (new model_field (*i));
result->set_annotations (annos);
result->set_modifiers (mods);
- result->set_deprecated (javadoc);
+ result->set_deprecated (deprecated);
klass->add (result);
}
}
@@ -2593,13 +2595,13 @@
parse::void_method (const ref_class &klass,
const std::list<ref_annotation> &annos,
const ref_modifier_list &mods,
- const ref_javadoc &javadoc)
+ bool deprecated)
{
- token t = require (TOKEN_VOID);
+ token &t = require (TOKEN_VOID);
ref_forwarding_type voidtype
= new model_forwarding_resolved (t, primitive_void_type);
ref_method decl = any_method_declarator (annos, mods, METHOD_VOID,
- voidtype, javadoc);
+ voidtype, deprecated);
klass->add (decl);
return decl;
}
@@ -2616,21 +2618,21 @@
parse::member_decl (const ref_class &klass,
const std::list<ref_annotation> &annos,
const ref_modifier_list &mods,
- const ref_javadoc &javadoc,
+ bool deprecated,
bool parse_annotation)
{
- token t = peek ();
+ token &t = peek ();
// You can't have a void method in an annotation; we throw an
// exception for this later.
if (! parse_annotation && t == TOKEN_VOID)
{
- void_method (klass, annos, mods, javadoc);
+ void_method (klass, annos, mods, deprecated);
return;
}
else if (t == TOKEN_CLASS)
{
ref_class member = class_declaration (mods);
- member->set_deprecated (javadoc);
+ member->set_deprecated (deprecated);
member->set_annotations (annos);
klass->add (member);
return;
@@ -2638,7 +2640,7 @@
else if (t == TOKEN_INTERFACE)
{
ref_class member = interface_declaration (mods);
- member->set_deprecated (javadoc);
+ member->set_deprecated (deprecated);
member->set_annotations (annos);
klass->add (member);
return;
@@ -2646,7 +2648,7 @@
else if (t == TOKEN_ENUM)
{
ref_class member = enum_declaration (mods);
- member->set_deprecated (javadoc);
+ member->set_deprecated (deprecated);
member->set_annotations (annos);
klass->add (member);
return;
@@ -2654,7 +2656,7 @@
else if (t == TOKEN_AT)
{
ref_class member = annotation_type_declaration (mods);
- member->set_deprecated (javadoc);
+ member->set_deprecated (deprecated);
member->set_annotations (annos);
klass->add (member);
return;
@@ -2673,7 +2675,7 @@
// is not a "type".
if (peek () == TOKEN_VOID)
{
- ref_method m = void_method (klass, annos, mods, javadoc);
+ ref_method m = void_method (klass, annos, mods, deprecated);
m->set_type_parameters (type_parms);
return;
}
@@ -2683,7 +2685,7 @@
int method_flags = 0;
ref_forwarding_type member_type;
- token pt = peek ();
+ token &pt = peek ();
if (! parse_annotation
&& pt == TOKEN_IDENTIFIER
&& ((ref_identifier) pt)->get_identifier () == klass->get_name ()
@@ -2704,7 +2706,7 @@
if (parse_annotation)
{
assert (! must_be_method);
- // We just ignore JAVADOC here. Perhaps we should give a
+ // We just ignore javadoc here. Perhaps we should give a
// warning if it had @deprecated, since that is just
// ignored.
annotation_type_member (klass, mods, member_type);
@@ -2712,14 +2714,14 @@
else
{
ref_method meth (any_method_declarator (annos, mods, method_flags,
- member_type, javadoc));
+ member_type, deprecated));
if (! type_parms.empty ())
meth->set_type_parameters (type_parms);
klass->add (meth);
}
}
else
- field_declarator (klass, member_type, annos, mods, javadoc);
+ field_declarator (klass, member_type, annos, mods, deprecated);
}
// Helper function for class_body_declaration. This handles static
@@ -2754,7 +2756,7 @@
parse::class_body_declaration (const ref_class &result,
bool parse_annotation)
{
- token t = peek ();
+ token &t = peek ();
if (t == TOKEN_SEMI)
{
assume (TOKEN_SEMI);
@@ -2771,24 +2773,17 @@
// Javadoc is valid before the modifiers of a member. Before a
// static block we can just ignore the javadoc as it is an ordinary
// comment.
- our_token_stream->javadoc_ok ();
- ref_javadoc javadoc;
- t = peek ();
- if (t == TOKEN_JAVADOC)
- {
- assume (TOKEN_JAVADOC);
- javadoc = t;
- }
+ bool deprecated = our_token_stream->get_javadoc ();
// Always read modifiers at this point, since "static" is a valid
// modifier. Anything incorrect we diagnose later.
ref_modifier_list mods = modifiers_opt ();
- t = peek ();
- if (! parse_annotation && t == TOKEN_OPEN_BRACE)
+ token t2 = peek ();
+ if (! parse_annotation && t2 == TOKEN_OPEN_BRACE)
class_body_block (annos, mods);
else
- member_decl (result, annos, mods, javadoc, parse_annotation);
+ member_decl (result, annos, mods, deprecated, parse_annotation);
}
// class-body:
@@ -2845,7 +2840,7 @@
set_containing_class (result);
result->set_modifiers (mods);
- token t = peek ();
+ token &t = peek ();
if (global->get_compiler ()->feature_generics () && t == TOKEN_LESS_THAN)
{
@@ -2880,7 +2875,7 @@
set_containing_class (result);
result->set_modifiers (mods);
- token t = peek ();
+ token &t = peek ();
if (global->get_compiler ()->feature_generics () && t == TOKEN_LESS_THAN)
{
@@ -3018,14 +3013,7 @@
ref_class result;
// Javadoc is valid before the modifiers of a class.
- our_token_stream->javadoc_ok ();
- ref_javadoc javadoc;
- token t = peek ();
- if (t == TOKEN_JAVADOC)
- {
- assume (TOKEN_JAVADOC);
- javadoc = t;
- }
+ bool deprecated = our_token_stream->get_javadoc ();
// '@' could represent the start of annotations, or it could be
// '@interface'.
@@ -3034,7 +3022,7 @@
annos = annotations ();
ref_modifier_list mods = modifiers_opt ();
- t = peek ();
+ token &t = peek ();
if (t == TOKEN_CLASS)
result = class_declaration (mods);
else if (t == TOKEN_INTERFACE)
@@ -3047,7 +3035,7 @@
throw syntax_error (t, "%<class%>, %<interface%>, or %<enum%> "
"declaration expected");
- result->set_deprecated (javadoc);
+ result->set_deprecated (deprecated);
result->set_annotations (annos);
return result;
@@ -3060,7 +3048,7 @@
ref_class
parse::type_declaration ()
{
- token t = peek ();
+ token &t = peek ();
if (t == TOKEN_SEMI)
{
assume (TOKEN_SEMI);
@@ -3143,7 +3131,7 @@
if (peek () == TOKEN_AT)
annos = annotations ();
- token t = peek ();
+ token t = peek (); // FIXME
if (t == TOKEN_PACKAGE)
{
// FIXME: we make a bogus new element just to hold the location
Index: source/parse.hh
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/parse.hh,v
retrieving revision 1.1.2.2
diff -u -r1.1.2.2 parse.hh
--- source/parse.hh 3 Oct 2005 17:38:09 -0000 1.1.2.2
+++ source/parse.hh 3 Oct 2005 20:27:33 -0000
@@ -60,30 +60,30 @@
const std::string &);
// Fetch the next token.
- token get ()
+ token &get ()
{
return our_token_stream->get_token ();
}
// Peek ahead a token.
- token peek ()
+ token &peek ()
{
return our_token_stream->peek_token ();
}
// Peek ahead two tokens.
- token peek1 ()
+ token &peek1 ()
{
return our_token_stream->peek_token1 ();
}
// Require that the next token be of a certain type. If not, throw
// a parse exception.
- token require (token_value val, const char *message = NULL);
+ token &require (token_value val, const char *message = NULL);
// Require that the next token will be a certain type. If not,
// abort.
- token assume (token_value);
+ token &assume (token_value);
// Classify tokens.
static bool parse::basic_type_p (token_value);
@@ -179,21 +179,21 @@
const ref_modifier_list &,
int,
ref_forwarding_type,
- const ref_javadoc &);
+ bool);
void field_declarator (const ref_class &,
const ref_forwarding_type &,
const std::list<ref_annotation> &,
const ref_modifier_list &,
- const ref_javadoc &);
+ bool);
void member_decl (const ref_class &,
const std::list<ref_annotation> &,
const ref_modifier_list &,
- const ref_javadoc &,
+ bool,
bool);
ref_method void_method (const ref_class &,
const std::list<ref_annotation> &,
const ref_modifier_list &,
- const ref_javadoc &);
+ bool);
void class_body_block (const std::list<ref_annotation> &,
const ref_modifier_list &);
void class_body_declaration (const ref_class &, bool);
Index: source/tstream.cc
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/tstream.cc,v
retrieving revision 1.1.2.2
diff -u -r1.1.2.2 tstream.cc
--- source/tstream.cc 29 Sep 2005 00:54:48 -0000 1.1.2.2
+++ source/tstream.cc 3 Oct 2005 20:27:33 -0000
@@ -28,89 +28,21 @@
token r;
do
{
- r = lexer::get_token ();
- buffer.push_back (r);
- }
- while (r != TOKEN_EOF);
-}
-
-token
-token_stream::get_unfiltered_token ()
-{
- if (read_position < buffer.size ())
- ++read_position;
- return buffer[read_position - 1];
-}
-
-token
-token_stream::get_token ()
-{
- token r;
- while (true)
- {
- r = get_unfiltered_token ();
+ r = lex_token ();
// In theory the lexer will never return a repeat or invalid
// token.
assert (r != TOKEN_REPEAT && r != TOKEN_INVALID);
- if (r != TOKEN_JAVADOC)
- break;
- if (javadoc_is_ok)
- {
- // Look ahead -- if the next token is javadoc, we discard
- // this one.
- token n = peek_token ();
- if (n != TOKEN_JAVADOC)
- break;
- }
+ if (r == TOKEN_JAVADOC)
+ deprecated.insert (buffer.size ());
+ else
+ buffer.push_back (r);
}
-
- javadoc_is_ok = false;
- return r;
-}
-
-template<typename T>
-class saver
-{
- T &location;
- T save;
-
-public:
-
- saver (T &loc, T new_val)
- : location (loc),
- save (loc)
- {
- location = new_val;
- }
-
- saver (T &loc)
- : location (loc),
- save (loc)
- {
- }
-
- ~saver ()
- {
- location = save;
- }
-};
-
-token
-token_stream::peek_token ()
-{
- saver<bool> save_jd (javadoc_is_ok);
- saver<int> save_read (read_position);
- token r = get_token ();
- return r;
+ while (r != TOKEN_EOF);
}
-token
-token_stream::peek_token1 ()
+bool
+token_stream::get_javadoc ()
{
- saver<bool> save_jd (javadoc_is_ok);
- saver<int> save_read (read_position);
- get_token ();
- token r = get_token ();
- return r;
+ return deprecated.find (read_position) != deprecated.end ();
}
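The token-stream rewrite above drops javadoc tokens from the token
buffer entirely: while lexing the whole file up front, the position a
javadoc comment precedes is recorded in a set, and get_javadoc() becomes
a membership test at the current read position. A toy sketch of that
shape (illustrative names; the real code stores token objects and checks
the comment's deprecation state):

```cpp
#include <cstddef>
#include <deque>
#include <set>
#include <string>

struct simple_stream
{
  std::deque<std::string> buffer;     // filtered tokens, no javadoc
  std::set<std::size_t> deprecated;   // positions a deprecating comment preceded
  std::size_t read_position = 0;

  // Called during the up-front lexing pass: a deprecating javadoc is
  // not buffered; instead the position of the following token is noted.
  void add_token (const std::string &tok, bool javadoc_deprecated)
  {
    if (javadoc_deprecated)
      deprecated.insert (buffer.size ());
    buffer.push_back (tok);
  }

  const std::string &get_token () { return buffer[read_position++]; }

  // True if the next token was preceded by a deprecating javadoc,
  // like token_stream::get_javadoc in the patch.
  bool get_javadoc () const
  {
    return deprecated.find (read_position) != deprecated.end ();
  }
};
```

This removes the javadoc_is_ok mode flag and the save/restore dance that
peeking previously required.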
Index: source/tstream.hh
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/tstream.hh,v
retrieving revision 1.1.2.2
diff -u -r1.1.2.2 tstream.hh
--- source/tstream.hh 29 Sep 2005 00:54:48 -0000 1.1.2.2
+++ source/tstream.hh 3 Oct 2005 20:27:33 -0000
@@ -32,28 +32,29 @@
// filter the actual tokens according to what the parser requires.
class token_stream : public lexer
{
+ typedef unsigned int position_type;
+
// Our buffer.
std::deque<token> buffer;
// Position of next unread element in buffer.
- int read_position;
-
+ position_type read_position;
- // True if the parser can usefully interpret a javadoc comment as
- // the next token. When false, we filter out such comments.
- bool javadoc_is_ok;
+ // This holds the location of all tokens that were preceded by a
+ // javadoc containing 'deprecated'.
+ std::set<position_type> deprecated;
// Set a mark at the current point. Only called by marker class.
// Returns the position.
- int set_mark ()
+ position_type set_mark ()
{
return read_position;
}
// Reset the read pointer to a position. Only called by marker
// class.
- void reset_to_mark (int where)
+ void reset_to_mark (position_type where)
{
read_position = where;
}
@@ -61,9 +62,6 @@
friend class marker;
- // Return a token before any filtering is applied.
- token get_unfiltered_token ();
-
// Lex the file and fill our buffer.
void lex_file ();
@@ -71,8 +69,7 @@
token_stream (ucs2_reader *source, const char *file)
: lexer (source, file),
- read_position (0),
- javadoc_is_ok (false)
+ read_position (0)
{
lex_file ();
}
@@ -81,23 +78,28 @@
{
}
- // Indicate that it is ok for the next token to be TOKEN_JAVADOC.
- // The token stream imposes some restraints on this that conform to
- // javadoc processing.
- void javadoc_ok ()
+ // Fetch the next token. Javadoc tokens will be ignored.
+ token &get_token ()
{
- javadoc_is_ok = true;
+ return buffer[read_position++];
}
- // Fetch the next token.
- token get_token ();
+ // Return true if the token at the current read position was
+ // preceded by a javadoc comment containing 'deprecated'.
+ bool get_javadoc ();
// Look at the next token without committing to it.
- token peek_token ();
+ token &peek_token ()
+ {
+ return buffer[read_position];
+ }
// Look ahead two tokens without committing. This is ad hoc, but it
- // seems to be all the parser requires.
- token peek_token1 ();
+ // is all the parser requires.
+ token &peek_token1 ()
+ {
+ return buffer[read_position + 1];
+ }
};
// This marks a location in a token stream buffer and ensures that no
@@ -105,7 +107,7 @@
class marker
{
// The location of this mark.
- int location;
+ token_stream::position_type location;
// Associated token stream.
token_stream *stream;
Index: source/ucs2.cc
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/ucs2.cc,v
retrieving revision 1.1.2.2
diff -u -r1.1.2.2 ucs2.cc
--- source/ucs2.cc 3 Oct 2005 17:38:09 -0000 1.1.2.2
+++ source/ucs2.cc 3 Oct 2005 20:27:33 -0000
@@ -50,8 +50,8 @@
\f
-unicode_w_t
-fallback_ucs2_reader::get ()
+jchar
+fallback_ucs2_reader::internal_get ()
{
int c = get_uint8 ();
@@ -103,6 +103,14 @@
% here ();
}
+jchar *
+fallback_ucs2_reader::get (int &len)
+{
+ val = internal_get ();
+ len = 1;
+ return &val;
+}
+
\f
std::string
Index: source/ucs2.hh
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/ucs2.hh,v
retrieving revision 1.1.2.2
diff -u -r1.1.2.2 ucs2.hh
--- source/ucs2.hh 3 Oct 2005 17:38:09 -0000 1.1.2.2
+++ source/ucs2.hh 3 Oct 2005 20:27:33 -0000
@@ -66,11 +66,7 @@
virtual ~ucs2_reader ();
- /// @return the next UCS-2 character. Return UNICODE_EOF on
- /// end-of-file. Throws conversion_error on any kind of character
- /// conversion error; the caller is expected to set the location on
- /// this exception.
- virtual unicode_w_t get () = 0;
+ virtual jchar *get (int &size) = 0;
/// Gets the current position of this reader within the input data.
virtual int get_posn ()
@@ -92,12 +88,16 @@
// Assume the input is utf-8.
class fallback_ucs2_reader : public ucs2_reader
{
+ jchar val;
+
// Just a dumb helper.
bool in_range_p (unicode_w_t v, unicode_w_t min, unicode_w_t max)
{
return v >= min && v <= max;
}
+ jchar internal_get ();
+
public:
fallback_ucs2_reader (byte_buffer *in)
@@ -105,7 +105,7 @@
{
}
- unicode_w_t get ();
+ jchar *get (int &size);
};
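[Editorial note: the deprecated-position set introduced by this patch can be shown in isolation. The sketch below is not gcjx code; `toy_token_stream` and its members are invented stand-ins that only mirror the shape of the change: a token buffer with javadoc filtered out at lex time, a set of buffer positions whose token followed a '@deprecated' javadoc, and a `get_javadoc ()` query against the current read position.]

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <string>

// Minimal stand-in for the pattern in the patch: instead of keeping
// javadoc tokens in the stream (and a javadoc_is_ok flag), the lexer
// records which buffer positions were preceded by a deprecating
// javadoc, and the parser asks about the current position.
struct toy_token_stream
{
  std::deque<std::string> buffer;     // tokens, javadoc filtered out
  std::set<unsigned int> deprecated;  // positions preceded by @deprecated
  unsigned int read_position = 0;

  // Called while lexing: record the token, remembering its position
  // if its javadoc contained 'deprecated'.
  void push (const std::string &tok, bool was_deprecated)
  {
    if (was_deprecated)
      deprecated.insert (buffer.size ());
    buffer.push_back (tok);
  }

  const std::string &get_token () { return buffer[read_position++]; }

  // True if the next token was preceded by a deprecating javadoc.
  bool get_javadoc () const
  {
    return deprecated.find (read_position) != deprecated.end ();
  }
};
```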
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [gcjx] Patch: FYI: parser and lexer changes
2005-10-03 20:44 [gcjx] Patch: FYI: parser and lexer changes Tom Tromey
@ 2005-10-04 2:47 ` Ranjit Mathew
2005-10-04 3:06 ` Tom Tromey
0 siblings, 1 reply; 7+ messages in thread
From: Ranjit Mathew @ 2005-10-04 2:47 UTC (permalink / raw)
To: tromey; +Cc: GCJ Patches
[-- Attachment #1: Type: text/plain, Size: 1670 bytes --]
Tom Tromey wrote:
>
> This weekend I looked into gcjx performance a bit. The appended patch
> yields about a 30% speed improvement when building Classpath on my
> machine. This is nice, but not nearly good enough; I think I'll
You call a 30% speedup not good enough? Whoa!
> probably just have to throw out the current parser and write one using
> a real tool. Most likely, I'll put this off for a while.
People seem to be moving from tool-based to hand-written
recursive-descent parsers (witness the C and C++ front-ends
in mainline). Why do you think the current parser is beyond
redemption?
> FWIW in the end I didn't use oprofile but instead built a static gcjx
> with -pg. That seemed to yield the most useful results.
Agreed.
BTW, I applied the attached patch as obvious to restore the
caret-printing feature. Now I see:
-------------------------- 8< --------------------------
~/src/tmp > cat Hello.java
public class Hello
{
public static void main( String[] args)
{
junk t;
System.out.println( "Hello World!");
}
}
~/src/tmp > $GCJX Hello.java
./Hello.java:5:4: error: type named 'junk' is undefined
junk t;
^
-------------------------- 8< --------------------------
Yeah!
Thanks,
Ranjit.
--
Ranjit Mathew Email: rmathew AT gmail DOT com
Bangalore, INDIA. Web: http://ranjitmathew.hostingzero.com/
[-- Attachment #2: t1.txt --]
[-- Type: text/plain, Size: 1406 bytes --]
Index: ChangeLog
from Ranjit Mathew <rmathew@gcc.gnu.org>
* source/lex.cc (lexer::get_line): Also save and restore NUM_CHARS
and POSITION.
Index: source/lex.cc
===================================================================
--- source/lex.cc 2005-10-04 07:59:37.000000000 +0530
+++ source/lex.cc 2005-10-04 08:01:21.000000000 +0530
@@ -1328,7 +1328,9 @@ lexer::get_line (int line_number)
unicode_w_t save_unget_value = unget_value;
unicode_w_t save_cooked_unget_value = cooked_unget_value;
bool save_was_return = was_return;
- int save_posn = input_filter->get_posn ();
+ int save_num_chars = num_chars;
+ int save_position = position;
+ int save_if_posn = input_filter->get_posn ();
// Reset the state of the lexer and the input reader.
line = 1;
@@ -1337,6 +1339,8 @@ lexer::get_line (int line_number)
unget_value = UNICODE_W_NONE;
cooked_unget_value = UNICODE_W_NONE;
was_return = false;
+ num_chars = 0;
+ position = 0;
input_filter->set_posn (0);
// Ignore input until we reach the desired line (or end of file).
@@ -1366,7 +1370,9 @@ lexer::get_line (int line_number)
unget_value = save_unget_value;
cooked_unget_value = save_cooked_unget_value;
was_return = save_was_return;
- input_filter->set_posn (save_posn);
+ num_chars = save_num_chars;
+ position = save_position;
+ input_filter->set_posn (save_if_posn);
return ret_val;
}
* Re: [gcjx] Patch: FYI: parser and lexer changes
2005-10-04 2:47 ` Ranjit Mathew
@ 2005-10-04 3:06 ` Tom Tromey
2005-10-04 5:25 ` Ranjit Mathew
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Tom Tromey @ 2005-10-04 3:06 UTC (permalink / raw)
To: Ranjit Mathew; +Cc: GCJ Patches
Ranjit> You call a 30% speedup not good enough? Whoa!
I hate to say this, since I think it is a transient condition, and I
don't want people to really remember it, but gcjx as it is today is
really, really slow. It was more than 10x slower than jikes for
building classpath; now it is merely 6x slower.
>> probably just have to throw out the current parser and write one using
>> a real tool. Most likely, I'll put this off for a while.
Ranjit> People seem to be moving from tool-based to hand-written
Ranjit> recursive-descent parsers (witness the C and C++ front-ends
Ranjit> in mainline). Why do you think the current parser is beyond
Ranjit> redemption?
I'm not totally certain that it is. But it does show up in the
profile a lot more than I think it ought to. That is one problem.
Also it has never had a very good error reporting or recovery
approach, and the idea of throwing exceptions when a parse error is
hit turns out to be pretty bad (e.g., we spend 5 seconds constructing
format_repr objects when parsing an error-free code base). My guess
is that a generated parser would solve both of these problems.
Basically I took a crazy approach to writing a parser and now I'm
having second thoughts :-)
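[Editorial note: the cost Tom describes — paying for error-object construction even on error-free input — is commonly avoided by deferring the expensive formatting until an error is actually reported. An illustrative sketch, not the real gcjx `format_repr` API; all names below are invented:]

```cpp
#include <cassert>
#include <string>

// Tracks how often the expensive formatting step actually runs.
inline int format_count = 0;

// Stand-in for an expensive message-rendering step (locale lookup,
// string assembly, source-line extraction, ...).
std::string expensive_format (const std::string &what)
{
  ++format_count;
  return "error: " + what;
}

// Lazy error object: construction only copies the raw arguments,
// which is cheap.  Speculative parses that backtrack and discard the
// error never pay for formatting; render() runs only when the error
// is really reported to the user.
struct lazy_error
{
  std::string what;
  std::string render () const { return expensive_format (what); }
};
```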
Ranjit> BTW, I applied the attached patch as obvious to restore the
Ranjit> caret-printing feature.
Thanks. And sorry about that.
Tom
* Re: [gcjx] Patch: FYI: parser and lexer changes
2005-10-04 3:06 ` Tom Tromey
@ 2005-10-04 5:25 ` Ranjit Mathew
2005-10-04 6:38 ` Ranjit Mathew
2005-10-04 7:03 ` Per Bothner
2 siblings, 0 replies; 7+ messages in thread
From: Ranjit Mathew @ 2005-10-04 5:25 UTC (permalink / raw)
To: tromey; +Cc: GCJ Patches
Tom Tromey wrote:
> Ranjit> You call a 30% speedup not good enough? Whoa!
>
> I hate to say this, since I think it is a transient condition, and I
> don't want people to really remember it, but gcjx as it is today is
> really, really slow. It was more than 10x slower than jikes for
> building classpath; now it is merely 6x slower.
This is the classic "glass half-full vs. half-empty" case. I
think a 30% speedup in a single patch is great news, while
you consider it not good enough to compete with Jikes.
I think you're being a bit unfair here. For an essentially
one-man project, done largely in spare time, GCJX is fairly
decent as-is. Jikes, ECJ, GCJ, etc. have had many years to
be tweaked, by many more people, sometimes working full-time.
I would personally consider correctness and robustness to
be more important at this stage than raw performance, though
I do understand that being fast would be an important criterion
for wider acceptance. "Premature optimization is the root
of all evil" and all that.
> hit turns out to be pretty bad (e.g., we spend 5 seconds constructing
> format_repr objects when parsing an error-free code base). My guess
This is weird - why do we construct these objects at all then?
Isn't this easily rectified?
Thanks,
Ranjit
* Re: [gcjx] Patch: FYI: parser and lexer changes
2005-10-04 3:06 ` Tom Tromey
2005-10-04 5:25 ` Ranjit Mathew
@ 2005-10-04 6:38 ` Ranjit Mathew
2005-10-04 7:03 ` Per Bothner
2 siblings, 0 replies; 7+ messages in thread
From: Ranjit Mathew @ 2005-10-04 6:38 UTC (permalink / raw)
To: tromey; +Cc: GCJ Patches
Tom Tromey wrote:
> Also it has never had a very good error reporting or recovery
> approach, and the idea of throwing exceptions when a parse error is
> hit turns out to be pretty bad (e.g., we spend 5 seconds constructing
> format_repr objects when parsing an error-free code base). My guess
A side-effect is that the compiler does not recover from an error
and so does not show as many errors as possible in one run. For example:
----------------------------- 8< -----------------------------
~/src/tmp > cat Hello.java
public class Hello
{
public static void main( String[] args)
{
simian monkey;
bovine cow;
System.out.println( "Hello World!");
}
}
~/src/tmp > $GCJX Hello.java
./Hello.java:5:4: error: type named 'simian' is undefined
simian monkey;
^
~/src/tmp >
----------------------------- 8< -----------------------------
Ranjit.
* Re: [gcjx] Patch: FYI: parser and lexer changes
2005-10-04 3:06 ` Tom Tromey
2005-10-04 5:25 ` Ranjit Mathew
2005-10-04 6:38 ` Ranjit Mathew
@ 2005-10-04 7:03 ` Per Bothner
2005-10-05 21:30 ` Tom Tromey
2 siblings, 1 reply; 7+ messages in thread
From: Per Bothner @ 2005-10-04 7:03 UTC (permalink / raw)
To: tromey; +Cc: GCJ Patches
Tom Tromey wrote:
> I hate to say this, since I think it is a transient condition, and I
> don't want people to really remember it, but gcjx as it is today is
> really, really slow. It was more than 10x slower than jikes for
> building classpath; now it is merely 6x slower.
Does Jikes handle the JDK 1.5 language? If not, it's not a relevant
comparison, I think.
Also, Jikes only generates bytecode. We generate native code as well,
which means more general-purpose and probably less efficient trees.
If Jikes's parser really is a lot faster, it may be worthwhile
reading its code to see what they do.
> Ranjit> Why do you think the current parser is beyond redemption?
>
> I'm not totally certain that it is. But it does show up in the
> profile a lot more than I think it ought to.
A suggestion: If we can get rid of peek1, then we can get rid of the
token_stream and replace it with a single "current token".
The XQuery parser in Kawa does this and XQuery is actually a fairly
complicated language to parse.
In any case, if we only need peek and peek1, do we really need
a deque for the tokens?
I see that you have a mechanism for "mark" and "backtrack", and it's
used heavily, e.g. for parse::primary. Is that really needed?
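[Editorial note: if the parser really only ever needs peek and peek1, Per's suggestion amounts to a fixed two-slot window over the lexer instead of a buffered deque. A minimal sketch; `two_token_window` and its members are hypothetical names, with a vector of strings standing in for the lexer:]

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Fixed two-token lookahead: holds only the current token and the one
// after it, pulling from the underlying lexer on demand.  No deque,
// no mark/backtrack machinery.
class two_token_window
{
  std::vector<std::string> input;  // stand-in for the real lexer
  std::size_t next = 0;
  std::string cur, ahead;

  // Pull the next raw token, or a sentinel at end of input.
  std::string lex ()
  {
    return next < input.size () ? input[next++] : "<eof>";
  }

public:
  explicit two_token_window (std::vector<std::string> toks)
    : input (std::move (toks))
  {
    cur = lex ();
    ahead = lex ();
  }

  const std::string &peek () const { return cur; }    // next token
  const std::string &peek1 () const { return ahead; } // token after that

  // Consume and return the current token, shifting the window.
  std::string get ()
  {
    std::string r = cur;
    cur = ahead;
    ahead = lex ();
    return r;
  }
};
```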
> My guess
> is that a generated parser would solve both of these problems.
> Basically I took a crazy approach to writing a parser and now I'm
> having second thoughts :-)
I don't think a generated parser is the solution. Instead, think about
the simplest data structures you can use.
You might find some ideas in the XQuery parser:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/kawa/gnu/xquery/lang/XQParser.java?rev=1.99&content-type=text/plain&cvsroot=kawa
(There are definitely some ugly and pointless kludges there, of course.)
Start with parsePrimaryExpr, perhaps. No token buffer or backtracking.
--
--Per Bothner
per@bothner.com http://per.bothner.com/
* Re: [gcjx] Patch: FYI: parser and lexer changes
2005-10-04 7:03 ` Per Bothner
@ 2005-10-05 21:30 ` Tom Tromey
0 siblings, 0 replies; 7+ messages in thread
From: Tom Tromey @ 2005-10-05 21:30 UTC (permalink / raw)
To: Per Bothner; +Cc: GCJ Patches
Per> Also, Jikes only generates bytecode. We generate native code as well,
Per> which means more general-purpose and probably less efficient trees.
In this case the comparison is reasonably appropriate; gcjx doesn't
yet implement all of 1.5, and I was only comparing the bytecode
compiler (which doesn't use trees at all).
Per> If Jikes's parser really is a lot faster, it may be worthwhile
Per> reading its code to see what they do.
They use a parser generator called "jikespg".
Per> In any case, if we only need peek and peek1, do we really need a
Per> a deque for the tokens?
Per> I see that you have a mechanism for "mark" and "backtrack", and it's
Per> used heavily - i.e. for parse::primary. Is that really needed?
The answer is "no" to both of these. Maybe we can salvage the current
code without going to a generated parser. I'm putting this off again
for a while :-)
Per> You might find some ideas in the XQuery parser:
Thanks, I'll definitely look.
Tom
end of thread, other threads:[~2005-10-05 21:30 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-10-03 20:44 [gcjx] Patch: FYI: parser and lexer changes Tom Tromey
2005-10-04 2:47 ` Ranjit Mathew
2005-10-04 3:06 ` Tom Tromey
2005-10-04 5:25 ` Ranjit Mathew
2005-10-04 6:38 ` Ranjit Mathew
2005-10-04 7:03 ` Per Bothner
2005-10-05 21:30 ` Tom Tromey