public inbox for java-patches@gcc.gnu.org
* [gcjx] Patch: FYI: parser and lexer changes
From: Tom Tromey @ 2005-10-03 20:44 UTC (permalink / raw)
  To: Java Patch List

I'm checking this in on the gcjx branch.

This weekend I looked into gcjx performance a bit.  The appended patch
yields about a 30% speed improvement when building Classpath on my
machine.  This is nice, but not nearly good enough; I think I'll
probably just have to throw out the current parser and write one using
a real tool.  Most likely, I'll put this off for a while.
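
For anyone skimming the patch, the token_stream part boils down to the
following (condensed from the tstream.hh hunks below; the token and
class names here are placeholders, and the character-reader changes,
which now hand back whole blocks instead of single characters, are
separate):

#include <deque>
#include <set>

struct token { /* kind, payload, location, ... */ };

class token_stream_sketch
{
  typedef unsigned int position_type;

  // Every non-javadoc token, lexed up front by lex_file ().
  std::deque<token> buffer;
  // Positions whose token was preceded by a javadoc comment of
  // interest.
  std::set<position_type> deprecated;
  position_type read_position;

public:
  token_stream_sketch () : read_position (0) { }

  // get/peek now hand out references into the buffer rather than
  // token copies.
  token &get_token ()   { return buffer[read_position++]; }
  token &peek_token ()  { return buffer[read_position]; }
  token &peek_token1 () { return buffer[read_position + 1]; }

  bool get_javadoc () const
  { return deprecated.find (read_position) != deprecated.end (); }
};

The parse::get/peek wrappers simply forward to these and now return
token references as well.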

FWIW in the end I didn't use oprofile but instead built a static gcjx
with -pg.  That seemed to yield the most useful results.

Tom

Index: ChangeLog
from  Tom Tromey  <tromey@redhat.com>

	* source/lex.cc (get_raw): Increment column.
	(get): Don't increment line number when newline comes from unget.

	* source/iconv.cc (iconv_ucs2_reader): Updated.
	(refill): Updated.
	(get): Rewrote.
	* source/iconv.hh (iconv_ucs2_reader::get): Updated.
	(iconv_ucs2_reader::next): Removed.
	* source/ucs2.cc (internal_get): New method.
	(get): Rewrote.
	* source/ucs2.hh (ucs2_reader::get): Changed prototype.
	(fallback_ucs2_reader::val): New field.
	(fallback_ucs2_reader::internal_get): Declare.
	* source/lex.cc (lexer): Initialize new fields.
	(get_raw): Rewrote.
	* source/lex.hh (lexer::chars): New field.
	(lexer::num_chars): Likewise.
	(lexer::position): Likewise.

	* scope.cc (resolution_scope): Initialize it.
	(update_cache): New method.
	(get_current_class): Removed.
	(push_scope): New method.
	(push_iscope): Moved from .hh file.
	* scope.hh (resolution_scope::current_class): New field.
	(resolution_scope::update_cache): Declare.
	(resolution_scope::get_current_class): Now inline.
	(push_iscope::save): New field.
	(push_iscope): Outline.
	(~push_iscope): New destructor.
	(resolution_scope::push_scope): Outline.

	* model/ideprecatable.hh (IDeprecatable::set_deprecated): Removed
	overload.
	* source/tstream.cc (lex_file): Handle javadoc here.
	(get_javadoc): Rewrote.
	(get_token): Removed.
	(peek_token): Removed.
	* source/parse.hh (class parse): Updated.
	* source/parse.cc (any_method_declarator): Changed argument type
	from ref_javadoc to bool.
	(field_declarator): Likewise.
	(void_method): Likewise.
	(member_decl): Likewise.
	(class_body_declaration): Updated.
	(class_or_interface_declaration): Likewise.
	* source/tstream.hh (token_stream::javadoc_is_ok): Removed.
	(token_stream): Updated.
	(token_stream::deprecated): New field.
	(token_stream::get_token): Now inline.
	(token_stream::get_javadoc): Changed return type.
	(token_stream::peek_token): Now inline.
	(token_stream::peek_token1): Likewise.
	(token_stream::peek_token): Removed overload.

	* source/parse.hh (parse::get): Return reference.
	(parse::peek): Likewise.
	(parse::peek1): Likewise.
	(parse::require): Likewise.
	(parse::assume): Likewise.
	* source/lex.cc (lex_token): Renamed.
	* source/lex.hh (lexer::lex_token): Renamed from get_token.  Now
	protected.
	* source/parse.cc (class_body_declaration): Use get_javadoc.
	(class_or_interface_declaration): Likewise.
	(require): Use and return reference.
	(assume): Likewise.
	(identifier): Use reference.
	(new_something): Likewise.
	(type_dot_class): Likewise.
	(class_name_dot_this): Likewise.
	(postfix_expression): Likewise.
	(unary_expression_not_plus_minus): Likewise.
	(unary_expression): Likewise.
	(relational_expression): Likewise.
	(assignment): Likewise.
	(primitive_type): Likewise.
	(type_name): Likewise.
	(final_or_attributes): Likewise.
	(assert_statement): Likewise.
	(try_statement): Likewise.
	(if_then_statement): Likewise.
	(statement): Likewise.
	(explicit_constructor_invocation): Likewise.
	(type_parameters): Likewise.
	(member_value): Likewise.
	(any_method_declarator): Likewise.
	(void_method): Likewise.
	(member_decl): Likewise.
	(class_body_declaration): Likewise.
	(interface_declaration): Likewise.
	(class_declaration): Likewise.
	(class_or_interface_declaration): Likewise.
	(type_declaration): Likewise.
	* source/tstream.cc (get_unfiltered_token): Removed.
	(peek_token): Updated.
	(peek_token1): Updated.
	(get_token): Rewrote.
	(lex_file): Added assert.
	(peek_token): New method.
	(peek_token, peek_token1): Removed.
	(class saver): Removed.
	(get_javadoc): New method.
	* source/tstream.hh (token_stream::get_unfiltered_token):
	Removed.
	(token_stream::position_type): New typedef.
	(token_stream::read_position): Use it.
	(token_stream::set_mark): Likewise.
	(token_stream::reset_to_mark): Likewise.
	(class marker): Likewise.
	(token_stream::javadoc_ok): Removed.
	(token_stream::peek_token): New overload.
	(token_stream::peek_token1): Inline.

Index: scope.cc
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/Attic/scope.cc,v
retrieving revision 1.1.2.1
diff -u -r1.1.2.1 scope.cc
--- scope.cc 13 Jan 2005 03:18:33 -0000 1.1.2.1
+++ scope.cc 3 Oct 2005 20:27:32 -0000
@@ -1,6 +1,6 @@
 // Scopes.
 
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
@@ -24,7 +24,8 @@
 \f
 
 resolution_scope::resolution_scope ()
-  : warning_scope (static_cast<warning_scope *> (global->get_compiler ()))
+  : warning_scope (static_cast<warning_scope *> (global->get_compiler ())),
+    current_class (NULL)
 {
 }
 
@@ -55,18 +56,26 @@
   return NULL;
 }
 
-model_class *
-resolution_scope::get_current_class () const
+void
+resolution_scope::update_cache ()
 {
-  for (std::deque<IScope *>::const_reverse_iterator i = scope_stack.rbegin ();
-       i != scope_stack.rend ();
-       ++i)
+  // Note that this is called before the last item is popped from the
+  // stack, due to how push_iscope is implemented.
+  std::deque<IScope *>::reverse_iterator i = scope_stack.rbegin (); 
+  // Return early if stack is empty or if we aren't popping the
+  // current class.
+  if (i == scope_stack.rend () || *i != current_class)
+    return;
+  current_class = NULL;
+  for (++i; i != scope_stack.rend (); ++i)
     {
       model_class *r = dynamic_cast<model_class *> (*i);
       if (r != NULL)
-	return r;
+	{
+	  current_class = r;
+	  break;
+	}
     }
-  return NULL;
 }
 
 ICatcher *
@@ -224,3 +233,22 @@
 
   scope_stack.back ()->add_binding (klass);
 }
+
+void
+resolution_scope::push_scope (IScope *i)
+{
+  scope_stack.push_back (i);
+  model_class *r = dynamic_cast<model_class *> (i);
+  if (r != NULL)
+    current_class = r;
+}
+
+resolution_scope::push_iscope::push_iscope (resolution_scope *scope,
+					    IScope *is)
+  : stack_temporary<IScope *> (scope->scope_stack, is),
+    save (scope)
+{
+  model_class *r = dynamic_cast<model_class *> (is);
+  if (r != NULL)
+    scope->current_class = r;
+}
Index: scope.hh
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/Attic/scope.hh,v
retrieving revision 1.1.2.1
diff -u -r1.1.2.1 scope.hh
--- scope.hh 13 Jan 2005 03:18:33 -0000 1.1.2.1
+++ scope.hh 3 Oct 2005 20:27:32 -0000
@@ -1,6 +1,6 @@
 // Scopes used during resolution.
 
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
@@ -35,6 +35,12 @@
   // Stack of scope objects.
   std::deque<IScope *> scope_stack;
 
+  // Cache the current class.
+  model_class *current_class;
+
+  // Update the class cache before popping a scope.
+  void update_cache ();
+
 public:
 
   resolution_scope ();
@@ -75,7 +81,10 @@
   // Return the class we are currently resolving.  This will return
   // NULL outside of a class context, for instance while resolving
   // import declarations.
-  model_class *get_current_class () const;
+  model_class *get_current_class () const
+  {
+    return current_class;
+  }
 
   model_method *get_current_method () const;
 
@@ -88,21 +97,23 @@
 
   /// Note the asymmetric API: we can push a scope on the stack with
   /// this method, but there is no way to pop a scope.
-  void push_scope (IScope *i)
-  {
-    scope_stack.push_back (i);
-  }
+  void push_scope (IScope *);
+
 
   /// This class is used, via the RAII idiom, to push a new IScope on
   /// the current resolution_scope, and then later remove it when this
   /// object is destroyed.
   class push_iscope : public stack_temporary<IScope *>
   {
+    resolution_scope *save;
+
   public:
 
-    push_iscope (resolution_scope *scope, IScope *is)
-      : stack_temporary<IScope *> (scope->scope_stack, is)
+    push_iscope (resolution_scope *, IScope *);
+
+    ~push_iscope ()
     {
+      save->update_cache ();
     }
   };
 };
Index: model/ideprecatable.hh
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/model/Attic/ideprecatable.hh,v
retrieving revision 1.1.2.1
diff -u -r1.1.2.1 ideprecatable.hh
--- model/ideprecatable.hh 13 Jan 2005 03:18:36 -0000 1.1.2.1
+++ model/ideprecatable.hh 3 Oct 2005 20:27:32 -0000
@@ -1,6 +1,6 @@
 // "deprecatable" interface.
 
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
@@ -48,14 +48,6 @@
   {
   }
 
-  /// This can be used to set the deprecation state based on a
-  /// javadoc comment.
-  void set_deprecated (const ref_javadoc &javadoc)
-  {
-    if (javadoc)
-      deprecated = javadoc->deprecated_p ();
-  }
-
   /// Set the deprecation state of this object.
   void set_deprecated (bool d)
   {
Index: source/iconv.cc
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/iconv.cc,v
retrieving revision 1.1.2.2
diff -u -r1.1.2.2 iconv.cc
--- source/iconv.cc 3 Oct 2005 17:38:09 -0000 1.1.2.2
+++ source/iconv.cc 3 Oct 2005 20:27:32 -0000
@@ -48,7 +48,6 @@
   assert (handle != (iconv_t) -1);
 
   last = 0;
-  next = 0;
 }
 
 iconv_ucs2_reader::~iconv_ucs2_reader ()
@@ -60,8 +59,6 @@
 void
 iconv_ucs2_reader::refill ()
 {
-  assert (last == next);
-
   size_t in_rem = end - curr;
   char *out_loc = (char *) translated;
   size_t out_rem = sizeof (translated);
@@ -86,23 +83,15 @@
 	}
     }
 
-  // Now update our internal state to reflect the current situation.
-  next = 0;
-  
   // A UCS-2 character is always 2 bytes.
   assert ((out_loc - (char *) translated) % 2 == 0);
   last = (out_loc - (char *) translated) / 2;
 }
 
-unicode_w_t
-iconv_ucs2_reader::get ()
+jchar *
+iconv_ucs2_reader::get (int &size)
 {
-  if (next == last)
-    {
-      if (curr == end)
-	return UNICODE_EOF;
-      refill ();
-      assert (next != last);
-    }
-  return translated[next++];
+  refill ();
+  size = last;
+  return translated;
 }
Index: source/iconv.hh
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/iconv.hh,v
retrieving revision 1.1.2.1
diff -u -r1.1.2.1 iconv.hh
--- source/iconv.hh 13 Jan 2005 03:18:37 -0000 1.1.2.1
+++ source/iconv.hh 3 Oct 2005 20:27:32 -0000
@@ -1,6 +1,6 @@
 // iconv-based reader.
 
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
@@ -35,9 +35,6 @@
   // One past the last valid character in TRANSLATED.
   int last;
 
-  // Index of the next valid character.
-  int next;
-
   // Name of the input encoding.
   std::string input_encoding;
 
@@ -50,7 +47,7 @@
 
   ~iconv_ucs2_reader ();
 
-  unicode_w_t get ();
+  jchar *get (int &);
 };
 
 #endif // GCJX_SOURCE_ICONV_HH
Index: source/lex.cc
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/lex.cc,v
retrieving revision 1.1.2.4
diff -u -r1.1.2.4 lex.cc
--- source/lex.cc 3 Oct 2005 17:38:09 -0000 1.1.2.4
+++ source/lex.cc 3 Oct 2005 20:27:32 -0000
@@ -42,7 +42,10 @@
     filename (file),
     line (1),
     column (0),
-    tab_width (global->get_compiler ()->get_tab_width ())
+    tab_width (global->get_compiler ()->get_tab_width ()),
+    chars (NULL),
+    num_chars (0),
+    position (0)
 {
 }
 
@@ -215,30 +218,27 @@
     {
       c = unget_value;
       unget_value = UNICODE_W_NONE;
+      return c;
     }
-  else
+
+  if (position >= num_chars)
     {
       try
 	{
-	  c = input_filter->get ();
+	  chars = input_filter->get (num_chars);
+	  position = num_chars > 0 ? 0 : -1;
 	}
       catch (conversion_error &exc)
 	{
 	  exc.set_location (here ());
 	  throw exc;
 	}
-
-      if (c == UNICODE_TAB)
-	{
-	  // Advance to next multiple of tab width.  Note we don't
-	  // subtract one from tab_width since we start columns at
-	  // zero.
-	  column = ((column + tab_width) / tab_width) * tab_width;
-	}
-      else
-	++column;
     }
-  return c;
+
+  if (position == -1)
+    return UNICODE_EOF;
+  ++column;
+  return chars[position++];
 }
 
 unicode_w_t
@@ -300,10 +300,12 @@
 lexer::get ()
 {
   unicode_w_t c;
+  bool can_incr = true;
   if (cooked_unget_value != UNICODE_W_NONE)
     {
       c = cooked_unget_value;
       cooked_unget_value = UNICODE_W_NONE;
+      can_incr = false;
     }
   else
     c = read_handling_escapes ();
@@ -322,7 +324,7 @@
       c = UNICODE_LINE_FEED;
     }
 
-  if (c == UNICODE_LINE_FEED)
+  if (can_incr && c == UNICODE_LINE_FEED)
     {
       // Note that it is somewhat bogus for this to be here, since it
       // means an escape like \u00a0 will increment the line number.
@@ -1300,7 +1302,7 @@
 }
 
 token
-lexer::get_token ()
+lexer::lex_token ()
 {
   token result;
   location where;
Index: source/lex.hh
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/lex.hh,v
retrieving revision 1.1.2.3
diff -u -r1.1.2.3 lex.hh
--- source/lex.hh 3 Oct 2005 17:38:09 -0000 1.1.2.3
+++ source/lex.hh 3 Oct 2005 20:27:32 -0000
@@ -64,6 +64,12 @@
   // the compiler.
   int tab_width;
 
+  // Buffer of characters from the input reader, its size, and our
+  // position in it.
+  jchar *chars;
+  int num_chars;
+  int position;
+
   // Some methods to characterize characters.
   bool identifier_part_p (unicode_w_t);
   bool identifier_start_p (unicode_w_t);
@@ -152,15 +158,17 @@
   // Internal interface to the lexer; does most of the work.
   token get_token_internal (location &);
 
+protected:
+
+  // Return a single token.
+  token lex_token ();
+
 public:
 
   lexer (ucs2_reader *source, const char *file);
 
   virtual ~lexer ();
 
-  // Return a single token.
-  token get_token ();
-
   // Return the name of a token.
   static const char *token_to_string (token_value);
 
Index: source/parse.cc
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/parse.cc,v
retrieving revision 1.1.2.5
diff -u -r1.1.2.5 parse.cc
--- source/parse.cc 20 Sep 2005 17:11:45 -0000 1.1.2.5
+++ source/parse.cc 3 Oct 2005 20:27:33 -0000
@@ -79,10 +79,10 @@
 
 \f
 
-inline token
+inline token &
 parse::require (token_value val, const char *message)
 {
-  token t = get ();
+  token &t = get ();
   if (t != val)
     {
       if (message)
@@ -93,10 +93,10 @@
   return t;
 }
 
-inline token
+inline token &
 parse::assume (token_value val)
 {
-  token t = get ();
+  token &t = get ();
   assert (t == val);
   return t;
 }
@@ -180,7 +180,7 @@
 std::string
 parse::identifier ()
 {
-  token t = require (TOKEN_IDENTIFIER, "identifier expected");
+  token &t = require (TOKEN_IDENTIFIER, "identifier expected");
   return ((ref_identifier) t)->get_identifier ();
 }
 
@@ -343,7 +343,7 @@
   // since in this case we must search the primary's class for the
   // member type named Name; instead we handle this when resolving the
   // 'new'.
-  token t = peek ();
+  token &t = peek ();
   if (t == TOKEN_OPEN_PAREN || primary_dot_new || require_class)
     {
       std::list<ref_expression> args (arguments ());
@@ -421,7 +421,7 @@
 {
   ref_forwarding_type the_type (type ());
   require (TOKEN_DOT);
-  token t = require (TOKEN_CLASS);
+  token &t = require (TOKEN_CLASS);
   return ref_expression (new model_class_ref (t, the_type));
 }
 
@@ -432,7 +432,7 @@
 {
   std::string name = identifier ();
   require (TOKEN_DOT);
-  token t = require (TOKEN_THIS);
+  token &t = require (TOKEN_THIS);
   std::list<std::string> fake_qual;
   fake_qual.push_back (name);
   ref_forwarding_type fwdt (new model_forwarding_simple (t, fake_qual));
@@ -534,7 +534,7 @@
 parse::primary ()
 {
   ref_expression result;
-  token t;
+  token t;			// FIXME
 
   // Introduce a new scope so the marker lives just as long as we may
   // need it.
@@ -753,7 +753,7 @@
   ref_expression result (primary ());
   while (true)
     {
-      token t = peek ();
+      token &t = peek ();
       if (t == TOKEN_INCREMENT)
 	{
 	  assume (TOKEN_INCREMENT);
@@ -782,7 +782,7 @@
 parse::unary_expression_not_plus_minus ()
 {
   ref_unary op;
-  token t = peek ();
+  token &t = peek ();
   switch (t.get_type ())
     {
     case TOKEN_BITWISE_NOT:
@@ -841,7 +841,7 @@
 parse::unary_expression ()
 {
   ref_unary op;
-  token t = peek ();
+  token &t = peek ();
   switch (t.get_type ())
     {
     case TOKEN_INCREMENT:
@@ -864,13 +864,13 @@
   assume (t.get_type ());
   if (t == TOKEN_MINUS)
     {
-      t = peek ();
+      token &t2 = peek ();
       // This special case circumvents the value checking in the
       // ordinary case, allowing the largest negative value through.
-      if (t == TOKEN_DECIMAL_INT_LIT || t == TOKEN_DECIMAL_LONG_LIT)
+      if (t2 == TOKEN_DECIMAL_INT_LIT || t2 == TOKEN_DECIMAL_LONG_LIT)
 	{
-	  assume (t.get_type ());
-	  op->set_expression (ref_expression (t));
+	  assume (t2.get_type ());
+	  op->set_expression (ref_expression (t2));
 	  return op;
 	}
     }
@@ -942,7 +942,7 @@
   while (true)
     {
       ref_binary op;
-      token t = peek ();
+      token &t = peek ();
       switch (t.get_type ())
 	{
 	case TOKEN_LESS_THAN:
@@ -1113,7 +1113,7 @@
 parse::assignment ()
 {
   ref_expression lhs (left_hand_side ());
-  token t = get ();
+  token &t = get ();
   if (! assignment_op_p (t.get_type ()))
     throw syntax_error (t, "assignment operator expected");
   ref_expression rhs (assignment_expression ());
@@ -1210,7 +1210,7 @@
 ref_forwarding_type
 parse::primitive_type ()
 {
-  token t = get ();
+  token &t = get ();
   if (! basic_type_p (t.get_type ()))
     throw syntax_error (t, "primitive type expected");
   ref_forwarding_type pt
@@ -1223,7 +1223,7 @@
 ref_forwarding_type
 parse::type_name (int &lt_depth)
 {
-  token t = peek ();
+  token t = peek ();		// FIXME
   if (basic_type_p (t.get_type ()))
     {
       get ();
@@ -1243,7 +1243,7 @@
   while (true)
     {
       ids.push_back (identifier ());
-      token t = peek ();
+      token t = peek ();	// FIXME
 
       // '<' means we're starting the parameters of a generic type.
       if (t == TOKEN_LESS_THAN)
@@ -1438,7 +1438,7 @@
 {
   while (true)
     {
-      token t = peek ();
+      token &t = peek ();
       if (t == TOKEN_FINAL)
 	{
 	  // FIXME: if (is_final) throw ...
@@ -1515,7 +1515,7 @@
 ref_assert 
 parse::assert_statement ()
 {
-  token t = assume (TOKEN_ASSERT);
+  token &t = assume (TOKEN_ASSERT);
   ref_assert result (new model_assert (t));
   result->set_expression (expression ());
 
@@ -1794,7 +1794,7 @@
 ref_try 
 parse::try_statement ()
 {
-  token t = assume (TOKEN_TRY);
+  token &t = assume (TOKEN_TRY);
   ref_try result (new model_try (t));
   result->set_block (block (new model_block (t)));  // FIXME location
 
@@ -1842,7 +1842,7 @@
 
   result->set_true_branch (statement ());
 
-  token t = peek ();
+  token &t = peek ();
   // The simple rule is, the inner-most "if" gets the "else".
   // We don't need to handle the no-short-if case specially.
   if (t == TOKEN_ELSE)
@@ -1866,6 +1866,7 @@
 void
 parse::switch_block_statement_groups (const ref_switch &result_switch)
 {
+  // FIXME
   for (token t = peek (); t == TOKEN_CASE || t == TOKEN_DEFAULT; t = peek ())
     {
       ref_switch_block swb (new model_switch_block (t));
@@ -1952,7 +1953,7 @@
 ref_stmt 
 parse::statement ()
 {
-  token t = peek ();
+  token &t = peek ();
 
   switch (t.get_type ())
     {
@@ -2089,6 +2090,7 @@
   std::list<ref_stmt> result;
 
   // This function is also used when parsing a switch block.
+  // FIXME
   for (token t = peek ();
        t != TOKEN_CLOSE_BRACE && t != TOKEN_CASE && t != TOKEN_DEFAULT;
        t = peek ())
@@ -2129,7 +2131,7 @@
 {
   ref_modifier_list mods (new model_modifier_list ());
 
-  for (token t = peek ();
+  for (token t = peek ();	// FIXME
        model_modifier_list::modifier_token_p (t);
        t = peek ())
     {
@@ -2154,7 +2156,7 @@
   if (peek () == TOKEN_LESS_THAN)
     type_params = actual_type_parameters ();
 
-  token t = peek ();
+  token &t = peek ();
   bool need_default = false;
 
   switch (t.get_type ())
@@ -2325,18 +2327,18 @@
   std::list<ref_type_variable> result;
 
   int lt_depth = 1;
-  token t;
   do
     {
       result.push_back (type_parameter (lt_depth));
       assert (lt_depth <= 1);
       if (! lt_depth)
 	break;
-      t = peek ();
-      if (t == TOKEN_COMMA)
-	assume (TOKEN_COMMA);
+      token &t = peek ();
+      if (t != TOKEN_COMMA)
+	break;
+      assume (TOKEN_COMMA);
     }
-  while (t == TOKEN_COMMA);
+  while (true);
 
   if (lt_depth)
     require (TOKEN_GREATER_THAN);
@@ -2350,7 +2352,7 @@
 ref_expression
 parse::member_value ()
 {
-  token t = peek ();
+  token &t = peek ();
   if (t == TOKEN_OPEN_BRACE)
     {
       assume (TOKEN_OPEN_BRACE);
@@ -2397,7 +2399,7 @@
     {
       assume (TOKEN_OPEN_PAREN);
       
-      token t = peek ();
+      token t = peek ();	// FIXME
       if (t == TOKEN_CLOSE_PAREN)
 	{
 	  // Marker annotation, do nothing.
@@ -2480,7 +2482,7 @@
 			      const ref_modifier_list &mods,
 			      int flags,
 			      ref_forwarding_type member_type,
-			      const ref_javadoc &javadoc)
+			      bool deprecated)
 {
   std::string id;
   bool is_interface = enclosing_class ()->interface_p ();
@@ -2498,7 +2500,7 @@
 
   result->set_return_type (member_type);
   result->set_annotations (annos);
-  result->set_deprecated (javadoc);
+  result->set_deprecated (deprecated);
   result->set_name (id);
 
   bool dots_found = false;
@@ -2536,7 +2538,7 @@
       result->set_throws (type_list ());
     }
 
-  token t = peek ();
+  token &t = peek ();
   if (t == TOKEN_OPEN_BRACE)
     {
       if (is_interface)
@@ -2562,7 +2564,7 @@
 			 const ref_forwarding_type &member_type,
 			 const std::list<ref_annotation> &annos,
 			 const ref_modifier_list &mods,
-			 const ref_javadoc &javadoc)
+			 bool deprecated)
 {
   // We can have several fields from a single declarator.
   std::list<ref_variable_decl> vars = variable_declarators (member_type);
@@ -2584,7 +2586,7 @@
       ref_field result (new model_field (*i));
       result->set_annotations (annos);
       result->set_modifiers (mods);
-      result->set_deprecated (javadoc);
+      result->set_deprecated (deprecated);
       klass->add (result);
     }
 }
@@ -2593,13 +2595,13 @@
 parse::void_method (const ref_class &klass,
 		    const std::list<ref_annotation> &annos,
 		    const ref_modifier_list &mods,
-		    const ref_javadoc &javadoc)
+		    bool deprecated)
 {
-  token t = require (TOKEN_VOID);
+  token &t = require (TOKEN_VOID);
   ref_forwarding_type voidtype
     = new model_forwarding_resolved (t, primitive_void_type);
   ref_method decl = any_method_declarator (annos, mods, METHOD_VOID,
-					   voidtype, javadoc);
+					   voidtype, deprecated);
   klass->add (decl);
   return decl;
 }
@@ -2616,21 +2618,21 @@
 parse::member_decl (const ref_class &klass,
 		    const std::list<ref_annotation> &annos,
 		    const ref_modifier_list &mods,
-		    const ref_javadoc &javadoc,
+		    bool deprecated,
 		    bool parse_annotation)
 {
-  token t = peek ();
+  token &t = peek ();
   // You can't have a void method in an annotation; we throw an
   // exception for this later.
   if (! parse_annotation && t == TOKEN_VOID)
     {
-      void_method (klass, annos, mods, javadoc);
+      void_method (klass, annos, mods, deprecated);
       return;
     }
   else if (t == TOKEN_CLASS)
     {
       ref_class member = class_declaration (mods);
-      member->set_deprecated (javadoc);
+      member->set_deprecated (deprecated);
       member->set_annotations (annos);
       klass->add (member);
       return;
@@ -2638,7 +2640,7 @@
   else if (t == TOKEN_INTERFACE)
     {
       ref_class member = interface_declaration (mods);
-      member->set_deprecated (javadoc);
+      member->set_deprecated (deprecated);
       member->set_annotations (annos);
       klass->add (member);
       return;
@@ -2646,7 +2648,7 @@
   else if (t == TOKEN_ENUM)
     {
       ref_class member = enum_declaration (mods);
-      member->set_deprecated (javadoc);
+      member->set_deprecated (deprecated);
       member->set_annotations (annos);
       klass->add (member);
       return;
@@ -2654,7 +2656,7 @@
   else if (t == TOKEN_AT)
     {
       ref_class member = annotation_type_declaration (mods);
-      member->set_deprecated (javadoc);
+      member->set_deprecated (deprecated);
       member->set_annotations (annos);
       klass->add (member);
       return;
@@ -2673,7 +2675,7 @@
       // is not a "type".
       if (peek () == TOKEN_VOID)
 	{
-	  ref_method m = void_method (klass, annos, mods, javadoc);
+	  ref_method m = void_method (klass, annos, mods, deprecated);
 	  m->set_type_parameters (type_parms);
 	  return;
 	}
@@ -2683,7 +2685,7 @@
   int method_flags = 0;
   ref_forwarding_type member_type;
 
-  token pt = peek ();
+  token &pt = peek ();
   if (! parse_annotation
       && pt == TOKEN_IDENTIFIER
       && ((ref_identifier) pt)->get_identifier () == klass->get_name ()
@@ -2704,7 +2706,7 @@
       if (parse_annotation)
 	{
 	  assert (! must_be_method);
-	  // We just ignore JAVADOC here.  Perhaps we should give a
+	  // We just ignore javadoc here.  Perhaps we should give a
 	  // warning if it had @deprecated, since that is just
 	  // ignored.
 	  annotation_type_member (klass, mods, member_type);
@@ -2712,14 +2714,14 @@
       else
 	{
 	  ref_method meth (any_method_declarator (annos, mods, method_flags,
-						  member_type, javadoc));
+						  member_type, deprecated));
 	  if (! type_parms.empty ())
 	    meth->set_type_parameters (type_parms);
 	  klass->add (meth);
 	}
     }
   else
-    field_declarator (klass, member_type, annos, mods, javadoc);
+    field_declarator (klass, member_type, annos, mods, deprecated);
 }
 
 // Helper function for class_body_declaration.  This handles static
@@ -2754,7 +2756,7 @@
 parse::class_body_declaration (const ref_class &result,
 			       bool parse_annotation)
 {
-  token t = peek ();
+  token &t = peek ();
   if (t == TOKEN_SEMI)
     {
       assume (TOKEN_SEMI);
@@ -2771,24 +2773,17 @@
   // Javadoc is valid before the modifiers of a member.  Before a
   // static block we can just ignore the javadoc as it is an ordinary
   // comment.
-  our_token_stream->javadoc_ok ();
-  ref_javadoc javadoc;
-  t = peek ();
-  if (t == TOKEN_JAVADOC)
-    {
-      assume (TOKEN_JAVADOC);
-      javadoc = t;
-    }
+  bool deprecated = our_token_stream->get_javadoc ();
 
   // Always read modifiers at this point, since "static" is a valid
   // modifier.  Anything incorrect we diagnose later.
   ref_modifier_list mods = modifiers_opt ();
 
-  t = peek ();
-  if (! parse_annotation && t == TOKEN_OPEN_BRACE)
+  token t2 = peek ();
+  if (! parse_annotation && t2 == TOKEN_OPEN_BRACE)
     class_body_block (annos, mods);
   else
-    member_decl (result, annos, mods, javadoc, parse_annotation);
+    member_decl (result, annos, mods, deprecated, parse_annotation);
 }
 
 // class-body:
@@ -2845,7 +2840,7 @@
   set_containing_class (result);
   result->set_modifiers (mods);
 
-  token t = peek ();
+  token &t = peek ();
 
   if (global->get_compiler ()->feature_generics () && t == TOKEN_LESS_THAN)
     {
@@ -2880,7 +2875,7 @@
   set_containing_class (result);
   result->set_modifiers (mods);
 
-  token t = peek ();
+  token &t = peek ();
 
   if (global->get_compiler ()->feature_generics () && t == TOKEN_LESS_THAN)
     {
@@ -3018,14 +3013,7 @@
   ref_class result;
 
   // Javadoc is valid before the modifiers of a class.
-  our_token_stream->javadoc_ok ();
-  ref_javadoc javadoc;
-  token t = peek ();
-  if (t == TOKEN_JAVADOC)
-    {
-      assume (TOKEN_JAVADOC);
-      javadoc = t;
-    }
+  bool deprecated = our_token_stream->get_javadoc ();
 
   // '@' could represent the start of annotations, or it could be
   // '@interface'.
@@ -3034,7 +3022,7 @@
     annos = annotations ();
 
   ref_modifier_list mods = modifiers_opt ();
-  t = peek ();
+  token &t = peek ();
   if (t == TOKEN_CLASS)
     result = class_declaration (mods);
   else if (t == TOKEN_INTERFACE)
@@ -3047,7 +3035,7 @@
     throw syntax_error (t, "%<class%>, %<interface%>, or %<enum%> "
 			"declaration expected");
 
-  result->set_deprecated (javadoc);
+  result->set_deprecated (deprecated);
   result->set_annotations (annos);
 
   return result;
@@ -3060,7 +3048,7 @@
 ref_class
 parse::type_declaration ()
 {
-  token t = peek ();
+  token &t = peek ();
   if (t == TOKEN_SEMI)
     {
       assume (TOKEN_SEMI);
@@ -3143,7 +3131,7 @@
   if (peek () == TOKEN_AT)
     annos = annotations ();
 
-  token t = peek ();
+  token t = peek ();		// FIXME
   if (t == TOKEN_PACKAGE)
     {
       // FIXME: we make a bogus new element just to hold the location
Index: source/parse.hh
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/parse.hh,v
retrieving revision 1.1.2.2
diff -u -r1.1.2.2 parse.hh
--- source/parse.hh 3 Oct 2005 17:38:09 -0000 1.1.2.2
+++ source/parse.hh 3 Oct 2005 20:27:33 -0000
@@ -60,30 +60,30 @@
 					 const std::string &);
 
   // Fetch the next token.
-  token get ()
+  token &get ()
   {
     return our_token_stream->get_token ();
   }
 
   // Peek ahead a token.
-  token peek ()
+  token &peek ()
   {
     return our_token_stream->peek_token ();
   }
 
   // Peek ahead two tokens.
-  token peek1 ()
+  token &peek1 ()
   {
     return our_token_stream->peek_token1 ();
   }
 
   // Require that the next token be of a certain type.  If not, throw
   // a parse exception.
-  token require (token_value val, const char *message = NULL);
+  token &require (token_value val, const char *message = NULL);
 
   // Require that the next token will be a certain type.  If not,
   // abort.
-  token assume (token_value);
+  token &assume (token_value);
 
   // Classify tokens.
   static bool parse::basic_type_p (token_value);
@@ -179,21 +179,21 @@
 				    const ref_modifier_list &,
 				    int,
 				    ref_forwarding_type,
-				    const ref_javadoc &);
+				    bool);
   void field_declarator (const ref_class &,
 			 const ref_forwarding_type &,
 			 const std::list<ref_annotation> &,
 			 const ref_modifier_list &,
-			 const ref_javadoc &);
+			 bool);
   void member_decl (const ref_class &,
 		    const std::list<ref_annotation> &,
 		    const ref_modifier_list &,
-		    const ref_javadoc &,
+		    bool,
 		    bool);
   ref_method void_method (const ref_class &,
 			  const std::list<ref_annotation> &,
 			  const ref_modifier_list &,
-			  const ref_javadoc &);
+			  bool);
   void class_body_block (const std::list<ref_annotation> &,
 			 const ref_modifier_list &);
   void class_body_declaration (const ref_class &, bool);
Index: source/tstream.cc
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/tstream.cc,v
retrieving revision 1.1.2.2
diff -u -r1.1.2.2 tstream.cc
--- source/tstream.cc 29 Sep 2005 00:54:48 -0000 1.1.2.2
+++ source/tstream.cc 3 Oct 2005 20:27:33 -0000
@@ -28,89 +28,21 @@
   token r;
   do
     {
-      r = lexer::get_token ();
-      buffer.push_back (r);
-    }
-  while (r != TOKEN_EOF);
-}
-
-token
-token_stream::get_unfiltered_token ()
-{
-  if (read_position < buffer.size ())
-    ++read_position;
-  return buffer[read_position - 1];
-}
-
-token
-token_stream::get_token ()
-{
-  token r;
-  while (true)
-    {
-      r = get_unfiltered_token ();
+      r = lex_token ();
       // In theory the lexer will never return a repeat or invalid
       // token.
       assert (r != TOKEN_REPEAT && r != TOKEN_INVALID);
 
-      if (r != TOKEN_JAVADOC)
-	break;
-      if (javadoc_is_ok)
-	{
-	  // Look ahead -- if the next token is javadoc, we discard
-	  // this one.
-	  token n = peek_token ();
-	  if (n != TOKEN_JAVADOC)
-	    break;
-	}
+      if (r == TOKEN_JAVADOC)
+	deprecated.insert (buffer.size ());
+      else
+	buffer.push_back (r);
     }
-
-  javadoc_is_ok = false;
-  return r;
-}
-
-template<typename T>
-class saver
-{
-  T &location;
-  T save;
-
-public:
-
-  saver (T &loc, T new_val)
-    : location (loc),
-      save (loc)
-  {
-    location = new_val;
-  }
-
-  saver (T &loc)
-    : location (loc),
-      save (loc)
-  {
-  }
-
-  ~saver ()
-  {
-    location = save;
-  }
-};
-
-token
-token_stream::peek_token ()
-{
-  saver<bool> save_jd (javadoc_is_ok);
-  saver<int> save_read (read_position);
-  token r = get_token ();
-  return r;
+  while (r != TOKEN_EOF);
 }
 
-token
-token_stream::peek_token1 ()
+bool
+token_stream::get_javadoc ()
 {
-  saver<bool> save_jd (javadoc_is_ok);
-  saver<int> save_read (read_position);
-  get_token ();
-  token r = get_token ();
-  return r;
+  return deprecated.find (read_position) != deprecated.end ();
 }
Index: source/tstream.hh
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/tstream.hh,v
retrieving revision 1.1.2.2
diff -u -r1.1.2.2 tstream.hh
--- source/tstream.hh 29 Sep 2005 00:54:48 -0000 1.1.2.2
+++ source/tstream.hh 3 Oct 2005 20:27:33 -0000
@@ -32,28 +32,29 @@
 // filter the actual tokens according to what the parser requires.
 class token_stream : public lexer
 {
+  typedef unsigned int position_type;
+
   // Our buffer.
   std::deque<token> buffer;
 
   // Position of next unread element in buffer.
-  int read_position;
-
+  position_type read_position;
 
-  // True if the parser can usefully interpret a javadoc comment as
-  // the next token.  When false, we filter out such comments.
-  bool javadoc_is_ok;
+  // This holds the location of all tokens that were preceded by a
+  // javadoc containing 'deprecated'.
+  std::set<position_type> deprecated;
 
 
   // Set a mark at the current point.  Only called by marker class.
   // Returns the position.
-  int set_mark ()
+  position_type set_mark ()
   {
     return read_position;
   }
 
   // Reset the read pointer to a position.  Only called by marker
   // class.
-  void reset_to_mark (int where)
+  void reset_to_mark (position_type where)
   {
     read_position = where;
   }
@@ -61,9 +62,6 @@
   friend class marker;
 
 
-  // Return a token before any filtering is applied.
-  token get_unfiltered_token ();
-
   // Lex the file and fill our buffer.
   void lex_file ();
 
@@ -71,8 +69,7 @@
 
   token_stream (ucs2_reader *source, const char *file)
     : lexer (source, file),
-      read_position (0),
-      javadoc_is_ok (false)
+      read_position (0)
   {
     lex_file ();
   }
@@ -81,23 +78,28 @@
   {
   }
 
-  // Indicate that it is ok for the next token to be TOKEN_JAVADOC.
-  // The token stream imposes some restraints on this that conform to
-  // javadoc processing.
-  void javadoc_ok ()
+  // Fetch the next token.  Javadoc tokens will be ignored.
+  token &get_token ()
   {
-    javadoc_is_ok = true;
+    return buffer[read_position++];
   }
 
-  // Fetch the next token.
-  token get_token ();
+  // Return true if the next token was preceded by a javadoc comment
+  // containing 'deprecated'.
+  bool get_javadoc ();
 
   // Look at the next token without committing to it.
-  token peek_token ();
+  token &peek_token ()
+  {
+    return buffer[read_position];
+  }
 
   // Look ahead two tokens without committing.  This is ad hoc, but it
-  // seems to be all the parser requires.
-  token peek_token1 ();
+  // is all the parser requires.
+  token &peek_token1 ()
+  {
+    return buffer[read_position + 1];
+  }
 };
 
 // This marks a location in a token stream buffer and ensures that no
@@ -105,7 +107,7 @@
 class marker
 {
   // The location of this mark.
-  int location;
+  token_stream::position_type location;
 
   // Associated token stream.
   token_stream *stream;
Index: source/ucs2.cc
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/ucs2.cc,v
retrieving revision 1.1.2.2
diff -u -r1.1.2.2 ucs2.cc
--- source/ucs2.cc 3 Oct 2005 17:38:09 -0000 1.1.2.2
+++ source/ucs2.cc 3 Oct 2005 20:27:33 -0000
@@ -50,8 +50,8 @@
 
 \f
 
-unicode_w_t
-fallback_ucs2_reader::get ()
+jchar
+fallback_ucs2_reader::internal_get ()
 {
   int c = get_uint8 ();
 
@@ -103,6 +103,14 @@
     % here ();
 }
 
+jchar *
+fallback_ucs2_reader::get (int &len)
+{
+  val = internal_get ();
+  len = 1;
+  return &val;
+}
+
 \f
 
 std::string
Index: source/ucs2.hh
===================================================================
RCS file: /cvs/gcc/gcc/gcjx/source/Attic/ucs2.hh,v
retrieving revision 1.1.2.2
diff -u -r1.1.2.2 ucs2.hh
--- source/ucs2.hh 3 Oct 2005 17:38:09 -0000 1.1.2.2
+++ source/ucs2.hh 3 Oct 2005 20:27:33 -0000
@@ -66,11 +66,7 @@
 
   virtual ~ucs2_reader ();
 
-  /// @return the next UCS-2 character.  Return UNICODE_EOF on
-  /// end-of-file.  Throws conversion_error on any kind of character
-  /// conversion error; the caller is expected to set the location on
-  /// this exception.
-  virtual unicode_w_t get () = 0;
+  virtual jchar *get (int &size) = 0;
 
   /// Gets the current position of this reader within the input data.
   virtual int get_posn ()
@@ -92,12 +88,16 @@
 // Assume the input is utf-8.
 class fallback_ucs2_reader : public ucs2_reader
 {
+  jchar val;
+
   // Just a dumb helper.
   bool in_range_p (unicode_w_t v, unicode_w_t min, unicode_w_t max)
   {
     return v >= min && v <= max;
   }
 
+  jchar internal_get ();
+
 public:
 
   fallback_ucs2_reader (byte_buffer *in)
@@ -105,7 +105,7 @@
   {
   }
 
-  unicode_w_t get ();
+  jchar *get (int &size);
 };
 
 

* Re: [gcjx] Patch: FYI: parser and lexer changes
From: Ranjit Mathew @ 2005-10-04  2:47 UTC (permalink / raw)
  To: tromey; +Cc: GCJ Patches

[-- Attachment #1: Type: text/plain, Size: 1670 bytes --]

Tom Tromey wrote:
> 
> This weekend I looked into gcjx performance a bit.  The appended patch
> yields about a 30% speed improvement when building Classpath on my
> machine.  This is nice, but not nearly good enough; I think I'll

You call a 30% speedup not good enough? Whoa!


> probably just have to throw out the current parser and write one using
> a real tool.  Most likely, I'll put this off for a while.

People seem to be moving from tool-based to hand-written
recursive-descent parsers (witness the C and C++ front-ends
in mainline). Why do you think the current parser is beyond
redemption?


> FWIW in the end I didn't use oprofile but instead built a static gcjx
> with -pg.  That seemed to yield the most useful results.

Agreed.

BTW, I applied the attached patch as obvious to restore the
caret-printing feature. Now I see:
-------------------------- 8< --------------------------
~/src/tmp > cat Hello.java
public class Hello
{
  public static void main( String[] args)
  {
    junk t;
    System.out.println( "Hello World!");
  }
}
~/src/tmp > $GCJX Hello.java
./Hello.java:5:4: error: type named 'junk' is undefined

    junk t;
    ^

-------------------------- 8< --------------------------

Yeah!

Thanks,
Ranjit.

--
Ranjit Mathew       Email: rmathew AT gmail DOT com

Bangalore, INDIA.     Web: http://ranjitmathew.hostingzero.com/

[-- Attachment #2: t1.txt --]
[-- Type: text/plain, Size: 1406 bytes --]

Index: ChangeLog
from  Ranjit Mathew  <rmathew@gcc.gnu.org>

	* source/lex.cc (lexer::get_line): Also save and restore NUM_CHARS
	and POSITION.

Index: source/lex.cc
===================================================================
--- source/lex.cc	2005-10-04 07:59:37.000000000 +0530
+++ source/lex.cc	2005-10-04 08:01:21.000000000 +0530
@@ -1328,7 +1328,9 @@ lexer::get_line (int line_number)
   unicode_w_t save_unget_value = unget_value;
   unicode_w_t save_cooked_unget_value = cooked_unget_value;
   bool save_was_return = was_return;
-  int save_posn = input_filter->get_posn ();
+  int save_num_chars = num_chars;
+  int save_position = position;
+  int save_if_posn = input_filter->get_posn ();
 
   // Reset the state of the lexer and the input reader.
   line = 1;
@@ -1337,6 +1339,8 @@ lexer::get_line (int line_number)
   unget_value = UNICODE_W_NONE;
   cooked_unget_value = UNICODE_W_NONE;
   was_return = false;
+  num_chars = 0;
+  position = 0;
   input_filter->set_posn (0);
 
   // Ignore input until we reach the desired line (or end of file).
@@ -1366,7 +1370,9 @@ lexer::get_line (int line_number)
   unget_value = save_unget_value;
   cooked_unget_value = save_cooked_unget_value;
   was_return = save_was_return;
-  input_filter->set_posn (save_posn);
+  num_chars = save_num_chars;
+  position = save_position;
+  input_filter->set_posn (save_if_posn);
 
   return ret_val;
 }

* Re: [gcjx] Patch: FYI: parser and lexer changes
From: Tom Tromey @ 2005-10-04  3:06 UTC (permalink / raw)
  To: Ranjit Mathew; +Cc: GCJ Patches

Ranjit> You call a 30% speedup not good enough? Whoa!

I hate to say this, since I think it is a transient condition, and I
don't want people to really remember it, but gcjx as it is today is
really, really slow.  It was more than 10x slower than jikes for
building classpath; now it is merely 6x slower.

>> probably just have to throw out the current parser and write one using
>> a real tool.  Most likely, I'll put this off for a while.

Ranjit> People seem to be moving from tool-based to hand-written
Ranjit> recursive-descent parsers (witness the C and C++ front-ends
Ranjit> in mainline). Why do you think the current parser is beyond
Ranjit> redemption?

I'm not totally certain that it is.  But it does show up in the
profile a lot more than I think it ought to.  That is one problem.
Also it has never had a very good error reporting or recovery
approach, and the idea of throwing exceptions when a parse error is
hit turns out to be pretty bad (e.g., we spend 5 seconds constructing
format_repr objects when parsing an error-free code base).  My guess
is that a generated parser would solve both of these problems.
Basically I took a crazy approach to writing a parser and now I'm
having second thoughts :-)
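
To be concrete about the format_repr cost: the message gets formatted
at throw time, even though most of those exceptions are just caught
and discarded by speculative parsing.  One escape hatch would be an
error object that only captures the cheap pieces and defers the
string building until the error is actually reported.  A rough sketch
with made-up names, not the real format_repr interface:

#include <sstream>
#include <string>

struct src_location
{
  std::string file;
  int line, column;
};

class cheap_syntax_error
{
  src_location where;
  const char *message;   // static text; nothing is formatted here

public:
  cheap_syntax_error (const src_location &w, const char *m)
    : where (w), message (m)
  {
  }

  // The formatting cost is only paid if the error is shown.
  std::string render () const
  {
    std::ostringstream os;
    os << where.file << ":" << where.line << ":" << where.column
       << ": error: " << message;
    return os.str ();
  }
};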

Ranjit> BTW, I applied the attached patch as obvious to restore the
Ranjit> caret-printing feature.

Thanks.  And sorry about that.  

Tom

* Re: [gcjx] Patch: FYI: parser and lexer changes
From: Ranjit Mathew @ 2005-10-04  5:25 UTC (permalink / raw)
  To: tromey; +Cc: GCJ Patches

Tom Tromey wrote:
> Ranjit> You call a 30% speedup not good enough? Whoa!
> 
> I hate to say this, since I think it is a transient condition, and I
> don't want people to really remember it, but gcjx as it is today is
> really, really slow.  It was more than 10x slower than jikes for
> building classpath; now it is merely 6x slower.

This is the classic "glass half-full vs. half-empty" case. I
think a 30% speedup in a single patch is great news, whereas
you consider it not good enough to compete with Jikes.

I think you're being a bit unfair here. For an essentially
one-man project, done largely in spare time, GCJX is fairly
decent as-is. Jikes, ECJ, GCJ, etc. have had many years to
be tweaked, by many more people, sometimes working full-time.

I would personally consider correctness and robustness to
be more important at this stage than raw performance, though
I do understand that being fast would be an important criterion
for wider acceptance. "Premature optimization is the root
of all evil" and all that.


> hit turns out to be pretty bad (e.g., we spend 5 seconds constructing
> format_repr objects when parsing an error-free code base).  My guess

This is weird - why do we construct these objects at all then?
Isn't this easily rectified?

Thanks,
Ranjit

--
Ranjit Mathew      Email: rmathew AT gmail DOT com

Bangalore, INDIA.    Web: http://ranjitmathew.hostingzero.com/

* Re: [gcjx] Patch: FYI: parser and lexer changes
From: Ranjit Mathew @ 2005-10-04  6:38 UTC (permalink / raw)
  To: tromey; +Cc: GCJ Patches

Tom Tromey wrote:
> Also it has never had a very good error reporting or recovery
> approach, and the idea of throwing exceptions when a parse error is
> hit turns out to be pretty bad (e.g., we spend 5 seconds constructing
> format_repr objects when parsing an error-free code base).  My guess

A side effect is that it does not recover from an error, and so it
does not show as many errors as it could. For example:
----------------------------- 8< -----------------------------
~/src/tmp > cat Hello.java
public class Hello
{
  public static void main( String[] args)
  {
    simian monkey;
    bovine cow;
    System.out.println( "Hello World!");
  }
}
~/src/tmp > $GCJX Hello.java
./Hello.java:5:4: error: type named 'simian' is undefined

    simian monkey;
    ^

~/src/tmp >
----------------------------- 8< -----------------------------

Ranjit.

--
Ranjit Mathew      Email: rmathew AT gmail DOT com

Bangalore, INDIA.    Web: http://ranjitmathew.hostingzero.com/

* Re: [gcjx] Patch: FYI: parser and lexer changes
From: Per Bothner @ 2005-10-04  7:03 UTC (permalink / raw)
  To: tromey; +Cc: GCJ Patches

Tom Tromey wrote:
> I hate to say this, since I think it is a transient condition, and I
> don't want people to really remember it, but gcjx as it is today is
> really, really slow.  It was more than 10x slower than jikes for
> building classpath; now it is merely 6x slower.

Does Jikes handle the JDK 1.5 language?  If not, it's not a relevant
comparison, I think.

Also, Jikes only generates bytecode.  We generate native code as well,
which means more general-purpose and probably less efficient trees.

If Jikes's parser really is a lot faster, it may be worthwhile
reading its code to see what they do.

> Ranjit> Why do you think the current parser is beyond redemption?
> 
> I'm not totally certain that it is.  But it does show up in the
> profile a lot more than I think it ought to. 

A suggestion: If we can get rid of peek1, then we can get rid of the
token_stream and replace it with a single "current token".

The XQuery parser in Kawa does this and XQuery is actually a fairly
complicated language to parse.

In any case, if we only need peek and peek1, do we really need
a deque for the tokens?
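
Something like this is what I have in mind (hypothetical names,
nothing gcjx-specific): keep the current token plus one token of
lookahead and nothing else.

struct token { int type; /* payload ... */ };

// Minimal producer interface assumed for the sketch.
struct raw_lexer
{
  virtual ~raw_lexer () { }
  virtual token lex_token () = 0;
};

class two_token_stream
{
  raw_lexer &lex;
  token cur;    // what peek () returns
  token ahead;  // what peek1 () returns

public:
  explicit two_token_stream (raw_lexer &l)
    : lex (l), cur (l.lex_token ()), ahead (l.lex_token ())
  {
  }

  token &peek ()  { return cur; }
  token &peek1 () { return ahead; }

  token get ()
  {
    token result = cur;
    cur = ahead;
    ahead = lex.lex_token ();
    return result;
  }
};

Of course that rules out the mark/backtrack machinery, which brings
me to my next question.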

I see that you have a mechanism for "mark" and "backtrack", and it's
used heavily - i.e. for parse::primary.  Is that really needed?

>  My guess
> is that a generated parser would solve both of these problems.
> Basically I took a crazy approach to writing a parser and now I'm
> having second thoughts :-)

I don't think a generated parser is the solution.  Instead, think about
the simplest data structures you can use.

You might find some ideas in the XQuery parser:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/kawa/gnu/xquery/lang/XQParser.java?rev=1.99&content-type=text/plain&cvsroot=kawa
(There are definitely some ugly and pointless kludges there, of course.)
Start with parsePrimaryExpr, perhaps.  No token buffer or backtracking.
-- 
	--Per Bothner
per@bothner.com   http://per.bothner.com/

* Re: [gcjx] Patch: FYI: parser and lexer changes
From: Tom Tromey @ 2005-10-05 21:30 UTC (permalink / raw)
  To: Per Bothner; +Cc: GCJ Patches

Per> Also, Jikes only generates bytecode.  We generate native code as well,
Per> which means more general-purpose and probably less efficient trees.

In this case the comparison is reasonably appropriate; gcjx doesn't
yet implement all of 1.5, and I was only comparing the bytecode
compiler (which doesn't use trees at all).

Per> If Jikes's parser really is a lot faster, it may be worthwhile
Per> reading its code to see what they do.

They use a parser generator called "jikespg".

Per> In any case, if we only need peek and peek1, do we really need a
Per> a deque for the tokens?

Per> I see that you have a mechanism for "mark" and "backtrack", and it's
Per> used heavily - i.e. for parse::primary.  Is that really needed?

The answer is "no" to both of these.  Maybe we can salvage the current
code without going to a generated parser.  I'm putting this off again
for a while :-)

Per> You might find some ideas in the XQuery parser:

Thanks, I'll definitely look.

Tom
