public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: new parser: error recovery needs work
@ 2003-01-16 13:40 Robert Dewar
  2003-01-16 17:20 ` Joe Buck
  2003-01-16 17:34 ` Joe Buck
  0 siblings, 2 replies; 36+ messages in thread
From: Robert Dewar @ 2003-01-16 13:40 UTC (permalink / raw)
  To: gcc, jbuck, mark

> To be honest, I'm somewhat unsympathetic.  Not because I think the error
> messages are good, or because I think that we shouldn't do better, but
> because it's hard to do better in some of these cases and because
> we do noticably better in other cases -- the old parser just said
> "parse error" a lot. :-)

I certainly understand Mark's reaction here. I agree that it is unreasonable
to label as a regression particular cases in which the error message may or
may not be worse than the previous one (these things are somewhat subjective
after all).

I think the best thing is to recognize that we are never talking about a bug
when it comes to an unclear error message, but merely a possibly opportunity
for improving things. It is almost worth having a special category for such
suggestions. 

One of my particular interests in the Ada parser has been to work on improving
the error messages (also improving error messages in the semantic analysis
pass). I find that if people report a bad error message as a bug, I am less
sympathetic than if they report it as "here is a case of a message that i
found confusing, would be nice to better if possible" :-)

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: new parser: error recovery needs work
@ 2003-01-19 17:31 Robert Dewar
  0 siblings, 0 replies; 36+ messages in thread
From: Robert Dewar @ 2003-01-19 17:31 UTC (permalink / raw)
  To: phil, tilps; +Cc: gcc

> (I think i heard something about carat errors being on
> the way,

A little note here. I assume carat error means errors where there is a pointer
to the column of the error.

What we found very useful in GNAT was to define the following routines:

   procedure Error_Msg_AP (Msg : String);
   --  Output a message just after the previous token. This routine can be
   --  called only from the parser, since it references Prev_Token_Ptr.

   procedure Error_Msg_BC (Msg : String);
   --  Output a message just before the current token. Note that the important
   --  difference between this and the previous routine is that the BC case
   --  posts a flag on the current line, whereas AP can post a flag at the
   --  end of the preceding line. This routine can be called only from the
   --  parser, since it references Token_Ptr.

   procedure Error_Msg_SC (Msg : String);
   --  Output a message at the start of the current token, unless we are at
   --  the end of file, in which case we always output the message after the
   --  last real token in the file. This routine can be called only from the
   --  parser, since it references Token_Ptr.

   procedure Error_Msg_SP (Msg : String);
   --  Output a message at the start of the previous token. This routine can
   --  be called only from the parser, since it references Prev_Token_Ptr.

This allows very precise placement of the message pointers. One danger of
message pointers is that if they are not exactly right, they can obfuscate
rather than clarify the error.

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: new parser: error recovery needs work
@ 2003-01-18 21:53 Robert Dewar
  0 siblings, 0 replies; 36+ messages in thread
From: Robert Dewar @ 2003-01-18 21:53 UTC (permalink / raw)
  To: gdr, mark; +Cc: gcc, tilps

> | I thought about this, and I intentionally went for more technical error
> | messages.  I often find G++'s error messages confusing because of their
> | wafflyness -- I think "what on earth does that mean?"  G++ has a bad
> | tendency to make up terms that aren't in the standard or even in common
> | usage in the C++ community.  By using technical terms, at least people
> | can consult a reference book to figure out what's going on.

I think "common usage" is a better guide than "technical terms", especially
for a language where the great majority of C++ users have no effective access
to the standard. Even with Ada, where typically any serious Ada programmer
*does* have a copy of the standard readily available and consults it 
frequently, the use of technical terms can be confusing. For example if
a message says package, then technically this does not include generic
packages, but relying on this knowledge would be confusing. Similarly,
in common usage everyone uses the term package spec instead of package
declaration.

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: new parser: error recovery needs work
@ 2003-01-14 14:31 Jeff Donner
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff Donner @ 2003-01-14 14:31 UTC (permalink / raw)
  To: gcc

<a href="http://gcc.gnu.org/ml/gcc/2003-01/msg00663.html">Re:</a>

 >--On Monday, January 13, 2003 03:46:43 PM -0800 Joe Buck 
 ><jbuck@synopsys.com> wrote:
 >
 > The new parser doesn't produce useful diagnostics in the presence of
 > common errors.  Since the old parser did better, this is a regression.
 >
 >-- Mark Mitchell:
 >To be honest, I'm somewhat unsympathetic.  Not because I think the >error
 >messages are good, or because I think that we shouldn't do better, but
 >because it's hard to do better in some of these cases and because
 >we do noticably better in other cases -- the old parser just said
 >"parse error" a lot. :-)
 >
 >It's also hard to do better without breaking legal programs; it takes
 >a lot of head-scratching to think of all the cases.
 >
 >> The new parser might want to use a strategy that goes something
 >> like this:
 >> make a guess as to what was intended.  If a complete statement can be
 >> parsed according to that guess, then keep it.  Optionally try a >second
 >> guess, if there is one available, otherwise skip to some >synchronizing
 >> token.
 >
 >-- Mark Mitchell:
 >I'd prefer that we not introduce yet more backtracking.  Too >complicated.
 >...

If there is an explicit trace of states / previously seen tokens,
you can use it to dispense with having to imagine errors
by having a bunch of programmers dump traces when
they make errors.  A human analyses these, and maps
the traces to nice human messages.  (This idea comes from
a tool that automates this process for YACC-based compilers,

http://unicon.sourceforge.net/merr

which has a paper that explains the idea.)  This makes it
mechanical instead of imagination-stressing to deal with
the many specific cases, & it turns out it has pretty good
resolving power.

Examples of what a token trace can distinguish,
   allowing specific messages:
int main()       // parenthesis or semi-colon expected
int x y;         // missing comma in variable list
char() {}        // function name expected
int a[] = {1, 2; // unclosed initializer
struct foo
int x;           // missing { after struct label

So, I'm saying if it isn't in there, such a
state/token-trace & dump facility would make
constructing the error messages more mechanical, and
take less expertise to find/add new ones, and allow
a more comprehensive & specific set of messages.

Jeff Donner

^ permalink raw reply	[flat|nested] 36+ messages in thread
* new parser: error recovery needs work
@ 2003-01-14  1:24 Joe Buck
  2003-01-14  7:35 ` Mark Mitchell
  0 siblings, 1 reply; 36+ messages in thread
From: Joe Buck @ 2003-01-14  1:24 UTC (permalink / raw)
  To: gcc

The new parser doesn't produce useful diagnostics in the presence of
common errors.  Since the old parser did better, this is a regression.
Some examples:

1) mis-spelling a type name.

unsinged i;

with the old parser produces

pe.cpp:1: error: 'unsinged' is used as a type, but is not defined as a type.

with the new parser we get

pe.cpp:1: error: expected constructor, destructor, or type conversion
pe.cpp:1: error: expected `,' or `;'

2) forgetting to provide a template argument list

#include <vector>
std::vector foo;

with the old parser produces

v1.cpp:3: invalid use of template-name 'std::vector' in a declarator
v1.cpp:3: syntax error before `;' token

and with the new parser produces

v1.cpp:3: error: expected constructor, destructor, or type conversion
v1.cpp:3: error: expected `,' or `;'

3) forgetting the std:: namespace (common for gcc-2.95.x code)

#include <vector>
vector<int> foo;

with the new parser:

v1.cpp:3: error: expected constructor, destructor, or type conversion
v1.cpp:3: error: expected `,' or `;'

with the old parser:
v2.cpp:3: error: 'vector' is used as a type, but is not defined as a type.
(not quite correct, but at least it's a good hint)

The old parser's treatment of #1 and #3 is due to an error recovery rule
that I came up with based on Gerald pointing out that the 2.x -> 3.x
conversion would be a disaster without it (as the only thing the 3.0-pre
compiler would say to typical 2.95.x code was lots of "syntax error
before `;` token" messages.  The hack was pretty simple, attempting to
match

IDENTIFIER optional_template_arg_list IDENTIFIER optional_arg_list ';'

which is a sequence that cannot occur in legal C++.  This was good enough
to catch a number of common mistakes.

The new parser's structure should make it possible to do better.  The key
is to attempt a reasonable guess as to what might have been intended;
in some cases, there is really only one possibility.

For example, if we have two unknown identifiers in a row, the only way the
code could be legal is if the first is a type, declaring the second, and
it is mis-spelled or the declaration was forgotten.

If we have a template identifier followed by another identifier, then
the likelihood is that the template argument list was forgotten.

If an symbol is unknown but there is a matching symbol in the std::
namespace, it's possible to print a "did you mean std::foo?" message.

The new parser might want to use a strategy that goes something like this:
make a guess as to what was intended.  If a complete statement can be
parsed according to that guess, then keep it.  Optionally try a second
guess, if there is one available, otherwise skip to some synchronizing
token.


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2003-01-19 15:26 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-01-16 13:40 new parser: error recovery needs work Robert Dewar
2003-01-16 17:20 ` Joe Buck
2003-01-16 18:15   ` Mark Mitchell
2003-01-16 17:34 ` Joe Buck
2003-01-16 18:05   ` Gareth Pearce
2003-01-16 20:00     ` Phil Edwards
2003-01-16 20:15       ` Neil Booth
2003-01-16 21:43       ` Janis Johnson
2003-01-16 23:33         ` Mark Mitchell
2003-01-17  8:12         ` Fergus Henderson
2003-01-17 10:06           ` Stan Shebs
2003-01-17 11:36           ` Ben Elliston
2003-01-17 19:15             ` DJ Delorie
2003-01-17 12:27           ` Neil Booth
2003-01-17 23:26             ` Stan Shebs
2003-01-19 14:38               ` Kai Henningsen
2003-01-19  5:18             ` Fergus Henderson
2003-01-19  7:02               ` Neil Booth
2003-01-19  9:45                 ` Fergus Henderson
2003-01-19 11:38               ` Zack Weinberg
2003-01-18 16:46           ` Gabriel Dos Reis
2003-01-17  8:12         ` Phil Edwards
2003-01-18 16:48           ` Gabriel Dos Reis
2003-01-17  5:26       ` Gareth Pearce
2003-01-17  8:36         ` Phil Edwards
2003-01-17  9:10           ` Gareth Pearce
2003-01-18 16:32       ` Gabriel Dos Reis
2003-01-19 14:48         ` Kai Henningsen
2003-01-19 19:45         ` Phil Edwards
2003-01-16 23:04     ` Mark Mitchell
2003-01-18 16:52       ` Gabriel Dos Reis
  -- strict thread matches above, loose matches on Subject: below --
2003-01-19 17:31 Robert Dewar
2003-01-18 21:53 Robert Dewar
2003-01-14 14:31 Jeff Donner
2003-01-14  1:24 Joe Buck
2003-01-14  7:35 ` Mark Mitchell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).