From: Richard Biener
To: Manuel López-Ibáñez
Cc: Michael Matz, David Malcolm, GCC Patches
Date: Thu, 17 Sep 2015 08:46:00 -0000
Subject: Re: [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2
References: <55F2F393.9050501@gmail.com> <1442331491-11471-1-git-send-email-dmalcolm@redhat.com>

On Wed, Sep 16, 2015 at 5:45 PM, Manuel López-Ibáñez wrote:
> On 16 September 2015 at 15:33, Richard Biener wrote:
>> On Wed, Sep 16, 2015 at 3:22 PM, Michael Matz wrote:
>>>> if we suggest 'foo' instead of foz then we'll get a more confusing followup
>>>> error if we actually use it.
>>>
>>> This particular case could be solved by ruling out candidates of the wrong
>>> kind (here, something that can be assigned to, vs. a function).  But it
>>> might actually be too early in parsing to say that there will be an
>>> assignment.  I don't think _this_ problem should block the patch.
>
> Indeed. The patch by David does not try to fix up the code, it merely
> suggests a possible candidate. The follow-up errors should be the same
> before and after. Such suggestions will never be 100% right: even if
> the suggestion makes the code compile and run, it may still be the
> wrong one. A wrong suggestion is far less serious than a wrong
> uninitialized or Warray-bounds warning and we can live with those. Why
> does this need to be perfect from the very beginning?
>
> BTW, there is a PR for this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52277
>
>> I wonder if we can tentatively parse with the choice at hand, only allowing
>> (and even suggesting?) it if that works out.
>
> This would require queuing the error, fixing up the wrong name and
> continuing to parse. If there is another error, ignore that one and emit
> the original error without suggestion. The problem here is that we do
> not know if the additional error is actually caused by the fix-up we
> did or is an already existing error. It would be equally terrible
> to emit errors caused by the fix-up or to emit just a single error for
> the typo. We would need to roll back the tentative parse and do a
> definitive parse anyway. This does not seem possible at the moment
> because the parsers maintain a lot of global state that is not easy to
> roll back. We cannot simply create a copy of the parser state and
> throw it away later to continue as if the tentative parse had not
> happened.
>
> I'm not even sure if, in general, one can stop at the statement level
> or we would need to parse the whole function (or translation unit) to
> be able to tell if the suggestion is a valid candidate.

I was suggesting to only tentatively finish parsing the "current construct".
No idea how best to delimit that to the extent needed to make the tentative
parse useful.  Say we have "a + s.foz" and the field foz is not there
but foo is: if we continue parsing with 'foo' instead, but 'foo' has
a type that makes "a + s.foo" invalid, then we probably shouldn't suggest it.
It _might_ be reasonably "easy" to implement that, but I'm not sure.  There
might be a field named fz (with the same or a bigger Levenshtein distance) with
the correct type.  Of course it might be that I misspelled 's' and meant 'r'
instead, which has a field foz of the correct type... (and 's' is available as well).

I agree that we don't have to solve all this in the first iteration.

Richard.

> Cheers,
>
> Manuel.
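
To make the "a + s.foz" case above concrete, here is a minimal sketch of
filtering suggestion candidates by type before ranking them by Levenshtein
distance.  It is only an illustration under stated assumptions: `struct field`,
`enum field_type`, `suggest_field`, `MAX_NAME_LEN` and the `max_dist` cutoff
are hypothetical names invented for this example, not GCC's data structures
and not the API of the patch under discussion.  A real frontend would ask
whether the candidate's type makes the enclosing expression valid rather than
compare an enum.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; not GCC's tree types.  */
enum field_type { TYPE_INT, TYPE_STRUCT };

struct field
{
  const char *name;
  enum field_type type;
};

/* Assumed upper bound on identifier length, to keep the sketch simple.  */
#define MAX_NAME_LEN 63

/* Two-row dynamic-programming Levenshtein distance between A and B.  */
size_t
levenshtein (const char *a, const char *b)
{
  size_t la = strlen (a), lb = strlen (b);
  size_t prev[MAX_NAME_LEN + 1], cur[MAX_NAME_LEN + 1];

  assert (la <= MAX_NAME_LEN && lb <= MAX_NAME_LEN);

  /* Distance from the empty prefix of A to each prefix of B.  */
  for (size_t j = 0; j <= lb; j++)
    prev[j] = j;

  for (size_t i = 1; i <= la; i++)
    {
      cur[0] = i;
      for (size_t j = 1; j <= lb; j++)
        {
          size_t subst = prev[j - 1] + (a[i - 1] != b[j - 1]);
          size_t del = prev[j] + 1;
          size_t ins = cur[j - 1] + 1;
          size_t best = subst < del ? subst : del;
          cur[j] = best < ins ? best : ins;
        }
      memcpy (prev, cur, (lb + 1) * sizeof prev[0]);
    }
  return prev[lb];
}

/* Return the field closest to TYPO among those whose type is WANTED and
   whose distance is at most MAX_DIST, or NULL if there is none.  Ties go
   to the earlier field.  */
const struct field *
suggest_field (const struct field *fields, size_t n, const char *typo,
               enum field_type wanted, size_t max_dist)
{
  const struct field *best = NULL;
  size_t best_dist = 0;

  for (size_t i = 0; i < n; i++)
    {
      /* Skip candidates that would make the expression invalid.  */
      if (fields[i].type != wanted)
        continue;

      size_t d = levenshtein (typo, fields[i].name);
      if (d <= max_dist && (best == NULL || d < best_dist))
        {
          best = &fields[i];
          best_dist = d;
        }
    }
  return best;
}
```

With fields { "foo", TYPE_STRUCT } and { "fz", TYPE_INT }, both are at
distance 1 from "foz", so an unfiltered search would offer 'foo' simply
because it comes first.  Asking for TYPE_INT skips 'foo' and returns 'fz'.
The misspelled-'s' case is not covered by this sketch.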