From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 123132 invoked by alias); 5 Dec 2015 19:53:28 -0000 Mailing-List: contact fortran-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Post: List-Help: , Sender: fortran-owner@gcc.gnu.org Received: (qmail 123098 invoked by uid 89); 5 Dec 2015 19:53:27 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_40,RCVD_IN_DNSWL_LOW,SPF_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-Spam-User: qpsmtpd, 2 recipients X-HELO: smtp22.services.sfr.fr Received: from smtp22.services.sfr.fr (HELO smtp22.services.sfr.fr) (93.17.128.10) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-SHA encrypted) ESMTPS; Sat, 05 Dec 2015 19:53:25 +0000 Received: from filter.sfr.fr (localhost [86.72.15.34]) by msfrf2219.sfr.fr (SMTP Server) with ESMTP id 8DD417000040; Sat, 5 Dec 2015 20:53:22 +0100 (CET) Authentication-Results: sfrmc.priv.atos.fr; dkim=none (no signature); dkim-adsp=none (no policy) header.from=mikael.morin@sfr.fr Received: from [192.168.1.85] (34.15.72.86.rev.sfr.net [86.72.15.34]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by msfrf2219.sfr.fr (SMTP Server) with ESMTP id A58DD70000B4; Sat, 5 Dec 2015 20:53:20 +0100 (CET) X-SFR-UUID: 20151205195320678.A58DD70000B4@msfrf2219.sfr.fr Subject: Re: [PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE To: Bernhard Reutner-Fischer , fortran@gcc.gnu.org References: <1448974501-30981-1-git-send-email-rep.dot.nop@gmail.com> <1448974501-30981-4-git-send-email-rep.dot.nop@gmail.com> Cc: gcc-patches@gcc.gnu.org, dmalcolm@redhat.com From: Mikael Morin Message-ID: <566340AC.3050408@sfr.fr> Date: Sat, 05 Dec 2015 19:53:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <1448974501-30981-4-git-send-email-rep.dot.nop@gmail.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-IsSubscribed: yes X-SW-Source: 2015-12/txt/msg00047.txt.bz2 Hello, to get things moving again, a few comments on top of David Malcolm's: Le 01/12/2015 13:55, Bernhard Reutner-Fischer a écrit : > > David Malcolm nice Levenshtein distance spelling check helpers > were used in some parts of other frontends. This proposed patch adds > some spelling corrections to the fortran frontend. > > Suggestions are printed if we can find a suitable name, currently > perusing a very simple cutoff factor: > /* If more than half of the letters were misspelled, the suggestion is > likely to be meaningless. */ > cutoff = MAX (strlen (typo), strlen (best_guess)) / 2; > which effectively skips names with less than 4 characters. > For e.g. structures, one could try to be much smarter in an attempt to > also provide suggestions for single-letter members/components. > > This patch covers (at least partly): > - user-defined operators > - structures (types and their components) > - functions > - symbols (variables) > > I do not immediately see how to handle subroutines. Ideas? > Not sure what you are looking for; I can get an error generated in gfc_procedure_use if using IMPLICIT NONE (EXTERNAL) > If anybody has a testcase where a spelling-suggestion would make sense > then please pass it along so we maybe can add support for GCC-7. > > diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c > index 685e3f5..6e1f63c 100644 > --- a/gcc/fortran/resolve.c > +++ b/gcc/fortran/resolve.c > @@ -29,6 +29,7 @@ along with GCC; see the file COPYING3. If not see > #include "data.h" > #include "target-memory.h" /* for gfc_simplify_transfer */ > #include "constructor.h" > +#include "spellcheck.h" > > /* Types used in equivalence statements. */ > > @@ -2682,6 +2683,61 @@ resolve_specific_f (gfc_expr *expr) > return true; > } > > +/* Recursively append candidate SYM to CANDIDATES. */ > + > +static void > +lookup_function_fuzzy_find_candidates (gfc_symtree *sym, > + vec *candidates) > +{ > + gfc_symtree *p; > + for (p = sym->right; p; p = p->right) > + { > + lookup_function_fuzzy_find_candidates (p, candidates); > + if (p->n.sym->ts.type != BT_UNKNOWN) > + candidates->safe_push (p->name); > + } > + for (p = sym->left; p; p = p->left) > + { > + lookup_function_fuzzy_find_candidates (p, candidates); > + if (p->n.sym->ts.type != BT_UNKNOWN) > + candidates->safe_push (p->name); > + } > +} It seems you are considering some candidates more than once here. The first time through the recursive call you will consider say sym->right->right, and with the loop, you'll consider it again after returning from the recursive call. The usual way to traverse the whole tree is to handle the current pointer and recurse on left and right pointers. So without loop. There is gfc_traverse_ns that you might find handy to do that (no obligation). Same goes for the user operators below. > + > + > +/* Lookup function FN fuzzily, taking names in FUN into account. */ > + > +const char* > +gfc_lookup_function_fuzzy (const char *fn, gfc_symtree *fun) > +{ > + auto_vec candidates; > + lookup_function_fuzzy_find_candidates (fun, &candidates); You have to start the lookup with the current namespace's sym_root (not with fun), otherwise you'll miss some candidates. You may also want to query parent namespaces for host-associated symbols. > + > + /* Determine closest match. */ > + int i; > + const char *name, *best = NULL; > + edit_distance_t best_distance = MAX_EDIT_DISTANCE; > + [...] > diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c > index ff9aff9..212f7d8 100644 > --- a/gcc/fortran/symbol.c > +++ b/gcc/fortran/symbol.c > @@ -27,6 +27,7 @@ along with GCC; see the file COPYING3. If not see > #include "parse.h" > #include "match.h" > #include "constructor.h" > +#include "spellcheck.h" > > > /* Strings for all symbol attributes. We use these for dumping the > @@ -235,6 +236,62 @@ gfc_get_default_type (const char *name, gfc_namespace *ns) > } > > > +/* Recursively append candidate SYM to CANDIDATES. */ > + > +static void > +lookup_symbol_fuzzy_find_candidates (gfc_symtree *sym, > + vec *candidates) > +{ > + gfc_symtree *p; > + for (p = sym->right; p; p = p->right) > + { > + lookup_symbol_fuzzy_find_candidates (p, candidates); > + if (p->n.sym->ts.type != BT_UNKNOWN) > + candidates->safe_push (p->name); > + } > + for (p = sym->left; p; p = p->left) > + { > + lookup_symbol_fuzzy_find_candidates (p, candidates); > + if (p->n.sym->ts.type != BT_UNKNOWN) > + candidates->safe_push (p->name); > + } > +} This looks like the same as lookup_function_fuzzy_find_candidates, isn't it? Maybe have a general symbol traversal function with a selection callback argument to test whether the symbol is what you want, depending on the context (is it a function? a subroutine? etc). > + > + > +/* Lookup symbol SYM fuzzily, taking names in SYMBOL into account. */ > + > +static const char* > +lookup_symbol_fuzzy (const char *sym, gfc_symbol *symbol) > +{ > + auto_vec candidates; > + lookup_symbol_fuzzy_find_candidates (symbol->ns->sym_root, &candidates); > + > + /* Determine closest match. */ > + int i; > + const char *name, *best = NULL; > + edit_distance_t best_distance = MAX_EDIT_DISTANCE; > + > + FOR_EACH_VEC_ELT (candidates, i, name) > + { > + edit_distance_t dist = levenshtein_distance (sym, name); > + if (dist < best_distance) > + { > + best_distance = dist; > + best = name; > + } > + } > + /* If more than half of the letters were misspelled, the suggestion is > + likely to be meaningless. */ > + if (best) > + { > + unsigned int cutoff = MAX (strlen (sym), strlen (best)) / 2; > + if (best_distance > cutoff) > + return NULL; > + } > + return best; > +} > + > + > /* Given a pointer to a symbol, set its type according to the first > letter of its name. Fails if the letter in question has no default > type. */ > @@ -253,8 +310,15 @@ gfc_set_default_type (gfc_symbol *sym, int error_flag, gfc_namespace *ns) > { > if (error_flag && !sym->attr.untyped) > { > - gfc_error ("Symbol %qs at %L has no IMPLICIT type", > - sym->name, &sym->declared_at); > + const char *guessed > + = lookup_symbol_fuzzy (sym->name, sym); > + if (guessed) > + gfc_error ("Symbol %qs at %L has no IMPLICIT type" > + "; did you mean %qs?", > + sym->name, &sym->declared_at, guessed); > + else > + gfc_error ("Symbol %qs at %L has no IMPLICIT type", > + sym->name, &sym->declared_at); > sym->attr.untyped = 1; /* Ensure we only give an error once. */ > } > > @@ -2188,6 +2252,55 @@ bad: > } > > > +/* Recursively append candidate COMPONENT structures to CANDIDATES. */ > + > +static void > +lookup_component_fuzzy_find_candidates (gfc_component *component, > + vec *candidates) > +{ > + for (gfc_component *p = component; p; p = p->next) > + { > + if (00 && p->ts.type == BT_DERIVED) > + /* ??? There's no (suitable) DERIVED_TYPE which would come in > + handy throughout the frontend; Use CLASS_DATA here for brevity. */ > + lookup_component_fuzzy_find_candidates (CLASS_DATA (p), candidates); I don't understand what you are looking for here. Are you trying to handle type extension? Then I guess you would have to pass the derived type symbol instead of its components, and use gfc_get_derived_super_type to retrieve the parent type. Mikael