From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 115483 invoked by alias); 7 Mar 2016 14:57:22 -0000 Mailing-List: contact fortran-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Post: List-Help: , Sender: fortran-owner@gcc.gnu.org Received: (qmail 115455 invoked by uid 89); 7 Mar 2016 14:57:21 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=0.8 required=5.0 tests=BAYES_50,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=candidates, meaningless, xdeletevec, XDELETEVEC X-Spam-User: qpsmtpd, 2 recipients X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Mon, 07 Mar 2016 14:57:20 +0000 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) by mx1.redhat.com (Postfix) with ESMTPS id 8492F7F09A; Mon, 7 Mar 2016 14:57:18 +0000 (UTC) Received: from vpn-229-7.phx2.redhat.com (vpn-229-7.phx2.redhat.com [10.3.229.7]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u27EvGpP032547; Mon, 7 Mar 2016 09:57:17 -0500 Message-ID: <1457362636.9813.27.camel@redhat.com> Subject: Re: [PATCH, fortran, v3] Use Levenshtein spelling suggestions in Fortran FE From: David Malcolm To: Bernhard Reutner-Fischer , fortran@gcc.gnu.org Cc: gcc-patches@gcc.gnu.org, VandeVondele Joost Date: Mon, 07 Mar 2016 14:57:00 -0000 In-Reply-To: <1457217975-28803-1-git-send-email-rep.dot.nop@gmail.com> References: <1451252568-16045-1-git-send-email-rep.dot.nop@gmail.com> <1457217975-28803-1-git-send-email-rep.dot.nop@gmail.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit X-SW-Source: 2016-03/txt/msg00023.txt.bz2 On Sat, 2016-03-05 at 23:46 +0100, Bernhard Reutner-Fischer wrote: [...] > diff --git a/gcc/fortran/misc.c b/gcc/fortran/misc.c > index 405bae0..72ed311 100644 > --- a/gcc/fortran/misc.c > +++ b/gcc/fortran/misc.c [...] > @@ -274,3 +275,41 @@ get_c_kind(const char *c_kind_name,teropKind_tki > nds_table[]) > > return ISOCBINDING_INVALID; > } > + > + > +/* For a given name TYPO, determine the best candidate from > CANDIDATES > + perusing Levenshtein distance. Frees CANDIDATES before > returning. */ > + > +const char * > +gfc_closest_fuzzy_match (const char *typo, char **candidates) > +{ > + /* Determine closest match. */ > + const char *best = NULL; > + char **cand = candidates; > + edit_distance_t best_distance = MAX_EDIT_DISTANCE; > + > + while (cand && *cand) > + { > + edit_distance_t dist = levenshtein_distance (typo, *cand); > + if (dist < best_distance) > + { > + best_distance = dist; > + best = *cand; > + } > + cand++; > + } > + /* If more than half of the letters were misspelled, the > suggestion is > + likely to be meaningless. */ > + if (best) > + { > + unsigned int cutoff = MAX (strlen (typo), strlen (best)) / 2; > + > + if (best_distance > cutoff) > + { > + XDELETEVEC (candidates); > + return NULL; > + } > + XDELETEVEC (candidates); > + } > + return best; > +} FWIW, there are two overloaded variants of levenshtein_distance in gcc/spellcheck.h, the first of which takes a pair of strlen values; your patch uses the second one: extern edit_distance_t levenshtein_distance (const char *s, int len_s, const char *t, int len_t); extern edit_distance_t levenshtein_distance (const char *s, const char *t); So one minor tweak you may want to consider here is to calculate strlen (typo) once at the top of gfc_closest_fuzzy_match, and then pass it in to the 4-arg variant of levenshtein_distance, which would avoid recalculating strlen (typo) for every candidate. I can't comment on the rest of the patch (I'm not a Fortran expert), though it seems sane to Hope this is constructive Dave