From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 105502 invoked by alias); 25 Apr 2016 17:07:31 -0000 Mailing-List: contact fortran-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Post: List-Help: , Sender: fortran-owner@gcc.gnu.org Received: (qmail 105466 invoked by uid 89); 25 Apr 2016 17:07:30 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.9 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=dmalcolmredhatcom, dmalcolm@redhat.com X-Spam-User: qpsmtpd, 2 recipients X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Mon, 25 Apr 2016 17:07:25 +0000 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 75953C057FAD; Mon, 25 Apr 2016 17:07:23 +0000 (UTC) Received: from vpn-231-232.phx2.redhat.com (vpn-231-232.phx2.redhat.com [10.3.231.232]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u3PH7M2h014241; Mon, 25 Apr 2016 13:07:22 -0400 Message-ID: <1461604041.4417.14.camel@redhat.com> Subject: Re: [PATCH, fortran, v3] Use Levenshtein spelling suggestions in Fortran FE From: David Malcolm To: Bernhard Reutner-Fischer , fortran@gcc.gnu.org Cc: gcc-patches@gcc.gnu.org, VandeVondele Joost Date: Mon, 25 Apr 2016 17:07:00 -0000 In-Reply-To: <15FCD903-179E-4D64-B4C2-0B37E6011E38@gmail.com> References: <1451252568-16045-1-git-send-email-rep.dot.nop@gmail.com> <1457217975-28803-1-git-send-email-rep.dot.nop@gmail.com> <1457362636.9813.27.camel@redhat.com> <15FCD903-179E-4D64-B4C2-0B37E6011E38@gmail.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit X-SW-Source: 2016-04/txt/msg00058.txt.bz2 On Sat, 2016-04-23 at 20:21 +0200, Bernhard Reutner-Fischer wrote: > On March 7, 2016 3:57:16 PM GMT+01:00, David Malcolm < > dmalcolm@redhat.com> wrote: > > On Sat, 2016-03-05 at 23:46 +0100, Bernhard Reutner-Fischer wrote: > > [...] > > > > > diff --git a/gcc/fortran/misc.c b/gcc/fortran/misc.c > > > index 405bae0..72ed311 100644 > > > --- a/gcc/fortran/misc.c > > > +++ b/gcc/fortran/misc.c > > [...] > > > > > @@ -274,3 +275,41 @@ get_c_kind(const char > > > *c_kind_name,teropKind_tki > > > nds_table[]) > > > > > > return ISOCBINDING_INVALID; > > > } > > > + > > > + > > > +/* For a given name TYPO, determine the best candidate from > > > CANDIDATES > > > + perusing Levenshtein distance. Frees CANDIDATES before > > > returning. */ > > > + > > > +const char * > > > +gfc_closest_fuzzy_match (const char *typo, char **candidates) > > > +{ > > > + /* Determine closest match. */ > > > + const char *best = NULL; > > > + char **cand = candidates; > > > + edit_distance_t best_distance = MAX_EDIT_DISTANCE; > > > + > > > + while (cand && *cand) > > > + { > > > + edit_distance_t dist = levenshtein_distance (typo, *cand); > > > + if (dist < best_distance) > > > + { > > > + best_distance = dist; > > > + best = *cand; > > > + } > > > + cand++; > > > + } > > > + /* If more than half of the letters were misspelled, the > > > suggestion is > > > + likely to be meaningless. */ > > > + if (best) > > > + { > > > + unsigned int cutoff = MAX (strlen (typo), strlen (best)) / > > > 2; > > > + > > > + if (best_distance > cutoff) > > > + { > > > + XDELETEVEC (candidates); > > > + return NULL; > > > + } > > > + XDELETEVEC (candidates); > > > + } > > > + return best; > > > +} > > > > FWIW, there are two overloaded variants of levenshtein_distance in > > gcc/spellcheck.h, the first of which takes a pair of strlen values; > > your patch uses the second one: > > > > extern edit_distance_t > > levenshtein_distance (const char *s, int len_s, > > const char *t, int len_t); > > > > extern edit_distance_t > > levenshtein_distance (const char *s, const char *t); > > > > So one minor tweak you may want to consider here is to calculate > > strlen (typo) > > once at the top of gfc_closest_fuzzy_match, and then pass it in to > > the > > 4-arg variant of levenshtein_distance, which would avoid > > recalculating > > strlen (typo) for every candidate. > > I've pondered this back then but came to the conclusion to use the > variant without len because to use the 4 argument variant I would > have stored the candidates strlen in the vector too Why would you need to do that? You can simply call strlen inside the loop instead; something like: size_t strlen_typo = strlen (typo); while (cand && *cand) { edit_distance_t dist = levenshtein_distance (typo, strlen_typo, *cand, strlen (*cand)); etc > and was not convinced about the memory footprint for that would be > justified. Maybe it is, but I would prefer the following tweak in the > 4 argument variant: > If you would amend the 4 argument variant with a > > if (len_t == -1) > len_t = strlen (t); > before the > if (len_s == 0) > return len_t; > if (len_t == 0) > return len_s; > > checks then I'd certainly use the 4 arg variant :) > > WDYT? > > > > I can't comment on the rest of the patch (I'm not a Fortran > > expert), > > though it seems sane to > > > > Hope this is constructive > > It is, thanks for your thoughts! > > cheers, >