From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact fortran-help@gcc.gnu.org; run by ezmlm
Sender: fortran-owner@gcc.gnu.org
User-Agent: K-9 Mail for Android
In-Reply-To: <1457362636.9813.27.camel@redhat.com>
References: <1451252568-16045-1-git-send-email-rep.dot.nop@gmail.com> <1457217975-28803-1-git-send-email-rep.dot.nop@gmail.com> <1457362636.9813.27.camel@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8
Subject: Re: [PATCH, fortran, v3] Use Levenshtein spelling suggestions in Fortran FE
From: Bernhard Reutner-Fischer
Date: Sat, 23 Apr 2016 18:22:00 -0000
To: David Malcolm, fortran@gcc.gnu.org
CC: gcc-patches@gcc.gnu.org, VandeVondele Joost
Message-ID: <15FCD903-179E-4D64-B4C2-0B37E6011E38@gmail.com>
X-SW-Source: 2016-04/txt/msg00056.txt.bz2

On March 7, 2016 3:57:16 PM GMT+01:00, David Malcolm wrote:
>On Sat, 2016-03-05 at 23:46 +0100, Bernhard Reutner-Fischer wrote:
>[...]
>
>> diff --git a/gcc/fortran/misc.c b/gcc/fortran/misc.c
>> index 405bae0..72ed311 100644
>> --- a/gcc/fortran/misc.c
>> +++ b/gcc/fortran/misc.c
>[...]
>
>> @@ -274,3 +275,41 @@ get_c_kind(const char *c_kind_name, CInteropKind_t kinds_table[])
>>
>>    return ISOCBINDING_INVALID;
>>  }
>> +
>> +
>> +/* For a given name TYPO, determine the best candidate from CANDIDATES
>> +   perusing Levenshtein distance.  Frees CANDIDATES before returning.  */
>> +
>> +const char *
>> +gfc_closest_fuzzy_match (const char *typo, char **candidates)
>> +{
>> +  /* Determine closest match.  */
>> +  const char *best = NULL;
>> +  char **cand = candidates;
>> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
>> +
>> +  while (cand && *cand)
>> +    {
>> +      edit_distance_t dist = levenshtein_distance (typo, *cand);
>> +      if (dist < best_distance)
>> +	{
>> +	   best_distance = dist;
>> +	   best = *cand;
>> +	}
>> +      cand++;
>> +    }
>> +  /* If more than half of the letters were misspelled, the suggestion is
>> +     likely to be meaningless.  */
>> +  if (best)
>> +    {
>> +      unsigned int cutoff = MAX (strlen (typo), strlen (best)) / 2;
>> +
>> +      if (best_distance > cutoff)
>> +	{
>> +	  XDELETEVEC (candidates);
>> +	  return NULL;
>> +	}
>> +      XDELETEVEC (candidates);
>> +    }
>> +  return best;
>> +}
>
>FWIW, there are two overloaded variants of levenshtein_distance in
>gcc/spellcheck.h, the first of which takes a pair of strlen values;
>your patch uses the second one:
>
>extern edit_distance_t
>levenshtein_distance (const char *s, int len_s,
>                      const char *t, int len_t);
>
>extern edit_distance_t
>levenshtein_distance (const char *s, const char *t);
>
>So one minor tweak you may want to consider here is to calculate
>  strlen (typo)
>once at the top of gfc_closest_fuzzy_match, and then pass it in to the
>4-arg variant of levenshtein_distance, which would avoid recalculating
>strlen (typo) for every candidate.

I pondered this back then but decided to use the variant without
lengths: to use the 4-argument variant I would have had to store each
candidate's strlen in the vector too, and I was not convinced that the
memory footprint for that would be justified.
Maybe it is, but I would prefer the following tweak to the 4-argument
variant.

If you would amend the 4-argument variant with

  if (len_t == -1)
    len_t = strlen (t);

before the

  if (len_s == 0)
    return len_t;
  if (len_t == 0)
    return len_s;

checks, then I'd certainly use the 4-arg variant :)

WDYT?
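
A minimal sketch of the tweak proposed above, to make the idea concrete. It is not the code in gcc/spellcheck.h: the body of levenshtein_distance below is a plain two-row stand-in written for illustration, the typedef and MAX_EDIT_DISTANCE are local stand-ins for GCC's definitions, and closest_match is a hypothetical caller (it does not free the candidate vector the way gfc_closest_fuzzy_match does). The only point is the len_t == -1 sentinel, which lets the caller compute strlen (typo) once without storing per-candidate lengths.

```cpp
#include <algorithm>
#include <climits>
#include <cstring>
#include <vector>

/* Local stand-ins for the definitions in gcc/spellcheck.h (assumed).  */
typedef unsigned int edit_distance_t;
const edit_distance_t MAX_EDIT_DISTANCE = UINT_MAX;

/* Illustrative 4-argument variant: LEN_T == -1 means "compute the
   length of T here".  The algorithm is a textbook two-row Levenshtein,
   not GCC's implementation.  */
edit_distance_t
levenshtein_distance (const char *s, int len_s, const char *t, int len_t)
{
  if (len_t == -1)
    len_t = (int) strlen (t);

  if (len_s == 0)
    return len_t;
  if (len_t == 0)
    return len_s;

  /* prev holds row i-1 of the edit matrix, cur holds row i.  */
  std::vector<edit_distance_t> prev (len_t + 1), cur (len_t + 1);
  for (int j = 0; j <= len_t; j++)
    prev[j] = j;
  for (int i = 1; i <= len_s; i++)
    {
      cur[0] = i;
      for (int j = 1; j <= len_t; j++)
	{
	  edit_distance_t subst = prev[j - 1] + (s[i - 1] != t[j - 1]);
	  edit_distance_t del = prev[j] + 1;
	  edit_distance_t ins = cur[j - 1] + 1;
	  cur[j] = std::min (subst, std::min (del, ins));
	}
      std::swap (prev, cur);
    }
  return prev[len_t];
}

/* Hypothetical caller: strlen (typo) is computed once, and -1 is passed
   for each candidate so no lengths need to be stored.  CANDIDATES is a
   NULL-terminated array owned by the caller.  */
const char *
closest_match (const char *typo, const char *const *candidates)
{
  const int len_typo = (int) strlen (typo);
  const char *best = NULL;
  edit_distance_t best_distance = MAX_EDIT_DISTANCE;

  for (const char *const *cand = candidates; cand && *cand; cand++)
    {
      edit_distance_t dist
	= levenshtein_distance (typo, len_typo, *cand, -1);
      if (dist < best_distance)
	{
	  best_distance = dist;
	  best = *cand;
	}
    }

  /* Same cutoff as in the patch: reject a suggestion if more than half
     of the letters differ.  */
  if (best)
    {
      edit_distance_t cutoff
	= (edit_distance_t) std::max (len_typo, (int) strlen (best)) / 2;
      if (best_distance > cutoff)
	return NULL;
    }
  return best;
}
```

With this shape the candidate's strlen is still computed once per candidate inside the callee, so the saving is limited to strlen (typo); whether that justifies the sentinel is the open question in this thread.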
>
>I can't comment on the rest of the patch (I'm not a Fortran expert),
>though it seems sane to
>
>Hope this is constructive

It is, thanks for your thoughts!

cheers,