From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-72933-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Received: (qmail 71854 invoked by alias); 29 Aug 2016 22:06:22 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 71845 invoked by uid 89); 29 Aug 2016 22:06:21 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=surprise
X-HELO: mail-yw0-f193.google.com
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20130820;
        h=x-gm-message-state:mime-version:sender:in-reply-to:references:from
         :date:message-id:subject:to:cc;
        bh=Jayd2nKLy2VKaeL90IU6f2D4+zvp43HFU+lVFq/rx40=;
        b=kOQA7xfJA9tlK8dFGv3+YqhWAhffVwUoERXV+FpyDkTyh31/qlFdXUohT2QTGpRKW+
         bUtvLR+5YYrC1Hobta0GDQqyOZ9E4SrpLJkTBpJzswbLFjpMTReoHxM6smtrgUeG8YMk
         eBBUnD/ypVs/GILrohoEdvKDncOu53YubRudnqe0h596gb1CuLJTpT/zKGKU804So1RV
         JRfSey1T07iGQxKdKkSBOCUjvG3+ObelXWxV/lSusMS0lsdVe8qB2U1929i52XI/FRMQ
         NLTs6TpxJA4gpnI2Wb0phlLEMoGpX9tUpLkOCqOCnvAaiVZ+flg6jKmTqA6cklUd/q0x
         Cmlw==
X-Gm-Message-State: AE9vXwPjhehCM34TW9K1GWyQyX9I+4LUvoC1EtdchF9QE1c3WDzTMXn7kdUXoK6jDQvwTLQglK/0S4Fxf+/I9g==
X-Received: by 10.129.115.4 with SMTP id o4mr229339ywc.312.1472508375299; Mon,
 29 Aug 2016 15:06:15 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20160829202145.A2548403DC4A9@oldenburg.str.redhat.com>
References: <20160829202145.A2548403DC4A9@oldenburg.str.redhat.com>
From: Michael Kerrisk <mtk.manpages@gmail.com>
Date: Mon, 29 Aug 2016 22:06:00 -0000
Message-ID: <CALxWeYpXBMo6+d3QTtbz1rM2Jv0Gn+G5OTNDV8Sze06Gp9hnrw@mail.gmail.com>
Subject: Re: [PATCH] manual: Clarify the documentation of strverscmp [BZ #20524]
To: Florian Weimer <fweimer@redhat.com>
Cc: libc-alpha <libc-alpha@sourceware.org>
Content-Type: text/plain; charset=UTF-8
X-SW-Source: 2016-08/txt/msg00851.txt.bz2

Hello Florian,

On Tue, Aug 30, 2016 at 8:21 AM, Florian Weimer <fweimer@redhat.com> wrote:
> 2016-08-29  Florian Weimer  <fweimer@redhat.com>
>
>         [BZ #20524]
>         * manual/string.texi (String/Array Comparison): Clarify the
>         strverscmp behavior.
>
> diff --git a/manual/string.texi b/manual/string.texi
> index bce81a7..3577017 100644
> --- a/manual/string.texi
> +++ b/manual/string.texi
> @@ -1374,46 +1374,70 @@ The @code{strverscmp} function compares the string @var{s1} against
>  @var{s2}, considering them as holding indices/version numbers.  The
>  return value follows the same conventions as found in the
>  @code{strcmp} function.  In fact, if @var{s1} and @var{s2} contain no
> -digits, @code{strverscmp} behaves like @code{strcmp}.
> +digits, @code{strverscmp} behaves like @code{strcmp}
> +(in the sense that the sign of the result is the same).
>
> -Basically, we compare strings normally (byte by byte), until
> -we find a digit in each string - then we enter a special comparison
> -mode, where each sequence of digits is taken as a whole.  If we reach the
> -end of these two parts without noticing a difference, we return to the
> -standard comparison mode.  There are two types of numeric parts:
> -"integral" and "fractional" (those  begin with a '0').  The types
> -of the numeric parts affect the way we sort them:
> +The comparison algorithm which the @code{strverscmp} function implements
> +differs slightly from other version comparison algorithms.  The

s/version comparison/version-comparison/

> +implementation is based on a finite state machine, whose behavior is

s/finite state machine/finite-state machine/

> +approximated below.
>
>  @itemize @bullet
>  @item
> -integral/integral: we compare values as you would expect.
> +The input strings are each split into sequences of non-digits and
> +digits.  These sequences can be empty at the beginning and end of the
> +string.  Digits are determined by the @code{isdigit} function and are
> +thus subject to the current locale.
>
>  @item
> -fractional/integral: the fractional part is less than the integral one.
> -Again, no surprise.
> +Comparison starts with a (possibly empty) non-digit sequence.  The first
> +non-equal sequences of non-digits or digits determines the outcome of
> +the comparison.
>
>  @item
> -fractional/fractional: the things become a bit more complex.
> -If the common prefix contains only leading zeroes, the longest part is less
> -than the other one; else the comparison behaves normally.
> +Corresponding non-digit sequences in both strings are compared
> +lexicographically.  If their lengths differ, the shorter non-digit

Should this be "If their lengths differ, and the shorter string is
equal to the corresponding prefix in the longer string..."?

> +sequence is amended with the input string character immediately

s/amended/extended/ ?

> +following it (which can be the null terminator).  This character is
> +compared against the non-digit character in the other sequence, at the
> +same string offset, and their difference determines the outcome of the
> +comparison.
> +
> +@item
> +For two digit sequences, the number of leading zeros is counted (which
> +can be zero).  If the count differs, the string with more leading zeros
> +in the digit sequence is considered smaller than the other string.
> +
> +@item
> +If the two digit sequences have no leading zeros, they are compared as
> +integers, that is, the string with the longer digit sequence is deemed
> +larger, and if both sequences are of equal length, they are compared
> +lexicographically.
> +
> +@item
> +If both digit sequences have an equal, positive number of leading zeros,

Why is the word "positive" here?

> +they are compared lexicographically.  If their length differs, another
> +character is added to to the shorter sequence,

"another character is added to to the shorter sequence" is vague. You
want wording like you used above.

> to determine the outcome
> +of the comparison, as described above for the non-digit sequence case.
>  @end itemize
>
> +The treatment of leading zeros and the tie-breaking characters (which in
> +effect propagate across non-digit/digit sequence boundaries) differ from

s/differ/differs/

> +other version comparison algorithms.

s/version comparison/version-comparison/

> +
>  @smallexample
>  strverscmp ("no digit", "no digit")
>      @result{} 0    /* @r{same behavior as strcmp.} */
>  strverscmp ("item#99", "item#100")
>      @result{} <0   /* @r{same prefix, but 99 < 100.} */
>  strverscmp ("alpha1", "alpha001")
> -    @result{} >0   /* @r{fractional part inferior to integral one.} */
> +    @result{} >0   /* @r{different number of leading zeros (0 and 2).} */
>  strverscmp ("part1_f012", "part1_f01")
> -    @result{} >0   /* @r{two fractional parts.} */
> +    @result{} >0   /* @r{lexicographical comparison with leading zeros.} */
>  strverscmp ("foo.009", "foo.0")
> -    @result{} <0   /* @r{idem, but with leading zeroes only.} */
> +    @result{} <0   /* @r{different number of leading zeros (2 and 1).} */
>  @end smallexample
>
> -This function is especially useful when dealing with filename sorting,
> -because filenames frequently hold indices/version numbers.
> -
>  @code{strverscmp} is a GNU extension.
>  @end deftypefun

I've not completely checked all of the details, but what you write
certainly matches what I did check, and is a great deal better than
the existing text. But, the algorithm it describes *is* strange.

Cheers,

Michael