From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <7372d68a-bd89-d883-4314-9a5a8ffbc2a6@gotplt.org>
Date: Mon, 8 May 2023 13:47:45 -0400
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.10.0
Subject: Re: [PATCH 2/2] scripts: Add sort-makefile-lines.py to sort Makefile
 variables.
Content-Language: en-US
To: Carlos O'Donell <carlos@redhat.com>, libc-alpha@sourceware.org
References: <20230428114811.4129539-1-carlos@redhat.com>
 <20230428114811.4129539-3-carlos@redhat.com>
From: Siddhesh Poyarekar <siddhesh@gotplt.org>
In-Reply-To: <20230428114811.4129539-3-carlos@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: <libc-alpha.sourceware.org>

On 2023-04-28 07:48, Carlos O'Donell via Libc-alpha wrote:
> The scripts/sort-makefile-lines.py script sorts Makefile variables
> according to project expected order.
> 
> The script is used like this:
> 
> $ scripts/sort-makefile-lines.py -i elf/Makefile -o elf/Makefile.tmp
> $ mv elf/Makefile.tmp elf/Makefile

Should we have a convenience make target like `make sort-makefile-lines` 
to allow folks to reflow files?  I'm trying to think of how we could 
make it easy for users to consume this.

> ---
>   scripts/sort-makefile-lines.py | 217 +++++++++++++++++++++++++++++++++
>   1 file changed, 217 insertions(+)
>   create mode 100755 scripts/sort-makefile-lines.py
> 
> diff --git a/scripts/sort-makefile-lines.py b/scripts/sort-makefile-lines.py
> new file mode 100755
> index 0000000000..06c0b3b3a2
> --- /dev/null
> +++ b/scripts/sort-makefile-lines.py
> @@ -0,0 +1,217 @@
> +#!/usr/bin/python3
> +# Sort Makefile lines as expected by project policy.
> +# Copyright (C) 2022-2023 Free Software Foundation, Inc.
> +# Copyright The GNU Toolchain Authors.

Why both?  Does it include code that was written by someone who does not 
have copyright assignment on file with the FSF?

> +# This file is part of the GNU C Library.
> +#
> +# The GNU C Library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# The GNU C Library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with the GNU C Library; if not, see
> +# <https://www.gnu.org/licenses/>.
> +
> +# The project consensus is to split Makefile variable assignment
> +# across multiple lines with one value per line.  The values are
> +# then sorted as described below, and terminated with a special
> +# list termination marker.  This splitting makes it much easier
> +# to add new tests to the list since they become just a single
> +# line insertion.  It also makes backports and merges easier
> +# since the new test may not conflict due to the ordering.
> +#
> +# Consensus discussion:
> +# https://public-inbox.org/libc-alpha/f6406204-84f5-adb1-d00e-979ebeebbbde@redhat.com/
> +#
> +# To support cleaning up Makefiles we created this program to
> +# help sort existing lists converted to the new format.
> +#
> +# The program takes as input the Makefile to sort correctly,
> +# and the output file to write the correctly sorted output.
> +#
> +# Sorting is only carried out between two special markers:
> +# (a) Marker start is '<variable> += \'
> +# (b) Marker end is '  # <variable>'
> +# With everything between (a) and (b) being sorted.
> +#
> +# You can use it like this:
> +# $ scripts/sort-makefile-lines.py -i elf/Makefile -o elf/Makefile.tmp
> +# $ mv elf/Makefile.tmp elf/Makefile
> +#
> +# The Makefile lines in the project are sorted using the
> +# following rules:
> +# - First all lines are sorted as-if `LC_COLLATE=C sort`
> +# - Then all entries by group are sorted against the last digits
> +#   of the test.
> +#
> +# For example:
> +# ~~~
> +# tests += \
> +#   test-a \
> +#   test-b \
> +#   test-b1 \
> +#   test-b2 \
> +#   test-b10 \
> +#   test-b20 \
> +#   test-b100 \
> +#   # tests
> +# ~~~
> +# This example shows tests sorted alphabetically, followed
> +# by a numeric suffix sort in increasing numeric order.
> +#
> +# Required cleanups:
> +# - Tests that end in "a" or "b" variants must be renamed to
> +#   end in just the numerical value. For example 'tst-mutex7robust'
> +#   should be renamed to 'tst-mutex12' (the highest numbered test)
> +#   or 'tst-robust11' (the highest numbered test).
> +# - Modules that end in "mod" or "mod1" should be renamed. For
> +#   example 'tst-atfork2mod' should be renamed to 'tst-mod-atfork2'
> +#   (test module for atfork2). If there is more than one module
> +#   then they should be named with a suffix that uses [0-9] first
> +#   then [A-Z] next for a total of 36 possible modules per test.
> +#   No manually listed test currently uses more than that (though
> +#   automatically generated tests may; they don't need sorting).
> +# - Avoid including another test and instead refactor into common
> +#   code with all tests including the common code, then give the
> +#   tests unique names.
> +#
> +# If you have a Makefile that needs converting, then you can
> +# quickly split the values into one-per-line, ensure the start
> +# and end markers are in place, and then run the script to
> +# sort the values.

I'm not going to block on this if you don't want to do it, but have you 
thought about making this even simpler by, e.g., only looking for the 
start and end markers, splitting everything in between, and then sorting 
the values?  Basically, don't require developers to break the lists into 
one value per line by hand; let the script do it.
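
As a rough sketch of that idea (the helper name `reflow_block` is 
hypothetical and not part of the posted patch; it assumes the same 
start and end marker formats the script already looks for):

```python
import re

def reflow_block(lines):
    # 'lines' runs from the start marker through the end marker,
    # inclusive.  Hypothetical helper, not part of the patch.
    start = re.match(r'^([a-zA-Z0-9]*) \+= \\$', lines[0].rstrip('\n'))
    if start is None:
        raise ValueError("block does not begin with a start marker")
    name = start.group(1)
    if lines[-1].rstrip('\n') != '  # ' + name:
        raise ValueError("block does not end with the matching end marker")

    values = []
    for line in lines[1:-1]:
        # Drop the trailing continuation backslash, then split the
        # remainder on whitespace so several values per line are fine.
        body = line.rstrip('\n').rstrip()
        if body.endswith('\\'):
            body = body[:-1]
        values.extend(body.split())

    # Emit one value per line in the format the sorter expects.
    out = [lines[0]]
    out.extend('  ' + value + ' \\\n' for value in values)
    out.append(lines[-1])
    return out
```

The result could then be handed to the existing sorting step unchanged, 
since every value ends up on its own continuation line.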

> +
> +import argparse
> +import sys
> +import locale
> +import re
> +import functools
> +
> +def numeric_key(line):
> +    # Turn a line into a numeric sort value by fetching
> +    # the ending number and using that as a key.
> +    var = re.search(r'([0-9]+) \\$', line)
> +    if var == None:
> +        print ("Error: Test line is currently: \"", end='')
> +        print (line, end='')
> +        print ("\"")
> +        print (
> +        '''
> +Test name does not match expected pattern.
> +Rename to match pattern e.g. tst-name[0-9]+.
> +        '''
> +        )
> +        raise Exception ("Invalid test name.")
> +    # Return the numeric value as the key or throws because
> +    # var is None.
> +    return int(var.group(1))
> +
> +def sort_lines(lines):
> +
> +    # Use the C locale for language independent collation.
> +    locale.setlocale (locale.LC_ALL, "C")

Will we ever have non-ASCII names for tests, routines, etc.?

> +
> +    # Sort with strcoll initially.  The tests ending in numeric
> +    # names will not sort correctly, but we will adjust that next.
> +    lines = sorted(lines, key=functools.cmp_to_key(locale.strcoll))

I wonder if you could, instead of simply passing strcoll, use a custom 
function which calls strcoll (or does a plain comparison if we decide 
we'll never have non-ASCII file names) and also accounts for the numeric 
suffix, returning, e.g., -1 for cmp('tst-mutex9', 'tst-mutex10').
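
Something along these lines, as a minimal sketch.  It assumes ASCII-only 
names and compares bare names rather than whole lines with the trailing 
' \'; `split_suffix` and `cmp_names` are hypothetical names, and the 
ordering may differ from the two-pass approach in corner cases:

```python
import functools
import re

def split_suffix(name):
    # Split 'tst-mutex10' into ('tst-mutex', 10).  Names without a
    # trailing number get -1 so 'tst-b' sorts before 'tst-b1'.
    m = re.search(r'([0-9]+)$', name)
    if m is None:
        return name, -1
    return name[:m.start()], int(m.group(1))

def cmp_names(a, b):
    # Compare the prefixes as plain strings (assumption: ASCII names,
    # so this orders them by byte value), then break ties on the
    # numeric suffix.
    pa, na = split_suffix(a)
    pb, nb = split_suffix(b)
    if pa != pb:
        return -1 if pa < pb else 1
    return (na > nb) - (na < nb)

names = ['tst-mutex10', 'tst-mutex9', 'tst-a', 'tst-mutex', 'tst-mutex1']
result = sorted(names, key=functools.cmp_to_key(cmp_names))
```

With a comparison like this the separate prefix-grouping pass would not 
be needed, because a single sort handles both the alphabetic and the 
numeric ordering.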

> +
> +    # We could use a trie here, but since the problem is restricted
> +    # to just numeric suffix we sort by group with a unique key
> +    # function.
> +
> +    # Build a list of all start markers (tuple includes prefix)
> +    prefixes = []
> +    groups = []
> +    for i in range(len(lines)):
> +        # Look for things like "  tst-foo1 \" to start the numbered list.
> +        var = re.search(r'([0-9]+) \\$', lines[i])
> +        if var:
> +            prefix = lines[i][0:var.span()[0]]
> +            if prefix in prefixes:
> +                continue
> +            prefixes.append(prefix)
> +            groups.append((prefix, i))
> +
> +    # For each prefix find the range it covers that needs numeric sorting.
> +    numgroups = []
> +    for group in groups:
> +        for j in range(group[1] + 1,len(lines)):
> +            if not lines[j].startswith(group[0]):
> +                # If it doesn't start with the prefix, then we're on
> +                # to the next group so mark the last entry as the end
> +                # of the group.
> +                numgroups.append((group[0], group[1], j - 1))
> +                break
> +
> +    # We now have a list of groups to sort.
> +    for ng in numgroups:
> +        # Note slices exclude nth element, so we must add one to right side.
> +        lines[ng[1]:ng[2]+1] = sorted(lines[ng[1]:ng[2]+1], key=numeric_key)
> +
> +    # Return a sorted list with numeric tests sorted by number.
> +    return lines
> +
> +def sort_makefile_lines(infile, outfile):
> +

Maybe add a check here to ensure infile != outfile?  Or do we want to 
support that use case?  It should be doable since you're reading the 
entire file into a list at once.
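
Either option is small.  A sketch of both, where `same_file` and 
`rewrite_in_place` are hypothetical helpers (not in the patch) and 
`transform` stands in for the sorting step:

```python
import os
import shutil
import tempfile

def same_file(infile, outfile):
    # True when both paths name the same existing file, which also
    # catches symlinks and different spellings of one path.
    return os.path.exists(outfile) and os.path.samefile(infile, outfile)

def rewrite_in_place(path, transform):
    # Read everything first, so the input is fully in memory before
    # anything is written.
    with open(path) as f:
        lines = f.readlines()
    lines = transform(lines)

    # Write to a temporary file in the same directory, then rename it
    # over the original so a failure never leaves a truncated Makefile.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    with os.fdopen(fd, 'w') as f:
        f.writelines(lines)
    # mkstemp creates the file with restrictive permissions; copy the
    # original mode bits across before replacing.
    shutil.copymode(path, tmp)
    os.replace(tmp, path)
```

The check would let the script reject `-i` and `-o` naming the same 
file, while the second helper would make that case work, so only one of 
the two is needed.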

> +    # Read the whole Makefile.
> +    mfile = open(infile)
> +    lines = mfile.readlines()
> +    mfile.close()
> +
> +    # We will output the Makefile here. Open it early to check
> +    # for any errors.
> +    ofile = open(outfile, "w")
> +
> +    # Build a list of all start markers (tuple includes name).
> +    startmarks = []
> +    for i in range(len(lines)):
> +        # Look for things like "var += \" to start the sorted list.
> +        var = re.search(r'^([a-zA-Z0-9]*) \+\= \\$', lines[i])
> +        if var:
> +            # Remember the index and the name.
> +            startmarks.append((i, var.group(1)))
> +
> +    # For each start marker try to find a matching end mark
> +    # and build a block that needs sorting.  The end marker
> +    # must have the matching comment name for it to be valid.
> +    rangemarks = []
> +    for sm in startmarks:
> +        # Look for things like "  # var" to end the sorted list.
> +        reg = r'^  # ' + sm[1] + r'$'
> +        for j in range(sm[0] + 1, len(lines)):
> +            if re.search(reg, lines[j]):
> +                # Remember the block to sort (inclusive).
> +                rangemarks.append((sm[0] + 1, j - 1))
> +                break
> +
> +    # We now have a list of all ranges that need sorting.
> +    # Sort those ranges.
> +    for r in rangemarks:
> +        lines[r[0]:r[1]] = sort_lines(lines[r[0]:r[1]])
> +
> +    # Output the whole list with sorted lines.
> +    for line in lines:
> +        ofile.write(line)
> +
> +    ofile.close()
> +
> +def get_parser():
> +    parser = argparse.ArgumentParser(description=__doc__)
> +    parser.add_argument('-i', dest='infile',
> +                        help='Input Makefile to read lines from')
> +    parser.add_argument('-o', dest='outfile',
> +                        help='Output Makefile to write sorted lines to')
> +    return parser
> +
> +def main(argv):
> +    parser = get_parser()
> +    opts = parser.parse_args(argv)
> +    sort_makefile_lines (opts.infile, opts.outfile)
> +
> +if __name__ == '__main__':
> +    main(sys.argv[1:])