Re: [PATCH 2/2] scripts: Add sort-makefile-lines.py to sort Makefile variables.

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

From: Carlos O'Donell <carlos@redhat.com>
To: Siddhesh Poyarekar <siddhesh@gotplt.org>, libc-alpha@sourceware.org
Subject: Re: [PATCH 2/2] scripts: Add sort-makefile-lines.py to sort Makefile variables.
Date: Tue, 9 May 2023 13:58:42 -0400	[thread overview]
Message-ID: <fe87b600-8f51-d7c3-4c81-64da7b37b29d@redhat.com> (raw)
In-Reply-To: <7372d68a-bd89-d883-4314-9a5a8ffbc2a6@gotplt.org>

On 5/8/23 13:47, Siddhesh Poyarekar wrote:
> On 2023-04-28 07:48, Carlos O'Donell via Libc-alpha wrote:
>> The scripts/sort-makefile-lines.py script sorts Makefile variables
>> according to project expected order.
>>
>> The script is used like this:
>>
>> $ scripts/sort-makefile-lines.py -i elf/Makefile -o elf/Makefile.tmp
>> $ mv elf/Makefile.tmp elf/Makefile
> 
> Should we have a convenience make target like `make
> sort-makefile-lines` to allow folks to reflow files?  I'm trying to
> think of how we could make it easy for users to consume this.

The reflow process is manual and NRE, so it doesn't deserve a target IMO.

We *do* need to either:
(a) Add a linter in make check.
(b) Add a pre-commit CI linter check.

Today we can't do this until we convert more of the Makefiles and it is my intent
to convert them all.

It *almost* passes today, we need to cleanup some half-dozen Makefiles.

>> ---
>>   scripts/sort-makefile-lines.py | 217 +++++++++++++++++++++++++++++++++
>>   1 file changed, 217 insertions(+)
>>   create mode 100755 scripts/sort-makefile-lines.py
>>
>> diff --git a/scripts/sort-makefile-lines.py b/scripts/sort-makefile-lines.py
>> new file mode 100755
>> index 0000000000..06c0b3b3a2
>> --- /dev/null
>> +++ b/scripts/sort-makefile-lines.py
>> @@ -0,0 +1,217 @@
>> +#!/usr/bin/python3
>> +# Sort Makefile lines as expected by project policy.
>> +# Copyright (C) 2022-2023 Free Software Foundation, Inc.
>> +# Copyright The GNU Toolchain Authors.
> 
> Why both?  Does it include code that was written by someone who does
> not have copyright assignment on file with the FSF?

That's a mistake I forgot to cleanup.

I'll post a v2.

In v2 I'm changing a number of things:

(a) Fold all the complex double-block sorting into a single sort routine.
(b) Stop issuing errors and just sort according to the rules.

> 
>> +# This file is part of the GNU C Library.
>> +#
>> +# The GNU C Library is free software; you can redistribute it and/or
>> +# modify it under the terms of the GNU Lesser General Public
>> +# License as published by the Free Software Foundation; either
>> +# version 2.1 of the License, or (at your option) any later version.
>> +#
>> +# The GNU C Library is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +# Lesser General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU Lesser General Public
>> +# License along with the GNU C Library; if not, see
>> +# <https://www.gnu.org/licenses/>.
>> +
>> +# The project consensus is to split Makefile variable assignment
>> +# across multiple lines with one value per line.  The values are
>> +# then sorted as described below, and terminated with a special
>> +# list termination marker.  This splitting makes it much easier
>> +# to add new tests to the list since they become just a single
>> +# line insertion.  It also makes backports and merges easier
>> +# since the new test may not conflict due to the ordering.
>> +#
>> +# Consensus discussion:
>> +# https://public-inbox.org/libc-alpha/f6406204-84f5-adb1-d00e-979ebeebbbde@redhat.com/
>> +#
>> +# To support cleaning up Makefiles we created this program to
>> +# help sort existing lists converted to the new format.
>> +#
>> +# The program takes as input the Makefile to sort correctly,
>> +# and the output file to write the correctly sorted output.
>> +#
>> +# Sorting is only carried out between two special markers:
>> +# (a) Marker start is '<variable> += \'
>> +# (b) Marker end is '  # <variable>'
>> +# With everthing between (a) and (b) being sorted.
>> +#
>> +# You can use it like this:
>> +# $ scripts/sort-makefile-lines.py -i elf/Makefile -o elf/Makefile.tmp
>> +# $ mv elf/Makefile.tmp elf/Makefile
>> +#
>> +# The Makefile lines in the project are sorted using the
>> +# following rules:
>> +# - First all lines are sorted as-if `LC_COLLATE=C sort`
>> +# - Then all entries by group are sorted against the last digits
>> +#   of the test.
>> +#
>> +# For example:
>> +# ~~~
>> +# tests += \
>> +#   test-a \
>> +#   test-b \
>> +#   test-b1 \
>> +#   test-b2 \
>> +#   test-b10 \
>> +#   test-b20 \
>> +#   test-b100 \
>> +#   # tests
>> +# ~~~
>> +# This example shows tests sorted alphabetically, followed
>> +# by a numeric suffix sort in increasing numeric order.
>> +#
>> +# Required cleanups:
>> +# - Tests that end in "a" or "b" variants must be renamed to
>> +#   end in just the numerical value. For example 'tst-mutex7robust'
>> +#   should be renamed to 'tst-mutex12' (the highest numbered test)
>> +#   or 'tst-robust11' (the highest numbered test).
>> +# - Modules that end in "mod" or "mod1" should be renamed. For
>> +#   example 'tst-atfork2mod' should be renamed to 'tst-mod-atfork2'
>> +#   (test module for atfork2). If there are more than one module
>> +#   then they should be named with a suffix that uses [0-9] first
>> +#   then [A-Z] next for a total of 36 possible modules per test.
>> +#   No manually listed test currently uses more than that (though
>> +#   automatically generated tests may; they don't need sorting).
>> +# - Avoid including another test and instead refactor into common
>> +#   code with all tests including hte common code, then give the
>> +#   tests unique names.
>> +#
>> +# If you have a Makefile that needs converting, then you can
>> +# quickly split the values into one-per-line, ensure the start
>> +# and end markers are in place, and then run the script to
>> +# sort the values.
> 
> I'm not going to block on this if you don't want to do it, but have
> you thought about making this even simpler by, e.g. only looking for
> the start and end marker, splitting everything in between and then
> sorting the values?  Basically don't require developers to break the
> lists into one value per line and do it through the script.

I have, but that means writing code that only gets used once in the conversion
process and then we delete it because it's never needed again.

I want to keep this file as simple as possible with as little code as possible
for the future scenario where all the files are already converted and we're
just checking their sorting.

Notice I don't enforce any whitespace either here just the sorting.

The intent is to lint only the sorting of lines in a specific layout.

>> +
>> +import argparse
>> +import sys
>> +import locale
>> +import re
>> +import functools
>> +
>> +def numeric_key(line):
>> +    # Turn a line into a numeric sort value by fetching
>> +    # the ending number and using that as a key.
>> +    var = re.search(r'([0-9]+) \\$', line)
>> +    if var == None:
>> +        print ("Error: Test line is currently: \"", end='')
>> +        print (line, end='')
>> +        print ("\"")
>> +        print (
>> +        '''
>> +Test name does not match expected pattern.
>> +Rename to match pattern e.g. tst-name[0-9]+.
>> +        '''
>> +        )
>> +        raise Exception ("Invalid test name.")
>> +    # Return the numeric value as the key or throws because
>> +    # var is None.
>> +    return int(var.group(1))
>> +
>> +def sort_lines(lines):
>> +
>> +    # Use the C locale for language independent collation.
>> +    locale.setlocale (locale.LC_ALL, "C")
> 
> Will we ever have non-ASCII names for tests, routines, etc?

Yes.

With C we'll sort by code point order (byte-by-byte).

It won't match the sorting of any *language* but it will sort by UTF-8 code point order.

>> +
>> +    # Sort with strcoll initially.  The tests ending in numeric
>> +    # names will not sort correctly, but we will adjust that next.
>> +    lines = sorted(lines, key=functools.cmp_to_key(locale.strcoll))
> 
> I wonder if you could, instead of simply passing strcoll, use a
> custom function which calls strcoll (or a simple sort if we decide
> we'll never have non-ascii file names) and also accounts for the
> suffix, returning, e.g. -1 for cmp('tst-mutex9', 'tst-mutex10').

Fixed in v2.

> 
>> +
>> +    # We could use a trie here, but since the problem is restricted
>> +    # to just numeric suffix we sort by group with a unique key
>> +    # function.
>> +
>> +    # Build a list of all start markers (tuple includes prefix)
>> +    prefixes = []
>> +    groups = []
>> +    for i in range(len(lines)):
>> +        # Look for things like "  tst-foo1 \" to start the numbered list.
>> +        var = re.search(r'([0-9]+) \\$', lines[i])
>> +        if var:
>> +            prefix = lines[i][0:var.span()[0]]
>> +            if prefix in prefixes:
>> +                continue
>> +            prefixes.append(prefix)
>> +            groups.append((prefix, i))
>> +
>> +    # For each prefix find the range it covers that needs numeric sorting.
>> +    numgroups = []
>> +    for group in groups:
>> +        for j in range(group[1] + 1,len(lines)):
>> +            if not lines[j].startswith(group[0]):
>> +                # If it doesn't start with the prefix, then we're on to
>> +                # to the next group so mark the last entry as the end
>> +                # of the group.
>> +                numgroups.append((group[0], group[1], j - 1))
>> +                break
>> +
>> +    # We now have a list of groups to sort.
>> +    for ng in numgroups:
>> +        # Note slices exclude nth element, so we must add one to right side.
>> +        lines[ng[1]:ng[2]+1] = sorted(lines[ng[1]:ng[2]+1], key=numeric_key)
>> +
>> +    # Return a sorted list with numeric tests sorted by number.
>> +    return lines
>> +
>> +def sort_makefile_lines(infile, outfile):
>> +
> 
> Maybe add a check here to ensure infile != outfile?  Or do we want to
> support that use case?  It should be doable since you're reading the
> entire file into a list at once.

It should be allowed, we open, read, close, and then open, write close.

I've adjusted all the uses to avoid showing a temp file.
 
>> +    # Read the whole Makefile.
>> +    mfile = open(infile)
>> +    lines = mfile.readlines()
>> +    mfile.close()
>> +
>> +    # We will output the Makefile here. Open it early to check
>> +    # for any errors.
>> +    ofile = open(outfile, "w")
>> +
>> +    # Build a list of all start markers (tuple includes name).
>> +    startmarks = []
>> +    for i in range(len(lines)):
>> +        # Look for things like "var += \" to start the sorted list.
>> +        var = re.search(r'^([a-zA-Z0-9]*) \+\= \\$', lines[i])
>> +        if var:
>> +            # Remember the index and the name.
>> +            startmarks.append((i, var.group(1)))
>> +
>> +    # For each start marker try to find a matching end mark
>> +    # and build a block that needs sorting.  The end marker
>> +    # must have the matching comment name for it to be valid.
>> +    rangemarks = []
>> +    for sm in startmarks:
>> +        # Look for things like "  # var" to end the sorted list.
>> +        reg = r'^  # ' + sm[1] + r'$'
>> +        for j in range(sm[0] + 1, len(lines)):
>> +            if re.search(reg, lines[j]):
>> +                # Rembember the block to sort (inclusive).
>> +                rangemarks.append((sm[0] + 1, j - 1))
>> +                break
>> +
>> +    # We now have a list of all ranges that need sorting.
>> +    # Sort those ranges.
>> +    for r in rangemarks:
>> +        lines[r[0]:r[1]] = sort_lines(lines[r[0]:r[1]])
>> +
>> +    # Output the whole list with sorted lines.
>> +    for line in lines:
>> +        ofile.write(line)
>> +
>> +    ofile.close()
>> +
>> +def get_parser():
>> +    parser = argparse.ArgumentParser(description=__doc__)
>> +    parser.add_argument('-i', dest='infile',
>> +                        help='Input Makefile to read lines from')
>> +    parser.add_argument('-o', dest='outfile',
>> +                        help='Output Makefile to write sorted lines to')
>> +    return parser
>> +
>> +def main(argv):
>> +    parser = get_parser()
>> +    opts = parser.parse_args(argv)
>> +    sort_makefile_lines (opts.infile, opts.outfile)
>> +
>> +if __name__ == '__main__':
>> +    main(sys.argv[1:])
> 

-- 
Cheers,
Carlos.

     prev parent reply	other threads:[~2023-05-09 17:58 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-28 11:48 [PATCH 0/2] Standardize the sorting of " Carlos O'Donell
2023-04-28 11:48 ` [PATCH 1/2] nptl: Re-flow and sort variables in Makefile Carlos O'Donell
2023-05-08 17:17   ` Siddhesh Poyarekar
2023-05-08 17:53     ` Siddhesh Poyarekar
2023-04-28 11:48 ` [PATCH 2/2] scripts: Add sort-makefile-lines.py to sort Makefile variables Carlos O'Donell
2023-05-08 17:47   ` Siddhesh Poyarekar
2023-05-09 17:58     ` Carlos O'Donell [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fe87b600-8f51-d7c3-4c81-64da7b37b29d@redhat.com \
    --to=carlos@redhat.com \
    --cc=libc-alpha@sourceware.org \
    --cc=siddhesh@gotplt.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).