public inbox for libc-alpha@sourceware.org
* [RFC] Updating patchwork patches on commit
@ 2020-12-07  5:48 Siddhesh Poyarekar
  2020-12-07  8:45 ` Florian Weimer
  2020-12-07 16:15 ` DJ Delorie
  0 siblings, 2 replies; 21+ messages in thread
From: Siddhesh Poyarekar @ 2020-12-07  5:48 UTC (permalink / raw)
  To: libc-alpha

[Re-sending because I don't know how to type email addresses.]

Hi,

I have been running some hacked up scripts to update patch state on 
patchwork for every commit that goes into the glibc repository.  The 
script simply walks through commits in a date range, hashes the diffs 
from each ref (using patchwork/hasher.py) and compares it with hashes on 
patchwork.  If the patch has been committed with the diff unchanged, the 
hashes match.  This is very similar to the git hook that patchwork 
ships[1], so I hope to eventually add this into the glibc git hook.
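
A minimal sketch of the matching step described above, for illustration
only.  It does not use patchwork/hasher.py; simple_diff_hash is a
hypothetical stand-in that ignores index lines, hunk offsets and
whitespace, and the two dictionaries passed to match_commits are assumed
shapes rather than anything patchwork provides.

```python
import hashlib


def simple_diff_hash(diff_text):
    """Hash a unified diff after light normalization (illustrative only)."""
    sha = hashlib.sha1()
    for line in diff_text.splitlines():
        # "diff" and "index" lines carry blob IDs that differ between trees.
        if line.startswith(("diff ", "index ")):
            continue
        if line.startswith("@@"):
            # Hunk offsets move when the patch is rebased; drop them.
            line = "@@"
        else:
            # Remove all whitespace so reflowed lines still compare equal.
            line = "".join(line.split())
        sha.update(line.encode("utf-8") + b"\n")
    return sha.hexdigest()


def match_commits(commit_diffs, known_hashes):
    """Split commits into those found on the tracker and those missing.

    commit_diffs: dict mapping a commit ref to its diff text (assumed shape).
    known_hashes: dict mapping a diff hash to a patch ID (assumed shape).
    """
    found, missing = [], []
    for ref, diff_text in commit_diffs.items():
        patch_id = known_hashes.get(simple_diff_hash(diff_text))
        if patch_id is None:
            missing.append(ref)
        else:
            found.append((ref, patch_id))
    return found, missing
```

A commit lands in "missing" as soon as any content line of its diff
differs from what was posted, which is the situation behind the counts
below.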

In the last run (2020-12-07), of the 33 commits that went in since 
2020-12-01, 19 were found in patchwork and 14 were missing.  The week 
before (2020-11-23 - 2020-12-01) it was 19 found and 9 missing.

This means that diffs of 14 patches were modified before committing. Our 
commit policy explicitly allows this and trusts committers to limit 
these changes to trivial fixes.  However for patchwork usage to be 
valuable (and in the process, improve transparency), a 1:1 
correspondence between git commits and patchwork would be ideal.  That 
is, every commit on git should have at least one[2] patchwork entry. 
This also solves the question "What finally went in?" I've had to ask 
myself repeatedly when cleaning up patchwork state.

We could achieve this without additional busy work by having the git 
hook send out [pushed] emails to the list in addition to glibc-cvs 
(libc-alpha should be spared the private branch traffic of course) 
whenever it sees a commit that it can't find on patchwork.  A nightly 
script can then trivially mark all [pushed] patches as committed.
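
As a rough sketch of what such a nightly pass might select, assuming each
patch is available as a dict with "id", "name" (the subject) and "state"
keys.  That record shape, the "committed" state string and the function
name are assumptions for illustration, not an existing patchwork or
git-pw interface.

```python
import re

# Matches a bracketed subject tag containing the word "pushed",
# e.g. "[pushed]" or "[pushed,v2]".
PUSHED_TAG = re.compile(r"\[[^\]]*\bpushed\b[^\]]*\]", re.IGNORECASE)


def patches_to_mark_committed(patches):
    """Return IDs of not-yet-committed patches whose subject is tagged [pushed]."""
    return [
        p["id"]
        for p in patches
        if p["state"] != "committed" and PUSHED_TAG.search(p["name"])
    ]
```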

Thoughts?

Siddhesh

[1] 
https://github.com/getpatchwork/patchwork/blob/master/tools/post-receive.hook
[2] People have been known to send out identical patches repeatedly as 
part of a series.


* Re: [RFC] Updating patchwork patches on commit
  2020-12-07  5:48 [RFC] Updating patchwork patches on commit Siddhesh Poyarekar
@ 2020-12-07  8:45 ` Florian Weimer
  2020-12-07  9:30   ` Siddhesh Poyarekar
  2020-12-07 16:15 ` DJ Delorie
  1 sibling, 1 reply; 21+ messages in thread
From: Florian Weimer @ 2020-12-07  8:45 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: libc-alpha

* Siddhesh Poyarekar:

> We could achieve this without additional busy work by having the git
> hook send out [pushed] emails to the list in addition to glibc-cvs 
> (libc-alpha should be spared the private branch traffic of course)
> whenever it sees a commit that it can't find on patchwork.  A nightly 
> script can then trivially mark all [pushed] patches as committed.

I'm not sure if this is useful if we can't find the thread to which the
updated commit belongs.

If we can find the thread, it would be more useful to post the diff
between what was committed and the latest posted patch.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



* Re: [RFC] Updating patchwork patches on commit
  2020-12-07  8:45 ` Florian Weimer
@ 2020-12-07  9:30   ` Siddhesh Poyarekar
  0 siblings, 0 replies; 21+ messages in thread
From: Siddhesh Poyarekar @ 2020-12-07  9:30 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On 12/7/20 2:15 PM, Florian Weimer wrote:
> * Siddhesh Poyarekar:
> 
>> We could achieve this without additional busy work by having the git
>> hook send out [pushed] emails to the list in addition to glibc-cvs
>> (libc-alpha should be spared the private branch traffic of course)
>> whenever it sees a commit that it can't find on patchwork.  A nightly
>> script can then trivially mark all [pushed] patches as committed.
> 
> I'm not sure if this is useful if we can't find the thread to which the
> updated commit belongs.

That's a broader problem not limited to these [pushed] patches; we 
currently don't have a way to associate different versions of the same 
patch.  My thinking was that adding these commits won't make things 
worse and could at least give us confidence that the patches that remain 
definitely did not make it into the repo and take stronger actions.  It 
could let us do things like walking backwards in time from committed 
patches to find patches with identical subject lines and close them off 
as superseded.  It won't catch all superseded patches, but at least 
we'll get a majority of them.

Once the process is bootstrapped, the likelihood of false positives 
(i.e. marking unrelated patches with the same subject lines) ought to be 
negligible.
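
A minimal sketch of that backwards walk, assuming patch records are dicts
with "id", "name", "date" (an ISO date string) and "state" keys.  The
record shape and both function names are hypothetical, and stripping
every leading bracketed tag is a simplification of how subjects would
really need to be compared.

```python
import re

# Leading bracketed tags such as "[PATCH v2 3/6]" or "[pushed]".
_PREFIX = re.compile(r"^(\s*\[[^\]]*\]\s*)+")


def normalize_subject(subject):
    """Drop leading bracketed tags and case so versions of a patch compare equal."""
    return _PREFIX.sub("", subject).strip().lower()


def find_superseded(patches):
    """Return IDs of open patches older than a committed patch with the same subject."""
    committed = {}
    for p in patches:
        if p["state"] == "committed":
            key = normalize_subject(p["name"])
            # Remember the latest commit date seen for this subject.
            committed[key] = max(committed.get(key, p["date"]), p["date"])

    superseded = []
    for p in patches:
        if p["state"] == "committed":
            continue
        key = normalize_subject(p["name"])
        # Only patches posted before the committed version are closed off.
        if key in committed and p["date"] < committed[key]:
            superseded.append(p["id"])
    return superseded
```

Requiring the open patch to be older than the committed one is what keeps
a later, unrelated patch that reuses the subject line from being closed.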

Siddhesh


* Re: [RFC] Updating patchwork patches on commit
  2020-12-07  5:48 [RFC] Updating patchwork patches on commit Siddhesh Poyarekar
  2020-12-07  8:45 ` Florian Weimer
@ 2020-12-07 16:15 ` DJ Delorie
  2020-12-07 16:39   ` Siddhesh Poyarekar
  1 sibling, 1 reply; 21+ messages in thread
From: DJ Delorie @ 2020-12-07 16:15 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: libc-alpha

Siddhesh Poyarekar <siddhesh@gotplt.org> writes:
> This means that diffs of 14 patches were modified before committing.

Do you try removing the Reviewed-by tags and re-hashing?  My last step
before committing is usually to add those according to the reviews, so
my patches might never match patchwork.



* Re: [RFC] Updating patchwork patches on commit
  2020-12-07 16:15 ` DJ Delorie
@ 2020-12-07 16:39   ` Siddhesh Poyarekar
  2020-12-07 17:02     ` DJ Delorie
  0 siblings, 1 reply; 21+ messages in thread
From: Siddhesh Poyarekar @ 2020-12-07 16:39 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha

On 12/7/20 9:45 PM, DJ Delorie wrote:
> Siddhesh Poyarekar <siddhesh@gotplt.org> writes:
>> This means that diffs of 14 patches were modified before committing.
> 
> Do you try removing the Reviewed-by tags and re-hashing?  My last step
> before committing is usually to add those according to the reviews, so
> my patches might never match patchwork.
> 

Well your NSS patches did match and auto-close; in fact the v4 of 3/6 in 
that patchset also got closed as Committed because diff-wise it was 
identical to v5 3/6 :)

Patchwork stores the diff separately and the hash is generated only on 
the diff using the patchwork hasher in patchwork/hasher.py.  So changing 
the commit message in any way does not change the hash.
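
To illustrate why a Reviewed-by tag cannot affect the match, here is a
small sketch that hashes only the diff part of a patch.  It uses a plain
SHA-1 with none of patchwork's normalization, and split_patch and
diff_only_hash are hypothetical helpers, not functions from hasher.py.

```python
import hashlib


def split_patch(patch_text):
    """Split a patch into (commit message, diff) at the first "diff --git" line."""
    lines = patch_text.splitlines(keepends=True)
    for idx, line in enumerate(lines):
        if line.startswith("diff --git "):
            return "".join(lines[:idx]), "".join(lines[idx:])
    # No diff found: everything is message text.
    return patch_text, ""


def diff_only_hash(patch_text):
    """Hash the diff alone, so edits to the commit message are invisible."""
    _, diff = split_patch(patch_text)
    return hashlib.sha1(diff.encode("utf-8")).hexdigest()
```

With this split, adding tags above the diff leaves the hash untouched,
while any change to an added or removed line produces a different one.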

Siddhesh


* Re: [RFC] Updating patchwork patches on commit
  2020-12-07 16:39   ` Siddhesh Poyarekar
@ 2020-12-07 17:02     ` DJ Delorie
  2020-12-07 18:11       ` Joseph Myers
  0 siblings, 1 reply; 21+ messages in thread
From: DJ Delorie @ 2020-12-07 17:02 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: libc-alpha

Siddhesh Poyarekar <siddhesh@gotplt.org> writes:
> Well your NSS patches did match and auto-close; in fact the v4 of 3/6 in 
> that patchset also got closed as Committed because diff-wise it was 
> identical to v5 3/6 :)

Ah, patchwork hash != git hash.  Nevermind ;-)



* Re: [RFC] Updating patchwork patches on commit
  2020-12-07 17:02     ` DJ Delorie
@ 2020-12-07 18:11       ` Joseph Myers
  2020-12-08  2:57         ` Siddhesh Poyarekar
  0 siblings, 1 reply; 21+ messages in thread
From: Joseph Myers @ 2020-12-07 18:11 UTC (permalink / raw)
  To: DJ Delorie; +Cc: Siddhesh Poyarekar, libc-alpha

On Mon, 7 Dec 2020, DJ Delorie via Libc-alpha wrote:

> Siddhesh Poyarekar <siddhesh@gotplt.org> writes:
> > Well your NSS patches did match and auto-close; in fact the v4 of 3/6 in 
> > that patchset also got closed as Committed because diff-wise it was 
> > identical to v5 3/6 :)
> 
> Ah, patchwork hash != git hash.  Nevermind ;-)

A previous discussion suggested "git patch-id" was appropriate to use for 
this purpose, but I don't know if it's what patchwork actually uses.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [RFC] Updating patchwork patches on commit
  2020-12-07 18:11       ` Joseph Myers
@ 2020-12-08  2:57         ` Siddhesh Poyarekar
  2020-12-08  9:08           ` Andreas Schwab
  0 siblings, 1 reply; 21+ messages in thread
From: Siddhesh Poyarekar @ 2020-12-08  2:57 UTC (permalink / raw)
  To: Joseph Myers, DJ Delorie; +Cc: libc-alpha

On 12/7/20 11:41 PM, Joseph Myers wrote:
> On Mon, 7 Dec 2020, DJ Delorie via Libc-alpha wrote:
> 
>> Siddhesh Poyarekar <siddhesh@gotplt.org> writes:
>>> Well your NSS patches did match and auto-close; in fact the v4 of 3/6 in
>>> that patchset also got closed as Committed because diff-wise it was
>>> identical to v5 3/6 :)
>>
>> Ah, patchwork hash != git hash.  Nevermind ;-)
> 
> A previous discussion suggested "git patch-id" was appropriate to use for
> this purpose, but I don't know if it's what patchwork actually uses.
> 

It doesn't; it has its own hashing function where it normalizes spaces 
and newline chars to avoid false negatives.  It could however do with 
some rudimentary sorting of diff lines to ensure that it generates the 
same hash for reordered diffs.  I'll play with that a bit later in the 
week and see if it improves matching.

Siddhesh


* Re: [RFC] Updating patchwork patches on commit
  2020-12-08  2:57         ` Siddhesh Poyarekar
@ 2020-12-08  9:08           ` Andreas Schwab
  2020-12-08 10:10             ` Siddhesh Poyarekar
  0 siblings, 1 reply; 21+ messages in thread
From: Andreas Schwab @ 2020-12-08  9:08 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: Joseph Myers, DJ Delorie, libc-alpha

On Dec 08 2020, Siddhesh Poyarekar wrote:

> On 12/7/20 11:41 PM, Joseph Myers wrote:
>> On Mon, 7 Dec 2020, DJ Delorie via Libc-alpha wrote:
>> 
>>> Siddhesh Poyarekar <siddhesh@gotplt.org> writes:
>>>> Well your NSS patches did match and auto-close; in fact the v4 of 3/6 in
>>>> that patchset also got closed as Committed because diff-wise it was
>>>> identical to v5 3/6 :)
>>>
>>> Ah, patchwork hash != git hash.  Nevermind ;-)
>> A previous discussion suggested "git patch-id" was appropriate to use
>> for
>> this purpose, but I don't know if it's what patchwork actually uses.
>> 
>
> It doesn't; it has its own hashing function where it normalizes spaces
> and newline chars to avoid false negatives.

Like git patch-id?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


* Re: [RFC] Updating patchwork patches on commit
  2020-12-08  9:08           ` Andreas Schwab
@ 2020-12-08 10:10             ` Siddhesh Poyarekar
  2020-12-16 18:35               ` Girish Joshi
  0 siblings, 1 reply; 21+ messages in thread
From: Siddhesh Poyarekar @ 2020-12-08 10:10 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Joseph Myers, DJ Delorie, libc-alpha

On 12/8/20 2:38 PM, Andreas Schwab wrote:
>> It doesn't; it has its own hashing function where it normalizes spaces
>> and newline chars to avoid false negatives.
> 
> Like git patch-id?
> 

Yeah, except that it (AFAICT) doesn't order the diff input like git 
patch-id does :)  I suppose I could check if they're willing to add a 
dependency on git for this and drop their custom hasher or at least 
provide a supported way to add a different hashing function or program.
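
For comparison, a minimal sketch of computing the ID with git itself,
assuming git is on the PATH.  "git patch-id --stable" reads a patch on
stdin and prints the patch ID followed by a commit ID; the two wrapper
functions here are hypothetical names, not part of patchwork.

```python
import subprocess


def parse_patch_id(output):
    """Return the patch ID from `git patch-id` output, or None if there is none."""
    fields = output.split()
    return fields[0] if fields else None


def git_patch_id(diff_text):
    """Feed a diff to `git patch-id --stable` and return the resulting ID."""
    result = subprocess.run(
        ["git", "patch-id", "--stable"],
        input=diff_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_patch_id(result.stdout)
```

With --stable, reordering the per-file diffs that make up a patch does
not change the ID, which is the property the custom hasher appears to
lack.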

Siddhesh


* Re: [RFC] Updating patchwork patches on commit
  2020-12-08 10:10             ` Siddhesh Poyarekar
@ 2020-12-16 18:35               ` Girish Joshi
  2020-12-16 18:49                 ` Siddhesh Poyarekar
  0 siblings, 1 reply; 21+ messages in thread
From: Girish Joshi @ 2020-12-16 18:35 UTC (permalink / raw)
  To: Siddhesh Poyarekar
  Cc: Andreas Schwab, Girish Joshi via Libc-alpha, Joseph Myers

[-- Attachment #1: Type: text/plain, Size: 1934 bytes --]

Hello all,
I tried a couple of very basic scripts for this. (I know that there
are a lot of improvements needed there.)
I was able to merge 336 series out of 1114.

Since "git-pw patch apply <id>" gives "Resource not found" for the older
patches, right now only series are applied to a branch.
Here is how the scripts work.
We have two scripts, "get-patches.py" and "apply-patches.py" (we can
change the names of course).
"get-patches.py" reads the patches/series starting from page 1 to page
100 (currently) in csv format and dumps it to stdout. This output is
piped to the second script "apply-patches.py" which tries to apply
each series/patch to the branch.
In the end we get two files as an output "merged.txt" and
"unmerged.txt" containing the IDs for merged and unmerged series
respectively.
Currently these files are placed in the current directory, I'll change
it to /tmp or something else in the next patch.

Just to have it here, to apply patches using these two scripts:

    $ python scripts/get-patches.py series | python scripts/apply-patches.py series apply

I'm still not sure about what happens to the older patches, do they
get applied from "git-pw series apply" or not (I'm looking into it)
because the newer ones do get applied.

Is it going in the right direction? Please share your thoughts.
Thanks.

Girish Joshi
girishjoshi.io

On Tue, Dec 8, 2020 at 3:40 PM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>
> On 12/8/20 2:38 PM, Andreas Schwab wrote:
> >> It doesn't; it has its own hashing function where it normalizes spaces
> >> and newline chars to avoid false negatives.
> >
> > Like git patch-id?
> >
>
> Yeah, except that it (AFAICT) doesn't order the diff input like git
> patch-id does :)  I suppose I could check if they're willing to add a
> dependency on git for this and drop their custom hasher or at least
> provide a supported way to add a different hashing function or program.
>
> Siddhesh

[-- Attachment #2: get-patches.py --]
[-- Type: text/x-python, Size: 286 bytes --]

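#!/usr/bin/env python3
import os
import sys

# First argument selects what to list: "patch" or "series".
type_ = sys.argv[1]

command = "git-pw {0} list --page {1} -f csv"
if type_ == 'patch':
    command += " --state 'new'"

# Fetch pages 1 through 100; stop at the first page git-pw fails on.
for i in range(1, 101):
    # print(command.format(type_, i))
    ret = os.system(command.format(type_, i))
    if ret:
        break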

[-- Attachment #3: apply-patches.py --]
[-- Type: text/x-python, Size: 3382 bytes --]

#!/usr/bin/env python3
import csv
import sys
import subprocess as sp

# import time

prune_warining = "warning: There are too many unreachable loose objects; run 'git prune' to remove them."

# List for series entries
series = []

# These lists will contain merged and unmerged series data.
merged = []
unmerged = []

# option that we will be operating upon, series or the patch
# this is the command line argument to git-pw
# for example "git-pw patch apply 12345" or "git-pw series apply 12356"
type_ = "series"


# Get the csv data from stdin
csv_data = []
for line in sys.stdin:
    if '"ID","Date","Name","Version","Submitter"' not in line:
        print(line)
        csv_data.append(line.strip())


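# parse the csv entries
def read_rows(csvfile):
    spamreader = csv.reader(csvfile, delimiter=",", quotechar='"')
    for row in spamreader:
        # print(row)
        if not row:
            # skip blank lines instead of abandoning the rest of the input
            continue
        # the ID is the first column; skip any header row that slipped through
        if row[0] != "ID":
            series.append(row)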


def get_output(cmnd):
    """
    Execute command and check the output, if git throws a warning saying
    "warning: There are too many unreachable loose objects; run 'git prune' to remove them."
    `git prune` will be executed. otherwise output will be printed and exit code
    will be returned.
    """
    try:
        output = sp.check_output(
            cmnd, stderr=sp.STDOUT, shell=True, universal_newlines=True
        )
    except sp.CalledProcessError as exc:
        print("Status : FAIL", exc.returncode, exc.output)
        return exc.returncode
    else:
        print("Output: \n{}\n".format(output))
    if prune_warining in output:
        print("running git prune")
        get_output("git prune")
    return 0


def write_file(filename, list_):
    """
    This function is used to write the IDs for patches/series that
    are merged/unmerged after we have processed everything.
    """
    with open(filename, "w") as f:
        for i in list_:
            f.write(i[0] + "\n")


if __name__ == "__main__":

    read_rows(csv_data)

    # this is crappy, it will be replaced by arg parser.
    if len(sys.argv) >= 3:
        if sys.argv[1] == "series" or sys.argv[1] == "patch":
            type_ = sys.argv[1]

        if sys.argv[2] == "apply":
            print("applying ", type_)
            if series:
                for i in series:
                    try:
                        print("trying to apply:", i[0], i[1], i[2])
                        if i[0] == "ID":
                            continue
                        ret = get_output(f"git-pw {type_} apply {i[0]}")
                        print("git exit code: ", ret)

                        # time.sleep(0.5)
                        if ret:
                            # if `git-pw patch/series apply <id>` fails
                            # resetting to HEAD
                            print("resetting...")
                            get_output("git reset --hard HEAD")
                            get_output("git am --abort")
                            unmerged.append(i)
                        else:
                            merged.append(i)
                    except KeyboardInterrupt as ke:
                        break

    print("total merged: {0}, total unmerged {1}".format(len(merged), len(unmerged)))
    write_file("merged.txt", merged)
    write_file("unmerged.txt", unmerged)


* Re: [RFC] Updating patchwork patches on commit
  2020-12-16 18:35               ` Girish Joshi
@ 2020-12-16 18:49                 ` Siddhesh Poyarekar
  2020-12-17 17:49                   ` Girish Joshi
  0 siblings, 1 reply; 21+ messages in thread
From: Siddhesh Poyarekar @ 2020-12-16 18:49 UTC (permalink / raw)
  To: Girish Joshi; +Cc: Andreas Schwab, Girish Joshi via Libc-alpha, Joseph Myers

On 12/17/20 12:05 AM, Girish Joshi wrote:
> Hello all,
> I tried a couple of very basic scripts for this. (I know that there
> are a lot of improvements needed there.)
> I was able to merge 336 series out of 1114.

I'm surprised there are 1114 series that need action; maybe it's 
including series that have already been committed and you need to filter 
those out?

> As "git-pw patch apply <id>" gives "Resource not found" for the older
> patches. So right now only series are applied to a branch.
> Here is how the scripts work.
> We have two scripts, "get-patches.py" and "apply-patches.py" (we can
> change the names of course).
> "get-patches.py" reads the patches/series starting from page1 to page
> 100 (currently) in csv format and dumps it to stdout. This output is
> piped to the second script "apply-patches.py" which tries to apply
> each series/patch to the branch.

It should become one script.

> In the end we get two files as an output "merged.txt" and
> "unmerged.txt" containing the IDs for merged and unmerged series
> respectively.
> Currently these files are placed in the current directory, I'll change
> it to /tmp or something else in the next patch.
> 
> Just to have it here, to apply patches using these two scripts
> 
>      $ python scripts/get-patches.py series | python
> scripts/apply-patches.py series apply
> 
> I'm still not sure about what happens to the older patches, do they
> get applied from "git-pw series apply" or not (I'm looking into it)
> because the newer ones do get applied.

The older ones do not have a series ID because they were ported over 
from an ancient patchwork instance, so they won't work with `git-pw 
series`.  They'll need some trickery to figure out series.  There ought 
to be some relationship beyond the name, say, in the mbox of the patch 
that could be exploited to make that connection.
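
One possible form of that trickery, sketched under the assumption that
the imported patches still carry their original Message-ID, In-Reply-To
and References headers (not verified against the actual instance).  The
function names and the dict of raw messages are hypothetical; only
Python's standard email module is used.

```python
from email import message_from_string


def thread_root(mbox_text):
    """Return the Message-ID that starts the thread this message belongs to."""
    msg = message_from_string(mbox_text)
    # References lists ancestors oldest first, so its first entry is the root.
    references = msg.get("References", "").split()
    if references:
        return references[0]
    in_reply_to = msg.get("In-Reply-To", "").strip()
    # A message with no parent is its own root.
    return in_reply_to or msg.get("Message-ID", "").strip()


def group_by_thread(mboxes):
    """Group patch IDs by thread root.

    mboxes: dict mapping a patch ID to its raw message text (assumed shape).
    """
    groups = {}
    for patch_id, text in mboxes.items():
        groups.setdefault(thread_root(text), []).append(patch_id)
    return groups
```

Patches sent as replies to a common cover letter would end up in one
group, which could then be treated as a reconstructed series.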

Siddhesh


* Re: [RFC] Updating patchwork patches on commit
  2020-12-16 18:49                 ` Siddhesh Poyarekar
@ 2020-12-17 17:49                   ` Girish Joshi
  2020-12-18  4:04                     ` Siddhesh Poyarekar
  0 siblings, 1 reply; 21+ messages in thread
From: Girish Joshi @ 2020-12-17 17:49 UTC (permalink / raw)
  To: Siddhesh Poyarekar
  Cc: Andreas Schwab, Girish Joshi via Libc-alpha, Joseph Myers

[-- Attachment #1: Type: text/plain, Size: 1889 bytes --]

Hi Siddhesh,
On Thu, Dec 17, 2020 at 12:19 AM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> I'm surprised there are 1114 series that need action; maybe it's
> including series that have already been committed and you need to filter
> those out?
Yeah, in the git output we can see that a lot of those are already applied.

> > As "git-pw patch apply <id>" gives "Resource not found" for the older
> > patches. So right now only series are applied to a branch.
> > Here is how the scripts work.
> > We have two scripts, "get-patches.py" and "apply-patches.py" (we can
> > change the names of course).
> > "get-patches.py" reads the patches/series starting from page1 to page
> > 100 (currently) in csv format and dumps it to stdout. This output is
> > piped to the second script "apply-patches.py" which tries to apply
> > each series/patch to the branch.
>
> It should become one script.
Will do that in a couple of iterations, right now I've modified it so
that it can take input from stdin as well as from a csv file.

> > In the end we get two files as an output "merged.txt" and
> > "unmerged.txt" containing the IDs for merged and unmerged series
> > respectively.
> > Currently these files are placed in the current directory, I'll change
> > it to /tmp or something else in the next patch.
I've added one more file to it for unavailable patches, so that the
author can be notified and asked to repost those patches (if needed).
Also added an argument for changing the output location for these files.

> The older ones do not have a series ID because they were ported over
> from an ancient patchwork instance, so they won't work with `git-pw
> series`.  They'll need some trickery to figure out series.  There ought
> to be some relationship beyond the name, say, in the mbox of the patch
> that could be exploited to make that connection.
Looking into it.

Thanks.
Girish Joshi

[-- Attachment #2: apply-patches.py --]
[-- Type: text/x-python, Size: 6346 bytes --]

#!/usr/bin/env python3

import subprocess as sp
import argparse
import csv
import os
import sys


# if these strings are found in output of git/git-pw,
# we need to take some actions.
prune_warining = "warning: There are too many unreachable loose objects; run 'git prune' to remove them."
resource_not_found_warning = "Resource not found"
already_applied_warning = "No changes -- Patch already applied."

# These lists will contain merged, unmerged and unavailable series data.
merged = []
unmerged = []
unavailable = []

# parse the csv entries
def read_rows(csvfile):

    # List for series entries
    series_data = []
    csvreader = csv.reader(csvfile, delimiter=",", quotechar='"')
    for row in csvreader:
        # print(row)
        if not row:
            # skip blank lines; returning here would hand None to the caller
            continue
        # the ID is the first column; skip any header row that slipped through
        if row[0] != "ID":
            series_data.append(row)
    return series_data


def run_cmd(cmd, debug=False):
    """
    Execute command and check the output, if git throws a warning saying
    "warning: There are too many unreachable loose objects; run 'git prune' to remove them."
    `git prune` will be executed. otherwise output will be printed and exit code
    will be returned.
    """
    exit_code = 0
    output = ""
    try:
        output = sp.check_output(
            cmd, stderr=sp.STDOUT, shell=True, universal_newlines=True
        )
    except sp.CalledProcessError as exc:
        if debug:
            print("Status : FAIL", exc.returncode, exc.output)
        exit_code, output = exc.returncode, exc.output
    else:
        if debug:
            print("{}\n".format(output))

    return exit_code, output


def write_file(filename, list_):
    """
    This function is used to write the IDs for patches/series that
    are merged/unmerged/unavailable after we have processed everything.
    """
    with open(filename, "w") as f:
        for i in list_:
            f.write(i[0] + "\n")


def apply_(series):
    for i in series:
        try:
            print(
                f"{bcolors.OKGREEN}trying to apply:{type_}{bcolors.OKCYAN} {i[0]} {bcolors.ENDC}"
            )
            print(f"{bcolors.OKBLUE} {i[1]}, {bcolors.UNDERLINE}{i[2]}{bcolors.ENDC}")
            if i[0] == "ID":
                continue

            exit_code, output = run_cmd(f"git-pw {type_} apply {i[0]}")

            if prune_warining in output:
                print("running: git prune")
                run_cmd("git prune")

            if exit_code == 1 and resource_not_found_warning in output:
                print(f"{bcolors.WARNING}patch unavailable{bcolors.ENDC}")
                unavailable.append(i)

            if exit_code:
                # if `git-pw patch/series apply <id>` fails
                # resetting to HEAD

                print(
                    f"{bcolors.OKCYAN}git exit code: {bcolors.FAIL}{exit_code}{bcolors.ENDC}"
                )
                print(f"{bcolors.FAIL}resetting to HEAD: {bcolors.ENDC}")

                if os.path.exists(".git/rebase-apply"):
                    run_cmd("git am --abort")
                run_cmd("git reset --hard HEAD", debug=True)

                unmerged.append(i)

            else:
                if output.strip().endswith(already_applied_warning):
                    print(
                        f"{bcolors.WARNING}No changes -- already applied {bcolors.ENDC}\n"
                    )
                else:
                    print(f"{bcolors.OKCYAN}{type_} applied{bcolors.ENDC}\n")
                merged.append(i)
        except KeyboardInterrupt as ke:
            break


if __name__ == "__main__":

    parser = argparse.ArgumentParser(description="Initial CI script for patchwork")
    parser.add_argument(
        "-c", "--colors", default=False, action="store_true", help="Enable colors"
    )

    parser.add_argument(
        "-t",
        "--type",
        type=str,
        default="series",
        choices=["patch", "series"],
        help="type: patch/series",
    )
    parser.add_argument(
        "-a", "--action", type=str, default="apply", help="action: list/apply"
    )
    parser.add_argument(
        "-o",
        "--output-location",
        type=str,
        default="/tmp/pw-results",
        help="location for the output files containing merged, unmerged and unavailable patches/series.",
    )
    parser.add_argument(
        "-i",
        "--input-file",
        type=str,
        default="-",
        help="input file: csv file or '-' for the standard input",
    )
    args = parser.parse_args()
    print(args)

    csv_data = []
    if args.input_file == "-":

        # Get the csv data from stdin
        for line in sys.stdin:
            if '"ID"' not in line:
                print(line)
                csv_data.append(line.strip())

    elif os.path.exists(args.input_file):
        # Read the csv data from the given file, closing it when done
        with open(args.input_file) as f:
            data = f.read().strip().split("\n")
        for line in data:
            if '"ID"' not in line:
                print(line)
                csv_data.append(line.strip())

    # option that we will be operating upon, series or the patch
    # this is the command line argument to git-pw
    # for example "git-pw patch apply 12345" or "git-pw series apply 12356"
    type_ = args.type

    series_data = read_rows(csv_data)
    output_files_loc = args.output_location
    if not os.path.exists(output_files_loc):
        # makedirs also creates any missing parent directories
        os.makedirs(output_files_loc)

    colors = args.colors

    class bcolors:
        if colors:
            HEADER = "\033[95m"
            OKBLUE = "\033[94m"
            OKCYAN = "\033[96m"
            OKGREEN = "\033[92m"
            WARNING = "\033[93m"
            FAIL = "\033[91m"
            ENDC = "\033[0m"
            BOLD = "\033[1m"
            UNDERLINE = "\033[4m"
        else:
            HEADER = ""
            OKBLUE = ""
            OKCYAN = ""
            OKGREEN = ""
            WARNING = ""
            FAIL = ""
            ENDC = ""
            BOLD = ""
            UNDERLINE = ""

    if args.action == "apply":
        apply_(series_data)

    print(
        "total merged: {0}, total unmerged: {1}, total unavailable: {2}".format(
            len(merged), len(unmerged), len(unavailable)
        )
    )
    write_file(f"{output_files_loc}/merged.txt", merged)
    write_file(f"{output_files_loc}/unmerged.txt", unmerged)
    write_file(f"{output_files_loc}/unavailable.txt", unavailable)


* Re: [RFC] Updating patchwork patches on commit
  2020-12-17 17:49                   ` Girish Joshi
@ 2020-12-18  4:04                     ` Siddhesh Poyarekar
  2020-12-19 13:25                       ` Girish Joshi
  0 siblings, 1 reply; 21+ messages in thread
From: Siddhesh Poyarekar @ 2020-12-18  4:04 UTC (permalink / raw)
  To: Girish Joshi; +Cc: Andreas Schwab, Girish Joshi via Libc-alpha, Joseph Myers

On 12/17/20 11:19 PM, Girish Joshi wrote:
> Hi Siddhesh,
> On Thu, Dec 17, 2020 at 12:19 AM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>> I'm surprised there are 1114 series that need action; maybe it's
>> including series that have already been committed and you need to filter
>> those out?
> Yeah, in the git output we can see that a lot of those are already applied.

Are they marked as committed though?  If not then they should be.

Siddhesh


* Re: [RFC] Updating patchwork patches on commit
  2020-12-18  4:04                     ` Siddhesh Poyarekar
@ 2020-12-19 13:25                       ` Girish Joshi
  2020-12-22 15:13                         ` Girish Joshi
  0 siblings, 1 reply; 21+ messages in thread
From: Girish Joshi @ 2020-12-19 13:25 UTC (permalink / raw)
  To: Siddhesh Poyarekar
  Cc: Andreas Schwab, Girish Joshi via Libc-alpha, Joseph Myers

On Fri, Dec 18, 2020 at 9:34 AM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>
> On 12/17/20 11:19 PM, Girish Joshi wrote:
> > Hi Siddhesh,
> > On Thu, Dec 17, 2020 at 12:19 AM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> >> I'm surprised there are 1114 series that need action; maybe it's
> >> including series that have already been committed and you need to filter
> >> those out?
> > Yeah, in the git output we can see that a lot of those are already applied.
>
> Are they marked as committed though?  If not then they should be.
Yes, the status for (almost all of) those patches is "committed" on
the patchwork instance.
To verify it I'm writing down the IDs for such series in a separate file now.
However, I did not find an option in the git-pw CLI for checking whether
a series is already committed.
A workaround could be to go through all of the patches in that series
and check whether all of them are committed.
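
A minimal sketch of that check, assuming the states of a series' patches
have already been collected into a list of strings.  How they are
fetched is left out, and series_is_committed is a hypothetical helper,
not a git-pw function.

```python
def series_is_committed(patch_states):
    """Return True only if the series has patches and every one is committed."""
    # An empty series is treated as not committed rather than vacuously true.
    return bool(patch_states) and all(
        state.lower() == "committed" for state in patch_states
    )
```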

Girish Joshi


* Re: [RFC] Updating patchwork patches on commit
  2020-12-19 13:25                       ` Girish Joshi
@ 2020-12-22 15:13                         ` Girish Joshi
  2021-01-06 20:26                           ` Girish Joshi
  0 siblings, 1 reply; 21+ messages in thread
From: Girish Joshi @ 2020-12-22 15:13 UTC (permalink / raw)
  To: Siddhesh Poyarekar
  Cc: Andreas Schwab, Girish Joshi via Libc-alpha, Joseph Myers

[-- Attachment #1: Type: text/plain, Size: 1480 bytes --]

I've created the attached script (check-series.py) to go through all
available series and find the patches that do not belong to any of them.
It dumps JSON files containing the individual patch IDs into /tmp/pw-analysis.
This script can be merged with the previous one, "apply-patches.py";
I'll do that soon.
Currently we have around 106 individual patches in the state "new"
that do not belong to any series.

Girish Joshi
girishjoshi.io

On Sat, Dec 19, 2020 at 6:55 PM Girish Joshi <girish946@gmail.com> wrote:
>
> On Fri, Dec 18, 2020 at 9:34 AM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> >
> > On 12/17/20 11:19 PM, Girish Joshi wrote:
> > > Hi Siddhesh,
> > > On Thu, Dec 17, 2020 at 12:19 AM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> > >> I'm surprised there are 1114 series that need action; maybe it's
> > >> including series that have already been committed and you need to filter
> > >> those out?
> > > Yeah, in the git output we can see that a lot of those are already applied.
> >
> > Are they marked as committed though?  If not then they should be.
> Yes, the status for (almost all of) those patches is "committed" on
> the patchwork instance.
> To verify it I'm writing down the IDs for such series in a separate file now.
> Although I did not find an option from the git-pw cli for checking if
> a series is already committed.
> The work around for that could be to go through all of the patches in
> that series and check if all of them are committed.
>
> Girish Joshi

[-- Attachment #2: check-series.py --]
[-- Type: text/x-python, Size: 2980 bytes --]

#!/usr/bin/env python3
import csv
import sys
import os
import subprocess as sp
import _thread as thread


def read_file(file_name):
    file_data = []
    if os.path.exists(file_name):
        data = open(file_name).read().strip().split("\n")
        for line in data:
            if not '"ID"' in line:
                # print(line)
                file_data.append(line.strip())
    return file_data


# parse the csv entries
def read_rows(csvfile):

    # List for series entries
    series_data = []
    csvreader = csv.reader(csvfile, delimiter=",", quotechar='"')
    for row in csvreader:
        # print(row)
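        if not row:
            # Stop at the first empty row, but still return what was parsed
            # (a bare `return` would hand None to the callers).
            break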
        if row and row[1] != "ID":
            series_data.append(row)
    return series_data


def run_cmd(cmd, debug=False):
    """
    Execute command and return the exit code and output.
    """
    exit_code = 0
    output = ""
    try:
        output = sp.check_output(
            cmd, stderr=sp.STDOUT, shell=True, universal_newlines=True
        )
    except sp.CalledProcessError as exc:
        if debug:
            print("Status : FAIL", exc.returncode, exc.output)
        exit_code, output = exc.returncode, exc.output
    else:
        if debug:
            print("{}\n".format(output))

    return exit_code, output


def write_json(file_name, data):
    import json

    with open(file_name, "w") as f:
        f.write(json.dumps(data))


def check_data(list_, index):
    for i in list_:
        ret, op = run_cmd(f"git-pw series show {i} -f csv")
        print("****", index, "****")

        series_data = read_rows(op.strip().split("\n"))
        print(series_data[1])
        for j in series_data[11:]:
            patch_data = j[1].split()
            print(patch_data[0], patch_data[1])
            series_dict[i].append(patch_data[0])
            if patch_data[0] in patch_ids:
                patch_ids.remove(patch_data[0])

    done_lists[index] = True


if __name__ == "__main__":

    file_loc = "/tmp/pw-analysis"
    if not os.path.exists(file_loc):
        os.mkdir(file_loc)
    series_file = sys.argv[1]
    patches_file = sys.argv[2]

    series = [i for i in read_rows(read_file(series_file))]
    series_dict = {i[0]: [] for i in series}

    patches = read_rows(read_file(patches_file))
    patch_ids = [i[0] for i in patches]

    series_ids = [i for i in series_dict.keys()]

    chunk_size = 100
    all_lists = [
        series_ids[i : i + chunk_size] for i in range(0, len(series_ids), chunk_size)
    ]

    done_lists = [False for i in range(len(all_lists))]

    print(len(all_lists))
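    import threading  # joinable threads replace the busy-wait on done_lists

    workers = []
    for index, i in enumerate(all_lists):
        print(index)
        worker = threading.Thread(target=check_data, args=(i, index))
        worker.start()
        workers.append(worker)
    # Block until every worker finishes; this also terminates if a worker
    # dies with an exception, which the old spin loop never did.
    for worker in workers:
        worker.join()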

    print("Total individual patches: ", len(patch_ids))
    print("writing patch ids to", file_loc + "/patch_ids")
    write_json(file_loc + "/dict", series_dict)
    write_json(file_loc + "/patch_ids", {"patches": patch_ids})

* Re: [RFC] Updating patchwork patches on commit
  2020-12-22 15:13                         ` Girish Joshi
@ 2021-01-06 20:26                           ` Girish Joshi
  2021-02-04 15:47                             ` Girish Joshi
  0 siblings, 1 reply; 21+ messages in thread
From: Girish Joshi @ 2021-01-06 20:26 UTC (permalink / raw)
  To: Siddhesh Poyarekar
  Cc: Andreas Schwab, Girish Joshi via Libc-alpha, Joseph Myers

[-- Attachment #1: Type: text/plain, Size: 2320 bytes --]

I've combined the two scripts so that the data is pulled from the
patchwork instance instead of from stdin or a csv file.
The combined script can also list the patch IDs of old patches that do
not belong to any series.
To get these individual patches, run:
    python scripts/apply-patches.py -u

I'll try to fix a few small things by this weekend, such as taking the
page numbers for pulling the data from the command line.
Once this is done, the script can be invoked at regular intervals
to check whether new patches can be applied.
I will also try to set up a local patchwork instance this weekend (I
was supposed to do this a couple of weeks back but got too busy).
Thanks.

Girish Joshi
girishjoshi.io

On Tue, Dec 22, 2020 at 8:43 PM Girish Joshi <girish946@gmail.com> wrote:
>
> I've created this[1] script to go through all available series and get
> the patches that do not belong to any one of them.
> It dumps a json containing individual patch ids in /tmp directory.
> This script can be merged with the previous one "apply-patches.py".
> I'll do that soon.
> Currently we have around 106 individual patches with the state "new"
> that do not belong to any of the series.
>
> Girish Joshi
> girishjoshi.io
>
> On Sat, Dec 19, 2020 at 6:55 PM Girish Joshi <girish946@gmail.com> wrote:
> >
> > On Fri, Dec 18, 2020 at 9:34 AM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> > >
> > > On 12/17/20 11:19 PM, Girish Joshi wrote:
> > > > Hi Siddhesh,
> > > > On Thu, Dec 17, 2020 at 12:19 AM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> > > >> I'm surprised there are 1114 series that need action; maybe it's
> > > >> including series that have already been committed and you need to filter
> > > >> those out?
> > > > Yeah, in the git output we can see that a lot of those are already applied.
> > >
> > > Are they marked as committed though?  If not then they should be.
> > Yes, the status for (almost all of) those patches is "committed" on
> > the patchwork instance.
> > To verify it I'm writing down the IDs for such series in a separate file now.
> > Although I did not find an option from the git-pw cli for checking if
> > a series is already committed.
> > The work around for that could be to go through all of the patches in
> > that series and check if all of them are committed.
> >
> > Girish Joshi

[-- Attachment #2: apply-patches.py --]
[-- Type: text/x-python, Size: 9613 bytes --]

#!/usr/bin/env python3

import subprocess as sp
import _thread as thread
import argparse
import csv
import os
import sys
import time

# if these strings are found in output of git/git-pw,
# we need to take some actions.
prune_warining = "warning: There are too many unreachable loose objects; run 'git prune' to remove them."
resource_not_found_warning = "Resource not found"
already_applied_warning = "No changes -- Patch already applied."

# These lists will contain merged and unmerged series data.
merged = []
unmerged = []
unavailable = []
already_applied = []

# parse the csv entries
def read_rows(csvfile):

    # List for series entries
    series_data = []
    csvreader = csv.reader(csvfile, delimiter=",", quotechar='"')
    for row in csvreader:
        print(row)
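        if not row:
            # Stop at the first empty row, but still return what was parsed
            # (a bare `return` would hand None to the callers).
            break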
        if row and row[0] != "ID":
            series_data.append(row)
    return series_data


def run_cmd(cmd, debug=False):
    """
    Execute command and return the exit code and output.
    """
    exit_code = 0
    output = ""
    try:
        output = sp.check_output(
            cmd, stderr=sp.STDOUT, shell=True, universal_newlines=True
        )
    except sp.CalledProcessError as exc:
        if debug:
            print("Status : FAIL", exc.returncode, exc.output)
        exit_code, output = exc.returncode, exc.output
    else:
        if debug:
            print("{}\n".format(output))

    return exit_code, output


def write_file(filename, list_):
    """
    This function is used to write the IDs for patches/series that
    are merged/unmerged/unavailable after we have processed everything.
    """
    with open(filename, "w") as f:
        for i in list_:
            f.write(i[0] + "\n")


def write_json(file_name, data):
    import json

    with open(file_name, "w") as f:
        f.write(json.dumps(data))


def apply_(series):
    """if git throws a warning saying
    "warning: There are too many unreachable loose objects; run 'git prune' to remove them."
    `git prune` will be executed. otherwise output will be printed and exit code
    will be returned.
    """

    for i in series:
        try:
            print(
                f"{bcolors.OKGREEN}trying to apply:{type_}{bcolors.OKCYAN} {i[0]} {bcolors.ENDC}"
            )
            print(f"{bcolors.OKBLUE} {i[1]}, {bcolors.UNDERLINE}{i[2]}{bcolors.ENDC}")
            if i[0] == "ID":
                # Skip the csv header row instead of trying to apply it.
                continue

            exit_code, output = run_cmd(f"git-pw {type_} apply {i[0]}")

            if prune_warining in output:
                print("running: git prune")
                run_cmd("git prune")

            if exit_code == 1 and resource_not_found_warning in output:
                print(f"{bcolors.WARNING}patch unavailable{bcolors.ENDC}")
                unavailable.append(i)

            elif exit_code:
                # if `git-pw patch/series apply <id>` fails
                # resetting to HEAD

                print(
                    f"{bcolors.OKCYAN}git exit code: {bcolors.FAIL}{exit_code}{bcolors.ENDC}"
                )
                unmerged.append(i)

            else:
                if output.strip().endswith(already_applied_warning):
                    print(
                        f"{bcolors.WARNING}No changes -- already applied {bcolors.ENDC}\n"
                    )
                    already_applied.append(i)
                else:
                    print(f"{bcolors.OKCYAN}{type_} applied{bcolors.ENDC}\n")
                merged.append(i)

            print(f"{bcolors.FAIL}resetting to HEAD: {bcolors.ENDC}")

            if os.path.exists(".git/rebase-apply"):
                run_cmd("git am --abort")
            run_cmd("git reset --hard master", debug=True)

        except KeyboardInterrupt as ke:
            break
        except Exception as e:
            print(e)
            break


def get_patches(from_page=1, to_page=100):
    cmd = "git-pw patch list --page {0} -f csv --state 'new'"
    patches = []
    for i in range(from_page, to_page):
        exit_code, output = run_cmd(cmd.format(i), debug=True)

        if exit_code:
            print(f"git-pw exited with exit code {exit_code}")
            # patches.extend(output.strip().split("\n"))
            break

        patches.extend(output.strip().split("\n"))
        # print(patches)
    return patches


def get_series(from_page=1, to_page=100):
    cmd = "git-pw series list --page {0} -f csv"
    series = []
    for i in range(from_page, to_page):
        exit_code, output = run_cmd(cmd.format(i), debug=True)

        if exit_code:
            print(f"git-pw exited with exit code {exit_code}")
            # series.extend(output.strip().split("\n"))
            break

        series.extend(output.strip().split("\n"))
    print(series)
    return series


def get_patches_for_series(list_, index, series_dict, patch_ids):
    for i in list_:
        print(f"running: git-pw series show {i} -f csv")
        ret, op = run_cmd(f"git-pw series show {i} -f csv")
        print("****", index, "****")
        if ret:
            print(f"exited with {ret}: {op}")
        series_data = read_rows(op.strip().split("\n"))
        print(series_data[1])
        for j in series_data[11:]:
            patch_data = j[1].split()
            print(patch_data[0], patch_data[1])
            series_dict[i].append(patch_data[0])
            if patch_data[0] in patch_ids:
                print("popping")
                patch_ids.remove(patch_data[0])


def get_individual_patches():

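    file_loc = "/tmp/pwanalysis"
    # write_json() below fails if the directory is missing, so create it first.
    os.makedirs(file_loc, exist_ok=True)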

    series = [i for i in read_rows(get_series())]
    # print(series)
    series_dict = {i[0]: [] for i in series}

    patches = read_rows(get_patches())
    patch_ids = [i[0] for i in patches]
    # print(patch_ids)
    series_ids = [i for i in series_dict.keys()]

    get_patches_for_series(series_ids, 0, series_dict, patch_ids)
    print("Individual patches", len(patch_ids))
    print("series_dict", series_dict)

    write_json(file_loc + "/dict", series_dict)
    write_json(file_loc + "/patch_ids", {"patches": patch_ids})


if __name__ == "__main__":

    parser = argparse.ArgumentParser(description="Initial Ci script for patchwork")
    parser.add_argument(
        "-c", "--colors", default=False, action="store_true", help="Enable colors"
    )

    parser.add_argument(
        "-t",
        "--type",
        type=str,
        default="series",
        choices=["patch", "series"],
        help="type: patch/series",
    )
    parser.add_argument(
        "-a", "--action", type=str, default="apply", help="action: list/apply"
    )
    parser.add_argument(
        "-o",
        "--output-location",
        type=str,
        default="/tmp/pw-results",
        help="location for the output files containing merged, unmerged and unavailable patches/series.",
    )
    parser.add_argument(
        "-i",
        "--input-file",
        type=str,
        default="",
        help="input file: csv file or '-' for the standard input. If no file is specified, "
        "this data will be pulled from the patchwork instance.",
    )

    parser.add_argument(
        "-u",
        "--get-individual-patches",
        default=False,
        action="store_true",
        help="""Get individual patches.
In this case the series data and the patches data is pulled and compared to
find out the individual patches that do not belong to any series.""",
    )
    args = parser.parse_args()
    print(args)

    if args.get_individual_patches:
        print("getting individual patches")
        get_individual_patches()
        sys.exit(0)

    csv_data = []
    if args.input_file == "-":

        # Get the csv data from stdin
        for line in sys.stdin:
            if not '"ID"' in line:
                print(line)
                csv_data.append(line.strip())

    elif os.path.exists(args.input_file):
        data = open(args.input_file).read().strip().split("\n")
        for line in data:
            if not '"ID"' in line:
                # print(line)
                csv_data.append(line.strip())

    # option that we will be operating upon, series or the patch
    # this is the command line argument to git-pw
    # for example "git-pw patch apply 12345" or "git-pw series apply 12356"
    type_ = args.type

    series_data = read_rows(csv_data)
    output_files_loc = args.output_location
    if not os.path.exists(output_files_loc):
        os.mkdir(output_files_loc)

    colors = args.colors

    class bcolors:
        if colors:
            HEADER = "\033[95m"
            OKBLUE = "\033[94m"
            OKCYAN = "\033[96m"
            OKGREEN = "\033[92m"
            WARNING = "\033[93m"
            FAIL = "\033[91m"
            ENDC = "\033[0m"
            BOLD = "\033[1m"
            UNDERLINE = "\033[4m"
        else:
            HEADER = ""
            OKBLUE = ""
            OKCYAN = ""
            OKGREEN = ""
            WARNING = ""
            FAIL = ""
            ENDC = ""
            BOLD = ""
            UNDERLINE = ""

    print(len(series_data))
    if args.action == "apply":
        if args.input_file == "":
            series_data = [i for i in read_rows(get_series())]
        apply_(series_data)

    print(
        "total merged: {0}, total unmerged: {1}, total unavailable: {2}".format(
            len(merged), len(unmerged), len(unavailable)
        )
    )
    write_file(f"{output_files_loc}/merged.txt", merged)
    write_file(f"{output_files_loc}/unmerged.txt", unmerged)
    write_file(f"{output_files_loc}/unavailable.txt", unavailable)
    write_file(f"{output_files_loc}/already_applied.txt", already_applied)

* Re: [RFC] Updating patchwork patches on commit
  2021-01-06 20:26                           ` Girish Joshi
@ 2021-02-04 15:47                             ` Girish Joshi
  2021-02-12  5:25                               ` Siddhesh Poyarekar
  2021-02-12  9:02                               ` Siddhesh Poyarekar
  0 siblings, 2 replies; 21+ messages in thread
From: Girish Joshi @ 2021-02-04 15:47 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Girish Joshi via Libc-alpha
  Cc: Andreas Schwab, Joseph Myers

[-- Attachment #1: Type: text/plain, Size: 811 bytes --]

Hello Siddhesh,

On Thu, Jan 7, 2021 at 1:56 AM Girish Joshi <girish946@gmail.com> wrote:
>
> I've combined the two scripts to pull data from the patchwork instance
> instead of stdin or a csv file.
> Also to get the patch ids for the old patches that do not belong to any series.
> To get these individual patches
>     python scripts/apply-patches.py -u
>
> I'll try to fix a few small things like taking page numbers for
> pulling the data from the command line itself by this weekend.

I've made this change in the attached script.

> Once this is done, this script can be invoked after a regular interval
> of time to check if the new patches can be applied.
We can do this now.

There are a couple of functions that need refactoring.
But for now it does the job.
Could you please review it?

Girish Joshi

[-- Attachment #2: apply-patches.py --]
[-- Type: text/x-python, Size: 10264 bytes --]

#!/usr/bin/env python3

import subprocess as sp
import _thread as thread
import argparse
import csv
import os
import sys
import time

# if these strings are found in output of git/git-pw,
# we need to take some actions.
prune_warining = "warning: There are too many unreachable loose objects; run 'git prune' to remove them."
resource_not_found_warning = "Resource not found"
already_applied_warning = "No changes -- Patch already applied."

# These lists will contain merged and unmerged series data.
merged = []
unmerged = []
unavailable = []
already_applied = []

# parse the csv entries
def read_rows(csvfile):

    # List for series entries
    series_data = []
    csvreader = csv.reader(csvfile, delimiter=",", quotechar='"')
    for row in csvreader:
        print(row)
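        if not row:
            # Stop at the first empty row, but still return what was parsed
            # (a bare `return` would hand None to the callers).
            break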
        if row and row[0] != "ID":
            series_data.append(row)
    return series_data


def run_cmd(cmd, debug=False):
    """
    Execute command and return the exit code and output.
    """
    exit_code = 0
    output = ""
    try:
        output = sp.check_output(
            cmd, stderr=sp.STDOUT, shell=True, universal_newlines=True
        )
    except sp.CalledProcessError as exc:
        if debug:
            print("Status : FAIL", exc.returncode, exc.output)
        exit_code, output = exc.returncode, exc.output
    else:
        if debug:
            print("{}\n".format(output))

    return exit_code, output


def write_file(filename, list_):
    """
    This function is used to write the IDs for patches/series that
    are merged/unmerged/unavailable after we have processed everything.
    """
    with open(filename, "w") as f:
        for i in list_:
            f.write(i[0] + "\n")


def write_json(file_name, data):
    import json

    with open(file_name, "w") as f:
        f.write(json.dumps(data))


def apply_(series):
    """if git throws a warning saying
    "warning: There are too many unreachable loose objects; run 'git prune' to remove them."
    `git prune` will be executed. otherwise output will be printed and exit code
    will be returned.
    """

    for i in series:
        try:
            print(
                f"{bcolors.OKGREEN}trying to apply:{type_}{bcolors.OKCYAN} {i[0]} {bcolors.ENDC}"
            )
            print(f"{bcolors.OKBLUE} {i[1]}, {bcolors.UNDERLINE}{i[2]}{bcolors.ENDC}")
            if i[0] == "ID":
                # Skip the csv header row instead of trying to apply it.
                continue

            exit_code, output = run_cmd(f"git-pw {type_} apply {i[0]}")

            if prune_warining in output:
                print("running: git prune")
                run_cmd("git prune")

            if exit_code == 1 and resource_not_found_warning in output:
                print(f"{bcolors.WARNING}patch unavailable{bcolors.ENDC}")
                unavailable.append(i)

            elif exit_code:
                # if `git-pw patch/series apply <id>` fails
                # resetting to HEAD

                print(
                    f"{bcolors.OKCYAN}git exit code: {bcolors.FAIL}{exit_code}{bcolors.ENDC}"
                )
                unmerged.append(i)

            else:
                if output.strip().endswith(already_applied_warning):
                    print(
                        f"{bcolors.WARNING}No changes -- already applied {bcolors.ENDC}\n"
                    )
                    already_applied.append(i)
                else:
                    print(f"{bcolors.OKCYAN}{type_} applied{bcolors.ENDC}\n")
                merged.append(i)

            print(f"{bcolors.FAIL}resetting to HEAD: {bcolors.ENDC}")

            if os.path.exists(".git/rebase-apply"):
                run_cmd("git am --abort")
            run_cmd("git reset --hard master", debug=True)

        except KeyboardInterrupt as ke:
            break
        except Exception as e:
            print(e)
            break


def get_patches(from_page=1, to_page=100):
    cmd = "git-pw patch list --page {0} -f csv --state 'new'"
    patches = []
    for i in range(from_page, to_page):
        exit_code, output = run_cmd(cmd.format(i), debug=True)

        if exit_code:
            print(f"git-pw exited with exit code {exit_code}")
            # patches.extend(output.strip().split("\n"))
            break

        patches.extend(output.strip().split("\n"))
        # print(patches)
    return patches


def get_series(from_page=1, to_page=100):
    cmd = "git-pw series list --page {0} -f csv"
    series = []
    for i in range(from_page, to_page):
        exit_code, output = run_cmd(cmd.format(i), debug=True)

        if exit_code:
            print(f"git-pw exited with exit code {exit_code}")
            # series.extend(output.strip().split("\n"))
            break

        series.extend(output.strip().split("\n"))
    return series


def get_patches_for_series(list_, index, series_dict, patch_ids):
    for i in list_:
        print(f"running: git-pw series show {i} -f csv")
        ret, op = run_cmd(f"git-pw series show {i} -f csv")
        print("****", index, "****")
        if ret:
            print(f"exited with {ret}: {op}")
        series_data = read_rows(op.strip().split("\n"))
        print(series_data[1])
        for j in series_data[11:]:
            patch_data = j[1].split()
            print(patch_data[0], patch_data[1])
            series_dict[i].append(patch_data[0])
            if patch_data[0] in patch_ids:
                patch_ids.remove(patch_data[0])


def get_individual_patches():
    """This function iterates over all of the series and all of the patches.
    All of the patches that do not belong to any series are dumped into a file.
    """
    # TODO: This function needs a refactor.

    file_loc = "/tmp/pwanalysis"
    if not os.path.exists(file_loc):
        os.mkdir(file_loc)

    series = [i for i in read_rows(get_series())]
    # print(series)
    series_dict = {i[0]: [] for i in series}

    patches = read_rows(get_patches())
    patch_ids = [i[0] for i in patches]
    # print(patch_ids)
    series_ids = [i for i in series_dict.keys()]

    get_patches_for_series(series_ids, 0, series_dict, patch_ids)
    print("Individual patches", len(patch_ids))
    print("series_dict", series_dict)

    write_json(file_loc + "/dict", series_dict)
    write_json(file_loc + "/patch_ids", {"patches": patch_ids})


if __name__ == "__main__":

    parser = argparse.ArgumentParser(description="Initial Ci script for patchwork")
    parser.add_argument(
        "-c", "--colors", default=False, action="store_true", help="Enable colors"
    )

    parser.add_argument(
        "-t",
        "--type",
        type=str,
        default="series",
        choices=["patch", "series"],
        help="type: patch/series",
    )
    parser.add_argument(
        "-a", "--action", type=str, default="apply", help="action: list/apply"
    )
    parser.add_argument(
        "-o",
        "--output-location",
        type=str,
        default="/tmp/pw-results",
        help="location for the output files containing merged, unmerged and unavailable patches/series.",
    )
    parser.add_argument(
        "-i",
        "--input-file",
        type=str,
        default="",
        help="input file: csv file or '-' for the standard input. If no file is specified, "
        "this data will be pulled from the patchwork instance.",
    )
    parser.add_argument(
        "-p",
        "--page-range",
        default="1-100",
        help="page range for patchwork in the format 'from_pageNo'-'to_pageNo' for example '1-100'",
    )

    parser.add_argument(
        "-u",
        "--get-individual-patches",
        default=False,
        action="store_true",
        help="""Get individual patches.
In this case the series data and the patches data is pulled and compared to
find out the individual patches that do not belong to any series.""",
    )
    args = parser.parse_args()
    print(args)

    csv_data = []
    if args.input_file == "-":

        # Get the csv data from stdin
        for line in sys.stdin:
            if not '"ID"' in line:
                print(line)
                csv_data.append(line.strip())

    elif os.path.exists(args.input_file):
        data = open(args.input_file).read().strip().split("\n")
        for line in data:
            if not '"ID"' in line:
                # print(line)
                csv_data.append(line.strip())

    # option that we will be operating upon, series or the patch
    # this is the command line argument to git-pw
    # for example "git-pw patch apply 12345" or "git-pw series apply 12356"
    type_ = args.type

    series_data = read_rows(csv_data)
    output_files_loc = args.output_location
    if not os.path.exists(output_files_loc):
        os.mkdir(output_files_loc)

    if args.get_individual_patches:
        print("getting individual patches")
        get_individual_patches()
        sys.exit(0)

    colors = args.colors

    class bcolors:
        if colors:
            HEADER = "\033[95m"
            OKBLUE = "\033[94m"
            OKCYAN = "\033[96m"
            OKGREEN = "\033[92m"
            WARNING = "\033[93m"
            FAIL = "\033[91m"
            ENDC = "\033[0m"
            BOLD = "\033[1m"
            UNDERLINE = "\033[4m"
        else:
            HEADER = ""
            OKBLUE = ""
            OKCYAN = ""
            OKGREEN = ""
            WARNING = ""
            FAIL = ""
            ENDC = ""
            BOLD = ""
            UNDERLINE = ""

    print(len(series_data))
    if args.action == "apply":
        if args.input_file == "":
            try:
                from_range, to_range = [int(i) for i in args.page_range.split("-")]
            except ValueError as ve:
                print("invalid page range")
                sys.exit(1)
            series_data = [i for i in read_rows(get_series(from_range, to_range + 1))]
        apply_(series_data)

    print(
        "total merged: {0}, total unmerged: {1}, total unavailable: {2}".format(
            len(merged), len(unmerged), len(unavailable)
        )
    )
    write_file(f"{output_files_loc}/merged.txt", merged)
    write_file(f"{output_files_loc}/unmerged.txt", unmerged)
    write_file(f"{output_files_loc}/unavailable.txt", unavailable)
    write_file(f"{output_files_loc}/already_applied.txt", already_applied)

* Re: [RFC] Updating patchwork patches on commit
  2021-02-04 15:47                             ` Girish Joshi
@ 2021-02-12  5:25                               ` Siddhesh Poyarekar
  2021-02-12  9:02                               ` Siddhesh Poyarekar
  1 sibling, 0 replies; 21+ messages in thread
From: Siddhesh Poyarekar @ 2021-02-12  5:25 UTC (permalink / raw)
  To: Girish Joshi, Girish Joshi via Libc-alpha; +Cc: Andreas Schwab, Joseph Myers

On 2/4/21 9:17 PM, Girish Joshi wrote:
> There are a couple of functions that need refactoring.
> But for now it does the job.
> Could you please review it?

Thanks for doing this, I've got the script running now to see what it 
gives.  While it was running, I noticed that it lists (and processes) 
patches that have been committed and also marked as committed on 
patchwork.  Perhaps the git-pw invocation needs tweaking?

Siddhesh

* Re: [RFC] Updating patchwork patches on commit
  2021-02-04 15:47                             ` Girish Joshi
  2021-02-12  5:25                               ` Siddhesh Poyarekar
@ 2021-02-12  9:02                               ` Siddhesh Poyarekar
  2021-02-12 13:04                                 ` Carlos O'Donell
  1 sibling, 1 reply; 21+ messages in thread
From: Siddhesh Poyarekar @ 2021-02-12  9:02 UTC (permalink / raw)
  To: Girish Joshi, Girish Joshi via Libc-alpha; +Cc: Andreas Schwab, Joseph Myers

On 2/4/21 9:17 PM, Girish Joshi wrote:
>> Once this is done, this script can be invoked after a regular interval
>> of time to check if the new patches can be applied.
> We can do this now.
> 
> There are a couple of functions that need refactoring.
> But for now it does the job.
> Could you please review it?

The script has now finished running and I have gone through the
outputs.  Some notes:

1. git-pw runs are leaving /tmp/git-pw* directories behind; you need to
clean them up.

2. The script must ignore patches that are not in the New state. 
Currently it seems to be going through everything.

3. The output in pw-results seemed to mostly be old patches from 2014 or 
so by default.  Perhaps it's hitting the limit for the server in 2014 
because it's not filtering correctly on patch state?

If the output does not correspond with what you're seeing, then please 
send the commandline you'd like me to run to get the output you're 
seeing.  The primary goal with this script set is to identify patches in 
2019/2020 that are out of date and no longer apply so that we can mark 
them accordingly.
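
A minimal sketch of how notes 1 and 2 could be handled, assuming the leftovers match /tmp/git-pw* as described above and that the patch state is available as one column of the parsed csv rows. The helper names, the `tmp_root` parameter and the `state_column` index are hypothetical and would need to be matched to the real git-pw output.

```python
import glob
import os
import shutil
import tempfile


def remove_git_pw_tmpdirs(tmp_root=None, prefix="git-pw"):
    """Delete leftover <tmp_root>/git-pw* entries; return the removed paths."""
    tmp_root = tmp_root or tempfile.gettempdir()
    removed = []
    for path in sorted(glob.glob(os.path.join(tmp_root, prefix + "*"))):
        if os.path.isdir(path):
            shutil.rmtree(path, ignore_errors=True)
        else:
            # Handle plain files too, in case not every leftover is a directory.
            os.remove(path)
        removed.append(path)
    return removed


def only_new(rows, state_column):
    """Keep csv rows whose state column reads 'new' (case-insensitive)."""
    return [
        row
        for row in rows
        if len(row) > state_column and row[state_column].strip().lower() == "new"
    ]
```

Calling `remove_git_pw_tmpdirs()` after each apply attempt, or once at the end of a run, would keep /tmp from filling up; `only_new()` would be applied to the parsed rows before they are handed to the apply loop.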

Thanks,
Siddhesh

* Re: [RFC] Updating patchwork patches on commit
  2021-02-12  9:02                               ` Siddhesh Poyarekar
@ 2021-02-12 13:04                                 ` Carlos O'Donell
  0 siblings, 0 replies; 21+ messages in thread
From: Carlos O'Donell @ 2021-02-12 13:04 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Girish Joshi, Girish Joshi via Libc-alpha
  Cc: Andreas Schwab, Joseph Myers

On 2/12/21 4:02 AM, Siddhesh Poyarekar wrote:
> On 2/4/21 9:17 PM, Girish Joshi wrote:
>>> Once this is done, this script can be invoked after a regular interval
>>> of time to check if the new patches can be applied.
>> We can do this now.
>>
>> There are a couple of functions that need refactoring.
>> But for now it does the job.
>> Could you please review it?
> 
> The script is now done and I have gone through the outputs.  Some notes:
> 
> 1. git-pw runs are leaving /tmp/git-pw* directories, you need to clean them up
> 
> 2. The script must ignore patches that are not in the New state. Currently it seems to be going through everything.
> 
> 3. The output in pw-results seemed to mostly be old patches from 2014 or so by default.  Perhaps it's hitting the limit for the server in 2014 because it's not filtering correctly on patch state?
> 
> If the output does not correspond with what you're seeing, then please send the commandline you'd like me to run to get the output you're seeing.  The primary goal with this script set is to identify patches in 2019/2020 that are out of date and no longer apply so that we can mark them accordingly.

FYI.

The kernel has a patchwork-bot here:
https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git/tree/git-patchwork-bot.py

In case they were doing something interesting that we're not.

Their bot knows how to mark patches as superseded based on vN markup.

It uses a local sqlite3 db to track processing state.
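
A rough sketch of those two ideas follows.  It is not the kernel bot's 
code.  It assumes the version appears as "vN" inside a bracketed subject 
prefix such as "[PATCH v2 1/3]", and that patches are matched by exact 
title, which is a simplification.  The table layout and function names 
are hypothetical.

```python
import re
import sqlite3

# "vN" inside a bracketed prefix, e.g. "[PATCH v2 1/3]".
_VERSION_RE = re.compile(r"\[[^\]]*\bv(\d+)\b[^\]]*\]", re.IGNORECASE)
# One or more leading bracketed prefixes.
_PREFIX_RE = re.compile(r"^(\s*\[[^\]]*\]\s*)+")


def parse_subject(subject):
    """Return (title, version); a subject without vN counts as version 1."""
    match = _VERSION_RE.search(subject)
    version = int(match.group(1)) if match else 1
    title = _PREFIX_RE.sub("", subject).strip()
    return title, version


def find_superseded(conn, patches):
    """Record (patch_id, subject) pairs and return superseded patch ids.

    State lives in a local sqlite3 table, so patches seen in earlier
    runs are still compared against new arrivals.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        "patch_id INTEGER PRIMARY KEY, title TEXT NOT NULL, "
        "version INTEGER NOT NULL)"
    )
    for patch_id, subject in patches:
        title, version = parse_subject(subject)
        conn.execute(
            "INSERT OR IGNORE INTO seen VALUES (?, ?, ?)",
            (patch_id, title, version),
        )
    conn.commit()
    rows = conn.execute(
        "SELECT s.patch_id FROM seen s WHERE s.version < "
        "(SELECT MAX(version) FROM seen WHERE title = s.title) "
        "ORDER BY s.patch_id"
    ).fetchall()
    return [row[0] for row in rows]
```

With sqlite3.connect() pointed at a file instead of ":memory:", the 
table persists across runs, so a v3 posted next week would still cause 
this week's v2 to be reported.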

-- 
Cheers,
Carlos.


Thread overview: 21+ messages
2020-12-07  5:48 [RFC] Updating patchwork patches on commit Siddhesh Poyarekar
2020-12-07  8:45 ` Florian Weimer
2020-12-07  9:30   ` Siddhesh Poyarekar
2020-12-07 16:15 ` DJ Delorie
2020-12-07 16:39   ` Siddhesh Poyarekar
2020-12-07 17:02     ` DJ Delorie
2020-12-07 18:11       ` Joseph Myers
2020-12-08  2:57         ` Siddhesh Poyarekar
2020-12-08  9:08           ` Andreas Schwab
2020-12-08 10:10             ` Siddhesh Poyarekar
2020-12-16 18:35               ` Girish Joshi
2020-12-16 18:49                 ` Siddhesh Poyarekar
2020-12-17 17:49                   ` Girish Joshi
2020-12-18  4:04                     ` Siddhesh Poyarekar
2020-12-19 13:25                       ` Girish Joshi
2020-12-22 15:13                         ` Girish Joshi
2021-01-06 20:26                           ` Girish Joshi
2021-02-04 15:47                             ` Girish Joshi
2021-02-12  5:25                               ` Siddhesh Poyarekar
2021-02-12  9:02                               ` Siddhesh Poyarekar
2021-02-12 13:04                                 ` Carlos O'Donell
