From: Sam James <sam@gentoo.org>
To: gcc-patches@gcc.gnu.org
Cc: Sam James <sam@gentoo.org>
Subject: [PATCH 1/4] contrib: add generate_snapshot_index.py
Date: Thu,  2 Nov 2023 08:39:05 +0000
Message-ID: <20231102084058.1142941-1-sam@gentoo.org>

Add a script that maps each weekly snapshot to the commit it is based on.
The output uses the space-separated format BRANCH-DATE COMMIT.

For example:
8-20210107 5114ee0676e432493ada968e34071f02fb08114f
8-20210114 f9267925c648f2ccd9e4680b699e581003125bcf
...

This is helpful for bisecting and for quickly looking up the commit behind a
snapshot named in a bug report.

contrib/:
    * generate_snapshot_index.py: New file.

Signed-off-by: Sam James <sam@gentoo.org>
---
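Note (not part of the commit): a minimal sketch of how the generated index
might be consumed, for example to find bisect endpoints for one branch. The
helper names parse_snapshot_index and snapshots_for_branch are hypothetical
and do not exist in this patch; the sample data is taken from the commit
message above.

```python
# Hypothetical consumer of known_snapshots.txt; not part of this patch.


def parse_snapshot_index(text: str) -> dict[str, str]:
    """Map 'BRANCH-DATE' names to commits from 'BRANCH-DATE COMMIT' lines."""
    index = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, commit = fields
        index[name] = commit
    return index


def snapshots_for_branch(index: dict[str, str], branch: str) -> list[tuple[str, str]]:
    """Return (date, commit) pairs for one branch, oldest first."""
    pairs = []
    for name, commit in index.items():
        name_branch, _, date = name.partition("-")
        if name_branch == branch:
            pairs.append((date, commit))
    return sorted(pairs)


SAMPLE = """\
8-20210107 5114ee0676e432493ada968e34071f02fb08114f
8-20210114 f9267925c648f2ccd9e4680b699e581003125bcf
"""

index = parse_snapshot_index(SAMPLE)
branch_8 = snapshots_for_branch(index, "8")
print(index["8-20210114"])
```

The first and last entries of branch_8 could then serve as the known-good and
known-bad endpoints for a git bisect, assuming the regression falls between
those two snapshots.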
 contrib/generate_snapshot_index.py | 79 ++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)
 create mode 100755 contrib/generate_snapshot_index.py
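
Note (not part of the commit): the script takes the first match of
"revision (.*)" in each README and raises IndexError when there is none. A
stricter variant is sketched below, assuming every README of interest records
a full 40-character lowercase git hash; that assumption is mine and may not
hold for every snapshot. The name extract_revision is hypothetical, and the
sample line is the one quoted in the script's own comment.

```python
import re
from typing import Optional

# Assumption: the revision is a full 40-character lowercase hex git hash.
REVISION_RE = re.compile(r"revision ([0-9a-f]{40})\b")


def extract_revision(readme_text: str) -> Optional[str]:
    """Return the commit recorded in a snapshot README, or None if absent."""
    match = REVISION_RE.search(readme_text)
    return match.group(1) if match else None


SAMPLE_README = (
    "with the following options: git://gcc.gnu.org/git/gcc.git "
    "branch releases/gcc-8 revision e4e5ad2304db534957c4af612aa288cb6ef51f25\n"
)

print(extract_revision(SAMPLE_README))
```

Returning None instead of raising lets a caller skip an unparseable README and
keep indexing the remaining snapshots.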

diff --git a/contrib/generate_snapshot_index.py b/contrib/generate_snapshot_index.py
new file mode 100755
index 000000000000..80fc14b2cf1e
--- /dev/null
+++ b/contrib/generate_snapshot_index.py
@@ -0,0 +1,79 @@
+#!/usr/bin/env python3
+#
+# Copyright (C) 2023 Free Software Foundation, Inc.
+# Contributed by Sam James.
+#
+# This script is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# Script to create a map between weekly snapshots and the commit they're based on.
+# Creates known_snapshots.txt with space-separated format: BRANCH-DATE COMMIT
+# For example:
+# 8-20210107 5114ee0676e432493ada968e34071f02fb08114f
+# 8-20210114 f9267925c648f2ccd9e4680b699e581003125bcf
+
+import os
+import re
+import urllib.request
+
+MIRROR = "https://mirrorservice.org/sites/sourceware.org/pub/gcc/snapshots/"
+
+
+def get_remote_snapshot_list() -> list[str]:
+    # Parse the HTML index for links to snapshots
+    with urllib.request.urlopen(MIRROR) as index_response:
+        html = index_response.read().decode("utf-8")
+        snapshots = re.findall(r'href="([0-9]+-[^"]*)"', html)
+
+    return snapshots
+
+
+def load_cached_entries() -> dict[str, str]:
+    local_snapshots = {}
+
+    with open("known_snapshots.txt", encoding="utf-8") as local_entries:
+        for entry in local_entries:
+            if not entry.strip():
+                continue
+
+            date, commit = entry.split()
+            local_snapshots[date] = commit
+
+    return local_snapshots
+
+
+remote_snapshots = get_remote_snapshot_list()
+try:
+    known_snapshots = load_cached_entries()
+except FileNotFoundError:
+    # No cache available
+    known_snapshots = {}
+
+# This would give us chronological order (as in by creation)
+# snapshots.sort(reverse=False, key=lambda x: x.split('-')[1])
+# snapshots.sort(reverse=True, key=lambda x: x.split('-')[0])
+
+for snapshot in remote_snapshots:
+    # 8-20210107/ -> 8-20210107
+    snapshot = snapshot.strip("/")
+
+    # Don't fetch entries we already have stored.
+    if snapshot in known_snapshots:
+        continue
+
+    # The READMEs are plain text with several lines, one of which is:
+    # "with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-8 revision e4e5ad2304db534957c4af612aa288cb6ef51f25"
+    # We match after 'revision ' to grab the commit used.
+    with urllib.request.urlopen(f"{MIRROR}{snapshot}/README") as readme_response:
+        data = readme_response.read().decode("utf-8")
+        parsed_commit = re.findall(r"revision (.*)", data)[0]
+        known_snapshots[snapshot] = parsed_commit
+
+# Dump it all back out to disk.
+with open("known_snapshots.txt.tmp", "w", encoding="utf-8") as known_entries:
+    for name, stored_commit in known_snapshots.items():
+        known_entries.write(f"{name} {stored_commit}\n")
+
+os.rename("known_snapshots.txt.tmp", "known_snapshots.txt")
-- 
2.42.0

