From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sam James
To: gcc-patches@gcc.gnu.org
Cc: Sam James
Subject: [PATCH 1/4] contrib: add generate_snapshot_index.py
Date: Thu, 2 Nov 2023 08:39:05 +0000
Message-ID: <20231102084058.1142941-1-sam@gentoo.org>
X-Mailer: git-send-email 2.42.0
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Script to create a map between weekly snapshots and the commit they're
based on, in the space-separated format BRANCH-DATE COMMIT.

For example:
8-20210107 5114ee0676e432493ada968e34071f02fb08114f
8-20210114 f9267925c648f2ccd9e4680b699e581003125bcf
...

This is helpful for bisecting and for quickly looking up the commit behind
a snapshot mentioned in a bug report.

contrib/:

	* generate_snapshot_index.py: New file.

Signed-off-by: Sam James
---
 contrib/generate_snapshot_index.py | 79 ++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)
 create mode 100755 contrib/generate_snapshot_index.py

diff --git a/contrib/generate_snapshot_index.py b/contrib/generate_snapshot_index.py
new file mode 100755
index 000000000000..80fc14b2cf1e
--- /dev/null
+++ b/contrib/generate_snapshot_index.py
@@ -0,0 +1,79 @@
+#!/usr/bin/env python3
+#
+# Copyright (C) 2023 Free Software Foundation, Inc.
+# Contributed by Sam James.
+#
+# This script is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# Script to create a map between weekly snapshots and the commit they're based on.
+# Creates known_snapshots.txt with space-separated format: BRANCH-DATE COMMIT
+# For example:
+# 8-20210107 5114ee0676e432493ada968e34071f02fb08114f
+# 8-20210114 f9267925c648f2ccd9e4680b699e581003125bcf
+
+import os
+import re
+import urllib.request
+
+MIRROR = "https://mirrorservice.org/sites/sourceware.org/pub/gcc/snapshots/"
+
+
+def get_remote_snapshot_list() -> list[str]:
+    # Parse the HTML index for links to snapshots
+    with urllib.request.urlopen(MIRROR) as index_response:
+        html = index_response.read().decode("utf-8")
+        snapshots = re.findall(r'href="([0-9]+-[^"]*)"', html)
+
+    return snapshots
+
+
+def load_cached_entries() -> dict[str, str]:
+    local_snapshots = {}
+
+    with open("known_snapshots.txt", encoding="utf-8") as local_entries:
+        for entry in local_entries.readlines():
+            if not entry.strip():
+                continue
+
+            date, commit = entry.strip().split(" ")
+            local_snapshots[date] = commit
+
+    return local_snapshots
+
+
+remote_snapshots = get_remote_snapshot_list()
+try:
+    known_snapshots = load_cached_entries()
+except FileNotFoundError:
+    # No cache available
+    known_snapshots = {}
+
+# This would give us chronological order (as in by creation)
+# remote_snapshots.sort(reverse=False, key=lambda x: x.split('-')[1])
+# remote_snapshots.sort(reverse=True, key=lambda x: x.split('-')[0])
+
+for snapshot in remote_snapshots:
+    # 8-20210107/ -> 8-20210107
+    snapshot = snapshot.strip("/")
+
+    # Don't fetch entries we already have stored.
+    if snapshot in known_snapshots:
+        continue
+
+    # The READMEs are plain text with several lines, one of which is:
+    # "with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-8 revision e4e5ad2304db534957c4af612aa288cb6ef51f25"
+    # We match after 'revision ' to grab the commit used.
+    with urllib.request.urlopen(f"{MIRROR}{snapshot}/README") as readme_response:
+        data = readme_response.read().decode("utf-8")
+        parsed_commit = re.findall(r"revision (\S+)", data)[0]
+        known_snapshots[snapshot] = parsed_commit
+
+# Dump it all back out to disk.
+with open("known_snapshots.txt.tmp", "w", encoding="utf-8") as known_entries:
+    for name, stored_commit in known_snapshots.items():
+        known_entries.write(f"{name} {stored_commit}\n")
+
+os.replace("known_snapshots.txt.tmp", "known_snapshots.txt")
-- 
2.42.0
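As a sketch of how the generated index might be consumed for bisecting (not part of the patch, and assuming Python 3.7+), the hypothetical helpers below parse known_snapshots.txt content, extract the branch and revision from a README line of the shape quoted in the script's comments, and pick the two commits to pass to `git bisect start <bad> <good>`. The names parse_readme, parse_index and bisect_endpoints are illustrative only, and the README pattern is an assumption based on that single quoted example line.

```python
from __future__ import annotations

import re

# Hypothetical helpers, not part of the patch. The README pattern assumes the
# "branch ... revision ..." line quoted in generate_snapshot_index.py.
README_LINE = re.compile(r"branch (\S+) revision (\S+)")


def parse_readme(text: str) -> tuple[str, str] | None:
    """Return (branch, commit) from a snapshot README, or None if not found."""
    match = README_LINE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def parse_index(text: str) -> dict[str, str]:
    """Parse known_snapshots.txt content into {"BRANCH-DATE": "COMMIT"}."""
    index: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        name, commit = line.split()
        index[name] = commit
    return index


def bisect_endpoints(index: dict[str, str], branch: str,
                     good_date: str, bad_date: str) -> tuple[str, str]:
    """Return (bad_commit, good_commit) for two snapshots of one branch,
    in the argument order used by `git bisect start <bad> <good>`."""
    good = index[f"{branch}-{good_date}"]
    bad = index[f"{branch}-{bad_date}"]
    return bad, good


if __name__ == "__main__":
    sample = (
        "8-20210107 5114ee0676e432493ada968e34071f02fb08114f\n"
        "8-20210114 f9267925c648f2ccd9e4680b699e581003125bcf\n"
    )
    index = parse_index(sample)
    bad, good = bisect_endpoints(index, "8", "20210107", "20210114")
    print(f"git bisect start {bad} {good}")
```

Keying the lookup on the exact BRANCH-DATE string keeps it a plain dictionary access, mirroring how the script itself checks `snapshot in known_snapshots` before fetching a README.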