Date: Thu, 26 Dec 2019 22:33:00 -0000
From: Joseph Myers
To: Jakub Jelinek
cc: Alexandre Oliva, "Eric S. Raymond", Jeff Law, Segher Boessenkool,
    Mark Wielaard, Maxim Kuvyrkov, "Richard Earnshaw (lists)",
    gcc@gcc.gnu.org
Subject: Re: Proposal for the transition timetable for the move to GIT
In-Reply-To: <20191226111633.GJ10088@tucnak>

On Thu, 26 Dec 2019, Jakub Jelinek wrote:

> Is there some easy way (e.g. file in the conversion scripts) to correct
> spelling and other mistakes in the commit authors?

I've added author fixups to bugdb.py, so you can add any number of fixes (e.g.
based on authors that look suspicious in "git shortlog -s -e --all"
output) to the author_fixups array (and send a merge request for the
gcc-conversion project, or a patch). The case of multiple consecutive
spaces in an attribution is now normalized to a single space in
reposurgeon, so no fixes are needed for that (and fixups should be given
in the form with a single space).

In addition to that array of fixes, bugdb.py does the following, so these
cases don't need listing in the array of fixups: converts ISO-8859-1 NBSP
to space (and trims such spaces at left or right or where the result is
multiple consecutive spaces); converts ISO-8859-1 author names (coming
from ChangeLog files) to UTF-8 (there are manual fixups for cases where
the author in the ChangeLog file didn't seem to be ISO-8859-1 but wasn't
valid UTF-8 either); fixes up the cases you found where certain forms of
timestamp from the ChangeLog header, or header specifying multiple
authors, were used but handled badly in conversion to authors.

I've found and reported another case where a form of ChangeLog header
used in the past isn't handled at all, and Eric is looking at it.

-- 
Joseph S. Myers
jsm@polyomino.org.uk
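
For readers who want a concrete picture of the normalization described
above, here is a minimal sketch in Python. It is not the code in bugdb.py
or reposurgeon: the names AUTHOR_FIXUPS, decode_author and
normalize_author are hypothetical, the fixup entry is made up, and the
"try UTF-8 first, then fall back to ISO-8859-1" rule is an assumption
about how the encoding might be detected.

```python
import re

# Hypothetical fixup table (wrong attribution -> corrected attribution).
# The real author_fixups array in bugdb.py may be structured differently.
# Keys use single spaces, since whitespace is collapsed before lookup.
AUTHOR_FIXUPS = {
    "Jane Exmaple <jane@example.org>": "Jane Example <jane@example.org>",
}


def decode_author(raw: bytes) -> str:
    """Decode an attribution, assuming UTF-8 if valid, else ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a code point, so this cannot fail.
        return raw.decode("iso-8859-1")


def normalize_author(raw: bytes) -> str:
    """Apply the automatic cleanups, then any listed manual fixup."""
    text = decode_author(raw)
    # NBSP (byte 0xA0 in ISO-8859-1, U+00A0 once decoded) becomes a space.
    text = text.replace("\u00a0", " ")
    # Collapse runs of spaces and trim spaces at either end.
    text = re.sub(r" {2,}", " ", text).strip()
    # Manual fixups apply last, to the single-space form.
    return AUTHOR_FIXUPS.get(text, text)
```

Running the automatic steps before the table lookup is what lets fixup
entries be written in the single-space form mentioned above.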