From: "Martin Liška" <mliska@suse.cz>
To: Joel Brobecker <brobecker@adacore.com>
Cc: Jakub Jelinek <jakub@redhat.com>,
Jonathan Wakely <jwakely.gcc@gmail.com>,
gcc-patches <gcc-patches@gcc.gnu.org>,
Ian Lance Taylor <iant@golang.org>
Subject: Re: Patch RFA: Support non-ASCII file names in git-changelog
Date: Wed, 6 Jan 2021 08:25:52 +0100
Message-ID: <868ce542-e180-dfd4-1bdd-6730a1011494@suse.cz>
In-Reply-To: <3c44a148-9514-cf34-0e76-cba9b08b5027@suse.cz>
[-- Attachment #1: Type: text/plain, Size: 551 bytes --]
On 1/4/21 12:47 PM, Martin Liška wrote:
> On 1/4/21 12:01 PM, Martin Liška wrote:
>> Anyway, I'm going to update the server hook first and then I'll create an issue for GitPython.
>
> So I was not correct about this. The server hooks now also use GitPython
> to identify modified files.
>
> I've just created an issue for that:
> https://github.com/gitpython-developers/GitPython/issues/1099
This one has been fixed, and the fix is included in the newly released v3.1.12.
Anyway, I've got a workaround that I'm going to push.
Martin
>
> Martin
[-- Attachment #2: 0001-gcc-changelog-workaround-for-utf8-filenames.patch --]
[-- Type: text/x-patch, Size: 5247 bytes --]
From ed9ffe47d6964dc92c92cfddbb8aac555c28e085 Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Wed, 6 Jan 2021 08:11:57 +0100
Subject: [PATCH] gcc-changelog: workaround for utf8 filenames
contrib/ChangeLog:

	* gcc-changelog/git_commit.py: Add decode_path function.
	* gcc-changelog/git_email.py: Use it in order to solve
	utf8 encoding filename issues.
	* gcc-changelog/git_repository.py: Likewise.
	* gcc-changelog/test_email.py: Test it.
---
contrib/gcc-changelog/git_commit.py | 26 +++++++++++++++++--------
contrib/gcc-changelog/git_email.py | 6 +++---
contrib/gcc-changelog/git_repository.py | 6 +++---
contrib/gcc-changelog/test_email.py | 3 ++-
4 files changed, 26 insertions(+), 15 deletions(-)
diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
index d2e5dbe294a..ee1973371be 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -174,6 +174,24 @@ REVIEW_PREFIXES = ('reviewed-by: ', 'reviewed-on: ', 'signed-off-by: ',
DATE_FORMAT = '%Y-%m-%d'
+def decode_path(path):
+    # When core.quotepath is true (default value), utf8 chars are encoded like:
+    # "b/ko\304\215ka.txt"
+    #
+    # The upstream bug is fixed:
+    # https://github.com/gitpython-developers/GitPython/issues/1099
+    #
+    # but we still need a workaround for older versions of the library.
+    # Please take a look at the explanation of the transformation:
+    # https://stackoverflow.com/questions/990169/how-do-convert-unicode-escape-sequences-to-unicode-characters-in-a-python-string
+
+    if path.startswith('"') and path.endswith('"'):
+        return (path.strip('"').encode('utf8').decode('unicode-escape')
+                .encode('latin-1').decode('utf8'))
+    else:
+        return path
+
+
 class Error:
     def __init__(self, message, line=None):
         self.message = message
@@ -303,14 +321,6 @@ class GitCommit:
                                     'separately from normal commits'))
             return
-        # check for an encoded utf-8 filename
-        hint = 'git config --global core.quotepath false'
-        for modified, _ in self.info.modified_files:
-            if modified.startswith('"') or modified.endswith('"'):
-                self.errors.append(Error('Quoted UTF8 filename, please set: '
-                                         f'"{hint}"', modified))
-                return
-
         all_are_ignored = (len(project_files) + len(ignored_files)
                            == len(self.info.modified_files))
         self.parse_lines(all_are_ignored)
diff --git a/contrib/gcc-changelog/git_email.py b/contrib/gcc-changelog/git_email.py
index 5b53ca4a6a9..00ad00458f4 100755
--- a/contrib/gcc-changelog/git_email.py
+++ b/contrib/gcc-changelog/git_email.py
@@ -22,7 +22,7 @@ from itertools import takewhile
from dateutil.parser import parse
-from git_commit import GitCommit, GitInfo
+from git_commit import GitCommit, GitInfo, decode_path
from unidiff import PatchSet, PatchedFile
@@ -52,8 +52,8 @@ class GitEmail(GitCommit):
         modified_files = []
         for f in diff:
             # Strip "a/" and "b/" prefixes
-            source = f.source_file[2:]
-            target = f.target_file[2:]
+            source = decode_path(f.source_file)[2:]
+            target = decode_path(f.target_file)[2:]
             if f.is_added_file:
                 t = 'A'
diff --git a/contrib/gcc-changelog/git_repository.py b/contrib/gcc-changelog/git_repository.py
index 8edcff91ad6..a0e293d756d 100755
--- a/contrib/gcc-changelog/git_repository.py
+++ b/contrib/gcc-changelog/git_repository.py
@@ -26,7 +26,7 @@ except ImportError:
print(' Debian, Ubuntu: python3-git')
exit(1)
-from git_commit import GitCommit, GitInfo
+from git_commit import GitCommit, GitInfo, decode_path
def parse_git_revisions(repo_path, revisions, strict=True):
@@ -51,11 +51,11 @@ def parse_git_revisions(repo_path, revisions, strict=True):
                     # Consider that renamed files are two operations:
                     # the deletion of the original name
                     # and the addition of the new one.
-                    modified_files.append((file.a_path, 'D'))
+                    modified_files.append((decode_path(file.a_path), 'D'))
                     t = 'A'
                 else:
                     t = 'M'
-                modified_files.append((file.b_path, t))
+                modified_files.append((decode_path(file.b_path), t))
             date = datetime.utcfromtimestamp(c.committed_date)
             author = '%s <%s>' % (c.author.name, c.author.email)
diff --git a/contrib/gcc-changelog/test_email.py b/contrib/gcc-changelog/test_email.py
index 2053531452c..5db56caef9e 100755
--- a/contrib/gcc-changelog/test_email.py
+++ b/contrib/gcc-changelog/test_email.py
@@ -402,4 +402,5 @@ class TestGccChangelog(unittest.TestCase):
     def test_bad_unicode_chars_in_filename(self):
         email = self.from_patch_glob('0001-Add-horse2.patch')
-        assert email.errors[0].message.startswith('Quoted UTF8 filename')
+        assert not email.errors
+        assert email.changelog_entries[0].files == ['koníček.txt']
--
2.29.2
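For readers who want to follow the transformation in decode_path, here is a minimal standalone sketch that traces each step on the sample path from the patch comment. The function body mirrors the patch; the variable names (quoted, inner, escaped, raw_bytes, decoded) are illustrative only and are not part of gcc-changelog.

```python
def decode_path(path):
    # Same logic as the decode_path added by the patch, copied here so the
    # sketch runs on its own.
    if path.startswith('"') and path.endswith('"'):
        return (path.strip('"').encode('utf8').decode('unicode-escape')
                .encode('latin-1').decode('utf8'))
    return path


# Sample from the patch comment: with core.quotepath true, git quotes the
# path and writes each non-ASCII byte as an octal escape.
quoted = r'"b/ko\304\215ka.txt"'

# Step 1: drop the surrounding double quotes.
inner = quoted.strip('"')

# Step 2: interpret the octal escapes. Each escape becomes one character
# whose code point equals the byte value (0o304 == 0xC4, 0o215 == 0x8D).
escaped = inner.encode('utf8').decode('unicode-escape')

# Step 3: latin-1 maps code points 0-255 back to single bytes, which
# recovers the raw UTF-8 byte sequence.
raw_bytes = escaped.encode('latin-1')

# Step 4: decode those bytes as UTF-8 to get the real file name.
decoded = raw_bytes.decode('utf8')

print(decoded)                  # b/kočka.txt
print(decode_path(quoted)[2:])  # kočka.txt

# Stripping the "b/" prefix first cuts off the opening quote, so the
# string is no longer recognised as quoted and comes back unchanged.
print(decode_path(quoted[2:]) == quoted[2:])  # True
```

The last line suggests why the git_email.py hunk calls decode_path before slicing off the "a/" and "b/" prefixes: in a quoted path the prefix sits inside the quotes, so the order of the two operations matters.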