public inbox for gcc@gcc.gnu.org
* Error when accessing git read-only archive
@ 2021-09-13 12:52 Thomas Koenig
  2021-09-13 13:01 ` Jonathan Wakely
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Koenig @ 2021-09-13 12:52 UTC (permalink / raw)
  To: gcc mailing list

Hi,

I just got an error when accessing the gcc git pages at
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git , it is:

This page contains the following errors:
error on line 91 at column 6: XML declaration allowed only at the start 
of the document
Below is a rendering of the page up to the first error.

Just to let you know (and it would be nice if this could be fixed :-)

Best regards

	Thomas


* Re: Error when accessing git read-only archive
  2021-09-13 12:52 Error when accessing git read-only archive Thomas Koenig
@ 2021-09-13 13:01 ` Jonathan Wakely
  2021-09-13 13:03   ` Jonathan Wakely
  0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Wakely @ 2021-09-13 13:01 UTC (permalink / raw)
  To: Thomas Koenig; +Cc: gcc mailing list

On Mon, 13 Sept 2021 at 13:53, Thomas Koenig via Gcc <gcc@gcc.gnu.org> wrote:
>
> Hi,
>
> I just got an error when accessing the gcc git pages at
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git , it is:
>
> This page contains the following errors:
> error on line 91 at column 6: XML declaration allowed only at the start
> of the document
> Below is a rendering of the page up to the first error.

The web server seems to restart the page in the middle of the HTML,
the content contains:

</tr>
<tr class="light">
Content-type: text/html

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">


* Re: Error when accessing git read-only archive
  2021-09-13 13:01 ` Jonathan Wakely
@ 2021-09-13 13:03   ` Jonathan Wakely
  2021-09-15 13:18     ` Jonathan Wakely
  2021-09-15 13:21     ` David Malcolm
  0 siblings, 2 replies; 10+ messages in thread
From: Jonathan Wakely @ 2021-09-13 13:03 UTC (permalink / raw)
  To: Thomas Koenig; +Cc: gcc mailing list

On Mon, 13 Sept 2021 at 14:01, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>
> On Mon, 13 Sept 2021 at 13:53, Thomas Koenig via Gcc <gcc@gcc.gnu.org> wrote:
> >
> > Hi,
> >
> > I just got an error when accessing the gcc git pages at
> > https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git , it is:
> >
> > This page contains the following errors:
> > error on line 91 at column 6: XML declaration allowed only at the start
> > of the document
> > Below is a rendering of the page up to the first error.
>
> The web server seems to restart the page in the middle of the HTML,
> the content contains:
>
> </tr>
> <tr class="light">
> Content-type: text/html
>
> <?xml version="1.0" encoding="utf-8"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">

Ah, the "second" page it's trying to display (in the middle of the
first) is an error:

<div class="page_body">
<br /><br />
500 - Internal Server Error
<br />
<hr />
Wide character in subroutine entry at /var/www/git/gitweb.cgi line 2208.

</div>


* Re: Error when accessing git read-only archive
  2021-09-13 13:03   ` Jonathan Wakely
@ 2021-09-15 13:18     ` Jonathan Wakely
  2021-09-15 13:21     ` David Malcolm
  1 sibling, 0 replies; 10+ messages in thread
From: Jonathan Wakely @ 2021-09-15 13:18 UTC (permalink / raw)
  To: Thomas Koenig; +Cc: gcc mailing list, Jan-Benedict Glaw

On Mon, 13 Sept 2021 at 14:03, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>
> On Mon, 13 Sept 2021 at 14:01, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
> >
> > On Mon, 13 Sept 2021 at 13:53, Thomas Koenig via Gcc <gcc@gcc.gnu.org> wrote:
> > >
> > > Hi,
> > >
> > > I just got an error when accessing the gcc git pages at
> > > https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git , it is:
> > >
> > > This page contains the following errors:
> > > error on line 91 at column 6: XML declaration allowed only at the start
> > > of the document
> > > Below is a rendering of the page up to the first error.
> >
> > The web server seems to restart the page in the middle of the HTML,
> > the content contains:
> >
> > </tr>
> > <tr class="light">
> > Content-type: text/html
> >
> > <?xml version="1.0" encoding="utf-8"?>
> > <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> > "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> > <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
>
> Ah, the "second" page it's trying to display (in the middle of the
> first) is an error:
>
> <div class="page_body">
> <br /><br />
> 500 - Internal Server Error
> <br />
> <hr />
> Wide character in subroutine entry at /var/www/git/gitweb.cgi line 2208.
>
> </div>

Jan-Benedict managed to push a commit with a non-ASCII author email,
which gitweb can't handle.

f42e95a830ab48e59389065ce79a013a519646f1 says "@ług-owl.de"


* Re: Error when accessing git read-only archive
  2021-09-13 13:03   ` Jonathan Wakely
  2021-09-15 13:18     ` Jonathan Wakely
@ 2021-09-15 13:21     ` David Malcolm
  2021-09-15 13:37       ` Jan-Benedict Glaw
  1 sibling, 1 reply; 10+ messages in thread
From: David Malcolm @ 2021-09-15 13:21 UTC (permalink / raw)
  To: Jonathan Wakely, Thomas Koenig; +Cc: gcc mailing list

On Mon, 2021-09-13 at 14:03 +0100, Jonathan Wakely via Gcc wrote:
> On Mon, 13 Sept 2021 at 14:01, Jonathan Wakely <jwakely.gcc@gmail.com>
> wrote:
> > 
> > On Mon, 13 Sept 2021 at 13:53, Thomas Koenig via Gcc <
> > gcc@gcc.gnu.org> wrote:
> > > 
> > > Hi,
> > > 
> > > I just got an error when accessing the gcc git pages at
> > > https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git , it is:
> > > 
> > > This page contains the following errors:
> > > error on line 91 at column 6: XML declaration allowed only at the
> > > start
> > > of the document
> > > Below is a rendering of the page up to the first error.
> > 
> > The web server seems to restart the page in the middle of the HTML,
> > the content contains:
> > 
> > </tr>
> > <tr class="light">
> > Content-type: text/html
> > 
> > <?xml version="1.0" encoding="utf-8"?>
> > <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> > "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> > <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-
> > US">
> 
> Ah, the "second" page it's trying to display (in the middle of the
> first) is an error:
> 
> <div class="page_body">
> <br /><br />
> 500 - Internal Server Error
> <br />
> <hr />
> Wide character in subroutine entry at /var/www/git/gitweb.cgi line
> 2208.
> 
> </div>

Summarizing some notes from IRC:

The last commit it manages to print successfully in that log seems to
be:
  c012297c9d5dfb177adf1423bdd05e5f4b87e5ec
so it appears that:
  f42e95a830ab48e59389065ce79a013a519646f1
is triggering the issue, and indeed
  https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f42e95a830ab48e59389065ce79a013a519646f1
fails in a similar way, whereas other commits work.

It appears to be due to the "ł" character in the email address of the
Author, in that:

commit c012297c9d5dfb177adf1423bdd05e5f4b87e5ec
Author: Jan-Benedict Glaw <jbglaw@lug-owl.de>

works, whereas:

commit f42e95a830ab48e59389065ce79a013a519646f1
Author: Jan-Benedict Glaw <jbglaw@ług-owl.de>

doesn't.

git show f42e95a830ab48e59389065ce79a013a519646f1 | hexdump -C

shows:

00000030  41 75 74 68 6f 72 3a 20  4a 61 6e 2d 42 65 6e 65  |Author: Jan-Bene|
00000040  64 69 63 74 20 47 6c 61  77 20 3c 6a 62 67 6c 61  |dict Glaw <jbgla|
00000050  77 40 c5 82 75 67 2d 6f  77 6c 2e 64 65 3e 0a 44  |w@..ug-owl.de>.D|
00000060  61 74 65 3a 20 20 20 4d  6f 6e 20 53 65 70 20 31  |ate:   Mon Sep 1|

i.e. we have the two bytes 0xc5 0x82, which is the UTF-8 encoding of "ł".
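
The byte pair can also be checked independently of git. A minimal sketch
in Python (used here purely for illustration; it is not part of the
original report):

```python
# "ł" is U+0142, LATIN SMALL LETTER L WITH STROKE.
char = "\u0142"

# Encoding it as UTF-8 yields the two bytes seen in the hexdump.
encoded = char.encode("utf-8")
print(encoded.hex(" "))  # c5 82

# Decoding those two bytes gives the character back.
decoded = bytes([0xC5, 0x82]).decode("utf-8")
print(decoded == char)  # True
```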


$ git format-patch c012297c9d5dfb177adf1423bdd05e5f4b87e5ec^^..c012297c9d5dfb177adf1423bdd05e5f4b87e5ec
0001-Fix-multi-statment-macro.patch
0002-cr16-elf-is-now-obsoleted.patch
$ file *.patch
0001-Fix-multi-statment-macro.patch:  unified diff output, UTF-8 Unicode text
0002-cr16-elf-is-now-obsoleted.patch: unified diff output, ASCII text


Hope this is helpful
Dave



* Re: Error when accessing git read-only archive
  2021-09-15 13:21     ` David Malcolm
@ 2021-09-15 13:37       ` Jan-Benedict Glaw
  2021-09-15 13:43         ` Jonathan Wakely
  0 siblings, 1 reply; 10+ messages in thread
From: Jan-Benedict Glaw @ 2021-09-15 13:37 UTC (permalink / raw)
  To: David Malcolm; +Cc: Jonathan Wakely, Thomas Koenig, gcc mailing list

[-- Attachment #1: Type: text/plain, Size: 1560 bytes --]

Hi David, Jonathan and all others,

On Wed, 2021-09-15 09:21:04 -0400, David Malcolm via Gcc <gcc@gcc.gnu.org> wrote:
> On Mon, 2021-09-13 at 14:03 +0100, Jonathan Wakely via Gcc wrote:
> > On Mon, 13 Sept 2021 at 14:01, Jonathan Wakely <jwakely.gcc@gmail.com>  wrote:
> > > On Mon, 13 Sept 2021 at 13:53, Thomas Koenig via Gcc <gcc@gcc.gnu.org> wrote:
> 
> Summarizing some notes from IRC:
> 
> The last commit it manages to print successfully in that log seems to
> be:
>   c012297c9d5dfb177adf1423bdd05e5f4b87e5ec
> so it appears that:
>   f42e95a830ab48e59389065ce79a013a519646f1
> is triggering the issue, and indeed
>   https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f42e95a830ab48e59389065ce79a013a519646f1
> fails in a similar way, whereas other commits work.
> 
> It appears to be due to the "ł" character in the email address of the
> Author, in that:
> 
> commit c012297c9d5dfb177adf1423bdd05e5f4b87e5ec
> Author: Jan-Benedict Glaw <jbglaw@lug-owl.de>
> 
> works, whereas:
> 
> commit f42e95a830ab48e59389065ce79a013a519646f1
> Author: Jan-Benedict Glaw <jbglaw@ług-owl.de>

That was indeed me, after moving my GCC repo to a different machine
and adding an explicit user.email (as this wasn't automatically
picking up a proper domain.) The "ł" was a typo (AltGr key still
pressed while typing the "l" after having entered the "@" which
requires it on a German keyboard layout.)

  So I broke it. Any way to make sure something like this doesn't
occur again?

Sorry for the inconvenience!

  Jan-Benedict


* Re: Error when accessing git read-only archive
  2021-09-15 13:37       ` Jan-Benedict Glaw
@ 2021-09-15 13:43         ` Jonathan Wakely
  2021-09-15 14:10           ` Mark Wielaard
  2021-09-15 14:14           ` Jan-Benedict Glaw
  0 siblings, 2 replies; 10+ messages in thread
From: Jonathan Wakely @ 2021-09-15 13:43 UTC (permalink / raw)
  To: Jan-Benedict Glaw; +Cc: David Malcolm, Thomas Koenig, gcc mailing list

On Wed, 15 Sept 2021 at 14:37, Jan-Benedict Glaw <jbglaw@lug-owl.de> wrote:
>
> Hi David, Jonathan and all others,
>
> On Wed, 2021-09-15 09:21:04 -0400, David Malcolm via Gcc <gcc@gcc.gnu.org> wrote:
> > On Mon, 2021-09-13 at 14:03 +0100, Jonathan Wakely via Gcc wrote:
> > > On Mon, 13 Sept 2021 at 14:01, Jonathan Wakely <jwakely.gcc@gmail.com>  wrote:
> > > > On Mon, 13 Sept 2021 at 13:53, Thomas Koenig via Gcc <gcc@gcc.gnu.org> wrote:
> >
> > Summarizing some notes from IRC:
> >
> > The last commit it manages to print successfully in that log seems to
> > be:
> >   c012297c9d5dfb177adf1423bdd05e5f4b87e5ec
> > so it appears that:
> >   f42e95a830ab48e59389065ce79a013a519646f1
> > is triggering the issue, and indeed
> >   https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f42e95a830ab48e59389065ce79a013a519646f1
> > fails in a similar way, whereas other commits work.
> >
> > It appears to be due to the "ł" character in the email address of the
> > Author, in that:
> >
> > commit c012297c9d5dfb177adf1423bdd05e5f4b87e5ec
> > Author: Jan-Benedict Glaw <jbglaw@lug-owl.de>
> >
> > works, whereas:
> >
> > commit f42e95a830ab48e59389065ce79a013a519646f1
> > Author: Jan-Benedict Glaw <jbglaw@ług-owl.de>
>
> That was indeed me, after moving my GCC repo to a different machine
> and adding an explicit user.email (as this wasn't automatically
> picking up a proper domain.) The "ł" was a typo (AltGr key still
> pressed while typing the "l" after having entered the "@" which
> requires it on a German keyboard layout.)
>
>   So I broke it. Any way to make sure something like this doesn't
> occur again?

We could add a check to the git hooks (and gcc-verify alias) to reject
non-ASCII email addresses, since they're probably mistakes.
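
A rough sketch of what such a check could look like, written in Python for
illustration only. This is not the actual GCC hook code; the names
IDENT_RE and email_is_ascii are hypothetical, and a real hook would still
need to feed it the author and committer idents of each pushed commit:

```python
import re

# Hypothetical helper: pull the address out of a git ident such as
# "Jan-Benedict Glaw <jbglaw@lug-owl.de>".
IDENT_RE = re.compile(r"<([^<>]*)>")


def email_is_ascii(ident: str) -> bool:
    """Return True if the ident has a non-empty, pure-ASCII address."""
    match = IDENT_RE.search(ident)
    if match is None:
        return False  # no <...> part at all; treat as invalid
    address = match.group(1)
    return bool(address) and address.isascii()


print(email_is_ascii("Jan-Benedict Glaw <jbglaw@lug-owl.de>"))       # True
print(email_is_ascii("Jan-Benedict Glaw <jbglaw@\u0142ug-owl.de>"))  # False
```

A non-ASCII author name would still pass, since only the part between the
angle brackets is examined.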

And we should report it to Gitweb (if it isn't already fixed upstream)
and get a fix into the version used on gcc.gnu.org.


* Re: Error when accessing git read-only archive
  2021-09-15 13:43         ` Jonathan Wakely
@ 2021-09-15 14:10           ` Mark Wielaard
  2021-09-15 18:34             ` Jan-Benedict Glaw
  2021-09-15 14:14           ` Jan-Benedict Glaw
  1 sibling, 1 reply; 10+ messages in thread
From: Mark Wielaard @ 2021-09-15 14:10 UTC (permalink / raw)
  To: Jonathan Wakely, Jan-Benedict Glaw; +Cc: Thomas Koenig, gcc mailing list

Hi,

On Wed, 2021-09-15 at 14:43 +0100, Jonathan Wakely via Gcc wrote:
> On Wed, 15 Sept 2021 at 14:37, Jan-Benedict Glaw <jbglaw@lug-owl.de>
> wrote:
> > On Wed, 2021-09-15 09:21:04 -0400, David Malcolm via Gcc <
> > gcc@gcc.gnu.org> wrote:
> > > It appears to be due to the "ł" character in the email address of
> > > the
> > > Author, in that:
> > > 
> > > commit c012297c9d5dfb177adf1423bdd05e5f4b87e5ec
> > > Author: Jan-Benedict Glaw <jbglaw@lug-owl.de>
> > > 
> > > works, whereas:
> > > 
> > > commit f42e95a830ab48e59389065ce79a013a519646f1
> > > Author: Jan-Benedict Glaw <jbglaw@ług-owl.de>
> > 
> > That was indeed me, after moving my GCC repo to a different machine
> > and adding an explicit user.email (as this wasn't automatically
> > picking up a proper domain.) The "ł" was a typo (AltGr key still
> > pressed while typing the "l" after having entered the "@" which
> > requires it on a German keyboard layout.)
> > 
> >   So I broke it. Any way to make sure something like this doesn't
> > occur again?
> 
> We could add a check to the git hooks (and gcc-verify alias) to
> reject
> non-ASCII email addresses, since they're probably mistakes.
> 
> And we should report it to Gitweb (if it isn't already fixed
> upstream)
> and get a fix into the version used on gcc.gnu.org.

The issue is the gravatar support, which calls md5_hex($email).
For now I disabled gravatar support on sourceware.org/gcc.gnu.org in
/etc/gitweb.conf
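
A loose Python analogue of that failure mode, as an illustration only
(gitweb itself is Perl, and the error text there differs): MD5 is defined
over bytes, so a text string has to be encoded before it is hashed.

```python
import hashlib

# The address from the failing commit, with "ł" written as an escape.
email = "jbglaw@\u0142ug-owl.de"

# Passing text directly is rejected, because md5 operates on bytes.
try:
    hashlib.md5(email)
    hashed_text = True
except TypeError:
    hashed_text = False

# Encoding to UTF-8 first works for ASCII and non-ASCII input alike.
digest = hashlib.md5(email.encode("utf-8")).hexdigest()
print(hashed_text, len(digest))  # False 32
```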

Cheers,

Mark


* Re: Error when accessing git read-only archive
  2021-09-15 13:43         ` Jonathan Wakely
  2021-09-15 14:10           ` Mark Wielaard
@ 2021-09-15 14:14           ` Jan-Benedict Glaw
  1 sibling, 0 replies; 10+ messages in thread
From: Jan-Benedict Glaw @ 2021-09-15 14:14 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: David Malcolm, Thomas Koenig, gcc mailing list

[-- Attachment #1: Type: text/plain, Size: 837 bytes --]

Hi Jonathan!

On Wed, 2021-09-15 14:43:45 +0100, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
> On Wed, 15 Sept 2021 at 14:37, Jan-Benedict Glaw <jbglaw@lug-owl.de> wrote:
[UTF-8 in committer's email addresses]
> >   So I broke it. Any way to make sure something like this doesn't
> > occur again?
> 
> We could add a check to the git hooks (and gcc-verify alias) to reject
> non-ASCII email addresses, since they're probably mistakes.

It was indeed a typo for me, but others might, in the long run,
actually use IDNs. Should they prepare their commits using Punycode?
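
For reference, a small sketch of what the Punycode form of an IDN looks
like, using Python's built-in "idna" codec (which implements the older
IDNA 2003 rules). It is an illustration only, not a proposal for the
hooks; note that it covers the domain part of an address, not the local
part before the "@":

```python
# The mistyped domain from this thread, with "ł" written as an escape.
domain = "\u0142ug-owl.de"

# Each non-ASCII label is converted to an ASCII "xn--" label;
# pure-ASCII labels such as "de" are left unchanged.
ascii_domain = domain.encode("idna")
print(ascii_domain.isascii(), ascii_domain.startswith(b"xn--"))

# The conversion is reversible.
print(ascii_domain.decode("idna") == domain)
```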

> And we should report it to Gitweb (if it isn't already fixed upstream)
> and get a fix into the version used on gcc.gnu.org.

I hope the local fix is already forwarded. That was quite a Brown
Paperbag typo. :(

Sorry,
  Jan-Benedict


* Re: Error when accessing git read-only archive
  2021-09-15 14:10           ` Mark Wielaard
@ 2021-09-15 18:34             ` Jan-Benedict Glaw
  0 siblings, 0 replies; 10+ messages in thread
From: Jan-Benedict Glaw @ 2021-09-15 18:34 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Jonathan Wakely, Thomas Koenig, gcc mailing list

[-- Attachment #1: Type: text/plain, Size: 845 bytes --]

Hi,

On Wed, 2021-09-15 16:10:50 +0200, Mark Wielaard <mark@klomp.org> wrote:
[UTF-8 email address containing a 'ł']
> The issue is the gravatar support, which calls md5_hex($email).
> For now I disabled gravatar support on sourceware.org/gcc.gnu.org in
> /etc/gitweb.conf

I am not a Perl guy, but it seems this works (tested locally):


--- a/gitweb/gitweb.perl	2021-09-15 20:23:13.788195846 +0200
+++ b/gitweb/gitweb.perl	2021-09-15 20:24:19.911806868 +0200
@@ -2193,7 +2193,7 @@
 	my $size = shift;
 	$avatar_cache{$email} ||=
 		"//www.gravatar.com/avatar/" .
-			md5_hex($email) . "?s=";
+			md5_hex(utf8::is_utf8($email)? Encode::encode_utf8($email): $email) . "?s=";
 	return $avatar_cache{$email} . $size;
 }
 

I'll send that to the GIT mailing list and ask for verification.

Thanks,
  Jan-Benedict

