* Problems with message translations (was: c++/7765)
From: Wolfgang Bangerth @ 2002-10-28 12:08 UTC
To: gcc; +Cc: zack, robitail
The problem with c++/7765 was that the original message and its
translation did not use the same %-formats (in fact, the format in the
translation was invalid). I thought this might happen more often, so I
wrote a small script (appended) that checks whether both messages use the
same formats, and whether they appear in the same order. The result is
somewhat negative: there are about 2000 violations in the translations :-(
I guess that at least some of them could lead to errors like PR 7765.
I don't think that I can do more about it, but maybe someone can use the
script for further investigations.
Regards
Wolfgang
PS1: Here's the script. Call it like "perl check.pl da.po" or whatever
message catalog you want to check.
use strict;
use warnings;

my $filename = $ARGV[0];
open(my $in, '<', $filename) or die "Cannot open $filename: $!\n";

my $line = 0;
LINE: while (<$in>) {
    ++$line;

    # get a msgid line and the corresponding msgstr
    next unless /^msgid\s*"(.*)"/;
    my $msgid = $1;

    $_ = <$in>;
    last unless defined $_;
    ++$line;
    next unless /^msgstr\s*"(.*)"/;
    my $msgstr = $1;

    # skip empty msgstrs, since this indicates they are simply not
    # translated and the english text will be used
    next if $msgstr =~ /^\s*$/;

    # then search for %format substrings in both msgid and msgstr
    while ($msgid =~ /(%.)/) {
        my $idformat = $1;
        if ($msgstr !~ /(%.)/) {
            print "$filename($line): Format '$idformat' string not found\n";
            next LINE;
        }
        my $strformat = $1;
        if ($idformat ne $strformat) {
            print "$filename($line): Formats '$idformat' and '$strformat' " .
                  "are ordered differently\n";
            next LINE;
        }

        # replace the format in both strings by __ to mask it from
        # matching the next format; \Q...\E makes the format match
        # literally even if it contains regex metacharacters
        $msgid  =~ s/\Q$idformat\E/__/;
        $msgstr =~ s/\Q$strformat\E/__/;
    }

    # make sure that at the end of the process no more formats are in
    # the msg string
    if ($msgstr =~ /%/) {
        print "$filename($line): Msgstr has excess formats\n";
    }
}
close($in);
PS2: These are the numbers of failures per catalog on the branch:
da.po 206
el.po 475
es.po 7
fr.po 13
ja.po 540
nl.po 621
sv.po 1
tr.po 153
Typically, they look like this:
gcc/po> perl check.pl fr.po
fr.po(9949): Msgstr has excess formats
fr.po(13125): Formats '%T' and '%t' are ordered differently
fr.po(13129): Formats '%s' and '%t' are ordered differently
fr.po(13681): Formats '%#' and '%D' are ordered differently
fr.po(14001): Formats '%#' and '%D' are ordered differently
fr.po(14029): Formats '%#' and '%D' are ordered differently
I don't know whether differently ordered formats are a problem at all, but
the other types (too many, too few, or mismatching formats) definitely
are.
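To triage the reports, it may help to separate formats that are merely
reordered from formats that really differ. The following is only a sketch
under the same simplification as the script above (a format is '%' plus
one character); the helper names formats() and classify() and the sample
strings are made up for illustration and are not taken from any catalog.

```perl
use strict;
use warnings;

# Return the two-character %-directives of a string, in order of
# appearance. "%%" (a literal percent sign) is skipped.
# Assumption: as in check.pl, flags and length modifiers are not
# parsed, so "%#D" is seen only as "%#".
sub formats {
    my ($text) = @_;
    return grep { $_ ne '%%' } $text =~ /(%.)/g;
}

# Classify a msgid/msgstr pair as 'same', 'reordered' or 'mismatch'.
sub classify {
    my ($msgid, $msgstr) = @_;
    my @id  = formats($msgid);
    my @str = formats($msgstr);

    # identical sequence of directives
    return 'same' if join("\0", @id) eq join("\0", @str);

    # same directives, but in a different order
    return 'reordered'
        if join("\0", sort @id) eq join("\0", sort @str);

    # anything else: missing, extra or different directives
    return 'mismatch';
}

# Hypothetical sample pairs:
print classify("%s has type %T", "%T est le type de %s"), "\n";  # reordered
print classify("%s has type %T", "%s a le type %t"), "\n";       # mismatch
```

Only the 'mismatch' cases correspond to the error types that are
definitely a problem; the 'reordered' ones would still need a separate
look.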
-------------------------------------------------------------------------
Wolfgang Bangerth email: bangerth@ticam.utexas.edu
www: http://www.ticam.utexas.edu/~bangerth
* Re: Problems with message translations (was: c++/7765)
From: Andreas Schwab @ 2002-10-28 12:52 UTC
To: Wolfgang Bangerth; +Cc: gcc
Wolfgang Bangerth <bangerth@ticam.utexas.edu> writes:
|> The problem with c++/7765 was that in the original and the translated
|> message, not the same %-formats were used (in fact, the format in the
|> translation was invalid). I thought this might happen more often, so wrote
|> a small script (appended) that checks whether the same formats are used in
|> both messages, and whether they are in the same order. The result is
|> somewhat negative: there are about 2000 violations in the translations :-(
|> I guess that at least some of them could lead to errors like PR 7765.
|>
|> I don't think that I can do more about it, but maybe someone can use the
|> script for further investigations.
|>
|> Regards
|> Wolfgang
|>
|> PS1: Here's the script. Call it like "perl check.pl da.po" or whatever
|> message catalog you want to check.
|>
|> $filename = $ARGV[0];
|> open (IN, $filename);
|>
|> $line = 0;
|> while (<IN>) {
|> ++$line;
|>
|> # get a msgid line and the corresponding msgstr
|> next if (! (/^msgid\s*\"(.*)\"/));
Note that msgids can be split across lines, so this probably won't catch
all of them.
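For a catalog that does wrap its strings, the continuation lines would
have to be joined before the formats are compared. A minimal sketch,
assuming only plain msgid/msgstr entries (no msgid_plural, msgctxt or
obsolete "#~" entries); read_entries() is a hypothetical helper and the
sample entry is invented:

```perl
use strict;
use warnings;

# Read PO text from a filehandle and return a list of
# [msgid, msgstr, line] entries, with continuation lines joined.
# 'line' is the line number on which the msgid starts.
sub read_entries {
    my ($fh) = @_;
    my (@entries, %cur, $field);
    my $lineno = 0;

    # store the entry collected so far, if it is complete
    my $flush = sub {
        push @entries, [ @cur{qw(msgid msgstr line)} ]
            if defined $cur{msgstr};
        %cur = ();
        undef $field;
    };

    while (my $text = <$fh>) {
        ++$lineno;
        if ($text =~ /^(msgid|msgstr)\s*"(.*)"\s*$/) {
            my ($key, $value) = ($1, $2);
            $flush->() if $key eq 'msgid';   # a new entry begins
            $field = $key;
            $cur{$field} = $value;
            $cur{line} = $lineno if $key eq 'msgid';
        }
        elsif (defined $field && $text =~ /^\s*"(.*)"\s*$/) {
            $cur{$field} .= $1;              # continuation line
        }
        else {
            undef $field;                    # comment or blank line
        }
    }
    $flush->();
    return @entries;
}

# Hypothetical wrapped entry, read from an in-memory filehandle:
my $po = <<'EOT';
msgid ""
"cannot convert `%T' "
"to `%T'"
msgstr ""
"ne peut convertir `%T' "
"en `%s'"
EOT

open(my $fh, '<', \$po) or die "Cannot open in-memory file: $!\n";
for my $e (read_entries($fh)) {
    my ($id, $str, $line) = @$e;
    print "line $line: msgid=\"$id\" msgstr=\"$str\"\n";
}
```

The joined strings could then be passed to the same format comparison as
the single-line case.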
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: Problems with message translations (was: c++/7765)
From: Wolfgang Bangerth @ 2002-10-28 16:12 UTC
To: Andreas Schwab; +Cc: gcc
> |> $line = 0;
> |> while (<IN>) {
> |> ++$line;
> |>
> |> # get a msgid line and the corresponding msgstr
> |> next if (! (/^msgid\s*\"(.*)\"/));
>
> Note that msgids can be split across lines, so this probably won't catch
> all of them.
I was lucky: there are no multi-line message strings in the gcc
translations :-)
Wolfgang
-------------------------------------------------------------------------
Wolfgang Bangerth email: bangerth@ticam.utexas.edu
www: http://www.ticam.utexas.edu/~bangerth