public inbox for gcc@gcc.gnu.org
* Problems with message translations (was: c++/7765)
@ 2002-10-28 12:08 Wolfgang Bangerth
  2002-10-28 12:52 ` Andreas Schwab
  0 siblings, 1 reply; 3+ messages in thread
From: Wolfgang Bangerth @ 2002-10-28 12:08 UTC (permalink / raw)
  To: gcc; +Cc: zack, robitail


The problem with c++/7765 was that the original and the translated 
message did not use the same %-formats (in fact, the format in the 
translation was invalid). I thought this might happen more often, so I wrote 
a small script (appended) that checks whether the same formats are used in 
both messages, and whether they are in the same order. The result is 
somewhat negative: there are about 2000 violations in the translations :-(
I guess that at least some of them could lead to errors like PR 7765.

I don't think that I can do more about it, but maybe someone can use the 
script for further investigations.

Regards
  Wolfgang

PS1: Here's the script. Call it like "perl check.pl da.po" or whatever 
message catalog you want to check.

use strict;
use warnings;

die "Usage: perl check.pl <catalog.po>\n" if (! @ARGV);
my $filename = $ARGV[0];
open (IN, '<', $filename) or die "Cannot open $filename: $!\n";

LINE: while (<IN>) {
    # get a msgid line and the corresponding msgstr
    next if (! (/^msgid\s*\"(.*)\"/));
    my $msgid = $1;

    # the msgstr is expected on the very next line; if it is not there,
    # skip the entry instead of reusing the capture from the msgid match
    $_ = <IN>;
    last if (! defined $_);
    next if (! (/^msgstr\s*\"(.*)\"/));
    my $msgstr = $1;

    # $. is the number of the line read last, i.e. the msgstr line
    my $line = $.;

    # skip empty msgstrs, since this indicates they are simply not
    # translated and the english text will be used
    next if ($msgstr =~ /^\s*$/);

    # then search for %format substrings in both msgid and msgstr
    while ($msgid =~ /(%.)/ ) {
        my $idformat = $1;

        if (! ($msgstr =~  /(%.)/)) {
            print "$filename($line): Format '$idformat' not found in msgstr\n";
            next LINE;
        }
        my $strformat = $1;

        if (! ($idformat eq $strformat)) {
            print "$filename($line): Formats '$idformat' and '$strformat' " .
                "are ordered differently\n";
            next LINE;
        }

        # replace the format in both strings by __ to mask it from
        # matching the next format; \Q...\E keeps characters such as
        # '.' or '*' in a format from being read as regex syntax
        $msgid =~ s/\Q$idformat\E/__/;
        $msgstr =~ s/\Q$strformat\E/__/;
    }

    # make sure that at the end of the process no more formats are in
    # the msg string
    if ($msgstr =~ /%/) {
        print "$filename($line): Msgstr has excess formats\n";
    }
}

close (IN);



PS2: These are the numbers of failures per catalog on the branch:
da.po     206
el.po     475
es.po       7
fr.po      13
ja.po     540
nl.po     621
sv.po       1
tr.po     153

Typically, they look like this:
gcc/po> perl check.pl fr.po
fr.po(9949): Msgstr has excess formats
fr.po(13125): Formats '%T' and '%t' are ordered differently
fr.po(13129): Formats '%s' and '%t' are ordered differently
fr.po(13681): Formats '%#' and '%D' are ordered differently
fr.po(14001): Formats '%#' and '%D' are ordered differently
fr.po(14029): Formats '%#' and '%D' are ordered differently

I don't know whether differently ordered formats are a problem at all, but 
the other types (too many, too few, or mismatching formats) definitely 
are.
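
To tell pure reorderings apart from the cases that are definitely broken, 
one could compare the formats of both strings as unordered collections 
first. The following is only a minimal sketch: it assumes the same simplistic 
notion of a format as the script above ('%' plus one character), the name 
classify_formats is made up for illustration, and the sample strings are 
invented rather than taken from any catalog.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of check.pl: compare the %-formats of a
# msgid and its msgstr and name the kind of difference, if any.
sub classify_formats {
    my ($msgid, $msgstr) = @_;

    # same simplistic notion of a format as in check.pl: '%' plus one character
    my @id  = $msgid  =~ /(%.)/g;
    my @str = $msgstr =~ /(%.)/g;

    # identical formats in identical order
    return "same"      if "@id" eq "@str";
    # identical formats, but in a different order
    return "reordered" if join(" ", sort @id) eq join(" ", sort @str);
    # too many, too few, or different formats
    return "mismatch";
}

# invented example strings
print classify_formats("%s in %D", "%D dans %s"), "\n";
print classify_formats("type %T",  "type %t"),    "\n";
```

The first call reports "reordered", the second "mismatch". Only the second 
class is what I am sure about; whether the first one is harmful depends on 
how the formats are consumed, which this sketch does not try to decide.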

-------------------------------------------------------------------------
Wolfgang Bangerth              email:           bangerth@ticam.utexas.edu
                               www: http://www.ticam.utexas.edu/~bangerth





* Re: Problems with message translations (was: c++/7765)
  2002-10-28 12:08 Problems with message translations (was: c++/7765) Wolfgang Bangerth
@ 2002-10-28 12:52 ` Andreas Schwab
  2002-10-28 16:12   ` Wolfgang Bangerth
  0 siblings, 1 reply; 3+ messages in thread
From: Andreas Schwab @ 2002-10-28 12:52 UTC (permalink / raw)
  To: Wolfgang Bangerth; +Cc: gcc

Wolfgang Bangerth <bangerth@ticam.utexas.edu> writes:

|> The problem with c++/7765 was that the original and the translated 
|> message did not use the same %-formats (in fact, the format in the 
|> translation was invalid). I thought this might happen more often, so I wrote 
|> a small script (appended) that checks whether the same formats are used in 
|> both messages, and whether they are in the same order. The result is 
|> somewhat negative: there are about 2000 violations in the translations :-(
|> I guess that at least some of them could lead to errors like PR 7765.
|> 
|> I don't think that I can do more about it, but maybe someone can use the 
|> script for further investigations.
|> 
|> Regards
|>   Wolfgang
|> 
|> PS1: Here's the script. Call it like "perl check.pl da.po" or whatever 
|> message catalog you want to check.
|> 
|> $filename = $ARGV[0];
|> open (IN, $filename);
|> 
|> $line = 0;
|> while (<IN>) {
|>     ++$line;
|> 
|>     # get a msgid line and the corresponding msgstr
|>     next if (! (/^msgid\s*\"(.*)\"/));

Note that msgids can be split across lines, so this probably won't catch
all of them.
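
A rough sketch of how the split strings could be joined before checking 
them. This is an assumption about one possible approach, not a tested 
replacement for the script: the name read_entries is made up, plural forms 
(msgid_plural, msgstr[N]) are ignored, and the sample catalog text is 
invented.

```perl
use strict;
use warnings;

# Hypothetical helper: read (msgid, msgstr) pairs from a .po filehandle,
# appending continuation lines of the form "..." to the preceding string.
sub read_entries {
    my ($fh) = @_;
    my (@entries, $current);

    while (my $l = <$fh>) {
        if ($l =~ /^msgid\s*"(.*)"/) {
            # start a new entry and remember the line it begins on
            push @entries, { msgid => $1, msgstr => "", line => $. };
            $current = "msgid";
        } elsif ($l =~ /^msgstr\s*"(.*)"/ && @entries) {
            $entries[-1]{msgstr} = $1;
            $current = "msgstr";
        } elsif ($l =~ /^"(.*)"/ && defined $current) {
            # continuation line: append to whichever string came last
            $entries[-1]{$current} .= $1;
        } else {
            # blank line, comment or anything else ends the current string
            undef $current;
        }
    }
    return @entries;
}

# invented sample catalog, read through an in-memory filehandle
my $po = <<'EOF';
msgid ""
"first half %s, "
"second half %d"
msgstr ""
"erste Haelfte %s, "
"zweite Haelfte %d"
EOF

open(my $fh, '<', \$po) or die "Cannot open in-memory file: $!\n";
for my $e (read_entries($fh)) {
    print "$e->{line}: '$e->{msgid}' => '$e->{msgstr}'\n";
}
close($fh);
```

The joined msgid and msgstr could then be fed to the format comparison 
instead of the single lines.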

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: Problems with message translations (was: c++/7765)
  2002-10-28 12:52 ` Andreas Schwab
@ 2002-10-28 16:12   ` Wolfgang Bangerth
  0 siblings, 0 replies; 3+ messages in thread
From: Wolfgang Bangerth @ 2002-10-28 16:12 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: gcc


> |> $line = 0;
> |> while (<IN>) {
> |>     ++$line;
> |> 
> |>     # get a msgid line and the corresponding msgstr
> |>     next if (! (/^msgid\s*\"(.*)\"/));
> 
> Note that msgids can be split across lines, so this probably won't catch
> all of them.

I was lucky: there are no multi-line message strings in the gcc 
translations :-)

Wolfgang

-------------------------------------------------------------------------
Wolfgang Bangerth              email:           bangerth@ticam.utexas.edu
                               www: http://www.ticam.utexas.edu/~bangerth


